Node Dominance: Revealing Community and Core-Periphery Structure in Social Networks
Jennifer Gamble, Harish Chintakunta, Adam Wilkerson, Hamid Krim, Ananthram Swami
Jennifer Gamble, Student Member, IEEE, Harish Chintakunta, Member, IEEE, Adam Wilkerson, Member, IEEE, Hamid Krim, Fellow, IEEE, and Ananthram Swami, Fellow, IEEE
Abstract: This study relates the local property of node dominance to local and global properties of a network. Iterative removal of dominated nodes yields a distributed algorithm for computing a core-periphery decomposition of a social network, where nodes in the network core are seen to be essential in terms of network flow and global structure. Additionally, the connected components in the periphery give information about the community structure of the network, aiding in community detection. A number of explicit results are derived, relating the core and periphery to network flow, community structure and global network structure, which are corroborated by observational results. The method is illustrated using a real world network (DBLP co-authorship network), with ground-truth communities.

Index Terms: Core-periphery, community detection, simplicial collapse, topological data analysis, social network.
Jennifer Gamble and Hamid Krim are with the Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, NC, USA. Harish Chintakunta is with the Coordinated Science Laboratory, University of Illinois Urbana Champaign, Champaign, IL, USA. Adam Wilkerson, Terrence J. Moore and Ananthram Swami are with the Army Research Lab, Adelphi, MD, USA.

I. INTRODUCTION

One of the interesting challenges in social networks is to relate local connectivity properties to global structure. The motivation for doing so stems from the belief that local properties reflect interactions amongst individuals (or entities). Therefore such relationships help us make inferences about the nature of interactions which led to the network, by studying its global properties. In this paper, we present the local property of node dominance as a method for network analysis. We will show why node dominance is such a useful criterion, by developing a low complexity, distributed algorithm for the core-periphery decomposition of a network based on node dominance criteria. We will also demonstrate its relation to the network community structure.

Owing to a localized definition, the node dominance criterion for a node $v$ can be determined from only a two-hop neighborhood. A node $v$ is dominated by node $w$ if all nodes that share an edge with $v$ also share an edge with $w$. The formal definition of node dominance is based on a simplicial complex (as opposed to graph) structure, and will be discussed in detail later. If we iteratively collapse dominated nodes, the resulting set (the network core) is shown to consist of nodes that are important with respect to the network flow, community structure, and global network structure. One especially important property of the core is the preservation of shortest distances, so a shortest path between any two nodes in the core is also a shortest path between them in the original network.
The network periphery (the complement to the core, consisting of dominated nodes) is seen to consist of many connected components, including all the nodes in the network through which no shortest paths pass. These peripheral components also play a key role in the community structure of the network.

The intuitive notion that a network naturally decomposes into a core and periphery has appeared many times in the social network literature over the decades. Researchers have proposed different interpretations about what such a decomposition should look like, but it is commonly suggested that a ‘core’ should be central to the network (with respect to information flow, or shortest paths) [1], have high average degree [2], and be relatively well-connected both internally, and to the periphery [3] [4]. In contrast, the periphery should be connected to the core, but extremely sparsely connected amongst itself.

Borgatti and Everett [3] were the first to attempt to analytically describe these intuitive properties. They proposed an ‘idealized core-periphery’, wherein every core node is connected to every other core node, each peripheral node is connected to the core, and no peripheral nodes are connected to each other. They would then learn the core-periphery structure for a given network by assigning each node as ‘core’ or ‘periphery’ in the way that best correlated with this idealized structure. This method assumes explicitly that the probability of two nodes being joined by an edge is only a function of their ‘core-ness’, as opposed to some other characteristics, such as community membership. In this sense, the core-periphery model considered in [3] is in contrast to common network models based on community structure. Both core-periphery and community network structures can be expressed using a stochastic blockmodel approach [4], but with different parameters, so under these models a given network will not display both structures simultaneously.

Another approach, by Rombach et al. [5], presents a generalization of Borgatti and Everett’s philosophy, where a core score is computed for each node, using a range of possible core sizes and continuous/discrete transitions between core and periphery. Here, they admit that both core-periphery and community structure are often present in real-world networks, but still propose the core-periphery decomposition as an alternative/complementary analysis to the more common community detection methods. In Della Rossa et al. [6], an approach to periphery detection based on random walks is taken, where it is assumed that due to the extremely sparse connectivity of the periphery, a random walk will exit the set of peripheral nodes very quickly. Thus, a core-periphery profile for the network, along with a coreness value for each node, is computed using a greedy algorithm that incrementally adds nodes to the periphery in a way that minimizes the expected exit time of a random walk. Again, this method focuses very heavily on the sparsity of the periphery, and is somewhat unrelated to any community structure that may be present in the network. For a good review of existing methods of core-periphery network decomposition, see the survey by Csermely et al. [2], or the introductory sections in [5].

Traditionally, approaches to community detection in networks have assumed that communities form a partition of the network, with each node belonging to exactly one community. A foundational method has been the Girvan-Newman algorithm [7], where communities are detected through iterative removal of edges with high betweenness centrality. They defined the notion of ‘modularity’ as a stopping criterion for their algorithm, and many subsequent algorithms attempt to partition a network in such a way that optimizes (usually approximately) modularity [8], or cut ratio (approximated using spectral clustering) [9].
Fortunato provides an excellent overview of the breadth and depth of approaches to the community detection problem in his 100 page survey paper [10]. In more recent years, researchers are determining that partition-based methods are often somewhat unrealistic, since real-world networks with ground-truth communities typically display overlapping community structure [11], where one node may have multiple community memberships. See Xie et al. [12] for a survey of methods for overlapping community detection, including clique percolation, link clustering, and fuzzy detection methods using mixed-membership stochastic block models, or nonnegative matrix factorization.

A particularly realistic model for overlapping community detection is Yang and Leskovec’s community-affiliation graph model (AGM) [13] [14]. This model considers communities as ‘overlapping tiles’, and its distinguishing feature is that regions of community overlaps are more densely connected than regions involving single communities. Precisely, the probability of an edge existing between two vertices is based on the communities they share, with higher probability when they have more community memberships in common. This assumption is validated on data sets with ground-truth community memberships available, where higher edge densities are observed in community intersections [13]. AGM, and the other methods for overlapping community detection are more realistic than the partition-based methods, but they do not scale up well with size of the network. A recent relaxation of AGM, referred to as Cluster Affiliation Model for Big Networks (BIGCLAM) [15], allows nodes to have continuous-valued community memberships, indicating their degree of involvement in a given community. This reduces the combinatorial optimization in AGM to a continuous optimization that can be solved using nonnegative matrix factorization, making it viable for large networks.
We will return to these models in Section IV-C.

In the current paper, we will see how a core-periphery structure and a community structure are both present in real-world networks, and how node dominance informs us about both. The relationship between the core-periphery and community structure of a network has been touched upon previously by Leskovec et al. [16], where they also noted the presence of a network periphery, defined in terms of whiskers (clusters of nodes that are separable from the main network by removing a single edge), which were interpreted as small communities, weakly connected to the remaining network “core”. In the AGM model mentioned above [14], Yang and Leskovec refer to the overlapping portions of communities as the “core” of the network. We will see that this interpretation does in fact concur with our notion of core and periphery, where in networks with ground-truth communities available, the nodes in the core obtained using node dominance typically have multiple community memberships, while the nodes in the periphery have fewer community memberships (often just one).

Iterative node dominance collapses were originally proposed independently by Wilkerson et al. [17] and Barmak and Minian [18], as a homology/homotopy-preserving simplification of a simplicial complex, with the distributed version described in [19]. Here, we explore much more deeply the use of this simplification as a network core, and describe the relationship between the core-periphery decomposition, and the community structure, global structure, and network flow properties.

In Section II, we will first describe the relevant information for the simplicial complex representation of a network, and the background and definition of the node dominance criterion. We follow this in Section III by statements and derivations of the resulting properties of core-periphery decomposition, and present an algorithm for the use of peripheral components in community detection.
In Section IV, we illustrate our method with two real-world network data sets which contain ground-truth community information. We not only empirically verify the importance of core nodes with respect to network flow and global structure, but see that our proposed use of the peripheral components for community detection outperforms BIGCLAM, which is considered the current state-of-the-art method for overlapping community detection in large networks. Finally, in Section V we draw some conclusions, and discuss the limitations of our method, as well as some directions for future research.

II. BACKGROUND
A. Simplicial homology
A graph $G = G(V, E)$ is defined by a list, $V$, of its vertices, as well as a list, $E$, of the pairs of vertices that are joined by an edge. An implicit assumption in this is that an edge $e = (v_i, v_j) \in E$ can only be present in $G$ if both of its vertices $v_i$ and $v_j$ are in $V$. The notion of a simplicial complex is a higher-order generalization of a graph, while similarly preserving this ‘closed under subsets’ property.

Definition (Simplicial complex). A $k$-simplex $\sigma = (v_0, v_1, \dots, v_k)$ is a set of $(k+1)$ singleton elements (called vertices). A simplicial complex $K$ is a set of simplices (i.e. a set of sets of vertices) such that
(i) if $\sigma, \tau \in K$, then $\sigma \cap \tau \in K$
(ii) if $\sigma \in K$ and $\tau \le \sigma$, then $\tau \in K$
where $\le$ indicates the subset relation. If $\tau \le \sigma$, we call $\tau$ a face of $\sigma$. A simplex $\sigma$ is maximal if there are no $\tau \in K$ such that $\sigma < \tau$. A $k$-simplex has dimension $k$. The dimension of simplicial complex $K$ is the maximum dimension of any simplex in $K$:
$$\dim(K) = \max_{\sigma \in K} \dim(\sigma).$$
A subset $K'$ of a simplicial complex $K$ is called a subcomplex, if $K'$ is itself a simplicial complex (satisfying properties (i) and (ii) above). The $k$-skeleton of $K$ is the subcomplex formed by all simplices in $K$ with dimension at most $k$:
$$k\text{-skeleton of } K = \{\sigma \in K \mid \dim(\sigma) \le k\}.$$

Definition.
Let $K_1$ and $K_2$ be two simplicial complexes with vertex sets $V_1$ and $V_2$. A map $\phi : V_1 \to V_2$ on the vertex sets induces a simplicial map $\phi : K_1 \to K_2$ on the complexes, if for every simplex $\sigma = (v_0, \dots, v_k) \in K_1$, the set $(\phi(v_0), \dots, \phi(v_k))$ spans a simplex in $K_2$. A simplicial map $\phi : K_1 \to K_2$ induced by an isomorphic map on the vertex sets is said to be an isomorphic simplicial map, and in this case, $K_1$ and $K_2$ are isomorphic simplicial complexes.

In Section III, this isomorphism between complexes will be used to describe the uniqueness of the core obtained using node dominance collapsing.

Given a graph $G = G(V, E)$, we can think of $G$ as the 1-skeleton of a simplicial complex, whose higher-dimensional simplices have not been directly observed. The maximal simplicial complex whose 1-skeleton is equal to $G$ is called the flag complex.

Definition (Flag complex). Given a graph $G = G(V, E)$, the simplicial complex
$$X(G) = \{\sigma = (v_{i_0}, v_{i_1}, \dots, v_{i_{\dim \sigma}}) \mid (v_{i_j}, v_{i_k}) \in E \text{ for all } 0 \le j, k \le \dim \sigma,\ j \neq k\}$$
contains a simplex $\sigma$ whenever all pairs of vertices in $\sigma$ are connected by an edge in $E$. $X(G)$ is called the flag complex of $G$.

As we will see in Section II-B1, if we have additional information about the $k$-tuple relations in $G$, we may build a simplicial complex using that information, adding a $k$-simplex $\sigma$ whenever its vertices satisfy a $k$-tuple relation, and all faces of the simplex are also present. In the absence of such information, when only the graph $G$ is given, we propose the use of the flag complex, and see that it can be very informative.

A final notion we will mention here is the definition of the homology of a simplicial complex.

Definition (Homology). We encode the structure of simplicial complex $X$ through boundary maps $\{\partial_k\}_{k=1}^{\dim(X)}$, where $\partial_k$ gives the oriented connectivity information between $k$-simplices and $(k-1)$-simplices. Then the $k$-th homology group of $X$ is
$$H_k(X) = \ker(\partial_k) / \operatorname{im}(\partial_{k+1}).$$
See, for example, [20] for a more mathematically complete definition of simplicial homology.
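The flag complex and homology definitions above can be made concrete on small graphs. The following is a minimal sketch, not the method used in this paper (which never computes homology directly): it enumerates the simplices of $X(G)$ up to a chosen dimension and computes the dimensions of the homology spaces with coefficients in GF(2), using $\dim H_k = \dim C_k - \operatorname{rank} \partial_k - \operatorname{rank} \partial_{k+1}$. The function names, the GF(2) coefficients, the brute-force clique search, and the `max_dim` cutoff are all assumptions made for illustration.

```python
from itertools import combinations


def flag_complex(vertices, edges, max_dim=2):
    """Return {k: list of k-simplices} of the flag complex, for k <= max_dim.

    Brute-force clique enumeration: only suitable for small example graphs.
    """
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    ordered = sorted(vertices)
    simplices = {0: [(v,) for v in ordered]}
    for k in range(1, max_dim + 1):
        simplices[k] = [
            s for s in combinations(ordered, k + 1)
            if all(b in adj[a] for a, b in combinations(s, 2))
        ]
    return simplices


def rank_gf2(columns):
    """Rank over GF(2) of a matrix whose columns are integer bitmasks."""
    pivots = {}  # highest set bit -> reduced column with that leading bit
    rank = 0
    for col in columns:
        while col:
            high = col.bit_length() - 1
            if high not in pivots:
                pivots[high] = col
                rank += 1
                break
            col ^= pivots[high]
    return rank


def betti_numbers(simplices):
    """Betti numbers over GF(2), assuming no simplices above the top dimension."""
    max_dim = max(simplices)
    index = {k: {s: i for i, s in enumerate(simplices[k])} for k in simplices}
    ranks = {0: 0, max_dim + 1: 0}
    for k in range(1, max_dim + 1):
        columns = []
        for s in simplices[k]:
            mask = 0
            for face in combinations(s, k):  # the (k-1)-dimensional faces of s
                mask |= 1 << index[k - 1][face]
            columns.append(mask)
        ranks[k] = rank_gf2(columns)
    return [len(simplices[k]) - ranks[k] - ranks[k + 1]
            for k in range(max_dim + 1)]


square = flag_complex(range(4), [(0, 1), (1, 2), (2, 3), (3, 0)])
triangle = flag_complex(range(3), [(0, 1), (1, 2), (0, 2)])
print(betti_numbers(square))    # [1, 1, 0]: one component, one unfilled loop
print(betti_numbers(triangle))  # [1, 0, 0]: the loop is filled by a 2-simplex
```

If the graph contains cliques larger than `max_dim + 1`, the last entry of the returned list is not reliable, since the boundary map from the next dimension is treated as zero.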
Intuitively, the dimension of the $k$-th homology space counts the number of $k$-dimensional “holes” in the simplicial complex. These can be thought of as $(k+1)$-dimensional voids enclosed by $k$-simplices, so $H_1$ counts the number of loops which are not “filled-in” by triangles, and $H_2$ counts the number of voids. The interpretation of $H_0$ is slightly different: it counts the number of connected components of $X$ (which may be interpreted as cycles of dimension zero).

The sequence of homology spaces of a simplicial complex, in essence, specifies the “global structure” of the complex. For our purposes, we will not be computing any homology directly, but we will see that by preserving homology during our node dominance collapse, we will in fact be preserving important global structure of the network.

B. Node dominance
We will be representing a network using its flag complex, and in that setting, node dominance is characterized by the following definition.
Definition.
The neighbor set of a node $v$ is the set of all nodes sharing an edge with $v$, as well as $v$ itself:
$$\mathcal{N}[v] := \{u \in V \mid (u, v) \in E\} \cup \{v\}.$$
A node $v$ is dominated by one of its neighbors $w$ if and only if $\mathcal{N}[v] \subseteq \mathcal{N}[w]$, i.e., all the neighbors of $v$ are also neighbors of $w$.

To understand the importance and relevance of this definition, we will explore a bit of its history, and related concepts.
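The neighbor-set criterion in the definition above amounts to a subset test on closed neighborhoods. A minimal sketch follows, assuming the graph is given as an edge list; the function names and the small example graph (a 4-cycle with one pendant node and one node attached to both ends of an edge) are hypothetical and chosen only for illustration.

```python
def closed_neighborhoods(edges):
    """Map each node v to its closed neighbor set N[v] (neighbors plus v itself)."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, {u}).add(v)
        nbrs.setdefault(v, {v}).add(u)
    return nbrs


def is_dominated_by(v, w, nbrs):
    """True when w is a neighbor of v and N[v] is a subset of N[w]."""
    return v != w and w in nbrs[v] and nbrs[v] <= nbrs[w]


def dominators(v, nbrs):
    """All neighbors of v that dominate v."""
    return sorted(w for w in nbrs[v] if is_dominated_by(v, w, nbrs))


# 4-cycle 0-1-2-3, pendant node 4 on node 0, node 5 adjacent to both 1 and 2.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 4), (1, 5), (2, 5)]
nbrs = closed_neighborhoods(edges)
print(dominators(4, nbrs))  # [0]
print(dominators(5, nbrs))  # [1, 2]
print(dominators(0, nbrs))  # []
```

Only the neighbor sets of $v$ and of its neighbors are consulted, which is the two-hop locality mentioned in the introduction.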
1) Homology of a relation:
Definition. A relation on two sets $A$ and $B$ is a function $r : A \times B \to \{0, 1\}$. We say that elements $a_i, a_j \in A$ are related (through element $b$) if there exists an element $b \in B$ such that $r(a_i, b) = 1$ and $r(a_j, b) = 1$. Similarly, $b_i, b_j \in B$ are related if there exists an $a \in A$ such that $r(a, b_i) = 1$ and $r(a, b_j) = 1$. For $A$ and $B$ finite, the relation $r$ can be represented by an $|A| \times |B|$ binary matrix $R = (r_{ij})$, where $r_{ij} = r(a_i, b_j)$.

As an example, the elements of set $A$ could be actors, and the elements of set $B$ could be movies, with $r(a, b) = 1$ whenever actor $a$ appears in movie $b$.

Given a relation, there are two ways to encode its structure as a simplicial complex. In the first, which we will denote $X_R(A, B)$, the elements of $A$ are represented as vertices, and vertices $\{a_{i_0}, a_{i_1}, \dots, a_{i_k}\}$ are spanned by a $k$-simplex whenever there exists a $b \in B$ such that $r(a_{i_l}, b) = 1$ for all $l = 0, 1, \dots, k$. In the second, which we will denote $X_R(B, A)$, the elements of $B$ are represented as vertices, and $\{b_{j_0}, b_{j_1}, \dots, b_{j_k}\}$ are similarly spanned by a $k$-simplex whenever they are all related by the same $a \in A$. Note also that for any simplicial complex $X$ (even if it wasn’t constructed using a relation) one may form its dual complex $\hat{X}$, by letting each maximal simplex in $X$ correspond to a vertex in $\hat{X}$. In that case, a set of vertices in $\hat{X}$ is spanned by a simplex if their associated simplices in $X$ all have a vertex in common.
In the example with actors and movies, this means that we can represent their relationships by building a simplicial complex where actors are vertices, and simplices are formed between actors who are in the same movie; or alternatively, we can encode it by using movies as vertices and spanning a set of movies by a simplex when they all feature the same actor.

Note that these two simplicial complexes may have drastically different structure (different number of vertices, different dimension), but Dowker [21] proved that the two complexes have exactly the same homology (in the sense that the $k$th homology groups of the two complexes are isomorphic, for all $k$).

Theorem II.1 (Dowker). If $R$ is a relation on sets $A$ and $B$, with associated simplicial complexes $X_R(A, B)$ and $X_R(B, A)$, then
$$H_k(X_R(A, B)) \cong H_k(X_R(B, A)) \quad \text{for all } k.$$
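The two complexes in Dowker's theorem can be built directly from the relation. The sketch below represents each complex by its maximal simplices only, and takes the relation as a dictionary from movies to their casts; the data, the function names, and the maximal-simplex representation are hypothetical choices made for illustration.

```python
def maximal_sets(sets):
    """Keep only the non-empty sets that are not proper subsets of another set."""
    unique = {frozenset(s) for s in sets if s}
    return {s for s in unique if not any(s < t for t in unique)}


def dowker_complexes(cast):
    """Given movie -> set of actors, return the maximal simplices of
    X_R(A, B) (actors as vertices) and X_R(B, A) (movies as vertices)."""
    films = {}  # actor -> set of movies the actor appears in
    for movie, actors in cast.items():
        for actor in actors:
            films.setdefault(actor, set()).add(movie)
    actor_complex = maximal_sets(cast.values())
    movie_complex = maximal_sets(films.values())
    return actor_complex, movie_complex


cast = {
    "m1": {"ann", "bob", "cy"},
    "m2": {"cy", "dee"},
    "m3": {"ann", "bob"},
}
actor_complex, movie_complex = dowker_complexes(cast)
print(sorted(sorted(s) for s in actor_complex))
# [['ann', 'bob', 'cy'], ['cy', 'dee']]
print(sorted(sorted(s) for s in movie_complex))
# [['m1', 'm2'], ['m1', 'm3']]
```

In this toy relation the actor complex is a triangle with an edge attached and the movie complex is a two-edge path; they differ in vertex count and dimension, yet both are connected with no holes, in line with the theorem.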
2) Node dominance and equivalent notions:
In light of the dual simplicial complexes presented in Section II-B1, we can now give the more general definition of node dominance.
Definition (Node dominance). Given simplicial complex $X$ and its dual complex $\hat{X}$, each vertex $v \in X$ has an associated simplex $\sigma_v \in \hat{X}$. We say a vertex $v$ is dominated by vertex $w$ if $\sigma_v$ is a face of $\sigma_w$. This occurs exactly when the set of simplices incident to (i.e. containing) $v$ is a subset of the set of simplices incident to $w$ (in $X$).

When the simplicial complex of interest is a flag complex, we know that the presence of a higher dimensional simplex is determined by the presence of its constituent edges. This is why we are able to check the node dominance criterion using only the neighbor sets of our vertices, in the flag complex setting: if the neighbors of $v$ are all neighbors of $w$, then the set of simplices incident to $v$ is a subset of the set of simplices incident to $w$.

To illustrate the concept of node dominance using the example of actors and movies, consider two actors, represented by separate vertices $a_i$ and $a_j$ in $X_R(A, B)$. If the set of movies featuring actor $a_i$ is a (proper) subset of the movies featuring actor $a_j$ (i.e. $a_i$ is dominated by $a_j$), then in the dual complex $X_R(B, A)$, the simplex $\sigma_{a_i}$ will be a (proper) face of simplex $\sigma_{a_j}$. Thus, removing actor $a_i$ (and all its incident simplices) completely will not change the simplicial structure of the dual complex $X_R(B, A)$ at all, and thus will not change the homology of the original complex $X_R(A, B)$.

The insight that removing dominated nodes does not change the homology of the simplicial complex suggests an algorithm, as originally proposed (independently) by [17] and [18], to simplify a simplicial complex by iteratively removing such vertices.
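In the actors-and-movies illustration, the dominance test is a subset comparison between movie lists. A minimal sketch, with hypothetical data and function name; it uses non-strict inclusion, so two actors with identical movie lists dominate each other and some tie-break is needed before removing either one.

```python
def dominated_pairs(films):
    """Given actor -> set of movies, return all pairs (v, w) such that
    the movie set of v is contained in the movie set of w."""
    return sorted(
        (v, w)
        for v in films
        for w in films
        if v != w and films[v] <= films[w]
    )


films = {
    "ann": {"m1", "m3"},
    "bob": {"m1", "m3"},
    "cy": {"m1", "m2"},
    "dee": {"m2"},
}
print(dominated_pairs(films))
# [('ann', 'bob'), ('bob', 'ann'), ('dee', 'cy')]
```

Here `dee` is dominated by `cy` in the proper-subset sense used in the text, while `ann` and `bob` are interchangeable.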
In the work by Barmak and Minian [18], the removal of a dominated node is termed a strong homotopy collapse, as node dominance is a stricter condition than that required for a regular homotopy-preserving simplicial collapse [22].

In Figure 1, vertex $v$ is dominated by vertex $w$, where vertex $w$ could have additional connections in the network which are not shown. The removal of vertex $v$ does not create or destroy any connected components, loops, or voids (it preserves homology), and does not affect shortest path lengths between other nodes (see Section III-A).

Fig. 1. Node $v$, dominated by node $w$. Removal of $v$ only has local effects.

One more definition we will note is that of a 2-hop neighbor set, which is the neighbor set of a node that also contains all “friends of friends”, instead of just immediate neighbors:
$$\mathcal{N}_2[v] = \{u \in V \mid (u, v) \in E, \text{ or } (u, v_i) \in E \text{ for some } v_i \in \mathcal{N}[v]\}.$$
Performing the node dominance collapse using the 2-hop neighbor set can allow greater collapsibility in networks with few dominated nodes. It also allows small holes in the flag complex (i.e. those with hop length $\le 6$) to be “filled in”, so only larger homological features are preserved. We will use this version of the node dominance collapse on one of the data sets in Section IV.
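The effect of switching to 2-hop neighbor sets can be seen on a 5-cycle, where no node is dominated under the ordinary neighbor sets. The sketch below is illustrative only: the function names are hypothetical, and the 2-hop set is computed as the union of the closed neighborhoods of the nodes in $\mathcal{N}[v]$, which for a node with at least one neighbor gives the same set as the definition above.

```python
def closed_neighborhoods(edges):
    """Map each node v to its closed neighbor set N[v]."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, {u}).add(v)
        nbrs.setdefault(v, {v}).add(u)
    return nbrs


def two_hop_neighborhoods(nbrs):
    """Union of N[u] over all u in N[v]: v, its friends, and friends of friends."""
    return {v: set().union(*(nbrs[u] for u in nbrs[v])) for v in nbrs}


def has_dominated_node(nbrs):
    """True if some node's set is contained in another node's set."""
    return any(nbrs[v] <= nbrs[w] for v in nbrs for w in nbrs if v != w)


cycle = [(i, (i + 1) % 5) for i in range(5)]  # a hole of hop length 5
one_hop = closed_neighborhoods(cycle)
two_hop = two_hop_neighborhoods(one_hop)
print(has_dominated_node(one_hop))  # False
print(has_dominated_node(two_hop))  # True
```

With 2-hop sets every node of the 5-cycle sees the whole cycle, so the cycle can collapse; this is the sense in which small holes are "filled in".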
3) Distributed algorithm for flag complexes:
Assuming a flag complex structure, the node dominance collapse can be performed referring only to its 1-skeleton (the original graph under analysis). Moreover, the criterion for determining node dominance requires only local information, making the algorithm distributed in nature. This algorithm was first presented in [19].

Each node $v$ has the list of its neighbor set $\mathcal{N}[v]$, and it then executes the following steps during each iteration:

Distributed algorithm for node dominance collapse
    Broadcast $\mathcal{N}[v]$ to neighbors
    for $v_i \in \mathcal{N}[v]$, $v_i \neq v$
        Receive $\mathcal{N}[v_i]$
        if $\mathcal{N}[v_i] \subseteq \mathcal{N}[v]$
            Broadcast OFF to $v_i$
            if OFF received from $v_i$
                Handshake to determine if $v$ or $v_i$ turns off
            end if
        end if
    end for
    if OFF received OR Handshake determined $v$ turns off
        $v$ designated OFF
    else
        Update $\mathcal{N}[v]$, omitting OFF neighbors

A very similar distributed algorithm is also possible in the non-flag complex setting, where there exists some a priori information about which $k$-tuples of vertices are related. An example of this would be the list of movies and actors, or some other relation (e.g. authors/papers). In that case three actors (vertices) are only spanned by a triangle when there is a single movie they all appeared in together, not merely when they have all appeared in movies together pairwise, as in the flag complex case. To compute node dominance in that setting, we only need to assume that each node has access to its list of maximal simplices (e.g. an actor has its movie list, an author has its paper list, etc.). Then the algorithm above can proceed exactly as written, with $\mathcal{N}[v]$ replaced by the maximal simplex list of $v$.

III. PROPERTIES OF CORE AND PERIPHERY
In this section, we will outline both the analytical and empirically observed properties of the core-periphery decomposition obtained through the iterative node dominance collapse. Examples of the observed properties on real-world data sets are presented in Section IV-A.
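As a point of reference for the properties that follow, the iterative collapse can be sketched in a centralized, sequential form. This is not the distributed algorithm of Section II-B3: nodes are visited in sorted order, and that fixed order stands in for the handshake when two nodes dominate each other. The function name and the example graph are hypothetical.

```python
def core_periphery(edges):
    """Iteratively remove dominated nodes.

    Returns (core, periphery): the set of surviving nodes and the list of
    removed nodes in removal order. Isolated nodes are not represented,
    since the graph is given as an edge list.
    """
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, {u}).add(v)
        nbrs.setdefault(v, {v}).add(u)

    periphery = []
    changed = True
    while changed:
        changed = False
        for v in sorted(nbrs):
            # v is dominated if some neighbor w satisfies N[v] <= N[w].
            if any(nbrs[v] <= nbrs[w] for w in nbrs[v] - {v}):
                for u in nbrs[v] - {v}:
                    nbrs[u].discard(v)  # neighbors forget the removed node
                del nbrs[v]
                periphery.append(v)
                changed = True
    return set(nbrs), periphery


# 4-cycle 0-1-2-3, pendant node 4 on node 0, node 5 adjacent to both 1 and 2.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 4), (1, 5), (2, 5)]
core, periphery = core_periphery(edges)
print(sorted(core))  # [0, 1, 2, 3]
print(periphery)     # [4, 5]
```

In this example the core is the 4-cycle, which carries the only loop of the flag complex, while the two removed nodes form two singleton peripheral components.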
Analytical properties:
1) Shortest paths in the core are shortest paths in the original network. (Network flow)
2) Nodes with betweenness centrality zero are not in the core. (Network flow)
3) A node is more likely to be dominated by a node sharing the community membership(s) of its neighborhood set, compared to a node which does not. (Community structure)
4) The homology of the flag complex of the core is the same as the homology of the flag complex of the entire network. (Global structure)
5) The structure of the core is unique (all possible cores for a given network are isomorphic as simplicial complexes). (Global structure)
Observed properties:
• Core nodes typically have high degree and high betweenness centrality. ‘Hub’ nodes are in the core. (Network flow)
• Nodes with multiple ground-truth community membership labels tend to be in the core, while nodes with just one (or no) community labels are usually in the periphery. (Community structure)
• Using the peripheral groups, we can obtain candidate sets that are seen to contain a large proportion of ground-truth communities. See Section IV-C for details, and our use of these candidate sets for community detection. (Community structure)
• The core is stable with respect to the order of collapses in the iterative algorithm. (Global structure)
Throughout this section, for a graph $G = G(V, E)$, the core $G_C = G(V_C, E_C)$ is the graph induced by the set of nodes $V_C \subseteq V$ which remain upon an iterative and total removal of dominated nodes from $V$. Note that the set $V_C$ (and thus the core itself) is not necessarily unique, because of a potential random ‘handshake’ in the algorithm from Section II-B3. The statements given below are valid for any core obtained by the procedure of iterative node dominance collapse. As we will discuss further in Section III-C below, all possible cores obtained from the same initial graph have the exact same structure (are isomorphic) [23].

A. Network flow
The properties in this subsection involve statements about shortest paths between given nodes in the network. An outline of a proof similar to that of Property III.1 is given in [17], and we include the proof here for completeness.
Definition (Shortest paths). Given a graph $G' = G(V, E)$, for any pair of points $v_i, v_j \in V$, a path $p = (v_i = v_1, v_2, \dots, v_l = v_j)$ is a sequence of vertices such that $(v_k, v_{k+1}) \in E$ for all $k = 1, \dots, l-1$ (there is no loss of generality in using the indices $1, 2, \dots, l$). The path has length $|p| = l$, and $p$ is a shortest path if $l \le |p'|$ for any other path $p'$ from $v_i$ to $v_j$. The set of all shortest paths from $v_i$ to $v_j$ in the graph $G'$ is denoted $SP_{G'}(v_i, v_j)$.

Property III.1 (Shortest paths in the core are shortest paths in the original network). For $v_1, v_2 \in V_C$, if $p \in SP_{G_C}(v_1, v_2)$, then $p \in SP_G(v_1, v_2)$.

Proof: For any graph $G'$, let $v_j$ be dominated by its neighbor $v_i$. Consider any shortest path $p = (\dots, v_k, v_j, v_l, \dots)$ passing through $v_j$. Note that $k, l \neq i$ [proof by contradiction: $p = (\dots, v_i, v_j, v_l, \dots)$ could be replaced by the shorter path $(\dots, v_i, v_l, \dots)$, since $\mathcal{N}[v_j] \subseteq \mathcal{N}[v_i]$, so $v_l \in \mathcal{N}[v_j] \Rightarrow v_l \in \mathcal{N}[v_i]$]. So $p = (\dots, v_k, v_j, v_l, \dots)$ can be replaced by $p' = (\dots, v_k, v_i, v_l, \dots)$, which is the same length as $p$, but doesn’t contain $v_j$. Therefore, the lengths of all shortest paths in $G'$ (where $v_j$ is not the source or destination) are preserved when $v_j$ is removed. Applying this argument to each removal in turn shows that distances between nodes of $V_C$ are the same in $G_C$ as in $G$, so any shortest path in $G_C$ is also a shortest path in $G$.
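Property III.1 can be checked numerically on a small example by comparing breadth-first-search distances before and after dominated nodes are removed. The sketch below is an illustration rather than a proof; the graph, the helper names, and the choice of which nodes to remove (nodes 4 and 5, both dominated in this graph) are assumptions. Distances are counted in edges rather than in vertices as in the definition above, which does not affect the comparison.

```python
from collections import deque


def adjacency(edges, skip=frozenset()):
    """Adjacency sets of the graph induced on the nodes not in `skip`."""
    adj = {}
    for u, v in edges:
        if u in skip or v in skip:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Hop distance from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


# 4-cycle 0-1-2-3, pendant node 4 on node 0, node 5 adjacent to both 1 and 2.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 4), (1, 5), (2, 5)]
full = adjacency(edges)
core = adjacency(edges, skip={4, 5})  # nodes 4 and 5 are dominated

preserved = all(
    bfs_distances(core, s)[t] == bfs_distances(full, s)[t]
    for s in core
    for t in core
)
print(preserved)  # True
```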
Definition (Betweenness centrality). The betweenness centrality of a node $v$ is defined as the proportion of shortest paths between nodes $s$ and $t$ that pass through $v$, summed over all pairs $s, t \neq v$, i.e.
$$\mathrm{bc}(v) = \sum_{s, t \neq v} \frac{|\{p \in SP_G(s, t) \mid v \in p\}|}{|SP_G(s, t)|}.$$

Property III.2 (If the size of the core is greater than 1∗, nodes with betweenness centrality zero are not in the core). $\mathrm{bc}(v) = 0 \Rightarrow v \notin V_C$.

Proof: Using the definition of betweenness centrality above, we can see that
$$\mathrm{bc}(v) = 0 \Rightarrow |\{p \in SP_G(s, t) \mid v \in p\}| = 0 \quad \forall s, t \neq v.$$
Therefore, either
(i) $\deg(v) = 1$, or
(ii) $\forall s, t \in \mathcal{N}[v]$, $(s, t) \in E$ (so that $\dots, s, v, t, \dots$ will not be in any shortest path).
If (i), then $v$ is dominated. If (ii), then $\mathcal{N}[v]$ is a clique, so for any $w \in \mathcal{N}[v]$ with $w \neq v$, $\mathcal{N}[v] \subseteq \mathcal{N}[w]$. This implies $v$ is dominated by all its neighbors. In this case, either $v$ is removed and therefore in the periphery, or all its neighbors are removed and $v$ is the only node in the core. Since we assume that the size of the core is greater than 1, $v \notin V_C$.

Both of these properties speak to the ‘centrality’ of the nodes in the core, with respect to the original network. Property III.1 tells us that there is no way to shortcut through the periphery when traveling between two nodes in the core, and Property III.2 says the nodes that are not involved in any shortest paths are guaranteed to be contained in the periphery. Together, we can conclude that the node dominance collapse only has local effects (with respect to shortest paths in the network), in that only shortest paths beginning or ending at the dominated node are affected.

Empirically, we see that nodes with high betweenness centrality and nodes with high degree will lie in the core (see Section IV-A for concrete examples). These are ‘hub’ nodes, in terms of network flow properties, so removal of nodes in the core has a much greater impact on network information flow than removal of nodes from the periphery.

B. Community structure
The community affiliation graph model (AGM) proposed by Yang and Leskovec [13] assumes that the probability of an edge forming between two nodes depends on the community membership(s) of the nodes under consideration. This is similar to the traditional stochastic blockmodel (which requires communities to form a partition of the network), or generalizations [24] of the stochastic blockmodel that allow for overlapping communities, with the notable exception that under AGM the edge density in the intersections of communities is higher than the edge density in the non-overlapping portions of communities.

For notation, consider the set $C = \{c_k\}_{k=1}^{m}$ defining the $m$ communities in the network, where $c_k$ is the set of nodes belonging to the $k$th community. Note that each node in $V$ may belong to zero, one, or multiple communities. For two nodes $u, v \in V$, let $C_{uv} = \{c \in C \mid u, v \in c\}$ denote the set of communities containing both $u$ and $v$. We will also use the more general notation $C_S = \{c \in C \mid \exists\, v \in S \text{ s.t. } v \in c\}$ to denote the set of community memberships for nodes in a given set $S$. Under AGM, an edge forms between $u$ and $v$, independently, with probability $p_c$ for each of the communities $c \in C_{uv}$. In other words, denoting the probability of an edge between $u$ and $v$ by $p(u, v) = P[(u, v) \in E]$, we have
$$p(u, v) = 1 - \prod_{c \in C_{uv}} (1 - p_c). \qquad (1)$$

∗ In practice, the assumption in Property III.2 that the core contains more than one node is almost always satisfied.
Further, Yang and Leskovec define a baseline edge probability $\varepsilon = p(u, v)$ for $u, v$ with no communities in common. They choose $\varepsilon = \frac{2|E|}{|V|(|V|-1)}$, which is typically a number of orders of magnitude smaller than the $p_c$ probabilities. For the proof of the following result, we assume the AGM model for network community structure; however, the result would still hold for any model that bases the probability of an edge between two nodes on the community membership of the nodes, where the probability of an edge is significantly higher for nodes sharing communities than nodes not sharing communities.

Property III.3 (A node is more likely to be dominated by a node sharing the community membership(s) of its neighborhood set, compared to a node which does not). In other words, $v$ is dominated by $w$ with much higher probability when $C_{\mathcal{N}[v]} \subseteq C_w$ as compared to the case when $C_{\mathcal{N}[v]} \not\subseteq C_w$.

Proof: The probability that $v$ is dominated by $w$ is
$$P[v \text{ dom. by } w] = \prod_{v_i \in \mathcal{N}[v]} p(w, v_i) = \prod_{\substack{v_i \in \mathcal{N}[v] \\ C_{wv_i} \neq \emptyset}} \Bigl(1 - \prod_{c \in C_{wv_i}} (1 - p_c)\Bigr) \prod_{\substack{v_i \in \mathcal{N}[v] \\ C_{wv_i} = \emptyset}} \varepsilon.$$
In other words, $v$ will be dominated by $w$ only if there exist edges between $w$ and all $v_i \in \mathcal{N}[v]$. Each of these edges occurs independently, with probability $p(w, v_i)$, with the value given in Equation (1) if $w$ and $v_i$ share community membership(s) (i.e. if $C_{wv_i} \neq \emptyset$), and $p(w, v_i) = \varepsilon$ otherwise. Since $\varepsilon \ll p_c$ for all $c$,
$$P[(w, v_i) \in E \mid C_{wv_i} \neq \emptyset] \gg P[(w, v_i) \in E \mid C_{wv_i} = \emptyset].$$
Therefore
$$P[v \text{ dom. by } w \mid C_{\mathcal{N}[v]} \subseteq C_w] \gg P[v \text{ dom. by } w \mid C_{\mathcal{N}[v]} \not\subseteq C_w].$$

In real world networks (as described in Section IV-A), nodes in the periphery typically have one (or no) community membership(s), while nodes in the core have multiple community memberships, and lie in the intersections of communities. In Section IV-C, we will take this interpretation further, by proposing a method for using the peripheral components to obtain candidate sets which are likely to contain communities of the network. We can think of the peripheral components as the non-overlapping portions of the communities, in which case the true network communities would consist of a peripheral component, along with adjoining nodes in the core. It is also possible that a single community could have non-overlapping portions which “stick out” from the core in multiple places, on account of which we propose a method of combining peripheral components according to which core nodes they connect to. This yields an algorithm for obtaining “candidate sets” which are intended to contain the true network communities. This method is discussed further in Section IV-C.
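The size of the gap in Property III.3 can be illustrated numerically. The sketch below evaluates Equation (1) and the product in the proof for a toy membership assignment; all node names, community labels, probabilities, and the value of `eps` are hypothetical, and the factor for the candidate dominator itself is skipped (an assumption, since a node needs no edge to itself).

```python
from math import prod


def edge_prob(u, v, members, p, eps):
    """Equation (1) when u and v share communities, else the baseline eps."""
    shared = members[u] & members[v]
    if not shared:
        return eps
    return 1.0 - prod(1.0 - p[c] for c in shared)


def domination_prob(v_nbrs, w, members, p, eps):
    """Probability that w is joined to every node of N[v] (other than w)."""
    return prod(edge_prob(w, x, members, p, eps) for x in v_nbrs if x != w)


members = {
    "v": {"A"}, "x": {"A"}, "y": {"A"},  # v and its neighbors, all in community A
    "w1": {"A"},                         # candidate sharing community A
    "w2": {"B"},                         # candidate sharing no community
}
p = {"A": 0.5, "B": 0.5}
eps = 1e-4
closed_nbrs_of_v = {"v", "x", "y"}

print(domination_prob(closed_nbrs_of_v, "w1", members, p, eps))  # 0.125
print(domination_prob(closed_nbrs_of_v, "w2", members, p, eps) < 1e-11)  # True
```

With these made-up numbers the two probabilities differ by about eleven orders of magnitude, which is the kind of separation the proof relies on.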
C. Global structure
As described in Section II-B, when the flag complex representations of the original network and the core network are used, the core has exactly the same homology as the original complex, in the sense that their homology spaces are isomorphic in all dimensions.

Property III.4 (Homology is preserved in the core). $H_k(X(G_C)) \cong H_k(X(G))$ for all $k$.

Proof: This property follows immediately from Dowker’s Theorem (a simplicial complex and its dual complex have the same homology), combined with the observation that if a vertex is dominated, its corresponding simplex in the dual complex is a face of the simplex corresponding to the dominating node, and thus does not contribute to the structure of the dual complex. An alternative formulation and proof is available in [18].

A corollary of Property III.1 is that at least one shortest cycle for each homology class is retained in the core. Thus, not only is the dimension of each homology space preserved, but the ‘hole locations’ in the network are also preserved. It is this additional property that truly allows us to interpret the core as the global scaffolding for the network.

Properties III.3 and III.4 together tell us that nodes with diverse friend sets (including bridging ties) will be in the core. If they are not, it is only because they are dominated by another node with all the same diverse connections. In real-world networks, we see that the average clustering coefficient for nodes in the core is much lower than in the network as a whole (see Section IV-A), which supports the ‘diverse friend set’ interpretation, because the friends of a core node are usually not friends with each other.
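Property III.4 can be sanity-checked on a toy graph. The sketch below is an illustration under stated assumptions rather than the algorithm of Section II-B3: it uses closed neighborhoods for the dominance test, removes dominated nodes in a fixed order, and compares only the Euler characteristic of the flag complex (the alternating sum of clique counts) before and after the collapse. Equal Euler characteristics are a necessary consequence of isomorphic homology, not a proof of it, and the brute-force clique count is only suitable for very small graphs. All function names are hypothetical.

```python
from itertools import combinations

def build_adj(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def find_dominated(adj):
    """Return some node v with N[v] contained in N[w] for a neighbor w, else None."""
    for v in sorted(adj):
        nv = adj[v] | {v}
        if any(nv <= (adj[w] | {w}) for w in adj[v]):
            return v
    return None

def collapse(adj):
    """Iteratively remove dominated nodes; returns the adjacency of the core."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}
    while (v := find_dominated(adj)) is not None:
        for u in adj.pop(v):
            adj[u].discard(v)
    return adj

def euler_characteristic(adj):
    """Alternating sum of the numbers of k-cliques (simplices of the flag complex)."""
    nodes = sorted(adj)
    chi = 0
    for k in range(1, len(nodes) + 1):
        count = sum(
            1 for s in combinations(nodes, k)
            if all(b in adj[a] for a, b in combinations(s, 2))
        )
        chi += (-1) ** (k - 1) * count
    return chi

# A 5-cycle (one hole), a filled triangle {0, 1, 5}, and a pendant node 6.
graph = build_adj([(0, 1), (1, 2), (2, 3), (3, 4), (4, 0),
                   (5, 0), (5, 1), (6, 5)])
core = collapse(graph)
print(sorted(core), euler_characteristic(graph), euler_characteristic(core))
```

Here the pendant node and the triangle apex collapse away, the 5-cycle that carries the hole survives as the core, and the Euler characteristic is unchanged.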
IV. ANALYSIS OF REAL-WORLD NETWORKS
We use two data sets in this section as a running illustration, both obtained from the Stanford SNAP network database [25]. The first is a coauthorship network built from the DBLP computer science bibliography, and the second is a co-purchasing network from Amazon. The networks were originally analyzed by Yang and Leskovec [11] in one of the first papers to systematically analyze the properties of ground-truth communities (abbreviated in figures as GTCs) in real-world networks. Both networks have ground-truth community labels: 13,477 ground-truth community labels in DBLP, defined as connected components of authors within the same publication venue; and 271,570 ground-truth community labels in Amazon, defined using product categories. Additionally, Yang and Leskovec labeled 5000 of the communities in each data set as “best” in terms of having community-like properties such as low conductance or high triangle-participation ratio.

We computed the core-periphery decomposition for both networks using the iterative node dominance collapse algorithm described in Section II-B3. For the Amazon co-purchasing network, the periphery consisted of 70,716 nodes (accounting for only 21% of the nodes in the network), each of which was a singleton, connected only to the core and not to other peripheral nodes. To allow further collapse, we re-computed the core using the 2-hop neighbor sets $\mathcal{N}[v]$ described in Section II-B2. This yielded 193,195 nodes in the periphery (57.7% of the nodes in the network), with 70,716 peripheral components, of which 20,136 were non-singletons (of varying sizes). All analysis presented below uses the regular node dominance collapse on the DBLP data set, and the node dominance collapse based on 2-hop neighbor sets for the Amazon data set.

Descriptive statistics for the networks, as well as for their associated core-periphery partitions, are presented in Table I.
For the computations of average degree and clustering coefficient, the values were computed with respect to the entire network, and again with respect to the induced subgraph under consideration (either the core or the periphery).

TABLE I
DESCRIPTIVE STATISTICS FOR THE DBLP AND AMAZON DATA SETS, AND THEIR CORE-PERIPHERY DECOMPOSITIONS.

                                          DBLP       Amazon
Nodes in core:                            71,018     141,688
Nodes in periphery:                       246,062    193,195
Nodes (total):
Edges within core:                        318,741    347,527
Edges within periphery:                   274,367    218,237
Edges between core and periphery:         456,758    360,108
Edges (total):
Mean degree:
  Entire network                          6.62       5.53
  Core (w.r.t. entire network)            15.41      7.45
  Core (w.r.t. core)                      8.98       4.91
  Periphery (w.r.t. entire network)       4.09       4.12
  Periphery (w.r.t. periphery)            2.23       2.26
Clustering coefficient:
  Entire network                          0.632      0.397
  Core (w.r.t. entire network)            0.285      0.219
  Core (w.r.t. core)                      0.255      0.182
  Periphery (w.r.t. entire network)       0.733      0.527
  Periphery (w.r.t. periphery)            0.385      0.293
Communities (total):
  Number                                  13,477     271,570
  Average size                            53.41      11.67
  Standard deviation of size              257.58     273.66
Communities (best):
  Number                                  5000       5000
  Average size                            22.45      13.49
  Standard deviation of size              201.08     17.52

To verify the stability of the core under multiple realizations of the node dominance collapse algorithm, we performed the following randomization. For one realization of the iterated node dominance collapse, we compute the set of dominated nodes, pick one at random to collapse, add the newly dominated nodes to the set of dominated nodes, randomly pick the next dominated node to collapse, and so on. After performing 100 realizations of the core-periphery decomposition on the two data sets, we found that 99.58% (DBLP) and 99.43% (Amazon) of the nodes in the core were present in the core on every realization. The set of nodes that appeared in the core on some (but not all) realizations was 0.89% (DBLP) and 1.24% (Amazon) of the size of the core. Thus, not only is the shape of the core unique, but the actual nodes composing it are very stable in these real-world data sets.
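The randomized stability experiment described above can be sketched as follows. This is a simplified illustration, not the implementation used for the reported numbers: it recomputes the full set of dominated nodes after each removal instead of updating it incrementally, it assumes closed neighborhoods for the dominance test, and the function names, toy graph, and seed are hypothetical.

```python
import random

def build_adj(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def is_dominated(adj, v):
    """True if some neighbor w satisfies N[v] subset of N[w] (closed neighborhoods)."""
    nv = adj[v] | {v}
    return any(nv <= (adj[w] | {w}) for w in adj[v])

def random_collapse(adj, rng):
    """One realization: repeatedly remove a uniformly chosen dominated node."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}
    while True:
        dominated = sorted(v for v in adj if is_dominated(adj, v))
        if not dominated:
            return set(adj)          # remaining nodes form the core
        v = rng.choice(dominated)
        for u in adj.pop(v):
            adj[u].discard(v)

def core_stability(adj, runs=100, seed=0):
    """Nodes in the core on every run ('always') and on at least one run ('ever')."""
    rng = random.Random(seed)
    cores = [random_collapse(adj, rng) for _ in range(runs)]
    return set.intersection(*cores), set.union(*cores)

# A 5-cycle in which node 5 is a twin of node 0 (same closed neighborhood).
graph = build_adj([(0, 1), (1, 2), (2, 3), (3, 4), (4, 0),
                   (5, 0), (5, 1), (5, 4)])
always, ever = core_stability(graph)
print(sorted(always), sorted(ever))
```

In this toy graph nodes 1 to 4 are never dominated and so belong to every core, while exactly one of the twins 0 and 5 survives in each realization; the shape of the core (a 5-cycle) is the same either way.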
A. Relationship of core-periphery to network structure
For both data sets, we observe (Table I) that nodes in the core have higher degree than nodes in the periphery, with the difference especially pronounced in the DBLP network. Additionally, nodes in the core have lower clustering coefficient, which corroborates our intuition that core nodes have “diverse friend sets”, so their friends are not all friends with each other. Along with their high degree, this is also interpretable as having reach outside of their local community.

Scatterplots showing the natural logarithm of betweenness centrality versus node degree are shown in Figure 2, with the two plots of the same data alternating whether core or periphery is plotted on top, to help display the region of overlap. As mentioned in Section III-A, all nodes with betweenness centrality of zero (i.e., nodes through which no shortest paths pass) are guaranteed to be in the periphery, and we additionally observe that all of the nodes with highest betweenness centrality are in the core. For example, in Figure 2 it can be seen that in the DBLP data set there is a threshold betweenness centrality value (around $\ln(bc) = 17$) above which all nodes are in the core, while in the Amazon data set it is the nodes with both high degree and high betweenness centrality that appear exclusively in the core.

Figure 3 shows the number of ground-truth community assignments per node in the core and periphery of the DBLP and Amazon networks. Out of all the nodes in the periphery, 22.11% had no ground-truth community (GTC) membership labels, 57.39% had exactly one, and 20.49% had more than one GTC membership label. On the other hand, out of the nodes in the core, 85.02% had multiple GTC membership labels, while 12.65% had a single label, and only 2.33% had no GTC label.
From another perspective, the periphery contained 97.05% of the nodes without a GTC label and 94.02% of the nodes with a single label, but only 45.51% of the nodes with multiple labels (and among the multiply labeled nodes, the average number of labels was 2.9 in the periphery but 7.0 in the core). Similar behavior is observed in the Amazon network, albeit to a lesser extent, likely because the average number of labels per node is much higher.

Fig. 2. Log betweenness centrality vs degree in core and periphery (DBLP-top, Amazon-bottom)

[Fig. 3 panels; membership-count categories 0, 1, 2+]

Fig. 3. Number of community memberships for nodes in core and periphery (DBLP-top, Amazon-bottom)
B. Role of core in network flow
To demonstrate the key role our core nodes play in information flow over the network, we computed their contribution to the shortest paths of the network. For each network, we randomly chose 1000 pairs of nodes and computed shortest paths between them. Since 100% of these paths contain at least one node from the core, we computed the proportion of each path that is in the core. For comparison, we chose three sets of nodes, each with the same number of nodes as the core: nodes chosen uniformly at random; the nodes of highest degree; and the nodes with highest betweenness centrality. Then, using the same 1000 shortest paths, we computed the proportion of nodes from each path belonging to each of these sets. The mean proportion of each path contained in the four sets (Core, Highest BC, Highest Degree, and Random), averaged over all 1000 paths, is shown in Table II.

TABLE II
IMPORTANCE OF CORE NODES, HIGH BETWEENNESS CENTRALITY NODES, HIGH DEGREE NODES, AND RANDOMLY CHOSEN NODES, IN SHORTEST PATHS OF THE DBLP AND AMAZON NETWORKS

Proportion of nodes in shortest paths belonging to important sets
                    DBLP     Amazon
Highest BC          0.785    0.892
Core                0.753    0.841
Highest degree      0.739    0.698
Random              0.222    0.427

Since betweenness centrality measures how many shortest paths pass through a node, the nodes with highest betweenness centrality should be the optimal choice for this measure (if considering all shortest paths in the entire network), so it is not surprising that they have the highest proportion of shortest path nodes. What is somewhat more surprising is that for both data sets the nodes in the core outperform the nodes with highest degree: a greater proportion of nodes in shortest paths belong to the core than belong to the equal-sized set of highest degree nodes. The proportion of nodes in the shortest paths that belong to the Random set gives us a baseline against which to compare the other choices of “important” nodes. Recall also that betweenness centrality is computationally very expensive and requires global information, so it is useful that the distributed core-periphery computation is nearly comparable at obtaining nodes central to network flow.
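The path-proportion measure reported in Table II can be sketched as follows. This is an illustrative reading of the procedure, not the code used in the study: it assumes unweighted shortest paths found by breadth-first search, one shortest path per pair, and that path endpoints are counted when computing the proportion. The function names and the toy graph are hypothetical.

```python
from collections import deque

def build_adj(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def shortest_path(adj, s, t):
    """One BFS shortest path from s to t as a list of nodes, or None if unreachable."""
    parent = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for nb in adj[u]:
            if nb not in parent:
                parent[nb] = u
                queue.append(nb)
    return None

def mean_path_fraction(paths, node_set):
    """Average over paths of the fraction of each path's nodes lying in node_set."""
    return sum(sum(n in node_set for n in p) / len(p) for p in paths) / len(paths)

# Toy graph: two hubs (0 and 1) joined by an edge, each with two leaves.
adj = build_adj([(0, 1), (0, 2), (0, 3), (1, 4), (1, 5)])
pairs = [(2, 4), (2, 3)]                      # in the study, 1000 random pairs
paths = [shortest_path(adj, s, t) for s, t in pairs]
core_fraction = mean_path_fraction(paths, {0, 1})
print(paths, core_fraction)
```

The same `paths` list can be reused with any equal-sized comparison set (highest degree, highest betweenness centrality, or random), which mirrors how the four rows of Table II share the same sampled paths.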
C. Community detection
The findings of this study are consistent with the community affiliation graph model (AGM) of Yang and Leskovec [13], [14], in the sense that they support an overlapping community model for social and information networks in which the probability of an edge between two nodes is related to their common community membership(s), with higher probabilities of edges between nodes that have multiple communities in common. Under this model, we showed that nodes are dominated (with very high probability) only by nodes which share their community memberships. Interpreted with respect to this model, our peripheral components appear to be the ‘non-overlapping’ parts of communities that stick out of the network. Figure 4 shows embeddings of some peripheral components from the DBLP data set as examples, where the peripheral component is drawn in black, while the core nodes and connecting edges are grey. The internal structure and connectivity to the core can vary considerably between peripheral components.

[Fig. 4 panels: Embedding of peripheral group 3484; Embedding of peripheral group 3651; Embedding of peripheral group 7438]

Fig. 4. Example peripheral components.
In light of the interpretation of peripheral components as non-overlapping portions of communities, we propose an algorithm which consists of taking unions of these peripheral components, along with their neighboring nodes in the core, to obtain candidate sets for community detection.

More precisely, let $PC = \{pc_i\}_{i=1}^{|PC|}$ denote the set of peripheral components in the network, where each node in the periphery is in exactly one peripheral component $pc_i$. Then define the extended peripheral components $PC^+ = \{pc_i^+\}_{i=1}^{|PC|}$, where

\[
pc_i^+ = \{ v \in V_C \mid \exists\, v_j \in pc_i \text{ s.t. } (v_j, v) \in E \} \cup pc_i ,
\]

so each extended peripheral component additionally contains all the nodes in the core that share an edge with a vertex of the peripheral component. The extended peripheral components are meant to approximate ground-truth communities in the data set; however, a large number of them are very small (such as those consisting of an isolated peripheral node and its single neighboring core node). We consolidate extended peripheral components into “candidate sets” by taking, for each $v \in V_C$, the union of all extended peripheral components that include $v$. So we obtain $\{cs_v\}_{v \in V_C}$, where

\[
cs_v = \bigcup_{\substack{pc_i^+ \in PC^+ \\ v \in pc_i^+}} pc_i^+ .
\]

For example, if there were many peripheral nodes connected to a single core node (but not connected amongst each other), this group would be consolidated into a single candidate set. We then remove any candidate sets $cs_v$ that are repetitions or subsets of other candidate sets, to obtain our final set of maximal candidate sets, $CS$. Intuitively, our candidate sets are meant to approximate ground-truth communities, or unions of ground-truth communities (that overlap on common core nodes).

To judge the performance of our candidate sets for the purposes of community detection, we also ran the BIGCLAM algorithm [15] on the DBLP data set.
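The candidate-set construction defined earlier in this subsection (peripheral components, extended peripheral components, per-core-node unions, and removal of repeated or non-maximal sets) can be sketched as follows. This is a minimal illustration that assumes the core is already given as a set of nodes; the function names and the toy graph are hypothetical, and core nodes with no peripheral neighbor are simply skipped.

```python
def build_adj(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def peripheral_components(adj, core):
    """Connected components of the subgraph induced on the non-core nodes."""
    seen, comps = set(), []
    for start in adj:
        if start in core or start in seen:
            continue
        comp, stack = set(), [start]
        seen.add(start)
        while stack:
            u = stack.pop()
            comp.add(u)
            for nb in adj[u]:
                if nb not in core and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        comps.append(comp)
    return comps

def candidate_sets(adj, core):
    """Maximal unions cs_v of the extended peripheral components containing v."""
    extended = []
    for comp in peripheral_components(adj, core):
        core_nbrs = {nb for u in comp for nb in adj[u] if nb in core}
        extended.append(frozenset(comp | core_nbrs))       # pc_i^+
    unions = set()
    for v in core:
        members = [e for e in extended if v in e]
        if members:
            unions.add(frozenset().union(*members))         # cs_v
    # Keep only sets that are not proper subsets of another candidate set.
    return [set(s) for s in unions if not any(s < t for t in unions)]

# Toy example with core nodes 0, 1, 7 and peripheral nodes 2..6.
adj = build_adj([(0, 1), (0, 2), (0, 3), (2, 3), (0, 4), (0, 5),
                 (1, 5), (1, 6), (1, 7), (6, 7)])
result = candidate_sets(adj, core={0, 1, 7})
print(sorted(sorted(s) for s in result))
```

In this example the set built around core node 7 is contained in the one built around core node 1, so only two maximal candidate sets remain.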
Popular methods for detecting overlapping communities include clique percolation, link clustering, and fuzzy detection methods using mixed-membership stochastic block models (see [12] for a survey); however, none of these methods scale well to networks with hundreds of thousands or millions of nodes. The recent exception is Yang and Leskovec’s BIGCLAM algorithm, which can estimate the overlapping community structure for large networks. The BIGCLAM algorithm (available in the SNAP C++ package [26]) allows the user to input the expected number of communities, but runs into memory problems if the number of communities is larger than a few hundred. It also has an option for the algorithm to learn the appropriate number of communities, with a default to test between 5 and 100 communities. Therefore, to obtain a set of communities of the same order as the number of ground-truth communities (13,477 for the DBLP data set), we ran BIGCLAM in a nested manner: first obtaining 100 communities, and then further subdividing each of these, where the optimal number of subcommunities was most often also 100. This yielded a total of 9904 detected communities from the BIGCLAM algorithm. We used the same method for analysis of the Amazon data set, yielding 8899 BIGCLAM communities, even though that network has a much larger number of ground-truth communities (271,570). For both data sets, the number of candidate sets obtained using our method was around 40,000 (47,134 for DBLP and 37,449 for Amazon).

To measure the fit of the candidate sets and BIGCLAM communities to the ground-truth communities, we used precision, recall, and average F1-score.
For a detected community $\hat{C}$ and ground-truth community $C$ (the target), the precision is the proportion of detected nodes that belong to the target:

\[
\mathrm{precision}(\hat{C}, C) = \frac{|\hat{C} \cap C|}{|\hat{C}|},
\]

the recall is the proportion of target nodes captured in the detected community:

\[
\mathrm{recall}(\hat{C}, C) = \frac{|\hat{C} \cap C|}{|C|},
\]

and the F1-score is the harmonic mean of precision and recall:

\[
F_1(\hat{C}, C) = \frac{2 \cdot \mathrm{precision}(\hat{C}, C) \cdot \mathrm{recall}(\hat{C}, C)}{\mathrm{precision}(\hat{C}, C) + \mathrm{recall}(\hat{C}, C)}.
\]

These three values for a given ground-truth community are obtained by maximizing each over all candidate sets (BIGCLAM communities), and an average precision, recall, and F1-score for the ground-truth communities is obtained. Similarly, the three values are obtained for each candidate set (BIGCLAM community) by thinking of it as the “target” community, maximizing precision, recall, and F1-score over all ground-truth communities, and then taking the average of these maxima.

Using all three of these values (precision, recall, and F1-score) helps offset some of the discrepancies caused by the varying numbers of ground-truth communities, candidate sets, and BIGCLAM communities. Since we consider not only the matching of ground-truth communities onto detected communities, but also the matching of detected communities onto ground-truth communities, having more candidate sets than BIGCLAM communities will not necessarily be an advantage.

TABLE III
DETECTION OF ALL GROUND-TRUTH COMMUNITIES BY CANDIDATE SETS AND BIGCLAM COMMUNITIES

DBLP (all 13,477 communities)
              Candidate sets                        BIGCLAM
              ground-truth  detected  average       ground-truth  detected  average
Recall
Precision
F1-score

Table III gives the values for recall, precision and F1-score when comparing the ground-truth communities to our candidate sets (left three columns), and to the BIGCLAM communities (right three columns). The performance using candidate sets and BIGCLAM communities is compared for each measure (e.g., “ground-truth community recall” or “average precision”), with the values in boldface indicating the method (candidate sets or BIGCLAM) with superior performance in that measure. The column “ground-truth” gives the average values for the ground-truth communities (when maximized over the detected communities), and the column “detected” gives the average for the detected communities (when maximized over ground-truth communities).

Our candidate sets give better overall community detection performance than the BIGCLAM communities (as measured by the average F1-score). For the DBLP data set, ground-truth communities were contained in candidate sets (as indicated by the higher ground-truth recall scores) to a greater degree than candidate sets found strongly matching ground-truth communities (although it is worth noting, as Yang and Leskovec did, that not all “true” communities necessarily have ground-truth community labels in this data set). The performance on the Amazon data set is quite good, with very high ground-truth recall and detected recall and precision for both the candidate sets and the BIGCLAM methods, although our candidate sets outperformed BIGCLAM in detected recall, as well as in ground-truth, detected and average F1-scores.

The analysis was repeated using only the 5000 “best” ground-truth communities, and again the candidate sets resulted in higher average F1-scores than the BIGCLAM communities.
The main difference was that recall for the ground-truth communities increased (on average, each ground-truth community was 94% contained in some candidate set), while recall and precision for the candidate sets decreased (since there were fewer ground-truth communities to match to, fewer detected communities had a well-matched ground-truth community). It is also worth noting that for the DBLP data set 81.7% of the best ground-truth communities were completely contained in at least one candidate set, while 73.8% of the best ground-truth communities were completely contained in at least one BIGCLAM community. For the Amazon data set, these values were 94.8% for the candidate sets and 82.8% for the BIGCLAM communities.

The challenge of detecting thousands of overlapping communities from a large network is formidable. Currently there are no available methods which achieve excellent performance when comparing detected to ground-truth communities. Based on the analysis of two large, real-world data sets with ground-truth community information, our proposed algorithm of obtaining candidate sets from the peripheral components of the core-periphery decomposition yielded more accurate community detection results than the state-of-the-art BIGCLAM algorithm for overlapping community detection, with much lower complexity and in a distributed manner.
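The two-directional evaluation used in this subsection can be sketched as follows. This is an illustrative sketch, not the evaluation code of the study: the function names and toy communities are hypothetical, the F1-score is taken to be the standard harmonic mean of precision and recall, and the “average” column is assumed here to be the mean of the ground-truth and detected values.

```python
def precision(detected, target):
    """Fraction of detected nodes that belong to the target."""
    return len(detected & target) / len(detected)

def recall(detected, target):
    """Fraction of target nodes captured by the detected set."""
    return len(detected & target) / len(target)

def f1(detected, target):
    """Harmonic mean of precision and recall (0 when the sets are disjoint)."""
    p, r = precision(detected, target), recall(detected, target)
    return 2 * p * r / (p + r) if p + r else 0.0

def best_match_average(targets, others, score):
    """Mean over targets of the best score achieved against any set in others."""
    return sum(max(score(o, t) for o in others) for t in targets) / len(targets)

# Hypothetical toy data.
ground_truth = [{1, 2, 3, 4}, {5, 6}]
detected = [{1, 2, 3}, {4, 5, 6}, {7}]

gt_f1 = best_match_average(ground_truth, detected, f1)       # "ground-truth" column
det_f1 = best_match_average(detected, ground_truth, f1)      # "detected" column
avg_f1 = (gt_f1 + det_f1) / 2                                # assumed "average" column
gt_recall = best_match_average(ground_truth, detected, recall)
print(gt_f1, det_f1, avg_f1, gt_recall)
```

The spurious detected set {7} lowers the detected-direction score but not the ground-truth-direction score, which illustrates why producing more detected sets is not automatically an advantage under this scheme.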
V. CONCLUSION
This study posed the question “How does the concept of node dominance relate to local and global properties of a network?” Previous work determined that iteratively removing dominated nodes is a homology-preserving way to perform a collapse/simplification of a simplicial complex [17], [18]. This was extended into a distributed algorithm for the case of flag complexes [19]. Here, we undertook an investigation of the theoretical and practical properties of performing such a collapse on social and information networks, and discovered that it has implications both for a core-periphery decomposition of the network and for uncovering network community structure. The properties of the core and periphery that we developed in Section III, and observed in Section IV, lead to the interpretation that nodes in the core obtained using node dominance collapse are important with respect to network flow, to the global structure of the network, and to the network community structure.

The core nodes are essential to network flow because of two properties: a shortest path between any two points in the core is contained in the core; and nodes with betweenness centrality zero (through which no shortest paths pass) are never in the core. Observationally, ‘hub’ nodes are contained in the core, and core nodes often have high degree and high betweenness centrality.

The global structure of the network is preserved in the core because the homology of the core is the same as the homology of the entire network, when considering the respective flag complexes. This can be interpreted as node dominance collapses having only ‘local’ effects, and as nodes with diverse neighbor sets (including bridging ties) being members of the core, maintaining a scaffolding for the global structure of the network. The observation that each core node typically has a diverse neighbor set (their friends are not all friends with each other) is also quantified by their relatively low clustering coefficient values.

Finally, the core is related to the community structure of the network because, under community membership models where within-community connections have significantly higher probability than cross-community connections, nodes are dominated (with high probability) by nodes that share their community membership(s). In real-world networks with overlapping ground-truth community labels, this is observed through nodes with multiple community memberships typically residing in the core, and through nodes with single (or no) community labels occupying the periphery.

The result relating the core-periphery decomposition to the community structure of the network gives us an additional application: the use of the peripheral components to generate “candidate sets” which are likely to contain the true network communities. Many state-of-the-art community detection algorithms which allow for overlapping communities are not scalable past network sizes of a few thousand nodes. The notable recent exception is Yang and Leskovec’s BIGCLAM algorithm, which our method is shown to outperform on their DBLP data set.

Implications of this work may be of interest not only to researchers explicitly interested in a core-periphery decomposition of complex networks, but to anyone studying community structure or key nodes for network flow. Hopefully this work will also serve to further popularize the node dominance collapse for use in general contexts where data is represented using a simplicial complex structure.

One limitation of our method is that some networks do not collapse using node dominance. For example, on Facebook there are very few people who have a friend list completely contained in the friend list of another person.
One option for future research in this direction would involve performing the node dominance collapse locally on ego networks, and consolidating the resulting communities. Another potential drawback is the nondeterministic nature of the node dominance collapse algorithm. Perhaps under some circumstances it would be wise to consider the set of nodes that are “ever in the core”, or “always in the core”, under repeated realizations of the algorithm. In practice, however (Section IV-A), we have seen that these two sets are quite similar.

One other area for future research is the study of the core under graph evolution. Using either observed or model-generated dynamic networks, studying how the core varies over time could help evaluate or predict community structure and key players in the network.

REFERENCES

[1] P. Holme, “Core-periphery organization of complex networks,” Physical Review E, vol. 72, no. 4, p. 046111, 2005.
[2] P. Csermely, A. London, L.-Y. Wu, and B. Uzzi, “Structure and dynamics of core/periphery networks,” Journal of Complex Networks, vol. 1, no. 2, pp. 93–123, 2013.
[3] S. P. Borgatti and M. G. Everett, “Models of core/periphery structures,” Social Networks, vol. 21, no. 4, pp. 375–395, 2000.
[4] X. Zhang, T. Martin, and M. Newman, “Identification of core-periphery structure in networks,” arXiv preprint arXiv:1409.4813, 2014.
[5] M. P. Rombach, M. A. Porter, J. H. Fowler, and P. J. Mucha, “Core-periphery structure in networks,” SIAM Journal on Applied Mathematics, vol. 74, no. 1, pp. 167–190, 2014.
[6] F. Della Rossa, F. Dercole, and C. Piccardi, “Profiling core-periphery network structure by random walkers,” Scientific Reports, vol. 3, 2013.
[7] M. E. Newman and M. Girvan, “Finding and evaluating community structure in networks,” Physical Review E, vol. 69, no. 2, p. 026113, 2004.
[8] M. E. Newman, “Fast algorithm for detecting community structure in networks,” Physical Review E, vol. 69, no. 6, p. 066133, 2004.
[9] P. K. Chan, M. D. Schlag, and J. Y. Zien, “Spectral k-way ratio-cut partitioning and clustering,” Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 13, no. 9, pp. 1088–1096, 1994.
[10] S. Fortunato, “Community detection in graphs,” Physics Reports, vol. 486, no. 3, pp. 75–174, 2010.
[11] J. Yang and J. Leskovec, “Defining and evaluating network communities based on ground-truth,” in Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics. ACM, 2012, p. 3.
[12] J. Xie, S. Kelley, and B. K. Szymanski, “Overlapping community detection in networks: The state-of-the-art and comparative study,” ACM Computing Surveys (CSUR), vol. 45, no. 4, p. 43, 2013.
[13] J. Yang and J. Leskovec, “Community-affiliation graph model for overlapping network community detection,” in Data Mining (ICDM), 2012 IEEE 12th International Conference on. IEEE, 2012, pp. 1170–1175.
[14] J. Yang and J. Leskovec, “Overlapping communities explain core–periphery organization of networks,” 2014.
[15] J. Yang and J. Leskovec, “Overlapping community detection at scale: a nonnegative matrix factorization approach,” in Proceedings of the Sixth ACM International Conference on Web Search and Data Mining. ACM, 2013, pp. 587–596.
[16] J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney, “Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters,” Internet Mathematics, vol. 6, no. 1, pp. 29–123, 2009.
[17] A. C. Wilkerson, T. J. Moore, A. Swami, and H. Krim, “Simplifying the homology of networks via strong collapses,” in Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 2013, pp. 5258–5262.
[18] J. A. Barmak and E. G. Minian, “Strong homotopy types, nerves and collapses,” Discrete & Computational Geometry, vol. 47, no. 2, pp. 301–328, 2012.
[19] A. C. Wilkerson, H. Chintakunta, H. Krim, T. J. Moore, and A. Swami, “A distributed collapse of a network’s dimensionality,” in Proceedings of the IEEE Global Conference on Signal and Information Processing (GlobalSIP). IEEE, 2013, pp. 595–598.
[20] A. Hatcher, Algebraic Topology. Cambridge University Press, 2002.
[21] C. Dowker, “Homology groups of relations,” Annals of Mathematics, pp. 84–95, 1952.
[22] J. H. C. Whitehead, “Simplicial spaces, nuclei and m-groups,” Proceedings of the London Mathematical Society, vol. 2, no. 1, pp. 243–327, 1939.
[23] J. Matoušek, “LC reductions yield isomorphic simplicial complexes,” Contributions to Discrete Mathematics, vol. 3, no. 2, 2008.
[24] E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing, “Mixed membership stochastic blockmodels,” in Advances in Neural Information Processing Systems, 2009, pp. 33–40.
[25] J. Leskovec and A. Krevl, “SNAP Datasets: Stanford large network dataset collection,” http://snap.stanford.edu/data, Jun. 2014.
[26] J. Leskovec and R. Sosič, “SNAP: A general purpose network analysis and graph mining library in C++,” http://snap.stanford.edu/snap, Jun. 2014.