Community Detection with and without Prior Information
Armen E. Allahverdyan, Greg Ver Steeg, and Aram Galstyan
Yerevan Physics Institute, Alikhanian Brothers Street 2, Yerevan 375036, Armenia
Information Sciences Institute, University of Southern California, Marina del Rey, CA 90292, USA
We study the problem of graph partitioning, or clustering, in sparse networks with prior information about the clusters. Specifically, we assume that for a fraction ρ of the nodes the true cluster assignments are known in advance. This can be understood as a semi-supervised version of clustering, in contrast to unsupervised clustering, where the only available information is the graph structure. In the unsupervised case, it is known that there is a threshold of the inter-cluster connectivity beyond which clusters cannot be detected. Here we study the impact of the prior information on the detection threshold, and show that even minute [but generic] values of ρ > 0 shift the detection threshold to its lowest possible value.
Graph partitioning is an important problem with a wide range of applications in circuit design, data mining, social sciences, etc. [1]. In the context of social network analysis, a relaxed version of this problem is known as community detection, where a community is loosely defined as a group of nodes such that the link density within the group is higher than across different groups. Many real-world networks have well-manifested community structure [1], which explains the significant attention this problem has received recently. Indeed, much recent research has focused on developing community detection methods using various approaches. A recent review of existing approaches can be found in [2].

Generally, most algorithms are able to detect communities accurately if the number of inter-community edges is not very large. The detection becomes less accurate as one increases the density of links across the communities. In fact, most community detection algorithms seem to have an intrinsic threshold in the inter-community coupling beyond which detection accuracy is very poor [3]. Recently this problem has been studied theoretically by formulating community detection as the minimization of a certain Potts-Ising Hamiltonian [4]. It was shown that the graph partitioning problem is indeed characterized by a phase transition from detectable to undetectable regimes as one increases the coupling strength between the clusters [4]. Specifically, for sufficiently large inter-cluster coupling, the ground-state configuration of the Hamiltonian has random overlap with the underlying community structure.

Most work on community detection so far has considered the unsupervised version of clustering, where the only available information is the graph structure. In many situations, however, one might have additional information about possible cluster assignments of certain nodes.
Generally speaking, such information can be in the form of pairwise constraints (via must-links and cannot-links) or, alternatively, via known cluster assignments for a fraction of the nodes. Here we consider the latter scenario, which has attracted recent interest in the context of semi-supervised learning and classification [5-7]. For instance, classification of text documents can be posed as a graph clustering problem, with links based on proximity under some similarity score. In this case, we could ask how picking some small random fraction of documents to be classified by humans will affect our clustering algorithm. Semi-supervised learning falls in between unsupervised methods (i.e., clustering) and totally supervised methods (i.e., classification). The main premise of semi-supervised learning is to use prior information about a fraction of the data points in order to facilitate the classification of the other nodes. Here we are specifically interested in graph-based semi-supervised learning. In this approach, one first maps the data to a (weighted) graph using pairwise similarities between different data points, and then partitions the nodes of this graph, e.g., using spectral clustering. We note that while most clustering methods have been developed for unweighted (homogeneous) networks, generalizations to the weighted situation have been suggested as well [2, 8].

Despite the extensive amount of work on semi-supervised learning in recent years, there is a lack of adequate theoretical development. The purpose of this Letter is to present a theoretical analysis of the semi-supervised version of community detection, and to uncover new scenarios of community detection facilitated by semi-supervision.
Here we focus on the so-called planted bisection graph model [4, 9], where the clusters are introduced by hand (implanted), and one checks whether the clustering method under consideration will recognize them. This model is (supposedly) the simplest laboratory for studying the foundations of clustering methods.

Our main contributions can be summarized as follows. For unweighted graphs, we show analytically that any small (but finite) amount of prior information destroys the critical nature of cluster detectability, by shifting the detection threshold to its lowest possible value. Furthermore, for graphs where links within and across communities have different weights, we find that semi-supervision leads to detectable clusters even below the intuitive weight-balanced value. Note that for weighted graphs the very definition of the communities is rather ambiguous. Our results suggest that the availability of prior information might resolve this ambiguity.
Model: Consider an Erdős–Rényi graph where each pair of nodes is linked with probability α/N, and where N is the number of nodes in the graph. We assume that each link carries a weight J > 0. Now imagine a pair of such identical Erdős–Rényi graphs, which models two clusters (communities). Besides the intra-cluster J-links, each node in one graph is linked with probability γ/N to any node of the other Erdős–Rényi graph. These inter-cluster links are given weight K > 0. For clarity, both J and K are assumed to be integer numbers. This planted bisection graph model [9] will be employed for studying the performance of the cluster detection method, which places an Ising spin s_i = ±1 on each node and lets these spins interact via the network links [4, 10]:

H = −∑_{i<j}^{N} J_ij s_i s_j,   (1)

where J_ij = J for intra-cluster links, J_ij = K for inter-cluster links, and J_ij = 0 for unlinked pairs.
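The model above can be put into a short simulation. The sketch below (plain Python; all parameters are illustrative, not the paper's) samples the planted bisection graph with unit weights J = K = 1, freezes a fraction ρ of spins at their planted values, and minimizes H by simulated annealing with magnetization-conserving pair swaps, which we assume here in order to preserve the balanced-bisection character of the partition; it returns the overlap with the planted partition.

```python
import math
import random

def anneal_overlap(N=150, alpha=8.0, gamma=1.0, rho=0.1,
                   sweeps=300, T0=1.5, seed=1):
    """Sample a planted bisection graph (unit weights J = K = 1), freeze a
    fraction rho of spins at their planted values, and minimize
    H = -sum_(ij) s_i s_j by annealing with magnetization-conserving
    pair swaps.  Returns the overlap with the planted partition."""
    rng = random.Random(seed)
    truth = [1] * N + [-1] * N                 # planted clusters A and B
    nbrs = [set() for _ in range(2 * N)]
    for i in range(2 * N):
        for j in range(i + 1, 2 * N):
            p = alpha / N if truth[i] == truth[j] else gamma / N
            if rng.random() < p:
                nbrs[i].add(j)
                nbrs[j].add(i)
    spins = truth[:]
    order = list(range(2 * N))
    rng.shuffle(order)
    frozen = set(order[: int(rho * 2 * N)])    # semi-supervised nodes
    free = [i for i in order if i not in frozen]
    for k, i in enumerate(free):               # random balanced start
        spins[i] = 1 if k < len(free) // 2 else -1
    steps = sweeps * 2 * N
    for t in range(steps):
        T = max(T0 * (1.0 - t / steps), 1e-3)  # linear cooling schedule
        i, j = rng.sample(free, 2)
        if spins[i] == spins[j]:
            continue
        hi = sum(spins[k] for k in nbrs[i])
        hj = sum(spins[k] for k in nbrs[j])
        # energy change when flipping both i and j (their mutual bond is unchanged)
        dE = 2 * (spins[i] * hi + spins[j] * hj) \
             - 4 * (j in nbrs[i]) * spins[i] * spins[j]
        if dE <= 0 or rng.random() < math.exp(-dE / T):
            spins[i], spins[j] = -spins[i], -spins[j]
    return sum(s * t_ for s, t_ in zip(spins, truth)) / (2 * N)
```

For α well above γ the returned overlap is close to 1; as α − γ shrinks it decays, which is the kind of behavior the numerical experiments discussed below probe.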
The statistical-mechanical analysis of (1) leads to closed equations (12)-(16) for the order parameters of the model: the magnetization m, which measures the overlap of the ground state with the planted partition, and the overlap parameter q. In the unsupervised case (ρ = 0), clusters are detectable (m > 0) only when α − γ exceeds a threshold; e.g., for α = 1, even a very small inter-cluster coupling γ nullifies m. Fig. 1 shows that at the detection threshold α > γ; moreover, the difference α − γ at the threshold grows as √(π(α + γ)) for large α + γ; see (17). Thus, the ratio (α − γ)/(α + γ) converges to zero for large α + γ. In this weak sense the detection threshold converges to α = γ for large α + γ, while for any finite α the unsupervised clustering detection threshold lies below the line α = γ; see Fig. 1.

Under semi-supervision we still employ (12, 13) and obtain (14, 15, 16), but now on the RHS of these equations one should substitute m → ρ + (1 − ρ)m and q → ρ + (1 − ρ)q. Expanding over a small m we get

1 − q = e^{−(α+γ)(ρ+[1−ρ]q)} I_0[(α + γ)(ρ + [1 − ρ]q)],   (19)

m = ρ(α − γ)(1 − q) [ I_1[(α + γ)(ρ + [1 − ρ]q)] / I_0[(α + γ)(ρ + [1 − ρ]q)] ],

where I_0 and I_1 are modified Bessel functions. Now m > 0 whenever α − γ > 0.
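Equation (19) can be solved by straightforward fixed-point iteration. A minimal sketch in plain Python follows; the Bessel subscripts I_0 and I_1 are an assumption here (they are not legible in the source), and the modified Bessel function is evaluated from its integral representation I_n(z) = (1/π)∫_0^π e^{z cos t} cos(nt) dt.

```python
import math

def bessel_i(n, z, steps=2000):
    """Modified Bessel I_n(z) = (1/pi) * int_0^pi e^{z cos t} cos(n t) dt,
    evaluated with the trapezoid rule."""
    h = math.pi / steps
    total = 0.0
    for k in range(steps + 1):
        t = k * h
        w = 0.5 if k in (0, steps) else 1.0   # trapezoid-rule weights
        total += w * math.exp(z * math.cos(t)) * math.cos(n * t)
    return total * h / math.pi

def solve_19(alpha, gamma, rho, iters=300):
    """Iterate 1 - q = e^{-z} I_0(z) with z = (alpha+gamma)(rho + (1-rho) q),
    then evaluate the small-m magnetization of (19)."""
    q = 0.5
    for _ in range(iters):
        z = (alpha + gamma) * (rho + (1.0 - rho) * q)
        q = 1.0 - math.exp(-z) * bessel_i(0, z)
    m = rho * (alpha - gamma) * (1.0 - q) * bessel_i(1, z) / bessel_i(0, z)
    return q, m
```

For any ρ > 0 the magnetization comes out positive whenever α > γ, illustrating how semi-supervision removes the unsupervised threshold.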
This is the average-connectivity threshold, which for the considered unweighted scenario is the only possible definition of clustering. Thus, any generic semi-supervision leads to the theoretically best possible threshold α = γ; see Fig. 2.

FIG. 2: Magnetization m versus α for γ = 1 (curves): for ρ = 0, m undergoes a second-order phase transition at α ≈ 3; curves are also shown for ρ = 0.005 and ρ = 0.05. Symbols: mean magnetization (with 99% confidence intervals) from numerical experiments using simulated annealing on graphs with 10,000 nodes, for ρ = 0 (circles), 0.05 (squares), and 0.005 (diamonds).

TABLE I: Weighted situation: 2J = K = 2. For γ = 1 and various degrees of semi-supervision ρ we list the clustering threshold α and the values of q and m₁ at this threshold, together with m at α = 2.

Note that for ρ >
0, (19) has a non-trivial solution q < 1 for any α + γ > 0: the percolation bound is also diminished by a small [but generic] ρ.

Arbitrary degree distributions can also be considered in this framework. First, note that the excess degree distribution for n intra-cluster and m inter-cluster edges in (6) can also be written in terms of an overall excess degree distribution and a parameter p_out, which determines the probability that a given edge connects to a node outside the cluster of the current node:

(α^n e^{−α}/n!) (γ^m e^{−γ}/m!) = ∑_{s=0}^{∞} q(s) ∑_{k=0}^{s} (s choose k) p_out^k (1 − p_out)^{s−k} δ_{n,s−k} δ_{m,k}.   (20)

In this case, q(s) = e^{−(γ+α)}(γ+α)^s/s! and p_out = γ/(α + γ). Now we want to consider q(s) to be an arbitrary excess degree distribution, while p_out measures the connection between clusters; p_out = 0 indicates disjoint clusters, while p_out = 1/2 indicates no cluster structure. We employ a second trick by rewriting q(s) in terms of a generating function G(x) = ∑_s x^s q(s), using the inverse formula q(s) = lim_{x→0} ∂_x^s G(x)/s!.

This transformation leads to expressions similar to (14). For ρ = 0,

c_n = lim_{x→0} [(q + m̃)/(q − m̃)]^{n/2} I_n[√(q² − m̃²) ∂_x] e^{(1−q)∂_x} G(x),   (21)

where m̃ = (1 − p_out) m. If we use the generating function for the Poisson distribution, G(x) = e^{−(α+γ)(1−x)}, and consider a power-series representation of this expression, we see that we should just replace ∂_x → −(α + γ), and we recover (14) exactly.

Rewriting the modified Bessel function in integral form and using the identity lim_{x→0} e^{y∂_x} G(x) = G(y) provides the following succinct expressions:

1 − q = (1/π) ∫_0^π dθ G(√(q² − m̃²) cos θ + 1 − q),   (22)

m = (m̃/(qπ)) ∫_0^π dθ G(√(q² − m̃²) cos θ + 1 − q) √(1 − (m̃/q)² cos²θ).   (23)
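The decomposition (20) is Poisson thinning: drawing the total excess degree s from a Poisson law with mean α + γ and declaring each edge inter-cluster independently with probability p_out = γ/(α + γ) reproduces the two independent Poisson counts with means α and γ. A minimal numerical check in plain Python (the double sum over s and k in (20) collapses onto s = n + m, k = m):

```python
import math

def pois(lam, k):
    """Poisson probability P(k) = lam^k e^{-lam} / k!."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

def binom(s, k, p):
    """Binomial probability of k successes out of s trials at rate p."""
    return math.comb(s, k) * p ** k * (1.0 - p) ** (s - k)

alpha, gamma = 3.0, 1.0
p_out = gamma / (alpha + gamma)
for n in range(8):                  # intra-cluster edges
    for m in range(8):              # inter-cluster edges
        lhs = pois(alpha, n) * pois(gamma, m)          # product form
        rhs = pois(alpha + gamma, n + m) * binom(n + m, m, p_out)
        assert abs(lhs - rhs) < 1e-12
```

The check passes identically for any α, γ > 0, since both sides reduce to α^n γ^m e^{−(α+γ)}/(n! m!).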
We also recover the result that these equations apply in the semi-supervised case with the substitutions m → ρ + (1 − ρ)m and q → ρ + (1 − ρ)q on the RHS. For a power-law degree distribution, we get results qualitatively the same as depicted in Fig. 2, replacing the α on the x axis with p_out. Specifically, for power-law networks, both the analytic results above and numerical experiments using simulated annealing confirm the existence of a detection threshold in the unsupervised case, while the semi-supervised case leads to nonzero magnetization whenever p_out < 1/2.

The weighted situation J ≠ K will be studied via two particular (but important) cases, 2J = K = 2 and 2K = J = 2, to make them amenable to an analytic approach. Putting these values into (6) and using (13) two times, we see that there are now four order parameters:

m, q, q₁ ≡ c₁⁺ + c₁⁻, m₁ ≡ c₁⁺ − c₁⁻.   (24)

Note that only q and m are observed from the single-spin statistics, see (10, 11); q₁ and m₁ can be observed only via measuring the internal field distribution P̃(h). For 2J = K = 2 we introduce the following notations:

C_p = a ∑_{n=−∞}^{∞} I_n(ũ) I_{p−n}(x̃) cosh[2ỹn − ṽn − ỹp],
S_p = a ∑_{n=−∞}^{∞} I_n(ũ) I_{p−n}(x̃) sinh[2ỹn − ṽn − ỹp],
a = 2 e^{−(α+γ)q}, z_± = q ± m, z_±¹ = q₁ ± m₁, ξ = 2ρ/(1 − ρ),
x̃ = (1 − ρ) √[(α z_− + γ z_+¹)(α[z_+ + ξ] + γ z_−¹)],   (25)
ỹ = (1/2) ln{(α z_− + γ z_+¹)/(α[z_+ + ξ] + γ z_−¹)}, ṽ = (1/2) ln{(z_+ − z_+¹ + ξ)/(z_− − z_−¹)},   (26)
ũ = γ(1 − ρ) √[(z_− − z_−¹)(z_+ − z_+¹ + ξ)],   (27)

and write down the order-parameter equations:

q = ∑_{p=1}^{∞} C_p, q₁ = C₁,   (28)
m = ∑_{p=1}^{∞} S_p, m₁ = S₁.   (29)

Eqs. (28, 29) apply also for 2K = J = 2, but now in (25-27) we should interchange α and γ, and then substitute ỹ → −ỹ and ṽ → −ṽ.

Discussion:
Eqs. (28, 29) predict a second-order transition over m and m₁. Similarly to the unsupervised case, the threshold of this transition is found via expanding (28, 29) over m and m₁. But the real qualitative differences between the weighted and unweighted situations show up under semi-supervision, which we consider in more detail below.

First we focus on 2J = K = 2 and recall that the clustering threshold is defined via m = 0. While in the previous, unweighted situation any amount of semi-supervision (as quantified by ρ) sufficed for shifting the clustering threshold to a ρ-independent value, here the detection threshold starts to depend on ρ, and the smallest threshold is achieved for ρ → 0; see Table I for an illustration. To understand this seemingly counter-intuitive observation, note that the detection threshold is achieved as a balance between the inter-cluster links, with average connectivity γ and weight K = 2, and the intra-cluster links, with average connectivity α and weight J = 1. Consider now the impact of semi-supervision on a test spin. The inter-cluster links will exert negative fields on this test spin, while the intra-cluster links will exert positive fields. Since inter-cluster links have twice the weight, increasing ρ favors the negative fields. This explains why the vanishing semi-supervision ρ → 0 yields the smallest threshold. Note also that m and m₁ do not simultaneously turn to zero at the threshold: we get m₁ > 0 at m = 0; see Table I. Since m₁ cannot be observed via a single spin, this memory is hidden. The reason for m₁ > 0 is

TABLE II: Weighted situation: 2K = J = 2. For γ = 1 and various degrees of semi-supervision ρ we list the clustering threshold α and the values of q and m₁ at this threshold.

that m₁ counts the internal fields equal to ±
1, and there are more such fields coming from the intra-cluster [connectivity α, weight 1] links that exert positive fields due to the semi-supervised (frozen) spins.

Now consider perhaps the most paradoxical aspect of the semi-supervised detection threshold: it is smaller than the value deduced from balancing the cumulative weights of intra-cluster and inter-cluster links, which yields αJ = γK. Indeed, according to Table I (where γ = 1), we have a threshold α < 2 (for ρ → 0) versus the weight-balancing value α = 2. This result seemingly contradicts the intuition we have built so far: i) a rough intuition about Hamiltonian (1) is that it defines a cluster via the intra-cluster weight being larger than the inter-cluster weight; ii) the unsupervised threshold is well above the weight-balancing prediction, see Table I; iii)
in the unweighted case (J = K) semi-supervision just reduces the detection threshold towards α = γ, which coincides with the weight-balancing value. To understand this effect, we turn to the physical picture of the threshold, where positively and negatively acting links driven by the semi-supervised (frozen) spins compensate each other. At the weight balance αJ = γK (with J < K), fewer (but stronger) inter-cluster links have the same cumulative weight as more numerous (but weaker) intra-cluster links. Since the intra-cluster links are more numerous, their overall effect on a (randomly chosen) test spin is more deterministic and hence capable of building up a positive m at αJ = γK. Thus, the actual threshold is reached for αJ < γK.

We thus conclude that for a weighted graph with K > J, a small [but generic] amount of semi-supervision can be employed for defining the very clustering structure. This definition is non-trivial, since it performs better than the weight-balancing definition. Indeed, for a weighted network the definition of the detection threshold is not clear a priori, in contrast to unweighted networks, where the only possible definition goes via the connectivity balance α = γ. To illustrate this unclarity, consider a node connected to one cluster via a few heavy links, and to another cluster via many light links. To which cluster should this node belong in principle? Our answer is that the proper cluster assignment in this case can be defined via semi-supervision.

It is interesting to calculate m at the weight-balancing value αJ = γK, since this is the semi-supervision benefit for those who would insist on the weight-balancing definition of the threshold; see Table I. Note finally that for large values of γ both the unsupervised and semi-supervised thresholds converge to αJ = γK, since then fluctuations are irrelevant from the outset. All these effects turn upside-down for 2K = J = 2; see Table II.
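The fluctuation argument above can be made concrete with a toy calculation (ours, not the paper's): model the field exerted on a test spin by the frozen neighbors as h = J·X − K·Y, with independent Poisson counts X (intra-cluster frozen neighbors, mean taken here as ρα) and Y (inter-cluster, mean ργ). At the weight balance αJ = γK the mean of h vanishes, yet positive fields remain more probable than negative ones, because the more numerous weak links fluctuate less:

```python
import math

def pois(lam, k):
    """Poisson probability P(k) = lam^k e^{-lam} / k!."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

def field_bias(alpha, gamma, J, K, rho, kmax=40):
    """P(h > 0) - P(h < 0) for h = J*X - K*Y, where X ~ Poisson(rho*alpha)
    counts frozen intra-cluster neighbours (positive fields) and
    Y ~ Poisson(rho*gamma) counts frozen inter-cluster ones (negative)."""
    bias = 0.0
    for x in range(kmax):
        for y in range(kmax):
            h = J * x - K * y
            if h != 0:
                # add the joint probability with the sign of the field
                bias += math.copysign(
                    pois(rho * alpha, x) * pois(rho * gamma, y), h)
    return bias

# weight-balanced point alpha*J = gamma*K with J < K: the mean field is
# zero, but the bias toward positive fields is strictly positive
print(field_bias(alpha=2.0, gamma=1.0, J=1, K=2, rho=0.3))
```

The positive bias at αJ = γK is consistent with the actual semi-supervised threshold lying at αJ < γK; in the symmetric unweighted case (J = K, α = γ) the bias vanishes.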
Now the threshold is minimized for the maximal semi-supervision ρ → 1. Moreover, m₁ is negative at the threshold, and thus the memory about the clustering is contained in m − m₁ > 0. The threshold value of α is always larger than the weight-balancing value γK/J. These results are explained by "inverting" the above arguments developed for J < K.

In conclusion, we analyzed community detection in semi-supervised settings, where one has prior information about the community assignments of certain nodes. We showed that for the planted bisection graph model with intra-cluster and inter-cluster average connectivities α and γ, respectively, even a tiny (but finite) amount of semi-supervision shifts the detection threshold to its intuitive value α = γ. We observed a similar effect of a lowered detection threshold for weighted graphs. In contrast to the unweighted case, the shift in this case depends on the degree of supervision. Furthermore, we found that when approaching the unsupervised limit ρ → 0⁺, the detection threshold converges to a value lower (better) than the one obtained via balancing the intra-cluster and inter-cluster weights. We suggest that this can serve as an alternative definition of clusters. We also saw that in the semi-supervised case some (hidden) memory of the clustering survives at the detection threshold.

Although this work focused on the analytically simpler case of Erdős–Rényi graphs, we have repeated the analytic and numerical analysis for power-law graphs and found similar results, suggesting that the impact of the network topology is quantitative rather than qualitative. A similar picture has been observed in [4]. An interesting generalization is to consider choosing frozen spins more deliberately (e.g., based on node connectivity) as opposed to the random selection studied here. While this difference might not be important for ER graphs, it might lead to significant quantitative changes for power-law graphs with a large number of well-connected hubs.
Finally, it would be interesting to consider prior information not only about nodes, but also about links in the network. An example is a constraint that two nodes belong to the same community. In our model, this can be incorporated by making the coupling between those pairs sufficiently strong. This scenario resonates well with a recent observation that the links in a network are usually community-specific, while the nodes might participate in different communities [16].

This research was partially supported by the U.S. ARO MURI grant W911NF-06-1-0094. A.E.A. acknowledges support by the Volkswagenstiftung.

[1] M. E. J. Newman, PNAS 103, 8577 (2006).
[2] S. Fortunato, Physics Reports 486, 75-174 (2010).
[3] L. Danon et al., J. Stat. Mech.: Theory Exp. P09008 (2005).
[4] J. Reichardt and M. Leone, Phys. Rev. Lett. 101, 078701 (2008).
[5] X. Zhu and A. B. Goldberg, Introduction to Semi-Supervised Learning, Synthesis Lectures on Artificial Intelligence and Machine Learning (Morgan & Claypool, 2009).
[6] G. Getz, N. Shental, and E. Domany,
Semi-supervised learning: a statistical physics approach, in Proc. 22nd ICML Workshop on Learning with Partially Classified Training Data (Bonn, Germany, 2005).
[7] M. Leone et al., Eur. Phys. J. B, 125 (2008).
[8] M. E. J. Newman, Phys. Rev. E 70, 056131 (2004).
[9] A. Condon and R. M. Karp, Algorithms for graph partitioning on the planted partition model, Random Structures and Algorithms 18, 116-140 (2001).
[10] J. Reichardt and S. Bornholdt, Phys. Rev. E 74, 016110 (2006).
[11] M. Mezard and G. Parisi, Europhys. Lett. 3, 1067 (1987).
[12] I. Kanter and H. Sompolinsky, Phys. Rev. Lett. 58, 164 (1987).
[13] Y. Y. Goldschmidt, Phys. Rev. B 43, 8148 (1991).
[14] M. Mezard and G. Parisi, Eur. Phys. J. B 20, 217 (2001).
[15] L. Viana and A. J. Bray, J. Phys. C 18, 3037 (1985).
[16] Y.-Y. Ahn, J. P. Bagrow, and S. Lehmann, Nature 466, 761 (2010).