Community Detection with and without Prior Information
Armen E. Allahverdyan, Greg Ver Steeg, and Aram Galstyan
Yerevan Physics Institute, Alikhanian Brothers Street 2, Yerevan 375036, Armenia
Information Sciences Institute, University of Southern California, Marina del Rey, CA 90292, USA
We study the problem of graph partitioning, or clustering, in sparse networks with prior information about the clusters. Specifically, we assume that for a fraction ρ of the nodes the true cluster assignments are known in advance. This can be understood as a semi-supervised version of clustering, in contrast to unsupervised clustering, where the only available information is the graph structure. In the unsupervised case, it is known that there is a threshold of the inter-cluster connectivity beyond which clusters cannot be detected. Here we study the impact of the prior information on the detection threshold, and show that even minute [but generic] values of ρ > 0 shift the detection threshold to its lowest possible value.
Graph partitioning is an important problem with a wide range of applications in circuit design, data mining, social sciences, etc. [1]. In the context of social network analysis, a relaxed version of this problem is known as community detection, where a community is loosely defined as a group of nodes such that the link density within the group is higher than across different groups. Many real-world networks have well-manifested community structure [1], which explains the significant attention this problem has received recently. Indeed, much recent research has focused on developing community detection methods using various approaches. A recent review of existing approaches can be found in [2].

Generally, most algorithms are able to detect communities accurately if the number of inter-community edges is not very large. The detection becomes less accurate as one increases the density of links across the communities. In fact, most community detection algorithms seem to have an intrinsic threshold in the inter-community coupling beyond which detection accuracy is very poor [3]. Recently this problem has been studied theoretically by formulating community detection as the minimization of a certain Potts-Ising Hamiltonian [4]. It was shown that the graph partitioning problem is indeed characterized by a phase transition from detectable to undetectable regimes as one increases the coupling strength between the clusters [4]. Specifically, for sufficiently large inter-cluster coupling, the ground-state configuration of the Hamiltonian has random overlap with the underlying community structure.

Most work on community detection so far has considered the unsupervised version of clustering, where the only available information is the graph structure. In many situations, however, one might have additional information about possible cluster assignments of certain nodes.
Generally speaking, such information can be in the form of pairwise constraints (via must-links and cannot-links) or, alternatively, via known cluster assignments for a fraction of the nodes. Here we consider the latter scenario, which has attracted recent interest in the context of semi-supervised learning and classification [5-7]. For instance, classification of text documents can be posed as a graph clustering problem, with links based on proximity under some similarity score. In this case, we could ask how picking some small random fraction of documents to be classified by humans will affect our clustering algorithm. Semi-supervised learning falls in between unsupervised methods (i.e., clustering) and totally supervised methods (i.e., classification). The main premise of semi-supervised learning is to use prior information about a fraction of the data points in order to facilitate the classification of the other nodes. Here we are specifically interested in graph-based semi-supervised learning. In this approach, one first maps the data to a (weighted) graph using pairwise similarities between different data points, and then partitions the nodes of this graph, e.g., using spectral clustering. We note that while most clustering methods have been developed for unweighted (homogeneous) networks, generalizations to the weighted situation have been suggested as well [2, 8].

Despite the extensive amount of work on semi-supervised learning in recent years, there is a lack of adequate theoretical development. The purpose of this Letter is to present a theoretical analysis of the semi-supervised version of community detection, and to uncover new scenarios of community detection facilitated by semi-supervision.
Here we focus on the so-called planted bisection graph model [4, 9], where the clusters are introduced by hand (implanted), and one checks whether the clustering method under consideration will recognize them. This model is (supposedly) the simplest laboratory for studying the foundations of clustering methods.

Our main contributions can be summarized as follows. For unweighted graphs, we show analytically that any small (but finite) amount of prior information destroys the critical nature of cluster detectability, by shifting the detection threshold to its lowest possible value. Furthermore, for graphs where links within and across communities have different weights, we find that semi-supervision leads to detectable clusters even below the intuitive weight-balanced value. Note that for weighted graphs the very definition of the communities is rather ambiguous. Our results suggest that the availability of prior information might resolve this ambiguity.
Model: Consider an Erdős–Rényi graph where each pair of nodes is linked with probability α/N, and where N is the number of nodes in the graph. We assume that each link carries a weight J > 0. Now imagine a pair of such identical Erdős–Rényi graphs, which models two clusters (communities). Besides the intra-cluster J-links, each node in one graph is linked with probability γ/N to any node of the other Erdős–Rényi graph. These inter-cluster links are given weight K > 0. For clarity, both J and K are assumed to be integer numbers. This planted bisection graph model [9] will be employed for studying the performance of the cluster detection method, which places an Ising spin s_i = ±1 on each node and lets these spins interact via the network links [4, 10]:

H = −∑_{i<j}^{N} J_ij s_i s_j,   (1)

where J_ij = J for intra-cluster links, J_ij = K for inter-cluster links, and J_ij = 0 for unlinked pairs.
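The model above can be put into a short simulation. The sketch below (plain Python; all parameters are illustrative, not the paper's) samples the planted bisection graph with unit weights J = K = 1, freezes a fraction ρ of spins at their planted values, and minimizes H by simulated annealing with magnetization-conserving pair swaps, which we assume here in order to preserve the balanced-bisection character of the partition; it returns the overlap with the planted partition.

```python
import math
import random

def anneal_overlap(N=150, alpha=8.0, gamma=1.0, rho=0.1,
                   sweeps=300, T0=1.5, seed=1):
    """Sample a planted bisection graph (unit weights J = K = 1), freeze a
    fraction rho of spins at their planted values, and minimize
    H = -sum_(ij) s_i s_j by annealing with magnetization-conserving
    pair swaps.  Returns the overlap with the planted partition."""
    rng = random.Random(seed)
    truth = [1] * N + [-1] * N                 # planted clusters A and B
    nbrs = [set() for _ in range(2 * N)]
    for i in range(2 * N):
        for j in range(i + 1, 2 * N):
            p = alpha / N if truth[i] == truth[j] else gamma / N
            if rng.random() < p:
                nbrs[i].add(j)
                nbrs[j].add(i)
    spins = truth[:]
    order = list(range(2 * N))
    rng.shuffle(order)
    frozen = set(order[: int(rho * 2 * N)])    # semi-supervised nodes
    free = [i for i in order if i not in frozen]
    for k, i in enumerate(free):               # random balanced start
        spins[i] = 1 if k < len(free) // 2 else -1
    steps = sweeps * 2 * N
    for t in range(steps):
        T = max(T0 * (1.0 - t / steps), 1e-3)  # linear cooling schedule
        i, j = rng.sample(free, 2)
        if spins[i] == spins[j]:
            continue
        hi = sum(spins[k] for k in nbrs[i])
        hj = sum(spins[k] for k in nbrs[j])
        # energy change when flipping both i and j (their mutual bond is unchanged)
        dE = 2 * (spins[i] * hi + spins[j] * hj) \
             - 4 * (j in nbrs[i]) * spins[i] * spins[j]
        if dE <= 0 or rng.random() < math.exp(-dE / T):
            spins[i], spins[j] = -spins[i], -spins[j]
    return sum(s * t_ for s, t_ in zip(spins, truth)) / (2 * N)
```

For α well above γ the returned overlap is close to 1; as α − γ shrinks it decays, which is the kind of behavior the numerical experiments discussed below probe.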
The statistical-mechanical analysis of (1) leads to closed equations (12)-(16) for the order parameters of the model: the magnetization m, which measures the overlap of the ground state with the planted partition, and the overlap parameter q. In the unsupervised case (ρ = 0), clusters are detectable (m > 0) only when α − γ exceeds a threshold; e.g., for α = 1, even a very small inter-cluster coupling γ nullifies m. Fig. 1 shows that at the detection threshold α > γ; moreover, the difference α − γ at the threshold grows as √(π(α + γ)) for large α + γ; see (17). Thus, the ratio (α − γ)/(α + γ) converges to zero for large α + γ. In this weak sense the detection threshold converges to α = γ for large α + γ, while for any finite α the unsupervised clustering detection threshold lies below the line α = γ; see Fig. 1.

Under semi-supervision we still employ (12, 13) and obtain (14, 15, 16), but now on the RHS of these equations one should substitute m → ρ + (1 − ρ)m and q → ρ + (1 − ρ)q. Expanding over a small m we get

1 − q = e^{−(α+γ)(ρ+[1−ρ]q)} I_0[(α + γ)(ρ + [1 − ρ]q)],   (19)

m = ρ(α − γ)(1 − q) [ I_1[(α + γ)(ρ + [1 − ρ]q)] / I_0[(α + γ)(ρ + [1 − ρ]q)] ],

where I_0 and I_1 are modified Bessel functions. Now m > 0 whenever α − γ > 0.
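Equation (19) can be solved by straightforward fixed-point iteration. A minimal sketch in plain Python follows; the Bessel subscripts I_0 and I_1 are an assumption here (they are not legible in the source), and the modified Bessel function is evaluated from its integral representation I_n(z) = (1/π)∫_0^π e^{z cos t} cos(nt) dt.

```python
import math

def bessel_i(n, z, steps=2000):
    """Modified Bessel I_n(z) = (1/pi) * int_0^pi e^{z cos t} cos(n t) dt,
    evaluated with the trapezoid rule."""
    h = math.pi / steps
    total = 0.0
    for k in range(steps + 1):
        t = k * h
        w = 0.5 if k in (0, steps) else 1.0   # trapezoid-rule weights
        total += w * math.exp(z * math.cos(t)) * math.cos(n * t)
    return total * h / math.pi

def solve_19(alpha, gamma, rho, iters=300):
    """Iterate 1 - q = e^{-z} I_0(z) with z = (alpha+gamma)(rho + (1-rho) q),
    then evaluate the small-m magnetization of (19)."""
    q = 0.5
    for _ in range(iters):
        z = (alpha + gamma) * (rho + (1.0 - rho) * q)
        q = 1.0 - math.exp(-z) * bessel_i(0, z)
    m = rho * (alpha - gamma) * (1.0 - q) * bessel_i(1, z) / bessel_i(0, z)
    return q, m
```

For any ρ > 0 the magnetization comes out positive whenever α > γ, illustrating how semi-supervision removes the unsupervised threshold.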
This is the average-connectivity threshold, which for the considered unweighted scenario is the only possible definition of clustering. Thus, any generic semi-supervision leads to the theoretically best possible threshold α = γ; see Fig. 2.

FIG. 2: Magnetization m versus α for γ = 1 (curves): for ρ = 0, m undergoes a second-order phase transition at α ≈ 3; curves are also shown for ρ = 0.005 and ρ = 0.05. Symbols: mean magnetization (with 99% confidence intervals) from numerical experiments using simulated annealing on graphs with 10,000 nodes, for ρ = 0 (circles), 0.05 (squares), and 0.005 (diamonds).

TABLE I: Weighted situation: 2J = K = 2. For γ = 1 and various degrees of semi-supervision ρ we list the clustering threshold α and the values of q and m₁ at this threshold, together with m at α = 2.

Note that for ρ >
0, (19) has a non-trivial solution q < 1 for any α + γ > 0: the percolation bound is also diminished by a small [but generic] ρ.

Arbitrary degree distributions can also be considered in this framework. First, note that the excess degree distribution for n intra-cluster and m inter-cluster edges in (6) can also be written in terms of an overall excess degree distribution and a parameter p_out, which determines the probability that a given edge connects to a node outside the cluster of the current node:

(α^n e^{−α}/n!) (γ^m e^{−γ}/m!) = ∑_{s=0}^{∞} q(s) ∑_{k=0}^{s} (s choose k) p_out^k (1 − p_out)^{s−k} δ_{n,s−k} δ_{m,k}.   (20)

In this case, q(s) = e^{−(γ+α)}(γ+α)^s/s! and p_out = γ/(α + γ). Now we want to consider q(s) to be an arbitrary excess degree distribution, while p_out measures the connection between clusters; p_out = 0 indicates disjoint clusters, while p_out = 1/2 indicates no cluster structure. We employ a second trick by rewriting q(s) in terms of a generating function G(x) = ∑_s x^s q(s), using the inverse formula q(s) = lim_{x→0} ∂_x^s G(x)/s!.

This transformation leads to expressions similar to (14). For ρ = 0,

c_n = lim_{x→0} [(q + m̃)/(q − m̃)]^{n/2} I_n[√(q² − m̃²) ∂_x] e^{(1−q)∂_x} G(x),   (21)

where m̃ = (1 − p_out) m. If we use the generating function for the Poisson distribution, G(x) = e^{−(α+γ)(1−x)}, and consider a power-series representation of this expression, we see that we should just replace ∂_x → −(α + γ), and we recover (14) exactly.

Rewriting the modified Bessel function in integral form and using the identity lim_{x→0} e^{y∂_x} G(x) = G(y) provides the following succinct expressions:

1 − q = (1/π) ∫_0^π dθ G(√(q² − m̃²) cos θ + 1 − q),   (22)

m = (m̃/(qπ)) ∫_0^π dθ G(√(q² − m̃²) cos θ + 1 − q) √(1 − (m̃/q)² cos²θ).   (23)
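The decomposition (20) is Poisson thinning: drawing the total excess degree s from a Poisson law with mean α + γ and declaring each edge inter-cluster independently with probability p_out = γ/(α + γ) reproduces the two independent Poisson counts with means α and γ. A minimal numerical check in plain Python (the double sum over s and k in (20) collapses onto s = n + m, k = m):

```python
import math

def pois(lam, k):
    """Poisson probability P(k) = lam^k e^{-lam} / k!."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

def binom(s, k, p):
    """Binomial probability of k successes out of s trials at rate p."""
    return math.comb(s, k) * p ** k * (1.0 - p) ** (s - k)

alpha, gamma = 3.0, 1.0
p_out = gamma / (alpha + gamma)
for n in range(8):                  # intra-cluster edges
    for m in range(8):              # inter-cluster edges
        lhs = pois(alpha, n) * pois(gamma, m)          # product form
        rhs = pois(alpha + gamma, n + m) * binom(n + m, m, p_out)
        assert abs(lhs - rhs) < 1e-12
```

The check passes identically for any α, γ > 0, since both sides reduce to α^n γ^m e^{−(α+γ)}/(n! m!).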
We also recover the result that these equations apply in the semi-supervised case with the substitutions m → ρ + (1 − ρ)m and q → ρ + (1 − ρ)q on the RHS. For a power-law degree distribution, we get results qualitatively the same as depicted in Fig. 2, replacing the α on the x axis with p_out. Specifically, for power-law networks, both the analytic results above and numerical experiments using simulated annealing confirm the existence of a detection threshold in the unsupervised case, while the semi-supervised case leads to nonzero magnetization whenever p_out < 1/2.

The weighted situation J ≠ K will be studied via two particular (but important) cases, 2J = K = 2 and 2K = J = 2, to make them amenable to an analytic approach. Putting these values into (6) and using (13) two times, we see that there are now four order parameters:

m, q, q₁ ≡ c₁⁺ + c₁⁻, m₁ ≡ c₁⁺ − c₁⁻.   (24)

Note that only q and m are observed from the single-spin statistics, see (10, 11); q₁ and m₁ can be observed only via measuring the internal field distribution P̃(h). For 2J = K = 2 we introduce the following notations:

C_p = a ∑_{n=−∞}^{∞} I_n(ũ) I_{p−n}(x̃) cosh[2ỹn − ṽn − ỹp],
S_p = a ∑_{n=−∞}^{∞} I_n(ũ) I_{p−n}(x̃) sinh[2ỹn − ṽn − ỹp],
a = 2 e^{−(α+γ)q}, z_± = q ± m, z_±¹ = q₁ ± m₁, ξ = 2ρ/(1 − ρ),
x̃ = (1 − ρ) √[(α z_− + γ z_+¹)(α[z_+ + ξ] + γ z_−¹)],   (25)
ỹ = (1/2) ln{(α z_− + γ z_+¹)/(α[z_+ + ξ] + γ z_−¹)}, ṽ = (1/2) ln{(z_+ − z_+¹ + ξ)/(z_− − z_−¹)},   (26)
ũ = γ(1 − ρ) √[(z_− − z_−¹)(z_+ − z_+¹ + ξ)],   (27)

and write down the order-parameter equations:

q = ∑_{p=1}^{∞} C_p, q₁ = C₁,   (28)
m = ∑_{p=1}^{∞} S_p, m₁ = S₁.   (29)

Eqs. (28, 29) apply also for 2K = J = 2, but now in (25-27) we should interchange α and γ, and then substitute ỹ → −ỹ and ṽ → −ṽ.

Discussion:
Eqs. (28, 29) predict a second-order transition over m and m₁. Similarly to the unsupervised case, the threshold of this transition is found via expanding (28, 29) over m and m₁. But the real qualitative differences between the weighted and unweighted situations show up under semi-supervision, which we consider in more detail below.

First we focus on 2J = K = 2 and recall that the clustering threshold is defined via m = 0. While in the previous, unweighted situation any amount of semi-supervision (as quantified by ρ) sufficed for shifting the clustering threshold to a ρ-independent value, here the detection threshold starts to depend on ρ, and the smallest threshold is achieved for ρ → 0; see Table I for an illustration. To understand this seemingly counter-intuitive observation, note that the detection threshold is achieved as a balance between the inter-cluster links, with average connectivity γ and weight K = 2, and the intra-cluster links, with average connectivity α and weight J = 1. Consider now the impact of semi-supervision on a test spin. The inter-cluster links will exert negative fields on this test spin, while the intra-cluster links will exert positive fields. Since inter-cluster links have twice the weight, increasing ρ favors the negative fields. This explains why the vanishing semi-supervision ρ → 0 yields the smallest threshold. Note also that m and m₁ do not simultaneously turn to zero at the threshold: we get m₁ > 0 at m = 0; see Table I. Since m₁ cannot be observed via a single spin, this memory is hidden. The reason for m₁ > 0 is

TABLE II: Weighted situation: 2K = J = 2. For γ = 1 and various degrees of semi-supervision ρ we list the clustering threshold α and the values of q and m₁ at this threshold.

that m₁ counts the internal fields equal to ±
1, and there are more such fields coming from the intra-cluster [connectivity α, weight 1] links that exert positive fields due to the semi-supervised (frozen) spins.

Now consider perhaps the most paradoxical aspect of the semi-supervised detection threshold: it is smaller than the value deduced from balancing the cumulative weights of intra-cluster and inter-cluster links, which yields αJ = γK. Indeed, according to Table I (where γ = 1), we have a threshold α < 2 (for ρ → 0) versus the weight-balancing value α = 2. This result seemingly contradicts the intuition we have built so far: i) a rough intuition about Hamiltonian (1) is that it defines a cluster via the intra-cluster weight being larger than the inter-cluster weight; ii) the unsupervised threshold is well above the weight-balancing prediction, see Table I; iii)
in the unweighted case (J = K) semi-supervision just reduces the detection threshold towards α = γ, which coincides with the weight-balancing value. To understand this effect, we turn to the physical picture of the threshold, where positively and negatively acting links driven by the semi-supervised (frozen) spins compensate each other. At the weight balance αJ = γK (with J < K), fewer (but stronger) inter-cluster links have the same cumulative weight as more numerous (but weaker) intra-cluster links. Since the intra-cluster links are more numerous, their overall effect on a (randomly chosen) test spin is more deterministic and hence capable of building up a positive m at αJ = γK. Thus, the actual threshold is reached for αJ < γK.

We thus conclude that for a weighted graph with K > J, a small [but generic] amount of semi-supervision can be employed for defining the very clustering structure. This definition is non-trivial, since it performs better than the weight-balancing definition. Indeed, for a weighted network the definition of the detection threshold is not clear a priori, in contrast to unweighted networks, where the only possible definition goes via the connectivity balance α = γ. To illustrate this unclarity, consider a node connected to one cluster via a few heavy links, and to another cluster via many light links. To which cluster should this node belong in principle? Our answer is that the proper cluster assignment in this case can be defined via semi-supervision.

It is interesting to calculate m at the weight-balancing value αJ = γK, since this is the semi-supervision benefit for those who would insist on the weight-balancing definition of the threshold; see Table I. Note finally that for large values of γ both the unsupervised and semi-supervised thresholds converge to αJ = γK, since then fluctuations are irrelevant from the outset. All these effects turn upside-down for 2K = J = 2; see Table II.
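The fluctuation argument above can be made concrete with a toy calculation (ours, not the paper's): model the field exerted on a test spin by the frozen neighbors as h = J·X − K·Y, with independent Poisson counts X (intra-cluster frozen neighbors, mean taken here as ρα) and Y (inter-cluster, mean ργ). At the weight balance αJ = γK the mean of h vanishes, yet positive fields remain more probable than negative ones, because the more numerous weak links fluctuate less:

```python
import math

def pois(lam, k):
    """Poisson probability P(k) = lam^k e^{-lam} / k!."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

def field_bias(alpha, gamma, J, K, rho, kmax=40):
    """P(h > 0) - P(h < 0) for h = J*X - K*Y, where X ~ Poisson(rho*alpha)
    counts frozen intra-cluster neighbours (positive fields) and
    Y ~ Poisson(rho*gamma) counts frozen inter-cluster ones (negative)."""
    bias = 0.0
    for x in range(kmax):
        for y in range(kmax):
            h = J * x - K * y
            if h != 0:
                # add the joint probability with the sign of the field
                bias += math.copysign(
                    pois(rho * alpha, x) * pois(rho * gamma, y), h)
    return bias

# weight-balanced point alpha*J = gamma*K with J < K: the mean field is
# zero, but the bias toward positive fields is strictly positive
print(field_bias(alpha=2.0, gamma=1.0, J=1, K=2, rho=0.3))
```

The positive bias at αJ = γK is consistent with the actual semi-supervised threshold lying at αJ < γK; in the symmetric unweighted case (J = K, α = γ) the bias vanishes.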
Now the threshold is minimized for the maximal semi-supervision ρ → 1. Moreover, m₁ is negative at the threshold, and thus the memory about the clustering is contained in m − m₁ > 0. The threshold value of α is always larger than the weight-balancing value γK/J. These results are explained by "inverting" the above arguments developed for J < K.

In conclusion, we analyzed community detection in semi-supervised settings, where one has prior information about the community assignments of certain nodes. We showed that for the planted bisection graph model with intra-cluster and inter-cluster average connectivities α and γ, respectively, even a tiny (but finite) amount of semi-supervision shifts the detection threshold to its intuitive value α = γ. We observed a similar effect of a lowered detection threshold for weighted graphs. In contrast to the unweighted case, the shift in this case depends on the degree of supervision. Furthermore, we found that when approaching the unsupervised limit ρ → 0⁺, the detection threshold converges to a value lower (better) than the one obtained via balancing the intra-cluster and inter-cluster weights. We suggest that this can serve as an alternative definition of clusters. We also saw that in the semi-supervised case some (hidden) memory of the clustering survives at the detection threshold.

Although this work focused on the analytically simpler case of Erdős–Rényi graphs, we have repeated the analytic and numerical analysis for power-law graphs and found similar results, suggesting that the impact of the network topology is quantitative rather than qualitative. A similar picture has been observed in [4]. An interesting generalization is to consider choosing frozen spins more deliberately (e.g., based on node connectivity) as opposed to the random selection studied here. While this difference might not be important for ER graphs, it might lead to significant quantitative changes for power-law graphs with a large number of well-connected hubs.
Finally, it would be interesting to consider prior information not only about nodes, but also about links in the network. An example is a constraint that two nodes belong to the same community. In our model, this can be incorporated by making the coupling between those pairs sufficiently strong. This scenario resonates well with a recent observation that the links in a network are usually community-specific, while the nodes might participate in different communities [16].

This research was partially supported by the U.S. ARO MURI grant W911NF-06-1-0094. A.E.A. acknowledges support by the Volkswagenstiftung.

[1] M. E. J. Newman, PNAS 103, 8577 (2006).
[2] S. Fortunato, Physics Reports 486, 75-174 (2010).
[3] L. Danon et al., J. Stat. Mech.: Theory Exp. P09008 (2005).
[4] J. Reichardt and M. Leone, Phys. Rev. Lett. 101, 078701 (2008).
[5] X. Zhu and A. B. Goldberg, Introduction to Semi-Supervised Learning, Synthesis Lectures on Artificial Intelligence and Machine Learning (Morgan & Claypool, 2009).
[6] G. Getz, N. Shental, and E. Domany,
Semi-supervised learning: a statistical physics approach, in Proc. 22nd ICML Workshop on Learning with Partially Classified Training Data (Bonn, Germany, 2005).
[7] M. Leone et al., Eur. Phys. J. B, 125 (2008).
[8] M. E. J. Newman, Phys. Rev. E 70, 056131 (2004).
[9] A. Condon and R. M. Karp, Algorithms for graph partitioning on the planted partition model, Random Structures and Algorithms 18, 116-140 (2001).
[10] J. Reichardt and S. Bornholdt, Phys. Rev. E 74, 016110 (2006).
[11] M. Mezard and G. Parisi, Europhys. Lett. 3, 1067 (1987).
[12] I. Kanter and H. Sompolinsky, Phys. Rev. Lett. 58, 164 (1987).
[13] Y. Y. Goldschmidt, Phys. Rev. B 43, 8148 (1991).
[14] M. Mezard and G. Parisi, Eur. Phys. J. B 20, 217 (2001).
[15] L. Viana and A. J. Bray, J. Phys. C 18, 3037 (1985).
[16] Y.-Y. Ahn, J. P. Bagrow, and S. Lehmann, Nature 466, 761 (2010).