Modeling and Detecting Network Communities with the Fusion of Node Attributes
Block Modeling and Detectability for Community Structure in Node Attributed Networks
Ren Ren, Jinliang Shao
Abstract: Node attributes are ubiquitous in real-world networks, and their fusion with graph topology raises new challenges for the detection and understanding of community structure. Owing to their principled characterization and interpretability, probabilistic generative models (PGMs) have become the mainstream methods for community detection in attributed networks. Most existing PGMs require the attributes to be categorical or Gaussian distributed. A novel PGM is proposed here to overcome these limitations. For the generality of our model, we consider the impact of the distances between attributes on node popularity, whose description raises the model selection problem in statistical inference. We present a novel scheme to address this issue by analyzing the detectability of communities for our model, which also gives a quantitative description of the effect of node attributes with respect to communities. With the model determined, an efficient algorithm is then developed to estimate the parameters and infer the communities. Extensive experiments have been conducted to validate our work from two aspects. First, the experiments on artificial networks verify the detectability condition for our model. Second, the comparison on various real-world datasets shows that our algorithm outperforms competitive methods.
Index Terms: Community detection, Attributed network, Stochastic block model, Model selection, Detectability
1 INTRODUCTION
Ren Ren, Jinliang Shao are with the School of Automation Engineering, University of Electronic Science and Technology of China, Sichuan, 611731, P. R. China.

From user-defined circles in online social media to protein-protein interactions [1], many real-world complex systems naturally form multiple groups due to the close relationship between the individuals [2]. Abstracting the system as a network with nodes and edges, community detection aims to divide the nodes into structural groups or modules such that the nodes have more connections with others in the same group than with the rest of the network [3]. Many metrics have been designed to intuitively describe the connectivity pattern of communities, e.g., modularity [3] and conductance [4], which are the foundations of a number of fitness optimization approaches [5]. A family of more rigorous methods is based on probabilistic models such as the Stochastic Block Model (SBM) [6], which offer a principled characterization of the emergence [7], the internal organization [8], [9], and the computational hardness [10], [11] of communities.

Despite the wide application of community detection, recent studies [12], [13] find that structural communities (i.e., communities involving graph structure only) cannot match the ground truth groups in real-life networks completely. On the other hand, there are often rich node attributes available in real-life networks, which can reflect the similarity between nodes [13], [14]. These findings stimulate the fusion of both network topology and node attributes, raising the problem of community detection in attributed networks [15], [16].

Because of their incisive modeling and interpretation of structural communities [7], [8], [9], [10], [11], the probabilistic generative model (PGM)-based methods have become the mainstream since attribute-aware community detection prevailed [17], especially for the study of the relationship between network structure and attributes [13], [15], [18], which may provide insights into the formation mechanism of the organization of real-world systems. Mostly based on the SBM, which generates network connections according to the latent block (i.e., group) structure and the group membership of nodes [6], two schemes are usually adopted in the PGMs considering node attributes. One scheme models the generative process of both edges and node features [14], [19], [20], [21], which requires the distribution of attributes to be specified. It is usually strongly assumed in the model that discrete, categorical attributes follow a multinomial or Poisson distribution [14], [19], [20] while continuous ones obey a multivariate Gaussian [21]. The other scheme focuses only on the generation of edges; the fusion is manifested by the dependence of group structure or membership on node attributes, while the attributes are seen as given parameters of the model [15], [18]. This strategy circumvents the generative modeling of attributes, but characterizing the influence of node features on community structure is still a challenge: only discrete or one-dimensional real-valued node attributes are considered in [15], [18].

One significant advantage of the PGM is that it allows principled analysis of the attributes' influence on the detectability of communities, that is, the condition under which communities can be detected. The pioneering work [22] showed in general that a fraction of nodes with known memberships can improve the detectability, using the algorithm for structural communities in [10].
The detectability analysis based on a specific model was empirically performed in [15], which also validates the effectiveness of the data fusion method proposed therein.

In this paper, motivated by the limitations of existing PGMs on node attributes, we propose a new PGM for community detection with both discrete and continuous node attributes and develop efficient algorithms to solve it. For the generality of our model, as in [15], [18], we focus on the generation of edges, which is affected by the distances between node attributes and does not rely on any specific distribution of attributes. Using this strategy, our main challenge is to characterize the impact of different node attributes on communities. Describing the impact by a real-valued function of the distances between node attributes, this issue is formulated as the model selection problem in statistical inference [15].

However, without prior knowledge or assumptions on the distribution of the various node attributes, the widely used Bayesian and information-theoretical criteria [8], [9], [23] for model selection are difficult to apply. To tackle this challenge, we investigate under what condition the communities can be efficiently detected based on the proposed model. The detectability condition provides a quantitative description of the effect of node attributes, which naturally leads to a novel model selection scheme.

The main contributions of this paper are threefold: 1) We propose a novel node attributed community detection model that is applicable to both discrete and continuous attributes. 2) We analyze the detectability of communities based on the proposed model, which quantitatively clarifies the impact of node attributes on community detection. 3) We present a novel model selection scheme and develop efficient algorithms to solve the model for attributed community detection.
Finally, numerical experiments on artificial networks are conducted to validate the detectability analysis, and the comparisons on extensive real-world datasets show that our algorithm outperforms other state-of-the-art approaches.

2 THE PROPOSED MODEL
Given an undirected binary network $G = (V, E)$, where $V$ is the node set and $E \subseteq V \times V$ is the edge set, it can be generated by the following family of probabilistic models: each edge $(i, j) \in E$ of $G$ is independently generated via a Bernoulli distribution parameterized by a probability $p_{ij}$ [6], resulting in the likelihood $P(G \mid \vartheta) = \prod_{i < j} \ldots$

In the Bayesian view, one may choose a maximum entropy prior $\pi(\omega) = \bar{\omega}^{-1} e^{-\omega/\bar{\omega}}$ for $\omega_{rs}$, where $\bar{\omega}$ denotes the average of $\omega$, and then the maximum a posteriori (MAP) estimation gives $\omega_{rs} = m_{rs} / (\Xi_{rs} + \bar{\omega}^{-1})$ [26]. Note that the average linking probability is $\langle p \rangle = 2m/n^2$; in DCSBM, $\bar{\omega} = 2m/(c^2 n^2) = O(n^{-1})$. Similarly, as long as the range of $f$ is $O(1)$, $\bar{\omega}$ is then $O(n^{-1})$ and $\Xi$ is $O(n^2/q^2)$ in CRSBM. Therefore the MAP estimation of $\omega$ is equivalent to (8) when $n \gg q^2$.

3 ALGORITHM AND DETECTABILITY

With the new model (6) for node attributed networks presented, in this section we first develop an efficient algorithm to infer the community memberships based on Belief Propagation (BP), a classical method for the estimation of marginal distributions in probabilistic graphical models. After this, we further analyze the detectability condition of communities based on the proposed algorithm, which can also be seen as an algorithmic performance analysis.

According to Bayes' rule, the posterior distribution of $z$ follows $P(z \mid G, \vartheta) = P(G, z \mid \vartheta) / \sum_z P(G, z \mid \vartheta)$, where $P(G, z \mid \vartheta)$ is displayed in (5), and the probability of each node $i$ belonging to any community $r$ is $P(z_i = r \mid G, \vartheta) = \sum_{z: z_i = r} P(z \mid G, \vartheta)$. To infer this marginal distribution, for each ordered pair $(i, j) \in V \times V$, $i \neq j$, BP defines messages from $i$ to $j$, denoted by $\psi_r^{i \to j}$, which represent the marginal of $z_i = r$ conditioned on $z_j$.
Assume that the distributions of the neighbors $\partial i = \{ j \mid a_{ij} = 1 \}$ of node $i$ only correlate with one another through $i$, which means that $i$ and its neighbors approximately form a locally tree-like structure [10], [11]. Conditioned on $z_i$, the joint distribution of $\{ z_\ell \mid \ell \in \partial i \}$ is then the product of their marginals. In this case, $\psi_r^{i \to j}$ from $i$ to $j$ can be recursively expressed by the messages from other nodes except $j$ using the sum-product rule [31]. Based on the posterior distribution $P(z \mid G, \vartheta)$, we derive the BP equation for the message $\psi_r^{i \to j}$ as

$$\psi_r^{i \to j} = \frac{\nu_r}{Z^{i \to j}} \prod_{l \notin \partial i} \Big( 1 - \sum_s \psi_s^{l \to i} g_{li}\, \omega_{sr} \Big) \times \prod_{l \in \partial i \setminus j} \Big( \sum_s \psi_s^{l \to i} g_{li}\, \omega_{sr} \Big), \tag{9}$$

where $Z^{i \to j}$ is the normalization factor that ensures $\sum_{r=1}^{q} \psi_r^{i \to j} = 1$. The marginal of $i$ can then be estimated according to the messages that $i$ receives, that is,

$$\psi_r^{i} = \frac{\nu_r}{Z^{i}} \prod_{l \notin \partial i} \Big( 1 - \sum_s \psi_s^{l \to i} g_{li}\, \omega_{sr} \Big) \prod_{l \in \partial i} \Big( \sum_s \psi_s^{l \to i} g_{li}\, \omega_{sr} \Big), \tag{10}$$

where $\psi_r^{i}$ is the estimation of $P(z_i = r \mid G, \vartheta)$, which is also referred to as the belief in the BP algorithm. The main difference between $\psi_r^{i}$ and $\psi_r^{i \to l}$ is whether the message from node $l$ is included. Note that in the case $l \notin \partial i$, the additional term in the product of $\psi_r^{l}$ is $1 - \bar{p}_{li}$, where $\bar{p}_{li} \triangleq \sum_s \psi_s^{l \to i} g_{li}\, \omega_{sr} = O(p_{li}) = O(n^{-1})$ is sufficiently small with increasing $n$; it then follows that $\psi_r^{l \to i} = \psi_r^{l} + O(n^{-1})$.
Given this approximation, we obtain $1 - \sum_s \psi_s^{l \to i} g_{li}\, \omega_{sr} \approx 1 - \sum_s \psi_s^{l} g_{li}\, \omega_{sr} = 1 - \bar{p}_{li} \approx e^{-\bar{p}_{li}}$, and the message $\psi_r^{i \to j}$ can then be written as

$$\psi_r^{i \to j} = \frac{\nu_r}{Z^{i \to j}}\, e^{-h_{ir}} \prod_{l \in \partial i \setminus j} \Big( f_{lr} \sum_s \psi_s^{l \to i}\, \omega_{sr} f_{is} \Big), \tag{11}$$

where

$$h_{ir} \triangleq \sum_l \sum_s g_{li}\, \psi_s^{l}\, \omega_{sr} = \sum_l f_{lr} \sum_s f_{is}\, \psi_s^{l}\, \omega_{sr} \tag{12}$$

is the so-called auxiliary external field. The belief in (10) can be accordingly approximated as

$$\psi_r^{i} = \frac{\nu_r}{Z^{i}}\, e^{-h_{ir}} \prod_{l \in \partial i} \Big( f_{lr} \sum_s \psi_s^{l \to i} f_{is}\, \omega_{sr} \Big). \tag{13}$$

As long as the function $f$ and the parameter set $\vartheta$ are given, the marginal $P(z_i = r \mid G, \vartheta)$ can be inferred by iterating the BP equations (11), (12) and (13) for each ordered node pair $(i, j) \in \mathcal{E} \triangleq \{ (i, j) \mid a_{ij} = 1 \}$ until the convergence of $\{ \psi_r^{i} \}$.

Algorithm 1: BP inference
1  Input: $G$, $\{ x_i, i \in V \}$, the number of communities $q$
2  Learning model: $f$, $\vartheta = \{ \omega, \beta, \zeta \}$
3  $\psi_r^{i \to j} := \mathrm{rand}(0, 1)$, $\psi_r^{i \to j} := \psi_r^{i \to j} / Z^{i \to j}$, $\forall (i, j) \in \mathcal{E}$;
4  compute $f_{ir}$, $\psi_r^{i}$, $h_{ir}$ for $i \in V$, $r \in [q]$ by (4), (13), (12);
5  while beliefs $\{ \psi_r^{i} \}$ not converged do
6      compute $\{ h_{ir} \}$ and store it into an $n \times q$ matrix $H$;
7      set $\Delta$ as a zero matrix of size $q \times q$;
8      foreach ordered pair $(i, j) \in \mathcal{E}$ in random order do
9          $h_{\ell r} := H_{\ell r} + \sum_{s=1}^{q} f_{\ell s} \Delta_{sr}$ for $\ell \in \{ i, j \}$, $r \in [q]$;
10         update $\psi_r^{i \to j}$, $r \in [q]$ by (11);
11         $\phi := (\psi_1^{j}, \ldots, \psi_q^{j})$, update $\psi_r^{j}$, $r \in [q]$ by (13);
12         $\Delta_{rs}$ += $(\psi_r^{j} - \phi_r) f_{js}\, \omega_{rs}$ for $(r, s) \in [q] \times [q]$;
13 Return: $\{ \psi_r^{i} \}$, $z_i = \arg\max_r \{ \psi_r^{i} \}$, $i \in V$, $r \in [q]$
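The two building blocks of the BP iteration, the auxiliary external field (12) and the message update (11), can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names `external_field` and `message_update`, the dense array layout (`F[i, r]` for $f_{ir}$, `psi[l, s]` for $\psi_s^{l}$, `omega[s, r]` for $\omega_{sr}$), and the dictionary of messages keyed by directed pairs are all hypothetical choices made for readability.

```python
import numpy as np

def external_field(F, psi, omega):
    """Eq. (12): h[i, r] = sum_l F[l, r] * sum_s F[i, s] * psi[l, s] * omega[s, r]."""
    A = F.T @ psi                 # A[r, s] = sum_l F[l, r] * psi[l, s]
    return F @ (omega * A.T)      # sums over s of F[i, s] * omega[s, r] * A[r, s]

def message_update(i, j, nbrs, msg, F, omega, nu, h):
    """Eq. (11): message from i to j, computed in log space for stability.

    nbrs[i] lists the neighbors of i; msg[(l, i)] is the length-q message l -> i.
    """
    log_m = np.log(nu) - h[i]
    for l in nbrs[i]:
        if l == j:
            continue
        # inner[r] = sum_s msg[(l, i)][s] * F[i, s] * omega[s, r]
        inner = (msg[(l, i)] * F[i]) @ omega
        log_m += np.log(F[l] * inner)
    log_m -= log_m.max()          # avoid overflow before exponentiating
    m = np.exp(log_m)
    return m / m.sum()            # normalization plays the role of Z^{i->j}
```

The vectorized form of `external_field` costs $O(nq^2)$ operations instead of the $O(n^2 q^2)$ of the naive quadruple loop, which is why storing the field in a matrix and patching it lazily is attractive.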
For clarity, we present the detailed steps in advance in Algorithm 1, though the model learning procedure in Line 2 has not yet been discussed.

In Algorithm 1, to achieve the convergence of the BP equations efficiently, an asynchronous update scheme is used, which means that the messages and beliefs are computed using the latest updated values available instead of the values from the last iteration, as shown by the inner loop in Lines 8–12. It is also notable that, according to (12), the update of $\psi_r^{\ell}$ of any node $\ell$ influences the values of $\{ h_{ir} \}$ of every node $i$. To reduce the time complexity, instead of updating all the $h_{ir}$, $i \in V$ after each computation of $\psi_r^{\ell}$, we adopt a lazy update strategy [32], that is, $h_{ir}$ and $h_{jr}$ are only updated before the computation of the message $\psi_r^{i \to j}$. For this purpose, we first compute and store all the $\{ h_{ir} \}$ before the inner loop (Line 6) and accumulate the changes caused by each update of $\psi_r^{\ell}$ (Line 12) during the iteration; $h_{ir}$ and $h_{jr}$ can thereby be computed using the changes and the stored initial values (Line 9).

Remark 2. Setting $f$ as the constant function $1$, we recover the BP equation for the standard SBM, which reads

$$\psi_r^{i \to j} = \frac{\nu_r}{Z^{i \to j}}\, e^{-h_r} \prod_{l \in \partial i \setminus j} \Big( \sum_s \psi_s^{l \to i}\, \omega_{sr} \Big), \tag{14}$$

where $h_r = \sum_l \sum_s \psi_s^{l}\, \omega_{sr}$ is the external field. Moreover, replacing $f_{is}$ with $k_i / c$ in (11)–(13), the BP equations for DCSBM are also recovered.

Without losing the essence, community detection algorithms are usually theoretically analyzed based on a simplified case of SBM (SSBM) [10], [33], [34], in which all the communities have the same size $n/q$, and $m_{rs}$ only has two distinct values for $(r, s) \in [q] \times [q]$, that is, $m_{rs} = m_{\mathrm{in}}$ if $r = s$ and $m_{rs} = m_{\mathrm{out}}$ otherwise.
We further denote the intra- and inter-community degrees by $c_{\mathrm{in}} = 2 m_{\mathrm{in}} / n$ and $c_{\mathrm{out}} = m_{\mathrm{out}} / n$, respectively; the average degree of the network is then $c = q^{-1} ( c_{\mathrm{in}} + (q - 1) c_{\mathrm{out}} )$.

For the SSBM, (14) has a factorized fixed point (FFP) $\forall (i, j) \in \mathcal{E}$, $\psi_r^{j \to i} = 1/q$, which is a trivial solution that implies the failure of community detection. The convergence at the FFP can be investigated via linear stability analysis, which is described by the first-order derivatives of the messages in (14) and the corresponding $q \times q$ message transfer matrix $T$ with the entry

$$T_{rs} \triangleq \left. \frac{\partial \psi_r^{i \to j}}{\partial \psi_s^{l \to i}} \right|_{\mathrm{FFP}}. \tag{15}$$

For a sparse graph $G$, it was conjectured in [10] and proved in [35] that, when the parameters in (15) are in line with the SBM that generates $G$, the FFP is not stable under the random perturbation $\psi_r^{i \to j} = 1/q + \xi_r$ if

$$\tilde{c}\, \lambda^2(T) > 1, \tag{16}$$

and the marginals of community memberships can then be inferred efficiently via (14). In (16), $\tilde{c}$ denotes the average number of neighbors that each node passes messages to, and $\lambda(T)$ is the largest eigenvalue of $T$. In detail, $\tilde{c} = \langle k^2 \rangle / \langle k \rangle - 1$ is called the average excess degree [36], where $\langle k \rangle$ is the mean degree and $\langle k^2 \rangle$ is the mean-square degree. In particular, for ER networks, $\tilde{c} = c$. The critical value at $\tilde{c}\, \lambda^2(T) = 1$ is referred to as the detectability limit of community structure, or the Kesten-Stigum (KS) bound [37] in general.

To study the impact of node attributes on the effectiveness of community detection, we first investigate modular networks with categorical node attributes, which yields $\| x_i - x_j \| \in \{ 0, 1 \}$ and $\alpha_{jr} \in \{ 0, 1 \}$. Based on the SSBM, we start from the case in which each node has a categorical attribute $x_i = \varsigma_i \in [q]$ that indicates its community. Setting $f(1) > f(0)$, we find that the trivial solution $\psi_r^{i \to j} = 1/q$, $\forall (i, j) \in \mathcal{E}$ is not a fixed point of (11) in this case.
Reducing (11) according to the SSBM, we observe instead that

$$\psi_r^{i \to j} = \begin{cases} \gamma / (\gamma + q - 1), & r = \varsigma_i, \\ 1 / (\gamma + q - 1), & r \neq \varsigma_i \end{cases} \tag{17}$$

is a fixed point, where $\gamma = f(1)/f(0)$. With $f(0) = f(1)$, the trivial FFP $\psi_r^{i \to j} = 1/q$ is recovered.

Eq. (17) shows that, as long as the attributes are indicative, that is, the memberships indicated by the node attributes are better than a random guess, the detectability limit of communities vanishes. This contrast shows that our model can improve the effectiveness of community detection by incorporating the contribution of node attributes.

On the other hand, the available useful nodal information is often inadequate in real-world networks. One collection of nodes with the same categorical attribute can contain multiple communities due to the inhomogeneous interactions within the category [17] (e.g., the Amazon co-purchasing network with its nodes attributed by the product categories [1]). A natural question in this situation is whether the multiple communities within the same category are always detectable, or whether they are mixed into one community as indicated by the node attribute.

With this problem in mind, we consider the following nested model: there are $q^*$ planted communities in the network generated by SSBM, each node of which is annotated by one attribute from $\tilde{q} = q^*/q_b \ge 2$ categories, and each category contains $q_b \ge 2$ modular groups, which are referred to as brother communities in this paper. The distance of each node to its own category is $0$, and the distances to other categories are $1$. For convenience, we use $z \in z_\varsigma \triangleq \{ q_b \varsigma - q_b + 1, q_b \varsigma - q_b + 2, \ldots, q_b \varsigma \}$, $\varsigma \in [\tilde{q}]$ to label the brother communities. Without loss of generality, we set $f(0) = 1$ and denote the value of $f(1)$ by $\gamma$. For this model, we find a fixed point of (11)

$$\psi_r^{i \to j} = \begin{cases} \gamma / (q_b \gamma + q^* - q_b), & r \in z_{\varsigma_i}, \\ 1 / (q_b \gamma + q^* - q_b), & \text{otherwise}, \end{cases} \tag{18}$$

at which the modular structure within each category is undetectable.
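To make the structure of the fixed point (18) concrete, the following sketch builds the message vector for a node in a given category and lets one check that it is normalized, that it collapses to the trivial FFP $1/q^*$ when $\gamma = 1$, and that it reduces to (17) when $q_b = 1$. The function name `nested_fixed_point` and the 1-based `category` argument are hypothetical, and the formula follows the reading of (18) in which the weight $\gamma$ sits on the $q_b$ brother communities of the node's own category.

```python
import numpy as np

def nested_fixed_point(q_star, q_b, gamma, category):
    """Message vector of Eq. (18) for a node whose attribute is `category` (1-based).

    Entries for the q_b brother communities of that category are
    gamma / (q_b * gamma + q_star - q_b); all other entries are
    1 / (q_b * gamma + q_star - q_b).
    """
    psi = np.ones(q_star)
    start = q_b * (category - 1)          # 0-based index of the first brother
    psi[start:start + q_b] = gamma
    return psi / (q_b * gamma + q_star - q_b)

# Example: 4 communities, 2 per category, gamma = 2, node in category 1
psi = nested_fixed_point(4, 2, 2.0, 1)
```

All brothers within a category receive the same weight, which is exactly why the structure inside a category cannot be read off from this fixed point.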
Following the pioneering studies on the community detectability of the BP algorithm, we analyze the linear stability of (11) at the fixed point (18) with the actual model parameters. Using (15), we obtain the message transfer matrix $T$ with

$$T^{i}_{rs} = \frac{\omega_{rs} f_{is} \psi^{i}_{r}}{\sum_u \omega_{ru} f_{iu} \psi^{i}_{u}} - \psi^{i}_{r} \sum_u \left( \frac{\omega_{us} f_{is} \psi^{i}_{u}}{\sum_v \omega_{uv} f_{iv} \psi^{i}_{v}} \right), \tag{19}$$

where $u, v, r, s \in [q]$ index communities, and at the fixed point in (18) $\psi^{i}_{r} = \psi_r^{i \to j}$ according to (13). Writing (19) in matrix-vector form, we obtain

$$T^{i} = ( I - \psi^{i} \mathbf{1}^{T} ) ( \tilde{D}^{-1} \Psi^{i} \Omega F^{i} ), \tag{20}$$

where $I$ is the identity matrix, $\mathbf{1}$ is an all-ones column vector, $\psi^{i} = ( \psi^{i}_{1}, \psi^{i}_{2}, \ldots, \psi^{i}_{q^*} )^{T}$, $\Psi^{i} = \mathrm{diag}(\psi^{i})$, $\Omega = [\omega_{rs}]_{q^* \times q^*}$, $F^{i} = \mathrm{diag}( f_{i1}, f_{i2}, \ldots, f_{iq^*} )$ and $\tilde{D}$ is a diagonal matrix with its $r$th diagonal entry being the $r$th row sum of $\Psi^{i} \Omega F^{i}$. To solve for the eigenvalues of $T^{i}$, we next discuss the value of $\omega_{rs}$ in (19).

With $f(0) = 1$, we obtain according to the MLE in (8) that $\omega_{rr} = c_{\mathrm{in}} / n$. Note that in the message passing process, for each community, its brothers are indistinguishable from other groups owing to the identical group sizes and random initial messages. Therefore, the value of $\omega_{rs}$, $r \neq s$ in (19) is equivalent to the average value of the MLE, i.e.,

$$\omega_{rs} = \langle \omega \rangle_{r \neq s} = \frac{c_{\mathrm{out}} \left[ q_b - 1 + \gamma^{-2} ( q^* - q_b ) \right]}{n ( q^* - 1 )}, \quad \forall r \neq s. \tag{21}$$

With the matrix $\Omega$ in (20) obtained, for the leading eigenvalue $\lambda(T^{i})$ we have the following theorem:

Theorem 1. For each node $i \in V$, the eigenvalues of $T^{i}$ are all real and the largest eigenvalue of $T^{i}$ takes the same value

$$\lambda(T^{i}) = \lambda(T) = \frac{\omega_{\mathrm{in}} - \omega_{\mathrm{out}}}{\omega_{\mathrm{in}} + ( q^* - 1 - q_b )\, \omega_{\mathrm{out}} + q_b \gamma^{-1} \omega_{\mathrm{out}}}, \tag{22}$$

where $\omega_{\mathrm{in}} = c_{\mathrm{in}} / n$ and $\omega_{\mathrm{out}} = \langle \omega \rangle_{r \neq s}$ is shown in (21).

Proof. Please see Appendix A.

Combining Theorem 1 and (16), we can then obtain the condition under which the brother communities within the same category are detectable.
To show this result succinctly, let $\epsilon = c_{\mathrm{out}} / c_{\mathrm{in}}$ denote the ratio of inter- and intra-community degrees; the detectability condition is then

$$\epsilon < \epsilon^*_\gamma = \frac{\sqrt{\tilde{c}} - 1}{\eta \left( q^* - q_b + q_b \gamma^{-1} + \sqrt{\tilde{c}} - 1 \right)}, \tag{23}$$

with $\eta = ( q^* - 1 )^{-1} [ q_b - 1 + \gamma^{-2} ( q^* - q_b ) ] < 1$. Setting $\gamma = 1$ in (23), we recover the detectability condition of the BP equation (14) for SSBM, i.e., $\epsilon < \epsilon^* = ( q^* + \sqrt{\tilde{c}} - 1 )^{-1} ( \sqrt{\tilde{c}} - 1 )$. Given that $\gamma > 1$, we have $\epsilon^*_\gamma > \epsilon^*$, that is, by leveraging the node attributes, the condition in (23) is less strict than that for SSBM. It is notable that (23) in fact suggests that the proposed CRSBM and BP algorithm can take advantage of both network topology, described by $\epsilon$, and node attributes, described by $\gamma$, to detect communities.

Moreover, based on two extended models of SBM, the above analysis of community detectability not only demonstrates the effectiveness of our approach, but also measures the impact of node attributes on communities, which provides a novel strategy to solve the model selection problem of the node popularity function $f$ in the model (4), as will be shown in Section 4.

4 MODEL SELECTION AND ALGORITHM DETAILS

In the existing community detection literature, multiple available models are often compared and selected according to criteria such as minimum description length (MDL) and Bayesian model selection [8], [9], [23]. However, because of the diversity of node attributes, especially continuous ones, it is difficult to determine their description length or specify a prior distribution without strong assumptions.

In this section, we present a novel model selection scheme for the node popularity function $f$ in (4) based on the analysis of the impact of node attributes on community detection, which can be quantitatively described by community detectability.
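The threshold in (23) is easy to evaluate numerically, which helps when choosing synthetic benchmarks on either side of the detectability limit. The sketch below is illustrative only: the function names `excess_degree` and `eps_star_gamma` are hypothetical, and the formula coded here assumes the reading $\epsilon^*_\gamma = (\sqrt{\tilde{c}} - 1) / [\eta (q^* - q_b + q_b \gamma^{-1} + \sqrt{\tilde{c}} - 1)]$ with $\eta = [q_b - 1 + \gamma^{-2}(q^* - q_b)] / (q^* - 1)$.

```python
import math
import numpy as np

def excess_degree(degrees):
    """Average excess degree c~ = <k^2> / <k> - 1 from a degree sequence."""
    k = np.asarray(degrees, dtype=float)
    return float((k ** 2).mean() / k.mean() - 1.0)

def eps_star_gamma(q_star, q_b, gamma, c_tilde):
    """Critical ratio c_out / c_in below which brother communities are detectable.

    Assumed form of Eq. (23); gamma = 1 gives the plain SSBM threshold.
    """
    eta = (q_b - 1 + (q_star - q_b) / gamma ** 2) / (q_star - 1)
    root = math.sqrt(c_tilde)
    return (root - 1) / (eta * (q_star - q_b + q_b / gamma + root - 1))

# Example: q* = 4 communities, 2 per category, c~ = 4
with_attributes = eps_star_gamma(4, 2, 2.0, 4.0)     # gamma = 2
without_attributes = eps_star_gamma(4, 2, 1.0, 4.0)  # gamma = 1, plain SSBM
```

With these inputs the threshold rises from 0.2 (topology only) to 0.5 (with attributes), so communities with a larger share of inter-community edges remain detectable once attributes are used.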
After determining the form of $f$, we present a parameter estimation method that cooperates with the BP inference, and thus obtain the whole node attributed community detection algorithm.

In the model (4), $\alpha_{ir} \in [0, 1]$. Without loss of generality, we set $f(0) = 1$. Note that for either categorical or continuous attributes, $\alpha_{ir} = 1$ means that $x_i$ is completely different from $\zeta_r$. Therefore, a reasonable upper bound $\gamma^* = f(1)$ of the popularity function $f$ can be investigated through the analysis of networks with categorical attributes. To this end, we inspect the detectability condition (23).

In fact, the critical value $\epsilon^*_\gamma$ in (23) limits the "strength", or formally, the statistical significance [38] of the detected communities, which is described by the ratio $\epsilon = c_{\mathrm{out}} / c_{\mathrm{in}}$. Indicative attributes relax the condition, making weaker communities with larger $\epsilon$ detectable; on the other hand, overweighting the attributes can cause over-splitting of communities. Hence $f(1)$ should be a limited value.

For assortative modular structure, it is required that $\epsilon < 1$ in SBM. In (23), $\epsilon^*_\gamma > 1$ may lead to the emergence of noise structure of no statistical significance. To avoid such a side effect, we require $\forall q_b \ge 2$, $\epsilon^*_\gamma \le 1$, which reduces to $\epsilon^*_\gamma |_{q_b = 2} \le 1$ since $\epsilon^*_\gamma$ decreases as $q_b$ increases. Further note that in the interval $[1, +\infty)$, $\epsilon^*_\gamma$ is a monotonically increasing function of $\gamma$; the critical value of $\gamma$ is the maximum real-valued solution of

$$\epsilon^*_\gamma \big|_{q_b = 2} = \frac{ ( q^* - 1 ) ( \sqrt{\tilde{c}} - 1 ) }{ ( q^* - 3 + 2 \gamma^{-1} + \sqrt{\tilde{c}} ) \, [ 1 + \gamma^{-2} ( q^* - 2 ) ] } = 1 \tag{24}$$

with $q^* \ge 4$, which can be simplified to a cubic equation. Analyzing the solution of (24), we find that $\tilde{c} > 4$ is required to ensure $\gamma^* > 1$.

For the cases where (24) fails, we here present an alternative method for the choice of $\gamma$. It is clear from the condition (16) that a large $\lambda(T)$ benefits community detection.
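The critical value defined by (24) can be obtained numerically from the cubic it reduces to. Under our reading of (24), multiplying both sides by $\gamma^3$ gives $(q^* - 2)(\sqrt{\tilde{c}} - 2)\gamma^3 - 2\gamma^2 - (q^* - 3 + \sqrt{\tilde{c}})(q^* - 2)\gamma - 2(q^* - 2) = 0$, whose leading coefficient is positive only when $\tilde{c} > 4$. The sketch below solves this cubic with `numpy.roots`; the function name `critical_gamma` and the convention of returning `None` when no admissible root exists are assumptions made for illustration.

```python
import math
import numpy as np

def critical_gamma(q_star, c_tilde):
    """Largest real root gamma > 1 of the cubic form of Eq. (24), or None.

    Assumed cubic (from eps*_gamma at q_b = 2 set equal to 1):
      (q*-2)(sqrt(c~)-2) g^3 - 2 g^2 - (q*-3+sqrt(c~))(q*-2) g - 2(q*-2) = 0
    """
    if c_tilde <= 4.0 or q_star <= 2:
        return None                        # no solution with gamma > 1
    s = math.sqrt(c_tilde)
    a = q_star - 3 + s
    coeffs = [(q_star - 2) * (s - 2), -2.0, -a * (q_star - 2), -2.0 * (q_star - 2)]
    roots = np.roots(coeffs)
    real = [r.real for r in roots if abs(r.imag) < 1e-9 and r.real > 1.0]
    return max(real) if real else None

# Example: q* = 4 and c~ = 9 give the cubic g^3 - g^2 - 4g - 2 = 0
gamma_star = critical_gamma(4, 9.0)
```

For this example the cubic factors as $(\gamma + 1)(\gamma^2 - 2\gamma - 2)$, so the admissible root is $1 + \sqrt{3}$; for $\tilde{c} \le 4$ the function returns `None`, which corresponds to the case where the alternative choice of $\gamma$ is needed.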
The empirical studies in [10] also show that a larger $\lambda(T)$ results in a higher accuracy of detection. These findings inspire us to inspect the contribution of $\gamma$ to $\lambda(T)$. For simplicity, we investigate an extreme case based on SSBM, where the categorical attribute $\varsigma_i$ of each node $i$ indicates its community $z_i$ correctly, i.e., $\forall i$, $\varsigma_i = z_i$. In this situation, the transfer matrix $T$ is in the same form as (19) and has $q$ real-valued eigenvalues with the largest one

$$\lambda(T) = \frac{\omega_{\mathrm{in}} - \omega_{\mathrm{out}}}{\omega_{\mathrm{in}} + ( q - 2 + \gamma )\, \omega_{\mathrm{out}}} = \frac{\gamma^2 - \epsilon}{\gamma^2 + ( q - 2 + \gamma )\, \epsilon}, \tag{25}$$

which can be derived analogously by the method in Theorem 1. The derivative of $\lambda(T)$ with respect to $\gamma$ is

$$\frac{d \lambda(T)}{d \gamma} = \frac{\epsilon\, [ \epsilon + \gamma ( 2q - 2 + \gamma ) ]}{[ \epsilon ( q - 2 + \gamma ) + \gamma^2 ]^2} > 0, \tag{26}$$

which approaches $0$ with increasing $\gamma$. To ensure the contribution of attributes to $\lambda(T)$ and reduce the impact of noise on detected communities, we select $\gamma^*$ at the point where the growth rate of $\lambda(T)$ is small enough, that is,

$$\left. \frac{d \lambda(T)}{d \gamma} \right|_{\gamma^*} = \mu \left. \frac{d \lambda(T)}{d \gamma} \right|_{\gamma = 1}, \tag{27}$$

where $\mu \in (0, 1)$ is a hyper-parameter. It has an approximate solution $\gamma^* \approx \mu^{-1/3} [ 1 + ( q - 1 ) \epsilon ]^{2/3}$.

In practice, considering that in real-world networks the intra-community edges are usually more numerous than the inter-community ones [39], that is, $c_{\mathrm{in}} \ge ( q - 1 ) c_{\mathrm{out}}$, taking the corner case of $c_{\mathrm{in}} = ( q - 1 ) c_{\mathrm{out}}$ we obtain

$$\gamma^* \approx ( 4 / \mu )^{1/3}. \tag{28}$$

Based on the bounds above, we set $\gamma^*$ to the minimum value of the solutions given by (24) and (28).

In brief, we show in a principled way that the node popularity function $f$ in CRSBM should be a bounded function by analyzing the two-sided effects of node attributes on community detection, and then derive the bounds based on the detectability limit.

The analysis above has in fact suggested several rules for the model selection of $f$ in CRSBM defined on the continuous interval $[0, 1]$. I). For convenience but without loss of generality, $f(0) = 1$. II).
$f(1) > f(0)$ and $f(1)$ should be a limited value that can be decided by (24) and (28). Generalizing Rule II to the distance $x \in (0, 1)$, we further have: III). For any two points $x_1, x_2$ satisfying $x_1 > x_2$, $f(x_1) \ge f(x_2)$, and $f(x_1) - f(x_2)$ should be small if $x_1$ is close to $x_2$, that is, formally, the derivative $f'(x) \in [0, C]$ is bounded. IV). Under the condition of Rule III, $f$ should be in a form that makes $\omega_{\mathrm{in}} / \omega_{\mathrm{out}}$ as large as possible, which enlarges $\lambda(T)$ according to (22) and (25).

These rules suggest that an S-shaped curve is a good choice of $f$, such as the sigmoid-like function

$$f(x) = ( \gamma^* - 1 ) / [ 1 + \exp( -\beta_1 x + \beta_2 ) ] + 1, \quad \beta_1 > 0, \tag{29}$$

whose parameter set is denoted by $\beta = \{ \beta_1, \beta_2 \}$. Note that the log-likelihood $\log P(G \mid z, \vartheta)$ contains the summation of $O(n)$ terms in the form of $-\log \sum_i f_{ir}$; maximizing such a non-convex objective with respect to $\beta$ is expensive and sensitive to initialization. To avoid these issues, we next propose a heuristic method to estimate $\beta$ and $f$ without optimizing the log-likelihood.

Before proceeding, we first give some preliminaries. For each node $j$ and community $r$, it is reasonable for $f(\alpha_{jr})$ to be close to the lower bound $1$ if $z_j = r$; otherwise $f(\alpha_{jr})$ should be close to the upper bound $\gamma^*$. Based on this intuition, for each point $x$, we can update $f(x)$ heuristically according to the marginals $P_x \triangleq \{ \psi^{j}_{r} \mid (j, r) \ \text{s.t.}\ \alpha_{jr} \in N_x \}$ with corresponding distances falling into the neighborhood $N_x = ( x - dx, x + dx )$ of $x$. To this end, we define the measure

$$\Delta_x = \frac{2 \langle \psi^{j}_{r} \rangle}{\langle \psi^{j}_{r} \rangle + ( q - 1 )^{-1} ( 1 - \langle \psi^{j}_{r} \rangle )} - 1, \tag{30}$$

where $\langle \psi^{j}_{r} \rangle$ is the average of the marginals in $P_x$.
Note that $\Delta_x$ satisfies $\Delta_x > 0$ iff $\langle \psi^{j}_{r} \rangle > 1/q$ and $\Delta_x < 0$ iff $\langle \psi^{j}_{r} \rangle < 1/q$. We update $f(x)$ by

$$f_{\tau + 1}(x) = f_{\tau}(x) + | \Delta_x | \cdot ( b - f_{\tau}(x) ) \cdot \exp( -\tau / \tau_{\max} ), \tag{31}$$

where $\tau$ indexes the iteration of parameter learning, $b = 1$ if $\Delta_x > 0$ and $b = \gamma^*$ otherwise. In (31), the term $b - f(x)$ guarantees that $f_{\tau + 1}(x)$ is within the interval $[1, \gamma^*]$ given that $\Delta_x \in [-1, 1]$. The term $\exp( -\tau / \tau_{\max} )$ damps the update as the iteration proceeds, which makes the estimation of $f$ more stable.

In practice, we update $f(x)$ on a finite set of samples $S = \{ ( x, f_{\tau}(x) ) \}$ according to (31), and $\beta$ is then re-estimated by the Least Squares Method (LSM) to guarantee that Rule III and Rule IV are satisfied. In detail, for the function $f(\cdot)$ in (29), the estimation of $\beta$ given $\{ (x, y) \}$ with $y = f_{\tau + 1}(x)$ can be solved by the linear least squares estimation of $\beta$ on the transformed samples $T = \{ ( \tilde{x}, \tilde{y} ) \}$, where $\tilde{x} = -x$ and

$$\tilde{y} = \log( \gamma^* - y ) - \log( y - 1 ) = \beta_1 \tilde{x} + \beta_2. \tag{32}$$

Following [10], [38], we adopt an iterative learning scheme for the proposed model, that is, the parameters are updated based on the results of the last iteration. The $\delta_{z_i, r} \in \{ 0, 1 \}$ terms in (7) and (8) are relaxed to the marginal $\psi^{i}_{r}$, which improves the robustness of parameter estimation. This relaxation gives

$$\nu_r = \frac{1}{n} \sum_i \psi^{i}_{r} \quad \text{and} \quad n_{sr} = \sum_i \psi^{i}_{r} f_{is}. \tag{33}$$

Different from $\nu_r$ and $n_{sr}$, which relate to one-node marginals only, $m_{rs}$ in (8) involves two-node marginals $P(z_i, z_j)$, that is, $m_{rs} = \sum_{i < j} \ldots$

In Remark 2, we have shown that the derived BP equations can be transformed into those for SBM and DCSBM by changing $f_{is}$ into $1$ and $c^{-1} k_i$ respectively. These conversions are also applicable to (33) and (34) for parameter estimation. Furthermore, the node degrees can also be incorporated into our CRSBM together with attributes by replacing $f_{is}$ with $c^{-1} k_i f_{is}$ in Eqs. (11)–(13) for community inference, and in Eqs.
(33), (34) for parameter estimation.

Based on the proposed model learning scheme, we present in Algorithm 2 the whole community detection procedure for attributed networks using CRSBM. In Algorithm 2, we initialize $\zeta_r$, $r \in [q]$ using the well-known initialization method for the cluster centers in k-means++ [40]. After the initialization, we conduct BP inference and the parameter learning process iteratively using an Expectation Maximization (EM)-like framework (Lines 5–15), where the E-step for the latent group membership $z$ is performed by the BP inference, and in the M-step the parameters $\vartheta$ are estimated by MLE.

However, it is difficult to specify a universal convergence threshold of EM for various network data due to the different correlations between network structure and node attributes. As pointed out by Newman et al. in [15], the EM algorithm with superfluous iterations may converge to poor solutions. Considering this, we run the iterations for $\tau_{\max} = 10$ times, and use the GN modularity $Q$ [3] of the partitions at each iteration as an index to select the results (Line 16), where

$$Q = \frac{1}{2m} \sum_{i, j} \left( a_{ij} - \frac{k_i k_j}{2m} \right) \delta_{z_i, z_j}.$$

Although the ground truth community divisions of real-world networks may not show the optimal modularity, it works well for selecting good results among the divisions generated by multiple iterations.

Algorithm 2: Node Attributed Community Detection
Input: $G$, $\{ x_i, i \in V \}$, community number $q$
1  initialize $\zeta$ by the center initialization in k-means++;
2  get $\gamma^*$ by (24) and (28) with $\mu = 0.$…, $\tilde{q} = q$;
3  initialize $f(x) = ( \gamma^* - 1 ) x + 1$, $\bar{\omega} := qc/n$;
4  $\omega_{rr} := \bar{\omega} ( 1 + \gamma^* )^{-1} \gamma^*$, $\omega_{rs} := \bar{\omega} ( 1 + \gamma^* )^{-1}$ by $\gamma^*$ in (28);
5  for $\tau := 0$ to $\tau_{\max} - 1$ do
6      get $\{ \psi^{i}_{r} \}$ and $z_i$ by BP inference in Algorithm 1;
7      divide $[ \alpha_{\min}, \alpha_{\max} ]$ into $N_s = 10$ grids uniformly, use the midpoints $\{ x_k \}$ of the grids to form $S$;
8      compute $\{ \Delta_{x_k} \}_{k=1}^{N_s}$ by (30), $x_1 < x_2 < \cdots < x_{N_s}$;
9      if $\Delta_{x_1} < 0$ and $\Delta_{x_2} < 0$ then
10         update $\{ \zeta_s \}$ and $\{ \alpha_{ir} \}$ by (37), (3);
11         goto Line 15;
12     update $f(x_k)$ for $\{ ( x_k, f(x_k) ) \}_{k=1}^{N_s}$ in $S$ by (31);
13     get $T$ by (32) and conduct LSM on $T$ to get $\beta$;
14     update $\zeta$ by (37) and update $\{ f_{is} \}$ with new $\beta$, $\zeta$;
15     update $\nu_r$, $n_{rs}$, $n_{sr}$, $m_{rs}$, $\omega_{rs}$ by (33), (34) and (8);
16     compute the GN modularity $Q$ for the resulting communities at each iteration;
output: $\{ z_i \}$ corresponding to the largest $Q$

For the parameter learning procedure, besides the MLE equations, we here give further explanations of the update of $\beta$ and $\zeta$. In the selection of the sample set $S$ for LSM, the interval $[ \alpha_{\min}, \alpha_{\max} ]$ is divided into $N_s = 10$ grids of equal length $dx$ and $S$ is composed of $( x_k, f(x_k) )$ with $x_k$, $k \in [N_s]$ being the midpoints of the grids. To ensure that the popularity function $f$ in the form (29) is non-decreasing, i.e., $\beta_1 > 0$, we skip the update of $\beta$ if the measure $\Delta_x < 0$ for both of the first two grids of $[ \alpha_{\min}, \alpha_{\max} ]$ (Lines 9–11), which mostly occurs in the early iterations of Algorithm 2. In the early stage, the update of $f$ may make a large difference to the membership $z$; pausing the re-estimation of $\beta$ while continuing to update $\zeta$ aims to obtain good cluster representatives of the inferred communities.
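The heuristic update of the popularity function described above combines three small steps: the measure (30), the damped update (31), and the linear least squares fit on the transformed samples (32). The following sketch illustrates them with NumPy under stated assumptions: the function names are hypothetical, `numpy.polyfit` stands in for whatever least squares routine is actually used, and the small clipping constant `eps` is our own addition to keep the logarithms in (32) finite when a sample sits exactly on a bound.

```python
import numpy as np

def delta_x(mean_psi, q):
    """Eq. (30): positive iff the average marginal exceeds 1/q, range [-1, 1]."""
    return 2.0 * mean_psi / (mean_psi + (1.0 - mean_psi) / (q - 1)) - 1.0

def update_f(f_val, delta, gamma_star, tau, tau_max):
    """Eq. (31): move f(x) toward 1 if delta > 0, toward gamma_star otherwise."""
    b = 1.0 if delta > 0 else gamma_star
    return f_val + abs(delta) * (b - f_val) * np.exp(-tau / tau_max)

def sigmoid_f(x, gamma_star, beta1, beta2):
    """Eq. (29): sigmoid-like popularity function with values in (1, gamma_star)."""
    return (gamma_star - 1.0) / (1.0 + np.exp(-beta1 * x + beta2)) + 1.0

def fit_beta(xs, ys, gamma_star, eps=1e-9):
    """Eq. (32): linear least squares on (x~, y~) = (-x, log(g*-y) - log(y-1))."""
    xs = np.asarray(xs, dtype=float)
    ys = np.clip(np.asarray(ys, dtype=float), 1.0 + eps, gamma_star - eps)
    y_t = np.log(gamma_star - ys) - np.log(ys - 1.0)
    beta1, beta2 = np.polyfit(-xs, y_t, 1)   # slope is beta1, intercept is beta2
    return float(beta1), float(beta2)

# Example: recover the parameters of a known curve sampled at 10 grid midpoints
xs = np.linspace(0.05, 0.95, 10)
ys = sigmoid_f(xs, 3.0, 6.0, 3.0)
beta1, beta2 = fit_beta(xs, ys, 3.0)
```

Because the transform in (32) is exact for points lying on (29), the fit recovers the generating parameters on noise-free samples; on the heuristically updated samples it returns the closest sigmoid in the least squares sense, which is what keeps $f$ monotone and bounded.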
In practice, we empirically find that $\zeta$ can reach good points quickly by Line 10, and the update of $\beta$ seldom stops for three successive iterations.

Finally, we analyze the time complexity of the proposed method. In Algorithm 2, the initialization steps cost $O(qn)$ time. For the parameter learning, updating $\{m_{rs}\}$ takes $O(qm)$ time, updating $\{\nu_r\}$, $\{n_{sr}\}$, $\{\zeta_s\}$ and $f$ each takes $O(qn)$ time, and conducting LSM for the estimation of $\beta$ takes $O(N_s) = O(1)$ time. The BP inference is conducted by Algorithm 1. In Algorithm 1, at each iteration, there are $O(m)$ messages $\{\psi^{i \to j}\}$ to update, each of which is a $q \times 1$ vector (Line 10), and the update of $\Delta_{rs}$ and $h_{\ell r}$, $\ell \in \{i, j\}$, takes $O(q)$ time for each $\psi^{i \to j}$; thus the time complexity of BP inference is $O(qm)$. Finally, calculating the modularity $Q$ costs $O(n)$ time. In conclusion, Algorithm 2 has a time complexity of $O(qm)$, which is of the same order as that of the BP algorithm using graph topology only [10].

5 EXPERIMENTS

In this section, extensive experiments on both artificial and real-world networks are conducted to demonstrate the performance of our model and algorithms. Since the community assignment is still disputed when the clusters of node attributes are not in line with structural communities [13], there are currently no widely accepted artificial benchmarks for modular networks with node attributes [16]. Following [15], [18], we use only synthetic SBM graphs with categorical node attributes to validate the detectability analysis of our algorithm, whose main finding is shown in (23). The comparison of our method with other baseline approaches is conducted on six real-life networks with ground truth communities.
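A synthetic SBM graph with categorical attributes of the kind mentioned above could be sampled along the following lines. This is only a sketch under stated assumptions, not the generator used in the paper: it assumes equal-sized groups, the sparse parameterization of [10] in which two nodes are linked with probability $c_{\mathrm{in}}/n$ or $c_{\mathrm{out}}/n$, the definition $\epsilon = c_{\mathrm{out}}/c_{\mathrm{in}}$, and that consecutive communities share a category. The names `affinities_from_epsilon` and `sample_sbm` are hypothetical.

```python
import random


def affinities_from_epsilon(c, q, eps):
    """Solve c = (c_in + (q - 1) * c_out) / q with eps = c_out / c_in (assumed)."""
    c_in = q * c / (1 + (q - 1) * eps)
    return c_in, eps * c_in


def sample_sbm(n, q, q_tilde, c_in, c_out, seed=0):
    """Sample a planted-partition graph with q equal groups and q_tilde categories.

    Returns (z, category, edges): group label per node, categorical attribute
    per node, and a list of undirected edges (i, j) with i < j.
    """
    rng = random.Random(seed)
    z = [i * q // n for i in range(n)]        # contiguous equal-sized groups
    category = [g * q_tilde // q for g in z]  # consecutive groups share a category
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            p = (c_in if z[i] == z[j] else c_out) / n
            if rng.random() < p:
                edges.append((i, j))
    return z, category, edges
```

The double loop is quadratic in $n$, which is acceptable for a small illustration; a graph with thousands of nodes would call for a sparse sampling scheme instead.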
To validate the detectability condition for multiple communities with the same categorical node attributes, we generate a collection of SBM graphs with $q^* = 4$ communities of the same node size $n = 5000$ and set the number of categories $\tilde{q} = 2$. The synthetic graphs all have average degree $c = 4$, while $c_{\mathrm{in}}$ and $c_{\mathrm{out}}$ vary across networks. For convenience, we fix $\gamma = f(1) = 2$. By (23), the critical value of detectability is $\epsilon^* = 1/$ ; more intuitively, the corresponding ratio of internal degree is $k_{\mathrm{in}}/c = c_{\mathrm{in}}/(cq^*) = 2/$ .

We show in Table 1 the confusion matrices $M \in \mathbb{R}^{q^* \times q^*}$ of BP inference on three SBM graphs. The SBM-generated networks have $k_{\mathrm{in}}/c \in \{ / , / , / \}$ respectively and $\epsilon \in \{ / , / , / \}$ accordingly, and we set $C$ ($C$) and $C$ ($C$) to be in the same category. From the gray colored diagonal blocks in Table 1 we can see that when $\epsilon \ge \epsilon^*$, the two brother communities with the same categorical attributes are mixed into one in the detected community structure, which results in $M = M = 0$. In contrast, with $\epsilon = 11/ < \epsilon^*$, BP inference finds two communities in each category, as shown by $M_{rr} > 0$, $\forall r \in [q^*]$; that is, the brother communities are detectable with $\epsilon$ below the detectability threshold. The experimental results on these three SBM graphs verify the detectability condition (23) of our method, which describes the situation where node attributes are insufficient to indicate the communities in the network.

To illustrate our method in more detail, we here show the process of Algorithm 2 by a case study on the Pubmed dataset with 19729 nodes, 44338 edges, 500-dimensional node attributes and 3 ground truth communities, as shown in Fig. 1a. The Pubmed network describes the citation relationships between publications on diabetes in the PubMed website.
The node attributes in Pubmed are sparse real-valued vectors describing TF/IDF weights of words in the titles from a 500-word dictionary [16], whose first two principal components are visualized in Fig. 1b via the principal component analysis (PCA) algorithm [41]. We can see from Fig. 1b that a substantial portion of the attributes in each community mix with those belonging to other communities, which suggests that node attributes alone cannot indicate the communities well in Pubmed.

TABLE 1
Confusion matrices of BP on the SBM graphs with $\epsilon = 4/ > \epsilon^*$, $\epsilon = 1/ = \epsilon^*$, and $\epsilon = 11/ < \epsilon^*$. $C$ ($C$) and $C$ ($C$) are in the same category. Each element in the matrices is normalized into $[0, 1]$ by the division of $n$. DC: detected communities. GT: ground truth.

Applying Algorithm 2 to Pubmed, the third iteration shows the largest modularity of $Q = 0.$ among $\tau_{\max} = 10$ iterations, where the corresponding cluster representatives $\{\zeta_r \mid r \in [3]\}$ and the popularity function $f$ are shown in Fig. 1b and Fig. 1c respectively. From the visualization, we observe that each of the estimated $\zeta$'s is located where the attributes in the same community are densely distributed, and the distances between the $\zeta$'s are relatively large; hence the choice of cluster centers of attributes is reasonable. Starting from the initial state of a linear function (Line 3 in Algorithm 2), the node popularity $f$ changes into an S-shaped curve as the iterations proceed, which is in line with the proposal of the detectability analysis.

For the comparison with ground truth, we present the detected communities in Fig. 1d. It shows that our method estimates the group membership of most nodes correctly, while the error is mainly caused by the nodes that have nearly equal numbers of links to the three communities, as shown in the bottom-left of Fig. 1a and Fig. 1d.
The quantitative evaluation shows that our method achieves the best performance compared with the baselines on Pubmed, as will be presented in Section 5.3.

Fig. 1. Panels: (a) The Pubmed network; (b) Projected attributes and CRs; (c) The evolving $f$ along with iterations; (d) Detected communities in Pubmed. (a) The ground truth communities in Pubmed are indicated by node colors. (b) The projected data points of the estimated cluster representatives (CRs) $\zeta$ and attributes in the ground truth communities $C_1$, $C_2$ and $C_3$. (c) At the second iteration, the condition in Line 9 of Algorithm 2 is satisfied, thus $f$ is not updated. (d) The detected communities are shown with node positions unchanged from (a).

TABLE 2
Real-world Dataset Profiles

Class     Dataset     |V|     |E|     d     K*   Attribute
Social    Twitter*    171     796     578   6    binary
Social    Facebook*   1045    26749   576   9    binary
Citation  Citeseer    3312    4732    3703  6    binary
Citation  Cora        2708    5429    1433  7    binary
Citation  Pubmed      19729   44338   500   3    real value
Politics  Parliament  451     5823    108   7    binary

K*: Number of ground-truth communities. d: Dimension of attributes. Facebook*: network id: 107. Twitter*: network id: 629863.

We further quantify the performance of the proposed method by comparing it with baseline algorithms on various real-life networks with ground truth available. The experimental settings are as follows.

Datasets: Six real-world network datasets are used in the experiments, including Citeseer, Cora, Pubmed, Facebook, Twitter and Parliament, whose profiles are summarized in Table 2. The node attributes in all the datasets are binary valued except for those in Pubmed. We convert the nonzero real values in the attributes of Pubmed into 1 for the algorithms that take categorical-valued node attributes as input, considering the sparsity of the nonzero elements therein. The datasets Facebook and Twitter are two collections of multiple social networks; in the experiments we adopt the attributed graph that contains the most nodes in each collection.
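The conversion of the real-valued Pubmed attributes for the categorical baselines amounts to a simple thresholding step. A minimal sketch, assuming the attributes are held as dense lists of TF/IDF weights and that every nonzero weight is mapped to 1 (the function name `binarize` is introduced here for illustration only):

```python
def binarize(attributes):
    """Map each nonzero attribute value to 1 and each zero to 0.

    attributes: list of rows, one list of real-valued weights per node.
    Returns a new list of rows with 0/1 integer entries.
    """
    return [[1 if value != 0 else 0 for value in row] for row in attributes]
```

For example, `binarize([[0.0, 0.12], [0.3, 0.0]])` returns `[[0, 1], [1, 0]]`. Because the TF/IDF vectors are sparse, little information beyond word presence is discarded by this step.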
Baseline algorithms: Three classes of community detection methods are employed for comparison. First, statistical inference methods using network topology only. Specifically, we adopt the extension of BP inference to DCSBM [23], which can be derived from our algorithm as shown in Remark 2 and Remark 3. Second, PGM-based algorithms incorporating both network topology and node attributes, including BAGC [19], CESNA [14] and SI [15], which require categorical node attributes, and CohsMix [21], which requires Gaussian distributed attributes. Third, focusing on network modeling, we are also interested in methods that consider the behavior of real-life networked systems. To this end, we employ CAMAS [42], a recent method of this line based on the network dynamics and node cluster properties of a multi-agent system.

The tuning parameters of all the baselines are set according to the authors' recommendations. For the statistical inference algorithms, we specify the ground truth value $K^*$ for the number of communities to be detected. It is worth noting that SI [15] requires all the possible combinations of each dimension of node annotations, which is not scalable to networks with attributes of thousands of dimensions, as shown in Table 2. To solve this problem, we first apply k-means clustering [40] to the attributes, which converts the high-dimensional features to one dimension, and then use the clustering result as the input of SI.
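The reduction of high-dimensional attributes to a single categorical label via k-means can be sketched as follows. This is an illustrative pure-Python version of Lloyd's algorithm with k-means++ style seeding, not the code used in the experiments; the function name `kmeans_labels` and its parameters `iters` and `seed` are assumptions made for this example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(points, k, iters=50, seed=0):
    """Cluster attribute vectors and return one integer label per node."""
    rng = random.Random(seed)
    # k-means++ style seeding: later centers are drawn with probability
    # proportional to the squared distance to the nearest chosen center.
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        total = sum(d2)
        if total == 0:  # all points coincide with existing centers
            centers.append(list(rng.choice(points)))
            continue
        r = rng.random() * total
        acc = 0.0
        for p, d in zip(points, d2):
            acc += d
            if acc >= r:
                centers.append(list(p))
                break
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

The returned labels form a one-dimensional categorical annotation per node, which is the kind of input a method such as SI expects.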
Evaluation metrics: We adopt two widely used metrics in community detection to quantify the accordance between experimental results and ground truth and to evaluate the compared methods, i.e., the Average F1 Score (AvgF1) and the NMI metric [43], whose definitions are as follows:

$$\mathrm{AvgF1} = \frac{1}{2K^*} \sum_{C^* \in \mathcal{C}^*} \max_{C \in \mathcal{C}} F1(C^*, C) + \frac{1}{2K} \sum_{C \in \mathcal{C}} \max_{C^* \in \mathcal{C}^*} F1(C, C^*),$$

$$\mathrm{NMI} = \frac{-2 \sum_{p=1}^{K} \sum_{q=1}^{K^*} n_{pq} \log \frac{n_{pq}\, n}{n_{p\cdot}\, n_{\cdot q}}}{\sum_{p=1}^{K} n_{p\cdot} \log \frac{n_{p\cdot}}{n} + \sum_{q=1}^{K^*} n_{\cdot q} \log \frac{n_{\cdot q}}{n}},$$

where $C \in \mathcal{C}$ is a community detected by an algorithm, $C^* \in \mathcal{C}^*$ is a ground truth community, $K$ is the number of detected communities, $K^*$ is that of the ground truth, and $F1(C_p, C_q)$ is the F1 score between two sets $C_p$ and $C_q$. Here $n_{pq} = |C_p \cap C_q|$, $n_{p\cdot} = \sum_q n_{pq}$ and $n_{\cdot q} = \sum_p n_{pq}$. Higher NMI and AvgF1 scores indicate better community divisions.

Note that the overlapping community detection algorithms CESNA [14] and CAMAS [42] may discard anomalous nodes in the detection procedure. Consequently, the NMI index, which requires the compared partitions to cover the same node set, cannot evaluate the performance of CESNA and CAMAS. Instead, we use the overlapping extension of NMI (ONMI) in [44] as the evaluation metric.

We evaluate our algorithm and the baselines on the datasets in Table 2, and show the results in Table 3, where the best scores for each network are highlighted in bold, and N/A means that the algorithm only detects one trivial community on the network. From Table 3, we observe that: (1) Our CRSBM is superior to DCSBM on all six datasets, which shows that CRSBM can effectively incorporate node attributes to improve the performance of community detection. (2) Our algorithm and SI are effective on both dense and sparse networks, while CESNA and CAMAS show inferior performance on the citation networks that have a small average node degree around .
(3) Our method outperforms the baselines on all the networks except for Facebook in terms of AvgF1 and (O)NMI metrics.

TABLE 3
Comparison of AvgF1 and (O)NMI Scores (%) of Our CRSBM and Baselines

Network  Twitter        Facebook       Cora           Citeseer       Pubmed         Parliament
Metric   AvgF1  NMI     AvgF1  NMI     AvgF1  NMI     AvgF1  NMI     AvgF1  NMI     AvgF1  NMI
DCSBM    49.33  55.47   38.73  43.23   53.50  36.96   39.17  16.34   55.33  18.14   51.23  41.96
BAGC     N/A    N/A     27.28  9.03    36.46  16.97   N/A    N/A     36.33  8.31    29.76  5.27
SI       50.89  54.52
Metric   AvgF1  ONMI    AvgF1  ONMI    AvgF1  ONMI    AvgF1  ONMI    AvgF1  ONMI    AvgF1  ONMI
CAMAS    34.02  17.93   31.94

Overall, our method achieves the best performance among the competitive approaches. Moreover, compared to other algorithms, it also shows good applicability to various node annotated networks, whose network connections may be sparse or dense, and whose node attributes may be categorical or real-valued.

6 CONCLUSION

In this paper, we proposed a novel probabilistic generative model (PGM) named CRSBM for community detection in node attributed networks based on the classical SBM. Without requirements on the distribution of attributes, our model takes the distances between the feature vectors of nodes into account in the fusion with graph topology. Specifically, we consider the influence of attributes on the node popularity, which is described by a real-valued function. To choose an appropriate node popularity function, which relates to the model selection problem, we analyze the detectability of communities in attributed networks for our model. The analysis shows that an S-shaped curve is a good choice for the node popularity function.
With the model determined, an efficient algorithm was developed to estimate the parameters and detect the communities. Extensive comparative experiments on real-world networks with ground truth have shown that our method is superior to other state-of-the-art approaches.

As a byproduct, we derived the detectability condition for our model, which has been verified by numerical experiments on artificial attributed networks. In fact, the detectability quantifies the effect of node attributes on community structure. It is shown that if there are multiple (but not all) communities whose nodes have the same categorical attribute, the detectability can still be improved, where the improvement is mainly determined by the dependency on attributes and the average degree of the graph.

APPENDIX A
PROOF OF THEOREM

For any two matrices $T^i$ and $T^j$ defined in (20), it follows that $T^i = T^j$ if $\varsigma_i = \varsigma_j$, that is, $i$ and $j$ have the same categorical attribute. Otherwise, let $z_i = r$ and $z_j = s$; then $T^i$ can be transformed into $T^j$ by first swapping its $r$th and $s$th rows and then swapping the $r$th and $s$th columns, which are elementary transformations. Therefore, the matrices $\{T^i \mid i \in V\}$ are similar to each other, and they share the same eigenvalues.

Note that $\sum_{r=1}^{q^*} \psi^i_r = 1$, which yields $\mathbf{1}^T (I - \psi^i \mathbf{1}^T) = \mathbf{0}^T$; it then follows that $\mathbf{1}^T T^i = \mathbf{0}^T = 0 \cdot \mathbf{1}^T$. Consequently, $0$ is an eigenvalue of $T^i$. Before solving for the other eigenvalues of $T^i$, we first present some notations. Let $v_{rs} \triangleq (0, \ldots, 1, \ldots, -1, \ldots, 0)^T$, where $1$ is the $r$th and $-1$ is the $s$th entry, $r \neq s$, while the other entries are all $0$. We also define an auxiliary matrix $\tilde{T}^i \triangleq \tilde{D}^{-1} \Psi^i \Omega F^i$, which satisfies $T^i v_{rs} = \tilde{T}^i v_{rs}$.

Without loss of generality, let $z_i = r = 1$; then $F^i = \mathrm{diag}(1, \ldots, 1, \gamma, \ldots, \gamma)$ with $1$'s the first $q_b$ entries, and $\psi^i \propto (\gamma, 1, \ldots, 1)$ with $\gamma$ the first entry. After some lines of linear algebra, we obtain that $v_{1s}$, $s = 2, \ldots, q_b$, are $q_b - 1$ eigenvectors of $\tilde{T}^i$ with the corresponding eigenvalues sharing the same value

$$\lambda_{1s}(\tilde{T}^i) = \frac{\omega_{\mathrm{in}} - \omega_{\mathrm{out}}}{\omega_{\mathrm{in}} + (q^* + 1 - q_b)\,\gamma\,\omega_{\mathrm{out}} + (q_b - {})\,\omega_{\mathrm{out}}}. \quad (38)$$

Similarly, setting $r = q_b + 1$, we obtain that $v_{rs}$, $s = r + 1, \ldots, q^*$, are $q^* - q_b - 1$ eigenvectors of $\tilde{T}^i$ with the corresponding eigenvalues sharing the same value

$$\lambda_{q_b+1,s}(\tilde{T}^i) = \frac{\omega_{\mathrm{in}} - \omega_{\mathrm{out}}}{\omega_{\mathrm{in}} + (q^* - {} - q_b)\,\omega_{\mathrm{out}} + q_b\,\gamma^{-1}\,\omega_{\mathrm{out}}}. \quad (39)$$

Given that $T^i v_{rs} = \tilde{T}^i v_{rs}$, the eigenvalues in (38) and (39) also belong to $T^i$. We have now found $q^* - 1$ real eigenvalues of $T^i$; thus all the $q^*$ eigenvalues of $T^i$ are real, since complex eigenvalues must come in conjugate pairs. The remaining one, denoted by $\lambda_{\mathrm{last}}(T^i)$, can be computed from the fact that $\sum_k \lambda_k(T^i) = \mathrm{trace}(T^i)$, where $\mathrm{trace}(T^i) = \sum_r T^i_{rr}$ is the trace of $T^i$. Given that $\gamma > 1$ and $\omega_{\mathrm{in}} > \omega_{\mathrm{out}}$, we have $\lambda_{q_b+1,s}(T^i) > \lambda_{1s}(T^i) > 0$, and by direct computation we also find that $\lambda_{\mathrm{last}}(T^i) < \lambda_{q_b+1,s}(T^i)$. Therefore, $\lambda_{q_b+1,s}(T^i)$ in (39) is the largest eigenvalue among all the $q^*$ real eigenvalues of $T^i$, $\forall i \in V$. This completes the proof.

REFERENCES

[1] J. Yang and J. Leskovec, “Defining and evaluating network communities based on ground-truth,” Knowledge and Information Systems, vol. 42, no. 1, pp. 181–213, Jan 2015.
[2] M. E. J. Newman, Networks: An Introduction, 2010.
[3] M. Girvan and M. E. J. Newman, “Community structure in social and biological networks,” Proc. Natl. Acad. Sci., vol. 99, no. 12, pp. 7821–7826, jun 2002.
[4] D. F. Gleich and C. Seshadhri, “Vertex neighborhoods, low conductance cuts, and good seeds for local community methods,” in Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, 2012, pp. 597–605.
[5] T. Chakraborty, A. Dalmia, A. Mukherjee, and N. Ganguly, “Metrics for Community Analysis,” ACM Comput. Surv., vol. 50, no. 4, pp. 1–37, aug 2017.
[6] P. W. Holland, K. B. Laskey, and S. 
Leinhardt, “Stochastic blockmodels: First steps,” Soc. Networks, vol. 5, no. 2, pp. 109–137, jun 1983.
[7] M. E. J. Newman and E. A. Leicht, “Mixture models and exploratory analysis in networks,” Proc. Natl. Acad. Sci., vol. 104, no. 23, pp. 9564–9569, jun 2007.
[8] T. P. Peixoto, “Hierarchical Block Structures and High-Resolution Model Selection in Large Networks,” Phys. Rev. X, vol. 4, no. 1, p. 011047, mar 2014.
[9] T. P. Peixoto, “Model selection and hypothesis testing for large-scale network models with overlapping groups,” Phys. Rev. X, vol. 5, no. 1, pp. 1–20, 2015.
[10] A. Decelle, F. Krzakala, C. Moore, and L. Zdeborová, “Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications,” Phys. Rev. E, vol. 84, no. 6, p. 066106, dec 2011.
[11] L. Zdeborová and F. Krzakala, “Statistical physics of inference: thresholds and algorithms,” Adv. Phys., vol. 65, no. 5, pp. 453–552, 2016.
[12] D. Hric, R. K. Darst, and S. Fortunato, “Community detection in networks: Structural communities versus ground truth,” Phys. Rev. E, vol. 90, no. 6, p. 062805, dec 2014.
[13] L. Peel, D. B. Larremore, and A. Clauset, “The ground truth about metadata and community detection in networks,” Sci. Adv., vol. 3, no. 5, p. e1602548, may 2017.
[14] J. Yang, J. McAuley, and J. Leskovec, “Community detection in networks with node attributes,” in Proc. - IEEE Int. Conf. Data Mining, ICDM, 2013, pp. 1151–1156.
[15] M. E. J. Newman and A. Clauset, “Structure and inference in annotated networks,” Nat. Commun., vol. 7, no. May, p. 11863, jun 2016.
[16] P. Chunaev, “Community detection in node-attributed social networks: a survey,” Computer Science Review, vol. 37, p. 100286, 2020.
[17] S. Fortunato and D. Hric, “Community detection in networks: A user guide,” Phys. Rep., vol. 659, pp. 1–44, nov 2016.
[18] D. Hric, T. P. Peixoto, and S. Fortunato, “Network structure, metadata, and the prediction of missing nodes and annotations,” Phys. Rev. X, vol. 6, no. 3, pp. 1–15, 2016.
[19] Z. Xu, Y. Ke, Y. Wang, H. Cheng, and J. Cheng, “A model-based approach to attributed graph clustering,” in Proc. 2012 Int. Conf. Manag. Data - SIGMOD ’12. New York, New York, USA: ACM Press, 2012, p. 505.
[20] Z. Chang, C. Jia, X. Yin, and Y. Zheng, “A generative model for exploring structure regularities in attributed networks,” Information Sciences, vol. 505, pp. 252–264, Dec 2019.
[21] H. Zanghi, S. Volant, and C. Ambroise, “Clustering based on random graph model embedding vertex features,” Pattern Recognit. Lett., vol. 31, no. 9, pp. 830–836, jul 2010.
[22] P. Zhang, C. Moore, and L. Zdeborová, “Phase transitions in semisupervised clustering of sparse networks,” Phys. Rev. E, vol. 90, no. 5, p. 052802, nov 2014.
[23] X. Yan, C. Shalizi, J. E. Jensen, F. Krzakala, C. Moore, L. Zdeborová, P. Zhang, and Y. Zhu, “Model selection for degree-corrected block models,” J. Stat. Mech. Theory Exp., vol. 2014, no. 5, p. P05007, may 2014.
[24] J.-G. Young, G. St-Onge, P. Desrosiers, and L. J. Dubé, “Universality of the stochastic block model,” Phys. Rev. E, vol. 98, no. 3, p. 032309, Sep 2018.
[25] B. Karrer and M. E. J. Newman, “Stochastic blockmodels and community structure in networks,” Phys. Rev. E, vol. 83, no. 1, pp. 1–11, 2011.
[26] T. P. Peixoto, “Nonparametric Bayesian inference of the microcanonical stochastic block model,” Phys. Rev. E, vol. 95, no. 1, p. 012317, jan 2017.
[27] A. Faqeeh, S. Osat, and F. Radicchi, “Characterizing the Analogy Between Hyperbolic Embedding and Community Structure of Complex Networks,” Phys. Rev. Lett., vol. 121, no. 9, p. 098301, 2018.
[28] F. Simini, M. C. González, A. Maritan, and A.-L. Barabási, “A universal model for mobility and migration patterns,” Nature, vol. 484, no. 7392, pp. 96–100, feb 2012.
[29] M. Steinbach, V. Kumar, and P. Tan, “Cluster analysis: basic concepts and algorithms,” Introduction to data mining, 1st edn. Pearson Addison Wesley, 2005.
[30] A. K. Jain, “Data clustering: 50 years beyond k-means,” Pattern recognition letters, vol. 31, no. 8, pp. 651–666, 2010.
[31] M. Mezard and A. Montanari, “Information, physics, and computation,” USA, 2009.
[32] A. L. Madsen, “Belief update in clg bayesian networks with lazy propagation,” International Journal of Approximate Reasoning, vol. 49, no. 2, pp. 503–521, 2008.
[33] R. R. Nadakuditi and M. E. J. Newman, “Graph Spectra and the Detectability of Community Structure in Networks,” Phys. Rev. Lett., vol. 108, no. 18, p. 188701, may 2012.
[34] F. Krzakala, C. Moore, E. Mossel, J. Neeman, A. Sly, L. Zdeborová, and P. Zhang, “Spectral redemption in clustering sparse networks,” Proc. Natl. Acad. Sci., vol. 110, no. 52, dec 2013.
[35] E. Mossel, J. Neeman, and A. Sly, “A Proof of the Block Model Threshold Conjecture,” Combinatorica, vol. 38, no. 3, pp. 665–708, jun 2018.
[36] F. Brauer, “An introduction to networks in epidemic modeling,” in Mathematical epidemiology. Springer, 2008, pp. 133–146.
[37] H. Kesten and B. P. Stigum, “Additional Limit Theorems for Indecomposable Multidimensional Galton-Watson Processes,” Ann. Math. Stat., vol. 37, no. 6, pp. 1463–1481, dec 1966.
[38] P. Zhang and C. Moore, “Scalable detection of statistically significant communities and hierarchies, using message passing for modularity,” Proc. Natl. Acad. Sci., vol. 111, no. 51, pp. 18144–18149, dec 2014.
[39] F. Radicchi, C. Castellano, F. Cecconi, V. Loreto, and D. Parisi, “Defining and identifying communities in networks,” Proc. Natl. Acad. Sci., vol. 101, no. 9, pp. 2658–2663, 2004.
[40] D. Arthur and S. Vassilvitskii, “k-means++: The advantages of careful seeding,” in Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, 2007, pp. 1027–1035.
[41] H. Abdi and L. J. Williams, “Principal component analysis,” Wiley interdisciplinary reviews: computational statistics, vol. 2, no. 4, pp. 433–459, 2010.
[42] Z. Bu, G. Gao, H.-J. Li, and J. Cao, “Camas: A cluster-aware multiagent system for attributed graph clustering,” Information Fusion, vol. 37, pp. 10–21, Sept 2017.
[43] J. Xie, S. Kelley, and B. K. Szymanski, “Overlapping community detection in networks,” ACM Comput. Surv., vol. 45, no. 4, pp. 1–35, aug 2013.
[44] A. Lancichinetti, S. Fortunato, and J. Kertész, “Detecting the overlapping and hierarchical community structure in complex networks,”