Detectability thresholds of general modular graphs

Abstract

We investigate the detectability thresholds of various modular structures in the stochastic block model. Our analysis reveals how the detectability threshold is related to the details of the modular pattern, including the hierarchy of the clusters. We show that certain planted structures are impossible to infer regardless of their fuzziness.


arXiv [cs.SI], Jan

Tatsuro Kawamoto and Yoshiyuki Kabashima
Department of Mathematical and Computing Science, Tokyo Institute of Technology, 4259-G5-22, Nagatsuta-cho, Midori-ku, Yokohama, Kanagawa 226-8502, Japan


I. INTRODUCTION

Motivated by needs in data-driven science, a number of frameworks and algorithms for modular structure detection have been proposed in several fields in the last few decades [1–5]. Correspondingly, theoretical and experimental analyses of the statistical significance of the results are the subject of significant research interest. For example, although an algorithm suggests a partition of a graph following the application of some optimization process, if the graph is a typical instance of a uniform random graph, it is doubtful whether the resulting partition contains any useful information in practice. Moreover, even when the graph is generated from a model with some planted structure, it may be indistinguishable from a uniform random graph if the planted structure is too fuzzy.

It is a challenging problem in general, and the basic strategy to solve it involves investigating the conditions whereby we can retrieve the planted structure for a specified random graph ensemble. To this end, the so-called stochastic block model [6], which we explain in detail below, is often considered. This random graph model has controllable noise strength $\epsilon$, i.e., $\epsilon = 0$ represents a graph that clearly realizes the planted structure, and $\epsilon = 1$ represents a uniform random graph. Above a certain critical value $\epsilon^{*}$, an algorithm cannot retrieve the planted structure better than chance. This critical value is called the detectability threshold, and a large number of studies have been devoted to it [7–20] for sparse graphs, including rigorous treatments [21–23]. Besides the distinguishability from a uniform random graph, the exact recovery in dense graphs has also been studied [24–30].

Nevertheless, a large portion [31] of the research focuses on the community structure (assortative structure) and the disassortative structure. In this paper, we investigate the detectability threshold of more general structures.
We show that, according to the linear stability analysis of belief propagation (BP), the detectability threshold varies depending on the details of the modular structure.

II. STOCHASTIC BLOCK MODEL

The stochastic block model is a random graph model with a planted modular structure: the graph of $N$ vertices consists of $q$ clusters, each of size $\gamma_{\sigma} N$ ($\sigma \in \{1, \dots, q\}$), and every pair of vertices is connected independently and randomly according to its cluster assignments. For example, if vertices $i$ and $j$ belong to clusters $\sigma$ and $\sigma'$, respectively, they are connected with probability $\omega_{\sigma\sigma'}$ ($\sigma, \sigma' \in \{1, \dots, q\}$); matrix $\omega$ is called the affinity matrix. For given $N$, $q$, $\gamma$, and $\omega$, we can generate random graph instances of the stochastic block model. In the case of the inverse problem, which is of interest to us in this paper, our goal is to infer the parameters $\gamma$ and $\omega$ as well as the cluster assignments $\sigma$ given a graph. The number of clusters $q$ is sometimes given as input; otherwise, it is determined by some model selection criterion. Throughout this paper, we treat $q$ as input and focus on sparse graphs, i.e., each element of $\omega$ is scaled as $O(1/N)$ so that the average degree does not diverge as $N \to \infty$.

While there exist many types of modular structures, the simplest and most studied case is the community structure as illustrated in Fig. 1(a); that is, the affinity matrix has large values for its diagonal elements, $\omega_{\sigma\sigma} = \omega_{\mathrm{in}}$, and small values for the remaining elements, $\omega_{\sigma\sigma'} = \omega_{\mathrm{out}}$ ($\sigma \neq \sigma'$). Although the elements of the affinity matrix can be arbitrary nonnegative numbers, we hereafter consider the case where they are either $\omega_{\mathrm{in}}$ or $\omega_{\mathrm{out}}$: that is,

$$\omega = (\omega_{\mathrm{in}} - \omega_{\mathrm{out}})\, W + \omega_{\mathrm{out}}\, \mathbf{1}\mathbf{1}^{\top}, \qquad (1)$$

where $W$ is an indicator matrix: $W_{\sigma\sigma'} = 1$ represents a densely connected cluster pair (which we refer to as a bicluster), $W_{\sigma\sigma'} = 0$ represents a sparsely connected bicluster, and $\mathbf{1}$ is the column vector with all elements equal to unity.
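As a small illustration of the generative side of the model, the following Python sketch builds the affinity matrix from an indicator matrix via Eq. (1) and samples one graph instance. The function names (`affinity_matrix`, `sample_sbm`), the dense $N \times N$ sampling, and the numerical values are assumptions made for this example only; they are not taken from the paper, and a sparse sampler would be preferable for large $N$.

```python
import numpy as np

def affinity_matrix(W, omega_in, omega_out):
    """Eq. (1): omega = (omega_in - omega_out) W + omega_out 1 1^T."""
    W = np.asarray(W, dtype=float)
    q = W.shape[0]
    return (omega_in - omega_out) * W + omega_out * np.ones((q, q))

def sample_sbm(N, gamma, omega, rng):
    """Draw cluster labels, then a symmetric 0/1 adjacency matrix without self-loops."""
    sigma = rng.choice(len(gamma), size=N, p=gamma)
    probs = omega[np.ix_(sigma, sigma)]           # N x N connection probabilities
    upper = np.triu(rng.random((N, N)) < probs, k=1)  # one Bernoulli draw per pair i < j
    A = (upper | upper.T).astype(int)
    return sigma, A

# Hypothetical example: community structure with q = 2 equally sized clusters.
rng = np.random.default_rng(0)
N = 400
W = np.array([[1, 0], [0, 1]])
gamma = np.array([0.5, 0.5])
omega = affinity_matrix(W, omega_in=8.0 / N, omega_out=2.0 / N)  # O(1/N) scaling
sigma, A = sample_sbm(N, gamma, omega, rng)

expected_degree = N * gamma @ omega @ gamma   # N * gamma^T omega gamma
print(expected_degree, A.sum() / N)           # expected vs. empirical average degree
```

The dense sampler costs $O(N^2)$ memory, which is acceptable for a toy check like this one but not for the graph sizes used in the numerical experiments.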
This random graph ensemble can be regarded as a restricted version of the stochastic block model, or a generalized version of the planted partition model [24].

This affinity matrix contains the above community structure as a special case, and can express arbitrary modular patterns. Note that the indicator matrix $W$ can be regarded as a cluster-wise adjacency matrix, i.e., each planted cluster represents a coarse-grained vertex and a densely connected bicluster represents a bundled edge (a densely connected cluster constitutes a self-loop). We refer to the graph with adjacency matrix equal to $W$ as a module graph. Note that some matrices represent equivalent modular patterns; for example, Figs. 1(c) and 1(d) differ only by permutation. The average degree $c$ of this stochastic block model is $c = N \gamma^{\top} \omega \gamma$. By defining the strength of the modular structure by $\epsilon \equiv \omega_{\mathrm{out}} / \omega_{\mathrm{in}}$, we can express elements $\omega_{\mathrm{in}}$ and $\omega_{\mathrm{out}}$ as

$$\omega_{\mathrm{in}} = \frac{c}{N} \left[ (1 - \epsilon)\, \gamma^{\top} W \gamma + \epsilon \right]^{-1}, \qquad \omega_{\mathrm{out}} = \epsilon\, \omega_{\mathrm{in}}. \qquad (2)$$

FIG. 1. Affinity matrices of various modular structures, panels (a) to (d). The elements in gray have higher connection probabilities.

III. BAYESIAN INFERENCE OF THE STOCHASTIC BLOCK MODEL

We now consider the Bayesian inference of the modular structure using the stochastic block model. The prior probability $p(\sigma \mid \gamma)$ of cluster assignments is represented by a multinomial distribution of each planted cluster of fraction $\gamma_{\sigma}$, and the probability of independent and random connections between vertex pairs is represented by the product of Bernoulli distributions. Thus, the likelihood of the stochastic block model is

$$p(A, \sigma \mid \omega, \gamma, q) = p(A \mid \sigma, \omega, \gamma)\, p(\sigma \mid \gamma) = \prod_{i} \gamma_{\sigma_i} \prod_{i<j} \cdots$$
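The verbal description above (a multinomial prior over cluster assignments multiplied by independent Bernoulli factors for each vertex pair) can be turned into a short numerical sketch of the log-probability. The function name `sbm_log_likelihood` and the conventions that `A` is a symmetric 0/1 matrix without self-loops and that every entry of `omega` lies strictly between 0 and 1 are assumptions of this example, not statements from the paper.

```python
import numpy as np

def sbm_log_likelihood(A, sigma, omega, gamma):
    """log p(A, sigma | omega, gamma): multinomial prior plus Bernoulli pair terms.

    Assumes A is symmetric with zero diagonal and 0 < omega < 1 elementwise.
    """
    A = np.asarray(A)
    sigma = np.asarray(sigma)
    omega = np.asarray(omega, dtype=float)
    gamma = np.asarray(gamma, dtype=float)

    log_prior = np.sum(np.log(gamma[sigma]))       # sum_i log gamma_{sigma_i}
    i, j = np.triu_indices(A.shape[0], k=1)        # each unordered pair once
    p = omega[sigma[i], sigma[j]]                  # connection probability per pair
    a = A[i, j]
    log_pairs = np.sum(a * np.log(p) + (1 - a) * np.log1p(-p))
    return log_prior + log_pairs

# Hypothetical three-vertex example with a single edge between vertices 0 and 1.
A = np.array([[0, 1, 0],
              [1, 0, 0],
              [0, 0, 0]])
value = sbm_log_likelihood(A, [0, 0, 1],
                           omega=[[0.5, 0.1], [0.1, 0.5]],
                           gamma=[0.5, 0.5])
print(value)
```

Working in log space avoids underflow, which matters because the number of pair terms grows as $N^2$.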

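Let $\psi^{i}_{\sigma}$ be the marginal probability of cluster $\sigma$ for vertex $i$ calculated in the E-step ($\sum_{\sigma} \psi^{i}_{\sigma} = 1$), and $\psi^{i}$ be its row vector. Unfortunately, the exact computation of $\psi^{i}$ is demanding. To avoid this computational burden, we use BP [13, 36], which is justified for sparse graphs. Using tree approximation, the marginal probability $\psi^{i}$ can be estimated as

$$\psi^{i} = \frac{1}{Z^{i}}\, \gamma \circ \prod_{k \in \partial i} \left[ \mathbf{1} + \bar{\omega}_{\mathrm{in}}\, \psi^{k \to i} W \right] \circ \exp\!\left[ -\bar{\omega}_{\mathrm{in}}\, \omega_{\mathrm{out}} \sum_{\ell} \psi^{\ell} W \right], \qquad (6)$$

where $\mathbf{1}$ and $\psi^{k \to i}$ are the $q$-dimensional unit row-vector and the marginal probability for vertex $k$ without the contribution from edge $(k, i)$, respectively. The latter is often referred to as the cavity bias. $\circ$ and $\partial i$ represent the element-wise product (Hadamard product) and the set of neighboring vertices of vertex $i$, respectively, and $Z^{i}$ is the normalization factor. We also define

$$\bar{\omega}_{\mathrm{in}} \equiv \frac{\omega_{\mathrm{in}} - \omega_{\mathrm{out}}}{\omega_{\mathrm{out}}} = \epsilon^{-1} - 1. \qquad (7)$$

To obtain $\psi^{i \to j}$, we compute the following iterative equation, i.e., the BP update equation:

$$\psi^{i \to j} = \frac{1}{Z^{i \to j}}\, \gamma \circ \prod_{k \in \partial i \setminus j} \left[ \mathbf{1} + \bar{\omega}_{\mathrm{in}}\, \psi^{k \to i} W \right] \circ \exp\!\left[ -\bar{\omega}_{\mathrm{in}}\, \omega_{\mathrm{out}} \sum_{\ell} \psi^{\ell} W \right]. \qquad (8)$$

Analogously to (6), $Z^{i \to j}$ is the normalization factor. The BP update equation (8) can be formally written as

$$\psi^{i \to j} = F^{i \to j}\!\left[ \psi^{k \to i} W,\ \psi^{\ell} W \right], \qquad (9)$$

where $F^{i \to j}$ is the non-linear operator representing the right-hand side of (8). Note that $\psi^{i \to j} = F^{i \to j}\!\left[ \psi^{k \to i}, \psi^{\ell} \right]$ is essentially equivalent to the so-called mod-bp [37] (without degree correction). If we consider cavity biases $\Psi^{i \to j}$ of the transformed basis

$$\Psi^{i \to j} \equiv \psi^{i \to j} W, \qquad (10)$$

its update equation is

$$\Psi^{i \to j} = F^{i \to j}\!\left[ \Psi^{k \to i},\ \Psi^{\ell} \right] W. \qquad (11)$$

We can transform back to the original basis by operating $W^{-1}$ if it exists, or by operating $F^{i \to j}$.

In the M-step, the parameter estimates ($\hat{\gamma}$ and $\hat{\omega}$) are updated as

$$\hat{\gamma}_{\sigma} = \frac{1}{N} \sum_{i=1}^{N} \left\langle \delta_{\sigma \sigma_i} \right\rangle, \qquad (12)$$

$$\hat{\omega}_{\mathrm{in}} = \sum_{i<j} \cdots$$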
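To make the BP update (8) concrete, here is a minimal Python sketch of a single message evaluation. The function name `bp_message`, its argument layout, and the convention that the caller supplies the summed marginals `field` (the quantity $\sum_{\ell} \psi^{\ell}$) are assumptions for illustration; `omega_bar_in` stands for the ratio $(\omega_{\mathrm{in}} - \omega_{\mathrm{out}})/\omega_{\mathrm{out}}$ defined in Eq. (7). A full implementation would also need message scheduling and a convergence test, which are omitted here.

```python
import numpy as np

def bp_message(gamma, W, omega_bar_in, omega_out, incoming, field):
    """Evaluate the right-hand side of Eq. (8) for one directed edge i -> j.

    gamma    : prior cluster fractions (length q)
    W        : q x q indicator matrix
    incoming : list of cavity biases psi^{k->i}, one per k in (neighbors of i) minus j
    field    : sum over all vertices of the marginals psi^l (length q)
    """
    gamma = np.asarray(gamma, dtype=float)
    W = np.asarray(W, dtype=float)
    # External-field term from the non-edges.
    msg = gamma * np.exp(-omega_bar_in * omega_out * (np.asarray(field) @ W))
    # One factor [1 + omega_bar_in * psi^{k->i} W] per incoming neighbor.
    for psi_k in incoming:
        msg = msg * (1.0 + omega_bar_in * (np.asarray(psi_k) @ W))
    return msg / msg.sum()          # division by Z^{i->j}

# Hypothetical check: with equal cluster sizes and a regular module graph,
# uniform incoming messages should reproduce the uniform (factorized) state.
q = 3
W = np.ones((q, q)) - np.eye(q)     # every row sums to a = 2
uniform = np.full(q, 1.0 / q)
eps = 0.2
out = bp_message(uniform, W, omega_bar_in=1.0 / eps - 1.0, omega_out=1e-3,
                 incoming=[uniform, uniform], field=100 * uniform)
print(out)
```

The check at the end mirrors the factorized fixed point that the stability analysis perturbs around.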

We now analyze the detectability threshold for a given indicator matrix $W$. In the undetectable phase, BP converges to a trivial (uninformative) fixed point. When the graph reaches the detectable phase, the trivial fixed point becomes unstable, and BP converges to an informative fixed point instead. To see this stability, we first consider the propagation of perturbations on a vertex at the trivial fixed point. In the linear-response regime, it is dominated by the transfer matrix of (11)

$$T_{\sigma'\sigma} = \frac{\delta \Psi^{i \to j}_{\sigma}}{\delta \Psi^{k \to i}_{\sigma'}} = \frac{\bar{\omega}_{\mathrm{in}}}{1 + \bar{\omega}_{\mathrm{in}} \Psi^{k \to i}_{\sigma'}}\, \psi^{i \to j}_{\sigma'} \left( W_{\sigma'\sigma} - \Psi^{i \to j}_{\sigma} \right). \qquad (17)$$

We neglect the contribution due to $\bar{\omega}_{\mathrm{in}}\, \omega_{\mathrm{out}} \sum_{\ell} \Psi^{\ell}_{\tilde{\sigma}}$, because $\omega_{\mathrm{out}} = O(1/N)$.

Although the effect of the perturbation of a single vertex may be vanishingly small at a distant vertex, if the effect from all connected vertices adds to $O(1)$, the trivial fixed point is unstable. Under tree approximation, this is achieved when $c\nu^{2} > 1$, where $\nu$ is the leading eigenvalue of the transfer matrix $T$; the equality condition yields the detectability threshold. Note that investigating the detectability threshold for an arbitrary structure is difficult because the trivial fixed point is not always known. In the following, hence, we analyze some solvable cases.

FIG. 2. (Color online) Fraction of correctly classified vertices for the structure of Fig. 1(b). The size of the graph is N = 30 , c = 4, 5, and 6, respectively. The dashed vertical lines are the detectability thresholds predicted in (21) for c = 5 and 6. The shadows represent the standard deviations of 10 samples.

A. A solvable case

Let us consider the case where the clusters are equal in size, i.e., $\gamma_{\sigma} = 1/q$ for any $\sigma$, and the average degree of each cluster is also equal. That is,

$$\sum_{\sigma'} W_{\sigma\sigma'} = a \quad (a = \mathrm{const.}) \qquad (18)$$

for any $\sigma$. In other words, the module graph constitutes a regular graph. This is also assumed in Ref. [13]. In this case, the factorized state, i.e., $\psi^{i \to j}_{\sigma} = 1/q$ for any $i \to j$ and $\sigma$, is the trivial BP fixed point. Therefore, the transfer matrix $T$ at this fixed point is

$$T = \frac{\bar{\omega}_{\mathrm{in}}}{q + a \bar{\omega}_{\mathrm{in}}} \left( W - \frac{a}{q}\, \mathbf{1}\mathbf{1}^{\top} \right). \qquad (19)$$

Because $\mathbf{1}/\sqrt{q}$ is the leading eigenvector of $W$ with eigenvalue $a$, $\nu$ can be written as

$$\nu = \frac{\bar{\omega}_{\mathrm{in}}}{q + a \bar{\omega}_{\mathrm{in}}}\, \lambda_{2}, \qquad (20)$$

where $\lambda_{2}$ is the second leading eigenvalue of $W$ in magnitude. Thus, in terms of $\epsilon$, the detectability threshold is given by

$$\epsilon^{*} = \frac{|\lambda_{2}| \sqrt{c} - a}{|\lambda_{2}| \sqrt{c} - a + q}. \qquad (21)$$

The stochastic block model with a community structure has $a = 1$ and $\lambda_{2} = 1$, which reproduces a previously known result [13]. The threshold (21) indicates that as the number of densely connected clusters increases, the difficulty in inferring the structure also increases. In particular, when $c < (a/\lambda_{2})^{2}$, it is statistically impossible to infer the planted structure better than chance for any $\epsilon$. This behavior is shown in Fig. 2; when $c = 4$, no signal is retrieved even when the noise $\epsilon$ is (almost) zero.

The $\lambda_{2}$-dependency of the module graph in (21) is another notable feature. For an $a$-regular graph $G$, the second eigenvalue $\lambda_{2}$ of the adjacency matrix is bounded from below and above by the (normalized) edge expansion $h(G)$ as

$$1 - 2h(G) \le \frac{\lambda_{2}}{a} \le 1 - \frac{h(G)^{2}}{2}, \qquad (22)$$

which is known as Cheeger's inequality [39]. The edge expansion $h(G)$ is a measure of a sparse cut, defined by

$$h(G) = \min_{S} \frac{|E(S, V \setminus S)|}{a \min\{|S|, |V \setminus S|\}}, \qquad (23)$$

where $S$ is a subset of vertex set $V$ of the graph, and $|E(S, V \setminus S)|$ is the number of edges between sets $S$ and $V \setminus S$.
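As a numerical illustration of Eqs. (19) to (21), the sketch below computes the threshold for a symmetric indicator matrix with constant row sums and verifies that the instability condition (the average degree times the squared leading eigenvalue of $T$ equal to 1) holds with equality at that threshold. The function names, the choice to return 0.0 when no positive threshold exists, and the example matrices are assumptions of this sketch rather than part of the paper; the example matrices are not claimed to match any panel of Fig. 1.

```python
import numpy as np

def module_graph_spectrum(W):
    """Return (a, |lambda_2|) for a symmetric W with constant row sums a."""
    W = np.asarray(W, dtype=float)
    row_sums = W.sum(axis=1)
    a = row_sums[0]
    if not np.allclose(W, W.T) or not np.allclose(row_sums, a):
        raise ValueError("W must be symmetric with constant row sums")
    magnitudes = np.sort(np.abs(np.linalg.eigvalsh(W)))[::-1]
    return a, magnitudes[1]         # second largest eigenvalue in magnitude

def detectability_threshold(W, c):
    """Eq. (21); returns 0.0 when c <= (a / lambda_2)^2 (no detectable epsilon)."""
    a, lam2 = module_graph_spectrum(W)
    q = np.asarray(W).shape[0]
    x = max(lam2 * np.sqrt(c) - a, 0.0)
    return x / (x + q)

def leading_eigenvalue_of_T(W, eps):
    """Leading eigenvalue (in magnitude) of T in Eq. (19), with omega_bar_in = 1/eps - 1."""
    W = np.asarray(W, dtype=float)
    q = W.shape[0]
    a = W.sum(axis=1)[0]
    omega_bar_in = 1.0 / eps - 1.0
    T = omega_bar_in / (q + a * omega_bar_in) * (W - (a / q) * np.ones((q, q)))
    return np.max(np.abs(np.linalg.eigvalsh(T)))

# Community structure with q = 2 and c = 9: (3 - 1) / (3 - 1 + 2) = 0.5.
W_comm = np.eye(2)
eps_star = detectability_threshold(W_comm, c=9.0)
nu = leading_eigenvalue_of_T(W_comm, eps_star)
print(eps_star, 9.0 * nu ** 2)      # the second value should be close to 1

# A module graph with a = 2 and |lambda_2| = 1: undetectable for c <= 4.
W_dis = np.ones((3, 3)) - np.eye(3)
print(detectability_threshold(W_dis, c=4.0), detectability_threshold(W_dis, c=9.0))
```

Using `eigvalsh` relies on $W$ being symmetric, which is why the helper rejects asymmetric input instead of silently returning complex eigenvalues.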
The inequality (22) indicates that a module graph with no satisfactory sparse cut [large $h(G)$] tends to have a small value of the second eigenvalue $\lambda_{2}$: that is, the planted structure is difficult to infer. Put another way, if the graph has a strong hierarchical modular structure [40], its inference tends to be relatively easy. Note also that as long as the second eigenvalue is strictly positive, the detectability threshold is always positive for a sufficiently large average degree.

One might think that a different detectability threshold can be obtained if we instead use the flipped indicator matrix $\tilde{W} = \mathbf{1}\mathbf{1}^{\top} - W$ to parametrize noise strength as $\tilde{\epsilon} \equiv \epsilon^{-1}$, even though the structure to infer is the same. However, one can straightforwardly confirm that this treatment also yields threshold $\tilde{\epsilon}^{*}$ equal to (21).

B. Another solvable case

In the case where the factorized state is not a trivial BP fixed point, the calculation of the detectability threshold is difficult. Although it is rather a toy model example, there is another case where we can obtain the analytical expression for it.

Let $W$ be a matrix whose linearly independent columns are orthogonal to one another, e.g., Fig. 1(c). We set the prior distribution $\gamma$ so that $\gamma W \propto \mathbf{1}^{\top}$, and keep it fixed, i.e., we skip (12); for the structure in Fig. 1(c), we set $\gamma = (1/\ldots, \ldots/\ldots, \ldots/\ldots)$, and the transfer matrix takes the form

$$T = \frac{\bar{\omega}_{\mathrm{in}}}{\ldots(\ldots + \bar{\omega}_{\mathrm{in}})} \begin{pmatrix} \ldots \end{pmatrix} \qquad (24)$$

and the leading eigenvalue is $\nu = \bar{\omega}_{\mathrm{in}} (2 + \bar{\omega}_{\mathrm{in}})^{-1}$. The corresponding detectability threshold is

$$\epsilon^{*} = \frac{\sqrt{c} - 1}{\sqrt{c} + 1}. \qquad (25)$$

This threshold was compared with the numerical experiment in Fig. 3.

FIG. 3. (Color online) Fraction of correctly classified vertices for the structure of Fig. 1(c) with error bars. The dashed vertical and horizontal lines represent the estimate of the detectability threshold (25) and 1/3, respectively. The size of the graph is N = 30,000 with average degree c = 6 and each cluster has the same size. The shadow represents the standard deviation of 10 samples.

V. SUMMARY AND DISCUSSION

In this paper, we analyzed the detectability thresholds of general modular structures in the restricted graph ensembles. Although our results do not cover arbitrary structures, our solvable case analyses provide deeper insight into the nature of detectability. We showed that some structures are statistically impossible to infer (using BP in Sec. III), no matter how small the noise $\epsilon$ is. We also revealed that the detectability transition is connected to the hierarchical structure of clusters. Our results are not rigorous and may differ from the information-theoretic limits. Also, when the number of clusters is large, there often exists another phase called the hard phase [13]. These points are left as open questions for future research.

ACKNOWLEDGMENTS

The authors thank Jean-Gabriel Young for useful comments. This work was supported by JSPS KAKENHI No. 26011023 (T.K.) and No. 25120013 (Y.K.).

[1] M. Girvan and M. E. J. Newman, Proc. Natl. Acad. Sci. U.S.A., 7821 (2002).
[2] J. Shi and J. Malik, IEEE Trans. Pattern Anal. and Machine Intel., 888 (2000).
[3] J. Leskovec, K. J. Lang, D. Anirban, and M. W. Mahoney, Internet Math., 29 (2009).
[4] S. Fortunato, Phys. Rep., 75 (2010).
[5] J.-B. Leger, C. Vacher, and J.-J. Daudin, Stat. Comp., 675 (2014).
[6] P. W. Holland, K. B. Laskey, and S. Leinhardt, Soc. Netw., 109 (1983).
[7] J. Reichardt and M. Leone, Phys. Rev. Lett., 078701 (2008).
[8] R. R. Nadakuditi and M. E. J. Newman, Phys. Rev. Lett., 188701 (2012).
[9] F. Krzakala, C. Moore, E. Mossel, J. Neeman, A. Sly, L. Zdeborová, and P. Zhang, Proc. Natl. Acad. Sci. U.S.A., 20935 (2013).
[10] T. Kawamoto and Y. Kabashima, Phys. Rev. E, 062803 (2015).
[11] T. Kawamoto and Y. Kabashima, Eur. Phys. Lett., 40007 (2015).
[12] A. Decelle, F. Krzakala, C. Moore, and L. Zdeborová, Phys. Rev. Lett., 065701 (2011).
[13] A. Decelle, F. Krzakala, C. Moore, and L. Zdeborová, Phys. Rev. E, 066106 (2011).
[14] F. Radicchi, Phys. Rev. E, 010801 (2013).
[15] F. Radicchi, Eur. Phys. Lett., 38001 (2014).
[16] D. Hu, P. Ronhovde, and Z. Nussinov, Philos. Mag., 406 (2012).
[17] P. Ronhovde, D. Hu, and Z. Nussinov, Eur. Phys. Lett., 38006 (2012).
[18] G. V. Steeg, C. Moore, A. Galstyan, and A. Allahverdyan, Eur. Phys. Lett., 48004 (2014).
[19] P. Zhang, C. Moore, and L. Zdeborová, Phys. Rev. E, 052802 (2014).
[20] A. Ghasemian, P. Zhang, A. Clauset, C. Moore, and L. Peel, Phys. Rev. X, 031005 (2016).
[21] E. Mossel, J. Neeman, and A. Sly, Probab. Theory Relat. Fields, pp. 1–31 (2014).
[22] L. Massoulié, in Proceedings of the 46th Annual ACM Symposium on Theory of Computing (ACM, New York, 2014), STOC '14, pp. 694–703.
[23] J. Banks, C. Moore, J. Neeman, and P. Netrapalli, in (2016), pp. 383–416.
[24] A. Condon and R. M. Karp, Random Struct. Algorithms, 116 (2001).
[25] M. Onsjö and O. Watanabe, in Algorithms and Computation (Springer, New York, 2006), pp. 507–516.
[26] P. J. Bickel and A. Chen, Proc. Natl. Acad. Sci. U.S.A., 21068 (2009).
[27] K. Rohe, S. Chatterjee, and B. Yu, Ann. Stat., 1878 (2011).
[28] S.-Y. Yun and A. Proutiere, in COLT (2014), pp. 138–175.
[29] E. Abbe and C. Sandon, in (2015), pp. 670–688.
[30] E. Abbe and C. Sandon, in