Community Detection: Exact Recovery in Weighted Graphs
Mohammad Esmaeili and Aria Nosratinia
Department of Electrical and Computer Engineering, The University of Texas at Dallas
Email: {Esmaeili, Aria}@utdallas.edu
Abstract—In community detection, the exact recovery of communities (clusters) has been mainly investigated under the general stochastic block model with edges drawn from Bernoulli distributions. This paper considers the exact recovery of communities in a complete graph in which the graph edges are drawn from either a set of Gaussian distributions with community-dependent means and variances, or a set of exponential distributions with community-dependent means. For each case, we introduce a new semi-metric that describes sufficient and necessary conditions of exact recovery. The necessary and sufficient conditions are asymptotically tight. The analysis is also extended to incomplete, fully connected weighted graphs.
I. INTRODUCTION
A main thrust of the community detection literature has been the stochastic block model with graph edges drawn from Bernoulli distributions [1]–[7], under various recovery metrics [8]–[17] and algorithms [18]–[22]. The exact recovery threshold of the general stochastic block model was derived in [1] by approximating Binomial distributions with Poisson distributions and utilizing the Chernoff-Hellinger divergence.

While binary edges represent several practical applications and are analytically more tractable, there are many real-world graphs in which edge weights are better modeled by continuous values. For example, brain networks are intrinsically weighted, reflecting a continuous distribution of connectivity strengths between different brain regions [23]. Applications in communications, e.g., data forwarding in Delay Tolerant Networks (DTN) and worm containment in Online Social Networks (OSN) [24], are also well represented by continuous-valued weighted graphs. The edges of social media networks can be of different types, such as simple, weighted, directed, and multi-way (i.e., connecting more than two entities), depending on the network creation process [25]. In biology, community detection is applied to weighted gene networks to reveal cancers and anomalous tissues [26]. For these applications, the stochastic block model with continuous probability density functions, such as Gaussian distributions, is the more appropriate choice.

For community detection from continuous-valued weighted graphs, only a few information-theoretic results are known, mostly under Gaussian distributions. In [27], weak recovery and exact recovery of a hidden community are investigated while the edges are drawn from two different Gaussian distributions. This is a symmetric version of the submatrix localization (also known as noisy biclustering) problem [18], [28], [29]. In the submatrix localization problem, the task is to detect a small block (or blocks) with an atypical mean within a large Gaussian matrix.
Binary symmetric communities with Gaussian distributions are investigated in [30]. The problem of detecting a sparse principal component, based on a sample from a multivariate Gaussian distribution in high dimensions, is considered in [31].

Community detection in a more general setting similar to [1], under well-known continuous probability density functions, is an interesting and challenging problem from both algorithmic and information-theoretic perspectives. This paper investigates this problem and obtains information limits for exact recovery of communities.

The contributions of this paper are as follows. First, we analyze the exact recovery of node labels in a complete graph in which the edge weights are drawn from either a set of Gaussian or a set of exponential distributions whose parameters are determined by the latent labels. Under this model, sufficient and necessary conditions for exact recovery are derived. Second, we extend the results to fully connected but incomplete weighted graphs, by showing that under some conditions the inter- and intra-community probability distributions can be approximated by Gaussian distributions. The contributions of this paper and the techniques used here are widely applicable to other high-dimensional inference problems such as sparse PCA, Gaussian mixture clustering, tensor PCA, and other community detection problems with continuous distributions.

II. SYSTEM MODEL & MAIN RESULTS
Notation: $\mathbb{P}$ indicates the probability operator and $P$ a probability distribution, which is identified by the choice of its variables whenever there is no confusion. A matrix $A$ has columns $A_i$ and elements $A_{ij}$. $\mathbb{R}$ is the set of real numbers, $\mathbb{R}_+$ is the set of non-negative real numbers, and $\mathbb{R}_{++}$ is the set of positive real numbers.

We start by considering a complete graph with $n$ nodes. The graph nodes are divided into $K$ communities, where $K$ is finite. Let $Q$ be a $K \times K$ matrix with entries $Q_{ij}$. A node from community $i$ is connected to a node in community $j$ by a weighted edge drawn from distribution $Q_{ij}$. In this paper, $Q_{ij}$ belongs to either a set of Gaussian or a set of exponential distributions. Let $\mathbf{p} \triangleq [p_1, p_2, \cdots, p_K]$, where $p_i$ denotes the size of community $i$. It is assumed that the size of each community is proportional to $n$, i.e., $p_i = \lfloor \rho_i n \rfloor$, where $\rho_i \in (0,1)$ and $\sum_{i=1}^{K} \rho_i = 1$.

When $Q_{ij}$ belongs to the set of Gaussian distributions, $Q_{ij} = \mathcal{N}(\bar{\mu}_{ij}, \bar{\sigma}_{ij}^2)$. For this case, we define matrices $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ with entries $\mu_{ij} = p_i \bar{\mu}_{ij}$ and $\Sigma_{ij} = p_i \bar{\sigma}_{ij}^2$, respectively. When $Q_{ij}$ belongs to the set of exponential distributions, $Q_{ij} = \mathrm{Exp}(\tilde{\lambda}_{ij})$. For this case, we define a matrix $\boldsymbol{\lambda}$ with entries $\lambda_{ij} = \tilde{\lambda}_{ij}$.

Under the model with Gaussian distributions, assume that each edge is removed by a Bernoulli random variable. Then an edge from a node in community $i$ to a node in community $j$ is removed with probability $1 - \theta_{ij}$. To have a fully connected graph, we consider a regime in which $\theta_{ij} = c_{ij} \frac{\log n}{n}$, where $c_{ij}$ is a constant. For this case, we define a matrix $\boldsymbol{\theta}$ with entries $\theta_{ij}$. In this paper, this model is called an incomplete but fully connected weighted graph with Gaussian distributions.

Now, we summarize the main results of this paper.
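Before stating the results, the Gaussian-weight sampling model above can be illustrated concretely. The following is a minimal sketch with hypothetical parameters (the community fractions, means, and variances are arbitrary choices for illustration, not values from this paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters: K = 2 communities, n = 200 nodes.
n = 200
rho = np.array([0.6, 0.4])                  # community fractions, sum to 1
mu_bar = np.array([[2.0, 0.0],              # community-dependent edge means
                   [0.0, 2.0]])
sigma_bar = np.array([[1.0, 1.0],           # community-dependent edge std devs
                      [1.0, 1.5]])

# Community sizes p_i = floor(rho_i * n); labels assigned in blocks.
sizes = np.floor(rho * n).astype(int)
labels = np.repeat(np.arange(len(rho)), sizes)

# Edge (u, v) carries a weight drawn from N(mu_bar[i, j], sigma_bar[i, j]^2),
# where i, j are the communities of u and v; symmetrize, no self-loops.
W = rng.normal(mu_bar[labels][:, labels], sigma_bar[labels][:, labels])
W = np.triu(W, 1)
W = W + W.T
```

For any node `u`, the per-community weight sums used later in the proofs can then be read off as `[W[u, labels == j].sum() for j in range(len(rho))]`.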
For convenience, define the following semi-metrics:
$$D_g(\tilde{\boldsymbol{\mu}}, \hat{\boldsymbol{\mu}}, \tilde{\boldsymbol{\Sigma}}, \hat{\boldsymbol{\Sigma}}) \triangleq \max_{t \in [0,1]} \sum_{k=1}^{K} \Bigg\{ \frac{\tilde{\mu}_k^2 \hat{\Sigma}_k t + \hat{\mu}_k^2 \tilde{\Sigma}_k (1-t)}{2 \tilde{\Sigma}_k \hat{\Sigma}_k} - \frac{\big[\tilde{\mu}_k \hat{\Sigma}_k t + \hat{\mu}_k \tilde{\Sigma}_k (1-t)\big]^2}{2 \tilde{\Sigma}_k \hat{\Sigma}_k \big[\hat{\Sigma}_k t + \tilde{\Sigma}_k (1-t)\big]} - \frac{1}{2} \log\Bigg( \frac{\tilde{\Sigma}_k^{1-t} \hat{\Sigma}_k^{t}}{\tilde{\Sigma}_k (1-t) + \hat{\Sigma}_k t} \Bigg) \Bigg\},$$

$$D_e(\tilde{\boldsymbol{\lambda}}, \hat{\boldsymbol{\lambda}}, \mathbf{p}) \triangleq \max_{t \in [0,1]} \sum_{k=1}^{K} p_k \log\Bigg( \frac{(1-t)\tilde{\lambda}_k + t\hat{\lambda}_k}{\tilde{\lambda}_k^{1-t} \hat{\lambda}_k^{t}} \Bigg).$$

Theorem 1.
With Gaussian distributions,
• when $D_g(\mu_i, \mu_j, \Sigma_i, \Sigma_j) = \omega(\log n)$, exact recovery of node labels is possible if and only if
$$\min_{i,j,\, i \neq j} D_g(\mu_i, \mu_j, \Sigma_i, \Sigma_j) > 0.$$
• when $D_g(\mu_i, \mu_j, \Sigma_i, \Sigma_j) = O(\log n)$, exact recovery of node labels is possible if and only if
$$\min_{i,j,\, i \neq j} \lim_{n \to \infty} \frac{D_g(\mu_i, \mu_j, \Sigma_i, \Sigma_j)}{\log n} > 1.$$

Theorem 2.
With exponential distributions,
• when $D_e(\lambda_i, \lambda_j, \mathbf{p}) = \omega(\log n)$, exact recovery of node labels is possible if and only if
$$\min_{i,j,\, i \neq j} D_e(\lambda_i, \lambda_j, \mathbf{p}) > 0.$$
• when $D_e(\lambda_i, \lambda_j, \mathbf{p}) = O(\log n)$, exact recovery of node labels is possible if and only if
$$\min_{i,j,\, i \neq j} \lim_{n \to \infty} \frac{D_e(\lambda_i, \lambda_j, \mathbf{p})}{\log n} > 1.$$

Remark 1.
For both the Gaussian and the exponential cases, when the related semi-metric is $\omega(\log n)$, the exact recovery condition is equivalent to $Q_i \neq Q_j$ for all $i \neq j$.

Corollary 1.
For a fully connected but incomplete weighted graph whose edge weights are Gaussian distributed, exact recovery of node labels is possible if and only if
$$\min_{i,j,\, i \neq j} \lim_{n \to \infty} \frac{D_g(\mu_i, \mu_j, \Sigma_i, \Sigma_j)}{\log n} > 1,$$
where
$$\mu_{ij} = p_i \bar{\mu}_{ij} \theta_{ij} \quad \forall i,j, \qquad \Sigma_{ij} = p_i \theta_{ij} \big[\bar{\sigma}_{ij}^2 + (1 - \theta_{ij}) \bar{\mu}_{ij}^2\big] \quad \forall i,j.$$

III. PROOFS
At each node, our problem is equivalent to testing a hypothesis $H$ indicating which community the node belongs to, out of the set of $K$ communities. In our setting, this is a Bayesian problem with prior $\mathbb{P}(H = i) = \rho_i$. For each node, let $W$ be a random vector with entries $W_i$ representing the summation of edge weights connecting the node of interest to nodes in community $i$.

Assume that all node labels are revealed except for one, whose community membership $H$ is to be derived based on an observation of $W$. The maximum a posteriori (MAP) estimator is $\arg\max_i \rho_i P(w \mid H = i)$. A simple comparison can eliminate a candidate, i.e., if
$$\rho_i P(w \mid H = i) < \rho_k P(w \mid H = k), \qquad (1)$$
then $H \neq i$. Therefore, a set of pairwise comparisons of the hypotheses reveals the MAP. Assume that the true hypothesis is $H = i$. Denote by $B(i,k)$ the region of $W$ for which (1) is satisfied, i.e., $H = i$ has a worse metric compared with $H = k$. Also denote by $B(i)$ the region of $W$ where the overall MAP estimator is in error. Then the probability of error is
$$P_e = \sum_i \rho_i \, \mathbb{P}(W \in B(i) \mid H = i).$$
Since $B(i) \subset \bigcup_{k=1}^{K} B(i,k)$,
$$P_e \le \sum_i \sum_{k,\, k \neq i} \rho_i \, \mathbb{P}(W \in B(i,k) \mid H = i). \qquad (2)$$
Define
$$I(w, i, k) \triangleq \min\{\rho_i P(w \mid H = i), \, \rho_k P(w \mid H = k)\},$$
and note that
$$I(w, i, k) = \begin{cases} \rho_i P(w \mid H = i) & \text{when } W \in B(i,k), \\ \rho_k P(w \mid H = k) & \text{when } W \in B^c(i,k). \end{cases} \qquad (3)$$
Substituting (3) into (2),
$$P_e \le \int \sum_i \sum_{k > i} I(w, i, k) \, dw. \qquad (4)$$
The error is bounded from below by
$$P_e \ge \frac{1}{K-1} \int \sum_i \sum_{k > i} I(w, i, k) \, dw, \qquad (5)$$
because
$$\sum_{k,\, k \neq i} \mathbb{P}(W \in B(i,k) \mid H = i) \le (K-1) \, \mathbb{P}(W \in B(i) \mid H = i).$$
Therefore, the error probability is bounded by controlling
$$\int I(w, i, k) \, dw. \qquad (6)$$

A. Proof of Theorem 1
For a node in community $i$, the edge sums $W_j$ are distributed according to $\mathcal{N}(p_j \bar{\mu}_{ji}, p_j \bar{\sigma}_{ji}^2)$, and are independent of each other. We collect these edge sums into the vector $W$, which obeys a multivariate Gaussian distribution with mean denoted $\mu_i$ and covariance matrix $\mathrm{diag}(\Sigma_i)$. Then
$$f(w; \mu_i, \Sigma_i) \triangleq P(w \mid H = i) = \prod_{k=1}^{K} \frac{1}{\sqrt{2\pi \Sigma_{ki}}} \exp\left( -\frac{(w_k - \mu_{ki})^2}{2 \Sigma_{ki}} \right),$$
where $\mu_{ki} = p_k \bar{\mu}_{ki}$ and $\Sigma_{ki} = p_k \bar{\sigma}_{ki}^2$.

Lemma 1.
Let $\tilde{\mu}, \hat{\mu} \in \mathbb{R}^K$, $\tilde{\Sigma}, \hat{\Sigma} \in \mathbb{R}_{++}^K$, and $\tilde{\rho}, \hat{\rho} \in \mathbb{R}_{++}$. If either $\tilde{\mu} \neq \hat{\mu}$ or $\tilde{\Sigma} \neq \hat{\Sigma}$, then
$$\int_{\mathbb{R}^K} \min\{\tilde{\rho} f(w; \tilde{\mu}, \tilde{\Sigma}), \, \hat{\rho} f(w; \hat{\mu}, \hat{\Sigma})\} \, dw \le e^{-D_g(\tilde{\mu}, \hat{\mu}, \tilde{\Sigma}, \hat{\Sigma}) + c_1},$$
$$\int_{\mathbb{R}^K} \min\{\tilde{\rho} f(w; \tilde{\mu}, \tilde{\Sigma}), \, \hat{\rho} f(w; \hat{\mu}, \hat{\Sigma})\} \, dw \ge e^{-D_g(\tilde{\mu}, \hat{\mu}, \tilde{\Sigma}, \hat{\Sigma}) + c_2},$$
where $c_1$ and $c_2$ are some constants.

Proof. Define
$$g_1(t) \triangleq \left( \frac{f(w; \tilde{\mu}, \tilde{\Sigma})}{f(w; \hat{\mu}, \hat{\Sigma})} \right)^{1-t}, \quad g_2(t) \triangleq \left( \frac{f(w; \hat{\mu}, \hat{\Sigma})}{f(w; \tilde{\mu}, \tilde{\Sigma})} \right)^{t}, \quad g_3(t) \triangleq f(w; \tilde{\mu}, \tilde{\Sigma})^{t} f(w; \hat{\mu}, \hat{\Sigma})^{1-t},$$
in which the dependence of $g_1(t)$, $g_2(t)$, and $g_3(t)$ on $w$ is suppressed for notational convenience. Note that $g_3(t)$ can be restated as
$$g_3(t) = e^{-\sum_{k=1}^{K} D_k(t)} \prod_{k=1}^{K} \frac{1}{\sqrt{2\pi \sigma_k(t)}} \exp\left( -\frac{(w_k - \mu_k(t))^2}{2 \sigma_k(t)} \right),$$
where
$$\mu_k(t) \triangleq \frac{\tilde{\mu}_k \hat{\Sigma}_k t + \hat{\mu}_k \tilde{\Sigma}_k (1-t)}{\hat{\Sigma}_k t + \tilde{\Sigma}_k (1-t)}, \qquad \sigma_k(t) \triangleq \frac{\tilde{\Sigma}_k \hat{\Sigma}_k}{\hat{\Sigma}_k t + \tilde{\Sigma}_k (1-t)},$$
$$D_k(t) \triangleq \frac{\tilde{\mu}_k^2 \hat{\Sigma}_k t + \hat{\mu}_k^2 \tilde{\Sigma}_k (1-t)}{2 \tilde{\Sigma}_k \hat{\Sigma}_k} - \frac{\big[\tilde{\mu}_k \hat{\Sigma}_k t + \hat{\mu}_k \tilde{\Sigma}_k (1-t)\big]^2}{2 \tilde{\Sigma}_k \hat{\Sigma}_k \big[\hat{\Sigma}_k t + \tilde{\Sigma}_k (1-t)\big]} - \frac{1}{2} \log\left( \frac{\tilde{\Sigma}_k^{1-t} \hat{\Sigma}_k^{t}}{\tilde{\Sigma}_k (1-t) + \hat{\Sigma}_k t} \right).$$

Lemma 2.
For any $t \in [0,1]$, $\min\{g_1(t), g_2(t)\} \le 1$.

Proof. Both $g_1(t)$ and $g_2(t)$ are monotonic in $t$, and their ratio $g_1(t)/g_2(t)$ does not depend on $t$; thus $\min\{g_1(t), g_2(t)\}$ is also monotonic in $t$. Since $g_1(1) = g_2(0) = 1$, for all $t \in [0,1]$ we have $\min\{g_1(t), g_2(t)\} \le 1$.

It can be shown that for any $t \in [0,1]$,
$$\int_{\mathbb{R}^K} \min\{\tilde{\rho} f(w; \tilde{\mu}, \tilde{\Sigma}), \, \hat{\rho} f(w; \hat{\mu}, \hat{\Sigma})\} \, dw \le \max\{\tilde{\rho}, \hat{\rho}\} \int_{\mathbb{R}^K} \min\{f(w; \tilde{\mu}, \tilde{\Sigma}), \, f(w; \hat{\mu}, \hat{\Sigma})\} \, dw = \max\{\tilde{\rho}, \hat{\rho}\} \int_{\mathbb{R}^K} g_3(t) \min\{g_1(t), g_2(t)\} \, dw \le \max\{\tilde{\rho}, \hat{\rho}\} \, e^{-\sum_{k=1}^{K} D_k(t)},$$
where the last inequality holds due to Lemma 2 and
$$\int_{\mathbb{R}^K} \prod_{k=1}^{K} \frac{1}{\sqrt{2\pi \sigma_k(t)}} \exp\left( -\frac{(w_k - \mu_k(t))^2}{2\sigma_k(t)} \right) dw = 1.$$
When $t = t^*$ is chosen to minimize $e^{-\sum_{k=1}^{K} D_k(t)}$,
$$\int_{\mathbb{R}^K} \min\{\tilde{\rho} f(w; \tilde{\mu}, \tilde{\Sigma}), \, \hat{\rho} f(w; \hat{\mu}, \hat{\Sigma})\} \, dw \le \max\{\tilde{\rho}, \hat{\rho}\} \, e^{-D_g(\tilde{\mu}, \hat{\mu}, \tilde{\Sigma}, \hat{\Sigma})}.$$

To prove the second half, note that
$$\min\{g_1(t^*), g_2(t^*)\} = g_2(\tau^*), \qquad (7)$$
where $\tau^* \triangleq t^*$ if $\min\{g_1(t^*), g_2(t^*)\} = g_2(t^*)$; otherwise $\tau^* \triangleq t^* - 1$. Hence, at $t^*$,
$$\int_{\mathbb{R}^K} \min\{\tilde{\rho} f(w; \tilde{\mu}, \tilde{\Sigma}), \, \hat{\rho} f(w; \hat{\mu}, \hat{\Sigma})\} \, dw \ge \min\{\tilde{\rho}, \hat{\rho}\} \int_{\mathbb{R}^K} \min\{f(w; \tilde{\mu}, \tilde{\Sigma}), \, f(w; \hat{\mu}, \hat{\Sigma})\} \, dw = \min\{\tilde{\rho}, \hat{\rho}\} \int_{\mathbb{R}^K} g_3(t^*) \min\{g_1(t^*), g_2(t^*)\} \, dw = \min\{\tilde{\rho}, \hat{\rho}\} \, e^{-D_g(\tilde{\mu}, \hat{\mu}, \tilde{\Sigma}, \hat{\Sigma})} \int_{\mathbb{R}^K} \prod_{k=1}^{K} \frac{1}{\sqrt{2\pi\sigma_k(t^*)}} e^{-\frac{(w_k - \mu_k(t^*))^2}{2\sigma_k(t^*)}} \, g(w_k, \tau^*) \, dw,$$
where
$$g(w_k, \tau^*) = \left( \frac{\tilde{\Sigma}_k}{\hat{\Sigma}_k} \right)^{\tau^*/2} e^{-\left[ \frac{(w_k - \hat{\mu}_k)^2}{2\hat{\Sigma}_k} - \frac{(w_k - \tilde{\mu}_k)^2}{2\tilde{\Sigma}_k} \right] \tau^*}.$$
Since $g(w_k, \tau^*)$ is a non-negative and integrable function of $w_k$, applying a generalized variant of the mean value theorem, there exists $w_k^*$ such that
$$\int_{\mathbb{R}} \frac{1}{\sqrt{2\pi\sigma_k(t^*)}} e^{-\frac{(w_k - \mu_k(t^*))^2}{2\sigma_k(t^*)}} \, g(w_k, \tau^*) \, dw_k = g(w_k^*, \tau^*).$$
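As an aside, the two-sided bound of Lemma 1 can be checked numerically in the scalar case $K = 1$ (a hypothetical sanity check with arbitrary parameters, not part of the proof): the integral of the pointwise minimum should not exceed $\max\{\tilde{\rho},\hat{\rho}\}e^{-D_g}$, and $-\log$ of it should match $D_g$ up to an additive constant.

```python
import numpy as np

# Hypothetical K = 1 parameters for the two populations (tilde vs hat).
mu_t, mu_h = 0.0, 3.0        # means
S_t, S_h = 1.0, 2.0          # variances
rho_t, rho_h = 0.5, 0.5      # priors

def D(t):
    # Per-coordinate exponent D_k(t), specialized to K = 1.
    t1 = (mu_t**2 * S_h * t + mu_h**2 * S_t * (1 - t)) / (2 * S_t * S_h)
    t2 = (mu_t * S_h * t + mu_h * S_t * (1 - t)) ** 2 \
         / (2 * S_t * S_h * (S_h * t + S_t * (1 - t)))
    t3 = 0.5 * np.log(S_t ** (1 - t) * S_h ** t / (S_t * (1 - t) + S_h * t))
    return t1 - t2 - t3

ts = np.linspace(0.0, 1.0, 2001)
D_g = D(ts).max()                      # semi-metric: max over t in [0, 1]

# Quadrature for the left-hand side of Lemma 1 on a fine grid.
w = np.linspace(-25.0, 28.0, 400001)
f_t = np.exp(-(w - mu_t) ** 2 / (2 * S_t)) / np.sqrt(2 * np.pi * S_t)
f_h = np.exp(-(w - mu_h) ** 2 / (2 * S_h)) / np.sqrt(2 * np.pi * S_h)
I = np.minimum(rho_t * f_t, rho_h * f_h).sum() * (w[1] - w[0])

# Upper bound from the proof: I <= max{rho} * exp(-D_g).
assert I <= max(rho_t, rho_h) * np.exp(-D_g)
```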
It can be shown that at $\tau^*$, $g(w_k^*, \tau^*)$ is a positive constant. Therefore,
$$\int_{\mathbb{R}^K} \min\{\tilde{\rho} f(w; \tilde{\mu}, \tilde{\Sigma}), \, \hat{\rho} f(w; \hat{\mu}, \hat{\Sigma})\} \, dw \ge \min\{\tilde{\rho}, \hat{\rho}\} \, e^{-D_g(\tilde{\mu}, \hat{\mu}, \tilde{\Sigma}, \hat{\Sigma}) + c},$$
where $c$ is a constant.

Using Lemma 1 and the bounds (4) and (5), for some constants $c_1$ and $c_2$,
$$P_e \le e^{-D_g(\mu_i, \mu_j, \Sigma_i, \Sigma_j) + c_1}, \qquad P_e \ge e^{-D_g(\mu_i, \mu_j, \Sigma_i, \Sigma_j) + c_2}.$$
When $D_g(\mu_i, \mu_j, \Sigma_i, \Sigma_j) = \omega(\log n)$, as $n$ goes to infinity, exact recovery is possible if and only if
$$\min_{i,j,\, i \neq j} D_g(\mu_i, \mu_j, \Sigma_i, \Sigma_j) > 0.$$
If $\mu_i$ is close to $\mu_j$ and $\Sigma_i$ is close to $\Sigma_j$, then $D_g(\mu_i, \mu_j, \Sigma_i, \Sigma_j) = O(\log n)$. In this regime, exact recovery is possible if and only if
$$\min_{i,j,\, i \neq j} \lim_{n \to \infty} \frac{D_g(\mu_i, \mu_j, \Sigma_i, \Sigma_j)}{\log n} > 1.$$

B. Proof of Theorem 2
If the node of interest belongs to community $i$, then $W_j$ is distributed according to $\mathrm{Gamma}(p_j, \lambda_{ji})$. The vector $W$ has independent Gamma entries with different means $p_j / \lambda_{ji}$. Under $H = i$, the random vector $W$ is drawn from a multivariate Gamma distribution with shape parameter $\mathbf{p}$ and rate parameter $\lambda_i \in \mathbb{R}_{++}^K$. Then
$$f(w; \mathbf{p}, \lambda_i) \triangleq P(w \mid H = i) = \prod_{k=1}^{K} \frac{\lambda_{ki}^{p_k}}{\Gamma(p_k)} \, w_k^{p_k - 1} e^{-\lambda_{ki} w_k}.$$

Lemma 3.
Let $\tilde{\lambda}, \hat{\lambda} \in \mathbb{R}_{++}^K$, $\mathbf{p} \in \mathbb{R}_{++}^K$, and $\tilde{\rho}, \hat{\rho} \in \mathbb{R}_{++}$. If $\tilde{\lambda} \neq \hat{\lambda}$,
$$\int_{\mathbb{R}_+^K} \min\{\tilde{\rho} f(w; \mathbf{p}, \tilde{\lambda}), \, \hat{\rho} f(w; \mathbf{p}, \hat{\lambda})\} \, dw \le e^{-D_e(\tilde{\lambda}, \hat{\lambda}, \mathbf{p}) + c_1},$$
$$\int_{\mathbb{R}_+^K} \min\{\tilde{\rho} f(w; \mathbf{p}, \tilde{\lambda}), \, \hat{\rho} f(w; \mathbf{p}, \hat{\lambda})\} \, dw \ge e^{-D_e(\tilde{\lambda}, \hat{\lambda}, \mathbf{p}) + c_2},$$
where $c_1$ and $c_2$ are some constants.

Proof. Define
$$g_1(t) \triangleq \left( \frac{f(w; \mathbf{p}, \hat{\lambda})}{f(w; \mathbf{p}, \tilde{\lambda})} \right)^{1-t}, \quad g_2(t) \triangleq \left( \frac{f(w; \mathbf{p}, \tilde{\lambda})}{f(w; \mathbf{p}, \hat{\lambda})} \right)^{t}, \quad g_3(t) \triangleq f(w; \mathbf{p}, \tilde{\lambda})^{1-t} f(w; \mathbf{p}, \hat{\lambda})^{t},$$
in which the dependence of $g_1(t)$, $g_2(t)$, and $g_3(t)$ on $w$ is suppressed for notational convenience. Notice that Lemma 2 also holds in this case. For any $t \in [0,1]$,
$$\int_{\mathbb{R}_+^K} \min\{\tilde{\rho} f(w; \mathbf{p}, \tilde{\lambda}), \, \hat{\rho} f(w; \mathbf{p}, \hat{\lambda})\} \, dw \le \max\{\tilde{\rho}, \hat{\rho}\} \int_{\mathbb{R}_+^K} \min\{f(w; \mathbf{p}, \tilde{\lambda}), \, f(w; \mathbf{p}, \hat{\lambda})\} \, dw = \max\{\tilde{\rho}, \hat{\rho}\} \int_{\mathbb{R}_+^K} g_3(t) \min\{g_1(t), g_2(t)\} \, dw \le \max\{\tilde{\rho}, \hat{\rho}\} \, e^{-\sum_{k=1}^{K} p_k \log\left( \frac{(1-t)\tilde{\lambda}_k + t\hat{\lambda}_k}{\tilde{\lambda}_k^{1-t} \hat{\lambda}_k^{t}} \right)},$$
where the last inequality holds due to Lemma 2 and
$$\int_{\mathbb{R}_+^K} \prod_{k=1}^{K} \frac{(\lambda_k(t))^{p_k}}{\Gamma(p_k)} \, w_k^{p_k - 1} e^{-w_k \lambda_k(t)} \, dw = 1,$$
where $\lambda_k(t) \triangleq (1-t)\tilde{\lambda}_k + t\hat{\lambda}_k$. When $t = t^*$ is chosen to maximize $\sum_{k=1}^{K} p_k \log\left( \frac{(1-t)\tilde{\lambda}_k + t\hat{\lambda}_k}{\tilde{\lambda}_k^{1-t} \hat{\lambda}_k^{t}} \right)$,
$$\int_{\mathbb{R}_+^K} \min\{\tilde{\rho} f(w; \mathbf{p}, \tilde{\lambda}), \, \hat{\rho} f(w; \mathbf{p}, \hat{\lambda})\} \, dw \le \max\{\tilde{\rho}, \hat{\rho}\} \, e^{-D_e(\tilde{\lambda}, \hat{\lambda}, \mathbf{p})}.$$
Notice that (7) also holds in this case.
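Before completing the lower bound, the upper bound just derived can be sanity-checked numerically for $K = 1$ (a hypothetical example with arbitrary rates and an integer shape, not part of the proof):

```python
import numpy as np
from math import gamma

# Hypothetical K = 1 parameters: shape p, rates (tilde vs hat), priors.
p = 2
lam_t, lam_h = 1.0, 3.0
rho_t, rho_h = 0.5, 0.5

# D_e specialized to K = 1: max over t in [0, 1] of
# p * log(((1-t)*lam_t + t*lam_h) / (lam_t**(1-t) * lam_h**t)).
ts = np.linspace(0.0, 1.0, 2001)
vals = p * np.log(((1 - ts) * lam_t + ts * lam_h)
                  / (lam_t ** (1 - ts) * lam_h ** ts))
D_e = vals.max()

# Gamma(p, lam) densities on a fine grid over the positive reals.
w = np.linspace(1e-9, 40.0, 400001)
f_t = lam_t ** p / gamma(p) * w ** (p - 1) * np.exp(-lam_t * w)
f_h = lam_h ** p / gamma(p) * w ** (p - 1) * np.exp(-lam_h * w)
I = np.minimum(rho_t * f_t, rho_h * f_h).sum() * (w[1] - w[0])

# Upper bound from the proof: I <= max{rho} * exp(-D_e).
assert I <= max(rho_t, rho_h) * np.exp(-D_e)
```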
Hence, at $t^*$,
$$\int_{\mathbb{R}_+^K} \min\{\tilde{\rho} f(w; \mathbf{p}, \tilde{\lambda}), \, \hat{\rho} f(w; \mathbf{p}, \hat{\lambda})\} \, dw \ge \min\{\tilde{\rho}, \hat{\rho}\} \int_{\mathbb{R}_+^K} \min\{f(w; \mathbf{p}, \tilde{\lambda}), \, f(w; \mathbf{p}, \hat{\lambda})\} \, dw = \min\{\tilde{\rho}, \hat{\rho}\} \int_{\mathbb{R}_+^K} g_3(t^*) \min\{g_1(t^*), g_2(t^*)\} \, dw = \min\{\tilde{\rho}, \hat{\rho}\} \, e^{-D_e(\tilde{\lambda}, \hat{\lambda}, \mathbf{p})} \int_{\mathbb{R}_+^K} \prod_{k=1}^{K} \frac{(\lambda_k(t^*))^{p_k}}{\Gamma(p_k)} \, w_k^{p_k - 1} e^{-w_k \lambda_k(t^*)} \, g(w_k, \tau^*) \, dw,$$
where
$$g(w_k, \tau^*) = \left( \frac{\tilde{\lambda}_k}{\hat{\lambda}_k} \right)^{p_k \tau^*} e^{-w_k \tau^* (\tilde{\lambda}_k - \hat{\lambda}_k)}.$$
Since $g(w_k, \tau^*)$ is a non-negative and integrable function of $w_k$, applying a generalized variant of the mean value theorem, there exists $w_k^*$ such that
$$\int_{\mathbb{R}_+} \frac{(\lambda_k(t^*))^{p_k}}{\Gamma(p_k)} \, w_k^{p_k - 1} e^{-w_k \lambda_k(t^*)} \, g(w_k, \tau^*) \, dw_k = g(w_k^*, \tau^*).$$
It can be shown that at $\tau^*$, $g(w_k^*, \tau^*)$ is a positive constant. Therefore,
$$\int_{\mathbb{R}_+^K} \min\{\tilde{\rho} f(w; \mathbf{p}, \tilde{\lambda}), \, \hat{\rho} f(w; \mathbf{p}, \hat{\lambda})\} \, dw \ge \min\{\tilde{\rho}, \hat{\rho}\} \, e^{-D_e(\tilde{\lambda}, \hat{\lambda}, \mathbf{p}) + c},$$
where $c$ is a constant.

Using Lemma 3 and the bounds (4) and (5), for some constants $c_1$ and $c_2$,
$$P_e \le e^{-D_e(\lambda_i, \lambda_j, \mathbf{p}) + c_1}, \qquad P_e \ge e^{-D_e(\lambda_i, \lambda_j, \mathbf{p}) + c_2}.$$
When $D_e(\lambda_i, \lambda_j, \mathbf{p}) = \omega(\log n)$, as $n$ goes to infinity, exact recovery is possible if and only if
$$\min_{i,j,\, i \neq j} D_e(\lambda_i, \lambda_j, \mathbf{p}) > 0.$$
If $\lambda_i$ is close to $\lambda_j$, then $D_e(\lambda_i, \lambda_j, \mathbf{p}) = O(\log n)$. In this regime, $\lim_{n \to \infty} D_e(\lambda_i, \lambda_j, \mathbf{p}) / \log n$ is a constant and exact recovery is possible if and only if
$$\min_{i,j,\, i \neq j} \lim_{n \to \infty} \frac{D_e(\lambda_i, \lambda_j, \mathbf{p})}{\log n} > 1.$$

IV. INCOMPLETE BUT FULLY CONNECTED WEIGHTED GRAPHS
Let $X \sim \mathrm{Bern}(\theta)$ and $Y \sim \mathcal{N}(\mu, \sigma^2)$. Then $Z \triangleq XY$ is a random variable with probability density function
$$f_Z(z) = \theta f_Y(z) + (1 - \theta) \delta(z),$$
where $f_Y(y)$ is the probability density function of $Y$ and $\delta(z)$ is the Dirac delta function. Then the probability density function of $\sum_{i=1}^{n} Z_i$ is
$$\sum_{i=0}^{n} \binom{n}{i} \theta^i (1-\theta)^{n-i} \{f_Y(z)\}^{\circledast i} \circledast \delta(z), \qquad (8)$$
where $\circledast$ denotes the convolution operator. In (8), for each $i$, $\{f_Y(z)\}^{\circledast i}$ is a Gaussian probability density function with mean $i\mu$ and variance $i\sigma^2$. If $\theta$ is on the order of $\frac{\log n}{n}$ and $\mu/\sigma$ is not too large, then the probability density function (8) is well-enough approximated by a Gaussian distribution with mean $n\mu\theta$ and variance $n\theta[\sigma^2 + (1-\theta)\mu^2]$. Figure 1 compares the probability density function (8) and its Gaussian approximation under the conditions mentioned above.

Using this approximation and following Theorem 1, when $Q_{ij} = \mathcal{N}(\bar{\mu}_{ij}, \bar{\sigma}_{ij}^2)$, exact recovery of node labels is possible if and only if
$$\min_{i,j,\, i \neq j} \lim_{n \to \infty} \frac{D_g(\mu_i, \mu_j, \Sigma_i, \Sigma_j)}{\log n} > 1,$$
where $\mu_{ij} = p_i \bar{\mu}_{ij} \theta_{ij}$ and $\Sigma_{ij} = p_i \theta_{ij} [\bar{\sigma}_{ij}^2 + (1 - \theta_{ij}) \bar{\mu}_{ij}^2]$.

Fig. 1: True distribution (8) and its Gaussian approximation for $\theta = \frac{\log n}{n}$, $n = 10000$, and different values of $\mu/\sigma$: (a) $\mu = 0$, $\sigma = 1$; (b) $\mu = 4$, $\sigma = 1$; (c) $\mu = 6$, $\sigma = 1$.

REFERENCES

[1] E. Abbe and C. Sandon, "Community detection in general stochastic block models: Fundamental limits and efficient algorithms for recovery," in IEEE 56th Annual Symposium on Foundations of Computer Science (FOCS), 2015, pp. 670–688.
[2] P. W. Holland, K. B. Laskey, and S. Leinhardt, "Stochastic blockmodels: First steps,"
Social Networks, vol. 5, no. 2, pp. 109–137, 1983.
[3] B. Hajek, Y. Wu, and J. Xu, "Exact recovery threshold in the binary censored block model," IEEE, 2015, pp. 99–103.
[4] A. Saade, M. Lelarge, F. Krzakala, and L. Zdeborová, "Spectral detection in the censored block model," IEEE, 2015, pp. 1184–1188.
[5] M. Esmaeili, H. Saad, and A. Nosratinia, "Community detection with side information via semidefinite programming," IEEE, 2019, pp. 420–424.
[6] P. Fronczak, A. Fronczak, and M. Bujok, "Exponential random graph models for networks with community structure," Physical Review E, vol. 88, no. 3, p. 032810, 2013.
[7] M. Esmaeili and A. Nosratinia, "Community detection with secondary latent variables," 2020, pp. 1355–1360.
[8] A. Decelle, F. Krzakala, C. Moore, and L. Zdeborová, "Inference and phase transitions in the detection of modules in sparse networks," Physical Review Letters, vol. 107, no. 6, p. 065701, 2011.
[9] E. Mossel, J. Neeman, and A. Sly, "Reconstruction and estimation in the planted partition model," Probability Theory and Related Fields, vol. 162, no. 3–4, pp. 431–461, 2015.
[10] L. Massoulié, "Community detection thresholds and the weak Ramanujan property," in Proceedings of the Forty-Sixth Annual ACM Symposium on Theory of Computing, 2014, pp. 694–703.
[11] E. Mossel, J. Neeman, and A. Sly, "A proof of the block model threshold conjecture," Combinatorica, vol. 38, no. 3, pp. 665–708, 2018.
[12] H. Saad, A. Abotabl, and A. Nosratinia, "EXIT analysis for belief propagation in degree-correlated stochastic block models," IEEE, 2016, pp. 775–779.
[13] E. Mossel and J. Xu, "Density evolution in the degree-correlated stochastic block model," in Conference on Learning Theory, 2016, pp. 1319–1356.
[14] S.-Y. Yun and A. Proutiere, "Community detection via random and adaptive sampling," in Conference on Learning Theory, 2014, pp. 138–175.
[15] E. Abbe, "Community detection and stochastic block models: Recent developments," The Journal of Machine Learning Research, vol. 18, no. 1, pp. 6446–6531, 2017.
[16] E. Abbe, A. S. Bandeira, and G. Hall, "Exact recovery in the stochastic block model," IEEE Transactions on Information Theory, vol. 62, no. 1, pp. 471–487, 2015.
[17] E. Mossel, J. Neeman, and A. Sly, "Consistency thresholds for the planted bisection model," in Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, 2015, pp. 69–75.
[18] Y. Chen and J. Xu, "Statistical-computational tradeoffs in planted problems and submatrix localization with a growing number of clusters and submatrices," The Journal of Machine Learning Research, vol. 17, no. 1, pp. 882–938, 2016.
[19] E. Mossel, J. Neeman, and A. Sly, "Belief propagation, robust reconstruction and optimal recovery of block models," in Conference on Learning Theory, 2014, pp. 356–370.
[20] M. Esmaeili, H. Saad, and A. Nosratinia, "Exact recovery by semidefinite programming in the binary stochastic block model with partially revealed side information," in ICASSP 2019 – 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 3477–3481.
[21] A. A. Amini, E. Levina et al., "On semidefinite relaxations for the block model," The Annals of Statistics, vol. 46, no. 1, pp. 149–179, 2018.
[22] B. Hajek, Y. Wu, and J. Xu, "Achieving exact cluster recovery threshold via semidefinite programming," IEEE Transactions on Information Theory, vol. 62, no. 5, pp. 2788–2797, 2016.
[23] C. Nicolini, C. Bordier, and A. Bifone, "Community detection in weighted brain connectivity networks beyond the resolution limit," NeuroImage, vol. 146, pp. 28–39, 2017.
[24] Z. Lu, X. Sun, Y. Wen, G. Cao, and T. La Porta, "Algorithms and applications for community detection in weighted networks," IEEE Transactions on Parallel and Distributed Systems, vol. 26, no. 11, pp. 2916–2926, 2014.
[25] S. Papadopoulos, Y. Kompatsiaris, A. Vakali, and P. Spyridonos, "Community detection in social media," Data Mining and Knowledge Discovery, vol. 24, no. 3, pp. 515–554, 2012.
[26] L. Cantini, E. Medico, S. Fortunato, and M. Caselle, "Detection of gene communities in multi-networks reveals cancer drivers," Scientific Reports, vol. 5, p. 17386, 2015.
[27] B. Hajek, Y. Wu, and J. Xu, "Information limits for recovering a hidden community," IEEE Transactions on Information Theory, vol. 63, no. 8, pp. 4729–4745, 2017.
[28] C. Butucea, Y. I. Ingster, and I. A. Suslina, "Sharp variable selection of a sparse submatrix in a high-dimensional noisy matrix," ESAIM: PS, vol. 19, pp. 115–134, 2015. [Online]. Available: https://doi.org/10.1051/ps/2014017
[29] M. Kolar, S. Balakrishnan, A. Rinaldo, and A. Singh, "Minimax localization of structural information in large noisy matrices," in Advances in Neural Information Processing Systems, 2011, pp. 909–917.
[30] Y. Wu and J. Xu, "Statistical problems with planted structures: Information-theoretical and computational limits," arXiv preprint arXiv:1806.00118, 2018.
[31] Q. Berthet, P. Rigollet et al., "Optimal detection of sparse principal components in high dimension,"