Community Detection with a Subsampled Semidefinite Program

Abstract

Semidefinite programming is an important tool to tackle several problems in data science and signal processing, including clustering and community detection. However, semidefinite programs are often slow in practice, so speed-up techniques such as sketching are often considered. In the context of community detection in the stochastic block model, Mixon and Xie [9] have recently proposed a sketching framework in which a semidefinite program is solved only on a subsampled subgraph of the network, giving rise to significant computational savings. In this short paper, we provide a positive answer to a conjecture of Mixon and Xie about the statistical limits of this technique for the stochastic block model with two balanced communities.


arXiv [math.OC]

PEDRO ABDALLA AND AFONSO S. BANDEIRA

Department of Mathematics, ETH Zürich

Date: February 2021.

1. Introduction

Clustering problems are ubiquitous in data science. The main goal is to find a partition of the data into clusters based on similarity measures. A large body of work has focused on the stochastic block model, a random network model with a planted cluster structure; we refer the reader to [2] for a survey of recent developments. We focus on the case of two balanced communities. Let $n$ be an even natural number and let $G \sim G(n; p, q)$ be a random graph on $n$ nodes drawn as follows: randomly partition the set of $n$ vertices $V$ into two equally sized communities, $V = S_1 \cup S_2$. For every pair of vertices, an edge is placed with probability $p$ if they belong to the same community $S_i$ and with probability $q < p$ otherwise, all independently. The goal is to exactly recover the partition $\{S_1, S_2\}$ from the graph alone.

Let $A \in \mathbb{R}^{n \times n}$ denote the adjacency matrix of the graph $G$, and consider a label vector $x \in \{\pm 1\}^n$ representing the community membership of the nodes. The maximum likelihood estimator for the node labels $x$ is given by the program below [2],

$$\max_{x} \; x^T A x \quad \text{s.t.} \quad \mathbf{1}^T x = 0, \quad x \in \{\pm 1\}^n. \tag{1}$$

Here $\mathbf{1}$ denotes the all-ones vector. Since it is well known that problem (1) is NP-hard [6], we consider the standard semidefinite relaxation [7],

$$\max_{X \in \mathbb{R}^{n \times n}} \; \operatorname{Tr}(AX) \quad \text{s.t.} \quad X_{ii} = 1, \quad X \succeq 0, \quad \operatorname{Tr}(XJ) = 0, \tag{2}$$

where $X$ is a surrogate variable for $xx^T$ and $J$ denotes the all-ones matrix. Theorem 1 below gives the sharp phase transition for the community detection problem with two balanced communities.
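As an illustration of the model just described, the following is a minimal sketch, using only the Python standard library, of how one might sample a graph from $G(n; p, q)$ with $p = \alpha \log n / n$ and $q = \beta \log n / n$, and evaluate the objective $x^T A x$ of program (1). The function names `sample_sbm` and `objective` are hypothetical and do not come from the paper; the graph is stored as adjacency sets rather than as the matrix $A$.

```python
import itertools
import math
import random


def sample_sbm(n, alpha, beta, rng):
    """Sample G(n; p, q) with p = alpha*log(n)/n and q = beta*log(n)/n.

    Returns (labels, adj): labels[v] is +1 or -1, and adj[v] is the set of
    neighbours of v. Each community has exactly n/2 nodes.
    """
    if n % 2 != 0:
        raise ValueError("n must be even")
    p = alpha * math.log(n) / n
    q = beta * math.log(n) / n
    # Random balanced partition: shuffle the nodes and split them in half.
    nodes = list(range(n))
    rng.shuffle(nodes)
    labels = [0] * n
    for i, v in enumerate(nodes):
        labels[v] = 1 if i < n // 2 else -1
    adj = [set() for _ in range(n)]
    for u, v in itertools.combinations(range(n), 2):
        prob = p if labels[u] == labels[v] else q
        if rng.random() < prob:
            adj[u].add(v)
            adj[v].add(u)
    return labels, adj


def objective(adj, x):
    """Evaluate x^T A x, summing x[u]*x[v] over ordered adjacent pairs."""
    return sum(x[u] * x[v] for u in range(len(adj)) for v in adj[u])
```

For example, `labels, adj = sample_sbm(200, 16.0, 4.0, random.Random(0))` returns a balanced label vector, so `sum(labels) == 0`, which is the constraint $\mathbf{1}^T x = 0$ in (1).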

Theorem 1 (Exact recovery threshold [1, 4, 8, 10]). Let $G \sim G(n; p, q)$ with $p = \frac{\alpha \log n}{n}$, $q = \frac{\beta \log n}{n}$ and planted communities $\{S_1, S_2\}$. Then,

• (I) For $\sqrt{\alpha} - \sqrt{\beta} < \sqrt{2}$, no algorithm can exactly recover the partition with high probability.
• (II) For $\sqrt{\alpha} - \sqrt{\beta} > \sqrt{2}$, with high probability the semidefinite program (2) has a unique solution given by $X^\natural = x^\natural (x^\natural)^T$, where $x^\natural$ corresponds to the memberships of the true communities, thus achieving exact recovery.

Although solvable in polynomial time, semidefinite programs tend to be computationally costly. A powerful tool to reduce this cost is sketching (we refer the reader to [13] for an instance of this idea in least squares, and to [5, 14] for semidefinite optimization). In the particular framework addressed in this paper, Mixon and Xie [9] have recently proposed a sketching approach in which a potentially much smaller semidefinite program is solved; its size depends on the strength of the community structure. Our main contribution is to resolve in the positive a conjecture in [9] regarding how the size of the resulting semidefinite program depends on the strength of the community structure. We now describe the sketching approach in [9], which consists of a three-step process with a tuning parameter $0 < \gamma < 1$:

• (Step 1) Given a graph with vertex set $V$, subsample a smaller vertex set $V^\natural$ by sampling each node in $V$ independently at random with probability $\gamma$.
• (Step 2) Solve the community detection problem in the subgraph induced by $V^\natural$.
• (Step 3) For each node $v$ not in $V^\natural$, use a majority vote among the neighbours of $v$ in $V^\natural$ to infer its community membership.

The main goal of this paper is to determine the minimum value of $\gamma$ such that the approach above exactly recovers both communities with high probability.
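The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Step 2 is not solved by a semidefinite program here but replaced by an oracle that hands back the true labels of the sampled nodes, and the names `subsample`, `majority_vote` and `sketch_with_oracle` are hypothetical. The graph is assumed to be given as a list of neighbour sets.

```python
import random


def subsample(n, gamma, rng):
    """Step 1: keep each node independently with probability gamma."""
    return {v for v in range(n) if rng.random() < gamma}


def majority_vote(adj, sampled_labels, n):
    """Step 3: label each unsampled node by its neighbours inside the sample.

    sampled_labels maps every sampled node to +1 or -1 (the output of Step 2).
    A node whose vote is tied or empty gets label 0, i.e. stays undecided.
    """
    labels = dict(sampled_labels)
    for v in range(n):
        if v in sampled_labels:
            continue
        vote = sum(sampled_labels[u] for u in adj[v] if u in sampled_labels)
        labels[v] = (vote > 0) - (vote < 0)
    return [labels[v] for v in range(n)]


def sketch_with_oracle(adj, true_labels, gamma, rng):
    """Steps 1-3, with Step 2 replaced by an oracle (illustrative stand-in)."""
    n = len(adj)
    sample = subsample(n, gamma, rng)
    # Assumption: Step 2 succeeded, so sampled nodes carry their true labels.
    sampled_labels = {v: true_labels[v] for v in sample}
    return majority_vote(adj, sampled_labels, n)
```

Leaving tied votes undecided mirrors the strict majority rule: a node is assigned to a community only when it has strictly more sampled neighbours there than in the other one.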
The computational savings come from the reduced size of the semidefinite program, so the parameter $\gamma$ governs the computational cost of the algorithm (we refer the reader to [3] for the dependence of the computational cost of semidefinite programming on the number of variables). Mixon and Xie [9] conjectured that, as long as

$$\gamma > \frac{2}{(\sqrt{\alpha} - \sqrt{\beta})^2},$$

the sketching approach works with high probability. Our main result provides a positive answer to this conjecture. In particular, for $\gamma = 1$, we recover the threshold in Theorem 1 (see part II).

Footnote. Note that while there is a natural ambiguity in the labelling of each of the communities, corresponding to an ambiguity of a global sign flip in $x$, this is no longer the case for $xx^T$, which simply represents the partition.

2. An Oracle Bound

As described above, the sketching approach consists of three steps: sampling, solving the community detection problem on the smaller sampled graph, and then recovering the entire communities using a majority vote procedure. In this section, we analyze Step 3 and prove that it works, for a certain range of the parameter $\gamma$, as long as we know the smaller communities from Step 2. The analysis is summarized in the proposition below; we refer to it as an oracle bound because it assumes knowledge of the communities in Step 2.

Proposition 1. Let $G \sim G(n; p, q)$ with planted communities $\{S_1, S_2\}$ and with $p = \frac{\alpha \log n}{n}$ and $q = \frac{\beta \log n}{n}$ satisfying $p > q$. Draw a vertex set $V^\natural$ at random by sampling each node of the graph $G$ independently with probability $\gamma$. Let $R_1, R_2$ be the planted communities in the sampled graph, i.e., $R_i = S_i \cap V^\natural$ for $i \in \{1, 2\}$. Moreover, denote by $e(v, S)$ the number of edges of $G$ between the vertex $v$ and the set $S \subset V(G)$, where $V(G)$ is the vertex set of the graph $G$. Now, consider

$$\hat{S}_1 = R_1 \cup \{ v \in V(G) \setminus V^\natural : e(v, R_1) > e(v, R_2) \},$$

$$\hat{S}_2 = R_2 \cup \{ v \in V(G) \setminus V^\natural : e(v, R_2) > e(v, R_1) \}.$$

Then, with probability $1 - o(1)$, $(\hat{S}_1, \hat{S}_2) = (S_1, S_2)$ as long as $\gamma > \frac{2}{(\sqrt{\alpha} - \sqrt{\beta})^2}$.

The next lemma plays a key role in the proof of Proposition 1. It is similar to Lemma 8 in [1], but it deals with almost balanced communities, which is crucial to our analysis.

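Lemma 1. Suppose $\alpha > \beta > 0$. Let $X$ and $Y$ be two independent random variables with $X \sim \mathrm{Binom}\left(K_1, \frac{\alpha \log n}{n}\right)$ and $Y \sim \mathrm{Binom}\left(K_2, \frac{\beta \log n}{n}\right)$, where $K_1 = \frac{n\gamma}{2} + o(n)$ and $K_2 = \frac{n\gamma}{2} + o(n)$ as $n \to \infty$. Then,

$$P(X - Y \le 0) \le n^{-\left(\frac{(\alpha + \beta)\gamma}{2} - \gamma\sqrt{\alpha\beta}\right) + o(1)}.$$

We present a simple and direct proof of this lemma.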
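Before the proof, the statement of Lemma 1 can be sanity-checked numerically. The sketch below computes $P(X - Y \le 0)$ exactly from the binomial probability mass functions and compares it with the leading-order bound $n^{-((\alpha+\beta)\gamma/2 - \gamma\sqrt{\alpha\beta})}$, dropping the $o(1)$ term. It assumes $K_1 = K_2 = n\gamma/2$ exactly, in which case the leading-order expression coincides with the standard Chernoff bound $e^{-(\sqrt{K_1 p} - \sqrt{K_2 q})^2}$ and is therefore a valid non-asymptotic bound. The function names are hypothetical, and this is an illustration, not part of the argument.

```python
import math


def binom_pmf(k, n, p):
    """Probability that a Binomial(n, p) variable equals k."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def exact_tail(k1, p, k2, q):
    """Exact P(X - Y <= 0) for independent X ~ Binom(k1, p), Y ~ Binom(k2, q)."""
    total = 0.0
    cdf_x = 0.0  # running value of P(X <= y)
    for y in range(k2 + 1):
        if y <= k1:
            cdf_x += binom_pmf(y, k1, p)
        total += binom_pmf(y, k2, q) * min(cdf_x, 1.0)
    return total


def lemma_bound(n, alpha, beta, gamma):
    """Leading-order bound n^(-((alpha+beta)*gamma/2 - gamma*sqrt(alpha*beta)))."""
    exponent = gamma * (alpha + beta) / 2.0 - gamma * math.sqrt(alpha * beta)
    return n ** (-exponent)


if __name__ == "__main__":
    n, alpha, beta, gamma = 200, 16.0, 4.0, 0.5
    k = int(n * gamma / 2)  # K_1 = K_2 = 50
    p = alpha * math.log(n) / n
    q = beta * math.log(n) / n
    print("exact tail:", exact_tail(k, p, k, q))
    print("bound     :", lemma_bound(n, alpha, beta, gamma))
```

With these parameters the exponent equals $1$, so the bound is $1/n$; the exact tail probability should be smaller, since Chernoff-type bounds are typically loose at finite $n$.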

Proof. Let $\varepsilon > 0$. We proceed with the Laplace transform method: for all $t \ge 0$,

$$P(X - Y \le 0) \le P(X - Y \le \varepsilon) \le e^{t\varepsilon}\, E\, e^{-t(X - Y)} = e^{-\psi(t)},$$

where $\psi(t) := -t\varepsilon - \log E\, e^{-t(X - Y)}$. Now we use the fact that the function $\psi(t)$ is additive for sums of independent random variables, together with the formula for the moment generating function of a binomial distribution (Example 3.32 in [12]):

$$\log E\, e^{-t(X - Y)} = K_1 \log\left(1 - p(1 - e^{-t})\right) + K_2 \log\left(1 - q(1 - e^{t})\right),$$

where $p = \frac{\alpha \log n}{n}$ and $q = \frac{\beta \log n}{n}$. Using the elementary inequality $\log(1 - x) \le -x$, valid for all $x < 1$, we get

$$\psi(t) \ge -\varepsilon t + K_1 p (1 - e^{-t}) + K_2 q (1 - e^{t}). \tag{3}$$

We pick

$$t^* = \log\left((2 K_2 q)^{-1}\left(-\varepsilon + \sqrt{\varepsilon^2 + 4 K_1 K_2 p q}\right)\right)$$

in order to optimize the right-hand side. The second term on the right-hand side becomes

$$K_1 p (1 - e^{-t^*}) = K_1 p - \frac{2 K_1 K_2 p q}{-\varepsilon + \sqrt{\varepsilon^2 + 4 K_1 K_2 p q}}.$$

We are interested in the behaviour of $\psi(t^*)$ when $\varepsilon \to 0^+$, so we take the limit on both sides of the equality above:

$$\lim_{\varepsilon \to 0^+} K_1 p (1 - e^{-t^*}) = K_1 p - \sqrt{K_1 K_2 p q}.$$

Similarly, we get

$$\lim_{\varepsilon \to 0^+} K_2 q (1 - e^{t^*}) = K_2 q - \sqrt{K_1 K_2 p q}.$$

Now we can take the limit as $\varepsilon \to 0^+$ in inequality (3) to obtain

$$P(X - Y \le 0) \le e^{\lim_{\varepsilon \to 0^+} -\psi(t^*)} \le e^{-\left(K_1 p + K_2 q - 2\sqrt{K_1 K_2 p q}\right)}.$$

Recall that $K_1 = \frac{n\gamma}{2} + o(n)$, $K_2 = \frac{n\gamma}{2} + o(n)$, $p = \frac{\alpha \log n}{n}$ and $q = \frac{\beta \log n}{n}$. Then,

$$P(X - Y \le 0) \le e^{-\log(n)\left(\gamma\frac{\alpha + \beta}{2} - \gamma\sqrt{\alpha\beta} + o(1)\right)}. \qquad \square$$

We end this section with the proof of Proposition 1.

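Proof. We denote by $E$ the success event and we condition on the event that $V^\natural$ has been drawn. By the union bound we can write

$$P(E^c \mid V^\natural) \le P_1 + P_2.$$

Here

$$P_1 := \sum_{v \in S_1} \mathbf{1}\{v \in V(G) \setminus V^\natural\}\, P\left(e(v, R_1) - e(v, R_2) \le 0\right)$$

and $P_2$ is defined analogously. Observe that the probability on the right-hand side of $P_1$ is equal to

$$P\left(\sum_{j=1}^{K_1} B^{(p)}_j - \sum_{j=1}^{K_2} B^{(q)}_j \le 0\right),$$

where $K_i = |R_i|$ and, for all $j$, the random variables $B^{(p)}_j \sim \mathrm{Ber}(p)$ and $B^{(q)}_j \sim \mathrm{Ber}(q)$ are all independent. We set $X := \sum_{j=1}^{K_1} B^{(p)}_j \sim \mathrm{Binom}\left(K_1, \frac{\alpha \log n}{n}\right)$ and $Y := \sum_{j=1}^{K_2} B^{(q)}_j \sim \mathrm{Binom}\left(K_2, \frac{\beta \log n}{n}\right)$. In order to apply Lemma 1, we denote by $\mathcal{A}$ the event in which both $K_1$ and $K_2$ lie in the interval $\frac{n\gamma}{2}\left(1 \pm \frac{1}{\sqrt{\log n}}\right)$. So we can bound $P_1$ by

$$P_1 \le \sum_{v \in S_1} \mathbf{1}\{v \in V(G) \setminus V^\natural\}\left(\mathbf{1}\{\mathcal{A}^c\} + \mathbf{1}\{\mathcal{A}\}\, P(X - Y \le 0 \mid \mathcal{A})\right).$$

We use the crude bound $\mathbf{1}\{v \in V(G) \setminus V^\natural\} \le 1$ to obtain

$$P_1 \le \frac{n}{2}\left(\mathbf{1}\{\mathcal{A}^c\} + \mathbf{1}\{\mathcal{A}\}\, P(X - Y \le 0 \mid \mathcal{A})\right).$$

It is easy to see that the same bound holds for $P_2$, so

$$P(E^c \mid V^\natural) \le P_1 + P_2 \le n\left(\mathbf{1}\{\mathcal{A}^c\} + \mathbf{1}\{\mathcal{A}\}\, P(X - Y \le 0 \mid \mathcal{A})\right).$$

Taking the expectation with respect to $V^\natural$ on both sides, we obtain

$$P(E^c) \le n\left(P(\mathcal{A}^c) + E_{V^\natural} P(X - Y \le 0 \mid \mathcal{A})\right). \tag{4}$$

By Chernoff's small deviation inequality (Exercise 2.3.5 in [11]), there is an absolute constant $c > 0$ such that

$$P(\mathcal{A}^c) \le 2 P\left(\left|K_1 - \frac{n\gamma}{2}\right| > \frac{n\gamma}{2\sqrt{\log n}}\right) \le 4 e^{-c\frac{\gamma n}{\log n}} = o\left(\frac{1}{n}\right). \tag{5}$$

By Lemma 1,

$$E_{V^\natural} P(X - Y \le 0 \mid \mathcal{A}) \le n^{-\left(\frac{(\alpha + \beta)\gamma}{2} - \gamma\sqrt{\alpha\beta}\right) + o(1)}. \tag{6}$$

By the assumption on $\gamma$, $\left(\frac{\alpha + \beta}{2} - \sqrt{\alpha\beta}\right)\gamma > 1$. Therefore, there exists an $\varepsilon > 0$ such that

$$P(X - Y \le 0 \mid \mathcal{A}) \le n^{-1 - \varepsilon + o(1)} = o\left(\frac{1}{n}\right).$$

Then we combine inequalities (5) and (6) with inequality (4) to complete the proof. $\square$

3. Exact Recovery in the Subsampled Nodes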

In the sampling procedure in Step 1, the unknown communities $S_1 \cap V^\natural$ and $S_2 \cap V^\natural$ are no longer guaranteed to be balanced, so we cannot directly use the optimization program (2), because the maximum likelihood estimator is no longer (1). However, as shown by the authors of [8], similar semidefinite programs can be used to handle this case. We follow the approach in [8].

To begin with, it is straightforward to see that if the communities have sizes $K$ and $n - K$, the maximum likelihood estimator becomes

$$\max_{x} \; x^T A x \quad \text{s.t.} \quad \mathbf{1}^T x = (2K - n), \quad x \in \{\pm 1\}^n. \tag{7}$$

Therefore we can relax the problem in the same way as before: we set $X := xx^T$ and write

$$\max_{X \in \mathbb{R}^{n \times n}} \; \operatorname{Tr}(AX) \quad \text{s.t.} \quad X_{ii} = 1, \quad X \succeq 0, \quad \operatorname{Tr}(XJ) = (2K - n)^2. \tag{8}$$

We should remark that formulation (8) requires knowledge of the sizes of the communities. To overcome this problem, we consider a Lagrangian formulation

$$\max_{X \in \mathbb{R}^{n \times n}} \; \operatorname{Tr}(AX) - \lambda^* \operatorname{Tr}(XJ) \quad \text{s.t.} \quad X_{ii} = 1, \quad X \succeq 0, \tag{9}$$

where $\lambda^*$ adjusts the sizes of the communities. An important insight from [8] is the following: there exists a value of $\lambda^*$ that works for all values of $K$, so the optimization program (9) can be used to recover unbalanced communities with unknown sizes. This is the content of the following proposition. We use the notation $G \sim G(n_1, n_2, p, q)$ to denote a random graph drawn exactly in the same way as before, except that now the planted communities have sizes $n_1$ and $n_2$ satisfying $n_1 + n_2 = n$, where $n_1$ is not necessarily equal to $n_2$.

Proposition 2 ([8]). Let $G \sim G(K, n - K, p, q)$ with planted communities $\{S_1, S_2\}$ and with $p = \frac{\alpha \log n}{n}$ and $q = \frac{\beta \log n}{n}$ satisfying $p > q$. Then, for $\sqrt{\alpha} - \sqrt{\beta} > \sqrt{2}$, the semidefinite program (9) with $\lambda^* = \left(\frac{\alpha - \beta}{\log \alpha - \log \beta}\right) \frac{\log n}{n}$ exactly recovers the communities with probability $1 - o(1)$.

4. Main Theorem

We now turn to the main result of this paper. We combine the ideas of Sections 2 and 3 to establish a complete analysis of the sketching procedure.
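To make the threshold $\gamma > 2/(\sqrt{\alpha} - \sqrt{\beta})^2$ from the Introduction and Proposition 1 concrete, here is a small sketch that evaluates it for given $\alpha$ and $\beta$, together with the expected number of sampled nodes $\gamma n$, which is the expected dimension of the subsampled semidefinite program. The function names `min_sampling_rate` and `expected_sdp_size` are hypothetical helpers introduced only for this illustration.

```python
import math


def min_sampling_rate(alpha, beta):
    """Return 2 / (sqrt(alpha) - sqrt(beta))^2, the threshold for gamma."""
    if not alpha > beta > 0:
        raise ValueError("expected alpha > beta > 0")
    return 2.0 / (math.sqrt(alpha) - math.sqrt(beta)) ** 2


def expected_sdp_size(n, gamma):
    """Expected number of sampled nodes, i.e. the expected SDP dimension."""
    return gamma * n


if __name__ == "__main__":
    # alpha = 16, beta = 4 gives (4 - 2)^2 = 4, so the threshold is 1/2.
    print(min_sampling_rate(16.0, 4.0))
    # alpha = 4, beta = 1 gives a value above 1: no gamma < 1 qualifies,
    # consistent with sqrt(alpha) - sqrt(beta) < sqrt(2) in Theorem 1.
    print(min_sampling_rate(4.0, 1.0))
```

For instance, with $\alpha = 16$ and $\beta = 4$, any $\gamma$ above $1/2$ satisfies the condition, so the semidefinite program would be solved on roughly half of the nodes in expectation.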

Theorem 2 (Main result). Let $G \sim G(n; p, q)$ with planted communities $\{S_1, S_2\}$ and with $p = \frac{\alpha \log n}{n}$ and $q = \frac{\beta \log n}{n}$ satisfying $p > q$. Draw a vertex set $V^\natural$ at random by sampling each node of the graph $G$ independently with probability $\gamma$. Denote by $\hat{R}_i$ the maximum likelihood estimators of $R_i = S_i \cap V^\natural$, for $i \in \{1, 2\}$. Now take

$$\hat{S}_1 = \hat{R}_1 \cup \{ v \in V(G) \setminus V^\natural : e(v, \hat{R}_1) > e(v, \hat{R}_2) \},$$

$$\hat{S}_2 = \hat{R}_2 \cup \{ v \in V(G) \setminus V^\natural : e(v, \hat{R}_2) > e(v, \hat{R}_1) \}.$$

Then, with probability $1 - o(1)$, $(\hat{S}_1, \hat{S}_2) = (S_1, S_2)$ as long as $\gamma > \frac{2}{(\sqrt{\alpha} - \sqrt{\beta})^2}$.

Proof. Observe that after sampling the vertex set $V(G)$ of the graph, the induced subgraph $H \subset G$ is a random graph with law $H \sim G(|S_1 \cap V^\natural|, |S_2 \cap V^\natural|, p, q)$. We claim that there exists a $\lambda^*$ such that the optimization program (9) recovers both communities $S_1 \cap V^\natural$ and $S_2 \cap V^\natural$ with probability $1 - o(1)$. The theorem then follows from the claim by applying Proposition 1 and observing that the intersection of two events that each occur with high probability also occurs with high probability.

Now we prove the claim. In order to apply Proposition 2 we need to check that, with high probability,

$$\sqrt{\alpha_H} - \sqrt{\beta_H} > \sqrt{2}, \tag{10}$$

where $\alpha_H := \frac{p |V^\natural|}{\log |V^\natural|}$ and $\beta_H := \frac{q |V^\natural|}{\log |V^\natural|}$ if $|V^\natural| \ge 2$. Recall that $p = \frac{\alpha \log n}{n}$ and $q = \frac{\beta \log n}{n}$. Clearly, the degenerate event $|V^\natural| \le 1$ occurs with probability $o(1)$, so

$$P\left(\left\{\alpha_H = \alpha \frac{|V^\natural| \log n}{n \log |V^\natural|}\right\} \cap \left\{|V^\natural| \ge 2\right\}\right) = 1 - o(1).$$

An analogous fact holds for $\beta_H$, so the event

$$\left\{\sqrt{\alpha_H} - \sqrt{\beta_H} = \sqrt{\frac{|V^\natural| \log n}{n \log |V^\natural|}}\left(\sqrt{\alpha} - \sqrt{\beta}\right)\right\} \cap \left\{|V^\natural| \ge 2\right\}$$

occurs with probability $1 - o(1)$. Since $\frac{\log n}{\log |V^\natural|} \ge 1$, it suffices to show that, with probability $1 - o(1)$,

$$\sqrt{\frac{|V^\natural|}{n}}\left(\sqrt{\alpha} - \sqrt{\beta}\right) > \sqrt{2}.$$

Observe that $|V^\natural|$ is a sum of $n$ i.i.d. Bernoulli random variables with mean $\gamma$. By assumption, there exists a $\delta > 0$ such that $\sqrt{\alpha} - \sqrt{\beta} \ge \sqrt{\frac{2}{\gamma}}(1 + \delta)$, and by the weak law of large numbers we have convergence in probability, i.e., for every $\varepsilon > 0$, $P\left(\frac{|V^\natural|}{n} \ge \gamma - \varepsilon\right) = 1 - o(1)$. Putting these three facts together, we obtain, for every $\varepsilon > 0$,

$$\sqrt{\frac{|V^\natural| \log n}{n \log |V^\natural|}}\left(\sqrt{\alpha} - \sqrt{\beta}\right) \ge \sqrt{2}(1 + \delta)\sqrt{1 - \frac{\varepsilon}{\gamma}},$$

with probability $1 - o(1)$. We choose $\varepsilon > 0$ such that $(1 + \delta)\sqrt{1 - \frac{\varepsilon}{\gamma}} > 1$. $\square$

Acknowledgment

The authors would like to thank Dustin Mixon and Kaiying Xie for helpfuldiscussions.

References

[1] E. Abbe, A. S. Bandeira, and G. Hall. Exact recovery in the stochastic block model. IEEE Transactions on Information Theory, 62(1):471–487, 2016.
[2] Emmanuel Abbe. Community detection and stochastic block models: Recent developments. The Journal of Machine Learning Research, 18(1):6446–6531, 2017.
[3] Farid Alizadeh. Interior point methods in semidefinite programming with applications to combinatorial optimization. SIAM Journal on Optimization, 5(1):13–51, 1995.
[4] Afonso S Bandeira. Random Laplacian matrices and convex relaxations. Foundations of Computational Mathematics, 18(2):345–379, 2018.
[5] Andreas Bluhm and Daniel Stilck França. Dimensionality reduction of SDPs through sketching. Linear Algebra and its Applications, 563:461–475, 2019.
[6] Michael R Garey and David S Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness, volume 174. Freeman San Francisco, 1979.
[7] Michel X Goemans and David P Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM (JACM), 42(6):1115–1145, 1995.
[8] Bruce Hajek, Yihong Wu, and Jiaming Xu. Achieving exact cluster recovery threshold via semidefinite programming: Extensions. IEEE Transactions on Information Theory, 62(10):5918–5937, 2016.
[9] Dustin G Mixon and Kaiying Xie. Sketching semidefinite programs for faster clustering. arXiv preprint arXiv:2008.04270, 2020.
[10] Elchanan Mossel, Joe Neeman, and Allan Sly. Consistency thresholds for the planted bisection model. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing, pages 69–75, 2015.
[11] Roman Vershynin. High-Dimensional Probability: An Introduction with Applications in Data Science, volume 47. Cambridge University Press, 2018.
[12] Larry Wasserman. All of Statistics: A Concise Course in Statistical Inference. Springer Science & Business Media, 2013.
[13] David P Woodruff. Sketching as a tool for numerical linear algebra. Foundations and Trends® in Theoretical Computer Science, 10(1–2):1–157, 2014.
[14] Alp Yurtsever, Madeleine Udell, Joel A Tropp, and Volkan Cevher. Sketchy decisions: Convex low-rank matrix optimization with optimal storage. arXiv preprint arXiv:1702.06838.