Graph Community Detection from Coarse Measurements: Recovery Conditions for the Coarsened Weighted Stochastic Block Model
Nafiseh Ghoroghchian Gautam Dasarathy Stark C. Draper
University of Toronto & Vector Institute    Arizona State University    University of Toronto
Abstract
We study the problem of community recovery from coarse measurements of a graph. In contrast to the problem of community recovery from a fully observed graph, one often encounters situations in which measurements of a graph are made at low resolution, each measurement integrating across multiple graph nodes. Such low-resolution measurements effectively induce a coarse graph with its own communities. Our objective is to develop conditions on the graph structure, and on the quantity and properties of measurements, under which we can recover the community organization in this coarse graph. In this paper, we build on the stochastic block model by mathematically formalizing the coarsening process and characterizing its impact on the community members and connections. Through this novel setup and modeling, we characterize an error bound for community recovery. The error bound yields simple and closed-form asymptotic conditions to achieve perfect recovery of the coarse graph communities.
Community detection (a.k.a. clustering) in a graph is the problem of identifying groups of nodes with similar behaviour (Fortunato and Hric, 2016; Von Luxburg, 2007; Abbe, 2017). Identifying communities is usually the first analysis tool used to draw initial observations from data (Yang and Leskovec, 2013). A community in a graph refers to a group of nodes that
are more similar to each other than to the rest of the graph. (A version of this manuscript appears in the Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS) 2021, San Diego, California, USA.) The notion of similarity most conventionally means assortativity, i.e., denser intra-community links in an unweighted graph, where no weight or label is associated with the graph edges (Fortunato, 2010). However, the group similarity notion has been extended to other forms of connectivity, as well as to weighted networks (Fortunato and Hric, 2016). Cluster formation has been shown to be a universal structure in real networks (Yang and Leskovec, 2015). As a result, detecting communities in networks has become a central question in a great body of prediction and inference tasks, with applications in network neuroscience (Sporns and Betzel, 2016; Bassett and Sporns, 2017; Betzel et al., 2019), social networks (Yang and Leskovec, 2013), collaboration networks (Hou et al., 2008), and biological networks (Girvan and Newman, 2002). While existing methods for community detection have been effective in modeling, studying, and recovering communities from finely detailed, high-resolution graphs (Fortunato and Hric, 2016), there are various scenarios where a large-scale graph is not fully observable and should be coarsened, due to restrictions imposed by the measuring instrument (exemplified shortly) (Betzel and Bassett, 2017), limitations of storage memory, high sampling costs, computational tractability (Dabagia et al.; Serrano et al., 2009), restricted accessibility to data, and the creation of multi-scale representations of graphs (Safro et al., 2015; Loukas, 2019).
Discovering the latent community structure from the coarse measured graph is a valuable objective of many graph-based tasks (Mucha et al., 2010; Betzel et al., 2019). Although conventional community detection models can be directly applied to coarse measured graphs (Betzel and Bassett, 2017), a fundamental understanding of the impact of coarsening on the community structure and its recovery is missing. Fig. 1 illustrates how the coarse measurement process can obscure the high-resolution graph structure. The figure shows that as coarsening reduces the size of the graph, it introduces heterogeneity in the edge weights, which can potentially cause a drift away from the true community structure.

Figure 1: Visual illustration of (a) the underlying high-resolution (fine) graph, (b) the measurement (coarsening) procedure, whose result is modeled as a coarse graph, and (c) the effect of the coarsening on the community structure, whose recovery is the objective of this paper. Notations used in this paper are annotated with their values realized in the figure: the fine adjacency matrix W with n = 54 nodes; the coarse adjacency matrix with m = 9 c-nodes and their c-edges; coverage size r = 4; K = 3 communities (colors); and a maximum of ν = 2 communities overlapping with any measurement.

The study of clustering from coarse measured graphs enables the characterization of the factors contributing to their community recovery. Such characterization identifies the barriers to community detection from a coarse graph, and can potentially improve clustering by suggesting adjustments to the measurement and community recovery process. Such clustering characterization and recovery improvement are crucial in many fields, including neuroscience. Often in the study of the brain at a large scale, the scientific measuring instruments are quite coarse and cannot directly monitor the activity of all the neurons in the brain, whose number is as high as 14 billion. Hence, one is restricted to collecting aggregate signals from bundles of neurons (Osorio et al., 2016; Ghoroghchian et al., 2020), from which a low-resolution functional brain graph is generated (Friston, 2011; Ghoroghchian et al., 2018). The communities identified in the measured graph have been connected to cognitive and behavioral units of the brain, and they provide biomarkers for neurological diseases (Sporns and Betzel, 2016; Bassett and Sporns, 2017; Lynn and Bassett, 2019; Patankar et al., 2020).
Contributions: In this paper, we study community detection from coarse measured graphs; to the best of our knowledge, this is the first analysis of this problem.
• A random generative model is introduced for coarse measured networks. A mathematical framework is defined that characterizes the measurement process, the coarse graph, and the relationship between the community structures of the fine and coarse graphs.
• Simple and closed-form asymptotic conditions are developed on the graph structure and on the quantity and properties of the measurements, under which the community organization of the coarse graph is recovered. The recovery error is characterized, which facilitates studying the effects of the various measurement- and structure-related parameters that improve or degrade the quality of the recovery.
• Simulations are provided to compare the derived theoretical error bound with the performance of state-of-the-art community detection methods.
Related Work: While the problem of coarsening a known graph has received considerable attention in the past (Karypis and Kumar, 1998; Harel and Koren, 2001; Kushnir et al., 2006; Safro et al., 2015; Loukas, 2019; Rahmani et al., 2020), to the best of our knowledge, this paper is the first to consider learning community structure from coarse summaries of an unknown graph.

This paper is built upon the stochastic block model (SBM), a random generative model that is widely used as a canonical model in the community detection literature (Abbe, 2017). Although there are other approaches to detect communities, mainly based on modularity maximization and statistical inference (Fortunato and Hric, 2016; Javed et al., 2018), the SBM has advantages that fit our purposes. The SBM provides a rich benchmark that facilitates generalization to numerous variants (Abbe, 2017; Fortunato and Hric, 2016; Funke and Becker, 2019). Furthermore, the generative nature of SBMs allows for characterizing communities and their recovery (Abbe, 2017), which particularly serves the improvement of community detection. We start by using the vanilla symmetric SBM to model the fine-scale graph, which we consider a latent model underlying the observed coarse graph. Under this model, we show that the coarse graph becomes a weighted and mixed-membership (or overlapping) variant of the SBM.

The mixed-membership SBM (MMSBM) is another paradigm relevant to our purposes, and could serve as a good model when applied directly at the measurement (coarsened) level. However, in the current paper, we start with a model of the fine graph and characterize the coarse model as a function of the coarsening/measurement procedure. This is more natural given that our goal is to infer community information about the underlying fine graph.
Relatedly, as far as we know, most papers on the MMSBM, such as (Dulac et al., 2020), are algorithmically oriented and do not contain a theoretical analysis of community recovery performance similar to our work in this paper. The few existing works that include theoretical analysis (Mao et al., 2017) do not model weighted edges and do not focus on coarsening, the two components that are crucial to our setup.
Consider an unweighted graph G = G(V, E), where V is a set of nodes of cardinality |V| = n, and E is a set of pairs of nodes, referred to as edges. Since the graph is unweighted, we can alternatively represent the edges using an adjacency matrix W ∈ {0, 1}^{n×n}, where each node of the graph is labeled by a unique number in the index set [n] ≜ {1, 2, · · · , n}, and W_{uv} = 1 indicates the existence of an edge between nodes u and v.

We assume an underlying community structure on V, which partitions the node set into disjoint sets V = ∪_{k=1}^{K} V_k. For all k ∈ [K], V_k represents the set of nodes that belong to community k; each node belongs to exactly one of the K communities. The connections among nodes in the same community differ statistically from their connections to the rest of the graph. Let P ∈ {0, 1}^{K×n} be the true community assignment matrix, where P_{ku} = 1 iff node u belongs to community k, i.e.,

P_{ku} = 1 if u ∈ V_k, and 0 otherwise. (1)

A graph is drawn under the symmetric stochastic block model (SSBM) characterized by p and q, where the probability of having an edge between two nodes is independently distributed according to Bernoulli(p) for two nodes in the same community, and Bernoulli(q) for nodes in different communities. Also, the nodes are assigned to communities in a uniform and independent manner. We let W be distributed according to W ∼ SSBM(n, K, p, q) conditional on P, i.e.,

W_{uv} ∼ Bernoulli(p) if ∃ k ∈ [K] : P_{ku} = 1, P_{kv} = 1, and Bernoulli(q) otherwise. (2)

We assume a general scaling behaviour for p, q by defining constants 0 < α, β < ∞ and a scaling factor f(n), where

p ≜ α f(n),  q ≜ β f(n). (3)

f(n) tracks the changes in the graph density as a function of the graph size. As n increases, f(n) may remain unchanged, or it may shrink, i.e., the graph becomes sparser as it grows.
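As a concrete illustration of the generative model in (1)-(3), the following sketch (our own illustrative code, not the paper's implementation) samples a pair (W, P) from SSBM(n, K, p, q):

```python
import numpy as np

def sample_ssbm(n, K, p, q, seed=None):
    """Sample (W, P) from SSBM(n, K, p, q) as in Eq. (2).

    Illustrative sketch: communities are assigned uniformly and
    independently; each edge is then an independent Bernoulli draw
    with parameter p (intra-community) or q (inter-community).
    """
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, K, size=n)          # uniform assignment
    P = np.zeros((K, n), dtype=int)              # Eq. (1): P_ku = 1 iff u in V_k
    P[labels, np.arange(n)] = 1
    same = labels[:, None] == labels[None, :]    # intra-community indicator
    probs = np.where(same, p, q)
    draws = rng.random((n, n)) < probs
    W = np.triu(draws, 1)                        # draw each pair i < j once
    W = (W | W.T).astype(int)                    # symmetric, zero diagonal
    return W, P
```

The upper-triangular draw followed by symmetrization ensures each unordered pair is sampled exactly once, matching the independence assumption across pairs.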
The latter sparsity assumption has been considered in the existing literature, as it fits many real-world applications, including biological, social, and collaboration networks (Abbe, 2017; Mossel et al., 2014; Abbe et al., 2015; Abbe and Sandon, 2015a).

In real applications, G can be very large, on the order of millions or billions of nodes. In general, the population is much larger than the number of communities (e.g., there are many more citizens than cities), and so n ≫ K. We often cannot observe the existence (or lack of existence) of all n(n−1)/2 possible connections and instead measure J summaries of associations. One possible choice to collect a simplified and interpretable set of summary measurements (more explanation comes shortly) is to define a set of disjoint measurement vectors {b_1, b_2, · · · , b_m}, all satisfying b_i ∈ {0, 1}^n and b_i b_j^T = 0 for all distinct i, j ∈ [m]. The latter condition means measurement vectors do not overlap, i.e., each node is measured at most once. Each summary, denoted s_ℓ for ℓ ∈ [m²], is defined as:

s_ℓ = Σ_{u ∈ supp(b_{⌈ℓ/m⌉})} Σ_{v ∈ supp(b_{ℓ mod m})} W_{uv} = b_{⌈ℓ/m⌉} W b_{ℓ mod m}^T. (4)

supp(b_i) denotes the support of b_i and |supp(b_i)| is the cardinality of the support. Equation (4) corresponds to the set of summary measurements one would get if one defines an m × n matrix B whose rows are b_1, b_2, · · · , b_m, and then collects m² non-distinct (or m(m+1)/2 distinct) measurements as in (4), forming the following matrix equality:

W̃ = B W B^T. (5)

Such a measurement model is a natural choice in existing applications.
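To make the measurement model concrete, here is a minimal sketch (illustrative, not the authors' code) that builds a disjoint measurement matrix B from consecutive index blocks — one simple choice among many — and forms W̃ = B W B^T as in (5):

```python
import numpy as np

def coarsen(W, m, r):
    """Form the coarse matrix W_tilde = B W B^T of Eq. (5).

    B has m rows with disjoint supports of size r: row i selects fine
    nodes i*r, ..., i*r + r - 1. The consecutive-block layout is a
    hypothetical choice for illustration; any disjoint supports of
    size r would satisfy the model. Requires m*r <= n.
    """
    n = W.shape[0]
    assert m * r <= n, "disjoint supports of size r need m*r <= n"
    B = np.zeros((m, n), dtype=int)
    rows = np.repeat(np.arange(m), r)
    B[rows, np.arange(m * r)] = 1    # |supp(b_i)| = r, supports disjoint
    return B @ W @ B.T, B
```

For example, coarsening the complete graph on n = 6 nodes (all off-diagonal entries 1) with m = 3, r = 2 gives off-diagonal c-edge weights of r² = 4 (all fine-node pairs between two blocks) and diagonal entries of 2 (the two ordered intra-block pairs).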
For instance, linear measurements of a high-dimensional signal appear in compressed sensing (Donoho, 2006; Draper and Malekpour, 2009), which has further been applied to electroencephalogram (EEG) signal processing (Aviyente, 2007) and image processing (Baraniuk, 2007), as well as in covariance sketching (Dasarathy et al., 2015). For such linear measurements, the original and the measured graphs respectively model the covariance matrices (here, thresholded to obtain weighted graphs) of the original and the linearly measured signals. The measurement model in (5) is also a popular graph reduction method, where W̃ approximates W by preserving some of its spectral properties (Safro et al., 2015; Loukas, 2019; Jin et al., 2020).

The matrix W̃ can be thought of as the weighted adjacency matrix of a measured weighted graph G̃ = G̃(Ṽ, Ẽ), where Ṽ is the set of c-nodes and |Ṽ| = m. Ẽ is the set of c-edges, consisting of pairs of c-nodes and a weight, i.e., (i, j, w̃). Note that the W̃_{ij} for all i > j are independent random variables if the b_i's are disjoint. We return to this point, and the formal statistics of W̃, shortly.

Definition 1.
A measurement matrix B is "r-homogeneous" if for all i ∈ [m] there is a constant positive integer r ≤ n/m such that |supp(b_i)| = r.

We assume the number of measured fine nodes that represent a c-node is the same for all c-nodes. We refer to this number as the coverage size and denote it by r. Accordingly, the support of each row of a homogeneous measurement matrix has cardinality r. We define the c-node profile matrix

Φ ≜ B P^T, (6)

whose dimension is m × K; it connects the measurement matrix B to the community assignment matrix P of the graph. Φ displays the impact of coarsening on the community memberships. A c-node can belong to one community or to multiple communities. (The prefix "c-" stands for compound or coarse.) Each row of the c-node profile matrix, φ_i, is a length-K vector that counts the number of nodes in each community of G that are measured by the i-th c-node. For instance, with r = 4 and K = 3,

Φ = [0 4 0; 2 0 2; 4 0 0]

means that all 4 fine nodes that map to the first (resp. the third) c-node belong to community 2 (resp. 1), while half of the 4 fine nodes mapping to the second c-node belong to community 1 and the other half belong to community 3.

The following lemma derives the statistics of W̃.

Lemma 1.
Let W ∼ SSBM(n, K, p, q) from which W̃ in (5) is measured under the r-homogeneous measurement assumption defined in Def. 1. Then the W̃_{ij} are i.i.d. random variables for all i > j, with distribution:

W̃_{ij} ∼ PoissonBinomial({p}^{φ_i^T φ_j}, {q}^{r² − φ_i^T φ_j}), (7)

where PoissonBinomial in (7) is a compact notation for a Poisson binomial distribution whose success probabilities comprise φ_i^T φ_j copies of p and r² − φ_i^T φ_j copies of q.

The proof is elaborated in Sec. 4.1 of the supplementary materials.

Each c-node can measure members of one or multiple communities. We denote by ν the maximum number of communities that overlap with a c-node, where 1 ≤ ν ≤ K. This is considered a Community Overlap (CO) constraint, and is illustrated in the next definition.

Definition 2.
A measurement matrix B is CO-ν with respect to a graph G = G(V, E) with community assignment matrix P, if the profile matrix Φ = BP^T satisfies:

1 ≤ |supp(φ_i)| ≤ ν  ∀ i ∈ [m].

Def. 2 means that the support of each row of B corresponds to at most ν of the communities in G. The next definition is the last step in formalizing the coarse graph community structure.

Definition 3.
A measurement matrix B is "balanced" with respect to a graph G = G(V, E) with community assignment matrix P, if the profile matrix Φ = BP^T satisfies Φ_{ik} = Φ_{ik′} for all i ∈ [m] and k, k′ ∈ supp(φ_i).

In other words, in a balanced-measured graph, an identical number of nodes is measured from each community that a c-node overlaps.

The objective of this paper is to recover the c-node profile matrix Φ from the measured graph W̃ in (5). Let a maximum a posteriori (MAP) estimator take a measured graph G̃ with true c-node profile matrix Φ, and return an estimate Φ̂ that assigns every c-node in G̃ to communities. We characterize an upper bound on the failure probability of the MAP estimator. A failure refers to assigning a wrong profile to at least one c-node, up to equivalent relabelling of communities. We also study the asymptotic conditions under which this error tends to zero.

Recovering Φ in (6) from the measured matrix W̃, without imposing additional constraints on Φ, is generally a very hard problem. Hence, we relax the problem to achieve tractability, by putting the constraints of Defs. 1, 2, and 3 on the measurement matrix B, with respect to the community assignment matrix P of the graph. In many practical settings, assumptions such as homogeneity are reasonable. For instance, electrocorticography (ECoG) signals are acquired from different brain regions using electrodes whose contact surface areas are the same. Nevertheless, a relaxation of these assumptions is of considerable interest and will serve as a compelling avenue for future exploration.

In the next section, we state and study the community recovery problem under the CO-ν constraint.

3 Community Recovery under the CO-ν Constraint

In this section, we derive an upper bound on the MAP recovery error of the profile matrix Φ, as described at the end of Sec. 2. The recovered profile matrix Φ̂ estimates at most ν communities from which each c-node measures.
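The constraints in Defs. 1-3 are easy to check numerically. The sketch below (illustrative code; the function names are ours) computes Φ = BP^T as in (6) and tests the CO-ν and balanced conditions on the worked example profile matrix described after Eq. (6):

```python
import numpy as np

def profile_matrix(B, P):
    """C-node profile matrix Phi = B P^T of Eq. (6); Phi[i, k] counts the
    fine nodes of community k measured by c-node i."""
    return B @ P.T

def satisfies_co_nu(Phi, nu):
    """CO-nu constraint (Def. 2): every c-node overlaps 1..nu communities."""
    supp = (Phi > 0).sum(axis=1)
    return bool(np.all((supp >= 1) & (supp <= nu)))

def is_balanced(Phi):
    """Balanced constraint (Def. 3): on each row, all nonzero counts agree."""
    return all(len(set(row[row > 0].tolist())) == 1 for row in Phi)

# Worked example from the text: r = 4 fine nodes per c-node, K = 3.
Phi = np.array([[0, 4, 0],
                [2, 0, 2],
                [4, 0, 0]])
```

For this example Φ, the measurement is balanced and CO-2 (the second c-node splits evenly across two communities), but not CO-1.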
We begin sketching our main result by defining

K^(ν) = Σ_{ℓ=1}^{ν} (K choose ℓ), (8)

the profile set

Υ^(ν) ≜ {φ_i | φ_i = b_i P^T, 1 ≤ |supp(φ_i)| ≤ ν, ∀ k ∈ supp(φ_i): φ_{ik} = r/|supp(φ_i)|}, (9)

and a one-to-one function h : Υ^(ν) → [K^(ν)]. The function h maps a c-node profile to an extended community indexed by [K^(ν)] (more explanation in Sec. 3.2). A probability matrix U ∈ [0, 1]^{K^(ν) × K^(ν)} is defined, for all a, a′ ∈ Υ^(ν) and k = h(a), k′ = h(a′), as:

U_{k,k′} = P(X ≥ r²(τ̃ p + (1 − τ̃) q)), (10)

where 0 ≤ τ̃ ≤ 1 and X is an auxiliary random variable distributed as:

X ∼ PoissonBinomial({p}^{a^T a′}, {q}^{r² − a^T a′}). (11)

Sec. 3.2 elaborates on the reasons behind these definitions, using the binarization of the coarse measured graph, i.e., mapping the c-edge weights to zero or one. Sec. 3.2 shows that the elements of the matrix U in (10) essentially denote the probability of having a connection between members of the extended communities (or, equivalently, between c-node profiles) in the binarized coarse graph. The prior distribution on the extended communities is denoted by the probability vector s. We define the scaled Chernoff-Hellinger (CH) divergence as:

D(diag(s) U_k, diag(s) U_{k′}) ≜ max_{0 ≤ t ≤ 1} Σ_{k″ ∈ [K^(ν)]} s_{k″} [t U_{k k″} + (1 − t) U_{k′ k″} − U_{k k″}^t U_{k′ k″}^{1−t}], (12)

where the original CH divergence is D₊ = (m / log m) D. The following theorem provides an error bound for community recovery from the coarse graph.

Theorem 1.
Let W ∼ SSBM(n, K, p, q) from which W̃ in (5) is measured under the r-homogeneous, balanced, and CO-ν constraints. Let s be a length-K^(ν) probability vector, let U_k denote the k-th column of the matrix U defined in (10), and let K^(ν) be as defined in (8). The probability that the MAP estimator fails to recover the c-node profile matrix Φ from W̃ (up to relabelling of Φ's columns) is upper-bounded by a sum, over pairs k, k′ ∈ [K^(ν)] with k < k′, of terms that decay exponentially in the scaled CH divergence D(diag(s) U_k, diag(s) U_{k′}) of (12).

Remark 1. In order to extract interpretable observations from the recovery error bound in Theorem 1, we examine the dominant term of the CH divergence in (12). For each pair of extended communities k, k′, the dominant term corresponds to an extended community k″ whose connectivity probability to those communities is the most distant. We derive an estimate for the dominant term in Sec. 4.2 of the supplementary materials, which demonstrates the following: the exponent of the error recovery bound (i.e., the CH divergence) increases as r and |α − β| increase (fixing whichever of α or β is smaller and increasing the other), or as ν decreases, while the other parameters remain unchanged.

In the following we list the observations derived from Theorem 1 and Remark 1:
1. As we increase the measurement size (i.e., m, the number of c-nodes), the error bound decreases.
2. As the coverage size per measurement (i.e., r, the number of measured fine nodes represented by a c-node) expands, the failure error bound decreases.
3. By allowing measurements to overlap with fewer communities (i.e., increasing the purity of the c-nodes), the error bound drops. This intuitively makes sense due to a decrease in complexity.
4. Widening the gap between the extra- and intra-community probabilities decreases the error bound.
This is intuitively expected since communities become more distinguishable from one another.

Note that the trends listed above hold so long as the prior s remains unchanged, or does not change such behaviors. We also assume that all parameters other than the one mentioned remain unchanged; otherwise, we would be perturbing multiple parameters simultaneously, which might make the behavior of the error bound unpredictable and heavily dependent on the parameter values.

The following corollary characterizes the asymptotic conditions under which the community recovery error, upper-bounded in Theorem 1, approaches zero.

Corollary 1. Let W ∼ SSBM(n, K, p, q) from which W̃ in (5) is measured under the r-homogeneous, balanced, and CO-ν constraints. The probability that the MAP estimator fails to recover the c-node profile matrix Φ from W̃ (up to relabelling of Φ's columns), for constants ∆ > 0 and 0 < τ̃ < 1, tends to zero as:

r > ∆/√f(n),  α ≠ β,  ∆²(m/n)² < f(n) ≤ f_0,  n ≥ n_0,  m, n → ∞. (14)

The constant ∆ is defined in equation (45) in Sec. 4.3 of the supplementary materials. The remaining parameters are assumed to remain fixed.

The condition in (14) is directly derived from the error bound in Theorem 1, by letting the exponent tend to infinity so that the error approaches zero. The complete proof is sketched in Sec. 3.2. Corollary 1 characterizes the impact of coarsening on community recovery. After the coarse graph is binarized, the connectivity probability between some pairs of c-nodes tends very quickly to zero, and the rest to one, which facilitates the separation of communities. Moreover, the measurement coverage size r (i.e., the number of measured fine nodes combined into a c-node) and the graph binarization threshold τ̃ must satisfy a lower and an upper bound, respectively, to allow perfect community recovery of a coarsened graph through its binarization.
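The scaled CH divergence in (12), which controls the exponent of the error bound, can be evaluated numerically. The sketch below (our own illustrative code) approximates the maximum over t by a dense grid; an exact line search could be substituted. The two-extended-community prior and matrix U used in the test are hypothetical values for illustration only:

```python
import numpy as np

def scaled_ch_divergence(s, U, k, kp, grid=2001):
    """Scaled Chernoff-Hellinger divergence of Eq. (12).

    s  : prior over the K^(nu) extended communities.
    U  : binarized connection-probability matrix of Eq. (10),
         with entries assumed in (0, 1).
    Approximates max over t in [0, 1] on a grid of `grid` points.
    """
    t = np.linspace(0.0, 1.0, grid)[:, None]      # shape (grid, 1)
    a, b = U[k], U[kp]                            # rows for communities k, k'
    terms = s * (t * a + (1 - t) * b - a ** t * b ** (1 - t))
    return float(terms.sum(axis=1).max())         # max over t of the sum
```

By the weighted AM-GM inequality each summand is non-negative, so the divergence is zero iff the two rows coincide, matching its role as a separability measure between extended communities.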
The recovery conditions derived in Corollary 1 are illustrated in the last column of Table 1, and are compared with those of the classic (non-coarsened) general SBM from the literature. The comparison is made in terms of various scalings of the parameters. The first column exhaustively partitions the scaling of the connection probability of the coarse graph, which can be a function of m, n, r and is denoted f̃(m, n, r); for each case, the second column shows the state-of-the-art conditions that allow or disallow exact recovery. In the third column, different scalings of the coarsening coverage size r are considered, where each scaling results in separate recovery conditions shown in the last column.

The community recovery problem under the CO-ν constraint refers to the problem of estimating the c-node profile matrix Φ that corresponds to the weighted adjacency matrix W̃ defined in (5) and measured from W ∼ SSBM(n, K, p, q) under the r-homogeneous, balanced, and CO-ν constraints. This way, W̃ is distributed according to (7) and hence can be thought of, and modeled as, a sample of a weighted version of the overlapping general SBM (OSBM) random graph ensemble. The formal definition of the general SBM is found in (Abbe and Sandon, 2015a). We define the weighted OSBM that models W̃ similarly to the classic OSBM, except that the node profiles φ_i, for all i, belong to the set Υ^(ν) defined in (9), rather than to the set of all length-K binary vectors {0, 1}^K. Furthermore, in the weighted OSBM that models W̃, an edge between a pair of nodes is distributed as the Poisson binomial distribution in (7), rather than the Bernoulli distribution of the classic OSBM. Note that due to the community overlap (i.e., CO-ν) assumption on W̃, the edge distributions depend on the inner product of the pairwise profiles, which takes values between 0 and r², i.e., 0 ≤ φ_i^T φ_j ≤ r².
Hence, the weighted OSBM is not symmetric. Deriving the conditions that allow community recovery from a weighted OSBM is an open problem (Xu et al., 2020), except for the symmetric case (c.f. Sec. 3.3 and Sec. 1 in the supplementary materials). In the following, we exploit the properties of the special case of the weighted OSBM relevant to this study, which enables its transformation to a classic (unweighted) general SBM.

Table 1: Comparison of the recovery conditions under the CO-ν constraint, derived from Corollary 1. All scaling notations (o for strictly smaller than, and Ω for strictly greater than, disregarding constants) are defined with respect to m. f(n) is the probability scaling of connections in the fine graph, Q is a constant matrix (i.e., it does not scale with other variables), ∆ is a positive constant, and f̃(m, n, r) represents the probability scaling of connections in the coarse graph, which is a function of the fine and coarse graph sizes and of the coverage size (i.e., the number of measured fine nodes represented by a c-node), to allow for comparison with the classic scenario (i.e., with n = m, r = 1).

Case f̃(m, n, r) = o((log m)/m):
  Classic exact recovery for SBM(m, s, U = Q f̃(m, m, 1)) as m, n → ∞ (Abbe and Sandon, 2015a): impossible.
  This paper, as m, n → ∞, by coverage size:
    r = o(1/√f(n)): impossible;
    r = c/√f(n): possible if α ≠ β, c > ∆, f(n) > ∆²(m/n)²;
    r = Ω(1/√f(n)): possible if α ≠ β, f(n) = Ω((m/n)²).

Case f̃(m, n, r) = c·(log m)/m:
  Classic: possible if D₊ > 1.
  This paper, by coverage size:
    r = o(√(m/log m)): impossible;
    r = c√(m/(c log m)): possible if α ≠ β, c > ∆, f(n) > ∆²(m/n)²;
    r = Ω(√(m/log m)): possible if α ≠ β, f(n) = Ω((m/n)²).

Case f̃(m, n, r) = Ω((log m)/m):
  Classic: possible if D₊ > 1.
  This paper, by coverage size:
    r = o(1/√f(n)): impossible;
    r = c/√f(n): possible if α ≠ β, c > ∆, f(n) > ∆²(m/n)²;
    r = Ω(1/√f(n)): possible if α ≠ β, f(n) = Ω((m/n)²).
We propose a two-stage strategy that first binarizes W̃ and then represents the resulting unweighted OSBM as an unweighted classic (non-overlapping) general SBM. The binarization is motivated for two reasons. First, binarization is widely used to simplify and sparsify weighted graphs. Second, through binarization, we can leverage existing work in the community detection literature to study the conditions for recovering the c-node profile matrix.

3.2.1 Binarization of W̃

The summation in the coarsening model (5) suggests the concentration of edge weights around a mean value. Hence, for the c-edges that correspond to a pair of c-nodes measuring from only one community, the weights tend to concentrate about the means pr² or qr². For c-nodes measuring from multiple communities, the corresponding c-edge weights concentrate about the means p φ_i^T φ_j + q(r² − φ_i^T φ_j). This motivates solving our weighted OSBM problem by first binarizing W̃. Such binarization facilitates community recovery by adopting the much more evolved tools available for unweighted graphs. We define the binarized coarse measured matrix W̃^(b) as:

W̃^(b)_{ij} ≜ 1 if W̃_{ij} ≥ r²(τ̃ p + (1 − τ̃) q), and 0 otherwise, (15)

for 0 ≤ τ̃ ≤ 1. The chosen threshold r²(τ̃ p + (1 − τ̃) q) in (15) is a suitable choice since it is lower- and upper-bounded by qr² and pr², the minimum and maximum mean values of W̃_{ij} over the various profile inner products. This way, we only keep the most significant edges, i.e., those whose weights are above the threshold interpolating toward the mean value of the intra-community connections.

Through the binarization explained in Sec. 3.2.1, the coarse graph W̃, previously modeled as a weighted general OSBM, is converted to W̃^(b), which is a classic (unweighted) general OSBM. Following the approach suggested in (Abbe, 2017), we convert the classic OSBM to an equivalent non-overlapping general SBM.
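The binarization step can be sketched as follows (our own illustrative code). Since each c-edge aggregates r² Bernoulli fine edges, its mean lies between q·r² and p·r², and the threshold interpolates between these two extremes via τ̃:

```python
import numpy as np

def binarize(W_tilde, r, p, q, tau=0.5):
    """Binarize the coarse matrix: keep a c-edge iff its weight reaches
    r^2 * (tau*p + (1 - tau)*q), a convex combination of the extreme
    c-edge means q*r^2 (pure inter-community) and p*r^2 (pure
    intra-community). tau = 0.5 is an arbitrary illustrative default."""
    thresh = r * r * (tau * p + (1 - tau) * q)
    return (W_tilde >= thresh).astype(int)
```

For example, with r = 10, p = 0.9, q = 0.1, and τ̃ = 0.5, the threshold is 100·0.5 = 50: a c-edge weight near the intra-community mean 90 survives, while one near the inter-community mean 10 is removed.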
To do so, instead of the original community set [K], we use the extended community set [K^(ν)], where K^(ν) ≜ |Υ^(ν)| is given in (8), and each extended community represents a possible c-node profile φ_i ∈ Υ^(ν), i ∈ [m]. The one-to-one function h : Υ^(ν) → [K^(ν)] provides an indexing for the extended communities, i.e., a profile vector φ_i ∈ Υ^(ν) maps to the extended community k = h(φ_i). This conversion of profiles to extended communities models the binarized matrix of measurements W̃^(b) in (15) as a general unweighted SBM, denoted W̃^(b) ∼ SBM(m, s, U), where s is a prior probability vector over the extended communities. Sec. 2 in the supplementary materials provides the formal definition of the general unweighted SBM, the derivation of the matrix U of community connectivity probabilities, and the remainder of the proof techniques for Theorem 1 and Corollary 1.

3.3 Community Recovery under the CO-1 Constraint

The results in Sec. 3.1 are applicable to coarse measured graphs under the general CO-ν constraint, for all 1 ≤ ν ≤ K. However, the CO-1 constraint is a special case, which corresponds to a weighted and symmetric SBM. Contrary to the general (i.e., non-symmetric) weighted SBM model, for which recovery is an open problem, community recovery from such a weighted and symmetric SBM has already been addressed in the literature (Jog and Loh, 2015; Xu et al., 2020). In the following theorem, we adopt the results of (Jog and Loh, 2015) to achieve stronger recovery conditions under the CO-1 constraint, compared with those of the general CO-ν scenario in Corollary 1.

Theorem 2. Let W ∼ SSBM(n, K, p, q) from which W̃ in (5) is measured under the r-homogeneous and CO-1 constraints.
The probability that the MAP estimator fails to recover the c-node profile matrix Φ from W̃ (up to relabelling of Φ's columns) tends to zero as:

α ≠ β and r√f(n) → ∞, if m < ∞;
α + β − 2√(αβ) > lim_{m→∞} [K²/(m r² f(n))], if m, n → ∞, (16)

provided f(n) → 0 as n → ∞. K is assumed to remain fixed.

More explanations and proof details are found in Sec. 1 and Sec. 4.4 of the supplementary materials.

In this section, we evaluate the error behavior of community recovery from synthetically generated coarse measured graphs. We compare the theoretical error bounds derived in Sec. 3 with state-of-the-art community detection methods from existing works, applied to the generated coarse graphs. It should be noted that these algorithmic methods only output the indices of the nodes estimated to be assigned to each community; this translates into the recovery of a binarized version of the community assignment matrix Φ. Refer to Sec. 3 in the supplementary materials for the detailed methodology used in this section. The Python code to reproduce the results of this paper is available at: https://github.com/NaGho/Community-Detection-From-Coarse-Measured-Graphs

In Fig. 2, the theoretical error bound (solid line), as well as the community recovery error for multiple state-of-the-art overlapping community detection methods (Rossetti et al., 2019), are plotted. The methods include modularized non-negative matrix factorization (M-NMF) (Wang et al., 2017), the speaker-listener label propagation algorithm (SLPA) (Xie et al., 2011, 2013), the non-negative symmetric encoder-decoder (NNSED) (Sun et al., 2017), and the cluster affiliation model for big networks (BigClam) (Yang and Leskovec, 2013) (dashed lines). Note that we have evaluated these methods for various hyper-parameters and plotted their best performance. The two panels of Fig. 2 plot the error (a) w.r.t. m for r = 50 and (b) w.r.t. r for m = 400.
Figure 2: Community recovery error for n = 30000 fine nodes, ν = 2 community overlap (CO), K = 5 communities, and intra- and extra-community connectivity-probability constants α = 500 and β = 50.

From Fig. 2a, we observe that as we increase the measurement size (i.e., m, the number of c-nodes), the theoretical error bound drops monotonically. (The results in this section are computed assuming p, q, K, r are known; however, using model-selection methods, heuristics can be developed to estimate these parameters when they are not known a priori.) Similarly, Fig. 2b plots the community recovery error with respect to the coverage size r (i.e., the number of measured fine nodes combined into a c-node), demonstrating that increasing the coverage size monotonically improves the theoretical community recovery error. These observations confirm the expectations stated subsequent to Theorem 1. Although the simulated methods, in both Fig. 2a and Fig. 2b, do not behave as predictably as the theoretical error bound, most of them show an overall decrease in recovery error as, respectively, the number of measurements and the coverage size increase. Note that the light shade around the theoretical bound in Fig. 2 represents the ambiguity in the calculation of the bound (c.f. Sec. 3 of the supplementary materials).

Note that the theoretical bound is an upper bound for the MAP estimator. Fig. 2 shows that this upper bound seems to be loose in certain regimes (e.g., for small m, r), in which existing methods perform better. However, as the measurement and coverage sizes increase, the theoretical error bound becomes tight and outperforms existing community detection methods with an increasing gap.

Conclusion

We introduced a mathematical framework, based on the stochastic block model, that characterizes community recovery from coarse measured graphs.
We developed theoretical conditions, on the quantity and properties of the measurements with respect to the community structure of the high-resolution graph, under which perfect recovery is achieved. The assumptions of homogeneous and balanced measurements were essential to this work; we leave the relaxation of these assumptions to future work. Moreover, community recovery in a coarse measured graph, in which communities are modeled using the weighted and overlapping stochastic block model, relied on edge-weight binarization. Future work can look into community recovery without binarization, in which one would use the full graph weight distribution for recovery.

Finally, a significant gap was observed, in certain regimes, between the performance of state-of-the-art community detection algorithms and the theoretical error bounds derived in this paper. This gap motivates future work on improving existing clustering algorithms to achieve their theoretical potential. An algorithmic investigation into recovery performance, e.g., similar to the variational inference approaches used in (Aicher et al., 2015; Dulac et al., 2020), is a promising direction for future work and would complement our theoretical analyses.

Acknowledgements

This work was supported by the Natural Sciences and Engineering Research Council (NSERC) of Canada through a Discovery Research Grant; the National Science Foundation Grants CCF-2029044 and CCF-2048223; the National Institutes of Health Grant 1R01GM140468-01; a Connaught International Scholarship for Doctoral Students; and the Vector Postgraduate Affiliate Award by the Vector Institute for AI.

References

Emmanuel Abbe. Community detection and stochastic block models: recent developments. The Journal of Machine Learning Research, 18(1):6446–6531, 2017.

Emmanuel Abbe and Colin Sandon. Community detection in general stochastic block models: Fundamental limits and efficient algorithms for recovery. In IEEE 56th Annual Symposium on Foundations of Computer Science (FOCS), pages 670–688. IEEE, 2015a.

Emmanuel Abbe and Colin Sandon.
Community detection in general stochastic block models: fundamental limits and efficient recovery algorithms. 2015b.

Emmanuel Abbe, Afonso S Bandeira, and Georgina Hall. Exact recovery in the stochastic block model. IEEE Transactions on Information Theory, 62(1):471–487, 2015.

Christopher Aicher, Abigail Z Jacobs, and Aaron Clauset. Learning latent block structure in weighted networks. Journal of Complex Networks, 3(2):221–248, 2015.

Selin Aviyente. Compressed sensing framework for EEG compression. In , pages 181–184. IEEE, 2007.

Richard G Baraniuk. Compressive sensing [lecture notes]. IEEE Signal Processing Magazine, 24(4):118–121, 2007.

Danielle S Bassett and Olaf Sporns. Network neuroscience. Nature Neuroscience, 20(3):353–364, 2017.

Richard F Betzel and Danielle S Bassett. Multi-scale brain networks. Neuroimage, 160:73–83, 2017.

Richard F Betzel, Maxwell A Bertolero, Evan M Gordon, Caterina Gratton, Nico UF Dosenbach, and Danielle S Bassett. The community structure of functional brain networks exhibits scale-specific patterns of inter- and intra-subject variability. Neuroimage, 202:115990, 2019.

Max Dabagia, Konrad P Kording, and Eva L Dyer. Comparing high-dimensional neural recordings by aligning their low-dimensional latent representations.

Gautam Dasarathy, Parikshit Shah, Badri Narayan Bhaskar, and Robert D Nowak. Sketching sparse matrices, covariances, and graphs via tensor products. IEEE Transactions on Information Theory, 61(3):1373–1388, 2015.

David L Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289–1306, 2006.

Stark C Draper and Sheida Malekpour. Compressed sensing over finite fields. In , pages 669–673. IEEE, 2009.

Adrien Dulac, Eric Gaussier, and Christine Largeron. Mixed-membership stochastic block models for weighted networks. In Conference on Uncertainty in Artificial Intelligence, pages 679–688. PMLR, 2020.

Santo Fortunato. Community detection in graphs.
Physics Reports, 486(3-5):75–174, 2010.

Santo Fortunato and Darko Hric. Community detection in networks: A user guide. Physics Reports, 659:1–44, 2016.

Karl J Friston. Functional and effective connectivity: a review. Brain Connectivity, 1(1):13–36, 2011.

Thorben Funke and Till Becker. Stochastic block models: A comparison of variants and inference methods. PloS One, 14(4):e0215296, 2019.

Nafiseh Ghoroghchian, Stark C Draper, and Roman Genov. A hierarchical graph signal processing approach to inference from spatiotemporal signals. In , pages 1–5. IEEE, 2018.

Nafiseh Ghoroghchian, David M Groppe, Roman Genov, Taufik A Valiante, and Stark C Draper. Node-centric graph learning from data for brain state identification. IEEE Transactions on Signal and Information Processing over Networks, 6:120–132, 2020.

Michelle Girvan and Mark EJ Newman. Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12):7821–7826, 2002.

David Harel and Yehuda Koren. On clustering using random walks. In International Conference on Foundations of Software Technology and Theoretical Computer Science, pages 18–41. Springer, 2001.

Haiyan Hou, Hildrun Kretschmer, and Zeyuan Liu. The structure of scientific collaboration networks in scientometrics. Scientometrics, 75(2):189–202, 2008.

Muhammad Aqib Javed, Muhammad Shahzad Younis, Siddique Latif, Junaid Qadir, and Adeel Baig. Community detection in networks: A multidisciplinary review. Journal of Network and Computer Applications, 108:87–111, 2018.

Yu Jin, Andreas Loukas, and Joseph JaJa. Graph coarsening with preserved spectral properties. In International Conference on Artificial Intelligence and Statistics, pages 4452–4462. PMLR, 2020.

Varun Jog and Po-Ling Loh. Information-theoretic bounds for exact recovery in weighted stochastic block models using the Rényi divergence. arXiv preprint arXiv:1509.06418, 2015.

George Karypis and Vipin Kumar.
A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on Scientific Computing, 20(1):359–392, 1998.

Dan Kushnir, Meirav Galun, and Achi Brandt. Fast multiscale clustering and manifold identification. Pattern Recognition, 39(10):1876–1891, 2006.

Andreas Loukas. Graph reduction with spectral and cut guarantees. Journal of Machine Learning Research, 20(116):1–42, 2019.

Christopher W Lynn and Danielle S Bassett. The physics of brain network structure, function and control. Nature Reviews Physics, 1(5):318, 2019.

Xueyu Mao, Purnamrita Sarkar, and Deepayan Chakrabarti. On mixed memberships and symmetric nonnegative matrix factorizations. In International Conference on Machine Learning, pages 2324–2333. PMLR, 2017.

Elchanan Mossel, Joe Neeman, and Allan Sly. Consistency thresholds for binary symmetric block models. arXiv preprint arXiv:1407.1591, 3(5), 2014.

Peter J Mucha, Thomas Richardson, Kevin Macon, Mason A Porter, and Jukka-Pekka Onnela. Community structure in time-dependent, multiscale, and multiplex networks. Science, 328(5980):876–878, 2010.

Ivan Osorio, Hitten P Zaveri, Mark G Frei, and Susan Arthurs. Epilepsy: the intersection of neurosciences, biology, mathematics, engineering, and physics. CRC Press, 2016.

Shubhankar P Patankar, Jason Z Kim, Fabio Pasqualetti, and Danielle S Bassett. Path-dependent connectivity, not modularity, consistently predicts controllability of structural brain networks. Network Neuroscience, pages 1–31, 2020.

Mostafa Rahmani, Andre Beckus, Adel Karimian, and George K Atia. Scalable and robust community detection with randomized sketching. IEEE Transactions on Signal Processing, 68:962–977, 2020.

Giulio Rossetti, Luca Pappalardo, and Salvatore Rinzivillo. A novel approach to evaluate community detection algorithms on ground truth. In Complex Networks VII, pages 133–144.
Springer, 2016.

Giulio Rossetti, Letizia Milli, and Rémy Cazabet. CDLIB: a python library to extract, compare and evaluate communities from complex networks. Applied Network Science, 4(1):52, 2019.

Ilya Safro, Peter Sanders, and Christian Schulz. Advanced coarsening schemes for graph partitioning. Journal of Experimental Algorithmics (JEA), 19:1–24, 2015.

M. Ángeles Serrano, Marián Boguñá, and Alessandro Vespignani. Extracting the multiscale backbone of complex weighted networks. Proceedings of the National Academy of Sciences, 106(16):6483–6488, 2009.

Olaf Sporns and Richard F Betzel. Modular brain networks. Annual Review of Psychology, 67:613–640, 2016.

Bing-Jie Sun, Huawei Shen, Jinhua Gao, Wentao Ouyang, and Xueqi Cheng. A non-negative symmetric encoder-decoder approach for community detection. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pages 597–606, 2017.

Wenpin Tang and Fengmin Tang. The Poisson binomial distribution: old & new. arXiv preprint arXiv:1908.10024, 2019.

Ulrike Von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, 2007.

Xiao Wang, Peng Cui, Jing Wang, Jian Pei, Wenwu Zhu, and Shiqiang Yang. Community preserving network embedding. In AAAI, volume 17, pages 203–209, 2017.

Jierui Xie, Boleslaw K Szymanski, and Xiaoming Liu. SLPA: Uncovering overlapping communities in social networks via a speaker-listener interaction dynamic process. In , pages 344–349. IEEE, 2011.

Jierui Xie, Stephen Kelley, and Boleslaw K Szymanski. Overlapping community detection in networks: The state-of-the-art and comparative study. ACM Computing Surveys (CSUR), 45(4):1–35, 2013.

Min Xu, Varun Jog, Po-Ling Loh, et al. Optimal rates for community estimation in the weighted stochastic block model. The Annals of Statistics, 48(1):183–204, 2020.

Jaewon Yang and Jure Leskovec. Overlapping community detection at scale: a nonnegative matrix factorization approach.
In Proceedings of the Sixth ACM International Conference on Web Search and Data Mining, pages 587–596. ACM, 2013.

Jaewon Yang and Jure Leskovec. Defining and evaluating network communities based on ground-truth. Knowledge and Information Systems, 42(1):181–213, 2015.

Anderson Y Zhang, Harrison H Zhou, et al. Minimax rates of community detection in stochastic block models. The Annals of Statistics, 44(5):2252–2280, 2016.

Supplementary Materials

The CO-1 constraint in Sec. 3.3

In CO-1 constrained measurements with respect to a graph G ∈ G(V, E), each c-node only measures from one community. Hence, the profile matrix Φ defined in (6) has row-wise support of size 1, and can be re-expressed as a profile vector θ ∈ [K]^m, where, for all i ∈ [m], θᵢ identifies the index of the community with which the i-th c-node aligns:

$$ \theta_i = k \quad \text{if } \Phi_{ik} \ne 0. \tag{17} $$

This way, the statistics of ˜W simplify as in the following lemma.

Lemma 2. Let W ∼ SSBM(n, K, p, q), from which ˜W in (5) is measured under the r-homogeneous and CO-1 measurement constraints. Then the ˜Wᵢⱼ's are independent random variables for all i > j, with distribution:

$$
\tilde W_{ij} \sim \begin{cases} \mathrm{Binomial}(r^2,\ p) & \text{if } \theta_i = \theta_j\\ \mathrm{Binomial}(r^2,\ q) & \text{otherwise.} \end{cases} \tag{18}
$$

Proof. The result follows directly from Lemma 1 and the connection between the c-node profile matrix and vector in (17).

Assuming a uniform prior over the set [K] on the elements of θ, the community recovery problem reduces to estimating the c-node profile vector θ (i.e., assigning an element of [K] to each c-node) from a graph with weighted adjacency matrix ˜W of (5), measured from W ∼ SSBM(n, K, p, q) under the r-homogeneous, balanced, and CO-1 constraints.

Let an MAP estimator take a measured graph ˜G with true c-node profile vector θ, and produce an estimate ˆθ that assigns an element of [K] to every c-node in ˜G.
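Lemma 2 can be sanity-checked with a short simulation. The sketch below is our own illustration with arbitrary small parameter values, not the paper's released code: it samples a fine SSBM adjacency matrix, aggregates disjoint r-node groups into c-nodes, and checks that a same-community coarse weight concentrates around the Binomial(r², p) mean.

```python
import numpy as np

rng = np.random.default_rng(0)

K, r, p, q = 2, 4, 0.7, 0.2   # communities, coverage size, intra-/extra-community probs
m_per_k = 3                   # c-nodes measured per community
m = K * m_per_k               # total number of c-nodes
n = m * r                     # fine nodes, one disjoint r-node group per c-node

# Community label of every fine node (block layout for simplicity).
labels = np.repeat(np.arange(K), m_per_k * r)

# Measurement matrix B: c-node i sums over its own disjoint group of r fine nodes.
B = np.zeros((m, n))
for i in range(m):
    B[i, i * r:(i + 1) * r] = 1

samples = []
for _ in range(2000):
    # Symmetric SSBM adjacency: edge probability p within a community, q across.
    P = np.where(labels[:, None] == labels[None, :], p, q)
    W = (rng.random((n, n)) < P).astype(float)
    W = np.triu(W, 1)
    W = W + W.T
    Wt = B @ W @ B.T              # coarse measurement, \tilde{W} = B W B^T
    samples.append(Wt[0, 1])      # c-nodes 0 and 1 lie in the same community

# Under CO-1, \tilde{W}_{01} is a sum of r^2 independent Bernoulli(p) fine edges,
# i.e., Binomial(r^2, p); its empirical mean should be close to r^2 * p.
print(np.mean(samples), r**2 * p)
```

The empirical mean of the same-community coarse weight matches the Binomial(r², p) mean r²p up to sampling noise, which is the distributional claim of (18).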
In the following, we study the conditions under which the probability of failure, i.e., the probability of assigning a wrong community to at least one c-node up to relabelling of the communities, approaches 0 when the intra- and extra-community distributions of associations are Binomial(r², p) and Binomial(r², q), respectively.

As explained before, ˜W distributed as (18) can be thought of as a graph representation modeled as a weighted variant of the SSBM (WSSBM). The problem of community recovery from the WSSBM is addressed in (Jog and Loh, 2015; Xu et al., 2020). The following lemma uses Theorem 3.2 in (Jog and Loh, 2015) to find an upper bound on the probability of MAP failure in estimating the c-node profiles.

Lemma 3. Let W ∼ SSBM(n, K, p, q), from which ˜W in (5) is measured under the r-homogeneous and CO-1 constraints. The probability that the MAP estimator fails to recover the c-node profile vector θ ∈ [K]^m from ˜W (up to relabelling of the indices) is upper-bounded by:

$$
P(\text{MAP failure}) \le \sum_{\bar m = 1}^{\lfloor m/K \rfloor} \min\left\{\Big(\frac{emK}{\bar m}\Big)^{\bar m},\, K^m\right\} e^{-\left(\frac{\bar m m}{K} - \bar m^2\right) I} \;+\; \sum_{\bar m = \lfloor m/K \rfloor + 1}^{m} \min\left\{\Big(\frac{emK}{\bar m}\Big)^{\bar m},\, K^m\right\} e^{-\frac{\bar m m}{2K} I}, \tag{19}
$$

where

$$
I \triangleq -r^2 \log\left( \sqrt{(1-p)(1-q)} + \sqrt{pq} \right). \tag{20}
$$

The error bound in (19) is taken directly from Theorem 3.2 of (Jog and Loh, 2015) after some notational adjustments. Additionally, we derived the Rényi divergence in (20) in closed form using the distributions in (18). A complete proof can be found in Sec. 4.6 of the supplementary materials.

Lemma 3 not only helps us characterize and predict the behaviour of the community recovery error, but also allows studying the conditions on the parameters of the underlying generative SSBM on W that asymptotically guarantee such community recovery.
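The closed form in (20) rests on the Bhattacharyya-coefficient identity for binomial distributions, Σₓ √(P(x)Q(x)) = (√(pq) + √((1−p)(1−q)))^N for P = Binomial(N, p) and Q = Binomial(N, q), with N playing the role of r². The quick numerical check below is our own sketch, with illustrative parameter values.

```python
from math import comb, sqrt, log

def bhattacharyya_binomial(N, p, q):
    """Exact Bhattacharyya coefficient sum_x sqrt(P(x) Q(x)) for two Binomial(N, .) laws."""
    return sum(sqrt(comb(N, x) * p**x * (1 - p)**(N - x)
                    * comb(N, x) * q**x * (1 - q)**(N - x))
               for x in range(N + 1))

N, p, q = 25, 0.6, 0.1                # N stands in for r^2; values are illustrative
coeff = bhattacharyya_binomial(N, p, q)

# Binomial theorem collapses the sum into a single power, which is what makes
# the divergence in (20) linear in the number of trials.
closed_form = (sqrt(p * q) + sqrt((1 - p) * (1 - q)))**N

print(-log(coeff), -N * log(sqrt(p * q) + sqrt((1 - p) * (1 - q))))
```

The two printed values agree, confirming that the negative log-Bhattacharyya term scales linearly in the number of Bernoulli trials, exactly as the r² factor in (20).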
This has been investigated in Theorem 2, in which the recovery of the c-node profile matrix Φ in (6) is equivalent to the recovery of the c-node profile vector θ ∈ [K]^m defined in (17). For better visualization and understanding, the recovery conditions of Theorem 2 are illustrated in the last column of Table 2. The table compares the recovery conditions of a coarse measured graph with those of the classic SBM in the literature, in terms of various scalings of the parameters. The first column provides an exhaustive partitioning of the connection-probability scalings f(n) of the fine graph, for which the second column shows the state-of-the-art conditions that allow exact recovery. In the third column, corresponding to each fine connection-probability scaling, different scalings of the coarsening coverage size r (i.e., the number of measured fine nodes combined into a c-node) are considered, which result in the separate recovery conditions shown in the last column. The additional constraint on the f(n) scaling in the last column comes from the inequality r ≤ n/m, which follows from the definition of the coverage size r in Def. 1.

From Lemma 3 and its consequences demonstrated in Table 2, a significant observation can be derived. When the connection-probability scaling of an observed graph is sub-logarithmic, i.e., o(log m/m), classic (uncoarsened) community recovery is impossible (Abbe and Sandon, 2015a). However, graph coarsening allows for community recovery by compensating via a large coverage size. This is significant: in classic graphs with constant-size measurements, driving the community recovery error to zero is impossible, while it is possible for coarsened graphs with coverage size growing as Ω(√(log m/(f(n)m))).
Furthermore, as the measurement size tends to infinity, the gap between α and β required for recovery becomes less severe as the coverage size grows, compared with the classic version (r = 1).

Table 2: Comparison of the recovery conditions under the CO-1 constraint, derived from Theorem 2. All scaling notations (o for strictly smaller, Θ for equal, and Ω for strictly greater orders) are defined with respect to m. f(n) is the probability scaling of connections in the fine graph; f̃(m, n, r) represents the probability scaling of connections in the coarse graph, which is a function of the fine- and coarse-graph sizes and of the coverage size, to allow comparison with the classic scenario (i.e., with n = m, r = 1).

Probability scaling | Classic (exact) recovery, SSBM(m, α f̃(m,m,1), β f̃(m,m,1)) | Scaling of coverage r | Recovery, this paper
o(log m/m) | Impossible | o(√(log m/(f(n)m))) | Impossible
o(log m/m) | Impossible | c·√(log m/(f(n)m)) | Possible if α + β − 2√(αβ) ≥ K/c² and f(n) ≥ c² m log m / n², m, n → ∞
o(log m/m) | Impossible | Ω(√(log m/(f(n)m))) | Possible if α ≠ β and f(n) = Ω(m log m / n²)
c·log m/m | Possible if α + β − 2√(αβ) ≥ K/c, m, n → ∞ | o(1) | Impossible
c·log m/m | (as above) | Θ(1) | Possible if α + β − 2√(αβ) ≥ K/(c r²) and f(n) ≥ r² m log m / n², m, n → ∞
c·log m/m | (as above) | Ω(1) | Possible if α ≠ β and f(n) = Ω(m log m / n²)
Ω(log m/m) | Possible if α ≠ β, m, n → ∞ | o(√(log m/(f(n)m))) | Impossible
Ω(log m/m) | (as above) | c·√(log m/(f(n)m)) | Possible if α + β − 2√(αβ) ≥ K/c² and f(n) ≥ c² m log m / n², m, n → ∞
Ω(log m/m) | (as above) | Ω(√(log m/(f(n)m))) | Possible if α ≠ β and f(n) = Ω(m log m / n²)

We started the proof of Theorem 1 and Corollary 1 by modeling the Φ recovery problem under the CO-ν constraints as a community detection problem from a weighted OSBM. Next, we used a mapping to convert the problem to community detection from a general unweighted SBM model.
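To make the profile-to-extended-community conversion concrete, the sketch below (our own illustration, not code from the paper) enumerates the balanced CO-ν profiles, assuming a balanced profile spreads its mass equally over a support of at most ν communities, and builds the one-to-one index h.

```python
from itertools import combinations
from fractions import Fraction

def extended_communities(K, nu):
    """Enumerate r-normalized balanced CO-nu profiles and index them (the map h)."""
    profiles = []
    for size in range(1, nu + 1):               # number of communities a c-node covers
        for support in combinations(range(K), size):
            # Balanced profile: equal mass 1/|support| on each covered community.
            phi = tuple(Fraction(1, size) if k in support else Fraction(0)
                        for k in range(K))
            profiles.append(phi)
    h = {phi: idx for idx, phi in enumerate(profiles)}   # one-to-one indexing
    return profiles, h

profiles, h = extended_communities(K=5, nu=2)
print(len(profiles))   # K(nu) = C(5,1) + C(5,2) = 15 extended communities
```

With the paper's simulation values K = 5 and ν = 2, the extended community set has C(5,1) + C(5,2) = 15 elements, each of which becomes one block of the general unweighted SBM.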
This conversion facilitates finding the estimation error bounds in Theorem 1 and the recovery conditions in Corollary 1. Before deriving the matrix of community connectivity probabilities U in Lemma 4, we first restate the formal definition of the general unweighted SBM, adapted from Definition 1 in (Abbe, 2017), in the following remark.

Remark 2. Let V be a positive integer (the number of vertices), κ be a positive integer (the number of communities), s = (s₁, ···, s_κ) be a probability vector on [κ] (the prior on the κ communities), and A be a κ × κ symmetric matrix with entries in [0, 1] (the connectivity probabilities). The pair (X, G) is drawn under SBM(V, s, A) if X is a V-dimensional random vector with i.i.d. components distributed under s, and G is a V-node graph in which vertices i and j are connected with probability A_{Xᵢ, Xⱼ}, independently of other pairs of vertices.

For notational simplicity in the following, rather than considering the pair (X, G) being drawn from SBM(V, s, A), we only use the adjacency matrix, which is a representative notation for G.

Lemma 4. Let W ∼ SSBM(n, K, p, q), from which ˜W in (5) is measured under the r-homogeneous, balanced, and CO-ν constraints, and let ˜W be binarized to the matrix ˜W^(b) as in (15). Then ˜W^(b) is distributed as ˜W^(b) ∼ SBM(m, s, U), with s the length-K(ν) prior distribution vector of the c-nodes' extended community profiles, and, for all i, j ∈ [m] and φᵢ, φⱼ ∈ Υ(ν):

$$
U_{h(\varphi_i),\, h(\varphi_j)} = P\big(X \ge r^2(\tilde{\tau} p + (1-\tilde{\tau}) q)\big), \tag{21}
$$

for X ∼ PoissonBinomial({p}^{φᵢᵀφⱼ}, {q}^{r² − φᵢᵀφⱼ}). The proof is found in Sec. 4.5 of the supplementary materials.

Remark 3. The r-normalized c-node profile vectors, i.e., φᵢ/r for all i ∈ [m], corresponding to r-homogeneous, balanced, and CO-ν constrained measurements, take values in the set {1/ν, 1/(ν−1), ···, 1, 0} and are independent of n, m, r.
Moreover, r is assumed to be divisible by {1, ···, ν}. The remaining proof of Corollary 1 is elaborated in Sec. 4.3 of the supplementary materials.

The numerical results essentially comprise two parts:

• Calculating the theoretical error bound derived in Theorem 1: The bound computes (13), for which we need to calculate U. From Lemma 4, each element of U is a Poisson binomial tail probability, which is intractable to compute for our parameter values. Hence, we calculate the mean, as well as the lower and upper bounds of the U elements derived in Lemma 5, to account for the ambiguity of the U evaluation.

• Evaluating the performance of existing state-of-the-art community detection methods on synthetically generated graphs: We generate fine graphs W, with their corresponding community assignment matrices P, using the SBM random graph generators in the networkX Python module. We fix ν and s, the prior on the extended communities, based on which a profile matrix Φ is randomly generated under the r-homogeneous, balanced, and CO-ν constraints. Next, we coarse-measure the fine graph according to (5), using a randomly generated B (based on the profile matrix Φ and the community assignment matrix P of the initially generated graph). Note that, as explained in Sec. 3, we utilized the general SBM framework to characterize and derive error bounds for the recovery of communities from the binarized, SBM-represented coarse graphs. The algorithm for such community recovery does not yet have an efficient implementation (Abbe and Sandon, 2015a). We use the following four existing state-of-the-art overlapping community detection methods, applied to the generated, and later binarized, coarse graphs:

1. Modularized Non-Negative Matrix Factorization (M-NMF) (Wang et al., 2017),
2. Speaker-listener Label Propagation Algorithm (SLPA) (Xie et al., 2011),
3. Non-Negative Symmetric Encoder-Decoder (NNSED) (Sun et al., 2017),
4. Cluster Affiliation Model for Big Networks (BigClam) (Yang and Leskovec, 2013).

We used the CDLIB Python module, which provides implementations of these algorithms (Rossetti et al., 2019). To evaluate the goodness of recovery, we use the 1 − "nF1" measure, i.e., normalized F1 subtracted from 1 (Rossetti et al., 2016), to evaluate the overlapping community detection error. nF1 is considered a standard and computationally tractable community evaluation measure, also implemented as part of the CDLIB module.

Proof. Given the SSBM generative model for W and the measurement matrix B, each element of ˜W in (5) is a sum of independent Bernoulli random variables with success probabilities p and q. Since the bᵢ's are disjoint, the ˜Wᵢⱼ's are independent random variables for all i > j, with distribution:

$$
\tilde{W}_{ij} \sim \mathrm{PoissonBinomial}\big(\{p\}^{\varphi_i^\top \varphi_j},\ \{q\}^{|\mathrm{supp}(b_i)||\mathrm{supp}(b_j)| - \varphi_i^\top \varphi_j}\big). \tag{22}
$$

Note that (22) holds in general for the measurement matrices defined prior to (5). Under the r-homogeneous measurement assumption of Def. 1, for all i ∈ [m] we have:

$$
|\mathrm{supp}(b_i)| = r. \tag{23}
$$

Hence, (22) simplifies to (7). Both (22) and (7) give the likelihoods of the ˜W elements given B and Φ.

Proof. We intend to find a lower bound on D in (12) whose dominant term is sufficiently simple to derive interpretable observations from Theorem 1. We start with the definition in (12), fixing the parameter t = 1/2:

$$
\begin{aligned}
D(\mathrm{diag}(s)U_k,\, \mathrm{diag}(s)U_{k'}) &\ge \sum_{k'' \in [K^{(\nu)}]} s_{k''}\left[ U_{kk''} + U_{k'k''} - 2\sqrt{U_{kk''}U_{k'k''}} \right]\\
&\ge \max_{k'' \in [K^{(\nu)}]} s_{k''}\left[ U_{kk''} + U_{k'k''} - 2\sqrt{U_{kk''}U_{k'k''}} \right]\\
&= \max_{k'' \in [K^{(\nu)}]} s_{k''}\left( \sqrt{U_{kk''}} - \sqrt{U_{k'k''}} \right)^2.
\end{aligned} \tag{24}
$$
To continue, we provide lower and upper bounds on the U elements in the following lemma.

Lemma 5. The elements of the extended community connectivity matrix U defined in (10) can be upper and lower bounded, for all a, a′ ∈ Υ(ν), by:

$$
\left| U_{h(a),h(a')} - \Psi\!\left( \frac{r(p-q)\big((\tfrac{a}{r})^{\top}\tfrac{a'}{r} - \tilde\tau\big)}{\sqrt{p(1-p)(\tfrac{a}{r})^{\top}\tfrac{a'}{r} + q(1-q)\big(1 - (\tfrac{a}{r})^{\top}\tfrac{a'}{r}\big)}} \right) \right|
\le \frac{c_0}{r\sqrt{p(1-p)(\tfrac{a}{r})^{\top}\tfrac{a'}{r} + q(1-q)\big(1-(\tfrac{a}{r})^{\top}\tfrac{a'}{r}\big)}}, \tag{25}
$$

where c₀ is an absolute (Berry–Esseen-type) constant and Ψ(x) = ∫_{−∞}^{x} (1/√(2π)) e^{−t²/2} dt is the cumulative distribution function of the standard normal distribution.

Proof. The proof can be found in Sec. 4.7 of the supplementary materials.

By defining k = h(a), k′ = h(a′), and k″ = h(a″), we substitute (25) into (24) to further simplify the lower bound:

$$
\begin{aligned}
&D(\mathrm{diag}(s)U_k,\, \mathrm{diag}(s)U_{k'}) \ge \max_{a'' \in \Upsilon^{(\nu)}} s_{h(a'')}\, \times\\
&\Bigg( \sqrt{ \max\!\left[ \Psi\!\left( \frac{(\alpha-\beta)\big((\tfrac{a}{r})^{\top}\tfrac{a''}{r} - \tilde\tau\big)}{\sqrt{(\alpha-\beta)(\tfrac{a}{r})^{\top}\tfrac{a''}{r} + \beta}}\, r\sqrt{f(n)} \right) - \frac{c_0}{r\sqrt{f(n)}\sqrt{\alpha(1-p)(\tfrac{a}{r})^{\top}\tfrac{a''}{r} + \beta(1-q)\big(1-(\tfrac{a}{r})^{\top}\tfrac{a''}{r}\big)}},\ 0 \right]}\\
&\quad - \sqrt{ \min\!\left[ \Psi\!\left( \frac{(\alpha-\beta)\big((\tfrac{a'}{r})^{\top}\tfrac{a''}{r} - \tilde\tau\big)}{\sqrt{(\alpha-\beta)(\tfrac{a'}{r})^{\top}\tfrac{a''}{r} + \beta}}\, r\sqrt{f(n)} \right) + \frac{c_0}{r\sqrt{f(n)}\sqrt{\alpha(1-p)(\tfrac{a'}{r})^{\top}\tfrac{a''}{r} + \beta(1-q)\big(1-(\tfrac{a'}{r})^{\top}\tfrac{a''}{r}\big)}},\ 1 \right]} \Bigg)^2. \tag{26}
\end{aligned}
$$

Assumption 1. The prior vector s does not affect the dominant term in (26), i.e., the maximizing term in (26) is the same whether or not s_{h(a″)} is included in the maximization.
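The normal approximation behind Lemma 5 can be probed numerically. The sketch below is our own illustration, with arbitrary parameter values and a threshold mimicking r²(τ̃p + (1−τ̃)q) for τ̃ = 1/2: it computes the exact Poisson binomial tail by convolution and compares it with the Ψ-based approximation.

```python
import numpy as np
from math import erf, sqrt

def poisson_binomial_pmf(probs):
    """Exact PMF of a sum of independent Bernoulli(p_i) trials via convolution."""
    pmf = np.array([1.0])
    for pr in probs:
        pmf = np.convolve(pmf, [1 - pr, pr])
    return pmf

r, p, q, omega = 6, 0.5, 0.1, 0.5     # illustrative values; omega = fraction of p-trials
n_trials = r * r                      # a coarse edge aggregates r^2 fine edges
probs = [p] * int(omega * n_trials) + [q] * (n_trials - int(omega * n_trials))

t = n_trials * (0.5 * p + 0.5 * q)    # threshold r^2(tau*p + (1-tau)*q), tau = 1/2
pmf = poisson_binomial_pmf(probs)
exact_tail = pmf[int(np.ceil(t)):].sum()

mu = sum(probs)                                    # mean of the Poisson binomial
sigma = sqrt(sum(pr * (1 - pr) for pr in probs))   # its standard deviation
Psi = lambda x: 0.5 * (1 + erf(x / sqrt(2)))       # standard normal CDF
approx_tail = Psi((mu - t) / sigma)

print(exact_tail, approx_tail)        # close; the gap shrinks like 1/sigma
```

The exact and approximate tails differ by a quantity on the order of 1/σ, which is the shape of the two-sided error term in (25).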
If Assumption 1 holds, a simple solution that gives an intuitively good estimate of the index of the dominant community in (26) is:

$$
\hat k = h(\hat a), \qquad \hat a = \begin{cases} \arg\max_{a'' \in \Upsilon^{(\nu)}} (a - a')^{\top} a'' & \text{if } \alpha > \beta\\ \arg\max_{a'' \in \Upsilon^{(\nu)}} (a' - a)^{\top} a'' & \text{if } \alpha < \beta. \end{cases} \tag{27}
$$

It is easily seen that ˆa in both conditions of (27) is achieved for:

$$
\hat a_u = \begin{cases} r & u = \arg\max_v |a_v - a'_v|\\ 0 & \text{otherwise}, \end{cases} \qquad \alpha \ne \beta. \tag{28}
$$

Equation (28) shows that the index of the estimated dominant term is independent of all parameters, except for the index of the community pair it is calculated for, i.e., a, a′ (or equivalently k, k′). We continue simplifying (26) by defining:

$$
\omega \triangleq \max_v \left( \frac{a_v}{r},\, \frac{a'_v}{r} \right). \tag{29}
$$

From (28), we have:

$$
\begin{cases} \big(\tfrac{a}{r}\big)^{\top} \tfrac{\hat a}{r} = \omega,\ \big(\tfrac{a'}{r}\big)^{\top} \tfrac{\hat a}{r} = 0 & \text{if } \alpha > \beta,\\[2pt] \big(\tfrac{a}{r}\big)^{\top} \tfrac{\hat a}{r} = 0,\ \big(\tfrac{a'}{r}\big)^{\top} \tfrac{\hat a}{r} = \omega & \text{if } \alpha < \beta. \end{cases} \tag{30}
$$

Replacing (30) into (26) yields:

$$
D(\mathrm{diag}(s)U_k,\, \mathrm{diag}(s)U_{k'}) \ge s_{\hat k}
\begin{cases}
\left( \sqrt{\max\!\left[\Psi\!\left(\dfrac{(\alpha-\beta)(\omega-\tilde\tau)}{\sqrt{(\alpha-\beta)\omega+\beta}}\, r\sqrt{f(n)}\right) - \dfrac{c_0}{r\sqrt{f(n)}\sqrt{\alpha(1-p)\omega + \beta(1-q)(1-\omega)}},\ 0\right]} \right.\\
\quad \left. -\ \sqrt{\min\!\left[\Psi\!\left(\dfrac{-(\alpha-\beta)\tilde\tau}{\sqrt{\beta}}\, r\sqrt{f(n)}\right) + \dfrac{c_0}{r\sqrt{f(n)}\sqrt{\beta(1-q)}},\ 1\right]} \right)^2 & \text{if } \alpha > \beta,\\[8pt]
\left( \sqrt{\max\!\left[\Psi\!\left(\dfrac{-(\alpha-\beta)\tilde\tau}{\sqrt{\beta}}\, r\sqrt{f(n)}\right) - \dfrac{c_0}{r\sqrt{f(n)}\sqrt{\beta(1-q)}},\ 0\right]} \right.\\
\quad \left. -\ \sqrt{\min\!\left[\Psi\!\left(\dfrac{(\alpha-\beta)(\omega-\tilde\tau)}{\sqrt{(\alpha-\beta)\omega+\beta}}\, r\sqrt{f(n)}\right) + \dfrac{c_0}{r\sqrt{f(n)}\sqrt{\alpha(1-p)\omega + \beta(1-q)(1-\omega)}},\ 1\right]} \right)^2 & \text{if } \alpha < \beta.
\end{cases} \tag{31}
$$

Note that even if Assumption 1 does not hold, or if (27) does not give the maximum term in (26), i.e.,
if (28) is not the dominant index of (26), the inequality in (31) still holds as a lower bound.

For 0 < τ̃ < 1/ν and αf(n), βf(n) ≤ 1/2, it is straightforward to see that the estimated dominant lower-bound term in (31) increases as r, f(n), ω, or the gap between α and β increases, while the other parameters remain unchanged. Increasing the gap between α and β is achieved by fixing whichever of α or β is smaller and increasing the other one. From the definition (29), ω only depends on the extended community index pair k, k′ for which it is calculated. As ν increases, such community-pair profiles, i.e., a and a′, are allowed to cover more communities. Hence, from Def. 2 and 3, a decrease in ν results in an increase in ω.

Proof. In the following we address the recovery condition laid out in the corollary. The conditions are derived mainly using Abbe's community recovery conditions in Theorem 1 of the general SBM paper (Abbe and Sandon, 2015a).

From Lemma 4, we obtain that when ˜W is binarized to ˜W^(b) according to (15), ˜W^(b) can be modeled as ˜W^(b) ∼ SBM(m, s, U), where U is defined in (21) and s is the length-K(ν) prior distribution vector of the c-nodes' extended community profiles. The probability that the MAP estimator fails to recover Φ from ˜W^(b) ∼ SBM(m, s, U) tends to zero if, for all k, k′ ∈ [K(ν)] with k ≠ k′ (Abbe and Sandon, 2015a):

$$
\left[ \lim_{m\to\infty} \frac{m}{\log m}\, D(\mathrm{diag}(s)U_k,\, \mathrm{diag}(s)U_{k'}) \right] \ge 1. \tag{32}
$$

D is the scaled CH divergence defined in (12). (The term m/log m should multiply U in the original CH divergence definition, but we excluded it from D for notational simplicity.)
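The condition above reduces, through the t = 1/2 lower bound used next, to checking strict positivity for every pair of extended communities. The sketch below is our own illustration, with an arbitrary connectivity matrix U and prior s (not values from the paper), of that positivity check.

```python
import numpy as np

def ch_lower_bound(s, U, k, kp):
    """t = 1/2 divergence lower bound: sum_k'' s_k'' (sqrt(U[k,k'']) - sqrt(U[kp,k'']))^2."""
    d = np.sqrt(U[k]) - np.sqrt(U[kp])
    return float(np.sum(s * d * d))

# Illustrative 3-extended-community connectivity matrix and prior.
U = np.array([[0.30, 0.10, 0.05],
              [0.10, 0.25, 0.08],
              [0.05, 0.08, 0.20]])
s = np.array([0.5, 0.3, 0.2])

# Asymptotically exact recovery is guaranteed when the bound is strictly positive
# for every pair of distinct extended communities, since the m/log m factor then
# drives the left-hand side of the recovery condition to infinity.
pairs = [(k, kp) for k in range(3) for kp in range(3) if k != kp]
print(all(ch_lower_bound(s, U, k, kp) > 0 for k, kp in pairs))
```

Because every pair of rows of this U differs in at least one entry with positive prior weight, the bound is strictly positive for all pairs, which is exactly the sufficient condition derived below.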
D is lower-bounded for the fixed parameter t = 1/2:

$$
D(\mathrm{diag}(s)U_k,\, \mathrm{diag}(s)U_{k'}) \ge \sum_{k'' \in [K^{(\nu)}]} s_{k''}\left[ U_{kk''} + U_{k'k''} - 2\sqrt{U_{kk''}U_{k'k''}} \right]. \tag{33}
$$

From (33), we can easily deduce that if, for all k, k′ ∈ [K(ν)] with k ≠ k′, the term Σ_{k″∈[K(ν)]} s_{k″}[U_{kk″} + U_{k′k″} − 2√(U_{kk″}U_{k′k″})] is strictly greater than zero, then condition (32) is always satisfied (i.e., the LHS of (32) tends to ∞ as m → ∞). In order to have strictly positive inner summation terms, from the inequality of arithmetic and geometric means, for all k, k′ ∈ [K(ν)] it is sufficient to have:

$$
\exists\, k'' \in [K^{(\nu)}]:\quad s_{k''}\left[ U_{k,k''} + U_{k',k''} - 2\sqrt{U_{k,k''}U_{k',k''}} \right] > 0. \tag{34}
$$

Equation (34) holds if, for all k, k′ ∈ [K(ν)]:

$$
\exists\, k'' \in [K^{(\nu)}]:\quad s_{k''} > 0, \quad |U_{k,k''} - U_{k',k''}| > 0, \tag{35}
$$

where k = h(a), k′ = h(a′), and k″ = h(a″) for the profile vectors a, a′, a″ and the mapping function h. The assumptions of balanced and CO-ν measurements in Def. 2 and 3 imply that for all profile vectors a, a′ ∈ Υ(ν) with a ≠ a′, there exists at least one u ∈ [K] such that either aᵤ/r ≥ 1/ν and a′ᵤ = 0, or a′ᵤ/r ≥ 1/ν and aᵤ = 0.
Without loss of generality, we assume the former, which gives $U_{k,k''} - U_{k',k''} > 0$ (otherwise we swap the roles of $a$ and $a'$ to get $U_{k,k''} - U_{k',k''} < 0$). We pick the profile $a'' \in \Upsilon^{(\nu)}$ that satisfies $\frac{a''_u}{r} = 1$, so that $\big(\frac{a}{r}\big)^{\!\top} \frac{a''}{r} \ge \nu$ and $a'^{\top} a'' = 0$ hold. Using the upper and lower bounds on the elements of $U$ derived in Lemma 5, whose Berry-Esseen constant we denote by $C$, we simplify the condition (35) step by step:
$$
\begin{aligned}
\big| U_{k,k''} - U_{k',k''} \big|
&\ge \left| \Psi\!\left( \frac{(p-q)\big( (\frac{a}{r})^{\!\top} \frac{a''}{r} - \tilde{\tau} \big)}{\sqrt{(p-q)(\frac{a}{r})^{\!\top} \frac{a''}{r} + q}} \sqrt{r} \right) - \Psi\!\left( \frac{-(p-q)\tilde{\tau}}{\sqrt{q}} \sqrt{r} \right) \right| \\
&\quad - \frac{C}{\sqrt{r}\,\sqrt{p(1-p)(\frac{a}{r})^{\!\top} \frac{a''}{r} + q(1-q)\big( 1 - (\frac{a}{r})^{\!\top} \frac{a''}{r} \big)}} - \frac{C}{\sqrt{r}\,\sqrt{q(1-q)}} \\
&\ge \left| \Psi\!\left( \frac{(p-q)(\nu - \tilde{\tau})}{\sqrt{(p-q)\nu + q}} \sqrt{r} \right) - \Psi\!\left( \frac{-(p-q)\tilde{\tau}}{\sqrt{q}} \sqrt{r} \right) \right| \\
&\quad - \frac{1}{\sqrt{r}} \left[ \frac{C}{\sqrt{\min\big( p(1-p)\nu + q(1-q)(1-\nu),\; p(1-p) \big)}} + \frac{C}{\sqrt{q(1-q)}} \right] \\
&= \Bigg| \Psi\!\Bigg( \underbrace{\frac{(\alpha-\beta)(\nu-\tilde{\tau})}{\sqrt{(\alpha-\beta)\nu + \beta}}}_{\rho_1} \sqrt{r f(n)} \Bigg) - \Psi\!\Bigg( \underbrace{\frac{-(\alpha-\beta)\tilde{\tau}}{\sqrt{\beta}}}_{-\rho_2} \sqrt{r f(n)} \Bigg) \Bigg| \\
&\quad - \frac{1}{\sqrt{r f(n)}} \left[ \frac{C}{\sqrt{\min\big( \alpha(1-\alpha f(n))\nu + \beta(1-\beta f(n))(1-\nu),\; \alpha(1-\alpha f(n)) \big)}} + \frac{C}{\sqrt{\beta(1-\beta f(n))}} \right] > 0. \qquad (36)
\end{aligned}
$$

Assume a constant upper bound $f(n) \le f_0$ for all $n \ge n_0$. Hence, we can upper bound the bracketed term:
$$\frac{C}{\sqrt{\min\big( \alpha(1-\alpha f(n))\nu + \beta(1-\beta f(n))(1-\nu),\; \alpha(1-\alpha f(n)) \big)}} + \frac{C}{\sqrt{\beta(1-\beta f(n))}} \le \frac{C}{\sqrt{\min\big( \alpha(1-\alpha f_0)\nu + \beta(1-\beta f_0)(1-\nu),\; \alpha(1-\alpha f_0) \big)}} + \frac{C}{\sqrt{\beta(1-\beta f_0)}}. \qquad (37)$$
To simplify the notation, we define:
$$d \triangleq \sqrt{r f(n)}, \qquad \rho_1 \triangleq \frac{(\alpha-\beta)(\nu-\tilde{\tau})}{\sqrt{(\alpha-\beta)\nu + \beta}}, \qquad \rho_2 \triangleq \frac{(\alpha-\beta)\tilde{\tau}}{\sqrt{\beta}},$$
$$\rho_3 \triangleq \frac{C}{\sqrt{\min\big( \alpha(1-\alpha f_0)\nu + \beta(1-\beta f_0)(1-\nu),\; \alpha(1-\alpha f_0) \big)}} + \frac{C}{\sqrt{\beta(1-\beta f_0)}}. \qquad (38)$$
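A numeric sanity sketch of the quantities in (36) and (38), together with the closed-form root $\Delta$ of (44)-(45), follows. All parameter values, including the stand-in Berry-Esseen constant `C` of Lemma 5, are placeholders rather than values from the paper; $\Psi$ is the standard normal CDF expressed through `math.erf`:

```python
import math

def Psi(x):
    # Standard normal CDF: Psi(x) = (1 + erf(x / sqrt(2))) / 2.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Placeholder parameters; C stands in for the Berry-Esseen constant of Lemma 5.
alpha, beta, nu, tau_t = 3.0, 1.0, 0.5, 0.2
r, fn, f0, C = 400, 0.05, 0.05, 0.5

d = math.sqrt(r * fn)                                # d = sqrt(r * f(n))
rho1 = (alpha - beta) * (nu - tau_t) / math.sqrt((alpha - beta) * nu + beta)
rho2 = (alpha - beta) * tau_t / math.sqrt(beta)
rho3 = (C / math.sqrt(min(alpha * (1 - alpha * f0) * nu
                          + beta * (1 - beta * f0) * (1 - nu),
                          alpha * (1 - alpha * f0)))
        + C / math.sqrt(beta * (1 - beta * f0)))

# Left-hand side of condition (36): the Psi gap minus the Berry-Esseen slack.
lhs = abs(Psi(rho1 * d) - Psi(-rho2 * d)) - rho3 / d
assert lhs > 0

# Closed-form real root Delta of d^3 - rho3*d^2 - rho4 = 0, as in (44)-(45).
rho4 = 2 * rho3 / max(abs(rho1), abs(rho2)) ** 2
varpi = ((2 * rho3**3 + 27 * rho4
          + 3 * math.sqrt(3) * math.sqrt(rho4 * (4 * rho3**3 + 27 * rho4)))
         / 2) ** (1 / 3)
Delta = (rho3 + varpi + rho3**2 / varpi) / 3
assert abs(Delta**3 - rho3 * Delta**2 - rho4) < 1e-8   # Delta is a root
assert d > Delta                                        # condition (44) holds here
```

For these placeholder values the recovery condition $\sqrt{r f(n)} > \Delta$ holds and, consistently, the lower bound in (36) is strictly positive.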
To continue deriving a simpler formulation of the recovery condition, we consider two cases: either $\rho_1, \rho_2 > 0$ or $\rho_1, \rho_2 < 0$, which correspond to $\alpha > \beta$ and $\alpha < \beta$, respectively. So, considering both cases together:
$$0 < \tilde{\tau} < \nu, \qquad \alpha \neq \beta. \qquad (39)$$
Substituting (38) and the identities $\Psi(x) = \frac{1}{2}\big( 1 + \mathrm{erf}(\frac{x}{\sqrt{2}}) \big)$ and $\mathrm{erf}(x) = -\mathrm{erf}(-x)$ into (36), with $\mathrm{erf}(x)$ representing the Gauss error function, we get:
$$\big| \Psi(\rho_1 d) - \Psi(-\rho_2 d) \big| > \frac{\rho_3}{d}$$
$$\Big| \tfrac{1}{2}\big[ 1 + \mathrm{erf}\big(\tfrac{\rho_1 d}{\sqrt{2}}\big) \big] - \tfrac{1}{2}\big[ 1 - \mathrm{erf}\big(\tfrac{\rho_2 d}{\sqrt{2}}\big) \big] \Big| > \frac{\rho_3}{d} \quad \text{or} \quad \Big| \tfrac{1}{2}\big[ 1 - \mathrm{erf}\big(-\tfrac{\rho_1 d}{\sqrt{2}}\big) \big] - \tfrac{1}{2}\big[ 1 + \mathrm{erf}\big(-\tfrac{\rho_2 d}{\sqrt{2}}\big) \big] \Big| > \frac{\rho_3}{d}$$
$$\Big| \mathrm{erf}\big(\tfrac{\rho_1 d}{\sqrt{2}}\big) + \mathrm{erf}\big(\tfrac{\rho_2 d}{\sqrt{2}}\big) \Big| > \frac{2\rho_3}{d} \quad \text{or} \quad \Big| \mathrm{erf}\big(-\tfrac{\rho_1 d}{\sqrt{2}}\big) + \mathrm{erf}\big(-\tfrac{\rho_2 d}{\sqrt{2}}\big) \Big| > \frac{2\rho_3}{d}$$
$$\mathrm{erf}\Big( \frac{|\rho_1| d}{\sqrt{2}} \Big) + \mathrm{erf}\Big( \frac{|\rho_2| d}{\sqrt{2}} \Big) > \frac{2\rho_3}{d}. \qquad (40)$$
Next, we relax the condition in (40) using the inequality $\mathrm{erf}(x) \ge 1 - e^{-x^2}$ for all $x \ge 0$:
$$1 - e^{-\frac{(|\rho_1| d)^2}{2}} + 1 - e^{-\frac{(|\rho_2| d)^2}{2}} > \frac{2\rho_3}{d} \;\;\Longleftrightarrow\;\; e^{-\frac{(\rho_1 d)^2}{2}} + e^{-\frac{(\rho_2 d)^2}{2}} < 2 - \frac{2\rho_3}{d}. \qquad (41)$$
We apply another relaxation to the condition in (41), which yields:
$$2\, e^{-\frac{(\max(|\rho_1|, |\rho_2|)\, d)^2}{2}} < 2 - \frac{2\rho_3}{d} \;\;\Longleftrightarrow\;\; -\frac{\big( \max(|\rho_1|, |\rho_2|)\, d \big)^2}{2} < \log\Big( 1 - \frac{\rho_3}{d} \Big). \qquad (42)$$
The last relaxation is applied using $\log(x) \ge 1 - \frac{1}{x}$:
$$-\frac{\big( \max(|\rho_1|, |\rho_2|)\, d \big)^2}{2} < 1 - \frac{1}{1 - \frac{\rho_3}{d}} = \frac{-\rho_3}{d - \rho_3} \;\;\Longleftrightarrow\;\; d^3 - \rho_3\, d^2 - \underbrace{\frac{2\rho_3}{\max(|\rho_1|, |\rho_2|)^2}}_{\rho_4} > 0. \qquad (43)$$
The third-order polynomial in (43) has only one real root. Hence, the condition in (43) simplifies to the final form of:
$$\sqrt{r f(n)} > \Delta, \qquad (44)$$
where
$$\Delta \triangleq \frac{1}{3}\Big( \rho_3 + \varpi + \frac{\rho_3^2}{\varpi} \Big), \qquad \rho_4 \triangleq \frac{2\rho_3}{\max(|\rho_1|, |\rho_2|)^2},$$
$$\varpi \triangleq \left( \frac{2\rho_3^3 + 27\rho_4 + 3\sqrt{3}\, \sqrt{\rho_4\big( 4\rho_3^3 + 27\rho_4 \big)}}{2} \right)^{1/3}, \qquad (45)$$
and the rest of the constants have already been defined in (38).

Next, we use the upper bound on the coverage size $r$ (i.e., the number of measured fine nodes represented by a c-node) from its definition in Def. 1, i.e.,
$r \le \frac{n}{m}$, to further see which scaling functions $f(n)$ are necessary to allow for the condition $\sqrt{r} > \frac{\Delta}{\sqrt{f(n)}}$ in (44):
$$\sqrt{\frac{n}{m}} > \frac{\Delta}{\sqrt{f(n)}}. \qquad (46)$$
Hence, the following is a necessary requirement to satisfy (46):
$$f(n) > \Delta^2\, \frac{m}{n}. \qquad (47)$$
This completes the proof.

Proof. We study the asymptotic behaviour of the MAP failure error using (19), considering two scenarios: $m$ being constant, or $m \to \infty$. In the former case, the error upper bound in (19) is a sum of finitely many terms, and
$$P(\text{MAP failure}) \to 0 \quad \text{if} \quad I \to \infty. \qquad (48)$$
In the latter scenario, i.e., when $m \to \infty$, it is proven in Theorem 3.2 of (Jog and Loh, 2015) and Theorem 5.1 of (Xu et al., 2020) that the error goes to zero if
$$\lim_{m \to \infty} \frac{m I}{K \log m} > 1. \qquad (49)$$
We first provide an upper bound for the $\log(\cdot)$ term in (20), using the inequalities $\log(x) \le x - 1$ and $\frac{1}{\sqrt{1-x}} \le \frac{1}{1-x} = 1 + x + x^2 + \cdots$:
$$
\begin{aligned}
\log\big( \sqrt{(1-p)(1-q)} + \sqrt{pq} \big) &= \log\big( \sqrt{(1-p)(1-q)} \big) + \log\Big( 1 + \frac{\sqrt{pq}}{\sqrt{(1-p)(1-q)}} \Big) \\
&\le \frac{1}{2} \log\big( (1-p)(1-q) \big) + \frac{\sqrt{pq}}{\sqrt{(1-p)(1-q)}} \\
&= \frac{1}{2} \log( 1 - p - q + pq ) + \frac{\sqrt{pq}}{\sqrt{(1-p)(1-q)}} \\
&\le \frac{1}{2}( -p - q + pq ) + \frac{\sqrt{pq}}{\sqrt{1 - (p + q - pq)}} \\
&\le -\frac{p+q}{2} + \frac{pq}{2} + \sqrt{pq}\big( 1 + (p+q-pq) + (p+q-pq)^2 + \cdots \big) \\
&= -\frac{1}{2}\big( p + q - 2\sqrt{pq} \big) + \frac{pq}{2} + \sqrt{pq}\big( (p+q-pq) + (p+q-pq)^2 + \cdots \big). \qquad (50)
\end{aligned}
$$
Next, we substitute (3) into (50):
$$\log\big( \sqrt{(1-p)(1-q)} + \sqrt{pq} \big) \le -\frac{1}{2}\big( \alpha + \beta - 2\sqrt{\alpha\beta} \big) f(n) + \frac{\alpha\beta}{2} f^2(n) + \sqrt{\alpha\beta}\, f(n) \Big( \big( \alpha + \beta - \alpha\beta f(n) \big) f(n) + \Big( \big( \alpha + \beta - \alpha\beta f(n) \big) f(n) \Big)^2 + \cdots \Big). \qquad (51)$$
As mentioned subsequent to the definition (3), $f(n)$ is a decreasing function of its argument. Hence, we can rewrite (51) for $\alpha \neq \beta$ in terms of the dominant orders, denoting the order of equality by $\Theta$:
$$\log\big( \sqrt{(1-p)(1-q)} + \sqrt{pq} \big) \le -\frac{1}{2}\big( \alpha + \beta - 2\sqrt{\alpha\beta} \big)\big( f(n) - \Theta(f^2(n)) \big). \qquad (52)$$
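The leading term kept in the bound above can be cross-checked numerically: the same elementary inequalities give $-2\log\big(\sqrt{(1-p)(1-q)} + \sqrt{pq}\big) \ge (\sqrt{p} - \sqrt{q})^2$ for all $p, q \in (0,1)$, so the Renyi-type term dominates its first-order expansion $(\alpha + \beta - 2\sqrt{\alpha\beta}) f(n)$. A quick grid check (illustrative only):

```python
import math

# Check -2*log(sqrt((1-p)(1-q)) + sqrt(p*q)) >= (sqrt(p) - sqrt(q))**2 on a
# grid of small edge probabilities p = alpha*f(n), q = beta*f(n), which is
# consistent with keeping the leading term (alpha+beta-2*sqrt(alpha*beta))*f(n).
for i in range(1, 20):
    for j in range(1, 20):
        p, q = i / 100.0, j / 100.0
        lhs = -2.0 * math.log(math.sqrt((1 - p) * (1 - q)) + math.sqrt(p * q))
        assert lhs >= (math.sqrt(p) - math.sqrt(q)) ** 2
```

The inequality follows from $\sqrt{(1-p)(1-q)} \le 1 - \frac{p+q}{2}$ (AM-GM) together with $-\log(1-t) \ge t$, the same ingredients as the derivation above.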
Substituting (52) into (20), (48), and (49) yields the following conditions for $P(\text{MAP failure}) \to 0$:
$$\big( \alpha + \beta - 2\sqrt{\alpha\beta} \big)\, r\, \big( f(n) - \Theta(f^2(n)) \big) \to \infty, \qquad \lim_{m \to \infty} \frac{\big( \alpha + \beta - 2\sqrt{\alpha\beta} \big)\, r\, \big( f(n) - \Theta(f^2(n)) \big)\, m}{K \log m} > 1. \qquad (53)$$
The first condition in (53) can only happen when $r \to \infty$ (since $f(n)$ is a probability scaling and cannot approach $\infty$). From Def. 1, $r \to \infty$ only if $n \to \infty$. This makes $\big( \alpha + \beta - 2\sqrt{\alpha\beta} \big)\, r\, f(n) \big( 1 - \frac{\Theta(f^2(n))}{f(n)} \big) \xrightarrow{r, n \to \infty} \infty$ equal to $\big( \alpha + \beta - 2\sqrt{\alpha\beta} \big)\, r\, f(n) \xrightarrow{r, n \to \infty} \infty$, since we assumed $\frac{\Theta(f^2(n))}{f(n)} \xrightarrow{n \to \infty} 0$. Moreover, $m$ (the number of c-nodes) cannot exceed the fine graph size, i.e., $m \le n$. Hence, by letting $m \to \infty$, we simultaneously must have $n \to \infty$. This way, the term $\lim_{m \to \infty} \big( f(n) - \Theta(f^2(n)) \big)$ in the second condition in (53) becomes $\lim_{m, n \to \infty} f(n)$, since $\frac{\Theta(f^2(n))}{f(n)} \to 0$. Accordingly, (53) simplifies to:
$$\big( \alpha + \beta - 2\sqrt{\alpha\beta} \big)\, r\, f(n) \to \infty, \qquad \lim_{m \to \infty} \frac{\big( \alpha + \beta - 2\sqrt{\alpha\beta} \big)\, r\, f(n)\, m}{K \log m} > 1. \qquad (54)$$
Straightforward calculations summarize the recovery conditions in (54) as (16).

Proof. Using the definition of $\tilde{W}^{(b)}$ in (15), the probability of community-wise connectivity can be calculated as:
$$U_{h(\phi_i), h(\phi_j)} = P\big[ \tilde{W}^{(b)}_{ij} = 1 \,\big|\, h(\phi_i), h(\phi_j) \big] = P\big[ \tilde{W}_{ij} \ge r\big( \tilde{\tau} p + (1 - \tilde{\tau}) q \big) \,\big|\, \phi_i, \phi_j \big]. \qquad (55)$$
Considering the distribution of $\tilde{W}$ in (7), the proof is complete.

Proof. We define $I$ as the Renyi divergence of order $\frac{1}{2}$ for discrete distributions $P$ and $Q$:
$$I \triangleq -2 \log\Big( \sum_{\ell \ge 0} \sqrt{P(\ell)\, Q(\ell)} \Big). \qquad (56)$$
The Renyi divergence $I$ evaluates the extent to which $P$ and $Q$ differ from one another. $P$ and $Q$ are the intra- and inter-community edge weight distributions, which according to (18) correspond respectively to $\mathrm{Binomial}(r, p)$ and $\mathrm{Binomial}(r, q)$ in this work. Straightforward calculation yields $I$ in (20).
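For $P = \mathrm{Binomial}(r, p)$ and $Q = \mathrm{Binomial}(r, q)$, the sum in (56) factorizes through the binomial theorem as $\big( \sqrt{pq} + \sqrt{(1-p)(1-q)} \big)^r$, which is the straightforward calculation behind the $\log\big( \sqrt{(1-p)(1-q)} + \sqrt{pq} \big)$ term manipulated in (50). A short check by direct summation; the values of $r, p, q$ are placeholders:

```python
import math

def renyi_half(r, p, q):
    """Order-1/2 Renyi divergence between Binomial(r, p) and Binomial(r, q),
    computed by direct summation of (56)."""
    s = sum(math.comb(r, l) * math.sqrt((p**l * (1 - p)**(r - l))
                                        * (q**l * (1 - q)**(r - l)))
            for l in range(r + 1))
    return -2.0 * math.log(s)

r, p, q = 30, 0.08, 0.02   # placeholder values
closed_form = -2.0 * r * math.log(math.sqrt(p * q) + math.sqrt((1 - p) * (1 - q)))
assert abs(renyi_half(r, p, q) - closed_form) < 1e-9
```

The closed form scales linearly in $r$, which is what lets the conditions in (53)-(54) be stated in terms of $r f(n)$.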
Proof. From Lemma 4, we have:
$$U_{k,k'} = P\big( X \ge r\big( \tilde{\tau} p + (1 - \tilde{\tau}) q \big) \big), \qquad (57)$$
where $X \sim \mathrm{PoissonBinomial}\big( \{p\}^{a^{\top} a'}, \{q\}^{r - a^{\top} a'} \big)$. The Poisson binomial distribution can be approximated by the normal distribution with mean $\mu \triangleq p\, a^{\top} a' + q\big( r - a^{\top} a' \big)$ and variance $\sigma^2 \triangleq p(1-p)\, a^{\top} a' + q(1-q)\big( r - a^{\top} a' \big)$. Adapted from the Berry-Esseen theorem, such an approximation comes with upper and lower bounds formalizing the convergence rate. This way, the Poisson binomial cumulative distribution function in (55) is approximated by the cumulative distribution function of the standard normal distribution, denoted by $\Psi$, and upper and lower bounded by Theorem 3.5 in (Tang and Tang, 2019), with an absolute constant $C$:
$$\Big| U_{h(a), h(a')} - \Psi\Big( \frac{\mu - r\big( \tilde{\tau} p + (1 - \tilde{\tau}) q \big)}{\sigma} \Big) \Big| \le \frac{C}{\sigma}. \qquad (58)$$
Substituting $\mu, \sigma$ into (58) gives:
$$\Big| U_{h(a), h(a')} - \Psi\Big( \frac{p\, a^{\top} a' + q( r - a^{\top} a' ) - r\big( \tilde{\tau} p + (1 - \tilde{\tau}) q \big)}{\sqrt{p(1-p)\, a^{\top} a' + q(1-q)( r - a^{\top} a' )}} \Big) \Big| \le \frac{C}{\sqrt{p(1-p)\, a^{\top} a' + q(1-q)( r - a^{\top} a' )}}. \qquad (59)$$
Replacing $p$ and $q$