On spectral algorithms for community detection in stochastic blockmodel graphs with vertex covariates
Cong Mu, Angelo Mele, Lingxin Hao, Joshua Cape, Avanti Athreya, Carey E. Priebe
Abstract—Both observed and unobserved vertex heterogeneity can influence block structure in graphs. To assess these effects on block recovery, we present a comparative analysis of two model-based spectral algorithms for clustering vertices in stochastic blockmodel graphs with vertex covariates. The first algorithm directly estimates the induced block assignments by investigating the estimated block connectivity probability matrix including the vertex covariate effect. The second algorithm estimates the vertex covariate effect and then estimates the induced block assignments after accounting for this effect. We employ Chernoff information to analytically compare the algorithms’ performance and derive the Chernoff ratio formula for some special models of interest. Analytic results and simulations suggest that, in general, the second algorithm is preferred: we can better estimate the induced block assignments by first estimating the vertex covariate effect. In addition, real data experiments on a diffusion MRI connectome data set indicate that the second algorithm has the advantages of revealing underlying block structure and taking observed vertex heterogeneity into account in real applications. Our findings emphasize the importance of distinguishing between observed and unobserved factors that can affect block structure in graphs.
Index Terms—Spectral graph inference, Chernoff ratio, stochastic blockmodel, vertex covariate.
Date of current version July 2, 2020. This work was supported in part by the US National Science Foundation under Grant SES-1951005 and in part by the US Defense Advanced Research Projects Agency under the D3M program administered through contract FA8750-17-2-0112.
C. Mu, A. Athreya, and C. E. Priebe are with the Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD 21218. E-mail: {cmu2, dathrey1, cep}@jhu.edu.
A. Mele is with the Carey Business School, Johns Hopkins University, Baltimore, MD 21202. E-mail: [email protected].
L. Hao is with the Department of Sociology, Johns Hopkins University, Baltimore, MD 21218. E-mail: [email protected].
J. Cape is with the Department of Statistics, University of Pittsburgh, Pittsburgh, PA 15260. E-mail: [email protected].

I. INTRODUCTION

In network inference applications, it is important to distinguish different factors, such as vertex covariates and underlying vertex block assignments, that can lead to networks with different latent communities. As a special case of random graph models, stochastic blockmodel (SBM) graphs are popular in the literature for community detection [1]–[3]. Inference in SBMs extended to include vertex covariates relies on either variational methods [4]–[6] or spectral approaches that promise applicability to large graphs [7]–[9]. Spectral methods [10] have been widely used in random graph models for a variety of subsequent inference tasks such as community detection [11]–[14], vertex nomination [15], nonparametric hypothesis testing [16], and multiple graph inference [17]. Two particular spectral embedding methods, adjacency spectral embedding (ASE) and Laplacian spectral embedding (LSE), are popular since they enjoy nice properties including consistency [18] and asymptotic normality [19], [20]. To compare the performance of these two embedding methods, the concept of Chernoff information was first employed for SBMs [20] and then extended to consider the underlying graph structure [21].

One problem of interest in the hypothesis testing framework is to assess the influence of unobserved vertex heterogeneity on outcome variables, controlling for the vertex covariate effect [22], [23]. In a K-block SBM, that is to test whether F_k = F for k ∈ {1, · · · , K} given y_i | τ_i = k ∼ F_k, where the y_i are outcome variables and τ_i is the induced block assignment for vertex i. To achieve this goal, it is crucial to estimate the block structure τ after accounting for the vertex covariate effect. Here we use “induced block assignment” to refer to the block assignment after accounting for the vertex covariate effect, since the number of blocks can change.
An “induced” K = 2 SBM, but with each of the two blocks split into two via the effect of a binary vertex covariate, becomes a K = 4 SBM. We shall address this concept in detail in Section II.

In this article, we investigate two model-based spectral algorithms for clustering vertices in stochastic blockmodel graphs with vertex covariates. Analytically, we compare the algorithms’ performance via Chernoff information and derive the Chernoff ratio formula for special models of interest. We shall address the notion of Chernoff information for comparing algorithms in detail in Section IV. Practically, we compare the algorithms’ actual clustering performance by simulations and real data experiments on a diffusion MRI connectome data set.

The structure of this article is summarized as follows. Section II reviews relevant models for random graphs and the basic idea of spectral methods. Section III introduces our model-based spectral algorithms for clustering vertices in stochastic blockmodel graphs with vertex covariates. Section IV analytically compares the algorithms’ performance via Chernoff information and derives the Chernoff ratio formula for special models of interest. Section V provides simulations and real data experiments on a diffusion MRI connectome data set to compare the algorithms’ actual clustering performance. Section VI discusses the findings and presents some open questions for further investigation. Appendix A and Appendix B provide technical details for latent position geometry and analytic derivations of the Chernoff ratio.

II. MODELS AND SPECTRAL METHODS
We consider the latent position model [24], [25] for edge-independent random graphs, in which each vertex is associated with a latent position X_i ∈ 𝒳, where 𝒳 is some latent space such as ℝ^d, and edges between vertices arise independently with probability P_ij = κ(X_i, X_j) for some kernel function κ : 𝒳 × 𝒳 → [0, 1]. In particular, we focus on the generalized random dot product graph (GRDPG), where the kernel function is taken to be the (indefinite) inner product; this includes more flexible SBMs as special cases.

Definition 1 (Generalized Random Dot Product Graph [26]). Let I_{d+,d−} = I_{d+} ⊕ (−I_{d−}) with d+ ≥ 1 and d− ≥ 0. Let F be a d-dimensional inner product distribution with d = d+ + d− on 𝒳 ⊂ ℝ^d satisfying x⊤ I_{d+,d−} y ∈ [0, 1] for all x, y ∈ 𝒳. Let A be an adjacency matrix and X = [X_1, · · · , X_n]⊤ ∈ ℝ^{n×d}, where the X_i ∼ F are i.i.d. for all i ∈ {1, · · · , n}. Then we say (A, X) ∼ GRDPG(n, F, d+, d−) if A_ij | X_i, X_j ∼ Bernoulli(P_ij), where P_ij = X_i⊤ I_{d+,d−} X_j for any i, j ∈ {1, · · · , n}.

As a special case of the GRDPG model, the SBM can be used to model block structure in edge-independent random graphs.
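Definition 1 is generative, so a graph can be sampled directly from latent positions; a minimal NumPy sketch (the function name and the illustrative latent positions are ours, not the paper’s):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_grdpg(X, d_plus, d_minus, rng):
    """Sample a GRDPG adjacency matrix from latent positions X (one row per
    vertex): P = X I_{d+,d-} X^T, A_ij ~ Bernoulli(P_ij) independently for
    i < j, then symmetrized with a hollow diagonal."""
    Ipq = np.diag(np.concatenate([np.ones(d_plus), -np.ones(d_minus)]))
    P = X @ Ipq @ X.T
    upper = rng.binomial(1, np.triu(P, 1))  # independent edges above diagonal
    return upper + upper.T

# a 2-block SBM as a GRDPG: scalar latent positions 0.3 and 0.7, d = 1
X = np.repeat([[0.3], [0.7]], 50, axis=0)   # 50 vertices per block
A = sample_grdpg(X, d_plus=1, d_minus=0, rng=rng)
```

Here the edge probabilities are the pairwise (indefinite) inner products of the rows of X, exactly as in the definition.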
Definition 2 (K-block Stochastic Blockmodel Graph [2]). The K-block stochastic blockmodel (SBM) graph is an edge-independent random graph with each vertex belonging to one of K blocks. It can be parametrized by a block connectivity probability matrix B ∈ [0, 1]^{K×K} and a nonnegative vector of block assignment probabilities π ∈ ℝ^K summing to unity. Let A be an adjacency matrix and τ be a vector of block assignments with τ_i = k if vertex i is in block k (occurring with probability π_k). We say (A, τ) ∼ SBM(n, B, π) if A_ij | τ_i, τ_j ∼ Bernoulli(P_ij), where P_ij = B_{τ_i τ_j} for any i, j ∈ {1, · · · , n}.

Let (A, τ) ∼ SBM(n, B, π) as in Definition 2, where B ∈ [0, 1]^{K×K} has d+ strictly positive eigenvalues and d− strictly negative eigenvalues. To represent this SBM in the GRDPG model, we can choose ν_1, · · · , ν_K ∈ ℝ^d with d = d+ + d− such that ν_k⊤ I_{d+,d−} ν_ℓ = B_{kℓ} for all k, ℓ ∈ {1, · · · , K}. For example, we can take ν = U_B |S_B|^{1/2}, where B = U_B S_B U_B⊤ is the spectral decomposition of B after re-ordering. Then the latent position of vertex i is X_i = ν_k if τ_i = k. As an illustration, consider the prototypical 2-block SBM with rank one block connectivity probability matrix B, where B_11 = p², B_22 = q², B_12 = B_21 = pq with 0 < p < q < 1. Let X_i be the latent position of vertex i, where X_i = ν_1 = p if τ_i = 1 and X_i = ν_2 = q if τ_i = 2. Then we can represent this SBM in the GRDPG model with latent positions ν = [p q]⊤ as

B = νν⊤ = [ p²  pq
            pq  q² ].   (1)

An extension of GRDPG taking vertex covariates into consideration is available.

Definition 3 (GRDPG with Vertex Covariates [9]). Consider GRDPG as in Definition 1. Let Z ∈ ℝ^{n×r} denote observed vertex covariates.
Then we say (A, X, Z, β) ∼ GRDPG-Cov(n, F, d+, d−, f, h) if A_ij | X_i, X_j, Z_i, Z_j ∼ Bernoulli(P_ij), where P_ij = h(X_i⊤ I_{d+,d−} X_j + β⊤ f(Z_i, Z_j)) for any i, j ∈ {1, · · · , n}, with link functions f : ℝ^r × ℝ^r → ℝ^r and h : ℝ → [0, 1].

Remark 1. A special case of the model in Definition 3 is to use the indicator function as f and the identity function as h with one binary covariate, i.e., P_ij = X_i⊤ I_{d+,d−} X_j + β 1{Z_i = Z_j} for any i, j ∈ {1, · · · , n}, or P = X I_{d+,d−} X⊤ + (β/2)(11⊤ + ZZ⊤) with Z ∈ {−1, 1}^n. In the case of an SBM, we have P_ij = B_{τ_i τ_j} + β 1{Z_i = Z_j}.

Example 1 (2-block Rank One Model with One Binary Covariate). As an illustration, consider the rank one matrix B in Eq. (1) and the SBM model in Remark 1. Let Z ∈ {−1, 1}^n denote the observed binary covariate. Assume 0 < β < 1 with p² + β, q² + β, pq + β ∈ [0, 1]. Then we have the block connectivity probability matrix with the vertex covariate effect as

B_Z = [ p²+β   p²     pq+β   pq
        p²     p²+β   pq     pq+β
        pq+β   pq     q²+β   q²
        pq     pq+β   q²     q²+β ].   (2)

Example 2 (2-block Homogeneous Model with One Binary Covariate). As a second illustration, consider the rank two matrix B, where B_11 = B_22 = a and B_12 = B_21 = b with 0 < b < a < 1. The SBMs parametrized by this B lead to the notion of the homogeneous model [1], [21]. For the K-block homogeneous model, we have B_kℓ = a for k = ℓ and B_kℓ = b for k ≠ ℓ. Assume 0 < β < 1 with a + β, b + β ∈ [0, 1]. We then have the block connectivity probability matrix with the vertex covariate effect as

B_Z = [ a+β   a     b+β   b
        a     a+β   b     b+β
        b+β   b     a+β   a
        b     b+β   a     a+β ].   (3)

Note that in both of these examples, an induced 2-block SBM becomes a 4-block SBM via the effect of a binary vertex covariate. The goal is to cluster each vertex into one of the two induced blocks after accounting for the vertex covariate effect.

Definition 4 (Adjacency Spectral Embedding). Let A ∈ {0, 1}^{n×n} be an adjacency matrix with eigendecomposition A = Σ_{i=1}^n λ_i u_i u_i⊤, where |λ_1| ≥ · · · ≥ |λ_n| are the magnitude-ordered eigenvalues and u_1, · · · , u_n are the corresponding orthonormal eigenvectors.
Given the embedding dimension d < n, the adjacency spectral embedding (ASE) of A into ℝ^d is the n × d matrix X̂ = U_A |S_A|^{1/2}, where S_A = diag(λ_1, · · · , λ_d) and U_A = [u_1 | · · · | u_d].

Remark 2. There are different methods for choosing the embedding dimension [27], [28]; we adopt the simple and efficient profile likelihood method [29] to automatically identify the “elbow”, which is the cut-off between the signal dimensions and the noise dimensions in the scree plot.
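Definition 4 translates directly into code; a minimal sketch with a fixed embedding dimension d (the automatic elbow selection of Remark 2 is omitted):

```python
import numpy as np

def ase(A, d):
    """Adjacency spectral embedding (Definition 4): Xhat = U_A |S_A|^{1/2},
    keeping the d eigenpairs of largest eigenvalue magnitude."""
    S, U = np.linalg.eigh(A)                 # A is symmetric
    top = np.argsort(-np.abs(S))[:d]
    return U[:, top] * np.sqrt(np.abs(S[top]))

# sanity check on a noiseless rank one P = nu nu^T: ASE with d = 1 recovers
# the latent positions up to sign, so Xhat Xhat^T reproduces P
nu = np.array([0.3, 0.3, 0.7, 0.7])
P = np.outer(nu, nu)
Xhat = ase(P, 1)
print(np.allclose(Xhat @ Xhat.T, P))  # True
```

The sign ambiguity of the eigenvectors is the one-dimensional instance of the orthogonal (indefinite-orthogonal, in general) non-identifiability of GRDPG latent positions.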
In this article, we will focus on applying ASE for our inference task. The adaptation of our algorithms and analytic derivations to the Laplacian spectral embedding can be a valuable future contribution.

III. MODEL-BASED SUBSEQUENT INFERENCE VIA SPECTRAL METHODS
We are interested in the inference task of estimating the induced block assignments in an SBM with vertex covariates. To that end, we also consider algorithms for estimating the vertex covariate effect, which can be further used to estimate the induced block assignments. For simplicity, we consider all algorithms with identity link and one binary covariate as in Remark 1. Generalization to the case with other link functions and more than one covariate can be a valuable future contribution.
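Under the identity link with one binary covariate (Remark 1), the induced block connectivity probability matrix of Examples 1 and 2 can be constructed explicitly; a minimal sketch (the function name and the parameter values are illustrative):

```python
import numpy as np

def covariate_blowup(B, beta):
    """Block connectivity matrix with the vertex covariate effect: each of the
    K induced blocks splits in two, with sub-blocks ordered
    (1,-1), (1,+1), ..., (K,-1), (K,+1), and
    B_Z[(k,z),(l,z')] = B[k,l] + beta * 1{z = z'} as in Eqs. (2) and (3)."""
    K = B.shape[0]
    return np.kron(B, np.ones((2, 2))) + beta * np.kron(np.ones((K, K)), np.eye(2))

# Example 1 with illustrative values p = 0.3, q = 0.7, beta = 0.1
p, q, beta = 0.3, 0.7, 0.1
B = np.array([[p * p, p * q], [p * q, q * q]])   # rank one B of Eq. (1)
B_Z = covariate_blowup(B, beta)
print(np.linalg.matrix_rank(B_Z))  # 3
```

The rank-3 output matches the observation in Section IV that B_Z in Eq. (2) is positive semidefinite with rank(B_Z) = 3.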
Algorithm 1: Estimation of induced block assignments including the vertex covariate effect

Input: Adjacency matrix A ∈ {0, 1}^{n×n}.
Output: Block assignments including the vertex covariate effect as ξ̂; induced block assignments after accounting for the vertex covariate effect as τ̂.

1. Estimate latent positions including the vertex covariate effect as Ŷ ∈ ℝ^{n×d̂} using ASE of A, where d̂ is chosen as in Remark 2.
2. Cluster Ŷ using Gaussian mixture modeling (GMM) to estimate the block assignments including the vertex covariate effect as ξ̂ ∈ {1, · · · , K̂}^n, where K̂ is chosen via the Bayesian Information Criterion (BIC).
3. Compute the estimated block connectivity probability matrix including the vertex covariate effect as B̂_Z = μ̂ I_{d+,d−} μ̂⊤ ∈ [0, 1]^{K̂×K̂}, where μ̂ ∈ ℝ^{K̂×d̂} collects the estimated means of all clusters.
4. Cluster the diagonal of B̂_Z using GMM to estimate the cluster assignments of the diagonal as φ̂ ∈ {1, · · · , K̂}^{K̂}.
5. Estimate the induced block assignments as τ̂ by τ̂_k = c for k ∈ {i | ξ̂_i = t for t ∈ {j | φ̂_j = c}} and c = 1, · · · , K̂.

Note that in Algorithm 1, the estimation of the induced block assignments, i.e., τ̂, depends heavily on the estimated block connectivity probability matrix B̂_Z. This suggests that we may not obtain an accurate estimate of the induced block assignments if B̂_Z is not well-structured, which is often the case in real applications. Thus we propose a modified algorithm that uses additional information from the vertex covariates to estimate the induced block assignments along with the vertex covariate effect.

As an illustration of estimating β (Step 2 in Algorithm 2), consider the block connectivity probability matrix B_Z as in Eq. (3). To get β, we can subtract two specific entries of B_Z. For example,

Algorithm 2:
Estimation of induced block assignments after accounting for the vertex covariate effect

Input: Adjacency matrix A ∈ {0, 1}^{n×n}; observed vertex covariates Z ∈ {−1, 1}^n.
Output: Block assignments including the vertex covariate effect as ξ̂; induced block assignments after accounting for the vertex covariate effect as τ̃; estimated vertex covariate effect as β̂.

1. Compute ξ̂, B̂_Z, and φ̂ as in Steps 1–4 of Algorithm 1.
2. Estimate the vertex covariate effect as β̂ using one of the following procedures [9].
(a) Assign the block covariates Z_B ∈ {−1, 1}^{K̂} for each block using the mode, i.e., Z_{B,k} = −1 if n_{−1,k} ≥ n_{1,k} and Z_{B,k} = 1 if n_{−1,k} < n_{1,k}, where n_{z,k} = Σ_{i : ξ̂_i = k} 1{Z_i = z}. Construct the pair set
S = {(kℓ, kℓ′) : k, ℓ, ℓ′ ∈ {1, · · · , K̂}, φ̂_ℓ = φ̂_{ℓ′}, Z_{B,k} = Z_{B,ℓ}, Z_{B,k} ≠ Z_{B,ℓ′}}.
Estimate the vertex covariate effect as
β̂_SA = (1/|S|) Σ_{(kℓ,kℓ′)∈S} (B̂_{Z,kℓ} − B̂_{Z,kℓ′}).
(b) Compute the probability that two entries of B̂_Z form a pair as
p_{kℓ,kℓ′} = (n_{−1,k} n_{−1,ℓ} n_{1,ℓ′} + n_{1,k} n_{1,ℓ} n_{−1,ℓ′}) / (n_k n_ℓ n_{ℓ′}), where n_k = Σ_{i=1}^n 1{ξ̂_i = k}.
Construct the pair set W = {(ℓ, ℓ′) : ℓ, ℓ′ ∈ {1, · · · , K̂}, φ̂_ℓ = φ̂_{ℓ′}}. Estimate the vertex covariate effect as
β̂_WA = (1/(K̂|W|)) Σ_{k=1}^{K̂} Σ_{(ℓ,ℓ′)∈W} p_{kℓ,kℓ′} (B̂_{Z,kℓ} − B̂_{Z,kℓ′}).
3. Account for the vertex covariate effect by Ã_ij = A_ij − β̂ 1{Z_i = Z_j}, where β̂ is either β̂_SA or β̂_WA.
4. Estimate latent positions after accounting for the vertex covariate effect as Ỹ ∈ ℝ^{n×d̃} using ASE of Ã, where d̃ is chosen as in Remark 2.
5. Cluster Ỹ using GMM to estimate the induced block assignments after accounting for the vertex covariate effect as τ̃ ∈ {1, · · · , K̂}^n.

B_{Z,11} − B_{Z,12} = (a + β) − a = β,
B_{Z,13} − B_{Z,14} = (b + β) − b = β.   (4)

Then we can get β̂ by subtracting two specific entries of B̂_Z. However, ASE and GMM under the GRDPG model can lead to a re-ordering of B̂_Z. Thus we need to identify pairs first so that we subtract the correct entries.

In Step 2(a), we find pairs in B̂_Z by first assigning each block a common covariate value using the mode. However, it is possible that we cannot find any pairs using this approach, especially in the unbalanced case where the block sizes differ and/or the distribution of the vertex covariate differs across blocks; for example, when one block is much larger than the others and/or the vertex covariates are all the same within one block.

In Step 2(b), instead of first finding pairs using the mode, we only compute the probability that two entries of B̂_Z form a pair. This makes the estimation more robust to extreme cases or special structure.

IV. SPECTRAL INFERENCE PERFORMANCE
A. Chernoff Ratio
There are different metrics for comparing spectral inference performance, such as within-class covariance and Chernoff information [3], [20], [30]. The within-class covariance depends on which clustering procedure is used, specifically K-means. Chernoff information is independent of the clustering procedure and intrinsically related to the Bayes risk. We employ Chernoff information to compare the performance of Algorithm 1 and Algorithm 2 for estimating the induced block assignments in SBMs with vertex covariates. Let F_1 and F_2 be two continuous multivariate distributions on ℝ^d with density functions f_1 and f_2. The Chernoff information [31], [32] is defined as

C(F_1, F_2) = −log[ inf_{t∈(0,1)} ∫_{ℝ^d} f_1^t(x) f_2^{1−t}(x) dx ] = sup_{t∈(0,1)} [ −log ∫_{ℝ^d} f_1^t(x) f_2^{1−t}(x) dx ].   (5)

Consider the special case where we take F_1 = N(μ_1, Σ_1) and F_2 = N(μ_2, Σ_2); then the corresponding Chernoff information is

C(F_1, F_2) = sup_{t∈(0,1)} [ (t(1−t)/2) (μ_1 − μ_2)⊤ Σ_t^{−1} (μ_1 − μ_2) + (1/2) log( |Σ_t| / (|Σ_1|^t |Σ_2|^{1−t}) ) ],   (6)

where Σ_t = tΣ_1 + (1−t)Σ_2. For a given embedding method such as ASE in Algorithm 1 and Algorithm 2, comparison via Chernoff information is based on the statistical information between the limiting distributions of the blocks, and smaller statistical information implies less information to discriminate between different blocks of the SBM. To that end, we also review the limiting results of ASE for the SBM, essential for investigating Chernoff information.

Theorem 1 (CLT of ASE for SBM [26]). Let (A^(n), X^(n)) ∼ GRDPG(n, F, d+, d−) be a sequence of adjacency matrices and associated latent positions of a d-dimensional GRDPG as in Definition 1 from an inner product distribution F, where F is a mixture of K point masses in ℝ^d, i.e.,

F = Σ_{k=1}^K π_k δ_{ν_k} with π_k > 0 for all k and Σ_{k=1}^K π_k = 1,   (7)

where δ_{ν_k} is the Dirac delta measure at ν_k. Let Φ(z, Σ) denote the cumulative distribution function (CDF) of a multivariate Gaussian distribution with mean 0 and covariance matrix Σ, evaluated at z ∈ ℝ^d. Let X̂^(n) be the ASE of A^(n) with X̂_i^(n) as the i-th row (similarly for X_i^(n)). Then there exists a sequence of matrices M_n ∈ ℝ^{d×d} satisfying M_n I_{d+,d−} M_n⊤ = I_{d+,d−} such that for all z ∈ ℝ^d and fixed index i,

P{ √n (M_n X̂_i^(n) − X_i^(n)) ≤ z | X_i^(n) = ν_k } → Φ(z, Σ_k),   (8)

where for ν ∼ F,

∆ = E[νν⊤],
E_k = E[ (ν_k⊤ I_{d+,d−} ν)(1 − ν_k⊤ I_{d+,d−} ν) νν⊤ ],
Σ_k = I_{d+,d−} ∆^{−1} E_k ∆^{−1} I_{d+,d−}.   (9)

Remark 3. If the adjacency matrix A is sampled from an SBM parameterized by the block connectivity probability matrix B in Eq. (1) and block assignment probabilities π = (π_1, π_2) with π_1 + π_2 = 1, then as a special case of Theorem 1 [20], [30], we have for each fixed index i,

√n (X̂_i − p) → N(0, σ_p²) in distribution if X_i = p,
√n (X̂_i − q) → N(0, σ_q²) in distribution if X_i = q,   (10)

where

σ_p² = [π_1 p⁴(1−p²) + π_2 pq³(1−pq)] / [π_1 p² + π_2 q²]²,
σ_q² = [π_1 p³q(1−pq) + π_2 q⁴(1−q²)] / [π_1 p² + π_2 q²]².   (11)

Now for a K-block SBM, let B ∈ [0, 1]^{K×K} be the block connectivity probability matrix and π ∈ ℝ^K be the vector of block assignment probabilities. Given an n-vertex instantiation of the SBM parameterized by B and π, for sufficiently large n, the large-sample optimal error rate for estimating the block assignments using ASE can be measured via Chernoff information as [20], [30]

ρ = min_{k≠ℓ} sup_{t∈(0,1)} [ (nt(1−t)/2) (ν_k − ν_ℓ)⊤ Σ_{kℓ}^{−1}(t) (ν_k − ν_ℓ) + (1/2) log( |Σ_{kℓ}(t)| / (|Σ_k|^t |Σ_ℓ|^{1−t}) ) ],   (12)

where Σ_{kℓ}(t) = tΣ_k + (1−t)Σ_ℓ, and Σ_k and Σ_ℓ are defined as in Eq. (9). Also note that as n → ∞, the logarithm term in Eq. (12) is dominated by the other term. Then we have the Chernoff ratio as

ρ* = ρ_1*/ρ_2* → [ min_{k≠ℓ} sup_{t∈(0,1)} t(1−t)(ν_{1,k} − ν_{1,ℓ})⊤ Σ_{1,kℓ}^{−1}(t) (ν_{1,k} − ν_{1,ℓ}) ] / [ min_{k≠ℓ} sup_{t∈(0,1)} t(1−t)(ν_{2,k} − ν_{2,ℓ})⊤ Σ_{2,kℓ}^{−1}(t) (ν_{2,k} − ν_{2,ℓ}) ],   (13)

where ρ_1* and ρ_2* are associated with Algorithm 1 and Algorithm 2, respectively. If ρ* > 1, then Algorithm 1 is preferred; otherwise Algorithm 2 is preferred.

B. 2-block Rank One Model with One Binary Covariate
As an illustration of using the Chernoff ratio in Eq. (13) to compare the performance of Algorithm 1 and Algorithm 2 for estimating the induced block assignments, we consider the 2-block SBM with one binary covariate parametrized by the block connectivity probability matrix B_Z as in Eq. (2). In addition, we consider the balanced case where π = (1/2, 1/2) and π_Z = (1/4, 1/4, 1/4, 1/4), with the assumption that n_i = nπ_i and n_{Z,j} = nπ_{Z,j} for i ∈ {1, 2} and j ∈ {1, 2, 3, 4}. Via the idea of the Cholesky decomposition, we can re-write B_Z as

B_Z = ν_Z ν_Z⊤ = [ ν_k⊤ ν_ℓ ]_{k,ℓ=1,··· ,4},   (14)

where ν_Z = [ν_1 ν_2 ν_3 ν_4]⊤. Elementary calculations yield the canonical latent positions as

ν_Z = [ √(p²+β)         0                                 0
        p²/√(p²+β)      √(β(2p²+β)/(p²+β))                0
        (pq+β)/√(p²+β)  p(q−p)√(β/((p²+β)(2p²+β)))        (q−p)√(β/(2p²+β))
        pq/√(p²+β)      (p²+pq+β)√(β/((p²+β)(2p²+β)))     (q−p)√(β/(2p²+β)) ].   (15)

For this model, the block connectivity probability matrix B_Z as in Eq. (2) is positive semidefinite with rank(B_Z) = 3. Then we have I_{d+,d−} = I, and we can omit it in our analytic derivations. With the canonical latent positions in Eq. (15), the only remaining term to derive for the Chernoff ratio is Σ_{kℓ}(t) in Eq. (13). For t ∈ (0, 1), define

g_t(ν_k, ν_ℓ, ν) = t g(ν_k, ν) + (1−t) g(ν_ℓ, ν),   (16)

where

g(ν_k, ν) = (ν_k⊤ ν)(1 − ν_k⊤ ν).   (17)

Then we can re-write Σ_k in Eq. (9) as

Σ_k = ∆^{−1} E[ g(ν_k, ν) νν⊤ ] ∆^{−1}   (18)

and Σ_{kℓ}(t) from Eq. (13) as

Σ_{kℓ}(t) = ∆^{−1} E[ g_t(ν_k, ν_ℓ, ν) νν⊤ ] ∆^{−1}.   (19)

To evaluate the Chernoff ratio, we also define, for 1 ≤ k < ℓ ≤ 4,

C_{kℓ} = sup_{t∈(0,1)} t(1−t)(ν_k − ν_ℓ)⊤ Σ_{kℓ}^{−1}(t) (ν_k − ν_ℓ).   (20)

By the symmetric structure of B_Z as in Eq. (2) and the balanced assumption, we observe that C_13 = C_24 and C_14 = C_23. Thus we need only evaluate C_12, C_13, C_14, C_34. Subsequent calculations and simplification yield

C_12 = β² / (2[φ_p + φ_pq + β(1 − p² − pq − β)]),
C_34 = β² / (2[φ_q + φ_pq + β(1 − q² − pq − β)]),   (21)

where for 0 < p < q < 1,

φ_p = p²(1−p²), φ_q = q²(1−q²), φ_pq = pq(1−pq).   (22)

Then we have the approximate Chernoff information for Algorithm 1 as

ρ_1* ≈ min_{k∈{1,3}, k<ℓ≤4} C_{kℓ},   (23)

where the C_{kℓ} for k ∈ {1, 3}, k < ℓ ≤ 4 are defined as in Eq. (20). For this model, there is no tractable closed form for C_13 and C_14, but numerical experiments can be used to obtain ρ_1*. By Remark 3 and similar calculations [20], [30], we have the approximate Chernoff information for Algorithm 2 as

ρ_2* ≈ sup_{t∈(0,1)} t(1−t)(p−q)² [tσ_p² + (1−t)σ_q²]^{−1} = (p−q)²(p²+q²)² / (2[ √(p²φ_p + q²φ_pq) + √(q²φ_q + p²φ_pq) ]²),   (24)

where σ_p², σ_q² are defined as in Eq. (11) and φ_p, φ_q, φ_pq are defined as in Eq. (22).

Figure 1 shows the Chernoff ratio when we fix p and vary q and β over a grid in the 2-block rank one models with one binary covariate. We can see that ρ* < 1 for most of the region, while ρ* > 1 only when q and β are relatively large. Recall that the performance of Algorithm 1 depends heavily on the estimated block connectivity probability matrix B̂_Z. Large q and β lead to a relatively well-structured B̂_Z, and thus Algorithm 1 can have better performance in this region.

C. 2-block Homogeneous Model with One Binary Covariate
Now we consider the 2-block SBM with one binary covariate parametrized by the block connectivity probability matrix B_Z as in Eq. (3). We again consider the balanced case where π = (1/2, 1/2) and π_Z = (1/4, 1/4, 1/4, 1/4), with the assumption that n_i = nπ_i and n_{Z,j} = nπ_{Z,j} for i ∈ {1, 2} and j ∈ {1, 2, 3, 4}.

Fig. 1. Chernoff ratio as in Eq. (13) for 2-block rank one models with one binary covariate (p fixed, q and β varying, π = (1/2, 1/2), π_Z = (1/4, 1/4, 1/4, 1/4)).

Similarly, the idea of the Cholesky decomposition and elementary calculations yield the canonical latent positions as

ν_Z = [ √(a+β)        0                               0
        a/√(a+β)      √(β(2a+β)/(a+β))                0
        (b+β)/√(a+β)  (b−a)√(β/((a+β)(2a+β)))         √(2(a−b)(a+b+β)/(2a+β))
        b/√(a+β)      (a+b+β)√(β/((a+β)(2a+β)))       √(2(a−b)(a+b+β)/(2a+β)) ].   (25)

Observe that for this model, the block connectivity probability matrix B_Z as in Eq. (3) is also positive semidefinite with rank(B_Z) = 3. Then we have I_{d+,d−} = I, and we can omit it in the derivations as for the 2-block rank one model. To evaluate the Chernoff ratio, we again investigate the C_{kℓ} defined in Eq. (20). Similar observations suggest that C_12 = C_34, C_13 = C_24, C_14 = C_23. Thus we only need to evaluate C_12, C_13, C_14. Subsequent calculations and simplification yield

C_12 = β² / (2(φ_a + φ_b + φ_β)),
C_13 = (a−b)² / (2(φ_a + φ_b + φ_β)),
C_14 = [β² N_1 + (a−b)² N_2] / (2[D_0 + (φ_a + φ_b)(φ_a + φ_b + 2φ_β)]),   (26)

where for 0 < b < a < 1 and 0 < β < 1,

φ_a = a(1−a), φ_b = b(1−b), φ_β = β(1 − a − b − β),
N_1 = a(1−b) + b(1−a) + φ_β,
N_2 = ab(a−b) + φ_a(a+β) − φ_b(b+β),
D_0 = β²(1 − a − β)(1 − b − β).   (27)

Then we have the approximate Chernoff information for Algorithm 1 as

ρ_1* ≈ min_{ℓ∈{2,3,4}} C_{1ℓ},   (28)

where the C_{1ℓ} for ℓ ∈ {2, 3, 4} are defined as in Eq. (26). Also observe that

C_13 − C_14 = −(a−b)² [φ_a + φ_b + β(1 − a − b)] / D,
C_12 − C_14 = −β² N_1 / D,   (29)

where

D = 2(φ_a + φ_b + φ_β)[D_0 + (φ_a + φ_b)(φ_a + φ_b + 2φ_β)].   (30)

Then we can further simplify ρ_1* as

ρ_1* ≈ β² / (2(φ_a + φ_b + φ_β)) if β ≤ a − b,  (a−b)² / (2(φ_a + φ_b + φ_β)) if β > a − b.   (31)

By the same derivations [21], we have the approximate Chernoff information for Algorithm 2 as

ρ_2* ≈ (a−b)² / (2[a(1−a) + b(1−b)]) = (a−b)² / (2(φ_a + φ_b)),   (32)

where φ_a and φ_b are defined as in Eq. (27). We then have the general Chernoff ratio formula as follows.

Corollary 1.
For the 2-block homogeneous balanced model with one binary covariate parametrized by B_Z as in Eq. (3) and π_Z = (1/4, 1/4, 1/4, 1/4), the Chernoff ratio as in Eq. (13) can be derived analytically as

ρ* = ρ_1*/ρ_2* → β²(φ_a + φ_b) / [(a−b)²(φ_a + φ_b + φ_β)] if β ≤ a − b,  (φ_a + φ_b) / (φ_a + φ_b + φ_β) if β > a − b,   (33)

where φ_a, φ_b, φ_β are defined as in Eq. (27).

Figure 2 shows the Chernoff ratio when we fix b and vary a and β over a grid in the 2-block homogeneous models with one binary covariate. Again we can see that ρ* < 1 for most of the region, while ρ* > 1 only when a and β are relatively large, which agrees with the general formula for the Chernoff ratio as in Corollary 1. According to Eq. (33), we can have ρ* > 1 only when φ_β < 0, and this can happen only when a and β are relatively large. This implies that, in general, Algorithm 2 is preferred for estimating the induced block assignments.

D. K-block Homogeneous Model with One Binary Covariate

We extend the discussion from the 2-block homogeneous model to the K-block homogeneous model with one binary covariate. Again, we consider the balanced case where π = (1/K, · · · , 1/K) and π_Z = (1/(2K), · · · , 1/(2K)), with the assumption that n_i = nπ_i and n_{Z,j} = nπ_{Z,j} for i ∈ {1, · · · , K} and j ∈ {1, · · · , 2K}.

Fig. 2. Chernoff ratio as in Eq. (13) for 2-block homogeneous models (b fixed, a and β varying, π = (1/2, 1/2), π_Z = (1/4, 1/4, 1/4, 1/4)).

Similar observations and derivations yield the approximate Chernoff information for Algorithm 1 as (see Appendix A and Appendix B for more details)

ρ_1* ≈ Kβ² / (2D_2) if δ ≤ 0,  (a−b)² / [K(φ_a + φ_b + φ_β)] if δ > 0,   (34)

where φ_a, φ_b, φ_β are defined as in Eq. (27) and

D_1 = 2 − (2/K)a − (2(K−1)/K)b − 2β,
D_2 = 2φ_a + 2(K−1)φ_b + βD_1,
δ = K²β²(φ_a + φ_b + φ_β) − (a−b)² D_2.   (35)

Again by the same derivations [21], we have the approximate Chernoff information for Algorithm 2 as

ρ_2* ≈ (a−b)² / (K[a(1−a) + b(1−b)]) = (a−b)² / (K(φ_a + φ_b)),   (36)

where φ_a and φ_b are defined as in Eq. (27).
We then have the general Chernoff ratio formula as follows.

Theorem 2.
For the K-block homogeneous balanced model with one binary covariate parametrized by B_Z ∈ [0, 1]^{2K×2K} with similar structure as in Eq. (3) and π_Z = (1/(2K), · · · , 1/(2K)), the Chernoff ratio as in Eq. (13) can be derived analytically as

ρ* = ρ_1*/ρ_2* → K²β²(φ_a + φ_b) / [2(a−b)² D_2] if δ ≤ 0,  (φ_a + φ_b) / (φ_a + φ_b + φ_β) if δ > 0,   (37)

where φ_a, φ_b, φ_β are defined as in Eq. (27), and D_2, δ are defined as in Eq. (35).

Remark 4.
Clearly Theorem 2 generalizes Corollary 1 beyond K = 2.

Figure 3 shows the Chernoff ratio when we fix b and vary a and β over a grid in the 4-block homogeneous models with one binary covariate. We can see that ρ* < 1 for most of the region, while ρ* > 1 only when a and β are relatively large. This implies again that, in general, Algorithm 2 is preferred for estimating the induced block assignments.

Fig. 3. Chernoff ratio as in Eq. (13) for 4-block homogeneous models (b fixed, a and β varying, π = (1/4, 1/4, 1/4, 1/4), π_Z = (1/8, · · · , 1/8)).

V. SIMULATIONS AND REAL DATA EXPERIMENTS
In addition to measuring the two algorithms’ performance analytically via the Chernoff ratio, we also compare Algorithm 1 and Algorithm 2 (with β and β̂ in Step 3, respectively) by actual clustering results. Recall that the analytic comparison via the Chernoff ratio is based on the limiting results of ASE for the SBM as the number of vertices n → ∞. The comparison via actual clustering results can measure the performance of these two algorithms for finite n.

As an illustration of this correspondence, we start with the settings related to “A” (with ρ* > 1) and “B” (with ρ* < 1) in the left panel of Figure 4 for the 2-block rank one model with one binary covariate Z ∈ {−1, 1}^n. We consider the balanced case where n_1 = n_2 = n/2 and n_{Z,1} = n_{Z,2} = n_{Z,3} = n_{Z,4} = n/4. For each n in a grid of graph sizes, we simulate 100 adjacency matrices with n/2 vertices in each block and generate the binary covariate with n/4 vertices having each value of Z within each block. We then apply Algorithm 1 and Algorithm 2 (with β and β̂ in Step 3, respectively) using embedding dimension d̂ = 3 to estimate the induced block assignments, where the adjusted Rand index (ARI) is used to measure the performance. The upper right panel in Figure 4 shows that although ρ* > 1 and Algorithm 1 should be preferred in terms of the Chernoff ratio, the ARI suggests that Algorithm 2 is preferred. The Chernoff ratio is a limiting result; however, the region for which ρ* > 1 is so easy for clustering (e.g., q − p is large for “A”) that both algorithms are essentially perfect even for small n. The lower right panel in Figure 4 shows that Algorithm 2 tends to have better performance than Algorithm 1, which agrees with the Chernoff ratio in the left panel, where ρ* < 1 and Algorithm 2 is preferred.

To further investigate the flexibility of our models and algorithms, we also extend the discussion from binary to categorical vertex covariates.

A. 2-block Rank One Model with One 5-categorical Covariate
Specifically, we first consider the 2-block rank one model with one 5-categorical covariate Z ∈ {1, 2, 3, 4, 5}^n, i.e., we have the block connectivity probability matrix B^Z ∈ [0, 1]^{10×10} with a structure similar to that in Eq. (2).

We first fix p and β and consider a grid of values of q. For each q, we simulate 100 adjacency matrices with 1000 vertices in each block and generate the 5-categorical covariate with 200 vertices taking each value of Z within each block. We then apply Algorithm 1 and Algorithm 2 (with β and β̂ in Step 3, respectively) using embedding dimension d̂ = 6 to estimate the induced block assignments. Figure 5a shows that both algorithms estimate the induced block assignments more accurately as the latent positions of the two induced blocks move away from each other, i.e., as the two induced blocks become more separated, and that Algorithm 2 can outperform Algorithm 1.

Next we fix p and q and consider a grid of values of β. For each β, we simulate 100 adjacency matrices with 1000 vertices in each block and generate the 5-categorical covariate with 200 vertices taking each value of Z within each block. We then apply both algorithms (with β and β̂ in Step 3 of Algorithm 2, respectively) using embedding dimension d̂ = 6 to estimate the induced block assignments. Figure 5b shows that Algorithm 1 can estimate the induced block assignments accurately only when β is relatively small, while Algorithm 2 does so whether β is small or large. Intuitively, since Algorithm 1 directly estimates the induced block assignments, when β is relatively large, i.e., when the vertex covariates affect block structure significantly, it lacks the ability to distinguish this effect. Algorithm 2, however, can use the additional information from the vertex covariates to estimate β, taking this effect into account when estimating the induced block assignments. Again, the overall performance of Algorithm 2 is better than that of Algorithm 1.
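The induced 10 × 10 block connectivity matrix in this setting can be sketched as follows, assuming the Eq. (2)-style structure in which the entry for induced blocks (k, z) and (l, z′) is B[k, l] + β·1{z = z′}; the parameter values here are illustrative, not the ones used in the simulations:

```python
import numpy as np

def induced_block_matrix(B, beta, n_categories):
    """Induced block connectivity matrix for an SBM with one categorical
    covariate, assuming the Eq. (2)-style structure: the entry for induced
    blocks (k, z) and (l, z') is B[k, l] + beta * 1{z == z'}."""
    K = B.shape[0]
    # Replicate B across covariate levels, then add beta on all pairs
    # of induced blocks sharing the same covariate value.
    return (np.kron(B, np.ones((n_categories, n_categories)))
            + beta * np.kron(np.ones((K, K)), np.eye(n_categories)))

# Illustrative parameter values (not the paper's simulation settings).
p, q, beta = 0.3, 0.5, 0.2
nu = np.array([p, q])
B = np.outer(nu, nu)                      # 2-block rank one model
BZ = induced_block_matrix(B, beta, 5)
print(BZ.shape)  # (10, 10)
```

For p ≠ q the rank-one term contributes one dimension and the covariate term contributes five, with trivially intersecting column spaces, so B^Z has rank 6, consistent with the embedding dimension d̂ = 6 used in this subsection.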
B. 2-block Homogeneous Model with One 5-categorical Covariate
We now consider the 2-block homogeneous model with one 5-categorical covariate Z ∈ {1, 2, 3, 4, 5}^n, i.e., we have the block connectivity probability matrix B^Z ∈ [0, 1]^{10×10} with a structure similar to that in Eq. (3). Note that we can re-write B as in Eq. (1) as

$$B = \nu\nu^\top = \begin{bmatrix} a & b \\ b & a \end{bmatrix}
\quad\text{with}\quad
\nu = \begin{bmatrix} \sqrt{a} & 0 \\ b/\sqrt{a} & \sqrt{(a-b)(a+b)/a} \end{bmatrix}. \tag{38}$$

With these canonical latent positions, the distance between the two induced blocks can be measured by

$$\left(\sqrt{a} - \frac{b}{\sqrt{a}}\right)^2 + \left(0 - \sqrt{\frac{(a-b)(a+b)}{a}}\right)^2 = 2(a-b). \tag{39}$$

We first fix b and β and consider a grid of values of a. For each a, we simulate 100 adjacency matrices with 1000 vertices in each block and generate the 5-categorical covariate with 200 vertices taking each value of Z within each block. We then apply both algorithms (with β and β̂ in Step 3 of Algorithm 2, respectively) using embedding dimension d̂ = 6 to estimate the induced block assignments. Figure 6a shows that both algorithms estimate the induced block assignments more accurately as the latent positions of the two induced blocks move away from each other, i.e., as the two induced blocks become more separated as measured by Eq. (39), and that Algorithm 2 can perform much better. Recall that Algorithm 1 estimates the induced block assignments by clustering the diagonal of B̂^Z and re-assigning the block assignments including the vertex covariate effect. For the homogeneous model, the diagonal entries of B^Z are all the same, which can make it hard for Algorithm 1 to estimate the induced block assignments accurately. Algorithm 2 is not affected by the homogeneous structure, since it estimates the vertex covariate effect first and then estimates the induced block assignments by clustering the estimated latent positions, like the canonical ones in Eq. (38).

Next we fix a and b and consider a grid of negative values of β.
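Eqs. (38) and (39) can be checked numerically; this sketch builds the canonical latent positions for illustrative values of a and b and verifies that νν^⊤ recovers B and that the squared distance between the two rows equals 2(a − b):

```python
import numpy as np

a, b = 0.5, 0.2  # illustrative values with 0 < b < a < 1

# Canonical latent positions of Eq. (38).
nu = np.array([
    [np.sqrt(a), 0.0],
    [b / np.sqrt(a), np.sqrt((a - b) * (a + b) / a)],
])

# nu nu^T recovers the homogeneous block matrix B = [[a, b], [b, a]].
print(np.allclose(nu @ nu.T, [[a, b], [b, a]]))  # True

# Squared distance between the two induced blocks, Eq. (39): 2(a - b).
d2 = float(np.sum((nu[0] - nu[1]) ** 2))
print(np.isclose(d2, 2 * (a - b)))  # True
```

The identity holds for any 0 < b < a < 1, so the separation of the induced blocks in the latent space is governed entirely by a − b, which is what Figure 6a varies.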
For each β, we again simulate 100 adjacency matrices with 1000 vertices in each block and generate the 5-categorical covariate with 200 vertices taking each value of Z within each block. We then apply both algorithms (with β and β̂ in Step 3 of Algorithm 2, respectively) using embedding dimension d̂ = 6 to estimate the induced block assignments. Figure 6b shows that both algorithms are relatively stable for this homogeneous model when we fix a and b, due to the special structure. Still, Algorithm 2 can perform much better than Algorithm 1.

C. Connectome Data
We conduct real data experiments on a diffusion MRI connectome data set [33]. There are 114 graphs (connectomes) estimated by the NDMG pipeline [34] in this data set. Each vertex in these graphs has a {Left, Right} hemisphere label and a {Gray, White} tissue label.

We begin with a synthetic data analysis. We model the graphs as GRDPG with vertex covariates by applying two separate a priori 2-block projections as in [33]; in each case one label is treated as the induced block and the other label is treated as the binary vertex covariate. In particular, the a priori block connectivity probability matrices are given by

$$B^{LR} = \begin{bmatrix} 0.050 & 0.013 \\ 0.013 & \cdot \end{bmatrix}
\quad \& \quad
B^{GW} = \begin{bmatrix} 0.011 & 0.027 \\ 0.027 & \cdot \end{bmatrix}. \tag{40}$$

Furthermore, we have the block assignment probabilities π^{LR} and π^{GW}, together with the induced block assignment probabilities π_Z^{LR} and π_Z^{GW}, estimated from the data set.

Fig. 4. Correspondence between Chernoff analysis and simulations.
Fig. 5. Simulations for the 2-block rank one model with one 5-categorical covariate, balanced case. (a) ARI as the latent positions of the two induced blocks move away from each other. (b) ARI as β increases.

We first treat {Left, Right} as the induced block and {Gray, White} as the vertex covariate. For each β in a grid of values, we simulate 100 adjacency matrices with n = 1000, where 280 vertices have labels {Left} and {Gray}, 220 vertices have labels {Left} and {White}, 280 vertices have labels {Right} and {Gray}, and 220 vertices have labels {Right} and {White}. We then apply both algorithms (with β and β̂ in Step 3 of Algorithm 2, respectively) using embedding dimension d̂ = 3 to estimate the induced block assignments. Figure 7a shows that Algorithm 2 can perform much better than Algorithm 1. Note that B^{LR} approximately has homogeneous structure; as discussed above, this diagonal structure can make it difficult for Algorithm 1 to estimate the induced block assignments accurately, while Algorithm 2 can first estimate the vertex covariate effect and then estimate the induced block assignments by clustering the latent positions after accounting for this effect.

We then treat {Gray, White} as the induced block and {Left, Right} as the vertex covariate. We fix β and consider an increasing sequence of graph sizes n. For each n, we simulate 100 adjacency matrices where fixed fractions of the n vertices have each of the four label combinations {Gray, Left}, {Gray, Right}, {White, Left}, and {White, Right}.
We then apply both algorithms (with β and β̂ in Step 3 of Algorithm 2, respectively) using embedding dimension d̂ = 3 to estimate the induced block assignments.

Fig. 6. Simulations for the 2-block homogeneous model with one 5-categorical covariate, balanced case. (a) ARI as the latent positions of the two induced blocks move away from each other. (b) ARI as β increases.

Figure 7b shows that both algorithms tend to perform better as n increases, and that for small n Algorithm 2 again performs relatively better than Algorithm 1.

Now we apply our algorithms to the actual graphs from the connectome data set. Each of the 114 connectomes is represented by a point in Figure 8, with x = ARI(Algo2, LR) − ARI(Algo1, LR) and y = ARI(Algo2, GW) − ARI(Algo1, GW), where ARI(Algo1, LR) denotes the ARI when we apply Algorithm 1 and treat {Left, Right} as the induced block (with analogous notation for the rest). We see that most of the points lie in the (+, +) quadrant, indicating ARI(Algo2, LR) > ARI(Algo1, LR) and ARI(Algo2, GW) > ARI(Algo1, GW). That is, Algorithm 2 is better at estimating the induced block assignments for this real application, and this claim holds no matter which label is treated as the induced block. This again emphasizes the importance of distinguishing different factors that can affect block structure in graphs. Algorithm 2 is able to identify particular block structure by using the observed vertex covariate information: it is more likely to discover the {Left, Right} structure after accounting for the effect of the {Gray, White} label, and more likely to discover the {Gray, White} structure after accounting for the effect of the {Left, Right} label. In real data, we may not have ground truth for the block structure. Our findings suggest that we can discover block structure by using observed vertex covariates, which can lead to meaningful insights in widely varying applications.
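The quantities plotted in Figure 8 are simple per-graph ARI differences; a schematic sketch with hypothetical ARI values (in the experiment these come from running both algorithms on each of the 114 connectomes with both choices of induced block):

```python
# Hypothetical ARI values for two graphs, for illustration only.
ari = [
    {"algo1_LR": 0.05, "algo2_LR": 0.40, "algo1_GW": 0.10, "algo2_GW": 0.55},
    {"algo1_LR": 0.30, "algo2_LR": 0.35, "algo1_GW": 0.20, "algo2_GW": 0.18},
]

# One (x, y) point per graph, as in Figure 8.
points = [(g["algo2_LR"] - g["algo1_LR"], g["algo2_GW"] - g["algo1_GW"])
          for g in ari]
# Fraction of graphs in the (+, +) quadrant, i.e. graphs on which
# Algorithm 2 wins under both choices of induced block.
frac_pp = sum(x > 0 and y > 0 for x, y in points) / len(points)
print(frac_pp)  # 0.5
```

A point in the (+, +) quadrant means Algorithm 2 improves on Algorithm 1 for both the {Left, Right} and the {Gray, White} block structure of that graph.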
That is, we can better reveal underlying block structure, and thus better understand the data, by accounting for the vertex covariate effect.

VI. DISCUSSION
We present a comparative analysis of two model-based spectral algorithms for clustering vertices in stochastic blockmodel graphs with vertex covariates, in order to assess the effect of observed and unobserved vertex heterogeneity on block structure in graphs. The main difference between these two algorithms in estimating the induced block assignments is whether we estimate the vertex covariate effect using the observed covariate information. To analyze the algorithms' performance, we employ Chernoff information and derive the Chernoff ratio formula for the homogeneous balanced model. We also simulate multiple adjacency matrices with varied types of covariates to compare the algorithms' performance via actual clustering accuracy measured by ARI. In addition, we conduct a real data analysis on a diffusion MRI connectome data set. Analytic results, simulations, and real data experiments suggest that, in general, the second algorithm is preferred: we can better estimate the induced block assignments and reveal underlying block structure by first estimating the vertex covariate effect. Our findings also emphasize the importance of distinguishing between observed and unobserved factors that can affect block structure in graphs.

We focus on the model specified as in Remark 1, where an indicator function is used to measure the vertex covariate effect and the identity function is used as the link between edge probabilities and latent positions. We also investigate the flexibility of our models and algorithms by considering categorical vertex covariates.
The extension from discrete vertex covariates to continuous vertex covariates is under investigation, for instance via latent structure models [35]. The indicator function measures the vertex covariate effect for binary and, more generally, categorical vertex covariates, following the intuition that vertices sharing the same covariate value are more likely to form an edge; different functions can be adopted for continuous vertex covariates following a similar intuition. For example, similarity and distance functions can be chosen according to the nature of the vertex covariates to measure how they influence graph structure. Another extension is to replace the identity link with a general link function such as the logit. The idea of using Chernoff information to compare algorithms' performance can be adopted for all of the above generalizations, and numerical evaluations can be obtained in the absence of closed-form expressions, which in turn can reveal how graph structure affects our algorithms and provide guidelines for real applications. Moreover, our models and algorithms can be applied to directed and bipartite graphs with some modification, which is another valuable direction for future work.

Fig. 7. Synthetic data analysis via a diffusion MRI connectome data set. (a) ARI as β increases; {Left, Right} treated as the induced block and {Gray, White} as the vertex covariate. (b) ARI as n increases; {Gray, White} treated as the induced block and {Left, Right} as the vertex covariate.
Fig. 8. Algorithms' comparative performance on connectome data via ARI.

APPENDIX A
LATENT POSITION GEOMETRY
Observe that a K-block SBM becomes a 2K-block SBM when a binary covariate that can affect block structure significantly is added. To analytically derive the Chernoff ratio for the K-block homogeneous model with one binary covariate, we first investigate the canonical latent positions for this model via the idea of the Cholesky decomposition. Specifically, let B ∈ [0, 1]^{K×K} denote the block connectivity probability matrix after accounting for the vertex covariate effect, and let B^Z ∈ [0, 1]^{2K×2K} denote the block connectivity probability matrix including the vertex covariate effect. Here we focus on the canonical latent positions for B^Z; details about the canonical latent positions for B have been discussed in [21]. Let ν_Z ∈ R^{2K×(K+1)} denote the canonical latent position matrix; then we can re-write B^Z as

$$B^Z = \nu_Z \nu_Z^\top, \tag{41}$$

where ν_Z = [ν_1 ··· ν_{2K}]^⊤. For K = 2 we have, via the idea of the Cholesky decomposition,
$$\nu_Z = \begin{bmatrix}
\sqrt{a+\beta} & 0 & 0 \\[4pt]
\dfrac{a}{\sqrt{a+\beta}} & \sqrt{\dfrac{\beta(2a+\beta)}{a+\beta}} & 0 \\[4pt]
\dfrac{b+\beta}{\sqrt{a+\beta}} & (b-a)\sqrt{\dfrac{\beta}{(a+\beta)(2a+\beta)}} & \sqrt{\dfrac{2(a-b)(a+b+\beta)}{2a+\beta}} \\[4pt]
\dfrac{b}{\sqrt{a+\beta}} & (a+b+\beta)\sqrt{\dfrac{\beta}{(a+\beta)(2a+\beta)}} & \sqrt{\dfrac{2(a-b)(a+b+\beta)}{2a+\beta}}
\end{bmatrix}. \tag{42}$$

And by induction, for K ≥ 3, the canonical latent position matrix for the K-block model is obtained from that of the (K−1)-block model: its leading columns repeat the (K−1)-block canonical latent positions, the next column rescales the last (K−1)-block coordinate by the factor κ, and the non-zero entries of its final column are

$$\sqrt{\frac{(a-b)\,[\,2a+2(K-1)b+K\beta\,]}{2a+2(K-1)b+(K-1)\beta}}, \tag{43}$$

where

$$\kappa = \frac{2b+\beta}{2a+2(K-1)b+(K-1)\beta}. \tag{44}$$

For this K-block homogeneous model with one binary covariate, the symmetric structure of B^Z yields

$$\begin{aligned}
\nu_1^\top\nu_1 &= \nu_2^\top\nu_2 = \cdots = \nu_{2K}^\top\nu_{2K} = a+\beta, \\
\nu_1^\top\nu_2 &= \nu_3^\top\nu_4 = \cdots = \nu_{2K-1}^\top\nu_{2K} = a, \\
\nu_1^\top\nu_3 &= \nu_2^\top\nu_4 = \cdots = \nu_{2K-3}^\top\nu_{2K-1} = b+\beta, \\
\nu_1^\top\nu_4 &= \nu_2^\top\nu_3 = \cdots = \nu_{2K-3}^\top\nu_{2K} = b.
\end{aligned} \tag{45}$$

Along with the balanced assumption, i.e., π_Z = (1/(2K), ···, 1/(2K)), the first four rows of ν_Z are ideal for the derivation, as they have the fewest non-zero entries and can represent all the possible geometric structure. In other words, we need only evaluate C_{12}, C_{13}, and C_{14}, where C_{kl} is defined as in Eq. (20), to derive the Chernoff ratio.

APPENDIX B
ANALYTIC DERIVATIONS OF CHERNOFF RATIO
For the K-block homogeneous model with one binary covariate, we observe that B^Z has eigenvalue 0 with algebraic multiplicity K−1, eigenvalue Kβ with algebraic multiplicity 1, eigenvalue 2(a−b) with algebraic multiplicity K−1, and eigenvalue 2a + 2(K−1)b + Kβ with algebraic multiplicity 1. Along with the assumption that 0 < b < a < 1 and 0 < β < 1, we have, among the non-zero eigenvalues of B^Z,

$$\lambda_{\max}(B^Z) = 2a + 2(K-1)b + K\beta, \qquad
\lambda_{\min}(B^Z) = \begin{cases} K\beta & \text{if } \beta \le 2(a-b)/K, \\ 2(a-b) & \text{if } \beta > 2(a-b)/K. \end{cases} \tag{46}$$

Thus B^Z is positive semidefinite with rank(B^Z) = K + 1. Then I_{d_+, d_-} = I_{K+1} and we can omit it in the derivations. As discussed in the previous section, we only consider the first four rows of the canonical latent position matrix ν_Z and evaluate C_{12}, C_{13}, and C_{14}. With the definition as in Eq. (16), we have

$$\begin{aligned}
\mathbb{E}\!\left[g(\nu_1,\nu_2,\nu)\,\nu\nu^\top\right] &= c_1\Delta + c_2 N_{12}N_{12}^\top, \\
\mathbb{E}\!\left[g(\nu_1,\nu_3,\nu)\,\nu\nu^\top\right] &= \Delta_T + c_3 N_{13}N_{13}^\top + c_4 N_{24}N_{24}^\top, \\
\mathbb{E}\!\left[g(\nu_1,\nu_4,\nu)\,\nu\nu^\top\right] &= c_1\Delta + c_5 N_{14}N_{14}^\top + c_6 N_{23}N_{23}^\top,
\end{aligned} \tag{47}$$

where ∆ ∈ R^{(K+1)×(K+1)} is defined as in Eq. (9), ν_Z is the canonical latent position matrix as in Eq. (43), and

$$\begin{aligned}
\phi_a &= a(1-a), \qquad \phi_b = b(1-b), \qquad \phi_{b\beta} = (b+\beta)(1-b-\beta), \\
c_1 &= \phi_b + \phi_{b\beta}, \qquad
c_2 = \frac{(a-b)(1-a-b-\beta)}{2K}, \qquad
c_3 = c_4 = \frac{(a-b)(1-a-b-\beta)}{4K}, \\
c_5 &= c_6 = \frac{\phi_a - \phi_b}{K}, \qquad
c_T = \frac{\beta(1-b-\beta)}{4K}, \\
N_{k\ell} &= \begin{bmatrix}\nu_k & \nu_\ell\end{bmatrix} \in \mathbb{R}^{(K+1)\times 2}, \qquad
I_T = \operatorname{diag}(1,-1,\cdots,1,-1) \in \mathbb{R}^{2K\times 2K}, \\
\Delta_T &= \nu_Z^\top\!\left(c_T I_T + \frac{c_1}{2K} I_{2K}\right)\nu_Z \in \mathbb{R}^{(K+1)\times(K+1)}.
\end{aligned} \tag{48}$$

With the canonical latent position matrix ν_Z as in Eq.
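The eigenvalue structure in Eq. (46) and the rank claim can be verified numerically. This sketch assumes the natural ordering of the induced blocks by (block, covariate), under which B^Z = B ⊗ J_2 + β (J_K ⊗ I_2) with B = (a − b) I_K + b J_K:

```python
import numpy as np

def homogeneous_BZ(K, a, b, beta):
    """B^Z for the K-block homogeneous model with one binary covariate,
    assuming induced blocks ordered as (block, covariate):
    B = (a - b) I_K + b J_K, and B^Z = B kron J_2 + beta (J_K kron I_2)."""
    B = (a - b) * np.eye(K) + b * np.ones((K, K))
    return (np.kron(B, np.ones((2, 2)))
            + beta * np.kron(np.ones((K, K)), np.eye(2)))

K, a, b, beta = 3, 0.5, 0.2, 0.1
BZ = homogeneous_BZ(K, a, b, beta)
eig = np.sort(np.linalg.eigvalsh(BZ))
# Spectrum predicted by the discussion around Eq. (46): 0 (mult. K-1),
# K*beta, 2(a-b) (mult. K-1), and 2a + 2(K-1)b + K*beta.
expected = np.sort([0.0] * (K - 1)
                   + [K * beta]
                   + [2 * (a - b)] * (K - 1)
                   + [2 * a + 2 * (K - 1) * b + K * beta])
print(np.allclose(eig, expected))          # True
print(np.linalg.matrix_rank(BZ) == K + 1)  # True
```

Since here β = 0.1 ≤ 2(a − b)/K = 0.2, the smallest non-zero eigenvalue is Kβ, matching the first case of Eq. (46).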
(43), observe that

$$\begin{aligned}
N_{12}^\top\Delta^{-1}N_{12} &= \begin{bmatrix} K+1 & K-1 \\ K-1 & K+1 \end{bmatrix}, &
N_{13}^\top\Delta^{-1}N_{13} &= \begin{bmatrix} K+1 & 1 \\ 1 & K+1 \end{bmatrix}, \\
N_{14}^\top\Delta^{-1}N_{14} &= \begin{bmatrix} K+1 & -1 \\ -1 & K+1 \end{bmatrix}, &
N_{23}^\top\Delta^{-1}N_{23} &= \begin{bmatrix} K+1 & K-1 \\ K-1 & K+1 \end{bmatrix}, \\
N_{13}^\top\Delta_T^{-1}N_{13} &= \frac{2}{n_1}\begin{bmatrix} \phi_b + K\phi_{b\beta} & \phi_b \\ \phi_b & \phi_b + K\phi_{b\beta} \end{bmatrix}, &
N_{24}^\top\Delta_T^{-1}N_{24} &= \frac{2}{n_2}\begin{bmatrix} K\phi_b + \phi_{b\beta} & \phi_{b\beta} \\ \phi_{b\beta} & K\phi_b + \phi_{b\beta} \end{bmatrix}, \\
N_{23}^\top\Delta_T^{-1}N_{23} &= \frac{1}{c_1}\begin{bmatrix} K-1 & -1 \\ -1 & K-1 \end{bmatrix},
\end{aligned} \tag{49}$$

where c_1, φ_b, φ_{bβ} are defined as in Eq. (48) and

$$n_1 = 2\phi_b + \beta(1-\beta) + 3\beta\phi_b(1-b-\beta) - b\beta(1-b-\beta), \qquad
n_2 = \phi_b(\phi_b + \phi_{b\beta}). \tag{50}$$

By the Sherman–Morrison–Woodbury formula [36], we have

$$\begin{aligned}
\mathbb{E}\!\left[g(\nu_1,\nu_2,\nu)\,\nu\nu^\top\right]^{-1}
&= \frac{1}{c_1}\Delta^{-1} - \frac{1}{c_1^2}\Delta^{-1}M_1\Delta^{-1}, \\
\mathbb{E}\!\left[g(\nu_1,\nu_3,\nu)\,\nu\nu^\top\right]^{-1}
&= \Delta_T^{-1} - \Delta_T^{-1}M_2\Delta_T^{-1} - \Delta_T^{-1}M_5\Delta_T^{-1}
 + \Delta_T^{-1}M_2\Delta_T^{-1}M_5\Delta_T^{-1} \\
&\quad + \Delta_T^{-1}M_5\Delta_T^{-1}M_2\Delta_T^{-1}
 - \Delta_T^{-1}M_2\Delta_T^{-1}M_5\Delta_T^{-1}M_2\Delta_T^{-1}, \\
\mathbb{E}\!\left[g(\nu_1,\nu_4,\nu)\,\nu\nu^\top\right]^{-1}
&= \frac{1}{c_1}\Delta^{-1} - \frac{1}{c_1^2}\Delta^{-1}M_3\Delta^{-1} - \frac{1}{c_1^2}\Delta^{-1}M_4\Delta^{-1}
 + \frac{1}{c_1^3}\Delta^{-1}M_3\Delta^{-1}M_4\Delta^{-1} \\
&\quad + \frac{1}{c_1^3}\Delta^{-1}M_4\Delta^{-1}M_3\Delta^{-1}
 - \frac{1}{c_1^4}\Delta^{-1}M_3\Delta^{-1}M_4\Delta^{-1}M_3\Delta^{-1},
\end{aligned} \tag{51}$$

where c_1, …, c_6 are defined as in Eq. (48) and

$$\begin{aligned}
D_1 &= \frac{1}{c_2}I + \frac{1}{c_1}N_{12}^\top\Delta^{-1}N_{12}, \qquad
D_2 = \frac{1}{c_3}I + N_{13}^\top\Delta_T^{-1}N_{13}, \qquad
D_3 = \frac{1}{c_5}I + \frac{1}{c_1}N_{14}^\top\Delta^{-1}N_{14}, \\
M_1 &= N_{12}D_1^{-1}N_{12}^\top, \qquad M_2 = N_{13}D_2^{-1}N_{13}^\top, \qquad M_3 = N_{14}D_3^{-1}N_{14}^\top, \\
D_4 &= \frac{1}{c_6}I + \frac{1}{c_1}N_{23}^\top\Delta^{-1}N_{23} - \frac{1}{c_1^2}N_{23}^\top\Delta^{-1}M_3\Delta^{-1}N_{23}, \\
D_5 &= \frac{1}{c_4}I + N_{24}^\top\Delta_T^{-1}N_{24} - N_{24}^\top\Delta_T^{-1}M_2\Delta_T^{-1}N_{24}, \\
M_4 &= N_{23}D_4^{-1}N_{23}^\top, \qquad M_5 = N_{24}D_5^{-1}N_{24}^\top.
\end{aligned} \tag{52}$$

Again by the canonical latent position matrix ν_Z as in Eq. (43), we have

$$\begin{aligned}
(\nu_1-\nu_2)^\top\Delta(\nu_1-\nu_2) &= \beta^2, \\
(\nu_1-\nu_3)^\top\Delta(\nu_1-\nu_3) &= \frac{2(a-b)^2}{K}, \\
(\nu_1-\nu_4)^\top\Delta(\nu_1-\nu_4) &= \frac{2(a-b)^2}{K} + \beta^2, \\
(\nu_1-\nu_3)^\top\Delta\Delta_T^{-1}\Delta(\nu_1-\nu_3) &= \frac{(a-b)^2}{c_1 K}.
\end{aligned} \tag{53}$$
Similarly, we have

$$\begin{aligned}
N_{12}^\top(\nu_1-\nu_2) &= \beta\,[\,1 \;\; -1\,]^\top, \\
N_{13}^\top(\nu_1-\nu_3) &= (a-b)\,[\,1 \;\; -1\,]^\top, \\
N_{14}^\top(\nu_1-\nu_4) &= (a-b+\beta)\,[\,1 \;\; -1\,]^\top, \\
N_{23}^\top(\nu_1-\nu_4) &= (a-b-\beta)\,[\,1 \;\; -1\,]^\top, \\
N_{13}^\top\Delta_T^{-1}\Delta(\nu_1-\nu_3) &= \frac{a-b}{c_1}\,[\,1 \;\; -1\,]^\top.
\end{aligned} \tag{54}$$

Then, with all the results above, we have

$$C_{12} = \frac{K\beta^2}{D_7}, \qquad
C_{13} = \frac{(a-b)^2}{K(\phi_a+\phi_b+\phi_\beta)}, \qquad
C_{14} = \frac{K\beta^2(\phi_a+\phi_b+\phi_\beta) + 2KN_3 + 4N_4}{K\,[\,2(\phi_a-\phi_b) + D_8\,]}, \tag{55}$$

where φ_a, φ_b are defined as in Eq. (48) and

$$\begin{aligned}
\phi_\beta &= \beta(1-a-b-\beta), \\
D_6 &= (K-1)a - (K-1)b - K\beta, \\
D_7 &= 2\phi_a + 2(K-1)\phi_b + \beta D_6, \\
N_3 &= (a-b)^2\,[\,2\phi_b + \beta(1+\beta-b)\,], \\
N_4 &= (a-b)^2(1-a-b-\beta)^2, \\
D_8 &= 2\beta(a-b)\,[\,(1-a-b-\beta)^2 - (\phi_a + \phi_b) - \phi_\beta + 2b(a+\beta)\,] \\
&\quad + K\,\{\,\phi_b(\phi_a+\phi_b) - b\beta(\phi_b + a - b) - ab\phi_\beta
 + \beta(1-\beta)\,[\,\phi_a + (3b+\beta)(1-\beta) - a\beta - b^2\,]\,\}.
\end{aligned} \tag{56}$$

Then we have the approximate Chernoff information for Algorithm 1 as

$$\rho^* \approx \min_{\ell \in \{12,\,13,\,14\}} C_\ell, \tag{57}$$

where C_ℓ for ℓ ∈ {12, 13, 14} are defined as in Eq. (55). Also observe that

$$C_{13} - C_{14} = -\frac{(a-b)^2 N_6}{K D_7\,[\,2(\phi_a-\phi_b)+D_8\,]}, \qquad
C_{12} - C_{14} = -\frac{\beta^2\,[\,2(a-b)^2 + K(\phi_a+\phi_b+\phi_\beta)\,]}{K(\phi_a+\phi_b+\phi_\beta)\,[\,2(\phi_a-\phi_b)+D_8\,]}, \tag{58}$$

where φ_a, φ_b are defined as in Eq. (48), φ_β, D_7, D_8 are defined as in Eq. (56), and

$$N_5 = \beta\,[\,(K-1)a - (K-1)b\,], \qquad N_6 = 2\phi_a + 2(K-1)\phi_b + N_5. \tag{59}$$

Subsequent calculations and simplification yield ρ∗ as in Eq. (34).

REFERENCES
[1] E. Abbe, “Community detection and stochastic block models: recent developments,”
Journal of Machine Learning Research, vol. 18, no. 1, pp. 6446–6531, 2017.
[2] P. W. Holland, K. B. Laskey, and S. Leinhardt, “Stochastic blockmodels: First steps,” Social Networks, vol. 5, no. 2, pp. 109–137, 1983.
[3] B. Karrer and M. E. Newman, “Stochastic blockmodels and community structure in networks,” Physical Review E, vol. 83, no. 1, p. 016107, 2011.
[4] D. S. Choi, P. J. Wolfe, and E. M. Airoldi, “Stochastic blockmodels with a growing number of classes,” Biometrika, vol. 99, no. 2, pp. 273–284, 2012.
[5] S. Roy, Y. Atchadé, and G. Michailidis, “Likelihood inference for large scale stochastic blockmodels with covariates based on a divide-and-conquer parallelizable algorithm with communication,” Journal of Computational and Graphical Statistics, vol. 28, no. 3, pp. 609–619, 2019.
[6] T. M. Sweet, “Incorporating covariates into stochastic blockmodels,” Journal of Educational and Behavioral Statistics, vol. 40, no. 6, pp. 635–664, 2015.
[7] N. Binkiewicz, J. T. Vogelstein, and K. Rohe, “Covariate-assisted spectral clustering,” Biometrika, vol. 104, no. 2, pp. 361–377, 2017.
[8] S. Huang and Y. Feng, “Pairwise covariates-adjusted block model for community detection,” arXiv preprint arXiv:1807.03469, 2018.
[9] A. Mele, L. Hao, J. Cape, and C. E. Priebe, “Spectral inference for large stochastic blockmodels with nodal covariates,” arXiv preprint arXiv:1908.06438, 2019.
[10] U. Von Luxburg, “A tutorial on spectral clustering,” Statistics and Computing, vol. 17, no. 4, pp. 395–416, 2007.
[11] V. Lyzinski, D. L. Sussman, M. Tang, A. Athreya, and C. E. Priebe, “Perfect clustering for stochastic blockmodel graphs via adjacency spectral embedding,” Electronic Journal of Statistics, vol. 8, no. 2, pp. 2905–2922, 2014.
[12] V. Lyzinski, M. Tang, A. Athreya, Y. Park, and C. E. Priebe, “Community detection and classification in hierarchical stochastic blockmodels,” IEEE Transactions on Network Science and Engineering, vol. 4, no. 1, pp. 13–26, 2016.
[13] F. McSherry, “Spectral partitioning of random graphs,” in Proceedings 42nd IEEE Symposium on Foundations of Computer Science. IEEE, 2001, pp. 529–537.
[14] K. Rohe, S. Chatterjee, and B. Yu, “Spectral clustering and the high-dimensional stochastic blockmodel,” The Annals of Statistics, vol. 39, no. 4, pp. 1878–1915, 2011.
[15] V. Lyzinski, K. Levin, and C. E. Priebe, “On consistent vertex nomination schemes,” Journal of Machine Learning Research, vol. 20, no. 69, pp. 1–39, 2019.
[16] M. Tang, A. Athreya, D. L. Sussman, V. Lyzinski, and C. E. Priebe, “A nonparametric two-sample hypothesis testing problem for random graphs,” Bernoulli, vol. 23, no. 3, pp. 1599–1630, 2017.
[17] S. Wang, J. Arroyo, J. T. Vogelstein, and C. E. Priebe, “Joint embedding of graphs,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019.
[18] D. L. Sussman, M. Tang, D. E. Fishkind, and C. E. Priebe, “A consistent adjacency spectral embedding for stochastic blockmodel graphs,” Journal of the American Statistical Association, vol. 107, no. 499, pp. 1119–1128, 2012.
[19] A. Athreya, C. E. Priebe, M. Tang, V. Lyzinski, D. J. Marchette, and D. L. Sussman, “A limit theorem for scaled eigenvectors of random dot product graphs,” Sankhya A, vol. 78, no. 1, pp. 1–18, 2016.
[20] M. Tang and C. E. Priebe, “Limit theorems for eigenvectors of the normalized Laplacian for random graphs,” The Annals of Statistics, vol. 46, no. 5, pp. 2360–2415, 2018.
[21] J. Cape, M. Tang, and C. E. Priebe, “On spectral embedding performance and elucidating network structure in stochastic blockmodel graphs,” Network Science, vol. 7, no. 3, pp. 269–291, 2019.
[22] L. Hao, A. Mele, J. Cape, A. Athreya, C. Mu, and C. E. Priebe, “Latent communities in employment relation and wage distribution: a network approach,” submitted, 2020.
[23] C. R. Shalizi and E. McFowland III, “Estimating causal peer influence in homophilous social networks by inferring latent locations,” arXiv preprint arXiv:1607.06565, 2016.
[24] P. D. Hoff, A. E. Raftery, and M. S. Handcock, “Latent space approaches to social network analysis,” Journal of the American Statistical Association, vol. 97, no. 460, pp. 1090–1098, 2002.
[25] M. S. Handcock, A. E. Raftery, and J. M. Tantrum, “Model-based clustering for social networks,” Journal of the Royal Statistical Society: Series A (Statistics in Society), vol. 170, no. 2, pp. 301–354, 2007.
[26] P. Rubin-Delanchy, C. E. Priebe, M. Tang, and J. Cape, “A statistical interpretation of spectral embedding: the generalised random dot product graph,” arXiv preprint arXiv:1709.05506, 2017.
[27] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Science & Business Media, 2009.
[28] I. T. Jolliffe and J. Cadima, “Principal component analysis: a review and recent developments,” Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 374, no. 2065, p. 20150202, 2016.
[29] M. Zhu and A. Ghodsi, “Automatic dimensionality selection from the scree plot via the use of profile likelihood,” Computational Statistics & Data Analysis, vol. 51, no. 2, pp. 918–930, 2006.
[30] A. Athreya, D. E. Fishkind, M. Tang, C. E. Priebe, Y. Park, J. T. Vogelstein, K. Levin, V. Lyzinski, and Y. Qin, “Statistical inference on random dot product graphs: a survey,” Journal of Machine Learning Research, vol. 18, no. 1, pp. 8393–8484, 2017.
[31] H. Chernoff, “A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations,” The Annals of Mathematical Statistics, vol. 23, no. 4, pp. 493–507, 1952.
[32] ——, “Large-sample theory: Parametric case,” The Annals of Mathematical Statistics, vol. 27, no. 1, pp. 1–22, 1956.
[33] C. E. Priebe, Y. Park, J. T. Vogelstein, J. M. Conroy, V. Lyzinski, M. Tang, A. Athreya, J. Cape, and E. Bridgeford, “On a two-truths phenomenon in spectral graph clustering,” Proceedings of the National Academy of Sciences, vol. 116, no. 13, pp. 5995–6000, 2019.
[34] G. Kiar, E. W. Bridgeford, W. R. Gray Roncal, V. Chandrashekhar, D. Mhembere, S. Ryman, X.-N. Zuo, D. S. Margulies, R. C. Craddock, C. E. Priebe, R. Jung, V. D. Calhoun, B. Caffo, R. Burns, M. P. Milham, and J. T. Vogelstein, “A high-throughput pipeline identifies robust connectomes but troublesome variability,” bioRxiv, p. 188706, 2018.
[35] A. Athreya, M. Tang, Y. Park, and C. E. Priebe, “On estimation and inference in latent structure random graphs,” Statistical Science, accepted for publication, 2020.
[36] R. A. Horn and C. R. Johnson,