Dual regularized Laplacian spectral clustering methods on community detection
Huan Qing
Department of Mathematics, China University of Mining and Technology
and
Jingli Wang
School of Statistics and Data Science, Nankai University
Email: [email protected]
November 10, 2020
Abstract
Spectral clustering methods are widely used for detecting clusters in networks for community detection, and a small change to the graph Laplacian matrix can bring a dramatic improvement. In this paper, we propose a dual regularized graph Laplacian matrix and employ it in three classical spectral clustering approaches under the degree-corrected stochastic block model. If the number of communities is known to be $K$, we consider more than $K$ leading eigenvectors and weight them by their corresponding eigenvalues in the spectral clustering procedure to improve the performance. The three improved spectral clustering methods are the dual regularized spectral clustering (DRSC) method, the dual regularized spectral clustering on ratios-of-eigenvectors (DRSCORE) method, and the dual regularized symmetrized Laplacian inverse matrix (DRSLIM) method. Theoretical analysis of DRSC and DRSLIM shows that, under mild conditions, DRSC and DRSLIM yield stable, consistent community detection; moreover, DRSCORE returns perfect clustering under the ideal case. We compare the performances of DRSC, DRSCORE and DRSLIM with several spectral methods on substantial simulated networks and eight real-world networks.
Keywords: Spectral clustering; degree-corrected stochastic block model; network structure data
1 Introduction
The network has inadvertently penetrated into every aspect of our life, such as friendship networks, work networks, social networks, biochemical networks and so on. In the past few decades, researchers have developed a variety of approaches to explore network structures for understanding more underlying mechanisms (Lorrain and White, 1971; Burt, 1976; Doreian, 1985; Doreian et al., 1994; Borgatti and Everett, 1997). Recently, the community detection problem has attracted an ascending amount of attention in computer science, physics and statistics (Karrer and Newman, 2011; Newman, 2006; Nepusz et al., 2008; Bickel and Chen, 2009a; Jin, 2015). The stochastic blockmodel (SBM) (Holland et al., 1983) is a well-known generative model for communities in networks, and it can also be taken as a posteriori blockmodeling to discover the network structure. In the SBM, the nodes within each community are assumed to have the same expected degrees; that is to say, the distribution of degrees within each community follows a Poisson distribution. Under this model, one can easily produce abundant different network structures (Snijders and Nowicki, 1997; Daudin et al., 2008; Bickel and Chen, 2009b). Unfortunately, due to its restrictive assumptions on vertices, the simple blockmodel does not work well in many applications to real-world networks (Karrer and Newman, 2011). In fact, there are some extensions of SBM that make it more flexible for empirical networks. For instance, Wang and Wong (1987) constructed a blockmodel for directed networks with arbitrary expected degrees; Reichardt and White (2007) proposed a density-based stochastic block model for complex networks; Airoldi et al. (2008) introduced a mixed membership stochastic blockmodel based on hierarchical Bayes; Karrer and Newman (2011) established the degree-corrected stochastic blockmodel (DCSBM) by assuming the degrees follow a power-law distribution, which allows the degree to vary among different communities. Doubtless, DCSBM is the most popular generalization of SBM, and a number of methods have been designed based on it, for example Newman and Clauset (2016); Jin (2015); Qin and Rohe (2013); Zhang et al. (2020); Chen et al. (2018); Ma and Ma (2017). In this paper, we construct spectral clustering methods under the framework of DCSBM.

The spectral clustering algorithm is a fast and efficient method for community detection. In general, assuming there are $K$ clusters, a spectral clustering method usually has four steps (Luxburg, 2007): (1) obtain the adjacency matrix or one of its variants for the given network; (2) compute the first leading $K$ eigenvectors of the adjacency matrix or its variants; (3) combine the eigenvectors by column into a matrix; (4) apply clustering methods (e.g., K-means clustering, K-median clustering) to the row vectors of the eigenvector matrix to detect communities. Some algorithms apply a normalizing or regularizing procedure to the graph matrix, and a scaling-invariant mapping to each row of the eigenvector matrix to reduce the heterogeneity (Ng et al., 2002; Luxburg, 2007; Jin, 2015). Under DCSBM, substantial spectral clustering algorithms have been developed recently. The regularized spectral clustering (RSC) method (Qin and Rohe, 2013) is established based on the general spectral clustering method with a regularized graph Laplacian matrix and a normalized eigenvector matrix. From RSC we can see that a small change of the graph matrix can lead to dramatically better results.
Jin (2015) proposed a spectral clustering method, called SCORE for short, using the entry-wise ratios between the first leading eigenvector and each of the other leading eigenvectors for clustering. Recently, Jing et al. (2021) employed a symmetrized Laplacian inverse matrix (SLIM) in the spectral clustering method to measure the closeness between nodes. In fact, the procedures of most spectral clustering methods are similar, but the graph Laplacian matrices vary. In this paper, we focus on the construction of the graph Laplacian matrix to improve the performance of spectral clustering methods for community detection.

For a network, the Laplacian matrix plays a key role since it is consistent with the matrix in spectral geometry and random walks (Chung and Graham, 1997). Many people think the quantities based on this matrix may more faithfully reflect the structure and properties of a graph (Chen and Zhang, 2007). Actually, a number of authors have constructed algorithms based on the (regularized/normalized) Laplacian matrix. For example, Qin and Rohe (2013) proposed a spectral clustering method with a regularized Laplacian matrix; Jin et al. (2018) discussed why the regularized Laplacian matrix should be used as a pre-procedure for their PCA-based clustering method; Dong et al. (2016) studied smooth graph signal representation with a Laplacian matrix. Moreover, there are also many attractive methods established based on the dual Laplacian matrix. Yankelevsky and Elad (2016) applied dual Laplacian regularization to dictionary learning. Xiao et al. (2018) and Tang et al. (2019) developed biological methods based on dual Laplacian regularization for microRNA-disease association prediction. It is worth mentioning that there are a variety of forms for Laplacian matrices, and we should be careful to choose a suitable one (Von Luxburg, 2007; Chaudhuri et al., 2012). The motivation of this paper comes from the fact that Amini et al. (2013); Qin and Rohe (2013); Joseph and Yu (2016) show that regularization on the Laplacian matrix can improve the performance of spectral clustering; a question therefore comes naturally: can spectral clustering approaches designed based on dual regularization, or even multiple regularization, of the Laplacian matrix give satisfactory performances? In this paper we construct our detection methods based on a dual regularized Laplacian matrix. However, our proposed dual regularized Laplacian methods differ from the dual Laplacian methods mentioned above. In our dual Laplacian methods, we first build a regularized Laplacian matrix from the adjacency matrix and then reapply the regularized Laplacian construction to the first Laplacian matrix, while other dual Laplacian approaches just use two Laplacian matrices in the same function/model.

In this paper we propose three spectral clustering methods to detect communities. The three methods are designed by combining the dual regularized Laplacian matrix $L_{\tau_2}$ (defined in Section 3) with two traditional spectral clustering methods and one recent spectral clustering method. Next we briefly introduce our three approaches one by one.
For the dual regularized spectral clustering (DRSC) method: as mentioned above, we first construct a dual regularized Laplacian matrix to reduce the noise of the adjacency matrix of a network, and then, unlike other spectral clustering methods, we compute the $(K+1)$ leading eigenvalues and their corresponding unit-norm eigenvectors of the obtained dual regularized Laplacian matrix. After a row-normalization step aimed at reducing the noise caused by degree heterogeneity, the final estimated clusters are determined by K-means. In fact, the dual regularized Laplacian can be taken as the foundation of DRSC, and our DRSC is designed based on the traditional regularized spectral clustering (RSC) method (Qin and Rohe, 2013). For our dual regularized spectral clustering on ratios-of-eigenvectors (DRSCORE) method, which is designed based on the spectral clustering on ratios-of-eigenvectors (SCORE) method (Jin, 2015): after obtaining the product of the leading $(K+1)$ eigenvectors and eigenvalues of $L_{\tau_2}$, there is a step to form the matrix of entry-wise ratios, to which K-means is then applied for clustering. For our dual regularized symmetrized Laplacian inverse matrix (DRSLIM) method, which is based on the recent symmetric Laplacian inverse matrix (SLIM) method (Jing et al., 2021): we obtain the symmetric Laplacian inverse matrix based on $L_{\tau_2}$, then apply K-means to the row-normalization of the product of the leading $(K+2)$ eigenvectors and eigenvalues of the symmetric Laplacian inverse matrix. More importantly, our proposed methods use more than $K$ eigenvalues and eigenvectors, which enables DRSC, DRSCORE and DRSLIM to deal with some weak signal networks, such as Simmons and Caltech, the two weak signal networks discussed in Jin et al. (2018). In the numerical studies, we also construct other multiple regularized spectral clustering methods with a multiply regularized Laplacian. Unfortunately, their performances are not as good as those of DRSC and DRSLIM.

In Section 2, we set up the community detection problem under DCSBM. In Section 3, we propose our three methods DRSC, DRSCORE and DRSLIM. Section 4 presents the theoretical framework of these three approaches, where we show the consistency of DRSC and DRSLIM, and we provide a population analysis for DRSCORE. Section 5 investigates the performances of DRSC, DRSCORE and DRSLIM via comparison with four spectral clustering methods on both numerical networks and eight empirical data sets. Section 5 also studies the effect of different choices of the two regularizers $\tau_1$ and $\tau_2$ on the performance of DRSC. Meanwhile, Section 5 also compares the performances of DRSC, DRSCORE and DRSLIM with the multiple regularization spectral clustering methods MRSC, MRSCORE and MRSLIM, respectively. Section 6 concludes.

2 Problem Setup under DCSBM

The following notations will be used throughout the paper: $\|\cdot\|_F$ for a matrix denotes the Frobenius norm, $\|\cdot\|$ for a matrix denotes the spectral norm, and $\|\cdot\|$ for a vector denotes the $l_2$-norm. For convenience, when we say "leading eigenvalues" or "leading eigenvectors", we are comparing the magnitudes of the eigenvalues, and the respective eigenvectors are taken with unit norm. For any matrix or vector $x$, $x'$ denotes its transpose.

Consider an undirected, no-loops, and un-weighted connected network $\mathcal{N}$ with $n$ nodes, and let $A$ be its adjacency matrix such that $A_{ij}=1$ if there is an edge between nodes $i$ and $j$, and $A_{ij}=0$ otherwise. Since there are no loops in $\mathcal{N}$, all diagonal entries of $A$ are zero.
Let $\mathcal{C}$ denote the set containing all nodes in $\mathcal{N}$, and assume that there exist $K$ disjoint clusters $\mathcal{C}^{(1)},\mathcal{C}^{(2)},\ldots,\mathcal{C}^{(K)}$ and each node belongs to exactly one cluster (i.e., $\mathcal{C}=\cup_{i=1}^{K}\mathcal{C}^{(i)}$, and $\mathcal{C}^{(i)}\cap\mathcal{C}^{(j)}=\emptyset$ for any distinct $i,j$ such that $1\le i,j\le K$). The number of clusters $K$ is assumed to be known. Let $\ell$ be an $n\times 1$ vector such that $\ell(i)$ takes values in $\{1,2,\ldots,K\}$ and $\ell(i)$ is the true community label of node $i$. In this paper, our goal is to design spectral clustering algorithms that use the given pair $(A,K)$ of the network $\mathcal{N}$ to estimate $\ell$.

In this paper, we consider the degree-corrected stochastic blockmodel (DCSBM). Under DCSBM, an $n\times 1$ vector $\theta=(\theta_1,\cdots,\theta_n)'$ is introduced to control the node degrees, where $\theta_i>0$ for each node $i$, $i=1,\cdots,n$. To facilitate theoretical analysis, we assume that all elements of $\theta$ are in $[0,1]$. The edge between node $i$ and node $j$ is assumed to follow a Bernoulli distribution such that $\Pr(A_{ij}=1)=\theta_i\theta_j P_{g_ig_j}$, where $P$ is a $K\times K$ symmetric matrix with full rank (called the mixing matrix) whose entries are in $[0,1]$, and $g_i$ denotes the cluster that node $i$ belongs to (i.e., $g_i=\ell(i)$). Then we define the expectation matrix of the adjacency matrix $A$ as $\Omega \equiv E[A]$, so that $\Omega_{ij}=\Pr(A_{ij}=1)=\theta_i\theta_j P_{g_ig_j}$. From Jin (2015); Karrer and Newman (2011) and Qin and Rohe (2013), $\Omega$ can be expressed as
$$\Omega=\Theta Z P Z'\Theta,$$
where the $n\times n$ diagonal matrix $\Theta$'s $i$-th diagonal element is $\theta_i$, and the $n\times K$ membership matrix $Z$ directly contains the information about the true node labels such that $Z_{ik}=1$ if and only if node $i$ belongs to block $k$ (i.e., $g_i=k$), and $Z_{ik}=0$ otherwise.

Given $(n,P,\Theta,Z)$, we can generate the random adjacency matrix $A$ under DCSBM; therefore, for convenience, we denote the DCSBM model as $\mathrm{DCSBM}(n,P,\Theta,Z)$ in this paper. For community detection under DCSBM, spectral clustering algorithms are always designed by analyzing the properties of $\Omega$ or its variants, since $\Omega$ can be expressed through the true membership matrix $Z$.

3 Methodology: DRSC, DRSCORE and DRSLIM

In this section, under DCSBM we first introduce a dual regularized Laplacian matrix based spectral clustering method in the general case, and then apply our dual regularization ideology to three published spectral clustering methods to improve their performances. These published spectral clustering methods are RSC (Qin and Rohe, 2013), SCORE (Jin, 2015) and SLIM (Jing et al., 2021), and their refinements are called dual regularized spectral clustering (DRSC for short), dual regularized spectral clustering on ratios-of-eigenvectors (DRSCORE for short) and dual regularized symmetrized Laplacian inverse matrix (DRSLIM for short).
The regularized Laplacian matrix is defined as
$$L_{\tau_1}=D_{\tau_1}^{-1/2}\,A\,D_{\tau_1}^{-1/2},$$
where $D_{\tau_1}=D+\tau_1 I$, $D$ is the $n\times n$ diagonal matrix whose $i$-th diagonal entry is $D(i,i)=\sum_j A(i,j)$, $I$ is the $n\times n$ identity matrix, and the regularizer $\tau_1$ is a nonnegative number.

The dual regularized Laplacian matrix is defined as
$$L_{\tau_2}=D_{\tau_2}^{-1/2}\,L_{\tau_1}\,D_{\tau_2}^{-1/2},$$
where $D_{\tau_2}=D_2+\tau_2 I$, $D_2$ is the $n\times n$ diagonal matrix whose $i$-th diagonal entry is $D_2(i,i)=\sum_j L_{\tau_1}(i,j)$, and the regularizer $\tau_2$ is nonnegative. Note that in this paper, without causing confusion, matrices or vectors with subscript 1 or $\tau_1$ are always computed based on the regularized Laplacian matrix or its population version in the next sections, while matrices or vectors with subscript 2 or $\tau_2$ are always computed based on the dual regularized Laplacian matrix or its population version.
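As a minimal illustration of these two definitions, the following NumPy sketch builds $L_{\tau_1}$ and $L_{\tau_2}$ from a dense adjacency matrix; the function names and the dense-matrix representation are our own choices, not part of the paper.

```python
import numpy as np

def regularized_laplacian(S, tau):
    """L_tau = D_tau^{-1/2} S D_tau^{-1/2}, with D(i,i) = sum_j S(i,j) and D_tau = D + tau*I."""
    d = S.sum(axis=1)                    # diagonal of D (row sums of S)
    inv_sqrt = 1.0 / np.sqrt(d + tau)    # diagonal of D_tau^{-1/2}
    return inv_sqrt[:, None] * S * inv_sqrt[None, :]

def dual_regularized_laplacian(A, tau1, tau2):
    """Apply the same construction twice: first to A, then to L_{tau1}."""
    L1 = regularized_laplacian(A.astype(float), tau1)  # L_{tau1}
    return regularized_laplacian(L1, tau2)             # L_{tau2}
```

Since both steps share the same form, the dual construction is just one extra call; this is also what makes the $M$-fold extension discussed in Section 5 immediate.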
3.1 The General Framework

In traditional spectral clustering methods, once the adjacency matrix or one of its variants is available, one computes the leading $K$ eigenvectors, combines them by column into a matrix, possibly normalizes the matrix by row, and finally applies K-means to the eigenvector matrix to detect communities. Different from the traditional methods, we employ the weighted $(K+K_0)$ leading eigenvectors of the dual regularized Laplacian matrix, where $K_0$ is a positive integer. In fact, the idea of using more than $K$ leading eigenvectors was first proposed by Jin et al. (2018), who found that the $(K+1)$-th eigenvector may contain some label information for some weak signal networks. In their method they apply the $K$ or $(K+1)$ leading eigenvectors depending on certain conditions, but we relax their conditions and directly use the $(K+K_0)$ leading eigenvectors in all cases. In this paper, unless specified, let $\{\hat\lambda_i\}_{i=1}^{K+K_0}$ be the leading $(K+K_0)$ eigenvalues of $L_{\tau_2}$, and $\{\hat\eta_i\}_{i=1}^{K+K_0}$ be the respective unit-norm eigenvectors. Then we construct a weighted eigenvector matrix $\hat X$ based on $L_{\tau_2}$ (or $\check X$ based on $\hat M$ defined in Section 3.4), where the weights are the corresponding eigenvalues. This matrix can be presented as
$$\hat X=[\hat\eta_1,\hat\eta_2,\ldots,\hat\eta_{K+K_0}]\cdot\mathrm{diag}(\hat\lambda_1,\hat\lambda_2,\ldots,\hat\lambda_{K+K_0}).$$
$\check X$ takes a similar form, except that it consists of eigenvectors and eigenvalues of $\hat M$. The penultimate step normalizes $\hat X$ and $\check X$ by row or by entry-wise ratio. Finally, K-means is applied to the normalized versions of $\hat X$ and $\check X$ to detect the communities.

There are three key points in our proposed methods: the dual regularized Laplacian matrix; the use of the first $(K+K_0)$ leading eigenvectors; and the use of the eigenvalues as weights for the eigenvectors. These three points contribute a lot to the performances of the spectral clustering methods. Note that the $(K+1)$ or $(K+2)$ leading eigenvectors are enough for detecting strong and weak signal networks: we use the $(K+1)$ leading eigenvectors for the DRSC and DRSCORE algorithms, and present the DRSLIM algorithm with the $(K+2)$ leading eigenvectors.

3.2 DRSC

The DRSC method proceeds as in Algorithm 1. Though there are two ridge regularizers $\tau_1$ and $\tau_2$ in our DRSC, numerical results on eight real-world datasets in Section 5.2 show that DRSC is insensitive to the choice of $\tau_1$ and $\tau_2$. For convenience, unless specified, the default values of $\tau_1$ and $\tau_2$ for DRSC are set as in Algorithm 1 throughout this paper.

Algorithm 1 Dual Regularized Spectral Clustering algorithm (DRSC)
Input:
$A$, $K$ and regularizers $\tau_1$, $\tau_2$. Output: node labels $\hat\ell$.
Step 1: Compute $L_{\tau_1}$ (the default $\tau_1$ for DRSC is the average degree, i.e., $\tau_1=\sum_{i,j}A(i,j)/n$).
Step 2: Compute $L_{\tau_2}$ (the default $\tau_2$ for DRSC is $\tau_2=\sum_{i,j}L_{\tau_1}(i,j)/n$).
Step 3: Compute the $n\times(K+1)$ matrix $\hat X=[\hat\eta_1,\hat\eta_2,\ldots,\hat\eta_K,\hat\eta_{K+1}]\cdot\mathrm{diag}(\hat\lambda_1,\hat\lambda_2,\ldots,\hat\lambda_K,\hat\lambda_{K+1})$.
Step 4: Compute $\hat X^*$ by normalizing each of $\hat X$'s rows to have unit length.
Step 5: Treat each row of $\hat X^*$ as a point in $\mathbb{R}^{K+1}$, and apply K-means to $\hat X^*$ with $K$ clusters to obtain $\hat\ell$.
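A sketch of Algorithm 1 under the stated defaults is given below; it reuses `regularized_laplacian` from the sketch in the previous subsection, and scikit-learn's `KMeans` stands in for the K-means step (any K-means implementation would do).

```python
import numpy as np
from sklearn.cluster import KMeans

def drsc(A, K, tau1=None, tau2=None):
    """Sketch of Algorithm 1 (DRSC) with the paper's default regularizers."""
    A = A.astype(float)
    n = A.shape[0]
    tau1 = A.sum() / n if tau1 is None else tau1          # default: average degree
    L1 = regularized_laplacian(A, tau1)
    tau2 = L1.sum() / n if tau2 is None else tau2
    L2 = regularized_laplacian(L1, tau2)
    vals, vecs = np.linalg.eigh(L2)                       # symmetric eigendecomposition
    lead = np.argsort(-np.abs(vals))[:K + 1]              # leading K+1 eigenpairs by magnitude
    X_hat = vecs[:, lead] * vals[lead]                    # weight eigenvectors by eigenvalues
    X_star = X_hat / np.linalg.norm(X_hat, axis=1, keepdims=True)  # row-normalize
    return KMeans(n_clusters=K, n_init=10).fit_predict(X_star)
```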
3.3 DRSCORE

The DRSCORE method proceeds as in Algorithm 2.

Algorithm 2 Dual Regularized Spectral Clustering On Ratios-of-Eigenvectors algorithm (DRSCORE)
Input:
$A$, $K$ and regularizers $\tau_1$, $\tau_2$. Output: node labels $\hat\ell$.
Step 1: Compute $L_{\tau_1}$ (the default $\tau_1$ for DRSCORE is $\tau_1=\sum_{i,j}A(i,j)$).
Step 2: Compute $L_{\tau_2}$ (the default $\tau_2$ for DRSCORE is $\tau_2=\sum_{i,j}L_{\tau_1}(i,j)/(nK)$).
Step 3: Compute $\hat X=[\hat\eta_1,\hat\eta_2,\ldots,\hat\eta_K,\hat\eta_{K+1}]\cdot\mathrm{diag}(\hat\lambda_1,\hat\lambda_2,\ldots,\hat\lambda_K,\hat\lambda_{K+1})$. Set $\hat X_i$ as the $i$-th column of $\hat X$, $1\le i\le K+1$.
Step 4: Compute the $n\times K$ matrix $\hat R$ of entry-wise eigen-ratios such that $\hat R(i,k)=\hat X_{k+1}(i)/\hat X_1(i)$, $1\le i\le n$, $1\le k\le K$.
Step 5: Treat each row of $\hat R$ as a point in $\mathbb{R}^{K}$, and apply K-means to $\hat R$ with $K$ clusters to obtain $\hat\ell$.

In the original SCORE method, there is a threshold parameter $T_n$ (the default $T_n$ is $\log(n)$) to control the eigen-ratios. In our DRSCORE, we remove this restriction. By Lemma 2.5 in Jin (2015), since we only consider connected networks in this paper, $\hat\lambda_1$ is nonzero and all elements of $\hat\eta_1$ are nonzero, which means that $\hat\lambda_1\hat\eta_1(i)$ can appear in the denominator; therefore the eigen-ratio matrix $\hat R$ is well defined. Meanwhile, there are two regularizers $\tau_1$ and $\tau_2$ in DRSCORE. Though numerical results in Section 5.2 show that our DRSCORE is sensitive to the choice of $\tau_1$ and $\tau_2$, as long as we set $\tau_1=\sum_{i,j}A(i,j)$ and $\tau_2=\sum_{i,j}L_{\tau_1}(i,j)/(nK)$, our DRSCORE has excellent performances and always outperforms SCORE. For convenience, unless specified, the default values of $\tau_1$ and $\tau_2$ for DRSCORE are set as in Algorithm 2 throughout this paper.
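The only changes relative to the DRSC sketch are the default regularizers and the ratio step; a hedged sketch, again reusing `regularized_laplacian`:

```python
import numpy as np
from sklearn.cluster import KMeans

def drscore(A, K, tau1=None, tau2=None):
    """Sketch of Algorithm 2 (DRSCORE) with the paper's default regularizers."""
    A = A.astype(float)
    n = A.shape[0]
    tau1 = A.sum() if tau1 is None else tau1              # default: total degree, not divided by n
    L1 = regularized_laplacian(A, tau1)
    tau2 = L1.sum() / (n * K) if tau2 is None else tau2
    L2 = regularized_laplacian(L1, tau2)
    vals, vecs = np.linalg.eigh(L2)
    lead = np.argsort(-np.abs(vals))[:K + 1]
    X_hat = vecs[:, lead] * vals[lead]
    R_hat = X_hat[:, 1:] / X_hat[:, [0]]                  # entry-wise ratios against the first column
    return KMeans(n_clusters=K, n_init=10).fit_predict(R_hat)
```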
3.4 DRSLIM

The DRSLIM method proceeds as in Algorithm 3.

Algorithm 3 Dual Regularized Symmetrized Laplacian Inverse Matrix algorithm (DRSLIM)
Input:
$A$, $K$, regularizers $\tau_1$, $\tau_2$, and tuning parameter $\gamma$. Output: node labels $\hat\ell$.
Step 1: Compute $L_{\tau_1}$ (the default $\tau_1$ for DRSLIM is $\tau_1=\sum_{i,j}A(i,j)/n$).
Step 2: Compute $L_{\tau_2}$ (the default $\tau_2$ for DRSLIM is $\tau_2=\sum_{i,j}L_{\tau_1}(i,j)/n$).
Step 3: Compute the inverse dual regularized Laplacian matrix $\hat W=(I-e^{-\gamma}D_{\tau_2}^{-1}L_{\tau_2})^{-1}$ (the default $\gamma$ is 0.25).
Step 4: Calculate $\hat M=(\hat W+\hat W')/2$ and set $\hat M$'s diagonal entries to 0.
Step 5: Compute the $n\times(K+2)$ matrix $\check X$ such that $\check X=[\check\eta_1,\check\eta_2,\ldots,\check\eta_{K+1},\check\eta_{K+2}]\cdot\mathrm{diag}(\check\lambda_1,\check\lambda_2,\ldots,\check\lambda_{K+1},\check\lambda_{K+2})$, where $\{\check\lambda_i\}_{i=1}^{K+2}$ are the leading eigenvalues of $\hat M$ and $\{\check\eta_i\}_{i=1}^{K+2}$ are the respective unit-norm eigenvectors.
Step 6: Compute $\check X^*$ by normalizing each of $\check X$'s rows to have unit length.
Step 7: Treat each row of $\check X^*$ as a point in $\mathbb{R}^{K+2}$, and apply K-means to $\check X^*$ with $K$ clusters to obtain $\hat\ell$.

There is a row-normalization step in DRSLIM to obtain $\check X^*$, and K-means is then applied to $\check X^*$ instead of simply applying K-means to $\check X$ as in SLIM. Though there are two regularizers $\tau_1$ and $\tau_2$ in DRSLIM, it is shown in Section 5.2 that DRSLIM is insensitive to the choice of $\tau_1$ and $\tau_2$ as long as they are larger than 1. For convenience, unless specified, the default values of $\tau_1$ and $\tau_2$ for DRSLIM are set as in Algorithm 3 throughout this paper. As argued in Jing et al. (2021), we always set the default value of $\gamma$ to 0.25, since it has been found to be a good choice in both simulated and real-world networks under both SBM and DCSBM. This choice of $\gamma$ is applied in all numerical studies in this paper.
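A sketch of Algorithm 3, once more reusing `regularized_laplacian`; the dense matrix inverse is for clarity only and would be replaced by a linear solver at scale.

```python
import numpy as np
from sklearn.cluster import KMeans

def drslim(A, K, tau1=None, tau2=None, gamma=0.25):
    """Sketch of Algorithm 3 (DRSLIM) with the paper's default gamma = 0.25."""
    A = A.astype(float)
    n = A.shape[0]
    tau1 = A.sum() / n if tau1 is None else tau1
    L1 = regularized_laplacian(A, tau1)
    tau2 = L1.sum() / n if tau2 is None else tau2
    L2 = regularized_laplacian(L1, tau2)
    d_tau2 = L1.sum(axis=1) + tau2                        # diagonal of D_{tau2}
    W = np.linalg.inv(np.eye(n) - np.exp(-gamma) * (L2 / d_tau2[:, None]))  # (I - e^{-gamma} D_{tau2}^{-1} L_{tau2})^{-1}
    M = (W + W.T) / 2.0
    np.fill_diagonal(M, 0.0)                              # zero out the diagonal
    vals, vecs = np.linalg.eigh(M)
    lead = np.argsort(-np.abs(vals))[:K + 2]              # leading K+2 eigenpairs of M-hat
    X_check = vecs[:, lead] * vals[lead]
    X_check /= np.linalg.norm(X_check, axis=1, keepdims=True)
    return KMeans(n_clusters=K, n_init=10).fit_predict(X_check)
```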
4 Theoretical Results

This section builds theoretical frameworks for DRSC, DRSCORE and DRSLIM to show that they yield stable, consistent community detection under mild conditions. Before presenting the details of the theoretical analysis, we first define the dual regularized population Laplacian matrix $\mathscr{L}_{\tau_2}$ and present some useful properties. Define the diagonal matrix $\mathscr{D}$ of expected node degrees such that $\mathscr{D}(i,i)=\sum_{j=1}^{n}\Omega(i,j)$ for $1\le i\le n$, and define $\mathscr{D}_{\tau_1}=\mathscr{D}+\tau_1 I$. Then the regularized population Laplacian $\mathscr{L}_{\tau_1}\in\mathbb{R}^{n\times n}$ can be presented as
$$\mathscr{L}_{\tau_1}=\mathscr{D}_{\tau_1}^{-1/2}\,\Omega\,\mathscr{D}_{\tau_1}^{-1/2}.$$
Next define the diagonal matrix $\mathscr{D}_2$ whose $i$-th diagonal element is $\mathscr{D}_2(i,i)=\sum_{j=1}^{n}\mathscr{L}_{\tau_1}(i,j)$, and define $\mathscr{D}_{\tau_2}=\mathscr{D}_2+\tau_2 I$. Then the dual regularized population Laplacian $\mathscr{L}_{\tau_2}$ can be written as
$$\mathscr{L}_{\tau_2}=\mathscr{D}_{\tau_2}^{-1/2}\,\mathscr{L}_{\tau_1}\,\mathscr{D}_{\tau_2}^{-1/2}.$$
For convenience, we denote $\delta_{\min}=\min_i\mathscr{D}(i,i)$, $\delta_{\max}=\max_i\mathscr{D}(i,i)$, $\Delta_{\min}=\min_i\mathscr{D}_2(i,i)$, and $\Delta_{\max}=\max_i\mathscr{D}_2(i,i)$. We also set
$$\varpi_a=\max\Big(\sqrt{\frac{\tau_2+\Delta_{\max}}{\tau_2+\Delta_{\min}}\cdot\frac{\tau_1+\delta_{\min}}{\tau_1+\delta_{\max}}}-1,\ 1-\sqrt{\frac{\tau_2+\Delta_{\min}}{\tau_2+\Delta_{\max}}\cdot\frac{\tau_1+\delta_{\max}}{\tau_1+\delta_{\min}}}\Big),$$
$$\varpi_b=\max\Big(\frac{\tau_2+\Delta_{\max}}{\tau_2+\Delta_{\min}}\cdot\frac{\tau_1+\delta_{\min}}{\tau_1+\delta_{\max}}-1,\ 1-\frac{\tau_2+\Delta_{\min}}{\tau_2+\Delta_{\max}}\cdot\frac{\tau_1+\delta_{\max}}{\tau_1+\delta_{\min}}\Big).$$
Define a $K\times n$ matrix $Q_{\tau_1}$ as $Q_{\tau_1}=PZ'\Theta$ (note that though $Q_{\tau_1}$ is not defined based on $\tau_1$, we use this subscript to mark that it is not designed based on the dual procedure, to distinguish it from $Q_{\tau_2}$; the same nomenclature holds for $D^P_{\tau_1}$ and $\tilde P_{\tau_1}$ defined below), and define the $K\times K$ diagonal matrix $D^P_{\tau_1}$ as $D^P_{\tau_1}(i,i)=\sum_{j=1}^{n}Q_{\tau_1}(i,j)$, $i=1,2,\ldots,K$. The next lemma gives an explicit form for $\mathscr{L}_{\tau_1}$ as a product of the parameter matrices.

Lemma 4.1. (Explicit form for $\mathscr{L}_{\tau_1}$) Under $\mathrm{DCSBM}(n,P,\Theta,Z)$, define $\theta_{\tau_1}(i)=\theta_i\frac{\mathscr{D}(i,i)}{\mathscr{D}(i,i)+\tau_1}$, and let $\Theta_{\tau_1}$ be the diagonal matrix whose $ii$-th entry is $\theta_{\tau_1}(i)$. Define $\tilde P_{\tau_1}=(D^P_{\tau_1})^{-1/2}P(D^P_{\tau_1})^{-1/2}$; then $\mathscr{L}_{\tau_1}$ can be written as
$$\mathscr{L}_{\tau_1}=\mathscr{D}_{\tau_1}^{-1/2}\,\Omega\,\mathscr{D}_{\tau_1}^{-1/2}=\Theta_{\tau_1}^{1/2}\,Z\,\tilde P_{\tau_1}\,Z'\,\Theta_{\tau_1}^{1/2}.$$
In fact, we can rewrite $\mathscr{L}_{\tau_1}$ as follows:
$$\mathscr{L}_{\tau_1}=\|\tilde\theta_{\tau_1}\|^2\,\tilde\Gamma_{\tau_1}\tilde D_{\tau_1}\tilde P_{\tau_1}\tilde D_{\tau_1}\tilde\Gamma_{\tau_1}',$$
where $\tilde\theta_{\tau_1}$ is the $n\times 1$ vector with $\tilde\theta_{\tau_1}(i)=\sqrt{\theta_{\tau_1}(i)}$ for $1\le i\le n$; $\tilde D_{\tau_1}$ is the $K\times K$ diagonal matrix of overall degree intensities with $\tilde D_{\tau_1}(k,k)=\|\tilde\theta^{(k)}_{\tau_1}\|\,(\|\tilde\theta_{\tau_1}\|)^{-1}$, $1\le k\le K$; $\tilde\theta^{(k)}_{\tau_1}$ is the $n\times 1$ vector such that for $1\le i\le n$, $1\le k\le K$, $\tilde\theta^{(k)}_{\tau_1}(i)=\tilde\theta_{\tau_1}(i)\,\mathbf{1}\{g_i=k\}$; and the $n\times K$ matrix $\tilde\Gamma_{\tau_1}$ is
$$\tilde\Gamma_{\tau_1}=\Big[\frac{\tilde\theta^{(1)}_{\tau_1}}{\|\tilde\theta^{(1)}_{\tau_1}\|},\ \frac{\tilde\theta^{(2)}_{\tau_1}}{\|\tilde\theta^{(2)}_{\tau_1}\|},\ \ldots,\ \frac{\tilde\theta^{(K)}_{\tau_1}}{\|\tilde\theta^{(K)}_{\tau_1}\|}\Big].$$
To derive the explicit form of $\mathscr{L}_{\tau_2}$, we define a $K\times n$ matrix $Q_{\tau_2}$ as $Q_{\tau_2}=\tilde P_{\tau_1}Z'\Theta_{\tau_1}^{1/2}$, and define a $K\times K$ diagonal matrix $D^P_{\tau_2}$ as $D^P_{\tau_2}(i,i)=\sum_{j=1}^{n}Q_{\tau_2}(i,j)$, $i=1,2,\ldots,K$. The next lemma gives an explicit form for $\mathscr{L}_{\tau_2}$ as a product of the parameter matrices.

Lemma 4.2. (Explicit form for $\mathscr{L}_{\tau_2}$) Under $\mathrm{DCSBM}(n,P,\Theta,Z)$, define $\theta_{\tau_2}(i)=(\theta_{\tau_1}(i))^{1/2}\frac{\mathscr{D}_2(i,i)}{\mathscr{D}_2(i,i)+\tau_2}$, and let $\Theta_{\tau_2}$ be the diagonal matrix whose $ii$-th entry is $\theta_{\tau_2}(i)$. Define $\tilde P_{\tau_2}=(D^P_{\tau_2})^{-1/2}\tilde P_{\tau_1}(D^P_{\tau_2})^{-1/2}$; then $\mathscr{L}_{\tau_2}$ can be written as
$$\mathscr{L}_{\tau_2}=\mathscr{D}_{\tau_2}^{-1/2}\,\mathscr{L}_{\tau_1}\,\mathscr{D}_{\tau_2}^{-1/2}=\Theta_{\tau_2}^{1/2}\,Z\,\tilde P_{\tau_2}\,Z'\,\Theta_{\tau_2}^{1/2}.$$
Similarly, we can represent $\mathscr{L}_{\tau_2}$ in the following form:
$$\mathscr{L}_{\tau_2}=\|\tilde\theta_{\tau_2}\|^2\,\tilde\Gamma_{\tau_2}\tilde D_{\tau_2}\tilde P_{\tau_2}\tilde D_{\tau_2}\tilde\Gamma_{\tau_2}',$$
where $\tilde\theta_{\tau_2}=(\tilde\theta_{\tau_2}(1),\cdots,\tilde\theta_{\tau_2}(n))'$ with $\tilde\theta_{\tau_2}(i)=\sqrt{\theta_{\tau_2}(i)}$, $1\le i\le n$; $\tilde D_{\tau_2}$ is the $K\times K$ diagonal matrix with $\tilde D_{\tau_2}(k,k)=\|\tilde\theta^{(k)}_{\tau_2}\|\,(\|\tilde\theta_{\tau_2}\|)^{-1}$, $1\le k\le K$; $\tilde\theta^{(k)}_{\tau_2}$ is the $n\times 1$ vector such that for $1\le i\le n$, $1\le k\le K$, $\tilde\theta^{(k)}_{\tau_2}(i)=\tilde\theta_{\tau_2}(i)\,\mathbf{1}\{g_i=k\}$; and
$$\tilde\Gamma_{\tau_2}=\Big[\frac{\tilde\theta^{(1)}_{\tau_2}}{\|\tilde\theta^{(1)}_{\tau_2}\|},\ \frac{\tilde\theta^{(2)}_{\tau_2}}{\|\tilde\theta^{(2)}_{\tau_2}\|},\ \ldots,\ \frac{\tilde\theta^{(K)}_{\tau_2}}{\|\tilde\theta^{(K)}_{\tau_2}\|}\Big].$$
By basic linear algebra, the rank of $\mathscr{L}_{\tau_2}$ is $K$ when there are $K$ clusters; therefore $\mathscr{L}_{\tau_2}$ has $K$ nonzero eigenvalues. We give the expressions of the leading $K$ eigenvectors of $\mathscr{L}_{\tau_2}$ in Lemma 4.3.
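As a quick numerical sanity check of Lemma 4.1 (the same pattern verifies Lemma 4.2 one level up), the sketch below compares the population Laplacian computed directly from $\Omega$ with the explicit form; all parameter values here are made up for the check.

```python
import numpy as np

rng = np.random.default_rng(0)
K, labels = 2, np.array([0, 0, 0, 1, 1, 1])
n = len(labels)
Z = np.eye(K)[labels]                                  # membership matrix
theta = rng.uniform(0.3, 0.9, n)
P = np.array([[0.9, 0.2], [0.2, 0.8]])
Omega = np.outer(theta, theta) * P[np.ix_(labels, labels)]
tau1 = 0.5

# Direct definition: script-L_{tau1} = D_{tau1}^{-1/2} Omega D_{tau1}^{-1/2}.
d = Omega.sum(axis=1)
s = 1.0 / np.sqrt(d + tau1)
L1_direct = s[:, None] * Omega * s[None, :]

# Explicit form of Lemma 4.1.
theta_tau1 = theta * d / (d + tau1)
Q1 = P @ Z.T @ np.diag(theta)                          # Q_{tau1} = P Z' Theta
DP1 = Q1.sum(axis=1)                                   # diagonal of D^P_{tau1}
P_tilde = P / np.sqrt(np.outer(DP1, DP1))              # (D^P)^{-1/2} P (D^P)^{-1/2}
L1_explicit = np.sqrt(np.outer(theta_tau1, theta_tau1)) * P_tilde[np.ix_(labels, labels)]

assert np.allclose(L1_direct, L1_explicit)             # the two constructions agree
```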
Lemma 4.3. Under $\mathrm{DCSBM}(n,P,\Theta,Z)$, suppose all eigenvalues of $\tilde D_{\tau_2}\tilde P_{\tau_2}\tilde D_{\tau_2}$ are simple. Let $\lambda_1/\|\tilde\theta_{\tau_2}\|^2,\ \lambda_2/\|\tilde\theta_{\tau_2}\|^2,\ \ldots,\ \lambda_K/\|\tilde\theta_{\tau_2}\|^2$ be these eigenvalues, arranged in descending order of magnitude, and let $a_1,a_2,\ldots,a_K$ be the associated unit-norm eigenvectors. Then the $K$ nonzero eigenvalues of $\mathscr{L}_{\tau_2}$ are $\lambda_1,\lambda_2,\ldots,\lambda_K$, with the associated unit-norm eigenvectors
$$\eta_k=\sum_{i=1}^{K}\big[a_k(i)/\|\tilde\theta^{(i)}_{\tau_2}\|\big]\cdot\tilde\theta^{(i)}_{\tau_2},\qquad k=1,2,\ldots,K.$$
Let $\hat\lambda_1,\hat\lambda_2,\ldots,\hat\lambda_K$ be the $K$ leading eigenvalues of $L_{\tau_2}$. A theoretical bound for the spectral norm $\|L_{\tau_2}-\mathscr{L}_{\tau_2}\|$ is given in Lemma 4.4, which is helpful in the theoretical analysis of DRSC and DRSLIM. For convenience, we set the quantity $\mathrm{err}_n$ as
$$\mathrm{err}_n=4\sqrt{\frac{3\ln(4n/\epsilon)}{\delta_{\min}+\tau_1}}\cdot\frac{\tau_1+\delta_{\min}}{\delta_{\max}+\tau_2}+\Big(\sqrt{\frac{\tau_2+\Delta_{\max}}{\tau_2+\Delta_{\min}}}+\sqrt{\frac{\tau_1+\delta_{\max}}{\tau_1+\delta_{\min}}}\Big)\frac{\Delta_{\max}}{\tau_2+\Delta_{\max}}\sqrt{\frac{\tau_2+\Delta_{\min}}{\Delta_{\max}+\tau_2}}+\varpi_a.$$

Lemma 4.4. (Concentration of the dual regularized graph Laplacian) Under
$\mathrm{DCSBM}(n,P,\Theta,Z)$, if $\tau_1+\delta_{\min}>3\ln(4n/\epsilon)$, then with probability at least $1-\epsilon$ we have
$$\|L_{\tau_2}-\mathscr{L}_{\tau_2}\|\le\mathrm{err}_n.$$
Figure 1 demonstrates the effect of $\tau_1$ and $\tau_2$ on $\|L_{\tau_2}-\mathscr{L}_{\tau_2}\|$, where the color indicates $\|L_{\tau_2}-\mathscr{L}_{\tau_2}\|$ for given $\tau_1$ and $\tau_2$. We also consider the case $\tau_1=\tau_2$ in Figure 2. From Figures 1 and 2, we find that $\|L_{\tau_2}-\mathscr{L}_{\tau_2}\|$ decreases to zero when $\tau_1$ and/or $\tau_2$ increase. This is consistent with Lemma 4.4: when increasing the two regularizers or one of them, $\mathrm{err}_n$ decreases to zero, and hence $\|L_{\tau_2}-\mathscr{L}_{\tau_2}\|$ goes to zero. Meanwhile, $\|L_{\tau_2}-\mathscr{L}_{\tau_2}\|$ is more sensitive to $\tau_2$ than to $\tau_1$: a small $\tau_2$ yields a large $\|L_{\tau_2}-\mathscr{L}_{\tau_2}\|$, while $\|L_{\tau_2}-\mathscr{L}_{\tau_2}\|$ is quite small for any $\tau_1$ as long as $\tau_2$ is larger than 0. Therefore, to make $\|L_{\tau_2}-\mathscr{L}_{\tau_2}\|$ small enough, $\tau_2$ should be larger than 0. Numerical results in Section 5 also support this statement.

We can also bound the differences between the estimated eigenvalues and the population eigenvalues by $\mathrm{err}_n$. Since $L_{\tau_2}$ and $\mathscr{L}_{\tau_2}$ are two symmetric matrices, Weyl's inequality (Weyl, 1912) applies. The following lemma is a direct consequence of Lemma 4.4 and Weyl's inequality; its proof is omitted.

Lemma 4.5.
Under
$\mathrm{DCSBM}(n,P,\Theta,Z)$, if $\tau_1+\delta_{\min}>3\ln(4n/\epsilon)$, then with probability at least $1-\epsilon$,
$$\max_{1\le k\le K}\{|\hat\lambda_k-\lambda_k|\}\le\mathrm{err}_n.$$

Figure 1: The effect of $\tau_1$ and $\tau_2$ on $\|L_{\tau_2}-\mathscr{L}_{\tau_2}\|$ (x-axis: $\tau_1$; y-axis: $\tau_2$). For the figure, we take $n=400$, $K=2$, let nodes belong to one of the clusters with equal probability, set $\theta_i$ to one constant for $g_i=1$ and another for $g_i=2$, and take the mixing matrix $P$ with 0.1 as diagonal entries and 0.05 as off-diagonal entries. Given these parameters, we generate one $A$ and $\Omega$ of equal size; then, for each $\tau_1$ and $\tau_2$ on a grid, we compute $L_{\tau_2}$ and $\mathscr{L}_{\tau_2}$ and obtain the heatmap of $\|L_{\tau_2}-\mathscr{L}_{\tau_2}\|$ against $\tau_1$ and $\tau_2$.

Figure 2: The effect of $\tau$ on $\|L_{\tau_2}-\mathscr{L}_{\tau_2}\|$ with $\tau_1=\tau_2=\tau$, where $\tau$ varies over a grid and the other settings are the same as in Figure 1 (x-axis: $\tau$; y-axis: $\|L_{\tau_2}-\mathscr{L}_{\tau_2}\|$).
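The quantity plotted in Figures 1 and 2 can be computed in a few lines, reusing the earlier sketches (the grid below is a stand-in for the one used in the figures):

```python
import numpy as np

def laplacian_gap(A, Omega, tau1, tau2):
    """Spectral-norm gap between sample and population dual regularized Laplacians."""
    L2_hat = regularized_laplacian(regularized_laplacian(A.astype(float), tau1), tau2)
    L2_pop = regularized_laplacian(regularized_laplacian(Omega, tau1), tau2)
    return np.linalg.norm(L2_hat - L2_pop, ord=2)

# Heatmap over a grid, as in Figure 1:
# gaps = [[laplacian_gap(A, Omega, t1, t2) for t1 in grid] for t2 in grid]
```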
The following lemma gives a bound for the difference of the matrices of eigenvectors, and also constitutes the key component of the proof of Theorem 4.8 for DRSC.

Lemma 4.6. Let $\hat V_K$ be the $n\times K$ matrix whose $i$-th column is $\hat\eta_i$, and let $V_K$ be the $n\times K$ matrix whose $i$-th column is $\eta_i$, $1\le i\le K$. Then, under $\mathrm{DCSBM}(n,P,\Theta,Z)$, if we assume that (a) $\tau_1+\delta_{\min}>3\ln(4n/\epsilon)$ and (b) $\mathrm{err}_n\le\lambda_K$, then with probability at least $1-\epsilon$ we have
$$\|\hat V_K-V_K\|_F\le\frac{\sqrt{8K}\,\mathrm{err}_n}{\lambda_K}.$$
Note that assumption (b) also means that we assume all nonzero eigenvalues of $\mathscr{L}_{\tau_2}$ are positive.

Recall that we have obtained the expressions of the leading $(K+1)$ eigenvalues and the leading $K$ eigenvectors of $\mathscr{L}_{\tau_2}$; we can now write down the Ideal DRSC algorithm, obtained by using $\Omega$ in place of $A$ in the DRSC algorithm.

Ideal DRSC. Input: $\Omega$. Output: $\ell$.
Step 1: Obtain $\mathscr{L}_{\tau_1}$.
Step 2:
Obtain $\mathscr{L}_{\tau_2}$.
Step 3:
Obtain $X=[\eta_1,\eta_2,\ldots,\eta_K,\eta_{K+1}]\cdot\mathrm{diag}(\lambda_1,\lambda_2,\ldots,\lambda_K,\lambda_{K+1})=[\lambda_1\eta_1,\lambda_2\eta_2,\ldots,\lambda_K\eta_K,\mathbf{0}]$, where the last column is zero since $\mathscr{L}_{\tau_2}$ only has $K$ nonzero eigenvalues.
Step 4:
Obtain $X^*$, the row-normalized version of $X$.
Step 5:
Apply K-means to $X^*$ assuming there are $K$ clusters.

The population analysis of DRSC aims at confirming that the Ideal DRSC algorithm returns perfect clustering. Lemma 4.7 guarantees that the Ideal DRSC algorithm surely recovers the true node labels.

Lemma 4.7.
Under
$\mathrm{DCSBM}(n,P,\Theta,Z)$, $X$ has $K$ distinct rows, and for any two distinct nodes $i,j$ with $g_i=g_j$, the $j$-th row of $X$ equals its $i$-th row. Applying K-means to $X$ therefore recovers the true community label of each node. The above population analysis of the DRSC method gives a direct understanding of why the DRSC algorithm works and guarantees that DRSC returns perfect clustering under the ideal case.

The Hamming error rate (Jin, 2015) is a proper criterion to measure the performance of $\hat\ell$. This part bounds the Hamming error rate of DRSC under DCSBM to show that it stably yields consistent community detection under certain conditions. The Hamming error rate of $\hat\ell$ is defined as
$$\mathrm{Hamm}_n(\hat\ell,\ell)=\min_{\pi\in S_K}H_p(\hat\ell,\pi(\ell))/n,$$
where $S_K=\{\pi:\pi\text{ is a permutation of the set }\{1,2,\ldots,K\}\}$, $\pi(\ell)(i)=\pi(\ell(i))$ for $1\le i\le n$, and $H_p(\hat\ell,\ell)$ is the expected number of mismatched labels, defined as
$$H_p(\hat\ell,\ell)=\sum_{i=1}^{n}P\big(\hat\ell(i)\neq\ell(i)\big).$$
Therefore, a direct interpretation of the Hamming error rate is the ratio between the expected number of nodes whose estimated label does not match the true label and the number of nodes in the given network (Jin, 2015). Because the clustering errors should not depend on how we tag each of the $K$ communities, it is necessary to take permutations of the labels into account when measuring the performance of DRSC. The clustering performance of K-means on $\hat X^*$ is governed by $\|\hat X^*-X^*\|_F$; therefore, we first bound $\|\hat X-X\|_F$ and $\|\hat X^*-X^*\|_F$ in Theorem 4.8.

Theorem 4.8.
Under
$\mathrm{DCSBM}(n,P,\Theta,Z)$, define $m_a=\min_i\{\min\{\|\hat X_{i\cdot}\|,\|X_{i\cdot}\|\}\}$ as the length of the shortest row of $\hat X$ and $X$. Then, for any $\epsilon>0$ and sufficiently large $n$, if assumptions (a) and (b) in Lemma 4.6 hold, then with probability at least $1-\epsilon$ the following hold:
$$\|\hat X-X\|_F\le\sqrt{K}\,\mathrm{err}_n+\hat\lambda_{K+1}+\frac{\sqrt{8K}\,\mathrm{err}_n}{\lambda_K},$$
$$\|\hat X^*-X^*\|_F\le\frac{2}{m_a}\Big(\sqrt{K}\,\mathrm{err}_n+\hat\lambda_{K+1}+\frac{\sqrt{8K}\,\mathrm{err}_n}{\lambda_K}\Big).$$
The following theorem is the main theoretical result of this paper; it bounds the Hamming error rate of our DRSC method under mild conditions and shows that DRSC stably yields consistent community detection under several conditions when the adjacency matrix $A$ is generated from the DCSBM model.

Theorem 4.9.
Under the DCSBM with parameters $\{n,P,\Theta,Z\}$, suppose the same assumptions as in Theorem 4.8 hold and, as $n\to\infty$,
$$\frac{2}{m_a}\Big(\sqrt{K}\,\mathrm{err}_n+\hat\lambda_{K+1}+\frac{\sqrt{8K}\,\mathrm{err}_n}{\lambda_K}\Big)\Big/\min\{n_1,n_2,\ldots,n_K\}\to 0,$$
where $n_k$ is the size of the $k$-th community for $1\le k\le K$. Then for the label vector $\hat\ell$ estimated by DRSC, with probability at least $1-\epsilon$ we have
$$\mathrm{Hamm}_n(\hat\ell,\ell)\le\frac{4}{n\,m_a}\Big(\sqrt{K}\,\mathrm{err}_n+\hat\lambda_{K+1}+\frac{\sqrt{8K}\,\mathrm{err}_n}{\lambda_K}\Big).$$
Note that by assumption (b), we can relax the bound on $\mathrm{Hamm}_n(\hat\ell,\ell)$ to
$$\mathrm{Hamm}_n(\hat\ell,\ell)\le\frac{4\sqrt{K}\,\lambda_K+4\hat\lambda_{K+1}+8\sqrt{2K}}{n\,m_a}.$$
We can see that as $n$ keeps increasing while the other parameters stay fixed, the Hamming error rate of DRSC decreases to zero. A larger $K$ implies a larger error bound, which means that it becomes harder for DRSC to detect communities when $K$ increases. We can also see that our DRSC procedure can detect both strong signal networks and weak signal networks, since the theoretical bound of the Hamming error rate for DRSC depends on both $\hat\lambda_{K+1}$ and $\lambda_K$. When a network is strong signal, $\hat\lambda_{K+1}$ is much smaller than $\hat\lambda_K$, and hence much smaller than $\lambda_K$ by Lemma 4.5; therefore the bound on the Hamming error rate for strong signal networks mainly depends on the $K$-th leading eigenvalue $\hat\lambda_K$. When dealing with weak signal networks, $\hat\lambda_{K+1}$ is quite close to $\hat\lambda_K$, and hence also close to $\lambda_K$ by Lemma 4.5, which means that the bound for weak signal networks mainly depends on the $(K+1)$-th leading eigenvalue $\hat\lambda_{K+1}$.

Similarly to the theoretical analysis of DRSC, for the population analysis of DRSCORE we first present its ideal case, the Ideal DRSCORE algorithm:
Ideal DRSCORE. Input: $\Omega$. Output: $\ell$.
Step 1:
Obtain $\mathscr{L}_{\tau_1}$.
Step 2:
Obtain $\mathscr{L}_{\tau_2}$.
Step 3:
Obtain $X=[\eta_1,\eta_2,\ldots,\eta_K,\eta_{K+1}]\cdot\mathrm{diag}(\lambda_1,\lambda_2,\ldots,\lambda_K,\lambda_{K+1})=[\lambda_1\eta_1,\lambda_2\eta_2,\ldots,\lambda_K\eta_K,\mathbf{0}]$, where the last column is zero since $\mathscr{L}_{\tau_2}$ only has $K$ nonzero eigenvalues. Set $X_i$ as the $i$-th column of $X$, $1\le i\le K+1$.
Step 4:
Obtain the $n\times K$ matrix $R$ of entry-wise eigen-ratios such that $R(i,k)=X_{k+1}(i)/X_1(i)$, $1\le i\le n$, $1\le k\le K$.
Step 5:
Apply K-means to $R$ assuming there are $K$ clusters.

Note that, by Lemma 2.5 in Jin (2015), since $\mathscr{L}_{\tau_2}$ is a connected matrix (i.e., it has no disconnected parts), $\lambda_1$ is nonzero and all elements of $\eta_1$ are nonzero; hence $\lambda_1\eta_1(i)$ can appear in the denominator, and $R$ is well defined. Applying K-means to $R$ recovers the true community label of each node. The following population analysis of the DRSCORE method gives a direct understanding of why the DRSCORE algorithm works and guarantees that DRSCORE returns perfect clustering under the ideal case.

Lemma 4.10.
Under
$\mathrm{DCSBM}(n,P,\Theta,Z)$, $R$ has $K$ distinct rows, and for any two distinct nodes $i,j$ with $g_i=g_j$, the $j$-th row of $R$ equals its $i$-th row.

Similarly to the theoretical analysis of DRSC and DRSCORE, for the population analysis of DRSLIM based on $\hat M$ we first present its ideal case:

Ideal DRSLIM. Input: $\Omega$. Output: $\ell$.
Step 1:
Obtain $\mathscr{L}_{\tau_1}$.
Step 2:
Obtain $\mathscr{L}_{\tau_2}$.
Step 3:
Obtain $W=(I-e^{-\gamma}\mathscr{D}_{\tau_2}^{-1}\mathscr{L}_{\tau_2})^{-1}$. Calculate $M=(W+W')/2$ and set $M$'s diagonal entries to 0.
Step 4:
Obtain the $n\times(K+2)$ matrix $\tilde X$ such that $\tilde X=[\tilde\eta_1,\tilde\eta_2,\ldots,\tilde\eta_{K+1},\tilde\eta_{K+2}]\cdot\mathrm{diag}(\tilde\lambda_1,\tilde\lambda_2,\ldots,\tilde\lambda_{K+1},\tilde\lambda_{K+2})$, where $\{\tilde\lambda_i\}_{i=1}^{K+2}$ are the leading eigenvalues of $M$ and $\{\tilde\eta_i\}_{i=1}^{K+2}$ are the respective unit-norm eigenvectors.
Step 5:
Obtain $\tilde X^*$ by normalizing each of $\tilde X$'s rows to have unit length.
Step 6:
Apply K-means to $\tilde X^*$ assuming there are $K$ clusters.

Since $W$ is nonsingular, $M$ is also nonsingular, which indicates that the $n\times n$ matrix $M$ has $n$ nonzero eigenvalues. Therefore $M$ does not admit explicit expressions analogous to those of $\mathscr{L}_{\tau_1}$ and $\mathscr{L}_{\tau_2}$, and it is challenging to find explicit expressions for the eigenvectors of $M$.

For convenience, in the analysis of DRSLIM we set $\varsigma=e^{-\gamma}$ and define
$$\mathrm{Err}_n=\varsigma\Bigg(\frac{1}{1-\varsigma\frac{\tau_2\Delta_{\max}+\Delta_{\max}^2}{\tau_2\Delta_{\max}+\tau_1\tau_2+\Delta_{\min}}}\Bigg)\Bigg(\frac{1}{1-\varsigma\frac{\tau_1\delta_{\max}+\delta_{\max}^2}{\tau_1\delta_{\max}+\tau_1\tau_2+\delta_{\min}}}\Bigg)\times\Bigg(\mathrm{err}_n\sqrt{\frac{\tau_2+\Delta_{\min}}{\tau_2+\Delta_{\max}}}+\frac{(\tau_2+\Delta_{\max})\,\delta_{\max}\,\varpi_b}{(\tau_1\tau_2+\tau_2\Delta_{\max}+\Delta_{\min})(\tau_1\tau_2+\tau_1\delta_{\max}+\delta_{\min})}\Bigg).$$
In the theoretical analysis of DRSLIM, without causing confusion, we set the four matrices
$$\check V=[\check\eta_1,\check\eta_2,\ldots,\check\eta_K,\check\eta_{K+1},\check\eta_{K+2}],\qquad \tilde V=[\tilde\eta_1,\tilde\eta_2,\ldots,\tilde\eta_K,\tilde\eta_{K+1},\tilde\eta_{K+2}],$$
$$\check E=\mathrm{diag}(\check\lambda_1,\check\lambda_2,\ldots,\check\lambda_K,\check\lambda_{K+1},\check\lambda_{K+2}),\qquad \tilde E=\mathrm{diag}(\tilde\lambda_1,\tilde\lambda_2,\ldots,\tilde\lambda_K,\tilde\lambda_{K+1},\tilde\lambda_{K+2}).$$
Analogously to Lemmas 4.5 and 4.6, we consider bounds for $\max_{1\le k\le K+2}\{|\check\lambda_k-\tilde\lambda_k|\}$ and $\|\check V-\tilde V\|_F$.

Lemma 4.11.
Under
$\mathrm{DCSBM}(n,P,\Theta,Z)$, if assumption (a) and assumption (c)
$$\varsigma<\min\Big\{\frac{\tau_2\Delta_{\max}+\tau_1\tau_2+\Delta_{\min}}{\tau_2\Delta_{\max}+\Delta_{\max}^2},\ \frac{\tau_1\delta_{\max}+\tau_1\tau_2+\delta_{\min}}{\tau_1\delta_{\max}+\delta_{\max}^2}\Big\}$$
hold, then with probability at least $1-\epsilon$ we have
$$\max_{1\le k\le K+2}\{|\check\lambda_k-\tilde\lambda_k|\}\le\mathrm{Err}_n.$$

Lemma 4.12. Under
$\mathrm{DCSBM}(n,P,\Theta,Z)$, if assumptions (a) and (c) hold, and additionally assumption (d) holds: $\tilde\lambda_1\ge\ldots\ge\tilde\lambda_{K+2}>0$, $\check\lambda_1\ge\ldots\ge\check\lambda_{K+2}>0$, and $\tilde\lambda_{K+2}>|\tilde\lambda_i|$, $\check\lambda_{K+2}>|\check\lambda_i|$ for $i=K+3,K+4,\ldots,n$, then with probability at least $1-\epsilon$ we have
$$\|\check V-\tilde V\|_F\le\frac{\sqrt{2(K+2)}\,\mathrm{Err}_n}{\tilde\lambda_{K+2}-\tilde\lambda_{K+3}}.$$
Theorem 4.13 provides the bound on $\|\check X^*-\tilde X^*\|_F$, which is the cornerstone for characterizing the behavior of our DRSLIM approach.

Theorem 4.13.
Under
$\mathrm{DCSBM}(n,P,\Theta,Z)$, define $m_b=\min_i\{\min\{\|\check X_{i\cdot}\|,\|\tilde X_{i\cdot}\|\}\}$ as the length of the shortest row of $\check X$ and $\tilde X$. Then, for any $\epsilon>0$ and sufficiently large $n$, if assumptions (a), (c) and (d) in Lemma 4.12 hold, then with probability at least $1-\epsilon$ the following hold:
$$\|\check X-\tilde X\|_F\le\mathrm{Err}_n\sqrt{K+2}+\frac{\sqrt{2(K+2)}\,\tilde\lambda_1\,\mathrm{Err}_n}{\tilde\lambda_{K+2}-\tilde\lambda_{K+3}},$$
$$\|\check X^*-\tilde X^*\|_F\le\frac{2}{m_b}\Big(\mathrm{Err}_n\sqrt{K+2}+\frac{\sqrt{2(K+2)}\,\tilde\lambda_1\,\mathrm{Err}_n}{\tilde\lambda_{K+2}-\tilde\lambda_{K+3}}\Big).$$
The following theorem is the main theoretical result bounding the Hamming error rate of our DRSLIM method under mild conditions; it shows that DRSLIM stably yields consistent community detection under several conditions.
Theorem 4.14.
Under the DCSBM with parameters $\{n,P,\Theta,Z\}$, suppose the same assumptions as in Theorem 4.13 hold and, as $n\to\infty$,
$$\frac{2}{m_b}\Big(\mathrm{Err}_n\sqrt{K+2}+\frac{\sqrt{2(K+2)}\,\tilde\lambda_1\,\mathrm{Err}_n}{\tilde\lambda_{K+2}-\tilde\lambda_{K+3}}\Big)\Big/\min\{n_1,n_2,\ldots,n_K\}\to 0.$$
Then for the label vector $\check\ell$ estimated by DRSLIM, with probability at least $1-\epsilon$ we have
$$\mathrm{Hamm}_n(\check\ell,\ell)\le\frac{4}{n\,m_b}\Big(\mathrm{Err}_n\sqrt{K+2}+\frac{\sqrt{2(K+2)}\,\tilde\lambda_1\,\mathrm{Err}_n}{\tilde\lambda_{K+2}-\tilde\lambda_{K+3}}\Big).$$

5 Numerical Results
We compare our DRSC, DRSCORE and DRSLIM with a few recent methods: RSC (Qin and Rohe, 2013), SCORE (Jin, 2015), SLIM (Jing et al., 2021) and OCCAM (Zhang et al., 2020), via synthetic data and eight real-world networks. It should be mentioned that the overlapping continuous community assignment model (OCCAM) of Zhang et al. (2020) is designed for overlapping communities; it is also a spectral clustering algorithm, applying the K-median method for clustering instead of K-means.

For each procedure, the clustering error rate is measured by
$$\min_{\{\pi:\ \text{permutation over }\{1,2,\ldots,K\}\}}\ \frac{1}{n}\sum_{i=1}^{n}\mathbf{1}\{\pi(\hat\ell_i)\neq\ell_i\},$$
where $\ell_i$ and $\hat\ell_i$ are the true and estimated labels of node $i$.
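For moderate $K$, this criterion can be computed by enumerating the $K!$ label permutations; a sketch, with labels coded $0,\ldots,K-1$:

```python
import numpy as np
from itertools import permutations

def error_rate(est, truth, K):
    """min over permutations pi of (1/n) * #{i : pi(est_i) != truth_i}."""
    est, truth = np.asarray(est), np.asarray(truth)
    best = 1.0
    for pi in permutations(range(K)):
        relabeled = np.array(pi)[est]       # apply the permutation to estimated labels
        best = min(best, np.mean(relabeled != truth))
    return best
```

For larger $K$, the same minimum can be found in polynomial time by matching clusters with the Hungarian algorithm (e.g., `scipy.optimize.linear_sum_assignment` on the confusion matrix).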
5.1 Simulated Networks

In this subsection, we use three simulated experiments to investigate the performance of these approaches.

Experiment 1. In this experiment, we investigate the performances of these approaches under SBM with $K=2$ and $K=3$ as $n$ increases. We let $n$ range over an increasing grid of values, and for each fixed $n$ we record the mean error rate over 50 repetitions.

Experiment 1(a). When $K=2$, we generate $\ell$ by letting each node belong to one of the clusters with equal probability, with a fixed symmetric $2\times 2$ mixing matrix $P^{1(a)}$. Generate $\theta$ with one constant value for nodes with $g_i=1$ and another for nodes with $g_i=2$.

Experiment 1(b). When $K=3$, we generate $\ell$ by letting each node belong to one of the clusters with equal probability, with a fixed symmetric $3\times 3$ mixing matrix $P^{1(b)}$. Generate $\theta$ with one constant value per community.

Figure 3: Numerical results of Experiment 1. Left panel: Experiment 1(a). Right panel: Experiment 1(b). x-axis: $n$; y-axis: error rates.

The numerical results of Experiment 1 are shown in Figure 3, from which we can draw the following conclusions. When the number of clusters is 2, as $n$ increases, the error rates of RSC, DRSC, DRSCORE and DRSLIM decrease rapidly, while the error rates of SCORE, SLIM and OCCAM decrease quite slowly. When we increase the number of clusters from 2 to 3, the error rates of RSC, SCORE, SLIM and OCCAM remain quite large as $n$ increases. Overall, our DRSC, DRSCORE and DRSLIM have comparable performances and almost always clearly outperform the other four procedures in this experiment.
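All the simulated networks in this section are drawn by the same mechanism; a minimal sampler for $\mathrm{DCSBM}(n,P,\Theta,Z)$ is sketched below (the concrete $P$ and $\theta$ values in the usage line are illustrative — the mixing-matrix entries are borrowed from the Figure 1 setting, not from Experiment 1):

```python
import numpy as np

def sample_dcsbm(theta, P, labels, seed=None):
    """Symmetric, hollow adjacency matrix with Pr(A_ij = 1) = theta_i * theta_j * P[g_i, g_j]."""
    rng = np.random.default_rng(seed)
    n = len(labels)
    Omega = np.outer(theta, theta) * P[np.ix_(labels, labels)]   # expectation of A
    upper = np.triu(rng.random((n, n)) < Omega, k=1).astype(int) # sample the upper triangle only
    return upper + upper.T                                        # symmetrize; diagonal stays zero

# Example draw in the spirit of Experiment 1(a):
labels = np.random.default_rng(1).integers(0, 2, 500)
P = np.array([[0.10, 0.05], [0.05, 0.10]])
A = sample_dcsbm(np.full(500, 0.9), P, labels, seed=2)
```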
Experiment 2. In this experiment, we study how the ratios of $\theta$, the ratio of sizes between different communities, and the ratio of the diagonal and off-diagonal entries of the mixing matrix impact the behaviors of our proposed approaches when $n=500$.

Experiment 2(a). We study the influence of the ratios of $\theta$ when $K=2$ under SBM. The proportion $a$ ranges over a grid of values. We generate $\ell$ by letting each node belong to one of the clusters with equal probability, with a fixed symmetric $2\times 2$ mixing matrix $P^{2(a)}$. Generate $\theta$ as $\theta(i)=1$ if $g_i=1$ and $\theta(i)=1/a$ if $g_i=2$. Note that for each fixed $a$, $\theta(i)$ is a fixed number for all nodes in the same community, hence this is an SBM case. For each fixed $a$, we record the mean clustering error rate over 50 sampled networks.

Experiment 2(b). We study how the sparsity of the network affects the performance of these methods under SBM. The proportion $b$ ranges over a grid of values. We generate $\ell$ by letting each node belong to one of the clusters with equal probability, and set $\theta(i)=1$ for all $i$. The mixing matrix is $P^{2(b)}=b\,P_0$ for a fixed symmetric $2\times 2$ matrix $P_0$ whose diagonal entries exceed its off-diagonal entries.
For each fixed $b$, we record the average clustering error rate over 50 simulated networks. Note that as $b$ increases, more edges are generated, so the generated network is denser.

Experiment 2(c). All parameters are the same as in Experiment 2(b), except that the off-diagonal entries of the mixing matrix $P^{2(c)}$ are larger than its diagonal entries.
15 0 . . . i . Therefore, networks generated from Experiment 2(c) are dis-associative where dis-associativenetworks denote networks generated from the mixing matrix such that the off-diagonal en-tries are larger than the diagonal entries, i.e., there are more edges between nodes fromdistinct clusters than from the same cluster.
Experiment 2(d). We study how the proportion between the sizes of the clusters influences the performance of these methods under SBM. The proportion $c$ ranges over a grid of values. Set $n_1=\mathrm{round}(\frac{n}{c+1})$ as the number of nodes in cluster 1, where $\mathrm{round}(x)$ denotes the nearest integer to any real number $x$. We generate $\ell$ such that $g_i=1$ for $i=1,\cdots,n_1$ and $g_i=2$ for $i=(n_1+1),\cdots,n$. Note that $c$ is the ratio of the size of cluster 2 to that of cluster 1: the number of nodes in cluster 2 is $n-\mathrm{round}(\frac{n}{c+1})\approx n-\frac{n}{c+1}=\frac{cn}{c+1}$, i.e., $c$ times the size of cluster 1. The mixing matrix $P^{2(d)}$ is a fixed symmetric $2\times 2$ matrix. Let $\theta$ take one constant value for $g_i=1$ and another for $g_i=2$. For each fixed $c$, we record the average clustering error rate over 50 simulated networks.

Experiment 2(e). All parameters are the same as in Experiment 2(d), except that we set $\theta_i$ to vary smoothly with $i/n$ for $1\le i\le n$ (i.e., Experiment 2(e) is a DCSBM case).

Experiment 2(f). We study how the probability of nodes belonging to distinct communities influences the performance of these methods when $K=2$. The proportion $d$ ranges over a grid from 0 to 0.45. We generate $\ell$ such that nodes belong to cluster 1 with probability $0.5-d$ and to cluster 2 with probability $0.5+d$. Therefore, as $d$ increases from 0 to 0.45, the number of nodes in cluster 1 decreases, so this is a case that
is more challenging to detect. The mixing matrix is the same as in Experiment 2(d). Let $\theta$ take one constant value for $g_i=1$ and another for $g_i=2$. For each fixed $d$, we record the average clustering error rate over 50 simulated networks.

The numerical results of Experiment 2 are shown in Figure 4. In Experiment 2(a), as $a$ increases, $\theta(i)$ decreases for nodes $i$ in cluster 2, which means that fewer edges are generated within cluster 2, and hence it becomes more challenging to detect cluster 2. The numerical results of Experiment 2(a) suggest that as the variability of degrees increases, all approaches (except SLIM) perform worse, while our DRSC, DRSCORE and DRSLIM perform better than the three traditional spectral clustering methods RSC, SCORE and OCCAM. However, it is interesting to find that SLIM behaves abnormally: its error rate decreases when $a$ increases from 3 to 4. We cannot explain this abnormal behavior of SLIM at present, and we leave it for future work. In Experiment 2(b), all methods perform better when the simulated network becomes denser; meanwhile, our three approaches DRSC, DRSCORE and DRSLIM perform better than RSC, SCORE, OCCAM and SLIM. The numerical results of Experiment 2(c) show that all methods except OCCAM can detect dis-associative networks, while OCCAM fails. When the off-diagonal entries of the mixing matrix are close to its diagonal entries, all methods perform poorly. The numerical results of Experiments 2(d), 2(e) and 2(f) tell us that, although it becomes challenging for all approaches to achieve satisfactory detection on a fixed-size network when the size of one cluster decreases, our approaches DRSC, DRSCORE and DRSLIM always outperform the other four comparison approaches. In all, our DRSC, DRSCORE and DRSLIM always outperform the other approaches in this experiment.

Remark: Recall that in the theoretical analysis of DRSC and DRSLIM we assume that $\lambda_1\ge\ldots\ge\lambda_K>0$ and $\tilde\lambda_1\ge\ldots\ge\tilde\lambda_{K+2}>0$ (i.e., the networks generated under DCSBM should be associative networks). Combined with the satisfactory performances of DRSC and DRSLIM in Experiment 2(c), we argue that our DRSC and DRSLIM can detect dis-associative networks as well. This phenomenon suggests that the Davis-Kahan theorem may be extended to the case where $S$ (defined in Lemma ??) has several disjoint sets, and we leave the study of this conjecture for future work.

Figure 4: Numerical results of Experiment 2. Top three panels (from left to right): Experiment 2(a), Experiment 2(b), and Experiment 2(c); x-axes: $a$, $b$, $b$. Bottom three panels (from left to right): Experiment 2(d), Experiment 2(e), and Experiment 2(f); x-axes: $c$, $c$, $d$. y-axis: error rates.
Experiment 3. This experiment contains two sub-experiments. In both, we set $n=500$, $K=4$, generate $\ell$ by letting each node belong to one of the clusters with equal probability, and set $\theta_i=1$ if $g_i=1$, with a fixed constant value of $\theta_i$ in $(0,1)$ for each of $g_i=2$, $g_i=3$ and $g_i=4$. The proportions $\alpha$ and $\beta$ range over a grid of values, and for each fixed $\alpha$ (as well as $\beta$) we record the mean error rate over 50 simulated samples.

Experiment 3(a). We increase the diagonal entries of $P$ while the off-diagonal entries stay fixed, under SBM. The mixing matrix $P^{3(a)}$ is a $4\times 4$ symmetric matrix whose diagonal entries each increase with $\alpha$ and whose off-diagonal entries are fixed constants. When $\alpha=0$, the networks have only two communities, and as $\alpha$ increases, the four communities become more distinguishable.

Experiment 3(b). We increase all entries of $P$ to generate denser networks. The mixing matrix $P^{3(b)}$ is a $4\times 4$ symmetric matrix each of whose entries increases with $\beta$. When $\beta$ increases from 0 to 0.6, the four communities become more distinguishable and the simulated networks become denser.

Figure 5: Numerical results of Experiment 3. Left panel: Experiment 3(a). Right panel: Experiment 3(b). y-axis: error rates.

The numerical results of Experiment 3 are demonstrated in Figure 5. In Experiment 3(a), all methods perform poorly when $\alpha$ is small; that is, a smaller $\alpha$ means that the four communities are more difficult to distinguish. All methods except SCORE perform better as $\alpha$ increases in Experiment 3(a), while the performance of SCORE barely changes as $\alpha$ increases from 0.3 to 0.6. Our DRSC, DRSCORE and DRSLIM clearly outperform RSC, SCORE and OCCAM in Experiment 3(b), while SLIM performs similarly to the proposed methods. It is interesting to find that OCCAM behaves abnormally: it performs even worse as $\beta$ increases, which gives a denser simulated network, while the other six procedures can handle denser networks.

5.2 Real-World Networks

Eight real-world network datasets are analyzed to test the performances of our DRSC, DRSCORE and DRSLIM. The eight datasets were used in Jin et al. (2018) and can be downloaded directly from http://zke.fas.harvard.edu/software.html. Table 1 presents some basic information about them. These eight datasets are networks with known labels for all nodes, where the true label information was surveyed by researchers. From Table 1, we can see that $d_{\min}$ and $d_{\max}$ (where $d_{\min}=\min_i D(i,i)$ and $d_{\max}=\max_i D(i,i)$) are always quite different for every one of the eight real-world datasets, which suggests a DCSBM case. Readers interested in the background of the eight real-world networks can refer to Appendix B for details.

Table 1: Eight real-world data sets with known label information analyzed in this paper.

          Karate  Dolphins  Football  Polbooks  UKfaculty  Polblogs  Simmons  Caltech
n           34      62        110       92        79        1222      1137     590
d_max       17      12        13        24        39         351       293     179
Table 2: Error rates on the eight empirical data sets (columns: Karate, Dolphins, Football, Polbooks, UKfaculty, Polblogs, Simmons, Caltech; rows: RSC, SCORE, OCCAM, SLIM, DRSC, DRSCORE, DRSLIM, DRSC$_{K+2}$, DRSCORE$_{K+2}$, DRSLIM$_{K+1}$).
Next we study the performances of these methods on the eight real-world networks. Recall from Section 3.1 that we can apply the leading $(K+K_0)$ eigenvectors and eigenvalues of $L_{\tau_2}$ and $\hat M$ in our three methods, where $K_0$ can be 1 or 2. Here, if $K_0$ is 2 for DRSC and DRSCORE, we call the two new methods DRSC$_{K+2}$ and DRSCORE$_{K+2}$, and if $K_0$ is 1 for DRSLIM, we call the new method DRSLIM$_{K+1}$. Table 2 records the error rates on the eight real-world networks, from which we can see that for Karate, Dolphins, Football, Polbooks, UKfaculty and Polblogs, our DRSC, DRSCORE, DRSLIM, DRSC$_{K+2}$, DRSCORE$_{K+2}$ and DRSLIM$_{K+1}$ have performances similar to those of RSC, SCORE, OCCAM and SLIM. For Simmons and Caltech, however, our DRSC, DRSCORE, DRSLIM, DRSC$_{K+2}$ and DRSLIM$_{K+1}$ approaches detect clusters with much lower error numbers than RSC, SCORE, OCCAM and SLIM. This phenomenon occurs because Simmons and Caltech are two weak signal networks, as suggested by Jin et al. (2018), where weak signal networks are defined as networks whose leading $(K+1)$-th eigenvalue of $A$ or its variants is close to the leading $K$-th eigenvalue, while the other six real-world datasets are deemed strong signal networks. Recall that our DRSC and DRSCORE apply the leading $(K+1)$ eigenvectors to construct $\hat X$, while our DRSLIM applies the leading $(K+2)$ eigenvectors of $\hat M$ to construct $\check X$; this is the reason our three approaches outperform the other four when detecting Simmons and Caltech. Though DRSCORE performs satisfactorily on the eight empirical datasets, DRSCORE$_{K+2}$ fails to detect Karate, Polbooks and Polblogs, which suggests that $K_0$ should be 1 for DRSCORE. Meanwhile, DRSC$_{K+2}$ performs similarly to DRSC, hence $K_0$ can be 1 or 2 for DRSC, and we set $K_0$ to 1 in DRSC for simplicity. Finally, though DRSLIM$_{K+1}$ generally performs similarly to DRSLIM, it performs poorly on the Simmons network, and this is the reason we set $K_0$ to 2 in our DRSLIM algorithm.

At present, there is no practical criterion to choose the optimal $\tau_1$ and $\tau_2$. To better understand the effect of $\tau_1$ and $\tau_2$ on the performances of DRSC, DRSCORE and DRSLIM, we study whether the three procedures are sensitive to the choice of $\tau_1$ and $\tau_2$. For convenience, set $\tau_1=\tau_2=\tau$. Figure 6 records the number errors of the three dual regularized spectral clustering methods on the eight real-world datasets when $\tau$ ranges over a grid of small values including 0 (i.e., $\tau$ is set to a small number). From Figure 6, we see that DRSC successfully detects Karate, Dolphins, Football, Polbooks and UKfaculty. However, when $\tau_1=\tau_2=0$, DRSC has high error numbers on Polblogs, Simmons and Caltech. Figure 6 shows that the performances of DRSC on Simmons and Caltech are unsatisfactory when $\tau$ is too small. When $\tau$ is too small, DRSCORE fails to detect Karate, Polbooks, Polblogs, Simmons and Caltech; therefore DRSCORE is sensitive to the choice of $\tau$ when $\tau$ is too small. When $\tau$ is small, DRSLIM performs similarly to DRSC on the eight empirical datasets, and hence shares similar conclusions.

We also consider the case when $\tau$ is a large number, that is, $\tau_1=\tau_2=\tau$ ranging over a grid of larger values; the number errors of DRSC, DRSCORE and DRSLIM on the eight empirical data sets are shown in Figure 7. We find that DRSC and DRSLIM perform well, with small number errors on the eight real-world networks, when $\tau$ is slightly larger than 1.
Unfortunately, DRSCORE is still sensitive to the choice of $\tau$, since it has larger number errors on Polbooks and UKfaculty even when $\tau$ is set quite large. Combining the numerical results in Figures 6 and 7, we find that our DRSC and DRSLIM are insensitive to the choice of $\tau_1$ and $\tau_2$ when $\tau_1=\tau_2$, as long as they are slightly larger than 1, while DRSCORE is sensitive to the choice of $\tau_1$ and $\tau_2$.

Similar to Figure 1, we can obtain heatmaps of the number errors for the eight real-world datasets against $\tau_1$ and $\tau_2$; the results for DRSC are shown in Figure 8. From Figure 8, we find that DRSC is insensitive to the choice of $\tau_1$ and $\tau_2$ as long as $\tau_2$ is slightly larger: DRSC performs poorly on Simmons when $\tau_2$ is 1 or 10, and unsatisfactorily on Caltech when $\tau_2$ is 1. That is, DRSC is more sensitive to $\tau_2$ than to $\tau_1$, which is consistent with the findings in Figure 1. The heatmaps of the number errors for DRSCORE and DRSLIM are shown in Figures 9 and 10 in Appendix A, which tell us that DRSCORE is sensitive to the choices of $\tau_1$ and $\tau_2$ while DRSLIM is insensitive.

To conclude the above analysis on the choice of $\tau_1$ and $\tau_2$ for DRSC and DRSLIM: a safe choice for $\tau_1$ is the average degree of the given network (i.e., $\tau_1=\sum_{i,j=1}^{n}A_{ij}/n$), as suggested in Qin and Rohe (2013), since the average degree of sparse networks is usually larger than 1. Meanwhile, since DRSC and DRSLIM are insensitive to the choice of $\tau_2$ as long as it is positive, for convenience we suggest setting $\tau_2=\sum_{i,j=1}^{n}L_{\tau_1}(i,j)/n$ for DRSC and DRSLIM. As for DRSCORE, substantial numerical results show that DRSCORE has satisfactory numerical performances when setting $\tau_1=\sum_{i,j}A(i,j)$ and $\tau_2=\sum_{i,j}L_{\tau_1}(i,j)/(nK)$.
DRSC: Dolphins
DRSC: Football
DRSC: Polbooks
DRSC: Ukfaculty
DRSC: Polblogs
DRSC: Simmons
DRSC: Caltech
DRSCORE: Karate
DRSCORE: Dolphins
DRSCORE: Football
DRSCORE: Polbooks
DRSCORE: Ukfaculty
DRSCORE: Polblogs
DRSCORE: Simmons
DRSCORE: Caltech
DRSLIM: Karate
DRSLIM: Dolphins
DRSLIM: Football
DRSLIM: Polbooks
DRSLIM: Ukfaculty
DRSLIM: Polblogs
DRSLIM: Simmons
DRSLIM: Caltech
Figure 6: Number errors on the eight empirical data sets for DRSC, DRSCORE andDRSLIM when τ is in { , . , . , . , } . x-axis: τ . y-axis: number errors.29
Figure 7: Number errors on the eight empirical data sets (Karate, Dolphins, Football, Polbooks, UKfaculty, Polblogs, Simmons, Caltech) for DRSC, DRSCORE and DRSLIM when $\tau$ takes larger values. x-axis: $\tau$; y-axis: number errors.
Figure 8: The effect of $\tau_1$ and $\tau_2$ on the performance of DRSC for the eight empirical networks (panels: Karate, Dolphins, Football, Polbooks, UKfaculty, Polblogs, Simmons, Caltech). x-axis: $\tau_1$; y-axis: $\tau_2$.

5.3 Multiple Regularized Spectral Clustering Methods

We now study the performances of multiple regularized spectral clustering methods. Recall that our DRSC, DRSCORE and DRSLIM are designed based on the dual regularized Laplacian matrix, so an interesting idea comes naturally: we can design multiple regularized spectral clustering methods and give the respective theoretical frameworks (in Appendix ??), just as for our DRSC, DRSCORE and DRSLIM procedures. Before introducing the multiple regularized spectral methods, we first define the $M$-th regularized Laplacian matrix $L_{\tau_M}$ by iteration as
$$L_{\tau_M}=D_{\tau_M}^{-1/2}\,L_{\tau_{M-1}}\,D_{\tau_M}^{-1/2},\qquad M=1,2,3,\ldots,$$
where $L_{\tau_0}=A$, $D_{\tau_M}=D_M+\tau_M I$, and $D_M$ is the $n\times n$ diagonal matrix whose $i$-th diagonal entry is $D_M(i,i)=\sum_j L_{\tau_{M-1}}(i,j)$. (Here, without causing confusion with the matrix $M$ defined in the Ideal DRSLIM algorithm, we use $M$ to denote a positive integer for multiple regularization.)
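In code, the $M$-fold construction is just the Section 3 sketch applied in a loop; here `None` entries fall back to the default $\tau_m=\sum_{i,j}L_{\tau_{m-1}}(i,j)/n$ used for MRSC below.

```python
def multiple_regularized_laplacian(A, taus):
    """L_{tau_0} = A; then L_{tau_m} = D_{tau_m}^{-1/2} L_{tau_{m-1}} D_{tau_m}^{-1/2} for each tau_m in taus."""
    L = A.astype(float)
    for tau in taus:                      # taus = (tau_1, ..., tau_M)
        if tau is None:
            tau = L.sum() / L.shape[0]    # MRSC default for this level
        L = regularized_laplacian(L, tau) # from the sketch in Section 3
    return L
```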
A, K , and regularizers τ , τ , . . . , τ M . Output: nodes labels. Step 1 : Obtain the M -th graph Laplacian matrix L τ M . Step 2 : Obtain the matrix of the product of the leading
K+1 eigenvectors with unit-norm and the leading
K+1 eigenvalues of L τ M byˆ X = [ˆ η , ˆ η , . . . , ˆ η K , ˆ η K +1 ] · diag(ˆ λ , ˆ λ , . . . , ˆ λ K , ˆ λ K +1 ) . where ˆ λ i is the i -th leading eigenvalue of L τ M , and ˆ η i is the respective eigenvector withunit-norm. Step 3 : Obtain ˆ X ∗ by normalizing each of ˆ X ’ rows to have unit length. Step 4 : Apply K-means to ˆ X ∗ , assuming there are K clusters.Note that for MRSC, the default regularizer τ m = P ni,j L τ m − ( i,j ) /n for 1 ≤ m ≤ M .When M is 2, the 2RSC is actually our DRSC approach. And when M is 1, the 1RSCmethod is the regularized spectral clustering. Table 3 records number errors for the eightreal-world datasets of MRSC (M= 1, 2, . . . , 10) approaches.Table 3: Error rates on the eight empirical datasets for MRSC when M=1, 2, . . . , 10. Methods
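The following minimal Python sketch implements our reading of MRSC; it is an illustration, not the authors' code. We take "leading" eigenpairs to mean largest in magnitude, and the function and variable names are ours.

```python
import numpy as np
from sklearn.cluster import KMeans

def mrsc(A, K, taus):
    """Sketch of MRSC: iterate the regularized Laplacian, then cluster.

    A    : (n, n) symmetric adjacency matrix.
    K    : number of communities.
    taus : regularizers [tau_1, ..., tau_M]; a None entry falls back to
           the default sum(L_{tau_{m-1}}) / n.
    """
    n = A.shape[0]
    L = A.astype(float)                              # L_{tau_0} = A
    for tau in taus:                                 # build L_{tau_M} by iteration
        if tau is None:
            tau = L.sum() / n                        # default regularizer
        d = L.sum(axis=1) + tau                      # diagonal of D_{tau_m} = D_m + tau_m I
        s = 1.0 / np.sqrt(d)
        L = s[:, None] * L * s[None, :]              # D^{-1/2} L_{m-1} D^{-1/2}
    vals, vecs = np.linalg.eigh(L)
    lead = np.argsort(np.abs(vals))[::-1][:K + 1]    # leading K+1 eigenpairs (by magnitude)
    X = vecs[:, lead] * vals[lead]                   # weight eigenvectors by eigenvalues
    X /= np.linalg.norm(X, axis=1, keepdims=True)    # normalize rows to unit length
    return KMeans(n_clusters=K, n_init=10).fit_predict(X)
```

For example, `mrsc(A, K, [None, None])` corresponds to DRSC with the default regularizers, and `mrsc(A, K, [None])` to 1RSC.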
Table 3 records the number errors on the eight real-world datasets for the MRSC approaches with $M=1,2,\ldots,10$.

Table 3: Error rates on the eight empirical datasets for MRSC when $M=1,2,\ldots,10$.

Methods   Karate  Dolphins  Football  Polbooks  UKfaculty  Polblogs  Simmons   Caltech
1RSC      0/34    0/62      5/110     3/92      1/79       66/1222   133/1137  97/590
DRSC      0/34    1/62      5/110     3/92      2/79       63/1222   124/1137  95/590
3RSC      0/34    1/62      6/110     2/92      2/79       57/1222   193/1137  94/590
4RSC      3/34    1/62      3/110     2/92      2/79       60/1222   184/1137  93/590
5RSC      3/34    1/62      3/110     2/92      2/79       61/1222   182/1137  151/590
6RSC      3/34    1/62      6/110     2/92      2/79       64/1222   266/1137  151/590
7RSC      3/34    1/62      3/110     2/92      2/79       588/1222  266/1137  153/590
8RSC      3/34    1/62      3/110     1/92      2/79       588/1222  264/1137  159/590
9RSC      3/34    1/62      6/110     1/92      2/79       588/1222  264/1137  212/590
10RSC     3/34    1/62      3/110     1/92      2/79       588/1222  264/1137  204/590

From Table 3 we can observe the following: all MRSC approaches use the leading $(K+1)$ eigenvectors to construct $\hat{X}$ for clustering; 7RSC, 8RSC, 9RSC and 10RSC fail to detect Polblogs, Simmons and Caltech; 1RSC, the standard regularized spectral clustering, performs satisfactorily on all eight real-world datasets; and comparing 1RSC with our DRSC shows that DRSC outperforms 1RSC. Therefore, generally speaking, our DRSC (the dual regularized spectral clustering method) has the best performance among all MRSC approaches. Though we can build a theoretical framework for each MRSC (for $M$ equal to 3, 4, 5, etc.) by following theoretical analysis procedures similar to those for DRSC, building the full theoretical framework for MRSC is tedious, and it is unnecessary to find the theoretical bound for MRSC when $M$ is large since our DRSC has the best performance. For the reader's reference, we provide the population analysis for MRSC in the Appendix to show that MRSC returns perfect clustering under the ideal case.

Next, we introduce the $M$-th multiple regularized spectral clustering on Ratios-of-eigenvectors (MRSCORE) method, designed as follows.
MRSCORE. Input: $A$, $K$, and regularizers $\tau_1,\tau_2,\ldots,\tau_M$. Output: node labels.
Step 1: Obtain $L_{\tau_M}$.
Step 2: Obtain the product of the leading $K+1$ unit-norm eigenvectors and the leading $K+1$ eigenvalues of $L_{\tau_M}$:
$$\hat{X}=[\hat{\eta}_1,\hat{\eta}_2,\ldots,\hat{\eta}_K,\hat{\eta}_{K+1}]\cdot\mathrm{diag}(\hat{\lambda}_1,\hat{\lambda}_2,\ldots,\hat{\lambda}_K,\hat{\lambda}_{K+1}).$$
Step 3: Obtain the $n\times K$ matrix $\hat{R}$ of entry-wise eigen-ratios such that $\hat{R}(i,k)=\hat{X}_{k+1}(i)/\hat{X}_1(i)$ for $1\le i\le n$ and $1\le k\le K$.
Step 4: Apply K-means to $\hat{R}$, assuming there are $K$ clusters.

Note that for MRSCORE the default regularizers are $\tau_m=\sum_{i,j=1}^{n}L_{\tau_{m-1}}(i,j)/n$ for $1\le m\le M-1$ and $\tau_M=\sum_{i,j=1}^{n}L_{\tau_{M-1}}(i,j)/(nK)$. Numerical results show that MRSCORE is sensitive to the choice of regularizers, so we recommend applying the default regularizers for MRSCORE. When $M$ is 2, 2RSCORE is exactly our DRSCORE approach. Table 4 records the number errors on the eight real-world datasets for the MRSCORE approaches with $M=1,2,\ldots,6$; from Table 4, we see that MRSCORE performs similarly across different $M$ on these datasets. The ratio step is sketched below.
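A minimal sketch of the ratio step (Steps 3 and 4), assuming the $n\times(K+1)$ matrix $\hat{X}$ from Step 2 has already been computed (for instance inside the `mrsc` sketch above); the names are ours.

```python
import numpy as np
from sklearn.cluster import KMeans

def mrscore_from_X(X_hat, K):
    """SCORE-style clustering on entry-wise eigen-ratios.

    X_hat : (n, K+1) matrix of eigenvalue-weighted leading eigenvectors.
    Under the model the leading eigenvector is entry-wise nonzero, so the
    division below is well defined; real data may need a small safeguard.
    """
    R_hat = X_hat[:, 1:] / X_hat[:, [0]]   # R(i, k) = X_{k+1}(i) / X_1(i)
    return KMeans(n_clusters=K, n_init=10).fit_predict(R_hat)
```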
Table 4: Error rates on the eight empirical datasets for MRSCORE when $M=1,2,\ldots,6$.

Methods    Karate  Dolphins  Football  Polbooks  UKfaculty  Polblogs  Simmons   Caltech
1RSCORE    0/34    4/62      6/110     4/92      3/79       65/1222   117/1137  99/590
DRSCORE    0/34    4/62      5/110     4/92      3/79       65/1222   117/1137  99/590
3RSCORE    0/34    4/62      6/110     4/92      3/79       64/1222   117/1137  99/590
4RSCORE    0/34    4/62      3/110     4/92      3/79       64/1222   117/1137  99/590
5RSCORE    2/34    4/62      3/110     4/92      3/79       64/1222   117/1137  98/590
6RSCORE    2/34    4/62      3/110     4/92      3/79       64/1222   117/1137  99/590
Finally, we introduce the $M$-th multiple regularized symmetrized Laplacian inverse matrix (MRSLIM) method, designed as below.

MRSLIM. Input: $A$, $K$, and regularizers $\tau_1,\tau_2,\ldots,\tau_M$. Output: node labels.
Step 1: Obtain $L_{\tau_M}$.
Step 2: Obtain $\hat{W}=(I-e^{-\gamma}D_{\tau_M}^{-1}L_{\tau_M})^{-1}$. Set $\hat{M}=(\hat{W}+\hat{W}')/2$ and set $\hat{M}$'s diagonal entries to 0.
Step 3: Obtain $\check{X}=[\check{\eta}_1,\check{\eta}_2,\ldots,\check{\eta}_{K+1},\check{\eta}_{K+2}]\cdot\mathrm{diag}(\check{\lambda}_1,\check{\lambda}_2,\ldots,\check{\lambda}_{K+1},\check{\lambda}_{K+2})$, where $\check{\lambda}_i$ is the $i$-th leading eigenvalue of $\hat{M}$ and $\check{\eta}_i$ is the corresponding unit-norm eigenvector.
Step 4: Obtain $\check{X}^{*}$ by normalizing each row of $\check{X}$ to have unit length.
Step 5: Apply K-means to $\check{X}^{*}$, assuming there are $K$ clusters.

Note that for MRSLIM the default regularizer is $\tau_m=\sum_{i,j=1}^{n}L_{\tau_{m-1}}(i,j)/n$ for $1\le m\le M$. When $M$ is 2, 2RSLIM is exactly our DRSLIM approach. Table 5 records the number errors on the eight real-world datasets for the MRSLIM approaches with $M=1,2,\ldots,6$; from Table 5, we see that our DRSLIM almost always has the best performance among the MRSLIM approaches. Step 2 is sketched below.
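A minimal sketch of Step 2 under our reading of the extraction-damaged display, namely $\hat{W}=(I-e^{-\gamma}D_{\tau_M}^{-1}L_{\tau_M})^{-1}$; the exact form of $\hat{W}$ should be checked against Jing et al. (2021), and the names are ours.

```python
import numpy as np

def slim_symmetrized_matrix(L_tau_M, tau_M, gamma):
    """Build the symmetrized Laplacian inverse matrix used by MRSLIM.

    L_tau_M : the M-th regularized Laplacian L_{tau_M}.
    tau_M   : the M-th regularizer (gives the diagonal of D_{tau_M}).
    gamma   : the SLIM tuning parameter.
    """
    n = L_tau_M.shape[0]
    d = L_tau_M.sum(axis=1) + tau_M                  # diagonal of D_{tau_M}
    W = np.linalg.inv(np.eye(n) - np.exp(-gamma) * (L_tau_M / d[:, None]))
    M_hat = (W + W.T) / 2.0                          # symmetrize: (W + W') / 2
    np.fill_diagonal(M_hat, 0.0)                     # set diagonal entries to 0
    return M_hat
```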
Table 5: Error rates on the eight empirical datasets for MRSLIM when $M=1,2,\ldots,6$.

Methods   Karate  Dolphins  Football  Polbooks  UKfaculty  Polblogs  Simmons   Caltech
1RSLIM    0/34    0/62      3/110     2/92      2/79       58/1222   121/1137  96/590
DRSLIM    0/34    0/62      3/110     2/92      2/79       59/1222   115/1137  98/590
3RSLIM    0/34    1/62      3/110     2/92      2/79       55/1222   122/1137  99/590
4RSLIM    0/34    1/62      3/110     2/92      2/79       58/1222   123/1137  97/590
5RSLIM    0/34    1/62      3/110     2/92      2/79       60/1222   277/1137  90/590
6RSLIM    0/34    1/62      3/110     5/92      2/79       61/1222   276/1137  100/590

Discussion
In this paper, we give theoretical, simulated and empirical results that demonstrate how a simple adjustment to the traditional spectral clustering methods RSC and SCORE, and to the recently published spectral clustering method SLIM, can give dramatically better results for community detection. There are several open problems for future work. For example, one can study the theoretically optimal values of $\tau_1$, $\tau_2$ and $\gamma$, and the relationship between the eigenvalues of $L_{\tau}$ (and $M$) and the two regularizers $\tau_1,\tau_2$. Since we only provide a population analysis for DRSCORE and it is sensitive to the choice of $\tau_1$ and $\tau_2$, it is meaningful to build a full theoretical framework for DRSCORE and to study whether optimal regularizers for DRSCORE exist. Because numerical results show that our DRSC and DRSLIM can successfully detect dis-assortative networks, while our theoretical analysis for DRSC and DRSLIM assumes that the leading $K$ eigenvalues of $L_{\tau}$ and $M$ are positive, it is meaningful to extend the Davis-Kahan theorem to disjoint sets of $S$ (defined in Lemma ??). As in Jing et al. (2021), it is interesting to study the consistency of DRSC, DRSCORE and DRSLIM for both sparse and dense networks. Though we design the three approaches DRSC, DRSCORE and DRSLIM based on the dual regularized Laplacian matrix $L_{\tau}$, build theoretical frameworks for them, and investigate their performances against several spectral clustering methods, it remains open whether dual regularization always provides better performance than ordinary regularization, both theoretically and numerically. Finally, it is interesting to study whether there exist optimal $\alpha_{op}$ and $\beta_{op}$ such that spectral clustering methods based on the leading eigenvectors and eigenvalues of the matrix $D_{\tau}^{-\alpha_{op}}L_{\tau}^{\beta_{op}}D_{\tau}^{-\alpha_{op}}$ outperform methods designed based on the matrix $D_{\tau}^{-\alpha}L_{\tau}^{\beta}D_{\tau}^{-\alpha}$ for any other $\alpha$ and $\beta$. We leave studies of these problems to our future work.
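To make the last question concrete, here is a purely illustrative construction of this matrix family, assuming $L_{\tau}=D_{\tau}^{-1/2}AD_{\tau}^{-1/2}$; it is not a method proposed in this paper, and the names are ours.

```python
import numpy as np
from scipy.linalg import fractional_matrix_power

def generalized_laplacian(A, tau, alpha, beta):
    """Construct D_tau^{-alpha} L_tau^{beta} D_tau^{-alpha}."""
    d = A.sum(axis=1) + tau                          # diagonal of D_tau
    s = 1.0 / np.sqrt(d)
    L_tau = s[:, None] * A * s[None, :]              # L_tau = D_tau^{-1/2} A D_tau^{-1/2}
    L_beta = fractional_matrix_power(L_tau, beta)    # may be complex when L_tau
                                                     # has negative eigenvalues
    return (d ** -alpha)[:, None] * L_beta * (d ** -alpha)[None, :]
```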
References

Adamic, L. A. and N. Glance (2005). The political blogosphere and the 2004 US election: divided they blog. pp. 36–43.

Airoldi, E. M., D. M. Blei, S. E. Fienberg, and E. P. Xing (2008). Mixed membership stochastic blockmodels. Journal of Machine Learning Research 9, 1981–2014.

Amini, A. A., A. Chen, P. J. Bickel, and E. Levina (2013). Pseudo-likelihood methods for community detection in large sparse networks. Annals of Statistics 41(4), 2097–2122.

Bickel, P. J. and A. Chen (2009a). A nonparametric view of network models and Newman–Girvan and other modularities. Proceedings of the National Academy of Sciences of the United States of America 106(50), 21068–21073.

Bickel, P. J. and A. Chen (2009b). A nonparametric view of network models and Newman–Girvan and other modularities. Proceedings of the National Academy of Sciences of the United States of America 106(50), 21068–21073.

Borgatti, S. P. and M. G. Everett (1997). Network analysis of 2-mode data. Social Networks 19(3), 243–269.

Burt, R. S. (1976). Positions in networks. Social Forces 55(1), 93–122.

Chaudhuri, K., F. Chung, and A. Tsiatas (2012). Spectral clustering of graphs with general degrees in the extended planted partition model. pp. 1–23.

Chen, H. and F. Zhang (2007). Resistance distance and the normalized Laplacian spectrum. Discrete Applied Mathematics 155(5), 654–661.

Chen, Y., X. Li, and J. Xu (2018). Convexified modularity maximization for degree-corrected stochastic block models. Annals of Statistics 46(4), 1573–1602.

Chung, F. R. and F. C. Graham (1997). Spectral Graph Theory. Number 92. American Mathematical Society.

Daudin, J. J., F. Picard, and S. Robin (2008). A mixture model for random graphs. Statistics and Computing 18(2), 173–183.

Dong, X., D. Thanou, P. Frossard, and P. Vandergheynst (2016). Learning Laplacian matrix in smooth graph signal representations. IEEE Transactions on Signal Processing 64(23), 6160–6173.

Doreian, P. (1985). Structural equivalence in a psychology journal network. Journal of the Association for Information Science and Technology 36(6), 411–417.

Doreian, P., V. Batagelj, and A. Ferligoj (1994). Partitioning networks based on generalized concepts of equivalence. Journal of Mathematical Sociology 19(1), 1–27.

Girvan, M. and M. E. Newman (2002). Community structure in social and biological networks. Proceedings of the National Academy of Sciences 99(12), 7821–7826.

Holland, P. W., K. B. Laskey, and S. Leinhardt (1983). Stochastic blockmodels: First steps. Social Networks 5(2), 109–137.

Jin, J. (2015). Fast community detection by SCORE. Annals of Statistics 43(1), 57–89.

Jin, J., Z. T. Ke, and S. Luo (2018). SCORE+ for network community detection. arXiv preprint arXiv:1811.05927.

Jing, B., T. Li, N. Ying, and X. Yu (2021). Community detection in sparse networks using the symmetrized Laplacian inverse matrix (SLIM). Statistica Sinica.

Joseph, A. and B. Yu (2016). Impact of regularization on spectral clustering. Annals of Statistics 44(4), 1765–1791.

Karrer, B. and M. E. J. Newman (2011). Stochastic blockmodels and community structure in networks. Physical Review E 83(1), 016107.

Lorrain, F. and H. C. White (1971). Structural equivalence of individuals in social networks. Journal of Mathematical Sociology 1(1), 49–80.

Lusseau, D. (2003). The emergent properties of a dolphin social network. Proceedings of the Royal Society of London. Series B: Biological Sciences 270(suppl 2), S186–S188.

Lusseau, D. (2007). Evidence for social role in a dolphin social network. Evolutionary Ecology 21(3), 357–366.

Lusseau, D., K. Schneider, O. J. Boisseau, P. Haase, E. Slooten, and S. M. Dawson (2003). The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations. Behavioral Ecology and Sociobiology 54(4), 396–405.

Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and Computing 17(4), 395–416.

Ma, Z. and Z. Ma (2017). Exploration of large networks with covariates via fast and universal latent space model fitting. arXiv preprint arXiv:1705.02372.

Nepusz, T., A. Petróczi, L. Négyessy, and F. Bazsó (2008). Fuzzy communities and the concept of bridgeness in complex networks. Physical Review E 77(1), 016107.

Newman, M. (2006). Modularity and community structure in networks. Bulletin of the American Physical Society.

Newman, M. E. and M. Girvan (2004). Finding and evaluating community structure in networks. Physical Review E 69(2), 026113.

Newman, M. E. J. and A. Clauset (2016). Structure and inference in annotated networks. Nature Communications 7(1), 11863.

Ng, A. Y., M. I. Jordan, and Y. Weiss (2002). On spectral clustering: Analysis and an algorithm. Advances in Neural Information Processing Systems, 849–856.

Qin, T. and K. Rohe (2013). Regularized spectral clustering under the degree-corrected stochastic blockmodel. In Advances in Neural Information Processing Systems 26, pp. 3120–3128.

Reichardt, J. and D. R. White (2007). Role models for complex networks. European Physical Journal B 60(2), 217–224.

Snijders, T. A. B. and K. Nowicki (1997). Estimation and prediction for stochastic blockmodels for graphs with latent block structure. Journal of Classification 14(1), 75–100.

Tang, C., H. Zhou, X. Zheng, Y. Zhang, and X. Sha (2019). Dual Laplacian regularized matrix completion for microRNA-disease associations prediction. RNA Biology 16(5), 601–611.

Traud, A. L., E. D. Kelsic, P. J. Mucha, and M. A. Porter (2011). Comparing community structure to characteristics in online collegiate social networks. SIAM Review 53(3), 526–543.

Traud, A. L., P. J. Mucha, and M. A. Porter (2012). Social structure of Facebook networks. Physica A: Statistical Mechanics and its Applications 391(16), 4165–4180.

Wang, Y. J. and G. Y. Wong (1987). Stochastic blockmodels for directed graphs. Journal of the American Statistical Association 82(397), 8–19.

Weyl, H. (1912). Das asymptotische Verteilungsgesetz der Eigenwerte linearer partieller Differentialgleichungen (mit einer Anwendung auf die Theorie der Hohlraumstrahlung). Mathematische Annalen 71(4), 441–479.

Xiao, Q., J. Luo, C. Liang, J. Cai, and P. Ding (2018). A graph regularized non-negative matrix factorization method for identifying microRNA-disease associations. Bioinformatics 34(2), 239–248.

Yankelevsky, Y. and M. Elad (2016). Dual graph regularized dictionary learning. IEEE Transactions on Signal and Information Processing over Networks 2(4), 611–624.

Zachary, W. W. (1977). An information flow model for conflict and fission in small groups. Journal of Anthropological Research 33(4), 452–473.

Zhang, Y., E. Levina, and J. Zhu (2020). Detecting overlapping communities in networks using spectral methods. SIAM Journal on Mathematics of Data Science 2(2), 265–283.

Appendices
A Heatmaps for DRSCORE and DRSLIM
Figure 9: The effect of $\tau_1$ and $\tau_2$ on the performance of DRSCORE for the eight empirical networks. Axes: the two regularizers $\tau_1$ and $\tau_2$.
Figure 10: The effect of $\tau_1$ and $\tau_2$ on the performance of DRSLIM for the eight empirical networks. Axes: the two regularizers $\tau_1$ and $\tau_2$.