Higher-Order Spectral Clustering under Superimposed Stochastic Block Model
HHigher-Order Spectral Clustering under Superimposed StochasticBlock Models
Subhadeep Paul, The Ohio State University Olgica Milenkovic, University of Illinois at Urbana-ChampaignYuguo Chen, University of Illinois at Urbana-Champaign
Abstract
Higher-order motif structures and multi-vertex interactions are becoming increas-ingly important in studies that aim to improve our understanding of functionalitiesand evolution patterns of networks. To elucidate the role of higher-order structures incommunity detection problems over complex networks, we introduce the notion of a Su-perimposed Stochastic Block Model (SupSBM). The model is based on a random graphframework in which certain higher-order structures or subgraphs are generated throughan independent hyperedge generation process, and are then replaced with graphs thatare superimposed with directed or undirected edges generated by an inhomogeneousrandom graph model. Consequently, the model introduces controlled dependenciesbetween edges which allows for capturing more realistic network phenomena, namelystrong local clustering in a sparse network, short average path length, and communitystructure. We proceed to rigorously analyze the performance of a number of recentlyproposed higher-order spectral clustering methods on the SupSBM. In particular, weprove non-asymptotic upper bounds on the misclustering error of spectral communitydetection for a SupSBM setting in which triangles or 3-uniform hyperedges are super-imposed with undirected edges. As part of our analysis, we also derive new boundson the misclustering error of higher-order spectral clustering methods for the standardSBM and the 3-uniform hypergraph SBM. Furthermore, for a non-uniform hypergraphSBM model in which one directly observes both edges and 3-uniform hyperedges, weobtain a criterion that describes when to perform spectral clustering based on edgesand when on hyperedges, based on a function of hyperedge density and observationquality.
KEYWORDS: Higher-order structures; Superimposed random graph model; Spectralclustering; Hypergraphs; Community detection.
Network data science has traditionally focused on studies capturing two-way interactions orconnections between pairs of vertices or agents in networks. In this context, the problemsof interest have been to identify heterogeneous and power law vertex degree distributions(e.g., determine if the networks are scale-free) as well as dense subgraphs and cliques, andefficiently detect and isolate community structures (Newman 2003; Barab´asi and Albert1999; Watts and Strogatz 1998).It has by now become apparent that many aspects of relational organization, functionalityand the evolving structure of a complex network can only be understood through higher-order Emails: Paul - [email protected] , Milenkovic - [email protected] , Chen - [email protected] a r X i v : . [ s t a t . M L ] D ec ubgraph (motif) interactions involving more than two vertices (Milo et al. 2002; Shen-Orret al. 2002; Mangan and Alon 2003; Honey et al. 2007; Alon 2007; Porter et al. 2009; Bensonet al. 2016; Yavero˘glu et al. 2014). Certain subgraphs in networks function as fundamentalunits of control and regulation of network communities and dynamics: for example, networkmotifs are crucial regulators in brain networks (Sporns and K¨otter 2004; Park and Friston2013; Battiston et al. 2017), transcriptional regulatory networks (Mangan and Alon 2003),food webs (Paulau et al. 2015; Li and Milenkovic 2017), social networks (Girvan and New-man 2002; Snijders 2001) and air traffic networks (Rosvall et al. 2014; Benson et al. 2016).Traditionally, statistical and algorithmic work on network motifs has been concerned withdiscovering and counting the frequency of over-expressed subgraphs (which are usually deter-mined in comparison with some statistical null model) in various real world networks (Alon2007; Klusowski and Wu 2018). Indeed, frequency distributions or spectra of motifs havebeen shown to provide useful information about the regulatory and dynamic organization ofnetworks obtained from disparate sources. Network motifs have also recently been used toperform learning tasks such as community detection (Benson et al. 2016; Li and Milenkovic2017; Tsourakakis et al. 2017). A parallel line of work has focused on identifying communi-ties in hypergraphs and was reported in Zhou et al. (2006); Angelini et al. (2015); Kim et al.(2017); Ghoshdastidar et al. (2017); Chien et al. (2018).Unfortunately, existing random graph models with community structures based on Erd¨os-R´enyi random graphs (Erd¨os and R´enyi 1960), such as the Stochastic Block Models (Hollandet al. 1983; Snijders and Nowicki 1997; Bickel and Chen 2009; Choi et al. 2012; Rohe et al.2012; Celisse et al. 2012; Rohe et al. 2011; Qin and Rohe 2013; Jin 2015; Lei et al. 2015;Decelle et al. 2011; Hajek et al. 2016; Abbe and Sandon 2015; Gao et al. 2017), their degree-corrected versions (Karrer and Newman 2011; Zhao et al. 2012), and other extensions failto produce graphs with strong local clustering, i.e., with over-abundant triangles and otherrelevant higher-order structures. To investigate community structures in terms of particularsubgraphs and determine under which conditions they can be recovered or detected, oneneeds to consider more versatile community structure models.To address the aforementioned problem, a number of more realistic network models withsome of the desired motif structures have been proposed in the literature; however, most suchmodels are not mathematically tractable in general or in the context of community detectiondue to dependencies among the edges (Bollob´as et al. 2011). Notable exceptions include themathematically tractable random graph model with local clustering and dependences amongedges proposed in Bollob´as et al. (2011). There, the authors constructed random graphs bysuperimposing small subgraphs and edges, thereby introducing dependencies among subsetsof vertices. More specifically, they constructed an inhomogeneous random hypergraph withconditionally independent hyperedges, and then replaced each hyperedge by a completegraph over the same set of vertices. A similar model, termed the Subgraph GenerationModel (SUGM), was proposed in Chandrasekhar and Jackson (2014, 2016).More recently, Hajek and Sankagiri (2018) analyzed a variation of the preferential at-tachment model with community structure and proposed a message passing algorithm torecover the communities. In parallel, a geometric block model that uses Euclidean latentspace geometric graphs instead of the usual Erdo¨s-Re´nyi graphs for the mixture componentswas introduced in Galhotra et al. (2017, 2018). Although all these modes capture someaspects of real-life networks and introduce controlled dependencies among the edges in the2raphs, they fail to provide a general approach for combining dependent motif structuresand analytical techniques that highlight if communities should be identified though pairwiseor higher-order interactions.Our contributions are two-fold. First, we propose a new Superimposed Stochastic BlockModel (SupSBM), a random graph model for networks with community structure obtainedby generalizing the framework of Chandrasekhar and Jackson (2014) and Bollob´as et al.(2011) to account for communities akin to the classical SBM. SupSBM captures the mostrelevant aspects of higher-order organization of the datasets under consideration, e.g., itincorporates triangles and other motifs, but couples them through edges that may be viewedas noise in the motif-based graphs. The community structure of interest maybe presenteither at a higher-order structural level only or both at the level of higher-order structuresand edges. Drawing parallels with the classical SBM which is a mixture of Erd¨os-R´enyigraphs, SupSBM may be viewed as a mixture of superimposed inhomogeneous random graphsgenerated according to process described in Chandrasekhar and Jackson (2014) and Bollob´aset al. (2011). Second, we derive theoretical performance guarantees for higher-order spectralclustering methods (Benson et al. 2016; Tsourakakis et al. 2017) applied to SupSBM. Themain difference between our analysis and previous lines of work on spectral algorithms forthe SBM (Rohe et al. 2011; Lei et al. 2015; Gao et al. 2017; Chin et al. 2015; Vu and Lei2013), and hypergraph SBM (Ghoshdastidar et al. 2017; Kim et al. 2017; Chien et al. 2018)is that the elements of the analogues of adjacency matrices in our analysis are dependent.We derive several non-asymptotic upper bounds of the spectral norms of such generalizedadjacency matrices, and these results are of independent interest in other areas of networkanalysis. For this purpose, we express the spectral norms as sums of polynomial functionsof independent random variables. The terms in the sums are dependent, however, any giventerm is dependent only on a small fraction of other terms. We exploit this behavior tocarefully control the effects of such dependence on the functions of interest. We use recentresults on polynomial functions of independent random variables (Boucheron et al. 2013; Kimand Vu 2000; Janson and Ruci´nski 2004), typical bounded differences inequalities (Warnke2016) and Chernoff style concentration inequalities under limited dependence (Warnke 2017)to complete our analysis. In addition, we derive a number of corollaries implying performanceguarantees for higher-order spectral clustering under the classical stochastic block model andthe hypergraph stochastic block model. The analysis of the non-uniform hypergraph SBMalso reveals interesting results regarding the benefit of using ordinary versus higher orderspectral clustering methods on random hypergraphs.The remainder of the article is organized as follows. Section 2 defines superimposed ran-dom graph models and then develops the Superimposed Stochastic Block Model (SupSBM).Section 3 presents a non-asymptotic analysis of the misclustering rate of higher-order spectralclustering under the SupSBM. Some real world network examples are discussed in Section 4.The Appendix contains proofs of all the theorems and many auxiliary lemmas used in thederivations. 3 Superimposed random graph and block models
We start our analysis by defining what we refer to as an inhomogeneous superimposed randomgraph model, which is based on the random graph models described in Bollob´as et al. (2011);Chandrasekhar and Jackson (2014). We then proceed to introduce a natural extension ofthe stochastic block model in which the community components are superimposed randomgraphs. Our main focus is on models that superimpose edges and triangles, as these areprevalent motifs in real social and biological networks (Alon 2007; Benson et al. 2016; Li andMilenkovic 2017; Laniado et al. 2016). However, as discussed in subsequent sections, thesuperimposed SBM can be easily extended to include other superimposed graph structures.Formally, the proposed random graph model, denoted by G s ( n, P e , P t ), is a superimposi-tion of a classical dyadic (edge-based) random graph G e ( n, P e ) and a triadic (triangle-based)random graph G t ( n, P t ). In this setting, n denotes the number of vertices in the graph, P e denotes an n × n matrix whose ( i, j )th entry equals the probability of an edge in G e betweenthe vertices i and j , and P t denotes a 3-way (3rd order) n × n × n tensor whose ( i, j, k )thelement equals the probability of a triangle involving the vertices ( i, j, k ) in G t .A random graph from the model G s ( n, P e , P t ) is generated as follows. One starts with n unconnected vertices. The G t ( n, P t ) graph is generated by creating triangles (3 − hyperedges)for each of the (cid:18) n (cid:19) i, j, k ) according to the outcome of independent Bernoulli random variables T ijk with parameter p tijk = ( P t ) ijk . The hyperedges are conse-quently viewed as triangles, which results in a loss of their generative identity. Note thatthis process may lead to multi-edges between pairs of vertices i and j if these are involvedin more than one triangle. The multi-edges in the graph G t are collapsed into single edges.All pairs of vertices ( i, j ) remain within all their constituent triangles as before the mergingprocedure. Next, the graph G e ( n, P e ) is generated by placing edges between the (cid:18) n (cid:19) pairsof vertices ( i, j ) according to the outcomes of independent Bernoulli random variables E ij with parameter p eij = ( P e ) ij . Note this is simply the usual inhomogeneous random graphmodel (Bollob´as et al. 2007) that may be viewed as a generalization of the Erd¨os-R´enyimodel in which the probabilities of individual edges are allowed to be nonuniform. The twoindependently generated graphs are then superimposed to arrive at G s ( n, P e , P t ).The graph generation process is depicted by an example in Figure 1. Observe that thesuperimposed graph is allowed to contain multi-edges (or, more precisely, exactly two edges)between two vertices if and only if those vertices are involved in both at least one trianglein G t and an edge in G e . A practical justification for this choice of a multi-edge modelcomes from the fact that pair-wise and triple-wise affinities often provide complementaryinformation . Clearly, the resulting graph G s has dependences among its edges and stronglocal clustering properties for properly chosen matrices P t due to the increased presence oftriangles.Furthermore, we would like to point out that this inhomogeneous superimposed random For example, Laniado et al. (2016) studied gender patterns in dyadic and triadic ties in an onlinesocial network and found different degrees of gender homophily in different types of ties. Hence instead ofduplicating evidence from the same source, we retain two parallel edges in the graph only if they reinforcethe information provided by each other. G t with n = 7 vertices, before multi-edge collapsing;(b) the collapsed graph G t ; (c) the dyadic graph G e , and (d) the superimposed graph G s .In the simplest incarnation of the model, one may choose ( P e ) ij = p e for all i, j and( P t ) ijk = p t for all i, j, k . In this case, the graph G e is a classical Erd¨os-R´enyi dyadicrandom graph, while G t before multi-edges collapsing may be thought of as a generalizationof Erd¨os-R´enyi graphs to the triadic setting.We describe next the superimposed stochastic block model based on G s graphs. Our superimposed stochastic block model (SupSBM) is based on the inhomogeneous super-imposed random graph framework defined in the previous section. We consider two types ofSupSBMs. In the first case, “community signals” are present both in the higher-order struc-tures and the dyadic edges, while in the second case, the “community signals” are presentonly in the higher-order structures but not in the dyadic edges. Drawing a parallel withthe classical SBM, where intra- and inter-community edges are generated via Erd¨os-R´enyigraphs, both the intra- and inter-community edges in SupSBM are generated by superim-posed random graph models ( G s ) as defined in the previous section.We formally define a graph with n vertices and k communities generated from a SupSBMas follows. Each vertex of the graph is assigned a community label vector of length k , whichtakes the value of 1 at the position corresponding to its community and 0 at all otherpositions. To organize the labels, we keep track of an n × k community assignment matrix C whose i th row C i is the community label vector for the i th vertex. Given the community5ssignments for all the vertices in the graph, the triangle hyperedge indicators T ijk involvingthree distinct vertices i , j , k are (conditionally) independent, and they follow a Bernoullidistribution with a parameter that depends only on the community assignments, i.e., P ( T ijk = 1 | C ip = 1 , C jq = 1 , C kl = 1) = π tpql , p, q, l ∈ { , . . . , k } , where π t is a 3-way k × k × k tensor of parameters. The triangle hyperedges naturally reduceto a triangle, and as before, multi-edges are collapsed to form the graph G t .An edge between two vertices i and j is generated independently of other edges andhyperedges following a Bernoulli distribution with a parameter that also depends on thecommunity assignments, so that the edge indicator variable E ij satisfies P ( E ij = 1 | C ip = 1 , C jq = 1) = π epq , p, q ∈ { , . . . , k } , where π e is a k × k matrix of model parameters. For the case that the community structureis present only in the higher-order structures and not at the level of dyadic edges, thisparameter equals p e irrespective of the communities that the vertices i and j belong to. Thedesired graph is obtained by superimposing G t and G e following the process described in theprevious section.The above described model contains a large number of unknown parameters, and toenable a more tractable analysis, we proceed as follows. We define the stochastic blockmodel on 3 − hyperedges in the following manner: P ( T ijk = 1 | C i , C j , C k ) = (cid:40) a t n , if C i = C j = C kb t n , otherwise , so that the probability of a triangle hyperedge equals a t n if the three vertices involved arein the same community, and b t n if at least one of the vertices is in a different communitythan the other two. The dyadic edges are generated according to the following rule: theprobability of an edge is a e n if both the end points belong to the same community and b e n ifthey belong to different communities.Another simplification consists in assuming that all communities are of the same size,leading to balanced n -vertex k -block SupSBMs, G s ( n, k, C, a e , b e , a t , b t ), in which all the k communities have nk vertices and the matrix C is an n × k community assignment matrix. Spectral clustering methods for hypergraphs, also known as higher-order spectral clusteringmethods, have been studied in a number of recent papers (Zhou et al. 2006; Benson et al.2016; Tsourakakis et al. 2017; Li and Milenkovic 2017). In particular, Benson et al. (2016)introduced a method that creates a “motif adjacency matrix” for each motif structure ofinterest. In a motif adjacency matrix, the ( i, j )th element represents the number of motifsthat include the vertices i and j . Spectral clustering is applied to the motif adjacency matrixin a standard form in order to find communities of motifs. While there are many variantsof spectral clustering that may be applied to the motif adjacency matrices, throughout our6nalysis, we investigate only one algorithm, which computes the k eigenvectors correspondingto the k largest in absolute value eigenvalues of the motif adjacency matrix. The algorithmsubsequently performs a (1 + (cid:15) )-approximate, (cid:15) > k -means clustering (Kumar et al. 2004;Lei et al. 2015) on the rows of the resultant n × k matrix of eigenvectors. Furthermore,we only consider two motif adjacency matrices, involving edges and triangles. The primarygoal of our analysis is to describe how to detect the community structures of the SupSBMfrom observed triangle patterns using spectral clustering. We will consider both versionsof SupSBM, namely, one with community structure present only at the triangle level, andanother with community structure present both at the triangle and edge levels. In whatfollows, we first prove a number of concentration results for certain motif adjacency matricesunder the more general inhomogeneous superimposed random graph model. Subsequently,we specialize our analysis to the SupSBMs. Let G ∼ G s ( n, P e , P t ) be a graph generated from the inhomogeneous superimposed edge-triangle random graph model. We introduce two matrices, A E and A T , respectively; ( A E ) ij represent the number of observed edges between the vertices i and j , while ( A T ) ij representsthe number of observed triangles including both i and j as vertices. Note these matricesare not the motif adjacency matrices of G e and G t , since there are edges in G t that con-tribute to A E and triangles from G e that contribute to A T . There may be many “incidentallygenerated” or imposed triangles (Chandrasekhar and Jackson 2014) that arise due to super-imposition which also contribute to A T . The different scenarios are depicted in Figure 2.For our analysis, we also introduce the following two matrices: • A E : the adjacency matrix of edges in G e ; here, ( A E ) ij = E ij . • A T : the adjacency matrix of triangle motifs in G t ; here, ( A T ) ij = (cid:80) k T ijk . As noted in Chandrasekhar and Jackson (2014), there are four classes of matrices neededto describe the model, namely(a) A E : the motif adjacency matrix of all triangles formed by random edges from G e . Thegenerative random variable for these triangles reads as: E ijk = E ij E jk E ik , and ( A E ) ij = (cid:80) k E ij E jk E ik .(b) A T : the motif adjacency matrix of all triangles formed by three intersecting trianglesfrom G t . The generative random variable for these triangles reads as: T ijk = (1 − T ijk )1( (cid:88) k (cid:54) = k T ijk > (cid:88) k (cid:54) = i T jkk > (cid:88) k (cid:54) = j T ikk > , and ( A T ) ij = (cid:80) k T ijk . 7a) (b) (c) (d)Figure 2: Imposed triangles generated through the superimposition of edges and triangles:(a) E , (b) T , (c) T E , and (d) T E .(c) A T E : the motif adjacency matrix of all triangles formed by two triangles from G t andone edge from G e . The generative random variable for these triangles reads as: T E ijk = (1 − T ijk )1( (cid:88) k (cid:54) = k T ijk > ∩ E ij = 0)1( (cid:88) k (cid:54) = i T jkk > ∩ E jk = 0)1( (cid:88) k (cid:54) = j T ikk = 0 ∩ E ik = 1) , and ( A T E ) ij = (cid:80) k T E ijk .(d) A T E : the motif adjacency matrix of all triangles formed by one triangle from G t andtwo edges from G e . The generative random variable for these triangles reads as: T E ijk = (1 − T ijk )1( (cid:88) k (cid:54) = k T ijk > ∩ E ij = 0)1( (cid:88) k (cid:54) = i T jkk = 0 ∩ E jk = 1)1( (cid:88) k (cid:54) = j T ikk = 0 ∩ E ik = 1) , and ( A T E ) ij = (cid:80) k T E ijk .Note that except for case (a), an imposed triangle involving the vertices ( i, j, k ) arises onlyif there is no model-created triangle involving ( i, j, k ) already present. Hence, the definitionsof each of the random variables T , T E and T E include (1 − T ijk ) as a factor that indicatesthis dependence. For case (a), since we allow a multiedge between two vertices that are bothinvolved in a triangle hyperedge and an edge, it is possible to have an imposed triangle inaddition to a model-generated triangle on the same triple of vertices.8ith these definitions, we have that the triangle adjacency matrix reads as A T = A T + A T E + A T E + A T + A T E , capturing both model-based and imposed triangles. Obviously, we only observe the matrices A E and A T and not their specific constituents, as in real networks we do not have labelsdescribing how an interaction is formed. Hence, even though the community structure ismost explicitly described by A T , we need to analyze how this matrix reflects on A T andwhat the properties of the latter matrix are based on A T . Then, the expectation of A T equals E [ A T ] = E [ A T ] + E [ A T E ] + E [ A T E ] + E [ A T ] + E [ A T E ] , where all operators are used component-wise. We start with some notation. Let p e max = max i,j p eij and p t max = max i,j,k p tijk , denote the maximum probability of edge inclusion in G e and triangle hyperedge inclusionin G t , respectively. As noted by Chandrasekhar and Jackson (2014), in the superimposedrandom graph framework, the generative probabilities summarized in the two matrices P t and P e must satisfy certain conditions in order to ensure that the imposed triangles donot significantly outnumber the generative triangles. Accordingly, we impose the followingasymptotic growth conditions on p t max and p e max : c log nn ≤ p e max < c n / − (cid:15) n , (3.1) c (log n ) n < p t max < c n / − (cid:15) n , (3.2)and p t max > c p e max log nn for some (cid:15) > c , c , and c independent of n . A typicalexample are the following two growth rates: p e max = O ( log nn ) and p t max = O ( n / n ). Note thatthe asymptotic growth bounds are required only for the analysis of superimposed randomgraphs under the SupSBM. We do not require these relations to hold for results regardingregular SBMs or 3-uniform hypergraph SBMs. Hence, we will not make any assumptions onthe asymptotic growth bounds for p t max and p e max until Theorem 6.The following five results, summarized in Theorems 1 to 5, provide non-asymptotic errorbounds that hold in more general settings, as described in the statements of the respectivetheorems. Note that we make repeated use of the symbols c or r to represent different genericconstants as needed in the proofs in order to avoid notational clutter.It is well-known that the Frobenius norm (cid:107) A E − P E (cid:107) is bounded by c √ ∆ withprobability at least 1 − n − r (Lei et al. 2015; Gao et al. 2015; Chin et al. 2015), where∆ = max { np e max , c log n } . The following theorems establish similar upper bounds for othercomponent matrices involved in our analysis as well as a bound on (cid:107) A T − P T (cid:107) . The proofsof all theorems are delegated to the Appendix.9 .1.2 Concentration bounds for A T Theorem 1.
Let G t ( n, P t ) be a -uniform hypergraph in which each possible 3-hyperedge isgenerated according to a Bernoulli random variable T ijk with parameter p tijk , independent ofall other 3-hyperedges. Let A T , as before, stand for the triangle-motif adjacency matrix.Furthermore, let ∆ t = max { n p t max , c log n } . Then, for some constant r > , there exists aconstant c ( c, r ) > such that with probability at least − n − r , one has (cid:107) A T − E [ A T ] (cid:107) ≤ c (cid:112) ∆ t . Note that in the above bound, ∆ t may be interpreted as being an approximation of themaximum expected “triangle degree” of vertices in G t . Drawing a parallel with adjacencymatrices of graphs, one may define the “degree” of a row of an arbitrary matrix as the sumof the elements in that row. Then, ∆ t is an upper bound on the degree of a row in the matrix A T , much like ∆ is an upper bound for the degrees of the rows in A E . The above resultfor triangle-motif adjacency matrices is hence an analogue of a similar result for standardadjacency matrices described in Lei et al. (2015); Gao et al. (2015); Chin et al. (2015). Thearguments used to prove the result in the cited papers are based on an (cid:15) − net analysis ofrandom regular graphs laid out in Friedman et al. (1989); Feige and Ofek (2005). We extendthese arguments to the case of triangle hyperedges; due to the independence of the randomvariables corresponding to the hyperedges involved in all sums of interest, we do not requirenew concentration inequalities to establish the claim. This is not the case for the results tofollow. E Next, we derive an upper bound for the spectral norm of the matrix A E − E [ A E ]. Notethat the elements of the matrix A E are dependent and consequently, the sums of the ran-dom variables used in the (cid:15) − net approach include dependent variables. Hence, the (cid:15) − netapproach cannot be applied directly, and several substantial modifications in the proofs areneeded. However, each element of A E is a low-degree polynomial of the generic independentrandom variables E ij . Therefore, we show that in all the sums of dependent random variablesof our interest, the dependencies between the random variables are limited, and the numberof co-dependent random variables are significantly smaller than that of all the variables inthe sum. In what follows, we build upon recent advances in concentration inequalities forfunctions of independent random variables (Warnke 2016) and sums of dependent randomvariables (Warnke 2017) to derive concentration bounds for A E .Let τ max = max { n ( p e max ) , c log n } , ∆ E = max { n ( p e max ) , c (log n ) } and D E = np e max τ =max { n ( p e max ) , c np e max (log n ) } and assume np e max ≥ c log n . We have the following result. Theorem 2.
Let G e ( n, P e ) be an inhomogeneous random graph in which each edge is inde-pendently generated by a Bernoulli random variable E ij with parameter p eij , i, j = 1 , . . . , n .Let D E = max { n ( p e max ) , cnp e max (log n ) } for some constant c . Then, for some constant r > , there exists a constant c ( c, r ) > such that with probability at least − n − r , (cid:107) A E − E [ A E ] (cid:107) ≤ c (cid:112) D E . (cid:15) − nets but also take into account theparticular dependencies between the random variables. The key observations are that twotriangles generated by edges from G e are dependent if they share an edge (see Figure 3(a)),and, with high probability, each triangle shares an edge with at most 2 τ max other triangles.The random variables whose sums we are interested in bounding represent such triangles.Note that our result is a stronger bound than the one actually needed for obtaining an upperbound for (cid:107) A T − P T (cid:107) under the asymptotic growth conditions of interest. Indeed, Propo-sition 1 stated in the Appendix automatically gives an upper bound of the form O (∆ E ) for (cid:107) A E − E [ A E ] (cid:107) . While this loose bound would have been sufficient, we resorted to a morecareful analysis using the (cid:15) − net approach and the results of Theorem 1.3 of Warnke (2016)to arrive at a significantly improved bound O ( √ D E ). We also note that based on the resultsderived for A E and A T , the bound for the case when the elements of A E are mutuallyindependent should read as O ( √ ∆ E ). The current bound is worse than this bound by afactor of √ ∆, and it is not immediately clear how the latter bound can be improved further. For the next three results, we use the following property of the spectral norm of a squaresymmetric matrix. For any n × n square symmetric matrix X , define the spectral norm of X as (cid:107) X (cid:107) = σ max ( X ), the largest singular value of X , the 1-norm as (cid:107) X (cid:107) = max j (cid:80) i | X ij | ,and the ∞− norm as (cid:107) X (cid:107) ∞ = max i (cid:80) j | X ij | . Now assume X is an n × n symmetric matrixwhose elements are non-negative random variables. Let the entries of its expectation, E [ X ],also be non-negative. Then, (cid:107) X − E [ X ] (cid:107) ≤ (cid:112) (cid:107) X − E [ X ] (cid:107) (cid:107) X − E [ X ] (cid:107) ∞ = (cid:107) X − E [ X ] (cid:107) = max i (cid:88) j | X ij − E [ X ] ij | ≤ max i (cid:88) j X ij + max i (cid:88) j E [ X ] ij , (3.3)where the first inequality is Corollary 2.3.2 in Golub and Van Loan (2012), and the secondequality follows since X − E [ X ] is a symmetric matrix by assumption. Note the first termin the final sum is the degree of row i of the matrix X . Hence, a high-probability bound onthe maximum degree will allow us to upper bound this quantity. The second term equalsthe maximum expected degree of X , which is a deterministic quantity. For the next threetheorems in this section we assume n p t max > c (log n ) and np e max > c (cid:48) log n . Theorem 3.
Let G ∼ G s ( n, P e , P t ) be a graph generated by the superimposed random graphmodel. Let ∆ T = max { n ( p t max ) , c (log n ) } , where c is some constant. Then, there exists a constant c (cid:48) > , such that with probability atleast − o (1) , one has (cid:107) A T − E [ A T ] (cid:107) ≤ c (cid:48) ∆ T . i , (a) E (b) T (c) T E of type 1, (d) T E of type 2, (e) T E of type 1, and (f) T E oftype 2. Theorem 4.
Let G ∼ G s ( n, P e , P t ) be a graph generated by the superimposed random graphmodel. Let ∆ T E = max { n ( p t max ) p e max , c (log n ) } , where c is some constant. Then, there exists a constant c (cid:48) > , such that with probability atleast − o (1) , one has (cid:107) A T E − E [ A T E ] (cid:107) ≤ c (cid:48) ∆ T E . Theorem 5.
Let G ∼ G s ( n, P e , P t ) be a graph generated by the superimposed random graphmodel. Let ∆ T E = max { n p t max ( p e max ) , c (log n ) } , here c is some constant. Then, there exists a constant c (cid:48) > , such that with probability atleast − o (1) , one has (cid:107) A T E − P T E (cid:107) ≤ c (cid:48) ∆ T E . The proofs of all three above results follow a similar outline. In each case, the degree ofa row i is a sum of dependent triangle-indicator random variables for triangles that includevertex i . However, in each case we carefully enumerate the events that lead to two suchincidental triangle indicator random variables to be dependent and show that the numberof such cases is limited with high probability. This allows us to apply Theorem 9 of Warnke(2017) in an iterative manner and obtain concentration results on the respective sums. Whilewe relegate technically involved rigorous proofs to the Appendix, we graphically illustrate allthe events that lead to the dependencies that we need to consider in Figure 3. For the familyof random variables { ( T ) ijk , j, k ∈ { , . . . , n }} , two variables are dependent if and only if oneof the triangles from G t covering ( T ) ijk has an edge ij (cid:48) or ik (cid:48) and consequently also covers( T ) ij (cid:48) k (cid:48) (see Figure 3(b)). For the family of random variables { ( T E ) ijk , j, k ∈ { , . . . , n }} ,two variables are dependent if either one of the edges ij or ik is covered by a triangle from G t and the same triangle edge-intersects ( T E ) ij (cid:48) k (cid:48) (see Figure 3(c)), where one of ij or ik is an edge in G e and is also an edge in ( T E ) ij (cid:48) k (cid:48) (see Figure 3(d)). Finally, for the familyof random variables { ( T E ) ijk , j, k ∈ { , . . . , n }} , two variables are dependent if either oneof the edges ij or ik is covered by a triangle from G t and the same triangle edge-intersects( T E ) ij (cid:48) k (cid:48) (see Figure 3(e)), or, one of the two edges ij and ik is generated by G e and is alsoan edge covered by ( T E ) ij (cid:48) k (cid:48) (see Figure 3(f)).We also note that the bounds in the previous three results can be improved by applyingthe (cid:15) − net approach for dependent random variables as used in Theorem 2. However, theabove stated upper bounds suffice to obtain the desired concentration bound for A T , assummarized in the following subsection. A T In the next theorem we combine the previous results to arrive at a concentration bound forthe matrix A T under the assumptions made on p e max and p t max in (3.1) and (3.2). Theorem 6.
Let A T denote the triangle-motif adjacency matrix of a random graph G gener-ated by the inhomogeneous superimposed random graph model G s ( n, P e , P t ) . If ∆ t > c (log n ) for some constant c , and the assumptions (3.1) and 3.2) on p e max and p t max hold, then withprobability at least − o (1) , one has (cid:107) A T − E [ A T ] (cid:107) ≤ c (cid:48) (cid:112) ∆ t , where c (cid:48) is a constant independent of n . We note the similarity of the upper bound of this concentration inequality with that ob-tained for A T in Theorem 1. The above result then tells us that the effect of the incidentaltriangles on the concentration of A T is limited and the rate in the upper bound is predomi-nantly determined by the rate for A T . This suggests that while the superimposition process13nduces dependencies between the edges in G s through the presence of triangles from G t , themodel, under suitable sparsity conditions, is still mathematically tractable. The influence ofthe incidental triangles can be analyzed and controlled.Next, we turn our attention to analyzing random graphs generated by SupSBMs, andfocus in particular on quantifying the misclustering error rate under a higher-order spectralclustering algorithm. Let G ∼ G s ( C, n, k, a e , b e , a t , b t ) be a graph generated by the balanced n -vertex, k -blockSupSBM with a community assignment matrix C as defined before. Let ˆ C denote the n × k matrix of eigenvectors corresponding to the k largest absolute-value eigenvalues of thetriangle motif adjacency matrix A T . To obtain the community assignments for the vertices,a (1 + (cid:15) )-approximate k -means clustering, with (cid:15) >
0, is performed on the rows of ˆ C (Kumaret al. 2004; Lei et al. 2015).We define the misclustering error rate R as follows. Let ¯ e and ˆ e denote the vectorscontaining the true and estimated community labels of all the vertices in V . Then we define R = inf Π n n (cid:88) i =1 e i (cid:54) = Π(ˆ e i )) , where the infimum is taken over all permutations Π( · ) of the community labels.To bound the misclustering rate R , one needs to relate it to the difference between theestimated and the true eigenvectors. For this purpose, one can use the well-known Davis-Kahan Theorem (Davis and Kahan 1970; Stewart and Sun 1990) that characterizes theinfluence of perturbations on the spectrum of a matrix. For a symmetric matrix X , let λ min ( X ) stand for its smallest in absolute value non-zero eigenvalue. Since ˆ C n × k is the matrix of eigenvectors it has orthonormal columns, and hence we have the followingbound R ≤ n nk (cid:15) ) (cid:107) ˆ C − C ( C T C ) − / O(cid:107) F ≤ (cid:15) ) (cid:107) A T − E [ A T ] (cid:107) ( λ min ( E [ A T ]) , (3.4)where O is an arbitrary orthogonal matrix (Lei et al. 2015) and the last inequality arisesfrom the Davis-Kahan Theorem.Next, we derive a lower bound on λ min ( E [ A T ]). We start by computing the expectationsof the motif adjacency matrices A E , A T , and A E under the SupSBM. In all three cases,these expectations are of the form C (( g − h ) I k + h k Tk ) C T , where as before C denotesthe community assignment matrix, I k is the k -dimensional identity matrix, 1 k is the k -dimensional vector of all 1s, and g and h are functions of the parameters n, k, a e , b e , a t , b t .For matrices of the form C (( g − h ) I k + h k Tk ) C T , with g > h >
0, 1 k is an eigenvectorcorresponding to the eigenvalue nk ( g − h ) + nh, and the remaining non-zero eigenvalues areof the form nk ( g − h ) , where the values of g and h differ for the different matrices (Rohe et al.2011). Since nh >
0, the smallest non-zero eigenvalue equals nk ( g − h ).We start by analyzing A E . Clearly, E [ A E ] = C (cid:18) ( a e − b e ) n I k + b e n k Tk (cid:19) C T ,
14o that λ min ( E [ A E ]) = a e − b e k . Next, we note that the expected value of A T equals E [ A T ] ij = (cid:80) k (cid:54) = i,j p tijk . When C i = C j , i.e., when the vertices i and j are in the same community, then E [ A T ] ij = (cid:16) nk − (cid:17) a t n + ( k − nk b t n , while when C i (cid:54) = C j , E [ A T ] ij = ( n − b t n . The difference between the two above entities equals (cid:16) nk − (cid:17) a t n + ( k − nk b t n − ( n − b t n = (cid:16) nk − (cid:17) a t − b t n . Hence, E [ A T ] = C (cid:18)(cid:16) nk − (cid:17) a t − b t n I k + ( n − b t n k Tk (cid:19) C T . Consequently, λ min ( E [ A T ]) = nk (cid:16) nk − (cid:17) a t − b t n = (cid:16) nk − (cid:17) a t − b t k . (3.5)To determine E [ A E ], we first note that E [ A E ] ij = (cid:88) k (cid:54) = i,j p ij p jk p ik = p ij (cid:88) k (cid:54) = i,j p jk p ik . When C i = C j , E [ A E ] ij = a e n { (cid:16) nk − (cid:17) a e n + ( k − nk b e n } , while when C i (cid:54) = C j , E [ A E ] ij = b e n { (cid:16) nk − (cid:17) a e b e n + ( k − nk b e n } . The difference between the above two probabilities equals b e ( a e − b e ) n + ( a e + a e b e − b e ) ( a e − b e ) kn − a e ( a e + b e )( a e − b e ) n . Hence, E [ A E ] = Z (( b e ( a e − b e ) n + ( a e + a e b e − b e ) ( a e − b e ) kn − a e ( a e + b e )( a e − b e ) n ) I k + b e n (2( nk − a e b e n + ( k − nk b e n )1 k Tk ) Z T . Consequently, the smallest non-zero eigenvalue equals λ min ( E [ A E ]) = ( kb e + a e + a e b e − b e )( a e − b e ) k n − a e ( a e + b e )( a e − b e ) kn . (3.6)15 special case of the SBM that is widely analyzed in the literature is the balanced SBM,in which 2 n vertices are partitioned into two blocks. In our setting, balancing implies thatthe SupSBM model of interest has parameters G s ( Z, n, , a e , b e , a t , b t ), and it results in λ min ( E [ A T ]) = ( n − a t − b t ) and λ min ( E [ A E ]) = a e ( a e + b e )( a e − b e ) n − e ( a e + b e )( a e − b e ) n .We are now ready to state the main result of the paper. Theorem 7.
Let G ∼ G s ( C, n, k, a e , b e , a t , b t ) be a graph generated from the balanced k blockSupSBM. If ∆ t > c (log n ) , then with probability at least − o (1) , the misclustering rate ofcommunity detection using the higher-order spectral clustering method satisfies R T ≤ (cid:15) ) c ∆ t (( nk − a t − b t )) . Under the assumed growth rate on p t max we have n p t max > c (log n ) , and hence we canreplace ∆ t by n p t max . Under the k block balanced SupSBM model, p t max = a t n . Henceignoring the constants, we can rewrite the upper bound as R T (cid:46) k a t n ( a t − b t ) . As an example, if we assume a t = m t n / and b t = s t n / for constants m t and s t , then theresult implies that it is possible to detect the communities consistently as long as ( m t − s t ) = ω ( k n / ).For the special case of a 2 n -vertices and 2-block SupSBM G s ( C, n, , a e , b e , a t , b t ),we can further simplify this upper bound to R (cid:46) a t n ( a t − b t ) . (3.7)For comparison, we note that the result in Lei et al. (2015) for the misclustering rate ofspectral clustering with classical adjacency matrices under the SBM reads as R E (cid:46) a e ( a e − b e ) . In what follows, we analyze the performance of the higher-order spectral clustering under theuniform and non-uniform hypergraph SBMs (Ghoshdastidar et al. 2017; Chien et al. 2018).The balanced n -vertex k -block SBM G t ( C, n, k, a t , b t ) is defined in thefollowing way. All the k communities have an equal number of vertices s = nk , and theprobability of forming a triangle hyperedge equals a t n if all three vertices belong to the samecommunity, while the probability of forming a triangle hyperedge equals b t n if one of thevertices belongs to a different community than the other two.Non-uniform hypergraphs involve two types of hyperedges, edges and triangles, thatneed to be described separately. Note that this model differs from the superimposed randomgraph framework used throughout the paper as the observations are of the form of two-wayand three-way interactions between entities. Hence, we have a way to differentiate betweenan edge and a triangle hyperedge. The n -vertex k -block balanced non-uniform hypergraph16BM G H ( C, n, k, a e , b e , a t , b t ) is defined in the same way as a SupSBM, except that we do notreplace the generated triangle hyperedges with three ordinary edges and we do not collapsemultiedges.If we assume our observed graph is generated from a uniform hypergraph SBM on trianglehyperedges, then spectral clustering of the motif adjacency matrix is equivalent to spectralclustering based on A T only. Let ˆ C ( T ) be the matrix of eigenvectors corresponding to the k largest absolute eigenvalues of the matrix A T . Then, using the bound for A T in Theorem 1from Section 3.1.2, we arrive at the following result. Theorem 8.
Let G t be a triangle hypergraph generated from the k -block uniform trianglehypergraph SBM with parameters C, n, k, a t , b t . Then, with probability at least − n − c , themisclustering rate of the community assignments obtained using the higher-order spectralclustering algorithm applied to the triangle motif adjacency matrix equals R T ≤ (cid:15) ) (cid:107) A T − E [ A T ] (cid:107) ( λ min ( E [ A T ])) ≤ (cid:15) ) c ∆ t (( nk − a t − b t )) . We can simplify this upper bound under the assumption of a 2 n -vertex 2-block trianglehypergraph SBM G t ( C, n, , a t , b t ) to R T (cid:46) a t n ( a t − b t ) . Note that the concentration boundis smaller by a factor of n when compared to the same result for spectral clustering ofordinary edge-adjacency matrix in Lei et al. (2015), provided that the parameters a e , b e and a t , b t are comparable. Alternatively, the misclustering error rate using the triangle motifadjacency matrix for a graph generated from a triangle hypergraph SBM is better than thecorresponding rate from an edge-based adjacency matrix of a graph generated from SBM aslong as a t (cid:38) a e n .The above observation has important implication for non-uniform hypergraph SBMs. Todescribe why this is the case, assume that we are given a non-uniform hypergraph generatedfrom the 2 n -vertex 2-block balanced non-uniform hypergraph SBM G H ( C, n, k, a e , b e , a t , b t ) . The question of interest is: Given a e , b e , a t , b t , with a e (cid:16) b e and a t (cid:16) b t , should one use theedge-based adjacency matrix, the triangle-based adjacency matrix, or a combination thereof?Let a t (cid:16) a e δ , a t − b t = m a e − b e δ , a e (cid:16) b e , (3.8)so that asymptotically, the probabilities a e and b e are δ -times the probabilities a t and b t , whilethe difference between the probabilities a e − b e is δm times that of the difference between a t − b t .Clearly, δ captures the asymptotic difference between the densities of triangle hyperedgesand dyadic edges, while m captures the difference in the “communal” qualities betweenthese two types of hyperedges. Note that the notation for asymptotic equivalence ignores allconstants. Theorem 9.
Let G ∼ G H ( C, n, , a e , b e , a t , b t ) be a graph generated from the non-uniform hypergraph SBM. Assume the relationships between the probabilities a e , b e , a t , b t areas in Equation (3.8). Then, spectral clustering based on a triangle adjacency matrix has alower error rate than spectral clustering based on an edge adjacency matrix if δm n (cid:46) , anda higher error rate if δm n (cid:38) . m and δ , it is possibleto estimate them reliably and efficiently. To estimate δ , we only need to look at the ratioof the densities of the hyperedges. The expected degree density of triangle hyperedges is O ( na t ), while that of edges is O ( a e ). This implies that δ (cid:16) a e a t = n expected edge degreeexpected triangle degree . Hence ˆ δ = n average edge degreeaverage triangle degree is a “good” estimator of δ . To obtain an estimate of m ,we first cluster the vertices using spectral clustering on edges and triangles separately, andthen compute the respective probability parameters for intra- and inter-cluster connections.Then, we may use ˆ m = ˆ δ (ˆ a t − ˆ b t )ˆ a e − ˆ b e as an estimate of m .The above results also allow us to bound the error rate of spectral clustering of a weightedmotif adjacency matrix under the non-uniform hypergraph SBM. Let A W = A E + wA T bethe weighted sum of adjacency matrices of edges and triangle hyperedges with known relativeweight w >
0. Clearly, E [ A W ] = E [ A E ] + wE [ A T ] and the smallest non-zero eigenvalueof E ( A W ) is λ min ( E [ A W ]) = ( a e − b e ) + w ( n − a t − b t ). Then, with probability at least1 − o (1) we have (cid:107) A W − E [ A W ] (cid:107) ≤ (cid:107) A E − E [ A E ] (cid:107) + w (cid:107) A T − E [ A T ] (cid:107) (cid:46) √ ∆ + w (cid:112) ∆ t , and the error rate is upper bounded according to, R W (cid:46) (cid:32) √ ∆ + w √ ∆ t ( a e − b e ) + wn ( a t − b t ) (cid:33) (cid:16) (cid:18) √ a e + w √ na t ( a e − b e ) + wn ( a t − b t ) (cid:19) . When the asymptotic relationships of Equation (3.8) hold, we can further simplify thisexpression to R W (cid:46) (cid:32) (cid:112) nδ w mnδ w (cid:33) a e ( a e − b e ) . (3.9)While Theorem 9 suggests that depending upon the values of δ, m, n , either the edge-basedor triangle-based adjacency matrix has a lower error rate, in practice it might be beneficialfor numerical stability to use a weighted average of both of them. The result in Equation3.9 provides a bound for any weighted sum of these two hyperedge adjacency matrices. For the case of a classical SBM, we are only presented with G e but not G t . In this case, A E is the adjacency matrix of the graph G e , which we denoted by A E . The matrix A T is thetriangle motif adjacency matrix constructed from the triangles that arise due to G e , whichwe denoted by A E . The smallest non-zero eigenvalue λ min of E [ A E ] equals nk a e − b e n = a e − b e k .In Equation (3.6) we described the smallest non-zero eigenvalue for E [ A E ].Let ˆ C ( E ) and ˆ C ( E ) denote the n × k matrices of eigenvectors corresponding to the k largest in absolute value eigenvalues of the classical edge-based adjacency matrix andthe triangle-adjacency matrix, respectively. Using the bound for (cid:107) A E − E [ A E ] (cid:107) fromTheorem 2, Section 3.1.3, and the Davis-Kahan Theorem we have the following result.18 heorem 10. Let G e be a dyadic graph generated from the k -block balanced SBM withparameters C, n, k, a e , b e . Then, with probability at least − n − c , the misclustering rate ofthe community assignments obtained by higher-order spectral clustering equals R E ≤ (cid:15) ) (cid:107) A E − E [ A E ] (cid:107) ( λ min ( E [ A E ])) (cid:46) (cid:15) ) k n D E ( kb e + a e + a e b e − b e ) ( a e − b e ) . For the case k = 2 which is a widely analyzed setting in the SBM literature, one cansimplify the bound above. First, note that in this case λ min ( E [ A E ]) simplifies to a ( a − b )( a + b ) n − O ( n ), while D E = max { a n , c (log n ) } . Furthermore, since a e (cid:16) b e , we have a e (cid:16) a e + b e .Thus, R E (cid:46) n max { ( a e /n ) , (log n ) } a e ( a e − b e ) ( a e + b e ) (cid:16) max { a e , ( n /a e )(log n ) } ( a e − b e ) . (3.10)We conclude this section by evaluating the performance of spectral clustering on theweighted sum of the two motif adjacency matrices, the edge-based matrix A E and thetriangle-based matrix A E . For this purpose, let A W = A E + wA E be the weighted sumof motif adjacency matrices, where w > E ( A W ) = E [ A E ] + wE [ A E ]. Then, from the results of Theorem 2 and Theorem 5.2 of Lei et al. (2015), wehave that with probability at least 1 − o (1), it holds that (cid:107) A W − E [ A W ] (cid:107) ≤ (cid:107) A E − E [ A E ] (cid:107) + w (cid:107) A E − E [ A E ] (cid:107) (cid:46) √ ∆ + w (cid:112) D E . The smallest non-zero eigenvalue of E [ A W ] can be computed as λ min ( E [ A W ]) = ( a e − b e ) + w ( a e ( a e + b e )( a e − b e ) n − O (1 /n ) . From Equation (3.4), we have R W (cid:46) (cid:32) √ ∆ + w √ D E ( a e − b e ) + w ( a e ( a e + b e )( a e − b e ) n (cid:33) . Let us start by comparing the upper bound R E for higher-order spectral clustering underthe SBM obtained in Equation (3.10) with the corresponding upper bound for spectralclustering based on the edge-based adjacency matrix A E , which reads as R (cid:46) a e ( a e − b e ) (Leiet al. 2015). The bound based on the triangle motif adjacency matrix A E is essentially equalto the bound based on the edge adjacency matrix as long as a e (cid:38) n / (cid:15) , or equivalently,as long as p e max (cid:38) n / (cid:15) n . However, when a e grows slower than this rate, the performanceguarantees for spectral clustering based on the motif adjacency matrix is worse than thecorresponding bound based on the edge adjacency matrix. This result is intuitively justifiedas we expect very few triangles in a sparse dyadic graph. The presence of a triangle isa random phenomenon rather than an indicator of community structure. Hence, using19riangles for community detection could lead to unwanted errors unless the graph is dense.For a e (cid:38) n / (cid:15) , we say that the graph is “triangle-dense” and in this case one can use thetriangle-adjacency matrix for community detection. When this condition is not satisfied,we either need to perform some form of regularization or completely dispose of the trianglebased adjacency matrix.However, as previously observed, real world networks contain more triangles and higher-order structures, and consequently have a higher level of local clustering than one wouldexpect from the SBM. Hence, the SupSBM is a more appropriate model for networks withcommunity structures. The upper bound on the misclustering rate in Equation (3.7) suggeststhat spectral clustering based on higher-order structures can consistently detect communitiesunder the SupSBM. In fact, if a e (cid:16) a t , then the underlying upper bound is smaller by a factorof n compared to that of the spectral clustering under the standard SBM. This suggests thateven though spectral clustering based on higher-order structures may not be appropriatefor the SBM, it offers improved performance for the SupSBM. Note that as we focus onhigher-order structures based spectral clustering, we did not analyze edge-adjacency matrixbased spectral clustering under the SupSBM. We hence cannot compare the misclusteringrate of spectral clustering on the triangle-adjacency matrix with that of the edge-adjacencymatrix under SupSBM, as the later does not follow directly from the existing results, e.g., Leiet al. (2015) (due to the fact that the observed edge-adjacency matrix has edges generatedby triangles from G t in addition to edges from G e ). Nevertheless, our analysis of the non-uniform hypergraph SBM, especially Theorem 9, describes the error rate tradeoff betweenspectral clustering with edge-based and triangle-based adjacency matrices. We test the effectiveness of spectral clustering using a weighted sum of adjacency and Lapla-cian matrices for higher-order structures on three benchmark network datasets. In particular,we choose to work with a uniformly weighted edge-triangle adjacency matrix, A W = A E + A T ,where A E and A T are the observed edge and triangle adjacency matrices defined earlier. Thenormalized Laplacian matrix is obtained as L w = D − / w A W D − / W , where D W is a diagonalmatrix such that ( D W ) ii = (cid:80) j ( A W ) ij . We compare the performance of various known formsof spectral clustering methods based on edge-based matrices, namely those using adjacencymatrices (spA), normalized Laplacian matrices (spL), and regularized normalized Laplacianmatrices (rspL) (Sarkar et al. 2015; Chin et al. 2015; Qin and Rohe 2013) with their weightedhigher-order structure counterparts, hospA, hospL and horspL, respectively. In all six in-stances of the spectral clustering, the eigenvectors are row-normalized before applying thek-means algorithm. Table 1 summarizes the performance of the methods. Political blogs data.
The political blog datasets (Adamic and Glance 2005), collectedduring the 2004 US presidential election, comprise 1490 political blogs with hyperlinks be-tween them, giving rise to directed edges. These benchmark datasets have been analyzedby a number of authors (Karrer and Newman 2011; Amini et al. 2013; Qin and Rohe 2013;Joseph and Yu 2016; Jin 2015; Gao et al. 2017; Paul and Chen 2016) in order to test com-munity detection algorithms. Following previous approaches, we first convert directed edgesinto undirected edges by assigning an edge between two vertices if there is an edge between20able 1: The number of misclustered vertices for various spectral community detection algo-rithms that use different forms of weighted higher-order matrices. Performance is evaluatedbased on a known ground truth model.Dataset spA hospA spL hospL rspL horspLPolitical blogs 63 71 588 59 64 64Karate club 0 0 1 0 0 0Dolphins 2 2 2 1 2 1them in either direction and consider the largest connected component of the resultant graphwhich contains 1222 vertices. The ground truth community assignment used for comparisonssplits the graph in two groups, liberal and conservative, according to the political leanings ofthe blogs. We note the hospA and horspL are competitive with the corresponding edge basedmethods spA and rspL, respectively. However, for spectral clustering based on the normal-ized Laplacian matrix, the edge-based method spL completely fails to detect the communitystructure due to well-documented reasons described in Qin and Rohe (2013); Jin (2015);Joseph and Yu (2016); Gao et al. (2017). On the other hand, hospL succeeds in splitting thegraph into two communities with only 59 misclustered vertices.
Karate club data.
The Zachary’s karate club data (Zachary 1977) is another frequentlyused benchmark dataset for network community detection (Newman and Girvan 2004; Bickeland Chen 2009; Jin 2015). The network describes friendship patterns of 34 members of akarate club and the ground truth splits club members into two subgroups. The method spLmisclusters one vertex, while all other methods manage to recover the communities in anerror-free manner.
Dolphin social network data.
This dataset describes an undirected social networkinvolving 62 dolphins in Doubtful Sound, New Zealand, curated by Lusseau et al. (2003).Over the course of the study the group split into two due to departure of a “well connected”dolphin. These two subgroups are used as the ground truth. From Table 1, one can see thatonly hospL and horspL miscluster one dolphin, while all the remaining methods misclustertwo dolphins.
We proposed and analyzed a superimposed stochastic block model, a mathematically tractablerandom graph model with community structure, that produces networks with properties sim-ilar to that observed in real networks. In particular it can generate sparse networks withshort average path length (small-world), strong local clustering, and community structure.To produce the strong local clustering property, the model allows for dependencies amongthe edges, yet remaining mathematically suitable for analysis of algorithms. While not pur-sued here, a degree correction to the model similar to that of degree corrected stochasticblock model is expected to produce networks with highly heterogeneous degree distribu-tion (power law), hub nodes and core-periphery structure while simultaneously retaining theaforementioned properties. We hope to extend the model in that direction in a future work.We have also analyzed the performance of the higher-order spectral clustering algorithm21nder the proposed SupSBM. This analysis showed that it is possible to mathematicallyanalyze community detection algorithms under the supSBM, and that the method can detectcommunity structure consistently for graphs generated from the SupSBM. In future, we hopeto determine minimax rates of error of community detection under the SupSBM and obtainalgorithms that achieve those rates.
Appendix A
In the Appendices we will use r and c to represent generic constants whose values will bedifferent for different results. Proof of Theorem 1
Proof.
We follow and extend the arguments in the proof of a similar result for standardadjacency matrices in Lei et al. (2015); Gao et al. (2017), and Chin et al. (2015) to the caseof triangle-motif adjacency matrices. The arguments in all of the above mentioned papersrely on the use of (cid:15) − nets on random regular graphs (Friedman et al. 1989; Feige and Ofek2005).Let S denote the unit sphere in the n dimensional Euclidean space. An (cid:15) − net of thesphere is defined as follows: N = { x = ( x , . . . , x n ) ∈ S : ∀ i, (cid:15) √ nx i ∈ Z } , where Z denotes the set of integers. Hence, N is a set of grid points of size (cid:15) √ n spanning alldirections within the unit sphere. For our analysis we only use (cid:15) = 1 / − nets of spheres andhenceforth use N to denote such nets.Next, we recall Lemma 2.1 of Lei et al. (2015) which established that for any W ∈ R n × n ,one has (cid:107) W (cid:107) ≤ x,y ∈N | x T W y | . Hence, a constant-approximation upper bound for (cid:107) A T − E [ A T ] (cid:107) may be found by optimizing | x T ( A T − E [ A T ]) y | over all possible pairs( x, y ) ∈ N . In addition, note that x T ( A T − E [ A T ]) y = (cid:88) i,j x i y j ( A T − E [ A T ]) ij = (cid:88) i,j (cid:88) k (cid:54) = i,j x i y j ( T ijk − E [ T ijk ]) . (5.1)We now divide the pairs ( x i , y j ) into two sets, the set of light pairs L and the set of heavypairs H , according to L = { ( i, j ) : | x i y j | ≤ √ ∆ t n } ,H = { ( i, j ) : | x i y j | > √ ∆ t n } , where ∆ t is as defined in the statement of the theorem.We bound the term x T ( A T − E [ A T ]) y separately for the light and heavy pairs, assummarized in the following two lemmas. 22 emma 1. (Light pairs) For some constant r > , there exists a constant c ( r ) > , suchthat with probability at least − exp( − r n ) , sup x,y ∈ T | (cid:88) ( i,j ) ∈ L (cid:88) k x i y j ( T ijk − E [ T ijk ]) | < c ( r ) (cid:112) ∆ t . Whenever clear from the context, we suppress the dependence of the constants on otherterms (e.g., c ( r ) = c .)To obtain a similar bound for heavy pairs, we first note thatsup x,y ∈ T | (cid:88) ( i,j ) ∈ H (cid:88) k x i y j w ijk | ≤ sup x,y ∈ T | (cid:88) ( i,j ) ∈ H (cid:88) k x i y j a ijk | + sup x,y ∈ T | (cid:88) ( i,j ) ∈ H (cid:88) k x i y j p ijk | . (5.2)The second term can be easily bounded as follows: | (cid:88) ( i,j ) ∈ H (cid:88) k x i y j p ijk | ≤ (cid:88) ( i,j ) ∈ H (cid:88) k x i y j | x i y j | p ijk ≤ n √ ∆ t (cid:88) k max i,j,k ( p ijk ) (cid:88) i,j x i y j ≤ n √ ∆ t ∆ t n ≤ (cid:112) ∆ t . How to bound the first term is described in the next Lemma 2.
Lemma 2.
For some constant r > , there exists a constant c ( r ) > such that withprobability at least − n − r , (cid:80) ( i,j ) ∈ H (cid:80) k x i y j T ijk ≤ c √ ∆ t . Combining the results for the light and heavy pairs, we find that with probability at least1 − n − r , (cid:107) A T − E [ A T ] (cid:107) ≤ x,y ∈ T | x T ( A T − E [ A T ]) y | ≤ c (cid:112) ∆ t . This completes the proof of Theorem 1.
Proof of Theorem 2
Proof.
As before, we create an (cid:15) − net N for the unit sphere and separately analyze the lightand heavy pairs. In this setting, the pairs are defined according to L = { ( i, j ) : | x i y j | ≤ √ D E nτ max } and H = { ( i, j ) : | x i y j | > √ D E nτ max } , with D E as defined in the statement of the Theorem.For the light pairs, we can prove the following result.23 emma 3. (Light pairs) For some constant r > , there exists a constant c ( r ) > suchthat with probability at least − n − r , sup x,y ∈N | (cid:88) ( i,j ) ∈ L x i y j ( A E − E [ A E ]) ij | < c (cid:112) D E . To bound the contribution of the heavy pairs, we once again divide the sum into twoterms.First, let W E = ( A E − E [ A E ]) and note that max i,j ( E [ A E ]) ij ≤ n ( p e max ) . Then,sup x,y ∈N | (cid:88) ( i,j ) ∈ H x i y j ( W E ) ij | ≤ sup x,y ∈N | (cid:88) ( i,j ) ∈ H x i y j ( A E ) ij | + sup x,y ∈N | (cid:88) ( i,j ) ∈ H x i y j ( E [ A E ]) ij | . (5.3)The second term can be bounded as follows: | (cid:88) ( i,j ) ∈ H x i y j ( E [ A E ]) ij | ≤ (cid:88) ( i,j ) ∈ H x i y j | x i y j | ( E [ A E ]) ij ≤ nτ max √ D E max ij ( E [ A E ]) ij (cid:88) ( i,j ) x i y j ≤ nτ max √ D E n ( p e max ) ≤ D E √ D E ≤ (cid:112) D E , where the penultimate inequality follows since if τ max = n ( p e max ) , then nτ max n ( p e max ) = n ( p e max ) ≤ D E . In addition, if τ max = log n , then n ( p e max ) ≤ log n . Consequently, nτ max n ( p e max ) = n ( p e max ) n ( p e max ) log n ≤ n ( p e max )(log n ) ≤ D E . For the first term in Equation (5.3) we have the following result.
Lemma 4.
For some constant c > , there exists a constant c ( c ) > such that withprobability at least − n − c , (cid:80) ( i,j ) ∈ H (cid:80) k x i y j ( A E ) ij ≤ c √ D E . Combining the results for the light and heavy pairs, we obtain (cid:107) A E − E [ A E (cid:107) ≤ x,y ∈N | x T ( A E − E [ A E ]) y | ≤ c (cid:112) D E . Proof of Theorem 3
Proof.
The proof of this result and those of Theorems 4 and 5 will repeatedly use Theorem9 of Warnke (2017), which we reproduce here for ease of reference.24 roposition 1 (Theorem 9 of Warnke (2017)) . Let ( Y i ) , i ∈ I be a collection of non-negativerandom variables with (cid:80) i ∈I E ( Y i ) ≤ µ . Assume that ∼ is a symmetric relation on I suchthat each Y i with i ∈ I is independent of { Y j : j ∈ I , j (cid:28) i } . Let Z C = max (cid:80) i ∈J Y i , wherethe maximum is taken over all sets J ⊂ I such that max j ∈J (cid:80) i ∈J ,i ∼ j Y i ≤ C . Then for all C, t > we have P ( Z C ≥ µ + t ) ≤ min (cid:110) exp (cid:18) − t C ( µ + t/ (cid:19) , (cid:18) t µ (cid:19) − t/ C (cid:111) . For any vertex i , define the degree of i in the matrix A T according to( d T ) i = (cid:88) j (cid:88) k T ijk . The expectation of the degree may be bounded as E [( d T ) i ] = E [ (cid:88) j (cid:88) k (1 − T ijk )1( (cid:88) k (cid:54) = k T ijk > (cid:88) k (cid:54) = k T jkk > (cid:88) k (cid:54) = k T ikk > ≤ (cid:88) j (cid:88) k P ( (cid:88) k (cid:54) = k T ijk > P ( (cid:88) k (cid:54) = k T jkk > P ( (cid:88) k (cid:54) = k T ikk > ≤ (cid:88) j (cid:88) k ( np t max ) ≤ n ( p t max ) ≤ ∆ T , where the second inequality follows since P ( (cid:88) k (cid:54) = k T ijk > ≤ P ( ∪ k (cid:54) = k { T ijk = 1 } ) ≤ ∪ k (cid:54) = k P ( { T ijk = 1 } ) ≤ np t max . Let I i = { ( T ) ijk , j = { , . . . , n } , k = { , . . . , n }} denote the set of all triangles incidentto vertex i and generated incidentally by three other triangles in G t . Observe that theset { T } comprises elements that are indicator random variables indexed by θ = { i, j, k } corresponding to incidentally generated triangle. Consequently, two random variables in thefamily ( T ) θ , when restricted to the set I i , are dependent if and only if one of the trianglescreating the edges ik or ij of ( T ) ijk includes j (cid:48) or k (cid:48) as a vertex and is consequently part of( T ) ij (cid:48) k (cid:48) (see Figure 3(b)). We refer to an event corresponding to the above described scenarioas T C and note that this event also accounts for the case when we have T ij (cid:48) k “sharing” theedge ik with T ijk .In summary, the set I i contains O ( n ) dependent random variables, with ( d T ) i denotingthe sum of all the random variables in the set I i . However, in what follows we show thatthe dependence is “limited” in the sense that we can limit the number of other variablesdependent on one random variable in the set I i to O ( n ( p t max ) ) with high probability.We characterize the event T C through an associated indicator random variable. Weshow that the number of incidentally generated triangles T ij (cid:48) k (cid:48) that give rise to TC eventsis bounded, provided that certain “good events” occur with high probability. For this pur-pose, let T ij (cid:48) k be a triangle in G t leading to the creation of an incidental triangle T ijk (see25igure 3(b)). To create the incidental triangle T ij (cid:48) k (cid:48) , we also require the existence of at leasttwo triangles from G t with edges ik (cid:48) and jk (cid:48) . We capture this event through its indicatorvariable V j (cid:48) k (cid:48) = (1 − T ij (cid:48) k (cid:48) ) T ij (cid:48) k (cid:88) k (cid:48)(cid:48) (cid:54) = i T j (cid:48) k (cid:48) k (cid:48)(cid:48) > (cid:88) k (cid:48)(cid:48)(cid:48) T ik (cid:48) k (cid:48)(cid:48)(cid:48) > . Observe that one can think of V j (cid:48) k (cid:48) , as the random variable T ij (cid:48) k (cid:48) given that T ijk = 1 (seeFigure 3(b)). Consequently, for any T ijk ∈ I i , the number of incidentally generated triangles T ij (cid:48) k (cid:48) that also belong to the set I i and contribute to the occurrence of the event T C is atmost 2 (cid:80) j (cid:48) ( (cid:80) k (cid:48) V j (cid:48) k (cid:48) ).Next, we define a “good event” as Γ = Γ ∩ Γ , where Γ and Γ are two events that forany i, j, k may be described as follows:Γ = { For an edge ij there are at most V max = max { n ( p t max ) , (log n ) } vertices k (cid:48) such that the edges ik (cid:48) and jk (cid:48) are introduced by triangles from G t } , Γ = { The number of triangles in G t sharing an edge ij is at most 3 W max = 3 max { np t max , log n }} . Hence, the event Γ essentially asserts that there are 6 W max choices for the value of j (cid:48) ,and given a j (cid:48) , the event Γ asserts that there are V max choices for a k (cid:48) . Consequently, underthe “good event” Γ the number of triangles in I i on which a triangle T ijk depends on is2 (cid:80) j (cid:48) (cid:80) k (cid:48) V j (cid:48) k (cid:48) ≤ V max W max .Next, define a set J ⊂ I i as follows: J = { θ ∈ I i : max θ ∈ J | θ ∈ J ; θ and θ are dependent | ≤ V max W max } . In a nutshell, the sets J are collections of θ ’s such that no T θ is dependent on more than6 V max W max other T θ ’s. Let θ ∼ θ state that the two underlying random variables indexedby θ and θ are dependent. Then we have,max θ ∈ J (cid:88) θ ∈ J : θ ∼ θ T θ = 2 (cid:88) j (cid:48) (cid:88) k (cid:48) V j (cid:48) k (cid:48) ≤ V max W max , E [ (cid:88) θ ∈ I i T θ ] = E [( d T ) i ] ≤ ∆ T . Applying Proposition 1 for t = µ = ∆ T leads to P (max J (cid:88) θ ∈ J T θ ≥ T ) ≤ min (cid:26) exp (cid:18) − ∆ T V max W max (∆ T + ∆ T / (cid:19) , (cid:18) T T (cid:19) − ∆ T V max W max (cid:27) = min (cid:26) exp (cid:18) − ∆ T V max W max (cid:19) , − ∆ T / V max W max }≤ exp( − c (cid:48) log n ) ≤ n − c (cid:48) . The last inequality may be established through the following argument. If W max = np t max ,then np t max ≥ log n , which implies n ( p t max ) ≥ n ( log nn ) = n (log n ) . V max = n ( p t max ) , and consequently∆ T V max W max = max { n, (log n ) n ( p t max ) } ≥ n. On the other hand, if W max = log n , then np t max < log n . Now, either V max = (log n ) , inwhich case W max V max = (log n ) and ∆ T V max ≥ log n . Or, V max = n ( p t max ) , and consequently V max W max = n ( p t max ) log n . Then∆ T V max W max = max { n p t max log n , (log n ) n ( p t max ) log n } ≥ log n, since n p t max > (log n ) by assumption.Recall that the event T C describes the only setting for which two random variables inthe set I i are dependent on each other; under the good event Γ, we have I i = arg max J | J | and consequently, max J (cid:80) α ∈ J T α = ( d T ) i .Next, we need to show that the probability of the “bad event” (i.e., complement of thegood event) is exponentially small. For that, we note P (Γ C ) = P (Γ C ∪ Γ C ) ≤ P (Γ C ) + P (Γ C ) . The last term P (Γ C ) can be easily bounded using Bernstein’s inequality as follows. Let W ij = (cid:80) k T ijk . Then W ij counts the number of triangles in G t sharing an edge ij . Theevent Γ asserts that the number of triangles in G t sharing an edge is at most 3 W max =3 max { np t max , log n } . From Bernstein’s inequality and the union bound we consequentlyhave P (Γ C ) ≤ n P ( W ij > W max ) ≤ n exp( − W (cid:80) k p tijk (1 − p tijk ) + W max ) ≤ n exp( − W W max + 2 W max ) ≤ n exp( − W max ) ≤ exp( −
14 log n ) ≤ n − c (cid:48)(cid:48) . We now turn our attention to the event Γ . Fix a j (cid:48) . The sum (cid:80) k (cid:48) V j (cid:48) k (cid:48) includes dependentrandom variables; two random variables in the sum, say V j (cid:48) k (cid:48) and V j (cid:48) k (cid:48)(cid:48) , are dependent if andonly if there is a triangle, say T ik (cid:48) k (cid:48)(cid:48) in G t that has both ik (cid:48) and ik (cid:48)(cid:48) as edges (Figure 4(a)).However, the number of triangles in G t sharing an edge ij (cid:48) can be bounded by referring tothe event Γ . First, we define I ij (cid:48) to be the collection of all V j (cid:48) k (cid:48) with fixed i and j (cid:48) . Then,we define the sets J s as subsets of I ij (cid:48) such that no V j (cid:48) k (cid:48) is dependent on more than W max other V j (cid:48) k (cid:48) s. To apply Proposition 1 to (cid:80) k (cid:48) V j (cid:48) k (cid:48) , we first observe that one may upper boundthe relevant expectation as E [ (cid:88) k (cid:48) (1 − T ij (cid:48) k (cid:48) )1( (cid:88) k (cid:48)(cid:48) (cid:54) = i a tj (cid:48) k (cid:48) k (cid:48)(cid:48) > (cid:88) k (cid:48)(cid:48)(cid:48) a tik (cid:48) k (cid:48)(cid:48)(cid:48) > ≤ n ( p t max ) ≤ V max , for T , (b) Γ for T E , (c) Γ for T E , and (d) Γ for T E .and max β ∈ J (cid:80) α ∈ J : α ∼ β V j (cid:48) k (cid:48) ≤ W max . Then, with t = µ = V max , P ( V ij (cid:48) ≥ V max ) ≤ min (cid:26) exp (cid:18) − V W max ( V max + V max / (cid:19) , (cid:18) V max V max (cid:19) − V max / W max (cid:27) = min (cid:26) exp (cid:18) − V max83 W max (cid:19) , − V max / W max }≤ exp( − c (cid:48)(cid:48)(cid:48) log n ) ≤ n − c (cid:48)(cid:48)(cid:48) . The last inequality holds due to the following argument. If W max = np t max , then p t max ≥ log nn ,and consequently, n p t max ≥ n log n . Then V max W max ≥ n p t max > log n . If W max = log n , then V max W max ≥ log n, since V max ≥ (log n ) . Since there are at most n choices for j (cid:48) , for any i , theunion bound leads to P (Γ C ) ≤ nP ( V ij (cid:48) ≥ V max ) ≤ n − c (cid:48)(cid:48)(cid:48)(cid:48) . P (( d T ) i ≥ T ) ≤ n − c (cid:48) + n − c (cid:48)(cid:48) + n − c (cid:48)(cid:48)(cid:48)(cid:48) + n − c (cid:48)(cid:48) . Invoking the union bound, now over all i , we can show that max i ( d T ) i ≤ c ∆ T withprobability at least 1 − n − c (cid:48)(cid:48) . By Equation (3.3), the claimed result holds. Proof of Theorem 4
Proof.
Triangles of type T E are generated by two triangles from G t and one edge from G e .Without loss of generality, we may assume that in T E ijk , the sides ij and jk are generatedby triangles from G t and that the side ik is generated by an edge from G e . Then, we have E [( d T E ) i ] = E [ (cid:88) j (cid:88) k T E ijk ] ≤ (cid:88) j (cid:88) k P ( (cid:88) k (cid:54) = k T ijk > P ( (cid:88) k (cid:54) = k T jkk > P ( E ik = 1) ≤ (cid:88) j (cid:88) k ( np t max ) ( p e max ) ≤ n ( p t max ) p e max ≤ ∆ T E . Let the set I i = { ( T E ) ijk , j = { , . . . , n } , k = { , . . . , n }} denote the set of all incidentallygenerated triangles of type T E that includes the vertex i . The set { T E θ } , indexed by θ = { i, j, k } , represents a family of indicator variables corresponding to incidentally generatedtriangles of type T E . Two random variables in the family ( T E ) θ restricted to the set I i may be dependent in two scenarios. One possibility is that one of the sides ij or ik is anedge from G e and serves as an edge for ( T E ) ijj (cid:48) or ( T E ) ikj (cid:48) , for some j (cid:48) (see Figure 3(c)).The other possibility is that one of the sides ij or ik is created by a triangle from G t andthe same triangle is involved in creating ( T E ) ij (cid:48) k (cid:48) for some j (cid:48) and k (cid:48) (see Figure 3(d)). Werefer to these two events as T C and T C , respectively.We now need to derive a bound on ( d T E ) i , which equals the sum of the random variables T E ijk , that holds with high probability. We proceed as in the proof of the previous theoremand describe “good events” which limit the number of random variables that a randomvariable in the sum depends on with high probability.For this purpose, we characterize the events T C and T C using indicator variables. First,define the random variables Q j (cid:48) = (1 − T ijj (cid:48) )1( (cid:88) k (cid:48) T jj (cid:48) k (cid:48) > (cid:88) k (cid:48)(cid:48) T ij (cid:48) k (cid:48)(cid:48) > . Then, for any T E ijk , the number of other incidentally generated triangles in I i creating theevent T C is at most (cid:80) j (cid:48) Q j (cid:48) (Figure 3(c)).With regards to the event T C , define the following random variable U j (cid:48) k (cid:48) = (1 − T ij (cid:48) k (cid:48) ) T ijj (cid:48) (cid:88) k (cid:48)(cid:48) (cid:54) = i T j (cid:48) k (cid:48) k (cid:48)(cid:48) > E ik (cid:48) = 1) . Then, for any T E ijk , the number of additional incidental triangles in I i that contribute tothe event T C is at most 2 (cid:80) j (cid:48) (cid:80) k (cid:48) U j (cid:48) k (cid:48) (Figure 3(d)).29s before, define a “good event” as Γ = Γ ∩ Γ ∩ Γ , where for any i, j, k , Γ , Γ and Γ are defined as:Γ = { For an edge ij there are at most V max = max { n ( p t max ) , (log n ) } vertices k (cid:48) , such that edges ik (cid:48) and jk (cid:48) are generated by triangles from G t } , Γ = { The number of triangles in G t incident to an edge ij is at most 3 W max = 3 max { np t max , log n }} , Γ = { For an edge ij there are at most U max = max { n p t max p e max , (log n ) } vertices k (cid:48) , such that the edge ik (cid:48) arises from G e and edge jk (cid:48) arises from a triangle in G t } . Note the second event Γ is the same as the event Γ described in the proof of Theorem 3.As in the previous setting, the events above are defined in a way that ensures that “goodevents” happen with high probability. We note that under the events Γ and Γ , one has (cid:88) j (cid:48) Q j (cid:48) ≤ V max , (cid:88) j (cid:48) (cid:88) k (cid:48) U j (cid:48) k (cid:48) ≤ W max U max . Hence, the two events limit the number of occurrences of the events
T C and T C , respec-tively, and consequently limit the number of random variables that a random variable in thesum ( d T E ) i is dependent on. We once again apply Proposition 1 to ( d T E ) i under the goodevent Γ to obtain an upper bound for P (Γ C ). For this purpose, define a set J ⊂ I i as follows: J = { θ ∈ I i : max θ ∈ J | θ ∈ J ; θ and θ are dependent | ≤ C } , where C may be found frommax β ∈ J (cid:88) θ ∈ J : θ ∼ β T E θ = (cid:88) j (cid:48) Q j (cid:48) + (cid:88) j (cid:48) (cid:88) k (cid:48) U j (cid:48) k (cid:48) ≤ V max + 6 W max U max ≤ { n ( p t max ) , n p t max p e max log n, (log n ) } = C. Then, E [max J (cid:80) α ∈ J T E α ] ≤ ∆ T E , and with t = µ = ∆ T E , P (max J (cid:88) θ ∈ J T E θ ≥ T E ) ≤ min (cid:26) exp (cid:18) − ∆ T E C (∆ T E + ∆ T E / (cid:19) , (cid:18) T E T E (cid:19) − T E / C (cid:27) = min (cid:26) exp (cid:18) − ∆ T E C (cid:19) , − ∆ T E /C }≤ exp( − c (cid:48) log n ) ≤ n − c (cid:48) , where the last inequality holds due to the following argument. If C = n ( p t max ) , then ∆ T E C ≥ np e max which, by assumption, is greater than log n . If C = n p t max p e max log n , then ∆ T E C ≥ n p t max log n which, by assumption, is greater than log n . Finally, if C = (log n ) , then ∆ T E C ≥ log n .In our previous proofs, we already established upper bounds for P (Γ C ) and P (Γ C ). Tocomplete the proof of the claimed result, we only need to determine an upper bound on30 (Γ C ). Using the previously introduced variables U j (cid:48) k (cid:48) , the event Γ occurs if (cid:80) k (cid:48) U j (cid:48) k (cid:48) ≤ U max for any j (cid:48) . Note that the sum (cid:80) k (cid:48) U j (cid:48) k (cid:48) includes dependent random variables.An upper bound on the expectation of this sum reads as E ( (cid:88) k (cid:48) U j (cid:48) k (cid:48) ) ≤ E [ (cid:88) k (cid:48) (cid:88) k (cid:48)(cid:48) (cid:54) = i T jk (cid:48) k (cid:48)(cid:48) > E ik (cid:48) = 1)] ≤ n p t max p e max ≤ U max . We also introduce the set J that restricts the number of U -variables that another U -variablein the sum (cid:80) k (cid:48) U j (cid:48) k (cid:48) is dependent on: J = { θ : max θ ∈ J | θ ∈ J ; θ and θ are dependent | ≤ W max } . Fix i and j (cid:48) and define I ij (cid:48) to be the collection of all random variables U j (cid:48) k (cid:48) that contributeto the event T C . Two random variables in the sum (cid:80) k (cid:48) U j (cid:48) k (cid:48) , say U j (cid:48) k (cid:48) and U j (cid:48) k (cid:48)(cid:48) , aredependent (conditioned on T E ijk ) if and only if the triangle T jk (cid:48) k (cid:48)(cid:48) from G t generates an edgefor both the incidental triangles characterized by U j (cid:48) k (cid:48) and U j (cid:48) k (cid:48)(cid:48) (see Figure 4(c)). The set Γ essentially limits the frequency of such triangles T jk (cid:48) k (cid:48)(cid:48) s: under the event Γ , the largest set J as defined above is equal to the set I ij (cid:48) . We therefore have max β ∈ J (cid:80) α ∈ J : α ∼ β U α ≤ W max ,and for t = µ = U max , P (max (cid:88) θ ∈ J U θ ≥ U max ) ≤ min (cid:26) exp (cid:18) − U W max ( U max + U max / (cid:19) , (cid:18) U max U max (cid:19) − U max / W max (cid:27) = min (cid:26) exp (cid:18) − K max83 W max (cid:19) , − U max / W max }≤ exp( − c (cid:48)(cid:48)(cid:48) log n ) ≤ n − c (cid:48)(cid:48)(cid:48) , where the last inequality follows since if W max = np t max , then U max W max ≥ np e max which, byassumption, is greater than c log n ; and, if W max = log n , then U max W max ≥ log n . Combiningthe previous results we obtain P (( d T E ) i ≥ T E ) ≤ P (max J (cid:88) θ ∈ J T E θ ≥ T E ) + P (Γ C ) ≤ n − c (cid:48) + n − c (cid:48)(cid:48) + n − c (cid:48)(cid:48)(cid:48) + n − c (cid:48)(cid:48) . Applying the union bound over all indices i we can bound max i ( d T E ) i ≤ c ∆ T E withprobability at least 1 − n − c (cid:48)(cid:48) . Then, from Equation (3.3) we arrive at the result claimed inthe theorem. Proof of Theorem 5
Proof.
For incidental triangles of type
T E , the generating class is one triangle from G t andtwo edges from G e . Consequently, we have E [( d T E ) i ] = E [ (cid:88) j (cid:88) k T E ijk ] ≤ (cid:88) j (cid:88) k P ( (cid:88) k (cid:54) = k T ijk > P ( E jk = 1) P ( E ik = 1)31 (cid:88) j (cid:88) k np t max ( p e max ) ≤ n p t max ( p e max ) ≤ ∆ T E . Next, let I i = { ( T E ) ijk , j = { , . . . , n } , k = { , . . . , n }} , denote the set of all incidentallygenerated triangles of type T E including a vertex i . Then, ( T E ) θ indexed by θ = { i, j, k } isa family of indicator variables with each variable corresponding to an incidentally generatedtriangle of type T E . Two different random variables in the family ( T E ) θ restricted to theset I i may be dependent in two ways. First, one of the edges ij or ik of the incidental trianglecharacterized by T E ijk , may be an edge from G e and be an edge in the incidental trianglecharacterized by ( T E ) ijk (cid:48) for some k (cid:48) (see Figure 3(e)). Second, one of the edges ij or ik mayhave been created by a triangle from G t , with the same triangle being involved in creatingthe incidental triangle characterized by ( T E ) ij (cid:48) k (cid:48) for some j (cid:48) and k (cid:48) (see Figure 3(f)). Notethe second possibility also includes the case when the triangles characterized by ( T E ) ijk and ( T E ) ijk (cid:48) share an edge ij which is created by a triangle from G t . We refer to these twoevents as T C and T C , respectively.With regards to the event T C , define the following random variable K k (cid:48) = (1 − T ijk (cid:48) )1( (cid:88) k (cid:48)(cid:48) (cid:54) = i T jk (cid:48) k (cid:48)(cid:48) > E ik (cid:48) = 1) . Conditioned on
T E ijk , each K k (cid:48) characterizes an incidentally generated triangle in I i andcontributes to the event T C ; for simplicity, we let I K stand for the set of all such variables K k (cid:48) . Then, for any T E ijk , the number of additional incidentally generated triangles in I i contributing to the event T C is at most 2 (cid:80) k (cid:48) K k (cid:48) (Figure 3(e)).With regards to the event T C , define the random variable S j (cid:48) k (cid:48) = (1 − T ij (cid:48) k (cid:48) ) T ijj (cid:48) E ik (cid:48) E j (cid:48) k (cid:48) . Conditioned on
T E ijk , each S j (cid:48) k (cid:48) characterizes an incidentally generated triangle in I i andleads to the event T C ; for simplicity, we let I S stand for the set of all such variables S j (cid:48) k (cid:48) s. Then, for any T E ijk , the number of additional incidentally generated triangles in I i contributing to the event T C is at most (cid:80) j (cid:48) (cid:80) k (cid:48) S j (cid:48) k (cid:48) (Figure 3(f)).Define a “good event” as Γ = Γ ∩ Γ ∩ Γ , where for any i, j, k , Γ , Γ and Γ may bedescribed as follows:Γ = { The number of triangles in G t incident to an edge ij is at most 3 W max = 3 max { np t max , log n }} , Γ = { For a side ij there are at most U max = max { n p t max p e max , (log n ) } vertices k (cid:48) , such that the side ik (cid:48) is an edge from G e and the side jk (cid:48) belongs to a triangle from G t } , Γ = { Two vertices { i, j } have at most 4 τ max = 4 max { n ( p e max ) , (log n ) } common neighbors { k (cid:48) }} . We again apply Proposition 1 to ( d T E ) i under the good event Γ to obtain an upperbound on P (Γ C ).Under the event Γ , it holds that 2 (cid:80) k (cid:48) K k (cid:48) ≤ U max , which in turn implies that thenumber of T E α in I i that depend on T E ijk according to the event T C is limited to U max .Furthermore, for the event Γ , we have (cid:80) j (cid:48) (cid:80) k (cid:48) S j (cid:48) k (cid:48) ≤ τ max W max which implies that32he number of T E α in I i that depend on T E ijk according to the event T C is limited to τ max W max .Now, define a set J ⊂ I i as follows: J = { θ : max θ ∈ J | θ ∈ J ; θ and θ are dependent | ≤ C } , where C may be found according to C = (cid:88) θ ∈ J : θ ∼ β T E θ = 2 U max +12 τ max W max ≤
14 max { n p t max p e max , np t max log n, n ( p e max ) log n, (log n ) } . Then, for t = µ = ∆ T E , P (max (cid:88) θ ∈ J T E θ ≥ T E ) ≤ min (cid:26) exp (cid:18) − ∆ T E C (∆ T E + ∆ T E / (cid:19) , (cid:18) T E T E (cid:19) − TE / C (cid:27) = min (cid:26) exp (cid:18) − ∆ T E C (cid:19) , − ∆ TE /C }≤ exp( − c (cid:48) log n ) ≤ n − c (cid:48) , where the last inequality follows since if C = n p t max p e max , then ∆ TE C ≥ np e max , which is byassumption greater than c log n ; if C = np t max log n , then ∆ TE C ≥ ( np e max ) log n ≥ log n ; and if C = n ( p e max ) log n , then ∆ TE C ≥ n p t max log n ≥ log n . Finally, if C = (log n ) , then ∆ TE C ≥ log n .We bounded the probability P (Γ C ) in the previous proof, while a bound on P (Γ C ) isgiven in Lemma 3.Combining the expressions for all previously evaluated bounds, we obtain P (( d T E ) i ≥ T E ) ≤ P ( Z C ≥ T E ) + P (Γ C ) ≤ n − c (cid:48) + n − c (cid:48)(cid:48) + n − c (cid:48)(cid:48)(cid:48) + n − c (cid:48)(cid:48) . Taking the union bound over all i , we can show that max i ( d T E ) i ≤ c ∆ T E holds withprobability at least 1 − n − c (cid:48)(cid:48) . The claimed result then follows from Equation (3.3). Proof of Theorem 6
Proof.
We note that under the given assumptions on p e max and p t max we have the followingresults: (cid:112) D E = max { n / ( p e max ) / , ( n / ( p e max ) / (log n ) / } ≤ max { n − (cid:15) , (cid:112) ∆ t } = (cid:112) ∆ t , ∆ T = max { n ( p t max ) , (log n ) } ≤ max { (cid:112) ∆ t n ( p t max ) / , (cid:112) ∆ t }≤ max { (cid:112) ∆ t n − (cid:15) , (cid:112) ∆ t } = (cid:112) ∆ t , ∆ T E = max { n ( p t max ) p e max , (log n ) } ≤ max { (cid:112) ∆ t n ( p t max ) / p e max , (cid:112) ∆ t }≤ max { (cid:112) ∆ t n − (cid:15) , (cid:112) ∆ t } = (cid:112) ∆ t , T E = max { n ( p t max )( p e max ) , (log n ) } ≤ max { (cid:112) ∆ t n ( p t max ) / ( p e max ) , (cid:112) ∆ t } , ≤ max { (cid:112) ∆ t n − (cid:15) , (cid:112) ∆ t } = (cid:112) ∆ t . Consequently, (cid:107) A T − E [ A T ] (cid:107) ≤ (cid:107) A T − E [ A T ] (cid:107) + (cid:107) A E − E [ A E ] (cid:107) + (cid:107) A T − E [ A T ] (cid:107) + (cid:107) A T E − E [ A T E ] (cid:107) + (cid:107) A T E − E [ A T E ] (cid:107) ≤ c ( (cid:112) ∆ t + (cid:112) D E + ∆ T + ∆ T E + ∆ T E ) ≤ ˜ c (cid:112) ∆ t , where c is the maximum of all constants used for bounding the individual matrix terms, and˜ c is another constant that may be easily computed from the previous inequalities. Proof of Theorem 7
Proof.
First, note that E [ A T ] = E [ A T ] + E [ A E ] + E [ A T E ] + E [ A T ] + E [ A T E ] , and allmatrices in the sum under the SupSBM model may be written in the form C (( g − h ) I k + y k Tk ) C T . Consequently E [ A T ] can also be written in the form C (( g − h ) I k + y k Tk ) C T .Then, we have λ min ( E [ A T ]) = nk ( g − h ) for some g and h . Now note that the ( g − h ) termin E [ A T ] is the sum of the corresponding ( g − h ) terms in the component matrices, all ofwhich are positive due to the community structure of the SupSBM. Hence, the ( g − h ) termof E [ A T ] is going to be greater than the ( g − h ) term of E [ A T ], so that λ min ( E [ A T ]) ≥ λ min ( E [ A T ]). This implies that we can replace λ min ( E [ A T ]) with λ min ( E [ A T ]) in the upperbound of Equation (3.4). We have already computed λ min ( E [ A T ]) in Equation (3.5) and thenumerator of Equation (3.4) has been upper bounded in Theorem 6. Combining the results,we arrive at the claimed result. Proof of Theorem 8
Proof.
The first inequality is a result of Equation (3.4) which relates the misclustering rate R T with (cid:107) A T − E [ A T ] (cid:107) and λ min ( E [ A T ]) through the Davis-Kahan Theorem. The secondinequality is obtained by replacing the numerator with the bound from Theorem 1 and thedenominator with the result computed in Equation (3.5). Proof of Theorem 9
Proof.
We have the following asymptotic relationship between the two error rates: a t n ( a t − b t ) (cid:16) a e /δ nm ( a e − b e ) δ (cid:16) δm n a e ( a e − b e ) . Hence, the error rate obtained by using the information about edges is δm n times that ofusing triangles. Consequently, the error rate is lower for triangle hyperedges if (cid:96)m n (cid:46) roof of Theorem 10 Proof.
The first inequality follows from Equation (3.4) that relates the misclustering ratewith (cid:107) A E − E [ A E ] (cid:107) and λ min ( E [ A E ]) through the Davis-Kahan Theorem. The secondinequality is obtained by replacing the numerator with the bound from Theorem 2 and thedenominator with the result in Equation (3.6). Appendix B
Proof of Lemma 1.
Proof.
Define u ij = x i y j i, j ) ∈ L ) + x j y i j, i ) ∈ L ) for all i, j = 1 , . . . , n . Then, (cid:88) ( i,j ) ∈ L (cid:88) k x i y j ( T ijk − E [ T ijk ]) = (cid:88) i We first address the subset of heavy pairs H = { ( i, j ) ∈ H : x i > , y j > } . Theother cases may be analyzed similarly.Define the following two families of sets: I = { − √ n ≤ x i ≤ √ n } , I s = { s − √ n < x i ≤ s √ n } , s = 2 , , . . . , (cid:100) log √ n (cid:101) ,J = { − √ n ≤ y i ≤ √ n } , J t = { t − √ n < y i ≤ t √ n } , t = 2 , , . . . , (cid:100) log √ n (cid:101) . Next, for two arbitrary sets I and J of vertices, also define e ( I, J ) = (cid:40)(cid:80) i ∈ I (cid:80) j ∈ J (cid:80) k (cid:54) =( i,j ) T ijk I ∩ J = ∅ , (cid:80) ( i,j ) ∈ I × J \ ( I ∩ J ) (cid:80) k (cid:54) =( i,j ) T ijk + (cid:80) ( i,j ) ∈ ( I ∩ J ) ,i Let d t,i = (cid:80) j (cid:80) k (cid:54) = i,j T ijk denote the triangle-degree of vertex i . Then, for all i ,and a constant r > , there exists a constant c ( r ) > such that d t,i ≤ c ∆ t with probabilityat least − n − r . Lemma 6. For a constant r > , there exists constants c ( r ) , c ( r ) > such that for anypair of vertex sets I, J ⊆ { , . . . , n } such that | I | ≤ | J | , with probability at least − n − r ,at least one of the following statements holds:(a) e ( I,J )¯ µ ( I,J ) ≤ e c , (b) e ( I, J ) log e ( I,J )¯ µ ( I,J ) ≤ c | J | log n | J | . Now, we use the result of the two previous lemmas to complete the proof of the claimedresult for the heavy pairs. We note (cid:88) ( i,j ) ∈ H x i y j (cid:88) k (cid:54) =( i,j ) T ijk ≤ (cid:88) ( s,t ):2 ( s + t ) ≥√ ∆ t e ( I s , J t ) 2 s √ n t √ n ≤ √ ∆ t (cid:88) ( s,t ):2 ( s + t ) ≥√ ∆ t α s β t σ st . We would like to bound the right-hand-side of the inequality by a constant multiple of √ ∆ t . To this end, first note the following two facts: (cid:88) s α s ≤ / − = 1 , (cid:88) t β t ≤ . Following the approach of Lei et al. (2015) and Chin et al. (2015), we split the set of pairs C : { ( s, t ) : 2 ( s + t ) ≥ √ ∆ t , | I s | ≤ | J t |} into six parts and show that desired invariant for eachpart is bounded. 36 C : { ( s, t ) ∈ C, σ st ≤ } : (cid:88) ( s,t ) α s β t σ st { ( s, t ) ∈ C } ≤ (cid:88) s,t α s β t ≤ . • C : { ( s, t ) ∈ C \ C , λ st ≤ e c } :Since σ st = λ st (cid:112) ∆ t − ( s + t ) ≤ λ st ≤ e c , consequently (cid:88) ( s,t ) α s β t σ st { ( s, t ) ∈ C } ≤ e c (cid:88) s,t α s β t ≤ e c . • C : { ( s, t ) ∈ C \ ( C ∪ C ) , s − t ≥ √ ∆ t } :By Lemma (5), e ( I s , J t ) ≤ c | I s | ∆ t . Hence, λ st = e ( I s , J t ) / ¯ µ st ≤ c | I s | ∆ t | I s || J t | ∆ t /n ≤ c n | J t | , and consequently, σ st ≤ c (cid:112) ∆ t − ( s + t ) n | J t | ≤ c − t n | J t | , for ( s, t ) ∈ C . Then, (cid:88) ( s,t ) α s β t σ st { ( s, t ) ∈ C } ≤ (cid:88) s α s (cid:88) t β t c − t n | J t |≤ (cid:88) s α s (cid:88) t t | J t | n c − t n | J t | ≤ c (cid:88) s α s ≤ c . • C : { ( s, t ) ∈ C \ ( C ∪ C ∪ C ) , log λ st > [2 t log 2 + log(1 /β t )] } :From part (b) of Lemma 6, we have, λ st log λ st | I s || J t | ∆ t n ≤ e ( I s , J t )¯ µ ( I s , J t ) log e ( I s , J t )¯ µ ( I s , J t ) ¯ µ ( I s , J t ) ≤ c | J t | log 2 t | J t | , which is equivalent to σ st α s ≤ c λ st s − t √ ∆ t { t log 2 + log(1 /β t ) } ≤ c s − t √ ∆ t . Then, (cid:88) ( s,t ) α s β t σ st { ( s, t ) ∈ C } = (cid:88) t β t (cid:88) s σ st α s { ( s, t ) ∈ C }≤ c (cid:88) t β t (cid:88) s s − t √ ∆ t { ( s, t ) ∈ C } ≤ c (cid:88) t β t ≤ c . C : { ( s, t ) ∈ C \ ( C ∪ C ∪ C ∪ C ) , t log 2 ≥ log(1 /β t )] } :First, note that since ( s, t ) / ∈ C , we have log λ st ≤ [2 t log 2 + log(1 /β t )] ≤ t log 2and hence λ st ≤ t . Next, σ st = λ st √ ∆ t − ( s + t ) ≤ − s √ ∆ t , and hence σ st α s ≤ c s − t √ ∆ t t log 2. Therefore, (cid:88) ( s,t ) α s β t σ st { ( s, t ) ∈ C } ≤ (cid:88) t β t (cid:88) s c s − t √ ∆ t t log 2 ≤ c log 2 (cid:88) t β t ≤ c . • C : { ( s, t ) ∈ C \ ( C ∪ C ∪ C ∪ C ∪ C ) } :Since 2 t log 2 < log(1 /β t ), we have log λ st ≤ t log 2 ≤ log(1 /β t ) / 2. This observation,along with the fact λ st ≥ 1, implies that λ st ≤ /β t . As a result, (cid:88) ( s,t ) α s β t σ st { ( s, t ) ∈ C } ≤ (cid:88) s α s (cid:88) t − ( s + t ) (cid:112) ∆ t { ( s, t ) ∈ C } ≤ (cid:88) s α s ≤ . In a similar fashion, the set of pairs C : { ( s, t ) : 2 ( s + t ) ≥ √ ∆ t , | I s | > | J t |} is split into sixcategories in order to bound (cid:80) ( s,t ) α s β t σ st . The derivations are omitted.Collecting all the previously obtained terms, we arrive at the claimed result for heavypairs: for some constant r > 0, there exists a constant c ( r ) > − n − r , one has (cid:88) ( i,j ) ∈ H (cid:88) k x i y j T ijk ≤ c (cid:112) ∆ t . Proof of Lemma 3 Proof. As before, define u ij = x i y j i, j ) ∈ L ) + x j y i j, i ) ∈ L ) for all i, j = 1 , . . . , n .Then, (cid:88) ( i,j ) ∈ L x i y j ( A E − E [ A E ]) ij = (cid:88) i 3) log n ) . Then using Theorem 1.3 of Warnke (2016), we have P ( { f ( A ) − E [ f ( A )] ≥ (cid:112) D T E } ∩ B C ) ≤ exp( − D E (cid:80) ij p ij (1 − p ij )( c ij + e ij ) + 2 C √ D E / ≤ exp( − D E (cid:80) ij p max (64 τ ) u ij + √ D E n √ D E ) ≤ exp( − D E p max τ + √ D E n √ D E ) ≤ exp( − D E (128 + 16 / D E n ) ≤ exp( − cn ) , where the penultimate inequality follows since D E = np e max τ .Clearly, the event { A / ∈ Γ } does not depend on the choice of the vectors x, y . Hence,taking the supremum over all x and y , we have P ( sup x,y ∈N | (cid:88) i,j ∈ L x i y j ( A E − E [ A E ]) ij ≥ (cid:112) D E ) ≤ exp( − ( c − log 5) n ) + exp( − ( c (cid:48) − 3) log n ) . This completes the proof. Proof of Lemma 4 Proof. As before, we first focus on the subset of heavy pairs, H = { ( i, j ) ∈ H : x i > , y j > } ; the other two cases follow similarly. The vertex sets I , . . . , I (cid:100) log √ n (cid:101) , J , . . . , J (cid:100) log √ n (cid:101) are defined as before. In addition, we write e ( I, J ) = (cid:40)(cid:80) i ∈ I (cid:80) j ∈ J ( A E ) ij I ∩ J = ∅ (cid:80) ( i,j ) ∈ I × J \ ( I ∩ J ) ( A E ) ij + (cid:80) ( i,j ) ∈ ( I ∩ J ) ,i Lemma 7. If np e max > log n , then for a constant r > , there exists a constant c ( r ) > such that the “degree” of row i , ( d E ) i ≤ c ∆ E with probability at least − n − r for all i . Lemma 8. For a constant c > , there exist constants c ( c ) , c ( c ) > such that withprobability at least − n − c and for any vertex sets I, J ⊆ [ n ] and | I | ≤ | J | one of thefollowing two statements is true:(a) e ( I,J )¯ µ ( I,J ) ≤ ec , (b) e ( I, J ) log e ( I,J )¯ µ ( I,J ) ≤ c n ( p e max ) | J | log n | J | . We use the result of the two previous lemmas to establish the proof for heavy pairs. Inthis setting, note that (cid:88) ( i,j ) ∈ H x i y j ( A E ) ij ≤ (cid:88) ( s,t ):2 ( s + t ) ≥ √ DE τ max e ( I s , J t ) 2 s √ n t √ n = 2 14 (cid:88) ( s,t ):2 ( s + t ) ≥ √ DE τ max e ( I s , J t ) | I s || J t | ( np e max ) D E τ max s | I s | t | J | t n = 12 (cid:112) D E (cid:88) ( s,t ):2 ( s + t ) ≥ √ DE τ max e ( I s , J t )¯ µ ( I s , J t ) √ D E τ max − ( s + t ) s | I s | t | J | t n = √ D E (cid:88) ( s,t ):2 ( s + t ) ≥ √ DE τ max α s β t σ st . Next, we need to bound this quantity by a constant multiple of √ D E . Following theapproach of Lei et al. (2015) and Chin et al. (2015), we split the set of pairs C : { ( s, t ) :2 ( s + t ) ≥ √ D E τ max , | I s | ≤ | J t |} into six parts and show that the contribution of each part isbounded accordingly. Again, in our proof we rely on two facts, (cid:88) s α s ≤ (cid:88) i | x i | ≤ , (cid:88) t β t ≤ . • C : { ( s, t ) ∈ C, σ st ≤ } : (cid:88) ( s,t ) α s β t σ st { ( s, t ) ∈ C } ≤ (cid:88) s,t α s β t ≤ . C : { ( s, t ) ∈ C \ C , λ st ≤ c e } :Under C , 2 s + t ≥ √ D E τ max , and consequently, σ st = λ st √ D E τ max − ( s + t ) ≤ λ st ≤ c e. This implies (cid:88) ( s,t ) α s β t σ st { ( s, t ) ∈ C } ≤ c e (cid:88) s,t α s β t ≤ c e. • C : { ( s, t ) ∈ C \ ( C ∪ C ) , s − t ≥ √ D E τ max } :By Lemma (7), e ( I s , J t ) ≤ c | I s | ∆ E . Hence, λ st = e ( I s , J t ) / ¯ µ st ≤ c | I s | ∆ E | I s || J t | ( np e max ) ≤ c n | J t | , and consequently, σ st = λ st √ D E τ max − ( s + t ) ≤ c n | J t | √ D E τ max − ( s + t ) ≤ c − t n | J t | , for ( s, t ) ∈ C . Then, (cid:88) ( s,t ) α s β t σ st { ( s, t ) ∈ C } ≤ (cid:88) s α s (cid:88) t β t c − t n | J t |≤ (cid:88) s α s (cid:88) t t | J t | n c − t n | J t | ≤ c (cid:88) s α s ≤ c . • C : { ( s, t ) ∈ C \ ( C ∪ C ∪ C ) , log λ st > [2 t log 2 + log(1 /β t )] } :From part (b) of Lemma 6, we have λ st log λ st | I s || J t | np ≤ e ( I s , J t )¯ µ ( I s , J t ) log e ( I s , J t )¯ µ ( I s , J t ) ¯ µ ( I s , J t ) ≤ c T max | J t | log n | J t | . Noting that τ max n ( p e max ) = τ n ( p e max ) τ max = τ D E , we may write σ st α s ≤ λ st √ D E τ max − ( s + t ) | I s | s n ≤ c λ st ( τ max ) D E log( n | J t | ) √ D E τ max ( s − t ) ≤ c λ st s − t √ D E τ max { t log 2 + log(1 /β t ) } ≤ c s − t √ D E /τ max . Then, (cid:88) ( s,t ) α s β t σ st { ( s, t ) ∈ C } = (cid:88) t β t (cid:88) s σ st α s { ( s, t ) ∈ C } c (cid:88) t β t (cid:88) s s − t √ D E /τ max { ( s, t ) ∈ C } ≤ c (cid:88) t β t ≤ c , where the penultimate inequality relies on the fact that ( s, t ) / ∈ C , and that conse-quently 2 ( s − t ) ≤ √ D E τ max . • C : { ( s, t ) ∈ C \ ( C ∪ C ∪ C ∪ C ) , t log 2 ≥ log(1 /β t ) } :First, note that since ( s, t ) / ∈ C , we have log λ st ≤ [2 t log 2 + log(1 /β t )] ≤ t log 2 andhence λ st ≤ t . Furthermore, σ st = λ st √ D E τ max − ( s + t ) ≤ − s √ D E τ max . Since ( s, t ) / ∈ C , wehave log λ st ≥ σ st α s ≤ c λ st s − t √ D E τ max { t log 2 + log(1 /β t ) } ≤ c s − t √ D TE τ max t log 2 . Then, (cid:88) ( s,t ) α s β t σ st { ( s, t ) ∈ C } ≤ (cid:88) t β t (cid:88) s c s − t √ D E τ max t log 2 ≤ c log 2 (cid:88) t β t ≤ c . • C : { ( s, t ) ∈ C \ ( C ∪ C ∪ C ∪ C ∪ C ) } :Since 2 t log 2 < log(1 /β t ), we have log λ st ≤ t log 2 ≤ log(1 /β t ) / 2. This fact, alongwith λ st ≥ , implies that λ st ≤ /β t . Therefore, (cid:88) ( s,t ) α s β t σ st { ( s, t ) ∈ C } ≤ (cid:88) s α s (cid:88) t − ( s + t ) √ D E τ max { ( s, t ) ∈ C } ≤ (cid:88) s α s ≤ . The set of pairs C : { ( s, t ) : 2 ( s + t ) ≥ √ ∆ t , | I s | > | J t |} may be similarly split into sixcategories categories and similar arguments may be used to bound each of the contributions (cid:80) ( s,t ) α s β t σ st . Collecting all the terms we have the following result for heavy pairs: for someconstant c > 0, there exists a constant c ( c ) > − n − c ,one has (cid:88) ( i,j ) ∈ H x i y j ( A E ) ij ≤ c (cid:112) D E . Proof of Lemma 5. Proof. We note d t,i = (cid:80) j (cid:80) k T ijk is a sum of independent random variables, each boundedin absolute value by 1. Therefore, Bernstein’s inequality gives P ( d t,i ≥ c ∆ t ) ≤ P (cid:32)(cid:88) j (cid:88) k w ijk ≥ ( c − t (cid:33) exp (cid:32) − ( c − ∆ t (cid:80) j (cid:80) k p ijk (1 − p ijk ) + ( c − t (cid:33) ≤ exp( − ∆ t c − c + 4 ) ≤ n − c , where the last inequality follows since ∆ t ≥ c log n . Taking the union bound over all valuesof i we obtain that max i d t,i ≤ c ∆ t with probability at least 1 − n − r , where c is a functionof the constant r . Proof of Lemma 6. Proof. If | J | > n/e , then the result of Lemma 5 implies e ( I, J )∆ t | I || J | /n ≤ (cid:80) i ∈ I max i d t,i ∆ t | I | /e ≤ | I | c ∆ t ∆ t | I | /e ≤ c e, and consequently, (a) holds for this case.If | J | < n/e , let S ( I, J ) = { ( i, j ) , i ∈ I, j ∈ J } . We next invoke Corollary A.1.10 of Alonand Spencer (2004), described below. Proposition 2. For independent Bernoulli random variables X u ∼ Bern ( p u ) , u = 1 , . . . , n and p = n (cid:80) u p u , we have P ( (cid:88) u ( X u − p u ) ≥ a ) ≤ exp( a − ( a + pn ) log(1 + a/pn )) . Using the above result, for l ≥ 8, we have P ( e ( I, J ) ≥ l ¯ µ ( I, J )) ≤ P ( (cid:88) ( i,j ) ∈ S ( I,J ) (cid:88) k (cid:54) =( i,j ) ( T ijk − p ijk ) ≥ l ¯ µ ( I, J ) − (cid:88) ( i,j ) ∈ S ( I,J ) (cid:88) k (cid:54) =( i,j ) p ijk ) ≤ P ( (cid:88) ( i,j ) ∈ S ( I,J ) (cid:88) k (cid:54) =( i,j ) w ijk ≥ ( l − µ ( I, J )) ≤ exp(( l − µ ( I, J ) − l ¯ µ ( I, J ) log l ) ≤ exp( − l log l ¯ µ ( I, J )) . For a constant c > 0, let t ( I, J ) log t ( I, J ) = c | J | ¯ µ ( I, J ) log n | J | , and let l ( I, J ) = max { , t ( I, J ) } . Then, from the previous calculations, we have P ( e ( I, J ) ≥ l ( I, J )¯ µ ( I, J )) ≤ exp( − 12 ¯ µ ( I, J ) l ( I, J ) log l ( I, J )) ≤ c | J | log n | J | . From this point onwards identical arguments as those used in Lei et al. (2015) can be invokedto complete the proof of Lemma 6. 44 roof of Lemma 7 Proof. We use Proposition 1. Let us start with the observation that E [( d E ) i ] = E [ (cid:88) j (cid:88) k E ij E jk E ik ] ≤ n ( p e max ) ≤ ∆ E . Furthermore, let I i be the set of all triangles of type E incident to a vertex i . Let E θ = E ij E jk E ik , indexed by θ = { i, j, k } , denote a family of indicator random variables. Define a“good event” Γ as before: under the good event, every pair of vertices has at most C = 4 τ max common neighbors. Clearly, two triangles belonging to the set I i are independent if they donot share any edges. For simplicity, let “ ∼ ” denote a relation such that θ ∼ θ holds if θ and θ share an edge. For any E ijk , the good event Γ restricts the number of triangles oftype E in the set I i that are dependent on E ijk to 2 C .Define J ⊂ I as J = { θ : max θ ∈ J | θ ∈ J ; θ and θ share at least one edge | ≤ C } . Then, we have max θ ∈ J (cid:88) θ ∈ J : θ ∼ θ E θ ≤ C = 8 τ max , µ = E [ (cid:88) θ ∈ I i E θ ]∆ E . For t = µ = ∆ E , the above results imply P (max J (cid:88) θ ∈ J E θ ≥ E ) ≤ min (cid:26) exp (cid:18) − ∆ E C (∆ E + ∆ E / (cid:19) , (cid:18) E E (cid:19) − ∆ E / C (cid:27) = min (cid:26) exp (cid:18) − ∆ E τ max (cid:19) , − ∆ E / τ max }≤ exp( − c (cid:48) log n ) ≤ n − c (cid:48) , where the last inequality is a consequence of the following argument. If τ max = n ( p e max ) ,then ∆ E τ max = np e max ≥ log n by assumption, and if τ max = log n , then ∆ E τ max ≥ log n . Under thegood event Γ, we have I = J ∗ and consequently, max J (cid:80) α ∈ J E α = ( d E ) i . Then, P (( d E ) i ≥ E ) n − c (cid:48) + P ( A / ∈ Γ) ≤ n − c (cid:48)(cid:48) , since P ( A / ∈ Γ) ≤ exp( − log n ). Taking the union bound over all values of i results inmax i ( d T E ) i ≤ c ∆ T E with probability at least 1 − n − c (cid:48)(cid:48) . Proof of Lemma 8 Proof. We first note that ¯ µ ( I, J ) = | I || J | n ( p e max ) and ∆ E ≥ n ( p e max ) .Next, if | J | > n/e , then the result of Lemma 7 implies that e ( I, J )¯ µ ( I, J ) = e ( I, J ) | I || J | n ( p e max ) ≤ (cid:80) i ∈ I max i ( d E ) i n p | I | /e ≤ | I | c ∆ E n p | I | /e ≤ c e, 45o that in this case (a) holds true. If | J | < n/e , define S ( I, J ) as the set of all 3-tuples suchthat each tuple has one vertex in each of the sets I and J .To prove that the second statement also holds, we cannot invoke the exponential con-centration inequality used in the proof of Theorem 1 due to the lack of the independenceassumption. Instead, we use Proposition 1 on the set S ( I, J ) of 3-tuples.First, note that e ( I, J ) = (cid:88) i ∈ I (cid:88) j ∈ J ( A E ) ij = (cid:88) θ ∈ S ( I,J ) ( A E ) θ and E ( (cid:88) θ ∈ S ( A E ) θ ) ≤ | I || J | n ( p e max ) = ¯ µ ( I, J ) . Define S ∗ ⊂ S to be such that any ( A E ) θ , for some θ ∈ S ∗ , depends only on τ max other( A E ) θ s. We then have max S ∗ (cid:80) θ ∈ S ∗ ( I,J ) ( A E ) θ ≤ τ max .Next, let t = ( l − µ ( I, J ). Then, for some l ≥ P ( e ( I, J ) ≥ l ¯ µ ( I, J )) ≤ exp( − l ¯ µ ( I, J ) log l − ( l − µ ( I, J ) τ max ) ≤ exp( − l log l ¯ µ ( I, J ) τ max ) . For a constant c > 0, define t ( I, J ) according to t ( I, J ) log t ( I, J ) = c τ max | J | ¯ µ ( I, J ) log n | J | , and let l ( I, J ) = max { , t ( I, J ) } . From the previous calculations, we have P ( e ( I, J ) ≥ l ( I, J )¯ µ ( I, J )) ≤ exp( − 12 ¯ µ ( I, J ) τ max l ( I, J ) log l ( I, J )) ≤ exp( − c | J | log n | J | ) . Following an identical argument as described in Lei et al. (2015), we have P ( ∃ I, J : | I | ≤ | J | ≤ ne , e ( I, J ) ≥ l ( I, J )¯ µ ( I, J )) ≤ n − c . Therefore, with probability at least 1 − n − c , we have e ( I, J ) ≤ l ( I, J )¯ µ ( I, J )). For pairs { ( I, J ) : | I | ≤ | J | ≤ ne } such that l ( I, J ) = 8, we readily have e ( I, J )¯ µ ( I, J ) ≤ . This establishes that part (a) of the claim is true. For the remaining pairs for which l ( I, J ) = t ( I, J ) holds, we have e ( I,J )¯ µ ( I,J ) ≤ t ( I, J ), and e ( I, J )¯ µ ( I, J ) log e ( I, J )¯ µ ( I, J ) ≤ t ( I, J ) log t ( I, J ) = c τ max | J | ¯ µ ( I, J ) log n | J | , implying that part (b) of the claim is true as well.46 cknowledgement The authors would like to thank Prof. Lutz Warnke of Georgia Institute of Technology forexplaining how his work may be applied in parts of the analysis. We also appreciate hisgenerous assistance with proving some of the results. The work was supported by the NSFgrant CCF 1527636, the Center of Science of Information NSF STC center, and an NIH U01grant for targeted software development. References Abbe, E. and Sandon, C. (2015). Community detection in general stochastic block models: Fun-damental limits and efficient algorithms for recovery. In IEEE 56th Annual Symposium onFoundations of Computer Science (FOCS) , pages 670–688. IEEE.Adamic, L. A. and Glance, N. (2005). The political blogosphere and the 2004 US election: dividedthey blog. In Proceedings of the 3rd International Workshop on Link Discovery , pages 36–43.ACM.Alon, N. and Spencer, J. H. (2004). The Probabilistic Method . John Wiley & Sons.Alon, U. (2007). Network motifs: theory and experimental approaches. Nature Reviews Genetics ,8(6):450–461.Amini, A. A., Chen, A., Bickel, P. J., and Levina, E. (2013). Pseudo-likelihood methods forcommunity detection in large sparse networks. The Annals of Statistics , 41(4):2097–2122.Angelini, M. C., Caltagirone, F., Krzakala, F., and Zdeborov, L. (2015). Spectral detection onsparse hypergraphs. In , pages 66–73. IEEE.Barab´asi, A.-L. and Albert, R. (1999). Emergence of scaling in random networks. Science ,286(5439):509–512.Battiston, F., Nicosia, V., Chavez, M., and Latora, V. (2017). Multilayer motif analysis of brainnetworks. Chaos: An Interdisciplinary Journal of Nonlinear Science , 27(4):047404.Benson, A. R., Gleich, D. F., and Leskovec, J. (2016). Higher-order organization of complexnetworks. Science , 353(6295):163–166.Bickel, P. J. and Chen, A. (2009). A nonparametric view of network models and Newman–Girvanand other modularities. Proceedings of the National Academy of Sciences , 106(50):21068–21073.Bollob´as, B., Janson, S., and Riordan, O. (2007). The phase transition in inhomogeneous randomgraphs. Random Structures & Algorithms , 31(1):3–122.Bollob´as, B., Janson, S., and Riordan, O. (2011). Sparse random graphs with clustering. RandomStructures & Algorithms , 38(3):269–323.Boucheron, S., Lugosi, G., and Massart, P. (2013). Concentration Inequalities: A NonasymptoticTheory of Independence . Oxford university press.Celisse, A., Daudin, J. J., and Pierre, L. (2012). Consistency of maximum-likelihood and variationalestimators in the stochastic block model. Electronic Journal of Statistics , 6:1847–1899.Chandrasekhar, A. and Jackson, M. O. (2016). A network formation model based on subgraphs. arXiv preprint arXiv:1611.07658 .Chandrasekhar, A. G. and Jackson, M. O. (2014). Tractable and consistent random graph models.Technical report, National Bureau of Economic Research.Chien, I., Lin, C.-Y., and Wang, I.-H. (2018). Community detection in hypergraphs: Optimal tatistical limit and efficient algorithms. In International Conference on Artificial Intelligenceand Statistics , pages 871–879.Chin, P., Rao, A., and Vu, V. (2015). Stochastic block model and community detection in sparsegraphs: A spectral algorithm with optimal rate of recovery. In COLT , pages 391–423.Choi, D. S., Wolfe, P. J., and Airoldi, E. M. (2012). Stochastic blockmodels with a growing numberof classes. Biometrika , 99:273–284.Davis, C. and Kahan, W. M. (1970). The rotation of eigenvectors by a perturbation. iii. SIAMJournal on Numerical Analysis , 7(1):1–46.Decelle, A., Krzakala, F., Moore, C., and Zdeborov´a, L. (2011). Asymptotic analysis of the stochas-tic block model for modular networks and its algorithmic applications. Physical Review E ,84(6):066106.Erd¨os, P. and R´enyi, A. (1960). On the evolution of random graphs. Publ. Math. Inst. Hung. Acad.Sci , 5(1):17–60.Feige, U. and Ofek, E. (2005). Spectral techniques applied to sparse random graphs. RandomStructures & Algorithms , 27(2):251–275.Friedman, J., Kahn, J., and Szemeredi, E. (1989). On the second eigenvalue of random regulargraphs. In Proceedings of the Twenty-First Annual ACM Symposium on Theory of Computing ,pages 587–598. ACM.Galhotra, S., Mazumdar, A., Pal, S., and Saha, B. (2017). The geometric block model. arXivpreprint arXiv:1709.05510 .Galhotra, S., Mazumdar, A., Pal, S., and Saha, B. (2018). Connectivity in random annulus graphsand the geometric block model. arXiv preprint arXiv:1804.05013 .Gao, C., Ma, Z., Zhang, A. Y., and Zhou, H. H. (2015). Achieving optimal misclassificationproportion in stochastic block model. arXiv preprint arXiv:1505.03772 .Gao, C., Ma, Z., Zhang, A. Y., and Zhou, H. H. (2017). Achieving optimal misclassificationproportion in stochastic block models. The Journal of Machine Learning Research , 18(1):1980–2024.Ghoshdastidar, D., Dukkipati, A., et al. (2017). Consistency of spectral hypergraph partitioningunder planted partition model. The Annals of Statistics , 45(1):289–315.Girvan, M. and Newman, M. E. (2002). Community structure in social and biological networks. Proceedings of the National Academy of Sciences , 99(12):7821–7826.Golub, G. H. and Van Loan, C. F. (2012). Matrix computations , volume 3. JHU Press.Hajek, B. and Sankagiri, S. (2018). Recovering a hidden community in a preferential attachmentgraph. arXiv preprint arXiv:1801.06818 .Hajek, B., Wu, Y., and Xu, J. (2016). Achieving exact cluster recovery threshold via semidefiniteprogramming. IEEE Transactions on Information Theory , 62(5):2788–2797.Holland, P., Laskey, K., and Leinhardt, S. (1983). Stochastic blockmodels: some first steps. SocialNetworks , 5:109–137.Honey, C. J., K¨otter, R., Breakspear, M., and Sporns, O. (2007). Network structure of cerebral cor-tex shapes functional connectivity on multiple time scales. Proceedings of the National Academyof Sciences , 104(24):10240–10245.Janson, S. and Ruci´nski, A. (2004). The deletion method for upper tail estimates. Combinatorica ,24(4):615–640.Jin, J. (2015). Fast community detection by score. The Annals of Statistics , 43(1):57–89.Joseph, A. and Yu, B. (2016). Impact of regularization on spectral clustering. The Annals ofStatistics , 44(4):1765–1791. arrer, B. and Newman, M. E. J. (2011). Stochastic blockmodels and community structure innetworks. Phys. Rev. E. , 83(1):016107.Kim, C., Bandeira, A. S., and Goemans, M. X. (2017). Community detection in hypergraphs,spiked tensor models, and sum-of-squares. In Sampling Theory and Applications (SampTA),2017 International Conference on , pages 124–128. IEEE.Kim, J. H. and Vu, V. H. (2000). Concentration of multivariate polynomials and its applications. Combinatorica , 20(3):417–434.Klusowski, J. M. and Wu, Y. (2018). Counting motifs with graph sampling. arXiv preprintarXiv:1802.07773 .Kumar, A., Sabharwal, Y., and Sen, S. (2004). A simple linear time (1+ (cid:15) )-approximation algorithmfor k-means clustering in any dimensions. In In Proceedings of the 45th Annual IEEE Symposiumon Foundations of Computer Science , pages 454–462. IEEE.Laniado, D., Volkovich, Y., Kappler, K., and Kaltenbrunner, A. (2016). Gender homophily inonline dyadic and triadic relationships. EPJ Data Science , 5(1):19.Lei, J., Rinaldo, A., et al. (2015). Consistency of spectral clustering in stochastic block models. The Annals of Statistics , 43(1):215–237.Li, P. and Milenkovic, O. (2017). Inhomogoenous hypergraph clustering with applications. In Advances in Neural Information Processing Systems , pages 2305–2315.Lusseau, D., Schneider, K., Boisseau, O. J., Haase, P., Slooten, E., and Dawson, S. M. (2003).The bottlenose dolphin community of doubtful sound features a large proportion of long-lastingassociations. Behavioral Ecology and Sociobiology , 54(4):396–405.Mangan, S. and Alon, U. (2003). Structure and function of the feed-forward loop network motif. Proceedings of the National Academy of Sciences , 100(21):11980–11985.Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N., Chklovskii, D., and Alon, U. (2002). Networkmotifs: simple building blocks of complex networks. Science , 298(5594):824–827.Newman, M. E. (2003). The structure and function of complex networks. SIAM Review , 45(2):167–256.Newman, M. E. J. and Girvan, M. (2004). Finding and evaluating community structure in networks. Phys. Rev. E , 69(2):026113.Park, H.-J. and Friston, K. (2013). Structural and functional brain networks: from connections tocognition. Science , 342(6158):1238411.Paul, S. and Chen, Y. (2016). Orthogonal symmetric non-negative matrix factorization under thestochastic block model. arXiv preprint arXiv:1605.05349 .Paulau, P. V., Feenders, C., and Blasius, B. (2015). Motif analysis in directed ordered networksand applications to food webs. Scientific Reports , 5:11926.Porter, M. A., Onnela, J.-P., and Mucha, P. J. (2009). Communities in networks. Notices of theAMS , 56(9):1082–1097.Qin, T. and Rohe, K. (2013). Regularized spectral clustering under the degree-corrected stochasticblockmodel. In Advances in Neural Information Processing Systems , pages 3120–3128.Rohe, K., Chatterjee, S., and Yu, B. (2011). Spectral clustering and the high-dimensional stochasticblockmodel. Ann. Statist , 39(4):1878–1915.Rohe, K., Qin, T., and Fan, H. (2012). The highest dimensional stochastic blockmodel with aregularized estimator. Statistica Sinica , 39(4):1878–1915.Rosvall, M., Esquivel, A. V., Lancichinetti, A., West, J. D., and Lambiotte, R. (2014). Mem-ory in network flows and its effects on spreading dynamics and community detection. NatureCommunications , 5. arkar, P., Bickel, P. J., et al. (2015). Role of normalization in spectral clustering for stochasticblockmodels. The Annals of Statistics , 43(3):962–990.Shen-Orr, S. S., Milo, R., Mangan, S., and Alon, U. (2002). Network motifs in the transcriptionalregulation network of Escherichia Coli. Nature Genetics , 31(1):64–68.Snijders, T. A. (2001). The statistical evaluation of social network dynamics. Sociological Method-ology , 31(1):361–395.Snijders, T. A. B. and Nowicki, K. (1997). Estimation and prediction for stochastic blockmodelsfor graphs with latent block structure. Journal of Classificaion , 14:75–100.Sporns, O. and K¨otter, R. (2004). Motifs in brain networks. PLoS Biology , 2(11):e369.Stewart, G. W. and Sun, J.-G. (1990). Matrix Perturbation Theory . Academic Press, Boston, MA.Tsourakakis, C. E., Pachocki, J., and Mitzenmacher, M. (2017). Scalable motif-aware graph cluster-ing. In Proceedings of the 26th International Conference on World Wide Web , pages 1451–1460.International World Wide Web Conferences Steering Committee.Vershynin, R. (2010). Introduction to the non-asymptotic analysis of random matrices. arXivpreprint arXiv:1011.3027 .Vu, V. Q. and Lei, J. (2013). Minimax sparse principal subspace estimation in high dimensions. The Annals of Statistics , 41(6):2905–2947.Warnke, L. (2016). On the method of typical bounded differences. Combinatorics, Probability andComputing , 25(02):269–299.Warnke, L. (2017). Upper tails for arithmetic progressions in random subsets. Israel Journal ofMathematics , 221(1):317–365.Watts, D. J. and Strogatz, S. H. (1998). Collective dynamics of small-world networks. Nature ,393(6684):440–442.Yavero˘glu, ¨O. N., Malod-Dognin, N., Davis, D., Levnajic, Z., Janjic, V., Karapandza, R., Sto-jmirovic, A., and Prˇzulj, N. (2014). Revealing the hidden language of complex networks. Scien-tific Reports , 4:4547.Zachary, W. W. (1977). An information flow model for conflict and fission in small groups. Journalof anthropological research , pages 452–473.Zhao, Y., Levina, E., and Zhu, J. (2012). Consistency of community detection in networks underdegree-corrected stochastic block models. Ann. Statist , 40:2266–2292.Zhou, D., Huang, J., and Sch¨olkopf, B. (2006). Learning with hypergraphs: Clustering, classifica-tion, and embedding. In Advances in Neural Information Processing Systems , pages 1601–1608., pages 1601–1608.