Extended Stochastic Block Models with Application to Criminal Networks
Sirio Legramanti, Tommaso Rigon, Daniele Durante, David B. Dunson
Abstract
Stochastic block models (SBMs) are widely used in network science due to their interpretable structure that allows inference on groups of nodes having common connectivity patterns. Although providing a well-established model-based approach for community detection, such formulations are still the object of intense research to address the key problem of inferring the unknown number of communities. This has motivated the development of several probabilistic mechanisms to characterize the node partition process, covering solutions with fixed, random and infinite number of communities. In this article we provide a unified view of all these formulations within a single extended stochastic block model (ESBM), that relies on Gibbs-type processes and encompasses most existing representations as special cases. Connections with Bayesian nonparametric literature open up new avenues that allow the natural inclusion of several unexplored options to model the node partition process and to incorporate node attributes in a principled manner. Among these new alternatives, we focus on the Gnedin process as an example of a probabilistic mechanism with desirable theoretical properties and nice empirical performance. A collapsed Gibbs sampler that can be applied to the whole ESBM class is proposed, and refined methods for estimation, uncertainty quantification and model assessment are outlined. The performance of ESBM is assessed in simulations and in an application to bill co-sponsorship networks in the Italian parliament, where we find key hidden block structures and core–periphery patterns.
Keywords: Bayesian nonparametric; community detection; Gibbs–type prior; network data; product partition model

Department of Decision Sciences, Bocconi University, via Röntgen 1, 20136, Milan, Italy
Department of Statistical Science, Duke University, Box 90251, Durham, North Carolina 27708, U.S.A.
Department of Decision Sciences and Bocconi Institute for Data Science and Analytics, Bocconi University, via Röntgen 1, 20136, Milan, Italy
1. Introduction
Network data are ubiquitous in science and there is recurring interest in community structure. Interacting units, such as brain regions [7], genes [29], social actors [50] and transportation nodes [19], can often be grouped into clusters which share similar connectivity patterns in the corresponding network. The relevance of such a property and the interdisciplinary nature of network science have motivated a collective effort by various disciplines towards the development of methods for community detection, ranging from algorithmic strategies [16, 36, 34, 4, 42] to model-based solutions [22, 38, 25, 1, 2, 15]; see also [10, 26] for a comprehensive overview. Despite being widely used in practice, most algorithmic approaches lack uncertainty quantification and can only detect communities characterized by dense within-block connectivity and sparser connections between different blocks [11]. These issues have motivated a growing interest in model-based solutions which rely on generative statistical models. This choice allows coherent uncertainty quantification, model selection and hypothesis testing, while accounting for more general connectivity patterns, where nodes in the same community are not necessarily more densely connected, but simply share the same connectivity behavior [11], which may even characterize core–periphery, disassortative or weak community patterns [11, Figure 8]. These alternative structures are also found in the motivating 2013–2018 Italian bill co-sponsorship network [5] displayed in Figure 1, thus supporting our focus on model-based solutions.

Among the generative models for learning communities in network data, the stochastic block model (SBM) [22, 38] is arguably the most widely implemented and well-established formulation, owing also to its unique balance between simplicity and flexibility [26].

Figure 1. Adjacency matrix of the 2013–2018 bill co-sponsorship network in the Italian parliament. Edges and non-edges are depicted as black and white pixels, respectively. Colors on the left denote the right wing (blue), left wing (red), Movimento 5 Stelle (yellow) and mixed group (grey).

In SBMs, the probability of an edge between two nodes depends only on their cluster memberships, thus allowing efficient inference on communities and on block probabilities, which can characterize assortative, disassortative, core–periphery or weak community patterns, and combinations of such structures [11]. These desirable properties have motivated extensive theoretical studies [51, 3, 39] and various generalizations [48, 25, 20, 1, 23, 45, 35, 15, 12, 47] of the original SBM.

Most of the above extensions aim at addressing two fundamental open problems with classical SBMs. First, in real-world applications the number of underlying communities is typically unknown and has to be learned from the data. Therefore, classical SBM formulations based on a fixed and pre-specified number of communities [22, 38] are not suitable to address this goal. Second, it is common to observe nodal attributes that may effectively inform the community assignment mechanism. Hence, SBMs require extensions to include such information in the process regulating the node partitions. A successful answer to the first open issue has been provided by Bayesian nonparametric solutions replacing the original Dirichlet–multinomial process for node partitioning [38] with alternative priors that allow the number of communities to grow adaptively with the size of the network via the Chinese restaurant process (CRP) [25, 45], or to be finite and random under a mixture of finite mixtures representation [15]. Inclusion of nodal attributes within the community assignment is instead obtained via multinomial probit [48] or mixture models [35, 47]. Unfortunately, all these different extensions have been developed separately, and SBMs still lack a unifying framework, which would be useful to clarify common properties, develop broad computational and inferential strategies, and identify novel solutions.

Motivated by the above discussion, we unify the aforementioned formulations within a general extended stochastic block model (ESBM) framework based on Gibbs-type priors [18, 8], which also allow the inclusion of node attributes in a principled manner via product partition models (PPMs) [21]. Within this class, we focus on the Gnedin process [17] as an example of a prior which has not yet been employed in the context of SBMs, but exhibits analytical tractability, desirable properties, theoretical guarantees and promising empirical performance when combined with such models. Our framework allows posterior computation via an easy-to-implement collapsed Gibbs sampler, and motivates general methods for uncertainty quantification and model assessment, thus exploiting the advantages of a model-based approach over algorithmic strategies. The performance of key priors within the ESBM class is evaluated in simulations. In light of these results, we opt for the Gnedin process to analyze the political network in Figure 1. Code and data are available at https://github.com/danieledurante/ESBM, where we also provide additional figures and empirical analyses.
2. Model Formulation
Consider a binary undirected network with V nodes and let Y denote its V × V symmetric adjacency matrix, with elements y_vu = y_uv = 1 if nodes v and u are connected, and y_vu = y_uv = 0 otherwise. We first review SBMs and then introduce our general ESBM class along with associated properties and extensions to incorporate node attributes. For simplicity, we focus on binary undirected networks and categorical attributes, but our approach can be naturally extended to other types of networks and covariates, as highlighted in the final discussion.
SBMs [22, 38] partition the nodes into H̄ mutually exclusive and exhaustive communities, with nodes in the same community sharing common connectivity patterns. More specifically, SBMs assume that the sub-diagonal entries y_vu, v = 2, ..., V, u = 1, ..., v−1, of Y are conditionally independent Bernoulli random variables with probabilities depending only on the community memberships of the involved nodes v and u. Denoting with z̄ = (z̄_1, ..., z̄_V)⊤ ∈ {1, ..., H̄}^V the vector of community assignments of the V nodes, and with Θ the H̄ × H̄ symmetric matrix whose generic element θ_hk is the probability of a connection between a node in community h and a node in community k, the likelihood for the adjacency matrix Y is

p(Y | z̄, Θ) = ∏_{v=2}^{V} ∏_{u=1}^{v−1} θ_{z̄_v z̄_u}^{y_vu} (1 − θ_{z̄_v z̄_u})^{1−y_vu} = ∏_{h=1}^{H̄} ∏_{k=1}^{h} θ_hk^{m_hk} (1 − θ_hk)^{m̄_hk},  (1)

where m_hk and m̄_hk denote the number of edges and non-edges between communities h and k, respectively. Classical SBMs [22, 38] assume independent Beta(a, b) priors for the block probabilities θ_hk. Thus the joint density for the diagonal and sub-diagonal elements of the symmetric matrix Θ is

p(Θ) = ∏_{h=1}^{H̄} ∏_{k=1}^{h} θ_hk^{a−1} (1 − θ_hk)^{b−1} / B(a, b),  (2)

where B(·, ·) is the Beta function. Although quantifying prior uncertainty in the block probabilities via (2) is important, the overarching goal in SBMs is to provide inference on communities. Consistent with this focus, Θ is usually treated as a nuisance parameter and marginalized out in (1) via beta-binomial conjugacy, obtaining

p(Y | z̄) = ∏_{h=1}^{H̄} ∏_{k=1}^{h} B(a + m_hk, b + m̄_hk) / B(a, b).  (3)

As we will clarify in the following sections, such a collapsed representation is also useful for computation and inference. Equation (3) provides a simple likelihood common to several extensions of SBMs, which instead differ in the choice of the probabilistic mechanism underlying the assignments z̄.
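To make the collapsed likelihood (3) concrete, the following Python sketch (an illustrative implementation of ours, not the authors' distributed code) evaluates log p(Y | z) directly from the adjacency matrix and a label vector:

```python
import numpy as np
from scipy.special import betaln

def log_p_Y_given_z(Y, z, a=1.0, b=1.0):
    """Collapsed log marginal likelihood log p(Y | z) in (3), with the
    block probabilities integrated out via beta-binomial conjugacy.
    Y is a (V, V) symmetric binary adjacency matrix and z a length-V
    vector of labels in {0, ..., H-1}."""
    z = np.asarray(z)
    H = z.max() + 1
    logp = 0.0
    for h in range(H):
        idx_h = np.where(z == h)[0]
        for k in range(h + 1):
            idx_k = np.where(z == k)[0]
            sub = Y[np.ix_(idx_h, idx_k)]
            if h == k:
                # within-block: each dyad counted once, diagonal excluded
                pairs = len(idx_h) * (len(idx_h) - 1) // 2
                m = int(np.triu(sub, 1).sum())
            else:
                pairs = len(idx_h) * len(idx_k)
                m = int(sub.sum())
            # beta-binomial marginal of block (h, k): m edges, pairs - m non-edges
            logp += betaln(a + m, b + pairs - m) - betaln(a, b)
    return logp
```

Since (3) is invariant under relabeling of z, the function returns the same value for any relabeled version of the same partition.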
A natural choice is a Dirichlet–multinomial distribution on z̄, obtained by marginalizing the vector of community assignment probabilities π = (π_1, ..., π_H̄) ∼ Dirichlet(β) out of the likelihood for z̄, assuming pr(z̄_v = h | π) = π_h, v = 1, ..., V. If H̄ is fixed and finite, this leads to the original SBM [38]. However, as already discussed, the number of communities is usually unknown and has to be inferred from the data. A possible solution consists in placing a prior on H̄, which leads to the mixture of finite mixtures (MFM) version of the SBM proposed by [15]. Another option is a Dirichlet process partition mechanism, which corresponds to the infinite relational model [25]. Such an infinite mixture model differs from MFM in that H̄ = ∞, meaning that infinitely many nodes would give rise to infinitely many communities. Note that the total number of possible clusters H̄ should not be confused with the number of occupied clusters H. The latter is defined as the number of distinct labels in z̄, and is upper bounded by min{V, H̄}. Hence H cannot exceed V, even when H̄ = ∞.

So far we have introduced labeled clusters, identified by z̄. This means that a vector z̄ and its relabelings are regarded as distinct objects, even though they identify the same partition. Throughout the rest of the paper we will rely on a generic z to denote all relabelings of z̄ that lead to the same partition. For convenience, one may assume that z_v ∈ {1, ..., H}, which corresponds to avoiding empty communities. Note that (3) is invariant under relabeling and, hence, p(Y | z) = p(Y | z̄).

As illustrated in the previous section, several priors for community memberships have been considered in the context of SBMs, including the Dirichlet–multinomial [38], the Dirichlet process [25], and mixtures of finite Dirichlet mixtures [15]. These are all Gibbs-type priors, which were introduced by [18] and stand out for analytical and computational tractability [8]. In this section we propose the ESBM as a unifying framework characterized by the choice of a Gibbs-type prior for the assignments. This formulation includes the previously mentioned SBMs as special cases and offers new alternatives by exploring the whole Gibbs-type class and its connections with
PPMs [21].

Gibbs-type priors are defined over the space of the unlabeled community indicators z. For a > 0, denote the ascending factorial by (a)_n = a(a+1)···(a+n−1) for any n ≥ 1, and set (a)_0 = 1. A probability mass function p(z) is of Gibbs type if and only if it has the form

p(z) = W_{V,H} ∏_{h=1}^{H} (1 − σ)_{n_h − 1},  (4)

where n_h denotes the number of nodes in cluster h, σ < 1 is the discount parameter, and {W_{V,H} : 1 ≤ H ≤ V} is a collection of non-negative weights satisfying the recursion W_{V,H} = (V − Hσ) W_{V+1,H} + W_{V+1,H+1}, with W_{1,1} = 1. Gibbs-type priors are a special case of PPMs [21, 43], which are probability models for random partitions z of the form p(z) ∝ c(Z_1)···c(Z_H), where {Z_1, ..., Z_H} is the partition associated to z, so that v ∈ Z_h if and only if z_v = h, whereas c(·) is a non-negative cohesion function measuring the homogeneity within each cluster. Such a connection will be useful to incorporate node-specific attribute effects in ESBMs. Interestingly, Gibbs-type priors represent the largest class of PPMs which are also species sampling models [41], meaning that the membership indicators z can be obtained in a sequential and interpretable manner. Specifically, a Gibbs-type random partition z can be sequentially generated according to

pr(z_{V+1} = h | z) ∝ W_{V+1,H} (n_h − σ) for h = 1, ..., H, and ∝ W_{V+1,H+1} for h = H + 1.  (5)

Hence, the community assignment process can be interpreted as a simple seating mechanism in which a new node is assigned to an existing community h with probability proportional to the current size n_h of that community, discounted by a global factor σ and further rescaled by a weight W_{V+1,H}, which may depend both on the size of the network and on the current number of non-empty communities. Alternatively, the incoming node is assigned to a new community with probability proportional to W_{V+1,H+1}. According to (5), when σ > 0 the number of non-empty communities H grows with V as O(V^σ). When σ = 0, H grows as O(log V), while σ < 0 yields a finite H even for infinitely many nodes. This is due to the fact that the reinforcement mechanism is reversed and each new community decreases the probability of creating future ones [8]. In the examples below we show how commonly used partition processes in SBMs and unexplored alternatives can be obtained as special cases of (5).
Example 1 (Dirichlet–multinomial – DM). Let σ < 0 and define W_{V,H} = β^{H−1} {∏_{h=1}^{H−1} (H̄ − h)} / (βH̄ + 1)_{V−1} · 1(H ≤ H̄) for some β = −σ and H̄ ∈ {1, 2, ...}. Then (5) coincides with the Dirichlet–multinomial urn scheme

pr(z_{V+1} = h | z) ∝ n_h + β for h = 1, ..., H, and ∝ β(H̄ − H) · 1(H ≤ H̄) for h = H + 1.

Example 2 (Dirichlet process – DP). Let σ = 0 and W_{V,H} = α^H / (α)_V for some α > 0. Then (5) leads to a CRP scheme

pr(z_{V+1} = h | z) ∝ n_h for h = 1, ..., H, and ∝ α for h = H + 1.

The CRP can also be obtained as a limiting case of a Dirichlet–multinomial process with β = α/H̄, as H̄ → ∞.

Example 3 (Pitman–Yor process – PY). Let σ ∈ [0, 1) and set W_{V,H} = {∏_{h=1}^{H−1} (α + hσ)} / (α + 1)_{V−1} for some α > −σ. Then (5) characterizes the Pitman–Yor process

pr(z_{V+1} = h | z) ∝ n_h − σ for h = 1, ..., H, and ∝ α + Hσ for h = H + 1.

This scheme clearly reduces to the DP when σ = 0.

Example 4 (Gnedin process – GN). Let σ = −1 and W_{V,H} = (γ)_{V−H} {∏_{h=1}^{H−1} (h² − γh)} / {∏_{v=1}^{V−1} (v² + γv)} for some γ ∈ (0, 1). Then (5) identifies the Gnedin process

pr(z_{V+1} = h | z) ∝ (n_h + 1)(V − H + γ) for h = 1, ..., H, and ∝ H² − Hγ for h = H + 1.

Other known and popular examples of tractable Gibbs-type priors can be found in [28, 8, 9, 32].

Table 1. A classification of Gibbs-type priors.

  Scenario   H̄         σ            H (growth)   Example
  I          Fixed     σ < 0        finite       Dirichlet–multinomial (DM)
  II         Random    σ < 0        finite       Gnedin process (GN)
  III.a      Infinite  σ = 0        O(log V)     Dirichlet process (DP)
  III.b      Infinite  σ ∈ (0, 1)   O(V^σ)       Pitman–Yor process (PY)
2.2.1 Learning the number of communities
A key focus in community detection is inferring the number of occupied clusters H. As the number of nodes V grows, H converges to H̄, which can be assumed, depending on the application, to be finite (scenario I), random but almost surely finite (scenario II), or infinite (scenario III). Classical SBMs [38] fall into scenario I, the MFM approach of [15] into scenario II, and the infinite relational model of [25] into scenario III. As shown in Table 1, Gibbs-type priors cover all the aforementioned scenarios, allowing analysts to choose the most suitable for a given study.

The only Gibbs-type prior within scenario I is the Dirichlet–multinomial, which serves as a building block for Gibbs-type priors in scenario II. In fact, the latter can be derived from the Dirichlet–multinomial by placing a prior on H̄, thus making it random. For instance, the distribution p_G(z; γ) of z under the Gnedin process in Example 4 can be easily expressed as

p_G(z; γ) = ∑_{h=1}^{∞} pr(H̄ = h) p_DM(z; 1, h),

where p_DM(z; β, H̄) denotes the Dirichlet–multinomial distribution in Example 1, and pr(H̄ = h) = γ(1 − γ)_{h−1}/h! can be interpreted as a prior distribution on H̄. Although different prior choices for H̄ might be considered [32], the Gnedin process has considerable advantages. Firstly, the sequential mechanism described in Example 4 has a simple analytical expression. Moreover, the distribution pr(H̄ = h) = γ(1 − γ)_{h−1}/h! has its mode at 1, a heavy tail and infinite expectation [17]. Hence, the associated MFM favors simpler models with fewer communities, while also being a robust specification for H̄ due to the heavy-tailed prior distribution.

Priors on H̄ quantify uncertainty in the total number of communities that one would expect if V → ∞. In practice, the number of non-empty communities H occupied by the observed V nodes is of more direct interest. Under Gibbs-type priors such a quantity has a closed-form probability mass function [18] that coincides with

pr(H = h) = W_{V,h} σ^{−h} C(V, h; σ),  h = 1, ..., V,  (6)

where C(V, h; σ) = (1/h!) ∑_{j=0}^{h} (−1)^j {h!/(j!(h − j)!)} (−jσ)_V is the generalized factorial coefficient. The CRP is recovered when σ →
0. In https://github.com/danieledurante/ESBM we provide code to evaluate such quantities under the Gibbs-type priors in Examples 1–4, and then leverage these values to compute the prior expectation of H, which can assist in choosing the hyperparameters. In our implementation the coefficients C(V, h; σ) were not computed from their definition, but leveraging numerically stable recursive formulas.

In addition to its practical relevance, (6) clarifies the asymptotic behavior of H. Indeed, the distribution of H converges to a point mass in scenario I, to a proper distribution in scenario II, and to a point mass at infinity in scenario III. For instance, recalling again the Gnedin process in Example 4, we have that (6) reduces to

pr_G(H = h) = (V choose h) (1 − γ)_{h−1} (γ)_{V−h} / (1 + γ)_{V−1},  h = 1, ..., V,

and hence the expected value can be easily computed as

E(H) = ∑_{h=1}^{V} h · (V choose h) (1 − γ)_{h−1} (γ)_{V−h} / (1 + γ)_{V−1}.

Note that lim_{V→∞} pr(H = h) = pr(H̄ = h) = γ(1 − γ)_{h−1}/h!.

2.2.2 Asymptotic properties
Dirichlet and Pitman–Yor processes may lead to inconsistent estimates for the number of communities if the data are generated from a model with H̄ < ∞ [31]. Intuitively, priors in scenario III fail in estimating a finite H̄ because, by assumption, H̄ = ∞. Hence, we suggest Gibbs-type priors with σ ≥ 0 only when it is reasonable to assume H̄ = ∞, that is, when the true number of communities is assumed to grow without bound with the number of nodes V.

If the analyst believes that H̄ < ∞, then Gibbs-type priors of scenario II may be more suitable. In the context of SBMs, [15] proved a consistency result for a
MFM, that actually applies to any Dirichlet–multinomial with a prior on H̄ supported on all positive integers. For instance, consistency holds for the Gnedin process in Example 4.

2.2.3 Inclusion of node attributes
When node-specific attributes x_v = (x_{v1}, ..., x_{vd})⊤, v = 1, ..., V, are available, such information may support inference on community structures, both in terms of point estimation and in reduction of posterior uncertainty. An option to include attributes within ESBMs in a principled manner is to rely on the PPM structure of Gibbs-type priors. Adapting results in [40, 33] to our network setting, this solution is based on the idea of replacing (4) with

p(z | X) ∝ W_{V,H} ∏_{h=1}^{H} p(X_h)(1 − σ)_{n_h − 1},  (7)

where X = (x_1, ..., x_V)⊤, whereas X_h = (x_v : z_v = h) are the attributes for the nodes in cluster h. In (7), p(X_h) controls the contribution of the attributes to the cluster cohesion and, as we will clarify later, it favors communities that are homogeneous with respect to attribute values. Even if attributes are not considered random, in this context [33] suggests choosing p(X_h) as the probability distribution induced by an auxiliary model p(X_h | ξ_h), with ξ_h denoting community-specific parameters, thus obtaining p(X_h) = ∫ p(X_h | ξ_h) d p(ξ_h). We refer to [33] for further discussion about the choice of p(·).

In this work, we consider the case in which each node attribute x_v = x_{v1} is a single categorical variable taking values in {1, ..., C}. This is a common setting in applications, where node attributes often come in the form of exogenous partitions. For example, in the Italian bill co-sponsorship network in Figure 1, possible attributes are party or coalition memberships, which we expect to influence voting behaviors. Following [33], we consider a Dirichlet–multinomial auxiliary model for such attributes, which leads to

p(X_h) ∝ ∏_{c=1}^{C} Γ(n_hc + α_c) / Γ(n_h + α),  (8)

where n_hc is the number of nodes in cluster h with attribute value c, and α = ∑_{c=1}^{C} α_c, with α_c > 0, c = 1, ..., C.
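As an illustration, the categorical cohesion (8), and its tendency to reward attribute-homogeneous clusters, can be sketched in a few lines of Python (a hypothetical helper of ours, not part of the authors' implementation):

```python
import numpy as np
from scipy.special import gammaln

def log_cohesion(x_h, C, alpha_c):
    """log p(X_h) in (8), up to an additive constant, for the categorical
    attributes x_h (values in {0, ..., C-1}) of the nodes in one cluster.
    alpha_c is the length-C vector of Dirichlet hyperparameters."""
    alpha_c = np.asarray(alpha_c, dtype=float)
    # n_hc: attribute counts within the cluster
    n_hc = np.bincount(np.asarray(x_h), minlength=C)
    return gammaln(n_hc + alpha_c).sum() - gammaln(len(x_h) + alpha_c.sum())
```

For a fixed cluster size, the cohesion is maximized when all nodes share the same attribute value: with C = 2 and unit hyperparameters, a cluster with attributes [0, 0, 0, 0] receives a strictly larger log-cohesion than one with [0, 0, 1, 1].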
3. Posterior computations and inference
We derive a collapsed Gibbs sampler that holds for any model within the ESBM class, and allows inclusion of node attributes. Then, we provide extensive tools not only for point estimation of the community structure, but also for uncertainty quantification and model selection. Despite their importance, these two aspects have been largely neglected in the SBM literature.
The availability of the urn scheme in (5) for the whole class of Gibbs-type priors allows us to derive a collapsed Gibbs sampler that holds for any ESBM (see Algorithm 1). At each iteration, we sample the community assignment of each node v from its full-conditional distribution given the adjacency matrix Y and the vector z_{−v} of the cluster assignments of all the other nodes. By simple application of the Bayes rule, these full-conditional probabilities are equal to

pr(z_v = h | Y, z_{−v}) = pr(z_v = h | z_{−v}) p(Y | z_v = h, z_{−v}) / p(Y | z_{−v}).  (9)

Recalling [45], the last term in (9) simplifies to

∏_{k=1}^{H} B(a + m⁻_hk + r_vk, b + m̄⁻_hk + n⁻_k − r_vk) / B(a + m⁻_hk, b + m̄⁻_hk),  (10)

where m⁻_hk and m̄⁻_hk denote the number of edges and non-edges between clusters h and k, without counting node v, and r_vk is the number of edges between node v and the nodes in cluster k. The prior term pr(z_v = h | z_{−v}) in (9) is derived from (5) and coincides with

pr(z_v = h | z_{−v}) ∝ W_{V,H⁻}(n⁻_h − σ) for h = 1, ..., H⁻, and ∝ W_{V,H⁻+1} for h = H⁻ + 1,  (11)

where n⁻_h and H⁻ are the cardinality of cluster h and the number of occupied communities, respectively, after removing node v from Y. Under the priors in Table 1, (11) admits the simple closed-form expressions reported in Examples 1–4. When available, nodal attributes can be incorporated via (7), leading to an attribute-dependent collapsed Gibbs sampler. In this case, the full conditionals in (9) become

pr(z_v = h | Y, X, z_{−v}) ∝ pr(z_v = h | Y, z_{−v}) p(X_h) / p(X_{h,−v}),  (12)

where X_h and X_{h,−v} are the attributes for the nodes in the h-th community, including and excluding node v, respectively. In the case of categorical attributes with p(X_h) as in (8), the last term in (12) can be written as

p(X_h) / p(X_{h,−v}) = (n⁻_{h x_v} + α_{x_v}) / (n⁻_h + α),  (13)

where n⁻_hc is the number of nodes in cluster h with covariate value c and n⁻_h is the total number of nodes in cluster h, both without counting node v. The introduction of this additional term favors the assignment of node v to the cluster(s) containing a higher fraction of nodes with its same covariate value x_v. In fact, for h = 1, ..., H⁻, (13) is close to the fraction of nodes in cluster h that have the same attribute value as node v, whereas for h = H⁻ + 1 it reduces to α_{x_v}/α.

Algorithm 1: Gibbs sampler for ESBM
At each iteration, update the cluster assignments as follows:
For v = 1, ..., V do:
1. Remove node v from the network;
2. If the cluster which contained node v contains no other node, discard it (so that clusters 1, ..., H⁻ are non-empty);
3. Sample z_v from a categorical distribution with probabilities

pr(z_v = h | Y, z_{−v}) = pr(z_v = h | z_{−v}) · p(Y | z_v = h, z_{−v}) / p(Y | z_{−v}),

for h = 1, ..., H⁻ + 1, with pr(z_v = h | z_{−v}) as in (11) and p(Y | z_v = h, z_{−v})/p(Y | z_{−v}) as in (10). If categorical node attributes are available and have to be incorporated via (7)–(8), rescale the above expression by (13).

Finally, although Algorithm 1 leverages the marginal likelihood in (3) with block probabilities integrated out, estimates for each θ_hk can be easily obtained. In particular, since (θ_hk | Y, z) ∼ Beta(a + m_hk, b + m̄_hk), we estimate θ_hk by

θ̂_hk = E[θ_hk | Y, z = ẑ] = (a + m̂_hk) / (a + m̂_hk + b + m̂̄_hk),  (14)

where m̂_hk and m̂̄_hk denote the number of edges and non-edges between nodes in communities h and k computed from the estimated cluster assignment ẑ. In the next subsection, we describe methods for estimation of z, uncertainty quantification in community detection, and model selection.
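As a concrete illustration, Algorithm 1 can be sketched in Python for the DP case of Example 2, where the urn weights (11) reduce to CRP probabilities. This is a simplified sketch of ours, not the implementation distributed by the authors:

```python
import numpy as np
from scipy.special import betaln

def block_stats(Y, z, H):
    """Edge counts m[h,k] and dyad counts N[h,k] between clusters."""
    m, N = np.zeros((H, H)), np.zeros((H, H))
    sizes = np.array([int(np.sum(z == h)) for h in range(H)])
    for h in range(H):
        for k in range(h + 1):
            sub = Y[np.ix_(np.where(z == h)[0], np.where(z == k)[0])]
            if h == k:
                m[h, h] = np.triu(sub, 1).sum()
                N[h, h] = sizes[h] * (sizes[h] - 1) / 2
            else:
                m[h, k] = m[k, h] = sub.sum()
                N[h, k] = N[k, h] = sizes[h] * sizes[k]
    return m, N, sizes

def gibbs_sweep(Y, z, alpha=1.0, a=1.0, b=1.0, rng=None):
    """One sweep of Algorithm 1 under a DP prior: each node is removed and
    reassigned from its full conditional, combining the urn weights (11)
    with the marginal likelihood ratio (10)."""
    rng = np.random.default_rng(rng)
    V = Y.shape[0]
    for v in range(V):
        others = np.delete(np.arange(V), v)
        zm = z[others]
        labels = np.unique(zm)              # drops v's cluster if now empty
        zm = np.searchsorted(labels, zm)    # relabel to 0, ..., H-1
        H = len(labels)
        m, N, sizes = block_stats(Y[np.ix_(others, others)], zm, H)
        r = np.array([Y[v, others[zm == k]].sum() for k in range(H)])
        logw = np.empty(H + 1)
        for h in range(H + 1):              # h == H opens a new cluster
            logw[h] = np.log(sizes[h] if h < H else alpha)   # urn (11)
            for k in range(H):                               # ratio (10)
                mk = m[h, k] if h < H else 0.0
                Nk = N[h, k] if h < H else 0.0
                logw[h] += (betaln(a + mk + r[k], b + Nk - mk + sizes[k] - r[k])
                            - betaln(a + mk, b + Nk - mk))
        w = np.exp(logw - logw.max())
        z_new = np.empty(V, dtype=int)
        z_new[others] = zm
        z_new[v] = rng.choice(H + 1, p=w / w.sum())
        z[:] = z_new
    return z
```

Repeated sweeps produce the MCMC samples of z on which the estimation, uncertainty quantification and model assessment tools of the next subsection operate.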
ESBM s provide the whole posterior distribution over the spaceof partitions. To fully exploit such a posterior, we adapt thedecision–theoretic approach of [49] to the community detec-tion setting. In this way, we summarize posterior distributionson partitions leveraging the variation of information ( VI ) met-ric [30], which quantifies distances between two clusteringsby comparing their individual and joint entropies, and rangesfrom 0 to log V . Intuitively, VI measures the amount of infor-mation in two clusterings relative to the information sharedbetween them, thus providing a metric that decreases to 0 asthe overlap between two partitions grows; see [49] for a dis-cussion of the key properties of VI that facilitate uncertaintyquantification on partitions. Under this framework, a formalBayesian point estimate for z is that partition with lowestposterior averaged VI distance from the other clusterings, thusobtainingˆ z = arg min z (cid:48) E z [ VI ( z , z (cid:48) ) | Y ] , (15)where the expectation is taken with respect to z . Due to thehuge cardinality of the partition space, even for moderate V , the optimization in (15) is typically carried out through agreedy algorithm [49], as in the R package mcclust.ext .The VI distance also provides natural strategies to con-struct credible sets around point estimates. In particular, onecan define a 1 − α credible ball around ˆ z by ordering the par-titions according to their VI distance from ˆ z , and defining the ball as containing all the partitions having less than a thresh-old distance from ˆ z , with the threshold chosen to minimizethe size of the ball while ensuring it contains at least 1 − α posterior probability. Summarizing this ball is non–trivialgiven the high–dimensional discrete nature of the space ofpartitions. In practice, as we illustrate in our examples below,one can report the partition at the edge of the ball, which wecall a credible bound. 
This form of uncertainty quantification complements the posterior co-clustering matrix, which reports, for every pair of nodes, the relative frequency of MCMC samples in which such nodes are assigned to the same community.

Another advantage of a Bayesian approach over algorithmic techniques is the possibility of model selection through formal testing. In particular, we can test two models M and M′ against each other by studying the Bayes factor [24]

B_{M,M′} = p(Y | M) / p(Y | M′) = {∑_z p(Y | z) p(z | M)} / {∑_z p(Y | z) p(z | M′)}.  (16)

Due to the unified structure of ESBMs, such an approach is highly general and allows comparisons between any two models in the ESBM class covering, for example, representations relying on different priors for z and formulations including or not node attributes. While for degenerate models, with p(z | M) = δ_{z′}, computing p(Y | M) reduces to evaluating (3) at a specific z′ [27], for non-degenerate models we must rely on posterior samples z^(1), ..., z^(T) from p(z | Y, M) to obtain an estimate of p(Y | M), for example through the harmonic mean [37, 44]

p̂(Y | M) = [T^{−1} ∑_{t=1}^{T} p(Y | z^(t))^{−1}]^{−1},  (17)

where p(Y | z^(t)) is the marginal likelihood in (3) evaluated at z^(t). We shall note that (17) may face instabilities and slow convergence to p(Y | M), thus motivating other estimators [14]. Such issues did not occur in our empirical studies, and the results were always coherent with other model assessment measures. Hence, we maintain (17) for its simplicity. As a global measure of goodness-of-fit we also study the misclassification error when predicting each y_vu with θ̂_{ẑ_v ẑ_u}.
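For completeness, the harmonic-mean estimator (17) can be evaluated stably in log space. A small sketch of ours (the instability caveat noted in the text still applies):

```python
import numpy as np

def log_harmonic_mean(loglik):
    """Log of the harmonic-mean estimate (17) of p(Y | M), computed from
    the values log p(Y | z^(t)) at the MCMC samples z^(1), ..., z^(T).
    Works entirely in log space to avoid under/overflow."""
    ll = np.asarray(loglik, dtype=float)
    neg = -ll
    mx = neg.max()
    logsum = mx + np.log(np.exp(neg - mx).sum())   # logsumexp(-ll)
    return np.log(len(ll)) - logsum
```

For instance, if all sampled log-likelihoods equal a common value c, the estimator returns c, as expected from (17).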
4. Simulation Studies
To assess
ESBM performance and highlight benefits over al-gorithmic strategies [4], we consider two simulated networksof V =
100 nodes with various types of block structures, bothsampled from a
SBM with ¯ H = . .
3. As illustrated in Figure 2, thefirst network has equally–sized groups of 20 nodes each, dis-playing either community or core–periphery patterns, whereasthe second has a cluster of size 40, one of size 30 and theremaining three of size 10, all characterized by classical com-munity structures. State–of–the–art algorithmic strategies [4]applied to these two networks failed in recovering the trueunderlying blocks and showed a tendency to over–collapsedifferent communities, due to their inability to incorporateunbalanced noisy partitions and behaviors beyond communitypatterns.The above result motivates implementation of
ESBM s,both with and without attributes coinciding with the true parti-tion z . Within the Gibbs–type class, we test the four represen-tative priors ( DM , DP , PY and GN ) for z presented in Table 1.Their hyperparameters are set so that the prior mean for thenumber H of non–empty clusters is close to 10 > ¯ H under allpriors. In this way we can check whether our results are robustto hyperparameters settings. Specifically, we set α = .
55 forthe DP , σ = .
575 and α = − .
325 for the PY , ¯ H =
50 and β = /
50 for the DM and γ = .
475 for the GN . In implement-ing these models we consider the default setting a = b = α = · · · = α C = z , after a conservative burn–in of 5000. Inour experiments, inference was robust with respect to the ini-tialization of z , but starting with one community for each nodeprovided the best mixing — when monitored on the chain ofthe likelihood in (3), evaluated at the MCMC samples of z . Thetraceplots for such a chain suggested rapid convergence under Figure 2.
Left: observed adjacency matrix, with colors on the side corresponding to the true communities. Center and right: posterior co–clustering matrices under the Gnedin process from the ESBM without and with node attributes, respectively. Colors on the side correspond to the estimated partition. The top panel refers to the first simulated network, the bottom panel to the second.

                                 log p(Y)   E[VI(z, z₀)|Y]   H          VI(ẑ, z₀)   VI(ẑ, z_b)
NETWORK 1 [WITHOUT ATTRIBUTES]
  DM                                            0.648        7 [6,8]      0.303
  DP                                            0.631        7 [6,8]      0.303
  PY                                            0.554        6 [6,7]      0.303
  GN                                            0.519        5 [5,6]
NETWORK 1 [WITH ATTRIBUTES]
  DM                                            0.108        5 [5,6]      0.000
  DP                                            0.105        5 [5,6]      0.000
  PY                                            0.105        5 [5,6]      0.000
  GN                                            0.085        5 [5,5]
NETWORK 2 [WITHOUT ATTRIBUTES]
  DM                                            0.837        6 [5,7]      0.570
  DP                                            0.819        6 [5,7]      0.570
  PY                                            0.762        4 [3,5]      0.570
  GN                                                           [3,5]
NETWORK 2 [WITH ATTRIBUTES]
  DM                                            0.052        5 [5,6]      0.000
  DP                                            0.063        5 [5,6]      0.000
  PY                                            0.081        5 [5,5]      0.000
  GN                                                           [5,5]

Table 2. Results of ESBMs in the two simulation scenarios with H̄ = 5. For each prior we report the marginal likelihood log p(Y), the posterior mean of the variation of information distance E[VI(z, z₀)|Y] from the true partition z₀, the posterior median number of non–empty clusters H (with first and third quartiles in brackets), the distance VI(ẑ, z₀) between the estimated and true partitions, and the distance VI(ẑ, z_b) between the estimated partition and the 95% credible bound.

all models, and Algorithm 1 provided 120 samples of z per second when implemented on an iMac with 1 Intel Core i5 3.4 GHz processor and 8
GB RAM, thus showing good efficiency. Table 2 summarizes the performance of the four priors, both with and without node attributes, in each of the two scenarios.

Among the four Gibbs–type priors considered for z, the Gnedin process always performed slightly better in terms of marginal likelihood and posterior mean of the VI distance from the true partition z₀. More notably, it typically offered more accurate learning of the number of communities, with tighter interquartile ranges that always included the true H̄ = 5, and tighter credible balls around the VI–optimal posterior point estimate ẑ. The posterior bias in terms of VI distance between ẑ and the true z₀ is comparable under all four priors and much smaller than the maximum achievable VI between two partitions of 100 nodes, which is log 100. The GN was also the most robust to hyperparameter settings.

As expected, including informative attributes improved performance, lowering E[VI(z, z₀)|Y] by one order of magnitude, bringing VI(ẑ, z₀) to zero and shrinking the credible balls. In a sense, this is the best-case scenario, since we used the true memberships z₀ as attributes. We also tried supervising with a random permutation of z₀. This resulted in a slight performance deterioration relative to the model without attributes, which is doubly reassuring: on one hand, it shows that the unsupervised model would be preferred to one with non-informative attributes under the proposed model-selection criteria; on the other, the fact that the deterioration is not dramatic suggests robustness in learning. According to Figure 2, unbalanced partitions are harder to infer, especially without attributes, but this gap vanishes when including informative attributes that can successfully support inference. All misclassification errors were about 0.29, almost matching the one expected under the true model, suggesting accurate calibration and a tendency to avoid overfitting in ESBMs.
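Since the comparisons above all rely on the variation of information, a short sketch of how VI can be computed between two label vectors may be useful; we use natural logarithms, so the maximum attainable distance between two partitions of V nodes is log V (the function name is illustrative):

```python
import numpy as np

def variation_of_information(z1, z2):
    """Variation of information (Meila, 2007) between two partitions,
    given as equal-length label vectors. Natural logarithms are used,
    so the maximum attainable distance for V nodes is log V."""
    _, i1 = np.unique(np.asarray(z1), return_inverse=True)
    _, i2 = np.unique(np.asarray(z2), return_inverse=True)
    n = i1.size
    # joint contingency table of the two clusterings
    cont = np.zeros((i1.max() + 1, i2.max() + 1))
    np.add.at(cont, (i1, i2), 1.0)
    p = cont / n
    p1, p2 = p.sum(axis=1), p.sum(axis=0)
    h_joint = -np.sum(p[p > 0] * np.log(p[p > 0]))
    h1 = -np.sum(p1[p1 > 0] * np.log(p1[p1 > 0]))
    h2 = -np.sum(p2[p2 > 0] * np.log(p2[p2 > 0]))
    # VI = 2 H(z1, z2) - H(z1) - H(z2)
    return 2 * h_joint - h1 - h2
```

The distance is invariant to relabeling of the clusters, equals zero only for identical partitions, and attains log V between the all-singletons and the one-cluster partitions.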
5. Bill co–sponsorship networks
Motivated by the growing interest in the analysis of political networks [13, 6, 5, 46], we apply the proposed ESBM class to the bill co–sponsorship network among the deputies of the Italian parliament during the 2013–2018 legislature, encoded in the V × V symmetric adjacency matrix Y with elements y_vu = y_uv = 1 if v co–sponsored a bill authored by u or viceversa, and y_vu = y_uv = 0 otherwise. Guided by the simulation results, we rely on the GN prior for z within our ESBM formulation. The hyperparameter γ is set to 0.5, corresponding to 20 expected clusters a priori, twice as many as the parties in the legislature, which seems reasonably conservative. We run Algorithm 1 both with and without node attributes for 15000 iterations after a burn–in of 5000. Mixing was adequate as in the simulations, whereas running times increased due to the much larger V, with about one minute required to produce 120 samples of z. Despite

PARTY                                     SEATS   LEFT–RIGHT   WING
Sinistra Ecologia Libertà                   34       1.3       LEFT
Movimento 5 Stelle                         104       2.6       M5S
Partito Democratico                        314       2.6       LEFT
Per l'Italia: Centro Democratico            11       6.0       LEFT
Scelta Civica per Monti                     30       6.0       LEFT
Forza Italia (Il Popolo della Libertà)      72       7.1       RIGHT
Lega Nord                                   22       7.8       RIGHT
Alleanza Nazionale                           9       8.1       RIGHT
Area Popolare                               31        –        RIGHT
Mixed or minor group                        28        –        MIXED

Table 3. Composition of the Italian parliament during the 2013–2018 mandate. For each party, we report the number of seats and the left–right score from [5], with 0 corresponding to extreme left and 10 to extreme right. The last column denotes the macro–alliances underlying the 2013–2018 legislature, which we use as node attributes.
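One standard way to score categorical attributes such as these macro–alliances within a cluster, consistent with the symmetric Dirichlet hyperparameters α₁ = ⋯ = α_C of our default settings, is the Dirichlet–multinomial marginal; the sketch below assumes this specific form for the attribute term in (7), and the function name is illustrative:

```python
from math import lgamma, log

def log_attribute_cohesion(counts, alpha):
    """Log Dirichlet-multinomial marginal for the attribute labels of one
    cluster: counts[c] members display category c, with Dirichlet
    hyperparameters alpha[c] (alpha_1 = ... = alpha_C in our defaults)."""
    a0, n = sum(alpha), sum(counts)
    out = lgamma(a0) - lgamma(n + a0)
    for n_c, a_c in zip(counts, alpha):
        out += lgamma(a_c + n_c) - lgamma(a_c)
    return out
```

This term rewards clusters that are homogeneous in the attribute: with two categories and unit hyperparameters, a pair sharing the same wing scores 1/3, while a split pair scores 1/6.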
Figure 3. Left: observed bill co–sponsorship adjacency matrix, reordered according to the estimated communities; colors on the side correspond to political wings, used as node attributes: blue for the right wing, red for the left wing, yellow for Movimento 5 Stelle and grey for the mixed group. Right: posterior co–clustering matrix; colors on the side denote estimated clusters, with shades proportional to the prevailing party: green for Lega Nord, blue for the rest of the right wing, red for Partito Democratico and Per l'Italia: Centro Democratico, orange for Scelta Civica per Monti, purple for Sinistra Ecologia Libertà, yellow for Movimento 5 Stelle, grey for the mixed group.
Figure 4. Riverplot highlighting which nodes change community when comparing the estimated partition ẑ with the bound z_b of the 95% credible ball around ẑ. Party colors are the same as in Figure 3.

this increment, running times under Algorithm 1 remain feasible even for larger networks.

As shown in Table 3, the node attributes are political wings denoting macro–alliances, which we found to be more informative than single parties, based on the marginal likelihood. As shown in Table 4, this measure also suggests a preference for the attribute–assisted model, meaning that such external groupings carry information about the block structures in the network. This is confirmed by the matrix in the left panel of Figure 3, in which, by reordering the nodes according to the inferred partition, we can observe a recurrent core–periphery pattern underlying each wing that was hidden in Figure 1 and could not have been captured by algorithmic approaches. This structure is suggestive of a system in which only a subset of politicians are active in proposing new bills, whereas the others are less active and tend to support just those bills proposed by members of the same wing. The right panel of Figure 3, instead, represents the posterior co–clustering matrix, which is quite sharp, suggesting limited posterior uncertainty. This is also highlighted by Figure 4 and confirmed by the posterior summaries in Table 4. In fact, the radius of the credible ball is far below 1 under both models, while the maximum achievable VI distance is log V. A misclassification error of about 0.05 confirms the satisfactory fit of the models.

The co–clustering matrix in Figure 3 also shows alliances among parties in the same wing and fragmentations within larger parties, mostly due to core–periphery structures.

                              log p(Y)   H            VI(ẑ, z_b)
WITHOUT ATTRIBUTES                        32 [32,33]
WITH ATTRIBUTES [WINGS]                   31 [31,31]

Table 4. Marginal likelihood log p(Y) and posterior summaries for the bill co–sponsorship network under the Gnedin process: posterior median number of occupied communities H (with first and third quartiles in brackets) and distance VI(ẑ, z_b) between the estimated partition ẑ and the 95% VI credible bound.

Figure 5. Network representation of inferred clusters. Each node denotes one community and edges are weighted by the estimated block probabilities. Node sizes are proportional to cluster cardinalities, while pie–charts represent cluster compositions with respect to party affiliations (party colors are the same as in Figure 3). Node placement reflects the strength of connections: higher block probabilities result in closer nodes.

This fragmentation can also be observed in Figure 5, where clusters are visualized as nodes of a new weighted network, with weights given by the block probabilities estimated via (14). Party memberships within each cluster are represented via pie–charts, thus highlighting different fragmentation and aggregation levels for the different parties. For example, all members of
Lega Nord belong to the same community, while
Movimento 5 Stelle and
Partito Democratico are split over several blocks. Right–wing parties, instead, belong to two main communities with different party proportions. The "geography" of such communities, induced by the block probabilities, mostly reflects the left–right placement of Italian parties in Table 3, and highlights a polarization around three main forces, covering left parties, right parties and Movimento 5 Stelle, that are almost equidistant.
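The block probabilities in (14) that define these cluster-level weights are, under the conjugate beta–binomial specification with Beta(a, b) priors, posterior means of the form (a + y_hk)/(a + b + n_hk), where y_hk and n_hk denote the observed edges and the number of dyads between clusters h and k. A sketch under this assumption (the helper name is illustrative, and a = b = 1 matches the default setting of Section 4):

```python
import numpy as np

def block_probabilities(Y, z, a=1.0, b=1.0):
    """Posterior means (a + y_hk) / (a + b + n_hk) of the block
    probabilities given a partition z, under Beta(a, b) priors.

    Y: symmetric binary adjacency matrix (V x V, zero diagonal).
    z: cluster labels taking values 0, ..., H-1.
    """
    z = np.asarray(z)
    H = z.max() + 1
    theta = np.zeros((H, H))
    for h in range(H):
        for k in range(h, H):
            rows, cols = np.where(z == h)[0], np.where(z == k)[0]
            sub = Y[np.ix_(rows, cols)]
            if h == k:
                # within-cluster block: count each dyad once, no diagonal
                n_hk = rows.size * (rows.size - 1) / 2
                y_hk = np.triu(sub, 1).sum()
            else:
                n_hk = rows.size * cols.size
                y_hk = sub.sum()
            theta[h, k] = theta[k, h] = (a + y_hk) / (a + b + n_hk)
    return theta
```

The resulting H × H matrix is exactly the weighted network of clusters displayed in Figure 5, up to the layout algorithm.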
6. Discussion
In the present paper we have proposed
ESBMs, a broad class of models that unifies most existing SBMs via Gibbs–type priors. Besides providing a single methodological, theoretical and computational framework for various SBMs, this generalization facilitates the proposal of new models by exploring alternative options within the Gibbs–type class, and allows the natural inclusion of attributes via connections with PPMs. For example, we have shown in simulations that the Gnedin process, which to the best of our knowledge had never been used in SBMs, can improve on the performance of the already–implemented DP, PY and DM. The illustrative political application outlines the benefits of our extended class of models and inference methods, capturing hidden block structures and core–periphery patterns.

The present work offers many directions for future research. For example, the highly modular structure of ESBMs facilitates extensions to directed, bipartite and weighted networks, as done by [45] for the infinite relational model. To address this goal, it suffices to substitute the beta–binomial likelihood in (3) with suitable alternatives, for example gamma–Poisson for count edges and Gaussian–Gaussian for continuous ones. Other types of attributes beyond categorical ones can also be easily included by leveraging the default choices suggested by [33] for p(·) in (7) under continuous, ordinal and count–type attributes. Further extensions to other representations, such as the mixed membership SBM [1], and the development of more scalable algorithms are also worth exploring.
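To make this modularity concrete: assuming the standard collapsed form of (3), each block contributes a beta–binomial marginal, and handling count-valued edges amounts to swapping in a gamma–Poisson marginal while leaving the rest of the model and sampler untouched. A sketch under these assumptions (function names are illustrative):

```python
from math import lgamma, log, exp

def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def block_logml_binary(y_hk, n_hk, a=1.0, b=1.0):
    """Beta-binomial contribution of one block to (3):
    log B(a + y_hk, b + n_hk - y_hk) - log B(a, b)."""
    return log_beta(a + y_hk, b + n_hk - y_hk) - log_beta(a, b)

def block_logml_counts(counts, a=1.0, b=1.0):
    """Gamma-Poisson analogue for count-valued edges: counts lists the
    dyadic counts in one block, with a Gamma(a, b) prior on its rate."""
    s, n = sum(counts), len(counts)
    out = a * log(b) - lgamma(a) + lgamma(a + s) - (a + s) * log(b + n)
    out -= sum(lgamma(c + 1) for c in counts)  # Poisson normalizers
    return out
```

For instance, a block with one edge among two dyads has marginal B(2, 2)/B(1, 1) = 1/6 under a = b = 1; replacing the binary marginal by the count version is the only change needed for weighted networks.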
Acknowledgments
This work was partially supported by PRIN–MIUR funding.
References

[1] Airoldi, E. M., Blei, D. M., Fienberg, S. E., and Xing, E. P. Mixed membership stochastic blockmodels. Journal of Machine Learning Research 9 (2008), 1981–2014.
[2] Athreya, A., Fishkind, D. E., Tang, M., Priebe, C. E., Park, Y., Vogelstein, J. T., Levin, K., Lyzinski, V., and Qin, Y. Statistical inference on random dot product graphs: a survey. Journal of Machine Learning Research 18 (2017), 8393–8484.
[3] Bickel, P., Choi, D., Chang, X., Zhang, H., et al. Asymptotic normality of maximum likelihood and its variational approximation for stochastic blockmodels. Annals of Statistics 41 (2013), 1922–1943.
[4] Blondel, V. D., Guillaume, J. L., Lambiotte, R., and Lefebvre, E. Fast unfolding of communities in large networks. Journal of Statistical Mechanics 10 (2008), P10008.
[5] Briatte, F. Network patterns of legislative collaboration in twenty parliaments. Network Science 4 (2016), 266–271.
[6] Cranmer, S. J., and Desmarais, B. A. Inferential network analysis with exponential random graph models. Political Analysis 19 (2011), 66–86.
[7] Crossley, N. A., Mechelli, A., Vértes, P. E., Winton-Brown, T. T., Patel, A. X., Ginestet, C. E., McGuire, P., and Bullmore, E. T. Cognitive relevance of the community structure of the human brain functional coactivation network. Proceedings of the National Academy of Sciences 110 (2013), 11583–11588.
[8] De Blasi, P., Favaro, S., Lijoi, A., Mena, R. H., Prünster, I., and Ruggiero, M. Are Gibbs–type priors the most natural generalization of the Dirichlet process? IEEE Transactions on Pattern Analysis and Machine Intelligence 37 (2013), 212–229.
[9] De Blasi, P., Lijoi, A., and Prünster, I. An asymptotic analysis of a class of discrete nonparametric priors. Statistica Sinica 23, 3 (2013), 1299–1321.
[10] Fortunato, S. Community detection in graphs. Physics Reports 486 (2010), 75–174.
[11] Fortunato, S., and Hric, D. Community detection in networks: A user guide. Physics Reports 659 (2016), 1–44.
[12] Fosdick, B. K., McCormick, T. H., Murphy, T. B., Ng, T. L. J., and Westling, T. Multiresolution network models. Journal of Computational and Graphical Statistics 28 (2019), 185–196.
[13] Fowler, J. H. Connecting the Congress: A study of cosponsorship networks. Political Analysis 14 (2006), 456–487.
[14] Gelman, A., and Meng, X.-L. Simulating normalizing constants: from importance sampling to bridge sampling to path sampling. Statistical Science 13, 2 (1998), 163–185.
[15] Geng, J., Bhattacharya, A., and Pati, D. Probabilistic community detection with unknown number of communities. Journal of the American Statistical Association 114 (2019), 893–905.
[16] Girvan, M., and Newman, M. E. Community structure in social and biological networks. Proceedings of the National Academy of Sciences 99 (2002), 7821–7826.
[17] Gnedin, A. Species sampling model with finitely many types. Electronic Communications in Probability 15 (2010), 79–88.
[18] Gnedin, A., and Pitman, J. Exchangeable Gibbs partitions and Stirling triangles. Zapiski Nauchnykh Seminarov POMI 325 (2005), 83–102.
[19] Guimerà, R., Mossa, S., Turtschi, A., and Amaral, L. N. The worldwide air transportation network: Anomalous centrality, community structure, and cities' global roles. Proceedings of the National Academy of Sciences 102 (2005), 7794–7799.
[20] Handcock, M. S., Raftery, A. E., and Tantrum, J. M. Model-based clustering for social networks. Journal of the Royal Statistical Society: Series A 170 (2007), 301–354.
[21] Hartigan, J. Partition models. Communications in Statistics - Theory and Methods 19 (1990), 2745–2756.
[22] Holland, P. W., Laskey, K. B., and Leinhardt, S. Stochastic blockmodels: First steps. Social Networks 5 (1983), 109–137.
[23] Karrer, B., and Newman, M. E. Stochastic blockmodels and community structure in networks. Physical Review E 83 (2011), 016107.
[24] Kass, R. E., and Raftery, A. E. Bayes factors. Journal of the American Statistical Association 90 (1995), 773–795.
[25] Kemp, C., Tenenbaum, J. B., Griffiths, T. L., Yamada, T., and Ueda, N. Learning systems of concepts with an infinite relational model. In Proceedings of the 21st National Conference on Artificial Intelligence - Volume 1 (2006), pp. 381–388.
[26] Lee, C., and Wilkinson, D. J. A review of stochastic block models and extensions for graph clustering. Applied Network Science 4 (2019), 1–50.
[27] Legramanti, S., Rigon, T., and Durante, D. Bayesian testing for exogenous partition structures in stochastic block models. Manuscript submitted for publication (2020).
[28] Lijoi, A., Mena, R. H., and Prünster, I. Controlling the reinforcement in Bayesian non-parametric mixture models. Journal of the Royal Statistical Society: Series B 69, 4 (2007), 715–740.
[29] Liu, F., Choi, D., Xie, L., and Roeder, K. Global spectral clustering in dynamic networks. Proceedings of the National Academy of Sciences 115 (2018), 927–932.
[30] Meilă, M. Comparing clusterings – an information based distance. Journal of Multivariate Analysis 98, 5 (2007), 873–895.
[31] Miller, J. W., and Harrison, M. T. Inconsistency of Pitman-Yor process mixtures for the number of components. Journal of Machine Learning Research 15, 1 (2014), 3333–3370.
[32] Miller, J. W., and Harrison, M. T. Mixture models with a prior on the number of components. Journal of the American Statistical Association 113 (2018), 340–356.
[33] Müller, P., Quintana, F., and Rosner, G. L. A product partition model with regression on covariates. Journal of Computational and Graphical Statistics 20, 1 (2011), 260–278.
[34] Newman, M. E. Modularity and community structure in networks. Proceedings of the National Academy of Sciences 103 (2006), 8577–8582.
[35] Newman, M. E., and Clauset, A. Structure and inference in annotated networks. Nature Communications 7 (2016), 1–11.
[36] Newman, M. E. J., and Girvan, M. Finding and evaluating community structure in networks. Physical Review E 69 (2004), 026113.
[37] Newton, M. A., and Raftery, A. E. Approximate Bayesian inference with the weighted likelihood bootstrap. Journal of the Royal Statistical Society: Series B 56 (1994), 3–26.
[38] Nowicki, K., and Snijders, T. A. B. Estimation and prediction for stochastic blockstructures. Journal of the American Statistical Association 96 (2001), 1077–1087.
[39] Olhede, S. C., and Wolfe, P. J. Network histograms and universality of blockmodel approximation. Proceedings of the National Academy of Sciences 111 (2014), 14722–14727.
[40] Park, J.-H., and Dunson, D. B. Bayesian generalized product partition model. Statistica Sinica 20, 3 (2010), 1203–1226.
[41] Pitman, J. Some developments of the Blackwell-MacQueen urn scheme. Statistics, Probability and Game Theory 30 (1996), 245–267.
[42] Priebe, C. E., Park, Y., Vogelstein, J. T., Conroy, J. M., Lyzinski, V., Tang, M., Athreya, A., Cape, J., and Bridgeford, E. On a two-truths phenomenon in spectral graph clustering. Proceedings of the National Academy of Sciences 116 (2019), 5995–6000.
[43] Quintana, F. A., and Iglesias, P. L. Bayesian clustering and product partition models. Journal of the Royal Statistical Society: Series B 65, 2 (2003), 557–574.
[44] Raftery, A. E., Newton, M. A., Satagopan, J. M., and Krivitsky, P. N. Estimating the integrated likelihood via posterior simulation using the harmonic mean identity. Bayesian Statistics 8 (2007), 1–45.
[45] Schmidt, M. N., and Mørup, M. Nonparametric Bayesian modeling of complex networks: An introduction. IEEE Signal Processing Magazine 30 (2013), 110–128.
[46] Signorelli, M., and Wit, E. C. A penalized inference approach to stochastic block modelling of community structure in the Italian parliament. Journal of the Royal Statistical Society: Series C 67 (2018), 355–369.
[47] Stanley, N., Bonacci, T., Kwitt, R., Niethammer, M., and Mucha, P. J. Stochastic block models with multiple continuous attributes. Applied Network Science 4 (2019), 1–22.
[48] Tallberg, C. A Bayesian approach to modeling stochastic blockstructures with covariates. Journal of Mathematical Sociology 29 (2004), 1–23.
[49] Wade, S., and Ghahramani, Z. Bayesian cluster analysis: Point estimation and credible balls. Bayesian Analysis 13 (2018), 559–626.
[50] Zhao, Y., Levina, E., and Zhu, J. Community extraction for social networks. Proceedings of the National Academy of Sciences 108 (2011), 7321–7326.
[51] Zhao, Y., Levina, E.,