Modeling partitions of individuals
Marion Hoffman∗, Per Block and Tom A.B. Snijders
Chair of Social Networks, ETH Zürich, Weinbergstrasse 109, 8006 Zürich, Switzerland
Department of Sociology and Leverhulme Centre for Demographic Science, University of Oxford, Oxford OX1 1JD, United Kingdom
University of Groningen, Groningen, Netherlands
Nuffield College, University of Oxford, Oxford OX1 1NF, United Kingdom
Abstract
Despite the central role of self-assembled groups in animal and human societies, statistical tools to explain their composition are limited. We introduce a statistical framework for cross-sectional observations of groups with exclusive membership to illuminate the social and organizational mechanisms that bring people together. Drawing from stochastic models for networks and partitions, the proposed framework introduces an exponential family of distributions for partitions. We derive its main mathematical properties and suggest strategies to specify and estimate such models. A case study on hackathon events applies the developed framework to the study of mechanisms underlying the formation of self-assembled project teams.
Keywords: exponential families, stochastic partitions, statistical modeling, social groups, self-assembled groups
In gregarious species, individuals have a tendency to come together in groups. This is especially pertinent in humans. Often, the composition of these groups emerges from voluntary decisions of members, thus crystallizing socializing preferences in social groups, or goal-oriented behaviors in the case of task-oriented groups. In some cases, membership of a group is exclusive, in the sense that every individual can only be a member of one group. This exclusivity might result from physical and temporal constraints (e.g., when group boundaries are defined by physical gathering) or structural rules (e.g., group overlap is often forbidden when groups compete for some goal). In such cases of self-assembled and exclusive groups, the decision to group with certain individuals rather than others can depend on important social mechanisms that structure the organisation of a community. The present paper introduces a statistical framework to model and explain observations of self-assembled exclusive groups, with a view to better understanding the mechanisms underlying their formation.

∗ Contact: marion.hoff[email protected]

Examples of self-assembled exclusive groups are numerous, ranging from mammal herds in the wild to player squads in online games. Numerous situations require individuals to organize themselves into such groups, in order for them to execute an action or acquire a resource. In the animal kingdom, many species gather into flocks, herds, or schools for traveling purposes (Okubo, 1986; Reynolds, 1987) and predators assemble packs for hunting (Creel & Creel, 1995; Gittleman, 1989). In the human world, groups of children gather in the schoolyard to engage in common activities (Moody, 2001), sport clubs emerge to provide opportunities for shared free-time activities (Putnam, 2000; Lazarsfeld & Merton, 1954), and project teams assemble spontaneously to tackle organizational tasks (Guimera et al., 2005; Zhu et al., 2013).
In this paper we use the example of human groups, and in particular project teams, in the empirical illustration.

The existence and composition of social groups have a crucial role in determining societal outcomes. Seminal sociological works recognize that by coming together in groups, individuals influence each other's cognition, affective structures, and individual outcomes (Parsons, 1949; Homans, 1950; Lewin et al., 1936). Various theories develop concepts for group settings, such as social circles (Simmel, 1949), social foci (Feld, 1982), social settings (Pattison & Robins, 2002), or social situations (Block, 2018). Groups lay the ground for the development of social ties (Fischer, 1982; Moody, 2001; Lazarsfeld & Merton, 1954; Simmel, 1949) and provide the context for exchange relations (Granovetter, 1985), where the ability of one group member to acquire resources and social support will depend on what the other members can provide. Additionally, the set of attributes present in a group, as well as the relations between its members, might have an impact on some essential group outcomes. At a broader level, the formation of groups in a community can indicate and impact how different parts of the community relate to each other and segregate (Allport et al., 1954). Investigating group formation is all the more important in instances where such outcomes are crucial to the functioning of individuals and communities. The study of these interdependent group processes calls for the development of mathematical tools tailored for this level of social unit, as argued by Lindenberg (1997).

The main aim of the model we propose is to uncover which mechanisms guide the composition of self-assembled groups in a given setting, and to assess their relative importance. Such mechanisms can fall in the categories of biological imperatives, social preferences, and exogenous constraints. Adding to the variety of their origin, the mechanisms underlying group formation can also be situated at different levels.

1. For any group member, the characteristics of the other members reflect their individual attraction towards others exhibiting some particular attribute.
2. Group composition can also reflect dyadic preferences, such as the preference of individuals connected through a relationship (e.g., kinship or friendship) to belong to the same groups.
3. Finally, group-level mechanisms, such as the optimization of a certain combination of attributes, can guide group formation.

In the example of project teams, individuals might seek (1) teams with other powerful or skilled individuals, (2) colleagues with whom they have already collaborated, or (3) teams with an efficient distribution of competences (Skvoretz & Bailey, 2016). On top of these formation mechanisms, some contexts might constrain group compositions or sizes; for example, a maximal group size might be imposed. The proposed model aims to shed light on the role of these diverse factors in group formation processes while taking such constraints into account.

Previous approaches

A common approach to modelling group membership is to represent individuals and groups by a two-mode, or bipartite, network in which nodes on one level (i.e., individuals) are connected to a second level of nodes (i.e., groups). Permutation test techniques and models such as the Quadratic Assignment Procedure (QAP) proposed by Krackhardt (1988) can be used to investigate whether some combinations of attributes within groups are more likely than others within this representation. Other statistical tools, and notably the Exponential Random Graph Model (ERGM), leverage the capabilities of exponential family models (Sundberg, 2019) and make use of techniques from spatial statistics (Besag, 1974) and graphical modeling (Lauritzen, 1996) to capture more complex dependencies between tie observations in one- or two-mode networks (Lusher et al., 2013).
The ERGM can be used to model both attribute and structural dependencies, such as the propensity of individuals to join groups when they already share other groups with their members. Theoretically, it is possible to restrict the support of an ERGM to bipartite networks with individuals' degrees fixed to one (Morris et al., 2008), thus allowing exclusivity in group membership to be modeled. However, the main limitation of representing a partition by a bipartite network is that the number and characteristics of second-mode nodes are predetermined and are not modelled themselves (Wang et al., 2009). One consequence of this limitation is that the model allows few insights into the mechanisms underlying the number and (implicitly) size of the self-emergent groups. So far, only approaches designed for dynamic changes in group compositions over time could circumvent this issue by artificially creating and deleting second-mode nodes (Hoffman et al., 2020), but such procedures remain ill-suited for cross-sectional observations.

A variation of the network logic that integrates the constraint of exclusive group membership was defined in the general location system (GLS) model of Butts (2007). This model is tailored to observations where individuals can be assigned to only one group (or location) at a time, similarly to how individuals set themselves into occupations or geographical residences. In the same vein as the ERGM for network representations, the GLS framework builds upon the exponential family formalism in order to model complex dependencies between observations of group memberships (or location assignments). As above, this specification requires knowing the number and characteristics of the available groups in advance.

A way to circumvent the difficulty of not having predefined groups is to represent groups as a partition of the set of individuals, with a partition being a division of the individuals into non-overlapping groups.
Popular partition distributions are the uniform and Dirichlet-multinomial partitions defined for partitions with a maximum possible number of groups (McCullagh, 2011; Kingman, 1978) and Poisson-Dirichlet distributions (Pitman & Yor, 1997). Such families of distributions still assume a predefined number of available classes, although the possible number of groups now sits between one and a maximal value. The extension of these models when the maximum value becomes infinite is known as the Ewens distribution (Ewens, 1972; McCullagh, 2011). The Ewens distribution was first applied to the problem of allele sampling in genetics (Ewens, 1972), but its use, as well as the use of the related Dirichlet distributions, has spread into the fields of biodiversity (Hubbell, 2001), Bayesian statistics (Ferguson, 1973; Antoniak, 1974) and many other fields of mathematics (Crane et al., 2016). Interestingly, the Ewens specification also defines an exponential family (Crane et al., 2016). One limitation of these models is that they cannot incorporate attribute and structural dependencies between group memberships in the same way ERGMs and GLS models do. This is connected to their main applications being sampling problems.

In this paper, we incorporate insights from the network and partition modeling literature into a novel statistical framework suited for observations of self-assembled and exclusive groups. This framework represents groups of individuals as a partition of a set of individuals and builds upon the literature on exponential families for networks to capture non-trivial dependencies between the groups composing the partition. The model allows for the size and composition of groups to be the result of individual, relational, and group-level processes, and offers the possibility to draw inference on the processes driving the formation of groups in a certain context. Sections 2 to 4 describe the definitions, mathematical formulation, and interpretation of the model.
Sections 5 and 6 cover the computation and estimation of the model parameters. Section 7 presents an application to the study of self-assembled teams during hackathon competitions.
Consider a set of $n$ actors $\mathcal{A}$. A partition $P$ over $\mathcal{A}$ represents a division of these actors into non-overlapping subsets. Formally, $P$ is a set of groups, or blocks, denoted $G$, that satisfies the conditions:

$$\bigcup_{G \in P} G = \mathcal{A}, \qquad \forall (G, G') \in P, \ G \neq G' : G \cap G' = \emptyset, \qquad \forall G \in P : G \neq \emptyset.$$

For convenience, we define the function $g_P : [[1, n]] \mapsto P$ returning the group of a given node: $g_P(i) = G \mid i \in G$. We can also transform the partition representation into the binary $n \times n$ matrix $X = [x_{i,j}]_{i,j \in \mathcal{A}}$ where

$$x_{i,j} = 1 \Leftrightarrow g_P(i) = g_P(j).$$

Figure 1 illustrates different possible representations of a partition in comparison to the ones used in the case of networks.

Figure 1: Possible representations of a partition over a six-node set.

Actors can have $H$ ($H \geq 0$) individual attributes (e.g., gender, age). We define the actor attribute matrix containing all the actors' attributes: $A = [a_{h,i}]_{h \in [[1,H]], i \in \mathcal{A}}$. Relational attributes relevant to the analysis, such as interpersonal ties between actors, can also be defined as $n \times n$ matrices $Z^{(1)}$, $Z^{(2)}$, and so on.

In the following sections, we use the notation $|P|$ for the number of groups in a partition $P$, and $|G|$ for the size of a given group $G$. Furthermore, we use the letter $P$ when referring to a random partition, and $p$ for the realization of a partition. To avoid any confusion, probabilities are written with the symbol $\Pr$.

The set of all partitions over the set $\mathcal{A}$ is referred to as $\mathcal{P}(\mathcal{A})$ (or $\mathcal{P}$ when the nodeset is not ambiguous). The size of $\mathcal{P}$ is given by the Bell number $B_n$ (Bell, 1934; Pitman, 1997) and can be calculated iteratively by:

$$B_0 = 1 \quad \text{and} \quad B_{n+1} = \sum_{i=0}^{n} \binom{n}{i} B_i. \tag{1}$$

In certain contexts, some partitions of the actor set might not be realistic or allowed, in which case one might only consider a subset $\mathcal{P}'$ of the whole partition space $\mathcal{P}$. Most prominently, certain group sizes might not be allowed. When considering subsets $\mathcal{P}'$ that only contain groups of sizes greater than or equal to a minimal value $\sigma_{\min}$ and lower than or equal to a value $\sigma_{\max}$, a number of calculations can be extended. The number of partitions belonging to this subset can be calculated similarly to Equation (1) with a sequence $B'_n$ (details can be found in Appendix A). After defining the values $i_{\min} = \max(0, n + 1 - \sigma_{\max})$ and $i_{\max} = \min(n, n + 1 - \sigma_{\min})$, $B'_n$ is defined by:

$$B'_n = 0 \ \text{for} \ 0 < n < \sigma_{\min}, \qquad B'_0 = B'_{\sigma_{\min}} = 1, \qquad B'_{n+1} = \sum_{i = i_{\min}}^{i_{\max}} \binom{n}{i} B'_i \ \text{for} \ n \geq \sigma_{\min}. \tag{2}$$
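As a concrete check of recursion (1), the Bell numbers can be computed in a few lines. The following Python sketch is our own illustration (the function name is not from the paper):

```python
from math import comb

def bell_numbers(n_max):
    """Bell numbers B_0 ... B_{n_max} via B_{n+1} = sum_i C(n, i) B_i (Equation 1)."""
    bells = [1]  # B_0 = 1
    for n in range(n_max):
        bells.append(sum(comb(n, i) * bells[i] for i in range(n + 1)))
    return bells

print(bell_numbers(10))  # [1, 1, 2, 5, 15, 52, 203, 877, 4140, 21147, 115975]
```

Since $B_{10} = 115{,}975$, exhaustive enumeration is still feasible for the ten-node examples used later in the paper, but it quickly becomes intractable as $n$ grows.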
For the purpose of parameter estimation and interpretation, we define three symmetric binary relations between the elements of $\mathcal{P}$, called the merge/split, permute, and transfer relations (see an illustration of these relations in Figure 2).

The merge/split relation $R_{\text{merge}}$ is the set of all unordered pairs of partitions for which one partition of the pair is obtained by merging two distinct groups in the other partition. Since these are unordered pairs, in the reverse direction this definition includes splitting one group in one partition into two groups in the other. Formally, we define $P_{-G,G'} = P \setminus \{G, G'\}$, the partition $P$ with the two groups $G$ and $G'$ removed. The relation can be written as:

$$R_{\text{merge}} = \big\{ \{P, P'\} \subseteq \mathcal{P} \mid \exists\, G, G' \in P : P' = P_{-G,G'} \cup \{G \cup G'\} \big\}.$$

The permute relation $R_{\text{permute}}$ links partitions in which two nodes in two different groups are exchanged, while the grouping of the other nodes remains the same. For $i$ and $i'$ two nodes respectively belonging to two distinct groups $G$ and $G'$, we denote by $G_{i \leftrightarrow i'}$ and $G'_{i' \leftrightarrow i}$ the groups in which the nodes $i$ and $i'$ have been exchanged. Under the same notation, the relation defines the following unordered pairs:

$$R_{\text{permute}} = \big\{ \{P, P'\} \subseteq \mathcal{P} \mid \exists\, G, G' \in P, \ i \in G, \ i' \in G' : P' = P_{-G,G'} \cup \{G_{i \leftrightarrow i'}, G'_{i' \leftrightarrow i}\} \big\}.$$

Finally, the transfer relation $R_{\text{transfer}}$ contains the unordered pairs of partitions $\{P, P'\}$ for which $P$ and $P'$ are identical, with the exception of one node that belongs to a different group in $P$ and $P'$ (we can say that this node is transferred from one group to another). Importantly, this node may be an isolate in one of the two partitions. Similarly, for a node $i$ belonging to the original nodeset $\mathcal{A}$, we denote $P_{-i}$ the projection of the partition on the set $\mathcal{A} \setminus \{i\}$. The relation is then defined by:

$$R_{\text{transfer}} = \big\{ \{P, P'\} \subseteq \mathcal{P} \mid \exists\, i \in \mathcal{A} : P'_{-i} = P_{-i} \ \text{and} \ P' \neq P \big\}.$$

Figure 2: Illustration of the merge/split, permute, and transfer relations for the full set of partitions over three nodes. All relations are binary and symmetric.

Our aim is to define a parametric set of probability distributions over $\mathcal{P}$ for a given set of actors. The parameters of this distribution should be associated to statistics relevant to the hypotheses under consideration on the processes resulting in the observed partition. As outlined earlier, such hypotheses can be associated to the structure of the partition (i.e., the number of groups and their sizes) or the distribution of actors' attributes within the groups.

The class of exponential distributions allows such a parametrization in a straightforward way (Sundberg, 2019). We propose here an exponential family with support the set $\mathcal{P}$ (or a subset $\mathcal{P}'$). This family is defined for an identity base measure, a vector of natural sufficient statistics $s(P) = \big(s_k(P)\big)_{k \in K}$, and a canonical parameter vector $\alpha = \big(\alpha_k\big)_{k \in K}$. It is expressed by:

$$\Pr_\alpha(P = p) = \frac{\exp\big( \sum_k \alpha_k s_k(p) \big)}{\kappa_{\mathcal{P}}(\alpha)}. \tag{3}$$
Here the normalizing constant $\kappa_{\mathcal{P}}(\alpha)$ is defined by:

$$\kappa_{\mathcal{P}}(\alpha) = \sum_{\widetilde{P} \in \mathcal{P}} \exp\Big( \sum_k \alpha_k s_k(\widetilde{P}) \Big). \tag{4}$$

This formulation mirrors the definition of an ERGM when considering a partition instead of a graph distribution (see Lusher et al. (2013) or Robins, Pattison, et al. (2007) for more details on ERGMs).

Some special cases of this exponential family are related to well-known distributions. Naturally, the model defined without any sufficient statistic generates the uniform distribution over the partition set $\mathcal{P}$. The Ewens distribution (Ewens, 1972; McCullagh, 2011) is defined for a positive parameter $\lambda$ as follows:

$$\Pr_\lambda(P = p) = \frac{\Gamma(\lambda)}{\Gamma(n + \lambda)} \, \lambda^{|p|} \prod_{G \in p} (|G| - 1)! \tag{5}$$

with $\Gamma$ being the Gamma function. As shown in Appendix B, this definition is equivalent to the following formulation of (3) with the parameter vector $\alpha = (\log(\lambda), 1)$:

$$\Pr_\alpha(P = p) = \frac{\exp\Big( \alpha_1 |p| + \alpha_2 \sum_{G \in p} \log\big( (|G| - 1)! \big) \Big)}{\kappa_{\mathcal{P}}(\alpha)}. \tag{6}$$

Graphical modeling with dependence graphs is a useful technique for specifying exponential family distributions (Lauritzen, 1996). In the network literature, this technique was introduced for Markov graphs by Frank and Strauss (1986) and later developed for ERGMs (Wasserman & Pattison, 1996; Robins & Pattison, 2012). Dependence graphs capture the dependence structure of the tie variables, and this structure can inform the choice of relevant sufficient statistics, by virtue of the Hammersley-Clifford theorem (Hammersley & Clifford, 1971; Besag, 1974).

However, graphical modeling is ill-suited to the study of partition models, since the dependence graph of group variables under the non-overlapping constraint is not straightforward. Instead, we take inspiration from the statistics and the independence assumptions used in other related statistical models, in particular partition models (i.e., Ewens and Dirichlet partitions) or Dirichlet models. Extending the statistics used in the Ewens formula, we show that statistics defined as sums of group attributes can model a wide range of partition properties. The independence properties of such count statistics are described in Section 3.2.
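For small $n$, the distribution (3) can be evaluated exactly by enumerating all partitions, and the equivalence between (5) and (6) can be verified numerically. The Python sketch below is our own illustration (function names are hypothetical, and it assumes the standard form of the Ewens sampling formula given above):

```python
import math

def set_partitions(elements):
    """Recursively enumerate all set partitions of a list of elements."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for smaller in set_partitions(rest):
        for k in range(len(smaller)):   # insert `first` into an existing group...
            yield smaller[:k] + [[first] + smaller[k]] + smaller[k + 1:]
        yield smaller + [[first]]       # ...or into a new singleton group

def model_probability(p, alpha, stats, space):
    """Pr_alpha(P = p) of Equation (3), normalizing by brute-force enumeration."""
    weight = lambda q: math.exp(sum(a * s(q) for a, s in zip(alpha, stats)))
    return weight(p) / sum(weight(q) for q in space)

def ewens_probability(p, lam, n):
    """Ewens probability of a partition p, a list of groups (Equation (5))."""
    constant = math.gamma(lam) / math.gamma(n + lam)
    return constant * lam ** len(p) * math.prod(
        math.factorial(len(g) - 1) for g in p)

# Statistics of Equation (6): number of groups and the sum of log((|G| - 1)!).
stats = [lambda p: len(p),
         lambda p: sum(math.log(math.factorial(len(g) - 1)) for g in p)]
lam = 2.0
alpha = [math.log(lam), 1.0]
space = list(set_partitions([1, 2, 3, 4]))
print(len(space))  # 15, the Bell number B_4
worst = max(abs(model_probability(p, alpha, stats, space) - ewens_probability(p, lam, 4))
            for p in space)
print(worst < 1e-12)  # True: (5) and (6) define the same distribution
```

Because the Ewens probabilities sum to one, the normalizing constant of (6) is exactly $\Gamma(n + \lambda)/\Gamma(\lambda)$, which is why the two forms agree partition by partition.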
Structural statistics aim to accurately model the observed group sizes and their dispersion in a given partition. To understand which statistics can be used, we calculate the expected distribution of group sizes in random partitions of 10 nodes for different statistics related to group sizes. Having $n = 10$ allows us to enumerate the number of partitions with specific group statistics and directly calculate their probabilities (for more details on these probabilities, see Section 4).

Figure 3: Illustration of the calculation of the introduced count statistics for a given partition $p$ with groups of sizes 3, 2, 1, and 4 and a binary covariate (the actor's shape); the red dashed elements are to be counted to get the statistic's value for (a) the number of groups, $s_1(p) = 4$; (b) the squared group sizes, i.e., each unordered dyad must be counted twice plus the number of nodes, $s_3(p) = 30$; (c) the number of dyads within groups that are identical on shape, $s_{\text{homophily},1}(p) = 4$; and (d) the number of ordered dyads within groups that include one square, $s_{\text{sociability},1}(p) = 8$.

The first relevant statistic to model group sizes is the number of groups (i.e., the cardinality of the partition, see Figure 3a),

$$s_1(P) = |P|,$$

as it is the basis of the Ewens formula (see Equation (6)). Figure 4a shows that low values of $\alpha_1$ favor partitions with large groups of 10, 9, or 8 nodes, while high values favor many small groups of 1, 2, or 3 nodes. Figure 4b shows that, as $\alpha_1$ increases, the expected number of groups increases, and so does the expected number of singleton groups. The expected number of groups of size 10, i.e., trivial one-group partitions, decreases, while the expected prevalence of the intermediate group sizes 2 to 9 is unimodal, with maxima attained for values of $\alpha_1$ that decrease with group size.
Figure 4c further shows that the probability for a random node to belong to a large group decreases when $\alpha_1$ increases. Finally, Figure 4d shows that the distribution of group sizes stochastically decreases with $\alpha_1$. We conclude that the number of groups is a simple and efficient way to model the central tendency of group sizes in a partition.

Another important feature of the group size distribution is its dispersion or skewness. We first use the statistic in definition (6) of the Ewens distribution:

$$s_2(P) = \sum_{G \in P} \log\big( (|G| - 1)! \big).$$

Since the Ewens model can reproduce a "richer-get-richer" effect on group sizes (McCullagh, 2011), we can expect this to be the result of this term being included in the model. We calculate the size distribution for partitions over 10 nodes for a model containing the two statistics $s_1$ and $s_2$ by varying the parameter $\alpha_2$. To fix the first statistic, we determine the value $\alpha_1$ that maintains the expected value of $s_1$ equal to 4 for each pre-determined $\alpha_2$. This means we explore the distribution of expected group sizes for a constant expected number of groups. Figure 5a shows the expected distribution. The dispersion of sizes increases with the parameter value for the statistic $s_2$.

Another intuitive statistic for modeling the skewness of the size distribution is the sum of squared sizes:

$$s_3(P) = \sum_{G \in P} |G|^2.$$

Figure 4: Properties of the model defined by the single statistic $s_1(P) = |P|$, as a function of the parameter $\alpha_1$: (a) expected group size of a given node; (b) expected number of groups of a given size; (c) probability function for the size of a given node for three values of $\alpha_1$; and (d) expected distribution of group sizes for three values of $\alpha_1$.

Figure 5: Distribution of group sizes for a random partition defined by a model with 10 nodes and two sufficient statistics, for three values of the second parameter (see the text for the determination of the first parameter): (a) $s_1(P) = |P|$ and $s_2(P) = \sum_{G \in P} \log\big((|G| - 1)!\big)$; and (b) $s_1(P) = |P|$ and $s_3(P) = \sum_{G \in P} |G|^2$.

The statistic $s_3$ is equal to the sum of the elements of the matrix representation $X$ of the partition (see Figure 3b). The group size distributions obtained for this statistic are shown in Figure 5b. Once again, increasing the value of the corresponding parameter can increase the dispersion of sizes in the random partition. Choosing between $s_2$ and $s_3$ to model size dispersion is then a practical matter of which one represents more accurately the structure of the observed partition.

In case the distribution of group sizes cannot be approximately reproduced by the above parameters, or if a particular group size might be over- or under-represented for exogenous reasons, the number of groups of particular sizes can be added as a sufficient statistic.

At this point, it is important to mention that some estimation issues, coined degeneracy or near-degeneracy in the ERGM literature (Handcock, 2003; Snijders et al., 2006; Robins, Snijders, et al., 2007; Lusher et al., 2013), might ensue from the use of certain combinations of statistics in this model. This is the case for the previous models defined for $S = (s_1, s_2)$ and $S = (s_1, s_3)$. As a result, some estimated models will correspond to unrealistic distributions that concentrate their probability mass on a few extreme partitions, such as the one with only one group and the one only containing singletons, rather than accurately reflecting the observed statistics. In most cases, a degenerate model will point to some misspecification, and it might prove useful to have a different operationalization of size dispersion. For example, one might use a weighted sum, over all group sizes, of the number of cliques of a given size. Weights could be defined as decreasing in a similar way as in the "geometrically weighted edgewise shared partners" (gwesp) effect proposed in Snijders et al. (2006) and Hunter (2007).
Most observations on degeneracy made in the case of ERGMs can be extended to the model presented here.

The influence of individual covariates on the formation of relational ties has been widely investigated in social networks, starting with the fundamental idea of homophily (McPherson et al., 2001; Rivera et al., 2010), stating that similar individuals are more likely to be connected. In the case of a dyad, homophily can be operationalized as a dyadic variable indicating whether the actors have the same (or similar) attributes. Including this mechanism in the current model requires extending the concept of homophily to the group level.

A first operationalisation of homophily is the preference for homogeneous groups, in which all actors are similar to each other. For example, children might be more likely to form uniform groups in terms of age or gender. For a binary attribute $a$, we can operationalize this preference by counting the number of homophilous dyads inside all groups, as illustrated in Figure 3c:

$$s_{\text{homophily},1}(P) = \sum_{G \in P} \sum_{i,j \in G} [a_i = a_j].$$

This amounts to counting the number of homophilous ties of each actor in the network representation of the partition. For a categorical and ordered attribute, or a continuous attribute, the binary term $[a_i = a_j]$ can be replaced by the absolute difference $|a_i - a_j|$.

Alternatively, homophily can operate within a certain threshold for individuals. For example, individuals might prefer groups with at least one person speaking the same language as them. In that case, the relevant statistic should count the number of group members who have a similar other in their group:

$$s_{\text{homophily},2}(P) = \sum_{G \in P} \sum_{i \in G} \max_{j \in G, j \neq i} [a_i = a_j].$$

The concept of heterophily, or complementarity (Rivera et al., 2010), can be extended in a similar manner.
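The two homophily statistics above are straightforward to compute from a partition and an attribute vector. The following Python sketch is our own illustration (hypothetical names; it counts unordered within-group dyads in the first statistic, which is one possible convention):

```python
def s_homophily_1(partition, a):
    """Number of within-group (unordered) dyads with identical attribute values."""
    return sum(1 for G in partition for i in G for j in G
               if i < j and a[i] == a[j])

def s_homophily_2(partition, a):
    """Number of group members with at least one similar other in their group."""
    return sum(1 for G in partition for i in G
               if any(a[i] == a[j] for j in G if j != i))

partition = [[1, 2, 3], [4, 5]]
a = {1: "x", 2: "x", 3: "y", 4: "x", 5: "x"}
print(s_homophily_1(partition, a))  # 2: the dyads (1, 2) and (4, 5)
print(s_homophily_2(partition, a))  # 4: actors 1, 2, 4, and 5
```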
Actors might aim for diverse groups to form more efficient teams. In this case, the statistic of interest can be the number of different attribute values among the members of a group. If we write $\text{unique}\big((a_i)_{i \in G}\big)$ for the vector containing the unique attribute values of the members of a group $G$, we have:

$$s_{\text{complementarity}}(P) = \sum_{G \in P} \big| \text{unique}\big((a_i)_{i \in G}\big) \big|.$$

In the case of continuous variables, the range of the attribute among the group members can be considered. Furthermore, actors might try to optimize a certain combination of attributes; for example, individuals might prefer teams with two or three different backgrounds present, to foster creativity while not losing too much time in communication between different experts.

In practice, it can be advantageous to normalize the presented statistics by the number of dyads in each group (e.g., for the first homophily statistic proposed) or the size of the group (e.g., for the second homophily statistic or the complementarity statistic) to increase comparability, especially in empirical cases of heterogeneous group sizes.
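The complementarity statistic likewise reduces to counting distinct attribute values per group. A short sketch (our own, hypothetical names), including the size-normalized variant mentioned above:

```python
def s_complementarity(partition, a):
    """Sum over groups of the number of distinct attribute values (heterophily)."""
    return sum(len({a[i] for i in G}) for G in partition)

def s_complementarity_normalized(partition, a):
    """Same statistic, with each group's count divided by its size."""
    return sum(len({a[i] for i in G}) / len(G) for G in partition)

partition = [[1, 2, 3], [4, 5]]
a = {1: "bio", 2: "cs", 3: "cs", 4: "cs", 5: "cs"}
print(s_complementarity(partition, a))             # 2 + 1 = 3
print(s_complementarity_normalized(partition, a))  # 2/3 + 1/2
```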
Mechanisms related to both covariates and structural features can be included in the current framework. This includes a translation of the network concepts of sociability or aspiration (Snijders & Lomi, 2019), defined as the tendencies for actors with a high attribute to send or receive more ties, respectively. For groups, these mechanisms translate into the preference of actors that score high on an attribute to be in larger groups. For example, extraverted individuals might be more likely to be found in larger groups, which can be modeled with a sociability statistic defined by:

$$s_{\text{sociability},1}(P) = \sum_{G \in P} \sum_{i \in G} |G| \, a_i.$$

This is illustrated by Figure 3d. Alternatively, the sum of individual attributes can be replaced by the average value of the attributes within groups:

$$s_{\text{sociability},2}(P) = \sum_{G \in P} |G| \, \text{mean}\big[ a_i \big]_{i \in G}.$$

Classical methods of graphical modeling (Lauritzen, 1996) are ill-suited for representing dependence assumptions in partition models. However, we can use other concepts to discuss the independence properties of the model.

Kingman (1978) established the property of consistency for the Ewens sampling formula. This concept represents that, for a given sampled population that can be modeled with an Ewens distribution of parameter $\lambda$, any sub-population of this population also follows an Ewens distribution with the same parameter. As shown by simple counter-examples in Appendix C, most models defined for the statistics presented above fail to fulfill this condition. The Ewens formula is a special case in that regard; consistency is a critical property for the study of population samples, but much less so for the complete observations considered in our case.

A second relevant concept is neutrality, as introduced by Connor and Mosimann (1969) to study distributions of proportions of a fixed quantity. Such variables are defined as a strictly positive vector $(X_1, X_2, \ldots, X_n)$ with $X_1 + X_2 + \ldots + X_n = q$, where $q$ is constant. Each variable will never be independent from the others, as it can be expressed as a linear combination of the others. To remedy this, Connor and Mosimann introduced the concept of neutrality: the proportion $X_1$, for example, is neutral if it is independent of the vector $\big( X_2/(q - X_1), \ldots, X_n/(q - X_1) \big)$. This property allows one to ignore one or several proportions in order to study the others. For example, it was shown that neutrality of all proportions characterizes the Dirichlet distribution (Connor & Mosimann, 1969; Geiger et al., 1997).

Although the concept of neutrality was initially defined for proportion vectors, its extension to partitions can help us understand how the composition of a subset of the partition might affect the rest of the partition. Let $P$ be a random partition over a set $\mathcal{A}$, and $\mathcal{A}'$ a subset of $\mathcal{A}$, with complement set $\mathcal{A}'^c$. We further define $\pi$ and $\pi^c$ as the respective projections of partitions in $\mathcal{P}(\mathcal{A})$ over $\mathcal{A}'$ and $\mathcal{A}'^c$.

We define a distribution to be neutral if and only if the projections of $P$ on $\mathcal{A}'$ and $\mathcal{A}'^c$ are independent under the condition that any group of $P$ is either in $\mathcal{A}'$ or $\mathcal{A}'^c$. This condition is equivalent to having $P$ as the union of its two projections: $P = \pi(P) \cup \pi^c(P)$. A distribution is neutral if and only if:

$$\Pr_\alpha\big( P = p \mid P = \pi(P) \cup \pi^c(P) \big) = \Pr_\alpha\big( \pi(P) = \pi(p) \big) \times \Pr_\alpha\big( \pi^c(P) = \pi^c(p) \big). \tag{7}$$

We show in Appendix C that this property holds for any model specified with statistics $s_k$ defined as sums of real functions of the groups of $P$. Notably, all statistics proposed in the previous section and used in our analyses later are of this form.

Computation of the normalizing constant of the distribution
As shown in the rewriting of the Ewens formula in Equation (5), some model specifications induce a simplification of Equation (3) into more tractable forms. This allows a direct evaluation of (3) for these cases, which can be leveraged to approximate the likelihood of the more complex specifications that we use in the empirical example of the paper.

For a model specified only with the statistic $s_1(P) = |P|$ for a set of $n$ nodes, we can make use of the Stirling numbers of the second kind (Riordan, 1958) to derive a simple formulation of the normalizing constant $\kappa_{\mathcal{P}}$. The Stirling number ${n \brace m}$ is the number of partitions with $m$ groups, in other words, the number of partitions for which $s_1(P) = m$ (Pitman, 1997). We can therefore sum over all possible values $m$ and get the direct expression:

$$\kappa_{\mathcal{P}}(\alpha) = \sum_{m=1}^{n} {n \brace m} \exp(\alpha_1 m). \tag{8}$$

More interestingly, one can calculate the normalizing constant of any model containing statistics of the form:

$$s_k(P) = \sum_{G \in P} f_k(|G|) \tag{9}$$

where the $f_k$ are functions of the block sizes. For such models, the sufficient statistics define an exchangeable distribution (McCullagh, 2011) that does not depend on the labeling of the nodes. We define $\kappa_n$ as the normalizing constant of these models on any set of $n$ nodes. This constant can be constructed as a recursive sequence and computed with the following formulas:

$$\kappa_0 = 1, \qquad \kappa_1 = \exp\Big( \sum_{k \in K} \alpha_k f_k(1) \Big), \qquad \kappa_{n+1} = \sum_{i=0}^{n} \binom{n}{i} \exp\Big( \sum_{k \in K} \alpha_k f_k(n + 1 - i) \Big) \kappa_i. \tag{10}$$

The proof of the derivation of these relations can be found in Appendix D.

As mentioned in Section 2.3, some analyses might require restricting the sampled space of partitions.
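The unrestricted recursion (10) is cheap to implement: each $\kappa_{n+1}$ needs only the earlier terms and the per-group weights $\exp\big(\sum_k \alpha_k f_k(\text{size})\big)$. The Python sketch below (our own naming) also checks that with all $\alpha_k = 0$ every partition has weight one, so $\kappa_n$ reduces to the Bell number $B_n$:

```python
from math import comb, exp

def kappa(n, alpha, f):
    """Normalizing constant kappa_n of Equation (10), for statistics of the
    form s_k(P) = sum_{G in P} f_k(|G|) with parameter vector alpha."""
    def group_weight(size):
        return exp(sum(a * fk(size) for a, fk in zip(alpha, f)))
    kappas = [1.0]  # kappa_0 = 1
    for m in range(n):
        kappas.append(sum(comb(m, i) * group_weight(m + 1 - i) * kappas[i]
                          for i in range(m + 1)))
    return kappas[n]

# With s_1(P) = |P| (f_1 = 1) and alpha_1 = 0, kappa_n is the Bell number B_n.
print(kappa(10, [0.0], [lambda size: 1.0]))  # 115975.0
```

With $\alpha_1 = \log 2$ and the same statistic, the result matches the Stirling-number expression (8), e.g. $\kappa_3 = \sum_m {3 \brace m} 2^m = 2 + 12 + 8 = 22$.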
For the purposes of this paper, we focus on the subset P' of partitions containing only groups of sizes between σ_min and σ_max. Formula (8), previously established for the sole statistic s(P) = |P|, can still be used by replacing the Stirling numbers S(n, m) with an extension defined as the number of partitions into m blocks with all block sizes in [σ_min, σ_max]. Appendix A details how to calculate these numbers recursively.

More generally, the property given by formula (10) extends to this case of size restrictions (see Appendix D). Again, models defined for statistics of the form (9) on the set P' are exchangeable, and we can write the constant κ_{P'} as κ'_n since it only depends on the number of nodes. Using the values i_min = max(0, n + 1 − σ_max) and i_max = min(n, n + 1 − σ_min), we can construct the sequence κ'_n with the following recursion:

κ'_n = 0 for 0 < n < σ_min,
κ'_0 = 1,   κ'_{σ_min} = exp(Σ_{k∈K} α_k f_k(σ_min)),
κ'_{n+1} = Σ_{i=i_min}^{i_max} C(n, i) exp(Σ_{k∈K} α_k f_k(n + 1 − i)) κ'_i for n ≥ σ_min.   (11)

In the case of exponential families, maximum-likelihood estimation is equivalent to the method of moments, which consists in finding the parameters under which the expected statistics of the modeled partition are equal to the observed statistics (Sundberg, 2019). Such estimation requires, however, the calculation of either the likelihood function or the expected statistics under the model. When the normalizing constant of a model, and therefore its likelihood, can be calculated as shown in Section 4, any optimisation method, such as a Newton-Raphson method (Deuflhard, 2011), can be applied to approximate the parameter value for which this likelihood is maximal.
This maximum either exists and is unique, or is infinite, by virtue of the convexity properties of exponential families (Wedderburn, 1976).

As soon as a model includes statistics related to actors' attributes, such simplifications of the normalizing constant κ_P are unlikely to be available. Since κ_P contains B_n terms, following Equation (1), the calculation of the likelihood is practically intractable for a large number of nodes. This problem can be circumvented with Markov chain Monte Carlo (MCMC) techniques, drawing inspiration from algorithms originally devised for ERGMs (Lusher et al., 2013; Snijders, 2002; Hunter & Handcock, 2006).

As the space P becomes extremely large for high values of n, we can only sample a subset of random partitions to approximate the distribution of partitions under a given model. MCMC methods can assist in constructing such a subset by sampling partitions from a Markov chain whose stationary distribution is the model distribution given by Equation (3). A suitable algorithm for this purpose is the Metropolis-Hastings algorithm (Metropolis et al., 1953; Hastings, 1970; Chib & Greenberg, 1995). The Metropolis-Hastings approach consists in defining a Markov chain with transition probabilities q given by the product:

q(p' | p) = q̃(p' | p) A(p', p).

At each step of the chain, a new partition p' is proposed with probability q̃(p' | p), and it is accepted according to the acceptance ratio A(p', p).

To define the proposal distribution q̃, we use a symmetric relation R on the space P. This relation can be one of the previously defined relations R_merge, R_permute, and R_transfer, or a combination of them. Importantly, this relation should connect the entire outcome space; therefore, R_permute should not be used without at least one of the other two.
For our analyses, we use R_merge for purely structural models, and a combination of R_merge and R_permute when covariate effects are included. These relations connect the entire outcome space and appear more efficient than others, especially when transitions to partitions with isolates are allowed.

For a given partition p, we propose to move only to a partition p' such that p and p' are linked by the given relation R, with uniform probability:

q̃(p' | p) = 1 / |{p̃ | (p, p̃) ∈ R}|.

Using the detailed balance equation q(p' | p) Pr(p) = q(p | p') Pr(p'), which ensures the convergence of the Markov chain to the desired distribution (Metropolis et al., 1953; Hastings, 1970; Chib & Greenberg, 1995), we get:

A(p', p) = min( 1, [Pr(p') |{p̃ | (p, p̃) ∈ R}|] / [Pr(p) |{p̃ | (p', p̃) ∈ R}|] ).

As evident from Figure 2, the proposal distribution defined for relations such as R_merge is not symmetric: for some pairs (p, p') ∈ R, q̃(p' | p) ≠ q̃(p | p'). It is therefore necessary to calculate the proposal probabilities at each step of the chain in order to obtain the acceptance ratio. The choice of relation depends on how fast these calculations can be made and on how efficiently the resulting algorithm covers the sampled space. Moreover, the proposal distribution has to be adapted when the set of allowed partitions is restricted, to make sure that all valid partitions, and only those, can be reached. In certain cases, it may be advisable to design a chain that covers a larger space and to retain only the valid partitions.

The estimation procedure used in this study implements the Robbins-Monro algorithm (Robbins & Monro, 1951), in a similar way as proposed by Snijders (2001, 2002) for the estimation of ERGMs.
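To make the asymmetry correction concrete, here is a minimal illustrative sketch of one Metropolis-Hastings step (our own toy implementation, using a simple node-transfer relation rather than the exact relations defined in the paper); the neighbourhood sizes |{p̃ | (p, p̃) ∈ R}| enter the acceptance ratio exactly as above:

```python
import math, random

def neighbours(p):
    """All partitions reachable from p by one transfer move: relocating a single
    node to another existing group, or to a new singleton group."""
    out = []
    for group in p:
        for v in group:
            rest = [sorted(g) for g in p if g is not group]
            src = sorted(x for x in group if x != v)
            # `None` = open a new singleton (skipped when v is already alone)
            targets = list(range(len(rest))) + ([None] if src else [])
            for t in targets:
                q = [list(g) for g in rest]
                if src:
                    q.append(list(src))
                if t is None:
                    q.append([v])
                else:
                    q[t] = sorted(q[t] + [v])
                out.append(q)
    # note: a partition can appear twice (e.g. splitting a pair either way);
    # these multiplicities are symmetric, so the ratio below remains valid
    return out

def mh_step(p, alpha, stat, rng):
    """One Metropolis-Hastings step for Pr(p) proportional to exp(alpha * stat(p))."""
    nb = neighbours(p)
    proposal = rng.choice(nb)                       # uniform over the neighbourhood
    ratio = (math.exp(alpha * (stat(proposal) - stat(p)))
             * len(nb) / len(neighbours(proposal)))  # proposal-asymmetry correction
    return proposal if rng.random() < min(1.0, ratio) else p

p = [[0, 1], [2]]
print(len(neighbours(p)))  # 5: nodes 0 and 1 have two targets each, node 2 has one
rng = random.Random(0)
for _ in range(3):
    p = mh_step(p, alpha=0.5, stat=len, rng=rng)    # stat(p) = number of groups
print(sorted(x for g in p for x in g))  # [0, 1, 2]: the node set is preserved
```

The `stat` argument here plays the role of a single sufficient statistic; a real implementation would plug in the full vector of model statistics.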
The Robbins-Monro algorithm is a variant of the Newton-Raphson optimisation algorithm for objective functions evaluated via Monte Carlo methods. It has been shown to be a useful tool for a large range of stochastic approximation problems (Lai et al., 2003), in particular for the maximum-likelihood estimation of models that can only be analyzed by simulation (Cappé et al., 2005; Gu & Kong, 1998; Gu & Zhu, 2001). Although this algorithm was chosen for our study, we note that various other algorithms have been designed for similar problems, notably the Geyer-Thompson algorithm (Geyer & Thompson, 1992) and the stepping algorithm of Hummel and colleagues (2012).

The aim of the Robbins-Monro algorithm is to solve the moment equation:

E_α[s] = s_obs,   (12)

where E_α[s] is the expected vector of sufficient statistics for the model with parameter α, and s_obs = s(p_obs) is the vector of statistics in the observed partition p_obs. The N-th iteration step of the original algorithm consists in drawing a variable s_N from the distribution of the statistics for the model with parameter α_N, and updating the parameter to:

α_{N+1} = α_N − a_N D_N^{−1} (s_N − s_obs).   (13)

In this equation, (a_N) is called the gain sequence and controls the magnitude of the optimisation steps, and D_N is the scaling matrix. A classic choice is a_N = 1/N for the gain and the derivative matrix ∂E_{α_N}[s]/∂α_N for D_N.

Following the arguments developed by Snijders (2001), our algorithm uses, in place of the matrices D_N, a single scaling matrix D calculated once and for all. This scaling matrix is the covariance matrix of a sample drawn from the model parametrized by some starting parameters, and represents an estimate of the sensitivity of the sufficient statistics to variations in the parameters.
This is based on a result from Polyak (1990) implying that the use of this matrix, or of its diagonal, leads to an optimal rate of convergence, as long as the sequence (a_N) converges at the rate N^{−c}, with 0.5 < c < 1. This procedure also requires using the average of the sequence (α_N) as the solution to the optimisation problem (12).

Regarding the gain sequence, we follow the idea of Pflug (1990) that it is better to keep the value a_N constant as long as the sequence s_N has not yet crossed the observed values s_obs. The algorithm is therefore divided into R subphases, within which the value a_r is kept constant while the sequence (α_{r,N}) is updated with the adapted steps of (13):

α_{r,N+1} = α_{r,N} − a_r D^{−1} (s(p_N) − s_obs),   (14)

with p_N drawn from the model parametrized by α_{r,N}. Importantly, the lengths of the subphases must ensure the convergence of (a_N) at the rate N^{−c}, and the starting parameter value for subphase r should be the average of the previous sequence (α_{r−1,N}), in order to satisfy the convergence conditions mentioned earlier.

In practice, the algorithm is implemented in three phases. The first phase is used to estimate the matrix D by sampling M partitions p_1, p_2, ..., p_M from the model defined by the starting parameters α_0, with the Metropolis-Hastings algorithm presented in Section 5.1. We only retain partitions after a burn-in period and with a certain thinning interval, so as to ensure a low auto-correlation between the sampled statistics (usually below 0.4). A value of a few hundred for M usually sufficed. We obtain estimates of the expected statistics and of the covariance matrix:

s̄_α = (1/M) (s(p_1) + s(p_2) + ... + s(p_M)),
ĉov(α) = (1/M) Σ_{m=1}^{M} s(p_m) s(p_m)^T − s̄_α s̄_α^T.

The scaling matrix D is generally defined as D = diag(ĉov(α_0)). Its inverse D^{−1} provides the new starting estimates: α_1 = α_0 − a_1 D^{−1} (s̄_{α_0} − s_obs).

In the second phase, we implement the iterative steps of (14) within R subphases. At each N-th iteration, a single partition p_N is drawn from the distribution with parameter α_{r,N}, with the Metropolis-Hastings algorithm starting at the previously drawn partition p_{N−1}.
Each subphase r lasts until its length exceeds the minimum subphase length and all sampled statistics have crossed their observed values; alternatively, it stops when N exceeds the maximum subphase length. In this study, we used a constant starting gain a_1, halved between subphases (a_r = a_1 / 2^{r−1}), and subphase lengths growing geometrically with r.

Finally, phase 3 is used to sample M partitions from the final distribution, in order to approximate the expected sufficient statistics with the sample mean s̄_{α_f} and the covariance matrix of these statistics with the sample covariance matrix. We used large values of M, typically between 1000 and 2000. Model convergence is assessed by calculating, for each statistic separately, the convergence ratio

c_k = (s̄_{α_f,k} − s_obs,k) / SD_{α_f}(s_k(p_1), ..., s_k(p_M)),

where SD_{α_f}(s_k(p_1), ..., s_k(p_M)) denotes the sample standard deviation. Convergence is considered excellent for the k-th statistic when c_k lies between −0.1 and 0.1, a threshold aligned with the one chosen for ERGM estimation (see Snijders, 2002). As a side note, D can alternatively be taken as the covariance matrix with its non-diagonal elements multiplied by a number between 0 and 1, as long as this value is small enough to avoid instability in the optimization steps. Furthermore, we assume that parameter estimates have an approximate multivariate normal distribution, and we test the significance of the model parameters with a simple Wald test based on the ratio between the elements of α_f and their standard errors.

The goodness of fit of a model can be assessed through auxiliary statistics (i.e., statistics not included in the sufficient statistics of the model), similarly to ERGMs (Hunter et al., 2008). By sampling from the estimated model, we can test whether the resulting distribution of such auxiliary statistics corresponds to the values observed in the data.

To compare different model specifications, we further use the AIC measure (Akaike, 1973) as proposed by Hunter and Handcock (2006) for ERGMs. This AIC is calculated through path sampling, as presented by Gelman and Meng (1998), to estimate the log-likelihood of a model with an estimated parameter α_1 when its normalizing constant κ(α_1) is intractable. First, we calculate the log-likelihood ℓ(α_0, p_obs) of a simple model parametrized by α_0, such as the model containing only the statistic s(P) = |P|, with the equations presented in Section 4. We then estimate the difference between the log normalizing constants, λ(α_1, α_0) = log κ(α_1) − log κ(α_0), by sampling from M models with parameters α_m = (m/M) α_1 + (1 − m/M) α_0, which produces large overlaps between the sampled distributions:

λ̂(α_1, α_0) = (1/M) Σ_{m=1}^{M} (α_1 − α_0)^T s̄_{α_m}.

We finally estimate the log-likelihood of our model with parameter α_1 as:

ℓ̂(α_1, p_obs) = ℓ(α_0, p_obs) + (α_1 − α_0)^T s_obs − λ̂(α_1, α_0).
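For a model whose normalizing constant is in fact computable (the number-of-groups model of Section 4), the path-sampling estimator λ̂ can be sketched and checked against the exact difference in log normalizing constants. In this idealized check (ours, for illustration only), the MCMC estimates s̄_{α_m} are replaced by exact expectations:

```python
import math

def stirling2(n, m):
    """Stirling numbers of the second kind S(n, m)."""
    if n == 0:
        return 1 if m == 0 else 0
    return m * stirling2(n - 1, m) + stirling2(n - 1, m - 1)

def log_kappa(n, alpha):
    # exact log normalizing constant for the model with statistic s(P) = |P|
    return math.log(sum(stirling2(n, m) * math.exp(alpha * m)
                        for m in range(1, n + 1)))

def expected_stat(n, alpha):
    # exact E_alpha[s], the expected number of groups
    w = [stirling2(n, m) * math.exp(alpha * m) for m in range(1, n + 1)]
    return sum(m * wm for m, wm in zip(range(1, n + 1), w)) / sum(w)

n, a0, a1, M = 8, -0.5, 1.0, 200
# path sampling: average (a1 - a0) * E_alpha[s] along the path from a0 to a1
lam_hat = sum((a1 - a0) * expected_stat(n, (m / M) * a1 + (1 - m / M) * a0)
              for m in range(1, M + 1)) / M
exact = log_kappa(n, a1) - log_kappa(n, a0)
print(abs(lam_hat - exact))  # small discretization error (well below 0.05)
```

The agreement follows from d log κ(α)/dα = E_α[s]; in the actual procedure, each exact expectation would be replaced by the sample mean of statistics simulated at α_m.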
Implemented code, documentation, and an example script can be found in the supplementary materials and in the repository github.com/marion-hoffman/ERPM. The results of the Robbins-Monro algorithm were compared to a simple Newton-Raphson estimation in the case of a simple model for which the likelihood can be calculated directly.
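In the same spirit as that comparison, the moment equation (12) can be illustrated on the number-of-groups model, where expectations are computable exactly. The sketch below (our own toy code, with the exact expectation standing in for a simulated statistic, so it is a deterministic caricature of the actual stochastic algorithm) runs updates of the form (13) with gain a_N = 1/N:

```python
import math

def stirling_row(n):
    """Row S(n, 0..n) of Stirling numbers of the second kind."""
    row = [1]                      # S(0, 0) = 1
    for _ in range(n):
        prev = row + [0]
        row = [0] + [m * prev[m] + prev[m - 1] for m in range(1, len(prev))]
    return row

S = stirling_row(10)               # partitions of 10 nodes, by number of groups

def expected_groups(alpha):
    # exact E_alpha[s] for the model with statistic s(P) = |P| on 10 nodes
    w = [S[m] * math.exp(alpha * m) for m in range(1, 11)]
    return sum(m * w[m - 1] for m in range(1, 11)) / sum(w)

s_obs = 4.0          # pretend the observed partition has 4 groups
alpha, D = 0.0, 1.0  # scalar parameter and scalar "scaling matrix"
for N in range(1, 2001):
    a_N = 1.0 / N                     # classic gain sequence a_N = 1/N
    s_N = expected_groups(alpha)      # exact expectation in place of a draw
    alpha -= a_N * (s_N - s_obs) / D  # update (13)

print(abs(expected_groups(alpha) - s_obs) < 0.01)  # True: moment equation solved
```

In the real algorithm, s_N is a single MCMC draw, which is why subphases, averaging, and the estimated scaling matrix D are needed.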
Hackathons were defined as "problem-focused computer programming events" by Topi and Tucker (2014). They are often designed for participating teams to solve a digital problem in a short period of time. Such events provide companies, universities, and non-profit organizations with the opportunity to harness the ideas of volunteers in exchange for rewards and funding for the winning teams (Lara & Lockwood, 2016; Briscoe & Mulligan, 2014). Hackathons have recently expanded to tackle an increasingly broad range of topics, including education, marketing, and the arts (Lara & Lockwood, 2016).

We collected data during the 2017 and 2018 editions of a hackathon at a technical university. The events welcomed 60 and 58 participants respectively, who divided themselves
into 14 teams in both cases. Individual attributes of participants, as well as their prior acquaintances, were gathered during the registration process via online questionnaires. The events were scheduled as follows. The registered participants were invited to the venue on a Saturday at 9:00 and were introduced to the proposed tasks. They were then asked to mingle and define teams until 13:00. Organizers only allowed teams of 2 to 5 individuals in the first edition, and of 3 to 5 members in the second. These teams collaborated until Sunday afternoon on designing and implementing their solution to the hackathon challenge. The teams' compositions, and their performances as assessed by a jury of experts, were collected at the end.

In the first edition, 1 team of 2, 1 team of 3, 5 teams of 4, and 7 teams of 5 were formed. The 14 teams of the second edition were divided into 1 team of 3, 10 teams of 4, and 3 teams of 5. Descriptive statistics for the participants' attributes used in our analyses are presented in Table 1. Additionally, 22 pairs of participants reported already knowing each other in the first edition, and 23 such pairs were reported in the second.

                           2017 Edition    2018 Edition
Gender                     (N = 60)        (N = 58)
  Male                     49              55
  Female                   11              3
Age                        (N = 43)        (N = 54)
  <20                      11              12
  20-25                    13              25
  25-30                    10              13
  >30                      9               4
First language             (N = 49)        (N = 56)
  Swiss German             16              16
  German                   10              10
  Others                   23              30
Current degree             (N = 60)        (N = 58)
  B.Sc.                    24              12
  M.Sc.                    25              31
  Ph.D. or employed        11              15
Major                      (N = 60)        (N = 58)
  Engineering              14              34
  Computer Science, IT     23              10
  Physics                  6               3
  Mathematics              2               2
  Chemistry                3               5
  Environmental sciences   4               3
  Other                    8               1

Table 1: Counts of gender, age, language, degree, and major attributes among participants of the first and second hackathon editions.
Self-assembled teams for short projects are ubiquitous in organizational, educational, and recreational contexts (Falk-Krzesinski et al., 2010; Guimera et al., 2005; Contractor, 2013; Zhu et al., 2013). Scholars investigating the motivations of individuals to form a team in various settings generally identify four types of mechanisms, as classified by Bailey and Skvoretz (2017): familiarity, homophily, competence, and affect.

First, familiarity describes the fact that individuals are more comfortable teaming up with others with whom they have collaborated in the past, because of shared practices or values (Bailey & Skvoretz, 2017; Lungeanu et al., 2018; Gómez-Zará et al., 2019). Since some participants knew each other prior to the event, and reported joining the event together, we expect to find a high number of prior acquaintance ties within the teams of our dataset.

Second, homophily, as reviewed by McPherson et al. (2001), is commonly observed in dyadic collaborations and teams, and denotes that similar individuals tend to collaborate. For example, gender homophily appears to prevail within organisational contexts (Kalleberg et al., 1996; Ruef et al., 2003; McPherson et al., 2001), which leads us to expect gender homophily within the teams of our study. Additionally, complementarity of age and academic level can guide team formation in this hackathon context, since individuals of different ages or levels might have different ways of working and different expectations of the event. Finally, both editions were held in English, but participants spoke a wide range of languages, with the largest group having Swiss German or German as their first language. Since sharing the same language could facilitate collaboration in our context, language homophily could contribute to the composition of the teams.

Third, the competence of team members is central in teams whose aim is to achieve a given task.
However, the difficulty in forming high-performing teams lies in finding the right balance between maximizing the range of skills among team members and reducing the overhead costs of combining different ways of thinking and working. Some previous research on self-assembled teams found evidence for complementarity of skills (Zhu et al., 2013), while other studies found that individuals teamed up with similar others even when complementarity would have been beneficial (Gómez-Zará et al., 2019). In the case of our study, organizers strongly recommended that participants form teams with skills and knowledge as diverse as possible. Moreover, a large number of participants reported attending the event mostly to learn new skills (24 out of 26 respondents in the first edition, and 48 out of 53 in the second). Consequently, we expect participants to be more likely to form teams in which a large number of majors or specialities are represented.

Finally, interpersonal affect, or conversely dislike, can be a strong predictor of the choice of team partners, arguably even more important than competence (Casciaro & Lobo, 2008). However, since our data do not contain such information, we cannot test any related mechanism.
Two models were estimated for each dataset. The models differ in the operationalisation of the complementarity of participants' specialisations. We first present the included parameters and subsequently discuss their interpretation one by one. Interpreting the size of a parameter value beyond its sign follows the same principles as for other exponential family models. We can use the binary relations merge/split, permute, and transfer introduced in Section 2.3 to define pairs of partitions that exhibit a unit change in a given statistic, ceteris paribus. Using these operations, we can formulate log probability ratios between partitions related through one of those relations, and thereby attach a quantitative interpretation to exact parameter values.

First, the group size distribution was modeled with the number of groups as a sufficient statistic. We limited the allowed group sizes to a minimum of 2 and a maximum of 5 in the first dataset, and to a minimum of 3 and a maximum of 5 in the second. In addition, we used the sum of squared sizes to model the strong concentration of sizes around 4 and 5 in the first dataset. On the basis of attributes, we modeled the familiarity effect, represented by the count of previous acquaintance ties within groups. Homophily was operationalized with either the count of homophilous ties (for gender and language) or the sum of absolute differences (for age). Finally, the complementarity of skills was modelled with two different statistics: Model 1 uses the number of homophilous ties in terms of majors within groups, while Model 2 contains a statistic counting the number of different majors within each team.

                          First edition                      Second edition
                          Model 1          Model 2           Model 1          Model 2
                          Est. (S.e.)      Est. (S.e.)       Est. (S.e.)      Est. (S.e.)
Number of groups          -4.36*** (0.02)  -3.96*** (0.02)   -1.07*** (0.01)  -0.92*** (0.01)
Sum of squared sizes      -0.11 (0.23)     -0.04 (0.25)      -                -
Acquaintances              5.71*** (0.07)   6.16*** (0.10)    2.50*** (0.04)   2.47*** (0.05)
Age differences           -0.02 (0.89)     -0.01 (1.02)      -0.03 (0.54)     -0.03 (0.60)
Same language             -0.11 (0.08)     -0.15 (0.09)       0.53*** (0.08)   0.52*** (0.09)
Same level                 0.25* (0.12)     0.20 (0.17)       0.03 (0.08)      0.04 (0.10)
Same major                -0.47*** (0.07)  -                 -0.08 (0.07)     -
Number of majors          -                 0.32*** (0.07)   -                 0.04 (0.03)
Log likelihood             8.22             8.81             21.80            21.86
AIC                      -30.44           -31.63            -55.59           -55.72

Table 2: Results for Model 1 (with skill homophily) and Model 2 (with skill complementarity), estimated for the first and second hackathon editions.

Results of both models for the two datasets are presented in Table 2. We see a significant and negative parameter for the number of groups: −4.36 in Model 1 and −3.96 in Model 2 for the first edition. This indicates a tendency to form fewer and, therefore, larger groups. Here, we can interpret this parameter through the log probability ratio between two partitions linked by the merge/split relation. Specifically, Model 1 predicts a partition to be around exp(4.36) ≈ 78 times more likely (52 for Model 2) than the same partition with one group split into two, given that all other statistics remain constant and group sizes stay in the allowed range. This applies, for example, to the comparison of one group of four participants with two groups of two participants, ceteris paribus. Both size dispersion parameters are negative but non-significant.

Turning to familiarity, we find in both models a positive and strongly significant parameter for the number of previous acquaintances within groups. For this statistic, it is more useful to invoke the permute relation to calculate log probability ratios. The parameters indicate that a partition obtained from a permutation of two actors that adds one acquaintance tie in a group, leaving other statistics equal, is around 300 times more likely than before the permutation in Model 1 (exp(5.71) ≈ 302), and even around 470 times in Model 2 (exp(6.16) ≈ 473). Thus, participants who sign up together have a very high probability of being members of the same team.

We find a weakly significant homophily parameter for academic level in Model 1, indicating a tendency to work with others of the same level, but this effect disappears in Model 2. The homophily parameter related to majors is negative and significant, giving evidence for skill complementarity in this team formation process. Model 2 confirms this tendency, since the parameter for the number of unique majors in groups is positive and significant. A permutation of an actor that adds one new skill to a team and leaves all other statistics unchanged then leads to a partition around exp(0.32) ≈ 1.4 times more likely. Note that the ceteris paribus condition is not trivial to invoke, since it is not always possible to find merges of groups or permutations of nodes that affect only one statistic at a time. Such log probability ratios should therefore be interpreted with caution.
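The multipliers quoted in this interpretation follow directly from exponentiating the parameter estimates reported in Table 2; a quick arithmetic check:

```python
import math

# log probability ratios from Table 2 translate into multiplicative effects
print(round(math.exp(4.36)))     # 78: merging two groups, Model 1, first edition
print(round(math.exp(3.96)))     # 52: the same comparison under Model 2
print(round(math.exp(5.71)))     # 302: permutation adding one acquaintance tie (Model 1)
print(round(math.exp(0.32), 2))  # 1.38: permutation adding one new major (Model 2)
```

The same one-line computation applies to any other estimate in the table, as long as the corresponding unit change leaves the remaining statistics untouched.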
Model comparison

We first illustrate model comparison on the basis of the AIC calculated for the four models presented. The bottom lines of Table 2 contain the estimated log-likelihoods and AICs of the different models. In the first edition, the log-likelihood increases from 8.22 in Model 1 to 8.81 in Model 2, and the AIC decreases from −30.44 to −31.63. This comparison implies that the specification of skill preferences by the statistic counting the number of unique skills in each team provides a better fit to the observed data. In the second dataset, both values remain similar between Model 1 and Model 2 (21.80 and 21.86 for the log-likelihood, and −55.59 and −55.72 for the AIC), indicating no important difference in the fit of these two models.

In Figure 6, we further investigate the distribution of auxiliary statistics under the estimated models to assess goodness of fit, following procedures similar to those recommended for network models (Hunter et al., 2008). These distributions are represented by the violin plots proposed by Hintze and Nelson (1998). Since results did not substantially change between models for the second edition, we only carried out these analyses for the first dataset. We first observe that the distribution of group sizes (Figure 6a) is well recovered by Models 1 and 2, except for the number of groups of size 3, which is slightly overestimated by both models. To assess the distribution of ages, we calculate the intra-class correlation coefficient of ages within groups (Figure 6b). Again, both models yield a similar distribution, slightly overestimating the observed value. Regarding the homophily effects of age, level, and major, we compare the average density of same-attribute ties in groups (Figure 6c). We also compute this density for acquaintance ties. All observed statistics fall within the confidence intervals of the simulated models. However, we observe for the major attribute that statistics calculated for Model 1 are slightly better centered around the observed value. Examining the correlation between certain individual attributes and the size of their team (Figure 6d) further helps assess how well the models reproduce the tendency of certain attributes to be present in larger groups, which is not an effect included in the model. The correlations for age and for being an M.Sc. student are well centered around the observed value for both models.
However, the correlation for the attribute of studying Computer Science is slightly underestimated by the models, in particular by Model 1.

In conclusion, the AICs suggest that Model 2 is a better representation of the data collected in the first edition; however, the goodness-of-fit statistics suggest that the difference is minor.
The present paper introduces the statistical framework of exponential partition models and presents its main mathematical properties. Building upon the rich literature on exponential families of distributions, stochastic networks, and stochastic partitions, we show that this model can uncover regularities in observations of self-assembled exclusive groups while taking into account structural dependencies between these observations. Exponential partition models can be applied to various contexts in which individuals sort themselves into groups based on social preferences, opportunities, and exogenous constraints. Specifications are proposed to investigate a variety of mechanisms situated at the individual, relational, and group levels. A case study illustrating some of the capabilities of the model, based on the self-formation of hackathon teams, is provided. All code and documentation for further use of this framework can be found at github.com/marion-hoffman/ERPM. Data for replication are available on request.

This work bridges two branches of the statistics literature, one representing systems as networks and the other as partitions. On the one hand, we augment network methods by introducing the possibility of modeling social mechanisms at the level of groups rather than dyads. By rethinking the mathematical representation of groups, the proposed framework allows researchers to investigate group formation as a coordination process between individuals rather than as an aggregation of dyadic ties into group entities. On the other hand, we contribute to the stochastic partition modeling literature by extending the use of such models to the study of complex structural properties of social communities. In particular, the model makes it possible to study the influence of mechanisms related to individual and relational covariates on group formation processes.

The presented methodological developments aim to further our understanding of the mechanisms driving the formation of social groups.
First, they allow social scientists to model and explain observations of self-assembled groups, and potentially expand the range of social contexts that can be investigated. Moreover, some social processes widely studied at the dyadic level, such as homophily, can now be investigated at the group level. This modeling framework offers the possibility to explore different operationalizations of such mechanisms and to assess which ones give a better representation of real-life processes, through the model diagnosis techniques described in this paper. Finally, by moving from the dyad to the group perspective, mechanisms that have been suggested for group processes can be statistically tested. An example of such a process is the optimization by group members of the combination of individual attributes, such as the distribution of competences in the teams of our case study.

Much remains to be discovered about the formation of self-assembled groups. A current limitation of the presented framework is its inability to model observations of overlapping groups. Since such groups are encountered in many social contexts, future research should extend the modeling framework to more general data representations, such as hypergraphs. Modeling group overlaps opens up the possibility of representing new dependencies between group memberships; in particular, it would allow analysing, for example, what leads individuals to belong to multiple groups at the same time. A second limitation lies in the framework's cross-sectional nature. Since dynamic or longitudinal data offer rich insights into the processes driving social systems, an extension of this framework to a dynamic group representation would greatly further our understanding of social group dynamics.
Acknowledgement(s)
The authors thank the members of the Chair of Social Networks at ETH Zürich, and the members of the Duisterbelt, for useful comments and feedback. P.B. is supported by the Leverhulme Centre for Demographic Science.
Disclosure statement
The authors report no conflict of interest.
References
Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In B. Petrox & F. Caski (Eds.), Proc. 2nd Int. Symp. on Information Theory (pp. 267–281). Budapest: Akadémiai Kiadó.
Allport, G. W., Clark, K., & Pettigrew, T. (1954). The nature of prejudice. Cambridge, MA: Addison-Wesley.
Antoniak, C. E. (1974). Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. The Annals of Statistics, (6), 1152–1174.
Bailey, J. L., & Skvoretz, J. (2017). The social-psychological aspects of team formation: New avenues for research. Sociology Compass, (6), 1–12.
Bell, E. T. (1934). Exponential polynomials. Annals of Mathematics, (35), 258–277.
Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society: Series B (Methodological), (2), 192–225.
Block, P. (2018). Network evolution and social situations. Sociological Science, 402–431.
Briscoe, G., & Mulligan, C. (2014). Digital innovation: The hackathon phenomenon. Creativeworks London Working Paper No. 6.
Butts, C. T. (2007). Models for generalized location systems. Sociological Methodology, (1), 283–348.
Cappé, O., Moulines, E., & Rydén, T. (2005). Inference in hidden Markov models. Berlin, Germany: Springer-Verlag.
Casciaro, T., & Lobo, M. S. (2008). When competence is irrelevant: The role of interpersonal affect in task-related ties. Administrative Science Quarterly, (4), 655–684.
Chib, S., & Greenberg, E. (1995). Understanding the Metropolis-Hastings algorithm. The American Statistician, (4), 327–335.
Connor, R. J., & Mosimann, J. E. (1969). Concepts of independence for proportions with a generalization of the Dirichlet distribution. Journal of the American Statistical Association, (325), 194–206.
Contractor, N. (2013). Some assembly required: Leveraging web science to understand and enable team assembly. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, (1987), 1–14.
Crane, H., et al. (2016). The ubiquitous Ewens sampling formula. Statistical Science, (1), 1–19.
Creel, S., & Creel, N. M. (1995). Communal hunting and pack size in African wild dogs, Lycaon pictus. Animal Behaviour, (5), 1325–1339.
Deuflhard, P. (2011). Newton methods for nonlinear problems: Affine invariance and adaptive algorithms (Vol. 35). New York, NY: Springer.
Ewens, W. J. (1972). The sampling theory of selectively neutral alleles. Theoretical Population Biology, (1), 87–112.
Falk-Krzesinski, H. J., Börner, K., Contractor, N., Fiore, S. M., Hall, K. L., Keyton, J., ... Uzzi, B. (2010). Advancing the science of team science. Clinical and Translational Science, (5), 263–266.
Feld, S. L. (1982). Social structural determinants of similarity among associates. American Sociological Review, (6), 797–801.
Ferguson, T. S. (1973). A Bayesian analysis of some nonparametric problems. The Annals of Statistics, (2), 209–230.
Fischer, C. S. (1982). To dwell among friends: Personal networks in town and city. Chicago, IL: University of Chicago Press.
Frank, O., & Strauss, D. (1986). Markov graphs. Journal of the American Statistical Association, (395), 832–842.
Geiger, D., Heckerman, D., et al. (1997). A characterization of the Dirichlet distribution through global and local parameter independence. The Annals of Statistics, (3), 1344–1369.
Gelman, A., & Meng, X.-L. (1998). Simulating normalizing constants: From importance sampling to bridge sampling to path sampling. Statistical Science, (2), 163–185.
Geyer, C. J., & Thompson, E. A. (1992). Constrained Monte Carlo maximum likelihood for dependent data. Journal of the Royal Statistical Society: Series B (Methodological), (3), 657–683.
Gittleman, J. L. (1989). Carnivore group living: Comparative trends. In J. L. Gittleman (Ed.), Carnivore behavior, ecology, and evolution (pp. 183–207). Boston, MA: Springer US.
Gómez-Zará, D., Paras, M., Twyman, M., Lane, J. N., DeChurch, L. A., & Contractor, N. (2019). Who would you like to work with? In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (pp. 1–15).
Granovetter, M. (1985). Economic action and social structure: The problem of embeddedness. American Journal of Sociology, (3), 481–510.
Gu, M. G., & Kong, F. H. (1998). A stochastic approximation algorithm with Markov chain Monte Carlo method for incomplete data estimation problems. Proceedings of the National Academy of Sciences, (13), 7270–7274.
Gu, M. G., & Zhu, H.-T. (2001). Maximum likelihood estimation for spatial models by Markov chain Monte Carlo stochastic approximation. Journal of the Royal Statistical Society: Series B (Statistical Methodology), (2), 339–355.
Guimera, R., Uzzi, B., Spiro, J., & Amaral, L. A. N. (2005). Team assembly mechanisms determine collaboration network structure and team performance. Science, (5722), 697–702.
Hammersley, J. M., & Clifford, P. (1971). Markov fields on finite graphs and lattices. Unpublished manuscript.
Handcock, M. S. (2003). Assessing degeneracy in statistical models of social networks.
WorkingPaper No. 39 .Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications.
Biometrika , (1), 97–109.Hintze, J. L., & Nelson, R. D. (1998). Violin plots: a box plot-density trace synergism. TheAmerican Statistician , (2), 181–184.Hoffman, M., Block, P., Elmer, T., & Stadtfeld, C. (2020). A model for the dynamics of face-to-faceinteractions in social groups. Network Science , (S1), 1–22.Homans, G. C. (1950). The human group.
Brace, New York, NY: Harcourt.Hubbell, S. P. (2001).
The Unified Neutral Theory of Biodiversity and Biogeography . Princeton,NJ: Princeton University Press.Hummel, R. M., Hunter, D. R., & Handcock, M. S. (2012). Improving simulation-based algorithmsfor fitting ergms.
Journal of Computational and Graphical Statistics , (4), 920–939.Hunter, D. R. (2007). Curved exponential family models for social networks. Social networks , (2), 216–230.Hunter, D. R., Goodreau, S. M., & Handcock, M. S. (2008). Goodness of fit of social networkmodels. Journal of the American Statistical Association , (481), 248–258.Hunter, D. R., & Handcock, M. S. (2006). Inference in curved exponential family models fornetworks. Journal of Computational and Graphical Statistics , (3), 565–583.Kalleberg, A. L., Knoke, D., Marsden, P. V., & Spaeth, J. L. (1996). Organizations in america:Analysing their structures and human resource practices . Thousand Oaks, CA: Sage.Kingman, J. F. C. (1978). Random partitions in population genetics.
Proceedings of the RoyalSociety of London. A. Mathematical and Physical Sciences , (1704), 1–20.Krackhardt, D. (1988, December). Predicting with networks: Nonparametric multiple regressionanalysis of dyadic data. Social Networks , (4), 359–381.Lai, T. L., et al. (2003). Stochastic approximation. The annals of Statistics , (2), 391–406.Lara, M., & Lockwood, K. (2016). Hackathons as community-based learning: a case study. TechTrends , (5), 486–495.Lauritzen, S. L. (1996). Graphical models . Oxford, England: Clarendon Press.Lazarsfeld, P. F., & Merton, R. K. (1954). Friendship as a social process: A substantive andmethodological analysis.
Freedom and control in modern society , (1), 18–66.Lewin, K., Heider, F. T., & Heider, G. M. (1936). Principles of topological psychology . New York,NY: McGraw-Hill.Lindenberg, S. (1997, January). Grounding groups in theory: Functional, cognitive, and structuralinterdependencies.
Advances in Group Processes , , 281–331.Lungeanu, A., Carter, D. R., DeChurch, L. A., & Contractor, N. S. (2018). How team interlockecosystems shape the assembly of scientific teams: A hypergraph approach. Communicationmethods and measures , (2-3), 174–198.Lusher, D., Koskinen, J., & Robins, G. (2013). Exponential random graph models for socialnetworks: Theory, methods, and applications . Cambridge, England: Cambridge UniversityPress.McCullagh, P. (2011). Random permutations and partition models.
International Encyclopedia ofStatistical Science , 1170–1177. cPherson, M., Smith-Lovin, L., & Cook, J. M. (2001). Birds of a feather: Homophily in socialnetworks. Annual Review of Sociology , (1), 415–444.Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., & Teller, E. (1953). Equationof state calculations by fast computing machines. The Journal of Chemical Physics , (6),1087–1092.Moody, J. (2001, November). Race, school integration, and friendship segregation in America. American Journal of Sociology , (3), 679–716.Morris, M., Handcock, M. S., & Hunter, D. R. (2008). Specification of Exponential-Family randomgraph models: Terms and computational aspects. Journal of Statistical Software , (4),1548–7660.Nielsen, N. (1906). Handbuch der theorie der gammafunktion . Leipzig, Germany: Teubner.Okubo, A. (1986). Dynamical aspects of animal grouping: swarms, schools, flocks, and herds.
Adv. Biophys. , , 1–94.Parsons, T. (1949). The structure of social action (Vol. 491). New York, NY: Free press.Pattison, P., & Robins, G. (2002, August). Neighborhood-Based models for social networks.
Sociological Methodology , (1), 301–337.Pflug, G. C. (1990). Non-asymptotic confidence bounds for stochastic approximation algorithmswith constant step size. Monatshefte f¨ur Mathematik , (3-4), 297–314.Pitman, J. (1997). Some probabilistic aspects of set partitions. The American MathematicalMonthly , (3), 201–209.Pitman, J., & Yor, M. (1997). The two-parameter Poisson-Dirichlet distribution derived from astable subordinator. The Annals of Probability , (2), 855–900.Polyak, B. T. (1990). New method of stochastic approximation type. Automation and remotecontrol , (7 pt 2), 937–946.Putnam, R. D. (2000). Bowling alone: The collapse and revival of american community . NewYork, NY: Simon and Schuster.Reynolds, C. W. (1987, August). Flocks, herds and schools: A distributed behavioral model.
Computer Graphics , (4), 25–34.Riordan, J. (1958). Introduction to combinatorial analysis . New York, NY: John Wiley and Sons.Rivera, M. T., Soderstrom, S. B., & Uzzi, B. (2010). Dynamics of dyads in social networks:Assortative, relational, and proximity mechanisms.
Annual Review of Sociology , , 91–115.Robbins, H., & Monro, S. (1951). A stochastic approximation method. The Annals of MathematicalStatistics , (3), 400–407.Robins, G., & Pattison, P. (2012). Interdependencies and Social Processes: Dependence Graphsand Generalized Dependence Structures. In P. Carrington, J. Scott, & S. Wasserman (Eds.), Models and methods in social network analysis (pp. 192–214). Cambridge, England: Cam-bridge University Press. doi: 10.1017/cbo9780511811395.010Robins, G., Pattison, P., Kalish, Y., & Lusher, D. (2007). An introduction to exponential randomgraph (p*) models for social networks.
Social networks , (2), 173–191.Robins, G., Snijders, T., Wang, P., Handcock, M., & Pattison, P. (2007). Recent developments inexponential random graph (p*) models for social networks. Social networks , (2), 192–215.Ruef, M., Aldrich, H. E., & Carter, N. M. (2003). The structure of founding teams: Homophily,strong ties, and isolation among us entrepreneurs. American Sociological Review , (2),195–222.Simmel, G. (1949). The Sociology of Sociability. American Journal of Sociology , (3), 254-261.Skvoretz, J., & Bailey, J. L. (2016). “Red, White, Yellow, Blue, All Out but You” Status Effectson Team Formation, an Expectation States Theory. Social Psychology Quarterly , (2),136–155.Snijders, T. A. (2001). The statistical evaluation of social network dynamics. Sociological method-ology , (1), 361–395.Snijders, T. A. (2002). Markov chain Monte Carlo estimation of exponential random graph models. Journal of Social Structure , (2), 1–40.Snijders, T. A., & Lomi, A. (2019). Beyond homophily: Incorporating actor variables in statisticalnetwork models. Network Science , (1), 1–19. nijders, T. A., Pattison, P. E., Robins, G. L., & Handcock, M. S. (2006). New specifications forExponential Random Graph Models. Sociological Methodology , (1), 99–153.Sundberg, R. (2019). Statistical modelling by exponential families (Vol. 12). Cambridge, England:Cambridge University Press.Topi, H., & Tucker, A. (2014).
Computing handbook: Information systems and information technology (Vol. 2). CRC Press.
Wang, P., Sharpe, K., Robins, G. L., & Pattison, P. E. (2009). Exponential random graph ($p^*$) models for affiliation networks. Social Networks, (1), 12–25.
Wasserman, S., & Pattison, P. (1996). Logit models and logistic regressions for social networks: I. An introduction to Markov graphs and p*. Psychometrika, (3), 401–425.
Wedderburn, R. W. (1976). On the existence and uniqueness of the maximum likelihood estimates for certain generalized linear models. Biometrika, (1), 27–32.
Zhu, M., Huang, Y., & Contractor, N. S. (2013). Motivations for self-assembling into project teams. Social Networks, (2), 251–264.

Appendix A. Partition sets with restricted group sizes: extension of the Bell numbers and Stirling numbers of the second kind

Extended Bell numbers. Bell numbers can famously be derived through the recursive relation (1) (Bell, 1934), and a similar formula expresses the size of the space $\mathcal{P}'$ containing all partitions whose group sizes belong to the interval $[[\sigma_{\min}, \sigma_{\max}]]$.

To initialize the recurrence, note that the sets $\mathcal{P}'([[1,n]])$ are empty when $n$ is smaller than $\sigma_{\min}$, the minimal size $n$ required for one valid partition to exist. Therefore:
\[ B'_n = 0 \ \text{ for } 0 < n < \sigma_{\min} \quad\text{and}\quad B'_{\sigma_{\min}} = 1. \]

The Bell recursion at $(n+1)$ enumerates, for $i$ varying from $0$ to $n$, the partitions with node $(n+1)$ in a group of size $(n+1-i)$ and the $i$ remaining nodes covering all possible partitions counted by $B_i$. Here, we can enumerate the same partitions, but the size $(n+1-i)$ can only take values between $\sigma_{\min}$ and $\sigma_{\max}$; therefore $i$ can only vary between $i_{\min}$ and $i_{\max}$ defined as:
\[ i_{\min} = \max(0,\, n+1-\sigma_{\max}) \quad\text{and}\quad i_{\max} = \min(n,\, n+1-\sigma_{\min}). \tag{15} \]

If we denote by $\mathcal{P}_i([[1,n+1]])$ the set containing all partitions $P \in \mathcal{P}'([[1,n+1]])$ such that node $(n+1)$ belongs to a group of size $(n+1-i)$:
\[ \mathcal{P}_i([[1,n+1]]) = \big\{ P \in \mathcal{P}'([[1,n+1]]) \mid g_P(n+1) = n+1-i \big\}, \tag{16} \]
we can write:
\[ \mathcal{P}'([[1,n+1]]) = \bigcup_{i=i_{\min}}^{i_{\max}} \mathcal{P}_i([[1,n+1]]). \]

For partitions in $\mathcal{P}_i([[1,n+1]])$, there are $\binom{n}{i}$ ways to choose the group of node $(n+1)$ and $B'_i$ ways to arrange the remaining $i$ nodes. This holds for any $i$ except when $(n+1-i)$ equals the whole set size $(n+1)$ (i.e., $i = 0$); in that case, we set by convention $B'_0 = 1$. We can therefore write $B'_{n+1}$ as the sum:
\[ B'_{n+1} = \sum_{i=i_{\min}}^{i_{\max}} \big|\mathcal{P}_i([[1,n+1]])\big| = \sum_{i=i_{\min}}^{i_{\max}} \binom{n}{i}\, B'_i, \]
and this establishes the recursive relation for $n \geq \sigma_{\min}$.

Extended Stirling numbers.
The Stirling number ${n \brace m}$ is the number of partitions of $n$ nodes in $m$ blocks. Its calculation follows the relations (Nielsen, 1906):
\[ {0 \brace 0} = 1, \qquad {n \brace 0} = {0 \brace n} = 0 \ \text{ for } n > 0, \]
\[ {n+1 \brace m+1} = \sum_{i=m}^{n} \binom{n}{i} {i \brace m} \quad \text{for } m \geq 0. \tag{17} \]

Similarly, we can calculate $\psi_{\sigma_{\min},\sigma_{\max}}(n,m)$, the number of partitions in $m$ blocks when block sizes belong to $[[\sigma_{\min}, \sigma_{\max}]]$. First, there is no possible partition for $n < m\sigma_{\min}$, therefore:
\[ \psi_{\sigma_{\min},\sigma_{\max}}(n,m) = 0 \quad \text{for } 0 < n < m\sigma_{\min}. \]
The smallest feasible case is the one with $m$ blocks of minimal size (i.e., $n = m\sigma_{\min}$). To count possible partitions in this case, we first order all nodes in $n!$ different ways, and take each time the first group of $\sigma_{\min}$ nodes, then the second group, and so on. Some partitions are counted several times, as there are $m!$ possible ways to order these groups and $\sigma_{\min}!$ ways to order the nodes inside each group. The final count is:
\[ \psi_{\sigma_{\min},\sigma_{\max}}(n,m) = \frac{n!}{m!\,(\sigma_{\min}!)^m} \quad \text{for } n = m\sigma_{\min}. \]

The terms in the sum of (17) are the numbers of partitions where node $(n+1)$ is in a group of size $(n+1-i)$ and the $i$ remaining nodes are partitioned in $m$ groups. As for the $B'_n$ numbers, we can adapt the original recursive relation with the indices (15):
\[ \psi_{\sigma_{\min},\sigma_{\max}}(n+1,\, m+1) = \sum_{i=i_{\min}}^{i_{\max}} \binom{n}{i}\, \psi_{\sigma_{\min},\sigma_{\max}}(i, m) \quad \text{for } n \geq m\sigma_{\min}. \]
Finally, for the extreme case when the group of node $(n+1)$ is of size $(n+1)$ (i.e., $i = 0$) and there are no groups left to form (i.e., $m = 0$), we set by convention:
\[ \psi_{\sigma_{\min},\sigma_{\max}}(0, 0) = 1. \]

Appendix B. Translation of the Ewens distribution

In this section, we demonstrate that the Ewens distribution (Ewens, 1972), as defined by Equation (5), can be expressed in the form of the exponential family introduced in this paper with the following definition:
\[ \Pr_\lambda(P = p) = \frac{\exp\Big( \log(\lambda)\,|p| + \sum_{G \in p} \log\big((|G|-1)!\big) \Big)}{\kappa_{\mathcal{P}}\big((\log(\lambda),\, 1)\big)}. \tag{18} \]

To prove that this definition is equivalent to (5), we develop its numerator and denominator. Following properties of the exponential and logarithm functions, the numerator can be expressed for any partition $P$:
\[ \exp\Big( \log(\lambda)\,|P| + \sum_{G \in P} \log\big((|G|-1)!\big) \Big) = \lambda^{|P|} \prod_{G \in P} \Gamma(|G|). \]
It remains to show that the denominator verifies:
\[ \kappa_{\mathcal{P}}\big((\log(\lambda),\, 1)\big) = \frac{\Gamma(n + \lambda)}{\Gamma(\lambda)}. \tag{20} \]

Proving (20) can be achieved by induction on $n$, the number of nodes. The distribution (18) is defined for statistics that are functions of the group sizes; we can therefore define the sequence $\kappa_n$ through the relations found in Equations (26) and (28) of Appendix D.

From Equation (26) and the property $\Gamma(1) = 1$, we have for the base case $n = 1$:
\[ \kappa_1 = \exp\big(\log(\lambda) + \log(0!)\big) = \lambda. \]
Besides, we know from properties of the Gamma function that $\Gamma(\lambda + 1) = \lambda\,\Gamma(\lambda)$; we can therefore validate the relation (20) for $n = 1$:
\[ \kappa_1 = \lambda = \frac{\Gamma(1 + \lambda)}{\Gamma(\lambda)}. \]
For general $n$, Equation (28), applied without size restrictions, gives:
\[ \kappa_{n+1} = \sum_{i=0}^{n} \binom{n}{i} \exp\Big( \log(\lambda) + \log\big((n-i)!\big) \Big)\, \kappa_i = \sum_{i=0}^{n} \binom{n}{i}\, \lambda\, (n-i)!\ \kappa_i. \tag{21} \]
We then separate this sum into two parts, one containing the term corresponding to $i = n$ and one containing the other terms:
\[ \kappa_{n+1} = \lambda\,\kappa_n + \sum_{i=0}^{n-1} \binom{n}{i}\, \lambda\, (n-i)!\ \kappa_i. \]
Finally, we develop the binomial coefficients and re-arrange them in order to recognize the definition (21) of $\kappa_n$:
\[ \kappa_{n+1} = \lambda\,\kappa_n + n \left( \sum_{i=0}^{n-1} \binom{n-1}{i}\, \lambda\, \big((n-1)-i\big)!\ \kappa_i \right) = \lambda\,\kappa_n + n\,\kappa_n = (\lambda + n)\,\kappa_n. \]
If we assume that (20) holds for a given $n > 0$, we therefore also have for $(n+1)$:
\[ \kappa_{n+1} = (\lambda + n)\,\frac{\Gamma(n + \lambda)}{\Gamma(\lambda)} = \frac{\Gamma\big((n+1) + \lambda\big)}{\Gamma(\lambda)}. \]
By induction, (20) holds for all $n$, and the model defined by (18) is the same as the Ewens model defined by (5).

Appendix C. Independence properties of the distribution

Consistency. Let $P$ be a random partition over $\mathcal{A}$ following the distribution (3) and $P'$ a random partition over $\mathcal{A}'$ following the same distribution with identical sufficient statistics and parameters. We denote by $\pi(P)$ the projection of $P$ onto the subset $\mathcal{A}'$, and by $\pi^{-1}(P')$ the set of partitions over the node set $\mathcal{A}$ whose projection is $P'$.

Consistency of the distribution then implies equality between the marginal distribution of the random partition $P$ over $\mathcal{A}'$ and the distribution of $P'$. In other words, the family of projections of a partition model on $\mathcal{A}$ onto the subset $\mathcal{A}'$ is a partition model with the same sufficient statistics and the same parameters. This property translates to:
\[ \Pr_\alpha\big( P \in \pi^{-1}(p') \big) = \Pr_\alpha\big( P' = p' \big). \]

Here we present some counter-examples of distributions used in this paper for which this property does not hold. Let us use the space $\mathcal{A} = \{1, 2, 3\}$, its subset $\mathcal{A}' = \{1, 2\}$, and the projection $\pi$ from $\mathcal{P}(\mathcal{A})$ to $\mathcal{P}(\mathcal{A}')$.

Uniform model.
Let us use the uniform distribution over $\mathcal{P}(\mathcal{A})$. There are 5 different ways of partitioning this set, hence each partition has probability $1/5$. For $p' = \big\{\{1,2\}\big\}$, we can calculate the marginal probability:
\[ \Pr\big( P \in \pi^{-1}(p') \big) = \Pr\Big( P = \big\{\{1,2,3\}\big\} \Big) + \Pr\Big( P = \big\{\{1,2\},\{3\}\big\} \Big) = \frac{2}{5}, \]
and the probability of observing $\big\{\{1,2\}\big\}$ over $\mathcal{A}'$:
\[ \Pr\big( P' = p' \big) = \Pr\Big( P' = \big\{\{1,2\}\big\} \Big) = \frac{1}{2}. \]
The uniform distribution is therefore not consistent.

Model with one statistic $s(P) = |P|$. We can use again the same example on the same sets and $p' = \big\{\{1,2\}\big\}$. We have as marginal probability:
\[ \Pr_\alpha\big( P \in \pi^{-1}(p') \big) = \Pr_\alpha\Big( P = \big\{\{1,2,3\}\big\} \Big) + \Pr_\alpha\Big( P = \big\{\{1,2\},\{3\}\big\} \Big) = \frac{\exp(\alpha) + \exp(2\alpha)}{\exp(\alpha) + 3\exp(2\alpha) + \exp(3\alpha)} \]
and:
\[ \Pr_\alpha\big( P' = p' \big) = \Pr_\alpha\Big( P' = \big\{\{1,2\}\big\} \Big) = \frac{\exp(\alpha)}{\exp(\alpha) + \exp(2\alpha)}. \]
Having these two terms equal is equivalent to the equation $\exp(2\alpha) = 0$, which has no solution in $\mathbb{R}$. Again, the consistency condition cannot be fulfilled for such models.

Neutrality.
We show in this section that the neutrality property defined by Equation (7) holds for any model defined for a set of statistics of the form:
\[ s_k(P) = \sum_{G \in P} f_k(G), \]
with the $(f_k)$ defined as real functions of the groups in the partition (i.e., representing any characteristic of the group). This definition covers all statistics used in this article; however, other types of statistics could also lead to neutral distributions.

Let $P$ be a random partition defined for such a model with parameter vector $\alpha$. Furthermore, let $p$ be the observed partition, which has the property of being the union of its projections onto the subsets $\mathcal{A}'$ and $\mathcal{A}'^c$. We can write:
\[ \Pr_\alpha\big( P = p \mid P = \pi(P) \cup \pi^c(P) \big) = \frac{\Pr_\alpha\big( P = p,\ p = \pi(p) \cup \pi^c(p) \big)}{\Pr_\alpha\big( P = \pi(P) \cup \pi^c(P) \big)}. \]
Since the observed partition verifies $p = \pi(p) \cup \pi^c(p)$, the numerator simplifies to:
\[ \Pr_\alpha\big( P = p,\ p = \pi(p) \cup \pi^c(p) \big) = \Pr_\alpha\big( P = p \big), \]
and since summing over all groups of $p$ is equivalent to summing over the groups in $\pi(p)$ and $\pi^c(p)$, this probability factorizes as follows:
\[ \Pr_\alpha\big( P = p \big) = \frac{1}{\kappa_{\mathcal{P}(\mathcal{A})}(\alpha)} \exp\bigg( \sum_{k \in K} \alpha_k \Big( \sum_{G \in \pi(p)} f_k(G) + \sum_{G \in \pi^c(p)} f_k(G) \Big) \bigg) = \frac{1}{\kappa_{\mathcal{P}(\mathcal{A})}(\alpha)} \exp\bigg( \sum_{k \in K} \alpha_k \sum_{G \in \pi(p)} f_k(G) \bigg) \exp\bigg( \sum_{k \in K} \alpha_k \sum_{G \in \pi^c(p)} f_k(G) \bigg). \tag{22} \]

The denominator expresses the probability of having the random partition $P$ verifying $P = \pi(P) \cup \pi^c(P)$. It is the sum of the probabilities of all partitions in $\mathcal{P}(\mathcal{A})$ with this property. If we define $\mathcal{Q}(\mathcal{A}, \mathcal{A}')$ as the set of these partitions, we can define a bijection $b : \mathcal{Q}(\mathcal{A}, \mathcal{A}') \to \mathcal{P}(\mathcal{A}') \times \mathcal{P}(\mathcal{A}'^c)$ such that $b(P) = \big( \pi_{\mathcal{A}'}(P),\, \pi_{\mathcal{A}'^c}(P) \big)$.
We deduce:
\[ \Pr_\alpha\big( P = \pi(P) \cup \pi^c(P) \big) = \sum_{\widetilde{P} \in \mathcal{Q}(\mathcal{A}, \mathcal{A}')} \frac{1}{\kappa_{\mathcal{P}(\mathcal{A})}(\alpha)} \exp\bigg( \sum_{k \in K} \alpha_k \Big( \sum_{G \in \pi_{\mathcal{A}'}(\widetilde{P})} f_k(G) + \sum_{G \in \pi_{\mathcal{A}'^c}(\widetilde{P})} f_k(G) \Big) \bigg) \]
\[ = \sum_{\widetilde{P}_1 \in \mathcal{P}(\mathcal{A}')} \sum_{\widetilde{P}_2 \in \mathcal{P}(\mathcal{A}'^c)} \frac{1}{\kappa_{\mathcal{P}(\mathcal{A})}(\alpha)} \exp\bigg( \sum_{k \in K} \alpha_k \Big( \sum_{G \in \widetilde{P}_1} f_k(G) + \sum_{G \in \widetilde{P}_2} f_k(G) \Big) \bigg) \]
\[ = \frac{1}{\kappa_{\mathcal{P}(\mathcal{A})}(\alpha)} \bigg( \sum_{\widetilde{P}_1 \in \mathcal{P}(\mathcal{A}')} \exp\Big( \sum_{k \in K} \alpha_k \sum_{G \in \widetilde{P}_1} f_k(G) \Big) \bigg) \bigg( \sum_{\widetilde{P}_2 \in \mathcal{P}(\mathcal{A}'^c)} \exp\Big( \sum_{k \in K} \alpha_k \sum_{G \in \widetilde{P}_2} f_k(G) \Big) \bigg), \]
and we can simplify:
\[ \Pr_\alpha\big( P = \pi(P) \cup \pi^c(P) \big) = \frac{\kappa_{\mathcal{P}(\mathcal{A}')}(\alpha)\ \kappa_{\mathcal{P}(\mathcal{A}'^c)}(\alpha)}{\kappa_{\mathcal{P}(\mathcal{A})}(\alpha)}. \tag{23} \]
By dividing the terms (22) and (23), the term $\kappa_{\mathcal{P}(\mathcal{A})}$ cancels and we finally have:
\[ \Pr_\alpha\big( P = p \mid P = \pi(P) \cup \pi^c(P) \big) = \frac{1}{\kappa_{\mathcal{P}(\mathcal{A}')}(\alpha)} \exp\bigg( \sum_{k \in K} \alpha_k \sum_{G \in \pi(p)} f_k(G) \bigg) \times \frac{1}{\kappa_{\mathcal{P}(\mathcal{A}'^c)}(\alpha)} \exp\bigg( \sum_{k \in K} \alpha_k \sum_{G \in \pi^c(p)} f_k(G) \bigg) = \Pr_\alpha\big( \pi(P) = \pi(p) \big) \times \Pr_\alpha\big( \pi^c(P) = \pi^c(p) \big), \]
and this demonstrates the property of neutrality as defined by (7).

Appendix D. Calculation of the normalizing constant when statistics are functions of block sizes

Here, we demonstrate that the normalizing constant $\kappa$ as expressed by Equation (4) can be calculated with a recursive formula when the sufficient statistics $(s_k)_{k \in K}$ are of the form:
\[ s_k(P) = \sum_{G \in P} f_k(G), \tag{24} \]
with the $(f_k)$ defined as functions from the set of possible group sizes to $\mathbb{R}$. In the rest of the proof, we also pose:
\[ f(P) = \exp\Big( \sum_{k \in K} \alpha_k \sum_{G \in P} f_k(G) \Big). \tag{25} \]

In such cases, the probability distribution defined by (3) is said to be exchangeable (McCullagh, 2011), as it is invariant under any permutation of the nodes. The normalizing constant $\kappa$ then only depends on $n$ and is denoted $\kappa_n$.

For the sake of conciseness, we demonstrate the relation (11) for the constant $\kappa'_n$ defined over the set $\mathcal{P}'([[1,n]])$ with group sizes between $\sigma_{\min}$ and $\sigma_{\max}$. The relation (10) for the general case directly follows.

The proof is based on a similar logic to the one used in Appendix A. To initialize the recursion, we know that there are no possible partitions for smaller sizes, and that there is only one partition, with one group, for $n = \sigma_{\min}$. Therefore:
\[ \kappa'_n = 0 \ \text{ for } n < \sigma_{\min}, \qquad \kappa'_n = \exp\Big( \sum_{k \in K} \alpha_k f_k(\sigma_{\min}) \Big) \ \text{ for } n = \sigma_{\min}. \tag{26} \]
For $n > \sigma_{\min}$, we can use the subsets $\mathcal{P}_i([[1,n+1]])$ defined by (16) and write:
\[ \kappa'_{n+1} = \sum_{\widetilde{P} \in \mathcal{P}'([[1,n+1]])} f(\widetilde{P}) = \sum_{i=i_{\min}}^{i_{\max}} \bigg( \sum_{\widetilde{P} \in \mathcal{P}_i([[1,n+1]])} f(\widetilde{P}) \bigg). \]

Let us define $\mathcal{G}_i([[1,n+1]])$ as the set of all possible groups of nodes in $[[1,n+1]]$ that include node $(n+1)$ and whose size equals $(n+1-i)$. To enumerate all possible partitions of $\mathcal{P}_i([[1,n+1]])$, we enumerate all groups in $\mathcal{G}_i([[1,n+1]])$ and all possible partitions over the remaining $i$ nodes. With this notation, we have:
\[ \kappa'_{n+1} = \sum_{i=i_{\min}}^{i_{\max}} \bigg( \sum_{g \in \mathcal{G}_i([[1,n+1]])} \sum_{\widetilde{P} \in \mathcal{P}'([[1,n+1]] \setminus g)} f(\widetilde{P} \cup g) \bigg). \]

Since the definition of the function $f$ is invariant under permutations of the nodes, we can re-order the $i$ remaining nodes from $1$ to $i$. From this we deduce that for any group $g \in \mathcal{G}_i([[1,n+1]])$ there is a bijection $b_g : \mathcal{P}'([[1,n+1]] \setminus g) \to \mathcal{P}'([[1,i]])$ such that partitions over the remaining nodes are defined for these re-ordered nodes.
We can therefore replace the sum indices in the previous expression:
\[ \kappa'_{n+1} = \sum_{i=i_{\min}}^{i_{\max}} \bigg( \sum_{g \in \mathcal{G}_i([[1,n+1]])} \sum_{\widetilde{P} \in \mathcal{P}'([[1,i]])} f(\widetilde{P} \cup g) \bigg). \]
We can then use the definition (24) of the statistics $s_k$ to derive:
\[ f(\widetilde{P} \cup g) = \exp\bigg( \sum_{k \in K} \alpha_k \Big( \sum_{G \in \widetilde{P}} f_k(G) \Big) \bigg) \exp\bigg( \sum_{k \in K} \alpha_k f_k(g) \bigg) = f(\widetilde{P})\, f(g), \]
and factorize:
\[ \kappa'_{n+1} = \sum_{i=i_{\min}}^{i_{\max}} \bigg( \sum_{g \in \mathcal{G}_i([[1,n+1]])} f(g) \sum_{\widetilde{P} \in \mathcal{P}'([[1,i]])} f(\widetilde{P}) \bigg). \]

By definition, the following term simplifies to one of the previously calculated normalizing constants:
\[ \sum_{\widetilde{P} \in \mathcal{P}'([[1,i]])} f(\widetilde{P}) = \kappa'_i, \]
except in the case of $i = 0$, for which we set:
\[ \kappa'_0 = 1. \tag{27} \]

Moreover, we know that for any $g \in \mathcal{G}_i([[1,n+1]])$, $f_k(g) = f_k(n+1-i)$. Developing $f(g)$ then removes any term depending on $g$. The size of $\mathcal{G}_i([[1,n+1]])$ being the number of ways to choose $n - i$ elements (or $i$ elements) among $n$ nodes, we deduce:
\[ \kappa'_{n+1} = \sum_{i=i_{\min}}^{i_{\max}} \binom{n}{i} \exp\bigg( \sum_{k \in K} \alpha_k f_k(n+1-i) \bigg) \kappa'_i. \tag{28} \]
These expressions show that we can recursively construct the sequence $\kappa'_n$.
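The recursion (28), together with the index bounds (15) and the convention (27), translates directly into code. The Python sketch below is our own illustration, not part of the paper; `kappa_sequence` and `alpha_f` are hypothetical names. As sanity checks, it recovers the Bell numbers when all parameters are zero, the extended count of Appendix A under group-size restrictions, and the Ewens constant $\Gamma(n+\lambda)/\Gamma(\lambda)$ of Appendix B.

```python
from math import comb, exp, log, factorial

def kappa_sequence(n, alpha_f, sigma_min=1, sigma_max=None):
    """Compute kappa'_0, ..., kappa'_n through the recursion (28).

    alpha_f(size) must return sum_k alpha_k * f_k(size) for a group of
    the given size; group sizes are restricted to [sigma_min, sigma_max].
    """
    if sigma_max is None:
        sigma_max = n
    kappa = [0.0] * (n + 1)
    kappa[0] = 1.0  # convention (27): kappa'_0 = 1
    for m in range(1, n + 1):
        # recursion (28) with m = n + 1; i ranges over [i_min, i_max], cf. (15)
        i_min = max(0, m - sigma_max)
        i_max = min(m - 1, m - sigma_min)
        kappa[m] = sum(
            comb(m - 1, i) * exp(alpha_f(m - i)) * kappa[i]
            for i in range(i_min, i_max + 1)
        )
    return kappa

# With all parameters set to zero and no size restriction, kappa'_n
# counts all partitions of n nodes, i.e. the Bell numbers.
print([round(b) for b in kappa_sequence(6, lambda size: 0.0)[1:]])
# -> [1, 2, 5, 15, 52, 203]

# With group sizes restricted to [2, 3], only the 3 ways of splitting
# 4 nodes into two pairs remain.
print(round(kappa_sequence(4, lambda size: 0.0, sigma_min=2, sigma_max=3)[4]))
# -> 3

# With the Ewens statistics of Appendix B, kappa'_n must equal
# Gamma(n + lam) / Gamma(lam) = lam * (lam + 1) * ... * (lam + n - 1).
lam = 2.5
ewens = kappa_sequence(5, lambda size: log(lam) + log(factorial(size - 1)))
rising = 1.0
for k in range(5):
    rising *= lam + k
print(abs(ewens[5] - rising) < 1e-6)
# -> True
```

Each $\kappa'_m$ is obtained from the previously stored values, so the whole sequence requires only $O(n^2)$ evaluations of the statistics, in contrast to the super-exponential number of partitions.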