A guide to choosing and implementing reference models for social network analysis

Elizabeth A. Hobson, Matthew J. Silk, Nina H. Fefferman, Daniel B. Larremore, Puck Rombach, Saray Shai, and Noa Pinter-Wollman

Department of Biological Sciences, University of Cincinnati, Ohio, USA; Centre for Ecology and Conservation, University of Exeter Penryn Campus, Penryn, Cornwall, UK; Departments of Ecology and Evolutionary Biology & Mathematics, University of Tennessee, Knoxville, TN, USA; Department of Computer Science & BioFrontiers Institute, University of Colorado Boulder, Boulder, CO, USA; Department of Mathematics & Statistics, University of Vermont, VT, USA; Department of Mathematics and Computer Science, Wesleyan University, Middletown, CT, USA; Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA, USA

* Equal contribution
Abstract
Analyzing social networks is challenging. Key features of relational data require the use of non-standard statistical methods such as developing system-specific null, or reference, models that randomize one or more components of the observed data. Here we review a variety of randomization procedures that generate reference models for social network analysis. Reference models provide an expectation for hypothesis-testing when analyzing network data. We outline the key stages in producing an effective reference model and detail four approaches for generating reference distributions: permutation, resampling, sampling from a distribution, and generative models. We highlight when each type of approach would be appropriate and note potential pitfalls for researchers to avoid. Throughout, we illustrate our points with examples from a simulated social system. Our aim is to provide social network researchers with a deeper understanding of analytical approaches to enhance their confidence when tailoring reference models to specific research questions.
Keywords: agent-based model, animal sociality, configuration model, permutation, randomization, social network analysis
Individuals interact with each other in many ways, but determining why they interact and uncovering the function of social patterns is challenging. Network theory has provided useful tools to quantify patterns of social interactions (Borgatti et al., 2009; Croft et al., 2008; Wasserman and Faust, 1994). The analysis of social networks is complicated by the fact that applying statistical inference using standard methods is often not appropriate because of the inherent dependence of individuals within a network (for example, the actions of one individual are linked to the actions of another) (Croft et al., 2011). Network methods have emerged as a powerful set of tools with which to analyze social systems (Butts, 2008; Cranmer et al., 2017; Farine and Whitehead, 2015; Fisher et al., 2017; Pinter-Wollman et al., 2014; Silk et al., 2017a,b; Sosa et al., 2020; Wey et al., 2008). Using network tools can often be difficult because important assumptions can be cryptic and do not apply universally across all suites of research questions or data types. Network analyses have been used to address many diverse questions in social and behavioral sciences (Bruch and Newman, 2019; Clauset et al., 2015; Crabtree et al., 2017; Croft et al., 2016; Power, 2017; Ripperger et al., 2019; Sih et al., 2018; Webber and Vander Wal, 2019). This diversity of questions and approaches, especially in an interdisciplinary field like network science, has led many researchers to develop tools customized to a particular use case.
Careful consideration of the math underlying each approach can help understand the similarities and differences between alternative methods, can ensure that researchers are correctly testing their hypothesis, and can help researchers avoid violating the assumptions of particular methods.

In this paper, we describe methods for drawing statistical inferences about patterns of sociality, focusing on the underlying math and using simulated examples to illustrate each approach (see Supplementary Material). We begin by explaining the concept of a reference (null) model, outlining when these reference models are required, discussing the key considerations facing researchers when using them, and outlining some of the potential pitfalls that may arise. We then introduce different approaches to creating reference models. We highlight the benefits of each approach and provide typical research questions for which different reference models are appropriate. We detail particular pitfalls of using the different approaches, illustrating potential questions and pitfalls using examples and simulations. We base our simulations on the social system of a mythical animal, the burbil (Box 1). Our paper is targeted at those who have some experience in social network analysis and are looking for ways to make statistical inferences about the social systems they are studying. Croft et al. (2008), Farine and Whitehead (2015), Krause et al. (2015), and Newman (2018) provide excellent introductions to the study of social networks as a jumping-off point for what follows here.

The essence of any statistical inference is to determine whether empirical observations are meaningful or whether they are the outcomes of chance alone.
When data do not meet the assumptions of traditional statistical methods, as is often the case with network data (Cranmer et al., 2017; Croft et al., 2011), researchers can create chance distributions against which to compare their data.

Traditionally, the likelihood of an observation occurring by chance has been referred to as a null model (Croft et al., 2011; Good, 2013). However, the term “null” suggests that no patterns of interest are present. While it is correct to assume for statistical inference purposes that nothing is happening in relation to the phenomenon that is being observed, many other processes can still be acting on, or otherwise limiting, the system. For this reason, we advocate for using the term reference models (Gauvin et al., 2018) rather than null models or chance distributions. The use of the term “reference” highlights the notion that we are not comparing observations to a completely random scenario that contains no predictable patterns but rather to a system in which certain features of interest are preserved and others are randomized.
Box 1: Introduction to the imaginary burbil society.
To illustrate and provide examples for the different randomization procedures, and to summarize some of the key pitfalls that each approach is susceptible to, we created imaginary social network datasets of the mythical burbils (aka Burbelis silkensis). Burbils live in open habitats and exhibit two unique nose-color morphs (red and orange). Individual burbils can be uniquely identified and their sex (male or female) and age (adults, subadults, and juveniles) are known. Burbils form fission-fusion societies characterised by large groups that roost together at night but fission into smaller subgroups when foraging during the day. The number of subgroups each day is drawn from a Poisson distribution (λ = 5) and we suspect that subgroup membership may be assorted by nose colour (see Box 2). Foraging subgroups from different roosting groups occasionally meet and intermingle, creating opportunities for between-group associations. These between-group associations are more likely if the two burbil groups belong to the same “clan” (similar to the vocal clans of killer whales (Yurk et al., 2002)). Burbil groups differ in size, and groups of different sizes might have different social network structures. Within their social groups, burbils are involved in both dominance interactions and affiliative interactions with group-mates, and we suspect that these are influenced by age and sex. These interactions can only occur between individuals in the same sub-groups, with the number of interactions recorded in each sub-group varying based on the number of individuals recorded. Further information on burbil societies, social network generation, and example analyses are provided in the Supplementary Material.

Figure 1: A burbil. Art by Maureen Rombach.

To achieve statistical inference using reference models, the two most important questions a researcher needs to ask are: ‘How can empirical data be sampled in an unbiased way?’ and ‘What is the chance of patterns of interest being present?’ These questions are linked because chance could be different depending on the sampling approach. For example, if one samples only females, the chance distribution should not include males because the mechanisms that underlie the observed processes could differ between males and females. Identifying the appropriate chance distribution that observations should be compared to is critical for avoiding straw-man hypotheses.

Importantly, there are inherent differences between observed biological networks and the mathematical constructs that underlie reference distributions. Observed biological networks are finite and exist outside of infinite mathematical constructs (i.e., outside of “Asymptopia”). Therefore, it is important not to attribute meaning to differences between observed networks and reference models that emerge from the difference between the finite nature of the observed network and the general mathematical construct that describes the reference model. Instead, inference of meaning should come from consideration of agreement with, or deviation from, appropriately chosen patterns that reflect the real-world processes that generate and/or constrain them.
The effective use of a reference model hinges on four key steps, which focus on answering a biological question by comparing empirical observations to randomized or synthetic constructs. In sequence, we suggest that researchers (i) clearly articulate the biological question, (ii) choose an appropriate test statistic, (iii) generate a reference distribution, and (iv) reflect on whether the biological question was addressed properly. By identifying these discrete steps, we can scrutinize the analysis process to avoid methodological pitfalls (see Section 4).
Step 1: Articulate the research question and specify the feature of the reference model to be randomized.
A reference model answers a question by connecting an observation to a distribution of hypothetical observations in which some aspect of the data has been shuffled, resampled, or otherwise stochastically altered. Creating a reference distribution by randomizing some aspect of the observed data is an alternative to an experimental manipulation, where an experimental treatment would create a distribution of observations of the system. Choosing which observed feature(s) to randomize in a reference model is as important as designing a carefully controlled experiment: both require combining the research question, domain knowledge, and accessible data to determine what should be held constant and what should be manipulated. Thus, the outcome of Step 1 is a list of network or data properties that are to be (i) randomized and (ii) maintained. Networks are interesting precisely because they capture complex interdependencies between nodes, which means that choosing what to manipulate and what to hold constant is not always trivial.

Although all reference models randomize some aspect of the data while fixing other aspects, both randomization and fixation can be done at different levels of abstraction: (Level 1) permutation, in which observations are swapped; (Level 2) resampling, in which observations are sampled from the observed data with replacement; (Level 3) distribution-sampling, in which observations are drawn from a fixed distribution; and (Level 4) generative-processes, in which synthetic data or networks are constructed from stochastic rules (Fig. 2). These levels of model abstraction can be applied to the observed data at different stages of analysis.

Figure 2: Methods for creating reference models increase in level of abstraction.
Methods progress from reference models that rely strongly on the empirical observations of sociality (left) to methods that make assumptions about the generative processes that underlie the observed sociality and do not use the observed social associations when producing a reference model (right).
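As a minimal sketch (toy data, not the burbil simulations from the Supplementary Material), the four levels can be written out for a list of observed dyadic association weights:

```python
import random
import statistics

random.seed(42)

# Hypothetical observed association weights for six dyads (toy data).
observed = [3, 1, 4, 1, 5, 9]

# Level 1: permutation -- the observed values themselves are swapped.
permuted = random.sample(observed, k=len(observed))

# Level 2: resampling -- values are drawn from the observed data
# with replacement.
resampled = [random.choice(observed) for _ in observed]

# Level 3: distribution-sampling -- values are drawn from a fixed
# distribution (here, a normal fitted to the observed mean and SD).
mu, sd = statistics.mean(observed), statistics.stdev(observed)
from_distribution = [random.gauss(mu, sd) for _ in observed]

# Level 4: generative process -- synthetic data are constructed from
# stochastic rules without reusing the observed values at all
# (here, a simple Bernoulli rule for whether each dyad associates).
generated = [1 if random.random() < 0.5 else 0 for _ in observed]
```

Each level discards more of the empirical data: a permutation preserves the exact multiset of observed values, resampling preserves only their relative frequencies in expectation, and the last two levels retain only summary statistics or behavioural rules.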
Step 2: Choose a test statistic.
The test statistic is the quantity that will be calculated from both the empirical data and from the reference model. Many summary measures can be used as test statistics (see summaries in Sosa et al. (2020); Wey et al. (2008)). The test statistic should quantify the network feature, or the relationship between features, that is tied directly to the biological question (see Box 2 for an example).
Step 3: Generate a reference distribution.
Samples from a reference model constitute a reference dataset, and applying the test statistic to each randomized sample in the reference dataset creates a reference distribution. In this way, the samples from the reference model can be compared to the empirical data through the lens of the test statistic. If the test statistic from the observed data is indistinguishable from the distribution of test statistics in the reference distribution, a researcher would not be able to reject the null hypothesis. If the test statistic from the observed data falls outside or at the extremities of the reference distribution, then a researcher would conclude that the feature, or relationship, of interest was unlikely to occur by chance and would reject the null hypothesis.
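This step can be sketched generically (hypothetical helper names and toy data; the shuffled feature and the test statistic would be the ones chosen in Steps 1 and 2). Following the convention used in Box 2, 9999 randomizations plus the observed value give a full reference dataset of n = 10,000:

```python
import random

random.seed(1)

def reference_distribution(data, labels, test_statistic, n_iter=9999):
    """Build a reference distribution by shuffling labels n_iter times
    and recomputing the test statistic after each shuffle."""
    ref, labels = [], list(labels)
    for _ in range(n_iter):
        random.shuffle(labels)            # randomize the feature of interest
        ref.append(test_statistic(data, labels))
    return ref

def empirical_p(observed_stat, ref):
    """Upper-tail p-value; the observed value counts as one member of
    the reference dataset, so the denominator is n_iter + 1."""
    n_extreme = sum(1 for r in ref if r >= observed_stat)
    return (n_extreme + 1) / (len(ref) + 1)

# Toy question: do 'a'-labelled nodes have higher strength than 'b'-labelled?
strengths = [5.0, 4.5, 4.8, 1.0, 1.2, 0.9]
labels = ["a", "a", "a", "b", "b", "b"]

def mean_diff(data, labs):
    a = [d for d, l in zip(data, labs) if l == "a"]
    b = [d for d, l in zip(data, labs) if l == "b"]
    return sum(a) / len(a) - sum(b) / len(b)

obs = mean_diff(strengths, labels)
ref = reference_distribution(strengths, labels, mean_diff)
p = empirical_p(obs, ref)
```

An observed statistic at the extremity of `ref` yields a small `p`; an observed statistic near the middle of `ref` gives no grounds for rejection.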
Step 4: Reflect.
Researchers must investigate whether they truly have a valid fit between the question they set out to study and the reference model they chose. They further need to determine whether the reference model behaves as expected. As we show in this paper, there are many ways in which reference models may have hidden biases that can result in misleading outcomes. In setting up a new reference model, it is beneficial to separate Step 1 (choosing which features to randomize and which to preserve, and at what level of abstraction) from Step 2 (choosing a test statistic) because this allows us to diagnose separate issues associated with a test statistic versus issues related to the data randomization procedure itself.
Box 2: Comparing informative and non-informative reference models.
Two groups of researchers want to examine if burbil association networks are different from what would be expected at random. One group is of biologists who are experts in studying animal social interactions; we refer to them as the ‘Informed’ group. The other group is of scientists who became fascinated with burbils when encountering them while working on an unrelated project and are less experienced in using network methods. We refer to the latter as the ‘Naive’ group.

Both teams have association data from a single group of burbils. Based on these association data, they build a weighted, undirected network (Fig. 3a). The researchers have information on the attributes of the burbils, such as age, sex, and nose color.

The informed group immediately notices that individual burbils differ from one another in their nose color and asks a specific research question related to that trait. The naive team overlooks the natural history of the burbils and asks a much more vague question about burbil social structure. R code for both analyses is provided in the Supplementary Material.
The two teams’ analyses proceed in parallel (Informed research group versus Naive research group):

Step 1a. Articulate research question:
Informed: Are burbils more likely to associate with others with the same nose colour? [clearly defined question]
Naive: Do burbils associate at random? [poorly defined question]

Step 1b. Develop a reference model:
Informed: To determine if burbils associate in subgroups based on nose color, the researchers decide to preserve the observed network structure, i.e. who associates with whom, and randomize nose color. Note that this choice maintains all aspects of burbil social structure except for nose color, which is the variable the researchers are interested in examining.
Naive: To determine if burbils associate at random, the researchers generate random networks with the same number of nodes and edges and then, for each random network, draw edge weights from a normal distribution with the same mean and standard deviation as the observed adjacency matrix.
Step 2. Choose a test statistic:
Informed: The informed researchers use a weighted assortativity coefficient to measure the tendency of burbils to associate with those of the same nose color.
Naive: The researchers choose a measure of variance of the weighted degree (strength) distribution, the coefficient of variation (CV), as the test statistic to compare the observed and reference networks.
Step 3. Generate a reference distribution:
Both teams generate a reference distribution by running 9999 iterations of their randomization procedure, to which they will compare the observed test statistic. Using 9999 iterations means their full reference dataset, including the observed value, is n = 10,000. They use their different algorithms to generate their reference distributions.
Step 3a. Network randomization and generating reference test statistic:
Informed: After each shuffle of nose color, the weighted assortativity coefficient is calculated to obtain 9999 reference values to compare with the observed value.
Naive: After the creation of each new interaction network, the CV of the weighted degree distribution is calculated for each simulation to obtain 9999 reference values of simulated weighted degree CV to compare with the observed value.
Step 3b: Compare reference and observed test statistics:
Step 3c: Draw inference from comparison between observed and reference values:
Informed: The observed assortativity coefficient falls outside the 95% confidence interval of the reference distribution, indicating that burbils do indeed assort by nose colour (Fig. 3b).
Naive: The observed weighted degree CV falls inside the 95% interval of the reference distribution, indicating that the network is not different from random with regard to this particular network measure.
Step 4: Reflect:
Informed: The informed team asked a specific question, used a permutation procedure that shuffled only the one aspect of burbil society that was of interest, and chose a test statistic that was well matched to their question.
Naive: The Naive team asked a vague question (what does it mean for a network to be non-random? What is the biological meaning of ‘random’ and how is it measured?). They found it difficult to define a satisfactory reference model and they chose a test statistic that was not clearly linked to their question. The Naive team is therefore uncertain about the biological conclusions they can draw. They have also failed to properly reflect on how they generated their reference distribution, and missed the fact that they included zero values for self-loops in their calculation of the mean and standard deviation of the edge weights when generating their reference networks. These edge weights had a biased representation and inflated their importance compared to the observed edge weights.
Figure 3: Do burbils socially assort by nose-color? (a) Association network of burbils, with nodes color-coded by nose color, and (b) distribution of values based on the permutation procedure of the informed team; observed value of the test statistic shown as a red solid line and the 2.5% and 97.5% quantiles of the reference distribution as blue dashed lines.
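The informed team's analysis can be sketched in a few lines (toy network and hypothetical helper functions; the weighted assortativity here follows Newman's mixing-matrix formula, r = (tr e − Σᵢ aᵢ²)/(1 − Σᵢ aᵢ²), while the teams' actual R implementation is in the Supplementary Material):

```python
import random
from collections import defaultdict

random.seed(7)

def weighted_assortativity(edges, color):
    """Newman's assortativity coefficient for a categorical node
    attribute on a weighted, undirected edge list [(u, v, w), ...]."""
    e, total = defaultdict(float), 0.0
    for u, v, w in edges:
        cu, cv = color[u], color[v]
        e[(cu, cv)] += w            # count each undirected edge in
        e[(cv, cu)] += w            # both directions to symmetrize
        total += 2 * w
    cats = {c for pair in e for c in pair}
    trace = sum(e[(c, c)] for c in cats) / total
    a2 = sum((sum(e[(c, d)] for d in cats) / total) ** 2 for c in cats)
    return (trace - a2) / (1 - a2)

# Toy burbil-like network: two nose-color cliques joined by one weak edge.
edges = [("a1", "a2", 5), ("a2", "a3", 4), ("a1", "a3", 6),
         ("b1", "b2", 5), ("b2", "b3", 4), ("b1", "b3", 6),
         ("a3", "b1", 1)]
color = {"a1": "red", "a2": "red", "a3": "red",
         "b1": "orange", "b2": "orange", "b3": "orange"}

obs = weighted_assortativity(edges, color)

# Node-label permutation: shuffle nose colors only; who associates
# with whom (the network structure) is left untouched.
nodes, cols = list(color), list(color.values())
ref = []
for _ in range(999):
    random.shuffle(cols)
    ref.append(weighted_assortativity(edges, dict(zip(nodes, cols))))

p = (sum(1 for r in ref if r >= obs) + 1) / (len(ref) + 1)
```

With two tight same-color cliques, the observed assortativity sits near the top of the reference distribution, mirroring the informed team's result.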
A reference model functions, in a computational sense, as a control against the observed outcomes in a system. The ‘null hypothesis’ would then be that no meaningful differences exist between the calculated reference and the measured results. Our goal in constructing an appropriate reference model is therefore to know confidently when to reject that null hypothesis.

Although we focus on selecting appropriate reference models against which to contrast hypothesized processes or outcomes (i.e., an appropriate control for an observational experiment), the idea of a test against a reference model itself relies implicitly on the existence of a known and concrete alternate hypothesis describing either the process from which the observations emerged, or describing features of the observed data/structures themselves. One potential (and common) point of complication in the analysis of social networks is that hypothesis generation (i.e., discovery-driven exploration) and hypothesis testing may be easily conflated. In discovery-driven investigations it is impossible to design an appropriate reference model because it is impossible to decouple a hypothesis from the observations themselves.

There is often a temptation to randomize each pattern of interest in a network with the hope that finding the correct reference model for contrast can allow meaningful interpretation from observations that may not be rich enough, or well understood enough yet, to support it. This is not at all to suggest that exploratory data analysis is inappropriate. It is critical to purposefully differentiate between the exploration and discovery phase of research (when pattern discovery may itself be the goal and does not require statistical departure from a constructed reference model) and the hypothesis-testing phase of research (when appropriate reference models are necessary); see Box 3 for an example.
Box 3: Discovery versus Hypothesis Testing.
Consider the case in which a researcher suspects that individual risk of infection from a contagious disease circulating in a population may be correlated with some measure of the centrality of individuals in the network. There are three potential cases that are all included in this general description:
Case 1) The mode of transmission of the pathogen is known (e.g. sexually transmitted).
In this case, the network of contacts among individuals that may provide the means for disease transmission is well-defined (in the same example, an edge is drawn between two individuals who have engaged in sexual contact with each other). Given that network, we may then hypothesize that a particular centrality measure may be the most likely to correlate with infection risk (for example, eigenvector or betweenness centrality). Calculating the individual centralities of each node in the network and their respective correlation with the observed disease burden is a valid endeavor and requires the construction of an appropriate reference model to be able to infer meaning and make appropriate interpretations of the outcome.
Case 2) There is only one understood centrality measure the researchers believe should be correlated with disease risk (e.g. betweenness), but the mode of transmission for the infection itself is not known (e.g. it might transmit by sexual transmission, but might instead be transmitted via inhalation of droplets from the respiratory system, so close contact with anyone coughing/sneezing/exhaling while infected is sufficient for potentially successful transmission).
In this case the researchers may construct two potential networks: one from observed sexual contact and the other from some spatial proximity index that could reflect exposure to exhaled droplets from others. Here again, calculating the betweenness of the individuals in each of these two different networks and their respective correlations with observed disease burden is also valid and requires an appropriate reference model.
Case 3) We are unsure of both the mode of transmission of the pathogen and also which centrality measure might correlate with infection risk.
In this case, selecting the combination of network structure and centrality measure that yields the highest correlation with observed disease burden may not be a well-formed question. No reference model, no matter how carefully constructed, may be able to provide a valid context for interpretation. Because the centrality calculations do not exist in the absence of the structure of the selected network, the “pair” of measure and network that produces the greatest fit to the observed transmission pattern is the logical equivalent of over-fitting a regression. Unfortunately, unlike a simple regression, because the centralities of individuals depend on the network structure (i.e., factors that are extrinsic to the node itself), validation by sensitivity of the correlation under iterative removal and recalculation (or other common techniques) is not possible.
Common pitfalls when using reference models
When using reference models to analyze network data, researchers should keep in mind the pitfalls that can arise at each of the above steps. We provide a broad overview here and link these general pitfalls to specific examples that are related to the different approaches to generating reference models, which we detail below.
The most important step when designing a randomization procedure is ensuring that the research question is directly addressed. Researchers may set out to examine a particular question but then randomize the network in a way that misses the question and results in misleading conclusions (see Box 2 for an example). Designing an appropriate randomization procedure can be challenging because changing one property of a network can often change others, and imposing too many constraints may lead to computational issues or prevent researchers from answering the desired question. Therefore, having a clear understanding of the types of constraints that can be imposed is important.

Reference models for social networks can be constructed to preserve either or both non-network and network aspects of the animals’ biology.
Non-network constraints are properties of the biological system that are extrinsic to the social network but might influence whether or not an interaction occurs. Such constraints might shape how the reference distribution is generated, for example by providing restrictions on possible permutations or resampling. Restricting permutations (or resampling) to specific time windows, for example, could prevent creating interactions between individuals that had not been born yet and ones that have already died or emigrated from the study site. Similarly, including spatial constraints in reference models recognizes that some individuals can never meet, for example terrestrial organisms that are separated by a river they cannot cross. Failing to prevent the generation of samples in the reference model that are not naturally feasible may lead to false positive results. Often, imposing these constraints will require knowledge of the study system.
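A time-window constraint of this kind can be built directly into the permutation procedure. In this hypothetical sketch (toy observation records, illustrative field names), group memberships are shuffled only among observations made in the same time window, so an individual that died before window 2 can never be placed into a window-2 group:

```python
import random
from collections import defaultdict

random.seed(3)

# Hypothetical observation records: (time_window, individual, group_id).
observations = [
    (1, "A", "g1"), (1, "B", "g1"), (1, "C", "g2"),
    (2, "B", "g3"), (2, "C", "g3"), (2, "D", "g4"),  # "A" died before window 2
]

def constrained_permutation(obs):
    """Shuffle group membership only within each time window, so
    individuals are never assigned to groups outside the windows in
    which they were actually observed."""
    by_window = defaultdict(list)
    for t, ind, g in obs:
        by_window[t].append((ind, g))
    out = []
    for t, records in by_window.items():
        groups = [g for _, g in records]
        random.shuffle(groups)            # permute only within this window
        out.extend((t, ind, g) for (ind, _), g in zip(records, groups))
    return out

perm = constrained_permutation(observations)
```

The permuted records keep each individual in its original time windows and keep each window's multiset of group labels intact; only the individual-to-group assignment within a window is randomized.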
Network constraints are emergent properties of the network that might be important to maintain when testing particular hypotheses (e.g., the degree distribution, the number of network components or clusters, etc.). These properties are easier to maintain using some approaches to generating reference models than others. For example, datastream permutation methods overlook the importance of maintaining specific properties of the social network (more detailed discussion in Section 5.2) (Weiss et al., 2020). Not accounting for network constraints can result in reference datasets (networks) that fall within the non-network constraints imposed but which have substantial differences between some key properties of the emergent network structure in the reference dataset and the observed social network. A failure to include network constraints can result in errors in inference (Weiss et al., 2020).

Both network and non-network constraints can cause unexpected changes such that the reference model may no longer address the original research question, or it might address a similar but not identical research question (see Section 5.2). For example, randomization of movement data can alter the social networks constructed from those movements, which may, in turn, introduce undesired changes in reference networks that could not be foreseen from the movement data permutations. Thus, randomizing away correlations at one scale (e.g., movement) may introduce correlations at another scale (e.g., social network).

Further, while it is important to consider both network and non-network constraints on the reference datasets, a reference model can include too many constraints (see Section 5.2). In some cases these constraints may prevent the production of a reference model (too few possible configurations) or make the process too computationally intensive.
Applying constraints may lead to a narrow reference distribution, which does not have to be a pitfall and might just be the nature of the biological question. However, a pitfall arises if these restrictions stop a researcher from randomizing the aspects of the data that are the focus of the research question. Sometimes creating a wide enough reference distribution is not possible using less abstract approaches (permutations and resampling; see Fig. 2); for example, very small networks have a small, finite number of possible edge permutations. In such a situation, it might be beneficial to change the randomization approach. We discuss in Section 7 a randomization procedure that can allow researchers to produce wider distributions than those that are obtained by permutation.
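One widely used network-constrained randomization preserves every node's degree exactly by repeatedly swapping the endpoints of randomly chosen edge pairs. A simplified sketch (toy code, ignoring edge weights and the question of how many swaps are needed for good mixing):

```python
import random

random.seed(5)

def double_edge_swap(edges, n_swaps=100):
    """Degree-preserving randomization: repeatedly pick two edges
    (u, v) and (x, y) and rewire them to (u, y) and (x, v), rejecting
    swaps that would create self-loops or duplicate edges."""
    edges = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = random.sample(range(len(edges)), 2)
        (u, v), (x, y) = edges[i], edges[j]
        if len({u, v, x, y}) < 4:
            continue                      # would create a self-loop
        new1, new2 = frozenset((u, y)), frozenset((x, v))
        if new1 in edge_set or new2 in edge_set:
            continue                      # would create a duplicate edge
        edge_set -= {frozenset((u, v)), frozenset((x, y))}
        edge_set |= {new1, new2}
        edges[i], edges[j] = (u, y), (x, v)
    return edges

ring = [(i, (i + 1) % 8) for i in range(8)]   # toy 8-node ring network
rewired = double_edge_swap(ring)
```

Every node in `rewired` has exactly its original degree, but which pairs of nodes are connected has been randomized; this is the constraint structure behind configuration-model-style reference distributions.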
We cannot emphasize enough the importance of choosing a biologically appropriate test statistic. The data, network structure, or properties of the test statistic may constrain the choice of a test statistic. Thinking carefully about the biological meaning of the test statistic that is being compared between observed and reference data will determine whether or not the biological question can be answered. Researchers might be familiar with particular network measures (e.g., degree, strength, betweenness, density, modularity) and use only those to answer all their questions about network structure. However, not all measures are appropriate for answering every research question, and each measure has a different biological meaning that can depend also on the network structure (Brent, 2015; Farine and Whitehead, 2015; Silk et al., 2017a; Sosa et al., 2020; Wey et al., 2008). Therefore, it is important to think very carefully about the biological meaning of the test statistic. Understanding the biological meaning of the test statistic will prevent testing too many measures. The more test statistics one measures, the more hypotheses are being tested and so the greater the need to account for multiple testing (to prevent false positive errors). For example, a researcher might be interested in uncovering the centrality of individuals in a network and would like to use degree, strength, and betweenness. If the network is very dense, it is likely that these three measures are highly correlated with one another, and so it would not be informative to examine all three (Farine and Whitehead, 2015; Silk et al., 2017a). An additional pitfall is that for some research questions, the randomization procedure can affect the test statistic in unexpected ways, especially if comparing networks of different sizes.
There might be ways to adjust a test statistic, but such adjustments can lead to subtle changes in the research question being asked and therefore to new inferences (see Box 4 and the Supplementary Material).
Box 4: Potential pitfalls when choosing a test statistic - an example.
When comparing measures from networks of different sizes, one should be cautious about the test statistic that is being compared because results can differ substantially depending on whether (and how) network size is accounted for. For example, to determine whether two groups of different sizes differ in their social network structure, one might hypothesize that bigger groups contain more “bridge” individuals than smaller groups. In this case an appropriate test statistic would be betweenness centrality. Betweenness centrality counts, for each node, the number of shortest paths between node pairs that it occurs on. Thus, as the number of nodes in the network increases, there will be more node pairs to consider and betweenness will increase on average. To account for the effect of network size, it is common to normalize network measures by the number of nodes or, as in this case, the number of node-pairs (we prefer the latter, as it is the number of node-pairs in the network that is most directly correlated with the number of potential shortest routes in the network).

In Fig. 4, we show the complementary cumulative distribution function (CCDF) of betweenness centrality values in two burbil association networks generated using the spatial and social rules (see Supplementary Material). The “small” network contains 130 burbils connected by 1898 edges (blue), and the “big” network contains 297 burbils connected by 3905 edges (orange). As expected, the raw betweenness centrality values (Fig. 4a) are larger in the big network because there are more pairs of nodes to consider (each node has more opportunities to be part of multiple shortest paths). However, when normalizing betweenness by the number of pairs in the network (Fig. 4b), nodes in the smaller network tend to be more important than in large networks (i.e., have greater normalized betweenness centrality) because there are fewer options of shortest paths between pairs of nodes in the smaller network.
Thus, when choosing a test statistic, one needs to consider how the randomization procedure will affect it in order to determine the most appropriate way to normalize for network size.

A related pitfall when comparing networks of different sizes is that the most appropriate normalization approach can depend on the behavioural rules that generate the network. We show in the supplementary material how the generative process that underlies the network can impact the ability to compare networks of different sizes. We compare the mean degree of burbils in huddling networks in two different-sized groups in two different seasons to test whether they are similar. Within each season, the same rules underlie network structure in both groups, but a different procedure underlies the formation of the network in each season: a random graph in one and a small-world process in the other. When comparing the mean degree of two networks of different sizes, a sensible normalization is to divide raw degree values by the number of individuals in the group minus one (i.e., the number of individuals it is possible to be connected to). However, the outcome of doing this depends on whether the network is generated as a random graph or by a small-world process. In the former, the normalized mean degree is much more similar between the two groups than the mean of the raw degree values. However, when we do the same for a small-world network, the mean of the raw degrees is similar, while the mean of the normalized degree values is very different. This example highlights the challenges in testing the similarity of different-sized networks without knowledge of the behaviour that generated them. A similar caveat applies when using resampling-based reference models to compare networks of different sizes.
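As an illustrative sketch of this normalization issue (in Python with networkx; the star networks here are hypothetical stand-ins, not the burbil data), the hub of a star network has a raw betweenness that grows with network size, while its pair-normalized value does not:

```python
import networkx as nx

def betweenness_comparison(G):
    """Return raw and normalized betweenness for each node.

    Raw betweenness counts shortest paths through each node; networkx's
    normalized version divides by (n-1)(n-2)/2, the number of node pairs
    that exclude the focal node, which makes values comparable across
    networks of different sizes.
    """
    raw = nx.betweenness_centrality(G, normalized=False)
    norm = nx.betweenness_centrality(G, normalized=True)
    return raw, norm

# Two star networks of different sizes: every shortest path between two
# leaves passes through the hub.
small, big = nx.star_graph(10), nx.star_graph(50)
raw_s, norm_s = betweenness_comparison(small)
raw_b, norm_b = betweenness_comparison(big)
print(raw_s[0], norm_s[0])  # hub of the small star: 45 leaf pairs
print(raw_b[0], norm_b[0])  # hub of the big star: larger raw, same normalized
```

The raw hub values differ by more than an order of magnitude, yet both hubs are equally "central" once the number of node pairs is accounted for.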
Figure 4:
Comparison of betweenness centrality values in a big (orange points, 297 nodes) and small (blue points, 130 nodes) burbil network generated with the same social and spatial rules: (left) raw betweenness centrality, defined as $bc(v) = \sum_{s,t} \frac{\sigma_{st}(v)}{\sigma_{st}}$, where $\sigma_{st}$ is the number of shortest paths between two nodes $s$ and $t$, and $\sigma_{st}(v)$ is the number of shortest paths between $s$ and $t$ that pass through $v$; (right) normalized betweenness centrality, in which values are divided by the number of node pairs, that is $bc_{norm}(v) = \binom{n}{2}^{-1} \sum_{s,t} \frac{\sigma_{st}(v)}{\sigma_{st}}$.

The process of generating the reference distribution holds a number of potential pitfalls for the unwary. First, the reference model does not always sample the full parameter space. There might be values that will never appear in the reference distribution because of the structure of the data or the algorithm of the randomization. Under-representation of values in the reference distribution might be important to maintain, but could also be an unwanted side-product that could be resolved by using a different randomization procedure, as we explain in Section 4.1. We provide an example of how sampling from different distributions yields different ranges of values in Box 4. Second, the parameter space needs to be sampled in an unbiased manner. When generating a reference distribution, certain values might be over- or under-represented if the procedure used to generate the model does not explore the entire parameter space or explores it naively. Ideally, the randomization procedure will produce a reference distribution in which values are uniformly distributed or follow a distribution that is appropriate for the network structure. It is important to understand the constraints of the randomization procedure that is being used to determine whether such biased distributions may emerge.
We provide a detailed example in Section 5.2. Third, generating a reference distribution can be computationally intensive, to the point that it is not feasible to generate a large enough reference distribution. We offer a range of approaches, some of which (like sampling from distributions, Section 7) are less computationally intensive than others (such as permutations, Section 5). If computational constraints influence the choice of methods, it is important to carefully evaluate what concessions are being made regarding the ability of the randomization procedure to answer the biological question. For example, when using permutations, conducting too few swaps can lead to problems with statistical inference (see Section 5).
Many of the general pitfalls identified here can be detected by reflecting carefully on the approach used. This step can help identify further potential pitfalls. One important point to consider is whether the findings are driven by the approach itself rather than the approach answering the question that was asked (as discussed in Section 3). A second potential pitfall is that agreement between observed data and reference model outcomes does not necessarily imply similar causality. If the observed data are similar to the randomized data, this does not necessarily mean that the algorithm underlying the randomization is the same as the biological process that underlies the observed network; with a close match, the algorithm is a plausible generating mechanism for the observed patterns, but must be tested further. For example, many observed social networks are characterized by a heavy-tailed degree distribution, such that the network has a few individuals with much higher degree than the rest of the individuals, i.e., they can be considered hubs. Often, researchers model the heavy-tailed degree distribution of such networks as a power law, in which the frequency of nodes with a certain degree $k$ is proportional to $k^{-\alpha}$. The algorithm of degree-based preferential attachment (i.e., the Barabási-Albert model; Barabási and Albert (1999)) yields a network with a power-law degree distribution, but so do other algorithms (e.g., the "copy model"; Kleinberg et al. (1999)). It is therefore clear that inferring the process by which a network arrives at a power-law degree distribution cannot rely uniquely on agreement with the emergent structure itself. We provide further examples of this pitfall in Section 8.

Finally, not all network analysis requires the use of reference models (see also Section 3).
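A minimal sketch of this point (Python/networkx; the parameters are illustrative): a preferential-attachment network develops hubs that a random network with the same number of edges lacks, yet observing the heavy tail alone does not reveal which process generated it.

```python
import networkx as nx

# Generate a Barabási-Albert network and an Erdős-Rényi random network
# with the same number of nodes and edges, then compare their largest
# degrees. Preferential attachment produces hubs (a heavy tail), but a
# matching tail would not, by itself, identify the generative process.
n, m = 1000, 2
ba = nx.barabasi_albert_graph(n, m, seed=1)
er = nx.gnm_random_graph(n, ba.number_of_edges(), seed=1)

max_ba = max(d for _, d in ba.degree())
max_er = max(d for _, d in er.degree())
print(max_ba, max_er)  # the BA hub degree is typically much larger
```

Running the same comparison against a network from the "copy model" would show a similarly heavy tail, which is exactly why the emergent structure alone cannot pin down the mechanism.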
While the use of reference models is often necessary when analyzing features of individuals that are linked to others in a network, because of the dependency between individuals, there are questions and methods that do not require the use of reference models. For example, one might use network measures to characterize many groups in a society. Researchers might want to ask if a network measure, for example density, increases with the size of the group. In this case a simple correlation between group size and density would address the research question. If, however, the researchers are interested in the process that underlies the relationship between group size and network density, they might use generative models (Section 8) or sample from distributions (Section 7) to produce groups of different sizes using different engagement rules. Note, however, that the second approach addresses the question 'what are the underlying causes of the observed relationship between group size and density?' rather than answering the original research question, 'is there a relationship between group size and density?'

Permutation-based reference models take observed data and shuffle it to produce reference datasets (Good, 2013). The resulting reference models preserve certain attributes of the observed dataset, such as distributions of key network measures or features of the raw data, such as group size. Because data are shuffled and observations are swapped, new values are not necessarily introduced in the reference models (although new values of some measures can be calculated). In their simplest form, permutation-based methods randomize a single feature of the observed data while preserving all other observed features. Statistically, this approach breaks correlations that are shaped by the permuted feature.
Permutations can be applied either to the network structure itself (e.g., nodes and edges, or features of them) or to the raw data that underlies the network structure (e.g., movement data, group membership, etc.).
Permutations can be used on both node features and edge features. In both cases, these permutations involve swapping attributes among either the nodes or the edges. Attributes can be any feature of the nodes or the edges. Common node attributes are individual identity (often referred to as the node's label), sex, body size, age, color, or other features. Attributes of edges can be the types of edges connecting two nodes, for example, different types of relationships or interactions, such as aggression and affiliation, or the direction of the edge for asymmetric relationships or for directed interactions.

Node feature permutation-based reference models swap attributes among nodes in the network. Node feature swaps preserve the structure of the observed networks, but break potential correlations between the structure of the network and node attributes. Comparing observed networks to node attribute permutation reference models allows researchers to test if the attributes of interest are associated with observed patterns of interactions or associations (for an example, see Box 2). Node feature swaps have been frequently used as reference models in social network analysis (Hamilton et al., 2019; Johnson et al., 2017; Snijders et al., 2018; Wilson-Aggarwal et al., 2019). They are most often used to test associations between measures of social network position and phenotypic traits of individuals (e.g., Ellis et al., 2017; Hamilton et al., 2019; Johnson et al., 2017; Keiser et al., 2016; Wilson-Aggarwal et al., 2019). We provide an example in the Supplementary Material in which we test the relationship between sex and out-strength in burbil dominance networks. Inference from node swap permutations can be complex if there are underlying processes (e.g., differences in sampling) that may generate patterns of interest. For example, in Box 2 we swap a node attribute, nose color, to test if burbils socially assort by nose color when they interact in an affiliative manner.
These node swap permutations show that burbil affiliative networks are indeed assorted by nose color. However, interactions can only occur when individuals are associating within the same group; therefore, without taking into account the group structure of the population in the permutation, we are unable to answer whether affiliative interactions are assorted for nose color within subgroups.

Edge feature permutation-based reference models swap attributes of the edges, leaving the node identities, node metadata, and the connections among them intact. Edge feature swaps can involve shuffling the following: (i) labels of edges - swapping one type of interaction for another, like aggression to affiliation; (ii) edge directions - swapping which individual directs a behavior to which recipient in an interaction, e.g., swapping an edge from A to B to go from B to A (De Bacco et al., 2018; Miller et al., 2017); or (iii) edge weights - swapping the values that represent the strength, frequency, or duration of interactions among individuals, such as swapping a strong relationship between A and B with a weak relationship between C and D. Note that permuting edge weights can only involve swaps between pairs with non-zero weighted edges; otherwise it would become edge rewiring, as detailed in Section 5.2. We provide an example of edge direction swaps in the Supplementary Material, where we test the hypothesis that adult burbils have higher out-strength in networks of dominance interactions than younger individuals (subadults and juveniles). We swap edge directions at random in an iterative process in which we generate a Markov Chain (see Section 5.2).

Edge feature swaps could be used on raw temporal data in edge list form if each interaction between two individuals is labeled with the time at which the interaction occurred. A possible edge label swap would be to randomize the time at which each interaction occurred (changing the time label but keeping the identities of the pairs that interacted).
If edges have further information about the type of interactions (e.g., the type of behavior, such as grooming or fighting), one could also randomize the type of interaction that occurred at each particular time, thus changing the type of interactions but keeping the individuals involved and the timing or order of the interactions the same as in the observed data. In both these examples, the edge label swaps would not lead to reference models that are different from the observed dataset if all time points or all types of social interactions are aggregated. However, network measures that are sensitive to temporal dynamics or to the type of interactions (such as multilayer measures; Finn et al., 2019; Kivelä et al., 2014) can be affected by these feature swaps.
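A node-label permutation test like the nose-color example can be sketched as follows (Python/networkx; the two-clique network and the "color" attribute are hypothetical stand-ins for the burbil data, and the test shown is one-tailed for positive assortment):

```python
import random
import networkx as nx

def node_label_permutation_test(G, attr, n_perm=1000, seed=42):
    """Node-feature permutation test: shuffle a node attribute among
    nodes (preserving network structure) and compare the observed
    assortativity coefficient to the reference distribution."""
    rng = random.Random(seed)
    observed = nx.attribute_assortativity_coefficient(G, attr)
    nodes = list(G.nodes())
    values = [G.nodes[v][attr] for v in nodes]
    count = 0
    for _ in range(n_perm):
        rng.shuffle(values)  # one node-label swap iteration
        nx.set_node_attributes(G, dict(zip(nodes, values)), attr)
        if nx.attribute_assortativity_coefficient(G, attr) >= observed:
            count += 1
    p = (count + 1) / (n_perm + 1)  # randomization p-value
    return observed, p

# Hypothetical example: two single-color cliques joined by one edge,
# so the network is strongly assorted by "color".
G = nx.union(nx.complete_graph(5), nx.complete_graph(5), rename=("a", "b"))
G.add_edge("a0", "b0")
nx.set_node_attributes(G, {v: v[0] for v in G}, "color")
obs, p = node_label_permutation_test(G, "color")
print(round(obs, 2), p)
```

Note that the test permutes only the attribute; all edges, and therefore all structural measures, are identical across the reference networks, which is exactly what makes this a node feature swap rather than a rewiring.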
Edge rewiring involves swapping the edges that represent interactions or associations in raw data streams, or swapping edges that connect nodes in a network in an adjacency matrix. For example, edge rewiring may swap the edges ab and cd to replace them with edges ad and cb. Edge rewiring results in what is known in network science as the configuration model (Bollobás, 1980). The configuration model is a graph which is sampled uniformly from all graphs of a given degree sequence (with some key technicalities). The degree sequence is the list of all observed degrees in a network, which can be summarized as a degree distribution. Configuration models require appropriate care when making decisions about the specifications of the underlying model (Fosdick et al., 2018). Like edge feature swaps, edge rewiring breaks correlations between the node metadata and the structure of the network to test whether the observed edge arrangement leads to a network structure that is different from a structure that would be achieved by chance, while preserving group size and the metadata of nodes. Like edge feature swaps, edge rewiring can be conducted at different stages, from modifying the raw data (in what are often known as datastream permutations; Farine, 2017) to modifying the group's network structure directly by manipulating the adjacency matrix. In general, rewiring models form what mathematicians call a Markov Chain, such that drawing samples by rewiring is equivalent to sampling from a distribution of networks by Markov chain Monte Carlo (MCMC) (Fosdick et al., 2018).

Edge rewiring on raw data is often used in animal social network analysis (datastream permutations). Importantly, however, when datastream permutations are used, the configuration model generated is related to the current format of the data rather than to the projected social network that is subsequently analysed.
Biologists often use a rewiring approach for association data in group-by-individual matrices, also known as gambit-of-the-group data formats (e.g., Bejder et al., 1998; Brandl et al., 2019; Croft et al., 2006, 2005; Poirier and Festa-Bianchet, 2018; Zeus et al., 2018). In this data format, each individual is recorded as present in a particular group, and "group" is often defined as an aggregation of animals that are present at the same time and the same place (Franks et al., 2010; Whitehead and Dufault, 1999). This data format is a bipartite network with edges that connect individuals to the groups they were observed in; the reference model is therefore a bipartite version of the configuration model that respects the bipartition. When such datastream permutations are applied to group-by-individual matrices, the edge rewiring step takes place on this bipartite network rather than on the projected social network that is created subsequently. Similarly, when edge rewiring is used for raw data on behavioural interactions (e.g., Miller et al., 2017; Webber et al., 2016), it is the multigraph that contains all interactions (i.e., a network with multiple rather than weighted edges between nodes) that is rewired, while the network analysed is subsequently treated as a weighted network (with single edges between nodes).

Elaborate rewiring procedures can be used to impose both network and non-network constraints. For example, researchers may constrain rewiring to only swap individuals between groups that occur in the same location or on the same day (non-network constraints). Researchers may further want to impose network constraints, such as forcing the rewired reference models to preserve the degree distribution of the observed network.
The R package igraph (Csardi et al., 2006) can rewire social networks while maintaining a fixed degree sequence; Chodrow (2019) shows how to preserve both event size (the number of individuals in each grouping event) and the degree of each individual when using datastream permutations to analyse data on animal groups (or equivalent bipartite networks in other fields); and Farine and Carter (2020) propose a double permutation test to help avoid elevated type I errors. Another example of an elaborate rewiring procedure is disconnecting either just one or both end(s) of an edge and re-connecting it to a new individual (or individuals) (e.g., Formica et al., 2016; Hobson and DeDeo, 2015; Hobson et al., 2018). For example, an edge connecting A to B can be disconnected from B and rewired to now connect A to C. This kind of rewiring results in some changes to both the dyadic relationships between individuals and the network structure, but preserves other features of the networks, such as eigenvector centrality, and can be used to generate reference datasets that are consistent with a desired network constraint (Hobson and DeDeo, 2015; Hobson et al., 2018). This edge rewiring procedure is different from the configuration model, as it does not generally preserve the degree sequence. If the network is directed, this type of rewiring can be used to preserve the sequence of out-degrees, but not in-degrees (or vice versa). As the complexity of the rewiring procedures and the constraints imposed on them increase, these rewiring procedures become more similar to generative models, which we detail in Section 8.

We provide examples of datastream permutations for both association and interaction data in the Supplementary Material. For associations, we generate two reference distributions to test the hypotheses that burbil associations are non-random, with or without accounting for assortativity by nose color.
Our permutations conduct edge rewiring in the group-by-individual matrix, and in both reference models we constrain swaps to occur within the same burbil group and to be between two sub-groups observed on the same day. The first reference model is naive, as we already know the burbils are assorted by nose color (Box 2). However, when we additionally constrain swaps so that edges can only be rewired between burbils with the same nose color, we see that association patterns are random within a burbil group. This example demonstrates the potential power of using multiple reference models in concert. For interactions, we ask what explains burbil affiliative interactions. Using edge rewiring in the raw interaction data, we find that there is no evidence for assortativity by nose color when controlling for subgroup membership. We show this by rewiring interactions within each subgroup so that the nose color of each dyad is randomized. Affiliative interactions are assorted by nose colour only because each subgroup tends to be dominated by one nose color or the other (rather than being an unbiased sample of individuals in the group).
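Degree-preserving edge rewiring of the kind described above can be sketched with networkx's double_edge_swap, which exchanges the endpoints of two edges (ab, cd becomes ad, cb) so that every node keeps its degree (Python; the network here is an illustrative stand-in, not the burbil data):

```python
import networkx as nx

# Observed network (hypothetical) and its degree sequence.
G = nx.barabasi_albert_graph(100, 3, seed=7)
observed_degrees = sorted(d for _, d in G.degree())

# Rewire a copy: each accepted swap replaces edges ab, cd with ad, cb,
# rejecting proposals that would create self-loops or duplicate edges.
R = G.copy()
nx.double_edge_swap(R, nswap=500, max_tries=50000, seed=7)

# The degree sequence is preserved exactly, but the wiring (and therefore
# measures such as clustering) can differ from the observed network.
assert sorted(d for _, d in R.degree()) == observed_degrees
print(nx.transitivity(G), nx.transitivity(R))
```

Repeating the rewiring many times and recomputing a test statistic on each rewired copy yields the reference distribution against which the observed value is compared.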
For permutation-based methods, a first major potential pitfall to watch for is failing to impose the correct constraints on swaps. In feature swaps (conducted on the adjacency matrix itself), it may not always be possible to constrain swaps as desired. For example, swaps can be constrained to only occur between individuals recorded at the same location (e.g., Shizuka et al., 2014), in the same group (e.g., Ellis et al., 2017), or alive at the same time (e.g., Shizuka and Johnson, 2020). However, it can be challenging to incorporate some other non-network constraints, as we show in our example that tests for assortment by nose color in the burbil network of affiliative interactions; there is not a natural way to restrict swaps on the adjacency matrix to account for this issue. For reference models generated by edge rewiring, it is critical to consider both the non-network and network constraints, because decisions about which constraints to build into the rewiring procedure affect the resulting configuration model. In most common animal social network rewiring methods, researchers control for unwanted structure via non-network constraints (e.g., sampling biases, differences in gregariousness, etc.). It is less common for researchers to consider network constraints, such as forcing the rewired networks to conform to a particular degree distribution. However, without network constraints, the reference model will approach a random network as the number of rewiring steps increases, which can result in misleading, false positive inference (Weiss et al., 2020). Chodrow (2019) shows how one can preserve both the size of interactions (number of animals in each interaction) and the degree of each individual to produce a permutation of the datastream that preserves the degree distribution.

A second pitfall of using permutation-based reference models is the computational limitations and potential for biased sampling.
Permutation-based approaches are often computationally intensive (e.g., as seen when running the code of the Supplementary Material). Computational constraints can be exacerbated when increasing the number of constraints on the randomization (network or non-network), especially when using an MCMC approach, because many of the attempted swaps will be rejected. In some cases, over-specifying constraints on the randomization can result in a configuration model with insufficient acceptable states, making it impossible to generate a reference model, especially when examining small networks. Furthermore, it is important to sample from the configuration model in an unbiased manner. This pitfall is especially likely when sampling from a distribution of networks by MCMC, as is often done in edge rewiring approaches (Fig. 5). When a swap is rejected (i.e., a suggested swap is not possible within the set of network and non-network constraints imposed), it is important to resample the current reference network as the next iteration of the Markov Chain (Krause et al., 2009). If such resampling is not done, then the configuration model will be sampled in a biased way (Fig. 5), which could lead to errors in inference. Such rejection of swaps will arise more frequently when there are more constraints imposed on the permutations, and then other potential pitfalls arise: the Markov Chain will a) take longer to become stationary and b) be slower to mix, which could lead to further errors in inference.

Figure 5: An illustration of how incorrect use of MCMC methods can lead to biased sampling from the configuration model when using datastream permutations. When permuting a bipartite group-by-individual network there are 11 possible configurations, depicted under (D). Of these possibilities, 5 (colored in yellow and orange) are acceptable because they do not contain double edges (seen in the green and blue possibilities as a thick edge). Double edges indicate that the same individual occurred in the same grouping event twice, which is impossible. In (A) we show the "graph of graphs", or the Markov chain. (B) is the distribution of samples obtained when permutations are conducted and every state, including those that are impossible (green and blue), is accepted. (C) is the distribution of samples obtained when rejecting swaps that result in double edges and then rewiring a randomized network. Note that a sampling bias arises here, with the orange state oversampled because it has more routes to other acceptable states, as seen in (A). (D) is the distribution of the samples obtained when swaps that make double edges are resampled (i.e., the correct unbiased sampling approach). Note that in (D) the sampling of the five acceptable states is uniform, as it should be.
Addressing this pitfall requires a burn-in period, during which the permuted networks are not used in the reference distribution, and a thinning interval, which equates to permuted networks only being saved as reference datasets after a set number of iterations in the Markov Chain (e.g., every 10th iteration). We provide examples of these in the Supplementary Material.

A third potential pitfall of permutation-based approaches is that they are often prone to having unanticipated effects on network structure, especially when permutations are conducted on the raw datastream. Consequently, failing to properly reflect is a particularly important pitfall for permutation-based approaches. For example, a completely uniform random rewiring might make the data too 'unrealistic' or mean that the distribution of the response variable is changed considerably (Weiss et al., 2020). Edge rewiring of the group-by-individual matrices typically alters degree and edge weight distributions, which can lead to false positive errors because the reference model does not address the question originally asked. Incorporating constraints on the rewiring possibilities (e.g., Chodrow (2019)) could help resolve this problem.
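The resample-on-rejection, burn-in, and thinning details can be sketched as follows (Python/networkx; a hypothetical minimal chain on the adjacency matrix rather than a datastream permutation):

```python
import random
import networkx as nx

def mcmc_reference_sample(G, n_samples=100, burn_in=1000, thin=10, seed=0):
    """Sample reference networks by MCMC edge rewiring.

    When a proposed swap is rejected, the *current* network is kept as
    the next state of the chain (resampling); states during burn_in are
    discarded; only every `thin`-th state afterwards is saved.
    """
    rng = random.Random(seed)
    R = G.copy()
    samples, step = [], 0
    while len(samples) < n_samples:
        (a, b), (c, d) = rng.sample(list(R.edges()), 2)
        # Propose swapping ab, cd -> ad, cb; reject if it would create
        # a self-loop or a double edge. On rejection, R is unchanged and
        # still counts as the next state of the chain.
        if len({a, b, c, d}) == 4 and not R.has_edge(a, d) and not R.has_edge(c, b):
            R.remove_edges_from([(a, b), (c, d)])
            R.add_edges_from([(a, d), (c, b)])
        step += 1
        if step > burn_in and step % thin == 0:
            samples.append(R.copy())
    return samples

G = nx.gnm_random_graph(30, 60, seed=3)
refs = mcmc_reference_sample(G, n_samples=20)
# Every reference network preserves the degree sequence of the observed one.
deg = sorted(d for _, d in G.degree())
assert all(sorted(d for _, d in R.degree()) == deg for R in refs)
print(len(refs))
```

Appropriate values for burn_in and thin depend on how quickly the chain mixes, which in turn depends on network size and the constraints imposed; the values here are placeholders, not recommendations.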
Re-sampling of network data is a bootstrapping procedure that generates reference models which can be further from the observed data (Fig. 2) than the permutation-based methods we have discussed thus far. While generating reference models using permutations permits each observation to appear only once in the reference model (i.e., sampling without replacement), creating reference models using resampling (i.e., sampling with replacement) allows observations to appear more than once, or not at all, in each simulation iteration. This difference between the two approaches can change which features of the data are maintained and which ones are randomized. For example, if a researcher decides that an important feature of the social structure is the degree distribution, rather than the exact dyadic interactions between individuals, one can produce reference models by resampling from the observed degree sequence (i.e., the list of all observed degrees). Resampling from the degree sequence will produce reference networks with a similar degree distribution to the observed network, but the observed and reference networks might differ in the degree sequence and potentially also in the number of nodes and/or edges. One potential use of resampling-based reference models is the ability to draw reference networks of different sizes and compare them (see Box 4 for more details and caveats to using this approach). Resampling can be an effective tool when used with the raw data; however, the only network-level properties that can be sampled with replacement are the degree sequence and edge weights. Thus, a resampling approach is more specific and limited than the other approaches we present.
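Resampling a degree sequence with replacement can be sketched in a few lines (Python; the degree sequence is hypothetical):

```python
import random

def resample_degree_sequence(degrees, seed=0):
    """Bootstrap a degree sequence: sample with replacement from the
    observed degrees. Unlike a permutation (sampling without replacement),
    some observed values may appear several times and others not at all."""
    rng = random.Random(seed)
    return [rng.choice(degrees) for _ in degrees]

observed = [1, 1, 2, 2, 3, 3, 4, 8]  # hypothetical degree sequence
boot = resample_degree_sequence(observed)
print(sorted(boot))
```

The bootstrapped sequence has the same length and draws from the same values, but its exact composition, and hence the number of edges in any network built from it, can differ from the observed sequence.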
An important utility of the resampling approach in behavioral studies is to resample the raw data that is the foundation of the network, rather than the network itself. For example, researchers of animal social networks often use the spatial positions of animals to infer interactions from co-localization of individuals (two individuals being in the same place at the same time; e.g., Mersch et al., 2013; Pinter-Wollman et al., 2011; Robitaille et al., 2018; Schlägel et al., 2019). A raw data resampling procedure could sample with replacement individuals' locations from the observed locations, thus preserving the physical constraints on these locations. This approach restricts the sampling to biologically feasible locations so that, for example, a terrestrial animal could not be resampled in the middle of a lake. We provide an example in the Supplementary Material of resampling the foraging location of burbil subgroups separately for each of the 16 groups in our main study population. The resulting reference models maintain the observed subgroup memberships, and locations are only sampled from within each group's home range.

The way in which the data are resampled could have a large influence on the reference model. For example, restricting the resampling of locations of particular individuals to only their own set of locations (e.g., Spiegel et al. (2016)) will maintain home range sizes and average travel distances, and therefore it might maintain the number and identity of individuals that each individual interacts with. Such a resampling procedure is more likely to result in reference models that are closer to the observed network structure, especially if non-network rather than network factors are important in generating this structure. Conversely, if individuals preferentially seek out conspecifics to interact with, then not having network constraints in the resampling procedure means that the resampling will break the temporal overlap between interacting individuals.
Consequently, well-designed resampling of locations can be useful for teasing apart non-network and network explanations for social network structure (Spiegel et al., 2016). Alternatively, one could allow resampling of an individual's position from all observed positions of all individuals in the population. Such a resampling approach would require that it is biologically feasible for animals to move from one position to any other location in which animals were observed. Resampling that breaks the link between the identity of an individual and its movement patterns can produce reference models that differ considerably from the observed networks, for example, in the number of interactions among individuals. These reference models could be used to test the relative importance of non-network factors that may drive interactions.
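A location-resampling procedure of this kind can be sketched as follows (Python; the movement data and the within_individual switch are hypothetical illustrations of the two designs discussed above):

```python
import random

def resample_locations(tracks, within_individual=True, seed=0):
    """Resample (with replacement) observed locations from a dict
    {individual: [(x, y), ...]} of hypothetical movement data.

    within_individual=True restricts each animal to its own observed
    locations (preserving home ranges, in the spirit of Spiegel et al.
    2016); False draws from the pooled locations of all individuals,
    breaking the link between identity and movement pattern.
    """
    rng = random.Random(seed)
    pooled = [p for pts in tracks.values() for p in pts]
    resampled = {}
    for ind, pts in tracks.items():
        source = pts if within_individual else pooled
        resampled[ind] = [rng.choice(source) for _ in pts]
    return resampled

tracks = {"A": [(0, 0), (0, 1), (1, 1)], "B": [(5, 5), (5, 6)]}
own = resample_locations(tracks, within_individual=True)
anywhere = resample_locations(tracks, within_individual=False)
print(own["A"], anywhere["A"])
```

Co-localizations would then be re-inferred from each resampled set of positions to build a reference network; only observed positions are ever drawn, so every resampled location remains biologically feasible.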
The first important pitfall to watch for when resampling network data is that certain re-sampled degree sequences cannot produce a network because they include too many edges or too many nodes. For example, if the sum of all degrees in a network ends up being an odd number after resampling the degree sequence of an unweighted network, a network cannot be generated. Next, when resampling the raw data that underlies the network, it is important to make sure that the resulting network is biologically feasible. For example, resampling of spatial locations could allow an individual to interact simultaneously with two individuals that are on opposite ends of the study site if not conducted with appropriate caution.

Finally, pitfalls of resampling-based approaches also include over- or under-sampling certain values and deviating from the observed network in unexpected ways. Such biased sampling is likely to be a particular issue for small networks, in which the observed degree sequence represents a small sample from the degree distribution. For example, resampling from small degree sequences could lead to repeated sampling of a particular degree value that is an outlier in the observed degree sequence. Alternatively, rare values of degree might be omitted in the reference model, leading to substantial changes in certain network measures. These biases could result in very broad or even multimodal reference distributions in some contexts, and potentially cause problems with inference. Test statistics that are based on edge strength could be highly impacted by resampling from the degree sequence, especially if the observed strength distribution is skewed. For example, resampling could alter the strength distribution of the reference network by omitting the tail of the distribution. The effects of resampling on different types of measures could become part of the research question if thought through carefully; otherwise resampling risks leading to erroneous inferences.
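The first pitfall can be checked directly: a degree sequence is only realizable as a simple network if it satisfies the Erdős–Gallai conditions, which networkx exposes as is_graphical (Python sketch; the sequences are illustrative):

```python
import networkx as nx

# A resampled degree sequence can be invalid: if the degrees sum to an
# odd number, no unweighted network realizes it (each edge contributes
# two degree units). nx.is_graphical applies the full Erdős-Gallai test.
valid = [2, 2, 2, 1, 1]   # sum = 8 (even); a triangle plus one edge
invalid = [3, 1, 1]       # sum = 5 (odd): cannot form a network
print(nx.is_graphical(valid), nx.is_graphical(invalid))
```

In a resampling pipeline, sequences that fail this check can simply be discarded and redrawn, though doing so conditions the reference distribution on graphical sequences, which is itself worth reflecting on.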
Reference models can emerge from general processes that shape a network rather than from the data itself. One can generalize the features of the observed network, as we detail in this section, or the processes that underlie the formation of the network, as we discuss in Section 8. In Section 6 we discussed resampling from the observed data; a further generalization of this approach is to create reference models based on inferences about the probabilistic description of the observed data, such as the degree distribution. Distribution-based approaches can result in reference datasets that diverge from some of the specific characteristics of observed networks that are often preserved in permutation-based reference model approaches (such as group size, or the number of interactions), making distribution-based approaches a method for generating reference datasets that are more abstracted from observed datasets (Fig. 2).

There are a number of technical approaches for implementing distribution-based randomization. To maintain the observed degree distribution in the reference models, researchers can permute the network edges so that the reference network will have the exact same degree sequence as the observed network, but is otherwise random (as described in Section 5). Alternatively, researchers could create a reference network by resampling (with replacement) a new degree sequence from the observed degrees (as described in Section 6), or generate a network from the configuration model (as described in Section 8). Resampling from the degree sequence is equivalent to drawing random samples from an empirical degree distribution defined as

p_k = (number of nodes with degree k) / (total number of nodes).

However, if the functional form of the underlying degree distribution is unknown, it is possible to draw random samples from a fitted distribution to obtain a new degree sequence and subsequently generate a network (Fig. 6).
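The empirical distribution p_k can be written down directly, and sampling from it with those weights is equivalent to resampling the degree sequence with replacement. A minimal sketch, with a hypothetical degree sequence:

```python
import random
from collections import Counter

observed_degrees = [1, 1, 2, 2, 2, 3, 3, 4]      # hypothetical degree sequence
n = len(observed_degrees)

# p_k = (number of nodes with degree k) / (total number of nodes)
p_k = {k: count / n for k, count in Counter(observed_degrees).items()}

# Resampling with replacement is the same as sampling k with probability p_k
rng = random.Random(42)
reference_degrees = rng.choices(list(p_k), weights=list(p_k.values()), k=n)
```

Because the draws come only from degrees that were actually observed, this version of the approach cannot introduce new degree values.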
For example, in many social networks there are right-skewed degree distributions in which most individuals have few interactions and few individuals have many interactions. Such a degree distribution often fits a geometric distribution. Therefore, if researchers are interested in maintaining the shape of the distribution, but not necessarily the exact number of times each degree was observed, then reference models can be generated by resampling from a geometric distribution that has the same parameters as the observed data. Sampling from a fitted distribution can result in sampling nodes with degree k that were not present in the observed network, unlike the resampling approach detailed in Section 6. Sampling from a fitted distribution imposes fewer restrictions on the reference model, which can have both statistical and computational advantages.

Drawing from a distribution can be thought of as sampling from a 'smoothed' version of the observed network. The biggest challenge is to find an appropriate statistical model for the fit.

Figure 6: Drawing random degree sequences from the distribution-based model: (a) histogram of the degree sequence of the network shown in the inset and a fitted lognormal distribution (red line); (b) random samples of different sizes (100, 200, 500, 1000 randomization iterations) drawn from the fitted lognormal distribution (orange) and by resampling the original degree sequence (gray). Network visualization was done in Gephi (Bastian et al., 2009) with force atlas, a force-directed layout. Node color and size correspond to degree.

In many cases, finding an appropriate model can be done by fitting a parametric distribution to the data (for example, using maximum likelihood estimation) and drawing random samples from that distribution (e.g., Rozins et al., 2018). It is often more convenient to fit continuous distributions, even when describing a discrete behavior, and one should be conscious of the implications of various rounding procedures used to turn the sample into whole numbers (Clauset et al., 2009). In some cases there are efficient stochastic processes that can be used for the distribution-based randomization approach. For example, to generate a network with the same degree distribution as the observed network, researchers can use the Chung-Lu model, which draws an edge between every pair of nodes i, j with probability proportional to k_i · k_j, where k_i and k_j are the degrees of nodes i and j, respectively. This process can generate networks containing degrees that were not present in the observed network, despite having degree distributions similar to the observed network.

Distribution-based models can offer flexibility and robustness. They are especially useful when other randomization procedures result in too few unique reference networks that satisfy all the randomization constraints, i.e., when there are not enough unique random samples to compare against the observed network (e.g., in small networks; see Section 5.3).
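As an illustrative sketch with hypothetical data, the code below fits a geometric distribution by maximum likelihood (for support {1, 2, ...} the MLE is simply p = 1/mean), draws a new degree sequence from it, and uses the draws as expected degrees in a hand-rolled Chung-Lu-style model with edge probabilities capped at 1:

```python
import math
import random

def fit_geometric(degrees):
    """MLE for a geometric distribution on {1, 2, ...}: p = 1 / mean."""
    return len(degrees) / sum(degrees)

def sample_geometric(p, n, rng):
    """Inverse-CDF sampling of n geometric draws (support starts at 1)."""
    return [max(1, math.ceil(math.log(1.0 - rng.random()) / math.log(1.0 - p)))
            for _ in range(n)]

def chung_lu(expected_degrees, rng):
    """Link i and j with probability min(1, k_i * k_j / sum(k))."""
    total = sum(expected_degrees)
    n = len(expected_degrees)
    return {(i, j)
            for i in range(n) for j in range(i + 1, n)
            if rng.random() < min(1.0,
                                  expected_degrees[i] * expected_degrees[j] / total)}

rng = random.Random(3)
observed_degrees = [4, 3, 2, 2, 2, 1, 1, 1]      # hypothetical, right-skewed
p_hat = fit_geometric(observed_degrees)
reference_degrees = sample_geometric(p_hat, len(observed_degrees), rng)
reference_edges = chung_lu(reference_degrees, rng)
```

Note that in the Chung-Lu construction each node's degree only matches its expected value on average, which is exactly the "smoothed" behavior described above.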
Furthermore, the inferences from a distribution-based randomization approach emerge from the statistical features of the observed data and therefore may uncover inherent patterns in the underlying social processes. However, selecting appropriate distribution-based reference models can also come with challenges, which we outline below.

Key pitfalls for distribution-based reference models

An important potential pitfall when sampling from a distribution is failing to fit the correct distribution to the observed data and therefore simulating a reference dataset that differs from the observed one in key parameters. For example, a uniform random network has a Poisson degree distribution. However, many real-world social networks have overdispersed (right-skewed) degree distributions (e.g., Rozins et al., 2018), and failing to account for this overdispersion in a distribution-based reference model will lead to errors in inference.

A second potential pitfall arises when sampling independently from two distributions that co-vary. For example, consider a theoretical distribution-based reference model that preserves both the degree distribution and the distribution of clustering coefficients of an observed network. The clustering coefficient of a node measures the fraction of pairs of neighbors of that node which share a link. This quantity tends to co-vary with degree, often in a negative direction, especially in networks with an assortative community structure (e.g., Fig. 7). The negative relationship between degree and clustering coefficient emerges because high-degree nodes tend to connect different communities, and therefore their friends are not tightly connected to each other because they belong to many different communities. In Fig. 7 we show a burbil association network with an assortative community structure in which node size corresponds to degree and node color corresponds to clustering coefficient.
If a researcher ignored correlations between degree and clustering coefficient and sampled two sequences of numbers independently from the distributions in Fig. 7(a) and (b), respectively, the resulting distributions would mimic the dataset individually but not jointly. We illustrate another example of failing to account for the correlation between two distributions (degree and mean edge weight) in the Supplementary Material. For some correlations there may be easy solutions to this co-variance; for example, if degree distributions differed between two sexes, then they could be simulated separately for each sex. For other distributions (of network measures or of the raw data) it will be necessary to draw simulations from the appropriate multivariate distribution.

A third pitfall of using distribution-based reference models is that it is not (currently) possible to simulate networks with fixed distributions of many social network measures, one example being the clustering coefficient (as in the example above). For a fixed number of nodes and edges, or for a fixed degree distribution, we know how to sample a network uniformly over all networks with such properties. However, such uniform sampling can be done for very few other network properties. Researchers often use null models that do not sample uniformly from the space of all networks with a given property, but rather use null models that happen to have properties that are close to the network in question (like the generative models in Section 8). It is important to understand the difference between sampling uniformly over all networks with a given property and sampling from a set of networks that tend to have the property while also satisfying other constraints on their structure, because of the influence that these sampling methods will have on the inference process.
These potential pitfalls of generating distribution-based reference models limit the contexts in which such randomization can be applied.
Figure 7:
An illustration of covariance between two network properties in a burbil association network generated in the Supplementary Material: (a) the degree distribution of the network; (b) the distribution of the clustering coefficient, the fraction of a node's friends that are friends with each other; (c) a visualization of the network where node size corresponds to degree and node color corresponds to clustering coefficient (network visualization was done in Gephi (Bastian et al., 2009) with force atlas, a force-directed layout); (d) the correlation between clustering coefficient and degree in the network.
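The negative degree-clustering relationship in Fig. 7 is easy to reproduce on a toy graph: in the hypothetical network below, a bridging hub "h" connects two cliques, so its neighbors are mostly unconnected to each other, while a low-degree clique member has clustering 1.

```python
from itertools import combinations

# Hypothetical toy graph: two triangles bridged by a high-degree node "h".
edges = {("a", "b"), ("a", "c"), ("b", "c"),     # clique 1
         ("d", "e"), ("d", "f"), ("e", "f"),     # clique 2
         ("h", "a"), ("h", "b"), ("h", "d"), ("h", "e")}

adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def clustering(v):
    """Fraction of pairs of v's neighbours that are themselves linked."""
    nb = adj[v]
    if len(nb) < 2:
        return 0.0
    pairs = list(combinations(nb, 2))
    linked = sum(1 for x, y in pairs if y in adj[x])
    return linked / len(pairs)
```

Here the hub (degree 4) has clustering 1/3, while clique member "c" (degree 2) has clustering 1.0, so degree and clustering co-vary negatively: a reference model that sampled the two quantities independently would miss this joint structure.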
Generative reference models
Generative models produce a set of reference networks according to stochastic rules or processes that encode assumptions about how the network was formed. Thus, generative models are like recipes for creating networks from scratch. For instance, a researcher might know the behavioral rules that typically underlie the formation of interactions, and might therefore create a network-forming generative model that instantiates those rules. However, care must be taken when modeling networks using such general rules about interaction formation, because they have the potential to produce reference networks that are very different from those observed, in spite of sharing the same number of nodes, links, or other high-level features. In particular, when a generative process is fundamentally non-biological, that generative model may be a poor reference model because it differs too dramatically, and implausibly, from the observed network.

One example of a common but usually implausible reference model used in studies of animal behavior is the uniform model G(n, p) (Gilbert, 1959), also referred to as the Erdős–Rényi (ER) model. This model produces reference networks according to a simple recipe: begin with n nodes, and then place a link between each pair of nodes with probability p, independently of other pairs. While this model has the potential to create any network, it is set up to maximize entropy and uniformity, and is therefore unlikely to mimic any of the features of a network arising from animal behavior. Indeed, even animals following Brownian motion in space will encounter each other in a way that is constrained by physical distance and barriers (Pinter-Wollman, 2015); thus even random encounters are poorly captured by the uniform reference model. Another example of a common reference model is the configuration model, introduced in Section 5.
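The G(n, p) recipe described above takes only a few lines; a stdlib-only sketch:

```python
import random
from itertools import combinations

def gnp(n, p, rng):
    """Uniform (Erdős–Rényi) model: link each pair independently with prob. p."""
    return {(i, j) for i, j in combinations(range(n), 2) if rng.random() < p}

rng = random.Random(0)
reference = gnp(20, 0.1, rng)
```

Every pair is treated identically, which is exactly why this model rarely reproduces spatial or social structure.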
While the configuration model is commonly associated with the degree-preserving permutation of edges via rewiring, it is also simply a modified uniform model with more constraints: it chooses uniformly from all networks with a given degree sequence. In this way, the configuration model is a generative reference model, of which there are a large number of different variations (Fosdick et al., 2018).

There is no shortage of generative models for networks. In fact, many common statistical models of networks, which we may usually think of as models to fit to data, are generative, including exponential random graph models (ERGMs; Lusher et al., 2013; Robins et al., 2007; Snijders et al., 2006) and stochastic block models (SBMs; Bollobás, 1980; Snijders and Nowicki, 1997). Just as with other classes of reference models, generative reference models require careful consideration of the research question and hypothesis to inform the choice of the generative rules. For instance, ERGMs are dyadic models that can be used to test hypotheses about which features of dyads affect the presence or strength of edges. By including sex as an explanatory variable in an ERGM, it becomes possible for there to be differences between the likelihood of edges between female-female, female-male, and male-male dyads. We illustrate some simple examples of the use of these models in our burbil case study. In the Supplementary Material we fit an ERGM to a within-group dominance network to simultaneously test hypotheses about the role of individual traits in explaining dominance relationships, and an SBM to a population-level association network to examine how well the community structure of the association network is explained by group membership.

A further class of system-specific generative reference models comprises agent-based models (ABMs). In network analysis, ABMs can be spatially-explicit or socially-explicit.
Spatially-explicit models can help reveal the role of spatial behaviour in explaining social network structure. For example, a generative model in which the movement of individuals is constrained by the spatial organization of the environment could be used to test whether spatial constraints are sufficient to explain social structure. Researchers could further include differences in spatial behaviour between individuals within such an ABM (Pinter-Wollman, 2015). In the Supplementary Material we use a spatially-explicit agent-based model to test whether the space use of burbils can explain patterns of between-group associations. Note that when we do not include any social component in the model, our reference network is correlated with the observed network but predicts far too many between-group associations.

Socially-explicit ABMs incorporate social behaviour (e.g., interaction preferences). One example of a socially-explicit ABM in the study of animal behavior is the social inheritance model, in which offspring are likely to form connections with friends of their parents while avoiding their parents' enemies (Ilany and Akcay, 2016). While such a mechanism is highly plausible, and has indeed been supported in some social systems, such as hyenas (Ilany et al., 2020), this model requires knowledge about relatedness and historical interactions, or long-term relationships, that is not available in all study systems. In our burbil case study in the Supplementary Material we develop two socially-explicit agent-based models that build on our spatially-explicit model. The first uses knowledge about burbil sub-group size to simulate burbils moving within groups rather than independently. The reference network generated is much more similar to the observed network than the previous version, which was only spatially-explicit. We then test the hypothesis that "clan" membership (burbil groups belong to three distinct clans) can help explain patterns of between-group associations.
When we include clan membership in our ABM, the reference model produces a network that is very similar to the observed one, suggesting that clan membership can indeed explain the observed social interactions. In reality we would replicate these ABMs 1000 or more times to generate a full reference distribution; here we provide a single comparison to reduce computational run-time.

Comparing observed data with generative reference models provides insights about what processes might underlie observed interactions, and what processes might not. However, as a note of caution, it is possible to create the same types of networks with multiple generative processes: multiple recipes can generate similar patterns. Therefore, when observed data match a generative reference model, it does not necessarily mean that the modeled generative process is indeed the biological process that actually generated the observed network. Instead, it means that the modeled generative process is a plausible hypothesis that needs to be tested mechanistically.

Further, as generative models become more and more complicated, constraints on one property that is being modeled can have cascading effects on other properties. Complicated generative models with many parameters can result in one desirable property while other properties of the model remain poorly understood. Furthermore, complicated models require the specification of many parameters, which, if misspecified, can produce reference distributions that significantly differ from observations, leading to spurious conclusions. Uniform and configuration models have enjoyed much usage because their complete distributions, constraints, and correlations among their properties are well understood. However, these simple models might not encapsulate all the biological complexities a researcher might be interested in.
As we experiment with more exotic and complex generative models, which capture more realistic aspects of observed behavior, it is increasingly important to carefully check for the unintentional creation of fundamentally unrealistic patterns and behaviors in our reference models.
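As one concrete illustration of a spatially-explicit generative model (a simplified stand-in for the burbil ABM in the Supplementary Material, with made-up parameters), the sketch below lets agents wander in a bounded arena and counts a weighted association whenever two agents come within a threshold distance:

```python
import math
import random

def spatial_abm(n, radius, steps, rng):
    """Agents do a bounded random walk in the unit square; co-occurrence
    within `radius` on a step adds 1 to that dyad's association weight."""
    pos = [(rng.random(), rng.random()) for _ in range(n)]
    weights = {}
    for _ in range(steps):
        pos = [(min(1.0, max(0.0, x + rng.uniform(-0.05, 0.05))),
                min(1.0, max(0.0, y + rng.uniform(-0.05, 0.05))))
               for x, y in pos]
        for i in range(n):
            for j in range(i + 1, n):
                if math.dist(pos[i], pos[j]) < radius:
                    weights[(i, j)] = weights.get((i, j), 0) + 1
    return weights

rng = random.Random(7)
reference_associations = spatial_abm(n=10, radius=0.2, steps=50, rng=rng)
```

Replicating such a model many times, and comparing the resulting association networks with the observed one, asks whether purely spatial rules suffice to reproduce the observed structure.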
Here we provided an overview of the process and caveats of using reference models when analyzing social networks. We detailed common approaches to generating reference distributions that increase in level of abstraction with respect to the observed dataset. We highlighted the strengths and weaknesses of each approach, drawing attention to common pitfalls that can arise when using them. Our goal is to provide a guide for researchers using social network analysis for hypothesis testing in diverse study systems. We anticipate that our overview will help researchers better appreciate the similarities and differences between different analytic approaches and also encourage greater confidence in designing appropriate reference models for their research questions. Our key message is that the construction of reference models should depend closely on both the research question and the study system, and that the use of generic approaches applied without careful reflection as to their suitability can lead to incorrect inference.
Acknowledgements
This work was conducted as a part of the Null Models in Social Behavior Working Group at the NationalInstitute for Mathematical and Biological Synthesis, supported by the National Science Foundationthrough NSF Award
References
Barabási, A.-L. and Albert, R. (1999). Emergence of scaling in random networks.
Science ,286(5439):509–512.Bastian, M., Heymann, S., and Jacomy, M. (2009). Gephi: an open source software for exploring andmanipulating networks. In
Third International AAAI Conference on Weblogs and Social Media.
Bejder, L., Fletcher, D., and Bräger, S. (1998). A method for testing association patterns of social animals.
Animal Behaviour, 56(3):719–725.
Bollobás, B. (1980). A probabilistic proof of an asymptotic formula for the number of labelled regular graphs.
European Journal of Combinatorics , 1(4):311–316.Borgatti, S. P., Mehra, A., Brass, D. J., and Labianca, G. (2009). Network analysis in the social sciences.
Science , 323(5916):892–895.Brandl, H. B., Farine, D. R., Funghi, C., Schuett, W., and Griffith, S. C. (2019). Early-life socialenvironment predicts social network position in wild zebra finches.
Proceedings of the Royal Society B ,286(1897):20182579.Brent, L. J. (2015). Friends of friends: are indirect connections in social networks important to animalbehaviour?
Animal Behaviour , 103:211–222.Bruch, E. E. and Newman, M. (2019). Structure of online dating markets in us cities.
Sociological Science ,6:219–234.Butts, C. T. (2008). Social network analysis: A methodological introduction.
Asian Journal of SocialPsychology , 11(1):13–41.Chodrow, P. S. (2019). Configuration models of random hypergraphs and their applications. arXivpreprint, arXiv:1902.09302 .Clauset, A., Arbesman, S., and Larremore, D. B. (2015). Systematic inequality and hierarchy in facultyhiring networks.
Science Advances , 1(1):e1400005.Clauset, A., Shalizi, C. R., and Newman, M. E. J. (2009). Power-law distributions in empirical data.
SIAMReview , 51(4):661–703.Crabtree, S. A., Vaughn, L. J., and Crabtree, N. T. (2017). Reconstructing ancestral pueblo food webs inthe southwestern united states.
Journal of Archaeological Science , 81:116–127.Cranmer, S. J., Leifeld, P., McClurg, S. D., and Rolfe, M. (2017). Navigating the range of statistical toolsfor inferential network analysis.
American Journal of Political Science , 61(1):237–251.Croft, D., James, R., Hathaway, C., Mawdsley, D., Laland, K., and Krause, J. (2006). Social structure andco-operative interactions in a wild population of guppies (poecilia reticulata).
Behavioral Ecology andSociobiology , 59(5):644–650.Croft, D., James, R., Ward, A., Botham, M., Mawdsley, D., and Krause, J. (2005). Assortative interactionsand social networks in fish.
Oecologia , 143(2):211–219.Croft, D. P., Darden, S. K., and Wey, T. W. (2016). Current directions in animal social networks.
CurrentOpinion in Behavioral Sciences , 12:52–58.Croft, D. P., James, R., and Krause, J. (2008).
Exploring animal social networks . Princeton University Press.Croft, D. P., Madden, J. R., Franks, D. W., and James, R. (2011). Hypothesis testing in animal socialnetworks.
Trends in Ecology & Evolution , 26(10):502–507.Csardi, G., Nepusz, T., et al. (2006). The igraph software package for complex network research.
InterJournal, Complex Systems, 1695(5):1–9.
De Bacco, C., Larremore, D. B., and Moore, C. (2018). A physical model for efficient ranking in networks.
Science Advances , 4(7):eaar8260.Ellis, S., Franks, D. W., Nattrass, S., Cant, M. A., Weiss, M. N., Giles, D., Balcomb, K., and Croft, D. P.(2017). Mortality risk and social network position in resident killer whales: Sex differences and theimportance of resource abundance.
Proceedings of the Royal Society B: Biological Sciences ,284(1865):20171313.Farine, D. R. (2017). A guide to null models for animal social network analysis.
Methods in Ecology andEvolution , 8(10):1309–1320.Farine, D. R. and Carter, G. G. (2020). Permutation tests for hypothesis testing with animal social data:problems and potential solutions.
BioRxiv .Farine, D. R. and Whitehead, H. (2015). Constructing, conducting and interpreting animal social networkanalysis.
Journal of Animal Ecology , 84(5):1144–1163.Finn, K. R., Silk, M. J., Porter, M. A., and Pinter-Wollman, N. (2019). The use of multilayer networkanalysis in animal behaviour.
Animal Behaviour , 149:7 – 22.Fisher, D. N., Ilany, A., Silk, M. J., and Tregenza, T. (2017). Analysing animal social network dynamics:the potential of stochastic actor-oriented models.
Journal of Animal Ecology , 86(2):202–212.Formica, V., Wood, C., Cook, P., and Brodie III, E. (2016). Consistency of animal social networks afterdisturbance.
Behavioral Ecology , page arw128.Fosdick, B. K., Larremore, D. B., Nishimura, J., and Ugander, J. (2018). Configuring random graphmodels with fixed degree sequences.
SIAM Review , 60(2):315–355.Franks, D. W., Ruxton, G. D., and James, R. (2010). Sampling animal association networks with thegambit of the group.
Behavioral Ecology and Sociobiology , 64(3):493–503.Gauvin, L., G´enois, M., Karsai, M., Kivel¨a, M., Takaguchi, T., Valdano, E., and Vestergaard, C. L. (2018).Randomized reference models for temporal networks. arXiv preprint arXiv:1806.04032 .Gilbert, E. N. (1959). Random graphs.
The Annals of Mathematical Statistics , 30(4):1141–1144.Good, P. (2013).
Permutation tests: a practical guide to resampling methods for testing hypotheses . SpringerScience & Business Media.Hamilton, D. G., Jones, M. E., Cameron, E. Z., McCallum, H., Storfer, A., Hohenlohe, P. A., and Hamede,R. K. (2019). Rate of intersexual interactions affects injury likelihood in tasmanian devil contactnetworks.
Behavioral Ecology , 30(4):1087–1095.Hobson, E. and DeDeo, S. (2015). Social feedback and the emergence of rank in animal society.
PLoS Computational Biology, 11(9).
Hobson, E. A., Mønster, D., and DeDeo, S. (2018). Strategic heuristics underlie animal dominance hierarchies and provide evidence of group-level social knowledge. arXiv preprint, arXiv:1810.07215.
Ilany, A. and Akçay, E. (2016). Social inheritance can explain the structure of animal social networks.
Nature Communications, 7(1):1–10.
Ilany, A., Holekamp, K. E., and Akçay, E. (2020). Rank-dependent social inheritance determines social network structure in a wild mammal population.
Johnson, K. V.-A., Aplin, L. M., Cole, E. F., Farine, D. R., Firth, J. A., Patrick, S. C., and Sheldon, B. C. (2017). Male great tits assort by personality during the breeding season.
Animal Behaviour , 128:21–32.Keiser, C. N., Pinter-Wollman, N., Augustine, D. A., Ziemba, M. J., Hao, L., Lawrence, J. G., and Pruitt,J. N. (2016). Individual differences in boldness influence patterns of social interactions and thetransmission of cuticular bacteria among group-mates.
Proceedings of the Royal Society B: BiologicalSciences , 283(1829):20160457.Kivel¨a, M., Arenas, A., Barthelemy, M., Gleeson, J. P., Moreno, Y., and Porter, M. A. (2014). Multilayernetworks.
Journal of Complex Networks , 2(3):203–271.Kleinberg, J. M., Kumar, R., Raghavan, P., Rajagopalan, S., and Tomkins, A. S. (1999). The web as a graph:measurements, models, and methods. In
International Computing and Combinatorics Conference , pages1–17. Springer.Krause, J., James, R., Franks, D. W., and Croft, D. P. (2015).
Animal social networks . Oxford UniversityPress, USA.Krause, S., Mattner, L., James, R., Guttridge, T., Corcoran, M. J., Gruber, S. H., and Krause, J. (2009).Social network analysis and valid markov chain monte carlo tests of null models.
Behavioral Ecology andSociobiology , 63(7):1089–1096.Lusher, D., Koskinen, J., and Robins, G. (2013).
Exponential random graph models for social networks: Theory,methods, and applications . Cambridge University Press.Mersch, D. P., Crespi, A., and Keller, L. (2013). Tracking individuals shows spatial fidelity is a keyregulator of ant social organization.
Science , 340(6136):1090–1093.Miller, E. T., Bonter, D. N., Eldermire, C., Freeman, B. G., Greig, E. I., Harmon, L. J., Lisle, C., Hochachka,W. M., and Stephens, D. (2017). Fighting over food unites the birds of North America in a continentaldominance hierarchy.
Behavioral Ecology , 28(6):1454–1463.Newman, M. (2018).
Networks . Oxford University Press.Pinter-Wollman, N. (2015). Persistent variation in spatial behavior affects the structure and function ofinteraction networks.
Current Zoology , 61:1.Pinter-Wollman, N., Hobson, E. A., Smith, J. E., Edelman, A. J., Shizuka, D., De Silva, S., Waters, J. S.,Prager, S. D., Sasaki, T., Wittemyer, G., et al. (2014). The dynamics of animal social networks:analytical, conceptual, and theoretical advances.
Behavioral Ecology, 25(2):242–255.
Pinter-Wollman, N., Wollman, R., Guetz, A., Holmes, S., and Gordon, D. M. (2011). The effect of individual variation on the structure and function of interaction networks in harvester ants.
Journal ofthe Royal Society Interface , 8(64):1562–1573.Poirier, M.-A. and Festa-Bianchet, M. (2018). Social integration and acclimation of translocated bighornsheep (ovis canadensis).
Biological Conservation , 218:1–9.Power, E. A. (2017). Social support networks and religiosity in rural south india.
Nature Human Behaviour ,1(3):1–6.Ripperger, S. P., Carter, G. G., Duda, N., Koelpin, A., Cassens, B., Kapitza, R., Josic, D., Berr´ıo-Mart´ınez,J., Page, R. A., and Mayer, F. (2019). Vampire bats that cooperate in the lab maintain their socialnetworks in the wild.
Current Biology , 29(23):4139–4144.Robins, G., Pattison, P., Kalish, Y., and Lusher, D. (2007). An introduction to exponential random graph(p*) models for social networks.
Social Networks , 29(2):173–191.Robitaille, A. L., Webber, Q. M., and Vander Wal, E. (2018). Conducting social network analysis withanimal telemetry data: applications and methods using spatsoc. .Rozins, C., Silk, M. J., Croft, D. P., Delahay, R. J., Hodgson, D. J., McDonald, R. A., Weber, N., and Boots,M. (2018). Social structure contains epidemics and regulates individual roles in disease transmission ina group-living mammal.
Ecology and Evolution , 8(23):12044–12055.Schl¨agel, U. E., Signer, J., Herde, A., Eden, S., Jeltsch, F., Eccard, J. A., and Dammhahn, M. (2019).Estimating interactions between individuals from concurrent animal movements.
Methods in Ecologyand Evolution , 10(8):1234–1245.Shizuka, D., Chaine, A. S., Anderson, J., Johnson, O., Laursen, I. M., and Lyon, B. E. (2014). Across-yearsocial stability shapes network structure in wintering migrant sparrows.
Ecology Letters , 17(8):998–1007.Shizuka, D. and Johnson, A. E. (2020). How demographic processes shape animal social networks.
Behavioral Ecology , 31(1):1–11.Sih, A., Spiegel, O., Godfrey, S., Leu, S., and Bull, C. M. (2018). Integrating social networks, animalpersonalities, movement ecology and parasites: a framework with examples from a lizard.
AnimalBehaviour , 136:195–205.Silk, M. J., Croft, D. P., Delahay, R. J., Hodgson, D. J., Boots, M., Weber, N., and McDonald, R. A. (2017a).Using social network measures in wildlife disease ecology, epidemiology, and management.
BioScience ,67(3):245–257.Silk, M. J., Croft, D. P., Delahay, R. J., Hodgson, D. J., Weber, N., Boots, M., and McDonald, R. A. (2017b).The application of statistical network models in disease research.
Methods in Ecology and Evolution ,8(9):1026–1041.Snijders, L., Kurvers, R. H., Krause, S., Ramnarine, I. W., and Krause, J. (2018). Individual-andpopulation-level drivers of consistent foraging success across environments.
Nature Ecology & Evolution, 2(10):1610–1618.
Snijders, T. A. and Nowicki, K. (1997). Estimation and prediction for stochastic blockmodels for graphs with latent block structure.
Journal of Classification , 14(1):75–100.Snijders, T. A., Pattison, P. E., Robins, G. L., and Handcock, M. S. (2006). New specifications forexponential random graph models.
Sociological Methodology , 36(1):99–153.Sosa, S., Sueur, C., and Puga-Gonzalez, I. (2020). Network measures in animal social network analysis:their strengths, limits, interpretations and uses.
Methods in Ecology and Evolution .Spiegel, O., Leu, S. T., Sih, A., and Bull, C. M. (2016). Socially interacting or indifferent neighbours?randomization of movement paths to tease apart social preference and spatial constraints.
Methods inEcology and Evolution , 7(8):971–979.Wasserman, S. and Faust, K. (1994).
Social Network Analysis: Methods and Applications . CambridgeUniversity Press.Webber, Q. M., Brigham, R. M., Park, A. D., Gillam, E. H., O’Shea, T. J., and Willis, C. K. (2016). Socialnetwork characteristics and predicted pathogen transmission in summer colonies of female big brownbats (
Eptesicus fuscus ). Behavioral Ecology and Sociobiology , 70(5):701–712.Webber, Q. M. and Vander Wal, E. (2019). Trends and perspectives on the use of animal social networkanalysis in behavioural ecology: a bibliometric approach.
Animal Behaviour , 149:77–87.Weiss, M. N., Franks, D. W., Brent, L. J., Ellis, S., Silk, M. J., and Croft, D. P. (2020). Common datastreampermutations of animal social network data are not appropriate for hypothesis testing using regressionmodels. .Wey, T., Blumstein, D. T., Shen, W., and Jord´an, F. (2008). Social network analysis of animal behaviour: apromising tool for the study of sociality.
Animal Behaviour , 75(2):333–344.Whitehead, H. and Dufault, S. (1999). Techniques for analyzing vertebrate social structure usingidentified individuals.
Adv Stud Behav , 28:33–74.Wilson-Aggarwal, J. K., Ozella, L., Tizzoni, M., Cattuto, C., Swan, G. J., Moundai, T., Silk, M. J., Zingeser,J. A., and McDonald, R. A. (2019). High-resolution contact networks of free-ranging domestic dogs
Canis familiaris and implications for transmission of infection.
PLoS Neglected Tropical Diseases ,13(7):e0007565.Yurk, H., Barrett-Lennard, L., Ford, J., and Matkin, C. (2002). Cultural transmission within maternallineages: vocal clans in resident killer whales in southern Alaska.
Animal Behaviour , 63(6):1103–1119.Zeus, V. M., Reusch, C., and Kerth, G. (2018). Long-term roosting data reveal a unimodular socialnetwork in large fission-fusion society of the colony-living Natterer’s bat (
Myotis nattereri ). BehavioralEcology and Sociobiology , 72(6):99. 33 upplemental Materials urbilWorld Matthew Silk15/06/2020Please direct any questions about the examples presented in this R script to Matthew Silk([email protected] (mailto:[email protected]))
First we are going to prepare the R environment and load the necessary packages for our case study. Note that throughout this script we use an edited version of the asnipe get_network2 function that doesn't print messages.

rm(list=ls())
set.seed(5)
library(asnipe)
library(igraph)
library(boot)
library(prodlim)
library(sna)
library(assortnet)
library(blockmodels)
library(ergm)
library(ergm.count)
library(tnet)
library(vegan)
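Several of the analyses below turn a group-by-individual (GBI) matrix into a weighted association network via asnipe's get_network2. As a minimal base-R sketch of what that conversion involves (using the simple ratio index; this is an illustration, not a reproduction of asnipe's internals, and gbi_to_sri and toy_gbi are names invented here):

```r
# Sketch: convert a group-by-individual (GBI) matrix (rows = sampling
# periods, columns = individuals) into a simple-ratio-index association
# matrix. Illustration only, not a drop-in for asnipe::get_network2.
gbi_to_sri <- function(gbi) {
  n <- ncol(gbi)
  sri <- matrix(0, n, n)
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      x <- sum(gbi[, i] == 1 & gbi[, j] == 1)        # periods seen together
      yab <- sum(xor(gbi[, i] == 1, gbi[, j] == 1))  # periods seen apart
      denom <- x + yab
      sri[i, j] <- sri[j, i] <- if (denom == 0) 0 else x / denom
    }
  }
  sri
}

# Toy GBI: 3 sampling periods, 3 individuals
toy_gbi <- matrix(c(1, 1, 0,
                    1, 1, 1,
                    0, 1, 1), nrow = 3, byrow = TRUE)
gbi_to_sri(toy_gbi)
```

The index for each dyad is x/(x + y): the proportion of sampling periods containing either individual in which both were seen together.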
Part One
Creating a population of burbils with social networks
Burbils live in open habitats throughout the world. They form fission-fusion societies characterised by stable social groups that roost together at night but fission into smaller subgroups when foraging during the day. Foraging subgroups from different groups occasionally meet and intermingle, creating opportunities for between-group associations; these between-group associations are more likely if the two burbil groups belong to the same "clan". Burbil groups vary in size, and we are unsure whether groups of different sizes have similar social network structures. Groups also contain two colour morphs: burbils with red noses, and those with orange noses. As well as being able to identify individual burbils (which we use to construct their social networks!), we are also able to distinguish male and female burbils, as well as three distinct age classes (adults, subadults and juveniles). We know that burbils are involved in both dominance interactions and affiliative interactions with group-mates. We suspect they may have a dominance hierarchy, but we don't know this for sure. We have a lot to find out!

In this section of the code we create our burbil society (starting with the association network), explaining what we do as we go along. With practice it should be possible to change some of the numbers in this code to change the nature of social relationships in your burbil society.

GS<-20
x<-seq(3,18,1)
y<-seq(3,18,1)
locs<-expand.grid(x,y)
names(locs)<-c("x","y")
group_locs<-locs[locs$x%%4==0&locs$y%%4==0,]
n_groups<-dim(group_locs)[1]

Here we create three distinct clans of burbils.
This will affect associations between members of different groups.

group_clans<-sample(c("A","B","C"),n_groups,replace=TRUE)
p_wc<-1
p_bc<-0.4
indss<-list()
gss<-list()
sexes<-list()
ages<-list()
noses<-list()
daysl<-list()
gbis<-list()
sg_mn<-5
sg_ass<-0.15

for (j in 1:n_groups){
inds<-seq(1,rpois(1,GS),1)
indss[[j]]<-inds
gs<-length(inds)
gss[[j]]<-gs
sex<-sample(c("M","F"),gs,replace=TRUE)
sexes[[j]]<-sex
age<-sample(c("AD","SUB","JUV"),gs,replace=TRUE,prob=c(0.6,0.2,0.2))
ages[[j]]<-age
nose<-sample(c("RED","ORANGE"),gs,replace=TRUE,prob=c(0.7,0.3))
noses[[j]]<-nose
n_sg<-rpois(1,sg_mn-1)+1
max_red<-floor(n_sg/2)
subgroups1<-sample(n_sg,sum(nose=="RED"),replace=TRUE,prob=c(rep(0.5+sg_ass,max_red),rep(0.5-sg_ass,n_sg-max_red)))
subgroups2<-sample(n_sg,sum(nose=="ORANGE"),replace=TRUE,prob=c(rep(0.5-sg_ass,max_red),rep(0.5+sg_ass,n_sg-max_red)))
subgroups<-rep(NA,gs)
subgroups[nose=="RED"]<-subgroups1
subgroups[nose=="ORANGE"]<-subgroups2
gbi<-matrix(0,nc=gs,nr=n_sg)
gbi[cbind(subgroups,seq(1,gs,1))]<-1
days<-rep(1,nrow(gbi))

for (i in
max_red<-floor(n_sg/2)
subgroups1<-sample(n_sg,sum(nose=="RED"),replace=TRUE,prob=c(rep(0.5+sg_ass,max_red),rep(0.5-sg_ass,n_sg-max_red)))
subgroups2<-sample(n_sg,sum(nose=="ORANGE"),replace=TRUE,prob=c(rep(0.5-sg_ass,max_red),rep(0.5+sg_ass,n_sg-max_red)))
subgroups<-rep(NA,gss[[j]])
subgroups[nose=="RED"]<-subgroups1
subgroups[nose=="ORANGE"]<-subgroups2
tgbi<-matrix(0,nc=gs,nr=n_sg)
tgbi[cbind(subgroups,seq(1,gs,1))]<-1
days<-c(days,rep(i,nrow(tgbi)))
gbi<-rbind(gbi,tgbi)
}

gbi2<-gbi[rowSums(gbi)>0,]
days<-days[rowSums(gbi)>0]
gbi<-gbi2
daysl[[j]]<-days
gbis[[j]]<-gbi
}

sglocs<-list()
for (i in
n_inds<-numeric()
for (i in
n_tot<-sum(n_inds)
inds_tot<-seq(1,n_tot,1)
g_tot<-rep(seq(1,n_groups,1),n_inds)
gi_tot<-seq(1,n_inds[1],1)
for (i in
full_net<-matrix(0,nr=n_tot,nc=n_tot)
for (i in
for (j in 1:(n_groups-1)){
for (k in (j+1):n_groups){
tA<-paste0(sglocs[[j]][,1],"-",sglocs[[j]][,2])
tB<-paste0(sglocs[[k]][,1],"-",sglocs[[k]][,2])
tA2<-tA[daysl[[j]]==i]
tB2<-tB[daysl[[k]]==i]
tt<-match(tA2,tB2)
if (sum(is.na(tt))

Create dominance interactions

Individual identities:

inds1<-indss[[1]]
gs1<-gss[[1]]
sex1<-sexes[[1]]
age1<-ages[[1]]
nose1<-noses[[1]]
gbi<-gbis[[1]]
GROUP<-numeric()
WINNER<-numeric()
LOSER<-numeric()
RHP_ad<-1
RHP_sub<-0
RHP_juv<- -1
RHP_M<- -0.5
RHP_resid<-0.2
RHPs1<-rnorm(gs1,RHP_ad*(age1=="AD")+RHP_sub*(age1=="SUB")+RHP_juv*(age1=="JUV")+RHP_M*(sex1=="M"),RHP_resid)
m_nipi<-2
grD<-numeric()
c<-1
for (g in
if (rowSums(gbi)[g]>1){
nipi<-rpois(1,m_nipi)
indivs<-which(gbi[g,]==1)
ni<-nipi*length(indivs)
for (n in
if (winner==1){
WINNER[c]<-i1
LOSER[c]<-i2
}
if (winner==0){
WINNER[c]<-i2
LOSER[c]<-i1
}
grD[c]<-g
c<-c+1
}
}
}

dom_net<-graph_from_edgelist(cbind(WINNER,LOSER), directed = TRUE)
E(dom_net)$weight <- 1
dom_net<-simplify(dom_net, edge.attr.comb=list(weight="sum"))
plot(dom_net,edge.width=edge_attr(dom_net)$weight^0.5,layout=layout_in_circle)

To show that our code to generate the dominance network works, we plot out-strength against in-strength; the two are negatively correlated, as would be expected for a linear dominance hierarchy.

plot(strength(dom_net,mode="out"),strength(dom_net,mode="in"),pch=16,xlab="Out-strength",ylab="In-strength",cex.lab=1.5,cex.axis=1)

Create affiliative interactions

We use an equivalent approach as for the dominance networks, so have kept much of the code the same (hence the mismatch in names).

GROUP<-numeric()
GIV<-numeric()
REC<-numeric()
AHP_ad<- -1
AHP_sub<- -1
AHP_juv<-1
AHP_M<-0
AHP_nose<-1
AHP_resid<-0.2
AHPs1<-rnorm(gs1,AHP_ad*(age1=="AD")+AHP_sub*(age1=="SUB")+AHP_juv*(age1=="JUV")+AHP_M*(sex1=="M"),AHP_resid)
m_nipi<-0.5
grA<-numeric()
c<-1
for (g in
if (rowSums(gbi)[g]>1){
nipi<-rpois(1,m_nipi)
indivs<-which(gbi[g,]==1)
ni<-nipi*length(indivs)
for (n in
if (nose1[i1]==nose1[i2]){tn<-1}
winner<-rbinom(1,1,inv.logit(AHPs1[i1]-AHPs1[i2]+tn))
GROUP[c]<-g
if (winner==1){
GIV[c]<-i1
REC[c]<-i2
}
if (winner==0){
GIV[c]<-i2
REC[c]<-i1
}
grA[c]<-g
c<-c+1
}
}
}

aff_net<-graph_from_edgelist(cbind(GIV,REC),
directed = TRUE)
E(aff_net)$weight <- 1
aff_net<-simplify(aff_net, edge.attr.comb=list(weight="sum"))
plot(aff_net,edge.width=edge_attr(aff_net)$weight^0.5,layout=layout_in_circle)

Plot the same correlation used for the dominance networks.

plot(strength(aff_net,mode="out"),strength(aff_net,mode="in"),pch=16,xlab="Out-strength",ylab="In-strength",cex.lab=1.5,cex.axis=1)

Data were also collected on the huddling networks of two burbil groups while they were roosting during summer and winter. These data can be used to test whether the huddling networks differ between small and large groups. We simulate those data here.

sm_g<-which.min(n_inds)
bi_g<-which.max(n_inds)
hud_netSM<-sample_smallworld(dim=1, size=gss[[sm_g]], nei=3, p=0.05, loops = FALSE, multiple = FALSE)
plot(hud_netSM)
igraph::betweenness(hud_netSM)
hud_netBI<-sample_smallworld(dim=1, size=gss[[bi_g]], nei=3, p=0.05, loops = FALSE, multiple = FALSE)
plot(hud_netBI)

Calculate the betweenness of each network.

igraph::betweenness(hud_netBI)
hist(igraph::betweenness(hud_netSM),breaks=seq(0,60,1),col=rgb(1,0,0,0.3),border=NA,main="",xlab="Betweenness",cex.lab=1.5,cex.axis=1)
hist(igraph::betweenness(hud_netBI),breaks=seq(0,60,1),col=rgb(0,0,1,0.3),border=NA,add=TRUE,main="",cex.lab=1.5,cex.axis=1)
hud_netSM_w<-erdos.renyi.game(n=gss[[sm_g]], p=0.3, loops = FALSE, multiple = FALSE)
hud_netBI_w<-erdos.renyi.game(n=gss[[bi_g]], p=0.3, loops = FALSE, multiple = FALSE)

We have also been sent association data from a similar but smaller burbil population by a colleague. They want to know whether their burbil population has a similar network structure to ours.
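The huddling networks above come from igraph::sample_smallworld. For readers curious about what that model is doing, here is a rough base-R sketch of a Watts-Strogatz-style generator: build a ring lattice in which each node is linked to its nei nearest neighbours on each side, then rewire each edge with probability p. (small_world is a name invented here, and this is not igraph's exact algorithm.)

```r
# Minimal Watts-Strogatz-style small-world generator, a base-R sketch of
# what igraph::sample_smallworld does (illustration only).
small_world <- function(n, nei, p) {
  adj <- matrix(0, n, n)
  for (i in 1:n) for (k in 1:nei) {               # ring lattice
    j <- ((i + k - 1) %% n) + 1
    adj[i, j] <- adj[j, i] <- 1
  }
  edges <- which(upper.tri(adj) & adj == 1, arr.ind = TRUE)
  for (e in 1:nrow(edges)) {
    # rewire each surviving lattice edge with probability p
    if (runif(1) < p && adj[edges[e, 1], edges[e, 2]] == 1) {
      i <- edges[e, 1]
      candidates <- setdiff(which(adj[i, ] == 0), i)  # no loops/multi-edges
      if (length(candidates) > 0) {
        j_new <- candidates[sample.int(length(candidates), 1)]
        adj[i, edges[e, 2]] <- adj[edges[e, 2], i] <- 0
        adj[i, j_new] <- adj[j_new, i] <- 1
      }
    }
  }
  adj
}

set.seed(42)
net <- small_world(n = 20, nei = 3, p = 0.05)
rowSums(net)[1:5]   # degrees stay close to 2 * nei
```

With p = 0 the lattice is untouched; a small p gives the short path lengths and high clustering typical of small-world graphs.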
GS_B<-20 x_B<-seq(3,13,1)y_B<-seq(3,9,1)locs_B<-expand.grid(x_B,y_B)names(locs_B)<-c("x","y") group_locs_B<-locs_B[locs_B$x%%4==0&locs_B$y%%4==0,] n_groups_B<-dim(group_locs_B)[1] group_clans_B<-sample(c("A","B","C"),n_groups_B,replace=TRUE) p_wc_B<-1 p_bc_B<-0.4 indss_B<-list() gss_B<-list() sexes_B<-list() Create a list to store the age of each individual ages_B<-list() noses_B<-list() daysl_B<-list() gbis_B<-list() sg_mn_B<-5 sg_ass_B<-0.1 for (j in inds_B<-seq(1,rpois(1,GS_B),1)indss_B[[j]]<-inds_B gs_B<-length(inds_B)gss_B[[j]]<-gs_B sex_B<-sample(c("M","F"),gs_B,replace=TRUE)sexes_B[[j]]<-sex_B age_B<-sample(c("AD","SUB","JUV"),gs_B,replace=TRUE,prob=c(0.6,0.2,0.2))ages_B[[j]]<-age_B nose_B<-sample(c("RED","ORANGE"),gs_B,replace=TRUE,prob=c(0.7,0.3))noses_B[[j]]<-nose_B n_sg_B<-rpois(1,sg_mn_B-1)+1 max_red_B<-floor(n_sg_B/2) subgroups1_B<-sample(n_sg_B,sum(nose_B=="RED"),replace=TRUE,prob=c(rep(0.5+sg_ass_B,max_red_B),rep(0.5-sg_ass_B,n_sg_B-max_red_B)))subgroups2_B<-sample(n_sg_B,sum(nose_B=="ORANGE"),replace=TRUE,prob=c(rep(0.5-sg_ass_B,max_red_B),rep(0.5+sg_ass_B,n_sg_B-max_red_B)))subgroups_B<-rep(NA,gs_B)subgroups_B[nose_B=="RED"]<-subgroups1_Bsubgroups_B[nose_B=="ORANGE"]<-subgroups2_B gbi_B<-matrix(0,nc=gs_B,nr=n_sg_B)gbi_B[cbind(subgroups_B,seq(1,gs_B,1))]<-1days_B<-rep(1,nrow(gbi_B)) for (i in max_red_B<-floor(n_sg_B/2) subgroups1_B<-sample(n_sg_B,sum(nose_B=="RED"),replace=TRUE,prob=c(rep(0.5+sg_ass_B,max_red_B),rep(0.5-sg_ass_B,n_sg_B-max_red_B))) subgroups2_B<-sample(n_sg_B,sum(nose_B=="ORANGE"),replace=TRUE,prob=c(rep(0.5-sg_ass_B,max_red_B),rep(0.5+sg_ass_B,n_sg_B-max_red_B))) subgroups_B<-rep(NA,gss_B[[j]]) subgroups_B[nose_B=="RED"]<-subgroups1_B subgroups_B[nose_B=="ORANGE"]<-subgroups2_B tgbi_B<-matrix(0,nc=gs_B,nr=n_sg_B) tgbi_B[cbind(subgroups_B,seq(1,gs_B,1))]<-1 days_B<-c(days_B,rep(i,nrow(tgbi_B))) gbi_B<-rbind(gbi_B,tgbi_B)} 
gbi2_B<-gbi_B[rowSums(gbi_B)>0,]
days_B<-days_B[rowSums(gbi_B)>0]
gbi_B<-gbi2_B
daysl_B[[j]]<-days_B
gbis_B[[j]]<-gbi_B
}

sglocs_B<-list()
for (i in
n_inds_B<-numeric()
for (i in
n_tot_B<-sum(n_inds_B)
inds_tot_B<-seq(1,n_tot_B,1)
g_tot_B<-rep(seq(1,n_groups_B,1),n_inds_B)
gi_tot_B<-seq(1,n_inds_B[1],1)
for (i in
full_net_B<-matrix(0,nr=n_tot_B,nc=n_tot_B)
for (i in
for (j in 1:(n_groups_B-1)){
for (k in (j+1):n_groups_B){
tA_B<-paste0(sglocs_B[[j]][,1],"-",sglocs_B[[j]][,2])
tB_B<-paste0(sglocs_B[[k]][,1],"-",sglocs_B[[k]][,2])
tA2_B<-tA_B[daysl_B[[j]]==i]
tB2_B<-tB_B[daysl_B[[k]]==i]
tt_B<-match(tA2_B,tB2_B)
if (sum(is.na(tt_B))

Part Two

Social network analysis examples

A) PERMUTATION-BASED REFERENCE MODELS

Our first analyses are for the examples presented in Box 2, with two research groups asking questions about the associations of burbils in group one.

First, the informed group:

MAT1<-get_network2(gbis[[1]])
NET1<-graph.adjacency(MAT1,mode="undirected",weighted=TRUE)
plot(NET1,vertex.label=NA,vertex.color=noses[[1]],edge.width=(edge_attr(NET1)$weight*10)^2)
obs<-assortnet::assortment.discrete(MAT1, types=noses[[1]], weighted = TRUE, SE = FALSE, M = 1)
reference<-numeric()
MAT_T<-sna::rmperm(MAT1)
for (i in
reference2<-c(obs$r,reference)
sum(obs$r
MAT_DOM<-as_adjacency_matrix(dom_net,sparse=FALSE,attr="weight")
str_obs<-rowSums(MAT_DOM)
obs<-coef(lm(str_obs~sexes[[1]]))[2]
reference<-numeric()
MAT_T<-sna::rmperm(MAT_DOM)
for (i in
reference2<-c(obs,reference)
sum(obs
MAT_DOM<-as_adjacency_matrix(dom_net,sparse=FALSE,attr="weight")
str_obs<-rowSums(MAT_DOM)
ageT<-ages[[1]]
ageT[ageT=="AD"]<-"A"
ageT[ageT=="SUB"|ageT=="JUV"]<-"Y"
obs<-coef(lm(str_obs~ageT))[2]
burnin<-numeric()
reference<-numeric()
MAT_T<-MAT_DOM
for (i in
plot(burnin,type="l",las=1,ylab="Test statistic value",xlab="Position in Markov Chain")

We can then continue the Markov chain and sample from it after each swap to generate our reference distribution of test statistics.
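The node-permutation tests above use sna::rmperm to permute rows and columns of the matrix, i.e. to shuffle node identities. The bare logic can be shown in a self-contained base-R sketch on a toy network; here the test statistic is the mean edge weight among same-label dyads (in the burbil analyses it is an assortativity coefficient or a regression coefficient, but it slots into the same loop). same_label_mean and the toy data are invented for illustration.

```r
# Node-label permutation test, base-R sketch: permute node labels,
# recompute the test statistic, compare the observed value to the
# resulting reference distribution.
set.seed(5)
n <- 12
labels <- rep(c("RED", "ORANGE"), each = n / 2)
mat <- matrix(runif(n * n), n, n)
mat <- (mat + t(mat)) / 2   # symmetrise toy weighted network
diag(mat) <- 0

same_label_mean <- function(m, lab) {
  same <- outer(lab, lab, "==") & upper.tri(m)
  mean(m[same])
}

obs <- same_label_mean(mat, labels)
reference <- replicate(1000, same_label_mean(mat, sample(labels)))

# upper-tail empirical p-value, observed value included
p_upper <- (sum(reference >= obs) + 1) / (length(reference) + 1)
p_upper
```

Including the observed value in the reference distribution (the two +1 terms) keeps the empirical p-value away from exactly zero.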
for (i in reference2<-c(obs,reference) sum(obs CoV<- function (a){ a2<-a diag(a2)<-NA return (sd(a2,na.rm=T)/mean(a2,na.rm=T))}CoV2<- function (a){ return (sd(a[a!=0])/mean(a[a!=0]))} obs_cv<-CoV(get_network2(gbi)) gbi<-gbis[[1]]day<-daysl[[1]] gbi_t<-gbirgbis<-list() for (i in pind<-which(gbi_t>0,arr.ind=TRUE) tind1<-pind[sample(1:nrow(pind),1),] td<-which(day==day[tind1[1]]) pind2<-pind[which(pind[,1]% in %td),] tind2<-pind2[sample(1:nrow(pind2),1),] if (tind1[1]!=tind2[1]&tind1[2]!=tind2[2]){ if (gbi_t[tind1[1],tind2[2]]==0&gbi_t[tind2[1],tind1[2]]==0){ gbi_t2<-gbi_t gbi_t2[tind2[1],tind1[2]]<-gbi_t[tind1[1],tind1[2]] gbi_t2[tind1[1],tind1[2]]<-gbi_t[tind2[1],tind1[2]] gbi_t2[tind1[1],tind2[2]]<-gbi_t[tind2[1],tind2[2]] gbi_t2[tind2[1],tind2[2]]<-gbi_t[tind1[1],tind2[2]] gbi_t<-gbi_t2 } }} c<-1 for (i in in %td),] tind2<-pind2[sample(1:nrow(pind2),1),] if (tind1[1]!=tind2[1]&tind1[2]!=tind2[2]){ if (gbi_t[tind1[1],tind2[2]]==0&gbi_t[tind2[1],tind1[2]]==0){ gbi_t2<-gbi_t gbi_t2[tind2[1],tind1[2]]<-gbi_t[tind1[1],tind1[2]] gbi_t2[tind1[1],tind1[2]]<-gbi_t[tind2[1],tind1[2]] gbi_t2[tind1[1],tind2[2]]<-gbi_t[tind2[1],tind2[2]] gbi_t2[tind2[1],tind2[2]]<-gbi_t[tind1[1],tind2[2]] gbi_t<-gbi_t2 } } if (i%%10==0){ rgbis[[c]]<-gbi_t c<-c+1 }} rnets<-lapply(rgbis,get_network2) ref_cvs<-unlist(lapply(rnets,CoV)) par(xpd=FALSE)plot(ref_cvs,type="l",ylim=c(0,0.4),las=1,ylab="Value of test statistic")lines(x=c(-100,100000),y=c(obs_cv,obs_cv),col="red",lwd=2) gbi_t<-gbirgbis<-list() for (i in tn1<-noses[[1]][tind1[2]] pind2<-pind[which(pind[,1]% in %td),] tind2<-pind2[sample(1:nrow(pind2),1),] tn2<-noses[[1]][tind2[2]] if (tind1[1]!=tind2[1]&tind1[2]!=tind2[2]){ if (gbi_t[tind1[1],tind2[2]]==0&gbi_t[tind2[1],tind1[2]]==0){ if (tn1==tn2){ gbi_t2<-gbi_t gbi_t2[tind2[1],tind1[2]]<-gbi_t[tind1[1],tind1[2]] gbi_t2[tind1[1],tind1[2]]<-gbi_t[tind2[1],tind1[2]] gbi_t2[tind1[1],tind2[2]]<-gbi_t[tind2[1],tind2[2]] gbi_t2[tind2[1],tind2[2]]<-gbi_t[tind1[1],tind2[2]] gbi_t<-gbi_t2 } } }} 
c<-1 for (i in in %td),] tind2<-pind2[sample(1:nrow(pind2),1),] tn2<-noses[[1]][tind2[2]] if (tind1[1]!=tind2[1]&tind1[2]!=tind2[2]){ if (gbi_t[tind1[1],tind2[2]]==0&gbi_t[tind2[1],tind1[2]]==0){ if (tn1==tn2){ gbi_t2<-gbi_t gbi_t2[tind2[1],tind1[2]]<-gbi_t[tind1[1],tind1[2]] gbi_t2[tind1[1],tind1[2]]<-gbi_t[tind2[1],tind1[2]] gbi_t2[tind1[1],tind2[2]]<-gbi_t[tind2[1],tind2[2]] gbi_t2[tind2[1],tind2[2]]<-gbi_t[tind1[1],tind2[2]] gbi_t<-gbi_t2 } } } if (i%%100==0){ rgbis[[c]]<-gbi_t c<-c+1 }} rnets<-lapply(rgbis,get_network2) ref_cvs<-unlist(lapply(rnets,CoV)) plot(ref_cvs,type="l",ylim=c(0,0.4),ylab="Value of test statistic",las=1)lines(x=c(-100,100000),y=c(obs_cv,obs_cv),col="red",lwd=2) We could also calculate p values or produce histograms as we have previously Define function to convert permuted edge lists into adjacency matrices matrix_gen<- function (a){ Taff_net<-graph_from_edgelist(cbind(a,REC), directed = TRUE) E(Taff_net)$weight <- 1 Taff_net<-simplify(Taff_net, edge.attr.comb=list(weight="sum")) b<-as_adjacency_matrix(Taff_net,sparse=FALSE,attr="weight") return (b)} assortment2<- function (a){ b<-assortnet::assortment.discrete(a, types=noses[[1]], weighted = TRUE, SE = FALSE, M = 1) return (b$r)} obs<-assortnet::assortment.discrete(MAT_AFF, types=noses[[1]], weighted = TRUE, SE = FALSE, M = 1)$r T_W<-GIVrGIV<-list() for (i in c<-1 for (i in if (i%%10==0){ rGIV[[c]]<-T_W c<-c+1 }} r_affnets<-lapply(rGIV,matrix_gen) refs<-unlist(lapply(r_affnets,assortment2)) refs2<-c(obs,refs) sum(obs B) Resampling-based reference models Resampling-based reference models can be used to test a range of hypotheses, that may overlap or differ from the type of hypotheses testedusing permutation-based reference models.Resampling of either the network or the raw data can be used.First we demonstrate how resampling of the network might be used. We show how it might be applied to test the hypothesis that two networks ofdifferent sizes are generated by the same underlying process. 
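One concrete way to set up the comparison just described: repeatedly take induced subnetworks of the larger network with the same number of nodes as the smaller one, recompute the test statistic (mean degree here) on each, and ask where the smaller network's value falls in that reference distribution. A base-R sketch on two toy random graphs (rand_adj is a helper invented for this illustration):

```r
# Resampling a larger network down to the size of a smaller one, to
# compare a test statistic (mean degree) across sizes.
set.seed(7)
rand_adj <- function(n, p) {
  a <- matrix(rbinom(n * n, 1, p), n, n)
  a[lower.tri(a)] <- t(a)[lower.tri(a)]  # symmetrise
  diag(a) <- 0
  a
}
big <- rand_adj(40, 0.3)
small <- rand_adj(15, 0.3)

obs <- mean(colSums(small))              # mean degree of the small network
ref <- replicate(1000, {
  keep <- sample(nrow(big), nrow(small)) # induced subnetwork of 'big'
  mean(colSums(big[keep, keep]))
})
c(observed = obs, ref_mean = mean(ref))
```

Many statistics, mean degree included, shift systematically with network size, which is exactly why comparing the two raw values directly would mislead.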
However, we provide two examples to demonstrate how challenging this can be. We also refer to the main text, where we discuss the relationship between network size and network properties. Here we test whether the underlying structures of the huddling networks in our smallest and largest groups are the same. We do this for winter, when huddling networks are random graphs, and summer, when huddling networks are small-world graphs.

hns2_m<-as_adjacency_matrix(hud_netSM_w,sparse=FALSE)
hnb2_m<-as_adjacency_matrix(hud_netBI_w,sparse=FALSE)
hist(colSums(hns2_m),col=adjustcolor("firebrick",0.2),border=NA,breaks=seq(0,20,1),ylim=c(0,15),las=1,xlab="Degree",main="",cex.lab=1.5)
hist(colSums(hnb2_m),col=adjustcolor("dodgerblue",0.2),border=NA,breaks=seq(0,20,1),add=TRUE)

We then decide that the mean degree of the network is our test statistic. We calculate the mean degree for each network.

smeanw<-mean(colSums(hns2_m))
bmeanw<-mean(colSums(hnb2_m))
print(smeanw)
bmeansw<-numeric()
for (i in
sum(smeanw

Another form of resampling-based reference model is to sample from a metric distribution, most commonly the degree distribution. For example, using igraph we can calculate the degree sequence of our full burbil association network and generate otherwise randomised graphs with the same degree sequence.

deg<-igraph::degree(full_net2)
rdsn<-igraph::sample_degseq(deg,method="simple.no.multiple")
plot(rdsn,vertex.label=NA)
edge_attr(rdsn)$weight<-sample(edge_attr(full_net2)$weight,gsize(rdsn),replace=TRUE)
plot(rdsn,vertex.label=NA,edge.width=(edge_attr(rdsn)$weight*10)^2)

These types of reference model can be very useful when studying transmission through social networks. Degree distributions can have profound implications for spreading processes, and this type of approach provides a way to maintain the degree distribution while studying the importance of other aspects of network structure.
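igraph::sample_degseq draws a fresh graph with the observed degree sequence. The same constraint can also be enforced by degree-preserving double-edge swaps, which rewire a pair of edges (a,b) and (c,d) into (a,d) and (c,b) whenever that creates no self-loop or duplicate edge. A base-R sketch on a toy edge list (degree_seq and the toy data are invented here):

```r
# Degree-preserving double-edge swaps on an undirected edge list.
# An accepted swap rewires (a,b),(c,d) -> (a,d),(c,b), so every node's
# degree is unchanged while the wiring is randomised.
set.seed(3)
el <- cbind(c(1, 1, 2, 3, 4, 5), c(2, 3, 4, 5, 6, 6))  # toy edge list

degree_seq <- function(el, n) tabulate(c(el[, 1], el[, 2]), n)
d0 <- degree_seq(el, 6)

for (s in 1:500) {
  picks <- sample(nrow(el), 2)
  e1 <- el[picks[1], ]; e2 <- el[picks[2], ]
  new1 <- c(e1[1], e2[2]); new2 <- c(e2[1], e1[2])
  all_e <- rbind(el[-picks, ], new1, new2)
  keys <- paste(pmin(all_e[, 1], all_e[, 2]), pmax(all_e[, 1], all_e[, 2]))
  ok <- new1[1] != new1[2] && new2[1] != new2[2] &&  # no self-loops
        !any(duplicated(keys))                       # no multi-edges
  if (ok) el <- all_e                                # accept the swap
}
identical(degree_seq(el, 6), d0)   # degree sequence is preserved
```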
Resampling raw data to generate reference models

We know that space use is important in structuring the overall (between-group) association network. Therefore we could construct a null model by resampling the spatial locations that subgroups were observed at and re-constructing the social network. We use this example to demonstrate bootstrapping (sampling with replacement) as a technique for generating reference models. Here we test the hypothesis that there are no preferential associations between members of different groups when they meet.

N.B. Here we calculate only one instance of the reference distribution (a single reference network), but there is no reason this operation can't be repeated to generate a full reference distribution.

Generate an adjacency matrix to store the reference model:

R_fn_rs<-matrix(0,nr=nrow(full_net),nc=ncol(full_net))
R_p_wc<-1
R_p_bc<-1
R_gbis<-list()
for (i in
R_sglocs<-list()
for (i in
for (i in
for (j in 1:(n_groups-1)){
for (k in (j+1):n_groups){
tA<-paste0(R_sglocs[[j]][,1],"-",R_sglocs[[j]][,2])
tB<-paste0(R_sglocs[[k]][,1],"-",R_sglocs[[k]][,2])
tA2<-tA[daysl[[j]]==i]
tB2<-tB[daysl[[k]]==i]
tt<-match(tA2,tB2)
if (sum(is.na(tt))

C) Distribution-based reference models

Instead of resampling from existing measures, we can also fit statistical models to distributions of network measures and then re-generate networks accordingly. This can be a relatively easy process to follow (for some but not all network measures) when the distribution of only one measure is involved, but it gets progressively more challenging if multiple properties of the network are to be retained. This is especially true when these distributions are correlated.
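The simplest version of the distribution-based workflow can be sketched in base R: estimate a distribution's parameter from the observed degree sequence (for a Poisson, the maximum-likelihood estimate is just the sample mean), then draw replacement degree sequences from the fit to seed reference networks. A drawn sequence must at least have an even sum to be realisable as a graph, hence the rejection step. (draw_degseq and the toy degrees are invented for illustration.)

```r
# Distribution-based reference degrees: fit a Poisson to observed
# degrees, then sample new degree sequences from the fitted model.
set.seed(11)
obs_deg <- c(4, 6, 5, 3, 7, 5, 4, 6, 5, 5)  # toy observed degree sequence

lambda_hat <- mean(obs_deg)                  # Poisson MLE = sample mean

draw_degseq <- function(n, lambda) {
  repeat {
    d <- rpois(n, lambda)
    if (sum(d) %% 2 == 0) return(d)          # a graph needs an even degree sum
  }
}
new_deg <- draw_degseq(length(obs_deg), lambda_hat)
new_deg
```

An even sum is necessary but not sufficient for a sequence to be graphical, which is why the script's later call to sample_degseq is wrapped in a retry loop.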
We illustrate a burbil example in the main text, but also examine it briefly here. Our example uses the overall population association network of our main burbil study population.

deg<-igraph::degree(full_net2)
hist(deg,las=1,xlab="Degree",cex.lab=1.5,col="grey",main="")

In a similar way we can also calculate the distribution of clustering coefficients for the network.

clu<-igraph::transitivity(full_net2,type="weighted")
hist(clu,las=1,xlab="Clustering coefficient",cex.lab=1.5,col="grey",main="")
mean(deg)
fit<-glm(deg~1)
mdeg<-coef(fit)
ndeg<-rpois(length(V(full_net2)),mdeg)
while (class(try(igraph::sample_degseq(ndeg,method="simple.no.multiple")))=="try-error"){
ndeg<-rpois(length(V(full_net2)),mdeg)
}
rdsn2<-igraph::sample_degseq(ndeg,method="simple.no.multiple")
plot(rdsn2,vertex.label=NA,vertex.size=8)
hist(full_net[full_net>0],las=1,xlab="Edge weight",cex.lab=1.5,col="grey",main="")

This distribution is a little complex (it looks like there are two different statistical models generating it).

par(mfrow=c(1,2))
hist(full_net[full_net>0.025],las=1,xlab="Edge weight",cex.lab=1.5,col="grey",main="")
hist(full_net[full_net>0&full_net<0.025],las=1,xlab="Edge weight",cex.lab=1.5,col="grey",main="")
par(mfrow=c(1,1))
ne<-gsize(full_net2)
pes<-sum(full_net>0&full_net<0.025)/ne
meb<-mean(full_net[full_net>0.025])
sdeb<-sd(full_net[full_net>0.025])
new_edgeweights<-rep(NA,gsize(rdsn2))
for (i in
if (tb==1){
new_edgeweights[i]<-0.005
}
if (tb==0){
new_edgeweights[i]<-rnorm(1,meb,sdeb)
}
}
par(mfrow=c(1,2))
hist(full_net[full_net>0],las=1,xlab="Edge weight",cex.lab=1.5,col="grey",breaks=seq(0,0.3,0.01),main="")
hist(new_edgeweights,las=1,xlab="Edge weight",cex.lab=1.5,col="grey",breaks=seq(0,0.3,0.01),main="")
par(mfrow=c(1,1))
edge_attr(rdsn2)$weight<-new_edgeweights
plot(rdsn2,vertex.label=NA,vertex.size=5,edge.width=(edge_attr(rdsn2)$weight*10)^2)

However, we then reflect and realise there is a problem with what we've done.
Between-group edges tend to be much weaker, and individuals with high degree may have more between-group connections, reducing their mean edge weight. Similarly, edge weights may also be biased by other things that influence associations, such as nose colour. We failed to consider the covariance between edge weights and degree.

plot(deg,rowMeans(full_net),pch=16,xlab="Degree",ylab="Mean association strength",cex.lab=1.5)

Distribution-based reference models are complicated! For some network measures they might not be possible at all, while for others it may be important to consider the covariance between measures in order to generate an appropriate reference model. Distribution-based reference models could also be applied to raw data: you could fit a distribution to the relationship between individual traits or landscape features and the social or spatial data used to build your networks, and rebuild your network from there. We don't cover that in this case study.

D) Generative reference models

Our final type of reference model is the generative reference model. We first briefly illustrate the use of some basic statistical models for the networks themselves, before showing how agent-based models can be used to generate reference distributions of networks. Statistical network models are well covered elsewhere in the network literature. We touch on two commonly used examples here:
- a) Stochastic block models, which can be used to generate a reference distribution related to the community structure of the graph;
- b) Exponential random graph models, which can be used to fit parameters describing how the probability or weight of edges is explained by structural properties of the network, nodal traits and dyadic traits.
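Of the two model families just listed, the stochastic block model is the easier one to sketch from first principles: given block memberships and a matrix B of within- and between-block edge probabilities, each dyad is an independent Bernoulli draw. A minimal base-R generator follows (fitting, which the blockmodels package handles in the script, is the harder direction; sbm, membership and B are illustrative names):

```r
# Minimal stochastic block model generator: edge (i,j) is a Bernoulli
# draw with probability B[block_i, block_j].
set.seed(2)
sbm <- function(membership, B) {
  n <- length(membership)
  adj <- matrix(0, n, n)
  for (i in 1:(n - 1)) for (j in (i + 1):n) {
    p <- B[membership[i], membership[j]]
    adj[i, j] <- adj[j, i] <- rbinom(1, 1, p)
  }
  adj
}

membership <- rep(1:2, each = 10)        # two blocks of 10 nodes
B <- matrix(c(0.80, 0.05,
              0.05, 0.80), 2, 2)         # dense within, sparse between
net_sbm <- sbm(membership, B)
# within-block density in block 1, close to 0.8 on average
mean(net_sbm[1:10, 1:10][upper.tri(net_sbm[1:10, 1:10])])
```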
sb<-blockmodels::BM_gaussian(membership_type="SBM_sym",adj=full_net,verbosity=0)
sb$estimate()
dom_el<-as.tnet(MAT_DOM)
network::set.edge.attribute(dom,"weight",as.vector(dom_el[,3]))
network::set.vertex.attribute(dom,"sex",as.vector(sexes[[1]]))
network::set.vertex.attribute(dom,"age",as.vector(ages[[1]]))
network::set.vertex.attribute(dom,"nose",as.vector(noses[[1]]))
dom_mod<-ergm(dom~nonzero+sum+mutual(form="nabsdiff")+cyclicalweights(twopath="min",combine="max",affect="min")+transitiveweights(twopath="min",combine="max",affect="min")+nodematch("sex",diff=TRUE)+nodematch("age",diff=TRUE)+nodematch("nose",diff=TRUE)+nodeofactor("age")+nodeofactor("sex")+nodeofactor("nose"),reference=~Poisson,response="weight",silent=TRUE)

Now we can examine the fit of these statistical models and explore how to use them as reference distributions.

plot(sb$ICL,pch=16)

We can see that the fit of the model doesn't really improve once 16 communities are included. This is unsurprising, given that we simulated 16 burbil groups and within-group associations are so much more frequent than between-group associations.

which.max(sb$ICL)
sb$plot_obs_pred(16)
sb$plot_obs_pred(20)
mems<-sign(round(sb$memberships[[16]]$Z,2))
table(unlist(gss))
summary(dom_mod)
ref_doms<-simulate(dom_mod,10)
par(mfrow=c(2,5),mar=c(0,0,0,0))
for (i in
par(mfrow=c(1,1),mar=c(5,6,2,2))
ref_mats<-as.sociomatrix(ref_doms,attrname="weight",simplify=FALSE)
ref_mats[[1]]

Agent-based models offer a powerful way to develop reference distributions that depend on behavioural rules rather than the structure of the observed network. You can program individuals to behave in a particular way and record their interactions and associations to generate a simulated network. This is, of course, how we generated our burbil society in the first place.
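As a toy illustration of that idea (not the burbil simulation itself), the following base-R agent-based model random-walks agents on a grid and scores an association whenever two agents share a cell on the same day; running it many times would yield a reference distribution of association networks. All names and parameter values here are invented for the sketch.

```r
# Minimal spatially explicit agent-based model: agents random-walk on a
# grid; agents in the same cell on the same day accrue an association.
set.seed(9)
n_agents <- 10; n_days <- 50; grid <- 5
pos <- matrix(sample(grid, n_agents * 2, replace = TRUE), ncol = 2)
assoc <- matrix(0, n_agents, n_agents)

for (day in 1:n_days) {
  step <- matrix(sample(c(-1, 0, 1), n_agents * 2, replace = TRUE), ncol = 2)
  pos <- pmin(pmax(pos + step, 1), grid)   # clamp positions to the grid
  cell <- paste(pos[, 1], pos[, 2])
  for (i in 1:(n_agents - 1)) for (j in (i + 1):n_agents) {
    if (cell[i] == cell[j]) {
      assoc[i, j] <- assoc[i, j] + 1
      assoc[j, i] <- assoc[j, i] + 1
    }
  }
}
assoc_rate <- assoc / n_days               # daily association rates
range(assoc_rate)
```

Changing the movement rule (e.g. making steps depend on groupmates' positions) is how the behavioural hypothesis gets encoded in the reference model.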
Therefore, in order to demonstrate the use of agent-based models, we are going to reuse some of our previous code, and we encourage you to examine the consequences of changing key parts of it. We first fit a spatially explicit agent-based model, then a spatially explicit model applied at the subgroup level, and subsequently develop a socially explicit agent-based model to see whether it is better able to explain burbil association patterns.

Note that we only produce one simulation of each agent-based model here. However, stochastic agent-based models such as these can also be used to build reference distributions of test statistics if run multiple times. Give it a go if you fancy!

dist_eff<-2
print(group_locs)
group_net<-matrix(0,nr=dim(group_locs)[1],nc=dim(group_locs)[1])
for (i in
for (j in
if (g_tot[i]!=g_tot[j]){
group_net[g_tot[i],g_tot[j]]<-group_net[g_tot[i],g_tot[j]]+full_net[i,j]
}
}
}
gnet<-graph.adjacency(group_net,mode="undirected",weighted=TRUE)
plot(gnet,vertex.color="light blue",edge.width=(edge_attr(gnet)$weight)^2)
R_indiv_locs<-list()
for (i in
R_fn<-matrix(NA,nr=nrow(full_net),nc=ncol(full_net))
for (i in
for (j in
R_gn<-matrix(0,nr=dim(group_locs)[1],nc=dim(group_locs)[1])
for (i in
for (j in
if (g_tot[i]!=g_tot[j]){
R_gn[g_tot[i],g_tot[j]]<-R_gn[g_tot[i],g_tot[j]]+R_fn[i,j]
}
}
}
RGN<-graph.adjacency(R_gn,mode="undirected",weighted=TRUE)
plot(RGN,vertex.color="light blue",edge.width=(edge_attr(RGN)$weight)^2)

Calculate values for the test statistics:

vegan::mantel(R_gn,group_net)
Rindss<-list()

Create a list to store group sizes:

Rgss<-list()
Rsexes<-list()
Rages<-list()
Rnoses<-list()
Rdaysl<-list()
Rgbis<-list()
Rsg_mn<-5
Rsg_ass<-0.2
for (j in 1:n_groups){
Rinds<-seq(1,n_inds[j],1)
Rindss[[j]]<-Rinds
gs<-length(Rinds)
Rgss[[j]]<-gs
sex<-sample(c("M","F"),gs,replace=TRUE)
Rsexes[[j]]<-sex
age<-sample(c("AD","SUB","JUV"),gs,replace=TRUE,prob=c(0.6,0.2,0.2))
Rages[[j]]<-age
nose<-sample(c("RED","ORANGE"),gs,replace=TRUE,prob=c(0.7,0.3))
Rnoses[[j]]<-nose
n_sg<-rpois(1,Rsg_mn-1)+1 max_red<-floor(n_sg/2) subgroups1<-sample(n_sg,sum(nose=="RED"),replace=TRUE,prob=c(rep(0.5+Rsg_ass,max_red),rep(0.5-Rsg_ass,n_sg-max_red)))subgroups2<-sample(n_sg,sum(nose=="ORANGE"),replace=TRUE,prob=c(rep(0.5-Rsg_ass,max_red),rep(0.5+Rsg_ass,n_sg-max_red)))subgroups<-rep(NA,gs)subgroups[nose=="RED"]<-subgroups1subgroups[nose=="ORANGE"]<-subgroups2 Rgbi<-matrix(0,nc=gs,nr=n_sg)Rgbi[cbind(subgroups,seq(1,gs,1))]<-1Rdays<-rep(1,nrow(Rgbi)) for (i in max_red<-floor(n_sg/2) subgroups1<-sample(n_sg,sum(nose=="RED"),replace=TRUE,prob=c(rep(0.5+Rsg_ass,max_red),rep(0.5-Rsg_ass,n_sg-max_red))) subgroups2<-sample(n_sg,sum(nose=="ORANGE"),replace=TRUE,prob=c(rep(0.5-Rsg_ass,max_red),rep(0.5+Rsg_ass,n_sg-max_red))) subgroups<-rep(NA,gss[[j]]) subgroups[nose=="RED"]<-subgroups1 subgroups[nose=="ORANGE"]<-subgroups2 tgbi<-matrix(0,nc=gs,nr=n_sg) tgbi[cbind(subgroups,seq(1,gs,1))]<-1 Rdays<-c(Rdays,rep(i,nrow(tgbi))) Rgbi<-rbind(Rgbi,tgbi)} Rgbi2<-Rgbi[rowSums(Rgbi)>0,]Rdays<-Rdays[rowSums(Rgbi)>0]Rgbi<-Rgbi2Rdaysl[[j]]<-RdaysRgbis[[j]]<-Rgbi}Rsglocs<-list() for (i in R_fn2<-matrix(0,nr=n_tot,nc=n_tot) for (i in for (j in for (k in (j+1):n_groups){ tA<-paste0(Rsglocs[[j]][,1],"-",Rsglocs[[j]][,2]) tB<-paste0(Rsglocs[[k]][,1],"-",Rsglocs[[k]][,2]) tA2<-tA[Rdaysl[[j]]==i] tB2<-tB[Rdaysl[[k]]==i] tt<-match(tA2,tB2) if (sum(is.na(tt)) R_fn3<-matrix(0,nr=n_tot,nc=n_tot) for (i in for (j in for (k in (j+1):n_groups){ tA<-paste0(Rsglocs[[j]][,1],"-",Rsglocs[[j]][,2]) tB<-paste0(Rsglocs[[k]][,1],"-",Rsglocs[[k]][,2]) tA2<-tA[daysl[[j]]==i] tB2<-tB[daysl[[k]]==i] tt<-match(tA2,tB2) if (sum(is.na(tt))