The likelihood-ratio test for multi-edge network models


The likelihood-ratio test for multi-edge network models (Submitted for publication)

Giona Casiraghi

ETH Zürich, Chair of Systems Design, Weinbergstrasse 56/58, Zürich, [email protected]

Abstract

The complexity underlying real-world systems implies that standard statistical hypothesis testing methods may not be adequate for these peculiar applications. Specifically, we show that the likelihood-ratio test's null-distribution needs to be modified to accommodate the complexity found in multi-edge network data. When working with independent observations, the p-values of likelihood-ratio tests are approximated using a $\chi^2$ distribution. However, such an approximation should not be used when dealing with multi-edge network data. This type of data is characterized by multiple correlations and competitions that make the standard approximation unsuitable. We address the problem by providing a better approximation of the likelihood-ratio test null-distribution through a Beta distribution. Finally, we empirically show that even for a small multi-edge network, the standard $\chi^2$ approximation provides erroneous results, while the proposed Beta approximation yields the correct p-value estimation.

Keywords: likelihood-ratio test, multi-edge network, complex system, hypothesis testing, model selection

Complex systems are notoriously challenging to analyze due to the large number of interdependencies, competitions, and correlations underlying their dynamics. To deal with these issues, data-driven studies of complex systems are based, either directly or indirectly, on the careful formulation of models representing different hypotheses about the system. The validation of these hypotheses is performed by comparing how well different models fit some observed data. Principled model selection is most probably the central problem of data analysis.

Model selection and statistical hypothesis testing have been intensively investigated in the general cases of, e.g., linear and generalized statistical regression models [6, 21].
However, less attention has been devoted to developing hypothesis testing methods specific to network models and network data, commonly used to study complex systems. In this article, we investigate how one standard statistical test, the likelihood-ratio (LR) test, needs to be modified when dealing with multi-edge network data.

Model selection is addressed by operationalizing the principle of parsimony, one of the fundamental concepts in statistical modeling. Statisticians usually view the principle of parsimony as a bias versus variance tradeoff: bias decreases, and variance increases, as the complexity of a model increases. The fit of any model can be improved by increasing the number of parameters. However, a tradeoff with the increasing variance must be considered when selecting a model to validate a statistical hypothesis. Parsimonious models should achieve the proper tradeoff between bias and variance. Box et al. [3] suggested that the principle of parsimony should lead to a model with "the smallest possible number of parameters for adequate representation of the data". Data-driven selection of a parsimonious model is thus at the core of scientific research.

We can roughly summarise model selection methods into two groups that address the principle of parsimony differently: model selection based on statistical tests and model selection based on information-theoretic methods [6]. Prominent examples of information-theoretic methods are the AIC [1, 2], the BIC [27], or description length minimisation [23, 26]. Studying how such methods fare when faced with network data complexity is beyond this article's scope.

The LR test is instead one of the most common examples of statistical tests used for hypothesis testing. Statistical tests evaluate how far a test statistic $\lambda$ falls from an appropriately constructed null-model. In the LR test, the test statistic $\lambda$ is the ratio between the likelihood $L_0$ of the model $X_0$ representing the null-hypothesis and the likelihood $L_a$ of a more complex model $X_a$ representing the alternative hypothesis to be tested. The better the alternative hypothesis fits compared to the null, the smaller the test statistic's absolute value $|\lambda|$.

How small need $\lambda$ be to reject the null hypothesis in favor of the alternative?
This depends on the null-distribution of $\lambda$. In other words, it depends on the null-hypothesis and its corresponding null-model $X_0$. Assuming that the null-hypothesis is correct, we could generate realizations $\tilde{\mathcal{G}}$ of model $X_0$ that represent all the possible forms the null-hypothesis could have taken in the data. This provides a null-distribution for the test statistic, i.e., a baseline distribution of the test statistic assuming that the null hypothesis is true. If the alternative hypothesis does not fit the observed data well, we can expect the probability of observing $\lambda$ from the null-distribution to be relatively large. The reason for this is that $X_a$ does not fit the data better than $X_0$. In other words, there is no (statistical) evidence that we need the more complex model $X_a$ to explain the data, and the null-model $X_0$ is sufficient. If the alternative hypothesis is considerably better than the null, the probability of observing $\lambda$ under the null-hypothesis will be small. This would give statistical evidence to reject the null hypothesis in favor of the alternative. The p-value of the LR test is precisely the probability of observing a value from the null distribution as small as or smaller than $\lambda$.

Standard implementations of the LR test have been developed assuming that the data consist of many independent observations of the same process (i.i.d. observations) [21]. Under these circumstances,
Wilks' theorem provides a widely used approximation of the null-distribution of $\lambda$ by a $\chi^2$ distribution [24]. Analyzing complex systems, we are often faced with multi-edge network data. These data consist of $m$ repeated, and possibly time-stamped, edges $(i, j)$ representing interactions between $n$ different agents $i, j$, the vertices of the network. Examples of such datasets arise in multiple fields, e.g., in the form of human or animal interactions. Usually, events in networks are explicitly dependent on each other. Thus, the crucial assumption of i.i.d. observations required by Wilks' theorem is violated. The underlying assumption in network science is that the opposite of i.i.d. is true, i.e., that the presence or absence of edges between some vertex pairs affects edges between other vertex pairs. This is critical in the presence of phenomena typically found in complex social systems, such as triadic closure [25], structural balance [17], degree-degree correlations [22], and other network effects.

So what is the implication of such interdependencies? The dependence between different observations of a complex system means that some of the statistical tests' properties will not hold when analyzing network data. In particular, we show that the null-distribution of the test statistic $\lambda$ of the LR test needs to be modified to accommodate such dependence. When this is not done, the results obtained by applying LR tests for hypothesis testing cannot be relied upon.

To illustrate this, we employ the gHypEG (generalized hypergeometric ensemble of random graphs) [9, 12] to model multi-edge network data. The gHypEG allows the encoding of different types of hypotheses in a model, from simple ones like block structures [8] to more complex ones, akin to statistical regression models [5, 7]. Such models can then be used to evaluate different hypotheses about the data [4].

In this article, we deal with multi-edge network data.
Each observation $e_{i \to j}$ consists of an interaction from a system's agent $i$ to another agent $j$. All observations $E = \{e_{i \to j}\}$ can be collected as directed edges in a multi-edge network $\mathcal{G}(V, E)$, where $V$ is the set of all interacting agents, the vertices of the network. The matrix $\mathbf{A}$ denotes the adjacency matrix of the network $\mathcal{G}$. Each of its entries $A_{ij}$ reports the number of edges from vertex $i$ to vertex $j$, or in other words, the number of observed interactions from agent $i$ to agent $j$.

Within this framework, statistical hypotheses are formulated in terms of random graph models. Simple examples of such hypotheses are:

(a) Each agent has the same potential to interact as any other.
(b) Different agents have different potentials to interact.
(c) Agents are separated into 2 distinct groups, and agents of one group are more likely to interact with each other than with agents of the other group.
(d) Agents are separated into $n$ distinct groups, and agents of one group are more likely to interact with each other than with agents of the other groups.

More complex social and dynamical hypotheses can be formulated depending on the system studied. Testing these hypotheses requires encoding them in statistical models and then comparing the fit of these models against each other or against some null-models [21]. We employ the generalized hypergeometric ensemble of random graphs (gHypEG) as the statistical model encoding such hypotheses to deal with multi-edge network data. While this is not the only option, we choose the gHypEG because of its versatility and suitability to model multi-edges. In the next sections, we show i) how to formulate a likelihood-ratio test between two such hypotheses and ii) how the null-distribution of the LR test needs to be modified to fit complex network data.
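As a small illustration of the data representation described in this section, the following sketch builds the multi-edge adjacency matrix $\mathbf{A}$ from a list of observed interactions. The function name `adjacency_matrix` and the toy edge list are hypothetical and only serve the example; they are not taken from the article or from the ghypernet package.

```python
def adjacency_matrix(edges, vertices):
    """Count repeated directed edges (i, j) into a multi-edge adjacency matrix.

    Entry A[i][j] is the number of observed interactions from vertex i to vertex j.
    """
    index = {v: k for k, v in enumerate(vertices)}
    n = len(vertices)
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[index[i]][index[j]] += 1
    return A


# Hypothetical toy data: five interactions among three agents.
vertices = ["a", "b", "c"]
edges = [("a", "b"), ("a", "b"), ("b", "c"), ("c", "a"), ("a", "b")]

A = adjacency_matrix(edges, vertices)
m = sum(map(sum, A))  # total number of multi-edges
```

Repeated interactions are counted rather than collapsed, which is what distinguishes a multi-edge network from a simple graph.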
Before discussing the LR test details, we briefly introduce the generalized hypergeometric ensemble of random graphs. A more formal presentation is provided in [9].

The general idea underlying the gHypEG is to sample edges at random from a predefined set of possible edges. Hypotheses about the system from which the data is observed can be encoded by a) changing the number of possible edges in such a set and b) changing the odds of sampling an edge between two vertices instead of others, i.e., by specifying different edge sampling weights, or biases. Figure 1 provides a graphical illustration of such a process from the perspective of an agent A.

Figure 1: Probabilities of connecting different agents according to the gHypEG, from the perspective of an agent A. Graphical illustration of (left) uniform edge probabilities, (center) edge probabilities that are a function of degrees, and (right) edge probabilities that are a function of degrees and propensities $\Omega$ (represented as the edge-weight).

On the left-hand side of Fig. 1, the number of possible ways A can interact with the other agents is the same: there are two edge-stubs for each vertex. Moreover, the odds of sampling one edge-stub instead
of another is 1. I.e., each edge has the same sampling weight, which is denoted as $\Omega_{ij}$ in the following, where $i, j$ are vertices in $V$. According to the model just described, the probability of observing a multi-edge network $\mathcal{G}$ with $m$ edges depends only on the number $\Xi$ of possible edges between each pair of vertices. This scenario gives rise to a uniform random graph model similar in spirit to the $G(n, p)$ model of Erdős and Rényi [15]. The process of sampling $m$ edges from a collection of $n^2 \Xi$ possible edges, i.e., $\Xi$ possible edges for each pair of vertices in a directed network with self-loops, is described by the hypergeometric distribution [9]:

$$\Pr(\mathcal{G} \mid \Xi) = \binom{n^2 \Xi}{m}^{-1} \prod_{i,j \in V} \binom{\Xi}{A_{ij}}. \tag{1}$$

By setting $\Xi = (m/n)^2$, we ensure that the average degree of the observed network $\mathcal{G}$ is preserved by the model. This first scenario corresponds to the hypothesis (a) listed above: each agent has the same potential to interact. Furthermore, in a directed network with self-loops, $\Xi = m^2/n^2$ corresponds to the maximum likelihood estimation of the only model parameter $\Xi$. We refer to the resulting hypergeometric network model as the regular model.

The central illustration of Fig. 1 highlights a different case. The odds between the different interactions are still identical, i.e., there is no preference for A to interact with any of the other agents. However, the actual possibilities of interaction vary between the different agents: each agent has a different number of edge-stubs for A to connect to. This scenario encodes a different potential of interaction for the different agents, usually reflected in a heterogeneous degree distribution found in the network $\mathcal{G}$. This model encodes hypothesis (b) above. In practice, this hypothesis requires setting different values $\Xi_{ij}$ for the number of possible edges between each pair of vertices $i, j$.
The probability of observing a network $\mathcal{G}$ according to this model changes as follows:

$$\Pr(\mathcal{G} \mid \boldsymbol{\Xi}) = \binom{\sum_{ij} \Xi_{ij}}{m}^{-1} \prod_{i,j \in V} \binom{\Xi_{ij}}{A_{ij}}, \tag{2}$$

where the matrix $\boldsymbol{\Xi}$ contains all different entries $\Xi_{ij}$. The value of $\Xi_{ij}$ can be freely chosen to encode different properties of the system studied (see, e.g., [4]). For example, if we were studying a citation network consisting of citations between scientists, we could set $\Xi_{ij}$ to $p_i \cdot p_j$, where $p_x$ is the number of articles published by scientist $x$. The product $p_i \cdot p_j$ would then encode all the possible ways scientist $i$ could have cited scientist $j$, through all their respective publications. In most cases, though, $\Xi_{ij}$ is taken to be $k^{\text{out}}_i \cdot k^{\text{in}}_j$, where $k^{\text{in}}_x$ is the observed in-degree of agent $x$, and $k^{\text{out}}_x$ its observed out-degree. This hypergeometric network model corresponds to a soft configuration model [16], and defines a network model that preserves the observed degree sequences in expectation [9].

The two models described so far are both characterized by the absence of sampling biases, i.e., interaction preferences between specific vertex pairs that go beyond what is prescribed by the number of edge-stubs and degrees. The gHypEG further expands this formulation by modifying the hypergeometric configuration model with additional information available about the system. Specifically, the probability of connecting two vertices depends not only on the observed degrees (i.e., number of stubs) but also on an independent propensity of two vertices to be connected. Such propensities introduce non-degree related effects into the model. This result is achieved by changing the odds of connecting a pair of vertices instead of another. The right side of Fig. 1 illustrates this case, where $A$ is most likely to connect with vertex $D$, even though $D$ has only one available stub.

We collect these edge propensities in a matrix $\boldsymbol{\Omega}$. The ratio between any two elements $\Omega_{ij}$ and $\Omega_{kl}$ of the propensity matrix gives the odds-ratio of observing an edge between vertices $i$ and $j$ instead of $k$ and $l$, independently of the degrees of the vertices. The probability of a graph $\mathcal{G}$ depends on the stubs' configuration specified by $\boldsymbol{\Xi}$, and on the odds defined by $\boldsymbol{\Omega}$. Such a probability distribution is described by the multivariate Wallenius' non-central hypergeometric distribution [14, 30]:

$$\Pr(\mathcal{G} \mid \boldsymbol{\Xi}, \boldsymbol{\Omega}) = \left[ \prod_{i,j} \binom{\Xi_{ij}}{A_{ij}} \right] \int_0^1 \prod_{i,j} \left( 1 - z^{\Omega_{ij}/S_\Omega} \right)^{A_{ij}} dz \tag{3}$$

with $S_\Omega = \sum_{i,j} \Omega_{ij} (\Xi_{ij} - A_{ij})$.

By constraining the number of free parameters in $\boldsymbol{\Omega}$, we can specify hypotheses about the data by changing the sampling odds for different vertex pairs. For example, we can cluster vertices into multiple groups and verify whether the odds of observing interactions within groups and between groups are different [8]. The resulting model is similar to a degree-corrected stochastic block model [18]. Alternatively, we can specify $\boldsymbol{\Omega}$ to encode endogenous network properties. E.g., $\boldsymbol{\Omega}$ can be utilized to encode triadic closure [5], to verify whether pairs whose interactions will close triads in the network are more likely than others. Finally, different effects contributing to the odds of observing some interactions instead of others can be composed together to formulate more complex hypotheses [7].

The advantage of the approach just described is the ability to encode a wide range of statistical hypotheses within the same modelling framework. This has the practical benefit of allowing the comparison of very different models, as they can all be formulated by means of the same probability distribution of Eq. (3). Different hypotheses are thus encoded by appropriately choosing the free parameters in $\boldsymbol{\Xi}$ and $\boldsymbol{\Omega}$.
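To make Eq. (3) concrete, the following sketch evaluates the Wallenius probability of a small network by direct numerical integration. It is an illustrative approximation only: the function name, the midpoint rule, and the substitution $z = t^{S_\Omega}$ (used to make the integrand smooth near zero) are choices of this example, not the method of the article or of ghypernet. The sketch assumes integer $\Xi_{ij}$ and $S_\Omega \ge 1$, and it is practical only for very small networks.

```python
from math import comb


def wallenius_probability(A, Xi, Omega, steps=20000):
    """Approximate Pr(G | Xi, Omega) of Eq. (3) with a midpoint rule.

    Substituting z = t**S turns the integral of Eq. (3) into
    int_0^1 S * t**(S - 1) * prod_ij (1 - t**Omega_ij)**A_ij dt.
    """
    n = len(A)
    pairs = [(i, j) for i in range(n) for j in range(n)]

    coeff = 1
    for i, j in pairs:
        coeff *= comb(Xi[i][j], A[i][j])

    S = sum(Omega[i][j] * (Xi[i][j] - A[i][j]) for i, j in pairs)

    h = 1.0 / steps
    integral = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        value = S * t ** (S - 1)
        for i, j in pairs:
            if A[i][j]:
                value *= (1.0 - t ** Omega[i][j]) ** A[i][j]
        integral += value * h
    return coeff * integral


# Hypothetical toy example: 2 vertices, 2 possible edges per pair, m = 3.
Xi = [[2, 2], [2, 2]]
A = [[1, 0], [1, 1]]

uniform = [[1.0, 1.0], [1.0, 1.0]]
p_uniform = wallenius_probability(A, Xi, uniform)

# Lower propensity for the pair (0, 1), which has no observed edge.
biased = [[1.0, 0.1], [1.0, 1.0]]
p_biased = wallenius_probability(A, Xi, biased)
```

With all propensities equal, the result coincides with the hypergeometric probability of Eq. (2), here $8/\binom{8}{3} = 1/7$. Lowering the propensity of a pair that carries no observed edge makes the observed network more likely.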

For clarity, we will focus on simple hypotheses such as those described in the previous section. However, the results shown do hold for any combination of $\boldsymbol{\Xi}$ and $\boldsymbol{\Omega}$. In the particular case of encoding group structures, $\boldsymbol{\Omega}$ takes the following form:

$$\Omega_{ij} := \omega_{g_i, g_j}, \tag{4}$$

where $g_i$ is the group of agent $i$, $g_j$ is the group of agent $j$, and $\omega_{g_i, g_j}$ is the propensity of sampling an edge between group $g_i$ and group $g_j$. In the presence of 2 different groups of vertices $(A, B)$, there are 3 possible values that $\Omega_{ij}$ can take: $\omega_{AA}$, $\omega_{BB}$, $\omega_{AB}$ (assuming that $\omega_{AB} = \omega_{BA}$). The ratio $\omega_{AA}/\omega_{AB}$ gives the odds between sampling an edge within group A and an edge between groups A and B, given a value of $\Xi$.

We now illustrate how the LR test is used to test a null-hypothesis against an alternative hypothesis about the observed system. The data are used to define a graph $\mathcal{G}$ with adjacency matrix $\mathbf{A}$. Let $H_r$ be some statistical hypothesis. Here, we always assume that each hypothesis is defined by a gHypEG model $X_r$ that can be encoded by a propensity matrix $\boldsymbol{\Omega}_r$ and a combinatorial matrix $\boldsymbol{\Xi}_r$. Each model is characterized by several free parameters that we want to fit to the data $\mathcal{G}$, such that the probability of observing $\mathcal{G}$ is maximized. This requirement corresponds to performing a maximum likelihood estimation (MLE) of the free parameters.

Likelihood-ratio statistic.

Assume now we have two hypotheses we want to test against each other. Let $H_0$ denote the null-hypothesis and let $H_a$ denote the alternative. The corresponding models are defined in terms of $\boldsymbol{\Omega}_0, \boldsymbol{\Xi}_0$ and $\boldsymbol{\Omega}_a, \boldsymbol{\Xi}_a$. To test the alternative hypothesis against the null, we use the likelihood-ratio statistic $\lambda(0, a)$, defined as follows:

Definition 1 (Likelihood-ratio statistic). Let $\mathcal{G}$ be a graph, $X_0$ be the model corresponding to the null-hypothesis, and $X_a$ the model corresponding to the alternative hypothesis. The likelihood-ratio statistic $\lambda(0, a)$ is given by

$$\lambda(0, a) := \frac{L(\boldsymbol{\Xi}_0, \boldsymbol{\Omega}_0 \mid \mathbf{A})}{\sup\left( L(\boldsymbol{\Xi}_0, \boldsymbol{\Omega}_0 \mid \mathbf{A}),\, L(\boldsymbol{\Xi}_a, \boldsymbol{\Omega}_a \mid \mathbf{A}) \right)}, \tag{5}$$

where $L(\boldsymbol{\Xi}_r, \boldsymbol{\Omega}_r \mid \mathbf{A}) = \Pr(\mathcal{G} \mid \boldsymbol{\Xi}_r, \boldsymbol{\Omega}_r)$ denotes the likelihood of model $X_r$ given the network $\mathcal{G}$.

Through the likelihood-ratio statistic, we can perform two types of tests. First, we can perform a standard model selection test to compare a simpler model against a more complex model. This test corresponds to verifying whether there is enough evidence in the data to justify the more complex model or whether the simpler model fits the data well enough. In this scenario, the simple model corresponds to the null-hypothesis, while the more complex model corresponds to the alternative.

Second, the likelihood-ratio test can be used to perform a goodness-of-fit test. This test allows verifying the quality of the fit of a model $X_r$. By defining the alternative hypothesis with a model $X_{\text{full}}$ that perfectly reproduces the observed data (in expectation), we can test whether the fit of the model $X_r$ is as good as such an overfitting model [21]. In the framework of gHypEGs, the alternative hypothesis is obtained by specifying the parameter matrix $\boldsymbol{\Omega}_{\text{full}}$ such that the expectation of $X_{\text{full}}$ corresponds to the observation $\mathcal{G}$.
This model is the maximally complex model that can be specified with a gHypEG and has as many free parameters as entries in the adjacency matrix [9].

Let us now assume that the two models corresponding to the alternative and null hypotheses are nested. This means that $\boldsymbol{\Xi}_0$ can be written as a special case of $\boldsymbol{\Xi}_a$, and $\boldsymbol{\Omega}_0$ as a special case of $\boldsymbol{\Omega}_a$. Thus, the null-model (with fewer parameters) can be formulated by constraining some of the alternative model parameters. Thanks to Wilks' theorem [24], if the two models are nested, the number of observations $m$ is large, and the observations are independent, the distribution of $\lambda$ under the null-hypothesis can be written in terms of

$$D(0, a) := -2 \log(\lambda(0, a)), \tag{6}$$

and can be approximated by the $\chi^2$ distribution with as many degrees of freedom as the difference in degrees of freedom between the two models. Letting $\nu$ be the difference in degrees of freedom between the null and the alternative models, the $p$-value of the likelihood-ratio test between the two hypotheses is computed as follows:

$$p\text{-value} := \Pr(\chi^2(\nu) \geq D(0, a)). \tag{7}$$

We reject the null hypothesis in favour of the alternative if the $p$-value is smaller than some threshold $\alpha$.

Distribution of $\lambda$ under the null-hypothesis. The question that remains to be answered is whether the conditions provided by multi-edge network data allow the application of Wilks' theorem. Unfortunately, in most real-world scenarios, the answer to this question is negative. This is a known issue in statistics, where it arises in the context of multinomial goodness-of-fit tests [13, 19, 20, 29]. Because Wallenius' multivariate non-central hypergeometric distribution converges to the multinomial distribution (a formal proof can be found in [32]), we use the results obtained for multinomial tests to find a better approximation for the null distribution of $D(0, a)$ than the $\chi^2$ approximation of Wilks' theorem. Specifically, following the work of Smith et al. [29], we propose approximating the distribution of $D(0, a)$ with a Beta distribution.
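For reference, the standard Wilks-based computation of Eqs. (5) to (7) can be sketched as follows. This is the baseline that the Beta approximation is meant to replace for multi-edge network data. The function names are hypothetical, and the $\chi^2$ tail probability is computed with a recurrence on the regularized upper incomplete gamma function, which is a convenience of this example (valid for integer $\nu \geq 1$) rather than anything prescribed by the article.

```python
from math import erfc, exp, gamma, sqrt


def deviance(loglik_null, loglik_alt):
    """D(0, a) = -2 log(lambda(0, a)), with lambda as in Eq. (5).

    The sup in the denominator of Eq. (5) keeps lambda <= 1, hence D >= 0.
    """
    log_lambda = loglik_null - max(loglik_null, loglik_alt)
    return -2.0 * log_lambda


def chi2_sf(x, nu):
    """Pr(chi^2(nu) >= x) for integer nu >= 1, as used in Eq. (7).

    Uses Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1), starting
    from Q(1, y) = exp(-y) (even nu) or Q(1/2, y) = erfc(sqrt(y)) (odd nu).
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if nu % 2 == 0:
        a, q = 1.0, exp(-y)
    else:
        a, q = 0.5, erfc(sqrt(y))
    while a < nu / 2.0:
        q += y ** a * exp(-y) / gamma(a + 1.0)
        a += 1.0
    return q


# Hypothetical log-likelihoods of a null and an alternative model.
D = deviance(loglik_null=-10.0, loglik_alt=-7.0)
p_wilks = chi2_sf(D, nu=2)
```

As the article argues, a p-value obtained this way relies on independent observations and should be treated with caution when the data are multi-edge networks.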

Theorem 1 (Convergence in distribution of likelihood-ratio statistics). The distribution under the null-hypothesis of $D(0, a)$, defined as in Eq. (6), for $m$ large tends towards a Beta$(\alpha, \beta)$ distribution with parameters

$$\alpha = \frac{\mu[D(0, a)]}{M \cdot \sigma^2[D(0, a)]} \cdot \left( \mu[D(0, a)] \cdot (M - \mu[D(0, a)]) - \sigma^2[D(0, a)] \right), \tag{8}$$

and

$$\beta = \frac{(M - \mu[D(0, a)]) \cdot \alpha}{\mu[D(0, a)]}, \tag{9}$$

where $M$ denotes the upper limit of the image of $D(0, a)$, $\mu[D(0, a)]$ its expectation and $\sigma^2[D(0, a)]$ its variance.

In some special cases there exist analytical solutions for $\mu[D(0, a)]$ and $\sigma^2[D(0, a)]$ [29]. However, in most situations, we resort to a numerical estimation of them. While a general solution would be optimal, the numerical estimation of the parameters can nevertheless be performed with ease, thanks to the ability to generate samples provided by gHypEG models.

The result provided by Theorem 1 greatly helps when performing likelihood-ratio tests involving multi-edge network data. In fact, to estimate the full null distribution of the likelihood-ratio statistic numerically, we would need a considerable number of realisations. In the case of large networks, this is infeasible. Exploiting Theorem 1 instead, we only need to estimate the first two moments of the distribution under the null hypothesis, which can be done reliably with a smaller number of realisations [28].

For example, in Fig. 2, we show the results of applying the Kolmogorov-Smirnov test to compare the distribution of the LR statistic against the Beta distribution fitted to increasing sample sizes. The example is constructed from a random graph with 40 vertices and 500 undirected edges. The edges are generated according to the hypergeometric configuration model, and the likelihood-ratio test is performed comparing a regular model against the generating configuration model.
To build the empirical distribution of the likelihood-ratio statistic, we take $S = 500\,000$ samples under the null hypothesis, and we compute the parameters of the asymptotic Beta distribution from an increasing number of independent samples. The results show that with a limited sample size $s \sim 1000$, most of the observations give a p-value for the Kolmogorov-Smirnov test larger than the significance threshold. I.e., with a limited sample size $s$, the empirical null-distribution obtained from $S$ is not significantly different from the Beta distribution whose parameters have been estimated from the $s$ realizations. That points to the fact that the asymptotic results of Theorem 1 are acceptable even for a finite number of observations and small sample sizes.

The R package ghypernet [10] provides an implementation of the likelihood-ratio test for gHypEG models. The package is Open Source and can be obtained from the CRAN R package repository (https://ghyper.net).

Figure 2: P-values of a Kolmogorov-Smirnov test for the likelihood-ratio null-distribution. Each point is the p-value (y-axis) of the median statistic resulting from 500 tests of the empirical distribution against the asymptotic Beta distribution, computed for an increasing sample size (x-axis). The length of the lines denotes 1.5 interquartile ranges. The horizontal red line shows the significance threshold for the Kolmogorov-Smirnov test. A clear trend is evident: the p-value increases with increasing sample size. This highlights an increase in the goodness-of-fit of the limiting Beta distribution to the empirical distribution.
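The moment matching of Eqs. (8) and (9) is short enough to sketch directly. In the example below, the function names are hypothetical and the "null samples" are synthetic draws from a scaled Beta distribution, used only to check that the parameters are recovered. In an actual analysis, the samples would be values of $D(0, a)$ computed on realizations of the null model, and the upper limit $M$ would have to be supplied by the analyst.

```python
import random
from statistics import mean, variance


def beta_parameters(mu, var, M):
    """Beta(alpha, beta) parameters from Eqs. (8) and (9).

    mu and var are the mean and variance of D(0, a) under the null,
    M is the upper limit of the image of D(0, a).
    """
    alpha = mu / (M * var) * (mu * (M - mu) - var)
    beta = (M - mu) * alpha / mu
    return alpha, beta


def beta_parameters_from_samples(samples, M):
    """Estimate the first two moments from samples, then apply Eqs. (8)-(9)."""
    return beta_parameters(mean(samples), variance(samples), M)


# Synthetic stand-in for null samples of D(0, a): M * Beta(2, 3) draws.
rng = random.Random(0)
M = 10.0
samples = [M * rng.betavariate(2.0, 3.0) for _ in range(20000)]

alpha_hat, beta_hat = beta_parameters_from_samples(samples, M)
```

For a variable on $[0, 10]$ with mean 4 and variance 4, the formulas give exactly $\alpha = 2$ and $\beta = 3$, which matches the moments of a Beta(2, 3) distribution rescaled by $M = 10$.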

We provide two short case studies about the application of the likelihood-ratio test. First, we generate a random undirected graph with $n = 100$ vertices and $m = 400$ directed edges uniformly distributed between all vertex pairs. Utilizing the likelihood-ratio test, we can test the null-hypothesis (a) that each vertex has the same potential of interactions against the alternative hypothesis (b) that different agents have different interaction potentials. As explained in the previous sections, (a) is encoded by a regular model with one parameter, and (b) by a configuration model. This test corresponds to testing whether the degree distribution deviates from that of the regular model. We expect the test to return high p-values because we choose the null-hypothesis to match the random graph's generating process. The results, obtained from 1000 repetitions of the experiment, confirm this hypothesis with a p-value of the median $\lambda$ of .

Similarly, we perform the same experiment generating a random undirected graph from the standard configuration model with a heterogeneous degree distribution. To ease the comparison with the example above, we define it by a degree sequence sampled from a geometric distribution with mean chosen such that the expected number of edges in the graph is $m = 400$. This effectively corresponds to generating data according to the hypothesis (b). In this case, we expect small p-values from the same test as before because the generating model of the data corresponds to the alternative hypothesis. Repeating the experiment 1000 times, for the largest recorded $\lambda$ we obtain a p-value $< 10^{-20}$.

The second experiment we perform is with an empirical graph. We use Zachary's Karate Club (ZKC) [31] as a test case. ZKC consists of 34 vertices and 231 undirected multi-edges. As with most empirical graphs, its degree sequence is skewed (empirical skewness is ). Hence, we expect that the test we performed before, comparing the null-hypothesis of a regular model against the hypergeometric configuration model, should reject the null-hypothesis. Performing such a likelihood-ratio test gives a p-value $< 10^{-20}$, which confirms our expectations.

We can further exploit this example to compare the empirical distribution of $D(0, a)$ under the null hypothesis with the $\chi^2$ distribution and the Beta distribution. The result is shown in Fig. 3a. Note that the value of $D(0, a)$ for ZKC is , which is out of scale on the right side of the x-axis of Fig. 3a. In this case, it appears that the Beta distribution and the $\chi^2$ distribution provide similar fits. However, when we perform a two-sided Kolmogorov-Smirnov test, we get a p-value of for the $\chi^2$ distribution, which means that we can reject the null hypothesis that the empirical distribution follows a $\chi^2$. Performing the same test for the Beta distribution, we get a p-value of . That means that we cannot reject the null-hypothesis that the empirical distribution follows a Beta. While this is not enough to claim that the distribution of $D(0, a)$ is a Beta, it gives confidence in using the asymptotic results of Theorem 1.

We perform a second experiment to highlight how much, in extreme cases, the $\chi^2$ distribution can deviate from the empirical distribution of $D(0, a)$ under the null-hypothesis. For the ZKC, we now perform a goodness-of-fit test of the hypergeometric configuration model. The alternative hypothesis is thus encoded by the maximally complex model that can be fitted with gHypEGs, which fixes the expected graph to be the observed one, as explained above. The fit of its parameters is performed as described in [9]. The test can be interpreted as measuring how well the null-model fits the data, which is entirely encoded in the full model. This test results in a p-value of . That means that the configuration model is not a good model for the ZKC. This result is hardly surprising, given the well-known community structure present in the empirical graph. In Fig. 3b, we show the empirical distribution of $D(0, a)$. There, we notice that while the Beta distribution provides a visually good fit, the $\chi^2$ distribution is heavily shifted to the right. The two-sample Kolmogorov-Smirnov test confirms this result, providing a p-value of for the Beta distribution, and $< 2.2 \times 10^{-16}$ for the $\chi^2$ distribution.

In this last example, it is essential to note that using the $\chi^2$ distribution would provide a misleading result. Comparing the value of $D(0, a)$ for the empirical graph, we see that it is on the right tail of the $\chi^2$ distribution. Computing a p-value from this distribution would result in a p-value of $\approx 0.005$. That means that in this case, we would only weakly reject the null-hypothesis, giving the wrong impression that the ZKC could come from an extreme realization of a simple configuration model. However, this is ruled out by looking at the empirical distribution of the likelihood-ratio statistic or simply comparing the empirical graph with a realization from the hypergeometric configuration model [11].

Figure 3: Empirical distribution of $D(0, a)$ for two likelihood-ratio tests performed on ZKC (x-axis: $D(0, a)$; y-axis: CCDF). In the top figure (a), we perform a likelihood-ratio test for the null-hypothesis that ZKC comes from a regular model against the alternative hypothesis that ZKC comes from a configuration model. The results show that there is strong evidence to reject the null hypothesis. In the bottom figure (b), we perform a goodness-of-fit test for the hypergeometric configuration model on ZKC. Also in this case, the results show a bad fit of the null-model. While in the top figure the $\chi^2$ and the Beta approximations of the likelihood-ratio statistic's empirical distribution give a relatively good fit, in the bottom case it is clear that the $\chi^2$ does not approximate the empirical distribution well. The shaded area denotes the empirical distribution of $D(0, a)$ under the null-hypothesis, the orange line its $\chi^2$ approximation, and the green line its Beta approximation. The vertical line denotes the value of the likelihood-ratio statistic for ZKC. In the top figure, such a line is out of the boundaries on the right side and hence not plotted.

The study of complex systems is intertwined with network science and advanced multivariate statistics. Hypothesis testing and model selection methods, in particular, need to account for the complexity underlying observations from such systems. Because interactions between system agents tend not to be independent, many standard statistical methods should be employed with care when dealing with network data.

This article has investigated how the likelihood-ratio test needs to be adapted to deal with network models. Likelihood-ratio tests provide a practical methodology for selecting between different network models and testing statistical hypotheses. However, the characteristics of multi-edge networks require us to adapt the test's null-distribution to account for the underlying complexity of network data. When this is not done, we incur the risk of over- (or under-) estimating the p-values of the statistical test, generating contradictory results, as shown in the case study above. With Theorem 1, we provide the means to correctly estimate the p-values for likelihood-ratio tests by means of a Beta distribution. Finally, we provide an implementation of the methods described through the Open Source R package ghypernet. Even though our analysis is focused on the likelihood-ratio test, similar issues may arise with other statistical tools applied to complex networks.

The main limitation of the results presented in this article is the need to numerically estimate the first two empirical moments of the statistic's null distribution. Although this can be performed easily using our implementation, we will investigate analytical asymptotic estimates for the required parameters in future research.

Acknowledgements

The author thanks Frank Schweitzer for his support and Georges Andres for his detailed comments.

References

[1] Akaike, H. (1973). Information Theory and an Extension of the Maximum Likelihood Principle. In: B. Petrov; F. Csaki (eds.), International Symposium on Information Theory. pp. 267–281.

[2] Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 716–723.

[3] Box, G. E. P.; Jenkins, G. M.; Reinsel, G. C. (1994). Time Series Analysis: Forecasting and Control. Wiley Series in Probability and Statistics, Wiley. ISBN 9781118619063.

[4] Brandenberger, L.; Casiraghi, G.; Andres, G.; Schweighofer, S.; Schweitzer, F. (2021). Why Online does not Equal Offline: Comparing Online and Real-World Political Support Among Politicians. arXiv preprint.

[5] Brandenberger, L.; Casiraghi, G.; Nanumyan, V.; Schweitzer, F. (2019). Quantifying triadic closure in multi-edge social networks. In: Proceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. New York, NY, USA: ACM, pp. 307–310. ISBN 9781450368681.

[6] Burnham, K. P.; Anderson, D. R. (eds.) (2004). Model Selection and Multimodel Inference. New York, NY: Springer New York. ISBN 978-0-387-95364-9.

[7] Casiraghi, G. (2017). Multiplex Network Regression: How do relations drive interactions? arXiv preprint arXiv:1702.02048.

[8] Casiraghi, G. (2019). The block-constrained configuration model. Applied Network Science, 123.

[9] Casiraghi, G.; Nanumyan, V. (2018). Generalised hypergeometric ensembles of random graphs: the configuration model as an urn problem. arXiv preprint arXiv:1810.06495.

[10] Casiraghi, G.; Nanumyan, V. (2020). GHYPERNET v1.0.1: Fit and Simulate Generalised Hypergeometric Ensembles of Graphs.

[11] Casiraghi, G.; Nanumyan, V.; Scholtes, I.; Schweitzer, F. (2016). Generalized Hypergeometric Ensembles: Statistical Hypothesis Testing in Complex Networks. arXiv preprint arXiv:1607.02441.

[12] Casiraghi, G.; Nanumyan, V.; Scholtes, I.; Schweitzer, F. (2017). From Relational Data to Graphs: Inferring Significant Links Using Generalized Hypergeometric Ensembles. In: C. G.; M. A.; Y. T. (eds.), Social Informatics. SocInfo 2017. Lecture Notes in Computer Science. Cham: Springer, pp. 111–120. ISBN 978-3-319-67255-7.

[13] Chapman, J.-A. W. (1976). A Comparison of the Chi squared, -2 log R, and Multinomial Probability Criteria for Significance Tests When Expected Frequencies are Small. Journal of the American Statistical Association, 854–863.

[14] Chesson, J. (1978). Measuring Preference in Selective Predation. Ecology, 211–215.

[15] Erdős, P.; Rényi, A. (1959). On random graphs I. Publ. Math. Debrecen, 156.

[16] Fosdick, B. K.; Larremore, D. B.; Nishimura, J.; Ugander, J. (2018). Configuring random graph models with fixed degree sequences. SIAM Review, 315–355.

[17] Heider, F. (1946). Attitudes and Cognitive Organization. The Journal of Psychology, 107–112.

[18] Karrer, B.; Newman, M. E. J. (2011). Stochastic blockmodels and community structure in networks. Phys. Rev. E, 16107.

[19] Koehler, K. J.; Larntz, K. (1980). An Empirical Investigation of Goodness-of-Fit Statistics for Sparse Multinomials. Journal of the American Statistical Association, 336–344.

[20] Larntz, K. (1978). Small-Sample Comparisons of Exact Levels for Chi-Squared Goodness-of-Fit Statistics. Journal of the American Statistical Association, 253–263.

[21] Lehmann, E. L.; Romano, J. P. (eds.) (2005). Testing Statistical Hypotheses. Springer Texts in Statistics, New York, NY: Springer New York. ISBN 978-0-387-98864-1.

[22] Mondragón, R. J. (2020). Estimating degree–degree correlation and network cores from the connectivity of high–degree nodes in complex networks. Scientific Reports, 5668.

[23] Peixoto, T. P. (2013). Parsimonious Module Inference in Large Networks. Physical Review Letters, 148701.

[24] Rao, C. R. (1973). Linear statistical inference and its applications, vol. 2. Wiley New York.

[25] Rivera, M. T.; Soderstrom, S. B.; Uzzi, B. (2010). Dynamics of Dyads in Social Networks: Assortative, Relational, and Proximity Mechanisms. Annual Review of Sociology, 91–115.

[26] Rosvall, M.; Bergstrom, C. T. (2008). Maps of random walks on complex networks reveal community structure. Proceedings of the National Academy of Sciences, 1118–1123.

[27] Schwarz, G. (1978). Estimating the Dimension of a Model. The Annals of Statistics, 461–464.

[28] Shore, H. (1995). Fitting a distribution by the first two moments (partial and complete). Computational Statistics & Data Analysis, 563–577.

[29] Smith, P. J.; Rae, D. S.; Manderscheid, R. W.; Silbergeld, S. (1981). Approximating the Moments and Distribution of the Likelihood Ratio Statistic for Multinomial Goodness of Fit. Journal of the American Statistical Association, 737–740.

[30] Wallenius, K. T. (1963). Biased Sampling: the Noncentral Hypergeometric Probability Distribution. Ph.D. thesis, Stanford University.

[31] Zachary, W. W. (1977). An Information Flow Model for Conflict and Fission in Small Groups. Journal of Anthropological Research, Vol. 33(No. 4), 452–473.

[32] Zingg, C.; Casiraghi, G.; Vaccario, G.; Schweitzer, F. (2019). What Is the Entropy of a Social Organization? Entropy 21(9).