Improving ERGM Starting Values Using Simulated Annealing
Christian S. Schmid and David R. Hunter
Pennsylvania State University
July 2020
Abstract
Much of the theory of estimation for exponential family models, which include exponential-family random graph models (ERGMs) as a special case, is well established, and maximum likelihood estimates in particular enjoy many desirable properties. However, in the case of many ERGMs, direct calculation of MLEs is impossible and therefore methods for approximating MLEs and/or alternative estimation methods must be employed. Many MLE approximation methods require alternative estimates as starting points. We discuss one class of such alternatives here. The MLE satisfies the so-called "likelihood principle," unlike the MPLE. This means that different networks may have different MPLEs even if they have the same sufficient statistics. We exploit this fact here to search for improved starting values for approximation-based MLE methods. The method we propose has shown its merit in producing an MLE for a network dataset and model that had defied estimation using all other known methods.
1. Introduction

Given a network-valued random variable $A$ and a set of sufficient statistics $T\colon \mathcal{A} \to \mathbb{R}^q$, $a \mapsto (T_1(a), \ldots, T_q(a))$, the exponential-family random graph model (ERGM) takes the form

$$P_\theta(A = a) = \frac{\exp\{\theta^\top T(a)\}}{k(\theta)} \quad \text{for } a \in \mathcal{A}, \qquad (1)$$

where $\mathcal{A}$ is the sample space of allowable networks and $\theta \in \mathbb{R}^q$ is a vector of parameters. In many applications, $\mathcal{A}$ denotes the entire set $\{a \in \mathbb{R}^{N \times N} : a_{ij} = a_{ji} \in \{0, 1\},\ a_{ii} = 0\}$ of possible undirected networks on $N$ nodes, while in other applications $\mathcal{A}$ may be constrained to be a proper subset of this set. (If we drop the requirement that $a_{ij} = a_{ji}$, then the sample space consists of directed networks.) The sufficient statistics $T(a)$ play a central role in the model, since they enable the inclusion of traditional exogenous covariates like a node's age as well as endogenous statistics, i.e., statistics that allow for inference on the structure of the network. Popular endogenous statistics are a network's number of triangles or the number of pairs of ties that share one common node (two-stars). The normalizing constant

$$k(\theta) := \sum_{a^* \in \mathcal{A}} \exp\{\theta^\top T(a^*)\}, \qquad (2)$$

a weighted sum over all possible networks in the sample space, assures that (1) defines a probability model.

Maximizing the log-likelihood function

$$\ell(\theta) = \theta^\top T(A_{\mathrm{obs}}) - \log k(\theta) \qquad (3)$$

of the ERGM results in the maximum likelihood estimator, or MLE. The MLE can be difficult to calculate directly due to the required calculation of $k(\theta)$, a sum which is only feasible for particular choices of $T(a)$ for which the sum may be simplified, or for sample spaces small enough to allow direct calculation. Even for a small number of nodes, the sample space can be prohibitively large; for instance, there are over $3.5 \times 10^{13}$ networks on just $N = 10$ nodes, and an additional node inflates this number by a factor of over 1000. It is therefore often necessary to use an approximation method when an MLE is desired, as exact calculation is not possible.

An alternative method of estimation, known as maximum pseudolikelihood estimation (MPLE), has the desirable property that it is relatively easy to calculate even in cases where an exact MLE is elusive. This article first discusses approximate maximum likelihood estimation and then summarizes how MPLE works and explains why, even in cases where an MLE is desired, the approximation method often begins by calculating the MPLE. Then, in Section 4, we explain a novel method to augment the approximation of the MLE by exploiting the failure of the MPLE to satisfy the so-called likelihood principle. We demonstrate the use of this idea in Section 5.

2. Markov Chain Monte Carlo Maximum Likelihood Estimation

An expedient to the problem of maximizing the often-intractable likelihood function (3) is given by the Markov chain Monte Carlo maximum likelihood estimator (MCMLE), first proposed by Geyer and Thompson (1992) and then adapted to the ERGM framework by Snijders (2002) and Hunter and Handcock (2006). The MCMLE is based on the idea that for any $\theta, \theta_0 \in \mathbb{R}^q$,

$$\frac{k(\theta)}{k(\theta_0)} = E_{\theta_0} \exp\{(\theta - \theta_0)^\top T(A)\}, \qquad (4)$$

where the expectation is taken assuming that the random $A$ has the distribution $P_{\theta_0}$. For a given $\theta_0$, Equation (4) may be exploited by sampling a large number of networks $A_1, \ldots, A_L$ from the distribution $P_{\theta_0}$. Then one may approximate and optimize the difference of log-likelihood functions

$$\ell(\theta) - \ell(\theta_0) \approx (\theta - \theta_0)^\top T(A_{\mathrm{obs}}) - \log\left(\frac{1}{L} \sum_{i=1}^{L} \exp\{(\theta - \theta_0)^\top T(A_i)\}\right). \qquad (5)$$

In theory, the MCMLE algorithm works for any starting value $\theta_0$; however, Hummel et al. (2012) show that approximation (5) is best when $\theta_0$ is close to $\theta$, and furthermore the approximation can degrade badly when these parameter values are not close. For this reason, in practice it is necessary to choose $\theta_0$ close to a maximizer of $\ell(\theta)$ or else the MCMLE idea fails.
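To make Approximation (5) concrete, the sketch below simulates networks at $\theta_0$ with the ergm package and maximizes the resulting approximate log-likelihood difference. The flomarriage network that ships with ergm, the two-statistic model, and the value of $\theta_0$ are illustrative assumptions rather than anything analyzed in this article.

```r
library(ergm)   # also loads the network package

data(florentine)                       # flomarriage: small example network
form   <- flomarriage ~ edges + triangle
t.obs  <- summary(form)                # T(A_obs)

theta0 <- c(-1.7, 0.0)                 # illustrative auxiliary parameter
L      <- 1000                         # number of simulated networks

# Sample A_1, ..., A_L from P_{theta0}; output = "stats" keeps only T(A_i)
sims <- as.matrix(simulate(form, coef = theta0, nsim = L, output = "stats"))

# Right-hand side of Approximation (5) as a function of theta
loglik.diff <- function(theta) {
  sum((theta - theta0) * t.obs) -
    log(mean(exp(sims %*% (theta - theta0))))
}

# Maximizing the approximation yields one MCMLE update of theta
optim(theta0, loglik.diff, control = list(fnscale = -1))$par
```

As the article notes, this approximation is only trustworthy when $\theta_0$ is already near the maximizer, which is the entire motivation for seeking good starting values.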
The most common choice of $\theta_0$ is the maximizer of the so-called pseudolikelihood function. To construct the pseudolikelihood, we focus on the individual values of the tie indicators $A_{ij}$. A straightforward calculation shows that for a particular $i, j$, the conditional distribution of $A_{ij}$ given the rest of the network $A_{ij}^c$ is calculated from Equation (1) to be

$$P_\theta(A_{ij} = 1 \mid A_{ij}^c = a_{ij}^c) = \operatorname{logit}^{-1}\left[\theta^\top\left(T(a_{ij}^+) - T(a_{ij}^-)\right)\right], \qquad (6)$$

where the inverse logit function is defined by $\operatorname{logit}^{-1}(x) = \exp\{x\}/(1 + \exp\{x\})$ and the networks $a_{ij}^+$ and $a_{ij}^-$ are formed by setting the value of $a_{ij}$ to be one or zero, respectively, while fixing the rest of the network at $a_{ij}^c$. We define $(\Delta a)_{ij} := T(a_{ij}^+) - T(a_{ij}^-)$ and refer to $(\Delta a)_{ij}$ as the $i, j$ vector of change statistics. We may therefore rewrite Equation (6) as

$$P_\theta(A_{ij} = 1 \mid A_{ij}^c = a_{ij}^c) = \operatorname{logit}^{-1}\left[\theta^\top (\Delta a)_{ij}\right] \quad \text{for all } i, j.$$

Under the additional assumption that the $A_{ij}$ are all independent of one another, these equations together define a logistic regression model, and the maximum likelihood estimator for this logistic regression is known as the maximum pseudo-likelihood estimator (MPLE) because the assumption of independence is not justified in all cases and therefore the logistic regression likelihood function is sometimes misspecified. Despite this misspecification, the MPLE is frequently used as an estimator of $\theta$ because it is easily obtained using logistic regression software. Indeed, the MPLE has a lengthy history in the literature on ERGMs; see Schmid and Hunter (2020) for more details.

In particular, there is a substantial literature on the use of the MPLE as an estimator in its own right. For example, Schmid and Hunter (2020) argue that when its covariance is properly estimated, the MPLE can allow for valid statistical inference just as the MLE can. On the other hand, for the most part it is assumed (see, e.g., van Duijn et al., 2009) that the MLE is preferable to the MPLE. Indeed, one might prefer the MLE to the MPLE simply based on the classical principle, as articulated by John Tukey for example: "Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise" (Tukey, 1962). For the purposes of this article, the MPLE will serve merely as a value $\theta_0$ used in Approximation (5), and we assume that the ultimate goal is to obtain an approximate MLE via MCMLE.
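Because the pseudolikelihood is an ordinary logistic regression likelihood on the change statistics, it can be fit with standard tools. The sketch below uses ergm's ergmMPLE() function to extract the dyad-level responses, change statistics, and weights; the flomarriage model is again only an illustrative stand-in.

```r
library(ergm)
data(florentine)

# Dyad-level response a_ij, change statistics (Delta a)_ij, and weights
# counting how often each distinct row occurs
mp <- ergmMPLE(flomarriage ~ edges + triangle, output = "matrix")

# The MPLE is the MLE of the implied (weighted) logistic regression
fit <- glm(mp$response ~ mp$predictor - 1,
           weights = mp$weights, family = binomial())
coef(fit)

# ergm() computes the same estimate directly:
coef(ergm(flomarriage ~ edges + triangle, estimate = "MPLE"))
```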
3. Properties of the Maximum Likelihood Estimator

The theory of exponential family models is well established in general; Barndorff-Nielsen (1978) gives an extensive book-length treatment of this theory. As a particularly useful example in the context of ERGMs, for exponential family distributions the expectation of the $T$ vector under the MLE equals the sufficient statistic on the observed network, i.e., $E_{\hat\theta}[T(A)] = T(A_{\mathrm{obs}})$. In other words, the MLE equals the method of moments estimator. This aligns with one's general expectation that the distribution described by an estimate should on average describe the observed network. Furthermore, this fact provides a useful means for checking that a potential maximizer of the approximate log-likelihood function is in fact close to the MLE, as one may simulate networks from the distribution derived from the MLE and check that their sample mean is approximately equal to $T(A_{\mathrm{obs}})$.
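In practice this check is easy to script: simulate from the candidate distribution and compare the average simulated statistics to the observed ones. A minimal sketch, once more using the illustrative flomarriage model:

```r
library(ergm)
data(florentine)
form <- flomarriage ~ edges + triangle

fit   <- ergm(form)        # candidate approximate MLE, theta-hat
t.obs <- summary(form)     # T(A_obs)

# Networks simulated from P_{theta-hat} should reproduce T(A_obs) on average
sims <- as.matrix(simulate(fit, nsim = 500, output = "stats"))
rbind(simulated.mean = colMeans(sims), observed = t.obs)
```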
Another property of the MLE is that it satisfies the likelihood principle (Barnard et al., 1962; Birnbaum, 1962), which states that all information in the data relevant to the model parameters is contained in the likelihood function. In the ERGM context, this means that an estimator should depend on $A_{\mathrm{obs}}$ only through $T(A_{\mathrm{obs}})$, as the likelihood itself depends on $A_{\mathrm{obs}}$ only through $T(A_{\mathrm{obs}})$. This means that two networks with the same sufficient statistic will yield the same MLE. However, as Corander et al. (1998) point out, the MPLE does not satisfy the likelihood principle. This observation forms the basis of the remainder of this article.

The failure of the MPLE to satisfy the likelihood principle means that two networks $A_1$ and $A_2$ may have different MPLEs even if $T(A_1) = T(A_2)$. We see this fact illustrated in Figure 1, which depicts numerous possible MPLE values, each of which results from a network on 9 nodes with the same values of $T$. Also depicted in Figure 1 is the mean value parameter space, an alternative to the natural parameter space $\mathbb{R}^q$, which is defined as the interior $C$ of the convex hull of the sufficient statistic's sample space. Each parameter $\theta$ can be uniquely projected to a point in mean value parameter space by a bijective function $\mu\colon \mathbb{R}^q \to C$. For exponential family distributions, this function is defined as $\mu(\theta) = \nabla \log k(\theta) = E_\theta[T(A)]$. In other words, a parameter $\theta$'s corresponding point in mean value space is the expectation of the sufficient statistic with respect to the distribution defined by $\theta$. This means that $T(A_{\mathrm{obs}})$ is the MLE's projection into mean value parameter space, which is an interesting fact in its own right, since the MLE in natural parameter space is hard to find, but finding its counterpart in mean value space is trivially easy.

Figure 1: Every grey dot represents the MPLE of a network on N = 9 nodes with 18 edges and 13 triangles using Model (7); the 'X' visualizes the MLE. (Upper panel: natural parameter space; lower panel: mean value parameter space; the axes correspond to the edges and triangles coordinates.)

More specifically, Figure 1 depicts the MLE and multiple MPLEs in the natural as well as in the mean value parameter space of networks on $N = 9$ nodes with exactly 18 edges and 13 triangles. The dashed lines in the lower plot visualize the boundary of the convex hull of the mean value parameter space. The networks were sampled from the $(\rho, \sigma, \tau)$-model of Frank and Strauss (1986) with $\sigma$ taken to be zero, which results in an ERGM with a 2-dimensional $T$ vector.
We fixed $\rho$ at a negative value and $\tau = 0.53$, which is equivalent to

$$P_\theta(A = a) \propto \exp\{\rho \cdot \mathrm{edges}(a) + 0.53 \cdot \mathrm{triangles}(a)\}. \qquad (7)$$

With only 9 nodes, it is computationally feasible to enumerate all possible $T(a)$ vectors for $a \in \mathcal{A}$, so it is possible to calculate the MLE exactly in this case. Schmid and Hunter (2020) demonstrate how to achieve this enumeration using the ergm package (Handcock et al., 2019) for the R computing environment (R Core Team, 2020). Although it is possible to enumerate all network statistics according to their multiplicity in this 9-node example, it is not computationally feasible to calculate every possible MPLE resulting from a network with 18 edges and 13 triangles. Thus, Figure 1 uses the simulated annealing method explained in Section 4 to generate multiple such networks randomly.
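For the 9-node example, the enumeration relies on counting statistic vectors together with their multiplicities, as in Schmid and Hunter (2020). A naive version of the same exact-likelihood idea is easy to write down for N = 5 nodes, where all 2^10 undirected graphs can be visited directly; the model (edges and triangles) mirrors Model (7), but the observed statistics below are an arbitrary illustration.

```r
# Brute-force exact MLE for an edges-plus-triangles ERGM on N = 5 nodes
N     <- 5
pairs <- t(combn(N, 2))   # the 10 dyads
trips <- t(combn(N, 3))   # the 10 node triples that could form triangles

stats <- function(bits) {               # bits: 0/1 vector over the dyads
  a <- matrix(0, N, N)
  a[pairs] <- bits
  a <- a + t(a)
  tri <- sum(apply(trips, 1, function(v)
    a[v[1], v[2]] * a[v[1], v[3]] * a[v[2], v[3]]))
  c(edges = sum(bits), triangles = tri)
}

# Enumerate T(a) for every graph in the sample space
allT <- t(sapply(0:(2^10 - 1),
                 function(k) stats(as.integer(intToBits(k))[1:10])))

# Exact log-likelihood l(theta) = theta' T(a_obs) - log k(theta), as in (3)
loglik <- function(theta, t.obs) {
  sum(theta * t.obs) - log(sum(exp(allT %*% theta)))
}

# Exact MLE for an observed graph with 4 edges and 1 triangle
optim(c(0, 0), loglik, t.obs = c(4, 1), control = list(fnscale = -1))$par
```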
Another potential way to exploit the failure of the MPLE to satisfy the likelihood principle is via the so-called Rao-Blackwellization of the MPLE. The Rao-Blackwell theorem (Lehmann and Casella, 1998) states that any estimator that is not a function of the sufficient statistic $T(a)$ is inadmissible, meaning that for any estimator $\tilde\theta$ that is not a function of $T(a)$, one can find an improved estimator $\theta^*$ by taking the expectation of $\tilde\theta$ conditional on $T(a)$. This new estimator has lower risk than $\tilde\theta$ under any convex loss function, e.g., mean squared error.

In principle, computing the Rao-Blackwellized MPLE would require the distribution of the MPLE conditional on $T(A_{\mathrm{obs}})$. To put this idea into practice, a sample from this distribution could be obtained using, say, the simulated annealing method of Section 4. However, since the stochastic properties of the simulated annealing algorithm are not well understood, we have not explored this particular idea in the current manuscript.

As described earlier, maximum likelihood estimation by MCMC requires the selection of an auxiliary parameter $\theta_0$. Even though in theory the algorithm operates with any choice of $\theta_0$, in practice a parameter close to the MLE is essential for the MCMLE algorithm to work successfully (Hummel et al., 2012). The commonly used starting value for $\theta_0$ is the MPLE, since it is simple to calculate via logistic regression; even though other methods have been proposed (e.g., Krivitsky, 2017), the MPLE remains the predominant choice. As shown in Figure 1, however, the MPLE can in some cases be very different from the MLE, which typically results in the failure of the MCMLE algorithm. In our 9-node example, the MCMLE algorithm failed for about a third of the MPLEs shown in Figure 1 when they were used as starting values.

Figure 2: A network with 18 edges and 13 triangles.

We demonstrate the problems of the MPLE as a starting value for the MCMLE algorithm by taking a particular network on 9 nodes with 18 edges and 13 triangles, as depicted in Figure 2. The MPLE of this particular network and the exact MLE are marked in Figure 3. We ran ten independent trials of the MCMLE algorithm, choosing the starting value $\theta_0$ at each trial from among the values depicted in Figure 1. In four out of ten trials, the algorithm stopped due to model degeneracy. As first sketched by Handcock (2001) and later studied in detail by Schweinberger (2011), degeneracy occurs when the distribution defined by $P_{\hat\theta}$ (ostensibly the best possible parameter value, in some sense) places most of its probability mass on just a few networks, usually the empty and the full network. In this scenario, the simulated networks are so different from the observed network that the MCMLE algorithm fails. In six cases shown in Table 3, the algorithm provided an MCMLE. Yet three trials yielded estimates further away from the MLE than the MPLE, leading to estimates in natural parameter space close to the boundary of the convex hull and therefore clearly different from the true MLE based on the observed network. The only glimmer of hope is trial 5, which leads to an estimate somewhat similar to the MLE. This example shows that it is necessary to check that a putative MLE is producing values of $T$, on average, close to $T(A_{\mathrm{obs}})$. Once this check takes place, additional techniques such as those detailed in Hummel et al. (2012) may be employed to fine-tune the MLE. However, even using such techniques, many of the trials from Table 3 did not end successfully.

Figure 3: MCMLEs of the network in Figure 2 using Model (7) for ten independent trials. '+' indicates the MPLE, 'x' the MLE. (Axes: edges and triangles.)
4. Searching for Networks via Simulated Annealing

As noted earlier, Hummel et al. (2012) show that a poorly chosen $\theta_0$ can make successful MCML estimation almost impossible. Consequently, the more different the MPLE is from the MLE, the more difficult it is to find the MCMLE, since the MPLE is commonly taken as the algorithm's starting value $\theta_0$ due to its simple and fast calculation. Inspired by Figure 1, we propose a novel approach for finding an improved starting value for the MCMLE algorithm, namely, to search for networks that yield the same, or at least nearly the same, statistics as the observed network, then consider these networks' MPLEs as potential starting values.

We propose searching for networks with the same statistics as the observed network using the simulated annealing algorithm (Kirkpatrick et al., 1983). This algorithm was originally inspired by the practice of annealing metal, a procedure in which the material is heated and then cooled very slowly, allowing it to be gradually shaped into the desired form. The process of heating up and cooling down is translated into the simulated annealing approach by allowing interim results initially (during the "heated" phase) to be worse than previous results; that is, the algorithm is more stochastic than deterministic in its early phase. Gradually, the algorithm "cools," becoming more deterministic and less prone to random jumps. The hope is that the algorithm does not get stuck at local maxima that it otherwise might not be able to leave if a purely deterministic optimization algorithm were used. By starting in an initial random phase followed by increasingly deterministic behavior, the algorithm can find and then focus on areas of the search space that include globally optimal solutions.
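A minimal base-R sketch of such a search follows. Starting from an initial graph, it proposes single-tie toggles, always accepts proposals that move the statistics closer to the target, occasionally accepts worse ones with a probability that shrinks as the temperature drops, and stops when the target is matched. The L1 energy function and the geometric cooling schedule are conventional illustrative choices, not the exact implementation used for the figures in this article.

```r
# Simulated annealing search for a graph a with stats.fn(a) == t.target.
# 'a' is a binary adjacency matrix; 'stats.fn' returns its statistic vector.
anneal <- function(a, stats.fn, t.target,
                   steps = 50000, temp0 = 1, cooling = 0.9995) {
  dyads  <- t(combn(nrow(a), 2))              # all (i, j) pairs with i < j
  energy <- function(a) sum(abs(stats.fn(a) - t.target))
  e    <- energy(a)
  temp <- temp0
  for (s in seq_len(steps)) {
    d <- dyads[sample(nrow(dyads), 1), ]      # propose toggling one dyad
    b <- a
    b[d[1], d[2]] <- b[d[2], d[1]] <- 1 - b[d[1], d[2]]
    e.new <- energy(b)
    # Accept improvements always; accept worse states with probability
    # exp(-(e.new - e)/temp), which vanishes as the system cools
    if (e.new <= e || runif(1) < exp((e - e.new) / temp)) {
      a <- b
      e <- e.new
    }
    temp <- temp * cooling
    if (e == 0) break                         # statistics matched exactly
  }
  a
}
```

Running such a search repeatedly and recording each returned network's MPLE is what produces point clouds like those in Figures 1 and 4.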
5. Application: The E. coli Transcriptional Regulation Network

We demonstrate the simulated annealing-based approach to MCMLE using the E. coli transcriptional regulation network of Shen-Orr et al. (2002), which is based on the RegulonDB data of Salgado et al. (2001). The nodes in this network represent operons, while an edge from operon $i$ to $j$ indicates that $i$ encodes a transcription factor that regulates $j$. Even though this is originally a directed network that contains self-edges, i.e., operons that regulate themselves, we follow Saul and Filkov (2007) and treat this network as undirected and without self-edges. This results in a network with 519 edges and 418 nodes. We study the same ERGM on these data as Hummel et al. (2012), a model which yields MCMLEs considerably different from the MPLE, making estimation difficult. The model's statistics consist of the number of edges; the numbers of nodes with degrees two, three, four, and six; and the geometrically weighted degree distribution with the decay parameter fixed at 0.25. As demonstrated by Hummel et al. (2012), initializing the MCMLE algorithm with the MPLE does not produce successful results.

Figure 4: MPLEs of networks with the same sufficient statistics as the observed E. coli network. The MPLE of the observed network is marked with a '+', the MCMLE with an 'x'.

Figure 4 depicts the MPLEs of 10,000 networks (in grey) that have the same sufficient statistics as the original network, including the MPLE of the original network as well as the MCMLE.
All 10,000 networks were obtained using the simulated annealing algorithm, always starting with the observed network. It is remarkable that the MPLEs of these networks essentially form two clusters, one cluster that includes the MPLE and another that includes the MCMLE of the observed network. Among other things, this figure illustrates why finding the MCMLE by beginning the algorithm at the observed network's MPLE is a difficult task: the MPLE is simply not close enough to the MLE for the approximate log-likelihood based on an MPLE-based sample of networks to be effective. On the other hand, setting the starting value to one of the MPLEs that form the cluster around the MCMLE evidently makes the algorithm more likely to succeed.

A natural question that arises when considering Figure 4 is how the networks in the two clusters differ. Figure 5 visualizes a network from each cluster as well as the original E. coli network. Even though all three networks have the same sufficient statistics, it is possible to discern that the network in the MPLE cluster maintains some of the distinctive structures of the original network, while the network in the MCMLE cluster is more reminiscent of an Erdős–Rényi network, i.e., a network where each tie occurs independently with the same probability $p$. The fact that the occurrence of a tie in an Erdős–Rényi network does not depend on the occurrence or absence of any other tie results in a model in which the MPLE and the MLE are the same. Consequently, we conjecture that it is advantageous to find a network that resembles an Erdős–Rényi network and that, in addition, yields the same sufficient statistics as the observed network.

Figure 5: The left panel visualizes the E. coli transcriptional regulation network of Shen-Orr et al. (2002). The center panel depicts a network whose MPLE falls into the MPLE cluster of Figure 4. The right panel depicts a network whose MPLE falls into the MCMLE cluster of Figure 4. All three networks have the same sufficient statistics vector.

Simulating networks from the probability distribution defined by the obtained MCMLE results in networks that resemble the rightmost network in Figure 5, rather than the observed network. This consequently casts doubt on whether the statistics of this ERGM appropriately capture the unique structure of the observed network. Stated differently, if the MLE, ostensibly the gold standard among estimators, yields unsatisfying results, the model itself should be reconsidered. It is important to remember that in such cases, as indeed in this case, generally the MPLE also fails to result in simulated networks resembling the original network; thus, a poor model cannot generally be mended by using a different estimation technique.

We repeat the simulation study that resulted in Figure 4, with the exception that we start the simulated annealing algorithm at an Erdős–Rényi-generated network with density similar to the original network's, i.e., with $p$ defined as the ratio of the number of observed ties to the number of possible ties.
With this modification, the simulated annealing algorithm finds only networks whose MPLE is in the proximity of the MCMLE, meaning that the MPLE of any of these networks is an improved starting value for the MCMLE algorithm compared to the original MPLE.

In summary, we suggest finding an improved starting value $\theta_0$ in the following way (a code sketch follows the list):

1. For a given network $A_{\mathrm{obs}}$, find the sufficient statistics $T(A_{\mathrm{obs}})$.

2. Simulate a network $G$ using an Erdős–Rényi model on the same nodes as $A_{\mathrm{obs}}$ and with $p$ given by the edge density of $A_{\mathrm{obs}}$.
3. Simulate a new network $A^*$ with $T(A^*) = T(A_{\mathrm{obs}})$ using simulated annealing, starting the algorithm at $G$. If $T(A)$ includes continuous statistics, find a network $A^*$ with $T(A^*) \approx T(A_{\mathrm{obs}})$.

4. Take the MPLE of $A^*$ as the improved starting value $\theta_0$.
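The four steps can be strung together with the ergm toolkit. In this sketch, san(), ergm's own simulated-annealing network sampler, plays the role of the search in Step 3, and control.ergm(init = ...) feeds the improved starting value into the MCMLE fit. The flomarriage network and model are, as before, illustrative stand-ins for $A_{\mathrm{obs}}$ and the model of interest.

```r
library(ergm)
data(florentine)
net  <- flomarriage                  # stands in for A_obs
form <- net ~ edges + triangle       # stands in for the model of interest

## Step 1: sufficient statistics of the observed network
t.obs <- summary(form)

## Step 2: Erdos-Renyi network with the same density as A_obs
n <- network.size(net)
p <- network.edgecount(net) / choose(n, 2)
m <- matrix(0, n, n)
m[upper.tri(m)] <- rbinom(choose(n, 2), 1, p)
G <- as.network(m + t(m), directed = FALSE)

## Step 3: anneal from G toward a network A* with T(A*) close to T(A_obs)
Astar <- san(G ~ edges + triangle, target.stats = t.obs)

## Step 4: the MPLE of A* is the improved starting value theta_0
theta0 <- coef(ergm(Astar ~ edges + triangle, estimate = "MPLE"))

## Run MCMLE from the improved starting value
fit <- ergm(form, control = control.ergm(init = theta0))
summary(fit)
```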
This approach was successfully applied by Schmid et al. (2020) to a citation network based on all US Supreme Court majority opinions written between 1937 and 2015. The majority opinion of each court case is considered a node, and a citation from case $i$ to case $j$ is defined as a directed tie. The network includes 10,020 nodes, and the ERGM used includes 14 endogenous and exogenous statistics, complicated enough to be computationally challenging. Ultimately, Schmid et al. (2020) find that multiple different approaches fail to yield a successful MCMLE, among them the stepping method of Hummel et al. (2012) and the standard MCMLE algorithm using the MPLE as starting value. Yet the approach described here, which reproduces the observed network statistics only approximately due to the inclusion of several continuous variables among these statistics, does produce a successful MCMLE; it is the only known method that does so in this example.
6. Discussion

The basic idea of this article, searching for networks that have the same, or approximately the same, ERGM statistics as the observed network and then using the MPLEs from these networks as starting values for a traditional MCMLE algorithm, is relatively simple. Yet it has already demonstrated its value in solving previously unsolvable ERGM-based estimation problems.

It seems there is still much to learn about this methodology. For instance, is there a way to implement Rao-Blackwellization, and does this approach lead to estimates that are closer in their behavior to the MLE? Also, how important is it to search the sample space of all networks thoroughly, and is an alternative to simulated annealing possible for this purpose? Figure 4 suggests that initializing the simulated annealing algorithm at the observed network versus at the Erdős–Rényi-generated network generates wholly distinct clouds of MPLE values, which leads to the question of whether the set of all possible MPLE values for a particular set of statistics is actually bimodal, or whether in fact there is a vast as-yet-unexplored set of MPLE values that could potentially be of use both as starting MCMLE values and as sample points in a Rao-Blackwellization scheme.

That such a simple idea can prove so effective relative to all other known methods suggests that there exists immense untapped potential for improving upon approximate likelihood-based inference using MPLEs.
References
Barnard, G. A., G. M. Jenkins, and C. B. Winsten (1962). Likelihood inference and time series. Journal of the Royal Statistical Society, Series A (General) 125(3), 321–372.

Barndorff-Nielsen, O. (1978). Information and Exponential Families in Statistical Theory. Wiley.

Birnbaum, A. (1962). On the foundations of statistical inference. Journal of the American Statistical Association 57(298), 269–306.

Corander, J., K. Dahmström, and P. Dahmström (1998). Maximum likelihood estimation for Markov graphs. Technical report, Department of Statistics, University of Stockholm.

Frank, O. and D. Strauss (1986). Markov graphs. Journal of the American Statistical Association 81(395), 832–842.

Geyer, C. J. and E. A. Thompson (1992). Constrained Monte Carlo maximum likelihood for dependent data. Journal of the Royal Statistical Society, Series B (Methodological) 54(3), 657–699.

Handcock, M. S. (2001). Assessing degeneracy in statistical models of social networks. Technical report, Center for Statistics and the Social Sciences, University of Washington.

Handcock, M. S., D. R. Hunter, C. T. Butts, S. M. Goodreau, P. N. Krivitsky, and M. Morris (2019). ergm: Fit, Simulate and Diagnose Exponential-Family Models for Networks. The Statnet Project (https://statnet.org). R package version 3.10.0-4837.

Hummel, R. M., D. R. Hunter, and M. S. Handcock (2012). Improving simulation-based algorithms for fitting ERGMs. Journal of Computational and Graphical Statistics 21(4), 920–939.

Hunter, D. R. and M. S. Handcock (2006). Inference in curved exponential family models for networks. Journal of Computational and Graphical Statistics 15(3), 565–583.

Kirkpatrick, S., C. D. Gelatt, and M. P. Vecchi (1983). Optimization by simulated annealing. Science 220(4598), 671–680.

Krivitsky, P. N. (2017). Using contrastive divergence to seed Monte Carlo MLE for exponential-family random graph models. Computational Statistics and Data Analysis 107(C), 149–161.

Lehmann, E. and G. Casella (1998). Theory of Point Estimation. Springer-Verlag.

R Core Team (2020). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.

Salgado, H., A. Santos-Zavaleta, S. Gama-Castro, D. Millán-Zárate, E. Díaz-Peredo, F. Sánchez-Solano, E. Pérez-Rueda, C. Bonavides-Martínez, and J. Collado-Vides (2001). RegulonDB (version 3.2): Transcriptional regulation and operon organization in Escherichia coli K-12. Nucleic Acids Research 29(1), 72–74.

Saul, Z. and V. Filkov (2007). Exploring biological network structure using exponential random graph models. Bioinformatics 23, 2604–2611.

Schmid, C. S., T. H. Chen, and B. A. Desmarais (2020). Generative dynamics of Supreme Court citations. Working paper, The Pennsylvania State University.

Schmid, C. S. and D. R. Hunter (2020). Accounting for model misspecification when using pseudolikelihood for ERGMs. Unpublished manuscript.

Schweinberger, M. (2011). Instability, sensitivity, and degeneracy of discrete exponential families. Journal of the American Statistical Association 106(496), 1361–1370.

Shen-Orr, S., R. Milo, S. Mangan, and U. Alon (2002). Network motifs in the transcriptional regulation network of Escherichia coli. Nature Genetics 31, 64–68.

Snijders, T. A. (2002). Markov chain Monte Carlo estimation of exponential random graph models. Journal of Social Structure 3(2), 1–40.

Tukey, J. W. (1962). The future of data analysis. Annals of Mathematical Statistics 33, 1–67.
van Duijn, M. A., K. J. Gile, and M. S. Handcock (2009). A framework for the comparison of maximum pseudo-likelihood and maximum likelihood estimation of exponential family random graph models. Social Networks 31(1), 52–62.