An optimal test for strategic interaction in social and economic network formation between heterogeneous agents

Abstract

We introduce a test for whether agents' preferences over network structure are interdependent. Interdependent preferences induce strategic behavior since the optimal set of links directed by agent $i$ will vary with the configuration of links directed by other agents. Our model also incorporates agent-specific in- and out-degree heterogeneity and homophily on observable agent attributes. This introduces $2N + K^2$ nuisance parameters ($N$ is the number of agents in the network and $K$ the number of possible agent attribute configurations). Under the null equilibrium is unique, but our hypothesis is nevertheless a composite one as the degree heterogeneity and homophily nuisance parameters may range freely across their parameter space. Under the alternative our model is incomplete; there may be multiple equilibrium network configurations and our test is agnostic about which one is selected. Motivated by size control, and exploiting the exponential family structure of our model under the null, we restrict ourselves to conditional tests. We characterize the exact null distribution of a family of conditional tests and introduce a novel Markov Chain Monte Carlo (MCMC) algorithm for simulating this distribution. We also characterize the locally best test. The form of this test depends upon the gradient of the likelihood with respect to the strategic interaction parameter in the neighborhood of the null. Remarkably, this gradient, and consequently the form of the locally best test statistic, does not depend on how an equilibrium is selected. Exploiting this lack of dependence, we outline a feasible version of the locally best test.


An optimal test for strategic interaction in social and economic network formation between heterogeneous agents

Andrin Pelican and Bryan S. Graham ∗

September 2, 2020

∗ Pelican: e-mail: [email protected]. Graham: Department of Economics, University of California - Berkeley, 530 Evans Hall http://bryangraham.github.io/econometrics/ . We thank seminar participants at the University of California - Berkeley, the University of Cambridge, ITAM, Jinan University, Yale University, Microsoft Research and the World Congress of the Econometric Society for feedback and suggestions. We also thank Kaïla A. Munro, Daniele Ballinari, Gabriel Okasa, Enrico De Giorgi, Piotr Lukaszuk and Yi Zhang for their generous feedback and input. A special thanks to Winfried Hochstättler for continuously pointing out the connection to the discrete mathematics literature and to Michael Jannson for help and insight into the nature of our testing problem. All the usual disclaimers apply. Financial support from NSF grant SES [...]

Abstract

We introduce a test for whether agents' preferences over network structure are interdependent. Interdependent preferences induce strategic behavior since the optimal set of links directed by agent $i$ will vary with the configuration of links directed by other agents.

Our model also incorporates agent-specific in- and out-degree heterogeneity and homophily on observable agent attributes. This introduces $2N + K^2$ nuisance parameters ($N$ is the number of agents in the network and $K$ the number of possible agent attribute configurations).

Under the null equilibrium is unique, but our hypothesis is nevertheless a composite one as the degree heterogeneity and homophily nuisance parameters may range freely across their parameter space. Under the alternative our model is incomplete; there may be multiple equilibrium network configurations and our test is agnostic about which one is selected.

Motivated by size control, and exploiting the exponential family structure of our model under the null, we restrict ourselves to conditional tests. We characterize the exact null distribution of a family of conditional tests and introduce a novel Markov Chain Monte Carlo (MCMC) algorithm for simulating this distribution.

We also characterize the locally best test. The form of this test depends upon the gradient of the likelihood with respect to the strategic interaction parameter in the neighborhood of the null. Remarkably, this gradient, and consequently the form of the locally best test statistic, does not depend on how an equilibrium is selected. Exploiting this lack of dependence, we outline a feasible version of the locally best test.

We present two illustrative applications. First, we test for whether nations behave strategically when choosing locations for overseas diplomatic missions. Second, we test for whether firms prefer to sell to firms with richer customer bases (i.e., whether firms value "indirect customers"). Some Monte Carlo experiments explore the size and power properties of our test in practice.

JEL Codes:

C31

Keywords:

Network formation, Locally Best Tests, Similar Tests, Exponential Family, Incomplete Models, Degree Heterogeneity, Homophily, Binary Matrix Simulation, Edge Switching Algorithms

Network data feature in many areas of economic research. Examples include buyer-supplier networks or supply-chains (e.g., Atalay et al., 2011), research and development (R&D) and other types of strategic partnerships across firms (e.g., König et al., 2019), patterns of trade among nations (e.g., Tinbergen, 1962), the structure of friendships between adolescents (e.g., Calvó-Armengol et al., 2009), and interbank lending and borrowing (e.g., Boss et al., 2004). Jackson et al. (2017) present many other examples. Such data abound in the other social sciences as well (e.g., Apicella et al., 2012).

One approach to modelling networks proceeds pairwise, or dyad-by-dyad. In this approach (the realization of) each possible link in a network is independent of all others. Importantly this independence may only hold conditional on latent agent-specific attributes; such latent attributes may induce dependence across links unconditionally. Gravity models of trade, with exporter and importer fixed effects, provide a familiar illustration (Anderson, 2011). Stochastic block models (SBMs), widely studied in statistics, also fall into this category (Airoldi et al., 2008; Bickel et al., 2013; Gao et al., 2015).

A second approach views a network as an equilibrium outcome of a large $N$-player game. In this approach agents' preferences over links may vary with the presence or absence of links elsewhere in the network. For example, agents may prefer reciprocated to unreciprocated links. Alternatively they may attach extra utility to links which induce transitive closure (Granovetter, 1973). In such settings small, local, re-wirings of a network may induce a cascade of additional link updates which can, at least in principle, change the global topology of a network. Multiple equilibria may also arise.
In strategic models, stable networks need not be efficient, as agents fail to account for the costs and benefits that the links they form impose on others. The two classes of network formation models, in addition to being scientifically distinct, generate different policy implications (Goyal, 2009).

Graham (2017) and de Paula et al. (2018) represent two recent attempts to turn, respectively, the dyad-by-dyad and strategic approaches into workable econometric models. In this paper we take a first step toward integrating these two econometric modelling approaches. We study a model of network formation which simultaneously incorporates rich agent-level unobserved heterogeneity, homophily, and interdependent preferences. We are aware of no prior attempt to incorporate these three features into a single econometric model. Incorporating heterogeneity and homophily into the null model is important because these factors provide alternative explanations for the types of network microstructure often associated with strategic behavior.

Footnote: Graham (2020) surveys the larger econometric literature on network formation.

In our model the importance of preference interdependencies is indexed by a parameter (or vector of parameters). Our goal is to test whether this parameter equals zero. Testing the [...]

1. [...] $2N$ agent-specific (incidental) degree heterogeneity parameters as well as $K^2$ homophily coefficients (where $K$ equals the number of observed agent types). Because these nuisance parameters can range freely across their parameter space under the null, avoiding size distortion is difficult. This problem famously arises in instrumental variables models, where the size properties of common tests may vary with instrument strength (cf., Moreira, 2009; Andrews et al., 2019).

2. Our model is incomplete under the alternative (cf., de Paula, 2013). When preferences are interdependent multiple equilibrium networks may occur. We leave the mechanism which selects the observed equilibrium unspecified.
Because the alternative is incomplete it is not obvious how to choose a test statistic with good power. A likelihood ratio test, for example, would require a complete specification of the equilibrium selection under the alternative.

3. We characterize the exact distribution of our test statistic under the null. Practical application of this result, however, requires a feasible simulation algorithm.

Figure 1: Pharmaceutical Buyer-Supplier Network, 2015
Source: Compustat and authors' calculations.
Notes: 2015 buyer-supplier relationships among publicly traded firms in NAICS industry 3254 (pharmaceuticals). The head of each arc denotes the buying firm. Firms in each of the six-digit sub-sectors are shaded differently (see the legend). The largest weakly-connected component is shown.

Section 1 presents our model of strategic network formation. We begin by defining agent preferences and characterizing equilibrium networks. With this foundation we are able to write down a likelihood function for the network. Since there may exist multiple equilibrium networks, this likelihood depends on an unknown (and unmodelled) equilibrium selection mechanism. Although well-defined (see Theorem 1.1 below), our likelihood function cannot be numerically evaluated in practice.

Section 2 outlines our approach to testing and derives the form of the locally best test statistic. We characterize the exact distribution of our test statistic under the null. However, for reasons of practicality, we approximate the exact null distribution by simulation. Section 3 outlines a new Markov Chain Monte Carlo (MCMC) algorithm for generating random draws from the required null distribution. Our algorithm may be of independent interest to those familiar with binary matrix simulation and counting problems arising in machine learning, ecology and other fields (e.g., Sinclair, 1993).

Section 4 presents two small applications of our test.
First we test for whether nations behave strategically when choosing locations for their diplomatic missions. In particular, we focus on whether nations value transitivity in diplomatic ties. We might posit, for example, that the value of a diplomatic mission in the People's Republic of China (PRC) increased for many countries after President Carter's decision to formally recognize the PRC in 1978 (cf., Kinne, 2014). If prior to 1978 many countries had diplomatic relations with the United States, but not the PRC, directing an arc to the PRC after the US did so would generate a transitive triad.

In a second application, we test for whether firms value indirect customers (i.e., do they prefer to sell to firms which themselves sell to many other firms). For this illustration we use three buyer-supplier networks that we constructed from Compustat data. Specifically we look at the vehicle, computer and pharmaceutical manufacturing industries. Figure 1 plots the pharmaceutical and medicine buyer-supplier network.

Section 4 also reports on a small number of Monte Carlo experiments we conducted to verify the theoretical size and power properties of our test. Section 5 finishes with a short discussion of some possible areas for additional research.

While our focus is on strategic interaction in the context of a single network (with many agents), our results are also applicable to settings where the econometrician observes many independent games, each with a small number of players (e.g., market entry decisions by rival firms across many markets). Chen et al. (2018) and Kaido and Zhang (2019) are two recent examples of attempts to extend likelihood-ratio ideas to this type of setting. The test we introduce below is analogous to a score-type test, complementing these likelihood-based approaches.

Here we outline a model of strategic network formation. In this model $N$ heterogeneous agents form a directed network (or digraph). We begin by establishing some basic notation. We then introduce agent preferences over the form of the network, discuss equilibrium networks and, finally, develop a likelihood function for the observed network.

A directed graph $G(\mathcal{N}, \mathcal{A})$ consists of a set of nodes (agents) $\mathcal{N} = \{1, \ldots, N\}$ and a set of ordered pairs of nodes $\mathcal{A} = \{(i,j), (k,l), \ldots\}$ for $i \neq j$, $k \neq l$, and $i, j, k, l \in \mathcal{N}$. The elements of $\mathcal{A}$ correspond to those arcs, or directed links, present in $G(\mathcal{N}, \mathcal{A})$.

In what follows we typically work with the adjacency matrix $\mathbf{D} = [D_{ij}]$ where

$$D_{ij} = \begin{cases} 1 & \text{if } ij \in \mathcal{A} \\ 0 & \text{otherwise.} \end{cases} \tag{1}$$

The main diagonal of $\mathbf{D}$ consists of structural zeros.

Footnote: Atalay et al. (2011) constructed a similar buyer-supplier network, also using data from Compustat.

Let $G - ij$ denote the network obtained by deleting link $ij$ from $G$ (if present), and $G + ij$ the network one gets after adding this link (if absent). Let $\mathbf{D} \pm ij$ denote the adjacency matrix associated with the network obtained by adding/deleting link $ij$ from $G$. Let $\mathbb{D}_N$ denote the set of all $2^{N(N-1)}$ possible adjacency matrices and $\mathbb{I}_J$ the set of all possible $J$-dimensional binary vectors.

Associated with each agent in the network is the triple $(A_i, B_i, X_i')'$. Here $A_i$ and $B_i$ are, as explained further below, agent-level out- and in-degree heterogeneity terms unobserved by the econometrician. In contrast $X_i$ is a $K \times 1$ vector of indicators observed by the econometrician. These indicators might reflect the industrial classification of a firm, the gender or race of an individual, or the broad geographic location of a nation. More generally $X_i$ enumerates the support points of a collection of (observed) discrete regressors.

We leave the joint distribution of $(A_i, B_i, X_i')'$ unrestricted.
This implies, for example, that the unobserved degree heterogeneity $(A_i, B_i)'$ may be correlated with the observed covariates $X_i$, as in fixed effects panel data analyses (see also Graham (2017), Dzemski (2018), Jochmans (2018) and Yan et al. (2018)).

Footnote: The joint distribution of $(A_i, B_i, X_i')'$ does have implications for test power, as will become apparent below.

We assume that agents care about the shape of the network. Let $\mathbf{d} \in \mathbb{D}_N$ be a feasible $N$-player network. Agent utility varies with the configuration of this network. The utility agent $i$ gets from some feasible network wiring $\mathbf{d}$ is assumed equal to

$$\nu_i(\mathbf{d}_i, \mathbf{d}_{-i}; \mathbf{U}) = \sum_j d_{ij} \left[ A_i + B_j + X_i' \Lambda X_j + \gamma s_{ij}(\mathbf{d}) - U_{ij} \right]. \tag{2}$$

Here $\mathbf{d}_i = (d_{i1}, \ldots, d_{ii-1}, d_{ii+1}, \ldots, d_{iN})'$ corresponds to the set of links that agent $i$ chooses to form (or not), while $\mathbf{d}_{-i}$ equals the links that the other $N-1$ [...] $\mathbf{d}_i$ corresponds to a pure strategy.

Agent $i$'s utility varies with the number and nature of those links she chooses to send, or direct, towards others. The utility associated with $i$ directing a link to $j$ is increasing in the heterogeneity terms $A_i$ and $B_j$. Agents with high values of out-degree heterogeneity $A_i$ get a large amount of baseline utility from any link they send. In a social network context high $A_i$ agents are "extroverts". High $B_j$ agents, in contrast, are especially attractive targets for links sent by others. In a social network high $B_j$ agents are "prestigious".
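The utility specification in (2) can be made concrete with a small sketch. This is only an illustration under stated assumptions: the function names `link_payoff`, `utility` and `reciprocity` are hypothetical, networks are plain nested lists, and the strategic term $s_{ij}(\mathbf{d})$ is passed in as an arbitrary function (reciprocity, $d_{ji}$, is used here purely as one possible choice).

```python
from typing import Callable, List

Matrix = List[List[float]]


def link_payoff(i: int, j: int, d: List[List[int]], A: List[float], B: List[float],
                X: Matrix, Lam: Matrix, gamma: float,
                s: Callable[[int, int, List[List[int]]], float], U: Matrix) -> float:
    """Bracketed term of (2): A_i + B_j + X_i' Lam X_j + gamma * s_ij(d) - U_ij."""
    K = len(Lam)
    homophily = sum(X[i][k] * Lam[k][l] * X[j][l] for k in range(K) for l in range(K))
    return A[i] + B[j] + homophily + gamma * s(i, j, d) - U[i][j]


def utility(i: int, d: List[List[int]], A: List[float], B: List[float],
            X: Matrix, Lam: Matrix, gamma: float,
            s: Callable[[int, int, List[List[int]]], float], U: Matrix) -> float:
    """nu_i in (2): sum of link payoffs over the links that agent i sends."""
    return sum(link_payoff(i, j, d, A, B, X, Lam, gamma, s, U)
               for j in range(len(d)) if j != i and d[i][j] == 1)


def reciprocity(i: int, j: int, d: List[List[int]]) -> float:
    """Illustrative (assumed) strategic term: 1 if j also sends a link to i."""
    return float(d[j][i])


# Two agents of a single type (K = 1); all numbers are made up for illustration.
A, B = [0.5, -0.2], [0.1, 0.3]
X, Lam = [[1.0], [1.0]], [[0.2]]
U = [[0.0, 0.4], [0.9, 0.0]]
d_mutual = [[0, 1], [1, 0]]
d_one_way = [[0, 1], [0, 0]]
u0_mutual = utility(0, d_mutual, A, B, X, Lam, 1.0, reciprocity, U)
u0_one_way = utility(0, d_one_way, A, B, X, Lam, 1.0, reciprocity, U)
```

With $\gamma = 1$, agent 0's utility from the link to agent 1 is one unit higher when the link is reciprocated, which is the sense in which her payoff depends on the links directed by others.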

In a buyer-supplier context high $A_i$ firms might especially value a diverse customer base or supply a "critical" input used in the production processes of many other firms. High $B_j$ firms correspond to especially attractive customers. For example, national big box retail chains with many retail locations, like Walmart and Target, may have high $B_j$ values since their purchases are less sensitive to local economic shocks.

The $X_i' \Lambda X_j \overset{def}{\equiv} W_{ij}' \lambda$ term allows for assortative matching on agent attributes. The elements of the $K \times K$ matrix $\Lambda = [\lambda_{kl}]$ parameterize the systematic utility generated by links, say, from group $k$ to group $l$. This allows, for example, the utility generated by links across agents belonging to different groups to systematically differ from that generated by within-group links. In the buyer-supplier context, arcs between firms with particular industrial classifications may generate greater surplus. In a social network girls might, all things equal, prefer other girls as friends. The $\Lambda$ matrix parameterizes homophily (or heterophily) of these types.

Network generating processes where link utility varies with agent-level degree heterogeneity and observable dyad attributes (the first three terms in (2)) can successfully match many features of real world networks (Graham, 2017). The fourth term in (2), $\gamma s_{ij}(\mathbf{d})$, enriches this baseline model to allow agent preferences over links to vary with the presence or absence of links elsewhere in the network. de Paula et al. (2018) call preferences of this type "interdependent". It is the dependence of utility on $s_{ij}(\mathbf{d})$ that makes the model "strategic": agent $i$'s optimal action may vary with the configuration of links directed by others.

For now the only restriction we place on $s_{ij}(\mathbf{d})$ is that

$$s_{ij}(\mathbf{d}) = s_{ij}(\mathbf{d} - ij) = s_{ij}(\mathbf{d} + ij). \tag{3}$$

If existence of a pure strategy equilibrium is additionally desired, then additional restrictions on $s_{ij}(\mathbf{d})$ may be needed.
Although we emphasize pure strategy equilibria in our discussion and examples, all of our results allow for mixed strategy equilibria as well. Consequently, in practice, $s_{ij}(\mathbf{d})$ may be specified quite freely, although our test may have low power for some choices.

One feature of $s_{ij}(\mathbf{d})$, which will prove central to our analysis, is that it has finite range. To see this observe that since the set of all networks $\mathbb{D}_N$ is finite, the strategic interaction term $s_{ij}(\mathbf{d})$ also takes only a finite number of values. Let $\mathbb{S} = \{\underline{s}, s_1, \ldots, s_M, \bar{s}\}$ be the set of possible values for $s_{ij}(\mathbf{d})$, ordered from smallest to largest.

Footnote: We define $W_{ij} = (X_i \otimes X_j)$ and $\lambda = \mathrm{vec}(\Lambda')$.

An example illustrates. Let $s_{ij}(\mathbf{d})$ equal

$$s_{ij}(\mathbf{d}) = d_{ji}, \tag{4}$$

as would be appropriate when agents have a taste for reciprocated links. In this case $\mathbb{S} = \{0, 1\}$. If agents prefer transitive links (i.e., they prefer to direct friendships to "friends of friends"), then we might set

$$s_{ij}(\mathbf{d}) = \sum_k d_{ik} d_{kj}, \tag{5}$$

which implies that $\mathbb{S} = \{0, 1, \ldots, N-2\}$. Finiteness of the cardinality of $\mathbb{S}$ (for a given $N$) plays an important role in our analysis, as will become apparent below.

The final component of agent utility is idiosyncratic; we assume that the $\{U_{ij}\}_{i \neq j}$ are independent and identically distributed (iid) logistic random variables. The logistic assumption is also important: it generates exponential family structure which we exploit when forming our test.

Throughout we assume that the observed network $\mathbf{D}$ coincides with an equilibrium outcome of an $N$-player complete information game. Each agent (i) observes $\{(A_i, B_i, X_i')\}_{i=1}^{N}$ and $\{U_{ij}\}_{i \neq j}$ and then (ii) decides which, out of $N-1$ [...] $\mathbf{d} \in \mathbb{D}_N$ coincides with a pure strategy combination. We assume that the observed network corresponds to a pure strategy contained in a (possibly mixed strategy) Nash equilibrium (NE). In practice most (common) choices of $s_{ij}(\mathbf{d})$ are monotonic, which ensures (by Tarski's fixed point theorem) the existence of an equilibrium in pure strategies. We emphasize this special case in most of what follows, but nothing essential hinges upon it and our results apply to equilibria in mixed strategies as well.

In the analysis of undirected networks, the pairwise stability equilibrium concept introduced by Jackson and Wolinsky (1996) plays a prominent role. The use of NE, however, is standard in the context of directed networks.
For example, Bala and Goyal (2000) and Dutta and Jackson (2000) study the efficiency properties of pure strategy NE directed networks.

Pure strategy equilibria

A pure strategy NE corresponds to a pure strategy combination $\mathbf{d}^*$ where, for $\mathbf{U} = \mathbf{u}$ (with $\mathbf{U} \overset{def}{\equiv} [U_{ij}]$) and all $i = 1, \ldots, N$,

$$\nu_i(\mathbf{d}_i^*, \mathbf{d}_{-i}^*, \mathbf{u}) \geq \nu_i(\mathbf{d}_i, \mathbf{d}_{-i}^*, \mathbf{u}) \tag{6}$$

for all possible pure strategies $\mathbf{d}_i \in \mathbb{I}_{N-1}$.

To further understand the structure of a pure strategy equilibrium it is helpful to introduce a notion of marginal utility. The marginal utility $i$ receives from sending a link to $j$ equals:

$$MU_{ij}(\mathbf{d}_i, \mathbf{d}_{-i}; \mathbf{U}) = \begin{cases} \nu_i(\mathbf{d}) - \nu_i(\mathbf{d} - ij) & \text{if } d_{ij} = 1 \\ \nu_i(\mathbf{d} + ij) - \nu_i(\mathbf{d}) & \text{if } d_{ij} = 0. \end{cases} \tag{7}$$

Under preferences (2) the marginal utility of the $ij$ link is therefore

$$MU_{ij}(\mathbf{d}_i, \mathbf{d}_{-i}; \mathbf{U}) = A_i + B_j + W_{ij}' \lambda + \gamma s_{ij}(\mathbf{d}) - U_{ij}. \tag{8}$$

With this notation any adjacency matrix which simultaneously satisfies the $N(N-1)$ non-linear equations:

$$D_{ij} = \mathbf{1}\left( A_i + B_j + W_{ij}' \lambda + \gamma s_{ij}(\mathbf{D}) \geq U_{ij} \right) \tag{9}$$

for $i = 1, \ldots, N$ and $j \neq i$ is a pure strategy NE.

Similar to Miyauchi (2016), consider the mapping $\varphi(\mathbf{D}) : \mathbb{D}_N \to \mathbb{I}_{N(N-1)}$:

$$\underset{N(N-1) \times 1}{\varphi(\mathbf{D})} \equiv \begin{pmatrix} \mathbf{1}(MU_{12}(\mathbf{D}) \geq 0) \\ \mathbf{1}(MU_{13}(\mathbf{D}) \geq 0) \\ \vdots \\ \mathbf{1}(MU_{NN-1}(\mathbf{D}) \geq 0) \end{pmatrix}. \tag{10}$$

Next let $\mathrm{vec}^*(\mathbf{A})$ be a modification of the matrix vectorization operator which drops the diagonal elements of the square matrix $\mathbf{A}$. Define its inverse operator as reconstituting $\mathbf{A}$, but now with zeros on its main diagonal. With this notation it is easy to see that any pure strategy NE network, $\mathbf{d}^*$, including possibly the observed one, $\mathbf{D}$, corresponds to a fixed point:

$$\mathbf{d}^* = \mathrm{vec}^{*-1}(\varphi(\mathbf{d}^*)). \tag{11}$$

One advantage of the fixed point representation (11) is that it allows for the application of Tarski's (1955) fixed point theorem. For $\gamma \geq 0$ and $s_{ij}(\mathbf{d})$ weakly increasing in $\mathbf{d}$ for all dyads, Tarski's theorem guarantees (i) the existence of an equilibrium and (ii) that the set of all equilibria constitutes a non-empty complete lattice (cf., Miyauchi, 2016, Proposition 1). This characterization applies when $s_{ij}(\mathbf{d})$ takes either of the two example forms introduced above. Of course, as shown by Nash (1950), an equilibrium in mixed strategies will always exist. In practice this allows for substantial flexibility in the form of $s_{ij}(\mathbf{d})$.

While we remain agnostic about equilibrium selection in the presence of multiplicity, it is nevertheless useful to develop an abstract notation for the unknown equilibrium selection rule. This notation allows us to write down a likelihood for the network. Of course this likelihood function could not be evaluated numerically without first replacing our abstract selection mechanism with something more concrete.

Let $n \overset{def}{\equiv} N(N-1)$ equal the number of (ordered) dyads in the network. Further let $\mathbf{A} \overset{def}{\equiv} [A_i]$ and $\mathbf{B} \overset{def}{\equiv} [B_i]$ be the $N \times 1$ [...] $\delta = (\lambda', \mathbf{A}', \mathbf{B}')'$, recalling that $\lambda$ is the $K^2 \times 1$ [...] $X_i$. Adding our strategic interaction parameter, $\gamma$, we get a full parameter vector of $\theta = (\gamma, \delta')'$.

Let $\mathcal{N}(\mathbf{d}, \mathbf{u}; \theta)$ be a function which assigns, for $\mathbf{U} = \mathbf{u}$, a probability weight to network or, equivalently, pure strategy combination $\mathbf{d}$:

$$\mathcal{N}(\mathbf{d}, \mathbf{u}; \theta) : \mathbb{D}_N \times \mathbb{R}^n \to [0, 1]. \tag{12}$$

We assume that the selection mechanism (12) is such that:

1. if $\mathbf{d}$ is the only network which satisfies (6) when $\mathbf{U} = \mathbf{u}$ (i.e., is the unique NE), then $\mathcal{N}(\mathbf{d}, \mathbf{u}; \theta) = 1$;
2. if $\mathbf{d}$ is not a NE when $\mathbf{U} = \mathbf{u}$, then $\mathcal{N}(\mathbf{d}, \mathbf{u}; \theta) = 0$;
3. if there are multiple pure strategy NE, then $\mathcal{N}(\mathbf{d}, \mathbf{u}; \theta) \geq 0$ for any $\mathbf{d}$ which is a NE and zero otherwise (subject to the adding-up constraint $\sum_{\mathbf{d} \in \mathbb{D}_N} \mathcal{N}(\mathbf{d}, \mathbf{u}; \theta) = 1$);
4. if there is a unique mixed strategy NE when $\mathbf{U} = \mathbf{u}$, then $\mathcal{N}(\mathbf{d}, \mathbf{u}; \theta) \geq 0$ [...] $\mathbf{d}$ (contained in the mixed strategy NE). If there are multiple mixed strategy NE when $\mathbf{U} = \mathbf{u}$, then $\mathcal{N}(\mathbf{d}, \mathbf{u}; \theta) \geq 0$ [...]

With $\mathcal{N}(\mathbf{d}, \mathbf{u}; \theta)$ defined, we can write the likelihood of observing network $\mathbf{D} = \mathbf{d}$ as

$$P(\mathbf{d}; \theta, \mathcal{N}) = \int_{\mathbf{u} \in \mathbb{R}^n} \mathcal{N}(\mathbf{d}, \mathbf{u}; \theta) f_{\mathbf{u}}(\mathbf{u}) \, \mathrm{d}\mathbf{u}, \tag{13}$$

where $f_{\mathbf{u}}(\mathbf{u}) = \prod_{i \neq j} f_U(u_{ij})$ with $f_U(u) = e^u / [1 + e^u]^2$.

Theorem 1.1. For any network $\mathbf{d} \in \mathbb{D}_N$ there exists a measurable function $\mathcal{N}(\mathbf{d}, \cdot\,; \theta) : \mathbb{R}^n \to [0, 1]$, which assigns to $\mathbf{u} \in \mathbb{R}^n$ the NE weight on the pure strategy combination corresponding to $\mathbf{d}$.

The proof of Theorem 1.1 can be found in Appendix A.1. Although we do not explicitly define $\mathcal{N}(\mathbf{d}, \mathbf{u}; \theta)$, only stating its key properties, Theorem 1.1 shows that $\mathcal{N}(\mathbf{d}, \mathbf{u}; \theta)$ exists and is measurable. An implication of this result is that the likelihood (13) is well-defined.

To understand the likelihood (13) it is helpful to consider a (relatively) simple example. This example will also help in understanding our derivation of the optimal test statistic below. Assume that $s_{ij}(\mathbf{d}) = d_{ji}$ such that agents prefer reciprocated links when $\gamma \geq 0$. In this example $s_{ij}(\mathbf{d})$ equals either zero ($j$ does not reciprocate) or one ($j$ does reciprocate). We can use the two elements of $\mathbb{S}$ to partition the real line into what we will call buckets:

$$\mathbb{R} = (-\infty, \mu_{ij}] \cup (\mu_{ij}, \mu_{ij} + \gamma] \cup (\mu_{ij} + \gamma, \infty). \tag{14}$$

Here $\mu_{ij} = A_i + B_j + X_i' \Lambda X_j$ equals the systematic, non-strategic, component of utility generated by arc $ij$. Next consider the realization of $U_{ij}$, the idiosyncratic utility agent $i$ gets when she directs a link to $j$. If $U_{ij}$ falls into the first bucket in (14), then agent $i$ will always direct a link to $j$, irrespective of whether $j$ chooses to direct a link to $i$ or not. If $U_{ij}$ falls into the middle or inner bucket, however, then $i$ will direct a link to $j$ only if $j$ reciprocates. Finally, if $U_{ij}$ falls into the last bucket, then $i$ will never direct a link to $j$ regardless of whether $j$ directs a link to $i$ or not. We will call the first and last buckets in (14) outer buckets.

If both $U_{ij}$ and $U_{ji}$ fall in their respective inner buckets, then the $\{i, j\}$ dyad can either take the empty ($D_{ij} = D_{ji} = 0$) or reciprocated ($D_{ij} = D_{ji} = 1$) configuration in equilibrium. In contrast, if either $U_{ij}$ or $U_{ji}$ falls into an outer bucket, then the $\{i, j\}$ dyad's wiring is uniquely determined. For example if $U_{ij}$ is in the first outer bucket and $U_{ji}$ is in the inner bucket, then the $\{i, j\}$ dyad will take the reciprocated form with probability one. It is a strictly dominant strategy for $i$ to direct a link to $j$ in this case and a best response for $j$ to reciprocate.

For $\mathbf{U} = \mathbf{u}$, let $J(\mathbf{u}) \leq \binom{N}{2}$ equal the number of dyads $\{i, j\}$ where both $u_{ij}$ and $u_{ji}$ fall into their inner bucket. For each of these dyads both the empty and reciprocated configuration is an equilibrium outcome. There are therefore $2^{J(\mathbf{u})}$ equilibrium networks in this case; the $\mathcal{N}(\mathbf{d}, \mathbf{u}; \theta)$ function would assign some probability between zero and one to each of these $2^{J(\mathbf{u})}$ networks (summing to one in total).
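The bucket logic of the reciprocity example can be checked by brute force on a tiny network. The sketch below is illustrative only: the function names and the numerical values of $\mu_{ij}$, $\gamma$ and $u_{ij}$ are assumptions made for the example, and the equilibrium search simply enumerates all $2^{N(N-1)}$ networks and keeps those satisfying the fixed point condition (9) with $s_{ij}(\mathbf{d}) = d_{ji}$, so it is only feasible for very small $N$.

```python
import itertools


def bucket(u, mu, gamma):
    """Classify a draw u_ij according to the partition in (14)."""
    if u <= mu:
        return "always"   # first outer bucket: link sent regardless of d_ji
    if u <= mu + gamma:
        return "inner"    # link sent only if it is reciprocated
    return "never"        # last outer bucket: link never sent


def count_inner_dyads(U, mu, gamma):
    """J(u): number of dyads {i, j} with both u_ij and u_ji in the inner bucket."""
    N = len(U)
    return sum(
        1
        for i in range(N) for j in range(i + 1, N)
        if bucket(U[i][j], mu[i][j], gamma) == "inner"
        and bucket(U[j][i], mu[j][i], gamma) == "inner"
    )


def pure_strategy_equilibria(U, mu, gamma):
    """All networks d with d_ij = 1(mu_ij + gamma * d_ji >= u_ij) for every i != j."""
    N = len(U)
    pairs = [(i, j) for i in range(N) for j in range(N) if i != j]
    found = []
    for bits in itertools.product((0, 1), repeat=len(pairs)):
        d = [[0] * N for _ in range(N)]
        for (i, j), b in zip(pairs, bits):
            d[i][j] = b
        if all(d[i][j] == int(mu[i][j] + gamma * d[j][i] >= U[i][j]) for i, j in pairs):
            found.append(d)
    return found


# Hypothetical three-agent example with mu_ij = 0 for all dyads and gamma = 1.
gamma = 1.0
mu = [[0.0] * 3 for _ in range(3)]
U = [
    [0.0, 0.5, -1.0],   # u_01 inner, u_02 first outer bucket
    [0.3, 0.0, 2.0],    # u_10 inner, u_12 last outer bucket
    [0.5, 0.2, 0.0],    # u_20 inner, u_21 inner
]
J = count_inner_dyads(U, mu, gamma)
equilibria = pure_strategy_equilibria(U, mu, gamma)
```

Only the $\{0, 1\}$ dyad has both draws in the inner bucket here, so $J(\mathbf{u}) = 1$ and the enumeration finds $2^1 = 2$ equilibrium networks, which differ only in whether that dyad is empty or reciprocated.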
Let $\mathbb{D}_N^{NE}(\mathbf{u})$ be the set of $2^{J(\mathbf{u})}$ equilibrium networks when $\mathbf{U} = \mathbf{u}$. One equilibrium selection rule would assign equal probability to all NE. In this case we could write the likelihood as

$$P(\mathbf{d}; \theta, \mathcal{N}) = \int_{\mathbf{u} \in \mathbb{R}^n} \frac{\mathbf{1}\left(\mathbf{d} \in \mathbb{D}_N^{NE}(\mathbf{u})\right)}{\left|\mathbb{D}_N^{NE}(\mathbf{u})\right|} f_{\mathbf{u}}(\mathbf{u}) \, \mathrm{d}\mathbf{u}, \tag{15}$$

such that $\mathcal{N}(\mathbf{d}, \mathbf{u}; \theta) = \mathbf{1}\left(\mathbf{d} \in \mathbb{D}_N^{NE}(\mathbf{u})\right) / \left|\mathbb{D}_N^{NE}(\mathbf{u})\right|$. This example illustrates that (13), while well-defined, is generally intractable, even when the equilibrium selection mechanism is fully specified.

Footnote: For simplicity we ignore mixed strategy equilibria in this example.

Our goal is to construct a powerful test for the presence of strategic interaction in network formation with good size properties. Importantly we wish to remain agnostic about any degree heterogeneity and homophily. Let $\Delta$ denote a subset of the $K^2 + 2N$ dimensional Euclidean space in which $\delta = (\lambda', \mathbf{A}', \mathbf{B}')'$ is, a priori, known to lie, and

$$\Theta_0 = \{ (\gamma, \delta') : \gamma = 0, \delta \in \Delta \}. \tag{16}$$

Our null hypothesis is the composite one:

$$H_0 : \theta \in \Theta_0 \tag{17}$$

since $\delta$ may range freely over $\Delta \subset \mathbb{R}^{K^2 + 2N}$ under the null.

Under the null the likelihood is $P_0(\mathbf{d}; \delta) \overset{def}{\equiv} P(\mathbf{d}; (0, \delta')', \mathcal{N})$ with

$$\mathcal{N}(\mathbf{d}, \mathbf{u}; \theta) = \prod_{i} \prod_{j \neq i} \mathbf{1}\left( A_i + B_j + W_{ij}' \lambda \geq u_{ij} \right)^{d_{ij}} \times \mathbf{1}\left( A_i + B_j + W_{ij}' \lambda < u_{ij} \right)^{1 - d_{ij}}.$$

Under the null the unique "equilibrium" network is the one where all links with positive marginal utility are present and those with negative marginal utility are not; $\mathcal{N}(\mathbf{d}, \mathbf{u}; \theta)$ places a probability of 1 on this network. Evaluating the integral (13) under the null yields

$$P_0(\mathbf{d}; \delta) = \prod_{i=1}^{N} \prod_{j \neq i} \left[ \frac{\exp\left( W_{ij}' \lambda + R_i' \mathbf{A} + R_j' \mathbf{B} \right)}{1 + \exp\left( W_{ij}' \lambda + R_i' \mathbf{A} + R_j' \mathbf{B} \right)} \right]^{d_{ij}} \times \left[ \frac{1}{1 + \exp\left( W_{ij}' \lambda + R_i' \mathbf{A} + R_j' \mathbf{B} \right)} \right]^{1 - d_{ij}}$$

where $R_i$ is the $N \times 1$ vector with a one in its $i$th element and zeros elsewhere. Variants of this likelihood are analyzed by Chatterjee et al. (2011), Charbonneau (2017), Graham (2017), Jochmans (2018), Dzemski (2018) and Yan et al. (2018).

Under the null our likelihood, $P_0(\mathbf{d}; \delta)$, is a member of the exponential family. To see this it is helpful to establish some additional notation. The out- and in-degree sequences equal:

$$\mathbf{S} = \begin{pmatrix} \mathbf{S}_{out} \\ \mathbf{S}_{in} \end{pmatrix}' = \begin{pmatrix} D_{1+}, \ldots, D_{N+} \\ D_{+1}, \ldots, D_{+N} \end{pmatrix}. \tag{18}$$

Here $D_{+i} = \sum_j D_{ji}$ and $D_{i+} = \sum_j D_{ij}$ equal the in- and out-degree of agents $i = 1, \ldots, N$. The $K \times K$ cross-link matrix equals

$$\mathbf{M} = \sum_i \sum_j D_{ij} X_i X_j'. \tag{19}$$

This matrix summarizes the inter-group link structure in the network (homophily). The $kl$th element of $\mathbf{M}$ records the number of links sent by type $k$ agents (e.g., semiconductor manufacturers) to type $l$ agents (e.g., computer manufacturers).

Let $\mathbf{s}, \mathbf{m}$ be a degree sequence and cross-link matrix. We say $\mathbf{s}, \mathbf{m}$ is graphical if there exists at least one arc set $\mathcal{A}$ such that $G(\mathcal{V}, \mathcal{A})$ is a simple directed graph with degree sequence $\mathbf{s}$ and cross-link matrix $\mathbf{m}$. We call any such network a realization of $\mathbf{s}, \mathbf{m}$. The set of all possible realizations of $\mathbf{s}, \mathbf{m}$ is denoted by $\mathbb{G}_{\mathbf{s}, \mathbf{m}}$ ($\mathbb{D}_{\mathbf{s}, \mathbf{m}}$ denotes the associated set of adjacency matrices).

With this notation it is easy to verify that the null model belongs to the exponential family:

$$P_0(\mathbf{d}; \delta) = c(\delta) \exp(\mathbf{t}' \delta) \tag{20}$$

with a (minimally) sufficient statistic for $\delta$ of $\mathbf{t} = \left( \mathrm{vec}(\mathbf{m}')', \mathbf{s}_{out}', \mathbf{s}_{in}' \right)'$. In words, the $K^2 + N + N$ sufficient statistics are (i) the cross-link matrix, (ii) the out-degree sequence and (iii) the in-degree sequence.

Under $H_0$ the conditional likelihood of the event $\mathbf{D} = \mathbf{d}$ is

$$P_0(\mathbf{d} \mid \mathbf{T} = \mathbf{t}) = \frac{1}{|\mathbb{D}_{\mathbf{s}, \mathbf{m}}|} \tag{21}$$

if $\mathbf{d} \in \mathbb{D}_{\mathbf{s}, \mathbf{m}}$ and zero otherwise.
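The sufficient statistics in (18) and (19) and the reference set $\mathbb{D}_{\mathbf{s}, \mathbf{m}}$ in (21) are easy to compute for a toy network. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, agent types are coded as one-hot rows of `X`, and $\mathbb{D}_{\mathbf{s}, \mathbf{m}}$ is found by exhaustive enumeration, which is only workable for a handful of agents.

```python
import itertools


def sufficient_statistics(d, X):
    """Return (out-degree sequence, in-degree sequence, cross-link matrix) for d."""
    N, K = len(d), len(X[0])
    out_deg = tuple(sum(d[i][j] for j in range(N)) for i in range(N))  # D_{i+}
    in_deg = tuple(sum(d[j][i] for j in range(N)) for i in range(N))   # D_{+i}
    m = [[0] * K for _ in range(K)]
    for i in range(N):
        for j in range(N):
            if i != j and d[i][j]:
                for k in range(K):
                    for l in range(K):
                        m[k][l] += X[i][k] * X[j][l]                   # D_ij X_i X_j'
    return out_deg, in_deg, tuple(tuple(row) for row in m)


def realizations(t, X):
    """Brute-force enumeration of all adjacency matrices sharing the statistic t."""
    N = len(X)
    pairs = [(i, j) for i in range(N) for j in range(N) if i != j]
    matches = []
    for bits in itertools.product((0, 1), repeat=len(pairs)):
        d = [[0] * N for _ in range(N)]
        for (i, j), b in zip(pairs, bits):
            d[i][j] = b
        if sufficient_statistics(d, X) == t:
            matches.append(d)
    return matches


# Hypothetical example: agents 0 and 1 are type 0, agent 2 is type 1.
X = [[1, 0], [1, 0], [0, 1]]
cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]   # arcs 0->1, 1->2, 2->0
t = sufficient_statistics(cycle, X)
reference_set = realizations(t, X)
conditional_probability = 1 / len(reference_set)
```

In this example the only networks sharing the observed degree sequences and cross-link matrix are the two directed three-cycles, so under the null each has conditional probability $1/2$ whatever the value of $\delta$.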
Under the null of no strategic interaction all networks with the same in- and out-degree sequences and cross-link structure are equally likely. Importantly this conditional likelihood is invariant to the actual value of the nuisance parameter $\delta$.

In our setting a test with critical function $\phi(\mathbf{D})$ will have size $\alpha$ if its null rejection probability (NRP) is less than or equal to $\alpha$ for all values of the nuisance parameter:

$$\sup_{\theta \in \Theta_0} \mathbb{E}_\theta[\phi(\mathbf{D})] = \sup_{\gamma = 0, \, \delta \in \Delta} \mathbb{E}_\theta[\phi(\mathbf{D})] = \alpha. \tag{22}$$

Since the nuisance parameter $\delta$ is very high dimensional, size control is non-trivial. For some intuition as to why, consider the case where $s_{ij}(\mathbf{d}) = \sum_k d_{ik} d_{kj}$, such that agents have a taste for transitivity when $\gamma > 0$. A natural test statistic in this case would be some function of $\mathbf{D}$ that is increasing in the number of transitive triads in the network. The researcher would then reject the null of $\gamma = 0$ when this statistic is large enough. Unfortunately, the expected number of transitive triads varies dramatically under the null depending on the value of $\delta$. Certain configurations of $\mathbf{A}$, $\mathbf{B}$ and/or $\lambda$ may result in a network with large numbers of transitive triads even when agents have no taste for transitivity per se (i.e., under the null). If we choose a single critical value for rejection then, depending on the values of $\mathbf{A}$, $\mathbf{B}$ and/or $\lambda$, size may be very poor.

To avoid any size distortion induced by variation in $\delta$ over $\Delta \subset \mathbb{R}^{K^2 + 2N}$ we exploit the exponential family structure of our model (under the null). Let $\mathbb{T} = \{ (\mathbf{s}, \mathbf{m}) : \mathbf{s}, \mathbf{m} \text{ is graphical} \}$ be the set of possible sufficient statistics $\mathbf{T}$. Instead of choosing a single critical value, which may result in under- or over-rejection depending on the value of $\delta$, we proceed conditionally on $\mathbf{T}$ (the minimally sufficient statistic for $\delta$). Our chosen critical value varies with $\mathbf{T}$. In this way we ensure good size control.

Formally, for each $\mathbf{t} \in \mathbb{T}$ we form a test with the property that, for all $\theta \in \Theta_0$,

$$\mathbb{E}_\theta[\phi(\mathbf{D}) \mid \mathbf{T} = \mathbf{t}] = \alpha. \tag{23}$$

Such an approach ensures similarity of our test since, by iterated expectations,

$$\mathbb{E}_\theta[\phi(\mathbf{D})] = \mathbb{E}_\theta\left[ \mathbb{E}_\theta[\phi(\mathbf{D}) \mid \mathbf{T}] \right] = \alpha \tag{24}$$

for any $\theta \in \Theta_0$ (Ferguson, 1967). By proceeding conditionally we ensure that the NRP is unaffected by the value of $\delta$.

By Ferguson (1967, Lemma 1, Section 3.6) $\mathbf{T}$ is a boundedly complete sufficient statistic for $\theta$ under the null. By Ferguson (1967, Theorem 2, Section 5.4) every similar test will thus take the form

$$\mathbb{E}_\theta[\phi(\mathbf{D}) \mid \mathbf{T} = \mathbf{t}] = \alpha \tag{25}$$

for $\mathbf{t} \in \mathbb{T}$. Therefore, if we desire similarity of our test we must take the conditional approach.

A conditional test

We have shown that proceeding conditionally results in a similar test. Here we outline, concretely, how to construct an exact similar test. There will be two limitations associated with this test. First, since the test statistic is chosen heuristically, it may not have good power in the direction of the alternative of primary interest. Second, it is generally not computationally feasible to compute the exact test.

In subsequent sections we address both of these limitations. Specifically, we derive the form of the optimal test statistic and outline an MCMC algorithm for approximating its null distribution.

Let $R(\mathbf{D})$ be some statistic of the adjacency matrix. For example $R(\mathbf{D})$ might be the network reciprocity index (Newman, 2010):

$$R(\mathbf{D}) = \frac{2\hat{P}(\leftrightarrow)}{2\hat{P}(\leftrightarrow) + \hat{P}(\rightarrow)}, \tag{26}$$

where

$$\hat{P}(\rightarrow) = \frac{2}{N(N-1)} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \left[D_{ij}(1 - D_{ji}) + (1 - D_{ij}) D_{ji}\right] \tag{27}$$

equals the fraction of dyads which take an unreciprocated or "asymmetric" configuration and

$$\hat{P}(\leftrightarrow) = \frac{2}{N(N-1)} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} D_{ij} D_{ji} \tag{28}$$

the fraction which take a reciprocated or "mutual" configuration. Heuristically, this choice of $R(\mathbf{D})$ might be useful for detecting whether agents have a taste for reciprocated links.

A conditional test based upon $R(\mathbf{D})$ will have a critical function of

$$\phi(\mathbf{d}) = \begin{cases} 1 & R(\mathbf{d}) > c_\alpha(\mathbf{t}) \\ g_\alpha(\mathbf{t}) & R(\mathbf{d}) = c_\alpha(\mathbf{t}) \\ 0 & R(\mathbf{d}) < c_\alpha(\mathbf{t}) \end{cases} \tag{29}$$

where the values of $c_\alpha(\mathbf{t})$ and $g_\alpha(\mathbf{t}) \in [0, 1]$ are chosen to satisfy the requirement that $E_\theta[\phi(\mathbf{D}) \mid \mathbf{T} = \mathbf{t}] = \alpha$.

Under the null all adjacency matrices with $\mathbf{S} = \mathbf{s}$ and $\mathbf{M} = \mathbf{m}$ are equally probable. Therefore the null distribution of $R(\mathbf{D})$ coincides with the one induced by a discrete uniform distribution on $\mathbb{D}_{\mathbf{s},\mathbf{m}}$. By enumerating all adjacency matrices in $\mathbb{D}_{\mathbf{s},\mathbf{m}}$ we could compute this distribution exactly and calculate the critical values $c_\alpha(\mathbf{t})$ and $g_\alpha(\mathbf{t})$. In general such a brute force approach will be infeasible. Therefore a method of approximating the exact null distribution is required.

The intuition behind this test is straightforward. If the network in hand has an "unusually" large value of $R(\mathbf{D})$ relative to the set of all networks with the same in- and out-degree sequences and cross-link matrix, then we reject our null.

The locally best conditional test

While choosing a statistic of the adjacency matrix heuristically may lead to a test with good power in practice, there is no guarantee that it will. In this section we derive, for any interdependent preference structure, a test statistic with good power to detect small deviations from the no strategic interaction null.

Under the alternative of strategic interaction the conditional likelihood is

$$P(\mathbf{d} \mid \mathbf{T} = \mathbf{t}; \theta, \mathcal{N}) = \frac{P(\mathbf{d}; \theta, \mathcal{N})}{\sum_{\mathbf{v} \in \mathbb{D}_{\mathbf{s},\mathbf{m}}} P(\mathbf{v}; \theta, \mathcal{N})}. \tag{30}$$

Two features of this likelihood make it impractical and/or unattractive for use in testing. First, it is complicated and (logically) cannot be evaluated without specifying an explicit equilibrium selection mechanism, $\mathcal{N}(\mathbf{d}, \mathbf{u}; \theta)$. We wish to develop inference methods which do not depend upon details of equilibrium selection. Even if the researcher were able to specify $\mathcal{N}(\mathbf{d}, \mathbf{u}; \theta)$, numerical evaluation of the likelihood may be impractical. Second, equation (30) depends on the nuisance parameter $\delta$.

[Footnote: In fact very little is known about the set $\mathbb{D}_{\mathbf{s},\mathbf{m}}$; for example we are aware of no method for checking whether a given $\mathbf{s}, \mathbf{m}$ pair is graphic. From related settings we believe that the cardinality of $\mathbb{D}_{\mathbf{s},\mathbf{m}}$ will typically be intractably huge even for modestly-sized networks. See Blitzstein and Diaconis (2011) for discussion of this point and examples from a related setting.]

[Footnote: Of course, there is a longstanding tradition of choosing test statistics heuristically. See Cox (2006) for interesting discussion.]

Assume that both $\delta$ and the equilibrium selection mechanism $\mathcal{N}(\mathbf{d}, \mathbf{u}; \theta)$ are known (the latter up to knowledge of $\gamma$ of course). Then, by the Neyman-Pearson Lemma, the most powerful test for the simple hypothesis $H_0 : \gamma = 0$ versus $H_1 : \gamma = \gamma_a$ would be based upon the likelihood ratio (LR)

$$\frac{P(\mathbf{d} \mid \mathbf{T} = \mathbf{t}; (\gamma_a, \delta')', \mathcal{N})}{|\mathbb{D}_{\mathbf{s},\mathbf{m}}|^{-1}}.$$

If the likelihood of the network in hand ($\mathbf{D} = \mathbf{d}$) is high under the alternative relative to the discrete uniform reference distribution, then we reject.
Unfortunately, as noted above, forming a LR requires specifying an equilibrium selection mechanism, $\mathcal{N}$. This is not straightforward to do, and we would prefer to avoid doing so in any case.

As an alternative to a LR test, we instead choose, for each $\mathbf{t} \in \mathbb{T}$, the critical function $\phi(\mathbf{D})$ to maximize the derivative of the (conditional) power function $\beta(\gamma, \mathbf{t}) = E[\phi(\mathbf{D}) \mid \mathbf{T} = \mathbf{t}]$ evaluated at $\gamma = 0$, subject to the (conditional) size constraint $E_\theta[\phi(\mathbf{D}) \mid \mathbf{T} = \mathbf{t}] = \alpha$. Such a $\phi(\mathbf{D})$ is locally best (Ferguson, 1967, Lemma 1, Section 5.5). Remarkably, we show that the locally best test does not depend upon the form of the equilibrium selection mechanism $\mathcal{N}(\mathbf{d}, \mathbf{u}; \theta)$.

Differentiating the power function we get

$$\left.\frac{\partial \beta(\gamma, \mathbf{t})}{\partial \gamma}\right|_{\gamma=0} = E[\phi(\mathbf{D})\, S_\gamma(\mathbf{D} \mid \mathbf{T}; \theta) \mid \mathbf{T} = \mathbf{t}] \tag{31}$$

with $S_\gamma(\mathbf{d} \mid \mathbf{t}; \theta)$ denoting the conditional score function

$$S_\gamma(\mathbf{d} \mid \mathbf{t}; \theta) = \frac{1}{P(\mathbf{d}; \delta)} \left.\frac{\partial P(\mathbf{d}; \theta)}{\partial \gamma}\right|_{\gamma=0} - \sum_{\mathbf{v} \in \mathbb{D}_{\mathbf{s},\mathbf{m}}} \left.\frac{\partial P(\mathbf{v}; \theta)}{\partial \gamma}\right|_{\gamma=0} = \frac{1}{P(\mathbf{d}; \delta)} \left.\frac{\partial P(\mathbf{d}; \theta)}{\partial \gamma}\right|_{\gamma=0} + k(\mathbf{t})$$

and $k(\mathbf{t})$ only depending on the data through $\mathbf{T} = \mathbf{t}$.

By the Neyman-Pearson lemma the test with critical function

$$\phi(\mathbf{d}) = \begin{cases} 1 & \frac{1}{P(\mathbf{d}; \delta)} \left.\frac{\partial P(\mathbf{d}; \theta)}{\partial \gamma}\right|_{\gamma=0} > c_\alpha(\mathbf{t}) \\ g_\alpha(\mathbf{t}) & \frac{1}{P(\mathbf{d}; \delta)} \left.\frac{\partial P(\mathbf{d}; \theta)}{\partial \gamma}\right|_{\gamma=0} = c_\alpha(\mathbf{t}) \\ 0 & \frac{1}{P(\mathbf{d}; \delta)} \left.\frac{\partial P(\mathbf{d}; \theta)}{\partial \gamma}\right|_{\gamma=0} < c_\alpha(\mathbf{t}) \end{cases} \tag{32}$$

where the values of $c_\alpha(\mathbf{t})$ and $g_\alpha(\mathbf{t}) \in [0, 1]$ are chosen to satisfy (23), will be locally best.

The idea behind the locally best test is as follows. If the likelihood increases sharply as we move away from the null in the direction of the alternative of interest, then we take this as evidence against the null. Intuitively, if the likelihood gradient in the neighborhood of the null is large, then the likelihood ratio will also be large for simple alternatives close to the null (i.e., when $\gamma_a \in (-\epsilon, \epsilon)$).

Constructing (32) requires calculating $\frac{1}{P(\mathbf{d}; \delta)} \left.\frac{\partial P(\mathbf{d}; \theta)}{\partial \gamma}\right|_{\gamma=0}$. This is not straightforward since it depends on properties of the likelihood under the alternative. Surprisingly, we are able to derive the form of this derivative.

Theorem 2.1. $P(\mathbf{d}; \theta, \mathcal{N})$ is twice differentiable with respect to $\gamma$ at $\gamma = 0$. Its first derivative at $\gamma = 0$ is

$$\left.\frac{\partial P(\mathbf{d}; \theta, \mathcal{N})}{\partial \gamma}\right|_{\gamma=0} = P(\mathbf{d}; \delta) \times \left[\sum_{i \neq j} s_{ij}(\mathbf{d}) \left\{ d_{ij} \frac{f_U(\mu_{ij})}{\int_{-\infty}^{v_{ij}} f_U(u)\, \mathrm{d}u} - (1 - d_{ij}) \frac{f_U(\mu_{ij})}{\int_{v_{ij}}^{\infty} f_U(u)\, \mathrm{d}u} \right\}\right], \tag{33}$$

recalling that $\mu_{ij} = A_i + B_j + X_i' \Lambda X_j$ equals the systematic, non-strategic, component of utility generated by arc $ij$ and that $f_U$ is the logistic density.

The proof of Theorem 2.1 can be found in Section A.2 of the Appendix. Because it is one of our main results, and because the argument is insightful, we provide a high level overview of its derivation here. Although $\left.\frac{\partial P(\mathbf{d}; \theta, \mathcal{N})}{\partial \gamma}\right|_{\gamma=0}$ does vary with $\delta$, it does not depend upon $\mathcal{N}$. Below we provide some intuition for this result.

Recall that $\mathbb{S} = \{\underline{s}, s_1, \ldots, s_M, \bar{s}\}$ equals the possible values of $s_{ij}(\mathbf{d})$, arranged from smallest to largest. We can use these support points to partition $\mathbb{R}$ into a set of intervals $\mathbb{B}$:

$$\mathbb{R} = (-\infty, \mu_{ij} + \gamma \underline{s}] \cup (\mu_{ij} + \gamma \underline{s}, \mu_{ij} + \gamma s_1] \cup \cdots \cup (\mu_{ij} + \gamma s_M, \mu_{ij} + \gamma \bar{s}] \cup (\mu_{ij} + \gamma \bar{s}, \infty). \tag{34}$$

The elements of $\mathbb{B}$, called buckets, correspond to the intervals listed in (34). In principle we should write $\mathbb{B}_{ij}$ instead of $\mathbb{B}$, reflecting the dependence of the bucket definitions on the value of $\mu_{ij}$, the systematic non-strategic utility associated with an $i$-to-$j$ link. However, since this dependence is not essential to any of the arguments that follow we leave it implicit. Note that the cardinality of $\mathbb{B}$ does not depend on $\mu_{ij}$, but instead equals $|\mathbb{S}| + 1$.

Agent $i$'s linking behavior vis-a-vis $j$ depends on which bucket $U_{ij}$ falls into. For $B \in \mathbb{B}$, if $U_{ij} \in B$, then we say $U_{ij}$ is in, or falls into, bucket $B$. The first and last buckets, respectively $(-\infty, \mu_{ij} + \gamma \underline{s}]$ and $(\mu_{ij} + \gamma \bar{s}, \infty)$, play an important role in our argument. We call these two buckets outer buckets. The rest of the buckets we call inner buckets.

If $U_{ij}$ falls into one of these outer buckets then player $i$ has a strictly dominant pure strategy for $d_{ij}$. Specifically, if $U_{ij}$ falls into the lowest bucket, then $i$ will direct a link to $j$ regardless of what actions are taken by the other agents in the network. The marginal utility generated by link $ij$ is so large that it remains positive across all possible configurations of the rest of the network; hence $i$ always chooses to direct a link to $j$.

If, instead, $U_{ij}$ falls into the highest bucket, then $i$ will never direct a link to $j$. In this case the marginal utility associated with link $ij$ is so low that it remains negative across all possible configurations of the rest of the network; hence $i$ never chooses to direct a link to $j$.

Finally, if $U_{ij}$ falls into an inner bucket, say $(\mu_{ij} + \gamma s_m, \mu_{ij} + \gamma s_{m+1}]$, then agent $i$'s optimal choice for $d_{ij}$ is contingent upon the linking behavior of other agents. If other agents' link actions are such that $s_{ij}(\mathbf{d}) \geq s_{m+1}$, then it is a best response for $i$ to link with $j$, but not otherwise.

The vector of idiosyncratic taste shocks, $\mathbf{U}$, contains $n = N(N - 1)$ elements; one for each possible arc. Let the boldface subscripts $\mathbf{i} = 1, 2, \ldots$ index these potential arcs in arbitrary order (e.g., $\mathbf{i}$ maps to some $ij$ and vice-versa). Let $\mathbf{b} \in \mathbb{B}^n \overset{\text{def}}{\equiv} \mathbb{B} \times \cdots \times \mathbb{B}$ and $\mathbf{U} = (U_1, \ldots, U_n)'$; we have that $\mathbf{U} \in \mathbf{b}$ for $\mathbf{b} \in \mathbb{B}^n$, so that each element of $\mathbf{u}$ falls into a bucket.

With the above notation established we can rewrite the likelihood (13) as:

$$P(\mathbf{d}; \theta, \mathcal{N}) = \sum_{\mathbf{b} \in \mathbb{B}^n} \int_{\mathbf{u} \in \mathbf{b}} \mathcal{N}(\mathbf{d}, \mathbf{u}; \theta) f_U(\mathbf{u})\, \mathrm{d}\mathbf{u}. \tag{35}$$

Expression (35) suggests a derivation by cases approach to finding $\left.\frac{\partial P(\mathbf{d}; \theta, \mathcal{N})}{\partial \gamma}\right|_{\gamma=0}$. Fortunately a brute force exhaustive approach is not required because it is possible to show that most of the summands in (35) do not influence the derivative at $\gamma = 0$.

Let $\tilde{\mathbb{B}}^n$ be the set of bucket configurations with at least two inner buckets. If at least two elements of $\mathbf{U}$ fall in inner buckets, then we have that $\mathbf{U} \in \mathbf{b}$ with $\mathbf{b} \in \tilde{\mathbb{B}}^n$. If, instead, at most one element of $\mathbf{U}$ falls in an inner bucket, then we have that $\mathbf{U} \in \mathbf{b}$ with $\mathbf{b} \in \mathbb{B}^n \setminus \tilde{\mathbb{B}}^n$. This set-up gives the likelihood decomposition:

$$P(\mathbf{d}; \theta, \mathcal{N}) = \tilde{P}(\mathbf{d}; \theta, \mathcal{N}) + Q(\mathbf{d}; \theta, \mathcal{N}), \tag{36}$$

with

$$\tilde{P}(\mathbf{d}; \theta, \mathcal{N}) = \sum_{\mathbf{b} \in \mathbb{B}^n \setminus \tilde{\mathbb{B}}^n} \int_{\mathbf{u} \in \mathbf{b}} \mathcal{N}(\mathbf{d}, \mathbf{u}; \theta) f_U(\mathbf{u})\, \mathrm{d}\mathbf{u} \tag{37}$$

$$Q(\mathbf{d}; \theta, \mathcal{N}) = \sum_{\mathbf{b} \in \tilde{\mathbb{B}}^n} \int_{\mathbf{u} \in \mathbf{b}} \mathcal{N}(\mathbf{d}, \mathbf{u}; \theta) f_U(\mathbf{u})\, \mathrm{d}\mathbf{u}. \tag{38}$$

To prove Theorem 2.1 we show that for $\gamma \to 0$

$$P(\mathbf{d}; \theta, \mathcal{N}) = \tilde{P}(\mathbf{d}; \theta, \mathcal{N}) + O\left(\gamma^2\right). \tag{39}$$

Intuitively, this follows from the fact that the chance that two or more elements of $\mathbf{U}$ fall in inner buckets is negligible when $\gamma$ is close to zero (because most of the probability mass for $U_{ij}$ is contained in the two outer buckets when strategic interactions are small). Hence when calculating the optimal test statistic we are free to focus on the cases where either all, or all but one, of the elements of $\mathbf{U}$ fall in outer buckets.
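The order-of-magnitude claim behind (39) can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes the taste shocks are independent standard logistic draws with a common $\mu_{ij}$, and the function names and the support endpoints `s_min` and `s_max` are hypothetical. It computes the probability that one shock lands in an inner bucket and the probability that two or more of `n` shocks do so.

```python
import math


def logistic_cdf(u):
    """Standard logistic CDF F_U(u) = e^u / (1 + e^u)."""
    return 1.0 / (1.0 + math.exp(-u))


def inner_bucket_prob(mu, gamma, s_min, s_max):
    """P(mu + gamma*s_min < U <= mu + gamma*s_max): mass of all inner buckets."""
    return logistic_cdf(mu + gamma * s_max) - logistic_cdf(mu + gamma * s_min)


def prob_two_or_more_inner(mu, gamma, s_min, s_max, n):
    """P(at least two of n independent shocks fall in inner buckets)."""
    p = inner_bucket_prob(mu, gamma, s_min, s_max)
    return 1.0 - (1.0 - p) ** n - n * p * (1.0 - p) ** (n - 1)


if __name__ == "__main__":
    # Assumed illustration: mu = 0, s_ij(d) in [0, 1], n = 12 arcs (N = 4).
    for gamma in (0.1, 0.01, 0.001):
        p1 = inner_bucket_prob(0.0, gamma, 0.0, 1.0)
        p2 = prob_two_or_more_inner(0.0, gamma, 0.0, 1.0, n=12)
        # p1 / gamma and p2 / gamma**2 both settle down as gamma shrinks.
        print(gamma, p1 / gamma, p2 / gamma ** 2)
```

Under these assumptions the single-shock probability scales linearly in $\gamma$ while the two-or-more probability scales with $\gamma^2$, which is why the term $Q$ drops out of the first derivative at $\gamma = 0$.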
We can then show that

$$\left.\frac{\partial P(\mathbf{d}; \theta, \mathcal{N})}{\partial \gamma}\right|_{\gamma=0} = \left.\frac{\partial \tilde{P}(\mathbf{d}; \theta, \mathcal{N})}{\partial \gamma}\right|_{\gamma=0}. \tag{40}$$

Hence to derive the form of $\left.\frac{\partial P(\mathbf{d}; \theta, \mathcal{N})}{\partial \gamma}\right|_{\gamma=0}$ we need only calculate $\left.\frac{\partial \tilde{P}(\mathbf{d}; \theta, \mathcal{N})}{\partial \gamma}\right|_{\gamma=0}$. This calculation is non-trivial, but doable. Details are provided in the proof.

It was not ex ante obvious that a useful expression for $\left.\frac{\partial P(\mathbf{d}; \theta, \mathcal{N})}{\partial \gamma}\right|_{\gamma=0}$ would be available without any assumptions about the nature of equilibrium selection under the alternative. That such an expression is available follows from the fact that when $\gamma$ is small most agents will have a strictly dominant pure strategy for how to link; hence the chance of multiple equilibria is low and the form of $\mathcal{N}$ can generally be deduced. Conversely, when $\gamma$ is small the probability of a draw of $\mathbf{U}$ where many agents do not have a strictly dominant pure strategy, and hence the details of equilibrium selection matter, is very low.

Locally best vs. heuristic test statistics

With a little manipulation we can simplify:

$$\frac{1}{P(\mathbf{d}; \delta)} \left.\frac{\partial P(\mathbf{d}; \theta)}{\partial \gamma}\right|_{\gamma=0} = \sum_{i \neq j} \left[d_{ij} - F_U(\mu_{ij})\right] s_{ij}(\mathbf{d}) \tag{41}$$

where $F_U(u) = e^u / [1 + e^u]$ is the logistic CDF. The simplification uses the logistic identity $f_U(u) = F_U(u)[1 - F_U(u)]$: at $\gamma = 0$ the two integrals in (33) equal $F_U(\mu_{ij})$ and $1 - F_U(\mu_{ij})$, so the term in braces reduces to $d_{ij}[1 - F_U(\mu_{ij})] - (1 - d_{ij}) F_U(\mu_{ij}) = d_{ij} - F_U(\mu_{ij})$.

This form of the statistic provides insight into how our test accumulates evidence against the null in practice. Consider the case where $s_{ij}(\mathbf{d}) = d_{ji}$, as would be true if agents have a taste for reciprocated links. Observe that $F_U(\mu_{ij})$ corresponds to the probability of an $ij$ edge under the null. Therefore the optimal test statistic is large if we observe that many $ij$ links with low probability under the null are reciprocated. It is not many reciprocated links that drives rejection per se, but the presence of many "unexpected" reciprocated links.

Consider a network of boys and girls with agents exhibiting a strong taste for gender-based homophily. The optimal test statistic in this case is the conditional sample covariance of $D_{ij}$ and $D_{ji}$ given $(A_i, B_i, X_i)$ and $(A_j, B_j, X_j)$. The test based upon the reciprocity index is, essentially, based upon the unconditional covariance. The effect of conditioning is, for example, to give more weight to heterophilous reciprocated links than to homophilous ones. Similarly we give more weight to reciprocated links across low degree agents than to those across high degree agents.

Because a complete enumeration of $\mathbb{D}_{\mathbf{s},\mathbf{m}}$ is not feasible unless $N$ is very small, making our test practical requires a method of constructing uniform random draws from this set. Such draws can be used to simulate the null distribution of any test statistic of interest.

The problem of simulating binary matrices with fixed marginals is well-studied (e.g., Sinclair, 1993), with many domain-specific applications (e.g., species co-occurrence/interaction analysis).
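For a very small network the exact conditional null distribution can still be obtained by brute force, which makes the logic of the test concrete. The sketch below is illustrative only and rests on several assumptions: the function names are hypothetical, the systematic utilities `mu` are treated as known (in practice they would have to be estimated), the statistic is (41) with $s_{ij}(\mathbf{d}) = d_{ji}$, and the reported value is the plain tail probability rather than the randomized critical function in (29).

```python
from itertools import product
import math


def sufficient_stats(d, groups):
    """Out-degrees, in-degrees and group-to-group (cross) link counts of d."""
    n, k = len(d), max(groups) + 1
    out_deg = tuple(sum(row) for row in d)
    in_deg = tuple(sum(d[i][j] for i in range(n)) for j in range(n))
    cross = [[0] * k for _ in range(k)]
    for i in range(n):
        for j in range(n):
            cross[groups[i]][groups[j]] += d[i][j]
    return out_deg, in_deg, tuple(tuple(r) for r in cross)


def reference_set(d_obs, groups):
    """Enumerate every digraph sharing the sufficient statistics of d_obs."""
    n = len(d_obs)
    arcs = [(i, j) for i in range(n) for j in range(n) if i != j]
    target = sufficient_stats(d_obs, groups)
    found = []
    for bits in product((0, 1), repeat=len(arcs)):  # 2^(n(n-1)) candidates
        d = [[0] * n for _ in range(n)]
        for (i, j), b in zip(arcs, bits):
            d[i][j] = b
        if sufficient_stats(d, groups) == target:
            found.append(d)
    return found


def locally_best_reciprocity(d, mu):
    """Statistic (41) with s_ij(d) = d_ji and logistic F_U."""
    n = len(d)
    cdf = lambda u: 1.0 / (1.0 + math.exp(-u))
    return sum((d[i][j] - cdf(mu[i][j])) * d[j][i]
               for i in range(n) for j in range(n) if i != j)


def exact_p_value(d_obs, groups, stat):
    """Share of the reference set with a statistic at least as large."""
    ref = reference_set(d_obs, groups)
    obs = stat(d_obs)
    return sum(stat(d) >= obs for d in ref) / len(ref), len(ref)


# Assumed toy data: four agents, two groups, two reciprocated cross-group dyads.
D_OBS = [[0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]]
GROUPS = [0, 0, 1, 1]
MU = [[0.0] * 4 for _ in range(4)]  # hypothetical systematic utilities

if __name__ == "__main__":
    print(exact_p_value(D_OBS, GROUPS, lambda d: locally_best_reciprocity(d, MU)))
```

With four agents there are only $2^{12}$ candidate matrices, so the loop is instant; the count doubles with every additional potential arc, which is why the simulation methods discussed next are needed.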
In practice one of two simulation approaches is used (see Kolaczyk (2009) for a textbook overview).

The first approach begins with an empty graph and randomly adds links. Links need to be added such that the end graph satisfies the degree sequence constraint. Blitzstein and Diaconis (2011) develop an algorithm along these lines. They cleverly use checks for graphicality of a degree sequence, available in the discrete math literature, to add links in a way which constrains the end graph to be in the target set. They further use importance sampling to ensure that averages of simulated network statistics are with respect to the target uniform distribution. See also Del Genio et al. (2010) and Kim et al. (2012). Graham and Pelican (2020) provide a textbook discussion of the Blitzstein and Diaconis (2011) algorithm.

The second approach, to which our new method belongs, uses MCMC. Specifically, an initial graph satisfying the target constraints is randomly rewired many times to create a new graph from the target set. Key to this approach is ensuring that each rewiring is compatible with the target constraints (e.g., maintains the network's degree sequence). The algorithm also needs to be constructed carefully to ensure that the end graph is a uniform random draw from the target set. Sinclair (1993), Rao et al. (1996), McDonald et al. (2007), Berger and Müller-Hannemann (2009) and Tao (2016) all developed MCMC methods for simulating graphs (or digraphs) with given degree sequences.

We are aware of no extant method of generating adjacency matrix draws from $\mathbb{D}_{\mathbf{s},\mathbf{m}}$. The novelty of this problem, relative to the work described above, is the presence of the additional cross link matrix constraint, $\mathbf{M}$. In the discrete math literature the cross link matrix constraint corresponds to what is called a partition adjacency matrix (PAM) constraint. Czabarka et al. (2017) conjecture that determining whether a given $\mathbf{s}, \mathbf{m}$ pair is graphical, the PAM realization problem, is NP-complete.
If their conjecture is correct (and NP $\neq$ P), using a Blitzstein and Diaconis (2011) type algorithm to draw from $\mathbb{D}_{\mathbf{s},\mathbf{m}}$ is not feasible.

This leaves MCMC methods. Erdős et al. (2017) showed that naively incorporating a PAM constraint into existing MCMC algorithms destroys their correctness. In this section we introduce a new MCMC algorithm that does generate uniform random draws from $\mathbb{D}_{\mathbf{s},\mathbf{m}}$. This algorithm is of independent interest. Before describing the algorithm we introduce some additional definitions and notation. We start by defining an alternating walk.

Definition 3.1. (Alternating Walk)

An alternating walk $H$ is a sequence of (ordered) dyads of the form

$$H := (i_1, i_2), (i_3, i_2), (i_3, i_4), \ldots, (i_l, i_{l-1}) \tag{42}$$

or

$$H := (i_2, i_1), (i_2, i_3), (i_4, i_3), \ldots, (i_{l-1}, i_l) \tag{43}$$

with $i_k \in V(G)$, $i_k \neq i_{k+1}$, $i_k \neq i_{k-1}$ and

(i) if $(i_k, i_{k-1}), (i_k, i_{k+1})$ in $H$ and $(i_k, i_{k-1}) \in A(G)$, then $(i_k, i_{k+1}) \notin A(G)$

(ii) if $(i_k, i_{k-1}), (i_k, i_{k+1})$ in $H$ and $(i_k, i_{k-1}) \notin A(G)$, then $(i_k, i_{k+1}) \in A(G)$

(iii) if $(i_{k-1}, i_k), (i_{k+1}, i_k)$ in $H$ and $(i_{k-1}, i_k) \in A(G)$, then $(i_{k+1}, i_k) \notin A(G)$

(iv) if $(i_{k-1}, i_k), (i_{k+1}, i_k)$ in $H$ and $(i_{k-1}, i_k) \notin A(G)$, then $(i_{k+1}, i_k) \in A(G)$

for all $k = 2, \ldots, l - 1$. As shorthand we write $H := i_1 i_2 \ldots i_l$.

To unpack Definition 3.1 it is easiest to consider an example. Observe that for $H := i_1 i_2 \ldots i_l$, the adjacency matrix entries $D_{i_1 i_2}, D_{i_3 i_2}, \ldots, D_{i_l i_{l-1}}$ alternate between ones and zeros (or zeros and ones). This observation suggests a method of constructing an alternating walk via a sequence of "hops" across the adjacency matrix: pick row $i_1$ of the adjacency matrix and move horizontally to column $i_2$, where $i_2$ corresponds to one of the agents to which $i_1$ directs a link; next move vertically to row $i_3$, where $i_3$ is an agent which does not direct a link to $i_2$, and so on. We call the horizontal moves active steps and the vertical moves passive steps. Figure 2 provides an example construction. The different cases in Definition 3.1 correspond to walks beginning/ending with passive/active steps.

The length of an alternating walk equals the number of ordered dyads used to define it. An important type of alternating walk, which following Tao (2016) we call an alternating cycle, is central to our algorithm.
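The hop construction of an alternating walk can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the function names are hypothetical, the walk always starts with an active step, and no link marking is applied, so a dyad may be revisited. The adjacency matrix is the ten-agent example of Figure 2, with agents a to j mapped to indices 0 to 9.

```python
import random

# Adjacency matrix of the Figure 2 example (rows/columns a..j -> 0..9).
ROWS = ["0100100000", "0000000000", "0001000100", "0000000000", "0010000000",
        "0000000110", "0000000000", "0010001000", "0000000000", "0000001010"]
D = [[int(c) for c in row] for row in ROWS]


def candidates(d, node, active):
    """Feasible next agents from `node` in an active or a passive step."""
    n = len(d)
    if active:  # horizontal move: columns j to which `node` directs a link
        return [j for j in range(n) if d[node][j] == 1]
    # vertical move: rows k (other than `node`) that do NOT link to `node`
    return [k for k in range(n) if k != node and d[k][node] == 0]


def random_alternating_walk(d, max_steps, rng):
    """Start at a random agent, then alternate active and passive steps."""
    walk = [rng.randrange(len(d))]
    for step in range(max_steps):
        options = candidates(d, walk[-1], active=(step % 2 == 0))
        if not options:  # cannot proceed any further
            break
        walk.append(rng.choice(options))
    return walk


def walk_probability(d, walk):
    """Ex ante probability of a valid walk that begins with an active step."""
    prob = 1.0 / len(d)
    for step in range(len(walk) - 1):
        prob /= len(candidates(d, walk[step], active=(step % 2 == 0)))
    return prob


def walk_entries(d, walk):
    """Adjacency entries visited by the walk; they alternate 1, 0, 1, 0, ..."""
    return [d[walk[t]][walk[t + 1]] if t % 2 == 0 else d[walk[t + 1]][walk[t]]
            for t in range(len(walk) - 1)]


if __name__ == "__main__":
    w = random_alternating_walk(D, 8, random.Random(0))
    print(w, walk_entries(D, w), walk_probability(D, w))
```

For the walk that starts at j, takes an active step to g and a passive step to a, `walk_probability` multiplies 1/10, 1/2 and 1/7, matching the selection probabilities described in the notes to Figure 2.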

Definition 3.2. (Alternating Cycle)

The alternating walk $C$ is an alternating cycle if $i_1 = i_l$ and $C$ has even length.

The length of an alternating cycle is at least four. Let $D_{i_1 i_2}, D_{i_3 i_2}, \ldots, D_{i_l i_{l-1}}$ be the sequence of adjacency matrix entries associated with alternating cycle $C$ in $G$. These entries necessarily form an alternating sequence of zeros and ones (or ones and zeros).

Consider constructing an alternative digraph, say $G'$, by replacing all the "ones" in $C$ with "zeros" and all "zeros" with "ones". Rewiring $G$ in this way is degree preserving: $G'$ has the same in- and out-degree sequence as $G$. We refer to such operations as switching the cycle (since we switch the zeros and ones).

Figure 3 depicts two canonical alternating cycles (Rao et al., 1996). The first, $C := abcda$, is a so-called alternating rectangle. In the configuration to the left $a$ and $d$ each have a single outlink and $b$ and $c$ a single inlink; this is also true in the configuration to the right, which corresponds to the network generated by switching $C$. The second cycle, called a compact alternating hexagon, can be constructed from a 030C triad.

Let $\mathbb{G}_{\mathbf{s}}$ denote the set of all digraphs with degree sequence $\mathbf{s}$. Rao et al. (1996) showed that, for $G$ and $G'$ distinct and belonging to $\mathbb{G}_{\mathbf{s}}$, it is possible to obtain $G'$ from $G$ by switching a sequence of alternating rectangles and compact alternating hexagons. In practice it may be useful to work with longer alternating cycles, in the sense that one can then move from $G$ to $G'$ with fewer cycle switches (e.g., McDonald et al., 2007; Tao, 2016).

Let $K_N$ denote the complete graph on $N$ vertices. We define the link marking function $m : A(K_N) \to \{0, 1\}$. We say the link $(i_1, i_2)$ is marked if $m((i_1, i_2)) = 1$ and unmarked if $m((i_1, i_2)) = 0$. The expression "mark a link" means the marking function is changed such that $m((i_1, i_2)) = 1$. We use the expression "unmark a link" analogously.

Definition 3.3. (Schlaufe)

An alternating walk $H := i_1 i_2 \ldots i_l$ is a schlaufe if either

(i) There is a node $i_k \in \{i_1 i_2 \ldots i_l\}$ with $k \neq l$ such that $i_k = i_l$ and $(k - l) \bmod 2 = 0$. Furthermore, for any two nodes $i_j$ and $i_h$ in $\{i_1 i_2 \ldots i_{l-1}\}$ with $i_j = i_h$ and $j \neq h$ it holds that $(j - h) \bmod 2 = 1$.

(ii) At node $i_l$ there is no other node $i_{l+1}$ such that the alternating walk could be extended with the unmarked link $(i_l, i_{l+1})$.

[Footnote: The description of the hop construction of an alternating walk is essentially due to Tao (2016, p. 124).]

[Footnote: The requirement that $i_k = i_l$ and $(k - l) \bmod 2 = 0$ ensures that $C = i_k i_{k+1} \ldots i_l$ is an alternating cycle (imposing even length). The "furthermore..." requirement ensures that if another node is visited multiple times it does not form an alternating cycle (imposing non-even length). See Figure 4 for an example.]

In German schlaufe corresponds to "loop", "bow" or "ribbon" (its plural is schlaufen); the latter translation is evocative of our meaning here. In the first case the schlaufe will coincide with an alternating walk which includes exactly one alternating cycle. Visually, schlaufen of the first type, with the nodes appropriately placed, will look like loops and ribbons. In the second case the schlaufe does not include an alternating cycle.

Figure 2

Panel A: Alternating Walk (adjacency matrix; rows are senders, columns are receivers)

        a  b  c  d  e  f  g  h  i  j
    a   0  1  0  0  1  0  0  0  0  0
    b   0  0  0  0  0  0  0  0  0  0
    c   0  0  0  1  0  0  0  1  0  0
    d   0  0  0  0  0  0  0  0  0  0
    e   0  0  1  0  0  0  0  0  0  0
    f   0  0  0  0  0  0  0  1  1  0
    g   0  0  0  0  0  0  0  0  0  0
    h   0  0  1  0  0  0  1  0  0  0
    i   0  0  0  0  0  0  0  0  0  0
    j   0  0  0  0  0  0  1  0  1  0

Panel B: Degree Sequence

    Agent   Indegree   Outdegree
    a       0          2
    b       1          0
    c       2          2
    d       1          0
    e       1          1
    f       0          2
    g       2          0
    h       2          2
    i       2          0
    j       0          2

Source: Authors' calculations.
Notes: Panel A depicts an alternating walk constructed using the adjacency matrix. Agent labels are given in the first column and row of the table. To construct such a walk randomly we begin by choosing an agent at random. Here agent j is chosen, with an ex ante probability of 1/10 since there are ten agents in the network. Next we take an active step where one of agent j's outlinks is chosen at random. Here we choose the outlink to agent g, an event with an ex ante probability of 1/2 since agent j has just two outlinks. Following the active step comes a passive step. In a passive step we move vertically to the row of an agent which does not direct a link to the current agent. Here we choose a from the set {a, b, c, d, e, f, i} uniformly at random (i.e., with an ex ante probability of 1/7). We continue with active and passive steps until we choose to stop or can proceed no further. Panel B reports the indegree and outdegree of each agent in the network. Observe that in active steps the probability of any feasible choice equals the inverse of the outdegree of the current agent. In passive steps the probability of any feasible choice equals the inverse of the number of nodes minus the indegree of the node chosen in the prior step minus 1 (since $i_k \neq i_{k+1}$). We can also construct alternating walks by the above procedure, but instead starting with a passive step.

Figure 3: Examples of alternating cycles

[Top row: an alternating rectangle on nodes a, b, c, d, shown before and after switching; node sequence a b c d a with selection probabilities 1/4, 1/1, 1/3, 1/1, 1/2. Bottom row: a compact alternating hexagon on nodes a, b, c, shown before and after switching; node sequence a b c a b c a with selection probabilities 1/3, 1/1, 1/1, 1/1, 1/1, 1/1, 1/1.]

Source: Authors' calculations.
Notes: The first row depicts an alternating rectangle before and after switching. The second row depicts a compact alternating hexagon before and after switching. The final column of the figure shows the probability with which each node is chosen, resulting in the total probability of the schlaufe.

Associated with a schlaufe, $R$, is a $K \times K$ violation matrix which records the number of extra links from group $k$ to group $l$ generated by switching the alternating cycle in $R$ (if there is one). Consider an alternating rectangle consisting of two boys and two girls. If initially one boy directs a link to the other and one girl directs a link to the other, then after switching the cycle the violation matrix will equal:

    Ego \ Alter   Boy   Girl
    Boy           -1     1
    Girl           1    -1

A sequence of schlaufen $\mathbf{R} = (R_1, \ldots, R_k)$ is feasible if (i) the cycles of the schlaufen are link disjoint and (ii) the sum of their violation matrices is zero (and, for $i < k$, the sum of the violation matrices of $R_1, \ldots, R_i$ is not zero).

Conventional MCMC adjacency matrix re-wiring algorithms work by switching short cycles (e.g., alternating rectangles and compact alternating hexagons). Switches of this type, while preserving the in- and out-degree sequence of the network, will typically generate networks with the wrong inter-group link structure (i.e., non-zero link violation matrices). Our approach to solving this problem involves switching many alternating cycles simultaneously such that their individual link violation matrices sum to zero.

Let $\mathbf{S} = \mathbf{s}$ and $\mathbf{M} = \mathbf{m}$ be the degree sequence and cross link matrix of the network in hand. In order to obtain a draw, say $G'$, from $\mathbb{G}_{\mathbf{s},\mathbf{m}}$ we (i) start with a realization of $(\mathbf{s}, \mathbf{m})$, say $G$, (ii) randomly construct (link disjoint) schlaufen, and (iii) switch any alternating cycles in them. While switching cycles will preserve the degree sequence, it may, as discussed earlier, result in a graph without the appropriate cross link matrix. In order to ensure that $G'$ has the appropriate cross link matrix, we construct schlaufen until either the sum of their violation matrices equals zero or we stop randomly. If the sum of the schlaufen violation matrices is zero we move to $G'$ from $G$ by switching the cycles, otherwise we set $G' = G$. Proceeding in this way ensures that $G'$ is, in fact, a random draw from $\mathbb{G}_{\mathbf{s},\mathbf{m}}$. We show that, after sufficiently many iterations of this process, a graph constructed in this way corresponds to a uniform random draw from $\mathbb{G}_{\mathbf{s},\mathbf{m}}$. A formal statement of the procedure is provided by Algorithm 1.

Algorithm 1 uses a subroutine to find schlaufen. This subroutine, described in Algorithm 2, finds and marks a schlaufe in the graph.

To illustrate our method in more detail consider the network depicted in Panel A of Figure 4.
This network consists of two types of agents: gold (light) and blue (dark). The cross link matrix for the graph is given in Panel D. In Panels B and C a sequence of three schlaufen is shown. The first schlaufe is $R_1 = jgabcdeca$. It is constructed through a sequence of active and passive steps as described earlier (see also the notes to Figure 2 above). We begin by choosing agent j randomly with a probability of 1/10 (since there are ten agents in the network). We then take an active step, randomly choosing one of the two agents to which j directs a link (i.e., either agent g or i). Here we choose agent g. Next we take a passive step. Specifically, we choose an agent at random from the set of agents that do not direct a link to g (the agent chosen in the previous active step). The probability associated with our choice in this passive step is 1/7; this corresponds to the reciprocal of the number of agents in the network (i.e., 10) minus the indegree of the current agent (i.e., 2) minus one (since self-loops are not allowed). We continue taking active and passive steps in this way until we visit a for the second time. At this point we stop since our schlaufe now includes the alternating cycle $C = abcdeca$. Note that c is also visited twice, but that $cdec$ is not an alternating cycle since it is not of even length (see Definition 3.2).

Figure 4: A feasible schlaufen sequence

[Panel A: Network prior to edge swaps. Panel B: Network with three schlaufen shown. Panel C: Network after edge swaps. All three panels show the ten agents a to j.]

Panel D: Cross-link (PAM) matrix

            Blue   Gold
    Blue    1      3
    Gold    2      5

Panel E: Three schlaufen (node sequence and selection probabilities)

    Schlaufe 1:  j     g    a    b    c    d    e    c    a
    sel. Pr      1/10  1/2  1/7  1/2  1/8  1/2  1/8  1/1  1/8

    Schlaufe 2:  c     h    e
    sel. Pr      1/10  1/1  1/7

    Schlaufe 3:  f     h    j    i    h    g    f
    sel. Pr      1/10  1/2  1/6  1/1  1/7  1/1  1/6

Panel F: Violation matrices for the three schlaufen

    Schlaufe 1      Blue   Gold
    Blue            -1     +1
    Gold            +1     -1

    Schlaufe 2      Blue   Gold
    Blue             0      0
    Gold             0      0

    Schlaufe 3      Blue   Gold
    Blue            +1     -1
    Gold            -1     +1

Source: Authors' calculations.
Notes: See the discussion in the main text. The figure depicts three link disjoint schlaufen with violation matrices which sum to zero. Panel E reports the (ex ante) probability that a given node was selected as the schlaufe was constructed. See equation (44).

Algorithm 1 Markov Draw Algorithm

Inputs: An adjacency matrix $\mathbf{d} \in \mathbb{D}_{\mathbf{s},\mathbf{m}}$; a mixing time $\tau$

Procedure:

1. Set $t = 0$.
2. With probability $1 - q$ go to step 3, with probability $q$ go to step 4.
3. Find and mark a schlaufe (see Algorithm 2):
   (a) if the sum of the schlaufen violation matrices is zero, then
       i. switch the cycles in the schlaufen (changing the adjacency matrix $\mathbf{d}$),
       ii. unmark all links,
       iii. go to step 4.
   (b) else
       i. with probability 1/2, go to step 3 or
       ii. with probability 1/2, unmark all links and go to step 4.
4. Set $t = t + 1$
   (a) if $t = \tau$ then return $\mathbf{d}$
   (b) else go to step 2

Output: A uniform random draw $\mathbf{d}$ from $\mathbb{D}_{\mathbf{s},\mathbf{m}}$

Algorithm 2 Schlaufe Detection Algorithm

Inputs: An adjacency matrix $\mathbf{d} \in \mathbb{D}_{\mathbf{s},\mathbf{m}}$ (this network may have marked links in it)

Procedure:

1. Choose an agent/node, say $i$, at random.
2. Mark agent $i$ as active and
   (a) if feasible, randomly choose one of $i$'s (unmarked) outlinks, say to $j$, and go to step 3;
   (b) else (i.e., no unmarked outlinks available) go to step 6.
3. Mark edge $ij$, chosen in step 2, and
   (a) if agent $j$ is already marked passive, then go to step 6;
   (b) else go to step 4.
4. Mark agent $j$, chosen in step 3, as passive and
   (a) if feasible, randomly choose an agent, say $k$, from among those who do not direct links to $j$, and go to step 5;
   (b) else go to step 6.
5. Mark edge $kj$, with $k$ the agent chosen in step 4, as passive and
   (a) if agent $k$ is already marked active, then go to step 6;
   (b) else go to step 2.
6. Return the (marked) adjacency matrix, the constructed schlaufe and its violation matrix.

Output: A schlaufe, its violation matrix and a marked adjacency matrix.

As seen in the example we can calculate the probability of a schlaufe $R$ as we go through the algorithm (see Panel E). In Step 1 of Algorithm 2 an agent is chosen with probability $1/N$. Next let $r^a_G(i)$ be the cardinality of the set of feasible outlinks in an active step. This set consists of all the outlinks of node $i$ which are not already marked in $\mathbf{D}$. Similarly, let $r^p_G(i)$ be the cardinality of the set of feasible links in a passive step. That set consists of all the links $ij$ for which $ji$ is not in $A(G)$ and which are not already marked. The probability of $R = (i_1, \ldots, i_l)$ can now be written as

$$p_G(R) = \frac{1}{N} \prod_{k=1}^{l-1} \left( \frac{[k \bmod 2]}{r^a_G(i_k)} + \frac{[(k - 1) \bmod 2]}{r^p_G(i_k)} \right). \tag{44}$$

In step 2 of Algorithm 1 we attempt to find a sequence of schlaufen with probability $1 - q$ and do not change the adjacency matrix otherwise. In step 3, a schlaufen sequence $\mathbf{R} = (R_1, \ldots, R_h)$ is constructed/found. After each detected schlaufe in this sequence, say $R_k$, any cycle in it is marked. Let $G_k$ be the graph with the cycles of $R_1, \ldots, R_{k-1}$ marked. After each schlaufe is added the construction is stopped with probability 1/2. The probability of finding a schlaufe $R_k$ is $p_{G_k}(R_k)$ as given in equation (44) above. The total probability of a feasible schlaufen sequence $\mathbf{R}$ is therefore

$$p_G(\mathbf{R}) = (1 - q) \left(\frac{1}{2}\right)^{h-1} \prod_{k=1}^{h} p_{G_k}(R_k). \tag{45}$$

To show that our algorithm does indeed generate a uniform random draw from the set $\mathbb{D}_{\mathbf{s},\mathbf{m}}$ we use standard Markov chain theory (e.g., Chapters 7 and 10 of Mitzenmacher and Upfal (2005)).

The random rewiring of the network implemented by Algorithm 1 can be described as a Markov chain. To show that, for $\tau$ large enough, it returns a uniform random draw from $\mathbb{D}_{\mathbf{s},\mathbf{m}}$ we prove that the stationary distribution of the Markov chain generated by Algorithm 1 is uniform on $\mathbb{D}_{\mathbf{s},\mathbf{m}}$. To show this it is helpful to develop a graphical representation of the Markov chain.

We denote the state graph of the Markov chain by $\Phi = (V_\phi, A_\phi)$. Its underlying vertex set $V_\phi$ is the set of all elements in $\mathbb{G}_{\mathbf{s},\mathbf{m}}$. That is, each node in our state graph is a network with degree sequence $\mathbf{S} = \mathbf{s}$ and cross link matrix $\mathbf{M} = \mathbf{m}$. For $G$ in $\mathbb{G}_{\mathbf{s},\mathbf{m}}$, we denote by $v_G$ the corresponding vertex in $V_\phi$. The arc set $A_\phi$ is defined as follows.

1. For all vertices we add the self loop $(v_G, v_G)$ with (probability) weight $q$ (see Step 2 of Algorithm 1).
2. Let $G$ and $G'$ be two different networks in $\mathbb{G}_{\mathbf{s},\mathbf{m}}$. Let $G \Delta G'$ equal the union of the set of edges in $G$, but not in $G'$, and the set of edges in $G'$, but not in $G$. For each feasible schlaufen sequence $\mathbf{R}$ with cycle edge set equal to $G \Delta G'$ we add the edge $(v_G, v_{G'})$ and assign to it probability weight $p_G(\mathbf{R})$.
3. Finally we add a directed loop $(v_G, v_G)$ if the probabilities of all arrows leaving $v_G$, introduced in points 1 and 2 immediately above, do not sum to 1. The probability of this loop is 1 minus the sum of the probabilities of all other outward arrows.

The probability of any arc $a \in A_\phi$ is denoted by $p(a)$. Note that, by definition, the state graph can have parallel arcs and loops.

With these definitions in place we can prove correctness of the algorithm. First we show that the probability of the algorithm moving from graph $G$ to $G'$ coincides with the probability of moving in the reverse direction.

Lemma 3.1.

For any two vertexes v G , v (cid:48) G the transition probability attached to ( v G , v (cid:48) G ) equalsthat attached to ( v (cid:48) G , v G ) .Proof. See appendix A.3.Next we show the state graph is strongly connected. This means our Algorithm movesfrom any G ∈ G s , m to any other G (cid:48) ∈ G s , m with positive probability. Lemma 3.2.

The state graph Φ is strongly connected.Proof. See appendix A.3.With these two lemmata it is east to show that the stationary distribution is uniform on G s , m . This gives us the main result of the section. Theorem 3.3.

Algorithm 1 is a random walk on the state graph Φ which samples uniformlya network from G s , m for τ → ∞ . roof. See appendix A.3.We provide an easy to use python implementation of this algorithm as well as the optimaltest to a given utility function.Surce: https://github.com/AndrinPelican/ugd

Package: https://pypi.org/project/ugd/
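The argument behind Theorem 3.3 (symmetric transition probabilities, a strongly connected state graph, and self-loops that absorb any leftover probability) can be illustrated on a toy chain. The sketch below is not the ugd package and does not construct schlaufen; the four states and the move probabilities are hypothetical stand-ins for networks in the target set, chosen only to show that such a chain settles on the uniform distribution.

```python
def build_lazy_chain(n_states, moves):
    """Return a transition matrix with symmetric moves and self-loops.

    `moves` maps a pair (g, h) to the probability of moving from g to h.
    The same probability is used for h to g (the symmetry of Lemma 3.1).
    """
    P = [[0.0] * n_states for _ in range(n_states)]
    for (g, h), prob in moves.items():
        P[g][h] = prob
        P[h][g] = prob
    for g in range(n_states):
        leftover = 1.0 - sum(P[g])
        if leftover < -1e-12:
            raise ValueError("outgoing probabilities exceed one")
        # The self-loop absorbs whatever probability is not used by moves.
        P[g][g] = max(leftover, 0.0)
    return P


def distribution_after(P, start, steps):
    """Distribution over states after `steps` transitions from `start`."""
    n = len(P)
    dist = [0.0] * n
    dist[start] = 1.0
    for _ in range(steps):
        dist = [sum(dist[g] * P[g][h] for g in range(n)) for h in range(n)]
    return dist


# Hypothetical example: four states linked in a line, so the state graph
# is connected; the move probabilities are arbitrary illustrative values.
toy_moves = {(0, 1): 0.3, (1, 2): 0.2, (2, 3): 0.4}
toy_chain = build_lazy_chain(4, toy_moves)
long_run = distribution_after(toy_chain, start=0, steps=2000)
```

Because the matrix is symmetric its columns also sum to one, so the uniform distribution is stationary; connectivity and the self-loops make it the limit from any starting state.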

Figure 5 shows the 2005 network of diplomatic exchanges in the Americas. We constructed this network using the Correlates of War Diplomatic Exchange dataset (Bayer, 2006). An ij arc indicates that country i has sent an ambassador to country j. As emphasized by Kinne (2014), diplomatic recognition is a core tool of statecraft. Consequently, the decision to establish a diplomatic mission in a country likely has strategic aspects. As a contemporary example, consider the decision to maintain diplomatic ties with Syria after the onset of the Syrian Civil War. This decision appears, in part, to be predicated upon the nature of a state’s bilateral relations with Syria’s long term allies Russia and Iran (along the lines of the “friend of my friend is also my friend” principle).

The diplomatic network is also driven by a desire for prestige and practicality. Some nations, like the United States, send and host many ambassadors. Others send and receive very few. The network also has geographic dimensions: countries typically send ambassadors to close neighbors. These aspects of the network naturally generate network transitivity which is non-strategic. For example, if most countries send an ambassador to, and host one from, the United States, then virtually any additional tie will generate a transitive triad; but this just reflects the superpower status of the United States, not a structural taste for transitivity in relations. Similarly, Central American nations may all host ambassadors from, and send them to, one another due to their strong common cultural, economic and security connections. A desire for transitivity need not play a role. For these reasons it is important to allow for degree heterogeneity and (geographic) homophily when assessing whether nations (actively) prefer transitive ties.

Transitivity can also operate negatively, as when mainland China chooses not to recognize countries which diplomatically recognize Taiwan, China. Withholding recognition in this case increases transitivity as not doing so would result in an intransitive triad.

Figure 5: Diplomacy Network of the Americas, 2005
Legend: North America, Central America, Caribbean, South America.
Source: Correlates of War Diplomatic Exchange dataset (Bayer, 2006) and authors’ calculations.
Notes: Nodes correspond to capital cities. An arc from country i to country j indicates that an ambassador represents i in j. The network is divided into four regions: (i) North America (California Gold), (ii) Central America (Berkeley Blue), (iii) the Caribbean (Lap Lane) and (iv) South America (Golden Gate).

We consider a utility function of the form given in (2) with $s_{ij}(\mathbf{d})$ as defined in (5). We consider tests of the γ = 0 null based upon the transitivity index (TI) as well as our locally best test statistic. We also consider a variety of null reference distributions. These different reference distributions illustrate the importance of controlling for degree heterogeneity and homophily in practice.

Panel A of Figure 6 plots the distribution of the transitivity index across three null reference sets. First we consider the set of all networks with density equal to that observed empirically (0.399). The degree sequence and cross link matrices are allowed to vary freely. This reference distribution would be appropriate in the absence of any homophily (λ = 0) and degree heterogeneity ($A_i = B_j = \alpha$ for all i).

In empirical work it is common to compare the observed TI with network density and conclude that a taste for transitivity is present if the TI is substantially greater than density (the observed TI is shown by the vertical bar in the figure). Here this approach results in an absurdly decisive rejection of the null (observed transitivity equals, by a country mile, the 1.0 quantile of the null distribution); this is a “strawman” test.

Second we consider the set of all networks with in- and out-degree sequences held equal to those observed empirically (i.e., networks where the United States hosts 32 ambassadors from, and sends 26 to, other countries in the Americas and so on). This controls for heterogeneity in prestige and diplomatic activity across countries.
This null distribution, however, continues to assume the absence of any homophily (λ = 0).

There are several extant methods for simulating uniform draws from this null distribution. For example, Berger and Müller-Hannemann (2009), Kim et al. (2012), Blitzstein and Diaconis (2011), and Del Genio et al. (2010) describe methods for sampling networks with the same degree sequence. Our algorithm easily handles this case as well.

Fixing the degree sequence shifts the null reference distribution substantially to the right. This indicates that much of the observed transitivity in the American diplomatic network can be explained solely by intrinsic variation in baseline diplomatic activity across nations (i.e., degree heterogeneity). While we still easily reject the null in this case as well, the actual TI is much closer to (certainly not a country mile from) the mode of the reference distribution.

Third, we additionally control for the cross region structure of diplomatic ties (i.e., we now allow λ ≠ 0). We consider the four regions shown in Figure 5. Hence the reference distribution now includes all networks with the same in- and out-degree sequences and […], the ordering of the three reference distributions demonstrates the potential importance of controlling for degree heterogeneity and homophily in practice. Failing to do so could result in misinterpreted rejections.

Figure 6: Testing for strategic transitivity in diplomatic exchanges in the Americas
Panel (a): Transitivity index (x-axis: Transitivity Index; y-axis: Frequency). Panel (b): Locally best test (x-axis: Locally Best Test Statistic; y-axis: Frequency).
Source: Correlates of War Diplomatic Exchange dataset (Bayer, 2006) and authors’ calculations.
Notes: Plots of the test statistic null distribution. Panel A: Distribution of the transitivity index (TI) according to three null models. The “California” colored histogram plots the distribution of the TI across the set of networks with density as observed in the empirical network, but no other constraints imposed. The “Berkeley Blue” colored histogram plots the null distribution when the in- and out-degree sequences are additionally held fixed. Finally, the “Golden Gate” colored histogram additionally fixes the structure of cross-region linkage (i.e., the cross link matrix). The actual TI is in, respectively, the 1.0, 1.0 and 0.9998 quantiles of the three reference distributions. Panel B: Histogram plot of the null distribution of the locally best test statistic holding the network’s degree sequence and cross link matrix fixed. The observed test statistic lies in the 1.0 quantile of the reference distribution. Results based on 10000 simulation draws.

Panel B of the figure plots the distribution of our locally best test statistic under the null which fixes both the degree sequence and cross link matrix. The observed locally best test statistic exceeds in value all simulated values from the reference set. While, in this example, it appears to be the case that many “transitivity inspired” statistics would generate rejections, Panel B is suggestive of the power gains associated with using the locally best statistic. The locally best statistic is well outside the support of the null distribution. We explore power comparisons further in the Monte Carlo simulations below.
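The comparison of an observed statistic with a simulated reference distribution can be sketched in a few lines. Everything below rests on stated assumptions: the transitivity index is taken to be the share of directed two-paths i → k → j that are closed by a link i → j (the paper's exact definition is not reproduced in this section), the sampler only implements the first, density-only reference set (the “strawman” case), and the add-one p-value is a common convention rather than the authors' formula. For the degree sequence and cross link matrix constraints, a sampler such as Algorithm 1 would take the place of `draw_same_density`.

```python
import random


def transitivity_index(d):
    """Share of two-paths i -> k -> j (i, j, k distinct) closed by i -> j."""
    n = len(d)
    two_paths = 0
    closed = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            paths = sum(d[i][k] * d[k][j] for k in range(n) if k not in (i, j))
            two_paths += paths
            closed += paths * d[i][j]
    return closed / two_paths if two_paths else 0.0


def draw_same_density(d, rng):
    """Uniform draw from all directed networks with the same number of links."""
    n = len(d)
    cells = [(i, j) for i in range(n) for j in range(n) if i != j]
    n_links = sum(d[i][j] for i, j in cells)
    chosen = set(rng.sample(cells, n_links))
    return [[1 if (i, j) in chosen else 0 for j in range(n)] for i in range(n)]


def simulated_p_value(d, statistic, sampler, draws, seed=0):
    """Share of simulated statistics at least as large as the observed one."""
    rng = random.Random(seed)
    observed = statistic(d)
    exceed = sum(statistic(sampler(d, rng)) >= observed for _ in range(draws))
    # Add-one correction (an assumption here) keeps the p-value above zero.
    return (1 + exceed) / (1 + draws)
```

Passing a different `statistic` (for example a locally best statistic) or a different `sampler` changes the test without changing the comparison logic.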

Our second example tests for substitutability between direct and indirect customers in a supply chain context. Firms may, all things equal, prefer to have many customers. While there may be a fixed cost to maintaining any customer relationship, having many customers may reduce variation in demand for a firm’s products.

Firms may, all things equal, also prefer to supply firms that themselves have many customers. For example, a customer whose own output is widely sought after may generate a more reliable stream of orders than that of a firm with few customers.

In our baseline model, the supplier-specific effect, $A_i$, captures heterogeneity in firms’ demand for customers. Similarly, the buyer-specific effect, $B_j$, captures heterogeneity in customer attractiveness. If $A_i$ and $B_i$ positively co-vary, as is allowed by our model, then firms with many customers (i.e., high $A_i$ firms) may be viewed as especially attractive customers themselves (i.e., high $B_i$ firms). These features of our baseline model suggest that it is not suitable for detecting whether firms value many direct (or indirect) customers per se; however we can test for substitutability between direct and indirect customers.

To construct such a test we set $s_{ij}(\mathbf{d})$ equal to

$$s_{ij}(\mathbf{d}) = \left( \sum_{k \neq j} d_{ik} \right) \left( \sum_{k} d_{jk} \right) \overset{def}{\equiv} d_{i+,-j} \times d_{j+} \tag{46}$$

where $d_{i+,-j}$ is notation for the number of customers firm i has excluding any customer relationship with j. With this specification of $s_{ij}(\mathbf{d})$ we can interpret γ in terms of the “cross derivative”

$$\gamma = \frac{\partial^2 MU_{ij}}{\partial d_{i+,-j} \, \partial d_{j+}}. \tag{47}$$

Here we hypothesize that γ < 0, so that firms with many direct customers value the indirect customers of a firm less than firms with few direct customers.

We implement our test using three industry-specific supply chain networks we constructed using Compustat data: pharmaceuticals (see Figure 1), computers, and motor vehicles (see Panels A and B of Figure 7). We test for whether the marginal benefit of an indirect customer is decreasing in the number of direct customers in each industry separately and using all three networks simultaneously. Pooling is straightforward in our set-up: the broad 4-digit manufacturing sector constitutes an observed firm attribute which is incorporated into the cross link matrix constraint.

The p-value for the null of γ = 0 equals 0.02 for pharmaceuticals, 0.32 for computers and 0.10 for motor vehicles. Pooling all three networks together yields a p-value of 0.05. These p-values are based upon 1000 simulated networks. The mixing time is chosen such that each edge is, on average, randomly modified 10 times before the network is considered a random draw from the target set.

For the Monte Carlo experiments we work with the general utility function introduced in Section 1 above. We assume that $A_i \in \mathbb{A} \overset{def}{\equiv} \{\alpha_L, \alpha_H\}$, $B_i \in \mathbb{B} \overset{def}{\equiv} \{\beta_L, \beta_H\}$ and $X_i \in \mathbb{X} \overset{def}{\equiv} \{0, 1\}$. We assume that each support point in $\mathbb{A} \times \mathbb{B} \times \mathbb{X}$ occurs with equal probability (i.e., with probability equal to 1/8).

Observe that there are four types of sending agents: ($A_i = \alpha_L$, $X_i = 0$), ($A_i = \alpha_H$, $X_i = 0$), ($A_i = \alpha_L$, $X_i = 1$) and ($A_i = \alpha_H$, $X_i = 1$). Similarly there are four types of receiving agents. The null model is therefore fully described by 16 = 4 × 4 […] β_H = α_H = 1.[…], α_L = β_L = −[…], λ[…] = λ[…] = 0 and λ[…] = λ[…] = −[…]. Across 1,000 Monte Carlo simulations with N = 48, average network density was 0.34, average transitivity was 0.53, and the average standard deviation of, respectively, in- and out-degree, was 4.1.

We set the strategic interaction term to $s_{ij}(\mathbf{d}) = \sum_k d_{ik} d_{kj}$, as is appropriate when agents prefer transitive ties. To simulate a network under the alternative we draw U and then, starting with an empty adjacency matrix, iterate to a fixed point using equation (11). By Tarski’s Theorem this finds us the least dense pure strategy Nash Equilibrium.

We compare the performance of three tests. The first is the infeasible locally best test that is based upon the true value of δ. The second is the feasible version of this test, which replaces δ with its maximum likelihood estimate computed under the null (see Graham (2017), Dzemski (2018) and Yan et al. (2018) for a discussion of this particular MLE problem). Finally we construct an ad hoc test based upon the transitivity index (TI). This last test is the one most often used in practice.

Figure 7: Buyer-supplier networks used to test for substitutability between direct and indirect buyers
Panel (a): Computers Buyer-Supplier Network, 2015. Panel (b): Motor Vehicles Buyer-Supplier Network, 2015.
Legend (Motor Vehicle Manufacturing Buyer-Supplier Network): Motor vehicles (3361), Bodies & Trailers (3362), Parts (3363).
Source: Compustat and authors’ calculations.
Notes: Plots of the computers and motor vehicles buyer-supplier networks in 2015 based upon Compustat data. The head of each arc denotes the buying firm. Nodes are colored differently according to their sub-industry as listed in the legends. The largest weakly-connected component is shown.

Table 1: Monte Carlo Design Parameterization
  Link probability         Calibrated value
  F(α_H + β_H + λ)         0.90
  F(α_H + β_L + λ)         0.50
  F(α_L + β_L + λ)         0.10
  F(α_H + β_H + λ)         0.50
  F(α_H + β_L + λ)         0.10
  F(α_L + β_L + λ)         0.012
Source: Authors’ calculations.
Notes: Utility parameters for the Monte Carlo experiments were chosen to calibrate the null link probabilities listed in Column 1 to equal those values listed in Column 2.

Figure 8 summarizes our main findings. The two panels of the figure correspond to the two network sizes we considered: N = 24 and N = 48. The horizontal axes of the figures in each panel correspond to different values of the strategic interaction parameter, γ; the vertical axes to rejection frequencies. With 1000 Monte Carlo replications the standard error of our simulation estimate of size is $\sqrt{0.05 (1 - 0.05) / 1000} \approx 0.0069$. […] The feasible locally best test, which replaces δ with its MLE (computed under the null), performs about as well as the infeasible locally best test based on the actual value of δ.

Figure 8: Power Analysis
Panel (a): N = 24 agents/nodes. Panel (b): N = 48 agents/nodes. Each panel plots Rejection Frequency for three tests: Locally Best Oracle, Locally Best Feasible and Transitivity Index.

The Monte Carlo experiments highlight that the locally best test, which upweights episodes of “unexpected” transitivity, is more powerful than the ad hoc test based on comparing the transitivity index (TI) with its null distribution. Note both tests are “valid” and correctly-sized.
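The simulation step described above (start from the empty network and iterate to a fixed point) can be sketched as follows. Equation (11) is not reproduced in this section, so the sketch assumes it amounts to forming link ij whenever its marginal utility, given the current network, is nonnegative. The matrix `base` is a hypothetical container for the non-strategic part of that marginal utility (degree effects, homophily and the taste shock combined); the function name is likewise illustrative.

```python
def iterate_to_fixed_point(base, gamma, max_iter=10_000):
    """Iterate link-by-link best responses starting from the empty network.

    base[i][j]: non-strategic part of i's marginal utility from linking to j.
    gamma:      weight on s_ij(d) = sum_k d[i][k] * d[k][j].
    """
    n = len(base)
    d = [[0] * n for _ in range(n)]  # empty adjacency matrix
    for _ in range(max_iter):
        new_d = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue  # self-loops are not allowed
                s_ij = sum(d[i][k] * d[k][j] for k in range(n))
                if base[i][j] + gamma * s_ij >= 0:
                    new_d[i][j] = 1
        if new_d == d:
            return d  # nothing changed: a fixed point has been reached
        d = new_d
    raise RuntimeError("no fixed point reached within max_iter iterations")


# Hypothetical three-agent example: link 0 -> 2 only becomes worthwhile
# once the two-path 0 -> 1 -> 2 is present and gamma is large enough.
example_base = [[0.0, 1.0, -1.0], [-5.0, 0.0, 1.0], [-5.0, -5.0, 0.0]]
network = iterate_to_fixed_point(example_base, gamma=1.0)
```

When γ ≥ 0 each pass can only add links, so the iteration stops after finitely many passes at the sparsest fixed point of this map. For γ < 0 the map is no longer monotone and may cycle, which is why the sketch caps the number of passes.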

Notes to Figure 8.
Source: Authors’ calculations.
Notes: The figures plot the frequency with which $H_0: \gamma = 0$ is rejected across 1,000 Monte Carlo replications with N = 24 and 48 agents. The y-axis reports the estimated rejection frequency, the x-axis gives the value of the strategic interaction parameter, γ. The sparsest pure strategy NE is used to simulate each network. For each simulation a total of 400 MCMC draws from $\mathbb{D}_{\mathbf{s},\mathbf{m}}$ were used to compute critical values. The mixing time was chosen such that every edge is randomly modified before the network is considered a uniform draw. The marginal utility function equals $A_i + B_j + X_i' \Lambda X_i + \gamma s_{ij}(\mathbf{d}) - U_{ij}$ with $s_{ij}(\mathbf{d}) = \sum_k d_{ik} d_{kj}$. The distribution of $(A_i, B_i, X_i)$ is as described in the main text; other model parameters are given in Table 1.

Extensions

Graham and Pelican (2020) introduced an econometric model for network formation with transferable utility (cf., Bloch and Jackson, 2007), appropriate for the analysis of undirected networks. They use the importance sampling algorithm of Blitzstein and Diaconis (2011) to simulate the null distribution of their test statistic. Their set-up does not allow for homophily under the null; nor do they consider optimal test statistics. These extensions could be accomplished using the simulation algorithms for undirected networks developed in Pelican (2019) and the ideas of this paper.

Our approach to testing could also be applied to settings where the econometrician observes many independent games, each with a small number of players (see de Paula (2013) for a review); this arises when a small set of competing firms make entry decisions across a large number of independent markets. Testing here is relatively better studied (e.g., Chen et al. (2018) and the references therein). An attractive feature of our framework for empirical researchers is that it can easily handle, for example, “market level” unobservables (albeit at the cost of assuming logistic utility shocks).

A primary advantage of our approach, complete agnosticism about equilibrium selection under the null, is also a limitation. It is not obvious how to adapt our method to, for example, construct an identified set for γ (or a confidence interval for this set). Research in this direction would be useful.

While much additional research remains to be done, we have provided a feasible and powerful test for a key scientific hypothesis on the nature of network formation, whilst maintaining a rich and realistic null model structure. Our MCMC simulation algorithm is also likely to be of independent interest.

References

Airoldi, E. M., Blei, D. M., Fienberg, S. E., and Xing, E. P. (2008). Mixed membership stochastic blockmodels. Journal of Machine Learning Research, 9:1981–2014.

Anderson, J. E. (2011). The gravity model. Annual Review of Economics, 3(1):133–160.

Andrews, I., Stock, J. H., and Sun, L. (2019). Weak instruments in instrumental variables regression: theory and practice. Annual Review of Economics, 11:727–753.

Apicella, C. L., Marlowe, F. W., Fowler, J. H., and Christakis, N. A. (2012). Social networks and cooperation in hunter-gatherers. Nature, 481(7382):497–501.

Atalay, E., Hortaçsu, A., Roberts, J., and Syverson, C. (2011). Network structure of production. Proceedings of the National Academy of Sciences, 108(13):5199–5202.

Bala, V. and Goyal, S. (2000). A noncooperative model of network formation. Econometrica, 68(5):1181–1229.

Barabási, A.-L. (2016). Network Science. Cambridge University Press, Cambridge.

Bayer, R. (2006). Diplomatic exchange data set, v2006.1. Technical report, The Correlates of War Project.

Berger, A. and Müller-Hannemann, M. (2009). Uniform sampling of undirected and directed graphs with a fixed degree sequence. CoRR, abs/0912.0685.

Bickel, P., Choi, D., Chang, X., and Zhang, H. (2013). Asymptotic normality of maximum likelihood and its variational approximation for stochastic blockmodels. Annals of Statistics, 41(4):1922–1943.

Blitzstein, J. and Diaconis, P. (2011). A sequential importance sampling algorithm for generating random graphs with prescribed degrees. Internet Mathematics, 6(4):489–522.

Bloch, F. and Jackson, M. O. (2007). The formation of networks with transfers among players. Journal of Economic Theory, 113(1):83–110.

Boss, M., Elsinger, H., Summer, M., and Thurner, S. (2004). Network topology of the interbank market. Quantitative Finance, 4(6):677–684.

Calvó-Armengol, A., Patacchini, E., and Zenou, Y. (2009). Peer effects and social networks in education. Review of Economic Studies, 76(4):1239–1267.

Charbonneau, K. B. (2017). Multiple fixed effects in binary response panel data models. Econometrics Journal, 20(3):S1–S13.

Chatterjee, S., Diaconis, P., and Sly, A. (2011). Random graphs with a given degree sequence. Annals of Applied Probability, 21(4):1400–1435.

Chen, X., Christensen, T. M., and Tamer, E. (2018). Monte Carlo confidence sets for identified sets. Econometrica, 86(6):1965–2018.

Cox, D. R. (2006). Principles of Statistical Inference. Cambridge University Press, Cambridge.

Czabarka, E., Szekely, L. A., Toroczkai, Z., and Walker, S. (2017). An algebraic Monte-Carlo algorithm for the bipartite partition adjacency matrix realization problem. arXiv preprint arXiv:1708.08242.

de Paula, A. (2013). Econometric analysis of games with multiple equilibria. Annual Review of Economics, 5:107–131.

de Paula, Á., Richards-Shubik, S., and Tamer, E. (2018). Identifying preferences in networks with bounded degree. Econometrica, 86(1):263–288.

Del Genio, C. I., Kim, H., Toroczkai, Z., and Bassler, K. E. (2010). Efficient and exact sampling of simple graphs with given arbitrary degree sequence. Plos One, 5(4):1–7.

Dutta, B. and Jackson, M. O. (2000). The stability and efficiency of directed communication networks. Review of Economic Design, 5(3):251–272.

Dzemski, A. (2018). An empirical model of dyadic link formation in a network with unobserved heterogeneity. Review of Economics and Statistics. University of Mannheim.

Erdős, P., G. Hartke, S., van Iersel, L., and Miklós, I. (2017). Graph realizations constrained by skeleton graphs. Electronic Journal of Combinatorics, 24(2):1–18.

Ferguson, T. S. (1967). Mathematical Statistics: A Decision Theoretic Approach. Academic Press, New York.

Gao, C., Lu, Y., and Zhou, H. H. (2015). Rate-optimal graphon estimation. Annals of Statistics, 43(6):2624–2652.

Goyal, S. (2009). Connections: An Introduction to the Economics of Networks. Princeton University Press, Princeton, NJ.

Graham, B. S. (2017). An econometric model of network formation with degree heterogeneity. Econometrica, 85(4):1033–1063.

Graham, B. S. (2020). Handbook of Econometrics, volume 7A, chapter Network data. North-Holland, Amsterdam.

Graham, B. S. and Pelican, A. (2020). The Econometric Analysis of Network Data, chapter Testing for externalities in network formation using simulation, pages 65–82. Academic Press, London.

Granovetter, M. S. (1973). The strength of weak ties. American Journal of Sociology, 78(6):1360–1380.

Jackson, M. O., Rogers, B. W., and Zenou, Y. (2017). The economic consequences of social-network structure. Journal of Economic Literature, 55(1):49–95.

Jackson, M. O. and Wolinsky, A. (1996). A strategic model of social and economic networks. Journal of Economic Theory, 71(1):44–74.

Jochmans, K. (2018). Semiparametric analysis of network formation. Journal of Business and Economic Statistics, 36(4):705–713.

Kaido, H. and Zhang, Y. (2019). Robust likelihood ratio tests for incomplete economic models. arXiv Working Paper arXiv:1910.04610v2 [econ.EM], Boston University and Jinan University.

Kim, H., Genio, C. I. D., Bassler, K. E., and Toroczkai, Z. (2012). Constructing and sampling directed graphs with given degree sequences. New Journal of Physics, 14(2):023012.

Kinne, B. J. (2014). Dependent diplomacy: signaling, strategy, and prestige in the diplomatic network. International Studies Quarterly, 58(2):247–259.

Kolaczyk, E. D. (2009). Statistical Analysis of Network Data. Springer, New York.

König, M. D., Liu, X., and Zenou, Y. (2019). R&D networks: theory, empirics and policy implications. Review of Economics and Statistics, 101(3):476–491.

McDonald, J. W., Smith, P. W. F., and Forster, J. J. (2007). Markov chain Monte Carlo exact inference for social networks. Social Networks, 29(1):127–136.

McPherson, M., Smith-Lovin, L., and Cook, J. M. (2001). Birds of a feather: homophily in social networks. Annual Review of Sociology, 27(1):415–444.

Menzel, K. (2016). Strategic network formation with many agents. New York University.

Mitzenmacher, M. and Upfal, E. (2005). Probability and Computing. Cambridge University Press, Cambridge.

Miyauchi, Y. (2016). Structural estimation of a pairwise stable network with nonnegative externality. Journal of Econometrics, 195(2):224–235.

Moreira, M. J. (2009). Tests with correct size when instruments can be arbitrarily weak. Journal of Econometrics, 152(2):131–140.

Nash, J. F. (1950). Equilibrium points in n-person games. Proceedings of the National Academy of Sciences, 36(1):48–49.

Newman, M. E. J. (2010). Networks: An Introduction. Oxford University Press, Oxford.

Pelican, A. (2019). Uniform sampling of graphs with fixed degree sequence under partition constraints. Accessed: 2020-01-01.

Rao, A. R., Jana, R., and Bandyopadhyay, S. (1996). A Markov chain Monte Carlo method for generating random (0,1)-matrices with given marginals. Sankhya, 58(2):225–242.

Sheng, S. (2014). A structural econometric analysis of network formation games. Mimeo, University of California - Los Angeles.

Sinclair, A. (1993). Algorithms for Random Generation and Counting: A Markov Chain Approach. Progress in Theoretical Computer Science. Birkhäuser, Boston.

Tao, T. (2016). An improved MCMC algorithm for generating random graphs from constrained distributions. Network Science, 4(1):117–139.

Tarski, A. (1955). A lattice-theoretical fixpoint theorem and its applications. Pacific Journal of Mathematics, 5(2):285–309.

Tinbergen, J. (1962). Shaping the World Economy: Suggestions for an International Economic Policy. Twentieth Century Fund, New York.

Werner, K. (2012). Maß- und Integrationstheorie. FernUniversität in Hagen.

Yan, T., Jiang, B., Fienberg, S. E., and Leng, C. (2018). Statistical inference in a directed network model with covariates. Journal of the American Statistical Association.

Supplemental Web Appendix: Proofs

The appendix includes proofs of the theorems stated in the main text as well as statements and proofs of supplemental lemmata. All notation is as established in the main text unless stated otherwise. Equation numbering continues in sequence with that established in the main text.

A.1 Proof of Theorem 1.1

For equation (13) to be well defined we must show that $\mathcal{N}(\mathbf{d}, \cdot; \theta)$ is measurable. For a network d we can define a function $\mathcal{N}(\mathbf{d}, \cdot; \theta)$, which assigns to a realization U = u the Nash equilibrium weight of the pure strategy which corresponds to d. We now show that there is a measurable function $\mathcal{N}(\mathbf{d}, \cdot; \theta)$ satisfying these conditions.

First we consider the case where the utility shocks are bounded. Let $M > 0$; we show that $\mathcal{N}(\mathbf{d}, \cdot; \theta)$ is measurable on $\mathbb{U}_c = [-M, M]^n$. Observe that every realization of the taste shock u corresponds to a game in normal form γ. Here we use γ to denote a table containing, for each pure strategy combination (or equivalently network) d, the utility of each agent according to equation (2). We use γ to denote this table to be consistent with the game theory literature (recognizing this may be confusing since γ is used in the main text, and elsewhere in this appendix, to denote the strategic interaction parameter of interest).

Utility is defined for every player. The mapping $g : \mathbb{R}^n \to \mathbb{R}^{N 2^n}$ assigns to each taste shock realization u the corresponding game g(u). In a game each player can choose among $2^{N-1}$ strategies (corresponding to which of the $N - 1$ other agents to link to), giving $(2^{N-1})^N$ pure strategy combinations. Looking at equation (2) it is apparent that g is continuous and therefore measurable. Because of continuity we have $\sup \{ \|g(u)\| : u \in [-M, M]^{N(N-1)} \} := L < \infty$. We set $\Gamma := [-L, L]^{N 2^n}$; the set of all games with bounded payoffs.

Lemma A.1.

Let Σ be the set of all mixed strategy combinations of the players. The set

$$E := \{ (\gamma, \sigma) : \sigma \text{ is a NE of } \gamma \in \Gamma \} \subset \Gamma \times \Sigma \tag{48}$$

is compact.

Proof. E is bounded. Thus it is sufficient to show that E is closed. The NE is defined by a set of inequalities, which have to be fulfilled (each player cannot strictly increase her payoff by replacing her strategy with any other pure strategy, holding the other players’ strategies constant). The utilities are continuous functions on Γ × Σ. Now assume x ∈ Γ × Σ is not in E; then there exists an inequality which is not satisfied. The inequality is violated by u. Because the functions on both sides of the inequality are continuous we can choose a δ environment of x such that it is violated for all the elements in the environment. Therefore the complement of E is open, which proves the statement.

Lemma A.2.

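Let $\Gamma \subset \mathbb{R}^n$, $\Sigma \subset \mathbb{R}^m$ and $E \subset \Gamma \times \Sigma$ be compact sets. Further assume

$$\forall x \in \Gamma \ \exists y \in \Sigma : (x, y) \in E. \tag{49}$$

Then there is a measurable function $f : \Gamma \to \Sigma$ with

$$\forall x \in \Gamma : (x, f(x)) \in E. \tag{50}$$

Proof.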

Because of compactness there are $a, b \in \mathbb{R}$ such that $\Gamma \times \Sigma \subset [a, b)^{n+m}$. For each k we partition $[a, b)$ into $2^k$ intervals: $[a, a + \frac{b-a}{2^k}), \ldots, [b - \frac{b-a}{2^k}, b)$ and correspondingly partition $[a, b)^{n+m}$ into hyper-cubes with side length $\frac{b-a}{2^k}$. To a cube $[a_1, b_1) \times [a_2, b_2) \times \ldots \times [a_h, b_h)$ we associate the characteristic vector $(a_1, a_2, \ldots, a_h)$. We order vectors such that $a > a'$ if, at the first coordinate i at which the two vectors differ, $a_i > a'_i$. The ordering of the vectors implies an ordering of the boxes by their characteristic vectors. Let $\mathcal{C}_k$ be the set of hyper-cubes constructed as described above with side length $\frac{b-a}{2^k}$ covering $[a, b)^{n+m}$. Let $p : \mathbb{R}^{n+m} \to \mathbb{R}^n$ be the projection.

Now we define $f_k : \Gamma \to \Sigma$ as follows. Select $x \in \Gamma$ and find all cubes $C \in \mathcal{C}_k$ with $x \in p(C)$ and $C \cap E \neq \emptyset$. We denote these cubes as $C_x$. We call $x, x' \in \Gamma$ equivalent if and only if $C_x = C_{x'}$. In this way Γ is partitioned into finitely many equivalence classes. By (49) $C_x$ is not empty. If more than one cube is in $C_x$ choose C, the highest one in the sense of the ordering defined above. We select an arbitrary vector $v \in C \cap E$ and set $f_k(x') := (v_{n+1}, v_{n+2}, \ldots, v_{n+m})$ for all x′ equivalent to x.

$f_k(\Gamma)$ is finite, and each element of $f_k(\Gamma)$ has as a preimage the intersection of Γ with a finite number of n dimensional cubes. Because Γ is measurable (because of compactness) and the n dimensional cubes are measurable, $f_k$ is a measurable function. Note $f_k$ does not have property (50).

We now want to show pointwise convergence for $(f_k)_{k \in \mathbb{N}}$. We fix $x \in \Gamma$ and have a sequence $(x_k, y_k) = (x, f_k(x))$ in Γ × Σ. Because Γ × Σ is compact, $(x_k, y_k)$ has a convergent subsequence $(x_{k_z}, y_{k_z})$ with limit $(x^*, y^*)$. Let $\|v\|$ be the maximum norm of a vector $v \in \mathbb{R}^{n+m}$. Assume $(x^*, y^*) \notin E$; then $\inf \{ \|(x^*, y^*) - e\| : e \in E \} := \epsilon > 0$, since E is compact. Now we can choose z (and thereby k) such that the side length of the cube is less than ε. Then the cube containing $(x^*, y^*)$ does not contain any $e \in E$. This contradicts the way $f_k$ is constructed.

Now assume that $(x_k, y_k)$ has a subsequence converging to a point $(\tilde{x}^*, \tilde{y}^*) \neq (x^*, y^*)$. Without loss of generality assume $(\tilde{x}^*, \tilde{y}^*) < (x^*, y^*)$. Let i be the first index of y in which the two vectors are not equal. We have $\tilde{y}^*_i < y^*_i$. Therefore $|\tilde{y}^*_i - y^*_i| =: \epsilon > 0$. Now we can choose a threshold K such that for any $k > K$ the side length of the cube is less than ε. Then for all $k > K$ the cube containing $(\tilde{x}^*, \tilde{y}^*)$ and the cube containing $(x^*, y^*)$ do not intersect. The cube containing $(x^*, y^*)$ is higher in the ordering than the cube containing $(\tilde{x}^*, \tilde{y}^*)$. Therefore for all $k > K$, $(x_k, y_k)$ is not in an ε environment of $(\tilde{x}^*, \tilde{y}^*)$. This contradicts the existence of another limit of the sequence, which shows the pointwise convergence of $(f_k)_{k \in \mathbb{N}}$. We denote the limit function by $f^*$.

For a sequence of measurable functions which converges pointwise, the limit function is also measurable (Werner, 2012). Hence $f^*$ is measurable and satisfies condition (50).

Proof of Theorem 1.1

Now we are ready to prove the result stated in the main text.

For $z \in \mathbb{Z}^n$ we define a function $r_z : [z_1, z_1 + 1] \times \cdots \times [z_n, z_n + 1] \to \Sigma$, which assigns a NE to each taste shock in the hyper-cube $[z, z + 1]$. According to Lemma A.1 and Lemma A.2, and the fact that $g$ from above is measurable, we can define $r_z$ such that it is measurable. We now define $\bar{r}_z : \mathbb{R}^n \to \mathbb{R}^{N n}$ with $\bar{r}_z(x) = r_z(x)$ if $x \in [z, z + 1)$ and $\bar{r}_z(x) = 0$ otherwise. Let $r : \mathbb{R}^n \to \Sigma$ with

$$r(x) = \sum_{z \in \mathbb{Z}^n} \bar{r}_z(x). \tag{51}$$

Note that in equation (51) we sum over a countable set and the sum converges absolutely for each $x$. Therefore $r$ is measurable; specifically, it is a measurable function which assigns a NE to each taste shock. Let $h_{\mathbf{d}} : \Sigma \to [0, 1]$ be the function which assigns to every mixed strategy the probability of $\mathbf{d}$ by multiplying the mixed strategy weights corresponding to $\mathbf{d}$. Since multiplication is a measurable operation, $h_{\mathbf{d}}$ is measurable. The function $\mathcal{N}(\mathbf{d}, \cdot\,; \theta) := h_{\mathbf{d}} \circ r$ satisfies the desired properties.

A.2 Optimal test statistic proofs

Preliminary results

Lemma A.3. Any differentiable function $f \in O(\gamma^2)$ with $f(0) = 0$ has a derivative of zero at point zero.

Proof. For $f \in O(\gamma^2)$ we have, for some $C > 0$ and $\epsilon > 0$, that

$$|f(\gamma)| < C\gamma^2 \tag{52}$$

for all $\gamma \in [-\epsilon, \epsilon]$. The derivative of $f$ at $\gamma = 0$ equals

$$f'(0) = \lim_{\gamma \to 0} \frac{f(\gamma) - f(0)}{\gamma} = \lim_{\gamma \to 0} \frac{f(\gamma)}{\gamma}, \tag{53}$$

with the second equality because $f(0) = 0$. As $\gamma \to 0$, we will have $|\gamma| < \epsilon$, so that

$$|f'(0)| = \lim_{\gamma \to 0} \frac{|f(\gamma)|}{|\gamma|} \leq \lim_{\gamma \to 0} \frac{C\gamma^2}{|\gamma|} = \lim_{\gamma \to 0} C|\gamma| = 0. \tag{54}$$

Hence $f'(0) = 0$.

Proof of Theorem 2.1 (i.e., derivation of the form of the locally best test statistic)

We begin with the likelihood decomposition (35) given in the main text. The number of summands in (35) depends on the partition that $s_{ij}(\mathbf{d})$ induces on $\mathbb{R}$. For a positive $\gamma$, this number depends neither on the exact value of $\gamma$ nor on the other covariates and parameters. Intuitively, as long as $\gamma$ is positive, there is a positive probability that $\mathbf{U}$ falls in any combination of buckets. The number of summands in (35) is typically large. The buckets $\mathbf{b}$ of $B^n$ and the function $\mathcal{N}$ depend on $\gamma$.

We have that

$$\frac{\partial P(\mathbf{d}; \theta, \mathcal{N})}{\partial \gamma} = \frac{\partial}{\partial \gamma} \left\{ \sum_{\mathbf{b} \in B^n} \int_{\mathbf{u} \in \mathbf{b}} \mathcal{N}(\mathbf{d}, \mathbf{u}; \theta) f_{\mathbf{U}}(\mathbf{u}) \, d\mathbf{u} \right\} = \sum_{\mathbf{b} \in B^n} \frac{\partial}{\partial \gamma} \int_{\mathbf{u} \in \mathbf{b}} \mathcal{N}(\mathbf{d}, \mathbf{u}; \theta) f_{\mathbf{U}}(\mathbf{u}) \, d\mathbf{u}.$$

Interchanging the summation and the derivative operator is possible because the number of summands does not depend on $\gamma$. We could try to take the derivative of each summand's integral boundaries and of $\mathcal{N}(\mathbf{d}, \cdot\,; \theta)$. But there is no need to boil the ocean because, regardless of $\mathcal{N}(\mathbf{d}, \cdot\,; \theta)$, most of the summands contribute nothing to the derivative at $\gamma = 0$. To show this we consider three cases of summands.

Case 1: Two or more buckets in b are inner buckets

Recall that the boldface subscripts $\mathbf{i} = \mathbf{1}, \mathbf{2}, \ldots$ index the $n = N(N - 1)$ directed dyads in arbitrary order. Consider a set of buckets $\mathbf{b}$ where two or more of them are inner buckets. Without loss of generality assume that the $L \geq 2$ inner buckets are the first $L$ components $b_1, \ldots, b_L$ of $\mathbf{b} = (b_1, \ldots, b_n)$. The shape of the $l$th bucket is $(\gamma \underline{s}_l, \gamma \bar{s}_l]$, with $\underline{s}_l < \bar{s}_l$ coinciding with the bucket borders induced by the precise form of strategic interaction specified under the alternative. We normalize the dyad-specific systematic utility component $\mu_{ij} = 0$ without loss of generality.

Recall that $\tilde{B}^n$ is the set of bucket configurations with two or more inner buckets. For any $\mathbf{b} \in \tilde{B}^n$ we can derive the upper bound:

$$
\begin{aligned}
\int_{\mathbf{u} \in \mathbf{b}} \mathcal{N}(\mathbf{d}, \mathbf{u}; \theta) f_{\mathbf{U}}(\mathbf{u}) \, d\mathbf{u}
&= \int_{\mathbf{u} \in \mathbf{b}} \mathcal{N}(\mathbf{d}, \mathbf{u}; \theta) \left[ \prod_{i} f_U(u_i) \right] d\mathbf{u} \\
&\leq \int_{\gamma \underline{s}_1}^{\gamma \bar{s}_1} f_U(u_1) \times \cdots \times \int_{\gamma \underline{s}_L}^{\gamma \bar{s}_L} f_U(u_L) \int_{\mathbf{u}_{-L} \in \mathbf{b}_{-L}} f_{\mathbf{U}_{-L}}(\mathbf{u}_{-L}) \, d\mathbf{u} \\
&< \int_{\gamma \underline{s}_1}^{\gamma \bar{s}_1} f_U(u_1) \times \cdots \times \int_{\gamma \underline{s}_L}^{\gamma \bar{s}_L} f_U(u_L) \, du_1 \cdots du_L \\
&< \int_{\gamma \underline{s}_1}^{\gamma \bar{s}_1} \times \cdots \times \int_{\gamma \underline{s}_L}^{\gamma \bar{s}_L} du_1 \cdots du_L \\
&= \gamma^L (\bar{s}_1 - \underline{s}_1) \times \cdots \times (\bar{s}_L - \underline{s}_L),
\end{aligned}
$$

where $\mathbf{u}_{-L}$ denotes the vector $\mathbf{u}$ after removal of its first $L$ components, and similarly for $\mathbf{b}_{-L}$. The first equality follows from independence of the components of $\mathbf{u}$, the second (weak) inequality from the fact that $\mathcal{N}(\mathbf{d}, \mathbf{u}; \theta) \leq 1$ for all $\mathbf{u} \in \mathbb{U}$. The third (strict) inequality follows because $f_{\mathbf{U}_{-L}}(\mathbf{u}_{-L})$ is a density and the integration is not over all of $\mathbb{R}^{n-L}$. The fourth (strict) inequality arises because, when $f_U(u)$ is the logistic density, we have $f_U(u) = F_U(u)[1 - F_U(u)] < 1$ for all $u$ on a compact interval of the real line. We conclude that any summand where $\mathbf{b}$ has two or more inner buckets is $O(\gamma^2)$ for $\gamma \to 0$. It follows that $Q(\mathbf{d}; \theta, \mathcal{N}) \in O(\gamma^2)$ and furthermore that $Q(\mathbf{d}; (0, \delta')', \mathcal{N}) = 0$ (since inner buckets have zero probability when $\gamma = 0$). Hence, by Lemma A.3, we have that

$$\left. \frac{\partial Q(\mathbf{d}; \theta, \mathcal{N})}{\partial \gamma} \right|_{\gamma = 0} = 0.$$

This is enough to show equation (40) of the main text.
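The order of magnitude claimed for Case 1 can be checked numerically in a stylized setting. The sketch below is only an illustration under stated assumptions: it takes $L = 2$ inner buckets, standard logistic shocks with $\mu_{ij} = 0$, and hypothetical bucket borders `s_lo = -1.0` and `s_hi = 2.0` that are not taken from the paper. It also replaces the unknown selection rule by its upper bound of one, so it evaluates the bounding integral rather than the summand itself.

```python
import math


def logistic_cdf(u):
    """CDF of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-u))


def inner_bucket_prob(gamma, s_lo, s_hi):
    """P(gamma * s_lo < U <= gamma * s_hi) for one standard logistic shock."""
    return logistic_cdf(gamma * s_hi) - logistic_cdf(gamma * s_lo)


def two_inner_buckets_prob(gamma, s_lo=-1.0, s_hi=2.0):
    """Probability that two independent shocks both land in inner buckets.

    The borders s_lo and s_hi are hypothetical illustrative values.
    """
    return inner_bucket_prob(gamma, s_lo, s_hi) ** 2


gammas = [10.0 ** (-k) for k in range(1, 6)]

# If the probability is O(gamma^2), this ratio stays bounded as gamma -> 0.
ratios = [two_inner_buckets_prob(g) / g ** 2 for g in gammas]

# Difference quotient at zero (the probability is exactly 0 at gamma = 0);
# it should shrink towards 0, as Lemma A.3 implies.
diff_quotients = [two_inner_buckets_prob(g) / g for g in gammas]

for g, r, q in zip(gammas, ratios, diff_quotients):
    print(f"gamma={g:.0e}  prob/gamma^2={r:.6f}  prob/gamma={q:.3e}")
```

In this setting the ratio `prob / gamma**2` stays below the bound $(\bar{s} - \underline{s})^2 = 9$ from the display above, while the difference quotient shrinks in proportion to $\gamma$.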
This simplification is essential to the overall result, as it allows us to proceed without knowing any details about the form of the equilibrium selection rule $\mathcal{N}$ when $\mathbf{U}$ takes values which admit multiple equilibrium networks.

Case 2: No bucket in b is an inner bucket (i.e., all buckets are outer buckets)

If all components of $\mathbf{u}$ fall in either their first or last buckets, then the network is uniquely defined. This occurs because agent-level preferences for forming (or not forming) a link are so strong that they do not depend on the presence or absence of other links in the network. Each agent $i$ either prefers to send a link to $j$, regardless of the actions taken by others, or does not wish to send a link. Put differently, each agent has a pure link formation strategy which is strictly dominating in such games; therefore $\mathcal{N}(\mathbf{d}, \mathbf{u}; \theta)$ is either zero or one.

For a particular network $\mathbf{d}$, $\mathcal{N}(\mathbf{d}, \mathbf{u}; \theta) = 1$ if, for all (directed) dyads $ij$ such that $d_{ij} = 1$, we have that $u_{ij}$ falls in the first bucket and, for all dyads $ij$ such that $d_{ij} = 0$, we have that $u_{ij}$ falls in the last bucket. These considerations give the equality

$$
\begin{aligned}
\int_{\mathbf{u} \in \mathbf{b}} \mathcal{N}(\mathbf{d}, \mathbf{u}; \theta) f_{\mathbf{U}}(\mathbf{u}) \, d\mathbf{u}
&= \prod_{i \neq j} \left[ \int_{-\infty}^{\mu_{ij} + \gamma \underline{s}} f_U(u_{ij}) \, du_{ij} \right]^{d_{ij}} \left[ \int_{\mu_{ij} + \gamma \bar{s}}^{\infty} f_U(u_{ij}) \, du_{ij} \right]^{1 - d_{ij}} && (55) \\
&= \prod_{i \neq j} \left[ F_U(\mu_{ij} + \gamma \underline{s}) \right]^{d_{ij}} \left[ 1 - F_U(\mu_{ij} + \gamma \bar{s}) \right]^{1 - d_{ij}}. && (56)
\end{aligned}
$$

Taking logarithms of the expression above, differentiating with respect to $\gamma$, evaluating at $\gamma = 0$, and multiplying by $P(\mathbf{d}; \delta)$ yields a derivative for summands where all buckets in $\mathbf{b}$ are outer buckets of

$$P(\mathbf{d}; \delta) \sum_{i \neq j} \left[ d_{ij} \, \underline{s} \, \frac{f_U(\mu_{ij})}{F_U(\mu_{ij})} - (1 - d_{ij}) \, \bar{s} \, \frac{f_U(\mu_{ij})}{1 - F_U(\mu_{ij})} \right]. \tag{57}$$

Case 3: Exactly one bucket in b is an inner bucket

If all but one component of $\mathbf{u}$ falls into its first or last bucket, then the resulting network is uniquely defined except for the presence or absence of one arc, say, $ij$. For any such draw of $\mathbf{u}$, since all other links are formed according to a strictly dominating strategy, player $i$ will either benefit from forming the $ij$ arc or not. Hence $\mathcal{N}(\mathbf{d}, \mathbf{u}; \theta)$ is also either zero or one in this case.

For a particular network $\mathbf{d}$, $\mathcal{N}(\mathbf{d}, \mathbf{u}; \theta)$ will equal one if two conditions hold. First, for all directed dyads $kl \neq ij$ such that $d_{kl} = 1$ we have that $u_{kl}$ falls in the first bucket, and for all dyads $kl \neq ij$ such that $d_{kl} = 0$ we have that $u_{kl}$ falls in the last bucket. Second, for the dyad $ij$ with $u_{ij}$ falling in an inner bucket, we require that $d_{ij} = 1$ if $u_{ij} \in [\mu_{ij} + \gamma \underline{s}, \mu_{ij} + \gamma s_{ij}(\mathbf{d}))$, and that $d_{ij} = 0$ if $u_{ij} \in [\mu_{ij} + \gamma s_{ij}(\mathbf{d}), \mu_{ij} + \gamma \bar{s})$. The overall likelihood contribution for this case therefore equals:

$$
\begin{aligned}
\int_{\mathbf{u} \in \mathbf{b}} \mathcal{N}(\mathbf{d}, \mathbf{u}; \theta) f_{\mathbf{U}}(\mathbf{u}) \, d\mathbf{u}
&= \prod_{kl \neq ij} \left[ \int_{-\infty}^{\mu_{kl} + \gamma \underline{s}} f_U(u_{kl}) \, du_{kl} \right]^{d_{kl}} \left[ \int_{\mu_{kl} + \gamma \bar{s}}^{\infty} f_U(u_{kl}) \, du_{kl} \right]^{1 - d_{kl}} \\
&\quad \times \left[ \int_{\mu_{ij} + \gamma \underline{s}}^{\mu_{ij} + \gamma s_{ij}(\mathbf{d})} f_U(u_{ij}) \, du_{ij} \right]^{d_{ij}} \left[ \int_{\mu_{ij} + \gamma s_{ij}(\mathbf{d})}^{\mu_{ij} + \gamma \bar{s}} f_U(u_{ij}) \, du_{ij} \right]^{1 - d_{ij}} \\
&= \prod_{kl \neq ij} \left[ F_U(\mu_{kl} + \gamma \underline{s}) \right]^{d_{kl}} \left[ 1 - F_U(\mu_{kl} + \gamma \bar{s}) \right]^{1 - d_{kl}} \\
&\quad \times \left[ F_U(\mu_{ij} + \gamma s_{ij}(\mathbf{d})) - F_U(\mu_{ij} + \gamma \underline{s}) \right]^{d_{ij}} \\
&\quad \times \left[ F_U(\mu_{ij} + \gamma \bar{s}) - F_U(\mu_{ij} + \gamma s_{ij}(\mathbf{d})) \right]^{1 - d_{ij}}.
\end{aligned}
$$

Recall the restriction $s_{ij}(\mathbf{d}) = s_{ij}(\mathbf{d} - ij) = s_{ij}(\mathbf{d} + ij)$. Because the last two terms in $[\cdot]$ in the expression above are zero at $\gamma = 0$, we only need to consider their derivative (by the product rule the other term equals zero at $\gamma = 0$).
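To make the product-rule step explicit, write the likelihood contribution above as $A(\gamma) B(\gamma)$. Here $A(\gamma)$ and $B(\gamma)$ are shorthand introduced only for this remark (they are not notation from the main text): $A(\gamma)$ is the product over the dyads $kl \neq ij$, and $B(\gamma)$ is the product of the two bracketed terms for dyad $ij$. Since exactly one of the exponents $d_{ij}$ and $1 - d_{ij}$ equals one, $B(\gamma)$ is a single difference of CDF values, so that $B(0) = 0$ and

```latex
\left. \frac{\partial}{\partial \gamma} \bigl[ A(\gamma) B(\gamma) \bigr] \right|_{\gamma = 0}
  = A'(0) \underbrace{B(0)}_{= 0} + A(0) B'(0)
  = A(0) B'(0).
```

Only the derivative of the dyad-$ij$ term, scaled by the remaining factors, survives at $\gamma = 0$.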
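Differentiating the last two terms with respect to $\gamma$ (and multiplying by the balance of preceding terms) yields

$$
\begin{aligned}
&\prod_{kl \neq ij} \left[ F_U(\mu_{kl} + \gamma \underline{s}) \right]^{d_{kl}} \left[ 1 - F_U(\mu_{kl} + \gamma \bar{s}) \right]^{1 - d_{kl}} \\
&\quad \times \left[ s_{ij}(\mathbf{d}) f_U(\mu_{ij} + \gamma s_{ij}(\mathbf{d})) - \underline{s} f_U(\mu_{ij} + \gamma \underline{s}) \right]^{d_{ij}} \\
&\quad \times \left[ \bar{s} f_U(\mu_{ij} + \gamma \bar{s}) - s_{ij}(\mathbf{d}) f_U(\mu_{ij} + \gamma s_{ij}(\mathbf{d})) \right]^{1 - d_{ij}} \\
&= \prod_{i \neq j} \left[ F_U(\mu_{ij} + \gamma \underline{s}) \right]^{d_{ij}} \left[ 1 - F_U(\mu_{ij} + \gamma \bar{s}) \right]^{1 - d_{ij}} \\
&\quad \times \left[ s_{ij}(\mathbf{d}) \frac{f_U(\mu_{ij} + \gamma s_{ij}(\mathbf{d}))}{F_U(\mu_{ij} + \gamma \underline{s})} - \underline{s} \frac{f_U(\mu_{ij} + \gamma \underline{s})}{F_U(\mu_{ij} + \gamma \underline{s})} \right]^{d_{ij}} \\
&\quad \times \left[ \bar{s} \frac{f_U(\mu_{ij} + \gamma \bar{s})}{1 - F_U(\mu_{ij} + \gamma \bar{s})} - s_{ij}(\mathbf{d}) \frac{f_U(\mu_{ij} + \gamma s_{ij}(\mathbf{d}))}{1 - F_U(\mu_{ij} + \gamma \bar{s})} \right]^{1 - d_{ij}}.
\end{aligned}
$$

Summing this expression over all potential arcs (and evaluating at $\gamma = 0$) gives a total contribution of "one inner bucket in $\mathbf{b}$" summands to the derivative of:

$$P(\mathbf{d}; \delta) \sum_{i \neq j} \left\{ d_{ij} \left[ s_{ij}(\mathbf{d}) \frac{f_U(\mu_{ij})}{F_U(\mu_{ij})} - \underline{s} \frac{f_U(\mu_{ij})}{F_U(\mu_{ij})} \right] + (1 - d_{ij}) \left[ \bar{s} \frac{f_U(\mu_{ij})}{1 - F_U(\mu_{ij})} - s_{ij}(\mathbf{d}) \frac{f_U(\mu_{ij})}{1 - F_U(\mu_{ij})} \right] \right\}. \tag{58}$$

Summing (57) and (58) then gives the expression in the statement of Theorem 2.1: the terms in $\underline{s}$ and $\bar{s}$ cancel, leaving only terms in $s_{ij}(\mathbf{d})$. Using similar methods we can show that $P(\mathbf{d}; \theta)$ can be differentiated with respect to $\gamma$ twice, as claimed.

A.3 MCMC Proofs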

Proof of Lemma 3.1

Let $A_{G,G'}$ be the set of arcs from the node $v_G$ to the node $v_{G'}$. We construct a bijection $\varphi : A_{G,G'} \to A_{G',G}$. Then we show that the probability of an arc $p(a)$ is equal to $p(\varphi(a))$. Once that is proven, the probability of a transition from $v_G$ to $v_{G'}$ is

$$
\begin{aligned}
\sum_{a \in A_{G,G'}} p(a) &= \sum_{a \in A_{G,G'}} p(\varphi(a)) \\
&= \sum_{\varphi^{-1}(a') \in A_{G,G'}} p(a') \\
&= \sum_{a' \in \varphi(A_{G,G'})} p(a') \\
&= \sum_{a' \in A_{G',G}} p(a'),
\end{aligned}
$$

which is the probability of a transition from $v_{G'}$ to $v_G$.

For the construction of the bijection, consider that every arc in $A_{G,G'}$ corresponds uniquely to a schlaufen-sequence $R = (R_1, \ldots, R_h)$. Let $R_k = (i_1, \ldots, i_m, \ldots, i_l)$ with $i_m$ the start of the cycle (if there is no cycle in $R_k$, we set $\bar{R}_k = R_k$). We define $\bar{R}_k = (i_1, \ldots, i_m, i_{l-1}, \ldots, i_{m+1}, i_l)$ and $\bar{R} = (\bar{R}_1, \ldots, \bar{R}_h)$.

Note that $R_1, \ldots, R_h$ are link disjoint and, as soon as the cycle of $R_k$ is switched, $\bar{R}_k$ is a schlaufe. The violation matrix of $\bar{R}_k$ is the negative of the violation matrix of $R_k$. This implies that if $R$ is a feasible schlaufen-sequence for $G$ which defines an arc in $A_{G,G'}$, then $\bar{R}$ is a feasible schlaufen-sequence for $G'$ and defines an arc in $A_{G',G}$.

We now define $\varphi$ as the function which maps the arc in $A_{G,G'}$ with schlaufen-sequence $R$ to the arc in $A_{G',G}$ with schlaufen-sequence $\bar{R}$. By construction $\varphi$ is injective, which implies $|A_{G,G'}| \leq |A_{G',G}|$. By symmetry we conclude $|A_{G',G}| \leq |A_{G,G'}|$, which implies $|A_{G,G'}| = |A_{G',G}|$ and that $\varphi$ is bijective.

It remains to show that the probability of an arc $p(a)$ is equal to $p(\varphi(a))$. For any node there are equally many feasible active / passive outlinks in $G$ as in $G'$. If for a node one outlink is marked due to a link in $R_k$, then for the same node one outlink is marked in $\bar{R}_k$. Therefore $r_k^{G'}(i)$ is equal to $r_k^{G}(i)$ for an active as well as a passive step. Looking at equation (44), $p_k^{G}(R_k)$ differs from $p_k^{G'}(\bar{R}_k)$ only in the numbering of the factors. But in a cycle of a schlaufe the start node $i_m$ and the end node $i_l$ are such that $(m - l) \bmod 2 = 0$. The reordering leaves even indexes even and odd indexes odd. Therefore $p_k^{G}(R_k) = p_k^{G'}(\bar{R}_k)$. From equation (45) it follows directly that $p^{G}(R) = p^{G'}(\bar{R})$, which completes the proof.

Proof of Lemma 3.2

The symmetric difference of two realizations of $G_{s,m}$, which we denote by $G$ and $G'$, is a set of alternating cycles. Cycles are in particular schlaufen. We order them arbitrarily as $(R_1, \ldots, R_h)$. The sum of the violation matrices is 0. Therefore $(R_1, \ldots, R_h)$ is either a feasible schlaufen-sequence or a concatenation of feasible schlaufen-sequences. In the first case there is an arc from $v_G$ to $v_{G'}$. In the second case, each of the feasible schlaufen-sequences defines an arc to a new node, resulting in a directed path starting at $v_G$ and ending in $v_{G'}$. Thus between any two vertices in $\Phi$ there is a directed path.

Proof of Theorem 3.3