aa r X i v : . [ ec on . E M ] A ug Dyadic Regression
August 27, 2019Bryan S. Graham Initial Draft: September 2018, This Draft: August 2019
Abstract
Dyadic data, where outcomes reflecting pairwise interaction among sampled units are ofprimary interest, arise frequently in social science research. Regression analyses with suchdata feature prominently in many research literatures (e.g., gravity models of trade). Thedependence structure associated with dyadic data raises special estimation and, especially,inference issues. This chapter reviews currently available methods for (parametric) dyadicregression analysis and presents guidelines for empirical researchers.
JEL Classification: C14, C55, C57Keywords:
Networks, Dyadic Regression, U-Statistics Department of Economics, University of California - Berkeley, 530 Evans Hall http://bryangraham.github.io/econometrics/ .This is a draft chapter for the edited volume
The Econometric Analysis of Network Data being preparedby Aureo de Paula and myself. It is based upon lecture notes prepared for a series of short courses on theeconometrics of networks. I thank Michael Jansson for several useful discussions as well as participants inshort courses in Olso, Norway (September, 2017), Hejnice, Czechia (February 2018), St. Gallen, Switzerland(October 2018), Annweiler, Germany (July 2019) and Prague, Czechia (August 2019) for useful feedback.Portions of this material were also presented at an invited session of the 2018 LACEA/LAMES meetings inGuayaqil, Ecuador. All the usual disclaimers apply. Financial support from NSF grant SES et Y ij equal total exports from country i to country j as in Tinbergen (1962); here i and j are two of N independent random draws from a common population. Let W i be a vectorof country attributes and R ij = r ( W i , W j ) a vector of constructed dyad-specific attributes; R ij typically includes the logarithm of both exporter and importer gross domestic product(GDP), the physical distance between i and j , as well as other variables (e.g., indicators forsharing a land border or belonging to a common customs union). The analyst, seeking torelate Y ij and R ij , posits the relationship Y ij = exp (cid:0) R ′ ij θ (cid:1) A i B j V ij , (1)with A i , B i and V ij mean one random variables and { ( V ij , V ji ) } ≤ i ≤ N − ,j>i independent of { W i , A i , B i } Ni =1 and independently and identically distributed across the (cid:0) N (cid:1) = N ( N − dyads. Here the { A i } Ni =1 and { B i } Ni =1 sequences correspond, respectively, to (unobserved)exporter and importer heterogeneity terms. These terms are sometimes referred to as “mul-tilaterial resistance” terms by empirical trade economists. For example, a high A i mightreflect an unmodeled export orientation of an economy or an undervalued currency. Sim-ilarly, a high B i might capture unmodeled tastes for consumption. Head & Mayer (2014)survey the gravity model of trade, including its theoretical foundations.Conditional on the exporter and importer effects we have E [ Y ij | W i , W j , A i , B j ] = exp (cid:0) R ′ ij θ (cid:1) A i B j . If, additionally, E (cid:2) ( A i , B i ) ′ (cid:12)(cid:12) W i (cid:3) = (1 , ′ , such that W i does not covary with the exporterand importer “multilaterial resistance” terms , then unconditional on A i and B j we have the dyadic regression function E [ Y ij | W i , W j ] = exp (cid:0) R ′ ij θ (cid:1) . (2)Interpret (2) as follows: draw countries i and j independently at random and record theirvalues of W i and W j . Given this information set what is the mean square error (MSE) min-imizing predictor of Y ij ? Equation (2) gives a parametric form for this prediction/regressionfunction. This chapter surveys methods of estimation of, and inference on, θ .Santos Silva & Tenreyro (2006) recommended estimating θ by maximizing a Poissonpseudo log-likelihood with a conditional mean function given by (2) (cf. Gourieroux et al.,1984). For inference they constructed standard errors using the sandwich formula of Huber If, for example, a subset of W is associated with membership in the World Trade Organization (WTO),then reasoning about this condition involves asking whether countries belonging to the WTO have a greaterlatent propensity to export or import? In what follows I entirely defer consideration of these questions andfocus solely on the inferential issues raised by the network structure. { Y ij } ≤ i,j ≤ N,i = j are conditionally independent of one an-other given W = ( W , . . . , W N ) ′ . In practice this conditional independence assumption, al-though routinely made in the empirical trade literature (e.g., Rose, 2004; Baldwin & Taglioni,2007), is very unlikely to hold. Exports from, say, Japan to Korea likely covary with thosefrom Japan to the United States. This follows because A i – the Japan exporter effect – drivesJapanese exports to both Korea and the United States. It is also possible that exports fromJapan to Korea may covary with those from Korea to Thailand; perhaps because A j and B j – the Korean exporter and importer effects – covary (as would be true if there exist commonunobserved drivers of Korean exporting and importing behavior). Loosely following Fafchamps & Gubert (2007) I call the above patterns of dependence“dyadic dependence” or “dyadic clustering”. Consider two pairs of dyads, say { i , i } and { j , j } , if these dyads share an agent in common – for example i = j – then Y i i and Y j j will covary. Failing to account for dependence of this type will, typically, result in standarderrors which are too small and consequently more Type I errors in inference than is desired(e.g., Cameron & Miller, 2014; Aronow et al., 2017).In this chapter I describe how to estimate and conduct inference on θ in a way thatappropriately accounts for dependence across dyads sharing a unit in common. Section 1outlines the population and sampling framework. Section 2 introduces a composite maximumlikelihood estimator. Section 3 develops the asymptotic properties of this estimator anddiscusses variance estimation. Sections 4 presents a small empirical illustration.Dyadic data, where outcomes reflecting pairwise interaction among sampled units areof primary interest, arise frequently in social science research. Such data play centralroles in contemporary empirical trade and international relations research (see, respectively,Tinbergen (1962) and Oneal & Russett (1999)). They also feature in work on internationalfinancial flows (Portes & Rey, 2005), development economics (Fafchamps & Gubert, 2007),and anthropology (Apicella et al., 2012) among other fields. Despite their prominence inempirical work, the properties of extant methods of estimation and inference for dyadicregression models are not fully understood. Only recently have researchers begun to for-mally study these methods (e.g., Aronow et al., 2017; Menzel, 2017; Tabord-Meehan, 2018;Davezies et al., 2019). Some of the results presented in this chapter are novel, others, whilehaving antecedents going back decades, are not widely known among empirical researchers.Section 5 ends the chapter with a discussion of further reading (including historically impor- Researchers also sometimes “cluster” on dyads (e.g., Santos Silva & Tenreyro, 2010); this assumes thatthe elements of { ( Y ij , Y ji ) } ≤ i ≤ N − ,j>i are conditionally independent given covariates. While this allows fordependence between, say, exports from Japan to the United States and from the United States to Japan,it does not allow for dependence between, say, exports from Japan to the United States and from Japan toCanada. Let i ∈ N index agents in some (infinite) population of interest. In what follows I willrefer to agents as, equivalently, nodes, vertices, units and/or individuals. Let W i ∈ W = { w , . . . , w L } be an observable attribute which partitions this population into L = | W | subpopulations or “types”; N ( w ) = { i : W i = w } equals the index set associated with thesubpopulation where W i = w . While L may be very large, the size of each subpopulation isassumed infinite. In practice W will typically enumerate different combinations of distinctagent-specific attributes (e.g., W i = w may correspond to former British colonies in thetropics with per capita GDP below $3,000). Heuristically we can think of W as consisting ofthe support points of an multinomial approximation to a (possibly continuous) underlyingcovariate space as in Chamberlain (1987).The indexing of agents within subpopulations homogenous in W i is arbitrary; from thestandpoint of the researcher all vertices of the same type are exchangeable. Similar exchange-ability assumptions underlie most cross-sectional microeconometric procedures. For each(ordered) pair of agents – or directed dyad – there exists an outcome of interest Y ij ∈ Y ⊆ R .The first subscript in Y ij indexes the directed dyads ego, or “sending” agent, while the secondits alter, or “receiving” agent. The adjacency matrix [ Y ij ] i,j ∈ N collects all such outcomes intoan (infinite) random array. Within-type exchangeability of agents implies a particular formof joint exchangeability of the adjacency matrix.To describe this exchangeability condition let σ w : N → N be any permutation of indicessatisfying the restriction (cid:2) W σ w ( i ) (cid:3) i ∈ N = [ W i ] i ∈ N . (3)Condition (3) restricts relabelings to occur among agents of the same type (i.e., within the index sets N ( w ) , w ∈ W ). Following Crane & Towsner (2018) a network is relativelyexchangeable with respect to W (or W -exchangeable) if, for all permutations σ w , (cid:2) Y σ w ( i ) σ w ( j ) (cid:3) i,j ∈ N D = [ Y ij ] i,j ∈ N (4)where D = denotes equality of distribution.If we regard [ Y ij ] i,j ∈ N as a (weighted) directed network and W i as vertex i ’s “color”, then(4) is equivalent to the statement that all colored graph isomorphisms are equally probable.Since there is nothing in the researcher’s information set which justifies attaching differentprobabilities to graphs which are isomorphic (as vertex colored graphs) any probability model4or the adjacency matrix should satisfy (4). If W i encodes all the vertex information observedby the analyst, then W -exchangeability is a natural a priori modeling restriction.Condition (4) allows for the invocation of very powerful de Finetti (1931) type represen-tation results for random arrays. These results provide an “as if” (nonparametric) data gener-ating process for the network adjacency matrix. This, in turn, facilitates various probabilisticcalculations (e.g., computing expectations and variances) and gives (tractable) structure tothe dependence across the elements of [ Y ij ] i,j ∈ N .Let α , { U i } i ≥ and { ( V ij , V ji ) } i ≥ ,j>i be i.i.d. random variables. We may normalize α , U ij and V ij to be U [0 , – uniform on the unit interval – without loss of generality. We doallow for within-dyad dependence across V ij and V ji ; the role such dependence will becomeapparent below. Next consider the random array (cid:2) Y ∗ ij (cid:3) i,j ∈ N generated according to the rule Y ij def ≡ ˜ h ( α, W i , W j , U i , U j , V ij ) . (5)Data generating process (DGP) (5) has a number of useful features. First, any pair ofoutcomes, Y i i and Y j j , sharing at least one index in common are dependent. This holdstrue even conditional on their types W i , W i , W j and W j . Second, if Y i i and Y j j shareexactly one index in common, say i = j , then they are independent if U i = U j , U i and U j are additionally conditioned on. Third, if they share both indices in common, as in i = j and i = j , then there may be dependence even conditional on U i = U j and U i = U j due to the within-dyad dependence across V i i and V i i . These patterns ofstructured dependence and conditional independence will be exploited below to derive thelimit distribution of parametric dyadic regression coefficient estimates. Shalizi (2016) helpfulcalls models like (5) conditionally independent dyad (CID) models (see also Chandrasekhar(2015)).Crane & Towsner (2018), extending Aldous (1981) and Hoover (1979), show that, forany random array [ Y ij ] i,j ∈ N satisfying (4), there exists another array (cid:2) Y ∗ ij (cid:3) i,j ∈ N , generatedaccording to (5), such that [ Y ij ] i,j ∈ N D ≡ (cid:2) Y ∗ ij (cid:3) i,j ∈ N . (6)Rule (5) can therefore be regarded as a nonparametric data generating process for [ Y ij ] i,j ∈ N . Equation (6) implies that we may proceed ‘as if’ our W -exchangeable network wasgenerated according to (5). In the spirit of Diaconis & Janson (2008) and Bickel & Chen(2009) and others, call ˜ h : [0 , × W × [0 , → R a graphon . Here α is an unidentiablemixing parameter, analogous to the one appearing in de Finetti’s (1931) classic represen-tation result for exchangeable binary sequences. Since I will focus on inference which isconditional on the empirical distribution of the data, α can be safely ignored and I will write5 ( W i , W j , U i , U j , V ij ) def ≡ ˜ h ( α, W i , W j , U i , U j , V ij ) in what follows (cf., Bickel & Chen, 2009;Menzel, 2017).The Crane & Towsner (2018) representation result implies that a very particular typeof dependence structure is associated with W -exchangeability. Namely, as discussed earlier, Y i i and Y j j are (conditionally) independent when { i , i } and { j , j } share no indices incommon and dependent when they do. This type of dependence structure, which is verymuch analogous to that which arises in the theory U-Statistics, is tractable and allows forthe formulation of Laws of Large Numbers and Central Limit Theorems. The next fewsections will show how to use this insight to develop asymptotic distribution theory fordyadic regression. Sampling assumption
I will regard [ Y ij ] i,j ∈ N as an infinite random (weighted) graph, G ∞ , with nodes N and(weighted) edges given by the non-zero elements of [ Y ij ] i,j ∈ N . Let V = { , . . . , N } be arandom sample of size N from N . Let G N = G ∞ [ V ] be the subgraph indexed by V . Weassume that the observed network corresponds to the one induced by a random sample ofagents from the larger (infinite) graph. The sampling distribution of any statistic of G N isinduced by this (perhaps hypothetical) random sampling of agents from G ∞ .If G ∞ is relatively exchangeable, then G N will we be as well. We can thus proceed ‘as if’ Y ij = h ( W i , W j , U i , U j , V ij ) for ≤ i, j ≤ N . In what follows we assume that we observe W i for each sampled agent, andfor each pair of sampled agents, we observe both Y ij and Y ji . The presentation here rulesout self loops (i.e., Y ii ≡ ), however incorporating them is natural in some empirical settingsand what follows can be adapted to handle them. Similarly the extension to undirectedoutcomes, where Y ij = Y ji , is straightforward. Let f Y | W ,W ( Y | W , W ; θ ) be a parametric family for the conditional density of Y given W and W . This family is chosen by the researcher. Let l ( θ ) denote the correspondinglog-likelihood. As an example to help fix ideas, return to the variant of the gravity model oftrade introduced in the introduction. Following Santos Silva & Tenreyro (2006) we set l ( θ ) = Y R ′ θ − exp ( R ′ θ ) , θ ) the log likelihood of a Poisson randomvariable Y with mean exp ( R ′ θ ) , and choose ˆ θ to maximize L N ( θ ) = 1 N N − X i X j = i l ij ( θ ) . (7)The maximizer of (7) coincides with a maximum likelihood estimate based upon the as-sumption that [ Y ij ] ≤ i,j ≤ N,i = j are independent Poisson random variables conditional on W = ( W , . . . , W N ) ′ . In practice, trade flows are unlikely to be well-described by a Poisson distribution and in-dependence of the summands in (7) is even less likely. As discussed earlier any two summandsin (7) will be dependent if they share an index in common. The likelihood contribution asso-ciated with exports from Vanuatu to Fiji is not independent of that associated with exportsfrom Fiji to Bangladesh. Dependencies of this type mean that proceeding ‘as if’ (7) is acorrectly specified log-likelihood (or even an M-estimation criterion function) will lead toincorrect inference.If there exists some θ such that f Y | W ,W ( Y | W , W ; θ ) is the true density, then (5)corresponds to what is called a composite likelihood (e.g., Lindsey, 1988; Cox & Reid, 2004;Bellio & Varin, 2005). Because it does not correctly reflect the dependence structure acrossdyads, (5) is not a correctly specified log-likelihood function in the usual sense. If, however,the marginal density of Y ij | W i , W j is correctly specified, then ˆ θ will generally be consistentfor θ . That is we may have that f Y | W ,W ( Y | W , W ) = f Y | W ,W ( Y | W , W ; θ ) for some θ ∈ Θ (i.e., the marginal likelihood is correctly specified), but it is not the casethat, setting Y = [ Y ij ] ≤ i,j ≤ N,i = j , f Y | W ( Y | W ) = Y ≤ i,j ≤ N,i = j f Y | W ,W ( Y ij | W i , W j ; θ ) , due to dependence across dyads sharing agents in common (i.e., the joint likelihood is notcorrectly specified). A composite log-likelihood is constructed by summing together a collec-tion of component log-likelihoods; each such component is a log-likelihood for a portion ofthe sample (in this case a single directed dyad) but, because the joint dependence structuremay not be modeled appropriately, the summation of all these components may not be thecorrect log likelihood for the sample as a whole.If the marginal likelihood is itself misspecified, then (5) corresponds to what might be7alled a pseudo-composite-log-likelihood; “pseudo” in the sense of Gourieroux et al. (1984)and “composite” in the sense of Lindsey (1988). In what follows I outline how to conductinference on the probability limit of ˆ θ (denoted by θ in all cases); the interpretation of thislimit will, of course, depend on whether the pairwise likelihood is misspecified or not. In thecontext of the Santos Silva & Tenreyro (2006) gravity model example, if the true conditionalmean equals exp (cid:0) R ′ ij θ (cid:1) for some θ ∈ Θ , then ˆ θ will be consistent for it (under regularityconditions). The key challenge is to characterize this estimate’s sampling precision. To characterize the limit properties of ˆ θ begin with a mean value expansion of the first ordercondition associated with the maximizer of (7). This yields, after some re-arrangement, √ N (cid:16) ˆ θ − θ (cid:17) = (cid:2) − H N (cid:0) ¯ θ (cid:1)(cid:3) + √ N S N ( θ ) with ¯ θ a mean value between ˆ θ and θ which may vary from row to row, the + superscriptdenoting a Moore-Penrose inverse, and a “score” vector of S N ( θ ) = 1 N N − X i X j = i s ij ( Z ij , θ ) (8)with s ( Z ij , θ ) = ∂l ij ( θ ) /∂θ for Z ij = (cid:0) Y ij , W ′ i , W ′ j (cid:1) ′ and H N ( θ ) = N N − P i P j = i ∂ l ij ( θ ) ∂θ∂θ ′ . Inwhat follows I will just assume that H N (cid:0) ¯ θ (cid:1) p → Γ , with Γ invertible (see Graham (2017)for a formal argument in a related setting and Eagleson & Weber (1978) and Davezies et al.(2019) for more general results).If the Hessian matrix converges in probability to Γ , as assumed, then √ N (cid:16) ˆ θ − θ (cid:17) = Γ − √ N S N ( θ ) + o p (1) so that the asymptotic sampling properties of √ N (cid:16) ˆ θ − θ (cid:17) will be driven by the behaviorof √ N S N ( θ ) . As pointed out by Fafchamps & Gubert (2007) and others, (8) is not asum of independent random variables, hence a basic central limit theorem (CLT) cannot be(directly) applied.My analysis of √ N S N ( θ ) borrows from the theory of U-Statistics (e.g., Ferguson, 2005;8an der Vaart, 2000). To make these connections clear it is convenient to re-write S N ( θ ) as S N ( θ ) = (cid:18) N (cid:19) − X i Variance calculation In this section I first derive the sampling variance of √ N (cid:16) ˆ θ − θ (cid:17) and then provide aninterpretation of it. I begin by calculating the variance of S N : V ( S N ) = V ( U N ) + V ( U N ) + V ( V N ) . Let Σ q = C (¯ s ( W i , U i , W i , U i ) + ¯ s ( W i , U i , W i , U i ) , ¯ s ( W j , U j , W j , U j ) + ¯ s ( W j , U j , W j , U j )) when the dyads { i , i } and { j , j } share q = 0 , , indices in common. A Hoeffding (1948)variance decomposition gives V ( U N ) = V ( U N ) + V ( U N )4 N Σ + 2 N ( N − 1) (Σ − Σ ) . Direct calculation yields (see Appendix A) Σ def ≡ V (cid:18) ¯ s e ( W , U ) + ¯ s a ( W , U )2 (cid:19) (11) = Ω , + 2Ω , + Ω , It is also implicit in the analysis of Bickel et al. (2011). Ω i i ,j j = C (¯ s ( W i , U i , W i , U i ) , ¯ s ( W j , U j , W j , U j )) . Similarly we have Σ = V (cid:18) ¯ s ( W , U , W , U ) + ¯ s ( W , U , W , U )2 (cid:19) (12) = Ω , + Ω , and, in an abuse of notation, letting Σ def ≡ V (cid:18)q(cid:0) N (cid:1) V N (cid:19) , Σ = E (cid:20) ∆ , ( W , U , W , U ) + ∆ , ( W , U , W , U )2 (cid:21) (13) = ¯∆ , + ¯∆ , where ∆ , ( W , U , W , U ) = V ( s ( Z , θ ) | W , U , W , U )∆ , ( W , U , W , U ) = E (cid:2) s ( Z , θ ) s ( Z , θ ) ′ (cid:12)(cid:12) W , U , W , U (cid:3) . From (11), (12) and (13) we have, collecting terms, a variance of S N equal to V ( S N ) = V ( U N ) + V ( U N ) + V ( V N ) (14) N Σ + 2 N ( N − 1) (Σ − ) + 2 N ( N − 1) Σ = (Ω , + 2Ω , + Ω , ) (cid:18) N − N − (cid:19) + 1 N − (cid:0) Ω , + ¯∆ , + Ω , + ¯∆ , (cid:1) . To understand (14) note that there are exactly (cid:18) N (cid:19)(cid:18) (cid:19)(cid:18) N − (cid:19) = N ( N − 1) ( N − pairs of dyads sharing one agent in common. Consequently, applying the variance operator to S N yields a total of N ( N − 1) ( N − non-zero covariance terms across the (cid:18) N (cid:19) summandsin S N . It is these covariance terms which account for the leading term in (14). The secondand third terms in (14) arise from the (cid:18) N (cid:19) variances of the summands in S N . Indeed, it is11elpful to note that Σ = V (cid:18) E (cid:20) s ( Z , θ ) + s ( Z , θ )2 (cid:12)(cid:12)(cid:12)(cid:12) W , U , W , U (cid:21)(cid:19) Σ = E (cid:20) V (cid:18) s ( Z , θ ) + s ( Z , θ )2 (cid:12)(cid:12)(cid:12)(cid:12) W , U , W , U (cid:19)(cid:21) and hence that V (cid:18) s ( Z , θ ) + s ( Z , θ )2 (cid:19) = Σ + Σ . (15)Although it may be that Σ + Σ ≥ Σ (in a positive definite sense), the larger numberof non-zero covariance terms generated by applying the variance operator to S N contributesmore to its variability, than the smaller number of own variance terms. Inspecting (14) it isclear that the multiplying by √ N stabilizes the variance such that V (cid:16) √ N S N (cid:17) = 4Σ + O (cid:0) N − (cid:1) and hence V (cid:16) √ N (cid:16) ˆ θ − θ (cid:17)(cid:17) → (cid:0) Γ ′ Σ − Γ (cid:1) − as N → ∞ .If a researcher uses standard software, for example a Poisson regression program, tomaximize the composite log-likelihood (7) and then chooses to report robust Huber (1967)type standard errors, this corresponds to assuming that Ω , = Ω , = Ω , = Ω , = ¯∆ , = 0 . This approach would ignore the dominant variance term and part of the higher order termas well. If, instead, the researcher clustered her standard errors on dyads, as in, for example,Santos Silva & Tenreyro (2010), then this corresponds to assuming that Ω , = Ω , = Ω , = 0 but allowing Ω , and/or ¯∆ , to differ from zero. This approach would still erroneouslyignore the dominant variance term. In both cases reported confidence intervals are likely toundercover the true parameter; perhaps by a substantial margin. This is shown, by example,via Monte Carlo simulation below. 12 ariance estimation Graham (TBD) provides a comprehensive discussion of variance estimation for dyadic re-gression. One approach to variance estimation he reviews shows that Σ can be estimatedby the analog covariance estimate ˆΣ = 14 2 N ( N − 1) ( N − N − X i =1 N − X j = i +1 N X k = j +1 (cid:8) (ˆ s ij + ˆ s ji ) (ˆ s ik + ˆ s ki ) ′ (ˆ s ij + ˆ s ji ) (ˆ s jk + ˆ s kj ) ′ + (ˆ s ik + ˆ s ki ) (ˆ s jk + ˆ s kj ) ′ (cid:9) , where the summation is over all triads in the sampled network. Each triad can itself bepartitioned into three different pairs of dyads, each sharing an agent in common.It turns out, as inspection of (15) suggests, it is easiest to estimate the sum of Σ and Σ jointly by \ Σ + Σ = 14 2 N ( N − N − X i =1 N X j = i +1 (ˆ s ij + ˆ s ji ) (ˆ s ij + ˆ s ji ) ′ . The Jacobian matrix, Γ , may be estimated by − H N (cid:16) ˆ θ (cid:17) , which is typically available asa by-product of estimation in most commercial software. Putting things together gives avariance estimate of ˆ V (cid:16) √ N (cid:16) ˆ θ − θ (cid:17)(cid:17) = ˆΓ − (cid:18) + 2 N − (cid:16) \ Σ + Σ − (cid:17)(cid:19) (cid:16) ˆΓ − (cid:17) ′ . (16)Graham (TBD) shows that (16) is numerically equivalent, up to a finite sample correction,to the variance estimator proposed by Fafchamps & Gubert (2007). This variance estimatorincludes estimates of asymptotically negligible terms. Although these terms are negligiblewhen the sample is large enough, in practice they may be sizable in real world settings. Limit distribution The variance calculations outlined above imply that √ N S N = √ N U N + o p (1) and hencethat √ N (cid:16) ˆ θ − θ (cid:17) = Γ − √ N U N + o p (1) . U N is the sum of i.i.d. random variables a CLT gives √ N (cid:16) ˆ θ − θ (cid:17) D → N (cid:16) , (cid:0) Γ ′ Σ − Γ (cid:1) − (cid:17) , (17)The variance expression, equation (14), indicates that inference based upon the limit dis-tribution (17) would ignore higher order variance terms included in (16). In practice, ashas been shown in other contexts, an approach to inference which incorporates estimates ofthese higher order variance terms may result in inference with better size properties (e.g.,Graham et al., 2014; Cattaneo et al., 2014; Graham et al., 2019). In practice I suggest usingthe normal reference distribution, but with a variance estimated by (16), which includesasymptotically negligible terms which may nevertheless be large in real world samples. This section provides an example of a dyadic regression analysis using the dataset con-structed by João Santos Silva and Silvana Tenreyro (2006) in their widely-cited paper “TheLog of Gravity”. This dataset, which as of the Fall of 2019 was available for downloadat http://personal.lse.ac.uk/tenreyro/LGW.html , includes information on N = 136 countries, corresponding to 18,360 directed trading relationships. Here I present a simplespecification which includes only the log of exporter and importer GDP, respectively lyex and lyim , as well as the log distance ( ldist ) between the two trading countries. Maximizing(7) yields a fitted regression function of ˆ E [ Y ij | W i , W j ] = exp − . . . . lyex + 0 . . lyim + − . . ldist ! . Standard errors which cluster on dyads, but ignore dependence across dyads sharing a singleagent in common, are reported in parentheses below the coefficient estimates. Specificallythese standard errors coincide with square roots of the diagonal elements of N ( N − 1) ˆΓ − (cid:16) \ Σ + Σ (cid:17) (cid:16) ˆΓ − (cid:17) ′ . (18)The coefficient estimates and reported standard errors are unremarkable in the contextof the empirical trade literature. I refer the reader to Santos Silva & Tenreyro (2006) orHead & Mayer (2014) for additional context.If, instead, the Fafchamps & Gubert (2007) dyadic robust variance-covariance estimator14able 1: Coverage of different confidence intervals with dyadic datai.i.d. dyadic clustered θ θ θ ˆ E [ Y ij | W i , W j ] = exp − . . . . lyex + 0 . . lyim + − . . ldist ! . Standard errors which account for dependence across dyads sharing an agent in common areapproximately twice those which ignore such dependence. Monte Carlo experiment Next I report on a small Monte Carlo experiment to illustrate the properties of inferencemethods based on the different variance-covariance estimates described above. I set N = 200 and generate outcome data for all N ( N − ordered pairs of agents according to the outcomemodel: Y ij = exp ( θ R ij + θ W i + θ W j ) A i A j U ij Here A i , for i = 1 , ..., N , is a sequence of i.i.d. log normal random variables, each with mean1 and scale parameter σ A ; U ij for i = 1 , ..., n with n = N ( N − is also sequence of i.i.d. lognormal random variables, each with mean 1 and scale parameter σ .Each agent is uniformly at random assigned a location on the unit square, ( W i , W i ), R ij = q ( W i − W j ) + ( W i − W j ) equals the distance between agents i and j on thatsquare; W i is a standard uniform random variable. I set θ = − , θ = − / and θ = 1 / .I set σ = 1 and σ A = 1 / . This generates moderate, but meaningful, dependence across anytwo dyads sharing at least one agent in common.Table 1 reports Monte Carlo estimates of confidence interval coverage (the nominal cover-age of the intervals should be 0.95). These estimates are based upon 1,000 simulated datasets.15he coverage properties of two intervals are evaluated. The first is a Wald-based intervalwhich uses standard errors constructed from (18). This corresponds to assuming indepen-dence across dyads or “clustering on dyads”. Confidence intervals constructed in this way areroutinely reported in, for example, the trade literature. The coverage of these intervals ispresented in first column of Table 1. The second interval is based on the Fafchamps-Gubertvariance estimate (see (16) above). The coverage of these intervals, which do take into ac-count dependence across pairs of dyads sharing an agent in common, are reported in columntwo of the table.In the experiment, the intervals which do not appropriately account fo dyadic clustering,drastically undercover the truth, whereas those based on the variance estimator outline abovehave actual coverage very close to 0.95. While there is no doubt additional work to be doneon variance estimation and inference in the dyadic context, a preliminary suggestion is toreport standard errors and confidence intervals based upon equation (16) of the previoussection. These intervals perform well in the simulation experiment, while those which ignoredyadic dependence, are not recommended. Although the use of gravity models by economists dates back to Tinbergen (1962), discus-sions of how to account for cross dyad dependence when conducting inference have been rare.Kolaczyk (2009, Chapter 7), in his widely cited monograph on network statistics, discusseslogistic regression with dyadic data. He notes that standard inference procedures are inap-propriate due to the presence of dyadic dependence, but is unable to offer a solution due tothe lack of formal results in the literature (available at that time).Fafchamps & Gubert (2007) proposed a variance-covariance estimator which allows fordyadic-dependence. Their estimator coincides with the bias-corrected one discussed inGraham (TBD) and is the one recommended here. Additional versions (and analyses)of this estimator are provided by Cameron & Miller (2014) and Aronow et al. (2017).A special case of the Fafchamps & Gubert (2007) variance estimator actually appearsin Holland & Leinhardt (1976) in the context of an analysis of subgraph estimation.Snijders & Borgatti (1999) suggested using the Jackknife for variance estimation of net-work statistics. Results in, for example, Callaert & Veraverbeke (1981) and the referencestherein, suggest that this estimate is (almost) numerically equivalent to ˆΣ defined above.Aldous’ (1981) representation result evidently inspired some work on LLNs andCLTs for so called dissociated random variables and exchangeable random arrays (e.g.,Eagleson & Weber, 1978). The influence of this work on empirical practice appears to16ave been minimal. Bickel et al. (2011), evidently inspired by the variance calculationsof Picard et al. (2008), but perhaps more accurately picking up where Holland & Leinhardt(1976) stopped (albeit inadvertently), present asymptotic normality results for subgraphcounts. Network density, which corresponds to the mean [ N ( N − − P i = j Y ij when Y ij is binary, is the simplest example they consider and also prototypical for understanding re-gression. The limit theory sketched hear was novel at the time of drafting, but substantiallyrelated results – independently derived – appear in Menzel (2017) and Davezies et al. (2019).Both of these papers also present bootstrap procedures appropriate for network data. TheMenzel (2017) paper focuses on the important problem of graphon degeneracy. This occurswhen the graphon only weakly varies in U i and U j ; degeneracy effects rates of convergenceand limit distributions. Graham et al. (2019) present results on kernel density estimationwith dyadic data. Tabord-Meehan (2018) showed asymptotic normality of dyadic linearregression coefficients using a rather different approach. A Derivations Expression (11) of the main text is an implication of calculations like V (¯ s e ( W , U )) = E (cid:2) E [ ¯ s ( W , U , W , U ) | W , U ] E [ ¯ s ( W , U , W , U ) | W , U ] ′ (cid:3) = E (cid:2) E [ ¯ s ( W , U , W , U ) | W , U ] E [ ¯ s ( W , U , W , U ) | W , U ] ′ (cid:3) = E (cid:2) E (cid:2) ¯ s ( W , U , W , U ) ¯ s ( W , U , W , U ) ′ (cid:12)(cid:12) W , U (cid:3)(cid:3) = E (cid:2) ¯ s ( W , U , W , U ) ¯ s ( W , U , W , U ) ′ (cid:3) = Ω , . The second equality immediately above follows because W , U | W , U D = W , U | W , U D = W , U , the third by independence of ¯ s ( W , U , W , U ) and ¯ s ( W , U , W , U ) conditional on W , U , and the fourth by iterated expectations. References Aldous, D. J. (1981). Representations for partially exchangeable arrays of random variables. Journal of Multivariate Analysis , 11(4), 581 – 598.Apicella, C. L., Marlowe, F. W., Fowler, J. H., & Christakis, N. A. (2012). Social networksand cooperation in hunter-gatherers. Nature , 481(7382), 497 – 501.17ronow, P. M., Samii, C., & Assenova, V. A. (2017). Cluster-robust variance estimation fordyadic data. Political Analysis , 23(4), 564 – 577.Baldwin, R. & Taglioni, D. (2007). Trade effects of the euro: a comparison of estimators. Journal of Economic Integration , 22(4), 780 – 818.Bellio, R. & Varin, C. (2005). A pairwise likelihood approach to generalized linear modelswith crossed random effects. Statistical Modelling , 5(3), 217 – 227.Bickel, P. J. & Chen, A. (2009). A nonparametric view of network models and newman-girvan and other modularities. Proceedings of the National Academy of Sciences , 106(50),21068 – 21073.Bickel, P. J., Chen, A., & Levina, E. (2011). The method of moments and degree distributionsfor network models. Annals of Statistics , 39(5), 2280 – 2301.Callaert, H. & Veraverbeke, N. (1981). The order of the normal approximation for a studen-tized u-statistic. Annals of Statistics , 9(1), 194 – 200.Cameron, A. C. & Miller, D. L. (2014). Robust inference for dyadic data . Technical report,University of California - Davis.Cattaneo, M., Crump, R., & Jansson, M. (2014). Small bandwidth asymptotics for density-weighted average derivatives. Econometric Theory , 30(1), 176 – 200.Chamberlain, G. (1987). Asymptotic efficiency in estimation with conditional moment re-strictions. Journal of Econometrics , 34(3), 305 – 334.Chandrasekhar, A. (2015). Econometrics of network formation. In Y. Bramoullé, A. Galeotti,& B. Rogers (Eds.), Oxford Handbook on the Economics of Networks . Oxford UniversityPress.Cox, D. R. & Reid, N. (2004). A note on pseudolikelihood constructed from marginaldensities. Biometrika , 91(3), 729 – 737.Crane, H. & Towsner, H. (2018). Relatively exchangeable structures. Journal of SymbolicLogic , 83(2), 416 – 442.Davezies, L., d’Haultfoeuille, X., & Guyonvarch, Y. (2019). Empirical process results forexchangeable arrayes . Technical report, CREST-ENSAE.18e Finetti, B. (1931). Funzione caratteristica di un fenomeno aleatorio. Atti della R.Academia Nazionale dei Lincei, Serie 6. Memorie, Classe di Scienze Fisiche, Mathematicee Naturale , 4, 251 – 299.Diaconis, P. & Janson, S. (2008). Graph limits and exchangeable random graphs. Rendicontidi Matematica , 28(1), 33 – 61.Eagleson, G. K. & Weber, N. C. (1978). Limit theorems for weakly exchangeable arrays. Mathematical Proceedings of the Cambridge Philosophical Society , 84(1), 123 – 130.Fafchamps, M. & Gubert, F. (2007). The formation of risk sharing networks. Journal ofDevelopment Economics , 83(2), 326 – 350.Ferguson, T. S. (2005). U-statistics. University of California - Los Angeles.Gourieroux, C., Monfort, A., & Trognon, A. (1984). Pseudo maximum likelihood methods:applications to poisson models. Econometrica , 52(3), 701 – 720.Graham, B. S. (2017). An econometric model of network formation with degree heterogeneity. Econometrica , 85(4), 1033 – 1063.Graham, B. S. (TBD). Handbook of Econometrics , volume 7, chapter The econometricanalysis of networks. North-Holland: Amsterdam.Graham, B. S., Imbens, G. W., & Ridder, G. (2014). Complementarity and aggregateimplications of assortative matching: a nonparametric analysis. Quantitative Economics ,5(1), 29 – 66.Graham, B. S., Niu, F., & Powell, J. L. (2019). Kernel density estimation for undirecteddyadic data . Technical report, University of California - Berkeley.Head, K. & Mayer, T. (2014). Handbook of International Economics , volume 4, chapterGravity equations: workhorse, toolkit, and cookbook, (pp. 131 – 191). North-Holland:Amsterdam.Hoeffding, W. (1948). A class of statistics with asymptotically normal distribution. Annalsof Mathematical Statistics , 19(3), 293 – 325.Holland, P. W. & Leinhardt, S. (1976). Local structure in social networks. SociologicalMethodology , 7, 1 – 45.Hoover, D. N. (1979). Relations on probability spaces and arrays of random variables . Tech-nical report, Institute for Advanced Study, Princeton, NJ.19uber, P. J. (1967). The behavior of maximum likelihood estimates under nonstandardconditions. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics andProbability , 1, 221 – 233.Kolaczyk, E. D. (2009). Statistical Analysis of Network Data . New York: Springer.Lindsey, B. G. (1988). Composite likelihood. Contemporary Mathematics , 80, 221 – 239.Menzel, K. (2017). Bootstrap with clustering in two or more dimensions . Technical Report1703.03043v2, arXiv.Oneal, J. R. & Russett, B. (1999). The kantian peace: the pacific benefits of democracy,interdependence, and international organizations. World Politics , 52(1), 1 – 37.Picard, F., Daudin, J. J., Koskas, M., Schbath, S., & Robin, S. (2008). Assessing theexceptionality of network motifs. Journal of Computational Biology , 15(1), 1 – 20.Portes, R. & Rey, H. (2005). The determinants of cross-border equity flows. Journal ofInternational Economics , 65(2), 269 – 296.Rose, A. K. (2004). Do we really know that the wto increases trade? American EconomicReview , 94(1), 98 – 114.Santos Silva, J. & Tenreyro, S. (2006). The log of gravity. Review of Economics and Statistics ,88(4), 641 – 658.Santos Silva, J. & Tenreyro, S. (2010). Currency unions in prospect and retrospect. AnnualReview of Economics , 2, 51 – 74.Shalizi, C. R. (2016). Lecture 1: Conditionally-independent dyad models . Lecture note,Carnegie Mellon University.Snijders, T. A. B. & Borgatti, S. P. (1999). Non-parametric standard errors and tests fornetwork statistics. Connections , 22(2), 61 – 70.Tabord-Meehan, M. (2018). Inference with dyadic data: Asymptotic behavior of the dyadic-robust t-statistic. Journal of Business and Economic Statistics .Tinbergen, J. (1962). Shaping the World Economy: Suggestions for an International Eco-nomic Policy . New York: Twentieth Century Fund.van der Vaart, A. W. (2000).