Estimation of Large Network Formation Games∗

Geert Ridder†  Shuyang Sheng‡

January 14, 2020
Abstract
This paper develops estimation methods for network formation models using observed data from a single large network. The model allows for utility externalities from friends of friends and friends in common, so the expected utility is nonlinear in the link choices of an agent. We propose a novel method that uses the Legendre transform to express the expected utility as a linear function of the individual link choices. This implies that the optimal link decision is that for an agent who myopically chooses to establish links or not to the other members of the network. The dependence between the agent's link choices is through an auxiliary variable. We propose a two-step estimation procedure that requires weak assumptions on equilibrium selection, is simple to compute, and has consistent and asymptotically normal estimators for the parameters. Monte Carlo results show that the estimation procedure performs well.

KEYWORDS: Network Formation, Large Games, Incomplete Information, Two-step Estimation, Legendre Transform

JEL Codes: C13, C31, C57, D85

∗ We thank Denis Chetverikov, Aureo de Paula, Bryan Graham, Jinyong Hahn, Bo Honore, Andres Santos, Kevin Song, Terrence Tao (for suggesting the Legendre transform) and participants of seminars at UC Berkeley, UCLA, UC Davis, UC Riverside, Princeton, U Colorado, Northwestern, Maryland, Georgetown, Duke, UBC, Ohio State, TAMU, Florida State, Penn State, U Chicago, Tinbergen Institute, Groningen U, FGV Rio, INSPER, PUC Rio, Vanderbilt, Econometric Society World Congress (2015), Econometric Society Winter Meeting and Summer Meeting (2016), INET conferences on econometrics of networks University of Cambridge (2015) and USC (2015), Berkeley/CeMMAP conference on networks Berkeley (2016), NYU conference (2017), Cowles Foundation conference (2017).

† Department of Economics, University of Southern California, Los Angeles, CA 90089. E-mail: [email protected].

‡ Department of Economics, University of California at Los Angeles, Los Angeles, CA 90095. E-mail: [email protected].

1 Introduction
This paper contributes to the growing literature on the estimation of game-theoretic models of network formation (Jackson, 2008). The purpose of the empirical analysis is to recover the preferences of the members of the network, in particular the preferences that determine whether a member of the network forms links (friendship, business relation or some other type of link) with other members of the network. The preference for a link depends in general on the exogenous characteristics of the two members, and on their endogenous positions in the network, e.g., their number of links and their number of common links. It is the dependence of the link preference of an agent on the endogenous position of a potential partner in the network that complicates the analysis. The link preference of an agent also depends on unobservable features of the link. Assumptions on the nature of these unobservables play a key role in the empirical analysis.

Link formation models are discrete choice models where the choice is between alternatives that consist of the links to the other members. In a network with $n$ members an agent chooses between $2^{n-1}$ overlapping sets of links. Because our analysis assumes that $n$ grows large, this seems an intractable discrete choice problem. Our main contribution is to propose a method that, for a general class of link preferences, transforms this intractable discrete choice problem into a tractable sequence of related binary choice problems.

The first simplification comes from the assumption that agents have incomplete information on unobservables when making their link choices. We assume that agents know the unobserved (by the econometrician) utility shocks for their own potential links, but not the unobserved utility shocks for the potential links of the other agents. An alternative assumption is complete information, under which agents know not just the unobserved utility shocks of their own potential links, but also those of the links for all other agents in the network. The complete information models are the hardest to estimate and the utility function parameters are in general partially identified (de Paula, Richards-Shubik, and Tamer, 2018; Miyauchi, 2016; Sheng, 2018). Leung (2015) considers an incomplete information model where the utility function is additively separable in one's own links. In that case, the optimal strategy is a set of independent binary link choices, so that a link is established if the expected utility of the link is greater than the expected utility of not forming the link. If the utility function depends on one's own links in a nonseparable way, then the optimal strategy does not have this simple form. An important example of a non-additively separable utility function occurs if the utility of a link depends on links in common. If links in common have a positive utility then the network exhibits clustering, which is a common feature of real-world networks.

We show that even if the utility function depends on the product of link choice indicators, the expected utility maximizing link choices are still equivalent to a set of (correlated) binary link choices. To obtain this equivalence we use the Legendre transform to linearize the expected utility function. This linearization introduces an auxiliary variable that depends on the unobserved utility shocks of the agent's links. This auxiliary variable is itself the solution to a (non-differentiable) optimization problem.
After the inclusion of the auxiliary variable we can represent the optimal link decision as a set of binary link choices.

The parameters of the utility function can be estimated by a two-step procedure where in the first step reduced-form link probabilities are estimated, and in the second step we estimate the utility function parameters. The asymptotic analysis of the two-step estimator has some complications. We assume that we have data on a single large network. A number of papers, such as Menzel (2017), Leung (2015), and de Paula, Richards-Shubik, and Tamer (2018), consider estimation using such data. In our model the link choices are dependent for each agent but not across agents. The dependence can be represented by the auxiliary variable introduced by the Legendre transform. If the number of network members $n$ grows large, the auxiliary variable converges to a constant that does not depend on the unobserved utility shocks, so the link dependence vanishes. Our two-step estimator based on the observations on links is consistent even with this finite network dependence. The link dependence has to be accounted for in the asymptotic variance of the estimator.

The plan of the paper is as follows. In Section 2 we introduce the model and the specific utility function that we will use. We also discuss the Bayesian Nash equilibrium for the network. In Section 3 we obtain a closed-form expression for the optimal link choices of an agent. Section 4 discusses the two-step estimator. Section 5 introduces a number of extensions of the model and estimator. Section 6 reports the results of a simulation study.

2 Model
Consider $n$ agents who choose to form links (or not) to each other. We introduce our model for friendships, but it applies to any kind of links or agents. The links form a network, which is represented by an $n \times n$ binary matrix $G \in \mathcal{G}$, with $\mathcal{G}$ the set of all $n \times n$ binary matrices with a 0 main diagonal. The $(i,j)$ element $G_{ij} = 1$ if $i$ and $j$ are linked and 0 otherwise. The diagonal elements $G_{ii}$ are set to 0. We consider directed links, i.e., $G_{ij}$ and $G_{ji}$ may be different. The case of undirected links is discussed later in Section 5.2.

Each individual $i$ has a vector of observed characteristics $X_i \in \mathcal{X}$ and a vector of unobserved utility shocks $\varepsilon_i = (\varepsilon_{i1}, \ldots, \varepsilon_{i,i-1}, \varepsilon_{i,i+1}, \ldots, \varepsilon_{in})' \in \mathbb{R}^{n-1}$, where $\varepsilon_{ij}$ is $i$'s unobserved utility shock for link $ij$. We assume that the vector of characteristics $X = (X_1', \ldots, X_n')' \in \mathcal{X}^n$ is public information of all individuals, but the utility shock vector $\varepsilon_i$ is the private information of $i$. We also assume that the utility shocks are i.i.d. and are independent of the observables.

Assumption 1 (i) $\varepsilon_{ij}$, $\forall i \neq j$, are i.i.d. with cdf $F_\varepsilon(\cdot\,;\theta_\varepsilon)$ known up to the parameter vector $\theta_\varepsilon \in \Theta_\varepsilon \subset \mathbb{R}^{d_\varepsilon}$. The distribution is absolutely continuous with respect to the Lebesgue measure and has a density $f_\varepsilon(\cdot\,;\theta_\varepsilon)$ that is continuously differentiable in $\theta_\varepsilon$ and strictly positive and bounded on $\mathbb{R}$. (ii) The vector of utility shocks $\varepsilon = (\varepsilon_1', \ldots, \varepsilon_n')'$ and $X$ are independent.
2.1 Utility

Given the vector of characteristics $X$ and the private utility shocks $\varepsilon_i$, the utility of network $G$ for $i$ is
\[
U_i(G, X, \varepsilon_i; \theta_u) = \sum_{j \neq i} G_{ij} \Big( u_i(G_j, X; \beta) + \frac{1}{n-2} \sum_{k \neq i,j} G_{ik}\, v_i(G_j, G_k, X; \gamma) - \varepsilon_{ij} \Big), \tag{2.1}
\]
where $G_i = (G_{ij}, j \neq i)$ denotes the $i$th row of $G$, i.e., the links formed by $i$. We assume that the utility function is known up to the parameter vector $\theta_u = (\beta', \gamma')'$ in a compact set $\Theta_u \subset \mathbb{R}^{d_u}$.

In (2.1), $u_i(G_j, X; \beta)$ represents the part of the incremental utility from a link with $j$ that does not depend on $i$'s link decision $G_i$. An obvious specification is
\[
u_i(G_j, X; \beta) = \beta_0 + X_i'\beta_1 + |X_i - X_j|'\beta_2 + G_{ji}\beta_3 + \frac{1}{n-2} \sum_{k \neq i,j} G_{jk}\, \beta_4(X_i, X_j, X_k). \tag{2.2}
\]
The first four terms in (2.2) capture the direct utility from the link with $j$, which depends on the homophily effect ($\beta_2$) and the reciprocity effect ($\beta_3$). The last term in (2.2) captures the indirect utility from $j$'s friends, which may vary with the characteristics of the individuals involved. This specification is similar to that in Leung (2015).

The utility function in (2.1) also accounts for the utility that $i$ derives from simultaneously linking with $j$ and $k$, denoted by $v_i(G_j, G_k, X; \gamma)$. An important example is the utility derived from friends in common,
\[
v_i(G_j, G_k, X; \gamma) = G_{jk} G_{kj}\, \gamma_1(X_i, X_j, X_k) + \frac{1}{n-2} \sum_{l \neq i,j,k} G_{jl} G_{kl}\, \gamma_2(X_i, X_j, X_k), \tag{2.3}
\]
where the first term captures the utility of friends in common that are directly connected and the second term captures the utility of friends in common that are indirectly connected. (We can replace $G_{jk}G_{kj}$ by $G_{jk} + G_{kj}$.) Allowing for such potential complementarities of links is crucial if we want to model networks that exhibit clustering, i.e., two individuals with friends in common are more likely to be friends (Jackson, 2008). The main difference between our model and that of Leung (2015) is that we allow for the complementarity of link decisions.

We normalize the sum terms in (2.1)-(2.3) by $n-2$ so that they remain bounded as $n$ increases to infinity, the data scenario we consider in the asymptotic analysis.
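For concreteness, the following sketch evaluates the utility in (2.1)-(2.3) by brute force for one agent. It makes simplifying assumptions that are not in the paper: $X_i$ is scalar, and $\beta_4$, $\gamma_1$, $\gamma_2$ are constants rather than functions of the types.

```python
import numpy as np

def utility_i(i, G, X, eps_i, beta, gamma):
    """Utility of network G for agent i, following (2.1)-(2.3).

    Simplifications (not from the paper): X[i] is scalar and beta4,
    gamma1, gamma2 are constants. beta = (b0, b1, b2, b3, b4),
    gamma = (g1, g2); eps_i[j] is agent i's shock for link ij.
    """
    n = G.shape[0]
    total = 0.0
    for j in range(n):
        if j == i or G[i, j] == 0:
            continue
        # u_i(G_j, X; beta): direct utility plus the friends-of-friends term
        u = beta[0] + X[i] * beta[1] + abs(X[i] - X[j]) * beta[2] + G[j, i] * beta[3]
        u += beta[4] / (n - 2) * sum(G[j, k] for k in range(n) if k not in (i, j))
        # friends-in-common terms v_i(G_j, G_k, X; gamma) over i's other links k
        v = 0.0
        for k in range(n):
            if k in (i, j) or G[i, k] == 0:
                continue
            v_jk = gamma[0] * G[j, k] * G[k, j]
            v_jk += gamma[1] / (n - 2) * sum(G[j, l] * G[k, l]
                                             for l in range(n) if l not in (i, j, k))
            v += v_jk
        total += u + v / (n - 2) - eps_i[j]
    return total
```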
2.2 Equilibrium

Let $G_i(X, \varepsilon_i)$ denote individual $i$'s link decision, which is a mapping from $i$'s information $(X, \varepsilon_i)$ to a vector of links $G_i \in \mathcal{G}_i = \{0,1\}^{n-1}$. Write $G = (G_i, G_{-i})$, where $G_{-i}$ denotes the submatrix of $G$ with the $i$th row deleted, i.e., the links formed by individuals other than $i$.

Each individual $i$ makes her optimal link decision by maximizing her expected utility $E[U_i(g_i, G_{-i}, X, \varepsilon_i) \mid X, \varepsilon_i]$ over $g_i \in \mathcal{G}_i$, where the expectation is taken with respect to $G_{-i}$. Since $G_{-i}$ is a function of $X$ and $\varepsilon_{-i} = (\varepsilon_j', j \neq i)'$, and the private shocks $\varepsilon_i$ are assumed to be independent across $i$ (Assumption 1), individual $i$'s belief about $G_{-i}$ depends on her information $(X, \varepsilon_i)$ only through the public information $X$. Let $\sigma_i(g_i \mid X) = \Pr(G_i(X,\varepsilon_i) = g_i \mid X)$ be the conditional probability that individual $i$ chooses $g_i$ given $X$. The independence of $\varepsilon_1, \ldots, \varepsilon_n$ implies that the link decisions $G_i$ are independent across $i$ given $X$, so individual $i$'s belief about the link decisions of others is $\sigma_{-i}(g_{-i} \mid X) = \prod_{j \neq i} \sigma_j(g_j \mid X)$. Let $\sigma(X) = \{\sigma_i(g_i \mid X), g_i \in \mathcal{G}_i, i = 1,\ldots,n\}$ denote the belief profile. For a given belief profile $\sigma$, the expected utility of individual $i$ is given by
\[
E[U_i(G_i, G_{-i}, X, \varepsilon_i) \mid X, \varepsilon_i, \sigma] = \sum_{j \neq i} G_{ij} \Big( E[u_i(G_j, X) \mid X, \sigma] + \frac{1}{n-2} \sum_{k \neq i,j} G_{ik}\, E[v_i(G_j, G_k, X) \mid X, \sigma] - \varepsilon_{ij} \Big). \tag{2.4}
\]
For the specification in (2.2) and (2.3), we have
\[
E[u_i(G_j, X) \mid X, \sigma] = \beta_0 + X_i'\beta_1 + |X_i - X_j|'\beta_2 + E[G_{ji} \mid X, \sigma]\,\beta_3 + \frac{1}{n-2} \sum_{k \neq i,j} E[G_{jk} \mid X, \sigma]\,\beta_4(X_i, X_j, X_k) \tag{2.5}
\]
and
\[
E[v_i(G_j, G_k, X) \mid X, \sigma] = E[G_{jk} \mid X, \sigma]\, E[G_{kj} \mid X, \sigma]\,\gamma_1(X_i, X_j, X_k) + \frac{1}{n-2} \sum_{l \neq i,j,k} E[G_{jl} \mid X, \sigma]\, E[G_{kl} \mid X, \sigma]\,\gamma_2(X_i, X_j, X_k), \tag{2.6}
\]
with, e.g., $E[G_{ji} \mid X, \sigma] = \sum_{g_j \in \mathcal{G}_j : g_{ji} = 1} \sigma_j(g_j \mid X)$.

Given $X$ and $\sigma$, the probability that individual $i$ chooses $g_i$ is
\[
\Pr(G_i = g_i \mid X, \sigma) = \Pr\Big( E[U_i(g_i, G_{-i}, X, \varepsilon_i) \mid X, \varepsilon_i, \sigma] \geq \max_{\tilde g_i \in \mathcal{G}_i} E[U_i(\tilde g_i, G_{-i}, X, \varepsilon_i) \mid X, \varepsilon_i, \sigma] \,\Big|\, X, \sigma \Big). \tag{2.7}
\]
A Bayesian Nash equilibrium $\sigma^*(X) = \{\sigma^*_i(g_i \mid X), g_i \in \mathcal{G}_i, i = 1,\ldots,n\}$ is a belief profile that satisfies
\[
\sigma^*_i(g_i \mid X) = \Pr(G_i = g_i \mid X, \sigma^*(X)) \tag{2.8}
\]
for all link decisions $g_i \in \mathcal{G}_i$ and all $i = 1,\ldots,n$. There may be multiple belief profiles that satisfy (2.8).

3 Optimal Link Choices

The major challenge in estimating the model in Section 2 is that the expected utility of each agent $i$ in (2.4) is nonseparable in her link choices, because the expected utility depends on $G_{ij}G_{ik}$. Solving for the optimal link choices is therefore a nonlinear integer programming problem that does not have a closed-form solution and has a problem size that grows with the number of agents. In this section, we develop an approach that overcomes this challenge and yields an expression for the link choice probability that is computationally convenient and can be used to derive asymptotic properties of parameter estimators.
The idea is to find an auxiliary variable that captures the strategic interactions between $i$'s link choices, so that after inclusion of this auxiliary variable the link choices become binary correlated choices, with the correlation between the link choices controlled by the auxiliary variable.

As a first step we observe that the expected utility from friends in common, i.e., the term $E[v_i(G_j,G_k,X) \mid X,\sigma]$ in (2.6), is symmetric in $j$ and $k$. (An implicit assumption is that $\gamma_1(X_i,X_j,X_k)$ and $\gamma_2(X_i,X_j,X_k)$ are symmetric in $X_j$ and $X_k$.) Later on we will focus on equilibria that are symmetric in individuals' observed characteristics. We say that an equilibrium $\sigma(X)$ is symmetric if for $i$ and $j$ with $X_i = X_j$ we have $\sigma_i(X) = \sigma_j(X)$, where $\sigma_i(X) = \{\sigma_i(g_i \mid X), g_i \in \mathcal{G}_i\}$ denotes the conditional choice probability profile of individual $i$. (Note that $i$ and $j$ then have the same choice probabilities, but with probability 1 different expected utilities due to different random utility shocks.) In social networks, we typically do not observe [...]. A symmetric equilibrium $\sigma$ implies that agents $j$ and $k$ who have the same observed characteristics (i.e., $X_j = X_k$) have the same conditional choice probabilities, i.e., $\sigma_j = \sigma_k$, so $E[v_i(G_j,G_k,X) \mid X,\sigma]$ depends on $j$ and $k$ only through the values of $X_j$ and $X_k$. Therefore $E[v_i(G_j,G_k,X) \mid X,\sigma]$ is a symmetric function of $X_j$ and $X_k$.

To facilitate the presentation, we focus on the case where $X_i$ is discrete. We assume that $X_i$ takes a finite number of values, which are referred to as the types of the individuals. (Potentially we can relax the assumption and allow for continuous $X$, but this will complicate the derivation of the optimal link choices because we need to replace the matrix notation with linear operators. For simplicity, we focus on discrete $X$ in the paper and leave continuous $X$ to future research.)

Assumption 2 (Discrete X) $X_i$ takes $T < \infty$ distinct values $x_1, \ldots, x_T$.

Under Assumption 2, $E[v_i(G_j,G_k,X) \mid X,\sigma]$ takes $T^2$ possible values, depending on the types of $j$ and $k$. For any $s,t = 1,\ldots,T$, let $V_{i,st}(X,\sigma)$ denote the value of $E[v_i(G_j,G_k,X) \mid X,\sigma]$ if $j$ and $k$ are of types $s$ and $t$, respectively,
\[
V_{i,st}(X,\sigma) = E[v_i(G_j,G_k,X) \mid X_j = x_s, X_k = x_t, X, \sigma].
\]
Clearly $V_{i,st}(X,\sigma) = V_{i,ts}(X,\sigma)$. We arrange the type-specific expected utilities of friends in common in a $T \times T$ symmetric matrix
\[
V_i(X,\sigma) = \begin{pmatrix} V_{i,11}(X,\sigma) & \cdots & V_{i,1T}(X,\sigma) \\ \vdots & & \vdots \\ V_{i,T1}(X,\sigma) & \cdots & V_{i,TT}(X,\sigma) \end{pmatrix}. \tag{3.1}
\]
The expected utility in (2.4) can thus be represented as
\[
E[U_i(G_i,G_{-i},X,\varepsilon_i;\theta_u) \mid X,\varepsilon_i,\sigma] = \sum_{j\neq i} G_{ij}\big( U_{ij}(X,\sigma) - \varepsilon_{ij} \big) + \frac{1}{n-2} \sum_{j\neq i} \sum_{k\neq i} G_{ij}G_{ik}\, Z_j' V_i(X,\sigma) Z_k, \tag{3.2}
\]
where $Z_j$ is the $T \times 1$ vector of type indicators of $j$, $Z_j = (1\{X_j = x_1\}, \ldots, 1\{X_j = x_T\})'$, and
\[
U_{ij}(X,\sigma) = E[u_i(G_j,X) \mid X,\sigma] - \frac{1}{n-2}\, Z_j' V_i(X,\sigma) Z_j. \tag{3.3}
\]
The term $Z_j' V_i(X,\sigma) Z_k$ represents the additional expected utility that individual $i$ receives if she links to both $j$ and $k$, and this additional utility depends on $j$ and $k$'s types.

We transform the expected utility in two steps, so that after the transformation the optimal decision can be obtained in closed form. First, since the matrix $V_i(X,\sigma)$ is real and symmetric, it has a real spectral decomposition
\[
V_i(X,\sigma) = \Phi_i(X,\sigma)\, \Lambda_i(X,\sigma)\, \Phi_i(X,\sigma)', \tag{3.4}
\]
where $\Lambda_i(X,\sigma) = \mathrm{diag}(\lambda_{i1}(X,\sigma), \ldots, \lambda_{iT}(X,\sigma))$ is the $T \times T$ diagonal matrix of eigenvalues $\lambda_{it}(X,\sigma) \in \mathbb{R}$, $t = 1,\ldots,T$, and $\Phi_i(X,\sigma) = (\phi_{i1}(X,\sigma), \ldots, \phi_{iT}(X,\sigma))$ is the $T \times T$ orthogonal matrix of eigenvectors $\phi_{it}(X,\sigma) \in \mathbb{R}^T$, $t = 1,\ldots,T$. Using the spectral decomposition we can express the second term in the expected utility in (3.2) in a form that involves only the squares of functions that are linear in the link choices, i.e., $\frac{1}{n-1}\sum_{j\neq i} G_{ij} Z_j'\phi_{it}(X,\sigma)$, $t = 1,\ldots,T$.

In the second step we "linearize" these squares of linear functions using the Legendre transform (Rockafellar, 1970). In particular, for any $Y$, we have the identity
\[
Y^2 = \max_{\omega \in \mathbb{R}} \{ 2Y\omega - \omega^2 \}. \tag{3.5}
\]
By choosing $Y$ as the linear function $\frac{1}{n-1}\sum_{j\neq i} G_{ij} Z_j'\phi_{it}(X,\sigma)$, we can replace the square of this function by the maximization in (3.5). This maximization has an objective function that is linear in $Y$ and therefore also linear in the link choices $G_{ij}$. The linearity will allow us to derive the optimal decision in closed form.
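The identity (3.5) is elementary but worth spelling out: completing the square gives, for any real $Y$ and $\omega$,
\[
2Y\omega - \omega^2 = Y^2 - (Y - \omega)^2 \leq Y^2,
\]
with equality if and only if $\omega = Y$. The maximum over $\omega$ therefore recovers $Y^2$ and is attained at $\omega = Y$, which is how the auxiliary variable ends up tracking a weighted count of the agent's links, as in (5.3) below.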
The transformation of the expected utility is presented in Proposition 3.1.

Proposition 3.1 Suppose that Assumptions 1-2 are satisfied. The expected utility in (2.4) satisfies
\[
\begin{aligned}
E[U_i(G_i,G_{-i},X,\varepsilon_i) \mid X,\varepsilon_i,\sigma]
&= \sum_{j\neq i} G_{ij}\big(U_{ij}(X,\sigma)-\varepsilon_{ij}\big) + \frac{(n-1)^2}{n-2}\sum_{t=1}^{T} \lambda_{it}(X,\sigma)\Big(\frac{1}{n-1}\sum_{j\neq i} G_{ij} Z_j'\phi_{it}(X,\sigma)\Big)^2 \\
&= \sum_{j\neq i} G_{ij}\big(U_{ij}(X,\sigma)-\varepsilon_{ij}\big) + \frac{(n-1)^2}{n-2}\sum_{t=1}^{T} \lambda_{it}(X,\sigma)\, \max_{\omega_t\in\mathbb{R}}\Big\{ 2\Big(\frac{1}{n-1}\sum_{j\neq i} G_{ij} Z_j'\phi_{it}(X,\sigma)\Big)\omega_t - \omega_t^2 \Big\}.
\end{aligned} \tag{3.6}
\]

Proof.
See the Supplemental Appendix.

To derive the optimal decision, recall that it is the link vector $G_i$ that maximizes the expected utility. For the expected utility in (3.6), if the eigenvalues $\lambda_{it}(X,\sigma)$, $t = 1,\ldots,T$, are nonnegative (Assumption 3), we can interchange the maximization over $\omega = (\omega_1,\ldots,\omega_T)'$ and the maximization over $G_i$. Therefore, the optimal $G_i$ is the solution to a simple maximization with an objective function that is linear in $G_i$. If we evaluate the optimal $G_i$ at the optimal $\omega$, we obtain the optimal link decision that maximizes the expected utility.

The solution is particularly simple if we assume the following.

Assumption 3
Given $X$, for all $\theta_u \in \Theta_u$, all equilibria $\sigma$, and $i = 1,\ldots,n$, the smallest eigenvalue of the matrix $V_i(X,\sigma)$ is nonnegative.

In Section 5.1 we show that the closed-form solution, with some modifications, remains valid if some or all of the eigenvalues are negative. Assumption 3 holds if link preferences have a large degree of homophily. If we define the type-specific link probability
\[
p_{st}(X,\sigma) = \Pr(G_{jk} = 1 \mid X_j = x_s, X_k = x_t, X, \sigma),
\]
and assume in (2.3) that $\gamma_1(X_i,X_j,X_k) \equiv \gamma > 0$, i.e., friends in common have positive utility, and $\gamma_2(X_i,X_j,X_k) \equiv 0$, then
\[
V_i(X,\sigma) = \gamma \begin{pmatrix} p_{11}(X,\sigma)^2 & \cdots & p_{1T}(X,\sigma)\,p_{T1}(X,\sigma) \\ \vdots & & \vdots \\ p_{T1}(X,\sigma)\,p_{1T}(X,\sigma) & \cdots & p_{TT}(X,\sigma)^2 \end{pmatrix}, \tag{3.7}
\]
i.e., the $(s,t)$ element of $V_i(X,\sigma)$ is $\gamma\, p_{st}(X,\sigma)\, p_{ts}(X,\sigma)$. A sufficient condition for the eigenvalues to be nonnegative is that the matrix is diagonally dominant, i.e., for all types $t$,
\[
p_{tt}(X,\sigma)^2 \geq \sum_{s \neq t} p_{st}(X,\sigma)\, p_{ts}(X,\sigma).
\]
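As a quick numerical illustration of this sufficient condition, the sketch below builds $V_i$ from a made-up matrix of type-level link probabilities and checks that its eigenvalues are nonnegative; the numbers are ours, not the paper's.

```python
import numpy as np

# Check Assumption 3 for the friends-in-common specification (3.7):
# V = gamma * (p_st * p_ts), built from a T x T matrix p of type-level
# link probabilities p[s, t] = Pr(type-s agent links to type-t agent).
gamma = 0.5
p = np.array([[0.30, 0.10],
              [0.12, 0.25]])
V = gamma * p * p.T                      # (s, t) entry: gamma * p_st * p_ts
eigenvalues = np.linalg.eigvalsh(V)      # V is symmetric by construction
print("eigenvalues:", eigenvalues)
print("Assumption 3 holds:", eigenvalues.min() >= 0)
# Sufficient condition: diagonal dominance, p_tt^2 >= sum_{s != t} p_st * p_ts.
T = len(p)
diag_dom = all(p[t, t]**2 >= sum(p[s, t] * p[t, s] for s in range(T) if s != t)
               for t in range(T))
print("diagonally dominant:", diag_dom)
```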
Our key result on the optimal link choices is as follows.

Theorem 3.2
Suppose that Assumptions 1-3 are satisfied. For each $i$, the optimal decision $G_i(\varepsilon_i,X,\sigma) = (G_{ij}(\varepsilon_i,X,\sigma), j\neq i)' \in \{0,1\}^{n-1}$ is given by
\[
G_{ij}(\varepsilon_i,X,\sigma) = 1\Big\{ U_{ij}(X,\sigma) + \frac{2(n-1)}{n-2}\, Z_j'\Phi_i(X,\sigma)\Lambda_i(X,\sigma)\,\omega_i(\varepsilon_i,X,\sigma) - \varepsilon_{ij} \geq 0 \Big\}, \tag{3.8}
\]
for all $j \neq i$, where the $T \times 1$ vector $\omega_i(\varepsilon_i,X,\sigma) = (\omega_{i1}(\varepsilon_i,X,\sigma), \ldots, \omega_{iT}(\varepsilon_i,X,\sigma))'$ is a solution to the maximization problem
\[
\max_\omega \Pi_i(\omega,\varepsilon_i,X,\sigma) = \max_\omega \sum_{j\neq i} \Big[ U_{ij}(X,\sigma) + \frac{2(n-1)}{n-2}\, Z_j'\Phi_i(X,\sigma)\Lambda_i(X,\sigma)\,\omega - \varepsilon_{ij} \Big]_+ - \frac{(n-1)^2}{n-2}\,\omega'\Lambda_i(X,\sigma)\,\omega, \tag{3.9}
\]
with $[\,\cdot\,]_+ = \max\{\cdot\,, 0\}$. Set $\omega_{it}(\varepsilon_i,X,\sigma) = 0$ if $\lambda_{it}(X,\sigma) = 0$. Moreover, both $G_i(\varepsilon_i,X,\sigma)$ and $\omega_i(\varepsilon_i,X,\sigma)$ are unique almost surely.

Proof.
See the Supplemental Appendix.

To understand the role and interpretation of $\omega_i(\varepsilon_i,X,\sigma)$ we consider the first-order condition of (3.9), derived in the Supplemental Appendix,
\[
\Lambda_i(X,\sigma)\,\omega_i(\varepsilon_i,X,\sigma) = \frac{1}{n-1}\,\Lambda_i(X,\sigma)\,\Phi_i'(X,\sigma) \sum_{k\neq i} G_{ik}(\varepsilon_i,X,\sigma)\, Z_k.
\]
If we multiply both sides of this equation by $\frac{2(n-1)}{n-2} Z_j'\Phi_i(X,\sigma)$, we find
\[
\frac{2(n-1)}{n-2}\, Z_j'\Phi_i(X,\sigma)\Lambda_i(X,\sigma)\,\omega_i(\varepsilon_i,X,\sigma) = \frac{2}{n-2} \sum_{k\neq i} G_{ik}(\varepsilon_i,X,\sigma)\, Z_j' V_i(X,\sigma) Z_k. \tag{3.10}
\]
Note that the left-hand side is the component of the choice index in (3.8) associated with friends in common. The right-hand side is the expected marginal utility (times 2) from friends in common. To see this, note that if $i$ contemplates a link with $j$, then $i$ considers that her friends $k$ can become friends in common with $j$. If $j$ is of type $s$ and $i$'s friend $k$, a potential friend in common, is of type $t$, then the expected utility of $i$ from the friend in common with $j$ is $V_{i,st}(X,\sigma)$. Taking the sum over all friends $k$ of $i$, we obtain the expected utility of friends in common if $i$ links to $j$.
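The following sketch makes the computation in Theorem 3.2 concrete: it maximizes (3.9) over $\omega$ with a derivative-free search and then reads off the link choices from (3.8). It is illustrative only; the paper does not prescribe a particular optimizer, and we simply restart Nelder-Mead from a few points.

```python
import numpy as np
from scipy.optimize import minimize

def solve_omega_and_links(U, eps, Z, V, n):
    """Sketch of Theorem 3.2: maximize (3.9) over omega, then apply (3.8).

    U[j]: U_ij(X, sigma); eps[j]: epsilon_ij; Z: (n-1) x T type indicators
    of the potential partners j; V: T x T matrix V_i(X, sigma) with
    nonnegative eigenvalues (Assumption 3). Not the authors' implementation.
    """
    lam, Phi = np.linalg.eigh(V)            # spectral decomposition (3.4)
    c1 = 2.0 * (n - 1) / (n - 2)
    c2 = (n - 1) ** 2 / (n - 2)

    def Pi(omega):                          # objective in (3.9)
        idx = U + c1 * Z @ Phi @ (lam * omega) - eps
        return np.maximum(idx, 0.0).sum() - c2 * omega @ (lam * omega)

    # derivative-free search from a few starts; T (number of types) is small
    best = None
    for start in (np.zeros(len(lam)), np.ones(len(lam)), -np.ones(len(lam))):
        res = minimize(lambda w: -Pi(w), start, method="Nelder-Mead")
        if best is None or -res.fun > -best.fun:
            best = res
    omega = best.x                          # entries with lam == 0 are irrelevant
    links = (U + c1 * Z @ Phi @ (lam * omega) - eps >= 0).astype(int)  # (3.8)
    return omega, links
```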
Corollary 3.3 below suggests that the optimal decision $G_i(\varepsilon_i,X,\sigma)$ resembles the pure-strategy Nash equilibrium in an entry game (Ciliberto and Tamer, 2009; Tamer, 2003) where the entry decisions are the link choices.

Corollary 3.3 Suppose that Assumptions 1-3 are satisfied. For each $i$, the optimal decision $G_i(\varepsilon_i,X,\sigma)$ satisfies
\[
G_{ij}(\varepsilon_i,X,\sigma) = 1\Big\{ U_{ij}(X,\sigma) + \frac{2}{n-2} \sum_{k\neq i} G_{ik}(\varepsilon_i,X,\sigma)\, Z_j' V_i(X,\sigma) Z_k \geq \varepsilon_{ij} \Big\}, \tag{3.11}
\]
for all $j\neq i$, and
\[
\sum_{j\neq i} G_{ij}(\varepsilon_i,X,\sigma)\Big( U_{ij}(X,\sigma) + \frac{1}{n-2}\sum_{k\neq i} G_{ik}(\varepsilon_i,X,\sigma)\, Z_j' V_i(X,\sigma) Z_k - \varepsilon_{ij} \Big) \geq \max_{\tilde g_i \text{ satisfies } (3.11)} \sum_{j\neq i} \tilde g_{ij}\Big( U_{ij}(X,\sigma) + \frac{1}{n-2}\sum_{k\neq i} \tilde g_{ik}\, Z_j' V_i(X,\sigma) Z_k - \varepsilon_{ij} \Big) \tag{3.12}
\]
with probability 1, where $\tilde g_i \in \{0,1\}^{n-1}$. Moreover, for each $g_i$, if we substitute $G_i(\varepsilon_i,X,\sigma) = g_i$ in (3.11) and (3.12) and define the set
\[
E(g_i,X,\sigma) = \big\{ \varepsilon_i \in \mathbb{R}^{n-1} : g_i \text{ satisfies both (3.11) and (3.12)} \big\}, \tag{3.13}
\]
then $E(g_i,X,\sigma)$, $g_i \in \{0,1\}^{n-1}$, is a partition of $\mathbb{R}^{n-1}$ with probability 1.

Proof.
See the Supplemental Appendix.

From the corollary, the optimal decision $G_i(\varepsilon_i,X,\sigma)$ is a solution to the simultaneous discrete choice model in (3.11), where an optimal link choice is determined by a random utility binary choice model and the latent utility includes the expected utility of friends in common as in (3.10). This makes the model in (3.11) similar to the model for an entry game (Ciliberto and Tamer, 2009; Tamer, 2003) if the potential entrants are the $n-1$ agents that $i$ can link to. In this model the strategic interactions occur because the link utility depends on friends in common.

The system in (3.11) can have multiple solutions. Because $i$ chooses links that maximize her expected utility, we have a natural equilibrium selection mechanism. That is, among the solutions to system (3.11), $i$ chooses the $G_i(\varepsilon_i,X,\sigma)$ that gives the highest expected utility, as stated in (3.12). The set $E(g_i,X,\sigma)$ defined in (3.13) is the collection of $\varepsilon_i \in \mathbb{R}^{n-1}$ such that $g_i$ is the optimal solution, i.e., $G_i(\varepsilon_i,X,\sigma) = g_i$. By Theorem 3.2 there is a unique optimal $G_i(\varepsilon_i,X,\sigma)$ that satisfies both (3.11) and (3.12) with probability 1. Therefore, the sets $E(g_i,X,\sigma)$, $g_i \in \{0,1\}^{n-1}$, form a partition of the support of $\varepsilon_i$. These results are useful for establishing the properties of the conditional choice probabilities in Section 4.

The auxiliary variable $\omega_i(\varepsilon_i,X,\sigma)$ provides an explicit expression for the dependence of the link choices of an agent. Note that $\omega_i(\varepsilon_i,X,\sigma)$ is an optimal solution to the problem in (3.9), with an objective function that depends on $\varepsilon_i$, so the maximizer $\omega_i(\varepsilon_i,X,\sigma)$ is a function of $\varepsilon_i$. Under Assumption 1, two optimal link choices $G_{ij}$ and $G_{ik}$ are dependent because (i) they both depend on $\omega_i(\varepsilon_i,X,\sigma)$, as shown in (3.8), and (ii) $\omega_i(\varepsilon_i,X,\sigma)$ in $G_{ij}$ is correlated with the utility shock $\varepsilon_{ik}$ for $G_{ik}$, and symmetrically $\omega_i(\varepsilon_i,X,\sigma)$ in $G_{ik}$ is correlated with the utility shock $\varepsilon_{ij}$ for $G_{ij}$. This explicit characterization of the link dependence allows us to examine the dependence if $n$ is large, a crucial step in the asymptotic analysis in Section 4.

If the matrix $V_i(X,\sigma)$ is singular, the $\omega_{it}(\varepsilon_i,X,\sigma)$ that correspond to the zero eigenvalues $\lambda_{it}(X,\sigma) = 0$ are indeterminate. Since it is $\Lambda_i(X,\sigma)\omega_i(\varepsilon_i,X,\sigma)$ that enters (3.8), the indeterminate $\omega_{it}(\varepsilon_i,X,\sigma)$ are irrelevant for the optimal decision. For that reason we can arbitrarily set $\omega_{it}(\varepsilon_i,X,\sigma) = 0$ if $\lambda_{it}(X,\sigma) = 0$.

In the special case that friends in common have no effect, i.e., $\gamma_1, \gamma_2 \equiv 0$, the matrix $V_i(X,\sigma) \equiv 0$, so all the eigenvalues are equal to 0. In this case, the optimal link choice in (3.8) reduces to
\[
G_{ij} = 1\{ E[u_i(G_j,X) \mid X,\sigma] - \varepsilon_{ij} \geq 0 \}, \quad \forall j \neq i.
\]
This is the optimal link choice problem for a utility specification that is separable in $i$'s own links (Leung, 2015).

We focus on symmetric equilibria as discussed earlier. Applying Corollary 3.3, we can show that there exists a symmetric equilibrium. There may be multiple symmetric equilibria that satisfy (2.8). We assume that the observed equilibrium is symmetric.

Proposition 3.4
Suppose that Assumptions 1-3 are satisfied. For any $X$, there exists a symmetric equilibrium $\sigma(X)$.

Proof.
See the Supplemental Appendix.

4 Estimation
In this section, we discuss how to estimate the structural parameter $\theta \in \mathbb{R}^{d_\theta}$. We propose a two-step procedure, where we estimate the conditional link probabilities nonparametrically in the first step, and estimate the parameter $\theta$ in the second step. When we analyze the properties of this estimator, a few complications arise. First, the model can have multiple equilibria. Second, the data are links in a single large network, where the links formed by an individual are dependent due to the preference for friends in common. We will discuss how these complications affect the estimation, and how we overcome them when we derive the properties of the estimator of $\theta$.

Let us start with the data generating process. In this paper, we consider the scenario where we observe links from a single network, and in the asymptotic analysis we assume that the number of nodes of the network $n$ increases to infinity. To highlight the dependence of the network $G$ on $n$ we denote the network as $G_n$. We think of the data as being generated by the following process. First, we draw a vector $X = (X_1', \ldots, X_n')'$ from a joint discrete distribution, where $X_i$ represents the observed characteristics of individual $i$. The characteristics need not be independent across individuals. Because $X$ is ancillary, we treat $X$ as fixed. Second, for each $i$ we draw an $(n-1)$-vector of utility shocks $\varepsilon_i$ that are independent across individuals. Third, individuals form links that maximize their expected utility that depends on the equilibrium $\sigma_n$. There can be multiple equilibria, and nature selects one equilibrium $\sigma_n$ among the fixed points in (2.8). We can think of $\sigma_n$ as having a distribution over all the equilibria, i.e., the fixed points of (2.8). We condition on $\sigma_n$ in addition to $X$ to indicate that a particular equilibrium is selected. (If we observe more than one network, we estimate the link probabilities in each network separately in the first step. In the second step we pool the links from the networks to estimate $\theta$.)

The optimal link choices depend on $\sigma_n$ only through the link probabilities of each pair, so we can replace $\sigma_n$ by the vector of link probabilities $p_n = (p_{n,ij},\ i,j = 1,\ldots,n,\ i\neq j)$. The optimal link choice $G_{n,ij}(\varepsilon_i,X,\theta,p_n)$ in (3.8) implies the structural choice probability
\[
P_{n,ij}(X,\theta,p_n) = \Pr(G_{n,ij}(\varepsilon_i,X,\theta,p_n) = 1 \mid X, p_n). \tag{4.1}
\]
A Bayesian Nash equilibrium in (2.8) is a $p_n$ such that for all $i\neq j$,
\[
p_{n,ij} = P_{n,ij}(X,\theta,p_n). \tag{4.2}
\]
Because of the symmetric equilibrium and the discrete $X$, each $p_{n,ij}$ depends on $i$ and $j$ only through their types. Therefore the link choice only depends on $p_{n,st}$, $s,t = 1,\ldots,T$, the link probabilities between types. With abuse of notation we let $p_n = (p_{n,st},\ s,t = 1,\ldots,T)$.

The equilibrium condition in (4.2) suggests an estimator of $\theta$ with the following two steps. In the first step, we estimate each $p_{n,st}$ by the relative frequency of links between pairs of types $s$ and $t$,
\[
\hat p_{n,st} = \frac{\sum_i \sum_{j\neq i} G_{n,ij}\, 1\{X_i = x_s, X_j = x_t\}}{\sum_i \sum_{j\neq i} 1\{X_i = x_s, X_j = x_t\}}, \quad s,t = 1,\ldots,T. \tag{4.3}
\]
In the second step, we estimate $\theta$ based on the moment condition (from now on we omit $X$ in $P_{n,ij}(X,\theta,p_n)$)
\[
\hat\Psi_n(\theta, p_n) = \frac{1}{n(n-1)} \sum_i \sum_{j\neq i} \hat W_{n,ij}\, \big(G_{n,ij} - P_{n,ij}(\theta, p_n)\big), \tag{4.4}
\]
where $\hat W_{n,ij} \in \mathbb{R}^{d_\theta}$, $i,j = 1,\ldots,n$, is a $d_\theta \times 1$ vector of instruments that depend on $X$ and $\hat p_n$. The estimator $\hat\theta_n$ is a solution to the equation
\[
\hat\Psi_n(\hat\theta_n, \hat p_n) = 0. \tag{4.5}
\]
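A minimal sketch of the first step (4.3), computing the type-pair link frequencies from the observed network; the variable names are ours.

```python
import numpy as np

def first_step_probs(G, types, T):
    """First-step estimator (4.3): relative link frequencies by type pair.

    G: n x n 0/1 adjacency matrix (zero diagonal); types[i] in {0, ..., T-1}.
    Returns p_hat with p_hat[s, t] = fraction of ordered (i, j) pairs of
    types (s, t) that are linked. Illustrative, not the authors' code.
    """
    n = G.shape[0]
    links = np.zeros((T, T))
    pairs = np.zeros((T, T))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            s, t = types[i], types[j]
            pairs[s, t] += 1
            links[s, t] += G[i, j]
    return links / np.maximum(pairs, 1)   # guard against empty type pairs
```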
The population moment function is
\[
\Psi_n(\theta, p_n) = \frac{1}{n(n-1)} \sum_i \sum_{j\neq i} W_{n,ij}\, \big(E[G_{n,ij} \mid X, p_n] - P_{n,ij}(\theta, p_n)\big), \tag{4.6}
\]
where $W_{n,ij} \in \mathbb{R}^{d_\theta}$, $i,j = 1,\ldots,n$, is the $d_\theta \times 1$ population counterpart of $\hat W_{n,ij}$. Let $\theta_0$ denote the true value of $\theta$. Because $E[G_{n,ij} \mid X, p_n] = P_{n,ij}(\theta_0, p_n)$, $\theta_0$ satisfies the population moment condition $\Psi_n(\theta_0, p_n) = 0$.

Our model can have multiple equilibria. Nevertheless, we do not need to specify an equilibrium selection mechanism, which in our model is the selection of a particular $p_n$. (We do need to assume that the equilibrium selection does not depend on the random utility shocks $\varepsilon$.) This is because in our two-step estimation procedure we estimate $p_n$ in the first step and substitute the estimate in the moment condition to estimate the utility function parameters. Because we consider a single network instead of multiple networks, we do not need an assumption on equilibrium selection across networks either. Under the additional Assumption 4, $\hat\theta_n$ is a consistent estimator of $\theta_0$.

Assumption 4 (i) The parameter $\theta$ lies in a compact set $\Theta \subseteq \mathbb{R}^{d_\theta}$. (ii) For an equilibrium $p_n$ and for all $n$, the system of equations $\Psi_n(\theta, p_n) = 0$ has a unique solution $\theta_0$. (Because of multiple equilibria we do not assume the global identification condition that $\Psi_n(\theta, p) = 0$ has a unique solution $(\theta_0, p_n)$.) (iii) The instruments $\hat W_{n,ij}$ and their population counterparts $W_{n,ij}$, $i,j = 1,\ldots,n$, satisfy $\max_{i,j=1,\ldots,n} \| \hat W_{n,ij} - W_{n,ij} \| \overset{p}{\to} 0$ and $\max_{i,j=1,\ldots,n} \| W_{n,ij} \| < \infty$. (iv) For $s,t = 1,\ldots,T$, $\lim_{n\to\infty} \frac{1}{n(n-1)} \sum_i \sum_{j\neq i} 1\{X_i = x_s, X_j = x_t\}$ exists and is strictly positive.

Assumption 4(iv) imposes a mild restriction that the fraction of pairs of all types $s$ and $t$ is positive as $n \to \infty$, so that the number of pairs of all types grows without bounds, and we can estimate the link probabilities $p_{n,st}$ consistently. This assumption is satisfied if $X_i$, $i = 1,\ldots,n$, are i.i.d. or have limited dependence.

Assumption 4(ii) is a local identification condition, local because it is an identification condition for $\theta$ for a given equilibrium $p_n$. It requires that the solution $\theta_0$ is invariant to the equilibrium $p_n$ selected, for all $n$.

Theorem 4.1 (Consistency)
If Assumptions 1-4 are satisfied, then $\hat\theta_n - \theta_0 \overset{p}{\to} 0$.

Proof.
See the Supplemental Appendix.
We first show that $\hat p_n$ is consistent for $p_n$ and next establish the consistency of $\hat\theta_n$. In addition to the identification condition, we need a uniform LLN for the sample moment $\hat\Psi_n(\theta, p_n)$. Note that links formed by different individuals in a single network are independent given $X$ and $p_n$. This is crucial for a LLN to hold. Moreover, because link choices depend on the number of agents in the network, the distribution of the data on the links in a network with $n$ nodes depends on $n$. Therefore, we prove the uniform LLN for a triangular array (Pollard, 1990).

Next we examine the asymptotic distribution of $\hat\theta_n$. The complication is that the links formed by an individual are correlated. By Theorem 3.2, the link choices $G_{n,ij}$ and $G_{n,ik}$ of individual $i$ are correlated because they both depend on the auxiliary variable $\omega_{ni}(\varepsilon_i,\theta_0,p_n)$ that maximizes the objective function
\[
\Pi_{ni}(\omega,\varepsilon_i,\theta_0,p_n) = \sum_{j\neq i} \Big[ U_{n,ij}(\theta_0,p_n) + \frac{2(n-1)}{n-2}\, Z_j'\Phi_{ni}(\theta_0,p_n)\Lambda_{ni}(\theta_0,p_n)\,\omega - \varepsilon_{ij} \Big]_+ - \frac{(n-1)^2}{n-2}\,\omega'\Lambda_{ni}(\theta_0,p_n)\,\omega, \tag{4.7}
\]
where we add the subscript $n$ to $\Pi_i$, $U_{ij}$, $\Lambda_i$, and $\Phi_i$ to indicate their dependence on $n$. To derive the asymptotic distribution of $\hat\theta_n$, we first derive the asymptotic properties of $\omega_{ni}(\varepsilon_i,\theta_0,p_n)$, and then investigate how $\omega_{ni}(\varepsilon_i,\theta_0,p_n)$ affects the asymptotic distribution of $\hat\theta_n$.

In particular, let $\Pi^*_{ni}(\omega,\theta_0,p_n)$ be the conditional expectation of $\Pi_{ni}(\omega,\varepsilon_i,\theta_0,p_n)$ given $X$ and $p_n$,
\[
\Pi^*_{ni}(\omega,\theta_0,p_n) = \sum_{j\neq i} E\Big[ \Big[ U_{n,ij}(\theta_0,p_n) + \frac{2(n-1)}{n-2}\, Z_j'\Phi_{ni}(\theta_0,p_n)\Lambda_{ni}(\theta_0,p_n)\,\omega - \varepsilon_{ij} \Big]_+ \,\Big|\, X, p_n \Big] - \frac{(n-1)^2}{n-2}\,\omega'\Lambda_{ni}(\theta_0,p_n)\,\omega. \tag{4.8}
\]
We make the following assumptions on the auxiliary variable.

Assumption 5 (i) The auxiliary variable $\omega$ is in a compact set $\Omega \subseteq \mathbb{R}^T$, which contains a compact neighborhood of 0. (ii) The function $\Pi^*_{ni}(\omega,\theta_0,p_n)$ has a unique maximizer $\omega^*_{ni}(\theta_0,p_n)$. (iii) The gradient $\Gamma^*_{ni}(\omega,\theta_0,p_n)$ of $\Pi^*_{ni}(\omega,\theta_0,p_n)$ has the Jacobian matrix
\[
\nabla_{\omega'}\Gamma^*_{ni}(\omega,\theta_0,p_n) = \Bigg( \frac{2}{n-2} \sum_{j\neq i} f_\varepsilon\Big( U_{n,ij} + \frac{2(n-1)}{n-2}\, Z_j'\Phi_{ni}\Lambda_{ni}\omega \Big)\, \Lambda_{ni}\Phi_{ni}' Z_j Z_j'\Phi_{ni} - I_T \Bigg) \Lambda_{ni},
\]
with $I_T$ being the $T \times T$ identity matrix, where the $T \times T$ matrix in parentheses,
\[
\frac{2}{n-2} \sum_{j\neq i} f_\varepsilon\Big( U_{n,ij} + \frac{2(n-1)}{n-2}\, Z_j'\Phi_{ni}\Lambda_{ni}\omega \Big)\, \Lambda_{ni}\Phi_{ni}' Z_j Z_j'\Phi_{ni} - I_T,
\]
is nonsingular at $\omega^*_{ni}(\theta_0,p_n)$.

Under Assumption 5, we show in the Supplemental Appendix that $\omega_{ni}(\varepsilon_i,\theta_0,p_n)$ is consistent for $\omega^*_{ni}(\theta_0,p_n)$ and has an asymptotically linear representation
\[
\omega_{ni}(\varepsilon_i,\theta_0,p_n) - \omega^*_{ni}(\theta_0,p_n) = \frac{1}{n-1} \sum_{j\neq i} \varphi^\omega_{n,ij}(\omega^*_{ni}(\theta_0,p_n),\varepsilon_{ij},\theta_0,p_n) + o_p\Big(\frac{1}{\sqrt n}\Big). \tag{4.9}
\]
In this representation $\varphi^\omega_{n,ij}(\omega,\varepsilon_{ij},\theta_0,p_n) \in \mathbb{R}^T$ is the influence function
\[
\varphi^\omega_{n,ij}(\omega,\varepsilon_{ij},\theta_0,p_n) = -\big(\nabla_{\omega'}\Gamma^*_{ni}(\omega,\theta_0,p_n)\big)^{-1}\, \varphi^\pi_{n,ij}(\omega,\varepsilon_{ij},\theta_0,p_n),
\]
where $\varphi^\pi_{n,ij}(\omega,\varepsilon_{ij},\theta_0,p_n) \in \mathbb{R}^T$ is defined by
\[
\varphi^\pi_{n,ij}(\omega,\varepsilon_{ij},\theta_0,p_n) = 1\Big\{ U_{n,ij} + \frac{2(n-1)}{n-2}\, Z_j'\Phi_{ni}\Lambda_{ni}\omega - \varepsilon_{ij} \geq 0 \Big\}\, \Lambda_{ni}\Phi_{ni}' Z_j - \Lambda_{ni}\,\omega.
\]
Since $\omega^*_{ni}(\theta_0,p_n)$ is deterministic, the convergence of $\omega_{ni}(\varepsilon_i,\theta_0,p_n)$ to $\omega^*_{ni}(\theta_0,p_n)$ indicates that the correlation between links $G_{n,ij}$ and $G_{n,ik}$ vanishes as $n$ approaches infinity. Moreover, the asymptotically linear representation in (4.9) implies that $\omega_{ni}(\varepsilon_i,\theta_0,p_n)$ converges to $\omega^*_{ni}(\theta_0,p_n)$ at the rate of $n^{-1/2}$.
This rate is crucial in deriving the asymptotic distribution of the estimator.

Under the additional conditions in Assumption 6, we derive the asymptotic distribution of $\hat\theta_n$ in Theorem 4.2. We suppress the dependence of $U_{n,ij}$, $\Lambda_{ni}$, and $\Phi_{ni}$ on $\theta_0$ and $p_n$ hereafter.

Assumption 6 (i) For any $i,j = 1,\ldots,n$, $P_{n,ij}(\theta,p)$ is continuously differentiable with respect to $\theta$ and $p$ in a neighborhood of $(\theta_0, p_n)$. (ii) The $d_\theta \times d_\theta$ Jacobian matrix with respect to $\theta$,
\[
J^\theta_n(\theta_0,p_n) = \frac{1}{n(n-1)} \sum_i \sum_{j\neq i} W_{n,ij}\, \nabla_{\theta'} P_{n,ij}(\theta_0,p_n),
\]
is nonsingular.

In the Supplemental Appendix (see Lemma S.4) we show that $P_{n,ij}(\theta,p)$ is continuous in $\theta$ (and $p$). Continuity follows from Corollary 3.3, which shows that there is a 1-1 relationship between the optimal link decision $g_i$ and the sets $E(g_i)$ that partition $\mathbb{R}^{n-1}$. The boundaries of the partition sets are continuous, but can have kinks if the set of inequalities in (3.12) that are binding depends on $\theta$. We assume that there is a possibly small neighborhood of $\theta_0$ without kinks, so that the choice probability is continuously differentiable in that neighborhood. Because we already established consistency, the estimator is in that neighborhood with probability approaching 1.

Theorem 4.2 (Asymptotic Distribution)
Suppose that Assumptions 1-6 are satisfied. Define the $d_\theta \times d_\theta$ matrix
\[
\Sigma_n(\theta_0,p_n) = \frac{1}{n(n-1)} \sum_i \sum_{j\neq i} J^\theta_n(\theta_0,p_n)^{-1}\, E\big[ \varphi^m_{n,ij}(\varepsilon_{ij},\theta_0,p_n)\,\varphi^m_{n,ij}(\varepsilon_{ij},\theta_0,p_n)' \,\big|\, X, p_n \big]\, \big(J^\theta_n(\theta_0,p_n)^{-1}\big)', \tag{4.10}
\]
with $\varphi^m_{n,ij}(\varepsilon_{ij},\theta_0,p_n) \in \mathbb{R}^{d_\theta}$ given by
\[
\varphi^m_{n,ij}(\varepsilon_{ij},\theta_0,p_n) = \tilde W_{n,ij}(\theta_0,p_n)\big( g_{n,ij}(\omega^*_{ni}(\theta_0,p_n),\varepsilon_{ij},\theta_0,p_n) - P^*_{n,ij}(\omega^*_{ni}(\theta_0,p_n),\theta_0,p_n) \big) + \tilde J^\omega_{ni}(\omega^*_{ni}(\theta_0,p_n),\theta_0,p_n)\,\varphi^\omega_{n,ij}(\omega^*_{ni}(\theta_0,p_n),\varepsilon_{ij},\theta_0,p_n) \tag{4.11}
\]
for all $i\neq j$, where $\omega^*_{ni}(\theta_0,p_n)$ is the maximizer of $\Pi^*_{ni}(\omega,\theta_0,p_n)$. In this expression $\tilde W_{n,ij}(\theta_0,p_n) \in \mathbb{R}^{d_\theta}$ is a $d_\theta \times 1$ vector of augmented instruments that include the contribution of the first-step estimates,
\[
\tilde W_{n,ij}(\theta_0,p_n) = W_{n,ij} - \Bigg( \frac{1}{n(n-1)} \sum_k \sum_{l\neq k} W_{n,kl}\, \nabla_{p'} P_{n,kl}(\theta_0,p_n) \Bigg) Q_{n,ij}, \tag{4.12}
\]
with $Q_{n,ij} = [Q_{n,ij,11}, \ldots, Q_{n,ij,1T}, \ldots, Q_{n,ij,T1}, \ldots, Q_{n,ij,TT}]' \in \mathbb{R}^{T^2}$ and
\[
Q_{n,ij,st} = \frac{1\{X_i = x_s, X_j = x_t\}}{\frac{1}{n(n-1)}\sum_i \sum_{j\neq i} 1\{X_i = x_s, X_j = x_t\}}, \quad s,t = 1,\ldots,T.
\]
Further, $g_{n,ij}(\omega,\varepsilon_{ij},\theta_0,p_n)$ is the link choice indicator for a given $\omega$,
\[
g_{n,ij}(\omega,\varepsilon_{ij},\theta_0,p_n) = 1\Big\{ U_{n,ij} + \frac{2(n-1)}{n-2}\, Z_j'\Phi_{ni}\Lambda_{ni}\,\omega \geq \varepsilon_{ij} \Big\},
\]
and $P^*_{n,ij}(\omega,\theta_0,p_n)$ is the probability of a link given $\omega$,
\[
P^*_{n,ij}(\omega,\theta_0,p_n) = F_\varepsilon\Big( U_{n,ij} + \frac{2(n-1)}{n-2}\, Z_j'\Phi_{ni}\Lambda_{ni}\,\omega \Big).
\]
Also, $\tilde J^\omega_{ni}(\omega,\theta_0,p_n)$ is the $d_\theta \times T$ Jacobian matrix of the moment function with link probabilities $P^*_{n,ij}(\omega,\theta_0,p_n)$ with respect to $\omega$,
\[
\tilde J^\omega_{ni}(\omega,\theta_0,p_n) = \frac{1}{n-1} \sum_{j\neq i} \tilde W_{n,ij}(\theta_0,p_n)\, \nabla_{\omega'} P^*_{n,ij}(\omega,\theta_0,p_n),
\]
and finally $\varphi^\omega_{n,ij}(\omega,\varepsilon_{ij},\theta_0,p_n) \in \mathbb{R}^T$ is the influence function given in (4.9). Then
\[
\sqrt{n(n-1)}\; \Sigma_n^{-1/2}(\theta_0,p_n)\, \big( \hat\theta_n - \theta_0 \big) \overset{d}{\to} N(0, I_{d_\theta})
\]
as $n \to \infty$, where $I_{d_\theta}$ is the $d_\theta \times d_\theta$ identity matrix.

Proof.
See the Supplemental Appendix.

The influence function $\varphi^m_{n,ij}$ has two components, where the first captures the variability in the link choices. Note that $\varphi^m_{n,ij}$, $i,j = 1,\ldots,n$, is independent over $j$. The dependence between the link choices is through the auxiliary variable $\omega_{ni}(\varepsilon_i,\theta_0,p_n)$, and this contributes to the second component of $\varphi^m_{n,ij}$.

In applications, we need to choose the instrument $\hat W_{n,ij}$. One option is to use the instrument from the quasi maximum likelihood estimation (QMLE). Let $L_n(\theta,\hat p_n)$ be the single-link log likelihood function evaluated at the first-step estimate $\hat p_n$,
\[
L_n(\theta,\hat p_n) = \sum_i \sum_{j\neq i} G_{n,ij} \ln P_{n,ij}(\theta,\hat p_n) + (1 - G_{n,ij}) \ln\big(1 - P_{n,ij}(\theta,\hat p_n)\big). \tag{4.13}
\]
This is not the full-information likelihood, which requires a specification of the equilibrium selection mechanism. Because link choices are correlated, there is also information on $\theta$ in the joint distribution of pairs of the link choices $G_{n,ij}$ and $G_{n,ik}$ (see also Section 5.2) that is not captured in $L_n(\theta,\hat p_n)$.

Taking the derivative with respect to $\theta$ we obtain the quasi-likelihood equation
\[
\frac{1}{n(n-1)} \sum_i \sum_{j\neq i} \frac{\nabla_\theta P_{n,ij}(\theta,\hat p_n)}{P_{n,ij}(\theta,\hat p_n)\big(1 - P_{n,ij}(\theta,\hat p_n)\big)}\, \big(G_{n,ij} - P_{n,ij}(\theta,\hat p_n)\big) = 0.
\]
Comparing this with the moment in (4.4), the instrument is
\[
\hat W_{n,ij}(\theta) = \frac{\nabla_\theta P_{n,ij}(\theta,\hat p_n)}{P_{n,ij}(\theta,\hat p_n)\big(1 - P_{n,ij}(\theta,\hat p_n)\big)}, \quad i,j = 1,\ldots,n,\ i\neq j. \tag{4.14}
\]
This instrument depends on $\theta$. Therefore, we need a preliminary estimator of $\theta$. For that purpose we can use powers and interactions of $X_i$ and $X_j$ as the $d_\theta$ instruments in the moment condition. A preliminary estimate is not needed if we use continuous updating as in Hansen, Heaton, and Yaron (1996).

In the discussion of Assumption 6(i) we noted that $P_{n,ij}(\theta,p)$ is not differentiable for a finite number of values of $\theta$. That may be a problem for the instrument, because it has to be evaluated outside a small neighborhood of $\theta_0$ during the search for the solution of the moment equation. We can avoid this problem if we use the limiting choice probability derived in Section 5.3 in the instrument instead of the finite-$n$ probability. The limiting choice probability is differentiable in $\theta$ everywhere. We will come back to this after we discuss the limiting game.

The link choice probability $P_{n,ij}(\theta,p)$ is an $(n-1)$-dimensional integral that we compute by simulation. We draw $\varepsilon_i$ independently $R$ times (and use these simulated utility shocks throughout the search for a solution of the moment equation), and for each simulated $\varepsilon_{i,r}$, $r = 1,\ldots,R$, we compute the auxiliary variable $\omega_{ni}(\varepsilon_{i,r},\theta,\hat p_n)$ and the vector of link choices $G_{ni}(\varepsilon_{i,r},\theta,\hat p_n)$ in (3.8). The vector of simulated link choice probabilities is the sample average of $G_{n,ij}(\varepsilon_{i,r},\theta,\hat p_n)$, $r = 1,\ldots,R$. This simulation procedure does not affect the consistency, rate of convergence, and asymptotic normality of the estimator. The simulated GMM has an asymptotic variance equal to that in Theorem 4.2 multiplied by $1 + R^{-1}$ (Pakes and Pollard (1989)).
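A sketch of this simulation step, reusing solve_omega_and_links from the sketch after Theorem 3.2. The standard normal shocks match the Monte Carlo design in Section 6; as the text prescribes, the $R$ draws are made once and held fixed during the search over $\theta$.

```python
import numpy as np

def simulated_link_probs(U, Z, V, n, eps_draws, solve_omega_and_links):
    """Approximate P_n,ij by the fraction of simulated shock vectors for
    which the optimal decision (3.8) forms the link.

    eps_draws: R x (n-1) array of shocks, e.g. rng.standard_normal((R, n-1)),
    drawn once and reused across theta. Illustrative, not the authors' code.
    """
    probs = np.zeros(U.shape[0])
    for eps in eps_draws:
        _, links = solve_omega_and_links(U, eps, Z, V, n)
        probs += links
    return probs / len(eps_draws)
```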
5 Extensions

5.1 Indefinite $V_i$

Assumption 3, that the matrix $V_i(X,\sigma)$ is positive semi-definite, is not crucial to our approach. Without this assumption, the auxiliary variable is solved from a maximin problem.

Theorem 5.1
Suppose that Assumptions 1-2 are satisfied. The optimal link decision $G_i(\varepsilon_i,X,\sigma) = (G_{ij}(\varepsilon_i,X,\sigma), j\neq i)'$ is given by
\[
G_{ij}(\varepsilon_i,X,\sigma) = 1\Big\{ U_{ij}(X,\sigma) + \frac{2(n-1)}{n-2}\, Z_j'\Phi_i(X,\sigma)\Lambda_i(X,\sigma)\,\omega_i(\varepsilon_i,X,\sigma) - \varepsilon_{ij} \geq 0 \Big\} \tag{5.1}
\]
almost surely, $\forall j\neq i$, where the $T \times 1$ vector $\omega_i(\varepsilon_i,X,\sigma)$ is a solution to the maximin problem
\[
\max_{(\omega_t,\, t\in\mathcal{T}_{i+})}\ \min_{(\omega_t,\, t\in\mathcal{T}_{i-})} \Pi_i(\omega,\varepsilon_i,X,\sigma) = \max_{(\omega_t,\, t\in\mathcal{T}_{i+})}\ \min_{(\omega_t,\, t\in\mathcal{T}_{i-})} \sum_{j\neq i} \Big[ U_{ij}(X,\sigma) + \frac{2(n-1)}{n-2}\, Z_j'\Phi_i(X,\sigma)\Lambda_i(X,\sigma)\,\omega - \varepsilon_{ij} \Big]_+ - \frac{(n-1)^2}{n-2}\,\omega'\Lambda_i(X,\sigma)\,\omega, \tag{5.2}
\]
with $\mathcal{T}_{i+} = \{t : \lambda_{it}(X,\sigma) > 0\}$ and $\mathcal{T}_{i-} = \{t : \lambda_{it}(X,\sigma) < 0\}$. We set $\omega_{it}(\varepsilon_i,X,\sigma) = 0$ if $\lambda_{it}(X,\sigma) = 0$. Moreover, both $G_i(\varepsilon_i,X,\sigma)$ and $\omega_i(\varepsilon_i,X,\sigma)$ are unique almost surely.

Proof.
See the Supplemental Appendix.

Note that the expected utility in (3.6) is separable in the maximizations over $\omega_t$, $t = 1,\ldots,T$, so that a maximization over $\omega_t$ becomes a minimization if $\lambda_{it}(X,\sigma) < 0$. If $\lambda_{it}(X,\sigma) = 0$, the objective function does not depend on $\omega_t$ and we set $\omega_{it}(\varepsilon_i,X,\sigma) = 0$. The separability also implies that, unlike in a general maximin or minimax problem, the order of the maximizations and minimizations does not matter.

To gain some intuition regarding the role of the eigenvalues of $V_i(X,\sigma)$ we consider the case with 2 types ($T = 2$) and a utility specification as in (2.1)-(2.3) with $\gamma_1$ a positive constant and $\gamma_2 = 0$. We omit the arguments $X$ and $\sigma$. The matrix $V_i$ is as in (3.7) and has nonnegative components. Suppose $V_{i,12} > 0$. Let $\lambda_{i1}$ and $\lambda_{i2}$ be the eigenvalues of $V_i$ with $\lambda_{i1} \geq \lambda_{i2}$. We can see that $\lambda_{i1} > 0$. Assume that $\lambda_{i2} \neq 0$, i.e., $V_{i,11}V_{i,22} \neq V_{i,12}^2$. Let $\phi_{i1} = (\phi_{i1,1}, \phi_{i1,2})'$ and $\phi_{i2} = (\phi_{i2,1}, \phi_{i2,2})'$ be the corresponding eigenvectors. It can be shown that the elements of $\phi_{i1}$ have the same sign, and the elements of $\phi_{i2}$ have opposite signs, i.e., $\phi_{i1,1}\phi_{i1,2} > 0$ and $\phi_{i2,1}\phi_{i2,2} < 0$. Without loss of generality we take $\phi_{i1,1}, \phi_{i1,2} > 0$, and $\phi_{i2,1} > 0 > \phi_{i2,2}$. The optimal $\omega_i$ satisfies
\[
\omega_{it} = \phi_{it}'\, \frac{1}{n-1}\sum_{j\neq i} G_{ij} Z_j, \quad \text{a.s.,} \quad t = 1, 2. \tag{5.3}
\]
Note that $\sum_{j\neq i} G_{ij} Z_j$ is the $2 \times 1$ vector of the numbers of friends $i$ has of each type. Therefore, $\omega_{it}$ is a weighted sum of the numbers of friends of each type, with weights equal to the components of the eigenvector $\phi_{it}$. By $V_i = \Phi_i\Lambda_i\Phi_i'$ and the first-order condition in (5.3), the expected utility in (3.2) evaluated at the optimal $\omega_i$ can be expressed as
\[
E[U_i \mid X,\varepsilon_i,\sigma] = \sum_{j\neq i} G_{ij}(U_{ij} - \varepsilon_{ij}) + \frac{(n-1)^2}{n-2}\Big(\Phi_i'\,\frac{1}{n-1}\sum_{j\neq i} G_{ij}Z_j\Big)' \Lambda_i \Big(\Phi_i'\,\frac{1}{n-1}\sum_{k\neq i} G_{ik}Z_k\Big) = \sum_{j\neq i} G_{ij}(U_{ij} - \varepsilon_{ij}) + \frac{(n-1)^2}{n-2}\,\omega_i'\Lambda_i\,\omega_i, \quad \text{a.s.}
\]
Because $\lambda_{i1} > 0$, individual $i$ prefers a larger $\omega_{i1}$, which is a preference for many friends, with friends of the type that corresponds to the larger of $\phi_{i1,1}$ and $\phi_{i1,2}$ being preferred. If $\lambda_{i2} > 0$, $i$ prefers a larger $\omega_{i2}$ (in absolute value), i.e., she prefers her circle of friends to be of one type. If $\lambda_{i2} < 0$, $i$ prefers an $\omega_{i2}$ closer to 0, which is a preference for an integrated circle of friends.

(The eigenvalues are given by $\lambda_{i1}, \lambda_{i2} = \frac{1}{2}\big( V_{i,11} + V_{i,22} \pm \sqrt{ V_{i,11}^2 + V_{i,22}^2 + 4V_{i,12}^2 - 2V_{i,11}V_{i,22} }\, \big)$. Since $V_{i,12} > 0$, they satisfy $\lambda_{i1} > \max\{V_{i,11}, V_{i,22}\} \geq 0$ and $\lambda_{i2} < \min\{V_{i,11}, V_{i,22}\}$. By definition $V_i\phi_{i1} = \lambda_{i1}\phi_{i1}$, so $(\lambda_{i1} - V_{i,11})\phi_{i1,1} = V_{i,12}\phi_{i1,2}$ and $V_{i,12}\phi_{i1,1} = (\lambda_{i1} - V_{i,22})\phi_{i1,2}$. Since $\lambda_{i1} > \max\{V_{i,11}, V_{i,22}\}$ and $V_{i,12} > 0$, these equations imply that $\phi_{i1,1}$ and $\phi_{i1,2}$ must have the same sign, i.e., $\phi_{i1,1}\phi_{i1,2} > 0$. Similarly, we can show $\phi_{i2,1}\phi_{i2,2} < 0$.)
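A quick numerical check of these sign claims, with made-up values for $V_i$:

```python
import numpy as np

# Made-up 2-type example with V12 > 0. eigh sorts eigenvalues in
# ascending order, so column 1 holds the leading eigenvector phi_1.
V = np.array([[0.06, 0.04],
              [0.04, 0.02]])
lam, Phi = np.linalg.eigh(V)
print("lambda_1 =", lam[1], "> max diag:", lam[1] > V.diagonal().max())
print("lambda_2 =", lam[0], "< min diag:", lam[0] < V.diagonal().min())
print("phi_1 components share a sign:", Phi[0, 1] * Phi[1, 1] > 0)
print("phi_2 components differ in sign:", Phi[0, 0] * Phi[1, 0] < 0)
```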
In the special case that $V_{i,11} = V_{i,22} = 0$, i.e., only agents of the opposite type link, the two eigenvalues are $\lambda_{i1} = V_{i,12}$ and $\lambda_{i2} = -V_{i,12}$, and the corresponding eigenvectors are $\phi_{i1} = (\tfrac{1}{\sqrt 2}, \tfrac{1}{\sqrt 2})'$ and $\phi_{i2} = (\tfrac{1}{\sqrt 2}, -\tfrac{1}{\sqrt 2})'$. In this case,
\[
\omega_{i1} = \frac{1}{\sqrt 2}\,\frac{1}{n-1}\sum_{j\neq i} G_{ij}(Z_{j1} + Z_{j2}), \qquad \omega_{i2} = \frac{1}{\sqrt 2}\,\frac{1}{n-1}\sum_{j\neq i} G_{ij}(Z_{j1} - Z_{j2}).
\]
Intuitively, if a network only allows for cross-type links, an agent has the most friends in common if she makes as many friends as she can (i.e., prefers $\omega_{i1}$ to be large) and chooses an equal number of friends of each type (i.e., prefers $\omega_{i2}$ to be close to 0).

5.2 Undirected Links

In this section we show that our method, which was derived for directed networks, can also be applied to undirected networks. Let $G_{ij}$ now denote an undirected link between $i$ and $j$ and $G$ the adjacency matrix of the undirected network. In an undirected network $G_{ij} = G_{ji}$. To model the formation of undirected links, we follow the link-announcement framework (Jackson, 2008) and require mutual consent to form a link. Specifically, let $S_{ij}$ indicate whether $i$ proposes to link to $j$. A link is formed if both $i$ and $j$ propose to form it, so $G_{ij} = S_{ij}S_{ji}$. Our approach in Section 3 can be extended to undirected networks if we work with the proposals instead of the links. Because we observe the links but not the proposals, the estimation of the parameters is less straightforward. In this section we show that the extension is possible, but we leave the development of the extension to future research.

We consider the utility specification in (2.1), with $G_{ij}$ an undirected link. In (2.2) we omit the reciprocity effect, and in (2.3) $k$ is a mutual friend of $i$ and $j$ if $j$ and $k$ have an undirected link, so that
\[
u_i(G_j,X;\beta) = \beta_0 + X_i'\beta_1 + |X_i - X_j|'\beta_2 + \frac{1}{n-2}\sum_{k\neq i,j} G_{jk}\,\beta_4(X_i,X_j,X_k)
\]
and
\[
v_i(G_j,G_k,X;\gamma) = G_{jk}\,\gamma_1(X_i,X_j,X_k) + \frac{1}{n-2}\sum_{l\neq i,j,k} G_{jl}G_{kl}\,\gamma_2(X_i,X_j,X_k).
\]
Since $G_{ij} = S_{ij}S_{ji}$, if $S$ is the $n \times n$ matrix of proposed links, then $G$ is a function of $S$, and we have $G = G(S) = G(S_i, S_{-i})$, with $S_i$ the vector of link proposals of $i$ and $S_{-i}$ the matrix of link proposals of the other agents. We maintain the assumption that $\varepsilon_i$ is private information of agent $i$, so each agent $i$ forms a belief about the proposals of the other agents, $S_{-i}$, when choosing $S_i$. Given $X$, let $\sigma_i(s_i \mid X)$ be the conditional probability that agent $i$ proposes $s_i$ given $X$, and let $\sigma(X) = \{\sigma_i(s_i \mid X), s_i \in \{0,1\}^{n-1}, i = 1,\ldots,n\}$ be the belief profile. For a belief profile $\sigma$, the expected utility of agent $i$ is given by
\[
E[U_i(G(S_i,S_{-i}),X,\varepsilon_i) \mid X,\varepsilon_i,\sigma] = \sum_{j\neq i} S_{ij}\Big( E[S_{ji}u_i(G_j,X) \mid X,\sigma] - E[S_{ji} \mid X,\sigma]\,\varepsilon_{ij} + \frac{1}{n-2}\sum_{k\neq i,j} S_{ik}\, E[S_{ji}S_{ki}v_i(G_j,G_k,X) \mid X,\sigma] \Big), \tag{5.4}
\]
where
\[
E[S_{ji}u_i(G_j,X) \mid X,\sigma] = E[S_{ji} \mid X,\sigma]\big( \beta_0 + X_i'\beta_1 + |X_i - X_j|'\beta_2 \big) + \frac{1}{n-2}\sum_{k\neq i,j} E[S_{ji}S_{jk} \mid X,\sigma]\, E[S_{kj} \mid X,\sigma]\,\beta_4(X_i,X_j,X_k)
\]
and
\[
E[S_{ji}S_{ki}v_i(G_j,G_k,X) \mid X,\sigma] = E[S_{ji}S_{jk} \mid X,\sigma]\, E[S_{ki}S_{kj} \mid X,\sigma]\,\gamma_1(X_i,X_j,X_k) + \frac{1}{n-2}\sum_{l\neq i,j,k} E[S_{ji}S_{jl} \mid X,\sigma]\, E[S_{ki}S_{kl} \mid X,\sigma]\, E[S_{lj}S_{lk} \mid X,\sigma]\,\gamma_2(X_i,X_j,X_k).
\]
In the derivation we have used that $G_{ij} = S_{ij}S_{ji}$ and that $S_i$ and $S_j$ are independent given $X$ and $\sigma$.
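A toy illustration of the mutual-consent rule $G_{ij} = S_{ij}S_{ji}$ used above, with made-up proposals:

```python
import numpy as np

# Mutual consent: an undirected link forms iff both proposals are made.
S = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 1, 0]])
G = S * S.T   # elementwise product gives the undirected network
print(G)      # symmetric by construction; here agents 0-1 and 0-2 link
```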
Note that the expected utility depends on the probability that an agent proposes a link to two other agents simultaneously, as in $E[S_{ji}S_{jk} \mid X,\sigma]$, with $S_{ji}$ and $S_{jk}$ dependent.

Just as in Sections 3 and 5.1, we linearize the quadratic part of the expected utility in (5.4) using the Legendre transform. The linearized (in $S_i$) expected utility gives the optimal link proposals in closed form. We maintain Assumption 2, so that $X_i$ takes $T$ values. For $s,t = 1,\ldots,T$, we define $V^u_{i,st}(X,\sigma)$ as
\[
V^u_{i,st}(X,\sigma) = E[S_{ji}S_{ki}v_i(G_j,G_k,X) \mid X_j = x_s, X_k = x_t, X, \sigma].
\]
$V^u_{i,st}(X,\sigma)$ is the expected utility of friends in common if $i$ proposes to link to both $j$ (of type $s$) and $k$ (of type $t$). The superscript $u$ indicates an undirected network. Note that because the expected value is symmetric in $j$ and $k$, $V^u_{i,st}(X,\sigma)$ is symmetric in $s$ and $t$. Let $V^u_i(X,\sigma)$ denote the symmetric $T \times T$ matrix with components $V^u_{i,st}(X,\sigma)$, $s,t = 1,\ldots,T$. Let $\lambda^u_{it}(X,\sigma)$, $t = 1,\ldots,T$, be the eigenvalues of the matrix $V^u_i(X,\sigma)$ and $\phi^u_{it}(X,\sigma)$, $t = 1,\ldots,T$, the corresponding eigenvectors. Further, $\Lambda^u_i(X,\sigma) = \mathrm{diag}(\lambda^u_{i1}(X,\sigma), \ldots, \lambda^u_{iT}(X,\sigma))$ and $\Phi^u_i(X,\sigma)$ is the matrix of eigenvectors.

The optimal proposal decision can be represented as a set of related binary choices.
Corollary 5.2
Suppose that Assumptions 1-2 are satisfied. The optimal proposal decision $S_i(\varepsilon_i,X,\sigma) = (S_{ij}(\varepsilon_i,X,\sigma), j\neq i)'$ is given by
\[
S_{ij}(\varepsilon_i,X,\sigma) = 1\Big\{ U^u_{ij}(X,\sigma) + \frac{2(n-1)}{n-2}\, Z_j'\Phi^u_i(X,\sigma)\Lambda^u_i(X,\sigma)\,\omega^u_i(\varepsilon_i,X,\sigma) - \sigma_{ji}\,\varepsilon_{ij} \geq 0 \Big\} \tag{5.5}
\]
almost surely, for all $j\neq i$, where the $T \times 1$ vector $\omega^u_i(\varepsilon_i,X,\sigma) = (\omega^u_{it}(\varepsilon_i,X,\sigma), t = 1,\ldots,T)'$ is a solution to the maximin problem
\[
\max_{(\omega_t)_{t\in\mathcal{T}_{i+}}}\ \min_{(\omega_t)_{t\in\mathcal{T}_{i-}}} \Pi^u_i(\omega;\varepsilon_i,X,\sigma) = \max_{(\omega_t)_{t\in\mathcal{T}_{i+}}}\ \min_{(\omega_t)_{t\in\mathcal{T}_{i-}}} \sum_{j\neq i} \Big[ U^u_{ij}(X,\sigma) + \frac{2(n-1)}{n-2}\, Z_j'\Phi^u_i(X,\sigma)\Lambda^u_i(X,\sigma)\,\omega - \sigma_{ji}\,\varepsilon_{ij} \Big]_+ - \frac{(n-1)^2}{n-2}\,\omega'\Lambda^u_i(X,\sigma)\,\omega,
\]
with
\[
U^u_{ij}(X,\sigma) = E[S_{ji}u_i(G_j,X) \mid X,\sigma] - \frac{1}{n-2}\, Z_j' V^u_i(X,\sigma) Z_j, \qquad \sigma_{ji} = E[S_{ji} \mid X,\sigma],
\]
$\mathcal{T}_{i+} = \{t : \lambda^u_{it}(X,\sigma) > 0\}$, and $\mathcal{T}_{i-} = \{t : \lambda^u_{it}(X,\sigma) < 0\}$. We set $\omega^u_{it}(\varepsilon_i,X,\sigma) = 0$ if $\lambda^u_{it}(X,\sigma) = 0$. Moreover, both $S_i(\varepsilon_i,X,\sigma)$ and $\omega^u_i(\varepsilon_i,X,\sigma)$ are unique almost surely.

The directed and undirected cases are quite similar. Essentially only $U_{ij}$ and $V_i$ are replaced by $U^u_{ij}$ and $V^u_i$. The two-step estimation must be adapted because we observe the links, not the proposals. We leave the full development of our estimator for undirected networks to future research.

5.3 The Limiting Game

In this section, we investigate the limit of the network formation game when the number of agents $n$ grows large. We show that the link formation probability in the finite-$n$ game converges to a limit as $n$ approaches infinity. The limiting probability is in some aspects simpler than the finite-$n$ probability. Because the limiting auxiliary variable does not depend on $\varepsilon_i$, we do not need simulation to compute the link choice probability. The limiting link choice probability is also everywhere differentiable in the parameters.

By Theorem 3.2 the probability that $i$ forms a link to $j$ conditional on the characteristic profile $X$ and the equilibrium $\sigma$ is
\[
P_{n,ij}(X,\sigma) = \Pr(G_{n,ij}(\varepsilon_i,X,\sigma) = 1 \mid X,\sigma) = \Pr\Big( U_{n,ij}(X,\sigma) + \frac{2(n-1)}{n-2}\, Z_j'\Phi_{ni}(X,\sigma)\Lambda_{ni}(X,\sigma)\,\omega_{ni}(\varepsilon_i,X,\sigma) - \varepsilon_{ij} \geq 0 \,\Big|\, X,\sigma \Big). \tag{5.6}
\]
Note that we added a subscript $n$ to $U_{ij}$ and $V_i$ to emphasize the dependence on the number of agents in the network.

Until now we have avoided an assumption on how the matrix of individual characteristics $X$ is generated by conditioning on $X$. For the convergence of (5.6) it is convenient to assume that $X_i$, $i = 1,\ldots,n$, are i.i.d. and that the utility specification is such that $U_{n,ij}(X,\sigma)$ and $V_{ni}(X,\sigma)$ converge to limits $U_{ij}(X_i,X_j,\sigma)$ and $V_i(X_i,\sigma)$ as $n\to\infty$, as formally stated in Assumption 7 and verified below for the utility specified in Section 2. (An equilibrium $\sigma$ in general depends on the entire $X$. In this section, our focus is to approximate the finite-$n$ choice probability at a particular $\sigma$, so we can treat $\sigma$ as a fixed matrix and ignore its dependence on $X$. Any link probability in the expected utility can be viewed as a function of the characteristics of the involved agents only.) Under the assumptions, the link formation probability $P_{n,ij}(X,\sigma)$ converges to
\[
P_{ij}(X_i,X_j,\sigma) = \Pr\big( U_{ij}(X_i,X_j,\sigma) + 2 Z_j'\Phi_i(X_i,\sigma)\Lambda_i(X_i,\sigma)\,\omega_i(X_i,\sigma) - \varepsilon_{ij} \geq 0 \mid X_i, X_j, \sigma \big) \tag{5.7}
\]
as $n\to\infty$, where $\Lambda_i(X_i,\sigma)$ and $\Phi_i(X_i,\sigma)$ are the eigenvalue and eigenvector matrices of $V_i(X_i,\sigma)$, and $\omega_i(X_i,\sigma)$ solves
\[
\max_\omega \Pi_i(\omega,X_i,\sigma) = \max_\omega E\Big( \big[ U_{ij}(X_i,X_j,\sigma) + 2 Z_j'\Phi_i(X_i,\sigma)\Lambda_i(X_i,\sigma)\,\omega - \varepsilon_{ij} \big]_+ \,\Big|\, X_i, \sigma \Big) - \omega'\Lambda_i(X_i,\sigma)\,\omega. \tag{5.8}
\]
The expectation in (5.8) is taken with respect to $X_j$ and $\varepsilon_{ij}$.

Assumption 7 (i) $X_i$, $i = 1,\ldots,n$, are i.i.d.
(ii) For $U_{n,ij}(X,\sigma)$ and $V_{ni}(X,\sigma)$ defined in (3.3) and (3.1), there exist $U_{ij}(X_i,X_j,\sigma)$ and $V_i(X_i,\sigma)$ such that for any $i$ and any $X_i$ and $X_j$, $\max_{j\neq i} |U_{n,ij}(X,\sigma) - U_{ij}(X_i,X_j,\sigma)| = o_p(1)$ and $V_{ni}(X,\sigma) - V_i(X_i,\sigma) = o_p(1)$. (iii) For any $X_i$ and $\sigma$, $\Pi_i(\omega,X_i,\sigma)$ defined in (5.8) has a unique maximizer $\omega_i(X_i,\sigma)$.

An implication of Assumption 7(i) is that in the limit of the choice probability we average over all $X_k$, $k \neq i,j$, so that the limiting choice probability only depends on $X_i$ and $X_j$. (Assumption 7(i) could be relaxed to allow for non-i.i.d. $X_i$, but we do not pursue this here.) Assumption 7(ii) is an assumption on the utility function that we verify below. Assumption 7(iii) ensures that $\omega_i(X_i,\sigma)$ is well-defined.

Theorem 5.3
Under Assumptions 1-3 and 7, we have that for all $X_i$ and $X_j$ and all $\sigma$,
\[
P_{n,ij}(X,\sigma) - P_{ij}(X_i,X_j,\sigma) = o_p(1) \tag{5.9}
\]
as $n \to \infty$.

Proof.
See the Supplemental Appendix.

We refer to $P_{ij}(X_i,X_j,\sigma)$ defined in (5.7) as the limiting choice probability. It is the choice probability in the limiting game with a continuum of players, where each $i$ forms a link with $j$ following the limiting strategy
\[
G_{ij} = 1\big\{ U_{ij}(X_i,X_j,\sigma) + 2 Z_j'\Phi_i(X_i,\sigma)\Lambda_i(X_i,\sigma)\,\omega_i(X_i,\sigma) - \varepsilon_{ij} \geq 0 \big\}, \quad \forall j \neq i.
\]
The $\omega_i(X_i,\sigma)$ in the limiting strategy captures the expected utility of friends in common. With the inclusion of $\omega_i(X_i,\sigma)$ the optimal strategy of individual $i$ is to myopically choose to form a link as in a binary choice problem.
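For standard normal shocks (as in the Monte Carlo design of Section 6), the expectation in (5.8) has the closed form $E[a - \varepsilon]_+ = a\Phi(a) + \phi(a)$ with $\Phi$ and $\phi$ the standard normal cdf and pdf, so the limiting auxiliary variable can be computed without simulation. A sketch under these assumptions, with inputs supplied by the caller:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def limiting_omega(U_bar, type_probs, V):
    """Solve (5.8) for omega_i(X_i, sigma) when eps ~ N(0, 1).

    U_bar[t]: limiting U_ij when partner j has type t;
    type_probs[t]: Pr(X_j = x_t); V: limiting T x T matrix V_i(X_i, sigma).
    Illustrative sketch only, not the authors' implementation.
    """
    lam, Phi = np.linalg.eigh(V)

    def Pi(omega):
        a = U_bar + 2.0 * Phi @ (lam * omega)     # choice index by partner type
        e_plus = a * norm.cdf(a) + norm.pdf(a)    # E[max(a - eps, 0)]
        return type_probs @ e_plus - omega @ (lam * omega)

    res = minimize(lambda w: -Pi(w), np.zeros(len(lam)), method="Nelder-Mead")
    return res.x
```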
We now verify Assumption 7(ii) for the utility function in Section 2.

Example 5.1 Consider the expected utility in (2.5)-(2.6). Under the assumption that the equilibrium is symmetric, we can denote $\sigma(X_j,X_k) = E[G_{jk} \mid X,\sigma]$, where the order of the arguments indicates that it is the probability that $j$ links to $k$. Therefore
\[
U_{n,ij}(X,\sigma) = \beta_0 + X_i'\beta_1 + |X_i - X_j|'\beta_2 + \sigma(X_j,X_i)\,\beta_3 + \frac{1}{n-2}\sum_{k\neq i,j} \sigma(X_j,X_k)\,\beta_4(X_i,X_j,X_k) - \frac{1}{n-2}\, Z_j' V_{ni}(X,\sigma) Z_j.
\]
By the law of large numbers the limit is
\[
U_{ij}(X_i,X_j,\sigma) = \beta_0 + X_i'\beta_1 + |X_i - X_j|'\beta_2 + \sigma(X_j,X_i)\,\beta_3 + E[\sigma(X_j,X_k)\,\beta_4(X_i,X_j,X_k)],
\]
where the expectation is over $X_k$. Moreover, $V_{ni}(X,\sigma) = (V_{ni,st}(X,\sigma),\ s,t = 1,\ldots,T)$ with
\[
V_{ni,st}(X,\sigma) = \sigma(x_s,x_t)\,\sigma(x_t,x_s)\,\gamma_1(X_i,x_s,x_t) + \frac{1}{n-2}\sum_{l\neq i,j,k} \sigma(x_s,X_l)\,\sigma(x_t,X_l)\,\gamma_2(X_i,x_s,x_t).
\]
By the law of large numbers the limit of $V_{ni,st}(X,\sigma)$ is
\[
V_{i,st}(X_i,\sigma) = \sigma(x_s,x_t)\,\sigma(x_t,x_s)\,\gamma_1(X_i,x_s,x_t) + E[\sigma(x_s,X_l)\,\sigma(x_t,X_l)\,\gamma_2(X_i,x_s,x_t)],
\]
where the expectation is over $X_l$. A detailed proof can be found in the Supplemental Appendix.

In Section 4 we proposed a two-step GMM estimator for the parameters of the utility function. That estimator requires an instrument in the second stage, and we suggested the instrument derived from the quasi likelihood as in (4.14). This instrument involves the derivative of the choice probability, which is not everywhere differentiable. Because the limiting choice probability is everywhere differentiable, we can use its derivative in (4.14). The instruments based on the finite-$n$ and limiting choice probabilities should behave similarly asymptotically.

A further simplification occurs if in the moment condition in (4.4) we replace the finite-$n$ choice probability by the limiting choice probability, which is simpler to compute since it does not require simulation. The formal theory that justifies the use of the limiting model for finite-$n$ networks requires additional results. First, the convergence of the choice probabilities has to be uniform (over links), not just pointwise as established here. An additional complication is the presence of multiple equilibria. Unlike estimation based on the finite-$n$ choice probability in Section 4, which requires no assumption on equilibrium selection, for estimation based on the limiting choice probability to be consistent we need to assume that the sequence of equilibrium selection mechanisms converges to a limit, similar in spirit to the convergence of equilibria in Menzel (2016).

In the next section we provide simulation evidence on the use of the limiting choice probability in the instrument and/or the moment function.

6 Simulation Study

In this section, we conduct a simulation study of the two-step estimator proposed in Section 4 for a range of network sizes. We consider three cases: (i) second-step moments for the finite-$n$ game, with instruments derived from the finite-$n$ link probabilities; (ii) second-step moments for the finite-$n$ game, but instruments derived from the limiting link probabilities; (iii) second-step moments for the limiting game, with instruments derived from the limiting link probabilities.
Monte Carlo Simulations

In this section, we conduct a simulation study of the two-step estimator proposed in Section 4 for a range of network sizes. We consider three cases: (i) second-step moments for the finite-$n$ game, with instruments derived from the finite-$n$ link probabilities; (ii) second-step moments for the finite-$n$ game, but instruments derived from the limiting link probabilities; (iii) second-step moments for the limiting game, with instruments derived from the limiting link probabilities. For cases (i) and (ii) we established consistency in Section 4. We did not show consistency for case (iii), but our results are suggestive that the estimator is also consistent. It is computationally the easiest case.

In the simulation study we consider the utility specification

$$U_i(G, X, \varepsilon_i) = \sum_{j \neq i} G_{ij}\Big(\beta_0 + X_i\beta_1 + |X_i - X_j|\beta_2 + \frac{1}{n-2}\sum_{k \neq i,j} G_{jk}\beta_3 + \frac{1}{n-2}\sum_{k \neq i,j} G_{ik}G_{jk}G_{kj}\gamma - \varepsilon_{ij}\Big),$$

where the $X_i$ are i.i.d. binary variables with equal probability of being 0 or 1, and the $\varepsilon_{ij}$ are i.i.d. standard normal $N(0,1)$. The true parameter values are $(\beta_0, \beta_1, \beta_2, \beta_3, \gamma) = (-1, 1, -1, 1, 1)$. For each of six network sizes, ranging from $n = 10$ to networks with $n > 100$, we generate the links in a single directed network as follows. First, we compute a Bayesian Nash equilibrium by solving (2.8) for $\sigma^*(X)$, iterating that equation from a starting value; as the initial value we use an equilibrium in the limiting game, computed by iterating the limiting version of (2.8) in which the finite-$n$ choice probability on the right-hand side is replaced by the limiting one. Second, using the equilibrium choice probabilities to compute $U_{n,ij}(X,\sigma)$ and $V_{ni}(X,\sigma)$, we generate the links by (3.8) after we calculate $\omega_{ni}(\varepsilon_i)$ for the simulated $\varepsilon_i$. For small networks, generating the links by maximizing the expected utility in (2.4) with respect to $G_i$ by quadratic integer programming (QIP) is computationally competitive; in our simulation study we use QIP for $n \le 100$ and (3.8) for $n > 100$. Each experiment is repeated 100 times. We report the means and standard errors of the estimated parameters.

Table 1 reports the two-step GMM estimates for case (i). The finite-$n$ link probability in (4.4) is computed by simulation: we repeatedly draw $\varepsilon_i$, solve for $\omega_{ni}(\varepsilon_i)$, and substitute that solution in (3.8) to obtain the optimal link choices for that $\varepsilon_i$. The link probability is then approximated by the fraction of draws that result in a link. For the instrument we choose the instrument derived from the QMLE in (4.14). The GMM estimator for $\theta$ is found by continuous updating. The instrument is also calculated by simulation, using draws of $\varepsilon_i$ that are independent of those used to simulate the choice probabilities in the moment function, and the derivative in the numerator is approximated by a numerical derivative. Because the sample moment function is not everywhere differentiable, we use a derivative-free optimization solver (fminsearch in MATLAB) when searching for the estimate of $\theta$.

Table 1: Two-Step GMM Estimates Using the Finite-$n$ Game (Instruments from the Finite-$n$ Game). Note: Mean estimates and standard errors of $(\beta_0, \beta_1, \beta_2, \beta_3, \gamma)$ from 100 repeated samples using the finite-$n$ game, with the GMM instruments simulated independently from the finite-$n$ game. The choice probabilities are computed from 500 simulations, by either solving quadratic integer programming (for $n \le 100$) or using (3.8) (for $n > 100$).

The results in Table 1 show that the two-step GMM based on the finite-$n$ game performs well. The mean estimates are close to the true values even for network sizes as small as $n = 25$. The standard errors also decrease as the network size increases, as expected.
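The simulation of the finite-$n$ link probabilities can be sketched as follows (Python; hypothetical helper names, using the finite-$n$ scaling constants $2(n-1)/(n-2)$ and $1/(n-1)$ from Section 3). The sketch searches for $\omega_{ni}(\varepsilon_i)$ by iterating the first-order condition; this simple iteration is not guaranteed to reach the global maximizer of $\Pi_i$, which in principle must be checked against the objective in (3.9).

```python
import numpy as np

def solve_omega(U_i, Z, Phi, Lam, eps_i, n_iter=200):
    """Fixed-point iteration for the auxiliary variable omega_ni(eps_i).

    Iterates omega -> (1/(n-1)) Phi' Z' g(omega), where g(omega) are the
    binary link choices implied by omega. A sketch only: it assumes the
    iteration reaches the global solution, which is not guaranteed."""
    n_minus_1, T = Z.shape[0], Lam.shape[0]
    c = 2.0 * n_minus_1 / (n_minus_1 - 1)        # 2(n-1)/(n-2)
    omega = np.zeros(T)
    for _ in range(n_iter):
        g = (U_i + c * Z @ (Phi @ Lam @ omega) - eps_i >= 0).astype(float)
        omega = Phi.T @ (Z.T @ g) / n_minus_1
    return omega, g

def simulated_link_prob(U_i, Z, Phi, Lam, n_draws=500, seed=0):
    """Approximate P_{n,ij} by the frequency of G_ij = 1 across eps draws."""
    rng = np.random.default_rng(seed)
    links = np.zeros(Z.shape[0])
    for _ in range(n_draws):
        eps_i = rng.standard_normal(Z.shape[0])
        _, g = solve_omega(U_i, Z, Phi, Lam, eps_i)
        links += g
    return links / n_draws
```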
Table 2 presents the results for case (ii), with the finite-$n$ choice probability in the moment function in (4.4), but with the finite-$n$ choice probability and its derivative in the instrument in (4.14) replaced by the limiting choice probability and its derivative. The limiting choice probability and its derivative need not be computed by simulation, and the choice probability is differentiable everywhere with respect to $\theta$.

Table 2: Two-Step GMM Estimates Using the Finite-$n$ Game (Instruments from the Limiting Game). Note: Mean estimates and standard errors from 100 repeated samples using the finite-$n$ game, with the GMM instruments calculated from the limiting game. The finite-$n$ choice probabilities are computed from 500 simulations, by either solving quadratic integer programming (for $n \le 100$) or using (3.8) (for $n > 100$).

We find that for small networks (e.g., $n = 10$) the estimator is slightly biased. The standard errors are larger than those in case (i). For larger networks, the standard errors in cases (i) and (ii) are close, suggesting that in larger networks the computationally convenient limiting game can be used without sacrificing precision.

In Table 3 we consider case (iii), where in both the moment function and the instrument we use the limiting choice probabilities. This yields a moment condition that is equal to the first-order condition of the QMLE based on the limiting game.
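A minimal sketch of this limiting-game moment condition is shown below (Python; the interface is hypothetical, not the paper's code). It assumes callables returning the limiting probabilities $P_{ij}(\theta)$ and their derivatives, and uses the quasi-likelihood weighting, derivative over binomial variance, in the spirit of (4.14); the probabilities are assumed to lie strictly inside $(0,1)$.

```python
import numpy as np

def qmle_moment(theta, G, ccp, dccp):
    """Sample moment for case (iii): probabilities and instruments
    both from the limiting game (a sketch under assumed interfaces).

    ccp(theta)  -> (n, n) matrix of limiting probabilities P_ij(theta)
    dccp(theta) -> (n, n, k) array of derivatives dP_ij/dtheta
    """
    P, dP = ccp(theta), dccp(theta)
    W = dP / (P * (1.0 - P))[..., None]      # quasi-likelihood instrument
    resid = (G - P)[..., None]
    mask = ~np.eye(G.shape[0], dtype=bool)   # drop the i = j diagonal
    return (W * resid)[mask].mean(axis=0)    # k-vector the solver drives to 0
```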
Table 3: Two-Step QMLE Estimates Using the Limiting Game. Note: Mean estimates and standard errors from 100 repeated samples using the limiting game, equivalent to GMM estimates with both the moment function and the instruments calculated based on the limiting game.

This case is computationally convenient because no simulation is needed. The results show that the estimates are off in small networks. However, the bias disappears as the network size grows. This suggests that the estimator solved from the limiting moment condition is consistent. The standard errors in Table 3 are larger than those in Tables 1-2.

In sum, the simulation results suggest that the two-step estimation procedure based on the finite-$n$ game gives good estimates of the parameters even in relatively small networks. For large networks, estimators based on the limiting game are as good as those based on the finite-$n$ game. This is encouraging, given that the limiting choice probabilities are much easier to compute than the finite-$n$ ones.
Conclusions

In this paper, we develop a new method for the estimation of network formation games using data from a single large network. We consider a network formation game with incomplete information, where the utility of an agent can be nonseparable in her own link choices, to accommodate the utility from friends in common. We propose a new approach in which the optimal link decision of each agent reduces to a set of binary link choices once an auxiliary variable is included. Based on this representation we analyze the dependence between the link choices of an agent and its effect on the estimation of the parameters of the utility function. We propose a two-step estimation procedure in which we estimate the link choice probabilities nonparametrically in the first step and the utility function parameters in the second step. This two-step procedure requires weak assumptions about equilibrium selection, is simple to compute, and provides consistent and asymptotically normal estimators for the parameters that account for the link dependence.

Some extensions of our approach are discussed in Section 5. We consider an unrestricted expected utility of friends in common, undirected networks, and the limit of the link choice probabilities. Another important extension is to relax the i.i.d. assumption on the unobservables and introduce an individual effect, similar to that in Graham (2017). This creates stronger and non-vanishing link dependence within an agent. Ridder and Sheng (2017) discuss estimation and inference in this case.

The Legendre transform may be useful in other applications where an agent chooses between a large number of overlapping alternatives that exhibit strategic complementarity. Such models are challenging to analyze because the optimal decision of an agent generally does not have a closed form. Our approach may facilitate the econometric analysis of these models.
References
Andrews, D. W. (1994): "Empirical Process Methods in Econometrics," in Handbook of Econometrics, ed. by R. F. Engle and D. L. McFadden, vol. IV, chap. 37, pp. 2247-2294. Elsevier Science.

Boucheron, S., G. Lugosi, and P. Massart (2013): Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press.

Ciliberto, F., and E. Tamer (2009): "Market Structure and Multiple Equilibria in Airline Markets," Econometrica, 77(6), 1791-1828.

de Paula, A., S. Richards-Shubik, and E. Tamer (2018): "Identifying Preferences in Networks With Bounded Degree," Econometrica, 86(1), 263-288.

Graham, B. S. (2017): "An Econometric Model of Network Formation With Degree Heterogeneity," Econometrica, 85(4), 1033-1063.

Hansen, L. P., J. Heaton, and A. Yaron (1996): "Finite-Sample Properties of Some Alternative GMM Estimators," Journal of Business & Economic Statistics, 14(3), 262-280.

Jackson, M. O. (2008): Social and Economic Networks. Princeton University Press.

Jennrich, R. I. (1969): "Asymptotic Properties of Non-Linear Least Squares Estimators," The Annals of Mathematical Statistics, 40(2), 633-643.

Leung, M. P. (2015): "Two-Step Estimation of Network-Formation Models with Incomplete Information," Journal of Econometrics, 188(1), 182-195.

Menzel, K. (2016): "Inference for Games with Many Players," The Review of Economic Studies, 83(1), 306-337.

Menzel, K. (2017): "Strategic Network Formation with Many Players," NYU Working Paper.

Miyauchi, Y. (2016): "Structural Estimation of Pairwise Stable Networks with Nonnegative Externality," Journal of Econometrics, 195(2), 224-235.

Newey, W. K., and D. McFadden (1994): "Large Sample Estimation and Hypothesis Testing," in Handbook of Econometrics, ed. by R. F. Engle and D. L. McFadden, vol. IV, chap. 36, pp. 2111-2245. Elsevier Science.

Pollard, D. (1990): Empirical Processes: Theory and Applications, vol. 2 of NSF-CBMS Regional Conference Series in Probability and Statistics. Institute of Mathematical Statistics.

Ridder, G., and S. Sheng (2017): "Estimation of Social Interactions in Endogenous and Strategically Formed Networks," UCLA Working Paper.

Rockafellar, R. T. (1970): Convex Analysis. Princeton University Press.

Sheng, S. (2018): "A Structural Econometric Analysis of Network Formation Games Through Subnetworks," Econometrica (forthcoming).

Tamer, E. (2003): "Incomplete Simultaneous Discrete Response Model with Multiple Equilibria," Review of Economic Studies, 70(1), 147-165.

van der Vaart, A. W., and J. A. Wellner (1996): Weak Convergence and Empirical Processes: With Applications to Statistics. Springer-Verlag.

Supplemental Appendix
This supplemental appendix contains all the proofs in the paper. We use $\|\cdot\|$ to denote the Euclidean norm. If the argument is a matrix $A$, the norm is the matrix Euclidean norm $\|A\| = \sqrt{\operatorname{tr}(A'A)}$.

S.1 Proofs in Section 3
Proof of Proposition 3.1.
It suffices to show the first equality, as the second equality follows immediately from (3.5). By the real spectral decomposition of $V_i(X,\sigma)$, the double-sum term in the expected utility in (3.2) satisfies

$$\sum_{j\neq i}\sum_{k\neq i} G_{ij}G_{ik}\, Z_j' V_i(X,\sigma) Z_k = \Big(\sum_{j\neq i} G_{ij}Z_j'\Big) V_i(X,\sigma) \Big(\sum_{k\neq i} G_{ik}Z_k\Big)$$
$$= \Big(\sum_{j\neq i} G_{ij}Z_j'\Big)\Phi_i(X,\sigma)\Lambda_i(X,\sigma)\Phi_i'(X,\sigma)\Big(\sum_{k\neq i} G_{ik}Z_k\Big)$$
$$= \Big(\sum_{j\neq i} G_{ij}Z_j'\Phi_i(X,\sigma)\Big)\Lambda_i(X,\sigma)\Big(\sum_{k\neq i} G_{ik}\Phi_i'(X,\sigma)Z_k\Big)$$
$$= (n-1)^2\sum_{t=1}^{T}\lambda_{it}(X,\sigma)\Big(\frac{1}{n-1}\sum_{j\neq i} G_{ij}Z_j'\phi_{it}(X,\sigma)\Big)^2.$$

Combining this with (3.2) yields the first equality in (3.6). The proof is complete.
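The following numeric check (Python; illustrative random values, with generic vectors standing in for the $Z_j$) verifies the spectral-decomposition identity used in the proof. Note the identity is exact algebra and holds for any symmetric $V_i$; Assumption 3's requirement $\lambda_{it} \ge 0$ matters only for the Legendre step that follows.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 8, 3
Z = rng.integers(0, 2, size=(n - 1, T)).astype(float)  # stand-ins for Z_j
G = rng.integers(0, 2, size=n - 1).astype(float)       # link indicators G_ij
A = rng.standard_normal((T, T))
V = (A + A.T) / 2                                      # symmetric V_i

lam, Phi = np.linalg.eigh(V)        # V = Phi diag(lam) Phi'
s = Z.T @ G                         # sum_j G_ij Z_j
lhs = s @ V @ s                     # sum_j sum_k G_ij G_ik Z_j' V Z_k
S = (Phi.T @ s) / (n - 1)           # S_t = (1/(n-1)) sum_j G_ij Z_j' phi_t
rhs = (n - 1) ** 2 * np.sum(lam * S ** 2)
assert np.isclose(lhs, rhs)
```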
Proof of Theorem 3.2.
From Proposition 3.1, the expected utility can be written as

$$E[U_i(G_i, G_{-i}, X, \varepsilon_i) \mid X, \varepsilon_i, \sigma] = \sum_{j\neq i} G_{ij}\big(U_{ij}(X,\sigma) - \varepsilon_{ij}\big) + \sum_{t=1}^{T}\lambda_{it}(X,\sigma)\max_{\omega_t\in\mathbb{R}}\Big(\frac{2(n-1)}{n-2}\sum_{j\neq i} G_{ij}Z_j'\phi_{it}(X,\sigma)\,\omega_t - \frac{(n-1)^2}{n-2}\,\omega_t^2\Big)$$
$$= \max_{\omega\in\mathbb{R}^T}\ \sum_{j\neq i} G_{ij}\Big(U_{ij}(X,\sigma) + \frac{2(n-1)}{n-2}\, Z_j'\sum_{t=1}^{T}\phi_{it}(X,\sigma)\lambda_{it}(X,\sigma)\,\omega_t - \varepsilon_{ij}\Big) - \frac{(n-1)^2}{n-2}\sum_{t=1}^{T}\lambda_{it}(X,\sigma)\,\omega_t^2$$
$$= \max_{\omega\in\mathbb{R}^T}\ \sum_{j\neq i} G_{ij}\Big(U_{ij}(X,\sigma) + \frac{2(n-1)}{n-2}\, Z_j'\Phi_i(X,\sigma)\Lambda_i(X,\sigma)\,\omega - \varepsilon_{ij}\Big) - \frac{(n-1)^2}{n-2}\,\omega'\Lambda_i(X,\sigma)\,\omega, \qquad (S.1)$$

where $\omega = (\omega_t)_{t=1}^T \in \mathbb{R}^T$. The second equality holds because $\lambda_{it}(X,\sigma) \ge 0$, $t = 1,\ldots,T$, by Assumption 3.

Denote by $\tilde\Pi_i(G_i, \omega, \varepsilon_i, X, \sigma)$ the objective function of the maximization problem in (S.1):

$$\tilde\Pi_i(G_i, \omega, \varepsilon_i, X, \sigma) = \sum_{j\neq i} G_{ij}\Big(U_{ij}(X,\sigma) + \frac{2(n-1)}{n-2}\, Z_j'\Phi_i(X,\sigma)\Lambda_i(X,\sigma)\,\omega - \varepsilon_{ij}\Big) - \frac{(n-1)^2}{n-2}\,\omega'\Lambda_i(X,\sigma)\,\omega.$$

From (S.1), the maximized expected utility can be derived from

$$\max_{G_i} E[U_i(G_i, G_{-i}, X, \varepsilon_i) \mid X, \varepsilon_i, \sigma] = \max_{G_i}\max_{\omega}\tilde\Pi_i(G_i, \omega, \varepsilon_i, X, \sigma) = \max_{\omega}\max_{G_i}\tilde\Pi_i(G_i, \omega, \varepsilon_i, X, \sigma) = \max_{\omega}\Pi_i(\omega, \varepsilon_i, X, \sigma), \qquad (S.2)$$

where $\Pi_i(\omega, \varepsilon_i, X, \sigma)$ is defined in (3.9). The second equality in (S.2) follows because $\max_\omega \tilde\Pi_i(G_i, \omega) \le \max_\omega\max_{G_i}\tilde\Pi_i(G_i, \omega)$ for all $G_i$, so $\max_{G_i}\max_\omega \tilde\Pi_i \le \max_\omega\max_{G_i}\tilde\Pi_i$, and similarly we can prove the other direction. The last equality follows from the definition of $\Pi_i(\omega, \varepsilon_i, X, \sigma)$. The result in (S.2) shows that the maximum expected utility can be obtained by solving the last maximization problem in (S.2), or equivalently (3.9).

By the definition of $G_i(X, \varepsilon_i, \sigma)$ and $\omega_i(X, \varepsilon_i, \sigma)$, we have

$$\max_{\omega}\Pi_i(\omega, \varepsilon_i, X, \sigma) = \tilde\Pi_i(G_i(X,\varepsilon_i,\sigma), \omega_i(X,\varepsilon_i,\sigma); X, \varepsilon_i, \sigma) \le \max_{\omega}\tilde\Pi_i(G_i(X,\varepsilon_i,\sigma), \omega, \varepsilon_i, X, \sigma) = E[U_i(G_i(X,\varepsilon_i,\sigma), G_{-i}, X, \varepsilon_i) \mid X, \varepsilon_i, \sigma], \qquad (S.3)$$

where the last equality comes from (S.1). Combining (S.2) and (S.3) we see that the inequality in (S.3) becomes an equality. Therefore, $G_i(X, \varepsilon_i, \sigma)$ is an optimal solution.

As for uniqueness, $G_i(X, \varepsilon_i, \sigma)$ is unique almost surely because $\varepsilon_i$ has a continuous distribution by Assumption 1, so two link decisions achieve the same expected utility with probability zero. To show the uniqueness of $\Lambda_i(X,\sigma)\omega_i(X,\varepsilon_i,\sigma)$, note that (S.3) implies that $\omega_i(X,\varepsilon_i,\sigma)$ is an optimal solution to the maximization problem $\max_\omega \tilde\Pi_i(G_i, \omega, \varepsilon_i, X, \sigma)$ evaluated at $G_i = G_i(X, \varepsilon_i, \sigma)$, so $\omega_i(X,\varepsilon_i,\sigma)$ satisfies the first-order condition

$$\Lambda_i(X,\sigma)\,\omega_i(X,\varepsilon_i,\sigma) = \frac{1}{n-1}\Lambda_i(X,\sigma)\Phi_i'(X,\sigma)\sum_{j\neq i} G_{ij}(X,\varepsilon_i,\sigma) Z_j. \qquad (S.4)$$

Since $G_i(X,\varepsilon_i,\sigma)$ is unique almost surely, so is $\Lambda_i(X,\sigma)\omega_i(X,\varepsilon_i,\sigma)$. The proof is complete.
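For completeness, the scalar conjugacy identity behind the second equality in (S.1), a standard Legendre-transform fact (cf. Rockafellar, 1970), written out:

$$\lambda\, a^{2} = \max_{\omega\in\mathbb{R}}\ \lambda\big(2a\omega - \omega^{2}\big), \qquad \lambda \ge 0,$$

with maximizer $\omega = a$, since the concave quadratic has first-order condition $2a - 2\omega = 0$. Applying it coordinate-wise with $a = \frac{1}{n-1}\sum_{j\neq i} G_{ij}Z_j'\phi_{it}(X,\sigma)$ and $\lambda = \frac{(n-1)^2}{n-2}\lambda_{it}(X,\sigma)$ yields the $t$-th term of (S.1); the requirement $\lambda_{it}(X,\sigma) \ge 0$ is Assumption 3.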
Lemma S.1 Suppose that Assumptions 1-3 are satisfied. An $\omega_i(\varepsilon_i, X, \sigma)$ that solves the maximization problem in (3.9) satisfies the first-order condition

$$\frac{1}{n-1}\sum_{j\neq i}\mathbf{1}\Big\{U_{ij}(X,\sigma) + \frac{2(n-1)}{n-2}\, Z_j'\Phi_i(X,\sigma)\Lambda_i(X,\sigma)\,\omega - \varepsilon_{ij} \ge 0\Big\}\Lambda_i(X,\sigma)\Phi_i'(X,\sigma)Z_j = \Lambda_i(X,\sigma)\,\omega$$

almost surely.

Proof.
Omit $X$ and $\sigma$ in the notation. Since $\Pi_i(\omega, \varepsilon_i)$ is sub-differentiable at all $\omega$, by the optimality of $\omega_i(\varepsilon_i)$, $\Pi_i(\omega, \varepsilon_i)$ has subgradient 0 at $\omega_i(\varepsilon_i)$; that is, $\omega_i(\varepsilon_i)$ satisfies the first-order condition

$$\frac{1}{n-1}\sum_{j\neq i}\mathbf{1}\Big\{U_{ij} + \frac{2(n-1)}{n-2} Z_j'\Phi_i\Lambda_i\omega - \varepsilon_{ij} > 0\Big\}\Lambda_i\Phi_i'Z_j - \Lambda_i\omega = -\frac{1}{n-1}\sum_{j\neq i}\mathbf{1}\Big\{U_{ij} + \frac{2(n-1)}{n-2} Z_j'\Phi_i\Lambda_i\omega - \varepsilon_{ij} = 0\Big\}\operatorname{diag}(\tau)\,\Lambda_i\Phi_i'Z_j, \qquad (S.5)$$

for some $\tau = (\tau_1, \ldots, \tau_T) \in [0,1]^T$. Define the right-hand side of (S.5) as $\Delta_n(\omega, \varepsilon_i)$. For any $\omega$,

$$\Pr(\|\Delta_n(\omega, \varepsilon_i)\| > 0 \mid X, \sigma) \le \Pr\Big(\exists\, j\neq i:\ U_{ij} + \frac{2(n-1)}{n-2} Z_j'\Phi_i\Lambda_i\omega = \varepsilon_{ij}\,\Big|\, X, \sigma\Big) \le \sum_{j\neq i}\Pr\Big(U_{ij} + \frac{2(n-1)}{n-2} Z_j'\Phi_i\Lambda_i\omega = \varepsilon_{ij}\,\Big|\, X, \sigma\Big) = 0, \qquad (S.6)$$

because $\varepsilon_{ij}$ has a continuous distribution. Hence the first-order condition (S.5) holds with $\Delta_n(\omega, \varepsilon_i)$ replaced by 0 with probability 1. By (S.6) again, we obtain

$$\frac{1}{n-1}\sum_{j\neq i}\mathbf{1}\Big\{U_{ij} + \frac{2(n-1)}{n-2} Z_j'\Phi_i\Lambda_i\omega_i(\varepsilon_i) - \varepsilon_{ij} \ge 0\Big\}\Lambda_i\Phi_i'Z_j - \Lambda_i\omega_i(\varepsilon_i) = 0, \quad a.s.$$

Proof of Corollary 3.3.
Omit $X$ and $\sigma$ in the notation. By Lemma S.1, $\omega_i(\varepsilon_i)$ is a solution to the first-order condition

$$\frac{1}{n-1}\sum_{j\neq i}\mathbf{1}\Big\{U_{ij} + \frac{2(n-1)}{n-2} Z_j'\Phi_i\Lambda_i\omega \ge \varepsilon_{ij}\Big\}\Lambda_i\Phi_i'Z_j = \Lambda_i\omega, \quad a.s. \qquad (S.7)$$

Note that the first-order condition could have multiple solutions, and among these local solutions $\omega_i(\varepsilon_i)$ is the unique maximizer of $\Pi_i(\omega, \varepsilon_i)$. For this reason we refer to $\omega_i(\varepsilon_i)$ as the global solution. (Notice that the function $\max\{x, 0\}$ is differentiable for $x \neq 0$ and sub-differentiable for $x = 0$, with subderivatives in $[0,1]$.)

For any $\omega \in \mathbb{R}^T$, define the choice function $g_i(\omega; \varepsilon_i) = (g_{ij}(\omega; \varepsilon_{ij}),\ j\neq i): \mathbb{R}^T \to \{0,1\}^{n-1}$ by

$$g_{ij}(\omega; \varepsilon_{ij}) = \mathbf{1}\Big\{U_{ij} + \frac{2(n-1)}{n-2} Z_j'\Phi_i\Lambda_i\omega \ge \varepsilon_{ij}\Big\}, \qquad \forall j\neq i. \qquad (S.8)$$

The first-order condition (S.7) defines a system of equations over $\omega$:

$$\Lambda_i\omega = \frac{1}{n-1}\sum_{j\neq i} g_{ij}(\omega; \varepsilon_{ij})\,\Lambda_i\Phi_i'Z_j, \quad a.s. \qquad (S.9)$$

On the other hand, for any $g_i = (g_{ij},\ j\neq i) \in \{0,1\}^{n-1}$, define the function $\omega_i(g_i): \{0,1\}^{n-1} \to \mathbb{R}^T$ by

$$\omega_i(g_i) = \frac{1}{n-1}\sum_{j\neq i} g_{ij}\,\Phi_i'Z_j. \qquad (S.10)$$

We derive a system of equations over $g_i$:

$$g_{ij} = \mathbf{1}\Big\{U_{ij} + \frac{2(n-1)}{n-2} Z_j'\Phi_i\Lambda_i\omega_i(g_i) \ge \varepsilon_{ij}\Big\}, \qquad \forall j\neq i. \qquad (S.11)$$

We show that with probability 1 there is a one-to-one mapping between the solutions to (S.9) and the solutions to (S.11).

First, for any local solution $\omega_i^l(\varepsilon_i)$ that solves system (S.9), the choice function $g_i(\omega; \varepsilon_i)$ evaluated at $\omega_i^l(\varepsilon_i)$, i.e., $g_i(\omega_i^l(\varepsilon_i); \varepsilon_i)$, is a solution to system (S.11) with probability 1. To see this, note that by the first-order condition in (S.9) and the definition of $\omega_i(g_i)$ in (S.10), $\omega_i^l(\varepsilon_i)$ satisfies

$$\Lambda_i\omega_i^l(\varepsilon_i) = \frac{1}{n-1}\sum_{j\neq i} g_{ij}\big(\omega_i^l(\varepsilon_i); \varepsilon_{ij}\big)\,\Lambda_i\Phi_i'Z_j = \Lambda_i\,\omega_i\big(g_i\big(\omega_i^l(\varepsilon_i); \varepsilon_i\big)\big), \quad a.s. \qquad (S.12)$$

Then by the definition of $g_i(\omega; \varepsilon_i)$ in (S.8), for any $j\neq i$,

$$g_{ij}\big(\omega_i^l(\varepsilon_i); \varepsilon_{ij}\big) = \mathbf{1}\Big\{U_{ij} + \frac{2(n-1)}{n-2} Z_j'\Phi_i\Lambda_i\omega_i^l(\varepsilon_i) \ge \varepsilon_{ij}\Big\} = \mathbf{1}\Big\{U_{ij} + \frac{2(n-1)}{n-2} Z_j'\Phi_i\Lambda_i\,\omega_i\big(g_i\big(\omega_i^l(\varepsilon_i); \varepsilon_i\big)\big) \ge \varepsilon_{ij}\Big\}, \quad a.s.$$

This shows that $g_{ij}(\omega_i^l(\varepsilon_i); \varepsilon_{ij})$ satisfies system (S.11) with probability 1. Second, for two distinct local solutions $\omega_i^{l_1}(\varepsilon_i) \neq \omega_i^{l_2}(\varepsilon_i)$, by (S.12), with probability 1 we have $g_i(\omega_i^{l_1}(\varepsilon_i); \varepsilon_i) \neq g_i(\omega_i^{l_2}(\varepsilon_i); \varepsilon_i)$. Therefore, there is a one-to-one mapping between the solutions to (S.9) and (S.11) with probability 1.

The equivalence between systems (S.9) and (S.11) motivates us to analyze the relationship between $\omega_i(\varepsilon_i)$ and $\varepsilon_i$ through the relationship between the solutions to (S.11) and $\varepsilon_i$. By the definition of $\omega_i(g_i)$ in (S.10), write system (S.11) explicitly as

$$g_{ij} = \mathbf{1}\Big\{U_{ij} + \frac{2}{n-2}\sum_{k\neq i} g_{ik}\, Z_j'V_iZ_k \ge \varepsilon_{ij}\Big\}, \qquad \forall j\neq i, \quad a.s. \qquad (S.13)$$

For any $g_i \in \{0,1\}^{n-1}$, define the set

$$\mathcal{E}_i^l(g_i) = \big\{\varepsilon_i \in \mathbb{R}^{n-1}:\ g_i \text{ satisfies (S.13)}\big\}. \qquad (S.14)$$

This set can be regarded as the collection of $\varepsilon_i$ that support $g_i$ as a solution to (S.13). Note that since $\varepsilon_i$ has support on $\mathbb{R}^{n-1}$, the set $\mathcal{E}_i^l(g_i)$ is nonempty for all $g_i \in \{0,1\}^{n-1}$.

As discussed, system (S.13) may have multiple solutions, resembling the presence of multiple equilibria in entry games (Ciliberto and Tamer, 2009; Tamer, 2003). In particular, it is possible that the sets in (S.14) for two different $g_i$ and $g_i'$ overlap, and in the overlapping area both $g_i$ and $g_i'$ satisfy (S.13).
For example, assume that all elements in $V_i$ are positive, so link choices are strategic complements. In the region of $\varepsilon_i$ where

$$U_{ij} < \varepsilon_{ij} \le U_{ij} + \frac{2}{n-2}\sum_{k\neq i} Z_j'V_iZ_k, \qquad \forall j\neq i,$$

we find that both $(g_{ij} = 1,\ j\neq i)$ and $(g_{ij} = 0,\ j\neq i)$ are solutions to (S.13).

Unlike in entry games, where equilibrium selection mechanisms are typically unknown, in our case we have a natural selection mechanism. Recall from Theorem 3.2 that the optimal link decision $G_i(\varepsilon_i) = (G_{ij}(\varepsilon_i),\ j\neq i) \in \{0,1\}^{n-1}$ is given by the choice function (S.8) evaluated at the global solution $\omega_i(\varepsilon_i)$, i.e.,

$$G_{ij}(\varepsilon_i) = g_{ij}(\omega_i(\varepsilon_i); \varepsilon_{ij}), \qquad \forall j\neq i.$$

From our earlier discussion we can see that $G_{ij}(\varepsilon_i)$ satisfies (S.13). For a given $\varepsilon_i \in \mathbb{R}^{n-1}$, system (S.13) could have multiple solutions, and among all such solutions $G_{ij}(\varepsilon_i)$ is selected because it is the choice function evaluated at $\omega_i(\varepsilon_i)$, the global maximizer of the objective function

$$\Pi_i(\omega, \varepsilon_i) = \sum_{j\neq i}\Big[U_{ij} + \frac{2(n-1)}{n-2} Z_j'\Phi_i\Lambda_i\omega - \varepsilon_{ij}\Big]_+ - \frac{(n-1)^2}{n-2}\,\omega'\Lambda_i\omega.$$

To further characterize the selection mechanism, we examine the objective function $\Pi_i(\omega, \varepsilon_i)$ evaluated at the local solutions. Note that for any $\omega^l \in \mathbb{R}^T$ that solves the system (S.9), we have

$$\Lambda_i\omega^l = \frac{1}{n-1}\sum_{j\neq i} g_{ij}(\omega^l; \varepsilon_{ij})\,\Lambda_i\Phi_i'Z_j, \quad a.s.$$

Hence, $\Pi_i(\omega^l, \varepsilon_i)$ can be represented as

$$\Pi_i(\omega^l, \varepsilon_i) = \sum_{j\neq i} g_{ij}(\omega^l; \varepsilon_{ij})\Big(U_{ij} + \frac{2(n-1)}{n-2} Z_j'\Phi_i\Lambda_i\omega^l - \varepsilon_{ij}\Big) - \frac{(n-1)^2}{n-2}\Big(\frac{1}{n-1}\sum_{j\neq i} g_{ij}(\omega^l; \varepsilon_{ij})\,\Lambda_i\Phi_i'Z_j\Big)'\omega^l, \quad a.s.$$
$$= \sum_{j\neq i} g_{ij}(\omega^l; \varepsilon_{ij})\Big(U_{ij} + \frac{n-1}{n-2} Z_j'\Phi_i\Lambda_i\omega^l - \varepsilon_{ij}\Big) = \sum_{j\neq i} g_{ij}(\omega^l; \varepsilon_{ij})\Big(U_{ij} + \frac{1}{n-2}\sum_{k\neq i} g_{ik}(\omega^l; \varepsilon_{ik})\, Z_j'V_iZ_k - \varepsilon_{ij}\Big), \quad a.s. \qquad (S.15)$$

This indicates that $\Pi_i(\omega^l, \varepsilon_i)$ can be regarded as a function of the link decision $g_i(\omega^l; \varepsilon_i)$ that corresponds to $\omega^l$.

By the global optimality of $\omega_i(\varepsilon_i)$ we have $\Pi_i(\omega_i(\varepsilon_i), \varepsilon_i) \ge \Pi_i(\omega^l, \varepsilon_i)$ for all $\omega^l$ that solve the first-order condition (S.9). By the representation of $\Pi_i(\omega^l, \varepsilon_i)$ in (S.15) and the equivalence between (S.9) (thus (S.7)) and (S.13), if $G_i(\varepsilon_i)$ takes a value $g_i \in \{0,1\}^{n-1}$, then $g_i$ must satisfy both (S.13) and

$$\sum_{j\neq i} g_{ij}\Big(U_{ij} + \frac{1}{n-2}\sum_{k\neq i} g_{ik}\, Z_j'V_iZ_k - \varepsilon_{ij}\Big) \ge \sum_{j\neq i} g^l_{ij}\Big(U_{ij} + \frac{1}{n-2}\sum_{k\neq i} g^l_{ik}\, Z_j'V_iZ_k - \varepsilon_{ij}\Big) \qquad (S.16)$$

almost surely, for all $g^l_i$ that solve (S.13). We can view (S.16) as a selection criterion that determines which solution to (S.13) is selected.

Therefore, for any $g_i \in \{0,1\}^{n-1}$, we can define the set

$$\mathcal{E}_i(g_i) = \big\{\varepsilon_i \in \mathbb{R}^{n-1}:\ g_i \text{ satisfies both (S.13) and (S.16)}\big\}. \qquad (S.17)$$

This set is the collection of $\varepsilon_i$ that support $g_i$ as the unique optimal decision, i.e., for any $\varepsilon_i \in \mathcal{E}_i(g_i)$, we have $G_i(\varepsilon_i) = g_i$. The uniqueness implies that if $g_i \neq g_i'$, the sets $\mathcal{E}_i(g_i)$ and $\mathcal{E}_i(g_i')$ are disjoint. The collection of the sets $\mathcal{E}_i(g_i)$ for all $g_i \in \{0,1\}^{n-1}$ thus forms a partition of the space of $\varepsilon_i$, with each region in the partition corresponding to a unique optimal link decision, similarly as in entry games (Ciliberto and Tamer, 2009; Tamer, 2003). The proof is complete.
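For intuition, the following brute-force sketch (Python; toy scale only, with generic inputs standing in for $U_{ij}$, $Z$, $V_i$) enumerates all candidate link vectors $g_i$, keeps those satisfying the fixed-point system (S.13), and selects among them by the criterion (S.16).

```python
import itertools
import numpy as np

def optimal_links_bruteforce(U_i, Z, V, eps_i, n):
    """Enumerate solutions of (S.13); select by the objective in (S.16).

    Exponential in the number of potential links, so for illustration only.
    """
    m = len(U_i)                      # m = n - 1 potential links
    best, best_val, solutions = None, -np.inf, []
    ZVZ = Z @ V @ Z.T                 # (m, m) matrix of Z_j' V_i Z_k
    for g in itertools.product([0, 1], repeat=m):
        g = np.array(g, dtype=float)
        thresh = U_i + (2.0 / (n - 2)) * ZVZ @ g - eps_i
        if np.all((thresh >= 0) == (g == 1)):             # g solves (S.13)
            val = g @ (U_i + ZVZ @ g / (n - 2) - eps_i)   # criterion (S.16)
            solutions.append(g)
            if val > best_val:
                best, best_val = g, val
    return best, solutions
```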
Proof of Proposition 3.4.

We follow the proof in Leung (2015, Theorem 1). We organize the choice probabilities in an $n \times 2^{n-1}$ matrix $\sigma(X)$. The $i$-th row contains individual $i$'s choice probabilities $\sigma_i(X) = \{\sigma_i(g_i \mid X),\ g_i \in \mathcal{G}_i\}$. The entries in the row sum to 1. The set of such matrices is $\Sigma(X)$. With row $i$ of $\sigma(X)$ we associate $X_i$. Let $\Sigma^s(X) \subset \Sigma(X)$ be the subset of matrices of choice probabilities such that if $X_i = X_j$ then $\sigma_i(X) = \sigma_j(X)$, i.e., rows $i$ and $j$ are identical.

If we organize the choice probabilities in (2.7) in an $n \times 2^{n-1}$ matrix $P(X, \sigma)$, it maps the matrix $\sigma$ to a matrix of choice probabilities in $\Sigma(X)$. An equilibrium is a fixed point of this mapping. Because we focus on symmetric equilibria in $\Sigma^s(X)$, we have to show that $P(X, \sigma)$ is a continuous mapping from $\Sigma^s(X)$ to $\Sigma^s(X)$ and that $\Sigma^s(X)$ is convex and compact.

First, the mapping $P(X, \sigma)$ maps $\Sigma^s(X)$ to itself. Let $\sigma(X) \in \Sigma^s(X)$. If $X_i = X_j$, then in the expected utilities of $i$ and $j$, (2.5) and (2.6) are equal for $i$ and $j$. Because $\varepsilon_i$ and $\varepsilon_j$ have the same distribution, rows $i$ and $j$ of $P(X, \sigma(X))$ are identical, so indeed $P(X, \sigma(X)) \in \Sigma^s(X)$.

Second, a convex combination of matrices $\sigma(X), \tilde\sigma(X) \in \Sigma^s(X)$ is a matrix with rows that sum to 1 and in which rows $i$ and $j$ are identical if $X_i = X_j$. The convex combination is therefore in $\Sigma^s(X)$.

Third, $\Sigma^s(X)$ is bounded. It is also closed: let $\{\sigma^k(X),\ k = 1, 2, \ldots\}$ be a sequence in $\Sigma^s(X)$ that converges to a limit; for all $k$ the rows of $\sigma^k(X)$ sum to 1 and rows $i$ and $j$ are identical if $X_i = X_j$, so the limit has the same properties and is therefore in $\Sigma^s(X)$.

Finally, the mapping $P(X, \sigma)$ is continuous on $\Sigma^s(X)$. This is shown in Lemma S.4 in the Supplemental Appendix.

We conclude that by Brouwer's fixed point theorem, $P(X, \sigma)$ has a fixed point in $\Sigma^s(X)$. The proof is complete.
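In practice, a fixed point can be searched for by iterating the best-response probability map on the symmetric class, which is also how the data-generating equilibrium is computed in the Monte Carlo section. A minimal sketch under a hypothetical interface (Brouwer's theorem guarantees existence, not convergence of this iteration):

```python
import numpy as np

def find_equilibrium(P_map, sigma0, tol=1e-10, max_iter=5000):
    """Iterate sigma -> P(X, sigma) until (hopefully) a fixed point.

    P_map  : callable mapping a T x T matrix of symmetric link
             probabilities sigma(x_s, x_t) to the implied choice
             probabilities; a hypothetical stand-in for (2.8)
    sigma0 : starting value, e.g. an equilibrium of the limiting game
    """
    sigma = sigma0.copy()
    for _ in range(max_iter):
        sigma_new = P_map(sigma)
        if np.max(np.abs(sigma_new - sigma)) < tol:
            return sigma_new
        sigma = sigma_new
    raise RuntimeError("no fixed point found; try another starting value")
```

S.2 Proofs in Section 4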
S.2.1 Consistency

Proof of Theorem 4.1.
We follow the consistency proof in Newey and McFadden (1994). A complication is the presence of the first-stage parameter $p_n$. Fix $\delta > 0$.
Let $\mathcal{B}_\delta(\theta_0) = \{\theta \in \Theta:\ \|\theta - \theta_0\| < \delta\}$ be an open $\delta$-ball centered at $\theta_0$. If $\|\Psi_n(\hat\theta_n, p_n)\| < \inf_{\theta\in\Theta\setminus\mathcal{B}_\delta(\theta_0)}\|\Psi_n(\theta, p_n)\|$, then $\hat\theta_n \notin \Theta\setminus\mathcal{B}_\delta(\theta_0)$, or equivalently, $\hat\theta_n \in \mathcal{B}_\delta(\theta_0)$. Therefore,

$$\Pr\big(\|\hat\theta_n - \theta_0\| < \delta \,\big|\, X, p_n\big) \ge \Pr\Big(\|\Psi_n(\hat\theta_n, p_n)\| < \inf_{\theta\in\Theta\setminus\mathcal{B}_\delta(\theta_0)}\|\Psi_n(\theta, p_n)\| \,\Big|\, X, p_n\Big). \qquad (S.18)$$

Because by Assumption 4(i)-(ii) and Lemma S.4

$$\inf_{\theta\in\Theta\setminus\mathcal{B}_\delta(\theta_0)}\|\Psi_n(\theta, p_n)\| > 0,$$

the right-hand side in (S.18) goes to 1 if

$$\|\Psi_n(\hat\theta_n, p_n)\| = o_p(1). \qquad (S.19)$$

Now by the triangle inequality

$$\|\Psi_n(\hat\theta_n, p_n)\| \le \|\hat\Psi_n(\hat\theta_n, p_n)\| + \|\hat\Psi_n(\hat\theta_n, p_n) - \Psi_n(\hat\theta_n, p_n)\| \le \|\hat\Psi_n(\hat\theta_n, p_n)\| + \sup_{\theta\in\Theta}\|\hat\Psi_n(\theta, p_n) - \Psi_n(\theta, p_n)\|.$$

By the uniform LLN in Lemma S.3 the second term of the last inequality is $o_p(1)$, so we need to show that $\|\hat\Psi_n(\hat\theta_n, p_n)\| = o_p(1)$. We have

$$\|\hat\Psi_n(\hat\theta_n, p_n)\| \le \|\hat\Psi_n(\hat\theta_n, \hat p_n)\| + \|\hat\Psi_n(\hat\theta_n, \hat p_n) - \hat\Psi_n(\hat\theta_n, p_n)\| \le \|\hat\Psi_n(\hat\theta_n, \hat p_n)\| + \sup_{\theta\in\Theta}\|\hat\Psi_n(\theta, \hat p_n) - \hat\Psi_n(\theta, p_n)\|,$$

and $\|\hat\Psi_n(\hat\theta_n, \hat p_n)\| = o_p(1)$ by (4.5), so we need to show that the second term is also $o_p(1)$.

For any $p \in [0,1]^{T\times T}$, we have

$$\sup_{\theta\in\Theta}\|\hat\Psi_n(\theta, p) - \hat\Psi_n(\theta, p_n)\| \le \frac{1}{n(n-1)}\sum_i\sum_{j\neq i}\sup_{\theta\in\Theta}\|\hat W_{n,ij}(P_{n,ij}(\theta, p) - P_{n,ij}(\theta, p_n))\|$$
$$\le \max_{i,j=1,\ldots,n}\|\hat W_{n,ij}\|\ \max_{i,j=1,\ldots,n}\sup_{\theta\in\Theta}\|P_{n,ij}(\theta, p) - P_{n,ij}(\theta, p_n)\| = \max_{i,j=1,\ldots,n}\|\hat W_{n,ij}\|\ \max_{s,t=1,\ldots,T}\sup_{\theta\in\Theta}\|P_{n,(st)}(\theta, p) - P_{n,(st)}(\theta, p_n)\|,$$

where $P_{n,(st)}(\theta, p)$ represents the value of $P_{n,ij}(\theta, p)$ if $X_i = x_s$ and $X_j = x_t$. By Lemma S.4, $P_{n,(st)}(\theta, p)$ is continuous in $\theta$ and $p$ at any $\theta \in \Theta$ and at $p_n$. Since $\Theta$ is a compact set, this function is uniformly continuous in $\theta$ on $\Theta$ and pointwise continuous in $p$ at $p_n$.

If a function $f(\theta, p)$ is uniformly continuous in $\theta$ on $\Theta$ and pointwise continuous in $p$ at $p_n$, then $\sup_{\theta\in\Theta}\|f(\theta, p) - f(\theta, p_n)\|$ is continuous in $p$ at $p_n$. This is true because for any $\eta > 0$ there is a $\delta > 0$ such that $\|(\theta', p) - (\theta, p_n)\| < \delta$ implies $\|f(\theta', p) - f(\theta, p_n)\| < \eta$, where $\delta$ does not depend on $\theta$, $\theta'$, and $p$. Now if $\|p - p_n\| < \delta$, we also have $\|(\theta, p) - (\theta, p_n)\| < \delta$ for all $\theta$, so $\sup_{\theta\in\Theta}\|f(\theta, p) - f(\theta, p_n)\| < \eta$.
By letting $f(\theta, p) = P_{n,(st)}(\theta, p)$, we derive that $\sup_{\theta\in\Theta}\|P_{n,(st)}(\theta, p) - P_{n,(st)}(\theta, p_n)\|$ is continuous in $p$ at $p_n$. This together with Assumption 4(iii) implies that the function $\sup_{\theta\in\Theta}\|\hat\Psi_n(\theta, p) - \hat\Psi_n(\theta, p_n)\|$ is continuous in $p$ at $p_n$. By the continuous mapping theorem and the consistency of $\hat p_n$ in Lemma S.2,

$$\sup_{\theta\in\Theta}\|\hat\Psi_n(\theta, \hat p_n) - \hat\Psi_n(\theta, p_n)\| \stackrel{p}{\to} 0, \quad \text{as } n \to \infty,$$

so (S.19) holds and weak consistency is proven.

Lemma S.2 (Consistency of $\hat p_n$) Suppose that Assumptions 1-3 and 4(iv) are satisfied. The first-step estimator $\hat p_n$ is consistent for $p_n$, i.e., for any $\delta > 0$, $\Pr(\|\hat p_n - p_n\| > \delta \mid X, p_n) \to 0$ as $n \to \infty$.

Proof.
Recall that $\hat p_n = (\hat p_{n,st},\ s,t = 1,\ldots,T)$ and $p_n = (p_{n,st},\ s,t = 1,\ldots,T)$, where $\hat p_{n,st}$ is the link frequency of pairs with the characteristics $x_s$ and $x_t$,

$$\hat p_{n,st} = \frac{\sum_i\sum_{j\neq i} G_{n,ij}\,\mathbf{1}\{X_i = x_s, X_j = x_t\}}{\sum_i\sum_{j\neq i}\mathbf{1}\{X_i = x_s, X_j = x_t\}},$$

and $p_{n,st}$ is the population link probability of such pairs,

$$p_{n,st} = E[G_{n,ij} \mid X_i = x_s, X_j = x_t, X, p_n],$$

so

$$E[(G_{n,ij} - p_{n,st}) \mid X, p_n]\,\mathbf{1}\{X_i = x_s, X_j = x_t\} = 0. \qquad (S.20)$$

By Chebyshev's inequality, for any $\delta > 0$,

$$\Pr(\|\hat p_n - p_n\| > \delta \mid X, p_n) \le \frac{1}{\delta^2} E\big[\|\hat p_n - p_n\|^2 \,\big|\, X, p_n\big].$$
It suffices to show that $E[\|\hat p_n - p_n\|^2 \mid X, p_n] \to 0$ as $n \to \infty$. Observe that

$$E\big[\|\hat p_n - p_n\|^2 \,\big|\, X, p_n\big] = E\Big[\sum_s\sum_t(\hat p_{n,st} - p_{n,st})^2 \,\Big|\, X, p_n\Big] = \sum_s\sum_t E\big[(\hat p_{n,st} - p_{n,st})^2 \,\big|\, X, p_n\big]. \qquad (S.21)$$

We can write

$$\hat p_{n,st} - p_{n,st} = \frac{\frac{1}{n(n-1)}\sum_i\sum_{j\neq i}(G_{n,ij} - p_{n,st})\,\mathbf{1}\{X_i = x_s, X_j = x_t\}}{\frac{1}{n(n-1)}\sum_i\sum_{j\neq i}\mathbf{1}\{X_i = x_s, X_j = x_t\}}.$$

Therefore, the conditional variance of $\hat p_{n,st} - p_{n,st}$ given $X$ and $p_n$ has a numerator

$$\frac{1}{(n(n-1))^2}\sum_i\sum_{j\neq i} E\big[(G_{n,ij} - p_{n,st})^2 \,\big|\, X, p_n\big]\,\mathbf{1}\{X_i = x_s, X_j = x_t\}$$
$$+ \frac{1}{(n(n-1))^2}\sum_i\sum_{j\neq i}\sum_{k\neq i,j} E[(G_{n,ij} - p_{n,st})(G_{n,ik} - p_{n,st}) \mid X, p_n]\cdot\mathbf{1}\{X_i = x_s, X_j = x_t, X_k = x_t\}$$
$$+ \frac{1}{(n(n-1))^2}\sum_i\sum_{j\neq i}\sum_{k\neq i}\sum_{l\neq k} E[(G_{n,ij} - p_{n,st})(G_{n,kl} - p_{n,st}) \mid X, p_n]\cdot\mathbf{1}\{X_i = x_s, X_j = x_t, X_k = x_s, X_l = x_t\}. \qquad (S.22)$$

Because the link choices are independent between individuals, the last term in (S.22) is 0 by (S.20). Further,

$$E\big[(G_{n,ij} - p_{n,st})^2 \,\big|\, X, p_n\big]\,\mathbf{1}\{X_i = x_s, X_j = x_t\} \le 1, \qquad E[(G_{n,ij} - p_{n,st})(G_{n,ik} - p_{n,st}) \mid X, p_n]\,\mathbf{1}\{X_i = x_s, X_j = x_t, X_k = x_t\} \le 1,$$

so the numerator is at most

$$\frac{n(n-1)}{(n(n-1))^2} + \frac{n(n-1)(n-2)}{(n(n-1))^2} = \frac{1}{n}.$$
Since $\frac{1}{n(n-1)}\sum_i\sum_{j\neq i}\mathbf{1}\{X_i = x_s, X_j = x_t\}$ converges to a strictly positive limit by Assumption 4(iv), the denominator of the conditional variance of $\hat p_{n,st} - p_{n,st}$ converges to the square of that limit. Therefore, the conditional variance of $\hat p_{n,st} - p_{n,st}$ is $o(1)$ for each $s$ and $t$. This implies $E[\|\hat p_n - p_n\|^2 \mid X, p_n] \to 0$, so $\hat p_n$ is consistent for $p_n$.
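The first-step estimator is just a matrix of link frequencies by ordered type pair. A minimal sketch (Python; hypothetical data layout):

```python
import numpy as np

def first_step_phat(G, types, T):
    """Link frequencies p_hat[s, t] among ordered pairs with X_i = x_s, X_j = x_t.

    G     : (n, n) binary adjacency matrix of the directed network
    types : (n,) integer type labels in {0, ..., T-1}
    """
    phat = np.zeros((T, T))
    off = ~np.eye(len(types), dtype=bool)        # exclude i = j
    for s in range(T):
        for t in range(T):
            pair = np.outer(types == s, types == t) & off
            phat[s, t] = G[pair].mean() if pair.any() else np.nan
    return phat
```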
Lemma S.3 (Uniform LLN for Sample Moments) Suppose that Assumptions 1-3 and 4(iii) are satisfied. For any $\delta > 0$,

$$\Pr\Big(\sup_{\theta\in\Theta}\|\hat\Psi_n(\theta, p_n) - \Psi_n(\theta, p_n)\| > \delta \,\Big|\, X, p_n\Big) \to 0 \quad \text{as } n \to \infty.$$

Proof.
By the definition of $\hat\Psi_n$ and $\Psi_n$,

$$\hat\Psi_n(\theta, p_n) - \Psi_n(\theta, p_n) = \frac{1}{n(n-1)}\sum_i\sum_{j\neq i}\Big[\hat W_{n,ij}(G_{n,ij} - P_{n,ij}(\theta, p_n)) - W_{n,ij}(E[G_{n,ij} \mid X, p_n] - P_{n,ij}(\theta, p_n))\Big]$$
$$= \frac{1}{n(n-1)}\sum_i\sum_{j\neq i}(\hat W_{n,ij} - W_{n,ij})(G_{n,ij} - P_{n,ij}(\theta, p_n)) + \frac{1}{n(n-1)}\sum_i\sum_{j\neq i} W_{n,ij}(G_{n,ij} - E[G_{n,ij} \mid X, p_n]). \qquad (S.24)$$

The first term in the last expression in (S.24) is $o_p(1)$ uniformly over $\theta \in \Theta$ because

$$\sup_{\theta\in\Theta}\Big\|\frac{1}{n(n-1)}\sum_i\sum_{j\neq i}(\hat W_{n,ij} - W_{n,ij})(G_{n,ij} - P_{n,ij}(\theta, p_n))\Big\| \le \max_{i,j=1,\ldots,n}\|\hat W_{n,ij} - W_{n,ij}\| = o_p(1)$$

by Assumption 4(iii). Write the last term in (S.24) as

$$\frac{1}{n(n-1)}\sum_i\sum_{j\neq i} W_{n,ij}(G_{n,ij} - E[G_{n,ij} \mid X, p_n]) = \frac{1}{n}\sum_i Y_{ni}, \qquad Y_{ni} = \frac{1}{n-1}\sum_{j\neq i} W_{n,ij}(G_{n,ij} - E[G_{n,ij} \mid X, p_n]).$$

Note that $\frac{1}{n}\sum_i Y_{ni}$ does not depend on $\theta$. We prove that it is $o_p(1)$ following the proof of a pointwise LLN. By Chebyshev's inequality, for any $\delta > 0$,

$$\Pr\Big(\Big\|\frac{1}{n}\sum_i Y_{ni}\Big\| > \delta \,\Big|\, X, p_n\Big) \le \frac{1}{\delta^2} E\Big[\Big\|\frac{1}{n}\sum_i Y_{ni}\Big\|^2 \,\Big|\, X, p_n\Big].$$

Conditional on $X$ and $p_n$, the random variables $Y_{ni}$, $i = 1,\ldots,n$, are independent with mean 0. Therefore, $\frac{1}{n}\sum_i Y_{ni}$ has the conditional variance

$$E\Big[\Big\|\frac{1}{n}\sum_i Y_{ni}\Big\|^2 \,\Big|\, X, p_n\Big] = \frac{1}{n^2}\sum_i E\big[\|Y_{ni}\|^2 \,\big|\, X, p_n\big]$$
$$= \frac{1}{n^2(n-1)^2}\sum_i\sum_{j\neq i} W_{n,ij}'\,E\big[(G_{n,ij} - E[G_{n,ij} \mid X, p_n])^2 \,\big|\, X, p_n\big]\,W_{n,ij}$$
$$+ \frac{1}{n^2(n-1)^2}\sum_i\sum_{j\neq i}\sum_{k\neq i,j} W_{n,ij}'\,E[(G_{n,ij} - E[G_{n,ij} \mid X, p_n])(G_{n,ik} - E[G_{n,ik} \mid X, p_n]) \mid X, p_n]\,W_{n,ik}.$$

Since $E[(G_{n,ij} - E[G_{n,ij} \mid X, p_n])^2 \mid X, p_n] \le 1$ and $|E[(G_{n,ij} - E[G_{n,ij} \mid X, p_n])(G_{n,ik} - E[G_{n,ik} \mid X, p_n]) \mid X, p_n]| \le 1$, the conditional variance is bounded by

$$\frac{1}{n(n-1)}\max_{i,j=1,\ldots,n}\|W_{n,ij}\|^2 + \frac{n-2}{n(n-1)}\max_{i,j,k=1,\ldots,n}\|W_{n,ij}\|\,\|W_{n,ik}\| = o(1)$$

by Assumption 4(iii), so $\frac{1}{n}\sum_i Y_{ni} = o_p(1)$. We conclude that $\sup_{\theta\in\Theta}\|\hat\Psi_n(\theta, p_n) - \Psi_n(\theta, p_n)\| = o_p(1)$ as $n \to \infty$.
Lemma S.4 (Continuity of CCP) Suppose that Assumptions 1-3 are satisfied. Given $X$, the conditional choice probability $P_{n,ij}(\theta, p)$ is continuous in $\theta$ and $p$.

Proof.
Recall that

$$P_{n,ij}(\theta, p) = \int \mathbf{1}\Big\{U_{n,ij}(\theta, p) + \frac{2(n-1)}{n-2} Z_j'\Phi_{ni}(\theta, p)\Lambda_{ni}(\theta, p)\,\omega_{ni}(\varepsilon_i, \theta, p) \ge \varepsilon_{ij}\Big\} f_{\varepsilon_i}(\varepsilon_i; \theta)\,d\varepsilon_i,$$

where $f_{\varepsilon_i}$ represents the density of $\varepsilon_i$. By (2.5), (2.6), (3.3) and Assumption 1, $U_{n,ij}(\theta, p)$ and $f_{\varepsilon_i}(\varepsilon_i; \theta)$ are continuous in $\theta$ and $p$. The challenge is that $\omega_{ni}(\varepsilon_i, \theta, p)$ is a function of $\varepsilon_i$ that depends on $\theta$ and $p$. To establish the continuity of $P_{n,ij}(\theta, p)$, we need to investigate how $\omega_{ni}(\varepsilon_i, \theta, p)$ varies with $\theta$ and $p$.

In Corollary 3.3, we show that $\omega_{ni}(\varepsilon_i, \theta, p)$ satisfies

$$\Phi_{ni}(\theta, p)\Lambda_{ni}(\theta, p)\,\omega_{ni}(\varepsilon_i, \theta, p) = \frac{1}{n-1}\sum_{k\neq i} G_{n,ik}(\varepsilon_i, \theta, p)\,V_{ni}(\theta, p)Z_k, \quad a.s.,$$

where $G_{ni}(\varepsilon_i, \theta, p) = (G_{n,ij}(\varepsilon_i, \theta, p),\ j\neq i) \in \{0,1\}^{n-1}$ is the optimal decision given in Theorem 3.2. Therefore $P_{n,ij}(\theta, p)$ can be expressed as

$$P_{n,ij}(\theta, p) = \int \mathbf{1}\Big\{U_{n,ij}(\theta, p) + \frac{2}{n-2}\sum_{k\neq i} G_{n,ik}(\varepsilon_i, \theta, p)\, Z_j'V_{ni}(\theta, p)Z_k \ge \varepsilon_{ij}\Big\} f_{\varepsilon_i}(\varepsilon_i; \theta)\,d\varepsilon_i.$$

From Corollary 3.3, the optimal decision $G_{ni}(\varepsilon_i, \theta, p) = g_{ni}$ for some $g_{ni} \in \{0,1\}^{n-1}$ if and only if $\varepsilon_i \in \mathcal{E}_i(g_{ni}, \theta, p)$, where the set $\mathcal{E}_i(g_{ni}, \theta, p)$ is defined in (S.17):

$$\mathcal{E}_i(g_{ni}, \theta, p) = \big\{\varepsilon_i \in \mathbb{R}^{n-1}:\ g_{ni} \text{ satisfies both (S.13) and (S.16)}\big\}.$$

For $g_{ni} \in \{0,1\}^{n-1}$, the equations in (S.13) define an orthant in $\mathbb{R}^{n-1}$:

$$\varepsilon_{ij}\ \begin{cases} \le\ U_{n,ij}(\theta, p) + \frac{2}{n-2}\sum_{k\neq i} g_{n,ik}\, Z_j'V_{ni}(\theta, p)Z_k & \text{if } g_{n,ij} = 1,\\ >\ U_{n,ij}(\theta, p) + \frac{2}{n-2}\sum_{k\neq i} g_{n,ik}\, Z_j'V_{ni}(\theta, p)Z_k & \text{if } g_{n,ij} = 0,\end{cases} \qquad \forall j\neq i. \qquad (S.25)$$

Both $U_{n,ij}(\theta, p)$ and $V_{ni}(\theta, p)$ are continuous in $\theta$ and $p$, so the boundary of this orthant is continuous in $\theta$ and $p$. Moreover, the inequalities in (S.16) define half-spaces in $\mathbb{R}^{n-1}$ given by the hyperplanes

$$\sum_{j\neq i}\big(g_{n,ij} - g^l_{n,ij}\big)\varepsilon_{ij} \le \sum_{j\neq i}\big(g_{n,ij} - g^l_{n,ij}\big)U_{n,ij}(\theta, p) + \frac{1}{n-2}\sum_{j\neq i}\sum_{k\neq i}\big(g_{n,ij}g_{n,ik} - g^l_{n,ij}g^l_{n,ik}\big)Z_j'V_{ni}(\theta, p)Z_k, \qquad (S.26)$$

for all $g^l_{ni}$ that solve (S.13) with probability 1. While the set of solutions to (S.13) for a given $\varepsilon_i$ could be discontinuous in $\theta$ and $p$ (i.e., some link choices in an optimal $g^l_{ni}$ may switch from 0 to 1 or the opposite as $\theta$ or $p$ changes), this occurs with probability zero because $\varepsilon_i$ follows a continuous distribution by Assumption 1(i). Since the right-hand side of (S.26) is continuous in $\theta$ and $p$, the boundaries of such half-spaces are also continuous in $\theta$ and $p$.

The set $\mathcal{E}_i(g_{ni}, \theta, p)$ is the intersection of the orthant in (S.25) and the half-spaces defined by (S.26). Because continuity is preserved under max and min operations, if two sets have boundaries that are continuous in $\theta$ and $p$, their intersection also has a boundary that is continuous in $\theta$ and $p$. Therefore, the set $\mathcal{E}_i(g_{ni}, \theta, p)$ has a boundary that is continuous in $\theta$ and $p$.

Partitioning the space of $\varepsilon_i$ into the collection of the sets $\mathcal{E}_i(g_{ni}, \theta, p)$ for all $g_{ni} \in \{0,1\}^{n-1}$, we can write $P_{n,ij}(\theta, p)$ as

$$P_{n,ij}(\theta, p) = \sum_{\substack{g_{ni}\in\{0,1\}^{n-1}\\ g_{n,ij}=1}}\int_{\mathcal{E}_i(g_{ni},\theta,p)}\mathbf{1}\Big\{U_{n,ij}(\theta, p) + \frac{2}{n-2}\sum_{k\neq i} g_{n,ik}\, Z_j'V_{ni}(\theta, p)Z_k \ge \varepsilon_{ij}\Big\} f_{\varepsilon_i}(\varepsilon_i; \theta)\,d\varepsilon_i = \sum_{\substack{g_{ni}\in\{0,1\}^{n-1}\\ g_{n,ij}=1}}\int_{\mathcal{E}_i(g_{ni},\theta,p)} f_{\varepsilon_i}(\varepsilon_i; \theta)\,d\varepsilon_i. \qquad (S.27)$$

For each $g_{ni}$, the set $\mathcal{E}_i(g_{ni}, \theta, p)$ has a boundary that is continuous in $\theta$ and $p$, so each integral in the summation in (S.27) is continuous in $\theta$ and $p$ by Assumption 1(i). The proof is complete.

S.2.2 Asymptotic Distribution
In this section, we prove that the asymptotic distribution of $\hat\theta_n$ is as in Theorem 4.2. We first derive the asymptotic properties of $\omega_{ni}$ in a sequence of lemmas. Then we use these lemmas to prove Theorem 4.2.

Asymptotic Properties of $\omega_{ni}(\varepsilon_i)$

In the derivation of the asymptotic properties of $\omega_{ni}(\varepsilon_i)$ we suppress the dependence on $\theta$ and $p_n$ to simplify the notation. Recall that $\omega_{ni}(\varepsilon_i)$ maximizes $\Pi_{ni}(\omega, \varepsilon_i)$:

$$\omega_{ni}(\varepsilon_i) = \arg\max_{\omega\in\mathbb{R}^T}\Pi_{ni}(\omega, \varepsilon_i), \qquad \text{where } \Pi_{ni}(\omega, \varepsilon_i) = \sum_{j\neq i}\Big[U_{n,ij} + \frac{2(n-1)}{n-2} Z_j'\Phi_{ni}\Lambda_{ni}\omega - \varepsilon_{ij}\Big]_+ - \frac{(n-1)^2}{n-2}\,\omega'\Lambda_{ni}\omega.$$

Let $\Pi^*_{ni}(\omega)$ denote the conditional expectation of $\Pi_{ni}(\omega, \varepsilon_i)$ given $X$ and $p_n$:

$$\Pi^*_{ni}(\omega) = \sum_{j\neq i} E\Big[\Big[U_{n,ij} + \frac{2(n-1)}{n-2} Z_j'\Phi_{ni}\Lambda_{ni}\omega - \varepsilon_{ij}\Big]_+ \,\Big|\, X, p_n\Big] - \frac{(n-1)^2}{n-2}\,\omega'\Lambda_{ni}\omega,$$

and $\omega^*_{ni}$ is a maximizer of $\Pi^*_{ni}(\omega)$:

$$\omega^*_{ni} = \arg\max_{\omega\in\mathbb{R}^T}\Pi^*_{ni}(\omega).$$

In the subsequent lemmas, we establish that $\omega_{ni}(\varepsilon_i)$ is consistent for $\omega^*_{ni}$ (Lemma S.5). Moreover, $\omega_{ni}(\varepsilon_i)$ has an asymptotically linear representation (Lemma S.7) and satisfies certain uniformity properties (Lemma S.8). Additional results that are needed to prove these lemmas are in Lemmas S.6 and S.9.

Remark S.1 By Lemma S.1 we have

$$\Lambda_{ni}\,\omega_{ni}(\varepsilon_i) = \frac{1}{n-1}\sum_{j\neq i}\mathbf{1}\Big\{U_{n,ij} + \frac{2(n-1)}{n-2} Z_j'\Phi_{ni}\Lambda_{ni}\,\omega_{ni}(\varepsilon_i) - \varepsilon_{ij} \ge 0\Big\}\Lambda_{ni}\Phi_{ni}'Z_j$$

almost surely. We set $\omega_{ni,t}(\varepsilon_i) = 0$ if $\lambda_{ni,t} = 0$, $t = 1,\ldots,T$, so

$$\omega_{ni}(\varepsilon_i) = \frac{1}{n-1}\sum_{j\neq i}\mathbf{1}\Big\{U_{n,ij} + \frac{2(n-1)}{n-2} Z_j'\Phi_{ni}\Lambda_{ni}\,\omega_{ni}(\varepsilon_i) - \varepsilon_{ij} \ge 0\Big\}\Lambda^+_{ni}\Lambda_{ni}\Phi_{ni}'Z_j,$$

where $\Lambda^+_{ni}$ is the generalized inverse of $\Lambda_{ni}$. Then

$$\|\omega_{ni}(\varepsilon_i)\| \le \max_{j\neq i}\|\Lambda^+_{ni}\Lambda_{ni}\Phi_{ni}'Z_j\| \le \max_{j\neq i}\|\Lambda^+_{ni}\Lambda_{ni}\|\,\|\Phi_{ni}'\|\,\|Z_j\| \le T < \infty.$$

Therefore $\omega_{ni}(\varepsilon_i)$ is bounded, and without loss of generality we can assume that $\omega$ lies in a compact set $\Omega \subseteq \mathbb{R}^T$, as in Assumption 5(i).

Lemma S.5 (Consistency of $\omega_{ni}$) Suppose that Assumptions 1-3 and 5(i)-(ii) are satisfied. For $i = 1,\ldots,n$, $\omega_{ni}(\varepsilon_i)$ is consistent for $\omega^*_{ni}$, i.e., for any $\delta > 0$, $\Pr(\|\omega_{ni}(\varepsilon_i) - \omega^*_{ni}\| > \delta \mid X, p_n) \to 0$ as $n \to \infty$.

Proof.
We follow the proof in Newey and McFadden (1994). Fix $\delta > 0$.
Let $\mathcal{B}_\delta(\omega^*_{ni}) = \{\omega \in \Omega:\ \|\omega - \omega^*_{ni}\| < \delta\}$ be an open $\delta$-ball centered at $\omega^*_{ni}$. If $\Pi^*_{ni}(\omega_{ni}(\varepsilon_i)) > \sup_{\omega\in\Omega\setminus\mathcal{B}_\delta(\omega^*_{ni})}\Pi^*_{ni}(\omega)$, then $\omega_{ni}(\varepsilon_i) \notin \Omega\setminus\mathcal{B}_\delta(\omega^*_{ni})$, or equivalently, $\omega_{ni}(\varepsilon_i) \in \mathcal{B}_\delta(\omega^*_{ni})$. Therefore,

$$\Pr(\|\omega_{ni}(\varepsilon_i) - \omega^*_{ni}\| < \delta \mid X, p_n) \ge \Pr\Big(\Pi^*_{ni}(\omega_{ni}(\varepsilon_i)) > \sup_{\omega\in\Omega\setminus\mathcal{B}_\delta(\omega^*_{ni})}\Pi^*_{ni}(\omega) \,\Big|\, X, p_n\Big)$$
$$= \Pr\Big(\Pi^*_{ni}(\omega^*_{ni}) - \Pi^*_{ni}(\omega_{ni}(\varepsilon_i)) < \Pi^*_{ni}(\omega^*_{ni}) - \sup_{\omega\in\Omega\setminus\mathcal{B}_\delta(\omega^*_{ni})}\Pi^*_{ni}(\omega) \,\Big|\, X, p_n\Big). \qquad (S.28)$$

By Assumption 5(i)-(ii),

$$\frac{1}{n-1}\Big(\Pi^*_{ni}(\omega^*_{ni}) - \sup_{\omega\in\Omega\setminus\mathcal{B}_\delta(\omega^*_{ni})}\Pi^*_{ni}(\omega)\Big) > 0,$$

so the right-hand side of (S.28) goes to 1 if

$$\frac{1}{n-1}\big(\Pi^*_{ni}(\omega^*_{ni}) - \Pi^*_{ni}(\omega_{ni}(\varepsilon_i))\big) \le o_p(1). \qquad (S.29)$$

By the optimality of $\omega_{ni}(\varepsilon_i)$ we have

$$0 \le \Pi^*_{ni}(\omega^*_{ni}) - \Pi^*_{ni}(\omega_{ni}(\varepsilon_i)) = \Pi^*_{ni}(\omega^*_{ni}) - \Pi_{ni}(\omega^*_{ni}, \varepsilon_i) + \Pi_{ni}(\omega^*_{ni}, \varepsilon_i) - \Pi^*_{ni}(\omega_{ni}(\varepsilon_i))$$
$$\le \Pi^*_{ni}(\omega^*_{ni}) - \Pi_{ni}(\omega^*_{ni}, \varepsilon_i) + \Pi_{ni}(\omega_{ni}(\varepsilon_i), \varepsilon_i) - \Pi^*_{ni}(\omega_{ni}(\varepsilon_i)) \le 2\sup_{\omega\in\Omega}\big|\Pi_{ni}(\omega, \varepsilon_i) - \Pi^*_{ni}(\omega)\big|.$$

By the uniform LLN for $\Pi_{ni}(\omega, \varepsilon_i)$ in Lemma S.6,

$$\sup_{\omega\in\Omega}\frac{1}{n-1}\big|\Pi_{ni}(\omega, \varepsilon_i) - \Pi^*_{ni}(\omega)\big| = o_p(1),$$

so (S.29) holds and the consistency is proved.

Lemma S.6 (Uniform LLN for $\Pi_{ni}$) Suppose that Assumptions 1-3 and 5 are satisfied. Then for any $\delta > 0$,

$$\Pr\Big(\sup_{\omega\in\Omega}\frac{1}{n-1}\big|\Pi_{ni}(\omega, \varepsilon_i) - \Pi^*_{ni}(\omega)\big| > \delta \,\Big|\, X, p_n\Big) \to 0 \quad \text{as } n \to \infty. \qquad (S.30)$$

Proof.
Recall that

$$\Pi_{ni}(\omega, \varepsilon_i) = \sum_{j\neq i}\Big[U_{n,ij} + \frac{2(n-1)}{n-2} Z_j'\Phi_{ni}\Lambda_{ni}\omega - \varepsilon_{ij}\Big]_+ - \frac{(n-1)^2}{n-2}\,\omega'\Lambda_{ni}\omega$$

and

$$\Pi^*_{ni}(\omega) = \sum_{j\neq i} E\Big(\Big[U_{n,ij} + \frac{2(n-1)}{n-2} Z_j'\Phi_{ni}\Lambda_{ni}\omega - \varepsilon_{ij}\Big]_+ \,\Big|\, X, p_n\Big) - \frac{(n-1)^2}{n-2}\,\omega'\Lambda_{ni}\omega.$$

Define $\pi_{n,ij}(\omega, \varepsilon_{ij}) = \big[U_{n,ij} + \frac{2(n-1)}{n-2} Z_j'\Phi_{ni}\Lambda_{ni}\omega - \varepsilon_{ij}\big]_+$. Hence,

$$\frac{1}{n-1}\big(\Pi_{ni}(\omega, \varepsilon_i) - \Pi^*_{ni}(\omega)\big) = \frac{1}{n-1}\sum_{j\neq i}\big(\pi_{n,ij}(\omega, \varepsilon_{ij}) - E[\pi_{n,ij}(\omega, \varepsilon_{ij}) \mid X, p_n]\big).$$

By Assumption 5(i),

$$|Z_j'\Phi_{ni}\Lambda_{ni}\omega| \le \|\Phi_{ni}\|\,\|\Lambda_{ni}\|\sup_{\omega\in\Omega}\|\omega\| \le \sqrt{T}\max_{t=1,\ldots,T}\lambda_{ni,t}\sup_{\omega\in\Omega}\|\omega\| \le M < \infty.$$

Therefore for all $\omega \in \Omega$,

$$0 \le \pi_{n,ij}(\omega, \varepsilon_{ij}) \le |U_{n,ij}| + \frac{2(n-1)}{n-2} M + |\varepsilon_{ij}|, \qquad \text{with } E\Big[\Big(|U_{n,ij}| + \frac{2(n-1)}{n-2} M + |\varepsilon_{ij}|\Big)^2 \,\Big|\, X, p_n\Big] < \infty.$$

Also $\pi_{n,ij}(\omega, \varepsilon_{ij})$ is continuous in $\omega$ on the compact set $\Omega$. Therefore the conditions of the uniform LLN for triangular arrays are satisfied (Jennrich, 1969) and (S.30) follows.

Lemma S.7 (Asymptotically linear representation of $\omega_{ni}(\varepsilon_i)$) Suppose that Assumptions 1-3 and 5 are satisfied. For each $i = 1,\ldots,n$, $\omega_{ni}(\varepsilon_i)$ has the asymptotically linear representation

$$\omega_{ni}(\varepsilon_i) - \omega^*_{ni} = \frac{1}{n-1}\sum_{j\neq i}\varphi^{\omega}_{n,ij}(\omega^*_{ni}, \varepsilon_{ij}) + r^{\omega}_{ni}(\varepsilon_i) \qquad (S.31)$$

as $n \to \infty$, with the influence function $\varphi^{\omega}_{n,ij}(\omega^*_{ni}, \varepsilon_{ij}) \in \mathbb{R}^T$ given by

$$\varphi^{\omega}_{n,ij}(\omega^*_{ni}, \varepsilon_{ij}) = -\nabla_{\omega'}\Gamma^*_{ni}(\omega^*_{ni})^+\,\varphi^{\pi}_{n,ij}(\omega^*_{ni}, \varepsilon_{ij}), \qquad (S.32)$$

where the function $\varphi^{\pi}_{n,ij}(\omega, \varepsilon_{ij}) \in \mathbb{R}^T$ is defined by

$$\varphi^{\pi}_{n,ij}(\omega, \varepsilon_{ij}) = \mathbf{1}\Big\{U_{n,ij} + \frac{2(n-1)}{n-2} Z_j'\Phi_{ni}\Lambda_{ni}\omega - \varepsilon_{ij} \ge 0\Big\}\Lambda_{ni}\Phi_{ni}'Z_j - \Lambda_{ni}\omega, \qquad (S.33)$$

and

$$\nabla_{\omega'}\Gamma^*_{ni}(\omega^*_{ni}) = \frac{2}{n-2}\sum_{j\neq i} f_\varepsilon\Big(U_{n,ij} + \frac{2(n-1)}{n-2} Z_j'\Phi_{ni}\Lambda_{ni}\omega^*_{ni}\Big)\Lambda_{ni}\Phi_{ni}'Z_jZ_j'\Phi_{ni}\Lambda_{ni} - \Lambda_{ni}, \qquad (S.34)$$

which by Assumption 5(iii) has the generalized inverse

$$\nabla_{\omega'}\Gamma^*_{ni}(\omega^*_{ni})^+ = \Lambda^+_{ni}\Big(\frac{2}{n-2}\sum_{j\neq i} f_\varepsilon\Big(U_{n,ij} + \frac{2(n-1)}{n-2} Z_j'\Phi_{ni}\Lambda_{ni}\omega^*_{ni}\Big)\Lambda_{ni}\Phi_{ni}'Z_jZ_j'\Phi_{ni} - I_T\Big)^{-1}.$$

Moreover, the remainder $r^{\omega}_{ni}(\varepsilon_i)$ in (S.31) satisfies

$$r^{\omega}_{ni}(\varepsilon_i) = o_p\Big(\frac{1}{\sqrt{n}}\Big). \qquad (S.35)$$

Proof.
Define $\Gamma_{ni}(\omega, \varepsilon_i) \in \mathbb{R}^T$ by

$$\Gamma_{ni}(\omega, \varepsilon_i) = \frac{1}{n-1}\sum_{j\neq i}\mathbf{1}\Big\{U_{n,ij} + \frac{2(n-1)}{n-2} Z_j'\Phi_{ni}\Lambda_{ni}\omega - \varepsilon_{ij} \ge 0\Big\}\Lambda_{ni}\Phi_{ni}'Z_j - \Lambda_{ni}\omega = \frac{1}{n-1}\sum_{j\neq i}\varphi^{\pi}_{n,ij}(\omega, \varepsilon_{ij}),$$

where $\varphi^{\pi}_{n,ij}(\omega, \varepsilon_{ij}) \in \mathbb{R}^T$ is defined in (S.33). By Lemma S.1, $\omega_{ni}(\varepsilon_i)$ satisfies the first-order condition

$$\Gamma_{ni}(\omega_{ni}(\varepsilon_i), \varepsilon_i) = 0, \quad a.s. \qquad (S.36)$$

Let $\Gamma^*_{ni}(\omega) \in \mathbb{R}^T$ be the conditional expectation of $\Gamma_{ni}(\omega, \varepsilon_i)$:

$$\Gamma^*_{ni}(\omega) = E[\Gamma_{ni}(\omega, \varepsilon_i) \mid X, p_n] = \frac{1}{n-1}\sum_{j\neq i} E\big[\varphi^{\pi}_{n,ij}(\omega, \varepsilon_{ij}) \,\big|\, X, p_n\big],$$

where

$$E\big[\varphi^{\pi}_{n,ij}(\omega, \varepsilon_{ij}) \,\big|\, X, p_n\big] = F_\varepsilon\Big(U_{n,ij} + \frac{2(n-1)}{n-2} Z_j'\Phi_{ni}\Lambda_{ni}\omega\Big)\Lambda_{ni}\Phi_{ni}'Z_j - \Lambda_{ni}\omega.$$

By Assumption 5(ii), $\Pi^*_{ni}(\omega)$ is maximized at $\omega^*_{ni}$, so $\omega^*_{ni}$ satisfies the first-order condition

$$\Gamma^*_{ni}(\omega^*_{ni}) = 0. \qquad (S.37)$$

By a Taylor expansion of $\Gamma^*_{ni}(\omega)$ at $\omega^*_{ni}$ and the consistency of $\omega_{ni}(\varepsilon_i)$ in Lemma S.5, we have

$$\Gamma^*_{ni}(\omega_{ni}(\varepsilon_i)) = \nabla_{\omega'}\Gamma^*_{ni}(\omega^*_{ni})(\omega_{ni}(\varepsilon_i) - \omega^*_{ni}) + O_p\big(\|\omega_{ni}(\varepsilon_i) - \omega^*_{ni}\|^2\big), \qquad (S.38)$$

where $\nabla_{\omega'}\Gamma^*_{ni}(\omega^*_{ni})$ is the Jacobian matrix of $\Gamma^*_{ni}(\omega)$ at $\omega^*_{ni}$ defined in (S.34), which we rewrite as $\nabla_{\omega'}\Gamma^*_{ni}(\omega^*_{ni}) = H_{ni}(\omega^*_{ni})\Lambda_{ni}$ with

$$H_{ni}(\omega^*_{ni}) = \frac{2}{n-2}\sum_{j\neq i} f_\varepsilon\Big(U_{n,ij} + \frac{2(n-1)}{n-2} Z_j'\Phi_{ni}\Lambda_{ni}\omega^*_{ni}\Big)\Lambda_{ni}\Phi_{ni}'Z_jZ_j'\Phi_{ni} - I_T.$$

By Assumption 5(iii), $H_{ni}(\omega^*_{ni})$ is nonsingular. There exists a constant $c > 0$ such that $\|\nabla_{\omega'}\Gamma^*_{ni}(\omega^*_{ni})(\omega - \omega^*_{ni})\| \ge c\|\omega - \omega^*_{ni}\|$ for every $\omega$. This is because

$$\|\nabla_{\omega'}\Gamma^*_{ni}(\omega^*_{ni})(\omega - \omega^*_{ni})\|^2 = (\omega - \omega^*_{ni})'\Lambda_{ni}H_{ni}(\omega^*_{ni})'H_{ni}(\omega^*_{ni})\Lambda_{ni}(\omega - \omega^*_{ni})$$
$$\ge \lambda_{\min}\big(H_{ni}(\omega^*_{ni})'H_{ni}(\omega^*_{ni})\big)(\omega - \omega^*_{ni})'\Lambda_{ni}^2(\omega - \omega^*_{ni}) \ge \lambda_{\min}\big(H_{ni}(\omega^*_{ni})'H_{ni}(\omega^*_{ni})\big)\lambda_{\min}(V_{ni}'V_{ni})\|\omega - \omega^*_{ni}\|^2,$$

where $\lambda_{\min}(H_{ni}(\omega^*_{ni})'H_{ni}(\omega^*_{ni}))$ is the smallest eigenvalue of $H_{ni}(\omega^*_{ni})'H_{ni}(\omega^*_{ni})$, which is positive because $H_{ni}(\omega^*_{ni})$ is nonsingular, and $\lambda_{\min}(V_{ni}'V_{ni})$ is the smallest among the eigenvalues of $V_{ni}'V_{ni}$ that are not zero, which is also positive (recall that the components of $\omega$ corresponding to zero eigenvalues of $\Lambda_{ni}$ are set to zero). Combining this with the Taylor expansion of $\Gamma^*_{ni}(\omega_{ni}(\varepsilon_i))$, we obtain

$$\|\Gamma^*_{ni}(\omega_{ni}(\varepsilon_i))\| \ge \|\omega_{ni}(\varepsilon_i) - \omega^*_{ni}\|\,(c + o_p(1)). \qquad (S.39)$$

By (S.36) and (S.37), we can write $\Gamma^*_{ni}(\omega_{ni}(\varepsilon_i))$ as

$$\Gamma^*_{ni}(\omega_{ni}(\varepsilon_i)) = -\Gamma_{ni}(\omega^*_{ni}, \varepsilon_i) - \big(\Gamma_{ni}(\omega_{ni}(\varepsilon_i), \varepsilon_i) - \Gamma^*_{ni}(\omega_{ni}(\varepsilon_i)) - (\Gamma_{ni}(\omega^*_{ni}, \varepsilon_i) - \Gamma^*_{ni}(\omega^*_{ni}))\big), \quad a.s. \qquad (S.40)$$

We apply the Lindeberg-Feller CLT to show that the first term on the right-hand side satisfies

$$\Gamma_{ni}(\omega^*_{ni}, \varepsilon_i) = O_p\Big(\frac{1}{\sqrt{n}}\Big). \qquad (S.41)$$

To verify the Lindeberg condition, define the random vector

$$Y^{\gamma}_{n,ij} = \frac{1}{\sqrt{n-1}}\Big[\mathbf{1}\Big\{U_{n,ij} + \frac{2(n-1)}{n-2} Z_j'\Phi_{ni}\Lambda_{ni}\omega^*_{ni} - \varepsilon_{ij} \ge 0\Big\}\Lambda_{ni}\Phi_{ni}'Z_j - \Lambda_{ni}\omega^*_{ni}\Big],$$

so that $\Gamma_{ni}(\omega^*_{ni}, \varepsilon_i) = \frac{1}{\sqrt{n-1}}\sum_{j\neq i} Y^{\gamma}_{n,ij}$; by (S.37) the sum has conditional mean 0. By the Cramér-Wold device it suffices to show that $a'\sum_{j\neq i} Y^{\gamma}_{n,ij}$ satisfies the Lindeberg condition for any $T\times 1$ vector $a$.
The Lindeberg condition is that for any $\xi > 0$,

$$\lim_{n\to\infty}\frac{1}{a'\Sigma^{\gamma}_{ni}a}\sum_{j\neq i} E\Big[(a'Y^{\gamma}_{n,ij})^2\,\mathbf{1}\Big\{|a'Y^{\gamma}_{n,ij}| \ge \xi\sqrt{a'\Sigma^{\gamma}_{ni}a}\Big\}\,\Big|\, X, p_n\Big] = 0,$$

with

$$\Sigma^{\gamma}_{ni} = \sum_{j\neq i}\operatorname{Var}\big(Y^{\gamma}_{n,ij} \,\big|\, X, p_n\big) = \frac{1}{n-1}\sum_{j\neq i} F_\varepsilon\Big(U_{n,ij} + \frac{2(n-1)}{n-2} Z_j'\Phi_{ni}\Lambda_{ni}\omega^*_{ni}\Big)\Big(1 - F_\varepsilon\Big(U_{n,ij} + \frac{2(n-1)}{n-2} Z_j'\Phi_{ni}\Lambda_{ni}\omega^*_{ni}\Big)\Big)\Lambda_{ni}\Phi_{ni}'Z_jZ_j'\Phi_{ni}\Lambda_{ni}.$$
We have

$$\sum_{j\neq i} E\Big[(a'Y^{\gamma}_{n,ij})^2\,\mathbf{1}\Big\{|a'Y^{\gamma}_{n,ij}| \ge \xi\sqrt{a'\Sigma^{\gamma}_{ni}a}\Big\}\,\Big|\, X, p_n\Big] \le E\Big[\sum_{j\neq i}(a'Y^{\gamma}_{n,ij})^2\,\mathbf{1}\Big\{\frac{\max_{j\neq i}|a'Y^{\gamma}_{n,ij}|}{\sqrt{a'\Sigma^{\gamma}_{ni}a}} \ge \xi\Big\}\,\Big|\, X, p_n\Big].$$

Note that $\sum_{j\neq i}(a'Y^{\gamma}_{n,ij})^2$ has a finite expectation and is therefore $O_p(1)$. Hence if

$$\frac{\max_{j\neq i}|a'Y^{\gamma}_{n,ij}|}{\sqrt{a'\Sigma^{\gamma}_{ni}a}} = o_p(1), \qquad (S.42)$$

then

$$\sum_{j\neq i}(a'Y^{\gamma}_{n,ij})^2\,\mathbf{1}\Big\{\frac{\max_{j\neq i}|a'Y^{\gamma}_{n,ij}|}{\sqrt{a'\Sigma^{\gamma}_{ni}a}} \ge \xi\Big\} = O_p(1)\,o_p(1) = o_p(1).$$

Finally, this random variable is bounded by $\sum_{j\neq i}(a'Y^{\gamma}_{n,ij})^2$, which has a finite expectation. We conclude that by dominated convergence the Lindeberg condition is satisfied if (S.42) holds.

By Chebyshev's inequality,

$$\Pr\Big(\frac{\max_{j\neq i}|a'Y^{\gamma}_{n,ij}|}{\sqrt{a'\Sigma^{\gamma}_{ni}a}} \ge \xi \,\Big|\, X, p_n\Big) \le \frac{1}{\xi^2\,a'\Sigma^{\gamma}_{ni}a}\,E\Big[\max_{j\neq i}(a'Y^{\gamma}_{n,ij})^2 \,\Big|\, X, p_n\Big].$$

The random variable $a'Y^{\gamma}_{n,ij}$ has a support bounded by

$$|a'Y^{\gamma}_{n,ij}| \le \frac{\|a\|\,\|\Lambda_{ni}\|\,(\sqrt{T} + \|\omega^*_{ni}\|)}{\sqrt{n-1}} \le \frac{M_i}{\sqrt{n-1}}, \qquad M_i < \infty.$$

Let $\|Z\|_{\psi|X,p_n}$ be the conditional Orlicz norm of a random variable $Z$ given $X$ and $p_n$ for the convex function $\psi(z) = e^z - 1$. (The conditional Orlicz norm is defined by $\|Z\|_{\psi|X,p_n} = \inf\{C > 0:\ E(\psi(|Z|/C) \mid X, p_n) \le 1\}$; since $z \le \psi(z)$, we have $E[|Z|/\|Z\|_{\psi|X,p_n} \mid X, p_n] \le E[\psi(|Z|/\|Z\|_{\psi|X,p_n}) \mid X, p_n] \le 1$, and hence $E[|Z| \mid X, p_n] \le \|Z\|_{\psi|X,p_n}$.) So

$$E\Big[\max_{j\neq i}(a'Y^{\gamma}_{n,ij})^2 \,\Big|\, X, p_n\Big] \le \Big\|\max_{j\neq i}(a'Y^{\gamma}_{n,ij})^2\Big\|_{\psi|X,p_n}.$$

By the maximal inequality in Lemma 2.2.2 in van der Vaart and Wellner (1996) we have the bound

$$\Big\|\max_{j\neq i}(a'Y^{\gamma}_{n,ij})^2\Big\|_{\psi|X,p_n} \le K\ln(n+1)\max_{j\neq i}\big\|(a'Y^{\gamma}_{n,ij})^2\big\|_{\psi|X,p_n}.$$

By Hoeffding's inequality for bounded random variables (Boucheron, Lugosi, and Massart, 2013, Theorem 2.8),

$$\Pr\big((a'Y^{\gamma}_{n,ij})^2 \ge t \,\big|\, X, p_n\big) = \Pr\big(a'Y^{\gamma}_{n,ij} \ge \sqrt{t} \,\big|\, X, p_n\big) + \Pr\big(-a'Y^{\gamma}_{n,ij} \ge \sqrt{t} \,\big|\, X, p_n\big) \le 2\exp\Big(-\frac{(n-1)t}{cM_i^2}\Big)$$

for a constant $c > 0$, so that by Lemma 2.2.1 in van der Vaart and Wellner (1996),

$$\big\|(a'Y^{\gamma}_{n,ij})^2\big\|_{\psi|X,p_n} \le \frac{C M_i^2}{n-1}$$

for a constant $C$. Combining these results,

$$\frac{1}{\xi^2\,a'\Sigma^{\gamma}_{ni}a}\,E\Big[\max_{j\neq i}(a'Y^{\gamma}_{n,ij})^2 \,\Big|\, X, p_n\Big] \le \frac{1}{\xi^2\,a'\Sigma^{\gamma}_{ni}a}\,K\ln(n+1)\,\frac{C M_i^2}{n-1} = o(1),$$

so the Lindeberg condition holds.

As for the second term on the right-hand side of (S.40), note that

$$\Gamma_{ni}(\omega, \varepsilon_i) - \Gamma^*_{ni}(\omega) = \frac{1}{n-1}\sum_{j\neq i}\big(\varphi^{\gamma}_{n,ij}(\omega, \varepsilon_{ij}) - E\big[\varphi^{\gamma}_{n,ij}(\omega, \varepsilon_{ij}) \,\big|\, X, p_n\big]\big),$$
with $\varphi^{\gamma}_{n,ij}(\omega, \varepsilon_{ij})$ defined by

$$\varphi^{\gamma}_{n,ij}(\omega, \varepsilon_{ij}) = \mathbf{1}\Big\{U_{n,ij} + \frac{2(n-1)}{n-2} Z_j'\Phi_{ni}\Lambda_{ni}\omega - \varepsilon_{ij} \ge 0\Big\}\Lambda_{ni}\Phi_{ni}'Z_j. \qquad (S.43)$$

Define the empirical process

$$\mathbb{G}_n\varphi^{\gamma}_{ni}(\omega, \varepsilon_i) = \sqrt{n-1}\,\big(\Gamma_{ni}(\omega, \varepsilon_i) - \Gamma^*_{ni}(\omega)\big) = \frac{1}{\sqrt{n-1}}\sum_{j\neq i}\big(\varphi^{\gamma}_{n,ij}(\omega, \varepsilon_{ij}) - E\big[\varphi^{\gamma}_{n,ij}(\omega, \varepsilon_{ij}) \,\big|\, X, p_n\big]\big), \qquad \omega \in \Omega, \qquad (S.44)$$

so the second term on the right-hand side of (S.40) can be written as

$$\Gamma_{ni}(\omega_{ni}(\varepsilon_i), \varepsilon_i) - \Gamma^*_{ni}(\omega_{ni}(\varepsilon_i)) - (\Gamma_{ni}(\omega^*_{ni}, \varepsilon_i) - \Gamma^*_{ni}(\omega^*_{ni})) = \frac{1}{\sqrt{n-1}}\big(\mathbb{G}_n\varphi^{\gamma}_{ni}(\omega_{ni}(\varepsilon_i), \varepsilon_i) - \mathbb{G}_n\varphi^{\gamma}_{ni}(\omega^*_{ni}, \varepsilon_i)\big). \qquad (S.45)$$

In Lemma S.9(i) we show that

$$\mathbb{G}_n\varphi^{\gamma}_{ni}(\omega_{ni}(\varepsilon_i), \varepsilon_i) - \mathbb{G}_n\varphi^{\gamma}_{ni}(\omega^*_{ni}, \varepsilon_i) = o_p(1). \qquad (S.46)$$

Hence, the second term on the right-hand side of (S.40) is $o_p(n^{-1/2})$:

$$\Gamma_{ni}(\omega_{ni}(\varepsilon_i), \varepsilon_i) - \Gamma^*_{ni}(\omega_{ni}(\varepsilon_i)) - (\Gamma_{ni}(\omega^*_{ni}, \varepsilon_i) - \Gamma^*_{ni}(\omega^*_{ni})) = o_p\Big(\frac{1}{\sqrt{n}}\Big). \qquad (S.47)$$

Applying (S.39), (S.41) and (S.47) to (S.40) we obtain

$$\|\omega_{ni}(\varepsilon_i) - \omega^*_{ni}\|\,(c + o_p(1)) \le O_p\Big(\frac{1}{\sqrt{n}}\Big) + o_p\Big(\frac{1}{\sqrt{n}}\Big). \qquad (S.48)$$

This implies that

$$\omega_{ni}(\varepsilon_i) - \omega^*_{ni} = O_p\Big(\frac{1}{\sqrt{n}}\Big), \qquad (S.49)$$

i.e., $\omega_{ni}(\varepsilon_i)$ converges to $\omega^*_{ni}$ at the rate $n^{-1/2}$.

Combining (S.38), (S.40), and (S.47) yields

$$\nabla_{\omega'}\Gamma^*_{ni}(\omega^*_{ni})(\omega_{ni}(\varepsilon_i) - \omega^*_{ni}) = -\Gamma_{ni}(\omega^*_{ni}, \varepsilon_i) + o_p\Big(\frac{1}{\sqrt{n}}\Big).$$
By Assumption 5(iii),

$$\nabla_{\omega'}\Gamma^*_{ni}(\omega^*_{ni})^+\,\nabla_{\omega'}\Gamma^*_{ni}(\omega^*_{ni})(\omega - \omega^*_{ni}) = \Lambda^+_{ni}\Lambda_{ni}(\omega - \omega^*_{ni}) = \omega - \omega^*_{ni}.$$

Multiplying both sides by $\nabla_{\omega'}\Gamma^*_{ni}(\omega^*_{ni})^+$ we obtain

$$\omega_{ni}(\varepsilon_i) - \omega^*_{ni} = -\nabla_{\omega'}\Gamma^*_{ni}(\omega^*_{ni})^+\,\Gamma_{ni}(\omega^*_{ni}, \varepsilon_i) + r^{\omega}_{ni}(\varepsilon_i) = -\frac{1}{n-1}\sum_{j\neq i}\nabla_{\omega'}\Gamma^*_{ni}(\omega^*_{ni})^+\,\varphi^{\pi}_{n,ij}(\omega^*_{ni}, \varepsilon_{ij}) + r^{\omega}_{ni}(\varepsilon_i) \qquad (S.50)$$

with $r^{\omega}_{ni}(\varepsilon_i) = o_p(1/\sqrt{n})$. The proof is complete.

Lemma S.8 (Uniform Properties of $\omega_{ni}(\varepsilon_i)$) Suppose that Assumptions 1-3 and 5 are satisfied. Then (i) $\omega_{ni}(\varepsilon_i)$ satisfies

$$\max_{1\le i\le n}\|\omega_{ni}(\varepsilon_i) - \omega^*_{ni}\| = o_p\big(n^{-1/4}\big).$$

(ii) The remainder $r^{\omega}_{ni}(\varepsilon_i)$ defined in Lemma S.7 satisfies

$$\max_{1\le i\le n}\|r^{\omega}_{ni}(\varepsilon_i)\| = o_p\Big(\frac{1}{\sqrt{n}}\Big).$$

Proof.
Part (i): By Markov's inequality, for any $\delta > 0$,

$$\Pr\Big(\max_{1\le i\le n} n^{1/4}\|\omega_{ni}(\varepsilon_i) - \omega^*_{ni}\| > \delta \,\Big|\, X, p_n\Big) \le \frac{n^{1/4}}{\delta}\,E\Big[\max_{1\le i\le n}\|\omega_{ni}(\varepsilon_i) - \omega^*_{ni}\| \,\Big|\, X, p_n\Big].$$

Let $\|\cdot\|_{\psi_2|X,p_n}$ be the conditional Orlicz norm given $X$ and $p_n$ for the convex function $\psi_2(z) = e^{z^2} - 1$. By $E[|Z| \mid X, p_n] \le \|Z\|_{\psi_2|X,p_n}$ for any random variable $Z$ and the maximal inequality in Lemma 2.2.2 in van der Vaart and Wellner (1996), we derive

$$E\Big[\max_{1\le i\le n}\|\omega_{ni}(\varepsilon_i) - \omega^*_{ni}\| \,\Big|\, X, p_n\Big] \le \Big\|\max_{1\le i\le n}\|\omega_{ni}(\varepsilon_i) - \omega^*_{ni}\|\Big\|_{\psi_2|X,p_n} \le K\ln(n+1)\max_{1\le i\le n}\big\|\|\omega_{ni}(\varepsilon_i) - \omega^*_{ni}\|\big\|_{\psi_2|X,p_n},$$

where $K$ is a constant. Let $\|\cdot\|_{\psi_1|X,p_n}$ be the conditional Orlicz norm given $X$ and $p_n$ for the convex function $\psi_1(z) = e^{z} - 1$. For any random variable $Z$ and constant $C > 0$, we have $E[\psi_2(|Z|/C) \mid X, p_n] = E[\psi_1(Z^2/C^2) \mid X, p_n]$, so $\|Z\|^2_{\psi_2|X,p_n} = \|Z^2\|_{\psi_1|X,p_n}$; hence the $\psi_2$-norms below can be computed from $\psi_1$-norms of squared variables.

From the proof of Lemma S.7, the remainder $r^{\omega}_{ni}(\varepsilon_i)$ is

$$r^{\omega}_{ni}(\varepsilon_i) = \nabla_{\omega'}\Gamma^*_{ni}(\omega^*_{ni})^+\Big(O_p\big(\|\omega_{ni}(\varepsilon_i) - \omega^*_{ni}\|^2\big) + \frac{1}{\sqrt{n-1}}\big(\mathbb{G}_n\varphi^{\gamma}_{ni}(\omega_{ni}(\varepsilon_i), \varepsilon_i) - \mathbb{G}_n\varphi^{\gamma}_{ni}(\omega^*_{ni}, \varepsilon_i)\big)\Big). \qquad (S.51)$$

Therefore, from (S.50) and (S.51), we obtain

$$\omega_{ni}(\varepsilon_i) - \omega^*_{ni} = \frac{1}{n-1}\sum_{j\neq i}\varphi^{\omega}_{n,ij}(\omega^*_{ni}, \varepsilon_{ij}) + \nabla_{\omega'}\Gamma^*_{ni}(\omega^*_{ni})^+\Big(o_p\big(\|\omega_{ni}(\varepsilon_i) - \omega^*_{ni}\|\big) + \frac{1}{\sqrt{n-1}}\big(\mathbb{G}_n\varphi^{\gamma}_{ni}(\omega_{ni}(\varepsilon_i), \varepsilon_i) - \mathbb{G}_n\varphi^{\gamma}_{ni}(\omega^*_{ni}, \varepsilon_i)\big)\Big).$$

By the triangle inequality for the Orlicz norm and the boundedness of the inverse Jacobian $\nabla_{\omega'}\Gamma^*_{ni}(\omega^*_{ni})^+$ we obtain

$$\big\|(\omega_{ni}(\varepsilon_i) - \omega^*_{ni})(1 + o_p(1))\big\|_{\psi_2|X,p_n} \le \Big\|\frac{1}{n-1}\sum_{j\neq i}\varphi^{\omega}_{n,ij}(\omega^*_{ni}, \varepsilon_{ij})\Big\|_{\psi_2|X,p_n} + \frac{1}{\sqrt{n-1}}\big\|\nabla_{\omega'}\Gamma^*_{ni}(\omega^*_{ni})^+\big\|\,\big\|\mathbb{G}_n\varphi^{\gamma}_{ni}(\omega_{ni}(\varepsilon_i), \varepsilon_i) - \mathbb{G}_n\varphi^{\gamma}_{ni}(\omega^*_{ni}, \varepsilon_i)\big\|_{\psi_2|X,p_n}. \qquad (S.52)$$

Note that $\|(\omega_{ni}(\varepsilon_i) - \omega^*_{ni})(1 + o_p(1))\|_{\psi_2|X,p_n} = \|\omega_{ni}(\varepsilon_i) - \omega^*_{ni}\|_{\psi_2|X,p_n}(1 + o(1))$.

(The triangle inequality for the Orlicz norm: take two random variables $X$ and $Y$. For any $\varepsilon > 0$, there exist $u$ and $v$ such that $u < \|X\|_{\psi|X,p_n} + \varepsilon$, $v < \|Y\|_{\psi|X,p_n} + \varepsilon$, and $\max\{E[\psi(|X|/u) \mid X, p_n],\ E[\psi(|Y|/v) \mid X, p_n]\} \le 1$. Because $\psi$ is non-decreasing and convex, we have

$$\psi\Big(\frac{|X+Y|}{u+v}\Big) \le \psi\Big(\frac{|X|+|Y|}{u+v}\Big) = \psi\Big(\frac{u}{u+v}\cdot\frac{|X|}{u} + \frac{v}{u+v}\cdot\frac{|Y|}{v}\Big) \le \frac{u}{u+v}\psi\Big(\frac{|X|}{u}\Big) + \frac{v}{u+v}\psi\Big(\frac{|Y|}{v}\Big).$$

Hence $u + v < \|X\|_{\psi|X,p_n} + \|Y\|_{\psi|X,p_n} + 2\varepsilon$ and $E[\psi(|X+Y|/(u+v)) \mid X, p_n] \le 1$. By the definition of the Orlicz norm, $\|X+Y\|_{\psi|X,p_n} \le u + v$, which proves $\|X+Y\|_{\psi|X,p_n} \le \|X\|_{\psi|X,p_n} + \|Y\|_{\psi|X,p_n}$. The factor $1+o(1)$ uses that for any bounded random variable $Z$, $\|Z o_p(1)\|_{\psi|X,p_n} = o(\|Z\|_{\psi|X,p_n})$: if there were $M < \infty$ such that $\|Z\|_{\psi|X,p_n}/(\|Zo_p(1)\|_{\psi|X,p_n} - \delta_n) \le M$ for $n$ sufficiently large and some sequence $\delta_n \downarrow 0$, then, since $|Zo_p(1)|/\|Z\|_{\psi|X,p_n} \stackrel{p}{\to} 0$ and by dominated convergence,

$$1 < E\Big[\psi\Big(\frac{|Zo_p(1)|}{\|Zo_p(1)\|_{\psi|X,p_n} - \delta_n}\Big)\,\Big|\, X, p_n\Big] = E\Big[\psi\Big(\frac{|Zo_p(1)|}{\|Z\|_{\psi|X,p_n}}\cdot\frac{\|Z\|_{\psi|X,p_n}}{\|Zo_p(1)\|_{\psi|X,p_n} - \delta_n}\Big)\,\Big|\, X, p_n\Big] \le E\Big[\psi\Big(\frac{|Zo_p(1)|}{\|Z\|_{\psi|X,p_n}}\cdot M\Big)\,\Big|\, X, p_n\Big] \to 0,$$

a contradiction; hence $(\|Zo_p(1)\|_{\psi|X,p_n} - \delta_n)/\|Z\|_{\psi|X,p_n} = o(1)$, so $\|Zo_p(1)\|_{\psi|X,p_n} = o(\|Z\|_{\psi|X,p_n})$.)

Consider the first term on the right-hand side of (S.52). Recall that the influence function $\varphi^{\omega}_{n,ij}(\omega^*_{ni}, \varepsilon_{ij})$ is the $T\times 1$ vector

$$\varphi^{\omega}_{n,ij}(\omega^*_{ni}, \varepsilon_{ij}) = -\nabla_{\omega'}\Gamma^*_{ni}(\omega^*_{ni})^+\Big(\mathbf{1}\Big\{U_{n,ij} + \frac{2(n-1)}{n-2} Z_j'\Phi_{ni}\Lambda_{ni}\omega^*_{ni} - \varepsilon_{ij} \ge 0\Big\}\Lambda_{ni}\Phi_{ni}'Z_j - \Lambda_{ni}\omega^*_{ni}\Big).$$

Let $\varphi^{\omega}_{n,ij,t}(\omega^*_{ni}, \varepsilon_{ij})$ denote the $t$-th component of $\varphi^{\omega}_{n,ij}(\omega^*_{ni}, \varepsilon_{ij})$, $t = 1,\ldots,T$. Note that

$$\Big\|\frac{1}{n-1}\sum_{j\neq i}\varphi^{\omega}_{n,ij}(\omega^*_{ni}, \varepsilon_{ij})\Big\| = \sqrt{\sum_{t=1}^T\Big(\frac{1}{n-1}\sum_{j\neq i}\varphi^{\omega}_{n,ij,t}(\omega^*_{ni}, \varepsilon_{ij})\Big)^2} \le \sum_{t=1}^T\Big|\frac{1}{n-1}\sum_{j\neq i}\varphi^{\omega}_{n,ij,t}(\omega^*_{ni}, \varepsilon_{ij})\Big|,$$

so for any $x > 0$,

$$\Pr\Big(\Big\|\frac{1}{n-1}\sum_{j\neq i}\varphi^{\omega}_{n,ij}(\omega^*_{ni}, \varepsilon_{ij})\Big\| > x \,\Big|\, X, p_n\Big) \le \sum_{t=1}^T\Pr\Big(\Big|\frac{1}{n-1}\sum_{j\neq i}\varphi^{\omega}_{n,ij,t}(\omega^*_{ni}, \varepsilon_{ij})\Big| > \frac{x}{T} \,\Big|\, X, p_n\Big).$$

It is clear that for any $t = 1,\ldots,T$ and $i, j = 1,\ldots,n$,

$$|\varphi^{\omega}_{n,ij,t}(\omega^*_{ni}, \varepsilon_{ij})| \le \|\varphi^{\omega}_{n,ij}(\omega^*_{ni}, \varepsilon_{ij})\| \le \big\|\nabla_{\omega'}\Gamma^*_{ni}(\omega^*_{ni})^+\big\|\big(\|\Lambda_{ni}\Phi_{ni}'Z_j\| + \|\Lambda_{ni}\|\,\|\omega^*_{ni}\|\big) \le M_{n,ij} \le \bar M < \infty.$$

By Hoeffding's inequality for bounded random variables (Boucheron, Lugosi, and Massart, 2013, Theorem 2.8),

$$\Pr\Big(\Big|\frac{1}{n-1}\sum_{j\neq i}\varphi^{\omega}_{n,ij,t}(\omega^*_{ni}, \varepsilon_{ij})\Big| > \frac{x}{T} \,\Big|\, X, p_n\Big) \le 2\exp\Big(-\frac{(n-1)x^2}{2T^2\bar M^2}\Big),$$

so

$$\Pr\Big(\Big\|\frac{1}{n-1}\sum_{j\neq i}\varphi^{\omega}_{n,ij}(\omega^*_{ni}, \varepsilon_{ij})\Big\| > x \,\Big|\, X, p_n\Big) \le 2T\exp\Big(-\frac{(n-1)x^2}{2T^2\bar M^2}\Big).$$

Hence, by Lemma 2.2.1 in van der Vaart and Wellner (1996),

$$\Big\|\frac{1}{n-1}\sum_{j\neq i}\varphi^{\omega}_{n,ij}(\omega^*_{ni}, \varepsilon_{ij})\Big\|_{\psi_2|X,p_n} \le \frac{\sqrt{2(1+2T)}\,T\bar M}{\sqrt{n-1}}.$$

From (S.64) in the proof of Lemma S.9 we see that

$$\big\|\mathbb{G}_n\varphi^{\gamma}_{ni}(\omega_{ni}(\varepsilon_i), \varepsilon_i) - \mathbb{G}_n\varphi^{\gamma}_{ni}(\omega^*_{ni}, \varepsilon_i)\big\|_{\psi_1|X,p_n} = o(1).$$

Following the proof of (S.64) and applying Theorems 2.14.5 and 2.14.1 in van der Vaart and Wellner (1996) for $p = 2$, we can derive similarly

$$\big\|\mathbb{G}_n\varphi^{\gamma}_{ni}(\omega_{ni}(\varepsilon_i), \varepsilon_i) - \mathbb{G}_n\varphi^{\gamma}_{ni}(\omega^*_{ni}, \varepsilon_i)\big\|_{\psi_2|X,p_n} = o(1),$$

so the second term on the right-hand side of (S.52) is $o(1)/\sqrt{n-1}$. Combining the results yields

$$\Pr\Big(\max_{1\le i\le n} n^{1/4}\|\omega_{ni}(\varepsilon_i) - \omega^*_{ni}\| > \delta \,\Big|\, X, p_n\Big) \le \frac{K n^{1/4}\ln(n+1)}{\delta}\Big(\frac{\sqrt{2(1+2T)}\,T\bar M}{\sqrt{n-1}} + \frac{o(1)}{\sqrt{n-1}}\Big) = o(1).$$

We conclude that $\max_{1\le i\le n}\|\omega_{ni}(\varepsilon_i) - \omega^*_{ni}\| = o_p(n^{-1/4})$.

Part (ii): From (S.38), (S.40), (S.45), and (S.50) in Lemma S.7, the remainder $r^{\omega}_{ni}(\varepsilon_i)$ is given by

$$r^{\omega}_{ni}(\varepsilon_i) = \nabla_{\omega'}\Gamma^*_{ni}(\omega^*_{ni})^+\Big(O_p\big(\|\omega_{ni}(\varepsilon_i) - \omega^*_{ni}\|^2\big) + \frac{1}{\sqrt{n-1}}\big(\mathbb{G}_n\varphi^{\gamma}_{ni}(\omega_{ni}(\varepsilon_i), \varepsilon_i) - \mathbb{G}_n\varphi^{\gamma}_{ni}(\omega^*_{ni}, \varepsilon_i)\big)\Big).$$

It is clear that $\max_{1\le i\le n}\|\nabla_{\omega'}\Gamma^*_{ni}(\omega^*_{ni})^+\| \le \bar M < \infty$. By Lemma S.9(ii),

$$\max_{1\le i\le n}\big\|\mathbb{G}_n\varphi^{\gamma}_{ni}(\omega_{ni}(\varepsilon_i), \varepsilon_i) - \mathbb{G}_n\varphi^{\gamma}_{ni}(\omega^*_{ni}, \varepsilon_i)\big\| = o_p(1),$$

so combining this with part (i) we obtain

$$\max_{1\le i\le n}\|r^{\omega}_{ni}(\varepsilon_i)\| \le \max_{1\le i\le n}\big\|\nabla_{\omega'}\Gamma^*_{ni}(\omega^*_{ni})^+\big\|\Big(O_p\Big(\max_{1\le i\le n}\|\omega_{ni}(\varepsilon_i) - \omega^*_{ni}\|^2\Big) + \frac{1}{\sqrt{n-1}}\max_{1\le i\le n}\big\|\mathbb{G}_n\varphi^{\gamma}_{ni}(\omega_{ni}(\varepsilon_i), \varepsilon_i) - \mathbb{G}_n\varphi^{\gamma}_{ni}(\omega^*_{ni}, \varepsilon_i)\big\|\Big) = o_p\Big(\frac{1}{\sqrt{n}}\Big).$$

The proof is complete.
Lemma S.9 (Stochastic equicontinuity)
Suppose that Assumptions 1-3 and 5 aresatisfied. Then G n ϕ γni ( ω, ε i ) defined in (S.44) satisfies for any δ > ,(i) if ω ni ( ε i ) − ω ∗ ni = o p (1) , Pr ( k G n ϕ γni ( ω ni ( ε i ) , ε i ) − G n ϕ γni ( ω ∗ ni , ε i ) k > δ | X, p n ) → s n → ∞ , and(ii) if ω ni ( ε i ) − ω ∗ ni = O p (cid:0) n − / (cid:1) , Pr (cid:18) max ≤ i ≤ n k G n ϕ γni ( ω ni ( ε i ) , ε i ) − G n ϕ γni ( ω ∗ ni , ε i ) k > δ (cid:12)(cid:12)(cid:12)(cid:12) X, p n (cid:19) → as n → ∞ . Proof.
Part (i): By consistency of ω ni ( ε i ), we can define h ni by ω ni ( ε i ) − ω ∗ ni = r − n h ni for some r ni → ∞ at a rate slower than the rate at which ω ni converges to ω ∗ ni sothat h ni ∈ Ω if n is sufficiently large, because by Assumption 5 Ω contains a compactneighborhood of 0.By Markov’s inequalityPr ( k G n ϕ γni ( ω ni ( ε i ) , ε i ) − G n ϕ γni ( ω ∗ ni , ε i ) k > δ | X, p n ) ≤ Pr (cid:18) sup ω,h ∈ Ω (cid:13)(cid:13)(cid:13)(cid:13) G n ϕ γni (cid:18) ω + hr ni , ε i (cid:19) − G n ϕ γni ( ω, ε i ) (cid:13)(cid:13)(cid:13)(cid:13) > δ (cid:12)(cid:12)(cid:12)(cid:12) X, p n (cid:19) ≤ δ E (cid:20) sup ω,h ∈ Ω (cid:13)(cid:13)(cid:13)(cid:13) G n ϕ γni (cid:18) ω + hr ni , ε i (cid:19) − G n ϕ γni ( ω, ε i ) (cid:13)(cid:13)(cid:13)(cid:13)(cid:12)(cid:12)(cid:12)(cid:12) X, p n (cid:21) . We consider the empirical process G n ϕ γni (cid:18) ω + hr ni , ε i (cid:19) − G n ϕ γni ( ω, ε i )indexed by ω, h ∈ Ω. Recall that G n ϕ γni ( ω, ε i ) − G n ϕ γni (˜ ω, ε i )= 1 √ n − X j = i ϕ γn,ij ( ω, ε ij ) − ϕ γn,ij (˜ ω, ε ij ) − (cid:0) E (cid:2) ϕ γn,ij ( ω, ε ij ) − ϕ γn,ij (˜ ω, ε ij ) (cid:12)(cid:12) X, p n (cid:3)(cid:1) , where ϕ γn,ij ( ω, ε ij ) is defined in (S.43) by ϕ γn,ij ( ω, ε ij ) = 1 (cid:26) U n,ij + 2 ( n − n − Z ′ j Φ ni Λ ni ω − ε ij ≥ (cid:27) Λ ni Φ ′ ni Z j . Here we essentially need to show that this empirical process is stochastically equicon-tinuous. Notice that this empirical process is a triangular array with function ϕ γn,ij j (because we condition on X ), so most of the ready-to-use re-sults for stochastic equicontinuity (e.g. Andrews, 1994) are not applicable. Instead,we apply maximal inequalities in van der Vaart and Wellner (1996) to directly provethe stochastic equicontinuity.Observe that for any ω, ˜ ω ∈ Ω the function ϕ γn,ij ( ω, ε ij ) − ϕ γn,ij (˜ ω, ε ij ) can bebounded by (cid:13)(cid:13) ϕ γn,ij ( ω, ε ij ) − ϕ γn,ij (˜ ω, ε ij ) (cid:13)(cid:13) ≤ k Λ ni Φ ′ ni Z j k· (cid:12)(cid:12)(cid:12)(cid:12) (cid:26) U n,ij + 2 ( n − n − Z ′ j Φ ni Λ ni ω ≥ ε ij (cid:27) − (cid:26) U n,ij + 2 ( n − n − Z ′ j Φ ni Λ ni ˜ ω ≥ ε ij (cid:27)(cid:12)(cid:12)(cid:12)(cid:12) ≤ η n,ij ( ω, ˜ ω, ε ij ) , with η n,ij ( ω, ˜ ω, ε ij ) given by η n,ij ( ω, ˜ ω, ε ij )= k Λ ni Φ ′ ni Z j k , if ε ij lies between U n,ij + n − n − Z ′ j Φ ni Λ ni ω and U n,ij + n − n − Z ′ j Φ ni Λ ni ˜ ω, , otherwise. (S.55)Next, we apply Theorem 2.14.1 in van der Vaart and Wellner (1996). This theoremgives a uniform upper bound to the absolute p -th moment of an empirical processthat we take as G n (cid:18) ϕ γni (cid:18) ω + hr ni , ε i (cid:19) − ϕ γni ( ω, ε i ) (cid:19) (S.56)indexed by ω, h ∈ Ω. We take the expectation conditional on X and p n and set p = 1. Both U n,ij and Z j vary in j . E (cid:20) sup ω,h ∈ Ω (cid:13)(cid:13)(cid:13)(cid:13) G n (cid:18) ϕ γni (cid:18) ω + hr ni , ε i (cid:19) − ϕ γni ( ω, ε i ) (cid:19)(cid:13)(cid:13)(cid:13)(cid:13)(cid:12)(cid:12)(cid:12)(cid:12) X, p n (cid:21) ≤ K E (cid:20) J (1 , F ni ( ε i )) sup ω,h ∈ Ω (cid:13)(cid:13)(cid:13)(cid:13) η ni (cid:18) ω + hr ni , ω, ε i (cid:19)(cid:13)(cid:13)(cid:13)(cid:13) n (cid:12)(cid:12)(cid:12)(cid:12) X, p n (cid:21) , (S.57)where K > (cid:13)(cid:13)(cid:13)(cid:13) η ni (cid:18) ω + hr ni , ω, ε i (cid:19)(cid:13)(cid:13)(cid:13)(cid:13) n = 1 n − X j = i η n,ij (cid:18) ω + hr ni , ω, ε ij (cid:19) . 
(S.58)We first show that the uniform entropy integral J (1 , F ni ( ε i )) in (S.57) is finite,where F ni ( ε i ) denotes the set of arrays F ni ( ε i ) = (cid:26)(cid:18) ϕ γn,ij (cid:18) ω + hr ni , ε ij (cid:19) − ϕ γn,ij ( ω, ε ij ) , j = i (cid:19) : ω, h ∈ Ω (cid:27) , (S.59)and J (1 , F ni ( ε i )) is the uniform entropy integral of F ni ( ε i ) J (1 , F ni ( ε i )) = Z sup α ∈ R n − q ln D ( ξ k α ⊙ ¯ η ni ( ε i ) k n , α ⊙ F ni ( ε i ) , k·k n ) dξ. (S.60)In (S.60), ¯ η ni ( ε i ) = sup ω,h ∈ Ω η ni (cid:16) ω + hr ni , ω, ε i (cid:17) is an ( n − × F ni ( ε i ), α is an ( n − × α ⊙ ¯ η ni ( ε i )is the Hadamard product of α and ¯ η ni ( ε i ), and α ⊙ F ni ( ε i ) is the set of Hadamardproducts of α and the functions in F ni ( ε i ). Also k·k n is the empirical L norm definedin (S.58), and D ( ξ k α ⊙ ¯ η ni ( ε i ) k n , α ⊙ F ni ( ε i ) , k·k n ) is the packing number, i.e., themaxinum number of points in the set α ⊙ F ni ( ε i ) that are separated by the distance ξ k α ⊙ ¯ η ni ( ε i ) k n for the norm k·k n . The sup in (S.60) is taken over all ( n − × α of nonnegative constants. From the proof of Theorem 2.14.1 in van der Vaart and Wellner (1996) it follows that the em-pirical L norm of an envelope of F ni ( ε i ) can be replaced by the sup of the empirical L norm ofthe n − η ni (cid:16) ω + hr ni , ω, ε i (cid:17) . Also the theorem holds for a triangular arraywith independent but non-identically distributed observations. g n,ij ( ω, ε ij ) = 1 (cid:26) U n,ij + 2 ( n − n − Z ′ j Φ ni Λ ni ω − ε ij ≥ (cid:27) . It is an indicator with the argument being a linear function of ω . We can show that theset { ( g n,ij ( ω, ε ij ) , j = i ) : ω ∈ Ω } has a pseudo-dimension of at most T , so it is Eu-clidean (Pollard, 1990, Corollary 4.10). Note that ϕ γn,ij ( ω, ε ij ) = g n,ij ( ω, ε ij ) Λ ni Φ ′ ni Z j ,and Λ ni Φ ′ ni Z j is a T × ω . From the stability re-sults in Pollard (1990, Section 5), each component of the doubly indexed process { ( ϕ γni (cid:16) ω + hr ni , ε i (cid:17) − ϕ γni ( ω, ε i )) , j = i ) : ω, h ∈ Ω } is Euclidean. Therefore, the set F ni ( ε i ) has a finite uniform entropy integral, i.e., J (1 , F ni ( ε i )) ≤ ¯ J (S.61)uniformly in ε i and n for some ¯ J < ∞ . To see this, by the definition of pseudo-dimension, it suffices to show that for each index set I = { j , . . . , j T +1 } ∈ { , . . . , n } \ { i } and each point c ∈ R T +1 , there is a subset J ⊆ I such that no ω ∈ Ω can satisfy the inequalities1 (cid:26) U n,ij + 2 ( n − n − Z ′ j Φ ni Λ ni ω − ε ij ≥ (cid:27) (cid:26) > c j for j ∈ J< c j for j ∈ I \ J If c has a component c j that lies outside of (0 , J such that j ∈ J if c j ≥ j ∈ I \ J if c j ≤ ω can satisfy the inequalities above. It thus suffices to consider c with allthe components in (0 ,
1) and for such c the inequalities reduce to U n,ij + 2 ( n − n − Z ′ j Φ ni Λ ni ω − ε ij (cid:26) ≥ j ∈ J< j ∈ I \ J Since Z ′ j Φ ni Λ ni ∈ R T for all j , there exists a non-zero vector τ = ( τ , . . . , τ T +1 ) ∈ R T +1 suchthat P T +1 t =1 τ t Z ′ j t Φ ni Λ ni = 0, so P T +1 t =1 τ t n − n − Z ′ j t Φ ni Λ ni ω = 0 for all ω ∈ Ω. We may assumethat τ t > t . If P T +1 t =1 τ t ( U n,ij t − ε n,ij t ) ≥
0, it is impossible to find a ω ∈ Ωsatisfying these inequalities for the choice J = { j t ∈ I : τ t ≤ } , because this would lead to thecontradiction P T +1 t =1 τ t ( U n,ij t − ε n,ij t ) = P T +1 t =1 τ t ( U n,ij t − ε n,ij t ) + P T +1 t =1 τ t n − n − Z ′ j t Φ ni Λ ni ω = P T +1 t =1 τ t (cid:16) U n,ij t + n − n − Z ′ j t Φ ni Λ ni ω − ε n,ij t (cid:17) <
0. If P T +1 t =1 τ t ( U n,ij t − ε n,ij t ) <
0, we could choose J = { j t ∈ I : τ t ≥ } to reach a similar contradiction. L norm in the bound. By Jensen’s inequality E (cid:20) sup ω,h ∈ Ω (cid:13)(cid:13)(cid:13)(cid:13) η ni (cid:18) ω + hr ni , ω, ε i (cid:19)(cid:13)(cid:13)(cid:13)(cid:13) n (cid:12)(cid:12)(cid:12)(cid:12) X, p n (cid:21) = E sup ω,h ∈ Ω n − X j = i η n,ij (cid:18) ω + hr ni , ω, ε ij (cid:19)! / (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X, p n ≤ E " sup ω,h ∈ Ω n − X j = i η n,ij (cid:18) ω + hr ni , ω, ε ij (cid:19)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X, p n / . (S.62)To derive an upper bound on the last term in (S.62), we consider the empiricalprocess G n η ni (cid:18) ω + hr ni , ω, ε i (cid:19) = 1 √ n − X j = i (cid:18) η n,ij (cid:18) ω + hr ni , ω, ε ij (cid:19) − E (cid:20) η n,ij (cid:18) ω + hr ni , ω, ε ij (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) X, p n (cid:21)(cid:19) indexed by ω, h ∈ Ω. Note that each η n,ij is bounded by k Λ ni Φ ′ ni Z j k ≤ max t =1 , ··· ,T λ ni,t T ≤ ¯ η < ∞ . Similarly to (S.57), we apply Theorem 2.14.1 in van der Vaart and Wellner(1996) to this empirical process with p = 1 and get an upper bound E (cid:20) sup ω,h ∈ Ω (cid:12)(cid:12)(cid:12)(cid:12) G n η ni (cid:18) ω + hr ni , ω, ε i (cid:19)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X, p n (cid:21) ≤ K η E (cid:2) J (1 , F ηni ( ε i )) (cid:13)(cid:13) ¯ η (cid:13)(cid:13) n (cid:12)(cid:12) X, p n (cid:3) , with K η > (cid:13)(cid:13) ¯ η (cid:13)(cid:13) n = s n − X j = i ¯ η = ¯ η , and J (1 , F ηni ( ε i )) the uniform entropy integral of the set F ηni ( ε i ) = (cid:26)(cid:18) η n,ij (cid:18) ω + hr ni , ω, ε ij (cid:19) , j = i (cid:19) : ω, h ∈ Ω (cid:27) . Similarly to the argument for the set F ni ( ε i ) in (S.59), we can show that the set36 ηni ( ε i ) has a finite uniform entropy integral J (1 , F ηni ( ε i )) ≤ ¯ J η < ∞ . Therefore, E " sup ω,h ∈ Ω n − X j = i η n,ij (cid:18) ω + hr ni , ω, ε ij (cid:19) − sup ω,h ∈ Ω n − X j = i E (cid:20) η n,ij (cid:18) ω + hr ni , ω, ε ij (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) X, p n (cid:21)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X, p n ≤ √ n − E (cid:20) sup ω,h ∈ Ω (cid:12)(cid:12)(cid:12)(cid:12) G n η ni (cid:18) ω + hr ni , ω, ε i (cid:19)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X, p n (cid:21) ≤ K η ¯ J η ¯ η √ n − ≡ M η √ n − . For any ω, h ∈ Ω and any j = i , by the mean-value theorem, we have E (cid:20) η n,ij (cid:18) ω + hr ni , ω, ε ij (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) X, p n (cid:21) = (cid:12)(cid:12)(cid:12)(cid:12) F ε (cid:18) U n,ij + 2 ( n − n − Z ′ j Φ ni Λ ni (cid:18) ω + hr ni (cid:19)(cid:19) − F ε (cid:18) U n,ij + 2 ( n − n − Z ′ j Φ ni Λ ni ω (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) · k Λ ni Φ ′ ni Z j k = f ε (cid:18) U n,ij + 2 ( n − n − Z ′ j Φ ni Λ ni (cid:18) ω + t n,ij hr ni (cid:19)(cid:19) n − n − (cid:12)(cid:12)(cid:12)(cid:12) Z ′ j Φ ni Λ ni hr ni (cid:12)(cid:12)(cid:12)(cid:12) k Λ ni Φ ′ ni Z j k ≤ r ni f ε (cid:18) U n,ij + 2 ( n − n − Z ′ j Φ ni Λ ni (cid:18) ω + t n,ij hr ni (cid:19)(cid:19) n − n − k Λ ni Φ ′ ni Z j k sup h ∈ Ω k h k (S.63)for some t n,ij ∈ [0 , f ε is bounded. There is also afinite bound on the eigenvalues in Λ ni that does not depend on i . We conclude thatthere is a finite M with E (cid:20) η n,ij (cid:18) ω + hr ni , ω, ε ij (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) X, p n (cid:21) ≤ Mr ni ω, h ∈ Ω and all j . 
Hence E " sup ω,h ∈ Ω n − X j = i η n,ij (cid:18) ω + hr ni , ω, ε ij (cid:19)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X, p n ≤ M η √ n − Mr ni . Combining the results we obtain the upper boundPr ( k G n ϕ γni ( ω ni ( ε i ) , ε i ) − G n ϕ γni ( ω ∗ ni , ε i ) k > δ | X, p n ) ≤ K ¯ Jδ s M η √ n − Mr ni , which for all δ > n sufficiently large. Part(i) is proved.Part (ii): Because ω ni ( ε i ) − ω ∗ ni = O p (cid:0) n − / (cid:1) , we can define h ni by ω ni ( ε i ) − ω ∗ ni = n − κ h ni for 0 < κ < /
2, and h ni ∈ Ω if n is sufficiently large. By Markov’s inequalityPr (cid:18) max ≤ i ≤ n k G n ϕ γni ( ω ni ( ε i ) , ε i ) − G n ϕ γni ( ω ∗ ni , ε i ) k > δ (cid:12)(cid:12)(cid:12)(cid:12) X, p n (cid:19) ≤ Pr (cid:18) max ≤ i ≤ n sup ω,h ∈ Ω (cid:13)(cid:13)(cid:13)(cid:13) G n ϕ γni (cid:18) ω + hn κ , ε i (cid:19) − G n ϕ γni ( ω, ε i ) (cid:13)(cid:13)(cid:13)(cid:13) > δ (cid:12)(cid:12)(cid:12)(cid:12) X, p n (cid:19) ≤ δ E (cid:20) max ≤ i ≤ n sup ω,h ∈ Ω (cid:13)(cid:13)(cid:13)(cid:13) G n ϕ γni (cid:18) ω + hn κ , ε i (cid:19) − G n ϕ γni ( ω, ε i ) (cid:13)(cid:13)(cid:13)(cid:13)(cid:12)(cid:12)(cid:12)(cid:12) X, p n (cid:21) . Because for any random variable E [ | Z || X, p n ] ≤ k Z k ψ | X,p n with k·k ψ | X,p n the condi-tional Orlicz norm given X and p n and ψ ( x ) = e x −
1, by the maximal inequality inLemma 2.2.2 in van der Vaart and Wellner (1996) E (cid:20) max ≤ i ≤ n sup ω,h ∈ Ω (cid:13)(cid:13)(cid:13)(cid:13) G n ϕ γni (cid:18) ω + hn κ , ε i (cid:19) − G n ϕ γni ( ω, ε i ) (cid:13)(cid:13)(cid:13)(cid:13)(cid:12)(cid:12)(cid:12)(cid:12) X, p n (cid:21) ≤ (cid:13)(cid:13)(cid:13)(cid:13) max ≤ i ≤ n sup ω,h ∈ Ω (cid:13)(cid:13)(cid:13)(cid:13) G n ϕ γni (cid:18) ω + hn κ , ε i (cid:19) − G n ϕ γni ( ω, ε i ) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ψ | X,p n ≤ K ln ( n + 1) max ≤ i ≤ n (cid:13)(cid:13)(cid:13)(cid:13) sup ω,h ∈ Ω (cid:13)(cid:13)(cid:13)(cid:13) G n ϕ γni (cid:18) ω + hn κ , ε i (cid:19) − G n ϕ γni ( ω, ε i ) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ψ | X,p n , where K > p = 1) and Lemma 2.2.2 in van der Vaart and Wellner381996), we have (cid:13)(cid:13)(cid:13)(cid:13) sup ω,h ∈ Ω (cid:13)(cid:13)(cid:13)(cid:13) G n ϕ γni (cid:18) ω + hn κ , ε i (cid:19) − G n ϕ γni ( ω, ε i ) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ψ | X,p n ≤ K (cid:18) E (cid:20) sup ω,h ∈ Ω (cid:13)(cid:13)(cid:13)(cid:13) G n ϕ γni (cid:18) ω + hn κ , ε i (cid:19) − G n ϕ γni ( ω, ε i ) (cid:13)(cid:13)(cid:13)(cid:13)(cid:12)(cid:12)(cid:12)(cid:12) X, p n (cid:21) + ln n √ n − j = i (cid:13)(cid:13)(cid:13)(cid:13) sup ω,h ∈ Ω (cid:13)(cid:13)(cid:13)(cid:13) η n,ij (cid:18) ω + hn κ , ω, ε ij (cid:19)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ψ | X,p n ! . In Part (i) we derived for the first term in the upper bound E (cid:20) sup ω,h ∈ Ω (cid:13)(cid:13)(cid:13)(cid:13) G n ϕ γni (cid:18) ω + hn κ , ε i (cid:19) − G n ϕ γni ( ω, ε i ) (cid:13)(cid:13)(cid:13)(cid:13)(cid:12)(cid:12)(cid:12)(cid:12) X, p n (cid:21) ≤ K ¯ J s M η √ n − Mn κ . For the second term by the definition of η n,ij ,max j = i (cid:13)(cid:13)(cid:13)(cid:13) sup ω,h ∈ Ω (cid:13)(cid:13)(cid:13)(cid:13) η n,ij (cid:18) ω + hn κ , ω, ε ij (cid:19)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ψ | X,p n ≤ max j = i k Λ ni Φ ′ ni Z j k≤ max t =1 ,,T λ ni,t √ T ≤ ¯ η < ∞ . Therefore, for 1 ≤ i ≤ n (cid:13)(cid:13)(cid:13)(cid:13) sup ω,h ∈ Ω (cid:13)(cid:13)(cid:13)(cid:13) G n ϕ γni (cid:18) ω + hn κ , ε i (cid:19) − G n ϕ γni ( ω, ε i ) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ψ | X,p n ≤ K K ¯ J s M η √ n − Mn κ + ¯ η ln n √ n − ! . (S.64)Combining the results yields the upper boundPr (cid:18) max ≤ i ≤ n k G n ϕ γni ( ω ni ( ε i ) , ε i ) − G n ϕ γni ( ω ∗ ni , ε i ) k > δ (cid:12)(cid:12)(cid:12)(cid:12) X, p n (cid:19) ≤ KK ln ( n + 1) δ K ¯ J s M η √ n − Mn κ + ¯ η ln n √ n − ! , which for all δ > n sufficiently large. Theproof is complete. 39 symptotic Distribution of ˆ θ n Proof of Theorem 4.2.
The GMM estimator of θ satisfies the sample uncondi-tional moment conditionˆΨ n (ˆ θ n , ˆ p n ) = 1 n ( n − X i X j = i ˆ W n,ij (cid:16) G n,ij − P n,ij (ˆ θ n , ˆ p n ) (cid:17) = o p (cid:18) n (cid:19) with ˆ p n the T × T matrix of empirical link frequencies between the types. We arrangethe link frequencies in a vector and with abuse of notation we use ˆ p n for vec( ˆ p ′ n ).By a Taylor-series expansion of P n,ij (ˆ θ n , ˆ p n ) around ( θ , p n ) P n,ij (ˆ θ n , ˆ p n ) = P n,ij ( θ , p n ) + ∇ θ ′ P n,ij ( θ , p n )(ˆ θ n − θ )+ ∇ p ′ P n,ij ( θ , p n )(ˆ p n − p n ) + o p (cid:16)(cid:13)(cid:13)(cid:13) (ˆ θ n , ˆ p n ) − ( θ , p n ) (cid:13)(cid:13)(cid:13)(cid:17) and upon rearranging the terms of the expansion we have1 n ( n − X i X j = i ˆ W n,ij ∇ θ ′ P n,ij ( θ , p n )(ˆ θ n − θ )= 1 n ( n − X i X j = i ˆ W n,ij ( G n,ij − P n,ij ( θ , p n )) − n ( n − X i X j = i ˆ W n,ij ∇ p ′ P n,ij ( θ , p n )(ˆ p n − p n ) − n ( n − X i X j = i ˆ W n,ij o p (cid:16)(cid:13)(cid:13)(cid:13) (ˆ θ n , ˆ p n ) − ( θ , p n ) (cid:13)(cid:13)(cid:13)(cid:17) + o p (cid:18) n (cid:19) , where we assume that n is sufficiently large, so that (ˆ θ n , ˆ p n ) is in a neighborhood of( θ , p n ) where P n,ij ( θ, p ) is continuously differentiable.The instruments ˆ W n,ij are estimated, but we have max i,j =1 ,...,n (cid:13)(cid:13)(cid:13) ˆ W n,ij − W n,ij (cid:13)(cid:13)(cid:13) = o p (1) by Assumption 4(iii), so the sampling variation in the instruments has no effect40n the asymptotic distribution of ˆ θ n . The GMM estimator thus satisfies1 n ( n − X i X j = i W n,ij ∇ θ ′ P n,ij ( θ , p n ) (cid:16) ˆ θ n − θ (cid:17) = 1 n ( n − X i X j = i W n,ij ( G n,ij − P n,ij ( θ , p n )) − n ( n − X i X j = i W n,ij ∇ p ′ P n,ij ( θ , p n ) ( ˆ p n − p n )+ o p (cid:16)(cid:13)(cid:13)(cid:13) ˆ θ n − θ (cid:13)(cid:13)(cid:13)(cid:17) + o p ( k ˆ p n − p n k ) + o p (cid:18) n (cid:19) . (S.65)where we have used1 n ( n − X i X j = i W n,ij o p (cid:16)(cid:13)(cid:13)(cid:13) (ˆ θ n , ˆ p n ) − ( θ , p n ) (cid:13)(cid:13)(cid:13)(cid:17) ≤ o p (cid:16)(cid:13)(cid:13)(cid:13) ˆ θ n − θ (cid:13)(cid:13)(cid:13)(cid:17) + o p ( k ˆ p n − p n k )because (cid:13)(cid:13)(cid:13) (ˆ θ n , ˆ p n ) − ( θ , p n ) (cid:13)(cid:13)(cid:13) ≤ (cid:13)(cid:13)(cid:13) ˆ θ n − θ (cid:13)(cid:13)(cid:13) + k ˆ p n − p n k and max i,j =1 ,...,n k W n,ij k < ∞ (Assumption 4(iii)).Let us examine the first two terms on the right-hand side of (S.65), with thefirst being the main term while the second gives the contribution of the first-stageestimation of the link probabilities. Recall that ˆ p n is the vector of empirical fractionsof pairs of type s , t that have a link and p n is the vector of link probabilities of pairsof type s , t so ˆ p n − p n = 1 n ( n − X i X j = i Q n,ij ( G n,ij − P n,ij ( θ , p n )) , where Q n,ij = ( Q n,ij, , . . . , Q n,ij, T , . . . , Q n,ij,T , . . . , Q n,ij,T T ) ′ ∈ R T with Q n,ij,st = 1 { X i = x s , X j = x t } n ( n − P i P j = i { X i = x s , X j = x t } , s, t = 1 , . . . , T. Hence, the first two terms on the right-hand side of (S.65) can be combined as1 n ( n − X i X j = i ˜ W n,ij ( G n,ij − P n,ij ( θ , p n )) , where ˜ W n,ij is the augmented instrument that combines the instruments for the first-41tage and second-stage estimation˜ W n,ij = W n,ij − n ( n − X k X l = k W n,kl ∇ p ′ P n,kl ( θ , p n ) ! Q n,ij . 
Applying Lemma S.10 twice for the instrument vectors ˜ W n,ij and Q n,ij we get1 n ( n − X i X j = i ˜ W n,ij ( G n,ij − P n,ij ( θ , p n )) = O p (cid:18) n (cid:19) , (S.66)and ˆ p n − p n = O p (cid:18) n (cid:19) . so (S.65) becomes J θn ( θ , p n ) (cid:16) ˆ θ n − θ (cid:17) = O p (cid:18) n (cid:19) + o p (cid:16)(cid:13)(cid:13)(cid:13) ˆ θ n − θ (cid:13)(cid:13)(cid:13)(cid:17) + o p (cid:18) n (cid:19) , with J θn ( θ , p n ) being the Jacobian matrix J θn ( θ , p n ) = 1 n ( n − X i X j = i W n,ij ∇ θ ′ P n,ij ( θ , p n ) . By Assumption 6(ii) J θn ( θ , p n ) is nonsingular, so (cid:13)(cid:13) J θn ( θ , p n ) ( θ − θ ) (cid:13)(cid:13) ≥ c k θ − θ k with c = λ min (cid:0) J θn ( θ , p n ) ′ J θn ( θ , p n ) (cid:1) >
0. Therefore, (cid:13)(cid:13)(cid:13) ˆ θ n − θ (cid:13)(cid:13)(cid:13) ( c + o p (1)) ≤ O p (cid:18) n (cid:19) + o p (cid:18) n (cid:19) . This implies that ˆ θ n − θ = O p (cid:18) n (cid:19) . i.e., ˆ θ n is n -consistent for θ . 42o derive the asymptotic distribution of ˆ θ n , we rewrite (S.65) as p n ( n − (cid:16) ˆ θ n − θ (cid:17) = 1 p n ( n − X i X j = i J θn ( θ , p n ) − ˜ W n,ij ( G n,ij − P n,ij ( θ , p n )) + o p (1) . Note that for each i the G n,ij are correlated over j , so that a CLT for independentrandom variables cannot be used. We need the result for dependent random variablesin Lemma S.10. The link chocies of i in the n − G ni are correlated throughtheir dependence on ω ni ( ε i ). The correlation goes to 0 as n → ∞ , so the samplemoments have an asymptotic normal distribution with a finite variance that accountsfor the variation in ω ni ( ε i ).We apply Lemma S.10 for the instrument vector J θn ( θ , p n ) − ˜ W n,ij . Define the d θ × d θ matrix Σ n ( θ , p n ) = 1 n ( n − X i X j = i Σ n,ij ( θ , p n )withΣ n,ij ( θ , p n )= J θn ( θ , p n ) − E h(cid:16) ˜ W n,ij (cid:0) g n,ij ( ω ∗ ni , ε ij ) − P ∗ n,ij ( ω ∗ ni ) (cid:1) + ˜ J ωni ( ω ∗ ni ) ϕ ωn,ij ( ω ∗ ni , ε ij ) (cid:17) · (cid:16) ˜ W n,ij (cid:0) g n,ij ( ω ∗ ni , ε ij ) − P ∗ n,ij ( ω ∗ ni ) (cid:1) + ˜ J ωni ( ω ∗ ni ) ϕ ωn,ij ( ω ∗ ni , ε ij ) (cid:17) ′ (cid:12)(cid:12)(cid:12)(cid:12) X, p n (cid:21) (cid:0) J θn ( θ , p n ) − (cid:1) ′ (S.67)where ω ∗ ni ∈ R T maximizes Π ∗ ni ( ω ) in (4.8). The indicator function g n,ij ( ω, ε ij ) andthe corresponding probability P ∗ n,ij ( ω ) are defined in Lemma S.10. The d θ × T matrix˜ J ωni ( ω ) is defined by ˜ J ωni ( ω ) = 1 n − X j = i ˜ W n,ij ∇ ω ′ P ∗ n,ij ( ω )The function ϕ ωn,ij ( ω ∗ ni , ε ij ) ∈ R T is the j -th term of the influence function of ω ni defined in (S.32) in Lemma S.7. By Lemma S.10, p n ( n − − / n ( θ , p n ) (cid:16) ˆ θ n − θ (cid:17) d → N (0 , I d θ )43s n → ∞ . The proof is complete. Lemma S.10 (Asymptotic normality of the sample moment)
Suppose that As-sumption 1-3 and 5 are satisfied. Define Y n = 1 p n ( n − X i X j = i W n,ij ( G n,ij − P n,ij ( θ , p n )) . where W n,ij is a d θ × instrument vector. Let Σ n be the d θ × d θ positive-definitematrix Σ n = 1 n ( n − X i X j = i Σ n,ij (S.68) with Σ n,ij = E h(cid:0) W n,ij (cid:0) g n,ij ( ω ∗ ni , ε ij ) − P ∗ n,ij ( ω ∗ ni ) (cid:1) + J ωni ( ω ∗ ni ) ϕ ωn,ij ( ω ∗ ni , ε ij ) (cid:1) · (cid:0) W n,ij (cid:0) g n,ij ( ω ∗ ni , ε ij ) − P ∗ n,ij ( ω ∗ ni ) (cid:1) + J ωni ( ω ∗ ni ) ϕ ωn,ij ( ω ∗ ni , ε ij ) (cid:1) ′ (cid:12)(cid:12)(cid:12) X, p n i (S.69) where ω ∗ ni ∈ R T maximizes the function Π ∗ ni ( ω ) = X j = i E (cid:20) (cid:20) U n,ij + 2 ( n − n − Z ′ j Φ ni Λ ni ω − ε ij (cid:21) + (cid:12)(cid:12)(cid:12)(cid:12) X, p n (cid:21) − ( n − n − ω ′ Λ ni ω. The indicator functions g n,ij ( ω, ε ij ) , the corresponding probabilities P ∗ n,ij ( ω ) , and the d θ × T matrix J ωni ( ω ) are defined by g n,ij ( ω, ε ij ) = 1 (cid:26) U n,ij + 2 ( n − n − Z ′ j Φ ni Λ ni ω − ε ij ≥ (cid:27) P ∗ n,ij ( ω ) = F ε (cid:18) U n,ij + 2 ( n − n − Z ′ j Φ ni Λ ni ω (cid:19) J ωni ( ω ) = 1 n − X j = i W n,ij ∇ ω ′ P ∗ n,ij ( ω ) , and ϕ ωn,ij ( ω ∗ ni , ε ij ) ∈ R T is the j -th term of the influence function of ω ni ( ε i ) defined W n,ij is a generic valid instrument vector. It can be the augmented instrument vector ˜ W n,ij orthe vector of first-stage instruments Q n,ij . We discuss the choice of instrument in Section 4. n (S.32) in Lemma S.7. Then Σ − / n Y n d → N (0 , I d θ ) as n → ∞ , where I θ is the d θ × d θ identity matrix. Proof.
Define the link choice indicator at ω ∈ Ω g n,ij ( ω, ε ij ) = 1 (cid:26) U n,ij + 2 ( n − n − Z ′ j Φ ni Λ ni ω − ε ij ≥ (cid:27) , j = i. By Theorem 3.2, the observed link choice G n,ij is given by g n,ij ( ω, ε ij ) evaluated at ω ni ( ε i ), i.e., G n,ij = g n,ij ( ω ni ( ε i ) , ε ij ) , j = i, (S.70)where ω ni ( ε i ) maximizes the functionΠ ni ( ω ) = X j = i (cid:20) U n,ij + 2 ( n − n − Z ′ j Φ ni Λ ni ω − ε ij (cid:21) + − ( n − n − ω ′ Λ ni ω, i = 1 , . . . , n. The conditional choice probability P n,ij ( θ , p n ) is thus the conditional expectation of g n,ij ( ω ni ( ε i ) , ε ij ) P n,ij = E [ g n,ij ( ω ni ( ε i ) , ε ij ) | X, p n ] , j = i. (S.71)The challenge in deriving the asymptotic distribution of the normalized samplemoment Y n lies in the fact that the link choices of an individual i , i.e., G n,ij and G n,ik , are correlated through ω ni ( ε i ). As shown in Lemma S.5 ω ni ( ε i ) converges inprobability to ω ∗ ni that does not depend on ε i . Let Π ∗ ni ( ω ) be the expectation ofΠ ni ( ω )Π ∗ ni ( ω ) = X j = i E (cid:20) (cid:20) U n,ij + 2 ( n − n − Z ′ j Φ ni Λ ni ω − ε ij (cid:21) + (cid:12)(cid:12)(cid:12)(cid:12) X, p n (cid:21) − ( n − n − ω ′ Λ ni ω,i = 1 , . . . , n . By Assumption 5 Π ∗ ni ( ω ) has a unique maximizer ω ∗ ni that does notdepend on ε i . 45efine the function P ∗ n,ij ( ω ) P ∗ n,ij ( ω ) = E [ g n,ij ( ω, ε ij ) | X, p n ]= F ε (cid:18) U n,ij + 2 ( n − n − Z ′ j Φ ni Λ ni ω (cid:19) , j = i. (S.72)Here we treat ω as a parameter and take the expectation with respect to ε ij only.The normalized sample moment Y n is equal to Y n = 1 p n ( n − X i X j = i W n,ij ( g n,ij ( ω ni ( ε i ) , ε ij ) − E [ g n,ij ( ω ni ( ε i ) , ε ij ) | X, p n ])= T n + T n + T n + T n , (S.73)where T n = 1 p n ( n − X i X j = i W n,ij (cid:0) g n,ij ( ω ∗ ni , ε ij ) − P ∗ n,ij ( ω ∗ ni ) (cid:1) T n = 1 p n ( n − X i X j = i W n,ij ( g n,ij ( ω ni ( ε i ) , ε ij ) − g n,ij ( ω ∗ ni , ε ij ) − (cid:0) P ∗ n,ij ( ω ni ( ε i )) − P ∗ n,ij ( ω ∗ ni ) (cid:1)(cid:1) T n = 1 p n ( n − X i X j = i W n,ij (cid:0) P ∗ n,ij ( ω ni ( ε i )) − P ∗ n,ij ( ω ∗ ni ) (cid:1) T n = − p n ( n − X i X j = i W n,ij (cid:0) E [ g n,ij ( ω ni ( ε i ) , ε ij ) | X, p n ] − P ∗ n,ij ( ω ∗ ni ) (cid:1) . (S.74)The terms in the decomposition have an interpretation. T n is the sample mo-ment condition if we replace ω ni ( ε i ) by its limit ω ∗ ni . This substitution removes thecorrelation between the link choices of an individual. The term T n contains the dif-ference between the dependent (through ω ni ( ε i )) sample moment function and theindependent one in T n . The fact that this term is shown to be negligible shows thatthe correlation between the link choices vanishes if n is large. The sampling variationin ω ni ( ε i ) is captured by T n which contributes to the asymptotic variance of themoment function. Finally, the linear approximation of T n has non-negligible approx-imation errors (from both a Taylor series expansion remainder and a remainder inthe asymptotically linear approximation of ω ni ( ε i )) that are o p (1) if we add T n .Let us now examine the four terms in (S.74).46tep 1: T n .The term T n is a normalized sum of link indicators that are evaluated at ω ∗ ni rather than ω ni ( ε i ) and thus are independent. This is the main term in Y n withan asymptotically normal distribution, because the CLT applies. It captures thesampling in the link choices, due to sampling variation in ε ij .Step 2: T n .We show that T n in (S.74) is o p (1). 
Define for each i the empirical process G n W ni g ni ( ω, ε i ) = 1 √ n − X j = i W n,ij (cid:0) g n,ij ( ω, ε ij ) − P ∗ n,ij ( ω ) (cid:1) , ω ∈ Ω , so that T n = 1 √ n X i G n W ni ( g ni ( ω ni ( ε i ) , ε i ) − g ni ( ω ∗ ni , ε i )) . Since each G n W ni ( g ni ( ω ni ( ε i ) , ε i ) − g ni ( ω ∗ ni , ε i )) only involves ε i , conditional on X and p n , they are independent. By Lemma S.7 ω ni ( ε i ) − ω ∗ ni = O p (cid:0) n − / (cid:1) , so if wedefine h ni by ω ni ( ε i ) − ω ∗ ni = n − κ h ni for 0 < κ < /
2, then h ni ∈ Ω if n is sufficientlylarge, because by Assumption 5(i) Ω contains a compact neighborhood of 0. Note that T n is a normalized average of terms that are each o p (1) by establishing stochasticequicontinuity. Hence we cannot directly invoke a stochastic equicontinuity argumentto show that their sum T n is o p (1).By Chebyshev’s inequality, for any δ > (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) √ n X i G n W ni ( g ni ( ω ni ( ε i ) , ε i ) − g ni ( ω ∗ ni , ε i )) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) > δ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X, p n ! ≤ δ n X i E (cid:2) k G n W ni ( g ni ( ω ni ( ε i ) , ε i ) − g ni ( ω ∗ ni , ε i )) k (cid:12)(cid:12) X, p n (cid:3) ≤ δ n X i E " sup ω,h ∈ Ω (cid:13)(cid:13)(cid:13)(cid:13) G n W ni (cid:18) g ni (cid:18) ω + hn κ , ε i (cid:19) − g ni ( ω, ε i ) (cid:19)(cid:13)(cid:13)(cid:13)(cid:13) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X, p n (S.75)Observe that for any ω, ˜ ω ∈ Ω, the function W n,ij ( g n,ij ( ω, ε ij ) − g n,ij (˜ ω, ε ij )) can47e bounded by k W n,ij ( g n,ij ( ω, ε ij ) − g n,ij (˜ ω, ε ij )) k≤ k W n,ij k (cid:12)(cid:12)(cid:12)(cid:12) (cid:26) U n,ij + 2 ( n − n − Z ′ j Φ ni Λ ni ω − ε ij ≥ (cid:27) − (cid:26) U n,ij + 2 ( n − n − Z ′ j Φ ni Λ ni ˜ ω − ε ij ≥ (cid:27)(cid:12)(cid:12)(cid:12)(cid:12) ≤ η n,ij ( ω, ˜ ω, ε ij ) , with η n,ij ( ω, ˜ ω, ε ij ) given by η n,ij ( ω, ˜ ω, ε ij )= k W n,ij k , if ε ij lies between U n,ij + n − n − Z ′ j Φ ni Λ ni ω and U n,ij + n − n − Z ′ j Φ ni Λ ni ˜ ω ,0 , otherwise. (S.76)Next, we apply Theorem 2.14.1 in van der Vaart and Wellner (1996) (with p = 2).This theorem gives a uniform upper bound to the absolute p -th moment of an empir-ical process that we take as G n W ni (cid:18) g ni (cid:18) ω + hn κ , ε i (cid:19) − g ni ( ω, ε i ) (cid:19) (S.77)indexed by ω, h ∈ Ω. The bound from Theorem 2.14.1 in van der Vaart and Wellner(1996) is E " sup ω,h ∈ Ω (cid:13)(cid:13)(cid:13)(cid:13) G n W ni (cid:18) g ni (cid:18) ω + hn κ , ε i (cid:19) − g ni ( ω, ε i ) (cid:19)(cid:13)(cid:13)(cid:13)(cid:13) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X, p n ≤ K E " J (1 , F ni ( ε i )) sup ω,h ∈ Ω (cid:13)(cid:13)(cid:13)(cid:13) η ni (cid:18) ω + hn κ , ω, ε i (cid:19)(cid:13)(cid:13)(cid:13)(cid:13) n (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X, p n , Similarly to the proof in Lemma S.9, from the proof of Theorem 2.14.1 invan der Vaart and Wellner (1996) it follows that the empirical L norm of an envelope of F ni ( ε i ) canbe replaced by the sup of the empirical L norm of the n − η ni (cid:0) ω + hn κ , ω, ε i (cid:1) .Also the theorem holds for a triangular array with independent but non-identically distributed ob-servations. K > (cid:13)(cid:13)(cid:13)(cid:13) η ni (cid:18) ω + hn κ , ω, ε i (cid:19)(cid:13)(cid:13)(cid:13)(cid:13) n = 1 n − X j = i η n,ij (cid:18) ω + hn κ , ω, ε ij (cid:19) . We now show that the uniform entropy integral J (1 , F ni ( ε i )) (defined as in S.60)is finite, where F ni ( ε i ) denotes the set of functions F ni ( ε i ) = (cid:26)(cid:18) W n,ij (cid:18) g n,ij (cid:18) ω + hn κ , ε ij (cid:19) − g n,ij ( ω, ε ij ) (cid:19) , j = i (cid:19) : ω, h ∈ Ω (cid:27) . Consider g n,ij ( ω, ε ij ) = 1 (cid:26) U n,ij + 2 ( n − n − Z ′ j Φ ni Λ ni ω − ε ij ≥ (cid:27) . It is an indicator with the argument being a linear function of ω . 
Following theargument in Lemma S.9, we can show that the set { ( g n,ij ( ω, ε ij ) , j = i ) : ω ∈ Ω } hasa pseudo-dimension of at most T , so it is Euclidean (Pollard, 1990, Corollary 4.10).Note that W n,ij is a d θ × ω . From the stabilityresults in Pollard (1990, Section 5), each component of the doubly indexed process { ( W n,ij ( g n,ij ( ω + n − κ h, ε ij ) − g n,ij ( ω, ε ij )) , j = i ) : ω, h ∈ Ω } is Euclidean. Therefore,the set F ni ( ε i ) has a finite uniform entropy integral bounded by some ¯ J < ∞ .Observe that E " sup ω,h ∈ Ω (cid:13)(cid:13)(cid:13)(cid:13) η ni (cid:18) ω + hn κ , ω, ε i (cid:19)(cid:13)(cid:13)(cid:13)(cid:13) n (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X, p n = E " sup ω,h ∈ Ω n − X j = i η n,ij (cid:18) ω + hn κ , ω, ε ij (cid:19)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X, p n . Note that η n,ij is bounded by k W n,ij k ≤ max i,j =1 ,...,n k W n,ij k ≡ ¯ W < ∞ (Assump-tion 4(iii)). Similar to the argument in Lemma S.9 we can show that the set offunctions (cid:26)(cid:18) η n,ij (cid:18) ω + hn κ , ω, ε ij (cid:19) , j = i (cid:19) : ω, h ∈ Ω (cid:27) has a finite uniform entropy integral bounded by some ¯ J η < ∞ . Hence, we can applyTheorem 2.14.1 in van der Vaart and Wellner (1996) and derive an upper bound on49he expectation of the empirical process G n η ni (cid:18) ω + hn κ , ω, ε i (cid:19) = 1 √ n − X j = i (cid:18) η n,ij (cid:18) ω + hn κ , ω, ε ij (cid:19) − E (cid:20) η n,ij (cid:18) ω + hn κ , ω, ε ij (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) X, p n (cid:21)(cid:19) indexed by ω, h ∈ Ω. The bound is E (cid:20) sup ω,h ∈ Ω (cid:12)(cid:12)(cid:12)(cid:12) G n η ni (cid:18) ω + hn κ , ω, ε ij (cid:19)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X, p n (cid:21) ≤ K η ¯ J η ¯ W . Therefore, E " sup ω,h ∈ Ω n − X j = i η n,ij (cid:18) ω + hn κ , ω, ε ij (cid:19) − sup ω,h ∈ Ω n − X j = i E (cid:20) η n,ij (cid:18) ω + hn κ , ω, ε ij (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) X, p n (cid:21)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X, p n ≤ √ n − E (cid:20) sup ω,h ∈ Ω (cid:12)(cid:12)(cid:12)(cid:12) G n η ni (cid:18) ω + hn κ , ω, ε ij (cid:19)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X, p n (cid:21) ≤ K η ¯ J η ¯ W √ n − ≡ M η √ n − . (S.78)For any ω, h ∈ Ω and any j = i , by the mean-value theorem, we have E (cid:20) η n,ij (cid:18) ω + hn κ , ω, ε ij (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) X, p n (cid:21) = (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) F ε (cid:18) U n,ij + 2 ( n − n − Z ′ j Φ ni Λ ni (cid:18) ω + hn κ (cid:19)(cid:19) − F ε (cid:18) U n,ij + 2 ( n − n − Z ′ j Φ ni Λ ni ω (cid:19) (cid:12)(cid:12)(cid:12)(cid:12) · k W n,ij k = f ε (cid:18) U n,ij + 2 ( n − n − Z ′ j Φ ni Λ ni (cid:18) ω + t n,ij hn κ (cid:19)(cid:19) n − n − (cid:12)(cid:12)(cid:12)(cid:12) Z ′ j Φ ni Λ ni hn κ (cid:12)(cid:12)(cid:12)(cid:12) k W n,ij k ≤ n κ f ε (cid:18) U n,ij + 2 ( n − n − Z ′ j Φ ni Λ ni (cid:18) ω + t n,ij hn κ (cid:19)(cid:19) n − n − k Λ ni Φ ′ ni Z j k k W n,ij k sup h ∈ Ω k h k for some t n,ij ∈ [0 , f ε is bounded. There is alsoa finite bound on the eigenvalues in Λ ni and on the instruments W n,ij that does not50epend on i and j . We conclude that there is a finite M with E (cid:20) η n,ij (cid:18) ω + hn κ , ω, ε ij (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) X, p n (cid:21) ≤ Mn κ for all ω, h ∈ Ω and all i, j so thatsup ω,h ∈ Ω n − X j = i E (cid:20) η n,ij (cid:18) ω + hn κ , ω, ε ij (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) X, p n (cid:21) ≤ Mn κ . 
By (S.78) E " sup ω,h ∈ Ω n − X j = i η n,ij (cid:18) ω + hn κ , ω, ε ij (cid:19)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X, p n ≤ M η √ n − Mn κ . Combining the results we obtain the upper bound E " sup ω,h ∈ Ω (cid:13)(cid:13)(cid:13)(cid:13) G n W ni (cid:18) g ni (cid:18) ω + hn κ , ε i (cid:19) − g ni ( ω, ε i ) (cid:19)(cid:13)(cid:13)(cid:13)(cid:13) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X, p n ≤ K ¯ J (cid:18) M η √ n − Mn κ (cid:19) . (S.79)Hence the upper bound in (S.75) is K ¯ Jδ (cid:18) M η √ n − Mn κ (cid:19) which for all δ > n sufficiently large. Weconclude that T n = o p (1).Step 3: T n .We use the delta method to derive an asymptotically linear representation of T n ,from which we can see how T n contributes to the asymptotic distribution of Y n .By (S.72) the probability P ∗ n,ij ( ω ) is differentiable in ω with the derivative ∇ ω P ∗ n,ij ( ω ) = 2 ( n − n − f ε (cid:18) U n,ij + 2 ( n − n − Z ′ j Φ ni Λ ni ω (cid:19) Λ ni Φ ′ ni Z j , By a Taylor series expansion of P ∗ n,ij ( ω ni ( ε i )) around ω ∗ ni , T n can be written as T n = 1 p n ( n − X i X j = i W n,ij (cid:0) ∇ ω ′ P ∗ n,ij ( ω ∗ ni ) ( ω ni ( ε i ) − ω ∗ ni ) + O p (cid:0) k ω ni ( ε i ) − ω ∗ ni k (cid:1)(cid:1)
51y Lemma S.7, for any i , ω ni ( ε i ) − ω ∗ ni = O p (cid:0) n − / (cid:1) and ω ni ( ε i ) has the asymptoti-cally linear approximation ω ni ( ε i ) − ω ∗ ni = 1 n − X j = i ϕ ωn,ij ( ω ∗ ni , ε ij ) + r ωni , with the influence function ϕ ωn,ij defined in Lemma S.7. The remainder r ωni satisfiesmax ≤ i ≤ n k r ωni k = o p (cid:0) n − / (cid:1) (Lemma S.8(ii)). Denote by J ωni ( ω ∗ ni ) the d θ × T Jacobianmatrix J ωni ( ω ∗ ni ) = 1 n − X j = i W n,ij ∇ ω ′ P ∗ n,ij ( ω ∗ ni ) , i = 1 , . . . , n. and by ¯ W ni the average instrument vector¯ W ni = 1 n − X j = i W n,ij , i = 1 , . . . , n. Note that both J ωni ( ω ∗ ni ) and ¯ W ni are bounded uniformly over i . Replacing ω ni ( ε i ) − ω ∗ ni with its asymptotically linear approximation we derive T n = 1 √ n X i n − X j = i W n,ij ∇ ω ′ P ∗ n,ij ( ω ∗ ni ) ! √ n − X j = i ϕ ωn,ij ( ω ∗ ni , ε ij ) + √ n − r ωni ! + 1 √ n X i n − X j = i W n,ij ! O p (cid:0) √ n − k ω ni − ω ∗ ni k (cid:1) = T l n + r n + r n , (S.80)where T l n = 1 p n ( n − X i X j = i J ωni ( ω ∗ ni ) ϕ ωn,ij ( ω ∗ ni , ε ij ) , and r n = r n − n X i J ωni ( ω ∗ ni ) r ωni ,r n = r n − n X i ¯ W ni O p (cid:0) k ω ni − ω ∗ ni k (cid:1) . The term T l n in (S.80) contributes to the asymptotic distribution and has an52symptotically normal distribution. It captures the random variation in ω ni ( ε i ). Wewill combine it with T n to derive the asymptotic distribution of Y n .The two remainder terms r n and r n are not asymptotically negligible. The sumof these terms and the fourth term T n in (S.74) is however o p (1).Step 4: T n .Observe that E [ Y n | X, p n ] = 0, so0 = E [ T n + T n + T n + T n | X, p n ]= E (cid:2) T n + T n + T l n + r n + r n + T n (cid:12)(cid:12) X, p n (cid:3) It is clear that E [ T n | X, p n ] = E (cid:2) T l n (cid:12)(cid:12) X, p n (cid:3) = 0. We have shown in Step 2 that E [ T n | X, p n ] = o (1), so E [ T n | X, p n ] = o (1). This implies that E [ r n + r n + T n | X, p n ] = E [ r n | X, p n ] + E [ r n | X, p n ] + T n = o (1) . Hence, r n + r n + T n = r n + r n − E [ r n | X, p n ] − E [ r n | X, p n ] + o (1) . Note that E [ r n | X, p n ] = r n − n X i J ωni ( ω ∗ ni ) E [ r ωni | X, p n ] , E [ r n | X, p n ] = r n − n X i ¯ W ni E (cid:2) O p (cid:0) k ω ni − ω ∗ ni k (cid:1)(cid:12)(cid:12) X, p n (cid:3) . Below we show that the two centered remainders r n − E [ r n | X, p n ] and r n − E [ r n | X, p n ] are both o p (1).We show r n − E [ r n | X, p n ] = o p (1) and the proof for r n − E [ r n | X, p n ] is similar.53y Chebyshev’s inequality, for any δ > k r n − E [ r n | X, p n ] k > δ | X, p n ) ≤ δ E (cid:0) k r n − E [ r n | X, p n ] k (cid:12)(cid:12) X, p n (cid:1) = n − nδ X i E (cid:0) ( r ωni − E [ r ωni | X, p n ]) ′ J ωni ( ω ∗ ni ) ′ J ωni ( ω ∗ ni ) ( r ωni − E [ r ωni | X, p n ]) (cid:12)(cid:12) X, p n (cid:1) ≤ n − δ max i E (cid:0) ( r ωni − E [ r ωni | X, p n ]) ′ J ωni ( ω ∗ ni ) ′ J ωni ( ω ∗ ni ) ( r ωni − E [ r ωni | X, p n ]) (cid:12)(cid:12) X, p n (cid:1) , where the equality follows because conditional on X and p n , r ωni depends on ε i only and therefore they are independent over i by Assumption 1. From LemmaS.8(ii), max ≤ i ≤ n k r ωni k = o p (cid:0) n − / (cid:1) , by the dominated convergence theorem we have E [ max ≤ i ≤ n k r ωni k| X, p n ] = o (cid:0) n − / (cid:1) , so max ≤ i ≤ n k r ωni − E [ r ωni | X, p n ] k ≤ max ≤ i ≤ n k r ωni k + E [ max ≤ i ≤ n k r ωni k| X, p n ] = o p (cid:0) n − / (cid:1) . 
This together with the boundedness of J ni ( ω ∗ ni ) implies that the term ( r ωni − E [ r ωni | X, p n ]) ′ J ni ( ω ∗ ni ) ′ J ni ( ω ∗ ni ) ( r ωni − E [ r ωni | X, p n ])is o p ( n − ) uniformly over i . By the dominated convergence theorem again, we obtain E (cid:18) max ≤ i ≤ n ( r ωni − E [ r ωni | X, p n ]) ′ J ni ( ω ∗ ni ) ′ J ni ( ω ∗ ni ) ( r ωni − E [ r ωni | X, p n ]) (cid:12)(cid:12)(cid:12)(cid:12) X, p n (cid:19) = o (cid:18) n (cid:19) , This shows that r n − E [ r n | X, p n ] = o p (1) . Similarly, with O p (cid:0) k ω ni − ω ∗ ni k (cid:1) in place of r ωni and ¯ W ni in place of J ni ( ω ∗ ni ), andby max ≤ i ≤ n k ω ni ( ε i ) − ω ∗ ni k = o p (cid:0) n − / (cid:1) (Lemma S.8(i)), we can derive that r n − E [ r n | X, p n ] = o p (1) . Combining the results yields r n + r n + T n = r n + r n − E [ r n | X, p n ] − E [ r n | X, p n ] + o (1) = o p (1) . Now we return to the two main terms T n in (S.74) and T l n in (S.80). Both arenormalized averages of independent random variables. If we define the d θ × Y lni = 1 p n ( n − X j = i W n,ij (cid:0) g n,ij ( ω ∗ ni , ε ij ) − P ∗ n,ij ( ω ∗ ni ) (cid:1) + J ωni ( ω ∗ ni ) ϕ ωn,ij ( ω ∗ ni , ε ij )then T n + T l n = X i Y lni . Note that conditional on X and p n , Y lni , i = 1 , . . . , n , is an independent triangulararray becuase each Y lni depends on ε i only, and ε i , i = 1 , . . . , n , are i.i.d. by Assump-tion 1. Conditional on X and p n the Y lni are not identically distributed so we have touse the Lindeberg-Feller central limit theorem (CLT) for triangular arrays to derivethe asymptotic distribution of P i Y lni .Conditional on X and p n , Y lni has mean 0. By independence of Y lni , the varianceof P i Y lni is given by X i V ar (cid:0) Y lni (cid:12)(cid:12) X, p n (cid:1) = X i E h Y lni (cid:0) Y lni (cid:1) ′ (cid:12)(cid:12)(cid:12) X, p n i = 1 n ( n − X i X j = i Σ n,ij = Σ n where Σ n,ij is defined in (S.69).Since Y lni is a vector, we verify that the conditions in the Lindeberg-Feller CLThold for a ′ P i Y lni for any vector of constants a ∈ R d θ so that ( a ′ Σ n a ) − / a ′ P i Y lni converges in distribution to N (0 , − / n P i Y lni converges in distribution to N (0 , I d θ ).Observe that given X and p n , a ′ P i Y lni has mean 0 and variance a ′ Σ n a . For theLindeberg condition, we need to show that for any ξ > n →∞ a ′ Σ n a X i E h (cid:0) a ′ Y lni (cid:1) n(cid:12)(cid:12) a ′ Y lni (cid:12)(cid:12) ≥ ξ p a ′ Σ n a o(cid:12)(cid:12)(cid:12) X, p n i = 0 . (S.81)We have X i E h (cid:0) a ′ Y lni (cid:1) n(cid:12)(cid:12) a ′ Y lni (cid:12)(cid:12) ≥ ξ p a ′ Σ n a o(cid:12)(cid:12)(cid:12) X, p n i ≤ E " X i (cid:0) a ′ Y lni (cid:1) ( max ≤ i ≤ n (cid:12)(cid:12) a ′ Y lni (cid:12)(cid:12) √ a ′ Σ n a ≥ ξ )(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X, p n , P i (cid:0) a ′ Y lni (cid:1) has a finite expectation and is therefore O p (1). Hence ifmax ≤ i ≤ n (cid:12)(cid:12) a ′ Y lni (cid:12)(cid:12) √ a ′ Σ n a = o p (1) (S.82)then X i (cid:0) a ′ Y lni (cid:1) ( max ≤ i ≤ n (cid:12)(cid:12) a ′ Y lni (cid:12)(cid:12) √ a ′ Σ n a ≥ ξ ) = O p (1) o p (1) = o p (1)Finally, this random variable is bounded by P i (cid:0) a ′ Y lni (cid:1) that has a finite expectation.We conclude that by dominated convergence the Lindeberg condition is satisfied if(S.82) holds.By Chebyshev’s inequalityPr max ≤ i ≤ n (cid:12)(cid:12) a ′ Y lni (cid:12)(cid:12) √ a ′ Σ n a ≥ ξ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X, p n ! 
≤ ξ a ′ Σ n a E (cid:20) max ≤ i ≤ n (cid:0) a ′ Y lni (cid:1) (cid:12)(cid:12)(cid:12)(cid:12) X, p n (cid:21) . By the maximal inequality in Lemma 2.2.2 in van der Vaart and Wellner (1996), E (cid:20) max ≤ i ≤ n (cid:0) a ′ Y lni (cid:1) (cid:12)(cid:12)(cid:12)(cid:12) X, p n (cid:21) ≤ K ln ( n + 1) max ≤ i ≤ n (cid:13)(cid:13)(cid:13)(cid:0) a ′ Y lni (cid:1) (cid:13)(cid:13)(cid:13) ψ | X,p n , where K is a constant depending only on ψ and k Z k ψ | X,p n is the conditional Orlicznorm of a random variable Z given X and p n for the convex function ψ ( z ) = e z − ψ , E ( | Z || X, p n ) ≤ k Z k ψ | X,p n .Next we derive a bound on max ≤ i ≤ n (cid:13)(cid:13)(cid:13)(cid:0) a ′ Y lni (cid:1) (cid:13)(cid:13)(cid:13) ψ | X,p n . Recall that a ′ Y lni = 1 p n ( n − X j = i a ′ (cid:0) W n,ij (cid:0) g n,ij ( ω ∗ ni , ε ij ) − P ∗ n,ij ( ω ∗ ni ) (cid:1) + J ωni ( ω ∗ ni ) ϕ ωn,ij ( ω ∗ ni , ε ij ) (cid:1) . Each term in the average is bounded by (cid:12)(cid:12) a ′ (cid:0) W n,ij (cid:0) g n,ij ( ω ∗ ni , ε ij ) − P ∗ n,ij ( ω ∗ ni ) (cid:1) + J ωni ( ω ∗ ni ) ϕ ωn,ij ( ω ∗ ni , ε ij ) (cid:1)(cid:12)(cid:12) ≤ k a k (cid:0) k W n,ij k + k J ωni ( ω ∗ ni ) k (cid:13)(cid:13) ϕ ωn,ij ( ω ∗ ni , ε ij ) (cid:13)(cid:13)(cid:1) ≡ M n,ij ≤ M n < ∞ ,
56o each normalized term in the sum has a support that is contained in " − M n,ij p n ( n − , M n,ij p n ( n − . By Hoeffding’s inequality for bounded random variables (Boucheron, Lugosi, and Massart,2013, Theorem 2.8), we have for any t > (cid:0) (cid:12)(cid:12) a ′ Y lni (cid:12)(cid:12) > t (cid:12)(cid:12) X, p n (cid:1) ≤ − n ( n − t P j = i M n,ij ! . Therefore, by Lemma 2.2.1 in van der Vaart and Wellner (1996), (cid:13)(cid:13)(cid:13)(cid:0) a ′ Y lni (cid:1) (cid:13)(cid:13)(cid:13) ψ | X,p n ≤ P j = i M n,ij n ( n − , Combining the results we obtain E (cid:20) max ≤ i ≤ n (cid:0) a ′ Y lni (cid:1) (cid:12)(cid:12)(cid:12)(cid:12) X, p n (cid:21) ≤ K ln ( n + 1) P j = i M n,ij n ( n − ≤ KM n ln ( n + 1) n → , so (S.82) holds and and the Lindeberg condition (S.81) is proved.By the Lindeberg-Feller CLT a ′ P i Y lni √ a ′ Σ n a d → N (0 , . Because Σ n is a symmetric and positive-definite matrix, there is a nonsingular sym-metric matrix Σ / n such that Σ / n Σ / n = Σ n . Let ˜ a = Σ / n a , then a ′ P i Y ni =˜ a ′ Σ − / n P i Y lni and a ′ Σ n a = ˜ a ′ Σ − / n Σ n Σ − / n ˜ a = ˜ a ′ ˜ a . Note that Σ n is nonsingular, so˜ a is also an arbitrary vector in R d θ . The previous result then implies that˜ a ′ Σ − / n X i Y lni d → N (0 , ˜ a ′ ˜ a ) . By the Cramer-Wold device, Σ − / n X i Y lni d → N (0 , I d θ ) . I d θ is the d θ × d θ identity matrix.Because Y n = X i Y lni + o p (1) , we conclude that by Slutsky’s theorem Y n has the asymptotic distributionΣ − / n Y n d → N (0 , I d θ ) . S.3 Proofs in Section 5
Proof of Theorem 5.1.
We prove the theorem in the case where both T i + and T i − are nonempty. The proof also holds for special cases where T i + is empty (i.e., all theeigenvalues of V i ( X, σ ) are nonpositive) or T i − is empty (i.e., all the eigenvalues of V i ( X, σ ) are nonnegative) without modification. Note that the latter special case hasbeen proved in Theorem 3.2.From Proposition 3.1, the expected utility satisfies E [ U i ( G i , G − i , X, ε i ) | X, ε i , σ ]= X j = i G ij ( U ij ( X, σ ) − ε ij )+ X t λ it ( X, σ ) max ω t ∈ R ( n − n − X j = i G ij Z ′ j φ it ( X, σ ) ω t − ( n − n − ω t ) = max ( ω t ,t ∈T i + ) min ( ω t ,t ∈T i − ) X j = i G ij U ij ( X, σ ) + 2 ( n − n − Z ′ j X t φ it ( X, σ ) λ it ( X, σ ) ω t − ε ij ! − ( n − n − X t λ it ( X, σ ) ω t (S.83)= max ( ω t ,t ∈T i + ) min ( ω t ,t ∈T i − ) X j = i G ij (cid:18) U ij ( X, σ ) + 2 ( n − n − Z ′ j Φ i ( X, σ ) Λ i ( X, σ ) ω − ε ij (cid:19) − ( n − n − ω ′ Λ i ( X, σ ) ω. (S.84)The second equality in (S.83) follows because if we move an eigenvalue λ it inside amaximization, it remains a maximzation if λ it ≥ it <
0. Note that the transformed expected utility is separable in each maximization,so the order of the maximizations and minimizations in (S.83) and (S.84) does notmatter.Denote by ˜Π ( G i , ω, ε i , X, σ ) the objective function of the maximin problem in(S.84)˜Π i ( G i , ω, ε i , X, σ ) = X j = i G ij (cid:18) U ij ( X, σ ) + 2 ( n − n − Z ′ j Φ i ( X, σ ) Λ i ( X, σ ) ω − ε ij (cid:19) − ( n − n − ω ′ Λ i ( X, σ ) ω. We have max G i E [ U i ( G i , G − i , X, ε i ) | X, ε i , σ ]= max G i max ( ω t ,t ∈T i + ) min ( ω t ,t ∈T i − ) ˜Π i ( G i , ω, ε i , X, σ ) ≤ max ( ω t ,t ∈T i + ) min ( ω t ,t ∈T i − ) max G i ˜Π i ( G i , ω, ε i , X, σ )= max ( ω t ,t ∈T i + ) min ( ω t ,t ∈T i − ) Π i ( ω, ε i , X, σ ) , (S.85)where Π i ( ω, ε i , X, σ ) is defined in (5.2). The inequality follows because ˜Π i ( G i , ω ) ≤ max G i ˜Π i ( G i , ω ) for all G i and ω , so we havemax ( ω t ,t ∈T i + ) min ( ω t ,t ∈T i − ) ˜Π i ( G i , ω ) ≤ max ( ω t ,t ∈T i + ) min ( ω t ,t ∈T i − ) max G i ˜Π i ( G i , ω )for all G i , and thus the maximum of the left-hand side over G i is bounded above bythe right-hand side. The last equality in (S.85) holds because for any ω , ˜Π i ( G i , ω ) isseparable in each G ij so the optimal G ij is given by (5.1) with ω i ( X, ε i , σ ) replacedby ω and max G i ˜Π i ( G i , ω ) = Π i ( ω ).We now show that the inequality in (S.85) is an equality. Since ω i ( X, ε i , σ ) is asolution to the maximin problem in the last line of (S.85), similarly as in Lemma S.159t satisfies the first-order conditionΛ i ( X, σ ) ω i ( X, ε i , σ )= 1 n − X j = i (cid:26) U ij ( X, σ ) + 2 ( n − n − Z ′ j Φ i ( X, σ ) Λ i ( X, σ ) ω i ( X, ε i , σ ) − ε ij ≥ (cid:27) · Λ i ( X, σ ) Φ ′ i ( X, σ ) Z j , a.s..Pre-multiplication by Φ i ( X, σ ) givesΦ i ( X, σ ) Λ i ( X, σ ) ω i ( X, ε i , σ ) = 1 n − V i ( X, σ ) X j = i G ij ( X, ε i , σ ) Z j , a.s., (S.86)where G ij ( X, ε i , σ ) is given in (5.1). By the definition of G i ( X, ε i , σ ) and ω i ( X, ε i , σ ),the maximin value of Π ( ω, ε i , X, σ ) is given bymax ( ω t ,t ∈T i + ) min ( ω t ,t ∈T i − ) Π i ( ω, ε i , X, σ )= ˜Π ( G i ( X, ε i , σ ) , ω i ( X, ε i , σ ) ; X, ε i , σ )= X j = i G ij ( X, ε i , σ ) ( U ij ( X, σ ) − ε ij )+ 2 ( n − n − X j = i G ij ( X, ε i , σ ) Z ′ j Φ i ( X, σ ) Λ i ( X, σ ) ω i ( X, ε i , σ ) − ( n − n − ω i ( X, ε i , σ ) ′ Λ i ( X, σ ) ω i ( X, ε i , σ ) . Let V + i ( X, σ ) and Λ + i ( X, σ ) be the Moore-Penrose generalized inverse of V i ( X, σ )and Λ i ( X, σ ), respectively. Clearly V + i ( X, σ ) = Φ i ( X, σ ) Λ + i ( X, σ ) Φ i ( X, σ ) ′ . Thequadratic term in the last display satisfies ω i ( X, ε i , σ ) ′ Λ i ( X, σ ) ω i ( X, ε i , σ )= ω i ( X, ε i , σ ) ′ Λ i ( X, σ ) Λ + i ( X, σ ) Λ i ( X, σ ) ω i ( X, ε i , σ )= ω i ( X, ε i , σ ) ′ Λ i ( X, σ ) Φ i ( X, σ ) ′ Φ i ( X, σ ) Λ + i ( X, σ ) Φ i ( X, σ ) ′ Φ i ( X, σ ) Λ i ( X, σ ) ω i ( X, ε i , σ )= 1( n − V i ( X, σ ) X j = i G ij ( X, ε i , σ ) Z j ! ′ V + i ( X, σ ) V i ( X, σ ) X j = i G ij ( X, ε i , σ ) Z j ! , a.s.,60here we have used (S.86) to derive the last equality. Therefore,max ( ω t ,t ∈T i + ) min ( ω t ,t ∈T i − ) Π i ( ω, ε i , X, σ )= X j = i G ij ( X, ε i , σ ) ( U ij ( X, σ ) − ε ij )+ 2 n − X j = i G ij ( X, ε i , σ ) Z j ! ′ V i ( X, σ ) X j = i G ij ( X, ε i , σ ) Z j ! − n − V i ( X, σ ) X j = i G ij ( X, ε i , σ ) Z j ! ′ V + i ( X, σ ) V i ( X, σ ) X j = i G ij ( X, ε i , σ ) Z j ! 
= X j = i G ij ( X, ε i , σ ) ( U ij ( X, σ ) − ε ij )+ 1 n − X j = i X k = i G ij ( X, ε i , σ ) G ik ( X, ε i , σ ) Z ′ j V i ( X, σ ) Z k , a.s.= E [ U i ( G i ( X, ε i , σ ) , G − i , X, ε i ) | X, ε i , σ ] , a.s.. (S.87)Combining (S.85) and (S.87) yieldsmax G i E [ U i ( G i , G − i , X, ε i ) | X, ε i , σ ] ≤ max ( ω t ,t ∈T i + ) min ( ω t ,t ∈T i − ) Π i ( ω, ε i , X, σ )= E [ U i ( G i ( X, ε i , σ ) , G − i , X, ε i ) | X, ε i , σ ] , a.s..Because max G i E [ U i ( G i , G − i , X, ε i ) | X, ε i , σ ] ≥ E [ U i ( G i ( X, ε i , σ ) , G − i , X, ε i ) | X, ε i , σ ],the inequality becomes an equality, and all the terms are equal. Hence, G i ( X, ε i , σ )is an optimal solution almost surely.As for the uniqueness, G i ( X, ε i , σ ) is unique almost surely because ε i has a contin-uous distribution, so two link decisions achieve the same utility with probability zero.The uniqueness of Λ i ( X, σ ) ω i ( X, ε i , σ ) follows from the uniqueness of G i ( X, ε i , σ ),(S.86) and the invertibility of Φ i ( X, σ ). The proof is complete.
Proof of Theorem 5.3.
For simplicity, we omit the arguments (
X, σ ) (or ( X i , σ ))whenever possible. Define ˜ ω ni ( ε i ) = Φ ni ω ni ( ε i ) and ˜ ω i = Φ i ω i . The finite- n and61imiting conditional choice probabilities depend on ˜ ω ni ( ε i ) and ˜ ω i , respectively, i.e., P n,ij ( X, σ ) = Pr (cid:18) U n,ij + 2 ( n − n − Z ′ j V ni ˜ ω ni ( ε i ) − ε ij ≥ (cid:12)(cid:12)(cid:12)(cid:12) X, σ (cid:19) and P ij ( X i , X j , σ ) = Pr (cid:0) U ij + 2 Z ′ j V i ˜ ω i − ε ij ≥ (cid:12)(cid:12) X i , X j , σ (cid:1) . Notice that ω ′ Λ ni ω = (Φ ni ω ) ′ V ni (Φ ni ω ) ω ′ Λ i ω = (Φ i ω ) ′ V i (Φ i ω ) . Since Φ ni and Φ i are nonsingular, there are one-to-one mappings between ω and Φ ni ω and between ω and Φ i ω . Therefore, ˜ ω ni ( ε i ) and ˜ ω i are the solutions tomax ˜ ω ˜Π ni (˜ ω, ε i , X, σ ) = max ˜ ω n − X j = i (cid:20) U n,ij + 2 ( n − n − Z ′ j V ni ˜ ω − ε ij (cid:21) + − n − n − ω ′ V ni ˜ ω and max ˜ ω ˜Π i (˜ ω, X i , σ ) = max ˜ ω E h (cid:2) U ij + 2 Z ′ j V i ˜ ω − ε ij (cid:3) + (cid:12)(cid:12)(cid:12) X i , σ i − ˜ ω ′ V i ˜ ω, respectively. The advantage of the change of variables is that we get rid of the eigen-values and eigenvectors in the expressions so that the conditional choice probabilitiesand the objective functions ˜Π ni and ˜Π i only involve V ni and V i .By the definition of P n,ij and P ij , | P n,ij ( X, σ ) − P ij ( X i , X j , σ ) |≤ E (cid:20) (cid:12)(cid:12)(cid:12)(cid:12) (cid:26) U n,ij + 2 ( n − n − Z ′ j V ni ˜ ω ni ( ε i ) ≥ ε ij (cid:27) − (cid:8) U ij + 2 Z ′ j V i ˜ ω i ≥ ε ij (cid:9)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X, σ (cid:21) ≤ E (cid:20) (cid:12)(cid:12)(cid:12)(cid:12) (cid:26) U n,ij + 2 ( n − n − Z ′ j V ni ˜ ω ni ( ε i ) ≥ ε ij (cid:27) − (cid:8) U n,ij + 2 Z ′ j V ni ˜ ω i ≥ ε ij (cid:9)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X, σ (cid:21) + E (cid:2) (cid:12)(cid:12) (cid:8) U n,ij + 2 Z ′ j V ni ˜ ω i ≥ ε ij (cid:9) − (cid:8) U ij + 2 Z ′ j V i ˜ ω i ≥ ε ij (cid:9)(cid:12)(cid:12)(cid:12)(cid:12) X, σ (cid:3) . (S.88)62bserve that U n,ij + 2 ( n − n − Z ′ j V ni ˜ ω ni ( ε i ) − (cid:0) U n,ij + 2 Z ′ j V ni ˜ ω i (cid:1) =2 Z ′ j V ni (˜ ω ni ( ε i ) − ˜ ω i ) + 2 n − Z ′ j V ni ˜ ω ni ( ε i ) ≡ ∆ ni ( ε i )so the first term in the last expression in (S.88) can be bounded byPr ( | ∆ ni ( ε i ) | > δ n | X, σ )+ Pr (cid:0) U n,ij + 2 Z ′ j V ni ˜ ω i − δ n ≤ ε ij ≤ U n,ij + 2 Z ′ j V ni ˜ ω i + δ n (cid:12)(cid:12) X, σ (cid:1) (S.89)for δ n >
0. This is because if ε ij lies between U n,ij + n − n − Z ′ j V ni ˜ ω ni ( ε i ) and U n,ij +2 Z ′ j V ni ˜ ω i , and if their difference | ∆ ni ( ε i ) | is at most δ n , then ε ij must lie between U n,ij + 2 Z ′ j V ni ˜ ω i − δ n and U n,ij + 2 Z ′ j V ni ˜ ω i + δ n .Given X i and σ , ˜ ω ni ( ε i ) − ˜ ω i = o p (1) by Lemma S.11. Since V ni and ˜ ω ni ( ε i ) =Φ ni ω ni ( ε i ) are bounded, we have for any δ n > | ∆ ni ( ε i ) | > δ n | X i , σ ) → n → ∞ . By the law of iterated expectationPr ( | ∆ ni ( ε i ) | > δ n | X i , σ ) = E [ Pr ( | ∆ ni ( ε i ) | > δ n | X i , X j , σ ) | X i , σ ]= T X t =1 Pr ( | ∆ ni ( ε i ) | > δ n | X i , X j = x t , σ ) Pr ( X j = x t ) . In the expression the expectation is taken with respect to X j . If there is t ∈{ , . . . , T } such that Pr ( | ∆ ni ( ε i ) | > δ n | X i , X j = x t , σ ) does not converge to 0, thenPr ( | ∆ ni ( ε i ) | > δ n | X i , σ ) cannot converge to 0. This implies that given X i , X j , and σ we have Pr ( | ∆ ni ( ε i ) | > δ n | X i , X j , σ ) → . Note that X = ( X i , X j , X − ij ), where X − ij = ( X k , k = i, j ). By the law of iteratedexpectation againPr ( | ∆ ni ( ε i ) | > δ n | X i , X j , σ ) = E [ Pr ( | ∆ ni ( ε i ) | > δ n | X i , X j , X − ij , σ ) | X i , X j , σ ] , X − ij . Because convergence in mean impliesconvergence in probability (by Markov’s inequality), given X i , X j , and σ we musthave Pr ( | ∆ ni ( ε i ) | > δ n | X i , X j , X − ij , σ ) = o p (1) . (S.90)For the second term in the bound in (S.89), by the mean-value theoremPr (cid:0) U n,ij + 2 Z ′ j V ni ˜ ω i − δ n ≤ ε ij ≤ U n,ij + 2 Z ′ j V ni ˜ ω i + δ n (cid:12)(cid:12) X i , X j , X − ij , σ (cid:1) = F ε (cid:0) U n,ij + 2 Z ′ j V ni ˜ ω i + δ n (cid:1) − F ε (cid:0) U n,ij + 2 Z ′ j V ni ˜ ω i − δ n (cid:1) =2 f ε (cid:0) U n,ij + 2 Z ′ j V ni ˜ ω i + t n,ij δ n (cid:1) δ n for some t n,ij ∈ [ − , f ε is bounded, f ε (cid:0) U n,ij + 2 Z ′ j V ni ˜ ω i + t n,ij δ n (cid:1) is bounded as well. We choose δ n > δ n ↓ n → ∞ , soPr (cid:0) U n,ij + 2 Z ′ j V ni ˜ ω i − δ n ≤ ε ij ≤ U n,ij + 2 Z ′ j V ni ˜ ω i + δ n (cid:12)(cid:12) X i , X j , X − ij , σ (cid:1) = o p (1) . (S.91)Combining (S.90) and (S.91), given X i , X j , and σ the first term in the last expressionin (S.88) is o p (1) E (cid:20) (cid:26) U n,ij + 2 ( n − n − Z ′ j V ni ˜ ω ni ( ε i ) ≥ ε ij (cid:27) − (cid:8) U n,ij + 2 Z ′ j V ni ˜ ω i ≥ ε ij (cid:9)(cid:12)(cid:12)(cid:12)(cid:12) X i , X j , X − ij , σ (cid:21) = o p (1) . The last term in (S.88) satisfies E (cid:2) (cid:12)(cid:12) (cid:8) U n,ij + 2 Z ′ j V ni ˜ ω i ≥ ε ij (cid:9) − (cid:8) U ij + 2 Z ′ j V i ˜ ω i ≥ ε ij (cid:9)(cid:12)(cid:12)(cid:12)(cid:12) X i , X j , X − ij , σ (cid:3) = (cid:12)(cid:12) F ε (cid:0) U n,ij + 2 Z ′ j V ni ˜ ω i (cid:1) − F ε (cid:0) U ij + 2 Z ′ j V i ˜ ω i (cid:1)(cid:12)(cid:12) =2 f ε (cid:0) ˜ t n,ij (cid:1) (cid:12)(cid:12) U n,ij + 2 Z ′ j V ni ˜ ω i − (cid:0) U ij + 2 Z ′ j V i ˜ ω i (cid:1)(cid:12)(cid:12) ≤ f ε (cid:0) ˜ t n,ij (cid:1) ( | U n,ij − U ij | + 2 k V ni − V i k k ˜ ω i k )for some ˜ t n,ij between U n,ij + 2 Z ′ j V ni ˜ ω i and U ij + 2 Z ′ j V i ˜ ω i , where the second equalityfollows from the mean-value theorem. 
Since the density $f_\varepsilon$ is bounded and, given $X_i$, $X_j$, and $\sigma$, $U_{n,ij} - U_{ij} = o_p(1)$ and $V_{ni} - V_i = o_p(1)$ by Assumption 7, the last term in (S.88) is also $o_p(1)$:
\[
E\left[ \left| \mathbf{1}\left\{ U_{n,ij} + 2 Z_j' V_{ni}\,\tilde{\omega}_i \ge \varepsilon_{ij} \right\} - \mathbf{1}\left\{ U_{ij} + 2 Z_j' V_i\,\tilde{\omega}_i \ge \varepsilon_{ij} \right\} \right| \,\Big|\, X_i, X_j, X_{-ij}, \sigma \right] = o_p(1).
\]
Combining these results, we conclude that given $X_i$, $X_j$, and $\sigma$,
\[
P_{n,ij}(X_i, X_j, X_{-ij}, \sigma) - P_{ij}(X_i, X_j, \sigma) = o_p(1).
\]
The proof is complete.
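Before turning to Lemma S.11, it may help to see the elementary bound (S.89) at work: the probability that two indicators with nearby thresholds disagree is at most the probability that the threshold perturbation is large plus the probability that the shock falls in a narrow band. The toy simulation below illustrates this; the constants and distributions are placeholders chosen only to make the check concrete, not quantities from the model.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 10**6                                 # Monte Carlo draws (toy check, not the model)
eps = rng.logistic(size=m)                # eps_{ij} with bounded density f_eps
c = 0.3                                   # stands in for U_{n,ij} + 2 Z_j'V_{ni} w_i
Delta = rng.normal(0.0, 0.05, size=m)     # stands in for Delta_{ni}(eps_i)
d_n = 0.1                                 # band width delta_n

# Frequency with which the two indicators disagree, vs. the (S.89)-type bound:
disagree = ((eps <= c + Delta) != (eps <= c)).mean()
bound = (np.abs(Delta) > d_n).mean() + (np.abs(eps - c) <= d_n).mean()
print(f"P(disagree) = {disagree:.4f} <= bound = {bound:.4f}")
```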
Lemma S.11 (Consistency of $\tilde{\omega}_{ni}(\varepsilon_i)$ for $\tilde{\omega}_i$) Suppose that Assumptions 1-3 and 7 are satisfied. Given $X_i$ and $\sigma$, $\tilde{\omega}_{ni}(\varepsilon_i)$ and $\tilde{\omega}_i$ defined in the proof of Theorem 5.3 satisfy $\tilde{\omega}_{ni}(\varepsilon_i) - \tilde{\omega}_i = o_p(1)$, i.e., for any $\delta > 0$,
\[
\Pr\left( \|\tilde{\omega}_{ni}(\varepsilon_i) - \tilde{\omega}_i\| > \delta \mid X_i, \sigma \right) \to 0 \quad \text{as } n \to \infty. \tag{S.92}
\]

Proof.
Recall that $\tilde{\omega}_{ni}(\varepsilon_i)$ and $\tilde{\omega}_i$ are the solutions to the transformed maximization problems
\[
\max_{\tilde{\omega}} \tilde{\Pi}_{ni}(\tilde{\omega}, \varepsilon_i, X, \sigma) = \max_{\tilde{\omega}} \frac{1}{n-1} \sum_{j \neq i} \left[ U_{n,ij} + \frac{2(n-2)}{n-1} Z_j' V_{ni}\,\tilde{\omega} - \varepsilon_{ij} \right]_+ - \frac{n-2}{n-1}\,\tilde{\omega}' V_{ni}\,\tilde{\omega}
\]
and
\[
\max_{\tilde{\omega}} \tilde{\Pi}_i(\tilde{\omega}, X_i, \sigma) = \max_{\tilde{\omega}} E\left[ \left[ U_{ij} + 2 Z_j' V_i\,\tilde{\omega} - \varepsilon_{ij} \right]_+ \Big|\, X_i, \sigma \right] - \tilde{\omega}' V_i\,\tilde{\omega}. \tag{S.93}
\]
Because the original maximization problem in (5.8) has a unique solution $\omega_i$ by Assumption 7, the one-to-one relationship between $\omega_i$ and $\tilde{\omega}_i$ implies that $\tilde{\omega}_i$ is the unique solution to the transformed maximization problem.

Observe that
\[
\frac{\partial}{\partial c} E[c - \varepsilon]_+ = \frac{\partial}{\partial c} \int_{-\infty}^{c} (c - \varepsilon) f_\varepsilon(\varepsilon)\, d\varepsilon = F_\varepsilon(c).
\]
The first-order condition of (S.93) is therefore given by
\[
\nabla_{\tilde{\omega}} \tilde{\Pi}_i(\tilde{\omega}, X_i, \sigma) = 2 V_i\, E\left[ Z_j F_\varepsilon\left( U_{ij} + 2 Z_j' V_i\,\tilde{\omega} \right) \big|\, X_i, \sigma \right] - 2 V_i\,\tilde{\omega} = 0. \tag{S.94}
\]
It is easy to see that any $\tilde{\omega}$ that satisfies the first-order condition must be bounded, since $F_\varepsilon$ takes values in $[0, 1]$ and $Z_j$ is bounded. Without loss of generality we can therefore assume that $\tilde{\omega}_i$ lies in a compact set $\tilde{\Omega}$. Since $\tilde{\Pi}_i(\tilde{\omega}, X_i, \sigma)$ is continuous in $\tilde{\omega}$, if we can further establish a uniform law of large numbers (LLN) for the objective functions, i.e.,
\[
\sup_{\tilde{\omega}} \left| \tilde{\Pi}_{ni}(\tilde{\omega}, \varepsilon_i, X, \sigma) - \tilde{\Pi}_i(\tilde{\omega}, X_i, \sigma) \right| = o_p(1) \tag{S.95}
\]
as $n \to \infty$, then following a standard consistency proof (Newey and McFadden, 1994) we can prove (S.92).
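When $V_i$ is nonsingular, (S.94) reduces to the fixed-point equation $\tilde{\omega} = E[Z_j F_\varepsilon(U_{ij} + 2 Z_j' V_i \tilde{\omega}) \mid X_i, \sigma]$, whose right-hand side is automatically bounded, which is the boundedness claim used above. A minimal sketch of iterating this equation is given below; the primitives (logistic $F_\varepsilon$, one-hot $Z_j$, uniform type probabilities, and the particular $U_{ij}$ and $V_i$ values) are illustrative assumptions, and if the iteration converges its limit satisfies (S.94).

```python
import numpy as np
from scipy.special import expit           # logistic CDF, standing in for F_eps

T = 3
p = np.full(T, 1.0 / T)                   # Pr(X_j = x_t): uniform over types (placeholder)
Zt = np.eye(T)                            # Z_j for each type: one-hot (an assumption)
Ut = np.array([0.2, -0.1, 0.4])           # U_ij as a function of j's type (placeholder)
V = 0.1 * np.eye(T) + 0.05                # V_i: symmetric and bounded (placeholder)

w = np.zeros(T)
for _ in range(500):                      # iterate w <- E[Z_j F_eps(U_ij + 2 Z_j'V_i w)]
    F = expit(Ut + 2.0 * (Zt @ V @ w))    # F_eps evaluated at each type's threshold
    w_new = Zt.T @ (p * F)                # sum_t p_t * F_t * Z_t
    if np.max(np.abs(w_new - w)) < 1e-12:
        break
    w = w_new
print("tilde omega_i:", w)                # each entry lies in [0, 1]: bounded, as claimed
```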
By the triangle inequality,
\begin{align*}
\sup_{\tilde{\omega}} \left| \tilde{\Pi}_{ni}(\tilde{\omega}, \varepsilon_i, X, \sigma) - \tilde{\Pi}_i(\tilde{\omega}, X_i, \sigma) \right| &\le \sup_{\tilde{\omega}} \frac{1}{n-1} \sum_{j \neq i} \left| \left[ U_{n,ij} + \tfrac{2(n-2)}{n-1} Z_j' V_{ni}\,\tilde{\omega} - \varepsilon_{ij} \right]_+ - \left[ U_{ij} + 2 Z_j' V_i\,\tilde{\omega} - \varepsilon_{ij} \right]_+ \right| \\
&\quad+ \sup_{\tilde{\omega}} \left| \frac{1}{n-1} \sum_{j \neq i} \left[ U_{ij} + 2 Z_j' V_i\,\tilde{\omega} - \varepsilon_{ij} \right]_+ - E\left[ \left[ U_{ij} + 2 Z_j' V_i\,\tilde{\omega} - \varepsilon_{ij} \right]_+ \Big|\, X_i, \sigma \right] \right| \\
&\quad+ \sup_{\tilde{\omega}} \left| \frac{n-2}{n-1}\,\tilde{\omega}' V_{ni}\,\tilde{\omega} - \tilde{\omega}' V_i\,\tilde{\omega} \right|. \tag{S.96}
\end{align*}
Because $\left| [x]_+ - [y]_+ \right| \le |x - y|$, the first term on the right-hand side can be bounded by
\begin{align*}
&\sup_{\tilde{\omega}} \frac{1}{n-1} \sum_{j \neq i} \left| (U_{n,ij} - U_{ij}) + 2 Z_j' (V_{ni} - V_i)\,\tilde{\omega} - \frac{2}{n-1} Z_j' V_{ni}\,\tilde{\omega} \right| \\
&\quad\le \max_{j \neq i} |U_{n,ij} - U_{ij}| + 2 \|V_{ni} - V_i\| \sup_{\tilde{\omega}} \|\tilde{\omega}\| + \frac{2}{n-1} \|V_{ni}\| \sup_{\tilde{\omega}} \|\tilde{\omega}\| = o_p(1),
\end{align*}
where the last equality follows because $\max_{j \neq i} |U_{n,ij} - U_{ij}| = o_p(1)$ and $V_{ni} - V_i = o_p(1)$ by Assumption 7, and $\|V_{ni}\|$ and $\sup_{\tilde{\omega}} \|\tilde{\omega}\|$ are bounded. Similarly, we can bound the last term on the right-hand side of (S.96) by
\[
\sup_{\tilde{\omega}} \left| \tilde{\omega}' (V_{ni} - V_i)\,\tilde{\omega} \right| + \sup_{\tilde{\omega}} \frac{1}{n-1} \left| \tilde{\omega}' V_{ni}\,\tilde{\omega} \right| \le \left( \|V_{ni} - V_i\| + \frac{1}{n-1} \|V_{ni}\| \right) \sup_{\tilde{\omega}} \|\tilde{\omega}\|^2 = o_p(1).
\]
For the second term on the right-hand side of (S.96), observe that given $X_i$ and $\sigma$, the functions $\left[ U_{ij} + 2 Z_j' V_i\,\tilde{\omega} - \varepsilon_{ij} \right]_+$ are i.i.d. across $j$. These functions have an envelope
\[
\left[ U_{ij} + 2 Z_j' V_i\,\tilde{\omega} - \varepsilon_{ij} \right]_+ \le \left[ U_{ij} + 2 \|V_i\| \sup_{\tilde{\omega}} \|\tilde{\omega}\| - \varepsilon_{ij} \right]_+, \quad \forall\, \tilde{\omega} \in \tilde{\Omega},
\]
that is integrable, since
\begin{align*}
E\left[ \left[ U_{ij} + 2 \|V_i\| \sup_{\tilde{\omega}} \|\tilde{\omega}\| - \varepsilon_{ij} \right]_+ \Big|\, X_i, \sigma \right] &\le E\left[ \left| U_{ij} + 2 \|V_i\| \sup_{\tilde{\omega}} \|\tilde{\omega}\| - \varepsilon_{ij} \right| \,\Big|\, X_i, \sigma \right] \\
&\le E\left[ \left( U_{ij} + 2 \|V_i\| \sup_{\tilde{\omega}} \|\tilde{\omega}\| - \varepsilon_{ij} \right)^2 \Big|\, X_i, \sigma \right]^{1/2} < \infty.
\end{align*}
Note that $U_{ij} + 2 Z_j' V_i\,\tilde{\omega}$ is linear in $\tilde{\omega}$ and the function $[x]_+$ is Lipschitz in $x$ because $\left| [x]_+ - [y]_+ \right| \le |x - y|$. The function $\left[ U_{ij} + 2 Z_j' V_i\,\tilde{\omega} - \varepsilon_{ij} \right]_+$ is therefore Lipschitz in $\tilde{\omega}$, i.e., for any $\tilde{\omega}_1, \tilde{\omega}_2 \in \tilde{\Omega}$,
\[
\left| \left[ U_{ij} + 2 Z_j' V_i\,\tilde{\omega}_1 - \varepsilon_{ij} \right]_+ - \left[ U_{ij} + 2 Z_j' V_i\,\tilde{\omega}_2 - \varepsilon_{ij} \right]_+ \right| \le 2 \|V_i\| \left\| \tilde{\omega}_1 - \tilde{\omega}_2 \right\|,
\]
so the class of functions $\left\{ \left[ U_{ij} + 2 Z_j' V_i\,\tilde{\omega} - \varepsilon_{ij} \right]_+,\ \tilde{\omega} \in \tilde{\Omega} \right\}$ is a type II class as defined in Andrews (1994). It thus satisfies Pollard's entropy condition (Andrews, 1994, Theorem 2), and the uniform LLN follows:
\[
\sup_{\tilde{\omega}} \left| \frac{1}{n-1} \sum_{j \neq i} \left[ U_{ij} + 2 Z_j' V_i\,\tilde{\omega} - \varepsilon_{ij} \right]_+ - E\left[ \left[ U_{ij} + 2 Z_j' V_i\,\tilde{\omega} - \varepsilon_{ij} \right]_+ \Big|\, X_i, \sigma \right] \right| = o_p(1).
\]
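The uniform LLN (S.95) can be visualized in a scalar toy model. For standard logistic shocks the population counterpart has the closed form $E[c - \varepsilon]_+ = \log(1 + e^c)$ (the antiderivative of the logistic CDF), so the sup-gap over a grid for $\tilde{\omega}$ can be computed exactly. The primitives below ($U_{ij}$, $V_i$, and the scalar $Z_j = 1$) are again placeholders, not values from the model.

```python
import numpy as np

rng = np.random.default_rng(2)
grid = np.linspace(-2.0, 2.0, 201)       # compact set Omega-tilde (scalar toy)
U0, V0 = 0.2, 0.5                        # scalar stand-ins for U_ij and V_i
c = U0 + 2.0 * V0 * grid                 # thresholds U_ij + 2 Z_j'V_i w, with Z_j = 1

def sup_gap(n):
    """Sup over the grid of |sample mean - population mean| of [c - eps]_+."""
    eps = rng.logistic(size=n)
    emp = np.maximum(c[:, None] - eps[None, :], 0.0).mean(axis=1)
    pop = np.log1p(np.exp(c))            # E[c - eps]_+ = log(1 + e^c) for logistic eps
    return np.abs(emp - pop).max()

for n in (100, 1000, 10000):
    print(n, round(sup_gap(n), 4))       # the sup gap shrinks as n grows
```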
Proof of Example 5.1. We verify that under Assumption 7(i), $U_{n,ij}(X, \sigma)$ and $V_{ni}(X, \sigma)$ given in Example 5.1 satisfy Assumption 7(ii). Recall that
\begin{align*}
U_{n,ij}(X, \sigma) - U_{ij}(X_i, X_j, \sigma) &= \frac{1}{n-2} \sum_{k \neq i,j} \left( \sigma(X_j, X_k) \beta(X_i, X_j, X_k) - E\left[ \sigma(X_j, X_k) \beta(X_i, X_j, X_k) \mid X_i, X_j, \sigma \right] \right) \\
&\quad- \frac{1}{n-1} Z_j' V_{ni}(X, \sigma) Z_j,
\end{align*}
and for $s, t = 1, \ldots, T$,
\[
V_{ni,st}(X, \sigma) - V_{i,st}(X_i, \sigma) = \frac{1}{n-3} \sum_{l \neq i,j,k} \left( \sigma(x_s, X_l) \sigma(x_t, X_l) \gamma(X_i, x_s, x_t) - E\left[ \sigma(x_s, X_l) \sigma(x_t, X_l) \gamma(X_i, x_s, x_t) \mid X_i, \sigma \right] \right).
\]
Denote
\[
\Delta_U(X_i, X_j, X_k, \sigma) = \sigma(X_j, X_k) \beta(X_i, X_j, X_k) - E\left[ \sigma(X_j, X_k) \beta(X_i, X_j, X_k) \mid X_i, X_j, \sigma \right],
\]
and for $s, t = 1, \ldots, T$,
\[
\Delta_{Vst}(X_i, x_s, x_t, X_l, \sigma) = \sigma(x_s, X_l) \sigma(x_t, X_l) \gamma(X_i, x_s, x_t) - E\left[ \sigma(x_s, X_l) \sigma(x_t, X_l) \gamma(X_i, x_s, x_t) \mid X_i, \sigma \right].
\]
We first look at $U_{n,ij}(X, \sigma) - U_{ij}(X_i, X_j, \sigma)$. Given $X_i$, $X_j$, and $\sigma$, for any $\delta > 0$, by Chebyshev's inequality,
\begin{align*}
\Pr\left( \left| \frac{1}{n-2} \sum_{k \neq i,j} \Delta_U(X_i, X_j, X_k, \sigma) \right| > \delta \,\Bigg|\, X_i, X_j, \sigma \right) &\le \frac{1}{\delta^2} E\left[ \left( \frac{1}{n-2} \sum_{k \neq i,j} \Delta_U(X_i, X_j, X_k, \sigma) \right)^2 \Bigg|\, X_i, X_j, \sigma \right] \\
&= \frac{1}{\delta^2 (n-2)^2} \sum_{k \neq i,j} E\left[ \left( \Delta_U(X_i, X_j, X_k, \sigma) \right)^2 \Big|\, X_i, X_j, \sigma \right] \to 0
\end{align*}
as $n \to \infty$, where the equality follows because the $X_k$, $k \neq i, j$, are i.i.d. (Assumption 7(i)). This proves
\[
\frac{1}{n-2} \sum_{k \neq i,j} \Delta_U(X_i, X_j, X_k, \sigma) = o_p(1)
\]
for any $j \neq i$. Because $\frac{1}{n-2} \sum_{k \neq i,j} \Delta_U(X_i, X_j, X_k, \sigma)$ depends on $j$ only through $X_j$, and $X_j$ takes only $T$ values, we have
\[
\max_{j \neq i} \left| \frac{1}{n-2} \sum_{k \neq i,j} \Delta_U(X_i, X_j, X_k, \sigma) \right| = \max_{t = 1, \ldots, T} \left| \frac{1}{n-2} \sum_{k \neq i,j} \Delta_U(X_i, x_t, X_k, \sigma) \right| \le \sum_{t=1}^{T} \left| \frac{1}{n-2} \sum_{k \neq i,j} \Delta_U(X_i, x_t, X_k, \sigma) \right| = o_p(1).
\]
The second term in $U_{n,ij}(X, \sigma) - U_{ij}(X_i, X_j, \sigma)$ satisfies
\[
\max_{j \neq i} \frac{1}{n-1} \left| Z_j' V_{ni}(X, \sigma) Z_j \right| \le \frac{1}{n-1} \|V_{ni}(X, \sigma)\| = o_p(1),
\]
because $V_{ni}(X, \sigma)$ is bounded. Therefore,
\[
\max_{j \neq i} |U_{n,ij}(X, \sigma) - U_{ij}(X_i, X_j, \sigma)| \le \max_{j \neq i} \left| \frac{1}{n-2} \sum_{k \neq i,j} \Delta_U(X_i, X_j, X_k, \sigma) \right| + \max_{j \neq i} \frac{1}{n-1} \left| Z_j' V_{ni}(X, \sigma) Z_j \right| = o_p(1).
\]
Note that $\frac{1}{n-2} \sum_{k \neq i,j} \Delta_U(X_i, X_j, X_k, \sigma) = \frac{1}{n-2} \sum_{k \neq i,j'} \Delta_U(X_i, X_{j'}, X_k, \sigma)$ for any $j \neq j'$ with $X_j = X_{j'}$.
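The Chebyshev step above is a standard variance calculation for an average of i.i.d. centered terms. The short simulation below reproduces it with placeholder values standing in for $\sigma(X_j, X_k)\beta(X_i, X_j, X_k)$: the frequency with which the centered average exceeds a fixed $\delta$ decays as $n$ grows, and it respects the Chebyshev bound.

```python
import numpy as np

rng = np.random.default_rng(3)
delta = 0.05                                # the fixed delta in Chebyshev's inequality

def centered_avg(n):
    """(n-2)^{-1} sum over k != i,j of Delta_U, with placeholder sigma*beta draws."""
    sb = rng.uniform(0.0, 1.0, size=n - 2)  # sigma(X_j,X_k)*beta(X_i,X_j,X_k), i.i.d.
    return (sb - 0.5).mean()                # centered at the conditional mean

for n in (100, 1000, 10000):
    draws = np.array([centered_avg(n) for _ in range(2000)])
    freq = (np.abs(draws) > delta).mean()
    # Chebyshev bound: Var/(delta^2 (n-2)), with Var = 1/12 for Uniform(0,1)
    print(n, freq, "<=", (1.0 / 12.0) / (delta**2 * (n - 2)))
```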
As for $V_{ni}(X, \sigma) - V_i(X_i, \sigma)$, by Chebyshev's inequality and the fact that the $X_l$ are i.i.d.,
\begin{align*}
\Pr\left( \left| \frac{1}{n-3} \sum_{l \neq i,j,k} \Delta_{Vst}(X_i, x_s, x_t, X_l, \sigma) \right| > \delta \,\Bigg|\, X_i, \sigma \right) &\le \frac{1}{\delta^2} E\left[ \left( \frac{1}{n-3} \sum_{l \neq i,j,k} \Delta_{Vst}(X_i, x_s, x_t, X_l, \sigma) \right)^2 \Bigg|\, X_i, \sigma \right] \\
&= \frac{1}{\delta^2 (n-3)^2} \sum_{l \neq i,j,k} E\left[ \left( \Delta_{Vst}(X_i, x_s, x_t, X_l, \sigma) \right)^2 \Big|\, X_i, \sigma \right] \to 0
\end{align*}
as $n \to \infty$. Hence, for $s, t = 1, \ldots, T$,
\[
V_{ni,st}(X, \sigma) - V_{i,st}(X_i, \sigma) = o_p(1).
\]
We conclude that
\[
\|V_{ni}(X, \sigma) - V_i(X_i, \sigma)\| \le T \max_{s,t = 1, \ldots, T} |V_{ni,st}(X, \sigma) - V_{i,st}(X_i, \sigma)| = o_p(1).
\]
The proof is complete.