A Semiparametric Network Formation Model with Unobserved Linear Heterogeneity
Luis E. Candelaria†

September 1, 2020
Abstract
This paper analyzes a semiparametric model of network formation in the presence of unobserved agent-specific heterogeneity. The objective is to identify and estimate the preference parameters associated with homophily on observed attributes when the distributions of the unobserved factors are not parametrically specified. This paper offers two main contributions to the literature on network formation. First, it establishes a new point identification result for the vector of parameters that relies on the existence of a special regressor. The identification proof is constructive and characterizes a closed form for the parameter of interest. Second, it introduces a simple two-step semiparametric estimator for the vector of parameters with a first-step kernel estimator. The estimator is computationally tractable and can be applied to both dense and sparse networks. Moreover, I show that the estimator is consistent and has a limiting normal distribution as the number of individuals in the network increases. Monte Carlo experiments demonstrate that the estimator performs well in finite samples and in networks with different levels of sparsity.
Keywords:
Network formation, Unobserved heterogeneity, Semiparametrics, Special regressor, Inverse weighting.

∗ First version: November 2016. A previous version of this paper was titled: “A Semiparametric Network Formation Model with Multiple Linear Fixed Effects.”
† Department of Economics, University of Warwick, Coventry, U.K. Email:
[email protected]. I am deeply grateful to Federico Bugni, Shakeeb Khan, Arnaud Maurel, and Matthew Masten for their excellent guidance, constant encouragement, and helpful discussions. I also thank Irene Botosaru, Áureo de Paula, Andreas Dzemski, Cristina Gualdani, Bryan Graham, Bo Honoré, Arthur Lewbel, Thierry Magnac, Chris Muris, James Powell, Adam Rosen, Takuya Ura, Martin Weidner, and seminar participants at Aarhus, Cambridge, Duke, Gothenburg, LSE, Surrey, Syracuse, TSE, UCL, UNC Chapel Hill, Vanderbilt, Warwick, the 2018 ES Winter Meeting in Philadelphia, the 2019 Panel Data Workshop at the University of Amsterdam, and the 2019 Royal Economic Society meeting at the University of Warwick for their comments.

1 Introduction
People tend to connect with individuals with whom they share similar observed attributes. This observation is known as homophily, and it is one of the main objects of study in the literature on social networks (McPherson, Smith-Lovin, and Cook 2001). However, few have investigated the role of homophily when individuals have preferences for unobserved attributes. Proper policy evaluation requires us to distinguish between the contributions of observed and unobserved attributes, since they have different policy implications. For example, students might form friendships based on their similarities on observed socioeconomic attributes as well as on their preferences for high levels of unobserved ability. While socioeconomic attributes can be influenced by a given policy intervention, preferences for ability are harder to change via targeted policies. In this paper, I study the identification and estimation of the preference parameters associated with the observed attributes in a model of network formation that accounts for valuations on unobserved agent-specific factors. The identification and estimation strategies that I develop do not depend on distributional assumptions on the unobserved random components.

In particular, I consider a semiparametric model of network formation with unobserved agent-specific heterogeneity. Specifically, two distinct agents i and j form an undirected link according to the following network formation equation:

    D_{ij} = 1[ g(Z_i, Z_j)'β + A_i + A_j − U_{ij} ≥ 0 ],    (1)

where 1[·] is the indicator function, D_{ij} is a binary outcome variable that takes a value equal to 1 if agents i and j form a link and 0 otherwise, Z_i is a vector of individual-specific and observed attributes, g is a measurable function that is assumed to be known, nonlinear, finite, and symmetric in its arguments, β is a vector of unknown parameters, A_i and A_j are unobserved and agent-specific random variables, and U_{ij} is an unobserved and link-specific disturbance term. A link between two agents is undirected if the connection is reciprocal: two agents are either connected or they are not, which excludes the case where one agent is related to another without the second being related to the first.

Intuitively, equation (1) says that an undirected link between agents i and j is formed if the net benefit of the link is nonnegative. The components in equation (1) can be classified into three different categories. The first class, given by the vector of exogenous attributes g(Z_i, Z_j), captures the agents' preferences for establishing a link based on observed characteristics. For instance, this component is known as homophily on observed attributes when it captures preferences for sharing similar traits. The second class, formed by the agent-specific and unobserved factors A_i and A_j, captures the individual preferences for establishing connections based on agent-specific unobserved traits. Finally, the third class, given by a link-specific disturbance term U_{ij}, captures the exogenous factors that influence the decision to form a specific link. The components in the last two categories are known to the agents but unobserved to the researcher.
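To fix ideas, the linking rule in equation (1) amounts to a single threshold comparison. The following sketch is purely illustrative; the function name and its inputs are hypothetical devices of this exposition, not part of the formal development.

```python
import numpy as np

def link_formed(g_zij, beta, a_i, a_j, u_ij):
    """Linking rule in equation (1): D_ij = 1[g(Z_i, Z_j)'beta + A_i + A_j - U_ij >= 0]."""
    net_benefit = np.dot(g_zij, beta) + a_i + a_j - u_ij
    return int(net_benefit >= 0.0)
```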
This model is related to a literature that studies dyadic link formation with agent-specific fixed effects. For instance, Charbonneau (2017) and Jochmans (2017) study a two-way gravity model, which can be rationalized as a bipartite network with directed links. Their methodologies differ substantially from the one proposed here since they follow a parametric conditional maximum likelihood approach to estimate the vector of coefficients β. In contrast, I study the formation of an undirected network and follow a semiparametric approach. This paper builds on the seminal work by Graham (2017), which aims to detect preferences for homophily while allowing for unobserved degree heterogeneity; unlike in that parametric framework, here the distribution of U_{ij} is not parametrically specified.

Since the initial draft of this paper was circulated, recent studies have appeared analyzing semiparametric or nonparametric variations of a dyadic network formation model with unobserved heterogeneity; these include papers by Toth (2017), Gao (2020), and Zeleneev (2020).

Similarly to this paper, Toth (2017) studies a dyadic network formation model in which the distribution of U_{ij} is unknown. However, the author uses a different identification strategy. In particular, his strategy relies on assuming that each component in the vector of observed attributes Z_i is continuously distributed, which is then used to propose an identification strategy similar to the maximum rank correlation estimator of Han (1987). An estimator for β is then defined as the maximizer of a U-process of order 4, with a nonparametric first-step estimator. Toth (2017) also proposes a variation of his estimation strategy that requires maximizing a U-process of order 2, with a nonparametric first-step estimator; this modification improves the computational tractability of his method.

Gao (2020) studies the identification of a dyadic network model with a nonparametric functional form for the preferences on homophily and an unknown cumulative distribution for U_{ij}. He identifies the nonparametric homophily function by introducing a novel identification strategy that imposes an interquartile-range normalization and a location normalization of one of the quantiles as stochastic restrictions on the distribution of U_{ij}. Gao (2020) also provides several interesting extensions on the functional form of the unobserved heterogeneity; for reference, see Gao (2020, p. 5) and Zeleneev (2020, p. 6). Those extensions are beyond the scope of this paper and left for future research.

Finally, Zeleneev (2020) studies the identification and estimation of a dyadic network formation model with a nonparametric structure of the unobserved heterogeneity. This framework allows him to account for latent homophily on the unobserved attributes. The author's identification analysis is based on introducing a pseudo-distance between a pair of agents i and j, which allows him to recover groups of agents with the same levels of agent-specific unobserved heterogeneity. After conditioning on the matched agents with similar unobserved heterogeneity, the identification of the vector of coefficients proceeds from a pairwise difference strategy. The estimation procedure follows the same logic as the identification strategy.

Contrary to previous studies, the identification strategy proposed here is based on the existence of a special regressor (see, e.g., Lewbel (1998) and Lewbel (2012) for a survey). This paper, to the best of my knowledge, represents the first effort in the econometric literature to introduce a special regressor to analyze a network formation model. The vector of parameters β is point identified after introducing a transformation that consists in weighting the linking decisions D_{ij}
by the inverse of the conditional density of the special regressor given the observed attributes. This transformation utilizes features of the distributions of observables and does not represent a stochastic restriction on the distribution of U_{ij}; therefore, it is not nested in any existing work. As a restriction on the distribution of U_{ij}, I normalize to zero the conditional mean of the link-specific disturbance terms given the observed attributes. In Section 3.1, I provide a detailed discussion of the sufficient conditions needed to point identify β via the existence of a special regressor. In further research, I will explore the informational content of the special regressor in a network formation model given a quantile or median restriction.

The second point identification result, introduced in Section 3.2, is based on a sufficient statistic argument at the tails of the distribution of a covariate with full support. The identification strategy shows that within- and across-individual variation in the linking decisions can be used as a sufficient statistic to difference out the unobserved agent-specific factors on sets where the covariate with full support exhibits sufficient variation. The existence of only one continuous attribute with large support in Z_i is sufficient to show this result. The latter assumption is satisfied by many real network datasets, and hence it is empirically relevant; for example, in the National Longitudinal Study of Adolescent to Adult Health (Add Health) dataset, household income is a continuous variable that can be demeaned and standardized to satisfy the support condition. The resulting semiparametric estimator is solved in one step, and it is defined as the maximizer of a U-process of order 4 with a trimming sequence.

In Section 4, I introduce a two-step semiparametric estimator for β based on the identification result that requires the existence of a special regressor. The estimator has an analytic form similar to least squares, and it uses a first-step kernel estimator to weight the linking decisions D_{ij} by the inverse of the conditional density of the special regressor. In a recent paper, Graham, Niu, and Powell (2019) have studied the nonparametric estimation of density functions with dyadic data. I follow their findings to perform the first-step kernel estimation. In Theorems 4.1 and 4.2, I show that the semiparametric estimator for β is consistent and has a limiting normal distribution.

Finally, the network formation model that I analyze is related to the literature on empirical games. Specifically, the model in equation (1) can be derived as a stable outcome in a static game. Papers that study the strategic formation of a network as a static game include Goldsmith-Pinkham and Imbens (2013); Leung (2015a,b); Menzel (2015); Miyauchi (2016); Boucher and Mourifié (2017); de Paula, Richards-Shubik, and Tamer (2017); Mele (2017); Candelaria and Ura (2018); Sheng (2018); Gualdani (2020), and Ridder and Sheng (2020). These authors study network formation models that account for network externalities. Network externalities generate interdependencies in the linking decisions that depend on the structure of the network. The identification and estimation methods used in these papers differ substantially from the ones proposed here, as they restrict the presence and distribution of the unobserved agent-specific heterogeneity.

The rest of the paper is organized as follows. Section 2 introduces the network formation model. Section 3 presents the identification results. Section 4 introduces the two-step semiparametric estimator and derives its large sample properties. Section 5 reports Monte Carlo evidence, and Section 6 concludes.
2 The Network Formation Model

A network is an ordered pair (N_n, D_n) formed by a set of n agents denoted by N_n = {1, ..., n} and an n × n adjacency matrix D_n, which represents the links between the agents in N_n. Let D_{ij} denote the (i, j)th entry of the matrix D_n. I assume the network is undirected and unweighted. A network is undirected if the adjacency matrix is symmetric, i.e., D_{ij} = D_{ji}. A network is unweighted if any (i, j)th entry of the adjacency matrix takes one of two values, where the values are normalized to be 0 and 1. In other words, D_{ij} ∈ {0, 1}, where D_{ij} = 1 if agents i and j share a link and D_{ij} = 0 otherwise. Furthermore, I normalize the value of self-ties to zero, that is, D_{ii} = 0 for any agent i.

Example 1 (Friendship network). A network of best friends is an example of an undirected and unweighted network. Two agents are considered to be best friends if and only if both agents report each other as friends. In this case, D_{ij} = D_{ji} = 1. Also, this example rules out the scenario of an agent reporting herself as her best friend.

Each agent i ∈ N_n is endowed with a (K+1)-dimensional vector of observed attributes Z_i and an unobserved scalar component A_i. Common examples of observed attributes that could explain the formation of a friendship network among high school students are age, gender, ethnicity, religion, and the students' interest in extracurricular activities. The component A_i captures individual i's preferences for establishing a link based on unobserved and agent-specific attributes. The unobserved component U_{ij} captures exogenous stochastic factors that influence the pair-specific decision to establish a link between agents i and j.

Given the vectors of observed attributes Z_i and Z_j for i ≠ j, let Z̄_{ij} = g(Z_i, Z_j) be a (K+1)-dimensional vector of pair-specific attributes. The function g is assumed to be a known measurable function that is nonlinear and finite. Given the undirected nature of the network, g is assumed to be symmetric in its arguments. The specification of g varies according to the empirical application and is chosen by the researcher to capture homophily or heterophily effects. For example, suppose that Z_i is a scalar random variable that represents agent i's gender; then Z̄_{ij} could be defined as 1[Z_i = Z_j] to capture the preferences for homophily. Under this specification, Z̄_{ij} equals 1 if agents i and j share the same gender and 0 otherwise.

The intuition behind the requirement that g is a nonlinear function is similar to the logic for the identification of the vector of coefficients in a linear panel data model with fixed effects. A specific feature of those models is that only the coefficients associated with time-varying variables are identified. The identification strategies proposed in Section 3 use the pairwise variation in Z̄_{ij} to identify β. The assumption that g is nonlinear rules out the case that the pairwise variation is equal to the vector of zeroes, in which case β would not be identified.
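As an illustration of how the pair-specific attributes might be built in practice, the sketch below constructs a hypothetical Z̄_{ij} with the gender-homophily indicator from the example above. The second entry and the attribute names are placeholders of my own, not part of the model.

```python
import numpy as np

def pair_attributes(z_i, z_j):
    """A hypothetical g(Z_i, Z_j): symmetric in (i, j), nonlinear, and finite."""
    same_gender = float(z_i["gender"] == z_j["gender"])  # 1[Z_i = Z_j] from the text
    age_gap = abs(z_i["age"] - z_j["age"])               # a placeholder heterophily term
    return np.array([same_gender, age_gap])
```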
The network formation model described in equation (1) can be obtained as a stable outcome of a random utility model with transferable utilities. In particular, let ū_{ij}(Z̄_{ij}, A_j, U_{ij}) denote individual i's latent valuation of establishing a link with j given their shared observed attributes Z̄_{ij}, agent j's unobserved type A_j, and their common unobserved factor U_{ij}. It follows that the joint net benefit of adding the link {i, j} to the network D_n is

    ū_{ij}(Z̄_{ij}, A_j, U_{ij}) + ū_{ji}(Z̄_{ij}, A_i, U_{ij}) = Z̄'_{ij} β + A_i + A_j − U_{ij}.    (2)

Notice that the joint net benefit accounts for the preferences based on the observed attributes Z̄'_{ij} β, as well as preferences for association based on agent-specific factors A_i + A_j, and for exogenous factors affecting the decision to establish a link U_{ij}.

Equation (2) implies that two distinct individuals i and j in N_n only have valuations for their own observed attributes and agent-specific factors. To clarify, in the link formation decision for dyad {i, j}, the individuals take into account neither observed and unobserved attributes of other individuals in the network, nor general features of the network other than the dyad {i, j}. These effects are known as network externalities (see, e.g., Chandrasekhar and Jackson 2014; Leung 2015b; Mele 2017; Menzel 2015; Badev 2018; Sheng 2018; Ridder and Sheng 2020). Some examples of these effects are preferences for reciprocity, transitive triads, or high network degree. I leave this extension for future research.

Next, I introduce the definition of stability.

Definition 1 (Stability). A network D_n is stable with transfers if for any distinct i, j ∈ N_n:
1. for all D_{ij} = 1, ū_{ij}(Z̄_{ij}, A_j, U_{ij}) + ū_{ji}(Z̄_{ij}, A_i, U_{ij}) ≥ 0;
2. for all D_{ij} = 0, ū_{ij}(Z̄_{ij}, A_j, U_{ij}) + ū_{ji}(Z̄_{ij}, A_i, U_{ij}) < 0.

Notice that this definition adapts the pairwise stability notion in Jackson and Wolinsky (1996) to allow for transferable utilities. Intuitively, this condition states that a link within dyad {i, j} is established if the net benefit of that connection is nonnegative. For a generalization to nontransferable utilities, see Gao, Li, and Xu (2020).

The following notation will be maintained in the rest of the paper. I will assume that the vector of observed covariates Z_i = (v_i, X'_i)' is comprised of a scalar random variable v_i ∈ R and a K-dimensional random vector X_i ∈ R^K. Similarly, let

    Z̄_{ij} = ( g(v_i, v_j), g(X_i, X_j)' )' = (v_{ij}, W'_{ij})'

denote the observed covariates at the dyad level, and let β = (1, θ')'.

I will denote the profiles of observed attributes for all the agents in the network as Z^n = {Z_i : i ∈ N_n}, v^n = {v_i : i ∈ N_n}, and X^n = {X_i : i ∈ N_n}. Similarly, let A^n = {A_i : i ∈ N_n} denote the profile of unobserved attributes. Moreover, let Z_{−ij} = {Z_k : k ≠ i, j} and A_{−ij} = {A_k : k ≠ i, j} denote the collections of observed and unobserved attributes for all agents in the network other than agents i and j.

The identification and estimation strategies introduced in Sections 3 and 4 use the information contained in subnetworks formed by groups of four distinct agents {i_1, i_2, j_1, j_2}, also known as tetrads. The following notation is used to describe attributes at the tetrad level. Given a network of size n, there is a total of

    m_n = 4!·(n choose 4) = n(n−1)(n−2)(n−3)

ordered tetrads with distinct indices i_1, i_2, j_1, j_2 ∈ N_n. Let σ be a function that maps these tetrads to the index set N_{m_n} = {1, ..., m_n}.
Thus, each tetrad with distinct indices {i_1, i_2, j_1, j_2} corresponds to a unique σ({i_1, i_2, j_1, j_2}) ∈ N_{m_n}. Given any σ({i_1, i_2, j_1, j_2}) ∈ N_{m_n}, let v_σ = {v_{i_1}, v_{j_1}, v_{i_2}, v_{j_2}}, X_σ = {X_{i_1}, X_{j_1}, X_{i_2}, X_{j_2}}, and A_σ = {A_{i_1}, A_{j_1}, A_{i_2}, A_{j_2}}.

Moreover, define the pairwise variations across observed attributes and linking decisions as follows:

    ṽ_σ = ṽ_{i_1 i_2, j_1 j_2} = (v_{i_1 j_1} − v_{i_1 j_2}) − (v_{i_2 j_1} − v_{i_2 j_2})
    W̃_σ = W̃_{i_1 i_2, j_1 j_2} = (W_{i_1 j_1} − W_{i_1 j_2}) − (W_{i_2 j_1} − W_{i_2 j_2})
    D̃_σ = D̃_{i_1 i_2, j_1 j_2} = (D_{i_1 j_1} − D_{i_1 j_2}) − (D_{i_2 j_1} − D_{i_2 j_2}).

Finally, given any fixed tetrad σ({i_1, i_2, j_1, j_2}) ∈ N_{m_n}, let ω_{l_1 l_2} = (v_{l_1 l_2}, X_{l_1}, X_{l_2}, A_{l_1}, A_{l_2}) denote the profile of attributes at the dyad level, and let p_n(ω_{l_1 l_2}) = P[D_{l_1 l_2} = 1 | ω_{l_1 l_2}] denote the probability that a link is created for any dyad (l_1, l_2) ∈ {(i_1, j_1), (i_1, j_2), (i_2, j_1), (i_2, j_2)}.
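These double differences are straightforward to compute from dyad-level arrays. The sketch below is a minimal illustration, assuming the links D, the regressor v, and each column of W are stored as n × n arrays; the function name is my own.

```python
def tetrad_difference(M, i1, i2, j1, j2):
    """Pairwise variation at the tetrad level, e.g.
    D~_sigma = (M[i1,j1] - M[i1,j2]) - (M[i2,j1] - M[i2,j2]),
    where M is any n x n dyad-level array (links D, regressor v, or a column of W)."""
    return (M[i1, j1] - M[i1, j2]) - (M[i2, j1] - M[i2, j2])

# Enumerating all m_n = n(n-1)(n-2)(n-3) ordered tetrads of distinct agents:
# from itertools import permutations
# for i1, i2, j1, j2 in permutations(range(n), 4):
#     d_tilde = tetrad_difference(D, i1, i2, j1, j2)
```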
3 Identification

This section introduces the main identification results for the semiparametric network formation model with unobserved agent-specific factors. In particular, Section 3.1 presents the main point identification result when a special regressor is available. Section 3.2 introduces a second point identification result when a covariate with full support is available.

3.1 Identification with a Special Regressor
Using the notation introduced in Section 2, the rest of the paper considers the following representation for the network formation model specified by equation (1). In particular, agents i and j in N_n with i ≠ j will form an undirected link according to the following equation:

    D_{ij} = 1[ v_{ij} + W'_{ij} θ + A_i + A_j − U_{ij} ≥ 0 ],    (3)

where the coefficient associated with v_{ij} has been normalized to 1 and θ is a K-dimensional vector of coefficients. Given that the network of interest is undirected, U_{ij} is assumed to be symmetric, i.e., U_{ij} = U_{ji}. The vector θ represents the main parameter of interest.

Assumptions 3.1.1-3.1.5 specify the underlying structure for the network formation model in equation (3), which will be used to show the main identification result for θ.

Assumption 3.1.1.
The random sequence {Z_i, A_i}_{i=1}^n is independent and identically distributed.

Assumption 3.1.1 describes the sampling process, and it is widely used to describe network data (see, e.g., Graham 2017; Jochmans 2018, and Auerbach 2019).
Assumption 3.1.2.
For any finite n, the following holds.
1. The sequence {U_{ij} | Z^n, A^n}_{i ≠ j} is conditionally independent and identically distributed across dyads {i, j}. Moreover, U_{ij} = U_{ji} for any dyad {i, j}.
2. For any dyad {i, j}, U_{ij} | Z^n, A^n =_d U_{ij} | Z_i, Z_j, A_i, A_j.

Assumption 3.1.2.1 states that conditional on (Z^n, A^n) the link-specific disturbance terms {U_{ij}}_{i ≠ j} are independent across dyads {i, j} and drawn from the same distribution. Furthermore, Assumption 3.1.2.2 requires that conditional on (Z_i, Z_j, A_i, A_j), the link-specific disturbance term U_{ij} is independent of any observed or unobserved feature in (Z_{−ij}, A_{−ij}). Assumption 3.1.2 ensures that each of the linking decisions in the network is conditionally independent. In other words, it rules out interdependence across linking decisions due to externalities across the network.

Notice that Assumption 3.1.2 allows for heteroskedasticity of a general form in the distribution of U_{ij}. Moreover, it allows for flexible dependence between the unobserved agent-specific factors and the observed attributes. In other words, Assumption 3.1.2 does not restrict the joint distribution of (Z^n, A^n). Assumption 3.1.2 is commonly used in semiparametric nonlinear panel data models, for example in Arellano and Honoré (2001). In network formation models, full stochastic independence U_{ij} ⊥ (Z^n, A^n) is usually imposed (see, e.g., Leung 2015b; Menzel 2015; Graham 2017; Toth 2017, and Gao 2020). Arbitrary heteroskedasticity is also considered in Zeleneev (2020).

Assumption 3.1.3.
Given n and any distinct i, j ∈ N_n, let e_{ij} = A_i + A_j − U_{ij} and suppose that e_{ij} is conditionally independent of v_{ij} given (X_i, X_j). Let F_{e|x}(e_{ij} | X_i, X_j) denote the conditional distribution of e_{ij} given (X_i, X_j), with support given by S_e(X_i, X_j) and finite first moment.

Assumption 3.1.3 represents an exclusion restriction, and it entails that the regressor v_{ij} is conditionally independent of e_{ij} given the observed attributes (X_i, X_j). In other words, v_{ij} is a special regressor in the sense of Lewbel (1998), Lewbel (2000), and Lewbel (2012).

Assumption 3.1.4.
Given n and any distinct i, j ∈ N_n, the conditional distribution of v_{ij} given (X_i, X_j) is absolutely continuous with respect to the Lebesgue measure, with conditional density f_{v|x}(v_{ij} | X_i, X_j) and support given by S_v(X_i, X_j) = [s_v, s̄_v] for some constants s_v and s̄_v with −∞ ≤ s_v < 0 < s̄_v ≤ ∞. For any (X_i, X_j), the support of −W'_{ij} θ − e_{ij} is a subset of the interval [s_v, s̄_v].

Assumption 3.1.4 is a support condition, and it ensures that v_{ij} | X_i, X_j has a positive density function f_{v|x}(v_{ij} | X_i, X_j) on S_v(X_i, X_j). Furthermore, it requires that for any (X_i, X_j) the support of (−W'_{ij} θ − e_{ij}) is contained in S_v(X_i, X_j). Notice that Assumption 3.1.4 does not restrict v_{ij} | X_i, X_j to having full support on the real line. Hence the point identification result introduced in this section is general enough to include both cases: (i) the full support case, and (ii) the existence of a continuous covariate with bounded support that contains supp(−W'_{ij} θ − e_{ij} | X_i, X_j). Moreover, observe that Assumption 3.1.4 leaves the distribution of the observed attributes (X_i, X_j) unrestricted. Hence, this identification strategy also allows for discrete covariates in W_{ij}.

Assumption 3.1.5.
Given n and any tetrad σ ∈ N_{m_n}, E[U_{ij} | X_i, X_j] = 0, and Γ = E[W̃_σ W̃'_σ] is a finite and nonsingular matrix.

The first part of Assumption 3.1.5 imposes that U_{ij} | X_i, X_j has conditional mean zero. The conditional mean restriction needs to hold after conditioning on the observed attributes (X_i, X_j), and not just the dyad-specific covariates W_{ij}. The intuition behind this insight follows from Assumption 3.1.1, which allows for unrestricted dependence between X_i and A_i. In particular, the proof of Theorem 3.1 requires that any stochastic variation left in A_i + A_j after conditioning on (X_i, X_j) is independent of W_{kl} for any k, l ∈ N_n, including, for example, W_{il}. This property no longer holds if the conditioning variable used is W_{ij}, since it is only a feature of (X_i, X_j). The second part of Assumption 3.1.5 is the standard full rank condition on the pairwise variation of the observed attributes W̃_σ, and it ensures that θ is point identified.

The network formation model specified by equation (3) and Assumptions 3.1.1-3.1.5 represents, to the best of my knowledge, the first generalization of the special regressor method to the analysis of network data. Following Lewbel (1998, 2000), Honoré and Lewbel (2002), and Chen, Khan, and Tang (2019), let D*_{ij} be defined as

    D*_{ij} = ( D_{ij} − 1[v_{ij} > 0] ) / f_{v|x}(v_{ij} | X_i, X_j)    (4)

for any distinct i, j ∈ N_n.

The following theorem and appended corollary formalize the first point identification result for θ.

Theorem 3.1.
If Assumptions 3.1.3-3.1.5 hold in equation (3), then for any distinct i and j in N_n,

    E[D*_{ij} | X_i, X_j] = W'_{ij} θ + E[A_i + A_j | X_i, X_j].
See Appendix A.
Corollary 3.1.
If Assumptions 3.1.1-3.1.5 hold in equation (3), then for any tetrad σ ∈ N_{m_n},

    E[W̃_σ D̃*_σ] = E[W̃_σ W̃'_σ] θ,    (5)

and hence,

    θ = Γ^{−1} × Ψ    (6)

with Ψ = E[W̃_σ D̃*_σ].

Proof.
See Appendix A.

Theorem 3.1 and Corollary 3.1 demonstrate that θ is point identified using the information contained in the joint distribution of {D̃*_σ, W̃_σ} at the tetrad level, with the analytic expression given by equation (6). This result shows that θ is identified as an average of the linking decisions D̃_σ, which are weighted by the inverse of the conditional density of the special regressor given the observed attributes, f_{v|x}(v_{ij} | X_i, X_j). The result in Corollary 3.1 will be used as the foundation of the semiparametric estimator introduced in Section 4.
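To make the closed form in equation (6) concrete, the following sketch computes the sample analogue of Γ^{−1}Ψ when the conditional density f_{v|x} is known, as in the first Monte Carlo design of Section 5. It is illustrative only; the function name and its random subsampling of ordered tetrads (rather than the full average over all m_n tetrads) are my own devices for tractability.

```python
import numpy as np

def theta_closed_form(D, V, W, f_v_given_x, rng, n_tetrads=20000):
    """Sample analogue of theta = Gamma^{-1} Psi in equation (6), with known f_{v|x}.
    D, V, f_v_given_x are n x n arrays; W has shape (n, n, K)."""
    n, _, K = W.shape
    D_star = (D - (V > 0).astype(float)) / f_v_given_x  # equation (4), elementwise
    Gamma, Psi = np.zeros((K, K)), np.zeros(K)
    for _ in range(n_tetrads):
        i1, i2, j1, j2 = rng.choice(n, size=4, replace=False)
        w_t = (W[i1, j1] - W[i1, j2]) - (W[i2, j1] - W[i2, j2])
        d_t = (D_star[i1, j1] - D_star[i1, j2]) - (D_star[i2, j1] - D_star[i2, j2])
        Gamma += np.outer(w_t, w_t)
        Psi += w_t * d_t
    return np.linalg.solve(Gamma, Psi)

# Example: theta_hat = theta_closed_form(D, V, W, f, np.random.default_rng(0))
```

Dividing both sums by the number of sampled tetrads leaves the solution unchanged, so the normalization by m_n is omitted.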
Given the results in Theorem 3.1 and Corollary 3.1, the average contribution of the unobserved agent-specific factors to the formation of a link is also identified.

Corollary 3.2. If Assumptions 3.1.1-3.1.5 hold in equation (3), then for any i and j in N_n,

    E[A_i + A_j] = E[D*_{ij}] − E[W_{ij}]' θ.    (7)

3.2 Identification with a Covariate with Large Support

In this section, I provide a second point identification result for the vector of coefficients θ. This result does not require the regressor v_{ij} to be conditionally independent of the unobserved terms A_i + A_j − U_{ij}. Nonetheless, it imposes a large support condition on v_{ij} and bounds the contribution that the unobserved heterogeneity A_i + A_j has on the formation of links.

The following notation will be used to state and prove this result. For any fixed tetrad σ({i, j, k, l}) ∈ N_{m_n}, denote the profile of observed attributes at the tetrad level as v̄_σ = (v_{ik}, v_{il}, v_{jk}, v_{jl}) and Z̄_σ = (v̄_σ, X_σ). Moreover, for any σ({i, j, k, l}) ∈ N_{m_n} and agent r with r ∈ {i, j}, denote the within-individual r variation of the observed attributes as Δ_σ v_r = v_{rk} − v_{rl} and Δ_σ W_r = W_{rk} − W_{rl}, and the within-individual variation of the unobserved attributes as Δ_σ A = A_k − A_l.

The following assumptions are sufficient to show the second point identification result.

Assumption 3.2.1.
For any finite n and dyad {i, j}, Assumption 3.1.2 holds. Furthermore, the link-specific unobserved term U_{ij} | Z_i, Z_j, A_i, A_j has a positive density over the real line.

Assumption 3.2.1 ensures that the disturbance term U_{ij} has a large support for any value of (Z_i, Z_j, A_i, A_j). This assumption is used for simplicity to ensure that the conditional probability of forming a link is well defined for any value of (Z_i, Z_j, A_i, A_j). Notice that any model where the disturbance term U_{ij} is logistically or normally distributed will satisfy this condition.

Assumption 3.2.2.
The parameter space Θ is compact.

Assumption 3.2.2 is a standard assumption in the semiparametrics literature (see, e.g., Manski 1975, 1985; Newey and McFadden 1994, and Powell 1994). This assumption is used to control the contribution that the variation in W_{ij} has on the formation of links.

Assumption 3.2.3.
For any finite n, the following holds for any σ({i, j, k, l}) ∈ N_{m_n}.
1. For all X_σ, v̄_σ is continuously distributed with a positive density over R^4.
2. For all X_σ and r ∈ {i, j}, Δ_σ v_r is continuously distributed with a positive density over the real line, and the support supp(−Δ_σ W'_r θ_0 − Δ_σ A | X_σ) = [s_ε, s̄_ε] is known, with −∞ < s_ε < 0 < s̄_ε < ∞.

Assumption 3.2.3 ensures that the regressor v_{ij} has a large support. Moreover, it requires that the variation in v_{ij} dominates the contribution that the remaining factors have in creating a network link. Notice that this condition does not impose that v_{ij} is conditionally independent of A_i + A_j given X_σ. Intuitively, Assumption 3.2.3 guarantees that the information at the tails of the distribution of Δ_σ v_r can disentangle the contributions of the preferences for homophily and the unobserved heterogeneity on the creation of network links.

Assumption 3.2.4.
For any finite n and tetrad σ({i, j, k, l}) ∈ N_{m_n}, P[W̃'_σ γ ≠ 0] > 0 for all non-zero vectors γ ∈ R^K.

Assumption 3.2.4 is a full rank condition.

For any fixed σ({i, j, k, l}) ∈ N_{m_n} and given X_σ, let V(X_σ) denote the set of values for which the variations in Δ_σ v_i and Δ_σ v_j dominate the contribution of the remaining factors. That is to say:

    V(X_σ) = { v̄_σ : Δ_σ v_i ≤ s_ε and Δ_σ v_j ≥ s̄_ε, or Δ_σ v_i ≥ s̄_ε and Δ_σ v_j ≤ s_ε }.    (8)

Notice that this set can be characterized using Assumption 3.2.3. Also, define ξ(θ) as

    ξ(θ) = { z̄_σ : v̄_σ ∈ V(X_σ) and sign{ E_θ[D̃_σ | X_σ, v̄_σ ∈ V(X_σ), D̃_σ ∈ {−2, 2}] } ≠ sign{ E_{θ_0}[D̃_σ | X_σ, v̄_σ ∈ V(X_σ), D̃_σ ∈ {−2, 2}] } },

which characterizes the set of states for which the sign of the conditional expectation of the pairwise variations of the links D̃_σ implied by θ differs from the sign of the conditional expectation generated under θ_0. In other words, the set ξ(θ) summarizes the values of the observed attributes for which θ_0 can be distinguished from θ using the information contained in the conditional expectation of D̃_σ. Hence, θ_0 is said to be identified relative to θ ≠ θ_0 if P[Z̄_σ ∈ ξ(θ)] > 0.

The next theorem and appended corollary formalize the second point identification result.
Theorem 3.2.
Suppose Assumptions 3.1.1, 3.2.1, 3.2.2, and 3.2.3 hold in equation (3). Let

    Q_θ = { z̄_σ : v̄_σ ∈ V(X_σ) and W̃'_σ θ ≤ −ṽ_σ < W̃'_σ θ_0 or W̃'_σ θ_0 ≤ −ṽ_σ < W̃'_σ θ }.

If P[Z̄_σ ∈ Q_θ] > 0, then θ_0 is point identified relative to θ.

Proof. See Appendix A.
Corollary 3.3.
Suppose Assumptions 3.1.1 and 3.2.1-3.2.4 hold in equation (3). Then θ_0 is point identified.

Proof. See Appendix A.

The results in Theorem 3.2 and Corollary 3.3 can be used to define an estimator for θ as the maximizer of a U-process of order 4 with a trimming sequence γ_n such that γ_n → ∞ as n → ∞. In particular, the estimator of θ can be defined as

    θ̂ = argmax_{θ ∈ Θ} Ĥ_n(θ, γ_n),

where

    Ĥ_n(θ, γ_n) = [4!·(n choose 4)]^{−1} Σ_{i_1=1}^n Σ_{j_1 ≠ i_1} Σ_{i_2 ≠ i_1, j_1} Σ_{j_2 ≠ i_1, j_1, i_2} H( Z̄_{σ({i_1,j_1;i_2,j_2})}, D̃_{σ({i_1,j_1;i_2,j_2})}; θ, γ_n )

    H( Z̄_σ, D̃_σ; θ, γ_n ) = [ sign{ ṽ_σ + W̃'_σ θ } × D̃_σ ] × 1[ |D̃_σ| = 2 ] × 1[ |Δ_σ v_{i_1}|, |Δ_σ v_{i_2}| ≥ γ_n ].

Although point identification of θ_0 is achieved assuming that the bounds [s_ε, s̄_ε] are known, notice that they are not needed to define the estimator θ̂. In other words, it is sufficient to assume that Δ_σ v_i has a large support which contains supp(−Δ_σ W'_i θ_0 − Δ_σ A | X_σ) to characterize the estimator for θ.

Naturally, the asymptotic properties of θ̂ will depend on the frequency of subgraph configurations that satisfy the restriction 1[|D̃_σ| = 2] in the sample, and on the rate at which γ_n → ∞ as n → ∞. The rest of this paper prioritizes the study of the semiparametric estimator introduced in Section 4, since it is computationally more tractable than θ̂.
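A minimal sketch of this objective is given below, assuming that tetrads are supplied as index tuples and that the trimming threshold gamma_n is provided by the user. Since the objective is a step function of θ, in practice it would be maximized by grid search over the compact set Θ.

```python
import numpy as np

def H_hat(theta, gamma_n, D, V, W, tetrads):
    """Maximum-score-type objective H^_n(theta, gamma_n): averages
    sign{v~_sigma + W~_sigma' theta} x D~_sigma over tetrads with |D~_sigma| = 2,
    keeping only tetrads whose within-individual variation in v exceeds gamma_n."""
    total = 0.0
    for i1, j1, i2, j2 in tetrads:
        d_t = (D[i1, j1] - D[i1, j2]) - (D[i2, j1] - D[i2, j2])
        dv_1 = V[i1, j1] - V[i1, j2]   # within-individual variation for i1
        dv_2 = V[i2, j1] - V[i2, j2]   # within-individual variation for i2
        if abs(d_t) == 2 and min(abs(dv_1), abs(dv_2)) >= gamma_n:
            w_t = (W[i1, j1] - W[i1, j2]) - (W[i2, j1] - W[i2, j2])
            total += np.sign(dv_1 - dv_2 + w_t @ theta) * d_t
    return total / max(len(tetrads), 1)
```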
4 Estimation

In this section, I introduce a semiparametric estimator for θ based on the point identification result derived in Section 3.1. The estimator for θ, denoted by θ̂_n, is a two-step estimator with a nonparametric estimate of the conditional density of v_{ij} given {X_i, X_j}, i.e., f_{v|x}(v_{ij} | X_i, X_j). Section 4.1 provides sufficient conditions to study the large sample properties of θ̂_n. Theorem 4.1 proves that θ̂_n is a consistent estimator of θ. Theorem 4.2 shows that the limiting distribution of θ̂_n is normal.

4.1 Consistency

The estimator for θ is defined as the sample analog of equation (6) and is obtained by averaging over the linking decisions D̃_σ for all distinct tetrads σ ∈ N_{m_n}. Given that the inverse of f_{v|x}(v_{ij} | X_i, X_j) is used as a weight in the definition of Ψ, and hence of θ, I introduce a trimming sequence intended to avoid boundary effects arising from the first-step estimation of f_{v|x}(v_{ij} | X_i, X_j).

Recall that D̃_σ is defined as the pairwise variation across the linking decisions for a given tetrad σ({i_1, i_2, j_1, j_2}) ∈ N_{m_n}. I extend that notation to define as follows the pairwise variation of the trimmed network links given a trimming parameter τ:

    D̃*_{σ,τ} = (D*_{i_1 j_1,τ} − D*_{i_1 j_2,τ}) − (D*_{i_2 j_1,τ} − D*_{i_2 j_2,τ})
    D̂*_{σ,τ} = (D̂*_{i_1 j_1,τ} − D̂*_{i_1 j_2,τ}) − (D̂*_{i_2 j_1,τ} − D̂*_{i_2 j_2,τ}),

where for any distinct i_1 and j_1 in N_n

    D*_{i_1 j_1,τ} = ( (D_{i_1 j_1} − 1[v_{i_1 j_1} > 0]) / f_{v|x}(v_{i_1 j_1} | X_{i_1}, X_{j_1}) ) I_τ(v_{i_1 j_1}, X_{i_1}, X_{j_1})
    D̂*_{i_1 j_1,τ} = ( (D_{i_1 j_1} − 1[v_{i_1 j_1} > 0]) / f̂_{v|x}(v_{i_1 j_1} | X_{i_1}, X_{j_1}) ) I_τ(v_{i_1 j_1}, X_{i_1}, X_{j_1}).

In the equations above, f_{v|x}(v_{i_1 j_1} | X_{i_1}, X_{j_1}) denotes the true conditional density function of v_{i_1 j_1} given (X_{i_1}, X_{j_1}), and f̂_{v|x}(v_{i_1 j_1} | X_{i_1}, X_{j_1}) denotes a kernel estimator of this conditional density. Thus, D̃*_{σ,τ} denotes the pairwise variation of the trimmed network links assuming that the conditional distribution of the special regressor given the observed attributes is known. Conversely, D̂*_{σ,τ} denotes the pairwise variation of the trimmed network links when f_{v|x}(v_{i_1 j_1} | X_{i_1}, X_{j_1}) is replaced by a first-stage kernel estimator f̂_{v|x}(v_{i_1 j_1} | X_{i_1}, X_{j_1}).

The trimming sequence I_τ(v_{i_1 j_1}, X_{i_1}, X_{j_1}) is a function of the observed attributes at the dyad level, and it converges to 1 as the trimming parameter τ → 0 and n → ∞. Assumptions 4.1.2 and 4.1.5 below describe the conditions imposed on the trimming parameter τ (see Honoré and Lewbel 2002 and Khan and Tamer 2010).

To ease the exposition, I introduce the following notation for any distinct i_1, j_1 ∈ N_n:

    I_{τ,i_1 j_1} = I_τ(v_{i_1 j_1}, X_{i_1}, X_{j_1})
    f_{vx,i_1 j_1} = f_{v,x}(v_{i_1 j_1}, X_{i_1}, X_{j_1})
    f_{x,i_1 j_1} = f_x(X_{i_1}, X_{j_1})
    φ_{i_1 j_1} = D_{i_1 j_1} − 1[v_{i_1 j_1} > 0]
    φ_{i_1 j_1,τ} = φ_{i_1 j_1} I_{τ,i_1 j_1}.

With this notation at hand, the semiparametric estimator for θ is defined as

    θ̂_n = Γ̂_n^{−1} × Ψ̂_{n,τ},    (9)

where

    Γ̂_n = (1/m_n) Σ_{σ ∈ N_{m_n}} W̃_σ W̃'_σ
    Ψ̂_{n,τ} = (1/m_n) Σ_{σ ∈ N_{m_n}} W̃_σ D̂*_{σ,τ}

and m_n = 4!·(n choose 4).

The first-stage kernel estimator f̂_{v|x}(v_{i_1 j_1} | X_{i_1}, X_{j_1}) is defined as the ratio of the kernel estimators f̂_{vx,i_1 j_1} and f̂_{x,i_1 j_1}, with

    f̂_{vx,i_1 j_1} = (1/((n−2)(n−3) h^{L+1})) Σ_{k_1 ≠ i_1, j_1} Σ_{k_2 ≠ i_1, j_1, k_1} K_{vx,h}[v_{k_1 k_2} − v_{i_1 j_1}, X_{k_1} − X_{i_1}, X_{k_2} − X_{j_1}]
    f̂_{x,i_1 j_1} = (1/((n−2)(n−3) h^L)) Σ_{k_1 ≠ i_1, j_1} Σ_{k_2 ≠ i_1, j_1, k_1} K_{x,h}[X_{k_1} − X_{i_1}, X_{k_2} − X_{j_1}],

where h denotes a bandwidth parameter and L = 2K. The kernels K_{vx,h} and K_{x,h} are defined as

    K_{vx,h}[v_{k_1 k_2} − v_{i_1 j_1}, X_{k_1} − X_{i_1}, X_{k_2} − X_{j_1}] = K_{vx}( (v_{k_1 k_2} − v_{i_1 j_1})/h, (X_{k_1} − X_{i_1})/h, (X_{k_2} − X_{j_1})/h )
    K_{x,h}[X_{k_1} − X_{i_1}, X_{k_2} − X_{j_1}] = K_x( (X_{k_1} − X_{i_1})/h, (X_{k_2} − X_{j_1})/h ).

Assumption 4.1.5 below describes the conditions imposed on the kernel functions K_{vx,h} and K_{x,h} and on the bandwidth parameter h.

The estimator defined in equation (9) represents, to the best of my knowledge, the first effort to estimate the vector of parameters θ defined in the network formation model given by equation (3) using a two-step semiparametric estimator that utilizes the existence of a special regressor. A semiparametric approach is attractive because it does not restrict the distribution of the disturbance term to any specific parametric family. Furthermore, it allows for flexible statistical dependence between the agent-specific unobserved factors and the observed attributes, i.e., {X^n, A^n}. As an additional appealing property, the estimator defined in equation (9) has an analytical form. This characteristic increases its computational tractability compared with the estimator defined as the maximizer of a U-process introduced in Section 3.2. Regarding the nonparametric first-stage estimator, Leung (2015b, Supp. Appendix) and Graham et al. (2019) have studied the properties of kernel estimators for network data. I use their findings to analyze the asymptotic properties of θ̂_n.
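The first stage can be illustrated with the following leave-(i, j)-out sketch for scalar X (K = 1, so L = 2). It is a simplified illustration assuming a product Gaussian kernel; Assumption 4.1.5 below actually calls for a bias-reducing kernel of order M, which the standard Gaussian kernel does not satisfy.

```python
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u * u) / np.sqrt(2.0 * np.pi)

def f_v_given_x_hat(V, X, i, j, h):
    """Kernel estimate of f_{v|x}(v_ij | X_i, X_j) as the ratio f^_vx / f^_x.
    The (n-2)(n-3) normalizations cancel in the ratio, leaving num / (h * den)."""
    n = len(X)
    num = den = 0.0
    for k1 in range(n):
        if k1 in (i, j):
            continue
        for k2 in range(n):
            if k2 in (i, j, k1):
                continue
            kx = gaussian_kernel((X[k1] - X[i]) / h) * gaussian_kernel((X[k2] - X[j]) / h)
            num += kx * gaussian_kernel((V[k1, k2] - V[i, j]) / h)
            den += kx
    return num / (h * den) if den > 0 else np.nan
```

The fitted values f̂_{v|x} would then be plugged into D̂*_{σ,τ}, with the trimming indicator I_{τ,ij} zeroing out dyads near the boundary of the support, and θ̂_n follows from the closed form in equation (9) exactly as in the known-density sketch of Section 3.1.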
The following technical conditions are needed to prove Theorems 4.1 and 4.2. For simplicity, the theorems are stated and proved assuming that all of the elements of X_i are continuously distributed. However, the results can be readily extended to include discretely distributed variables by applying the density estimator separately to each discrete cell of data.

Assumption 4.1.1. For any distinct indices i and j in N_n, the dyad-level covariates (X_i, X_j) and (v_{ij}, X_i, X_j) are absolutely continuous with respect to some Lebesgue measures, with Radon-Nikodym densities f_{x,ij} and f_{vx,ij} and supports denoted by S_x and S_{vx}. Assume that f_{x,ij} and f_{vx,ij} are bounded, f_{vx,ij} is bounded away from zero, and there exists a constant M > L + 1 (recall that L = 2K, with dim(X_i) = K) such that f_{x,ij} and f_{vx,ij} are M-times differentiable with respect to all of their arguments with bounded derivatives. There exist finite constants C_{w,1} and C_{w,2} such that sup_{σ ∈ N_{m_n}} ||W̃_σ|| ≤ C_{w,1} w.p.1 and E[||W̃_σ||^4] < C_{w,2}.

Assumption 4.1.1 ensures that the densities f_{x,ij} and f_{vx,ij} are continuous and M-times differentiable. Also, it requires the existence of fourth-order moments for W̃_σ, for any σ ∈ N_{m_n}. This assumption has been used in the literature on semiparametric methods, for example in Ahn and Powell (1993); Aradillas-Lopez (2012), and Honoré and Lewbel (2002).

Assumption 4.1.2.
Let τ be the density trimming parameter defined above. Assume that the support S_{vx} is known, and that the trimming function I_{τ,ij} is equal to zero if (v_{ij}, X_i, X_j) is within a distance τ of the boundary of S_{vx}, and otherwise I_{τ,ij} equals one. Also, assume that τ → 0 and τ n → ∞ as n → ∞.

Due to the weighting scheme used in the definition of D̂*_{i_1 j_1}, boundary effects could arise from the density estimation step when computing Ψ̂_{n,τ}. Assumptions 4.1.1 and 4.1.2 deal with this technicality by assuming that f_{vx,i_1 j_1} is bounded away from zero and by introducing a trimming sequence I_τ(v_{i_1 j_1}, X_{i_1}, X_{j_1}) that sets to zero the terms in Ψ̂_{n,τ} with data within a τ distance of the boundary of S_{vx} (see, e.g., Lewbel 1997, 2000; Honoré and Lewbel 2002, and Khan and Tamer 2010).

Assumptions 4.1.1 and 4.1.2 require that the support S_{vx} is known. The support S_{vx} is identified from the distribution of observables, and hence it can be estimated in an empirical application. As an alternative approach to Assumption 4.1.2, a fixed trimming function that is not n-dependent could be used instead (see, e.g., Aradillas-Lopez, Honoré, and Powell 2007 and Aradillas-Lopez 2012).

Assumption 4.1.3.
Let M be as defined above. Given any tetrad σ({i_1, i_2, j_1, j_2}) ∈ N_{m_n}, let

    Ξ_1(X_{l_1}, X_{l_2}) = E[ W̃_σ D*_{l_1 l_2,τ} | X_{l_1}, X_{l_2} ]
    Ξ_2(v_{l_1 l_2}, X_{l_1}, X_{l_2}) = E[ W̃_σ D*_{l_1 l_2,τ} | v_{l_1 l_2}, X_{l_1}, X_{l_2} ]

for any dyad (l_1, l_2) ∈ {(i_1, j_1), (i_1, j_2), (i_2, j_1), (i_2, j_2)}. The expectations Ξ_1(x, x') and Ξ_2(v, x, x') exist and are continuous in the components of (v, x, x') for all (v, x, x') ∈ S_{vx}. Also, Ξ_1(x, x') and Ξ_2(v, x, x') are M-times differentiable in the components of (v, x, x') for all (v, x, x') ∈ S°_{vx}, where S°_{vx} differs from S_{vx} by a set of measure zero.

There exist some functions m_x(x, x') and m_{vx}(v, x, x') such that the following local Lipschitz conditions hold for some (x_0, x'_0) and (v_0, x_0, x'_0) in an open neighborhood of zero and for all τ > 0:

    ||f_{vx}(v + v_0, x + x_0, x' + x'_0) − f_{vx}(v, x, x')|| ≤ m_{vx}(v, x, x') ||(v_0, x_0, x'_0)||
    ||f_x(x + x_0, x' + x'_0) − f_x(x, x')|| ≤ m_x(x, x') ||(x_0, x'_0)||
    ||Ξ_2(v + v_0, x + x_0, x' + x'_0) − Ξ_2(v, x, x')|| ≤ m_{vx}(v, x, x') ||(v_0, x_0, x'_0)||
    ||Ξ_1(x + x_0, x' + x'_0) − Ξ_1(x, x')|| ≤ m_x(x, x') ||(x_0, x'_0)||.

Assumption 4.1.3 imposes local smoothness conditions that are needed to derive the Hájek projection of a V-statistic. Similar conditions have been used in Ahn and Powell (1993); Aradillas-Lopez (2012), and Honoré and Lewbel (2002).

Assumption 4.1.4.
Given any σ({i_1, i_2, j_1, j_2}) ∈ N_{m_n} and (l_1, l_2) ∈ {(i_1, j_1), (i_1, j_2), (i_2, j_1), (i_2, j_2)}, let χ_{l_1 l_2} = χ(X_{l_1}, X_{l_2}) = E[ W̃_σ | X_{l_1}, X_{l_2} ]. The following moments exist:

    sup_{(x,x') ∈ S_x} χ(x, x')
    sup_{(v,x,x') ∈ S_{vx}, τ ≥ 0} E[ ( φ_{l_1 l_2,τ} / f_{vx}(v, x, x') )^2 | v, x, x' ]
    sup_{(v,x,x') ∈ S_{vx}, τ ≥ 0} E[ ( D*_{l_1 l_2,τ} / f_{vx}(v, x, x') )^2 | v, x, x' ],

and the objects

    χ(x, x')
    E[ ( φ_{l_1 l_2,τ} / f_{vx}(v, x, x') )^2 | v, x, x' ]
    E[ ( D*_{l_1 l_2,τ} / f_{vx}(v, x, x') )^2 | v, x, x' ]

are continuous in the components of (v, x, x') ∈ S_{vx}. Moreover, there exists a finite constant C_χ such that E[ ||χ(x, x')||^6 ] ≤ C_χ for any (x, x') ∈ S_x.

Assumption 4.1.4 ensures the existence and boundedness of the conditional expectations defined above. These conditions are needed to invoke a uniform law of large numbers for V-statistics. The last part of Assumption 4.1.4 guarantees the existence of sixth-order moments, and it will be used to invoke a conditional central limit theorem.

Assumption 4.1.5.
Let M and τ be as defined above. The kernel K_x(x, x') : R^L → R and the bandwidth h used to define the kernel estimator f̂_x satisfy:
1. K_x(x, x') = 0 for all (x, x') on the boundary of, and outside of, a convex bounded subset of R^L. This subset has a nonempty interior and has the origin as an interior point.
2. K_x(·,·) is symmetric around zero, bounded, differentiable, and bias-reducing of order M.
3. There exists δ > 0 such that n^{1−δ} h^{L+1} → ∞, n h^M → 0, and h/τ → 0.

The kernel function K_{vx}(v, x, x') has all the same properties, replacing (x, x') with (v, x, x').

Assumption 4.1.5 requires the use of a higher-order kernel. This selection is motivated by the need to control the bias induced by using the inverse of f_{v|x}(v_{i_1 j_1} | X_{i_1}, X_{j_1}) as a weighting function. This assumption has been used by Honoré and Lewbel (2002) and Leung (2015b). Graham et al. (2019) provide a comprehensive treatment of kernel estimation for undirected network data.
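As an illustration of the bias-reducing requirement, one common construction (one possibility among many; the paper does not prescribe a specific kernel) builds a fourth-order kernel from the Gaussian density, with multivariate versions obtained as products of univariate pieces. Note that, unlike the compactly supported kernels of Assumption 4.1.5.1, this Gaussian-based version has unbounded support.

```python
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u * u) / np.sqrt(2.0 * np.pi)

def fourth_order_kernel(u):
    """Bias-reducing kernel of order 4: K4(u) = (3 - u^2) phi(u) / 2.
    It integrates to one and its second moment vanishes, so the smoothing
    bias is O(h^4) rather than O(h^2)."""
    return 0.5 * (3.0 - u * u) * gaussian_kernel(u)
```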
Using the assumptions above, it follows that θ̂_n defined in equation (9) is a consistent estimator of θ. Theorem 4.1 formally states this result.

Theorem 4.1. Let Assumptions 3.1.1-3.1.5 and 4.1.1-4.1.5 hold. Then (θ̂_n − θ) →_p 0 as n → ∞.

Proof. See Appendix A.

4.2 Asymptotic Distribution
The following theorem derives the asymptotic distribution of θ̂_n. A key step in proving this result is to show that

    √(n(n−1)) Υ_n^{−1/2} { Ψ̂_{n,τ} − E[ W̃_σ D̃*_{σ,τ} | v_σ, X_σ, A_σ ] } ⇒ N(0, I),

where I denotes the K-dimensional identity matrix and Υ_n = n(n−1) Var(Ψ̂_{n,τ}), which is defined as

    Υ_n = (1/(n(n−1))) Σ_{i=1}^n Σ_{j ≠ i} E[ { p_n(ω_{ij}) (1 − p_n(ω_{ij})) / f_{v|x,ij} } I_{τ,ij} ] χ_{ij} χ'_{ij}

with

    χ_{i_1 j_1} = (1/((n−2)(n−3))) Σ_{i_2 ≠ i_1, j_1} Σ_{j_2 ≠ i_1, j_1, i_2} E[ W̃_{σ({i_1,i_2;j_1,j_2})} | X_{i_1}, X_{j_1} ].

The proof of this result follows from showing that √(n(n−1)) { Ψ̂_{n,τ} − E[ W̃_σ D̃*_{σ,τ} | v_σ, X_σ, A_σ ] } is asymptotically equivalent to its Hájek projection onto an arbitrary function of ζ_{i_1 j_1} = (v_{i_1 j_1}, X_{i_1}, X_{j_1}, A_{i_1}, A_{j_1}, U_{i_1 j_1}). The resulting Hájek projection is an average of conditionally independent random variables at the dyad level, with conditional mean equal to 0 and a conditional variance that approximates Υ_n in the limit. The result follows from a conditional version of Lyapunov's central limit theorem (see, e.g., Rao 2009).

The remaining information needed to derive the limiting distribution of the semiparametric estimator θ̂_n is the convergence rate of Υ_n, which is given by

    ϱ_n = O(Υ_n) = O( E[ { p_n(ω_{ij}) (1 − p_n(ω_{ij})) / f_{v|x,ij} } I_{τ,ij} ] ),

and the following matrix:

    Σ_n = Γ^{−1} × Υ_n × Γ^{−1}.

The next theorem formalizes the limiting distribution of θ̂_n.

Theorem 4.2.
Suppose Assumptions 3.1.1-3.1.5 and 4.1.1-4.1.5 hold, and that n(n−1) ϱ_n^{−1} → ∞. It then follows that

    √(n(n−1)) Σ_n^{−1/2} (θ̂_n − θ) = Σ_n^{−1/2} × Γ^{−1} × (1/√(n(n−1))) Σ_{i=1}^n Σ_{j ≠ i} ξ_{ij,τ} + o_p(1)    (10)

with ξ_{ij,τ} = { D*_{ij} − E[D*_{ij} | ω_{ij}] } I_{τ,ij} χ_{ij}, and thus,

    √(n(n−1)) Σ_n^{−1/2} (θ̂_n − θ) ⇒ N(0, I).
See Appendix A.

Equation (10) describes the asymptotic linear representation of θ̂_n. The limiting distribution of θ̂_n is derived following a studentized approach, as in Andrews and Schafgans (1998), Khan and Tamer (2010), and Jochmans (2018), to control for the possibly varying rates of convergence due to sparsity of the network. Notice that if ϱ_n^{−1} converges to a finite constant that is bounded away from zero, θ̂_n − θ converges at the parametric rate √(n(n−1)); if ϱ_n^{−1} decays as n increases, θ̂_n − θ has a slower rate of convergence, given by O_p( {n(n−1) ϱ_n^{−1}}^{−1/2} ).

5 Monte Carlo Simulations

This section presents simulation evidence for the finite sample performance of the semiparametric estimator introduced in Section 4. I explore the properties of the estimation technique under a wide array of DGP designs that are meant to capture differences in the sample size and in the level of sparsity of the network (see, e.g., Jochmans 2018; Dzemski 2019; Yan et al. 2019).

The undirected network is simulated according to the network model in equation (3). I consider a single observed attribute X_i, which is drawn as X_i ∼ Beta(2, 2) − 1/2. The pair-specific covariate W_{ij} = g(X_i, X_j) is constructed to account for complementarities on the observed attributes and is defined as W_{ij} = X_i X_j. The agent-specific unobserved factor A_i is generated such that it is correlated with X_i and depends on the sample size n; this last feature offers a useful approach to control the degree of sparsity in the network. In particular, I set A_i = λ X_i − (1 − λ) C_n × B_i, where B_i is drawn from a Beta distribution with both shape parameters below one, so that it concentrates mass at the boundary of the unit interval. This implies that, conditional on X_i, the individuals cluster at small or high types of unobserved attributes. The parameter λ ∈ (0, 1) controls the degree of correlation between the agent-specific heterogeneity and the observed covariate X_i. The constant C_n depends on the size of the network and takes the values C_n ∈ { log(log(n)), log(n)/2, log(n) }. Under this design, the choice of C_n regulates the degree of sparsity of the network: for larger values of C_n, fewer links are formed. The special regressor v_{ij} is simulated as a mean-zero normal random variable for i < j, and thus satisfies the support and independence conditions in Assumptions 3.1.3 and 3.1.4. The link-specific disturbance term is generated as U_{ij} ∼ Beta(2, 2) − 1/2 for i < j. The true DGP is completed by setting the parameter value θ_0 = 1.5 and considering networks of size n ∈ {50, 100}.
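A sketch of this design is given below. Several constants were garbled in the source, so the Beta shape parameters for B_i, the value of λ, and the unit variance of v_{ij} are placeholder assumptions consistent with the description above.

```python
import numpy as np

def simulate_network(n, C_n, theta0=1.5, lam=0.5, rng=None):
    """Simulate one network from the Section 5 design under equation (3).
    Beta(0.5, 0.5) for B_i, lam = 0.5, and Var(v_ij) = 1 are placeholder choices."""
    rng = rng or np.random.default_rng()
    X = rng.beta(2, 2, size=n) - 0.5
    B = rng.beta(0.5, 0.5, size=n)          # mass near 0 and 1, as described
    A = lam * X - (1.0 - lam) * C_n * B     # unobserved heterogeneity; sparsity via C_n
    W = np.outer(X, X)                      # W_ij = X_i * X_j
    V = rng.normal(size=(n, n))
    U = rng.beta(2, 2, size=(n, n)) - 0.5
    V = np.triu(V, 1) + np.triu(V, 1).T     # one draw per dyad i < j, symmetric
    U = np.triu(U, 1) + np.triu(U, 1).T
    D = (V + theta0 * W + A[:, None] + A[None, :] - U >= 0).astype(float)
    np.fill_diagonal(D, 0.0)                # no self-ties
    return D, V, W, X

# Example: a dense design with n = 100 and C_n = log(log(n)).
# D, V, W, X = simulate_network(100, np.log(np.log(100.0)))
```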
The implementation of the semiparametric estimator for θ requires the estimation of the conditional density of v_{ij} in a nonparametric first stage. I consider two approaches to isolate the approximation error induced by the density estimation. The first one assumes that the conditional distribution of v_{ij} is known and considers a fixed trimming design given by I_{τ,ij} = 1[|v_{ij}| < τ] with τ = 2·std(v_{ij}). In the second approach, I compute the semiparametric estimator as defined in equation (9). Although Assumption 4.1.5 requires the use of higher-order kernels to eliminate the asymptotic bias, I compute θ̂_n using a standard second-order kernel. The motivation for this choice is that semiparametric estimators computed using higher-order kernels tend to have inferior finite sample properties compared to those obtained using standard kernels. Furthermore, this choice is a common practice in many semiparametric applications (see Rothe 2009 and Jochmans 2013). I use the standard-normal density as the kernel function. The trimming design is the same as in the first approach to ensure a proper comparison between the two alternative methods. The bandwidth parameter h is fixed across all replications.

Table 1 summarizes the results of computing the estimator θ̂_n assuming that the density f_v(v_{ij}) is known, over 500 Monte Carlo replications for all the designs. In particular, I report the mean, median, standard deviation, and mean square error of θ̂_n over the total number of simulations. The final column of Table 1 reports the average degree of the network across the total number of simulations. This information will be used to describe the degree of sparsity in the network across the different designs.

The top panel in Table 1 shows the results of estimating θ in a small network with n = 50. Both the mean and the median show that the estimator approximates well the true value θ_0 = 1.5. The estimator θ̂_n presents the smallest dispersion in the dense network design, with C_n = log(log(n)) and an average degree of 42% of the links formed. As fewer links are present in the network, the performance of the estimator deteriorates.

In the bottom panel of Table 1, I show the results of estimating θ in a large network with n = 100. The evidence in this scenario reinforces the previous findings and suggests that the performance of the estimator θ̂_n improves across all the designs. For example, in the dense network scenario C_n = log(log(n)), the standard deviation decreases by an order of less than one half and the mean square error by an order greater than one third. A similar conclusion is obtained from the sparse network case C_n = log(n), where only 28% of the links are formed.

Table 2 summarizes the results of computing the semiparametric two-step estimator for θ with a first-step kernel estimator f̂_v(v_{ij}) over 500 Monte Carlo replications for all the designs.
The top panel in Table 2 shows the results of estimating θ in a small network with n = 50. These estimates suggest that θ̂_n approximates well the true value of θ. However, this approach obtains less accurate results than the first method, due to the approximation error induced by the nonparametric first-stage estimation. In particular, the estimator presents the best performance and smallest dispersion in the dense network design, where the network has an average degree of 42% of the links formed.

In the bottom panel of Table 2, I show the results of estimating θ in a large network with n = 100. The estimates show that the performance of the estimator θ̂_n improves across all the designs as the network's size grows large, including the sparse case where the network has an average degree of 29% of the links formed. Overall, these numerical experiments suggest that the semiparametric estimator θ̂_n yields reliable inference for the preference parameter θ in an undirected network formation model.

Table 1: Simulation results for the semiparametric estimator θ̂_n with known density function f_v(v_{ij})

    C_n            mean    median  std     MSE     Degree
    n = 50
    log(log(n))    1.4764  1.4627  0.9158  0.8393  0.4250
    log(n)         1.5217  1.6001  1.3832  1.9136  0.3131
    n = 100
    log(log(n))    1.5212  1.5022  0.4809  0.2317  0.4204
    log(n)         1.5057  1.4979  0.6916  0.4783  0.2893

    Total number of Monte Carlo simulations = 500.

Table 2: Simulation results for the semiparametric estimator θ̂_n with kernel estimator f̂_v(v_{ij})

    C_n            mean    median  std     MSE     Degree
    n = 50
    log(log(n))    1.6047  1.6164  1.1253  1.2772  0.4237
    log(n)         1.6444  1.6643  1.5801  2.5176  0.3125
    n = 100
    log(log(n))    1.5373  1.5011  0.4911  0.2425  0.4214
    log(n)         1.5415  1.5197  0.7317  0.5371  0.2907

    Total number of Monte Carlo simulations = 500.

6 Conclusion
This paper has studied a network formation model with unobserved agent-specific heterogeneity. It offers two main contributions to the literature on network formation. The first contribution is to propose a new identification strategy that identifies the vector of coefficients θ, which accounts for the preferences for homophilic relationships on the observed attributes. The point identification result relies on the existence of a special regressor. This study represents, to the best of my knowledge, the first generalization of a special regressor to analyze a network formation model (Lewbel 1998 and Lewbel 2000).

The second contribution is to introduce a two-step semiparametric estimator for θ. The estimator has a closed form and is computationally tractable even in large networks. I show in Monte Carlo simulations that the estimator performs well in finite samples, as well as in sparse and dense networks.

Two different strands of the literature on network formation have highlighted the importance of accounting for (i) network externalities and (ii) general forms of unobserved heterogeneity (see, e.g., Graham 2019b). In future research, I plan to explore the identification power that the special regressor has when considering an augmented model of network formation with network externalities and general forms of unobserved heterogeneity.

References

Ahn, H. and J. L. Powell (1993). Semiparametric estimation of censored selection models with a nonparametric selection mechanism. Journal of Econometrics 58(1-2), 3–29.
Andrews, D. W. and M. M. Schafgans (1998). Semiparametric estimation of the intercept of a sample selection model. The Review of Economic Studies 65(3), 497–517.
Aradillas-Lopez, A. (2010). Semiparametric estimation of a simultaneous game with incomplete information. Journal of Econometrics 157(2), 409–431.
Aradillas-Lopez, A. (2012). Pairwise-difference estimation of incomplete information games. Journal of Econometrics 168(1), 120–140.
Aradillas-Lopez, A., B. E. Honoré, and J. L. Powell (2007). Pairwise difference estimation with nonparametric control variables. International Economic Review 48(4), 1119–1158.
Arellano, M. and B. Honoré (2001). Panel data models: some recent developments. Handbook of Econometrics 5, 3229–3296.
Auerbach, E. (2019). Identification and estimation of a partially linear regression model using network data. arXiv preprint arXiv:1903.09679.
Badev, A. (2018). Nash equilibria on (un)stable networks.
Boucher, V. and I. Mourifié (2017). My friend far, far away: a random field approach to exponential random graph models. The Econometrics Journal 20(3), S14–S46.
Candelaria, L. E. and T. Ura (2018). Identification and inference of network formation games with misclassified links. arXiv preprint arXiv:1804.10118.
Chandrasekhar, A. G. and M. O. Jackson (2014). Tractable and consistent random graph models. Working Paper.
Charbonneau, K. B. (2017). Multiple fixed effects in binary response panel data models. The Econometrics Journal 20(3), S1–S13.
Chen, S., S. Khan, and X. Tang (2019). Exclusion restrictions in dynamic binary choice panel data models: Comment on "Semiparametric binary choice panel data models without strictly exogenous regressors". Econometrica 87(5), 1781–1785.
Collomb, G. and W. Härdle (1986). Strong uniform convergence rates in robust nonparametric time series analysis and prediction: Kernel regression estimation from dependent observations. Stochastic Processes and their Applications 23(1), 77–89.
de Paula, A., S. Richards-Shubik, and E. Tamer (2017). Identifying preferences in networks with bounded degree. Forthcoming in Econometrica.
Dzemski, A. (2019). An empirical model of dyadic link formation in a network with unobserved heterogeneity. Review of Economics and Statistics 101(5), 763–776.
Gao, W. Y. (2020). Nonparametric identification in index models of link formation. Journal of Econometrics 215(2), 399–413.
Gao, W. Y., M. Li, and S. Xu (2020). Logical differencing in dyadic network formation models with nontransferable utilities. arXiv preprint arXiv:2001.00691.
Goldsmith-Pinkham, P. and G. W. Imbens (2013). Social networks and the identification of peer effects. Journal of Business & Economic Statistics 31(3), 253–264.
Graham, B. S. (2017). An econometric model of network formation with degree heterogeneity. Econometrica 85(4), 1033–1063.
Graham, B. S. (2019a). Dyadic regression.
Graham, B. S. (2019b). Network data.
Graham, B. S., F. Niu, and J. L. Powell (2019). Kernel density estimation for undirected dyadic data. arXiv preprint arXiv:1907.13630.
Gualdani, C. (2020). An econometric model of network formation with an application to board interlocks between firms.
Han, A. K. (1987). Non-parametric analysis of a generalized regression model: the maximum rank correlation estimator. Journal of Econometrics 35(2), 303–316.
Honoré, B. E. and A. Lewbel (2002). Semiparametric binary choice panel data models without strictly exogenous regressors. Econometrica 70(5), 2053–2063.
Jackson, M. O. and A. Wolinsky (1996). A strategic model of social and economic networks. Journal of Economic Theory 71(1), 44–74.
Jochmans, K. (2013). Pairwise-comparison estimation with non-parametric controls. The Econometrics Journal 16(3), 340–372.
Jochmans, K. (2017). Two-way models for gravity. Review of Economics and Statistics 99(3), 478–485.
Jochmans, K. (2018). Semiparametric analysis of network formation. Journal of Business & Economic Statistics 36(4), 705–713.
Khan, S. and E. Tamer (2010). Irregular identification, support conditions, and inverse weight estimation. Econometrica 78(6), 2021–2042.
Lee, A. J. (2019). U-Statistics: Theory and Practice. Routledge.
Leung, M. (2015a). A random-field approach to inference in large models of network formation. Available at SSRN.
Leung, M. (2015b). Two-step estimation of network-formation models with incomplete information. Journal of Econometrics 188(1), 182–195.
Lewbel, A. (1997). Semiparametric estimation of location and other discrete choice moments. Econometric Theory 13(1), 32–51.
Lewbel, A. (1998). Semiparametric latent variable model estimation with endogenous or mismeasured regressors. Econometrica, 105–121.
Lewbel, A. (2000). Semiparametric qualitative response model estimation with unknown heteroscedasticity or instrumental variables. Journal of Econometrics 97(1), 145–177.
Lewbel, A. (2012). An Overview of the Special Regressor Method. Boston College, Department of Economics.
Manski, C. F. (1975). Maximum score estimation of the stochastic utility model of choice. Journal of Econometrics 3(3), 205–228.
Manski, C. F. (1985). Semiparametric analysis of discrete response: Asymptotic properties of the maximum score estimator. Journal of Econometrics 27(3), 313–333.
McPherson, M., L. Smith-Lovin, and J. M. Cook (2001). Birds of a feather: Homophily in social networks. Annual Review of Sociology, 415–444.
Mele, A. (2017). A structural model of dense network formation. Econometrica 85(3), 825–850.
Menzel, K. (2015). Strategic network formation with many agents.
Miyauchi, Y. (2016). Structural estimation of a pairwise stable network with nonnegative externality. Journal of Econometrics, Forthcoming.
Newey, W. K. and D. McFadden (1994). Large sample estimation and hypothesis testing. Handbook of Econometrics 4, 2111–2245.
Powell, J. L. (1994). Estimation of semiparametric models. Handbook of Econometrics 4, 2443–2521.
Powell, J. L., J. H. Stock, and T. M. Stoker (1989). Semiparametric estimation of index coefficients. Econometrica, 1403–1430.
Rao, B. P. (2009). Conditional independence, conditional mixing and conditional association. Annals of the Institute of Statistical Mathematics 61(2), 441–460.
Ridder, G. and S. Sheng (2020). Estimation of large network formation games. arXiv preprint arXiv:2001.03838.
Rothe, C. (2009). Semiparametric estimation of binary response models with endogenous regressors. Journal of Econometrics 153(1), 51–64.
Serfling, R. J. (2009). Approximation Theorems of Mathematical Statistics, Volume 162. John Wiley & Sons.
Sheng, S. (2018). A structural econometric analysis of network formation games through subnetworks. Forthcoming in Econometrica, mimeo UCLA.
Silverman, B. W. (1978). Weak and strong uniform consistency of the kernel estimate of a density and its derivatives. The Annals of Statistics, 177–184.
Toth, P. (2017). Semiparametric estimation in networks with homophily and degree heterogeneity. Technical report, Working paper, University of Nevada.
Yan, T., B. Jiang, S. E. Fienberg, and C. Leng (2019). Statistical inference in a directed network model with covariates. Journal of the American Statistical Association 114(526), 857–868.
Zeleneev, A. (2020). Identification and estimation of network models with nonparametric unobserved heterogeneity.
Appendix
A.1 Proof of Theorem 3.1
Proof.
Let $e_{ij} = A_i + A_j - U_{ij}$ and $s(w, e) = -w'\theta_0 - e$. Consider
\begin{align*}
E[D^*_{ij} \mid X_i, X_j] &= E\big[ E[D^*_{ij} \mid v_{ij}, X_i, X_j] \mid X_i, X_j \big] \\
&= \int_{-s_v}^{s_v} E\big[ D_{ij} - 1[v_{ij} > 0] \mid v_{ij}, X_i, X_j \big] \frac{f_{v|x}(v_{ij} \mid X_i, X_j)}{f_{v|x}(v_{ij} \mid X_i, X_j)} \, dv_{ij} \\
&= \int_{-s_v}^{s_v} E\big[ 1[v_{ij} \ge s(W_{ij}, e_{ij})] - 1[v_{ij} > 0] \mid v_{ij}, X_i, X_j \big] \, dv_{ij} \\
&= \int_{-s_v}^{s_v} \int_{S_e(X_i, X_j)} \big\{ 1[v_{ij} \ge s(W_{ij}, e_{ij})] - 1[v_{ij} > 0] \big\} \, dF_{e|x}(e_{ij} \mid v_{ij}, X_i, X_j) \, dv_{ij} \\
&= \int_{S_e(X_i, X_j)} \int_{-s_v}^{s_v} \big\{ 1[v_{ij} \ge s(W_{ij}, e_{ij})] - 1[v_{ij} > 0] \big\} \, dv_{ij} \, dF_{e|x}(e_{ij} \mid X_i, X_j) \\
&= \int_{S_e(X_i, X_j)} -s(W_{ij}, e_{ij}) \, dF_{e|x}(e_{ij} \mid X_i, X_j) \\
&= \int_{S_e(X_i, X_j)} \big( W_{ij}'\theta_0 + e_{ij} \big) \, dF_{e|x}(e_{ij} \mid X_i, X_j) \\
&= W_{ij}'\theta_0 + E[e_{ij} \mid X_i, X_j].
\end{align*}
The third-to-last equality follows from the following result:
\[
\int_{-s_v}^{s_v} \big\{ 1[v_{ij} \ge s(W_{ij}, e_{ij})] - 1[v_{ij} > 0] \big\} \, dv_{ij} = \int_{s(W_{ij}, e_{ij})}^{s_v} dv_{ij} - s_v = -s(W_{ij}, e_{ij}).
\]
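As an illustrative aside, a minimal simulation check of the identity $E[D^*_{ij} \mid X_i, X_j] = W_{ij}'\theta_0 + E[e_{ij} \mid X_i, X_j]$ (my own construction; the uniform special regressor, the normal errors, and all constants are assumptions chosen so that the support condition holds with high probability):

```python
# Monte Carlo sanity check of the special-regressor transform behind
# Theorem 3.1: D* = (D - 1[v > 0]) / f_v has conditional mean W'theta + e.
import numpy as np

rng = np.random.default_rng(0)
n_draws, theta, s_v = 1_000_000, 1.5, 10.0

W = rng.normal(size=n_draws)                # observed pair index W_ij
e = rng.normal(scale=0.5, size=n_draws)     # e_ij = A_i + A_j - U_ij
v = rng.uniform(-s_v, s_v, size=n_draws)    # special regressor, independent of (W, e)
f_v = 1.0 / (2.0 * s_v)                     # known density of v on [-s_v, s_v]

D = (W * theta + e + v >= 0).astype(float)  # link formation rule
D_star = (D - (v > 0)) / f_v                # special-regressor transform

# With E[e] = 0, the mean of D* - W * theta should be close to zero.
print(np.mean(D_star - W * theta))
```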
A.2 Proof of Corollary 3.1

Proof.
Theorem 3.1 concludes that
\[ E[D^*_{ik} \mid X_i, X_k] = W_{ik}'\theta_0 + E[A_i + A_k \mid X_i, X_k]. \]
Observe that $D^*_{ik}$ is a function of $(Z_i, Z_k, A_i, A_k, U_{ik})$. It follows from the random sampling of nodes (Assumption 3.1.1) and the conditionally independent formation of links (Assumption 3.1.2) that the following conditions hold for any tetrad $\sigma\{i,j,k,l\} \in N_{m_n}$:
\begin{align*}
E[D^*_{ik} \mid X_i, X_k] &= E[D^*_{ik} \mid X_{\sigma(\{i,j,k,l\})}], \\
E[A_i + A_k \mid X_i, X_k] &= E[A_i + A_k \mid X_{\sigma(\{i,j,k,l\})}],
\end{align*}
since $(v_i, v_k, A_i, A_k, U_{ik})$ is conditionally independent of $(X_j, X_l)$ given $(X_i, X_k)$, i.e.,
\begin{align*}
\Pr(v_i, v_k, A_i, A_k, U_{ik} \mid X_i, X_k) &= \Pr(U_{ik} \mid X_i, X_k, v_i, v_k, A_i, A_k)\,\Pr(v_i, v_k, A_i, A_k \mid X_i, X_k) \\
&= \Pr(U_{ik} \mid X_{\sigma(\{i,j,k,l\})}, v_i, v_k, A_i, A_k)\,\Pr(v_i, v_k, A_i, A_k \mid X_{\sigma(\{i,j,k,l\})}) \\
&= \Pr(v_i, v_k, A_i, A_k, U_{ik} \mid X_{\sigma(\{i,j,k,l\})}),
\end{align*}
where the second equality follows from Assumptions 3.1.1 and 3.1.2. Thus, the results above yield
\begin{align*}
E[D^*_{ik} - D^*_{il} \mid X_{\sigma(\{i,j,k,l\})}] &= (W_{ik} - W_{il})'\theta_0 + E[A_k - A_l \mid X_{\sigma(\{i,j,k,l\})}], \\
E[D^*_{jk} - D^*_{jl} \mid X_{\sigma(\{i,j,k,l\})}] &= (W_{jk} - W_{jl})'\theta_0 + E[A_k - A_l \mid X_{\sigma(\{i,j,k,l\})}],
\end{align*}
for any tetrad $\sigma\{i,j,k,l\}$, which in turn implies
\[ E[\tilde D^*_\sigma \mid X_\sigma] = \tilde W_\sigma'\theta_0. \tag{11} \]
The result follows from Assumption 3.1.5. The proof is complete.
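The role of the tetrad double difference can be verified directly on the conditional means; the following small sketch (my own, with hypothetical numbers) confirms that the additive agent effects cancel exactly:

```python
# Algebraic check of the tetrad double difference in Corollary 3.1:
# the effects A_i + A_j drop out of the differenced conditional means.
import numpy as np

rng = np.random.default_rng(1)
theta = 1.5
A = rng.normal(size=4)                    # unobserved effects for i, j, k, l
W = rng.normal(size=(4, 4))
W = (W + W.T) / 2                         # symmetric pair regressors W_ab

def m(a, b):
    return W[a, b] * theta + A[a] + A[b]  # conditional mean of D*_ab

i, j, k, l = 0, 1, 2, 3
lhs = (m(i, k) - m(i, l)) - (m(j, k) - m(j, l))
rhs = ((W[i, k] - W[i, l]) - (W[j, k] - W[j, l])) * theta
print(np.isclose(lhs, rhs))              # True: the A terms cancel exactly
```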
Proof of Theorem 3.2

Proof.
First, notice that for any $(X_\sigma, \bar v_\sigma) \in Q_{\theta_1}$,
\[ \mathrm{sign}\{\tilde v_\sigma\} = \mathrm{sign}\big\{ \tilde v_\sigma + (\Delta_\sigma W_i'\theta_0 + \Delta_\sigma A_i) - (\Delta_\sigma W_j'\theta_0 + \Delta_\sigma A_j) \big\}, \]
since $|\tilde v_\sigma| \ge 2 s_\varepsilon$ with probability 1.

Consider a $\theta_1 \ne \theta_0$ with $P[\bar Z_\sigma \in Q_{\theta_1}] > 0$. Without loss of generality, consider some $(X_\sigma, \bar v_\sigma) \in Q_{\theta_1}$ with $\tilde W_\sigma'\theta_1 \le -\tilde v_\sigma < \tilde W_\sigma'\theta_0$. From the previous observation, it follows that $\tilde v_\sigma + \tilde W_\sigma'\theta_0 + \Delta_\sigma A_i - \Delta_\sigma A_j > 0$, $\Delta_\sigma v_i > s_\varepsilon$, and $\Delta_\sigma v_j < -s_\varepsilon$ with probability 1.

Given $(X_\sigma, \bar v_\sigma) \in Q_{\theta_1}$, these conditions hold if and only if
\[ \Delta_\sigma v_i > -(\Delta_\sigma W_i'\theta_0 + \Delta_\sigma A_i), \qquad \Delta_\sigma v_j \le -(\Delta_\sigma W_j'\theta_0 + \Delta_\sigma A_j) \tag{12} \]
with probability 1. The inequalities in (12) are sufficient conditions for
\[ P_{\theta_0}\big[\tilde D_\sigma = 2 \mid X_\sigma, A_\sigma, \bar v_\sigma \in V(X_\sigma), \tilde D_\sigma \in \{-2,2\}\big] > P_{\theta_0}\big[\tilde D_\sigma = -2 \mid X_\sigma, A_\sigma, \bar v_\sigma \in V(X_\sigma), \tilde D_\sigma \in \{-2,2\}\big], \]
or, equivalently, for
\[ E_{\theta_0}\big[\tilde D_\sigma \mid X_\sigma, A_\sigma, \bar v_\sigma \in V(X_\sigma), \tilde D_\sigma \in \{-2,2\}\big] > 0. \]

Conversely, suppose for some $(X_\sigma, \bar v_\sigma) \in Q_{\theta_1}$ that the conditional expectation above were positive while $\tilde v_\sigma + \tilde W_\sigma'\theta_0 + \Delta_\sigma A_i - \Delta_\sigma A_j \le 0$. For $\bar v_\sigma \in V(X_\sigma)$, it would then be the case that $\tilde v_\sigma < 0$, and thus
\[ \Delta_\sigma v_i \le -(\Delta_\sigma W_i'\theta_0 + \Delta_\sigma A_i), \qquad \Delta_\sigma v_j > -(\Delta_\sigma W_j'\theta_0 + \Delta_\sigma A_j) \]
with probability 1, which contradicts $E_{\theta_0}[\tilde D_\sigma \mid X_\sigma, A_\sigma, \bar v_\sigma \in V(X_\sigma), \tilde D_\sigma \in \{-2,2\}] > 0$. Hence,
\[ \mathrm{sign}\big\{ E_{\theta_0}[\tilde D_\sigma \mid X_\sigma, A_\sigma, \bar v_\sigma \in V(X_\sigma), \tilde D_\sigma \in \{-2,2\}] \big\} = \mathrm{sign}\big\{ \tilde v_\sigma + \tilde W_\sigma'\theta_0 \big\} \]
for any $(X_\sigma, A_\sigma, \bar v_\sigma \in V(X_\sigma))$.

The previous result implies that for any $(X_\sigma, \bar v_\sigma) \in Q_{\theta_1}$ with $P[\bar Z_\sigma \in Q_{\theta_1}] > 0$, it will hold that $\tilde W_\sigma'\theta_1 \le -\tilde v_\sigma < \tilde W_\sigma'\theta_0$ if and only if
\[ \mathrm{sign}\big\{ E_{\theta_0}[\tilde D_\sigma \mid X_\sigma, \bar v_\sigma \in V, \tilde D_\sigma \in \{-2,2\}] \big\} > \mathrm{sign}\big\{ E_{\theta_1}[\tilde D_\sigma \mid X_\sigma, \bar v_\sigma \in V, \tilde D_\sigma \in \{-2,2\}] \big\}. \]
This result implies that $\bar z_\sigma \in \xi_{\theta_1}(X_\sigma)$ and $P[\bar Z_\sigma \in \xi_{\theta_1}] > 0$. Therefore, $\theta_0$ is identified relative to $\theta_1$.
Proof of Corollary 3.3

Proof.
Consider any $\theta_1 \ne \theta_0$. It follows from Assumption 3.2.4 that $P[\tilde W_\sigma'(\theta_1 - \theta_0) \ne 0] > 0$ for any $\sigma \in N_{m_n}$. Suppose without loss of generality that $P[\tilde W_\sigma'\theta_1 < \tilde W_\sigma'\theta_0] > 0$. Under Assumptions 3.1.1 and 3.2.3, for any $X_\sigma$ with $\tilde W_\sigma'\theta_1 < \tilde W_\sigma'\theta_0$, there exists an interval of $\tilde v_\sigma = \Delta_\sigma v_i - \Delta_\sigma v_j$ with $\tilde W_\sigma'\theta_1 \le -\tilde v_\sigma < \tilde W_\sigma'\theta_0$. This implies that $P[\bar Z_\sigma \in Q_{\theta_1}] > 0$, and hence $\theta_0$ is point identified relative to all $\theta_1 \ne \theta_0$.
A.3 Proof of Theorem 4.1

Proof.
Consider $\hat\theta_n = \hat\Gamma_n^{-1} \times \hat\Psi_{n,\tau}$, with
\[ \hat\Gamma_n = \frac{1}{m_n}\sum_{\sigma\in N_{m_n}} \tilde W_\sigma \tilde W_\sigma', \qquad \hat\Psi_{n,\tau} = \frac{1}{m_n}\sum_{\sigma\in N_{m_n}} \tilde W_\sigma \hat D^*_{\sigma,\tau}. \]
It suffices to show that $\hat\Gamma_n \xrightarrow{p} \Gamma_0$ and $\hat\Psi_{n,\tau} \xrightarrow{p} \Psi_0$; the result will then follow from Assumption 3.1.5, the continuous mapping theorem, and Slutsky's theorem.

Part 1.
Notice that $\hat\Gamma_n - \Gamma_0$ is a mean-zero fourth-order V-statistic without common indices,
\[ \hat\Gamma_n - \Gamma_0 = \frac{1}{m_n}\sum_{\sigma\in N_{m_n}} \big\{ \tilde W_\sigma \tilde W_\sigma' - E[\tilde W_\sigma \tilde W_\sigma'] \big\}. \]
Lemma B.1 implies that $\hat\Gamma_n - \Gamma_0$ can be approximated by a mean-zero U-statistic of order 4 at a rate $\sqrt n$. Assumption 3.1.5 ensures that $\Gamma_0$ is finite. It follows from Assumption 3.1.1 that a Strong Law of Large Numbers for U-statistics holds, and hence $\hat\Gamma_n - \Gamma_0 = o_p(1)$ (see Serfling 2009, Theorem A, p. 190).

Part 2.
For a fixed tetrad $\sigma = \sigma\{i_1,i_2,j_1,j_2\} \in N_{m_n}$, let
\[ \hat\eta_{[l_1l_2],\tau} = \frac{1}{m_n}\sum_{\sigma\in N_{m_n}} \tilde W_\sigma\, \varphi_{l_1l_2,\tau}\Big( \frac{\hat f_{x,l_1l_2}}{\hat f_{vx,l_1l_2}} \Big), \]
for $(l_1,l_2) \in \{(i_1,j_1),(i_1,j_2),(i_2,j_1),(i_2,j_2)\}$. Next, observe that $\hat\Psi_{n,\tau}$ can be written as
\[ \hat\Psi_{n,\tau} = \big(\hat\eta_{[i_1j_1],\tau} - \hat\eta_{[i_1j_2],\tau}\big) - \big(\hat\eta_{[i_2j_1],\tau} - \hat\eta_{[i_2j_2],\tau}\big). \]
Consistent estimation of $\Psi_0$ will follow from repeated applications of Lemma B.2. It follows from Lemma B.3 that $\hat\eta_{[l_1l_2],\tau}$ can be written as
\[ \hat\eta_{[l_1l_2],\tau} = \frac{1}{m_n}\sum_{\sigma\in N_{m_n}} \tilde W_\sigma\, \varphi_{l_1l_2,\tau} \Big\{ \frac{f_{x,l_1l_2}}{f_{vx,l_1l_2}} + \frac{\hat f_{x,l_1l_2} - f_{x,l_1l_2}}{f_{vx,l_1l_2}} - \frac{f_{x,l_1l_2}}{f_{vx,l_1l_2}} \cdot \frac{\hat f_{vx,l_1l_2} - f_{vx,l_1l_2}}{f_{vx,l_1l_2}} \Big\} + o_p(1). \]
Then, Lemma B.2 yields
\begin{align*} \frac{1}{m_n}\sum_{\sigma\in N_{m_n}} \tilde W_\sigma \varphi_{l_1l_2,\tau}\Big\{\frac{\hat f_{x,l_1l_2}}{f_{vx,l_1l_2}}\Big\} &= E\Big[\tilde W_\sigma \varphi_{l_1l_2,\tau}\frac{f_{x,l_1l_2}}{f_{vx,l_1l_2}}\Big] + o_p(1), \\ \frac{1}{m_n}\sum_{\sigma\in N_{m_n}} \tilde W_\sigma \varphi_{l_1l_2,\tau}\Big\{\frac{f_{x,l_1l_2}}{f_{vx,l_1l_2}}\cdot\frac{\hat f_{vx,l_1l_2}}{f_{vx,l_1l_2}}\Big\} &= E\Big[\tilde W_\sigma \varphi_{l_1l_2,\tau}\frac{f_{x,l_1l_2}}{f_{vx,l_1l_2}}\Big] + o_p(1). \end{align*}
It follows from the previous results and the definition of $D^*_{l_1l_2,\tau}$ that
\[ \hat\eta_{[l_1l_2],\tau} = \frac{1}{m_n}\sum_{\sigma\in N_{m_n}} \tilde W_\sigma D^*_{l_1l_2,\tau} + E\big[\tilde W_\sigma D^*_{l_1l_2,\tau}\big] - E\big[\tilde W_\sigma D^*_{l_1l_2,\tau}\big] + o_p(1), \]
which is a V-statistic of order 4. It follows from Lemma B.1 that it can be approximated by a U-statistic of order 4. Assumptions 4.1.1 and 4.1.2, together with equation (6), ensure that $E[\tilde W_\sigma D^*_{l_1l_2,\tau}]$ is finite. It then follows from Assumption 3.1.1 that a Strong Law of Large Numbers for U-statistics holds, and hence
\[ \frac{1}{m_n}\sum_{\sigma\in N_{m_n}} \big\{ \tilde W_\sigma D^*_{l_1l_2,\tau} - E[\tilde W_\sigma D^*_{l_1l_2,\tau}] \big\} = o_p(1). \]
Consider next
\[ \frac{1}{m_n}\sum_{\sigma\in N_{m_n}} \tilde W_\sigma \big\{ D^*_{l_1l_2} - D^*_{l_1l_2,\tau} \big\} = \frac{1}{n(n-1)}\sum_{l_1}\sum_{l_2\ne l_1} D^*_{l_1l_2}(1-I_{\tau,l_1l_2})\,\tilde W_{l_1l_2}(\sigma), \]
where the equality follows from the definition of $D^*_{l_1l_2,\tau}$ and
\[ \tilde W_{l_1l_2}(\sigma) = \frac{1}{(n-2)(n-3)}\sum_{s_1\ne l_1,l_2}\sum_{s_2\ne l_1,l_2,s_1} \tilde W_{\sigma\{l_1,s_1;l_2,s_2\}}. \]
It follows from a Cauchy-Schwarz inequality that the expectation
\[ E\Big[\frac{1}{n(n-1)}\sum_{l_1}\sum_{l_2\ne l_1} D^*_{l_1l_2}(1-I_{\tau,l_1l_2})\tilde W_{l_1l_2}(\sigma)\Big] \]
is bounded by
\[ \frac{1}{n(n-1)}\sum_{l_1}\sum_{l_2\ne l_1} E\Big[\big(D^*_{l_1l_2}(1-I_{\tau,l_1l_2})\tilde W_{l_1l_2}(\sigma)\big)^2\Big]^{1/2} = O\Big( E\big[\tilde W_{l_1l_2}(\sigma)^2 (D^*_{l_1l_2})^2 (1-I_{\tau,l_1l_2})^2\big]\Big)^{1/2} \le \sup_\sigma(\tilde W_\sigma)^2 \sup_{l_1l_2}(D^*_{l_1l_2})^2\, O\big(E[(1-I_{\tau,l_1l_2})^2]\big)^{1/2}, \]
where the inequality follows from Assumption 4.1.1. Assumption 4.1.2 yields
\[ E[(1-I_{\tau,l_1l_2})^2] = P[I_{\tau,l_1l_2} = 0] = o(\tau). \]
Combining the results above,
\[ E\Big[\frac{1}{n(n-1)}\sum_{l_1}\sum_{l_2\ne l_1} D^*_{l_1l_2}(1-I_{\tau,l_1l_2})\tilde W_{l_1l_2}(\sigma)\Big] \le o(\tau), \]
and hence
\[ \frac{1}{m_n}\sum_{\sigma\in N_{m_n}} \big\{ \tilde W_\sigma D^*_{l_1l_2,\tau} - E[\tilde W_\sigma D^*_{l_1l_2}] \big\} = o(1). \]
Using similar steps for $\hat\eta_{[i_1j_2],\tau}$, $\hat\eta_{[i_2j_1],\tau}$, and $\hat\eta_{[i_2j_2],\tau}$ yields
\[ \hat\Psi_{n,\tau} - E\big[\tilde W_\sigma \tilde D^*_\sigma\big] = o_p(1). \]
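For intuition on the trimming step used above, the following small sketch (my own construction; the uniform support and the specific trimming rule are assumptions) shows how an indicator $I_\tau$ zeroes out observations near the boundary of the support of the special regressor, which keeps the inverse-density weights bounded:

```python
# Illustration of boundary trimming: I_tau = 0 when v is within tau of the
# boundary of its support, so the trimmed transform D*_tau omits those dyads.
import numpy as np

rng = np.random.default_rng(1)
s_v, tau = 10.0, 0.5
v = rng.uniform(-s_v, s_v, size=8)
f_v = np.full_like(v, 1.0 / (2 * s_v))

I_tau = (np.abs(v) <= s_v - tau).astype(float)  # trimming indicator
D = rng.integers(0, 2, size=8).astype(float)
D_star_tau = (D - (v > 0)) / f_v * I_tau        # trimmed transform
print(np.column_stack([v, I_tau, D_star_tau]))
```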
A.4 Proof of Theorem 4.2

Proof.
Part 1: Hájek Projection
Under Assumptions 3.1.1-3.1.5 and 4.1.1-4.1.5, it follows from the proof of Theorem 4.1 that $\hat\Gamma_n \xrightarrow{p} \Gamma_0$, and from Lemma B.3 that $\hat\eta_{[l_1l_2],\tau}$ can be written as
\[ \hat\eta_{[l_1l_2],\tau} = \frac{1}{m_n}\sum_{\sigma\in N_{m_n}} \tilde W_\sigma\, \varphi_{l_1l_2,\tau}\Big\{ \frac{f_{x,l_1l_2}}{f_{vx,l_1l_2}} + \frac{\hat f_{x,l_1l_2}-f_{x,l_1l_2}}{f_{vx,l_1l_2}} - \frac{f_{x,l_1l_2}}{f_{vx,l_1l_2}}\cdot\frac{\hat f_{vx,l_1l_2}-f_{vx,l_1l_2}}{f_{vx,l_1l_2}} \Big\} + o_p(1) \]
for $(l_1,l_2)\in\{(i_1,j_1),(i_1,j_2),(i_2,j_1),(i_2,j_2)\}$. Hence $\hat\Psi_{n,\tau} = (\hat\eta_{[i_1j_1],\tau} - \hat\eta_{[i_1j_2],\tau}) - (\hat\eta_{[i_2j_1],\tau} - \hat\eta_{[i_2j_2],\tau})$, which can be expressed as $\hat\Psi_{n,\tau} = S_{1,n\tau} + S_{2,n\tau} - S_{3,n\tau} + o_p(1)$, with
\begin{align*}
S_{1,n\tau} &= \frac{1}{m_n}\sum_{\sigma\in N_{m_n}} \tilde W_\sigma \big\{ (D^*_{i_1j_1,\tau} - D^*_{i_1j_2,\tau}) - (D^*_{i_2j_1,\tau} - D^*_{i_2j_2,\tau}) \big\}, \\
S_{2,n\tau} &= \frac{1}{m_n}\sum_{\sigma\in N_{m_n}} \tilde W_\sigma \Big\{ \Big( \varphi_{i_1j_1,\tau}\frac{\hat f_{x,i_1j_1}}{f_{vx,i_1j_1}} - \varphi_{i_1j_2,\tau}\frac{\hat f_{x,i_1j_2}}{f_{vx,i_1j_2}} \Big) - \Big( \varphi_{i_2j_1,\tau}\frac{\hat f_{x,i_2j_1}}{f_{vx,i_2j_1}} - \varphi_{i_2j_2,\tau}\frac{\hat f_{x,i_2j_2}}{f_{vx,i_2j_2}} \Big) \Big\}, \\
S_{3,n\tau} &= \frac{1}{m_n}\sum_{\sigma\in N_{m_n}} \tilde W_\sigma \Big\{ \Big( D^*_{i_1j_1,\tau}\frac{\hat f_{vx,i_1j_1}}{f_{vx,i_1j_1}} - D^*_{i_1j_2,\tau}\frac{\hat f_{vx,i_1j_2}}{f_{vx,i_1j_2}} \Big) - \Big( D^*_{i_2j_1,\tau}\frac{\hat f_{vx,i_2j_1}}{f_{vx,i_2j_1}} - D^*_{i_2j_2,\tau}\frac{\hat f_{vx,i_2j_2}}{f_{vx,i_2j_2}} \Big) \Big\}.
\end{align*}
Consider
\[ \big(\hat\Psi_{n,\tau} - E[\tilde W_\sigma \tilde D^*_{\sigma,\tau}\mid\Omega_n]\big) = \big\{ S_{1,n\tau} - E[\tilde W_\sigma \tilde D^*_{\sigma,\tau}\mid\Omega_n] \big\} + S_{2,n\tau} - S_{3,n\tau} + o_p(1); \]
it follows from Lemmas B.4, B.5, and B.6 that the Hájek projection of $\big(\hat\Psi_{n,\tau} - E[\tilde W_\sigma \tilde D^*_{\sigma,\tau}\mid\Omega_n]\big)$ into an arbitrary function of $\zeta_{i_1j_1} = (X_{i_1}, X_{j_1}, A_{i_1}, A_{j_1}, v_{i_1j_1}, U_{i_1j_1})$ is given by
\[ \big(\hat\Psi_{n,\tau} - E[\tilde W_\sigma \tilde D^*_{\sigma\tau}\mid\Omega_n]\big) = V^*_n + o_p\Big(\sqrt{\tfrac{\varrho_n}{n(n-1)}}\Big), \]
with
\begin{align*}
V^*_n &= \frac{1}{n(n-1)}\sum_{i_1=1}^n\sum_{j_1\ne i_1}\xi_{i_1j_1,\tau}, \\
\xi_{i_1j_1,\tau} &= \big\{ D^*_{i_1j_1} - E[D^*_{i_1j_1}\mid\omega_{i_1j_1}] \big\}\, I_{\tau,i_1j_1}\,\chi_{i_1j_1}, \\
\chi_{i_1j_1} &= \frac{1}{(n-2)(n-3)}\sum_{i_2\ne i_1,j_1}\sum_{j_2\ne i_1,j_1,i_2} E\big[\tilde W_{\sigma\{i_1,i_2;j_1,j_2\}}\mid X_{i_1}, X_{j_1}\big],
\end{align*}
and
\begin{align*}
\Upsilon_{n,\tau} &= n(n-1)\operatorname{Var}(V^*_n) = \frac{1}{n(n-1)}\sum_{i_1=1}^n\sum_{j_1\ne i_1}\Lambda^*_{i_1,j_1}, \\
\Lambda^*_{i_1,j_1} &= E\Big[\big\{ E[D^{*2}_{i_1j_1}\mid\omega_{i_1j_1}] - E[D^*_{i_1j_1}\mid\omega_{i_1j_1}]^2 \big\}\, I_{\tau,i_1j_1}\,\chi_{i_1j_1}\chi'_{i_1j_1}\Big], \\
\varrho_{n,\tau} &= O(\Upsilon_{n,\tau}) = O\Big( E\Big[\Big\{ \frac{p_n(\omega_{i_1j_1})[1-p_n(\omega_{i_1j_1})]}{f^2_{v|x,i_1j_1}} \Big\} I_{\tau,i_1j_1}\Big]\Big).
\end{align*}
Part 2: Bias Reduction

Consider next
\[ n(n-1)\varrho_n^{-1} E\Big[ \frac{1}{m_n}\sum_{\sigma\in N_{m_n}} \tilde W_\sigma \{\tilde D^*_{\sigma\tau} - \tilde D^*_\sigma\} \,\Big|\, \Omega_n \Big]. \]
It follows from a Cauchy-Schwarz inequality that the term above is bounded by
\[ n(n-1)\varrho_n^{-1} \Big( \frac{1}{m_n}\sum_{\sigma\in N_{m_n}} E\big[ (\tilde D^*_{\sigma\tau} - \tilde D^*_\sigma)^2 \mid \Omega_n \big] \, \tilde W_\sigma \tilde W'_\sigma \Big)^{1/2}, \]
which is equal to
\[ O\Big( n(n-1)\varrho_n^{-1} \big\{ E[D^*_{i_1j_1}D^{*\prime}_{i_1j_1}\mid\omega_{i_1j_1}] - E[D^*_{i_1j_1}\mid\omega_{i_1j_1}]E[D^*_{i_1j_1}\mid\omega_{i_1j_1}]' \big\}(I_{\tau,i_1j_1}-1)^2\, \tilde W_\sigma \tilde W'_\sigma \Big)^{1/2}. \]
Assumptions 4.1.1 and 4.1.2 yield
\[ \sup_\sigma(\tilde W_\sigma)\sup_\sigma(\tilde W_\sigma)'\, O\Big( n(n-1)\varrho_n^{-1} \Big\{ \frac{p_n(\omega_{i_1j_1})[1-p_n(\omega_{i_1j_1})]}{f^2_{v|x,i_1j_1}} \Big\}(I_{\tau,i_1j_1}-1)^2 \Big) = o(1), \]
since $(I_{\tau,i_1j_1}-1) \to 0$ as $\tau \to 0$ and $n \to \infty$. Therefore,
\[ n(n-1)\varrho_n^{-1} E\Big[\frac{1}{m_n}\sum_{\sigma\in N_{m_n}} \tilde W_\sigma\{\tilde D^*_{\sigma\tau}-\tilde D^*_\sigma\}\mid\Omega_n\Big] = o(1), \]
and so
\[ n(n-1)\varrho_n^{-1}\big( E[\tilde W_\sigma \tilde D^*_{\sigma\tau}\mid\Omega_\sigma] - E[\tilde W_\sigma \tilde D^*_\sigma\mid\Omega_\sigma] \big) = o(1). \]

Part 3: Limit Distribution of Projection
Given Assumption 3.1.2, the Hájek projection $V^*_n$ is an average of $\{\xi_{i_1j_1,\tau}\}$, which are conditionally independent given $\Omega_n = (v^n, X^n, A^n)$, with conditional mean $E[\xi_{i_1j_1,\tau}\mid\Omega_n] = 0$ and conditional variance
\[ \Upsilon_0(\Omega_n) = n(n-1)\operatorname{Var}\Big( \frac{1}{n(n-1)}\sum_{i_1=1}^n\sum_{j_1\ne i_1}\xi_{i_1j_1}\,\Big|\,\Omega_n \Big) = \frac{1}{n(n-1)}\sum_{i_1=1}^n\sum_{j_1\ne i_1}\big\{E[D^{*2}_{i_1j_1}\mid\omega_{i_1j_1}] - E[D^*_{i_1j_1}\mid\omega_{i_1j_1}]^2\big\}\, I_{\tau,i_1j_1}\,\chi_{i_1j_1}\chi'_{i_1j_1}. \]
Given Assumption 4.1.4, a conditional version of Lyapunov's Central Limit Theorem holds, and hence
\[ \Upsilon_0(\Omega_n)^{-1/2}\, \frac{1}{\sqrt{n(n-1)}}\sum_{i_1=1}^n\sum_{j_1\ne i_1}\xi_{i_1j_1,\tau} \Rightarrow N(0, I). \]
Now, it follows from Assumption 4.1.4 that $\|\Upsilon_0(\Omega_n) - \Upsilon_n\| \xrightarrow{p} 0$ as $n \to \infty$. It then follows that the limiting distribution is independent of the conditioning values, and therefore the limiting distribution continues to hold unconditionally, with $\Upsilon_n$ replacing $\Upsilon_0(\Omega_n)$. That is,
\[ \Upsilon_n^{-1/2}\, \frac{1}{\sqrt{n(n-1)}}\sum_{i_1=1}^n\sum_{j_1\ne i_1}\xi_{i_1j_1,\tau} \Rightarrow N(0, I). \]

Part 4: Limiting Distribution of $\hat\theta_n$

Consider the matrix $\Sigma_n$, defined as $\Sigma_n = \Gamma_0^{-1} \times \Upsilon_n \times \Gamma_0^{-1}$. The limiting distribution of $\hat\theta_n$ follows from the previous results on $\hat\Gamma_n^{-1}$, $\hat\Psi_{n,\tau}$, and $\Sigma_n$, and from applying Slutsky's theorem. In other words,
\begin{align*}
\sqrt{n(n-1)}\,\Sigma_n^{-1/2}\big(\hat\theta_n - \theta_0\big) &= \sqrt{n(n-1)}\,\Sigma_n^{-1/2} \times \hat\Gamma_n^{-1}\,\frac{1}{m_n}\sum_{\sigma\in N_{m_n}}\big\{\tilde W_\sigma \tilde D^*_{\sigma,\tau} - E[\tilde W_\sigma\tilde D^*_\sigma\mid\Omega_\sigma]\big\} \\
&= \Gamma_0^{1/2} \times \Upsilon_n^{-1/2} \times \Gamma_0^{-1/2} \times \frac{1}{\sqrt{n(n-1)}}\sum_{i_1=1}^n\sum_{j_1\ne i_1}\xi_{i_1j_1,\tau} + o_p(1) \Rightarrow N(0, I).
\end{align*}
The proof is complete.
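To make the sandwich structure of $\Sigma_n$ concrete, a short numerical sketch (my own illustration; `Gamma_hat` and `Upsilon_hat` are hypothetical placeholders standing in for the plug-in estimates, not quantities defined in the paper):

```python
# Sketch of the sandwich form Sigma = Gamma^{-1} Upsilon Gamma^{-1} used
# for inference on theta_hat, with placeholder plug-in matrices.
import numpy as np

rng = np.random.default_rng(2)
k, n = 3, 100                                     # dim(theta), network size

Gamma_hat = np.eye(k) + 0.1 * rng.normal(size=(k, k))
Gamma_hat = (Gamma_hat + Gamma_hat.T) / 2         # symmetrize the placeholder
Upsilon_hat = 2.0 * np.eye(k)                     # placeholder for Upsilon_n

Gamma_inv = np.linalg.inv(Gamma_hat)
Sigma_hat = Gamma_inv @ Upsilon_hat @ Gamma_inv   # sandwich variance

se = np.sqrt(np.diag(Sigma_hat) / (n * (n - 1)))  # dyadic rate scaling
print(se)
```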
Technical Appendix
B.1 Equivalent representation for V-statistics
The following lemma provides a U-statistic representation for a V-statistic when the kernel varies with $n$. Given $n$ and for $m \le n$, let $\sum_{(n,m)}$ denote the sum over the $\binom{n}{m}$ combinations of $m$ distinct elements $(i_1,\dots,i_m)$ from $(1,\dots,n)$, and let $\sum_{\Pi_{m!}}$ denote the sum over the $m!$ permutations $(i_1,\dots,i_m)$ of $(1,\dots,m)$. Let $V_n$ be a V-statistic of order $m$, without common indices,
\[ V_n = \frac{1}{n^m}\sum_{i_1,\dots,i_m=1}^n \frac{1}{h^L}\,\gamma(X_{i_1},\dots,X_{i_m})\,1[i_1\ne\cdots\ne i_m], \]
where $h\to0$ as $n\to\infty$ and $\gamma:\mathbb{R}^L\to\mathbb{R}$. Let
\[ U_n = \binom{n}{m}^{-1}\sum_{(n,m)}\phi_h(X_{i_1},\dots,X_{i_m}), \qquad \phi_h(X_1,\dots,X_m) = \frac{1}{m!}\sum_{\Pi_{m!}}\frac{1}{h^L}\,\gamma(X_{\pi_1},\dots,X_{\pi_m}). \]

Lemma B.1.
Suppose that $E\|\gamma(X_{i_1},\dots,X_{i_m})\|^2 < \infty$ for all $1 \le i_1,\dots,i_m \le m$ and $m \le n$, and $nh^L \to \infty$. Then, $V_n - U_n = o_p(1)$.

Proof.
Let $\gamma_h(X_{i_1},\dots,X_{i_m}) = h^{-L}\gamma(X_{i_1},\dots,X_{i_m})$, and notice that
\begin{align}
n^m V_n &= \sum_{(n,m)}\sum_{\Pi_{m!}}\gamma_h(X_{\pi_1},\dots,X_{\pi_m}) \tag{13} \\
&= [n(n-1)\cdots(n-m+1)]\,\binom{n}{m}^{-1}\sum_{(n,m)}\phi_h(X_{i_1},\dots,X_{i_m}) = [n(n-1)\cdots(n-m+1)]\,U_n, \nonumber
\end{align}
and hence $(U_n - V_n) = O(n^{-1})\,U_n$. Consider now
\[ E[(U_n-V_n)^2] = O\Big(\frac{1}{n^2}\Big)\,E[U_n^2], \qquad E[U_n^2] = \binom{n}{m}^{-2}E\Big[\Big(\sum_{(n,m)}\phi_h(X_{i_1},\dots,X_{i_m})\Big)^2\Big] \le \binom{n}{m}^{-2}\binom{n}{m}^2 E\big[\phi_h(X_{i_1},\dots,X_{i_m})^2\big], \]
where
\[ E\big[\phi_h(X_{i_1},\dots,X_{i_m})^2\big] = \frac{1}{h^{2L}}\,O\big(E[\gamma(X_{i_1},\dots,X_{i_m})^2]\big) = O\Big(\frac{1}{h^{2L}}\Big), \]
since $E\|\gamma(X_{i_1},\dots,X_{i_m})\|^2 < \infty$ by assumption. Hence,
\[ E[(U_n-V_n)^2] \le O\Big(\frac{1}{(nh^L)^2}\Big) = o(1) \]
as $nh^L \to \infty$. Notice that, unlike Lemma 5.7.3 in Serfling (2009, page 206) and Theorem 1 in Lee (2019, page 183), in equation (13) the average of terms with at least one common index is equal to zero due to the specification of the V-statistic without common indices.
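A quick numerical check of the lemma for a fixed second-order kernel (my own illustration; the kernel $\gamma(x,y) = xy$ is an arbitrary choice):

```python
# Compare the V-statistic without common indices to the U-statistic of
# the symmetrized kernel; their difference is O(1/n) * U_n, as in Lemma B.1.
import itertools
import numpy as np

rng = np.random.default_rng(3)
n, m = 200, 2
X = rng.normal(size=n)

def gamma(x, y):
    return x * y

# V-statistic: average over ordered pairs i != j, normalized by n^m
V_n = sum(gamma(X[i], X[j]) for i in range(n)
          for j in range(n) if i != j) / n**m

# U-statistic: average of the symmetrized kernel over unordered pairs
U_n = np.mean([(gamma(X[i], X[j]) + gamma(X[j], X[i])) / 2
               for i, j in itertools.combinations(range(n), m)])

print(V_n, U_n, V_n - U_n)   # here V_n = (1 - 1/n) U_n exactly
```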
B.2 Consistency for V-statistics

Lemma B.2.
Suppose that the Assumptions in Theorem 4.1 hold. Then
\begin{align*}
\frac{1}{m_n}\sum_{\sigma\in N_{m_n}} \tilde W_\sigma\varphi_{l_1l_2,\tau}\Big\{\frac{\hat f_{x,l_1l_2}}{f_{vx,l_1l_2}}\Big\} - E\Big[\tilde W_\sigma\varphi_{l_1l_2,\tau}\frac{f_{x,l_1l_2}}{f_{vx,l_1l_2}}\Big] &= o_p(1), \\
\frac{1}{m_n}\sum_{\sigma\in N_{m_n}} \tilde W_\sigma\varphi_{l_1l_2,\tau}\Big\{\frac{f_{x,l_1l_2}}{f_{vx,l_1l_2}}\cdot\frac{\hat f_{vx,l_1l_2}}{f_{vx,l_1l_2}}\Big\} - E\Big[\tilde W_\sigma\varphi_{l_1l_2,\tau}\frac{f_{x,l_1l_2}}{f_{vx,l_1l_2}}\Big] &= o_p(1),
\end{align*}
with $(l_1,l_2)\in\{(i_1,j_1),(i_1,j_2),(i_2,j_1),(i_2,j_2)\}$ for a given tetrad $\sigma\{i_1,i_2,j_1,j_2\}\in N_{m_n}$.

Proof. This proof focuses on the first result, since the second one follows from similar arguments. Let
\[ \hat V_n = \frac{1}{m_n}\sum_{\sigma\in N_{m_n}} \tilde W_\sigma\,\frac{\varphi_{l_1l_2,\tau}}{f_{vx,l_1l_2}}\,\hat f_{x,l_1l_2}, \]
where $\hat f_{x,l_1l_2}$ is defined as
\[ \hat f_{x,l_1l_2} = \frac{1}{(n-2)(n-3)}\sum_{k_1\ne i_1,j_1}\sum_{k_2\ne i_1,j_1,k_1}\frac{1}{h^{2L}}\,K_{x,h}(X_{k_1}-X_{l_1},\, X_{k_2}-X_{l_2}). \]
Plugging $\hat f_{x,l_1l_2}$ into $\hat V_n$ yields the following V-statistic of order six:
\[ \Big[6!\binom{n}{6}\Big]^{-1}\sum_{i_1\ne i_2\ne j_1\ne j_2\ne k_1\ne k_2}\frac{1}{h^{2L}}\,\tilde W_{i_1i_2;j_1j_2}\,\frac{\varphi_{l_1l_2,\tau}}{f_{vx,l_1l_2}}\,K_{x,h}(X_{k_1}-X_{l_1},\, X_{k_2}-X_{l_2}). \]
Assumptions 4.1.1 and 4.1.5 imply that
\[ E\Big\|\frac{1}{h^{2L}}\,\tilde W_{i_1i_2;j_1j_2}\,\frac{\varphi_{l_1l_2,\tau}}{f_{vx,l_1l_2}}\,K_{x,h}(X_{k_1}-X_{l_1},\, X_{k_2}-X_{l_2})\Big\|^2 < \infty; \]
it then follows from Lemma B.1 that $\hat V_n$ is asymptotically equivalent to a sixth-order U-statistic as $nh^{2L}\to\infty$. In particular, $(U_n - \hat V_n) = o_p(1)$, where
\[ U_n = \binom{n}{6}^{-1}\sum_{i_1<\cdots<i_6}\phi_h(X_{i_1},\dots,X_{i_6}) \]
denotes the corresponding symmetrized U-statistic. A Strong Law of Large Numbers for U-statistics (Assumption 3.1.1), together with standard results on the convergence of kernel-weighted U-statistics, then gives $U_n - E[\tilde W_\sigma\varphi_{l_1l_2,\tau} f_{x,l_1l_2}/f_{vx,l_1l_2}] = o_p(1)$, which yields the first result.
B.3 Lemmas for Asymptotic Normality Theorem
Notation
The following notation will prove useful in Lemmas B.3-B.6. For any finite $n$, let $\Omega_n = \{X^n, A^n, v^n\}$. Given a fixed tetrad $\sigma\{i_1,i_2,j_1,j_2\}\in N_{m_n}$, let $X_\sigma = \{X_{i_1}, X_{i_2}, X_{j_1}, X_{j_2}\}$, $A_\sigma = \{A_{i_1}, A_{i_2}, A_{j_1}, A_{j_2}\}$, $v_\sigma = \{v_{i_1}, v_{i_2}, v_{j_1}, v_{j_2}\}$, and $\Omega_\sigma = \{X_\sigma, A_\sigma, v_\sigma\}$; for any dyad $(l_1,l_2)\in\{(i_1,j_1),(i_1,j_2),(i_2,j_1),(i_2,j_2)\}$, define
\[ \omega_{l_1l_2} = \{X_{l_1}, X_{l_2}, A_{l_1}, A_{l_2}, v_{l_1l_2}\}, \qquad T^{\dagger}_{l_1l_2} = T_{l_1l_2} - E[\tilde W_\sigma\tilde D^*_{\sigma,\tau}\mid\Omega_\sigma] \ \text{for any random variable } T_{l_1l_2}. \]

Lemma B.3.
Suppose that the Assumptions in Theorem 4.2 hold, and consider
\[ \hat\eta_{[l_1l_2],\tau} = \frac{1}{m_n}\sum_{\sigma\in N_{m_n}}\tilde W_\sigma\,\varphi_{l_1l_2,\tau}\Big(\frac{\hat f_{x,l_1l_2}}{\hat f_{vx,l_1l_2}}\Big), \]
with $(l_1,l_2)\in\{(i_1,j_1),(i_1,j_2),(i_2,j_1),(i_2,j_2)\}$. It follows that $\hat\eta_{[l_1l_2],\tau}$ can be written as
\[ \hat\eta_{[l_1l_2],\tau} = \frac{1}{m_n}\sum_{\sigma\in N_{m_n}}\tilde W_\sigma\,\varphi_{l_1l_2,\tau}\Big\{\frac{f_{x,l_1l_2}}{f_{vx,l_1l_2}} + \frac{\hat f_{x,l_1l_2}-f_{x,l_1l_2}}{f_{vx,l_1l_2}} - \frac{f_{x,l_1l_2}}{f_{vx,l_1l_2}}\cdot\frac{\hat f_{vx,l_1l_2}-f_{vx,l_1l_2}}{f_{vx,l_1l_2}}\Big\} + o_p(1).
\]
Given h → n − δ h L +1 → ∞ for any δ >
0, it follows from a variance calculationargument that sup ( v,x,x ′ ) ∈ Ω v,x | ˆ f vx ( v, x, x ′ ) − f vx ( v, x, x ′ ) | = o p (1)sup ( x,x ′ ) ∈ Ω x | ˆ f x ( x, x ′ ) − f x ( x, x ′ ) | = o p (1) , for any δ >
0. See, e.g., Silverman (1978), Collomb and H¨ardle (1986),Aradillas-Lopez (2010), andfor applications to network models Leung (2015b) and Graham et al. (2019).Consider a second order Taylor expansion of b f x,l l / b f vx,l l around f x,l l /f vx,l l . The quadraticterms in the expansion involve second order derivatives of f x,l l /f vx,l l evaluated at ˜ f x,l l and˜ f vx,l l , where ˜ f x,l l lies between b f x,l l and f x,l l , and similarly ˜ f vx,l l lies between b f vx,l l and f vx,l l . By substituting a second order Taylor expansion of b f x,l l / b f vx,l l around f x,l l /f vx,l l into b η [ l l ] ,τ , I obtain b η [ l l ] ,τ = 1 m n X σ ∈N mn ˜ W σ ϕ l l ,τ ( f x,l l f vx,l l + b f x,l l − f x,l l f vx,l l − f x,l l f vx,l l × b f vx,l l − f vx,l l f vx,l l ) + R n , where R n denotes the reminder term. The result follows from showing that R n = o p (1).The first component of R n is1 m n X σ ∈N mn ˜ W σ ϕ l l ,τ ˜ f x,l l (cid:16) b f vx,l l − f vx,l l (cid:17) ˜ f vx,l l ≤ " sup ( x,x ′ ) ∈ Ω x | f x | sup ( v,x,x ′ ) ∈ Ω vx | f − vx | sup ( v,x,x ′ ) ∈ Ω vx | b f vx − f vx | m n X σ ∈N mn || ˜ W σ ϕ l l ,τ || = O p (1) " sup ( v,x,x ′ ) | b f vx − f vx | = o p (1) . The first inequality follows from Assumption 4.1.1. The equality follows from the fact that theV-statistic inside the parenthesis converges to its expectation given that Assumptions 3.1.1 and4.1.1. The result follows from the uniform convergence of the kernel estimator.43he remaining component of R n is1 m n X σ ∈N mn ˜ W σ ϕ l l ,τ ( ( b f vx,l l − f vx,l l )( b f x,l l − f x,l l ) f vx,l l ) ≤ " sup ( v,x,x ) ∈ Ω vx | f − vx | sup ( v,x,x ) ∈ Ω vx | b f vx − f vx | sup ( x,x ) ∈ Ω x | b f x − f x | × m n X σ ∈N mn || ˜ W σ ϕ l l ,τ || = O p (1) " sup ( v,x,x ) ∈ Ω vx | b f vx − f vx | sup ( x,x ) ∈ Ω vx | b f x − f x | . = o p (1) . The result follows from the uniform convergence of the kernel estimators. This completes the proof.
Lemma B.4.
Under the same Assumptions of Theorem 4.2, it follows that the Hájek projection of
\[ S^{\dagger}_{1,n\tau} = S_{1,n\tau} - E[\tilde W_\sigma \tilde D^*_{\sigma,\tau}\mid\Omega_n] = \frac{1}{m_n}\sum_{\sigma\in N_{m_n}} \big\{ \tilde W_\sigma \tilde D^*_{\sigma,\tau} - E[\tilde W_\sigma \tilde D^*_{\sigma,\tau}\mid\Omega_\sigma] \big\} \]
into an arbitrary function of $\zeta_{i_1j_1} = (X_{i_1}, X_{j_1}, A_{i_1}, A_{j_1}, v_{i_1j_1}, U_{i_1j_1})$ is given by
\[ V^*_{1,n\tau} = \frac{1}{n(n-1)}\sum_{i_1=1}^n\sum_{j_1\ne i_1}\xi_{i_1j_1,\tau}, \]
and
\[ n(n-1)\,\Upsilon_n^{-1/2}\, E\big[(S^{\dagger}_{1,n\tau}-V^*_{1,n\tau})(S^{\dagger}_{1,n\tau}-V^*_{1,n\tau})'\big]\,\Upsilon_n^{-1/2} = o(1), \]
where $\Upsilon_n = n(n-1)\operatorname{Var}(V^*_{1,n\tau})$ and $\operatorname{Var}(V^*_{1,n\tau}) = O_p\big(\varrho_{n\tau}/(n(n-1))\big)$.

Proof.

Step 1. Hájek Projection

Consider the tetrad $\sigma\{i_1,i_2,j_1,j_2\}$ and let
\[ s(\sigma\{i_1,i_2,j_1,j_2\}) = \tilde W_\sigma \tilde D^*_{\sigma,\tau} - E[\tilde W_\sigma \tilde D^*_{\sigma,\tau}\mid\Omega_\sigma] = \tilde W_\sigma\big\{\tilde D^*_{\sigma,\tau} - E[\tilde D^*_{\sigma,\tau}\mid\Omega_\sigma]\big\}. \]
Then
\[ E[s(\sigma\{i_1,i_2,j_1,j_2\})\mid\zeta_{i_1j_1}] = \big\{D^*_{i_1j_1,\tau} - E[D^*_{i_1j_1,\tau}\mid\omega_{i_1j_1}]\big\}\, E[\tilde W_\sigma\mid X_{i_1}, X_{j_1}], \]
where the equality follows from the Law of Iterated Expectations and Assumptions 3.1.1 and 3.1.2. To be precise, observe that for $\{l_1,l_2\}\ne\{i_1,j_1\}$ with $(l_1,l_2)\in\{(i_1,j_2),(i_2,j_1),(i_2,j_2)\}$,
\[ E\big[\tilde W_\sigma\{D^*_{l_1l_2,\tau} - E[D^*_{l_1l_2,\tau}\mid\Omega_\sigma]\}\mid\zeta_{i_1j_1}\big] = E\big[\tilde W_\sigma\{E[D^*_{l_1l_2,\tau}\mid\omega_{l_1l_2}] - E[D^*_{l_1l_2,\tau}\mid\omega_{l_1l_2}]\}\mid\zeta_{i_1j_1}\big] = 0. \]
It then follows that the Hájek projection is given by
\[ V^*_{1,n\tau} = \frac{1}{n(n-1)}\sum_{i_1=1}^n\sum_{j_1\ne i_1}\xi_{i_1j_1,\tau}, \]
with
\[ \xi_{i_1j_1,\tau} = \big\{D^*_{i_1j_1} - E[D^*_{i_1j_1}\mid\omega_{i_1j_1}]\big\}\,I_{\tau,i_1j_1}\,\chi_{i_1j_1}, \qquad \chi_{i_1j_1} = \frac{1}{(n-2)(n-3)}\sum_{i_2\ne i_1,j_1}\sum_{j_2\ne i_1,j_1,i_2} E\big[\tilde W_{\sigma\{i_1,i_2;j_1,j_2\}}\mid X_{i_1}, X_{j_1}\big]. \]
Notice that $E[V^*_{1,n\tau}] = E[\xi_{i_1j_1,\tau}] = 0$.

Step 2. Variance of Hájek Projection

For two different dyads $\{i_1,j_1\}\ne\{i_1',j_1'\}$ with zero common indices, Assumption 3.1.1 implies that
\[ E[\xi_{i_1j_1,\tau}\,\xi'_{i_1'j_1',\tau}] = E[\xi_{i_1j_1,\tau}]\,E[\xi_{i_1'j_1',\tau}]' = 0. \]
For two dyads $\{i_1,j_1\}\ne\{i_1,j_1'\}$ with one common index, the conditionally independent formation of links implied by Assumption 3.1.2 yields
\[ E[\xi_{i_1j_1,\tau}\,\xi'_{i_1'j_1',\tau}] = E\big[E[\xi_{i_1j_1,\tau}\mid\Omega_n]\,E[\xi_{i_1'j_1',\tau}\mid\Omega_n]'\big] = 0. \]
Hence the variance of $V^*_{1,n\tau}$ is given by
\[ \operatorname{Var}(V^*_{1,n\tau}) = \Big\{\frac{1}{n(n-1)}\Big\}^2\sum_{i_1=1}^n\sum_{j_1\ne i_1} E[\xi_{i_1j_1,\tau}\xi'_{i_1j_1,\tau}] = \Big\{\frac{1}{n(n-1)}\Big\}^2\sum_{i_1=1}^n\sum_{j_1\ne i_1}\Lambda^*_{i_1,j_1}, \]
where
\[ \Lambda^*_{i_1,j_1} = E\Big[\big\{E[D^{*2}_{i_1j_1}\mid\omega_{i_1j_1}] - E[D^*_{i_1j_1}\mid\omega_{i_1j_1}]^2\big\}\,I_{\tau,i_1j_1}\,\chi_{i_1j_1}\chi'_{i_1j_1}\Big]. \]
Define $\Upsilon_{n,\tau} = n(n-1)\operatorname{Var}(V^*_{1,n\tau}) = \frac{1}{n(n-1)}\sum_{i_1}\sum_{j_1\ne i_1}\Lambda^*_{i_1,j_1}$.

Step 3. Variance of $S^{\dagger}_{1,n\tau}$

Given two different tetrads $\sigma\{i_1,i_2,j_1,j_2\}$ and $\sigma'\{i_1',i_2',j_1',j_2'\}$, let
\[ \Delta_{c,n} = \operatorname{Cov}\big(s(\sigma\{i_1,i_2,j_1,j_2\}),\, s(\sigma'\{i_1',i_2',j_1',j_2'\})\big) \]
denote the covariance between $s(\sigma)$ and $s(\sigma')$ when the two tetrads have $c = 0,1,2,3,4$ common indices. Note, since $E[s(\sigma\{i_1,i_2,j_1,j_2\})\mid\Omega_\sigma] = 0$, that $\Delta_{0,n} = \Delta_{1,n} = 0$. Consider
\begin{align*} \Delta_{2,n} &= E\big[s(\sigma\{i_1,i_2,j_1,j_2\})\,s(\sigma'\{i_1,i_2',j_1,j_2'\})'\big] \\ &= E\Big[\big\{\tilde D^*_{\sigma,\tau} - E[\tilde D^*_{\sigma,\tau}\mid\Omega_\sigma]\big\}\big\{\tilde D^*_{\sigma',\tau} - E[\tilde D^*_{\sigma',\tau}\mid\Omega_{\sigma'}]\big\}\,\tilde W_\sigma\tilde W'_{\sigma'}\Big] \\ &= E\Big[\big\{E[\tilde D^{*2}_{i_1j_1,\tau}\mid\omega_{i_1j_1}] - E[\tilde D^*_{i_1j_1,\tau}\mid\omega_{i_1j_1}]^2\big\}\,I_{\tau,i_1j_1}\,\tilde W_\sigma\tilde W'_{\sigma'}\Big]. \end{align*}
It follows from the results above that $\operatorname{Var}(S^{\dagger}_{1,n\tau})$ can be expanded as
\[ \operatorname{Var}(S^{\dagger}_{1,n\tau}) = \Big(\frac{1}{m_n}\Big)^2\sum_{\sigma\in N_{m_n}}\sum_{\sigma'\in N_{m_n}} E\big[s(\sigma)s(\sigma')'\big] = \Big(\frac{1}{m_n}\Big)^2\sum_{i_1=1}^n\sum_{j_1\ne i_1}\sum_{k_1\ne i_1,j_1}\sum_{k_2\ne i_1,j_1,k_1}\sum_{l_1\ne i_1,j_1}\sum_{l_2\ne i_1,j_1,l_1}\Delta_{2,n} \]
plus terms of strictly smaller order coming from pairs of tetrads with three or four common indices ($\Delta_{3,n}$ and $\Delta_{4,n}$). Notice that the term inside the brackets, scaled by $[(n-2)(n-3)]^{-2}$, is equivalent to $\Lambda^*_{i_1j_1}$; in particular,
\[ \Lambda^*_{i_1j_1} = \Big\{\frac{1}{(n-2)(n-3)}\Big\}^2\sum_{k_1\ne i_1,j_1}\sum_{k_2\ne i_1,j_1,k_1}\sum_{l_1\ne i_1,j_1'}\sum_{l_2\ne i_1,j_1',l_1}\Delta_{2,n} = E\Big[\big\{E[\tilde D^{*2}_{i_1j_1,\tau}\mid\omega_{i_1j_1}] - E[\tilde D^*_{i_1j_1,\tau}\mid\omega_{i_1j_1}]^2\big\}\,I_{\tau,i_1j_1}\,\chi_{i_1j_1}\chi'_{i_1j_1}\Big], \]
which follows from the definition of $\chi_{i_1j_1}$. Hence,
\[ \operatorname{Var}(S^{\dagger}_{1,n\tau}) = \Big\{\frac{1}{n(n-1)}\Big\}^2\sum_{i_1=1}^n\sum_{j_1\ne i_1}\Lambda^*_{i_1,j_1} + o(1), \]
and $\operatorname{Var}(V^*_{1,n\tau}) - \operatorname{Var}(S^{\dagger}_{1,n\tau}) = o(1)$.

Step 4. Asymptotic Equivalence

To show that
\[ n(n-1)\,\Upsilon_{n,\tau}^{-1/2}\,E\big[(S^{\dagger}_{1,n\tau}-V^*_{1,n\tau})(S^{\dagger}_{1,n\tau}-V^*_{1,n\tau})'\big]\,\Upsilon_{n,\tau}^{-1/2} = o(1), \]
it is sufficient to prove that $\operatorname{Var}(V^*_{1,n\tau})^{-1/2}\operatorname{Cov}(V^*_{1,n\tau}, S_{1,n\tau})\operatorname{Var}(V^*_{1,n\tau})^{-1/2} = I$, which in turn follows from noticing that
\[ \operatorname{Cov}(V^*_{1,n\tau}, S^{\dagger}_{1,n\tau}) = E\big[V^*_{1,n\tau}(S^{\dagger}_{1,n\tau}-V^*_{1,n\tau})'\big] + E\big[V^*_{1,n\tau}(V^*_{1,n\tau})'\big] = \operatorname{Var}(V^*_{1,n\tau}), \]
since, by construction of the orthogonal projection, $E[V^*_{1,n\tau}(S_{1,n\tau}-V^*_{1,n\tau})'] = 0$.

Lemma B.5.
Under the same Assumptions of Theorem 4.2, it follows that the Hájek projection of $S^{\dagger}_{2,n\tau} = S_{2,n\tau} - E[\tilde W_\sigma \tilde D^*_{\sigma,\tau}\mid\Omega_\sigma]$, with
\[ S_{2,n\tau} = \frac{1}{m_n}\sum_{\sigma\in N_{m_n}} \tilde W_\sigma\Big\{ \Big(\varphi_{i_1j_1,\tau}\frac{\hat f_{x,i_1j_1}}{f_{vx,i_1j_1}} - \varphi_{i_1j_2,\tau}\frac{\hat f_{x,i_1j_2}}{f_{vx,i_1j_2}}\Big) - \Big(\varphi_{i_2j_1,\tau}\frac{\hat f_{x,i_2j_1}}{f_{vx,i_2j_1}} - \varphi_{i_2j_2,\tau}\frac{\hat f_{x,i_2j_2}}{f_{vx,i_2j_2}}\Big) \Big\}, \]
into an arbitrary function of $\zeta_{i_1j_1} = (X_{i_1}, X_{j_1}, A_{i_1}, A_{j_1}, v_{i_1j_1}, U_{i_1j_1})$ is given by
\[ V^*_{2,n\tau} = \frac{1}{n(n-1)}\sum_{i_1=1}^n\sum_{j_1\ne i_1}\bar\xi_{i_1j_1,\tau}, \]
and $n\,\Upsilon_n^{-1/2}\,E[(S_{2,n\tau}-V^*_{2,n\tau})(S_{2,n\tau}-V^*_{2,n\tau})']\,\Upsilon_n^{-1/2} = o(1)$, where $\Upsilon_n = n\operatorname{Var}(V^*_{2,n\tau})$.

Proof. Similarly to the definition for tetrads, I introduce the function $\sigma_6 = \sigma\{i_1,i_2,j_1,j_2,k_1,k_2\}$ that maps each unique 6-tuple $\{i_1,i_2,j_1,j_2,k_1,k_2\}$ into an index set $N_{m_{6,n}} = \{1,\dots,m_{6,n}\}$, where $m_{6,n}$ denotes the total number of such 6-tuples. Consider a fixed 6-tuple $\{i_1,i_2,j_1,j_2,k_1,k_2\}$ and define
\[ s_{i_1,j_1}(\sigma_6) = \tilde W_{i_1i_2,j_1j_2}\Big\{ \frac{1}{h^{2L}}\frac{\varphi_{i_1j_1,\tau}}{f_{vx,i_1j_1}}\,K_{x,h}(X_{k_1}-X_{i_1},\, X_{k_2}-X_{j_1}) - E[D^*_{i_1j_1,\tau}\mid\Omega_{i_1i_2,j_1j_2}] \Big\}, \]
with $s_{i_1,j_2}(\sigma_6)$, $s_{i_2,j_1}(\sigma_6)$, and $s_{i_2,j_2}(\sigma_6)$ defined analogously, and let $s_{2,n}(\sigma_6) = s_{i_1,j_1}(\sigma_6) - s_{i_1,j_2}(\sigma_6) - s_{i_2,j_1}(\sigma_6) + s_{i_2,j_2}(\sigma_6)$. It follows that $S^{\dagger}_{2,n\tau}$ can be written as
\[ S^{\dagger}_{2,n\tau} = \frac{1}{m_{6,n}}\sum_{\sigma_6\in N_{m_{6,n}}} s_{2,n\tau}(\sigma_6). \]

Step 1. Hájek Projection

The rest of the proof makes use of the following index notation for dyads. Given the total number $n_2 = n(n-1)$ of ordered dyads, let $\pi = 1, 2, \dots$ index the $n_2$ ordered dyads in the sample. In an abuse of notation, also let $\pi$ denote the set $\{i_1, j_1\}$, where $i_1$ and $j_1$ are the indices that comprise dyad $\pi$; in particular, $\pi(1) = i_1$ and $\pi(2) = j_1$ when $\pi = \{i_1, j_1\}$. With this notation at hand, and using the symmetry of the kernel, $S^{\dagger}_{2,n\tau}$ can be expressed as a normalized sum over dyad triples $(\pi_1,\pi_2,\pi_3)$ of the terms
\[ p_{\pi_1,\pi_2}(\sigma_6) - p_{\pi_1(1)\pi_2(2),\pi_2}(\sigma_6) - p_{\pi_2(1)\pi_1(2),\pi_2}(\sigma_6) + p_{\pi_2,\pi_2}(\sigma_6), \]
where each $p$-term collects the kernel-weighted components of $s_{2,n}$; for instance,
\[ p_{\pi_1,\pi_2}(\sigma_6) = \frac{1}{h^{2L}}\Big( \frac{\varphi_{\pi_1,\tau}}{f_{vx,\pi_1}}\tilde W_{\pi_1,\pi_2} + \frac{\varphi_{\pi_2,\tau}}{f_{vx,\pi_2}}\tilde W_{\pi_2,\pi_1} \Big) K_{x,h}(X_{\pi_1}-X_{\pi_2}) - E[\tilde W_{\pi_1,\pi_2} D^*_{\pi_1,\tau}\mid\Omega_{\pi_1,\pi_2}] - E[\tilde W_{\pi_2,\pi_1} D^*_{\pi_2,\tau}\mid\Omega_{\pi_2,\pi_1}], \]
with $K_{x,h}(X_{\pi_1}-X_{\pi_2})$ denoting $K_{x,h}(X_{\pi_1(1)}-X_{\pi_2(1)},\, X_{\pi_1(2)}-X_{\pi_2(2)})$ and $\chi_{\pi_1} = E[\tilde W_{\pi_1,\pi_2}\mid X_{\pi_1}]$. To compute the Hájek projection into an arbitrary function of $\zeta_{\pi_1}$, consider first $E[p_{\pi_1,\pi_2}(\sigma_6)\mid\zeta_{\pi_1}]$. The following results are useful:
\begin{align*} E\big[E[\tilde W_{\pi_1,\pi_2} D^*_{\pi_1,\tau}\mid\omega_{\pi_1}]\mid\zeta_{\pi_1}\big] &= E[D^*_{\pi_1,\tau}\mid\omega_{\pi_1}]\,E[\tilde W_{\pi_1,\pi_2}\mid X_{\pi_1}] = E[D^*_{\pi_1,\tau}\chi_{\pi_1}\mid\omega_{\pi_1}], \\ E\big[E[\tilde W_{\pi_2,\pi_1} D^*_{\pi_2,\tau}\mid\omega_{\pi_2}]\mid\zeta_{\pi_1}\big] &= E\big[E[D^*_{\pi_2,\tau}\mid\omega_{\pi_2}]\,E[\tilde W_{\pi_2,\pi_1}\mid X_{\pi_2}]\big] = E[D^*_{\pi_2,\tau}\chi_{\pi_2}]. \end{align*}
Furthermore, letting $\Xi(X_{\pi_1}) = E[D^*_{\pi_2,\tau}\chi_{\pi_2}\mid X_{\pi_1}]$, a change of variables $\nu = h^{-1}(X_{\pi_2}-X_{\pi_1})$ with Jacobian $h^{2L}$ gives
\[ \int \Big\{ \frac{\varphi_{\pi_1,\tau}}{f_{vx,\pi_1}}\chi_{\pi_1}\big(f_x(X_{\pi_1}+h\nu) - f_x(X_{\pi_1})\big) + \big(\Xi(X_{\pi_1}+h\nu) - \Xi(X_{\pi_1})\big) \Big\} K_x(\nu)\,d\nu = o(h^{2M}), \]
since Assumptions 4.1.1, 4.1.3, and 4.1.5 guarantee that $f_x(X_{\pi_1})$ and $\Xi(X_{\pi_1})$ are continuous and $M$-times differentiable with respect to all of their arguments, and $K_x$ is a bias-reducing kernel of order $2M$. Observe that $(\varphi_{\pi_1,\tau}/f_{vx,\pi_1})\chi_{\pi_1} f_x(X_{\pi_1}) = 0$ holds for any $X_{\pi_1}$ within a $\tau$ distance of the boundary of $S_x$, and having $h/\tau\to0$ ensures that the change of variables is not affected by boundary effects. The previous results, and Assumption 4.1.5, yield
\[ E[p_{\pi_1,\pi_2}(\sigma_6)\mid\zeta_{\pi_1}] = D^*_{\pi_1,\tau}\chi_{\pi_1} + E[D^*_{\pi_1,\tau}\chi_{\pi_1}\mid X_{\pi_1}] - E[D^*_{\pi_1,\tau}\chi_{\pi_1}\mid\omega_{\pi_1}] - E[D^*_{\pi_1,\tau}\chi_{\pi_1}] + o(1). \]
For the remaining dyads $\pi_s\in\{(\pi_1(1),\pi_2(2)), (\pi_2(1),\pi_1(2)), \pi_2\}$, analogous arguments based on Assumptions 3.1.1, 3.1.2, 4.1.3, and the properties of the bias-reducing kernel (Assumption 4.1.5) give
\[ E[p_{\pi_s,\pi_2}(\sigma_6)\mid\zeta_{\pi_1}] = E[D^*_{\pi_1,\tau}\chi_{\pi_1}\mid X_{\pi_1}] - E[D^*_{\pi_1,\tau}\chi_{\pi_1}] + O(h^{2M}). \]
Using the previous results, it follows that
\[ E\big[ p_{\pi_1,\pi_2} - p_{\pi_1(1)\pi_2(2),\pi_2} - p_{\pi_2(1)\pi_1(2),\pi_2} + p_{\pi_2,\pi_2} \mid \zeta_{\pi_1} \big] = \big\{D^*_{\pi_1} - E[D^*_{\pi_1}\mid\omega_{\pi_1}]\big\}\,I_{\tau,\pi_1}\,\chi_{\pi_1} + o(1). \]
It then follows that the Hájek projection is given by
\[ V^*_{2,n\tau} = \frac{1}{n(n-1)}\sum_{i_1=1}^n\sum_{j_1\ne i_1}\xi_{i_1j_1,\tau} + o(1), \]
with $\xi_{i_1j_1,\tau}$ and $\chi_{i_1j_1}$ as defined in Lemma B.4. A Law of Iterated Expectations yields $E[V^*_{2,n\tau}] = E[\xi_{i_1j_1,\tau}] = 0$.

Step 2. Variance of Hájek Projection

As in the proof of Lemma B.4, the variance of $V^*_{2,n\tau}$ is given by
\[ \operatorname{Var}(V^*_{2,n\tau}) = \Big\{\frac{1}{n(n-1)}\Big\}^2\sum_{i_1=1}^n\sum_{j_1\ne i_1} E[\xi_{i_1j_1,\tau}\xi'_{i_1'j_1',\tau}] = \Big\{\frac{1}{n(n-1)}\Big\}^2\sum_{i_1=1}^n\sum_{j_1\ne i_1}\Lambda^*_{i_1,j_1}, \]
with $\Lambda^*_{i_1,j_1}$ as defined in Lemma B.4. Define $\Upsilon_n = n(n-1)\operatorname{Var}(V^*_{2,n\tau})$.

Step 3. Variance of $S_{2,n\tau}$

Given two different 6-tuples $\sigma_6\{i_1,i_2,j_1,j_2,l_1,l_2\}$ and $\sigma_6'\{i_1',i_2',j_1',j_2',l_1',l_2'\}$, let
\[ \Delta_{c,n} = \operatorname{Cov}\big(s_{2,n}(\sigma_6\{i_1,i_2,j_1,j_2,l_1,l_2\}),\, s_{2,n}(\sigma_6'\{i_1',i_2',j_1',j_2',l_1',l_2'\})\big) \]
denote the covariance between $s_{2,n}(\sigma_6)$ and $s_{2,n}(\sigma_6')$ when $\sigma_6$ and $\sigma_6'$ have $c = 0,1,\dots,5$ common indices. Note, since $E[s_{2,n}(\sigma_6)\mid\Omega_\sigma] = 0$, that $\Delta_{0,n} = \Delta_{1,n} = 0$. Consider
\[ \Delta_{2,n} = E\big[s_{2,n}(\sigma_6)\,s_{2,n}(\sigma_6')'\big] = E\big[s_{i_1j_1}(\sigma_6)\,s_{i_1j_1}(\sigma_6')'\big] + o(1) = E\Big[\big\{E[\tilde D^{*2}_{i_1j_1,\tau}\mid\omega_{i_1j_1}] - E[\tilde D^*_{i_1j_1,\tau}\mid\omega_{i_1j_1}]^2\big\}\,I_{\tau,i_1j_1}\,\tilde W_\sigma\tilde W'_{\sigma'}\Big] + o(1). \]
Therefore, $\operatorname{Var}(S^{\dagger}_{2,n\tau})$ can be expressed as the sum of the $\Delta_{2,n}$ terms, whose inner bracket scaled by $[(n-2)(n-3)]^{-2}$ equals $\Lambda^*_{i_1,j_1}$, plus terms of strictly smaller order coming from 6-tuples with three or more common indices. As a result,
\[ \operatorname{Var}(S^{\dagger}_{2,n\tau}) = \Big\{\frac{1}{n(n-1)}\Big\}^2\sum_{i_1=1}^n\sum_{j_1\ne i_1}\Lambda^*_{i_1,j_1} + o(1), \]
and $\operatorname{Var}(V^*_{2,n\tau}) - \operatorname{Var}(S^{\dagger}_{2,n\tau}) = o_p(1)$. The asymptotic equivalence result follows from arguments similar to those in the proof of Lemma B.4. The proof is complete.

Lemma B.6.
Under the same Assumptions of Theorem 4.2, it follows that the Hájek projection of $S^{\dagger}_{3,n\tau} = S_{3,n\tau} - E[\tilde W_\sigma \tilde D^*_{\sigma,\tau}\mid\Omega_\sigma]$, with
\[ S_{3,n\tau} = \frac{1}{m_n}\sum_{\sigma\in N_{m_n}} \tilde W_\sigma\Big\{ \Big(D^*_{i_1j_1,\tau}\frac{\hat f_{vx,i_1j_1}}{f_{vx,i_1j_1}} - D^*_{i_1j_2,\tau}\frac{\hat f_{vx,i_1j_2}}{f_{vx,i_1j_2}}\Big) - \Big(D^*_{i_2j_1,\tau}\frac{\hat f_{vx,i_2j_1}}{f_{vx,i_2j_1}} - D^*_{i_2j_2,\tau}\frac{\hat f_{vx,i_2j_2}}{f_{vx,i_2j_2}}\Big) \Big\}, \]
into an arbitrary function of $\zeta_{i_1j_1} = (X_{i_1}, X_{j_1}, A_{i_1}, A_{j_1}, v_{i_1j_1}, U_{i_1j_1})$ is given by
\[ V^*_{3,n\tau} = E[S^{\dagger}_{3,n\tau}\mid\zeta_{i_1j_1}] = \frac{1}{n(n-1)}\sum_{i_1=1}^n\sum_{j_1\ne i_1}\xi_{i_1j_1,\tau}, \]
and
\[ n(n-1)\,\Upsilon_n^{-1/2}\,E\big[(S^{\dagger}_{3,n\tau}-V^*_{3,n\tau})(S^{\dagger}_{3,n\tau}-V^*_{3,n\tau})'\big]\,\Upsilon_n^{-1/2} = o(1), \]
where $\Upsilon_n = n(n-1)\operatorname{Var}(V^*_{3,n\tau})$.

Proof. Consider a fixed 6-tuple $\{i_1,i_2,j_1,j_2,k_1,k_2\}$ and define
\[ s_{i_1,j_1}(\sigma_6) = \tilde W_{i_1i_2,j_1j_2}\Big\{ \frac{1}{h^{2L+1}}\frac{D^*_{i_1j_1,\tau}}{f_{vx,i_1j_1}}\,K_{vx,h}(v_{k_1k_2}-v_{i_1j_1},\, X_{k_1}-X_{i_1},\, X_{k_2}-X_{j_1}) - E[D^*_{i_1j_1,\tau}\mid\Omega_{i_1i_2,j_1j_2}] \Big\}, \]
with $s_{i_1,j_2}(\sigma_6)$, $s_{i_2,j_1}(\sigma_6)$, and $s_{i_2,j_2}(\sigma_6)$ defined analogously, and let $s_{3,n}(\sigma_6) = s_{i_1,j_1}(\sigma_6) - s_{i_1,j_2}(\sigma_6) - s_{i_2,j_1}(\sigma_6) + s_{i_2,j_2}(\sigma_6)$. It follows that
\[ S^{\dagger}_{3,n\tau} = \frac{1}{m_{6,n}}\sum_{\sigma_6\in N_{m_{6,n}}} s_{3,n\tau}(\sigma_6). \]

Step 1. Hájek Projection

The proof proceeds exactly as in Lemma B.5, using the same dyad index notation $\pi$, with the kernel $K_{vx,h}$ replacing $K_{x,h}$ and the normalization $h^{-(2L+1)}$ replacing $h^{-2L}$. The change of variables is now $\nu = (\nu_1, \nu_2)$, with $\nu_1 = h^{-1}(v_{\pi_2}-v_{\pi_1})$ and $\nu_2 = h^{-1}(X_{\pi_2}-X_{\pi_1})$, and Jacobian $h^{2L+1}$. Letting $\Xi(v_{\pi_1}, X_{\pi_1}) = E[D^*_{\pi_2,\tau}\chi_{\pi_2}\mid v_{\pi_1}, X_{\pi_1}]$,
\[ \int \Big\{ \frac{D^*_{\pi_1,\tau}}{f_{vx,\pi_1}}\chi_{\pi_1}\big(f_{vx}(v_{\pi_1}+h\nu_1, X_{\pi_1}+h\nu_2) - f_{vx}(v_{\pi_1}, X_{\pi_1})\big) + \big(\Xi(v_{\pi_1}+h\nu_1, X_{\pi_1}+h\nu_2) - \Xi(v_{\pi_1}, X_{\pi_1})\big) \Big\} K_{vx}(\nu)\,d\nu = o(h^{2M}), \]
where the equality follows from Assumptions 4.1.1, 4.1.3, and 4.1.5, which guarantee that $f_{vx}(v_{\pi_1}, X_{\pi_1})$ and $\Xi(v_{\pi_1}, X_{\pi_1})$ are continuous and $M$-times differentiable with respect to all of their arguments, and $K_{vx}$ is a bias-reducing kernel of order $2M$. Observe that $(D^*_{\pi_1,\tau}/f_{vx,\pi_1})\chi_{\pi_1} f_{vx}(v_{\pi_1}, X_{\pi_1}) = 0$ holds for any $(v_{\pi_1}, X_{\pi_1})$ within a $\tau$ distance of the boundary of $S_{vx}$, and having $h/\tau\to0$ ensures that the change of variables is not affected by boundary effects. Using the previous results, it follows that
\[ E\big[ p_{\pi_1,\pi_2} - p_{\pi_1(1)\pi_2(2),\pi_2} - p_{\pi_2(1)\pi_1(2),\pi_2} + p_{\pi_2,\pi_2} \mid \zeta_{\pi_1} \big] = \big\{D^*_{\pi_1} - E[D^*_{\pi_1}\mid\omega_{\pi_1}]\big\}\,I_{\tau,\pi_1}\,\chi_{\pi_1} + o(1), \]
so that the Hájek projection is given by
\[ V^*_{3,n\tau} = \frac{1}{n(n-1)}\sum_{i_1=1}^n\sum_{j_1\ne i_1}\xi_{i_1j_1,\tau} + o(1), \]
with $\xi_{i_1j_1,\tau}$ and $\chi_{i_1j_1}$ as defined in Lemma B.4, and $E[V^*_{3,n\tau}] = E[\xi_{i_1j_1,\tau}] = 0$ by a Law of Iterated Expectations.

Step 2. Variance of Hájek Projection

As in the proof of Lemma B.4,
\[ \operatorname{Var}(V^*_{3,n\tau}) = \Big\{\frac{1}{n(n-1)}\Big\}^2\sum_{i_1=1}^n\sum_{j_1\ne i_1}\Lambda^*_{i_1,j_1}, \qquad \Upsilon_n = n(n-1)\operatorname{Var}(V^*_{3,n\tau}). \]

Step 3. Variance of $S_{3,n\tau}$

Given two different 6-tuples $\sigma_6$ and $\sigma_6'$, let $\Delta_{c,n} = \operatorname{Cov}(s_{3,n}(\sigma_6), s_{3,n}(\sigma_6'))$ when $\sigma_6$ and $\sigma_6'$ have $c = 0,1,\dots,5$ common indices. Since $E[s_{3,n}(\sigma_6)\mid\Omega_\sigma] = 0$, it holds that $\Delta_{0,n} = \Delta_{1,n} = 0$, while
\[ \Delta_{2,n} = E\big[s_{3,n}(\sigma_6)\,s_{3,n}(\sigma_6')'\big] = E\Big[\big\{E[\tilde D^{*2}_{i_1j_1,\tau}\mid\omega_{i_1j_1}] - E[\tilde D^*_{i_1j_1,\tau}\mid\omega_{i_1j_1}]^2\big\}\,I_{\tau,i_1j_1}\,\tilde W_\sigma\tilde W'_{\sigma'}\Big] + o(1). \]
Therefore, $\operatorname{Var}(S^{\dagger}_{3,n\tau})$ can be expressed as the sum of the $\Delta_{2,n}$ terms, whose inner bracket scaled by $[(n-2)(n-3)]^{-2}$ equals $\Lambda^*_{i_1,j_1}$, plus terms of strictly smaller order coming from 6-tuples with three or more common indices. As a result,
\[ \operatorname{Var}(S^{\dagger}_{3,n\tau}) = \Big\{\frac{1}{n(n-1)}\Big\}^2\sum_{i_1=1}^n\sum_{j_1\ne i_1}\Lambda^*_{i_1,j_1} + o(1), \]
and $\operatorname{Var}(V^*_{3,n\tau}) - \operatorname{Var}(S^{\dagger}_{3,n\tau}) = o_p(1)$. The asymptotic equivalence result follows from arguments similar to those in the proof of Lemma B.4. The proof is complete.

Simulations: alternative designs
Table 3: Simulation results for the semiparametric estimator $\hat{\theta}_n$ with kernel estimator $\hat{f}_v(v_{ij})$

              mean     median   std      MSE      Degree
n = 50
log(log(n))   1.6047   1.6164   1.1253   1.2772   0.4237
log(n)/n      1.6444   1.6643   1.5801   2.5176   0.3125
n = 100
log(log(n))   1.5373   1.5011   0.4911   0.2425   0.4214
log(n)/n      1.5415   1.5197   0.7317   0.5371   0.2907

Total number of Monte Carlo simulations = 500. Bandwidth parameter h = 0.

Table 4: Simulation results for the semiparametric estimator $\hat{\theta}_n$ with kernel estimator $\hat{f}_v(v_{ij})$

              mean     median   std      MSE      Degree
n = 50
log(log(n))   1.6296   1.5640   1.1280   1.2893   0.4252
log(n)/n      1.6308   1.6379   1.5430   2.3981   0.3127
n = 100
log(log(n))   1.5944   1.5782   0.4999   0.2588   0.4218
log(n)/n      1.5009   1.5244   0.7059   0.4983   0.2896

Total number of Monte Carlo simulations = 500. Bandwidth parameter h = 0.

Table 5: Simulation results for the semiparametric estimator $\hat{\theta}_n$ with kernel estimator $\hat{f}_v(v_{ij})$

              mean     median   std      MSE      Degree
n = 50
log(log(n))   1.7149   1.7235   1.1336   1.3313   0.4252
log(n)/n      1.5690   1.5839   1.5592   2.4358   0.3116
n = 100
log(log(n))   1.5394   1.5478   0.4973   0.2488   0.4212
log(n)/n      1.5662   1.6033   0.7749   0.6049   0.2905

Total number of Monte Carlo simulations = 500. Bandwidth parameter h = 0.

Table 6: Simulation results for the semiparametric estimator $\hat{\theta}_n$ with kernel estimator $\hat{f}_v(v_{ij})$

              mean     median   std      MSE      Degree
n = 50
log(log(n))   1.6675   1.6378   1.0617   1.1552   0.4250
log(n)/n      1.6594   1.6162   1.5833   2.5321   0.3113
n = 100
log(log(n))   1.5577   1.5512   0.5305   0.2848   0.4208
log(n)/n      1.5653   1.5637   0.7064   0.5033   0.2898

Total number of Monte Carlo simulations = 500. Bandwidth parameter h = 0.2.