A Semiparametric Network Formation Model with Unobserved Linear Heterogeneity
Luis E. Candelaria†

September 1, 2020
Abstract
This paper analyzes a semiparametric model of network formation in the presence of unobserved agent-specific heterogeneity. The objective is to identify and estimate the preference parameters associated with homophily on observed attributes when the distributions of the unobserved factors are not parametrically specified. This paper offers two main contributions to the literature on network formation. First, it establishes a new point identification result for the vector of parameters that relies on the existence of a special regressor. The identification proof is constructive and characterizes a closed form for the parameter of interest. Second, it introduces a simple two-step semiparametric estimator for the vector of parameters with a first-step kernel estimator. The estimator is computationally tractable and can be applied to both dense and sparse networks. Moreover, I show that the estimator is consistent and has a limiting normal distribution as the number of individuals in the network increases. Monte Carlo experiments demonstrate that the estimator performs well in finite samples and in networks with different levels of sparsity.
Keywords:
Network formation, Unobserved heterogeneity, Semiparametrics, Special regressor, Inverse weighting.

∗ First version: November 2016. A previous version of this paper was titled: “A Semiparametric Network Formation Model with Multiple Linear Fixed Effects.”
† Department of Economics, University of Warwick, Coventry, U.K. Email:
[email protected]. I am deeply grateful to Federico Bugni, Shakeeb Khan, Arnaud Maurel, and Matthew Masten for their excellent guidance, constant encouragement, and helpful discussions. I also thank Irene Botosaru, Áureo de Paula, Andreas Dzemski, Cristina Gualdani, Bryan Graham, Bo Honoré, Arthur Lewbel, Thierry Magnac, Chris Muris, James Powell, Adam Rosen, Takuya Ura, Martin Weidner, and seminar participants at Aarhus, Cambridge, Duke, Gothenburg, LSE, Surrey, Syracuse, TSE, UCL, UNC Chapel Hill, Vanderbilt, Warwick, the 2018 ES Winter Meeting in Philadelphia, the 2019 Panel Data Workshop at the University of Amsterdam, and the 2019 Royal Economic Society meeting at the University of Warwick for their comments.

1 Introduction
People tend to connect with individuals with whom they share similar observed attributes. This observation is known as homophily, and it is one of the main objects of study in the literature on social networks (McPherson, Smith-Lovin, and Cook 2001). However, few have investigated the role of homophily when individuals have preferences for unobserved attributes. Proper policy evaluation requires us to distinguish between the contributions of observed and unobserved attributes, since they have different policy implications. For example, students might form friendships based on their similarities on observed socioeconomic attributes as well as on their preferences for high levels of unobserved ability. While socioeconomic attributes can be influenced by a given policy intervention, preferences for ability are harder to change via targeted policies. In this paper, I study the identification and estimation of the preference parameters associated with the observed attributes in a model of network formation that accounts for valuations on unobserved agent-specific factors. The identification and estimation strategies that I develop do not depend on distributional assumptions on the unobserved random components.

In particular, I consider a semiparametric model of network formation with unobserved agent-specific heterogeneity. Specifically, two distinct agents i and j form an undirected link according to the following network formation equation:

    D_{ij} = 1[ g(Z_i, Z_j)'β + A_i + A_j − U_{ij} ≥ 0 ],    (1)

where 1[·] is the indicator function, D_{ij} is a binary outcome variable that takes a value equal to 1 if agents i and j form a link and 0 otherwise, Z_i is a vector of individual-specific and observed attributes, g is a measurable function that is assumed to be known, nonlinear, finite, and symmetric in its arguments, β is a vector of unknown parameters, A_i and A_j are unobserved and agent-specific random variables, and U_{ij} is an unobserved and link-specific disturbance term. A link between two agents is undirected if the connection is reciprocal: two agents are either connected or they are not, which excludes the case where one agent is related to another without the second being related to the first.

Intuitively, equation (1) says that an undirected link between agents i and j is formed if the net benefit of the link is nonnegative. The components in equation (1) can be classified into three different categories. The first class, given by the vector of exogenous attributes g(Z_i, Z_j), captures the agents' preferences for establishing a link based on observed characteristics. For instance, this component is known as homophily on observed attributes when it captures preferences for sharing similar traits. The second class, formed by the agent-specific and unobserved factors A_i and A_j, captures the individual preferences for establishing connections based on agent-specific unobserved traits. Finally, the third class, given by a link-specific disturbance term U_{ij}, captures the exogenous factors that influence the decision to form a specific link. The components in the last two categories are known to the agents but unobserved to the researcher.
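To fix ideas, the linking rule in equation (1) amounts to a single threshold comparison. The following sketch is purely illustrative; the function name and its inputs are hypothetical devices of this exposition, not part of the formal development.

```python
import numpy as np

def link_formed(g_zij, beta, a_i, a_j, u_ij):
    """Linking rule in equation (1): D_ij = 1[g(Z_i, Z_j)'beta + A_i + A_j - U_ij >= 0]."""
    net_benefit = np.dot(g_zij, beta) + a_i + a_j - u_ij
    return int(net_benefit >= 0.0)
```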
This model is related to a literature that studies dyadic link formation with agent-specific fixed effects. For instance, Charbonneau (2017) and Jochmans (2017) study a two-way gravity model, which can be rationalized as a bipartite network with directed links. Their methodologies differ substantially from the one proposed here since they follow a parametric conditional maximum likelihood approach to estimate the vector of coefficients β. In contrast, I study the formation of an undirected network and follow a semiparametric approach. This paper builds on the seminal work by Graham (2017), which aims to detect preferences for homophily while allowing for unobserved degree heterogeneity; unlike in that parametric framework, here the distribution of U_{ij} is not parametrically specified.

Since the initial draft of this paper was circulated, recent studies have appeared analyzing semiparametric or nonparametric variations of a dyadic network formation model with unobserved heterogeneity; these include papers by Toth (2017), Gao (2020), and Zeleneev (2020).

Similarly to this paper, Toth (2017) studies a dyadic network formation model in which the distribution of U_{ij} is unknown. However, the author uses a different identification strategy. In particular, his strategy relies on assuming that each component in the vector of observed attributes Z_i is continuously distributed, which is then used to propose an identification strategy similar to the maximum rank correlation estimator of Han (1987). An estimator for β is then defined as the maximizer of a U-process of order 4, with a nonparametric first-step estimator. Toth (2017) also proposes a variation of his estimation strategy that requires maximizing a U-process of order 2, with a nonparametric first-step estimator; this modification improves the computational tractability of his method.

Gao (2020) studies the identification of a dyadic network model with a nonparametric functional form for the preferences on homophily and an unknown cumulative distribution for U_{ij}. He identifies the nonparametric homophily function by introducing a novel identification strategy that imposes an interquartile-range normalization and a location normalization of one of the quantiles as stochastic restrictions on the distribution of U_{ij}. Gao (2020) also provides several interesting extensions on the functional form of the unobserved heterogeneity; for reference, see Gao (2020, p. 5) and Zeleneev (2020, p. 6). Those extensions are beyond the scope of this paper and left for future research.

Finally, Zeleneev (2020) studies the identification and estimation of a dyadic network formation model with a nonparametric structure of the unobserved heterogeneity. This framework allows him to account for latent homophily on the unobserved attributes. The author's identification analysis is based on introducing a pseudo-distance between a pair of agents i and j, which allows him to recover groups of agents with the same levels of agent-specific unobserved heterogeneity. After conditioning on the matched agents with similar unobserved heterogeneity, the identification of the vector of coefficients proceeds from a pairwise difference strategy. The estimation procedure follows the same logic as the identification strategy.

Contrary to previous studies, the identification strategy proposed here is based on the existence of a special regressor (see, e.g., Lewbel (1998) and Lewbel (2012) for a survey). This paper, to the best of my knowledge, represents the first effort in the econometric literature to introduce a special regressor to analyze a network formation model. The vector of parameters β is point identified after introducing a transformation that consists in weighting the linking decisions D_{ij}
by the inverse of the conditional density of the special regressor given the observed attributes. This transformation utilizes features of the distributions of observables and does not represent a stochastic restriction on the distribution of U_{ij}; therefore, it is not nested in any existing work. As a restriction on the distribution of U_{ij}, I normalize to zero the conditional mean of the link-specific disturbance terms given the observed attributes. In Section 3.1, I provide a detailed discussion of the sufficient conditions needed to point identify β via the existence of a special regressor. In further research, I will explore the informational content of the special regressor in a network formation model given a quantile or median restriction.

The second point identification result, introduced in Section 3.2, is based on a sufficient statistic argument at the tails of the distribution of a covariate with full support. The identification strategy shows that within- and across-individual variation in the linking decisions can be used as a sufficient statistic to difference out the unobserved agent-specific factors on sets where the covariate with full support exhibits sufficient variation. The existence of only one continuous attribute with large support in Z_i is sufficient to show this result. The latter assumption is satisfied by many real network datasets, and hence it is empirically relevant; for example, in the National Longitudinal Study of Adolescent to Adult Health (Add Health) dataset, household income is a continuous variable that can be demeaned and standardized to satisfy the support condition. The resulting semiparametric estimator is solved in one step, and it is defined as the maximizer of a U-process of order 4 with a trimming sequence.

In Section 4, I introduce a two-step semiparametric estimator for β based on the identification result that requires the existence of a special regressor. The estimator has an analytic form similar to least squares, and it uses a first-step kernel estimator to weight the linking decisions D_{ij} by the inverse of the conditional density of the special regressor. In a recent paper, Graham, Niu, and Powell (2019) have studied the nonparametric estimation of density functions with dyadic data. I follow their findings to perform the first-step kernel estimation. In Theorems 4.1 and 4.2, I show that the semiparametric estimator for β is consistent and has a limiting normal distribution.

Finally, the network formation model that I analyze is related to the literature on empirical games. Specifically, the model in equation (1) can be derived as a stable outcome in a static game. Papers that study the strategic formation of a network as a static game include Goldsmith-Pinkham and Imbens (2013); Leung (2015a,b); Menzel (2015); Miyauchi (2016); Boucher and Mourifié (2017); de Paula, Richards-Shubik, and Tamer (2017); Mele (2017); Candelaria and Ura (2018); Sheng (2018); Gualdani (2020), and Ridder and Sheng (2020). These authors study network formation models that account for network externalities. Network externalities generate interdependencies in the linking decisions that depend on the structure of the network. The identification and estimation methods used in these papers differ substantially from the ones proposed here, as they restrict the presence and distribution of the unobserved agent-specific heterogeneity.

The rest of the paper is organized as follows. Section 2 introduces the network formation model. Section 3 presents the identification results. Section 4 introduces the two-step semiparametric estimator and derives its large sample properties. Section 5 reports Monte Carlo evidence, and Section 6 concludes.
2 The Network Formation Model

A network is an ordered pair (N_n, D_n) formed by a set of n agents denoted by N_n = {1, ..., n} and an n × n adjacency matrix D_n, which represents the links between the agents in N_n. Let D_{ij} denote the (i, j)th entry of the matrix D_n. I assume the network is undirected and unweighted. A network is undirected if the adjacency matrix is symmetric, i.e., D_{ij} = D_{ji}. A network is unweighted if any (i, j)th entry of the adjacency matrix takes one of two values, where the values are normalized to be 0 and 1. In other words, D_{ij} ∈ {0, 1}, where D_{ij} = 1 if agents i and j share a link and D_{ij} = 0 otherwise. Furthermore, I normalize the value of self-ties to zero, that is, D_{ii} = 0 for any agent i.

Example 1 (Friendship network). A network of best friends is an example of an undirected and unweighted network. Two agents are considered to be best friends if and only if both agents report each other as friends. In this case, D_{ij} = D_{ji} = 1. Also, this example rules out the scenario of an agent reporting herself as her best friend.

Each agent i ∈ N_n is endowed with a (K+1)-dimensional vector of observed attributes Z_i and an unobserved scalar component A_i. Common examples of observed attributes that could explain the formation of a friendship network among high school students are age, gender, ethnicity, religion, and the students' interest in extracurricular activities. The component A_i captures individual i's preferences for establishing a link based on unobserved and agent-specific attributes. The unobserved component U_{ij} captures exogenous stochastic factors that influence the pair-specific decision to establish a link between agents i and j.

Given the vectors of observed attributes Z_i and Z_j for i ≠ j, let Z̄_{ij} = g(Z_i, Z_j) be a (K+1)-dimensional vector of pair-specific attributes. The function g is assumed to be a known measurable function that is nonlinear and finite. Given the undirected nature of the network, g is assumed to be symmetric in its arguments. The specification of g varies according to the empirical application and is chosen by the researcher to capture homophily or heterophily effects. For example, suppose that Z_i is a scalar random variable that represents agent i's gender; then Z̄_{ij} could be defined as 1[Z_i = Z_j] to capture the preferences for homophily. Under this specification, Z̄_{ij} equals 1 if agents i and j share the same gender and 0 otherwise.

The intuition behind the requirement that g is a nonlinear function is similar to the logic for the identification of the vector of coefficients in a linear panel data model with fixed effects. A specific feature of those models is that only the coefficients associated with time-varying variables are identified. The identification strategies proposed in Section 3 use the pairwise variation in Z̄_{ij} to identify β. The assumption that g is nonlinear rules out the case that the pairwise variation is equal to the vector of zeroes, in which case β would not be identified.
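As an illustration of how the pair-specific attributes might be built in practice, the sketch below constructs a hypothetical Z̄_{ij} with the gender-homophily indicator from the example above. The second entry and the attribute names are placeholders of my own, not part of the model.

```python
import numpy as np

def pair_attributes(z_i, z_j):
    """A hypothetical g(Z_i, Z_j): symmetric in (i, j), nonlinear, and finite."""
    same_gender = float(z_i["gender"] == z_j["gender"])  # 1[Z_i = Z_j] from the text
    age_gap = abs(z_i["age"] - z_j["age"])               # a placeholder heterophily term
    return np.array([same_gender, age_gap])
```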
The network formation model described in equation (1) can be obtained as a stable outcome of a random utility model with transferable utilities. In particular, let ū_{ij}(Z̄_{ij}, A_j, U_{ij}) denote individual i's latent valuation of establishing a link with j given their shared observed attributes Z̄_{ij}, agent j's unobserved type A_j, and their common unobserved factor U_{ij}. It follows that the joint net benefit of adding the link {i, j} to the network D_n is

    ū_{ij}(Z̄_{ij}, A_j, U_{ij}) + ū_{ji}(Z̄_{ij}, A_i, U_{ij}) = Z̄'_{ij} β + A_i + A_j − U_{ij}.    (2)

Notice that the joint net benefit accounts for the preferences based on the observed attributes Z̄'_{ij} β, as well as preferences for association based on agent-specific factors A_i + A_j, and for exogenous factors affecting the decision to establish a link U_{ij}.

Equation (2) implies that two distinct individuals i and j in N_n only have valuations for their own observed attributes and agent-specific factors. To clarify, in the link formation decision for dyad {i, j}, the individuals take into account neither observed and unobserved attributes of other individuals in the network, nor general features of the network other than the dyad {i, j}. These effects are known as network externalities (see, e.g., Chandrasekhar and Jackson 2014; Leung 2015b; Mele 2017; Menzel 2015; Badev 2018; Sheng 2018; Ridder and Sheng 2020). Some examples of these effects are preferences for reciprocity, transitive triads, or high network degree. I leave this extension for future research.

Next, I introduce the definition of stability.

Definition 1 (Stability). A network D_n is stable with transfers if for any distinct i, j ∈ N_n:
1. for all D_{ij} = 1, ū_{ij}(Z̄_{ij}, A_j, U_{ij}) + ū_{ji}(Z̄_{ij}, A_i, U_{ij}) ≥ 0;
2. for all D_{ij} = 0, ū_{ij}(Z̄_{ij}, A_j, U_{ij}) + ū_{ji}(Z̄_{ij}, A_i, U_{ij}) < 0.

Notice that this definition adapts the pairwise stability notion in Jackson and Wolinsky (1996) to allow for transferable utilities. Intuitively, this condition states that a link within dyad {i, j} is established if the net benefit of that connection is nonnegative. For a generalization to nontransferable utilities, see Gao, Li, and Xu (2020).

The following notation will be maintained in the rest of the paper. I will assume that the vector of observed covariates Z_i = (v_i, X'_i)' is comprised of a scalar random variable v_i ∈ R and a K-dimensional random vector X_i ∈ R^K. Similarly, let

    Z̄_{ij} = ( g(v_i, v_j), g(X_i, X_j)' )' = (v_{ij}, W'_{ij})'

denote the observed covariates at the dyad level, and let β = (1, θ')'.

I will denote the profiles of observed attributes for all the agents in the network as Z^n = {Z_i : i ∈ N_n}, v^n = {v_i : i ∈ N_n}, and X^n = {X_i : i ∈ N_n}. Similarly, let A^n = {A_i : i ∈ N_n} denote the profile of unobserved attributes. Moreover, let Z_{−ij} = {Z_k : k ≠ i, j} and A_{−ij} = {A_k : k ≠ i, j} denote the collections of observed and unobserved attributes for all agents in the network other than agents i and j.

The identification and estimation strategies introduced in Sections 3 and 4 use the information contained in subnetworks formed by groups of four distinct agents {i_1, i_2, j_1, j_2}, also known as tetrads. The following notation is used to describe attributes at the tetrad level. Given a network of size n, there is a total of

    m_n = 4!·(n choose 4) = n(n−1)(n−2)(n−3)

ordered tetrads with distinct indices i_1, i_2, j_1, j_2 ∈ N_n. Let σ be a function that maps these tetrads to the index set N_{m_n} = {1, ..., m_n}.
Thus, each tetrad with distinct indices {i_1, i_2, j_1, j_2} corresponds to a unique σ({i_1, i_2, j_1, j_2}) ∈ N_{m_n}. Given any σ({i_1, i_2, j_1, j_2}) ∈ N_{m_n}, let v_σ = {v_{i_1}, v_{j_1}, v_{i_2}, v_{j_2}}, X_σ = {X_{i_1}, X_{j_1}, X_{i_2}, X_{j_2}}, and A_σ = {A_{i_1}, A_{j_1}, A_{i_2}, A_{j_2}}.

Moreover, define the pairwise variations across observed attributes and linking decisions as follows:

    ṽ_σ = ṽ_{i_1 i_2, j_1 j_2} = (v_{i_1 j_1} − v_{i_1 j_2}) − (v_{i_2 j_1} − v_{i_2 j_2})
    W̃_σ = W̃_{i_1 i_2, j_1 j_2} = (W_{i_1 j_1} − W_{i_1 j_2}) − (W_{i_2 j_1} − W_{i_2 j_2})
    D̃_σ = D̃_{i_1 i_2, j_1 j_2} = (D_{i_1 j_1} − D_{i_1 j_2}) − (D_{i_2 j_1} − D_{i_2 j_2}).

Finally, given any fixed tetrad σ({i_1, i_2, j_1, j_2}) ∈ N_{m_n}, let ω_{l_1 l_2} = (v_{l_1 l_2}, X_{l_1}, X_{l_2}, A_{l_1}, A_{l_2}) denote the profile of attributes at the dyad level, and let p_n(ω_{l_1 l_2}) = P[D_{l_1 l_2} = 1 | ω_{l_1 l_2}] denote the probability that a link is created for any dyad (l_1, l_2) ∈ {(i_1, j_1), (i_1, j_2), (i_2, j_1), (i_2, j_2)}.
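These double differences are straightforward to compute from dyad-level arrays. The sketch below is a minimal illustration, assuming the links D, the regressor v, and each column of W are stored as n × n arrays; the function name is my own.

```python
def tetrad_difference(M, i1, i2, j1, j2):
    """Pairwise variation at the tetrad level, e.g.
    D~_sigma = (M[i1,j1] - M[i1,j2]) - (M[i2,j1] - M[i2,j2]),
    where M is any n x n dyad-level array (links D, regressor v, or a column of W)."""
    return (M[i1, j1] - M[i1, j2]) - (M[i2, j1] - M[i2, j2])

# Enumerating all m_n = n(n-1)(n-2)(n-3) ordered tetrads of distinct agents:
# from itertools import permutations
# for i1, i2, j1, j2 in permutations(range(n), 4):
#     d_tilde = tetrad_difference(D, i1, i2, j1, j2)
```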
3 Identification

This section introduces the main identification results for the semiparametric network formation model with unobserved agent-specific factors. In particular, Section 3.1 presents the main point identification result when a special regressor is available. Section 3.2 introduces a second point identification result when a covariate with full support is available.

3.1 Identification with a Special Regressor
Using the notation introduced in Section 2, the rest of the paper considers the following representation for the network formation model specified by equation (1). In particular, agents i and j in N_n with i ≠ j will form an undirected link according to the following equation:

    D_{ij} = 1[ v_{ij} + W'_{ij} θ + A_i + A_j − U_{ij} ≥ 0 ],    (3)

where the coefficient associated with v_{ij} has been normalized to 1 and θ is a K-dimensional vector of coefficients. Given that the network of interest is undirected, U_{ij} is assumed to be symmetric, i.e., U_{ij} = U_{ji}. The vector θ represents the main parameter of interest.

Assumptions 3.1.1-3.1.5 specify the underlying structure for the network formation model in equation (3), which will be used to show the main identification result for θ.

Assumption 3.1.1.
The random sequence {Z_i, A_i}_{i=1}^n is independent and identically distributed.

Assumption 3.1.1 describes the sampling process, and it is widely used to describe network data (see, e.g., Graham 2017; Jochmans 2018, and Auerbach 2019).
Assumption 3.1.2.
For any finite n, the following holds.
1. The sequence {U_{ij} | Z^n, A^n}_{i ≠ j} is conditionally independent and identically distributed across dyads {i, j}. Moreover, U_{ij} = U_{ji} for any dyad {i, j}.
2. For any dyad {i, j}, U_{ij} | Z^n, A^n =_d U_{ij} | Z_i, Z_j, A_i, A_j.

Assumption 3.1.2.1 states that conditional on (Z^n, A^n) the link-specific disturbance terms {U_{ij}}_{i ≠ j} are independent across dyads {i, j} and drawn from the same distribution. Furthermore, Assumption 3.1.2.2 requires that conditional on (Z_i, Z_j, A_i, A_j), the link-specific disturbance term U_{ij} is independent of any observed or unobserved feature in (Z_{−ij}, A_{−ij}). Assumption 3.1.2 ensures that each of the linking decisions in the network is conditionally independent. In other words, it rules out interdependence across linking decisions due to externalities across the network.

Notice that Assumption 3.1.2 allows for heteroskedasticity of a general form in the distribution of U_{ij}. Moreover, it allows for flexible dependence between the unobserved agent-specific factors and the observed attributes. In other words, Assumption 3.1.2 does not restrict the joint distribution of (Z^n, A^n). Assumption 3.1.2 is commonly used in semiparametric nonlinear panel data models, for example in Arellano and Honoré (2001). In network formation models, full stochastic independence U_{ij} ⊥ (Z^n, A^n) is usually imposed (see, e.g., Leung 2015b; Menzel 2015; Graham 2017; Toth 2017, and Gao 2020). Arbitrary heteroskedasticity is also considered in Zeleneev (2020).

Assumption 3.1.3.
Given n and any distinct i, j ∈ N_n, let e_{ij} = A_i + A_j − U_{ij} and suppose that e_{ij} is conditionally independent of v_{ij} given (X_i, X_j). Let F_{e|x}(e_{ij} | X_i, X_j) denote the conditional distribution of e_{ij} given (X_i, X_j), with support given by S_e(X_i, X_j) and finite first moment.

Assumption 3.1.3 represents an exclusion restriction, and it entails that the regressor v_{ij} is conditionally independent of e_{ij} given the observed attributes (X_i, X_j). In other words, v_{ij} is a special regressor in the sense of Lewbel (1998), Lewbel (2000), and Lewbel (2012).

Assumption 3.1.4.
Given n and any distinct i, j ∈ N_n, the conditional distribution of v_{ij} given (X_i, X_j) is absolutely continuous with respect to the Lebesgue measure, with conditional density f_{v|x}(v_{ij} | X_i, X_j) and support given by S_v(X_i, X_j) = [s_v, s̄_v] for some constants s_v and s̄_v with −∞ ≤ s_v < 0 < s̄_v ≤ ∞. For any (X_i, X_j), the support of −W'_{ij} θ − e_{ij} is a subset of the interval [s_v, s̄_v].

Assumption 3.1.4 is a support condition, and it ensures that v_{ij} | X_i, X_j has a positive density function f_{v|x}(v_{ij} | X_i, X_j) on S_v(X_i, X_j). Furthermore, it requires that for any (X_i, X_j) the support of (−W'_{ij} θ − e_{ij}) is contained in S_v(X_i, X_j). Notice that Assumption 3.1.4 does not restrict v_{ij} | X_i, X_j to having full support on the real line. Hence the point identification result introduced in this section is general enough to include both cases: (i) the full support case, and (ii) the existence of a continuous covariate with bounded support that contains supp(−W'_{ij} θ − e_{ij} | X_i, X_j). Moreover, observe that Assumption 3.1.4 leaves the distribution of the observed attributes (X_i, X_j) unrestricted. Hence, this identification strategy also allows for discrete covariates in W_{ij}.

Assumption 3.1.5.
Given n and any tetrad σ ∈ N_{m_n}, E[U_{ij} | X_i, X_j] = 0, and Γ = E[W̃_σ W̃'_σ] is a finite and nonsingular matrix.

The first part of Assumption 3.1.5 imposes that U_{ij} | X_i, X_j has conditional mean zero. The conditional mean restriction needs to hold after conditioning on the observed attributes (X_i, X_j), and not just the dyad-specific covariates W_{ij}. The intuition behind this insight follows from Assumption 3.1.1, which allows for unrestricted dependence between X_i and A_i. In particular, the proof of Theorem 3.1 requires that any stochastic variation left in A_i + A_j after conditioning on (X_i, X_j) is independent of W_{kl} for any k, l ∈ N_n, including, for example, W_{il}. This property no longer holds if the conditioning variable used is W_{ij}, since it is only a feature of (X_i, X_j). The second part of Assumption 3.1.5 is the standard full rank condition on the pairwise variation of the observed attributes W̃_σ, and it ensures that θ is point identified.

The network formation model specified by equation (3) and Assumptions 3.1.1-3.1.5 represents, to the best of my knowledge, the first generalization of the special regressor method to the analysis of network data. Following Lewbel (1998, 2000), Honoré and Lewbel (2002), and Chen, Khan, and Tang (2019), let D*_{ij} be defined as

    D*_{ij} = ( D_{ij} − 1[v_{ij} > 0] ) / f_{v|x}(v_{ij} | X_i, X_j)    (4)

for any distinct i, j ∈ N_n.

The following theorem and appended corollary formalize the first point identification result for θ.

Theorem 3.1.
If Assumptions 3.1.3-3.1.5 hold in equation (3), then for any distinct i and j in N_n,

    E[D*_{ij} | X_i, X_j] = W'_{ij} θ + E[A_i + A_j | X_i, X_j].
See Appendix A.
Corollary 3.1.
If Assumptions 3.1.1-3.1.5 hold in equation (3), then for any tetrad σ ∈ N_{m_n},

    E[W̃_σ D̃*_σ] = E[W̃_σ W̃'_σ] θ,    (5)

and hence,

    θ = Γ^{−1} × Ψ    (6)

with Ψ = E[W̃_σ D̃*_σ].

Proof.
See Appendix A.

Theorem 3.1 and Corollary 3.1 demonstrate that θ is point identified using the information contained in the joint distribution of {D̃*_σ, W̃_σ} at the tetrad level, with the analytic expression given by equation (6). This result shows that θ is identified as an average of the linking decisions D̃_σ, which are weighted by the inverse of the conditional density of the special regressor given the observed attributes, f_{v|x}(v_{ij} | X_i, X_j). The result in Corollary 3.1 will be used as the foundation of the semiparametric estimator introduced in Section 4.
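To make the closed form in equation (6) concrete, the following sketch computes the sample analogue of Γ^{−1}Ψ when the conditional density f_{v|x} is known, as in the first Monte Carlo design of Section 5. It is illustrative only; the function name and its random subsampling of ordered tetrads (rather than the full average over all m_n tetrads) are my own devices for tractability.

```python
import numpy as np

def theta_closed_form(D, V, W, f_v_given_x, rng, n_tetrads=20000):
    """Sample analogue of theta = Gamma^{-1} Psi in equation (6), with known f_{v|x}.
    D, V, f_v_given_x are n x n arrays; W has shape (n, n, K)."""
    n, _, K = W.shape
    D_star = (D - (V > 0).astype(float)) / f_v_given_x  # equation (4), elementwise
    Gamma, Psi = np.zeros((K, K)), np.zeros(K)
    for _ in range(n_tetrads):
        i1, i2, j1, j2 = rng.choice(n, size=4, replace=False)
        w_t = (W[i1, j1] - W[i1, j2]) - (W[i2, j1] - W[i2, j2])
        d_t = (D_star[i1, j1] - D_star[i1, j2]) - (D_star[i2, j1] - D_star[i2, j2])
        Gamma += np.outer(w_t, w_t)
        Psi += w_t * d_t
    return np.linalg.solve(Gamma, Psi)

# Example: theta_hat = theta_closed_form(D, V, W, f, np.random.default_rng(0))
```

Dividing both sums by the number of sampled tetrads leaves the solution unchanged, so the normalization by m_n is omitted.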
Given the results in Theorem 3.1 and Corollary 3.1, the average contribution of the unobserved agent-specific factors to the formation of a link is also identified.

Corollary 3.2. If Assumptions 3.1.1-3.1.5 hold in equation (3), then for any i and j in N_n,

    E[A_i + A_j] = E[D*_{ij}] − E[W_{ij}]' θ.    (7)

3.2 Identification with a Covariate with Large Support

In this section, I provide a second point identification result for the vector of coefficients θ. This result does not require the regressor v_{ij} to be conditionally independent of the unobserved terms A_i + A_j − U_{ij}. Nonetheless, it imposes a large support condition on v_{ij} and bounds the contribution that the unobserved heterogeneity A_i + A_j has on the formation of links.

The following notation will be used to state and prove this result. For any fixed tetrad σ({i, j, k, l}) ∈ N_{m_n}, denote the profile of observed attributes at the tetrad level as v̄_σ = (v_{ik}, v_{il}, v_{jk}, v_{jl}) and Z̄_σ = (v̄_σ, X_σ). Moreover, for any σ({i, j, k, l}) ∈ N_{m_n} and agent r with r ∈ {i, j}, denote the within-individual r variation of the observed attributes as Δ_σ v_r = v_{rk} − v_{rl} and Δ_σ W_r = W_{rk} − W_{rl}, and the within-individual variation of the unobserved attributes as Δ_σ A = A_k − A_l.

The following assumptions are sufficient to show the second point identification result.

Assumption 3.2.1.
For any finite n and dyad {i, j}, Assumption 3.1.2 holds. Furthermore, the link-specific unobserved term U_{ij} | Z_i, Z_j, A_i, A_j has a positive density over the real line.

Assumption 3.2.1 ensures that the disturbance term U_{ij} has a large support for any value of (Z_i, Z_j, A_i, A_j). This assumption is used for simplicity to ensure that the conditional probability of forming a link is well defined for any value of (Z_i, Z_j, A_i, A_j). Notice that any model where the disturbance term U_{ij} is logistically or normally distributed will satisfy this condition.

Assumption 3.2.2.
The parameter space Θ is compact.

Assumption 3.2.2 is a standard assumption in the semiparametrics literature (see, e.g., Manski 1975, 1985; Newey and McFadden 1994, and Powell 1994). This assumption is used to control the contribution that the variation in W_{ij} has on the formation of links.

Assumption 3.2.3.
For any finite n, the following holds for any σ({i, j, k, l}) ∈ N_{m_n}.
1. For all X_σ, v̄_σ is continuously distributed with a positive density over R^4.
2. For all X_σ and r ∈ {i, j}, Δ_σ v_r is continuously distributed with a positive density over the real line, and the support supp(−Δ_σ W'_r θ_0 − Δ_σ A | X_σ) = [s_ε, s̄_ε] is known, with −∞ < s_ε < 0 < s̄_ε < ∞.

Assumption 3.2.3 ensures that the regressor v_{ij} has a large support. Moreover, it requires that the variation in v_{ij} dominates the contribution that the remaining factors have in creating a network link. Notice that this condition does not impose that v_{ij} is conditionally independent of A_i + A_j given X_σ. Intuitively, Assumption 3.2.3 guarantees that the information at the tails of the distribution of Δ_σ v_r can disentangle the contributions of the preferences for homophily and the unobserved heterogeneity on the creation of network links.

Assumption 3.2.4.
For any finite n and tetrad σ({i, j, k, l}) ∈ N_{m_n}, P[W̃'_σ γ ≠ 0] > 0 for all non-zero vectors γ ∈ R^K.

Assumption 3.2.4 is a full rank condition.

For any fixed σ({i, j, k, l}) ∈ N_{m_n} and given X_σ, let V(X_σ) denote the set of values for which the variations in Δ_σ v_i and Δ_σ v_j dominate the contribution of the remaining factors. That is to say:

    V(X_σ) = { v̄_σ : Δ_σ v_i ≤ s_ε and Δ_σ v_j ≥ s̄_ε, or Δ_σ v_i ≥ s̄_ε and Δ_σ v_j ≤ s_ε }.    (8)

Notice that this set can be characterized using Assumption 3.2.3. Also, define ξ(θ) as

    ξ(θ) = { z̄_σ : v̄_σ ∈ V(X_σ) and sign{ E_θ[D̃_σ | X_σ, v̄_σ ∈ V(X_σ), D̃_σ ∈ {−2, 2}] } ≠ sign{ E_{θ_0}[D̃_σ | X_σ, v̄_σ ∈ V(X_σ), D̃_σ ∈ {−2, 2}] } },

which characterizes the set of states for which the sign of the conditional expectation of the pairwise variations of the links D̃_σ implied by θ differs from the sign of the conditional expectation generated under θ_0. In other words, the set ξ(θ) summarizes the values of the observed attributes for which θ_0 can be distinguished from θ using the information contained in the conditional expectation of D̃_σ. Hence, θ_0 is said to be identified relative to θ ≠ θ_0 if P[Z̄_σ ∈ ξ(θ)] > 0.

The next theorem and appended corollary formalize the second point identification result.
Theorem 3.2.
Suppose Assumptions 3.1.1, 3.2.1, 3.2.2, and 3.2.3 hold in equation (3). Let

    Q_θ = { z̄_σ : v̄_σ ∈ V(X_σ) and W̃'_σ θ ≤ −ṽ_σ < W̃'_σ θ_0 or W̃'_σ θ_0 ≤ −ṽ_σ < W̃'_σ θ }.

If P[Z̄_σ ∈ Q_θ] > 0, then θ_0 is point identified relative to θ.

Proof. See Appendix A.
Corollary 3.3.
Suppose Assumptions 3.1.1 and 3.2.1-3.2.4 hold in equation (3). Then θ_0 is point identified.

Proof. See Appendix A.

The results in Theorem 3.2 and Corollary 3.3 can be used to define an estimator for θ as the maximizer of a U-process of order 4 with a trimming sequence γ_n such that γ_n → ∞ as n → ∞. In particular, the estimator of θ can be defined as

    θ̂ = argmax_{θ ∈ Θ} Ĥ_n(θ, γ_n),

where

    Ĥ_n(θ, γ_n) = [4!·(n choose 4)]^{−1} Σ_{i_1=1}^n Σ_{j_1 ≠ i_1} Σ_{i_2 ≠ i_1, j_1} Σ_{j_2 ≠ i_1, j_1, i_2} H( Z̄_{σ({i_1,j_1;i_2,j_2})}, D̃_{σ({i_1,j_1;i_2,j_2})}; θ, γ_n )

    H( Z̄_σ, D̃_σ; θ, γ_n ) = [ sign{ ṽ_σ + W̃'_σ θ } × D̃_σ ] × 1[ |D̃_σ| = 2 ] × 1[ |Δ_σ v_{i_1}|, |Δ_σ v_{i_2}| ≥ γ_n ].

Although point identification of θ_0 is achieved assuming that the bounds [s_ε, s̄_ε] are known, notice that they are not needed to define the estimator θ̂. In other words, it is sufficient to assume that Δ_σ v_i has a large support which contains supp(−Δ_σ W'_i θ_0 − Δ_σ A | X_σ) to characterize the estimator for θ.

Naturally, the asymptotic properties of θ̂ will depend on the frequency of subgraph configurations that satisfy the restriction 1[|D̃_σ| = 2] in the sample, and on the rate at which γ_n → ∞ as n → ∞. The rest of this paper prioritizes the study of the semiparametric estimator introduced in Section 4, since it is computationally more tractable than θ̂.
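A minimal sketch of this objective is given below, assuming that tetrads are supplied as index tuples and that the trimming threshold gamma_n is provided by the user. Since the objective is a step function of θ, in practice it would be maximized by grid search over the compact set Θ.

```python
import numpy as np

def H_hat(theta, gamma_n, D, V, W, tetrads):
    """Maximum-score-type objective H^_n(theta, gamma_n): averages
    sign{v~_sigma + W~_sigma' theta} x D~_sigma over tetrads with |D~_sigma| = 2,
    keeping only tetrads whose within-individual variation in v exceeds gamma_n."""
    total = 0.0
    for i1, j1, i2, j2 in tetrads:
        d_t = (D[i1, j1] - D[i1, j2]) - (D[i2, j1] - D[i2, j2])
        dv_1 = V[i1, j1] - V[i1, j2]   # within-individual variation for i1
        dv_2 = V[i2, j1] - V[i2, j2]   # within-individual variation for i2
        if abs(d_t) == 2 and min(abs(dv_1), abs(dv_2)) >= gamma_n:
            w_t = (W[i1, j1] - W[i1, j2]) - (W[i2, j1] - W[i2, j2])
            total += np.sign(dv_1 - dv_2 + w_t @ theta) * d_t
    return total / max(len(tetrads), 1)
```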
4 Estimation

In this section, I introduce a semiparametric estimator for θ based on the point identification result derived in Section 3.1. The estimator for θ, denoted by θ̂_n, is a two-step estimator with a nonparametric estimate of the conditional density of v_{ij} given {X_i, X_j}, i.e., f_{v|x}(v_{ij} | X_i, X_j). Section 4.1 provides sufficient conditions to study the large sample properties of θ̂_n. Theorem 4.1 proves that θ̂_n is a consistent estimator of θ. Theorem 4.2 shows that the limiting distribution of θ̂_n is normal.

4.1 Consistency

The estimator for θ is defined as the sample analog of equation (6) and is obtained by averaging over the linking decisions D̃_σ for all distinct tetrads σ ∈ N_{m_n}. Given that the inverse of f_{v|x}(v_{ij} | X_i, X_j) is used as a weight in the definition of Ψ, and hence of θ, I introduce a trimming sequence intended to avoid boundary effects arising from the first-step estimation of f_{v|x}(v_{ij} | X_i, X_j).

Recall that D̃_σ is defined as the pairwise variation across the linking decisions for a given tetrad σ({i_1, i_2, j_1, j_2}) ∈ N_{m_n}. I extend that notation to define as follows the pairwise variation of the trimmed network links given a trimming parameter τ:

    D̃*_{σ,τ} = (D*_{i_1 j_1,τ} − D*_{i_1 j_2,τ}) − (D*_{i_2 j_1,τ} − D*_{i_2 j_2,τ})
    D̂*_{σ,τ} = (D̂*_{i_1 j_1,τ} − D̂*_{i_1 j_2,τ}) − (D̂*_{i_2 j_1,τ} − D̂*_{i_2 j_2,τ}),

where for any distinct i_1 and j_1 in N_n

    D*_{i_1 j_1,τ} = ( (D_{i_1 j_1} − 1[v_{i_1 j_1} > 0]) / f_{v|x}(v_{i_1 j_1} | X_{i_1}, X_{j_1}) ) I_τ(v_{i_1 j_1}, X_{i_1}, X_{j_1})
    D̂*_{i_1 j_1,τ} = ( (D_{i_1 j_1} − 1[v_{i_1 j_1} > 0]) / f̂_{v|x}(v_{i_1 j_1} | X_{i_1}, X_{j_1}) ) I_τ(v_{i_1 j_1}, X_{i_1}, X_{j_1}).

In the equations above, f_{v|x}(v_{i_1 j_1} | X_{i_1}, X_{j_1}) denotes the true conditional density function of v_{i_1 j_1} given (X_{i_1}, X_{j_1}), and f̂_{v|x}(v_{i_1 j_1} | X_{i_1}, X_{j_1}) denotes a kernel estimator of this conditional density. Thus, D̃*_{σ,τ} denotes the pairwise variation of the trimmed network links assuming that the conditional distribution of the special regressor given the observed attributes is known. Conversely, D̂*_{σ,τ} denotes the pairwise variation of the trimmed network links when f_{v|x}(v_{i_1 j_1} | X_{i_1}, X_{j_1}) is replaced by a first-stage kernel estimator f̂_{v|x}(v_{i_1 j_1} | X_{i_1}, X_{j_1}).

The trimming sequence I_τ(v_{i_1 j_1}, X_{i_1}, X_{j_1}) is a function of the observed attributes at the dyad level, and it converges to 1 as the trimming parameter τ → 0 and n → ∞. Assumptions 4.1.2 and 4.1.5 below describe the conditions imposed on the trimming parameter τ (see Honoré and Lewbel 2002 and Khan and Tamer 2010).

To ease the exposition, I introduce the following notation for any distinct i_1, j_1 ∈ N_n:

    I_{τ,i_1 j_1} = I_τ(v_{i_1 j_1}, X_{i_1}, X_{j_1})
    f_{vx,i_1 j_1} = f_{v,x}(v_{i_1 j_1}, X_{i_1}, X_{j_1})
    f_{x,i_1 j_1} = f_x(X_{i_1}, X_{j_1})
    φ_{i_1 j_1} = D_{i_1 j_1} − 1[v_{i_1 j_1} > 0]
    φ_{i_1 j_1,τ} = φ_{i_1 j_1} I_{τ,i_1 j_1}.

With this notation at hand, the semiparametric estimator for θ is defined as

    θ̂_n = Γ̂_n^{−1} × Ψ̂_{n,τ},    (9)

where

    Γ̂_n = (1/m_n) Σ_{σ ∈ N_{m_n}} W̃_σ W̃'_σ
    Ψ̂_{n,τ} = (1/m_n) Σ_{σ ∈ N_{m_n}} W̃_σ D̂*_{σ,τ}

and m_n = 4!·(n choose 4).

The first-stage kernel estimator f̂_{v|x}(v_{i_1 j_1} | X_{i_1}, X_{j_1}) is defined as the ratio of the kernel estimators f̂_{vx,i_1 j_1} and f̂_{x,i_1 j_1}, with

    f̂_{vx,i_1 j_1} = (1/((n−2)(n−3) h^{L+1})) Σ_{k_1 ≠ i_1, j_1} Σ_{k_2 ≠ i_1, j_1, k_1} K_{vx,h}[v_{k_1 k_2} − v_{i_1 j_1}, X_{k_1} − X_{i_1}, X_{k_2} − X_{j_1}]
    f̂_{x,i_1 j_1} = (1/((n−2)(n−3) h^L)) Σ_{k_1 ≠ i_1, j_1} Σ_{k_2 ≠ i_1, j_1, k_1} K_{x,h}[X_{k_1} − X_{i_1}, X_{k_2} − X_{j_1}],

where h denotes a bandwidth parameter and L = 2K. The kernels K_{vx,h} and K_{x,h} are defined as

    K_{vx,h}[v_{k_1 k_2} − v_{i_1 j_1}, X_{k_1} − X_{i_1}, X_{k_2} − X_{j_1}] = K_{vx}( (v_{k_1 k_2} − v_{i_1 j_1})/h, (X_{k_1} − X_{i_1})/h, (X_{k_2} − X_{j_1})/h )
    K_{x,h}[X_{k_1} − X_{i_1}, X_{k_2} − X_{j_1}] = K_x( (X_{k_1} − X_{i_1})/h, (X_{k_2} − X_{j_1})/h ).

Assumption 4.1.5 below describes the conditions imposed on the kernel functions K_{vx,h} and K_{x,h} and on the bandwidth parameter h.

The estimator defined in equation (9) represents, to the best of my knowledge, the first effort to estimate the vector of parameters θ defined in the network formation model given by equation (3) using a two-step semiparametric estimator that utilizes the existence of a special regressor. A semiparametric approach is attractive because it does not restrict the distribution of the disturbance term to any specific parametric family. Furthermore, it allows for flexible statistical dependence between the agent-specific unobserved factors and the observed attributes, i.e., {X^n, A^n}. As an additional appealing property, the estimator defined in equation (9) has an analytical form. This characteristic increases its computational tractability compared with the estimator defined as the maximizer of a U-process introduced in Section 3.2. Regarding the nonparametric first-stage estimator, Leung (2015b, Supp. Appendix) and Graham et al. (2019) have studied the properties of kernel estimators for network data. I use their findings to analyze the asymptotic properties of θ̂_n.
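The first stage can be illustrated with the following leave-(i, j)-out sketch for scalar X (K = 1, so L = 2). It is a simplified illustration assuming a product Gaussian kernel; Assumption 4.1.5 below actually calls for a bias-reducing kernel of order M, which the standard Gaussian kernel does not satisfy.

```python
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u * u) / np.sqrt(2.0 * np.pi)

def f_v_given_x_hat(V, X, i, j, h):
    """Kernel estimate of f_{v|x}(v_ij | X_i, X_j) as the ratio f^_vx / f^_x.
    The (n-2)(n-3) normalizations cancel in the ratio, leaving num / (h * den)."""
    n = len(X)
    num = den = 0.0
    for k1 in range(n):
        if k1 in (i, j):
            continue
        for k2 in range(n):
            if k2 in (i, j, k1):
                continue
            kx = gaussian_kernel((X[k1] - X[i]) / h) * gaussian_kernel((X[k2] - X[j]) / h)
            num += kx * gaussian_kernel((V[k1, k2] - V[i, j]) / h)
            den += kx
    return num / (h * den) if den > 0 else np.nan
```

The fitted values f̂_{v|x} would then be plugged into D̂*_{σ,τ}, with the trimming indicator I_{τ,ij} zeroing out dyads near the boundary of the support, and θ̂_n follows from the closed form in equation (9) exactly as in the known-density sketch of Section 3.1.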
The following technical conditions are needed to prove Theorems 4.1 and 4.2. For simplicity, the theorems are stated and proved assuming that all of the elements of X_i are continuously distributed. However, the results can be readily extended to include discretely distributed variables by applying the density estimator separately to each discrete cell of data.

Assumption 4.1.1. For any distinct indices i and j in N_n, the dyad-level covariates (X_i, X_j) and (v_{ij}, X_i, X_j) are absolutely continuous with respect to some Lebesgue measures, with Radon-Nikodym densities f_{x,ij} and f_{vx,ij} and supports denoted by S_x and S_{vx}. Assume that f_{x,ij} and f_{vx,ij} are bounded, f_{vx,ij} is bounded away from zero, and there exists a constant M > L + 1 (recall that L = 2K, with dim(X_i) = K) such that f_{x,ij} and f_{vx,ij} are M-times differentiable with respect to all of their arguments with bounded derivatives. There exist finite constants C_{w,1} and C_{w,2} such that sup_{σ ∈ N_{m_n}} ||W̃_σ|| ≤ C_{w,1} w.p.1 and E[||W̃_σ||^4] < C_{w,2}.

Assumption 4.1.1 ensures that the densities f_{x,ij} and f_{vx,ij} are continuous and M-times differentiable. Also, it requires the existence of fourth-order moments for W̃_σ, for any σ ∈ N_{m_n}. This assumption has been used in the literature on semiparametric methods, for example in Ahn and Powell (1993); Aradillas-Lopez (2012), and Honoré and Lewbel (2002).

Assumption 4.1.2.
Let τ be the density trimming parameter defined above. Assume that the support S_{vx} is known, and that the trimming function I_{τ,ij} is equal to zero if (v_{ij}, X_i, X_j) is within a distance τ of the boundary of S_{vx}, and otherwise I_{τ,ij} equals one. Also, assume that τ → 0 and τ n → ∞ as n → ∞.

Due to the weighting scheme used in the definition of D̂*_{i_1 j_1}, boundary effects could arise from the density estimation step when computing Ψ̂_{n,τ}. Assumptions 4.1.1 and 4.1.2 deal with this technicality by assuming that f_{vx,i_1 j_1} is bounded away from zero and by introducing a trimming sequence I_τ(v_{i_1 j_1}, X_{i_1}, X_{j_1}) that sets to zero the terms in Ψ̂_{n,τ} with data within a τ distance of the boundary of S_{vx} (see, e.g., Lewbel 1997, 2000; Honoré and Lewbel 2002, and Khan and Tamer 2010).

Assumptions 4.1.1 and 4.1.2 require that the support S_{vx} is known. The support S_{vx} is identified from the distribution of observables, and hence it can be estimated in an empirical application. As an alternative approach to Assumption 4.1.2, a fixed trimming function that is not n-dependent could be used instead (see, e.g., Aradillas-Lopez, Honoré, and Powell 2007 and Aradillas-Lopez 2012).

Assumption 4.1.3.
Let M be as defined above. Given any tetrad σ({i_1, i_2, j_1, j_2}) ∈ N_{m_n}, let

    Ξ_1(X_{l_1}, X_{l_2}) = E[ W̃_σ D*_{l_1 l_2,τ} | X_{l_1}, X_{l_2} ]
    Ξ_2(v_{l_1 l_2}, X_{l_1}, X_{l_2}) = E[ W̃_σ D*_{l_1 l_2,τ} | v_{l_1 l_2}, X_{l_1}, X_{l_2} ]

for any dyad (l_1, l_2) ∈ {(i_1, j_1), (i_1, j_2), (i_2, j_1), (i_2, j_2)}. The expectations Ξ_1(x, x') and Ξ_2(v, x, x') exist and are continuous in the components of (v, x, x') for all (v, x, x') ∈ S_{vx}. Also, Ξ_1(x, x') and Ξ_2(v, x, x') are M-times differentiable in the components of (v, x, x') for all (v, x, x') ∈ S°_{vx}, where S°_{vx} differs from S_{vx} by a set of measure zero.

There exist some functions m_x(x, x') and m_{vx}(v, x, x') such that the following local Lipschitz conditions hold for some (x_0, x'_0) and (v_0, x_0, x'_0) in an open neighborhood of zero and for all τ > 0:

    ||f_{vx}(v + v_0, x + x_0, x' + x'_0) − f_{vx}(v, x, x')|| ≤ m_{vx}(v, x, x') ||(v_0, x_0, x'_0)||
    ||f_x(x + x_0, x' + x'_0) − f_x(x, x')|| ≤ m_x(x, x') ||(x_0, x'_0)||
    ||Ξ_2(v + v_0, x + x_0, x' + x'_0) − Ξ_2(v, x, x')|| ≤ m_{vx}(v, x, x') ||(v_0, x_0, x'_0)||
    ||Ξ_1(x + x_0, x' + x'_0) − Ξ_1(x, x')|| ≤ m_x(x, x') ||(x_0, x'_0)||.

Assumption 4.1.3 imposes local smoothness conditions that are needed to derive the Hájek projection of a V-statistic. Similar conditions have been used in Ahn and Powell (1993); Aradillas-Lopez (2012), and Honoré and Lewbel (2002).

Assumption 4.1.4.
Given any σ({i_1, i_2, j_1, j_2}) ∈ N_{m_n} and (l_1, l_2) ∈ {(i_1, j_1), (i_1, j_2), (i_2, j_1), (i_2, j_2)}, let χ_{l_1 l_2} = χ(X_{l_1}, X_{l_2}) = E[ W̃_σ | X_{l_1}, X_{l_2} ]. The following moments exist:

    sup_{(x,x') ∈ S_x} χ(x, x')
    sup_{(v,x,x') ∈ S_{vx}, τ ≥ 0} E[ ( φ_{l_1 l_2,τ} / f_{vx}(v, x, x') )^2 | v, x, x' ]
    sup_{(v,x,x') ∈ S_{vx}, τ ≥ 0} E[ ( D*_{l_1 l_2,τ} / f_{vx}(v, x, x') )^2 | v, x, x' ],

and the objects

    χ(x, x')
    E[ ( φ_{l_1 l_2,τ} / f_{vx}(v, x, x') )^2 | v, x, x' ]
    E[ ( D*_{l_1 l_2,τ} / f_{vx}(v, x, x') )^2 | v, x, x' ]

are continuous in the components of (v, x, x') ∈ S_{vx}. Moreover, there exists a finite constant C_χ such that E[ ||χ(x, x')||^6 ] ≤ C_χ for any (x, x') ∈ S_x.

Assumption 4.1.4 ensures the existence and boundedness of the conditional expectations defined above. These conditions are needed to invoke a uniform law of large numbers for V-statistics. The last part of Assumption 4.1.4 guarantees the existence of sixth-order moments, and it will be used to invoke a conditional central limit theorem.

Assumption 4.1.5.
Let M and τ be as defined above. The kernel K_x(x, x') : R^L → R and the bandwidth h used to define the kernel estimator f̂_x satisfy:
1. K_x(x, x') = 0 for all (x, x') on the boundary of, and outside of, a convex bounded subset of R^L. This subset has a nonempty interior and has the origin as an interior point.
2. K_x(·,·) is symmetric around zero, bounded, differentiable, and bias-reducing of order M.
3. There exists δ > 0 such that n^{1−δ} h^{L+1} → ∞, n h^M → 0, and h/τ → 0.

The kernel function K_{vx}(v, x, x') has all the same properties, replacing (x, x') with (v, x, x').

Assumption 4.1.5 requires the use of a higher-order kernel. This selection is motivated by the need to control the bias induced by using the inverse of f_{v|x}(v_{i_1 j_1} | X_{i_1}, X_{j_1}) as a weighting function. This assumption has been used by Honoré and Lewbel (2002) and Leung (2015b). Graham et al. (2019) provide a comprehensive treatment of kernel estimation for undirected network data.
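As an illustration of the bias-reducing requirement, one common construction (one possibility among many; the paper does not prescribe a specific kernel) builds a fourth-order kernel from the Gaussian density, with multivariate versions obtained as products of univariate pieces. Note that, unlike the compactly supported kernels of Assumption 4.1.5.1, this Gaussian-based version has unbounded support.

```python
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u * u) / np.sqrt(2.0 * np.pi)

def fourth_order_kernel(u):
    """Bias-reducing kernel of order 4: K4(u) = (3 - u^2) phi(u) / 2.
    It integrates to one and its second moment vanishes, so the smoothing
    bias is O(h^4) rather than O(h^2)."""
    return 0.5 * (3.0 - u * u) * gaussian_kernel(u)
```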
Using the assumptions above, it follows that θ̂_n defined in equation (9) is a consistent estimator of θ. Theorem 4.1 formally states this result.

Theorem 4.1. Let Assumptions 3.1.1-3.1.5 and 4.1.1-4.1.5 hold. Then (θ̂_n − θ) →_p 0 as n → ∞.

Proof. See Appendix A.

4.2 Asymptotic Distribution
The following theorem derives the asymptotic distribution of θ̂_n. A key step in proving this result is to show that

    √(n(n−1)) Υ_n^{−1/2} { Ψ̂_{n,τ} − E[ W̃_σ D̃*_{σ,τ} | v_σ, X_σ, A_σ ] } ⇒ N(0, I),

where I denotes the K-dimensional identity matrix and Υ_n = n(n−1) Var(Ψ̂_{n,τ}), which is defined as

    Υ_n = (1/(n(n−1))) Σ_{i=1}^n Σ_{j ≠ i} E[ { p_n(ω_{ij}) (1 − p_n(ω_{ij})) / f_{v|x,ij} } I_{τ,ij} ] χ_{ij} χ'_{ij}

with

    χ_{i_1 j_1} = (1/((n−2)(n−3))) Σ_{i_2 ≠ i_1, j_1} Σ_{j_2 ≠ i_1, j_1, i_2} E[ W̃_{σ({i_1,i_2;j_1,j_2})} | X_{i_1}, X_{j_1} ].

The proof of this result follows from showing that √(n(n−1)) { Ψ̂_{n,τ} − E[ W̃_σ D̃*_{σ,τ} | v_σ, X_σ, A_σ ] } is asymptotically equivalent to its Hájek projection onto an arbitrary function of ζ_{i_1 j_1} = (v_{i_1 j_1}, X_{i_1}, X_{j_1}, A_{i_1}, A_{j_1}, U_{i_1 j_1}). The resulting Hájek projection is an average of conditionally independent random variables at the dyad level, with conditional mean equal to 0 and a conditional variance that approximates Υ_n in the limit. The result follows from a conditional version of Lyapunov's central limit theorem (see, e.g., Rao 2009).

The remaining information needed to derive the limiting distribution of the semiparametric estimator θ̂_n is the convergence rate of Υ_n, which is given by

    ϱ_n = O(Υ_n) = O( E[ { p_n(ω_{ij}) (1 − p_n(ω_{ij})) / f_{v|x,ij} } I_{τ,ij} ] ),

and the following matrix:

    Σ_n = Γ^{−1} × Υ_n × Γ^{−1}.

The next theorem formalizes the limiting distribution of θ̂_n.

Theorem 4.2.
Suppose Assumptions 3.1.1-3.1.5 and 4.1.1-4.1.5 hold, and that n(n−1) ϱ_n^{−1} → ∞. It then follows that

    √(n(n−1)) Σ_n^{−1/2} (θ̂_n − θ) = Σ_n^{−1/2} × Γ^{−1} × (1/√(n(n−1))) Σ_{i=1}^n Σ_{j ≠ i} ξ_{ij,τ} + o_p(1)    (10)

with ξ_{ij,τ} = { D*_{ij} − E[D*_{ij} | ω_{ij}] } I_{τ,ij} χ_{ij}, and thus,

    √(n(n−1)) Σ_n^{−1/2} (θ̂_n − θ) ⇒ N(0, I).
See Appendix A.

Equation (10) describes the asymptotic linear representation of θ̂_n. The limiting distribution of θ̂_n is derived following a studentized approach, as in Andrews and Schafgans (1998), Khan and Tamer (2010), and Jochmans (2018), to control for the possibly varying rates of convergence due to sparsity of the network. Notice that if ϱ_n^{−1} converges to a finite constant that is bounded away from zero, θ̂_n − θ converges at the parametric rate √(n(n−1)); if ϱ_n^{−1} decays as n increases, θ̂_n − θ has a slower rate of convergence, given by O_p( {n(n−1) ϱ_n^{−1}}^{−1/2} ).

5 Monte Carlo Simulations

This section presents simulation evidence for the finite sample performance of the semiparametric estimator introduced in Section 4. I explore the properties of the estimation technique under a wide array of DGP designs that are meant to capture differences in the sample size and in the level of sparsity of the network (see, e.g., Jochmans 2018; Dzemski 2019; Yan et al. 2019).

The undirected network is simulated according to the network model in equation (3). I consider a single observed attribute X_i, which is drawn as X_i ∼ Beta(2, 2) − 1/2. The pair-specific covariate W_{ij} = g(X_i, X_j) is constructed to account for complementarities on the observed attributes and is defined as W_{ij} = X_i X_j. The agent-specific unobserved factor A_i is generated such that it is correlated with X_i and depends on the sample size n; this last feature offers a useful approach to control the degree of sparsity in the network. In particular, I set A_i = λ X_i − (1 − λ) C_n × B_i, where B_i is drawn from a Beta distribution with both shape parameters below one, so that it concentrates mass at the boundary of the unit interval. This implies that, conditional on X_i, the individuals cluster at small or high types of unobserved attributes. The parameter λ ∈ (0, 1) controls the degree of correlation between the agent-specific heterogeneity and the observed covariate X_i. The constant C_n depends on the size of the network and takes the values C_n ∈ { log(log(n)), log(n)/2, log(n) }. Under this design, the choice of C_n regulates the degree of sparsity of the network: for larger values of C_n, fewer links are formed. The special regressor v_{ij} is simulated as a mean-zero normal random variable for i < j, and thus satisfies the support and independence conditions in Assumptions 3.1.3 and 3.1.4. The link-specific disturbance term is generated as U_{ij} ∼ Beta(2, 2) − 1/2 for i < j. The true DGP is completed by setting the parameter value θ_0 = 1.5 and considering networks of size n ∈ {50, 100}.
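A sketch of this design is given below. Several constants were garbled in the source, so the Beta shape parameters for B_i, the value of λ, and the unit variance of v_{ij} are placeholder assumptions consistent with the description above.

```python
import numpy as np

def simulate_network(n, C_n, theta0=1.5, lam=0.5, rng=None):
    """Simulate one network from the Section 5 design under equation (3).
    Beta(0.5, 0.5) for B_i, lam = 0.5, and Var(v_ij) = 1 are placeholder choices."""
    rng = rng or np.random.default_rng()
    X = rng.beta(2, 2, size=n) - 0.5
    B = rng.beta(0.5, 0.5, size=n)          # mass near 0 and 1, as described
    A = lam * X - (1.0 - lam) * C_n * B     # unobserved heterogeneity; sparsity via C_n
    W = np.outer(X, X)                      # W_ij = X_i * X_j
    V = rng.normal(size=(n, n))
    U = rng.beta(2, 2, size=(n, n)) - 0.5
    V = np.triu(V, 1) + np.triu(V, 1).T     # one draw per dyad i < j, symmetric
    U = np.triu(U, 1) + np.triu(U, 1).T
    D = (V + theta0 * W + A[:, None] + A[None, :] - U >= 0).astype(float)
    np.fill_diagonal(D, 0.0)                # no self-ties
    return D, V, W, X

# Example: a dense design with n = 100 and C_n = log(log(n)).
# D, V, W, X = simulate_network(100, np.log(np.log(100.0)))
```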
The implementation of the semiparametric estimator for θ requires the estimation of the conditional density of v_{ij} in a nonparametric first stage. I consider two approaches to isolate the approximation error induced by the density estimation. The first one assumes that the conditional distribution of v_{ij} is known and considers a fixed trimming design given by I_{τ,ij} = 1[|v_{ij}| < τ] with τ = 2·std(v_{ij}). In the second approach, I compute the semiparametric estimator as defined in equation (9). Although Assumption 4.1.5 requires the use of higher-order kernels to eliminate the asymptotic bias, I compute θ̂_n using a standard second-order kernel. The motivation for this choice is that semiparametric estimators computed using higher-order kernels tend to have inferior finite sample properties compared to those obtained using standard kernels. Furthermore, this choice is a common practice in many semiparametric applications (see Rothe 2009 and Jochmans 2013). I use the standard-normal density as the kernel function. The trimming design is the same as in the first approach to ensure a proper comparison between the two alternative methods. The bandwidth parameter h is fixed across all replications.

Table 1 summarizes the results of computing the estimator θ̂_n assuming that the density f_v(v_{ij}) is known, over 500 Monte Carlo replications for all the designs. In particular, I report the mean, median, standard deviation, and mean square error of θ̂_n over the total number of simulations. The final column of Table 1 reports the average degree of the network across the total number of simulations. This information will be used to describe the degree of sparsity in the network across the different designs.

The top panel in Table 1 shows the results of estimating θ in a small network with n = 50. Both the mean and the median show that the estimator approximates well the true value θ_0 = 1.5. The estimator θ̂_n presents the smallest dispersion in the dense network design, with C_n = log(log(n)) and an average degree of 42% of the links formed. As fewer links are present in the network, the performance of the estimator deteriorates.

In the bottom panel of Table 1, I show the results of estimating θ in a large network with n = 100. The evidence in this scenario reinforces the previous findings and suggests that the performance of the estimator θ̂_n improves across all the designs. For example, in the dense network scenario C_n = log(log(n)), the standard deviation decreases by an order of less than one half and the mean square error by an order greater than one third. A similar conclusion is obtained from the sparse network case C_n = log(n), where only 28% of the links are formed.

Table 2 summarizes the results of computing the semiparametric two-step estimator for θ with a first-step kernel estimator f̂_v(v_{ij}) over 500 Monte Carlo replications for all the designs.
The top panel in Table 2 shows the results of estimating θ in a small network with n = 50. These estimates suggest that θ̂_n approximates well the true value of θ. However, this approach obtains less accurate results than the first method, due to the approximation error induced by the nonparametric first-stage estimation. In particular, the estimator presents the best performance and smallest dispersion in the dense network design, where the network has an average degree of 42% of the links formed.

In the bottom panel of Table 2, I show the results of estimating θ in a large network with n = 100. The estimates show that the performance of the estimator θ̂_n improves across all the designs as the network's size grows large, including the sparse case where the network has an average degree of 29% of the links formed. Overall, these numerical experiments suggest that the semiparametric estimator θ̂_n yields reliable inference for the preference parameter θ in an undirected network formation model.

Table 1: Simulation results for the semiparametric estimator θ̂_n with known density function f_v(v_{ij})

    C_n            mean    median  std     MSE     Degree
    n = 50
    log(log(n))    1.4764  1.4627  0.9158  0.8393  0.4250
    log(n)         1.5217  1.6001  1.3832  1.9136  0.3131
    n = 100
    log(log(n))    1.5212  1.5022  0.4809  0.2317  0.4204
    log(n)         1.5057  1.4979  0.6916  0.4783  0.2893

    Total number of Monte Carlo simulations = 500.

Table 2: Simulation results for the semiparametric estimator θ̂_n with kernel estimator f̂_v(v_{ij})

    C_n            mean    median  std     MSE     Degree
    n = 50
    log(log(n))    1.6047  1.6164  1.1253  1.2772  0.4237
    log(n)         1.6444  1.6643  1.5801  2.5176  0.3125
    n = 100
    log(log(n))    1.5373  1.5011  0.4911  0.2425  0.4214
    log(n)         1.5415  1.5197  0.7317  0.5371  0.2907

    Total number of Monte Carlo simulations = 500.

6 Conclusion
This paper has studied a network formation model with unobserved agent-specific heterogeneity. It offers two main contributions to the literature on network formation. The first contribution is to propose a new identification strategy that identifies the vector of coefficients θ, which accounts for the preferences for homophilic relationships on the observed attributes. The point identification result relies on the existence of a special regressor. This study represents, to the best of my knowledge, the first generalization of a special regressor to analyze a network formation model (Lewbel 1998 and Lewbel 2000).

The second contribution is to introduce a two-step semiparametric estimator for θ. The estimator has a closed form and is computationally tractable even in large networks. I show in Monte Carlo simulations that the estimator performs well in finite samples, as well as in sparse and dense networks.

Two different strands of the literature on network formation have highlighted the importance of accounting for (i) network externalities and (ii) general forms of unobserved heterogeneity (see, e.g., Graham 2019b). In future research, I plan to explore the identification power that the special regressor has when considering an augmented model of network formation with network externalities and general forms of unobserved heterogeneity.

References

Ahn, H. and J. L. Powell (1993). Semiparametric estimation of censored selection models with a nonparametric selection mechanism. Journal of Econometrics 58(1-2), 3–29.
Andrews, D. W. and M. M. Schafgans (1998). Semiparametric estimation of the intercept of a sample selection model. The Review of Economic Studies 65(3), 497–517.
Aradillas-Lopez, A. (2010). Semiparametric estimation of a simultaneous game with incomplete information. Journal of Econometrics 157(2), 409–431.
Aradillas-Lopez, A. (2012). Pairwise-difference estimation of incomplete information games. Journal of Econometrics 168(1), 120–140.
Aradillas-Lopez, A., B. E. Honoré, and J. L. Powell (2007). Pairwise difference estimation with nonparametric control variables. International Economic Review 48(4), 1119–1158.
Arellano, M. and B. Honoré (2001). Panel data models: some recent developments. Handbook of Econometrics 5, 3229–3296.
Auerbach, E. (2019). Identification and estimation of a partially linear regression model using network data. arXiv preprint arXiv:1903.09679.
Badev, A. (2018). Nash equilibria on (un)stable networks.
Boucher, V. and I. Mourifié (2017). My friend far, far away: a random field approach to exponential random graph models. The Econometrics Journal 20(3), S14–S46.
Candelaria, L. E. and T. Ura (2018). Identification and inference of network formation games with misclassified links. arXiv preprint arXiv:1804.10118.
Chandrasekhar, A. G. and M. O. Jackson (2014). Tractable and consistent random graph models. Working Paper.
Charbonneau, K. B. (2017). Multiple fixed effects in binary response panel data models. The Econometrics Journal 20(3), S1–S13.
Chen, S., S. Khan, and X. Tang (2019). Exclusion restrictions in dynamic binary choice panel data models: Comment on "Semiparametric binary choice panel data models without strictly exogenous regressors". Econometrica 87(5), 1781–1785.
Collomb, G. and W. Härdle (1986). Strong uniform convergence rates in robust nonparametric time series analysis and prediction: Kernel regression estimation from dependent observations. Stochastic Processes and their Applications 23(1), 77–89.
de Paula, A., S. Richards-Shubik, and E. Tamer (2017). Identifying preferences in networks with bounded degree. Forthcoming in Econometrica.
Dzemski, A. (2019). An empirical model of dyadic link formation in a network with unobserved heterogeneity. Review of Economics and Statistics 101(5), 763–776.
Gao, W. Y. (2020). Nonparametric identification in index models of link formation. Journal of Econometrics 215(2), 399–413.
Gao, W. Y., M. Li, and S. Xu (2020). Logical differencing in dyadic network formation models with nontransferable utilities. arXiv preprint arXiv:2001.00691.
Goldsmith-Pinkham, P. and G. W. Imbens (2013). Social networks and the identification of peer effects. Journal of Business & Economic Statistics 31(3), 253–264.
Graham, B. S. (2017). An econometric model of network formation with degree heterogeneity. Econometrica 85(4), 1033–1063.
Graham, B. S. (2019a). Dyadic regression.
Graham, B. S. (2019b). Network data.
Graham, B. S., F. Niu, and J. L. Powell (2019). Kernel density estimation for undirected dyadic data. arXiv preprint arXiv:1907.13630.
Gualdani, C. (2020). An econometric model of network formation with an application to board interlocks between firms.
Han, A. K. (1987). Non-parametric analysis of a generalized regression model: the maximum rank correlation estimator. Journal of Econometrics 35(2), 303–316.
Honoré, B. E. and A. Lewbel (2002). Semiparametric binary choice panel data models without strictly exogenous regressors. Econometrica 70(5), 2053–2063.
Jackson, M. O. and A. Wolinsky (1996). A strategic model of social and economic networks. Journal of Economic Theory 71(1), 44–74.
Jochmans, K. (2013). Pairwise-comparison estimation with non-parametric controls. The Econometrics Journal 16(3), 340–372.
Jochmans, K. (2017). Two-way models for gravity. Review of Economics and Statistics 99(3), 478–485.
Jochmans, K. (2018). Semiparametric analysis of network formation. Journal of Business & Economic Statistics 36(4), 705–713.
Khan, S. and E. Tamer (2010). Irregular identification, support conditions, and inverse weight estimation. Econometrica 78(6), 2021–2042.
Lee, A. J. (2019). U-Statistics: Theory and Practice. Routledge.
Leung, M. (2015a). A random-field approach to inference in large models of network formation. Available at SSRN.
Leung, M. (2015b). Two-step estimation of network-formation models with incomplete information. Journal of Econometrics 188(1), 182–195.
Lewbel, A. (1997). Semiparametric estimation of location and other discrete choice moments. Econometric Theory 13(1), 32–51.
Lewbel, A. (1998). Semiparametric latent variable model estimation with endogenous or mismeasured regressors. Econometrica, 105–121.
Lewbel, A. (2000). Semiparametric qualitative response model estimation with unknown heteroscedasticity or instrumental variables. Journal of Econometrics 97(1), 145–177.
Lewbel, A. (2012). An Overview of the Special Regressor Method. Boston College, Department of Economics.
Manski, C. F. (1975). Maximum score estimation of the stochastic utility model of choice. Journal of Econometrics 3(3), 205–228.
Manski, C. F. (1985). Semiparametric analysis of discrete response: Asymptotic properties of the maximum score estimator. Journal of Econometrics 27(3), 313–333.
McPherson, M., L. Smith-Lovin, and J. M. Cook (2001). Birds of a feather: Homophily in social networks. Annual Review of Sociology, 415–444.
Mele, A. (2017). A structural model of dense network formation. Econometrica 85(3), 825–850.
Menzel, K. (2015). Strategic network formation with many agents.
Miyauchi, Y. (2016). Structural estimation of a pairwise stable network with nonnegative externality. Journal of Econometrics, Forthcoming.
Newey, W. K. and D. McFadden (1994). Large sample estimation and hypothesis testing. Handbook of Econometrics 4, 2111–2245.
Powell, J. L. (1994). Estimation of semiparametric models. Handbook of Econometrics 4, 2443–2521.
Powell, J. L., J. H. Stock, and T. M. Stoker (1989). Semiparametric estimation of index coefficients. Econometrica, 1403–1430.
Rao, B. P. (2009). Conditional independence, conditional mixing and conditional association. Annals of the Institute of Statistical Mathematics 61(2), 441–460.
Ridder, G. and S. Sheng (2020). Estimation of large network formation games. arXiv preprint arXiv:2001.03838.
Rothe, C. (2009). Semiparametric estimation of binary response models with endogenous regressors. Journal of Econometrics 153(1), 51–64.
Serfling, R. J. (2009). Approximation Theorems of Mathematical Statistics, Volume 162. John Wiley & Sons.
Sheng, S. (2018). A structural econometric analysis of network formation games through subnetworks. Forthcoming in Econometrica, mimeo UCLA.
Silverman, B. W. (1978). Weak and strong uniform consistency of the kernel estimate of a density and its derivatives. The Annals of Statistics, 177–184.
Toth, P. (2017). Semiparametric estimation in networks with homophily and degree heterogeneity. Technical report, Working paper, University of Nevada.
Yan, T., B. Jiang, S. E. Fienberg, and C. Leng (2019). Statistical inference in a directed network model with covariates. Journal of the American Statistical Association 114(526), 857–868.
Zeleneev, A. (2020). Identification and estimation of network models with nonparametric unobserved heterogeneity.
Appendix
A.1 Proof of Theorem 3.1
Proof.
Let $e_{ij} = A_i + A_j - U_{ij}$ and $s(w, e) = -w'\theta_0 - e$. Consider
\begin{align*}
E[D^*_{ij} \mid X_i, X_j] &= E\big[ E[D^*_{ij} \mid v_{ij}, X_i, X_j] \mid X_i, X_j \big] \\
&= \int_{-s_v}^{s_v} E\big[ D_{ij} - 1[v_{ij} > 0] \mid v_{ij}, X_i, X_j \big] \frac{f_{v|x}(v_{ij} \mid X_i, X_j)}{f_{v|x}(v_{ij} \mid X_i, X_j)} \, dv_{ij} \\
&= \int_{-s_v}^{s_v} E\big[ 1[v_{ij} \ge s(W_{ij}, e_{ij})] - 1[v_{ij} > 0] \mid v_{ij}, X_i, X_j \big] \, dv_{ij} \\
&= \int_{-s_v}^{s_v} \int_{S_e(X_i, X_j)} \big\{ 1[v_{ij} \ge s(W_{ij}, e_{ij})] - 1[v_{ij} > 0] \big\} \, dF_{e|x}(e_{ij} \mid v_{ij}, X_i, X_j) \, dv_{ij} \\
&= \int_{S_e(X_i, X_j)} \int_{-s_v}^{s_v} \big\{ 1[v_{ij} \ge s(W_{ij}, e_{ij})] - 1[v_{ij} > 0] \big\} \, dv_{ij} \, dF_{e|x}(e_{ij} \mid X_i, X_j) \\
&= \int_{S_e(X_i, X_j)} -s(W_{ij}, e_{ij}) \, dF_{e|x}(e_{ij} \mid X_i, X_j) \\
&= \int_{S_e(X_i, X_j)} \big( W_{ij}'\theta_0 + e_{ij} \big) \, dF_{e|x}(e_{ij} \mid X_i, X_j) \\
&= W_{ij}'\theta_0 + E[e_{ij} \mid X_i, X_j].
\end{align*}
The third-to-last equality follows from the following result:
\[
\int_{-s_v}^{s_v} \big\{ 1[v_{ij} \ge s(W_{ij}, e_{ij})] - 1[v_{ij} > 0] \big\} \, dv_{ij} = \int_{s(W_{ij}, e_{ij})}^{s_v} dv_{ij} - s_v = -s(W_{ij}, e_{ij}).
\]
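As an illustrative aside, a minimal simulation check of the identity $E[D^*_{ij} \mid X_i, X_j] = W_{ij}'\theta_0 + E[e_{ij} \mid X_i, X_j]$ (my own construction; the uniform special regressor, the normal errors, and all constants are assumptions chosen so that the support condition holds with high probability):

```python
# Monte Carlo sanity check of the special-regressor transform behind
# Theorem 3.1: D* = (D - 1[v > 0]) / f_v has conditional mean W'theta + e.
import numpy as np

rng = np.random.default_rng(0)
n_draws, theta, s_v = 1_000_000, 1.5, 10.0

W = rng.normal(size=n_draws)                # observed pair index W_ij
e = rng.normal(scale=0.5, size=n_draws)     # e_ij = A_i + A_j - U_ij
v = rng.uniform(-s_v, s_v, size=n_draws)    # special regressor, independent of (W, e)
f_v = 1.0 / (2.0 * s_v)                     # known density of v on [-s_v, s_v]

D = (W * theta + e + v >= 0).astype(float)  # link formation rule
D_star = (D - (v > 0)) / f_v                # special-regressor transform

# With E[e] = 0, the mean of D* - W * theta should be close to zero.
print(np.mean(D_star - W * theta))
```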
A.2 Proof of Corollary 3.1

Proof.
Theorem 3.1 concludes that
\[ E[D^*_{ik} \mid X_i, X_k] = W_{ik}'\theta_0 + E[A_i + A_k \mid X_i, X_k]. \]
Observe that $D^*_{ik}$ is a function of $(Z_i, Z_k, A_i, A_k, U_{ik})$. It follows from the random sampling of nodes (Assumption 3.1.1) and the conditionally independent formation of links (Assumption 3.1.2) that the following conditions hold for any tetrad $\sigma\{i,j,k,l\} \in N_{m_n}$:
\begin{align*}
E[D^*_{ik} \mid X_i, X_k] &= E[D^*_{ik} \mid X_{\sigma(\{i,j,k,l\})}], \\
E[A_i + A_k \mid X_i, X_k] &= E[A_i + A_k \mid X_{\sigma(\{i,j,k,l\})}],
\end{align*}
since $(v_i, v_k, A_i, A_k, U_{ik})$ is conditionally independent of $(X_j, X_l)$ given $(X_i, X_k)$, i.e.,
\begin{align*}
\Pr(v_i, v_k, A_i, A_k, U_{ik} \mid X_i, X_k) &= \Pr(U_{ik} \mid X_i, X_k, v_i, v_k, A_i, A_k)\,\Pr(v_i, v_k, A_i, A_k \mid X_i, X_k) \\
&= \Pr(U_{ik} \mid X_{\sigma(\{i,j,k,l\})}, v_i, v_k, A_i, A_k)\,\Pr(v_i, v_k, A_i, A_k \mid X_{\sigma(\{i,j,k,l\})}) \\
&= \Pr(v_i, v_k, A_i, A_k, U_{ik} \mid X_{\sigma(\{i,j,k,l\})}),
\end{align*}
where the second equality follows from Assumptions 3.1.1 and 3.1.2. Thus, the results above yield
\begin{align*}
E[D^*_{ik} - D^*_{il} \mid X_{\sigma(\{i,j,k,l\})}] &= (W_{ik} - W_{il})'\theta_0 + E[A_k - A_l \mid X_{\sigma(\{i,j,k,l\})}], \\
E[D^*_{jk} - D^*_{jl} \mid X_{\sigma(\{i,j,k,l\})}] &= (W_{jk} - W_{jl})'\theta_0 + E[A_k - A_l \mid X_{\sigma(\{i,j,k,l\})}],
\end{align*}
for any tetrad $\sigma\{i,j,k,l\}$, which in turn implies
\[ E[\tilde D^*_\sigma \mid X_\sigma] = \tilde W_\sigma'\theta_0. \tag{11} \]
The result follows from Assumption 3.1.5. The proof is complete.
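The role of the tetrad double difference can be verified directly on the conditional means; the following small sketch (my own, with hypothetical numbers) confirms that the additive agent effects cancel exactly:

```python
# Algebraic check of the tetrad double difference in Corollary 3.1:
# the effects A_i + A_j drop out of the differenced conditional means.
import numpy as np

rng = np.random.default_rng(1)
theta = 1.5
A = rng.normal(size=4)                    # unobserved effects for i, j, k, l
W = rng.normal(size=(4, 4))
W = (W + W.T) / 2                         # symmetric pair regressors W_ab

def m(a, b):
    return W[a, b] * theta + A[a] + A[b]  # conditional mean of D*_ab

i, j, k, l = 0, 1, 2, 3
lhs = (m(i, k) - m(i, l)) - (m(j, k) - m(j, l))
rhs = ((W[i, k] - W[i, l]) - (W[j, k] - W[j, l])) * theta
print(np.isclose(lhs, rhs))              # True: the A terms cancel exactly
```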
Proof of Theorem 3.2

Proof.
First, notice that for any $(X_\sigma, \bar v_\sigma) \in Q_{\theta_1}$,
\[ \mathrm{sign}\{\tilde v_\sigma\} = \mathrm{sign}\big\{ \tilde v_\sigma + (\Delta_\sigma W_i'\theta_0 + \Delta_\sigma A_i) - (\Delta_\sigma W_j'\theta_0 + \Delta_\sigma A_j) \big\}, \]
since $|\tilde v_\sigma| \ge 2 s_\varepsilon$ with probability 1.

Consider a $\theta_1 \ne \theta_0$ with $P[\bar Z_\sigma \in Q_{\theta_1}] > 0$. Without loss of generality, consider some $(X_\sigma, \bar v_\sigma) \in Q_{\theta_1}$ with $\tilde W_\sigma'\theta_1 \le -\tilde v_\sigma < \tilde W_\sigma'\theta_0$. From the previous observation, it follows that $\tilde v_\sigma + \tilde W_\sigma'\theta_0 + \Delta_\sigma A_i - \Delta_\sigma A_j > 0$, $\Delta_\sigma v_i > s_\varepsilon$, and $\Delta_\sigma v_j < -s_\varepsilon$ with probability 1.

Given $(X_\sigma, \bar v_\sigma) \in Q_{\theta_1}$, these conditions hold if and only if
\[ \Delta_\sigma v_i > -(\Delta_\sigma W_i'\theta_0 + \Delta_\sigma A_i), \qquad \Delta_\sigma v_j \le -(\Delta_\sigma W_j'\theta_0 + \Delta_\sigma A_j) \tag{12} \]
with probability 1. The inequalities in (12) are sufficient conditions for
\[ P_{\theta_0}\big[\tilde D_\sigma = 2 \mid X_\sigma, A_\sigma, \bar v_\sigma \in V(X_\sigma), \tilde D_\sigma \in \{-2,2\}\big] > P_{\theta_0}\big[\tilde D_\sigma = -2 \mid X_\sigma, A_\sigma, \bar v_\sigma \in V(X_\sigma), \tilde D_\sigma \in \{-2,2\}\big], \]
or, equivalently, for
\[ E_{\theta_0}\big[\tilde D_\sigma \mid X_\sigma, A_\sigma, \bar v_\sigma \in V(X_\sigma), \tilde D_\sigma \in \{-2,2\}\big] > 0. \]

Conversely, suppose for some $(X_\sigma, \bar v_\sigma) \in Q_{\theta_1}$ that the conditional expectation above were positive while $\tilde v_\sigma + \tilde W_\sigma'\theta_0 + \Delta_\sigma A_i - \Delta_\sigma A_j \le 0$. For $\bar v_\sigma \in V(X_\sigma)$, it would then be the case that $\tilde v_\sigma < 0$, and thus
\[ \Delta_\sigma v_i \le -(\Delta_\sigma W_i'\theta_0 + \Delta_\sigma A_i), \qquad \Delta_\sigma v_j > -(\Delta_\sigma W_j'\theta_0 + \Delta_\sigma A_j) \]
with probability 1, which contradicts $E_{\theta_0}[\tilde D_\sigma \mid X_\sigma, A_\sigma, \bar v_\sigma \in V(X_\sigma), \tilde D_\sigma \in \{-2,2\}] > 0$. Hence,
\[ \mathrm{sign}\big\{ E_{\theta_0}[\tilde D_\sigma \mid X_\sigma, A_\sigma, \bar v_\sigma \in V(X_\sigma), \tilde D_\sigma \in \{-2,2\}] \big\} = \mathrm{sign}\big\{ \tilde v_\sigma + \tilde W_\sigma'\theta_0 \big\} \]
for any $(X_\sigma, A_\sigma, \bar v_\sigma \in V(X_\sigma))$.

The previous result implies that for any $(X_\sigma, \bar v_\sigma) \in Q_{\theta_1}$ with $P[\bar Z_\sigma \in Q_{\theta_1}] > 0$, it will hold that $\tilde W_\sigma'\theta_1 \le -\tilde v_\sigma < \tilde W_\sigma'\theta_0$ if and only if
\[ \mathrm{sign}\big\{ E_{\theta_0}[\tilde D_\sigma \mid X_\sigma, \bar v_\sigma \in V, \tilde D_\sigma \in \{-2,2\}] \big\} > \mathrm{sign}\big\{ E_{\theta_1}[\tilde D_\sigma \mid X_\sigma, \bar v_\sigma \in V, \tilde D_\sigma \in \{-2,2\}] \big\}. \]
This result implies that $\bar z_\sigma \in \xi_{\theta_1}(X_\sigma)$ and $P[\bar Z_\sigma \in \xi_{\theta_1}] > 0$. Therefore, $\theta_0$ is identified relative to $\theta_1$.
Proof of Corollary 3.3

Proof.
Consider any $\theta_1 \ne \theta_0$. It follows from Assumption 3.2.4 that $P[\tilde W_\sigma'(\theta_1 - \theta_0) \ne 0] > 0$ for any $\sigma \in N_{m_n}$. Suppose without loss of generality that $P[\tilde W_\sigma'\theta_1 < \tilde W_\sigma'\theta_0] > 0$. Under Assumptions 3.1.1 and 3.2.3, for any $X_\sigma$ with $\tilde W_\sigma'\theta_1 < \tilde W_\sigma'\theta_0$, there exists an interval of $\tilde v_\sigma = \Delta_\sigma v_i - \Delta_\sigma v_j$ with $\tilde W_\sigma'\theta_1 \le -\tilde v_\sigma < \tilde W_\sigma'\theta_0$. This implies that $P[\bar Z_\sigma \in Q_{\theta_1}] > 0$, and hence $\theta_0$ is point identified relative to all $\theta_1 \ne \theta_0$.
A.3 Proof of Theorem 4.1

Proof.
Consider $\hat\theta_n = \hat\Gamma_n^{-1} \times \hat\Psi_{n,\tau}$, with
\[ \hat\Gamma_n = \frac{1}{m_n}\sum_{\sigma\in N_{m_n}} \tilde W_\sigma \tilde W_\sigma', \qquad \hat\Psi_{n,\tau} = \frac{1}{m_n}\sum_{\sigma\in N_{m_n}} \tilde W_\sigma \hat D^*_{\sigma,\tau}. \]
It suffices to show that $\hat\Gamma_n \xrightarrow{p} \Gamma_0$ and $\hat\Psi_{n,\tau} \xrightarrow{p} \Psi_0$; the result will then follow from Assumption 3.1.5, the continuous mapping theorem, and Slutsky's theorem.

Part 1.
Notice that $\hat\Gamma_n - \Gamma_0$ is a mean-zero fourth-order V-statistic without common indices,
\[ \hat\Gamma_n - \Gamma_0 = \frac{1}{m_n}\sum_{\sigma\in N_{m_n}} \big\{ \tilde W_\sigma \tilde W_\sigma' - E[\tilde W_\sigma \tilde W_\sigma'] \big\}. \]
Lemma B.1 implies that $\hat\Gamma_n - \Gamma_0$ can be approximated by a mean-zero U-statistic of order 4 at a rate $\sqrt n$. Assumption 3.1.5 ensures that $\Gamma_0$ is finite. It follows from Assumption 3.1.1 that a Strong Law of Large Numbers for U-statistics holds, and hence $\hat\Gamma_n - \Gamma_0 = o_p(1)$ (see Serfling 2009, Theorem A, p. 190).

Part 2.
For a fixed tetrad $\sigma = \sigma\{i_1,i_2,j_1,j_2\} \in N_{m_n}$, let
\[ \hat\eta_{[l_1l_2],\tau} = \frac{1}{m_n}\sum_{\sigma\in N_{m_n}} \tilde W_\sigma\, \varphi_{l_1l_2,\tau}\Big( \frac{\hat f_{x,l_1l_2}}{\hat f_{vx,l_1l_2}} \Big), \]
for $(l_1,l_2) \in \{(i_1,j_1),(i_1,j_2),(i_2,j_1),(i_2,j_2)\}$. Next, observe that $\hat\Psi_{n,\tau}$ can be written as
\[ \hat\Psi_{n,\tau} = \big(\hat\eta_{[i_1j_1],\tau} - \hat\eta_{[i_1j_2],\tau}\big) - \big(\hat\eta_{[i_2j_1],\tau} - \hat\eta_{[i_2j_2],\tau}\big). \]
Consistent estimation of $\Psi_0$ will follow from repeated applications of Lemma B.2. It follows from Lemma B.3 that $\hat\eta_{[l_1l_2],\tau}$ can be written as
\[ \hat\eta_{[l_1l_2],\tau} = \frac{1}{m_n}\sum_{\sigma\in N_{m_n}} \tilde W_\sigma\, \varphi_{l_1l_2,\tau} \Big\{ \frac{f_{x,l_1l_2}}{f_{vx,l_1l_2}} + \frac{\hat f_{x,l_1l_2} - f_{x,l_1l_2}}{f_{vx,l_1l_2}} - \frac{f_{x,l_1l_2}}{f_{vx,l_1l_2}} \cdot \frac{\hat f_{vx,l_1l_2} - f_{vx,l_1l_2}}{f_{vx,l_1l_2}} \Big\} + o_p(1). \]
Then, Lemma B.2 yields
\begin{align*} \frac{1}{m_n}\sum_{\sigma\in N_{m_n}} \tilde W_\sigma \varphi_{l_1l_2,\tau}\Big\{\frac{\hat f_{x,l_1l_2}}{f_{vx,l_1l_2}}\Big\} &= E\Big[\tilde W_\sigma \varphi_{l_1l_2,\tau}\frac{f_{x,l_1l_2}}{f_{vx,l_1l_2}}\Big] + o_p(1), \\ \frac{1}{m_n}\sum_{\sigma\in N_{m_n}} \tilde W_\sigma \varphi_{l_1l_2,\tau}\Big\{\frac{f_{x,l_1l_2}}{f_{vx,l_1l_2}}\cdot\frac{\hat f_{vx,l_1l_2}}{f_{vx,l_1l_2}}\Big\} &= E\Big[\tilde W_\sigma \varphi_{l_1l_2,\tau}\frac{f_{x,l_1l_2}}{f_{vx,l_1l_2}}\Big] + o_p(1). \end{align*}
It follows from the previous results and the definition of $D^*_{l_1l_2,\tau}$ that
\[ \hat\eta_{[l_1l_2],\tau} = \frac{1}{m_n}\sum_{\sigma\in N_{m_n}} \tilde W_\sigma D^*_{l_1l_2,\tau} + E\big[\tilde W_\sigma D^*_{l_1l_2,\tau}\big] - E\big[\tilde W_\sigma D^*_{l_1l_2,\tau}\big] + o_p(1), \]
which is a V-statistic of order 4. It follows from Lemma B.1 that it can be approximated by a U-statistic of order 4. Assumptions 4.1.1 and 4.1.2, together with equation (6), ensure that $E[\tilde W_\sigma D^*_{l_1l_2,\tau}]$ is finite. It then follows from Assumption 3.1.1 that a Strong Law of Large Numbers for U-statistics holds, and hence
\[ \frac{1}{m_n}\sum_{\sigma\in N_{m_n}} \big\{ \tilde W_\sigma D^*_{l_1l_2,\tau} - E[\tilde W_\sigma D^*_{l_1l_2,\tau}] \big\} = o_p(1). \]
Consider next
\[ \frac{1}{m_n}\sum_{\sigma\in N_{m_n}} \tilde W_\sigma \big\{ D^*_{l_1l_2} - D^*_{l_1l_2,\tau} \big\} = \frac{1}{n(n-1)}\sum_{l_1}\sum_{l_2\ne l_1} D^*_{l_1l_2}(1-I_{\tau,l_1l_2})\,\tilde W_{l_1l_2}(\sigma), \]
where the equality follows from the definition of $D^*_{l_1l_2,\tau}$ and
\[ \tilde W_{l_1l_2}(\sigma) = \frac{1}{(n-2)(n-3)}\sum_{s_1\ne l_1,l_2}\sum_{s_2\ne l_1,l_2,s_1} \tilde W_{\sigma\{l_1,s_1;l_2,s_2\}}. \]
It follows from a Cauchy-Schwarz inequality that the expectation
\[ E\Big[\frac{1}{n(n-1)}\sum_{l_1}\sum_{l_2\ne l_1} D^*_{l_1l_2}(1-I_{\tau,l_1l_2})\tilde W_{l_1l_2}(\sigma)\Big] \]
is bounded by
\[ \frac{1}{n(n-1)}\sum_{l_1}\sum_{l_2\ne l_1} E\Big[\big(D^*_{l_1l_2}(1-I_{\tau,l_1l_2})\tilde W_{l_1l_2}(\sigma)\big)^2\Big]^{1/2} = O\Big( E\big[\tilde W_{l_1l_2}(\sigma)^2 (D^*_{l_1l_2})^2 (1-I_{\tau,l_1l_2})^2\big]\Big)^{1/2} \le \sup_\sigma(\tilde W_\sigma)^2 \sup_{l_1l_2}(D^*_{l_1l_2})^2\, O\big(E[(1-I_{\tau,l_1l_2})^2]\big)^{1/2}, \]
where the inequality follows from Assumption 4.1.1. Assumption 4.1.2 yields
\[ E[(1-I_{\tau,l_1l_2})^2] = P[I_{\tau,l_1l_2} = 0] = o(\tau). \]
Combining the results above,
\[ E\Big[\frac{1}{n(n-1)}\sum_{l_1}\sum_{l_2\ne l_1} D^*_{l_1l_2}(1-I_{\tau,l_1l_2})\tilde W_{l_1l_2}(\sigma)\Big] \le o(\tau), \]
and hence
\[ \frac{1}{m_n}\sum_{\sigma\in N_{m_n}} \big\{ \tilde W_\sigma D^*_{l_1l_2,\tau} - E[\tilde W_\sigma D^*_{l_1l_2}] \big\} = o(1). \]
Using similar steps for $\hat\eta_{[i_1j_2],\tau}$, $\hat\eta_{[i_2j_1],\tau}$, and $\hat\eta_{[i_2j_2],\tau}$ yields
\[ \hat\Psi_{n,\tau} - E\big[\tilde W_\sigma \tilde D^*_\sigma\big] = o_p(1). \]
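For intuition on the trimming step used above, the following small sketch (my own construction; the uniform support and the specific trimming rule are assumptions) shows how an indicator $I_\tau$ zeroes out observations near the boundary of the support of the special regressor, which keeps the inverse-density weights bounded:

```python
# Illustration of boundary trimming: I_tau = 0 when v is within tau of the
# boundary of its support, so the trimmed transform D*_tau omits those dyads.
import numpy as np

rng = np.random.default_rng(1)
s_v, tau = 10.0, 0.5
v = rng.uniform(-s_v, s_v, size=8)
f_v = np.full_like(v, 1.0 / (2 * s_v))

I_tau = (np.abs(v) <= s_v - tau).astype(float)  # trimming indicator
D = rng.integers(0, 2, size=8).astype(float)
D_star_tau = (D - (v > 0)) / f_v * I_tau        # trimmed transform
print(np.column_stack([v, I_tau, D_star_tau]))
```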
A.4 Proof of Theorem 4.2

Proof.
Part 1: Hájek Projection
Under Assumptions 3.1.1-3.1.5 and 4.1.1-4.1.5, it follows from the proof of Theorem 4.1 that $\hat\Gamma_n \xrightarrow{p} \Gamma_0$, and from Lemma B.3 that $\hat\eta_{[l_1l_2],\tau}$ can be written as
\[ \hat\eta_{[l_1l_2],\tau} = \frac{1}{m_n}\sum_{\sigma\in N_{m_n}} \tilde W_\sigma\, \varphi_{l_1l_2,\tau}\Big\{ \frac{f_{x,l_1l_2}}{f_{vx,l_1l_2}} + \frac{\hat f_{x,l_1l_2}-f_{x,l_1l_2}}{f_{vx,l_1l_2}} - \frac{f_{x,l_1l_2}}{f_{vx,l_1l_2}}\cdot\frac{\hat f_{vx,l_1l_2}-f_{vx,l_1l_2}}{f_{vx,l_1l_2}} \Big\} + o_p(1) \]
for $(l_1,l_2)\in\{(i_1,j_1),(i_1,j_2),(i_2,j_1),(i_2,j_2)\}$. Hence $\hat\Psi_{n,\tau} = (\hat\eta_{[i_1j_1],\tau} - \hat\eta_{[i_1j_2],\tau}) - (\hat\eta_{[i_2j_1],\tau} - \hat\eta_{[i_2j_2],\tau})$, which can be expressed as $\hat\Psi_{n,\tau} = S_{1,n\tau} + S_{2,n\tau} - S_{3,n\tau} + o_p(1)$, with
\begin{align*}
S_{1,n\tau} &= \frac{1}{m_n}\sum_{\sigma\in N_{m_n}} \tilde W_\sigma \big\{ (D^*_{i_1j_1,\tau} - D^*_{i_1j_2,\tau}) - (D^*_{i_2j_1,\tau} - D^*_{i_2j_2,\tau}) \big\}, \\
S_{2,n\tau} &= \frac{1}{m_n}\sum_{\sigma\in N_{m_n}} \tilde W_\sigma \Big\{ \Big( \varphi_{i_1j_1,\tau}\frac{\hat f_{x,i_1j_1}}{f_{vx,i_1j_1}} - \varphi_{i_1j_2,\tau}\frac{\hat f_{x,i_1j_2}}{f_{vx,i_1j_2}} \Big) - \Big( \varphi_{i_2j_1,\tau}\frac{\hat f_{x,i_2j_1}}{f_{vx,i_2j_1}} - \varphi_{i_2j_2,\tau}\frac{\hat f_{x,i_2j_2}}{f_{vx,i_2j_2}} \Big) \Big\}, \\
S_{3,n\tau} &= \frac{1}{m_n}\sum_{\sigma\in N_{m_n}} \tilde W_\sigma \Big\{ \Big( D^*_{i_1j_1,\tau}\frac{\hat f_{vx,i_1j_1}}{f_{vx,i_1j_1}} - D^*_{i_1j_2,\tau}\frac{\hat f_{vx,i_1j_2}}{f_{vx,i_1j_2}} \Big) - \Big( D^*_{i_2j_1,\tau}\frac{\hat f_{vx,i_2j_1}}{f_{vx,i_2j_1}} - D^*_{i_2j_2,\tau}\frac{\hat f_{vx,i_2j_2}}{f_{vx,i_2j_2}} \Big) \Big\}.
\end{align*}
Consider
\[ \big(\hat\Psi_{n,\tau} - E[\tilde W_\sigma \tilde D^*_{\sigma,\tau}\mid\Omega_n]\big) = \big\{ S_{1,n\tau} - E[\tilde W_\sigma \tilde D^*_{\sigma,\tau}\mid\Omega_n] \big\} + S_{2,n\tau} - S_{3,n\tau} + o_p(1); \]
it follows from Lemmas B.4, B.5, and B.6 that the Hájek projection of $\big(\hat\Psi_{n,\tau} - E[\tilde W_\sigma \tilde D^*_{\sigma,\tau}\mid\Omega_n]\big)$ into an arbitrary function of $\zeta_{i_1j_1} = (X_{i_1}, X_{j_1}, A_{i_1}, A_{j_1}, v_{i_1j_1}, U_{i_1j_1})$ is given by
\[ \big(\hat\Psi_{n,\tau} - E[\tilde W_\sigma \tilde D^*_{\sigma\tau}\mid\Omega_n]\big) = V^*_n + o_p\Big(\sqrt{\tfrac{\varrho_n}{n(n-1)}}\Big), \]
with
\begin{align*}
V^*_n &= \frac{1}{n(n-1)}\sum_{i_1=1}^n\sum_{j_1\ne i_1}\xi_{i_1j_1,\tau}, \\
\xi_{i_1j_1,\tau} &= \big\{ D^*_{i_1j_1} - E[D^*_{i_1j_1}\mid\omega_{i_1j_1}] \big\}\, I_{\tau,i_1j_1}\,\chi_{i_1j_1}, \\
\chi_{i_1j_1} &= \frac{1}{(n-2)(n-3)}\sum_{i_2\ne i_1,j_1}\sum_{j_2\ne i_1,j_1,i_2} E\big[\tilde W_{\sigma\{i_1,i_2;j_1,j_2\}}\mid X_{i_1}, X_{j_1}\big],
\end{align*}
and
\begin{align*}
\Upsilon_{n,\tau} &= n(n-1)\operatorname{Var}(V^*_n) = \frac{1}{n(n-1)}\sum_{i_1=1}^n\sum_{j_1\ne i_1}\Lambda^*_{i_1,j_1}, \\
\Lambda^*_{i_1,j_1} &= E\Big[\big\{ E[D^{*2}_{i_1j_1}\mid\omega_{i_1j_1}] - E[D^*_{i_1j_1}\mid\omega_{i_1j_1}]^2 \big\}\, I_{\tau,i_1j_1}\,\chi_{i_1j_1}\chi'_{i_1j_1}\Big], \\
\varrho_{n,\tau} &= O(\Upsilon_{n,\tau}) = O\Big( E\Big[\Big\{ \frac{p_n(\omega_{i_1j_1})[1-p_n(\omega_{i_1j_1})]}{f^2_{v|x,i_1j_1}} \Big\} I_{\tau,i_1j_1}\Big]\Big).
\end{align*}
Part 2: Bias Reduction

Consider next
\[ n(n-1)\varrho_n^{-1} E\Big[ \frac{1}{m_n}\sum_{\sigma\in N_{m_n}} \tilde W_\sigma \{\tilde D^*_{\sigma\tau} - \tilde D^*_\sigma\} \,\Big|\, \Omega_n \Big]. \]
It follows from a Cauchy-Schwarz inequality that the term above is bounded by
\[ n(n-1)\varrho_n^{-1} \Big( \frac{1}{m_n}\sum_{\sigma\in N_{m_n}} E\big[ (\tilde D^*_{\sigma\tau} - \tilde D^*_\sigma)^2 \mid \Omega_n \big] \, \tilde W_\sigma \tilde W'_\sigma \Big)^{1/2}, \]
which is equal to
\[ O\Big( n(n-1)\varrho_n^{-1} \big\{ E[D^*_{i_1j_1}D^{*\prime}_{i_1j_1}\mid\omega_{i_1j_1}] - E[D^*_{i_1j_1}\mid\omega_{i_1j_1}]E[D^*_{i_1j_1}\mid\omega_{i_1j_1}]' \big\}(I_{\tau,i_1j_1}-1)^2\, \tilde W_\sigma \tilde W'_\sigma \Big)^{1/2}. \]
Assumptions 4.1.1 and 4.1.2 yield
\[ \sup_\sigma(\tilde W_\sigma)\sup_\sigma(\tilde W_\sigma)'\, O\Big( n(n-1)\varrho_n^{-1} \Big\{ \frac{p_n(\omega_{i_1j_1})[1-p_n(\omega_{i_1j_1})]}{f^2_{v|x,i_1j_1}} \Big\}(I_{\tau,i_1j_1}-1)^2 \Big) = o(1), \]
since $(I_{\tau,i_1j_1}-1) \to 0$ as $\tau \to 0$ and $n \to \infty$. Therefore,
\[ n(n-1)\varrho_n^{-1} E\Big[\frac{1}{m_n}\sum_{\sigma\in N_{m_n}} \tilde W_\sigma\{\tilde D^*_{\sigma\tau}-\tilde D^*_\sigma\}\mid\Omega_n\Big] = o(1), \]
and so
\[ n(n-1)\varrho_n^{-1}\big( E[\tilde W_\sigma \tilde D^*_{\sigma\tau}\mid\Omega_\sigma] - E[\tilde W_\sigma \tilde D^*_\sigma\mid\Omega_\sigma] \big) = o(1). \]

Part 3: Limit Distribution of Projection
Given Assumption 3.1.2, the Hájek projection $V^*_n$ is an average of $\{\xi_{i_1j_1,\tau}\}$, which are conditionally independent given $\Omega_n = (v^n, X^n, A^n)$, with conditional mean $E[\xi_{i_1j_1,\tau}\mid\Omega_n] = 0$ and conditional variance
\[ \Upsilon_0(\Omega_n) = n(n-1)\operatorname{Var}\Big( \frac{1}{n(n-1)}\sum_{i_1=1}^n\sum_{j_1\ne i_1}\xi_{i_1j_1}\,\Big|\,\Omega_n \Big) = \frac{1}{n(n-1)}\sum_{i_1=1}^n\sum_{j_1\ne i_1}\big\{E[D^{*2}_{i_1j_1}\mid\omega_{i_1j_1}] - E[D^*_{i_1j_1}\mid\omega_{i_1j_1}]^2\big\}\, I_{\tau,i_1j_1}\,\chi_{i_1j_1}\chi'_{i_1j_1}. \]
Given Assumption 4.1.4, a conditional version of Lyapunov's Central Limit Theorem holds, and hence
\[ \Upsilon_0(\Omega_n)^{-1/2}\, \frac{1}{\sqrt{n(n-1)}}\sum_{i_1=1}^n\sum_{j_1\ne i_1}\xi_{i_1j_1,\tau} \Rightarrow N(0, I). \]
Now, it follows from Assumption 4.1.4 that $\|\Upsilon_0(\Omega_n) - \Upsilon_n\| \xrightarrow{p} 0$ as $n \to \infty$. It then follows that the limiting distribution is independent of the conditioning values, and therefore the limiting distribution continues to hold unconditionally, with $\Upsilon_n$ replacing $\Upsilon_0(\Omega_n)$. That is,
\[ \Upsilon_n^{-1/2}\, \frac{1}{\sqrt{n(n-1)}}\sum_{i_1=1}^n\sum_{j_1\ne i_1}\xi_{i_1j_1,\tau} \Rightarrow N(0, I). \]

Part 4: Limiting Distribution of $\hat\theta_n$

Consider the matrix $\Sigma_n$, defined as $\Sigma_n = \Gamma_0^{-1} \times \Upsilon_n \times \Gamma_0^{-1}$. The limiting distribution of $\hat\theta_n$ follows from the previous results on $\hat\Gamma_n^{-1}$, $\hat\Psi_{n,\tau}$, and $\Sigma_n$, and from applying Slutsky's theorem. In other words,
\begin{align*}
\sqrt{n(n-1)}\,\Sigma_n^{-1/2}\big(\hat\theta_n - \theta_0\big) &= \sqrt{n(n-1)}\,\Sigma_n^{-1/2} \times \hat\Gamma_n^{-1}\,\frac{1}{m_n}\sum_{\sigma\in N_{m_n}}\big\{\tilde W_\sigma \tilde D^*_{\sigma,\tau} - E[\tilde W_\sigma\tilde D^*_\sigma\mid\Omega_\sigma]\big\} \\
&= \Gamma_0^{1/2} \times \Upsilon_n^{-1/2} \times \Gamma_0^{-1/2} \times \frac{1}{\sqrt{n(n-1)}}\sum_{i_1=1}^n\sum_{j_1\ne i_1}\xi_{i_1j_1,\tau} + o_p(1) \Rightarrow N(0, I).
\end{align*}
The proof is complete.
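To make the sandwich structure of $\Sigma_n$ concrete, a short numerical sketch (my own illustration; `Gamma_hat` and `Upsilon_hat` are hypothetical placeholders standing in for the plug-in estimates, not quantities defined in the paper):

```python
# Sketch of the sandwich form Sigma = Gamma^{-1} Upsilon Gamma^{-1} used
# for inference on theta_hat, with placeholder plug-in matrices.
import numpy as np

rng = np.random.default_rng(2)
k, n = 3, 100                                     # dim(theta), network size

Gamma_hat = np.eye(k) + 0.1 * rng.normal(size=(k, k))
Gamma_hat = (Gamma_hat + Gamma_hat.T) / 2         # symmetrize the placeholder
Upsilon_hat = 2.0 * np.eye(k)                     # placeholder for Upsilon_n

Gamma_inv = np.linalg.inv(Gamma_hat)
Sigma_hat = Gamma_inv @ Upsilon_hat @ Gamma_inv   # sandwich variance

se = np.sqrt(np.diag(Sigma_hat) / (n * (n - 1)))  # dyadic rate scaling
print(se)
```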
Technical Appendix
B.1 Equivalent representation for V-statistics
The following lemma provides a U-statistic representation for a V-statistic when the kernel varies with $n$. Given $n$ and for $m \le n$, let $\sum_{(n,m)}$ denote the sum over the $\binom{n}{m}$ combinations of $m$ distinct elements $(i_1,\dots,i_m)$ from $(1,\dots,n)$, and let $\sum_{\Pi_{m!}}$ denote the sum over the $m!$ permutations $(i_1,\dots,i_m)$ of $(1,\dots,m)$. Let $V_n$ be a V-statistic of order $m$, without common indices,
\[ V_n = \frac{1}{n^m}\sum_{i_1,\dots,i_m=1}^n \frac{1}{h^L}\,\gamma(X_{i_1},\dots,X_{i_m})\,1[i_1\ne\cdots\ne i_m], \]
where $h\to0$ as $n\to\infty$ and $\gamma:\mathbb{R}^L\to\mathbb{R}$. Let
\[ U_n = \binom{n}{m}^{-1}\sum_{(n,m)}\phi_h(X_{i_1},\dots,X_{i_m}), \qquad \phi_h(X_1,\dots,X_m) = \frac{1}{m!}\sum_{\Pi_{m!}}\frac{1}{h^L}\,\gamma(X_{\pi_1},\dots,X_{\pi_m}). \]

Lemma B.1.
Suppose that $E\|\gamma(X_{i_1},\dots,X_{i_m})\|^2 < \infty$ for all $1 \le i_1,\dots,i_m \le m$ and $m \le n$, and $nh^L \to \infty$. Then, $V_n - U_n = o_p(1)$.

Proof.
Let $\gamma_h(X_{i_1},\dots,X_{i_m}) = h^{-L}\gamma(X_{i_1},\dots,X_{i_m})$, and notice that
\begin{align}
n^m V_n &= \sum_{(n,m)}\sum_{\Pi_{m!}}\gamma_h(X_{\pi_1},\dots,X_{\pi_m}) \tag{13} \\
&= [n(n-1)\cdots(n-m+1)]\,\binom{n}{m}^{-1}\sum_{(n,m)}\phi_h(X_{i_1},\dots,X_{i_m}) = [n(n-1)\cdots(n-m+1)]\,U_n, \nonumber
\end{align}
and hence $(U_n - V_n) = O(n^{-1})\,U_n$. Consider now
\[ E[(U_n-V_n)^2] = O\Big(\frac{1}{n^2}\Big)\,E[U_n^2], \qquad E[U_n^2] = \binom{n}{m}^{-2}E\Big[\Big(\sum_{(n,m)}\phi_h(X_{i_1},\dots,X_{i_m})\Big)^2\Big] \le \binom{n}{m}^{-2}\binom{n}{m}^2 E\big[\phi_h(X_{i_1},\dots,X_{i_m})^2\big], \]
where
\[ E\big[\phi_h(X_{i_1},\dots,X_{i_m})^2\big] = \frac{1}{h^{2L}}\,O\big(E[\gamma(X_{i_1},\dots,X_{i_m})^2]\big) = O\Big(\frac{1}{h^{2L}}\Big), \]
since $E\|\gamma(X_{i_1},\dots,X_{i_m})\|^2 < \infty$ by assumption. Hence,
\[ E[(U_n-V_n)^2] \le O\Big(\frac{1}{(nh^L)^2}\Big) = o(1) \]
as $nh^L \to \infty$. Notice that, unlike Lemma 5.7.3 in Serfling (2009, page 206) and Theorem 1 in Lee (2019, page 183), in equation (13) the average of terms with at least one common index is equal to zero due to the specification of the V-statistic without common indices.
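A quick numerical check of the lemma for a fixed second-order kernel (my own illustration; the kernel $\gamma(x,y) = xy$ is an arbitrary choice):

```python
# Compare the V-statistic without common indices to the U-statistic of
# the symmetrized kernel; their difference is O(1/n) * U_n, as in Lemma B.1.
import itertools
import numpy as np

rng = np.random.default_rng(3)
n, m = 200, 2
X = rng.normal(size=n)

def gamma(x, y):
    return x * y

# V-statistic: average over ordered pairs i != j, normalized by n^m
V_n = sum(gamma(X[i], X[j]) for i in range(n)
          for j in range(n) if i != j) / n**m

# U-statistic: average of the symmetrized kernel over unordered pairs
U_n = np.mean([(gamma(X[i], X[j]) + gamma(X[j], X[i])) / 2
               for i, j in itertools.combinations(range(n), m)])

print(V_n, U_n, V_n - U_n)   # here V_n = (1 - 1/n) U_n exactly
```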
B.2 Consistency for V-statistics

Lemma B.2.
Suppose that the Assumptions in Theorem 4.1 hold. Then
\begin{align*}
\frac{1}{m_n}\sum_{\sigma\in N_{m_n}} \tilde W_\sigma\varphi_{l_1l_2,\tau}\Big\{\frac{\hat f_{x,l_1l_2}}{f_{vx,l_1l_2}}\Big\} - E\Big[\tilde W_\sigma\varphi_{l_1l_2,\tau}\frac{f_{x,l_1l_2}}{f_{vx,l_1l_2}}\Big] &= o_p(1), \\
\frac{1}{m_n}\sum_{\sigma\in N_{m_n}} \tilde W_\sigma\varphi_{l_1l_2,\tau}\Big\{\frac{f_{x,l_1l_2}}{f_{vx,l_1l_2}}\cdot\frac{\hat f_{vx,l_1l_2}}{f_{vx,l_1l_2}}\Big\} - E\Big[\tilde W_\sigma\varphi_{l_1l_2,\tau}\frac{f_{x,l_1l_2}}{f_{vx,l_1l_2}}\Big] &= o_p(1),
\end{align*}
with $(l_1,l_2)\in\{(i_1,j_1),(i_1,j_2),(i_2,j_1),(i_2,j_2)\}$ for a given tetrad $\sigma\{i_1,i_2,j_1,j_2\}\in N_{m_n}$.

Proof. This proof focuses on the first result, since the second one follows from similar arguments. Let
\[ \hat V_n = \frac{1}{m_n}\sum_{\sigma\in N_{m_n}} \tilde W_\sigma\,\frac{\varphi_{l_1l_2,\tau}}{f_{vx,l_1l_2}}\,\hat f_{x,l_1l_2}, \]
where $\hat f_{x,l_1l_2}$ is defined as
\[ \hat f_{x,l_1l_2} = \frac{1}{(n-2)(n-3)}\sum_{k_1\ne i_1,j_1}\sum_{k_2\ne i_1,j_1,k_1}\frac{1}{h^{2L}}\,K_{x,h}(X_{k_1}-X_{l_1},\, X_{k_2}-X_{l_2}). \]
Plugging $\hat f_{x,l_1l_2}$ into $\hat V_n$ yields the following V-statistic of order six:
\[ \Big[6!\binom{n}{6}\Big]^{-1}\sum_{i_1\ne i_2\ne j_1\ne j_2\ne k_1\ne k_2}\frac{1}{h^{2L}}\,\tilde W_{i_1i_2;j_1j_2}\,\frac{\varphi_{l_1l_2,\tau}}{f_{vx,l_1l_2}}\,K_{x,h}(X_{k_1}-X_{l_1},\, X_{k_2}-X_{l_2}). \]
Assumptions 4.1.1 and 4.1.5 imply that
\[ E\Big\|\frac{1}{h^{2L}}\,\tilde W_{i_1i_2;j_1j_2}\,\frac{\varphi_{l_1l_2,\tau}}{f_{vx,l_1l_2}}\,K_{x,h}(X_{k_1}-X_{l_1},\, X_{k_2}-X_{l_2})\Big\|^2 < \infty; \]
it then follows from Lemma B.1 that $\hat V_n$ is asymptotically equivalent to a sixth-order U-statistic as $nh^{2L}\to\infty$. In particular, $(U_n - \hat V_n) = o_p(1)$, where
\[ U_n = \binom{n}{6}^{-1}\sum_{i_1<\cdots<i_6}\phi_h(X_{i_1},\dots,X_{i_6}) \]
denotes the corresponding symmetrized U-statistic. A Strong Law of Large Numbers for U-statistics (Assumption 3.1.1), together with standard results on the convergence of kernel-weighted U-statistics, then gives $U_n - E[\tilde W_\sigma\varphi_{l_1l_2,\tau} f_{x,l_1l_2}/f_{vx,l_1l_2}] = o_p(1)$, which yields the first result.
B.3 Lemmas for Asymptotic Normality Theorem
Notation
The following notation will prove useful in Lemmas B.3-B.6. For any finite $n$, let $\Omega_n = \{X^n, A^n, v^n\}$. Given a fixed tetrad $\sigma\{i_1,i_2,j_1,j_2\}\in N_{m_n}$, let $X_\sigma = \{X_{i_1}, X_{i_2}, X_{j_1}, X_{j_2}\}$, $A_\sigma = \{A_{i_1}, A_{i_2}, A_{j_1}, A_{j_2}\}$, $v_\sigma = \{v_{i_1}, v_{i_2}, v_{j_1}, v_{j_2}\}$, and $\Omega_\sigma = \{X_\sigma, A_\sigma, v_\sigma\}$; for any dyad $(l_1,l_2)\in\{(i_1,j_1),(i_1,j_2),(i_2,j_1),(i_2,j_2)\}$, define
\[ \omega_{l_1l_2} = \{X_{l_1}, X_{l_2}, A_{l_1}, A_{l_2}, v_{l_1l_2}\}, \qquad T^{\dagger}_{l_1l_2} = T_{l_1l_2} - E[\tilde W_\sigma\tilde D^*_{\sigma,\tau}\mid\Omega_\sigma] \ \text{for any random variable } T_{l_1l_2}. \]

Lemma B.3.
Suppose that the Assumptions in Theorem 4.2 hold, and consider
\[ \hat\eta_{[l_1l_2],\tau} = \frac{1}{m_n}\sum_{\sigma\in N_{m_n}}\tilde W_\sigma\,\varphi_{l_1l_2,\tau}\Big(\frac{\hat f_{x,l_1l_2}}{\hat f_{vx,l_1l_2}}\Big), \]
with $(l_1,l_2)\in\{(i_1,j_1),(i_1,j_2),(i_2,j_1),(i_2,j_2)\}$. It follows that $\hat\eta_{[l_1l_2],\tau}$ can be written as
\[ \hat\eta_{[l_1l_2],\tau} = \frac{1}{m_n}\sum_{\sigma\in N_{m_n}}\tilde W_\sigma\,\varphi_{l_1l_2,\tau}\Big\{\frac{f_{x,l_1l_2}}{f_{vx,l_1l_2}} + \frac{\hat f_{x,l_1l_2}-f_{x,l_1l_2}}{f_{vx,l_1l_2}} - \frac{f_{x,l_1l_2}}{f_{vx,l_1l_2}}\cdot\frac{\hat f_{vx,l_1l_2}-f_{vx,l_1l_2}}{f_{vx,l_1l_2}}\Big\} + o_p(1).
\]
Given h → n − δ h L +1 → ∞ for any δ >
0, it follows from a variance calculationargument that sup ( v,x,x ′ ) ∈ Ω v,x | ˆ f vx ( v, x, x ′ ) − f vx ( v, x, x ′ ) | = o p (1)sup ( x,x ′ ) ∈ Ω x | ˆ f x ( x, x ′ ) − f x ( x, x ′ ) | = o p (1) , for any δ >
0. See, e.g., Silverman (1978), Collomb and H¨ardle (1986),Aradillas-Lopez (2010), andfor applications to network models Leung (2015b) and Graham et al. (2019).Consider a second order Taylor expansion of b f x,l l / b f vx,l l around f x,l l /f vx,l l . The quadraticterms in the expansion involve second order derivatives of f x,l l /f vx,l l evaluated at ˜ f x,l l and˜ f vx,l l , where ˜ f x,l l lies between b f x,l l and f x,l l , and similarly ˜ f vx,l l lies between b f vx,l l and f vx,l l . By substituting a second order Taylor expansion of b f x,l l / b f vx,l l around f x,l l /f vx,l l into b η [ l l ] ,τ , I obtain b η [ l l ] ,τ = 1 m n X σ ∈N mn ˜ W σ ϕ l l ,τ ( f x,l l f vx,l l + b f x,l l − f x,l l f vx,l l − f x,l l f vx,l l × b f vx,l l − f vx,l l f vx,l l ) + R n , where R n denotes the reminder term. The result follows from showing that R n = o p (1).The first component of R n is1 m n X σ ∈N mn ˜ W σ ϕ l l ,τ ˜ f x,l l (cid:16) b f vx,l l − f vx,l l (cid:17) ˜ f vx,l l ≤ " sup ( x,x ′ ) ∈ Ω x | f x | sup ( v,x,x ′ ) ∈ Ω vx | f − vx | sup ( v,x,x ′ ) ∈ Ω vx | b f vx − f vx | m n X σ ∈N mn || ˜ W σ ϕ l l ,τ || = O p (1) " sup ( v,x,x ′ ) | b f vx − f vx | = o p (1) . The first inequality follows from Assumption 4.1.1. The equality follows from the fact that theV-statistic inside the parenthesis converges to its expectation given that Assumptions 3.1.1 and4.1.1. The result follows from the uniform convergence of the kernel estimator.43he remaining component of R n is1 m n X σ ∈N mn ˜ W σ ϕ l l ,τ ( ( b f vx,l l − f vx,l l )( b f x,l l − f x,l l ) f vx,l l ) ≤ " sup ( v,x,x ) ∈ Ω vx | f − vx | sup ( v,x,x ) ∈ Ω vx | b f vx − f vx | sup ( x,x ) ∈ Ω x | b f x − f x | × m n X σ ∈N mn || ˜ W σ ϕ l l ,τ || = O p (1) " sup ( v,x,x ) ∈ Ω vx | b f vx − f vx | sup ( x,x ) ∈ Ω vx | b f x − f x | . = o p (1) . The result follows from the uniform convergence of the kernel estimators. This completes the proof.
Lemma B.4.
Under the same Assumptions of Theorem 4.2, it follows that the Hájek projection of
\[ S^{\dagger}_{1,n\tau} = S_{1,n\tau} - E[\tilde W_\sigma \tilde D^*_{\sigma,\tau}\mid\Omega_n] = \frac{1}{m_n}\sum_{\sigma\in N_{m_n}} \big\{ \tilde W_\sigma \tilde D^*_{\sigma,\tau} - E[\tilde W_\sigma \tilde D^*_{\sigma,\tau}\mid\Omega_\sigma] \big\} \]
into an arbitrary function of $\zeta_{i_1j_1} = (X_{i_1}, X_{j_1}, A_{i_1}, A_{j_1}, v_{i_1j_1}, U_{i_1j_1})$ is given by
\[ V^*_{1,n\tau} = \frac{1}{n(n-1)}\sum_{i_1=1}^n\sum_{j_1\ne i_1}\xi_{i_1j_1,\tau}, \]
and
\[ n(n-1)\,\Upsilon_n^{-1/2}\, E\big[(S^{\dagger}_{1,n\tau}-V^*_{1,n\tau})(S^{\dagger}_{1,n\tau}-V^*_{1,n\tau})'\big]\,\Upsilon_n^{-1/2} = o(1), \]
where $\Upsilon_n = n(n-1)\operatorname{Var}(V^*_{1,n\tau})$ and $\operatorname{Var}(V^*_{1,n\tau}) = O_p\big(\varrho_{n\tau}/(n(n-1))\big)$.

Proof.

Step 1. Hájek Projection

Consider the tetrad $\sigma\{i_1,i_2,j_1,j_2\}$ and let
\[ s(\sigma\{i_1,i_2,j_1,j_2\}) = \tilde W_\sigma \tilde D^*_{\sigma,\tau} - E[\tilde W_\sigma \tilde D^*_{\sigma,\tau}\mid\Omega_\sigma] = \tilde W_\sigma\big\{\tilde D^*_{\sigma,\tau} - E[\tilde D^*_{\sigma,\tau}\mid\Omega_\sigma]\big\}. \]
Then
\[ E[s(\sigma\{i_1,i_2,j_1,j_2\})\mid\zeta_{i_1j_1}] = \big\{D^*_{i_1j_1,\tau} - E[D^*_{i_1j_1,\tau}\mid\omega_{i_1j_1}]\big\}\, E[\tilde W_\sigma\mid X_{i_1}, X_{j_1}], \]
where the equality follows from the Law of Iterated Expectations and Assumptions 3.1.1 and 3.1.2. To be precise, observe that for $\{l_1,l_2\}\ne\{i_1,j_1\}$ with $(l_1,l_2)\in\{(i_1,j_2),(i_2,j_1),(i_2,j_2)\}$,
\[ E\big[\tilde W_\sigma\{D^*_{l_1l_2,\tau} - E[D^*_{l_1l_2,\tau}\mid\Omega_\sigma]\}\mid\zeta_{i_1j_1}\big] = E\big[\tilde W_\sigma\{E[D^*_{l_1l_2,\tau}\mid\omega_{l_1l_2}] - E[D^*_{l_1l_2,\tau}\mid\omega_{l_1l_2}]\}\mid\zeta_{i_1j_1}\big] = 0. \]
It then follows that the Hájek projection is given by
\[ V^*_{1,n\tau} = \frac{1}{n(n-1)}\sum_{i_1=1}^n\sum_{j_1\ne i_1}\xi_{i_1j_1,\tau}, \]
with
\[ \xi_{i_1j_1,\tau} = \big\{D^*_{i_1j_1} - E[D^*_{i_1j_1}\mid\omega_{i_1j_1}]\big\}\,I_{\tau,i_1j_1}\,\chi_{i_1j_1}, \qquad \chi_{i_1j_1} = \frac{1}{(n-2)(n-3)}\sum_{i_2\ne i_1,j_1}\sum_{j_2\ne i_1,j_1,i_2} E\big[\tilde W_{\sigma\{i_1,i_2;j_1,j_2\}}\mid X_{i_1}, X_{j_1}\big]. \]
Notice that $E[V^*_{1,n\tau}] = E[\xi_{i_1j_1,\tau}] = 0$.

Step 2. Variance of Hájek Projection

For two different dyads $\{i_1,j_1\}\ne\{i_1',j_1'\}$ with zero common indices, Assumption 3.1.1 implies that
\[ E[\xi_{i_1j_1,\tau}\,\xi'_{i_1'j_1',\tau}] = E[\xi_{i_1j_1,\tau}]\,E[\xi_{i_1'j_1',\tau}]' = 0. \]
For two dyads $\{i_1,j_1\}\ne\{i_1,j_1'\}$ with one common index, the conditionally independent formation of links implied by Assumption 3.1.2 yields
\[ E[\xi_{i_1j_1,\tau}\,\xi'_{i_1'j_1',\tau}] = E\big[E[\xi_{i_1j_1,\tau}\mid\Omega_n]\,E[\xi_{i_1'j_1',\tau}\mid\Omega_n]'\big] = 0. \]
Hence the variance of $V^*_{1,n\tau}$ is given by
\[ \operatorname{Var}(V^*_{1,n\tau}) = \Big\{\frac{1}{n(n-1)}\Big\}^2\sum_{i_1=1}^n\sum_{j_1\ne i_1} E[\xi_{i_1j_1,\tau}\xi'_{i_1j_1,\tau}] = \Big\{\frac{1}{n(n-1)}\Big\}^2\sum_{i_1=1}^n\sum_{j_1\ne i_1}\Lambda^*_{i_1,j_1}, \]
where
\[ \Lambda^*_{i_1,j_1} = E\Big[\big\{E[D^{*2}_{i_1j_1}\mid\omega_{i_1j_1}] - E[D^*_{i_1j_1}\mid\omega_{i_1j_1}]^2\big\}\,I_{\tau,i_1j_1}\,\chi_{i_1j_1}\chi'_{i_1j_1}\Big]. \]
Define $\Upsilon_{n,\tau} = n(n-1)\operatorname{Var}(V^*_{1,n\tau}) = \frac{1}{n(n-1)}\sum_{i_1}\sum_{j_1\ne i_1}\Lambda^*_{i_1,j_1}$.

Step 3. Variance of $S^{\dagger}_{1,n\tau}$

Given two different tetrads $\sigma\{i_1,i_2,j_1,j_2\}$ and $\sigma'\{i_1',i_2',j_1',j_2'\}$, let
\[ \Delta_{c,n} = \operatorname{Cov}\big(s(\sigma\{i_1,i_2,j_1,j_2\}),\, s(\sigma'\{i_1',i_2',j_1',j_2'\})\big) \]
denote the covariance between $s(\sigma)$ and $s(\sigma')$ when the two tetrads have $c = 0,1,2,3,4$ common indices. Note, since $E[s(\sigma\{i_1,i_2,j_1,j_2\})\mid\Omega_\sigma] = 0$, that $\Delta_{0,n} = \Delta_{1,n} = 0$. Consider
\begin{align*} \Delta_{2,n} &= E\big[s(\sigma\{i_1,i_2,j_1,j_2\})\,s(\sigma'\{i_1,i_2',j_1,j_2'\})'\big] \\ &= E\Big[\big\{\tilde D^*_{\sigma,\tau} - E[\tilde D^*_{\sigma,\tau}\mid\Omega_\sigma]\big\}\big\{\tilde D^*_{\sigma',\tau} - E[\tilde D^*_{\sigma',\tau}\mid\Omega_{\sigma'}]\big\}\,\tilde W_\sigma\tilde W'_{\sigma'}\Big] \\ &= E\Big[\big\{E[\tilde D^{*2}_{i_1j_1,\tau}\mid\omega_{i_1j_1}] - E[\tilde D^*_{i_1j_1,\tau}\mid\omega_{i_1j_1}]^2\big\}\,I_{\tau,i_1j_1}\,\tilde W_\sigma\tilde W'_{\sigma'}\Big]. \end{align*}
It follows from the results above that $\operatorname{Var}(S^{\dagger}_{1,n\tau})$ can be expanded as
\[ \operatorname{Var}(S^{\dagger}_{1,n\tau}) = \Big(\frac{1}{m_n}\Big)^2\sum_{\sigma\in N_{m_n}}\sum_{\sigma'\in N_{m_n}} E\big[s(\sigma)s(\sigma')'\big] = \Big(\frac{1}{m_n}\Big)^2\sum_{i_1=1}^n\sum_{j_1\ne i_1}\sum_{k_1\ne i_1,j_1}\sum_{k_2\ne i_1,j_1,k_1}\sum_{l_1\ne i_1,j_1}\sum_{l_2\ne i_1,j_1,l_1}\Delta_{2,n} \]
plus terms of strictly smaller order coming from pairs of tetrads with three or four common indices ($\Delta_{3,n}$ and $\Delta_{4,n}$). Notice that the term inside the brackets, scaled by $[(n-2)(n-3)]^{-2}$, is equivalent to $\Lambda^*_{i_1j_1}$; in particular,
\[ \Lambda^*_{i_1j_1} = \Big\{\frac{1}{(n-2)(n-3)}\Big\}^2\sum_{k_1\ne i_1,j_1}\sum_{k_2\ne i_1,j_1,k_1}\sum_{l_1\ne i_1,j_1'}\sum_{l_2\ne i_1,j_1',l_1}\Delta_{2,n} = E\Big[\big\{E[\tilde D^{*2}_{i_1j_1,\tau}\mid\omega_{i_1j_1}] - E[\tilde D^*_{i_1j_1,\tau}\mid\omega_{i_1j_1}]^2\big\}\,I_{\tau,i_1j_1}\,\chi_{i_1j_1}\chi'_{i_1j_1}\Big], \]
which follows from the definition of $\chi_{i_1j_1}$. Hence,
\[ \operatorname{Var}(S^{\dagger}_{1,n\tau}) = \Big\{\frac{1}{n(n-1)}\Big\}^2\sum_{i_1=1}^n\sum_{j_1\ne i_1}\Lambda^*_{i_1,j_1} + o(1), \]
and $\operatorname{Var}(V^*_{1,n\tau}) - \operatorname{Var}(S^{\dagger}_{1,n\tau}) = o(1)$.

Step 4. Asymptotic Equivalence

To show that
\[ n(n-1)\,\Upsilon_{n,\tau}^{-1/2}\,E\big[(S^{\dagger}_{1,n\tau}-V^*_{1,n\tau})(S^{\dagger}_{1,n\tau}-V^*_{1,n\tau})'\big]\,\Upsilon_{n,\tau}^{-1/2} = o(1), \]
it is sufficient to prove that $\operatorname{Var}(V^*_{1,n\tau})^{-1/2}\operatorname{Cov}(V^*_{1,n\tau}, S_{1,n\tau})\operatorname{Var}(V^*_{1,n\tau})^{-1/2} = I$, which in turn follows from noticing that
\[ \operatorname{Cov}(V^*_{1,n\tau}, S^{\dagger}_{1,n\tau}) = E\big[V^*_{1,n\tau}(S^{\dagger}_{1,n\tau}-V^*_{1,n\tau})'\big] + E\big[V^*_{1,n\tau}(V^*_{1,n\tau})'\big] = \operatorname{Var}(V^*_{1,n\tau}), \]
since, by construction of the orthogonal projection, $E[V^*_{1,n\tau}(S_{1,n\tau}-V^*_{1,n\tau})'] = 0$.

Lemma B.5.
Under the same Assumptions of Theorem 4.2, it follows that the Hájek projection of $S^{\dagger}_{2,n\tau} = S_{2,n\tau} - E[\tilde W_\sigma \tilde D^*_{\sigma,\tau}\mid\Omega_\sigma]$, with
\[ S_{2,n\tau} = \frac{1}{m_n}\sum_{\sigma\in N_{m_n}} \tilde W_\sigma\Big\{ \Big(\varphi_{i_1j_1,\tau}\frac{\hat f_{x,i_1j_1}}{f_{vx,i_1j_1}} - \varphi_{i_1j_2,\tau}\frac{\hat f_{x,i_1j_2}}{f_{vx,i_1j_2}}\Big) - \Big(\varphi_{i_2j_1,\tau}\frac{\hat f_{x,i_2j_1}}{f_{vx,i_2j_1}} - \varphi_{i_2j_2,\tau}\frac{\hat f_{x,i_2j_2}}{f_{vx,i_2j_2}}\Big) \Big\}, \]
into an arbitrary function of $\zeta_{i_1j_1} = (X_{i_1}, X_{j_1}, A_{i_1}, A_{j_1}, v_{i_1j_1}, U_{i_1j_1})$ is given by
\[ V^*_{2,n\tau} = \frac{1}{n(n-1)}\sum_{i_1=1}^n\sum_{j_1\ne i_1}\bar\xi_{i_1j_1,\tau}, \]
and $n\,\Upsilon_n^{-1/2}\,E[(S_{2,n\tau}-V^*_{2,n\tau})(S_{2,n\tau}-V^*_{2,n\tau})']\,\Upsilon_n^{-1/2} = o(1)$, where $\Upsilon_n = n\operatorname{Var}(V^*_{2,n\tau})$.

Proof. Similarly to the definition for tetrads, I introduce the function $\sigma_6 = \sigma\{i_1,i_2,j_1,j_2,k_1,k_2\}$ that maps each unique 6-tuple $\{i_1,i_2,j_1,j_2,k_1,k_2\}$ into an index set $N_{m_{6,n}} = \{1,\dots,m_{6,n}\}$, where $m_{6,n}$ denotes the total number of such 6-tuples. Consider a fixed 6-tuple $\{i_1,i_2,j_1,j_2,k_1,k_2\}$ and define
\[ s_{i_1,j_1}(\sigma_6) = \tilde W_{i_1i_2,j_1j_2}\Big\{ \frac{1}{h^{2L}}\frac{\varphi_{i_1j_1,\tau}}{f_{vx,i_1j_1}}\,K_{x,h}(X_{k_1}-X_{i_1},\, X_{k_2}-X_{j_1}) - E[D^*_{i_1j_1,\tau}\mid\Omega_{i_1i_2,j_1j_2}] \Big\}, \]
with $s_{i_1,j_2}(\sigma_6)$, $s_{i_2,j_1}(\sigma_6)$, and $s_{i_2,j_2}(\sigma_6)$ defined analogously, and let $s_{2,n}(\sigma_6) = s_{i_1,j_1}(\sigma_6) - s_{i_1,j_2}(\sigma_6) - s_{i_2,j_1}(\sigma_6) + s_{i_2,j_2}(\sigma_6)$. It follows that $S^{\dagger}_{2,n\tau}$ can be written as
\[ S^{\dagger}_{2,n\tau} = \frac{1}{m_{6,n}}\sum_{\sigma_6\in N_{m_{6,n}}} s_{2,n\tau}(\sigma_6). \]

Step 1. Hájek Projection

The rest of the proof makes use of the following index notation for dyads. Given the total number $n_2 = n(n-1)$ of ordered dyads, let $\pi = 1, 2, \dots$ index the $n_2$ ordered dyads in the sample. In an abuse of notation, also let $\pi$ denote the set $\{i_1, j_1\}$, where $i_1$ and $j_1$ are the indices that comprise dyad $\pi$; in particular, $\pi(1) = i_1$ and $\pi(2) = j_1$ when $\pi = \{i_1, j_1\}$. With this notation at hand, and using the symmetry of the kernel, $S^{\dagger}_{2,n\tau}$ can be expressed as a normalized sum over dyad triples $(\pi_1,\pi_2,\pi_3)$ of the terms
\[ p_{\pi_1,\pi_2}(\sigma_6) - p_{\pi_1(1)\pi_2(2),\pi_2}(\sigma_6) - p_{\pi_2(1)\pi_1(2),\pi_2}(\sigma_6) + p_{\pi_2,\pi_2}(\sigma_6), \]
where each $p$-term collects the kernel-weighted components of $s_{2,n}$; for instance,
\[ p_{\pi_1,\pi_2}(\sigma_6) = \frac{1}{h^{2L}}\Big( \frac{\varphi_{\pi_1,\tau}}{f_{vx,\pi_1}}\tilde W_{\pi_1,\pi_2} + \frac{\varphi_{\pi_2,\tau}}{f_{vx,\pi_2}}\tilde W_{\pi_2,\pi_1} \Big) K_{x,h}(X_{\pi_1}-X_{\pi_2}) - E[\tilde W_{\pi_1,\pi_2} D^*_{\pi_1,\tau}\mid\Omega_{\pi_1,\pi_2}] - E[\tilde W_{\pi_2,\pi_1} D^*_{\pi_2,\tau}\mid\Omega_{\pi_2,\pi_1}], \]
with $K_{x,h}(X_{\pi_1}-X_{\pi_2})$ denoting $K_{x,h}(X_{\pi_1(1)}-X_{\pi_2(1)},\, X_{\pi_1(2)}-X_{\pi_2(2)})$ and $\chi_{\pi_1} = E[\tilde W_{\pi_1,\pi_2}\mid X_{\pi_1}]$. To compute the Hájek projection into an arbitrary function of $\zeta_{\pi_1}$, consider first $E[p_{\pi_1,\pi_2}(\sigma_6)\mid\zeta_{\pi_1}]$. The following results are useful:
\begin{align*} E\big[E[\tilde W_{\pi_1,\pi_2} D^*_{\pi_1,\tau}\mid\omega_{\pi_1}]\mid\zeta_{\pi_1}\big] &= E[D^*_{\pi_1,\tau}\mid\omega_{\pi_1}]\,E[\tilde W_{\pi_1,\pi_2}\mid X_{\pi_1}] = E[D^*_{\pi_1,\tau}\chi_{\pi_1}\mid\omega_{\pi_1}], \\ E\big[E[\tilde W_{\pi_2,\pi_1} D^*_{\pi_2,\tau}\mid\omega_{\pi_2}]\mid\zeta_{\pi_1}\big] &= E\big[E[D^*_{\pi_2,\tau}\mid\omega_{\pi_2}]\,E[\tilde W_{\pi_2,\pi_1}\mid X_{\pi_2}]\big] = E[D^*_{\pi_2,\tau}\chi_{\pi_2}]. \end{align*}
Furthermore, letting $\Xi(X_{\pi_1}) = E[D^*_{\pi_2,\tau}\chi_{\pi_2}\mid X_{\pi_1}]$, a change of variables $\nu = h^{-1}(X_{\pi_2}-X_{\pi_1})$ with Jacobian $h^{2L}$ gives
\[ \int \Big\{ \frac{\varphi_{\pi_1,\tau}}{f_{vx,\pi_1}}\chi_{\pi_1}\big(f_x(X_{\pi_1}+h\nu) - f_x(X_{\pi_1})\big) + \big(\Xi(X_{\pi_1}+h\nu) - \Xi(X_{\pi_1})\big) \Big\} K_x(\nu)\,d\nu = o(h^{2M}), \]
since Assumptions 4.1.1, 4.1.3, and 4.1.5 guarantee that $f_x(X_{\pi_1})$ and $\Xi(X_{\pi_1})$ are continuous and $M$-times differentiable with respect to all of their arguments, and $K_x$ is a bias-reducing kernel of order $2M$. Observe that $(\varphi_{\pi_1,\tau}/f_{vx,\pi_1})\chi_{\pi_1} f_x(X_{\pi_1}) = 0$ holds for any $X_{\pi_1}$ within a $\tau$ distance of the boundary of $S_x$, and having $h/\tau\to0$ ensures that the change of variables is not affected by boundary effects. The previous results, and Assumption 4.1.5, yield
\[ E[p_{\pi_1,\pi_2}(\sigma_6)\mid\zeta_{\pi_1}] = D^*_{\pi_1,\tau}\chi_{\pi_1} + E[D^*_{\pi_1,\tau}\chi_{\pi_1}\mid X_{\pi_1}] - E[D^*_{\pi_1,\tau}\chi_{\pi_1}\mid\omega_{\pi_1}] - E[D^*_{\pi_1,\tau}\chi_{\pi_1}] + o(1). \]
For the remaining dyads $\pi_s\in\{(\pi_1(1),\pi_2(2)), (\pi_2(1),\pi_1(2)), \pi_2\}$, analogous arguments based on Assumptions 3.1.1, 3.1.2, 4.1.3, and the properties of the bias-reducing kernel (Assumption 4.1.5) give
\[ E[p_{\pi_s,\pi_2}(\sigma_6)\mid\zeta_{\pi_1}] = E[D^*_{\pi_1,\tau}\chi_{\pi_1}\mid X_{\pi_1}] - E[D^*_{\pi_1,\tau}\chi_{\pi_1}] + O(h^{2M}). \]
Using the previous results, it follows that
\[ E\big[ p_{\pi_1,\pi_2} - p_{\pi_1(1)\pi_2(2),\pi_2} - p_{\pi_2(1)\pi_1(2),\pi_2} + p_{\pi_2,\pi_2} \mid \zeta_{\pi_1} \big] = \big\{D^*_{\pi_1} - E[D^*_{\pi_1}\mid\omega_{\pi_1}]\big\}\,I_{\tau,\pi_1}\,\chi_{\pi_1} + o(1). \]
It then follows that the Hájek projection is given by
\[ V^*_{2,n\tau} = \frac{1}{n(n-1)}\sum_{i_1=1}^n\sum_{j_1\ne i_1}\xi_{i_1j_1,\tau} + o(1), \]
with $\xi_{i_1j_1,\tau}$ and $\chi_{i_1j_1}$ as defined in Lemma B.4. A Law of Iterated Expectations yields $E[V^*_{2,n\tau}] = E[\xi_{i_1j_1,\tau}] = 0$.

Step 2. Variance of Hájek Projection

As in the proof of Lemma B.4, the variance of $V^*_{2,n\tau}$ is given by
\[ \operatorname{Var}(V^*_{2,n\tau}) = \Big\{\frac{1}{n(n-1)}\Big\}^2\sum_{i_1=1}^n\sum_{j_1\ne i_1} E[\xi_{i_1j_1,\tau}\xi'_{i_1'j_1',\tau}] = \Big\{\frac{1}{n(n-1)}\Big\}^2\sum_{i_1=1}^n\sum_{j_1\ne i_1}\Lambda^*_{i_1,j_1}, \]
with $\Lambda^*_{i_1,j_1}$ as defined in Lemma B.4. Define $\Upsilon_n = n(n-1)\operatorname{Var}(V^*_{2,n\tau})$.

Step 3. Variance of $S_{2,n\tau}$

Given two different 6-tuples $\sigma_6\{i_1,i_2,j_1,j_2,l_1,l_2\}$ and $\sigma_6'\{i_1',i_2',j_1',j_2',l_1',l_2'\}$, let
\[ \Delta_{c,n} = \operatorname{Cov}\big(s_{2,n}(\sigma_6\{i_1,i_2,j_1,j_2,l_1,l_2\}),\, s_{2,n}(\sigma_6'\{i_1',i_2',j_1',j_2',l_1',l_2'\})\big) \]
denote the covariance between $s_{2,n}(\sigma_6)$ and $s_{2,n}(\sigma_6')$ when $\sigma_6$ and $\sigma_6'$ have $c = 0,1,\dots,5$ common indices. Note, since $E[s_{2,n}(\sigma_6)\mid\Omega_\sigma] = 0$, that $\Delta_{0,n} = \Delta_{1,n} = 0$. Consider
\[ \Delta_{2,n} = E\big[s_{2,n}(\sigma_6)\,s_{2,n}(\sigma_6')'\big] = E\big[s_{i_1j_1}(\sigma_6)\,s_{i_1j_1}(\sigma_6')'\big] + o(1) = E\Big[\big\{E[\tilde D^{*2}_{i_1j_1,\tau}\mid\omega_{i_1j_1}] - E[\tilde D^*_{i_1j_1,\tau}\mid\omega_{i_1j_1}]^2\big\}\,I_{\tau,i_1j_1}\,\tilde W_\sigma\tilde W'_{\sigma'}\Big] + o(1). \]
Therefore, $\operatorname{Var}(S^{\dagger}_{2,n\tau})$ can be expressed as the sum of the $\Delta_{2,n}$ terms, whose inner bracket scaled by $[(n-2)(n-3)]^{-2}$ equals $\Lambda^*_{i_1,j_1}$, plus terms of strictly smaller order coming from 6-tuples with three or more common indices. As a result,
\[ \operatorname{Var}(S^{\dagger}_{2,n\tau}) = \Big\{\frac{1}{n(n-1)}\Big\}^2\sum_{i_1=1}^n\sum_{j_1\ne i_1}\Lambda^*_{i_1,j_1} + o(1), \]
and $\operatorname{Var}(V^*_{2,n\tau}) - \operatorname{Var}(S^{\dagger}_{2,n\tau}) = o_p(1)$. The asymptotic equivalence result follows from arguments similar to those in the proof of Lemma B.4. The proof is complete.

Lemma B.6.
Under the same Assumptions of Theorem 4.2, it follows that the Hájek projection of $S^{\dagger}_{3,n\tau} = S_{3,n\tau} - E[\tilde W_\sigma \tilde D^*_{\sigma,\tau}\mid\Omega_\sigma]$, with
\[ S_{3,n\tau} = \frac{1}{m_n}\sum_{\sigma\in N_{m_n}} \tilde W_\sigma\Big\{ \Big(D^*_{i_1j_1,\tau}\frac{\hat f_{vx,i_1j_1}}{f_{vx,i_1j_1}} - D^*_{i_1j_2,\tau}\frac{\hat f_{vx,i_1j_2}}{f_{vx,i_1j_2}}\Big) - \Big(D^*_{i_2j_1,\tau}\frac{\hat f_{vx,i_2j_1}}{f_{vx,i_2j_1}} - D^*_{i_2j_2,\tau}\frac{\hat f_{vx,i_2j_2}}{f_{vx,i_2j_2}}\Big) \Big\}, \]
into an arbitrary function of $\zeta_{i_1j_1} = (X_{i_1}, X_{j_1}, A_{i_1}, A_{j_1}, v_{i_1j_1}, U_{i_1j_1})$ is given by
\[ V^*_{3,n\tau} = E[S^{\dagger}_{3,n\tau}\mid\zeta_{i_1j_1}] = \frac{1}{n(n-1)}\sum_{i_1=1}^n\sum_{j_1\ne i_1}\xi_{i_1j_1,\tau}, \]
and
\[ n(n-1)\,\Upsilon_n^{-1/2}\,E\big[(S^{\dagger}_{3,n\tau}-V^*_{3,n\tau})(S^{\dagger}_{3,n\tau}-V^*_{3,n\tau})'\big]\,\Upsilon_n^{-1/2} = o(1), \]
where $\Upsilon_n = n(n-1)\operatorname{Var}(V^*_{3,n\tau})$.

Proof. Consider a fixed 6-tuple $\{i_1,i_2,j_1,j_2,k_1,k_2\}$ and define
\[ s_{i_1,j_1}(\sigma_6) = \tilde W_{i_1i_2,j_1j_2}\Big\{ \frac{1}{h^{2L+1}}\frac{D^*_{i_1j_1,\tau}}{f_{vx,i_1j_1}}\,K_{vx,h}(v_{k_1k_2}-v_{i_1j_1},\, X_{k_1}-X_{i_1},\, X_{k_2}-X_{j_1}) - E[D^*_{i_1j_1,\tau}\mid\Omega_{i_1i_2,j_1j_2}] \Big\}, \]
with $s_{i_1,j_2}(\sigma_6)$, $s_{i_2,j_1}(\sigma_6)$, and $s_{i_2,j_2}(\sigma_6)$ defined analogously, and let $s_{3,n}(\sigma_6) = s_{i_1,j_1}(\sigma_6) - s_{i_1,j_2}(\sigma_6) - s_{i_2,j_1}(\sigma_6) + s_{i_2,j_2}(\sigma_6)$. It follows that
\[ S^{\dagger}_{3,n\tau} = \frac{1}{m_{6,n}}\sum_{\sigma_6\in N_{m_{6,n}}} s_{3,n\tau}(\sigma_6). \]

Step 1. Hájek Projection

The proof proceeds exactly as in Lemma B.5, using the same dyad index notation $\pi$, with the kernel $K_{vx,h}$ replacing $K_{x,h}$ and the normalization $h^{-(2L+1)}$ replacing $h^{-2L}$. The change of variables is now $\nu = (\nu_1, \nu_2)$, with $\nu_1 = h^{-1}(v_{\pi_2}-v_{\pi_1})$ and $\nu_2 = h^{-1}(X_{\pi_2}-X_{\pi_1})$, and Jacobian $h^{2L+1}$. Letting $\Xi(v_{\pi_1}, X_{\pi_1}) = E[D^*_{\pi_2,\tau}\chi_{\pi_2}\mid v_{\pi_1}, X_{\pi_1}]$,
\[ \int \Big\{ \frac{D^*_{\pi_1,\tau}}{f_{vx,\pi_1}}\chi_{\pi_1}\big(f_{vx}(v_{\pi_1}+h\nu_1, X_{\pi_1}+h\nu_2) - f_{vx}(v_{\pi_1}, X_{\pi_1})\big) + \big(\Xi(v_{\pi_1}+h\nu_1, X_{\pi_1}+h\nu_2) - \Xi(v_{\pi_1}, X_{\pi_1})\big) \Big\} K_{vx}(\nu)\,d\nu = o(h^{2M}), \]
where the equality follows from Assumptions 4.1.1, 4.1.3, and 4.1.5, which guarantee that $f_{vx}(v_{\pi_1}, X_{\pi_1})$ and $\Xi(v_{\pi_1}, X_{\pi_1})$ are continuous and $M$-times differentiable with respect to all of their arguments, and $K_{vx}$ is a bias-reducing kernel of order $2M$. Observe that $(D^*_{\pi_1,\tau}/f_{vx,\pi_1})\chi_{\pi_1} f_{vx}(v_{\pi_1}, X_{\pi_1}) = 0$ holds for any $(v_{\pi_1}, X_{\pi_1})$ within a $\tau$ distance of the boundary of $S_{vx}$, and having $h/\tau\to0$ ensures that the change of variables is not affected by boundary effects. Using the previous results, it follows that
\[ E\big[ p_{\pi_1,\pi_2} - p_{\pi_1(1)\pi_2(2),\pi_2} - p_{\pi_2(1)\pi_1(2),\pi_2} + p_{\pi_2,\pi_2} \mid \zeta_{\pi_1} \big] = \big\{D^*_{\pi_1} - E[D^*_{\pi_1}\mid\omega_{\pi_1}]\big\}\,I_{\tau,\pi_1}\,\chi_{\pi_1} + o(1), \]
so that the Hájek projection is given by
\[ V^*_{3,n\tau} = \frac{1}{n(n-1)}\sum_{i_1=1}^n\sum_{j_1\ne i_1}\xi_{i_1j_1,\tau} + o(1), \]
with $\xi_{i_1j_1,\tau}$ and $\chi_{i_1j_1}$ as defined in Lemma B.4, and $E[V^*_{3,n\tau}] = E[\xi_{i_1j_1,\tau}] = 0$ by a Law of Iterated Expectations.

Step 2. Variance of Hájek Projection

As in the proof of Lemma B.4,
\[ \operatorname{Var}(V^*_{3,n\tau}) = \Big\{\frac{1}{n(n-1)}\Big\}^2\sum_{i_1=1}^n\sum_{j_1\ne i_1}\Lambda^*_{i_1,j_1}, \qquad \Upsilon_n = n(n-1)\operatorname{Var}(V^*_{3,n\tau}). \]

Step 3. Variance of $S_{3,n\tau}$

Given two different 6-tuples $\sigma_6$ and $\sigma_6'$, let $\Delta_{c,n} = \operatorname{Cov}(s_{3,n}(\sigma_6), s_{3,n}(\sigma_6'))$ when $\sigma_6$ and $\sigma_6'$ have $c = 0,1,\dots,5$ common indices. Since $E[s_{3,n}(\sigma_6)\mid\Omega_\sigma] = 0$, it holds that $\Delta_{0,n} = \Delta_{1,n} = 0$, while
\[ \Delta_{2,n} = E\big[s_{3,n}(\sigma_6)\,s_{3,n}(\sigma_6')'\big] = E\Big[\big\{E[\tilde D^{*2}_{i_1j_1,\tau}\mid\omega_{i_1j_1}] - E[\tilde D^*_{i_1j_1,\tau}\mid\omega_{i_1j_1}]^2\big\}\,I_{\tau,i_1j_1}\,\tilde W_\sigma\tilde W'_{\sigma'}\Big] + o(1). \]
Therefore, $\operatorname{Var}(S^{\dagger}_{3,n\tau})$ can be expressed as the sum of the $\Delta_{2,n}$ terms, whose inner bracket scaled by $[(n-2)(n-3)]^{-2}$ equals $\Lambda^*_{i_1,j_1}$, plus terms of strictly smaller order coming from 6-tuples with three or more common indices. As a result,
\[ \operatorname{Var}(S^{\dagger}_{3,n\tau}) = \Big\{\frac{1}{n(n-1)}\Big\}^2\sum_{i_1=1}^n\sum_{j_1\ne i_1}\Lambda^*_{i_1,j_1} + o(1), \]
and $\operatorname{Var}(V^*_{3,n\tau}) - \operatorname{Var}(S^{\dagger}_{3,n\tau}) = o_p(1)$. The asymptotic equivalence result follows from arguments similar to those in the proof of Lemma B.4. The proof is complete.

Simulations: alternative designs
Table 3: Simulation results for the semiparametric estimator $\hat{\theta}_n$ with kernel estimator $\hat{f}_v(v_{ij})$

              mean     median   std      MSE      Degree
n = 50
log(log(n))   1.6047   1.6164   1.1253   1.2772   0.4237
log(n)/n      1.6444   1.6643   1.5801   2.5176   0.3125
n = 100
log(log(n))   1.5373   1.5011   0.4911   0.2425   0.4214
log(n)/n      1.5415   1.5197   0.7317   0.5371   0.2907

Total number of Monte Carlo simulations = 500. Bandwidth parameter h = 0.

Table 4: Simulation results for the semiparametric estimator $\hat{\theta}_n$ with kernel estimator $\hat{f}_v(v_{ij})$

              mean     median   std      MSE      Degree
n = 50
log(log(n))   1.6296   1.5640   1.1280   1.2893   0.4252
log(n)/n      1.6308   1.6379   1.5430   2.3981   0.3127
n = 100
log(log(n))   1.5944   1.5782   0.4999   0.2588   0.4218
log(n)/n      1.5009   1.5244   0.7059   0.4983   0.2896

Total number of Monte Carlo simulations = 500. Bandwidth parameter h = 0.

Table 5: Simulation results for the semiparametric estimator $\hat{\theta}_n$ with kernel estimator $\hat{f}_v(v_{ij})$

              mean     median   std      MSE      Degree
n = 50
log(log(n))   1.7149   1.7235   1.1336   1.3313   0.4252
log(n)/n      1.5690   1.5839   1.5592   2.4358   0.3116
n = 100
log(log(n))   1.5394   1.5478   0.4973   0.2488   0.4212
log(n)/n      1.5662   1.6033   0.7749   0.6049   0.2905

Total number of Monte Carlo simulations = 500. Bandwidth parameter h = 0.

Table 6: Simulation results for the semiparametric estimator $\hat{\theta}_n$ with kernel estimator $\hat{f}_v(v_{ij})$

              mean     median   std      MSE      Degree
n = 50
log(log(n))   1.6675   1.6378   1.0617   1.1552   0.4250
log(n)/n      1.6594   1.6162   1.5833   2.5321   0.3113
n = 100
log(log(n))   1.5577   1.5512   0.5305   0.2848   0.4208
log(n)/n      1.5653   1.5637   0.7064   0.5033   0.2898

Total number of Monte Carlo simulations = 500. Bandwidth parameter h = 0.2.