Outcome regression-based estimation of conditional average treatment effect
Lu Li, Niwen Zhou, and Lixing Zhu*

School of Finance and Statistics, East China Normal University, Shanghai 200241, China
School of Statistics, Beijing Normal University, Beijing 100875, China
Department of Mathematics, Hong Kong Baptist University, Kowloon Tong 999077, Hong Kong, China
Abstract
This paper provides a systematic investigation of the following issues. First, we construct different outcome regression-based estimators of the conditional average treatment effect under, respectively, a true (oracle), parametric, nonparametric and semiparametric dimension reduction structure. Second, based on the corresponding asymptotic variance functions, we answer the following questions when the models are correctly specified: what is the asymptotic efficiency ranking of the four estimators in general? How is the efficiency related to the affiliation of the given covariates to the set of arguments of the regression functions? What roles do the bandwidth and kernel function selections play in the estimation efficiency? And in which scenarios should the estimator under the semiparametric dimension reduction regression structure be used in practice? As a by-product, the results show that any outcome regression-based estimation is asymptotically more efficient than any inverse probability weighting-based estimation. Together, these results give a relatively complete picture of outcome regression-based estimation, so that the theoretical conclusions can provide guidance for practical use when more than one estimation can be applied to the same problem. Several simulation studies are conducted to examine the performance of these estimators in finite samples, and a real dataset is analyzed for illustration.
Keywords: Asymptotic variance; Conditional average treatment effect; Regression causal effect; Sufficient dimension reduction.

* Corresponding author: [email protected]. The first two authors are co-first authors. The research was supported by a grant from The University Grants Council of Hong Kong.

Introduction
Causal inference has been widely applied for decades to analyse treatment effects based on observational studies, in which treatments are assigned to individuals in a non-random fashion. In this paper, we consider causal inference under the potential outcome framework (Rubin, 1974; Rosenbaum and Rubin, 1983), where the treatment is binary and the outcome variable in the hypothetical complete data set has two components $(Y(1), Y(0))$, in which $Y(1)$ is the potential outcome if the individual receives treatment and $Y(0)$ is the corresponding potential outcome without treatment. As we can only observe one of $Y(1)$ and $Y(0)$, a commonly used method is to impute a reasonable value in lieu of the missing one, such as linear regression imputation (Healy and Westmacott, 1956), kernel regression imputation (Cheng, 1994) and ratio imputation (Rao, 1996).

In this paper, we consider the average treatment effect (ATE) conditional on some covariates, to explore the heterogeneity of the ATE (Rosenbaum and Rubin, 1983, 1985). Let $X \in \mathbb{R}^p$ be a set of covariates that collects an individual's personal information and $X_1 \in \mathbb{R}^k$ be a subvector of $X$, $1 \le k < p$. The conditional average treatment effect (CATE, hereafter) is defined as $E(Y(1) - Y(0) \mid X_1)$. To estimate this function, Abrevaya et al. (2015) proposed estimators based on the inverse probability weighting (IPW, hereafter) method and concluded that, according to the asymptotic variance functions, the estimator with nonparametrically estimated propensity score (NCATE) is asymptotically more efficient than the one with parametrically estimated propensity score (PCATE). This conclusion is similar to those in Hahn (1998) and Hirano et al. (2003) for the IPW estimators of the ATE. Moreover, PCATE is proved to be asymptotically equivalent to the estimator with true propensity score (OCATE).
This is very different from the unconditional ATE. Zhou and Zhu (2020)* proposed an estimator with semiparametrically estimated propensity score (SCATE) and gave a more detailed analysis of the asymptotic efficiency of NCATE and SCATE.

As is well known, for the ATE, outcome regression-based estimation is already a popularly used methodology; thus, methodologically, the research in this aspect is not new. However, for the CATE, the problem becomes more complicated, as it involves double conditional expectations: on the full set $X$ of covariates, or on the subset $\beta^\top X$ if the curse of dimensionality is handled within a dimension reduction framework with projection matrix $\beta$, and then on the subvector $X_1$. Three relevant references are Luo et al. (2017), Luo et al. (2019) and Ma et al. (2019). To focus on the estimation efficiency issue, we do not give details in this paper about how to carry out dimension reduction and feature selection, but only consider the general setting in which a dimension reduction structure is assumed to exist. We then carry out a systematic investigation of the asymptotic properties of the estimators to answer the following questions when the model is correctly specified in the parametric case.

* Zhou, N. W. & Zhu, L. X. (2020). On IPW-based estimation of conditional average treatment effect. Submitted.
Q1. When the CATE is estimated under the nonparametric, semiparametric, parametric and true (oracle) regression structures, what is the ranking of the asymptotic efficiencies of these estimators?

Q2. Note that the CATE is a function of $X_1$, while the set of arguments of the regression function, say $\tilde X$, is not necessarily the full $X$, and thus $X_1$ is not necessarily a strict subset of $\tilde X$. Could the affiliation of $X_1$ to $\tilde X$ affect the asymptotic efficiency of the different estimators? This issue is unique to the CATE and is particularly important under the semiparametric dimension reduction framework, as the regression function would be a function of $\tilde X = \beta^\top X$, where $\beta$ is a $p \times r$ matrix with $r \ll p$ in high-dimensional scenarios.

Q3. As all estimators involve nonparametric estimation of the conditional expectations, how do the bandwidth and kernel function affect the efficiency? This study is particularly necessary.

Q4. Compared with the IPW-based estimation, what efficiency ranking can be concluded?

We will briefly discuss misspecified cases, global or local, in Section 5; they will be investigated in the near future but are not touched upon in this paper.

Note that the CATE is
\[
\tau(x_1) = E\{Y(1) - Y(0) \mid X_1 = x_1\} = E\{E(Y(1) - Y(0) \mid X) \mid X_1 = x_1\},
\]
where $E(Y(1) - Y(0) \mid X)$ is the treatment effect heterogeneity. In this paper we are interested in estimating $\tau(x_1)$ under the unconfoundedness assumption. To answer the above four questions, we propose four estimators according to whether $m_1(X) - m_0(X) = E(Y(1) - Y(0) \mid X)$ is a completely known function (ORCATE), a parametric function (PRCATE) ($m_1(X) = m_1(X, \theta_1)$ and $m_0(X) = m_0(X, \theta_0)$), a semiparametric function with dimension reduction structure (SRCATE) ($m_1(X) = m_1(\beta_1^\top X)$ and $m_0(X) = m_0(\beta_0^\top X)$), or a nonparametric function (NRCATE). The details are given in Section 2.
We derive the asymptotically linear representations and asymptotic normality of these estimators in various scenarios and, according to the asymptotic variance functions and using the estimators with true regression function/propensity score as the benchmark, we obtain the following results, which give a relatively complete picture of the asymptotic efficiencies of the four estimation methods. These newly derived results show that the estimated CATEs have rather different asymptotic behaviors from the estimated ATEs. Let $A \preceq B$ mean that method A has a smaller asymptotic variance function than method B, and let $A \cong B$ stand for their asymptotic equivalence, i.e. equal asymptotic variance functions. The results are summarised as follows.

A1. This answers Q1 and Q4. In general, the ranking of the asymptotic efficiencies of the estimators, together with the results about the IPW-based estimators in Abrevaya et al. (2015) and Zhou and Zhu (2020), is
\[
\underbrace{\text{ORCATE} \cong \text{PRCATE} \preceq \text{SRCATE} \preceq \text{NRCATE}}_{\text{regression-based CATE estimators}} \preceq \underbrace{\text{NCATE} \preceq \text{SCATE} \preceq \text{PCATE} \cong \text{OCATE}}_{\text{IPW-based CATE estimators}}.
\]

A2. For Q2, we have the following results, which show the importance of the affiliation of $X_1$ to $\tilde X$. Under the semiparametric dimension reduction structure, when $X_1 \subset \beta_1^\top X \cap \beta_0^\top X$, or $X_1$ is contained in only one of the sets $\beta_1^\top X$ and $\beta_0^\top X$,
\[
\text{ORCATE} \cong \text{PRCATE} \preceq \text{SRCATE},
\]
while when $X_1$ is not fully included in either $\beta_1^\top X$ or $\beta_0^\top X$, we have
\[
\text{ORCATE} \cong \text{PRCATE} \cong \text{SRCATE}.
\]
Some more results are included in Section 2, as are similar results about NRCATE and more detailed comparisons.

A3. This answers Q3. When the CATE functions are sufficiently smooth, and the bandwidth and kernel function are delicately selected, the following asymptotic equivalence among the regression-based estimators can be achieved:
\[
\text{ORCATE} \cong \text{PRCATE} \cong \text{SRCATE} \cong \text{NRCATE}.
\]

A4. In high-dimensional scenarios, semiparametric-based estimation is often preferable because it can greatly alleviate the curse of dimensionality and also avoid model misspecification. Some more detailed studies and comparisons of the asymptotic efficiency are contained in Section 2. The numerical studies in Section 3 support this observation.

The rest of this article is organized as follows. Section 2 introduces the CATE function and gives the estimators under the true, parametric, nonparametric and semiparametric frameworks, respectively; the asymptotic properties of the proposed estimators are systematically investigated there. Section 3 presents simulation studies examining the performance of the estimators. Section 4 is devoted to the analysis of a real data example. Conclusions and some further research problems are briefly discussed in Section 5. For ease of presentation, we defer all technical proofs to the Appendix.

Estimations and their asymptotic properties
Let $D$ be a dummy variable indicating treatment status, with $D = 1$ if an individual receives treatment and $D = 0$ otherwise. In practice we only observe $D$, $X$ and $Y \equiv D \cdot Y(1) + (1 - D) \cdot Y(0)$. The propensity score $P(D = 1 \mid X)$ is denoted by $p(X)$. Let $\{X_i, Y_i, D_i\}$, $i = 1, \ldots, n$, be $n$ independent copies of $(X, Y, D)$. To estimate $\tau(x_1)$, we suggest a two-step estimation procedure when both $m_1$ and $m_0$ are unknown. Four estimators are proposed according to whether the regression causal effect has a true (oracle), parametric, nonparametric or semiparametric dimension reduction structure (ORCATE, PRCATE, NRCATE and SRCATE, respectively).

To clearly state the estimation procedures, recall that the function $m_t(X)$ is defined as
\[
m_t(X) = E(Y(t) \mid X), \quad t = 0, 1.
\]
Under the unconfoundedness assumption, that is, the conditional independence $(Y(0), Y(1)) \perp D \mid X$, we first estimate $m_1(X) - m_0(X)$ and then its conditional expectation $\tau(x_1) = E\{m_1(X) - m_0(X) \mid X_1 = x_1\}$. Under the semiparametric dimension reduction structure, the unconfoundedness assumption takes a different form, which will be specified later in this section. Directly estimating $\tau(x_1)$ in terms of $Y(1) - Y(0)$ is not feasible, as this difference is never observed. It is natural to use the observed outcomes of the treated and control groups to estimate $m_1(X)$ and $m_0(X)$ separately; afterwards, $\tau(x_1)$ can be estimated by a nonparametric method such as the Nadaraya-Watson (N-W) estimation (Nadaraya, 1964; Watson, 1964).

For SRCATE and NRCATE we will have to use higher-order kernel functions, so we give the notation here. A function $K: \mathbb{R}^k \to \mathbb{R}$ is a kernel of order $s$ if it integrates to one over $\mathbb{R}^k$,
\[
\int u_1^{p_1} \cdots u_k^{p_k} K(u)\, du = 0
\]
for all nonnegative integers $p_1, \ldots, p_k$ such that $1 \le \sum_{i=1}^k p_i < s$, and the integral is nonzero for some $p_1, \ldots, p_k$ with $\sum_{i=1}^k p_i = s$. Some regularity conditions are listed below.

(C1). (Strong ignorability)
(a) (Unconfoundedness) $(Y(0), Y(1)) \perp D \mid X$.
(b) (Common support) For some small $c > 0$, the propensity score function $p(\cdot)$ satisfies $c < p(X) < 1 - c$.

(C2). (Distribution of $X$) The support $\mathcal{X}$ of the $p$-dimensional covariate $X$ is a Cartesian product of compact intervals, and the density of $X$, $f(x)$, is bounded away from 0 on $\mathcal{X}$.

(C3). (Kernel functions) $K(u)$ is a kernel of order $s_1$ that is symmetric around zero and $s^*$ times continuously differentiable.

(C4). (Distribution of $X_1$) The density function of $X_1$, $f_1(x_1)$, is bounded away from zero and infinity, and $s_1 \ge 2$. Furthermore, the value $s^*$ relies on the smoothness of the regression function: more specifically, $s^* \ge s_1$, while $s^* \ge s_2$ and $s^* \ge s_3$ are required in the nonparametric and semiparametric situations, respectively.

In the following, we study the four estimations in separate subsections and give some further analysis of SRCATE and NRCATE in another subsection.

The first estimator serves as a benchmark against which to examine the performance of the other estimators developed and investigated later. Assume that $m_1(X) - m_0(X)$ is completely known, with no need of estimation. Then ORCATE can be written as
\[
\hat\tau(x_1) = \frac{\frac{1}{n h_1^k} \sum_{i=1}^n K\!\left(\frac{X_{1i} - x_1}{h_1}\right)\{m_1(X_i) - m_0(X_i)\}}{\frac{1}{n h_1^k} \sum_{i=1}^n K\!\left(\frac{X_{1i} - x_1}{h_1}\right)}. \tag{1}
\]
The asymptotically linear representation and asymptotic normality are stated below.

Theorem 1.
Suppose that assumptions (C1) through (C4) are satisfied. Then, when the regression causal effect is given without estimation, for each point $x_1$ in the support of $X_1$, we have
\[
\sqrt{n h_1^k}\,\{\hat\tau(x_1) - \tau(x_1)\} = \frac{1}{\sqrt{n h_1^k}\, f_1(x_1)} \sum_{i=1}^n \{m_1(X_i) - m_0(X_i) - \tau(x_1)\}\, K\!\left(\frac{X_{1i} - x_1}{h_1}\right) + o_p(1),
\]
and then
\[
\sqrt{n h_1^k}\,\{\hat\tau(x_1) - \tau(x_1)\} \stackrel{d}{\longrightarrow} N\!\left(0, \frac{\|K\|_2^2\, \sigma_O^2(x_1)}{f_1(x_1)}\right),
\]
where $\|K\|_2 = \{\int K^2(u)\, du\}^{1/2}$ and $\sigma_O^2(x_1) = E[\{m_1(X) - m_0(X) - \tau(x_1)\}^2 \mid X_1 = x_1]$.

Suppose now that both $m_1(X)$ and $m_0(X)$ have parametric structures with unknown parameters $\alpha_1$ and $\alpha_0$, respectively; that is, $m_t(X, \alpha_t)$ are parametric functions for $t = 0, 1$. To estimate $\alpha_1$ and $\alpha_0$, we use a method similar to that of Wang et al. (2004). Write, for $i = 1, \ldots, n$,
\[
D_i Y_i = D_i\, m_1(X_i, \alpha_1) + D_i\, \epsilon_{1i}, \qquad (1 - D_i) Y_i = (1 - D_i)\, m_0(X_i, \alpha_0) + (1 - D_i)\, \epsilon_{0i},
\]
where $\epsilon_{ti}$, $t = 0, 1$, are random error terms independent of $X_i$, $i = 1, \ldots, n$. We use weighted least squares to estimate $\alpha_t$ for $t = 0, 1$ (see Matloff, 1981), and write the resulting estimators as $\hat\alpha_t$ and $\hat m_t(X) = m_t(X, \hat\alpha_t)$. PRCATE is then defined as
\[
\hat\tau(x_1) = \frac{\frac{1}{n h_1^k} \sum_{i=1}^n K\!\left(\frac{X_{1i} - x_1}{h_1}\right)\{\hat m_1(X_i) - \hat m_0(X_i)\}}{\frac{1}{n h_1^k} \sum_{i=1}^n K\!\left(\frac{X_{1i} - x_1}{h_1}\right)}, \tag{2}
\]
where $\hat m_1(X_i) = m_1(X_i, \hat\alpha_1)$ and $\hat m_0(X_i) = m_0(X_i, \hat\alpha_0)$, $i = 1, \ldots, n$. Assume the following additional condition:

(A1). (Bandwidths) $h_1 \to 0$, $n h_1^k \to \infty$ and $n h_1^{2 s_1 + k} \to 0$, which removes the asymptotic bias of $\hat\tau(x_1)$.

Theorem 2.
Suppose that conditions (C1) through (C4) and (A1) are satisfied with $s^* = s_1 + 2$. Then, for each point $x_1$ in the support of $X_1$, we have
\[
\sqrt{n h_1^k}\,\{\hat\tau(x_1) - \tau(x_1)\} = \frac{1}{\sqrt{n h_1^k}\, f_1(x_1)} \sum_{i=1}^n \{m_1(X_i) - m_0(X_i) - \tau(x_1)\}\, K\!\left(\frac{X_{1i} - x_1}{h_1}\right) + o_p(1) \stackrel{d}{\longrightarrow} N\!\left(0, \frac{\|K\|_2^2\, \sigma_P^2(x_1)}{f_1(x_1)}\right),
\]
where $\sigma_P^2(x_1) = \sigma_O^2(x_1) = E[\{m_1(X) - m_0(X) - \tau(x_1)\}^2 \mid X_1 = x_1]$.

Remark 1.
This theorem states the asymptotic equivalence between PRCATE and OR-CATE in the sense that their asymptotic variance functions are identical.
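As a concrete illustration of the two-step construction in (2), the following Python sketch fits outcome models on the treated and control groups and then smooths the fitted difference against $X_1$. It is only a minimal sketch: the linear specification of $m_t(X, \alpha_t)$, the Gaussian kernel and the use of ordinary (rather than weighted) least squares are simplifying assumptions, not the paper's exact procedure.

```python
import numpy as np

def nw_smooth(x1_grid, X1, values, h):
    """Second-step Nadaraya-Watson smoother of `values` against the
    conditioning covariate X1, evaluated at each point of x1_grid."""
    out = np.empty(len(x1_grid))
    for j, x in enumerate(x1_grid):
        w = np.exp(-0.5 * ((X1 - x) / h) ** 2)  # Gaussian kernel weights
        out[j] = np.sum(w * values) / np.sum(w)
    return out

def prcate(x1_grid, X, X1, Y, D, h):
    """Two-step CATE estimator with linear (parametric) outcome regressions.

    Step 1: fit m_1 by least squares on the treated group and m_0 on the
            control group.
    Step 2: smooth the fitted difference m_1(X_i) - m_0(X_i) against X1.
    """
    Z = np.column_stack([np.ones(len(Y)), X])  # design matrix with intercept
    a1, *_ = np.linalg.lstsq(Z[D == 1], Y[D == 1], rcond=None)
    a0, *_ = np.linalg.lstsq(Z[D == 0], Y[D == 0], rcond=None)
    diff = Z @ a1 - Z @ a0                     # fitted m_1 - m_0 at each X_i
    return nw_smooth(x1_grid, X1, diff, h)
```

Because $\hat\alpha_t$ converges at the parametric $\sqrt{n}$ rate, faster than the nonparametric smoothing rate of the second step, such an estimator is first-order equivalent to the oracle version (1), as Theorem 2 and Remark 1 state.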
If we do not have prior information on the structures of $m_1(X)$ and $m_0(X)$, or we wish to avoid model misspecification, nonparametric estimation is feasible. Similarly, we estimate $m_1(X)$ and $m_0(X)$ separately. NRCATE is then written as
\[
\hat\tau(x_1) = \frac{\frac{1}{n h_1^k} \sum_{i=1}^n K\!\left(\frac{X_{1i} - x_1}{h_1}\right)\{\hat m_1(X_i) - \hat m_0(X_i)\}}{\frac{1}{n h_1^k} \sum_{i=1}^n K\!\left(\frac{X_{1i} - x_1}{h_1}\right)}, \tag{3}
\]
where
\[
\hat m_1(X_i) = \frac{\frac{1}{n h_2^p} \sum_{j=1}^n K_2\!\left(\frac{X_j - X_i}{h_2}\right) Y_j\, I(D_j = 1)}{\frac{1}{n h_2^p} \sum_{j=1}^n K_2\!\left(\frac{X_j - X_i}{h_2}\right) I(D_j = 1)}, \qquad
\hat m_0(X_i) = \frac{\frac{1}{n h_2^p} \sum_{j=1}^n K_2\!\left(\frac{X_j - X_i}{h_2}\right) Y_j\, I(D_j = 0)}{\frac{1}{n h_2^p} \sum_{j=1}^n K_2\!\left(\frac{X_j - X_i}{h_2}\right) I(D_j = 0)}.
\]
To study the asymptotic properties of $\hat\tau(x_1)$, we impose some further conditions on the kernel function and bandwidths.

(A2). $K_2(u)$ is a kernel of order $s_2$, symmetric around zero, equal to zero outside $\prod_{i=1}^p [-1, 1]$, with continuous $(s_2 + 1)$th-order derivatives.

(A3). (Bandwidths) $h_2 \to 0$ and $\log n / (n h_2^{p}) \to 0$.

(A4). $s_2 \ge p$; $m_1(\cdot)$ and $m_0(\cdot)$ are $s_2$ times continuously differentiable; and $h_2^{s_2} h_1^{-s_1 - k} \to 0$ and $n h_1^k h_2^{2 s_2} \to 0$, to ensure the asymptotic normality.

The following theorem states the main theoretical results for NRCATE. For convenience, define the function
\[
\Psi_0(X, Y, D) := \frac{D\{Y - m_1(X)\}}{p(X)} - \frac{(1 - D)\{Y - m_0(X)\}}{1 - p(X)} + m_1(X) - m_0(X).
\]

Theorem 3.
Suppose that conditions (C1) through (C4) and (A1) through (A4) are satisfied for $s^* \ge s_2 \ge p$. Then, for each point $x_1$, we have
\[
\sqrt{n h_1^k}\,(\hat\tau(x_1) - \tau(x_1)) = \frac{1}{\sqrt{n h_1^k}\, f_1(x_1)} \sum_{i=1}^n \{\Psi_0(X_i, Y_i, D_i) - \tau(x_1)\}\, K\!\left(\frac{X_{1i} - x_1}{h_1}\right) + o_p(1) \stackrel{d}{\longrightarrow} N\!\left(0, \frac{\|K\|_2^2\, \sigma_N^2(x_1)}{f_1(x_1)}\right),
\]
where
\[
\sigma_N^2(x_1) = E[\{\Psi_0(X, Y, D) - \tau(x_1)\}^2 \mid X_1 = x_1] = \sigma_P^2(x_1) + E\!\left\{\frac{\mathrm{var}(Y(1) \mid X)}{p(X)} + \frac{\mathrm{var}(Y(0) \mid X)}{1 - p(X)} \,\Big|\, X_1 = x_1\right\} \ge \sigma_P^2(x_1) = \sigma_O^2(x_1).
\]
The equality holds if and only if $\mathrm{var}(Y(1) \mid X)/p(X) = 0$ and $\mathrm{var}(Y(0) \mid X)/(1 - p(X)) = 0$, which rarely happens. Thus, the inequality shows that NRCATE is asymptotically less efficient than PRCATE and ORCATE.

An obvious limitation of NRCATE is its inability to handle models with high-dimensional covariates $X$ in practice. Therefore, how to alleviate the curse of dimensionality is an important issue, and reducing the dimensionality is a natural idea. We restrict ourselves to the sufficient dimension reduction framework below and use existing methods to estimate the projection directions, as the focus of this paper is on the asymptotics of the estimations assuming the dimension reduction structure is specified in a semiparametric manner. For other dimension reduction issues, see the relevant references such as Luo et al. (2017) and Ma et al. (2019).

We first give a very brief review of sufficient dimension reduction. For a given $\beta^\top X$, where $\beta$ is a $p \times r$ orthonormal matrix with an unknown number $r \ll p$ of columns, suppose that the regression of a response variable $W$ satisfies $E(W \mid X) \perp X \mid \beta^\top X$, where $\perp$ stands for independence. It is generally known that $E(W \mid X)$ is then an unspecified function of $\beta^\top X$, which allows full freedom in the regression with $\beta^\top X$ being the sufficiently reduced covariates (from dimension $p$ to $r$). This structure has a dimension reduction form with unknown parameter $\beta$, and it is also very flexible, with a nonparametric nature. To identify the projection directions $\beta$, Cook and Li (2002) defined the notion of the central mean subspace: the intersection of all subspaces spanned by any $\beta$ such that the above conditional independence holds. To be specific, without notational confusion, write $\mathcal{S}_{E(Y(1)|X)}$ and $\mathcal{S}_{E(Y(0)|X)}$ for the central mean subspaces spanned respectively by $\beta_1 \in \mathbb{R}^{p \times r(1)}$ and $\beta_0 \in \mathbb{R}^{p \times r(0)}$, where $r(t) < p$ for $t = 0, 1$, such that
\[
m_1(X) \perp X \mid \beta_1^\top X, \qquad m_0(X) \perp X \mid \beta_0^\top X. \tag{4}
\]
Some approaches are available in the literature to identify $\beta_1$ and $\beta_0$. For instance, Luo et al. (2017) and Ma et al. (2019) discussed the relevant dimension reduction issues and derived the properties of ATE estimators under semiparametric structures. As the focus of this paper is on the asymptotic properties of the CATE estimations and the comparisons amongst them, we do not give the details of the estimation procedures for the dimension reduction matrices $\beta_1$ and $\beta_0$, but simply assume the root-$n$ consistency of the two estimators $\hat\beta_1$ and $\hat\beta_0$ that we can define.

Note that under this dimension reduction structure, we have $m_t(X) = E(Y(t) \mid X) = E(Y(t) \mid \beta_t^\top X) = m_t(\beta_t^\top X)$ for $t = 0, 1$. Define SRCATE as
\[
\hat\tau(x_1) = \frac{\frac{1}{n h_1^k} \sum_{i=1}^n K\!\left(\frac{X_{1i} - x_1}{h_1}\right)\{\hat m_1(\hat\beta_1^\top X_i) - \hat m_0(\hat\beta_0^\top X_i)\}}{\frac{1}{n h_1^k} \sum_{i=1}^n K\!\left(\frac{X_{1i} - x_1}{h_1}\right)}, \tag{5}
\]
where $\hat m_t(\hat\beta_t^\top X_i)$ is the N-W estimator of $m_t$ constructed as in (3) but with the reduced covariates $\hat\beta_t^\top X_j$, the kernel $K_3$ and the bandwidth $h_3$, $t = 0, 1$. In order to derive the theoretical results, we give the following conditions.

(A5). $K_3(u)$ is a kernel of order $s_3$, symmetric around zero, equal to zero outside $\prod_{i=1}^{r(t)} [-1, 1]$, with continuous $(s_3 + 1)$th-order derivatives; the density $f_t$ of $\beta_t^\top X$ is $s_3$ times continuously differentiable, and $p(\beta_t^\top X) \in (c^*, 1 - c^*)$ almost surely for some $c^* \in (0, 1/2)$, $t = 0, 1$.

(A6). (Bandwidths) $h_3 \to 0$ and $\log n / (n h_3^{\max\{r(0), r(1)\}}) \to 0$.

(A7). $s_3 \ge \max\{r(0), r(1)\}$; $m_t(\beta_t^\top \cdot)$ is $s_3$ times continuously differentiable for $t = 0, 1$; and $h_3^{s_3} h_1^{-s_1 - k} \to 0$ and $n h_1^k h_3^{2 s_3} \to 0$.

(A8). $\hat\beta_1 - \beta_1 = O_p(n^{-1/2})$ and $\hat\beta_0 - \beta_0 = O_p(n^{-1/2})$.

Since the treatment effect heterogeneity under the semiparametric structure is based on $\beta_t^\top X$ for $t = 0, 1$, Assumptions (A5) through (A7) play the same role as Assumptions (A2) through (A4). Condition (A8) often holds.

Define three functions as
\[
\Psi_1(X, Y, D) = \frac{D\{Y - m_1(X)\}}{p(\beta_1^\top X)} + m_1(X) - m_0(X),
\]
\[
\Psi_2(X, Y, D) = -\frac{(1 - D)\{Y - m_0(X)\}}{1 - p(\beta_0^\top X)} + m_1(X) - m_0(X), \tag{6}
\]
\[
\Psi_3(X, Y, D) = \frac{D\{Y - m_1(X)\}}{p(\beta_1^\top X)} - \frac{(1 - D)\{Y - m_0(X)\}}{1 - p(\beta_0^\top X)} + m_1(X) - m_0(X).
\]
Next, for ease of explanation of our theoretical results, we introduce some notation. Write $A$ and $B$ as two sets of elements and, without confusion, write $\mathrm{card}(A)$ for the cardinality of the set $A$.

(F1) $A \subset B$ stands for $A \cap B = A$; in other words, the elements of $A$ are all in $B$ and $\mathrm{card}(B) \ge \mathrm{card}(A)$.

(F2) $A \subset_{k-q} B$ stands for $A \cap B = C$ with $\mathrm{card}(C) = k - q$; that is, $k - q$ elements of $A$ belong to $B$. When $k = q$, $A$ and $B$ do not share any elements, i.e. $A \cap B = \emptyset$.

The following theorem gives a detailed investigation of the asymptotic efficiency of SRCATE.

Theorem 4.
Suppose that assumptions (C1) through (C4), (A1) and (A5) through (A8) are satisfied for $s^* \ge s_3 \ge \max\{r(0), r(1)\}$. Then, for each point $x_1$ in the support of $X_1$, recalling the definitions of $\Psi_i$, $i = 1, 2, 3$, in (6):

(1) when $X_1 \subset_{k-q} \beta_1^\top X$ and $X_1 \subset_{k-q} \beta_0^\top X$ with $s_3(2 - k/q) + k > 0$ and $0 < q \le k$, the asymptotically linear representation of $\hat\tau(x_1)$ is
\[
\sqrt{n h_1^k}\,\{\hat\tau(x_1) - \tau(x_1)\} = \frac{1}{\sqrt{n h_1^k}\, f_1(x_1)} \sum_{i=1}^n \{m_1(X_i) - m_0(X_i) - \tau(x_1)\}\, K\!\left(\frac{X_{1i} - x_1}{h_1}\right) + o_p(1),
\]
and the asymptotic distribution of $\hat\tau(x_1)$ is
\[
\sqrt{n h_1^k}\,(\hat\tau(x_1) - \tau(x_1)) \stackrel{d}{\longrightarrow} N\!\left(0, \frac{\|K\|_2^2\, \sigma_{S,1}^2(x_1)}{f_1(x_1)}\right);
\]

(2) when $X_1 \subset \beta_1^\top X$ and $X_1 \subset_{k-q} \beta_0^\top X$ with $s_3(2 - k/q) + k > 0$ and $0 < q \le k$,
\[
\sqrt{n h_1^k}\,\{\hat\tau(x_1) - \tau(x_1)\} = \frac{1}{\sqrt{n h_1^k}\, f_1(x_1)} \sum_{i=1}^n \{\Psi_1(X_i, Y_i, D_i) - \tau(x_1)\}\, K\!\left(\frac{X_{1i} - x_1}{h_1}\right) + o_p(1) \stackrel{d}{\longrightarrow} N\!\left(0, \frac{\|K\|_2^2\, \sigma_{S,2}^2(x_1)}{f_1(x_1)}\right);
\]

(3) when $X_1 \subset_{k-q} \beta_1^\top X$ and $X_1 \subset \beta_0^\top X$ with $s_3(2 - k/q) + k > 0$ and $0 < q \le k$,
\[
\sqrt{n h_1^k}\,\{\hat\tau(x_1) - \tau(x_1)\} = \frac{1}{\sqrt{n h_1^k}\, f_1(x_1)} \sum_{i=1}^n \{\Psi_2(X_i, Y_i, D_i) - \tau(x_1)\}\, K\!\left(\frac{X_{1i} - x_1}{h_1}\right) + o_p(1) \stackrel{d}{\longrightarrow} N\!\left(0, \frac{\|K\|_2^2\, \sigma_{S,3}^2(x_1)}{f_1(x_1)}\right);
\]

(4) when $X_1 \subset \beta_1^\top X$ and $X_1 \subset \beta_0^\top X$,
\[
\sqrt{n h_1^k}\,\{\hat\tau(x_1) - \tau(x_1)\} = \frac{1}{\sqrt{n h_1^k}\, f_1(x_1)} \sum_{i=1}^n \{\Psi_3(X_i, Y_i, D_i) - \tau(x_1)\}\, K\!\left(\frac{X_{1i} - x_1}{h_1}\right) + o_p(1) \stackrel{d}{\longrightarrow} N\!\left(0, \frac{\|K\|_2^2\, \sigma_{S,4}^2(x_1)}{f_1(x_1)}\right),
\]
where
\[
\sigma_{S,1}^2(x_1) = \sigma_O^2(x_1) = E[\{m_1(X) - m_0(X) - \tau(x_1)\}^2 \mid X_1 = x_1],
\]
\[
\sigma_{S,2}^2(x_1) = E[\{\Psi_1(X, Y, D) - \tau(x_1)\}^2 \mid X_1 = x_1],
\]
\[
\sigma_{S,3}^2(x_1) = E[\{\Psi_2(X, Y, D) - \tau(x_1)\}^2 \mid X_1 = x_1], \tag{7}
\]
\[
\sigma_{S,4}^2(x_1) = E[\{\Psi_3(X, Y, D) - \tau(x_1)\}^2 \mid X_1 = x_1].
\]

Remark 2.
These results imply that the asymptotic behaviour of $\hat\tau(x_1)$ relies on whether $X_1$ is a subset of $\beta_t^\top X$ for $t = 0, 1$. Note that $X_1 \subset_{k-q} \beta_t^\top X$ implies that only $k - q$ elements of $X_1$ are also among the linear combinations in $\beta_t^\top X$, $t = 0, 1$. In this case, write $\beta_t^\top X$ as $\beta_t^\top X = ((X_1, \ldots, X_{k-q})^\top, (\tilde\beta_t^\top X)^\top)^\top$ for $t = 0, 1$. Therefore, when $X_1 \subset_{k-q} \beta_t^\top X$ with $s_3(2 - k/q) + k > 0$ and $0 < q \le k$, we should determine the intersection between $X_1$ and $\beta_t^\top X$, and then estimate $\beta_t$ through estimating $\tilde\beta_t$ for $t = 0, 1$. This could be done by using partial sufficient dimension reduction (e.g. Feng et al. (2013)). As this is not the focus of this paper, we assume that $\beta_t$ can be estimated at the $1/\sqrt{n}$ rate of convergence. Obviously, the assumption $s_3(2 - k/q) + k > 0$ is satisfied for $k = 1$.

Corollary 1.
We have
\[
\sigma_{S,1}^2(x_1) = \sigma_P^2(x_1) = \sigma_O^2(x_1),
\]
\[
\sigma_{S,2}^2(x_1) = \sigma_P^2(x_1) + E\!\left\{\frac{\mathrm{var}(Y(1) \mid X)}{p(\beta_1^\top X)} \,\Big|\, X_1 = x_1\right\} \ge \sigma_P^2(x_1) = \sigma_O^2(x_1),
\]
\[
\sigma_{S,3}^2(x_1) = \sigma_P^2(x_1) + E\!\left\{\frac{\mathrm{var}(Y(0) \mid X)}{1 - p(\beta_0^\top X)} \,\Big|\, X_1 = x_1\right\} \ge \sigma_P^2(x_1) = \sigma_O^2(x_1),
\]
\[
\sigma_{S,4}^2(x_1) = \sigma_P^2(x_1) + E\!\left\{\left[\frac{\mathrm{var}(Y(1) \mid X)}{p(\beta_1^\top X)} + \frac{\mathrm{var}(Y(0) \mid X)}{1 - p(\beta_0^\top X)}\right] \Big|\, X_1 = x_1\right\} \ge \sigma_P^2(x_1) = \sigma_O^2(x_1).
\]
Assume that $\mathrm{var}(Y(t) \mid X)$ is a measurable function of $\beta_t^\top X$ for $t = 0, 1$. Then
\[
E\!\left\{\frac{\mathrm{var}(Y(1) \mid X)}{p(\beta_1^\top X)}\right\} \le E\!\left\{\frac{\mathrm{var}(Y(1) \mid X)}{p(X)}\right\} \quad \text{and} \quad E\!\left\{\frac{\mathrm{var}(Y(0) \mid X)}{1 - p(\beta_0^\top X)}\right\} \le E\!\left\{\frac{\mathrm{var}(Y(0) \mid X)}{1 - p(X)}\right\}.
\]
Then
\[
\sigma_O^2(x_1) = \sigma_P^2(x_1) \le \sigma_{S,2}^2(x_1) \le \sigma_{S,4}^2(x_1) \le \sigma_N^2(x_1), \qquad \sigma_O^2(x_1) = \sigma_P^2(x_1) \le \sigma_{S,3}^2(x_1) \le \sigma_{S,4}^2(x_1) \le \sigma_N^2(x_1). \tag{8}
\]

Remark 3.
The results in the above corollary are based on some elementary calculations and the application of Theorem 3 of Luo et al. (2017); we omit the detailed calculations. Based on these facts, SRCATE is more efficient than NRCATE in all cases, and less efficient than PRCATE and ORCATE in cases (2) to (4). In particular, SRCATE shares the same asymptotic distribution as PRCATE and ORCATE in case (1). Furthermore, SRCATE in case (4) is less efficient than in cases (2) and (3).
Inspired by Theorem 4 and the importance of the affiliation of $X_1$ to the set of arguments of the regression functions, we further investigate SRCATE and NRCATE in more general settings. The results are stated in the following.
Corollary 2.
Suppose that conditions (C1) through (C4) and (A1) through (A8) are satisfied. Assume that there is a given $\tilde X$ such that $(Y(0), Y(1)) \perp X \mid \tilde X$, with $\tilde X \subset X$ and $X_1 \cap \tilde X = \emptyset$; when $X_1 \subset_{k-q} \beta_1^\top X$ and $X_1 \subset_{k-q} \beta_0^\top X$ with $s_3(2 - k/q) + k > 0$ and $0 < q \le k$, the four outcome regression-based CATE estimators share the same asymptotic distribution. Here,
\[
\tilde\sigma_N^2(x_1) \equiv E[\{m_1(X) - m_0(X) - \tau(x_1)\}^2 \mid X_1 = x_1] = \sigma_P^2(x_1) = \sigma_O^2(x_1).
\]

Remark 4. Much to our surprise, NRCATE can be asymptotically more efficient in this special case, sharing the same asymptotic variance as PRCATE. This shows the importance of the covariate affiliation to the set of arguments of the regression function. This is a unique property of the CATE; for the ATE, this does not happen.
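In the general case, by contrast, NRCATE pays the variance inflation identified in Theorem 3, $\sigma_N^2(x_1) - \sigma_P^2(x_1) = E\{\mathrm{var}(Y(1) \mid X)/p(X) + \mathrm{var}(Y(0) \mid X)/(1 - p(X)) \mid X_1 = x_1\}$. The Python sketch below evaluates this term by Monte Carlo for a toy design; the design, the window width and all constants are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Toy design: X = (X1, X2) uniform, Y(1) = X1 + X2 + eps with var(eps) = 0.25,
# Y(0) = 0 (so var(Y(0)|X) = 0), and a logistic propensity score in X1 + X2.
X1 = rng.uniform(-1, 1, n)
X2 = rng.uniform(-1, 1, n)
p = 1.0 / (1.0 + np.exp(-(X1 + X2)))

# Inflation term of Theorem 3 at x1 = 0: E{ var(Y(1)|X)/p(X) | X1 = x1 },
# approximated by averaging over draws with X1 in a narrow window around x1.
x1 = 0.0
window = np.abs(X1 - x1) < 0.05
inflation = float(np.mean(0.25 / p[window]))
print(inflation)  # roughly 0.54 for this design
```

The term shrinks when $p(X)$ stays away from 0 and 1 and the conditional outcome variances are small, which is exactly the equality condition stated after Theorem 3.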
Corollary 3.
In Theorem 3 and Theorem 4, if the commonly used constraints on the bandwidths $h_1$, $h_2$ and $h_3$ are replaced with
\[
\sqrt{n h_1^k}\left(h_2^{s_2} + \sqrt{\log(n)/(n h_2^{p})}\right) = o(1) \quad \text{and} \quad \sqrt{n h_1^k}\left(h_3^{s_3} + \sqrt{\frac{\log(n)}{n h_3^{\max\{r(0), r(1)\}}}}\right) = o(1)
\]
for some orders $s_2$ and $s_3$, then NRCATE and SRCATE have the same asymptotic distribution as PRCATE and ORCATE.

Remark 5.
As mentioned above, if we choose the bandwidths to satisfy the conditions of Corollary 3, NRCATE and SRCATE share the same asymptotic efficiency as PRCATE and ORCATE. Obviously, these conditions are much stronger than the assumptions in Theorem 3 and Theorem 4. However, it is possible to choose such bandwidths if the regression causal effect function is sufficiently smooth, so that higher-order kernels can be used; for details, see Li and Racine (2007) and Zhou and Zhu (2020). Therefore, under these bandwidth conditions, we obtain the following ranking of the asymptotic efficiencies of the four regression-based CATE estimators and the four propensity score-based CATE estimators:
\[
\underbrace{\text{ORCATE} = \text{PRCATE} = \text{SRCATE} = \text{NRCATE}}_{\text{regression-based CATE estimators}} \preceq \underbrace{\text{NCATE} = \text{SCATE} = \text{PCATE} = \text{OCATE}}_{\text{IPW-based CATE estimators}}. \tag{9}
\]
The equality occurs if and only if
\[
E\!\left\{\frac{\mathrm{var}(Y(1) \mid X)}{p(X)} + \frac{\mathrm{var}(Y(0) \mid X)}{1 - p(X)} + p(X)(1 - p(X))\left(\frac{m_1(X)}{p(X)} + \frac{m_0(X)}{1 - p(X)}\right)^2 \,\Big|\, X_1 = x_1\right\} = 0.
\]
In other words, regression-based estimators are always more efficient than IPW-type estimators in this general setting. On the other hand, the above investigations are mainly of theoretical interest; in practice, we may avoid choosing those bandwidths, as they are often very difficult to select properly, and otherwise the estimators would perform worse.

Simulations
To verify the theoretical results, in this section we conduct simulation studies to compare the regression-based ORCATE, PRCATE, SRCATE and NRCATE estimators with the IPW-based OCATE, PCATE, SCATE and NCATE estimators (Abrevaya et al., 2015). We set $p = \dim(X) \in \{2, 3, 4\}$ to avoid the curse of dimensionality under nonparametric estimation; based on our experience and the theoretical results, when $p$ is large, NRCATE is very hard to implement. As is well known, bandwidth selection plays an important role in the N-W estimation, so we first discuss this issue.

Note that ORCATE and PRCATE only involve one bandwidth, $h_1$, used in the second step of the estimation procedure. We first check how to choose bandwidth sequences and kernel functions satisfying conditions (A1) through (A7). To this end, consider
\[
h_1 = a_1 \cdot n^{-\frac{1}{k + 2 s_1} - \delta_1}, \qquad h_2 = a_2 \cdot n^{-\frac{1}{p + s_2} - \delta_2}, \qquad h_3 = a_3 \cdot n^{-\frac{1}{\max\{r(0), r(1)\} + s_3} - \delta_3}, \tag{10}
\]
with $a_1, a_2, a_3 > 0$ and $\delta_1, \delta_2, \delta_3 > 0$, where $\delta_1$, $\delta_2$ and $\delta_3$ can be selected as small as necessary or desired. It is clear that $h_1$, $h_2$ and $h_3$ satisfy conditions (A1), (A3) and (A6). To satisfy condition (A4), we set the kernel orders as $s_2 = p$ and $p + 1$ for even and odd $p$, respectively, and $s_1 = s_2 + 2$; to satisfy condition (A7), under the semiparametric dimension reduction structure, we set $s_3 = \max\{r(0), r(1)\}$ and $\max\{r(0), r(1)\} + 1$ for even and odd $\max\{r(0), r(1)\}$, respectively. Based on the above values of $s_1$, $s_2$ and $s_3$, the first parts of conditions (A4) and (A7) are verified directly. For the second parts, note that when $s_2 \ge p$ and $s_3 \ge \max\{r(0), r(1)\}$, elementary calculations with the rates in (10) give $h_2^{s_2} h_1^{-s_1 - k} \to 0$ and, invoking condition (A1), $n h_1^k h_2^{2 s_2} = n h_1^{2 s_1 + k} \cdot h_2^{2 s_2} h_1^{-2 s_1} \to 0$. Since $\delta_1$, $\delta_2$ and $\delta_3$ can be arbitrarily small, condition (A4) is satisfied; similarly, together with condition (A6), condition (A7) can also be satisfied.

To examine the finite-sample performance of the CATE estimators, consider the following three models:

Model 1: $Y(0) = 0$, $Y(1) = X_1^2 + X_2^2 + \epsilon$, $p(X) = \exp(X_1 + X_2)/\{1 + \exp(X_1 + X_2)\}$.

Model 2: $Y(0) = 0$, $Y(1) = X_1 + X_2 + X_3 + X_4 + \epsilon$, $p(X) = \exp\{0.5(X_1 + X_2 + X_3 + X_4)\}/[1 + \exp\{0.5(X_1 + X_2 + X_3 + X_4)\}]$.

Model 3: $Y(0) = 0$, $Y(1) = X_2 + X_3 + \epsilon$, $p(X) = \exp(X_2 + X_3)/\{1 + \exp(X_2 + X_3)\}$.

Model 1 has central mean subspaces of dimensions 2 and 0 for the treatment and control groups; Model 2 is used to verify Theorem 4; Model 3 is set up to justify the theory in Corollary 2. The dimensions of the central mean subspaces for the treatment and control groups are 1 and 0 in Models 2 and 3. For Model 1, $X = (X_1, X_2)^\top$ is generated by $X_1 \sim U(-.,.)$ and $X_2 = (1 + 2 X_1)^2 + \zeta_1$, where $\zeta_1 \sim U(-.,.)$ and $\epsilon \sim N(0, .)$. For Model 2, we generate $X = (X_1, X_2, X_3, X_4)^\top$ by $X_1 \sim U(-.,.)$, $X_2 = 1 + X_1 + \zeta_1$, $X_3 = (1 + X_1)^2 + \zeta_2$ and $X_4 = (-X_1)^2 + \zeta_3$, where $\zeta_j \overset{iid}{\sim} U(-.,.)$ and $\epsilon \sim N(0, .)$, $j = 1, 2, 3$. In Model 3, $X = (X_1, X_2, X_3)^\top$ is given by $X_1 \sim U(-.,.)$, $X_2 = 1 + X_1 + \vartheta_1$ and $X_3 = (1 + X_1)(-X_1) + \vartheta_2$, where $\vartheta_j \overset{iid}{\sim} U(-.,.)$ and $\epsilon \sim N(0, .)$, $j = 1, 2$.

We take $n = 200$ and $n = 500$, and the number of replications is 500. Let $T(x_1) = \sqrt{n h_1}\,[\hat\tau(x_1) - \tau(x_1)]$; we report the estimated standard deviation (SD), the BIAS and the MSE of $T(x_1)$. For the bandwidth selection described above, we have the following choices.

a). For Model 1, as $p = 2$, equation (10) gives $s_1 = 4$, $s_2 = 2$ and $s_3 = 2$. We then choose $h_1 = a_1 \cdot n^{-1/9}$ for $a_1 = 0.\cdot$, $h_2 = a_2 \cdot n^{-1/4}$ for $a_2 \in \{0.\cdot, 0.\cdot\}$, and $h_3 = a_3 \cdot n^{-1/4}$ for $a_3 \in \{0.\cdot, 0.\cdot, 0.\cdot\}$. Here, $a_1$, $a_2$ and $a_3$ are called baselines.

b). For Model 2, as $p = 4$, we take $h_1 = a_1 \cdot n^{-1/13}$ for $a_1 = 0.\cdot$, $h_2 = a_2 \cdot n^{-1/8}$ for $a_2 \in \{0.\cdot, 0.\cdot, 0.\cdot, 0.\cdot\}$, and $h_3 = a_3 \cdot n^{-1/3}$ for $a_3 \in \{0.\cdot, 0.\cdot, 0.\cdot\}$.

c). For Model 3, as $p = 3$, we take $h_1 = a_1 \cdot n^{-1/13}$ for $a_1 = 0.\cdot$, $h_2 = a_2 \cdot n^{-1/7}$ for $a_2 \in \{0.\cdot, 0.\cdot\}$, and $h_3 = a_3 \cdot n^{-1/3}$ for $a_3 \in \{0.\cdot, 0.\cdot, 0.\cdot\}$.

To make the simulation results more accessible, we tabulate the results in Tables 1-3, with some further results in the Appendix, and plot the SDs of all estimators divided by the SD of NRCATE to show the relative efficiency in Figures 1-3. We choose a Gaussian kernel and derive higher-order kernels from it.

Table 1: The distribution of $\sqrt{n h_1}\,[\hat\tau(x_1) - \tau(x_1)]$ for Model 1. Columns OR, PR, SR and NR are the regression-based estimators; N, S, P and O are the IPW-based estimators. The left block of columns is for $n = 200$ and the right block for $n = 500$.
        x  | n=200: OR    PR    SR    NR    N     S     P     O     | n=500: OR    PR    SR    NR    N     S     P     O
SD   -0.4 | 0.187 0.221 0.218 0.213 0.363 0.375 0.397 0.399 | 0.191 0.222 0.214 0.217 0.386 0.395 0.415 0.419
SD   -0.2 | 0.203 0.217 0.210 0.215 0.381 0.390 0.399 0.405 | 0.182 0.192 0.179 0.195 0.349 0.357 0.367 0.368
SD    0   | 0.193 0.201 0.213 0.213 0.446 0.467 0.471 0.480 | 0.192 0.202 0.211 0.213 0.404 0.415 0.453 0.466
SD    0.2 | 0.196 0.204 0.238 0.236 0.430 0.440 0.468 0.496 | 0.195 0.204 0.230 0.227 0.410 0.420 0.453 0.479
SD    0.4 | 0.197 0.213 0.241 0.239 0.394 0.415 0.443 0.437 | 0.200 0.225 0.243 0.241 0.392 0.395 0.443 0.446
BIAS -0.4 | -0.001 0.000 -0.046 -0.004 0.012 0.032 0.017 0.024 | 0.006 -0.008 -0.123 -0.023 -0.025 0.004 -0.007 -0.011
BIAS -0.2 | 0.016 0.013 0.102 0.067 -0.007 0.008 0.008 0.015 | 0.002 0.000 0.123 0.057 -0.026 -0.010 -0.016 -0.014
BIAS  0   | -0.018 -0.022 0.004 0.003 -0.034 -0.017 -0.021 -0.010 | 0.003 0.006 0.052 0.034 -0.017 -0.002 0.002 0.015
BIAS  0.2 | 0.001 0.001 0.003 0.008 0.010 0.048 0.003 0.014 | -0.016 -0.013 -0.006 -0.001 0.003 0.009 0.013 0.021
BIAS  0.4 | -0.001 0.006 -0.005 -0.006 0.028 0.063 0.034 0.024 | 0.006 0.002 -0.009 -0.008 0.043 0.043 0.010 0.000
MSE  -0.4 | 0.035 0.049 0.049 0.045 0.132 0.142 0.158 0.160 | 0.037 0.049 0.061 0.048 0.149 0.156 0.172 0.176
MSE  -0.2 | 0.041 0.047 0.054 0.051 0.145 0.152 0.159 0.165 | 0.033 0.037 0.047 0.041 0.123 0.128 0.135 0.135
MSE   0   | 0.038 0.041 0.045 0.046 0.200 0.219 0.222 0.230 | 0.037 0.041 0.047 0.047 0.164 0.172 0.205 0.217
MSE   0.2 | 0.038 0.042 0.057 0.056 0.185 0.196 0.219 0.246 | 0.038 0.042 0.053 0.052 0.168 0.176 0.206 0.230
MSE   0.4 | 0.039 0.046 0.058 0.057 0.156 0.177 0.198 0.191 | 0.040 0.051 0.059 0.058 0.156 0.158 0.196 0.199

Panel 2 (second bandwidth setting $(h_1, h_2, h_3)$):
        x  | n=200: OR    PR    SR    NR    N     S     P     O     | n=500: OR    PR    SR    NR    N     S     P     O
SD   -0.4 | 0.197 0.227 0.221 0.212 0.355 0.371 0.386 0.378 | 0.184 0.225 0.223 0.224 0.382 0.393 0.404 0.413
SD   -0.2 | 0.177 0.191 0.191 0.196 0.351 0.352 0.377 0.380 | 0.189 0.201 0.192 0.206 0.376 0.391 0.401 0.404
SD    0   | 0.185 0.199 0.206 0.207 0.445 0.453 0.471 0.480 | 0.186 0.200 0.200 0.202 0.412 0.417 0.454 0.465
SD    0.2 | 0.197 0.201 0.229 0.225 0.457 0.463 0.508 0.542 | 0.202 0.209 0.230 0.228 0.446 0.459 0.492 0.514
SD    0.4 | 0.208 0.229 0.254 0.253 0.388 0.393 0.417 0.440 | 0.195 0.212 0.236 0.234 0.379 0.378 0.417 0.434
BIAS -0.4 | 0.007 0.007 -0.060 -0.004 -0.014 0.011 0.008 0.002 | -0.004 -0.014 -0.150 -0.034 -0.047 -0.017 -0.028 -0.029
BIAS -0.2 | 0.010 0.008 0.095 0.068 -0.029 -0.013 -0.009 -0.014 | 0.011 0.005 0.127 0.062 0.006 0.022 0.026 0.025
BIAS  0   | 0.014 0.010 0.044 0.041 -0.007 0.006 -0.002 0.000 | 0.007 0.005 0.058 0.040 -0.010 -0.002 -0.007 0.000
BIAS  0.2 | 0.007 0.001 0.000 0.004 -0.017 0.003 -0.014 -0.007 | -0.012 -0.010 -0.001 0.000 -0.017 -0.013 -0.016 -0.009
BIAS  0.4 | -0.001 -0.008 -0.018 -0.023 0.014 0.029 0.007 -0.006 | 0.027 0.032 0.021 0.019 0.064 0.066 0.049 0.043
MSE  -0.4 | 0.039 0.051 0.052 0.045 0.126 0.138 0.149 0.143 | 0.034 0.051 0.072 0.052 0.148 0.155 0.164 0.172
MSE  -0.2 | 0.031 0.037 0.046 0.043 0.124 0.124 0.142 0.145 | 0.036 0.040 0.053 0.046 0.141 0.153 0.162 0.164
MSE   0   | 0.034 0.040 0.044 0.045 0.198 0.205 0.222 0.230 | 0.035 0.040 0.043 0.043 0.170 0.174 0.206 0.216
MSE   0.2 | 0.039 0.040 0.052 0.051 0.209 0.214 0.258 0.294 | 0.041 0.044 0.053 0.052 0.199 0.211 0.242 0.265
MSE   0.4 | 0.043 0.053 0.065 0.064 0.151 0.155 0.174 0.194 | 0.039 0.046 0.056 0.055 0.148 0.148 0.176 0.190
[Figure 1 about here: six panels of relative efficiency against $x_1$ — "n=200 for model 1", "n=500 for model 1", and regression-method and IPW-method panels at each sample size; the legend covers ORCATE, PRCATE, SRCATE, NRCATE, NCATE, SCATE, PCATE and OCATE.]

Figure 1: Relative efficiency of the CATE estimators against NRCATE for Model 1, based on the results in panel 2 of Table 1.
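The SD, BIAS and MSE entries in the tables, and the relative-efficiency curves in the figures, are simple functionals of the Monte Carlo replications of $T(x)$. A minimal sketch of how such summaries can be computed, using synthetic replication values for two hypothetical estimators rather than the paper's results:

```python
import numpy as np

def summarize(T):
    """SD, BIAS and MSE of Monte Carlo replications of T(x);
    rows are replications, columns are estimators."""
    sd = T.std(axis=0, ddof=1)
    bias = T.mean(axis=0)
    mse = (T ** 2).mean(axis=0)      # approximately BIAS^2 + SD^2
    return sd, bias, mse

# Synthetic replications for two hypothetical estimators:
rng = np.random.default_rng(1)
T = np.column_stack([rng.normal(0.0, 0.2, 500),
                     rng.normal(0.0, 0.4, 500)])
sd, bias, mse = summarize(T)
rel_eff = sd / sd[0]   # SDs divided by a benchmark SD, as in Figures 1-3
```

The relative-efficiency curves in Figures 1-3 are exactly this ratio, computed at each point $x$ with NRCATE as the benchmark.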
The observations are as follows.

First, as expected, a larger sample size yields smaller SD and MSE, reflecting the estimation consistency. The dimension of $X$ also affects the estimation performance: when $p$ increases from 2 to 4, both SD and MSE increase noticeably, particularly when $n = 500$.

Second, the comparisons show the significant advantage of outcome regression-based estimation over IPW-based estimation. Even though, in theory, NRCATE is asymptotically equivalent to NCATE, the difference in estimation efficiency is still very pronounced. All tables and figures indicate this clearly: every IPW-based estimator has a much larger SD than every regression-based estimator.

Third, as discussed before, the performances of NRCATE and SRCATE are closely tied to the affiliation of the given covariates to the set of arguments of the outcome regression. This finding is also confirmed in Tables 2 and 3 and Figures 2 and 3. In Model 2, $X_1 \subset_{k-q} \beta_1^\top X$ and $X_1 \subset_{k-q} \beta_0^\top X$ with $k = 1$ and $q = 0$; thus, in theory, SRCATE shares the same asymptotic variance as PRCATE and ORCATE and is more efficient than NRCATE. From Table 2 and Figure 2, we can see that the SDs of SRCATE are similar to those of PRCATE and ORCATE, which are smaller than those of NRCATE. In Model 3, $X_1$ belongs to $\tilde X = (X_1, X_2)^\top$; the asymptotic efficiencies are then equivalent in theory, and the SDs of SRCATE in Table 3 are similar to, and even slightly smaller than, the others. In this case, all outcome regression-based estimations have smaller SDs than all IPW-based estimations; Figure 3 shows this clearly.

Table 2: The distribution of $\sqrt{nh}\,[\widehat\tau(x) - \tau(x)]$ for Model 2. For each sample size, the columns are OR, PR, SR, NR, N, S, P and O. Panel 1 uses the first of the two bandwidth settings $(h_1, h_2, h_3)$.
        x  | n=200: OR    PR    SR    NR    N     S     P     O     | n=500: OR    PR    SR    NR    N     S     P     O
SD   -0.4 | 0.384 0.390 0.403 0.410 1.023 1.166 1.151 1.156 | 0.354 0.358 0.375 0.395 0.983 1.122 1.106 1.106
SD   -0.2 | 0.367 0.370 0.380 0.419 1.035 1.205 1.200 1.200 | 0.354 0.354 0.362 0.380 0.969 1.132 1.104 1.116
SD    0   | 0.366 0.369 0.385 0.415 0.981 1.159 1.151 1.140 | 0.385 0.388 0.399 0.414 0.965 1.128 1.087 1.091
SD    0.2 | 0.374 0.376 0.395 0.417 0.992 1.180 1.137 1.129 | 0.364 0.365 0.370 0.388 1.008 1.141 1.103 1.126
SD    0.4 | 0.397 0.404 0.430 0.427 1.037 1.186 1.139 1.129 | 0.362 0.365 0.384 0.407 1.067 1.250 1.190 1.199
BIAS -0.4 | 0.014 0.009 0.056 0.031 -0.692 -0.134 0.048 0.051 | -0.017 -0.014 0.082 -0.003 -1.069 -0.201 -0.010 -0.010
BIAS -0.2 | 0.015 0.012 0.043 0.021 -0.778 -0.207 -0.042 -0.034 | 0.010 0.012 0.050 0.016 -1.038 -0.198 -0.014 0.001
BIAS  0   | -0.005 -0.008 -0.012 -0.001 -0.782 -0.191 -0.023 -0.027 | -0.025 -0.025 -0.034 -0.011 -1.107 -0.243 -0.082 -0.074
BIAS  0.2 | 0.004 0.004 -0.021 0.005 -0.652 -0.047 0.062 0.063 | 0.017 0.015 -0.034 0.020 -1.059 -0.158 -0.058 -0.055
BIAS  0.4 | 0.002 0.003 -0.036 0.002 -0.578 0.045 0.103 0.091 | 0.020 0.016 -0.053 0.005 -0.905 0.049 0.026 0.000
MSE  -0.4 | 0.148 0.152 0.166 0.169 1.525 1.378 1.328 1.338 | 0.125 0.128 0.147 0.156 2.109 1.299 1.224 1.224
MSE  -0.2 | 0.135 0.137 0.146 0.176 1.676 1.494 1.443 1.441 | 0.125 0.126 0.133 0.145 2.015 1.321 1.220 1.246
MSE   0   | 0.134 0.136 0.148 0.173 1.574 1.380 1.324 1.301 | 0.149 0.151 0.161 0.171 2.158 1.331 1.189 1.195
MSE   0.2 | 0.140 0.141 0.157 0.174 1.408 1.394 1.296 1.279 | 0.133 0.134 0.138 0.151 2.137 1.326 1.219 1.272
MSE   0.4 | 0.158 0.163 0.187 0.183 1.410 1.410 1.307 1.283 | 0.131 0.133 0.150 0.166 1.957 1.566 1.417 1.437

Panel 2 (second bandwidth setting $(h_1, h_2, h_3)$):
        x  | n=200: OR    PR    SR    NR    N     S     P     O     | n=500: OR    PR    SR    NR    N     S     P     O
SD   -0.4 | 0.385 0.392 0.397 0.440 1.066 1.266 1.236 1.244 | 0.375 0.379 0.385 0.418 0.946 1.125 1.066 1.095
SD   -0.2 | 0.386 0.387 0.389 0.430 0.992 1.159 1.172 1.176 | 0.357 0.361 0.370 0.397 0.937 1.116 1.083 1.102
SD    0   | 0.379 0.384 0.387 0.411 1.031 1.227 1.203 1.213 | 0.387 0.388 0.399 0.420 0.933 1.118 1.103 1.095
SD    0.2 | 0.388 0.387 0.403 0.443 1.089 1.254 1.225 1.247 | 0.375 0.376 0.383 0.407 1.028 1.198 1.161 1.177
SD    0.4 | 0.376 0.379 0.392 0.411 1.056 1.238 1.143 1.175 | 0.362 0.366 0.385 0.411 1.047 1.197 1.126 1.174
BIAS -0.4 | 0.014 0.011 0.054 0.029 -0.836 -0.207 -0.011 -0.014 | 0.003 0.003 0.096 0.004 -1.234 -0.182 0.008 0.002
BIAS -0.2 | -0.010 -0.013 0.017 0.023 -0.879 -0.262 -0.073 -0.060 | 0.005 0.005 0.044 0.009 -1.200 -0.133 0.049 0.056
BIAS  0   | 0.038 0.035 0.030 0.038 -0.860 -0.192 -0.080 -0.041 | -0.003 -0.002 -0.012 0.001 -1.251 -0.144 -0.024 -0.017
BIAS  0.2 | 0.028 0.028 0.005 0.041 -0.715 -0.046 0.060 0.090 | -0.011 -0.010 -0.058 -0.001 -1.231 -0.133 -0.057 -0.042
BIAS  0.4 | 0.009 0.007 -0.034 0.000 -0.746 -0.056 -0.030 -0.017 | -0.019 -0.018 -0.089 -0.010 -1.125 -0.004 -0.031 -0.015
MSE  -0.4 | 0.148 0.154 0.161 0.194 1.836 1.646 1.529 1.548 | 0.140 0.144 0.157 0.174 2.418 1.299 1.137 1.199
MSE  -0.2 | 0.149 0.150 0.151 0.186 1.756 1.411 1.378 1.387 | 0.128 0.131 0.139 0.157 2.319 1.262 1.176 1.218
MSE   0   | 0.145 0.148 0.151 0.171 1.803 1.542 1.454 1.474 | 0.150 0.151 0.159 0.177 2.436 1.271 1.217 1.200
MSE   0.2 | 0.151 0.151 0.162 0.198 1.696 1.575 1.503 1.562 | 0.140 0.141 0.150 0.165 2.573 1.453 1.350 1.386
MSE   0.4 | 0.142 0.144 0.155 0.169 1.673 1.535 1.308 1.380 | 0.132 0.134 0.156 0.169 2.362 1.433 1.269 1.378

In this section, we apply SRCATE, as the dimensionality ($p = 15$) of $X$ is high, to analyse the ACTG 175 data set, which can be obtained from the R package speff2trial. This data set was collected from a randomized clinical trial that evaluated the treatment effect when
[Figure 2 about here: six panels of relative efficiency against $x_1$ for Model 2, in the same layout as Figure 1.]

Figure 2: Relative efficiency of the CATE estimators against NRCATE for Model 2, based on the results in panel 2 of Table 2.

either one or two therapies were used for HIV-infected adults; see Hammer et al. (1996) and Song and Ma (2008) for more details. As discussed before, our goal is to explore the heterogeneity of this treatment effect across subpopulations. We take age as $X_1$ to check how the expected treatment effect changes with age.

A very brief description of the data set is as follows. The outcome here is the CD4 T cell count at baseline, and the treatment indicator $D$ is a binary variable: $D = 0$ means receiving zidovudine only, and $D = 1$ means receiving the two therapies simultaneously. As documented by a number of authors, we take $Y = \log(\mathrm{CD4})$ and delete some infinite values after the logarithmic transformation, after which the number of observations is $n = 2136$. Further, to guarantee the unconfoundedness assumption, $X$ consists of the following 15 covariates: pidnum (patient's ID number); age (age in years at baseline); wtkg (weight in kg at baseline); hemo (hemophilia); homo (homosexual activity); drugs

Table 3: The distribution of $\sqrt{nh}\,[\widehat\tau(x) - \tau(x)]$ for Model 3. For each sample size, the columns are OR, PR, SR, NR, N, S, P and O. Panel 1 uses the first of the two bandwidth settings $(h_1, h_2, h_3)$.
        x  | n=200: OR    PR    SR    NR    N     S     P     O     | n=500: OR    PR    SR    NR    N     S     P     O
SD   -0.4 | 0.327 0.328 0.330 0.322 0.498 0.505 0.538 0.546 | 0.282 0.286 0.287 0.280 0.481 0.494 0.492 0.495
SD   -0.2 | 0.308 0.310 0.314 0.310 0.481 0.480 0.535 0.530 | 0.285 0.286 0.287 0.282 0.471 0.474 0.488 0.486
SD    0   | 0.301 0.301 0.306 0.296 0.452 0.467 0.514 0.505 | 0.287 0.289 0.294 0.285 0.479 0.478 0.506 0.512
SD    0.2 | 0.316 0.319 0.327 0.317 0.485 0.500 0.516 0.516 | 0.317 0.317 0.323 0.314 0.493 0.492 0.504 0.500
SD    0.4 | 0.290 0.291 0.301 0.298 0.485 0.509 0.514 0.520 | 0.297 0.298 0.299 0.290 0.476 0.486 0.493 0.490
BIAS -0.4 | -0.016 -0.021 -0.023 -0.037 -0.067 -0.048 -0.031 -0.031 | -0.008 -0.009 -0.012 -0.032 -0.045 -0.038 -0.014 -0.014
BIAS -0.2 | 0.006 0.003 0.007 0.022 0.016 0.021 0.007 0.009 | 0.002 0.000 0.001 0.019 0.015 0.024 0.010 0.010
BIAS  0   | -0.002 -0.004 0.001 0.024 0.039 0.034 0.011 0.010 | -0.011 -0.013 -0.007 0.022 0.027 0.042 0.000 -0.002
BIAS  0.2 | 0.004 0.001 0.007 0.015 -0.012 -0.008 -0.015 -0.020 | 0.009 0.008 0.009 0.024 0.012 0.021 0.005 0.003
BIAS  0.4 | 0.010 0.005 0.001 -0.017 -0.066 -0.043 -0.026 -0.027 | -0.005 -0.006 -0.010 -0.026 -0.062 -0.061 -0.038 -0.040
MSE  -0.4 | 0.107 0.108 0.109 0.105 0.252 0.257 0.290 0.299 | 0.080 0.082 0.083 0.080 0.234 0.245 0.243 0.245
MSE  -0.2 | 0.095 0.096 0.098 0.097 0.231 0.231 0.287 0.281 | 0.081 0.082 0.082 0.080 0.222 0.225 0.238 0.236
MSE   0   | 0.090 0.091 0.093 0.088 0.206 0.219 0.265 0.255 | 0.082 0.084 0.087 0.082 0.230 0.231 0.256 0.262
MSE   0.2 | 0.100 0.102 0.107 0.101 0.236 0.250 0.267 0.267 | 0.100 0.101 0.104 0.099 0.243 0.242 0.254 0.250
MSE   0.4 | 0.084 0.085 0.091 0.089 0.240 0.261 0.265 0.271 | 0.088 0.089 0.090 0.085 0.230 0.240 0.244 0.242

Panel 2 (second bandwidth setting $(h_1, h_2, h_3)$):
        x  | n=200: OR    PR    SR    NR    N     S     P     O     | n=500: OR    PR    SR    NR    N     S     P     O
SD   -0.4 | 0.329 0.334 0.337 0.324 0.498 0.497 0.515 0.522 | 0.284 0.288 0.291 0.283 0.490 0.497 0.505 0.510
SD   -0.2 | 0.304 0.308 0.314 0.301 0.432 0.441 0.464 0.453 | 0.297 0.301 0.307 0.295 0.479 0.473 0.499 0.498
SD    0   | 0.314 0.319 0.325 0.303 0.486 0.485 0.545 0.540 | 0.317 0.317 0.321 0.309 0.484 0.474 0.510 0.512
SD    0.2 | 0.301 0.308 0.314 0.298 0.462 0.467 0.499 0.500 | 0.292 0.293 0.292 0.284 0.464 0.463 0.482 0.483
SD    0.4 | 0.302 0.306 0.313 0.296 0.503 0.510 0.525 0.525 | 0.293 0.298 0.301 0.289 0.472 0.485 0.477 0.479
BIAS -0.4 | -0.021 -0.016 -0.019 -0.027 -0.042 -0.036 -0.009 -0.010 | 0.000 0.002 0.000 -0.019 0.007 -0.002 0.032 0.034
BIAS -0.2 | 0.004 0.007 0.015 0.029 0.019 0.029 0.012 0.008 | -0.015 -0.011 -0.011 0.007 0.008 0.021 -0.002 -0.001
BIAS  0   | -0.012 -0.011 -0.009 0.014 -0.007 0.000 -0.042 -0.045 | 0.006 0.010 0.012 0.046 0.021 0.039 -0.006 -0.002
BIAS  0.2 | 0.022 0.023 0.032 0.039 0.037 0.048 0.036 0.035 | 0.004 0.008 0.009 0.028 0.014 0.025 0.005 0.008
BIAS  0.4 | -0.023 -0.020 -0.022 -0.034 -0.040 -0.027 -0.009 -0.012 | 0.003 0.006 0.007 -0.019 -0.004 -0.006 0.016 0.020
MSE  -0.4 | 0.109 0.112 0.114 0.106 0.250 0.248 0.265 0.272 | 0.080 0.083 0.085 0.081 0.240 0.247 0.256 0.261
MSE  -0.2 | 0.093 0.095 0.099 0.091 0.187 0.196 0.216 0.206 | 0.088 0.091 0.094 0.087 0.229 0.224 0.249 0.248
MSE   0   | 0.099 0.102 0.106 0.092 0.236 0.235 0.299 0.293 | 0.101 0.101 0.103 0.097 0.234 0.226 0.260 0.262
MSE   0.2 | 0.091 0.095 0.100 0.090 0.215 0.221 0.250 0.251 | 0.085 0.086 0.085 0.082 0.215 0.215 0.232 0.234
MSE   0.4 | 0.092 0.094 0.098 0.089 0.255 0.261 0.276 0.276 | 0.086 0.089 0.091 0.084 0.223 0.235 0.228 0.230

(history of intravenous drug use); karnof (Karnofsky score); oprior (non-zidovudine antiretroviral therapy prior to initiation of study treatment); zprior (zidovudine use prior to treatment initiation); preanti (number of days of previously received antiretroviral therapy); race; gender; str2 (antiretroviral history); offtrt (indicator of off-treatment before 96±5 weeks); and days (number of days until the first occurrence of (i) a decline in CD4 T cell count of
at least 50, (ii) an event indicating progression to AIDS, or (iii) death). We now estimate the CATE over the age interval between 20 and 57 to avoid the boundary effect when the nonparametric estimation method is involved. This range runs approximately from the 0.
025 quantile to 0 .
975 quantile of the data. To apply SRCATE, we use the sufficient dimension reduction method developed by Xia et al. (2002), now known as MAVE, to estimate the projection matrices $\beta_1$ and $\beta_0$ and the associated structural dimensions. The results are $r(1) = 2$ and $r(0) = 3$. From these, we then have $s = \max\{r(1), r(0)\} + 1 = 4$, and we take the bandwidths proportional to $\widehat\sigma_r$ and $\widehat\sigma$ at the rates of Subsection 3.1, where $\widehat\sigma_r = \sqrt{\mathrm{var}(\widehat\beta^\top X)}$ with $\widehat\beta$ the estimated projection, and $\widehat\sigma = 2\sqrt{\mathrm{var}(X_1)}$. As in the simulation studies, a Gaussian kernel is used.

Figure 4 shows the curve of the estimated CATE as a function of age. Note that the curve lies well above zero. In other words, receiving the two therapies simultaneously has a much better treatment effect than receiving only one (zidovudine).

[Figure 3 about here: six panels of relative efficiency against $x_1$ for Model 3, in the same layout as Figure 1.]

Figure 3: Relative efficiency of the CATE estimators against NRCATE for Model 3, based on the results in panel 2 of Table 3.

Song and Ma (2008) also obtained this conclusion. But the investigation of the heterogeneity shows that the treatment effect is influenced by age. As shown in Figure 4, before the age of 30, receiving the two therapies raises immunity; after that, the advantage of this treatment gradually weakens. Thus, such a treatment seems most useful for patients whose ages are around 30.
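The age-varying effect just described rests on the basic smoothing step underlying all of the regression-based estimators: a local-constant (Nadaraya-Watson) fit of each treatment arm, differenced over a grid of the given covariate. A minimal sketch on synthetic data (not ACTG 175), with an assumed effect hump near age 30 and an illustrative bandwidth $h = 2$, neither of which comes from the paper:

```python
import numpy as np

def nw(x0, x, y, h):
    # Local-constant (Nadaraya-Watson) regression estimate at the point x0.
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)   # Gaussian kernel weights
    return np.sum(w * y) / np.sum(w)

def cate_curve(grid, age, Y, D, h):
    # tau_hat(x) = m1_hat(x) - m0_hat(x): smooth each treatment arm separately.
    m1 = np.array([nw(x0, age[D == 1], Y[D == 1], h) for x0 in grid])
    m0 = np.array([nw(x0, age[D == 0], Y[D == 0], h) for x0 in grid])
    return m1 - m0

# Synthetic data: a hypothetical treatment-effect hump centered at age 30.
rng = np.random.default_rng(2)
age = rng.uniform(20, 57, 1000)
D = rng.binomial(1, 0.5, 1000)
Y = 0.02 * age + D * np.exp(-0.005 * (age - 30) ** 2) + rng.normal(0, 0.1, 1000)
grid = np.linspace(25, 52, 28)
tau = cate_curve(grid, age, Y, D, h=2.0)   # estimated CATE over the age grid
```

The interior grid (25 to 52 here) mirrors the paper's practice of trimming the boundary region, where kernel estimates are unreliable.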
[Figure 4 about here: the estimated CATE curve plotted against age.]

Figure 4: Conditional average treatment effect curve over age.

In this paper, we propose four regression-based estimators of CATE, aiming to capture the heterogeneity of a treatment effect across subpopulations. The systematic investigation identifies the important factors that affect the asymptotic behaviour of the estimators: the convergence rates of the outcome regression function estimators and the affiliation of the given covariates to the set of arguments of the outcome regression functions. Further, any regression-based estimation can be asymptotically more efficient than any propensity score-based estimation, and can at most achieve the asymptotic efficiency of nonparametric regression-based estimation in some cases. These results give a relatively complete profile of propensity score-based and regression-based estimation for CATE. From this research, semiparametric regression-based estimation (SRCATE) is worth recommending, as it can avoid model misspecification as well as the curse of dimensionality when dimension reduction and feature selection approaches are combined; see Luo et al. (2017) and Ma et al. (2019). In this paper, we only discuss the cases with correctly specified models. When a model is misspecified globally, a further topic is the resulting asymptotic bias. Here, global misspecification means that the assumed model does not converge to the underlying model; if it does converge, we call the misspecification local. We will then study at which rate of convergence the asymptotic bias vanishes, together with the corresponding asymptotic efficiency. Another topic is doubly robust estimation, as it can greatly alleviate model misspecification. The research is ongoing.
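The simulations above use a Gaussian kernel with higher-order kernels derived from it. One standard construction, which may or may not be the exact one used in the paper, multiplies the Gaussian density by a polynomial so that the second moment vanishes while the kernel still integrates to one. A minimal sketch of a fourth-order Gaussian-based kernel, with a numerical check of both moment conditions:

```python
import numpy as np

def gauss(u):
    # Standard Gaussian density: the second-order base kernel.
    return np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)

def gauss4(u):
    # Fourth-order kernel 0.5*(3 - u^2)*phi(u): unit mass, zero second moment.
    return 0.5 * (3.0 - u ** 2) * gauss(u)

# Check the moment conditions numerically on a fine grid.
u = np.linspace(-10.0, 10.0, 200001)
du = u[1] - u[0]
mass = np.sum(gauss4(u)) * du                    # should be close to 1
second_moment = np.sum(u ** 2 * gauss4(u)) * du  # should be close to 0
```

Such higher-order kernels reduce the smoothing bias order, which is what allows the bandwidth-rate conditions such as A4 and A7 to be met when the covariate dimension grows.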
We first give some notation.

(1) $C$ and $M$ stand for two generic bounded constants; $\Xi$ is the $\sigma$-field generated by $X_1, \ldots, X_n$.

(2) $\epsilon_{ti} = Y_i - E(Y(t) \mid X_i)$, $\tau_t(x) = E[E\{Y \mid D = t, X\} \mid X_1 = x]$ and $Z_t = \beta_t^\top X$ for $t = 0, 1$ and $i = 1, \ldots, n$.

(3) Write $K\!\left(\frac{X_{1i} - x}{h_2}\right)$ as $K_{h_2}(X_i)$, $K\!\left(\frac{X_i - X_j}{h_1}\right)$ as $K_{h_1}(X_i - X_j)$, and $K\!\left(\frac{Z_i - Z_j}{h_3}\right)$ as $K_{h_3}(Z_i - Z_j)$.

In the two-step estimation procedure for CATE, the second step involves, for $i = 1, \ldots, n$, the quantities
$$\widehat K_{h_2}(X_i) = \sum_{j: j \ne i} w_{ij} K_{h_2}(X_j).$$
We call it the estimator of $K_{h_2}(X_i)$. In different circumstances, $w_{ij}$ can be different. Take NRCATE as an example, and write $w_{ij}$ as $w^N_{ij}$:
$$w^N_{ij} = \frac{\frac{1}{nh_1^p} K_{h_1}(X_i - X_j)}{\frac{1}{nh_1^p} \sum_{i=1}^n K_{h_1}(X_i - X_j)\, 1(D_i = 1)},$$
which depends on $X_1, \ldots, X_n$ only.

Lemma 1.
Given assumptions (C1)-(C4) in Subsection 2.1 and (A1)-(A4) in Subsections 2.2-2.3,
$$|w^N_{ij} - w^N_{ji}| = O_p(h_1) \cdot \frac{1}{nh_1^p}\, |K_{h_1}(X_i - X_j)|. \eqno(A.1)$$

Proof of Lemma 1.
By assumption (A2), $w^N_{ij} = w^N_{ji} = 0$ for $\|X_j - X_i\|_\infty > h_1$ (Abrevaya et al., 2015). Suppose then that $\|X_j - X_i\|_\infty \le h_1$. For all $j$, define
$$\widehat f(X_j) = \frac{1}{nh_1^p} \sum_{i: i \ne j} K_{h_1}(X_i - X_j).$$
It is clear that
$$w^N_{ij} = \frac{\frac{1}{nh_1^p} K_{h_1}(X_i - X_j)}{\frac{1}{nh_1^p} \sum_{i=1}^n K_{h_1}(X_i - X_j) 1(D_i = 1)} = \frac{\frac{1}{nh_1^p} K_{h_1}(X_i - X_j)}{\frac{1}{nh_1^p} \sum_{i=1}^n K_{h_1}(X_i - X_j)} \times \frac{\frac{1}{nh_1^p} \sum_{i=1}^n K_{h_1}(X_i - X_j)}{\frac{1}{nh_1^p} \sum_{i=1}^n K_{h_1}(X_i - X_j) 1(D_i = 1)} = \frac{\frac{1}{nh_1^p} K_{h_1}(X_i - X_j)}{\widehat f(X_j)\, \widehat p(X_j)}. \eqno(A.2)$$
Then
$$|w^N_{ij} - w^N_{ji}| = \frac{1}{nh_1^p} \left| \frac{K_{h_1}(X_i - X_j)}{\widehat p(X_j) \widehat f(X_j)} - \frac{K_{h_1}(X_j - X_i)}{\widehat p(X_i) \widehat f(X_i)} \right| = \frac{1}{nh_1^p}\, |K_{h_1}(X_i - X_j)| \left| \frac{1}{\widehat p(X_j) \widehat f(X_j)} - \frac{1}{\widehat p(X_i) \widehat f(X_i)} \right|$$
$$\le \frac{1}{nh_1^p}\, |K_{h_1}(X_i - X_j)| \left\{ \left| \frac{\widehat p(X_j)\widehat f(X_j) - p(X_j) f(X_j)}{\widehat p(X_j)\, p(X_j)\, \widehat f(X_j)\, f(X_j)} \right| + \left| \frac{\widehat p(X_i)\widehat f(X_i) - p(X_i) f(X_i)}{\widehat p(X_i)\, p(X_i)\, \widehat f(X_i)\, f(X_i)} \right| + \left| \frac{p(X_i) f(X_i) - p(X_j) f(X_j)}{p(X_i)\, p(X_j)\, f(X_i)\, f(X_j)} \right| \right\}. \eqno(A.3)$$
Under conditions (C1)-(C4) and (A1)-(A4) for nonparametric estimation,
$$\sup_i |\widehat f(X_i) - f(X_i)| = O_p\!\left( h_1^{s} + \sqrt{\frac{\log n}{n h_1^p}} \right), \qquad \sup_i |\widehat p(X_i) - p(X_i)| = O_p\!\left( h_1^{s} + \sqrt{\frac{\log n}{n h_1^p}} \right).$$
Since $s \ge p \ge$
2, assumption (A3) implies that $\sup_i |\widehat f(X_i) - f(X_i)| = o_p(h_1)$ and $\sup_i |\widehat p(X_i) - p(X_i)| = o_p(h_1)$. By the mean value theorem,
$$\sup_j \left| \frac{1}{\widehat p(X_j)\widehat f(X_j)} - \frac{1}{p(X_j) f(X_j)} \right| \le \sup_j \frac{1}{\{\widetilde p(X_j)\, \widetilde f(X_j)\}^{2}}\, \sup_j \left| \widehat p(X_j) \widehat f(X_j) - p(X_j) f(X_j) \right|,$$
where $\widetilde p(X_j)$ is a quantity between $\widehat p(X_j)$ and $p(X_j)$ and, similarly, $\widetilde f(X_j)$ is a quantity between $\widehat f(X_j)$ and $f(X_j)$. Since $f$ and $p$ are bounded away from zero, $\sup_j \{\widetilde p(X_j)\widetilde f(X_j)\}^{-2} = O_p(1)$. After a simple calculation, we have
$$\sup_j \left| \widehat p(X_j) \widehat f(X_j) - p(X_j) f(X_j) \right| = O_p\!\left( h_1^{s} + \sqrt{\frac{\log n}{n h_1^p}} \right) = o_p(h_1).$$
Therefore,
$$\sup_j \left| \frac{\widehat p(X_j)\widehat f(X_j) - p(X_j) f(X_j)}{\widehat p(X_j)\, p(X_j)\, \widehat f(X_j)\, f(X_j)} \right| = o_p(h_1), \qquad \sup_i \left| \frac{\widehat p(X_i)\widehat f(X_i) - p(X_i) f(X_i)}{\widehat p(X_i)\, p(X_i)\, \widehat f(X_i)\, f(X_i)} \right| = o_p(h_1).$$
As for the last term in (A.3), noticing that $f$ and $p$ are continuously differentiable on the compact support of $X$ and bounded away from zero, we have $|f(x_1)p(x_1) - f(x_2)p(x_2)| \le M \|x_1 - x_2\|_\infty$ for all $x_1, x_2 \in \mathcal{X}$ and a constant $M > 0$; $\|X_j - X_i\|_\infty \le h_1$ then leads to $|f(X_i)p(X_i) - f(X_j)p(X_j)| = O(h_1)$. Combining all the results yields (A.1). $\Box$

Proof of Theorem 2.
We can rewrite b m ( X i ) − b m ( X i ) − τ ( x ) as { b m ( X i ) − τ ( x ) }−{ b m ( X i ) − τ ( x ) } . Then based on (2), q nh k ( b τ ( x ) − τ ( x ))= √ nh k n P i =1 K h ( X i ) { [ b m ( X i ) − τ ( x )] − [ b m ( X i ) − τ ( x )] } nh k n P i =1 K h ( X i )= √ nh k n P i =1 K h ( X i ) { [ b m ( X i ) − τ ( x )] − [ b m ( X i ) − τ ( x )] } f ( x ) (1 + o p (1)) , (A.4) as sup x | nh k n X i =1 K h ( X i ) − f ( x ) | = o p (1) . First, deal with { b m ( X i ) − τ ( x ) } in (A.4). It is clear that q nh k n X i =1 K h ( X i ) [ b m ( X i ) − τ ( x )]= 1 q nh k ( n X i =1 K h ( X i ) [ b m ( X i ) − m ( X i )] + n X i =1 K h ( X i ) [ m ( X i ) − τ ( x )] ) =: 1 q nh k ( I n, + I n, ) . (A.5) A simple calculation yields that | q nh k I n, | ≤ sup x | b m ( X i ) − m ( X i ) | nh k n X i =1 | K h ( X i ) | . As h → nh k n P i =1 | K h ( X i ) | = O p (1), we then have √ nh k I n, = O p ( p h k ) = o p (1).26hus, equation (A.5) becomes q nh k n X i =1 K h ( X i ) [ b m ( X i ) − τ ( x )] = 1 q nh k n X i =1 K h ( X i ) [ m ( X i ) − τ ( x )] + o p (1) . Similarly, q nh k n X i =1 K h ( X i ) [ b m ( X i ) − τ ( x )] = 1 q nh k n X i =1 K h ( X i ) [ m ( X i ) − τ ( x )] + o p (1) . Altogether, the asymptotically linear representation of b τ ( x ) is q nh k { b τ ( x ) − τ ( x ) } = √ nh k n P i =1 K h ( X i ) { m ( X i ) − m ( X i ) − τ ( x ) } f ( x ) (1 + o p (1))= √ nh k n P i =1 K h ( X i ) { m ( X i ) − m ( X i ) − τ ( x ) } f ( x ) + o p (1) . The second equation is due to the asymptotic finiteness of the leading term that isasymptotically normal shown below. As it is the sum of independent variables, theasymptotic normality is easy to derive. Specifically, noticing that the random variables { K h ( X i ) [ m ( X i ) − m ( X i ) − τ ( X i )] } ni =1 are i.i.d., then we can apply Lyapunov’s central limit theorem to obtain the asymptoticdistribution shown in Theorem 2. 
Under the assumptions (C1)- (C4) and (A1), we derivethat q nh k { b τ ( x ) − τ ( x ) } d −→ N (cid:18) , || K || σ P ( x ) f ( x ) (cid:19) , we now give the formula of σ P ( x ). It is easy to see that when n → ∞ , the variance of √ nh k n P i =1 K h ( X i ) { m ( X i ) − m ( X i ) − τ ( x ) } f ( x ) converges to σ P ( x ) := E [ { m ( X ) − m ( X ) − τ ( x ) } | X = x ] . The proof of Theorem 2 is finished. (cid:3) roof of Theorem 3. First, we have q nh k ( b τ ( x ) − τ ( x ))= √ nh k n P i =1 K h ( X i ) [ b m ( X i ) − τ ( x )] nh k n P i =1 K h ( X i ) − √ nh k n P i =1 K h ( X i ) [ b m ( X i ) − τ ( x )] nh k n P i =1 K h ( X i ) , (A.6) where b m ( X i ) = nh p n P j =1 K h ( X j − X i ) Y j ( D j = 1) nh p n P j =1 K h ( X j − X i ) ( D j = 1) , b m ( X i ) = nh p n P j =1 K h ( X j − X i ) Y j ( D j = 0) nh p n P j =1 K h ( X j − X i ) ( D j = 0) . Similarly as the proof for Theorem 2, we have the following decomposition: q nh k n X i =1 K h ( X i ) [ m ( X i ) − τ ( x )]= 1 q nh k n X i =1 ǫ i ( D i = 1) K h ( X i ) p ( X i ) + 1 q nh k n X i =1 K h ( X i ) [ E ( Y (1) | X i ) − τ ( x )]+ 1 q nh k n X i =1 ǫ i ( D i = 1) n X j =1 K h ( X j ) ( w Nij − w Nji )+ 1 q nh k n X i =1 ǫ i ( D i = 1) n X j =1 K h ( X j ) w Nji − K h ( X i ) p ( X i ) + 1 q nh k n X i =1 K h ( X i ) nh p n P j =1 K h ( X j − X i ) ( D j = 1) EY (1) | X j nh p n P j =1 K h ( X j − X i ) ( D j = 1) − EY (1) | X i =: I n, + I n, + I n, + I n, + I n, , (A.7) w Nij = nh p K h ( X i − X j ) nh p n P i =1 K h ( X i − X j ) ( D i = 1) , ǫ i = Y i − E ( Y (1) | X i ) . Note that I n, and I n, in equation (A.7) yield the final expression in Theorem 3. There-fore, we need to show that I n, , I n, and I n, in equation (A.7) are all o p (1).First show that I n, = o p (1). 
From Lemma 1, q h k sup i (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n X j =1 K h ( X j ) ( w Nij − w Nji ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ q h k sup i X j : j = i ( w Nij − w Nji ) | K h ( X j ) |≤ M Ch × h q h k × sup i X j : j = i nh p | K h ( X i − X j ) | = O p (1) × o p (1) × O p (1) = o p (1) , Further, √ n n P i =1 ǫ i ( D i = 1) has finite limit and thus, is bounded by O p (1) and then I n, = o p (1).Deal with I n, . As n X j =1 K h ( X j ) w Nji = nh p n P j =1 K h ( X j − X i ) nh p n P j =1 K h ( X j − X i ) ( D j = 1) nh p n P j =1 K h ( X j − X i ) K h ( X j ) nh p n P j =1 K h ( X j − X i ) , we can then regard n P j =1 K h ( X j ) w Nji as an estimator of K h ( X i ) p ( X i ) . Consider n X j =1 K h ( X j ) w Nji − K h ( X i ) p ( X i ) , which is the bias of K h ( X i ) p ( X i ) to K h ( X i ) p ( X i ) . Write X = ( X , X (2) ) and K h ( X − X j ) = K (cid:18) X − X j h (cid:19) K (cid:18) X (2) − X j h (cid:19) . Since b f − f = o p (1), and the kernel function is s ∗ ( ≥ s ) times continuously differentiable,29e have E n X j =1 K h ( X j ) w Nji (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X i = 1 + o p (1) h p f ( X i ) p ( X i ) Z K (cid:18) u j − X i h (cid:19) K (cid:18) u j − X i h (cid:19) K h ( u j ) f ( u i ) du = 1 + o p (1) f ( X i ) p ( X i ) Z K ( v ) K ( v ) K (cid:18) X i − X h + v h h (cid:19) f ( X i + h v ) dv = K h ( X i ) p ( X i ) + O p (cid:18) h s h s (cid:19) . (A.8) Note that b K h ( X i ) b p ( X i ) − K h ( X i ) p ( X i )= (cid:26) b p ( X i ) − p ( X i ) + 1 p ( X i ) (cid:27) n b K h ( X i ) − K h ( X i ) + K h ( X i ) o − K h ( X i ) p ( X i ) = (cid:26) b p ( X i ) − p ( X i ) (cid:27) n b K h ( X i ) − K h ( X i ) o + 1 p ( X i ) n b K h ( X i ) − K h ( X i ) o + (cid:26) b p ( X i ) − p ( X i ) (cid:27) K h ( X i )= O p h s h s + h s + s log nnh p ! = O p (cid:18) h s h s (cid:19) . 
Thus, $\sup_i \left| \sum_{j=1}^n K_{h_2}(X_j) w^N_{ji} - \frac{K_{h_2}(X_i)}{p(X_i)} \right| = O_p\!\left( \frac{h_1^{s}}{h_2^{s}} \right)$. Owing to assumption (A4), under which $h_1^{s}/h_2^{s+k} \to$
0, we have
$$\sup_i \left| \frac{1}{\sqrt{h_2^{k}}} \left\{ \sum_{j=1}^n K_{h_2}(X_j) w^N_{ji} - \frac{K_{h_2}(X_i)}{p(X_i)} \right\} \right| = O_p\!\left( \frac{h_1^{s}}{h_2^{s + k/2}} \right) = o_p(1).$$
Since the $\epsilon_{1i} = Y_i - E(Y(1) \mid X_i)$ are mutually independent, we have $I_{n,4} = o_p(1)$ in equation (A.7). Finally, we show that $I_{n,5} = o_p(1)$ in equation (A.7). Note that
$$\frac{\frac{1}{nh_1^p}\sum_{j=1}^n K_{h_1}(X_j - X_i) 1(D_j = 1) E\{Y(1) \mid X_j\}}{\frac{1}{nh_1^p}\sum_{j=1}^n K_{h_1}(X_j - X_i) 1(D_j = 1)} = \frac{\frac{1}{nh_1^p}\sum_{j=1}^n K_{h_1}(X_j - X_i) 1(D_j = 1) E\{Y(1) \mid X_j\}}{\frac{1}{nh_1^p}\sum_{j=1}^n K_{h_1}(X_j - X_i)} \cdot \frac{\frac{1}{nh_1^p}\sum_{j=1}^n K_{h_1}(X_j - X_i)}{\frac{1}{nh_1^p}\sum_{j=1}^n K_{h_1}(X_j - X_i) 1(D_j = 1)},$$
which can be viewed as an estimator of $E\{1(D = 1) Y(1) \mid X_i\}/p(X_i)$. Denote $A(X_i) = E\{1(D = 1) Y(1) \mid X_i\}$. We can easily derive that
$$\frac{\widehat A(X_i)}{\widehat p(X_i)} - \frac{A(X_i)}{p(X_i)} = \{\widehat A(X_i) - A(X_i)\}\left\{\frac{1}{\widehat p(X_i)} - \frac{1}{p(X_i)}\right\} + A(X_i)\left\{\frac{1}{\widehat p(X_i)} - \frac{1}{p(X_i)}\right\} + \frac{\widehat A(X_i) - A(X_i)}{p(X_i)} = O_p\!\left( h_1^{s} + \sqrt{\frac{\log n}{n h_1^p}} \right).$$
Thus,
$$\sup_i \left| \frac{\frac{1}{nh_1^p}\sum_{j=1}^n K_{h_1}(X_j - X_i) 1(D_j = 1) E\{Y(1) \mid X_j\}}{\frac{1}{nh_1^p}\sum_{j=1}^n K_{h_1}(X_j - X_i) 1(D_j = 1)} - E\{Y(1) \mid X_i\} \right| = O_p\!\left( h_1^{s} + \sqrt{\frac{\log n}{n h_1^p}} \right).$$
Then, we can bound I n, as follows: | I n, | = (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) q nh k n X i =1 K h ( X i ) nh p n P j =1 K h ( X j − X i ) ( D j = 1) EY (1) | X j nh p n P j =1 K h ( X j − X i ) ( D j = 1) − EY (1) | X i (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ q nh k sup i (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) nh p n P j =1 K h ( X j − X i ) ( D j = 1) EY (1) | X j nh p n P j =1 K h ( X j − X i ) ( D j = 1) − EY (1) | X i (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) nh k n X i =1 | K h ( X i ) | = q nh k O p h s + s log nnh p ! · O p (1) = o p (1) · O p (1) = o p (1) , where assumption (A4) is used for the second equation. Thus, together with I n, = o p (1),31 n, = o p (1) and I n, = o p (1), equation (A.7) becomes q nh k n X i =1 K h ( X i ) nh p n P j =1 K h ( X j − X i ) Y j ( D j = 1) nh p n P j =1 K h ( X j − X i ) ( D j = 1) − τ ( x ) = I n, + I n, + o p (1) . Similarly, we can also deal with b m ( X i ) − τ ( x ) of (A.6) to have q nh k n X i =1 K h ( X i ) nh p n P j =1 K h ( X j − X i ) Y j ( D j = 0) nh p n P j =1 K h ( X j − X i ) ( D j = 0) − τ ( x ) := I n, + I n, + o p (1) , where I n, = 1 q nh k n X i =1 ǫ i ( D i = 0) K h ( X i )1 − p ( X i ) , I n, = 1 q nh k n X i =1 K h ( X i ) EY (0) | X i ,ǫ i = Y i − EY (0) | X i . Hence, we get the asymptotic linear representation of b τ ( x ) as q nh k { b τ ( x ) − τ ( x ) } = 1 p nh k f ( x ) n X i =1 { Ψ ( X i , Y i , D i ) − τ ( x ) } K h ( X i ) + o p (1) , which can be asymptotically normal. Again, we compute its asymptotic variance. Sim-ilarly as the proof for Theorem 2, we haveVar { b τ ( x ) } = 1 nh k || K || σ N ( x ) f ( x ) + o (cid:18) nh k (cid:19) . 
Then, by assumptions (C1)–(C4) and (A1)–(A4) for some $s^* \ge s \ge p$, we can derive that
\[
\sqrt{nh_1^k}\, \{\hat\tau(x) - \tau(x)\} \stackrel{d}{\longrightarrow} N\Big( 0,\; \frac{\|K\|_2^2\, \sigma_N^2(x)}{f(x)} \Big),
\]
where $\sigma_N^2(x) \equiv E[\{\Psi(X, Y, D) - \tau(x)\}^2 \mid X_1 = x]$. The proof is concluded. $\Box$

Proof of Theorem 4. Inspired by the proof of Theorem 2 of Luo et al. (2017), we have
\[
\sqrt{nh_1^k}\, (\hat\tau(x) - \tau(x))
= \frac{\frac{1}{\sqrt{nh_1^k}}\sum_{i=1}^n K_{h_1}(X_{1i})\, \big[\hat m_1(\hat\beta_1^\top X_i) - \tau_1(x)\big]}{\frac{1}{nh_1^k}\sum_{i=1}^n K_{h_1}(X_{1i})}
- \frac{\frac{1}{\sqrt{nh_1^k}}\sum_{i=1}^n K_{h_1}(X_{1i})\, \big[\hat m_0(\hat\beta_0^\top X_i) - \tau_0(x)\big]}{\frac{1}{nh_1^k}\sum_{i=1}^n K_{h_1}(X_{1i})}
\]
\[
= \frac{\frac{1}{\sqrt{nh_1^k}}\sum_{i=1}^n K_{h_1}(X_{1i})\, \big[\hat m_1(\beta_1^\top X_i) - \tau_1(x)\big]}{\frac{1}{nh_1^k}\sum_{i=1}^n K_{h_1}(X_{1i})}
- \frac{\frac{1}{\sqrt{nh_1^k}}\sum_{i=1}^n K_{h_1}(X_{1i})\, \big[\hat m_0(\beta_0^\top X_i) - \tau_0(x)\big]}{\frac{1}{nh_1^k}\sum_{i=1}^n K_{h_1}(X_{1i})}
+ O_p\big( \sqrt{nh_1^k}\, \|\hat\beta_1 - \beta_1\| + \sqrt{nh_1^k}\, \|\hat\beta_0 - \beta_0\| \big), \tag{A.9}
\]
where
\[
\hat m_1(\hat\beta_1^\top X_i) = \frac{\frac{1}{nh_4^{r^{(1)}}}\sum_{j=1}^n K_{h_4}\big(\hat Z_{1j} - \hat Z_{1i}\big)\, Y_j\, \mathbf 1(D_j = 1)}{\frac{1}{nh_4^{r^{(1)}}}\sum_{j=1}^n K_{h_4}\big(\hat Z_{1j} - \hat Z_{1i}\big)\, \mathbf 1(D_j = 1)}, \quad \hat Z_1 = \hat\beta_1^\top X,
\]
\[
\hat m_0(\hat\beta_0^\top X_i) = \frac{\frac{1}{nh_4^{r^{(0)}}}\sum_{j=1}^n K_{h_4}\big(\hat Z_{0j} - \hat Z_{0i}\big)\, Y_j\, \mathbf 1(D_j = 0)}{\frac{1}{nh_4^{r^{(0)}}}\sum_{j=1}^n K_{h_4}\big(\hat Z_{0j} - \hat Z_{0i}\big)\, \mathbf 1(D_j = 0)}, \quad \hat Z_0 = \hat\beta_0^\top X.
\]
Under assumption (A8), $O_p(\sqrt{nh_1^k}\, \|\hat\beta_1 - \beta_1\| + \sqrt{nh_1^k}\, \|\hat\beta_0 - \beta_0\|) = O_p(\sqrt{h_1^k}) = o_p(1)$ as $h_1 \to 0$. Therefore, equation (A.9) becomes
\[
\sqrt{nh_1^k}\, (\hat\tau(x) - \tau(x))
= \frac{\frac{1}{\sqrt{nh_1^k}}\sum_{i=1}^n K_{h_1}(X_{1i})\, \big[\hat m_1(\beta_1^\top X_i) - \tau_1(x)\big]}{\frac{1}{nh_1^k}\sum_{i=1}^n K_{h_1}(X_{1i})}
- \frac{\frac{1}{\sqrt{nh_1^k}}\sum_{i=1}^n K_{h_1}(X_{1i})\, \big[\hat m_0(\beta_0^\top X_i) - \tau_0(x)\big]}{\frac{1}{nh_1^k}\sum_{i=1}^n K_{h_1}(X_{1i})} + o_p(1). \tag{A.10}
\]
Similarly as in the proof of Theorem 3, we have
\[
\frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n K_{h_1}(X_{1i})\, \big[\hat m_1(\beta_1^\top X_i) - \tau_1(x)\big]
= \frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n K_{h_1}(X_{1i})\, [E(Y(1)\mid X_i) - \tau_1(x)]
+ \frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n \epsilon_{1i}\, \mathbf 1(D_i = 1) \sum_{j=1}^n K_{h_1}(X_{1j})\, (w^S_{ij} - w^S_{ji})
\]
\[
+ \frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n \epsilon_{1i}\, \mathbf 1(D_i = 1) \sum_{j=1}^n K_{h_1}(X_{1j})\, w^S_{ji}
+ \frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n K_{h_1}(X_{1i}) \Bigg[ \frac{\frac{1}{nh_4^{r^{(1)}}}\sum_{j=1}^n K_{h_4}(Z_{1j} - Z_{1i})\, \mathbf 1(D_j = 1)\, E(Y(1)\mid X_j)}{\frac{1}{nh_4^{r^{(1)}}}\sum_{j=1}^n K_{h_4}(Z_{1j} - Z_{1i})\, \mathbf 1(D_j = 1)} - E(Y(1)\mid X_i) \Bigg]
=: I_{n,1} + I_{n,2} + I_{n,3} + I_{n,4},
\]
where
\[
w^S_{ij} = \frac{\frac{1}{nh_4^{r^{(1)}}}\, K_{h_4}(Z_{1i} - Z_{1j})}{\frac{1}{nh_4^{r^{(1)}}}\sum_{i=1}^n K_{h_4}(Z_{1i} - Z_{1j})\, \mathbf 1(D_i = 1)}, \qquad \epsilon_{1i} = Y_i - E(Y(1)\mid X_i).
\]
Similarly, we can decompose $\hat m_0(\beta_0^\top X_i) - \tau_0(x)$ as
\[
\frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n K_{h_1}(X_{1i})\, \big[\hat m_0(\beta_0^\top X_i) - \tau_0(x)\big]
= \frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n K_{h_1}(X_{1i})\, [E(Y(0)\mid X_i) - \tau_0(x)]
+ \frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n \epsilon_{0i}\, \mathbf 1(D_i = 0) \sum_{j=1}^n K_{h_1}(X_{1j})\, (w^S_{ij} - w^S_{ji})
\]
\[
+ \frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n \epsilon_{0i}\, \mathbf 1(D_i = 0) \sum_{j=1}^n K_{h_1}(X_{1j})\, w^S_{ji}
+ \frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n K_{h_1}(X_{1i}) \Bigg[ \frac{\frac{1}{nh_4^{r^{(0)}}}\sum_{j=1}^n K_{h_4}(Z_{0j} - Z_{0i})\, \mathbf 1(D_j = 0)\, E(Y(0)\mid X_j)}{\frac{1}{nh_4^{r^{(0)}}}\sum_{j=1}^n K_{h_4}(Z_{0j} - Z_{0i})\, \mathbf 1(D_j = 0)} - E(Y(0)\mid X_i) \Bigg]
=: I'_{n,1} + I'_{n,2} + I'_{n,3} + I'_{n,4},
\]
where, for this arm,
\[
w^S_{ij} = \frac{\frac{1}{nh_4^{r^{(0)}}}\, K_{h_4}(Z_{0i} - Z_{0j})}{\frac{1}{nh_4^{r^{(0)}}}\sum_{i=1}^n K_{h_4}(Z_{0i} - Z_{0j})\, \mathbf 1(D_i = 0)}, \qquad \epsilon_{0i} = Y_i - E(Y(0)\mid X_i).
\]
It is easy to show that $I_{n,2}$, $I'_{n,2}$, $I_{n,4}$ and $I'_{n,4}$ are $o_p(1)$, following the same arguments used to prove the corresponding terms are $o_p(1)$ in Theorem 3. The details are omitted here. We now deal with $I_{n,3}$ and $I'_{n,3}$.

Lemma 2. Suppose assumptions (C1)–(C4), (A1) and (A5)–(A7) are satisfied.
Then, for each point $x$ in the support of $X_1$:

(1) If $X_1 \not\subset \beta_1^\top X$ and $X_1 \not\subset \beta_0^\top X$, with $s(2 - k/q) + k > 0$ and $0 < q \le k$, we have
\[
I_{n,3} = o_p(1), \qquad I'_{n,3} = o_p(1). \tag{A.11}
\]
The corresponding asymptotically linear representation is then
\[
\sqrt{nh_1^k}\, \{\hat\tau(x) - \tau(x)\} = \frac{1}{\sqrt{nh_1^k}\, f(x)} \sum_{i=1}^n \{ m_1(X_i) - m_0(X_i) - \tau(x) \}\, K_{h_1}(X_{1i}) + o_p(1).
\]

(2) If $X_1 \subset \beta_1^\top X$ and $X_1 \not\subset \beta_0^\top X$, with $s(2 - k/q) + k > 0$ and $0 < q \le k$, we have
\[
I_{n,3} = \frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n \frac{\epsilon_{1i}\, \mathbf 1(D_i = 1)\, K_{h_1}(X_{1i})}{p(X_i)} + o_p(1), \qquad I'_{n,3} = o_p(1). \tag{A.12}
\]
Then we have
\[
\sqrt{nh_1^k}\, \{\hat\tau(x) - \tau(x)\} = \frac{1}{\sqrt{nh_1^k}\, f(x)} \sum_{i=1}^n \{ \Psi(X_i, Y_i, D_i) - \tau(x) \}\, K_{h_1}(X_{1i}) + o_p(1).
\]

(3) If $X_1 \not\subset \beta_1^\top X$ and $X_1 \subset \beta_0^\top X$, with $s(2 - k/q) + k > 0$ and $0 < q \le k$, we have
\[
I_{n,3} = o_p(1), \qquad I'_{n,3} = \frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n \frac{\epsilon_{0i}\, \mathbf 1(D_i = 0)\, K_{h_1}(X_{1i})}{1 - p(X_i)} + o_p(1). \tag{A.13}
\]
The corresponding asymptotically linear representation is
\[
\sqrt{nh_1^k}\, \{\hat\tau(x) - \tau(x)\} = \frac{1}{\sqrt{nh_1^k}\, f(x)} \sum_{i=1}^n \{ \Psi(X_i, Y_i, D_i) - \tau(x) \}\, K_{h_1}(X_{1i}) + o_p(1).
\]

(4) If $X_1 \subset \beta_1^\top X$ and $X_1 \subset \beta_0^\top X$, we have
\[
I_{n,3} = \frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n \frac{\epsilon_{1i}\, \mathbf 1(D_i = 1)\, K_{h_1}(X_{1i})}{p(X_i)} + o_p(1), \qquad
I'_{n,3} = \frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n \frac{\epsilon_{0i}\, \mathbf 1(D_i = 0)\, K_{h_1}(X_{1i})}{1 - p(X_i)} + o_p(1). \tag{A.14}
\]
We have
\[
\sqrt{nh_1^k}\, \{\hat\tau(x) - \tau(x)\} = \frac{1}{\sqrt{nh_1^k}\, f(x)} \sum_{i=1}^n \{ \Psi(X_i, Y_i, D_i) - \tau(x) \}\, K_{h_1}(X_{1i}) + o_p(1).
\]

Proof of Lemma 2.
We need to show that $I_{n,3} = o_p(1)$ if $X_1 \not\subset \beta_1^\top X$, with $s(2 - k/q) + k > 0$ and $0 < q \le k$. Let $X_1 = v_1$ and $\beta_1^\top X = v_2$, and denote $\big( (v_1 - v_{1i})/h_4,\, (v_2 - v_{2i})/h_4 \big)$ as $(t_1, t_2)$. We have
\[
E\Bigg[ \sum_{j=1}^n K_{h_1}(X_{1j})\, w^S_{ji} \,\Bigg|\, X_i \Bigg]
= \frac{1 + o_p(1)}{h_4^{q}\, f(v_{2i})\, p(v_{2i})} \int K\Big( \frac{v_2 - \beta_1^\top X_i}{h_4} \Big)\, K\Big( \frac{v_1 - x}{h_1} \Big)\, f(v_1, v_2)\, dv_1\, dv_2,
\]
where $f(v_1, v_2)$ is the joint density function of $(X_1, \beta_1^\top X)$. After the change of variables to $(t_1, t_2)$ and a first-order Taylor expansion of $K\big( (v_1 - x)/h_1 \big)$ around $(v_{1i} - x)/h_1$, this becomes, under assumptions (A5)–(A7),
\[
E\Bigg[ \sum_{j=1}^n K_{h_1}(X_{1j})\, w^S_{ji} \,\Bigg|\, X_i \Bigg]
= C\, h_4^{q}\, K_{h_1}(X_{1i})\, \frac{f(X_{1i}, \beta_1^\top X_i)}{f(\beta_1^\top X_i)\, p(\beta_1^\top X_i)} + O_p\Big( \frac{h_4^{q+1}}{h_1} \Big)
= O_p\Big( h_4^{q} + \frac{h_4^{q+1}}{h_1} \Big).
\]
Hence, under assumptions (A6) and (A7), together with $s(2 - k/q) + k > 0$ and $0 < q \le k$,
\[
\frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n \epsilon_{1i}\, \mathbf 1(D_i = 1) \sum_{j=1}^n K_{h_1}(X_{1j})\, w^S_{ji}
= \frac{1}{\sqrt n} \sum_{i=1}^n \epsilon_{1i}\, \mathbf 1(D_i = 1)\, O_p\Big( \frac{h_4^{q}}{h_1^{k/2}} + \frac{h_4^{q+1}}{h_1^{k/2 + 1}} \Big) = o_p(1).
\]
Analogously, we get $I'_{n,3} = o_p(1)$ if $X_1 \not\subset \beta_0^\top X$. Next, we prove that
\[
I_{n,3} = \frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n \frac{\epsilon_{1i}\, \mathbf 1(D_i = 1)\, K_{h_1}(X_{1i})}{p(X_i)} + o_p(1)
\]
if $X_1 \subset \beta_1^\top X$. As this case is similar to the case $X_1 \subset X$ in the nonparametric setting, paralleling the derivation of equation (A.8) yields the desired result. Similarly, $I'_{n,3} = \frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n \frac{\epsilon_{0i}\, \mathbf 1(D_i = 0)\, K_{h_1}(X_{1i})}{1 - p(X_i)} + o_p(1)$ if $X_1 \subset \beta_0^\top X$. The proof of Lemma 2 is concluded. $\Box$

Proof of Corollary 2.
Consider the case where $X_1 \not\subset \widetilde X \in \mathbb R^q$. Similarly as before, we derive that
\[
\sqrt{nh_1^k}\, (\hat\tau(x) - \tau(x))
= \frac{\frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n K_{h_1}(X_{1i})\, \big[\hat m_1(\widetilde X_i) - \tau_1(x)\big]}{\frac{1}{nh_1^k} \sum_{i=1}^n K_{h_1}(X_{1i})}
- \frac{\frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n K_{h_1}(X_{1i})\, \big[\hat m_0(\widetilde X_i) - \tau_0(x)\big]}{\frac{1}{nh_1^k} \sum_{i=1}^n K_{h_1}(X_{1i})}, \tag{A.15}
\]
where
\[
\hat m_1(\widetilde X_i) = \frac{\frac{1}{nh_3^q} \sum_{j=1}^n K_{h_3}(\widetilde X_j - \widetilde X_i)\, Y_j\, \mathbf 1(D_j = 1)}{\frac{1}{nh_3^q} \sum_{j=1}^n K_{h_3}(\widetilde X_j - \widetilde X_i)\, \mathbf 1(D_j = 1)}, \qquad
\hat m_0(\widetilde X_i) = \frac{\frac{1}{nh_3^q} \sum_{j=1}^n K_{h_3}(\widetilde X_j - \widetilde X_i)\, Y_j\, \mathbf 1(D_j = 0)}{\frac{1}{nh_3^q} \sum_{j=1}^n K_{h_3}(\widetilde X_j - \widetilde X_i)\, \mathbf 1(D_j = 0)}.
\]
Some similar calculations lead to the decomposition of $\hat m_1(\widetilde X_i) - \tau_1(x)$:
\[
\frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n K_{h_1}(X_{1i})\, \big[\hat m_1(\widetilde X_i) - \tau_1(x)\big]
= \frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n K_{h_1}(X_{1i})\, [E(Y(1)\mid X_i) - \tau_1(x)]
+ \frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n \epsilon_{1i}\, \mathbf 1(D_i = 1) \sum_{j=1}^n K_{h_1}(X_{1j})\, (w^N_{ij} - w^N_{ji})
\]
\[
+ \frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n \epsilon_{1i}\, \mathbf 1(D_i = 1) \sum_{j=1}^n K_{h_1}(X_{1j})\, w^N_{ji}
+ \frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n K_{h_1}(X_{1i}) \Bigg[ \frac{\frac{1}{nh_3^q} \sum_{j=1}^n K_{h_3}(\widetilde X_j - \widetilde X_i)\, \mathbf 1(D_j = 1)\, E(Y(1)\mid X_j)}{\frac{1}{nh_3^q} \sum_{j=1}^n K_{h_3}(\widetilde X_j - \widetilde X_i)\, \mathbf 1(D_j = 1)} - E(Y(1)\mid X_i) \Bigg]
=: I_{n,1} + I_{n,2} + I_{n,3} + I_{n,4},
\]
where
\[
w^N_{ij} = \frac{\frac{1}{nh_3^q}\, K_{h_3}(\widetilde X_i - \widetilde X_j)}{\frac{1}{nh_3^q} \sum_{i=1}^n K_{h_3}(\widetilde X_i - \widetilde X_j)\, \mathbf 1(D_i = 1)}.
\]
Then we can prove that $I_{n,2}$ and $I_{n,4}$ are $o_p(1)$ by the same arguments as those used to handle the corresponding terms in the proof of Theorem 3. Owing to $X_1 \not\subset \widetilde X$, similar arguments as those proving Lemma 2 imply that $I_{n,3} = o_p(1)$. The proof of Corollary 2 is concluded. $\Box$

Proof of Corollary 3. From the proof of Theorem 3, we can see that
\[
E\Bigg[ \sum_{j=1}^n K_{h_1}(X_{1j})\, w^N_{ji} \,\Bigg|\, X_i \Bigg] = O_p\Big( h_2 + \frac{h_2^s}{h_1^s} \Big),
\]
by the condition $\sqrt{nh_1^k}\, \big( h_2^s + \sqrt{\log(n)/(nh_2^p)} \big) = o(1)$. Then NRCATE shares the same asymptotic distribution as PRCATE. For SRCATE, we can use similar arguments to show the same result. The proof is finished. $\Box$

References
Abrevaya, J., Hsu, Y.C., Lieli, R.P., 2015. Estimating conditional average treatment effects. Journal of Business & Economic Statistics 33, 485–505.

Cheng, P.E., 1994. Nonparametric estimation of mean functionals with data missing at random. Journal of the American Statistical Association 89, 81–87.

Cook, R.D., Li, B., 2002. Dimension reduction for conditional mean in regression. The Annals of Statistics 30, 455–474.

Feng, Z., Wen, X.M., Yu, Z., Zhu, L., 2013. On partial sufficient dimension reduction with applications to partially linear multi-index models. Journal of the American Statistical Association 108, 237–246.

Hahn, J., 1998. On the role of the propensity score in efficient semiparametric estimation of average treatment effects. Econometrica 66, 315–331.

Hammer, S.M., Katzenstein, D.A., Hughes, M.D., Gundacker, H., Schooley, R.T., Haubrich, R.H., Henry, W.K., Lederman, M.M., Phair, J.P., Niu, M., Martin, S.H., Thomas, C.M., 1996. A trial comparing nucleoside monotherapy with combination therapy in HIV-infected adults with CD4 cell counts from 200 to 500 per cubic millimeter. New England Journal of Medicine 335, 1081–1090.

Healy, M., Westmacott, M., 1956. Missing values in experiments analysed on automatic computers. Journal of the Royal Statistical Society: Series C (Applied Statistics) 5, 203–206.

Hirano, K., Imbens, G.W., Ridder, G., 2003. Efficient estimation of average treatment effects using the estimated propensity score. Econometrica 71, 1161–1189.

Li, Q., Racine, J.S., 2007. Nonparametric Econometrics: Theory and Practice. Princeton University Press, Princeton, NJ.

Luo, W., Wu, W., Zhu, Y., 2019. Learning heterogeneity in causal inference using sufficient dimension reduction. Journal of Causal Inference 7.

Luo, W., Zhu, Y., Ghosh, D., 2017. On estimating regression-based causal effects using sufficient dimension reduction.
Biometrika 104, 51–65.

Ma, S., Zhu, L., Zhang, Z., Tsai, C.L., Carroll, R.J., 2019. A robust and efficient approach to causal inference based on sparse sufficient dimension reduction. The Annals of Statistics 47, 1505–1535.

Matloff, N.S., 1981. Use of regression functions for improved estimation of means. Biometrika 68, 685–689.

Nadaraya, E.A., 1964. On estimating regression. Theory of Probability & Its Applications 9, 141–142.

Pagan, A., Ullah, A., 1999. Nonparametric Econometrics. Themes in Modern Econometrics, Cambridge University Press, Cambridge.

Rao, J., 1996. On variance estimation with imputed survey data. Journal of the American Statistical Association 91, 499–506.

Rosenbaum, P.R., Rubin, D.B., 1983. The central role of the propensity score in observational studies for causal effects. Biometrika 70, 41–55.

Rosenbaum, P.R., Rubin, D.B., 1985. Constructing a control group using multivariate matched sampling methods that incorporate the propensity score. The American Statistician 39, 33–38.

Rubin, D.B., 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66, 688.

Song, X., Ma, S., 2008. Multiple augmentation for interval-censored data with measurement error. Statistics in Medicine 27, 3178–3190.

Wang, Q., Linton, O., Härdle, W., 2004. Semiparametric regression analysis with missing response at random. Journal of the American Statistical Association 99, 334–345.

Watson, G.S., 1964. Smooth regression analysis. Sankhyā: The Indian Journal of Statistics, Series A 26, 359–372.

Xia, Y., Tong, H., Li, W.K., Zhu, L.X., 2002. An adaptive estimation of dimension reduction space. Journal of the Royal Statistical Society, Series B 64, 363–410.

Yin, J., Geng, Z., Li, R., Wang, H., 2010. Nonparametric covariance model. Statistica Sinica 20, 469–479.

Table: SD, BIAS and MSE of $\sqrt{nh}\,[\hat\tau(x) - \tau(x)]$ for model 1.
(Simulation tables for models 1, 2 and 3: SD, BIAS and MSE of $\sqrt{nh}\,[\hat\tau(x) - \tau(x)]$ at $x \in \{-0.4, -0.2, 0, 0.2, 0.4\}$, comparing the estimators OR, PR, SR, NR, N, S, P and O for $n = 200$ and $n = 500$ under three bandwidth settings each. Across models, sample sizes and bandwidths, OR attains the smallest SD and MSE, closely followed by PR, with SR and NR slightly larger, while the N, S, P and O columns show substantially larger SD and MSE.)

Submitted to the Annals of Statistics
SUPPLEMENTARY MATERIAL TO
“OUTCOME REGRESSION-BASED ESTIMATION OF CONDITIONAL AVERAGE TREATMENT EFFECT”
By Lu Li†, Niwen Zhou‡ and Lixing Zhu‡,§,∗
East China Normal University†, Beijing Normal University‡ and Hong Kong Baptist University§
1. Appendix
We first introduce some notation.

(1) $C$ and $M$ stand for two generic bounded constants; $\Xi$ is the $\sigma$-field generated by $X_1, \ldots, X_n$.

(2) $\epsilon_{ti} = Y_i - E(Y(t) \mid X_i)$, $\tau_t(x) = E[E\{Y \mid D = t, X\} \mid X_1 = x]$ and $Z_t = \beta_t^\top X$, for $t = 0, 1$ and $i = 1, \ldots, n$.

(3) Write $K\big( (X_{1i} - x)/h_1 \big)$ as $K_{h_1}(X_{1i})$, $K\big( (X_i - X_j)/h_2 \big)$ as $K_{h_2}(X_i - X_j)$, and $K\big( (Z_i - Z_j)/h_4 \big)$ as $K_{h_4}(Z_i - Z_j)$.

In the two-step estimation procedure for CATE, the second step involves, for $i = 1, \ldots, n$, the quantities
\[
\hat K_{h_1}(X_{1i}) = \sum_{j:\, j \ne i} w_{ij}\, K_{h_1}(X_{1j}).
\]
We call it the estimator of $K_{h_1}(X_{1i})$. In different circumstances, $w_{ij}$ can be different. Take NRCATE as an example, and write $w_{ij}$ as $w^N_{ij}$:
\[
w^N_{ij} = \frac{\frac{1}{nh_2^p}\, K_{h_2}(X_i - X_j)}{\frac{1}{nh_2^p} \sum_{i=1}^n K_{h_2}(X_i - X_j)\, \mathbf 1(D_i = 1)},
\]
which depends on $(X_1, D_1), \ldots, (X_n, D_n)$ only.

Lemma 1.1. Given assumptions (C1)–(C4) in Subsection 2.1 and (A1)–(A4) in Subsections 2.2–2.3,
\[
|w^N_{ij} - w^N_{ji}| = \frac{O_p(h_2)}{nh_2^p}\, |K_{h_2}(X_i - X_j)|. \tag{1.1}
\]

Proof of Lemma 1.1.
By assumption (A2), $w^N_{ij} = w^N_{ji} = 0$ for $\|X_j - X_i\|_\infty > h_2$ (Abrevaya, Hsu and Lieli, 2015). Suppose that $\|X_j - X_i\|_\infty \le h_2$. For all $j$, we define
\[
\hat f(X_j) = \frac{1}{nh_2^p} \sum_{i:\, i \ne j} K_{h_2}(X_i - X_j).
\]
It is clear that
\[
w^N_{ij} = \frac{\frac{1}{nh_2^p}\, K_{h_2}(X_i - X_j)}{\frac{1}{nh_2^p} \sum_{i=1}^n K_{h_2}(X_i - X_j)\, \mathbf 1(D_i = 1)}
= \frac{\frac{1}{nh_2^p}\, K_{h_2}(X_i - X_j)}{\frac{1}{nh_2^p} \sum_{i=1}^n K_{h_2}(X_i - X_j)} \times \frac{\frac{1}{nh_2^p} \sum_{i=1}^n K_{h_2}(X_i - X_j)}{\frac{1}{nh_2^p} \sum_{i=1}^n K_{h_2}(X_i - X_j)\, \mathbf 1(D_i = 1)}
= \frac{1}{nh_2^p}\, \frac{K_{h_2}(X_i - X_j)}{\hat f(X_j)\, \hat p(X_j)}. \tag{1.2}
\]
Then we have
\[
|w^N_{ij} - w^N_{ji}| = \frac{1}{nh_2^p} \left| \frac{K_{h_2}(X_i - X_j)}{\hat p(X_j)\, \hat f(X_j)} - \frac{K_{h_2}(X_j - X_i)}{\hat p(X_i)\, \hat f(X_i)} \right|
= \frac{1}{nh_2^p}\, |K_{h_2}(X_i - X_j)| \left| \frac{1}{\hat p(X_j)\, \hat f(X_j)} - \frac{1}{\hat p(X_i)\, \hat f(X_i)} \right| \tag{1.3}
\]
\[
\le \frac{1}{nh_2^p}\, |K_{h_2}(X_i - X_j)| \left\{ \left| \frac{1}{\hat p(X_j)\, \hat f(X_j)} - \frac{1}{p(X_j)\, f(X_j)} \right| + \left| \frac{1}{\hat p(X_i)\, \hat f(X_i)} - \frac{1}{p(X_i)\, f(X_i)} \right| + \left| \frac{1}{p(X_j)\, f(X_j)} - \frac{1}{p(X_i)\, f(X_i)} \right| \right\}
\]
\[
= \frac{1}{nh_2^p}\, |K_{h_2}(X_i - X_j)| \left\{ \left| \frac{\hat p(X_j)\, \hat f(X_j) - p(X_j)\, f(X_j)}{\hat p(X_j)\, p(X_j)\, \hat f(X_j)\, f(X_j)} \right| + \left| \frac{\hat p(X_i)\, \hat f(X_i) - p(X_i)\, f(X_i)}{\hat p(X_i)\, p(X_i)\, \hat f(X_i)\, f(X_i)} \right| + \left| \frac{p(X_i)\, f(X_i) - p(X_j)\, f(X_j)}{p(X_i)\, p(X_j)\, f(X_i)\, f(X_j)} \right| \right\}.
\]
Under conditions (C1)–(C4) and (A1)–(A4) for nonparametric estimation,
\[
\sup_i |\hat f(X_i) - f(X_i)| = O_p\Big( h_2^s + \sqrt{\frac{\log n}{nh_2^p}} \Big), \qquad
\sup_i |\hat p(X_i) - p(X_i)| = O_p\Big( h_2^s + \sqrt{\frac{\log n}{nh_2^p}} \Big).
\]
Since $s \ge p \ge 2$, assumption (A3) implies that $\sup_i |\hat f(X_i) - f(X_i)| = o_p(h_2)$ and $\sup_i |\hat p(X_i) - p(X_i)| = o_p(h_2)$. By the mean value theorem,
\[
\sup_j \left| \frac{\hat p(X_j)\, \hat f(X_j) - p(X_j)\, f(X_j)}{\hat p(X_j)\, p(X_j)\, \hat f(X_j)\, f(X_j)} \right|
\le \sup_j \frac{1}{\tilde p(X_j)^2\, \tilde f(X_j)^2}\; \sup_j \big| \hat p(X_j)\, \hat f(X_j) - p(X_j)\, f(X_j) \big|,
\]
where $\tilde p(X_j)$ is a quantity between $\hat p(X_j)$ and $p(X_j)$; similarly, $\tilde f(X_j)$ is a quantity between $\hat f(X_j)$ and $f(X_j)$. Owing to the fact that $f$ and $p$ are bounded away from zero, $\sup_j \tilde p(X_j)^{-2}\, \tilde f(X_j)^{-2} = O_p(1)$. After a simple calculation, we have
\[
\sup_j \big| \hat p(X_j)\, \hat f(X_j) - p(X_j)\, f(X_j) \big| = O_p\Big( h_2^s + \sqrt{\frac{\log n}{nh_2^p}} \Big) = o_p(h_2).
\]
Therefore,
\[
\sup_j \left| \frac{\hat p(X_j)\, \hat f(X_j) - p(X_j)\, f(X_j)}{\hat p(X_j)\, p(X_j)\, \hat f(X_j)\, f(X_j)} \right| = o_p(h_2), \qquad
\sup_i \left| \frac{\hat p(X_i)\, \hat f(X_i) - p(X_i)\, f(X_i)}{\hat p(X_i)\, p(X_i)\, \hat f(X_i)\, f(X_i)} \right| = o_p(h_2).
\]
As for the last term in (1.3), noticing that $f$ and $p$ are continuously differentiable on their compact support and bounded away from zero, we have
\[
\left| \frac{1}{f(x_1)\, p(x_1)} - \frac{1}{f(x_2)\, p(x_2)} \right| \le M\, \|x_1 - x_2\|_\infty
\]
for all $x_1, x_2 \in \mathcal X$ and a constant $M > 0$; $\|X_j - X_i\|_\infty \le h_2$ then leads to $\big| \frac{1}{f(X_i)\, p(X_i)} - \frac{1}{f(X_j)\, p(X_j)} \big| = O(h_2)$. Combining all these results yields (1.1). $\Box$

Proof of Theorem 2. We can rewrite $\hat m_1(X_i) - \hat m_0(X_i) - \tau(x)$ as $\{\hat m_1(X_i) - \tau_1(x)\} - \{\hat m_0(X_i) - \tau_0(x)\}$. Then, based on the definition of $\hat\tau(x)$,
\[
\sqrt{nh_1^k}\, (\hat\tau(x) - \tau(x))
= \frac{\frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n K_{h_1}(X_{1i})\, \big\{ [\hat m_1(X_i) - \tau_1(x)] - [\hat m_0(X_i) - \tau_0(x)] \big\}}{\frac{1}{nh_1^k} \sum_{i=1}^n K_{h_1}(X_{1i})}
= \frac{\frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n K_{h_1}(X_{1i})\, \big\{ [\hat m_1(X_i) - \tau_1(x)] - [\hat m_0(X_i) - \tau_0(x)] \big\}}{f(x)}\, (1 + o_p(1)), \tag{1.4}
\]
as $\sup_x \big| \frac{1}{nh_1^k} \sum_{i=1}^n K_{h_1}(X_{1i}) - f(x) \big| = o_p(1)$.

First, deal with $\{\hat m_1(X_i) - \tau_1(x)\}$ in (1.4). It is clear that
\[
\frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n K_{h_1}(X_{1i})\, [\hat m_1(X_i) - \tau_1(x)]
= \frac{1}{\sqrt{nh_1^k}} \left\{ \sum_{i=1}^n K_{h_1}(X_{1i})\, [\hat m_1(X_i) - m_1(X_i)] + \sum_{i=1}^n K_{h_1}(X_{1i})\, [m_1(X_i) - \tau_1(x)] \right\}
=: \frac{1}{\sqrt{nh_1^k}}\, ( I_{n,1} + I_{n,2} ). \tag{1.5}
\]
A simple calculation yields that
\[
\Big| \frac{1}{\sqrt{nh_1^k}}\, I_{n,1} \Big| \le \sqrt{nh_1^k}\; \sup_x |\hat m_1(x) - m_1(x)| \cdot \frac{1}{nh_1^k} \sum_{i=1}^n |K_{h_1}(X_{1i})|.
\]
As $h_1 \to 0$, $\frac{1}{nh_1^k} \sum_{i=1}^n |K_{h_1}(X_{1i})| = O_p(1)$; since $\sup_x |\hat m_1(x) - m_1(x)| = O_p(n^{-1/2})$ under the parametric structure, we then have $\frac{1}{\sqrt{nh_1^k}}\, I_{n,1} = O_p(\sqrt{h_1^k}) = o_p(1)$. Thus, equation (1.5) becomes
\[
\frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n K_{h_1}(X_{1i})\, [\hat m_1(X_i) - \tau_1(x)] = \frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n K_{h_1}(X_{1i})\, [m_1(X_i) - \tau_1(x)] + o_p(1).
\]
Similarly,
\[
\frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n K_{h_1}(X_{1i})\, [\hat m_0(X_i) - \tau_0(x)] = \frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n K_{h_1}(X_{1i})\, [m_0(X_i) - \tau_0(x)] + o_p(1).
\]
Altogether, the asymptotically linear representation of $\hat\tau(x)$ is
\[
\sqrt{nh_1^k}\, \{\hat\tau(x) - \tau(x)\}
= \frac{\frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n K_{h_1}(X_{1i})\, \{ m_1(X_i) - m_0(X_i) - \tau(x) \}}{f(x)}\, (1 + o_p(1))
= \frac{\frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n K_{h_1}(X_{1i})\, \{ m_1(X_i) - m_0(X_i) - \tau(x) \}}{f(x)} + o_p(1).
\]
The second equality is due to the asymptotic finiteness of the leading term, which is shown to be asymptotically normal below. As it is a sum of independent variables, the asymptotic normality is easy to derive. Specifically, noticing that the random variables $\{ K_{h_1}(X_{1i})\, [m_1(X_i) - m_0(X_i) - \tau(X_{1i})] \}_{i=1}^n$ are i.i.d., we can apply Lyapunov's central limit theorem to obtain the asymptotic distribution shown in Theorem 2. Under assumptions (C1)–(C4) and (A1), we derive that
\[
\sqrt{nh_1^k}\, \{\hat\tau(x) - \tau(x)\} \stackrel{d}{\longrightarrow} N\Big( 0,\; \frac{\|K\|_2^2\, \sigma_P^2(x)}{f(x)} \Big);
\]
we now give the formula of $\sigma_P^2(x)$.
It is easy to see that, as $n\to\infty$, the variance of
\[
\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\,\{m_1(X_i)-m_0(X_i)-\tau(x)\}\Big/f(x)
\]
converges to $\|K\|_2^2\,\sigma^2_P(x)/f(x)$, where
\[
\sigma^2_P(x):=E\big[\{m_1(X)-m_0(X)-\tau(x)\}^2\mid X_1=x\big].
\]
The proof of Theorem ?? is finished. $\Box$

Proof of Theorem ??. First, we have (1.6)
\[
\sqrt{nh^k}\,(\hat\tau(x)-\tau(x))
=\frac{\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\,[\hat m_1(X_i)-\tau_1(x)]}{\frac{1}{nh^k}\sum_{i=1}^n K_h(X_{1i})}
-\frac{\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\,[\hat m_0(X_i)-\tau_0(x)]}{\frac{1}{nh^k}\sum_{i=1}^n K_h(X_{1i})},
\]
where
\[
\hat m_1(X_i)=\frac{\frac{1}{nh_2^p}\sum_{j=1}^n K_{h_2}(X_j-X_i)\,Y_j\,\mathbb 1(D_j=1)}{\frac{1}{nh_2^p}\sum_{j=1}^n K_{h_2}(X_j-X_i)\,\mathbb 1(D_j=1)},\qquad
\hat m_0(X_i)=\frac{\frac{1}{nh_2^p}\sum_{j=1}^n K_{h_2}(X_j-X_i)\,Y_j\,\mathbb 1(D_j=0)}{\frac{1}{nh_2^p}\sum_{j=1}^n K_{h_2}(X_j-X_i)\,\mathbb 1(D_j=0)},
\]
with $h_2$ the bandwidth of the regression step. As in the proof of Theorem ??, we have the following decomposition: (1.7)
\[
\begin{aligned}
&\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\,[\hat m_1(X_i)-\tau_1(x)]\\
&=\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n \epsilon_{1i}\,\mathbb 1(D_i=1)\,\frac{K_h(X_{1i})}{p(X_i)}
+\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\,[E(Y(1)\mid X_i)-\tau_1(x)]\\
&\quad+\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n \epsilon_{1i}\,\mathbb 1(D_i=1)\sum_{j=1}^n K_h(X_{1j})\,(w^N_{ij}-w^N_{ji})\\
&\quad+\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n \epsilon_{1i}\,\mathbb 1(D_i=1)\Big\{\sum_{j=1}^n K_h(X_{1j})\,w^N_{ji}-\frac{K_h(X_{1i})}{p(X_i)}\Big\}\\
&\quad+\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\Big\{\frac{\frac{1}{nh_2^p}\sum_{j=1}^n K_{h_2}(X_j-X_i)\,\mathbb 1(D_j=1)\,E(Y(1)\mid X_j)}{\frac{1}{nh_2^p}\sum_{j=1}^n K_{h_2}(X_j-X_i)\,\mathbb 1(D_j=1)}-E(Y(1)\mid X_i)\Big\}\\
&=:I_{n,1}+I_{n,2}+I_{n,3}+I_{n,4}+I_{n,5},
\end{aligned}
\]
where
\[
w^N_{ij}=\frac{K_{h_2}(X_i-X_j)}{\sum_{l=1}^n K_{h_2}(X_l-X_j)\,\mathbb 1(D_l=1)},\qquad
\epsilon_{1i}=Y_i-E(Y(1)\mid X_i).
\]
Note that $I_{n,1}$ and $I_{n,2}$ in equation (1.7) yield the final expression in Theorem ??. Therefore, we need to show that $I_{n,3}$, $I_{n,4}$ and $I_{n,5}$ in equation (1.7) are all $o_p(1)$. We first show that $I_{n,3}=o_p(1)$.
From Lemma 1.1,
\[
\sqrt{h^k}\,\sup_i\Big|\sum_{j=1}^n K_h(X_{1j})\,(w^N_{ij}-w^N_{ji})\Big|
\le \sqrt{h^k}\,\sup_i\sum_{j:\,j\neq i}\big|w^N_{ij}-w^N_{ji}\big|\,\big|K_h(X_{1j})\big|
=O_p(1)\times o_p(1)\times O_p(1)=o_p(1).
\]
Further, $\frac{1}{\sqrt n}\sum_{i=1}^n \epsilon_{1i}\,\mathbb 1(D_i=1)$ has a finite limit and is thus bounded in probability by $O_p(1)$; hence $I_{n,3}=o_p(1)$.

Next, deal with $I_{n,4}$. As
\[
\sum_{j=1}^n K_h(X_{1j})\,w^N_{ji}
=\frac{\frac{1}{nh_2^p}\sum_{j=1}^n K_{h_2}(X_j-X_i)\,K_h(X_{1j})}{\frac{1}{nh_2^p}\sum_{j=1}^n K_{h_2}(X_j-X_i)}
\cdot\frac{\frac{1}{nh_2^p}\sum_{j=1}^n K_{h_2}(X_j-X_i)}{\frac{1}{nh_2^p}\sum_{j=1}^n K_{h_2}(X_j-X_i)\,\mathbb 1(D_j=1)},
\]
we can regard $\sum_{j=1}^n K_h(X_{1j})\,w^N_{ji}$ as an estimator of $K_h(X_{1i})/p(X_i)$; write it as $\widehat{K_h}(X_{1i})/\hat p(X_i)$, with $\widehat{K_h}(X_{1i})$ and $1/\hat p(X_i)$ denoting the two ratio factors above. Consider
\[
\sum_{j=1}^n K_h(X_{1j})\,w^N_{ji}-\frac{K_h(X_{1i})}{p(X_i)},
\]
which is the bias of $\widehat{K_h}(X_{1i})/\hat p(X_i)$ relative to $K_h(X_{1i})/p(X_i)$. Write $X=(X_1,X^{(2)})$ and
\[
K_{h_2}(X-X_j)=K\Big(\frac{X_1-X_{1j}}{h_2}\Big)\,K\Big(\frac{X^{(2)}-X^{(2)}_j}{h_2}\Big).
\]
Since $\hat f-f=o_p(1)$, and the kernel function is $s^*\ (\ge s)$ times continuously differentiable, we have (1.8)
\[
\begin{aligned}
E\Big\{\sum_{j=1}^n K_h(X_{1j})\,w^N_{ji}\,\Big|\,X_i\Big\}
&=\frac{1+o_p(1)}{h_2^p\,f(X_i)\,p(X_i)}\int K\Big(\frac{u_1-X_{1i}}{h_2}\Big)\,K\Big(\frac{u_2-X^{(2)}_i}{h_2}\Big)\,K_h(u_1)\,f(u)\,du\\
&=\frac{1+o_p(1)}{f(X_i)\,p(X_i)}\int K(v_1)\,K(v_2)\,K\Big(\frac{X_{1i}-x}{h}+\frac{v_1h_2}{h}\Big)\,f(X_i+h_2v)\,dv\\
&=\frac{K_h(X_{1i})}{p(X_i)}+O_p\Big(\frac{h_2^s}{h^s}\Big).
\end{aligned}
\]
Note that
\[
\begin{aligned}
\frac{\widehat{K_h}(X_{1i})}{\hat p(X_i)}-\frac{K_h(X_{1i})}{p(X_i)}
&=\Big\{\frac{1}{\hat p(X_i)}-\frac{1}{p(X_i)}+\frac{1}{p(X_i)}\Big\}\Big\{\widehat{K_h}(X_{1i})-K_h(X_{1i})+K_h(X_{1i})\Big\}-\frac{K_h(X_{1i})}{p(X_i)}\\
&=\Big\{\frac{1}{\hat p(X_i)}-\frac{1}{p(X_i)}\Big\}\Big\{\widehat{K_h}(X_{1i})-K_h(X_{1i})\Big\}
+\frac{1}{p(X_i)}\Big\{\widehat{K_h}(X_{1i})-K_h(X_{1i})\Big\}\\
&\quad+\Big\{\frac{1}{\hat p(X_i)}-\frac{1}{p(X_i)}\Big\}K_h(X_{1i})\\
&=O_p\Big(\frac{h_2^s}{h^s}+h_2^s+\sqrt{\frac{\log n}{nh_2^p}}\Big)=O_p\Big(\frac{h_2^s}{h^s}\Big).
\end{aligned}
\]
Thus,
\[
\sup_i\Big|\sum_{j=1}^n K_h(X_{1j})\,w^N_{ji}-\frac{K_h(X_{1i})}{p(X_i)}\Big|=O_p\Big(\frac{h_2^s}{h^s}\Big).
\]
Owing to assumption (A4), $h_2^s/h^{s+k/2}\to 0$, so we have
\[
\sup_i\Big|\frac{1}{\sqrt{h^k}}\Big\{\sum_{j=1}^n K_h(X_{1j})\,w^N_{ji}-\frac{K_h(X_{1i})}{p(X_i)}\Big\}\Big|
=O_p\Big(\frac{h_2^s}{h^{s+k/2}}\Big)=o_p(1).
\]
Since the $\epsilon_{1i}=Y_i-E(Y(1)\mid X_i)$ are mutually independent, we have $I_{n,4}=o_p(1)$ in equation (1.7).

Finally, to show that $I_{n,5}=o_p(1)$ in equation (1.7), note that
\[
\frac{\frac{1}{nh_2^p}\sum_{j=1}^n K_{h_2}(X_j-X_i)\,\mathbb 1(D_j=1)\,E(Y(1)\mid X_j)}{\frac{1}{nh_2^p}\sum_{j=1}^n K_{h_2}(X_j-X_i)\,\mathbb 1(D_j=1)}
=\frac{\frac{1}{nh_2^p}\sum_{j=1}^n K_{h_2}(X_j-X_i)\,\mathbb 1(D_j=1)\,E(Y(1)\mid X_j)}{\frac{1}{nh_2^p}\sum_{j=1}^n K_{h_2}(X_j-X_i)}
\cdot\frac{\frac{1}{nh_2^p}\sum_{j=1}^n K_{h_2}(X_j-X_i)}{\frac{1}{nh_2^p}\sum_{j=1}^n K_{h_2}(X_j-X_i)\,\mathbb 1(D_j=1)},
\]
which can be viewed as an estimator of $E\{\mathbb 1(D=1)Y(1)\mid X_i\}/p(X_i)$. Denote $A(X_i)=E\{\mathbb 1(D=1)Y(1)\mid X_i\}$. We can easily derive that
\[
\begin{aligned}
\frac{\hat A(X_i)}{\hat p(X_i)}-\frac{A(X_i)}{p(X_i)}
&=\big\{\hat A(X_i)-A(X_i)+A(X_i)\big\}\Big\{\frac{1}{\hat p(X_i)}-\frac{1}{p(X_i)}+\frac{1}{p(X_i)}\Big\}-\frac{A(X_i)}{p(X_i)}\\
&=\big\{\hat A(X_i)-A(X_i)\big\}\Big\{\frac{1}{\hat p(X_i)}-\frac{1}{p(X_i)}\Big\}
+A(X_i)\Big\{\frac{1}{\hat p(X_i)}-\frac{1}{p(X_i)}\Big\}
+\frac{\hat A(X_i)-A(X_i)}{p(X_i)}\\
&=O_p\Big(h_2^s+\sqrt{\frac{\log n}{nh_2^p}}\Big).
\end{aligned}
\]
Thus,
\[
\sup_i\Big|\frac{\frac{1}{nh_2^p}\sum_{j=1}^n K_{h_2}(X_j-X_i)\,\mathbb 1(D_j=1)\,E(Y(1)\mid X_j)}{\frac{1}{nh_2^p}\sum_{j=1}^n K_{h_2}(X_j-X_i)\,\mathbb 1(D_j=1)}-E(Y(1)\mid X_i)\Big|
=O_p\Big(h_2^s+\sqrt{\frac{\log n}{nh_2^p}}\Big).
\]
Then, we can bound $I_{n,5}$ as follows:
\[
\begin{aligned}
|I_{n,5}|
&=\Big|\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\Big\{\frac{\frac{1}{nh_2^p}\sum_{j}K_{h_2}(X_j-X_i)\,\mathbb 1(D_j=1)\,E(Y(1)\mid X_j)}{\frac{1}{nh_2^p}\sum_{j}K_{h_2}(X_j-X_i)\,\mathbb 1(D_j=1)}-E(Y(1)\mid X_i)\Big\}\Big|\\
&\le \sqrt{nh^k}\,\sup_i\Big|\frac{\frac{1}{nh_2^p}\sum_{j}K_{h_2}(X_j-X_i)\,\mathbb 1(D_j=1)\,E(Y(1)\mid X_j)}{\frac{1}{nh_2^p}\sum_{j}K_{h_2}(X_j-X_i)\,\mathbb 1(D_j=1)}-E(Y(1)\mid X_i)\Big|\cdot\frac{1}{nh^k}\sum_{i=1}^n|K_h(X_{1i})|\\
&=\sqrt{nh^k}\,O_p\Big(h_2^s+\sqrt{\frac{\log n}{nh_2^p}}\Big)\cdot O_p(1)=o_p(1)\cdot O_p(1)=o_p(1),
\end{aligned}
\]
where assumption (A4) is used for the second equation. Thus, together with $I_{n,3}=o_p(1)$ and $I_{n,4}=o_p(1)$, equation (1.7) becomes
\[
\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\Big\{\frac{\frac{1}{nh_2^p}\sum_{j}K_{h_2}(X_j-X_i)\,Y_j\,\mathbb 1(D_j=1)}{\frac{1}{nh_2^p}\sum_{j}K_{h_2}(X_j-X_i)\,\mathbb 1(D_j=1)}-\tau_1(x)\Big\}
=I_{n,1}+I_{n,2}+o_p(1).
\]
Similarly, we can also deal with $\hat m_0(X_i)-\tau_0(x)$ of (1.6) to obtain
\[
\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\Big\{\frac{\frac{1}{nh_2^p}\sum_{j}K_{h_2}(X_j-X_i)\,Y_j\,\mathbb 1(D_j=0)}{\frac{1}{nh_2^p}\sum_{j}K_{h_2}(X_j-X_i)\,\mathbb 1(D_j=0)}-\tau_0(x)\Big\}
=:I'_{n,1}+I'_{n,2}+o_p(1),
\]
where
\[
I'_{n,1}=\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n \epsilon_{0i}\,\mathbb 1(D_i=0)\,\frac{K_h(X_{1i})}{1-p(X_i)},\qquad
I'_{n,2}=\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\,[E(Y(0)\mid X_i)-\tau_0(x)],
\]
and $\epsilon_{0i}=Y_i-E(Y(0)\mid X_i)$. Hence, we get the asymptotically linear representation of $\hat\tau(x)$ as
\[
\sqrt{nh^k}\,\{\hat\tau(x)-\tau(x)\}
=\frac{1}{\sqrt{nh^k}\,f(x)}\sum_{i=1}^n\{\Psi(X_i,Y_i,D_i)-\tau(x)\}\,K_h(X_{1i})+o_p(1),
\]
which is asymptotically normal. Again, we compute its asymptotic variance. As in the proof of Theorem ??, we have
\[
\mathrm{Var}\{\hat\tau(x)\}=\frac{1}{nh^k}\,\frac{\|K\|_2^2\,\sigma^2_N(x)}{f(x)}+o\Big(\frac{1}{nh^k}\Big).
\]
Then by assumptions (C1)--(C4) and (A1)--(A4) for some $s^*\ge s\ge p$, we can derive that
\[
\sqrt{nh^k}\,\{\hat\tau(x)-\tau(x)\}\overset{d}{\longrightarrow}
N\Big(0,\ \frac{\|K\|_2^2\,\sigma^2_N(x)}{f(x)}\Big),
\quad\text{where}\quad
\sigma^2_N(x)\equiv E\big[\{\Psi(X,Y,D)-\tau(x)\}^2\mid X_1=x\big].
\]
The proof is concluded. $\Box$
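To fix ideas, the nonparametric estimator analyzed above can be sketched numerically. The following Python snippet is illustrative only and is not from the paper: it assumes a scalar covariate that plays the role of both $X$ and the conditioning covariate $X_1$, Gaussian kernels, ad hoc bandwidths, and takes $\Psi$ in the augmented inverse-probability form suggested by the terms $I_{n,1}$, $I_{n,2}$, $I'_{n,1}$, $I'_{n,2}$. It builds the two Nadaraya-Watson fits, smooths their difference around $x$, and forms a plug-in estimate of the limiting variance $\|K\|_2^2\,\sigma^2_N(x)/f(x)$:

```python
import numpy as np

def gauss(u):
    # Gaussian kernel: an illustrative choice; the theory only requires
    # a sufficiently smooth (possibly higher-order) kernel K.
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def nw(x0, X, Y, h):
    # Nadaraya-Watson estimate of E(Y | X = x0).
    w = gauss((X - x0) / h)
    return np.sum(w * Y) / np.sum(w)

def np_cate(x, X, Y, D, h2, h):
    # Nonparametric outcome-regression CATE estimate tau_hat(x):
    # a kernel-smoothed average of m1_hat(X_i) - m0_hat(X_i) around x.
    m1 = np.array([nw(xi, X[D == 1], Y[D == 1], h2) for xi in X])
    m0 = np.array([nw(xi, X[D == 0], Y[D == 0], h2) for xi in X])
    w = gauss((X - x) / h)
    return np.sum(w * (m1 - m0)) / np.sum(w)

def np_cate_avar(x, X, Y, D, h2, h):
    # Plug-in estimate of ||K||_2^2 * sigma_N^2(x) / f(x), the limiting
    # variance of sqrt(n h){tau_hat(x) - tau(x)}, with the influence-type
    # function Psi = m1 - m0 + D(Y - m1)/p - (1 - D)(Y - m0)/(1 - p)
    # (an assumption on the form of Psi, suggested by the proof above).
    m1 = np.array([nw(xi, X[D == 1], Y[D == 1], h2) for xi in X])
    m0 = np.array([nw(xi, X[D == 0], Y[D == 0], h2) for xi in X])
    p = np.array([nw(xi, X, D.astype(float), h2) for xi in X])
    psi = m1 - m0 + D * (Y - m1) / p - (1 - D) * (Y - m0) / (1 - p)
    w = gauss((X - x) / h)
    tau = np.sum(w * (m1 - m0)) / np.sum(w)
    sigma2 = np.sum(w * (psi - tau) ** 2) / np.sum(w)  # sigma_N^2(x)
    f_x = np.mean(w) / h                               # density estimate f(x)
    norm_K2 = 1.0 / (2.0 * np.sqrt(np.pi))             # ||K||_2^2 for Gaussian K
    return norm_K2 * sigma2 / f_x

# Toy data with a constant treatment effect tau(x) = 2.
rng = np.random.default_rng(0)
n = 2000
X = rng.uniform(-1.0, 1.0, n)
D = rng.binomial(1, 0.5, n)
Y = X + 2.0 * D + 0.1 * rng.normal(size=n)
tau_hat = np_cate(0.0, X, Y, D, h2=0.2, h=0.3)
```

With the constant effect built into the toy data, `tau_hat` should be close to 2; the plug-in variance, divided by $nh$, gives the pointwise Wald-type standard error matching the normal limit in Theorem ??.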
Proof of Theorem ??. Inspired by the proof of Theorem 2 of Luo, Zhu and Ghosh (2017), we have (1.9)
\[
\begin{aligned}
\sqrt{nh^k}\,(\hat\tau(x)-\tau(x))
&=\frac{\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\,[\hat m_1(\hat\beta_1^\top X_i)-\tau_1(x)]}{\frac{1}{nh^k}\sum_{i=1}^n K_h(X_{1i})}
-\frac{\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\,[\hat m_0(\hat\beta_0^\top X_i)-\tau_0(x)]}{\frac{1}{nh^k}\sum_{i=1}^n K_h(X_{1i})}\\
&=\frac{\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\,[\hat m_1(\beta_1^\top X_i)-\tau_1(x)]}{\frac{1}{nh^k}\sum_{i=1}^n K_h(X_{1i})}
-\frac{\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\,[\hat m_0(\beta_0^\top X_i)-\tau_0(x)]}{\frac{1}{nh^k}\sum_{i=1}^n K_h(X_{1i})}\\
&\quad+O_p\big(\sqrt{nh^k}\,\|\hat\beta_1-\beta_1\|+\sqrt{nh^k}\,\|\hat\beta_0-\beta_0\|\big),
\end{aligned}
\]
where
\[
\hat m_1(\hat\beta_1^\top X_i)=\frac{\frac{1}{nh_4^{r^{(1)}}}\sum_{j=1}^n K_{h_4}(\hat Z_{1j}-\hat Z_{1i})\,Y_j\,\mathbb 1(D_j=1)}{\frac{1}{nh_4^{r^{(1)}}}\sum_{j=1}^n K_{h_4}(\hat Z_{1j}-\hat Z_{1i})\,\mathbb 1(D_j=1)},\qquad \hat Z_1=\hat\beta_1^\top X,
\]
\[
\hat m_0(\hat\beta_0^\top X_i)=\frac{\frac{1}{nh_4^{r^{(0)}}}\sum_{j=1}^n K_{h_4}(\hat Z_{0j}-\hat Z_{0i})\,Y_j\,\mathbb 1(D_j=0)}{\frac{1}{nh_4^{r^{(0)}}}\sum_{j=1}^n K_{h_4}(\hat Z_{0j}-\hat Z_{0i})\,\mathbb 1(D_j=0)},\qquad \hat Z_0=\hat\beta_0^\top X.
\]
Under assumption (A8), $O_p\big(\sqrt{nh^k}\,\|\hat\beta_1-\beta_1\|+\sqrt{nh^k}\,\|\hat\beta_0-\beta_0\|\big)=O_p(\sqrt{h^k})=o_p(1)$ as $h\to 0$. Therefore, equation (1.9) becomes (1.10)
\[
\sqrt{nh^k}\,(\hat\tau(x)-\tau(x))
=\frac{\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\,[\hat m_1(\beta_1^\top X_i)-\tau_1(x)]}{\frac{1}{nh^k}\sum_{i=1}^n K_h(X_{1i})}
-\frac{\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\,[\hat m_0(\beta_0^\top X_i)-\tau_0(x)]}{\frac{1}{nh^k}\sum_{i=1}^n K_h(X_{1i})}+o_p(1).
\]
As in the proof of Theorem ??, we have
\[
\begin{aligned}
&\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\,[\hat m_1(\beta_1^\top X_i)-\tau_1(x)]\\
&=\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\,[E(Y(1)\mid X_i)-\tau_1(x)]
+\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n \epsilon_{1i}\,\mathbb 1(D_i=1)\sum_{j=1}^n K_h(X_{1j})\,(w^S_{ij}-w^S_{ji})\\
&\quad+\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n \epsilon_{1i}\,\mathbb 1(D_i=1)\sum_{j=1}^n K_h(X_{1j})\,w^S_{ji}\\
&\quad+\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\Big\{\frac{\frac{1}{nh_4^{r^{(1)}}}\sum_{j}K_{h_4}(Z_{1j}-Z_{1i})\,\mathbb 1(D_j=1)\,E(Y(1)\mid X_j)}{\frac{1}{nh_4^{r^{(1)}}}\sum_{j}K_{h_4}(Z_{1j}-Z_{1i})\,\mathbb 1(D_j=1)}-E(Y(1)\mid X_i)\Big\}\\
&=:I_{n,2}+I_{n,3}+I_{n,4}+I_{n,5},
\end{aligned}
\]
where
\[
w^S_{ij}=\frac{K_{h_4}(Z_{1i}-Z_{1j})}{\sum_{l=1}^n K_{h_4}(Z_{1l}-Z_{1j})\,\mathbb 1(D_l=1)},\qquad
\epsilon_{1i}=Y_i-E(Y(1)\mid X_i).
\]
Similarly, we can decompose $\hat m_0(\beta_0^\top X)-\tau_0(x)$ as
\[
\begin{aligned}
&\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\,[\hat m_0(\beta_0^\top X_i)-\tau_0(x)]\\
&=\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\,[E(Y(0)\mid X_i)-\tau_0(x)]
+\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n \epsilon_{0i}\,\mathbb 1(D_i=0)\sum_{j=1}^n K_h(X_{1j})\,(w^S_{ij}-w^S_{ji})\\
&\quad+\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n \epsilon_{0i}\,\mathbb 1(D_i=0)\sum_{j=1}^n K_h(X_{1j})\,w^S_{ji}\\
&\quad+\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\Big\{\frac{\frac{1}{nh_4^{r^{(0)}}}\sum_{j}K_{h_4}(Z_{0j}-Z_{0i})\,\mathbb 1(D_j=0)\,E(Y(0)\mid X_j)}{\frac{1}{nh_4^{r^{(0)}}}\sum_{j}K_{h_4}(Z_{0j}-Z_{0i})\,\mathbb 1(D_j=0)}-E(Y(0)\mid X_i)\Big\}\\
&=:I'_{n,2}+I'_{n,3}+I'_{n,4}+I'_{n,5},
\end{aligned}
\]
where, for this control arm,
\[
w^S_{ij}=\frac{K_{h_4}(Z_{0i}-Z_{0j})}{\sum_{l=1}^n K_{h_4}(Z_{0l}-Z_{0j})\,\mathbb 1(D_l=0)},\qquad
\epsilon_{0i}=Y_i-E(Y(0)\mid X_i).
\]
$I_{n,3}$, $I'_{n,3}$, $I_{n,5}$ and $I'_{n,5}$ are $o_p(1)$, following the same arguments used to prove that $I_{n,3}=o_p(1)$ and $I_{n,5}=o_p(1)$ for Theorem ??. The details are omitted here. We now deal with $I_{n,4}$ and $I'_{n,4}$.

Lemma 1.2. Suppose assumptions (C1)--(C4), (A1) and (A5)--(A7) are satisfied.
Then, for each point $x$ in the support of $X_1$:

(1) If $X_1\not\subset\beta_1^\top X$ and $X_1\not\subset\beta_0^\top X$, with $s(2-k/q)+k>0$ and $0<q\le k$, we have (1.11)
\[
I_{n,4}=o_p(1),\qquad I'_{n,4}=o_p(1).
\]
The corresponding asymptotically linear representation is then
\[
\sqrt{nh^k}\,\{\hat\tau(x)-\tau(x)\}
=\frac{1}{\sqrt{nh^k}\,f(x)}\sum_{i=1}^n\{m_1(X_i)-m_0(X_i)-\tau(x)\}\,K_h(X_{1i})+o_p(1).
\]

(2) If $X_1\subset\beta_1^\top X$ and $X_1\not\subset\beta_0^\top X$, with $s(2-k/q)+k>0$ and $0<q\le k$, we have (1.12)
\[
I_{n,4}=\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n\epsilon_{1i}\,\mathbb 1(D_i=1)\,\frac{K_h(X_{1i})}{p(X_i)}+o_p(1),\qquad I'_{n,4}=o_p(1).
\]
Then we have
\[
\sqrt{nh^k}\,\{\hat\tau(x)-\tau(x)\}
=\frac{1}{\sqrt{nh^k}\,f(x)}\sum_{i=1}^n\{\Psi_1(X_i,Y_i,D_i)-\tau(x)\}\,K_h(X_{1i})+o_p(1).
\]

(3) If $X_1\not\subset\beta_1^\top X$ and $X_1\subset\beta_0^\top X$, with $s(2-k/q)+k>0$ and $0<q\le k$, we have (1.13)
\[
I_{n,4}=o_p(1),\qquad
I'_{n,4}=\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n\epsilon_{0i}\,\mathbb 1(D_i=0)\,\frac{K_h(X_{1i})}{1-p(X_i)}+o_p(1).
\]
The corresponding asymptotically linear representation is
\[
\sqrt{nh^k}\,\{\hat\tau(x)-\tau(x)\}
=\frac{1}{\sqrt{nh^k}\,f(x)}\sum_{i=1}^n\{\Psi_0(X_i,Y_i,D_i)-\tau(x)\}\,K_h(X_{1i})+o_p(1).
\]

(4) If $X_1\subset\beta_1^\top X$ and $X_1\subset\beta_0^\top X$, we have (1.14)
\[
I_{n,4}=\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n\epsilon_{1i}\,\mathbb 1(D_i=1)\,\frac{K_h(X_{1i})}{p(X_i)}+o_p(1),\qquad
I'_{n,4}=\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n\epsilon_{0i}\,\mathbb 1(D_i=0)\,\frac{K_h(X_{1i})}{1-p(X_i)}+o_p(1).
\]
We have
\[
\sqrt{nh^k}\,\{\hat\tau(x)-\tau(x)\}
=\frac{1}{\sqrt{nh^k}\,f(x)}\sum_{i=1}^n\{\Psi(X_i,Y_i,D_i)-\tau(x)\}\,K_h(X_{1i})+o_p(1).
\]

Proof of Lemma 1.2.
We need to show that $I_{n,4}=o_p(1)$ if $X_1\not\subset\beta_1^\top X$ with $s(2-k/q)+k>0$ and $0<q\le k$. Let $X_1=v_1$, $\beta_1^\top X=v_2$, and denote $\big(\frac{v_1-v_{1i}}{h_4},\frac{v_2-v_{2i}}{h_4}\big)$ as $(t_1,t_2)$. We have
\[
\begin{aligned}
E\Big\{\sum_{j=1}^n K_h(X_{1j})\,w^S_{ji}\,\Big|\,X_i\Big\}
&=\frac{1+o_p(1)}{h_4^{q}\,f(v_{2i})\,p(v_{2i})}\int K\Big(\frac{v_2-\beta_1^\top X_i}{h_4}\Big)\,K\Big(\frac{v_1-x}{h}\Big)\,f(v_1,v_2)\,dv_1\,dv_2\\
&=\frac{1+o_p(1)}{f(v_{2i})\,p(v_{2i})}\int K(t_2)\,K\Big(\frac{v_{1i}-x}{h}+\frac{t_1h_4}{h}\Big)\,f(v_{1i}+h_4t_1,\,v_{2i}+h_4t_2)\,dt_1\,dt_2\\
&=\frac{h_4^q}{h^q}\,K\Big(\frac{v_{1i}-x}{h}\Big)\,\frac{f(v_{1i},v_{2i})}{f(v_{2i})\,p(v_{2i})}\int K(t_2)\,dt_1\,dt_2
+\frac{h_4^{q+1}}{h^{q+1}}\,K'\Big(\frac{v_{1i}-x}{h}\Big)\,\frac{f(v_{1i},v_{2i})}{f(v_{2i})\,p(v_{2i})}\int t_1K(t_2)\,dt_1\,dt_2\\
&\quad+o_p\Big(\frac{h_4^{q+1}}{h^{q+1}}\Big),
\end{aligned}
\]
where $f(v_{1i},v_{2i})$ is the joint density function of $(X_1,\beta_1^\top X)$. Under assumptions (A5)--(A7), we have
\[
E\Big\{\sum_{j=1}^n K_h(X_{1j})\,w^S_{ji}\,\Big|\,X_i\Big\}
=C\,\frac{h_4^q}{h^q}\,K_h(X_{1i})\,\frac{f(X_{1i},\beta_1^\top X_i)}{f(X_{1i})\,p(\beta_1^\top X_i)}
+O_p\Big(\frac{h_4^{q+1}}{h^{q+1}}\Big)
=O_p\Big(\frac{h_4^{q}}{h^{q}}+\frac{h_4^{q+1}}{h^{q+1}}\Big).
\]
Hence, under assumptions (A6), (A7), $s(2-k/q)+k>0$ and $0<q\le k$,
\[
\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n\epsilon_{1i}\,\mathbb 1(D_i=1)\sum_{j=1}^n K_h(X_{1j})\,w^S_{ji}
=\frac{1}{\sqrt n}\sum_{i=1}^n\epsilon_{1i}\,\mathbb 1(D_i=1)\,O_p\Big(\frac{h_4^{q}}{h^{q+k/2}}+\frac{h_4^{q+1}}{h^{q+1+k/2}}\Big)=o_p(1).
\]
Analogously, we get $I'_{n,4}=o_p(1)$ if $X_1\not\subset\beta_0^\top X$. Next, we prove that
\[
I_{n,4}=\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n\epsilon_{1i}\,\mathbb 1(D_i=1)\,\frac{K_h(X_{1i})}{p(X_i)}+o_p(1)
\]
if $X_1\subset\beta_1^\top X$. Since the case $X_1\subset\beta_1^\top X$ is parallel to the case $X_1\subset X$ in the nonparametric setting, the desired result follows by an argument paralleling the derivation of equation (1.8). Similarly, $I'_{n,4}=\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n\epsilon_{0i}\,\mathbb 1(D_i=0)\,\frac{K_h(X_{1i})}{1-p(X_i)}+o_p(1)$ if $X_1\subset\beta_0^\top X$. The proof of Lemma 1.2 is concluded. $\Box$

Proof of Corollary ??. Consider the case where $X_1\not\subset\widetilde X\in\mathbb R^{q}$. Similarly as before, we derive that (1.15)
\[
\sqrt{nh^k}\,(\hat\tau(x)-\tau(x))
=\frac{\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\,[\hat m_1(\widetilde X_i)-\tau_1(x)]}{\frac{1}{nh^k}\sum_{i=1}^n K_h(X_{1i})}
-\frac{\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\,[\hat m_0(\widetilde X_i)-\tau_0(x)]}{\frac{1}{nh^k}\sum_{i=1}^n K_h(X_{1i})},
\]
where
\[
\hat m_1(\widetilde X_i)=\frac{\frac{1}{nh_2^q}\sum_{j=1}^n K_{h_2}(\widetilde X_j-\widetilde X_i)\,Y_j\,\mathbb 1(D_j=1)}{\frac{1}{nh_2^q}\sum_{j=1}^n K_{h_2}(\widetilde X_j-\widetilde X_i)\,\mathbb 1(D_j=1)},\qquad
\hat m_0(\widetilde X_i)=\frac{\frac{1}{nh_2^q}\sum_{j=1}^n K_{h_2}(\widetilde X_j-\widetilde X_i)\,Y_j\,\mathbb 1(D_j=0)}{\frac{1}{nh_2^q}\sum_{j=1}^n K_{h_2}(\widetilde X_j-\widetilde X_i)\,\mathbb 1(D_j=0)}.
\]
Some similar calculations lead to, for $\hat m_1(\widetilde X_i)-\tau_1(x)$,
\[
\begin{aligned}
&\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\,[\hat m_1(\widetilde X_i)-\tau_1(x)]\\
&=\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\,[E(Y(1)\mid X_i)-\tau_1(x)]
+\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n \epsilon_{1i}\,\mathbb 1(D_i=1)\sum_{j=1}^n K_h(X_{1j})\,(\bar w^N_{ij}-\bar w^N_{ji})\\
&\quad+\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n \epsilon_{1i}\,\mathbb 1(D_i=1)\sum_{j=1}^n K_h(X_{1j})\,\bar w^N_{ji}\\
&\quad+\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\Big\{\frac{\frac{1}{nh_2^q}\sum_{j}K_{h_2}(\widetilde X_j-\widetilde X_i)\,\mathbb 1(D_j=1)\,E(Y(1)\mid X_j)}{\frac{1}{nh_2^q}\sum_{j}K_{h_2}(\widetilde X_j-\widetilde X_i)\,\mathbb 1(D_j=1)}-E(Y(1)\mid X_i)\Big\}\\
&=:I_{n,2}+I_{n,3}+I_{n,4}+I_{n,5},
\end{aligned}
\]
where
\[
\bar w^N_{ij}=\frac{K_{h_2}(\widetilde X_i-\widetilde X_j)}{\sum_{l=1}^n K_{h_2}(\widetilde X_l-\widetilde X_j)\,\mathbb 1(D_l=1)}.
\]
Then we can prove that $I_{n,3}$ and $I_{n,5}$ are $o_p(1)$ by the same arguments as those used to handle $I_{n,3}$ and $I_{n,5}$ in the proof of Theorem ??. Owing to $X_1\not\subset\widetilde X$, similar arguments to those proving Lemma 1.2 imply that $I_{n,4}=o_p(1)$. The proof of Corollary ?? is concluded. $\Box$

Proof of Corollary ??. From the proof of Theorem ??, we can see that
\[
E\Big\{\sum_{j=1}^n K_h(X_{1j})\,w^N_{ji}\,\Big|\,X_i\Big\}=O_p\Big(h+\frac{h_2^s}{h^s}\Big),
\]
and $\sqrt{nh^k}\,\big(h_2^s+\sqrt{\log(n)/(nh_2^p)}\big)=o(1)$. Then NRCATE shares the same asymptotic distribution as PRCATE. For SRCATE, we can use similar arguments to show the same result. The proof is finished. $\Box$

References
Abrevaya, J., Hsu, Y.-C., & Lieli, R. P. (2015). Estimating conditional average treatment effects. Journal of Business & Economic Statistics, 33(4), 485-505.

Luo, W., Zhu, Y., & Ghosh, D. (2017). On estimating regression-based causal effects using sufficient dimension reduction. Biometrika, 104(1), 51-65.
Lu Li
School of Statistics
East China Normal University
Shanghai, 200062, China
E-mail:
Niwen Zhou
School of Statistics
Beijing Normal University
Beijing, 100875, China
E-mail: [email protected]
Lixing Zhu
School of Statistics
Beijing Normal University
Beijing, 100875, China
and
Department of Mathematics
Hong Kong Baptist University
Kowloon Tong, Hong Kong, China
E-mail: [email protected]

Table 1.1
The distribution of √ nh [ b τ ( x ) − τ ( x )] for model 1 n=200 n=500OR PR SR NR N S P O OR PR SR NR N S P O h = 0 . n − / , h = 0 . n − / , h = 0 . n − / -0.4 0.178 0.224 0.213 0.223 0.352 0.361 0.396 0.407 0.188 0.224 0.208 0.225 0.371 0.388 0.404 0.409-0.2 0.182 0.192 0.181 0.196 0.351 0.365 0.377 0.380 0.186 0.193 0.191 0.213 0.368 0.383 0.389 0.395SD 0 0.199 0.210 0.231 0.232 0.420 0.440 0.460 0.491 0.198 0.205 0.206 0.217 0.415 0.430 0.466 0.4760.2 0.208 0.216 0.248 0.243 0.466 0.476 0.503 0.525 0.195 0.203 0.231 0.226 0.423 0.438 0.484 0.5090.4 0.195 0.215 0.239 0.236 0.377 0.395 0.415 0.426 0.202 0.222 0.250 0.247 0.364 0.372 0.415 0.432-0.4 0.005 0.011 -0.032 0.001 -0.024 -0.001 -0.004 -0.006 0.021 0.026 -0.097 0.011 0.022 0.055 0.043 0.037-0.2 -0.002 0.006 0.094 0.043 -0.011 0.011 0.014 0.017 -0.005 -0.003 0.119 0.034 -0.036 -0.008 -0.013 -0.013BIAS 0 0.005 0.013 0.040 0.032 -0.033 0.004 -0.007 0.012 0.007 0.007 0.057 0.030 -0.026 0.006 -0.004 0.0070.2 0.005 0.009 0.008 0.013 -0.006 0.035 0.001 0.014 0.003 0.001 0.005 0.007 -0.030 -0.004 -0.005 0.0060.4 0.006 0.004 -0.014 -0.008 0.033 0.066 0.041 0.027 0.015 0.013 0.000 0.008 0.032 0.038 0.017 0.012-0.4 0.032 0.050 0.047 0.050 0.125 0.130 0.157 0.165 0.036 0.051 0.052 0.051 0.138 0.153 0.165 0.169-0.2 0.033 0.037 0.042 0.040 0.124 0.133 0.142 0.145 0.035 0.037 0.051 0.047 0.137 0.147 0.152 0.156MSE 0 0.040 0.044 0.055 0.055 0.177 0.194 0.212 0.241 0.039 0.042 0.046 0.048 0.173 0.185 0.217 0.2260.2 0.043 0.047 0.061 0.059 0.217 0.228 0.253 0.276 0.038 0.041 0.054 0.051 0.180 0.192 0.234 0.2590.4 0.038 0.046 0.057 0.056 0.143 0.160 0.174 0.182 0.041 0.049 0.062 0.061 0.133 0.140 0.173 0.187 h = 0 . n − / , h = 0 . n − / , h = 0 . 
n − / -0.4 0.195 0.235 0.219 0.227 0.358 0.371 0.403 0.392 0.181 0.220 0.212 0.225 0.365 0.381 0.406 0.405-0.2 0.191 0.198 0.196 0.211 0.386 0.398 0.413 0.410 0.192 0.201 0.194 0.214 0.373 0.386 0.406 0.408SD 0 0.199 0.206 0.214 0.216 0.391 0.415 0.429 0.435 0.196 0.209 0.219 0.230 0.418 0.436 0.466 0.4840.2 0.202 0.207 0.235 0.231 0.440 0.455 0.495 0.525 0.203 0.209 0.231 0.227 0.419 0.431 0.468 0.4930.4 0.207 0.222 0.248 0.245 0.375 0.380 0.429 0.441 0.196 0.212 0.231 0.229 0.361 0.370 0.416 0.426-0.4 0.011 0.019 -0.043 0.012 0.023 0.046 0.038 0.034 0.015 0.003 -0.126 -0.011 -0.008 0.024 0.005 0.000-0.2 0.000 0.001 0.081 0.035 -0.033 -0.011 -0.002 -0.006 0.011 0.009 0.126 0.045 -0.021 0.002 -0.002 -0.001BIAS 0 -0.012 -0.016 0.013 0.006 -0.033 0.000 -0.009 -0.003 0.009 0.013 0.064 0.038 -0.012 0.010 0.014 0.0270.2 -0.003 -0.008 -0.008 -0.004 -0.041 -0.014 -0.035 -0.019 -0.009 -0.004 -0.002 -0.001 -0.019 -0.008 -0.009 0.0070.4 -0.007 -0.010 -0.025 -0.022 0.017 0.037 0.030 0.026 0.017 0.019 0.010 0.015 0.055 0.055 0.046 0.047-0.4 0.038 0.056 0.050 0.051 0.129 0.140 0.164 0.155 0.033 0.048 0.061 0.051 0.133 0.145 0.165 0.164-0.2 0.037 0.039 0.045 0.046 0.150 0.159 0.171 0.168 0.037 0.040 0.053 0.048 0.139 0.149 0.165 0.167MSE 0 0.040 0.043 0.046 0.047 0.154 0.172 0.184 0.189 0.039 0.044 0.052 0.054 0.175 0.190 0.217 0.2350.2 0.041 0.043 0.055 0.053 0.195 0.207 0.246 0.276 0.041 0.044 0.053 0.051 0.176 0.186 0.219 0.2430.4 0.043 0.049 0.062 0.061 0.141 0.146 0.185 0.195 0.039 0.045 0.053 0.053 0.133 0.140 0.175 0.184 h = 0 . n − / , h = 0 . n − / , h = 0 . 
n − / -0.4 0.183 0.222 0.218 0.219 0.357 0.375 0.405 0.398 0.191 0.214 0.207 0.215 0.376 0.390 0.400 0.408-0.2 0.195 0.203 0.186 0.197 0.360 0.366 0.380 0.384 0.186 0.196 0.183 0.198 0.364 0.372 0.386 0.391SD 0 0.193 0.206 0.214 0.217 0.441 0.453 0.474 0.486 0.193 0.201 0.207 0.211 0.432 0.442 0.478 0.4910.2 0.200 0.213 0.237 0.232 0.460 0.476 0.516 0.525 0.194 0.202 0.230 0.227 0.479 0.489 0.526 0.5410.4 0.198 0.220 0.241 0.239 0.407 0.414 0.453 0.460 0.211 0.231 0.257 0.255 0.406 0.408 0.455 0.474-0.4 0.005 0.000 -0.064 -0.009 -0.013 0.010 0.010 0.007 -0.004 -0.004 -0.130 -0.016 -0.006 0.019 0.009 0.020-0.2 0.001 -0.002 0.079 0.049 -0.044 -0.029 -0.024 -0.022 -0.002 -0.004 0.118 0.041 -0.025 -0.007 -0.007 -0.004BIAS 0 0.003 0.000 0.022 0.017 -0.029 -0.010 -0.018 -0.005 0.008 0.006 0.056 0.034 -0.034 -0.014 -0.003 -0.0080.2 0.016 0.013 0.013 0.018 -0.034 -0.001 -0.026 -0.016 0.000 -0.001 0.002 0.006 -0.030 -0.026 -0.022 -0.0280.4 0.004 -0.001 -0.014 -0.015 0.014 0.035 0.013 0.000 0.021 0.021 0.009 0.010 0.030 0.036 0.008 -0.003-0.4 0.034 0.049 0.052 0.048 0.128 0.141 0.164 0.159 0.037 0.046 0.060 0.046 0.142 0.152 0.160 0.167-0.2 0.038 0.041 0.041 0.041 0.131 0.134 0.145 0.148 0.035 0.038 0.047 0.041 0.133 0.139 0.149 0.153MSE 0 0.037 0.042 0.046 0.047 0.195 0.205 0.225 0.236 0.037 0.040 0.046 0.046 0.188 0.195 0.228 0.2410.2 0.040 0.045 0.056 0.054 0.213 0.226 0.267 0.276 0.038 0.041 0.053 0.052 0.230 0.240 0.278 0.2930.4 0.039 0.048 0.058 0.057 0.166 0.172 0.205 0.211 0.045 0.054 0.066 0.065 0.165 0.168 0.207 0.225 Table 1.2
The distribution of √ nh [ b τ ( x ) − τ ( x )] for model 2 n=200 n=500OR PR SR NR N S P O OR PR SR NR N S P O h = 0 . n − / , h = 0 . n − / , h = 0 . n − / -0.4 0.384 0.386 0.391 0.409 0.950 1.103 1.101 1.098 0.341 0.348 0.356 0.376 0.959 1.066 1.061 1.079-0.2 0.389 0.395 0.399 0.430 0.968 1.100 1.106 1.114 0.360 0.362 0.365 0.396 0.962 1.091 1.088 1.099SD 0 0.386 0.388 0.393 0.419 1.000 1.165 1.141 1.136 0.373 0.376 0.380 0.397 0.940 1.077 1.053 1.0640.2 0.379 0.378 0.381 0.406 1.011 1.175 1.122 1.116 0.357 0.361 0.368 0.398 0.998 1.140 1.120 1.1210.4 0.384 0.390 0.412 0.435 1.011 1.150 1.105 1.103 0.390 0.394 0.413 0.438 1.045 1.182 1.129 1.157-0.4 0.010 0.010 0.059 0.031 -0.667 -0.063 0.081 0.089 0.020 0.018 0.108 0.028 -1.033 -0.118 0.032 0.015-0.2 -0.017 -0.017 0.012 0.003 -0.740 -0.158 -0.008 0.011 -0.005 -0.007 0.033 -0.004 -1.078 -0.151 -0.023 -0.036BIAS 0 -0.013 -0.015 -0.022 -0.017 -0.751 -0.143 -0.025 -0.009 0.004 0.002 -0.003 0.028 -0.996 -0.082 0.050 0.0520.2 -0.002 -0.002 -0.023 0.013 -0.650 0.004 0.058 0.078 -0.011 -0.013 -0.068 -0.031 -0.968 -0.008 0.028 0.0350.4 0.060 0.058 0.013 0.041 -0.566 0.095 0.103 0.104 0.005 0.004 -0.067 -0.007 -0.892 0.120 0.020 0.019-0.4 0.148 0.149 0.156 0.168 1.348 1.220 1.218 1.213 0.117 0.121 0.139 0.142 1.987 1.150 1.127 1.165-0.2 0.152 0.157 0.160 0.185 1.483 1.234 1.222 1.240 0.129 0.131 0.134 0.157 2.087 1.213 1.184 1.209MSE 0 0.149 0.151 0.155 0.176 1.564 1.377 1.303 1.291 0.139 0.142 0.145 0.158 1.876 1.166 1.111 1.1360.2 0.143 0.143 0.146 0.165 1.445 1.380 1.262 1.252 0.128 0.130 0.140 0.160 1.932 1.300 1.256 1.2580.4 0.151 0.156 0.170 0.191 1.342 1.332 1.232 1.226 0.152 0.155 0.175 0.192 1.888 1.411 1.275 1.339 h = 0 . n − / , h = 0 . n − / , h = 0 . 
n − / -0.4 0.368 0.374 0.376 0.397 1.003 1.204 1.140 1.161 0.346 0.348 0.358 0.372 0.945 1.093 1.077 1.102-0.2 0.399 0.397 0.407 0.443 1.011 1.192 1.183 1.177 0.368 0.369 0.371 0.394 0.889 1.030 1.028 1.039SD 0 0.389 0.390 0.392 0.413 1.029 1.192 1.188 1.197 0.362 0.364 0.373 0.408 0.966 1.101 1.077 1.0990.2 0.387 0.387 0.395 0.411 1.048 1.254 1.207 1.198 0.328 0.330 0.333 0.376 0.966 1.097 1.078 1.1040.4 0.391 0.398 0.420 0.432 1.041 1.202 1.131 1.147 0.370 0.377 0.390 0.391 1.019 1.172 1.089 1.114-0.4 0.023 0.027 0.079 0.019 -0.811 -0.173 -0.012 -0.030 -0.023 -0.020 0.070 -0.015 -1.169 -0.194 -0.027 -0.018-0.2 0.003 0.005 0.033 0.015 -0.754 -0.101 0.052 0.049 0.005 0.007 0.046 0.002 -1.101 -0.141 0.039 0.050BIAS 0 0.008 0.009 0.011 0.019 -0.781 -0.109 -0.014 -0.005 0.000 0.001 -0.007 0.001 -1.103 -0.121 0.013 0.0210.2 0.027 0.025 -0.006 0.031 -0.653 0.054 0.122 0.119 0.003 0.003 -0.046 0.016 -1.060 -0.011 0.054 0.0520.4 0.023 0.020 -0.025 0.018 -0.588 0.124 0.157 0.128 0.014 0.013 -0.058 0.008 -0.986 0.057 0.027 0.021-0.4 0.136 0.141 0.147 0.158 1.664 1.479 1.300 1.348 0.120 0.121 0.133 0.139 2.258 1.232 1.161 1.215-0.2 0.159 0.158 0.166 0.196 1.592 1.431 1.402 1.388 0.135 0.136 0.139 0.155 2.002 1.081 1.058 1.083MSE 0 0.151 0.152 0.154 0.171 1.668 1.433 1.412 1.433 0.131 0.133 0.139 0.166 2.150 1.227 1.159 1.2070.2 0.150 0.151 0.156 0.170 1.525 1.577 1.473 1.450 0.108 0.109 0.113 0.141 2.057 1.203 1.165 1.2220.4 0.154 0.159 0.177 0.187 1.429 1.460 1.304 1.331 0.137 0.142 0.155 0.153 2.011 1.376 1.187 1.241 h = 0 . n − / , h = 0 . n − / , h = 0 . 
n − / -0.4 0.364 0.370 0.379 0.409 0.997 1.172 1.140 1.164 0.358 0.363 0.370 0.389 0.970 1.134 1.109 1.140-0.2 0.388 0.392 0.406 0.436 1.042 1.225 1.227 1.230 0.364 0.362 0.365 0.408 0.901 1.086 1.054 1.054SD 0 0.396 0.398 0.413 0.446 0.992 1.180 1.166 1.161 0.371 0.374 0.382 0.417 0.919 1.113 1.077 1.0840.2 0.388 0.389 0.397 0.436 1.029 1.254 1.161 1.182 0.369 0.370 0.374 0.389 1.021 1.199 1.148 1.1680.4 0.375 0.379 0.403 0.430 1.151 1.360 1.261 1.280 0.364 0.370 0.386 0.409 1.049 1.243 1.132 1.160-0.4 -0.001 0.002 0.047 0.008 -0.838 -0.202 -0.010 -0.019 -0.013 -0.010 0.087 0.000 -1.255 -0.212 -0.034 -0.021-0.2 0.008 0.012 0.038 0.020 -0.872 -0.245 -0.067 -0.058 0.021 0.022 0.061 0.018 -1.145 -0.086 0.120 0.121BIAS 0 0.022 0.023 0.023 0.032 -0.850 -0.196 -0.036 -0.030 -0.001 -0.003 -0.014 -0.005 -1.265 -0.190 -0.023 -0.0240.2 -0.007 -0.007 -0.036 -0.014 -0.839 -0.140 -0.053 -0.042 0.007 0.005 -0.047 -0.001 -1.213 -0.103 0.006 -0.0070.4 0.011 0.007 -0.036 0.000 -0.759 -0.075 -0.013 -0.021 -0.005 -0.009 -0.082 -0.013 -1.191 -0.073 -0.075 -0.103-0.4 0.133 0.137 0.146 0.167 1.695 1.414 1.299 1.355 0.129 0.132 0.144 0.151 2.517 1.330 1.230 1.300-0.2 0.150 0.154 0.166 0.191 1.846 1.562 1.510 1.515 0.133 0.132 0.137 0.167 2.121 1.187 1.125 1.126MSE 0 0.158 0.159 0.171 0.200 1.706 1.431 1.360 1.350 0.138 0.140 0.146 0.174 2.445 1.275 1.161 1.1760.2 0.150 0.151 0.159 0.190 1.764 1.592 1.351 1.400 0.136 0.137 0.142 0.151 2.512 1.447 1.319 1.3630.4 0.141 0.144 0.164 0.185 1.901 1.856 1.590 1.638 0.132 0.137 0.156 0.168 2.519 1.551 1.286 1.356 Table 1.3
The distribution of √ nh [ b τ ( x ) − τ ( x )] for model 3 n=200 n=500OR PR SR NR N S P O OR PR SR NR N S P O h = 0 . n − / , h = 0 . n − / , h = 0 . n − / -0.4 0.320 0.321 0.328 0.319 0.516 0.532 0.570 0.548 0.306 0.310 0.315 0.307 0.502 0.501 0.501 0.499-0.2 0.341 0.343 0.352 0.345 0.477 0.477 0.514 0.526 0.298 0.301 0.304 0.292 0.476 0.472 0.497 0.501SD 0 0.301 0.306 0.312 0.304 0.450 0.458 0.487 0.495 0.292 0.296 0.299 0.290 0.484 0.466 0.512 0.5140.2 0.320 0.320 0.322 0.313 0.493 0.486 0.521 0.514 0.296 0.298 0.304 0.296 0.470 0.455 0.489 0.4880.4 0.306 0.314 0.319 0.312 0.501 0.525 0.534 0.530 0.301 0.305 0.308 0.296 0.473 0.477 0.483 0.491-0.4 -0.023 -0.025 -0.028 -0.044 -0.038 -0.029 0.001 0.003 0.026 0.027 0.025 0.005 0.027 0.021 0.050 0.054-0.2 0.026 0.022 0.020 0.035 0.004 0.009 -0.006 -0.009 -0.006 -0.006 -0.003 0.010 0.013 0.022 0.003 0.004BIAS 0 0.003 -0.001 0.011 0.035 0.048 0.051 0.019 0.022 0.010 0.011 0.013 0.042 0.020 0.043 -0.014 -0.0150.2 0.003 0.000 0.001 0.014 0.015 0.026 0.008 0.010 -0.012 -0.011 -0.010 0.009 0.033 0.044 0.023 0.0220.4 -0.004 -0.006 -0.011 -0.018 -0.023 -0.012 0.011 0.013 0.001 0.001 -0.004 -0.023 -0.044 -0.049 -0.010 -0.008-0.4 0.103 0.104 0.109 0.103 0.267 0.284 0.324 0.301 0.094 0.097 0.100 0.094 0.252 0.252 0.253 0.252-0.2 0.117 0.118 0.124 0.120 0.227 0.227 0.265 0.277 0.089 0.091 0.092 0.085 0.227 0.223 0.247 0.251MSE 0 0.091 0.094 0.098 0.094 0.205 0.212 0.237 0.246 0.085 0.088 0.090 0.086 0.234 0.219 0.262 0.2650.2 0.102 0.102 0.104 0.098 0.244 0.237 0.272 0.264 0.088 0.089 0.093 0.088 0.222 0.209 0.240 0.2390.4 0.093 0.098 0.102 0.097 0.251 0.276 0.285 0.281 0.091 0.093 0.095 0.088 0.226 0.230 0.234 0.241 h = 0 . n − / , h = 0 . n − / , h = 0 . 
n − / -0.4 0.297 0.301 0.313 0.306 0.465 0.480 0.502 0.506 0.285 0.290 0.296 0.289 0.497 0.512 0.507 0.513-0.2 0.311 0.313 0.319 0.314 0.423 0.428 0.467 0.462 0.303 0.307 0.309 0.300 0.471 0.475 0.482 0.478SD 0 0.316 0.320 0.322 0.322 0.470 0.465 0.532 0.522 0.321 0.325 0.331 0.325 0.483 0.487 0.521 0.5210.2 0.318 0.323 0.328 0.323 0.460 0.462 0.501 0.502 0.291 0.298 0.301 0.297 0.468 0.471 0.484 0.4850.4 0.301 0.305 0.306 0.305 0.489 0.493 0.530 0.516 0.310 0.311 0.312 0.307 0.518 0.536 0.524 0.522-0.4 0.003 0.002 -0.001 -0.013 -0.019 -0.004 0.024 0.025 -0.003 -0.004 -0.007 -0.025 -0.043 -0.025 -0.018 -0.015-0.2 -0.025 -0.023 -0.023 -0.013 -0.023 -0.024 -0.043 -0.044 0.011 0.011 0.015 0.029 0.007 0.016 -0.004 -0.004BIAS 0 0.001 0.003 0.009 0.028 0.019 0.028 -0.009 -0.014 0.014 0.015 0.023 0.051 0.050 0.059 0.012 0.0180.2 0.008 0.009 0.017 0.025 0.024 0.029 0.017 0.011 0.010 0.011 0.015 0.026 0.019 0.032 0.012 0.0110.4 -0.010 -0.009 -0.014 -0.025 -0.055 -0.048 -0.010 -0.012 -0.004 -0.004 -0.010 -0.025 -0.034 -0.013 -0.009 -0.008-0.4 0.088 0.090 0.098 0.094 0.217 0.230 0.253 0.257 0.081 0.084 0.088 0.084 0.248 0.263 0.257 0.264-0.2 0.097 0.099 0.102 0.099 0.179 0.183 0.220 0.216 0.092 0.094 0.095 0.091 0.222 0.226 0.232 0.229MSE 0 0.100 0.103 0.104 0.105 0.221 0.217 0.283 0.272 0.103 0.106 0.110 0.108 0.236 0.241 0.271 0.2720.2 0.101 0.104 0.108 0.105 0.212 0.215 0.251 0.252 0.085 0.089 0.091 0.089 0.220 0.223 0.234 0.2350.4 0.091 0.093 0.094 0.094 0.243 0.246 0.281 0.267 0.096 0.097 0.097 0.095 0.270 0.288 0.275 0.273 h = 0 . n − / , h = 0 . n − / , h = 0 . n − /4