Outcome regression-based estimation of conditional average treatment effect
Lu Li, Niwen Zhou, and Lixing Zhu*

School of Finance and Statistics, East China Normal University, Shanghai 200241, China
School of Statistics, Beijing Normal University, Beijing 100875, China
Department of Mathematics, Hong Kong Baptist University, Kowloon Tong 999077, Hong Kong, China
Abstract
This paper provides a systematic investigation of the following issues. First, we construct different outcome regression-based estimators of the conditional average treatment effect under, respectively, a true (oracle), parametric, nonparametric and semiparametric dimension reduction structure. Second, based on the corresponding asymptotic variance functions, we answer the following questions when the models are correctly specified: what is the asymptotic efficiency ranking of the four estimators in general? How is the efficiency related to the affiliation of the given covariates to the set of arguments of the regression functions? What roles do the bandwidth and kernel function selections play in the estimation efficiency? And in which scenarios should the estimator under the semiparametric dimension reduction regression structure be used in practice? As a by-product, the results show that any outcome regression-based estimation is asymptotically more efficient than any inverse probability weighting-based estimation. Together, these results give a relatively complete picture of outcome regression-based estimation, so that the theoretical conclusions can provide guidance for practical use when more than one estimation can be applied to the same problem. Several simulation studies are conducted to examine the performance of these estimators in finite samples, and a real dataset is analyzed for illustration.
Keywords: Asymptotic variance; Conditional average treatment effect; Regression causal effect; Sufficient dimension reduction.

* Corresponding author: [email protected]. The first two authors are co-first authors. The research was supported by a grant from The University Grants Council of Hong Kong.

Introduction
Causal inference has been widely applied for decades to analyse treatment effects based on observational studies, in which treatments are assigned to individuals in a non-random fashion. In this paper, we consider causal inference under the potential outcome framework (Rubin, 1974; Rosenbaum and Rubin, 1983), where the treatment is binary and the outcome variable in the hypothetical complete data set has two components $(Y(1), Y(0))$, in which $Y(1)$ is the potential outcome if the individual receives treatment and $Y(0)$ is the corresponding potential outcome without treatment. As we can only observe one of $Y(1)$ and $Y(0)$, a commonly used method is to impute a reasonable value in lieu of the missing one, such as linear regression imputation (Healy and Westmacott, 1956), kernel regression imputation (Cheng, 1994) and ratio imputation (Rao, 1996).

In this paper, we consider the average treatment effect (ATE) conditional on some covariates, to explore the heterogeneity of the ATE (Rosenbaum and Rubin, 1983, 1985). Let $X \in \mathbb{R}^p$ be a set of covariates that collects an individual's personal information and $X_1 \in \mathbb{R}^k$ be a subvector of $X$, $1 \le k < p$. The conditional average treatment effect (CATE, hereafter) is defined as $E(Y(1) - Y(0) \mid X_1)$. To estimate this function, Abrevaya et al. (2015) proposed estimators based on the inverse probability weighting (IPW, hereafter) method and concluded that, according to the asymptotic variance functions, the estimator with nonparametrically estimated propensity score (NCATE) is asymptotically more efficient than the one with parametrically estimated propensity score (PCATE). This conclusion is similar to those in Hahn (1998) and Hirano et al. (2003) for the IPW estimators of the ATE. Moreover, PCATE is proved to be asymptotically equivalent to the estimator with true propensity score (OCATE).
This is very different from the unconditional ATE. Zhou and Zhu (2020)* proposed an estimator with semiparametrically estimated propensity score (SCATE) and gave a more detailed analysis of the asymptotic efficiency of NCATE and SCATE.

As is well known, for the ATE, outcome regression-based estimation is already a popularly used methodology; thus, methodologically, the research in this aspect is not new. However, for the CATE, the problem becomes more complicated, as it involves double conditional expectations: on the full set $X$ of covariates, or on the subset $\beta^\top X$ if the curse of dimensionality is handled within a dimension reduction framework with projection matrix $\beta$, and then on the subvector $X_1$. Three relevant references are Luo et al. (2017), Luo et al. (2019) and Ma et al. (2019). To focus on the estimation efficiency issue, we do not give details in this paper about how to carry out dimension reduction and feature selection, but only consider the general setting in which a dimension reduction structure is assumed to exist. We then carry out a systematic investigation of the asymptotic properties of the estimators to answer the following questions when the model is correctly specified in the parametric case.

* Zhou, N. W. & Zhu, L. X. (2020). On IPW-based estimation of conditional average treatment effect. Submitted.
Q1. When the CATE is estimated under the nonparametric, semiparametric, parametric and true (oracle) regression structures, what is the ranking of the asymptotic efficiencies of these estimators?

Q2. Note that the CATE is a function of $X_1$, while the set of arguments of the regression function, say $\tilde X$, is not necessarily the full $X$, and thus $X_1$ is not necessarily a strict subset of $\tilde X$. Could the affiliation of $X_1$ to $\tilde X$ affect the asymptotic efficiency of the different estimators? This issue is unique to the CATE and is particularly important under the semiparametric dimension reduction framework, as the regression function would be a function of $\tilde X = \beta^\top X$, where $\beta$ is a $p \times r$ matrix with $r \ll p$ in high-dimensional scenarios.

Q3. As all estimators involve nonparametric estimation of the conditional expectations, how do the bandwidth and kernel function affect the efficiency? This study is particularly necessary.

Q4. Compared with the IPW-based estimation, what efficiency ranking can be concluded?

We will briefly discuss misspecified cases, global or local, in Section 5; they will be investigated in the near future but are not touched upon in this paper.

Note that the CATE is
\[
\tau(x_1) = E\{Y(1) - Y(0) \mid X_1 = x_1\} = E\{E(Y(1) - Y(0) \mid X) \mid X_1 = x_1\},
\]
where $E(Y(1) - Y(0) \mid X)$ is the treatment effect heterogeneity. In this paper we are interested in estimating $\tau(x_1)$ under the unconfoundedness assumption. To answer the above four questions, we propose four estimators according to whether $m_1(X) - m_0(X) = E(Y(1) - Y(0) \mid X)$ is a completely known function (ORCATE), a parametric function (PRCATE) ($m_1(X) = m_1(X, \theta_1)$ and $m_0(X) = m_0(X, \theta_0)$), a semiparametric function with dimension reduction structure (SRCATE) ($m_1(X) = m_1(\beta_1^\top X)$ and $m_0(X) = m_0(\beta_0^\top X)$), or a nonparametric function (NRCATE). The details are given in Section 2.
We derive the asymptotically linear representations and asymptotic normality of these estimators in various scenarios and, according to the asymptotic variance functions and using the estimators with true regression function/propensity score as the benchmark, we obtain the following results, which give a relatively complete picture of the asymptotic efficiencies of the four estimation methods. These newly derived results show that the estimated CATEs have rather different asymptotic behaviors from the estimated ATEs. Let $A \preceq B$ mean that method A has a smaller asymptotic variance function than method B, and let $A \cong B$ stand for their asymptotic equivalence, i.e. equal asymptotic variance functions. The results are summarised as follows.

A1. This answers Q1 and Q4. In general, the ranking of the asymptotic efficiencies of the estimators, together with the results about the IPW-based estimators in Abrevaya et al. (2015) and Zhou and Zhu (2020), is
\[
\underbrace{\text{ORCATE} \cong \text{PRCATE} \preceq \text{SRCATE} \preceq \text{NRCATE}}_{\text{regression-based CATE estimators}} \preceq \underbrace{\text{NCATE} \preceq \text{SCATE} \preceq \text{PCATE} \cong \text{OCATE}}_{\text{IPW-based CATE estimators}}.
\]

A2. For Q2, we have the following results, which show the importance of the affiliation of $X_1$ to $\tilde X$. Under the semiparametric dimension reduction structure, when $X_1 \subset \beta_1^\top X \cap \beta_0^\top X$, or $X_1$ is contained in only one of the sets $\beta_1^\top X$ and $\beta_0^\top X$,
\[
\text{ORCATE} \cong \text{PRCATE} \preceq \text{SRCATE},
\]
while when $X_1$ is not fully included in either $\beta_1^\top X$ or $\beta_0^\top X$, we have
\[
\text{ORCATE} \cong \text{PRCATE} \cong \text{SRCATE}.
\]
Some more results are included in Section 2, as are similar results about NRCATE and more detailed comparisons.

A3. This answers Q3. When the CATE functions are sufficiently smooth, and the bandwidth and kernel function are delicately selected, the following asymptotic equivalence among the regression-based estimators can be achieved:
\[
\text{ORCATE} \cong \text{PRCATE} \cong \text{SRCATE} \cong \text{NRCATE}.
\]

A4. In high-dimensional scenarios, semiparametric-based estimation is often preferable because it can greatly alleviate the curse of dimensionality and also avoid model misspecification. Some more detailed studies and comparisons of the asymptotic efficiency are contained in Section 2. The numerical studies in Section 3 support this observation.

The rest of this article is organized as follows. Section 2 introduces the CATE function and gives the estimators under the true, parametric, nonparametric and semiparametric frameworks, respectively; the asymptotic properties of the proposed estimators are systematically investigated there. Section 3 presents simulation studies examining the performance of the estimators. Section 4 is devoted to the analysis of a real data example. Conclusions and some further research problems are briefly discussed in Section 5. For ease of presentation, we defer all technical proofs to the Appendix.

Estimations and their asymptotic properties
Let $D$ be a dummy variable indicating treatment status, with $D = 1$ if an individual receives treatment and $D = 0$ otherwise. In practice we only observe $D$, $X$ and $Y \equiv D \cdot Y(1) + (1 - D) \cdot Y(0)$. The propensity score $P(D = 1 \mid X)$ is denoted by $p(X)$. Let $\{X_i, Y_i, D_i\}$, $i = 1, \ldots, n$, be $n$ independent copies of $(X, Y, D)$. To estimate $\tau(x_1)$, we suggest a two-step estimation procedure when both $m_1$ and $m_0$ are unknown. Four estimators are proposed according to whether the regression causal effect has a true (oracle), parametric, nonparametric or semiparametric dimension reduction structure (ORCATE, PRCATE, NRCATE and SRCATE, respectively).

To clearly state the estimation procedures, recall that the function $m_t(X)$ is defined as
\[
m_t(X) = E(Y(t) \mid X), \quad t = 0, 1.
\]
Under the unconfoundedness assumption, that is, the conditional independence $(Y(0), Y(1)) \perp D \mid X$, we first estimate $m_1(X) - m_0(X)$ and then its conditional expectation $\tau(x_1) = E\{m_1(X) - m_0(X) \mid X_1 = x_1\}$. Under the semiparametric dimension reduction structure, the unconfoundedness assumption takes a different form, which will be specified later in this section. Directly estimating $\tau(x_1)$ in terms of $Y(1) - Y(0)$ is not feasible, as this difference is never observed. It is natural to use the observed outcomes of the treated and control groups to estimate $m_1(X)$ and $m_0(X)$ separately; afterwards, $\tau(x_1)$ can be estimated by a nonparametric method such as the Nadaraya-Watson (N-W) estimation (Nadaraya, 1964; Watson, 1964).

For SRCATE and NRCATE we will have to use higher-order kernel functions, so we give the notation here. A function $K: \mathbb{R}^k \to \mathbb{R}$ is a kernel of order $s$ if it integrates to one over $\mathbb{R}^k$,
\[
\int u_1^{p_1} \cdots u_k^{p_k} K(u)\, du = 0
\]
for all nonnegative integers $p_1, \ldots, p_k$ such that $1 \le \sum_{i=1}^k p_i < s$, and the integral is nonzero for some $p_1, \ldots, p_k$ with $\sum_{i=1}^k p_i = s$. Some regularity conditions are listed below.

(C1). (Strong ignorability)
(a) (Unconfoundedness) $(Y(0), Y(1)) \perp D \mid X$.
(b) (Common support) For some small $c > 0$, the propensity score function $p(\cdot)$ satisfies $c < p(X) < 1 - c$.

(C2). (Distribution of $X$) The support $\mathcal{X}$ of the $p$-dimensional covariate $X$ is a Cartesian product of compact intervals, and the density of $X$, $f(x)$, is bounded away from 0 on $\mathcal{X}$.

(C3). (Kernel functions) $K(u)$ is a kernel of order $s_1$ that is symmetric around zero and $s^*$ times continuously differentiable.

(C4). (Distribution of $X_1$) The density function of $X_1$, $f_1(x_1)$, is bounded away from zero and infinity, and $s_1 \ge 2$. Furthermore, the value $s^*$ relies on the smoothness of the regression function: more specifically, $s^* \ge s_1$, while $s^* \ge s_2$ and $s^* \ge s_3$ are required in the nonparametric and semiparametric situations, respectively.

In the following, we study the four estimations in separate subsections and give some further analysis of SRCATE and NRCATE in another subsection.

The first estimator serves as a benchmark against which to examine the performance of the other estimators developed and investigated later. Assume that $m_1(X) - m_0(X)$ is completely known, with no need of estimation. Then ORCATE can be written as
\[
\hat\tau(x_1) = \frac{\frac{1}{n h_1^k} \sum_{i=1}^n K\!\left(\frac{X_{1i} - x_1}{h_1}\right)\{m_1(X_i) - m_0(X_i)\}}{\frac{1}{n h_1^k} \sum_{i=1}^n K\!\left(\frac{X_{1i} - x_1}{h_1}\right)}. \tag{1}
\]
The asymptotically linear representation and asymptotic normality are stated below.

Theorem 1.
Suppose that assumptions (C1) through (C4) are satisfied. Then, when the regression causal effect is given without estimation, for each point $x_1$ in the support of $X_1$, we have
\[
\sqrt{n h_1^k}\,\{\hat\tau(x_1) - \tau(x_1)\} = \frac{1}{\sqrt{n h_1^k}\, f_1(x_1)} \sum_{i=1}^n \{m_1(X_i) - m_0(X_i) - \tau(x_1)\}\, K\!\left(\frac{X_{1i} - x_1}{h_1}\right) + o_p(1),
\]
and then
\[
\sqrt{n h_1^k}\,\{\hat\tau(x_1) - \tau(x_1)\} \stackrel{d}{\longrightarrow} N\!\left(0, \frac{\|K\|_2^2\, \sigma_O^2(x_1)}{f_1(x_1)}\right),
\]
where $\|K\|_2 = \{\int K^2(u)\, du\}^{1/2}$ and $\sigma_O^2(x_1) = E[\{m_1(X) - m_0(X) - \tau(x_1)\}^2 \mid X_1 = x_1]$.

Suppose now that both $m_1(X)$ and $m_0(X)$ have parametric structures with unknown parameters $\alpha_1$ and $\alpha_0$, respectively; that is, $m_t(X, \alpha_t)$ are parametric functions for $t = 0, 1$. To estimate $\alpha_1$ and $\alpha_0$, we use a method similar to that of Wang et al. (2004). Write, for $i = 1, \ldots, n$,
\[
D_i Y_i = D_i\, m_1(X_i, \alpha_1) + D_i\, \epsilon_{1i}, \qquad (1 - D_i) Y_i = (1 - D_i)\, m_0(X_i, \alpha_0) + (1 - D_i)\, \epsilon_{0i},
\]
where $\epsilon_{ti}$, $t = 0, 1$, are random error terms independent of $X_i$, $i = 1, \ldots, n$. We use weighted least squares to estimate $\alpha_t$ for $t = 0, 1$ (see Matloff, 1981), and write the resulting estimators as $\hat\alpha_t$ and $\hat m_t(X) = m_t(X, \hat\alpha_t)$. PRCATE is then defined as
\[
\hat\tau(x_1) = \frac{\frac{1}{n h_1^k} \sum_{i=1}^n K\!\left(\frac{X_{1i} - x_1}{h_1}\right)\{\hat m_1(X_i) - \hat m_0(X_i)\}}{\frac{1}{n h_1^k} \sum_{i=1}^n K\!\left(\frac{X_{1i} - x_1}{h_1}\right)}, \tag{2}
\]
where $\hat m_1(X_i) = m_1(X_i, \hat\alpha_1)$ and $\hat m_0(X_i) = m_0(X_i, \hat\alpha_0)$, $i = 1, \ldots, n$. Assume the following additional condition:

(A1). (Bandwidths) $h_1 \to 0$, $n h_1^k \to \infty$ and $n h_1^{2 s_1 + k} \to 0$, which removes the asymptotic bias of $\hat\tau(x_1)$.

Theorem 2.
Suppose that conditions (C1) through (C4) and (A1) are satisfied with $s^* = s_1 + 2$. Then, for each point $x_1$ in the support of $X_1$, we have
\[
\sqrt{n h_1^k}\,\{\hat\tau(x_1) - \tau(x_1)\} = \frac{1}{\sqrt{n h_1^k}\, f_1(x_1)} \sum_{i=1}^n \{m_1(X_i) - m_0(X_i) - \tau(x_1)\}\, K\!\left(\frac{X_{1i} - x_1}{h_1}\right) + o_p(1) \stackrel{d}{\longrightarrow} N\!\left(0, \frac{\|K\|_2^2\, \sigma_P^2(x_1)}{f_1(x_1)}\right),
\]
where $\sigma_P^2(x_1) = \sigma_O^2(x_1) = E[\{m_1(X) - m_0(X) - \tau(x_1)\}^2 \mid X_1 = x_1]$.

Remark 1.
This theorem states the asymptotic equivalence between PRCATE and OR-CATE in the sense that their asymptotic variance functions are identical.
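As a concrete illustration of the two-step construction in (2), the following Python sketch fits outcome models on the treated and control groups and then smooths the fitted difference against $X_1$. It is only a minimal sketch: the linear specification of $m_t(X, \alpha_t)$, the Gaussian kernel and the use of ordinary (rather than weighted) least squares are simplifying assumptions, not the paper's exact procedure.

```python
import numpy as np

def nw_smooth(x1_grid, X1, values, h):
    """Second-step Nadaraya-Watson smoother of `values` against the
    conditioning covariate X1, evaluated at each point of x1_grid."""
    out = np.empty(len(x1_grid))
    for j, x in enumerate(x1_grid):
        w = np.exp(-0.5 * ((X1 - x) / h) ** 2)  # Gaussian kernel weights
        out[j] = np.sum(w * values) / np.sum(w)
    return out

def prcate(x1_grid, X, X1, Y, D, h):
    """Two-step CATE estimator with linear (parametric) outcome regressions.

    Step 1: fit m_1 by least squares on the treated group and m_0 on the
            control group.
    Step 2: smooth the fitted difference m_1(X_i) - m_0(X_i) against X1.
    """
    Z = np.column_stack([np.ones(len(Y)), X])  # design matrix with intercept
    a1, *_ = np.linalg.lstsq(Z[D == 1], Y[D == 1], rcond=None)
    a0, *_ = np.linalg.lstsq(Z[D == 0], Y[D == 0], rcond=None)
    diff = Z @ a1 - Z @ a0                     # fitted m_1 - m_0 at each X_i
    return nw_smooth(x1_grid, X1, diff, h)
```

Because $\hat\alpha_t$ converges at the parametric $\sqrt{n}$ rate, faster than the nonparametric smoothing rate of the second step, such an estimator is first-order equivalent to the oracle version (1), as Theorem 2 and Remark 1 state.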
If we do not have prior information on the structures of $m_1(X)$ and $m_0(X)$, or we wish to avoid model misspecification, nonparametric estimation is feasible. Similarly, we estimate $m_1(X)$ and $m_0(X)$ separately. NRCATE is then written as
\[
\hat\tau(x_1) = \frac{\frac{1}{n h_1^k} \sum_{i=1}^n K\!\left(\frac{X_{1i} - x_1}{h_1}\right)\{\hat m_1(X_i) - \hat m_0(X_i)\}}{\frac{1}{n h_1^k} \sum_{i=1}^n K\!\left(\frac{X_{1i} - x_1}{h_1}\right)}, \tag{3}
\]
where
\[
\hat m_1(X_i) = \frac{\frac{1}{n h_2^p} \sum_{j=1}^n K_2\!\left(\frac{X_j - X_i}{h_2}\right) Y_j\, I(D_j = 1)}{\frac{1}{n h_2^p} \sum_{j=1}^n K_2\!\left(\frac{X_j - X_i}{h_2}\right) I(D_j = 1)}, \qquad
\hat m_0(X_i) = \frac{\frac{1}{n h_2^p} \sum_{j=1}^n K_2\!\left(\frac{X_j - X_i}{h_2}\right) Y_j\, I(D_j = 0)}{\frac{1}{n h_2^p} \sum_{j=1}^n K_2\!\left(\frac{X_j - X_i}{h_2}\right) I(D_j = 0)}.
\]
To study the asymptotic properties of $\hat\tau(x_1)$, we impose some further conditions on the kernel function and bandwidths.

(A2). $K_2(u)$ is a kernel of order $s_2$, symmetric around zero, equal to zero outside $\prod_{i=1}^p [-1, 1]$, with continuous $(s_2 + 1)$th-order derivatives.

(A3). (Bandwidths) $h_2 \to 0$ and $\log n / (n h_2^{p}) \to 0$.

(A4). $s_2 \ge p$; $m_1(\cdot)$ and $m_0(\cdot)$ are $s_2$ times continuously differentiable; and $h_2^{s_2} h_1^{-s_1 - k} \to 0$ and $n h_1^k h_2^{2 s_2} \to 0$, to ensure the asymptotic normality.

The following theorem states the main theoretical results for NRCATE. For convenience, define the function
\[
\Psi_0(X, Y, D) := \frac{D\{Y - m_1(X)\}}{p(X)} - \frac{(1 - D)\{Y - m_0(X)\}}{1 - p(X)} + m_1(X) - m_0(X).
\]

Theorem 3.
Suppose that conditions (C1) through (C4) and (A1) through (A4) are satisfied for $s^* \ge s_2 \ge p$. Then, for each point $x_1$, we have
\[
\sqrt{n h_1^k}\,(\hat\tau(x_1) - \tau(x_1)) = \frac{1}{\sqrt{n h_1^k}\, f_1(x_1)} \sum_{i=1}^n \{\Psi_0(X_i, Y_i, D_i) - \tau(x_1)\}\, K\!\left(\frac{X_{1i} - x_1}{h_1}\right) + o_p(1) \stackrel{d}{\longrightarrow} N\!\left(0, \frac{\|K\|_2^2\, \sigma_N^2(x_1)}{f_1(x_1)}\right),
\]
where
\[
\sigma_N^2(x_1) = E[\{\Psi_0(X, Y, D) - \tau(x_1)\}^2 \mid X_1 = x_1] = \sigma_P^2(x_1) + E\!\left\{\frac{\mathrm{var}(Y(1) \mid X)}{p(X)} + \frac{\mathrm{var}(Y(0) \mid X)}{1 - p(X)} \,\Big|\, X_1 = x_1\right\} \ge \sigma_P^2(x_1) = \sigma_O^2(x_1).
\]
The equality holds if and only if $\mathrm{var}(Y(1) \mid X)/p(X) = 0$ and $\mathrm{var}(Y(0) \mid X)/(1 - p(X)) = 0$, which rarely happens. Thus, the inequality shows that NRCATE is asymptotically less efficient than PRCATE and ORCATE.

An obvious limitation of NRCATE is its inability to handle models with high-dimensional covariates $X$ in practice. Therefore, how to alleviate the curse of dimensionality is an important issue, and reducing the dimensionality is a natural idea. We restrict ourselves to the sufficient dimension reduction framework below and use existing methods to estimate the projection directions, as the focus of this paper is on the asymptotics of the estimations assuming the dimension reduction structure is specified in a semiparametric manner. For other dimension reduction issues, see the relevant references such as Luo et al. (2017) and Ma et al. (2019).

We first give a very brief review of sufficient dimension reduction. For a given $\beta^\top X$, where $\beta$ is a $p \times r$ orthonormal matrix with an unknown number $r \ll p$ of columns, suppose that the regression of a response variable $W$ satisfies $E(W \mid X) \perp X \mid \beta^\top X$, where $\perp$ stands for independence. It is generally known that $E(W \mid X)$ is then an unspecified function of $\beta^\top X$, which allows full freedom in the regression with $\beta^\top X$ being the sufficiently reduced covariates (from dimension $p$ to $r$). This structure has a dimension reduction form with unknown parameter $\beta$, and it is also very flexible, with a nonparametric nature. To identify the projection directions $\beta$, Cook and Li (2002) defined the notion of the central mean subspace: the intersection of all subspaces spanned by any $\beta$ such that the above conditional independence holds. To be specific, without notational confusion, write $\mathcal{S}_{E(Y(1)|X)}$ and $\mathcal{S}_{E(Y(0)|X)}$ for the central mean subspaces spanned respectively by $\beta_1 \in \mathbb{R}^{p \times r(1)}$ and $\beta_0 \in \mathbb{R}^{p \times r(0)}$, where $r(t) < p$ for $t = 0, 1$, such that
\[
m_1(X) \perp X \mid \beta_1^\top X, \qquad m_0(X) \perp X \mid \beta_0^\top X. \tag{4}
\]
Some approaches are available in the literature to identify $\beta_1$ and $\beta_0$. For instance, Luo et al. (2017) and Ma et al. (2019) discussed the relevant dimension reduction issues and derived the properties of ATE estimators under semiparametric structures. As the focus of this paper is on the asymptotic properties of the CATE estimations and the comparisons amongst them, we do not give the details of the estimation procedures for the dimension reduction matrices $\beta_1$ and $\beta_0$, but simply assume the root-$n$ consistency of the two estimators $\hat\beta_1$ and $\hat\beta_0$ that we can define.

Note that under this dimension reduction structure, we have $m_t(X) = E(Y(t) \mid X) = E(Y(t) \mid \beta_t^\top X) = m_t(\beta_t^\top X)$ for $t = 0, 1$. Define SRCATE as
\[
\hat\tau(x_1) = \frac{\frac{1}{n h_1^k} \sum_{i=1}^n K\!\left(\frac{X_{1i} - x_1}{h_1}\right)\{\hat m_1(\hat\beta_1^\top X_i) - \hat m_0(\hat\beta_0^\top X_i)\}}{\frac{1}{n h_1^k} \sum_{i=1}^n K\!\left(\frac{X_{1i} - x_1}{h_1}\right)}, \tag{5}
\]
where $\hat m_t(\hat\beta_t^\top X_i)$ is the N-W estimator of $m_t$ constructed as in (3) but with the reduced covariates $\hat\beta_t^\top X_j$, the kernel $K_3$ and the bandwidth $h_3$, $t = 0, 1$. In order to derive the theoretical results, we give the following conditions.

(A5). $K_3(u)$ is a kernel of order $s_3$, symmetric around zero, equal to zero outside $\prod_{i=1}^{r(t)} [-1, 1]$, with continuous $(s_3 + 1)$th-order derivatives; the density $f_t$ of $\beta_t^\top X$ is $s_3$ times continuously differentiable, and $p(\beta_t^\top X) \in (c^*, 1 - c^*)$ almost surely for some $c^* \in (0, 1/2)$, $t = 0, 1$.

(A6). (Bandwidths) $h_3 \to 0$ and $\log n / (n h_3^{\max\{r(0), r(1)\}}) \to 0$.

(A7). $s_3 \ge \max\{r(0), r(1)\}$; $m_t(\beta_t^\top \cdot)$ is $s_3$ times continuously differentiable for $t = 0, 1$; and $h_3^{s_3} h_1^{-s_1 - k} \to 0$ and $n h_1^k h_3^{2 s_3} \to 0$.

(A8). $\hat\beta_1 - \beta_1 = O_p(n^{-1/2})$ and $\hat\beta_0 - \beta_0 = O_p(n^{-1/2})$.

Since the treatment effect heterogeneity under the semiparametric structure is based on $\beta_t^\top X$ for $t = 0, 1$, Assumptions (A5) through (A7) play the same role as Assumptions (A2) through (A4). Condition (A8) often holds.

Define three functions as
\[
\Psi_1(X, Y, D) = \frac{D\{Y - m_1(X)\}}{p(\beta_1^\top X)} + m_1(X) - m_0(X),
\]
\[
\Psi_2(X, Y, D) = -\frac{(1 - D)\{Y - m_0(X)\}}{1 - p(\beta_0^\top X)} + m_1(X) - m_0(X), \tag{6}
\]
\[
\Psi_3(X, Y, D) = \frac{D\{Y - m_1(X)\}}{p(\beta_1^\top X)} - \frac{(1 - D)\{Y - m_0(X)\}}{1 - p(\beta_0^\top X)} + m_1(X) - m_0(X).
\]
Next, for ease of explanation of our theoretical results, we introduce some notation. Write $A$ and $B$ as two sets of elements and, without confusion, write $\mathrm{card}(A)$ for the cardinality of the set $A$.

(F1) $A \subset B$ stands for $A \cap B = A$; in other words, the elements of $A$ are all in $B$ and $\mathrm{card}(B) \ge \mathrm{card}(A)$.

(F2) $A \subset_{k-q} B$ stands for $A \cap B = C$ with $\mathrm{card}(C) = k - q$; that is, $k - q$ elements of $A$ belong to $B$. When $k = q$, $A$ and $B$ do not share any elements, i.e. $A \cap B = \emptyset$.

The following theorem gives a detailed investigation of the asymptotic efficiency of SRCATE.

Theorem 4.
Suppose that assumptions (C1) through (C4), (A1) and (A5) through (A8) are satisfied for $s^* \ge s_3 \ge \max\{r(0), r(1)\}$. Then, for each point $x_1$ in the support of $X_1$, recalling the definitions of $\Psi_i$, $i = 1, 2, 3$, in (6):

(1) when $X_1 \subset_{k-q} \beta_1^\top X$ and $X_1 \subset_{k-q} \beta_0^\top X$ with $s_3(2 - k/q) + k > 0$ and $0 < q \le k$, the asymptotically linear representation of $\hat\tau(x_1)$ is
\[
\sqrt{n h_1^k}\,\{\hat\tau(x_1) - \tau(x_1)\} = \frac{1}{\sqrt{n h_1^k}\, f_1(x_1)} \sum_{i=1}^n \{m_1(X_i) - m_0(X_i) - \tau(x_1)\}\, K\!\left(\frac{X_{1i} - x_1}{h_1}\right) + o_p(1),
\]
and the asymptotic distribution of $\hat\tau(x_1)$ is
\[
\sqrt{n h_1^k}\,(\hat\tau(x_1) - \tau(x_1)) \stackrel{d}{\longrightarrow} N\!\left(0, \frac{\|K\|_2^2\, \sigma_{S,1}^2(x_1)}{f_1(x_1)}\right);
\]

(2) when $X_1 \subset \beta_1^\top X$ and $X_1 \subset_{k-q} \beta_0^\top X$ with $s_3(2 - k/q) + k > 0$ and $0 < q \le k$,
\[
\sqrt{n h_1^k}\,\{\hat\tau(x_1) - \tau(x_1)\} = \frac{1}{\sqrt{n h_1^k}\, f_1(x_1)} \sum_{i=1}^n \{\Psi_1(X_i, Y_i, D_i) - \tau(x_1)\}\, K\!\left(\frac{X_{1i} - x_1}{h_1}\right) + o_p(1) \stackrel{d}{\longrightarrow} N\!\left(0, \frac{\|K\|_2^2\, \sigma_{S,2}^2(x_1)}{f_1(x_1)}\right);
\]

(3) when $X_1 \subset_{k-q} \beta_1^\top X$ and $X_1 \subset \beta_0^\top X$ with $s_3(2 - k/q) + k > 0$ and $0 < q \le k$,
\[
\sqrt{n h_1^k}\,\{\hat\tau(x_1) - \tau(x_1)\} = \frac{1}{\sqrt{n h_1^k}\, f_1(x_1)} \sum_{i=1}^n \{\Psi_2(X_i, Y_i, D_i) - \tau(x_1)\}\, K\!\left(\frac{X_{1i} - x_1}{h_1}\right) + o_p(1) \stackrel{d}{\longrightarrow} N\!\left(0, \frac{\|K\|_2^2\, \sigma_{S,3}^2(x_1)}{f_1(x_1)}\right);
\]

(4) when $X_1 \subset \beta_1^\top X$ and $X_1 \subset \beta_0^\top X$,
\[
\sqrt{n h_1^k}\,\{\hat\tau(x_1) - \tau(x_1)\} = \frac{1}{\sqrt{n h_1^k}\, f_1(x_1)} \sum_{i=1}^n \{\Psi_3(X_i, Y_i, D_i) - \tau(x_1)\}\, K\!\left(\frac{X_{1i} - x_1}{h_1}\right) + o_p(1) \stackrel{d}{\longrightarrow} N\!\left(0, \frac{\|K\|_2^2\, \sigma_{S,4}^2(x_1)}{f_1(x_1)}\right),
\]
where
\[
\sigma_{S,1}^2(x_1) = \sigma_O^2(x_1) = E[\{m_1(X) - m_0(X) - \tau(x_1)\}^2 \mid X_1 = x_1],
\]
\[
\sigma_{S,2}^2(x_1) = E[\{\Psi_1(X, Y, D) - \tau(x_1)\}^2 \mid X_1 = x_1],
\]
\[
\sigma_{S,3}^2(x_1) = E[\{\Psi_2(X, Y, D) - \tau(x_1)\}^2 \mid X_1 = x_1], \tag{7}
\]
\[
\sigma_{S,4}^2(x_1) = E[\{\Psi_3(X, Y, D) - \tau(x_1)\}^2 \mid X_1 = x_1].
\]

Remark 2.
These results imply that the asymptotic behaviour of $\hat\tau(x_1)$ relies on whether $X_1$ is a subset of $\beta_t^\top X$ for $t = 0, 1$. Note that $X_1 \subset_{k-q} \beta_t^\top X$ implies that only $k - q$ elements of $X_1$ are also among the linear combinations in $\beta_t^\top X$, $t = 0, 1$. In this case, write $\beta_t^\top X$ as $\beta_t^\top X = ((X_1, \ldots, X_{k-q})^\top, (\tilde\beta_t^\top X)^\top)^\top$ for $t = 0, 1$. Therefore, when $X_1 \subset_{k-q} \beta_t^\top X$ with $s_3(2 - k/q) + k > 0$ and $0 < q \le k$, we should determine the intersection between $X_1$ and $\beta_t^\top X$, and then estimate $\beta_t$ through estimating $\tilde\beta_t$ for $t = 0, 1$. This could be done by using partial sufficient dimension reduction (e.g. Feng et al. (2013)). As this is not the focus of this paper, we assume that $\beta_t$ can be estimated at the $1/\sqrt{n}$ rate of convergence. Obviously, the assumption $s_3(2 - k/q) + k > 0$ is satisfied for $k = 1$.

Corollary 1.
We have
\[
\sigma_{S,1}^2(x_1) = \sigma_P^2(x_1) = \sigma_O^2(x_1),
\]
\[
\sigma_{S,2}^2(x_1) = \sigma_P^2(x_1) + E\!\left\{\frac{\mathrm{var}(Y(1) \mid X)}{p(\beta_1^\top X)} \,\Big|\, X_1 = x_1\right\} \ge \sigma_P^2(x_1) = \sigma_O^2(x_1),
\]
\[
\sigma_{S,3}^2(x_1) = \sigma_P^2(x_1) + E\!\left\{\frac{\mathrm{var}(Y(0) \mid X)}{1 - p(\beta_0^\top X)} \,\Big|\, X_1 = x_1\right\} \ge \sigma_P^2(x_1) = \sigma_O^2(x_1),
\]
\[
\sigma_{S,4}^2(x_1) = \sigma_P^2(x_1) + E\!\left\{\left[\frac{\mathrm{var}(Y(1) \mid X)}{p(\beta_1^\top X)} + \frac{\mathrm{var}(Y(0) \mid X)}{1 - p(\beta_0^\top X)}\right] \Big|\, X_1 = x_1\right\} \ge \sigma_P^2(x_1) = \sigma_O^2(x_1).
\]
Assume that $\mathrm{var}(Y(t) \mid X)$ is a measurable function of $\beta_t^\top X$ for $t = 0, 1$. Then
\[
E\!\left\{\frac{\mathrm{var}(Y(1) \mid X)}{p(\beta_1^\top X)}\right\} \le E\!\left\{\frac{\mathrm{var}(Y(1) \mid X)}{p(X)}\right\} \quad \text{and} \quad E\!\left\{\frac{\mathrm{var}(Y(0) \mid X)}{1 - p(\beta_0^\top X)}\right\} \le E\!\left\{\frac{\mathrm{var}(Y(0) \mid X)}{1 - p(X)}\right\}.
\]
Then
\[
\sigma_O^2(x_1) = \sigma_P^2(x_1) \le \sigma_{S,2}^2(x_1) \le \sigma_{S,4}^2(x_1) \le \sigma_N^2(x_1), \qquad \sigma_O^2(x_1) = \sigma_P^2(x_1) \le \sigma_{S,3}^2(x_1) \le \sigma_{S,4}^2(x_1) \le \sigma_N^2(x_1). \tag{8}
\]

Remark 3.
The results in the above corollary are based on some elementary calculations and the application of Theorem 3 of Luo et al. (2017); we omit the detailed calculations. Based on these facts, SRCATE is more efficient than NRCATE in all cases, and less efficient than PRCATE and ORCATE in cases (2) to (4). In particular, SRCATE shares the same asymptotic distribution as PRCATE and ORCATE in case (1). Furthermore, SRCATE in case (4) is less efficient than in cases (2) and (3).
Inspired by Theorem 4 and the importance of the affiliation of $X_1$ to the set of arguments of the regression functions, we further investigate SRCATE and NRCATE in more general settings. The results are stated in the following.
Corollary 2.
Suppose that conditions (C1) through (C4) and (A1) through (A8) are satisfied. Assume that there is a given $\tilde X$ such that $(Y(0), Y(1)) \perp X \mid \tilde X$, with $\tilde X \subset X$ and $X_1 \cap \tilde X = \emptyset$; when $X_1 \subset_{k-q} \beta_1^\top X$ and $X_1 \subset_{k-q} \beta_0^\top X$ with $s_3(2 - k/q) + k > 0$ and $0 < q \le k$, the four outcome regression-based CATE estimators share the same asymptotic distribution. Here,
\[
\tilde\sigma_N^2(x_1) \equiv E[\{m_1(X) - m_0(X) - \tau(x_1)\}^2 \mid X_1 = x_1] = \sigma_P^2(x_1) = \sigma_O^2(x_1).
\]

Remark 4. Much to our surprise, NRCATE can be asymptotically more efficient in this special case, sharing the same asymptotic variance as PRCATE. This shows the importance of the covariate affiliation to the set of arguments of the regression function. This is a unique property of the CATE; for the ATE, this does not happen.
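In the general case, by contrast, NRCATE pays the variance inflation identified in Theorem 3, $\sigma_N^2(x_1) - \sigma_P^2(x_1) = E\{\mathrm{var}(Y(1) \mid X)/p(X) + \mathrm{var}(Y(0) \mid X)/(1 - p(X)) \mid X_1 = x_1\}$. The Python sketch below evaluates this term by Monte Carlo for a toy design; the design, the window width and all constants are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Toy design: X = (X1, X2) uniform, Y(1) = X1 + X2 + eps with var(eps) = 0.25,
# Y(0) = 0 (so var(Y(0)|X) = 0), and a logistic propensity score in X1 + X2.
X1 = rng.uniform(-1, 1, n)
X2 = rng.uniform(-1, 1, n)
p = 1.0 / (1.0 + np.exp(-(X1 + X2)))

# Inflation term of Theorem 3 at x1 = 0: E{ var(Y(1)|X)/p(X) | X1 = x1 },
# approximated by averaging over draws with X1 in a narrow window around x1.
x1 = 0.0
window = np.abs(X1 - x1) < 0.05
inflation = float(np.mean(0.25 / p[window]))
print(inflation)  # roughly 0.54 for this design
```

The term shrinks when $p(X)$ stays away from 0 and 1 and the conditional outcome variances are small, which is exactly the equality condition stated after Theorem 3.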
Corollary 3.
In Theorem 3 and Theorem 4, if the commonly used constraints on the bandwidths $h_1$, $h_2$ and $h_3$ are replaced with
\[
\sqrt{n h_1^k}\left(h_2^{s_2} + \sqrt{\log(n)/(n h_2^{p})}\right) = o(1) \quad \text{and} \quad \sqrt{n h_1^k}\left(h_3^{s_3} + \sqrt{\frac{\log(n)}{n h_3^{\max\{r(0), r(1)\}}}}\right) = o(1)
\]
for some orders $s_2$ and $s_3$, then NRCATE and SRCATE have the same asymptotic distribution as PRCATE and ORCATE.

Remark 5.
As mentioned above, if we choose the bandwidths to satisfy the conditions of Corollary 3, NRCATE and SRCATE share the same asymptotic efficiency as PRCATE and ORCATE. Obviously, these conditions are much stronger than the assumptions in Theorem 3 and Theorem 4. However, it is possible to choose such bandwidths if the regression causal effect function is sufficiently smooth, so that higher-order kernels can be used; for details, see Li and Racine (2007) and Zhou and Zhu (2020). Therefore, under these bandwidth conditions, we obtain the following ranking of the asymptotic efficiencies of the four regression-based CATE estimators and the four propensity score-based CATE estimators:
\[
\underbrace{\text{ORCATE} = \text{PRCATE} = \text{SRCATE} = \text{NRCATE}}_{\text{regression-based CATE estimators}} \preceq \underbrace{\text{NCATE} = \text{SCATE} = \text{PCATE} = \text{OCATE}}_{\text{IPW-based CATE estimators}}. \tag{9}
\]
The equality occurs if and only if
\[
E\!\left\{\frac{\mathrm{var}(Y(1) \mid X)}{p(X)} + \frac{\mathrm{var}(Y(0) \mid X)}{1 - p(X)} + p(X)(1 - p(X))\left(\frac{m_1(X)}{p(X)} + \frac{m_0(X)}{1 - p(X)}\right)^2 \,\Big|\, X_1 = x_1\right\} = 0.
\]
In other words, regression-based estimators are always more efficient than IPW-type estimators in this general setting. On the other hand, the above investigations are mainly of theoretical interest; in practice, we may avoid choosing those bandwidths, as they are often very difficult to select properly, and otherwise the estimators would perform worse.

Simulations
To verify the theoretical results, in this section we conduct simulation studies to compare the regression-based ORCATE, PRCATE, SRCATE and NRCATE estimators with the IPW-based OCATE, PCATE, SCATE and NCATE estimators (Abrevaya et al., 2015). We set $p = \dim(X) \in \{2, 3, 4\}$ to avoid the curse of dimensionality under nonparametric estimation; based on our experience and the theoretical results, when $p$ is large, NRCATE is very hard to implement. As is well known, bandwidth selection plays an important role in the N-W estimation, so we first discuss this issue.

Note that ORCATE and PRCATE only involve one bandwidth, $h_1$, used in the second step of the estimation procedure. We first check how to choose bandwidth sequences and kernel functions satisfying conditions (A1) through (A7). To this end, consider
\[
h_1 = a_1 \cdot n^{-\frac{1}{k + 2 s_1} - \delta_1}, \qquad h_2 = a_2 \cdot n^{-\frac{1}{p + s_2} - \delta_2}, \qquad h_3 = a_3 \cdot n^{-\frac{1}{\max\{r(0), r(1)\} + s_3} - \delta_3}, \tag{10}
\]
with $a_1, a_2, a_3 > 0$ and $\delta_1, \delta_2, \delta_3 > 0$, where $\delta_1$, $\delta_2$ and $\delta_3$ can be selected as small as necessary or desired. It is clear that $h_1$, $h_2$ and $h_3$ satisfy conditions (A1), (A3) and (A6). To satisfy condition (A4), we set the kernel orders as $s_2 = p$ and $p + 1$ for even and odd $p$, respectively, and $s_1 = s_2 + 2$; to satisfy condition (A7), under the semiparametric dimension reduction structure, we set $s_3 = \max\{r(0), r(1)\}$ and $\max\{r(0), r(1)\} + 1$ for even and odd $\max\{r(0), r(1)\}$, respectively. Based on the above values of $s_1$, $s_2$ and $s_3$, the first parts of conditions (A4) and (A7) are verified directly. For the second parts, note that when $s_2 \ge p$ and $s_3 \ge \max\{r(0), r(1)\}$, elementary calculations with the rates in (10) give $h_2^{s_2} h_1^{-s_1 - k} \to 0$ and, invoking condition (A1), $n h_1^k h_2^{2 s_2} = n h_1^{2 s_1 + k} \cdot h_2^{2 s_2} h_1^{-2 s_1} \to 0$. Since $\delta_1$, $\delta_2$ and $\delta_3$ can be arbitrarily small, condition (A4) is satisfied; similarly, together with condition (A6), condition (A7) can also be satisfied.

To examine the finite-sample performance of the CATE estimators, consider the following three models:

Model 1: $Y(0) = 0$, $Y(1) = X_1^2 + X_2^2 + \epsilon$, $p(X) = \exp(X_1 + X_2)/\{1 + \exp(X_1 + X_2)\}$.

Model 2: $Y(0) = 0$, $Y(1) = X_1 + X_2 + X_3 + X_4 + \epsilon$, $p(X) = \exp\{0.5(X_1 + X_2 + X_3 + X_4)\}/[1 + \exp\{0.5(X_1 + X_2 + X_3 + X_4)\}]$.

Model 3: $Y(0) = 0$, $Y(1) = X_2 + X_3 + \epsilon$, $p(X) = \exp(X_2 + X_3)/\{1 + \exp(X_2 + X_3)\}$.

Model 1 has central mean subspaces of dimensions 2 and 0 for the treatment and control groups; Model 2 is used to verify Theorem 4; Model 3 is set up to justify the theory in Corollary 2. The dimensions of the central mean subspaces for the treatment and control groups are 1 and 0 in Models 2 and 3. For Model 1, $X = (X_1, X_2)^\top$ is generated by $X_1 \sim U(-.,.)$ and $X_2 = (1 + 2 X_1)^2 + \zeta_1$, where $\zeta_1 \sim U(-.,.)$ and $\epsilon \sim N(0, .)$. For Model 2, we generate $X = (X_1, X_2, X_3, X_4)^\top$ by $X_1 \sim U(-.,.)$, $X_2 = 1 + X_1 + \zeta_1$, $X_3 = (1 + X_1)^2 + \zeta_2$ and $X_4 = (-X_1)^2 + \zeta_3$, where $\zeta_j \overset{iid}{\sim} U(-.,.)$ and $\epsilon \sim N(0, .)$, $j = 1, 2, 3$. In Model 3, $X = (X_1, X_2, X_3)^\top$ is given by $X_1 \sim U(-.,.)$, $X_2 = 1 + X_1 + \vartheta_1$ and $X_3 = (1 + X_1)(-X_1) + \vartheta_2$, where $\vartheta_j \overset{iid}{\sim} U(-.,.)$ and $\epsilon \sim N(0, .)$, $j = 1, 2$.

We take $n = 200$ and $n = 500$, and the number of replications is 500. Let $T(x_1) = \sqrt{n h_1}\,[\hat\tau(x_1) - \tau(x_1)]$; we report the estimated standard deviation (SD), the BIAS and the MSE of $T(x_1)$. For the bandwidth selection described above, we have the following choices.

a). For Model 1, as $p = 2$, equation (10) gives $s_1 = 4$, $s_2 = 2$ and $s_3 = 2$. We then choose $h_1 = a_1 \cdot n^{-1/9}$ for $a_1 = 0.\cdot$, $h_2 = a_2 \cdot n^{-1/4}$ for $a_2 \in \{0.\cdot, 0.\cdot\}$, and $h_3 = a_3 \cdot n^{-1/4}$ for $a_3 \in \{0.\cdot, 0.\cdot, 0.\cdot\}$. Here, $a_1$, $a_2$ and $a_3$ are called baselines.

b). For Model 2, as $p = 4$, we take $h_1 = a_1 \cdot n^{-1/13}$ for $a_1 = 0.\cdot$, $h_2 = a_2 \cdot n^{-1/8}$ for $a_2 \in \{0.\cdot, 0.\cdot, 0.\cdot, 0.\cdot\}$, and $h_3 = a_3 \cdot n^{-1/3}$ for $a_3 \in \{0.\cdot, 0.\cdot, 0.\cdot\}$.

c). For Model 3, as $p = 3$, we take $h_1 = a_1 \cdot n^{-1/13}$ for $a_1 = 0.\cdot$, $h_2 = a_2 \cdot n^{-1/7}$ for $a_2 \in \{0.\cdot, 0.\cdot\}$, and $h_3 = a_3 \cdot n^{-1/3}$ for $a_3 \in \{0.\cdot, 0.\cdot, 0.\cdot\}$.

To make the simulation results more accessible, we tabulate the results in Tables 1-3, with some further results in the Appendix, and plot the SDs of all estimators divided by the SD of NRCATE to show the relative efficiency in Figures 1-3. We choose a Gaussian kernel and derive higher-order kernels from it.

Table 1: The distribution of $\sqrt{n h_1}\,[\hat\tau(x_1) - \tau(x_1)]$ for Model 1. Columns OR, PR, SR and NR are the regression-based estimators; N, S, P and O are the IPW-based estimators. The left block of columns is for $n = 200$ and the right block for $n = 500$.
        x  | n=200: OR    PR    SR    NR    N     S     P     O     | n=500: OR    PR    SR    NR    N     S     P     O
SD   -0.4 | 0.187 0.221 0.218 0.213 0.363 0.375 0.397 0.399 | 0.191 0.222 0.214 0.217 0.386 0.395 0.415 0.419
SD   -0.2 | 0.203 0.217 0.210 0.215 0.381 0.390 0.399 0.405 | 0.182 0.192 0.179 0.195 0.349 0.357 0.367 0.368
SD    0   | 0.193 0.201 0.213 0.213 0.446 0.467 0.471 0.480 | 0.192 0.202 0.211 0.213 0.404 0.415 0.453 0.466
SD    0.2 | 0.196 0.204 0.238 0.236 0.430 0.440 0.468 0.496 | 0.195 0.204 0.230 0.227 0.410 0.420 0.453 0.479
SD    0.4 | 0.197 0.213 0.241 0.239 0.394 0.415 0.443 0.437 | 0.200 0.225 0.243 0.241 0.392 0.395 0.443 0.446
BIAS -0.4 | -0.001 0.000 -0.046 -0.004 0.012 0.032 0.017 0.024 | 0.006 -0.008 -0.123 -0.023 -0.025 0.004 -0.007 -0.011
BIAS -0.2 | 0.016 0.013 0.102 0.067 -0.007 0.008 0.008 0.015 | 0.002 0.000 0.123 0.057 -0.026 -0.010 -0.016 -0.014
BIAS  0   | -0.018 -0.022 0.004 0.003 -0.034 -0.017 -0.021 -0.010 | 0.003 0.006 0.052 0.034 -0.017 -0.002 0.002 0.015
BIAS  0.2 | 0.001 0.001 0.003 0.008 0.010 0.048 0.003 0.014 | -0.016 -0.013 -0.006 -0.001 0.003 0.009 0.013 0.021
BIAS  0.4 | -0.001 0.006 -0.005 -0.006 0.028 0.063 0.034 0.024 | 0.006 0.002 -0.009 -0.008 0.043 0.043 0.010 0.000
MSE  -0.4 | 0.035 0.049 0.049 0.045 0.132 0.142 0.158 0.160 | 0.037 0.049 0.061 0.048 0.149 0.156 0.172 0.176
MSE  -0.2 | 0.041 0.047 0.054 0.051 0.145 0.152 0.159 0.165 | 0.033 0.037 0.047 0.041 0.123 0.128 0.135 0.135
MSE   0   | 0.038 0.041 0.045 0.046 0.200 0.219 0.222 0.230 | 0.037 0.041 0.047 0.047 0.164 0.172 0.205 0.217
MSE   0.2 | 0.038 0.042 0.057 0.056 0.185 0.196 0.219 0.246 | 0.038 0.042 0.053 0.052 0.168 0.176 0.206 0.230
MSE   0.4 | 0.039 0.046 0.058 0.057 0.156 0.177 0.198 0.191 | 0.040 0.051 0.059 0.058 0.156 0.158 0.196 0.199

Panel 2 (second bandwidth setting $(h_1, h_2, h_3)$):
        x  | n=200: OR    PR    SR    NR    N     S     P     O     | n=500: OR    PR    SR    NR    N     S     P     O
SD   -0.4 | 0.197 0.227 0.221 0.212 0.355 0.371 0.386 0.378 | 0.184 0.225 0.223 0.224 0.382 0.393 0.404 0.413
SD   -0.2 | 0.177 0.191 0.191 0.196 0.351 0.352 0.377 0.380 | 0.189 0.201 0.192 0.206 0.376 0.391 0.401 0.404
SD    0   | 0.185 0.199 0.206 0.207 0.445 0.453 0.471 0.480 | 0.186 0.200 0.200 0.202 0.412 0.417 0.454 0.465
SD    0.2 | 0.197 0.201 0.229 0.225 0.457 0.463 0.508 0.542 | 0.202 0.209 0.230 0.228 0.446 0.459 0.492 0.514
SD    0.4 | 0.208 0.229 0.254 0.253 0.388 0.393 0.417 0.440 | 0.195 0.212 0.236 0.234 0.379 0.378 0.417 0.434
BIAS -0.4 | 0.007 0.007 -0.060 -0.004 -0.014 0.011 0.008 0.002 | -0.004 -0.014 -0.150 -0.034 -0.047 -0.017 -0.028 -0.029
BIAS -0.2 | 0.010 0.008 0.095 0.068 -0.029 -0.013 -0.009 -0.014 | 0.011 0.005 0.127 0.062 0.006 0.022 0.026 0.025
BIAS  0   | 0.014 0.010 0.044 0.041 -0.007 0.006 -0.002 0.000 | 0.007 0.005 0.058 0.040 -0.010 -0.002 -0.007 0.000
BIAS  0.2 | 0.007 0.001 0.000 0.004 -0.017 0.003 -0.014 -0.007 | -0.012 -0.010 -0.001 0.000 -0.017 -0.013 -0.016 -0.009
BIAS  0.4 | -0.001 -0.008 -0.018 -0.023 0.014 0.029 0.007 -0.006 | 0.027 0.032 0.021 0.019 0.064 0.066 0.049 0.043
MSE  -0.4 | 0.039 0.051 0.052 0.045 0.126 0.138 0.149 0.143 | 0.034 0.051 0.072 0.052 0.148 0.155 0.164 0.172
MSE  -0.2 | 0.031 0.037 0.046 0.043 0.124 0.124 0.142 0.145 | 0.036 0.040 0.053 0.046 0.141 0.153 0.162 0.164
MSE   0   | 0.034 0.040 0.044 0.045 0.198 0.205 0.222 0.230 | 0.035 0.040 0.043 0.043 0.170 0.174 0.206 0.216
MSE   0.2 | 0.039 0.040 0.052 0.051 0.209 0.214 0.258 0.294 | 0.041 0.044 0.053 0.052 0.199 0.211 0.242 0.265
MSE   0.4 | 0.043 0.053 0.065 0.064 0.151 0.155 0.174 0.194 | 0.039 0.046 0.056 0.055 0.148 0.148 0.176 0.190
[Figure 1 about here: six panels of relative efficiency against $x_1$ — "n=200 for model 1", "n=500 for model 1", and regression-method and IPW-method panels at each sample size; the legend covers ORCATE, PRCATE, SRCATE, NRCATE, NCATE, SCATE, PCATE and OCATE.]

Figure 1: Relative efficiency of the CATE estimators against NRCATE for Model 1, based on the results in panel 2 of Table 1.
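The SD, BIAS and MSE entries in the tables, and the relative-efficiency curves in the figures, are simple functionals of the Monte Carlo replications of $T(x)$. A minimal sketch of how such summaries can be computed, using synthetic replication values for two hypothetical estimators rather than the paper's results:

```python
import numpy as np

def summarize(T):
    """SD, BIAS and MSE of Monte Carlo replications of T(x);
    rows are replications, columns are estimators."""
    sd = T.std(axis=0, ddof=1)
    bias = T.mean(axis=0)
    mse = (T ** 2).mean(axis=0)      # approximately BIAS^2 + SD^2
    return sd, bias, mse

# Synthetic replications for two hypothetical estimators:
rng = np.random.default_rng(1)
T = np.column_stack([rng.normal(0.0, 0.2, 500),
                     rng.normal(0.0, 0.4, 500)])
sd, bias, mse = summarize(T)
rel_eff = sd / sd[0]   # SDs divided by a benchmark SD, as in Figures 1-3
```

The relative-efficiency curves in Figures 1-3 are exactly this ratio, computed at each point $x$ with NRCATE as the benchmark.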
The observations are as follows.

First, as expected, a larger sample size yields smaller SD and MSE, reflecting the estimation consistency. The dimension of $X$ also affects the estimation performance: when $p$ increases from 2 to 4, both SD and MSE increase noticeably, particularly when $n = 500$.

Second, the comparisons show the significant advantage of outcome regression-based estimation over IPW-based estimation. Even though, in theory, NRCATE is asymptotically equivalent to NCATE, the difference in estimation efficiency is still very pronounced. All tables and figures indicate this clearly: every IPW-based estimator has a much larger SD than every regression-based estimator.

Third, as discussed before, the performances of NRCATE and SRCATE are closely tied to the affiliation of the given covariates to the set of arguments of the outcome regression. This finding is also confirmed in Tables 2 and 3 and Figures 2 and 3. In Model 2, $X_1 \subset_{k-q} \beta_1^\top X$ and $X_1 \subset_{k-q} \beta_0^\top X$ with $k = 1$ and $q = 0$; thus, in theory, SRCATE shares the same asymptotic variance as PRCATE and ORCATE and is more efficient than NRCATE. From Table 2 and Figure 2, we can see that the SDs of SRCATE are similar to those of PRCATE and ORCATE, which are smaller than those of NRCATE. In Model 3, $X_1$ belongs to $\tilde X = (X_1, X_2)^\top$; the asymptotic efficiencies are then equivalent in theory, and the SDs of SRCATE in Table 3 are similar to, and even slightly smaller than, the others. In this case, all outcome regression-based estimations have smaller SDs than all IPW-based estimations; Figure 3 shows this clearly.

Table 2: The distribution of $\sqrt{nh}\,[\widehat\tau(x) - \tau(x)]$ for Model 2. For each sample size, the columns are OR, PR, SR, NR, N, S, P and O. Panel 1 uses the first of the two bandwidth settings $(h_1, h_2, h_3)$.
        x  | n=200: OR    PR    SR    NR    N     S     P     O     | n=500: OR    PR    SR    NR    N     S     P     O
SD   -0.4 | 0.384 0.390 0.403 0.410 1.023 1.166 1.151 1.156 | 0.354 0.358 0.375 0.395 0.983 1.122 1.106 1.106
SD   -0.2 | 0.367 0.370 0.380 0.419 1.035 1.205 1.200 1.200 | 0.354 0.354 0.362 0.380 0.969 1.132 1.104 1.116
SD    0   | 0.366 0.369 0.385 0.415 0.981 1.159 1.151 1.140 | 0.385 0.388 0.399 0.414 0.965 1.128 1.087 1.091
SD    0.2 | 0.374 0.376 0.395 0.417 0.992 1.180 1.137 1.129 | 0.364 0.365 0.370 0.388 1.008 1.141 1.103 1.126
SD    0.4 | 0.397 0.404 0.430 0.427 1.037 1.186 1.139 1.129 | 0.362 0.365 0.384 0.407 1.067 1.250 1.190 1.199
BIAS -0.4 | 0.014 0.009 0.056 0.031 -0.692 -0.134 0.048 0.051 | -0.017 -0.014 0.082 -0.003 -1.069 -0.201 -0.010 -0.010
BIAS -0.2 | 0.015 0.012 0.043 0.021 -0.778 -0.207 -0.042 -0.034 | 0.010 0.012 0.050 0.016 -1.038 -0.198 -0.014 0.001
BIAS  0   | -0.005 -0.008 -0.012 -0.001 -0.782 -0.191 -0.023 -0.027 | -0.025 -0.025 -0.034 -0.011 -1.107 -0.243 -0.082 -0.074
BIAS  0.2 | 0.004 0.004 -0.021 0.005 -0.652 -0.047 0.062 0.063 | 0.017 0.015 -0.034 0.020 -1.059 -0.158 -0.058 -0.055
BIAS  0.4 | 0.002 0.003 -0.036 0.002 -0.578 0.045 0.103 0.091 | 0.020 0.016 -0.053 0.005 -0.905 0.049 0.026 0.000
MSE  -0.4 | 0.148 0.152 0.166 0.169 1.525 1.378 1.328 1.338 | 0.125 0.128 0.147 0.156 2.109 1.299 1.224 1.224
MSE  -0.2 | 0.135 0.137 0.146 0.176 1.676 1.494 1.443 1.441 | 0.125 0.126 0.133 0.145 2.015 1.321 1.220 1.246
MSE   0   | 0.134 0.136 0.148 0.173 1.574 1.380 1.324 1.301 | 0.149 0.151 0.161 0.171 2.158 1.331 1.189 1.195
MSE   0.2 | 0.140 0.141 0.157 0.174 1.408 1.394 1.296 1.279 | 0.133 0.134 0.138 0.151 2.137 1.326 1.219 1.272
MSE   0.4 | 0.158 0.163 0.187 0.183 1.410 1.410 1.307 1.283 | 0.131 0.133 0.150 0.166 1.957 1.566 1.417 1.437

Panel 2 (second bandwidth setting $(h_1, h_2, h_3)$):
        x  | n=200: OR    PR    SR    NR    N     S     P     O     | n=500: OR    PR    SR    NR    N     S     P     O
SD   -0.4 | 0.385 0.392 0.397 0.440 1.066 1.266 1.236 1.244 | 0.375 0.379 0.385 0.418 0.946 1.125 1.066 1.095
SD   -0.2 | 0.386 0.387 0.389 0.430 0.992 1.159 1.172 1.176 | 0.357 0.361 0.370 0.397 0.937 1.116 1.083 1.102
SD    0   | 0.379 0.384 0.387 0.411 1.031 1.227 1.203 1.213 | 0.387 0.388 0.399 0.420 0.933 1.118 1.103 1.095
SD    0.2 | 0.388 0.387 0.403 0.443 1.089 1.254 1.225 1.247 | 0.375 0.376 0.383 0.407 1.028 1.198 1.161 1.177
SD    0.4 | 0.376 0.379 0.392 0.411 1.056 1.238 1.143 1.175 | 0.362 0.366 0.385 0.411 1.047 1.197 1.126 1.174
BIAS -0.4 | 0.014 0.011 0.054 0.029 -0.836 -0.207 -0.011 -0.014 | 0.003 0.003 0.096 0.004 -1.234 -0.182 0.008 0.002
BIAS -0.2 | -0.010 -0.013 0.017 0.023 -0.879 -0.262 -0.073 -0.060 | 0.005 0.005 0.044 0.009 -1.200 -0.133 0.049 0.056
BIAS  0   | 0.038 0.035 0.030 0.038 -0.860 -0.192 -0.080 -0.041 | -0.003 -0.002 -0.012 0.001 -1.251 -0.144 -0.024 -0.017
BIAS  0.2 | 0.028 0.028 0.005 0.041 -0.715 -0.046 0.060 0.090 | -0.011 -0.010 -0.058 -0.001 -1.231 -0.133 -0.057 -0.042
BIAS  0.4 | 0.009 0.007 -0.034 0.000 -0.746 -0.056 -0.030 -0.017 | -0.019 -0.018 -0.089 -0.010 -1.125 -0.004 -0.031 -0.015
MSE  -0.4 | 0.148 0.154 0.161 0.194 1.836 1.646 1.529 1.548 | 0.140 0.144 0.157 0.174 2.418 1.299 1.137 1.199
MSE  -0.2 | 0.149 0.150 0.151 0.186 1.756 1.411 1.378 1.387 | 0.128 0.131 0.139 0.157 2.319 1.262 1.176 1.218
MSE   0   | 0.145 0.148 0.151 0.171 1.803 1.542 1.454 1.474 | 0.150 0.151 0.159 0.177 2.436 1.271 1.217 1.200
MSE   0.2 | 0.151 0.151 0.162 0.198 1.696 1.575 1.503 1.562 | 0.140 0.141 0.150 0.165 2.573 1.453 1.350 1.386
MSE   0.4 | 0.142 0.144 0.155 0.169 1.673 1.535 1.308 1.380 | 0.132 0.134 0.156 0.169 2.362 1.433 1.269 1.378

In this section, we apply SRCATE, as the dimensionality ($p = 15$) of $X$ is high, to analyse the ACTG 175 data set, which can be obtained from the R package speff2trial. This data set was collected from a randomized clinical trial that evaluated the treatment effect when
[Figure 2 about here: six panels of relative efficiency against $x_1$ for Model 2, in the same layout as Figure 1.]

Figure 2: Relative efficiency of the CATE estimators against NRCATE for Model 2, based on the results in panel 2 of Table 2.

either one or two therapies were used for HIV-infected adults; see Hammer et al. (1996) and Song and Ma (2008) for more details. As discussed before, our goal is to explore the heterogeneity of this treatment effect across subpopulations. We take age as $X_1$ to check how the expected treatment effect changes with age.

A very brief description of the data set is as follows. The outcome here is the CD4 T cell count at baseline, and the treatment indicator $D$ is a binary variable: $D = 0$ means receiving zidovudine only, and $D = 1$ means receiving the two therapies simultaneously. As documented by a number of authors, we take $Y = \log(\mathrm{CD4})$ and delete some infinite values after the logarithmic transformation, after which the number of observations is $n = 2136$. Further, to guarantee the unconfoundedness assumption, $X$ consists of the following 15 covariates: pidnum (patient's ID number); age (age in years at baseline); wtkg (weight in kg at baseline); hemo (hemophilia); homo (homosexual activity); drugs

Table 3: The distribution of $\sqrt{nh}\,[\widehat\tau(x) - \tau(x)]$ for Model 3. For each sample size, the columns are OR, PR, SR, NR, N, S, P and O. Panel 1 uses the first of the two bandwidth settings $(h_1, h_2, h_3)$.
        x  | n=200: OR    PR    SR    NR    N     S     P     O     | n=500: OR    PR    SR    NR    N     S     P     O
SD   -0.4 | 0.327 0.328 0.330 0.322 0.498 0.505 0.538 0.546 | 0.282 0.286 0.287 0.280 0.481 0.494 0.492 0.495
SD   -0.2 | 0.308 0.310 0.314 0.310 0.481 0.480 0.535 0.530 | 0.285 0.286 0.287 0.282 0.471 0.474 0.488 0.486
SD    0   | 0.301 0.301 0.306 0.296 0.452 0.467 0.514 0.505 | 0.287 0.289 0.294 0.285 0.479 0.478 0.506 0.512
SD    0.2 | 0.316 0.319 0.327 0.317 0.485 0.500 0.516 0.516 | 0.317 0.317 0.323 0.314 0.493 0.492 0.504 0.500
SD    0.4 | 0.290 0.291 0.301 0.298 0.485 0.509 0.514 0.520 | 0.297 0.298 0.299 0.290 0.476 0.486 0.493 0.490
BIAS -0.4 | -0.016 -0.021 -0.023 -0.037 -0.067 -0.048 -0.031 -0.031 | -0.008 -0.009 -0.012 -0.032 -0.045 -0.038 -0.014 -0.014
BIAS -0.2 | 0.006 0.003 0.007 0.022 0.016 0.021 0.007 0.009 | 0.002 0.000 0.001 0.019 0.015 0.024 0.010 0.010
BIAS  0   | -0.002 -0.004 0.001 0.024 0.039 0.034 0.011 0.010 | -0.011 -0.013 -0.007 0.022 0.027 0.042 0.000 -0.002
BIAS  0.2 | 0.004 0.001 0.007 0.015 -0.012 -0.008 -0.015 -0.020 | 0.009 0.008 0.009 0.024 0.012 0.021 0.005 0.003
BIAS  0.4 | 0.010 0.005 0.001 -0.017 -0.066 -0.043 -0.026 -0.027 | -0.005 -0.006 -0.010 -0.026 -0.062 -0.061 -0.038 -0.040
MSE  -0.4 | 0.107 0.108 0.109 0.105 0.252 0.257 0.290 0.299 | 0.080 0.082 0.083 0.080 0.234 0.245 0.243 0.245
MSE  -0.2 | 0.095 0.096 0.098 0.097 0.231 0.231 0.287 0.281 | 0.081 0.082 0.082 0.080 0.222 0.225 0.238 0.236
MSE   0   | 0.090 0.091 0.093 0.088 0.206 0.219 0.265 0.255 | 0.082 0.084 0.087 0.082 0.230 0.231 0.256 0.262
MSE   0.2 | 0.100 0.102 0.107 0.101 0.236 0.250 0.267 0.267 | 0.100 0.101 0.104 0.099 0.243 0.242 0.254 0.250
MSE   0.4 | 0.084 0.085 0.091 0.089 0.240 0.261 0.265 0.271 | 0.088 0.089 0.090 0.085 0.230 0.240 0.244 0.242

Panel 2 (second bandwidth setting $(h_1, h_2, h_3)$):
        x  | n=200: OR    PR    SR    NR    N     S     P     O     | n=500: OR    PR    SR    NR    N     S     P     O
SD   -0.4 | 0.329 0.334 0.337 0.324 0.498 0.497 0.515 0.522 | 0.284 0.288 0.291 0.283 0.490 0.497 0.505 0.510
SD   -0.2 | 0.304 0.308 0.314 0.301 0.432 0.441 0.464 0.453 | 0.297 0.301 0.307 0.295 0.479 0.473 0.499 0.498
SD    0   | 0.314 0.319 0.325 0.303 0.486 0.485 0.545 0.540 | 0.317 0.317 0.321 0.309 0.484 0.474 0.510 0.512
SD    0.2 | 0.301 0.308 0.314 0.298 0.462 0.467 0.499 0.500 | 0.292 0.293 0.292 0.284 0.464 0.463 0.482 0.483
SD    0.4 | 0.302 0.306 0.313 0.296 0.503 0.510 0.525 0.525 | 0.293 0.298 0.301 0.289 0.472 0.485 0.477 0.479
BIAS -0.4 | -0.021 -0.016 -0.019 -0.027 -0.042 -0.036 -0.009 -0.010 | 0.000 0.002 0.000 -0.019 0.007 -0.002 0.032 0.034
BIAS -0.2 | 0.004 0.007 0.015 0.029 0.019 0.029 0.012 0.008 | -0.015 -0.011 -0.011 0.007 0.008 0.021 -0.002 -0.001
BIAS  0   | -0.012 -0.011 -0.009 0.014 -0.007 0.000 -0.042 -0.045 | 0.006 0.010 0.012 0.046 0.021 0.039 -0.006 -0.002
BIAS  0.2 | 0.022 0.023 0.032 0.039 0.037 0.048 0.036 0.035 | 0.004 0.008 0.009 0.028 0.014 0.025 0.005 0.008
BIAS  0.4 | -0.023 -0.020 -0.022 -0.034 -0.040 -0.027 -0.009 -0.012 | 0.003 0.006 0.007 -0.019 -0.004 -0.006 0.016 0.020
MSE  -0.4 | 0.109 0.112 0.114 0.106 0.250 0.248 0.265 0.272 | 0.080 0.083 0.085 0.081 0.240 0.247 0.256 0.261
MSE  -0.2 | 0.093 0.095 0.099 0.091 0.187 0.196 0.216 0.206 | 0.088 0.091 0.094 0.087 0.229 0.224 0.249 0.248
MSE   0   | 0.099 0.102 0.106 0.092 0.236 0.235 0.299 0.293 | 0.101 0.101 0.103 0.097 0.234 0.226 0.260 0.262
MSE   0.2 | 0.091 0.095 0.100 0.090 0.215 0.221 0.250 0.251 | 0.085 0.086 0.085 0.082 0.215 0.215 0.232 0.234
MSE   0.4 | 0.092 0.094 0.098 0.089 0.255 0.261 0.276 0.276 | 0.086 0.089 0.091 0.084 0.223 0.235 0.228 0.230

(history of intravenous drug use); karnof (Karnofsky score); oprior (non-zidovudine antiretroviral therapy prior to initiation of study treatment); zprior (zidovudine use prior to treatment initiation); preanti (number of days of previously received antiretroviral therapy); race; gender; str2 (antiretroviral history); offtrt (indicator of off-treatment before 96±5 weeks); and days (number of days until the first occurrence of (i) a decline in CD4 T cell count of
at least 50, (ii) an event indicating progression to AIDS, or (iii) death). We now estimate the CATE over the age interval between 20 and 57 to avoid the boundary effect when the nonparametric estimation method is involved. This range runs approximately from the 0.
025 quantile to 0 .
975 quantile of the data. To apply SRCATE, we use the sufficient dimension reduction method developed by Xia et al. (2002), now known as MAVE, to estimate the projection matrices $\beta_1$ and $\beta_0$ and the associated structural dimensions. The results are $r(1) = 2$ and $r(0) = 3$. From these, we then have $s = \max\{r(1), r(0)\} + 1 = 4$, and we take the bandwidths proportional to $\widehat\sigma_r$ and $\widehat\sigma$ at the rates of Subsection 3.1, where $\widehat\sigma_r = \sqrt{\mathrm{var}(\widehat\beta^\top X)}$ with $\widehat\beta$ the estimated projection, and $\widehat\sigma = 2\sqrt{\mathrm{var}(X_1)}$. As in the simulation studies, a Gaussian kernel is used.

Figure 4 shows the curve of the estimated CATE as a function of age. Note that the curve lies well above zero. In other words, receiving the two therapies simultaneously has a much better treatment effect than receiving only one (zidovudine).

[Figure 3 about here: six panels of relative efficiency against $x_1$ for Model 3, in the same layout as Figure 1.]

Figure 3: Relative efficiency of the CATE estimators against NRCATE for Model 3, based on the results in panel 2 of Table 3.

Song and Ma (2008) also obtained this conclusion. But the investigation of the heterogeneity shows that the treatment effect is influenced by age. As shown in Figure 4, before the age of 30, receiving the two therapies raises immunity; after that, the advantage of this treatment gradually weakens. Thus, such a treatment seems most useful for patients whose ages are around 30.
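The age-varying effect just described rests on the basic smoothing step underlying all of the regression-based estimators: a local-constant (Nadaraya-Watson) fit of each treatment arm, differenced over a grid of the given covariate. A minimal sketch on synthetic data (not ACTG 175), with an assumed effect hump near age 30 and an illustrative bandwidth $h = 2$, neither of which comes from the paper:

```python
import numpy as np

def nw(x0, x, y, h):
    # Local-constant (Nadaraya-Watson) regression estimate at the point x0.
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)   # Gaussian kernel weights
    return np.sum(w * y) / np.sum(w)

def cate_curve(grid, age, Y, D, h):
    # tau_hat(x) = m1_hat(x) - m0_hat(x): smooth each treatment arm separately.
    m1 = np.array([nw(x0, age[D == 1], Y[D == 1], h) for x0 in grid])
    m0 = np.array([nw(x0, age[D == 0], Y[D == 0], h) for x0 in grid])
    return m1 - m0

# Synthetic data: a hypothetical treatment-effect hump centered at age 30.
rng = np.random.default_rng(2)
age = rng.uniform(20, 57, 1000)
D = rng.binomial(1, 0.5, 1000)
Y = 0.02 * age + D * np.exp(-0.005 * (age - 30) ** 2) + rng.normal(0, 0.1, 1000)
grid = np.linspace(25, 52, 28)
tau = cate_curve(grid, age, Y, D, h=2.0)   # estimated CATE over the age grid
```

The interior grid (25 to 52 here) mirrors the paper's practice of trimming the boundary region, where kernel estimates are unreliable.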
[Figure 4 about here: the estimated CATE curve plotted against age.]

Figure 4: Conditional average treatment effect curve over age.

In this paper, we propose four regression-based estimators of CATE, aiming to capture the heterogeneity of a treatment effect across subpopulations. The systematic investigation identifies the important factors that affect the asymptotic behaviour of the estimators: the convergence rates of the outcome regression function estimators and the affiliation of the given covariates to the set of arguments of the outcome regression functions. Further, any regression-based estimation can be asymptotically more efficient than any propensity score-based estimation, and can at most achieve the asymptotic efficiency of nonparametric regression-based estimation in some cases. These results give a relatively complete profile of propensity score-based and regression-based estimation for CATE. From this research, semiparametric regression-based estimation (SRCATE) is worth recommending, as it can avoid model misspecification as well as the curse of dimensionality when dimension reduction and feature selection approaches are combined; see Luo et al. (2017) and Ma et al. (2019). In this paper, we only discuss the cases with correctly specified models. When a model is misspecified globally, a further topic is the resulting asymptotic bias. Here, global misspecification means that the assumed model does not converge to the underlying model; if it does converge, we call the misspecification local. We will then study at which rate of convergence the asymptotic bias vanishes, together with the corresponding asymptotic efficiency. Another topic is doubly robust estimation, as it can greatly alleviate model misspecification. The research is ongoing.
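The simulations above use a Gaussian kernel with higher-order kernels derived from it. One standard construction, which may or may not be the exact one used in the paper, multiplies the Gaussian density by a polynomial so that the second moment vanishes while the kernel still integrates to one. A minimal sketch of a fourth-order Gaussian-based kernel, with a numerical check of both moment conditions:

```python
import numpy as np

def gauss(u):
    # Standard Gaussian density: the second-order base kernel.
    return np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)

def gauss4(u):
    # Fourth-order kernel 0.5*(3 - u^2)*phi(u): unit mass, zero second moment.
    return 0.5 * (3.0 - u ** 2) * gauss(u)

# Check the moment conditions numerically on a fine grid.
u = np.linspace(-10.0, 10.0, 200001)
du = u[1] - u[0]
mass = np.sum(gauss4(u)) * du                    # should be close to 1
second_moment = np.sum(u ** 2 * gauss4(u)) * du  # should be close to 0
```

Such higher-order kernels reduce the smoothing bias order, which is what allows the bandwidth-rate conditions such as A4 and A7 to be met when the covariate dimension grows.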
We first give some notation.

(1) $C$ and $M$ stand for two generic bounded constants; $\Xi$ is the $\sigma$-field generated by $X_1, \ldots, X_n$.

(2) $\epsilon_{ti} = Y_i - E(Y(t) \mid X_i)$, $\tau_t(x) = E[E\{Y \mid D = t, X\} \mid X_1 = x]$ and $Z_t = \beta_t^\top X$ for $t = 0, 1$ and $i = 1, \ldots, n$.

(3) Write $K\!\left(\frac{X_{1i} - x}{h_2}\right)$ as $K_{h_2}(X_i)$, $K\!\left(\frac{X_i - X_j}{h_1}\right)$ as $K_{h_1}(X_i - X_j)$, and $K\!\left(\frac{Z_i - Z_j}{h_3}\right)$ as $K_{h_3}(Z_i - Z_j)$.

In the two-step estimation procedure for CATE, the second step involves, for $i = 1, \ldots, n$, the quantities
$$\widehat K_{h_2}(X_i) = \sum_{j: j \ne i} w_{ij} K_{h_2}(X_j).$$
We call it the estimator of $K_{h_2}(X_i)$. In different circumstances, $w_{ij}$ can be different. Take NRCATE as an example, and write $w_{ij}$ as $w^N_{ij}$:
$$w^N_{ij} = \frac{\frac{1}{nh_1^p} K_{h_1}(X_i - X_j)}{\frac{1}{nh_1^p} \sum_{i=1}^n K_{h_1}(X_i - X_j)\, 1(D_i = 1)},$$
which depends on $X_1, \ldots, X_n$ only.

Lemma 1.
Given assumptions (C1)-(C4) in Subsection 2.1 and (A1)-(A4) in Subsections 2.2-2.3,
$$|w^N_{ij} - w^N_{ji}| = O_p(h_1) \cdot \frac{1}{nh_1^p}\, |K_{h_1}(X_i - X_j)|. \eqno(A.1)$$

Proof of Lemma 1.
By assumption (A2), $w^N_{ij} = w^N_{ji} = 0$ for $\|X_j - X_i\|_\infty > h_1$ (Abrevaya et al., 2015). Suppose then that $\|X_j - X_i\|_\infty \le h_1$. For all $j$, define
$$\widehat f(X_j) = \frac{1}{nh_1^p} \sum_{i: i \ne j} K_{h_1}(X_i - X_j).$$
It is clear that
$$w^N_{ij} = \frac{\frac{1}{nh_1^p} K_{h_1}(X_i - X_j)}{\frac{1}{nh_1^p} \sum_{i=1}^n K_{h_1}(X_i - X_j) 1(D_i = 1)} = \frac{\frac{1}{nh_1^p} K_{h_1}(X_i - X_j)}{\frac{1}{nh_1^p} \sum_{i=1}^n K_{h_1}(X_i - X_j)} \times \frac{\frac{1}{nh_1^p} \sum_{i=1}^n K_{h_1}(X_i - X_j)}{\frac{1}{nh_1^p} \sum_{i=1}^n K_{h_1}(X_i - X_j) 1(D_i = 1)} = \frac{\frac{1}{nh_1^p} K_{h_1}(X_i - X_j)}{\widehat f(X_j)\, \widehat p(X_j)}. \eqno(A.2)$$
Then
$$|w^N_{ij} - w^N_{ji}| = \frac{1}{nh_1^p} \left| \frac{K_{h_1}(X_i - X_j)}{\widehat p(X_j) \widehat f(X_j)} - \frac{K_{h_1}(X_j - X_i)}{\widehat p(X_i) \widehat f(X_i)} \right| = \frac{1}{nh_1^p}\, |K_{h_1}(X_i - X_j)| \left| \frac{1}{\widehat p(X_j) \widehat f(X_j)} - \frac{1}{\widehat p(X_i) \widehat f(X_i)} \right|$$
$$\le \frac{1}{nh_1^p}\, |K_{h_1}(X_i - X_j)| \left\{ \left| \frac{\widehat p(X_j)\widehat f(X_j) - p(X_j) f(X_j)}{\widehat p(X_j)\, p(X_j)\, \widehat f(X_j)\, f(X_j)} \right| + \left| \frac{\widehat p(X_i)\widehat f(X_i) - p(X_i) f(X_i)}{\widehat p(X_i)\, p(X_i)\, \widehat f(X_i)\, f(X_i)} \right| + \left| \frac{p(X_i) f(X_i) - p(X_j) f(X_j)}{p(X_i)\, p(X_j)\, f(X_i)\, f(X_j)} \right| \right\}. \eqno(A.3)$$
Under conditions (C1)-(C4) and (A1)-(A4) for nonparametric estimation,
$$\sup_i |\widehat f(X_i) - f(X_i)| = O_p\!\left( h_1^{s} + \sqrt{\frac{\log n}{n h_1^p}} \right), \qquad \sup_i |\widehat p(X_i) - p(X_i)| = O_p\!\left( h_1^{s} + \sqrt{\frac{\log n}{n h_1^p}} \right).$$
Since $s \ge p \ge$
2, assumption (A3) implies that $\sup_i |\widehat f(X_i) - f(X_i)| = o_p(h_1)$ and $\sup_i |\widehat p(X_i) - p(X_i)| = o_p(h_1)$. By the mean value theorem,
$$\sup_j \left| \frac{1}{\widehat p(X_j)\widehat f(X_j)} - \frac{1}{p(X_j) f(X_j)} \right| \le \sup_j \frac{1}{\{\widetilde p(X_j)\, \widetilde f(X_j)\}^{2}}\, \sup_j \left| \widehat p(X_j) \widehat f(X_j) - p(X_j) f(X_j) \right|,$$
where $\widetilde p(X_j)$ is a quantity between $\widehat p(X_j)$ and $p(X_j)$ and, similarly, $\widetilde f(X_j)$ is a quantity between $\widehat f(X_j)$ and $f(X_j)$. Since $f$ and $p$ are bounded away from zero, $\sup_j \{\widetilde p(X_j)\widetilde f(X_j)\}^{-2} = O_p(1)$. After a simple calculation, we have
$$\sup_j \left| \widehat p(X_j) \widehat f(X_j) - p(X_j) f(X_j) \right| = O_p\!\left( h_1^{s} + \sqrt{\frac{\log n}{n h_1^p}} \right) = o_p(h_1).$$
Therefore,
$$\sup_j \left| \frac{\widehat p(X_j)\widehat f(X_j) - p(X_j) f(X_j)}{\widehat p(X_j)\, p(X_j)\, \widehat f(X_j)\, f(X_j)} \right| = o_p(h_1), \qquad \sup_i \left| \frac{\widehat p(X_i)\widehat f(X_i) - p(X_i) f(X_i)}{\widehat p(X_i)\, p(X_i)\, \widehat f(X_i)\, f(X_i)} \right| = o_p(h_1).$$
As for the last term in (A.3), noticing that $f$ and $p$ are continuously differentiable on the compact support of $X$ and bounded away from zero, we have $|f(x_1)p(x_1) - f(x_2)p(x_2)| \le M \|x_1 - x_2\|_\infty$ for all $x_1, x_2 \in \mathcal{X}$ and a constant $M > 0$; $\|X_j - X_i\|_\infty \le h_1$ then leads to $|f(X_i)p(X_i) - f(X_j)p(X_j)| = O(h_1)$. Combining all the results yields (A.1). $\Box$

Proof of Theorem 2.
We can rewrite b m ( X i ) − b m ( X i ) − τ ( x ) as { b m ( X i ) − τ ( x ) }−{ b m ( X i ) − τ ( x ) } . Then based on (2), q nh k ( b τ ( x ) − τ ( x ))= √ nh k n P i =1 K h ( X i ) { [ b m ( X i ) − τ ( x )] − [ b m ( X i ) − τ ( x )] } nh k n P i =1 K h ( X i )= √ nh k n P i =1 K h ( X i ) { [ b m ( X i ) − τ ( x )] − [ b m ( X i ) − τ ( x )] } f ( x ) (1 + o p (1)) , (A.4) as sup x | nh k n X i =1 K h ( X i ) − f ( x ) | = o p (1) . First, deal with { b m ( X i ) − τ ( x ) } in (A.4). It is clear that q nh k n X i =1 K h ( X i ) [ b m ( X i ) − τ ( x )]= 1 q nh k ( n X i =1 K h ( X i ) [ b m ( X i ) − m ( X i )] + n X i =1 K h ( X i ) [ m ( X i ) − τ ( x )] ) =: 1 q nh k ( I n, + I n, ) . (A.5) A simple calculation yields that | q nh k I n, | ≤ sup x | b m ( X i ) − m ( X i ) | nh k n X i =1 | K h ( X i ) | . As h → nh k n P i =1 | K h ( X i ) | = O p (1), we then have √ nh k I n, = O p ( p h k ) = o p (1).26hus, equation (A.5) becomes q nh k n X i =1 K h ( X i ) [ b m ( X i ) − τ ( x )] = 1 q nh k n X i =1 K h ( X i ) [ m ( X i ) − τ ( x )] + o p (1) . Similarly, q nh k n X i =1 K h ( X i ) [ b m ( X i ) − τ ( x )] = 1 q nh k n X i =1 K h ( X i ) [ m ( X i ) − τ ( x )] + o p (1) . Altogether, the asymptotically linear representation of b τ ( x ) is q nh k { b τ ( x ) − τ ( x ) } = √ nh k n P i =1 K h ( X i ) { m ( X i ) − m ( X i ) − τ ( x ) } f ( x ) (1 + o p (1))= √ nh k n P i =1 K h ( X i ) { m ( X i ) − m ( X i ) − τ ( x ) } f ( x ) + o p (1) . The second equation is due to the asymptotic finiteness of the leading term that isasymptotically normal shown below. As it is the sum of independent variables, theasymptotic normality is easy to derive. Specifically, noticing that the random variables { K h ( X i ) [ m ( X i ) − m ( X i ) − τ ( X i )] } ni =1 are i.i.d., then we can apply Lyapunov’s central limit theorem to obtain the asymptoticdistribution shown in Theorem 2. 
Under the assumptions (C1)- (C4) and (A1), we derivethat q nh k { b τ ( x ) − τ ( x ) } d −→ N (cid:18) , || K || σ P ( x ) f ( x ) (cid:19) , we now give the formula of σ P ( x ). It is easy to see that when n → ∞ , the variance of √ nh k n P i =1 K h ( X i ) { m ( X i ) − m ( X i ) − τ ( x ) } f ( x ) converges to σ P ( x ) := E [ { m ( X ) − m ( X ) − τ ( x ) } | X = x ] . The proof of Theorem 2 is finished. (cid:3) roof of Theorem 3. First, we have q nh k ( b τ ( x ) − τ ( x ))= √ nh k n P i =1 K h ( X i ) [ b m ( X i ) − τ ( x )] nh k n P i =1 K h ( X i ) − √ nh k n P i =1 K h ( X i ) [ b m ( X i ) − τ ( x )] nh k n P i =1 K h ( X i ) , (A.6) where b m ( X i ) = nh p n P j =1 K h ( X j − X i ) Y j ( D j = 1) nh p n P j =1 K h ( X j − X i ) ( D j = 1) , b m ( X i ) = nh p n P j =1 K h ( X j − X i ) Y j ( D j = 0) nh p n P j =1 K h ( X j − X i ) ( D j = 0) . Similarly as the proof for Theorem 2, we have the following decomposition: q nh k n X i =1 K h ( X i ) [ m ( X i ) − τ ( x )]= 1 q nh k n X i =1 ǫ i ( D i = 1) K h ( X i ) p ( X i ) + 1 q nh k n X i =1 K h ( X i ) [ E ( Y (1) | X i ) − τ ( x )]+ 1 q nh k n X i =1 ǫ i ( D i = 1) n X j =1 K h ( X j ) ( w Nij − w Nji )+ 1 q nh k n X i =1 ǫ i ( D i = 1) n X j =1 K h ( X j ) w Nji − K h ( X i ) p ( X i ) + 1 q nh k n X i =1 K h ( X i ) nh p n P j =1 K h ( X j − X i ) ( D j = 1) EY (1) | X j nh p n P j =1 K h ( X j − X i ) ( D j = 1) − EY (1) | X i =: I n, + I n, + I n, + I n, + I n, , (A.7) w Nij = nh p K h ( X i − X j ) nh p n P i =1 K h ( X i − X j ) ( D i = 1) , ǫ i = Y i − E ( Y (1) | X i ) . Note that I n, and I n, in equation (A.7) yield the final expression in Theorem 3. There-fore, we need to show that I n, , I n, and I n, in equation (A.7) are all o p (1).First show that I n, = o p (1). 
From Lemma 1, q h k sup i (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n X j =1 K h ( X j ) ( w Nij − w Nji ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ q h k sup i X j : j = i ( w Nij − w Nji ) | K h ( X j ) |≤ M Ch × h q h k × sup i X j : j = i nh p | K h ( X i − X j ) | = O p (1) × o p (1) × O p (1) = o p (1) , Further, √ n n P i =1 ǫ i ( D i = 1) has finite limit and thus, is bounded by O p (1) and then I n, = o p (1).Deal with I n, . As n X j =1 K h ( X j ) w Nji = nh p n P j =1 K h ( X j − X i ) nh p n P j =1 K h ( X j − X i ) ( D j = 1) nh p n P j =1 K h ( X j − X i ) K h ( X j ) nh p n P j =1 K h ( X j − X i ) , we can then regard n P j =1 K h ( X j ) w Nji as an estimator of K h ( X i ) p ( X i ) . Consider n X j =1 K h ( X j ) w Nji − K h ( X i ) p ( X i ) , which is the bias of K h ( X i ) p ( X i ) to K h ( X i ) p ( X i ) . Write X = ( X , X (2) ) and K h ( X − X j ) = K (cid:18) X − X j h (cid:19) K (cid:18) X (2) − X j h (cid:19) . Since b f − f = o p (1), and the kernel function is s ∗ ( ≥ s ) times continuously differentiable,29e have E n X j =1 K h ( X j ) w Nji (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) X i = 1 + o p (1) h p f ( X i ) p ( X i ) Z K (cid:18) u j − X i h (cid:19) K (cid:18) u j − X i h (cid:19) K h ( u j ) f ( u i ) du = 1 + o p (1) f ( X i ) p ( X i ) Z K ( v ) K ( v ) K (cid:18) X i − X h + v h h (cid:19) f ( X i + h v ) dv = K h ( X i ) p ( X i ) + O p (cid:18) h s h s (cid:19) . (A.8) Note that b K h ( X i ) b p ( X i ) − K h ( X i ) p ( X i )= (cid:26) b p ( X i ) − p ( X i ) + 1 p ( X i ) (cid:27) n b K h ( X i ) − K h ( X i ) + K h ( X i ) o − K h ( X i ) p ( X i ) = (cid:26) b p ( X i ) − p ( X i ) (cid:27) n b K h ( X i ) − K h ( X i ) o + 1 p ( X i ) n b K h ( X i ) − K h ( X i ) o + (cid:26) b p ( X i ) − p ( X i ) (cid:27) K h ( X i )= O p h s h s + h s + s log nnh p ! = O p (cid:18) h s h s (cid:19) . 
Thus, $\sup_i \left| \sum_{j=1}^n K_{h_2}(X_j) w^N_{ji} - \frac{K_{h_2}(X_i)}{p(X_i)} \right| = O_p\!\left( \frac{h_1^{s}}{h_2^{s}} \right)$. Owing to assumption (A4), under which $h_1^{s}/h_2^{s+k} \to$
0, we have
$$\sup_i \left| \frac{1}{\sqrt{h_2^{k}}} \left\{ \sum_{j=1}^n K_{h_2}(X_j) w^N_{ji} - \frac{K_{h_2}(X_i)}{p(X_i)} \right\} \right| = O_p\!\left( \frac{h_1^{s}}{h_2^{s + k/2}} \right) = o_p(1).$$
Since the $\epsilon_{1i} = Y_i - E(Y(1) \mid X_i)$ are mutually independent, we have $I_{n,4} = o_p(1)$ in equation (A.7). Finally, we show that $I_{n,5} = o_p(1)$ in equation (A.7). Note that
$$\frac{\frac{1}{nh_1^p}\sum_{j=1}^n K_{h_1}(X_j - X_i) 1(D_j = 1) E\{Y(1) \mid X_j\}}{\frac{1}{nh_1^p}\sum_{j=1}^n K_{h_1}(X_j - X_i) 1(D_j = 1)} = \frac{\frac{1}{nh_1^p}\sum_{j=1}^n K_{h_1}(X_j - X_i) 1(D_j = 1) E\{Y(1) \mid X_j\}}{\frac{1}{nh_1^p}\sum_{j=1}^n K_{h_1}(X_j - X_i)} \cdot \frac{\frac{1}{nh_1^p}\sum_{j=1}^n K_{h_1}(X_j - X_i)}{\frac{1}{nh_1^p}\sum_{j=1}^n K_{h_1}(X_j - X_i) 1(D_j = 1)},$$
which can be viewed as an estimator of $E\{1(D = 1) Y(1) \mid X_i\}/p(X_i)$. Denote $A(X_i) = E\{1(D = 1) Y(1) \mid X_i\}$. We can easily derive that
$$\frac{\widehat A(X_i)}{\widehat p(X_i)} - \frac{A(X_i)}{p(X_i)} = \{\widehat A(X_i) - A(X_i)\}\left\{\frac{1}{\widehat p(X_i)} - \frac{1}{p(X_i)}\right\} + A(X_i)\left\{\frac{1}{\widehat p(X_i)} - \frac{1}{p(X_i)}\right\} + \frac{\widehat A(X_i) - A(X_i)}{p(X_i)} = O_p\!\left( h_1^{s} + \sqrt{\frac{\log n}{n h_1^p}} \right).$$
Thus,
$$\sup_i \left| \frac{\frac{1}{nh_1^p}\sum_{j=1}^n K_{h_1}(X_j - X_i) 1(D_j = 1) E\{Y(1) \mid X_j\}}{\frac{1}{nh_1^p}\sum_{j=1}^n K_{h_1}(X_j - X_i) 1(D_j = 1)} - E\{Y(1) \mid X_i\} \right| = O_p\!\left( h_1^{s} + \sqrt{\frac{\log n}{n h_1^p}} \right).$$
Then, we can bound I n, as follows: | I n, | = (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) q nh k n X i =1 K h ( X i ) nh p n P j =1 K h ( X j − X i ) ( D j = 1) EY (1) | X j nh p n P j =1 K h ( X j − X i ) ( D j = 1) − EY (1) | X i (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ q nh k sup i (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) nh p n P j =1 K h ( X j − X i ) ( D j = 1) EY (1) | X j nh p n P j =1 K h ( X j − X i ) ( D j = 1) − EY (1) | X i (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) nh k n X i =1 | K h ( X i ) | = q nh k O p h s + s log nnh p ! · O p (1) = o p (1) · O p (1) = o p (1) , where assumption (A4) is used for the second equation. Thus, together with I n, = o p (1),31 n, = o p (1) and I n, = o p (1), equation (A.7) becomes q nh k n X i =1 K h ( X i ) nh p n P j =1 K h ( X j − X i ) Y j ( D j = 1) nh p n P j =1 K h ( X j − X i ) ( D j = 1) − τ ( x ) = I n, + I n, + o p (1) . Similarly, we can also deal with b m ( X i ) − τ ( x ) of (A.6) to have q nh k n X i =1 K h ( X i ) nh p n P j =1 K h ( X j − X i ) Y j ( D j = 0) nh p n P j =1 K h ( X j − X i ) ( D j = 0) − τ ( x ) := I n, + I n, + o p (1) , where I n, = 1 q nh k n X i =1 ǫ i ( D i = 0) K h ( X i )1 − p ( X i ) , I n, = 1 q nh k n X i =1 K h ( X i ) EY (0) | X i ,ǫ i = Y i − EY (0) | X i . Hence, we get the asymptotic linear representation of b τ ( x ) as q nh k { b τ ( x ) − τ ( x ) } = 1 p nh k f ( x ) n X i =1 { Ψ ( X i , Y i , D i ) − τ ( x ) } K h ( X i ) + o p (1) , which can be asymptotically normal. Again, we compute its asymptotic variance. Sim-ilarly as the proof for Theorem 2, we haveVar { b τ ( x ) } = 1 nh k || K || σ N ( x ) f ( x ) + o (cid:18) nh k (cid:19) . 
Then, by assumptions (C1)–(C4) and (A1)–(A4) for some $s^* \ge s \ge p$, we can derive that
\[
\sqrt{nh_1^k}\, \{\hat\tau(x) - \tau(x)\} \stackrel{d}{\longrightarrow} N\Big( 0,\; \frac{\|K\|_2^2\, \sigma_N^2(x)}{f(x)} \Big),
\]
where $\sigma_N^2(x) \equiv E[\{\Psi(X, Y, D) - \tau(x)\}^2 \mid X_1 = x]$. The proof is concluded. $\Box$

Proof of Theorem 4. Inspired by the proof of Theorem 2 of Luo et al. (2017), we have
\[
\sqrt{nh_1^k}\, (\hat\tau(x) - \tau(x))
= \frac{\frac{1}{\sqrt{nh_1^k}}\sum_{i=1}^n K_{h_1}(X_{1i})\, \big[\hat m_1(\hat\beta_1^\top X_i) - \tau_1(x)\big]}{\frac{1}{nh_1^k}\sum_{i=1}^n K_{h_1}(X_{1i})}
- \frac{\frac{1}{\sqrt{nh_1^k}}\sum_{i=1}^n K_{h_1}(X_{1i})\, \big[\hat m_0(\hat\beta_0^\top X_i) - \tau_0(x)\big]}{\frac{1}{nh_1^k}\sum_{i=1}^n K_{h_1}(X_{1i})}
\]
\[
= \frac{\frac{1}{\sqrt{nh_1^k}}\sum_{i=1}^n K_{h_1}(X_{1i})\, \big[\hat m_1(\beta_1^\top X_i) - \tau_1(x)\big]}{\frac{1}{nh_1^k}\sum_{i=1}^n K_{h_1}(X_{1i})}
- \frac{\frac{1}{\sqrt{nh_1^k}}\sum_{i=1}^n K_{h_1}(X_{1i})\, \big[\hat m_0(\beta_0^\top X_i) - \tau_0(x)\big]}{\frac{1}{nh_1^k}\sum_{i=1}^n K_{h_1}(X_{1i})}
+ O_p\big( \sqrt{nh_1^k}\, \|\hat\beta_1 - \beta_1\| + \sqrt{nh_1^k}\, \|\hat\beta_0 - \beta_0\| \big), \tag{A.9}
\]
where
\[
\hat m_1(\hat\beta_1^\top X_i) = \frac{\frac{1}{nh_4^{r^{(1)}}}\sum_{j=1}^n K_{h_4}\big(\hat Z_{1j} - \hat Z_{1i}\big)\, Y_j\, \mathbf 1(D_j = 1)}{\frac{1}{nh_4^{r^{(1)}}}\sum_{j=1}^n K_{h_4}\big(\hat Z_{1j} - \hat Z_{1i}\big)\, \mathbf 1(D_j = 1)}, \quad \hat Z_1 = \hat\beta_1^\top X,
\]
\[
\hat m_0(\hat\beta_0^\top X_i) = \frac{\frac{1}{nh_4^{r^{(0)}}}\sum_{j=1}^n K_{h_4}\big(\hat Z_{0j} - \hat Z_{0i}\big)\, Y_j\, \mathbf 1(D_j = 0)}{\frac{1}{nh_4^{r^{(0)}}}\sum_{j=1}^n K_{h_4}\big(\hat Z_{0j} - \hat Z_{0i}\big)\, \mathbf 1(D_j = 0)}, \quad \hat Z_0 = \hat\beta_0^\top X.
\]
Under assumption (A8), $O_p(\sqrt{nh_1^k}\, \|\hat\beta_1 - \beta_1\| + \sqrt{nh_1^k}\, \|\hat\beta_0 - \beta_0\|) = O_p(\sqrt{h_1^k}) = o_p(1)$ as $h_1 \to 0$. Therefore, equation (A.9) becomes
\[
\sqrt{nh_1^k}\, (\hat\tau(x) - \tau(x))
= \frac{\frac{1}{\sqrt{nh_1^k}}\sum_{i=1}^n K_{h_1}(X_{1i})\, \big[\hat m_1(\beta_1^\top X_i) - \tau_1(x)\big]}{\frac{1}{nh_1^k}\sum_{i=1}^n K_{h_1}(X_{1i})}
- \frac{\frac{1}{\sqrt{nh_1^k}}\sum_{i=1}^n K_{h_1}(X_{1i})\, \big[\hat m_0(\beta_0^\top X_i) - \tau_0(x)\big]}{\frac{1}{nh_1^k}\sum_{i=1}^n K_{h_1}(X_{1i})} + o_p(1). \tag{A.10}
\]
Similarly as in the proof of Theorem 3, we have
\[
\frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n K_{h_1}(X_{1i})\, \big[\hat m_1(\beta_1^\top X_i) - \tau_1(x)\big]
= \frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n K_{h_1}(X_{1i})\, [E(Y(1)\mid X_i) - \tau_1(x)]
+ \frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n \epsilon_{1i}\, \mathbf 1(D_i = 1) \sum_{j=1}^n K_{h_1}(X_{1j})\, (w^S_{ij} - w^S_{ji})
\]
\[
+ \frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n \epsilon_{1i}\, \mathbf 1(D_i = 1) \sum_{j=1}^n K_{h_1}(X_{1j})\, w^S_{ji}
+ \frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n K_{h_1}(X_{1i}) \Bigg[ \frac{\frac{1}{nh_4^{r^{(1)}}}\sum_{j=1}^n K_{h_4}(Z_{1j} - Z_{1i})\, \mathbf 1(D_j = 1)\, E(Y(1)\mid X_j)}{\frac{1}{nh_4^{r^{(1)}}}\sum_{j=1}^n K_{h_4}(Z_{1j} - Z_{1i})\, \mathbf 1(D_j = 1)} - E(Y(1)\mid X_i) \Bigg]
=: I_{n,1} + I_{n,2} + I_{n,3} + I_{n,4},
\]
where
\[
w^S_{ij} = \frac{\frac{1}{nh_4^{r^{(1)}}}\, K_{h_4}(Z_{1i} - Z_{1j})}{\frac{1}{nh_4^{r^{(1)}}}\sum_{i=1}^n K_{h_4}(Z_{1i} - Z_{1j})\, \mathbf 1(D_i = 1)}, \qquad \epsilon_{1i} = Y_i - E(Y(1)\mid X_i).
\]
Similarly, we can decompose $\hat m_0(\beta_0^\top X_i) - \tau_0(x)$ as
\[
\frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n K_{h_1}(X_{1i})\, \big[\hat m_0(\beta_0^\top X_i) - \tau_0(x)\big]
= \frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n K_{h_1}(X_{1i})\, [E(Y(0)\mid X_i) - \tau_0(x)]
+ \frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n \epsilon_{0i}\, \mathbf 1(D_i = 0) \sum_{j=1}^n K_{h_1}(X_{1j})\, (w^S_{ij} - w^S_{ji})
\]
\[
+ \frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n \epsilon_{0i}\, \mathbf 1(D_i = 0) \sum_{j=1}^n K_{h_1}(X_{1j})\, w^S_{ji}
+ \frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n K_{h_1}(X_{1i}) \Bigg[ \frac{\frac{1}{nh_4^{r^{(0)}}}\sum_{j=1}^n K_{h_4}(Z_{0j} - Z_{0i})\, \mathbf 1(D_j = 0)\, E(Y(0)\mid X_j)}{\frac{1}{nh_4^{r^{(0)}}}\sum_{j=1}^n K_{h_4}(Z_{0j} - Z_{0i})\, \mathbf 1(D_j = 0)} - E(Y(0)\mid X_i) \Bigg]
=: I'_{n,1} + I'_{n,2} + I'_{n,3} + I'_{n,4},
\]
where, for this arm,
\[
w^S_{ij} = \frac{\frac{1}{nh_4^{r^{(0)}}}\, K_{h_4}(Z_{0i} - Z_{0j})}{\frac{1}{nh_4^{r^{(0)}}}\sum_{i=1}^n K_{h_4}(Z_{0i} - Z_{0j})\, \mathbf 1(D_i = 0)}, \qquad \epsilon_{0i} = Y_i - E(Y(0)\mid X_i).
\]
It is easy to show that $I_{n,2}$, $I'_{n,2}$, $I_{n,4}$ and $I'_{n,4}$ are $o_p(1)$, following the same arguments used to prove the corresponding terms are $o_p(1)$ in Theorem 3. The details are omitted here. We now deal with $I_{n,3}$ and $I'_{n,3}$.

Lemma 2. Suppose assumptions (C1)–(C4), (A1) and (A5)–(A7) are satisfied.
Then, for each point $x$ in the support of $X_1$:

(1) If $X_1 \not\subset \beta_1^\top X$ and $X_1 \not\subset \beta_0^\top X$, with $s(2 - k/q) + k > 0$ and $0 < q \le k$, we have
\[
I_{n,3} = o_p(1), \qquad I'_{n,3} = o_p(1). \tag{A.11}
\]
The corresponding asymptotically linear representation is then
\[
\sqrt{nh_1^k}\, \{\hat\tau(x) - \tau(x)\} = \frac{1}{\sqrt{nh_1^k}\, f(x)} \sum_{i=1}^n \{ m_1(X_i) - m_0(X_i) - \tau(x) \}\, K_{h_1}(X_{1i}) + o_p(1).
\]

(2) If $X_1 \subset \beta_1^\top X$ and $X_1 \not\subset \beta_0^\top X$, with $s(2 - k/q) + k > 0$ and $0 < q \le k$, we have
\[
I_{n,3} = \frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n \frac{\epsilon_{1i}\, \mathbf 1(D_i = 1)\, K_{h_1}(X_{1i})}{p(X_i)} + o_p(1), \qquad I'_{n,3} = o_p(1). \tag{A.12}
\]
Then we have
\[
\sqrt{nh_1^k}\, \{\hat\tau(x) - \tau(x)\} = \frac{1}{\sqrt{nh_1^k}\, f(x)} \sum_{i=1}^n \{ \Psi(X_i, Y_i, D_i) - \tau(x) \}\, K_{h_1}(X_{1i}) + o_p(1).
\]

(3) If $X_1 \not\subset \beta_1^\top X$ and $X_1 \subset \beta_0^\top X$, with $s(2 - k/q) + k > 0$ and $0 < q \le k$, we have
\[
I_{n,3} = o_p(1), \qquad I'_{n,3} = \frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n \frac{\epsilon_{0i}\, \mathbf 1(D_i = 0)\, K_{h_1}(X_{1i})}{1 - p(X_i)} + o_p(1). \tag{A.13}
\]
The corresponding asymptotically linear representation is
\[
\sqrt{nh_1^k}\, \{\hat\tau(x) - \tau(x)\} = \frac{1}{\sqrt{nh_1^k}\, f(x)} \sum_{i=1}^n \{ \Psi(X_i, Y_i, D_i) - \tau(x) \}\, K_{h_1}(X_{1i}) + o_p(1).
\]

(4) If $X_1 \subset \beta_1^\top X$ and $X_1 \subset \beta_0^\top X$, we have
\[
I_{n,3} = \frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n \frac{\epsilon_{1i}\, \mathbf 1(D_i = 1)\, K_{h_1}(X_{1i})}{p(X_i)} + o_p(1), \qquad
I'_{n,3} = \frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n \frac{\epsilon_{0i}\, \mathbf 1(D_i = 0)\, K_{h_1}(X_{1i})}{1 - p(X_i)} + o_p(1). \tag{A.14}
\]
We have
\[
\sqrt{nh_1^k}\, \{\hat\tau(x) - \tau(x)\} = \frac{1}{\sqrt{nh_1^k}\, f(x)} \sum_{i=1}^n \{ \Psi(X_i, Y_i, D_i) - \tau(x) \}\, K_{h_1}(X_{1i}) + o_p(1).
\]

Proof of Lemma 2.
We need to show that $I_{n,3} = o_p(1)$ if $X_1 \not\subset \beta_1^\top X$, with $s(2 - k/q) + k > 0$ and $0 < q \le k$. Let $X_1 = v_1$ and $\beta_1^\top X = v_2$, and denote $\big( (v_1 - v_{1i})/h_4,\, (v_2 - v_{2i})/h_4 \big)$ as $(t_1, t_2)$. We have
\[
E\Bigg[ \sum_{j=1}^n K_{h_1}(X_{1j})\, w^S_{ji} \,\Bigg|\, X_i \Bigg]
= \frac{1 + o_p(1)}{h_4^{q}\, f(v_{2i})\, p(v_{2i})} \int K\Big( \frac{v_2 - \beta_1^\top X_i}{h_4} \Big)\, K\Big( \frac{v_1 - x}{h_1} \Big)\, f(v_1, v_2)\, dv_1\, dv_2,
\]
where $f(v_1, v_2)$ is the joint density function of $(X_1, \beta_1^\top X)$. After the change of variables to $(t_1, t_2)$ and a first-order Taylor expansion of $K\big( (v_1 - x)/h_1 \big)$ around $(v_{1i} - x)/h_1$, this becomes, under assumptions (A5)–(A7),
\[
E\Bigg[ \sum_{j=1}^n K_{h_1}(X_{1j})\, w^S_{ji} \,\Bigg|\, X_i \Bigg]
= C\, h_4^{q}\, K_{h_1}(X_{1i})\, \frac{f(X_{1i}, \beta_1^\top X_i)}{f(\beta_1^\top X_i)\, p(\beta_1^\top X_i)} + O_p\Big( \frac{h_4^{q+1}}{h_1} \Big)
= O_p\Big( h_4^{q} + \frac{h_4^{q+1}}{h_1} \Big).
\]
Hence, under assumptions (A6) and (A7), together with $s(2 - k/q) + k > 0$ and $0 < q \le k$,
\[
\frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n \epsilon_{1i}\, \mathbf 1(D_i = 1) \sum_{j=1}^n K_{h_1}(X_{1j})\, w^S_{ji}
= \frac{1}{\sqrt n} \sum_{i=1}^n \epsilon_{1i}\, \mathbf 1(D_i = 1)\, O_p\Big( \frac{h_4^{q}}{h_1^{k/2}} + \frac{h_4^{q+1}}{h_1^{k/2 + 1}} \Big) = o_p(1).
\]
Analogously, we get $I'_{n,3} = o_p(1)$ if $X_1 \not\subset \beta_0^\top X$. Next, we prove that
\[
I_{n,3} = \frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n \frac{\epsilon_{1i}\, \mathbf 1(D_i = 1)\, K_{h_1}(X_{1i})}{p(X_i)} + o_p(1)
\]
if $X_1 \subset \beta_1^\top X$. As this case is similar to the case $X_1 \subset X$ in the nonparametric setting, paralleling the derivation of equation (A.8) yields the desired result. Similarly, $I'_{n,3} = \frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n \frac{\epsilon_{0i}\, \mathbf 1(D_i = 0)\, K_{h_1}(X_{1i})}{1 - p(X_i)} + o_p(1)$ if $X_1 \subset \beta_0^\top X$. The proof of Lemma 2 is concluded. $\Box$

Proof of Corollary 2.
Consider the case where $X_1 \not\subset \widetilde X \in \mathbb R^q$. Similarly as before, we derive that
\[
\sqrt{nh_1^k}\, (\hat\tau(x) - \tau(x))
= \frac{\frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n K_{h_1}(X_{1i})\, \big[\hat m_1(\widetilde X_i) - \tau_1(x)\big]}{\frac{1}{nh_1^k} \sum_{i=1}^n K_{h_1}(X_{1i})}
- \frac{\frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n K_{h_1}(X_{1i})\, \big[\hat m_0(\widetilde X_i) - \tau_0(x)\big]}{\frac{1}{nh_1^k} \sum_{i=1}^n K_{h_1}(X_{1i})}, \tag{A.15}
\]
where
\[
\hat m_1(\widetilde X_i) = \frac{\frac{1}{nh_3^q} \sum_{j=1}^n K_{h_3}(\widetilde X_j - \widetilde X_i)\, Y_j\, \mathbf 1(D_j = 1)}{\frac{1}{nh_3^q} \sum_{j=1}^n K_{h_3}(\widetilde X_j - \widetilde X_i)\, \mathbf 1(D_j = 1)}, \qquad
\hat m_0(\widetilde X_i) = \frac{\frac{1}{nh_3^q} \sum_{j=1}^n K_{h_3}(\widetilde X_j - \widetilde X_i)\, Y_j\, \mathbf 1(D_j = 0)}{\frac{1}{nh_3^q} \sum_{j=1}^n K_{h_3}(\widetilde X_j - \widetilde X_i)\, \mathbf 1(D_j = 0)}.
\]
Some similar calculations lead to the decomposition of $\hat m_1(\widetilde X_i) - \tau_1(x)$:
\[
\frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n K_{h_1}(X_{1i})\, \big[\hat m_1(\widetilde X_i) - \tau_1(x)\big]
= \frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n K_{h_1}(X_{1i})\, [E(Y(1)\mid X_i) - \tau_1(x)]
+ \frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n \epsilon_{1i}\, \mathbf 1(D_i = 1) \sum_{j=1}^n K_{h_1}(X_{1j})\, (w^N_{ij} - w^N_{ji})
\]
\[
+ \frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n \epsilon_{1i}\, \mathbf 1(D_i = 1) \sum_{j=1}^n K_{h_1}(X_{1j})\, w^N_{ji}
+ \frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n K_{h_1}(X_{1i}) \Bigg[ \frac{\frac{1}{nh_3^q} \sum_{j=1}^n K_{h_3}(\widetilde X_j - \widetilde X_i)\, \mathbf 1(D_j = 1)\, E(Y(1)\mid X_j)}{\frac{1}{nh_3^q} \sum_{j=1}^n K_{h_3}(\widetilde X_j - \widetilde X_i)\, \mathbf 1(D_j = 1)} - E(Y(1)\mid X_i) \Bigg]
=: I_{n,1} + I_{n,2} + I_{n,3} + I_{n,4},
\]
where
\[
w^N_{ij} = \frac{\frac{1}{nh_3^q}\, K_{h_3}(\widetilde X_i - \widetilde X_j)}{\frac{1}{nh_3^q} \sum_{i=1}^n K_{h_3}(\widetilde X_i - \widetilde X_j)\, \mathbf 1(D_i = 1)}.
\]
Then we can prove that $I_{n,2}$ and $I_{n,4}$ are $o_p(1)$ by the same arguments as those used to handle the corresponding terms in the proof of Theorem 3. Owing to $X_1 \not\subset \widetilde X$, similar arguments as those proving Lemma 2 imply that $I_{n,3} = o_p(1)$. The proof of Corollary 2 is concluded. $\Box$

Proof of Corollary 3. From the proof of Theorem 3, we can see that
\[
E\Bigg[ \sum_{j=1}^n K_{h_1}(X_{1j})\, w^N_{ji} \,\Bigg|\, X_i \Bigg] = O_p\Big( h_2 + \frac{h_2^s}{h_1^s} \Big),
\]
by the condition $\sqrt{nh_1^k}\, \big( h_2^s + \sqrt{\log(n)/(nh_2^p)} \big) = o(1)$. Then NRCATE shares the same asymptotic distribution as PRCATE. For SRCATE, we can use similar arguments to show the same result. The proof is finished. $\Box$

References
Abrevaya, J., Hsu, Y.C., Lieli, R.P., 2015. Estimating conditional average treatment effects. Journal of Business & Economic Statistics 33, 485–505.

Cheng, P.E., 1994. Nonparametric estimation of mean functionals with data missing at random. Journal of the American Statistical Association 89, 81–87.

Cook, R.D., Li, B., 2002. Dimension reduction for conditional mean in regression. The Annals of Statistics 30, 455–474.

Feng, Z., Wen, X.M., Yu, Z., Zhu, L., 2013. On partial sufficient dimension reduction with applications to partially linear multi-index models. Journal of the American Statistical Association 108, 237–246.

Hahn, J., 1998. On the role of the propensity score in efficient semiparametric estimation of average treatment effects. Econometrica 66, 315–331.

Hammer, S.M., Katzenstein, D.A., Hughes, M.D., Gundacker, H., Schooley, R.T., Haubrich, R.H., Henry, W.K., Lederman, M.M., Phair, J.P., Niu, M., Martin, S.H., Thomas, C.M., 1996. A trial comparing nucleoside monotherapy with combination therapy in HIV-infected adults with CD4 cell counts from 200 to 500 per cubic millimeter. New England Journal of Medicine 335, 1081–1090.

Healy, M., Westmacott, M., 1956. Missing values in experiments analysed on automatic computers. Journal of the Royal Statistical Society: Series C (Applied Statistics) 5, 203–206.

Hirano, K., Imbens, G.W., Ridder, G., 2003. Efficient estimation of average treatment effects using the estimated propensity score. Econometrica 71, 1161–1189.

Li, Q., Racine, J.S., 2007. Nonparametric Econometrics: Theory and Practice. Princeton University Press, Princeton, NJ.

Luo, W., Wu, W., Zhu, Y., 2019. Learning heterogeneity in causal inference using sufficient dimension reduction. Journal of Causal Inference 7.

Luo, W., Zhu, Y., Ghosh, D., 2017. On estimating regression-based causal effects using sufficient dimension reduction.
Biometrika 104, 51–65.

Ma, S., Zhu, L., Zhang, Z., Tsai, C.L., Carroll, R.J., 2019. A robust and efficient approach to causal inference based on sparse sufficient dimension reduction. The Annals of Statistics 47, 1505–1535.

Matloff, N.S., 1981. Use of regression functions for improved estimation of means. Biometrika 68, 685–689.

Nadaraya, E.A., 1964. On estimating regression. Theory of Probability & Its Applications 9, 141–142.

Pagan, A., Ullah, A., 1999. Nonparametric Econometrics. Themes in Modern Econometrics, Cambridge University Press, Cambridge.

Rao, J., 1996. On variance estimation with imputed survey data. Journal of the American Statistical Association 91, 499–506.

Rosenbaum, P.R., Rubin, D.B., 1983. The central role of the propensity score in observational studies for causal effects. Biometrika 70, 41–55.

Rosenbaum, P.R., Rubin, D.B., 1985. Constructing a control group using multivariate matched sampling methods that incorporate the propensity score. The American Statistician 39, 33–38.

Rubin, D.B., 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66, 688.

Song, X., Ma, S., 2008. Multiple augmentation for interval-censored data with measurement error. Statistics in Medicine 27, 3178–3190.

Wang, Q., Linton, O., Härdle, W., 2004. Semiparametric regression analysis with missing response at random. Journal of the American Statistical Association 99, 334–345.

Watson, G.S., 1964. Smooth regression analysis. Sankhyā: The Indian Journal of Statistics, Series A 26, 359–372.

Xia, Y., Tong, H., Li, W.K., Zhu, L.X., 2002. An adaptive estimation of dimension reduction space. Journal of the Royal Statistical Society, Series B 64, 363–410.

Yin, J., Geng, Z., Li, R., Wang, H., 2010. Nonparametric covariance model. Statistica Sinica 20, 469–479.

Table: SD, BIAS and MSE of $\sqrt{nh}\,[\hat\tau(x) - \tau(x)]$ for model 1.
(Simulation tables for models 1, 2 and 3: SD, BIAS and MSE of $\sqrt{nh}\,[\hat\tau(x) - \tau(x)]$ at $x \in \{-0.4, -0.2, 0, 0.2, 0.4\}$, comparing the estimators OR, PR, SR, NR, N, S, P and O for $n = 200$ and $n = 500$ under three bandwidth settings each. Across models, sample sizes and bandwidths, OR attains the smallest SD and MSE, closely followed by PR, with SR and NR slightly larger, while the N, S, P and O columns show substantially larger SD and MSE.)

Submitted to the Annals of Statistics
SUPPLEMENTARY MATERIAL TO
“OUTCOME REGRESSION-BASED ESTIMATION OF CONDITIONAL AVERAGE TREATMENT EFFECT”
By Lu Li†, Niwen Zhou‡ and Lixing Zhu‡,§,∗
East China Normal University†, Beijing Normal University‡ and Hong Kong Baptist University§
1. Appendix
We first introduce some notation.

(1) $C$ and $M$ stand for two generic bounded constants; $\Xi$ is the $\sigma$-field generated by $X_1, \ldots, X_n$.

(2) $\epsilon_{ti} = Y_i - E(Y(t) \mid X_i)$, $\tau_t(x) = E[E\{Y \mid D = t, X\} \mid X_1 = x]$ and $Z_t = \beta_t^\top X$, for $t = 0, 1$ and $i = 1, \ldots, n$.

(3) Write $K\big( (X_{1i} - x)/h_1 \big)$ as $K_{h_1}(X_{1i})$, $K\big( (X_i - X_j)/h_2 \big)$ as $K_{h_2}(X_i - X_j)$, and $K\big( (Z_i - Z_j)/h_4 \big)$ as $K_{h_4}(Z_i - Z_j)$.

In the two-step estimation procedure for CATE, the second step involves, for $i = 1, \ldots, n$, the quantities
\[
\hat K_{h_1}(X_{1i}) = \sum_{j:\, j \ne i} w_{ij}\, K_{h_1}(X_{1j}).
\]
We call it the estimator of $K_{h_1}(X_{1i})$. In different circumstances, $w_{ij}$ can be different. Take NRCATE as an example, and write $w_{ij}$ as $w^N_{ij}$:
\[
w^N_{ij} = \frac{\frac{1}{nh_2^p}\, K_{h_2}(X_i - X_j)}{\frac{1}{nh_2^p} \sum_{i=1}^n K_{h_2}(X_i - X_j)\, \mathbf 1(D_i = 1)},
\]
which depends on $(X_1, D_1), \ldots, (X_n, D_n)$ only.

Lemma 1.1. Given assumptions (C1)–(C4) in Subsection 2.1 and (A1)–(A4) in Subsections 2.2–2.3,
\[
|w^N_{ij} - w^N_{ji}| = \frac{O_p(h_2)}{nh_2^p}\, |K_{h_2}(X_i - X_j)|. \tag{1.1}
\]

Proof of Lemma 1.1.
By assumption (A2), $w^N_{ij} = w^N_{ji} = 0$ for $\|X_j - X_i\|_\infty > h_2$ (Abrevaya, Hsu and Lieli, 2015). Suppose that $\|X_j - X_i\|_\infty \le h_2$. For all $j$, we define
\[
\hat f(X_j) = \frac{1}{nh_2^p} \sum_{i:\, i \ne j} K_{h_2}(X_i - X_j).
\]
It is clear that
\[
w^N_{ij} = \frac{\frac{1}{nh_2^p}\, K_{h_2}(X_i - X_j)}{\frac{1}{nh_2^p} \sum_{i=1}^n K_{h_2}(X_i - X_j)\, \mathbf 1(D_i = 1)}
= \frac{\frac{1}{nh_2^p}\, K_{h_2}(X_i - X_j)}{\frac{1}{nh_2^p} \sum_{i=1}^n K_{h_2}(X_i - X_j)} \times \frac{\frac{1}{nh_2^p} \sum_{i=1}^n K_{h_2}(X_i - X_j)}{\frac{1}{nh_2^p} \sum_{i=1}^n K_{h_2}(X_i - X_j)\, \mathbf 1(D_i = 1)}
= \frac{1}{nh_2^p}\, \frac{K_{h_2}(X_i - X_j)}{\hat f(X_j)\, \hat p(X_j)}. \tag{1.2}
\]
Then we have
\[
|w^N_{ij} - w^N_{ji}| = \frac{1}{nh_2^p} \left| \frac{K_{h_2}(X_i - X_j)}{\hat p(X_j)\, \hat f(X_j)} - \frac{K_{h_2}(X_j - X_i)}{\hat p(X_i)\, \hat f(X_i)} \right|
= \frac{1}{nh_2^p}\, |K_{h_2}(X_i - X_j)| \left| \frac{1}{\hat p(X_j)\, \hat f(X_j)} - \frac{1}{\hat p(X_i)\, \hat f(X_i)} \right| \tag{1.3}
\]
\[
\le \frac{1}{nh_2^p}\, |K_{h_2}(X_i - X_j)| \left\{ \left| \frac{1}{\hat p(X_j)\, \hat f(X_j)} - \frac{1}{p(X_j)\, f(X_j)} \right| + \left| \frac{1}{\hat p(X_i)\, \hat f(X_i)} - \frac{1}{p(X_i)\, f(X_i)} \right| + \left| \frac{1}{p(X_j)\, f(X_j)} - \frac{1}{p(X_i)\, f(X_i)} \right| \right\}
\]
\[
= \frac{1}{nh_2^p}\, |K_{h_2}(X_i - X_j)| \left\{ \left| \frac{\hat p(X_j)\, \hat f(X_j) - p(X_j)\, f(X_j)}{\hat p(X_j)\, p(X_j)\, \hat f(X_j)\, f(X_j)} \right| + \left| \frac{\hat p(X_i)\, \hat f(X_i) - p(X_i)\, f(X_i)}{\hat p(X_i)\, p(X_i)\, \hat f(X_i)\, f(X_i)} \right| + \left| \frac{p(X_i)\, f(X_i) - p(X_j)\, f(X_j)}{p(X_i)\, p(X_j)\, f(X_i)\, f(X_j)} \right| \right\}.
\]
Under conditions (C1)–(C4) and (A1)–(A4) for nonparametric estimation,
\[
\sup_i |\hat f(X_i) - f(X_i)| = O_p\Big( h_2^s + \sqrt{\frac{\log n}{nh_2^p}} \Big), \qquad
\sup_i |\hat p(X_i) - p(X_i)| = O_p\Big( h_2^s + \sqrt{\frac{\log n}{nh_2^p}} \Big).
\]
Since $s \ge p \ge 2$, assumption (A3) implies that $\sup_i |\hat f(X_i) - f(X_i)| = o_p(h_2)$ and $\sup_i |\hat p(X_i) - p(X_i)| = o_p(h_2)$. By the mean value theorem,
\[
\sup_j \left| \frac{\hat p(X_j)\, \hat f(X_j) - p(X_j)\, f(X_j)}{\hat p(X_j)\, p(X_j)\, \hat f(X_j)\, f(X_j)} \right|
\le \sup_j \frac{1}{\tilde p(X_j)^2\, \tilde f(X_j)^2}\; \sup_j \big| \hat p(X_j)\, \hat f(X_j) - p(X_j)\, f(X_j) \big|,
\]
where $\tilde p(X_j)$ is a quantity between $\hat p(X_j)$ and $p(X_j)$; similarly, $\tilde f(X_j)$ is a quantity between $\hat f(X_j)$ and $f(X_j)$. Owing to the fact that $f$ and $p$ are bounded away from zero, $\sup_j \tilde p(X_j)^{-2}\, \tilde f(X_j)^{-2} = O_p(1)$. After a simple calculation, we have
\[
\sup_j \big| \hat p(X_j)\, \hat f(X_j) - p(X_j)\, f(X_j) \big| = O_p\Big( h_2^s + \sqrt{\frac{\log n}{nh_2^p}} \Big) = o_p(h_2).
\]
Therefore,
\[
\sup_j \left| \frac{\hat p(X_j)\, \hat f(X_j) - p(X_j)\, f(X_j)}{\hat p(X_j)\, p(X_j)\, \hat f(X_j)\, f(X_j)} \right| = o_p(h_2), \qquad
\sup_i \left| \frac{\hat p(X_i)\, \hat f(X_i) - p(X_i)\, f(X_i)}{\hat p(X_i)\, p(X_i)\, \hat f(X_i)\, f(X_i)} \right| = o_p(h_2).
\]
As for the last term in (1.3), noticing that $f$ and $p$ are continuously differentiable on their compact support and bounded away from zero, we have
\[
\left| \frac{1}{f(x_1)\, p(x_1)} - \frac{1}{f(x_2)\, p(x_2)} \right| \le M\, \|x_1 - x_2\|_\infty
\]
for all $x_1, x_2 \in \mathcal X$ and a constant $M > 0$; $\|X_j - X_i\|_\infty \le h_2$ then leads to $\big| \frac{1}{f(X_i)\, p(X_i)} - \frac{1}{f(X_j)\, p(X_j)} \big| = O(h_2)$. Combining all these results yields (1.1). $\Box$

Proof of Theorem 2. We can rewrite $\hat m_1(X_i) - \hat m_0(X_i) - \tau(x)$ as $\{\hat m_1(X_i) - \tau_1(x)\} - \{\hat m_0(X_i) - \tau_0(x)\}$. Then, based on the definition of $\hat\tau(x)$,
\[
\sqrt{nh_1^k}\, (\hat\tau(x) - \tau(x))
= \frac{\frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n K_{h_1}(X_{1i})\, \big\{ [\hat m_1(X_i) - \tau_1(x)] - [\hat m_0(X_i) - \tau_0(x)] \big\}}{\frac{1}{nh_1^k} \sum_{i=1}^n K_{h_1}(X_{1i})}
= \frac{\frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n K_{h_1}(X_{1i})\, \big\{ [\hat m_1(X_i) - \tau_1(x)] - [\hat m_0(X_i) - \tau_0(x)] \big\}}{f(x)}\, (1 + o_p(1)), \tag{1.4}
\]
as $\sup_x \big| \frac{1}{nh_1^k} \sum_{i=1}^n K_{h_1}(X_{1i}) - f(x) \big| = o_p(1)$.

First, deal with $\{\hat m_1(X_i) - \tau_1(x)\}$ in (1.4). It is clear that
\[
\frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n K_{h_1}(X_{1i})\, [\hat m_1(X_i) - \tau_1(x)]
= \frac{1}{\sqrt{nh_1^k}} \left\{ \sum_{i=1}^n K_{h_1}(X_{1i})\, [\hat m_1(X_i) - m_1(X_i)] + \sum_{i=1}^n K_{h_1}(X_{1i})\, [m_1(X_i) - \tau_1(x)] \right\}
=: \frac{1}{\sqrt{nh_1^k}}\, ( I_{n,1} + I_{n,2} ). \tag{1.5}
\]
A simple calculation yields that
\[
\Big| \frac{1}{\sqrt{nh_1^k}}\, I_{n,1} \Big| \le \sqrt{nh_1^k}\; \sup_x |\hat m_1(x) - m_1(x)| \cdot \frac{1}{nh_1^k} \sum_{i=1}^n |K_{h_1}(X_{1i})|.
\]
As $h_1 \to 0$, $\frac{1}{nh_1^k} \sum_{i=1}^n |K_{h_1}(X_{1i})| = O_p(1)$; since $\sup_x |\hat m_1(x) - m_1(x)| = O_p(n^{-1/2})$ under the parametric structure, we then have $\frac{1}{\sqrt{nh_1^k}}\, I_{n,1} = O_p(\sqrt{h_1^k}) = o_p(1)$. Thus, equation (1.5) becomes
\[
\frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n K_{h_1}(X_{1i})\, [\hat m_1(X_i) - \tau_1(x)] = \frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n K_{h_1}(X_{1i})\, [m_1(X_i) - \tau_1(x)] + o_p(1).
\]
Similarly,
\[
\frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n K_{h_1}(X_{1i})\, [\hat m_0(X_i) - \tau_0(x)] = \frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n K_{h_1}(X_{1i})\, [m_0(X_i) - \tau_0(x)] + o_p(1).
\]
Altogether, the asymptotically linear representation of $\hat\tau(x)$ is
\[
\sqrt{nh_1^k}\, \{\hat\tau(x) - \tau(x)\}
= \frac{\frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n K_{h_1}(X_{1i})\, \{ m_1(X_i) - m_0(X_i) - \tau(x) \}}{f(x)}\, (1 + o_p(1))
= \frac{\frac{1}{\sqrt{nh_1^k}} \sum_{i=1}^n K_{h_1}(X_{1i})\, \{ m_1(X_i) - m_0(X_i) - \tau(x) \}}{f(x)} + o_p(1).
\]
The second equality is due to the asymptotic finiteness of the leading term, which is shown to be asymptotically normal below. As it is a sum of independent variables, the asymptotic normality is easy to derive. Specifically, noticing that the random variables $\{ K_{h_1}(X_{1i})\, [m_1(X_i) - m_0(X_i) - \tau(X_{1i})] \}_{i=1}^n$ are i.i.d., we can apply Lyapunov's central limit theorem to obtain the asymptotic distribution shown in Theorem 2. Under assumptions (C1)–(C4) and (A1), we derive that
\[
\sqrt{nh_1^k}\, \{\hat\tau(x) - \tau(x)\} \stackrel{d}{\longrightarrow} N\Big( 0,\; \frac{\|K\|_2^2\, \sigma_P^2(x)}{f(x)} \Big);
\]
we now give the formula of $\sigma_P^2(x)$.
It is easy to see that, as $n\to\infty$, the variance of
\[
\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\,\{m_1(X_i)-m_0(X_i)-\tau(x)\}\Big/f(x)
\]
converges to $\|K\|_2^2\,\sigma^2_P(x)/f(x)$, where
\[
\sigma^2_P(x):=E\big[\{m_1(X)-m_0(X)-\tau(x)\}^2\mid X_1=x\big].
\]
The proof of Theorem ?? is finished. $\Box$

Proof of Theorem ??. First, we have (1.6)
\[
\sqrt{nh^k}\,(\hat\tau(x)-\tau(x))
=\frac{\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\,[\hat m_1(X_i)-\tau_1(x)]}{\frac{1}{nh^k}\sum_{i=1}^n K_h(X_{1i})}
-\frac{\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\,[\hat m_0(X_i)-\tau_0(x)]}{\frac{1}{nh^k}\sum_{i=1}^n K_h(X_{1i})},
\]
where
\[
\hat m_1(X_i)=\frac{\frac{1}{nh_2^p}\sum_{j=1}^n K_{h_2}(X_j-X_i)\,Y_j\,\mathbb 1(D_j=1)}{\frac{1}{nh_2^p}\sum_{j=1}^n K_{h_2}(X_j-X_i)\,\mathbb 1(D_j=1)},\qquad
\hat m_0(X_i)=\frac{\frac{1}{nh_2^p}\sum_{j=1}^n K_{h_2}(X_j-X_i)\,Y_j\,\mathbb 1(D_j=0)}{\frac{1}{nh_2^p}\sum_{j=1}^n K_{h_2}(X_j-X_i)\,\mathbb 1(D_j=0)},
\]
with $h_2$ the bandwidth of the regression step. As in the proof of Theorem ??, we have the following decomposition: (1.7)
\[
\begin{aligned}
&\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\,[\hat m_1(X_i)-\tau_1(x)]\\
&=\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n \epsilon_{1i}\,\mathbb 1(D_i=1)\,\frac{K_h(X_{1i})}{p(X_i)}
+\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\,[E(Y(1)\mid X_i)-\tau_1(x)]\\
&\quad+\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n \epsilon_{1i}\,\mathbb 1(D_i=1)\sum_{j=1}^n K_h(X_{1j})\,(w^N_{ij}-w^N_{ji})\\
&\quad+\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n \epsilon_{1i}\,\mathbb 1(D_i=1)\Big\{\sum_{j=1}^n K_h(X_{1j})\,w^N_{ji}-\frac{K_h(X_{1i})}{p(X_i)}\Big\}\\
&\quad+\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\Big\{\frac{\frac{1}{nh_2^p}\sum_{j=1}^n K_{h_2}(X_j-X_i)\,\mathbb 1(D_j=1)\,E(Y(1)\mid X_j)}{\frac{1}{nh_2^p}\sum_{j=1}^n K_{h_2}(X_j-X_i)\,\mathbb 1(D_j=1)}-E(Y(1)\mid X_i)\Big\}\\
&=:I_{n,1}+I_{n,2}+I_{n,3}+I_{n,4}+I_{n,5},
\end{aligned}
\]
where
\[
w^N_{ij}=\frac{K_{h_2}(X_i-X_j)}{\sum_{l=1}^n K_{h_2}(X_l-X_j)\,\mathbb 1(D_l=1)},\qquad
\epsilon_{1i}=Y_i-E(Y(1)\mid X_i).
\]
Note that $I_{n,1}$ and $I_{n,2}$ in equation (1.7) yield the final expression in Theorem ??. Therefore, we need to show that $I_{n,3}$, $I_{n,4}$ and $I_{n,5}$ in equation (1.7) are all $o_p(1)$. We first show that $I_{n,3}=o_p(1)$.
From Lemma 1.1,
\[
\sqrt{h^k}\,\sup_i\Big|\sum_{j=1}^n K_h(X_{1j})\,(w^N_{ij}-w^N_{ji})\Big|
\le \sqrt{h^k}\,\sup_i\sum_{j:\,j\neq i}\big|w^N_{ij}-w^N_{ji}\big|\,\big|K_h(X_{1j})\big|
=O_p(1)\times o_p(1)\times O_p(1)=o_p(1).
\]
Further, $\frac{1}{\sqrt n}\sum_{i=1}^n \epsilon_{1i}\,\mathbb 1(D_i=1)$ has a finite limit and is thus bounded in probability by $O_p(1)$; hence $I_{n,3}=o_p(1)$.

Next, deal with $I_{n,4}$. As
\[
\sum_{j=1}^n K_h(X_{1j})\,w^N_{ji}
=\frac{\frac{1}{nh_2^p}\sum_{j=1}^n K_{h_2}(X_j-X_i)\,K_h(X_{1j})}{\frac{1}{nh_2^p}\sum_{j=1}^n K_{h_2}(X_j-X_i)}
\cdot\frac{\frac{1}{nh_2^p}\sum_{j=1}^n K_{h_2}(X_j-X_i)}{\frac{1}{nh_2^p}\sum_{j=1}^n K_{h_2}(X_j-X_i)\,\mathbb 1(D_j=1)},
\]
we can regard $\sum_{j=1}^n K_h(X_{1j})\,w^N_{ji}$ as an estimator of $K_h(X_{1i})/p(X_i)$; write it as $\widehat{K_h}(X_{1i})/\hat p(X_i)$, with $\widehat{K_h}(X_{1i})$ and $1/\hat p(X_i)$ denoting the two ratio factors above. Consider
\[
\sum_{j=1}^n K_h(X_{1j})\,w^N_{ji}-\frac{K_h(X_{1i})}{p(X_i)},
\]
which is the bias of $\widehat{K_h}(X_{1i})/\hat p(X_i)$ relative to $K_h(X_{1i})/p(X_i)$. Write $X=(X_1,X^{(2)})$ and
\[
K_{h_2}(X-X_j)=K\Big(\frac{X_1-X_{1j}}{h_2}\Big)\,K\Big(\frac{X^{(2)}-X^{(2)}_j}{h_2}\Big).
\]
Since $\hat f-f=o_p(1)$, and the kernel function is $s^*\ (\ge s)$ times continuously differentiable, we have (1.8)
\[
\begin{aligned}
E\Big\{\sum_{j=1}^n K_h(X_{1j})\,w^N_{ji}\,\Big|\,X_i\Big\}
&=\frac{1+o_p(1)}{h_2^p\,f(X_i)\,p(X_i)}\int K\Big(\frac{u_1-X_{1i}}{h_2}\Big)\,K\Big(\frac{u_2-X^{(2)}_i}{h_2}\Big)\,K_h(u_1)\,f(u)\,du\\
&=\frac{1+o_p(1)}{f(X_i)\,p(X_i)}\int K(v_1)\,K(v_2)\,K\Big(\frac{X_{1i}-x}{h}+\frac{v_1h_2}{h}\Big)\,f(X_i+h_2v)\,dv\\
&=\frac{K_h(X_{1i})}{p(X_i)}+O_p\Big(\frac{h_2^s}{h^s}\Big).
\end{aligned}
\]
Note that
\[
\begin{aligned}
\frac{\widehat{K_h}(X_{1i})}{\hat p(X_i)}-\frac{K_h(X_{1i})}{p(X_i)}
&=\Big\{\frac{1}{\hat p(X_i)}-\frac{1}{p(X_i)}+\frac{1}{p(X_i)}\Big\}\Big\{\widehat{K_h}(X_{1i})-K_h(X_{1i})+K_h(X_{1i})\Big\}-\frac{K_h(X_{1i})}{p(X_i)}\\
&=\Big\{\frac{1}{\hat p(X_i)}-\frac{1}{p(X_i)}\Big\}\Big\{\widehat{K_h}(X_{1i})-K_h(X_{1i})\Big\}
+\frac{1}{p(X_i)}\Big\{\widehat{K_h}(X_{1i})-K_h(X_{1i})\Big\}\\
&\quad+\Big\{\frac{1}{\hat p(X_i)}-\frac{1}{p(X_i)}\Big\}K_h(X_{1i})\\
&=O_p\Big(\frac{h_2^s}{h^s}+h_2^s+\sqrt{\frac{\log n}{nh_2^p}}\Big)=O_p\Big(\frac{h_2^s}{h^s}\Big).
\end{aligned}
\]
Thus,
\[
\sup_i\Big|\sum_{j=1}^n K_h(X_{1j})\,w^N_{ji}-\frac{K_h(X_{1i})}{p(X_i)}\Big|=O_p\Big(\frac{h_2^s}{h^s}\Big).
\]
Owing to assumption (A4), $h_2^s/h^{s+k/2}\to 0$, so we have
\[
\sup_i\Big|\frac{1}{\sqrt{h^k}}\Big\{\sum_{j=1}^n K_h(X_{1j})\,w^N_{ji}-\frac{K_h(X_{1i})}{p(X_i)}\Big\}\Big|
=O_p\Big(\frac{h_2^s}{h^{s+k/2}}\Big)=o_p(1).
\]
Since the $\epsilon_{1i}=Y_i-E(Y(1)\mid X_i)$ are mutually independent, we have $I_{n,4}=o_p(1)$ in equation (1.7).

Finally, to show that $I_{n,5}=o_p(1)$ in equation (1.7), note that
\[
\frac{\frac{1}{nh_2^p}\sum_{j=1}^n K_{h_2}(X_j-X_i)\,\mathbb 1(D_j=1)\,E(Y(1)\mid X_j)}{\frac{1}{nh_2^p}\sum_{j=1}^n K_{h_2}(X_j-X_i)\,\mathbb 1(D_j=1)}
=\frac{\frac{1}{nh_2^p}\sum_{j=1}^n K_{h_2}(X_j-X_i)\,\mathbb 1(D_j=1)\,E(Y(1)\mid X_j)}{\frac{1}{nh_2^p}\sum_{j=1}^n K_{h_2}(X_j-X_i)}
\cdot\frac{\frac{1}{nh_2^p}\sum_{j=1}^n K_{h_2}(X_j-X_i)}{\frac{1}{nh_2^p}\sum_{j=1}^n K_{h_2}(X_j-X_i)\,\mathbb 1(D_j=1)},
\]
which can be viewed as an estimator of $E\{\mathbb 1(D=1)Y(1)\mid X_i\}/p(X_i)$. Denote $A(X_i)=E\{\mathbb 1(D=1)Y(1)\mid X_i\}$. We can easily derive that
\[
\begin{aligned}
\frac{\hat A(X_i)}{\hat p(X_i)}-\frac{A(X_i)}{p(X_i)}
&=\big\{\hat A(X_i)-A(X_i)+A(X_i)\big\}\Big\{\frac{1}{\hat p(X_i)}-\frac{1}{p(X_i)}+\frac{1}{p(X_i)}\Big\}-\frac{A(X_i)}{p(X_i)}\\
&=\big\{\hat A(X_i)-A(X_i)\big\}\Big\{\frac{1}{\hat p(X_i)}-\frac{1}{p(X_i)}\Big\}
+A(X_i)\Big\{\frac{1}{\hat p(X_i)}-\frac{1}{p(X_i)}\Big\}
+\frac{\hat A(X_i)-A(X_i)}{p(X_i)}\\
&=O_p\Big(h_2^s+\sqrt{\frac{\log n}{nh_2^p}}\Big).
\end{aligned}
\]
Thus,
\[
\sup_i\Big|\frac{\frac{1}{nh_2^p}\sum_{j=1}^n K_{h_2}(X_j-X_i)\,\mathbb 1(D_j=1)\,E(Y(1)\mid X_j)}{\frac{1}{nh_2^p}\sum_{j=1}^n K_{h_2}(X_j-X_i)\,\mathbb 1(D_j=1)}-E(Y(1)\mid X_i)\Big|
=O_p\Big(h_2^s+\sqrt{\frac{\log n}{nh_2^p}}\Big).
\]
Then, we can bound $I_{n,5}$ as follows:
\[
\begin{aligned}
|I_{n,5}|
&=\Big|\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\Big\{\frac{\frac{1}{nh_2^p}\sum_{j}K_{h_2}(X_j-X_i)\,\mathbb 1(D_j=1)\,E(Y(1)\mid X_j)}{\frac{1}{nh_2^p}\sum_{j}K_{h_2}(X_j-X_i)\,\mathbb 1(D_j=1)}-E(Y(1)\mid X_i)\Big\}\Big|\\
&\le \sqrt{nh^k}\,\sup_i\Big|\frac{\frac{1}{nh_2^p}\sum_{j}K_{h_2}(X_j-X_i)\,\mathbb 1(D_j=1)\,E(Y(1)\mid X_j)}{\frac{1}{nh_2^p}\sum_{j}K_{h_2}(X_j-X_i)\,\mathbb 1(D_j=1)}-E(Y(1)\mid X_i)\Big|\cdot\frac{1}{nh^k}\sum_{i=1}^n|K_h(X_{1i})|\\
&=\sqrt{nh^k}\,O_p\Big(h_2^s+\sqrt{\frac{\log n}{nh_2^p}}\Big)\cdot O_p(1)=o_p(1)\cdot O_p(1)=o_p(1),
\end{aligned}
\]
where assumption (A4) is used for the second equation. Thus, together with $I_{n,3}=o_p(1)$ and $I_{n,4}=o_p(1)$, equation (1.7) becomes
\[
\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\Big\{\frac{\frac{1}{nh_2^p}\sum_{j}K_{h_2}(X_j-X_i)\,Y_j\,\mathbb 1(D_j=1)}{\frac{1}{nh_2^p}\sum_{j}K_{h_2}(X_j-X_i)\,\mathbb 1(D_j=1)}-\tau_1(x)\Big\}
=I_{n,1}+I_{n,2}+o_p(1).
\]
Similarly, we can also deal with $\hat m_0(X_i)-\tau_0(x)$ of (1.6) to obtain
\[
\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\Big\{\frac{\frac{1}{nh_2^p}\sum_{j}K_{h_2}(X_j-X_i)\,Y_j\,\mathbb 1(D_j=0)}{\frac{1}{nh_2^p}\sum_{j}K_{h_2}(X_j-X_i)\,\mathbb 1(D_j=0)}-\tau_0(x)\Big\}
=:I'_{n,1}+I'_{n,2}+o_p(1),
\]
where
\[
I'_{n,1}=\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n \epsilon_{0i}\,\mathbb 1(D_i=0)\,\frac{K_h(X_{1i})}{1-p(X_i)},\qquad
I'_{n,2}=\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\,[E(Y(0)\mid X_i)-\tau_0(x)],
\]
and $\epsilon_{0i}=Y_i-E(Y(0)\mid X_i)$. Hence, we get the asymptotically linear representation of $\hat\tau(x)$ as
\[
\sqrt{nh^k}\,\{\hat\tau(x)-\tau(x)\}
=\frac{1}{\sqrt{nh^k}\,f(x)}\sum_{i=1}^n\{\Psi(X_i,Y_i,D_i)-\tau(x)\}\,K_h(X_{1i})+o_p(1),
\]
which is asymptotically normal. Again, we compute its asymptotic variance. As in the proof of Theorem ??, we have
\[
\mathrm{Var}\{\hat\tau(x)\}=\frac{1}{nh^k}\,\frac{\|K\|_2^2\,\sigma^2_N(x)}{f(x)}+o\Big(\frac{1}{nh^k}\Big).
\]
Then by assumptions (C1)--(C4) and (A1)--(A4) for some $s^*\ge s\ge p$, we can derive that
\[
\sqrt{nh^k}\,\{\hat\tau(x)-\tau(x)\}\overset{d}{\longrightarrow}
N\Big(0,\ \frac{\|K\|_2^2\,\sigma^2_N(x)}{f(x)}\Big),
\quad\text{where}\quad
\sigma^2_N(x)\equiv E\big[\{\Psi(X,Y,D)-\tau(x)\}^2\mid X_1=x\big].
\]
The proof is concluded. $\Box$
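To fix ideas, the nonparametric estimator analyzed above can be sketched numerically. The following Python snippet is illustrative only and is not from the paper: it assumes a scalar covariate that plays the role of both $X$ and the conditioning covariate $X_1$, Gaussian kernels, ad hoc bandwidths, and takes $\Psi$ in the augmented inverse-probability form suggested by the terms $I_{n,1}$, $I_{n,2}$, $I'_{n,1}$, $I'_{n,2}$. It builds the two Nadaraya-Watson fits, smooths their difference around $x$, and forms a plug-in estimate of the limiting variance $\|K\|_2^2\,\sigma^2_N(x)/f(x)$:

```python
import numpy as np

def gauss(u):
    # Gaussian kernel: an illustrative choice; the theory only requires
    # a sufficiently smooth (possibly higher-order) kernel K.
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def nw(x0, X, Y, h):
    # Nadaraya-Watson estimate of E(Y | X = x0).
    w = gauss((X - x0) / h)
    return np.sum(w * Y) / np.sum(w)

def np_cate(x, X, Y, D, h2, h):
    # Nonparametric outcome-regression CATE estimate tau_hat(x):
    # a kernel-smoothed average of m1_hat(X_i) - m0_hat(X_i) around x.
    m1 = np.array([nw(xi, X[D == 1], Y[D == 1], h2) for xi in X])
    m0 = np.array([nw(xi, X[D == 0], Y[D == 0], h2) for xi in X])
    w = gauss((X - x) / h)
    return np.sum(w * (m1 - m0)) / np.sum(w)

def np_cate_avar(x, X, Y, D, h2, h):
    # Plug-in estimate of ||K||_2^2 * sigma_N^2(x) / f(x), the limiting
    # variance of sqrt(n h){tau_hat(x) - tau(x)}, with the influence-type
    # function Psi = m1 - m0 + D(Y - m1)/p - (1 - D)(Y - m0)/(1 - p)
    # (an assumption on the form of Psi, suggested by the proof above).
    m1 = np.array([nw(xi, X[D == 1], Y[D == 1], h2) for xi in X])
    m0 = np.array([nw(xi, X[D == 0], Y[D == 0], h2) for xi in X])
    p = np.array([nw(xi, X, D.astype(float), h2) for xi in X])
    psi = m1 - m0 + D * (Y - m1) / p - (1 - D) * (Y - m0) / (1 - p)
    w = gauss((X - x) / h)
    tau = np.sum(w * (m1 - m0)) / np.sum(w)
    sigma2 = np.sum(w * (psi - tau) ** 2) / np.sum(w)  # sigma_N^2(x)
    f_x = np.mean(w) / h                               # density estimate f(x)
    norm_K2 = 1.0 / (2.0 * np.sqrt(np.pi))             # ||K||_2^2 for Gaussian K
    return norm_K2 * sigma2 / f_x

# Toy data with a constant treatment effect tau(x) = 2.
rng = np.random.default_rng(0)
n = 2000
X = rng.uniform(-1.0, 1.0, n)
D = rng.binomial(1, 0.5, n)
Y = X + 2.0 * D + 0.1 * rng.normal(size=n)
tau_hat = np_cate(0.0, X, Y, D, h2=0.2, h=0.3)
```

With the constant effect built into the toy data, `tau_hat` should be close to 2; the plug-in variance, divided by $nh$, gives the pointwise Wald-type standard error matching the normal limit in Theorem ??.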
Proof of Theorem ??. Inspired by the proof of Theorem 2 of Luo, Zhu and Ghosh (2017), we have (1.9)
\[
\begin{aligned}
\sqrt{nh^k}\,(\hat\tau(x)-\tau(x))
&=\frac{\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\,[\hat m_1(\hat\beta_1^\top X_i)-\tau_1(x)]}{\frac{1}{nh^k}\sum_{i=1}^n K_h(X_{1i})}
-\frac{\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\,[\hat m_0(\hat\beta_0^\top X_i)-\tau_0(x)]}{\frac{1}{nh^k}\sum_{i=1}^n K_h(X_{1i})}\\
&=\frac{\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\,[\hat m_1(\beta_1^\top X_i)-\tau_1(x)]}{\frac{1}{nh^k}\sum_{i=1}^n K_h(X_{1i})}
-\frac{\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\,[\hat m_0(\beta_0^\top X_i)-\tau_0(x)]}{\frac{1}{nh^k}\sum_{i=1}^n K_h(X_{1i})}\\
&\quad+O_p\big(\sqrt{nh^k}\,\|\hat\beta_1-\beta_1\|+\sqrt{nh^k}\,\|\hat\beta_0-\beta_0\|\big),
\end{aligned}
\]
where
\[
\hat m_1(\hat\beta_1^\top X_i)=\frac{\frac{1}{nh_4^{r^{(1)}}}\sum_{j=1}^n K_{h_4}(\hat Z_{1j}-\hat Z_{1i})\,Y_j\,\mathbb 1(D_j=1)}{\frac{1}{nh_4^{r^{(1)}}}\sum_{j=1}^n K_{h_4}(\hat Z_{1j}-\hat Z_{1i})\,\mathbb 1(D_j=1)},\qquad \hat Z_1=\hat\beta_1^\top X,
\]
\[
\hat m_0(\hat\beta_0^\top X_i)=\frac{\frac{1}{nh_4^{r^{(0)}}}\sum_{j=1}^n K_{h_4}(\hat Z_{0j}-\hat Z_{0i})\,Y_j\,\mathbb 1(D_j=0)}{\frac{1}{nh_4^{r^{(0)}}}\sum_{j=1}^n K_{h_4}(\hat Z_{0j}-\hat Z_{0i})\,\mathbb 1(D_j=0)},\qquad \hat Z_0=\hat\beta_0^\top X.
\]
Under assumption (A8), $O_p\big(\sqrt{nh^k}\,\|\hat\beta_1-\beta_1\|+\sqrt{nh^k}\,\|\hat\beta_0-\beta_0\|\big)=O_p(\sqrt{h^k})=o_p(1)$ as $h\to 0$. Therefore, equation (1.9) becomes (1.10)
\[
\sqrt{nh^k}\,(\hat\tau(x)-\tau(x))
=\frac{\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\,[\hat m_1(\beta_1^\top X_i)-\tau_1(x)]}{\frac{1}{nh^k}\sum_{i=1}^n K_h(X_{1i})}
-\frac{\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\,[\hat m_0(\beta_0^\top X_i)-\tau_0(x)]}{\frac{1}{nh^k}\sum_{i=1}^n K_h(X_{1i})}+o_p(1).
\]
As in the proof of Theorem ??, we have
\[
\begin{aligned}
&\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\,[\hat m_1(\beta_1^\top X_i)-\tau_1(x)]\\
&=\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\,[E(Y(1)\mid X_i)-\tau_1(x)]
+\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n \epsilon_{1i}\,\mathbb 1(D_i=1)\sum_{j=1}^n K_h(X_{1j})\,(w^S_{ij}-w^S_{ji})\\
&\quad+\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n \epsilon_{1i}\,\mathbb 1(D_i=1)\sum_{j=1}^n K_h(X_{1j})\,w^S_{ji}\\
&\quad+\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\Big\{\frac{\frac{1}{nh_4^{r^{(1)}}}\sum_{j}K_{h_4}(Z_{1j}-Z_{1i})\,\mathbb 1(D_j=1)\,E(Y(1)\mid X_j)}{\frac{1}{nh_4^{r^{(1)}}}\sum_{j}K_{h_4}(Z_{1j}-Z_{1i})\,\mathbb 1(D_j=1)}-E(Y(1)\mid X_i)\Big\}\\
&=:I_{n,2}+I_{n,3}+I_{n,4}+I_{n,5},
\end{aligned}
\]
where
\[
w^S_{ij}=\frac{K_{h_4}(Z_{1i}-Z_{1j})}{\sum_{l=1}^n K_{h_4}(Z_{1l}-Z_{1j})\,\mathbb 1(D_l=1)},\qquad
\epsilon_{1i}=Y_i-E(Y(1)\mid X_i).
\]
Similarly, we can decompose $\hat m_0(\beta_0^\top X)-\tau_0(x)$ as
\[
\begin{aligned}
&\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\,[\hat m_0(\beta_0^\top X_i)-\tau_0(x)]\\
&=\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\,[E(Y(0)\mid X_i)-\tau_0(x)]
+\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n \epsilon_{0i}\,\mathbb 1(D_i=0)\sum_{j=1}^n K_h(X_{1j})\,(w^S_{ij}-w^S_{ji})\\
&\quad+\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n \epsilon_{0i}\,\mathbb 1(D_i=0)\sum_{j=1}^n K_h(X_{1j})\,w^S_{ji}\\
&\quad+\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\Big\{\frac{\frac{1}{nh_4^{r^{(0)}}}\sum_{j}K_{h_4}(Z_{0j}-Z_{0i})\,\mathbb 1(D_j=0)\,E(Y(0)\mid X_j)}{\frac{1}{nh_4^{r^{(0)}}}\sum_{j}K_{h_4}(Z_{0j}-Z_{0i})\,\mathbb 1(D_j=0)}-E(Y(0)\mid X_i)\Big\}\\
&=:I'_{n,2}+I'_{n,3}+I'_{n,4}+I'_{n,5},
\end{aligned}
\]
where, for this control arm,
\[
w^S_{ij}=\frac{K_{h_4}(Z_{0i}-Z_{0j})}{\sum_{l=1}^n K_{h_4}(Z_{0l}-Z_{0j})\,\mathbb 1(D_l=0)},\qquad
\epsilon_{0i}=Y_i-E(Y(0)\mid X_i).
\]
$I_{n,3}$, $I'_{n,3}$, $I_{n,5}$ and $I'_{n,5}$ are $o_p(1)$, following the same arguments used to prove that $I_{n,3}=o_p(1)$ and $I_{n,5}=o_p(1)$ for Theorem ??. The details are omitted here. We now deal with $I_{n,4}$ and $I'_{n,4}$.

Lemma 1.2. Suppose assumptions (C1)--(C4), (A1) and (A5)--(A7) are satisfied.
Then, for each point $x$ in the support of $X_1$:

(1) If $X_1\not\subset\beta_1^\top X$ and $X_1\not\subset\beta_0^\top X$, with $s(2-k/q)+k>0$ and $0<q\le k$, we have (1.11)
\[
I_{n,4}=o_p(1),\qquad I'_{n,4}=o_p(1).
\]
The corresponding asymptotically linear representation is then
\[
\sqrt{nh^k}\,\{\hat\tau(x)-\tau(x)\}
=\frac{1}{\sqrt{nh^k}\,f(x)}\sum_{i=1}^n\{m_1(X_i)-m_0(X_i)-\tau(x)\}\,K_h(X_{1i})+o_p(1).
\]

(2) If $X_1\subset\beta_1^\top X$ and $X_1\not\subset\beta_0^\top X$, with $s(2-k/q)+k>0$ and $0<q\le k$, we have (1.12)
\[
I_{n,4}=\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n\epsilon_{1i}\,\mathbb 1(D_i=1)\,\frac{K_h(X_{1i})}{p(X_i)}+o_p(1),\qquad I'_{n,4}=o_p(1).
\]
Then we have
\[
\sqrt{nh^k}\,\{\hat\tau(x)-\tau(x)\}
=\frac{1}{\sqrt{nh^k}\,f(x)}\sum_{i=1}^n\{\Psi_1(X_i,Y_i,D_i)-\tau(x)\}\,K_h(X_{1i})+o_p(1).
\]

(3) If $X_1\not\subset\beta_1^\top X$ and $X_1\subset\beta_0^\top X$, with $s(2-k/q)+k>0$ and $0<q\le k$, we have (1.13)
\[
I_{n,4}=o_p(1),\qquad
I'_{n,4}=\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n\epsilon_{0i}\,\mathbb 1(D_i=0)\,\frac{K_h(X_{1i})}{1-p(X_i)}+o_p(1).
\]
The corresponding asymptotically linear representation is
\[
\sqrt{nh^k}\,\{\hat\tau(x)-\tau(x)\}
=\frac{1}{\sqrt{nh^k}\,f(x)}\sum_{i=1}^n\{\Psi_0(X_i,Y_i,D_i)-\tau(x)\}\,K_h(X_{1i})+o_p(1).
\]

(4) If $X_1\subset\beta_1^\top X$ and $X_1\subset\beta_0^\top X$, we have (1.14)
\[
I_{n,4}=\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n\epsilon_{1i}\,\mathbb 1(D_i=1)\,\frac{K_h(X_{1i})}{p(X_i)}+o_p(1),\qquad
I'_{n,4}=\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n\epsilon_{0i}\,\mathbb 1(D_i=0)\,\frac{K_h(X_{1i})}{1-p(X_i)}+o_p(1).
\]
We have
\[
\sqrt{nh^k}\,\{\hat\tau(x)-\tau(x)\}
=\frac{1}{\sqrt{nh^k}\,f(x)}\sum_{i=1}^n\{\Psi(X_i,Y_i,D_i)-\tau(x)\}\,K_h(X_{1i})+o_p(1).
\]

Proof of Lemma 1.2.
We need to show that $I_{n,4}=o_p(1)$ if $X_1\not\subset\beta_1^\top X$ with $s(2-k/q)+k>0$ and $0<q\le k$. Let $X_1=v_1$, $\beta_1^\top X=v_2$, and denote $\big(\frac{v_1-v_{1i}}{h_4},\frac{v_2-v_{2i}}{h_4}\big)$ as $(t_1,t_2)$. We have
\[
\begin{aligned}
E\Big\{\sum_{j=1}^n K_h(X_{1j})\,w^S_{ji}\,\Big|\,X_i\Big\}
&=\frac{1+o_p(1)}{h_4^{q}\,f(v_{2i})\,p(v_{2i})}\int K\Big(\frac{v_2-\beta_1^\top X_i}{h_4}\Big)\,K\Big(\frac{v_1-x}{h}\Big)\,f(v_1,v_2)\,dv_1\,dv_2\\
&=\frac{1+o_p(1)}{f(v_{2i})\,p(v_{2i})}\int K(t_2)\,K\Big(\frac{v_{1i}-x}{h}+\frac{t_1h_4}{h}\Big)\,f(v_{1i}+h_4t_1,\,v_{2i}+h_4t_2)\,dt_1\,dt_2\\
&=\frac{h_4^q}{h^q}\,K\Big(\frac{v_{1i}-x}{h}\Big)\,\frac{f(v_{1i},v_{2i})}{f(v_{2i})\,p(v_{2i})}\int K(t_2)\,dt_1\,dt_2
+\frac{h_4^{q+1}}{h^{q+1}}\,K'\Big(\frac{v_{1i}-x}{h}\Big)\,\frac{f(v_{1i},v_{2i})}{f(v_{2i})\,p(v_{2i})}\int t_1K(t_2)\,dt_1\,dt_2\\
&\quad+o_p\Big(\frac{h_4^{q+1}}{h^{q+1}}\Big),
\end{aligned}
\]
where $f(v_{1i},v_{2i})$ is the joint density function of $(X_1,\beta_1^\top X)$. Under assumptions (A5)--(A7), we have
\[
E\Big\{\sum_{j=1}^n K_h(X_{1j})\,w^S_{ji}\,\Big|\,X_i\Big\}
=C\,\frac{h_4^q}{h^q}\,K_h(X_{1i})\,\frac{f(X_{1i},\beta_1^\top X_i)}{f(X_{1i})\,p(\beta_1^\top X_i)}
+O_p\Big(\frac{h_4^{q+1}}{h^{q+1}}\Big)
=O_p\Big(\frac{h_4^{q}}{h^{q}}+\frac{h_4^{q+1}}{h^{q+1}}\Big).
\]
Hence, under assumptions (A6), (A7), $s(2-k/q)+k>0$ and $0<q\le k$,
\[
\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n\epsilon_{1i}\,\mathbb 1(D_i=1)\sum_{j=1}^n K_h(X_{1j})\,w^S_{ji}
=\frac{1}{\sqrt n}\sum_{i=1}^n\epsilon_{1i}\,\mathbb 1(D_i=1)\,O_p\Big(\frac{h_4^{q}}{h^{q+k/2}}+\frac{h_4^{q+1}}{h^{q+1+k/2}}\Big)=o_p(1).
\]
Analogously, we get $I'_{n,4}=o_p(1)$ if $X_1\not\subset\beta_0^\top X$. Next, we prove that
\[
I_{n,4}=\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n\epsilon_{1i}\,\mathbb 1(D_i=1)\,\frac{K_h(X_{1i})}{p(X_i)}+o_p(1)
\]
if $X_1\subset\beta_1^\top X$. Since the case $X_1\subset\beta_1^\top X$ is parallel to the case $X_1\subset X$ in the nonparametric setting, the desired result follows by an argument paralleling the derivation of equation (1.8). Similarly, $I'_{n,4}=\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n\epsilon_{0i}\,\mathbb 1(D_i=0)\,\frac{K_h(X_{1i})}{1-p(X_i)}+o_p(1)$ if $X_1\subset\beta_0^\top X$. The proof of Lemma 1.2 is concluded. $\Box$

Proof of Corollary ??. Consider the case where $X_1\not\subset\widetilde X\in\mathbb R^{q}$. Similarly as before, we derive that (1.15)
\[
\sqrt{nh^k}\,(\hat\tau(x)-\tau(x))
=\frac{\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\,[\hat m_1(\widetilde X_i)-\tau_1(x)]}{\frac{1}{nh^k}\sum_{i=1}^n K_h(X_{1i})}
-\frac{\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\,[\hat m_0(\widetilde X_i)-\tau_0(x)]}{\frac{1}{nh^k}\sum_{i=1}^n K_h(X_{1i})},
\]
where
\[
\hat m_1(\widetilde X_i)=\frac{\frac{1}{nh_2^q}\sum_{j=1}^n K_{h_2}(\widetilde X_j-\widetilde X_i)\,Y_j\,\mathbb 1(D_j=1)}{\frac{1}{nh_2^q}\sum_{j=1}^n K_{h_2}(\widetilde X_j-\widetilde X_i)\,\mathbb 1(D_j=1)},\qquad
\hat m_0(\widetilde X_i)=\frac{\frac{1}{nh_2^q}\sum_{j=1}^n K_{h_2}(\widetilde X_j-\widetilde X_i)\,Y_j\,\mathbb 1(D_j=0)}{\frac{1}{nh_2^q}\sum_{j=1}^n K_{h_2}(\widetilde X_j-\widetilde X_i)\,\mathbb 1(D_j=0)}.
\]
Some similar calculations lead to, for $\hat m_1(\widetilde X_i)-\tau_1(x)$,
\[
\begin{aligned}
&\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\,[\hat m_1(\widetilde X_i)-\tau_1(x)]\\
&=\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\,[E(Y(1)\mid X_i)-\tau_1(x)]
+\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n \epsilon_{1i}\,\mathbb 1(D_i=1)\sum_{j=1}^n K_h(X_{1j})\,(\bar w^N_{ij}-\bar w^N_{ji})\\
&\quad+\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n \epsilon_{1i}\,\mathbb 1(D_i=1)\sum_{j=1}^n K_h(X_{1j})\,\bar w^N_{ji}\\
&\quad+\frac{1}{\sqrt{nh^k}}\sum_{i=1}^n K_h(X_{1i})\Big\{\frac{\frac{1}{nh_2^q}\sum_{j}K_{h_2}(\widetilde X_j-\widetilde X_i)\,\mathbb 1(D_j=1)\,E(Y(1)\mid X_j)}{\frac{1}{nh_2^q}\sum_{j}K_{h_2}(\widetilde X_j-\widetilde X_i)\,\mathbb 1(D_j=1)}-E(Y(1)\mid X_i)\Big\}\\
&=:I_{n,2}+I_{n,3}+I_{n,4}+I_{n,5},
\end{aligned}
\]
where
\[
\bar w^N_{ij}=\frac{K_{h_2}(\widetilde X_i-\widetilde X_j)}{\sum_{l=1}^n K_{h_2}(\widetilde X_l-\widetilde X_j)\,\mathbb 1(D_l=1)}.
\]
Then we can prove that $I_{n,3}$ and $I_{n,5}$ are $o_p(1)$ by the same arguments as those used to handle $I_{n,3}$ and $I_{n,5}$ in the proof of Theorem ??. Owing to $X_1\not\subset\widetilde X$, similar arguments to those proving Lemma 1.2 imply that $I_{n,4}=o_p(1)$. The proof of Corollary ?? is concluded. $\Box$

Proof of Corollary ??. From the proof of Theorem ??, we can see that
\[
E\Big\{\sum_{j=1}^n K_h(X_{1j})\,w^N_{ji}\,\Big|\,X_i\Big\}=O_p\Big(h+\frac{h_2^s}{h^s}\Big),
\]
and $\sqrt{nh^k}\,\big(h_2^s+\sqrt{\log(n)/(nh_2^p)}\big)=o(1)$. Then NRCATE shares the same asymptotic distribution as PRCATE. For SRCATE, we can use similar arguments to show the same result. The proof is finished. $\Box$

References
Abrevaya, J., Hsu, Y.-C., & Lieli, R. P. (2015). Estimating conditional average treatment effects. Journal of Business & Economic Statistics, 33(4), 485-505.

Luo, W., Zhu, Y., & Ghosh, D. (2017). On estimating regression-based causal effects using sufficient dimension reduction. Biometrika, 104(1), 51-65.
Lu Li
School of Statistics
East China Normal University
Shanghai, 200062, China
E-mail:
Niwen Zhou
School of Statistics
Beijing Normal University
Beijing, 100875, China
E-mail: [email protected]
Lixing Zhu
School of Statistics
Beijing Normal University
Beijing, 100875, China
and
Department of Mathematics
Hong Kong Baptist University
Kowloon Tong, Hong Kong, China
E-mail: [email protected]

Table 1.1
The distribution of √ nh [ b τ ( x ) − τ ( x )] for model 1 n=200 n=500OR PR SR NR N S P O OR PR SR NR N S P O h = 0 . n − / , h = 0 . n − / , h = 0 . n − / -0.4 0.178 0.224 0.213 0.223 0.352 0.361 0.396 0.407 0.188 0.224 0.208 0.225 0.371 0.388 0.404 0.409-0.2 0.182 0.192 0.181 0.196 0.351 0.365 0.377 0.380 0.186 0.193 0.191 0.213 0.368 0.383 0.389 0.395SD 0 0.199 0.210 0.231 0.232 0.420 0.440 0.460 0.491 0.198 0.205 0.206 0.217 0.415 0.430 0.466 0.4760.2 0.208 0.216 0.248 0.243 0.466 0.476 0.503 0.525 0.195 0.203 0.231 0.226 0.423 0.438 0.484 0.5090.4 0.195 0.215 0.239 0.236 0.377 0.395 0.415 0.426 0.202 0.222 0.250 0.247 0.364 0.372 0.415 0.432-0.4 0.005 0.011 -0.032 0.001 -0.024 -0.001 -0.004 -0.006 0.021 0.026 -0.097 0.011 0.022 0.055 0.043 0.037-0.2 -0.002 0.006 0.094 0.043 -0.011 0.011 0.014 0.017 -0.005 -0.003 0.119 0.034 -0.036 -0.008 -0.013 -0.013BIAS 0 0.005 0.013 0.040 0.032 -0.033 0.004 -0.007 0.012 0.007 0.007 0.057 0.030 -0.026 0.006 -0.004 0.0070.2 0.005 0.009 0.008 0.013 -0.006 0.035 0.001 0.014 0.003 0.001 0.005 0.007 -0.030 -0.004 -0.005 0.0060.4 0.006 0.004 -0.014 -0.008 0.033 0.066 0.041 0.027 0.015 0.013 0.000 0.008 0.032 0.038 0.017 0.012-0.4 0.032 0.050 0.047 0.050 0.125 0.130 0.157 0.165 0.036 0.051 0.052 0.051 0.138 0.153 0.165 0.169-0.2 0.033 0.037 0.042 0.040 0.124 0.133 0.142 0.145 0.035 0.037 0.051 0.047 0.137 0.147 0.152 0.156MSE 0 0.040 0.044 0.055 0.055 0.177 0.194 0.212 0.241 0.039 0.042 0.046 0.048 0.173 0.185 0.217 0.2260.2 0.043 0.047 0.061 0.059 0.217 0.228 0.253 0.276 0.038 0.041 0.054 0.051 0.180 0.192 0.234 0.2590.4 0.038 0.046 0.057 0.056 0.143 0.160 0.174 0.182 0.041 0.049 0.062 0.061 0.133 0.140 0.173 0.187 h = 0 . n − / , h = 0 . n − / , h = 0 . 
n − / -0.4 0.195 0.235 0.219 0.227 0.358 0.371 0.403 0.392 0.181 0.220 0.212 0.225 0.365 0.381 0.406 0.405-0.2 0.191 0.198 0.196 0.211 0.386 0.398 0.413 0.410 0.192 0.201 0.194 0.214 0.373 0.386 0.406 0.408SD 0 0.199 0.206 0.214 0.216 0.391 0.415 0.429 0.435 0.196 0.209 0.219 0.230 0.418 0.436 0.466 0.4840.2 0.202 0.207 0.235 0.231 0.440 0.455 0.495 0.525 0.203 0.209 0.231 0.227 0.419 0.431 0.468 0.4930.4 0.207 0.222 0.248 0.245 0.375 0.380 0.429 0.441 0.196 0.212 0.231 0.229 0.361 0.370 0.416 0.426-0.4 0.011 0.019 -0.043 0.012 0.023 0.046 0.038 0.034 0.015 0.003 -0.126 -0.011 -0.008 0.024 0.005 0.000-0.2 0.000 0.001 0.081 0.035 -0.033 -0.011 -0.002 -0.006 0.011 0.009 0.126 0.045 -0.021 0.002 -0.002 -0.001BIAS 0 -0.012 -0.016 0.013 0.006 -0.033 0.000 -0.009 -0.003 0.009 0.013 0.064 0.038 -0.012 0.010 0.014 0.0270.2 -0.003 -0.008 -0.008 -0.004 -0.041 -0.014 -0.035 -0.019 -0.009 -0.004 -0.002 -0.001 -0.019 -0.008 -0.009 0.0070.4 -0.007 -0.010 -0.025 -0.022 0.017 0.037 0.030 0.026 0.017 0.019 0.010 0.015 0.055 0.055 0.046 0.047-0.4 0.038 0.056 0.050 0.051 0.129 0.140 0.164 0.155 0.033 0.048 0.061 0.051 0.133 0.145 0.165 0.164-0.2 0.037 0.039 0.045 0.046 0.150 0.159 0.171 0.168 0.037 0.040 0.053 0.048 0.139 0.149 0.165 0.167MSE 0 0.040 0.043 0.046 0.047 0.154 0.172 0.184 0.189 0.039 0.044 0.052 0.054 0.175 0.190 0.217 0.2350.2 0.041 0.043 0.055 0.053 0.195 0.207 0.246 0.276 0.041 0.044 0.053 0.051 0.176 0.186 0.219 0.2430.4 0.043 0.049 0.062 0.061 0.141 0.146 0.185 0.195 0.039 0.045 0.053 0.053 0.133 0.140 0.175 0.184 h = 0 . n − / , h = 0 . n − / , h = 0 . 
n − / -0.4 0.183 0.222 0.218 0.219 0.357 0.375 0.405 0.398 0.191 0.214 0.207 0.215 0.376 0.390 0.400 0.408-0.2 0.195 0.203 0.186 0.197 0.360 0.366 0.380 0.384 0.186 0.196 0.183 0.198 0.364 0.372 0.386 0.391SD 0 0.193 0.206 0.214 0.217 0.441 0.453 0.474 0.486 0.193 0.201 0.207 0.211 0.432 0.442 0.478 0.4910.2 0.200 0.213 0.237 0.232 0.460 0.476 0.516 0.525 0.194 0.202 0.230 0.227 0.479 0.489 0.526 0.5410.4 0.198 0.220 0.241 0.239 0.407 0.414 0.453 0.460 0.211 0.231 0.257 0.255 0.406 0.408 0.455 0.474-0.4 0.005 0.000 -0.064 -0.009 -0.013 0.010 0.010 0.007 -0.004 -0.004 -0.130 -0.016 -0.006 0.019 0.009 0.020-0.2 0.001 -0.002 0.079 0.049 -0.044 -0.029 -0.024 -0.022 -0.002 -0.004 0.118 0.041 -0.025 -0.007 -0.007 -0.004BIAS 0 0.003 0.000 0.022 0.017 -0.029 -0.010 -0.018 -0.005 0.008 0.006 0.056 0.034 -0.034 -0.014 -0.003 -0.0080.2 0.016 0.013 0.013 0.018 -0.034 -0.001 -0.026 -0.016 0.000 -0.001 0.002 0.006 -0.030 -0.026 -0.022 -0.0280.4 0.004 -0.001 -0.014 -0.015 0.014 0.035 0.013 0.000 0.021 0.021 0.009 0.010 0.030 0.036 0.008 -0.003-0.4 0.034 0.049 0.052 0.048 0.128 0.141 0.164 0.159 0.037 0.046 0.060 0.046 0.142 0.152 0.160 0.167-0.2 0.038 0.041 0.041 0.041 0.131 0.134 0.145 0.148 0.035 0.038 0.047 0.041 0.133 0.139 0.149 0.153MSE 0 0.037 0.042 0.046 0.047 0.195 0.205 0.225 0.236 0.037 0.040 0.046 0.046 0.188 0.195 0.228 0.2410.2 0.040 0.045 0.056 0.054 0.213 0.226 0.267 0.276 0.038 0.041 0.053 0.052 0.230 0.240 0.278 0.2930.4 0.039 0.048 0.058 0.057 0.166 0.172 0.205 0.211 0.045 0.054 0.066 0.065 0.165 0.168 0.207 0.225 Table 1.2
The distribution of √ nh [ b τ ( x ) − τ ( x )] for model 2 n=200 n=500OR PR SR NR N S P O OR PR SR NR N S P O h = 0 . n − / , h = 0 . n − / , h = 0 . n − / -0.4 0.384 0.386 0.391 0.409 0.950 1.103 1.101 1.098 0.341 0.348 0.356 0.376 0.959 1.066 1.061 1.079-0.2 0.389 0.395 0.399 0.430 0.968 1.100 1.106 1.114 0.360 0.362 0.365 0.396 0.962 1.091 1.088 1.099SD 0 0.386 0.388 0.393 0.419 1.000 1.165 1.141 1.136 0.373 0.376 0.380 0.397 0.940 1.077 1.053 1.0640.2 0.379 0.378 0.381 0.406 1.011 1.175 1.122 1.116 0.357 0.361 0.368 0.398 0.998 1.140 1.120 1.1210.4 0.384 0.390 0.412 0.435 1.011 1.150 1.105 1.103 0.390 0.394 0.413 0.438 1.045 1.182 1.129 1.157-0.4 0.010 0.010 0.059 0.031 -0.667 -0.063 0.081 0.089 0.020 0.018 0.108 0.028 -1.033 -0.118 0.032 0.015-0.2 -0.017 -0.017 0.012 0.003 -0.740 -0.158 -0.008 0.011 -0.005 -0.007 0.033 -0.004 -1.078 -0.151 -0.023 -0.036BIAS 0 -0.013 -0.015 -0.022 -0.017 -0.751 -0.143 -0.025 -0.009 0.004 0.002 -0.003 0.028 -0.996 -0.082 0.050 0.0520.2 -0.002 -0.002 -0.023 0.013 -0.650 0.004 0.058 0.078 -0.011 -0.013 -0.068 -0.031 -0.968 -0.008 0.028 0.0350.4 0.060 0.058 0.013 0.041 -0.566 0.095 0.103 0.104 0.005 0.004 -0.067 -0.007 -0.892 0.120 0.020 0.019-0.4 0.148 0.149 0.156 0.168 1.348 1.220 1.218 1.213 0.117 0.121 0.139 0.142 1.987 1.150 1.127 1.165-0.2 0.152 0.157 0.160 0.185 1.483 1.234 1.222 1.240 0.129 0.131 0.134 0.157 2.087 1.213 1.184 1.209MSE 0 0.149 0.151 0.155 0.176 1.564 1.377 1.303 1.291 0.139 0.142 0.145 0.158 1.876 1.166 1.111 1.1360.2 0.143 0.143 0.146 0.165 1.445 1.380 1.262 1.252 0.128 0.130 0.140 0.160 1.932 1.300 1.256 1.2580.4 0.151 0.156 0.170 0.191 1.342 1.332 1.232 1.226 0.152 0.155 0.175 0.192 1.888 1.411 1.275 1.339 h = 0 . n − / , h = 0 . n − / , h = 0 . 
n − / -0.4 0.368 0.374 0.376 0.397 1.003 1.204 1.140 1.161 0.346 0.348 0.358 0.372 0.945 1.093 1.077 1.102-0.2 0.399 0.397 0.407 0.443 1.011 1.192 1.183 1.177 0.368 0.369 0.371 0.394 0.889 1.030 1.028 1.039SD 0 0.389 0.390 0.392 0.413 1.029 1.192 1.188 1.197 0.362 0.364 0.373 0.408 0.966 1.101 1.077 1.0990.2 0.387 0.387 0.395 0.411 1.048 1.254 1.207 1.198 0.328 0.330 0.333 0.376 0.966 1.097 1.078 1.1040.4 0.391 0.398 0.420 0.432 1.041 1.202 1.131 1.147 0.370 0.377 0.390 0.391 1.019 1.172 1.089 1.114-0.4 0.023 0.027 0.079 0.019 -0.811 -0.173 -0.012 -0.030 -0.023 -0.020 0.070 -0.015 -1.169 -0.194 -0.027 -0.018-0.2 0.003 0.005 0.033 0.015 -0.754 -0.101 0.052 0.049 0.005 0.007 0.046 0.002 -1.101 -0.141 0.039 0.050BIAS 0 0.008 0.009 0.011 0.019 -0.781 -0.109 -0.014 -0.005 0.000 0.001 -0.007 0.001 -1.103 -0.121 0.013 0.0210.2 0.027 0.025 -0.006 0.031 -0.653 0.054 0.122 0.119 0.003 0.003 -0.046 0.016 -1.060 -0.011 0.054 0.0520.4 0.023 0.020 -0.025 0.018 -0.588 0.124 0.157 0.128 0.014 0.013 -0.058 0.008 -0.986 0.057 0.027 0.021-0.4 0.136 0.141 0.147 0.158 1.664 1.479 1.300 1.348 0.120 0.121 0.133 0.139 2.258 1.232 1.161 1.215-0.2 0.159 0.158 0.166 0.196 1.592 1.431 1.402 1.388 0.135 0.136 0.139 0.155 2.002 1.081 1.058 1.083MSE 0 0.151 0.152 0.154 0.171 1.668 1.433 1.412 1.433 0.131 0.133 0.139 0.166 2.150 1.227 1.159 1.2070.2 0.150 0.151 0.156 0.170 1.525 1.577 1.473 1.450 0.108 0.109 0.113 0.141 2.057 1.203 1.165 1.2220.4 0.154 0.159 0.177 0.187 1.429 1.460 1.304 1.331 0.137 0.142 0.155 0.153 2.011 1.376 1.187 1.241 h = 0 . n − / , h = 0 . n − / , h = 0 . 
n − / -0.4 0.364 0.370 0.379 0.409 0.997 1.172 1.140 1.164 0.358 0.363 0.370 0.389 0.970 1.134 1.109 1.140-0.2 0.388 0.392 0.406 0.436 1.042 1.225 1.227 1.230 0.364 0.362 0.365 0.408 0.901 1.086 1.054 1.054SD 0 0.396 0.398 0.413 0.446 0.992 1.180 1.166 1.161 0.371 0.374 0.382 0.417 0.919 1.113 1.077 1.0840.2 0.388 0.389 0.397 0.436 1.029 1.254 1.161 1.182 0.369 0.370 0.374 0.389 1.021 1.199 1.148 1.1680.4 0.375 0.379 0.403 0.430 1.151 1.360 1.261 1.280 0.364 0.370 0.386 0.409 1.049 1.243 1.132 1.160-0.4 -0.001 0.002 0.047 0.008 -0.838 -0.202 -0.010 -0.019 -0.013 -0.010 0.087 0.000 -1.255 -0.212 -0.034 -0.021-0.2 0.008 0.012 0.038 0.020 -0.872 -0.245 -0.067 -0.058 0.021 0.022 0.061 0.018 -1.145 -0.086 0.120 0.121BIAS 0 0.022 0.023 0.023 0.032 -0.850 -0.196 -0.036 -0.030 -0.001 -0.003 -0.014 -0.005 -1.265 -0.190 -0.023 -0.0240.2 -0.007 -0.007 -0.036 -0.014 -0.839 -0.140 -0.053 -0.042 0.007 0.005 -0.047 -0.001 -1.213 -0.103 0.006 -0.0070.4 0.011 0.007 -0.036 0.000 -0.759 -0.075 -0.013 -0.021 -0.005 -0.009 -0.082 -0.013 -1.191 -0.073 -0.075 -0.103-0.4 0.133 0.137 0.146 0.167 1.695 1.414 1.299 1.355 0.129 0.132 0.144 0.151 2.517 1.330 1.230 1.300-0.2 0.150 0.154 0.166 0.191 1.846 1.562 1.510 1.515 0.133 0.132 0.137 0.167 2.121 1.187 1.125 1.126MSE 0 0.158 0.159 0.171 0.200 1.706 1.431 1.360 1.350 0.138 0.140 0.146 0.174 2.445 1.275 1.161 1.1760.2 0.150 0.151 0.159 0.190 1.764 1.592 1.351 1.400 0.136 0.137 0.142 0.151 2.512 1.447 1.319 1.3630.4 0.141 0.144 0.164 0.185 1.901 1.856 1.590 1.638 0.132 0.137 0.156 0.168 2.519 1.551 1.286 1.356 Table 1.3
The distribution of √ nh [ b τ ( x ) − τ ( x )] for model 3 n=200 n=500OR PR SR NR N S P O OR PR SR NR N S P O h = 0 . n − / , h = 0 . n − / , h = 0 . n − / -0.4 0.320 0.321 0.328 0.319 0.516 0.532 0.570 0.548 0.306 0.310 0.315 0.307 0.502 0.501 0.501 0.499-0.2 0.341 0.343 0.352 0.345 0.477 0.477 0.514 0.526 0.298 0.301 0.304 0.292 0.476 0.472 0.497 0.501SD 0 0.301 0.306 0.312 0.304 0.450 0.458 0.487 0.495 0.292 0.296 0.299 0.290 0.484 0.466 0.512 0.5140.2 0.320 0.320 0.322 0.313 0.493 0.486 0.521 0.514 0.296 0.298 0.304 0.296 0.470 0.455 0.489 0.4880.4 0.306 0.314 0.319 0.312 0.501 0.525 0.534 0.530 0.301 0.305 0.308 0.296 0.473 0.477 0.483 0.491-0.4 -0.023 -0.025 -0.028 -0.044 -0.038 -0.029 0.001 0.003 0.026 0.027 0.025 0.005 0.027 0.021 0.050 0.054-0.2 0.026 0.022 0.020 0.035 0.004 0.009 -0.006 -0.009 -0.006 -0.006 -0.003 0.010 0.013 0.022 0.003 0.004BIAS 0 0.003 -0.001 0.011 0.035 0.048 0.051 0.019 0.022 0.010 0.011 0.013 0.042 0.020 0.043 -0.014 -0.0150.2 0.003 0.000 0.001 0.014 0.015 0.026 0.008 0.010 -0.012 -0.011 -0.010 0.009 0.033 0.044 0.023 0.0220.4 -0.004 -0.006 -0.011 -0.018 -0.023 -0.012 0.011 0.013 0.001 0.001 -0.004 -0.023 -0.044 -0.049 -0.010 -0.008-0.4 0.103 0.104 0.109 0.103 0.267 0.284 0.324 0.301 0.094 0.097 0.100 0.094 0.252 0.252 0.253 0.252-0.2 0.117 0.118 0.124 0.120 0.227 0.227 0.265 0.277 0.089 0.091 0.092 0.085 0.227 0.223 0.247 0.251MSE 0 0.091 0.094 0.098 0.094 0.205 0.212 0.237 0.246 0.085 0.088 0.090 0.086 0.234 0.219 0.262 0.2650.2 0.102 0.102 0.104 0.098 0.244 0.237 0.272 0.264 0.088 0.089 0.093 0.088 0.222 0.209 0.240 0.2390.4 0.093 0.098 0.102 0.097 0.251 0.276 0.285 0.281 0.091 0.093 0.095 0.088 0.226 0.230 0.234 0.241 h = 0 . n − / , h = 0 . n − / , h = 0 . 
n − / -0.4 0.297 0.301 0.313 0.306 0.465 0.480 0.502 0.506 0.285 0.290 0.296 0.289 0.497 0.512 0.507 0.513-0.2 0.311 0.313 0.319 0.314 0.423 0.428 0.467 0.462 0.303 0.307 0.309 0.300 0.471 0.475 0.482 0.478SD 0 0.316 0.320 0.322 0.322 0.470 0.465 0.532 0.522 0.321 0.325 0.331 0.325 0.483 0.487 0.521 0.5210.2 0.318 0.323 0.328 0.323 0.460 0.462 0.501 0.502 0.291 0.298 0.301 0.297 0.468 0.471 0.484 0.4850.4 0.301 0.305 0.306 0.305 0.489 0.493 0.530 0.516 0.310 0.311 0.312 0.307 0.518 0.536 0.524 0.522-0.4 0.003 0.002 -0.001 -0.013 -0.019 -0.004 0.024 0.025 -0.003 -0.004 -0.007 -0.025 -0.043 -0.025 -0.018 -0.015-0.2 -0.025 -0.023 -0.023 -0.013 -0.023 -0.024 -0.043 -0.044 0.011 0.011 0.015 0.029 0.007 0.016 -0.004 -0.004BIAS 0 0.001 0.003 0.009 0.028 0.019 0.028 -0.009 -0.014 0.014 0.015 0.023 0.051 0.050 0.059 0.012 0.0180.2 0.008 0.009 0.017 0.025 0.024 0.029 0.017 0.011 0.010 0.011 0.015 0.026 0.019 0.032 0.012 0.0110.4 -0.010 -0.009 -0.014 -0.025 -0.055 -0.048 -0.010 -0.012 -0.004 -0.004 -0.010 -0.025 -0.034 -0.013 -0.009 -0.008-0.4 0.088 0.090 0.098 0.094 0.217 0.230 0.253 0.257 0.081 0.084 0.088 0.084 0.248 0.263 0.257 0.264-0.2 0.097 0.099 0.102 0.099 0.179 0.183 0.220 0.216 0.092 0.094 0.095 0.091 0.222 0.226 0.232 0.229MSE 0 0.100 0.103 0.104 0.105 0.221 0.217 0.283 0.272 0.103 0.106 0.110 0.108 0.236 0.241 0.271 0.2720.2 0.101 0.104 0.108 0.105 0.212 0.215 0.251 0.252 0.085 0.089 0.091 0.089 0.220 0.223 0.234 0.2350.4 0.091 0.093 0.094 0.094 0.243 0.246 0.281 0.267 0.096 0.097 0.097 0.095 0.270 0.288 0.275 0.273 h = 0 . n − / , h = 0 . n − / , h = 0 . n − /4