Local Composite Quantile Regression for Regression Discontinuity
Xiao Huang ∗ Zhaoguo Zhan † September 9, 2020
Abstract
We introduce the local composite quantile regression (LCQR) to causal inference in regression discontinuity (RD) designs. Kai et al. (2010) study the efficiency property of LCQR, while we show that its nice boundary performance translates to accurate estimation of treatment effects in RD under a variety of data generating processes. Moreover, we propose a bias-corrected and standard error-adjusted t-test for inference, which leads to confidence intervals with good coverage probabilities. A bandwidth selector is also discussed. For illustration, we conduct a simulation study and revisit a classic example from Lee (2008). A companion R package rdcqr is developed.

JEL Classification: C18, C21
Keywords: Regression discontinuity, Treatment effect, Local composite quantile regression.

∗ Corresponding author. Department of Economics, Finance, and Quantitative Analysis, Coles College of Business, Kennesaw State University, GA 30144, USA. Email: [email protected].
† Department of Economics, Finance, and Quantitative Analysis, Coles College of Business, Kennesaw State University, GA 30144, USA. Email: [email protected].

1 Introduction
Over the past few decades, regression discontinuity (RD) has become a popular quasi-experimental method to identify the local average treatment effect. In its simplest form, the sharp RD design, the probability for a unit $i$ to receive treatment is a discontinuous function of an underlying covariate $X_i$, i.e., unit $i$ receives treatment if and only if the covariate exceeds a pre-specified cutoff. Under some smoothness assumptions, units in a small neighborhood of the cutoff share similar characteristics, so the difference between the outcomes above and below the cutoff can be interpreted as the local average treatment effect. This is a simple yet powerful method to identify causal effects locally.

Empirical studies in RD date back to Thistlethwaite and Campbell (1960), who investigated the impact on academic performance of a merit award based on whether a test score attains a cutoff. The seminal work in Hahn et al. (2001) showed that, if conditional expectations are continuous in the covariate around the cutoff, the treatment effect can be identified by taking the difference between average outcomes above and below the cutoff. This finding significantly weakened previous identification conditions and paved the way for the rapid development of the RD literature. Examples of empirical work include the analysis of electoral advantage to incumbency in U.S. House elections in Lee (2008) and the analysis of changes in children's mortality rates due to the implementation of the Head Start program in Ludwig and Miller (2007). Recent theoretical developments include the optimal convergence rate in RD estimation in Porter (2003), the optimal bandwidth choice in Imbens and Kalyanaraman (2012), quantile treatment effect estimation in Frandsen et al. (2012), the robust confidence intervals in Calonico et al. (2014), the null-restricted $t$-test in Feir et al. (2016), and the honest confidence intervals in Armstrong and Kolesár (2018, 2020). See also the literature reviews in Imbens and Lemieux (2008) and Lee and Lemieux (2010).

Central to a large portion of the empirical and theoretical work on RD is the use of local linear regression (LLR), whose theoretical properties are discussed in Fan and Gijbels (1992), Fan (1993), and Ruppert and Wand (1994), as well as Fan and Gijbels (1996). Automatic boundary carpentry and high minimax efficiency (Cheng et al. (1997)) have made LLR an ideal tool for estimation in RD. Since LLR has the smallest asymptotic mean squared error (MSE) in the class of linear estimators and is simple to use, the RD research community has widely adopted it as a default tool.

The goal of this paper is to introduce another nonparametric smoother for estimation and inference in RD. Although LLR is the best linear smoother, it is possible to use an alternative estimator in RD if we consider the larger class of nonlinear smoothers. One example is the local composite quantile regression (LCQR) method in Kai et al. (2010), who show that LCQR has some efficiency gain over LLR when data are non-normal. The literature related to LCQR is growing. Several recent examples include the partial linear model in Kai et al. (2011), weighted LCQR smoothing in Zhao and Xiao (2014), LCQR smoothing in Harris recurrent Markov processes in Li and Li (2016), and LCQR with mixed data in Huang and Lin (2020), just to name a few. To use LCQR, the researcher chooses a finite number of quantiles, say 5, and uses a local polynomial to estimate the quantile (the intercept) at each of the 5 quantile positions.
Averaging the 5 quantile estimates gives an estimate for the conditional mean.

Figure 1: Estimates of LLR and LCQR with a sample size of 400 and 400 replications. Both methods use the same direct plug-in bandwidth in Ruppert et al. (1995); $m(X) = \sin(5\pi X)$. Panel (a): LLR versus LCQR; panel (b): box plots of the estimates near boundary 0 and near boundary 1.

To motivate the use of LCQR, consider the nonlinear model in Ruppert et al. (1995), $Y = \sin(5\pi X) + \sigma_0\,\epsilon$, where $\sigma_0$ is a small noise scale, $\epsilon$ follows a two-component mixture normal distribution, and $X$ follows a uniform distribution on $[0, 1]$. Figure 1 compares the two estimators under this non-normal design; the box plots in panel (b) summarize their estimates near the two boundaries.

Our contributions are as follows. First, we show that the good boundary performance of LCQR translates into accurate estimation of treatment effects in RD under a variety of data generating processes. Second, similar to the robust $t$-test in Calonico et al. (2014) for conducting inference on causal effects, we propose a $t$-test that adjusts both the bias and the standard error (s.e.) of the LCQR estimator, and the resulting confidence intervals are shown to have good coverage probabilities. Third, we further discuss a new bandwidth selector that is based on an adjusted MSE. As a byproduct of our research, we also develop an R package rdcqr that implements the method in this paper.

The rest of the paper unfolds as follows. Section 2 sets up the notation for RD designs and introduces the LCQR method. Section 3 discusses the bias correction and standard error adjustment for inference on the boundary. Section 4 presents a simulation experiment. An empirical illustration of LCQR in RD is provided in Section 5 using the data from Lee (2008). Section 6 concludes. The Appendix contains all the proofs and additional figures and tables.
2 Regression discontinuity and LCQR

We introduce the notation for RD and the concept of LCQR in this section. (The rdcqr package can be downloaded from https://github.com/xhuang20/rdcqr.)

2.1 Regression discontinuity: setup and notation

Consider the triplet $(Y_i, X_i, T_i)$, $i = 1, \cdots, n$, in a standard RD design, where $Y_i$ is the observed outcome for individual $i$, $X_i$ is the covariate that determines the assignment of treatment, and $T_i$ equals 1 if individual $i$ receives treatment and 0 otherwise. For individual $i$ to receive treatment, a threshold value for $X_i$ is set. Without loss of generality, we assume the threshold is 0, and the treatment assignment rule becomes that individual $i$ is assigned to the treatment group if $X_i \geq 0$. Throughout the paper, the signs $+$ and $-$ will be used to denote data or quantities associated with $X_i \geq 0$ and $X_i < 0$, respectively.

We use a general nonparametric model to describe the relationship between $Y_i$ and $X_i$:
$$Y_{+,i} = m_{Y+}(X_{+,i}) + \sigma_{\epsilon_{Y+}}(X_{+,i})\,\epsilon_{Y+,i}, \quad (1)$$
$$Y_{-,i} = m_{Y-}(X_{-,i}) + \sigma_{\epsilon_{Y-}}(X_{-,i})\,\epsilon_{Y-,i}, \quad (2)$$
where $m_{Y+}$, $m_{Y-}$, $\sigma_{\epsilon_{Y+}}$, $\sigma_{\epsilon_{Y-}}$, $\epsilon_{Y+}$ and $\epsilon_{Y-}$ are the conditional mean functions, conditional standard deviation functions, and error terms with unit variance, respectively.

In a sharp RD design, whether individual $i$ receives treatment is determined by whether $X_i \geq 0$.
One is interested in measuring the average treatment effect at the threshold 0:
$$\tau_{\text{sharp}} = m_{Y+}(0) - m_{Y-}(0), \quad (3)$$
where
$$m_{Y+}(0) = \lim_{x_+ \downarrow 0} m_{Y+}(x_+) \quad \text{and} \quad m_{Y-}(0) = \lim_{x_- \uparrow 0} m_{Y-}(x_-). \quad (4)$$
A $t$-test for testing a hypothesized treatment effect $\tau_0$ reads:
$$t_{\text{sharp}} = \frac{\hat{\tau}_{\text{sharp}} - \tau_0}{\sqrt{\text{Var}(\hat{\tau}_{\text{sharp}})}}, \quad (5)$$
where, under the i.i.d. assumption, we have
$$\hat{\tau}_{\text{sharp}} = \hat{m}_{Y+}(0) - \hat{m}_{Y-}(0), \qquad \text{Var}(\hat{\tau}_{\text{sharp}}) = \text{Var}(\hat{m}_{Y+}(0)) + \text{Var}(\hat{m}_{Y-}(0)). \quad (6)$$
In a fuzzy RD design, the treatment assignment rule remains the same, but $T_i$ is not necessarily equal to 1 when $X_i \geq 0$, nor 0 when $X_i < 0$.
It is customary to use a nonparametric function to describe the probability of receiving a treatment:
$$T_{+,i} = m_{T+}(X_{+,i}) + \sigma_{\epsilon_{T+}}(X_{+,i})\,\epsilon_{T+,i}, \quad (7)$$
$$T_{-,i} = m_{T-}(X_{-,i}) + \sigma_{\epsilon_{T-}}(X_{-,i})\,\epsilon_{T-,i}. \quad (8)$$
The treatment effect measurement becomes
$$\tau_{\text{fuzzy}} = \frac{m_{Y+}(0) - m_{Y-}(0)}{m_{T+}(0) - m_{T-}(0)}, \quad (9)$$
where
$$m_{T+}(0) = \lim_{x_+ \downarrow 0} m_{T+}(x_+) \quad \text{and} \quad m_{T-}(0) = \lim_{x_- \uparrow 0} m_{T-}(x_-). \quad (10)$$
Let s.o. denote a small order term. The corresponding $t$-test is given by
$$t_{\text{fuzzy}} = \frac{\hat{\tau}_{\text{fuzzy}} - \tau_0}{\sqrt{\text{Var}(\hat{\tau}_{\text{fuzzy}})}}, \quad (11)$$
where $\hat{\tau}_{\text{fuzzy}} = \dfrac{\hat{m}_{Y+}(0) - \hat{m}_{Y-}(0)}{\hat{m}_{T+}(0) - \hat{m}_{T-}(0)}$ and
$$\begin{aligned}
\text{Var}(\hat{\tau}_{\text{fuzzy}}) &= \frac{\text{Var}(\hat{m}_{Y+}(0)) + \text{Var}(\hat{m}_{Y-}(0))}{\left(m_{T+}(0) - m_{T-}(0)\right)^2} + \frac{\left(m_{Y+}(0) - m_{Y-}(0)\right)^2}{\left(m_{T+}(0) - m_{T-}(0)\right)^4}\left[\text{Var}(\hat{m}_{T+}(0)) + \text{Var}(\hat{m}_{T-}(0))\right] \\
&\quad - 2\,\frac{m_{Y+}(0) - m_{Y-}(0)}{\left(m_{T+}(0) - m_{T-}(0)\right)^3}\left[\text{Cov}(\hat{m}_{Y+}(0), \hat{m}_{T+}(0)) + \text{Cov}(\hat{m}_{Y-}(0), \hat{m}_{T-}(0))\right] + \text{s.o.} \quad (12)
\end{aligned}$$
To conduct the $t$-test, we thus need nonparametric estimates for all quantities in eqs. (6) and (12) at the boundary point 0.
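Eq. (12) is a delta-method expansion of the ratio in eq. (9). As a minimal illustration of how its pieces combine, the following R sketch assembles the leading variance terms from user-supplied boundary estimates; the function and argument names are ours (hypothetical placeholders), not part of any package.

```r
# Sketch of the leading terms of Var(tau_hat_fuzzy) in eq. (12).
# Inputs: boundary estimates m*, their variances v*, and the covariances
# cov_p = Cov(mY_p, mT_p), cov_m = Cov(mY_m, mT_m). All placeholders.
var_tau_fuzzy <- function(mY_p, mY_m, mT_p, mT_m,
                          vY_p, vY_m, vT_p, vT_m, cov_p, cov_m) {
  dY <- mY_p - mY_m   # jump in the outcome equation
  dT <- mT_p - mT_m   # jump in the treatment equation
  (vY_p + vY_m) / dT^2 +
    dY^2 / dT^4 * (vT_p + vT_m) -
    2 * dY / dT^3 * (cov_p + cov_m)
}
```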
2.2 Local composite quantile regression

LLR is the leading method to estimate quantities in eqs. (6) and (12). LCQR is introduced in Kai et al. (2010) as an alternative to the LLR method due to its efficiency gain with non-normal errors. Robustness (to non-normality) is the key reason we propose to use LCQR in RD, as the normality assumption can be easily violated in practice.

The same LCQR method can be applied to eqs. (1), (2), (7) and (8). Because of this similarity, we describe the LCQR method based on eq. (1). For $k = 1, \cdots, q$, let $\tau_k = k/(q+1)$ be the $q$ quantile positions and $\rho_{\tau_k}(r) = \tau_k r - r I(r < 0)$ be the $q$ check loss functions in quantile regression. The loss function for LCQR at the point of interest $x$ is defined as
$$\sum_{k=1}^{q}\sum_{i=1}^{n_+} \rho_{\tau_k}\!\left(Y_{+,i} - a_k - b(X_{+,i} - x)\right) K\!\left(\frac{X_{+,i} - x}{h_{Y+}}\right), \quad (13)$$
where $a_k$ is the $k$th quantile estimator, $b$ is a slope restricted to be the same across quantiles, $h_{Y+}$ is the bandwidth, and $K$ is a kernel function. The point of interest in eq. (1) is $x = 0$. Minimizing eq. (13) w.r.t. $a_k$ and $b$ yields $(\hat{a}_1, \cdots, \hat{a}_q, \hat{b})$. The LCQR estimator for $m_{Y+}(x)$ is defined as
$$\hat{m}_{Y+}(x) = \frac{1}{q}\sum_{k=1}^{q} \hat{a}_k, \quad (14)$$
and $\hat{b}$ is the estimator for $m^{(1)}_{Y+}(x)$, the first derivative of $m_{Y+}(x)$. $\hat{m}_{Y-}(x)$, $\hat{m}_{T+}(x)$, and $\hat{m}_{T-}(x)$ can be similarly obtained at $x = 0$.

Kai et al. (2010) discuss the asymptotic bias and variance of the LCQR estimator at an interior point $x$. The asymptotic properties at a boundary point $x$ are discussed in Kai et al. (2009). Given the results in Kai et al. (2009), one can immediately implement the $t$-test for the sharp RD in eq. (5) with bias correction if needed. To implement the $t$-test for the fuzzy RD in eq. (11), the two covariance expressions in eq. (12) need to be estimated, and we provide the results later on. Asymptotic results for the bias and variance of $\hat{m}_{Y+}(x)$, $\hat{m}_{Y-}(x)$, $\hat{m}_{T+}(x)$, and $\hat{m}_{T-}(x)$ follow those in Kai et al. (2009) and are given as lemmas in the Appendix.
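To make the objective in eq. (13) concrete, here is a minimal R sketch that minimizes the kernel-weighted composite check loss with a generic optimizer and averages the intercepts as in eq. (14). The paper's implementation uses an iterative MM algorithm (see Section 4); this sketch is ours and is for illustration only.

```r
# LCQR at a boundary point x0 with a triangular kernel, cf. eqs. (13)-(14).
lcqr_fit <- function(x, y, h, q = 5, x0 = 0) {
  tau <- (1:q) / (q + 1)                      # quantile positions tau_k
  w   <- pmax(1 - abs((x - x0) / h), 0)       # triangular kernel weights
  rho <- function(r, t) r * (t - (r < 0))     # check loss rho_tau(r)
  obj <- function(par) {                      # par = (a_1, ..., a_q, b)
    a <- par[1:q]; b <- par[q + 1]
    s <- 0
    for (k in 1:q) s <- s + sum(w * rho(y - a[k] - b * (x - x0), tau[k]))
    s
  }
  fit <- optim(c(rep(median(y), q), 0), obj)  # Nelder-Mead default
  mean(fit$par[1:q])                          # m_hat(x0), eq. (14)
}
```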
We make the following assumptions. Let $c$ be a positive constant determined by $\text{supp}(K)$.

Assumption 1. $m_{Y+}(\cdot)$, $m_{T+}(\cdot)$, $m_{Y-}(\cdot)$, and $m_{T-}(\cdot)$ are three times differentiable at $x$.

Assumption 2. $\sigma_{\epsilon_{Y+}}(\cdot)$, $\sigma_{\epsilon_{T+}}(\cdot)$, $\sigma_{\epsilon_{Y-}}(\cdot)$, and $\sigma_{\epsilon_{T-}}(\cdot)$ are right or left continuous and differentiable at $x$.

Assumption 3. The kernel function $K$ is bounded and symmetric such that, for $j = 0, 1, 2, \cdots$,
$$\mu_{+,j}(c) = \int_{-c}^{\infty} u^j K(u)\,du, \qquad \nu_{+,j}(c) = \int_{-c}^{\infty} u^j K^2(u)\,du,$$
$$\mu_{-,j}(c) = \int_{-\infty}^{c} u^j K(u)\,du, \qquad \nu_{-,j}(c) = \int_{-\infty}^{c} u^j K^2(u)\,du.$$
Assumption 4. The marginal density $f_{X+}(\cdot)$ is right continuous, differentiable and positive at $x = 0$, and $f_{X-}(\cdot)$ is left continuous, differentiable and positive at $x = 0$.
Assumption 5. All error distributions for $\epsilon_{Y+}$, $\epsilon_{T+}$, $\epsilon_{Y-}$, and $\epsilon_{T-}$ are symmetric and have positive density. All errors are i.i.d.

Assumption 6. $h_{Y+} \to 0$ and $n_+ h_{Y+}^5 \to \infty$ as $n_+ \to \infty$. The same condition also holds for $(n_+, h_{T+})$, $(n_-, h_{Y-})$, and $(n_-, h_{T-})$.

Assumptions 1 and 2 are used for Taylor series expansions in the proof. The bias expressions require only second-order differentiability of the conditional mean function. Higher order differentiability permits the study of smaller order terms in the expansion. The support of the kernel is assumed to be $(-\infty, \infty)$ in Assumption 3. If the kernel has a bounded support such as $[-1, 1]$, then $\mu_{+,j}(c) = \int_{-c}^{1} u^j K(u)\,du$. Similar changes can be made to the $\nu$ variables. The differentiability in Assumption 4 is also used for Taylor expansions in the proof, and we require the positivity of the density at the boundary as it appears in the denominator of the bias expression. The symmetric error distribution assumption in Assumption 5 is not as restrictive as it looks. This symmetry assumption helps to remove an extra term in the bias. When the error distribution is non-symmetric, this extra term can be removed via a procedure similar to bias correction. One only needs $n_+ h_{Y+} \to \infty$ when estimating the conditional mean. The stronger assumption in Assumption 6 is needed when we estimate the variance of the second derivative of $m_{Y+}$, denoted by $m^{(2)}_{Y+}$, to adjust the s.e. of $\hat{m}_{Y+}$. This assumption is required in order to obtain a consistent estimator of $m^{(2)}_{Y+}$. Assumption 6 translates to $h \to 0$ and $nh^5 \to \infty$ as $n \to \infty$ if we use a single bandwidth $h$ for the data above and below the cutoff, where $n$ is the total number of observations.
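The kernel moments in Assumption 3 are simple one-dimensional integrals and can be evaluated numerically. A quick R check for the triangular kernel at the cutoff (where the one-sided moments run over $[0, 1]$); the constants a_c and b_c below correspond to the bias and variance kernel constants in eqs. (17) and (25) further below:

```r
# One-sided moments of the triangular kernel K(u) = (1 - |u|)1{|u| <= 1},
# evaluated at the cutoff, plus the LLR boundary constants built from them.
K  <- function(u) pmax(1 - abs(u), 0)
mu <- function(j) integrate(function(u) u^j * K(u),   0, 1)$value
nu <- function(j) integrate(function(u) u^j * K(u)^2, 0, 1)$value
a_c <- (mu(2)^2 - mu(1) * mu(3)) / (mu(0) * mu(2) - mu(1)^2)    # cf. eq. (17)
b_c <- (mu(2)^2 * nu(0) - 2 * mu(1) * mu(2) * nu(1) + mu(1)^2 * nu(2)) /
       (mu(0) * mu(2) - mu(1)^2)^2                               # cf. eq. (25)
```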
Next we provide two theorems on the LCQR estimator for sharp and fuzzy treatment effects defined in eq. (6) and eq. (12), respectively. Let $\mathcal{X}$ be the $\sigma$-field generated by all $X_i$.

Theorem 1. Under Assumptions 1 to 6, as both $n_+$ and $n_- \to \infty$, we have
$$\text{Bias}(\hat{\tau}_{\text{sharp}} \mid \mathcal{X}) = \frac{1}{2}\left[a_{Y+}(c)\,m^{(2)}_{Y+}(0)\,h^2_{Y+} - a_{Y-}(c)\,m^{(2)}_{Y-}(0)\,h^2_{Y-}\right] + o_p(h^2_{Y+} + h^2_{Y-}), \quad (15)$$
$$\text{Var}(\hat{\tau}_{\text{sharp}} \mid \mathcal{X}) = \frac{b_{Y+}(c)\,\sigma^2_{\epsilon_{Y+}}(0)}{n_+ h_{Y+}\,f_{X+}(0)} + \frac{b_{Y-}(c)\,\sigma^2_{\epsilon_{Y-}}(0)}{n_- h_{Y-}\,f_{X-}(0)} + o_p\!\left(\frac{1}{n_+ h_{Y+}} + \frac{1}{n_- h_{Y-}}\right), \quad (16)$$
where
$$a_{Y+}(c) = a_{Y-}(c) = \frac{\mu^2_{+,2}(c) - \mu_{+,1}(c)\,\mu_{+,3}(c)}{\mu_{+,0}(c)\,\mu_{+,2}(c) - \mu^2_{+,1}(c)}, \quad (17)$$
$$b_{Y+}(c) = e_q^T \left(S^{-1}_{Y+}(c)\,\Sigma_{Y+}(c)\,S^{-1}_{Y+}(c)\right)_{11} e_q / q^2, \quad (18)$$
$$b_{Y-}(c) = e_q^T \left(S^{-1}_{Y-}(c)\,\Sigma_{Y-}(c)\,S^{-1}_{Y-}(c)\right)_{11} e_q / q^2, \quad (19)$$
and the matrices $S_{Y+}(c)$, $S_{Y-}(c)$, $\Sigma_{Y+}(c)$ and $\Sigma_{Y-}(c)$ are defined in the Appendix; $e_q$ is the $q$-dimensional vector of ones.

Proof of Theorem 1 is a straightforward application of Theorem 2.1 in Kai et al. (2009) and is omitted. As long as a symmetric kernel is used, the value of $a_{Y+}(c)$ remains the same if kernel moments on the other side of the cutoff are used. Generally, we have $b_{Y+}(c) \neq b_{Y-}(c)$ unless error distributions on both sides of the cutoff are assumed to be the same.

When the bandwidths above and below the cutoff are assumed to be the same, Theorem 1 reduces to the following corollary.
Corollary 1. Under Assumptions 1 to 6, if we further assume $h_{Y+} = h_{Y-} = h$, then we have
$$\text{Bias}(\hat{\tau}_{\text{sharp}} \mid \mathcal{X}) = \frac{1}{2}\left[a_{Y+}(c)\,m^{(2)}_{Y+}(0) - a_{Y-}(c)\,m^{(2)}_{Y-}(0)\right]h^2 + o_p(h^2), \quad (20)$$
$$\text{Var}(\hat{\tau}_{\text{sharp}} \mid \mathcal{X}) = \frac{b_{Y+}(c)\,\sigma^2_{\epsilon_{Y+}}(0)}{n_+ h\,f_{X+}(0)} + \frac{b_{Y-}(c)\,\sigma^2_{\epsilon_{Y-}}(0)}{n_- h\,f_{X-}(0)} + o_p\!\left(\frac{1}{n_+ h} + \frac{1}{n_- h}\right). \quad (21)$$
The asymptotic results for the fuzzy RD are similarly provided in the theorem below.
Theorem 2. Under Assumptions 1 to 6, as both $n_+$ and $n_- \to \infty$, we have
$$\begin{aligned}
\text{Bias}(\hat{\tau}_{\text{fuzzy}} \mid \mathcal{X}) &= \frac{1}{m_{T+}(0) - m_{T-}(0)} \cdot \frac{1}{2}\left[a_{Y+}(c)\,m^{(2)}_{Y+}(0)\,h^2_{Y+} - a_{Y-}(c)\,m^{(2)}_{Y-}(0)\,h^2_{Y-}\right] \\
&\quad - \frac{m_{Y+}(0) - m_{Y-}(0)}{\left[m_{T+}(0) - m_{T-}(0)\right]^2} \cdot \frac{1}{2}\left[a_{T+}(c)\,m^{(2)}_{T+}(0)\,h^2_{T+} - a_{T-}(c)\,m^{(2)}_{T-}(0)\,h^2_{T-}\right] \\
&\quad + o_p(h^2_{Y+} + h^2_{Y-} + h^2_{T+} + h^2_{T-}). \quad (22)
\end{aligned}$$
The variance expression is given in eq. (12).

The variance expression in eq. (12) is complicated but is available by substituting the results in Lemmas 2 and 3 in Appendix A into eq. (12). See Appendix A for the proof.

From Theorems 1 and 2 and using a single bandwidth $h$, we see a lot of similarities between LCQR and LLR: both biases have order $O(h^2)$ and both variances have order $O((nh)^{-1})$. In fact, the asymptotic bias expression is identical for the two methods. The variance from LCQR, however, is smaller than that from LLR under many non-normal error distributions, for which we provide numerical results in the next subsection.

In addition, in the special case of $q = 1$, we can show that
$$b_{Y+}(c) = \frac{\mu^2_{+,2}\,\nu_{+,0} - 2\mu_{+,1}\mu_{+,2}\,\nu_{+,1} + \mu^2_{+,1}\,\nu_{+,2}}{\left(\mu_{+,0}\,\mu_{+,2} - \mu^2_{+,1}\right)^2} \cdot \frac{1}{q^2}\sum_{k=1}^{q}\sum_{k'=1}^{q}\frac{\tau_{kk'}}{f_{\epsilon_{Y+}}(c_k)\,f_{\epsilon_{Y+}}(c_{k'})}, \quad (23)$$
where $\tau_{kk'} = \tau_k \wedge \tau_{k'} - \tau_k\tau_{k'}$, $k, k' = 1, 2, ..., q$, $c_k = F^{-1}_{\epsilon_{Y+}}(\tau_k)$, and $F_{\epsilon_{Y+}}$ and $f_{\epsilon_{Y+}}$ are the cdf and pdf of the error distribution, respectively. The result similarly holds for $b_{Y-}(c)$. However, eq. (23) does not hold for $q \geq 2$.

2.3 Asymptotic relative efficiency

We provide some numerical evidence in support of using LCQR in RD. To facilitate the presentation of the results, we consider Corollary 1 with a single bandwidth and replace the conditional density $f_{X+}(0)$ with the unconditional density $f_X(0)$, so that the product $n_+ f_{X+}(0)$ becomes $nf_X(0)$. Similarly, $n_- f_{X-}(0)$ is also replaced by $nf_X(0)$.

Under standard assumptions, the MSE for the LLR estimator of $\tau_{\text{sharp}}$ is given by (see, e.g., Imbens and Kalyanaraman (2012))
$$\text{MSE}_{\text{LLR}} = \frac{1}{4}\left[a_{Y+}(c)\,m^{(2)}_{Y+}(0) - a_{Y-}(c)\,m^{(2)}_{Y-}(0)\right]^2 h^4 + \frac{1}{nh}\left[\frac{b(c)\,\sigma^2_{\epsilon_{Y+}}(0)}{f_X(0)} + \frac{b(c)\,\sigma^2_{\epsilon_{Y-}}(0)}{f_X(0)}\right] + o_p\!\left(h^4 + \frac{1}{nh}\right), \quad (24)$$
where
$$b(c) = \frac{\mu^2_{+,2}\,\nu_{+,0} - 2\mu_{+,1}\mu_{+,2}\,\nu_{+,1} + \mu^2_{+,1}\,\nu_{+,2}}{\left(\mu_{+,0}\,\mu_{+,2} - \mu^2_{+,1}\right)^2}. \quad (25)$$
Let $a(c) = a_{Y+}(c) = a_{Y-}(c)$. By assuming errors on both sides of the cutoff have the same distribution and letting $b_Y(c) = b_{Y+}(c) = b_{Y-}(c)$, we can further simplify the variance expression in Corollary 1 to have
$$\begin{aligned}
\text{MSE}_{\text{LCQR}} &= \frac{1}{4}\left[a(c)\,m^{(2)}_{Y+}(0) - a(c)\,m^{(2)}_{Y-}(0)\right]^2 h^4 + \frac{1}{nh}\left[\frac{b_Y(c)\,\sigma^2_{\epsilon_{Y+}}(0)}{f_X(0)} + \frac{b_Y(c)\,\sigma^2_{\epsilon_{Y-}}(0)}{f_X(0)}\right] + o_p\!\left(h^4 + \frac{1}{nh}\right) \\
&= \frac{1}{4}\left[a(c)\,m^{(2)}_{Y+}(0) - a(c)\,m^{(2)}_{Y-}(0)\right]^2 h^4 + \frac{1}{nh}\left[\frac{b(c)\,\sigma^2_{\epsilon_{Y+}}(0)}{f_X(0)} + \frac{b(c)\,\sigma^2_{\epsilon_{Y-}}(0)}{f_X(0)}\right]\frac{b_Y(c)}{b(c)} + o_p\!\left(h^4 + \frac{1}{nh}\right), \quad (26)
\end{aligned}$$
where the two MSEs differ by a factor $b_Y(c)/b(c)$ in the variance term. Minimizing these two MSEs gives optimal bandwidths. Substitute the bandwidths into the MSEs to obtain the optimal values, $\text{MSE}^{\text{opt}}_{\text{LLR}}$ and $\text{MSE}^{\text{opt}}_{\text{LCQR}}$. Similar to Kai et al. (2010), we define the asymptotic relative efficiency (ARE) of LCQR with respect to LLR on the boundary as
$$\text{ARE} = \frac{\text{MSE}^{\text{opt}}_{\text{LLR}}}{\text{MSE}^{\text{opt}}_{\text{LCQR}}} \to \left[\frac{b_Y(c)}{b(c)}\right]^{-4/5}. \quad (27)$$
When $q = 1$, the ratio in eq. (27) simplifies to $\left[\frac{1}{q^2}\sum_{k=1}^{q}\sum_{k'=1}^{q}\frac{\tau_{kk'}}{f_{\epsilon_{Y+}}(c_k)\,f_{\epsilon_{Y+}}(c_{k'})}\right]^{-4/5}$. When $q \geq 2$, this simplification does not hold. Yet we can evaluate the ratio in eq. (27) for many common distributions. Using the triangular kernel as an example, we calculate the ratio for the five error distributions listed in Table 1. Using the Epanechnikov kernel gives similar results.

Table 1: Asymptotic relative efficiency (ARE) of LCQR in sharp RD

Error distribution                              q = 1    q = 5    q = 9    q = 19   q = 99
N(0,1)                                          0.6968   0.9290   0.9569   0.9728   0.9819
Laplace                                         1.7411   1.3315   1.2920   1.2616   1.2303
t-distribution with 3 degrees of freedom        1.4718   1.6401   1.6144   1.5703   1.4854
0.95 N(0,1) + 0.05 N(0,3^2)                     0.8639   1.1271   1.1511   1.1579   1.1327
0.95 N(0,1) + 0.05 N(0,10^2)                    2.6960   3.4578   3.4986   3.4590   2.2632

The ARE for boundary points in Table 1 largely follows the same pattern as for interior points, as shown in Table 1 in Kai et al. (2010). When errors are normally distributed, increasing $q$ will quickly bring the ARE close to one, reflecting a very small efficiency loss of LCQR with respect to LLR. For non-normal errors, there can be large efficiency gains. The first column in Table 1 is identical for both interior and boundary points, a result of using a symmetric kernel function. When $q \geq 2$, the numbers differ from those in Table 1 in Kai et al. (2010). Overall, Table 1 provides the numerical evidence in support of using LCQR in RD when data are non-normal, i.e., the ARE mostly exceeds 1.
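The $q = 1$ column of Table 1 can be checked directly in R: under our reading of eqs. (23), (25) and (27), for a symmetric error standardized to unit variance the ratio reduces to $[1/(4f(0)^2)]^{-4/5}$, since $\tau_{11} = 1/4$ and $c_1 = 0$. A small sketch reproducing the first three entries:

```r
# ARE of LCQR with q = 1 (local median) relative to LLR at the boundary,
# for symmetric unit-variance errors; cf. Table 1, column q = 1.
are_q1 <- function(f0) (1 / (4 * f0^2))^(-4/5)
are_q1(dnorm(0))              # N(0,1):                  0.6968
are_q1(1 / sqrt(2))           # unit-variance Laplace:   1.7411
are_q1(sqrt(3) * dt(0, 3))    # t(3) scaled to unit var: 1.4718
```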
3 Inference on the boundary
The goal of inference is related to but different from that of estimation. While one still needs a precise $\hat{\tau}_{\text{sharp}}$ or $\hat{\tau}_{\text{fuzzy}}$, much of the effort is spent on making the $t$-statistics in eqs. (5) and (11) behave like a standard normal random variable. Common approaches include reducing the impact of bias by under-smoothing, directly adjusting the bias, combining bias correction and s.e. adjustment (Calonico et al. (2014)), adjusting the critical values (Armstrong and Kolesár (2020)), etc. This section studies the bias-corrected and s.e.-adjusted $t$-statistics based on LCQR for conducting inference in sharp and fuzzy RD designs.

3.1 The sharp case

From Theorem 1, the bias-corrected estimator is given by
$$\hat{\tau}^{bc}_{\text{sharp}} = \hat{\tau}_{\text{sharp}} - \widehat{\text{Bias}}(\hat{\tau}_{\text{sharp}}), \quad (28)$$
where $\widehat{\text{Bias}}(\hat{\tau}_{\text{sharp}}) = \widehat{\text{Bias}}(\hat{m}_{Y+}) - \widehat{\text{Bias}}(\hat{m}_{Y-})$.

Simply replacing $\hat{\tau}_{\text{sharp}}$ in eq. (5) with $\hat{\tau}^{bc}_{\text{sharp}}$ may still lead to undercoverage of the resulting confidence intervals. Calonico et al. (2014) observed that the key to fixing this problem is to take into consideration the additional variability introduced by bias correction. A similar idea is applied to inference based on LCQR, i.e., instead of using $\text{Var}(\hat{\tau}_{\text{sharp}})$ in the denominator of eq. (5), we use $\text{Var}(\hat{\tau}_{\text{sharp}} - \widehat{\text{Bias}}(\hat{\tau}_{\text{sharp}}))$. The additional variability due to bias correction increases the adjusted variance, and the s.e./variance adjustment naturally improves the coverage of the resulting confidence intervals. By the i.i.d. assumption, we have
$$\begin{aligned}
\text{Var}(\hat{\tau}_{\text{sharp}} - \widehat{\text{Bias}}(\hat{\tau}_{\text{sharp}})) &= \text{Var}(\hat{\tau}_{\text{sharp}}) + \text{Var}(\widehat{\text{Bias}}(\hat{\tau}_{\text{sharp}})) - 2\,\text{Cov}(\hat{\tau}_{\text{sharp}}, \widehat{\text{Bias}}(\hat{\tau}_{\text{sharp}})) \\
&= \text{Var}(\hat{m}_{Y+}) + \text{Var}(\hat{m}_{Y-}) + \text{Var}(\widehat{\text{Bias}}(\hat{m}_{Y+})) + \text{Var}(\widehat{\text{Bias}}(\hat{m}_{Y-})) \\
&\quad - 2\,\text{Cov}(\hat{m}_{Y+}, \widehat{\text{Bias}}(\hat{m}_{Y+})) - 2\,\text{Cov}(\hat{m}_{Y-}, \widehat{\text{Bias}}(\hat{m}_{Y-})), \quad (29)
\end{aligned}$$
where the expressions for $\text{Var}(\hat{m}_{Y+})$ and $\text{Var}(\hat{m}_{Y-})$ are available in Kai et al. (2009, 2010). We contribute to the literature by deriving the explicit forms of the variance of bias and the covariance in eq. (29).
Theorem 3. Under Assumptions 1 to 6, as both $n_+$ and $n_- \to \infty$, the adjusted $t$-statistic for the sharp RD follows an asymptotic normal distribution:
$$t^{adj.}_{\text{sharp}} = \frac{\hat{\tau}_{\text{sharp}} - \widehat{\text{Bias}}(\hat{\tau}_{\text{sharp}})}{\sqrt{\text{Var}(\hat{\tau}_{\text{sharp}} - \widehat{\text{Bias}}(\hat{\tau}_{\text{sharp}}))}} \overset{d}{\to} N(0, 1). \quad (30)$$
The exact form of the adjusted variance is given in the Appendix.

Following Theorem 3, the confidence interval for the nominal 95% coverage becomes $\hat{\tau}_{\text{sharp}} - \widehat{\text{Bias}}(\hat{\tau}_{\text{sharp}}) \pm 1.96\sqrt{\text{Var}(\hat{\tau}_{\text{sharp}} - \widehat{\text{Bias}}(\hat{\tau}_{\text{sharp}}))}$, which incorporates the variability introduced by bias correction and will thus improve the coverage probability.
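In practice the interval following Theorem 3 is assembled from three ingredients: the point estimate, the estimated bias, and the adjusted standard error. A minimal R sketch (the numeric inputs are purely illustrative, not taken from the paper):

```r
# Bias-corrected and s.e.-adjusted confidence interval, cf. Theorem 3.
ci_adjusted <- function(tau_hat, bias_hat, se_adj, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)           # normal critical value
  (tau_hat - bias_hat) + c(-1, 1) * z * se_adj
}
ci_adjusted(tau_hat = 0.066, bias_hat = 0.004, se_adj = 0.011)  # toy numbers
```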
Remark 4. Theorem 3 assumes that the researcher uses the same bandwidths ($h_{Y+}$ and $h_{Y-}$) for estimation, bias correction, and s.e. adjustment. This simplifies the presentation of the result and the proof. It is also empirically relevant and appealing, since using a single bandwidth throughout estimation and inference greatly reduces the complexity of implementation, though ideally we need a different bandwidth for every required nonparametric estimation. Despite the suboptimal nature of this procedure, our simulation and empirical exercise show that the results are satisfactory.
Remark 5. We assume two different bandwidths, $h_{Y+}$ and $h_{Y-}$, in Theorem 3. If they are further assumed to be the same, $h_{Y+} = h_{Y-} = h$, then by replacing the conditional densities of the covariate, $f_{X+}(0)$ and $f_{X-}(0)$, with the unconditional one, $f_X(0)$, it can be shown that
$$\text{Var}(\hat{\tau}_{\text{sharp}} - \widehat{\text{Bias}}(\hat{\tau}_{\text{sharp}})) = \frac{1}{nh}\,C + o_p\!\left(\frac{1}{nh}\right),$$
where the constant $C$ can be straightforwardly inferred from the proof of Theorem 3 in Appendix A.

3.2 The fuzzy case

Following Feir et al. (2016), we use a modified $t$-statistic to help eliminate the size distortion due to possibly weak identification in the fuzzy RD. Use eq. (9) to rewrite the null $H_0: \tau_{\text{fuzzy}} = \tau_0$ as
$$H_0: \left[m_{Y+}(0) - \tau_0\,m_{T+}(0)\right] - \left[m_{Y-}(0) - \tau_0\,m_{T-}(0)\right] = 0.$$
Define $\tilde{\tau}_{\text{fuzzy}} = (\hat{m}_{Y+} - \tau_0\hat{m}_{T+}) - (\hat{m}_{Y-} - \tau_0\hat{m}_{T-})$. The bias-corrected $\tilde{\tau}_{\text{fuzzy}}$ is given by $\tilde{\tau}^{bc}_{\text{fuzzy}} = \tilde{\tau}_{\text{fuzzy}} - \widehat{\text{Bias}}(\tilde{\tau}_{\text{fuzzy}})$, where
$$\widehat{\text{Bias}}(\tilde{\tau}_{\text{fuzzy}}) = \left[\widehat{\text{Bias}}(\hat{m}_{Y+}) - \tau_0\,\widehat{\text{Bias}}(\hat{m}_{T+})\right] - \left[\widehat{\text{Bias}}(\hat{m}_{Y-}) - \tau_0\,\widehat{\text{Bias}}(\hat{m}_{T-})\right]. \quad (31)$$
All the bias terms can be estimated using the result in Lemma 2 of Appendix A. To see the components in the adjusted variance, it is helpful to consider first the data above the cutoff:
$$\begin{aligned}
&\text{Var}\!\left((\hat{m}_{Y+} - \tau_0\hat{m}_{T+}) - (\widehat{\text{Bias}}(\hat{m}_{Y+}) - \tau_0\,\widehat{\text{Bias}}(\hat{m}_{T+}))\right) \\
&= \text{Var}(\hat{m}_{Y+}) + \tau_0^2\,\text{Var}(\hat{m}_{T+}) - 2\tau_0\,\text{Cov}(\hat{m}_{Y+}, \hat{m}_{T+}) \\
&\quad + \text{Var}(\widehat{\text{Bias}}(\hat{m}_{Y+})) + \tau_0^2\,\text{Var}(\widehat{\text{Bias}}(\hat{m}_{T+})) - 2\tau_0\,\text{Cov}(\widehat{\text{Bias}}(\hat{m}_{Y+}), \widehat{\text{Bias}}(\hat{m}_{T+})) \\
&\quad - 2\,\text{Cov}(\hat{m}_{Y+}, \widehat{\text{Bias}}(\hat{m}_{Y+})) + 2\tau_0\,\text{Cov}(\hat{m}_{Y+}, \widehat{\text{Bias}}(\hat{m}_{T+})) \\
&\quad + 2\tau_0\,\text{Cov}(\hat{m}_{T+}, \widehat{\text{Bias}}(\hat{m}_{Y+})) - 2\tau_0^2\,\text{Cov}(\hat{m}_{T+}, \widehat{\text{Bias}}(\hat{m}_{T+})). \quad (32)
\end{aligned}$$
To operationalize the variance adjustment process, we derive all the needed covariance terms in the Appendix. Expressions for the data below the cutoff are similar to those presented above for the data above the cutoff.

Theorem 6. Under Assumptions 1 to 6, as both $n_+$ and $n_- \to \infty$, the adjusted $t$-statistic for the fuzzy RD follows an asymptotic normal distribution:
$$t^{adj.}_{\text{fuzzy}} = \frac{\tilde{\tau}_{\text{fuzzy}} - \widehat{\text{Bias}}(\tilde{\tau}_{\text{fuzzy}})}{\sqrt{\text{Var}(\tilde{\tau}_{\text{fuzzy}} - \widehat{\text{Bias}}(\tilde{\tau}_{\text{fuzzy}}))}} \overset{d}{\to} N(0, 1). \quad (33)$$
The exact expression for $\text{Var}(\tilde{\tau}_{\text{fuzzy}} - \widehat{\text{Bias}}(\tilde{\tau}_{\text{fuzzy}}))$ is provided in the Appendix.
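A compact R sketch of the modified statistic in eq. (33), taking the boundary estimates, the estimated bias from eq. (31), and the adjusted variance from eq. (32) as inputs (all placeholders supplied by the user):

```r
# Adjusted t-statistic for the fuzzy RD null H0: tau_fuzzy = tau0, cf. eq. (33).
t_fuzzy_adj <- function(mY_p, mY_m, mT_p, mT_m, tau0, bias_hat, var_adj) {
  tau_tilde <- (mY_p - tau0 * mT_p) - (mY_m - tau0 * mT_m)
  (tau_tilde - bias_hat) / sqrt(var_adj)   # compare with N(0,1) quantiles
}
```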
3.3 Bandwidth selection

In this subsection, we propose a method to revise the MSE-optimal bandwidth by taking into consideration both bias correction and s.e. adjustment. It is well-known in the nonparametric literature that the MSE-optimal bandwidth, when used in inference, often induces undercoverage of conventional confidence intervals. The root cause is that the MSE-optimal bandwidth, $h_{\text{MSE}}$, is $O(n^{-1/5})$, such that $nh^5 \to$ a constant while we need $nh^5 \to 0$. Hence, using $h_{\text{MSE}}$ in eq. (5) could lead to poor coverage of confidence intervals. Using a bias-corrected $t$-statistic
$$\frac{\hat{\tau}_{\text{sharp}} - \widehat{\text{Bias}}(\hat{\tau}_{\text{sharp}})}{\sqrt{\text{Var}(\hat{\tau}_{\text{sharp}})}}$$
does not solve this problem; see the discussion in Section 2 of Calonico et al. (2014).

It is helpful to revisit the bandwidth selection process in an MSE-optimal setting in order to find a solution. We use a single bandwidth $h$ for illustration. Consider the usual MSE when estimating a conditional mean function $m(\cdot)$ by $\hat{m}(\cdot)$:
$$\text{MSE} = \text{Bias}^2(\hat{m}) + \text{Var}(\hat{m}) + o_p\!\left(h^4 + \frac{1}{nh}\right).$$
It is clear that the MSE-optimal bandwidth aims to balance the two terms, the squared bias and the variance. After correcting the bias in the numerator of the $t$-statistic, an optimal bandwidth should balance $\text{Bias}^2(\hat{m} - \text{Bias}(\hat{m}))$, not $\text{Bias}^2(\hat{m})$, with the variance term. Given $\text{Bias}(\hat{m}) = O(h^2)$, we need to expand the bias expression up to $O(h^3)$ in order to compute $\text{Bias}(\hat{m} - \text{Bias}(\hat{m}))$. In addition, after incorporating the variance adjustment due to bias correction, the adjusted MSE takes the following form:
$$\text{adj. MSE} = \text{Bias}^2(\hat{m} - \text{Bias}(\hat{m})) + \text{adj. Var}(\hat{m}) + o_p\!\left(h^6 + \frac{1}{nh}\right), \quad (34)$$
where $\text{adj. Var}(\hat{m})$ can take the form in eq. (29) in the sharp RD design. The following theorem presents the bandwidth result associated with the adjusted MSE in the sharp RD design, assuming we work with data above the cutoff.

Theorem 7. Under Assumptions 1 to 6, as $n_+ \to \infty$, the bandwidth that minimizes the adjusted MSE is given by
$$h_{Y+} = \left(\frac{C_2}{C_1}\right)^{1/7} n_+^{-1/7}, \quad (35)$$
where $C_1$ and $C_2$ are constants defined in the Appendix.

See the Appendix for the proof. The result for the fuzzy design case can be obtained similarly.
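The rule in eq. (35) is easy to apply once the two constants are estimated (the paper uses the rule-of-thumb selector of Fan and Gijbels (1996) for this; see Section 4). A one-line R sketch with placeholder constants:

```r
# adj.MSE-optimal bandwidth, cf. eq. (35); C1 and C2 are estimated constants.
h_adjmse <- function(C1, C2, n) (C2 / C1)^(1/7) * n^(-1/7)
h_adjmse(C1 = 1, C2 = 2, n = 500)  # illustrative values only
```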
Remark 8. Theorem 7 assumes that two separate bandwidths are used for the data above and below the cutoff; a similar result holds for $h_{Y-}$. On each side of the cutoff, a single bandwidth is used in both estimation and bias correction. Theorem 7 provides the expression of this bandwidth. The idea of using the adjusted MSE in eq. (34) can also be applied to bandwidth selection in LLR.
Remark 9. The bandwidth derived in Theorem 7 is supposed to balance the bias of the bias-corrected estimator and its adjusted variance. It is not designed to optimize the coverage probability of confidence intervals. However, simulation evidence shows that it provides good coverage probabilities along with an accurate estimator of the treatment effect.
Remark 10.
The optimal bandwidth is compatible with the required assumption of $n_+ h_{Y+} \to \infty$ as $n_+ \to \infty$.

Remark 11. If an equal bandwidth on both sides of the cutoff is preferred, it can be derived in a manner similar to the optimal bandwidth choice in Imbens and Kalyanaraman (2012). To see that, we further assume $h_{Y+} = h_{Y-} = h$ and let $C_{+,1}$, $C_{+,2}$, $C_{-,1}$, $C_{-,2}$ be the corresponding constants in Theorem 7 for the data above and below the cutoff. The adjusted MSE becomes $\text{adj. MSE} = (C_{+,1} - C_{-,1})^2 h^6 + \frac{1}{nh}(C_{+,2} + C_{-,2}) + o_p(h^6 + \frac{1}{nh})$, and the optimal bandwidth is given by
$$h = \left(\frac{C_{+,2} + C_{-,2}}{6\left(C_{+,1} - C_{-,1}\right)^2}\right)^{1/7} n^{-1/7}. \quad (36)$$
This bandwidth is shown to have good finite sample performance in our simulation study. One caveat is that the performance of this bandwidth relies on a good estimate of the second derivative of the conditional mean function. If it is difficult to obtain an estimate of the second derivative, one can use a simple, global quadratic polynomial to estimate the derivative. This is implemented in the option bandwidth = "rot" as well as the option ls.derivative = TRUE in the rdcqr package.
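The options quoted above suggest the following usage pattern for the companion package. The function name and argument list are our guess from the text (hypothetical) and may differ from the actual interface documented at https://github.com/xhuang20/rdcqr:

```r
# Hypothetical usage sketch of the companion rdcqr package (unverified API).
# install.packages("remotes"); remotes::install_github("xhuang20/rdcqr")
library(rdcqr)
# out <- rdcqr(y, x, cutoff = 0, bandwidth = "rot", ls.derivative = TRUE)
```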
Remark 12. The bandwidth considered in this section has order $O(n_+^{-1/7})$ or $O(n^{-1/7})$. To check the robustness (to bandwidth selection) of the adjusted $t$-statistic, we also directly use the rule-of-thumb bandwidth in Section 4.2 of Fan and Gijbels (1996) in our simulation. This bandwidth is close to the Mean Integrated Squared Error optimal (MISE-optimal) bandwidth and has order $O(n_+^{-1/5})$ and $O(n_-^{-1/5})$ for the data above and below the cutoff. The results are reported in Appendix B, and they are found very similar to those obtained from using an $O(n^{-1/7})$ bandwidth.

4 Simulation

In this section, we conduct a Monte Carlo study to investigate the finite sample properties of the LCQR estimator in RD. We focus on the sharp design and separate the Monte Carlo study into two parts, one for estimation and the other for inference.

The sharp designs calibrated to Lee (2008) and Ludwig and Miller (2007) are used in the data generating process (DGP):
$$Y_i = m(X_i) + \sigma(X_i)\,\epsilon_i, \quad i = 1, \cdots, n, \qquad X_i \sim 2 \times \text{Beta}(2, 4) - 1,$$
where the conditional means are given by
$$m_{\text{Lee}}(X_i) = \begin{cases} 0.48 + 1.27X_i + 7.18X_i^2 + 20.21X_i^3 + 21.54X_i^4 + 7.33X_i^5 & \text{if } X_i < 0, \\ 0.52 + 0.84X_i - 3.00X_i^2 + 7.99X_i^3 - 9.01X_i^4 + 3.56X_i^5 & \text{if } X_i \geq 0, \end{cases} \quad (37)$$
and
$$m_{\text{LM}}(X_i) = \begin{cases} 3.71 + 2.30X_i + 3.28X_i^2 + 1.45X_i^3 + 0.23X_i^4 + 0.03X_i^5 & \text{if } X_i < 0, \\ 0.26 + 18.49X_i - 54.81X_i^2 + 74.30X_i^3 - 45.02X_i^4 + 9.83X_i^5 & \text{if } X_i \geq 0. \end{cases} \quad (38)$$
Hence the treatment effects are 0.04 and $-3.45$
in (37) and (38), respectively. We use the same five error distributions in Kai et al. (2010) to simulate $\epsilon_i$:

DGP 1: $\epsilon_i \sim N(0, 1)$,
DGP 2: $\epsilon_i \sim$ Laplace with $\mu = 0$ and $\sigma = 1$,
DGP 3: $\epsilon_i \sim t$ distribution with 3 degrees of freedom,
DGP 4: $\epsilon_i \sim$ mixture normal $0.95\,N(0,1) + 0.05\,N(0,3^2)$,
DGP 5: $\epsilon_i \sim$ mixture normal $0.95\,N(0,1) + 0.05\,N(0,10^2)$.

In addition, the homoskedastic and heteroskedastic specifications in Kai et al. (2010) are used for simulating the standard deviation function $\sigma(X_i)$: a constant standard deviation under homoskedasticity, and a standard deviation varying as a trigonometric function of $\pi X_i$ (scaled by 1/10) under heteroskedasticity.
We set $n = 500$ with 5000 replications. The data are i.i.d. draws in all replications. The triangular kernel is used in all estimations.
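For concreteness, one replication of the Lee design under DGP 1 can be generated as follows; the error scale sigma is a placeholder, since the exact homoskedastic specification follows Kai et al. (2010):

```r
# One simulated sample from the Lee design in eq. (37), DGP 1 (normal errors).
gen_lee <- function(n = 500, sigma = 0.5) {   # sigma value is illustrative
  x <- 2 * rbeta(n, 2, 4) - 1                 # running variable
  m <- ifelse(x < 0,
    0.48 + 1.27*x +  7.18*x^2 + 20.21*x^3 + 21.54*x^4 + 7.33*x^5,
    0.52 + 0.84*x -  3.00*x^2 +  7.99*x^3 -  9.01*x^4 + 3.56*x^5)
  data.frame(x = x, y = m + sigma * rnorm(n))
}
set.seed(1)
dat <- gen_lee()
```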
Since we also study the coverage probability of confidence intervals, we compare the LCQR results with two other inference methods recently proposed in the RD literature: the robust confidence interval in Calonico et al. (2014), and the honest confidence interval in Armstrong and Kolesár (2020). For ease of exposition, we first summarize the types of the estimators reported later on.

Table 2: Summary of estimators in the simulation study

Estimator                         Description
$\hat{\tau}^{cqr}_{1bw}$          LCQR with equal bandwidth
$\hat{\tau}^{cqr,bc}_{1bw}$       LCQR with equal bandwidth, bias-corrected and s.e.-adjusted
$\hat{\tau}^{cqr}_{2bw}$          LCQR with unequal bandwidth
$\hat{\tau}^{cqr,bc}_{2bw}$       LCQR with unequal bandwidth, bias-corrected and s.e.-adjusted
$\hat{\tau}^{llr}_{1bw}$          LLR estimator
$\hat{\tau}^{robust,bc}_{1bw}$    the robust estimator in Calonico et al. (2014) with equal bandwidth
$\hat{\tau}^{honest}_{1bw}$       the honest method in Armstrong and Kolesár (2020) with equal bandwidth

We set $q = 7$ for all LCQR estimators. $\hat{\tau}^{robust,bc}_{1bw}$ is obtained from the rdrobust command in the R package rdrobust. We use the option bwselect = mserd in estimation and the option bwselect = cerrd in inference, so that $\hat{\tau}^{robust,bc}_{1bw}$ is MSE-optimal and coverage-optimal in Section 4.1 and Section 4.2, respectively. $\hat{\tau}^{honest}_{1bw}$ is obtained from the RDHonest command in the R package RDHonest. The option opt.criterion = FLCI is used to produce the optimal confidence intervals. The Taylor smoothness class is selected in estimation. To implement the method in Armstrong and Kolesár (2020), one needs to get an estimate of the second derivative of the conditional mean function. We use the function RDSmoothnessBound in the RDHonest package to estimate this bound for the Lee model in eq. (37). Alternatively, since we know the data generating process in eq. (38) and the maximum of the second derivative at 0 is $54.81 \times 2$, a bound of 120 is directly used for the LM model. The local linear estimator, $\hat{\tau}^{llr}_{1bw}$, is obtained by applying the main bandwidth used in $\hat{\tau}^{robust,bc}_{1bw}$.

To calculate the bandwidths in eq. (35) and eq. (36) for LCQR, we use the rule-of-thumb bandwidth selector described in Section 4.2 of Fan and Gijbels (1996) to compute quantities such as $C_1$ and $C_2$ in eq. (35). The bandwidth in eq. (35) or eq. (36) is then used to perform the LCQR estimation, bias correction, and s.e. adjustment. Unlike the LLR estimator, the LCQR estimator has no closed-form solution and is obtained from the iterative MM algorithm. Hence the s.e. and adjusted s.e. reported for LCQR are based on the asymptotic results in Theorems 1 and 3.

4.1 Estimation of the treatment effect

In this subsection, we compare LCQR with LLR for the treatment effect estimation.
Figure 2: Absolute value of the average bias of the bias-corrected estimators, $\hat{\tau}^{cqr,bc}_{1bw}$ and $\hat{\tau}^{robust,bc}_{1bw}$, for the Lee model with homoskedasticity. $\hat{\tau}^{robust,bc}_{1bw}$ is the bias-corrected LLR estimator. The result is based on 5000 replications and the true treatment effect is 0.04.
Figure 2 shows that the bias-corrected LCQR estimator carries a smaller absolute bias than the bias-corrected LLR estimator across the five DGPs; the bias is reduced to a small percentage of the true treatment effect when LCQR is adopted. Similar results for other models can be found in Figures 4 to 6 in Appendix B.

Tables 3 to 6 further present the means and standard errors of the studied estimators to facilitate comparison. These tables indicate that the s.e. of $\hat{\tau}^{cqr}_{1bw}$ is consistently smaller than that of $\hat{\tau}^{llr}_{1bw}$, indicating the efficiency gain of LCQR against LLR. Although DGP 1 has normal errors, $Y$ is non-normal; hence the s.e. of $\hat{\tau}^{cqr}_{1bw}$ is also smaller than that of $\hat{\tau}^{llr}_{1bw}$. As the DGP moves away from normal errors, the LCQR estimator achieves various levels of efficiency gains compared to the LLR estimator. For example, the s.e. of $\hat{\tau}^{cqr}_{1bw}$ in DGP 5 of Table 4 is roughly half of the 0.255 for $\hat{\tau}^{llr}_{1bw}$, so the LCQR/LLR standard error ratio is close to 50% in this example.

While reading Tables 3 to 6, it is important to bear in mind that the s.e.s of estimators with a superscript "bc" are not suitable for assessing the efficiency of the estimators. They are computed by incorporating the additional variability due to bias correction. However, they shed light on the length of confidence intervals. Tables 3 to 6 show that the adjusted s.e. of $\hat{\tau}^{cqr,bc}_{1bw}$ could be smaller or larger than that of $\hat{\tau}^{robust,bc}_{1bw}$, so the simulation results for comparing LCQR with LLR appear to be mixed after bias correction.

In addition, using two bandwidths above and below the cutoff also gives mixed results in terms of bias correction for the LCQR estimator, though it is clear that the two-bandwidth approach leads to a small decrease in s.e. for LCQR.

Table 3: Lee model with homoskedastic errors and treatment effect $\tau_{\text{sharp}} = 0.04$. Rows: $\hat{\tau}^{cqr}_{1bw}$, $\hat{\tau}^{cqr,bc}_{1bw}$, $\hat{\tau}^{cqr}_{2bw}$, $\hat{\tau}^{cqr,bc}_{2bw}$, $\hat{\tau}^{llr}_{1bw}$, $\hat{\tau}^{robust,bc}_{1bw}$; columns: estimate and s.e. under DGPs 1 to 5.
Notes: The estimates for the treatment effect and standard error are averages over 5000 replications with a sample size $n = 500$. The s.e. and adjusted s.e. for LCQR are obtained using the asymptotic expressions from Theorem 1 and Theorem 3. Estimators with a superscript bc are both bias-corrected and s.e.-adjusted. The subscripts 1bw and 2bw refer to the use of equal and unequal bandwidths below and above the cutoff.

Table 4: Lee model with heteroskedastic errors and treatment effect $\tau_{\text{sharp}} = 0.04$. Rows and columns as in Table 3. Notes: see Table 3.

Table 5: LM model with homoskedastic errors and treatment effect $\tau_{\text{sharp}} = -3.45$

Estimator                        DGP 1          DGP 2          DGP 3          DGP 4          DGP 5
                                 est    s.e.    est    s.e.    est    s.e.    est    s.e.    est    s.e.
$\hat{\tau}^{cqr}_{1bw}$         -3.244 0.191   -3.265 0.231   -3.269 0.252   -3.257 0.206   -3.296 0.251
$\hat{\tau}^{cqr,bc}_{1bw}$      -3.439 0.353   -3.453 0.415   -3.450 0.458   -3.445 0.380   -3.443 0.455
$\hat{\tau}^{cqr}_{2bw}$         -3.259 0.189   -3.280 0.228   -3.285 0.249   -3.270 0.204   -3.308 0.247
$\hat{\tau}^{cqr,bc}_{2bw}$      -3.440 0.348   -3.452 0.411   -3.450 0.452   -3.443 0.375   -3.440 0.447
$\hat{\tau}^{llr}_{1bw}$         -3.340 0.223   -3.342 0.301   -3.333 0.346   -3.345 0.256   -3.328 0.452
$\hat{\tau}^{robust,bc}_{1bw}$   -3.423 0.256   -3.434 0.349   -3.426 0.403   -3.432 0.295   -3.426 0.530
Notes: see Table 3.
Table 6: LM model with heteroskedastic errors and treatment effect $\tau_{\text{sharp}} = -3.45$

Estimator                        DGP 1          DGP 2          DGP 3          DGP 4          DGP 5
                                 est    s.e.    est    s.e.    est    s.e.    est    s.e.    est    s.e.
$\hat{\tau}^{cqr}_{1bw}$         -3.234 0.106   -3.245 0.127   -3.252 0.138   -3.242 0.114   -3.281 0.137
$\hat{\tau}^{cqr,bc}_{1bw}$      -3.443 0.195   -3.448 0.228   -3.444 0.250   -3.442 0.209   -3.443 0.248
$\hat{\tau}^{cqr}_{2bw}$         -3.239 0.104   -3.256 0.125   -3.259 0.136   -3.247 0.112   -3.288 0.134
$\hat{\tau}^{cqr,bc}_{2bw}$      -3.443 0.192   -3.447 0.225   -3.446 0.246   -3.444 0.206   -3.445 0.243
$\hat{\tau}^{llr}_{1bw}$         -3.364 0.146   -3.359 0.190   -3.349 0.215   -3.364 0.165   -3.339 0.273
$\hat{\tau}^{robust,bc}_{1bw}$   -3.431 0.163   -3.438 0.216   -3.432 0.245   -3.437 0.185   -3.431 0.315
Notes: see Table 3.

4.2 Inference on the treatment effect

This subsection studies the coverage probability of the confidence intervals based on the LCQR estimator, using the adjusted s.e. in Theorem 3 and the adj. MSE-based bandwidth selector. The results are summarized in Tables 7 to 10. The nominal coverage probability is set to 95%.

Without bias correction or s.e. adjustment, confidence intervals based on the LCQR estimators, $\hat{\tau}^{cqr}_{1bw}$ and $\hat{\tau}^{cqr}_{2bw}$, are found to have poor coverage probabilities in Tables 7 to 10, which highlights an important difference between estimation and inference: despite the excellent finite sample properties of LCQR in Tables 3 to 6, one has to perform both bias correction and s.e. adjustment in order to produce the desired coverage of confidence intervals. Furthermore, the coverage probability of LCQR-based confidence intervals is lower than that of LLR-based confidence intervals. For example, it is 92.2% in DGP 1 in Table 7, compared to 93.4% for the LLR-based confidence interval. Although $\hat{\tau}^{cqr}_{1bw}$ tends to have a smaller bias and a smaller s.e. than $\hat{\tau}^{llr}_{1bw}$, it appears that the bias in $\hat{\tau}^{cqr}_{1bw}$ is still not small enough to center the estimator close to the true value. A smaller s.e. will further contribute to the decrease in coverage probabilities. This explains the poor coverage for confidence intervals based on $\hat{\tau}^{cqr}_{1bw}$ and $\hat{\tau}^{cqr}_{2bw}$.

With bias correction and s.e. adjustment, the proposed bandwidth selectors in eq. (35) and eq. (36) lead to good coverage across the five DGPs in Tables 7 to 10. It is important to recall that the LCQR estimators presented in all tables use the bandwidth in eq. (35) and eq. (36), which is not designed to optimize the coverage probability of confidence intervals, while the confidence intervals for $\hat{\tau}^{robust,bc}_{1bw}$ and $\hat{\tau}^{honest}_{1bw}$ in Tables 7 to 10 are optimized to maximize the coverage probability. In this respect, it is reasonable to conclude that LCQR with bandwidths in eq. (35) and eq. (36) offers very competitive results. The performance of the honest confidence intervals is excellent for some DGPs in Table 7. Its coverage probability is higher in Tables 9 and 10, most likely due to the use of the upper bound 120 in estimation. Clearly, a better number can be used to further tune the results.

Table 7: Coverage probability in the Lee model with homoskedasticity. Rows: $\hat{\tau}^{cqr}_{1bw}$, $\hat{\tau}^{cqr,bc}_{1bw}$, $\hat{\tau}^{cqr}_{2bw}$, $\hat{\tau}^{cqr,bc}_{2bw}$, $\hat{\tau}^{llr}_{1bw}$, $\hat{\tau}^{robust,bc}_{1bw}$, $\hat{\tau}^{honest}_{1bw}$; columns: DGPs 1 to 5.
Notes: The numbers are the averages of the coverage probability associated with different estimators. The results are based on 5000 replications with a sample size $n = 500$. The s.e. and adjusted s.e. for the LCQR estimator are obtained based on the asymptotic expressions from Theorem 1 and Theorem 3. Estimators with superscript bc are both bias-corrected and s.e.-adjusted. The results of both $\hat{\tau}^{robust,bc}_{1bw}$ and $\hat{\tau}^{honest}_{1bw}$ are coverage-optimal.

Table 8: Coverage probability in the Lee model with heteroskedasticity. Rows and columns as in Table 7. Notes: see Table 7.

Table 9: Coverage probability in the LM model with homoskedasticity. Rows and columns as in Table 7. Notes: see Table 7.

Table 10: Coverage probability in the LM model with heteroskedasticity. Rows and columns as in Table 7. Notes: see Table 7.

5 Application to Lee (2008)
In this section, we use the data from Lee (2008) to illustrate the practical usage of LCQR. To facilitate comparison, the findings based on LLR are also presented.
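The bandwidth-sequence exercise reported below (Figure 3(b)) can be sketched with a few lines of R, reusing the illustrative lcqr_fit() from Section 2.2 and assuming a data frame lee with the running variable x and outcome y (hypothetical names, not a dataset shipped with any package):

```r
# Treatment-effect estimates over the bandwidth grid of Figure 3(b).
h_grid  <- seq(0.05, 1, by = 0.025)
tau_hat <- sapply(h_grid, function(h) {
  lcqr_fit(lee$x[lee$x >= 0], lee$y[lee$x >= 0], h) -
  lcqr_fit(lee$x[lee$x <  0], lee$y[lee$x <  0], h)
})
plot(h_grid, tau_hat, type = "l", xlab = "bandwidth", ylab = "estimate")
```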
558 observations depicted in Fig-ure 3(a); see also Imbens and Kalyanaraman (2012). The horizontal running variable isthe democratic vote share (margin of victory) in a previous election, while the vertical out-come variable is the democratic vote share in the election afterwards. Consistent with Lee(2008) and several follow-up studies, Figure 3(a) indicates that there is a positive impact ofincumbency on re-election, i.e., a visible jump occurs at the threshold zero.Figure 3(b) and (c) present the estimated impact of incumbency on re-election by LCQRand LLR methods, respectively. We consider a sequence of bandwidth values ranging from0.05 to 1 with the step size 0.025: 0.05, 0.075, 0.1, ..., 1. This bandwidth sequence thus nestsmany common choices such as the MSE-optimal bandwidth of Imbens and Kalyanaraman(2012), which is about 0.3 for the studied data.The comparison of Figure 3(b) and (c) shows that LCQR and LLR yield similar pointestimates over a wide range of bandwidths for the studied Lee (2008) application. As thebandwidth increases, Figure 3(a) indicates that there are data points staying further awayfrom the fitted regression line. These data points affect LCQR and LLR estimators in adifferent manner, leading to slightly disparate point estimates. Nevertheless, all the pointestimates depicted in Figure 3(b) and (c) are significantly positive, since the (vertical) zerovalue is excluded from the shaded regions generated by ± ∼ (a) Regression Discontinuity Design . . . . D e m o c r a t i c V o t e S ha r e , E l e c t i on t + −1 −.5 0 .5 1Democratic Vote Share Margin of Victory, Election t (b) LCQR Estimate ± T r ea t m en t E ff e c t + / - S . E . (c) LLR Estimate ± T r ea t m en t E ff e c t + / - S . E . (d) Ratio of S.E., LCQR/LLR S t anda r d E rr o r R a t i o : L C Q R / LL R Notes : (a) x-axis, the democratic vote share (margin of victory) in Election t ; y-axis, the democratic voteshare in Election t + 1. The dots show the sample mean of the vertical variable in each bin of the horizontalvariable (50 bins on each side of the cutoff). The solid line represents the fitted fourth-order polynomial.As the bandwidth increases, (b) presents the LCQR estimate ± × standard error; (c) presents the LLRestimate ± × standard error; and (d) presents the ratio of the standard errors by LCQR and LLR. TheMSE-optimal bandwidth of Imbens and Kalyanaraman (2012) for the studied data set is about 0.3. Thetriangular kernel is used for both LCQR and LLR estimators. ± et al. (2014) approach, which is (0.046, 0.089). These empiricalfindings therefore lend credibility to our proposed LCQR approach. In this paper, we study the application of LCQR in Kai et al. (2010) to the estimation andinference in RD. We present numerical evidence for the efficiency gain of using LCQR inRD estimation and also propose a bias-corrected and s.e.-adjusted t -statistic to improve thecoverage of confidence intervals. Simulation results show good performance of the proposedmethod under several non-normal error distributions.The current work can be extended in several directions. By minimizing higher orderterms in the Edgeworth expansion of the t -statistic, we could derive a coverage-optimalbandwidth for LCQR that will possibly further improve its performance in inference, similarto the work in Calonico et al. (2018). It will also be interesting to revisit some of the existingapplications in RD with the proposed method as data may deviate from normality. 
Finally, on the computation side, instead of using the same bandwidth for estimation, bias correction and s.e. adjustment, one can refine the bandwidth selection process, which may also improve the estimation and coverage probability of LCQR. We leave these topics for future research.

References
Armstrong, T. and Kolesár, M. (2018). Optimal inference in a class of regression models. Econometrica, 86(2), 655–683.

Armstrong, T. and Kolesár, M. (2020). Simple and honest confidence intervals in nonparametric regression. Quantitative Economics, 11(1), 1–39.

Calonico, S., Cattaneo, M. D. and Farrell, M. H. (2018). On the effect of bias estimation on coverage accuracy in nonparametric inference. Journal of the American Statistical Association, 113(522), 767–779.

Calonico, S., Cattaneo, M. D. and Titiunik, R. (2014). Robust nonparametric confidence intervals for regression-discontinuity designs. Econometrica, 82(6), 2295–2326.

Cheng, M.-Y., Fan, J. and Marron, J. S. (1997). On automatic boundary corrections. The Annals of Statistics, 25(4), 1691–1708.

Fan, J. (1993). Local linear regression smoothers and their minimax efficiencies. The Annals of Statistics, 21(1), 196–216.

Fan, J. and Gijbels, I. (1992). Variable bandwidth and local linear regression smoothers. The Annals of Statistics, 20(4), 2008–2036.

Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and Its Applications. CRC Press.

Feir, D., Lemieux, T. and Marmer, V. (2016). Weak identification in fuzzy regression discontinuity designs. Journal of Business and Economic Statistics, 34(2), 185–196.

Frandsen, B. R., Frölich, M. and Melly, B. (2012). Quantile treatment effects in the regression discontinuity design. Journal of Econometrics, 168(2), 382–395.

Hahn, J., Todd, P. and Van der Klaauw, W. (2001). Identification and estimation of treatment effects with a regression-discontinuity design. Econometrica, 69(1), 201–209.

Huang, X. and Lin, Z. (2020). Local composite quantile regression smoothing: A flexible data structure and cross-validation. Econometric Theory, forthcoming.

Imbens, G. and Kalyanaraman, K. (2012). Optimal bandwidth choice for the regression discontinuity estimator. Review of Economic Studies, 79(3), 933–959.

Imbens, G. and Lemieux, T. (2008). Regression discontinuity designs: A guide to practice. Journal of Econometrics, 142(2), 615–635.

Kai, B., Li, R. and Zou, H. (2009). Supplementary material for "Local composite quantile regression smoothing: An efficient and safe alternative to local polynomial regression".

Kai, B., Li, R. and Zou, H. (2010). Local composite quantile regression smoothing: An efficient and safe alternative to local polynomial regression. Journal of the Royal Statistical Society, Series B, 72(1), 49–69.

Kai, B., Li, R. and Zou, H. (2011). New efficient estimation and variable selection methods for semiparametric varying-coefficient partially linear models. The Annals of Statistics, 39(1), 305–332.

Lee, D. S. (2008). Randomized experiments from non-random selection in U.S. House elections. Journal of Econometrics, 142(2), 675–697.

Lee, D. S. and Lemieux, T. (2010). Regression discontinuity designs in economics. Journal of Economic Literature, 48(2), 281–355.

Li, D. and Li, R. (2016). Local composite quantile regression smoothing for Harris recurrent Markov processes. Journal of Econometrics, 194(1), 44–56.

Ludwig, J. and Miller, D. L. (2007). Does Head Start improve children's life chances? Evidence from a regression discontinuity design. Quarterly Journal of Economics, 122(1), 159–208.

Porter, J. (2003). Estimation in the regression discontinuity model. Unpublished manuscript, Department of Economics, University of Wisconsin at Madison.

Ruppert, D., Sheather, S. J. and Wand, M. P. (1995). An effective bandwidth selector for local least squares regression. Journal of the American Statistical Association, 90(432), 1257–1270.

Ruppert, D. and Wand, M. P. (1994). Multivariate locally weighted least squares regression. The Annals of Statistics, 22(3), 1346–1370.

Thistlethwaite, D. L. and Campbell, D. T. (1960). Regression-discontinuity analysis: An alternative to the ex post facto experiment. Journal of Educational Psychology, 51(6), 309–317.

Zhao, Z. and Xiao, Z. (2014). Efficient regressions via optimally combining quantile information. Econometric Theory, 30(6), 1272–1314.
Appendix A

Lemmas and proofs of theorems are provided in this section. In a fuzzy RD, LCQR is applied separately to eqs. (1), (2), (7) and (8). Instead of introducing four similar sets of variables and notation for the proof, we will focus on eq. (1). Exactly the same proof holds for results based on eqs. (2), (7) and (8).

Let $f_{\epsilon_{Y+}} = (f_{\epsilon_{Y+}}(c_1), \cdots, f_{\epsilon_{Y+}}(c_q))^T$ be a $q \times 1$ vector. Let $S_{Y+,11}(c)$ be a $q \times q$ diagonal matrix with diagonal elements $f_{\epsilon_{Y+}}(c_k)\,\mu_{+,0}(c)$, $S_{Y+,12}(c)$ be a $q \times p$ matrix with $(k, j)$ element $f_{\epsilon_{Y+}}(c_k)\,\mu_{+,j}(c)$, $S_{Y+,21}(c)$ be the transpose of $S_{Y+,12}(c)$, and $S_{Y+,22}(c)$ be a $p \times p$ matrix with $(j, j')$ element equal to $\sum_{k=1}^{q} f_{\epsilon_{Y+}}(c_k)\,\mu_{+,j+j'}(c)$. Let $\Sigma_{Y+,11}(c)$ be a $q \times q$ matrix with $(k, k')$ element $\nu_{+,0}(c)\,\tau_{kk'}$, $\Sigma_{Y+,12}(c)$ be a $q \times p$ matrix with $(k, j)$ element $\sum_{k'=1}^{q}\tau_{kk'}\,\nu_{+,j}(c)$, $\Sigma_{Y+,21}(c)$ be the transpose of $\Sigma_{Y+,12}(c)$, and $\Sigma_{Y+,22}(c)$ be a $p \times p$ matrix with $(j, j')$ element equal to $\sum_{k=1}^{q}\sum_{k'=1}^{q}\tau_{kk'}\,\nu_{+,j+j'}(c)$. Define
$$S_{Y+}(c) = \begin{pmatrix} S_{Y+,11}(c) & S_{Y+,12}(c) \\ S_{Y+,21}(c) & S_{Y+,22}(c) \end{pmatrix}, \qquad \Sigma_{Y+}(c) = \begin{pmatrix} \Sigma_{Y+,11}(c) & \Sigma_{Y+,12}(c) \\ \Sigma_{Y+,21}(c) & \Sigma_{Y+,22}(c) \end{pmatrix},$$
and the partitioned inverse of $S_{Y+}(c)$:
$$S^{-1}_{Y+} = \begin{pmatrix} (S^{-1}_{Y+}(c))_{11} & (S^{-1}_{Y+}(c))_{12} \\ (S^{-1}_{Y+}(c))_{21} & (S^{-1}_{Y+}(c))_{22} \end{pmatrix}.$$
Let $F_+(c_k, c_{k'})$ be the joint cumulative distribution function of $\epsilon_{Y+}$ and $\epsilon_{T+}$ at $(c_k, c_{k'})$. Define $\phi_{kk'} = F_+(c_k, c_{k'}) - \tau_k\tau_{k'}$. Also let $\Sigma_{YT+,11}(c)$ be a $q \times q$ matrix with $(k, k')$ element $\nu_{+,0}(c)\,\phi_{kk'}$, $\Sigma_{YT+,12}(c)$ be a $q \times p$ matrix with $(k, j)$ element $\sum_{k'=1}^{q}\phi_{kk'}\,\nu_{+,j}(c)$, $\Sigma_{YT+,21}(c)$ be the transpose of $\Sigma_{YT+,12}(c)$, and $\Sigma_{YT+,22}(c)$ be a $p \times p$ matrix with $(j, j')$ element $\sum_{k=1}^{q}\sum_{k'=1}^{q}\phi_{kk'}\,\nu_{+,j+j'}(c)$. Define
$$\Sigma_{YT+}(c) = \begin{pmatrix} \Sigma_{YT+,11}(c) & \Sigma_{YT+,12}(c) \\ \Sigma_{YT+,21}(c) & \Sigma_{YT+,22}(c) \end{pmatrix}.$$
A similar set of definitions can be provided for other variables on the boundary, including $S_{T+}(c)$, $S^{-1}_{T+}(c)$, $\Sigma_{T+}(c)$, $S_{Y-}(c)$, $S^{-1}_{Y-}(c)$, $\Sigma_{Y-}(c)$, $S_{T-}(c)$, $S^{-1}_{T-}(c)$, $\Sigma_{T-}(c)$, and $F_-(c_k, c_{k'})$; $\Sigma_{YT-}(c)$ and $\phi_{kk'}$ can also be redefined with $F_-(c_k, c_{k'})$.

Consider eq. (1). Let $x_{+,i} = (X_{+,i} - x)/h_{Y+}$ and $K_i = K(x_{+,i})$ with $x = 0$. Define, for $k = 1, \cdots, q$ and $j = 0, \cdots, p$,
$$u_k = \sqrt{n_+ h_{Y+}}\left(a_k - m_{Y+}(x) - \sigma_{\epsilon_{Y+}}(x)\,c_k\right),$$
$$v_j = h^j_{Y+}\sqrt{n_+ h_{Y+}}\left(j!\,b_j - m^{(j)}_{Y+}(x)\right)/j!,$$
$$\Delta_{i,k} = \frac{u_k}{\sqrt{n_+ h_{Y+}}} + \sum_{j=1}^{p}\frac{v_j\,x^j_{+,i}}{\sqrt{n_+ h_{Y+}}},$$
$$r_{i,p} = m_{Y+}(X_{+,i}) - \sum_{j=0}^{p} m^{(j)}_{Y+}(x)(X_{+,i} - x)^j/j!,$$
$$d_{i,k} = c_k\left[\sigma_{\epsilon_{Y+}}(X_{+,i}) - \sigma_{\epsilon_{Y+}}(x)\right] + r_{i,p}.$$
Let $W^*_{Y+,n_+} = (w^*_{Y+,1}, \cdots, w^*_{Y+,q}, w^*_{Y+,q+1}, \cdots, w^*_{Y+,q+p})^T = (w^{*T}_{Y+,1n}, w^{*T}_{Y+,2n})^T$, where
$$w^*_{Y+,k} = \frac{1}{\sqrt{n_+ h_{Y+}}}\sum_{i=1}^{n_+} K(x_{+,i})\,\eta^*_{Y+,i,k},$$
$$w^*_{Y+,q+j} = \frac{1}{\sqrt{n_+ h_{Y+}}}\sum_{k=1}^{q}\sum_{i=1}^{n_+} K(x_{+,i})\,x^j_{+,i}\,\eta^*_{Y+,i,k}, \quad \forall j = 1, \cdots, p,$$
$$\eta^*_{Y+,i,k} = I\!\left(\epsilon_{Y+,i} \leq c_k - \frac{d_{i,k}}{\sigma_{\epsilon_{Y+}}(X_{+,i})}\right) - \tau_k.$$
Also let $W_{Y+,n_+} = (w_{Y+,1}, \cdots, w_{Y+,q}, w_{Y+,q+1}, \cdots, w_{Y+,q+p})^T = (w^T_{Y+,1n}, w^T_{Y+,2n})^T$, where
$$w_{Y+,k} = \frac{1}{\sqrt{n_+ h_{Y+}}}\sum_{i=1}^{n_+} K(x_{+,i})\,\eta_{Y+,i,k},$$
$$w_{Y+,q+j} = \frac{1}{\sqrt{n_+ h_{Y+}}}\sum_{k=1}^{q}\sum_{i=1}^{n_+} K(x_{+,i})\,x^j_{+,i}\,\eta_{Y+,i,k}, \quad \forall j = 1, \cdots, p,$$
$$\eta_{Y+,i,k} = I(\epsilon_{Y+,i} \leq c_k) - \tau_k.$$
Similarly, we define $W^*_{T+,n_+}$, $w^*_{T+,k}$, $w^*_{T+,q+j}$, $\eta^*_{T+,i,k}$, $W_{T+,n_+}$, $w_{T+,k}$, $w_{T+,q+j}$ and $\eta_{T+,i,k}$.

Consider the case $p = 1$ and define $\theta = (u_1, \cdots, u_q, v_1)^T$. It can be shown that minimizing eq. (13) is equivalent to minimizing
$$L_{n_+}(\theta) = \sum_{i=1}^{n_+} K(x_{+,i})\sum_{k=1}^{q}\left[\rho_{\tau_k}\!\left(\sigma_{\epsilon_{Y+}}(X_{+,i})(\epsilon_{Y+,i} - c_k) + d_{i,k} - \Delta_{i,k}\right) - \rho_{\tau_k}\!\left(\sigma_{\epsilon_{Y+}}(X_{+,i})(\epsilon_{Y+,i} - c_k) + d_{i,k}\right)\right].$$
Let $\hat{\theta}_{n_+} = (\hat{u}_1, \cdots, \hat{u}_q, \hat{v}_1)^T$ be the minimizer.
Lemma 1. Under Assumptions 1 to 6, as $n_+ \to \infty$, we have
\[
\hat\theta_{n_+} + \frac{\sigma_{\epsilon_{Y+}}(0)}{f_{X+}(0)}\, S_{Y+}^{-1}(c)\, E(W^*_{Y+,n_+} \mid X) \xrightarrow{L} MVN\left(0,\ \frac{\sigma^2_{\epsilon_{Y+}}(0)}{f_{X+}(0)}\, S_{Y+}^{-1}(c)\,\Sigma_{Y+}(c)\,S_{Y+}^{-1}(c)\right).
\]
Proof of Lemma 1.
See the proof of Theorem 2.1 in Kai et al. (2009).
Lemma 2.
Under Assumptions 1 to 6, as $n_+ \to \infty$, the asymptotic bias and variance for the LCQR estimator in eq. (1) are given by
\[
\begin{aligned}
\mathrm{Bias}(\hat m_{Y+}(0) \mid X) &= \frac{1}{2}\, a_{Y+}(c)\, m^{(2)}_{Y+}(0)\, h^2_{Y+} + o_p(h^2_{Y+}), \\
\mathrm{Var}(\hat m_{Y+}(0) \mid X) &= \frac{1}{n_+h_{Y+}}\, \frac{b_{Y+}(c)\,\sigma^2_{\epsilon_{Y+}}(0)}{f_{X+}(0)} + o_p\Big(\frac{1}{n_+h_{Y+}}\Big),
\end{aligned}
\]
where
\[
a_{Y+}(c) = \frac{\mu^2_{+,2}(c) - \mu_{+,1}(c)\mu_{+,3}(c)}{\mu_{+,0}(c)\mu_{+,2}(c) - \mu^2_{+,1}(c)}, \tag{39}
\]
\[
b_{Y+}(c) = e^T_q\big(S_{Y+}^{-1}(c)\,\Sigma_{Y+}(c)\,S_{Y+}^{-1}(c)\big)_{11}\, e_q / q^2. \tag{40}
\]
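Numerically, the two constants in eqs. (39) and (40) are easy to evaluate. A standalone sketch, assuming a Gaussian kernel, standard normal errors, $q = 5$ and $p = 1$, where the padded vector $(1, \ldots, 1, 0)^T$ plays the role of $e_q$ on the intercept block:

```r
# Evaluate a_{Y+}(c) and b_{Y+}(c) of Lemma 2 for a Gaussian kernel, N(0,1) errors.
q <- 5
tau <- (1:q) / (q + 1); ck <- qnorm(tau); fck <- dnorm(ck)
mu <- sapply(0:3, function(j) integrate(function(u) u^j * dnorm(u),   0, Inf)$value)
nu <- sapply(0:2, function(j) integrate(function(u) u^j * dnorm(u)^2, 0, Inf)$value)
taukk <- outer(tau, tau, pmin) - outer(tau, tau)

a_Y <- (mu[3]^2 - mu[2] * mu[4]) / (mu[1] * mu[3] - mu[2]^2)   # eq. (39)

S   <- rbind(cbind(diag(fck * mu[1]), fck * mu[2]),            # p = 1 blocks
             c(fck * mu[2], sum(fck * mu[3])))
Sig <- rbind(cbind(nu[1] * taukk, rowSums(taukk) * nu[2]),
             c(rowSums(taukk) * nu[2], sum(taukk) * nu[3]))
M   <- solve(S) %*% Sig %*% solve(S)                           # sandwich matrix
e_q <- c(rep(1, q), 0)                                         # picks the intercept block
b_Y <- drop(t(e_q) %*% M %*% e_q) / q^2                        # eq. (40)
c(a_Y = a_Y, b_Y = b_Y)
```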
Proof of Lemma 2. The bias result follows that in Theorem 2.1 in Kai et al. (2009). The variance result also largely follows that in Kai et al. (2009). Given
\[
\mathrm{Var}(\hat m_{Y+}(0) \mid X) = \frac{1}{n_+h_{Y+}}\, \frac{\sigma^2_{\epsilon_{Y+}}(0)}{q^2 f_{X+}(0)}\, e^T_q\big(S_{Y+}^{-1}(c)\,\Sigma_{Y+}(c)\,S_{Y+}^{-1}(c)\big)_{11}\, e_q + o_p\Big(\frac{1}{n_+h_{Y+}}\Big), \tag{41}
\]
it is easy to verify that when $q = 1$, eq. (41) can be written as
\[
\mathrm{Var}(\hat m_{Y+}(0) \mid X) = \frac{1}{n_+h_{Y+}}\, \frac{\sigma^2_{\epsilon_{Y+}}(0)}{f_{X+}(0)}\, \frac{\mu^2_{+,2}(c)\nu_{+,0}(c) - 2\mu_{+,1}(c)\mu_{+,2}(c)\nu_{+,1}(c) + \mu^2_{+,1}(c)\nu_{+,2}(c)}{\big(\mu_{+,0}(c)\mu_{+,2}(c) - \mu^2_{+,1}(c)\big)^2}\, R(q) + o_p\Big(\frac{1}{n_+h_{Y+}}\Big), \tag{42}
\]
where $R(q) = \frac{1}{q^2}\sum_{k=1}^{q}\sum_{k'=1}^{q} \frac{\tau_{kk'}}{f_{\epsilon_{Y+}}(c_k)\, f_{\epsilon_{Y+}}(c_{k'})}$. However, for $q \ge 2$, the result in eq. (42) no longer holds and we use eq. (41) instead.
Next we derive the covariance result $\mathrm{Cov}(\hat m_{Y+}(x), \hat m_{T+}(x) \mid X)$ at the boundary point 0. The result for $\mathrm{Cov}(\hat m_{Y-}(x), \hat m_{T-}(x) \mid X)$ at 0 is similar.

Lemma 3. Under Assumptions 1 to 6, as $n_+ \to \infty$, the covariance between $\hat m_{Y+}(x)$ and $\hat m_{T+}(x)$ at the boundary point 0 is given by
\[
\mathrm{Cov}(\hat m_{Y+}(0), \hat m_{T+}(0) \mid X) = \frac{1}{n_+\sqrt{h_{Y+}h_{T+}}}\, \frac{\sigma_{\epsilon_{Y+}}(0)\,\sigma_{\epsilon_{T+}}(0)}{f_{X+}(0)}\, b_{YT+} + o_p\Big(\frac{1}{n_+h_{Y+}} + \frac{1}{n_+h_{T+}}\Big), \tag{43}
\]
where
\[
b_{YT+} = \frac{1}{q^2}\, e^T_q\big(S_{Y+}^{-1}(c)\,\Sigma_{YT+}\,S_{T+}^{-1}(c)\big)_{11}\, e_q. \tag{44}
\]
Proof of Lemma 3.
Assume $p = 1$. From Lemma 1, we write
\[
\begin{aligned}
\hat m_{Y+}(0) - E(\hat m_{Y+}(0) \mid X)
&= -\frac{1}{q\sqrt{n_+h_{Y+}}}\, \frac{\sigma_{\epsilon_{Y+}}(0)}{f_{X+}(0)}\, e^T_q \Big( (S_{Y+}^{-1}(c))_{11}\ \ (S_{Y+}^{-1}(c))_{12} \Big) \begin{pmatrix} w^*_{Y+,n_1} - E(w^*_{Y+,n_1} \mid X) \\ w^*_{Y+,n_2} - E(w^*_{Y+,n_2} \mid X) \end{pmatrix} + o_p(1) \\
&= -\frac{1}{q\sqrt{n_+h_{Y+}}}\, \frac{\sigma_{\epsilon_{Y+}}(0)}{f_{X+}(0)}\, e^T_q \Big( (S_{Y+}^{-1}(c))_{11}\ \ (S_{Y+}^{-1}(c))_{12} \Big) \begin{pmatrix} w_{Y+,n_1} - E(w_{Y+,n_1} \mid X) \\ w_{Y+,n_2} - E(w_{Y+,n_2} \mid X) \end{pmatrix} + o_p(1),
\end{aligned}
\]
where the last equality follows from the result that $\mathrm{Var}(w^*_{Y+,n_1} - w_{Y+,n_1} \mid X) = o_p(1)$ and $\mathrm{Var}(w^*_{Y+,n_2} - w_{Y+,n_2} \mid X) = o_p(1)$. See Kai et al. (2010) for a proof. Similarly, we have
\[
\hat m_{T+}(0) - E(\hat m_{T+}(0) \mid X) = -\frac{1}{q\sqrt{n_+h_{T+}}}\, \frac{\sigma_{\epsilon_{T+}}(0)}{f_{X+}(0)}\, e^T_q \Big( (S_{T+}^{-1}(c))_{11}\ \ (S_{T+}^{-1}(c))_{12} \Big) \begin{pmatrix} w_{T+,n_1} - E(w_{T+,n_1} \mid X) \\ w_{T+,n_2} - E(w_{T+,n_2} \mid X) \end{pmatrix} + o_p(1).
\]
Hence the covariance is given by
\[
\begin{aligned}
&\mathrm{Cov}(\hat m_{Y+}(0), \hat m_{T+}(0) \mid X) = E\big[ (\hat m_{Y+}(0) - E(\hat m_{Y+}(0) \mid X))(\hat m_{T+}(0) - E(\hat m_{T+}(0) \mid X)) \big] \\
&= \frac{1}{q^2 n_+ \sqrt{h_{Y+}h_{T+}}}\, \frac{\sigma_{\epsilon_{Y+}}(0)\,\sigma_{\epsilon_{T+}}(0)}{f^2_{X+}(0)}\, e^T_q \Big( (S_{Y+}^{-1}(c))_{11}\ \ (S_{Y+}^{-1}(c))_{12} \Big) \\
&\qquad \times E\left[ \begin{pmatrix} w_{Y+,n_1} - E(w_{Y+,n_1} \mid X) \\ w_{Y+,n_2} - E(w_{Y+,n_2} \mid X) \end{pmatrix} \begin{pmatrix} w_{T+,n_1} - E(w_{T+,n_1} \mid X) \\ w_{T+,n_2} - E(w_{T+,n_2} \mid X) \end{pmatrix}^T \right] \begin{pmatrix} (S_{T+}^{-1}(c))_{11} \\ (S_{T+}^{-1}(c))_{21} \end{pmatrix} e_q \\
&= \frac{1}{q^2 n_+ \sqrt{h_{Y+}h_{T+}}}\, \frac{\sigma_{\epsilon_{Y+}}(0)\,\sigma_{\epsilon_{T+}}(0)}{f_{X+}(0)}\, e^T_q \big( S_{Y+}^{-1}(c)\,\Sigma_{YT+}\,S_{T+}^{-1}(c) \big)_{11}\, e_q + o_p\Big(\frac{1}{n_+h_{Y+}} + \frac{1}{n_+h_{T+}}\Big),
\end{aligned}
\]
where $\mathrm{Cov}(\eta_{Y+,i,k}, \eta_{T+,j,k'}) = \phi_{kk'}$ if $i = j$ and $\mathrm{Cov}(\eta_{Y+,i,k}, \eta_{T+,j,k'}) = 0$ if $i \ne j$.
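The covariance of the indicator scores used in the last step can be checked directly from the definitions of $\eta_{Y+,i,k}$, $\eta_{T+,i,k'}$ and $F_{+}$:
\[
\begin{aligned}
\mathrm{Cov}(\eta_{Y+,i,k}, \eta_{T+,i,k'}) &= E\big[\big(I(\epsilon_{Y+,i} \le c_k) - \tau_k\big)\big(I(\epsilon_{T+,i} \le c_{k'}) - \tau_{k'}\big)\big] \\
&= P(\epsilon_{Y+,i} \le c_k,\ \epsilon_{T+,i} \le c_{k'}) - \tau_k\tau_{k'} = F_{+}(c_k, c_{k'}) - \tau_k\tau_{k'} = \phi_{kk'},
\end{aligned}
\]
since $E[I(\epsilon_{Y+,i} \le c_k)] = \tau_k$ and $E[I(\epsilon_{T+,i} \le c_{k'})] = \tau_{k'}$; independence across observations gives the zero covariance when $i \ne j$.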
Proof of Theorem 2. The proof of Theorem 2 follows directly from Lemmas 2 and 3, eq. (12), and the approximation
\[
\begin{aligned}
\hat\tau_{\text{fuzzy}} - \tau_{\text{fuzzy}} &= \frac{1}{m_{T+}(0) - m_{T-}(0)}\big[ \hat m_{Y+}(0) - m_{Y+}(0) - (\hat m_{Y-}(0) - m_{Y-}(0)) \big] \\
&\quad - \frac{m_{Y+}(0) - m_{Y-}(0)}{\big[m_{T+}(0) - m_{T-}(0)\big]^2}\big[ \hat m_{T+}(0) - m_{T+}(0) - (\hat m_{T-}(0) - m_{T-}(0)) \big] \\
&\quad + o_p\big(h^2_{Y+} + h^2_{Y-} + h^2_{T+} + h^2_{T-}\big).
\end{aligned}
\]

For convenience we write $\hat m_{Y+}(0)$ and $\hat m_{Y-}(0)$ as $\hat m_{Y+}$ and $\hat m_{Y-}$, respectively. Equation (29) suggests that we need the expressions for $\mathrm{Var}(\mathrm{Bias}(\hat m_{Y+}))$ and $\mathrm{Cov}(\hat m_{Y+}, \mathrm{Bias}(\hat m_{Y+}))$ to compute the adjusted variance. The next lemma provides results for $\mathrm{Var}(\mathrm{Bias}(\hat m_{Y+}))$. In deriving these results, for completeness, we also present the bias of $\mathrm{Bias}(\hat m_{Y+})$. Let $e_r$ be a $p \times 1$ vector with the $r$th element equal to one. Let $p = 3$ in the following proof.
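The approximation displayed in the proof of Theorem 2 above is the first-order expansion of the ratio map. Writing $a = m_{Y+}(0) - m_{Y-}(0)$, $b = m_{T+}(0) - m_{T-}(0)$, and $(\hat a, \hat b)$ for their estimates, a one-step sketch:
\[
\hat\tau_{\text{fuzzy}} - \tau_{\text{fuzzy}} = \frac{\hat a}{\hat b} - \frac{a}{b} = \frac{1}{b}\,(\hat a - a) - \frac{a}{b^2}\,(\hat b - b) + O\big((\hat a - a)^2 + (\hat b - b)^2\big),
\]
whose two leading terms are exactly those in the displayed expansion.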
Lemma 4. Under Assumptions 1 to 6, if $h_{Y+} \to 0$ and $n_+h_{Y+} \to \infty$ as $n_+ \to \infty$, the asymptotic bias and variance of $\hat m^{(2)}_{Y+}$ are given by
\[
\mathrm{Bias}(\hat m^{(2)}_{Y+} \mid X) = \frac{1}{12}\, a^*_{Y+}(c)\, m^{(4)}_{Y+}\, h^2_{Y+} + o_p(h^2_{Y+}), \tag{45}
\]
\[
\mathrm{Var}(\hat m^{(2)}_{Y+} \mid X) = \frac{4}{n_+h^5_{Y+}}\, \frac{\sigma^2_{\epsilon_{Y+}}(0)\, b^*_{Y+}(c)}{f_{X+}(0)} + o_p\Big(\frac{1}{n_+h^5_{Y+}}\Big), \tag{46}
\]
where
\[
a^*_{Y+}(c) = \mu_{+,4}\, e^T_2 (S_{Y+}^{-1}(c))_{21}\, f_{\epsilon_{Y+}} + \sum_{k=1}^{q} f_{\epsilon_{Y+}}(c_k)\, e^T_2 (S_{Y+}^{-1}(c))_{22}\, (\mu_{+,5}, \mu_{+,6}, \mu_{+,7})^T, \tag{47}
\]
\[
b^*_{Y+}(c) = e^T_2 \big(S_{Y+}^{-1}(c)\,\Sigma_{Y+}(c)\,S_{Y+}^{-1}\big)_{22}\, e_2. \tag{48}
\]
Proof of Lemma 4. From the definition of $v_j$, we have
\[
\hat m^{(2)}_{Y+} = m^{(2)}_{Y+} + \frac{2\hat v_2}{h^2_{Y+}\sqrt{n_+h_{Y+}}}. \tag{49}
\]
Hence the bias becomes
\[
\begin{aligned}
E(\hat m^{(2)}_{Y+}) - m^{(2)}_{Y+} &= -\frac{2\sigma_{\epsilon_{Y+}}(0)}{h^2_{Y+}\sqrt{n_+h_{Y+}}\, f_{X+}(0)}\, e^T_2 \big( (S_{Y+}^{-1}(c))_{21},\ (S_{Y+}^{-1}(c))_{22} \big)\, E(W^*_{Y+,n}) \\
&= -\frac{2\sigma_{\epsilon_{Y+}}(0)}{h^2_{Y+}\sqrt{n_+h_{Y+}}\, f_{X+}(0)}\, e^T_2 (S_{Y+}^{-1}(c))_{21}\, E(W^*_{Y+,n_1}) - \frac{2\sigma_{\epsilon_{Y+}}(0)}{h^2_{Y+}\sqrt{n_+h_{Y+}}\, f_{X+}(0)}\, e^T_2 (S_{Y+}^{-1}(c))_{22}\, E(W^*_{Y+,n_2}) \\
&= \mathrm{I} + \mathrm{II}.
\end{aligned}
\]
Consider the terms I and II:
\[
\begin{aligned}
\mathrm{I} &= -\frac{2\sigma_{\epsilon_{Y+}}(0)}{h^2_{Y+}\sqrt{n_+h_{Y+}}\, f_{X+}(0)}\, e^T_2 (S_{Y+}^{-1}(c))_{21} \left[ -\frac{f_{\epsilon_{Y+}}}{\sqrt{n_+h_{Y+}}}\sum_{i=1}^{n_+} K_i\, \frac{c_k\big(\sigma_{\epsilon_{Y+},i} - \sigma_{\epsilon_{Y+}}(0)\big)}{\sigma_{\epsilon_{Y+},i}} - \frac{f_{\epsilon_{Y+}}}{\sqrt{n_+h_{Y+}}}\sum_{i=1}^{n_+} K_i\, \frac{r_{i,3}}{\sigma_{\epsilon_{Y+},i}} \right] \\
&= \frac{1}{12}\, m^{(4)}_{Y+}\, \mu_{+,4}(c)\, e^T_2 (S_{Y+}^{-1}(c))_{21}\, f_{\epsilon_{Y+}}\, h^2_{Y+} + o_p(h^2_{Y+}),
\end{aligned}
\]
\[
\begin{aligned}
\mathrm{II} &= -\frac{2\sigma_{\epsilon_{Y+}}(0)}{h^2_{Y+}\sqrt{n_+h_{Y+}}\, f_{X+}(0)}\, e^T_2 (S_{Y+}^{-1}(c))_{22} \left[ -\frac{\sum_{k=1}^{q} f_{\epsilon_{Y+}}(c_k)}{\sqrt{n_+h_{Y+}}}\sum_{i=1}^{n_+} K_i\, \frac{c_k\big(\sigma_{\epsilon_{Y+},i} - \sigma_{\epsilon_{Y+}}(0)\big)}{\sigma_{\epsilon_{Y+},i}} \begin{pmatrix} x_{+,i} \\ x^2_{+,i} \\ x^3_{+,i} \end{pmatrix} - \frac{\sum_{k=1}^{q} f_{\epsilon_{Y+}}(c_k)}{\sqrt{n_+h_{Y+}}}\sum_{i=1}^{n_+} K_i\, \frac{r_{i,3}}{\sigma_{\epsilon_{Y+},i}} \begin{pmatrix} x_{+,i} \\ x^2_{+,i} \\ x^3_{+,i} \end{pmatrix} \right] \\
&= \frac{1}{12}\, m^{(4)}_{Y+} \sum_{k=1}^{q} f_{\epsilon_{Y+}}(c_k)\, e^T_2 (S_{Y+}^{-1}(c))_{22} \begin{pmatrix} \mu_{+,5} \\ \mu_{+,6} \\ \mu_{+,7} \end{pmatrix} h^2_{Y+} + o_p(h^2_{Y+}).
\end{aligned}
\]
The bias result is proved by combining the two terms I and II. One would expect a number of $4! = 24$ instead of 12 in the denominator. This is due to the extra number 2 in eq. (49). Because of the way $\hat v_2$ is defined, the "effective" constant in the denominator is still 24, in line with the standard results for nonparametric derivative estimation. Similarly, the number 4 appearing in the numerator of the variance is also a result of the number 2 in eq. (49). The variance result follows from eq. (49) and Lemma 1.
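To make the constants in eq. (46) explicit, note from eq. (49) that the randomness of $\hat m^{(2)}_{Y+}$ comes entirely from $\hat v_2$, so
\[
\mathrm{Var}(\hat m^{(2)}_{Y+} \mid X) = \frac{4}{h^4_{Y+}\,(n_+h_{Y+})}\,\mathrm{Var}(\hat v_2 \mid X) = \frac{4}{n_+h^5_{Y+}}\,\frac{\sigma^2_{\epsilon_{Y+}}(0)\, b^*_{Y+}(c)}{f_{X+}(0)} + o_p\Big(\frac{1}{n_+h^5_{Y+}}\Big),
\]
where $\mathrm{Var}(\hat v_2 \mid X) = \sigma^2_{\epsilon_{Y+}}(0)\, b^*_{Y+}(c)/f_{X+}(0) + o_p(1)$ follows from Lemma 1 and the definition of $b^*_{Y+}(c)$ in eq. (48).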
Proof of Theorem 3. Following Theorem 1, we have
\[
\mathrm{Var}(\hat m_{Y+}) = \frac{1}{n_+h_{Y+}}\, \frac{b_{Y+}(c)\,\sigma^2_{\epsilon_{Y+}}(0)}{f_{X+}(0)} + o_p\Big(\frac{1}{n_+h_{Y+}}\Big), \qquad
\mathrm{Var}(\hat m_{Y-}) = \frac{1}{n_-h_{Y-}}\, \frac{b_{Y-}(c)\,\sigma^2_{\epsilon_{Y-}}(0)}{f_{X-}(0)} + o_p\Big(\frac{1}{n_-h_{Y-}}\Big).
\]
Using the bias expression in Theorem 1 and the variance result in Lemma 4, we have
\[
\mathrm{Var}(\mathrm{Bias}(\hat m_{Y+})) = \frac{\sigma^2_{\epsilon_{Y+}}(0)}{n_+h_{Y+}\, f_{X+}(0)}\, a^2_{Y+}(c)\, b^*_{Y+}(c) + o_p\Big(\frac{1}{n_+h_{Y+}}\Big), \qquad
\mathrm{Var}(\mathrm{Bias}(\hat m_{Y-})) = \frac{\sigma^2_{\epsilon_{Y-}}(0)}{n_-h_{Y-}\, f_{X-}(0)}\, a^2_{Y-}(c)\, b^*_{Y-}(c) + o_p\Big(\frac{1}{n_-h_{Y-}}\Big).
\]
For the covariances, we have
\[
\begin{aligned}
\mathrm{Cov}(\hat m_{Y+}, \mathrm{Bias}(\hat m_{Y+})) &= \mathrm{Cov}\Big( m_{Y+} + \frac{1}{q\sqrt{n_+h_{Y+}}}\sum_{k=1}^{q} \hat u_k,\ \frac{1}{2}\, a_{Y+}(c)\, h^2_{Y+}\Big( m^{(2)}_{Y+} + \frac{2\hat v_2}{h^2_{Y+}\sqrt{n_+h_{Y+}}} \Big) \Big) \\
&= \frac{a_{Y+}(c)}{n_+h_{Y+}\, q}\sum_{k=1}^{q} \mathrm{Cov}(\hat u_k, \hat v_2) \\
&= \frac{a_{Y+}(c)\,\sigma^2_{\epsilon_{Y+}}(0)}{n_+h_{Y+}\, q\, f_{X+}(0)}\, e^T_q \big( S_{Y+}^{-1}\Sigma_{Y+}S_{Y+}^{-1} \big)_{12,2} + o_p\Big(\frac{1}{n_+h_{Y+}}\Big),
\end{aligned}
\]
where $(S_{Y+}^{-1}\Sigma_{Y+}S_{Y+}^{-1})_{12,2}$ is the second column of the matrix $(S_{Y+}^{-1}\Sigma_{Y+}S_{Y+}^{-1})_{12}$ and the last line follows from Lemma 1. Similarly, for data below the cutoff, we have
\[
\mathrm{Cov}(\hat m_{Y-}, \mathrm{Bias}(\hat m_{Y-})) = \frac{a_{Y-}(c)\,\sigma^2_{\epsilon_{Y-}}(0)}{n_-h_{Y-}\, q\, f_{X-}(0)}\, e^T_q \big( S_{Y-}^{-1}\Sigma_{Y-}S_{Y-}^{-1} \big)_{12,2} + o_p\Big(\frac{1}{n_-h_{Y-}}\Big).
\]
To establish the asymptotic normality of the adjusted $t$-statistic, we note that
\[
\hat\tau^{bc}_{\text{sharp}} = \hat m_{Y+} - \widehat{\mathrm{Bias}}(\hat m_{Y+}) - \big( \hat m_{Y-} - \widehat{\mathrm{Bias}}(\hat m_{Y-}) \big),
\]
where, by applying Lemma 1 to both $\hat m_{Y+} - \widehat{\mathrm{Bias}}(\hat m_{Y+})$ and $\hat m_{Y-} - \widehat{\mathrm{Bias}}(\hat m_{Y-})$, $\hat\tau^{bc}_{\text{sharp}}$ becomes the difference between two normal random variables, which is also a normal random variable. Finally, the expression for $\mathrm{Var}(\hat\tau_{\text{sharp}} - \mathrm{Bias}(\hat\tau_{\text{sharp}}))$ is obtained by substituting the six variance and covariance results into eq. (29),
\[
\mathrm{Var}(\hat\tau_{\text{sharp}} - \mathrm{Bias}(\hat\tau_{\text{sharp}})) = \frac{1}{n_+h_{Y+}}\, V_{\text{sharp},+} + \frac{1}{n_-h_{Y-}}\, V_{\text{sharp},-},
\]
where
\[
\begin{aligned}
V_{\text{sharp},+} &= \frac{b_{Y+}(c)\,\sigma^2_{\epsilon_{Y+}}(0)}{f_{X+}(0)} + \frac{\sigma^2_{\epsilon_{Y+}}(0)}{f_{X+}(0)}\, a^2_{Y+}(c)\, b^*_{Y+}(c) - \frac{2 a_{Y+}(c)\,\sigma^2_{\epsilon_{Y+}}(0)}{q\, f_{X+}(0)}\, e^T_q \big( S_{Y+}^{-1}\Sigma_{Y+}S_{Y+}^{-1} \big)_{12,2}, \\
V_{\text{sharp},-} &= \frac{b_{Y-}(c)\,\sigma^2_{\epsilon_{Y-}}(0)}{f_{X-}(0)} + \frac{\sigma^2_{\epsilon_{Y-}}(0)}{f_{X-}(0)}\, a^2_{Y-}(c)\, b^*_{Y-}(c) - \frac{2 a_{Y-}(c)\,\sigma^2_{\epsilon_{Y-}}(0)}{q\, f_{X-}(0)}\, e^T_q \big( S_{Y-}^{-1}\Sigma_{Y-}S_{Y-}^{-1} \big)_{12,2}.
\end{aligned}
\]
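The $h^4_{Y+}$ cancellation behind $\mathrm{Var}(\mathrm{Bias}(\hat m_{Y+}))$ above can be verified in one line by combining the bias expression in Theorem 1 with Lemma 4:
\[
\mathrm{Var}(\mathrm{Bias}(\hat m_{Y+})) = \tfrac{1}{4}\, a^2_{Y+}(c)\, h^4_{Y+}\, \mathrm{Var}(\hat m^{(2)}_{Y+}) = \tfrac{1}{4}\, a^2_{Y+}(c)\, h^4_{Y+}\cdot\frac{4}{n_+h^5_{Y+}}\,\frac{\sigma^2_{\epsilon_{Y+}}(0)\, b^*_{Y+}(c)}{f_{X+}(0)} = \frac{\sigma^2_{\epsilon_{Y+}}(0)}{n_+h_{Y+}\, f_{X+}(0)}\, a^2_{Y+}(c)\, b^*_{Y+}(c),
\]
up to $o_p(1/(n_+h_{Y+}))$ terms, so the bandwidth powers cancel exactly as claimed.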
Proof of Theorem 6. We first note that all bias terms in eq. (31) can be obtained using Lemma 2. For terms in the adjusted variance in eq. (32), $\mathrm{Var}(\hat m_{Y+})$, $\mathrm{Var}(\hat m_{T+})$, $\mathrm{Var}(\widehat{\mathrm{Bias}}(\hat m_{Y+}))$, $\mathrm{Var}(\widehat{\mathrm{Bias}}(\hat m_{T+}))$, $\mathrm{Cov}(\hat m_{Y+}, \widehat{\mathrm{Bias}}(\hat m_{Y+}))$, and $\mathrm{Cov}(\hat m_{T+}, \widehat{\mathrm{Bias}}(\hat m_{T+}))$ can be obtained using results in the proof of Theorem 3; $\mathrm{Cov}(\hat m_{Y+}, \hat m_{T+})$ is obtained using Lemma 3. We list these seven terms in the following:
\[
\begin{aligned}
\mathrm{Var}(\hat m_{Y+}) &= \frac{1}{n_+h_{Y+}}\, \frac{b_{Y+}(c)\,\sigma^2_{\epsilon_{Y+}}(0)}{f_{X+}(0)} + o_p\Big(\frac{1}{n_+h_{Y+}}\Big), \\
\mathrm{Var}(\hat m_{T+}) &= \frac{1}{n_+h_{T+}}\, \frac{b_{T+}(c)\,\sigma^2_{\epsilon_{T+}}(0)}{f_{X+}(0)} + o_p\Big(\frac{1}{n_+h_{T+}}\Big), \\
\mathrm{Var}(\mathrm{Bias}(\hat m_{Y+})) &= \frac{\sigma^2_{\epsilon_{Y+}}(0)}{n_+h_{Y+}\, f_{X+}(0)}\, a^2_{Y+}(c)\, b^*_{Y+}(c) + o_p\Big(\frac{1}{n_+h_{Y+}}\Big), \\
\mathrm{Var}(\mathrm{Bias}(\hat m_{T+})) &= \frac{\sigma^2_{\epsilon_{T+}}(0)}{n_+h_{T+}\, f_{X+}(0)}\, a^2_{T+}(c)\, b^*_{T+}(c) + o_p\Big(\frac{1}{n_+h_{T+}}\Big), \\
\mathrm{Cov}(\hat m_{Y+}, \hat m_{T+}) &= \frac{1}{n_+\sqrt{h_{Y+}h_{T+}}}\, \frac{\sigma_{\epsilon_{Y+}}(0)\,\sigma_{\epsilon_{T+}}(0)}{f_{X+}(0)}\, b_{YT+} + o_p\Big(\frac{1}{n_+h_{Y+}} + \frac{1}{n_+h_{T+}}\Big), \\
\mathrm{Cov}(\hat m_{Y+}, \widehat{\mathrm{Bias}}(\hat m_{Y+})) &= \frac{a_{Y+}(c)\,\sigma^2_{\epsilon_{Y+}}(0)}{n_+h_{Y+}\, q\, f_{X+}(0)}\, e^T_q \big( S_{Y+}^{-1}\Sigma_{Y+}S_{Y+}^{-1} \big)_{12,2} + o_p\Big(\frac{1}{n_+h_{Y+}}\Big), \\
\mathrm{Cov}(\hat m_{T+}, \widehat{\mathrm{Bias}}(\hat m_{T+})) &= \frac{a_{T+}(c)\,\sigma^2_{\epsilon_{T+}}(0)}{n_+h_{T+}\, q\, f_{X+}(0)}\, e^T_q \big( S_{T+}^{-1}\Sigma_{T+}S_{T+}^{-1} \big)_{12,2} + o_p\Big(\frac{1}{n_+h_{T+}}\Big).
\end{aligned}
\]
The three remaining covariance terms are
\[
\begin{aligned}
\mathrm{Cov}(\widehat{\mathrm{Bias}}(\hat m_{Y+}), \widehat{\mathrm{Bias}}(\hat m_{T+})) &= \mathrm{Cov}\Big( \frac{1}{2}\, a_{Y+}(c)\, \hat m^{(2)}_{Y+} h^2_{Y+},\ \frac{1}{2}\, a_{T+}(c)\, \hat m^{(2)}_{T+} h^2_{T+} \Big) \\
&= \frac{a_{Y+}(c)\, a_{T+}(c)}{\sqrt{n_+h_{Y+}}\sqrt{n_+h_{T+}}}\, \mathrm{Cov}(\hat v_{2,Y+}, \hat v_{2,T+}) \\
&= \frac{a_{Y+}(c)\, a_{T+}(c)\,\sigma_{\epsilon_{Y+}}(0)\,\sigma_{\epsilon_{T+}}(0)}{n_+\sqrt{h_{Y+}}\sqrt{h_{T+}}\, f_{X+}(0)}\, e^T_2 \big( S_{Y+}^{-1}\Sigma_{YT+}S_{T+}^{-1} \big)_{22}\, e_2 + o_p\Big(\frac{1}{n_+h_{Y+}} + \frac{1}{n_+h_{T+}}\Big),
\end{aligned}
\]
\[
\begin{aligned}
\mathrm{Cov}(\hat m_{Y+}, \widehat{\mathrm{Bias}}(\hat m_{T+})) &= \mathrm{Cov}\Big( m_{Y+} + \frac{1}{q\sqrt{n_+h_{Y+}}}\sum_{k=1}^{q} \hat u_{k,Y},\ \frac{1}{2}\, a_{T+}(c)\, h^2_{T+}\Big( m^{(2)}_{T+} + \frac{2\hat v_{2,T}}{h^2_{T+}\sqrt{n_+h_{T+}}} \Big) \Big) \\
&= \frac{a_{T+}(c)\,\sigma_{\epsilon_{Y+}}(0)\,\sigma_{\epsilon_{T+}}(0)}{q\, n_+\sqrt{h_{Y+}}\sqrt{h_{T+}}\, f_{X+}(0)}\, e^T_q \big( S_{Y+}^{-1}\Sigma_{YT+}S_{T+}^{-1} \big)_{12,2} + o_p\Big(\frac{1}{n_+h_{Y+}} + \frac{1}{n_+h_{T+}}\Big),
\end{aligned}
\]
\[
\begin{aligned}
\mathrm{Cov}(\hat m_{T+}, \widehat{\mathrm{Bias}}(\hat m_{Y+})) &= \mathrm{Cov}\Big( m_{T+} + \frac{1}{q\sqrt{n_+h_{T+}}}\sum_{k=1}^{q} \hat u_{k,T},\ \frac{1}{2}\, a_{Y+}(c)\, h^2_{Y+}\Big( m^{(2)}_{Y+} + \frac{2\hat v_{2,Y}}{h^2_{Y+}\sqrt{n_+h_{Y+}}} \Big) \Big) \\
&= \frac{a_{Y+}(c)\,\sigma_{\epsilon_{Y+}}(0)\,\sigma_{\epsilon_{T+}}(0)}{q\, n_+\sqrt{h_{Y+}}\sqrt{h_{T+}}\, f_{X+}(0)}\, e^T_q \big( S_{T+}^{-1}\Sigma_{TY+}S_{Y+}^{-1} \big)_{12,2} + o_p\Big(\frac{1}{n_+h_{Y+}} + \frac{1}{n_+h_{T+}}\Big).
\end{aligned}
\]
Substituting the above results into eq. (32) gives the expression for $\mathrm{Var}((\hat m_{Y+} - \tau \hat m_{T+}) - (\widehat{\mathrm{Bias}}(\hat m_{Y+}) - \tau \widehat{\mathrm{Bias}}(\hat m_{T+})))$. The result for $\mathrm{Var}((\hat m_{Y-} - \tau \hat m_{T-}) - (\widehat{\mathrm{Bias}}(\hat m_{Y-}) - \tau \widehat{\mathrm{Bias}}(\hat m_{T-})))$ can be obtained in a similar way. Adding up the two variance results gives the adjusted variance in the fuzzy case.

To establish the asymptotic normality, note that we can use eq. (31) to write $\tilde\tau^{bc}_{\text{fuzzy}}$ as
\[
\tilde\tau^{bc}_{\text{fuzzy}} = \big(\hat m_{Y+} - \widehat{\mathrm{Bias}}(\hat m_{Y+})\big) - \tau\big(\hat m_{T+} - \widehat{\mathrm{Bias}}(\hat m_{T+})\big) - \big(\hat m_{Y-} - \widehat{\mathrm{Bias}}(\hat m_{Y-})\big) + \tau\big(\hat m_{T-} - \widehat{\mathrm{Bias}}(\hat m_{T-})\big),
\]
where each of the four terms follows an asymptotic normal distribution by applying Lemma 1. Hence, along with the adjusted variance derived earlier, we prove the asymptotic distribution of $t_{\text{adj.fuzzy}}$.

Proof of Theorem 7. We first expand $\mathrm{Bias}(\hat m_{Y+})$ up to $O(h^3_{Y+})$ on the boundary.
Recall $\hat m_{Y+} = \sum_{k=1}^{q} \hat a_k / q$ and we have
\[
\begin{aligned}
\mathrm{Bias}(\hat m_{Y+}) &= \frac{\sigma_{\epsilon_{Y+}}(0)}{q}\sum_{k=1}^{q} c_k - \frac{\sigma_{\epsilon_{Y+}}(0)}{q\sqrt{n_+h_{Y+}}\, f_{X+}(0)}\, e^T_q \Big[ (S_{Y+}^{-1}(c))_{11}\, E(w^*_{Y+,n_1}) + (S_{Y+}^{-1}(c))_{12}\, E(w^*_{Y+,n_2}) \Big] \\
&= -\frac{\sigma_{\epsilon_{Y+}}(0)}{q\sqrt{n_+h_{Y+}}\, f_{X+}(0)}\, e^T_q (S_{Y+}^{-1}(c))_{11}\, E(w^*_{Y+,n_1}) - \frac{\sigma_{\epsilon_{Y+}}(0)}{q\sqrt{n_+h_{Y+}}\, f_{X+}(0)}\, e^T_q (S_{Y+}^{-1}(c))_{12}\, E(w^*_{Y+,n_2}) \\
&= \mathrm{I} + \mathrm{II},
\end{aligned}
\]
where the first term drops out because $\sum_{k=1}^{q} c_k = 0$ given the symmetry of the error density and the equally spaced quantile levels. Consider term I:
\[
\begin{aligned}
\mathrm{I} &= -\frac{\sigma_{\epsilon_{Y+}}(0)}{q\sqrt{n_+h_{Y+}}\, f_{X+}(0)}\, e^T_q (S_{Y+}^{-1}(c))_{11} \begin{pmatrix} \frac{1}{\sqrt{n_+h_{Y+}}}\sum_{i=1}^{n_+} K_i\, E(\eta^*_{Y+,i,1}) \\ \vdots \\ \frac{1}{\sqrt{n_+h_{Y+}}}\sum_{i=1}^{n_+} K_i\, E(\eta^*_{Y+,i,q}) \end{pmatrix} \\
&= \frac{\sigma_{\epsilon_{Y+}}(0)}{q\sqrt{n_+h_{Y+}}\, f_{X+}(0)}\, e^T_q (S_{Y+}^{-1}(c))_{11} \begin{pmatrix} \frac{f_{\epsilon_{Y+}}(c_1)}{\sqrt{n_+h_{Y+}}}\sum_{i=1}^{n_+} K_i\, \frac{d_{i,1}}{\sigma_{\epsilon_{Y+},i}} \\ \vdots \\ \frac{f_{\epsilon_{Y+}}(c_q)}{\sqrt{n_+h_{Y+}}}\sum_{i=1}^{n_+} K_i\, \frac{d_{i,q}}{\sigma_{\epsilon_{Y+},i}} \end{pmatrix} + o_p(1) \\
&= \frac{\sigma_{\epsilon_{Y+}}(0)}{q\sqrt{n_+h_{Y+}}\, f_{X+}(0)}\, e^T_q (S_{Y+}^{-1}(c))_{11} \begin{pmatrix} \frac{f_{\epsilon_{Y+}}(c_1)}{\sqrt{n_+h_{Y+}}}\sum_{i=1}^{n_+} K_i\, \frac{r_{i,1}}{\sigma_{\epsilon_{Y+},i}} \\ \vdots \\ \frac{f_{\epsilon_{Y+}}(c_q)}{\sqrt{n_+h_{Y+}}}\sum_{i=1}^{n_+} K_i\, \frac{r_{i,1}}{\sigma_{\epsilon_{Y+},i}} \end{pmatrix} + o_p(1) \\
&= \frac{1}{2q}\, e^T_q (S_{Y+}^{-1}(c))_{11}\, f_{\epsilon_{Y+}}\, m^{(2)}_{Y+}\, \mu_{+,2}\, h^2_{Y+} + \frac{1}{6q}\, e^T_q (S_{Y+}^{-1}(c))_{11}\, f_{\epsilon_{Y+}}\, m^{(3)}_{Y+}\, \mu_{+,3}\, h^3_{Y+} \\
&\qquad + \frac{f^{(1)}_{X+}(0)}{2q\, f_{X+}(0)}\, e^T_q (S_{Y+}^{-1}(c))_{11}\, f_{\epsilon_{Y+}}\, m^{(2)}_{Y+}\, \mu_{+,3}\, h^3_{Y+} + o_p(h^3_{Y+}),
\end{aligned}
\]
where the second equality follows by expanding the cumulative distribution function of $\epsilon_{Y+,i}$ around $c_k$, and the third equality follows by noticing that all terms containing $c_k$, after being multiplied by the coefficient $\frac{\sigma_{\epsilon_{Y+}}(0)}{q\sqrt{n_+h_{Y+}}\, f_{X+}(0)}\, e^T_q (S_{Y+}^{-1}(c))_{11}$, become zero after summation. The last equality is obtained by a Taylor series expansion of $m_{Y+}$ at 0 up to order 3 in $r_{i,1}$, similar to the expansion in the definition of $r_{i,p}$.

Consider term II. Note that $p = 1$ in the following proof since we estimate the conditional mean using a degree one local polynomial:
\[
\begin{aligned}
\mathrm{II} &= -\frac{\sigma_{\epsilon_{Y+}}(0)}{q\sqrt{n_+h_{Y+}}\, f_{X+}(0)}\, e^T_q (S_{Y+}^{-1}(c))_{12} \begin{pmatrix} \frac{1}{\sqrt{n_+h_{Y+}}}\sum_{k=1}^{q}\sum_{i=1}^{n_+} K_i\, x_{+,i}\, E(\eta^*_{Y+,i,k}) \\ \vdots \\ \frac{1}{\sqrt{n_+h_{Y+}}}\sum_{k=1}^{q}\sum_{i=1}^{n_+} K_i\, x^p_{+,i}\, E(\eta^*_{Y+,i,k}) \end{pmatrix} \\
&= \frac{\sigma_{\epsilon_{Y+}}(0)}{q\sqrt{n_+h_{Y+}}\, f_{X+}(0)}\, e^T_q (S_{Y+}^{-1}(c))_{12} \begin{pmatrix} \frac{1}{\sqrt{n_+h_{Y+}}}\sum_{k=1}^{q} f_{\epsilon_{Y+}}(c_k)\sum_{i=1}^{n_+} K_i\, x_{+,i}\, \frac{d_{i,k}}{\sigma_{\epsilon_{Y+},i}} \\ \vdots \\ \frac{1}{\sqrt{n_+h_{Y+}}}\sum_{k=1}^{q} f_{\epsilon_{Y+}}(c_k)\sum_{i=1}^{n_+} K_i\, x^p_{+,i}\, \frac{d_{i,k}}{\sigma_{\epsilon_{Y+},i}} \end{pmatrix} + o_p(1) \\
&= \frac{\sum_{k=1}^{q} f_{\epsilon_{Y+}}(c_k)}{2q}\, e^T_q (S_{Y+}^{-1}(c))_{12}\, (\mu_{+,3}, \cdots, \mu_{+,p+2})^T\, m^{(2)}_{Y+}\, h^2_{Y+} + \frac{\sum_{k=1}^{q} f_{\epsilon_{Y+}}(c_k)}{6q}\, e^T_q (S_{Y+}^{-1}(c))_{12}\, (\mu_{+,4}, \cdots, \mu_{+,p+3})^T\, m^{(3)}_{Y+}\, h^3_{Y+} \\
&\qquad + \frac{f^{(1)}_{X+}(0)\sum_{k=1}^{q} f_{\epsilon_{Y+}}(c_k)}{2q\, f_{X+}(0)}\, e^T_q (S_{Y+}^{-1}(c))_{12}\, (\mu_{+,4}, \cdots, \mu_{+,p+3})^T\, m^{(2)}_{Y+}\, h^3_{Y+} + o_p(h^3_{Y+}).
\end{aligned}
\]
Combining I and II yields
\[
\mathrm{Bias}(\hat m_{Y+}) = \frac{1}{2}\, a_{Y+}(c)\, m^{(2)}_{Y+}\, h^2_{Y+} + \frac{1}{6}\, \check a_{Y+}(c)\, m^{(3)}_{Y+}\, h^3_{Y+} + \frac{1}{2}\, \tilde a_{Y+}(c)\, \frac{f^{(1)}_{X+}(0)}{f_{X+}(0)}\, m^{(2)}_{Y+}\, h^3_{Y+} + o_p(h^3_{Y+}),
\]
where $a_{Y+}(c)$ is defined in Lemma 2,
\[
\check a_{Y+}(c) = \frac{\mu_{+,2}(c)\mu_{+,3}(c) - \mu_{+,1}(c)\mu_{+,4}(c)}{\mu_{+,0}(c)\mu_{+,2}(c) - \mu^2_{+,1}(c)}, \qquad
\tilde a_{Y+}(c) = \frac{\mu^2_{+,2}(c) - \mu_{+,1}(c)\mu_{+,3}(c)}{\mu_{+,0}(c)\mu_{+,2}(c) - \mu^2_{+,1}(c)}.
\]
Hence the leading term in $\mathrm{Bias}(\hat m_{Y+} - \mathrm{Bias}(\hat m_{Y+}))$ is
\[
\frac{1}{6}\, \check a_{Y+}(c)\, m^{(3)}_{Y+}\, h^3_{Y+} + \frac{1}{2}\, \tilde a_{Y+}(c)\, \frac{f^{(1)}_{X+}(0)}{f_{X+}(0)}\, m^{(2)}_{Y+}\, h^3_{Y+}.
\]
Since we work with data above the cutoff in this proof, the adjusted variance is given by $\frac{1}{n_+h_{Y+}} V_{\text{adj.sharp}}$ as in the proof of Theorem 3. Thus, the adjusted MSE can be written as
\[
\begin{aligned}
\text{adj. MSE} &= \Big[ \frac{1}{6}\, \check a_{Y+}(c)\, m^{(3)}_{Y+}\, h^3_{Y+} + \frac{1}{2}\, \tilde a_{Y+}(c)\, \frac{f^{(1)}_{X+}(0)}{f_{X+}(0)}\, m^{(2)}_{Y+}\, h^3_{Y+} \Big]^2 + \frac{1}{n_+h_{Y+}}\, V_{\text{adj.sharp}} + o_p\Big(h^6_{Y+} + \frac{1}{n_+h_{Y+}}\Big) \\
&= C_1\, h^6_{Y+} + \frac{1}{n_+h_{Y+}}\, C_2 + o_p\Big(h^6_{Y+} + \frac{1}{n_+h_{Y+}}\Big),
\end{aligned}
\]
where
\[
C_1 = \Big[ \frac{1}{6}\, \check a_{Y+}(c)\, m^{(3)}_{Y+} + \frac{1}{2}\, \tilde a_{Y+}(c)\, \frac{f^{(1)}_{X+}(0)}{f_{X+}(0)}\, m^{(2)}_{Y+} \Big]^2
\]
and $C_2 = V_{\text{adj.sharp}}$. The bandwidth that minimizes the adjusted MSE is given by
\[
h = \Big( \frac{C_2}{6 C_1} \Big)^{1/7} n_+^{-1/7}.
\]
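For completeness, the first-order condition behind this bandwidth choice:
\[
\frac{d}{dh}\Big[ C_1 h^6 + \frac{C_2}{n_+ h} \Big] = 6 C_1 h^5 - \frac{C_2}{n_+ h^2} = 0 \quad\Longrightarrow\quad h^7 = \frac{C_2}{6 C_1 n_+} \quad\Longrightarrow\quad h = \Big( \frac{C_2}{6 C_1} \Big)^{1/7} n_+^{-1/7}.
\]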
Appendix B

This section contains additional Figures 4 to 6 that compare the finite sample performance of LCQR and LLR in estimating the treatment effect. The following four tables of coverage probability are identical to Tables 7 to 10 except that $\hat\tau^{cqr}_{2bw}$ and $\hat\tau^{cqr,bc}_{2bw}$ use the rule-of-thumb bandwidth described by equation (4.3) in Fan and Gijbels (1996).
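To make the simulation setting concrete, here is a sketch of the Lee-model design as it is commonly specified in this simulation literature (e.g., Calonico et al. (2014)); the regression coefficients and error scale below are our assumption about the baseline homoskedastic design, not values taken from this paper:

```r
# Simulate one sample from the Lee-model DGP widely used in RD simulations.
gen_lee <- function(n = 500, sd_eps = 0.1295) {
  x <- 2 * rbeta(n, 2, 4) - 1                 # running variable on [-1, 1]
  m <- ifelse(x < 0,                          # piecewise quintic conditional mean
    0.48 + 1.27*x + 7.18*x^2 + 20.21*x^3 + 21.54*x^4 + 7.33*x^5,
    0.52 + 0.84*x - 3.00*x^2 + 7.99*x^3 - 9.01*x^4 + 3.56*x^5)
  data.frame(x = x, y = m + rnorm(n, sd = sd_eps))  # jump at the cutoff is 0.04
}
set.seed(1)
dat <- gen_lee()    # one of the 5000 replications behind the figures and tables
```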
Figure 4: Absolute value of average bias of the bias-corrected estimators, $\hat\tau^{cqr,bc}_{1bw}$ and $\hat\tau^{robust,bc}_{1bw}$, for the Lee model with heteroskedasticity. $\hat\tau^{robust,bc}_{1bw}$ is the bias-corrected LLR. The result is based on 5000 replications and the true treatment effect is 0.04.
Figure 5: Absolute value of average bias of the bias-corrected estimators, $\hat\tau^{cqr,bc}_{1bw}$ and $\hat\tau^{robust,bc}_{1bw}$, for the LM model with homoskedasticity. $\hat\tau^{robust,bc}_{1bw}$ is the bias-corrected LLR. The result is based on 5000 replications and the true treatment effect is $-3.45$.
Figure 6: Absolute value of average bias of the bias-corrected estimators, $\hat\tau^{cqr,bc}_{1bw}$ and $\hat\tau^{robust,bc}_{1bw}$, for the LM model with heteroskedasticity. $\hat\tau^{robust,bc}_{1bw}$ is the bias-corrected LLR. The result is based on 5000 replications and the true treatment effect is $-3.45$.

Table 11: Coverage probability in the Lee model with homoskedasticity using the rule-of-thumb bandwidth in LCQR estimation. Rows: $\hat\tau^{cqr}_{2bw}$ and $\hat\tau^{cqr,bc}_{2bw}$; columns: DGP 1 to DGP 5.
Notes: The numbers are the averages of the coverage probability associated with different estimators. The results are based on 5000 replications with a sample size $n = 500$. The s.e. and adjusted s.e. for the LCQR estimator are obtained based on the asymptotic expressions from Theorem 1 and Theorem 3. Estimators with a superscript bc are both bias-corrected and s.e.-adjusted.

Table 12: Coverage probability in the Lee model with heteroskedasticity using the rule-of-thumb bandwidth in LCQR estimation. Rows: $\hat\tau^{cqr}_{2bw}$ and $\hat\tau^{cqr,bc}_{2bw}$; columns: DGP 1 to DGP 5.
Notes: The numbers are the averages of the coverage probability associated with different estimators. The results are based on 5000 replications with a sample size $n = 500$. The s.e. and adjusted s.e. for the LCQR estimator are obtained based on the asymptotic expressions from Theorem 1 and Theorem 3. Estimators with a superscript bc are both bias-corrected and s.e.-adjusted.
Table 13: Coverage probability in the LM model with homoskedasticity using the rule-of-thumb bandwidth in LCQR estimation. Rows: $\hat\tau^{cqr}_{2bw}$ and $\hat\tau^{cqr,bc}_{2bw}$; columns: DGP 1 to DGP 5.
Notes: The numbers are the averages of the coverage probability associated with different estimators. The results are based on 5000 replications with a sample size $n = 500$. The s.e. and adjusted s.e. for the LCQR estimator are obtained based on the asymptotic expressions from Theorem 1 and Theorem 3. Estimators with a superscript bc are both bias-corrected and s.e.-adjusted.

Table 14: Coverage probability in the LM model with heteroskedasticity using the rule-of-thumb bandwidth in LCQR estimation. Rows: $\hat\tau^{cqr}_{2bw}$ and $\hat\tau^{cqr,bc}_{2bw}$; columns: DGP 1 to DGP 5.
Notes: The numbers are the averages of the coverage probability associated with different estimators. The results are based on 5000 replications with a sample size $n = 500$. The s.e. and adjusted s.e. for the LCQR estimator are obtained based on the asymptotic expressions from Theorem 1 and Theorem 3. Estimators with a superscript bc are both bias-corrected and s.e.-adjusted.