Local Composite Quantile Regression for Regression Discontinuity
Xiao Huang ∗ Zhaoguo Zhan † September 9, 2020
Abstract
We introduce the local composite quantile regression (LCQR) to causal inference in regression discontinuity (RD) designs. Kai et al. (2010) study the efficiency property of LCQR, while we show that its nice boundary performance translates to accurate estimation of treatment effects in RD under a variety of data generating processes. Moreover, we propose a bias-corrected and standard error-adjusted t-test for inference, which leads to confidence intervals with good coverage probabilities. A bandwidth selector is also discussed. For illustration, we conduct a simulation study and revisit a classic example from Lee (2008). A companion R package rdcqr is developed.

JEL Classification: C18, C21
Keywords: Regression discontinuity, Treatment effect, Local composite quantile regression.

∗ Corresponding author. Department of Economics, Finance, and Quantitative Analysis, Coles College of Business, Kennesaw State University, GA 30144, USA. Email: [email protected].
† Department of Economics, Finance, and Quantitative Analysis, Coles College of Business, Kennesaw State University, GA 30144, USA. Email: [email protected].

1 Introduction
Over the past few decades, regression discontinuity (RD) has become a popular quasi-experimental method to identify the local average treatment effect. In its simplest form, the sharp RD design, the probability for a unit $i$ to receive treatment is a discontinuous function of an underlying covariate $X_i$, i.e., unit $i$ receives treatment if and only if the covariate exceeds a pre-specified cutoff. Under some smoothness assumptions, units in a small neighborhood of the cutoff share similar characteristics, so the difference between the outcomes above and below the cutoff can be interpreted as the local average treatment effect. This is a simple yet powerful method to identify causal effects locally.

Empirical studies in RD date back to Thistlethwaite and Campbell (1960), who investigated the impact on academic performance of a merit award based on whether a test score attains a cutoff. The seminal work in Hahn et al. (2001) showed that, if conditional expectations are continuous in the covariate around the cutoff, the treatment effect can be identified by taking the difference between average outcomes above and below the cutoff. This finding significantly weakened previous identification conditions and paved the way for the rapid development of the RD literature. Examples of empirical work include the analysis of electoral advantage to incumbency in U.S. House elections in Lee (2008) and the analysis of changes in children's mortality rates due to the implementation of the Head Start program in Ludwig and Miller (2007). Recent theoretical developments include the optimal convergence rate in RD estimation in Porter (2003), the optimal bandwidth choice in Imbens and Kalyanaraman (2012), quantile treatment effect estimation in Frandsen et al. (2012), the robust confidence intervals in Calonico et al. (2014), the null-restricted $t$-test in Feir et al. (2016), and the honest confidence intervals in Armstrong and Kolesár (2018, 2020). See also the literature reviews in Imbens and Lemieux (2008) and Lee and Lemieux (2010).

Central to a large portion of the empirical and theoretical work on RD is the use of local linear regression (LLR), whose theoretical properties are discussed in Fan and Gijbels (1992), Fan (1993), and Ruppert and Wand (1994), as well as Fan and Gijbels (1996). Automatic boundary carpentry and high minimax efficiency (Cheng et al. (1997)) have made LLR an ideal tool for estimation in RD. Since LLR has the smallest asymptotic mean squared error (MSE) in the class of linear estimators and is simple to use, the RD research community has widely adopted it as a default tool.

The goal of this paper is to introduce another nonparametric smoother for estimation and inference in RD. Although LLR is the best linear smoother, it is possible to use an alternative estimator in RD if we consider the larger class of nonlinear smoothers. One example is the local composite quantile regression (LCQR) method in Kai et al. (2010), who show that LCQR has some efficiency gain over LLR when data are non-normal. The literature related to LCQR is growing. Several recent examples include the partial linear model in Kai et al. (2011), weighted LCQR smoothing in Zhao and Xiao (2014), LCQR smoothing in Harris recurrent Markov processes in Li and Li (2016), and LCQR with mixed data in Huang and Lin (2020), just to name a few. To use LCQR, the researcher chooses a finite number of quantiles, say 5, and uses a local polynomial to estimate the quantile (the intercept) at each of the 5 quantile positions.
Averaging the 5 quantile estimates gives an estimate for the conditional mean.

Figure 1: Estimates of LLR and LCQR with a sample size of 400 and 400 replications. Both methods use the same direct plug-in bandwidth in Ruppert et al. (1995); $m(X) = \sin(5\pi X)$. Panel (a): LLR versus LCQR; panel (b): box plots of the estimates near boundary 0 and near boundary 1.

To motivate the use of LCQR, consider the nonlinear model in Ruppert et al. (1995), $Y = \sin(5\pi X) + \sigma_0\,\epsilon$, where $\sigma_0$ is a small noise scale, $\epsilon$ follows a two-component mixture normal distribution, and $X$ follows a uniform distribution on $[0, 1]$. Figure 1 compares the two estimators under this non-normal design; the box plots in panel (b) summarize their estimates near the two boundaries.

Our contributions are as follows. First, we show that the good boundary performance of LCQR translates into accurate estimation of treatment effects in RD under a variety of data generating processes. Second, similar to the robust $t$-test in Calonico et al. (2014) for conducting inference on causal effects, we propose a $t$-test that adjusts both the bias and the standard error (s.e.) of the LCQR estimator, and the resulting confidence intervals are shown to have good coverage probabilities. Third, we further discuss a new bandwidth selector that is based on an adjusted MSE. As a byproduct of our research, we also develop an R package rdcqr that implements the method in this paper.

The rest of the paper unfolds as follows. Section 2 sets up the notation for RD designs and introduces the LCQR method. Section 3 discusses the bias correction and standard error adjustment for inference on the boundary. Section 4 presents a simulation experiment. An empirical illustration of LCQR in RD is provided in Section 5 using the data from Lee (2008). Section 6 concludes. The Appendix contains all the proofs and additional figures and tables.
2 Regression discontinuity and LCQR

We introduce the notation for RD and the concept of LCQR in this section. (The rdcqr package can be downloaded from https://github.com/xhuang20/rdcqr.)

2.1 Regression discontinuity: setup and notation

Consider the triplet $(Y_i, X_i, T_i)$, $i = 1, \cdots, n$, in a standard RD design, where $Y_i$ is the observed outcome for individual $i$, $X_i$ is the covariate that determines the assignment of treatment, and $T_i$ equals 1 if individual $i$ receives treatment and 0 otherwise. For individual $i$ to receive treatment, a threshold value for $X_i$ is set. Without loss of generality, we assume the threshold is 0, and the treatment assignment rule becomes that individual $i$ is assigned to the treatment group if $X_i \geq 0$. Throughout the paper, the signs $+$ and $-$ will be used to denote data or quantities associated with $X_i \geq 0$ and $X_i < 0$, respectively.

We use a general nonparametric model to describe the relationship between $Y_i$ and $X_i$:
$$Y_{+,i} = m_{Y+}(X_{+,i}) + \sigma_{\epsilon_{Y+}}(X_{+,i})\,\epsilon_{Y+,i}, \quad (1)$$
$$Y_{-,i} = m_{Y-}(X_{-,i}) + \sigma_{\epsilon_{Y-}}(X_{-,i})\,\epsilon_{Y-,i}, \quad (2)$$
where $m_{Y+}$, $m_{Y-}$, $\sigma_{\epsilon_{Y+}}$, $\sigma_{\epsilon_{Y-}}$, $\epsilon_{Y+}$ and $\epsilon_{Y-}$ are the conditional mean functions, conditional standard deviation functions, and error terms with unit variance, respectively.

In a sharp RD design, whether individual $i$ receives treatment is determined by whether $X_i \geq 0$.
One is interested in measuring the average treatment effect at the threshold 0:
$$\tau_{\text{sharp}} = m_{Y+}(0) - m_{Y-}(0), \quad (3)$$
where
$$m_{Y+}(0) = \lim_{x_+ \downarrow 0} m_{Y+}(x_+) \quad \text{and} \quad m_{Y-}(0) = \lim_{x_- \uparrow 0} m_{Y-}(x_-). \quad (4)$$
A $t$-test for testing a hypothesized treatment effect $\tau_0$ reads:
$$t_{\text{sharp}} = \frac{\hat{\tau}_{\text{sharp}} - \tau_0}{\sqrt{\text{Var}(\hat{\tau}_{\text{sharp}})}}, \quad (5)$$
where, under the i.i.d. assumption, we have
$$\hat{\tau}_{\text{sharp}} = \hat{m}_{Y+}(0) - \hat{m}_{Y-}(0), \qquad \text{Var}(\hat{\tau}_{\text{sharp}}) = \text{Var}(\hat{m}_{Y+}(0)) + \text{Var}(\hat{m}_{Y-}(0)). \quad (6)$$
In a fuzzy RD design, the treatment assignment rule remains the same, but $T_i$ is not necessarily equal to 1 when $X_i \geq 0$, nor 0 when $X_i < 0$.
It is customary to use a nonparametric function to describe the probability of receiving a treatment:
$$T_{+,i} = m_{T+}(X_{+,i}) + \sigma_{\epsilon_{T+}}(X_{+,i})\,\epsilon_{T+,i}, \quad (7)$$
$$T_{-,i} = m_{T-}(X_{-,i}) + \sigma_{\epsilon_{T-}}(X_{-,i})\,\epsilon_{T-,i}. \quad (8)$$
The treatment effect measurement becomes
$$\tau_{\text{fuzzy}} = \frac{m_{Y+}(0) - m_{Y-}(0)}{m_{T+}(0) - m_{T-}(0)}, \quad (9)$$
where
$$m_{T+}(0) = \lim_{x_+ \downarrow 0} m_{T+}(x_+) \quad \text{and} \quad m_{T-}(0) = \lim_{x_- \uparrow 0} m_{T-}(x_-). \quad (10)$$
Let s.o. denote a small order term. The corresponding $t$-test is given by
$$t_{\text{fuzzy}} = \frac{\hat{\tau}_{\text{fuzzy}} - \tau_0}{\sqrt{\text{Var}(\hat{\tau}_{\text{fuzzy}})}}, \quad (11)$$
where $\hat{\tau}_{\text{fuzzy}} = \dfrac{\hat{m}_{Y+}(0) - \hat{m}_{Y-}(0)}{\hat{m}_{T+}(0) - \hat{m}_{T-}(0)}$ and
$$\begin{aligned}
\text{Var}(\hat{\tau}_{\text{fuzzy}}) &= \frac{\text{Var}(\hat{m}_{Y+}(0)) + \text{Var}(\hat{m}_{Y-}(0))}{\left(m_{T+}(0) - m_{T-}(0)\right)^2} + \frac{\left(m_{Y+}(0) - m_{Y-}(0)\right)^2}{\left(m_{T+}(0) - m_{T-}(0)\right)^4}\left[\text{Var}(\hat{m}_{T+}(0)) + \text{Var}(\hat{m}_{T-}(0))\right] \\
&\quad - 2\,\frac{m_{Y+}(0) - m_{Y-}(0)}{\left(m_{T+}(0) - m_{T-}(0)\right)^3}\left[\text{Cov}(\hat{m}_{Y+}(0), \hat{m}_{T+}(0)) + \text{Cov}(\hat{m}_{Y-}(0), \hat{m}_{T-}(0))\right] + \text{s.o.} \quad (12)
\end{aligned}$$
To conduct the $t$-test, we thus need nonparametric estimates for all quantities in eqs. (6) and (12) at the boundary point 0.
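Eq. (12) is a delta-method expansion of the ratio in eq. (9). As a minimal illustration of how its pieces combine, the following R sketch assembles the leading variance terms from user-supplied boundary estimates; the function and argument names are ours (hypothetical placeholders), not part of any package.

```r
# Sketch of the leading terms of Var(tau_hat_fuzzy) in eq. (12).
# Inputs: boundary estimates m*, their variances v*, and the covariances
# cov_p = Cov(mY_p, mT_p), cov_m = Cov(mY_m, mT_m). All placeholders.
var_tau_fuzzy <- function(mY_p, mY_m, mT_p, mT_m,
                          vY_p, vY_m, vT_p, vT_m, cov_p, cov_m) {
  dY <- mY_p - mY_m   # jump in the outcome equation
  dT <- mT_p - mT_m   # jump in the treatment equation
  (vY_p + vY_m) / dT^2 +
    dY^2 / dT^4 * (vT_p + vT_m) -
    2 * dY / dT^3 * (cov_p + cov_m)
}
```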
2.2 Local composite quantile regression

LLR is the leading method to estimate quantities in eqs. (6) and (12). LCQR is introduced in Kai et al. (2010) as an alternative to the LLR method due to its efficiency gain with non-normal errors. Robustness (to non-normality) is the key reason we propose to use LCQR in RD, as the normality assumption can be easily violated in practice.

The same LCQR method can be applied to eqs. (1), (2), (7) and (8). Because of this similarity, we describe the LCQR method based on eq. (1). For $k = 1, \cdots, q$, let $\tau_k = k/(q+1)$ be the $q$ quantile positions and $\rho_{\tau_k}(r) = \tau_k r - r I(r < 0)$ be the $q$ check loss functions in quantile regression. The loss function for LCQR at the point of interest $x$ is defined as
$$\sum_{k=1}^{q}\sum_{i=1}^{n_+} \rho_{\tau_k}\!\left(Y_{+,i} - a_k - b(X_{+,i} - x)\right) K\!\left(\frac{X_{+,i} - x}{h_{Y+}}\right), \quad (13)$$
where $a_k$ is the $k$th quantile estimator, $b$ is a slope restricted to be the same across quantiles, $h_{Y+}$ is the bandwidth, and $K$ is a kernel function. The point of interest in eq. (1) is $x = 0$. Minimizing eq. (13) w.r.t. $a_k$ and $b$ yields $(\hat{a}_1, \cdots, \hat{a}_q, \hat{b})$. The LCQR estimator for $m_{Y+}(x)$ is defined as
$$\hat{m}_{Y+}(x) = \frac{1}{q}\sum_{k=1}^{q} \hat{a}_k, \quad (14)$$
and $\hat{b}$ is the estimator for $m^{(1)}_{Y+}(x)$, the first derivative of $m_{Y+}(x)$. $\hat{m}_{Y-}(x)$, $\hat{m}_{T+}(x)$, and $\hat{m}_{T-}(x)$ can be similarly obtained at $x = 0$.

Kai et al. (2010) discuss the asymptotic bias and variance of the LCQR estimator at an interior point $x$. The asymptotic properties at a boundary point $x$ are discussed in Kai et al. (2009). Given the results in Kai et al. (2009), one can immediately implement the $t$-test for the sharp RD in eq. (5) with bias correction if needed. To implement the $t$-test for the fuzzy RD in eq. (11), the two covariance expressions in eq. (12) need to be estimated, and we provide the results later on. Asymptotic results for the bias and variance of $\hat{m}_{Y+}(x)$, $\hat{m}_{Y-}(x)$, $\hat{m}_{T+}(x)$, and $\hat{m}_{T-}(x)$ follow those in Kai et al. (2009) and are given as lemmas in the Appendix.
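To make the objective in eq. (13) concrete, here is a minimal R sketch that minimizes the kernel-weighted composite check loss with a generic optimizer and averages the intercepts as in eq. (14). The paper's implementation uses an iterative MM algorithm (see Section 4); this sketch is ours and is for illustration only.

```r
# LCQR at a boundary point x0 with a triangular kernel, cf. eqs. (13)-(14).
lcqr_fit <- function(x, y, h, q = 5, x0 = 0) {
  tau <- (1:q) / (q + 1)                      # quantile positions tau_k
  w   <- pmax(1 - abs((x - x0) / h), 0)       # triangular kernel weights
  rho <- function(r, t) r * (t - (r < 0))     # check loss rho_tau(r)
  obj <- function(par) {                      # par = (a_1, ..., a_q, b)
    a <- par[1:q]; b <- par[q + 1]
    s <- 0
    for (k in 1:q) s <- s + sum(w * rho(y - a[k] - b * (x - x0), tau[k]))
    s
  }
  fit <- optim(c(rep(median(y), q), 0), obj)  # Nelder-Mead default
  mean(fit$par[1:q])                          # m_hat(x0), eq. (14)
}
```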
We make the following assumptions. Let $c$ be a positive constant determined by $\text{supp}(K)$.

Assumption 1. $m_{Y+}(\cdot)$, $m_{T+}(\cdot)$, $m_{Y-}(\cdot)$, and $m_{T-}(\cdot)$ are three times differentiable at $x$.

Assumption 2. $\sigma_{\epsilon_{Y+}}(\cdot)$, $\sigma_{\epsilon_{T+}}(\cdot)$, $\sigma_{\epsilon_{Y-}}(\cdot)$, and $\sigma_{\epsilon_{T-}}(\cdot)$ are right or left continuous and differentiable at $x$.

Assumption 3. The kernel function $K$ is bounded and symmetric such that, for $j = 0, 1, 2, \cdots$,
$$\mu_{+,j}(c) = \int_{-c}^{\infty} u^j K(u)\,du, \qquad \nu_{+,j}(c) = \int_{-c}^{\infty} u^j K^2(u)\,du,$$
$$\mu_{-,j}(c) = \int_{-\infty}^{c} u^j K(u)\,du, \qquad \nu_{-,j}(c) = \int_{-\infty}^{c} u^j K^2(u)\,du.$$
Assumption 4. The marginal density $f_{X+}(\cdot)$ is right continuous, differentiable and positive at $x = 0$, and $f_{X-}(\cdot)$ is left continuous, differentiable and positive at $x = 0$.
Assumption 5. All error distributions for $\epsilon_{Y+}$, $\epsilon_{T+}$, $\epsilon_{Y-}$, and $\epsilon_{T-}$ are symmetric and have positive density. All errors are i.i.d.

Assumption 6. $h_{Y+} \to 0$ and $n_+ h_{Y+}^5 \to \infty$ as $n_+ \to \infty$. The same condition also holds for $(n_+, h_{T+})$, $(n_-, h_{Y-})$, and $(n_-, h_{T-})$.

Assumptions 1 and 2 are used for Taylor series expansions in the proof. The bias expressions require only second-order differentiability of the conditional mean function. Higher order differentiability permits the study of smaller order terms in the expansion. The support of the kernel is assumed to be $(-\infty, \infty)$ in Assumption 3. If the kernel has a bounded support such as $[-1, 1]$, then $\mu_{+,j}(c) = \int_{-c}^{1} u^j K(u)\,du$. Similar changes can be made to the $\nu$ variables. The differentiability in Assumption 4 is also used for Taylor expansions in the proof, and we require the positivity of the density at the boundary as it appears in the denominator of the bias expression. The symmetric error distribution assumption in Assumption 5 is not as restrictive as it looks. This symmetry assumption helps to remove an extra term in the bias. When the error distribution is non-symmetric, this extra term can be removed via a procedure similar to bias correction. One only needs $n_+ h_{Y+} \to \infty$ when estimating the conditional mean. The stronger assumption in Assumption 6 is needed when we estimate the variance of the second derivative of $m_{Y+}$, denoted by $m^{(2)}_{Y+}$, to adjust the s.e. of $\hat{m}_{Y+}$. This assumption is required in order to obtain a consistent estimator of $m^{(2)}_{Y+}$. Assumption 6 translates to $h \to 0$ and $nh^5 \to \infty$ as $n \to \infty$ if we use a single bandwidth $h$ for the data above and below the cutoff, where $n$ is the total number of observations.
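The kernel moments in Assumption 3 are simple one-dimensional integrals and can be evaluated numerically. A quick R check for the triangular kernel at the cutoff (where the one-sided moments run over $[0, 1]$); the constants a_c and b_c below correspond to the bias and variance kernel constants in eqs. (17) and (25) further below:

```r
# One-sided moments of the triangular kernel K(u) = (1 - |u|)1{|u| <= 1},
# evaluated at the cutoff, plus the LLR boundary constants built from them.
K  <- function(u) pmax(1 - abs(u), 0)
mu <- function(j) integrate(function(u) u^j * K(u),   0, 1)$value
nu <- function(j) integrate(function(u) u^j * K(u)^2, 0, 1)$value
a_c <- (mu(2)^2 - mu(1) * mu(3)) / (mu(0) * mu(2) - mu(1)^2)    # cf. eq. (17)
b_c <- (mu(2)^2 * nu(0) - 2 * mu(1) * mu(2) * nu(1) + mu(1)^2 * nu(2)) /
       (mu(0) * mu(2) - mu(1)^2)^2                               # cf. eq. (25)
```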
Next we provide two theorems on the LCQR estimator for sharp and fuzzy treatment effects defined in eq. (6) and eq. (12), respectively. Let $\mathcal{X}$ be the $\sigma$-field generated by all $X_i$.

Theorem 1. Under Assumptions 1 to 6, as both $n_+$ and $n_- \to \infty$, we have
$$\text{Bias}(\hat{\tau}_{\text{sharp}} \mid \mathcal{X}) = \frac{1}{2}\left[a_{Y+}(c)\,m^{(2)}_{Y+}(0)\,h^2_{Y+} - a_{Y-}(c)\,m^{(2)}_{Y-}(0)\,h^2_{Y-}\right] + o_p(h^2_{Y+} + h^2_{Y-}), \quad (15)$$
$$\text{Var}(\hat{\tau}_{\text{sharp}} \mid \mathcal{X}) = \frac{b_{Y+}(c)\,\sigma^2_{\epsilon_{Y+}}(0)}{n_+ h_{Y+}\,f_{X+}(0)} + \frac{b_{Y-}(c)\,\sigma^2_{\epsilon_{Y-}}(0)}{n_- h_{Y-}\,f_{X-}(0)} + o_p\!\left(\frac{1}{n_+ h_{Y+}} + \frac{1}{n_- h_{Y-}}\right), \quad (16)$$
where
$$a_{Y+}(c) = a_{Y-}(c) = \frac{\mu^2_{+,2}(c) - \mu_{+,1}(c)\,\mu_{+,3}(c)}{\mu_{+,0}(c)\,\mu_{+,2}(c) - \mu^2_{+,1}(c)}, \quad (17)$$
$$b_{Y+}(c) = e_q^T \left(S^{-1}_{Y+}(c)\,\Sigma_{Y+}(c)\,S^{-1}_{Y+}(c)\right)_{11} e_q / q^2, \quad (18)$$
$$b_{Y-}(c) = e_q^T \left(S^{-1}_{Y-}(c)\,\Sigma_{Y-}(c)\,S^{-1}_{Y-}(c)\right)_{11} e_q / q^2, \quad (19)$$
and the matrices $S_{Y+}(c)$, $S_{Y-}(c)$, $\Sigma_{Y+}(c)$ and $\Sigma_{Y-}(c)$ are defined in the Appendix; $e_q$ is the $q$-dimensional vector of ones.

Proof of Theorem 1 is a straightforward application of Theorem 2.1 in Kai et al. (2009) and is omitted. As long as a symmetric kernel is used, the value of $a_{Y+}(c)$ remains the same if kernel moments on the other side of the cutoff are used. Generally, we have $b_{Y+}(c) \neq b_{Y-}(c)$ unless error distributions on both sides of the cutoff are assumed to be the same.

When the bandwidths above and below the cutoff are assumed to be the same, Theorem 1 reduces to the following corollary.
Corollary 1. Under Assumptions 1 to 6, if we further assume $h_{Y+} = h_{Y-} = h$, then we have
$$\text{Bias}(\hat{\tau}_{\text{sharp}} \mid \mathcal{X}) = \frac{1}{2}\left[a_{Y+}(c)\,m^{(2)}_{Y+}(0) - a_{Y-}(c)\,m^{(2)}_{Y-}(0)\right]h^2 + o_p(h^2), \quad (20)$$
$$\text{Var}(\hat{\tau}_{\text{sharp}} \mid \mathcal{X}) = \frac{b_{Y+}(c)\,\sigma^2_{\epsilon_{Y+}}(0)}{n_+ h\,f_{X+}(0)} + \frac{b_{Y-}(c)\,\sigma^2_{\epsilon_{Y-}}(0)}{n_- h\,f_{X-}(0)} + o_p\!\left(\frac{1}{n_+ h} + \frac{1}{n_- h}\right). \quad (21)$$
The asymptotic results for the fuzzy RD are similarly provided in the theorem below.
Theorem 2. Under Assumptions 1 to 6, as both $n_+$ and $n_- \to \infty$, we have
$$\begin{aligned}
\text{Bias}(\hat{\tau}_{\text{fuzzy}} \mid \mathcal{X}) &= \frac{1}{m_{T+}(0) - m_{T-}(0)} \cdot \frac{1}{2}\left[a_{Y+}(c)\,m^{(2)}_{Y+}(0)\,h^2_{Y+} - a_{Y-}(c)\,m^{(2)}_{Y-}(0)\,h^2_{Y-}\right] \\
&\quad - \frac{m_{Y+}(0) - m_{Y-}(0)}{\left[m_{T+}(0) - m_{T-}(0)\right]^2} \cdot \frac{1}{2}\left[a_{T+}(c)\,m^{(2)}_{T+}(0)\,h^2_{T+} - a_{T-}(c)\,m^{(2)}_{T-}(0)\,h^2_{T-}\right] \\
&\quad + o_p(h^2_{Y+} + h^2_{Y-} + h^2_{T+} + h^2_{T-}). \quad (22)
\end{aligned}$$
The variance expression is given in eq. (12).

The variance expression in eq. (12) is complicated but is available by substituting the results in Lemmas 2 and 3 in Appendix A into eq. (12). See Appendix A for the proof.

From Theorems 1 and 2 and using a single bandwidth $h$, we see a lot of similarities between LCQR and LLR: both biases have order $O(h^2)$ and both variances have order $O((nh)^{-1})$. In fact, the asymptotic bias expression is identical for the two methods. The variance from LCQR, however, is smaller than that from LLR under many non-normal error distributions, for which we provide numerical results in the next subsection.

In addition, in the special case of $q = 1$, we can show that
$$b_{Y+}(c) = \frac{\mu^2_{+,2}\,\nu_{+,0} - 2\mu_{+,1}\mu_{+,2}\,\nu_{+,1} + \mu^2_{+,1}\,\nu_{+,2}}{\left(\mu_{+,0}\,\mu_{+,2} - \mu^2_{+,1}\right)^2} \cdot \frac{1}{q^2}\sum_{k=1}^{q}\sum_{k'=1}^{q}\frac{\tau_{kk'}}{f_{\epsilon_{Y+}}(c_k)\,f_{\epsilon_{Y+}}(c_{k'})}, \quad (23)$$
where $\tau_{kk'} = \tau_k \wedge \tau_{k'} - \tau_k\tau_{k'}$, $k, k' = 1, 2, ..., q$, $c_k = F^{-1}_{\epsilon_{Y+}}(\tau_k)$, and $F_{\epsilon_{Y+}}$ and $f_{\epsilon_{Y+}}$ are the cdf and pdf of the error distribution, respectively. The result similarly holds for $b_{Y-}(c)$. However, eq. (23) does not hold for $q \geq 2$.

2.3 Asymptotic relative efficiency

We provide some numerical evidence in support of using LCQR in RD. To facilitate the presentation of the results, we consider Corollary 1 with a single bandwidth and replace the conditional density $f_{X+}(0)$ with the unconditional density $f_X(0)$, so that the product $n_+ f_{X+}(0)$ becomes $nf_X(0)$. Similarly, $n_- f_{X-}(0)$ is also replaced by $nf_X(0)$.

Under standard assumptions, the MSE for the LLR estimator of $\tau_{\text{sharp}}$ is given by (see, e.g., Imbens and Kalyanaraman (2012))
$$\text{MSE}_{\text{LLR}} = \frac{1}{4}\left[a_{Y+}(c)\,m^{(2)}_{Y+}(0) - a_{Y-}(c)\,m^{(2)}_{Y-}(0)\right]^2 h^4 + \frac{1}{nh}\left[\frac{b(c)\,\sigma^2_{\epsilon_{Y+}}(0)}{f_X(0)} + \frac{b(c)\,\sigma^2_{\epsilon_{Y-}}(0)}{f_X(0)}\right] + o_p\!\left(h^4 + \frac{1}{nh}\right), \quad (24)$$
where
$$b(c) = \frac{\mu^2_{+,2}\,\nu_{+,0} - 2\mu_{+,1}\mu_{+,2}\,\nu_{+,1} + \mu^2_{+,1}\,\nu_{+,2}}{\left(\mu_{+,0}\,\mu_{+,2} - \mu^2_{+,1}\right)^2}. \quad (25)$$
Let $a(c) = a_{Y+}(c) = a_{Y-}(c)$. By assuming errors on both sides of the cutoff have the same distribution and letting $b_Y(c) = b_{Y+}(c) = b_{Y-}(c)$, we can further simplify the variance expression in Corollary 1 to have
$$\begin{aligned}
\text{MSE}_{\text{LCQR}} &= \frac{1}{4}\left[a(c)\,m^{(2)}_{Y+}(0) - a(c)\,m^{(2)}_{Y-}(0)\right]^2 h^4 + \frac{1}{nh}\left[\frac{b_Y(c)\,\sigma^2_{\epsilon_{Y+}}(0)}{f_X(0)} + \frac{b_Y(c)\,\sigma^2_{\epsilon_{Y-}}(0)}{f_X(0)}\right] + o_p\!\left(h^4 + \frac{1}{nh}\right) \\
&= \frac{1}{4}\left[a(c)\,m^{(2)}_{Y+}(0) - a(c)\,m^{(2)}_{Y-}(0)\right]^2 h^4 + \frac{1}{nh}\left[\frac{b(c)\,\sigma^2_{\epsilon_{Y+}}(0)}{f_X(0)} + \frac{b(c)\,\sigma^2_{\epsilon_{Y-}}(0)}{f_X(0)}\right]\frac{b_Y(c)}{b(c)} + o_p\!\left(h^4 + \frac{1}{nh}\right), \quad (26)
\end{aligned}$$
where the two MSEs differ by a factor $b_Y(c)/b(c)$ in the variance term. Minimizing these two MSEs gives optimal bandwidths. Substitute the bandwidths into the MSEs to obtain the optimal values, $\text{MSE}^{\text{opt}}_{\text{LLR}}$ and $\text{MSE}^{\text{opt}}_{\text{LCQR}}$. Similar to Kai et al. (2010), we define the asymptotic relative efficiency (ARE) of LCQR with respect to LLR on the boundary as
$$\text{ARE} = \frac{\text{MSE}^{\text{opt}}_{\text{LLR}}}{\text{MSE}^{\text{opt}}_{\text{LCQR}}} \to \left[\frac{b_Y(c)}{b(c)}\right]^{-4/5}. \quad (27)$$
When $q = 1$, the ratio in eq. (27) simplifies to $\left[\frac{1}{q^2}\sum_{k=1}^{q}\sum_{k'=1}^{q}\frac{\tau_{kk'}}{f_{\epsilon_{Y+}}(c_k)\,f_{\epsilon_{Y+}}(c_{k'})}\right]^{-4/5}$. When $q \geq 2$, this simplification does not hold. Yet we can evaluate the ratio in eq. (27) for many common distributions. Using the triangular kernel as an example, we calculate the ratio for the five error distributions listed in Table 1. Using the Epanechnikov kernel gives similar results.

Table 1: Asymptotic relative efficiency (ARE) of LCQR in sharp RD

Error distribution                              q = 1    q = 5    q = 9    q = 19   q = 99
N(0,1)                                          0.6968   0.9290   0.9569   0.9728   0.9819
Laplace                                         1.7411   1.3315   1.2920   1.2616   1.2303
t-distribution with 3 degrees of freedom        1.4718   1.6401   1.6144   1.5703   1.4854
0.95 N(0,1) + 0.05 N(0,3^2)                     0.8639   1.1271   1.1511   1.1579   1.1327
0.95 N(0,1) + 0.05 N(0,10^2)                    2.6960   3.4578   3.4986   3.4590   2.2632

The ARE for boundary points in Table 1 largely follows the same pattern as for interior points, as shown in Table 1 in Kai et al. (2010). When errors are normally distributed, increasing $q$ will quickly bring the ARE close to one, reflecting a very small efficiency loss of LCQR with respect to LLR. For non-normal errors, there can be large efficiency gains. The first column in Table 1 is identical for both interior and boundary points, a result of using a symmetric kernel function. When $q \geq 2$, the numbers differ from those in Table 1 in Kai et al. (2010). Overall, Table 1 provides the numerical evidence in support of using LCQR in RD when data are non-normal, i.e., the ARE mostly exceeds 1.
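The $q = 1$ column of Table 1 can be checked directly in R: under our reading of eqs. (23), (25) and (27), for a symmetric error standardized to unit variance the ratio reduces to $[1/(4f(0)^2)]^{-4/5}$, since $\tau_{11} = 1/4$ and $c_1 = 0$. A small sketch reproducing the first three entries:

```r
# ARE of LCQR with q = 1 (local median) relative to LLR at the boundary,
# for symmetric unit-variance errors; cf. Table 1, column q = 1.
are_q1 <- function(f0) (1 / (4 * f0^2))^(-4/5)
are_q1(dnorm(0))              # N(0,1):                  0.6968
are_q1(1 / sqrt(2))           # unit-variance Laplace:   1.7411
are_q1(sqrt(3) * dt(0, 3))    # t(3) scaled to unit var: 1.4718
```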
3 Inference on the boundary
The goal of inference is related to but different from that of estimation. While one still needs a precise $\hat{\tau}_{\text{sharp}}$ or $\hat{\tau}_{\text{fuzzy}}$, much of the effort is spent on making the $t$-statistics in eqs. (5) and (11) behave like a standard normal random variable. Common approaches include reducing the impact of bias by under-smoothing, directly adjusting the bias, combining bias correction and s.e. adjustment (Calonico et al. (2014)), adjusting the critical values (Armstrong and Kolesár (2020)), etc. This section studies the bias-corrected and s.e.-adjusted $t$-statistics based on LCQR for conducting inference in sharp and fuzzy RD designs.

3.1 The sharp case

From Theorem 1, the bias-corrected estimator is given by
$$\hat{\tau}^{bc}_{\text{sharp}} = \hat{\tau}_{\text{sharp}} - \widehat{\text{Bias}}(\hat{\tau}_{\text{sharp}}), \quad (28)$$
where $\widehat{\text{Bias}}(\hat{\tau}_{\text{sharp}}) = \widehat{\text{Bias}}(\hat{m}_{Y+}) - \widehat{\text{Bias}}(\hat{m}_{Y-})$.

Simply replacing $\hat{\tau}_{\text{sharp}}$ in eq. (5) with $\hat{\tau}^{bc}_{\text{sharp}}$ may still lead to undercoverage of the resulting confidence intervals. Calonico et al. (2014) observed that the key to fixing this problem is to take into consideration the additional variability introduced by bias correction. A similar idea is applied to inference based on LCQR, i.e., instead of using $\text{Var}(\hat{\tau}_{\text{sharp}})$ in the denominator of eq. (5), we use $\text{Var}(\hat{\tau}_{\text{sharp}} - \widehat{\text{Bias}}(\hat{\tau}_{\text{sharp}}))$. The additional variability due to bias correction increases the adjusted variance, and the s.e./variance adjustment naturally improves the coverage of the resulting confidence intervals. By the i.i.d. assumption, we have
$$\begin{aligned}
\text{Var}(\hat{\tau}_{\text{sharp}} - \widehat{\text{Bias}}(\hat{\tau}_{\text{sharp}})) &= \text{Var}(\hat{\tau}_{\text{sharp}}) + \text{Var}(\widehat{\text{Bias}}(\hat{\tau}_{\text{sharp}})) - 2\,\text{Cov}(\hat{\tau}_{\text{sharp}}, \widehat{\text{Bias}}(\hat{\tau}_{\text{sharp}})) \\
&= \text{Var}(\hat{m}_{Y+}) + \text{Var}(\hat{m}_{Y-}) + \text{Var}(\widehat{\text{Bias}}(\hat{m}_{Y+})) + \text{Var}(\widehat{\text{Bias}}(\hat{m}_{Y-})) \\
&\quad - 2\,\text{Cov}(\hat{m}_{Y+}, \widehat{\text{Bias}}(\hat{m}_{Y+})) - 2\,\text{Cov}(\hat{m}_{Y-}, \widehat{\text{Bias}}(\hat{m}_{Y-})), \quad (29)
\end{aligned}$$
where the expressions for $\text{Var}(\hat{m}_{Y+})$ and $\text{Var}(\hat{m}_{Y-})$ are available in Kai et al. (2009, 2010). We contribute to the literature by deriving the explicit forms of the variance of bias and the covariance in eq. (29).
Theorem 3. Under Assumptions 1 to 6, as both $n_+$ and $n_- \to \infty$, the adjusted $t$-statistic for the sharp RD follows an asymptotic normal distribution:
$$t^{adj.}_{\text{sharp}} = \frac{\hat{\tau}_{\text{sharp}} - \widehat{\text{Bias}}(\hat{\tau}_{\text{sharp}})}{\sqrt{\text{Var}(\hat{\tau}_{\text{sharp}} - \widehat{\text{Bias}}(\hat{\tau}_{\text{sharp}}))}} \overset{d}{\to} N(0, 1). \quad (30)$$
The exact form of the adjusted variance is given in the Appendix.

Following Theorem 3, the confidence interval for the nominal 95% coverage becomes $\hat{\tau}_{\text{sharp}} - \widehat{\text{Bias}}(\hat{\tau}_{\text{sharp}}) \pm 1.96\sqrt{\text{Var}(\hat{\tau}_{\text{sharp}} - \widehat{\text{Bias}}(\hat{\tau}_{\text{sharp}}))}$, which incorporates the variability introduced by bias correction and will thus improve the coverage probability.
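In practice the interval following Theorem 3 is assembled from three ingredients: the point estimate, the estimated bias, and the adjusted standard error. A minimal R sketch (the numeric inputs are purely illustrative, not taken from the paper):

```r
# Bias-corrected and s.e.-adjusted confidence interval, cf. Theorem 3.
ci_adjusted <- function(tau_hat, bias_hat, se_adj, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)           # normal critical value
  (tau_hat - bias_hat) + c(-1, 1) * z * se_adj
}
ci_adjusted(tau_hat = 0.066, bias_hat = 0.004, se_adj = 0.011)  # toy numbers
```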
Remark 4. Theorem 3 assumes that the researcher uses the same bandwidths ($h_{Y+}$ and $h_{Y-}$) for estimation, bias correction, and s.e. adjustment. This simplifies the presentation of the result and the proof. It is also empirically relevant and appealing, since using a single bandwidth throughout estimation and inference greatly reduces the complexity of implementation, though ideally we need a different bandwidth for every required nonparametric estimation. Despite the suboptimal nature of this procedure, our simulation and empirical exercise show that the results are satisfactory.
Remark 5. We assume two different bandwidths, $h_{Y+}$ and $h_{Y-}$, in Theorem 3. If they are further assumed to be the same, $h_{Y+} = h_{Y-} = h$, then by replacing the conditional densities of the covariate, $f_{X+}(0)$ and $f_{X-}(0)$, with the unconditional one, $f_X(0)$, it can be shown that
$$\text{Var}(\hat{\tau}_{\text{sharp}} - \widehat{\text{Bias}}(\hat{\tau}_{\text{sharp}})) = \frac{1}{nh}\,C + o_p\!\left(\frac{1}{nh}\right),$$
where the constant $C$ can be straightforwardly inferred from the proof of Theorem 3 in Appendix A.

3.2 The fuzzy case

Following Feir et al. (2016), we use a modified $t$-statistic to help eliminate the size distortion due to possibly weak identification in the fuzzy RD. Use eq. (9) to rewrite the null $H_0: \tau_{\text{fuzzy}} = \tau_0$ as
$$H_0: \left[m_{Y+}(0) - \tau_0\,m_{T+}(0)\right] - \left[m_{Y-}(0) - \tau_0\,m_{T-}(0)\right] = 0.$$
Define $\tilde{\tau}_{\text{fuzzy}} = (\hat{m}_{Y+} - \tau_0\hat{m}_{T+}) - (\hat{m}_{Y-} - \tau_0\hat{m}_{T-})$. The bias-corrected $\tilde{\tau}_{\text{fuzzy}}$ is given by $\tilde{\tau}^{bc}_{\text{fuzzy}} = \tilde{\tau}_{\text{fuzzy}} - \widehat{\text{Bias}}(\tilde{\tau}_{\text{fuzzy}})$, where
$$\widehat{\text{Bias}}(\tilde{\tau}_{\text{fuzzy}}) = \left[\widehat{\text{Bias}}(\hat{m}_{Y+}) - \tau_0\,\widehat{\text{Bias}}(\hat{m}_{T+})\right] - \left[\widehat{\text{Bias}}(\hat{m}_{Y-}) - \tau_0\,\widehat{\text{Bias}}(\hat{m}_{T-})\right]. \quad (31)$$
All the bias terms can be estimated using the result in Lemma 2 of Appendix A. To see the components in the adjusted variance, it is helpful to consider first the data above the cutoff:
$$\begin{aligned}
&\text{Var}\!\left((\hat{m}_{Y+} - \tau_0\hat{m}_{T+}) - (\widehat{\text{Bias}}(\hat{m}_{Y+}) - \tau_0\,\widehat{\text{Bias}}(\hat{m}_{T+}))\right) \\
&= \text{Var}(\hat{m}_{Y+}) + \tau_0^2\,\text{Var}(\hat{m}_{T+}) - 2\tau_0\,\text{Cov}(\hat{m}_{Y+}, \hat{m}_{T+}) \\
&\quad + \text{Var}(\widehat{\text{Bias}}(\hat{m}_{Y+})) + \tau_0^2\,\text{Var}(\widehat{\text{Bias}}(\hat{m}_{T+})) - 2\tau_0\,\text{Cov}(\widehat{\text{Bias}}(\hat{m}_{Y+}), \widehat{\text{Bias}}(\hat{m}_{T+})) \\
&\quad - 2\,\text{Cov}(\hat{m}_{Y+}, \widehat{\text{Bias}}(\hat{m}_{Y+})) + 2\tau_0\,\text{Cov}(\hat{m}_{Y+}, \widehat{\text{Bias}}(\hat{m}_{T+})) \\
&\quad + 2\tau_0\,\text{Cov}(\hat{m}_{T+}, \widehat{\text{Bias}}(\hat{m}_{Y+})) - 2\tau_0^2\,\text{Cov}(\hat{m}_{T+}, \widehat{\text{Bias}}(\hat{m}_{T+})). \quad (32)
\end{aligned}$$
To operationalize the variance adjustment process, we derive all the needed covariance terms in the Appendix. Expressions for the data below the cutoff are similar to those presented above for the data above the cutoff.

Theorem 6. Under Assumptions 1 to 6, as both $n_+$ and $n_- \to \infty$, the adjusted $t$-statistic for the fuzzy RD follows an asymptotic normal distribution:
$$t^{adj.}_{\text{fuzzy}} = \frac{\tilde{\tau}_{\text{fuzzy}} - \widehat{\text{Bias}}(\tilde{\tau}_{\text{fuzzy}})}{\sqrt{\text{Var}(\tilde{\tau}_{\text{fuzzy}} - \widehat{\text{Bias}}(\tilde{\tau}_{\text{fuzzy}}))}} \overset{d}{\to} N(0, 1). \quad (33)$$
The exact expression for $\text{Var}(\tilde{\tau}_{\text{fuzzy}} - \widehat{\text{Bias}}(\tilde{\tau}_{\text{fuzzy}}))$ is provided in the Appendix.
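A compact R sketch of the modified statistic in eq. (33), taking the boundary estimates, the estimated bias from eq. (31), and the adjusted variance from eq. (32) as inputs (all placeholders supplied by the user):

```r
# Adjusted t-statistic for the fuzzy RD null H0: tau_fuzzy = tau0, cf. eq. (33).
t_fuzzy_adj <- function(mY_p, mY_m, mT_p, mT_m, tau0, bias_hat, var_adj) {
  tau_tilde <- (mY_p - tau0 * mT_p) - (mY_m - tau0 * mT_m)
  (tau_tilde - bias_hat) / sqrt(var_adj)   # compare with N(0,1) quantiles
}
```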
3.3 Bandwidth selection

In this subsection, we propose a method to revise the MSE-optimal bandwidth by taking into consideration both bias correction and s.e. adjustment. It is well-known in the nonparametric literature that the MSE-optimal bandwidth, when used in inference, often induces undercoverage of conventional confidence intervals. The root cause is that the MSE-optimal bandwidth, $h_{\text{MSE}}$, is $O(n^{-1/5})$, such that $nh^5 \to$ a constant while we need $nh^5 \to 0$. Hence, using $h_{\text{MSE}}$ in eq. (5) could lead to poor coverage of confidence intervals. Using a bias-corrected $t$-statistic
$$\frac{\hat{\tau}_{\text{sharp}} - \widehat{\text{Bias}}(\hat{\tau}_{\text{sharp}})}{\sqrt{\text{Var}(\hat{\tau}_{\text{sharp}})}}$$
does not solve this problem; see the discussion in Section 2 of Calonico et al. (2014).

It is helpful to revisit the bandwidth selection process in an MSE-optimal setting in order to find a solution. We use a single bandwidth $h$ for illustration. Consider the usual MSE when estimating a conditional mean function $m(\cdot)$ by $\hat{m}(\cdot)$:
$$\text{MSE} = \text{Bias}^2(\hat{m}) + \text{Var}(\hat{m}) + o_p\!\left(h^4 + \frac{1}{nh}\right).$$
It is clear that the MSE-optimal bandwidth aims to balance the two terms, the squared bias and the variance. After correcting the bias in the numerator of the $t$-statistic, an optimal bandwidth should balance $\text{Bias}^2(\hat{m} - \text{Bias}(\hat{m}))$, not $\text{Bias}^2(\hat{m})$, with the variance term. Given $\text{Bias}(\hat{m}) = O(h^2)$, we need to expand the bias expression up to $O(h^3)$ in order to compute $\text{Bias}(\hat{m} - \text{Bias}(\hat{m}))$. In addition, after incorporating the variance adjustment due to bias correction, the adjusted MSE takes the following form:
$$\text{adj. MSE} = \text{Bias}^2(\hat{m} - \text{Bias}(\hat{m})) + \text{adj. Var}(\hat{m}) + o_p\!\left(h^6 + \frac{1}{nh}\right), \quad (34)$$
where $\text{adj. Var}(\hat{m})$ can take the form in eq. (29) in the sharp RD design. The following theorem presents the bandwidth result associated with the adjusted MSE in the sharp RD design, assuming we work with data above the cutoff.

Theorem 7. Under Assumptions 1 to 6, as $n_+ \to \infty$, the bandwidth that minimizes the adjusted MSE is given by
$$h_{Y+} = \left(\frac{C_2}{C_1}\right)^{1/7} n_+^{-1/7}, \quad (35)$$
where $C_1$ and $C_2$ are constants defined in the Appendix.

See the Appendix for the proof. The result for the fuzzy design case can be obtained similarly.
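The rule in eq. (35) is easy to apply once the two constants are estimated (the paper uses the rule-of-thumb selector of Fan and Gijbels (1996) for this; see Section 4). A one-line R sketch with placeholder constants:

```r
# adj.MSE-optimal bandwidth, cf. eq. (35); C1 and C2 are estimated constants.
h_adjmse <- function(C1, C2, n) (C2 / C1)^(1/7) * n^(-1/7)
h_adjmse(C1 = 1, C2 = 2, n = 500)  # illustrative values only
```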
Remark 8. Theorem 7 assumes that two separate bandwidths are used for the data above and below the cutoff; a similar result holds for $h_{Y-}$. On each side of the cutoff, a single bandwidth is used in both estimation and bias correction. Theorem 7 provides the expression of this bandwidth. The idea of using the adjusted MSE in eq. (34) can also be applied to bandwidth selection in LLR.
Remark 9. The bandwidth derived in Theorem 7 is supposed to balance the bias of the bias-corrected estimator and its adjusted variance. It is not designed to optimize the coverage probability of confidence intervals. However, simulation evidence shows that it provides good coverage probabilities along with an accurate estimator of the treatment effect.
Remark 10.
The optimal bandwidth is compatible with the required assumption of $n_+ h_{Y+} \to \infty$ as $n_+ \to \infty$.

Remark 11. If an equal bandwidth on both sides of the cutoff is preferred, it can be derived in a manner similar to the optimal bandwidth choice in Imbens and Kalyanaraman (2012). To see that, we further assume $h_{Y+} = h_{Y-} = h$ and let $C_{+,1}$, $C_{+,2}$, $C_{-,1}$, $C_{-,2}$ be the corresponding constants in Theorem 7 for the data above and below the cutoff. The adjusted MSE becomes $\text{adj. MSE} = (C_{+,1} - C_{-,1})^2 h^6 + \frac{1}{nh}(C_{+,2} + C_{-,2}) + o_p(h^6 + \frac{1}{nh})$, and the optimal bandwidth is given by
$$h = \left(\frac{C_{+,2} + C_{-,2}}{6\left(C_{+,1} - C_{-,1}\right)^2}\right)^{1/7} n^{-1/7}. \quad (36)$$
This bandwidth is shown to have good finite sample performance in our simulation study. One caveat is that the performance of this bandwidth relies on a good estimate of the second derivative of the conditional mean function. If it is difficult to obtain an estimate of the second derivative, one can use a simple, global quadratic polynomial to estimate the derivative. This is implemented in the option bandwidth = "rot" as well as the option ls.derivative = TRUE in the rdcqr package.
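The options quoted above suggest the following usage pattern for the companion package. The function name and argument list are our guess from the text (hypothetical) and may differ from the actual interface documented at https://github.com/xhuang20/rdcqr:

```r
# Hypothetical usage sketch of the companion rdcqr package (unverified API).
# install.packages("remotes"); remotes::install_github("xhuang20/rdcqr")
library(rdcqr)
# out <- rdcqr(y, x, cutoff = 0, bandwidth = "rot", ls.derivative = TRUE)
```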
Remark 12. The bandwidth considered in this section has order $O(n_+^{-1/7})$ or $O(n^{-1/7})$. To check the robustness (to bandwidth selection) of the adjusted $t$-statistic, we also directly use the rule-of-thumb bandwidth in Section 4.2 of Fan and Gijbels (1996) in our simulation. This bandwidth is close to the Mean Integrated Squared Error optimal (MISE-optimal) bandwidth and has order $O(n_+^{-1/5})$ and $O(n_-^{-1/5})$ for the data above and below the cutoff. The results are reported in Appendix B, and they are found very similar to those obtained from using an $O(n^{-1/7})$ bandwidth.

4 Simulation

In this section, we conduct a Monte Carlo study to investigate the finite sample properties of the LCQR estimator in RD. We focus on the sharp design and separate the Monte Carlo study into two parts, one for estimation and the other for inference.

The sharp designs calibrated to Lee (2008) and Ludwig and Miller (2007) are used in the data generating process (DGP):
$$Y_i = m(X_i) + \sigma(X_i)\,\epsilon_i, \quad i = 1, \cdots, n, \qquad X_i \sim 2 \times \text{Beta}(2, 4) - 1,$$
where the conditional means are given by
$$m_{\text{Lee}}(X_i) = \begin{cases} 0.48 + 1.27X_i + 7.18X_i^2 + 20.21X_i^3 + 21.54X_i^4 + 7.33X_i^5 & \text{if } X_i < 0, \\ 0.52 + 0.84X_i - 3.00X_i^2 + 7.99X_i^3 - 9.01X_i^4 + 3.56X_i^5 & \text{if } X_i \geq 0, \end{cases} \quad (37)$$
and
$$m_{\text{LM}}(X_i) = \begin{cases} 3.71 + 2.30X_i + 3.28X_i^2 + 1.45X_i^3 + 0.23X_i^4 + 0.03X_i^5 & \text{if } X_i < 0, \\ 0.26 + 18.49X_i - 54.81X_i^2 + 74.30X_i^3 - 45.02X_i^4 + 9.83X_i^5 & \text{if } X_i \geq 0. \end{cases} \quad (38)$$
Hence the treatment effects are 0.04 and $-3.45$
in (37) and (38), respectively. We use the same five error distributions in Kai et al. (2010) to simulate $\epsilon_i$:

DGP 1: $\epsilon_i \sim N(0, 1)$,
DGP 2: $\epsilon_i \sim$ Laplace with $\mu = 0$ and $\sigma = 1$,
DGP 3: $\epsilon_i \sim t$ distribution with 3 degrees of freedom,
DGP 4: $\epsilon_i \sim$ mixture normal $0.95\,N(0,1) + 0.05\,N(0,3^2)$,
DGP 5: $\epsilon_i \sim$ mixture normal $0.95\,N(0,1) + 0.05\,N(0,10^2)$.

In addition, the homoskedastic and heteroskedastic specifications in Kai et al. (2010) are used for simulating the standard deviation function $\sigma(X_i)$: a constant standard deviation under homoskedasticity, and a standard deviation varying as a trigonometric function of $\pi X_i$ (scaled by 1/10) under heteroskedasticity.
We set $n = 500$ with 5000 replications. The data are i.i.d. draws in all replications. The triangular kernel is used in all estimations.
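For concreteness, one replication of the Lee design under DGP 1 can be generated as follows; the error scale sigma is a placeholder, since the exact homoskedastic specification follows Kai et al. (2010):

```r
# One simulated sample from the Lee design in eq. (37), DGP 1 (normal errors).
gen_lee <- function(n = 500, sigma = 0.5) {   # sigma value is illustrative
  x <- 2 * rbeta(n, 2, 4) - 1                 # running variable
  m <- ifelse(x < 0,
    0.48 + 1.27*x +  7.18*x^2 + 20.21*x^3 + 21.54*x^4 + 7.33*x^5,
    0.52 + 0.84*x -  3.00*x^2 +  7.99*x^3 -  9.01*x^4 + 3.56*x^5)
  data.frame(x = x, y = m + sigma * rnorm(n))
}
set.seed(1)
dat <- gen_lee()
```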
Since we also study the coverage probability of confidence intervals, we compare the LCQR results with two other inference methods recently proposed in the RD literature: the robust confidence interval in Calonico et al. (2014), and the honest confidence interval in Armstrong and Kolesár (2020). For ease of exposition, we first summarize the types of the estimators reported later on.

Table 2: Summary of estimators in the simulation study

Estimator                         Description
$\hat{\tau}^{cqr}_{1bw}$          LCQR with equal bandwidth
$\hat{\tau}^{cqr,bc}_{1bw}$       LCQR with equal bandwidth, bias-corrected and s.e.-adjusted
$\hat{\tau}^{cqr}_{2bw}$          LCQR with unequal bandwidth
$\hat{\tau}^{cqr,bc}_{2bw}$       LCQR with unequal bandwidth, bias-corrected and s.e.-adjusted
$\hat{\tau}^{llr}_{1bw}$          LLR estimator
$\hat{\tau}^{robust,bc}_{1bw}$    the robust estimator in Calonico et al. (2014) with equal bandwidth
$\hat{\tau}^{honest}_{1bw}$       the honest method in Armstrong and Kolesár (2020) with equal bandwidth

We set $q = 7$ for all LCQR estimators. $\hat{\tau}^{robust,bc}_{1bw}$ is obtained from the rdrobust command in the R package rdrobust. We use the option bwselect = mserd in estimation and the option bwselect = cerrd in inference, so that $\hat{\tau}^{robust,bc}_{1bw}$ is MSE-optimal and coverage-optimal in Section 4.1 and Section 4.2, respectively. $\hat{\tau}^{honest}_{1bw}$ is obtained from the RDHonest command in the R package RDHonest. The option opt.criterion = FLCI is used to produce the optimal confidence intervals. The Taylor smoothness class is selected in estimation. To implement the method in Armstrong and Kolesár (2020), one needs to get an estimate of the second derivative of the conditional mean function. We use the function RDSmoothnessBound in the RDHonest package to estimate this bound for the Lee model in eq. (37). Alternatively, since we know the data generating process in eq. (38) and the maximum of the second derivative at 0 is $54.81 \times 2$, a bound of 120 is directly used for the LM model. The local linear estimator, $\hat{\tau}^{llr}_{1bw}$, is obtained by applying the main bandwidth used in $\hat{\tau}^{robust,bc}_{1bw}$.

To calculate the bandwidths in eq. (35) and eq. (36) for LCQR, we use the rule-of-thumb bandwidth selector described in Section 4.2 of Fan and Gijbels (1996) to compute quantities such as $C_1$ and $C_2$ in eq. (35). The bandwidth in eq. (35) or eq. (36) is then used to perform the LCQR estimation, bias correction, and s.e. adjustment. Unlike the LLR estimator, the LCQR estimator has no closed-form solution and is obtained from the iterative MM algorithm. Hence the s.e. and adjusted s.e. reported for LCQR are based on the asymptotic results in Theorems 1 and 3.

4.1 Estimation of the treatment effect

In this subsection, we compare LCQR with LLR for the treatment effect estimation.
Figure 2: Absolute value of the average bias of the bias-corrected estimators, $\hat{\tau}^{cqr,bc}_{1bw}$ and $\hat{\tau}^{robust,bc}_{1bw}$, for the Lee model with homoskedasticity. $\hat{\tau}^{robust,bc}_{1bw}$ is the bias-corrected LLR estimator. The result is based on 5000 replications and the true treatment effect is 0.04.
Figure 2 shows that the bias-corrected LCQR estimator carries a smaller absolute bias than the bias-corrected LLR estimator across the five DGPs; the bias is reduced to a small percentage of the true treatment effect when LCQR is adopted. Similar results for other models can be found in Figures 4 to 6 in Appendix B.

Tables 3 to 6 further present the means and standard errors of the studied estimators to facilitate comparison. These tables indicate that the s.e. of $\hat{\tau}^{cqr}_{1bw}$ is consistently smaller than that of $\hat{\tau}^{llr}_{1bw}$, indicating the efficiency gain of LCQR against LLR. Although DGP 1 has normal errors, $Y$ is non-normal; hence the s.e. of $\hat{\tau}^{cqr}_{1bw}$ is also smaller than that of $\hat{\tau}^{llr}_{1bw}$. As the DGP moves away from normal errors, the LCQR estimator achieves various levels of efficiency gains compared to the LLR estimator. For example, the s.e. of $\hat{\tau}^{cqr}_{1bw}$ in DGP 5 of Table 4 is roughly half of the 0.255 for $\hat{\tau}^{llr}_{1bw}$, so the LCQR/LLR standard error ratio is close to 50% in this example.

While reading Tables 3 to 6, it is important to bear in mind that the s.e.s of estimators with a superscript "bc" are not suitable for assessing the efficiency of the estimators. They are computed by incorporating the additional variability due to bias correction. However, they shed light on the length of confidence intervals. Tables 3 to 6 show that the adjusted s.e. of $\hat{\tau}^{cqr,bc}_{1bw}$ could be smaller or larger than that of $\hat{\tau}^{robust,bc}_{1bw}$, so the simulation results for comparing LCQR with LLR appear to be mixed after bias correction.

In addition, using two bandwidths above and below the cutoff also gives mixed results in terms of bias correction for the LCQR estimator, though it is clear that the two-bandwidth approach leads to a small decrease in s.e. for LCQR.

Table 3: Lee model with homoskedastic errors and treatment effect $\tau_{\text{sharp}} = 0.04$. Rows: $\hat{\tau}^{cqr}_{1bw}$, $\hat{\tau}^{cqr,bc}_{1bw}$, $\hat{\tau}^{cqr}_{2bw}$, $\hat{\tau}^{cqr,bc}_{2bw}$, $\hat{\tau}^{llr}_{1bw}$, $\hat{\tau}^{robust,bc}_{1bw}$; columns: estimate and s.e. under DGPs 1 to 5.
Notes: The estimates for the treatment effect and standard error are averages over 5000 replications with a sample size $n = 500$. The s.e. and adjusted s.e. for LCQR are obtained using the asymptotic expressions from Theorem 1 and Theorem 3. Estimators with a superscript bc are both bias-corrected and s.e.-adjusted. The subscripts 1bw and 2bw refer to the use of equal and unequal bandwidths below and above the cutoff.

Table 4: Lee model with heteroskedastic errors and treatment effect $\tau_{\text{sharp}} = 0.04$. Rows and columns as in Table 3. Notes: see Table 3.

Table 5: LM model with homoskedastic errors and treatment effect $\tau_{\text{sharp}} = -3.45$

Estimator                        DGP 1          DGP 2          DGP 3          DGP 4          DGP 5
                                 est    s.e.    est    s.e.    est    s.e.    est    s.e.    est    s.e.
$\hat{\tau}^{cqr}_{1bw}$         -3.244 0.191   -3.265 0.231   -3.269 0.252   -3.257 0.206   -3.296 0.251
$\hat{\tau}^{cqr,bc}_{1bw}$      -3.439 0.353   -3.453 0.415   -3.450 0.458   -3.445 0.380   -3.443 0.455
$\hat{\tau}^{cqr}_{2bw}$         -3.259 0.189   -3.280 0.228   -3.285 0.249   -3.270 0.204   -3.308 0.247
$\hat{\tau}^{cqr,bc}_{2bw}$      -3.440 0.348   -3.452 0.411   -3.450 0.452   -3.443 0.375   -3.440 0.447
$\hat{\tau}^{llr}_{1bw}$         -3.340 0.223   -3.342 0.301   -3.333 0.346   -3.345 0.256   -3.328 0.452
$\hat{\tau}^{robust,bc}_{1bw}$   -3.423 0.256   -3.434 0.349   -3.426 0.403   -3.432 0.295   -3.426 0.530
Notes: see Table 3.
Table 6: LM model with heteroskedastic errors and treatment effect $\tau_{\text{sharp}} = -3.45$

Estimator                        DGP 1          DGP 2          DGP 3          DGP 4          DGP 5
                                 est    s.e.    est    s.e.    est    s.e.    est    s.e.    est    s.e.
$\hat{\tau}^{cqr}_{1bw}$         -3.234 0.106   -3.245 0.127   -3.252 0.138   -3.242 0.114   -3.281 0.137
$\hat{\tau}^{cqr,bc}_{1bw}$      -3.443 0.195   -3.448 0.228   -3.444 0.250   -3.442 0.209   -3.443 0.248
$\hat{\tau}^{cqr}_{2bw}$         -3.239 0.104   -3.256 0.125   -3.259 0.136   -3.247 0.112   -3.288 0.134
$\hat{\tau}^{cqr,bc}_{2bw}$      -3.443 0.192   -3.447 0.225   -3.446 0.246   -3.444 0.206   -3.445 0.243
$\hat{\tau}^{llr}_{1bw}$         -3.364 0.146   -3.359 0.190   -3.349 0.215   -3.364 0.165   -3.339 0.273
$\hat{\tau}^{robust,bc}_{1bw}$   -3.431 0.163   -3.438 0.216   -3.432 0.245   -3.437 0.185   -3.431 0.315
Notes: see Table 3.

4.2 Inference on the treatment effect

This subsection studies the coverage probability of the confidence intervals based on the LCQR estimator, using the adjusted s.e. in Theorem 3 and the adj. MSE-based bandwidth selector. The results are summarized in Tables 7 to 10. The nominal coverage probability is set to 95%.

Without bias correction or s.e. adjustment, confidence intervals based on the LCQR estimators, $\hat{\tau}^{cqr}_{1bw}$ and $\hat{\tau}^{cqr}_{2bw}$, are found to have poor coverage probabilities in Tables 7 to 10, which highlights an important difference between estimation and inference: despite the excellent finite sample properties of LCQR in Tables 3 to 6, one has to perform both bias correction and s.e. adjustment in order to produce the desired coverage of confidence intervals. Furthermore, the coverage probability of LCQR-based confidence intervals is lower than that of LLR-based confidence intervals. For example, it is 92.2% in DGP 1 in Table 7, compared to 93.4% for the LLR-based confidence interval. Although $\hat{\tau}^{cqr}_{1bw}$ tends to have a smaller bias and a smaller s.e. than $\hat{\tau}^{llr}_{1bw}$, it appears that the bias in $\hat{\tau}^{cqr}_{1bw}$ is still not small enough to center the estimator close to the true value. A smaller s.e. will further contribute to the decrease in coverage probabilities. This explains the poor coverage for confidence intervals based on $\hat{\tau}^{cqr}_{1bw}$ and $\hat{\tau}^{cqr}_{2bw}$.

With bias correction and s.e. adjustment, the proposed bandwidth selectors in eq. (35) and eq. (36) lead to good coverage across the five DGPs in Tables 7 to 10. It is important to recall that the LCQR estimators presented in all tables use the bandwidth in eq. (35) and eq. (36), which is not designed to optimize the coverage probability of confidence intervals, while the confidence intervals for $\hat{\tau}^{robust,bc}_{1bw}$ and $\hat{\tau}^{honest}_{1bw}$ in Tables 7 to 10 are optimized to maximize the coverage probability. In this respect, it is reasonable to conclude that LCQR with bandwidths in eq. (35) and eq. (36) offers very competitive results. The performance of the honest confidence intervals is excellent for some DGPs in Table 7. Its coverage probability is higher in Tables 9 and 10, most likely due to the use of the upper bound 120 in estimation. Clearly, a better number can be used to further tune the results.

Table 7: Coverage probability in the Lee model with homoskedasticity. Rows: $\hat{\tau}^{cqr}_{1bw}$, $\hat{\tau}^{cqr,bc}_{1bw}$, $\hat{\tau}^{cqr}_{2bw}$, $\hat{\tau}^{cqr,bc}_{2bw}$, $\hat{\tau}^{llr}_{1bw}$, $\hat{\tau}^{robust,bc}_{1bw}$, $\hat{\tau}^{honest}_{1bw}$; columns: DGPs 1 to 5.
Notes: The numbers are the averages of the coverage probability associated with different estimators. The results are based on 5000 replications with a sample size $n = 500$. The s.e. and adjusted s.e. for the LCQR estimator are obtained based on the asymptotic expressions from Theorem 1 and Theorem 3. Estimators with superscript bc are both bias-corrected and s.e.-adjusted. The results of both $\hat{\tau}^{robust,bc}_{1bw}$ and $\hat{\tau}^{honest}_{1bw}$ are coverage-optimal.

Table 8: Coverage probability in the Lee model with heteroskedasticity. Rows and columns as in Table 7. Notes: see Table 7.

Table 9: Coverage probability in the LM model with homoskedasticity. Rows and columns as in Table 7. Notes: see Table 7.

Table 10: Coverage probability in the LM model with heteroskedasticity. Rows and columns as in Table 7. Notes: see Table 7.

5 Application to Lee (2008)
In this section, we use the data from Lee (2008) to illustrate the practical usage of LCQR. To facilitate comparison, the findings based on LLR are also presented.
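The bandwidth-sequence exercise reported below (Figure 3(b)) can be sketched with a few lines of R, reusing the illustrative lcqr_fit() from Section 2.2 and assuming a data frame lee with the running variable x and outcome y (hypothetical names, not a dataset shipped with any package):

```r
# Treatment-effect estimates over the bandwidth grid of Figure 3(b).
h_grid  <- seq(0.05, 1, by = 0.025)
tau_hat <- sapply(h_grid, function(h) {
  lcqr_fit(lee$x[lee$x >= 0], lee$y[lee$x >= 0], h) -
  lcqr_fit(lee$x[lee$x <  0], lee$y[lee$x <  0], h)
})
plot(h_grid, tau_hat, type = "l", xlab = "bandwidth", ylab = "estimate")
```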
558 observations depicted in Fig-ure 3(a); see also Imbens and Kalyanaraman (2012). The horizontal running variable isthe democratic vote share (margin of victory) in a previous election, while the vertical out-come variable is the democratic vote share in the election afterwards. Consistent with Lee(2008) and several follow-up studies, Figure 3(a) indicates that there is a positive impact ofincumbency on re-election, i.e., a visible jump occurs at the threshold zero.Figure 3(b) and (c) present the estimated impact of incumbency on re-election by LCQRand LLR methods, respectively. We consider a sequence of bandwidth values ranging from0.05 to 1 with the step size 0.025: 0.05, 0.075, 0.1, ..., 1. This bandwidth sequence thus nestsmany common choices such as the MSE-optimal bandwidth of Imbens and Kalyanaraman(2012), which is about 0.3 for the studied data.The comparison of Figure 3(b) and (c) shows that LCQR and LLR yield similar pointestimates over a wide range of bandwidths for the studied Lee (2008) application. As thebandwidth increases, Figure 3(a) indicates that there are data points staying further awayfrom the fitted regression line. These data points affect LCQR and LLR estimators in adifferent manner, leading to slightly disparate point estimates. Nevertheless, all the pointestimates depicted in Figure 3(b) and (c) are significantly positive, since the (vertical) zerovalue is excluded from the shaded regions generated by ± ∼ (a) Regression Discontinuity Design . . . . D e m o c r a t i c V o t e S ha r e , E l e c t i on t + −1 −.5 0 .5 1Democratic Vote Share Margin of Victory, Election t (b) LCQR Estimate ± T r ea t m en t E ff e c t + / - S . E . (c) LLR Estimate ± T r ea t m en t E ff e c t + / - S . E . (d) Ratio of S.E., LCQR/LLR S t anda r d E rr o r R a t i o : L C Q R / LL R Notes : (a) x-axis, the democratic vote share (margin of victory) in Election t ; y-axis, the democratic voteshare in Election t + 1. The dots show the sample mean of the vertical variable in each bin of the horizontalvariable (50 bins on each side of the cutoff). The solid line represents the fitted fourth-order polynomial.As the bandwidth increases, (b) presents the LCQR estimate ± × standard error; (c) presents the LLRestimate ± × standard error; and (d) presents the ratio of the standard errors by LCQR and LLR. TheMSE-optimal bandwidth of Imbens and Kalyanaraman (2012) for the studied data set is about 0.3. Thetriangular kernel is used for both LCQR and LLR estimators. ± et al. (2014) approach, which is (0.046, 0.089). These empiricalfindings therefore lend credibility to our proposed LCQR approach. In this paper, we study the application of LCQR in Kai et al. (2010) to the estimation andinference in RD. We present numerical evidence for the efficiency gain of using LCQR inRD estimation and also propose a bias-corrected and s.e.-adjusted t -statistic to improve thecoverage of confidence intervals. Simulation results show good performance of the proposedmethod under several non-normal error distributions.The current work can be extended in several directions. By minimizing higher orderterms in the Edgeworth expansion of the t -statistic, we could derive a coverage-optimalbandwidth for LCQR that will possibly further improve its performance in inference, similarto the work in Calonico et al. (2018). It will also be interesting to revisit some of the existingapplications in RD with the proposed method as data may deviate from normality. 
Finally, on the computation side, instead of using the same bandwidth for estimation, bias correction and s.e. adjustment, one can refine the bandwidth selection process, which may also improve the estimation and coverage probability of LCQR. We leave these topics for future research.

References
Armstrong, T. and Kolesár, M. (2018). Optimal inference in a class of regression models. Econometrica, 86(2), 655–683.

Armstrong, T. and Kolesár, M. (2020). Simple and honest confidence intervals in nonparametric regression. Quantitative Economics, 11(1), 1–39.

Calonico, S., Cattaneo, M. D. and Farrell, M. H. (2018). On the effect of bias estimation on coverage accuracy in nonparametric inference. Journal of the American Statistical Association, 113(522), 767–779.

Calonico, S., Cattaneo, M. D. and Titiunik, R. (2014). Robust nonparametric confidence intervals for regression-discontinuity designs. Econometrica, 82(6), 2295–2326.

Cheng, M.-Y., Fan, J. and Marron, J. S. (1997). On automatic boundary corrections. The Annals of Statistics, 25(4), 1691–1708.

Fan, J. (1993). Local linear regression smoothers and their minimax efficiencies. The Annals of Statistics, 21(1), 196–216.

Fan, J. and Gijbels, I. (1992). Variable bandwidth and local linear regression smoothers. The Annals of Statistics, 20(4), 2008–2036.

Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and Its Applications. CRC Press.

Feir, D., Lemieux, T. and Marmer, V. (2016). Weak identification in fuzzy regression discontinuity designs. Journal of Business and Economic Statistics, 34(2), 185–196.

Frandsen, B. R., Frölich, M. and Melly, B. (2012). Quantile treatment effects in the regression discontinuity design. Journal of Econometrics, 168(2), 382–395.

Hahn, J., Todd, P. and Van der Klaauw, W. (2001). Identification and estimation of treatment effects with a regression-discontinuity design. Econometrica, 69(1), 201–209.

Huang, X. and Lin, Z. (2020). Local composite quantile regression smoothing: A flexible data structure and cross-validation. Econometric Theory, forthcoming.

Imbens, G. and Kalyanaraman, K. (2012). Optimal bandwidth choice for the regression discontinuity estimator. Review of Economic Studies, 79(3), 933–959.

Imbens, G. and Lemieux, T. (2008). Regression discontinuity designs: A guide to practice. Journal of Econometrics, 142(2), 615–635.

Kai, B., Li, R. and Zou, H. (2009). Supplementary material for "Local composite quantile regression smoothing: An efficient and safe alternative to local polynomial regression".

Kai, B., Li, R. and Zou, H. (2010). Local composite quantile regression smoothing: An efficient and safe alternative to local polynomial regression. Journal of the Royal Statistical Society, Series B, 72(1), 49–69.

Kai, B., Li, R. and Zou, H. (2011). New efficient estimation and variable selection methods for semiparametric varying-coefficient partially linear models. The Annals of Statistics, 39(1), 305–332.

Lee, D. S. (2008). Randomized experiments from non-random selection in U.S. House elections. Journal of Econometrics, 142(2), 675–697.

Lee, D. S. and Lemieux, T. (2010). Regression discontinuity designs in economics. Journal of Economic Literature, 48(2), 281–355.

Li, D. and Li, R. (2016). Local composite quantile regression smoothing for Harris recurrent Markov processes. Journal of Econometrics, 194(1), 44–56.

Ludwig, J. and Miller, D. L. (2007). Does Head Start improve children's life chances? Evidence from a regression discontinuity design. Quarterly Journal of Economics, 122(1), 159–208.

Porter, J. (2003). Estimation in the regression discontinuity model. Unpublished manuscript, Department of Economics, University of Wisconsin at Madison.

Ruppert, D., Sheather, S. J. and Wand, M. P. (1995). An effective bandwidth selector for local least squares regression. Journal of the American Statistical Association, 90(432), 1257–1270.

Ruppert, D. and Wand, M. P. (1994). Multivariate locally weighted least squares regression. The Annals of Statistics, 22(3), 1346–1370.

Thistlethwaite, D. L. and Campbell, D. T. (1960). Regression-discontinuity analysis: An alternative to the ex post facto experiment. Journal of Educational Psychology, 51(6), 309–317.

Zhao, Z. and Xiao, Z. (2014). Efficient regressions via optimally combining quantile information. Econometric Theory, 30(6), 1272–1314.
Appendix A

Lemmas and proofs of theorems are provided in this section. In a fuzzy RD, LCQR is applied separately to eqs. (1), (2), (7) and (8). Instead of introducing four similar sets of variables and notation for the proof, we will focus on eq. (1). Exactly the same proof holds for results based on eqs. (2), (7) and (8).

Let $f_{\epsilon_{Y+}} = (f_{\epsilon_{Y+}}(c_1), \cdots, f_{\epsilon_{Y+}}(c_q))^T$ be a $q \times 1$ vector. Let $S_{Y+,11}(c)$ be a $q \times q$ diagonal matrix with diagonal elements $f_{\epsilon_{Y+}}(c_k)\,\mu_{+,0}(c)$, $S_{Y+,12}(c)$ be a $q \times p$ matrix with $(k, j)$ element $f_{\epsilon_{Y+}}(c_k)\,\mu_{+,j}(c)$, $S_{Y+,21}(c)$ be the transpose of $S_{Y+,12}(c)$, and $S_{Y+,22}(c)$ be a $p \times p$ matrix with $(j, j')$ element equal to $\sum_{k=1}^{q} f_{\epsilon_{Y+}}(c_k)\,\mu_{+,j+j'}(c)$. Let $\Sigma_{Y+,11}(c)$ be a $q \times q$ matrix with $(k, k')$ element $\nu_{+,0}(c)\,\tau_{kk'}$, $\Sigma_{Y+,12}(c)$ be a $q \times p$ matrix with $(k, j)$ element $\sum_{k'=1}^{q}\tau_{kk'}\,\nu_{+,j}(c)$, $\Sigma_{Y+,21}(c)$ be the transpose of $\Sigma_{Y+,12}(c)$, and $\Sigma_{Y+,22}(c)$ be a $p \times p$ matrix with $(j, j')$ element equal to $\sum_{k=1}^{q}\sum_{k'=1}^{q}\tau_{kk'}\,\nu_{+,j+j'}(c)$. Define
$$S_{Y+}(c) = \begin{pmatrix} S_{Y+,11}(c) & S_{Y+,12}(c) \\ S_{Y+,21}(c) & S_{Y+,22}(c) \end{pmatrix}, \qquad \Sigma_{Y+}(c) = \begin{pmatrix} \Sigma_{Y+,11}(c) & \Sigma_{Y+,12}(c) \\ \Sigma_{Y+,21}(c) & \Sigma_{Y+,22}(c) \end{pmatrix},$$
and the partitioned inverse of $S_{Y+}(c)$:
$$S^{-1}_{Y+} = \begin{pmatrix} (S^{-1}_{Y+}(c))_{11} & (S^{-1}_{Y+}(c))_{12} \\ (S^{-1}_{Y+}(c))_{21} & (S^{-1}_{Y+}(c))_{22} \end{pmatrix}.$$
Let $F_+(c_k, c_{k'})$ be the joint cumulative distribution function of $\epsilon_{Y+}$ and $\epsilon_{T+}$ at $(c_k, c_{k'})$. Define $\phi_{kk'} = F_+(c_k, c_{k'}) - \tau_k\tau_{k'}$. Also let $\Sigma_{YT+,11}(c)$ be a $q \times q$ matrix with $(k, k')$ element $\nu_{+,0}(c)\,\phi_{kk'}$, $\Sigma_{YT+,12}(c)$ be a $q \times p$ matrix with $(k, j)$ element $\sum_{k'=1}^{q}\phi_{kk'}\,\nu_{+,j}(c)$, $\Sigma_{YT+,21}(c)$ be the transpose of $\Sigma_{YT+,12}(c)$, and $\Sigma_{YT+,22}(c)$ be a $p \times p$ matrix with $(j, j')$ element $\sum_{k=1}^{q}\sum_{k'=1}^{q}\phi_{kk'}\,\nu_{+,j+j'}(c)$. Define
$$\Sigma_{YT+}(c) = \begin{pmatrix} \Sigma_{YT+,11}(c) & \Sigma_{YT+,12}(c) \\ \Sigma_{YT+,21}(c) & \Sigma_{YT+,22}(c) \end{pmatrix}.$$
A similar set of definitions can be provided for other variables on the boundary, including $S_{T+}(c)$, $S^{-1}_{T+}(c)$, $\Sigma_{T+}(c)$, $S_{Y-}(c)$, $S^{-1}_{Y-}(c)$, $\Sigma_{Y-}(c)$, $S_{T-}(c)$, $S^{-1}_{T-}(c)$, $\Sigma_{T-}(c)$, and $F_-(c_k, c_{k'})$; $\Sigma_{YT-}(c)$ and $\phi_{kk'}$ can also be redefined with $F_-(c_k, c_{k'})$.

Consider eq. (1). Let $x_{+,i} = (X_{+,i} - x)/h_{Y+}$ and $K_i = K(x_{+,i})$ with $x = 0$. Define, for $k = 1, \cdots, q$ and $j = 0, \cdots, p$,
$$u_k = \sqrt{n_+ h_{Y+}}\left(a_k - m_{Y+}(x) - \sigma_{\epsilon_{Y+}}(x)\,c_k\right),$$
$$v_j = h^j_{Y+}\sqrt{n_+ h_{Y+}}\left(j!\,b_j - m^{(j)}_{Y+}(x)\right)/j!,$$
$$\Delta_{i,k} = \frac{u_k}{\sqrt{n_+ h_{Y+}}} + \sum_{j=1}^{p}\frac{v_j\,x^j_{+,i}}{\sqrt{n_+ h_{Y+}}},$$
$$r_{i,p} = m_{Y+}(X_{+,i}) - \sum_{j=0}^{p} m^{(j)}_{Y+}(x)(X_{+,i} - x)^j/j!,$$
$$d_{i,k} = c_k\left[\sigma_{\epsilon_{Y+}}(X_{+,i}) - \sigma_{\epsilon_{Y+}}(x)\right] + r_{i,p}.$$
Let $W^*_{Y+,n_+} = (w^*_{Y+,1}, \cdots, w^*_{Y+,q}, w^*_{Y+,q+1}, \cdots, w^*_{Y+,q+p})^T = (w^{*T}_{Y+,1n}, w^{*T}_{Y+,2n})^T$, where
$$w^*_{Y+,k} = \frac{1}{\sqrt{n_+ h_{Y+}}}\sum_{i=1}^{n_+} K(x_{+,i})\,\eta^*_{Y+,i,k},$$
$$w^*_{Y+,q+j} = \frac{1}{\sqrt{n_+ h_{Y+}}}\sum_{k=1}^{q}\sum_{i=1}^{n_+} K(x_{+,i})\,x^j_{+,i}\,\eta^*_{Y+,i,k}, \quad \forall j = 1, \cdots, p,$$
$$\eta^*_{Y+,i,k} = I\!\left(\epsilon_{Y+,i} \leq c_k - \frac{d_{i,k}}{\sigma_{\epsilon_{Y+}}(X_{+,i})}\right) - \tau_k.$$
Also let $W_{Y+,n_+} = (w_{Y+,1}, \cdots, w_{Y+,q}, w_{Y+,q+1}, \cdots, w_{Y+,q+p})^T = (w^T_{Y+,1n}, w^T_{Y+,2n})^T$, where
$$w_{Y+,k} = \frac{1}{\sqrt{n_+ h_{Y+}}}\sum_{i=1}^{n_+} K(x_{+,i})\,\eta_{Y+,i,k},$$
$$w_{Y+,q+j} = \frac{1}{\sqrt{n_+ h_{Y+}}}\sum_{k=1}^{q}\sum_{i=1}^{n_+} K(x_{+,i})\,x^j_{+,i}\,\eta_{Y+,i,k}, \quad \forall j = 1, \cdots, p,$$
$$\eta_{Y+,i,k} = I(\epsilon_{Y+,i} \leq c_k) - \tau_k.$$
Similarly, we define $W^*_{T+,n_+}$, $w^*_{T+,k}$, $w^*_{T+,q+j}$, $\eta^*_{T+,i,k}$, $W_{T+,n_+}$, $w_{T+,k}$, $w_{T+,q+j}$ and $\eta_{T+,i,k}$.

Consider the case $p = 1$ and define $\theta = (u_1, \cdots, u_q, v_1)^T$. It can be shown that minimizing eq. (13) is equivalent to minimizing
$$L_{n_+}(\theta) = \sum_{i=1}^{n_+} K(x_{+,i})\sum_{k=1}^{q}\left[\rho_{\tau_k}\!\left(\sigma_{\epsilon_{Y+}}(X_{+,i})(\epsilon_{Y+,i} - c_k) + d_{i,k} - \Delta_{i,k}\right) - \rho_{\tau_k}\!\left(\sigma_{\epsilon_{Y+}}(X_{+,i})(\epsilon_{Y+,i} - c_k) + d_{i,k}\right)\right].$$
Let $\hat{\theta}_{n_+} = (\hat{u}_1, \cdots, \hat{u}_q, \hat{v}_1)^T$ be the minimizer.
Lemma 1. Under Assumptions 1 to 6, as $n_+ \to \infty$, we have
\[
\hat\theta_{n_+} + \frac{\sigma_{\epsilon_{Y+}}(0)}{f_{X+}(0)}\, S_{Y+}^{-1}(c)\, E(W^*_{Y+,n_+} \mid X) \xrightarrow{L} MVN\left(0,\ \frac{\sigma^2_{\epsilon_{Y+}}(0)}{f_{X+}(0)}\, S_{Y+}^{-1}(c)\,\Sigma_{Y+}(c)\,S_{Y+}^{-1}(c)\right).
\]
Proof of Lemma 1.
See the proof of Theorem 2.1 in Kai et al. (2009).
Lemma 2.
Under Assumptions 1 to 6, as $n_+ \to \infty$, the asymptotic bias and variance for the LCQR estimator in eq. (1) are given by
\[
\begin{aligned}
\mathrm{Bias}(\hat m_{Y+}(0) \mid X) &= \frac{1}{2}\, a_{Y+}(c)\, m^{(2)}_{Y+}(0)\, h^2_{Y+} + o_p(h^2_{Y+}), \\
\mathrm{Var}(\hat m_{Y+}(0) \mid X) &= \frac{1}{n_+h_{Y+}}\, \frac{b_{Y+}(c)\,\sigma^2_{\epsilon_{Y+}}(0)}{f_{X+}(0)} + o_p\Big(\frac{1}{n_+h_{Y+}}\Big),
\end{aligned}
\]
where
\[
a_{Y+}(c) = \frac{\mu^2_{+,2}(c) - \mu_{+,1}(c)\mu_{+,3}(c)}{\mu_{+,0}(c)\mu_{+,2}(c) - \mu^2_{+,1}(c)}, \tag{39}
\]
\[
b_{Y+}(c) = e^T_q\big(S_{Y+}^{-1}(c)\,\Sigma_{Y+}(c)\,S_{Y+}^{-1}(c)\big)_{11}\, e_q / q^2. \tag{40}
\]
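Numerically, the two constants in eqs. (39) and (40) are easy to evaluate. A standalone sketch, assuming a Gaussian kernel, standard normal errors, $q = 5$ and $p = 1$, where the padded vector $(1, \ldots, 1, 0)^T$ plays the role of $e_q$ on the intercept block:

```r
# Evaluate a_{Y+}(c) and b_{Y+}(c) of Lemma 2 for a Gaussian kernel, N(0,1) errors.
q <- 5
tau <- (1:q) / (q + 1); ck <- qnorm(tau); fck <- dnorm(ck)
mu <- sapply(0:3, function(j) integrate(function(u) u^j * dnorm(u),   0, Inf)$value)
nu <- sapply(0:2, function(j) integrate(function(u) u^j * dnorm(u)^2, 0, Inf)$value)
taukk <- outer(tau, tau, pmin) - outer(tau, tau)

a_Y <- (mu[3]^2 - mu[2] * mu[4]) / (mu[1] * mu[3] - mu[2]^2)   # eq. (39)

S   <- rbind(cbind(diag(fck * mu[1]), fck * mu[2]),            # p = 1 blocks
             c(fck * mu[2], sum(fck * mu[3])))
Sig <- rbind(cbind(nu[1] * taukk, rowSums(taukk) * nu[2]),
             c(rowSums(taukk) * nu[2], sum(taukk) * nu[3]))
M   <- solve(S) %*% Sig %*% solve(S)                           # sandwich matrix
e_q <- c(rep(1, q), 0)                                         # picks the intercept block
b_Y <- drop(t(e_q) %*% M %*% e_q) / q^2                        # eq. (40)
c(a_Y = a_Y, b_Y = b_Y)
```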
Proof of Lemma 2. The bias result follows that in Theorem 2.1 in Kai et al. (2009). The variance result also largely follows that in Kai et al. (2009). Given
\[
\mathrm{Var}(\hat m_{Y+}(0) \mid X) = \frac{1}{n_+h_{Y+}}\, \frac{\sigma^2_{\epsilon_{Y+}}(0)}{q^2 f_{X+}(0)}\, e^T_q\big(S_{Y+}^{-1}(c)\,\Sigma_{Y+}(c)\,S_{Y+}^{-1}(c)\big)_{11}\, e_q + o_p\Big(\frac{1}{n_+h_{Y+}}\Big), \tag{41}
\]
it is easy to verify that when $q = 1$, eq. (41) can be written as
\[
\mathrm{Var}(\hat m_{Y+}(0) \mid X) = \frac{1}{n_+h_{Y+}}\, \frac{\sigma^2_{\epsilon_{Y+}}(0)}{f_{X+}(0)}\, \frac{\mu^2_{+,2}(c)\nu_{+,0}(c) - 2\mu_{+,1}(c)\mu_{+,2}(c)\nu_{+,1}(c) + \mu^2_{+,1}(c)\nu_{+,2}(c)}{\big(\mu_{+,0}(c)\mu_{+,2}(c) - \mu^2_{+,1}(c)\big)^2}\, R(q) + o_p\Big(\frac{1}{n_+h_{Y+}}\Big), \tag{42}
\]
where $R(q) = \frac{1}{q^2}\sum_{k=1}^{q}\sum_{k'=1}^{q} \frac{\tau_{kk'}}{f_{\epsilon_{Y+}}(c_k)\, f_{\epsilon_{Y+}}(c_{k'})}$. However, for $q \ge 2$, the result in eq. (42) no longer holds and we use eq. (41) instead.
Next we derive the covariance result $\mathrm{Cov}(\hat m_{Y+}(x), \hat m_{T+}(x) \mid X)$ at the boundary point 0. The result for $\mathrm{Cov}(\hat m_{Y-}(x), \hat m_{T-}(x) \mid X)$ at 0 is similar.

Lemma 3. Under Assumptions 1 to 6, as $n_+ \to \infty$, the covariance between $\hat m_{Y+}(x)$ and $\hat m_{T+}(x)$ at the boundary point 0 is given by
\[
\mathrm{Cov}(\hat m_{Y+}(0), \hat m_{T+}(0) \mid X) = \frac{1}{n_+\sqrt{h_{Y+}h_{T+}}}\, \frac{\sigma_{\epsilon_{Y+}}(0)\,\sigma_{\epsilon_{T+}}(0)}{f_{X+}(0)}\, b_{YT+} + o_p\Big(\frac{1}{n_+h_{Y+}} + \frac{1}{n_+h_{T+}}\Big), \tag{43}
\]
where
\[
b_{YT+} = \frac{1}{q^2}\, e^T_q\big(S_{Y+}^{-1}(c)\,\Sigma_{YT+}\,S_{T+}^{-1}(c)\big)_{11}\, e_q. \tag{44}
\]
Proof of Lemma 3.
Assume $p = 1$. From Lemma 1, we write
\[
\begin{aligned}
\hat m_{Y+}(0) - E(\hat m_{Y+}(0) \mid X)
&= -\frac{1}{q\sqrt{n_+h_{Y+}}}\, \frac{\sigma_{\epsilon_{Y+}}(0)}{f_{X+}(0)}\, e^T_q \Big( (S_{Y+}^{-1}(c))_{11}\ \ (S_{Y+}^{-1}(c))_{12} \Big) \begin{pmatrix} w^*_{Y+,n_1} - E(w^*_{Y+,n_1} \mid X) \\ w^*_{Y+,n_2} - E(w^*_{Y+,n_2} \mid X) \end{pmatrix} + o_p(1) \\
&= -\frac{1}{q\sqrt{n_+h_{Y+}}}\, \frac{\sigma_{\epsilon_{Y+}}(0)}{f_{X+}(0)}\, e^T_q \Big( (S_{Y+}^{-1}(c))_{11}\ \ (S_{Y+}^{-1}(c))_{12} \Big) \begin{pmatrix} w_{Y+,n_1} - E(w_{Y+,n_1} \mid X) \\ w_{Y+,n_2} - E(w_{Y+,n_2} \mid X) \end{pmatrix} + o_p(1),
\end{aligned}
\]
where the last equality follows from the result that $\mathrm{Var}(w^*_{Y+,n_1} - w_{Y+,n_1} \mid X) = o_p(1)$ and $\mathrm{Var}(w^*_{Y+,n_2} - w_{Y+,n_2} \mid X) = o_p(1)$. See Kai et al. (2010) for a proof. Similarly, we have
\[
\hat m_{T+}(0) - E(\hat m_{T+}(0) \mid X) = -\frac{1}{q\sqrt{n_+h_{T+}}}\, \frac{\sigma_{\epsilon_{T+}}(0)}{f_{X+}(0)}\, e^T_q \Big( (S_{T+}^{-1}(c))_{11}\ \ (S_{T+}^{-1}(c))_{12} \Big) \begin{pmatrix} w_{T+,n_1} - E(w_{T+,n_1} \mid X) \\ w_{T+,n_2} - E(w_{T+,n_2} \mid X) \end{pmatrix} + o_p(1).
\]
Hence the covariance is given by
\[
\begin{aligned}
&\mathrm{Cov}(\hat m_{Y+}(0), \hat m_{T+}(0) \mid X) = E\big[ (\hat m_{Y+}(0) - E(\hat m_{Y+}(0) \mid X))(\hat m_{T+}(0) - E(\hat m_{T+}(0) \mid X)) \big] \\
&= \frac{1}{q^2 n_+ \sqrt{h_{Y+}h_{T+}}}\, \frac{\sigma_{\epsilon_{Y+}}(0)\,\sigma_{\epsilon_{T+}}(0)}{f^2_{X+}(0)}\, e^T_q \Big( (S_{Y+}^{-1}(c))_{11}\ \ (S_{Y+}^{-1}(c))_{12} \Big) \\
&\qquad \times E\left[ \begin{pmatrix} w_{Y+,n_1} - E(w_{Y+,n_1} \mid X) \\ w_{Y+,n_2} - E(w_{Y+,n_2} \mid X) \end{pmatrix} \begin{pmatrix} w_{T+,n_1} - E(w_{T+,n_1} \mid X) \\ w_{T+,n_2} - E(w_{T+,n_2} \mid X) \end{pmatrix}^T \right] \begin{pmatrix} (S_{T+}^{-1}(c))_{11} \\ (S_{T+}^{-1}(c))_{21} \end{pmatrix} e_q \\
&= \frac{1}{q^2 n_+ \sqrt{h_{Y+}h_{T+}}}\, \frac{\sigma_{\epsilon_{Y+}}(0)\,\sigma_{\epsilon_{T+}}(0)}{f_{X+}(0)}\, e^T_q \big( S_{Y+}^{-1}(c)\,\Sigma_{YT+}\,S_{T+}^{-1}(c) \big)_{11}\, e_q + o_p\Big(\frac{1}{n_+h_{Y+}} + \frac{1}{n_+h_{T+}}\Big),
\end{aligned}
\]
where $\mathrm{Cov}(\eta_{Y+,i,k}, \eta_{T+,j,k'}) = \phi_{kk'}$ if $i = j$ and $\mathrm{Cov}(\eta_{Y+,i,k}, \eta_{T+,j,k'}) = 0$ if $i \ne j$.
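The covariance of the indicator scores used in the last step can be checked directly from the definitions of $\eta_{Y+,i,k}$, $\eta_{T+,i,k'}$ and $F_{+}$:
\[
\begin{aligned}
\mathrm{Cov}(\eta_{Y+,i,k}, \eta_{T+,i,k'}) &= E\big[\big(I(\epsilon_{Y+,i} \le c_k) - \tau_k\big)\big(I(\epsilon_{T+,i} \le c_{k'}) - \tau_{k'}\big)\big] \\
&= P(\epsilon_{Y+,i} \le c_k,\ \epsilon_{T+,i} \le c_{k'}) - \tau_k\tau_{k'} = F_{+}(c_k, c_{k'}) - \tau_k\tau_{k'} = \phi_{kk'},
\end{aligned}
\]
since $E[I(\epsilon_{Y+,i} \le c_k)] = \tau_k$ and $E[I(\epsilon_{T+,i} \le c_{k'})] = \tau_{k'}$; independence across observations gives the zero covariance when $i \ne j$.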
Proof of Theorem 2. The proof of Theorem 2 follows directly from Lemmas 2 and 3, eq. (12), and the approximation
\[
\begin{aligned}
\hat\tau_{\text{fuzzy}} - \tau_{\text{fuzzy}} &= \frac{1}{m_{T+}(0) - m_{T-}(0)}\big[ \hat m_{Y+}(0) - m_{Y+}(0) - (\hat m_{Y-}(0) - m_{Y-}(0)) \big] \\
&\quad - \frac{m_{Y+}(0) - m_{Y-}(0)}{\big[m_{T+}(0) - m_{T-}(0)\big]^2}\big[ \hat m_{T+}(0) - m_{T+}(0) - (\hat m_{T-}(0) - m_{T-}(0)) \big] \\
&\quad + o_p\big(h^2_{Y+} + h^2_{Y-} + h^2_{T+} + h^2_{T-}\big).
\end{aligned}
\]

For convenience we write $\hat m_{Y+}(0)$ and $\hat m_{Y-}(0)$ as $\hat m_{Y+}$ and $\hat m_{Y-}$, respectively. Equation (29) suggests that we need the expressions for $\mathrm{Var}(\mathrm{Bias}(\hat m_{Y+}))$ and $\mathrm{Cov}(\hat m_{Y+}, \mathrm{Bias}(\hat m_{Y+}))$ to compute the adjusted variance. The next lemma provides results for $\mathrm{Var}(\mathrm{Bias}(\hat m_{Y+}))$. In deriving these results, for completeness, we also present the bias of $\mathrm{Bias}(\hat m_{Y+})$. Let $e_r$ be a $p \times 1$ vector with the $r$th element equal to one. Let $p = 3$ in the following proof.
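The approximation displayed in the proof of Theorem 2 above is the first-order expansion of the ratio map. Writing $a = m_{Y+}(0) - m_{Y-}(0)$, $b = m_{T+}(0) - m_{T-}(0)$, and $(\hat a, \hat b)$ for their estimates, a one-step sketch:
\[
\hat\tau_{\text{fuzzy}} - \tau_{\text{fuzzy}} = \frac{\hat a}{\hat b} - \frac{a}{b} = \frac{1}{b}\,(\hat a - a) - \frac{a}{b^2}\,(\hat b - b) + O\big((\hat a - a)^2 + (\hat b - b)^2\big),
\]
whose two leading terms are exactly those in the displayed expansion.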
Lemma 4. Under Assumptions 1 to 6, if $h_{Y+} \to 0$ and $n_+h_{Y+} \to \infty$ as $n_+ \to \infty$, the asymptotic bias and variance of $\hat m^{(2)}_{Y+}$ are given by
\[
\mathrm{Bias}(\hat m^{(2)}_{Y+} \mid X) = \frac{1}{12}\, a^*_{Y+}(c)\, m^{(4)}_{Y+}\, h^2_{Y+} + o_p(h^2_{Y+}), \tag{45}
\]
\[
\mathrm{Var}(\hat m^{(2)}_{Y+} \mid X) = \frac{4}{n_+h^5_{Y+}}\, \frac{\sigma^2_{\epsilon_{Y+}}(0)\, b^*_{Y+}(c)}{f_{X+}(0)} + o_p\Big(\frac{1}{n_+h^5_{Y+}}\Big), \tag{46}
\]
where
\[
a^*_{Y+}(c) = \mu_{+,4}\, e^T_2 (S_{Y+}^{-1}(c))_{21}\, f_{\epsilon_{Y+}} + \sum_{k=1}^{q} f_{\epsilon_{Y+}}(c_k)\, e^T_2 (S_{Y+}^{-1}(c))_{22}\, (\mu_{+,5}, \mu_{+,6}, \mu_{+,7})^T, \tag{47}
\]
\[
b^*_{Y+}(c) = e^T_2 \big(S_{Y+}^{-1}(c)\,\Sigma_{Y+}(c)\,S_{Y+}^{-1}\big)_{22}\, e_2. \tag{48}
\]
Proof of Lemma 4. From the definition of $v_j$, we have
\[
\hat m^{(2)}_{Y+} = m^{(2)}_{Y+} + \frac{2\hat v_2}{h^2_{Y+}\sqrt{n_+h_{Y+}}}. \tag{49}
\]
Hence the bias becomes
\[
\begin{aligned}
E(\hat m^{(2)}_{Y+}) - m^{(2)}_{Y+} &= -\frac{2\sigma_{\epsilon_{Y+}}(0)}{h^2_{Y+}\sqrt{n_+h_{Y+}}\, f_{X+}(0)}\, e^T_2 \big( (S_{Y+}^{-1}(c))_{21},\ (S_{Y+}^{-1}(c))_{22} \big)\, E(W^*_{Y+,n}) \\
&= -\frac{2\sigma_{\epsilon_{Y+}}(0)}{h^2_{Y+}\sqrt{n_+h_{Y+}}\, f_{X+}(0)}\, e^T_2 (S_{Y+}^{-1}(c))_{21}\, E(W^*_{Y+,n_1}) - \frac{2\sigma_{\epsilon_{Y+}}(0)}{h^2_{Y+}\sqrt{n_+h_{Y+}}\, f_{X+}(0)}\, e^T_2 (S_{Y+}^{-1}(c))_{22}\, E(W^*_{Y+,n_2}) \\
&= \mathrm{I} + \mathrm{II}.
\end{aligned}
\]
Consider the terms I and II:
\[
\begin{aligned}
\mathrm{I} &= -\frac{2\sigma_{\epsilon_{Y+}}(0)}{h^2_{Y+}\sqrt{n_+h_{Y+}}\, f_{X+}(0)}\, e^T_2 (S_{Y+}^{-1}(c))_{21} \left[ -\frac{f_{\epsilon_{Y+}}}{\sqrt{n_+h_{Y+}}}\sum_{i=1}^{n_+} K_i\, \frac{c_k\big(\sigma_{\epsilon_{Y+},i} - \sigma_{\epsilon_{Y+}}(0)\big)}{\sigma_{\epsilon_{Y+},i}} - \frac{f_{\epsilon_{Y+}}}{\sqrt{n_+h_{Y+}}}\sum_{i=1}^{n_+} K_i\, \frac{r_{i,3}}{\sigma_{\epsilon_{Y+},i}} \right] \\
&= \frac{1}{12}\, m^{(4)}_{Y+}\, \mu_{+,4}(c)\, e^T_2 (S_{Y+}^{-1}(c))_{21}\, f_{\epsilon_{Y+}}\, h^2_{Y+} + o_p(h^2_{Y+}),
\end{aligned}
\]
\[
\begin{aligned}
\mathrm{II} &= -\frac{2\sigma_{\epsilon_{Y+}}(0)}{h^2_{Y+}\sqrt{n_+h_{Y+}}\, f_{X+}(0)}\, e^T_2 (S_{Y+}^{-1}(c))_{22} \left[ -\frac{\sum_{k=1}^{q} f_{\epsilon_{Y+}}(c_k)}{\sqrt{n_+h_{Y+}}}\sum_{i=1}^{n_+} K_i\, \frac{c_k\big(\sigma_{\epsilon_{Y+},i} - \sigma_{\epsilon_{Y+}}(0)\big)}{\sigma_{\epsilon_{Y+},i}} \begin{pmatrix} x_{+,i} \\ x^2_{+,i} \\ x^3_{+,i} \end{pmatrix} - \frac{\sum_{k=1}^{q} f_{\epsilon_{Y+}}(c_k)}{\sqrt{n_+h_{Y+}}}\sum_{i=1}^{n_+} K_i\, \frac{r_{i,3}}{\sigma_{\epsilon_{Y+},i}} \begin{pmatrix} x_{+,i} \\ x^2_{+,i} \\ x^3_{+,i} \end{pmatrix} \right] \\
&= \frac{1}{12}\, m^{(4)}_{Y+} \sum_{k=1}^{q} f_{\epsilon_{Y+}}(c_k)\, e^T_2 (S_{Y+}^{-1}(c))_{22} \begin{pmatrix} \mu_{+,5} \\ \mu_{+,6} \\ \mu_{+,7} \end{pmatrix} h^2_{Y+} + o_p(h^2_{Y+}).
\end{aligned}
\]
The bias result is proved by combining the two terms I and II. One would expect a number of $4! = 24$ instead of 12 in the denominator. This is due to the extra number 2 in eq. (49). Because of the way $\hat v_2$ is defined, the "effective" constant in the denominator is still 24, in line with the standard results for nonparametric derivative estimation. Similarly, the number 4 appearing in the numerator of the variance is also a result of the number 2 in eq. (49). The variance result follows from eq. (49) and Lemma 1.
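To make the constants in eq. (46) explicit, note from eq. (49) that the randomness of $\hat m^{(2)}_{Y+}$ comes entirely from $\hat v_2$, so
\[
\mathrm{Var}(\hat m^{(2)}_{Y+} \mid X) = \frac{4}{h^4_{Y+}\,(n_+h_{Y+})}\,\mathrm{Var}(\hat v_2 \mid X) = \frac{4}{n_+h^5_{Y+}}\,\frac{\sigma^2_{\epsilon_{Y+}}(0)\, b^*_{Y+}(c)}{f_{X+}(0)} + o_p\Big(\frac{1}{n_+h^5_{Y+}}\Big),
\]
where $\mathrm{Var}(\hat v_2 \mid X) = \sigma^2_{\epsilon_{Y+}}(0)\, b^*_{Y+}(c)/f_{X+}(0) + o_p(1)$ follows from Lemma 1 and the definition of $b^*_{Y+}(c)$ in eq. (48).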
Proof of Theorem 3. Following Theorem 1, we have
\[
\mathrm{Var}(\hat m_{Y+}) = \frac{1}{n_+h_{Y+}}\, \frac{b_{Y+}(c)\,\sigma^2_{\epsilon_{Y+}}(0)}{f_{X+}(0)} + o_p\Big(\frac{1}{n_+h_{Y+}}\Big), \qquad
\mathrm{Var}(\hat m_{Y-}) = \frac{1}{n_-h_{Y-}}\, \frac{b_{Y-}(c)\,\sigma^2_{\epsilon_{Y-}}(0)}{f_{X-}(0)} + o_p\Big(\frac{1}{n_-h_{Y-}}\Big).
\]
Using the bias expression in Theorem 1 and the variance result in Lemma 4, we have
\[
\mathrm{Var}(\mathrm{Bias}(\hat m_{Y+})) = \frac{\sigma^2_{\epsilon_{Y+}}(0)}{n_+h_{Y+}\, f_{X+}(0)}\, a^2_{Y+}(c)\, b^*_{Y+}(c) + o_p\Big(\frac{1}{n_+h_{Y+}}\Big), \qquad
\mathrm{Var}(\mathrm{Bias}(\hat m_{Y-})) = \frac{\sigma^2_{\epsilon_{Y-}}(0)}{n_-h_{Y-}\, f_{X-}(0)}\, a^2_{Y-}(c)\, b^*_{Y-}(c) + o_p\Big(\frac{1}{n_-h_{Y-}}\Big).
\]
For the covariances, we have
\[
\begin{aligned}
\mathrm{Cov}(\hat m_{Y+}, \mathrm{Bias}(\hat m_{Y+})) &= \mathrm{Cov}\Big( m_{Y+} + \frac{1}{q\sqrt{n_+h_{Y+}}}\sum_{k=1}^{q} \hat u_k,\ \frac{1}{2}\, a_{Y+}(c)\, h^2_{Y+}\Big( m^{(2)}_{Y+} + \frac{2\hat v_2}{h^2_{Y+}\sqrt{n_+h_{Y+}}} \Big) \Big) \\
&= \frac{a_{Y+}(c)}{n_+h_{Y+}\, q}\sum_{k=1}^{q} \mathrm{Cov}(\hat u_k, \hat v_2) \\
&= \frac{a_{Y+}(c)\,\sigma^2_{\epsilon_{Y+}}(0)}{n_+h_{Y+}\, q\, f_{X+}(0)}\, e^T_q \big( S_{Y+}^{-1}\Sigma_{Y+}S_{Y+}^{-1} \big)_{12,2} + o_p\Big(\frac{1}{n_+h_{Y+}}\Big),
\end{aligned}
\]
where $(S_{Y+}^{-1}\Sigma_{Y+}S_{Y+}^{-1})_{12,2}$ is the second column of the matrix $(S_{Y+}^{-1}\Sigma_{Y+}S_{Y+}^{-1})_{12}$ and the last line follows from Lemma 1. Similarly, for data below the cutoff, we have
\[
\mathrm{Cov}(\hat m_{Y-}, \mathrm{Bias}(\hat m_{Y-})) = \frac{a_{Y-}(c)\,\sigma^2_{\epsilon_{Y-}}(0)}{n_-h_{Y-}\, q\, f_{X-}(0)}\, e^T_q \big( S_{Y-}^{-1}\Sigma_{Y-}S_{Y-}^{-1} \big)_{12,2} + o_p\Big(\frac{1}{n_-h_{Y-}}\Big).
\]
To establish the asymptotic normality of the adjusted $t$-statistic, we note that
\[
\hat\tau^{bc}_{\text{sharp}} = \hat m_{Y+} - \widehat{\mathrm{Bias}}(\hat m_{Y+}) - \big( \hat m_{Y-} - \widehat{\mathrm{Bias}}(\hat m_{Y-}) \big),
\]
where, by applying Lemma 1 to both $\hat m_{Y+} - \widehat{\mathrm{Bias}}(\hat m_{Y+})$ and $\hat m_{Y-} - \widehat{\mathrm{Bias}}(\hat m_{Y-})$, $\hat\tau^{bc}_{\text{sharp}}$ becomes the difference between two normal random variables, which is also a normal random variable. Finally, the expression for $\mathrm{Var}(\hat\tau_{\text{sharp}} - \mathrm{Bias}(\hat\tau_{\text{sharp}}))$ is obtained by substituting the six variance and covariance results into eq. (29),
\[
\mathrm{Var}(\hat\tau_{\text{sharp}} - \mathrm{Bias}(\hat\tau_{\text{sharp}})) = \frac{1}{n_+h_{Y+}}\, V_{\text{sharp},+} + \frac{1}{n_-h_{Y-}}\, V_{\text{sharp},-},
\]
where
\[
\begin{aligned}
V_{\text{sharp},+} &= \frac{b_{Y+}(c)\,\sigma^2_{\epsilon_{Y+}}(0)}{f_{X+}(0)} + \frac{\sigma^2_{\epsilon_{Y+}}(0)}{f_{X+}(0)}\, a^2_{Y+}(c)\, b^*_{Y+}(c) - \frac{2 a_{Y+}(c)\,\sigma^2_{\epsilon_{Y+}}(0)}{q\, f_{X+}(0)}\, e^T_q \big( S_{Y+}^{-1}\Sigma_{Y+}S_{Y+}^{-1} \big)_{12,2}, \\
V_{\text{sharp},-} &= \frac{b_{Y-}(c)\,\sigma^2_{\epsilon_{Y-}}(0)}{f_{X-}(0)} + \frac{\sigma^2_{\epsilon_{Y-}}(0)}{f_{X-}(0)}\, a^2_{Y-}(c)\, b^*_{Y-}(c) - \frac{2 a_{Y-}(c)\,\sigma^2_{\epsilon_{Y-}}(0)}{q\, f_{X-}(0)}\, e^T_q \big( S_{Y-}^{-1}\Sigma_{Y-}S_{Y-}^{-1} \big)_{12,2}.
\end{aligned}
\]
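The $h^4_{Y+}$ cancellation behind $\mathrm{Var}(\mathrm{Bias}(\hat m_{Y+}))$ above can be verified in one line by combining the bias expression in Theorem 1 with Lemma 4:
\[
\mathrm{Var}(\mathrm{Bias}(\hat m_{Y+})) = \tfrac{1}{4}\, a^2_{Y+}(c)\, h^4_{Y+}\, \mathrm{Var}(\hat m^{(2)}_{Y+}) = \tfrac{1}{4}\, a^2_{Y+}(c)\, h^4_{Y+}\cdot\frac{4}{n_+h^5_{Y+}}\,\frac{\sigma^2_{\epsilon_{Y+}}(0)\, b^*_{Y+}(c)}{f_{X+}(0)} = \frac{\sigma^2_{\epsilon_{Y+}}(0)}{n_+h_{Y+}\, f_{X+}(0)}\, a^2_{Y+}(c)\, b^*_{Y+}(c),
\]
up to $o_p(1/(n_+h_{Y+}))$ terms, so the bandwidth powers cancel exactly as claimed.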
Proof of Theorem 6. We first note that all bias terms in eq. (31) can be obtained using Lemma 2. For terms in the adjusted variance in eq. (32), $\mathrm{Var}(\hat m_{Y+})$, $\mathrm{Var}(\hat m_{T+})$, $\mathrm{Var}(\widehat{\mathrm{Bias}}(\hat m_{Y+}))$, $\mathrm{Var}(\widehat{\mathrm{Bias}}(\hat m_{T+}))$, $\mathrm{Cov}(\hat m_{Y+}, \widehat{\mathrm{Bias}}(\hat m_{Y+}))$, and $\mathrm{Cov}(\hat m_{T+}, \widehat{\mathrm{Bias}}(\hat m_{T+}))$ can be obtained using results in the proof of Theorem 3; $\mathrm{Cov}(\hat m_{Y+}, \hat m_{T+})$ is obtained using Lemma 3. We list these seven terms in the following:
\[
\begin{aligned}
\mathrm{Var}(\hat m_{Y+}) &= \frac{1}{n_+h_{Y+}}\, \frac{b_{Y+}(c)\,\sigma^2_{\epsilon_{Y+}}(0)}{f_{X+}(0)} + o_p\Big(\frac{1}{n_+h_{Y+}}\Big), \\
\mathrm{Var}(\hat m_{T+}) &= \frac{1}{n_+h_{T+}}\, \frac{b_{T+}(c)\,\sigma^2_{\epsilon_{T+}}(0)}{f_{X+}(0)} + o_p\Big(\frac{1}{n_+h_{T+}}\Big), \\
\mathrm{Var}(\mathrm{Bias}(\hat m_{Y+})) &= \frac{\sigma^2_{\epsilon_{Y+}}(0)}{n_+h_{Y+}\, f_{X+}(0)}\, a^2_{Y+}(c)\, b^*_{Y+}(c) + o_p\Big(\frac{1}{n_+h_{Y+}}\Big), \\
\mathrm{Var}(\mathrm{Bias}(\hat m_{T+})) &= \frac{\sigma^2_{\epsilon_{T+}}(0)}{n_+h_{T+}\, f_{X+}(0)}\, a^2_{T+}(c)\, b^*_{T+}(c) + o_p\Big(\frac{1}{n_+h_{T+}}\Big), \\
\mathrm{Cov}(\hat m_{Y+}, \hat m_{T+}) &= \frac{1}{n_+\sqrt{h_{Y+}h_{T+}}}\, \frac{\sigma_{\epsilon_{Y+}}(0)\,\sigma_{\epsilon_{T+}}(0)}{f_{X+}(0)}\, b_{YT+} + o_p\Big(\frac{1}{n_+h_{Y+}} + \frac{1}{n_+h_{T+}}\Big), \\
\mathrm{Cov}(\hat m_{Y+}, \widehat{\mathrm{Bias}}(\hat m_{Y+})) &= \frac{a_{Y+}(c)\,\sigma^2_{\epsilon_{Y+}}(0)}{n_+h_{Y+}\, q\, f_{X+}(0)}\, e^T_q \big( S_{Y+}^{-1}\Sigma_{Y+}S_{Y+}^{-1} \big)_{12,2} + o_p\Big(\frac{1}{n_+h_{Y+}}\Big), \\
\mathrm{Cov}(\hat m_{T+}, \widehat{\mathrm{Bias}}(\hat m_{T+})) &= \frac{a_{T+}(c)\,\sigma^2_{\epsilon_{T+}}(0)}{n_+h_{T+}\, q\, f_{X+}(0)}\, e^T_q \big( S_{T+}^{-1}\Sigma_{T+}S_{T+}^{-1} \big)_{12,2} + o_p\Big(\frac{1}{n_+h_{T+}}\Big).
\end{aligned}
\]
The three remaining covariance terms are
\[
\begin{aligned}
\mathrm{Cov}(\widehat{\mathrm{Bias}}(\hat m_{Y+}), \widehat{\mathrm{Bias}}(\hat m_{T+})) &= \mathrm{Cov}\Big( \frac{1}{2}\, a_{Y+}(c)\, \hat m^{(2)}_{Y+} h^2_{Y+},\ \frac{1}{2}\, a_{T+}(c)\, \hat m^{(2)}_{T+} h^2_{T+} \Big) \\
&= \frac{a_{Y+}(c)\, a_{T+}(c)}{\sqrt{n_+h_{Y+}}\sqrt{n_+h_{T+}}}\, \mathrm{Cov}(\hat v_{2,Y+}, \hat v_{2,T+}) \\
&= \frac{a_{Y+}(c)\, a_{T+}(c)\,\sigma_{\epsilon_{Y+}}(0)\,\sigma_{\epsilon_{T+}}(0)}{n_+\sqrt{h_{Y+}}\sqrt{h_{T+}}\, f_{X+}(0)}\, e^T_2 \big( S_{Y+}^{-1}\Sigma_{YT+}S_{T+}^{-1} \big)_{22}\, e_2 + o_p\Big(\frac{1}{n_+h_{Y+}} + \frac{1}{n_+h_{T+}}\Big),
\end{aligned}
\]
\[
\begin{aligned}
\mathrm{Cov}(\hat m_{Y+}, \widehat{\mathrm{Bias}}(\hat m_{T+})) &= \mathrm{Cov}\Big( m_{Y+} + \frac{1}{q\sqrt{n_+h_{Y+}}}\sum_{k=1}^{q} \hat u_{k,Y},\ \frac{1}{2}\, a_{T+}(c)\, h^2_{T+}\Big( m^{(2)}_{T+} + \frac{2\hat v_{2,T}}{h^2_{T+}\sqrt{n_+h_{T+}}} \Big) \Big) \\
&= \frac{a_{T+}(c)\,\sigma_{\epsilon_{Y+}}(0)\,\sigma_{\epsilon_{T+}}(0)}{q\, n_+\sqrt{h_{Y+}}\sqrt{h_{T+}}\, f_{X+}(0)}\, e^T_q \big( S_{Y+}^{-1}\Sigma_{YT+}S_{T+}^{-1} \big)_{12,2} + o_p\Big(\frac{1}{n_+h_{Y+}} + \frac{1}{n_+h_{T+}}\Big),
\end{aligned}
\]
\[
\begin{aligned}
\mathrm{Cov}(\hat m_{T+}, \widehat{\mathrm{Bias}}(\hat m_{Y+})) &= \mathrm{Cov}\Big( m_{T+} + \frac{1}{q\sqrt{n_+h_{T+}}}\sum_{k=1}^{q} \hat u_{k,T},\ \frac{1}{2}\, a_{Y+}(c)\, h^2_{Y+}\Big( m^{(2)}_{Y+} + \frac{2\hat v_{2,Y}}{h^2_{Y+}\sqrt{n_+h_{Y+}}} \Big) \Big) \\
&= \frac{a_{Y+}(c)\,\sigma_{\epsilon_{Y+}}(0)\,\sigma_{\epsilon_{T+}}(0)}{q\, n_+\sqrt{h_{Y+}}\sqrt{h_{T+}}\, f_{X+}(0)}\, e^T_q \big( S_{T+}^{-1}\Sigma_{TY+}S_{Y+}^{-1} \big)_{12,2} + o_p\Big(\frac{1}{n_+h_{Y+}} + \frac{1}{n_+h_{T+}}\Big).
\end{aligned}
\]
Substituting the above results into eq. (32) gives the expression for $\mathrm{Var}((\hat m_{Y+} - \tau \hat m_{T+}) - (\widehat{\mathrm{Bias}}(\hat m_{Y+}) - \tau \widehat{\mathrm{Bias}}(\hat m_{T+})))$. The result for $\mathrm{Var}((\hat m_{Y-} - \tau \hat m_{T-}) - (\widehat{\mathrm{Bias}}(\hat m_{Y-}) - \tau \widehat{\mathrm{Bias}}(\hat m_{T-})))$ can be obtained in a similar way. Adding up the two variance results gives the adjusted variance in the fuzzy case.

To establish the asymptotic normality, note that we can use eq. (31) to write $\tilde\tau^{bc}_{\text{fuzzy}}$ as
\[
\tilde\tau^{bc}_{\text{fuzzy}} = \big(\hat m_{Y+} - \widehat{\mathrm{Bias}}(\hat m_{Y+})\big) - \tau\big(\hat m_{T+} - \widehat{\mathrm{Bias}}(\hat m_{T+})\big) - \big(\hat m_{Y-} - \widehat{\mathrm{Bias}}(\hat m_{Y-})\big) + \tau\big(\hat m_{T-} - \widehat{\mathrm{Bias}}(\hat m_{T-})\big),
\]
where each of the four terms follows an asymptotic normal distribution by applying Lemma 1. Hence, along with the adjusted variance derived earlier, we prove the asymptotic distribution of $t_{\text{adj.fuzzy}}$.

Proof of Theorem 7. We first expand $\mathrm{Bias}(\hat m_{Y+})$ up to $O(h^3_{Y+})$ on the boundary.
Recall $\hat m_{Y+} = \sum_{k=1}^{q} \hat a_k / q$ and we have
\[
\begin{aligned}
\mathrm{Bias}(\hat m_{Y+}) &= \frac{\sigma_{\epsilon_{Y+}}(0)}{q}\sum_{k=1}^{q} c_k - \frac{\sigma_{\epsilon_{Y+}}(0)}{q\sqrt{n_+h_{Y+}}\, f_{X+}(0)}\, e^T_q \Big[ (S_{Y+}^{-1}(c))_{11}\, E(w^*_{Y+,n_1}) + (S_{Y+}^{-1}(c))_{12}\, E(w^*_{Y+,n_2}) \Big] \\
&= -\frac{\sigma_{\epsilon_{Y+}}(0)}{q\sqrt{n_+h_{Y+}}\, f_{X+}(0)}\, e^T_q (S_{Y+}^{-1}(c))_{11}\, E(w^*_{Y+,n_1}) - \frac{\sigma_{\epsilon_{Y+}}(0)}{q\sqrt{n_+h_{Y+}}\, f_{X+}(0)}\, e^T_q (S_{Y+}^{-1}(c))_{12}\, E(w^*_{Y+,n_2}) \\
&= \mathrm{I} + \mathrm{II},
\end{aligned}
\]
where the first term drops out because $\sum_{k=1}^{q} c_k = 0$ given the symmetry of the error density and the equally spaced quantile levels. Consider term I:
\[
\begin{aligned}
\mathrm{I} &= -\frac{\sigma_{\epsilon_{Y+}}(0)}{q\sqrt{n_+h_{Y+}}\, f_{X+}(0)}\, e^T_q (S_{Y+}^{-1}(c))_{11} \begin{pmatrix} \frac{1}{\sqrt{n_+h_{Y+}}}\sum_{i=1}^{n_+} K_i\, E(\eta^*_{Y+,i,1}) \\ \vdots \\ \frac{1}{\sqrt{n_+h_{Y+}}}\sum_{i=1}^{n_+} K_i\, E(\eta^*_{Y+,i,q}) \end{pmatrix} \\
&= \frac{\sigma_{\epsilon_{Y+}}(0)}{q\sqrt{n_+h_{Y+}}\, f_{X+}(0)}\, e^T_q (S_{Y+}^{-1}(c))_{11} \begin{pmatrix} \frac{f_{\epsilon_{Y+}}(c_1)}{\sqrt{n_+h_{Y+}}}\sum_{i=1}^{n_+} K_i\, \frac{d_{i,1}}{\sigma_{\epsilon_{Y+},i}} \\ \vdots \\ \frac{f_{\epsilon_{Y+}}(c_q)}{\sqrt{n_+h_{Y+}}}\sum_{i=1}^{n_+} K_i\, \frac{d_{i,q}}{\sigma_{\epsilon_{Y+},i}} \end{pmatrix} + o_p(1) \\
&= \frac{\sigma_{\epsilon_{Y+}}(0)}{q\sqrt{n_+h_{Y+}}\, f_{X+}(0)}\, e^T_q (S_{Y+}^{-1}(c))_{11} \begin{pmatrix} \frac{f_{\epsilon_{Y+}}(c_1)}{\sqrt{n_+h_{Y+}}}\sum_{i=1}^{n_+} K_i\, \frac{r_{i,1}}{\sigma_{\epsilon_{Y+},i}} \\ \vdots \\ \frac{f_{\epsilon_{Y+}}(c_q)}{\sqrt{n_+h_{Y+}}}\sum_{i=1}^{n_+} K_i\, \frac{r_{i,1}}{\sigma_{\epsilon_{Y+},i}} \end{pmatrix} + o_p(1) \\
&= \frac{1}{2q}\, e^T_q (S_{Y+}^{-1}(c))_{11}\, f_{\epsilon_{Y+}}\, m^{(2)}_{Y+}\, \mu_{+,2}\, h^2_{Y+} + \frac{1}{6q}\, e^T_q (S_{Y+}^{-1}(c))_{11}\, f_{\epsilon_{Y+}}\, m^{(3)}_{Y+}\, \mu_{+,3}\, h^3_{Y+} \\
&\qquad + \frac{f^{(1)}_{X+}(0)}{2q\, f_{X+}(0)}\, e^T_q (S_{Y+}^{-1}(c))_{11}\, f_{\epsilon_{Y+}}\, m^{(2)}_{Y+}\, \mu_{+,3}\, h^3_{Y+} + o_p(h^3_{Y+}),
\end{aligned}
\]
where the second equality follows by expanding the cumulative distribution function of $\epsilon_{Y+,i}$ around $c_k$, and the third equality follows by noticing that all terms containing $c_k$, after being multiplied by the coefficient $\frac{\sigma_{\epsilon_{Y+}}(0)}{q\sqrt{n_+h_{Y+}}\, f_{X+}(0)}\, e^T_q (S_{Y+}^{-1}(c))_{11}$, become zero after summation. The last equality is obtained by a Taylor series expansion of $m_{Y+}$ at 0 up to order 3 in $r_{i,1}$, similar to the expansion in the definition of $r_{i,p}$.

Consider term II. Note that $p = 1$ in the following proof since we estimate the conditional mean using a degree one local polynomial:
\[
\begin{aligned}
\mathrm{II} &= -\frac{\sigma_{\epsilon_{Y+}}(0)}{q\sqrt{n_+h_{Y+}}\, f_{X+}(0)}\, e^T_q (S_{Y+}^{-1}(c))_{12} \begin{pmatrix} \frac{1}{\sqrt{n_+h_{Y+}}}\sum_{k=1}^{q}\sum_{i=1}^{n_+} K_i\, x_{+,i}\, E(\eta^*_{Y+,i,k}) \\ \vdots \\ \frac{1}{\sqrt{n_+h_{Y+}}}\sum_{k=1}^{q}\sum_{i=1}^{n_+} K_i\, x^p_{+,i}\, E(\eta^*_{Y+,i,k}) \end{pmatrix} \\
&= \frac{\sigma_{\epsilon_{Y+}}(0)}{q\sqrt{n_+h_{Y+}}\, f_{X+}(0)}\, e^T_q (S_{Y+}^{-1}(c))_{12} \begin{pmatrix} \frac{1}{\sqrt{n_+h_{Y+}}}\sum_{k=1}^{q} f_{\epsilon_{Y+}}(c_k)\sum_{i=1}^{n_+} K_i\, x_{+,i}\, \frac{d_{i,k}}{\sigma_{\epsilon_{Y+},i}} \\ \vdots \\ \frac{1}{\sqrt{n_+h_{Y+}}}\sum_{k=1}^{q} f_{\epsilon_{Y+}}(c_k)\sum_{i=1}^{n_+} K_i\, x^p_{+,i}\, \frac{d_{i,k}}{\sigma_{\epsilon_{Y+},i}} \end{pmatrix} + o_p(1) \\
&= \frac{\sum_{k=1}^{q} f_{\epsilon_{Y+}}(c_k)}{2q}\, e^T_q (S_{Y+}^{-1}(c))_{12}\, (\mu_{+,3}, \cdots, \mu_{+,p+2})^T\, m^{(2)}_{Y+}\, h^2_{Y+} + \frac{\sum_{k=1}^{q} f_{\epsilon_{Y+}}(c_k)}{6q}\, e^T_q (S_{Y+}^{-1}(c))_{12}\, (\mu_{+,4}, \cdots, \mu_{+,p+3})^T\, m^{(3)}_{Y+}\, h^3_{Y+} \\
&\qquad + \frac{f^{(1)}_{X+}(0)\sum_{k=1}^{q} f_{\epsilon_{Y+}}(c_k)}{2q\, f_{X+}(0)}\, e^T_q (S_{Y+}^{-1}(c))_{12}\, (\mu_{+,4}, \cdots, \mu_{+,p+3})^T\, m^{(2)}_{Y+}\, h^3_{Y+} + o_p(h^3_{Y+}).
\end{aligned}
\]
Combining I and II yields
\[
\mathrm{Bias}(\hat m_{Y+}) = \frac{1}{2}\, a_{Y+}(c)\, m^{(2)}_{Y+}\, h^2_{Y+} + \frac{1}{6}\, \check a_{Y+}(c)\, m^{(3)}_{Y+}\, h^3_{Y+} + \frac{1}{2}\, \tilde a_{Y+}(c)\, \frac{f^{(1)}_{X+}(0)}{f_{X+}(0)}\, m^{(2)}_{Y+}\, h^3_{Y+} + o_p(h^3_{Y+}),
\]
where $a_{Y+}(c)$ is defined in Lemma 2,
\[
\check a_{Y+}(c) = \frac{\mu_{+,2}(c)\mu_{+,3}(c) - \mu_{+,1}(c)\mu_{+,4}(c)}{\mu_{+,0}(c)\mu_{+,2}(c) - \mu^2_{+,1}(c)}, \qquad
\tilde a_{Y+}(c) = \frac{\mu^2_{+,2}(c) - \mu_{+,1}(c)\mu_{+,3}(c)}{\mu_{+,0}(c)\mu_{+,2}(c) - \mu^2_{+,1}(c)}.
\]
Hence the leading term in $\mathrm{Bias}(\hat m_{Y+} - \mathrm{Bias}(\hat m_{Y+}))$ is
\[
\frac{1}{6}\, \check a_{Y+}(c)\, m^{(3)}_{Y+}\, h^3_{Y+} + \frac{1}{2}\, \tilde a_{Y+}(c)\, \frac{f^{(1)}_{X+}(0)}{f_{X+}(0)}\, m^{(2)}_{Y+}\, h^3_{Y+}.
\]
Since we work with data above the cutoff in this proof, the adjusted variance is given by $\frac{1}{n_+h_{Y+}} V_{\text{adj.sharp}}$ as in the proof of Theorem 3. Thus, the adjusted MSE can be written as
\[
\begin{aligned}
\text{adj. MSE} &= \Big[ \frac{1}{6}\, \check a_{Y+}(c)\, m^{(3)}_{Y+}\, h^3_{Y+} + \frac{1}{2}\, \tilde a_{Y+}(c)\, \frac{f^{(1)}_{X+}(0)}{f_{X+}(0)}\, m^{(2)}_{Y+}\, h^3_{Y+} \Big]^2 + \frac{1}{n_+h_{Y+}}\, V_{\text{adj.sharp}} + o_p\Big(h^6_{Y+} + \frac{1}{n_+h_{Y+}}\Big) \\
&= C_1\, h^6_{Y+} + \frac{1}{n_+h_{Y+}}\, C_2 + o_p\Big(h^6_{Y+} + \frac{1}{n_+h_{Y+}}\Big),
\end{aligned}
\]
where
\[
C_1 = \Big[ \frac{1}{6}\, \check a_{Y+}(c)\, m^{(3)}_{Y+} + \frac{1}{2}\, \tilde a_{Y+}(c)\, \frac{f^{(1)}_{X+}(0)}{f_{X+}(0)}\, m^{(2)}_{Y+} \Big]^2
\]
and $C_2 = V_{\text{adj.sharp}}$. The bandwidth that minimizes the adjusted MSE is given by
\[
h = \Big( \frac{C_2}{6 C_1} \Big)^{1/7} n_+^{-1/7}.
\]
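For completeness, the first-order condition behind this bandwidth choice:
\[
\frac{d}{dh}\Big[ C_1 h^6 + \frac{C_2}{n_+ h} \Big] = 6 C_1 h^5 - \frac{C_2}{n_+ h^2} = 0 \quad\Longrightarrow\quad h^7 = \frac{C_2}{6 C_1 n_+} \quad\Longrightarrow\quad h = \Big( \frac{C_2}{6 C_1} \Big)^{1/7} n_+^{-1/7}.
\]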
Appendix B

This section contains additional Figures 4 to 6 that compare the finite sample performance of LCQR and LLR in estimating the treatment effect. The following four tables of coverage probability are identical to Tables 7 to 10 except that $\hat\tau^{cqr}_{2bw}$ and $\hat\tau^{cqr,bc}_{2bw}$ use the rule-of-thumb bandwidth described by equation (4.3) in Fan and Gijbels (1996).
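To make the simulation setting concrete, here is a sketch of the Lee-model design as it is commonly specified in this simulation literature (e.g., Calonico et al. (2014)); the regression coefficients and error scale below are our assumption about the baseline homoskedastic design, not values taken from this paper:

```r
# Simulate one sample from the Lee-model DGP widely used in RD simulations.
gen_lee <- function(n = 500, sd_eps = 0.1295) {
  x <- 2 * rbeta(n, 2, 4) - 1                 # running variable on [-1, 1]
  m <- ifelse(x < 0,                          # piecewise quintic conditional mean
    0.48 + 1.27*x + 7.18*x^2 + 20.21*x^3 + 21.54*x^4 + 7.33*x^5,
    0.52 + 0.84*x - 3.00*x^2 + 7.99*x^3 - 9.01*x^4 + 3.56*x^5)
  data.frame(x = x, y = m + rnorm(n, sd = sd_eps))  # jump at the cutoff is 0.04
}
set.seed(1)
dat <- gen_lee()    # one of the 5000 replications behind the figures and tables
```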
Figure 4: Absolute value of average bias of the bias-corrected estimators, $\hat\tau^{cqr,bc}_{1bw}$ and $\hat\tau^{robust,bc}_{1bw}$, for the Lee model with heteroskedasticity. $\hat\tau^{robust,bc}_{1bw}$ is the bias-corrected LLR. The result is based on 5000 replications and the true treatment effect is 0.04.
Figure 5: Absolute value of average bias of the bias-corrected estimators, $\hat\tau^{cqr,bc}_{1bw}$ and $\hat\tau^{robust,bc}_{1bw}$, for the LM model with homoskedasticity. $\hat\tau^{robust,bc}_{1bw}$ is the bias-corrected LLR. The result is based on 5000 replications and the true treatment effect is $-3.45$.
Figure 6: Absolute value of average bias of the bias-corrected estimators, $\hat\tau^{cqr,bc}_{1bw}$ and $\hat\tau^{robust,bc}_{1bw}$, for the LM model with heteroskedasticity. $\hat\tau^{robust,bc}_{1bw}$ is the bias-corrected LLR. The result is based on 5000 replications and the true treatment effect is $-3.45$.

Table 11: Coverage probability in the Lee model with homoskedasticity using the rule-of-thumb bandwidth in LCQR estimation. Rows: $\hat\tau^{cqr}_{2bw}$ and $\hat\tau^{cqr,bc}_{2bw}$; columns: DGP 1 to DGP 5.
Notes: The numbers are the averages of the coverage probability associated with different estimators. The results are based on 5000 replications with a sample size $n = 500$. The s.e. and adjusted s.e. for the LCQR estimator are obtained based on the asymptotic expressions from Theorem 1 and Theorem 3. Estimators with a superscript bc are both bias-corrected and s.e.-adjusted.

Table 12: Coverage probability in the Lee model with heteroskedasticity using the rule-of-thumb bandwidth in LCQR estimation. Rows: $\hat\tau^{cqr}_{2bw}$ and $\hat\tau^{cqr,bc}_{2bw}$; columns: DGP 1 to DGP 5.
Notes: The numbers are the averages of the coverage probability associated with different estimators. The results are based on 5000 replications with a sample size $n = 500$. The s.e. and adjusted s.e. for the LCQR estimator are obtained based on the asymptotic expressions from Theorem 1 and Theorem 3. Estimators with a superscript bc are both bias-corrected and s.e.-adjusted.
Table 13: Coverage probability in the LM model with homoskedasticity using the rule-of-thumb bandwidth in LCQR estimation. Rows: $\hat\tau^{cqr}_{2bw}$ and $\hat\tau^{cqr,bc}_{2bw}$; columns: DGP 1 to DGP 5.
Notes: The numbers are the averages of the coverage probability associated with different estimators. The results are based on 5000 replications with a sample size $n = 500$. The s.e. and adjusted s.e. for the LCQR estimator are obtained based on the asymptotic expressions from Theorem 1 and Theorem 3. Estimators with a superscript bc are both bias-corrected and s.e.-adjusted.

Table 14: Coverage probability in the LM model with heteroskedasticity using the rule-of-thumb bandwidth in LCQR estimation. Rows: $\hat\tau^{cqr}_{2bw}$ and $\hat\tau^{cqr,bc}_{2bw}$; columns: DGP 1 to DGP 5.
Notes: The numbers are the averages of the coverage probability associated with different estimators. The results are based on 5000 replications with a sample size $n = 500$. The s.e. and adjusted s.e. for the LCQR estimator are obtained based on the asymptotic expressions from Theorem 1 and Theorem 3. Estimators with a superscript bc are both bias-corrected and s.e.-adjusted.