Linear Programming Approach to Nonparametric Inference under Shape Restrictions: with an Application to Regression Kink Designs

Abstract

We develop a novel method of constructing confidence bands for nonparametric regression functions under shape constraints. This method can be implemented via linear programming, and it is thus computationally appealing. We illustrate the use of our proposed method with an application to the regression kink design (RKD). Econometric analyses based on the RKD often suffer from wide confidence intervals due to slow convergence rates of nonparametric derivative estimators. We demonstrate that economic models and structures motivate shape restrictions, which in turn contribute to shrinking the confidence interval for an analysis of the causal effects of unemployment insurance benefits on unemployment durations.


Harold D. Chiang, Kengo Kato, Yuya Sasaki, Takuya Ura

arXiv [econ.EM]

Keywords: linear programming, regression kink design, shape restriction, nonparametric inference, confidence band.

JEL Classification: C13, C14, C21

We benefited from very useful comments by Chris Taber. We would like to thank Patty Anderson and Bruce Meyer for kindly agreeing to our use of the CWBH data. H. D. Chiang is supported by the Office of the Vice Chancellor for Research and Graduate Education at UW–Madison with funding from the Wisconsin Alumni Research Foundation. K. Kato is partially supported by NSF grants DMS-1952306 and DMS-2014636. The usual disclaimer applies.

H. D. Chiang: Department of Economics, University of Wisconsin–Madison, William H. Sewell Social Science Building, 1180 Observatory Drive, Madison, WI 53706-1393.
K. Kato: Department of Statistics and Data Science, Cornell University, 1194 Comstock Hall, Ithaca, NY 14853.
Y. Sasaki: Department of Economics, Vanderbilt University, VU Station B
T. Ura: Department of Economics, University of California, Davis, 1151 Social Sciences and Humanities, Davis, CA 95616.

Introduction

Nonparametric inference under shape restrictions is often computationally demanding. For instance, inference based on test inversion would require a grid search over a high-dimensional sieve parameter space. In this paper, we propose a computationally attractive method for nonparametric inference about regression functions under shape restrictions. Notably, our method can be implemented via linear programming, despite the complicated nature of nonparametric inference under shape restrictions.

In many applications, economic structures motivate shape restrictions, and such restrictions may contribute to delivering more informative statistical inference about the economic structure and causal effects. We highlight a case in point in the context of the regression kink design (RKD; Nielsen, Sørensen, and Taber, 2010; Card, Lee, Pei, and Weber, 2015; Dong, 2016). Estimation and inference in the RKD rely on derivative estimators of nonparametric regression functions, which typically suffer from slow convergence rates and thus may lead to wide confidence intervals. On the other hand, there are often natural and economically motivated restrictions on the levels and slopes of the regression function to the left and/or right of the kink location, and they can contribute to shrinking the length of the confidence interval. In the context of the regression discontinuity design, Armstrong (2015) and Babii and Kumar (2019) suggest the use of shape restrictions with related motivations. The benefits of shape restrictions may well be even greater for the RKD than for the regression discontinuity design due to the slower convergence rates of the RKD estimators.

We are far from the first to study the problem of nonparametric inference under shape restrictions.
Dümbgen (2003), Cai, Low, and Xia (2013), Armstrong (2015), Chernozhukov, Newey, and Santos (2015b), Horowitz and Lee (2017), Chen, Chernozhukov, Fernández-Val, Kostyshak, and Luo (2018), Freyberger and Reeves (2018), Mogstad, Santos, and Torgovitsky (2018), Fang and Seo (2019), and Zhu (2020), among others, propose various approaches to nonparametric inference under shape restrictions. See Chetverikov, Santos, and Shaikh (2018) and the journal issue edited by Samworth and Sen (2018) for a comprehensive review of the related literature. We advance the frontier of this literature by providing a computationally attractive approach. Specifically, we provide a novel method of constructing confidence bands/regions/intervals whose boundaries can be fully characterized as solutions to linear programs.

This paper is closely related to Freyberger and Horowitz (2015), who have considered a linear programming approach to inference under shape restrictions. Specifically, they propose a linear programming approach to inference about linear functionals of finite-dimensional parameters, where the parameter values are the values of the regression function evaluated at finite support points. On the other hand, as acknowledged in Freyberger and Horowitz (2015), "[t]he use of shape restrictions with continuously distributed variables is beyond the scope of" their paper. We contribute to this literature by accommodating (discretely or continuously) infinite-dimensional parameters. This extended framework allows for analysis of nonparametric regressions with infinitely supported (discrete or continuous) regressors, which are relevant to many applications including the regression discontinuity and kink designs among others.

Our proposed inference procedure works as follows. First, we use the sieve approximation (cf. Chen, 2007) of the nonparametric regression function.
We then construct a supremum test statistic as a linear function of the sieve parameters, compute its critical value by applying Chernozhukov, Chetverikov, and Kato (2017a), and then translate their relation into an inequality constraint. Subject to this inequality constraint, together with the additional linear-in-sieve-parameter inequality constraints stemming from shape restrictions, we find the lower (respectively, upper) bound of the confidence band/interval by minimizing (respectively, maximizing) the sieve representation with respect to the sieve parameters. In the final step, we inflate the bounds by a sieve approximation error bound similarly to Armstrong and Kolesár (2018, 2020), Noack and Rothe (2019), Schennach (2020), and Kato, Sasaki, and Ura (2021).

The rest of this paper is organized as follows. Section 2 presents the model and an overview of the proposed procedure. Section 3 presents the size control. Section 4 describes the procedure when we are interested in a finite-dimensional linear feature of the regression function. Section 5 presents an application to the RKD, with detailed implementation procedures tailored to this application. In an empirical application, we demonstrate that [...]

Footnote: Fang, Santos, Shaikh, and Torgovitsky (2020) also propose a linear programming approach to inference for a growing number of linear systems, although their focus is different from nonparametric regression functions under shape restrictions as in this paper.

$\{(Y_i, X_i^T) : i = 1, \ldots, n\}$ consists of i.i.d. random vectors following the law of $(Y, X^T)$, where $Y$ is a real-valued random variable and $X$ is a finite-dimensional random variable with the support $\mathcal{X} \subset \mathbb{R}^{\dim X}$. Let $E_n$ denote the sample mean, that is, $E_n[f(Y, X^T)] \equiv n^{-1} \sum_{i=1}^{n} f(Y_i, X_i^T)$ for any measurable function $f$.
In this paper, we are interested in a linear feature of the unknown mean regression function $g(x) \equiv E[Y \mid X = x]$, so that the parameter of interest can be written as $\theta \equiv A_1 g$ for a known linear operator $A_1$. We assume this parameter $\theta$ to be a function from some set $\mathcal{W}_1$ into $\mathbb{R}$, which allows $\theta$ to be a scalar, a vector, or a function from $\mathcal{X}$ into $\mathbb{R}$. For example, when $A_1$ is the identity operator, the parameter of interest is the conditional mean function $g$ itself. Other examples for $\theta$ include $g(x)$ for a given point $x$, the integral $\int g(x)\,d\mu(x)$, and the derivative $\partial g(x)/\partial x_j$, among others. In Section 4, we discuss how we can tailor the procedure to the case when $\theta$ is finite dimensional.

The objective of this paper is to construct a confidence region for $\theta$ under the shape restrictions

$$[A_2 g](w) \le 0 \quad \text{for every } w \in \mathcal{W}_2 \tag{1}$$

for a known linear operator $A_2$. We are going to construct a confidence region $CR_\theta$ for $\theta$ satisfying the following two properties: (i) the boundaries of $CR_\theta$ are the set of solutions to linear programming problems; and (ii) $CR_\theta$ controls the asymptotic size under the shape restriction.

Footnote: In this paper, the shape restriction does not have any improvement in the identification analysis, because $g$ is identified over $\mathcal{X}$ and therefore $\theta$ is identified. Recall that $\mathcal{X}$ is the support of $X$.

We approximate $g$ by a linear combination of $k$ functions $p_1, \ldots, p_k$ on $\mathcal{X}$. These $k$ functions are collected in the vector $p^k \equiv (p_1, \ldots, p_k)^T$. We can consider the linear regression of $Y$ on $p^k(X)$, and the population coefficient vector for this regression is

$$\bar\beta \equiv E[p^k(X) p^k(X)^T]^{-1} E[p^k(X) Y].$$

Footnote: We assume $k \ge 2$, which guarantees $\log k \ge \log 2 > 0$.

With these definitions and notations, we make the following assumption about error bounds for the approximation of $g$ by $p^{kT} \bar\beta$.

Assumption 1 (Approximation error bounds). There exist known functions $\delta_1$ and $\delta_2$ such that

$$\left| [A_1 (g - p^{kT} \bar\beta)](w) \right| \le \delta_1(w) \quad \text{for all } w \in \mathcal{W}_1; \text{ and} \tag{2}$$

$$\left| [A_2 (g - p^{kT} \bar\beta)](w) \right| \le \delta_2(w) \quad \text{for all } w \in \mathcal{W}_2. \tag{3}$$

This assumption plays the role of restricting the function class where $g$ resides, similarly to Kato et al. (2021) in the spirit of the honest inference approach (Armstrong and Kolesár, 2018, 2020) and the bias bound approach (Schennach, 2020).

Footnote: We allow $k$, $\delta_1$ and $\delta_2$ to be functions of $n$. We do not require $k \to \infty$ as $n \to \infty$, but it is allowed. In Assumption 1, we bound the biases coming from the approximation of $g$ by $p^{kT} \bar\beta$ by known $\delta_1$ and $\delta_2$. Without accounting for such approximation bounds, conventional methods would set $\delta_1 \to 0$ and $\delta_2 \to 0$ as $n \to \infty$.

For a generic value $\beta \in \mathbb{R}^k$, we can implement a hypothesis test for the null hypothesis $H_0 : \bar\beta = \beta$ against the alternative hypothesis $H_1 : \bar\beta \ne \beta$ as follows. In this hypothesis testing problem, we aim to detect a violation of the null hypothesis

$$H_0 : E[p^k(X)(Y - p^k(X)^T \beta)] = 0,$$

which is equivalent to $\bar\beta = \beta$ under the invertibility of $E[p^k(X) p^k(X)^T]$. We can estimate the left-hand side of the above equation by $E_n[p^k(X)(Y - p^k(X)^T \beta)]$ and its asymptotic variance (under $H_0$) by $E_n[\hat\omega \hat\omega^T]$, where

$$\hat\omega \equiv p^k(X)\left(Y - p^k(X)^T E_n[p^k(X) p^k(X)^T]^{-1} E_n[p^k(X) Y]\right).$$

Note that $\hat\omega$ estimates $\omega \equiv p^k(X)(Y - p^k(X)^T \bar\beta)$.
With these estimates, we consider the test statistic

$$\left\| E_n[\hat\omega \hat\omega^T]^{-1/2} E_n[p^k(X)(Y - p^k(X)^T \beta)] \right\|_\infty.$$

To obtain a critical value, we apply the multiplier bootstrap by calculating the $(1 - \alpha)$ quantile, denoted by $cv$, of

$$\left\| E_n[\hat\omega \hat\omega^T]^{-1/2} E_n[\eta \hat\omega] \right\|_\infty$$

conditional on the data set, where $\eta_1, \ldots, \eta_n$ are independent Rademacher multiplier random variables that are independent of the data. Note that the critical value $cv$ does not depend on a specific value of $\beta$, which enables us to construct a confidence region characterized by linear inequalities for $\beta$.

We can construct a confidence region for $\theta$ based on test inversion. Using the test statistic and the critical value, we can define a confidence region for $\theta$, denoted by $CR_\theta$. Namely, $CR_\theta$ is the set of $\theta$ satisfying the following linear constraints for some $\beta \in \mathbb{R}^k$:

$$\left\| E_n[\hat\omega \hat\omega^T]^{-1/2} E_n[p^k(X)(Y - p^k(X)^T \beta)] \right\|_\infty \le cv, \tag{4}$$

$$\left| [A_1 p^{kT}](w)\beta - \theta(w) \right| \le \delta_1(w) \quad \text{for every } w \in \mathcal{W}_1, \text{ and} \tag{5}$$

$$[A_2 p^{kT}](w)\beta \le \delta_2(w) \quad \text{for every } w \in \mathcal{W}_2, \tag{6}$$

where $[A_1 p^{kT}](w)\beta \equiv [A_1 (p^{kT} \beta)](w)$ and $[A_2 p^{kT}](w)\beta \equiv [A_2 (p^{kT} \beta)](w)$.

In the definition of $CR_\theta$, we have three types of linear constraints. First, (4) comes from the hypothesis test for $H_0 : \bar\beta = \beta$. Second, (5) controls the approximation error between $A_1 p^{kT} \bar\beta$ and $\theta$ under (2) in Assumption 1. Third, (6) uses the knowledge that the shape restriction (1) holds for the true $g$, together with (3) in Assumption 1. This confidence region could be more informative than that without the shape-restriction inequalities in (6).

For every value $w \in \mathcal{W}_1$, the following theorem states that the projection of $CR_\theta$ to $\theta(w)$ can be computed by solving two linear programming problems. A proof is provided in Appendix 1.

Theorem 1.

Under Assumption 1, for every $w \in \mathcal{W}_1$, the projection of $CR_\theta$ to $\theta(w)$ is equal to the closed interval

$$\left[ \min_{\beta \text{ s.t. } (4)\&(6)} [A_1 p^{kT}](w)\beta - \delta_1(w), \ \max_{\beta \text{ s.t. } (4)\&(6)} [A_1 p^{kT}](w)\beta + \delta_1(w) \right].$$

Therefore, the boundary points are the solutions to linear programs.
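
To make the linear-programming structure concrete, the following is a minimal Python sketch of the interval in Theorem 1 at a single point $w$. It is an illustration under stated assumptions, not the authors' implementation: the function and argument names (`projection_interval`, `P`, `a1`, `A2`, `delta1`, `delta2`) are hypothetical, the critical value `cv` is taken as given, and SciPy's `linprog` is used as one possible solver. The sup-norm constraint (4) is rewritten as $2k$ linear inequalities in $\beta$.

```python
import numpy as np
from scipy.optimize import linprog


def inv_sqrt(S):
    """Symmetric inverse square root of a positive definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


def projection_interval(P, y, a1, A2, delta1, delta2, cv):
    """Sketch of the closed interval in Theorem 1 for one point w.

    P      : (n, k) matrix whose i-th row is p^k(X_i)^T   (assumed input)
    y      : (n,) outcomes
    a1     : (k,) row vector [A_1 p^{kT}](w)              (assumed input)
    A2     : (m, k) rows [A_2 p^{kT}](w_j) on a finite grid of W_2
    delta1 : scalar bound delta_1(w); delta2 : (m,) bounds delta_2(w_j)
    cv     : critical value, e.g. from the multiplier bootstrap
    """
    n, k = P.shape
    Q = P.T @ P / n                              # E_n[p p^T]
    m1 = P.T @ y / n                             # E_n[p Y]
    beta_hat = np.linalg.solve(Q, m1)
    omega = P * (y - P @ beta_hat)[:, None]      # rows are omega_hat_i^T
    S = inv_sqrt(omega.T @ omega / n)            # E_n[omega omega^T]^{-1/2}
    # (4): || S (m1 - Q beta) ||_inf <= cv  is equivalent to
    #      S Q beta <= S m1 + cv   and   -S Q beta <= cv - S m1.
    M, c = S @ Q, S @ m1
    A_ub = np.vstack([M, -M, A2])                # last block is (6)
    b_ub = np.concatenate([c + cv, cv - c, delta2])
    free = [(None, None)] * k                    # linprog defaults to beta >= 0
    lo = linprog(a1, A_ub=A_ub, b_ub=b_ub, bounds=free)
    hi = linprog(-a1, A_ub=A_ub, b_ub=b_ub, bounds=free)
    if lo.status != 0 or hi.status != 0:
        raise ValueError("linear program is infeasible or did not converge")
    return lo.fun - delta1, -hi.fun + delta1
```

Passing an empty `A2` (shape `(0, k)`) drops the shape restrictions, so the two calls can be compared to see how much the constraints in (6) tighten the interval. An infinite $\mathcal{W}_2$ has to be replaced by a finite grid in this sketch, as is done in the application of Section 5.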

For the asymptotic size control, we are going to impose the following assumptions. Let $b > 0$, $q \in [4, \infty)$, $\nu \in (2, \infty)$ be some constants and let $B_n \ge 1$.

Assumption 2.
(a) The eigenvalues of $E[\omega \omega^T]$ and $E[p^k(X) p^k(X)^T]$ are bounded above and bounded below away from zero uniformly over $n$.
(b) (i) $E[Y^2] < \infty$. (ii) $E[|(E[\omega\omega^T]^{-1/2})_j \omega|^2] \ge b$, $E[|(E[\omega\omega^T]^{-1/2})_j \omega|^{2+\kappa}] \le B_n^\kappa$ and $E[\|E[\omega\omega^T]^{-1/2} \omega\|_\infty^q] \le B_n^q$ for every $j = 1, \ldots, k$ and each $\kappa = 1, 2$. (iii) $B_n^2 \log^7(nk)/n = o(1)$ and $B_n^2 \log^3(nk)/n^{1-2/q} = o(1)$.
(c) (i) $\sup_{x \in \mathcal{X}} E[|Y - g(X)|^\nu \mid X = x] = O(1)$. (ii) For every $k$, there are finite constants $c_k$ and $\ell_k$ such that $E[(g(X) - p^k(X)^T \bar\beta)^2]^{1/2} \le c_k$ and $\sup_{x \in \mathcal{X}} |g(x) - p^k(x)^T \bar\beta| \le \ell_k c_k$. (iii) Let $\xi_k \equiv \sup_{x \in \mathcal{X}} \|p^k(x)\|$ and $\xi_k^L \equiv \sup_{x, x' \in \mathcal{X} : x \ne x'} \left\| p^k(x)/\|p^k(x)\| - p^k(x')/\|p^k(x')\| \right\| / \|x - x'\|$. Then $\xi_k^{2\nu/(\nu-2)} \log k / n = O(1)$, $\log \xi_k^L = O(\log k)$, and $\log \xi_k = O(\log k)$. (iv) $n^{-1} \xi_k^2 \log k = o(1)$, $\ell_k c_k = O(1)$, and $(n^{-1} \xi_k^2)^{1/2} \left\{ n^{1/\nu} (\log k)^{1/2} + \sqrt{k} \right\} = O(1)$.

Footnote: $(E[\omega\omega^T]^{-1/2})_j$ denotes the $j$-th row of the square matrix $E[\omega\omega^T]^{-1/2}$.

Assumption 2 (a) implies Condition A.2 in Belloni, Chernozhukov, Chetverikov, and Kato (2015). It imposes a restriction to rule out overly strong collinearity among $p_1, \ldots, p_k$. Assumptions 2 (b)-(ii) and 2 (b)-(iii) correspond to Conditions (M.1), (M.2) and (E.2) in Chernozhukov et al. (2017a). They require that the polynomial moments of the maximal component of the normalized $\omega$ do not grow too fast, and they impose conditions that [...] $O(n^a)$ for some $a$ between zero and one. Assumption 2 (c) covers Conditions A.3-A.5 in Belloni et al. (2015) as well as rate conditions in the statement of their Theorem 4.6. Assumption 2 (c)-(i) requires the residual to have a finite $\nu$-th moment for some $\nu > 2$. Assumptions 2 (c)-(ii) and 2 (c)-(iii) impose bounds on the approximation errors of $g$ using $p_1, \ldots, p_k$, as well as restrictions on the size of basis functions, measured by the Euclidean norm and the Lipschitz constant. Assumption 2 (c)-(iv) imposes some more constraints on the relative growth rates of the approximation errors and the size and number of basis functions. Notice that it does not require the approximation errors to be diminishing asymptotically, and hence does not require undersmoothing.

The following theorem states the asymptotic size control for $CR_\theta$ as a confidence region for $\theta$. A proof is provided in Appendix 2.

Theorem 2.

If Assumptions 1 and 2 are satisfied, then

$$\liminf_{n \to \infty} P(\theta \in CR_\theta) \ge 1 - \alpha.$$

With some additional notations and rate conditions, it is possible to strengthen the statement of Theorem 2 to hold uniformly over a set of data generating processes. This is due to the fact that the key theoretical building blocks in the proof of Theorem 2 (the anti-concentration inequality in Chernozhukov, Chetverikov, and Kato (2015a), the high-dimensional central limit theorem of Chernozhukov, Chetverikov, and Kato (2018), and Rudelson's concentration inequality (Belloni et al., 2015, Lemma 6.2)) all provide non-asymptotic bounds with constants only depending on a few key features of the model such as $b$, $q$ and $\nu$.

When $\theta$ is finite dimensional, we can directly test $A_1[p^{kT} \bar\beta] = \theta$ for a generic value of $\theta$, instead of testing $\bar\beta = \beta$ as in Section 2. In the current section, we describe the inference procedure when $\theta$ is a finite-dimensional column vector.

For a generic value $\theta$, we consider the null hypothesis $H_0 : A_{1,k} \bar\beta = \theta$ and the alternative hypothesis $H_1 : A_{1,k} \bar\beta \ne \theta$, where $A_{1,k}$ is the matrix defined by $A_{1,k} \beta = A_1[p^{kT} \beta]$ for every $k \times 1$ vector $\beta$. Based on the definition of $\bar\beta$, we aim to measure the violation of the null hypothesis

$$H_0 : A_{1,k} E[p^k(X) p^k(X)^T]^{-1} E[p^k(X) Y] = \theta.$$

We can estimate the left-hand side by $A_{1,k} E_n[p^k(X) p^k(X)^T]^{-1} E_n[p^k(X) Y]$ and its asymptotic variance under $H_0$ by

$$\hat V \equiv A_{1,k} E_n[p^k(X) p^k(X)^T]^{-1} E_n[\hat\omega \hat\omega^T] E_n[p^k(X) p^k(X)^T]^{-1} A_{1,k}^T.$$

With these estimators, we consider the test statistic

$$\left\| \hat V^{-1/2} \left( A_{1,k} E_n[p^k(X) p^k(X)^T]^{-1} E_n[p^k(X) Y] - \theta \right) \right\|_\infty.$$
To obtain its critical value, we apply the multiplier bootstrap and compute the $(1 - \alpha)$ quantile, denoted by $\widehat{cv}$, of

$$\left\| \hat V^{-1/2} A_{1,k} E_n[p^k(X) p^k(X)^T]^{-1} E_n[\eta \hat\omega] \right\|_\infty$$

conditional on the data set, where $\eta_1, \ldots, \eta_n$ are independent Rademacher multiplier random variables that are independent of the data.

A confidence region for $\theta$ can be constructed based on test inversion. In this setup, we can construct a confidence region for $\theta$, $\widehat{CR}_\theta$, by collecting all $\theta$'s satisfying the following linear constraints for some $\beta \in \mathbb{R}^k$:

$$\left\| \hat V^{-1/2} \left( A_{1,k} E_n[p^k(X) p^k(X)^T]^{-1} E_n[p^k(X) Y] - A_{1,k} \beta \right) \right\|_\infty \le \widehat{cv}, \tag{7}$$

$$\left| [A_1 p^{kT}](w)\beta - \theta(w) \right| \le \delta_1(w) \quad \text{for every } w \in \mathcal{W}_1, \text{ and}$$

$$[A_2 p^{kT}](w)\beta \le \delta_2(w) \quad \text{for every } w \in \mathcal{W}_2. \tag{8}$$

For every value $w \in \mathcal{W}_1$, we can compute the projection of $\widehat{CR}_\theta$ to $\theta(w)$ by solving two linear programming problems with respect to $\beta$:

minimize $[A_{1,k} \beta](w) - \delta_1(w)$ over $\beta$ subject to (7) & (8), and
maximize $[A_{1,k} \beta](w) + \delta_1(w)$ over $\beta$ subject to (7) & (8).

In other words, the projection is the closed interval

$$\left[ \min_{\beta \text{ s.t. } (7)\&(8)} [A_{1,k} \beta](w) - \delta_1(w), \ \max_{\beta \text{ s.t. } (7)\&(8)} [A_{1,k} \beta](w) + \delta_1(w) \right].$$

Formal theoretical properties of the confidence interval constructed by this procedure follow from analogous arguments to those in Sections 2 and 3. In the application presented in the following section, the parameter $\theta$ of interest is a scalar (and finite dimensional in particular) and we therefore adopt this approach to constructing its confidence interval.

In this section, we present an application of our proposed method to the regression kink design (RKD). Since the regression kink design is based on estimates of slopes as opposed to levels, statistical inference based on nonparametric estimates often entails slow convergence rates and thus wide confidence intervals.
To mitigate this adverse feature of the regression kink design, we propose to impose shape restrictions that are motivated by the underlying economic structures.

To introduce the RKD, consider the structure

$$Y = Y(T, X, U) \quad \text{and} \quad T = T(X),$$

where $Y$ denotes the outcome variable, $T$ denotes the treatment variable, $X$ denotes the running variable, and $U$ denotes the random vector of unobserved characteristics. A researcher is often interested in the partial effect $\partial Y(T, X, U)/\partial T$ of the treatment variable on the outcome variable. Since the unobserved characteristics $U$ are generally correlated with the running variable $X$ and thus with the treatment $T = T(X)$, one would need to exploit exogenous variations in the treatment variable in order to identify this partial effect. If the treatment policy function $T(\cdot)$ exhibits a 'kink' at a known point $\bar x$, then this shape restriction can be exploited to induce local exogenous variations in the treatment variable $T$ as well, so that the partial effect of interest may be identified. This approach of the so-called regression kink design (RKD) was proposed by Nielsen et al. (2010) and Card et al. (2015); see Dong (2016) for the case of a binary treatment, and see Chiang and Sasaki (2019) and Chen, Chiang, and Sasaki (2020) for heterogeneous treatment effects.

Suppose that a researcher is interested in conducting inference for the average partial effect $h_1(\bar x) \equiv E[\partial Y(T, X, U)/\partial T \mid X = \bar x]$ at the kink point $\bar x$. Under regularity conditions, we can obtain the following decomposition of the derivative $g'(x)$ of $g(x) = E[Y \mid X = x]$:

$$g'(x) = \underbrace{E\left[ \frac{\partial Y(T, X, U)}{\partial T} \,\middle|\, X = x \right]}_{\text{Partial Effect of Interest: } h_1(x)} \cdot\, T'(x) + \underbrace{E\left[ \frac{\partial Y(T, X, U)}{\partial X} \,\middle|\, X = x \right]}_{\text{Direct Effect of } X:\ h_2(x)} + \underbrace{E\left[ Y \cdot \frac{\partial \log f_{U|X}(U \mid X)}{\partial X} \,\middle|\, X = x \right]}_{\text{Endogenous Effect: } h_3(x)}. \tag{9}$$

If $T'(\cdot)$ is discontinuous (i.e., $T(\cdot)$ is kinked) at $\bar x$ while each of $h_1$, $h_2$ and $h_3$ is continuous at $\bar x$, then this decomposition implies that the partial effect of interest at $\bar x$ can be identified by

$$h_1(\bar x) = \frac{\lim_{x \downarrow \bar x} g'(x) - \lim_{x \uparrow \bar x} g'(x)}{\lim_{x \downarrow \bar x} T'(x) - \lim_{x \uparrow \bar x} T'(x)},$$

cf. Nielsen et al. (2010); Card et al. (2015). We can represent the parameter of interest via $h_1(\bar x) = A_1 g$, using a linear operator $A_1$ defined by

$$A_1 g = \frac{\lim_{x \downarrow \bar x} g'(x) - \lim_{x \uparrow \bar x} g'(x)}{\lim_{x \downarrow \bar x} T'(x) - \lim_{x \uparrow \bar x} T'(x)}. \tag{10}$$

Even though $g$ is unknown, the operator $A_1$ is known since $T(\cdot)$ is a known function. In this case, $\mathcal{W}_1 = \{\bar x\}$, and the parameter of interest $\theta = A_1 g$ is a scalar.

Although $\theta$ is nonparametrically estimable, an estimator based on slopes of nonparametric regression functions usually suffers from slow rates of convergence, and thus it may not provide an informative confidence interval. If an economic structure motivates shape restrictions, then imposing such restrictions may conceivably contribute to shrinking the length of the confidence interval. With this motivation, in Section 5.1, we demonstrate how shape restrictions help in conducting statistical inference in the analysis of unemployment insurance (UI).

Unemployment insurance (UI) benefits play important roles in supporting consumption smoothing under the risk of unemployment. A potential drawback of the UI benefits is the moral hazard effects, that is, the UI benefits may discourage unemployed workers from looking for jobs, leading to elongated unemployment durations and thus economic inefficiency. Identifying and estimating these moral hazard effects have been of research interest in labor economics. Landais (2015) suggests exploiting the non-smooth UI benefit schedule as detailed below, and thus using the regression kink design to identify the effects of UI benefits on the duration of unemployment.
Applying this identification strategy to the data of the Continuous Wage and Benefit History Project (cf. Moffitt, 1985), Landais (2015) finds that there are positive effects of the UI benefit amounts on the duration of unemployment, even after controlling for unobserved sources of endogenous selection of the duration that may be correlated with the pre-unemployment income and thus the benefit amount. Chiang and Sasaki (2019) further investigate heterogeneous effects of the UI benefit amount on the duration by using the quantile regression kink design.

Landais (2015) considers the following empirical framework of assessing the welfare effects of unemployment benefits. The outcome $Y$ of interest is the duration of unemployment. Upon becoming unemployed, an individual can apply for UI and receives a weekly benefit amount of $T = T(X)$, where $X$ is the highest quarterly earning in the last four completed calendar quarters prior to the date of the UI claim. The partial effect $\partial Y(T, X, U)/\partial T$ measures the moral hazard effect of the UI benefits on the duration of unemployment in this setting. Since the unobserved characteristics $U$ contain cognitive and non-cognitive skills of the individual, such as attitudes toward work, that are generally correlated with the labor income $X$ received prior to the unemployment, one would need exogenous variations in the treatment variable in order to identify this moral hazard effect.

As in Landais (2015), we can exploit the fact that the UI benefits policy $T(\cdot)$ exhibits a kinked shape. In particular, the UI schedule in the state of Louisiana is linear in $X$ with a constant $t \equiv 1/25$ of proportionality up to a fixed ceiling $t_{\max}$. (Note that the unit of $X$ is U.S. dollars per quarter, whereas the unit of $T(X)$ is U.S. dollars per week. Therefore, this constant of proportionality implies that the UI benefit amount is approximately a half of the prior earnings.) The maximum UI benefit amount is $t_{\max} = \$183$ during the period between September 1981 and September 1982, and $t_{\max} = \$205$ during the period between September 1982 and December 1983. In short, the UI benefits policy takes the form of

$$T(x) = \begin{cases} t \cdot x & \text{if } x < t_{\max}/t \\ t_{\max} & \text{if } x \ge t_{\max}/t, \end{cases}$$

and $T$ is thus kinked at $\bar x = t_{\max}/t$. Individuals can continue to receive the benefits determined by this formula as long as they remain unemployed, up to the maximum duration of 28 weeks.

We construct a data set by following the data construction in Landais (2015) and Chiang and Sasaki (2019). We focus on the observations in Louisiana. The sample size of the original data is 9,008 for the period between September 1981 and September 1982, and 16,463 for the period between September 1982 and December 1983. Since we are interested in the information around the kink location $\bar x$, for simplicity, we focus on the (sub-)sample of the observations in the interval $X \in [\bar x - 5000, \bar x + 5000]$. The resultant sample size is 8,677 for the period between September 1981 and September 1982, and 15,763 for the period between September 1982 and December 1983.

In this empirical application, we can consider a few shape restrictions on the unknown conditional mean function $g(x) = E[Y \mid X = x]$. First of all, to impose the continuity of $g$ at $\bar x$, we can use the shape restriction

$$\lim_{x \downarrow \bar x} g(x) = \lim_{x \uparrow \bar x} g(x). \tag{11}$$

This restriction is not redundant when we use different sieves for the left of $\bar x$ and the right of $\bar x$. Moreover, it may be reasonable to assume that $h_2$ and $h_3$ are both non-positive. Specifically, the direct effect $h_2$ is non-positive if formerly higher-income earners can find the next job more quickly than formerly lower-income earners on average. The endogenous effect $h_3$ is non-positive if individuals with higher abilities can find the next job more quickly than those with lower abilities on average.
Since $T(\cdot)$ is a constant function to the right of the kink location in this application, this assumption together with the decomposition (9) implies that the reduced form $g$ is non-increasing to the right of the kink location $\bar x$. This consideration leads to the slope restriction

$$g'(x) \le 0 \quad \text{for every } x > \bar x. \tag{12}$$

In the notations in Section 2, we can summarize the shape restrictions (11) and (12) as

$$[A_2 g](w) \le 0 \quad \text{for every } w \in \mathcal{W}_2, \tag{13}$$

where $\mathcal{W}_2 = \{-1, -2\} \cup \{w : w > \bar x\}$ and

$$[A_2 g](w) = \begin{cases} \lim_{x \downarrow \bar x} g(x) - \lim_{x \uparrow \bar x} g(x) & \text{if } w = -1 \\ \lim_{x \uparrow \bar x} g(x) - \lim_{x \downarrow \bar x} g(x) & \text{if } w = -2 \\ g'(w) & \text{if } w > \bar x. \end{cases}$$

Now, we outline the concrete implementation procedure to exploit these shape restrictions (13), for inference about the causal parameter $\theta = A_1 g$ defined in (10). For every even natural number $k$, we use the basis functions

$$p^k = (\ell_{L,0}, \ell_{R,0}, \cdots, \ell_{L,k/2-1}, \ell_{R,k/2-1})^T,$$

where $(\ell_{L,0}, \ell_{L,1}, \cdots, \ell_{L,k/2-1})$ are the first $k/2$ basis functions of $L^2([\bar x - 5000, \bar x])$ and $(\ell_{R,0}, \ell_{R,1}, \cdots, \ell_{R,k/2-1})$ are the first $k/2$ basis functions of $L^2([\bar x, \bar x + 5000])$. We use the shifted Legendre bases in the empirical application in this subsection as well as in the simulation studies in Appendix C. We follow Section 4 to construct the $(1 - \alpha)$-level confidence interval for $\theta$ subject to the shape constraint (13), where we restrict $\mathcal{W}_2 = \{-1, -2\} \cup \{\xi_1, \ldots, \xi_l\}$ with 99 equally spaced grid points $\{\xi_1, \ldots, \xi_l\} \subset (\bar x, \bar x + 5000)$. The following algorithm provides a step-by-step procedure of the construction.

Algorithm.

1. For every observation $i = 1, \ldots, n$, construct the vector

$$\hat\omega_i = p^k(X_i)\left( Y_i - p^k(X_i)^T E_n[p^k(X) p^k(X)^T]^{-1} E_n[p^k(X) Y] \right).$$

2. Construct the four matrices:

$$A_{1,k} = \left( -\lim_{x \uparrow \bar x} \ell'_{L,0}(x) \quad \lim_{x \downarrow \bar x} \ell'_{R,0}(x) \quad \cdots \quad -\lim_{x \uparrow \bar x} \ell'_{L,k/2-1}(x) \quad \lim_{x \downarrow \bar x} \ell'_{R,k/2-1}(x) \right),$$

$$B_1 = \frac{A_{1,k} E_n[p^k(X) p^k(X)^T]^{-1} E_n[p^k(X) Y]}{\sqrt{A_{1,k} E_n[p^k(X) p^k(X)^T]^{-1} E_n[\hat\omega \hat\omega^T] E_n[p^k(X) p^k(X)^T]^{-1} A_{1,k}^T}},$$

$$B_2 = \frac{A_{1,k}}{\sqrt{A_{1,k} E_n[p^k(X) p^k(X)^T]^{-1} E_n[\hat\omega \hat\omega^T] E_n[p^k(X) p^k(X)^T]^{-1} A_{1,k}^T}}, \text{ and}$$

$$B_3 = \begin{pmatrix}
-\lim_{x \uparrow \bar x} \ell_{L,0}(x) & \lim_{x \downarrow \bar x} \ell_{R,0}(x) & \cdots & -\lim_{x \uparrow \bar x} \ell_{L,k/2-1}(x) & \lim_{x \downarrow \bar x} \ell_{R,k/2-1}(x) \\
\lim_{x \uparrow \bar x} \ell_{L,0}(x) & -\lim_{x \downarrow \bar x} \ell_{R,0}(x) & \cdots & \lim_{x \uparrow \bar x} \ell_{L,k/2-1}(x) & -\lim_{x \downarrow \bar x} \ell_{R,k/2-1}(x) \\
0 & \ell'_{R,0}(\xi_1) & \cdots & 0 & \ell'_{R,k/2-1}(\xi_1) \\
\vdots & \vdots & & \vdots & \vdots \\
0 & \ell'_{R,0}(\xi_l) & \cdots & 0 & \ell'_{R,k/2-1}(\xi_l)
\end{pmatrix}.$$

3. Generate $M$ independent samples $\{\eta_{m,1}, \cdots, \eta_{m,n}\}_{m=1,\ldots,M}$ of Rademacher random variables independently from data, and compute $\widehat{cv}$ by the $(1 - \alpha)$-quantile of

$$\left\{ \frac{\left| A_{1,k} E_n[p^k(X) p^k(X)^T]^{-1} E_n[\eta_m \hat\omega] \right|}{\sqrt{A_{1,k} E_n[p^k(X) p^k(X)^T]^{-1} E_n[\hat\omega \hat\omega^T] E_n[p^k(X) p^k(X)^T]^{-1} A_{1,k}^T}} \right\}_{m=1,\ldots,M}.$$

4. Solve the linear programs

$$\min_\beta \ A_{1,k} \beta - \delta_1 \quad \text{s.t.} \quad B_2 \beta \le B_1 + \widehat{cv}, \quad B_2 \beta \ge B_1 - \widehat{cv}, \quad B_3 \beta \le \delta_2,$$

$$\max_\beta \ A_{1,k} \beta + \delta_1 \quad \text{s.t.} \quad B_2 \beta \le B_1 + \widehat{cv}, \quad B_2 \beta \ge B_1 - \widehat{cv}, \quad B_3 \beta \le \delta_2.$$

The solutions to these two linear programs are the boundary points of the $(1 - \alpha)$-level confidence interval for $\theta$.

Table 1 summarizes the results for the statistical inference about the marginal effects of UI benefits on unemployment duration in Louisiana, based on the above algorithm. Displayed are the 95% confidence intervals and their lengths for each of the period between September 1981 and September 1982 (top panel) and the period between September 1982 and December 1983 (bottom panel). We use the largest sieve dimension $k = 12$ among those that were used in our simulation studies presented in Appendix C. (The shape restrictions do not bind for the cases of $k = 4$ or $k = 8$. This is possibly because the current sample sizes are much larger than those used in our simulation studies.) For the UI benefit amount $T(X)$, we use two alternative measures. One is the amount of UI benefits claimed (left half of each panel) and the other is the amount of UI benefits actually paid (right half of each panel), following the prior work. That said, these two alternative measures provide almost the same results, and therefore our discussions below apply to the results based on both of the two measures.

The reported confidence intervals contain the point estimates reported in the prior work by Landais (2015). That said, the econometric specifications are different, and results are thus hard to compare.
Our results based on no shape restriction are effectively what we would get from the standard method of running fifth-degree polynomial regressions on the left and right sides of x. In contrast, Landais (2015) uses polynomials of degree one, i.e., the linear specification, for the main estimation results reported in his Table 2. Due to the greater flexibility of our econometric specification, our method naturally incurs wider confidence intervals, but we demonstrate that shape restrictions contribute to providing more informative results.

September 1981 – September 1982
                              UI Claimed                  UI Paid
Sieve Dimension: k = 12       95% CI             Length   95% CI             Length
No Shape Restriction          [-0.023, 0.044]    0.067    [-0.030, 0.040]    0.070
Shape Restrictions (13)       [0.000, 0.044]     0.044    [0.000, 0.040]     0.040

September 1982 – December 1983
                              UI Claimed                  UI Paid
Sieve Dimension: k = 12       95% CI             Length   95% CI             Length
No Shape Restriction          [0.002, 0.048]     0.046    [0.002, 0.047]     0.045
Shape Restrictions (13)       [0.002, 0.048]     0.046    [0.002, 0.047]     0.045

Table 1: 95% confidence intervals of the marginal effect of UI benefit amount on unemployment duration for Louisiana, 1981–1983.

Our confidence interval includes zero for the period between September 1981 and September 1982 (the first panel of Figure 1) if no shape restriction is imposed, i.e., if the conventional approach is taken. However, in this panel (for the period between September 1981 and September 1982), shape restrictions (13) shrink the confidence intervals. (Although these shrunken confidence intervals have lower bounds of approximately 0.000, note that we do not directly impose a sign restriction on the causal effects per se in the shape restrictions (13).
See our discussions above (13) for the motivations of these shape restrictions.) On the other hand, the confidence intervals are already informative for the period between September 1982 and December 1983 even without any shape restriction, and imposing shape restrictions (13) therefore does not shrink them further. These results thus demonstrate one case in which shape restrictions enhance the informativeness of statistical inference, and another case in which they do not.
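As an illustration of the algorithm in this section, the following is a minimal Python sketch of the multiplier critical value and of the two linear programs. It is not the authors' implementation. The function names `multiplier_cv` and `ci_by_lp`, the argument `a` (the row vector in the objective), the argument `S` (a generic matrix of shape restrictions of the form S β ≤ δ), and the use of `scipy.optimize.linprog` are all assumptions made for this example.

```python
import numpy as np
from scipy.optimize import linprog


def multiplier_cv(a, Q_inv, omega_hat, alpha=0.05, M=2500, seed=0):
    """Critical value from Rademacher multipliers, plus the standard error.

    a         : (k,) row vector defining the linear functional (hypothetical name)
    Q_inv     : (k, k) inverse of E_n[p_k(X) p_k(X)^T], assumed symmetric
    omega_hat : (n, k) array whose i-th row is the estimated score of unit i
    """
    rng = np.random.default_rng(seed)
    n = omega_hat.shape[0]
    v = omega_hat @ Q_inv @ a                    # v_i = a Q^{-1} omega_hat_i
    se = np.sqrt(np.mean(v ** 2))                # sqrt(a Q^{-1} E_n[w w^T] Q^{-1} a^T)
    eta = rng.choice([-1.0, 1.0], size=(M, n))   # M draws of n Rademacher signs
    stats = np.abs(eta @ v) / n / se             # |a Q^{-1} E_n[eta_m w]| / se
    return float(np.quantile(stats, 1 - alpha)), float(se)


def ci_by_lp(a, beta_hat, se, cv, S=None, delta1=0.0, delta2=0.0):
    """End points of the interval from one minimization and one maximization."""
    center = a @ beta_hat / se
    rows = [a / se, -a / se]                     # |a (beta - beta_hat)| / se <= cv
    rhs = [center + cv, -(center - cv)]
    if S is not None:                            # shape restrictions S beta <= delta2
        S = np.atleast_2d(S)
        rows.extend(S)
        rhs.extend([delta2] * S.shape[0])
    A_ub, b_ub = np.vstack(rows), np.asarray(rhs)
    free = [(None, None)] * a.size               # linprog otherwise imposes beta >= 0
    lower = linprog(a, A_ub=A_ub, b_ub=b_ub, bounds=free)
    upper = linprog(-a, A_ub=A_ub, b_ub=b_ub, bounds=free)
    return lower.fun - delta1, -upper.fun + delta1
```

For instance, with a β̂ = 0.01, a standard error of 0.02, a critical value of 2 and δ = 0, the unrestricted interval is [-0.03, 0.05]; adding the single restriction that a β is nonnegative moves the lower end point to 0 and leaves the upper end point unchanged. The sketch does not handle the case in which the shape restrictions make the program infeasible.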

Nonparametric inference under shape restrictions can impose heavy computational burdens, e.g., a grid search over a high-dimensional sieve parameter space. In this paper, we provide a novel method of constructing confidence bands/intervals for nonparametric regression functions under shape constraints. The proposed method can be implemented via linear programming, and it thus relieves the conventional computational burden. The use of this new method is illustrated with an application to the regression kink design. Inference in the regression kink design often suffers from wide confidence intervals due to the slow convergence rates of nonparametric derivative estimators. If economic models and structures motivate shape restrictions, then these restrictions may contribute to shrinking the confidence interval. We demonstrate this point with real data for an analysis of the causal effects of unemployment insurance benefits on unemployment durations. Specifically, the shape restrictions motivated by non-increasing direct effects and non-increasing endogenous effects drastically shrink the confidence interval of the causal effects.

Appendix

A Proofs for the Results in the Main Text

A.1 Proof of Theorem 1

Proof.

First, we show that the projection of $CR_\theta$ to $\theta(w)$ is included in the interval defined in Theorem 1. Let $\theta$ be any element of $CR_\theta$. Then
\[
[A_{1,k}\beta](w) - \delta_1(w) \le \theta(w) \le [A_{1,k}\beta](w) + \delta_1(w)
\]
for some $\beta \in \mathbb{R}^k$ such that (4) and (6) hold. It implies that $\theta(w)$ is included in the interval.

Next, we show that the interval is included in the projection of $CR_\theta$ to $\theta(w)$. Let $c$ be any element of the interval defined in Theorem 1. There is $\beta$ such that $|[A_{1,k}\beta](w) - c| \le \delta_1(w)$ and such that $\beta$ satisfies (4) and (6). Define $\theta(\tilde{w})$ by setting it to $[A_{1,k}\beta](\tilde{w})$ for $\tilde{w} \ne w$ and to $c$ for $w$. Then this $\theta$ satisfies (5) with $\theta(w) = c$. It implies that $c$ is included in the projection of $CR_\theta$ to $\theta(w)$.

A.2 Proof of Theorem 2

We first state four lemmas that play important roles in the proof of Theorem 2. Their proofs are deferred to Appendix B.

Lemma 1.

Under Assumptions 2 (a) and 2 (b), there exist $k$-dimensional centered Gaussian random vectors $Z$ and $Z^*$ such that
\[
\sup_t \left| P(\|Z\|_\infty \le t) - P\left( \left\| E_n[E[\omega\omega^T]^{-1/2}\omega] \right\|_\infty \le t \right) \right| = o(1),
\]
\[
\sup_t \left| P(\|Z^*\|_\infty \le t) - P\left( \left\| E_n[E[\omega\omega^T]^{-1/2}\eta\omega] \right\|_\infty \le t \right) \right| = o(1),
\]
and $E[ZZ^T] = E[Z^*(Z^*)^T]$.

Lemma 2.

Under Assumptions 2 (a) and 2 (b), we have
\[
\max\{ \|E_n[(\eta+1)\omega]\|, \|E_n[\omega]\| \} = O_P\left( \sqrt{\frac{\xi_k^2}{n}} \right).
\]

Lemma 3.

Under Assumptions 2 (a) and 2 (c), we have
\[
\left\| E_n[\eta p_k(X) p_k(X)^T] \right\| = O_P\left( \sqrt{\frac{\xi_k^2 \log k}{n}} \right).
\]

Lemma 4.

Under Assumptions 2 (a) and 2 (c), we have
\[
\left\| E_n[\hat\omega\hat\omega^T]^{-1/2} - E[\omega\omega^T]^{-1/2} \right\| = O_P\left( (n^{1/\nu} \vee \ell_k c_k) \sqrt{\frac{\xi_k^2 \log k}{n}} \right).
\]

Proof of Theorem 2.

First, we show that $\| E_n[\hat\omega\hat\omega^T]^{-1/2} E_n[\omega] \|_\infty \le cv$ implies $\theta \in CR_\theta$. By Assumption 1 for $A_2$, we have
\[
[A_2 p_k^T](w)\bar\beta \le [A_2 g](w) + |[A_2(g - p_k^T\bar\beta)](w)| \le \delta_2(w)
\]
for every $w \in W$. By Assumption 1 for $A_1$, we have
\[
[A_1 p_k^T](w)\beta - \delta_1(w) \le \theta(w) \le [A_1 p_k^T](w)\beta + \delta_1(w)
\]
for every $w \in W$. Together with $\| E_n[\hat\omega\hat\omega^T]^{-1/2} E_n[\omega] \|_\infty \le cv$, we have $\theta \in CR_\theta$.

The rest of the proof establishes
\[
\liminf_{n \to \infty} P\left( \| E_n[\hat\omega\hat\omega^T]^{-1/2} E_n[\omega] \|_\infty \le cv \right) \ge 1 - \alpha.
\]
We now invoke Lemma 1 under Assumptions 2 (a) and 2 (b). Observe that, as the Gaussian random vectors $Z$ and $Z^*$ are centered and share a common covariance matrix, we have $P(\|Z\|_\infty \le t) = P(\|Z^*\|_\infty \le t)$. Hence it holds that
\[
\begin{aligned}
P\left( \| E_n[\hat\omega\hat\omega^T]^{-1/2} E_n[\omega] \|_\infty \le cv \right)
\ge{}& P\left( \| E_n[\hat\omega\hat\omega^T]^{-1/2} E_n[\eta\hat\omega] \|_\infty \le cv \right) \\
&- \sup_t \left| P(\|Z^*\|_\infty \le t) - P\left( \| E[\omega\omega^T]^{-1/2} E_n[\eta\omega] \|_\infty \le t \right) \right| \\
&- \sup_t \left| P\left( \| E[\omega\omega^T]^{-1/2} E_n[\eta\omega] \|_\infty \le t \right) - P\left( \| E_n[\hat\omega\hat\omega^T]^{-1/2} E_n[\eta\hat\omega] \|_\infty \le t \right) \right| \\
&- \sup_t \left| P(\|Z\|_\infty \le t) - P\left( \| E[\omega\omega^T]^{-1/2} E_n[\omega] \|_\infty \le t \right) \right| \\
&- \sup_t \left| P\left( \| E[\omega\omega^T]^{-1/2} E_n[\omega] \|_\infty \le t \right) - P\left( \| E_n[\hat\omega\hat\omega^T]^{-1/2} E_n[\omega] \|_\infty \le t \right) \right|.
\end{aligned}
\]
Following its definition, $P( \| E_n[\hat\omega\hat\omega^T]^{-1/2} E_n[\eta\hat\omega] \|_\infty \le cv ) = 1 - \alpha$. By Lemma 1, it suffices to show
\[
\sup_t \left| P\left( \| E[\omega\omega^T]^{-1/2} E_n[\omega] \|_\infty \le t \right) - P\left( \| E_n[\hat\omega\hat\omega^T]^{-1/2} E_n[\omega] \|_\infty \le t \right) \right| = o(1) \tag{14}
\]
and
\[
\sup_t \left| P\left( \| E[\omega\omega^T]^{-1/2} E_n[\eta\omega] \|_\infty \le t \right) - P\left( \| E_n[\hat\omega\hat\omega^T]^{-1/2} E_n[\eta\hat\omega] \|_\infty \le t \right) \right| = o(1). \tag{15}
\]
We can bound the first probability as follows:
\[
\begin{aligned}
&\sup_t \left| P\left( \| E[\omega\omega^T]^{-1/2} E_n[\omega] \|_\infty \le t \right) - P\left( \| E_n[\hat\omega\hat\omega^T]^{-1/2} E_n[\omega] \|_\infty \le t \right) \right| \\
&\le \sup_t P\left( \left| \| E[\omega\omega^T]^{-1/2} E_n[\omega] \|_\infty - t \right| \le 1/(\sqrt{n}\log k) \right) \\
&\quad + P\left( \left| \| E[\omega\omega^T]^{-1/2} E_n[\omega] \|_\infty - \| E_n[\hat\omega\hat\omega^T]^{-1/2} E_n[\omega] \|_\infty \right| > 1/(\sqrt{n}\log k) \right) \\
&\le \sup_t P\left( \left| \|Z\|_\infty - t \right| \le 1/(\sqrt{n}\log k) \right) + 2 \sup_t \left| P(\|Z\|_\infty \le t) - P\left( \| E_n[E[\omega\omega^T]^{-1/2}\omega] \|_\infty \le t \right) \right| \\
&\quad + P\left( \left| \| E[\omega\omega^T]^{-1/2} E_n[\omega] \|_\infty - \| E_n[\hat\omega\hat\omega^T]^{-1/2} E_n[\omega] \|_\infty \right| > 1/(\sqrt{n}\log k) \right) \\
&\le o(1) + P\left( \left| \| E[\omega\omega^T]^{-1/2} E_n[\omega] \|_\infty - \| E_n[\hat\omega\hat\omega^T]^{-1/2} E_n[\omega] \|_\infty \right| > 1/(\sqrt{n}\log k) \right),
\end{aligned} \tag{16}
\]
where the last inequality uses Lemma 1 and an anti-concentration argument, which implies that
\[
\sup_t P\left( \left| \|Z\|_\infty - t \right| \le 1/(\sqrt{n}\log k) \right) = o(1).
\]
To see how the anti-concentration argument works, observe that
\[
\sup_t P\left( \left| \|Z\|_\infty - t \right| \le 1/(\sqrt{n}\log k) \right) \le \sup_{z \in \mathbb{R}^k} P\left( z < Z \le z + 1/(\sqrt{n}\log k) \right) + \sup_{z \in \mathbb{R}^k} P\left( z - 1/(\sqrt{n}\log k) \le Z \le z \right).
\]
Then Nazarov's anti-concentration inequality (Lemma A.1 in Chernozhukov, Chetverikov, and Kato (2017b)) implies that the first term on the right-hand side satisfies
\[
\sup_{z \in \mathbb{R}^k} P\left( z < Z \le z + 1/(\sqrt{n}\log k) \right) \le C (n \log k)^{-1/2} = o(1),
\]
where $C$ is a constant that depends only on $b$ from Assumption 2 (b). The second term on the right-hand side follows by a similar argument. Now, for the remaining term in Equation (16), note that
\[
\begin{aligned}
\left| \| E[\omega\omega^T]^{-1/2} E_n[\omega] \|_\infty - \| E_n[\hat\omega\hat\omega^T]^{-1/2} E_n[\omega] \|_\infty \right|
&\le \left\| \left( E_n[\hat\omega\hat\omega^T]^{-1/2} - E[\omega\omega^T]^{-1/2} \right) E_n[\omega] \right\|_\infty \\
&\le \left\| E_n[\hat\omega\hat\omega^T]^{-1/2} - E[\omega\omega^T]^{-1/2} \right\| \, \| E_n[\omega] \| \\
&= O_P\left( (n^{1/\nu} \vee \ell_k c_k) \sqrt{\frac{\xi_k^2 \log k}{n}} \right) = o_P(1)
\end{aligned}
\]
follows from Lemma 2, Lemma 4, and Assumption 2 (c)-(iv). This verifies Equation (14).

We next show Equation (15). In a similar way to Equation (16), we can bound
\[
\begin{aligned}
&\sup_t \left| P\left( \| E[\omega\omega^T]^{-1/2} E_n[\eta\omega] \|_\infty \le t \right) - P\left( \| E_n[\hat\omega\hat\omega^T]^{-1/2} E_n[\eta\hat\omega] \|_\infty \le t \right) \right| \\
&\le o(1) + P\left( \left| \| E[\omega\omega^T]^{-1/2} E_n[\eta\omega] \|_\infty - \| E_n[\hat\omega\hat\omega^T]^{-1/2} E_n[\eta\hat\omega] \|_\infty \right| > 1/(\sqrt{n}\log k) \right).
\end{aligned}
\]
The bound
\[
\begin{aligned}
&\left| \| E[\omega\omega^T]^{-1/2} E_n[\eta\omega] \|_\infty - \| E_n[\hat\omega\hat\omega^T]^{-1/2} E_n[\eta\hat\omega] \|_\infty \right| \\
&\le \left\| \left( E_n[\hat\omega\hat\omega^T]^{-1/2} - E[\omega\omega^T]^{-1/2} \right) E_n[\eta\omega] \right\|_\infty + \left\| \left( E_n[\hat\omega\hat\omega^T]^{-1/2} - E[\omega\omega^T]^{-1/2} \right) E_n[\omega] \right\|_\infty \\
&\quad + \left\| \left( E_n[\hat\omega\hat\omega^T]^{-1/2} - E[\omega\omega^T]^{-1/2} \right) E_n[\eta(\hat\omega - \omega)] \right\|_\infty + \left\| E[\omega\omega^T]^{-1/2} E_n[\eta(\hat\omega - \omega)] \right\|_\infty \\
&\le \left\| E_n[\hat\omega\hat\omega^T]^{-1/2} - E[\omega\omega^T]^{-1/2} \right\| \, \| E_n[\eta\omega] \| + \left\| E_n[\hat\omega\hat\omega^T]^{-1/2} - E[\omega\omega^T]^{-1/2} \right\| \, \| E_n[\omega] \| \\
&\quad + \left( \left\| E_n[\hat\omega\hat\omega^T]^{-1/2} - E[\omega\omega^T]^{-1/2} \right\| + \left\| E[\omega\omega^T]^{-1/2} \right\| \right) \| E_n[\eta(\hat\omega - \omega)] \| \\
&\le O_P\left( (n^{1/\nu} \vee \ell_k c_k) \sqrt{\frac{\xi_k^2 \log k}{n}} \right) + O_P(1) \, \| E_n[\eta(\hat\omega - \omega)] \| = o(1)
\end{aligned}
\]
follows from Lemma 2, Lemma 3, Lemma 4, and the fact that with probability $1 - o(1)$,
\[
\begin{aligned}
\| E_n[\eta(\hat\omega - \omega)] \|
&= \left\| \left( E_n[\eta p_k(X) p_k(X)^T] \right) E_n[p_k(X) p_k(X)^T]^{-1} E_n[\omega] \right\| \\
&\le \left\| E_n[\eta p_k(X) p_k(X)^T] \right\| \, \left\| E_n[p_k(X) p_k(X)^T]^{-1} \right\| \, \| E_n[\omega] \| \\
&= O\left( \sqrt{\frac{\xi_k^2 \log k}{n}} \right) = o(1).
\end{aligned}
\]
Note that we have used $\| E_n[p_k(X) p_k(X)^T]^{-1} \| = O_P(1)$. To see this, observe that $\| E_n[p_k(X) p_k(X)^T] - E[p_k(X) p_k(X)^T] \| = o_P(1)$ following Lemma 6.2 in Belloni et al. (2015) under Assumption 2 (c)-(iv). Therefore, all eigenvalues of $E_n[p_k(X) p_k(X)^T]$ are bounded away from zero with probability approaching one, following the same argument as in the proof of Lemma 4. This verifies Equation (15).

B Proofs for the Auxiliary Lemmas

This section contains the proofs of the lemmas in Appendix A.2.

B.1 Proof of Lemma 1

Proof.

Observe that $E[\omega] = 0$. The first statement follows from Proposition 2.1 in Chernozhukov et al. (2017a) under their Conditions (M.1), (M.2), and (E.2), which are implied by our Assumption 2 (b). The second follows from the same proposition in Chernozhukov et al. (2017a): note that Conditions (M.1), (M.2), and (E.2) and the independence between $\eta$ and the data imply $E[(\eta (E[\omega\omega^T]^{-1/2})_j \omega)^2] \ge b$, $E[|\eta (E[\omega\omega^T]^{-1/2})_j \omega|^{2+\kappa}] \le B_n^{\kappa}$, and $E[\| \eta E[\omega\omega^T]^{-1/2} \omega \|_\infty^q] \le B_n^q$. Finally, the statement on covariance equality is implied by the first two statements, Proposition 2.1 in Chernozhukov et al. (2017a), and the equality
\[
E\left[ E[\omega\omega^T]^{-1/2} \omega \left( E[\omega\omega^T]^{-1/2} \omega \right)^T \right] = E\left[ \eta^2 E[\omega\omega^T]^{-1/2} \omega \left( E[\omega\omega^T]^{-1/2} \omega \right)^T \right].
\]

B.2 Proof of Lemma 2

Proof.

By Jensen's inequality, we have
\[
E[\| E_n[\omega] \|] = E[(E_n[\omega]^T E_n[\omega])^{1/2}] \le (E[E_n[\omega]^T E_n[\omega]])^{1/2} = \sqrt{\frac{1}{n}} \, E[\omega^T\omega]^{1/2}
\]
and
\[
\begin{aligned}
E[\| E_n[(\eta+1)\omega] \|] &= E[(E_n[(\eta+1)\omega]^T E_n[(\eta+1)\omega])^{1/2}] \le (E[E_n[(\eta+1)\omega]^T E_n[(\eta+1)\omega]])^{1/2} \\
&= \sqrt{\frac{1}{n}} \, (E[(\eta+1)^2 \omega^T\omega])^{1/2} = \sqrt{\frac{2}{n}} \, (E[\omega^T\omega])^{1/2}.
\end{aligned}
\]
Note that we used the independence between $\eta$ and the data. We can further bound
\[
\begin{aligned}
E[\omega^T\omega]^{1/2} &= \left( E[\| p_k(X) \|^2 (Y - p_k(X)^T Q^{-1} E[p_k(X) Y])^2] \right)^{1/2} = \left( E[\| p_k(X) \|^2 (Y - p_k(X)^T \bar\beta)^2] \right)^{1/2} \\
&\le \xi_k \left( E[(Y - p_k(X)^T \bar\beta)^2] \right)^{1/2} \le \xi_k \left( E[Y^2] \right)^{1/2}.
\end{aligned}
\]

B.3 Proof of Lemma 3

Proof.

By the second statement of Lemma 6.1 in Belloni et al. (2015), we have
\[
E\left[ \left\| E_n[\eta p_k(X) p_k(X)^T] \right\| \,\middle|\, \{Y_i, X_i\} \right] = O\left( \sqrt{\frac{\log k}{n}} \left\| \left( E_n[(p_k(X) p_k(X)^T)^2] \right)^{1/2} \right\| \right).
\]
We can further bound the norm part by
\[
\left\| \left( E_n[(p_k(X) p_k(X)^T)^2] \right)^{1/2} \right\| = \left\| \left( E_n[p_k(X) (p_k(X)^T p_k(X)) p_k(X)^T] \right)^{1/2} \right\| \le \xi_k \left\| E_n[p_k(X) p_k(X)^T]^{1/2} \right\|.
\]
By Belloni et al. (2015, Theorem 4.6), we have $\| E_n[p_k(X) p_k(X)^T]^{1/2} \| = O_P(1)$ under Assumption 2 (c).

B.4 Proof of Lemma 4

Proof.

By Lemma A.2 of Belloni et al. (2015), we can bound
\[
\left\| E_n[\hat\omega\hat\omega^T]^{-1/2} - E[\omega\omega^T]^{-1/2} \right\| \le \left\| E_n[\hat\omega\hat\omega^T]^{-1} - E[\omega\omega^T]^{-1} \right\| \, \left\| E[\omega\omega^T] \right\|^{1/2}.
\]
Observe that by Jensen's inequality, $\{ E[\max_{1 \le i \le n} |Y_i - g(X_i)|^2] \}^{1/2} = O(n^{1/\nu})$ under Assumption 2 (c)-(i). Applying Theorem 4.6 in Belloni et al. (2015), we have
\[
\left\| E_n[\hat\omega\hat\omega^T] - E[\omega\omega^T] \right\| = O_P\left( (n^{1/\nu} \vee \ell_k c_k) \sqrt{\frac{\xi_k^2 \log k}{n}} \right)
\]
under Assumptions 2 (a) and 2 (c). Notice that $\| E[\omega\omega^T]^{-1} \| = O(1)$ and $\| E[\omega\omega^T] \| = O(1)$.

We now claim that $\| E_n[\hat\omega\hat\omega^T]^{-1} \| = O_P(1)$. In fact, all eigenvalues of $E_n[\hat\omega\hat\omega^T]$ are bounded away from zero. To see this, assume without loss of generality that $E[\omega\omega^T] = I$. Suppose that at least one of the eigenvalues of $E_n[\hat\omega\hat\omega^T]$ is strictly smaller than $1/2$. Then there exists a vector $a \in \mathbb{R}^k$ on the unit sphere such that $a^T E_n[\hat\omega\hat\omega^T] a < 1/2$, and hence
\[
\left\| E_n[\hat\omega\hat\omega^T] - E[\omega\omega^T] \right\| \ge \left| a^T (E_n[\hat\omega\hat\omega^T] - E[\omega\omega^T]) a \right| = \left| a^T E_n[\hat\omega\hat\omega^T] a - 1 \right| > 1/2,
\]
a contradiction. This implies that all eigenvalues of $E_n[\hat\omega\hat\omega^T]^{-1}$ are bounded from above, and thus the claim. Hence, we have
\[
\left\| E_n[\hat\omega\hat\omega^T]^{-1} - E[\omega\omega^T]^{-1} \right\| \le \left\| E_n[\hat\omega\hat\omega^T]^{-1} \right\| \, \left\| E_n[\hat\omega\hat\omega^T] - E[\omega\omega^T] \right\| \, \left\| E[\omega\omega^T]^{-1} \right\|,
\]
which, combined with the above bound, yields
\[
\left\| E_n[\hat\omega\hat\omega^T]^{-1/2} - E[\omega\omega^T]^{-1/2} \right\| = O_P\left( (n^{1/\nu} \vee \ell_k c_k) \sqrt{\frac{\xi_k^2 \log k}{n}} \right).
\]
Therefore, the statement of the lemma follows.
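As a reminder of why the difference of inverses in the proof above is controlled by the difference of the matrices themselves, the following worked step spells out the standard identity behind that inequality. The shorthand $\hat\Omega = E_n[\hat\omega\hat\omega^T]$ and $\Omega = E[\omega\omega^T]$ is introduced only for this note and is not the paper's notation.
\[
\hat\Omega^{-1} - \Omega^{-1} = \hat\Omega^{-1} (\Omega - \hat\Omega) \Omega^{-1}
\quad \Longrightarrow \quad
\| \hat\Omega^{-1} - \Omega^{-1} \| \le \| \hat\Omega^{-1} \| \, \| \hat\Omega - \Omega \| \, \| \Omega^{-1} \|.
\]
The identity can be checked by multiplying out the right-hand side, and the inequality is submultiplicativity of the operator norm. The eigenvalue claim in the proof is an instance of Weyl's inequality for symmetric matrices,
\[
\lambda_{\min}(\hat\Omega) \ge \lambda_{\min}(\Omega) - \| \hat\Omega - \Omega \|,
\]
so that, once $\| \hat\Omega - \Omega \| \le 1/2$ and $\Omega = I$, every eigenvalue of $\hat\Omega$ is at least $1/2$ and $\| \hat\Omega^{-1} \| \le 2$.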

C Simulation Analysis

In this section, we use Monte Carlo simulations to check whether the proposed method worksas the theory claims. Consider the following data generating process. Y ( t, x, u ) = 0 . t − . x + uT ( x ) =  . x if x <

00 if x ≥ T to mimic the actual policy schedule that we use in ourempirical analysis in Section 5.1. Allowing for the endogeneity of X , we generate ( X, U ) fromthe bivariate normal distribution with E [ X ] = E [ U ] = 0, V ar ( X ) = 1 . Cov ( X, U ) = 0 . V ar ( U ) = 0 .

10. In this data generating process, the true partial eﬀect is h (0) = 0 . n = 1000, 2000 and 4000. We implementthe algorithm in Section 5.1 with the kink location at 0 and the subsample with X ∈ [ − , M = 2500. We experiment with k ∈ { , , } and set δ = δ = 0 .

01 throughout. Each set of simulations is based on 10,000 Monte Carlo iterations.

                                      Average Length                   Coverage
Sieve                                 Sample Size n                    Sample Size n
Dimension                             1000      2000      4000        1000     2000     4000
k = 4    No Shape Restriction         0.656     0.470     0.338       0.948    0.947    0.949
         Shape Restrictions (13)      0.647     0.470     0.338       0.948    0.947    0.949
k = 8    No Shape Restriction         6.039     4.283     3.037       0.950    0.950    0.948
         Shape Restrictions (13)      3.519     2.646     2.020       0.950    0.950    0.948
k = 12   No Shape Restriction         20.675    14.679    10.406      0.942    0.941    0.942
         Shape Restrictions (13)      10.819    7.879     5.690       0.942    0.941    0.942

Table 2: Average lengths and coverage frequencies of the 95% confidence intervals under alternative shape restrictions. All the results are based on 10,000 Monte Carlo iterations.

Table 2 summarizes average lengths and coverage frequencies of the 95% confidence intervals under alternative shape restrictions across the three different sample sizes, n = 1000, 2000 and 4000. First, note that the lengths decrease as the sample size n increases for each sieve dimension k and for each set of shape restrictions. Second, observe that the coverage frequencies are quite close to the nominal probability 95% for each sieve dimension k and for each set of shape restrictions. Third, when the sieve dimension takes k ∈ {8, 12}, the shape restriction (13) contributes to shrinking the average lengths without sacrificing the coverage frequencies. These results imply that shape restrictions contribute to more informative statistical inference.

References

Armstrong, T. B. (2015): “Adaptive testing on a regression function at a point,” The Annals of Statistics, 43, 2086–2101.

Armstrong, T. B. and M. Kolesár (2018): “Finite-sample optimal estimation and inference on average treatment effects under unconfoundedness,” Working paper.

——— (2020): “Simple and honest confidence intervals in nonparametric regression,” Quantitative Economics, 11, 1–39.

Babii, A. and R. Kumar (2019): “Isotonic regression discontinuity designs,” Available at SSRN 3458127.

Belloni, A., V. Chernozhukov, D. Chetverikov, and K. Kato (2015): “Some new asymptotic theory for least squares series: Pointwise and uniform results,” Journal of Econometrics, 186, 345–366.

Cai, T. T., M. G. Low, and Y. Xia (2013): “Adaptive confidence intervals for regression functions under shape constraints,” The Annals of Statistics, 41, 722–750.

Card, D., D. S. Lee, Z. Pei, and A. Weber (2015): “Inference on causal effects in a generalized regression kink design,” Econometrica, 83, 2453–2483.

Chen, H., H. D. Chiang, and Y. Sasaki (2020): “Quantile treatment effects in regression kink designs,” Econometric Theory, 36, 1167–1191.

Chen, X. (2007): “Large sample sieve estimation of semi-nonparametric models,” Handbook of Econometrics, 6, 5549–5632.

Chen, X., V. Chernozhukov, I. Fernández-Val, S. Kostyshak, and Y. Luo (2018): “Shape-enforcing operators for point and interval estimators,” arXiv preprint arXiv:1809.01038.

Chernozhukov, V., D. Chetverikov, and K. Kato (2015a): “Comparison and anti-concentration bounds for maxima of Gaussian random vectors,” Probability Theory and Related Fields, 162, 47–70.

——— (2017a): “Central limit theorems and bootstrap in high dimensions,” The Annals of Probability, 45, 2309–2352.

——— (2017b): “Central limit theorems and bootstrap in high dimensions,” The Annals of Probability, 45, 2309–2352.

——— (2018): “Inference on causal and structural parameters using many moment inequalities,” Review of Economic Studies, forthcoming.

Chernozhukov, V., W. K. Newey, and A. Santos (2015b): “Constrained conditional moment restriction models,” arXiv preprint arXiv:1509.06311.

Chetverikov, D., A. Santos, and A. M. Shaikh (2018): “The econometrics of shape restrictions,” Annual Review of Economics, 10, 31–63.

Chiang, H. D. and Y. Sasaki (2019): “Causal inference by quantile regression kink designs,” Journal of Econometrics, 210, 405–433.

Dong, Y. (2016): “Jump or kink? Regression probability jump and kink design for treatment effect evaluation,” Unpublished Manuscript.

Dümbgen, L. (2003): “Optimal confidence bands for shape-restricted curves,” Bernoulli, 9, 423–449.

Fang, Z., A. Santos, A. Shaikh, and A. Torgovitsky (2020): “Inference for Large-Scale Linear Systems with Known Coefficients,” University of Chicago, Becker Friedman Institute for Economics Working Paper.

Fang, Z. and J. Seo (2019): “A general framework for inference on shape restrictions,” arXiv preprint arXiv:1910.07689.

Freyberger, J. and J. L. Horowitz (2015): “Identification and shape restrictions in nonparametric instrumental variables estimation,” Journal of Econometrics, 189, 41–53.

Freyberger, J. and B. Reeves (2018): “Inference under shape restrictions,” Available at SSRN 3011474.

Horowitz, J. L. and S. Lee (2017): “Nonparametric estimation and inference under shape restrictions,” Journal of Econometrics, 201, 108–126.

Kato, K., Y. Sasaki, and T. Ura (2021): “Robust inference in deconvolution,” Quantitative Economics, 12, 109–142.

Landais, C. (2015): “Assessing the welfare effects of unemployment benefits using the regression kink design,” American Economic Journal: Economic Policy, 7, 243–78.

Moffitt, R. A. (1985): The effect of the duration of unemployment benefits on work incentives: an analysis of four data sets, vol. 85, US Department of Labor, Employment and Training Administration.

Mogstad, M., A. Santos, and A. Torgovitsky (2018): “Using instrumental variables for inference about policy relevant treatment parameters,” Econometrica, 86, 1589–1619.

Nielsen, H. S., T. Sørensen, and C. Taber (2010): “Estimating the effect of student aid on college enrollment: Evidence from a government grant policy reform,” American Economic Journal: Economic Policy, 2, 185–215.

Noack, C. and C. Rothe (2019): “Bias-aware inference in fuzzy regression discontinuity designs,” arXiv preprint arXiv:1906.04631.

Samworth, R. and B. Sen (2018): “Special issue on “Nonparametric inference under shape constraints”,” Statistical Science, 33, 469–472.

Schennach, S. M. (2020): “A bias bound approach to non-parametric inference,” The Review of Economic Studies, 87, 2439–2472.

Zhu, Y. (2020): “Inference in nonparametric/semiparametric moment equality models with shape restrictions,”