Nonparametric Regression with Multiple Thresholds: Estimation and Inference
Yan-Yu Chiou^a, Mei-Yuan Chen^b,*, Jau-er Chen^c,*

a Institute of Economics, Academia Sinica, Taiwan.
b Department of Finance, National Chung Hsing University, Taiwan.
c Department of Economics, National Taiwan University, Taiwan.
Journal of Econometrics
We are grateful to the two anonymous referees for their constructive comments that have greatly improved this paper. We thank Ming-Yen Cheng for valuable discussions, and thank Zongwu Cai and the participants at the International Symposium on Recent Developments in Econometric Theory with Applications in Honor of Professor Takeshi Amemiya for their helpful comments. The usual disclaimer applies. *Corresponding authors: National Chung Hsing University, Department of Finance, 250 Kuo Kuang Road, Taichung 402, Taiwan. Tel.: +886-4-22853323. E-mail address: mei-…@nchu.edu.tw (Mei-Yuan Chen); National Taiwan University, Department of Economics, No. 1, Sec. 4, Roosevelt Road, Taipei 10617, Taiwan. Tel.: +886-2-3366-8326. E-mail address: …@ntu.edu.tw (Jau-er Chen).

ABSTRACT
This paper examines nonparametric regression with an exogenous threshold variable, allowing for an unknown number of thresholds. Given the number of thresholds and the corresponding threshold values, we first establish the asymptotic properties of the local constant estimator for a nonparametric regression with multiple thresholds. However, the number of thresholds and the corresponding threshold values are typically unknown in practice. We then use our testing procedure to determine the unknown number of thresholds and derive the limiting distribution of the proposed test. The Monte Carlo simulation results indicate the adequacy of the modified test and the accuracy of the sequential estimation of the threshold values. We apply our testing procedure to an empirical study of the 401(k) retirement savings plan with income thresholds.

Keywords: nonparametric regression, threshold variable, threshold value, significance test
JEL Classification: C12; C13; C14

1. Introduction
Piecewise linearity has been widely used to model shifts in economic relationships under a regression framework. Most regressions with piecewise linearity can be represented as linear regressions with thresholds. For example, linear regressions with structural changes can be written as linear threshold regressions with the time index as the threshold variable. Among previous studies, Bai and Perron (1998, 2003), Qu and Perron (2007), and Yamamoto and Perron (2013) estimate and test linear regressions with structural changes, and Chen (2008), Qu (2008), and Oka and Qu (2011) estimate and test linear quantile regressions with structural changes. The threshold model splits the sample into classes based on the value of an observed variable (i.e., whether it exceeds a certain threshold). In empirical work, determining the threshold of economic variables such as tax rates, as well as the optimal public debt ratio, is relevant for policy makers. When the threshold is unknown, as is typical in practice, it needs to be estimated, and this consequently increases the complexity of the econometric problem. Nonetheless, theories of estimation and inference are well developed for linear models with exogenous regressors, including the works by Chan (1993), Hansen (1996, 1999, 2000), and Caner (2002).

The scope of threshold models has broadened considerably in recent years. In particular, discussions of piecewise linearity have been extended to nonparametric regressions. Su and Xiao (2008), for instance, test for structural changes in time-series nonparametric regression models, while Chen and Hong (2012) investigate how to test for smooth structural changes in time-series models by using nonparametric regressions. In addition, Chen and Hong (2013) extend their earlier study to test for smooth structural changes in panel data models. In economics, the regression discontinuity (RD) design has gradually emerged as a common tool in applied research. The validity of RD estimates depends crucially both on the threshold variable (also termed the running variable in the RD literature) and on an adequate description of the conditional mean function of the outcome variable. Since what looks like a jump at the threshold might simply be unaccounted-for nonlinearity, the nonparametric approach plays an important role in RD estimation (cf. Angrist and Pischke, 2009). For example, by allowing for an unknown threshold value in the RD framework, Henderson, Parmeter, and Su (2014) provide estimation and inference procedures for the threshold value in a nonparametric regression with one threshold. Although related to Henderson et al. (2014), which is a pioneering study examining nonparametric regression with one threshold, our study analyzes nonparametric regression with multiple thresholds. Further, in contrast to Henderson et al. (2014), the threshold variable is excluded from the explanatory variables in our framework. In empirical applications, multiple thresholds might be present; however, the number of thresholds and the corresponding threshold values are typically unknown in practice. Therefore, identifying the unknown number of thresholds and estimating the threshold values are critical issues in a nonparametric regression with multiple thresholds, especially when conducting empirical studies. We thus propose a testing procedure to determine the unknown number of thresholds and derive the limiting distribution of the proposed test.
To the best of our knowledge, the present study is the first to comprehensively investigate the aforementioned issues. This study develops a test procedure for testing the existence of thresholds, determining the number of thresholds, and estimating the values of thresholds in nonparametric regression. Specifically, this procedure is a modified significance test based on the work of Aït-Sahalia et al. (2001). In addition, we establish the consistency and asymptotic normality of the threshold value estimators by using the sequential method. Hence, this study complements the existing literature on estimating and testing multiple thresholds in nonparametric regression models. Further, we apply our testing procedure to an empirical study of the 401(k) retirement savings plan with income thresholds and identify four threshold values. Those crucial income threshold values are all above the median income value.

The rest of the paper is organized as follows. The model specification and estimation for a nonparametric regression with thresholds are introduced in Section 2. This section also summarizes the necessary assumptions for deriving our theoretical results for the test statistics and estimators under known thresholds. Section 3 provides the test determining the unknown number of thresholds. Section 4 presents the statistical properties of the multiple-threshold estimator. Section 5 investigates the performance of these tests by using Monte Carlo studies, while Section 6 presents an empirical application. Section 7 concludes. All the technical proofs are collected in the Appendix.

2. Model Specification and Estimation

We first fix the notation and consider the following threshold model, which is a nonparametric regression with s thresholds and known threshold values:

    E(Y | X, Q) = Σ_{j=1}^{s+1} m_{γ_j}(X) I_{γ_j}(Q),

where Y is the outcome variable, X is a vector of covariates, Q is the threshold variable, which is used to split the sample into distinct regimes, γ_1, γ_2, ..., γ_s are the corresponding threshold values, and I_{γ_j}(Q) denotes an indicator function defined as

    I_{γ_j}(Q) = 1 if Q ∈ [γ_{j−1}, γ_j), and 0 otherwise,

with γ_0 = −∞ and γ_{s+1} = ∞. Accordingly, the conditional mean of the j-th regime at a grid point x = [x_1, ..., x_p]′ can be represented as

    m_{γ_j}(x) = E(Y | X = x, I_{γ_j}(Q) = 1) = ∫ y [f_{γ_j}(y, x) / f_{γ_j}(x)] dy,

where f_{γ_j}(y, x) = ∫ I_{γ_j}(q) f(y, x, q) dq and f_{γ_j}(x) = ∫ I_{γ_j}(q) f(x, q) dq denote the joint density function of Y and X and the marginal density of X in the j-th regime, respectively.

Given a sample with observations {(Y_i, X_i′, Q_i)′, i = 1, ..., n}, the nonparametric regression with s known thresholds is specified as

    Y_i = Σ_{j=1}^{s+1} m_{γ_j}(X_i) I_{γ_j}(Q_i) + e_i,    (1)

where Y_i, X_i, and Q_i are the i-th sample observations of Y, X, and Q, respectively, and e_i is the regression error. Note that the threshold values satisfy γ_1 < γ_2 < ... < γ_s.

Given a p-dimensional product kernel function K(u), in which K_h(u) is defined as K_h(u) ≡ h^{−p} K(u/h), the sample kernel density estimators of f_{γ_j}(y, x) and f_{γ_j}(x) are

    f̂_{γ_j}(y, x) = (1/n) Σ_{i=1}^{n} K_h(X_i − x) I_{γ_j}(Q_i) K_h(Y_i − y),    (2)
    f̂_{γ_j}(x) = (1/n) Σ_{i=1}^{n} K_h(X_i − x) I_{γ_j}(Q_i).    (3)

Thus, the standard Nadaraya-Watson kernel regression estimator of m_{γ_j}(x) is

    m̂_{γ_j}(x) = [Σ_{i=1}^{n} K_h(X_i − x) I_{γ_j}(Q_i) Y_i] / [Σ_{i=1}^{n} K_h(X_i − x) I_{γ_j}(Q_i)].    (4)
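To fix ideas, the regime-wise estimator in (4) can be computed directly. The following Python sketch is illustrative only (it is not the authors' code): it assumes a Gaussian product kernel and hypothetical arrays X (n × p), Y, and Q.

    import numpy as np

    def nw_regime_estimate(x0, X, Y, Q, lo, hi, h):
        """Local constant estimate of the regime mean in eq. (4):
        only observations with Q_i in [lo, hi) enter the kernel average."""
        u = (X - x0) / h
        # Gaussian product kernel: K_h(u) = h^{-p} * prod_j phi(u_j)
        kern = np.exp(-0.5 * u ** 2).prod(axis=1) / (np.sqrt(2 * np.pi) * h) ** X.shape[1]
        w = kern * ((Q >= lo) & (Q < hi))    # indicator I_{[lo,hi)}(Q_i)
        return np.sum(w * Y) / np.sum(w)

For the j-th regime one sets lo = γ_{j−1} and hi = γ_j (with γ_0 = −∞ and γ_{s+1} = ∞).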
To establish the asymptotic properties of the conditional mean estimator m̂_{γ_j}(x) and the density estimator f̂_{γ_j}(y, x) in the j-th regime, as well as the convergence rate of the optimal bandwidth selector, we make the following assumptions.

Assumption 1. The following assumptions are specified for the random variables under study.
1-1. Z_i = (Y_i, X_i, Q_i) is strictly stationary, ergodic, and β-mixing with β coefficients, for some fixed ε >
0, satisfying Σ_{k=1}^{∞} k² [β(k)]^{ε/(1+ε)} < ∞.
1-2. The density f(y, x, q) is bounded away from zero and globally integrable on the compact support S of the weighting function a(·), where a(·) is defined in Section 3.1 when we construct the proposed test statistic. Hence inf_S f(x, q) ≡ b_0 > 0.
1-3. The joint density f_{1,j} of (Z_1, Z_j) exists for all j and is continuous on (R × S)².
1-4. E[e_i⁴ | X_i = x, Q_i = q] < ∞, E(e_i² | X_i = x, Q_i = q) = σ²(x, q), and σ²(x, q) is square-integrable on S.
1-5. ∫ |m_{γ_l}(x) − m_{γ_k}(x)|² dx ≠ 0 for l, k = 1, ..., s + 1 and l ≠ k.

Assumption 2. The following assumptions are imposed on the kernel function.
2-1. K is a product kernel, K = K_1 × ··· × K_p = K^p, given K_i = K for all i, and a bounded function on R^p, symmetric about 0, with ∫ |K(z)| dz < ∞, ∫ K(u) du = 1, ∫ u^j K(u) du = 0, j = 1, ..., r −
1, and ∫ u^r K(u) du < ∞.
2-2. The kernel K is r-th order continuously differentiable, with r > p/2.

Assumption 3. The following assumptions are imposed on the bandwidth selector.
3-1. As n → ∞, h → 0, nh^p → ∞, and nh^{p+2r+2} → 0.
3-2. As n → ∞, the bandwidth sequence h = O(n^{−1/δ}) is such that 2p < δ < 2r + p/2.
3-3. As n → ∞, h → 0, nh^{p/2} → ∞, and nh^{p/2} h^{2r} → 0.

There is no need to use a higher-order kernel (r >
2) unless the dimensionality of the covariate is greater than or equal to 3. Assumption 3 imposes joint restrictions on the bandwidth sequence h, the order of the kernel r, the dimensionality of the covariate p, and the sample size n. In particular, when p = 1 and r = 2, the restriction 2p < δ < 2r + p/2 becomes 2 < δ < 4.5. We set δ = 4.
25, which suffices for the nonparametric estimator to possess valid asymptotic properties.
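In later sections the bandwidth is computed as h = c · σ · n^{−1/δ}. A one-line helper makes the rule explicit; this is a sketch, with the sample standard deviation used as an assumed scale estimate.

    import numpy as np

    def rule_of_thumb_bandwidth(x, c=1.0, delta=4.25):
        """h = c * sigma * n^(-1/delta); delta = 4.25 matches p = 1, r = 2."""
        return c * np.std(x, ddof=1) * len(x) ** (-1.0 / delta)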
Assuming that the number of thresholds s and the corresponding threshold values are already known, the consistency and asymptotic normality of f̂_{γ_j}(x) are provided in Theorem 1, and the asymptotic properties of m̂_{γ_j}(x) are stated in Theorem 2.

Theorem 1.
Suppose that Assumptions 1, 2, and 3-1 hold. The following results are established.
a) The almost sure convergence rate of f̂_{γ_j}(x):

    sup_x |f̂_{γ_j}(x) − f_{γ_j}(x)| = O_p(h^r + (ln n)^{1/2}/(nh^p)^{1/2}), j = 1, ..., s + 1.

b) The asymptotic normality of f̂_{γ_j}(x):

    (nh^p)^{1/2} ( f̂_{γ_j}(x) − f_{γ_j}(x) − (h²/2) C_1 Σ_{l=1}^{p} f^{(2)}_{γ_j,l}(x) ) → N(0, C_2 f_{γ_j}(x)),

where C_1 = ∫ u² K(u) du, C_2 = [∫ K²(u) du]^p, and f^{(2)}_{γ_j,l}(x) = ∂² f_{γ_j}(x)/∂x_l². □

When the estimation is carried out at a single point x, we have the convergence rate O_p(h^r + 1/(nh^p)^{1/2}). In empirical applications, multiple x often appear, and then the estimator has the slower uniform convergence rate O_p(h^r + (ln n)^{1/2}/(nh^p)^{1/2}). Hence, from part b), the kernel-smoothing density estimation is biased. Given that a Gaussian product kernel is being used, we already know that C_1 = 1 and C_2 = 1/(2√π)^p according to Aït-Sahalia et al. (2001). Moreover, given that the number of thresholds s and the corresponding threshold values are known, the consistency and asymptotic normality of m̂_{γ_j}(x) are provided as follows.

Theorem 2.
Suppose that Assumptions 1, 2, and 3-1 hold. The following results are derived.
a) The almost sure convergence rate of m̂_{γ_j}(x):

    sup_x |m̂_{γ_j}(x) − m_{γ_j}(x)| = O_p(h^r + (ln n)^{1/2}/(nh^p)^{1/2}), j = 1, ..., s + 1.

b) The asymptotic normality of m̂_{γ_j}(x):

    (nh^p)^{1/2} [ m̂_{γ_j}(x) − m_{γ_j}(x) − AB(x) ] → N(0, C_2 σ²_{γ_j}(x)/f_{γ_j}(x)),

where AB(x) denotes the asymptotic bias,

    AB(x) = (1/2) h² C_1 Σ_{l=1}^{p} [ m^{(2)}_{γ_j,l}(x) f_{γ_j}(x) + 2 m^{(1)}_{γ_j,l}(x) f^{(1)}_{γ_j,l}(x) ] / f_{γ_j}(x),

and m^{(1)}_{γ_j,l}(x) = ∂m_{γ_j}(x)/∂x_l and m^{(2)}_{γ_j,l}(x) = ∂²m_{γ_j}(x)/∂x_l² are the first- and second-order derivatives of the j-th regime's conditional mean with respect to the l-th explanatory variable, respectively. □

It is now clear that the sample estimator m̂_{γ_j}(x) is also asymptotically biased. However, this asymptotic bias could be reduced by using higher-order kernels. Notice that the convergence rates and asymptotic results of f̂_{γ_j}(x) and m̂_{γ_j}(x) are not affected by s, the number of thresholds. In finite samples, the number of thresholds does affect the nonparametric estimation; however, in the limit, the convergence rate does not depend on s. Our results are therefore similar to those presented by Li and Racine (2007).

In nonparametric regressions, the bandwidth plays a crucial role in the estimation. Different bandwidth selection rules have been suggested in the literature. Among the selectors, the optimal bandwidth selector is the most comprehensively studied and is obtained by minimizing the mean integrated squared error (MISE). That is, for a model with s thresholds, the corresponding MISE is defined as

    MISE(h) = ∫∫ E[ Σ_{j=1}^{s+1} ( m̂_{γ_j}(x) − m_{γ_j}(x) )² I_{γ_j}(q) ] w(x) dx dq,    (5)

and then the optimal bandwidth selector is obtained from h_opt = arg min_h MISE(h). The weighting function w(x) is an indicator function selecting a particular x-region of interest, and this generally depends on the empirical study. Since the threshold variable q does not affect the convergence rate of the proposed estimator, we construct the weighting function without including the threshold variable. The convergence rate of h_opt is derived and summarized in the following theorem.

Theorem 3.
Under Assumptions 1, 2, and 3, the convergence rate of the optimal bandwidth selector is h_opt = O(n^{−1/δ}), in which δ = p + 2r. □

This result shows that the convergence rate of the optimal bandwidth selector depends on the number of covariates p and the order of continuous differentiability of the kernel function, but the convergence rate is not affected by the number of thresholds. In other words, additional thresholds do not worsen the curse-of-dimensionality problem.

3. Determining the Number of Thresholds

The number of thresholds and the corresponding threshold values are typically unknown in practice. In this section, we thus present a procedure for determining the unknown number of thresholds and estimating the threshold values. In linear regressions with thresholds, the number of thresholds is commonly determined by carrying out a sequential significance test (see Hansen, 1997). This sequential test is conducted by comparing the estimated sum of squared errors from a model with s thresholds (under the null hypothesis) with that from a model with s + 1 thresholds (under the alternative) sequentially. The number of thresholds is determined as s when the null of s − 1 thresholds against the alternative of s thresholds is rejected, whereas the null of s thresholds against the alternative of s + 1 thresholds is not rejected. Similarly, we determine the number of thresholds in nonparametric regressions based on sequential tests in this study. Instead of comparing the estimated error sums of squares from linear regressions, however, we use the significance test suggested by Aït-Sahalia et al. (2001) for nonparametric regressions as the basis of the sequential tests. The test statistic for the null of s thresholds against s + 1 thresholds is constructed and its asymptotic distribution established as follows.

3.1. Testing for an extra threshold in a given regime

The test of Aït-Sahalia et al. (2001) is constructed to test the significance of a subset of covariates in a nonparametric regression. The intuition behind the test is to check the difference between the nonparametric regression estimates of the unconstrained and constrained conditional means. That is, the null of the significance test is written as

    H_0: Pr[ m(W, V) = m(W) ] = 1,    (6)

where W represents the p-dimensional explanatory variables, V is the q-dimensional explanatory variables under testing, m(w, v) and m(w) denote the conditional means under the alternative and null hypotheses, and f(w, v) and f(w) are the joint probability density functions of (w, v) and w, respectively.

To test the null of s thresholds against the alternative of s + 1, this test can be modified by taking W as the p × (s + 1) independent variables in the regression with s thresholds and V as the extra p independent variables in the regression with s + 1 thresholds. The significance of V implies that the regression with s + 1 thresholds must be considered. However, the regression remains with s thresholds if V is not significant. The details are discussed as follows. First, we construct the test for detecting whether an extra threshold (known at value τ_j) exists in the j-th regime. Second, since the threshold value τ_j is unknown in general, the test is extended to test whether an extra unknown threshold exists in the j-th regime.

Given a regression with s thresholds expressed as (1), suppose a new threshold τ_j is suspected to exist in the j-th regime [γ_{j−1}, γ_j).
Then, the conditional mean for the regime [γ_{j−1}, γ_j) is split into two parts: m_{γ_{j−1},τ_j}(X_i) I_{γ_{j−1},τ_j}(Q_i) in the regime [γ_{j−1}, τ_j) and m_{τ_j,γ_j}(X_i) I_{τ_j,γ_j}(Q_i) in the regime [τ_j, γ_j), where

    I_{γ_{j−1},τ_j}(Q_i) = 1 if Q_i ∈ [γ_{j−1}, τ_j), 0 else;  I_{τ_j,γ_j}(Q_i) = 1 if Q_i ∈ [τ_j, γ_j), 0 else,

and m_{γ_{j−1},τ_j}(x) is defined through

    f_{γ_{j−1},τ_j}(y, x) = ∫ I_{γ_{j−1},τ_j}(q) f(y, x, q) dq,
    f_{γ_{j−1},τ_j}(x) = ∫ I_{γ_{j−1},τ_j}(q) f(x, q) dq,
    m_{γ_{j−1},τ_j}(x) = E(Y_i | X_i = x, I_{γ_{j−1},τ_j}(Q_i) = 1) = ∫ y [f_{γ_{j−1},τ_j}(y, x)/f_{γ_{j−1},τ_j}(x)] dy,

and m_{τ_j,γ_j}(x) is defined similarly to m_{γ_{j−1},τ_j}(x).

Denote E(Y | X, Q; γ_1, ..., γ_s) as the conditional mean with s thresholds under the null and E(Y | X, Q; γ_1, ..., γ_{j−1}, τ_j, γ_j, ..., γ_s) as the conditional mean function with s + 1 thresholds under the alternative. Then, the null hypothesis for testing whether an extra threshold exists in the regime [γ_{j−1}, γ_j) can be written as

    H_0: Pr[ E(Y | X, Q; γ_1, ..., γ_s) = E(Y | X, Q; γ_1, ..., γ_{j−1}, τ_j, γ_j, ..., γ_s) ] = 1.

The sample statistic analogous to the statistic Γ(τ_j) in Aït-Sahalia et al. (2001) is constructed as

    Γ̃(τ_j) = (1/n) Σ_{i=1}^{n} { m̂_{γ_j}(X_i) I_{γ_j}(Q_i) − m̂_{γ_{j−1},τ_j}(X_i) I_{γ_{j−1},τ_j}(Q_i) − m̂_{τ_j,γ_j}(X_i) I_{τ_j,γ_j}(Q_i) }² a(X_i),    (7)

where m̂_{γ_j}(x), m̂_{γ_{j−1},τ_j}(x), and m̂_{τ_j,γ_j}(x) are the sample estimates of m_{γ_j}(x), m_{γ_{j−1},τ_j}(x), and m_{τ_j,γ_j}(x), respectively, and a(X_i) is a weighting function. Specifically, a(X) = 1{X ∈ C}, where C ⊂ R^p. The choice of C is application-dependent. For example, in an empirical analysis of options prices, a(X) can be set to exclude those in-the-money options with price biases. Similarly, it can be set by using prior information to tackle boundary effects so that the density is bounded away from zero. Since Γ̃(τ_j) is the weighted sum of squares of the differences from m̂_{γ_j}(x) to m̂_{γ_{j−1},τ_j}(x) and to m̂_{τ_j,γ_j}(x), the null hypothesis, Γ(τ_j) = 0, is not rejected when Γ̃(τ_j) is insufficiently large and is rejected when Γ̃(τ_j) is sufficiently large. Therefore, this inference is a right-tailed test. The asymptotic distribution of Γ̃(τ_j) is established as follows.

Theorem 4.
Under the null hypothesis and Assumptions 1, 2, and 3, the asymptotic normality of the statistic Γ̃(τ_j) is represented as

    σ^{−1}(τ_j) { nh^{p/2} Γ̃(τ_j) − h^{−p/2} ξ(τ_j) } →d N(0, 1),    (8)

where ξ(τ_j) and σ²(τ_j) denote the bias and variance terms, respectively. The bias term is ξ(τ_j) = C_2 [ξ_1(τ_j) + ξ_2(τ_j)] with

    ξ_1(τ_j) = ∫ σ²_{γ_j}(x) a(x) dx,
    ξ_2(τ_j) = ∫ (1 − f_{γ_{j−1},τ_j}(x)/f_{γ_j}(x))² σ²_{γ_{j−1},τ_j}(x) a(x) dx
             + ∫ (1 − f_{τ_j,γ_j}(x)/f_{γ_j}(x))² σ²_{τ_j,γ_j}(x) a(x) dx,

where C_2 was defined in Theorem 1, and the variance term is σ²(τ_j) = 2 C_3 [σ_1²(τ_j) + σ_2²(τ_j)] with

    σ_1²(τ_j) = ∫ σ⁴_{γ_j}(x) a²(x) dx,
    σ_2²(τ_j) = ∫ (1 − f_{γ_{j−1},τ_j}(x)/f_{γ_j}(x))² σ⁴_{γ_{j−1},τ_j}(x) a²(x) dx
              + ∫ (1 − f_{τ_j,γ_j}(x)/f_{γ_j}(x))² σ⁴_{τ_j,γ_j}(x) a²(x) dx,

where σ²_{γ_j}(x), σ²_{γ_{j−1},τ_j}(x), and σ²_{τ_j,γ_j}(x) are

    σ²_{γ_j}(x) = ∫ [y − m_{γ_j}(x)]² [f_{γ_j}(y, x)/f_{γ_j}(x)] dy = ∫ σ²(x, q) I_{γ_j}(q) [f(x, q)/f_{γ_j}(x)] dq,
    σ²_{γ_{j−1},τ_j}(x) = ∫ [y − m_{γ_{j−1},τ_j}(x)]² [f_{γ_{j−1},τ_j}(y, x)/f_{γ_{j−1},τ_j}(x)] dy = ∫ σ²(x, q) I_{γ_{j−1},τ_j}(q) [f(x, q)/f_{γ_{j−1},τ_j}(x)] dq,
    σ²_{τ_j,γ_j}(x) = ∫ [y − m_{τ_j,γ_j}(x)]² [f_{τ_j,γ_j}(y, x)/f_{τ_j,γ_j}(x)] dy = ∫ σ²(x, q) I_{τ_j,γ_j}(q) [f(x, q)/f_{τ_j,γ_j}(x)] dq,

and C_3 = ∫_w { ∫_u K(u) K(u + w) du }² dw. □

Note that Aït-Sahalia et al. (2001) also show that C_3 = 1/(2√π)^p when the Gaussian product kernel is used. Given the result in Theorem 4, we denote

    δ(τ_j) = σ^{−1}(τ_j) [ nh^{p/2} Γ̃(τ_j) − h^{−p/2} ξ(τ_j) ],

and then the test statistic for testing whether an extra threshold exists at τ_j in the j-th regime can be taken to be

    δ̂(τ_j) = σ̂^{−1}(τ_j) [ nh^{p/2} Γ̃(τ_j) − h^{−p/2} ξ̂(τ_j) ],

where σ̂² and ξ̂ are consistent estimators of σ² and ξ, respectively. The limiting distribution of δ̂(τ_j) is N(0, 1) under the null. The power of δ̂(τ_j) is investigated in Section 3.4; consequently, it is a consistent test. We describe the consistent estimation of σ² and ξ in the following subsections.
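The statistic Γ̃(τ_j) in (7) compares the single-regime fit with the two split fits at every sample point. A minimal sketch follows, reusing nw_regime_estimate from Section 2; the weighting function a(·) and the array names are assumptions of the illustration.

    import numpy as np

    def gamma_tilde(tau, lo, hi, X, Y, Q, h, a=lambda x: 1.0):
        """Eq. (7): weighted average squared gap between the fit on [lo, hi)
        and the split fits on [lo, tau) and [tau, hi)."""
        n = len(Y)
        total = 0.0
        for i in range(n):
            if not (lo <= Q[i] < hi):
                continue                        # all three indicators vanish
            m_full = nw_regime_estimate(X[i], X, Y, Q, lo, hi, h)
            m_split = (nw_regime_estimate(X[i], X, Y, Q, lo, tau, h)
                       if Q[i] < tau else
                       nw_regime_estimate(X[i], X, Y, Q, tau, hi, h))
            total += (m_full - m_split) ** 2 * a(X[i])
        return total / n

The studentized statistic δ̂(τ_j) then subtracts the estimated bias h^{−p/2} ξ̂(τ_j) from nh^{p/2} Γ̃(τ_j) and divides by σ̂(τ_j).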
3.2. Testing for an extra unknown threshold

In practice, τ_j is unknown a priori, and there are, in principle, infinitely many candidate values of τ_j in the regime [γ_{j−1}, γ_j). To make the test implementable, instead of infinitely many values, we consider only m candidate threshold values within the regime [γ_{j−1}, γ_j), i.e., γ_{j−1} < τ_{j,1} < τ_{j,2} < ... < τ_{j,m} < γ_j, where τ_{j,1} − γ_{j−1} = τ_{j,2} − τ_{j,1} = ··· = γ_j − τ_{j,m} = (γ_j − γ_{j−1})/m. Given the suspected m pseudo-thresholds τ_{j,1}, τ_{j,2}, ..., τ_{j,m}, the null of no extra unknown threshold can be written as

    H_0: Pr[ Γ(τ_{j,1}) = 0, Γ(τ_{j,2}) = 0, ..., Γ(τ_{j,m}) = 0 ] = 1.    (9)

Given the sample counterparts Γ̃(τ_{j,i}) of Γ(τ_{j,i}), i = 1, ..., m, as defined in (7), the following theorem reports the joint asymptotic distribution of the m statistics.

Theorem 5. Given that Assumptions 1, 2, and 3 hold, E(e_i² | X_i = x, Q_i = q) = σ²(x, q), and under the null,

    [δ*(τ_{j,1}), δ*(τ_{j,2}), ..., δ*(τ_{j,m})]′ = Σ^{−1/2} [δ(τ_{j,1}), δ(τ_{j,2}), ..., δ(τ_{j,m})]′ →d N(0, I_m),

where δ(τ_{j,k}) = σ^{−1}(τ_{j,k}) [ nh^{p/2} Γ̃(τ_{j,k}) − h^{−p/2} ξ(τ_{j,k}) ] and Σ is the variance-covariance matrix of δ(τ_{j,1}), ..., δ(τ_{j,m}). The (l, k)-element of Σ, assuming τ_{j,l} < τ_{j,k}, is

    Cov(δ(τ_{j,l}), δ(τ_{j,k})) = [σ_1²(τ_{j,l}) + σ_2²(τ_{j,l})]^{−1/2} [σ_1²(τ_{j,k}) + σ_2²(τ_{j,k})]^{−1/2} × φ(τ_{j,l}, τ_{j,k}),

where φ(τ_{j,l}, τ_{j,k}) is defined in the Appendix because of its complex form. □

Theorem 5 is applicable to nonparametric regressions with heteroskedastic errors whose variances depend on the values of X_i and Q_i, i.e., E(e_i² | X_i = x, Q_i = q) = σ²(x, q). (For the two restricted cases with heteroskedastic errors whose variances depend on the values of X_i but not on those of Q_i, i.e., E(e_i² | X_i = x, Q_i = q) = σ²(x), when X and Q are either dependent or independent, the joint asymptotic distribution of the m statistics is also derived but not provided in this paper. The detailed results and proofs of the corresponding asymptotic distributions are available from the authors upon request.) By replacing σ², ξ, and Σ in Theorem 5 with consistent estimates, namely σ̂², ξ̂, and Σ̂, respectively, we have

    [δ̂*(τ_{j,1}), δ̂*(τ_{j,2}), ..., δ̂*(τ_{j,m})]′ = Σ̂^{−1/2} [δ̂(τ_{j,1}), δ̂(τ_{j,2}), ..., δ̂(τ_{j,m})]′ →d N(0, I_m),    (10)

where δ̂(τ_{j,k}) = σ̂^{−1}(τ_{j,k}) [ nh^{p/2} Γ̃(τ_{j,k}) − h^{−p/2} ξ̂(τ_{j,k}) ].

3.3. Estimating the nuisance parameters

Given the asymptotic normality of the test statistic Γ̃(τ_j), the nuisance parameters must be estimated consistently. First, the parameter σ²_{γ_j}(x) can be estimated by using the Nadaraya-Watson estimator as follows:

    σ̂²_{γ_j}(x) = [Σ_{i=1}^{n} K_h(X_i − x) I_{γ_j}(Q_i) Y_i²] / [Σ_{i=1}^{n} K_h(X_i − x) I_{γ_j}(Q_i)] − m̂²_{γ_j}(x).    (11)

Thus, ξ and σ² can be estimated as ξ̂(τ_{j,k}) = C_2 (ξ̂_1(τ_{j,k}) + ξ̂_2(τ_{j,k})) with

    ξ̂_1(τ_{j,k}) = (1/n) Σ_{i=1}^{n} σ̂²_{γ_j}(X_i) a(X_i)/f̂(X_i),
    ξ̂_2(τ_{j,k}) = (1/n) Σ_{i=1}^{n} (1 − f̂_{γ_{j−1},τ_{j,k}}(X_i)/f̂_{γ_j}(X_i))² σ̂²_{γ_{j−1},τ_{j,k}}(X_i) a(X_i)/f̂(X_i)
                  + (1/n) Σ_{i=1}^{n} (1 − f̂_{τ_{j,k},γ_j}(X_i)/f̂_{γ_j}(X_i))² σ̂²_{τ_{j,k},γ_j}(X_i) a(X_i)/f̂(X_i),

and σ̂²(τ_{j,k}) = 2 C_3 (σ̂_1²(τ_{j,k}) + σ̂_2²(τ_{j,k})) with

    σ̂_1²(τ_{j,k}) = (1/n) Σ_{i=1}^{n} σ̂⁴_{γ_j}(X_i) a²(X_i)/f̂(X_i),
    σ̂_2²(τ_{j,k}) = (1/n) Σ_{i=1}^{n} (1 − f̂_{γ_{j−1},τ_{j,k}}(X_i)/f̂_{γ_j}(X_i))² σ̂⁴_{γ_{j−1},τ_{j,k}}(X_i) a²(X_i)/f̂(X_i)
                   + (1/n) Σ_{i=1}^{n} (1 − f̂_{τ_{j,k},γ_j}(X_i)/f̂_{γ_j}(X_i))² σ̂⁴_{τ_{j,k},γ_j}(X_i) a²(X_i)/f̂(X_i).
Further, the (l, k)-th element of Σ can be estimated as

    Ĉov(δ(τ_{j,l}), δ(τ_{j,k})) = [σ̂_1²(τ_{j,l}) + σ̂_2²(τ_{j,l})]^{−1/2} [σ̂_1²(τ_{j,k}) + σ̂_2²(τ_{j,k})]^{−1/2} × (ĉ_1 + ĉ_2 + ĉ_3 + ĉ_4 + ĉ_5 + ĉ_6 + ĉ_7 + ĉ_8 + ĉ_9),

where the terms ĉ_1 to ĉ_9 are

    ĉ_1 = (1/n) Σ_i [σ̂⁴_{γ_j}(X_i)/f̂(X_i)] a(X_i),
    ĉ_2 = −2 { (1/n) Σ_i [σ̂²_{γ_j}(X_i) σ̂²_{γ_{j−1},τ_{j,k}}(X_i)/f̂(X_i)] [f̂_{γ_{j−1},τ_{j,k}}(X_i)/f̂_{γ_j}(X_i)] a(X_i)
              + (1/n) Σ_i [σ̂²_{γ_j}(X_i) σ̂²_{τ_{j,k},γ_j}(X_i)/f̂(X_i)] [f̂_{τ_{j,k},γ_j}(X_i)/f̂_{γ_j}(X_i)] a(X_i) },
    ĉ_3 = (1/n) Σ_i [σ̂⁴_{γ_{j−1},τ_{j,k}}(X_i)/f̂(X_i)] [f̂_{γ_{j−1},τ_{j,k}}(X_i)/f̂_{γ_j}(X_i)] a(X_i)
         + (1/n) Σ_i [σ̂⁴_{τ_{j,k},γ_j}(X_i)/f̂(X_i)] [f̂_{τ_{j,k},γ_j}(X_i)/f̂_{γ_j}(X_i)] a(X_i),
    ĉ_4 = −2 { (1/n) Σ_i [σ̂²_{γ_j}(X_i) σ̂²_{γ_{j−1},τ_{j,l}}(X_i)/f̂(X_i)] [f̂_{γ_{j−1},τ_{j,l}}(X_i)/f̂_{γ_j}(X_i)] a(X_i)
              + (1/n) Σ_i [σ̂²_{γ_j}(X_i) σ̂²_{τ_{j,l},γ_j}(X_i)/f̂(X_i)] [f̂_{τ_{j,l},γ_j}(X_i)/f̂_{γ_j}(X_i)] a(X_i) },
    ĉ_5 = 4 { (1/n) Σ_i [σ̂²_{γ_j}(X_i) σ̂²_{γ_{j−1},τ_{j,l}}(X_i)/f̂(X_i)] [f̂_{γ_{j−1},τ_{j,l}}(X_i)/f̂_{γ_j}(X_i)] a(X_i)
             + (1/n) Σ_i [σ̂²_{γ_j}(X_i) σ̂²_{τ_{j,l},τ_{j,k}}(X_i)/f̂(X_i)] [f̂_{τ_{j,l},τ_{j,k}}(X_i)/f̂_{γ_j}(X_i)] a(X_i)
             + (1/n) Σ_i [σ̂²_{γ_j}(X_i) σ̂²_{τ_{j,k},γ_j}(X_i)/f̂(X_i)] [f̂_{τ_{j,k},γ_j}(X_i)/f̂_{γ_j}(X_i)] a(X_i) },
    ĉ_6 = −2 { (1/n) Σ_i [σ̂²_{γ_{j−1},τ_{j,k}}(X_i) σ̂²_{γ_{j−1},τ_{j,l}}(X_i)/f̂(X_i)] [f̂_{γ_{j−1},τ_{j,l}}(X_i)/f̂_{γ_j}(X_i)] a(X_i)
              + (1/n) Σ_i [σ̂²_{γ_{j−1},τ_{j,k}}(X_i) σ̂²_{τ_{j,l},τ_{j,k}}(X_i)/f̂(X_i)] [f̂_{τ_{j,l},τ_{j,k}}(X_i)/f̂_{γ_j}(X_i)] a(X_i)
              + (1/n) Σ_i [σ̂⁴_{τ_{j,k},γ_j}(X_i)/f̂(X_i)] [f̂_{τ_{j,k},γ_j}(X_i)/f̂_{γ_j}(X_i)] a(X_i) },
    ĉ_7 = (1/n) Σ_i [σ̂⁴_{γ_{j−1},τ_{j,l}}(X_i)/f̂(X_i)] [f̂_{γ_{j−1},τ_{j,l}}(X_i)/f̂_{γ_j}(X_i)] a(X_i)
         + (1/n) Σ_i [σ̂⁴_{τ_{j,l},γ_j}(X_i)/f̂(X_i)] [f̂_{τ_{j,l},γ_j}(X_i)/f̂_{γ_j}(X_i)] a(X_i),
    ĉ_8 = −2 { (1/n) Σ_i [σ̂⁴_{γ_{j−1},τ_{j,l}}(X_i)/f̂(X_i)] [f̂_{γ_{j−1},τ_{j,l}}(X_i)/f̂_{γ_j}(X_i)] a(X_i)
              + (1/n) Σ_i [σ̂²_{τ_{j,k},γ_j}(X_i) σ̂²_{τ_{j,l},τ_{j,k}}(X_i)/f̂(X_i)] [f̂_{τ_{j,l},τ_{j,k}}(X_i)/f̂_{γ_j}(X_i)] a(X_i)
              + (1/n) Σ_i [σ̂²_{τ_{j,l},γ_j}(X_i) σ̂²_{τ_{j,k},γ_j}(X_i)/f̂(X_i)] [f̂_{τ_{j,k},γ_j}(X_i)/f̂_{γ_j}(X_i)] a(X_i) },
    ĉ_9 = (1/n) Σ_i [σ̂²_{γ_{j−1},τ_{j,l}}(X_i)/f̂(X_i)] f̂_{γ_{j−1},τ_{j,l}}(X_i) f̂_{γ_{j−1},τ_{j,k}}(X_i) a(X_i)
         + (1/n) Σ_i [σ̂²_{τ_{j,l},τ_{j,k}}(X_i)/f̂(X_i)] f̂_{τ_{j,l},τ_{j,k}}(X_i) f̂_{γ_{j−1},τ_{j,k}}(X_i) f̂_{τ_{j,l},γ_j}(X_i) a(X_i)
         + (1/n) Σ_i [σ̂²_{τ_{j,k},γ_j}(X_i)/f̂(X_i)] f̂_{τ_{j,k},γ_j}(X_i) f̂_{τ_{j,l},γ_j}(X_i) a(X_i).

Given Lemma 6, Theorems 1 and 2, and Assumptions 1, 2, and 3, we have the following results, as in Aït-Sahalia et al. (2001):

    ξ̂_1(τ_{j,k}) − ξ_1(τ_{j,k}) = o_p(h^{p/2}),  ξ̂_2(τ_{j,k}) − ξ_2(τ_{j,k}) = o_p(h^{p/2}),
    σ̂_1²(τ_{j,k}) − σ_1²(τ_{j,k}) = o_p(1),  σ̂_2²(τ_{j,k}) − σ_2²(τ_{j,k}) = o_p(1).

That is, ξ̂_1(τ_{j,k}), ξ̂_2(τ_{j,k}), σ̂_1²(τ_{j,k}), and σ̂_2²(τ_{j,k}) are consistent estimators of ξ_1(τ_{j,k}), ξ_2(τ_{j,k}), σ_1²(τ_{j,k}), and σ_2²(τ_{j,k}), respectively. For C_2 and C_3, Aït-Sahalia et al. (2001) show that C_2 = 1/(2√π)^p and C_3 = 1/(2√π)^p.

In light of the results in (10), the following test statistic is suggested to test the null of no extra unknown threshold in the regime [γ_{j−1}, γ_j):

    Z_{γ_j} = (1/√m) Σ_{i=1}^{m} δ̂*(τ_{j,i}).    (12)

Furthermore, we know that the δ̂*(τ_{j,i}) converge to the standard normal distribution. Therefore, the limiting distribution of Z_{γ_j} is also standard normal, i.e.,

    Z_{γ_j} ∼ N(0, 1).    (13)
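Given the m studentized statistics δ̂(τ_{j,1}), ..., δ̂(τ_{j,m}) and the estimated covariance matrix Σ̂, the statistic Z_{γ_j} in (12) can be formed as below. The symmetric inverse square root via an eigendecomposition is one possible choice; this is a sketch, not the authors' implementation.

    import numpy as np

    def z_statistic(delta_hat, Sigma_hat):
        """Eq. (12): decorrelate the m statistics by Sigma^(-1/2), then
        average; under the null, Z ~ N(0, 1) asymptotically."""
        vals, vecs = np.linalg.eigh(Sigma_hat)
        root_inv = vecs @ np.diag(vals ** -0.5) @ vecs.T   # Sigma^(-1/2)
        delta_star = root_inv @ np.asarray(delta_hat)
        return delta_star.sum() / np.sqrt(len(delta_star))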
3.4. Consistency of the test

In this subsection, we study the consistency of the test. We then examine its power, that is, the probability of rejecting a false hypothesis against sequences of alternatives that approach the null as n → ∞. Given an extra threshold existing in [γ_{j−1}, γ_j) and being neglected,

    m_{γ_j}(x) I_{γ_j}(q) − m_{γ_{j−1},τ_j}(x) I_{γ_{j−1},τ_j}(q) − m_{τ_j,γ_j}(x) I_{τ_j,γ_j}(q) ≠ 0    (14)

for some q ∈ [γ_{j−1}, γ_j). Suppose an extra threshold does exist in [γ_{j−1}, γ_j) under the alternative and denote the sequences of densities as f^{[n]}_{γ_j}, f^{[n]}_{γ_{j−1},τ_j}, and f^{[n]}_{τ_j,γ_j}. The superscript [n] indicates that these densities depend on n, since the value of the extra threshold is unknown. The local alternatives can be specified as

    H_n: sup{ | m^{[n]}_{γ_j}(x) I_{γ_j}(q) − m^{[n]}_{γ_{j−1},τ_j}(x) I_{γ_{j−1},τ_j}(q) − m^{[n]}_{τ_j,γ_j}(x) I_{τ_j,γ_j}(q) − ε_n λ_{τ*,τ_j}(x, q) | : x, q ∈ S } = o(ε_n),

where

    ||f^{[n]}_{γ_j} − f_{γ_j}||_∞ = o(n^{−1/2} h^{−p/4}),
    ||f^{[n]}_{γ_{j−1},τ_j} − f_{γ_{j−1},τ_j}||_∞ = o(n^{−1/2} h^{−p/4}),
    ||f^{[n]}_{τ_j,γ_j} − f_{τ_j,γ_j}||_∞ = o(n^{−1/2} h^{−p/4}),

and λ_{τ*,τ_j}(x, q) satisfies

    ∫ λ_{τ*,τ_j}(x, q) f(x, q) dq = 0 and Λ_{τ*,τ_j} ≡ ∫∫ λ²_{τ*,τ_j}(x, q) f(x, q) dx dq < ∞.

It is clear that the alternative H_n converges to the null H_0 at speed n^{−1/2} h^{−p/4} (i.e., ε_n = n^{−1/2} h^{−p/4}).

Theorem 6.
Under Assumptions 1, 2, and 3, the asymptotic power of the test is

    P( δ̂(τ_j) ≥ z_α | H_n ) → 1 − Φ( z_α − Λ_{τ*,τ_j}/σ(τ_j) ),

where Φ(z_α) = 1 − α, with Φ(·) the CDF of a standard normal random variable. □

3.5. Testing the null of s thresholds against s + 1 thresholds

The test statistic, the average norm Z_{γ_j}, is suggested to check whether an extra threshold exists in the regime [γ_{j−1}, γ_j), given that the s threshold values γ_1, ..., γ_s are already known. Logically, the test can be applied to check for an extra threshold in the regimes [γ_{j−1}, γ_j) for j = 1, ..., s + 1 jointly. This thus ends up being the test for whether there is an extra threshold in a given s-threshold regression. Accordingly, we construct, in what follows, the test for the null of s thresholds against the alternative of s + 1 thresholds.

Since the indicator functions are mutually exclusive, i.e., I_{γ_i}(Q) × I_{γ_j}(Q) = 0 for i ≠ j, the covariance of δ(τ_i) and δ(τ_j) for i ≠ j is zero. That is,

    E[δ(τ_i) δ(τ_j)] = 0 for i ≠ j.

This fact implies that Z_{γ_j} and Z_{γ_l} (j ≠ l) are asymptotically independent. The test statistic for the null of s thresholds against s + 1 thresholds is constructed as characterized in the following theorem.

Theorem 7.
Under the same assumptions as in Theorem 5, the test statistic for the null of s thresholds against s + 1 thresholds is constructed as

    F_n(s + 1 | s) = max_{1 ≤ j ≤ s+1} Z_{γ_j},

with lim_{n→∞} P(F_n(s + 1 | s) ≤ x) = Φ^{s+1}(x), where Φ(x) is the CDF of a standard normal distribution and Z_{γ_j} is defined in equation (12). □

Table 1. Critical values of the test statistic F_n(s + 1 | s).

  s + 1    10%         5%          1%
  1        1.281552    1.644854    2.326348
  2        1.632219    1.954508    2.574961
  3        1.818281    2.121201    2.711943
  4        1.943196    2.234002    2.805821
  5        2.036469    2.318679    2.876895

Table 1 presents the critical values of the test statistic F_n(s + 1 | s) for s + 1 = 1, 2, 3, 4, 5. Given the test of s thresholds against s + 1 thresholds in Theorem 7, the number of thresholds can be determined by conducting these tests sequentially for s = 0, 1, 2, and so on. The number of thresholds is determined by sequential inference until a non-rejection is obtained. In other words, the number of thresholds is s when the null of s thresholds against s + 1 thresholds is not rejected. Once the number of thresholds is determined, we estimate the corresponding threshold values by using the methods discussed in the next section.
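Since lim P(F_n(s + 1 | s) ≤ x) = Φ^{s+1}(x), the critical value at level α solves Φ^{s+1}(x) = 1 − α, i.e., x = Φ^{−1}((1 − α)^{1/(s+1)}). The short helper below, a sketch using SciPy, reproduces the entries of Table 1.

    from scipy.stats import norm

    def f_critical_value(s_plus_1, alpha):
        """Solve Phi(x)^(s+1) = 1 - alpha for the max-of-normals statistic."""
        return norm.ppf((1.0 - alpha) ** (1.0 / s_plus_1))

    # e.g. f_critical_value(2, 0.05) ~= 1.954508, matching Table 1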
4. Estimating the Threshold Values

In the preceding discussions on testing an extra unknown threshold in a certain regime and testing the null of s thresholds against s + 1 thresholds, the threshold values under the null are assumed to be known already. In applied research, the threshold values are unknown and need to be estimated by a valid procedure. In the framework of linear regressions, Bai (1997) and Bai and Perron (1998) determine the number of structural changes by using a sequential test and estimate the breakpoints by locating the minima of the sums of squared errors. Hansen (1999) discusses the determination of the number of thresholds and the estimation of threshold values in linear regressions by using similar procedures. We thus extend these procedures to the framework of nonparametric regressions.

To derive the statistical properties of the threshold value estimators, we need the following assumptions.

Assumption 4.
4-1. f_q(q), E(c_{l,k}(X) | q), and E(c_{l,k}(X) e | q) exist and are continuous at q = γ_1, ..., γ_s, where c_{l,k}(X_i) := m_{γ_l}(X_i) − m_{γ_k}(X_i).
4-2. max_{l,k ∈ [1,...,s+1], l ≠ k} E|c_{l,k}(X_i)| < ∞ and E|c_{l,k}(X_i) e_i| < ∞.
4-3. For all γ ∈ R, E(|c_{l,k}(X_i) e_i| | Q_i = γ) < D and E(|c_{l,k}(X_i)| | Q_i = γ) < D for some D < ∞, and f_q(γ) ≤ f̄ < ∞.
4-4. δ_{n,l,k}(X_i) = n^{−α} c*_{l,k}(X_i), with ∫ |c*_{l,k}(x)|² dx ≠ 0; moreover, nh^{1/(p+2r)} → ∞ and [(ln n)^{1/2} n^{α}] / [n^{1/2} h^{p/2}] →
0, where 0 < α < 1/2.

Assumption 4-4 introduces the small effect δ_{n,l,k}(·), which is needed when we derive the asymptotic property of the threshold value estimator; see the proofs of Lemma 7 and Theorem 9. The small effect can approach zero when the sample size is sufficiently large; it therefore depends on n. The term c*_{l,k}(X_i) is the remainder of the difference between m_{γ_l}(X_i) and m_{γ_k}(X_i) when we extract the effect of the sample size, n^{−α}, from c_{l,k}(X_i).

Given that the number of thresholds s is known, the estimator of the threshold values can be defined in a manner similar to that in Proposition 5 of Bai and Perron (1998):

    [γ̂_1, ..., γ̂_s] = arg min Σ_{i=1}^{n} [ Y_i − Σ_{j=1}^{s+1} m̂_{γ_j}(X_i) I_{γ_j}(Q_i) ]².

Clearly, γ̂_1, ..., γ̂_s are determined simultaneously by global minimization. In practice, the estimation is implemented by an algorithm based on the principle of dynamic programming.
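For a single threshold, the global minimization reduces to a one-dimensional grid search over candidate values of γ; for s > 1 the same SSR criterion is minimized jointly, typically by dynamic programming. A grid-search sketch follows, with illustrative assumptions: the candidate grid is supplied by the user and the estimator is the one from Section 2.

    import numpy as np

    def estimate_one_threshold(X, Y, Q, h, candidates):
        """gamma_hat = argmin_gamma SSR(gamma) for a one-threshold model."""
        best, best_ssr = None, np.inf
        for g in candidates:
            fit = np.array([nw_regime_estimate(X[i], X, Y, Q, -np.inf, g, h)
                            if Q[i] < g else
                            nw_regime_estimate(X[i], X, Y, Q, g, np.inf, h)
                            for i in range(len(Y))])
            ssr = np.sum((Y - fit) ** 2)
            if ssr < best_ssr:
                best, best_ssr = g, ssr
        return best, best_ssr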
Under Assumptions 1, 2, 3, and 4, the following theorem establishes the consistency of γ̂_j, j = 1, ..., s.

Theorem 8. For j = 1, ..., s,
a) γ̂_j →p γ_j;
b) n(γ̂_j − γ_j) = O_p(1). □

The convergence rate of γ̂_j is n, which is a common result in the literature on structural changes and threshold models within the framework of linear regressions and linear quantile regressions (cf. Chen, 2008). The limiting distribution of the threshold value estimator is provided by Chan (1998) for linear models. In contrast, Hansen (2000) and Bai and Perron (2003) introduce the existence of the small effect to obtain a limiting distribution free of the nuisance parameters of the threshold value estimation. That is, denote

    δ_{n,l,k}(X_i) = m_{γ_l}(X_i) − m_{γ_k}(X_i) = n^{−α} c*_{l,k}(X_i).

Under the assumption that δ_{n,l,k}(X_i) →
0, which is called the small effect, we then obtain the asymptotic property of γ̂_j:

Theorem 9. n^{1−2α}(γ̂_j − γ_j) →d Q_j, j = 1, ..., s, where Q_j = arg max_{−∞ < v < ∞} … . □

Theorem 10.
Given a threshold value specified at γ in a mis-specified nonparametric regression with one threshold, the model mis-specification error is

    SSR_1(γ) →p S(γ) = Σ_{j=1}^{4} b_j(γ) I_{γ_j}(γ),

where b_j(γ) and I_{γ_j}(γ) for j = 1, ..., 4 are defined in the Appendix. □

Given the three true threshold values γ_1, γ_2, and γ_3, the threshold value γ of a mis-specified nonparametric regression with one threshold may lie in [γ_0, γ_1), (γ_1, γ_2), (γ_2, γ_3), or (γ_3, γ_4], where γ_0 and γ_4 denote the smallest and largest admissible threshold values. The model mis-specification error of the whole sample is b_1(γ), b_2(γ), b_3(γ), or b_4(γ) if the threshold value is mis-specified in the regime [γ_0, γ_1), (γ_1, γ_2), (γ_2, γ_3), or (γ_3, γ_4], respectively. In the Appendix, we describe the foregoing results in detail.

Theorem 11.
Let S(γ*) = min(S(γ_1), S(γ_2), S(γ_3)). S(γ*) is the smallest model mis-specification error among all γ ∈ [γ_0, γ_4]. The exact expression of S(·) can be found in the Appendix. □

S(γ_1), S(γ_2), and S(γ_3) are the three smallest model mis-specification errors among all γ ∈ [γ_0, γ_4]. Moreover, since S(γ) is the limit of SSR_1(γ) in probability and, without loss of generality, min(S(γ_1), S(γ_2), S(γ_3)) = S(γ_2) is assumed, we have the following theorem to show that S(γ_2) is the global minimum. That is, Theorem 12 is sufficient to justify the sequential procedures discussed.

Theorem 12. Assume that the true model is a nonparametric regression with three threshold values, namely γ_1, γ_2, and γ_3, and that a nonparametric regression with one threshold is mis-specified and estimated via

    γ̂ = arg min_γ (1/n) Σ_{i=1}^{n} { Y_i − m̂_γ(X_i) I_γ(Q_i) − m̂*_γ(X_i) [1 − I_γ(Q_i)] }².

We then have:
a) If S(γ_2) = min(S(γ_1), S(γ_2), S(γ_3)), then S(γ_2) is the smallest model mis-specification error among all γ ∈ [γ_0, γ_4];
b) SSR_1(γ̂) →p S(γ_2);
c) γ̂ converges to γ_2 with probability one. □

According to Theorem 12, even if the nonparametric regression is mis-specified and a threshold value is estimated at the point where the sum of squared errors is smallest, the estimated threshold value converges to the true threshold value at which the model mis-specification error is smallest. The result of Theorem 12 is thus similar to those in the study by Bai and Perron (1998) for the estimation of change points in a linear regression with multiple structural changes. To the best of our knowledge, this is the first theorem that ensures the consistency of the estimators obtained from a sequential method in nonparametric regressions.

Note that the assumption min(S(γ_1), S(γ_2), S(γ_3)) = S(γ_2) indicates that the threshold value γ_2 has the largest influence on the regression. Theorem 12 can be extended to a mis-specified regression model with two threshold values, and then the two estimated threshold values will be consistent for the two true threshold values that have a larger impact on the regression. Based on Theorem 12, the determination of the number of thresholds and the estimation of the threshold values can be obtained by using the following sequential procedure; a schematic implementation is sketched below.

1. Implement the test for the null of s = 0 against s = 1. That is, run the test to check whether an extra threshold exists in (γ_min, γ_max). If the null is not rejected, it is inferred that the regression has no threshold. If the null is rejected, move on to the next step.
2. Specify s = 1 and estimate the threshold value as γ̂_1. Given γ̂_1, carry out the test for the null of s = 1 against s = 2. That is, run the test to check whether an extra threshold exists in the regimes (γ_min, γ̂_1] and (γ̂_1, γ_max). If the null is not rejected, it is inferred that the regression has one threshold. If the null is rejected, move on to the next step.
3. Specify s = 2 and estimate the extra threshold value from the regimes (γ_min, γ̂_1] and (γ̂_1, γ_max) as γ̂_2, picking the candidate that yields the smaller sum of squared errors. Given γ̂_1 and γ̂_2, carry out the test for the null of s = 2 against s = 3. That is, run the test to check whether an extra threshold exists in the regimes (γ_min, γ̂_1], (γ̂_1, γ̂_2], and (γ̂_2, γ_max) if γ̂_2 > γ̂_1.
If the null is not rejected, it is inferred that the regression has two thresholds. If the null is rejected, repeat the above test until the null of s against s + 1 thresholds is not rejected.

When the procedure is conducted to the end such that the null of s thresholds against s + 1 thresholds is not rejected, we then settle on a nonparametric regression with s thresholds. Along with this procedure, the estimates of the s threshold values, γ̂_1, γ̂_2, ..., γ̂_s, are obtained as a byproduct. Following Theorem 12, the consistency of γ̂_1, γ̂_2, ..., γ̂_s is obtained as a consequence.

As mentioned in Proposition 8 of Bai and Perron (1998), the drawback of the previously described sequential method is that the determined number of thresholds exceeds the true number of thresholds with a nonzero probability. Therefore, Bai and Perron (1998) recommend applying the sequential method with a Type I error that converges to zero at a slower rate with the sample size. By doing so, the determined number of thresholds converges to the true number of thresholds.
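The sequential procedure can be organized as a simple loop. The sketch below is schematic: test_extra_threshold (returning the Z_{γ_j} statistic of (12) for a given regime) and estimate_new_threshold (an SSR search as above) are hypothetical helpers standing in for the components developed in Sections 3 and 4.

    import numpy as np

    def determine_thresholds(X, Y, Q, h, alpha, max_s=5):
        """Sequentially test s vs. s+1 thresholds; stop at first non-rejection."""
        gammas = []
        for s in range(max_s):
            bounds = [-np.inf] + sorted(gammas) + [np.inf]
            # Z statistic for an extra threshold in each current regime
            z_stats = [test_extra_threshold(X, Y, Q, h, lo, hi)   # hypothetical helper
                       for lo, hi in zip(bounds[:-1], bounds[1:])]
            if max(z_stats) <= f_critical_value(s + 1, alpha):
                return gammas          # null of s thresholds not rejected
            gammas.append(estimate_new_threshold(X, Y, Q, h, bounds))  # hypothetical helper
        return gammas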
5. Monte Carlo Studies

In this section, Monte Carlo studies are conducted to evaluate the performance of the proposed test statistic, F_n(s + 1 | s). We also conduct simulations to assess the finite-sample performance of the sequential method for estimating the threshold values.

5.1. Size of the test

Monte Carlo simulations are designed to evaluate the empirical size and power of the tests to identify the number of thresholds. Our experimental design is mainly based on the data-generating process (DGP) considered in Aït-Sahalia et al. (2001). We consider the null of no threshold against the alternative with one threshold. The DGP under the null is specified as

    Y_i = e^{−0.5 X_i} + ( e^{−0.5 X_i} + Q_i² )^{1/2} · ε_i,
    X_i = √0.5 Q_i + √0.5 u_i,
    Q_i iid∼ N(0, 1), u_i iid∼ N(0, 1), ε_i iid∼ N(0, 1).

Table 2. Empirical sizes of F_n(s + 1 | s): h = c · σ · n^{−1/4.25}, c = 1.

  F_n(s + 1 | s)    n = 500    n = 1000    n = 2000
  1%                0.021      0.017       0.011
  5%                0.045      0.051       0.051
  10%               0.076      0.084       0.086
Note: Heteroskedasticity depends on X and Q.
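A sketch of this null DGP follows; the constants match the display above and should be treated as illustrative assumptions of the sketch.

    import numpy as np

    def simulate_null_dgp(n, seed=0):
        """Null DGP for the size study: X depends on Q, and the error
        variance depends on both X and Q (assumed constants)."""
        rng = np.random.default_rng(seed)
        Q = rng.standard_normal(n)
        u = rng.standard_normal(n)
        X = np.sqrt(0.5) * Q + np.sqrt(0.5) * u     # X correlated with Q
        eps = rng.standard_normal(n)
        Y = np.exp(-0.5 * X) + np.sqrt(np.exp(-0.5 * X) + Q ** 2) * eps
        return X.reshape(-1, 1), Y, Q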
In this DGP, the random variable X is dependent on the threshold variable Q, and the heteroskedasticity of the regression depends on X and Q. By using a univariate normal kernel function, we compute the bandwidth as h = c · σ · n^{−1/δ} = n^{−1/δ}, where δ = 4.25 (cf. Aït-Sahalia et al., 2001, p. 383), c = 1, and σ is set to one in our simulation. We also conduct robustness checks on the bandwidth selection. Since s = 0 under the null, the critical values of the test statistic F_n(s + 1 | s) in Theorem 7 are 1.282, 1.645, and 2.326 for Type I errors at 10%, 5%, and 1%, respectively.

We conduct simulations with sample sizes of 500, 1000, and 2000. Throughout our simulations, the number of replications and the number of partitions m are set to 1000 and 7, respectively. Table 2 presents the empirical sizes of F_n(s + 1 | s) at 1%, 5%, and 10%, showing that the proposed test performs well with decent empirical sizes.

Table 3 shows the corresponding Monte Carlo results with robustness checks on the choice of bandwidth. The proposed test copes well, with decent sizes across the distinct bandwidth values.

[Table 3. Empirical sizes of F_n(s + 1 | s): h = c · σ · n^{−1/4.25}, for alternative values of c. Note: Heteroskedasticity depends on X and Q.]

5.2. Accuracy of the sequential estimation

To assess the accuracy of the sequential method for estimating the threshold values, we consider the following DGP in the Monte Carlo studies, which is similar to those in Aït-Sahalia et al. (2001, p. 383):

    Y_i = e^{−0.5 X_i} I_{γ_1}(Q_i) + (1 + e^{−0.5 X_i}) I_{γ_2}(Q_i) + (2 + e^{−0.5 X_i}) I_{γ_3}(Q_i) + (0.5 + e^{−0.5 X_i}) I_{γ_4}(Q_i) + (0.5 e^{−X_i})^{1/2} · ε_i,
    X_i iid∼ N(0, 1), Q_i iid∼ N(0, 1), ε_i iid∼ N(0, 1).
    Thresholds: −0.7, 0.15, and 0.5.

Let γ̂_{1,i} denote the threshold value estimate in the first-round identification from the i-th replication of the DGP. Then, the mean, standard error, and MSE (mean squared error) over all nr replications are computed as

    γ̄̂_1 = (1/nr) Σ_{i=1}^{nr} γ̂_{1,i},
    se(γ̂_1) = [ (1/(nr − 1)) Σ_{i=1}^{nr} (γ̂_{1,i} − γ̄̂_1)² ]^{1/2},
    MSE(γ̂_1) = (γ̄̂_1 − γ_1)² + [se(γ̂_1)]².

Table 4. Sequential estimates of the threshold values.

         γ̂ (first round)              γ̂ (second round)             γ̂ (third round)
  n      mean     se      MSE         mean     se      MSE         mean      se      MSE
  500    0.4227   0.2542  0.0705      0.1775   0.0960  0.0100      -0.6523   0.2407  0.0602
  1000   0.4867   0.1245  0.0160      0.1529   0.0320  0.0010      -0.6894   0.1198  0.0140
  3000   0.5025   0.0079  6.9×10⁻⁵    —        —       —           -0.7029   0.0040  2.4×10⁻⁵

Given n = 500, 1000, and 3000, the mean of the first-round estimated threshold values approaches the true value 0.
5. For the second-round estimated threshold values, the mean is 0.152506, which is close to the true value 0.
15. The mean of the third-round estimated threshold values is −0.6966892, which is close to the true value −0.7.
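The replication summaries reported in Table 4 follow directly from the formulas above; a small helper makes them explicit (a sketch, where estimates is an assumed array of estimates across replications).

    import numpy as np

    def estimator_summary(estimates, true_value):
        """Mean, standard error, and MSE over the nr replications."""
        est = np.asarray(estimates)
        mean, se = est.mean(), est.std(ddof=1)
        return mean, se, (mean - true_value) ** 2 + se ** 2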
6. Empirical Application: Income Thresholds in the 401(k) Study

Examining the effects of 401(k) plans on savings is an issue of long-standing empirical interest (see Chernozhukov and Hansen (2004) and the references cited therein). Intuitively, because different income groups face distinct resource constraints, income thresholds should play an important role in the analysis of individual savings for retirement. Chernozhukov and Hansen (2013) study the effect of 401(k) eligibility on total wealth by using high-dimensional methods that allow for flexible functional forms. Using a sample of 9,915 observations, they generate 10,763 technical variables through a spline basis and a polynomial basis and then select a few important variables out of the technical variables by using a LASSO-based double-selection procedure. The selected important variables include spline terms of the form max(0, income − ·), where the income variable is normalized on the [0, 1] interval. Their result suggests that an income threshold exists in the 401(k) study. In the literature, however, no test procedures have thus far been implemented to investigate the relevant income threshold values in 401(k) applications. In this section, we use our testing procedure to show that income thresholds indeed exist in 401(k) applications, and we confirm that this finding is robust to functional form specifications.

To illustrate the testing procedure proposed in the preceding sections, we consider the estimation and inference of the thresholds associated with the effect of 401(k) eligibility on total wealth. 401(k) eligibility, the variable of interest, is an indicator of being eligible to enroll in a 401(k) plan (i.e., whether individual i works for a firm that offers access to a 401(k) plan). Poterba et al. (1994a, 1994b) and Chernozhukov et al. (2016) argue that 401(k) eligibility may be taken as exogenous conditional on income. Following Chernozhukov et al. (2016) and using the data set in Chernozhukov and Hansen (2004), we thus construct both our outcome variable and the explanatory variable of interest after partialling out the effects of the other variables, including the dummies for age, education, marital status, family size, and homeownership. The sample size is 9,915. In the example presented herein, we consider the following nonparametric regression with s thresholds:

    Y_{po,i} = Σ_{j=1}^{s+1} m_{γ_j}(D_{po,i}) I_{γ_j}(Q_i) + e_i,

where the threshold variable Q is income, while Y_po and D_po are the partialled-out total wealth and the partialled-out 401(k) eligibility, respectively.
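Partialling out can be done by ordinary least squares residualization (a Frisch–Waugh step). The sketch below is a minimal illustration, assuming W stacks the control dummies; it is not the authors' exact construction.

    import numpy as np

    def partial_out(v, W):
        """Residualize v on controls W (with a constant), as for Y_po and D_po."""
        W1 = np.column_stack([np.ones(len(v)), W])
        beta, *_ = np.linalg.lstsq(W1, v, rcond=None)
        return v - W1 @ beta

    # Y_po = partial_out(Y, W); D_po = partial_out(D, W)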
We implement the test F_n(s + 1 | s) in Theorem 7 to determine the number of thresholds and then estimate the corresponding threshold values by using the sequential method. The weighting function A(d) is constructed as the indicator of a symmetric interval around zero, and the bandwidth is h = c · σ̂ × (9915)^{−1/4.25}, where σ̂ = 0.46 and c is set to 1. We first conduct a test for the null hypothesis that s = 0 versus s = 1. We find that the value of the test statistic is 50.46, thereby rejecting the null. The first-round estimated threshold value is γ̂_1 = $75,… . We then conduct the test for the null hypothesis that s = 1 versus s = 2. The corresponding value of the test statistic is 27.34, which again rejects the null. The second-round estimated threshold value is γ̂_2 = $42,
600 (68th percentile). We now conduct the test for the null hypothesis that s = 2 versus s = 3 in the joint intervals formed by the two estimated thresholds. The null is again rejected, and the third-round estimated threshold value is γ̂_3 = $31,
836 (50th percentile). Since there are insufficient observations in the intervals [31,836, 42,600) and [42,600, 75,…), we conduct the test for the null hypothesis that s = 3 versus s = 4 in the interval [0, 31,836) and do not reject the null, because the test statistic, with the value 0.85, is less than the critical value. We also conduct robustness checks by using different bandwidth values, with c = 1.
05 and c = 0.
95. The corresponding three threshold values found are the same as those found with c = 1. In short, our testing procedure allows us to identify four threshold regions, and the estimated income threshold values are $31,
836 (50%), $42,
600 (68%), and $75,… .

7. Conclusion

In this study, we identify the number of thresholds and estimate the threshold values for a nonparametric regression with multiple thresholds. The significance test of Aït-Sahalia et al. (2001) is modified to detect the existence of an extra threshold (i.e., s versus s + 1 thresholds). The asymptotic properties of the modified tests are then established. Based on the modified test, a procedure for determining the number of thresholds is suggested. Accordingly, we then carry out the sequential method to estimate the unknown threshold values. We also derive the asymptotic properties of the corresponding threshold value estimator. Our simulation results signify that the proposed estimators perform adequately in finite samples. To illustrate our testing procedure, we present an empirical analysis of the 401(k) plan with income thresholds.

Appendix

Proof of Theorem 1.
The kernel density estimator is defined by

    f̂_{γ_j}(x) = (1/n) Σ_{i=1}^{n} K_h(X_i − x) I_{γ_j}(Q_i).

Suppose the kernel satisfies the conditions in Assumption 2 and is a second-order (r = 2) kernel function, and that Assumptions 1-1 to 1-4 hold. Then, f̂_{γ_j}(x) has the expectation

    E[f̂_{γ_j}(x)] = f_{γ_j}(x) + (h²/2) Σ_{l=1}^{p} f^{(2)}_{γ_j,l}(x) C_1 + o(h²)    (15)

and the variance

    V(f̂_{γ_j}(x)) = (1/n) V(K_h(X_i − x) I_{γ_j}(Q_i))
      + 2 Σ_{l=1}^{M(n)} [(n − l)/n²] Cov[K_h(X_1 − x) I_{γ_j}(Q_1), K_h(X_{1+l} − x) I_{γ_j}(Q_{1+l})]
      + 2 Σ_{l=M(n)+1}^{n−1} [(n − l)/n²] Cov[K_h(X_1 − x) I_{γ_j}(Q_1), K_h(X_{1+l} − x) I_{γ_j}(Q_{1+l})]
      = V_1 + V_2 + V_3.    (16)

Assuming that M(n) satisfies, as n → ∞, M(n) → ∞ and M(n) h^p → 0, we have

    V_1 = (1/n) { E(K_h²(X_i − x) I_{γ_j}(Q_i)) − [E(K_h(X_i − x) I_{γ_j}(Q_i))]² } = (1/(nh^p)) C_2 f_{γ_j}(x) + o(1/(nh^p)).    (17)

By denoting M_1 = max_{l ∈ [1,...,M(n)]} Cov[K_h(X_1 − x) I_{γ_j}(Q_1), K_h(X_{1+l} − x) I_{γ_j}(Q_{1+l})], we obtain

    V_2 ≤ (2/n) M(n) M_1 = o((nh^p)^{−1}).    (18)

Denote W_{ni}(x) = K_h(X_i − x) I_{γ_j}(Q_i) − E[K_h(X_i − x) I_{γ_j}(Q_i)]. For any δ >
0, the upper bound of the covariance terms can be obtained by Lemma A.0 of Fan and Li (1999) as
    Cov[K_h(X_1 − x) I_{γ_j}(Q_1), K_h(X_{1+l} − x) I_{γ_j}(Q_{1+l})] ≤ M_2^{1/(1+δ)} 2 β^{δ/(1+δ)}(l),

where M_2 is defined as

    max( E|W_{n1}(x) W_{n(1+l)}(x)|^{1+δ}, ∫∫ |W_{n1}(x) W_{n(1+l)}(x)|^{1+δ} dF(X_1, Q_1) dF(X_{1+l}, Q_{1+l}) ).

Furthermore, given that Assumption 1-1 holds,

    V_3 = 2 Σ_{l=M(n)+1}^{n−1} [(n − l)/n²] Cov[K_h(X_1 − x) I_{γ_j}(Q_1), K_h(X_{1+l} − x) I_{γ_j}(Q_{1+l})]
        ≤ (2/n) M_2 Σ_{l=M(n)+1}^{∞} β^{δ/(1+δ)}(l) = o((nh^p)^{−1}).    (19)

By combining (17), (18), and (19), we have the variance of f̂_{γ_j}(x) as

    V(f̂_{γ_j}(x)) = (1/(nh^p)) C_2 f_{γ_j}(x) + o(1/(nh^p)).    (20)

In general, if an r-th order kernel function is considered, (15) becomes

    E(f̂_{γ_j}(x)) = f_{γ_j}(x) + O(h^r) + o(h^r).    (21)

Given the results in (20) and (21) and that the bandwidth h satisfies Assumption 3-1, the uniform almost sure convergence rate of the kernel density estimator can be obtained; see Lemmas 2 and 8 in Stone (1983). Given the results in (15) and (20), and that Assumptions 1, 2, and 3-1 hold, the asymptotic sampling distribution of f̂_{γ_j}(x) is derived as in Masry (1996) and Li and Racine (2007). ■

Proof of Theorem 2.
Given a second-order kernel function as well as equations (15) and (20), we have

    f̂_{γ_j}(x) = f_{γ_j}(x) + O_p(h² + (nh^p)^{−1/2}) = f_{γ_j}(x) + o_p(1).    (22)

Together with (22), the local constant estimator can be rewritten as

    m̂_{γ_j}(x) − m_{γ_j}(x) = [Σ_{i=1}^{n} K_h(X_i − x) I_{γ_j}(Q_i)(Y_i − m_{γ_j}(x))] / [Σ_{i=1}^{n} K_h(X_i − x) I_{γ_j}(Q_i)]
      = [ (1/n) Σ_{i=1}^{n} K_h(X_i − x) I_{γ_j}(Q_i)(Y_i − m_{γ_j}(x)) / f_{γ_j}(x) ] (1 + o_p(1)).

Under the correct specification of a nonparametric regression with s thresholds, the first term in the previous result is

    (1/n) Σ_{i=1}^{n} K_h(X_i − x) I_{γ_j}(Q_i)[Y_i − m_{γ_j}(x)] / f_{γ_j}(x)
      = (1/n) Σ_{i=1}^{n} K_h(X_i − x) I_{γ_j}(Q_i)[Σ_{l=1}^{s+1} m_{γ_l}(X_i) I_{γ_l}(Q_i) − m_{γ_j}(x)] / f_{γ_j}(x)
        + (1/n) Σ_{i=1}^{n} K_h(X_i − x) I_{γ_j}(Q_i) e_i / f_{γ_j}(x)
      = AB_{γ_j}(x) + AV_{γ_j}(x).

From Assumption 1-1, we have

    E[AB_{γ_j}(x)] = E{ K_h(X_i − x) I_{γ_j}(Q_i)[Σ_{l=1}^{s+1} m_{γ_l}(X_i) I_{γ_l}(Q_i) − m_{γ_j}(x)] / f_{γ_j}(x) },    (23)

where P[I_{γ_j}(Q_i) × I_{γ_l}(Q_i) = 0] = 1 for j ≠ l and I_{γ_j}(Q_i) × I_{γ_l}(Q_i) = I_{γ_j}(Q_i) for j = l. Thus, (23) becomes

    E[AB_{γ_j}(x)] = (h²/2) C_1 Σ_{l=1}^{p} [ m^{(2)}_{γ_j,l}(x) f_{γ_j}(x) + 2 m^{(1)}_{γ_j,l}(x) f^{(1)}_{γ_j,l}(x) ] / f_{γ_j}(x) + o(h²).    (24)

Further, the asymptotic variance term is

    V(AV_{γ_j}(x)) = (1/n) E[ K_h(X_i − x) I_{γ_j}(Q_i) e_i / f_{γ_j}(x) ]²
      + (2/n) Σ_{l=1}^{n−1} [(n − l)/n] Cov( K_h(X_1 − x) I_{γ_j}(Q_1) e_1 / f_{γ_j}(x), K_h(X_{1+l} − x) I_{γ_j}(Q_{1+l}) e_{1+l} / f_{γ_j}(x) )
      = V_1 + V_2,

with

    V_1 = [1/(n f²_{γ_j}(x))] E[ K_h²(X_i − x) I_{γ_j}(Q_i) e_i² ] = [σ²_{γ_j}(x)/(nh^p f_{γ_j}(x))] ∫ K²(u) du + o(1/(nh^p)).    (25)

Given that Assumption 1-1 holds, and by arguments similar to those in the proof of Theorem 1, the covariance term V_2 = o((nh^p)^{−1}). We have

    V(AV_{γ_j}(x)) = [σ²_{γ_j}(x)/(nh^p f_{γ_j}(x))] C_2 + o(1/(nh^p)),    (26)

and the covariance terms are

    Cov(AV_{γ_j}(x), AV_{γ_k}(x)) = [1/(n f_{γ_j}(x) f_{γ_k}(x))] { E[K_h²(X_i − x) I_{γ_j}(Q_i) I_{γ_k}(Q_i) e_i²] − E[K_h(X_i − x) I_{γ_j}(Q_i) e_i] E[K_h(X_i − x) I_{γ_k}(Q_i) e_i] } = 0.    (27)

In general, when the kernel is an r-th order kernel function, (23) becomes

    E(AB_{γ_j}(x)) = O(h^r).    (28)

Given that (26), (28), and Assumption 3-1 hold, the result of part a) in Theorem 2 is verified based on Lemmas 2 and 8 of Stone (1983). Moreover, given (24), (26), (27), and Assumption 3-1, the result of part b) in Theorem 2 holds according to the central limit theorem; see Masry (1996) and Li and Racine (2007). ■

Proof of Theorem 3.
By substituting (26) and (28) into the mean integrated square error,we have the optimal bandwidth defined as h opt = arg min Z E " s +1 X j =1 (cid:0) ˆ m γ j ( x ) − m γ j ( x ) (cid:1) w ( x ) d x = arg min Z s +1 X j =1 (cid:2) E( AB γ j ( x )) + V( AV γ j ( x )) (cid:3) w ( x ) d x . (29)Taking the first-order derivative of (29) with respect to h , d R P s +1 j =1 (cid:2) E( AB γ j ( x )) + V( AV γ j ( x )) (cid:3) ( γ j − − γ j ) w ( x ) d x dh set = 0we then have h opt = O ( n − r + p ). It is clear that the convergence rateof h opt depends on the dimension of X , p , and the orders of the ker-nel function, r . It is worth noting that the convergence rate does notdepend on the number of thresholds, s . This result suggests that thebandwidth can be selected without considering the number of thresh-olds. (cid:4) Proof of Theorem 4. inceΓ( τ j ) = Z Z (cid:26)Z yf γ j ( y, x ) f γ j ( x ) dyI γ j ( q ) − Z yf γ j − ,τ j ( y, x ) f γ j − ,τ j ( x ) dyI γ j − ,τ j ( q ) − Z yf τ j ,γ j ( y, x ) f τ j ,γ j ( x ) dyI τ j ,γ j ( q ) (cid:27) a ( x ) dF ( x , q )= Γ( f γ j , f γ j − ,τ j , f τ j ,γ j , F ) , we have˜Γ( τ j ) = 1 n n X i =1 (cid:8) ˆ m γ j ( X ) I γ j ( Q ) − ˆ m γ j − ,τ j ( X ) I γ j − ,τ j ( Q ) − ˆ m τ j ,γ j ( X ) I τ j ,γ j ( Q ) (cid:9) a ( X i )= Z Z (Z y ˆ f γ j ( y, x )ˆ f γ j ( x ) dyI γ j ( q ) − Z y ˆ f γ j − ,τ j ( y, x )ˆ f γ j − ,τ j ( x ) dyI γ j − ,τ j ( q ) − Z y ˆ f τ j ,γ j ( y, x )ˆ f τ j ,γ j ( x ) dyI τ j ,γ j ( q ) ) a ( x ) d ˆ F ( x , q )= Γ( ˆ f γ j , ˆ f γ j − ,τ j , ˆ f τ j ,γ j , ˆ F ) . Note thatΓ( ˆ f γ j , ˆ f γ j − ,τ j , ˆ f τ j ,γ j , F )= 1 n n X i =1 (cid:8) ˆ m γ j ( X ) I γ j ( Q ) − ˆ m γ j − ,τ j ( X ) I γ j − ,τ j ( Q ) − ˆ m τ j ,γ j ( X ) I τ j ,γ j ( Q ) (cid:9) a ( X i )= Z Z (Z y ˆ f γ j ( y, x )ˆ f γ j ( x ) dyI γ j ( q ) − Z y ˆ f γ j − ,τ j ( y, x )ˆ f γ j − ,τ j ( x ) dyI γ j − ,τ j ( q ) − Z y ˆ f τ j ,γ j ( y, x )ˆ f τ j ,γ j ( x ) dyI τ j ,γ j ( q ) ) a ( x ) d ˆ F ( x , q ) . We need the following lemmas to complete the proof.
Proof of Theorem 4.

Since
\[
\Gamma(\tau_j) = \int\!\!\int \left\{\int y\,\frac{f_{\gamma_j}(y,x)}{f_{\gamma_j}(x)}\,dy\, I_{\gamma_j}(q)
- \int y\,\frac{f_{\gamma_{j-1},\tau_j}(y,x)}{f_{\gamma_{j-1},\tau_j}(x)}\,dy\, I_{\gamma_{j-1},\tau_j}(q)
- \int y\,\frac{f_{\tau_j,\gamma_j}(y,x)}{f_{\tau_j,\gamma_j}(x)}\,dy\, I_{\tau_j,\gamma_j}(q)\right\} a(x)\,dF(x,q)
= \Gamma\big(f_{\gamma_j}, f_{\gamma_{j-1},\tau_j}, f_{\tau_j,\gamma_j}, F\big),
\]
we have
\[
\tilde\Gamma(\tau_j) = \frac{1}{n}\sum_{i=1}^n \big\{\hat m_{\gamma_j}(X_i)\, I_{\gamma_j}(Q_i) - \hat m_{\gamma_{j-1},\tau_j}(X_i)\, I_{\gamma_{j-1},\tau_j}(Q_i) - \hat m_{\tau_j,\gamma_j}(X_i)\, I_{\tau_j,\gamma_j}(Q_i)\big\}\, a(X_i)
= \int\!\!\int \left\{\int y\,\frac{\hat f_{\gamma_j}(y,x)}{\hat f_{\gamma_j}(x)}\,dy\, I_{\gamma_j}(q)
- \int y\,\frac{\hat f_{\gamma_{j-1},\tau_j}(y,x)}{\hat f_{\gamma_{j-1},\tau_j}(x)}\,dy\, I_{\gamma_{j-1},\tau_j}(q)
- \int y\,\frac{\hat f_{\tau_j,\gamma_j}(y,x)}{\hat f_{\tau_j,\gamma_j}(x)}\,dy\, I_{\tau_j,\gamma_j}(q)\right\} a(x)\,d\hat F(x,q)
= \Gamma\big(\hat f_{\gamma_j}, \hat f_{\gamma_{j-1},\tau_j}, \hat f_{\tau_j,\gamma_j}, \hat F\big).
\]
Note that $\Gamma(\hat f_{\gamma_j}, \hat f_{\gamma_{j-1},\tau_j}, \hat f_{\tau_j,\gamma_j}, F)$ denotes the same functional evaluated at the estimated densities but integrated with respect to the true distribution $F$ rather than the empirical distribution $\hat F$. We need the following lemmas to complete the proof.

Lemma 1. (Lemma 2 of Aït-Sahalia et al. (2001)) Defining
\[
\|g_{\gamma_j}\| \equiv \max\left(\sup_x \Big|\int y\, g_{\gamma_j}(y,x)\,dy\Big|,\; \sup_x |g_{\gamma_j}(x)|\right),
\]
and $\|g_{\gamma_{j-1},\tau_j}\|$ and $\|g_{\tau_j,\gamma_j}\|$ analogously, where
\[
g_{\gamma_j} = \hat f_{\gamma_j} - f_{\gamma_j}, \qquad g_{\gamma_{j-1},\tau_j} = \hat f_{\gamma_{j-1},\tau_j} - f_{\gamma_{j-1},\tau_j}, \qquad g_{\tau_j,\gamma_j} = \hat f_{\tau_j,\gamma_j} - f_{\tau_j,\gamma_j},
\]
we have
\[
\|g_{\gamma_j}\| = \|g_{\gamma_{j-1},\tau_j}\| = \|g_{\tau_j,\gamma_j}\| = O_p\big(h^r + (\ln n)^{1/2}/(nh^p)^{1/2}\big).
\]
Lemma 2. (Lemma 7 of Aït-Sahalia et al. (2001))
\[
\Gamma\big(\hat f_{\gamma_j}, \hat f_{\gamma_{j-1},\tau}, \hat f_{\tau,\gamma_j}, \hat F\big) = \Gamma\big(\hat f_{\gamma_j}, \hat f_{\gamma_{j-1},\tau}, \hat f_{\tau,\gamma_j}, F\big) + \Lambda_{1,n} + \Lambda_{2,n},
\]
with
\[
\Lambda_{1,n} = \int\!\!\int \left\{\int \alpha_{\gamma_j}(y,x)\,dy\, I_{\gamma_j}(q) - \int \alpha_{\gamma_{j-1},\tau_j}(y,x)\,dy\, I_{\gamma_{j-1},\tau_j}(q) - \int \alpha_{\tau_j,\gamma_j}(y,x)\,dy\, I_{\tau_j,\gamma_j}(q)\right\} a(x)\,\big(d\hat F(x,q) - dF(x,q)\big)
= O_p\big(n^{-1}h^{-p/2} + n^{-1/2}h^{r}\big) = o_p\big(n^{-1}h^{-p/2}\big),
\]
\[
\Lambda_{2,n} = O_p\big(\|\hat f_{\gamma_j} - f_{\gamma_j}\|^2 + \|\hat f_{\gamma_{j-1},\tau_j} - f_{\gamma_{j-1},\tau_j}\|^2 + \|\hat f_{\tau_j,\gamma_j} - f_{\tau_j,\gamma_j}\|^2\big),
\]
where $\alpha_{\gamma_j}(y,x) = \frac{y - m_{\gamma_j}(x)}{f_{\gamma_j}(x)}$, $\alpha_{\gamma_{j-1},\tau_j}(y,x) = \frac{y - m_{\gamma_{j-1},\tau_j}(x)}{f_{\gamma_{j-1},\tau_j}(x)}$, and $\alpha_{\tau_j,\gamma_j}(y,x) = \frac{y - m_{\tau_j,\gamma_j}(x)}{f_{\tau_j,\gamma_j}(x)}$. □

Lemma 3. (Hall, 1984) Let $\{Z_i;\, i = 1,\ldots,n\}$ be an i.i.d. sequence. Suppose that the U-statistic $U_n = \sum_{1 \le i < j \le n} H_n(Z_i, Z_j)$ is degenerate, that is, $H_n$ is symmetric with $\mathrm E[H_n(Z_i, Z_j)\,|\,Z_i] = 0$ almost surely and $\mathrm E[H_n^2(Z_i, Z_j)] < \infty$. If
\[
\frac{\mathrm E[G_n^2(Z_1,Z_2)] + n^{-1}\,\mathrm E[H_n^4(Z_1,Z_2)]}{\{\mathrm E[H_n^2(Z_1,Z_2)]\}^2} \to 0,
\quad \text{where } G_n(z_1,z_2) = \mathrm E[H_n(Z_1,z_1)\, H_n(Z_1,z_2)],
\]
then $U_n / \{\mathrm E[U_n^2]\}^{1/2} \to_d N(0,1)$. □

Following the notation in the proof of Theorem 5, the covariance between the statistics at two splitting points $\tau_{j,l}$ and $\tau_{j,k}$ of the same interval $[\gamma_{j-1},\gamma_j)$ is
\[
\mathrm{Cov}\big(\delta(\tau_{j,l}), \delta(\tau_{j,k})\big)
= \sigma^{-1/2}(\tau_{j,l})\,\sigma^{-1/2}(\tau_{j,k})\, h^p
\times \mathrm E\Bigg[\left(\int a(i)a(j)\,dF(x,q) - 2\int a(i)b(j)\,dF(x,q) + \int b(i)b(j)\,dF(x,q)\right)
\left(\int a(i)a(j)\,dF(x,q) - 2\int a(i)c(j)\,dF(x,q) + \int c(i)c(j)\,dF(x,q)\right)\Bigg]
= \big[\sigma^2_1(\tau_{j,l}) + \sigma^2_2(\tau_{j,l})\big]^{-1/2}\,\big[\sigma^2_1(\tau_{j,k}) + \sigma^2_2(\tau_{j,k})\big]^{-1/2}\,\varphi(\tau_{j,l},\tau_{j,k}). \quad (33) \; ∎
\]

Proof of Theorem 6.

Let
\[
\delta_{\tau_j}(x,q) = m_{\gamma_j}(x)\, I_{\gamma_j}(q) - m_{\gamma_{j-1},\tau}(x)\, I_{\gamma_{j-1},\tau}(q) - m_{\tau,\gamma_j}(x)\, I_{\tau,\gamma_j}(q),
\]
\[
s_{\tau_j}(Y_i,X_i,Q_i;\, y,x,q) = s_{1,\tau_j}(Y_i,X_i,Q_i;\, y,x,q) + s_{2,\tau_j}(Y_i,X_i,Q_i;\, y,x,q) + s_{3,\tau_j}(Y_i,X_i,Q_i;\, y,x,q),
\]
with
\[
s_{1,\tau_j}(Y_i,X_i,Q_i;\, y,x,q) = \frac{g_{\gamma_j}(x)}{f_{\gamma_j}(x)}\int \alpha_{\gamma_j}(y,x)\, K_h(X_i-x)\, K_h(Y_i-y)\,dy,
\]
\[
s_{2,\tau_j}(Y_i,X_i,Q_i;\, y,x,q) = \frac{g_{\gamma_{j-1},\tau}(x)}{f_{\gamma_{j-1},\tau}(x)}\int \alpha_{\gamma_{j-1},\tau}(y,x)\, K_h(X_i-x)\, K_h(Y_i-y)\,dy,
\]
\[
s_{3,\tau_j}(Y_i,X_i,Q_i;\, y,x,q) = \frac{g_{\tau,\gamma_j}(x)}{f_{\tau,\gamma_j}(x)}\int \alpha_{\tau,\gamma_j}(y,x)\, K_h(X_i-x)\, K_h(Y_i-y)\,dy.
\]
It is clear that $\int\!\!\int |\delta_{\tau_j}(x,q)|\,dx\,dq \neq 0$ when the alternative hypothesis is true. As in the proof of Theorem 4, we know that
\[
\Gamma\big(\hat f_{\gamma_j}, \hat f_{\gamma_{j-1},\tau_j}, \hat f_{\tau_j,\gamma_j}, F\big)
= \Gamma\big(f_{\gamma_j}, f_{\gamma_{j-1},\tau_j}, f_{\tau_j,\gamma_j}, F\big) + \Psi^{(1)}(0) + \tfrac{1}{2}\Psi^{(2)}(0) + \tfrac{1}{6}\Psi^{(3)}(t^*),
\]
where
\[
\Psi^{(1)}(t) = 2\int\!\!\int \psi(t)\,\frac{\partial \psi(t)}{\partial t}\, a(x)\,dF(x,q), \qquad
\Psi^{(2)}(t) = 2\int\!\!\int \left\{\psi(t)\,\frac{\partial^2 \psi(t)}{\partial t^2} + \left[\frac{\partial \psi(t)}{\partial t}\right]^2\right\} a(x)\,dF(x,q).
\]
It is clear that $\psi(t) = 0$ under the null and $\psi(t) \neq 0$ under the alternative.
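(As a computational aside, the sample functional $\tilde\Gamma(\tau)$ from the proof of Theorem 4 is straightforward to evaluate by plug-in. The sketch below does so with the weight $a(\cdot) \equiv 1$ and reuses the hypothetical nw_regime helper from the earlier snippet; both choices are assumptions of this illustration, not the paper's implementation.)

    import numpy as np

    def gamma_tilde(tau, X, Y, Q, lo, hi, h):
        """Plug-in statistic for splitting the regime [lo, hi) at tau:
        (1/n) * sum over i of { m_hat_[lo,hi)(X_i) * I_[lo,hi)(Q_i)
                              - m_hat_[lo,tau)(X_i) * I_[lo,tau)(Q_i)
                              - m_hat_[tau,hi)(X_i) * I_[tau,hi)(Q_i) }."""
        n = len(Y)
        total = 0.0
        for xi, qi in zip(X, Q):
            if lo <= qi < hi:                       # I_{gamma_j}(Q_i) = 1
                total += nw_regime(xi, X, Y, Q, lo, hi, h)
                if qi < tau:                        # exactly one sub-regime applies
                    total -= nw_regime(xi, X, Y, Q, lo, tau, h)
                else:
                    total -= nw_regime(xi, X, Y, Q, tau, hi, h)
        return total / n

If no threshold lies in $[lo, hi)$, the whole-interval and split-interval fits agree asymptotically and the statistic is close to zero, which is the behavior the test exploits.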
Then, under the alternative,
\[
\Psi^{(1)}(0) = \frac{2}{n}\sum_{i=1}^n \int \delta_{\tau_j}(x,q)\, r_{\tau_j}(Y_i,X_i,Q_i;\, y,x,q)\, a(x)\,dF(x,q)
= O(h^r) + O_p\big((nh^p)^{-1/2}\big)
\]
and
\[
\int\!\!\int \psi(t)\,\frac{\partial^2 \psi(t)}{\partial t^2}\, a(x)\,dF(x,q)
= \frac{1}{n}\sum_{i=1}^n \int \delta_{\tau_j}(x,q)\, s_{\tau_j}(Y_i,X_i,Q_i;\, y,x,q)\, a(x)\,dF(x,q)
\le O_p\big(h^r + (\ln n)^{1/2}/(nh^p)^{1/2}\big)\big[O(h^r) + O_p\big((nh^p)^{-1/2}\big)\big].
\]
Given the following results in the proof of Theorem 4,
\[
\left[\frac{\partial \psi(t)}{\partial t}\right]^2 = O\big((nh^p)^{-1}\big) + O_p\big(n^{-1}h^{-p/2}\big), \qquad
\Psi^{(3)}(t^*) = O_p\big(\|\hat f_{\gamma_j} - f_{\gamma_j}\|^3 + \|\hat f_{\gamma_{j-1},\tau_j} - f_{\gamma_{j-1},\tau_j}\|^3 + \|\hat f_{\tau_j,\gamma_j} - f_{\tau_j,\gamma_j}\|^3\big),
\]
we have
\[
\Gamma\big(\hat f_{\gamma_j}, \hat f_{\gamma_{j-1},\tau_j}, \hat f_{\tau_j,\gamma_j}, F\big)
= \Gamma\big(f_{\gamma_j}, f_{\gamma_{j-1},\tau_j}, f_{\tau_j,\gamma_j}, F\big) + \Psi^{(1)}(0) + \tfrac{1}{2}\Psi^{(2)}(0) + \tfrac{1}{6}\Psi^{(3)}(t^*)
= O(1) + \big[O(h^r) + O_p\big((nh^p)^{-1/2}\big)\big] + \big[O\big((nh^p)^{-1}\big) + O_p\big(n^{-1}h^{-p/2}\big)\big]
+ O_p\big(h^r + (\ln n)^{1/2}/(nh^p)^{1/2}\big)\big[O(h^r) + O_p\big((nh^p)^{-1/2}\big)\big] + \tfrac{1}{6}\Psi^{(3)}(t^*).
\]
Therefore, under the alternative,
\[
\sigma^{-1}(\tau_j)\big\{n h^{p/2}\,\tilde\Gamma(\tau_j) - h^{-p/2}\,\xi(\tau_j)\big\}
= \sigma^{-1}(\tau_j)\big\{n h^{p/2}\big[\Gamma\big(f_{\gamma_j}, f_{\gamma_{j-1},\tau_j}, f_{\tau_j,\gamma_j}, F\big) + o_p(1)\big]\big\} \to \infty.
\]
When the alternative converges to the null at speed $n^{-1/2}h^{-p/4}$, we get
\[
n h^{p/2}\int\!\!\int \psi(t)\,\frac{\partial \psi(t)}{\partial t}\, a(x)\,dF(x,q)
= O\big((nh^{p+2r})^{1/2}\big) + O_p\big((nh^p)^{-1/2}\big) = o_p(1).
\]
Similarly, we have $n h^{p/2}\int\!\!\int \psi(t)\,\frac{\partial^2 \psi(t)}{\partial t^2}\, a(x)\,dF(x,q) = o_p(1)$. Hence, from Proposition 2 of Aït-Sahalia et al. (2001), we have proved Theorem 6. ∎

Proof of Theorem 7.

Observe that the indicator functions defined on distinct intervals are mutually exclusive. Therefore the asymptotic covariance between the statistics $\delta(\tau_{j,l})$ and $\delta(\tau_{k,l})$, $j \neq k$, is zero. In what follows, we verify this fact. Let $\tau_{j,l}$ and $\tau_{k,l}$ be the $l$th splitting points in the intervals $[\gamma_{j-1},\gamma_j)$ and $[\gamma_{k-1},\gamma_k)$, respectively; also let $j \neq k$. Following the proof of Theorem 4, we have
\[
\delta(\tau_{j,l}) = \sigma^{-1/2}(\tau_{j,l})\, n h^{p/2}\, I_n(\tau_{j,l}) + o_p(1),
\]
where
\[
I_n(\tau_{j,l}) = \sum_{1 \le i < i' \le n} \int \tilde r_{\tau_{j,l}}(Y_i,X_i,Q_i;\, y,x,q)\, \tilde r_{\tau_{j,l}}(Y_{i'},X_{i'},Q_{i'};\, y,x,q)\,dF(x,q).
\]
As those defined in Theorem 4,
\[
\tilde r_{\tau_{j,l}}(Y_i,X_i,Q_i;\, y,x,q) = r_{\tau_{j,l}}(Y_i,X_i,Q_i;\, y,x,q) - \mathrm E\big[r_{\tau_{j,l}}(Y_i,X_i,Q_i;\, y,x,q)\big],
\]
and
\[
r_{\tau_{j,l}}(Y_i,X_i,Q_i;\, y,x,q)
= \Big\{\int \alpha_{\gamma_j}(y,x)\, K_h(X_i-x)\, I_{\gamma_j}(Q_i)\, K_h(Y_i-y)\,dy\, I_{\gamma_j}(q)
- \int \alpha_{\gamma_{j-1},\tau_{j,l}}(y,x)\, K_h(X_i-x)\, I_{\gamma_{j-1},\tau_{j,l}}(Q_i)\, K_h(Y_i-y)\,dy\, I_{\gamma_{j-1},\tau_{j,l}}(q)
- \int \alpha_{\tau_{j,l},\gamma_j}(y,x)\, K_h(X_i-x)\, I_{\tau_{j,l},\gamma_j}(Q_i)\, K_h(Y_i-y)\,dy\, I_{\tau_{j,l},\gamma_j}(q)\Big\}.
\]
Following the proof of Theorem 5, we denote
\[
a(i) = \int \alpha_{\gamma_j}(y,x)\, K_h(X_i-x)\, I_{\gamma_j}(Q_i)\, K_h(Y_i-y)\,dy\, I_{\gamma_j}(q),
\]
\[
b(i) = \int \alpha_{\gamma_k}(y,x)\, K_h(X_i-x)\, I_{\gamma_k}(Q_i)\, K_h(Y_i-y)\,dy\, I_{\gamma_k}(q),
\]
\[
c(i) = \int \alpha_{\gamma_{j-1},\tau_{j,l}}(y,x)\, K_h(X_i-x)\, I_{\gamma_{j-1},\tau_{j,l}}(Q_i)\, K_h(Y_i-y)\,dy\, I_{\gamma_{j-1},\tau_{j,l}}(q)
+ \int \alpha_{\tau_{j,l},\gamma_j}(y,x)\, K_h(X_i-x)\, I_{\tau_{j,l},\gamma_j}(Q_i)\, K_h(Y_i-y)\,dy\, I_{\tau_{j,l},\gamma_j}(q),
\]
\[
d(i) = \int \alpha_{\gamma_{k-1},\tau_{k,l}}(y,x)\, K_h(X_i-x)\, I_{\gamma_{k-1},\tau_{k,l}}(Q_i)\, K_h(Y_i-y)\,dy\, I_{\gamma_{k-1},\tau_{k,l}}(q)
+ \int \alpha_{\tau_{k,l},\gamma_k}(y,x)\, K_h(X_i-x)\, I_{\tau_{k,l},\gamma_k}(Q_i)\, K_h(Y_i-y)\,dy\, I_{\tau_{k,l},\gamma_k}(q),
\]
and obtain
\[
\mathrm{Cov}\big(\delta(\tau_{j,l}), \delta(\tau_{k,l})\big)
= \sigma^{-1/2}(\tau_{j,l})\,\sigma^{-1/2}(\tau_{k,l})\, h^p\, \mathrm E\big(I_n(\tau_{j,l})\, I_n(\tau_{k,l})\big) + o(1)
= \sigma^{-1/2}(\tau_{j,l})\,\sigma^{-1/2}(\tau_{k,l})\, h^p
\times \mathrm E\left(\int [a(i)-c(i)][a(i')-c(i')]\,dF(x,q)\int [b(i)-d(i)][b(i')-d(i')]\,dF(x,q)\right) + o(1).
\]
The equation above, together with the fact that the indicator functions entering $a(i)$ and $c(i)$ are supported on $[\gamma_{j-1},\gamma_j)$ while those entering $b(i)$ and $d(i)$ are supported on the disjoint interval $[\gamma_{k-1},\gamma_k)$, implies that $\mathrm{Cov}(\delta(\tau_{j,l}), \delta(\tau_{k,l}))$ is $o_p(1)$. Further, $\delta(\tau_{j,l})$ and $\delta(\tau_{k,l})$ are asymptotically normally distributed, and they thus can be seen as asymptotically independent. Accordingly, with the same assumptions imposed in Theorem 5, Theorem 7 holds. ∎

Proof of Theorem 8.

With $s$ pseudo threshold values $[\tau_1,\ldots,\tau_s]$ in $[\gamma_0,\gamma_{s+1}]$, the conditional mean estimator is constructed as
\[
\hat m_{\tau_j}(x) = \frac{\sum_i K_h(X_i-x)\, I_{\tau_j}(Q_i)\, Y_i}{\sum_i K_h(X_i-x)\, I_{\tau_j}(Q_i)},
\]
with $I_{\tau_j}(Q_i) = 1$ if $Q_i \in [\tau_{j-1},\tau_j)$ and $0$ otherwise, and $\tau_0 = \gamma_0$, $\tau_{s+1} = \gamma_{s+1}$. To proceed, we need the following lemmas.

Lemma 4. For any $[\tau_1,\ldots,\tau_s]$, we have
\[
\sup_x |\hat m_{\tau_j}(x) - m_{\tau_j}(x)| = O_p\big(h^r + (\ln n)^{1/2}/(nh^p)^{1/2}\big)
\]
and
\[
m_{\tau_j}(x) = \int \sum_{l=1}^{s+1} m_{\gamma_l}(x)\, I_{\gamma_l}(q)\, I_{\tau_j}(q)\,\frac{f(x,q)}{f_{\tau_j}(x)}\,dq. \quad \square \quad (34)
\]
Proof: Since $\hat m_{\tau_j}(x)$ is a local constant estimator, its almost sure convergence rate is $O_p\big(h^r + (\ln n)^{1/2}/(nh^p)^{1/2}\big)$ from the result of part a) in Theorem 2. From the definition of $m_{\tau_j}(x)$,
\[
m_{\tau_j}(x) = \int y\,\frac{f_{\tau_j}(x,y)}{f_{\tau_j}(x)}\,dy
= \int\!\!\int y\,\frac{f(x,y,q)}{f(x,q)}\,dy\, I_{\tau_j}(q)\,\frac{f(x,q)}{f_{\tau_j}(x)}\,dq
= \int \sum_{l=1}^{s+1} m_{\gamma_l}(x)\, I_{\gamma_l}(q)\, I_{\tau_j}(q)\,\frac{f(x,q)}{f_{\tau_j}(x)}\,dq. \; ∎
\]

Lemma 5. Under the condition that $X$ and $Q$ are exogenous, we have
\[
\frac{1}{n}\sum_{i=1}^n g(X_i,Q_i)\, e_i = o_p(1). \quad \square
\]
Proof: The second moment of $g(X_i,Q_i)\, e_i$ exists, that is,
\[
\int g^2(x_i,q_i)\,\sigma^2(x,q)\,dF(x,q) < \infty.
\]
Since $\mathrm E[g(X_i,Q_i)\, e_i] = 0$ for $X$ and $Q$ being exogenous, from the law of large numbers, we have
\[
\frac{1}{n}\sum_{i=1}^n g(X_i,Q_i)\, e_i \to \mathrm E[g(X_i,Q_i)\, e_i] = 0. \; ∎
\]

Let $d_{s,j}(X_i,Q_i) = \hat m_{\tau_j}(X_i)\, I_{\tau_j}(Q_i) - m_{\gamma_j}(X_i)\, I_{\gamma_j}(Q_i)$ and $G_{X,Q}(\tau_1,\ldots,\tau_s) = \mathrm E\big[\big(\sum_{j=1}^{s+1} d_{s,j}(X_i,Q_i)\big)^2\big]$. The estimated sum of squared residuals at threshold values $[\tau_1,\ldots,\tau_s]$ is
\[
\frac{1}{n}\sum_{i=1}^n \hat e_i^2(\tau_1,\ldots,\tau_s)
= \frac{1}{n}\sum_{i=1}^n e_i^2 - \frac{2}{n}\sum_{i=1}^n \sum_{j=1}^{s+1} d_{s,j}(X_i,Q_i)\, e_i + \frac{1}{n}\sum_{i=1}^n \Big[\sum_{j=1}^{s+1} d_{s,j}(X_i,Q_i)\Big]^2
\to_p \mathrm E(e_i^2) + G_{X,Q}(\tau_1,\ldots,\tau_s) = H(\tau_1,\ldots,\tau_s),
\]
with
\[
\frac{1}{n}\sum_{i=1}^n e_i^2 \to_p \mathrm E(e_i^2), \qquad
\frac{1}{n}\sum_{i=1}^n \sum_{j=1}^{s+1} d_{s,j}(X_i,Q_i)\, e_i \to_p 0, \qquad
\frac{1}{n}\sum_{i=1}^n \Big[\sum_{j=1}^{s+1} d_{s,j}(X_i,Q_i)\Big]^2 \to_p \mathrm E\Big[\sum_{j=1}^{s+1} d_{s,j}(X_i,Q_i)\Big]^2 = G_{X,Q}(\tau_1,\ldots,\tau_s).
\]
Moreover,
\[
G_{X,Q}(\tau_1,\ldots,\tau_s)
= \mathrm E\Big[\sum_{j=1}^{s+1} d_{s,j}(X_i,Q_i)\Big]^2
= \mathrm E\Big\{\sum_{j=1}^{s+1}\big\{[m_{\tau_j}(X_i) - m_{\gamma_j}(X_i)]\, I_{\gamma_j}(Q_i) + m_{\tau_j}(X_i)\,[I_{\tau_j}(Q_i) - I_{\gamma_j}(Q_i)]\big\}\Big\}^2 + O\!\left(\frac{1}{nh^p}\right).
\]
It is clear that, from Lemma 4, $G_{X,Q}(\tau_1,\ldots,\tau_s)$ and $H(\tau_1,\ldots,\tau_s)$ attain their minimum at $\tau_j = \gamma_j$, $\forall\, j \in \{1,\ldots,s\}$. According to Theorem 2.1 of Newey and McFadden (1994), we then have
\[
[\hat\gamma_1,\ldots,\hat\gamma_s] = \arg\min \frac{1}{n}\sum_{i=1}^n \hat e_i^2(\tau_1,\ldots,\tau_s) \to_p \arg\min H(\tau_1,\ldots,\tau_s) = [\gamma_1,\ldots,\gamma_s].
\]
This is the proof of part a) of Theorem 8. ∎
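The consistency argument above suggests a direct implementation: estimate the threshold values by minimizing the sum of squared residuals over candidate splits. The sketch below does this for a small number of thresholds by exhaustive grid search; the grid resolution, the trimming of candidate values, and the reuse of the hypothetical nw_regime helper from the earlier snippet are all assumptions of this illustration. A sequential alternative appears after the proof of Theorem 12.

    import numpy as np
    from itertools import combinations

    def ssr(taus, X, Y, Q, q_lo, q_hi, h):
        """Sum of squared residuals when the sample is split at the candidate
        threshold values `taus` (with tau_0 = q_lo and tau_{s+1} = q_hi)."""
        edges = [q_lo] + list(taus) + [q_hi + 1e-12]
        resid = np.empty_like(Y)
        for lo, hi in zip(edges[:-1], edges[1:]):
            I = (Q >= lo) & (Q < hi)
            resid[I] = Y[I] - np.array(
                [nw_regime(x0, X, Y, Q, lo, hi, h) for x0 in X[I]])
        return np.sum(resid ** 2)

    def estimate_thresholds(X, Y, Q, s, h, grid_size=30):
        """Exhaustive grid search for s threshold values minimizing the SSR."""
        grid = np.quantile(Q, np.linspace(0.1, 0.9, grid_size))  # trimmed grid
        return min(combinations(grid, s),
                   key=lambda taus: ssr(taus, X, Y, Q, Q.min(), Q.max(), h))

The exhaustive search costs on the order of grid_size to the power s evaluations, which is why sequential one-threshold searches are attractive in practice.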
For the proof of part b) of Theorem 8, without loss of generality, we provide the proof of $\hat\gamma_2 \to_p \gamma_2$ in a nonparametric regression with three thresholds. Denote
\[
G_{n,2,3}(\tau_2,\gamma_2) = \sum_{i=1}^n c^2_{2,3}(X_i)\, I_{\gamma_2,\tau_2}(Q_i), \qquad
J_{n,2,3}(\tau_2) - J_{n,2,3}(\gamma_2) = \frac{1}{\sqrt n}\sum_{i=1}^n c_{2,3}(X_i)\, e_i\, I_{\tau_2,\gamma_2}(Q_i),
\]
where $c_{2,3}(X_i) = m_{\gamma_2}(X_i) - m_{\gamma_3}(X_i)$. The following lemmas are needed for our proof.

Lemma 6. Define
\[
d_1 = \min_{\tau \in \mathbb R}\, \mathrm E\big(c^2_{2,3}(X_i)\,|\,Q_i = \tau\big)\, f(\tau) > 0, \qquad
d_2 = \max_{\tau \in \mathbb R}\, \big|\mathrm E\big(c_{2,3}(X_i)\,|\,Q_i = \tau\big)\big|\, f(\tau) > 0, \qquad
d_3 = \max_{\tau \in \mathbb R} f(\tau) > 0.
\]
There exist constants $B > 0$, $0 < d_1, d_2, d_3 < \infty$, and $0 < c < \infty$ such that, for all $\eta > 0$ and $\epsilon > 0$, there exists a $\bar v < \infty$ such that for all $n$,
\[
P\left(\inf_{\bar v/n \le |\tau_2-\gamma_2| \le B} \frac{G_{n,2,3}(\tau_2,\gamma_2)}{n\,|\tau_2-\gamma_2|} < (1-\eta)\, d_1\right) \le \epsilon,
\]
\[
P\left(\sup_{\bar v/n \le |\tau_2-\gamma_2| \le B} \frac{\sum_{i=1}^n |c_{2,3}(X_i)|\, |I_{\gamma_2,\tau_2}(Q_i)|}{n\,|\tau_2-\gamma_2|} > (1+\eta)\, d_2\right) \le \epsilon,
\]
\[
P\left(\sup_{\bar v/n \le |\tau_2-\gamma_2| \le B} \frac{\sum_{i=1}^n |I_{\gamma_2,\tau_2}(Q_i)|}{n\,|\tau_2-\gamma_2|} > (1+\eta)\, d_3\right) \le \epsilon. \quad \square
\]
Proof: See Lemma A.7 of Hansen (2000). ∎

Lemma 7. For all $\eta > 0$ and $\epsilon > 0$, there exists some $\bar v < \infty$ such that for any $B < \infty$,
\[
P\left(\sup_{\bar v/n \le |\tau_2-\gamma_2| \le B} \frac{|J_{n,2,3}(\tau_2) - J_{n,2,3}(\gamma_2)|}{\sqrt n\,|\tau_2-\gamma_2|} > \eta\right) \le \epsilon, \qquad
P\left(\sup_{\bar v/n \le |\tau_2-\gamma_2| \le B} \frac{\big|\sum_{i=1}^n |I_{\gamma_2,\tau_2}(Q_i)|\, e_i\big|}{\sqrt n\,|\tau_2-\gamma_2|} > \eta\right) \le \epsilon. \quad \square
\]
Proof: See Lemma A.8 of Hansen (2000). ∎

Let $E_n$ be the intersection of the events $\max(|\hat\gamma_1 - \gamma_1|, |\hat\gamma_2 - \gamma_2|, |\hat\gamma_3 - \gamma_3|) \le B$ and $\sup_i |\hat c_{2,3}(X_i) - c_{2,3}(X_i)| \le \kappa$. From Lemmas 6 and 7, on $E_n$ we have
\[
\inf_{\bar v/n \le |\tau_2-\gamma_2| \le B} \frac{G_{n,2,3}(\tau_2,\gamma_2)}{n\,|\tau_2-\gamma_2|} > (1-\eta)\, d_1, \qquad
\sup \frac{\sum_i |c_{2,3}(X_i)|\, |I_{\gamma_2,\tau_2}(Q_i)|}{n\,|\tau_2-\gamma_2|} < (1+\eta)\, d_2, \qquad
\sup \frac{\sum_i |I_{\gamma_2,\tau_2}(Q_i)|}{n\,|\tau_2-\gamma_2|} < (1+\eta)\, d_3,
\]
\[
\sup \frac{|J_{n,2,3}(\tau_2) - J_{n,2,3}(\gamma_2)|}{\sqrt n\,|\tau_2-\gamma_2|} < \eta, \qquad
\sup \frac{\big|\sum_i |I_{\gamma_2,\tau_2}(Q_i)|\, e_i\big|}{\sqrt n\,|\tau_2-\gamma_2|} < \eta.
\]
Take $\eta$ and $\kappa$ sufficiently small such that
\[
(1-\eta)\, d_1 - \eta - \kappa\eta - \kappa(1+\eta)\, d_2 - \kappa^2(1+\eta)\, d_3 - \kappa^2(1+\eta)\, d_3 - \kappa(1+\eta)\, d_2 \ge c > 0.
\]
We thus have
\[
\frac{SSR(\tau_1,\tau_2,\tau_3) - SSR(\tau_1,\gamma_2,\tau_3)}{n(\tau_2-\gamma_2)}
= \frac{1}{n(\tau_2-\gamma_2)}\sum_{i=1}^n \Big\{[m_{\gamma_2}(X_i) - m_{\gamma_3}(X_i)]^2\, I_{\gamma_2,\tau_2}(Q_i)
+ \big\{[m_{\gamma_2}(X_i) - m_{\gamma_3}(X_i)] - [\hat m_{\hat\gamma_2}(X_i) - \hat m_{\hat\gamma_3}(X_i)]\big\}\, I_{\gamma_2,\tau_2}(Q_i)
\times \big\{[m_{\gamma_2}(X_i) - m_{\gamma_3}(X_i)] + [\hat m_{\hat\gamma_2}(X_i) - \hat m_{\hat\gamma_3}(X_i)]\big\}
- 2\,[\hat m_{\hat\gamma_2}(X_i) - \hat m_{\hat\gamma_3}(X_i)]\, I_{\gamma_2,\tau_2}(Q_i)\, e_i
+ 2\,[\hat m_{\hat\gamma_2}(X_i) - \hat m_{\hat\gamma_3}(X_i)]\,[\hat m_{\hat\gamma_3}(X_i) - m_{\gamma_3}(X_i)]\, I_{\gamma_2,\tau_2}(Q_i)\Big\}.
\]
Therefore,
\[
\frac{SSR(\tau_1,\tau_2,\tau_3) - SSR(\tau_1,\gamma_2,\tau_3)}{n(\tau_2-\gamma_2)}
\ge \frac{G_{n,2,3}(\gamma_2,\tau_2)}{n\,|\tau_2-\gamma_2|}
- \frac{|J_{n,2,3}(\tau_2) - J_{n,2,3}(\gamma_2)|}{\sqrt n\,|\tau_2-\gamma_2|}
- \frac{\big|\sum_i [\hat c_{2,3}(X_i) - c_{2,3}(X_i)]\, I_{\gamma_2,\tau_2}(Q_i)\, e_i\big|}{\sqrt n\,|\tau_2-\gamma_2|}
- \frac{\sum_i n^{-\alpha}\, |\hat m_{\hat\gamma_3}(X_i) - m_{\gamma_3}(X_i)|\, |c_{2,3}(X_i)|\, |I_{\gamma_2,\tau_2}(Q_i)|}{n\,|\tau_2-\gamma_2|}
- \frac{\sum_i n^{-\alpha}\, |\hat m_{\hat\gamma_3}(X_i) - m_{\gamma_3}(X_i)|\, |\hat c_{2,3}(X_i) - c_{2,3}(X_i)|\, |I_{\gamma_2,\tau_2}(Q_i)|}{n\,|\tau_2-\gamma_2|}
- \frac{\sum_i [\hat c_{2,3}(X_i) - c_{2,3}(X_i)]^2\, |I_{\gamma_2,\tau_2}(Q_i)|}{n\,|\tau_2-\gamma_2|}
- \frac{\sum_i |\hat c_{2,3}(X_i) - c_{2,3}(X_i)|\, |c_{2,3}(X_i)|\, |I_{\gamma_2,\tau_2}(Q_i)|}{n\,|\tau_2-\gamma_2|}
\ge (1-\eta)\, d_1 - \eta - \kappa\eta - \kappa(1+\eta)\, d_2 - \kappa^2(1+\eta)\, d_3 - \kappa^2(1+\eta)\, d_3 - \kappa(1+\eta)\, d_2 \ge c > 0.
\]
This result indicates that, on the event $E_n$, $SSR(\tau_1,\tau_2,\tau_3) - SSR(\tau_1,\gamma_2,\tau_3) > 0$ when $\tau_2 \in [\gamma_2 + \bar v/n,\, \gamma_2 + B]$ and when $\tau_2 \in [\gamma_2 - B,\, \gamma_2 - \bar v/n]$. However, this contradicts the fact that $SSR(\tau_1,\hat\gamma_2,\tau_3) - SSR(\tau_1,\gamma_2,\tau_3) \le 0$. Hence $|\hat\gamma_2 - \gamma_2| \le \bar v/n$ on $E_n$, and then $P(E_n) \ge 1 - \epsilon$ for $n \ge \bar n$. This is equivalent to $P(n\,|\hat\gamma_2 - \gamma_2| > \bar v) \le \epsilon$ for $n \ge \bar n$. ∎

Proof of Theorem 9.

The following lemmas are necessary for proving Theorem 9.

Lemma 8. Given the existence of the small effect, $\delta_{n,l,k}(X_i) = n^{-\alpha}\, c_{l,k}(X_i) \to 0$, we have
\[
a_n(\hat\gamma_j - \gamma_j) = O_p(1), \qquad \text{where } a_n = n^{1-2\alpha}. \quad \square
\]
Proof: The proof is similar to the one in part b) of Theorem 8. ∎

Let us fix some new notations before introducing a new lemma:
\[
\mu := \mathrm E\big[c^{*2}_{2,3}(X_i)\,\big|\,Q_i = \gamma_2\big]\, f(\gamma_2), \qquad
\lambda := \mathrm E\big[c^{*2}_{2,3}(X_i)\, e_i^2\,\big|\,Q_i = \gamma_2\big]\, f(\gamma_2).
\]
Lemma 9. Let $G_{n,2,3}(v) = \frac{a_n}{n}\sum_{i=1}^n c^{*2}_{2,3}(X_i)\, d_{2,i}(v)$ and $d_{2,i}(v) = I_{\gamma_2 + v/a_n,\,\gamma_2}(Q_i)$. We then have $G_{n,2,3}(v) \to_p \mu\,|v|$. $\square$

Proof: Since
\[
\mathrm E\big(G_{n,2,3}(v)\big) = |v|\,\frac{\mathrm E\big(c^{*2}_{2,3}(X_i)\, d_{2,i}(v)\big)}{|v|/a_n}
\to |v|\, f(\gamma_2)\,\mathrm E\big[c^{*2}_{2,3}(X_i)\,\big|\,Q_i = \gamma_2\big] = \mu\,|v|
\]
from Lemma A.2 of Hansen (2000), and
\[
\mathrm V\big(G_{n,2,3}(v)\big) = \mathrm E\big[G_{n,2,3}(v) - \mathrm E\big(G_{n,2,3}(v)\big)\big]^2
\le \frac{a_n^2}{n}\, D\,\Big|\frac{v}{a_n}\Big| = D\,|v|\, n^{-2\alpha} \to 0,
\]
therefore $G_{n,2,3}(v) \to_p \mu\,|v|$ according to Chebyshev's inequality. ∎

Let $R_{n,2,3}(v) = \frac{\sqrt{a_n}}{\sqrt n}\sum_{i=1}^n c^{*}_{2,3}(X_i)\, e_i\, d_{2,i}(v)$. We have the following functional central limit theorem:

Lemma 10. $R_{n,2,3}(v) \to_d \sqrt\lambda\, B(v)$, where $B(v)$ is a standard Brownian motion. $\square$

Proof: The variance of $R_{n,2,3}(v)$ is
\[
\mathrm V_n[R_{n,2,3}(v)] = a_n\Big\{\mathrm E\big[c^{*2}_{2,3}(X_i)\, e_i^2\, d_{2,i}(v)\big]
+ 2\sum_{l=1}^{M(n)-1}\frac{n-l}{n}\,\mathrm E\big[c^{*}_{2,3}(X_1)\, c^{*}_{2,3}(X_{1+l})\, e_1 e_{1+l}\, d_{2,1}(v)\, d_{2,1+l}(v)\big]
+ 2\sum_{l=M(n)}^{n-1}\frac{n-l}{n}\,\mathrm E\big[c^{*}_{2,3}(X_1)\, c^{*}_{2,3}(X_{1+l})\, e_1 e_{1+l}\, d_{2,1}(v)\, d_{2,1+l}(v)\big]\Big\}
= V_{1n} + V_{2n} + V_{3n}.
\]
For any $M(n) \to \infty$ satisfying $M(n)/a_n \to 0$, $V_{1n}$ is
\[
V_{1n} = v\,\frac{\mathrm E\big(c^{*2}_{2,3}(X_i)\, e_i^2\, I_{\gamma_2+v/a_n}(Q_i)\big) - \mathrm E\big(c^{*2}_{2,3}(X_i)\, e_i^2\, I_{\gamma_2}(Q_i)\big)}{v/a_n}
\to v\,\mathrm E\big(c^{*2}_{2,3}(X_i)\, e_i^2\,\big|\,Q_i = \gamma_2\big)\, f(\gamma_2) = v\,\lambda. \quad (35)
\]
Furthermore, let
\[
\bar D_1 = \max_{l \in [1,\ldots,M(n)-1]} \mathrm E\big[c^{*}_{2,3}(X_1)\, c^{*}_{2,3}(X_{1+l})\, e_1 e_{1+l}\,\big|\,Q_1 = \gamma_2,\, Q_{1+l} = \gamma_2\big] < \infty.
\]
We then have
\[
V_{2n} = \frac{v^2}{a_n}\, 2\sum_{l=1}^{M(n)-1}\frac{n-l}{n}\,\mathrm E\big[c^{*}_{2,3}(X_1)\, c^{*}_{2,3}(X_{1+l})\, e_1 e_{1+l}\,\big|\,Q_1 = \gamma_2,\, Q_{1+l} = \gamma_2\big]\, f_{Q_1,Q_{1+l}}(\gamma_2,\gamma_2)
\le \frac{v^2}{a_n}\, M(n)\, \bar D_1\, f^2(\gamma_2) = o(1). \quad (36)
\]
From Lemma A.0 of Fan and Li (1999), it can then be seen that
\[
\mathrm E\big[c^{*}_{2,3}(X_1)\, c^{*}_{2,3}(X_{1+l})\, e_1 e_{1+l}\,\big|\,Q_1 = \gamma_2,\, Q_{1+l} = \gamma_2\big]
= \mathrm E\big[c^{*}_{2,3}(X_1)\, e_1\,\big|\,Q_1 = \gamma_2\big]\,\mathrm E\big[c^{*}_{2,3}(X_{1+l})\, e_{1+l}\,\big|\,Q_{1+l} = \gamma_2\big]
+ \big\{\mathrm E\big[c^{*}_{2,3}(X_1)\, c^{*}_{2,3}(X_{1+l})\, e_1 e_{1+l}\,\big|\,\cdot\big] - \mathrm E\big[c^{*}_{2,3}(X_1)\, e_1\,\big|\,\cdot\big]\,\mathrm E\big[c^{*}_{2,3}(X_{1+l})\, e_{1+l}\,\big|\,\cdot\big]\big\}
\le \bar D_2^{1/(1+\delta)}\, 2\,\beta^{\delta/(1+\delta)}(l),
\]
where
\[
\bar D_2 = \sup_{l \ge M(n)}\max\Big\{\mathrm E\big[\,|c^{*}_{2,3}(X_1)\, e_1\, c^{*}_{2,3}(X_{1+l})\, e_{1+l}|^{1+\delta}\,\big|\,Q_1 = \gamma_2,\, Q_{1+l} = \gamma_2\big],\;
\int\!\!\int |c^{*}_{2,3}(x_1)\, e_1\, c^{*}_{2,3}(x_{1+l})\, e_{1+l}|^{1+\delta}\, dF(x_1, e_1\,|\,q_1 = \gamma_2)\, dF(x_{1+l}, e_{1+l}\,|\,q_{1+l} = \gamma_2)\Big\}.
\]
In addition,
\[
V_{3n} = a_n\, 2\sum_{l=M(n)}^{n-1}\frac{n-l}{n}\,\mathrm E\big[c^{*}_{2,3}(X_1)\, c^{*}_{2,3}(X_{1+l})\, e_1 e_{1+l}\, d_{2,1}(v)\, d_{2,1+l}(v)\big]
\le \frac{v^2}{a_n}\, \bar D_2^{1/(1+\delta)}\, 2\sum_{l=1}^\infty l\,\beta^{\delta/(1+\delta)}(l) = o(1). \quad (37)
\]
By combining (35), (36), and (37), we have
\[
\mathrm V_n[R_{n,2,3}(v)] = \lambda\, v + o(1). \quad (38)
\]
Next, the big-block and small-block method is used to derive the asymptotic normality of $R_{n,2,3}(v)$. Let $s_n$ and $l_n$ satisfy
\[
\frac{s_n}{l_n} \to 0, \qquad \frac{l_n}{n} \to 0, \qquad \frac{l_n}{(nh)^{1/2}} \to 0, \qquad \frac{n}{l_n}\,\alpha(s_n) \to 0,
\]
where $\alpha$ is the mixing coefficient of $(Y_i, X_i, Q_i)$. Denote
\[
\zeta_j = \sum_{i=j(s_n+l_n)}^{j(s_n+l_n)+l_n-1} \frac{\sqrt{a_n}}{\sqrt n}\, c^{*}_{2,3}(X_i)\, e_i\, d_{2,i}(v), \qquad
\eta_j = \sum_{i=j(s_n+l_n)+l_n}^{(j+1)(s_n+l_n)-1} \frac{\sqrt{a_n}}{\sqrt n}\, c^{*}_{2,3}(X_i)\, e_i\, d_{2,i}(v), \qquad
\xi = \sum_{i=k_n(s_n+l_n)}^{n} \frac{\sqrt{a_n}}{\sqrt n}\, c^{*}_{2,3}(X_i)\, e_i\, d_{2,i}(v),
\]
where $k_n = [n/(s_n+l_n)]$ and $[\,\cdot\,]$ denotes the integer part (the Gauss bracket). Then $R_{n,2,3}(v)$ can be rewritten as
\[
R_{n,2,3}(v) = \sum_{j=0}^{k_n-1}\zeta_j + \sum_{j=0}^{k_n-1}\eta_j + \xi = R'_{n,2,3}(v) + R''_{n,2,3}(v) + R'''_{n,2,3}(v).
\]
The necessary conditions for applying a functional central limit theorem in a big- and small-block method include
\[
R''_{n,2,3}(v) \to 0, \qquad R'''_{n,2,3}(v) \to 0, \quad (39)
\]
\[
\Big|\mathrm E\big(e^{itR'_{n,2,3}(v)}\big) - \prod_{j=0}^{k_n-1}\mathrm E\big(e^{it\zeta_j}\big)\Big| \to 0, \quad (40)
\]
\[
\mathrm V\big(R'_{n,2,3}(v)\big) \to \lambda\, v, \quad (41)
\]
\[
\frac{1}{n}\sum_{j=0}^{k_n-1}\mathrm E\big(\zeta_j^2\, I[|\zeta_j| \ge \epsilon\theta\sqrt n]\big) \to 0, \quad (42)
\]
\[
P\Big(\sup_{v_1 \le v \le v_1+\nu} |R_{n,2,3}(v) - R_{n,2,3}(v_1)| > \zeta\Big) \to 0. \quad (43)
\]
From (38), we have the variance $\mathrm V(\eta_j) \approx \frac{s_n}{n}\, v\lambda$ and then the variance $\mathrm V(R''_{n,2,3}(v)) = \frac{k_n s_n}{n}\, v\lambda \approx \frac{s_n}{l_n+s_n}\, v\lambda = o(1)$. Similarly, we have $\mathrm V(R'''_{n,2,3}(v)) = o(1)$. Therefore, it is clear that (39) holds. In addition, as $\mathrm V(R'_{n,2,3}(v)) = \frac{k_n l_n}{n}\, v\lambda \approx \frac{l_n}{l_n+s_n}\, v\lambda \to v\lambda$, it can be seen that (41) also holds. From Proposition 2.6 of Fan and Yao (2003), we have
\[
\Big|\mathrm E\big(e^{itR'_{n,2,3}(v)}\big) - \prod_{j=0}^{k_n-1}\mathrm E\big(e^{it\zeta_j}\big)\Big| \le k_n\,\alpha(s_n) \to 0,
\]
and then (40) also holds. Furthermore, from Lemma 1 of Hansen (2000), and by letting $D = \max_{q \in \mathbb R}\mathrm E\big[\,|c^{*}_{2,3}(X_i)\, e_i|^4\,\big|\,Q_i = q\big]\, f(q)$ and $u_{i,n}(v) = \sqrt{a_n}\, c^{*}_{2,3}(X_i)\, e_i\, d_{2,i}(v)$, we obtain
\[
\mathrm E\Big(n^{-1/2}\max_{1\le i\le n}|u_{i,n}(v)|\Big)^2 \le \frac{1}{n}\big(n\,\mathrm E|u_{i,n}(v)|^4\big)^{1/2}, \qquad
\mathrm E|u_{i,n}(v)|^4 = a_n^2\,\mathrm E\big(|c^{*}_{2,3}(X_i)\, e_i|^4\, |d_{2,i}(v)|\big) \le a_n^2\, D\,\frac{|v|}{a_n} = a_n\, D\,|v|,
\]
so that $\mathrm E\big(n^{-1/2}\max_{1\le i\le n}|u_{i,n}(v)|\big)^2 \le n^{-\alpha}(D\,|v|)^{1/2} \to 0$, and then (42) holds. From Lemma 3 of Hansen (1999), we have
\[
P\Big(\sup_{v_1 \le v \le v_1+\nu}|R_{n,2,3}(v) - R_{n,2,3}(v_1)| > \zeta\Big)
= P\Big(\sup_{\tau \le \tau' \le \tau+\nu/a_n}|R_{n,2,3}(\tau') - R_{n,2,3}(\tau)| > \zeta\Big)
\le K\,(\nu a_n)^2\, a_n^{-2}\,\zeta^{-4} = K\,\nu^2\,\zeta^{-4} \le \nu\,\epsilon,
\]
and then (43) also holds. Finally, combining equations (39) through (43), we have proved $(\lambda)^{-1/2}\, R_{n,2,3}(v) \to_d B(v)$.

Given Lemma 8, the probability of having $\hat\gamma_2$ in $(\gamma_2 - \bar v/a_n,\, \gamma_2 + \bar v/a_n)$ is at least $1 - \epsilon$. Denote $Q_n(v) = SSR(\tau_1,\gamma_2,\tau_3) - SSR(\tau_1,\gamma_2+v/a_n,\tau_3)$. We consequently have
\[
Q_n(v) = \sum_{i=1}^n \Big\{-n^{-2\alpha}\, c^{*2}_{2,3}(X_i)\, d_{2,i}(v)
- \big[\delta_{n,2,3}(X_i) + \hat\delta_{n,2,3}(X_i)\big]\, d_{2,i}(v)\,\big\{\delta_{n,2,3}(X_i) - \hat\delta_{n,2,3}(X_i)\big\}
+ 2\,\big[\hat m_{\hat\gamma_2}(X_i) - \hat m_{\hat\gamma_3}(X_i)\big]\, d_{2,i}(v)\, e_i
+ 2\,\hat\delta_{n,2,3}(X_i)\,\big[\hat m_{\hat\gamma_3}(X_i) - m_{\gamma_3}(X_i)\big]\, d_{2,i}(v)\Big\}
= -G_{n,2,3}(v) + 2\,R_{n,2,3}(v) + L_{n,2,3}(v),
\]
and
\[
L_{n,2,3}(v) \le \sqrt n\,\sup_i\big|\hat\delta_{n,2,3}(X_i) - \delta_{n,2,3}(X_i)\big| \times |R_{n,2,3}(v)|
+ \sum_{i=1}^n\Big[2\, n^{-\alpha}\,\big|\hat m_{\hat\gamma_3}(X_i) - m_{\gamma_3}(X_i)\big| + \big|c^{*}_{2,3}(X_i) - \hat c^{*}_{2,3}(X_i)\big| \times \big|c^{*}_{2,3}(X_i) + \hat c^{*}_{2,3}(X_i)\big|\Big]\, |d_{2,i}(v)| \to 0.
\]
Given Lemmas 9 and 10, we have $Q_n(v) \to_d -|v|\,\mu + 2\sqrt\lambda\, B(v) = Q(v)$, and then from Theorem 2.7 of Kim and Pollard (1990), we obtain, as in Theorem 1 of Hansen (2000),
\[
a_n(\hat\gamma_2 - \gamma_2) \to_d \arg\max_{v \in \mathbb R} Q(v). \; ∎
\]
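The limiting distribution $\arg\max_v\{-\mu|v| + 2\sqrt\lambda\, B(v)\}$ has no simple closed form but is easy to simulate. The following sketch approximates it with a random walk on a discrete grid; the grid width, the truncation horizon, and the values of $\mu$ and $\lambda$ are assumptions of this illustration.

    import numpy as np

    rng = np.random.default_rng(1)

    def argmax_Q(mu, lam, T=50.0, dv=0.01, n_rep=2000):
        """Simulate arg max_v { -mu*|v| + 2*sqrt(lam)*B(v) } on [-T, T],
        where B is a two-sided standard Brownian motion with B(0) = 0."""
        m = int(T / dv)
        v = np.arange(-m, m + 1) * dv
        draws = np.empty(n_rep)
        for r in range(n_rep):
            # two independent Brownian paths glued together at v = 0
            right = np.concatenate(([0.0], np.cumsum(np.sqrt(dv) * rng.standard_normal(m))))
            left = np.concatenate(([0.0], np.cumsum(np.sqrt(dv) * rng.standard_normal(m))))
            B = np.concatenate((left[::-1][:-1], right))
            Q = -mu * np.abs(v) + 2.0 * np.sqrt(lam) * B
            draws[r] = v[np.argmax(Q)]
        return draws

    # e.g. quantiles of the limit of a_n * (gamma_hat - gamma):
    d = argmax_Q(mu=1.0, lam=1.0)
    print(np.quantile(d, [0.05, 0.5, 0.95]))

Simulated quantiles of this kind could, under the stated assumptions, serve as a rough benchmark for the spread of $a_n(\hat\gamma_2 - \gamma_2)$.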
Note for Theorem 10.

\[
\frac{1}{n}\, SSR(\gamma) \to_p S(\gamma) = \sum_{j=1}^{4} b_j(\gamma)\, I_{\gamma_j}(\gamma),
\]
where $I_{\gamma_j}(\gamma) = 1$ for $\gamma \in [\gamma_{j-1},\gamma_j)$ and $0$ otherwise. Throughout, $c_{l,k}(X_i) = m_{\gamma_l}(X_i) - m_{\gamma_k}(X_i)$; for an interval $[a,b)$, $f_{a,b}(x) = \int_a^b f(x,q)\,dq$ and $I_{a,b}(Q_i) = 1$ for $Q_i \in [a,b)$ and $0$ otherwise, with the shorthand $f_{\gamma_j}(x) = f_{\gamma_{j-1},\gamma_j}(x)$, $f_{0,\gamma}(x) = f_{\gamma_0,\gamma}(x)$, and $f_{\gamma,4}(x) = f_{\gamma,\gamma_4}(x)$. Then
\[
b_1(\gamma) = \mathrm E(e_i^2)
+ \mathrm E\left\{\left[\frac{c_{1,2}(X_i) f_{\gamma_2}(X_i) + c_{1,3}(X_i) f_{\gamma_3}(X_i) + c_{1,4}(X_i) f_{\gamma_4}(X_i)}{f_{\gamma,4}(X_i)}\right]^2 I_{\gamma,\gamma_1}(Q_i)\right\}
+ \mathrm E\left\{\left[\frac{-c_{1,2}(X_i) f_{\gamma,\gamma_1}(X_i) + c_{2,3}(X_i) f_{\gamma_3}(X_i) + c_{2,4}(X_i) f_{\gamma_4}(X_i)}{f_{\gamma,4}(X_i)}\right]^2 I_{\gamma_2}(Q_i)\right\}
+ \mathrm E\left\{\left[\frac{-c_{1,3}(X_i) f_{\gamma,\gamma_1}(X_i) - c_{2,3}(X_i) f_{\gamma_2}(X_i) + c_{3,4}(X_i) f_{\gamma_4}(X_i)}{f_{\gamma,4}(X_i)}\right]^2 I_{\gamma_3}(Q_i)\right\}
+ \mathrm E\left\{\left[\frac{c_{1,4}(X_i) f_{\gamma,\gamma_1}(X_i) + c_{2,4}(X_i) f_{\gamma_2}(X_i) + c_{3,4}(X_i) f_{\gamma_3}(X_i)}{f_{\gamma,4}(X_i)}\right]^2 I_{\gamma_4}(Q_i)\right\},
\]
with $I_{\gamma,\gamma_1}(Q_i) = 1$ for $Q_i \in [\gamma,\gamma_1)$ and $0$ otherwise;
\[
b_2(\gamma) = \mathrm E(e_i^2)
+ \mathrm E\left\{\left[\frac{c_{1,2}(X_i) f_{\gamma_1,\gamma}(X_i)}{f_{0,\gamma}(X_i)}\right]^2 I_{\gamma_1}(Q_i)\right\}
+ \mathrm E\left\{\left[\frac{c_{1,2}(X_i) f_{\gamma_1}(X_i)}{f_{0,\gamma}(X_i)}\right]^2 I_{\gamma_1,\gamma}(Q_i)\right\}
+ \mathrm E\left\{\left[\frac{c_{2,3}(X_i) f_{\gamma_3}(X_i) + c_{2,4}(X_i) f_{\gamma_4}(X_i)}{f_{\gamma,4}(X_i)}\right]^2 I_{\gamma,\gamma_2}(Q_i)\right\}
+ \mathrm E\left\{\left[\frac{-c_{2,3}(X_i) f_{\gamma,\gamma_2}(X_i) + c_{3,4}(X_i) f_{\gamma_4}(X_i)}{f_{\gamma,4}(X_i)}\right]^2 I_{\gamma_3}(Q_i)\right\}
+ \mathrm E\left\{\left[\frac{c_{2,4}(X_i) f_{\gamma,\gamma_2}(X_i) + c_{3,4}(X_i) f_{\gamma_3}(X_i)}{f_{\gamma,4}(X_i)}\right]^2 I_{\gamma_4}(Q_i)\right\},
\]
with $I_{\gamma_1,\gamma}(Q_i) = 1$ for $Q_i \in [\gamma_1,\gamma)$ and $I_{\gamma,\gamma_2}(Q_i) = 1$ for $Q_i \in [\gamma,\gamma_2)$, each $0$ otherwise;
\[
b_3(\gamma) = \mathrm E(e_i^2)
+ \mathrm E\left\{\left[\frac{c_{1,2}(X_i) f_{\gamma_2}(X_i) + c_{1,3}(X_i) f_{\gamma_2,\gamma}(X_i)}{f_{0,\gamma}(X_i)}\right]^2 I_{\gamma_1}(Q_i)\right\}
+ \mathrm E\left\{\left[\frac{-c_{1,2}(X_i) f_{\gamma_1}(X_i) + c_{2,3}(X_i) f_{\gamma_2,\gamma}(X_i)}{f_{0,\gamma}(X_i)}\right]^2 I_{\gamma_2}(Q_i)\right\}
+ \mathrm E\left\{\left[\frac{c_{1,3}(X_i) f_{\gamma_1}(X_i) + c_{2,3}(X_i) f_{\gamma_2}(X_i)}{f_{0,\gamma}(X_i)}\right]^2 I_{\gamma_2,\gamma}(Q_i)\right\}
+ \mathrm E\left\{\left[\frac{c_{3,4}(X_i) f_{\gamma_4}(X_i)}{f_{\gamma,4}(X_i)}\right]^2 I_{\gamma,\gamma_3}(Q_i)\right\}
+ \mathrm E\left\{\left[\frac{c_{3,4}(X_i) f_{\gamma,\gamma_3}(X_i)}{f_{\gamma,4}(X_i)}\right]^2 I_{\gamma_4}(Q_i)\right\},
\]
with $I_{\gamma_2,\gamma}(Q_i) = 1$ for $Q_i \in [\gamma_2,\gamma)$ and $I_{\gamma,\gamma_3}(Q_i) = 1$ for $Q_i \in [\gamma,\gamma_3)$, each $0$ otherwise; and
\[
b_4(\gamma) = \mathrm E(e_i^2)
+ \mathrm E\left\{\left[\frac{c_{1,2}(X_i) f_{\gamma_2}(X_i) + c_{1,3}(X_i) f_{\gamma_3}(X_i) + c_{1,4}(X_i) f_{\gamma_3,\gamma}(X_i)}{f_{0,\gamma}(X_i)}\right]^2 I_{\gamma_1}(Q_i)\right\}
+ \mathrm E\left\{\left[\frac{-c_{1,2}(X_i) f_{\gamma_1}(X_i) + c_{2,3}(X_i) f_{\gamma_3}(X_i) + c_{2,4}(X_i) f_{\gamma_3,\gamma}(X_i)}{f_{0,\gamma}(X_i)}\right]^2 I_{\gamma_2}(Q_i)\right\}
+ \mathrm E\left\{\left[\frac{-c_{1,3}(X_i) f_{\gamma_1}(X_i) - c_{2,3}(X_i) f_{\gamma_2}(X_i) + c_{3,4}(X_i) f_{\gamma_3,\gamma}(X_i)}{f_{0,\gamma}(X_i)}\right]^2 I_{\gamma_3}(Q_i)\right\}
+ \mathrm E\left\{\left[\frac{c_{1,4}(X_i) f_{\gamma_1}(X_i) + c_{2,4}(X_i) f_{\gamma_2}(X_i) + c_{3,4}(X_i) f_{\gamma_3}(X_i)}{f_{0,\gamma}(X_i)}\right]^2 I_{\gamma_3,\gamma}(Q_i)\right\},
\]
with $I_{\gamma_3,\gamma}(Q_i) = 1$ for $Q_i \in [\gamma_3,\gamma)$ and $0$ otherwise. $\square$

Graphic Description of Theorem 10.

[Figure: four panels, one for each case $b_1$ through $b_4$, plotting the true thresholds $\gamma_1$, $\gamma_2$, $\gamma_3$ and the pseudo threshold $\gamma$ on the $Q$ axis, together with the fitted pieces $\hat m_\gamma(x)$ (for $Q_i < \gamma$) and $\hat m^*_\gamma(x)$ (for $Q_i \ge \gamma$); the labels (1) through (5) mark the sub-intervals on which the mis-specification errors listed below are defined.]
Given the three true threshold values $\gamma_1$, $\gamma_2$, and $\gamma_3$, the threshold value $\gamma$ of a mis-specified nonparametric regression with one threshold may be in $[\gamma_0,\gamma_1)$, in $(\gamma_1,\gamma_2)$, in $(\gamma_2,\gamma_3)$, or in $(\gamma_3,\gamma_4]$. For $\gamma \in [\gamma_0,\gamma_1)$, there is no model mis-specification error for $Q_i \in [\gamma_0,\gamma]$, but the mis-specification errors are
1. for $Q_i \in [\gamma,\gamma_1)$: $\hat m^*_\gamma(x) - m_{\gamma_1}(x)$,
2. for $Q_i \in [\gamma_1,\gamma_2)$: $\hat m^*_\gamma(x) - m_{\gamma_2}(x)$,
3. for $Q_i \in [\gamma_2,\gamma_3)$: $\hat m^*_\gamma(x) - m_{\gamma_3}(x)$, and
4. for $Q_i \in [\gamma_3,\gamma_4]$: $\hat m^*_\gamma(x) - m_{\gamma_4}(x)$,
as shown in the first graph (Case $b_1$). For $\gamma \in (\gamma_1,\gamma_2)$, the mis-specification errors are
1. for $Q_i \in [\gamma_0,\gamma_1)$: $\hat m_\gamma(x) - m_{\gamma_1}(x)$, denoted as (1),
2. for $Q_i \in [\gamma_1,\gamma)$: $\hat m_\gamma(x) - m_{\gamma_2}(x)$, denoted as (2),
3. for $Q_i \in [\gamma,\gamma_2)$: $\hat m^*_\gamma(x) - m_{\gamma_2}(x)$, denoted as (3),
4. for $Q_i \in [\gamma_2,\gamma_3)$: $\hat m^*_\gamma(x) - m_{\gamma_3}(x)$, denoted as (4),
5. for $Q_i \in [\gamma_3,\gamma_4]$: $\hat m^*_\gamma(x) - m_{\gamma_4}(x)$, denoted as (5),
as shown in the second graph (Case $b_2$). For $\gamma \in (\gamma_2,\gamma_3)$, the mis-specification errors are
1. for $Q_i \in [\gamma_0,\gamma_1)$: $\hat m_\gamma(x) - m_{\gamma_1}(x)$, denoted as (1),
2. for $Q_i \in [\gamma_1,\gamma_2)$: $\hat m_\gamma(x) - m_{\gamma_2}(x)$, denoted as (2),
3. for $Q_i \in [\gamma_2,\gamma)$: $\hat m_\gamma(x) - m_{\gamma_3}(x)$, denoted as (3),
4. for $Q_i \in [\gamma,\gamma_3)$: $\hat m^*_\gamma(x) - m_{\gamma_3}(x)$, denoted as (4),
5. for $Q_i \in [\gamma_3,\gamma_4]$: $\hat m^*_\gamma(x) - m_{\gamma_4}(x)$, denoted as (5),
as shown in the third graph (Case $b_3$). For $\gamma \in (\gamma_3,\gamma_4]$, the mis-specification errors are
1. for $Q_i \in [\gamma_0,\gamma_1)$: $\hat m_\gamma(x) - m_{\gamma_1}(x)$, denoted as (1),
2. for $Q_i \in [\gamma_1,\gamma_2)$: $\hat m_\gamma(x) - m_{\gamma_2}(x)$, denoted as (2),
3. for $Q_i \in [\gamma_2,\gamma_3)$: $\hat m_\gamma(x) - m_{\gamma_3}(x)$, denoted as (3), and
4. for $Q_i \in [\gamma_3,\gamma)$: $\hat m_\gamma(x) - m_{\gamma_4}(x)$, denoted as (4),
as shown in the last graph (Case $b_4$). Note that there is no model mis-specification error for $Q_i \in [\gamma,\gamma_4]$ in this case.

As to the cases of $\gamma = \gamma_1$, $\gamma_2$, or $\gamma_3$, the model mis-specification errors are
\[
S(\gamma_1) = \mathrm E(e_i^2)
+ \mathrm E\left\{\left[\frac{c_{2,3}(X_i) f_{\gamma_3}(X_i) + c_{2,4}(X_i) f_{\gamma_4}(X_i)}{f(X_i) - f_{\gamma_1}(X_i)}\right]^2 I_{\gamma_2}(Q_i)\right\}
+ \mathrm E\left\{\left[\frac{-c_{2,3}(X_i) f_{\gamma_2}(X_i) + c_{3,4}(X_i) f_{\gamma_4}(X_i)}{f(X_i) - f_{\gamma_1}(X_i)}\right]^2 I_{\gamma_3}(Q_i)\right\}
+ \mathrm E\left\{\left[\frac{c_{2,4}(X_i) f_{\gamma_2}(X_i) + c_{3,4}(X_i) f_{\gamma_3}(X_i)}{f(X_i) - f_{\gamma_1}(X_i)}\right]^2 I_{\gamma_4}(Q_i)\right\},
\]
\[
S(\gamma_2) = \mathrm E(e_i^2)
+ \mathrm E\left\{\left[\frac{c_{1,2}(X_i) f_{\gamma_2}(X_i)}{f_{\gamma_1}(X_i) + f_{\gamma_2}(X_i)}\right]^2 I_{\gamma_1}(Q_i)\right\}
+ \mathrm E\left\{\left[\frac{-c_{1,2}(X_i) f_{\gamma_1}(X_i)}{f_{\gamma_1}(X_i) + f_{\gamma_2}(X_i)}\right]^2 I_{\gamma_2}(Q_i)\right\}
+ \mathrm E\left\{\left[\frac{c_{3,4}(X_i) f_{\gamma_4}(X_i)}{f_{\gamma_3}(X_i) + f_{\gamma_4}(X_i)}\right]^2 I_{\gamma_3}(Q_i)\right\}
+ \mathrm E\left\{\left[\frac{c_{3,4}(X_i) f_{\gamma_3}(X_i)}{f_{\gamma_3}(X_i) + f_{\gamma_4}(X_i)}\right]^2 I_{\gamma_4}(Q_i)\right\},
\]
\[
S(\gamma_3) = \mathrm E(e_i^2)
+ \mathrm E\left\{\left[\frac{c_{1,2}(X_i) f_{\gamma_2}(X_i) + c_{1,3}(X_i) f_{\gamma_3}(X_i)}{f(X_i) - f_{\gamma_4}(X_i)}\right]^2 I_{\gamma_1}(Q_i)\right\}
+ \mathrm E\left\{\left[\frac{-c_{1,2}(X_i) f_{\gamma_1}(X_i) + c_{2,3}(X_i) f_{\gamma_3}(X_i)}{f(X_i) - f_{\gamma_4}(X_i)}\right]^2 I_{\gamma_2}(Q_i)\right\}
+ \mathrm E\left\{\left[\frac{c_{1,3}(X_i) f_{\gamma_1}(X_i) + c_{2,3}(X_i) f_{\gamma_2}(X_i)}{f(X_i) - f_{\gamma_4}(X_i)}\right]^2 I_{\gamma_3}(Q_i)\right\}. \quad \square
\]
Proof of Theorems 10 and 11.

From Lemma 4,
\[
\sup_x |\hat m_\gamma(x) - m_\gamma(x)| = O_p\big(h^r + (\ln n)^{1/2}/(nh^p)^{1/2}\big), \qquad
\sup_x |\hat m^*_\gamma(x) - m^*_\gamma(x)| = O_p\big(h^r + (\ln n)^{1/2}/(nh^p)^{1/2}\big),
\]
where
\[
m_\gamma(x) = m_{\gamma_1}(x)\, I_{\gamma_1}(\gamma)
+ \left[\frac{m_{\gamma_1}(x) f_{\gamma_1}(x) + m_{\gamma_2}(x) f_{\gamma_1,\gamma}(x)}{f_{0,\gamma}(x)}\right] I_{\gamma_2}(\gamma)
+ \left[\frac{m_{\gamma_1}(x) f_{\gamma_1}(x) + m_{\gamma_2}(x) f_{\gamma_2}(x) + m_{\gamma_3}(x) f_{\gamma_2,\gamma}(x)}{f_{0,\gamma}(x)}\right] I_{\gamma_3}(\gamma)
+ \left[\frac{m_{\gamma_1}(x) f_{\gamma_1}(x) + m_{\gamma_2}(x) f_{\gamma_2}(x) + m_{\gamma_3}(x) f_{\gamma_3}(x) + m_{\gamma_4}(x) f_{\gamma_3,\gamma}(x)}{f_{0,\gamma}(x)}\right] I_{\gamma_4}(\gamma)
\]
and
\[
m^*_\gamma(x) = \left[\frac{m_{\gamma_1}(x) f_{\gamma,\gamma_1}(x) + m_{\gamma_2}(x) f_{\gamma_2}(x) + m_{\gamma_3}(x) f_{\gamma_3}(x) + m_{\gamma_4}(x) f_{\gamma_4}(x)}{f_{\gamma,4}(x)}\right] I_{\gamma_1}(\gamma)
+ \left[\frac{m_{\gamma_2}(x) f_{\gamma,\gamma_2}(x) + m_{\gamma_3}(x) f_{\gamma_3}(x) + m_{\gamma_4}(x) f_{\gamma_4}(x)}{f_{\gamma,4}(x)}\right] I_{\gamma_2}(\gamma)
+ \left[\frac{m_{\gamma_3}(x) f_{\gamma,\gamma_3}(x) + m_{\gamma_4}(x) f_{\gamma_4}(x)}{f_{\gamma,4}(x)}\right] I_{\gamma_3}(\gamma)
+ m_{\gamma_4}(x)\, I_{\gamma_4}(\gamma). \quad (44)
\]
Denote $\gamma$ as a pseudo threshold value considered in a mis-specified nonparametric regression with one threshold and assume $\gamma \in [\gamma_0,\gamma_1)$. From (44), we have
\[
\frac{1}{n}\sum_{i=1}^n \big\{Y_i - \hat m_\gamma(X_i)\, I_\gamma(Q_i) - \hat m^*_\gamma(X_i)\,(1 - I_\gamma(Q_i))\big\}^2
= \frac{1}{n}\sum_{i=1}^n \Big\{e_i
+ \frac{c_{1,2}(X_i) f_{\gamma_2}(X_i) + c_{1,3}(X_i) f_{\gamma_3}(X_i) + c_{1,4}(X_i) f_{\gamma_4}(X_i)}{f_{\gamma,4}(X_i)}\, I_{\gamma,\gamma_1}(Q_i)
+ \frac{-c_{1,2}(X_i) f_{\gamma,\gamma_1}(X_i) + c_{2,3}(X_i) f_{\gamma_3}(X_i) + c_{2,4}(X_i) f_{\gamma_4}(X_i)}{f_{\gamma,4}(X_i)}\, I_{\gamma_2}(Q_i)
+ \frac{-c_{1,3}(X_i) f_{\gamma,\gamma_1}(X_i) - c_{2,3}(X_i) f_{\gamma_2}(X_i) + c_{3,4}(X_i) f_{\gamma_4}(X_i)}{f_{\gamma,4}(X_i)}\, I_{\gamma_3}(Q_i)
+ \frac{c_{1,4}(X_i) f_{\gamma,\gamma_1}(X_i) + c_{2,4}(X_i) f_{\gamma_2}(X_i) + c_{3,4}(X_i) f_{\gamma_3}(X_i)}{f_{\gamma,4}(X_i)}\, I_{\gamma_4}(Q_i)\Big\}^2 + o_p(1). \quad (45)
\]
Based on Lemma 5, the limit of the cross products of $e_i$ with the other terms in the above equation will be $o_p(1)$. Note also that the cross products among these terms converge to zero, since the indicator functions involved are mutually exclusive. Therefore, the limit of (45) is
\[
\frac{1}{n}\sum_{i=1}^n \big\{Y_i - \hat m_\gamma(X_i)\, I_\gamma(Q_i) - \hat m^*_\gamma(X_i)\,(1 - I_\gamma(Q_i))\big\}^2 \to_p b_1(\gamma).
\]
The limiting properties of $b_2(\gamma)$, $b_3(\gamma)$, and $b_4(\gamma)$ can be derived in the same manner. ∎

Proof of Theorem 12.

The slope of $b_1(\gamma)$ for $\gamma \in [\gamma_0,\gamma_1)$ is
\[
\frac{d\, b_1(\gamma)}{d\gamma}
= -\int \left[\frac{c_{1,2}(x_i) f_{\gamma_2}(x_i) + c_{1,3}(x_i) f_{\gamma_3}(x_i) + c_{1,4}(x_i) f_{\gamma_4}(x_i)}{f_{\gamma,4}(x_i)}\right]^2 f(x_i, q_i = \gamma)\,dx_i
= -\mathrm E\left\{\left[\frac{c_{1,2}(X_i) f_{\gamma_2}(X_i) + c_{1,3}(X_i) f_{\gamma_3}(X_i) + c_{1,4}(X_i) f_{\gamma_4}(X_i)}{f_{\gamma,4}(X_i)}\right]^2 \Bigg|\; Q_i = \gamma\right\} f(\gamma). \quad (46)
\]
The slope of $b_2(\gamma)$ for $\gamma \in [\gamma_1,\gamma_2)$ is
\[
\frac{d\, b_2(\gamma)}{d\gamma}
= \int \left\{\left[\frac{c_{1,2}(x_i) f_{\gamma_1}(x_i)}{f_{0,\gamma}(x_i)}\right]^2 - \left[\frac{c_{2,3}(x_i) f_{\gamma_3}(x_i) + c_{2,4}(x_i) f_{\gamma_4}(x_i)}{f_{\gamma,4}(x_i)}\right]^2\right\} f(x_i, q_i = \gamma)\,dx_i
= \mathrm E\left\{\left[\frac{c_{1,2}(X_i) f_{\gamma_1}(X_i)}{f_{0,\gamma}(X_i)}\right]^2 \Bigg|\; Q_i = \gamma\right\} f(\gamma)
- \mathrm E\left\{\left[\frac{c_{2,3}(X_i) f_{\gamma_3}(X_i) + c_{2,4}(X_i) f_{\gamma_4}(X_i)}{f_{\gamma,4}(X_i)}\right]^2 \Bigg|\; Q_i = \gamma\right\} f(\gamma). \quad (47)
\]
The slope of $b_3(\gamma)$ for $\gamma \in [\gamma_2,\gamma_3)$ is
\[
\frac{d\, b_3(\gamma)}{d\gamma}
= \mathrm E\left\{\left[\frac{c_{1,3}(X_i) f_{\gamma_1}(X_i) + c_{2,3}(X_i) f_{\gamma_2}(X_i)}{f_{0,\gamma}(X_i)}\right]^2 \Bigg|\; Q_i = \gamma\right\} f(\gamma)
- \mathrm E\left\{\left[\frac{c_{3,4}(X_i) f_{\gamma_4}(X_i)}{f_{\gamma,4}(X_i)}\right]^2 \Bigg|\; Q_i = \gamma\right\} f(\gamma). \quad (48)
\]
Finally, the slope of $b_4(\gamma)$ for $\gamma \in [\gamma_3,\gamma_4)$ is
\[
\frac{d\, b_4(\gamma)}{d\gamma}
= \mathrm E\left\{\left[\frac{c_{1,4}(X_i) f_{\gamma_1}(X_i) + c_{2,4}(X_i) f_{\gamma_2}(X_i) + c_{3,4}(X_i) f_{\gamma_3}(X_i)}{f_{0,\gamma}(X_i)}\right]^2 \Bigg|\; Q_i = \gamma\right\} f(\gamma). \quad (49)
\]
From (46), the slope is strictly negative, so $b_1(\gamma)$ is strictly decreasing in $\gamma$ for $\gamma \in [\gamma_0,\gamma_1)$. Thus, $S(\gamma_1)$ is the smallest value of the model mis-specification error for $\gamma \in [\gamma_0,\gamma_1)$. For $\gamma \in [\gamma_1,\gamma_2)$, we denote
\[
\pi(x_i,\gamma) = \left[\frac{c_{1,2}(x_i) f_{\gamma_1}(x_i)}{f_{0,\gamma}(x_i)}\right]^2 - \left[\frac{c_{2,3}(x_i) f_{\gamma_3}(x_i) + c_{2,4}(x_i) f_{\gamma_4}(x_i)}{f_{\gamma,4}(x_i)}\right]^2, \qquad \forall\, x_i \in \mathbb R^p.
\]
The partial effect of $\gamma$ on $\pi(x_i,\gamma)$ is
\[
\frac{\partial \pi(x_i,\gamma)}{\partial \gamma}
= -2\,\frac{c^2_{1,2}(x_i)\, f^2_{\gamma_1}(x_i)\, f(x_i,\gamma)}{f^3_{0,\gamma}(x_i)}
- 2\,\frac{\big[c_{2,3}(x_i) f_{\gamma_3}(x_i) + c_{2,4}(x_i) f_{\gamma_4}(x_i)\big]^2\, f(x_i,\gamma)}{f^3_{\gamma,4}(x_i)} \le 0.
\]
We have $\int \frac{\partial \pi(x_i,\gamma)}{\partial \gamma}\, f(x_i,\gamma)\,dx_i < 0$; that is, the slope of $b_2(\gamma)$ is strictly decreasing in $\gamma$, so $b_2(\gamma)$ is strictly concave on $[\gamma_1,\gamma_2)$. This result indicates that the minimum of $b_2(\gamma)$ is attained at an endpoint, either $\gamma_1$ or $\gamma_2$, regardless of whether the slope of $b_2(\gamma)$ at $\gamma_1$ is positive or negative. In other words, either $S(\gamma_1)$ or $S(\gamma_2)$ must be the minimal value of the model mis-specification error for $\gamma \in [\gamma_1,\gamma_2)$. In the same manner, either $S(\gamma_2)$ or $S(\gamma_3)$ must be the minimal value of the model mis-specification error for $\gamma \in [\gamma_2,\gamma_3)$. Finally, from (49), the slope is strictly positive, so $b_4(\gamma)$ is strictly increasing in $\gamma$ for $\gamma \in [\gamma_3,\gamma_4)$. This fact implies that the minimal value of the model mis-specification error on this interval takes place at $\gamma_3$, which is equal to $S(\gamma_3)$. Therefore, the minimal value among $S(\gamma_1)$, $S(\gamma_2)$, and $S(\gamma_3)$ is the global minimum of the model mis-specification error for $\gamma \in [\gamma_0,\gamma_4]$. This is the proof of part a) in Theorem 12.

Since $\min(S(\gamma_1), S(\gamma_2), S(\gamma_3)) = S(\gamma_2)$ is assumed, $S(\gamma_2)$ is the global minimum of the model mis-specification error for $\gamma \in [\gamma_0,\gamma_4]$. Therefore, from Theorem 2.1 of Newey and McFadden (1994), we have
\[
\hat\gamma = \arg\min_\gamma \frac{1}{n}\, SSR(\gamma) \to_p \gamma_2 = \arg\min_\gamma S(\gamma).
\]
This completes the proof of parts b) and c) in Theorem 12. ∎
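Theorem 12 underlies a sequential, one-threshold-at-a-time strategy in the spirit of Bai (1997): fitting a deliberately mis-specified one-threshold model recovers one of the true thresholds, after which the sample can be split at the estimate and the search repeated on each piece. The sketch below is a minimal illustration under these assumptions; it reuses the hypothetical ssr and nw_regime helpers from the earlier snippets, and the segment-selection rule and stopping at a known s are simplifications, not the paper's procedure.

    import numpy as np

    def one_threshold(X, Y, Q, lo, hi, h, grid_size=40):
        """Grid search for a single threshold on the subsample with Q in [lo, hi)."""
        I = (Q >= lo) & (Q < hi)
        grid = np.quantile(Q[I], np.linspace(0.1, 0.9, grid_size))
        return min(grid, key=lambda g: ssr([g], X[I], Y[I], Q[I], lo, hi, h))

    def sequential_thresholds(X, Y, Q, s, h):
        """Estimate s thresholds one at a time: each fitted threshold splits
        the current segment, and both halves are searched again."""
        segments = [(Q.min(), Q.max() + 1e-12)]
        found = []
        while len(found) < s:
            best = None
            for lo, hi in segments:
                g = one_threshold(X, Y, Q, lo, hi, h)
                I = (Q >= lo) & (Q < hi)
                # SSR improvement from splitting this segment at g
                gain = (ssr([], X[I], Y[I], Q[I], lo, hi, h)
                        - ssr([g], X[I], Y[I], Q[I], lo, hi, h))
                if best is None or gain > best[0]:
                    best = (gain, g, (lo, hi))
            _, g, (lo, hi) = best
            found.append(g)
            segments.remove((lo, hi))
            segments += [(lo, g), (g, hi)]
        return sorted(found)

In practice the number of thresholds s is unknown, and each added split would be vetted with the significance test developed in the paper rather than fixed in advance.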
References

Aït-Sahalia, Y., Bickel, P.J., Stoker, T.M., 2001. Goodness-of-fit tests for kernel regression with an application to option implied volatilities, Journal of Econometrics 105, 363–412.
Angrist, J.D., Pischke, J.-S., 2009. Mostly Harmless Econometrics, Princeton, NJ: Princeton University Press.
Bai, J., 1997. Estimating multiple breaks one at a time, Econometric Theory 13, 315–352.
Bai, J., Perron, P., 1998. Estimating and testing linear models with multiple structural changes, Econometrica 66, 47–78.
Bai, J., Perron, P., 2003. Computation and analysis of multiple structural change models, Journal of Applied Econometrics 18, 1–22.
Bhattacharya, P.K., Brockwell, P.J., 1976. The minimum of an additive process with applications to signal estimation and storage theory, Z. Wahrschein. Verw. Gebiete 37, 51–75.
Chan, K.S., 1993. Consistency and limiting distribution of the least squares estimator of a threshold autoregressive model, The Annals of Statistics 21, 520–533.
Chen, B., Hong, Y., 2012. Testing for smooth structural changes in time series models via nonparametric regression, Econometrica 80, 1157–1183.
Chen, B., Hong, Y., 2013. Nonparametric testing for smooth structural change in panel data models, Working Paper, Department of Economics, University of Rochester.
Chen, J.-E., 2008. Estimating and testing quantile regression with structural changes, Working Paper, Department of Economics, NYU.
Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., 2016. Double machine learning for treatment and causal parameters, Cemmap Working Paper CWP49/16.
Chernozhukov, V., Hansen, C., 2004. The impact of 401(k) participation on the wealth distribution: An instrumental quantile regression analysis, Review of Economics and Statistics 86, 735–751.
Chernozhukov, V., Hansen, C., 2013. High-Dimensional Methods: Examples for Inference on Structural Effects, NBER Summer Institute.
Dette, H., Spreckelsen, I., 2004. Some comments on specification tests in nonparametric absolutely regular processes, Journal of Time Series Analysis 25, 159–172.
Fan, Y., Li, Q., 1999. Central limit theorem for degenerate U-statistics of absolutely regular processes with applications to model specification testing, Journal of Nonparametric Statistics 10, 245–271.
Fan, J., Yao, Q., 2003. Nonlinear Time Series: Nonparametric and Parametric Methods, New York: Springer-Verlag.
Hall, P., 1984. Central limit theorem for integrated squared error of multivariate nonparametric density estimators, Journal of Multivariate Analysis 14, 1–16.
Hansen, B.E., 1999. Threshold effects in non-dynamic panels: Estimation, testing, and inference, Journal of Econometrics 93, 345–368.
Hansen, B.E., 2000. Sample splitting and threshold estimation, Econometrica 68, 575–603.
Henderson, D.J., Parmeter, C.F., Su, L., 2014. Nonparametric threshold regression: Estimation and inference, Working Paper, Department of Economics, University of Miami.
Kim, J., Pollard, D., 1990. Cube root asymptotics, The Annals of Statistics 18, 191–219.
Li, Q., Racine, J.S., 2007. Nonparametric Econometrics: Theory and Practice, Princeton, NJ: Princeton University Press.
Masry, E., 1996. Multivariate regression estimation: local polynomial fitting for time series, Stochastic Processes and their Applications 65, 81–101.
Masry, E., Fan, J., 1997. Local polynomial estimation of regression functions for mixing processes, Scandinavian Journal of Statistics 24, 165–179.
Newey, W.K., McFadden, D.L., 1994. Large sample estimation and hypothesis testing, in Handbook of Econometrics, Vol. IV, ed. by R.F. Engle and D.L. McFadden, New York: Elsevier, 2113–2245.
Oka, T., Qu, Z., 2011. Estimating structural changes in regression quantiles, Journal of Econometrics 162, 248–267.
Poterba, J.M., Venti, S.F., Wise, D.A., 1994a. 401(k) plans and tax-deferred savings, in Studies in the Economics of Aging, Chicago: University of Chicago Press, 105–142.
Poterba, J.M., Venti, S.F., Wise, D.A., 1994b. Do 401(k) contributions crowd out other personal saving?, Journal of Public Economics 58, 1–32.
Qu, Z., 2008. Testing for structural change in regression quantiles, Journal of Econometrics 146, 170–184.
Qu, Z., Perron, P., 2007. Estimating and testing structural changes in multivariate regressions, Econometrica 75, 459–502.
Stone, C.J., 1983. Optimal uniform rate of convergence for nonparametric estimators of a density function or its derivatives, in Recent Advances in Statistics, New York: Academic Press, 393–406.
Su, L., Xiao, Z., 2008. Testing structural change in time-series nonparametric regression models, Statistics and Its Interface 1, 347–366.
Yu, P., Phillips, P.C.B., 2015. Threshold regression with endogeneity, Cowles Foundation Discussion Paper No. 1966.