A Continuous Threshold Expectile Model
Feipeng Zhang a,b, Qunhua Li b,∗
a Department of Statistics, Hunan University, Changsha, 410082, China
b Department of Statistics, Pennsylvania State University, PA, 16802, USA
Abstract
Expectile regression is a useful tool for exploring the relation between the response and the explanatory variables beyond the conditional mean. This article develops a continuous threshold expectile regression for modeling data in which the effect of a covariate on the response variable is linear but varies below and above an unknown threshold in a continuous way. Based on a grid search approach, we obtain estimators for the threshold and the regression coefficients via an asymmetric least squares regression method. We derive the asymptotic properties for all the estimators and show that the estimator for the threshold achieves root-n consistency. We also develop a weighted CUSUM type test statistic for the existence of a threshold at a given expectile, and derive its asymptotic properties under both the null and the local alternative models. This test only requires fitting the model under the null hypothesis in the absence of a threshold, so it is computationally more efficient than likelihood-ratio type tests. Simulation studies show desirable finite sample performance in both homoscedastic and heteroscedastic cases. The application of our methods to a Dutch growth dataset and a baseball pitcher salary dataset reveals interesting insights.
Keywords:
Expectile regression, Threshold, Weighted CUSUM test, Grid search method

∗ Corresponding author. Department of Statistics, Pennsylvania State University, PA, 16802, USA
Email address: [email protected] (Qunhua Li)
Preprint submitted to Elsevier November 9, 2016

1. Introduction

Expectile regression, first introduced by Aigner et al. (1976) and Newey and Powell (1987), has become popular in the last decades. Analogous to quantile regression (Koenker and Bassett, 1978), expectile regression draws a complete picture of the conditional distribution of the response variable given the covariates, making it a useful tool for modeling data with heterogeneous conditional distributions. As modeling tools, quantile regression and expectile regression each have advantages over the other in certain aspects: quantile regression is more robust to outliers than expectile regression, whereas expectile regression is more sensitive to the extreme values in the response variable than quantile regression. However, expectile regression has certain computational advantages over quantile regression (Newey and Powell, 1987). First, unlike quantile regression, the loss function of expectile regression is everywhere differentiable, so its estimation is more straightforward and much quicker. Second, the computation of the asymptotic covariance matrix of the expectile regression estimator does not involve estimating the density function of the errors. Besides the early development of linear expectile regression (Newey and Powell, 1987; Efron, 1991), many nonparametric or semiparametric expectile regression models have been developed in recent years, for example, Yao and Tong (1996), De Rossi and Harvey (2009), Kuan et al. (2009), Schnabel and Eilers (2009), Kneib (2013), Sobotka et al. (2013), Xie et al. (2014), Waltrup et al. (2015), and Kim and Lee (2016), among others. These models greatly improve the flexibility of expectile regression for modeling nonlinear relationships.

However, some natural phenomena call for nonlinear regression forms that exhibit structural changes, sometimes in the form of two line segments with different slopes.
For example, a child's height increases rapidly with age before and during puberty and then stops increasing in the late teens. This implies that the growth curve of height may be described as two line segments with different slopes intersecting at a threshold. Another example arises from a study of the salaries of major league baseball players in 1987 (Hoaglin and Velleman, 1995), in which the salaries first increase with years of experience and then decline after a certain point. Although the existing spline-based (e.g., Schnabel and Eilers, 2009; Kim and Lee, 2016) or varying-coefficient expectile models (e.g., Xie et al., 2014) can capture the nonlinear relationship between the response variable and the predictors, they cannot provide information on the location of the threshold. This issue motivates us to consider a continuous threshold model for expectile regression. Continuous threshold regression, also called segmented regression or bent line regression, has been studied in the context of least squares regression (Quandt, 1958, 1960; Hinkley, 1969; Feder, 1975; Chappell, 1989; Chan and Tsay, 1998; Chiu et al., 2006; Hansen, 2015), quantile regression (Li et al., 2011), and rank-based regression (Zhang and Li, 2016). However, no literature has investigated continuous threshold expectile regression.

In this article, we develop a continuous threshold expectile regression model. The contribution of this article is twofold. First, we propose a grid search method to estimate the unknown threshold and the other regression coefficients. We derive the asymptotic properties for all the parameters including the threshold, and show that the estimator for the threshold achieves $\sqrt{n}$-consistency. Second, we develop a testing procedure for the existence of a structural change at a given expectile, based on a weighted CUSUM type statistic. This test only requires fitting the model under the null hypothesis in the absence of a threshold, so it is computationally efficient. The limiting distribution of the test statistic is also established.
The estimation and testing procedures are implemented in R code, which is available from the first author by request.

The remainder of the article is organized as follows. In Section 2, we describe the continuous threshold expectile regression model, and develop a grid search method for estimating the unknown threshold and regression coefficients. A testing procedure for a structural change at a given expectile level is also proposed. In Section 3, we conduct simulation studies and two real data analyses. Section 4 provides the conclusion with possible future extensions. Technical proofs are presented in the Appendix.
2. Methodology

2.1. Model

Let $(Y_i, X_i, Z_i)$, $i = 1, \ldots, n$, be a sequence of independent and identically distributed samples from the population $(Y, X, Z)$. We assume that $Y$ is the response variable, $Z$ is a vector of covariates, and $X$ is a scalar variable whose relationship with $Y$ changes at an unknown location. The population $\tau$-expectile of $Y$, $\nu_\tau(Y)$, minimizes the loss function $E[\rho_\tau(Y - \nu)]$, where
$$\rho_\tau(u) = \omega_\tau(u)\,u^2 = \begin{cases} (1-\tau)\,u^2, & u \le 0,\\ \tau\,u^2, & u > 0,\end{cases}$$
is the asymmetric squared error loss function, and $0 < \tau < 1$.
When $\tau = 0.5$, the $\tau$-expectile corresponds to the mean of $Y$.

In this paper, we model the conditional $\tau$-th expectile of $Y$ using the continuous threshold model
$$\nu_\tau(Y \mid X, Z) = \beta_0 + \beta_1 X + \beta_2 (X - t)_+ + \gamma^\top Z, \qquad (1)$$
where $\theta_\tau = (\xi^\top, t)^\top$ are the unknown parameters of interest, $\xi = (\beta_0, \beta_1, \beta_2, \gamma^\top)^\top$ is the vector of parameters excluding the unknown location of the threshold or change point $t$, $\gamma$ is a $p \times 1$ vector, $a_+ = a\,I(a > 0)$, and $I(\cdot)$ is the indicator function. Clearly, the linear expectile regression is continuous in $X$ at $t$, but has different slopes on either side of the threshold $t$. In other words, $\beta_1$ is the slope of the left line segment for $X \le t$ and $\beta_1 + \beta_2$ is the slope of the right line segment for $X > t$.

2.2. Estimation procedure
To estimate $\theta_\tau = (\xi^\top, t)^\top$ at a given expectile $\tau$, we minimize the objective function
$$M_{n,\tau}(\theta) = n^{-1} \sum_{i=1}^n \rho_\tau\big(Y_i - \beta_0 - \beta_1 X_i - \beta_2 (X_i - t)_+ - \gamma^\top Z_i\big). \qquad (2)$$
However, due to the existence of the threshold $t$, the objective function (2) is convex in $\xi$ but non-convex in $t$, making it difficult to obtain its minimizer. One estimation approach is to use the grid search strategy, which is commonly used for bent line mean regression (Quandt, 1958; Chappell, 1989). To proceed, we rewrite the objective function (2) with respect to $\xi$ and $t$ as
$$M_{n,\tau}(\theta) \equiv M_{n,\tau}(\xi, t) = n^{-1} \sum_{i=1}^n \rho_\tau\big(Y_i - \xi^\top V_i(t)\big), \qquad (3)$$
where $V_i(t) = \big(1, X_i, (X_i - t)_+, Z_i^\top\big)^\top$. The minimization can be carried out in two steps:

(1) for each $t \in \mathcal{T}$, where $\mathcal{T}$ is the range set of all $t$'s, obtain a profile estimate of $\xi$ by $\widehat{\xi}(t) = \arg\min_{\xi} M_{n,\tau}(\xi, t)$;

(2) obtain the threshold estimate as $\widehat{t} = \arg\min_{t \in \mathcal{T}} M_{n,\tau}\big(\widehat{\xi}(t), t\big)$.

The estimate for $\theta$ then is $\widehat{\theta} = \big(\widehat{\xi}(\widehat{t}\,)^\top, \widehat{t}\,\big)^\top$.

Because the objective function is not differentiable with respect to $\theta$, it is impossible to obtain the asymptotic properties of $\widehat{\theta}$ using the standard theory. Here, we derive the asymptotic properties using the modern empirical process theory. We first introduce some notation. Denote the true parameters as $\theta_0$. Let $M_\tau(\theta) = E\,\rho_\tau\big(Y - \xi^\top V(t)\big)$, where $V(t) = \big(1, X, (X - t)_+, Z^\top\big)^\top$.
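The two-step profile minimization above can be illustrated with a short numerical sketch. The code below is our own minimal Python illustration, not the authors' R implementation; the inner asymmetric least squares fit is computed by iteratively reweighted least squares, and all function names and simulation settings here are ours.

```python
import numpy as np

def als_fit(V, y, tau, n_iter=100):
    # Asymmetric least squares via iteratively reweighted least squares:
    # weight tau for positive residuals, 1 - tau for non-positive ones.
    xi = np.linalg.lstsq(V, y, rcond=None)[0]
    for _ in range(n_iter):
        w = np.where(y - V @ xi > 0, tau, 1 - tau)
        xi_new = np.linalg.solve(V.T @ (w[:, None] * V), V.T @ (w * y))
        if np.allclose(xi_new, xi, atol=1e-10):
            return xi_new
        xi = xi_new
    return xi

def fit_threshold_expectile(x, z, y, tau, grid):
    # Step (1): profile ALS fit of xi for each candidate threshold t;
    # step (2): keep the t that minimizes the asymmetric squared loss.
    best_loss, best_xi, best_t = np.inf, None, None
    for t in grid:
        V = np.column_stack([np.ones_like(x), x, np.clip(x - t, 0.0, None), z])
        xi = als_fit(V, y, tau)
        r = y - V @ xi
        loss = np.mean(np.where(r > 0, tau, 1 - tau) * r ** 2)
        if loss < best_loss:
            best_loss, best_xi, best_t = loss, xi, t
    return best_xi, best_t
```

On simulated data with a true threshold, the grid estimate typically lands within one grid step of the truth; a finer grid trades computation for resolution.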
Using the notation of empirical processes, one can write $M_{n,\tau}(\theta) = \mathbb{P}_n m_\theta$ and $M_\tau(\theta) = P m_\theta$, where $\mathbb{P}_n = n^{-1}\sum_{i=1}^n \delta_{\mathcal{X}_i}$ is the empirical measure, and
$$m_\theta(\mathcal{X}) = \rho_\tau\big(Y - \xi^\top V(t)\big) = \omega_\tau \big[Y - \xi^\top V(t)\big]^2$$
with the weights
$$\omega_\tau(\mathcal{X}) = \big|\tau - I\big(Y - \xi^\top V(t) \le 0\big)\big| = \begin{cases} 1 - \tau, & Y - \xi^\top V(t) \le 0,\\ \tau, & Y - \xi^\top V(t) > 0.\end{cases}$$
Here, $\mathcal{X}$ denotes the observed data $(Y, X, Z)$. In Lemma A.1 in the Appendix, we show that $\sup_{\theta \in \Theta} |M_{n,\tau}(\theta) - M_\tau(\theta)|$ converges to zero in probability as $n$ goes to infinity. Furthermore, we establish the consistency of $\widehat{\theta}$.

Theorem 2.1.
Under the regularity conditions in the Appendix, as $n \to \infty$, we have that $\widehat{\theta} \stackrel{P}{\longrightarrow} \theta_0$.

We prove the asymptotic normality by using Theorem 5.23 in Van der Vaart (2000), which establishes the asymptotic normality of M-estimators when the criterion function is Lipschitz continuous and its limiting function admits a second order Taylor expansion. To proceed, define the matrix $\Sigma(\theta) = E\,\dot{m}_\theta \dot{m}_\theta^\top$, where $\dot{m}_\theta$ is
$$\dot{m}_\theta = \begin{pmatrix} -2\,\omega_\tau\, V(t)\,\big\{Y - \xi^\top V(t)\big\} \\ 2\,\beta_2\, \omega_\tau\, \big\{Y - \xi^\top V(t)\big\}\, I(X > t) \end{pmatrix}.$$
Define the Hessian matrix of $M_\tau(\theta)$ as
$$H(\theta) \equiv \frac{\partial^2}{\partial \theta\, \partial \theta^\top} M_\tau(\theta) = 2E\, \omega_\tau \begin{pmatrix} V(t) V(t)^\top & -\beta_2 I(X > t) V(t) + \{Y - \xi^\top V(t)\} U(t) \\ -\beta_2 I(X > t) V(t)^\top + \{Y - \xi^\top V(t)\} U(t)^\top & \beta_2^2\, I(X > t) \end{pmatrix}$$
$$\qquad + 2 \begin{pmatrix} 0_{(p+3)\times(p+3)} & 0_{(p+3)\times 1} \\ 0_{1\times(p+3)} & -\beta_2\, E\big\{\omega_\tau \big[Y - \xi^\top V(t)\big] \,\big|\, X = t\big\}\, f_X(t) \end{pmatrix},$$
where $U(t) = \big[0, 0, I(X > t), 0_{1\times p}\big]^\top$.

Theorem 2.2.
Under the regularity conditions in the Appendix, $\sqrt{n}\,(\widehat{\theta} - \theta_0)$ is asymptotically normally distributed with mean zero and covariance matrix $H(\theta_0)^{-1} \Sigma(\theta_0) H(\theta_0)^{-1}$, as $n \to \infty$.

It is worthwhile to emphasize that the regression coefficient and threshold estimators $(\widehat{\xi}^\top, \widehat{t}\,)^\top$ are jointly asymptotically normal with a $\sqrt{n}$ convergence rate. In contrast, in threshold models with a discontinuous change, the slope estimators $\widehat{\xi}$ are still $\sqrt{n}$-consistent, but the threshold estimator $\widehat{t}$ is $n$-consistent with a non-standard asymptotic distribution. The $\sqrt{n}$-convergence rate of $\widehat{t}$ in our model is due to the continuity of $M_{n,\tau}(\theta)$ at $t$.

The asymptotic variance-covariance matrix can be estimated by $\widehat{H}_n(\widehat{\theta})^{-1} \widehat{\Sigma}_n(\widehat{\theta}) \widehat{H}_n(\widehat{\theta})^{-1}$, where $\widehat{\Sigma}_n(\widehat{\theta}) = n^{-1}\sum_{i=1}^n \widehat{G}_n(\widehat{\theta}) \widehat{G}_n(\widehat{\theta})^\top$,
$$\widehat{G}_n(\widehat{\theta}) = \begin{pmatrix} -2\,\widehat{\omega}_{\tau,i}\, V_i(\widehat{t}\,)\,\big\{Y_i - \widehat{\xi}^\top V_i(\widehat{t}\,)\big\} \\ 2\,\widehat{\beta}_2\, \widehat{\omega}_{\tau,i}\, \big\{Y_i - \widehat{\xi}^\top V_i(\widehat{t}\,)\big\}\, I(X_i > \widehat{t}\,) \end{pmatrix},$$
and
$$\widehat{H}_n(\widehat{\theta}) = \frac{2}{n} \sum_{i=1}^n \widehat{\omega}_{\tau,i} \begin{pmatrix} V_i(\widehat{t}\,) V_i(\widehat{t}\,)^\top & -\widehat{\beta}_2 I(X_i > \widehat{t}\,) V_i(\widehat{t}\,) + \{Y_i - \widehat{\xi}^\top V_i(\widehat{t}\,)\} U_i(\widehat{t}\,) \\ -\widehat{\beta}_2 I(X_i > \widehat{t}\,) V_i(\widehat{t}\,)^\top + \{Y_i - \widehat{\xi}^\top V_i(\widehat{t}\,)\} U_i(\widehat{t}\,)^\top & \widehat{\beta}_2^2\, I(X_i > \widehat{t}\,) \end{pmatrix}$$
$$\qquad + 2 \begin{pmatrix} 0_{(p+3)\times(p+3)} & 0_{(p+3)\times 1} \\ 0_{1\times(p+3)} & -\widehat{\beta}_2\, n^{-1}\sum_{i=1}^n \widehat{\omega}_{\tau,i} \{Y_i - \widehat{\xi}^\top V_i(\widehat{t}\,)\}\, \widehat{f}_X(\widehat{t}\,) \end{pmatrix}.$$
Here, $\widehat{\omega}_{\tau,i} = \big|\tau - I\big(Y_i - \widehat{\xi}^\top V_i(\widehat{t}\,) \le 0\big)\big|$, and $\widehat{f}_X(x) = (nh)^{-1}\sum_{i=1}^n K\big(\frac{X_i - x}{h}\big)$ is the kernel estimator for the density $f_X(x)$ of $X$, where $K(\cdot)$ is a kernel function with a bandwidth $h > 0$. In practice, we use the Epanechnikov kernel $K(u) = \frac{3}{4}(1 - u^2) I(|u| \le 1)$ and obtain the bandwidth by Silverman's rule of thumb (Silverman, 1986), $h = 1.06\,\widehat{\sigma}\, n^{-1/5}$, where $\widehat{\sigma}$ is the standard deviation of $X$.

2.3. Testing the existence of a threshold

An important question before fitting model (1) is whether there exists a threshold at a pre-specified expectile. If a threshold does not exist, then $t$ is unidentifiable and the estimation procedure in the last section is ill-conditioned. To test the existence of a threshold, we test the null ($H_0$) and alternative ($H_1$) hypotheses
$$H_0: \beta_2 = 0 \ \text{for any } t \in \mathcal{T} \quad \text{v.s.} \quad H_1: \beta_2 \ne 0 \ \text{for some } t \in \mathcal{T},$$
where $\mathcal{T}$ is the range set of all $t$'s.

Tests for structural changes have been developed in conditional mean regression (Andrews, 1993; Bai, 1996; Hansen, 1996, 2015), quantile regression (Qu, 2008; Li et al., 2011), transformation models (Kosorok and Song, 2007), and time series models (Chan, 1993; Cho and White, 2007), among others. To construct our test statistic, we take an approach similar in spirit to the test for structural changes in quantile regression in Qu (2008). The test is constructed by sequentially evaluating the subgradients of the objective function under $H_0$ for a subsample, in a fashion similar to the CUSUM statistic. An advantage of this test is that it only requires fitting the model under the null hypothesis. Thus, it is computationally more efficient than likelihood-ratio type tests, such as the sup-likelihood-ratio-type test for testing threshold effects in regression models in Lee et al.
(2011), which requires fitting the models under both the null and alternative hypotheses.

To proceed, we define the following statistic,
$$R_n(t) = \frac{1}{\sqrt{n}} \sum_{i=1}^n \big|\tau - I(Y_i \le \widehat{\alpha}^\top W_i)\big|\, (Y_i - \widehat{\alpha}^\top W_i)(X_i - t)\, I(X_i \le t),$$
where $W_i = (1, X_i, Z_i^\top)^\top$, and $\widehat{\alpha}$ is the estimator of the coefficients $\alpha = (\beta_0, \beta_1, \gamma^\top)^\top$ under the null hypothesis $H_0$, that is,
$$\widehat{\alpha} = \arg\min_{\alpha}\; n^{-1} \sum_{i=1}^n \big|\tau - I(Y_i \le \alpha^\top W_i)\big|\, (Y_i - \alpha^\top W_i)^2.$$
An intuitive interpretation of $R_n(t)$ is as follows. If there is no threshold, $\widehat{\alpha}$ is a good estimate of its population value, and hence the estimated residuals $e_i = Y_i - \widehat{\alpha}^\top W_i$ show a random pattern against $X_i$, leading to a small $R_n(t)$. On the other hand, if there exists a threshold, the estimate $\widehat{\alpha}$ differs significantly from the true value, and the estimated residuals depart from zero in a systematic fashion related to $X_i$, resulting in a large absolute value of $R_n(t)$. Because the location of the threshold is unknown, we need to search through all the possible locations. Therefore, we propose the test statistic
$$T_n = \sup_{t \in \mathcal{T}} |R_n(t)|.$$
This statistic can be viewed as a weighted CUSUM statistic based on the estimated residuals under the null hypothesis. Intuitively, it is plausible to reject $H_0$ when $T_n$ is too large. This intuition will be formally verified by Theorem 2.3.
It implies that $R_n(t)$ converges to a Gaussian process with mean zero, and the size of such a process can be used to test for a threshold effect.

In order to derive the large-sample inference for $T_n$, we consider the local alternative model,
$$Y_i = \beta_0 + \beta_1 X_i + n^{-1/2} \beta_2 (X_i - t_0)_+ + \gamma^\top Z_i + e_i, \qquad (4)$$
where $t_0$ is the location of the threshold, $\beta_2 \ne 0$, and the $\tau$-expectile of $e_i$ is zero. We first introduce some notation:
$$\widehat{S}_{wn}(\widehat{\alpha}) = n^{-1} \sum_{i=1}^n \big|\tau - I(Y_i \le \widehat{\alpha}^\top W_i)\big|\, W_i W_i^\top, \qquad S_w(\alpha) = E\big[\big|\tau - I(Y \le \alpha^\top W)\big|\, W W^\top\big],$$
$$\widehat{S}_{1n}(\widehat{\alpha}, t) = n^{-1} \sum_{i=1}^n \big|\tau - I(Y_i \le \widehat{\alpha}^\top W_i)\big|\, W_i (X_i - t) I(X_i \le t), \qquad S_1(\alpha, t) = E\big[\big|\tau - I(Y \le \alpha^\top W)\big|\, W (X - t) I(X \le t)\big],$$
$$\widehat{S}_{2n}(\widehat{\alpha}, t) = n^{-1} \sum_{i=1}^n \big|\tau - I(Y_i \le \widehat{\alpha}^\top W_i)\big|\, W_i\, \beta_2 (X_i - t) I(X_i \ge t), \qquad S_2(\alpha, t) = E\big[\big|\tau - I(Y \le \alpha^\top W)\big|\, W \beta_2 (X - t) I(X \ge t)\big],$$
and $q(t) = S_1(\alpha, t)^\top S_w(\alpha)^{-1} S_2(\alpha, t_0)$.

Theorem 2.3.
Under the regularity conditions in the Appendix, for the local alternative model (4), $R_n(t)$ has the asymptotic representation
$$R_n(t) = \frac{1}{\sqrt{n}} \sum_{i=1}^n e_i \big|\tau - I(Y_i - \alpha^\top W_i \le 0)\big| \big[(X_i - t) I(X_i \le t) - S_1(\alpha, t)^\top S_w(\alpha)^{-1} W_i\big] - q(t) + o_P(1). \qquad (5)$$
Furthermore, $T_n$ converges weakly to $\sup_t |R(t) - q(t)|$, where $R(t)$ is the Gaussian process with mean zero and covariance function
$$E\Big[e^2 \big|\tau - I(Y - \alpha^\top W \le 0)\big|^2 \big\{(X - t_1) I(X \le t_1) - S_1(\alpha, t_1)^\top S_w(\alpha)^{-1} W\big\} \big\{(X - t_2) I(X \le t_2) - S_1(\alpha, t_2)^\top S_w(\alpha)^{-1} W\big\}\Big].$$

Corollary 2.4. Under the regularity conditions in the Appendix, for the local alternative model $Y_i = \beta_0 + \beta_1 X_i + n^{-1/2} a_n \beta_2 (X_i - t_0)_+ + \gamma^\top Z_i + e_i$ with any increasing sequence $a_n$ going to infinity, we have that $\lim_{n \to \infty} P(|T_n| \ge c) = 1$ for any $c > 0$.

Because the limiting null distribution of $T_n$ is nonstandard, we resort to the Gaussian multiplier method (Van der Vaart, 2000) to calculate the critical values, based on the asymptotic representation (5). The procedure is described in Algorithm 1. In the Appendix, we prove the following result, which implies the validity of the bootstrap resampling scheme.

Theorem 2.5.
Under both the null and the local alternative hypotheses, $R_n^*(t)$ (defined in Algorithm 1) converges to the Gaussian process $R(t)$ as $n \to \infty$.

We summarize the computing procedure as follows.
Algorithm 1:

Step 1. Generate iid $v_1, \ldots, v_n$ from $N(0, 1)$.

Step 2. Calculate the test statistic $T_n^* = \sup_{t \in \mathcal{T}} |R_n^*(t)|$, where
$$R_n^*(t) = \frac{1}{\sqrt{n}} \sum_{i=1}^n v_i\, \widehat{e}_i\, \big|\tau - I(\widehat{e}_i \le 0)\big| \Big[(X_i - t) I(X_i \le t) - \widehat{S}_{1n}(\widehat{\alpha}, t)^\top \widehat{S}_{wn}(\widehat{\alpha})^{-1} W_i\Big],$$
with the estimated residuals $\widehat{e}_i = Y_i - \widehat{\alpha}^\top W_i$ under the null hypothesis.

Step 3. Repeat Steps 1–2 NB times to obtain $T_n^{*(1)}, \ldots, T_n^{*(NB)}$. Calculate the p-value as $\widehat{p}_n = NB^{-1} \sum_{j=1}^{NB} I\{T_n^{*(j)} \ge T_n\}$.
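Algorithm 1 can be sketched in Python as follows. This is our own illustrative translation, not the authors' R code; `grid` plays the role of the candidate threshold set, `e_hat` holds the null-model residuals, and `Tn` is the observed test statistic.

```python
import numpy as np

def multiplier_bootstrap_pvalue(x, W, e_hat, tau, Tn, grid, nb=1000, seed=None):
    # Gaussian-multiplier approximation of the null distribution of T_n.
    # W: n x q design under H0; e_hat: residuals Y - W @ alpha_hat.
    rng = np.random.default_rng(seed)
    n = len(e_hat)
    w = np.abs(tau - (e_hat <= 0))                 # |tau - I(e_hat <= 0)|
    Swn_inv = np.linalg.inv((W * w[:, None]).T @ W / n)
    # h_i(t) = (x_i - t) I(x_i <= t) - S1n(t)^T Swn^{-1} W_i for each grid t
    H = []
    for t in grid:
        a = (x - t) * (x <= t)
        S1n = (W * (w * a)[:, None]).mean(axis=0)
        H.append(a - W @ (Swn_inv @ S1n))
    H = np.asarray(H)                              # shape (len(grid), n)
    base = w * e_hat
    Tstar = np.empty(nb)
    for b in range(nb):                            # Steps 1-2, repeated NB times
        v = rng.standard_normal(n)
        Rstar = H @ (v * base) / np.sqrt(n)
        Tstar[b] = np.abs(Rstar).max()
    return np.mean(Tstar >= Tn)                    # Step 3: bootstrap p-value
```

Precomputing the centered regressor process `H` once makes each multiplier draw a single matrix-vector product, so a large `nb` is cheap.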
3. Simulation Studies and Applications
In this section, we conduct simulation studies to assess the finite sample performance of the proposed method. We consider the following two scenarios:

(i) Independent and identically distributed (IID) errors: $Y = \beta_0 + \beta_1 X + \beta_2 (X - t_0)_+ + \gamma Z + e$;

(ii) Heteroscedastic errors: $Y = \beta_0 + \beta_1 X + \beta_2 (X - t_0)_+ + \gamma Z + (1 + cZ)\,e$ with a constant $c > 0$;

where $X$ is generated from a uniform distribution, $Z$ is generated from a normal distribution with mean 1, and the parameters are $(\beta_0, \beta_1, \beta_2, \gamma, t_0)^\top = (1, 3, -2, 1, 1.5)^\top$. For each scenario, we consider three error distributions: (1) $e \sim N(0, 1)$; (2) $e \sim t_4$; and (3) a mixture of the $N(0, 1)$ and $t_4$ distributions, where $t_4$ is the $t$-distribution with four degrees of freedom. For each case, we conduct 1000 repetitions with sample sizes $n = 200$ and 400.

As shown in Tables 1–2, for both the IID and the heteroscedastic scenarios, all the biases are small, indicating that the proposed estimator is asymptotically consistent. Moreover, the average estimated standard errors are close to the empirical standard errors. The coverage probabilities of the regression parameters $(\beta_0, \beta_1, \beta_2, \gamma)$ are close to the nominal level 95%. Though some coverage probabilities of the threshold $t$ are below 90% when $n = 200$, they improve as the sample size increases to $n = 400$. The performance is similar across the three error distributions. In summary, the proposed estimator has good finite sample performance.

We also conduct simulation studies to evaluate the type I error and the power of the testing procedure. The simulation models are similar to the above, with threshold effects at $\beta_2 = -2, -1, -0.5, 0, 0.5, 1, 2$. The number of bootstrap replications is set as 1,000 and the nominal significance level is 5%. The results are shown in Table 3. For all scenarios, the tests have type I errors close to the nominal level and have reasonable power, which indicates that the proposed test is valid for testing the existence of a threshold.
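For concreteness, the data-generating designs above can be sketched as follows. The true coefficient vector $(1, 3, -2, 1)$ and threshold $t_0 = 1.5$ follow Table 2, while the $U(-2, 4)$ support for $X$, the $N(1, 0.5^2)$ law for $Z$, the heteroscedasticity coefficient $0.2$, and the $0.9/0.1$ mixture weights are illustrative choices of ours.

```python
import numpy as np

def simulate(n, scenario="iid", error="normal", seed=None):
    # One draw from the simulation design. Coefficients (1, 3, -2, 1) and
    # threshold t0 = 1.5 follow the paper; the distributional details of X,
    # Z, the heteroscedasticity factor, and the mixture weights are assumed.
    rng = np.random.default_rng(seed)
    b0, b1, b2, g, t0 = 1.0, 3.0, -2.0, 1.0, 1.5
    x = rng.uniform(-2.0, 4.0, n)          # support chosen to contain t0
    z = rng.normal(1.0, 0.5, n)
    if error == "normal":
        e = rng.standard_normal(n)
    elif error == "t4":
        e = rng.standard_t(4, n)
    else:                                   # mixture of N(0,1) and t4
        pick = rng.random(n) < 0.9
        e = np.where(pick, rng.standard_normal(n), rng.standard_t(4, n))
    scale = 1.0 if scenario == "iid" else 1.0 + 0.2 * z
    y = b0 + b1 * x + b2 * np.clip(x - t0, 0.0, None) + g * z + scale * e
    return x, z, y
```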
Table 1: Performance of the proposed estimator based on 1,000 simulated samples of n = 200 and 400 observations, for the three error distributions in the IID case. (Columns: Error, τ, and β0, β1, β2, γ, t for n = 200 and n = 400. Bias: the empirical bias; SD: the empirical standard error; ESE: the average estimated standard error; CP: 95% coverage probability.)

Table 2: Performance of the proposed estimator based on 1,000 simulated samples of n = 200 and 400 observations, for the three error distributions in the heteroscedastic case.

                         n = 200                            n = 400
Error  τ          β0     β1     β2     γ      t       β0     β1     β2     γ      t
True            1.000  3.000 -2.000  1.000  1.500   1.000  3.000 -2.000  1.000  1.500
1     0.3  Bias 0.013  0.008 -0.024  0.000 -0.004   0.004  0.005 -0.010  0.001 -0.004
           SD   0.199  0.125  0.227  0.170  0.219   0.154  0.085  0.157  0.129  0.149
           ESE  0.200  0.113  0.219  0.175  0.175   0.141  0.080  0.156  0.124  0.125
           CP   0.958  0.921  0.946  0.948  0.866   0.928  0.927  0.947  0.935  0.905
      0.5  Bias 0.010  0.007 -0.022 -0.004 -0.003   0.002  0.004 -0.011  0.001 -0.001
           SD   0.194  0.119  0.218  0.165  0.209   0.150  0.082  0.152  0.126  0.144
           ESE  0.195  0.110  0.214  0.170  0.171   0.137  0.078  0.151  0.121  0.122
           CP   0.952  0.930  0.948  0.949  0.880   0.931  0.940  0.947  0.941  0.903
      0.8  Bias 0.008  0.009 -0.027 -0.009 -0.007   0.000  0.004 -0.016 -0.002 -0.000
           SD   0.214  0.130  0.241  0.182  0.235   0.160  0.090  0.168  0.136  0.160
           ESE  0.209  0.119  0.232  0.181  0.185   0.148  0.084  0.165  0.130  0.132
           CP   0.944  0.928  0.934  0.945  0.881   0.934  0.936  0.947  0.936  0.894
2     0.3  Bias 0.015  0.035 -0.086  0.002 -0.015   0.010  0.007 -0.042 -0.003  0.006
           SD   0.323  0.183  0.366  0.272  0.346   0.215  0.134  0.240  0.183  0.230
           ESE  0.293  0.168  0.328  0.256  0.251   0.209  0.118  0.234  0.184  0.184
           CP   0.927  0.923  0.918  0.937  0.839   0.951  0.912  0.945  0.959  0.881
      0.5  Bias 0.008  0.024 -0.074  0.001 -0.002   0.009  0.006 -0.034 -0.004  0.003
           SD   0.297  0.166  0.338  0.249  0.323   0.200  0.121  0.221  0.169  0.212
           ESE  0.274  0.157  0.316  0.239  0.241   0.194  0.110  0.218  0.170  0.172
           CP   0.937  0.924  0.939  0.945  0.846   0.957  0.927  0.938  0.955  0.897
      0.8  Bias 0.020  0.029 -0.105 -0.004 -0.004   0.018  0.016 -0.057 -0.007 -0.002
           SD   0.374  0.216  0.437  0.310  0.410   0.264  0.161  0.297  0.216  0.289
           ESE  0.364  0.226  0.499  0.287  0.346   0.243  0.143  0.307  0.206  0.230
           CP   0.924  0.936  0.931  0.935  0.851   0.944  0.915  0.945  0.937  0.884
3     0.3  Bias 0.015  0.002 -0.025 -0.001  0.001   0.013  0.005 -0.018 -0.006  0.001
           SD   0.226  0.131  0.285  0.197  0.242   0.153  0.089  0.176  0.129  0.156
           ESE  0.210  0.118  0.239  0.183  0.186   0.148  0.084  0.164  0.131  0.132
           CP   0.937  0.920  0.923  0.930  0.872   0.941  0.937  0.937  0.950  0.897
      0.5  Bias 0.012  0.001 -0.022 -0.005  0.002   0.012  0.005 -0.015 -0.007 -0.003
           SD   0.215  0.123  0.258  0.188  0.229   0.144  0.086  0.166  0.122  0.148
           ESE  0.203  0.115  0.228  0.177  0.181   0.143  0.082  0.158  0.126  0.127
           CP   0.931  0.931  0.929  0.931  0.875   0.949  0.943  0.944  0.957  0.919
      0.8  Bias 0.013  0.004 -0.035 -0.013  0.002   0.015  0.005 -0.015 -0.010 -0.007
           SD   0.239  0.142  0.282  0.206  0.264   0.159  0.099  0.190  0.138  0.169
           ESE  0.224  0.129  0.329  0.193  0.225   0.162  0.094  0.180  0.140  0.144
           CP   0.930  0.926  0.927  0.927  0.877   0.946  0.937  0.935  0.949  0.913

Bias: the empirical bias; SD: the empirical standard error; ESE: the average estimated standard error; CP: 95% coverage probability.

Table 3: Empirical rejection rates of the proposed test based on 1,000 simulated samples of n = 200 observations.

                                            β2
Model            Error  τ      -2     -1    -0.5     0     0.5     1      2
IID              1     0.3   1.000  1.000  0.770  0.049  0.766  1.000  1.000
                       0.5   1.000  1.000  0.801  0.052  0.788  1.000  1.000
                       0.8   1.000  0.999  0.745  0.048  0.712  1.000  1.000
                 2     0.3   0.998  0.925  0.442  0.065  0.478  0.935  0.999
                       0.5   1.000  0.974  0.492  0.063  0.503  0.960  1.000
                       0.8   0.994  0.860  0.400  0.067  0.377  0.873  0.993
                 3     0.3   1.000  0.992  0.710  0.035  0.729  0.998  1.000
                       0.5   1.000  0.996  0.738  0.041  0.769  0.999  1.000
                       0.8   1.000  0.990  0.682  0.039  0.662  0.990  0.998
heteroscedastic  1     0.3   1.000  0.990  0.609  0.051  0.610  0.994  1.000
                       0.5   1.000  0.995  0.644  0.054  0.624  0.998  1.000
                       0.8   1.000  0.986  0.586  0.052  0.555  0.993  1.000
                 2     0.3   0.998  0.838  0.326  0.065  0.345  0.851  0.996
                       0.5   1.000  0.884  0.381  0.064  0.384  0.902  1.000
                       0.8   0.990  0.757  0.307  0.067  0.282  0.755  0.986
                 3     0.3   0.999  0.982  0.538  0.040  0.596  0.988  1.000
                       0.5   1.000  0.984  0.593  0.040  0.598  0.993  1.000
                       0.8   0.999  0.967  0.548  0.037  0.521  0.961  0.997

We first apply our method to the Fourth Dutch Growth data, which was collected by the Fourth Dutch Growth Study (van Buuren, 2007) and is available
in the R package expectreg. This dataset contains the height, weight and head circumference of Dutch children between ages 0 and 21 years (van Buuren and Fredriks, 2001). A primary interest of this study concerns the relation between age and height. The scatter plot (Figure 1a) shows the relationship between age and height for a subset of 6,848 boys. Clearly, there is a nonlinear trend between height and age (Figure 1a), with a steep curvature before age three due to rapid growth in early childhood, and a bend in the late teens due to reaching the full adult height. This dataset has been analyzed by Schnabel and Eilers (2009). In their analysis, they took a square root transformation of age. While this transformation effectively removes the curvature at early childhood, the nonlinearity in the late teens still exists (Figure 1b). They then fitted the transformed data using smoothed expectile regression, combining least asymmetrically weighted squares with P-splines. Though the smoothed expectile curves fit the data well, they do not provide any information on the location of the threshold, i.e., the age at which growth stops.

Here, we fit the continuous threshold expectile model to the square root transformed data $(X_i, Y_i)$, $i = 1, \ldots, 6{,}848$, and estimate the location of the threshold. Specifically,
$$\nu_\tau(Y_i \mid X_i) = \beta_0 + \beta_1 X_i + \beta_2 (X_i - t)_+, \qquad (6)$$
where $Y_i$ is the height of the $i$th boy, $X_i$ is the square root of his age, $(\beta_0, \beta_1, \beta_2)$ are the unknown regression parameters, and $t$ is the unknown location of the threshold. We fit the model with $\tau$ = 0.05, 0.15, 0.25, 0.30, 0.40, 0.50, 0.80, 0.90, 0.95, 0.98.

For all the expectile levels we fit, the p-values from our threshold effect test are nearly 0, indicating a highly significant continuous threshold pattern. The regression results for different expectile levels are reported in Table 4. The estimated coefficients show that the height first increases rapidly with age (roughly 31–35 cm per square root of age), and then the growth is very limited or nearly stops after about age 17–18. The estimated thresholds illustrate a general trend that shorter boys seem to stop growing later than taller boys; for example, the 95% confidence interval (CI) for the threshold is [18.39, 19.24] at the expectile level $\tau = 0.05$, while at $\tau = 0.98$ the lower confidence limit is 16.76. Figure 1b confirms these results.
Our second example concerns the salaries of major league baseball (MLB) players for the 1987 baseball season (Hoaglin and Velleman, 1995). The dataset has been analyzed by several groups in the ASA graphical session in 1989. Here we consider a subset of n = 176 pitchers, which was analyzed by Hettmansperger and McKean (2011) using a rank-based regression. This dataset is available in the R package rfit. It consists of the 1987 beginning salary and the number of years of experience for these pitchers.

Visually, the scatter plot (Figure 2a) suggests that the salaries are first positively correlated with the years of experience, but then decline after about 9 years. This is somewhat unusual, because it is generally expected that salaries grow with the years of experience in players' early career and with the status of free agent (i.e., a player whose initial 6-year contract expires). Although salaries do decrease after players pass their prime, this would happen much later; for example, Haupert and Murray (2012) estimated that the decline for MLB players occurs after 22 years. In the analysis by the ASA graphical session in 1989, the model with the best predictive performance was a segmented mean regression model with a fixed threshold at 7 years, where the threshold was chosen according to the length of the initial professional baseball contracts (6 years). It is of interest to formally test whether the visually observed transition is significant and to estimate the onset of the decline from the data. Furthermore, the salaries show considerable heterogeneity at a given number of years of experience. Hence, a regression model based on the conditional distribution of the response variable provides a more complete picture than a mean regression model. Previous analyses only focused on mean regression models (Hettmansperger and McKean, 2011; Hoaglin and Velleman, 1995), but not regression models for the conditional distribution.
Table 4: The estimated parameters and their standard errors (listed in parentheses) for the Dutch boys data. The p-values are from the test for a threshold effect. (Columns: τ, p-value, β0, β1, β2, t.)

Figure 1: Analysis of Dutch boys data from the Fourth Dutch Growth Study. (a) Scatter plot of height (in cm) against age (in years) of Dutch boys. (b) Fitted expectile curves for the data with transformed age.

Here we fit the data using the continuous threshold expectile regression,
$$\nu_\tau(Y_i \mid X_i) = \beta_0 + \beta_1 X_i + \beta_2 (X_i - t)_+, \qquad (7)$$
where $Y_i$ is the log(salary) of the $i$th pitcher, $X_i$ is log(years of experience), $(\beta_0, \beta_1, \beta_2)$ are the unknown regression parameters, and $t$ is the unknown location of the threshold, with $\tau$ = 0.01, 0.02, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.98, 0.99.

Our threshold test shows that the continuous threshold patterns are highly significant, with p-values less than 0.05 for all the expectile levels considered. Table 5 reports the estimated coefficients and their standard errors. The coefficients show that the salaries indeed decline for pitchers with 9 or more years of experience (range: (8.61, 10.35)), at all the expectile levels we fitted. Figure 2 confirms this conclusion.

This raises two natural questions: why did the salaries decrease for more experienced pitchers, and why did the decrease occur at 9 years for all salary levels? The history of the MLB shows that, in the period from 1985 to 1987, the MLB team owners colluded in an effort to decrease salaries for free agents after their initial contracts expired. Pitchers with 9 or more years of experience are all free agents. Their salary decrease is a reflection of owners trying to control salaries.
The reason that the observed threshold (9 years) is later than the start of free agency (7 years) is that some pitchers had become free agents before the collusion, and thus had more than 7 years of experience when the collusion occurred.

As a comparison, we also fit the data with the bent-line quantile regression (Li et al., 2011). Though the overall trend is similar to the continuous threshold expectile regression, it has more crossing between quantiles. This agrees with the observation of Schnabel and Eilers (2009) and Waltrup et al. (2015) that expectile regression tends to have less crossing than quantile regression.

Table 5: The estimated parameters and their standard errors (listed in parentheses) for the baseball salaries data. The p-value is from the test for a threshold effect. (Columns: τ, p-value, β0, β1, β2, t.)

Figure 2: Analysis of baseball salaries data. (a) Fitted expectile curves for the data. (b) Fitted quantile curves for the data.

4. Concluding Remarks
In this article, we have developed the continuous threshold expectile regression model. This model allows the expectiles of the response to be piecewise linear but still continuous in the covariates. We developed a grid search method to estimate the unknown threshold and the regression coefficients. A weighted CUSUM type test statistic was proposed to test for structural change at a given expectile. Our numerical studies showed that the proposed estimator has good finite sample performance.

Our work may be extended in several ways. First, although there are generally fewer crossings in expectile regression than in quantile regression (Schnabel and Eilers, 2009), expectile crossings may still occur. It would be worthwhile to extend our model to non-crossing continuous threshold expectile estimation and to develop tests for structural change across expectiles. Another interesting extension is to consider more than one threshold for a covariate. In such a situation, the estimation and testing of the thresholds would be more complicated, and further investigation is needed.
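The weighted CUSUM construction summarized above can be sketched as follows: fit the null (no-threshold) expectile model, form a weighted, recentred cumulative-residual process over candidate thresholds, and calibrate its supremum with mean-zero, unit-variance multipliers. The code below is a schematic of that construction under simplifying assumptions (Rademacher multipliers, a plain threshold grid, no covariates $Z$); the function names and the IRLS null fit are illustrative, not the authors' implementation.

```python
import numpy as np

def als_fit(W, y, tau, iters=100):
    """Linear expectile fit by iteratively reweighted least squares (sketch)."""
    a = np.linalg.lstsq(W, y, rcond=None)[0]
    for _ in range(iters):
        w = np.where(y - W @ a > 0, tau, 1 - tau)
        a = np.linalg.solve(W.T @ (w[:, None] * W), W.T @ (w * y))
    return a

def cusum_threshold_test(x, y, W, tau, grid, n_boot=300, seed=1):
    """Sup of a weighted CUSUM-type process under the null (no-threshold) fit,
    with a multiplier-bootstrap p-value."""
    n = len(y)
    alpha = als_fit(W, y, tau)                    # null-model expectile fit
    resid = y - W @ alpha
    w = np.where(resid > 0, tau, 1 - tau)         # asymmetric weights
    Sw_inv = np.linalg.inv((W * w[:, None]).T @ W / n)

    def sup_process(v):
        vals = []
        for t in grid:
            a_t = (x - t) * (x <= t)              # CUSUM direction at t
            Sn = (W * (w * a_t)[:, None]).mean(axis=0)
            proj = a_t - W @ (Sw_inv @ Sn)        # recentred direction
            vals.append(abs(np.sum(v * w * resid * proj)) / np.sqrt(n))
        return max(vals)

    stat = sup_process(np.ones(n))                # observed sup statistic
    rng = np.random.default_rng(seed)
    boot = [sup_process(rng.choice([-1.0, 1.0], size=n)) for _ in range(n_boot)]
    return stat, float(np.mean([b >= stat for b in boot]))
```

The appeal of this construction, as noted in the abstract, is that only the null model is ever fitted; each bootstrap draw merely re-signs the weighted residuals rather than re-estimating a threshold.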
Acknowledgements
The authors thank Dr. Andrew Wiesner for the interpretation of the baseball data. This research is partially supported by NIH R01GM109453. Zhang's work is partially supported by the National Natural Science Foundation of China (NSFC) (11401194) and the Fundamental Research Funds for the Central Universities (531107050739).
Appendix A
Regularity Conditions.

(A1) $t_0 = \arg\min_{t \in \mathcal{T}} M_\tau(\widehat{\xi}(t), t)$ is unique, where $\mathcal{T}$ is a compact set in $\mathbb{R}$.

(A2) $\theta_\tau$ is in $\Theta$, and $\Theta$ is a compact subset of $\mathbb{R}^{p+4}$.

(A3) The scalar variable $X$ has an absolutely continuous distribution with a density function $f_X$ that is strictly positive, bounded, and continuous for any $t$ in a neighborhood of $t_0$.

(A4) $E|Y|^2 < \infty$, $E|X|^2 < \infty$, and $E\|Z\|^2 < \infty$.

(A5) Given $\beta_2 \neq 0$, the Hessian matrix $H(\theta_0)$ is nonsingular.

Condition (A1) is the identifiability condition for the estimation. Conditions (A1)–(A3) are for the consistency of the estimators, and Conditions (A4)–(A5) are used for the asymptotic normality.

We first provide the following uniform convergence result.

Lemma A.1.
Under the regularity conditions, as $n \to \infty$, we have
$$\sup_{\theta \in \Theta} |M_{n,\tau}(\theta) - M_\tau(\theta)| \stackrel{P}{\longrightarrow} 0.$$

Proof of Lemma A.1. To show that the class of functions $\{m_\theta : \theta \in \Theta\}$ is Glivenko-Cantelli, it is sufficient to show that $m_\theta$ is Lipschitz continuous. Recall that $\theta = (\xi^\top, t)^\top$, and the derivatives are
$$\frac{\partial m_\theta}{\partial \xi} = -2\,\omega_\tau V(t)\big[Y - \xi^\top V(t)\big], \qquad \frac{\partial m_\theta}{\partial t} = 2\,\omega_\tau \beta_2 I(X > t)\big[Y - \xi^\top V(t)\big].$$
By Condition (A2), both $\max|V(t)[Y - \xi^\top V(t)]|$ and $\max|\beta_2 I(X > t)[Y - \xi^\top V(t)]|$ are finite. Note that $\omega_\tau \leq \max(\tau, 1-\tau) < 1$ for $\tau \in (0, 1)$. Thus,
$$|m_{\theta_1}(X) - m_{\theta_2}(X)| \leq \dot{m}(X) \|\theta_1 - \theta_2\|$$
for every $X$, where
$$\dot{m}(X) = 2\max\big|V(t)[Y - \xi^\top V(t)]\big| + 2\max\big|\beta_2 I(X > t)[Y - \xi^\top V(t)]\big| < \infty.$$
Therefore, $m_\theta$ is Lipschitz continuous, and applying the Glivenko-Cantelli theorem and Example 19.8 in Van der Vaart (2000), we can establish that $\{m_\theta : \theta \in \Theta\}$ is Glivenko-Cantelli.

Proof of Theorem 2.1. By Lemma A.1, $\sup_{\theta \in \Theta} |M_{n,\tau}(\theta) - M_\tau(\theta)| \stackrel{P}{\longrightarrow} 0$ as $n$ goes to infinity. Since $\Theta$ is compact, the minimum $\theta_0$ is unique (by Conditions A1 and A2), and $M_{n,\tau}(\theta)$ is continuous with respect to $\theta$, we can establish that $\widehat{\theta} \stackrel{P}{\longrightarrow} \theta_0$, by Theorem 2.1 of Newey and McFadden (1994).

Proof of Theorem 2.2. First, by Condition (A3), the function $X \mapsto m_\theta(X)$ is measurable, and the function $\theta \mapsto m_\theta(X)$ is differentiable at $\theta_0$ for P-almost every $X$. Recall that $m_\theta$ is Lipschitz continuous with respect to $\theta$, as proved in Lemma A.1.

Second, the map $\theta \mapsto M_\tau(\theta) = E\,m_\theta$ admits a second-order Taylor expansion at $\theta_0$, with a nonsingular symmetric Hessian matrix $H(\theta_0)$. We can verify that $H(\theta)$ is continuous in $\theta$. Indeed, the elements of $H(\theta)$ are quadratic functions of $\xi$, and hence $H(\theta)$ is continuous in $\xi$. It is sufficient to show that $H(\theta)$ is continuous in $t$. Note that the first term of $H(\theta)$ is a function of $t$ through moments of the form $E[V(t)I(X > t)]$. By Condition (A4), $(E\|V(t)\|^2)^{1/2} \leq C_1$ for some constant $C_1 < \infty$. By Condition (A3),
$$|F_X(t_2) - F_X(t_1)| \leq \max_x f_X(x)\,|t_2 - t_1| \leq C_2 |t_2 - t_1|$$
for some constant $C_2 < \infty$ and $t_1 < t_2$. Then, by the Cauchy-Schwarz inequality,
$$E\big\|V(t) I(t_1 \leq X \leq t_2)\big\| \leq \big(E\|V(t)\|^2\big)^{1/2} \big(E\,I(t_1 \leq X \leq t_2)\big)^{1/2} \leq C_1 (C_2 |t_2 - t_1|)^{1/2},$$
which is uniformly continuous in $t$. Hence, the first term of $H(\theta)$ is continuous in $t$. On the other hand, since $E\,\omega_\tau = \tau\big(1 - F_Y(\xi^\top V(t))\big) + (1-\tau) F_Y(\xi^\top V(t))$ is continuous in $t$, the second term of $H(\theta)$ is continuous in $t$. Thus, $H(\theta)$ is continuous in $t$.

Finally, by Theorem 2.1, $\widehat{\theta}$ is consistent for $\theta_0$ in a neighborhood of $\theta_0$. It follows that $\sqrt{n}(\widehat{\theta} - \theta_0)$ is asymptotically normal with mean zero and covariance matrix $H(\theta_0)^{-1} \Sigma(\theta_0) H(\theta_0)^{-1}$, by Theorem 5.23 in Van der Vaart (2000).

Lemma A.2.
Under the regularity conditions, as $n \to \infty$, we have
(i) $\widehat{S}_{wn}(\widehat{\alpha}) \stackrel{P}{\longrightarrow} S_w(\alpha_0)$;
(ii) $\sup_t \big|\widehat{S}_{1n}(\widehat{\alpha}, t) - S_1(\alpha_0, t)\big| \stackrel{P}{\longrightarrow} 0$;
(iii) $\sup_t \big|\widehat{S}_{2n}(\widehat{\alpha}, t) - S_2(\alpha_0, t)\big| \stackrel{P}{\longrightarrow} 0$.

Proof of Lemma A.2. Part (i) follows easily from the law of large numbers. To establish (ii) and (iii), note that $\widehat{S}_{1n}(\widehat{\alpha}, t)$ and $\widehat{S}_{2n}(\widehat{\alpha}, t)$ are sums of indicator functions and Lipschitz functions, so they form Glivenko-Cantelli classes, which implies that both (ii) and (iii) hold.

Proof of Theorem 2.5. Note that
$$\widehat{\alpha} = \arg\min_{\alpha} n^{-1} \sum_{i=1}^n \big|\tau - I(Y_i \leq \alpha^\top W_i)\big| (Y_i - \alpha^\top W_i)^2,$$
which is equivalent to solving the estimating equation
$$U_n(\alpha) = \frac{1}{\sqrt{n}} \sum_{i=1}^n \big|\tau - I(Y_i \leq \alpha^\top W_i)\big| W_i (Y_i - \alpha^\top W_i) = 0.$$
Recall that the local alternative model (4) is $Y_i = \beta_0 + \beta_1 X_i + n^{-1/2} \beta_2 (X_i - t_0)_+ + \gamma^\top Z_i + e_i$. Then, under model (4), the estimating equation can be written as
$$U_n(\alpha_0) = \frac{1}{\sqrt{n}} \sum_{i=1}^n \big|\tau - I(Y_i \leq \alpha_0^\top W_i)\big| W_i e_i + \frac{1}{n} \sum_{i=1}^n \big|\tau - I(Y_i \leq \alpha_0^\top W_i)\big| W_i \beta_2 (X_i - t_0) I(X_i > t_0) + o_P(1).$$
By the mean-value theorem, we have
$$-U_n(\alpha_0) = U_n(\widehat{\alpha}) - U_n(\alpha_0) = -\frac{1}{\sqrt{n}} \sum_{i=1}^n \big|\tau - I(Y_i \leq \alpha^{*\top} W_i)\big| W_i W_i^\top (\widehat{\alpha} - \alpha_0) + o_p(1) = -\widehat{S}_{wn}(\alpha^*) \sqrt{n} (\widehat{\alpha} - \alpha_0) + o_p(1),$$
where $\alpha^*$ lies on the line segment between $\widehat{\alpha}$ and $\alpha_0$.
By Lemma A.2, $\widehat{S}_{wn}(\widehat{\alpha}) \stackrel{P}{\longrightarrow} S_w(\alpha_0)$, and under the local alternative model (4), it follows that
$$\sqrt{n}(\widehat{\alpha} - \alpha_0) = \frac{1}{\sqrt{n}} S_w(\alpha_0)^{-1} \sum_{i=1}^n \big|\tau - I(Y_i \leq \alpha_0^\top W_i)\big| W_i (Y_i - \alpha_0^\top W_i) + o_P(1)$$
$$= \frac{1}{\sqrt{n}} S_w(\alpha_0)^{-1} \sum_{i=1}^n \big|\tau - I(Y_i \leq \alpha_0^\top W_i)\big| W_i e_i + \frac{1}{n} S_w(\alpha_0)^{-1} \sum_{i=1}^n \big|\tau - I(Y_i \leq \alpha_0^\top W_i)\big| W_i \beta_2 (X_i - t_0) I(X_i > t_0) + o_P(1).$$
Substituting $\sqrt{n}(\widehat{\alpha} - \alpha_0)$ and after some algebraic manipulation, we have
$$R_n(t) = \frac{1}{\sqrt{n}} \sum_{i=1}^n \big|\tau - I(Y_i \leq \alpha_0^\top W_i)\big| \Big[ Y_i - \alpha_0^\top W_i - n^{-1/2} \beta_2 (X_i - t_0) I(X_i > t_0) - (\widehat{\alpha} - \alpha_0)^\top W_i + n^{-1/2} \beta_2 (X_i - t_0) I(X_i > t_0) \Big] (X_i - t) I(X_i \leq t) + o_P(1)$$
$$= \frac{1}{\sqrt{n}} \sum_{i=1}^n \big|\tau - I(Y_i \leq \alpha_0^\top W_i)\big| e_i \Big[ (X_i - t) I(X_i \leq t) - \widehat{S}_{1n}(\alpha_0, t)^\top \widehat{S}_{wn}(\alpha_0)^{-1} W_i \Big] - \widehat{S}_{1n}(\alpha_0, t)^\top \widehat{S}_{wn}(\alpha_0)^{-1} \widehat{S}_{2n}(\alpha_0, t) + o_P(1)$$
$$= \frac{1}{\sqrt{n}} \sum_{i=1}^n \big|\tau - I(Y_i \leq \alpha_0^\top W_i)\big| e_i \Big[ (X_i - t) I(X_i \leq t) - \widehat{S}_{1n}(\alpha_0, t)^\top \widehat{S}_{wn}(\alpha_0)^{-1} W_i \Big] - q(t) + o_P(1).$$
The remaining conclusion on the weak convergence of $R_n(\widehat{\alpha}, t)$ can be derived by following the proofs in Stute (1997).

Proof of Theorem 2.5
We divide the proof into three steps. First, we show that the covariance function of $R_n^{**}$ converges to that of $R$. Define
$$R_n^{**}(t) = \frac{1}{\sqrt{n}} \sum_{i=1}^n v_i (Y_i - \alpha_0^\top W_i) \big|\tau - I(Y_i \leq \alpha_0^\top W_i)\big| \big[ (X_i - t) I(X_i \leq t) - S_1(\alpha_0, t)^\top S_w(\alpha_0)^{-1} W_i \big].$$
By the consistency of $\widehat{\alpha}$ for $\alpha_0$, along with the uniform convergence of $\widehat{S}_{1n}(\widehat{\alpha}, t)$ to $S_1(\alpha_0, t)$ and of $\widehat{S}_{wn}(\widehat{\alpha})$ to $S_w(\alpha_0)$, one can easily show that $R_n^*(t)$ and $R_n^{**}(t)$ are asymptotically equivalent in the sense that
$$\sup_t \big\| R_n^*(t) - R_n^{**}(t) \big\| = o_P(1).$$
The $v_i$'s are independent of $(Y_i, X_i, Z_i)$, with $E v_i = 0$ and $\mathrm{Var}(v_i) = 1$. Then, for any $t_1, t_2$, the covariance function of $R_n^{**}$ is
$$\mathrm{Cov}\big(R_n^{**}(t_1), R_n^{**}(t_2)\big) = \frac{1}{n} \sum_{i=1}^n E\Big( v_i^2 e_i^2 \big|\tau - I(e_i \leq 0)\big|^2 \big\{ (X_i - t_1) I(X_i \leq t_1) - S_1(\alpha_0, t_1)^\top S_w(\alpha_0)^{-1} W_i \big\} \big\{ (X_i - t_2) I(X_i \leq t_2) - S_1(\alpha_0, t_2)^\top S_w(\alpha_0)^{-1} W_i \big\} \Big)$$
$$= E\Big[ e^2 \big|\tau - I(e \leq 0)\big|^2 \big\{ (X - t_1) I(X \leq t_1) - S_1(\alpha_0, t_1)^\top S_w(\alpha_0)^{-1} W \big\} \big\{ (X - t_2) I(X \leq t_2) - S_1(\alpha_0, t_2)^\top S_w(\alpha_0)^{-1} W \big\} \Big],$$
which is the same as the covariance of $R(t)$.

Second, it is easy to show that any finite-dimensional projection of $R_n^{**}(t)$ converges to that of $R(t)$, by the central limit theorem.

Third, $R_n^{**}(t)$ is uniformly tight. Note that the class of all indicator functions $I(X \leq t)$ is a Vapnik-Chervonenkis (VC) class of functions. Then the class of functions
$$\mathcal{F}_n = \big\{ (X_i - t) I(X_i \leq t) - S_1(\alpha_0, t)^\top S_w(\alpha_0)^{-1} W_i : t \in \mathbb{R} \big\}$$
is also a VC class of functions. Thus, by the equicontinuity lemma (Lemma 15 of Pollard, 1984), one can show that $R_n^{**}(t)$ is uniformly tight. Then, by the Cramér-Wold device, the proof is completed.

References
Aigner, D., Amemiya, T., Poirier, D. J., 1976. On the estimation of production frontiers: Maximum likelihood estimation of the parameters of a discontinuous density function. International Economic Review 17, 377–396.
Andrews, D., 1993. Tests for parameter instability and structural change with unknown change point. Econometrica 61, 821–856.
Bai, J., 1996. Testing for parameter constancy in linear regressions: an empirical distribution function approach. Econometrica 64, 597–622.
Chan, K. S., 1993. Consistency and limiting distribution of the least squaresestimator of a threshold autoregressive model. Annals of Statistics 21, 520–533.Chan, K. S., Tsay, R. S., 1998. Limiting properties of the least squares estimatorof a continuous threshold autoregressive model. Biometrika 85, 413–426.
Chappell, R., 1989. Fitting bent lines to data, with applications to allometry. Journal of Theoretical Biology 138, 235–256.
Chiu, G., Lockhart, R., Routledge, R., 2006. Bent-cable regression theory and applications. Journal of the American Statistical Association 101, 542–553.
Cho, J. S., White, H., 2007. Testing for regime switching. Econometrica 75.
Feder, P. I., 1975. On asymptotic distribution theory in segmented regressionproblems–identified case. The Annals of Statistics 3, 49–83.Hansen, B. E., 1996. Inference when a nuisance parameter is not identified underthe null hypothesis. Econometrica 64, 413–430.Hansen, B. E., 2015. Regression kink with an unknown threshold. Journal of
Business and Economic Statistics (Accepted), 00–00.
Haupert, M., Murray, J., 2012. Regime switching and wages in major league baseball under the reserve clause. Cliometrica 6, 143–162.
Hettmansperger, T., McKean, J. W., 2011. Robust Nonparametric Statistical Methods, 2nd Ed. New York, Chapman.
Hinkley, D. V., 1969. Inference about the intersection in two-phase regression. Biometrika 56, 495–504.
Hoaglin, D. C., Velleman, P. F., 1995. A critical look at some analyses of major league baseball salaries. The American Statistician 49, 277–285.
Kim, M., Lee, S., 2016. Nonlinear expectile regression with application to value-at-risk and expected shortfall estimation. Computational Statistics and Data Analysis 94, 1–19.
Kneib, T., 2013. Beyond mean regression. Statistical Modelling 13, 275–303.
Koenker, R., Bassett, G., 1978. Regression quantiles. Econometrica 46, 33–50.
Kosorok, M. R., Song, R., 2007. Inference under right censoring for transformation models with a change-point based on a covariate threshold. The Annals of Statistics 35, 957–989.
Kuan, C.-M., Yeh, J.-H., Hsu, Y.-C., 2009. Assessing value at risk with CARE, the conditional autoregressive expectile models. Journal of Econometrics 150, 261–270.
Lee, S., Seo, M. H., Shin, Y., 2011. Testing for threshold effects in regression models. Journal of the American Statistical Association 106, 220–231.
Li, C., Wei, Y., Chappell, R., He, X., 2011. Bent line quantile regression with application to an allometric study of land mammals' speed and mass. Biometrics 67, 242–249.
Newey, W. K., McFadden, D., 1994. Large sample estimation and hypothesis testing. Handbook of Econometrics 4, 2111–2245.
Newey, W. K., Powell, J. L., 1987. Asymmetric least squares estimation and testing. Econometrica 55, 819–847.
Pollard, D., 1984. Convergence of Stochastic Processes. Springer Science & Business Media.
Qu, Z., 2008. Testing for structural change in regression quantiles. Journal of Econometrics 146, 170–184.
Quandt, R. E., 1958. The estimation of the parameters of a linear regression system obeying two separate regimes. Journal of the American Statistical
Association 53, 873–880.
Quandt, R. E., 1960. Tests of the hypothesis that a linear regression system obeys two separate regimes. Journal of the American Statistical Association 55, 324–330.
Schnabel, S. K., Eilers, P. H., 2009. Optimal expectile smoothing. Computational Statistics and Data Analysis 53, 4168–4177.
Silverman, B. W., 1986. Density Estimation for Statistics and Data Analysis. Vol. 26. CRC Press.
Sobotka, F., Kauermann, G., Waltrup, L. S., Kneib, T., 2013. On confidence intervals for semiparametric expectile regression. Statistics and Computing
23, 135–148.
Stute, W., 1997. Nonparametric model checks for regression. The Annals of Statistics 25, 613–641.
van Buuren, S., 2007. Worm plot to diagnose fit in quantile regression. Statistical Modelling 7, 363–376.
van Buuren, S., Fredriks, M., 2001. Worm plot: a simple diagnostic device for modelling growth reference curves. Statistics in Medicine 20, 1259–1277.
Van der Vaart, A. W., 2000. Asymptotic Statistics. Cambridge University Press.
Waltrup, L. S., Sobotka, F., Kneib, T., Kauermann, G., 2015. Expectile and quantile regression–David and Goliath? Statistical Modelling 15, 433–456.