Tests for validity of the semiparametric heteroskedastic transformation model
Marie Hušková∗   Simos G. Meintanis†‡   Charl Pretorius∗†
January 10, 2019
Abstract
There exist a number of tests for assessing the nonparametric heteroscedastic location-scale assumption. Here we consider a goodness-of-fit test for the more general hypothesis of the validity of this model under a parametric functional transformation on the response variable. Specifically, we consider testing for independence between the regressors and the errors in a model where the transformed response is just a location/scale shift of the error. Our criteria use the familiar factorization property of the joint characteristic function of the covariates under independence. The difficulty is that the errors are unobserved and hence one needs to employ properly estimated residuals in their place. We study the limit distribution of the test statistics under the null hypothesis as well as under alternatives, and also suggest a resampling procedure in order to approximate the critical values of the tests. This resampling is subsequently employed in a series of Monte Carlo experiments that illustrate the finite-sample properties of the new test. We also investigate the performance of related test statistics for normality and symmetry of errors, and apply our methods to real data sets.
Key words: bootstrap test, heteroskedastic transformation, independence model, nonparametric regression
AMS 2010 classification:

∗ Department of Probability and Mathematical Statistics, Charles University, Prague, Czech Republic
† Unit for Business Mathematics and Informatics, North-West University, Potchefstroom, South Africa
‡ Corresponding author: [email protected]

1 Introduction
At least since the seminal paper of Box and Cox (1964), transformations are applied to data sets in order to facilitate statistical inference. The goal of a certain transformation could be, among others, reduction of skewness, faster convergence to normality and better fit, linearity, and stabilization of variance. These aims, which may even be contradictory, are quite important as it is only under such assumptions that certain statistical procedures are applicable. Some of the issues raised when performing a certain transformation are discussed in varying settings by Staniswalis et al. (1993), Quiroz et al. (1996), Yeo and Johnson (2000), Chen et al. (2002), Mu and He (2007), and Meintanis and Stupfler (2015), and in the reviews of Sakia (1992) and Horowitz (2009). Here we will consider goodness-of-fit (GOF) tests for the (after-transformation) location-scale nonparametric heteroskedastic model
T(Y) = m(X) + σ(X)ε,   (1)

where T(·) is a transformation acting on the response Y, m(·) and σ(·) are unknown functions, and where the error ε, having mean zero and unit variance, is supposed to be independent of the vector of covariates X. The classical location-scale model, i.e. the model in (1) with T(Y) ≡ Y, is a popular model that is often employed in statistics as well as in econometrics (see e.g., Racine and Li, 2017; Brown and Levine, 2007; Chen et al., 2005), and there exist a number of approaches to test the validity of this model, such as the classical Kolmogorov–Smirnov and Cramér–von Mises tests in Einmahl and Van Keilegom (2008) and the test criteria in Hlávka et al. (2011), which are based on the characteristic function. On the other hand, the problem of GOF for the general model (1) under any given (fixed) transformation has only recently drawn attention in Neumeyer et al. (2016) by means of classical methods. Here we deviate from classical approaches in that we employ the characteristic function (CF) instead of the distribution function (DF) as the basic inferential tool. As already mentioned, the CF approach was also followed in analogous situations by Stute and Zhu (2005), Hlávka et al. (2011), and Hušková et al. (2018), among others, and gave favorable results.

The rest of the paper is outlined as follows. In Section 2 we introduce the null hypothesis of independence between the regressor and the error term and formulate the new test statistic. The asymptotic distribution of the test statistic under the null hypothesis as well as under alternatives is studied in Section 3, while in Section 4 we particularize our method and suggest a bootstrap procedure for its calibration. Section 5 presents the results of a Monte Carlo study.
Since one aim of transforming the response is to achieve normality, or more generally symmetry, after transformation, we also investigate the small-sample performance of CF-based statistics for these problems with reference to the regression errors. Real data applications are also included. We finally conclude with a discussion of our findings in Section 7. Some technical material is deferred to the Appendix.

2 The null hypothesis and the test statistic

Note that underlying equation (1) is the potentiality of obtaining a location/scale structure following a certain transformation of the response. Hence there exist a number of inferential problems similar to the problems faced in the non-transformation case, i.e., when
T(Y) ≡ Y. For the homoskedastic version of model (1) with σ ≡ constant, estimation methods were proposed by Linton et al. (2008) and Colling et al. (2015). The classical problem of fitting a specific regression function was considered by Colling and Van Keilegom (2016, 2017), respectively, by means of the DF and the integrated regression function, while regressor significance using Bierens' CF-type approach was considered by Allison et al. (2018). On the other hand, the problem of GOF of the model itself is studied by Hušková et al. (2018) only in the homoskedastic case, whereas the single existing work for GOF testing with the more general heteroskedastic model is that of Neumeyer et al. (2016), which is based on the classical pair of Kolmogorov–Smirnov/Cramér–von Mises functionals.

Here we are concerned with the GOF test for a fixed parametric transformation Y_ϑ = T_ϑ(Y), indexed by a parameter ϑ ∈ Θ. Specifically, on the basis of independent copies (Y_j, X_j), j = 1, …, n, of (Y, X) ∈ R × R^p, we wish to test the null hypothesis

H₀: ∃ ϑ₀ ∈ Θ such that ε_{ϑ₀}(Y, X) ⊥ X,   (2)

where ⊥ denotes stochastic independence, and

ε_ϑ(Y, X) = (Y_ϑ − m_ϑ(X)) / σ_ϑ(X),   (3)

with m_ϑ(X) := E(Y_ϑ | X) and σ²_ϑ(X) := Var(Y_ϑ | X) being the mean and variance, respectively, of the transformed response conditionally on the covariate vector X, and Θ ⊆ R^q, q ≥
1. Note that the validity of the null hypothesis H₀ is tantamount to the existence of a (true) value ϑ₀ of ϑ which, if substituted in this specific parametric transformation and applied to the response, will render the location/scale structure of model (1). To avoid confusion we emphasize that while the transformation is indeed parametric, the regression and the heteroskedasticity functions, respectively m_ϑ(·) and σ_ϑ(·), apart from their implicit dependence on the particular transformation T_ϑ(·) and the associated parameter ϑ, are both viewed and estimated within an entirely nonparametric context. Therefore our model can be labelled as a semiparametric model, i.e. parametric in the transformation but nonparametric in its location and scale functions.

To motivate our test statistic, write ϕ_{X,ε_ϑ} for the joint CF of (X, ε_ϑ), and ϕ_X and ϕ_{ε_ϑ} for the marginal CFs of X and ε_ϑ, respectively, and recall that the null hypothesis in (2) equivalently implies that ϕ_{X,ε_{ϑ₀}} = ϕ_X ϕ_{ε_{ϑ₀}}, so that, following the approach in Hušková et al. (2018), the suggested test procedure will be based on the criterion

Δ_{n,W} = n ∫_{−∞}^{∞} ∫_{R^p} |ϕ̂(t₁, t₂) − ϕ̂_X(t₁) ϕ̂_ε̂(t₂)|² W(t₁, t₂) dt₁ dt₂,   (4)

where

ϕ̂(t₁, t₂) = (1/n) Σ_{j=1}^{n} exp{i t₁ᵀX_j + i t₂ ε̂_j}

is the joint empirical CF, and ϕ̂_X(t₁) and ϕ̂_ε̂(t₂) are the empirical marginal CFs resulting from ϕ̂(t₁, t₂) by setting t₂ = 0 and t₁ = 0, respectively. These three quantities serve as estimators of ϕ_{X,ε_{ϑ₀}}, ϕ_X and ϕ_{ε_{ϑ₀}}, respectively, and they will be computed by means of properly estimated residuals ε̂_j = (Ŷ_j − m̂(X_j))/σ̂(X_j), j = 1, …, n. We note that in these residuals the observed response is parametrically transformed by means of Ŷ_j = T_ϑ̂(Y_j), using the particular transformation under test and the corresponding estimate ϑ̂ of the transformation parameter ϑ.
Other than that, and as already mentioned, the estimate m̂(·) of the regression function as well as the heteroskedasticity estimate σ̂(·) are obtained entirely nonparametrically. Having said this, we often suppress the index ϑ̂ in these estimates. Clearly, for any weight function W satisfying W(·,·) ≥ 0, the test statistic Δ_{n,W} defined in (4) is expected to be large under alternatives, and therefore large values indicate that the null hypothesis is violated.

3 Asymptotic results

We now consider theoretical properties of the introduced test statistics. More precisely, we present the limit distribution of our test statistics under both the null as well as alternative hypotheses. Since the assumptions are quite technical, they are deferred to the Appendix. We first introduce some required notation. For ϑ ∈ Θ, define m_ϑ(X_j) = E(T_ϑ(Y_j) | X_j) and σ²_ϑ(X_j) = Var(T_ϑ(Y_j) | X_j). We estimate m_ϑ(x) and σ²_ϑ(x), x = (x₁, …, x_p)ᵀ, by

m̂_ϑ(x) = (1 / (f̂(x) n hᵖ)) Σ_{v=1}^{n} K((x − X_v)/h) T_ϑ(Y_v),

σ̂²_ϑ(x) = (1 / (f̂(x) n hᵖ)) Σ_{v=1}^{n} K((x − X_v)/h) (T_ϑ(Y_v) − m̂_ϑ(X_v))²,

respectively, where K(·) and h = h_n are a kernel and a bandwidth, and

f̂(x) = (1/(n hᵖ)) Σ_{v=1}^{n} K((x − X_v)/h)

is a kernel estimator of the density of X_j. Finally, let

ε_{ϑ,j} = (T_ϑ(Y_j) − m_ϑ(X_j)) / σ_ϑ(X_j),   ε_j = ε_{ϑ₀,j},   ε̂_j = ε̂_{ϑ̂,j} = (T_ϑ̂(Y_j) − m̂_ϑ̂(X_j)) / σ̂_ϑ̂(X_j),   (5)

where ϑ̂ is a √n-consistent estimator of ϑ₀. It is assumed that ϑ̂ allows an asymptotic representation as shown in assumption (A.7). Now we formulate the limit distribution of the test statistic under the null hypothesis:

Theorem 1.
Let assumptions (A.1)–(A.8) be satisfied. Then under the null hypothesis, as n → ∞,

Δ_{n,W} →d ∫_{R^{p+1}} |Z(t₁, t₂)|² W(t₁, t₂) dt₁ dt₂,

where {Z(t₁, t₂), (t₁, t₂) ∈ R^{p+1}} is a Gaussian process with zero mean function and the same covariance structure as the process {Z₀(t₁, t₂), (t₁, t₂) ∈ R^{p+1}} defined as

Z₀(t₁, t₂) = {cos(t₂ε) − C_ε(t₂)} g₊(t₁ᵀX) + {sin(t₂ε) − S_ε(t₂)} g₋(t₁ᵀX)
  + t₂ ε {S_ε(t₂) g₊(t₁ᵀX) + C_ε(t₂) g₋(t₁ᵀX)}
  + t₂ (ε² − 1) {C′_ε(t₂) g₊(t₁ᵀX) − S′_ε(t₂) g₋(t₁ᵀX)}
  + gᵀ(Y, X) H_{ϑ₀,q}(t₁, t₂),   (6)

where C_ε and S_ε are the real and the imaginary part of the CF of ε. Similarly, C_X and S_X denote the real and the imaginary part of the CF of X_j. Also, g(Y, X) is specified in assumption (A.7), and

g₊(t₁ᵀX) = cos(t₁ᵀX) + sin(t₁ᵀX) − C_X(t₁) − S_X(t₁),
g₋(t₁ᵀX) = cos(t₁ᵀX) − sin(t₁ᵀX) − C_X(t₁) + S_X(t₁),

H_{ϑ₀,q}(t₁, t₂) = E[(∂T_ϑ(Y)/∂ϑ₁, …, ∂T_ϑ(Y)/∂ϑ_q)ᵀ
  × {(1/σ(X)) (−t₂ sin(t₂ε) g₊(t₁ᵀX) + t₂ cos(t₂ε) g₋(t₁ᵀX))
  + ((1 + ε²)/(2σ(X))) (t₂ sin(t₂ε) g₊(t₁ᵀX) − t₂ cos(t₂ε) g₋(t₁ᵀX))}].

The proof is postponed to the Appendix.

The limit distribution under the null hypothesis is a weighted L₂-type functional of a Gaussian process. Concerning the structure of Z₀(·,·), the first row in (6) corresponds to the situation when ϑ₀, ε, m(·), σ(·) are known, the second row reflects the influence of the estimator of m(·), the third one that of the estimator of σ(·), while the last row reflects the influence of the estimator of ϑ₀. To get an approximation of the critical value, one estimates the unknown quantities and simulates the limit distribution described above with unknown parameters replaced by their estimators.
However, the bootstrap described in Section 4.2 is probably more useful. Concerning the consistency of the newly proposed test, note that if H₀ is not true, there is no parameter value ϑ that leads to independence, i.e. for all ϑ, ϕ_{X,ε_ϑ} ≠ ϕ_X ϕ_{ε_ϑ}. The main assertion under alternatives reads as follows.
Theorem 2.
Assume that the estimator ϑ̂ converges in probability to some ϑ₁ ∈ Θ, and let ϑ₁ ∈ Θ satisfy

∫_{R^{p+1}} |ϕ_{X,ε_{ϑ₁}}(t₁, t₂) − ϕ_X(t₁) ϕ_{ε_{ϑ₁}}(t₂)|² W(t₁, t₂) dt₁ dt₂ > 0.   (7)

Let assumptions (A.1)–(A.4), (A.8) and (A.9) be satisfied, and let also (A.5) and (A.6) hold with s₁ = s₂ = 0 and ϑ₀ replaced by ϑ₁. Then, as n → ∞, Δ_{n,W} →P ∞.

The proof is deferred to the Appendix. Theorems 1 and 2 imply consistency of the test and also that large values of Δ_{n,W} indicate that the null hypothesis is violated.

4 Computations and resampling
4.1 Computation of the test statistic

Following Hušková et al. (2018), we impose the decomposition W(t₁, t₂) = w₁(t₁) w₂(t₂) on the weight function. If in addition the individual weight functions w_m(·), m = 1, 2, are symmetric around zero, then straightforward algebra shows that

Δ_{n,W} = (1/n) Σ_{j,k=1}^{n} I_{1,jk} I_{2,jk} + (1/n³) Σ_{j,k=1}^{n} I_{1,jk} Σ_{j,k=1}^{n} I_{2,jk} − (2/n²) Σ_{j,k,ℓ=1}^{n} I_{1,jk} I_{2,jℓ},   (8)

where I_{1,jk} := I_{w₁}(X_{jk}) and I_{2,jk} := I_{w₂}(ε̂_{jk}), with X_{jk} = X_j − X_k and ε̂_{jk} = ε̂_j − ε̂_k, j, k = 1, …, n, and

I_{w_m}(x) = ∫ cos(tᵀx) w_m(t) dt,   m = 1, 2.   (9)

The weight function w_m(·) in (9) may be chosen in a way that facilitates integration, which is extremely important in high dimension. To this end notice that if w_m(·) is replaced by a spherical density, then the right-hand side of (9) gives (by definition) the CF corresponding to w_m(·) computed at the argument x. Furthermore, recall that within the class of all spherical distributions the integral in (9) depends on x only via its usual Euclidean norm ‖x‖, and specifically I_{w_m}(x) = Ψ(‖x‖), where the functional form of the univariate function Ψ(·) depends on the underlying subfamily of spherical distributions. In this connection Ψ(·) is called the "characteristic kernel" of the particular subfamily; see Fang et al. (1990). Consequently the test statistic in (8) becomes a function of Ψ(‖x‖) alone. Subfamilies of spherical distributions with simple kernels are the class of spherical stable distributions with Ψ^(S)_γ(u) = e^{−u^γ}, 0 < γ ≤
2, and the class of generalized multivariate Laplace distributions with Ψ^(L)_γ(u) = (1 + u²)^{−γ}, γ > 0. For more information on these particular cases the reader is referred to Nolan (2013) and to Kozubowski et al. (2013), respectively. For further use we simply note that interesting special cases of spherical stable distributions are the Cauchy distribution and the normal distribution, corresponding to Ψ^(S)_γ with γ = 1 and γ =
2, respectively, while the classical multivariate Laplace distribution results from Ψ^(L)_γ for γ = 1.

4.2 Bootstrap resampling

Recall that the null hypothesis H₀ in (2) corresponds to model (1) in which both the true value of the transformation parameter ϑ as well as the error density are unknown. In this connection, and since, as was noted in Section 2, the asymptotic distribution of the test criterion under the null hypothesis depends on these quantities, among other things, we provide here a resampling scheme which can be used in order to compute critical points and actually carry out the test. The resampling scheme, which was proposed by Neumeyer et al. (2016), involves resampling from the observed X_j and independently constructing the bootstrap errors by smoothing the residuals. The bootstrap model then fulfils the null hypothesis since

(T_ϑ̂(Y*_j) − E*(T_ϑ̂(Y*_j) | X*_j)) / √(Var*(T_ϑ̂(Y*_j) | X*_j)) := ε*_j / √(1 + a_n²) ⊥* X*_j,

where E* and Var* denote the conditional expectation and variance, and ⊥* the conditional independence, given the original sample.

We now describe the resampling procedure. Let a_n be a positive smoothing parameter such that a_n → 0 and n a_n → ∞ as n → ∞. Also, denote by {ξ_j}_{j=1}^{n} a sequence of random variables which are drawn independently of any other stochastic quantity involved in the test criterion. The bootstrap procedure is as follows:

1. Draw X*_1, …, X*_n with replacement from X_1, …, X_n.

2. Generate i.i.d. random variables {ξ_j}_{j=1}^{n} with a standard normal distribution and let ε*_j = a_n ξ_j + ε̂_j, j = 1, …, n, with ε̂_j defined in (5).

3. Compute the bootstrap responses Y*_j = T⁻¹_ϑ̂(m̂_ϑ̂(X*_j) + σ̂_ϑ̂(X*_j) ε*_j), j = 1, …, n.

4. On the basis of the observations (Y*_j, X*_j), j = 1, …, n, refit the model and obtain the bootstrap residuals ε̂*_j, j = 1, …, n.

5.
Calculate the value of the test statistic, say Δ*_{n,W}, corresponding to the bootstrap sample (Y*_j, X*_j), j = 1, …, n.

6. Repeat the previous steps a number of times, say B, and obtain {Δ*^(b)_{n,W}}_{b=1}^{B}.

7. Calculate the critical point of a size-α test as the (1 − α)-level quantile c*_{1−α} of Δ*^(b)_{n,W}, b = 1, …, B.

8. Reject the null hypothesis if Δ_{n,W} > c*_{1−α}, where Δ_{n,W} is the value of the test statistic based on the original observations (Y_j, X_j), j = 1, …, n.

5 Simulations
In this section we present the results of a Monte Carlo exercise that sheds light on the small-sample properties of the new test statistic, and compare our test with the classical Kolmogorov–Smirnov (later denoted by KS) and Cramér–von Mises (CM) criteria suggested by Neumeyer et al. (2016). We considered the family of transformations

T_ϑ(Y) = {(Y + 1)^ϑ − 1}/ϑ                if Y ≥ 0, ϑ ≠ 0,
         log(Y + 1)                        if Y ≥ 0, ϑ = 0,
         −{(−Y + 1)^(2−ϑ) − 1}/(2 − ϑ)     if Y < 0, ϑ ≠ 2,
         −log(−Y + 1)                      if Y < 0, ϑ = 2,

proposed by Yeo and Johnson (2000), and randomly generated paired observations (Y_j, X_j), j = 1, …, n, from the univariate heteroskedastic model

T_ϑ(Y) = m(X) + σ(X)ε,   (10)

where ϑ = , m(x) = . + exp(x) and σ(x) = x. Here, ε is an error term which should be stochastically independent of X under the null hypothesis. The distribution of the covariate X and the distribution of the error ε (conditional on X) were chosen as one of the following:

Model A. X ∼ uniform(0, 1). Let ST(ζ, ω, η, ν) denote the univariate skew-t distribution with parameters ζ (location), ω (scale), η (shape) and ν (degrees of freedom) as defined by Azzalini (2005). Define

(ε | X = x) =d (W_{η,ν} − E(W_{η,ν})) / √Var(W_{η,ν})   if 0 ≤ x ≤ 0.5,
             Z                                          if 0.5 < x ≤ 1,

where W_{η,ν} ∼ ST(0, 1, η, ν) and Z ∼ N(0, 1), both quantities independent of X. Notice that the null hypothesis of a heteroskedastic transformation structure is violated except when η = 0 and ν → ∞, in which case W_{η,ν} → N(0, 1) so that ε and X are independent.

Model B. X ∼ uniform(0, 1). Define

(ε | X = x) =d (W_ν − ν) / √(2ν)   if 0 ≤ x ≤ 0.5,
             Z                      if 0.5 < x ≤ 1,

where W_ν ∼ χ²_ν and Z ∼ N(0, 1). Notice that ε is stochastically dependent on X except when ν → ∞, in which case the null hypothesis of a heteroskedastic transformation structure is satisfied.

Model C. X ∼ uniform(0, 1).
Let AL(ν, λ, κ) denote the univariate asymmetric Laplace distribution with parameters ν (location), λ (scale) and κ (shape) as studied by Kozubowski et al. (2013). Define

(ε | X = x) =d (W_κ − (1 − κ²)/κ) / √((1 + κ⁴)/κ²)   if 0 ≤ x ≤ 0.5,
             Z                                        if 0.5 < x ≤ 1,

where W_κ ∼ AL(0, 1, κ) and Z ∼ AL(0, 1, 1), the latter being an observation from the usual symmetric Laplace distribution. Notice that ε is stochastically dependent on X except when κ =
1, in which case the null hypothesis is satisfied.
Model D.
To investigate the behavior of the tests also in the case of a discrete covariate, we considered generating X from a discrete uniform distribution on the set { , , …, }, with ε having the same distribution as the errors of Model A given above.

For a test size of α = 0.
05, the rejection frequency of the test was recorded for sample sizes n = 100, 200, 300, where we followed Neumeyer et al. (2016) and chose a_n = . n^{− / } throughout. Since the bootstrap replications are time consuming, we have employed the warp-speed method of Giacomini et al. (2013) in order to calculate critical points of the test criterion. With this method we generate only one bootstrap resample for each Monte Carlo sample and thereby compute the bootstrap test statistic Δ*_{n,W} for that resample. Then, for a number M of Monte Carlo replications, the size-α critical point is determined similarly as in step 7 of Section 4.2, by computing the (1 − α)-level quantile of Δ*^(m)_{n,W}, m = 1, …, M. For all simulations the number of Monte Carlo replications was set to M = .

5.1 Estimation of the transformation parameter
To estimate the transformation parameter ϑ in (10) we employ the profile likelihood estimator recently studied by Neumeyer et al. (2016), which allows for the heteroskedastic error structure present in our setup. Implementation of this estimator relies on some practical considerations, which we now discuss. The estimator involves estimating m(·) and σ(·) nonparametrically, for which we used local linear regression with a Gaussian kernel and bandwidth chosen by the direct plug-in methodology proposed by Ruppert et al. (1995). The estimator also requires estimation of the density of the regression errors. For this purpose we used a Gaussian kernel with bandwidth chosen by the method of Sheather and Jones (1991). Both these methods of bandwidth selection have been implemented by Wand (2015) in the R package KernSmooth. The author warns that in some cases these procedures might be numerically unstable (see p. 10 of Wand, 2015). In these rare situations we turned to simple rule-of-thumb selection methods: in the case of nonparametric regression we used the rule of thumb of Fan and Gijbels (1996) (implemented in the package locpol by Cabrera, 2018), and in the case of density estimation a rule of Silverman (1986, equations (3.28) and (3.30) on pp. 45 and 47). Finally, to actually implement nonparametric regression and density estimation using these chosen bandwidths, we employed the package np (Hayfield and Racine, 2008), designed specifically for this purpose.

The simulation results for the considered models are shown in Tables 1 to 4. The results for the classical Kolmogorov–Smirnov and Cramér–von Mises tests are given in the columns labelled KS and CM, respectively. The percentage of rejections of our statistic Δ_{n,W} is given for three different choices of the characteristic kernel Ψ(·) discussed in Section 4.1, with various choices of a tuning parameter c >
0. Specifically, for c > 0 we use as weight functions scaled spherical stable densities that yield Ψ^(S)_γ(u) = e^{−cu^γ} (recall that γ = 1 and γ = 2 correspond to the Cauchy and the normal case, respectively), as well as scaled Laplace densities that yield Ψ^(L)(u) = (1 + u²/c)^{−γ}. From Tables 1 to 4 it is clear that the test based on Δ_{n,W} respects the nominal size well. However, for smaller sample sizes the test appears to be slightly conservative in some cases. The same can be said of the classical KS and CM tests. Notice that, under the various considered alternatives, the power of all tests increases in accordance with the nature of the dependence that is introduced between the covariates and the error terms. Moreover, in agreement with the consistency of the test established formally in Theorem 2, under alternatives the power of our test appears to increase as the sample size increases. Overall, in terms of power the new test based on Δ_{n,W} exhibits competitive performance and even outperforms the classical tests for most considered choices of the tuning parameter c.

We close by noting that the value of the tuning parameter c clearly has some effect on the power of the test based on Δ_{n,W}. There exist several interpretations regarding the value of c, and for more information on this the reader is referred to the recent review paper by Meintanis (2016). For all tests we chose values of c for which the tests exhibit good size properties as well as good power under alternatives.

Table 1. Size and power results for verifying the validity of Model A. The null hypothesis is satisfied for η = 0 and ν = ∞. The nominal size of the test is α = 0.05.

η | ν | n | KS | CM | Ψ^(S)₁(u) = exp(−cu): c₁ c₂ c₃ c₄ | Ψ^(S)₂(u) = exp(−cu²): c₁ c₂ c₃ c₄ | Ψ^(L)(u) = (1 + u²/c)^(−γ): c₁ c₂ c₃ c₄
0   | ∞   | 100 | 4.7  | 3.9  | 4.5 4.4 4.4 4.2     | 3.8 4.0 3.7 3.6     | 3.9 4.2 4.3 4.3
    |     | 200 | 4.6  | 4.2  | 4.6 4.9 4.8 4.0     | 3.9 3.7 3.8 3.7     | 3.5 4.4 4.6 4.9
    |     | 300 | 5.0  | 4.9  | 5.8 5.9 6.1 5.6     | 4.9 5.0 4.9 4.8     | 5.0 5.6 5.7 5.6
0   | 5   | 100 | 7.1  | 9.3  | 10.4 9.9 9.4 8.7    | 8.9 8.7 8.5 8.0     | 7.6 8.6 9.2 10.2
    |     | 200 | 11.4 | 15.5 | 18.3 18.7 18.6 17.3 | 16.5 17.7 17.0 15.9 | 14.4 17.5 18.1 18.4
    |     | 300 | 15.7 | 21.5 | 26.5 27.9 27.5 25.0 | 24.6 24.5 24.3 23.9 | 22.5 25.6 27.2 26.3
0   | 2.1 | 100 | 22.3 | 24.5 | 31.2 31.4 31.2 31.2 | 29.4 30.3 30.5 30.2 | 29.4 31.5 31.8 31.6
    |     | 200 | 41.0 | 44.2 | 51.5 52.9 53.4 54.2 | 52.0 53.4 54.7 54.1 | 53.6 54.2 53.6 52.3
    |     | 300 | 56.4 | 60.5 | 69.3 70.3 71.7 71.4 | 69.2 70.7 71.0 71.4 | 70.4 71.5 71.2 70.0
100 | 2.1 | 100 | 33.0 | 35.8 | 39.8 40.2 40.4 41.8 | 42.0 43.2 44.3 44.2 | 44.2 42.5 41.1 40.4
    |     | 200 | 51.3 | 52.2 | 57.5 58.1 58.2 58.9 | 59.6 60.9 61.3 61.7 | 61.0 59.9 58.9 57.5
    |     | 300 | 67.1 | 66.8 | 67.1 68.5 69.6 71.8 | 71.1 72.0 73.3 74.7 | 75.0 71.9 70.2 68.2
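The pairwise-sum representation (8) lends itself to a fully vectorized implementation. The sketch below is our own illustration, not code from the paper: the function name is hypothetical and the Gaussian characteristic kernel Ψ(u) = exp(−cu²) is chosen for concreteness.

```python
import numpy as np

def delta_nW(X, eps, psi1, psi2):
    """V-statistic form (8) of the CF criterion Delta_{n,W}.

    X    : (n, p) array of covariates
    eps  : (n,) array of estimated residuals
    psi1 : characteristic kernel applied to ||X_j - X_k||
    psi2 : characteristic kernel applied to |eps_j - eps_k|
    """
    eps = np.asarray(eps, dtype=float)
    n = eps.shape[0]
    X = np.asarray(X, dtype=float).reshape(n, -1)
    # The pairwise differences X_{jk}, eps_{jk} enter only through their norms.
    I1 = psi1(np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2))
    I2 = psi2(np.abs(eps[:, None] - eps[None, :]))
    term1 = (I1 * I2).sum() / n
    term2 = I1.sum() * I2.sum() / n**3
    # sum_{j,k,l} I1_{jk} I2_{jl} = sum over j of (row sum of I1) * (row sum of I2)
    term3 = 2.0 * (I1.sum(axis=1) * I2.sum(axis=1)).sum() / n**2
    return term1 + term2 - term3

# Gaussian characteristic kernel Psi(u) = exp(-c u^2), i.e. the stable case gamma = 2
gauss_kernel = lambda u, c=1.0: np.exp(-c * u**2)
```

Since (8) is an exact algebraic identity for n times a weighted squared distance of empirical CFs, the returned value is nonnegative up to rounding error.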
Table 2. Size and power results for verifying the validity of Model B. The null hypothesis is satisfied for ν = ∞. The nominal size of the test is α = 0.05.

ν | n | KS | CM | Ψ^(S)₁(u) = exp(−cu): c₁ c₂ c₃ c₄ | Ψ^(S)₂(u) = exp(−cu²): c₁ c₂ c₃ c₄ | Ψ^(L)(u) = (1 + u²/c)^(−γ): c₁ c₂ c₃ c₄
∞  | 100 | 5.1  | 3.8  | 5.1 4.9 4.6 4.4     | 4.4 4.5 4.4 4.1     | 4.7 5.0 5.0 4.9
   | 200 | 3.9  | 4.1  | 4.3 4.2 4.1 4.0     | 3.7 3.6 3.5 3.4     | 4.0 4.4 4.3 4.2
   | 300 | 4.7  | 4.7  | 5.9 5.5 5.3 5.1     | 4.8 4.8 4.5 4.6     | 5.3 5.5 5.7 5.8
10 | 100 | 5.9  | 7.1  | 8.9 8.4 7.9 6.8     | 7.7 7.2 6.9 6.6     | 8.1 8.5 9.1 9.5
   | 200 | 9.7  | 10.2 | 13.5 12.7 12.2 11.0 | 11.6 11.4 10.9 10.3 | 12.2 13.0 13.1 13.6
   | 300 | 10.0 | 11.0 | 14.4 13.4 12.6 11.4 | 12.9 12.9 12.4 12.0 | 13.2 14.0 14.3 14.7
5  | 100 | 9.7  | 10.4 | 13.3 12.0 11.4 11.2 | 12.2 12.2 11.9 11.4 | 11.8 12.8 13.3 14.0
   | 200 | 16.1 | 17.6 | 21.6 20.8 20.5 19.7 | 21.6 22.1 20.9 20.6 | 21.4 22.0 22.5 21.9
   | 300 | 14.7 | 18.1 | 23.7 23.0 22.5 22.1 | 23.6 23.4 23.0 22.9 | 23.1 23.6 23.9 24.1
3  | 100 | 14.4 | 17.4 | 22.7 21.5 21.1 18.7 | 21.1 21.1 20.2 19.0 | 21.5 22.2 22.3 22.5
   | 200 | 22.5 | 26.1 | 30.6 30.6 30.4 30.6 | 32.3 32.6 32.5 32.5 | 31.5 31.7 31.7 31.0
   | 300 | 31.6 | 35.4 | 36.6 36.6 37.3 39.2 | 38.8 39.5 39.6 40.4 | 38.0 37.1 36.7 36.5
2  | 100 | 21.0 | 25.2 | 30.2 29.6 28.7 28.3 | 29.6 30.3 30.0 29.7 | 29.8 29.9 29.9 29.2
   | 200 | 39.1 | 43.0 | 47.0 47.9 48.1 50.7 | 49.9 51.5 52.4 52.7 | 49.5 47.9 46.9 46.2
   | 300 | 53.0 | 52.2 | 53.8 54.9 56.3 59.4 | 57.6 59.3 60.4 61.3 | 57.4 54.6 53.7 52.6

Table 3. Size and power results for verifying the validity of Model C. The null hypothesis is satisfied for κ =
1. The nominal size of the test is α = 0.05.

κ | n | KS | CM | Ψ^(S)₁(u) = exp(−cu): c₁ c₂ c₃ c₄ | Ψ^(S)₂(u) = exp(−cu²): c₁ c₂ c₃ c₄ | Ψ^(L)(u) = (1 + u²/c)^(−γ): c₁ c₂ c₃ c₄
1 | 100 | 5.1  | 4.8  | 5.0 5.5 5.4 5.2     | 5.0 4.8 4.7 4.7     | 4.0 4.8 5.0 4.6
  | 200 | 5.9  | 6.2  | 5.2 5.3 5.5 5.0     | 5.2 4.9 4.9 4.6     | 4.3 4.7 4.7 4.9
  | 300 | 6.5  | 5.5  | 5.8 5.8 5.4 5.5     | 4.9 5.2 5.2 5.3     | 5.4 5.6 5.0 4.9
2 | 100 | 8.6  | 9.4  | 13.9 14.1 13.9 13.9 | 13.6 14.0 13.5 11.5 | 10.0 11.1 11.9 13.3
  | 200 | 20.7 | 23.8 | 30.2 30.5 30.7 31.9 | 29.9 31.4 31.1 30.8 | 24.9 30.1 30.8 32.0
  | 300 | 34.0 | 32.3 | 33.3 35.1 35.7 41.0 | 37.2 39.3 38.6 44.3 | 39.8 43.6 44.1 40.3
5 | 100 | 10.7 | 12.1 | 18.3 18.6 19.2 19.8 | 18.6 19.2 20.2 19.6 | 16.1 17.6 19.4 20.5
  | 200 | 27.8 | 33.4 | 38.7 40.3 41.7 44.2 | 42.3 43.3 44.7 45.6 | 41.7 44.5 46.5 45.9
  | 300 | 41.3 | 41.1 | 43.6 45.7 47.0 57.0 | 53.5 56.0 59.2 66.7 | 63.6 65.8 66.9 64.1
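The Yeo–Johnson family used throughout these simulations, together with its inverse (required when generating the bootstrap responses in step 3 of Section 4.2), can be implemented directly from its defining branches. A minimal sketch with hypothetical function names:

```python
import numpy as np

def yeo_johnson(y, theta):
    """Yeo-Johnson transformation T_theta(y), branch by branch."""
    y = np.asarray(y, dtype=float)
    out = np.empty_like(y)
    pos = y >= 0
    if theta != 0:
        out[pos] = ((y[pos] + 1.0)**theta - 1.0) / theta
    else:
        out[pos] = np.log1p(y[pos])
    if theta != 2:
        out[~pos] = -(((-y[~pos] + 1.0)**(2.0 - theta) - 1.0) / (2.0 - theta))
    else:
        out[~pos] = -np.log1p(-y[~pos])
    return out

def yeo_johnson_inv(z, theta):
    """Inverse transformation T_theta^{-1}(z); the transform preserves sign."""
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0
    if theta != 0:
        out[pos] = (theta * z[pos] + 1.0)**(1.0 / theta) - 1.0
    else:
        out[pos] = np.expm1(z[pos])
    if theta != 2:
        out[~pos] = 1.0 - (1.0 - (2.0 - theta) * z[~pos])**(1.0 / (2.0 - theta))
    else:
        out[~pos] = -np.expm1(-z[~pos])
    return out
```

Note that θ = 1 gives the identity and θ = 0 reduces to a shifted log, which is a quick sanity check on any implementation.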
Table 4. Size and power results for verifying the validity of Model D. The null hypothesis is satisfied for η = 0 and ν = ∞. The nominal size of the test is α = 0.05.

η | ν | n | KS | CM | Ψ^(S)₁(u) = exp(−cu): c₁ c₂ c₃ c₄ | Ψ^(S)₂(u) = exp(−cu²): c₁ c₂ c₃ c₄ | Ψ^(L)(u) = (1 + u²/c)^(−γ): c₁ c₂ c₃ c₄
0   | ∞   | 100 | 3.1  | 2.8  | 3.9 3.5 3.3 3.8     | 3.6 3.5 3.5 3.4     | 3.7 3.7 3.5 3.4
    |     | 200 | 3.7  | 3.5  | 4.4 4.6 4.3 4.0     | 4.2 4.5 4.7 4.2     | 3.9 4.1 4.4 4.5
    |     | 300 | 4.5  | 4.0  | 4.6 4.8 4.6 4.8     | 4.8 4.5 4.5 4.6     | 4.8 4.9 4.5 4.4
0   | 5   | 100 | 5.7  | 5.8  | 7.4 7.6 7.1 6.8     | 7.1 7.1 7.3 6.8     | 6.3 6.9 7.0 7.3
    |     | 200 | 7.2  | 8.6  | 9.8 10.0 10.4 10.4  | 10.8 11.1 11.0 10.6 | 10.6 10.7 11.0 10.5
    |     | 300 | 9.7  | 11.2 | 11.1 12.1 13.2 13.3 | 13.1 13.7 14.0 13.3 | 12.9 13.6 13.7 13.8
0   | 2.1 | 100 | 18.9 | 19.0 | 22.1 23.0 25.3 26.2 | 25.5 25.8 26.1 27.2 | 25.7 26.6 26.6 25.7
    |     | 200 | 35.3 | 33.3 | 39.4 42.7 46.7 48.6 | 46.6 48.8 49.9 50.9 | 48.3 49.7 49.8 47.8
    |     | 300 | 53.4 | 49.1 | 56.1 60.4 64.9 67.5 | 64.4 66.6 68.1 67.9 | 68.0 68.0 67.4 65.7
100 | 2.1 | 100 | 30.1 | 27.7 | 32.1 34.5 36.0 37.7 | 38.2 38.2 38.3 40.1 | 38.8 39.4 38.4 36.6
    |     | 200 | 53.9 | 53.1 | 54.2 57.5 61.5 63.9 | 60.3 62.3 63.7 65.8 | 66.8 65.6 63.5 61.6
    |     | 300 | 69.1 | 65.2 | 64.8 68.2 72.6 75.8 | 70.8 72.1 73.5 76.8 | 77.6 76.5 74.3 72.2

5.2 Simulation results for normality and symmetry

One of the main goals of transformation is to reduce skewness and possibly even achieve near normality. These issues have been recently investigated by Yeo and Johnson (2000), Yeo et al. (2014), Meintanis and Stupfler (2015), and Chen et al. (2002), with or without regressors, with the last reference also providing asymptotics for a test of normality of the after Box–Cox transformation errors under homoskedasticity. In this section we investigate how the CF tests for symmetry and normality designed for i.i.d. data perform within the significantly more complicated context of the semiparametric heteroskedastic transformation model (1). In this connection we note that such CF tests have already shown competitive performance in more classical regression frameworks; see Hušková and Meintanis (2010, 2012). The CF test statistics of normality and symmetry are motivated by the uniqueness of the CF of any given distribution, and by the fact that for any zero-symmetric distribution the imaginary part of its CF is identically equal to zero.
Thus we have the test statistic for normality

Δ^(G)_{n,w} = n ∫_{−∞}^{∞} |ϕ̂_ε̂(t) − e^{−t²/2}|² w(t) dt,   (11)

and the test statistic for symmetry of errors

Δ^(S)_{n,w} = n ∫_{−∞}^{∞} (Im(ϕ̂_ε̂(t)))² w(t) dt,   (12)

where w(·) is a weight function analogous to w₂(·) of assumption (A.8) and Im(z) denotes the imaginary part of a complex number z.

In our Monte Carlo simulations, the results of which are shown in Tables 5 to 11, we use three choices of the weight function w(·) and various choices of the tuning parameter c for both tests. To obtain the critical value of the normality test corresponding to the statistic in (11), we used the same bootstrap resampling scheme as given in Section 4.2, but with step 2 replaced by:

2′. Generate i.i.d. errors {ε*_j}_{j=1}^{n} from a standard normal distribution.

The critical value of the test for symmetry based on the statistic in (12) was obtained using a wild bootstrap scheme (see Neuhaus and Zhu, 2000; Delgado and González-Manteiga, 2001; Hušková and Meintanis, 2012), which is the same as that given in Section 4.2 but with step 2 replaced by:

2′′. Generate i.i.d. random variables {U_j}_{j=1}^{n} according to the law P(U_j = +1) = P(U_j = −1) = 1/2 and set ε*_j = U_j ε̃_j, j = 1, …, n, where the ε̃_j are drawn randomly with replacement from ε̂₁, …, ε̂_n.

Firstly, concerning the test for normality, we see from Tables 5 to 7 that the size of the test is approximately around the nominal size, being slightly conservative in some cases. The power of the test increases with the extent of violation of normality, i.e., as the skewness parameter η is increased or as the degrees of freedom parameter ν is decreased. Finally, the results seem to suggest consistency of the test in the sense that, for each given fixed alternative, the power increases gradually as the sample size is increased. These observations hold for error terms generated under Models A, B and D.
For Model C, we just note that an analogous CF-based test can be constructed along the lines of Meintanis (2004).

Regarding the test for symmetry, similar conclusions as above can be made (see Tables 8 to 11). Note, however, that for Model A there is a clear over-rejection of the null hypothesis of symmetry in the cases where η = 0 and ν = 2.1. The same occurs for Model C when κ =
1, although to a lesser extent. To address this issue we employed the permutation test suggested by Henze et al. (2003), developed specifically to address the issue of over-rejection. The results obtained in this way, however, agree almost exactly with the results in Table 8 obtained using the wild bootstrap approach. It should be noted that this issue of over-rejection does not occur when the true transformation parameter ϑ is assumed to be known, and only arises in the more complicated setting where ϑ needs to be estimated.

In conclusion we note that the test for normality and the test for symmetry both exhibit favourable properties even in this more complicated setting of the heteroskedastic transformation model. However, our results are just indicative of the performance of existing tests in this setting, and a more in-depth study is needed to explore the theoretical properties of these tests, which might shed more light on some of the prevailing issues mentioned above.

6 Applications to real data

For our first application of the described procedures we consider the ultrasonic calibration data given in
NIST/SEMATECH e-Handbook of Statistical Methods (the data can be downloaded from ). The response variable Y represents ultrasonic response and the predictor variable X is metal distance. We investigate the appropriateness of four alternative models: a homoskedastic model with or without transformation of the response variable and a heteroskedastic model with

[Table 5. Size and power results for assessing normality ($\Delta^{(G)}_{n,w}$) of the error terms appearing in Model A. The nominal size of the test is $\alpha = 0.05$. Results are reported for the weight functions $w(t) = \exp(-ct^2)$, $w(t) = (1 + t^2/c)^{-1}$ and $w(t) = \exp(-c|t|)$, various tuning parameters $c$, parameter settings $(\eta, \nu)$, and sample sizes $n = 100, 200, 300$.]
[Table 6. Size and power results for assessing normality ($\Delta^{(G)}_{n,w}$) of the error terms appearing in Model B. The nominal size of the test is $\alpha = 0.05$.]

[Table 7. Size and power results for assessing normality ($\Delta^{(G)}_{n,w}$) of the error terms appearing in Model D. The nominal size of the test is $\alpha = 0.05$.]
[Table 8. Size and power results for assessing symmetry ($\Delta^{(S)}_{n,w}$) of the error terms appearing in Model A. The nominal size of the test is $\alpha = 0.05$.]

[Table 9. Size and power results for assessing symmetry ($\Delta^{(S)}_{n,w}$) of the error terms appearing in Model B. The nominal size of the test is $\alpha = 0.05$.]
[Table 10. Size and power results for assessing symmetry ($\Delta^{(S)}_{n,w}$) of the error terms appearing in Model C. The nominal size of the test is $\alpha = 0.05$.]

[Table 11. Size and power results for assessing symmetry ($\Delta^{(S)}_{n,w}$) of the error terms appearing in Model D. The nominal size of the test is $\alpha = 0.05$.]

or without transformation of the response variable. For each of these models we test for validity, i.e. independence of the error term and the regressor. We employ all tests considered in this paper (and their homoskedastic counterparts introduced by Neumeyer et al., 2016, and Hušková et al., 2018), and for all tests based on the characteristic function we choose a Gaussian characteristic kernel. The choice of the tuning parameter was based on the Monte Carlo study and is shown in Table 12 along with the numerical results. For this application we used 1 000 bootstrap replications and assume a significance level of 0.05. For simplicity we used the bandwidth selectors of Fan and Gijbels (1996) for regression and of Silverman (1986) for density estimation.

For the homoscedastic case the results indicate a poor fit of the respective non-transformation models, but implementation of the Box–Cox transformation on the response clearly improves the fit according to all tests.
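The density-estimation bandwidth rule of Silverman (1986) referred to above can, in its common rule-of-thumb form, be sketched as follows (shown as an illustration; the exact variant used in the paper is not specified beyond the citation):

```python
import numpy as np

def silverman_bandwidth(x):
    # Rule-of-thumb bandwidth for a Gaussian kernel density estimate:
    # h = 0.9 * min(sample sd, IQR / 1.34) * n^(-1/5)
    x = np.asarray(x, dtype=float)
    iqr = np.subtract(*np.percentile(x, [75, 25]))
    scale = min(x.std(ddof=1), iqr / 1.34)
    return 0.9 * scale * x.size ** (-0.2)
```

The min(sd, IQR/1.34) scale estimate makes the rule robust to heavy tails and mild skewness, which matters for the error distributions considered in the simulations.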
An enhanced fit for the after-transformation model is also illustrated by the results corresponding to the heteroscedastic case, although in this case the model cannot be rejected even before transformation.

As our second application we consider the heteroscedastic location-scale model for the Canadian cross-section wage data and the Italian GDP data; these data are also discussed in Racine and Li (2017) in the context of the non-transformation model. For the Canadian wage data there are n = 205 observations with 'age' considered as predictor for 'logwage'. For the Italian GDP data there are n = … observations.

Table 12. Results for the ultrasonic calibration data, along with the p-values of the tests for model validity.

                       Homoskedastic case                  Heteroskedastic case
Test                   No transformation   Box–Cox         No transformation   Box–Cox
Parameter estimate     n/a                 ϑ̂ = 0.458       n/a                 ϑ̂ = −…
∆_{n,W} (c = 1)        0.025               0.403           0.132               0.294
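Testing model validity as above requires residuals formed from nonparametric estimates of the location and scale functions. A minimal Nadaraya–Watson sketch on hypothetical simulated data is given below; a Gaussian kernel is used for brevity, while the paper's theory assumes a compactly supported product kernel and data-driven bandwidths.

```python
import numpy as np

def nw_residuals(x, y_t, h):
    # Location-scale residuals eps_hat = (T(Y) - m_hat(X)) / sigma_hat(X),
    # where m_hat and sigma_hat are Nadaraya-Watson estimates and y_t holds
    # the (already transformed) responses. Gaussian kernel, bandwidth h.
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    w /= w.sum(axis=1, keepdims=True)          # normalized kernel weights
    m_hat = w @ y_t                            # local mean estimate
    var_hat = w @ (y_t ** 2) - m_hat ** 2      # local variance estimate
    sigma_hat = np.sqrt(np.clip(var_hat, 1e-12, None))
    return (y_t - m_hat) / sigma_hat
```

Under the null hypothesis these residuals are approximately independent of the covariates, which is what the CF-based criteria check.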
Table 13. Results for the Canadian cross-section wage data and Italian GDP data, along with the p-values of the tests for model validity.

                       Canadian wage data                  Italian GDP data
Test                   No transformation   Box–Cox         No transformation   Box–Cox
Parameter estimate     n/a                 ϑ̂ = 0.842       n/a                 ϑ̂ = −…
∆_{n,W} (c = 1)        0.021               0.020           <0.001              0.960

For the Italian GDP data, however, quite the opposite holds: the KS and CM tests indicate that neither the non-transformation nor the transformation model is appropriate, while the CF-based test shows a remarkably improved fit that clearly favours the transformation model. These results are partly in line with Racine and Li (2017), as they also find an insignificant KS statistic for the Canadian wage data but at the same time reject the location-scale presumption for the Italian data. On the other hand, our findings indicate that while performing a Box–Cox transformation on the response might still lead to the same conclusion, there exist cases where this transformation could enhance the fit of the underlying model.
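The Box–Cox parameter estimates ϑ̂ reported in Tables 12 and 13 come from the semiparametric estimators of assumption (A.7). As a rough stand-in on hypothetical synthetic data, scipy's marginal maximum-likelihood fit of the Box–Cox parameter can be used; this is a crude i.i.d. normal-likelihood fit, not the estimator employed in the paper.

```python
import numpy as np
from scipy import stats

# Hypothetical positive response whose log is roughly location-scale in x;
# stats.boxcox with lmbda=None returns the transformed data together with
# the maximum-likelihood estimate of the transformation parameter.
rng = np.random.default_rng(7)
x = rng.uniform(1.0, 5.0, size=400)
y = np.exp(0.8 * x + 0.2 * rng.standard_normal(400))
y_t, lam_hat = stats.boxcox(y)
```

In the semiparametric setting the transformation parameter would instead be profiled against the nonparametric estimates of $m$ and $\sigma$, as in the estimators cited under assumption (A.7).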
Conclusions

New tests for the validity of the heteroskedastic transformation model are proposed, based on the well-known factorization property of the joint characteristic function into its corresponding marginals. The asymptotic null distribution is derived and the consistency of the new criteria is shown. A Monte Carlo study is included by means of which a resampling version of the proposed method is compared to earlier methods; it shows that the new test, aside from being computationally convenient, compares well with and often outperforms its competitors, particularly under heavy-tailed error distributions. A further Monte Carlo study of characteristic-function based tests for symmetry and normality of regression errors exhibits analogous favourable features. Finally, a couple of illustrative applications on real data lead to interesting conclusions.
Acknowledgements
The work of the first author was partially supported by the grant GAČR 18-08888S. The work of the third author was partially supported by OP RDE project No. CZ.02.2.69/0.0/0.0/16_027/0008495, International Mobility of Researchers at Charles University.
References
Allison, J. S., Hušková, M., and Meintanis, S. G. (2018). Testing the adequacy of semiparametric transformation models. TEST, 27:70–94.

Azzalini, A. (2005). The skew-normal distribution and related multivariate families. Scand. J. Statist., 32:159–188.

Box, G. E. P. and Cox, D. R. (1964). An analysis of transformations. J. Roy. Statist. Soc. B, 26:211–252.

Breiman, L. and Friedman, J. H. (1985). Estimating optimal transformations for multiple regression and correlation. J. Amer. Statist. Assoc., 80:580–598.

Brown, L. D. and Levine, M. (2007). Variance estimation in nonparametric regression via the difference sequence method. Ann. Statist., 35:2219–2232.

Cabrera, J. L. O. (2018). locpol: Kernel Local Polynomial Regression. R package version 0.7-0.

Chen, G., Lockhart, R. A., and Stephens, M. A. (2002). Box–Cox transformations in linear models: large sample theory and tests of normality. Can. J. Statist., 30:177–209.

Chen, S., Dahl, G. B., and Khan, S. (2005). Nonparametric identification and estimation of a censored location-scale regression model. J. Amer. Statist. Assoc., 100:212–221.

Colling, B., Heuchenne, C., Samb, R., and Van Keilegom, I. (2015). Estimation of the error density in a semiparametric transformation model. Ann. Instit. Statist. Math., 67:1–18.

Colling, B. and Van Keilegom, I. (2016). Goodness-of-fit tests in semiparametric transformation models. TEST, 25:291–308.

Colling, B. and Van Keilegom, I. (2017). Goodness-of-fit tests in semiparametric transformation models using the integrated regression function. J. Multivar. Anal., 160:10–30.

Delgado, M. A. and González-Manteiga, W. (2001). Significance testing in nonparametric regression based on the bootstrap. Ann. Statist., pages 1469–1507.

Einmahl, J. H. J. and Van Keilegom, I. (2008). Specification tests in nonparametric regression. J. Econometrics, 143:88–102.

Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and Its Applications. Chapman & Hall, London.

Fang, K.-T., Kotz, S., and Ng, K. W. (1990). Symmetric Multivariate and Related Distributions. Chapman and Hall, New York.

Giacomini, R., Politis, D. N., and White, H. (2013). A warp-speed method for conducting Monte Carlo experiments involving bootstrap estimators. Economet. Theor., 29:567.

Hayfield, T. and Racine, J. S. (2008). Nonparametric econometrics: The np package. Journal of Statistical Software, 27(5).

Henze, N., Klar, B., and Meintanis, S. G. (2003). Invariant tests for symmetry about an unspecified point based on the empirical characteristic function. J. Multivar. Anal., 87:275–297.

Hlávka, Z., Hušková, M., and Meintanis, S. G. (2011). Tests for independence in non-parametric heteroscedastic regression models. J. Multivar. Anal., 102:816–827.

Horowitz, J. L. (2009). Semiparametric and Nonparametric Methods in Econometrics, volume 12. Springer, New York.

Hušková, M. and Meintanis, S. G. (2010). Tests for the error distribution in non-parametric possibly heteroscedastic regression models. TEST, 19:92–112.

Hušková, M. and Meintanis, S. G. (2012). Tests for symmetric error distribution in linear and non-parametric regression models. Commun. Statist.-Simul. Comput., 41:833–851.

Hušková, M., Meintanis, S. G., Neumeyer, N., and Pretorius, C. (2018). Independence tests in semiparametric transformation models. S. Afr. Statist. J., 52:1–13.

Kozubowski, T. J., Podgórski, K., and Rychlik, I. (2013). Multivariate generalized Laplace distribution and related random fields. J. Multivar. Anal., 113:59–72.

Linton, O., Sperlich, S., and Van Keilegom, I. (2008). Estimation of a semiparametric transformation model. Ann. Statist., pages 686–718.

Meintanis, S. G. (2004). A class of omnibus tests for the Laplace distribution based on the empirical characteristic function. Commun. Statist.-Theor. Meth., 33:925–948.

Meintanis, S. G. (2016). A review of testing procedures based on the empirical characteristic function. S. Afr. Statist. J., 50:1–14.

Meintanis, S. G. and Stupfler, G. (2015). Transformations to symmetry based on the probability weighted characteristic function. Kybernetika, 51:571–587.

Mu, Y. and He, X. (2007). Power transformation toward a linear regression quantile. J. Amer. Statist. Assoc., 102:269–279.

Neuhaus, G. and Zhu, L. (2000). Nonparametric Monte Carlo tests for multivariate distributions. Biometrika, 87:919–928.

Neumeyer, N., Noh, H., and Van Keilegom, I. (2016). Heteroscedastic semiparametric transformation models: estimation and testing for validity. Statist. Sinica, 26:925–954.

Nolan, J. P. (2013). Multivariate elliptically contoured stable distributions: theory and estimation. Computat. Statist., 28:2067–2089.

Quiroz, A. J., Nakamura, M., and Pérez, F. J. (1996). Estimation of a multivariate Box–Cox transformation to elliptical symmetry via the empirical characteristic function. Ann. Instit. Statist. Math., 48:687–709.

Racine, J. S. and Li, K. (2017). Nonparametric conditional quantile estimation: A locally weighted quantile kernel approach. Journal of Econometrics, 201:72–94.

Ruppert, D., Sheather, S. J., and Wand, M. P. (1995). An effective bandwidth selector for local least squares regression. J. Amer. Statist. Assoc., 90:1257–1270.

Sakia, R. M. (1992). The Box–Cox transformation technique: A review. J. Roy. Statist. Soc. Ser. D, 41:169–178.

Sheather, S. J. and Jones, M. C. (1991). A reliable data-based bandwidth selection method for kernel density estimation. J. Roy. Statist. Soc. Series B (Methodological), pages 683–690.

Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis, volume 26. Chapman & Hall, New York.

Staniswalis, J. G., Severini, T. A., and Moschopoulos, P. G. (1993). On a data based power transformation for reducing skewness. J. Statist. Comput. Simul., 46:91–100.

Stute, W. and Zhu, L. (2005). Nonparametric checks in single-index models. Ann. Statist., 33:1048–1083.

Wand, M. (2015). KernSmooth: Functions for Kernel Smoothing Supporting Wand & Jones (1995). R package version 2.23-15.

Yeo, I.-K. and Johnson, R. A. (2000). A new family of power transformations to improve normality or symmetry. Biometrika, 87:954–959.

Yeo, I.-K., Johnson, R. A., and Deng, X. (2014). An empirical characteristic function approach to selecting a transformation to normality. Communications for Statistical Applications and Methods, 21:213–224.
We start with the formulation of the assumptions; then we give the assertions on the behavior of the test statistics under both the null hypothesis and some alternatives.

(A.1) $(Y_j, X_j)$, $j = 1, \dots, n$, are i.i.d. random vectors, where the covariates $X_j$, $j = 1, \dots, n$, have a compact support $R_X \subset \mathbb{R}^p$.

(A.2) We use a product kernel $K(y) = \prod_{s=1}^{p} k(y_s)$, $y = (y_1, \dots, y_p)^\top$, with $k(\cdot)$ symmetric and continuous on $[-1, 1]$, and satisfying
\[
\int_{-1}^{1} u^r k(u)\, du = \delta_{r,0}, \quad r = 0, \dots, p, \qquad \int_{-1}^{1} u^{p+1} k(u)\, du \neq 0,
\]
where $\delta_{r,s}$ stands for Kronecker's delta.

(A.3) The bandwidth $h = h_n$ satisfies $(nh^p)^{-1} + nh^{p+\delta} \to 0$ as $n \to \infty$, for some $\delta > 0$.

(A.4) $E\|X_j\| < \infty$, and $X_j$ has a density $f(\cdot)$ satisfying
\[
0 < \inf_{x \in R_X} f(x) \le \sup_{x \in R_X} f(x) < \infty, \qquad \big| f(x) - L(f, y, x - y, s) \big| \le \|x - y\|^{s + \delta}\, d(y),
\]
where $L(f, y, x - y, s)$ is the Taylor expansion of the density $f$ of order $s$ at $x$, $\delta > 0$, and $E|d(X_j)| < \infty$, for some $s + 1 \ge p/2$.

(A.5) The regression and scale functions $m_{\vartheta_0}(x)$, $\sigma_{\vartheta_0}(x)$, $x \in R_X$, satisfy
\[
\big| m_{\vartheta_0}(x) - L_{\vartheta_0}(m, y, x - y, s) \big| \le \|x - y\|^{s + \delta}\, d(y), \qquad \big| \sigma_{\vartheta_0}(x) - L_{\vartheta_0}(\sigma, y, x - y, s) \big| \le \|x - y\|^{s + \delta}\, d(y),
\]
where $L_{\vartheta_0}(m, y, x - y, s)$ is the Taylor expansion of the regression function $m_{\vartheta_0}$ of order $s$ at $y$, $E|d(X_j)| < \infty$ for some $s \ge p/2$, and $E\, m^2_{\vartheta_0}(X_j) < \infty$. Similarly, $L_{\vartheta_0}(\sigma, y, x - y, s)$ is the Taylor expansion of the function $\sigma_{\vartheta_0}$ of order $s$ at $y$, $E|d(X_j)| < \infty$ for some $s \ge p/2$, and $E\, \sigma^2_{\vartheta_0}(X_j) < \infty$.

(A.6) $\mathcal{L} = \{ T_\vartheta;\ \vartheta \in \Theta \}$ is a parametric class of strictly increasing transformations, $\Theta$ is an open measurable subset of $\mathbb{R}^q$, and for some $\xi > 0$,
\[
\sup_{\|\vartheta - \vartheta_0\| \le \xi} \Big| T_\vartheta(Y_j) - T_{\vartheta_0}(Y_j) - \sum_{s=1}^{q} (\vartheta_s - \vartheta_{0s}) \frac{\partial T_\vartheta(Y_j)}{\partial \vartheta_s}\Big|_{\vartheta = \vartheta_0} \Big| \Big/ \|\vartheta - \vartheta_0\|^{1 + \delta} \le d(Y_j),
\]
where $E\, d(Y_j) < \infty$, $E\big( |d(Y_j)| \,\big|\, X_j \big) < \infty$ a.s., and, for some $\delta > 0$,
\[
E\Big( \frac{\partial T_\vartheta(Y_j)}{\partial \vartheta_s}\Big|_{\vartheta = \vartheta_0} \,\Big|\, X_j \Big) = \frac{\partial E\big( T_\vartheta(Y_j) \,\big|\, X_j \big)}{\partial \vartheta_s}\Big|_{\vartheta = \vartheta_0} \ \ \text{a.s.}, \qquad E\Big( E\Big( \frac{\partial T_\vartheta(Y_j)}{\partial \vartheta_s}\Big|_{\vartheta = \vartheta_0} \,\Big|\, X_j \Big)^2 \Big) < \infty, \qquad E\, T^2_{\vartheta_0}(Y_j) < \infty.
\]

(A.7) The estimator $\hat\vartheta$ of $\vartheta_0$ (both $q$-dimensional) satisfies
\[
\sqrt{n}\, \big( \hat\vartheta - \vartheta_0 \big) = \frac{1}{\sqrt{n}} \sum_{j=1}^{n} g(Y_j, X_j) + o_P(1),
\]
where $g(Y_j, X_j)$ has zero mean and a finite covariance matrix.

(A.8) The weight function is such that $W(t_1, t_2) = w_1(t_1)\, w_2(t_2)$, where the $w_m(\cdot)$ satisfy
\[
w_1(t) = w_1(-t), \ t \in \mathbb{R}, \qquad \int_{-\infty}^{\infty} t^2 w_1(t)\, dt < \infty, \qquad w_2(t) = w_2(-t), \ t \in \mathbb{R}^p.
\]

(A.9) $\mathcal{L} = \{ T_\vartheta;\ \vartheta \in \Theta \}$ is a parametric class of strictly increasing transformations, $\Theta$ is an open measurable subset of $\mathbb{R}^q$, and for all $\vartheta \in \Theta$,
\[
\big| T_\vartheta(Y_j) - T_{\vartheta_0}(Y_j) \big| \le d(Y_j)\, \|\vartheta - \vartheta_0\|, \quad \text{with } E|d(Y_j)| < \infty.
\]

Comments on the assumptions:
• Assumptions (A.2) and (A.3) are quite standard.
• Assumption (A.4) requires smoothness of the density $f(\cdot)$ of $X$.
• Assumption (A.5) formulates the requirements on the regression function $m_\vartheta(x) = E\big( T_\vartheta(Y_j) \mid X_j = x \big)$. The motivation for assumptions (A.4) and (A.5) comes from Delgado and González-Manteiga (2001).
• Assumption (A.7) requires that a $\sqrt{n}$-consistent estimator of $\vartheta$ admitting an asymptotic representation is available. Such estimators are proposed and studied in, e.g., Breiman and Friedman (1985), Horowitz (2009) and Linton et al. (2008). They are based either on a modified least squares method, on profile likelihood estimators, or on the mean square distance from independence.
• Assumptions (A.9) and (A.10) are needed for the considered class of alternatives.

Proofs

The proofs are quite technical and therefore we present only the main steps. Additionally, the main line of the proofs follows that in Hlávka et al.
(2011); however, some modifications and extensions are needed. Standard techniques from nonparametric regression are applied together with functional central limit theorems. When no confusion can arise we use the short notations $\varepsilon_j = \varepsilon_{\vartheta_0, j}$, $\hat\varepsilon_j = \hat\varepsilon_{\hat\vartheta, j}$, $\hat m_n(x) = \hat m_{\hat\vartheta}(x)$, $\hat\sigma_n(x) = \hat\sigma_{\hat\vartheta}(x)$. For simplicity we give the proofs only for univariate $\vartheta$; the multivariate situation proceeds quite analogously.

Proof of Theorem 1.
By assumption (A.7) and elementary properties of the sine and cosine functions,
\[
\Delta_{n,W} = \int\!\!\int \big| J_{n,1}(t_1, t_2) + J_{n,2}(t_1, t_2) \big|^2\, W(t_1, t_2)\, dt_1\, dt_2, \qquad (13)
\]
where
\[
J_{n,1}(t_1, t_2) = \frac{1}{\sqrt{n}} \sum_{j=1}^{n} \Big[ \big( \cos(t_1 \hat\varepsilon_j) - E\cos(t_1 \varepsilon_j) \big)\, g_+(t_2^\top X_j) + \big( \sin(t_1 \hat\varepsilon_j) - E\sin(t_1 \varepsilon_j) \big)\, g_-(t_2^\top X_j) \Big],
\]
\[
J_{n,2}(t_1, t_2) = \frac{1}{n^{3/2}} \sum_{j=1}^{n} \sum_{v=1}^{n} \Big[ \big( \cos(t_1 \hat\varepsilon_v) - E\cos(t_1 \varepsilon_v) \big)\, g_+(t_2^\top X_j) + \big( \sin(t_1 \hat\varepsilon_v) - E\sin(t_1 \varepsilon_v) \big)\, g_-(t_2^\top X_j) \Big],
\]
with
\[
g_+(t_2^\top X) = \cos(t_2^\top X) + \sin(t_2^\top X) - E\big( \cos(t_2^\top X) + \sin(t_2^\top X) \big),
\]
\[
g_-(t_2^\top X) = \cos(t_2^\top X) - \sin(t_2^\top X) - E\big( \cos(t_2^\top X) - \sin(t_2^\top X) \big).
\]
A useful asymptotic representation for $J_{n,1}(t_1, t_2)$ is provided by Lemma 1, while the negligibility of $J_{n,2}(t_1, t_2)$ is proved quite analogously and therefore its proof is omitted. □

Lemma 1.
Let the assumptions of Theorem 1 be satisfied. Then, as $n \to \infty$,
\[
\int\!\!\int \Big| J_{n,1}(t_1, t_2) - Q_{\varepsilon,X,c}(t_1, t_2) - Q_{\varepsilon,X,s}(t_1, t_2) - L_{\varepsilon,X}(t_1, t_2) \Big|^2\, W(t_1, t_2)\, dt_1\, dt_2 \to_P 0,
\]
where
\[
Q_{\varepsilon,X,c}(t_1, t_2) = \frac{1}{\sqrt{n}} \sum_{j=1}^{n} \big\{ \cos(t_1 \varepsilon_j) - E\cos(t_1 \varepsilon_j) + t_1 \varepsilon_j S_\varepsilon(t_1) + t_1 (\varepsilon_j^2 - 1)\, C'_\varepsilon(t_1)/2 \big\}\, g_+(t_2^\top X_j),
\]
\[
Q_{\varepsilon,X,s}(t_1, t_2) = \frac{1}{\sqrt{n}} \sum_{j=1}^{n} \big\{ \sin(t_1 \varepsilon_j) - E\sin(t_1 \varepsilon_j) - t_1 \varepsilon_j S_\varepsilon(t_1) - t_1 (\varepsilon_j^2 - 1)\, C'_\varepsilon(t_1)/2 \big\}\, g_-(t_2^\top X_j),
\]
\[
L_{\varepsilon,X}(t_1, t_2) = \sqrt{n}\, \big( \hat\vartheta - \vartheta_0 \big)\, H_{\vartheta_0, 1}(t_1, t_2),
\]
where $C_\varepsilon$ and $S_\varepsilon$ are the real and imaginary parts of the CF of $\varepsilon_j$, $C'_\varepsilon$ and $S'_\varepsilon$ are the respective derivatives, and $H_{\vartheta_0, 1}(t_1, t_2)$ is defined in Theorem 1 with $q = 1$.

Proof. Recall that the residuals $\hat\varepsilon_j$ can be expressed as
\[
\hat\varepsilon_j = \varepsilon_j + \varepsilon_j \Big\{ \frac{\sigma_{\vartheta_0}(X_j)}{\hat\sigma_n(X_j)} - 1 \Big\} + \frac{m_{\vartheta_0}(X_j) - \hat m_n(X_j)}{\hat\sigma_n(X_j)} + \frac{T_{\hat\vartheta}(Y_j) - T_{\vartheta_0}(Y_j)}{\hat\sigma_n(X_j)}, \quad j = 1, \dots, n, \qquad (14)
\]
and then by Taylor expansion and smoothness of $T_\vartheta(Y_j)$ w.r.t. $\vartheta$ we have
\[
\cos(t_1 \hat\varepsilon_j) = \cos(t_1 \varepsilon_j) - t_1 \sin(t_1 \varepsilon_j) \Big[ \varepsilon_j \Big\{ \frac{\sigma_{\vartheta_0}(X_j)}{\hat\sigma_n(X_j)} - 1 \Big\} + \frac{m_{\vartheta_0}(X_j) - \hat m_n(X_j)}{\hat\sigma_n(X_j)} + \frac{T_{\hat\vartheta}(Y_j) - T_{\vartheta_0}(Y_j)}{\hat\sigma_n(X_j)} \Big] + t_1^2 R^c_{nj}(t_1), \qquad (15)
\]
$j = 1, \dots, n$, where the $R^c_{nj}(t_1)$ are remainders. Similar relations can be obtained for $\sin(t_1 \hat\varepsilon_j)$. Comparing the present situation with that considered in Hlávka et al.
(2011), there is an additional parameter $\vartheta$ which influences the behavior of the treated variables. We notice that
\[
\hat m_{\hat\vartheta}(X_j) - m_{\vartheta_0}(X_j) = \frac{1}{nh^p \hat f(X_j)} \sum_{v=1}^{n} K\Big( \frac{X_v - X_j}{h} \Big) \big( T_{\hat\vartheta}(Y_v) - T_{\vartheta_0}(Y_v) \big) + \frac{1}{nh^p \hat f(X_j)} \sum_{v=1}^{n} K\Big( \frac{X_v - X_j}{h} \Big) \big( \varepsilon_v \sigma_{\vartheta_0}(X_v) + m_{\vartheta_0}(X_v) - m_{\vartheta_0}(X_j) \big) = A_{1,n}(X_j, \vartheta) + A_{2,n}(X_j, \vartheta), \ \text{say},
\]
and
\[
\hat\sigma^2_{\hat\vartheta}(X_j) = \frac{1}{nh^p \hat f(X_j)} \sum_{v=1}^{n} K\Big( \frac{X_v - X_j}{h} \Big) \Big( T_{\hat\vartheta}(Y_v) - T_{\vartheta_0}(Y_v) + \varepsilon_v \sigma_{\vartheta_0}(X_j) + m_{\vartheta_0}(X_v) - m_{\vartheta_0}(X_j) \Big)^2 = B_{1,n}(X_j, \vartheta) + B_{2,n}(X_j, \vartheta) + B_{3,n}(X_j, \vartheta) + B_{4,n}(X_j, \vartheta), \ \text{say},
\]
where $B_{1,n}$ collects the squared terms $\big( T_{\hat\vartheta}(Y_v) - T_{\vartheta_0}(Y_v) \big)^2$, $B_{2,n}$ the squared terms $\big( \varepsilon_v \sigma_{\vartheta_0}(X_j) + m_{\vartheta_0}(X_v) - m_{\vartheta_0}(X_j) \big)^2$, and $B_{3,n}$, $B_{4,n}$ the cross terms involving $\big( T_{\hat\vartheta}(Y_v) - T_{\vartheta_0}(Y_v) \big)\, \varepsilon_v \sigma_{\vartheta_0}(X_v)$ and $\big( T_{\hat\vartheta}(Y_v) - T_{\vartheta_0}(Y_v) \big)\big( m_{\vartheta_0}(X_v) - m_{\vartheta_0}(X_j) \big)$, respectively.

The terms $A_{1,n}(X_j, \vartheta)$, $A_{2,n}(X_j, \vartheta)$, $B_{2,n}(X_j, \vartheta)$, $B_{3,n}(X_j, \vartheta)$ are influential while the others are properly negligible. The desired properties of $A_{2,n}(X_j, \vartheta)$ and $B_{2,n}(X_j, \vartheta)$ follow from results in Hlávka et al. (2011) and are formulated below; the remaining terms will be discussed shortly.

Proceeding as in Hlávka et al. (2011) we find that
\[
\int\!\!\int \Big( Q_{\varepsilon,X,c}(t_1, t_2) - \frac{1}{\sqrt{n}} \sum_{j=1}^{n} g_+(t_2 X_j) \Big\{ \cos(t_1 \varepsilon_j) - E\cos(t_1 \varepsilon_j) - t_1 \sin(t_1 \varepsilon_j) \Big[ -\frac{A_{2,n}(X_j, \vartheta)}{\sigma_{\vartheta_0}(X_j)} - \varepsilon_j \Big( \frac{B_{2,n}(X_j, \vartheta)}{\sigma^2_{\vartheta_0}(X_j)} - 1 \Big) \Big] \Big\} \Big)^2 W(t_1, t_2)\, dt_1\, dt_2 \to_P 0.
\]
We start with
\[
\frac{1}{\sqrt{n}} \sum_{j=1}^{n} g_+(t_2 X_j) \big( -t_1 \sin(t_1 \varepsilon_j) \big) \Big( -A_{1,n}(X_j, \vartheta) - \varepsilon_j B_{3,n}(X_j, \vartheta) + \frac{T_{\hat\vartheta}(Y_j) - T_{\vartheta_0}(Y_j)}{\sigma_{\vartheta_0}(X_j)} \Big)
\]
\[
= \frac{1}{\sqrt{n}} \sum_{j=1}^{n} \frac{g_+(t_2 X_j)\big( -t_1 \sin(t_1 \varepsilon_j) \big)}{\sigma_{\vartheta_0}(X_j)} \Big[ T_{\hat\vartheta}(Y_j) - T_{\vartheta_0}(Y_j) - \frac{1}{nh^p \hat f(X_j)} \sum_{v=1}^{n} K\Big( \frac{X_v - X_j}{h} \Big) \big( T_{\hat\vartheta}(Y_v) - T_{\vartheta_0}(Y_v) \big)\big( 1 + \varepsilon_v \varepsilon_j \big) \Big]
\]
\[
= \frac{1}{\sqrt{n}} \sum_{j=1}^{n} \sum_{v=1}^{n} K\Big( \frac{X_v - X_j}{h} \Big) \big( T_{\hat\vartheta}(Y_j) - T_{\vartheta_0}(Y_j) \big) \Big[ \frac{g_+(t_2 X_j)\big( -t_1 \sin(t_1 \varepsilon_j) \big)}{nh^p \hat f(X_j)\, \sigma_{\vartheta_0}(X_j)} - \frac{g_+(t_2 X_v)\big( -t_1 \sin(t_1 \varepsilon_v) \big)\big( 1 + \varepsilon_j \varepsilon_v \big)}{nh^p \hat f(X_v)\, \sigma_{\vartheta_0}(X_v)} \Big].
\]
Due to assumption (A.7), this behaves asymptotically as
\[
\sqrt{n}\, \big( \hat\vartheta - \vartheta_0 \big)\, \frac{1}{n} \sum_{j=1}^{n} \sum_{v=1}^{n} K\Big( \frac{X_v - X_j}{h} \Big) \frac{\partial T_\vartheta(Y_j)}{\partial \vartheta}\Big|_{\vartheta = \vartheta_0} \Big[ \frac{g_+(t_2 X_j)\big( -t_1 \sin(t_1 \varepsilon_j) \big)}{nh^p \hat f(X_j)\, \sigma_{\vartheta_0}(X_j)} - \frac{g_+(t_2 X_v)\big( -t_1 \sin(t_1 \varepsilon_v) \big)\big( 1 + \varepsilon_j \varepsilon_v \big)}{nh^p \hat f(X_v)\, \sigma_{\vartheta_0}(X_v)} \Big].
\]
Since by the assumptions $\sqrt{n}(\hat\vartheta - \vartheta_0) = O_P(1)$, it suffices to study
\[
C_{1,n}(t_1, t_2) = \frac{1}{n} \sum_{j=1}^{n} \frac{\partial T_\vartheta(Y_j)}{\partial \vartheta}\Big|_{\vartheta = \vartheta_0} \frac{1}{\sigma_{\vartheta_0}(X_j)}\, g_+(t_2 X_j)\big( -t_1 \sin(t_1 \varepsilon_j) \big),
\]
\[
C_{2,n}(t_1, t_2) = -\frac{1}{n} \sum_{j=1}^{n} \sum_{v=1}^{n} K\Big( \frac{X_v - X_j}{h} \Big) \frac{\partial T_\vartheta(Y_j)}{\partial \vartheta}\Big|_{\vartheta = \vartheta_0} \frac{g_+(t_2 X_v)\big( -t_1 \sin(t_1 \varepsilon_v) \big)\big( 1 + \varepsilon_j \varepsilon_v \big)}{nh^p \hat f(X_v)\, \sigma_{\vartheta_0}(X_v)}.
\]
By the law of large numbers (uniformly in $t_1, t_2$), as $n \to \infty$,
\[
C_{1,n}(t_1, t_2) \to_P E\Big( \frac{\partial T_\vartheta(Y_1)}{\partial \vartheta}\Big|_{\vartheta = \vartheta_0} \frac{1}{\sigma_{\vartheta_0}(X_1)}\, g_+(t_2 X_1)\big( -t_1 \sin(t_1 \varepsilon_1) \big) \Big),
\]
\[
C_{2,n}(t_1, t_2) \to_P -E\Big( \frac{\partial T_\vartheta(Y_1)}{\partial \vartheta}\Big|_{\vartheta = \vartheta_0} \frac{1}{\sigma_{\vartheta_0}(X_1)} \big( 1 + \varepsilon_1 \varepsilon_2 \big) \Big( -t_1 \sin(t_1 \varepsilon_1)\, g_+(t_2^\top X_1) \Big) \Big).
\]
Similarly we proceed with
\[
\frac{1}{\sqrt{n}} \sum_{j=1}^{n} g_-(t_2 X_j)\, t_1 \cos(t_1 \varepsilon_j) \Big( -A_{1,n}(X_j, \vartheta) - \varepsilon_j B_{3,n}(X_j, \vartheta) + \frac{T_{\hat\vartheta}(Y_j) - T_{\vartheta_0}(Y_j)}{\sigma_{\vartheta_0}(X_j)} \Big),
\]
and the proof of Lemma 1 is finished. □

Proof of Theorem 2.
We proceed as in the proofs of Theorem 1 and Lemma 1 and arrive at the conclusion that
\[
\frac{1}{n}\, \Delta_{n,W} \to_P \int_{\mathbb{R}} \int_{\mathbb{R}^p} \big| \varphi_{X, \varepsilon_\vartheta}(t_1, t_2) - \varphi_X(t_2)\, \varphi_{\varepsilon_\vartheta}(t_1) \big|^2\, W(t_1, t_2)\, dt_1\, dt_2 > 0. \qquad \square
\]