Change point analysis in non-stationary processes - a mass excess approach
Submitted to the Annals of Statistics
DETECTING RELEVANT CHANGES IN THE MEAN OF NON-STATIONARY PROCESSES - A MASS EXCESS APPROACH
By Holger Dette (Ruhr-Universität Bochum) and Weichi Wu (Tsinghua University)

This paper considers the problem of testing if a sequence of means (µ_t)_{t=1,...,n} of a non-stationary time series (X_t)_{t=1,...,n} is stable in the sense that the difference of the means µ_1 and µ_t between the initial time t = 1 and any other time is smaller than a given threshold, that is |µ_1 − µ_t| ≤ c for all t = 1, ..., n. A test for hypotheses of this type is developed using a bias corrected monotone rearranged local linear estimator, and asymptotic normality of the corresponding test statistic is established. As the asymptotic variance depends on the location of the roots of the equation |µ_1 − µ_t| = c, a new bootstrap procedure is proposed to obtain critical values, and its consistency is established. As a consequence we are able to quantitatively describe relevant deviations of a non-stationary sequence from its initial value. The results are illustrated by means of a simulation study and by analyzing data examples.
1. Introduction.
A frequent problem in time series analysis is the detection of structural breaks. Since the pioneering work of Page (1954) in quality control, change point detection has become an important tool with numerous applications in economics, climatology, engineering and hydrology, and many authors have developed statistical tests for the problem of detecting structural breaks or change points in various models. Exemplarily we mention Chow (1960), Brown, Durbin and Evans (1975), Krämer, Ploberger and Alt (1988), Andrews (1993), Bai and Perron (1998) and Aue et al. (2009), and refer to the work of Aue and Horváth (2013) and Jandhyala et al. (2013) for more recent reviews. Most of the literature on testing for structural breaks formulates the hypotheses such that in the statistical model the stochastic process under the null hypothesis of "no change point" is stationary. For example, in the problem of testing if a sequence of means (µ_t)_{t=1,...,n} of a non-stationary time series (X_t)_{t=1,...,n} is stable, it is often assumed that X_t = µ_t + ε_t with a stationary error process (ε_t)_{t=1,...,n}.
Keywords and phrases: locally stationary process, change point analysis, relevant change points, local linear estimation, Gaussian approximation, rearrangement estimators

The null hypothesis is then given by

H_0 : µ_1 = µ_2 = · · · = µ_n,    (1.1)

while the alternative (in the simplest case of only one structural break) is defined as

H_1 : µ^(1) = µ_1 = µ_2 = · · · = µ_k ≠ µ_{k+1} = µ_{k+2} = · · · = µ_n = µ^(2),    (1.2)

where k ∈ {1, ..., n − 1} denotes the (unknown) location of the change. The formulation of the null hypothesis in the form (1.1) facilitates the analysis of the distributional properties of a corresponding test statistic substantially, because one can work under the assumption of stationarity. Consequently, it is a very useful assumption from a theoretical point of view. On the other hand, if the differences {|µ_1 − µ_t|}_{t=2,...,n} are rather "small", a modification of the statistical analysis might not be necessary, although the test rejects the "classical" null hypothesis (1.1) and detects non-stationarity. For example, as pointed out by Dette and Wied (2016), in risk management one wants to fit a model for forecasting the Value at Risk from "uncontaminated data", that is, from data after the last change point. If the changes are small, they might not yield large changes in the Value at Risk. Using only the uncontaminated data might then decrease the bias but increase the variance of a prediction. Thus, if the changes are small, the forecasting quality might not necessarily decrease and - in the best case - would only improve slightly.
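For orientation, the classical null hypothesis (1.1) is typically tested with a CUSUM-type statistic built from standardized partial sums, which diverges under a single break of the form (1.2). This is not the approach developed in the present paper; the following minimal sketch (where the data, the break size and the choice σ = 1 are purely illustrative) is only meant to fix ideas:

```python
# Classical CUSUM sketch for testing (1.1) against (1.2): compute
# max_k |S_k - (k/n) S_n| / (sigma * sqrt(n)) for partial sums S_k.
# Everything here (data, sigma) is illustrative, not from the paper.
import math

def cusum_statistic(x, sigma=1.0):
    """Maximum of the standardized partial-sum process."""
    n = len(x)
    total = sum(x)
    stat, s = 0.0, 0.0
    for k, xk in enumerate(x, start=1):
        s += xk
        stat = max(stat, abs(s - k * total / n))
    return stat / (sigma * math.sqrt(n))

# One break of size 2 at k = 50 (n = 100, no noise): the statistic is
# |S_50 - 0.5 * S_100| / sqrt(100) = 50 / 10 = 5, far from zero.
stat = cusum_statistic([0.0] * 50 + [2.0] * 50)
```

For i.i.d. errors this statistic converges under (1.1) to the supremum of a Brownian bridge in absolute value, which yields asymptotic critical values.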
Moreover, any benefit with respect to statistical accuracy could be negatively overcompensated by additional transaction costs. In order to address these issues, Dette and Wied (2016) proposed to investigate precise hypotheses in the context of change point analysis, where one does not test for exact equality, but only looks for "similarity" or a "relevant" difference. This concept is well known in biostatistics [see, for example, Wellek (2010)] but has also been used to investigate the similarity of distribution functions [see Álvarez Esteban et al. (2008, 2012) among others]. In the context of detecting a change in a sequence of means (or other parameters of the marginal distribution), Dette and Wied (2016) assumed two stationary phases and tested if the difference before and after the change point is small, that is

H_0 : |µ^(1) − µ^(2)| ≤ c versus H_1 : |µ^(1) − µ^(2)| > c,    (1.3)

where c > 0 is a prespecified constant (which could, for example, be determined by the transaction costs). Their approach heavily relies on the fact that the process before and after the change point is stationary, but this assumption might also be questionable in many applications. A similar idea can be used to specify the economic design of control charts for quality control purposes. While in change point analysis the focus is on testing for the presence of a change and on estimating the time at which a change occurs once it has been detected, control charting has typically been focused more on detecting such a change as quickly as possible after it occurs [see for example Champ and Woodall (1987), Woodall and Montgomery (1999) among many others].
In particular, control charts are related to sequential change point detection, while the focus of the cited literature is on retrospective change point detection. In the present paper we investigate alternative relevant hypotheses in the retrospective change point problem, which are motivated by the observation that in many applications the assumption of two stationary phases (such as constant means before and after the change point) cannot be justified, as the process parameters change continuously in time. For this purpose we consider the location scale model

X_{i,n} = µ(i/n) + ε_{i,n},    (1.4)

where {ε_{i,n} : i = 1, ..., n}_{n∈N} denotes a triangular array of centered random variables (note that we do not assume that the "rows" {ε_{j,n} : j = 1, ..., n} are stationary) and µ : [0, 1] → R is the unknown mean function. We define a change as relevant if the amount of the change and the time period where the change occurs are reasonably large. More precisely, for a level c > 0 we consider the level set

M_c = {t ∈ [0,
1] : |µ(t) − µ(0)| > c}    (1.5)

of all points t ∈ [0, 1] where the mean function differs from its initial value by more than c. The situation is illustrated in Figure 1, where the curve represents the mean function µ with µ(0) = 0 and the lines in boldface represent the set M_c (with c = 1). These periods resemble in some sense popular run rules from the statistical quality control literature, which signal if k of the last m standardized sample means fall in a given interval [see for example Champ and Woodall (1987)]. Define

T_c := λ(M_c)    (1.6)

as the corresponding excess measure, where λ denotes the Lebesgue measure. We now propose to investigate the hypothesis that the relative time where

Fig 1:
Illustration of the set M_c in (1.5).

this difference is larger than c does not exceed a given constant, say ∆ ∈ (0, 1):

H_0 : T_c ≤ ∆ versus H_1 : T_c > ∆.    (1.7)

We consider the change as relevant if the Lebesgue measure T_c = λ(M_c) is larger than the threshold ∆. Note that this includes the case when a change (greater than c) occurs at some point t < 1 − ∆ and the mean level remains constant otherwise. In many applications it might also be of interest to investigate one-sided hypotheses, because one wants to detect a change in a certain direction. For this purpose we also consider the sets

M^±_c = {t ∈ [0,
1] : ±(µ(t) − µ(0)) > c}

and define the hypotheses

H^+_0 : T^+_c = λ(M^+_c) ≤ ∆ versus H^+_1 : T^+_c > ∆,    (1.8)
H^−_0 : T^−_c = λ(M^−_c) ≤ ∆ versus H^−_1 : T^−_c > ∆.    (1.9)

The hypotheses (1.7), (1.8) and (1.9) require the specification of two parameters ∆ and c, and in a concrete application both parameters have to be defined after a careful discussion with the practitioners. In particular, they will be different in different fields of application. Another possibility is to investigate a relative deviation from the mean, that is: µ(t) deviates from µ(0) relative to µ(0) by at most x% (see Section 2.2.2 for a discussion of this measure). Although the mean function in model (1.4) cannot be assumed to be monotone, we use a monotone rearrangement type estimator [see Dette, Neumeyer and Pilz (2006)] to estimate the quantities T_c, T^+_c, T^−_c, and propose to reject the null hypothesis (1.7), (1.8) or (1.9) for large values of the corresponding test statistic. We study the properties of these estimators and the resulting tests in a model of the form (1.4) with a locally stationary error process; such processes have found considerable interest in the literature [see Dahlhaus et al. (1997), Nason, von Sachs and Kroisandt (2000), Ombao, von Sachs and Guo (2005), Zhou and Wu (2009) and Vogt (2012) among others]. In particular, we do not assume that the underlying process is stationary, as the mean function can vary smoothly in time and the error process is non-stationary. Moreover, we also allow that the derivative of the mean function µ may vanish on the set of critical roots C = {t ∈ [0,
1] : |µ(t) − µ(0)| = c}

and prove that appropriately standardized versions of the monotone rearrangement estimators are consistent for T_c, T^+_c and T^−_c, and asymptotically normally distributed. The main challenge in this asymptotic analysis is to quantify the order of an approximation of the quantity

(1.10)    λ({t ∈ [0,
1] : |µ̂(t) − µ̂(0)| > c}),

where µ̂ is an appropriate estimate of the regression function. While estimates of the mean trend have already been studied under local stationarity in the literature [see, for example, Wu and Zhao (2007)], the analysis of the quantity (1.10) and its approximation requires a careful localization of the effect of the estimation error around the critical roots satisfying the equation |µ(t) − µ(0)| = c. It is demonstrated - even in the case of independent or stationary errors - that the variance of the limit distribution depends sensitively on (eventually higher order) derivatives of the regression function at the critical roots, which are very difficult to estimate. Moreover, because of the non-stationarity of the error process in (1.4), the asymptotic variance depends also in a complicated way on the unknown dependence structure. We propose a bootstrap method to obtain critical values for the test, which is motivated by a Gaussian approximation used in the proof of the asymptotic normality. This re-sampling procedure is adaptive in the sense that it avoids the direct estimation of the critical roots and the values of the derivatives of the regression function at these points. Note that T_c is the excess Lebesgue measure (or mass) of the time when the absolute difference between the mean trend and its initial value exceeds the level c. Thus our approach is naturally related to the concept of excess mass, which has found considerable attention in the literature. Many authors used the excess mass approach to investigate multimodality of a density [see, for example, Müller and Sawitzki (1991), Polonik (1995),
Cheng and Hall (1998), Polonik and Wang (2006)]. The asymptotic properties of distances between an estimated level set and the "true" level set of a density have also been studied in several publications [see Baillo (2003), Cadre (2006), Cuevas, González-Manteiga and Rodríguez-Casal (2006) and Mason and Polonik (2009) among many others]. The concept of mass excess has additionally been used for discrimination between time series [see Chandler and Polonik (2006)], for the construction of monotone regression estimates [Dette, Neumeyer and Pilz (2006), Chernozhukov, Fernández-Val and Galichon (2010)], quantile regression [Dette and Volgushev (2008), Chernozhukov, Fernández-Val and Galichon (2009)], clustering [Rinaldo and Wasserman (2010)] and for bandwidth selection in density estimation [see Samworth and Wand (2010)], but to the best of our knowledge it has not been used for change point analysis. Most of the literature discusses regular points, that is, points where the first derivative of the density or regression function does not vanish, but there exist also references where this condition is relaxed. For example, Hartigan and Hartigan (1985) proposed a test for multimodality of a density comparing the difference between the empirical distribution function and a class of unimodal distribution functions. They observed that the stochastic order of the test statistic depends on the minimal number k such that the k-th derivative of the cumulative distribution function does not vanish. Polonik (1995) studied the asymptotic properties of an estimate of the mass excess functional of a cumulative distribution function F with density f, and Tsybakov (1997) observed that the minimax risk in the problem of estimating the level set of a density depends on its "regularity". More recently, Chandler and Polonik (2006) used the excess mass functional for discrimination analysis under the additional assumption of unimodality. The present paper differs from this literature with respect to several perspectives.
First, we are interested in change point analysis and develop a test for a relevant difference in the mean of the process over a certain range of time. Therefore - in contrast to most of the literature, which deals with i.i.d. data - we consider the regression model (1.4) with a non-stationary error process. Second, we are interested in an estimate, say T̂_{N,c}, of the Lebesgue measure T_c of the level set M_c and its asymptotic properties in order to construct a test for the change point problem (1.7). Therefore - in contrast to many references - we do not discuss estimates of an excess mass functional or a distance between an estimated level set and the "true" level set, but investigate the asymptotic distribution of T̂_{N,c}. Third, as this distribution depends sensitively on the critical points and the dependence structure of the non-stationary error process, we use a Gaussian approximation to develop a
bootstrap method, which allows us to find quantiles without estimating the location of the critical points and the derivatives of the regression function at these points. We also mention the differences to the work of Mercurio and Spokoiny (2004) and Spokoiny (2009), which has its focus on the detection of intervals of homogeneity of the underlying process, while the present paper investigates the problem of detecting significant deviations of an inhomogeneous process from its initial distribution (here specified by different values of the mean function). The approach proposed in this paper is also related to the sojourn time of a (real valued) stochastic process, say {X(t)}_{t∈[0,1]}, which is defined as

S_c = ∫_0^1 1{|X(t) − X(0)| > c} dt    (1.11)

and has widely been studied in probability theory under specific distributional assumptions [see, for example, Berman (1992); Takács (1996) among many others]. To be precise, let X(t) = µ(t) + ε(t) for some centered process {ε(t)}_{t∈[0,1]}; then, compared to the quantity T_c defined in (1.6), which refers to the expectation µ(t), the quantity S_c is a random variable. An alternative excess-type measure is now given by the expected sojourn time

e_c := E(S_c),    (1.12)

and the corresponding null hypotheses can be formulated as

H_0 : e_c ≤ ∆ versus H_1 : e_c > ∆.

A further quantity of interest was mentioned to us by a referee and is defined by the probability that the sojourn time exceeds the threshold ∆, that is

p_{c,∆} := P(S_c > ∆).    (1.13)

This quantity cannot be directly used for testing, but can be considered as a measure of a relevant deviation for a sufficiently long time from the initial state X(0). The rest of the paper is organized as follows. In Section 2 we motivate our approach, define an estimator of the quantity T_c, discuss alternative measures and give some basic assumptions for the non-stationary model (1.4).
Section 3 is devoted to a discussion of the asymptotic properties of this estimator in the case where all critical points are regular points, that is µ^(1)(s) ≠ 0 for all s ∈ C. We focus on this case first, because here the arguments are more transparent. In particular, in this case all roots are of the same order and contribute to the asymptotic variance of the limit distribution, which simplifies the statement of the results substantially. In this case we also identify a bias problem, which makes the implementation of the test at this stage difficult. The general case is carefully investigated in Section 4, where we also address the bias problem using a Jackknife approach. The bootstrap procedure is developed in the second part of Section 4. In Sections 5 and 6 we illustrate its finite sample properties by means of a simulation study and by analyzing data examples. Moreover, some discussion of multivariate data is given in Section 10, where we also propose estimators of the quantities (1.12) and (1.13). Finally, most of the technical details are deferred to Section 8 and an online supplement (which also contains some further auxiliary results).
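The sojourn time quantities (1.11)-(1.13) can be approximated by simulation. The following minimal sketch evaluates S_c on a grid and estimates e_c and p_{c,∆} by Monte Carlo; the Gaussian noise, the grid size and all parameter values are illustrative stand-ins, not part of the paper's methodology:

```python
# Sketch: Riemann/Monte Carlo approximation of the sojourn time S_c in
# (1.11), its expectation e_c in (1.12) and the exceedance probability
# p_{c,Delta} in (1.13), for a process X(t) = mu(t) + eps(t) with
# (illustrative) i.i.d. Gaussian noise on a grid.
import random

def sojourn_time(path, c, grid_size):
    """Approximate S_c = integral of 1{|X(t) - X(0)| > c} dt."""
    x0 = path[0]
    return sum(abs(x - x0) > c for x in path) / grid_size

def monte_carlo(mu, c, delta, sigma=0.1, grid_size=200, reps=2000, seed=0):
    """Return Monte Carlo estimates of (e_c, p_{c,Delta})."""
    rng = random.Random(seed)
    times = []
    for _ in range(reps):
        path = [mu(i / grid_size) + rng.gauss(0.0, sigma)
                for i in range(grid_size + 1)]
        times.append(sojourn_time(path, c, grid_size))
    e_c = sum(times) / reps
    p_c_delta = sum(s > delta for s in times) / reps
    return e_c, p_c_delta
```

With the noise level σ set to zero, S_c degenerates to the excess measure of the mean function itself, which links (1.11) back to (1.6).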
2. Estimation and basic assumptions.
2.1. Relevant changes via a mass excess approach.
Recall the definition of the testing problems (1.7), (1.8), (1.9) and note that T_c = T^+_c + T^−_c, where

T^+_c = ∫_0^1 1(µ(t) − µ(0) > c) dt,   T^−_c = ∫_0^1 1(µ(t) − µ(0) < −c) dt,    (2.1)

and 1(B) denotes the indicator function of the set B. In most parts of the paper we mainly concentrate on the estimation of the quantity T^+_c and study the asymptotic properties of an appropriately standardized estimate [see for example Theorems 3.1 and 4.1]. Corresponding results for the estimators of T^−_c and T_c can be obtained by similar methods, and the joint weak convergence is established in Theorem 3.2 and Theorem 4.2 without giving detailed proofs. We propose to estimate the mean function by a local linear estimator

(µ̂_{b_n}(t), \hat{μ̇}_{b_n}(t))^T = argmin_{β_0, β_1 ∈ R} Σ_{i=1}^n (X_i − β_0 − β_1(i/n − t))² K((i/n − t)/b_n),  t ∈ [0, 1],    (2.2)

where K(·) denotes a continuous and symmetric kernel supported on the interval [−1, 1], and to estimate T^+_c by

T̂^+_{N,c} = (1/N) Σ_{i=1}^N ∫_c^∞ (1/h_d) K_d((µ̂_{b_n}(i/N) − µ̂_{b_n}(0) − u)/h_d) du,    (2.3)

where K_d(·) is a symmetric kernel function supported on the interval [−1, 1] with ∫_{−1}^1 K_d(x) dx = 1. In (2.3) the quantity h_d > 0 is a further bandwidth and N is the number of knots in a Riemann approximation (see the discussion in the following paragraph), which does not need to coincide with the sample size n.
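To see what (2.3) computes, note that by the substitution x = (µ̂_{b_n}(i/N) − µ̂_{b_n}(0) − u)/h_d the inner integral equals the kernel distribution function ∫_{−∞}^z K_d(x) dx evaluated at z = (µ̂_{b_n}(i/N) − µ̂_{b_n}(0) − c)/h_d. A minimal numerical sketch of the population version, with the true mean function in place of the local linear estimate (the Epanechnikov kernel for K_d and all parameter values are illustrative choices):

```python
# Sketch: population analogue of the smoothed excess estimate (2.3),
# with the true mean in place of the local linear estimate. The inner
# integral is the kernel cdf at (mu(i/N) - mu(0) - c)/h_d; K_d is the
# Epanechnikov kernel 0.75*(1 - x^2) on [-1, 1] (illustrative choice).
def kernel_cdf(z):
    """Integral of K_d over (-infinity, z]."""
    if z <= -1.0:
        return 0.0
    if z >= 1.0:
        return 1.0
    return 0.75 * (z - z ** 3 / 3.0) + 0.5

def smoothed_excess(mu, c, h_d, N):
    """Riemann sum over i/N of the smoothed indicator of mu - mu(0) > c."""
    mu0 = mu(0.0)
    return sum(kernel_cdf((mu(i / N) - mu0 - c) / h_d)
               for i in range(1, N + 1)) / N

# For mu(t) = t and c = 0.25 the target excess measure is 0.75; the sum
# approaches it as h_d -> 0 and N -> infinity.
approx = smoothed_excess(lambda t: t, c=0.25, h_d=0.01, N=10_000)
```

The smoothing bandwidth h_d controls how sharply the kernel cdf approximates the indicator of the level set, which is exactly the role it plays in the asymptotic analysis below.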
It turns out that the procedures proposed in this paper are not sensitive with respect to the choice of h_d and N, provided that these parameters have been chosen sufficiently small and large, respectively (see Section 5 for a further discussion). A statistic of the type (2.3) has been proposed by Dette, Neumeyer and Pilz (2006) to estimate the inverse of a strictly increasing regression function, but we use it here without assuming monotonicity of the mean function µ. Observing that µ̂_{b_n}(t) is a consistent estimate of µ(t), we argue (rigorous arguments are given later) that

T̂^+_{N,c} = (1/N) Σ_{i=1}^N ∫_c^∞ (1/h_d) K_d((µ(i/N) − µ(0) − u)/h_d) du + o_P(1)
= (1/h_d) ∫_0^1 ∫_c^∞ K_d((µ(x) − µ(0) − u)/h_d) du dx + o_P(1) = T^+_c + o_P(1)    (2.4)

as n, N → ∞, h_d →
0. In Figure 2 we display the functions

p_{h_d} : t → (1/h_d) ∫_c^∞ K_d((µ(t) − µ(0) − u)/h_d) du  and  q : t → 1(µ(t) − µ(0) ≥ c)    (2.5)

and visualize that p_{h_d} is a smooth approximation of the indicator function for decreasing h_d (for the function considered in Figure 1). This smoothing is introduced to derive the asymptotic properties of the statistic T̂^+_{N,c} and to construct a valid bootstrap procedure without estimating the critical roots and derivatives of the regression function. Thus intuitively (rigorous arguments will be given in the following sections) the statistic T̂^+_{N,c} is a consistent estimator of T^+_c, and a similar argument for T^−_c will provide a consistent estimator of the quantity T_c defined in (1.6). The null hypothesis is finally rejected for large values of this estimate. In order to make these heuristic arguments more rigorous, we make the following basic assumptions for the model (1.4).

Assumption 2.1.
(a) The mean function is twice differentiable with Lipschitz continuous second derivative.

Fig 2:
Smooth approximation p_{h_d} of the step function q = 1_{M^+_c} for different choices of the bandwidth h_d.

(b) There exists a positive constant ε_0, such that for all δ ∈ [0, ε_0] there are k_δ closed disjoint intervals I_{1,δ}, ..., I_{k_δ,δ}, such that

{t ∈ [0,
1] : |µ(t) − µ(0) − c| ≤ δ} ∪ {t ∈ [0,
1] : |µ(t) − µ(0) + c| ≤ δ} = ∪_{i=1}^{k_δ} I_{i,δ},

where the number of intervals k_δ satisfies sup_{0 ≤ δ ≤ ε_0} k_δ ≤ M for some universal constant M. In particular, there exists only a finite number of roots of the equation µ(t) − µ(0) = ±c. We also assume that |µ(1) − µ(0)| ≠ c.

It is worthwhile to mention that all results presented in the paper remain true if the regression function is Lipschitz continuous on the interval [0,
1] and the assumptions regarding its differentiability (such as Assumption 2.1) hold in a neighborhood of the critical roots. Our first result makes the approximation of T^+_c by its deterministic counterpart

T^+_{N,c} := (1/N) Σ_{i=1}^N ∫_c^∞ (1/h_d) K_d((µ(i/N) − µ(0) − u)/h_d) du    (2.6)

in (2.4) rigorous. For this purpose let

m_{γ,δ}(µ) = λ({t ∈ [0,
1] : |µ(t) − γ| ≤ δ})    (2.7)

denote the Lebesgue measure of the set of points where the mean function lies in a δ-neighbourhood of the point γ.

Proposition 2.1. If Assumption 2.1 holds and m_{c+µ(0),δ}(µ) = O(δ^ι) for some ι > 0 as δ → 0, we have for the quantity T^+_{N,c} in (2.6)

T^+_{N,c} − T^+_c = O(max{h_d^ι, N^{−1}}) as N → ∞, h_d → 0.

Proof. By elementary calculations it follows that

∫_c^∞ (1/h_d) K_d((µ(i/N) − µ(0) − u)/h_d) du − 1(µ(i/N) − µ(0) > c)
= 1({|c − (µ(i/N) − µ(0))| ≤ h_d}) ∫_{(c − µ(i/N) + µ(0))/h_d}^∞ K_d(x) dx − 1({µ(i/N) − µ(0) − h_d < c ≤ µ(i/N) − µ(0)}).

Therefore, we obtain (observing that ∫_{−1}^1 K_d(x) dx = 1)

|T^+_{N,c} − T^+_c| = |(1/N) Σ_{i=1}^N ( ∫_c^∞ (1/h_d) K_d((µ(i/N) − µ(0) − u)/h_d) du − 1(µ(i/N) − µ(0) > c) )| + O(N^{−1})
≤ (1/N) Σ_{i=1}^N 1(|µ(i/N) − µ(0) − c| ≤ h_d) + O(N^{−1})
= 2 m_{c+µ(0),h_d}(µ) + O(N^{−1}) = O(max{h_d^ι, N^{−1}})

as N → ∞, h_d → 0. □

2.2. Alternative measures of mass excess.
In this section we briefly mention several alternative measures of mass excess, which might be of interest in applications and for which similar results as stated in this paper can be derived. For the sake of brevity we do not state these results in full detail and only describe the measures with corresponding estimates.

2.2.1.
Deviations from an average trend.
In applications one might also be interested if there exist relevant deviations of the sequence (µ(i/n))_{i=⌊nt_0⌋+1,...,n} from an average trend formed from the previous period (µ(i/n))_{i=1,...,⌊nt_0⌋}. This question can be addressed by estimating the quantity

∫_{t_0}^1 1( µ(t) − (1/t_0) ∫_0^{t_0} µ(s) ds > c ) dt = λ({t ∈ [t_0,
1] : µ(t) − (1/t_0) ∫_0^{t_0} µ(s) ds > c}).

Using similar arguments as given in this paper (and the supplementary material), one can prove consistency and derive the asymptotic distribution of the estimate

(1/N) Σ_{i=⌊Nt_0⌋}^N ∫_c^∞ (1/h_d) K_d((µ̂_{b_n}(i/N) − (1/t_0) ∫_0^{t_0} µ̂_{b_n}(s) ds − u)/h_d) du,

where µ̂_{b_n} is the local linear estimator of µ (in Section 4 we will use a bias corrected version of µ̂_{b_n}).

2.2.2. Relative deviations. If µ(0) ≠ 0, an alternative measure of excess can be defined by

∫_0^1 1( |(µ(t) − µ(0))/µ(0)| > c ) dt = λ({t ∈ [0,
1] : |(µ(t) − µ(0))/µ(0)| > c}).    (2.8)

This measure of excess allows one to define a relevant change in the mean relative to its initial value and makes the choice of the constant c easier in applications. For example, if one chooses c = 0.
1, one is interested in relevant deviations from the initial value by more than 10%. The quantity in equation (2.8) can be estimated in a similar way as described in the previous paragraph, and the details are omitted for the sake of brevity.

2.3.
Locally stationary processes.
In Sections 3 and 4 we will establish the asymptotic properties of the statistic T̂^+_{N,c} as an estimator of T^+_c and derive a bootstrap approximation to obtain critical values. Since we are interested in a procedure for non-stationary processes, we require several technical assumptions on the error process in model (1.4). The less experienced reader can easily skip this paragraph and consider an independent identically distributed array of centered random variables ε_{i,n} in model (1.4) with variance σ². The main challenge in the proofs is neither the dependence structure nor the non-stationarity of the error process but consists in the fact that (2.3) defines a complicated map from the class of estimators to the Lebesgue measure of random sets of the form {t : |µ̂_{b_n}(t) − µ̂_{b_n}(0)| > c}. Thus, although a standardized version of the local linear estimator µ̂_{b_n} is asymptotically normally distributed (under suitable conditions), a rigorous analysis of this mapping is required to derive the distributional properties of the statistic T̂^+_{N,c}. These depend sensitively on the local behaviour of the function µ at points satisfying the equation |µ(t) − µ(0)| = c, and the corresponding analysis represents the most important part of the work, which is independent of the error structure in model (1.4). To be precise, let ||X||_q = (E|X|^q)^{1/q} denote the L_q-norm of the random variable X (q ≥ 1).

Definition 2.1. Let η = (η_i)_{i∈Z} be a sequence of independent identically distributed random variables, F_i = {η_s : s ≤ i}, denote by η' = (η'_i)_{i∈Z} an independent copy of η and define F^*_i = (..., η_{−2}, η_{−1}, η'_0, η_1, ..., η_i). For t ∈ [0, 1] let G : [0, 1] × R^∞ → R denote a nonlinear filter, that is a measurable function such that G(t, F_i) is a properly defined random variable for all t ∈ [0, 1].
(1) A sequence (ε_{i,n})_{i=1,...,n} is called a locally stationary process if there exists a filter G such that ε_{i,n} = G(i/n, F_i) for all i = 1, ..., n.
(2) For a nonlinear filter G with sup_{t∈[0,1]} ||G(t, F_i)||_q < ∞, the physical dependence measure of G with respect to ||·||_q is defined by

δ_q(G, k) = sup_{t∈[0,1]} ||G(t, F_k) − G(t, F^*_k)||_q.    (2.9)

(3) The filter G is called Lipschitz continuous with respect to ||·||_q if and only if

sup_{0 ≤ s < t ≤ 1} ||G(t, F_0) − G(s, F_0)||_q / (t − s) < ∞.

Assumption 2.2.
(a) The error process {ε_{i,n}}_{i=1,...,n} in model (1.4) is locally stationary with geometrically decaying dependence measure.
(b) The filter G is Lipschitz continuous with respect to ||·||_q.
(c) The long-run variance function σ(·) is Lipschitz continuous on [0, 1] and non-degenerate, that is inf_{t∈[0,1]} σ(t) > 0.

The theoretical results of the paper can also be derived under the assumption of a polynomially decaying dependence measure, with substantially more complicated bandwidth conditions and proofs. Conditions (b) and (c) are standard in the literature on locally stationary time series. They are used later for a Gaussian approximation of the locally stationary time series; see for example Zhou and Wu (2010).

3. Twice continuously differentiable mean functions. In this section we briefly consider the situation where the derivatives of the mean function at the critical set C do not vanish. These assumptions are quite common in the literature [see, for example, condition (B.ii) in Mason and Polonik (2009) or assumption (A1) in Samworth and Wand (2010)]. We discuss this case separately for (at least) two reasons. First, the results and required assumptions are slightly simpler here. Second, and more important, we use this case to demonstrate that the estimates of T_c, T^+_c and T^−_c have a bias which is asymptotically not negligible and makes their direct application for testing the hypotheses (1.7), (1.8) and (1.9) difficult. The general case is postponed to Section 4, where we solve the bias problem and also introduce a bootstrap procedure.
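A minimal simulated example of an error process satisfying the locally stationary framework of Section 2.3 is a time-varying AR(1) filter; the coefficient function a(t) below is a hypothetical choice, and since sup_t |a(t)| < 1 the corresponding physical dependence measure decays geometrically:

```python
# Sketch: a locally stationary error process eps_{i,n} = G(i/n, F_i)
# generated by a time-varying AR(1) recursion with coefficient
# a(t) = 0.3 + 0.2 t (a hypothetical choice; sup_t |a(t)| < 1 yields a
# geometrically decaying physical dependence measure).
import random

def locally_stationary_ar1(n, a=lambda t: 0.3 + 0.2 * t, seed=1):
    """Simulate eps_{1,n}, ..., eps_{n,n} with i.i.d. Gaussian innovations."""
    rng = random.Random(seed)
    eps, e = [], 0.0
    for i in range(1, n + 1):
        e = a(i / n) * e + rng.gauss(0.0, 1.0)
        eps.append(e)
    return eps

errors = locally_stationary_ar1(500)
```

Setting a ≡ 0 recovers the i.i.d. special case mentioned for the less experienced reader in Section 2.3.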
We do not provide proofs of the results in this section, as they can be obtained by similar (but substantially simpler) arguments as given in the proofs of Theorems 4.1 and 4.2 below. Recall the definition of the statistic T̂^+_{N,c} in (2.3), where µ̂_{b_n}(t) is the local linear estimate of the mean function with bandwidth b_n. Our first result specifies its asymptotic distribution, and for its statement we make the following additional assumption on the bandwidths.

Assumption 3.1. The bandwidth b_n of the local linear estimator satisfies b_n → 0, n b_n → ∞, b_n/h_d → ∞, √(n b_n)/log n → ∞, and π^*_n/h_d → 0, where

π^*_n := (b_n² + (n b_n)^{−1/2} log n) log n.

Theorem 3.1. Suppose that Assumptions 2.1, 2.2 and 3.1 hold, that there exist roots t^+_1, ..., t^+_{k_+} of the equation µ(t) − µ(0) = c satisfying μ̇(t^+_j) ≠ 0 for 1 ≤ j ≤ k_+, and define

R̄_{1,n} = n^{1/4} log n / (n b_n),   R̄_{2,n} = (1/(N b_n) + 1/(N h_d)) (b_n ∧ h_d)^{−1},   χ̄_n = (b_n² + 1/(n b_n)) h_d^{−1}.

If N b_n → ∞, N h_d → ∞ and √(n b_n) (χ̄_n + R̄_{1,n} + R̄_{2,n}) = o(1), then

√(n b_n) ( T̂^+_{N,c} − T^+_c − µ_{2,K} b_n² Σ_{j=1}^{k_+} μ̈(t^+_j)/|μ̇(t^+_j)| + b_n² c_{1,K} μ̈(0)/(2 c_{2,K}) Σ_{j=1}^{k_+} 1/|μ̇(t^+_j)| ) ⇒ N(0, τ²_{1,+} + τ²_{2,+}),

where

τ²_{1,+} = Σ_{s=1}^{k_+} σ²(t^+_s)/μ̇²(t^+_s) ∫ K²(x) dx,
τ²_{2,+} = σ²(0) c^{−2}_{2,K} ( Σ_{j=1}^{k_+} 1/|μ̇(t^+_j)| )² ∫ (µ_{2,K} − t µ_{1,K})² K²(t) dt,

the constants c_{1,K} and c_{2,K} are given by

c_{1,K} = µ_{1,K} µ_{2,K} − µ_{3,K},   c_{2,K} = µ_{2,K} − µ²_{1,K},

and µ_{l,K} = ∫ x^l K(x) dx (l = 1, 2, ...).

Theorem 3.1 establishes asymptotic normality under the scenario that μ̇(t) ≠ 0 for all points t ∈ C^+ = {t ∈ [0, 1] : µ(t) − µ(0) = c}. This condition guarantees that the mean function µ is strictly monotone in a neighbourhood of the roots.
Moreover, Assumptions 2.1(b), 2.2 and 3.1 imply the asymptotic independence of the estimators of µ(0) and µ(t) for any t ∈ C^+. We conclude this section presenting a corresponding weak convergence result for the joint distribution of (T̂^+_{N,c}, T̂^−_{N,c}), where

T̂^−_{N,c} = (1/N) Σ_{i=1}^N ∫_{−∞}^{−c} (1/h_d) K_d((µ̂_{b_n}(i/N) − µ̂_{b_n}(0) − u)/h_d) du    (3.1)

denotes an estimate of the quantity T^−_c defined in (1.9).

Theorem 3.2. Suppose that Assumptions 2.1, 2.2 and 3.1 are satisfied and that the bandwidth conditions of Theorem 3.1 hold. If there also exist roots t^−_1, ..., t^−_{k_−} of the equation µ(t) − µ(0) = −c, such that μ̇(t^−_j) ≠ 0 (j = 1, ..., k_−), then, as n → ∞,

√(n b_n) ( T̂^+_{N,c} − T^+_c − β^+_c , T̂^−_{N,c} − T^−_c − β^−_c )^T ⇒ N(0, Σ̃),    (3.2)

where

β^±_c = µ_{2,K} b_n² Σ_{j=1}^{k_±} μ̈(t^±_j)/|μ̇(t^±_j)| − b_n² c_{1,K} μ̈(0)/(2 c_{2,K}) Σ_{j=1}^{k_±} 1/|μ̇(t^±_j)|,    (3.3)

and the elements of the matrix Σ̃ = (Σ̃_{ij})_{i,j=1,2} are given by Σ̃_{11} = τ²_{1,+} + τ²_{2,+}, Σ̃_{22} = τ²_{1,−} + τ²_{2,−} and

Σ̃_{12} = Σ̃_{21} = −c^{−2}_{2,K} σ²(0) ( Σ_{j=1}^{k_+} 1/|μ̇(t^+_j)| )( Σ_{j=1}^{k_−} 1/|μ̇(t^−_j)| ) ∫ (µ_{2,K} − t µ_{1,K})² K²(t) dt,

where τ²_{1,−} and τ²_{2,−} are defined in a similar way as τ²_{1,+} and τ²_{2,+} in Theorem 3.1.

Remark 3.1. The representation of the bias in (3.3) has some similarity with the approximation of the risk of an estimate of the highest density region investigated in Samworth and Wand (2010). We suppose that similar arguments as given in the proofs of our main results can be used to derive asymptotic normality of this estimate [see also Mason and Polonik (2009)].

Remark 3.2. The most general assumptions under which the results of our paper hold are the following.
(a) The mean trend is a piece-wise Lipschitz continuous function with a bounded number of jump points.
If $D^+(t_0)$ and $D^-(t_0)$ denote the limits of the function $|\mu(\cdot) - \mu(0)|$ from the left and right at a jump point $t_0$, then $(D^+(t_0) - c)(D^-(t_0) - c) > 0$. In other words: at any jump, the function $|\mu(\cdot) - \mu(0)|$ does not "cross" the level $c$.
(b) There is a finite number of critical roots, and the mean trend function has a Lipschitz continuous second derivative in a neighborhood of each critical root.

In particular we exclude the case where jumps occur at critical roots, but there might be jumps at other points in the interval $[0,1]$. In this case the local linear estimate $\hat\mu_{b_n}$ has to be modified to account for these jumps [see Qiu (2003) or Gijbels, Lambert and Qiu (2007), among others]. For the sake of a transparent representation and for the sake of brevity we state our results under Assumptions 2.1 and 2.2.

Theorems 3.1 and 3.2 can be used to construct tests for the hypotheses (1.8) and (1.9). Similarly, by the continuous mapping theorem we also obtain from Theorem 3.2 the asymptotic distribution of the statistic $\hat T_{N,c} = \hat T^+_{N,c} + \hat T^-_{N,c}$, which could be used to construct a test for the hypotheses (1.7). However, such tests would either require undersmoothing or estimation of the bias terms $\beta^+_c$ and $\beta^-_c$ in (3.3), which is not an easy task. We address this problem by a Jackknife method in the following section, where we also develop a bootstrap test to avoid the estimation of the critical roots.

4. Bias correction and bootstrap. In this section we address the bias problem mentioned in the previous section, adopting the Jackknife bias reduction technique proposed by Schucany and Sommers (1977). In a second step we use these results to construct a bootstrap procedure. Moreover, we also relax the main assumption of Section 3 that the derivative of the mean function does not vanish at critical roots $t \in \mathcal C$.

4.1. Bias correction.
Recalling the definition of the local linear estimator $\hat\mu_{b_n}(t)$ in (2.2) with bandwidth $b_n$, we define the Jackknife estimator by
\[
\tilde\mu_{b_n}(t) = 2\,\hat\mu_{b_n/\sqrt2}(t) - \hat\mu_{b_n}(t) \qquad (4.1)
\]
for $0 \le t \le 1$. It has been shown in Wu and Zhao (2007) that the bias of the estimator (4.1) is of order $o\big(b_n^2 + \frac{1}{nb_n}\big)$ whenever $b_n \le t \le 1 - b_n$, and Zhou and Wu (2010) showed that the estimate $\tilde\mu_{b_n}$ is asymptotically equivalent to a local linear estimate with kernel
\[
K^*(x) = 2\sqrt2\,K(\sqrt2\,x) - K(x). \qquad (4.2)
\]
In order to use these bias corrected estimators for the construction of tests for the hypotheses defined in (1.7)-(1.9), we also need to study the estimate $\tilde\mu_{b_n}(0)$, which is not asymptotically equivalent to a local linear estimate with kernel $K^*(x)$. However, as a consequence of Lemma 10.2 in the online supplement we obtain the stochastic expansion
\[
\Big|\tilde\mu_{b_n}(0) - \mu(0) - \frac{1}{nb_n}\sum_{i=1}^n\bar K^*\Big(\frac{i}{nb_n}\Big)\epsilon_{i,n}\Big| = O\Big(b_n^2 + \frac{1}{nb_n}\Big), \qquad (4.3)
\]
where the kernel $\bar K^*(x)$ is given by
\[
\bar K^*(x) = 2\sqrt2\,\bar K(\sqrt2\,x) - \bar K(x) \qquad (4.4)
\]
with $\bar K(x) = (\mu_{2,K} - x\mu_{1,K})K(x)/c_{1,K}$. Since the kernel $\bar K^*(x)$ is not symmetric, the bias of $\tilde\mu_{b_n}(0)$ is of the order $O\big(b_n^2 + \frac{1}{nb_n}\big)$. The corresponding estimators of the quantities $T^+_c$ and $T^-_c$ are then defined as in Section 2, where the local linear estimator $\hat\mu_{b_n}$ is replaced by its bias corrected version $\tilde\mu_{b_n}$. For example, the analogue of the statistic in (2.3) is given by
\[
\tilde T^+_{N,c} = \frac{1}{N}\sum_{i=1}^N\int_c^\infty\frac{1}{h_d}\,K_d\Big(\frac{\tilde\mu_{b_n}(i/N) - \tilde\mu_{b_n}(0) - u}{h_d}\Big)\,du. \qquad (4.5)
\]
The investigation of the asymptotic properties of these estimators in the general case requires some preparations, which are discussed next. We call a point $t \in [0,1]$ a regular point of the mean function $\mu$ if the derivative $\mu^{(1)}$ does not vanish at $t$.
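The Jackknife combination (4.1) is simply the difference of two local linear fits with bandwidths $b_n/\sqrt2$ and $b_n$. The following minimal sketch (equispaced design, Epanechnikov kernel; function names are our own, not from the paper) writes the local linear estimator via its equivalent weights; like any local linear fit, it reproduces linear functions exactly:

```python
import numpy as np

def local_linear(x, y, t_eval, b):
    """Local linear fit with Epanechnikov kernel, via the equivalent
    weights l_i = w_i * (s2 - (x_i - t) * s1)."""
    K = lambda u: 0.75 * (1 - u**2) * (np.abs(u) <= 1)
    out = np.empty(len(t_eval))
    for k, t in enumerate(t_eval):
        w = K((x - t) / b)
        s1 = np.sum(w * (x - t))
        s2 = np.sum(w * (x - t)**2)
        l = w * (s2 - (x - t) * s1)
        out[k] = np.sum(l * y) / np.sum(l)
    return out

def jackknife_estimate(x, y, t_eval, b):
    """Bias corrected estimator (4.1): 2 * muhat_{b/sqrt(2)} - muhat_b."""
    return (2.0 * local_linear(x, y, t_eval, b / np.sqrt(2))
            - local_linear(x, y, t_eval, b))
```

Since both fits reproduce linear functions exactly, so does the combination $2\hat\mu_{b/\sqrt2} - \hat\mu_b$; for smooth nonlinear trends the combination cancels the leading quadratic bias term in the interior.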
A point $t \in \mathcal C$ is called a critical point of $\mu$ of order $k \ge 1$ if the first $k$ derivatives of $\mu$ at $t$ vanish while the $(k+1)$st derivative of $\mu$ at $t$ is non-zero, that is, $\mu^{(s)}(t) = 0$ for $1 \le s \le k$ and $\mu^{(k+1)}(t) \ne 0$. Regular points are critical points of order 0. Theorems 3.1 and 3.2 are not valid if any of the roots of the equations $\mu(t) - \mu(0) = c$ or $\mu(t) - \mu(0) = -c$ is a critical point of order larger than or equal to 1. The following result provides the asymptotic distribution in this case and also solves the bias problem mentioned in Section 3. For its statement we make the following additional assumptions.

Assumption 4.1. The mean function $\mu$ is three times continuously differentiable. Let $t^+_1,\ldots,t^+_{k^+}$ and $t^-_1,\ldots,t^-_{k^-}$ denote the roots of the equations $\mu(t) - \mu(0) = c$ and $\mu(t) - \mu(0) = -c$, respectively. For each $t^-_s$ ($s = 1,\ldots,k^-$) and each $t^+_s$ ($s = 1,\ldots,k^+$) there exists a neighbourhood of $t^-_s$ and of $t^+_s$ such that $\mu$ is $(v^-_s+1)$ and $(v^+_s+1)$ times differentiable in these neighbourhoods, with corresponding critical orders $v^-_s$ and $v^+_s$, respectively ($1 \le s \le k^-$, $1 \le s \le k^+$). We also assume that the $(v^-_s+1)$st and $(v^+_s+1)$st derivatives of the mean function are Lipschitz continuous on these neighbourhoods.

Assumption 4.2. There exist $q$ points $0 = s_0 < s_1 < \ldots < s_q$

Theorem 4.1. Suppose that $k^+ \ge 1$ and that Assumptions 2.1, 2.2, 4.1 and 4.2 are satisfied. Define $v^+ = \max_{1\le l\le k^+}v^+_l$ as the maximum critical order of the roots of the equation $\mu(t) - \mu(0) = c$ and introduce the notation
\[
\chi^+_n = \Big(b_n^2 + \frac{1}{nb_n}\Big)h_d^{-1}h_d^{\frac{v^+}{v^++1}}, \qquad
R^+_{1,n} = h_d^{-\frac{v^+}{v^++1}}\Big(b_n^2 + \frac{1}{nb_n}\Big), \qquad (4.6)
\]
\[
R^+_{2,n} = \frac{n^{1/4}\log n}{nb_n}\,h_d^{-\frac{v^+}{v^++1}}, \qquad
R^+_{3,n} = \Big(\frac{1}{Nb_n} + \frac{1}{Nh_d}\Big)\Big(b_n \wedge h_d^{v^++1}\Big)^{-1}. \qquad (4.7)
\]
Assume further that the bandwidth conditions $h_d \to 0$, $nb_nh_d \to \infty$, $b_n \to 0$, $nb_n \to \infty$, $Nb_n \to \infty$, $Nh_d \to \infty$ and $\pi_n = o(h_d)$ hold, where
\[
\pi_n := \big(b_n^2 + (nb_n)^{-1/2}\log n\big)\log n. \qquad (4.8)
\]
Then we have the following results.
(a) If $b_n^{v^++1}/h_d \to \infty$, $\sqrt{nb_nh_d^{v^+/(v^++1)}}\,(\chi^+_n + R^+_{1,n} + R^+_{2,n} + R^+_{3,n}) = o(1)$ and $\sqrt{nb_nh_d^{v^+/(v^++1)}}/N = o(1)$, then
\[
\sqrt{nb_nh_d^{\frac{v^+}{v^++1}}}\,\big(\tilde T^+_{N,c} - T^+_c\big) \stackrel{\mathcal D}{\Longrightarrow} N\big(0,\ \sigma^2_{c,+1} + \sigma^2_{c,+2}\big), \qquad (4.9)
\]
where
\[
\sigma^2_{c,+1} = \Big(\int K_d\big(z^{v^++1}\big)\,dz\Big)^2\big((v^++1)!\big)^{\frac{2}{v^++1}}\sum_{\{t^+_l : v^+_l = v^+\}}\frac{\sigma^2(t^+_l)}{|\mu^{(v^++1)}(t^+_l)|^{\frac{2}{v^++1}}}\int\big(K^*(x)\big)^2\,dx, \qquad (4.10)
\]
\[
\sigma^2_{c,+2} = \sigma^2(0)\big((v^++1)!\big)^{\frac{2}{v^++1}}\int\big(\bar K^*(t)\big)^2\,dt\,\Big(\sum_{\{t^+_l : v^+_l = v^+\}}|\mu^{(v^++1)}(t^+_l)|^{-\frac{1}{v^++1}}\int K_d\big(z^{v^++1}\big)\,dz\Big)^2. \qquad (4.11)
\]
(b) If $b_n/h_d^{\frac{1}{v^++1}} \to r \in [0,\infty)$ and $\sqrt{nh_d}\,h_d^{\frac{v^+}{2(v^++1)}}\,(\chi^+_n + R^+_{1,n} + R^+_{2,n} + R^+_{3,n}) = o(1)$, then
\[
\sqrt{nh_d}\,h_d^{\frac{v^+}{2(v^++1)}}\,\big(\tilde T^+_{N,c} - T^+_{N,c}\big) \stackrel{\mathcal D}{\Longrightarrow} N\big(0,\ \rho^2_{c,+1} + \rho^2_{c,+2}\big), \qquad (4.12)
\]
where
\[
\rho^2_{c,+1} = \big((v^++1)!\big)^{\frac{2}{v^++1}}\sum_{\{t^+_l : v^+_l = v^+\}}\frac{\sigma^2(t^+_l)}{|\mu^{(v^++1)}(t^+_l)|^{\frac{2}{v^++1}}}\int\!\!\int\!\!\int K^*(u)K^*(v)\,K_d\big(z^{v^++1}\big)\,K_d\Big(\Big(z + r\,\Big|\frac{(v^++1)!}{\mu^{(v^++1)}(t^+_l)}\Big|^{-\frac{1}{v^++1}}(v-u)\Big)^{v^++1}\Big)\,du\,dv\,dz, \qquad (4.13)
\]
and $\rho^2_{c,+2} = r^{-1}\sigma^2_{c,+2}$, where $\sigma^2_{c,+2}$ is defined in (4.11).

In general the rate of convergence of the estimator $\tilde T^+_{N,c}$ is determined by the maximal order of the critical points, and only critical points of maximal order appear in the asymptotic variance. The rate of convergence additionally depends on the relative order of the bandwidths $b_n$ and $h_d$. Theorem 4.1 also covers the case $v^+ = 0$, where all roots of the equation $\mu(t) - \mu(0) = c$ are regular.
Moreover, the use of the Jackknife corrected estimate $\tilde\mu_{b_n}$ avoids the bias problem observed in Theorem 3.1. It is also worthwhile to mention that there is a slight difference between the statements of parts (a) and (b) of Theorem 4.1. While part (a) gives the asymptotic distribution of $\tilde T^+_{N,c} - T^+_c$ (appropriately standardized), part (b) describes the weak convergence of $\tilde T^+_{N,c} - T^+_{N,c}$. The replacement of $T^+_{N,c}$ by its limit $T^+_c$ is only possible under additional bandwidth conditions. In fact, if $b_n/h_d^{1/(v^++1)} \to r \in [0,\infty)$, Theorem 4.1 and Proposition 2.1 give
\[
\sqrt{nh_d}\,h_d^{\frac{v^+}{2(v^++1)}}\,\big(\tilde T^+_{N,c} - T^+_c\big) - R_n \stackrel{\mathcal D}{\Longrightarrow} N\big(0,\ \rho^2_{c,+1} + \rho^2_{c,+2}\big), \qquad (4.14)
\]
where $\rho^2_{c,+1}$ and $\rho^2_{c,+2}$ are defined in Theorem 4.1, and $R_n$ is an additional bias term of order
\[
O\Big(\sqrt{nh_d}\,h_d^{\frac{v^++2}{2(v^++1)}}\Big),
\]
which does not necessarily vanish asymptotically. For example, in the regular case $v^+ = 0$ this bias is of order $o(1)$ under the additional assumptions $nh_d^3 = o(1)$ and $\limsup_n b_n/h_d < \infty$. Note that these bandwidth conditions do not allow for the MSE-optimal bandwidth $b_n \sim n^{-1/5}$. These considerations give some arguments for using small bandwidths $h_d$ in the estimator (4.5) such that condition (a) of Theorem 4.1 holds, that is $h_d = o(b_n^{v^++1})$. Moreover, in numerical experiments we observed that smaller bandwidths $h_d$ usually yield a substantially better performance of the estimator $\tilde T^+_{N,c}$, and in the remaining part of this section we concentrate on this case, as it is the most important from a practical point of view.

The next result gives a corresponding statement for the joint asymptotic distribution of $(\tilde T^+_{N,c}, \tilde T^-_{N,c})$ and, as a consequence, that of $\tilde T_{N,c} = \tilde T^+_{N,c} + \tilde T^-_{N,c}$, where the statistic $\tilde T^-_{N,c}$ is defined by
\[
\tilde T^-_{N,c} = \frac{1}{N}\sum_{i=1}^N\int_{-\infty}^{-c}\frac{1}{h_d}\,K_d\Big(\frac{\tilde\mu_{b_n}(i/N) - \tilde\mu_{b_n}(0) - u}{h_d}\Big)\,du. \qquad (4.15)
\]

Theorem 4.2.
Assume that the conditions of Theorem 4.1 are satisfied, that $k^- \ge 1$, and define $v^- = \max_{1\le l\le k^-}v^-_l$ as the maximum order of the critical roots $\{t^-_l : 1 \le l \le k^-\}$. If, additionally, the bandwidth conditions (a) of Theorem 4.1 hold and similar bandwidth conditions are satisfied for the level $-c$, we have
\[
\sqrt{nb_n}\,\Big(h_d^{\frac{v^+}{2(v^++1)}}\big(\tilde T^+_{N,c} - T^+_c\big),\ h_d^{\frac{v^-}{2(v^-+1)}}\big(\tilde T^-_{N,c} - T^-_c\big)\Big)^T \Longrightarrow N(0, \Sigma), \qquad (4.16)
\]
where the matrix $\Sigma = (\Sigma_{ij})_{i,j=1,2}$ has the entries $\Sigma_{11} = \sigma^2_{c,+1} + \sigma^2_{c,+2}$, $\Sigma_{22} = \sigma^2_{c,-1} + \sigma^2_{c,-2}$ and
\[
\Sigma_{12} = \Sigma_{21} = -\,\sigma^2(0)\big((v^++1)!\big)^{\frac{1}{v^++1}}\big((v^-+1)!\big)^{\frac{1}{v^-+1}}\int\big(\bar K^*(t)\big)^2\,dt
\times\sum_{\{t^+_l : v^+_l = v^+\}}\frac{\int K_d(z^{v^++1})\,dz}{|\mu^{(v^++1)}(t^+_l)|^{1/(v^++1)}}\ \sum_{\{t^-_l : v^-_l = v^-\}}\frac{\int K_d(z^{v^-+1})\,dz}{|\mu^{(v^-+1)}(t^-_l)|^{1/(v^-+1)}},
\]
and $\sigma^2_{c,-1}$, $\sigma^2_{c,-2}$ are defined similarly to $\sigma^2_{c,+1}$, $\sigma^2_{c,+2}$ in (4.10) and (4.11), respectively.

The continuous mapping theorem and Theorem 4.2 imply the weak convergence of the estimator $\tilde T_{N,c}$ of $T_c$, that is,
\[
\sqrt{nb_nh_d^{\frac{v}{v+1}}}\,\big(\tilde T_{N,c} - T_c\big) \Longrightarrow N(0, \sigma^2),
\]
where $v = \max\{v^+, v^-\}$ and the asymptotic variance is given by
\[
\sigma^2 = \Sigma_{11}\,\mathbf 1(v^+ \ge v^-) + \Sigma_{22}\,\mathbf 1(v^+ \le v^-) + 2\,\Sigma_{12}\,\mathbf 1(v^+ = v^-).
\]

4.2. Bootstrap. Although Theorem 4.1 is interesting from a theoretical point of view and avoids the bias problem described in Section 3, it cannot easily be used to construct a test for the hypotheses (1.7). The asymptotic variance of the statistics $\tilde T^+_{N,c}$ and $\tilde T^-_{N,c}$ depends on the long-run variance $\sigma^2(\cdot)$ and on the set $\mathcal C$ of critical points, which are difficult to estimate. Moreover, the order of the critical roots is usually unknown and not estimable. Therefore it is not clear which derivatives have to be estimated (and the estimation of higher order derivatives of the mean function is a hard problem anyway).
As an alternative we propose a bootstrap test which does not require the estimation of the derivatives of the mean trend at the critical roots. The bootstrap procedure is motivated by an essential step in the proof of Theorem 4.1, which gives a stochastic approximation for the difference
\[
\tilde T^+_{N,c} - T^+_c = I'_n + o_p\Big(\big(\sqrt{nb_nh_d^{v^+/(v^++1)}}\big)^{-1}\Big),
\]
where the statistic $I'_n$ is defined as
\[
I'_n = -\frac{1}{nNb_nh_d}\sum_{j=1}^n\sum_{i=1}^N K_d\Big(\frac{\mu(i/N) - \mu(0) - c}{h_d}\Big)\,\sigma\Big(\frac jn\Big)\Big(K^*\Big(\frac{i/N - j/n}{b_n}\Big) - \bar K^*\Big(\frac{j}{nb_n}\Big)\Big)V_j, \qquad (4.17)
\]
and $(V_j)_{j\in\mathbb N}$ is a sequence of independent standard normally distributed random variables. Based on this approximation we propose the following bootstrap algorithm to calculate critical values.

Algorithm 4.3.
(1) Choose bandwidths $b_n$, $h_d$ and an estimator of the long-run variance, say $\hat\sigma^2(\cdot)$, which is uniformly consistent on the set $\bigcup_{k=1}^{k^+}U_\varepsilon(t^+_k)$ for some $\varepsilon > 0$, where $U_\varepsilon(t)$ denotes an $\varepsilon$-neighbourhood of the point $t$.
(2) Calculate the bias corrected local linear estimate $\tilde\mu_{b_n}(t)$ and the statistic $\tilde T^+_{N,c}$ defined in (4.1) and (4.5), respectively.
(3) Calculate
\[
\bar V^2 = \sum_{j=1}^n\hat\sigma^2\Big(\frac jn\Big)\Big[\sum_{i=1}^N K_d\Big(\frac{\tilde\mu_{b_n}(i/N) - \tilde\mu_{b_n}(0) - c}{h_d}\Big)\Big\{K^*\Big(\frac{i/N - j/n}{b_n}\Big) - \bar K^*\Big(\frac{j}{nb_n}\Big)\Big\}\Big]^2. \qquad (4.18)
\]
(4) Let $q^+_{1-\alpha}$ denote the $(1-\alpha)$ quantile of a centered normal distribution with variance $\bar V^2$; then the null hypothesis in (1.8) is rejected whenever
\[
nNb_nh_d\,\big(\tilde T^+_{N,c} - \Delta\big) > q^+_{1-\alpha}. \qquad (4.19)
\]

Theorem 4.3. Assume that the conditions of Theorem 4.1(a) are satisfied; then the test (4.19) defines a consistent and asymptotic level $\alpha$ test for the hypotheses (1.8).

Remark 4.1.
(a) It follows from the proof of Theorem 4.3 in the appendix that
\[
P\big(\text{test (4.19) rejects}\big) \longrightarrow
\begin{cases}
1 & \text{if } T^+_c > \Delta \\
\alpha & \text{if } T^+_c = \Delta \\
0 & \text{if } T^+_c < \Delta.
\end{cases} \qquad (4.20)
\]
Moreover, these arguments also show that the power of the test (4.19) depends on the "signal to noise ratio" $(\Delta - T^+_c)/\sqrt{\sigma^2_{c,+1} + \sigma^2_{c,+2}}$, and that the test is able to detect local alternatives converging to the null at a rate $O\big((nb_n)^{-1/2}h_d^{-v^+/(2(v^++1))}\big)$. When the level $c$ decreases, the value of $T^+_c$ increases and the rejection probabilities also increase. On the other hand, for any given level $c$, the rejection probability will increase when the threshold $\Delta$ decreases (see equation (8.28) in the appendix).

(b) As pointed out by one referee, it is also of interest to discuss some uniformity properties in this context. For this purpose we consider the situation in Theorem 4.3, assume that $f$ is a potential mean function in (1.4) and denote by $v^+_f$ and $q_f$ the corresponding quantities in Assumptions 4.1 and 4.2 for $\mu = f$. For given numbers $\tilde q, \tilde v < \infty$ let $\mathcal F$ denote the class of all $3 \vee (\tilde v+1)+1$ times differentiable functions $f$ on the interval $[0,1]$ satisfying $\sup_{f\in\mathcal F}v^+_f \le \tilde v$ and $\sup_{f\in\mathcal F}q_f \le \tilde q$. Consider a sequence $(\Delta_n)_{n\in\mathbb N}$ satisfying $\sqrt{nb_nh_d^{\tilde v/(\tilde v+1)}}\,(\Delta - \Delta_n) \to -\infty$ and define, for a given level $c > 0$ and constants $M, L, \eta, \iota > 0$, the set $\mathcal F_c(M, \eta, \iota, \tilde q, \tilde v, L, \Delta_n)$ as the class of all functions $f \in \mathcal F$ with the following properties:
(i) The cardinality of the set $E^+_c(f) = \{t \in [0,1] : f(t) - f(0) = c\}$ is at most $M$.
(ii) $\min\{|t_1 - t_2| : t_1, t_2 \in E^+_c(f);\ t_1 \ne t_2\} \ge \eta$; $\min\{t : t \in E^+_c(f)\} \ge \eta$; $\max\{t : t \in E^+_c(f)\} \le 1 - \eta$.
(iii) $\sup_{t\in[0,1]}(f(t) - f(0)) \ge c + \iota$.
(iv) $\sup_{t\in[0,1]}\max_{1\le s\le 3\vee(\tilde v+1)+1}|f^{(s)}(t)| \le L$.
(v) $T^+_{f,c} := \int\mathbf 1\big(f(t) - f(0) > c\big)\,dt \ge \Delta_n$.
If $P_f$ denotes the distribution of the process $(X_{i,n})_{i=1,\ldots,n}$ in model (1.4) with $\mu = f$, then it follows by a careful inspection of the proof of Theorem 4.3 that
\[
\lim_{n\to\infty}\ \inf_{f\in\mathcal F_c(M,\eta,\iota,\tilde q,\tilde v,L,\Delta_n)}P_f\big(\text{test (4.19) rejects}\big) = 1.
\]
(c) The bootstrap procedure can easily be modified to test the hypothesis (1.7) referring to the quantity $T_c$. In step (2), we additionally calculate the statistic $\tilde T^-_{N,c}$ defined in (4.15), the statistic $\tilde T_{N,c} = \tilde T^+_{N,c} + \tilde T^-_{N,c}$ and the quantity
\[
V^2_* = \sum_{j=1}^n\hat\sigma^2(j/n)\Big(\sum_{i=1}^N K^\dagger_d\Big(\frac{\tilde\mu_{b_n}(i/N) - \tilde\mu_{b_n}(0) - c}{h_d}\Big)\Big(K^*\Big(\frac{i/N - j/n}{b_n}\Big) - \bar K^*\Big(\frac{j}{nb_n}\Big)\Big)\Big)^2,
\]
where
\[
K^\dagger_d\Big(\frac{\tilde\mu_{b_n}(i/N) - \tilde\mu_{b_n}(0) - c}{h_d}\Big) = K_d\Big(\frac{\tilde\mu_{b_n}(i/N) - \tilde\mu_{b_n}(0) - c}{h_d}\Big) - K_d\Big(\frac{\tilde\mu_{b_n}(i/N) - \tilde\mu_{b_n}(0) + c}{h_d}\Big).
\]
Finally, the null hypothesis (1.7) is rejected if
\[
nNb_nh_d\,\big(\tilde T_{N,c} - \Delta\big) > q_{1-\alpha},
\]
where $q_{1-\alpha}$ denotes the $(1-\alpha)$th quantile of a centered normal distribution with variance $V^2_*$.

For the estimation of the long-run variance we define $S_{k,r} = \sum_{i=k}^r X_i$ and, for $m \ge 1$,
\[
\Delta_j = \frac{S_{j-m+1,j} - S_{j+1,j+m}}{m},
\]
and for $t \in [m/n, 1 - m/n]$
\[
\hat\sigma^2(t) = \sum_{j=1}^n\frac{m\Delta_j^2}{2}\,\omega(t,j), \qquad (4.21)
\]
where, for some bandwidth $\tau_n \in (0,1)$,
\[
\omega(t,i) = K\Big(\frac{i/n - t}{\tau_n}\Big)\Big/\sum_{i=1}^n K\Big(\frac{i/n - t}{\tau_n}\Big).
\]
For $t \in [0, m/n)$ and $t \in (1 - m/n, 1]$ we define $\hat\sigma^2(t) = \hat\sigma^2(m/n)$ and $\hat\sigma^2(t) = \hat\sigma^2(1 - m/n)$, respectively. Note that the estimator (4.21) does not involve estimated residuals. The following result shows that $\hat\sigma^2$ is consistent and can be used in Algorithm 4.3.

Theorem 4.4. Let Assumptions 2.1 and 2.2 be satisfied and assume $\tau_n \to 0$, $n\tau_n \to \infty$, $m \to \infty$ and $\frac{m}{n\tau_n} \to 0$.
If, additionally, the function $\sigma$ is twice continuously differentiable, then the estimate defined in (4.21) satisfies
\[
\sup_{t\in[\gamma_n,\,1-\gamma_n]}\big|\hat\sigma^2(t) - \sigma^2(t)\big| = O_p\Big(\sqrt{\frac{m}{n\tau_n}} + \frac{1}{m} + \tau_n + \frac{m^{3/2}}{n}\Big), \qquad (4.22)
\]
where $\gamma_n = \tau_n + m/n$. Moreover, we have
\[
\hat\sigma^2(t) - \sigma^2(t) = O_p\Big(\sqrt{\frac{m}{n\tau_n}} + \frac{1}{m} + \tau_n + \frac{m^{3/2}}{n}\Big) \qquad (4.23)
\]
for any fixed $t \in (0,1)$, and for $s \in \{0, 1\}$
\[
\hat\sigma^2(s) - \sigma^2(s) = O_p\Big(\sqrt{\frac{m}{n\tau_n}} + \frac{1}{m} + \tau_n + \frac{m^{3/2}}{n}\Big). \qquad (4.24)
\]
Note that the error term $\sqrt{\frac{m}{n\tau_n}} + \frac{1}{m} + \tau_n$ in (4.23) is minimized at the rate $O(n^{-1/4})$ by $m \asymp n^{1/4}$ and $\tau_n \asymp n^{-1/4}$, where we write $r_n \asymp s_n$ if $r_n = O(s_n)$ and $s_n = O(r_n)$. For this choice the estimator (4.21) achieves a better rate than the long-run variance estimator proposed in Zhou and Wu (2010) (see Theorem 5 in that reference).

5. Simulation study. In this section we investigate the finite sample properties of the bootstrap tests proposed in the previous sections. For the sake of brevity we restrict ourselves to the test (4.19) for the hypotheses (1.8). Similar results can be obtained for the corresponding tests for the hypotheses (1.7) and (1.9). The code used to obtain the presented results is available from the second author on request.

Throughout this section all kernels are chosen as the Epanechnikov kernel. The selection of the bandwidth $b_n$ in the local linear estimator is of particular importance in our approach, and for this purpose we use the generalized cross validation (GCV) method. To be precise, let $\tilde e_{i,b} = X_{i,n} - \tilde\mu_b(i/n)$ be the residual obtained from a bias corrected local linear fit with bandwidth $b$, and define $\tilde e_b = (\tilde e_{1,b},\ldots,\tilde e_{n,b})^T$.
Throughout this section we use the bandwidth
\[
\hat b_n = \mathop{\mathrm{argmin}}_b\ \mathrm{GCV}(b) := \mathop{\mathrm{argmin}}_b\ \frac{n^{-1}\,\tilde e_b^T\hat\Gamma_n^{-1}\tilde e_b}{\big(1 - K^*(0)/(nb)\big)^2},
\]
where $\hat\Gamma_n$ is an estimator of the covariance matrix $\Gamma_n := \{E(\epsilon_{i,n}\epsilon_{j,n})\}_{1\le i,j\le n}$, which is obtained by the banding techniques described in Wu and Pourahmadi (2009).

It turns out that Algorithm 4.3 is not very sensitive with respect to the choice of the bandwidth $h_d$, as long as it is chosen sufficiently small. Similarly, the number $N$ of knots used in the Riemann approximation (2.3) has a negligible influence on the test, provided it has been chosen sufficiently large. As a rule of thumb satisfying the bandwidth conditions of Theorem 4.1(a), we use $h_d = N^{-1/2}$ and $N = n$. In order to save computational time we use $m = \lfloor n^{1/4}\rfloor$ and $\tau_n = n^{-1/4}$ for the estimator $\hat\sigma^2$ in the simulation study [see the discussion at the end of Section 4.2]. For the data analysis in Section 6 we suggest a data-driven procedure and use a slight modification of the minimal volatility method as proposed by Zhou and Wu (2010). To be precise, in order to avoid choosing too large values for $m$ and $\tau$, we penalize the quantity
\[
\mathrm{ISE}_{h,j} = \mathrm{ise}\Big[\bigcup_{r=-1}^{1}\hat\sigma^2_{m_h,\tau_{j+r}}(t)\ \cup\ \bigcup_{r=-1}^{1}\hat\sigma^2_{m_{h+r},\tau_j}(t)\Big]
\]
in their selection criterion by the term $2(\tau_j + m_h/n)\,\overline{\mathrm{IS}}$, where $\hat\sigma^2_{m_h,\tau_j}(\cdot)$ is the estimator (4.21) of the long-run variance with parameters $m_h$ and $\tau_j$, and $\overline{\mathrm{IS}}$ is the average of the quantities $\mathrm{ISE}_{h,j}$.

All simulation results presented in this section are based on 2000 simulation runs. We consider the model (1.4) with errors $\epsilon_{i,n} = G(i/n, \mathcal F_i)$, where
(I): $G(t, \mathcal F_i) = 0.\ldots\,|\sin(2\pi t)|\,G(t, \mathcal F_{i-1}) + \eta_i$;
(II): $G(t, \mathcal F_i) = 0.\ldots\,\big(\ldots - (t - 0.\ldots)\big)\,G(t, \mathcal F_{i-1}) + \eta_i$,
and the filtration $\mathcal F_i = (\eta_{-\infty},\ldots,\eta_i)$ is generated by a sequence $\{\eta_i,\ i\in\mathbb Z\}$ of independent standard normally distributed random variables. For the mean trend we consider the following two cases:
(a): $\mu(t) = 8\big(\ldots - (t - 0.\ldots)\ldots + 0.\ldots\big)$;
(b): $\mu(t) = \sin(2|t - 0.6|\pi)\,(1 + 0.\ldots\,t)$.
Typical sample paths of these processes are depicted in Figure 3. Note that the mean trend (b) is not differentiable at the point 0.6. However, using similar but more complicated arguments as given in Section 8 and in the supplementary material, it can be shown that the results of this paper also hold if $\mu(\cdot)$ is Lipschitz continuous outside of an open set containing the critical roots $t^+_1,\ldots,t^+_{k^+}, t^-_1,\ldots,t^-_{k^-}$.

Fig 3: Simulated sample paths for the four models under consideration. The horizontal lines display the level $c$, which is given by two values for the mean function (a) and two values for the mean function (b).

We begin illustrating the finite sample properties of the (uncorrected) estimator $\hat T^+_{N,c}$ in (2.3) and of its bias correction $\tilde T^+_{N,c}$ in (4.5) for the quantity $T^+_c$, where $c = 1.8$. The corresponding values of $T^+_c$ are $T^+_{1.8} = 0.\ldots$ and $T^+_{1.8} = 0.\ldots$ for the mean functions (a) and (b), respectively. The results in Table 1 show that the bias corrected estimate $\tilde T^+_{N,c}$ has a smaller mean squared error than the uncorrected estimate.

Table 1: Simulated bias and standard deviation of the estimator $\hat T^+_{N,c}$ and of its bias correction $\tilde T^+_{N,c}$, where $c = 1.8$. The sample size is $n = 500$ and the bandwidth has been chosen by GCV.

  Model                  (a,I)            (a,II)           (b,I)            (b,II)
                         bias     sd      bias     sd      bias     sd      bias     sd
  $\hat T^+_{N,1.8}$    -0.105   0.063   -0.122   0.077   -0.077   0.055   -0.054   0.060
  $\tilde T^+_{N,1.8}$  -0.008   0.065   -0.011   0.069   -0.001   0.076    0.010   0.085

Next we investigate the finite sample properties of the bootstrap test (4.19) for the hypotheses (1.8), where the threshold is given by $\Delta = 0.3$ and $\Delta = 0.15$. Following the discussion in Remark 4.1(a), we display in Table 2 the simulated type I error at the boundary of the null hypothesis in (1.8), that is $T^+_c = \Delta$. A good approximation of the nominal level at this point is required, as the rejection probabilities for $T^+_c < \Delta$ or $T^+_c > \Delta$ are usually smaller or larger than this value, respectively.
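The bootstrap test (4.19) evaluated in these simulations can be sketched as follows. This is a schematic implementation only: the Epanechnikov kernel is used throughout, the moments $\mu_{l,K}$ of the boundary kernel $\bar K$ are computed numerically over $[0,1]$ (an assumption on the definition of $\bar K$), and all function and variable names are our own, not from the paper.

```python
import numpy as np
from statistics import NormalDist

def _K(u):
    """Epanechnikov kernel."""
    u = np.asarray(u, dtype=float)
    return 0.75 * (1 - u**2) * (np.abs(u) <= 1)

def _K_star(x):
    """Jackknife-equivalent kernel (4.2)."""
    return 2 * np.sqrt(2) * _K(np.sqrt(2) * x) - _K(x)

def _K_bar_star(x):
    """Boundary analogue (4.4); moments of K over [0, 1] (assumption)."""
    g = np.linspace(0.0, 1.0, 2001)
    Kg = 0.75 * (1 - g**2)
    mu0, mu1, mu2 = (np.trapz(g**l * Kg, g) for l in (0, 1, 2))
    c1 = mu0 * mu2 - mu1**2
    K_bar = lambda v: (mu2 - np.asarray(v, float) * mu1) * _K(v) / c1
    return 2 * np.sqrt(2) * K_bar(np.sqrt(2) * np.asarray(x, float)) - K_bar(x)

def bootstrap_test(mu_tilde, mu0_tilde, sigma2_hat, c, h_d, b_n, Delta, alpha=0.1):
    """Steps (2)-(4) of Algorithm 4.3 (sketch): returns (reject, T_plus)."""
    N, n = len(mu_tilde), len(sigma2_hat)
    iN = np.arange(1, N + 1) / N
    jn = np.arange(1, n + 1) / n
    d = np.asarray(mu_tilde) - mu0_tilde
    # statistic (4.5) via the integrated kernel
    z = np.clip((d - c) / h_d, -1.0, 1.0)
    T_plus = float(np.mean(0.25 * (2 + 3 * z - z**3)))
    # bootstrap variance (4.18)
    w = _K((d - c) / h_d)                            # smoothed indicator weights
    A = _K_star((iN[None, :] - jn[:, None]) / b_n)   # n x N kernel matrix
    inner = A @ w - _K_bar_star(jn / (n * b_n)) * w.sum()
    V2 = float(np.sum(sigma2_hat * inner**2))
    # decision rule (4.19); tiny floor keeps NormalDist well defined
    q = NormalDist(0.0, max(np.sqrt(V2), 1e-12)).inv_cdf(1 - alpha)
    return n * N * b_n * h_d * (T_plus - Delta) > q, T_plus
```

The test statistic $nNb_nh_d(\tilde T^+_{N,c} - \Delta)$ is compared with the $(1-\alpha)$ quantile of $N(0, \bar V^2)$; no derivative of the mean trend is ever estimated.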
The values of $c$ corresponding to $T^+_c = 0.3$ and $T^+_c = 0.15$ are given by $c = 1.82$ and $c = 1.955$ for the mean function (a), and by $c = 1.672$ and $c = 1.78$ for the mean function (b). We observe a rather precise approximation of the nominal level, which improves with increasing sample size. For the sample size $n = 200$ the GCV method selects the bandwidths $b_{cv}$ as 0.25, 0.26, 0.23 and 0.19 for the models ((I),(a)), ((I),(b)), ((II),(a)) and ((II),(b)), respectively. Similarly, for the sample size $n = 500$ the GCV method selects the bandwidths $0.\ldots$, 0.17, 0.21 and 0.14 for the models ((I),(a)), ((I),(b)), ((II),(a)) and ((II),(b)), respectively. In order to study the robustness of the test with respect to the choice of $b_n$, we investigate the bandwidths $b^-_{cv} = b_{cv} - 0.05$, $b_{cv}$ and $b^+_{cv} = b_{cv} + 0.05$. For this range of bandwidths the approximation of the nominal level is remarkably stable.

Table 2: Simulated level of the test (4.19) at the boundary of the null hypothesis (1.8). The sample size is $n = 200$ (upper part) and $n = 500$ (lower part), and various bandwidths are considered. The bandwidth $b_{cv}$ is chosen by GCV, and $b^-_{cv} = b_{cv} - 0.05$, $b^+_{cv} = b_{cv} + 0.05$.

We also briefly address the problem of the sensitivity of the procedure with respect to the choice of the bandwidth $h_d$. For this purpose we consider the same scenarios as in Table 2. For the sake of brevity we restrict ourselves to

Table 3: Simulated level of the test (4.19) at the boundary of the null hypothesis (1.8) for different choices of the bandwidth $h_d$. The sample size is $n = 500$. The bandwidth $b_{cv}$ is chosen by GCV.
the case $n = 500$ and the data-driven bandwidth $b_{cv}$. The results are shown in Table 3 for the bandwidths $h_d = n^{-1/2}$, $h_d = 0.\ldots$ and $h_d = 0.\ldots$; the test is again rather stable with respect to the choice of $h_d$, as long as $h_d$ is chosen sufficiently small.

Fig 4: Simulated rejection probabilities of the test (4.19) in model (1.4) for varying values of $c$ and $\Delta$. Left: $c = 1.82$, with $\Delta$ varying ($\Delta = 0.3$ corresponds to the boundary of the null hypothesis). Right: $\Delta = 0.3$, $c \in [1.44, 2]$ ($c = 1.82$ corresponds to the boundary of the null hypothesis). The dashed horizontal line represents the nominal level 10%.

In Figure 4 we investigate the properties of the test (4.19) as a function of the threshold $\Delta$ and the level $c$, where we restrict ourselves to the scenario ((I),(a)). For the other cases the observations are similar. The bandwidth is $b_n = 0.2$. In the left part of the figure the level $c$ is fixed at 1.82 and $\Delta$ varies from 0 upwards; the rejection rates decrease as $\Delta$ increases. In the right part, $c$ varies between 1.44 and 2. Again the rejection rates decrease when $c$ increases.

We finally investigate the power of the test (4.19) for the hypotheses (1.8) with $c = 1.82$ and $\Delta = 0.3$, where the bandwidth is chosen as $b_n = 0.2$. The model is given by (1.4) with error (I) and different mean functions, obtained from the mean function (a) by replacing the factor 8 with a parameter $a$ (5.1); the case $a = 8$ corresponds to the boundary of the hypotheses. The results are presented in Figure 5, which demonstrate that the test (4.19) has decent power.

Fig 5: Simulated power of the test (4.19) in model (1.4) for the hypothesis (1.8) with $c = 1.82$ and $\Delta = 0.3$. The mean functions are given by (5.1), and the case $a = 8$ corresponds to the boundary of the null hypothesis. The dashed horizontal line represents the nominal level 10%.

Although hypotheses of the form (1.7) have not been investigated in the

Fig 6: Rejection rates of the test of Dette and Wied (2016) (dashed line) and the bootstrap test (4.19) with $\Delta = 0.1$
(solid line) for various values of the level $c$. Left panel: regression function (III); right panel: regression function (IV). The nominal level is 10%.

literature so far, it was pointed out by a referee that it might be of interest to see a comparison with tests for similar hypotheses. The method most similar in spirit to our approach is the test of Dette and Wied (2016) for the hypotheses (1.3). Note that the procedure of these authors assumes a constant mean before and after the (relevant) change point, while we investigate whether an (inhomogeneous) process deviates substantially from its initial mean over a sufficiently long period. Thus, strictly speaking, neither procedure is applicable to the other testing problem. On the other hand, both tests address the problem of relevant changes from different perspectives, and it might therefore be of interest to see their performance in the respective alternative testing problems. For this purpose we consider model (1.4) with the mean functions
(III): $\mu(t) = 2.5\sin(2\pi t)$,
(IV): $\mu(t) = 0$ for $t \in [0, 2/3)$ and $\mu(t) = 2.5$ for $t \in [2/3, 1]$,
and errors $\epsilon_{i,n} \sim N(0, 1/4)$. Note that model (III) corresponds to the situation considered in this paper (i.e. a continuously varying mean function), while model (IV) reflects the situation investigated in Dette and Wied (2016). In Figure 6 we display the rejection probabilities of both tests as the level $c$ varies from $0.\ldots$ to 2.75 (thus the curves are decreasing with increasing $c$). The significance level is given by 10%, which means that the value of $c$ where the curve equals 10% should be close to 2.5. For the hypotheses (1.7) we fixed $\Delta$ as 0.
1, because for a comparison with the test of Dette and Wied (2016) it is irrelevant how long the threshold is exceeded, and the power of the test (4.19) decreases for increasing values of $\Delta$ (see Figure 4).

We observe in the left panel of Figure 6 that the test of Dette and Wied (2016) performs poorly in model (III), where the mean is not constant and the conditions for its application are not satisfied. On the other hand, the bootstrap test (4.19) shows a reasonable performance in model (IV), although the assumptions for its application are not satisfied either. In particular, for small values of $\Delta$ this test shows a performance similar to that of the test of Dette and Wied (2016), which is specifically designed for the hypotheses (1.3) (see the right panel of Figure 6).

6. Data examples.

6.1. Global temperature data. Global temperature data has been extensively studied in the statistical literature under the assumption of stationarity [see for example Bloomfield and Nychka (1992), Vogelsang (1998) and Wu and Zhao (2007), among others]. We consider here a series from http://cdiac.esd.ornl.gov/ftp/trends/temp/jonescru/ with global monthly temperature anomalies from January 1850 to April 2015, relative to the 1961-1990 reference period. A test of stationarity yields a $p$-value of 1.6%, supporting a non-stationary model for the data analysis.

Fig 7: Left panel: deseasonalized global temperature 1850-2015 and its fitted mean trend. Right panel: yearly rainfall of Tucumán Province, Argentina, 1884-1996.

We are interested in the question whether the deseasonalized monthly temperature exceeds the temperature in January 1850 by more than $c = 0.15$ degrees Celsius in more than $100\Delta\%$ of the considered period. For this purpose we run the test (4.19) for the hypothesis (1.8), where the bandwidth (chosen by GCV) is $b_n = 0.105$ and $h_d = 0.011$ (we note again that the procedure is rather stable with respect to the choice of $h_d$).
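The difference-based long-run variance estimator (4.21) used for $\hat\sigma^2$ in the data analysis can be sketched as follows (equispaced observations; the function names and the restriction to indices where both blocks fit are our simplifying assumptions):

```python
import numpy as np

def lrv_estimate(X, m, tau_n, t_grid):
    """Sketch of (4.21): Delta_j = (S_{j-m+1,j} - S_{j+1,j+m}) / m and
    sigma2_hat(t) = sum_j (m * Delta_j^2 / 2) * omega(t, j), with kernel
    weights omega normalized to sum to one."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    S = np.concatenate(([0.0], np.cumsum(X)))           # S[k] = X_1 + ... + X_k
    j = np.arange(m, n - m + 1)                         # both blocks must fit
    delta = ((S[j] - S[j - m]) - (S[j + m] - S[j])) / m
    K = lambda u: 0.75 * (1 - u**2) * (np.abs(u) <= 1)  # Epanechnikov kernel
    out = np.empty(len(t_grid))
    for k, t in enumerate(t_grid):
        w = K((j / n - t) / tau_n)
        out[k] = np.sum(m * delta**2 / 2 * w) / np.sum(w)
    return out
```

For i.i.d. data with variance $\sigma^2$, each $m\Delta_j^2/2$ has expectation $\sigma^2$, so the weighted average is consistent; for dependent data it targets the local long-run variance. No estimated residuals are required, since the block differences cancel a smooth mean trend.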
For the estimate (4.21) of the long-run variance $\sigma^2$, we use the procedure described at the beginning of this section, which yields $m = 30$ and $\tau = 0.\ldots$, and a $p$-value of $4.\ldots\%$. The GCV method yields the bandwidth $b_n = 0.\ldots$, and we chose $h_d = 0.013$ and $m = 36$, $\tau = 0.234$ for the estimate of the time-varying long-run variance (see the discussion at the beginning of this section). We find that for $\Delta = 26\%$ and $c = 0.15$ the $p$-value is $6.\ldots\%$; exceedances of $c = 0.15$ degrees Celsius arise more frequently between 1975 and 2015. The conclusions of this short data analysis are similar to those of many authors, but by our method we are able to quantitatively describe relevant deviations. For example, if we reject the hypothesis that in less than 26% of the time between January 1850 and April 2015 the mean function exceeds its value from January 1850 by more than $c = 0.15$ degrees Celsius, the type I error of this conclusion is less than or equal to 5%.

6.2. Rainfall data. In this example we analyze the yearly rainfall data (in millimeters) from 1884 to 1996 in the Tucumán Province, Argentina, which is a predominantly agricultural region; its economic well-being therefore depends sensitively on timely rainfall. The series, with a local linear estimate of the mean trend, is depicted in the right panel of Figure 7 (note that the range of the estimated mean function is $[71.\ldots,\ \ldots]$). Tests for a change in the mean yield a $p$-value smaller than 2%, while a self-normalization method considered in Shao and Zhang (2010) reports a $p$-value of about 10%. Meanwhile, there is some belief that there exists a change point because of the construction of a dam near the region during 1952-$\ldots$ (here we calculated $b_n = 0.\ldots$, $m = 11$, $\tau = 0.24$ and $h_d = 0.047$ as described at the beginning of this section). For the level $c = 7$ the $p$-value is $6.\ldots\%$.

7. Further discussion. We conclude this paper with a brief discussion of the extension of the proposed concept to the multivariate case and its relation to the concept of sojourn times in probability theory.
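As a preview of the multivariate extension discussed next, the excess mass with the Euclidean norm can be estimated exactly as in the univariate case, only with $|\hat\mu(i/N) - \hat\mu(0)|$ replaced by a vector norm. A minimal sketch (Epanechnikov kernel; names are our own assumptions):

```python
import numpy as np

def mv_excess_mass(mu_hat, mu0_hat, c, h_d):
    """Smoothed estimate of T_c = int 1(||mu(t) - mu(0)|| > c) dt in the
    multivariate case: the inner integral over u reduces to the integrated
    Epanechnikov kernel at (||mu_hat(i/N) - mu_hat(0)|| - c) / h_d."""
    norms = np.linalg.norm(np.asarray(mu_hat) - np.asarray(mu0_hat), axis=1)
    z = np.clip((norms - c) / h_d, -1.0, 1.0)
    return float(np.mean(0.25 * (2.0 + 3.0 * z - z**3)))
```

Here `mu_hat` holds the (component-wise bias corrected) mean estimates at the grid points $i/N$, one row per grid point, and `mu0_hat` is the estimate at $t = 0$.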
7.1. Multivariate data. The results of this paper can be extended to multivariate time series of the form
\[
X_{i,n} = \mu(i/n) + e_{i,n}, \qquad (7.1)
\]
where $X_{i,n} = (X^1_{i,n},\ldots,X^m_{i,n})^T$ is the $m$-dimensional vector of observations, $\mu(i/n) = (\mu_1(i/n),\ldots,\mu_m(i/n))^T$ its corresponding expectation, and $(e_{i,n})_{i=1,\ldots,n}$ is an $m$-dimensional time series such that $e_{i,n} = G(i/n, \mathcal F_i)$, where $G(t, \mathcal F_i) = (G_1(t, \mathcal F_i),\ldots,G_m(t, \mathcal F_i))^T$ is an $m$-dimensional filter. Assume that the long-run variance matrix
\[
\Sigma(t) = \sum_{i=-\infty}^{\infty}\mathrm{cov}\big(G(t, \mathcal F_i),\ G(t, \mathcal F_0)\big)
\]
of the error process is strictly positive definite, and let $\|v\|$ denote the Euclidean norm of an $m$-dimensional vector $v$. The excess mass for the $m$-dimensional mean function is then defined as
\[
T_c := \int_0^1\mathbf 1\big(\|\mu(t) - \mu(0)\| > c\big)\,dt, \qquad (7.2)
\]
and a test for the hypotheses $H_0: T_c \le \Delta$ versus $H_1: T_c > \Delta$ can be developed by estimating this quantity by
\[
\hat T_{N,c} = \frac{1}{N}\sum_{i=1}^N\int_c^\infty\frac{1}{h_d}\,K_d\Big(\frac{\|\hat\mu(i/N) - \hat\mu(0)\| - u}{h_d}\Big)\,du, \qquad (7.3)
\]
where $\hat\mu$ denotes the vector of component-wise bias-corrected Jackknife estimates of the vector of regression functions. The corresponding bootstrap test is now obtained by rejecting the null hypothesis at level $\alpha$ whenever
\[
nNb_nh_d\,\big(\hat T_{N,c} - \Delta\big) > q_{1-\alpha}, \qquad (7.4)
\]
where $q_{1-\alpha}$ is the $(1-\alpha)$-quantile of the random variable
\[
\sum_{j=1}^n\sum_{i=1}^N K_d\Big(\frac{\hat g(i/N) - c}{h_d}\Big)\Big(K^*\Big(\frac{j/n - i/N}{b_n}\Big) - \bar K^*\Big(\frac{j}{nb_n}\Big)\Big)\big(\nabla\hat g(i/N)\big)^T\hat\Sigma^{1/2}(j/n)\,V_j, \qquad (7.5)
\]
$\nabla\hat g(u)$ is the gradient of the function $\hat g(u) = \|\hat\mu(u) - \hat\mu(0)\|$, $V_1, V_2, \ldots$
are independent standard normally distributed $m$-dimensional random vectors and $\hat\Sigma(t)$ is an analogue of the long-run variance matrix estimator defined in (4.21). Under similar conditions as stated in Assumptions 2.1, 2.2, 4.1, 4.2 and in Theorem 4.1(a), an analogue of Theorem 4.3 can be proved; that is, the bootstrap test defined by (7.4) has asymptotic level $\alpha$ and is consistent.

7.2. Estimates of excess measures related to sojourn times. The excess measures (1.12) and (1.13) based on sojourn times can easily be estimated under the assumption that the process $\{\epsilon(t)-\epsilon(0)\}_{t\in[0,1]}$ is stationary with density $f$. In this case the quantities $e_c$ and $p_{c,\Delta}$ can be expressed as
\[
e_c = E(S_c) = \int\int \mathbf 1\big(|\mu(t)-\mu(0)+x| > c\big)f(x)\,dt\,dx, \tag{7.6}
\]
\[
p_{c,\Delta} = P(S_c > \Delta) = E\big(E\big(\mathbf 1(S_c>\Delta)\mid \epsilon(t)-\epsilon(0)=x\big)\big) = \int \mathbf 1\Big(\int \mathbf 1\big(|\mu(t)-\mu(0)+x| > c\big)\,dt > \Delta\Big)f(x)\,dx, \tag{7.7}
\]
and corresponding estimators are given by
\[
\hat e_c = \frac{1}{Nnh_d}\sum_{i=1}^n\sum_{s=1}^N \int_c^\infty K_d\Big(\frac{|\hat\mu(s/N)-\hat\mu(0)+\hat Z(i/n)| - u}{h_d}\Big)\,du, \tag{7.8}
\]
\[
\hat p_{c,\Delta} = \frac{1}{n}\sum_{i=1}^n \mathbf 1\Big(\frac{1}{Nh_d}\sum_{s=1}^N \int_c^\infty K_d\Big(\frac{|\hat\mu(s/N)-\hat\mu(0)+\hat Z(i/n)| - u}{h_d}\Big)\,du > \Delta\Big), \tag{7.9}
\]
respectively, where $\hat\mu(t)-\hat\mu(0)$ is a consistent estimator (say, a local linear one) of $\mu(t)-\mu(0)$ and $\hat Z(t) = \hat\epsilon(t)-\hat\epsilon(0)$ denotes the corresponding residual. A statistical analysis can then be developed along the lines of this paper. However, in the case of a non-stationary error process as considered in this paper the situation is much more complicated, and we leave the development of estimators and the investigation of their (asymptotic) properties for future research.

Acknowledgements. The authors would like to thank Martina Stein, who typed this manuscript with considerable technical expertise, V. Spokoiny for explaining his results to us, and V.
Golosnoy for some help with the literature on control charts. The authors are also grateful to four anonymous reviewers for their constructive comments on an earlier version of this manuscript. The work of the authors was partially supported by the Deutsche Forschungsgemeinschaft (SFB 823: Statistik nichtlinearer dynamischer Prozesse, Teilprojekt A1 and C1; FOR 1735: Structural inference in statistics - adaptation and efficiency).

References.

Álvarez Esteban, P. C., del Barrio, E., Cuesta-Albertos, J. A. and Matrán, C. (2008). Trimmed comparison of distributions. Journal of the American Statistical Association.
Álvarez Esteban, P. C., del Barrio, E., Cuesta-Albertos, J. A. and Matrán, C. (2012). Similarity of samples and trimming. Bernoulli.
Andrews, D. W. K. (1993). Tests for parameter instability and structural change with unknown change point. Econometrica.
Aue, A. and Horváth, L. (2013). Structural breaks in time series. Journal of Time Series Analysis.
Aue, A., Hörmann, S., Horváth, L. and Reimherr, M. (2009). Break detection in the covariance structure of multivariate time series models. Annals of Statistics.
Bai, J. and Perron, P. (1998). Estimating and testing linear models with multiple structural changes. Econometrica.
Baíllo, A. (2003). Total error in a plug-in estimator of level sets. Statistics & Probability Letters, 411-417.
Berman, S. M. (1992). Sojourns and Extremes of Stochastic Processes. The Wadsworth & Brooks/Cole Statistics/Probability Series. Wadsworth & Brooks/Cole, Pacific Grove, CA.
Bloomfield, P. and Nychka, D. (1992). Climate spectra and detecting climate change. Climatic Change.
Brown, R. L., Durbin, J. and Evans, J. M. (1975). Techniques for testing the constancy of regression relationships over time. Journal of the Royal Statistical Society, Series B.
Cadre, B. (2006). Kernel estimation of density level sets. Journal of Multivariate Analysis, 999-1023.
Champ, C. W. and Woodall, W. H. (1987). Exact results for Shewhart control charts with supplementary runs rules. Technometrics.
Chandler, G. and Polonik, W. (2006). Discrimination of locally stationary time series based on the excess mass functional. Journal of the American Statistical Association.
Cheng, M. Y. and Hall, P. (1998). Calibrating the excess mass and dip tests of modality. Journal of the Royal Statistical Society, Series B.
Chernozhukov, V., Fernández-Val, I. and Galichon, A. (2009). Improving point and interval estimators of monotone functions by rearrangement. Biometrika.
Chernozhukov, V., Fernández-Val, I. and Galichon, A. (2010). Quantile and probability curves without crossing. Econometrica.
Chow, G. C. (1960). Tests of equality between sets of coefficients in two linear regressions. Econometrica.
Cuevas, A., González-Manteiga, W. and Rodríguez-Casal, A. (2006). Plug-in estimation of general level sets. Australian & New Zealand Journal of Statistics.
Dahlhaus, R. (1997). Fitting time series models to nonstationary processes. Annals of Statistics.
Dette, H., Neumeyer, N. and Pilz, K. F. (2006). A simple nonparametric estimator of a strictly monotone regression function. Bernoulli.
Dette, H. and Volgushev, S. (2008). Non-crossing non-parametric estimates of quantile curves. Journal of the Royal Statistical Society, Series B.
Dette, H. and Wied, D. (2016). Detecting relevant changes in time series models. Journal of the Royal Statistical Society, Series B.
Dette, H., Wu, W. and Zhou, Z. (2015a). Change point analysis of second order characteristics in non-stationary time series. arXiv preprint arXiv:1503.08610.
Dette, H., Wu, W. and Zhou, Z. (2015b). Supplement for "Change point analysis of second order characteristics in non-stationary time series". arXiv preprint.
Gijbels, I., Lambert, A. and Qiu, P. (2007). Jump-preserving regression and smoothing using local linear fitting: a compromise. Annals of the Institute of Statistical Mathematics.
Hartigan, J. A. and Hartigan, P. M. (1985). The dip test of unimodality. The Annals of Statistics.
Jandhyala, V., Fotopoulos, S., MacNeill, I. and Liu, P. (2013). Inference for single and multiple change-points in time series. Journal of Time Series Analysis.
Krämer, W., Ploberger, W. and Alt, R. (1988). Testing for structural change in dynamic models. Econometrica.
Mason, D. M. and Polonik, W. (2009). Asymptotic normality of plug-in level set estimates. Annals of Applied Probability.
Mercurio, D. and Spokoiny, V. (2004). Statistical inference for time-inhomogeneous volatility models. Annals of Statistics.
Müller, D. W. and Sawitzki, G. (1991). Excess mass estimates and tests for multimodality. Journal of the American Statistical Association.
Nason, G. P., von Sachs, R. and Kroisandt, G. (2000). Wavelet processes and adaptive estimation of the evolutionary wavelet spectrum. Journal of the Royal Statistical Society, Series B.
Ombao, H., von Sachs, R. and Guo, W. (2005). SLEX analysis of multivariate non-stationary time series. Journal of the American Statistical Association.
Page, E. S. (1954). Continuous inspection schemes. Biometrika.
Polonik, W. (1995). Measuring mass concentrations and estimating density contour clusters - an excess mass approach. Annals of Statistics.
Polonik, W. and Wang, Z. (2006). Estimation of regression contour clusters - an application of the excess mass approach to regression. Journal of Multivariate Analysis.
Qiu, P. (2003). A jump-preserving curve fitting procedure based on local piecewise-linear kernel estimation. Journal of Nonparametric Statistics.
Rinaldo, A. and Wasserman, L. (2010). Generalized density clustering. The Annals of Statistics.
Samworth, R. J. and Wand, M. P. (2010). Asymptotics and optimal bandwidth selection for highest density region estimation. Annals of Statistics.
Schucany, W. R. and Sommers, J. P. (1977). Improvement of kernel type density estimators. Journal of the American Statistical Association.
Shao, X. and Zhang, X. (2010). Testing for change points in time series. Journal of the American Statistical Association.
Spokoiny, V. (2009). Multiscale local change point detection with applications to value-at-risk. Annals of Statistics.
Takács, L. (1996). Sojourn times. Journal of Applied Mathematics and Stochastic Analysis.
Tsybakov, A. B. (1997). On nonparametric estimation of density level sets. The Annals of Statistics.
Vogelsang, T. J. (1998). Trend function hypothesis testing in the presence of serial correlation. Econometrica.
Vogt, M. (2012). Nonparametric regression for locally stationary time series. Annals of Statistics.
Wellek, S. (2010). Testing Statistical Hypotheses of Equivalence and Noninferiority. CRC Press.
Woodall, W. H. and Montgomery, D. C. (1999). Research issues and ideas in statistical process control. Journal of Quality Technology.
Wu, W. B. and Pourahmadi, M. (2009). Banding sample autocovariance matrices of stationary processes. Statistica Sinica.
Wu, W. B., Woodroofe, M. and Mentz, G. (2001). Isotonic regression: another look at the changepoint problem. Biometrika.
Wu, W. B. and Zhao, Z. (2007). Inference of trends in time series. Journal of the Royal Statistical Society, Series B.
Zhou, Z. (2010). Nonparametric inference of quantile curves for nonstationary time series. The Annals of Statistics.
Zhou, Z. (2013). Heteroscedasticity and autocorrelation robust structural change detection. Journal of the American Statistical Association.
Zhou, Z. and Wu, W. B. (2009). Local linear quantile estimation for nonstationary time series. The Annals of Statistics.
Zhou, Z. and Wu, W. B. (2010). Simultaneous inference of linear models with time varying coefficients. Journal of the Royal Statistical Society, Series B.

8. Proofs of main results. In this section we will prove the main results of this paper.
For the sake of simple notation we write $e_i := \epsilon_{i,n}$ throughout this section, where $\epsilon_{i,n}$ is the non-stationary error process in model (1.4). Moreover, in all arguments given below $M$ denotes a sufficiently large constant which may vary from line to line. For the sake of brevity we restrict ourselves to proofs of the results in Section 4; the details for the proofs of the results in Section 3 are omitted, as they follow by arguments similar to those presented here. We give a proof of Theorem 4.1 (deferring some of the more technical arguments to the supplementary material) and of Theorem 4.3 in this section. The proof of Theorem 4.4 can also be found in the supplementary material.

Proof of Theorem 4.1. It follows from Assumption 2.1 that there exist $k^+ \ge 1$ roots $0 < t_1^+ < \ldots < t_{k^+}^+ < 1$ of the equation $\mu(t) = \mu(0)+c$. Define $\gamma^+ = \min_{0\le i\le k^+}(t_{i+1}^+ - t_i^+) > 0$, with the convention that $t_0^+ = 0$ and $t_{k^++1}^+ = 1$. Recalling the definition of the statistic $\tilde T^+_{N,c}$ and the quantity $T^+_{N,c}$ in (4.5) and (2.6), respectively, we obtain the decomposition
\[
\tilde T^+_{N,c} - T^+_{N,c} = \Delta_{1,N} + \Delta_{2,N}, \tag{8.1}
\]
where the random variables $\Delta_{1,N}$ and $\Delta_{2,N}$ are defined by
\[
\Delta_{1,N} = \frac{1}{N}\sum_{i=1}^N \int_c^\infty \frac{1}{h_d^2}\,K_d'\Big(\frac{\mu(\frac iN)-\mu(0)-u}{h_d}\Big)\Big(\tilde\mu_{b_n}\big(\tfrac iN\big)-\mu\big(\tfrac iN\big)-\big(\tilde\mu_{b_n}(0)-\mu(0)\big)\Big)\,du,
\]
\[
\Delta_{2,N} = \frac{1}{2N}\sum_{i=1}^N \int_c^\infty \frac{1}{h_d^3}\,K_d''\Big(\frac{\zeta_i-u}{h_d}\Big)\Big(\tilde\mu_{b_n}\big(\tfrac iN\big)-\mu\big(\tfrac iN\big)-\big(\tilde\mu_{b_n}(0)-\mu(0)\big)\Big)^2\,du \tag{8.2}
\]
(note that we do not reflect the dependence of $\Delta_{\ell,N}$ on $n$ in our notation), and $\zeta_i$ denotes a random variable satisfying
\[
|\zeta_i - (\mu(i/N)-\mu(0))| \le |\tilde\mu_{b_n}(i/N)-\mu(i/N)-(\tilde\mu_{b_n}(0)-\mu(0))|
\]
and
\[
|\zeta_i - (\tilde\mu_{b_n}(i/N)-\tilde\mu_{b_n}(0))| \le |\tilde\mu_{b_n}(i/N)-\mu(i/N)-(\tilde\mu_{b_n}(0)-\mu(0))|.
\]
It is easy to see that
\[
|\Delta_{2,N}| = \Big|\frac{1}{2N}\sum_{i=1}^N \frac{1}{h_d^2}\,K_d'\Big(\frac{\zeta_i-c}{h_d}\Big)\Big(\tilde\mu_{b_n}(i/N)-\mu(i/N)-\big(\tilde\mu_{b_n}(0)-\mu(0)\big)\Big)^2\Big|. \tag{8.3}
\]
Recall the definition of $\pi_n$ in (4.8) and define
\[
A_n = \Big\{\sup_{t\in[b_n,1-b_n]\cup\{0\}}|\tilde\mu_{b_n}(t)-\mu(t)|\le\pi_n,\ \sup_{t\in[0,b_n)\cup(1-b_n,1]}|\tilde\mu_{b_n}(t)-\mu(t)|\le b_n\vee\pi_n\Big\}, \tag{8.4}
\]
where we denote $\max\{a,b\}$ by $a\vee b$. By Lemma 10.3 in Section 10 of the online supplement we have $\lim_{n\to\infty}P(A_n)=1$, and Lemma 10.1 of the online supplement yields, almost surely,
\[
\#\big\{i: |\tilde\mu_{b_n}(i/N)-\tilde\mu_{b_n}(0)-c|\le h_d,\ |\tilde\mu_{b_n}(i/N)-\mu(i/N)-(\tilde\mu_{b_n}(0)-\mu(0))|\le\pi_n\big\} \le \#\big\{i: |\mu(i/N)-\mu(0)-c|\le h_d+2\pi_n\big\} = O\big(N(h_d+\pi_n)^{1/(v^++1)}\big), \tag{8.5}
\]
where $\#A$ denotes the number of points in the set $A$. Observing the definition of $\zeta_i$ and (8.5), we obtain that the number of non-vanishing terms on the right-hand side of (8.3) is bounded by $O(N(h_d+\pi_n)^{1/(v^++1)})$. Therefore the triangle inequality yields, for a sufficiently large constant $M$,
\[
\big\|\Delta_{2,N}\mathbf 1(A_n)\big\| \le M\Big(b_n^2+\frac{1}{nb_n}\Big)h_d^{-1}(h_d+\pi_n)^{1/(v^++1)}. \tag{8.6}
\]
Now Proposition B.3 of Dette, Wu and Zhou (2015b) (note that $\lim_{n\to\infty}P(A_n)=1$) yields the estimate
\[
\Delta_{2,N} = O_p\Big(\Big(b_n^2+\frac{1}{nb_n}\Big)h_d^{-1}(h_d+\pi_n)^{1/(v^++1)}\Big). \tag{8.7}
\]
Notice that the assumptions regarding the bandwidths guarantee that
\[
\sqrt{nh_d\,h_d^{v^+/(v^++1)}}\,\Delta_{2,N} = o_p(1) \quad\text{if } b_n^{v^++1}/h_d \to r\in[0,\infty), \tag{8.8}
\]
\[
\sqrt{nb_n\,h_d^{2v^+/(v^++1)}}\,\Delta_{2,N} = o_p(1) \quad\text{if } b_n^{v^++1}/h_d \to \infty, \tag{8.9}
\]
and therefore it remains to consider the term $\Delta_{1,N}$ in the decomposition (8.1).

For this purpose we recall its definition in (8.2) and obtain, by an application of Lemma 10.3 of the online supplement and straightforward calculations, the decomposition
\[
\Delta_{1,N} = -\frac{1}{Nh_d}\sum_{i=1}^N K_d\Big(\frac{\mu(i/N)-\mu(0)-c}{h_d}\Big)\Big(\big(\tilde\mu_{b_n}(i/N)-\mu(i/N)\big)-\big(\tilde\mu_{b_n}(0)-\mu(0)\big)\Big) = I+R, \tag{8.10}
\]
where the terms $I$ and $R$ are defined by
\[
I = -\frac{1}{nNb_nh_d}\sum_{i=1}^N K_d\Big(\frac{\mu(\frac iN)-\mu(0)-c}{h_d}\Big)\sum_{j=1}^n e_j\Big(K^*\Big(\frac{\frac iN-\frac jn}{b_n}\Big)-\bar K^*\Big(\frac{j}{nb_n}\Big)\Big), \tag{8.11}
\]
\[
R = O\Big(\frac{1}{Nh_d}\sum_{i=1}^N K_d\Big(\frac{\mu(\frac iN)-\mu(0)-c}{h_d}\Big)\Big(b_n^2+\frac{1}{nb_n}\Big)\Big). \tag{8.12}
\]
By Lemma 10.1 of the online supplement the term $R$ is of order $O\big(h_d^{-v^+/(v^++1)}\big(b_n^2+\frac{1}{nb_n}\big)\big)$. For the investigation of the remaining term $I$ we use Proposition 5 of Zhou (2013), which shows that there exist (on a possibly richer probability space) independent standard normally distributed random variables $\{V_i\}_{i\in\mathbb Z}$ such that
\[
\max_{1\le i\le n}\Big|\sum_{j=1}^i e_j - \sum_{j=1}^i \sigma(j/n)V_j\Big| = o_p\big(n^{1/4}\log^2 n\big). \tag{8.13}
\]
This representation and the summation-by-parts formula in equation (44) of Zhou (2010) yield
\[
\sup_{t\in[0,1]}\Big|\sum_{j=1}^n e_j\tilde K^*\Big(\frac{t-j/n}{b_n}\Big)-\sum_{j=1}^n \sigma(j/n)V_j\tilde K^*\Big(\frac{t-j/n}{b_n}\Big)\Big| = o_p\big(n^{1/4}\log^2 n\big), \tag{8.14}
\]
where we introduce the notation
\[
\tilde K^*\Big(\frac{t-j/n}{b_n}\Big) = K^*\Big(\frac{t-j/n}{b_n}\Big)-\bar K^*\Big(\frac{j}{nb_n}\Big). \tag{8.15}
\]
Using these results in (8.11) and Lemma 10.1 of the online supplement provides an asymptotically equivalent representation of the term $I$, that is,
\[
|I'-I| = o_p\Big(\frac{n^{1/4}\log^2 n}{nb_n}\,h_d^{-v^+/(v^++1)}\Big). \tag{8.16}
\]
Here
\[
I' := -\frac{1}{nNb_nh_d}\sum_{j=1}^n\sum_{i=1}^N K_d\Big(\frac{\mu(i/N)-\mu(0)-c}{h_d}\Big)\sigma(j/n)\tilde K^*\Big(\frac{i/N-j/n}{b_n}\Big)V_j
\]
is a zero-mean Gaussian random variable with variance
\[
\operatorname{Var}(I') = \frac{1}{n^2b_n^2h_d^2}\sum_{j=1}^n\Big(\frac 1N\sum_{i=1}^N \sigma(j/n)\tilde K^*\Big(\frac{i/N-j/n}{b_n}\Big)K_d\Big(\frac{\mu(i/N)-\mu(0)-c}{h_d}\Big)\Big)^2 = \frac{1}{n^2b_n^2h_d^2}\sum_{j=1}^n\Big(\int_0^1 \sigma(j/n)\tilde K^*\Big(\frac{t-j/n}{b_n}\Big)K_d\Big(\frac{\mu(t)-\mu(0)-c}{h_d}\Big)dt\Big)^2+\beta_n =: \bar\alpha_n+\beta_n, \tag{8.17}
\]
and the last two equalities define the quantities $\bar\alpha_n$ and $\beta_n$ in an obvious manner. Observing the estimates
\[
\frac 1N\sum_{i=1}^N \sigma\big(\tfrac jn\big)\tilde K^*\Big(\frac{i/N-j/n}{b_n}\Big)K_d\Big(\frac{\mu(i/N)-\mu(0)-c}{h_d}\Big)-\int_0^1 \sigma\big(\tfrac jn\big)\tilde K^*\Big(\frac{t-j/n}{b_n}\Big)K_d\Big(\frac{\mu(t)-\mu(0)-c}{h_d}\Big)dt = O\Big(\Big(\frac{1}{Nb_n}+\frac{1}{Nh_d}\Big)\big(b_n\wedge h_d^{1/(v^++1)}\big)\Big),
\]
\[
\frac{1}{n^2b_n^2h_d^2}\sum_{j=1}^n \int_0^1 \sigma(j/n)\tilde K^*\Big(\frac{t-j/n}{b_n}\Big)K_d\Big(\frac{\mu(t)-\mu(0)-c}{h_d}\Big)dt = O\Big(\frac{h_d^{-v^+/(v^++1)}}{nb_nh_d}\Big),
\]
we have that
\[
\beta_n = O\Big(\frac{h_d^{-v^+/(v^++1)}}{nb_nh_d}\Big(\frac{1}{Nb_n}+\frac{1}{Nh_d}\Big)\big(b_n\wedge h_d^{1/(v^++1)}\big)+\Big(\Big(\frac{1}{Nb_n}+\frac{1}{Nh_d}\Big)\big(b_n\wedge h_d^{1/(v^++1)}\big)\Big)^2\Big), \tag{8.18}
\]
where $a\wedge b := \min(a,b)$.

For the calculation of $\bar\alpha_n$ we note that
\[
\bar K^*\Big(\frac{j/n}{b_n}\Big)K^*\Big(\frac{t-j/n}{b_n}\Big)K_d\Big(\frac{\mu(t)-\mu(0)-c}{h_d}\Big) = 0 \tag{8.19}
\]
for sufficiently large $n$. This statement follows because, by Lemma 10.1 of the online supplement, the third factor vanishes outside of (shrinking) neighbourhoods $U_1,\ldots,U_{k^+}$ of the points $t_1^+,\ldots$
, $t_{k^+}^+$ with Lebesgue measure of order $h_d^{1/(v_l^++1)}$ $(1\le l\le k^+)$. Consequently, the product of the first and the second factor vanishes whenever the point $j/n$ is not an element of the set
\[
\big\{s+t \,\big|\, t\in\cup_{j=1}^{k^+}U_j;\ s\in[-b_n,b_n]\big\}.
\]
However, if $n$ is sufficiently large, the intersection of this set with the interval $[0,b_n]$ is empty. Consequently, for sufficiently large $n$ there exists no pair $(t,j/n)$ such that all factors in (8.19) are different from zero. Therefore we obtain (recalling the notation of $\tilde K^*$ in (8.15))
\[
\bar\alpha_n = \alpha_n+\tilde\alpha_n, \tag{8.20}
\]
where
\[
\alpha_n = \frac{1}{n^2b_n^2h_d^2}\sum_{j=1}^n\Big(\int_0^1 \sigma(j/n)K^*\Big(\frac{t-j/n}{b_n}\Big)K_d\Big(\frac{\mu(t)-\mu(0)-c}{h_d}\Big)dt\Big)^2, \tag{8.21}
\]
\[
\tilde\alpha_n = \frac{1}{n^2b_n^2h_d^2}\sum_{j=1}^n\Big(\int_0^1 \sigma(j/n)\bar K^*\Big(\frac{j/n}{b_n}\Big)K_d\Big(\frac{\mu(t)-\mu(0)-c}{h_d}\Big)dt\Big)^2. \tag{8.22}
\]
In the supplementary material we will show that
\[
\alpha_n = \begin{cases} h_d^{-2v^+/(v^++1)}(nb_n)^{-1}\,\sigma^2_{+,1}\,(1+o(1)), & \text{if } b_n^{v^++1}/h_d\to\infty,\\ h_d^{-v^+/(v^++1)}(nh_d)^{-1}\,\rho^2_{+,1}\,(1+o(1)), & \text{if } b_n^{v^++1}/h_d\to r\in[0,\infty), \end{cases} \tag{8.23}
\]
\[
\tilde\alpha_n = \begin{cases} h_d^{-2v^+/(v^++1)}(nb_n)^{-1}\,\sigma^2_{+,2}\,(1+o(1)), & \text{if } b_n^{v^++1}/h_d\to\infty,\\ h_d^{-v^+/(v^++1)}(nh_d)^{-1}\,\rho^2_{+,2}\,(1+o(1)), & \text{if } b_n^{v^++1}/h_d\to r\in[0,\infty), \end{cases} \tag{8.24}
\]
where $\sigma^2_{+,1}$, $\sigma^2_{+,2}$, $\rho^2_{+,1}$ and $\rho^2_{+,2}$ are defined in Theorem 4.1. The assertion now follows from (8.1), (8.8), (8.9), (8.10) and (8.16), observing that the random variable $I'$ is normally distributed and that its (asymptotic) variance can be obtained from (8.17), (8.18), (8.20), (8.23) and (8.24). $\Box$

Proof of Theorem 4.3. We have to distinguish two cases.

(1) The equation $\mu(t)-\mu(0)=c$ has at least one solution. Recall the definition of the quantity $I'$ in (4.17); then it follows from the proof of Theorem 4.1 that
\[
\operatorname{Var}\Big(\sqrt{nb_n\,h_d^{2v^+/(v^++1)}}\,I'\Big) = \sigma^2_{+,1}+\sigma^2_{+,2}+o(1), \tag{8.25}
\]
where $\sigma^2_{+,1}$ and $\sigma^2_{+,2}$ are defined in (4.10) and (4.11), respectively.
Note that $\operatorname{Var}(I') = \frac{1}{n^2N^2b_n^2h_d^2}\tilde V$, where
\[
\tilde V = \sum_{j=1}^n \sigma^2(j/n)\Big(\sum_{i=1}^N K_d\Big(\frac{\mu(i/N)-\mu(0)-c}{h_d}\Big)\Big(K^*\Big(\frac{i/N-j/n}{b_n}\Big)-\bar K^*\Big(\frac{j}{nb_n}\Big)\Big)\Big)^2.
\]
At the end of this proof we will show that
\[
\sqrt{nb_n\,h_d^{2v^+/(v^++1)}}\,\frac{1}{nNb_nh_d}\big(\tilde V^{1/2}-\bar V^{1/2}\big) = o(1), \tag{8.26}
\]
which implies that
\[
\lim_{n\to\infty}\sqrt{nb_n\,h_d^{2v^+/(v^++1)}}\,\frac{q^+_{1-\alpha}}{nNb_nh_d} = \Phi^{-1}(1-\alpha)\sqrt{\sigma^2_{+,1}+\sigma^2_{+,2}}. \tag{8.27}
\]
Observing the identity
\[
P\Big(nNb_nh_d\big(\tilde T^+_{N,c}-\Delta\big) > q^+_{1-\alpha}\Big) \tag{8.28}
\]
\[
= P\Bigg(\frac{\sqrt{nb_n\,h_d^{2v^+/(v^++1)}}\,\big(\tilde T^+_{N,c}-T^+_c\big)}{\sqrt{\sigma^2_{+,1}+\sigma^2_{+,2}}} > \frac{\sqrt{nb_n\,h_d^{2v^+/(v^++1)}}\,\frac{q^+_{1-\alpha}}{nNb_nh_d}+\sqrt{nb_n\,h_d^{2v^+/(v^++1)}}\,\big(\Delta-T^+_c\big)}{\sqrt{\sigma^2_{+,1}+\sigma^2_{+,2}}}\Bigg),
\]
the assertion now follows from (8.27) and Theorem 4.1, which shows that the random variable
\[
\frac{\sqrt{nb_n\,h_d^{2v^+/(v^++1)}}\,\big(\tilde T^+_{N,c}-T^+_c\big)}{\sqrt{\sigma^2_{+,1}+\sigma^2_{+,2}}}
\]
converges weakly to a standard normal distribution.

It remains to prove (8.26), which is a consequence of the following observations:
(a) $\hat\sigma(t_l^+) = \sigma(t_l^+)(1+o_p(1))$, uniformly with respect to $l=1,\ldots,k^+$.
(b) The bandwidth condition $\pi_n/h_d = o(1)$, Proposition 2.1 and similar arguments as in (8.5) show that
\[
K_d\Big(\frac{\mu(\frac iN)-\mu(0)-c}{h_d}\Big)-K_d\Big(\frac{\tilde\mu_{b_n}(\frac iN)-\tilde\mu_{b_n}(0)-c}{h_d}\Big) = O\Big(\sum_{\{l:\,v_l^+=v^+\}}\mathbf 1\big(\big|\tfrac iN-t_l^+\big|\le h_d^{1/(v^++1)}\big)\,\frac{\pi_n}{h_d}\Big),
\]
where $\pi_n$ is defined in Theorem 4.1.

This completes the proof of Theorem 4.3 in the case that there exist in fact roots of the equation $\mu(t)-\mu(0)=c$.

(2) The equation $\mu(t)-\mu(0)=c$ has no solutions. In this case we have $\sup_{t\in[0,1]}(\mu(t)-\mu(0)) < c$, where $c>0$. Note that for two sequences of measurable sets $U_n$ and $V_n$ such that $P(U_n)\to 1$ and $P(U_n\cap V_n)\to u\in(0,1]$, we have $P(V_n)\to u$.
Consequently, as the set $A_n$ defined in (8.4) satisfies $P(A_n)\to 1$, it suffices to show that
\[
\lim_{n\to\infty} P\Big(nNb_nh_d\big(\tilde T^+_{N,c}-\Delta\big) > q^+_{1-\alpha},\ A_n,\ \sup_{t\in[0,1]}\big(\mu(t)-\mu(0)\big)<c\Big) = 0. \tag{8.29}
\]
However, on the event $A_n$ and under the condition $\sup_{t\in[0,1]}(\mu(t)-\mu(0))<c$ we have $q^+_{1-\alpha}=0$ and $\tilde T^+_{N,c}=0$ if $n$ is sufficiently large. Thus (8.29) is obvious (note that $0<\Delta<1$), and the assertion of Theorem 4.3 also holds if the equation $\mu(t)-\mu(0)=c$ has in fact no roots. $\Box$

Appendix. In this section we provide technical details for the proof of Theorem 4.1 and a proof of Theorem 4.4. Recall that we use the notation $e_i := \epsilon_{i,n}$ throughout this section, where $\epsilon_{i,n}$ is the non-stationary error process in model (1.4). Moreover, in all arguments given below $M$ denotes a sufficiently large constant which may vary from line to line.

9. Proofs of Theorems 4.1 and 4.4.

Proof of Theorem 4.1. Following the arguments of the main article, it remains to show (8.23) and (8.24) to complete the proof of Theorem 4.1.

Proof of (8.23): By Lemma 10.1, with $m$ replaced by $\mu(0)+c$, there exists a small positive number $\epsilon$ with $0<\epsilon<\gamma^+/2$ such that, if $n$ is sufficiently large, we have
\[
\alpha_n = \frac{1}{n^2b_n^2h_d^2}\sum_{j=1}^n\Big(\sum_{l=1}^{k^+}\int_{t_l^+-\epsilon}^{t_l^++\epsilon}\sigma(j/n)K^*\Big(\frac{t-j/n}{b_n}\Big)K_d\Big(\frac{\mu(t)-\mu(0)-c}{h_d}\Big)dt\Big)^2 = \frac{1}{n^2b_n^2h_d^2}\sum_{j=1}^n\sum_{l=1}^{k^+}\Big(\int_{t_l^+-\epsilon}^{t_l^++\epsilon}\sigma(j/n)K^*\Big(\frac{t-j/n}{b_n}\Big)K_d\Big(\frac{\mu(t)-\mu(0)-c}{h_d}\Big)dt\Big)^2 = \frac{1}{n^2b_n^2h_d^2}\sum_{j=1}^n\sum_{l=1}^{k^+}\alpha^2_{n,l,j}, \tag{8.1}
\]
where the last equation defines the quantities $\alpha_{n,l,j}$ in an obvious manner. We now calculate $\alpha_n$ for the two bandwidth conditions in (8.23).

(i) We begin with the case $b_n^{v^++1}/h_d\to\infty$, which means $b_n^{v_l^++1}/h_d\to\infty$ for $l=1,\ldots,k^+$.
By Lemma 10.1 there exists a sufficiently large constant $M$ such that
\[
\alpha_{n,l,j} = \int_{t_l^+-Mh_d^{1/(v_l^++1)}}^{t_l^++Mh_d^{1/(v_l^++1)}}\sigma(t_l^+)K^*\Big(\frac{t-j/n}{b_n}\Big)K_d\Big(\frac{\mu(t)-\mu(0)-c}{h_d}\Big)dt\;\big(1+O\big(h_d^{1/(v_l^++1)}\big)\big). \tag{8.2}
\]
Observing the fact that the kernel $K_d(\cdot)$ is bounded and continuous, we obtain, by a Taylor expansion of $\mu(t)-\mu(0)-c$ around $t_l^+$,
\[
\big|\alpha_{n,l,j}-\alpha^*_{n,l,j}\big| = O\Big(h_d^{2/(v_l^++1)}\,\mathbf 1\big(|j/n-t_l^+|\le 2b_n\big)\Big) \tag{8.3}
\]
uniformly with respect to $1\le j\le n$, where
\[
\alpha^*_{n,l,j} = \int_{t_l^+-Mh_d^{1/(v_l^++1)}}^{t_l^++Mh_d^{1/(v_l^++1)}}\sigma(t_l^+)K^*\Big(\frac{t-j/n}{b_n}\Big)K_d\Big(\frac{\mu^{(v_l^++1)}(t_l^+)(t-t_l^+)^{v_l^++1}}{(v_l^++1)!\,h_d}\Big)dt.
\]
Substituting $t = t_l^+ + z\big|h_d(v_l^++1)!/\mu^{(v_l^++1)}(t_l^+)\big|^{1/(v_l^++1)}$, observing the symmetry of $K_d(\cdot)$ and using a further Taylor expansion shows that
\[
\alpha^*_{n,l,j} = \Big|\frac{(v_l^++1)!\,h_d}{\mu^{(v_l^++1)}(t_l^+)}\Big|^{1/(v_l^++1)}\Big(\int K_d\big(z^{v_l^++1}\big)dz\Big)\sigma(t_l^+)K^*\Big(\frac{t_l^+-j/n}{b_n}\Big)+O\Big(h_d^{2/(v_l^++1)}b_n^{-1}\,\mathbf 1\big(|j/n-t_l^+|\le 2b_n\big)\Big), \tag{8.4}
\]
where we have used the fact that $\int|z|K_d(z^{v_l^++1})\,dz<\infty$, since $K_d(\cdot)$ has compact support. Equations (8.2)-(8.4) and the condition $b_n^{v^++1}/h_d\to\infty$ now give
\[
\alpha_n = \frac{1}{n^2b_n^2h_d^2}\sum_{l=1}^{k^+}\sum_{j=1}^n\Big(\Big(\int K_d(z^{v_l^++1})dz\Big)\sigma(t_l^+)\Big|\frac{(v_l^++1)!\,h_d}{\mu^{(v_l^++1)}(t_l^+)}\Big|^{1/(v_l^++1)}K^*\Big(\frac{t_l^+-j/n}{b_n}\Big)\Big)^2\big(1+o(1)\big)
\]
\[
= \sum_{l=1}^{k^+}\frac{h_d^{-2v_l^+/(v_l^++1)}}{nb_n}\Big(\int K_d(z^{v_l^++1})dz\Big)^2\big((v_l^++1)!\big)^{2/(v_l^++1)}\frac{\sigma^2(t_l^+)}{|\mu^{(v_l^++1)}(t_l^+)|^{2/(v_l^++1)}}\int\big(K^*(x)\big)^2dx\;\Big(1+O\Big(\frac{1}{nb_n}+\frac{h_d^{1/(v_l^++1)}}{b_n}\Big)\Big)
\]
\[
= h_d^{-2v^+/(v^++1)}(nb_n)^{-1}\,\sigma^2_{+,1}\big(1+o(1)\big),
\]
which proves (8.23) in the case $b_n^{v^++1}/h_d\to\infty$.

(ii) Next we turn to the case $b_n^{v^++1}/h_d\to r\in[0,\infty)$. Introduce the notation $\alpha_{n,l} = \frac{1}{n^2b_n^2h_d^2}\sum_{j=1}^n\alpha^2_{n,l,j}$ and note that
\[
\alpha_n = \sum_{l=1}^{k^+}\alpha_{n,l}. \tag{8.5}
\]
Define $c_l = b_n^{v_l^++1}/h_d$ for $l\in\{1,\ldots,k^+\}$. For those $l$ satisfying $c_l\to\infty$ we have already shown that
\[
\alpha_{n,l} = O\Big(\frac{h_d^{-2v_l^+/(v_l^++1)}}{nb_n}\Big) = o\Big(\frac{h_d^{-v_l^+/(v_l^++1)}}{nh_d}\Big) = o\Big(\frac{h_d^{-v^+/(v^++1)}}{nh_d}\Big). \tag{8.6}
\]
In the following discussion we prove that for those $l$ for which $c_l$ does not converge to infinity, the quantity $\alpha_{n,l}$ is exactly of order $O\big(h_d^{-v^+/(v^++1)}(nh_d)^{-1}\big)$. For this purpose define
\[
\alpha'_{n,l} = \frac{1}{nb_n^2h_d^2}\int_0^1\big(G(t_l^+,s,b_n,h_d)\big)^2\,ds, \tag{8.7}
\]
where
\[
G(t_l^+,s,b_n,h_d) = \int_{t_l^+-\epsilon}^{t_l^++\epsilon}\sigma(s)K^*\Big(\frac{t-s}{b_n}\Big)K_d\Big(\frac{\mu(t)-\mu(0)-c}{h_d}\Big)dt.
\]
It follows from a Taylor expansion and an approximation by a Riemann sum that
\[
\big|\alpha_{n,l}-\alpha'_{n,l}\big| \le \frac{1}{n^2b_n^2h_d^2}\sum_{j=1}^n\frac{2}{n}\sup_{\frac{j-1}{n}\le s\le\frac jn}\big|G(t_l^+,s,b_n,h_d)\big|\,\Big|\frac{\partial}{\partial s}G(t_l^+,s,b_n,h_d)\Big|. \tag{8.8}
\]
The terms in this sum can be estimated by an application of Lemma 10.1.

Proof of Theorem 4.4. Define
\[
\tilde S_{k,r} = \sum_{i=k\vee 1}^{r\wedge n} e_i, \qquad \tilde\Delta_j = \frac{\tilde S_{j-m+1,j}-\tilde S_{j+1,j+m}}{m}, \qquad \tilde\sigma^2(t) = \sum_{j=1}^n \frac{m\tilde\Delta_j^2}{2}\,\omega(t,j).
\]
Since $\mu(\cdot)$ is continuously differentiable, elementary calculations show that, uniformly for $t\in[0,1]$,
\[
|\tilde\sigma^2(t)-\hat\sigma^2(t)| = O_p\big(m^{3/2}/n\big). \tag{8.22}
\]
Similar arguments as given in the proof of Lemma 3 of Zhou and Wu (2010) yield $\sup_j\|\tilde\Delta_j\| = O(m^{-1/2})$.
A further application of this lemma gives
\[
\Big\|\sup_{t\in[\gamma_n,1-\gamma_n]}|\tilde\sigma^2(t)-E(\tilde\sigma^2(t))|\Big\| = O\big(m^{1/2}n^{-1/2}\tau_n^{-1}\big), \tag{8.23}
\]
\[
\big\|\tilde\sigma^2(t)-E(\tilde\sigma^2(t))\big\| = O\big(m^{1/2}n^{-1/2}\tau_n^{-1/2}\big). \tag{8.24}
\]
Elementary calculations show that
\[
E(\tilde\sigma^2(t)) = \Lambda_1(t)+\Lambda_2(t)+\Lambda_3(t), \tag{8.25}
\]
where
\[
\Lambda_1(t) = \frac{1}{2m}\sum_{j=1}^n \tilde S^2_{j-m+1,j}\,\omega(t,j), \qquad \Lambda_2(t) = \frac{1}{2m}\sum_{j=1}^n \tilde S^2_{j+1,j+m}\,\omega(t,j), \qquad \Lambda_3(t) = -\frac{1}{m}\sum_{j=1}^n \tilde S_{j+1,j+m}\tilde S_{j-m+1,j}\,\omega(t,j).
\]
Recall the representation $e_i = G(i/n,\mathcal F_i)$. Define $\tilde S^\diamond_{j-m+1,j} = \sum_{r=1\vee(j-m+1)}^{j} G(j/n,\mathcal F_r)$ and $\tilde S^\diamond_{j+1,j+m} = \sum_{r=j+1}^{n\wedge(j+m)} G(j/n,\mathcal F_r)$. For $s=1,2,3$ define $\Lambda^\diamond_s(t)$ as the quantity where the terms $\tilde S_{j-m+1,j}$ and $\tilde S_{j+1,j+m}$ in $\Lambda_s(t)$ are replaced by $\tilde S^\diamond_{j-m+1,j}$ and $\tilde S^\diamond_{j+1,j+m}$, respectively. Then, by Lemma 4 of Zhou and Wu (2010), we have uniformly with respect to $t\in[0,1]$,
\[
|E(\Lambda^\diamond_s(t))-E(\Lambda_s(t))| = O\big(\sqrt{m/n}\big), \quad s=1,2,3. \tag{8.26}
\]
By Lemma 5 of Zhou and Wu (2010) it follows for $s=1,2$ that
\[
|E(\Lambda^\diamond_s(t))-\sigma^2(t)/2| = O\big(m^{-1}+\tau_n\big), \quad t\in[\gamma_n,1-\gamma_n], \tag{8.27}
\]
\[
|E(\Lambda^\diamond_s(t))-\sigma^2(t)/2| = O\big(m^{-1}+\tau_n\big), \quad t\in[0,\gamma_n)\cup(1-\gamma_n,1]. \tag{8.28}
\]
Define $\Gamma(k) = E\big(G(i/n,\mathcal F_0)G(i/n,\mathcal F_k)\big)$; then similar arguments as given in the proof of Lemma 5 of Zhou and Wu (2010) yield $\Gamma(k) = O(\chi^{|k|})$. Elementary calculations show that for $1\le j\le n$
\[
E\big(\tilde S^\diamond_{j-m+1,j}\tilde S^\diamond_{j+1,j+m}\big) = \sum_{k=1}^{2m-1}\big(m-|m-k|\big)\Gamma(k) = O(1), \tag{8.29}
\]
which proves that
\[
E(\Lambda^\diamond_3(t)) = O(m^{-1}) \tag{8.30}
\]
uniformly with respect to $t\in[0,1]$. Combining these estimates yields
\[
\sup_{t\in[\gamma_n,1-\gamma_n]}|E\tilde\sigma^2(t)-\sigma^2(t)| = O\big(\sqrt{m/n}+m^{-1}+\tau_n\big), \tag{8.31}
\]
\[
\sup_{t\in[0,\gamma_n)\cup(1-\gamma_n,1]}|E\tilde\sigma^2(t)-\sigma^2(t)| = O\big(\sqrt{m/n}+m^{-1}+\tau_n\big). \tag{8.32}
\]
The theorem is now a consequence of these two equations and (8.23)-(8.25). $\Box$

10.
Some technical results.

10.1. The size of the mass excess.

Lemma 10.1. Assume that the function $\mu(\cdot)-m$ has $k$ roots $0<t_1<\ldots<t_k<1$ of order $v_i$ $(1\le i\le k)$, and define $\gamma = \min_{0\le i\le k}(t_{i+1}-t_i)$ (with the convention that $t_0=0$, $t_{k+1}=1$), such that
(i) for $1\le s\le k$ the $(v_s+1)$-th derivative of $\mu(\cdot)$ is Lipschitz continuous on the interval $I_s := [t_s-\gamma, t_s+\gamma]$;
(ii) $\mu(\cdot)$ is strictly monotone on the intervals $I_s^-$ and $I_s^+$ for $1\le s\le k$, where $I_s^- := [t_s-\gamma, t_s]$, $I_s^+ := (t_s, t_s+\gamma]$;
(iii) there exists a positive number $\epsilon$ such that $\min_{t\in[0,1]\cap(\cap_{s=1}^k \bar I_s)}|\mu(t)-m| \ge \epsilon$, where $\bar I_s := [0,t_s-\gamma)\cup(t_s+\gamma,1]$ is the complement of $I_s$.
If $A_n$ denotes the set
\[
A_n := \{s: |\mu(s)-m|\le h_n\}, \tag{B.1}
\]
then there exists a sufficiently large constant $C$ such that for any sequence $h_n\to 0$ we have
\[
\lambda(A_n) \le Ch_n^{1/(v+1)}, \tag{B.2}
\]
where $v = \max_{1\le l\le k} v_l$. Furthermore, there exists a sufficiently large constant $M$ such that
\[
A_n = \cup_{l=1}^k B_{n,l,M} \tag{B.3}
\]
when $n$ is sufficiently large, where the sets $B_{n,l,M}$ are defined by
\[
B_{n,l,M} = \{s: |s-t_l|^{v_l+1}\le Mh_n,\ |\mu(s)-m|\le h_n\}. \tag{B.4}
\]

Proof. Define for $1\le l\le k$
\[
A_{n,l} = \{s: |\mu(s)-m|\le h_n,\ |s-t_l|<\min\{\gamma,\zeta_n\}\}, \tag{B.5}
\]
where $\zeta_n$ is a sequence of real numbers which converges to zero arbitrarily slowly. We shall show that there exists a constant $n_0\in\mathbb N$ such that for $n\ge n_0$
\[
A_n = \cup_{l=1}^k A_{n,l}, \tag{B.6}
\]
\[
A_{n,l} \subseteq B_{n,l,M}, \quad 1\le l\le k, \tag{B.7}
\]
where $M$ is a sufficiently large constant. Note that (B.6) and (B.7) yield $A_n\subseteq\cup_{l=1}^k B_{n,l,M}$. By the definition of $B_{n,l,M}$ and $A_n$ we have $\cup_{l=1}^k B_{n,l,M}\subseteq A_n$, which proves (B.3). Then a straightforward calculation shows that $\lambda(B_{n,l,M}) \le Ch_n^{1/(v_l+1)} \le Ch_n^{1/(v+1)}$, and the lemma follows.

We first prove assertion (B.6). By definition, $A_n\supseteq\cup_{l=1}^k A_{n,l}$.
We now argue that there exists a sufficiently large constant $n_0$ such that $\cup_{l=1}^k A_{n,l}\supseteq A_n$ for $n\ge n_0$. Suppose this statement is not true; then there exists a sequence of points $(s_n)_{n\in\mathbb N}$ such that $s_n\in A_n$ and $s_n\in\cap_{l=1}^k\bar A_{n,l}$, where $\bar A_{n,l}$ is the complement of $A_{n,l}$. Since $h_n = o(1)$, we have $h_n<\epsilon$ for sufficiently large $n$, and by assumption (iii) there exists an $l\in\{1,\ldots,k\}$ such that $s_n\in I_l\cap A_n\cap\bar A_{n,l}$. Without loss of generality we assume that $s_n\in I_l^+\cap A_n\cap\bar A_{n,l}$; the case $s_n\in I_l^-\cap A_n\cap\bar A_{n,l}$ can be treated similarly. A Taylor expansion and assumption (i) yield, for sufficiently large $n\in\mathbb N$,
\[
\mu(s)-\mu(t_l) = \frac{\mu^{(v_l+1)}(t_l)}{(v_l+1)!}(s-t_l)^{v_l+1}+\frac{\mu^{(v_l+1)}(t_l^*)-\mu^{(v_l+1)}(t_l)}{(v_l+1)!}(s-t_l)^{v_l+1} \tag{B.8}
\]
for $s\in A_{n,l}$, where $t_l^*\in[t_l\wedge s, t_l\vee s]$. By the definition of $A_{n,l}$ in (B.5) and the fact that $\zeta_n = o(1)$, we have $A_{n,l}\subset I_l$ for sufficiently large $n\in\mathbb N$. This result, together with $s_n\in I_l^+\cap A_n\cap\bar A_{n,l}$, implies that $t_l+\zeta_n<s_n\le t_l+\gamma$. However, by assumption (ii), $\mu(\cdot)$ is strictly monotone on $I_l^+$, which yields for sufficiently large $n$
\[
|\mu(s_n)-\mu(t_l)| \ge |\mu(t_l+\zeta_n)-\mu(t_l)| > h_n, \tag{B.9}
\]
where the last inequality is due to (B.8), the Lipschitz continuity of $\mu^{(v_l+1)}(\cdot)$ in the neighbourhood of $t_l$ and the fact that $\zeta_n$ converges to zero arbitrarily slowly. Recalling the definition of $A_n$ in (B.1) and observing that $\mu(t_l)=m$, equation (B.9) implies that $s_n\notin A_n$. This contradicts the assumption that $s_n\in A_n$, from which (B.6) follows.

Now we show (B.7). Since $\mu(t_l)=m$ and the leading term in (B.8) is of order $|(s-t_l)^{v_l+1}|$, the set $A_{n,l}$ can be represented as
\[
\Big\{s:\ |s-t_l|\le\Big(\frac{h_n}{|M_{1,l}+M_{2,l}(s)|}\Big)^{1/(v_l+1)},\ |s-t_l|\le\zeta_n,\ |\mu(s)-m|\le h_n\Big\}, \tag{B.10}
\]
where $M_{1,l} = \frac{\mu^{(v_l+1)}(t_l)}{(v_l+1)!}$ and $M_{2,l}(s) = \frac{\mu^{(v_l+1)}(t_l^*)-\mu^{(v_l+1)}(t_l)}{(v_l+1)!}$ for some $t_l^*\in[t_l\wedge s, t_l\vee s]$. By the Lipschitz continuity of $\mu^{(v_l+1)}(\cdot)$ on the interval $[t_l-\gamma,t_l+\gamma]$ there exists a constant $M'_l$ such that $|M_{2,l}(s)|\le M'_l|t_l-s|$. As $\zeta_n = o(1)$, there exists an $n_l\in\mathbb N$ such that $|s-t_l|\le\frac{|M_{1,l}|}{2M'_l}$ for all $s\in A_{n,l}$ whenever $n\ge n_l$. This yields
\[
|M_{1,l}+M_{2,l}(s)| \ge \frac{|M_{1,l}|}{2}, \qquad n\ge n_l,\ s\in A_{n,l}.
\]
By choosing $n_0 = \max_{1\le l\le k}n_l$ and $M = \max_{1\le l\le k}\frac{2}{|M_{1,l}|}$, and noticing the fact that $\zeta_n\to 0$, we obtain $A_{n,l}\subseteq B_{n,l,M}$ for $n\ge n_0$. Thus (B.7) follows, which completes the proof of Lemma 10.1. $\Box$

Remark. Observe that $B_{n,i,M}\cap B_{n,j,M} = \emptyset$ for $i\ne j$ if $n$ is sufficiently large. Moreover, $B_{n,i,M}$ can be covered by closed intervals. The lemma shows that the set $\{t: |\mu(t)-m|\le h_n,\ t\in[0,1]\}$ can be decomposed into disjoint intervals containing the roots of the equation $\mu(t)=m$, with Lebesgue measure determined by the maximal order of the roots.

10.2. Uniform bounds for nonparametric estimates. In this section we present some results on the rate of uniform convergence of the Jackknife estimator $\tilde\mu_{b_n}(t)$ defined in (4.1).

Lemma 10.2. Recall the definition of $\tilde\mu_{b_n}$ in (4.1) and suppose that Assumption 2.1(a) holds. If $b_n\to 0$ and $nb_n\to\infty$, then
\[
\sup_{t\in[b_n,1-b_n]}\Big|\tilde\mu_{b_n}(t)-\mu(t)-\frac{1}{nb_n}\sum_{i=1}^n K^*\Big(\frac{i/n-t}{b_n}\Big)e_i\Big| = O\Big(b_n^3+\frac{1}{nb_n}\Big), \tag{B.11}
\]
\[
\Big|\tilde\mu_{b_n}(0)-\mu(0)-\frac{1}{nb_n}\sum_{i=1}^n \bar K^*\Big(\frac{i/n}{b_n}\Big)e_i\Big| = O\Big(b_n^3+\frac{1}{nb_n}\Big), \tag{B.12}
\]
where $K^*(\cdot)$ and $\bar K^*(\cdot)$ are defined in (4.2) and (4.4), respectively.

Proof. We only show the estimate (B.11). The other result follows similarly using Lemma B.2 of Dette, Wu and Zhou (2015b).
By Lemma B.1 of Dette, Wu and Zhou (2015b) we obtain a uniform bound for the (uncorrected) local linear estimate $\hat\mu_{b_n}$ in (2.2), that is
$$\sup_{t \in [b_n, 1-b_n]} \Big| \hat\mu_{b_n}(t) - \mu(t) - \frac{\mu_2\, \ddot\mu(t)}{2} b_n^2 - \frac{1}{nb_n} \sum_{i=1}^{n} e_i K_{b_n}(i/n - t) \Big| = O\Big( b_n^3 + \frac{1}{nb_n} \Big), \tag{B.13}$$
where $\mu_2 = \int u^2 K(u)\, du$. Then the lemma follows from the definition of $\tilde\mu_{b_n}(\cdot)$, since the jackknife construction cancels the $b_n^2$-bias term. $\Box$

Lemma 10.4. If Assumption 2.1(a) and Assumption 2.2 are satisfied and $nb_n^2/\log n \to \infty$, $b_n \to 0$, then
$$\sup_{t \in \{0\} \cup [b_n, 1-b_n]} |\tilde\mu_{b_n}(t) - \mu(t)| = O_p\Big( b_n^3 + \frac{\log n}{\sqrt{nb_n}} \Big), \tag{B.14}$$
$$\sup_{t \in [0, b_n) \cup (1-b_n, 1]} |\tilde\mu_{b_n}(t) - \mu(t)| = O_p\Big( b_n + \frac{\log n}{\sqrt{nb_n}} \Big). \tag{B.15}$$

Proof. We only prove the estimate
$$\sup_{t \in [b_n, 1-b_n]} |\tilde\mu_{b_n}(t) - \mu(t)| = O_p\Big( b_n^3 + \frac{\log n}{\sqrt{nb_n}} \Big).$$
The case $t = 0$ in (B.14) and the estimate (B.15) follow by similar arguments, which are omitted for the sake of brevity. By the stochastic expansion (B.11), it suffices to show that
$$\sup_{t \in [b_n, 1-b_n]} \Big| \frac{1}{nb_n} \sum_{i=1}^{n} K^*\Big( \frac{i/n - t}{b_n} \Big) e_i \Big| = O_p\Big( \frac{\log n}{\sqrt{nb_n}} \Big). \tag{B.16}$$
Then Assumption 2.2, Proposition 5 of Zhou (2013) and the summation by parts formula (44) in Zhou (2010) yield the existence (on a possibly richer probability space) of a sequence $(V_i)_{i \in \mathbb{Z}}$ of independent standard normally distributed random variables such that
$$\sup_{t \in [b_n, 1-b_n]} \Big| \frac{1}{nb_n} \sum_{i=1}^{n} K^*\Big( \frac{i/n - t}{b_n} \Big) (e_i - V_i) \Big| = O_p\Big( \frac{n^{1/4} \log n}{nb_n} \Big). \tag{B.17}$$
Note that $(V_i)_{i \in \mathbb{Z}}$ is a martingale difference sequence with respect to the filtration generated by $(V_{-\infty}, \ldots, V_i)$.
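Since the definition (4.1) of $\tilde\mu_{b_n}$ is not reproduced in this excerpt, the following sketch assumes the common jackknife form $\tilde\mu_{b}(t) = 2\hat\mu_{b/\sqrt{2}}(t) - \hat\mu_{b}(t)$, for which the $b^2$-bias terms of the two local linear fits in (B.13) cancel; the Epanechnikov kernel and all numerical values are illustrative assumptions, not the paper's choices:

```python
import math

def local_linear(x, y, t, b):
    """Local linear estimate of the regression function at t with bandwidth b,
    using the Epanechnikov kernel K(u) = 0.75*(1 - u^2) on [-1, 1]."""
    s0 = s1 = s2 = r0 = r1 = 0.0
    for xi, yi in zip(x, y):
        u = (xi - t) / b
        if abs(u) >= 1.0:
            continue
        k = 0.75 * (1.0 - u * u)
        d = xi - t
        s0 += k; s1 += k * d; s2 += k * d * d
        r0 += k * yi; r1 += k * d * yi
    return (s2 * r0 - s1 * r1) / (s0 * s2 - s1 * s1)

def jackknife(x, y, t, b):
    """Bias-corrected estimate 2*muhat_{b/sqrt(2)} - muhat_b (assumed form of
    (4.1)); the b^2 smoothing-bias terms of the two fits cancel."""
    return 2.0 * local_linear(x, y, t, b / math.sqrt(2)) - local_linear(x, y, t, b)

# Noise-free illustration (e_i = 0): only the smoothing bias remains.
n, b = 2000, 0.1
x = [i / n for i in range(1, n + 1)]
y = [math.sin(2 * math.pi * xi) for xi in x]   # mu(t) = sin(2*pi*t)
t = 0.25
print(abs(local_linear(x, y, t, b) - 1.0))   # O(b^2) bias of the raw fit
print(abs(jackknife(x, y, t, b) - 1.0))      # higher-order bias only
```

In the noise-free run the raw local linear error is of the size $\tfrac{1}{2}\mu_2 \ddot\mu(t) b^2 \approx 0.04$, while the jackknife combination leaves only higher-order terms, which is exactly why the remainder in (B.11) improves to $O(b_n^3 + 1/(nb_n))$.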
By Burkholder's inequality it follows that for any positive $\kappa$ and a sufficiently large universal constant $C$ the inequality
$$\Big\| \sum_{i=1}^{n} V_i K^*_{b_n}(i/n - t) \Big\|_\kappa \le C \sqrt{\kappa}\, \Big\| \Big( \sum_{i=1}^{n} \{ V_i K^*_{b_n}(i/n - t) \}^2 \Big)^{1/2} \Big\|_\kappa \le C \sqrt{\kappa}\, \Big( \sum_{i=1}^{n} \big\| (V_i K^*_{b_n}(i/n - t))^2 \big\|_{\kappa/2} \Big)^{1/2} = C \sqrt{\kappa}\, \| V_1 \|_\kappa \Big( \sum_{i=1}^{n} (K^*_{b_n}(i/n - t))^2 \Big)^{1/2} \le C \kappa (nb_n)^{1/2}$$
holds uniformly with respect to $t \in [b_n, 1-b_n]$, where we have used that $E|V_1|^\kappa \le (\kappa - 1)!! \le \kappa^{\kappa/2}$ in the last inequality. This leads to
$$\sup_{t \in [b_n, 1-b_n]} \Big\| \frac{1}{nb_n} \sum_{i=1}^{n} K^*_{b_n}(i/n - t) V_i \Big\|_\kappa = O\Big( \frac{\kappa}{\sqrt{nb_n}} \Big).$$
Similarly, we obtain
$$\sup_{t \in [b_n, 1-b_n]} \Big\| \frac{1}{nb_n} \sum_{i=1}^{n} \frac{\partial}{\partial t} K^*_{b_n}(i/n - t) V_i \Big\|_\kappa = O\Big( \frac{\kappa b_n^{-1}}{\sqrt{nb_n}} \Big).$$
Consequently, Proposition B.1 of Dette, Wu and Zhou (2015b) shows that
$$\Big\| \sup_{t \in [b_n, 1-b_n]} \Big| \frac{1}{nb_n} \sum_{i=1}^{n} K^*_{b_n}(i/n - t) V_i \Big| \Big\|_\kappa = O\Big( \frac{\kappa b_n^{-2/\kappa}}{\sqrt{nb_n}} \Big).$$
The result now follows choosing $\kappa = \log(b_n^{-1})$ and observing the conditions on the bandwidths. $\Box$

Ruhr-Universität Bochum
Building IB 2/65
Universitätsstrasse 150
D-44801 Bochum, Germany
E-mail: [email protected]
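The factor $(nb_n)^{1/2}$ in the Burkholder bound above stems from the Riemann-sum approximation $\sum_{i=1}^{n} K^*_{b_n}(i/n - t)^2 \approx nb_n \int K^*(u)^2\, du$. A quick numerical check of this approximation, using the Epanechnikov kernel as a stand-in for $K^*$ (the definition (4.2) is not reproduced in this excerpt, so the kernel and constants are illustrative assumptions):

```python
# Check sum_i K((i/n - t)/b)^2 ≈ n*b*∫K(u)^2 du, the source of the (n*b_n)^{1/2}
# factor in the Burkholder bound. Epanechnikov kernel as a stand-in for K*.

def kernel(u):
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def squared_kernel_sum(n, b, t):
    return sum(kernel((i / n - t) / b) ** 2 for i in range(1, n + 1))

n, b, t = 10_000, 0.05, 0.5
# For the Epanechnikov kernel, ∫_{-1}^{1} (0.75*(1 - u^2))^2 du = 3/5.
print(squared_kernel_sum(n, b, t), n * b * 0.6)
```

The two printed values agree up to the Riemann-sum discretization error, so the variance of $\sum_i V_i K^*_{b_n}(i/n - t)$ is of exact order $nb_n$, and normalizing by $(nb_n)^{-1}$ leaves the $O(\kappa/\sqrt{nb_n})$ rate used above.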