Empirical process theory for locally stationary processes

Abstract

We provide a framework for empirical process theory of locally stationary processes using the functional dependence measure. Our results extend known results for stationary mixing sequences to another common way of measuring dependence and allow for additional time dependence. We develop maximal inequalities for expectations and provide functional limit theorems and Bernstein-type inequalities. We show their applicability to a variety of situations; for instance, we prove the weak functional convergence of the empirical distribution function and uniform convergence rates for kernel density and regression estimation if the observations are locally stationary processes.


arXiv [math.ST] (July). Submitted to Bernoulli.


NATHAWUT PHANDOIDAEN and STEFAN RICHTER, Institut für angewandte Mathematik, Im Neuenheimer Feld 205, Universität Heidelberg


MSC 2010 subject classifications:

Primary 60F17; secondary 60F10.

Keywords:

Empirical process theory, Functional dependence measure, Maximal inequality, Functional central limit theorem, Locally stationary processes.

1. Introduction

Empirical process theory is a powerful tool for proving uniform convergence rates and weak convergence of composite functionals. The theory for independent variables is well studied (cf. [22] for an overview) and is based on the original ideas of [6], [9], [10] and [20]. For more general random variables, various approaches that quantify dependence via mixing have been discussed. A complete theory using entropy with bracketing is available for φ-(uniform)-mixing and β-(absolutely regular)-mixing sequences (cf. [5] for an overview and in particular [7] for the theory of β-mixing). To our knowledge, no other dependence measure has been shown to allow for such a rich theory. This theory was mainly formalized for stationary processes; with some effort, a generalization to locally stationary processes should be possible, but it has not been carried out. Unfortunately, even though mixing is an intuitive assumption, it is hard to verify for time series models because it involves a supremum over two different sigma-algebras. The verification requires considerable work even for relatively simple stationary models such as ARMA (cf. [17]) or GARCH (cf. [11]), and specific continuity conditions have to be imposed on the distribution of the innovations. For locally stationary models, even fewer results are known in this direction (cf. for instance [13] for tvARCH models), and additional difficulties arise from the time-varying distributions. Therefore, mixing does not seem ideally suited for an empirical process theory for locally stationary processes.

In this paper we introduce an empirical process theory based on the functional dependence measure (introduced in [24]) instead of mixing coefficients. It was shown in [26] that for a large variety of recursively defined models and linear models, the functional dependence measure is easy to calculate and only moment conditions on the innovations are required.
Therefore, stating conditions on the dependence measure instead of on the decay of certain mixing coefficients can be viewed as an assumption that is weaker and often easier to verify in theoretical models.

It has been shown in various applications that the functional dependence measure, combined with the rich theory of martingales, allows for sharp large deviation inequalities (cf. [27] or [28]). Whereas β- or φ-mixing quantifies dependence through probabilities of events from different sigma-algebras, the functional dependence measure uses a representation of the given process as a Bernoulli shift process and quantifies dependence with an $L^\nu$-norm. More precisely, we assume that $X_i = (X_{ij})_{j=1,\dots,d}$, $i = 1,\dots,n$, is a $d$-dimensional process of the form

$$X_i = J_{i,n}(\mathcal G_i), \tag{1.1}$$

where $\mathcal G_i = \sigma(\varepsilon_i, \varepsilon_{i-1}, \dots)$ is the sigma-algebra generated by $\varepsilon_i$, $i \in \mathbb Z$, a sequence of i.i.d. random variables in $\mathbb R^{\tilde d}$ ($d, \tilde d \in \mathbb N$), and $J_{i,n}: (\mathbb R^{\tilde d})^{\mathbb N} \to \mathbb R^d$, $i = 1,\dots,n$, $n \in \mathbb N$, are measurable functions. For a real-valued random variable $W$ and some $\nu > 0$, we define $\|W\|_\nu := \mathbb E[|W|^\nu]^{1/\nu}$. If $\varepsilon_k^*$ is an independent copy of $\varepsilon_k$, independent of $\varepsilon_i$, $i \in \mathbb Z$, we define $\mathcal G_i^{*(i-k)} := (\varepsilon_i, \dots, \varepsilon_{i-k+1}, \varepsilon_{i-k}^*, \varepsilon_{i-k-1}, \dots)$ and $X_i^{*(i-k)} := J_{i,n}(\mathcal G_i^{*(i-k)})$. The uniform functional dependence measure is given by

$$\delta_\nu^X(k) = \sup_{i=1,\dots,n}\ \sup_{j=1,\dots,d} \big\| X_{ij} - X_{ij}^{*(i-k)} \big\|_\nu. \tag{1.2}$$

Although representation (1.1) appears rather restrictive, it covers a large variety of processes: in [3] it was argued that the set of all processes of the form $X_i = J(\varepsilon_i, \varepsilon_{i-1}, \dots)$ should coincide with the set of all stationary and ergodic processes. We additionally allow $J$ to vary with $i$ and $n$ to cover processes that change their stochastic behavior over time. This is exactly the form of the so-called locally stationary processes discussed in [4]. If both the functional dependence measure and mixing coefficients are available, many examples reveal that they lead to similar decay rates. These observations justify the use of the functional dependence measure for empirical process theory and raise hope that results similar to those in the mixing framework can be derived.

Since we are working in the time series context, many applications require functions $f$ that depend not only on the current observation of the process but on the whole (infinite) past $Z_i := (X_i, X_{i-1}, X_{i-2}, \dots)$. In this paper, we aim to derive asymptotic results for the empirical process

$$\mathbb G_n(f) := \frac{1}{\sqrt n}\sum_{i=1}^n \Big\{ f\big(Z_i, \tfrac in\big) - \mathbb E f\big(Z_i, \tfrac in\big) \Big\}, \qquad f \in \mathcal F, \tag{1.3}$$

where $\mathcal F \subset \{ f: (\mathbb R^d)^{\mathbb N} \times [0,1] \to \mathbb R \text{ measurable} \}$. Let $\mathbb H(\varepsilon, \mathcal F, \|\cdot\|)$ denote the bracketing entropy, that is, the logarithm of the number of $\varepsilon$-brackets with respect to some semi-norm $\|\cdot\|$ that are necessary to cover $\mathcal F$ (this is made precise at the end of this section).
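The following Python sketch illustrates (1.1) and (1.2) for a hypothetical example that is not taken from the paper: a time-varying AR(1) process $X_i = a(\tfrac in)X_{i-1} + \varepsilon_i$ with standard normal innovations, written as a Bernoulli shift that is truncated after a finite number of lags. The coefficient function `a`, the truncation `lags` and all function names are our own assumptions. The sketch replaces $\varepsilon_{i-k}$ by an independent copy and estimates $\|X_i - X_i^{*(i-k)}\|_2$ by Monte Carlo; for this linear model the exact value is $\sqrt2\,\prod_{j=0}^{k-1}|a(\tfrac{i-j}{n})|$, which serves as a check. Taking the maximum over $i$ gives the uniform measure $\delta_2^X(k)$ (here with $d = 1$).

```python
import numpy as np

def a(u):
    # hypothetical time-varying AR(1) coefficient, 0.3 <= a(u) <= 0.7
    return 0.5 + 0.2 * np.sin(2 * np.pi * u)

def shift_coefficients(i, n, lags):
    # c[l] = prod_{j=0}^{l-1} a((i - j)/n) is the weight of eps_{i-l} in X_i,
    # i.e. X_i = sum_l c[l] * eps_{i-l} (Bernoulli shift truncated at `lags`)
    c = np.ones(lags)
    for l in range(1, lags):
        c[l] = c[l - 1] * a((i - l + 1) / n)
    return c

def delta_mc(i, n, k, lags=100, reps=10_000, seed=0):
    # Monte Carlo estimate of || X_i - X_i^{*(i-k)} ||_2
    rng = np.random.default_rng(seed)
    c = shift_coefficients(i, n, lags)
    eps = rng.standard_normal((reps, lags))      # column l holds eps_{i-l}
    eps_star = eps.copy()
    eps_star[:, k] = rng.standard_normal(reps)   # independent copy of eps_{i-k}
    diff = eps @ c - eps_star @ c                # X_i - X_i^{*(i-k)}
    return np.sqrt(np.mean(diff ** 2))

def delta_exact(i, n, k):
    # linear model: X_i - X_i^{*(i-k)} = c[k] * (eps_{i-k} - eps*_{i-k})
    return np.sqrt(2) * abs(shift_coefficients(i, n, k + 1)[k])

def delta_uniform(n, k):
    # delta_2^X(k) from (1.2): supremum over i = 1, ..., n
    return max(delta_exact(i, n, k) for i in range(1, n + 1))
```

For example, `delta_mc(50, 100, 3)` should agree with `delta_exact(50, 100, 3)` up to Monte Carlo error, and `delta_uniform(100, k)` decays geometrically in `k` because $|a| \le 0.7$.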
We will define a semi-norm $V(\cdot)$ which guarantees weak convergence of (1.3) if the corresponding bracketing entropy integral $\int \sqrt{\mathbb H(\varepsilon, \mathcal F, V)}\, d\varepsilon$ is finite. In the framework of β-mixing, [21] argues that the choice of the specific norm used to measure the size of the brackets is connected to the dependence structure of $X_i$. A main tool for deriving uniform results over $f \in \mathcal F$ is the fact that if $X_i$ is β-mixing with coefficients $\beta(k)$, the same holds for $f(X_i, \tfrac in)$. When using the functional dependence measure (1.2), the situation is more complicated: in order to quantify the dependence of $f(X_i, \tfrac in)$ (or, more generally, of $f(Z_i, \tfrac in)$) through $\delta_\nu^X$, we have to impose smoothness conditions on $f$ with respect to its first argument. The semi-norm $V(\cdot)$ therefore not only changes with the dependence structure of $X$, but also has to be “compatible” with the function class $\mathcal F$. The smoothness condition on $f$ also poses a challenge for chaining procedures, in which rare events are excluded by (non-smooth) indicator functions. We will see that, despite these facts, our theory is not restricted to smooth function classes: if the distribution of $\varepsilon$ has a Lebesgue density, it is often possible to decompose $\mathbb G_n(f)$ into a martingale part and an integrated smooth part even if $f$ itself is not smooth.

Our main contributions in this paper are the following. We derive
• maximal inequalities for $\mathbb G_n(f)$ over classes of functions $\mathcal F$,
• a chaining device which preserves smoothness during the chaining procedure,
• conditions ensuring asymptotic tightness and functional convergence of $\mathbb G_n(f)$, $f \in \mathcal F$,
• Bernstein-type large deviation inequalities.

Specifically, we generalize the results derived in [25] and [16], which consider weak convergence of the empirical distribution function for stationary processes and for piecewise locally stationary processes, to general function classes $\mathcal F$.

The paper is organized as follows.
In Section 2, we introduce the main definitions and assumptions on the function class $\mathcal F$, define the semi-norm $V(\cdot)$ and give examples of its form. We show that $V(f)^2$ is an upper bound for the variance of $\mathbb G_n(f)$. Section 3 considers the case where $\mathcal F$ consists of smooth functions. We derive maximal inequalities for $\mathbb G_n(f)$ and a functional central limit theorem under minimal moment conditions. Section 4 extends the results to non-smooth function classes $\mathcal F$, while Section 5 provides large deviation inequalities of Bernstein type. Our main results are Corollaries 3.3 and 3.14 as well as Corollaries 4.3 and 4.14. In Section 6, we use the theory of Sections 3 and 4 to prove uniform convergence rates for nonparametric regression estimation and M-estimation, and weak convergence of the empirical distribution function. The aim of that section is to highlight the wide range of applicability of our theory, to provide the typical conditions which have to be imposed, and to discuss them. In Section 7, a conclusion is drawn. We postpone all detailed proofs to the Supplementary Material (Supplement A) but illustrate the main steps in the article.

Let $a \wedge b := \min\{a, b\}$ and $a \vee b := \max\{a, b\}$ for $a, b \in \mathbb R$, and for $k \in \mathbb N$ let

$$H(k) := 1 \vee \log(k), \tag{1.4}$$

which naturally appears in large deviation inequalities. For a given finite class $\mathcal F$, let $|\mathcal F|$ denote its cardinality. We use the abbreviation

$$H = H(|\mathcal F|) = 1 \vee \log|\mathcal F| \tag{1.5}$$

if no confusion arises. For some semi-norm $\|\cdot\|$, let $N(\varepsilon, \mathcal F, \|\cdot\|)$ denote the bracketing numbers, that is, the smallest number of $\varepsilon$-brackets $[l_j, u_j] := \{f \in \mathcal F: l_j \le f \le u_j\}$ (i.e. measurable functions $l_j, u_j \in \mathcal F$ with $\|u_j - l_j\| \le \varepsilon$ for all $j$) needed to cover $\mathcal F$. Let $\mathbb H(\varepsilon, \mathcal F, \|\cdot\|) := \log N(\varepsilon, \mathcal F, \|\cdot\|)$ denote the bracketing entropy. The requirement that the limit functions $l_j, u_j$ belong to $\mathcal F$ is discussed in Remark 2.12.
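As a concrete illustration of bracketing numbers (our own example, not from the paper), consider the class of indicators $f_x = \mathbb 1_{\{\cdot \le x\}}$, $x \in \mathbb R$, under the $L^2$ norm of an assumed standard normal distribution with CDF $F$. For $x_{j-1} < x_j$, the bracket $[\mathbb 1_{\{\cdot\le x_{j-1}\}}, \mathbb 1_{\{\cdot\le x_j\}}]$ has size $(F(x_j) - F(x_{j-1}))^{1/2}$, so cutting the probability scale into steps of at most $\varepsilon^2$ yields $\lceil \varepsilon^{-2}\rceil$ brackets, i.e. a bracketing entropy of about $2\log(\varepsilon^{-1})$. The two outermost brackets use the constant functions 0 and 1 as limits. The helper names are hypothetical; `H` evaluates (1.4).

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # assumed data distribution for this illustration

def cdf(x, dist=STD_NORMAL):
    # F(x), extended to x = -inf and x = +inf
    if x == -math.inf:
        return 0.0
    if x == math.inf:
        return 1.0
    return dist.cdf(x)

def indicator_brackets(eps, dist=STD_NORMAL):
    # cut [0, 1] into N = ceil(eps^-2) probability steps of size 1/N <= eps^2;
    # bracket j is [1{. <= x_{j-1}}, 1{. <= x_j}] with x_j = F^{-1}(j / N)
    N = math.ceil(1 / eps ** 2)
    xs = [-math.inf] + [dist.inv_cdf(j / N) for j in range(1, N)] + [math.inf]
    return list(zip(xs[:-1], xs[1:]))

def bracket_size(lo, hi, dist=STD_NORMAL):
    # || 1{. <= hi} - 1{. <= lo} ||_2 = sqrt(F(hi) - F(lo))
    return math.sqrt(cdf(hi, dist) - cdf(lo, dist))

def H(k):
    # H(k) = 1 v log(k), see (1.4)
    return max(1.0, math.log(k))
```

For instance, `indicator_brackets(0.1)` returns 100 brackets, each of $L^2$ size about 0.1, so the bracketing entropy at $\varepsilon = 0.1$ is $\log 100 = 2\log 10$.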

2. Derivation of the semi-norm, main definitions and assumptions on the function class

In this section, we provide the basic assumptions on $\mathcal F$ and the definition of the semi-norm $V(\cdot)$ which is used to measure the size of the brackets. Recall that for $\nu > 0$ and a real-valued random variable $W$, we put $\|W\|_\nu := \mathbb E[|W|^\nu]^{1/\nu}$. Furthermore, for $f \in \mathcal F$, let

$$\|f\|_{\nu,n} := \Big(\frac1n\sum_{i=1}^n \big\|f\big(Z_i, \tfrac in\big)\big\|_\nu^\nu\Big)^{1/\nu}.$$

Our theory is mainly based on the case $\nu = 2$. A basic property that a semi-norm $V(\cdot)$ has to fulfill in a chaining procedure is that its square is an upper bound for the variance of $\mathbb G_n(f)$, that is, $\mathrm{Var}(\mathbb G_n(f)) \le V(f)^2$.

For $k \ge 0$ and a sequence $W_i = \tilde J_{i,n}(\mathcal G_i)$ with $\|W_i\|_2 < \infty$, let $P_{i-k}W_i := \mathbb E[W_i|\mathcal G_{i-k}] - \mathbb E[W_i|\mathcal G_{i-k-1}]$. Then $(P_{i-k}W_i)_{i\in\mathbb N}$ is a martingale difference sequence with respect to $(\mathcal G_i)_{i\in\mathbb N}$, and $W_i - \mathbb E W_i = \sum_{k=0}^\infty P_{i-k}W_i$. By the projection property of the conditional expectation and an elementary property of $\delta^W$ (cf. [24], Theorem 1), we have

$$\|P_{i-k}W_i\|_2 \le \min\{\|W_i\|_2, \delta_2^W(k)\}. \tag{2.1}$$

Write $Y_i := f(Z_i, \tfrac in)$. Since $\min\{a_1, b_1\} + \min\{a_2, b_2\} \le \min\{a_1 + a_2, b_1 + b_2\}$ for nonnegative real numbers $a_1, b_1, a_2, b_2$, we obtain

$$\begin{aligned}
\mathrm{Var}(\mathbb G_n(f))^{1/2} &\le \sum_{k=0}^\infty \Big\|\frac{1}{\sqrt n}\sum_{i=1}^n P_{i-k}Y_i\Big\|_2 = \sum_{k=0}^\infty \Big(\frac1n\sum_{i=1}^n \|P_{i-k}Y_i\|_2^2\Big)^{1/2}\\
&\le \sum_{k=0}^\infty \Big(\frac1n\sum_{i=1}^n \min\big\{\|Y_i\|_2^2, \delta_2^{f(Z,\frac in)}(k)^2\big\}\Big)^{1/2}\\
&\le \sum_{k=0}^\infty \min\Big\{\|f\|_{2,n}, \Big(\frac1n\sum_{i=1}^n \delta_2^{f(Z,\frac in)}(k)^2\Big)^{1/2}\Big\}.
\end{aligned} \tag{2.2}$$

To further bound (2.2), we therefore have to investigate, for $u \in [0,1]$,

$$\delta_2^{f(Z,u)}(k) = \sup_{i\in\mathbb Z}\big\|f(Z_i, u) - f(Z_i^{*(i-k)}, u)\big\|_2. \tag{2.3}$$

Due to the linear nature of the functional dependence measure, it is necessary to impose some quantitative smoothness assumption on $f \in \mathcal F$ in order to derive upper bounds for $\delta_2^{f(Z,u)}(k)$ in terms of the functional dependence measure of $X_i$ from (1.2).
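For a quick numerical check of (2.1) and (2.2), the following sketch (our own example, not from the paper) takes a stationary, truncated linear process $X_i = \sum_{l\ge0} a_l\varepsilon_{i-l}$ with i.i.d. innovations of mean zero and unit variance, and $f(z, u) = z_0$, so that $f(Z_i, \tfrac in) = X_i$. In this case $P_{i-k}X_i = a_k\varepsilon_{i-k}$, hence $\|P_{i-k}X_i\|_2 = |a_k|$, $\delta_2^X(k) = \sqrt2|a_k|$ and $\|f\|_{2,n} = (\sum_l a_l^2)^{1/2}$, while the exact variance of $\mathbb G_n(f)$ follows from the autocovariances. The coefficient sequence and the function name are assumptions.

```python
import numpy as np

def linear_process_bounds(coef, n):
    """Terms of (2.1)-(2.2) for X_i = sum_l coef[l] * eps_{i-l}, with i.i.d.
    eps of mean 0 and variance 1, and f(z, u) = z_0, so f(Z_i, i/n) = X_i."""
    coef = np.asarray(coef, dtype=float)
    norm_f = np.sqrt(np.sum(coef ** 2))        # ||f||_{2,n} = ||X_i||_2
    proj = np.abs(coef)                        # ||P_{i-k} X_i||_2 = |coef[k]|
    delta = np.sqrt(2) * np.abs(coef)          # delta_2^X(k) = sqrt(2) |coef[k]|
    # exact Var(G_n(f)) = sum_{|h| < n} (1 - |h|/n) * gamma(h)
    L = len(coef)
    gamma = np.array([coef[: L - h] @ coef[h:] for h in range(L)])
    w = np.clip(n - np.arange(L), 0, None) / n
    var = gamma[0] + 2 * np.sum(w[1:] * gamma[1:])
    return {
        "proj": proj,
        "rhs_2_1": np.minimum(norm_f, delta),            # right-hand side of (2.1)
        "sd": np.sqrt(var),                              # Var(G_n(f))^{1/2}
        "proj_sum": proj.sum(),                          # first bound in (2.2)
        "final_bound": np.minimum(norm_f, delta).sum(),  # last bound in (2.2)
    }

res = linear_process_bounds(0.6 ** np.arange(30), n=50)
```

With geometric coefficients $a_l = 0.6^l$, the chain $\mathrm{sd} \le \sum_k |a_k| \le \sum_k \min\{\|f\|_{2,n}, \delta_2^X(k)\}$ can be read off `res`.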
When imposing such a smoothness assumption, we “lose” the properties of $f$ and especially of $\|f\|_{2,n}$; that is, our goal should be to bound (2.3) by some quantity which is completely independent of the specific $f$. To obtain a rich enough theory for locally stationary processes, it is necessary to allow $f$ to depend on $n$ and to include classes $\mathcal F$ where parts of $f$ change the convergence rate of $\mathbb G_n(f)$. We especially have in mind the case

$$f(z, u) = \frac{1}{\sqrt h} K\Big(\frac{u - v}{h}\Big)\cdot \bar f(z, u),$$

where $\bar f: (\mathbb R^d)^{\mathbb N}\times[0,1] \to \mathbb R$ is measurable, $K: \mathbb R \to \mathbb R$ is some kernel function, $h = h_n$ some bandwidth and $v \in [0,1]$ some value which is either fixed or varies with $f$. To cover such cases, we require that each $f \in \mathcal F$ can be written in the form

$$f(z, u) = D_{f,n}(u)\cdot \bar f(z, u), \qquad z \in (\mathbb R^d)^{\mathbb N},\ u \in [0,1], \tag{2.4}$$

where $D_{f,n}(u) \in \mathbb R$ does not depend on $z$. We put

$$\bar{\mathcal F} := \{\bar f: f \in \mathcal F\}. \tag{2.5}$$

Given some decreasing sequence $\Delta(k)$ and some $D_n$ which fulfill

$$\sup_{u\in[0,1]}\sup_{f\in\mathcal F}\delta_2^{\bar f(Z,u)}(k) \le \Delta(k), \qquad \sup_{f\in\mathcal F}\Big(\frac1n\sum_{i=1}^n D_{f,n}\big(\tfrac in\big)^2\Big)^{1/2} \le D_n, \tag{2.6}$$

we obtain from (2.2) that

$$\mathrm{Var}(\mathbb G_n(f))^{1/2} \le \sum_{k=0}^\infty \min\{\|f\|_{2,n}, D_n\Delta(k)\},$$

which motivates the definition of $V$ in the next subsection.

For some decreasing nonnegative sequence $(\Delta(k))_{k\in\mathbb N}$ of real numbers with $\sum_{k=0}^\infty \Delta(k) < \infty$ and some nonnegative sequence $(D_n)_{n\in\mathbb N}$ of real numbers, we define

$$V(f) := \|f\|_{2,n} + \sum_{k=1}^\infty \min\{\|f\|_{2,n}, D_n\Delta(k)\}, \qquad f \in \mathcal F.$$

The following lemma collects some properties of $V$ and in particular shows that $V$ is a semi-norm. The proof is straightforward and therefore omitted. Lemma 2.1.

Let $f, g \in \mathcal F$ and $a \in \mathbb R$. Then
(i) $V(0) = 0$, $V(f + g) \le V(f) + V(g)$ and $V(a\cdot f) = |a|V(f)$,
(ii) $|f| \le g \Rightarrow V(f) \le V(g)$,
(iii) $\|f\|_{1,n}, \|f\|_{2,n} \le V(f)$, and $V(f) \le V(\|f\|_\infty) < \infty$ if $\|f\|_\infty < \infty$.

Since we will later assume that $\mathcal F$ fulfills (2.6) (and thus $\mathbb G_n(f)$ is properly standardized), it is reasonable to suppose that $D_n \in (0,\infty)$ is independent of $n \in \mathbb N$. In this case, simpler forms of $V$ can be derived for special cases of $\Delta(k)$; they are given in Table 1. Note that if $f(Z_i, \tfrac in)$, $i = 1,\dots,n$, are independent, then $\delta_2^{f(Z,u)}(k) = 0$ for $k > 0$ and $V(f) = \|f\|_{2,n}$. Our theory therefore exactly recovers the case of independent variables.

Remark 2.2 (Comparison with β-mixing). If $f(Z_i, \tfrac in)$ is β-mixing with coefficients $\beta(k) = c\cdot k^{-\alpha} = \Delta(k)$, $k \in \mathbb N$, for some $c > 0$, $\alpha > 1$, then it was shown in [5] that $\mathbb G_n(f)$ is asymptotically tight if the entropy integral satisfies

$$\int \sqrt{\mathbb H\big(\varepsilon, \mathcal F, \|\cdot\|_{\frac{2\alpha}{\alpha-1},n}\big)}\, d\varepsilon < \infty. \tag{2.7}$$

$$\begin{array}{l|c|c}
\Delta(j) & c\,j^{-\alpha},\ \alpha > 1,\ c > 0 & c\,\rho^j,\ \rho \in (0,1),\ c > 0 \\ \hline
V(f) & \|f\|_{2,n}\max\{\|f\|_{2,n}^{-1/\alpha}, 1\} & \|f\|_{2,n}\max\{\log(\|f\|_{2,n}^{-1}), 1\} \\
\int_0^\sigma \sqrt{\mathbb H(\varepsilon, \mathcal F, V)}\, d\varepsilon & \int_0^{\tilde\sigma} \varepsilon^{-1/\alpha}\sqrt{\mathbb H(\varepsilon, \mathcal F, \|\cdot\|_{2,n})}\, d\varepsilon & \int_0^{\tilde\sigma} \log(\varepsilon^{-1})\sqrt{\mathbb H(\varepsilon, \mathcal F, \|\cdot\|_{2,n})}\, d\varepsilon
\end{array}$$

Table 1. Equivalent expressions of $V$ and the corresponding entropy integral, taken from Lemma 8.15 and Lemma 8.16, respectively, of the Supplementary Material (Supplement A, Section 8.8), under the condition that $D_n \in (0,\infty)$ is independent of $n$. We omitted the lower and upper bound constants, which depend only on $c, \rho, \alpha$ and $D_n$. Furthermore, $\tilde\sigma = \tilde\sigma(\sigma)$ fulfills $\tilde\sigma \to 0$ for $\sigma \to 0$.

In this framework, we therefore pay the price for dependence with a higher number of moments of $f(Z_i, u)$ instead of an additional factor $\varepsilon^{-1/\alpha}$ in the entropy integral (cf. Table 1).

There is a special case where both entropy integrals have comparable values: if the brackets $[l, u]$ which yield $\mathbb H(\varepsilon, \mathcal F, \|\cdot\|_{\frac{2\alpha}{\alpha-1},n})$ have the property that $|u - l|^2 = |u - l|^{\frac{2\alpha}{\alpha-1}}$ (we especially have in mind the case that $l, u$ are indicator functions), then it is easy to see that $\mathbb H(\varepsilon, \mathcal F, \|\cdot\|_{\frac{2\alpha}{\alpha-1},n}) \le \mathbb H(\varepsilon^{\frac{\alpha}{\alpha-1}}, \mathcal F, \|\cdot\|_{2,n})$. By the substitution $u = \varepsilon^{\frac{\alpha}{\alpha-1}}$, (2.7) is then upper bounded by

$$\frac{\alpha-1}{\alpha}\int u^{-1/\alpha}\sqrt{\mathbb H(u, \mathcal F, \|\cdot\|_{2,n})}\, du,$$

that is, the integrand is of the same order as in the entropy integral $\int\sqrt{\mathbb H(\varepsilon, \mathcal F, V)}\, d\varepsilon$ (cf. Table 1).

We now give conditions under which $\mathcal F$ satisfies (2.6), based on statements about the functional dependence measure of $X_i$. By the linear nature of the functional dependence measure, it is necessary to establish quantitative smoothness assumptions on the elements of $\mathcal F$. We do so by asking for Hölder-type smoothness. For $s \in (0,1]$, a sequence $z = (z_i)_{i\in\mathbb N}$ of elements of $\mathbb R^d$ (equipped with the maximum norm $|\cdot|_\infty$) and an absolutely summable sequence $\chi = (\chi_i)_{i\in\mathbb N}$ of nonnegative real numbers, put

$$|z|_{\chi,s} := \Big(\sum_{i=0}^\infty \chi_i|z_i|_\infty^s\Big)^{1/s} \qquad\text{and}\qquad |z|_\chi := |z|_{\chi,1}.$$

We summarize the smoothness conditions on $\mathcal F$ in the following definition (recall (2.5)).

Definition 2.3 ($(L_{\mathcal F}, s, R, C)$-class).
We call a class $\bar{\mathcal F}$ of functions $\bar f: (\mathbb R^d)^{\mathbb N}\times[0,1] \to \mathbb R$ an $(L_{\mathcal F}, s, R, C)$-class if $L_{\mathcal F} = (L_{\mathcal F,i})_{i\in\mathbb N}$ is a sequence of nonnegative real numbers, $s \in (0,1]$, and $R: (\mathbb R^d)^{\mathbb N}\times[0,1] \to [0,\infty)$ satisfies, for all $u \in [0,1]$, $z, z' \in (\mathbb R^d)^{\mathbb N}$ and $\bar f \in \bar{\mathcal F}$,

$$|\bar f(z, u) - \bar f(z', u)| \le |z - z'|_{L_{\mathcal F},s}^s\cdot\big[R(z, u) + R(z', u)\big].$$

Furthermore, $C = (C_R, C_{\bar f}) \in (0,\infty)^2$ satisfies $\sup_u|\bar f(0, u)| \le C_{\bar f}$ and $\sup_u|R(0, u)| \le C_R$.

Remark 2.4. The condition on $\bar{\mathcal F}$ to be an $(L_{\mathcal F}, s, R, C)$-class imposes a smoothness condition on each $\bar f \in \bar{\mathcal F}$ separately. There is no need for any connection between the different $\bar f \in \bar{\mathcal F}$, and the condition should not be confused with the important example of so-called parametric Lipschitz classes in empirical process theory (cf. [22, Example 19.7]), where it is assumed that there is some parameter space $\Theta \subset \mathbb R^p$ such that $\bar{\mathcal F} = \{\bar f_\theta: \theta \in \Theta\}$ and, for $\theta_1, \theta_2 \in \Theta$, $|\bar f_{\theta_1}(z, u) - \bar f_{\theta_2}(z, u)| \le m(z, u)\cdot|\theta_1 - \theta_2|_\infty$ holds for some measurable function $m$.

We are now able to formulate the basic assumptions which are needed to prove the main results. Recall (2.4).
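Because $V(f)$ depends on $f$ only through the number $\|f\|_{2,n}$, it can be evaluated as a function of that single number. The sketch below (our illustration; the cut-off `kmax` of the series, $D_n = 1$ and the parameter values are assumptions) computes $V$ for polynomially and geometrically decaying $\Delta(k)$ and compares it with the equivalent expressions listed in Table 1; the ratios should stay between fixed constants over several orders of magnitude of $\|f\|_{2,n}$.

```python
import numpy as np

def V(norm_f, Delta, D_n=1.0, kmax=100_000):
    # V(f) = ||f||_{2,n} + sum_{k>=1} min{||f||_{2,n}, D_n * Delta(k)},
    # evaluated from the number ||f||_{2,n}; the series is cut off at kmax
    k = np.arange(1, kmax + 1, dtype=float)
    return norm_f + np.minimum(norm_f, D_n * Delta(k)).sum()

def delta_poly(k, c=1.0, alpha=2.0):
    return c * k ** (-alpha)                   # Delta(k) = c k^{-alpha}

def delta_geom(k, c=1.0, rho=0.5):
    return c * rho ** k                        # Delta(k) = c rho^k

def table1_poly(x, alpha=2.0):
    return x * max(x ** (-1.0 / alpha), 1.0)   # x max{x^{-1/alpha}, 1}

def table1_geom(x):
    return x * max(np.log(1.0 / x), 1.0)       # x max{log(1/x), 1}

xs = np.logspace(-6, 0, 13)                    # values of ||f||_{2,n}
ratios_poly = [V(x, delta_poly) / table1_poly(x) for x in xs]
ratios_geom = [V(x, delta_geom) / table1_geom(x) for x in xs]
```

In this setting the ratios stay roughly between 2 and 3 in the polynomial case and between 1.4 and 3.2 in the geometric case, which is what "equivalent up to constants" in Table 1 means.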

Assumption 2.5 (Compatibility condition on $\mathcal F$). Let $\bar{\mathcal F} = \{\bar f: f \in \mathcal F\}$ be an $(L_{\mathcal F}, s, R, C)$-class. There exist $\nu \ge 2$, some $p \in (1,\infty]$ and $C_X > 0$ such that

$$\sup_{i,u}\|R(Z_i, u)\|_{\nu p} \le C_R, \qquad \sup_{i,j}\|X_{ij}\|_{\frac{\nu s p}{p-1}} \le C_X. \tag{2.8}$$

It holds that

$$d\,C_R\sum_{j=0}^k L_{\mathcal F,j}\big(\delta^X_{\frac{\nu s p}{p-1}}(k-j)\big)^s \le \Delta(k), \qquad \sup_{f\in\mathcal F}\Big(\frac1n\sum_{i=1}^n D_{f,n}\big(\tfrac in\big)^2\Big)^{1/2} \le D_n.$$

We furthermore define $D_n^\infty(u) := \sup_{f\in\mathcal F}|D_{f,n}(u)|$ and choose $D_{\nu,n}^\infty$ such that

$$\Big(\frac1n\sum_{i=1}^n D_n^\infty\big(\tfrac in\big)^\nu\Big)^{1/\nu} \le D_{\nu,n}^\infty.$$

We abbreviate $D_n^\infty = D_{2,n}^\infty$.

Remark 2.6. (i) In the theory of this paper, we will mainly consider the case $\nu = 2$.
(ii) In the case that each $\bar f \in \bar{\mathcal F}$ has the simple form $\bar f(z, u) = \bar f(z)$ with Hölder-continuous $\bar f$ and Hölder exponent $s \in (0,1]$, we can choose $p = \infty$. Then the stochastic conditions basically translate to $\sup_{i,j}\|X_{ij}\|_{\nu s} < \infty$ and $\delta^X_{\nu s}(k)^s \le \Delta(k)$, so the decay of $\Delta(k)$ is determined by $\delta^X_{\nu s}(k)^s$. Note that $\Delta(k)$ decays more slowly than $\delta^X_{\nu s}(k)$ if $s$ is strictly less than 1. In this case, our theory gives weaker results than the corresponding results for absolutely regular sequences from [21], since β-mixing does not rely on the smoothness of $f$. The choice of $s$ is part of a trade-off: a smaller $s$ is connected to weaker moment assumptions on the underlying process $X_i$ and vice versa.

If Assumption 2.5 is fulfilled, we obtain the main consequences given in Lemma 2.7. The proof is a simple application of Hölder's inequality and is given in Section 8.1 of the Supplementary Material. Our non-asymptotic main results only rely on the statements of Lemma 2.7; these statements can therefore serve as an alternative set of assumptions.
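The first condition of Assumption 2.5 builds $\Delta(k)$ from the weights $L_{\mathcal F,j}$ and the dependence measure of $X$ by a discrete convolution. The sketch below (our illustration; geometric weights $L_{\mathcal F,j} = 0.5^j$, a geometric dependence measure $\delta^X(k) = 0.8^k$ as for a tvAR(1) process with coefficients bounded by 0.8, $s = 1/2$ and $d = C_R = 1$ are all assumptions) evaluates the left-hand side $b(k) = d\,C_R\sum_{j=0}^k L_{\mathcal F,j}\,\delta^X(k-j)^s$. Since $b(k)$ need not be monotone while $\Delta(k)$ is required to be decreasing, it also forms the smallest nonincreasing majorant $\max_{k'\ge k} b(k')$ on the computed range.

```python
import numpy as np

def delta_from_assumption(L, deltaX, s, d=1, C_R=1.0):
    """b(k) = d * C_R * sum_{j=0}^{k} L[j] * deltaX[k-j]**s (Assumption 2.5)
    and its smallest nonincreasing majorant on the computed range."""
    L = np.asarray(L, dtype=float)
    dx = np.asarray(deltaX, dtype=float) ** s
    K = min(len(L), len(dx))
    b = d * C_R * np.convolve(L[:K], dx[:K])[:K]   # discrete convolution
    Delta = np.maximum.accumulate(b[::-1])[::-1]    # max_{k' >= k} b(k')
    return b, Delta

k = np.arange(60)
b, Delta = delta_from_assumption(0.5 ** k, 0.8 ** k, s=0.5)
```

Here $b(1) = \sqrt{0.8} + 0.5 > b(0) = 1$, so the majorant is genuinely needed, and for large $k$ the decay rate is $\sqrt{0.8} \approx 0.894$ instead of $0.8$: the price of $s = 1/2$ described in Remark 2.6(ii).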

Lemma 2.7.

Let Assumption 2.5 hold for some $\nu \ge 2$. Then

$$\delta_\nu^{f(Z,u)}(k) \le |D_{f,n}(u)|\cdot\Delta(k), \qquad \sup_i\Big\|\sup_{f\in\mathcal F}\big|f(Z_i, u) - f(Z_i^{*(i-k)}, u)\big|\Big\|_\nu \le D_n^\infty(u)\cdot\Delta(k),$$

$$\sup_i\|f(Z_i, u)\|_\nu \le |D_{f,n}(u)|\cdot C_\Delta,$$

where $C_\Delta := 4d\cdot|L_{\mathcal F}|_1\cdot C_X^s C_R + C_{\bar f}$.

Assumption 2.5 automatically imposes a continuity assumption on every $f$. This does not hold, for instance, in the analysis of the empirical distribution function, where $\mathcal F = \{f_x = \mathbb 1_{\{\cdot\le x\}}: x \in \mathbb R\}$. By using the decomposition

$$\mathbb G_n(f) = \mathbb G_n^{(1)}(f) + \mathbb G_n^{(2)}(f), \qquad \mathbb G_n^{(1)}(f) := \frac{1}{\sqrt n}\sum_{i=1}^n\Big\{f\big(Z_i, \tfrac in\big) - \mathbb E\big[f\big(Z_i, \tfrac in\big)\big|\mathcal G_{i-1}\big]\Big\}, \tag{2.9}$$

$$\mathbb G_n^{(2)}(f) := \frac{1}{\sqrt n}\sum_{i=1}^n\Big\{\mathbb E\big[f\big(Z_i, \tfrac in\big)\big|\mathcal G_{i-1}\big] - \mathbb E f\big(Z_i, \tfrac in\big)\Big\} \tag{2.10}$$

(cf. [25] for a similar approach), the analysis of $\mathbb G_n(f)$ can be transferred to that of the martingale $\mathbb G_n^{(1)}(f)$ and the smoother $\mathbb G_n^{(2)}(f)$. For $\mathbb G_n^{(2)}(f)$, the smoothness conditions of Assumption 2.5 have to be transferred from $z \mapsto f(z, u)$ to $g \mapsto \mathbb E[f(Z_i, u)|\mathcal G_{i-1} = g]$. Even if the first map is discontinuous, there is hope that the latter is continuous, due to the additional integration over $\varepsilon_i$. To guarantee this, we typically have to assume that $\varepsilon_i$ has a continuous density.

Assumption 2.8 (Compatibility condition on $\mathcal F$). Let $\nu \ge 2$. There exists a process $X_i^\circ = (X_{ij}^\circ)_{j=1,\dots,d^\circ} = J_{i,n}^\circ(\mathcal G_i)$ with the following properties. For $\kappa \in \{1, 2\}$ and any $f \in \mathcal F$, there exist functions $\bar\mu_{f,i}^{(\kappa)}$ such that

$$\bar\mu_{f,i}^{(\kappa)}(Z_{i-1}^\circ, u) = \mathbb E\big[\bar f(Z_i, u)^\kappa\big|\mathcal G_{i-1}\big]^{1/\kappa}, \qquad i = 1,\dots,n,\ u \in [0,1], \tag{2.11}$$

with $Z_{i-1}^\circ := (X_{i-1}^\circ, X_{i-2}^\circ, \dots)$.
The class $\bar{\mathcal F}_\kappa := \{\bar\mu_{f,i}^{(\kappa)}: f \in \mathcal F, i \in \{1,\dots,n\}\}$ is an $(L_{\mathcal F}, s, R, C)$-class, and there exist $p \in (1,\infty]$ and $C_X > 0$ such that

$$\sup_{i,u}\|R(Z_{i-1}^\circ, u)\|_{\nu p} \le C_R, \qquad \sup_{i,j}\|X_{ij}^\circ\|_{\frac{\nu s p}{p-1}} \le C_X.$$
It holds that

$$d^\circ C_R\sum_{j=0}^{k-1} L_{\mathcal F,j}\big(\delta^{X^\circ}_{\frac{\nu s p}{p-1}}(k-j-1)\big)^s \le \Delta(k), \qquad \sup_{f\in\mathcal F}\Big(\frac1n\sum_{i=1}^n D_{f,n}\big(\tfrac in\big)^2\Big)^{1/2} \le D_n.$$

Remark 2.9. (i) Assumption 2.8 naturally mixes properties of $f$ and the one-step evolution of the statistical model posed on $X_i$. This means that we need some additional knowledge of the evolution of the process $X_i$ to verify it.
(ii) Note that we can always choose $X_i^\circ = \varepsilon_i$, $Z_{i-1}^\circ = \mathcal G_{i-1}$, and define

$$\mu_{f,i}^{(\kappa)}(\mathcal G_{i-1}, u) := \mathbb E\big[f(Z_i, u)^\kappa\big|\mathcal G_{i-1}\big]^{1/\kappa}. \tag{2.12}$$

In the case that $X_i$ is recursively defined, the choice (2.12) may lead to a more complicated calculation of $\Delta(k)$. In this case, we should instead choose $X_i^\circ = X_i$ if possible.
(iii) In Section 6 we will see examples where we can choose $s \in (0,1]$ arbitrarily. The trade-off connected to this choice mentioned in Remark 2.6(ii) is also present in the framework of Assumption 2.8.
(iv) We require that $\bar{\mathcal F}$ is an $(L_{\mathcal F}, s, R, C)$-class to ensure smoothness of the conditional variance of $\mathbb G_n^{(1)}$. This allows us to upper bound it by a deterministic distance measure. We think that this is one of the weakest general assumptions that can be imposed. In special cases, stronger properties may be present which also allow for a reduction of moment and dependence conditions; this is connected to the choice of $s$. For details, we refer to Remark 4.4 and Remark 4.8.

Based on Assumption 2.8, it is possible to show results similar to Lemma 2.7. The details can be found in Lemma 8.1 in Section 8.1 of the Supplementary Material. In the framework of Assumption 2.8, we sometimes need a submultiplicativity assumption on $\Delta(k)$. For $q \in \mathbb N$, put

$$\beta(q) = \sum_{j=q}^\infty \Delta(j).$$

There exists a constant $C_\beta > 0$ such that for each $q_1, q_2 \in \mathbb N$, $\beta(q_1 q_2) \le C_\beta\cdot\beta(q_1)\beta(q_2)$.

It is easily seen that Assumption 2.10 is fulfilled if $\Delta(k)$ decays polynomially ($\Delta(k) = ck^{-\alpha}$ for $c > 0$, $\alpha > 1$) or exponentially ($\Delta(k) = c\rho^k$ for $c > 0$, $\rho \in (0,1)$) … $\Delta(k)$ contains a factor of the form … $k$).

Note that both Assumption 2.5 and Assumption 2.8 imply that $\mathrm{Var}(\mathbb G_n(f))^{1/2}$ is bounded by $V(f)$. This follows from (2.2) and from the fact that

$$\mathrm{Var}(\mathbb G_n(f))^{1/2} \le \mathrm{Var}(\mathbb G_n^{(1)}(f))^{1/2} + \mathrm{Var}(\mathbb G_n^{(2)}(f))^{1/2} \le \|f\|_{2,n} + \mathrm{Var}(\mathbb G_n^{(2)}(f))^{1/2}.$$

Lemma 2.11 (Variance bound). Suppose that Assumption 2.5 or Assumption 2.8 holds. Then for $f \in \mathcal F$, $\mathrm{Var}(\mathbb G_n(f))^{1/2} \le V(f)$.

Remark 2.12. We introduced the bracketing numbers $N(\varepsilon, \mathcal F, \|\cdot\|)$ of $\mathcal F$ with the condition that the limits $l_j, u_j$ of the brackets $[l_j, u_j]$ have to belong to $\mathcal F$. This is needed later for the chaining procedure. Only in the case of Assumption 2.8 does this represent an additional condition. In the case of Assumption 2.5, we can simply define new limits

$$\tilde l_j(z, u) := \inf_{f\in[l_j,u_j]} f(z, u), \qquad \tilde u_j(z, u) := \sup_{f\in[l_j,u_j]} f(z, u),$$

which fulfill $[l_j, u_j]\cap\mathcal F = [\tilde l_j, \tilde u_j]\cap\mathcal F$. Furthermore, $|\tilde l_j(z, u) - \tilde l_j(z', u)| \le \sup_{f\in[l_j,u_j]}|f(z, u) - f(z', u)|$. Thus, we can add $\tilde l_j, \tilde u_j$ to $\mathcal F$ without changing the bracketing numbers $N(\varepsilon, \mathcal F, \|\cdot\|)$ or the validity of Assumption 2.5.
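The decomposition (2.9) and (2.10) can be checked by simulation. The sketch below is our own example, not from the paper: a stationary Gaussian AR(1) process $X_i = aX_{i-1} + \varepsilon_i$ and the indicator $f(z, u) = \mathbb 1_{\{z_0\le x\}}$, for which $\mathbb E[f(Z_i, u)|\mathcal G_{i-1}] = \Phi(x - aX_{i-1})$ is smooth in $X_{i-1}$ although $f$ is not. It verifies on independent replicates that the two parts add up to $\mathbb G_n(f)$, illustrates the triangle inequality for standard deviations used just before Lemma 2.11, and checks $\mathrm{Var}(\mathbb G_n^{(1)}(f)) \le \|f\|_{2,n}^2$. The model, the parameters and the function names are assumptions.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def Phi(t):
    # standard normal CDF
    return 0.5 * (1.0 + _erf(np.asarray(t) / math.sqrt(2.0)))

def decompose(a=0.5, x=0.0, n=200, reps=2000, seed=1):
    """Split G_n(f), f(z) = 1{z_0 <= x}, into the martingale part (2.9) and
    the conditional-mean part (2.10) for X_i = a X_{i-1} + eps_i."""
    rng = np.random.default_rng(seed)
    X = np.empty((reps, n + 1))
    X[:, 0] = rng.standard_normal(reps) / math.sqrt(1 - a * a)  # stationary start
    for i in range(1, n + 1):
        X[:, i] = a * X[:, i - 1] + rng.standard_normal(reps)
    f = (X[:, 1:] <= x).astype(float)          # f(Z_i), i = 1, ..., n
    cond = Phi(x - a * X[:, :-1])              # E[f(Z_i) | G_{i-1}]
    mean = Phi(x * math.sqrt(1 - a * a))       # E f(Z_i), X_i ~ N(0, 1/(1-a^2))
    G = (f - mean).sum(axis=1) / math.sqrt(n)
    G1 = (f - cond).sum(axis=1) / math.sqrt(n)
    G2 = (cond - mean).sum(axis=1) / math.sqrt(n)
    return G, G1, G2

G, G1, G2 = decompose()
```

With $x = 0$ we have $\|f\|_{2,n}^2 = P(X_i \le 0) = 1/2$, while the empirical variance of `G1` estimates $\mathbb E[\Phi(1-\Phi)] \le 1/4$.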

3. Empirical process theory for smooth function classes

We provide an approach to obtain maximal inequalities for sums of random variables $W_i(f)$, $i = 1,\dots,n$, indexed by $f \in \mathcal F$, by using a decomposition into independent random variables. A similar approach is presented in [5] (Section 4.3 therein) for absolutely regular sequences. We will apply the results to $W_i(f) = f(Z_i, \tfrac in)$ or $W_i(f) = \mathbb E[f(Z_i, \tfrac in)|\mathcal G_{i-1}]$ in the case of Assumption 2.5 or Assumption 2.8, respectively. We impose the following conditions on $W_i(f)$, which are easily verified in these two cases by Lemma 2.7 or Lemma 8.1 in the Supplementary Material (Supplement A). Assumption 3.1.

Suppose that for all measurable $f_1, f_2, f, g$, $W_i(f_1 + f_2) = W_i(f_1) + W_i(f_2)$, and $|f| \le g \Rightarrow |W_i(f)| \le W_i(g)$. For each $i = 1,\dots,n$, $j \in \mathbb N$, $s \in \mathbb N\cup\{\infty\}$ and $f \in \mathcal F$,

$$\Big\|\sup_{f\in\mathcal F}\big|W_i(f) - W_i(f)^{*(i-j)}\big|\Big\|_2 \le D_n^\infty\big(\tfrac in\big)\Delta(j), \qquad \big\|W_i(f) - W_i(f)^{*(i-j)}\big\|_2 \le \big|D_{f,n}\big(\tfrac in\big)\big|\cdot\Delta(j), \qquad \|W_i(f)\|_s \le \big\|f\big(Z_i, \tfrac in\big)\big\|_s.$$

To approximate $W_i(f)$ by independent variables, we use a technique from [27] which was refined in [28]. Define

$$W_{i,j}(f) := \mathbb E[W_i(f)|\varepsilon_{i-j}, \varepsilon_{i-j+1}, \dots, \varepsilon_i], \qquad j \in \mathbb N,$$

and

$$S_n^W(f) := \sum_{i=1}^n\{W_i(f) - \mathbb E W_i(f)\}, \qquad S_{n,j}^W(f) := \sum_{i=1}^n\{W_{i,j}(f) - \mathbb E W_{i,j}(f)\}.$$

Let $q \in \{1,\dots,n\}$ be arbitrary. Put $L := \lfloor \log(q)/\log(2)\rfloor$ and $\tau_l := 2^l$ ($l = 0,\dots,L-1$), $\tau_L := q$. Then we have

$$W_i(f) = W_i(f) - W_{i,q}(f) + \sum_{l=1}^L\big(W_{i,\tau_l}(f) - W_{i,\tau_{l-1}}(f)\big) + W_{i,1}(f)$$

(in the case $q = 1$, the sum in the middle does not appear) and thus

$$S_n^W(f) = \big[S_n^W(f) - S_{n,q}^W(f)\big] + \sum_{l=1}^L\big[S_{n,\tau_l}^W(f) - S_{n,\tau_{l-1}}^W(f)\big] + S_{n,1}^W(f).$$

We write

$$S_{n,\tau_l}^W(f) - S_{n,\tau_{l-1}}^W(f) = \sum_{i=1}^{\lfloor n/\tau_l\rfloor+1} T_{i,l}(f), \qquad T_{i,l}(f) := \sum_{k=(i-1)\tau_l+1}^{(i\tau_l)\wedge n}\big[W_{k,\tau_l}(f) - W_{k,\tau_{l-1}}(f)\big].$$

Since $W_{k,\tau_l}(f)$ depends only on $\varepsilon_{k-\tau_l}, \dots, \varepsilon_k$, the random variables $T_{i,l}(f)$ and $T_{i',l}(f)$ are independent if $|i - i'| > 1$. This leads to the decomposition

$$\begin{aligned}
\max_{f\in\mathcal F}\Big|\frac{1}{\sqrt n}S_n^W(f)\Big| \le{}& \max_{f\in\mathcal F}\frac{1}{\sqrt n}\big|S_n^W(f) - S_{n,q}^W(f)\big|\\
&+ \sum_{l=1}^L\Big[\max_{f\in\mathcal F}\Big|\frac{1}{\sqrt{n/\tau_l}}\sum_{\substack{i=1\\ i\ \text{even}}}^{\lfloor n/\tau_l\rfloor+1}\frac{1}{\sqrt{\tau_l}}T_{i,l}(f)\Big| + \max_{f\in\mathcal F}\Big|\frac{1}{\sqrt{n/\tau_l}}\sum_{\substack{i=1\\ i\ \text{odd}}}^{\lfloor n/\tau_l\rfloor+1}\frac{1}{\sqrt{\tau_l}}T_{i,l}(f)\Big|\Big]\\
&+ \max_{f\in\mathcal F}\frac{1}{\sqrt n}\big|S_{n,1}^W(f)\big|.
\end{aligned} \tag{3.1}$$

While the first term in (3.1) can be made small by assumptions on the dependence of $W_i(f)$ and by the use of a large deviation inequality for martingales in Banach spaces from [19], the second and third terms allow for the application of Rosenthal-type bounds due to the independence of the summands $T_{i,l}(f)$ and $W_{i,1}(f)$, respectively. Recall that $H = H(|\mathcal F|) = 1 \vee \log|\mathcal F|$ as in (1.5). We obtain the following maximal inequality. Theorem 3.2.

Suppose that $\mathcal F$ satisfies $|\mathcal F| < \infty$ and Assumption 3.1. Then there exists some universal constant $c > 0$ such that the following holds: if $\sup_{f\in\mathcal F}\|f\|_\infty \le M$ and $\sup_{f\in\mathcal F}V(f) \le \sigma$, then

$$\mathbb E\max_{f\in\mathcal F}\Big|\frac{1}{\sqrt n}S_n^W(f)\Big| \le c\cdot\min_{q\in\{1,\dots,n\}}\Big[\sigma\sqrt H + \sqrt H\cdot D_n^\infty\beta(q) + \frac{qMH}{\sqrt n}\Big]. \tag{3.2}$$

For $x > 0$, define $q^*(x) := \min\{q \in \mathbb N: \beta(q) \le q\cdot x\}$. Then

$$\mathbb E\max_{f\in\mathcal F}\Big|\frac{1}{\sqrt n}S_n^W(f)\Big| \le c\cdot\Big(\sigma\sqrt H + q^*\Big(\frac{M\sqrt H}{\sqrt n\,D_n^\infty}\Big)\frac{MH}{\sqrt n}\Big). \tag{3.3}$$

In the next subsections, we will prove asymptotic tightness and a functional central limit theorem for $\mathbb G_n(f)$ under the condition that $D_n^\infty$ and $D_n$ do not depend on $n$. However, uniform convergence rates of $\mathbb G_n(f)$ for finite $\mathcal F$ can be obtained without these conditions but with additional moment assumptions; this is done in the following Corollary 3.3. Values of $q^*(\cdot)$ and $r(\cdot)$ for the two prominent cases that $\Delta(\cdot)$ decays polynomially or exponentially can be found in Table 2.

Corollary 3.3 (Uniform convergence rates). Suppose that $\mathcal F$ satisfies $|\mathcal F| < \infty$ and Assumption 2.5 for some $\nu \ge 2$. Furthermore, suppose that

$$\sup_{n\in\mathbb N}\sup_{f\in\mathcal F}V(f) < \infty, \qquad \sup_{n\in\mathbb N}\frac{D_{\nu,n}^\infty}{D_n^\infty} < \infty, \qquad \sup_{n\in\mathbb N}\frac{C_\Delta H}{n^{\frac12-\frac1\nu}\, r\big(\frac{\sigma}{D_n^\infty}\big)} < \infty. \tag{3.4}$$

Then $\max_{f\in\mathcal F}|\mathbb G_n(f)| = O_p(\sqrt H)$.

Remark 3.4.
• Corollary 3.3 can be used to prove (optimal) convergence rates for kernel density and regression estimators as well as maximum likelihood estimators under dependence. We give some examples in Section 6.
• The first condition in (3.4) guarantees that $\mathbb G_n(f)$ is properly normalized. The second and third conditions are needed to prove that the “rare events”, where $|f(Z_i, \tfrac in)|$ exceeds some threshold $M_n \in (0,\infty)$, are of the same order as $\sqrt H$. For this, we may need more than two moments, that is, $\nu > 2$, depending on $\sqrt H$ and the behavior of $D_n^\infty$.
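The quantities $q^*(x)$ from Theorem 3.2 and $r(\delta) = \max\{r > 0: q^*(r)\,r \le \delta\}$ (as defined in Lemma 3.6 and used in Corollary 3.3) can be computed numerically from $\Delta$. The sketch below (our illustration) obtains $q^*(x)$ by bisection, using that $\beta(q) - qx$ is strictly decreasing in $q$, and approximates $r(\delta)$ from below by a search over a logarithmic grid. The truncation of $\beta$ after `N` terms, the grid, `qmax` and the choices $\Delta(k) = k^{-2}$ and $\Delta(k) = 0.5^k$ are assumptions; the results can be compared with the equivalent expressions of Table 2.

```python
import numpy as np

def beta_from_Delta(Delta, N=10**6):
    # beta(q) = sum_{j >= q} Delta(j), via suffix sums truncated at j = N
    j = np.arange(1, N + 1, dtype=float)
    tail = np.cumsum(Delta(j)[::-1])[::-1]    # tail[q - 1] = sum_{j=q}^{N} Delta(j)
    return lambda q: tail[q - 1]

def beta_geom(q, c=1.0, rho=0.5):
    return c * rho ** q / (1 - rho)           # closed form for Delta(j) = c rho^j

def q_star(x, beta, qmax=10**6):
    # q*(x) = min{q in N : beta(q) <= q x}; beta(q) - q x is decreasing in q
    if beta(1) <= x:
        return 1
    lo, hi = 1, qmax                          # invariant: beta(lo) > lo x, beta(hi) <= hi x
    assert beta(hi) <= hi * x, "increase qmax"
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if beta(mid) <= mid * x:
            hi = mid
        else:
            lo = mid
    return hi

def r_grid(beta, grid=None, qmax=10**6):
    # approximate r(delta) = max{r > 0 : q*(r) r <= delta} on a log grid
    grid = np.logspace(-10, 2, 1201) if grid is None else grid
    g = np.array([q_star(r, beta, qmax) * r for r in grid])
    def r_of(delta):
        ok = grid[g <= delta]
        return ok.max() if ok.size else 0.0
    return r_of

beta_poly = beta_from_Delta(lambda j: j ** -2.0)   # Delta(j) = j^{-2}, alpha = 2
r_poly, r_geom = r_grid(beta_poly), r_grid(beta_geom)
```

For $\alpha = 2$, Table 2 predicts $q^*(x) \asymp \max\{x^{-1/2}, 1\}$ and $r(\delta) \asymp \min\{\delta^2, \delta\}$; in the geometric case $q^*(x) \asymp \max\{\log(x^{-1}), 1\}$ and $r(\delta) \asymp \min\{\delta/\log(\delta^{-1}), \delta\}$.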
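$$\begin{array}{l|c|c}
\Delta(j) & C j^{-\alpha},\ \alpha > 1 & C\rho^j,\ \rho \in (0,1) \\ \hline
q^*(x) & \max\{x^{-1/\alpha}, 1\} & \max\{\log(x^{-1}), 1\} \\
r(\delta) & \min\{\delta^{\frac{\alpha}{\alpha-1}}, \delta\} & \min\{\delta/\log(\delta^{-1}), \delta\}
\end{array}$$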

Equivalent expressions of $q^*(\cdot)$ and $r(\cdot)$, taken from Lemma 8.14 in Section 8.8. We omitted the lower and upper bound constants, which depend only on $C, \rho, \alpha$.

In this section, we assume that $D_n, D_n^\infty \in (0,\infty)$ can be chosen independently of $n$. We now use Theorem 3.2 to obtain a bound for (possibly infinite) function classes $\mathcal F$ which consist of functions that are continuous with respect to their first argument. Let

$$\mathbb G_n^W(f) := \frac{1}{\sqrt n}\sum_{i=1}^n\big(W_i(f) - \mathbb E W_i(f)\big).$$

The choice of the truncation sequence for the following chaining approach is motivated by [5] (Theorem 3.3 therein). Since Theorem 3.2 only yields maximal inequalities for continuous functions, we are not able to use the standard chaining scheme, which involves indicator functions. We therefore provide an adaptation of the typical chaining scheme which does not need indicators but replaces them by truncations of the arising functions via maxima and minima (which preserves their continuity).

For $m > 0$, define the truncation $\varphi_m^\wedge: \mathbb R \to \mathbb R$ and the corresponding ‘peaky’ residual $\varphi_m^\vee: \mathbb R \to \mathbb R$ via

$$\varphi_m^\wedge(x) := (x \vee (-m)) \wedge m, \qquad \varphi_m^\vee(x) := x - \varphi_m^\wedge(x).$$

In the following, assume that for each $j \in \mathbb N$ there exists a decomposition $\mathcal F = \bigcup_{k=1}^{N_j}\mathcal F_{jk}$, where $(\mathcal F_{jk})_{k=1,\dots,N_j}$, $j \in \mathbb N$, is a sequence of nested partitions. For each $j \in \mathbb N$ and $k \in \{1,\dots,N_j\}$, choose a fixed element $f_{jk} \in \mathcal F_{jk}$. For $j \in \mathbb N$, define $\pi_j f := f_{jk}$ if $f \in \mathcal F_{jk}$. Assume that there exists a sequence $(\Delta_j f)_{j\in\mathbb N}$ such that for all $j \in \mathbb N$, $\sup_{f,g\in\mathcal F_{jk}}|f - g| \le \Delta_j f$. Finally, let $(m_j)_{j\in\mathbb N}$ be a decreasing sequence which will serve as a truncation sequence.

For $j \in \mathbb N$, we use the decomposition

$$f - \pi_j f = \varphi_{m_j}^\wedge(f - \pi_j f) + \varphi_{m_j}^\vee(f - \pi_j f).$$

Since

$$\begin{aligned}
f - \pi_j f &= f - \pi_{j+1}f + \pi_{j+1}f - \pi_j f\\
&= \varphi_{m_{j+1}}^\wedge(f - \pi_{j+1}f) + \varphi_{m_{j+1}}^\vee(f - \pi_{j+1}f) + \varphi_{m_j - m_{j+1}}^\wedge(\pi_{j+1}f - \pi_j f) + \varphi_{m_j - m_{j+1}}^\vee(\pi_{j+1}f - \pi_j f),
\end{aligned} \tag{3.5}$$

we can write

$$\varphi_{m_j}^\wedge(f - \pi_j f) = \varphi_{m_{j+1}}^\wedge(f - \pi_{j+1}f) + \varphi_{m_j - m_{j+1}}^\wedge(\pi_{j+1}f - \pi_j f) + R(j), \tag{3.6}$$

where

$$R(j) := \varphi_{m_j}^\wedge(f - \pi_j f) - \varphi_{m_j}^\wedge\big(\varphi_{m_{j+1}}^\wedge(f - \pi_{j+1}f)\big) - \varphi_{m_j}^\wedge\big(\varphi_{m_j - m_{j+1}}^\wedge(\pi_{j+1}f - \pi_j f)\big).$$

To bound $R(j)$, we use (i) of the following elementary Lemma 3.5, which is proved in Section 8.3 of the Supplementary Material (Supplement A). Lemma 3.5.

Let $y, x, x_1, x_2, x_3$ and $m, m' > 0$ be real numbers. Then the following assertions hold:
(i) If $|x_1| + |x_2| \le m$, then $\big|\varphi^\wedge_m(x_1+x_2+x_3) - \varphi^\wedge_m(x_1) - \varphi^\wedge_m(x_2)\big| \le \min\{|x_3|, 2m\}$.
(ii) $|\varphi^\wedge_m(x)| \le \min\{|x|, m\}$, and if $|x| \le y$, then $|\varphi^\vee_m(x)| \le \varphi^\vee_m(y) \le y\,\mathbb 1_{\{y>m\}}$.
(iii) If $\mathcal F$ fulfills Assumption 2.5, then Assumption 2.5 also holds for $\{\varphi^\wedge_m(f) : f\in\mathcal F\}$ and $\{\varphi^\vee_m(f) : f\in\mathcal F\}$.

Because the partitions are nested, we have $|\pi_{j+1}f - \pi_j f| \le \Delta_j f$. By Lemma 3.5 and (3.5), we have
$$|R(j)| \le \min\Big\{\big|\varphi^\vee_{m_{j+1}}(f-\pi_{j+1}f) + \varphi^\vee_{m_j-m_{j+1}}(\pi_{j+1}f-\pi_j f)\big|,\ 2m_j\Big\} \le \min\big\{\varphi^\vee_{m_{j+1}}(\Delta_{j+1}f),\ 2m_j\big\} + \min\big\{\varphi^\vee_{m_j-m_{j+1}}(\Delta_j f),\ 2m_j\big\}.\tag{3.7}$$
Let $\tau\in\mathbb N$. Then, with iterated application of (3.6) and linearity of $f\mapsto W_i(f)$,
$$\begin{aligned}\mathbb G_n^W\big(\varphi^\wedge_{m_0}(f-\pi_0 f)\big) &= \mathbb G_n^W\big(\varphi^\wedge_{m_1}(f-\pi_1 f)\big) + \mathbb G_n^W\big(\varphi^\wedge_{m_0-m_1}(\pi_1 f-\pi_0 f)\big) + \mathbb G_n^W\big(R(0)\big)\\ &= \mathbb G_n^W\big(\varphi^\wedge_{m_\tau}(f-\pi_\tau f)\big) + \sum_{j=0}^{\tau-1}\mathbb G_n^W\big(\varphi^\wedge_{m_j-m_{j+1}}(\pi_{j+1}f-\pi_j f)\big) + \sum_{j=0}^{\tau-1}\mathbb G_n^W\big(R(j)\big),\end{aligned}\tag{3.8}$$
which, in combination with (3.7), can now be used for chaining. In the following Lemma 3.6, we balance the contribution of the truncated stochastic part and the expectation of the rare events. Recall that $H(k) = 1\vee\log(k)$ as in (1.4).

Lemma 3.6 (Compatibility lemma). For $\delta > 0$, put $r(\delta) := \max\{r > 0 : q^*(r)\,r \le \delta\}$. For $n\in\mathbb N$, $\delta>0$ and $k\in\mathbb N$, define
$$m(n,\delta,k) := r\Big(\frac{\delta}{D_n}\Big)\cdot\frac{D_n^\infty\, n^{1/2}}{H(k)^{1/2}}.\tag{3.9}$$
Then the following statements hold:
(i) $r(\cdot)$ is well-defined, and for each $a>0$, $\frac{r(a)}{2} \ge r\big(\frac a2\big)$ and $r(a)\le a$.
(ii) If $\mathcal F$ fulfills $|\mathcal F|\le k$ and Assumption 3.1, then $\sup_{f\in\mathcal F}V(f)\le\delta$ and $\sup_{f\in\mathcal F}\|f\|_\infty\le m(n,\delta,k)$ imply
$$\mathbb E\max_{f\in\mathcal F}\big|\mathbb G_n^W(f)\big| \le c\Big(1+\frac{D_n^\infty}{D_n}\Big)\delta\sqrt{H(k)},\tag{3.10}$$
and $\sup_{f\in\mathcal F}V(f)\le\delta$ implies that for each $\gamma>0$,
$$\sqrt n\,\big\|f\,\mathbb 1_{\{f>\gamma\cdot m(n,\delta,k)\}}\big\|_{1,n} \le \frac{1}{\gamma}\,\frac{D_n}{D_n^\infty}\,\delta\sqrt{H(k)}.\tag{3.11}$$
We now use (3.7), (3.8) and Lemma 3.6 to derive a uniform bound for $\mathbb E\sup_{f\in\mathcal F}|\mathbb G_n^W(f)|$ in the following Theorem 3.7.

Theorem 3.7.

Let $\mathcal F$ satisfy Assumption 3.1 and let $F$ be some envelope function of $\mathcal F$, that is, $|f|\le F$ for each $f\in\mathcal F$. Let $\sigma>0$ and assume that $\sup_{f\in\mathcal F}V(f)\le\sigma$. Then there exists some universal constant $\tilde c>0$ such that
$$\mathbb E\sup_{f\in\mathcal F}\big|\mathbb G_n^W(f)\big| \le \tilde c\Big[\Big(1+\frac{D_n^\infty}{D_n}+\frac{D_n}{D_n^\infty}\Big)\int_0^\sigma\sqrt{1\vee H(\varepsilon,\mathcal F,V)}\,d\varepsilon + \sqrt n\,\big\|F\,\mathbb 1_{\{F>m(n,\sigma,N(\sigma,\mathcal F,V))\}}\big\|_{1,n}\Big],$$
where $m(\cdot)$ is from Lemma 3.6.

Remark 3.8. Lemma 3.6 and Theorem 3.7 are designed for the case that $D_n, D_n^\infty\in(0,\infty)$ are independent of $n$. If instead $V$ and $D_n, D_n^\infty$ depend on $n$, chaining has to be performed in a different way to get optimal bounds for the corresponding maximal inequality. We briefly sketch how the statements change. Let $\mathcal F$ satisfy Assumptions 2.8 and 2.10 with $\nu>2$. Define $V_{\nu,n}(f) := \|f\|_{\nu,n} + \sum_{j=1}^\infty\min\{\|f\|_{\nu,n}, D_n\Delta(j)\}$. Choose $m^\infty(n,\delta,k) = m(n,\delta,k)\cdot C_\beta\, r\big(\frac{D_n}{D_n^\infty}\big)$ instead of (3.9), and $\nu$ large enough such that
$$\sqrt n\Big(\frac{D_n}{C_\beta\gamma\sqrt n\, r(D_n/D_n^\infty)}\Big)^{\nu-1}\le 1.$$
Then the following modification of Lemma 3.6 holds. There exists some universal constant $c>0$ such that $|\mathcal F|\le k$, $\sup_{f\in\mathcal F}V_{\nu,n}(f)\le\delta$ and $\sup_{f\in\mathcal F}\|f\|_\infty\le m^\infty(n,\delta,k)$ imply
$$\mathbb E\max_{f\in\mathcal F}\big|\mathbb G_n^W(f)\big|\le c(1+C_\beta)\,\delta\sqrt{H(k)},$$
and
$$\begin{aligned}\sqrt n\,\big\|f\,\mathbb 1_{\{f>\gamma\cdot m^\infty(n,\delta,k)\}}\big\|_{1,n} &\le \frac{\sqrt n\,\|f\|_{\nu,n}^\nu}{m^\infty(n,\delta,k)^{\nu-1}\gamma^{\nu-1}} = \|f\|_{\nu,n}\Big(\frac{\sqrt n\,\|f\|_{\nu,n}}{m(n,\delta,k)}\Big)^{\nu-1}\cdot\sqrt n\Big(C_\beta\gamma\sqrt n\, r\Big(\frac{D_n}{D_n^\infty}\Big)\Big)^{-(\nu-1)}\\ &\le \delta H(k)^{(\nu-1)/2}\cdot\sqrt n\Big(\frac{D_n}{C_\beta\gamma\sqrt n\, r(D_n/D_n^\infty)}\Big)^{\nu-1} \le \delta H(k)^{(\nu-1)/2}.\end{aligned}$$
We now show a functional central limit theorem for $\mathbb G_n(f)$. First, we conclude from Theorem 3.7 the asymptotic equicontinuity of $\mathbb G_n$. To do so, we have to discuss the trailing term in Theorem 3.7, which involves the envelope function.
This can be tackled either with higher moment assumptions on $f(Z_i,u)$ or by imposing smoothness assumptions on the process $X_i$ and on the functions $f\in\mathcal F$ with respect to their second argument. Here, we consider the second approach, since most of the smoothness assumptions are needed anyway to prove a central limit theorem (cf. Theorem 3.13). This also has the advantage that we only have to assume the existence of a second moment for $f(Z_i,u)$.

Assumption 3.9.

For each $u\in[0,1]$, there exists a process $\tilde X_i(u) = J(\mathcal G_i,u)$, $i\in\mathbb Z$, where $J$ is a measurable function. Furthermore, there exist some $C_X>0$, $\varsigma\in(0,1]$ such that for every $i\in\{1,\dots,n\}$ and $u_1,u_2\in[0,1]$,
$$\big\|X_i-\tilde X_i(\tfrac in)\big\|_{\frac{sp}{p-1}}\le C_Xn^{-\varsigma},\qquad \big\|\tilde X_i(u_1)-\tilde X_i(u_2)\big\|_{\frac{sp}{p-1}}\le C_X|u_1-u_2|^\varsigma.$$
For $\tilde Z_i(u) = (\tilde X_i(u),\tilde X_{i-1}(u),\dots)$ it holds that $\sup_{v,u}\|R(\tilde Z_0(v),u)\|_p<\infty$.

Assumption 3.10.

There exists some $\varsigma\in(0,1]$ such that for every $f\in\mathcal F$,
$$|\bar f(z,u_1)-\bar f(z,u_2)|\le|u_1-u_2|^\varsigma\cdot\big(\bar R(z,u_1)+\bar R(z,u_2)\big),$$
and $\sup_{u,v}\|\bar R(\tilde Z_0(v),u)\|_2<\infty$.

Corollary 3.11.

Let $\mathcal F$ satisfy Assumptions 2.5, 3.9 and 3.10. Suppose that
$$\sup_{n\in\mathbb N}\int_0^1\sqrt{1\vee H(\varepsilon,\mathcal F,V)}\,d\varepsilon<\infty.\tag{3.12}$$
Furthermore, assume that $D_n, D_n^\infty\in(0,\infty)$ are independent of $n$, and
$$\sup_{i=1,\dots,n}\frac{D_n^\infty(\frac in)}{\sqrt n}\to0.\tag{3.13}$$
Then the process $\mathbb G_n(f)$ is equicontinuous with respect to $V$, that is, for every $\eta>0$,
$$\lim_{\sigma\to0}\limsup_{n\to\infty}\mathbb P\Big(\sup_{f,g\in\mathcal F,\,V(f-g)\le\sigma}|\mathbb G_n(f)-\mathbb G_n(g)|\ge\eta\Big)=0.$$
From Theorem 8.6 in the Supplementary Material (Supplement A, Section 8.5), we directly obtain the following multivariate central limit theorem as a special case, which only needs second moments of the summands $f(Z_i,u)$. To keep the presentation simple, we restrict ourselves to two explicit forms of $D_{f,n}(\cdot)$, which are given in Assumption 3.12, namely a global and a local version. Theorem 8.6 allows more general choices of $D_{f,n}(\cdot)$. In Assumption 3.12, we make use of $\tilde X_i(u)$, $\tilde Z_i(u)$ introduced in Assumption 3.9.

Assumption 3.12.

Let $\omega:[0,1]\to\mathbb R$ and $K:\mathbb R\to\mathbb R$ be some bounded functions. One of the following cases holds:
• Case $K=1$ (global version): For all $f\in\mathcal F$, $D_{f,n}(u)=\omega(u)$, where $\omega$ has bounded variation and $\int_0^1\omega(u)^2\,du>0$. For all $f,g\in\mathcal F$ and $j_1,j_2\in\mathbb N$, the mapping $u\mapsto\mathbb E\big[\mathbb E[\bar f(\tilde Z_{j_1}(u),u)\mid\mathcal G_0]\cdot\mathbb E[\bar g(\tilde Z_{j_2}(u),u)\mid\mathcal G_0]\big]$ has bounded variation. For $f,g\in\mathcal F$, define
$$\Sigma^{(1)}_{f,g} := \int_0^1\omega(u)^2\cdot\sum_{j\in\mathbb Z}\operatorname{Cov}\big(\bar f(\tilde Z_0(u),u),\ \bar g(\tilde Z_j(u),u)\big)\,du.$$
• Case $K=2$ (local version): For all $f\in\mathcal F$, $D_{f,n}(u)=\omega(u)\cdot\frac{1}{\sqrt h}K\big(\frac{u-v}{h}\big)$, where $v\in(0,1)$ is some fixed value, $h=h_n\to0$ and $nh\to\infty$. Moreover, $\omega$ is continuous in $v$, and $K$ has bounded variation, $\operatorname{supp}K\subset[-\frac12,\frac12]$ and $\int K(u)^2\,du>0$. For $f,g\in\mathcal F$, define
$$\Sigma^{(2)}_{f,g} := \int K(u)^2\,du\cdot\omega(v)^2\sum_{j\in\mathbb Z}\operatorname{Cov}\big(\bar f(\tilde Z_0(v),v),\ \bar g(\tilde Z_j(v),v)\big).$$

Theorem 3.13.

Let $\mathcal F$ satisfy Assumptions 2.5, 3.9, 3.10 and 3.12. Let $m\in\mathbb N$ and $f_1,\dots,f_m\in\mathcal F$. Then
$$\frac{1}{\sqrt n}\sum_{i=1}^n\left\{\begin{pmatrix}f_1(Z_i,\frac in)\\ \vdots\\ f_m(Z_i,\frac in)\end{pmatrix}-\mathbb E\begin{pmatrix}f_1(Z_i,\frac in)\\ \vdots\\ f_m(Z_i,\frac in)\end{pmatrix}\right\}\xrightarrow{d}N\Big(0,\big(\Sigma^{(K)}_{f_k,f_l}\big)_{k,l=1,\dots,m}\Big),$$
where $\Sigma^{(K)}$ is from Assumption 3.12.

As a result of Corollary 3.11, Theorem 3.13 and Theorem 18.14 in [22], we obtain the following functional central limit theorem. The weak convergence takes place in the normed space
$$\ell^\infty(\mathcal F) = \Big\{G:\mathcal F\to\mathbb R\ \Big|\ \|G\|_\infty := \sup_{f\in\mathcal F}|G(f)|<\infty\Big\},\tag{3.14}$$
cf. [22], Example 18.5.

Corollary 3.14.

Let $\mathcal F$ satisfy Assumptions 2.5, 3.9, 3.10 and 3.12. Assume that $\sup_{n\in\mathbb N}\int_0^1\sqrt{1\vee H(\varepsilon,\mathcal F,V)}\,d\varepsilon<\infty$. Then it holds in $\ell^\infty(\mathcal F)$ that
$$\big[\mathbb G_n(f)\big]_{f\in\mathcal F}\xrightarrow{d}\big[\mathbb G(f)\big]_{f\in\mathcal F},$$
where $(\mathbb G(f))_{f\in\mathcal F}$ is a centered Gaussian process with covariances $\operatorname{Cov}(\mathbb G(f),\mathbb G(g))=\Sigma^{(K)}_{f,g}$, and $\Sigma^{(K)}$ is from Assumption 3.12.
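The chaining identities (3.5) to (3.8) and the bound (3.7) are pointwise statements about the truncation $\varphi^\wedge_m$ and the residual $\varphi^\vee_m$, so they can be sanity-checked numerically for a single function value. The Python sketch below is illustrative only. The helper names `trunc`, `peaky`, `check_lemma_35_i` and `check_chain` are hypothetical, and the chain values $\pi_j f$ are random numbers rather than elements of an actual partition. It also shows that the left-hand side in Lemma 3.5(i) can reach $2m$ (for example with $x_1=m$, $x_2=0$, $x_3=-3m$), so the cap $2m$ used in (3.7) cannot be lowered to $m$.

```python
import random


def trunc(x: float, m: float) -> float:
    """Truncation phi^wedge_m(x) = (x v (-m)) ^ m, i.e. x clipped to [-m, m]."""
    return min(max(x, -m), m)


def peaky(x: float, m: float) -> float:
    """'Peaky' residual phi^vee_m(x) = x - phi^wedge_m(x); nonzero only if |x| > m."""
    return x - trunc(x, m)


def check_lemma_35_i(trials: int = 10_000, seed: int = 0) -> float:
    """Random check of Lemma 3.5(i): if |x1| + |x2| <= m, then
    |trunc(x1 + x2 + x3, m) - trunc(x1, m) - trunc(x2, m)| <= min(|x3|, 2m).
    Returns the largest observed ratio lhs / m."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        m = rng.uniform(0.1, 5.0)
        x1 = rng.uniform(-m, m)
        slack = m - abs(x1)
        x2 = rng.uniform(-slack, slack)          # guarantees |x1| + |x2| <= m
        x3 = rng.uniform(-10 * m, 10 * m)
        lhs = abs(trunc(x1 + x2 + x3, m) - trunc(x1, m) - trunc(x2, m))
        assert lhs <= min(abs(x3), 2 * m) + 1e-9
        worst = max(worst, lhs / m)
    return worst


def check_chain(x: float, p: list, m: list) -> float:
    """Pointwise check of (3.6)-(3.8) for one function value.

    x    : value f(z)
    p[j] : value (pi_j f)(z), j = 0, ..., tau
    m[j] : truncation levels m_0 >= m_1 >= ... >= m_tau > 0
    """
    tau = len(p) - 1
    # smallest Delta_j f with |f - pi_j f| <= Delta_j f and |pi_{j+1} f - pi_j f| <= Delta_j f
    delta = [max(abs(x - p[j]), abs(p[j + 1] - p[j])) for j in range(tau)]
    delta.append(abs(x - p[tau]))
    total = trunc(x - p[tau], m[tau])
    for j in range(tau):
        link = trunc(p[j + 1] - p[j], m[j] - m[j + 1])
        r_j = (trunc(x - p[j], m[j])
               - trunc(trunc(x - p[j + 1], m[j + 1]), m[j])
               - trunc(link, m[j]))
        bound = (min(peaky(delta[j + 1], m[j + 1]), 2 * m[j])
                 + min(peaky(delta[j], m[j] - m[j + 1]), 2 * m[j]))
        assert abs(r_j) <= bound + 1e-9                   # inequality (3.7)
        total += link + r_j
    assert abs(total - trunc(x - p[0], m[0])) < 1e-9      # telescoping identity (3.8)
    return total


if __name__ == "__main__":
    print("max lhs / m in Lemma 3.5(i):", check_lemma_35_i())
    print("x1 = m, x2 = 0, x3 = -3m gives:", abs(trunc(1.0 - 3.0, 1.0) - trunc(1.0, 1.0)))
    rng = random.Random(1)
    for _ in range(2_000):
        tau = rng.randint(1, 6)
        levels = sorted((rng.uniform(0.01, 2.0) for _ in range(tau + 1)), reverse=True)
        check_chain(rng.uniform(-3, 3), [rng.uniform(-3, 3) for _ in range(tau + 1)], levels)
    print("(3.6)-(3.8) verified on random chains")
```

Because `trunc(trunc(y, m_{j+1}), m_j) = trunc(y, m_{j+1})` whenever $m_{j+1}\le m_j$, the telescoping in (3.8) holds exactly. The only inequality being tested is therefore (3.7).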

4. Empirical process theory for non-continuous functions

We now provide an approach to empirical process theory when the class $\mathcal F$ consists of non-continuous functions. Our approach is based on the decomposition $\mathbb G_n(f)=\mathbb G_n^{(1)}(f)+\mathbb G_n^{(2)}(f)$ into a martingale $\mathbb G_n^{(1)}$ (cf. (2.9)) and a process $\mathbb G_n^{(2)}$ (cf. (2.10)) with smooth increments. The second part $\mathbb G_n^{(2)}$ can then be controlled in a similar way as in Section 3 by taking $W_i(f)=\mathbb E[f(Z_i,\frac in)\mid\mathcal G_{i-1}]$. The term $\mathbb G_n^{(1)}$ is dealt with by a Bernstein-type inequality for martingales. Observe that the conditional variance of $\mathbb G_n^{(1)}(f)$ is bounded from above by
$$R_n(f) := \frac1n\sum_{i=1}^n\mathbb E\big[f(Z_i,\tfrac in)^2\mid\mathcal G_{i-1}\big].$$
The first step is now to bound $R_n(f)$ over $f\in\mathcal F$. Again, let $W_i(f)$, $i=1,\dots,n$, be some sequence of random variables indexed by $f\in\mathcal F$. We will apply the following theory to
$$W_i(f)=\mathbb E\big[f(Z_i,\tfrac in)^2\mid\mathcal G_{i-1}\big],$$
but impose more general assumptions, which are directly implied by Lemma 8.1 in the Supplementary Material (Supplement A) under Assumption 2.8.

Assumption 4.1.

For each $i=1,\dots,n$, $j\in\mathbb N$, $s\in\mathbb N\cup\{\infty\}$ and $f\in\mathcal F$,
$$\Big\|\sup_{f\in\mathcal F}\big|W_i(f)-W_i(f)^{*(i-j)}\big|\Big\|\le C_\Delta D_n^\infty(\tfrac in)\,\Delta(j),\qquad \big\|W_i(f)-W_i(f)^{*(i-j)}\big\|\le|D_{f,n}(\tfrac in)|\cdot\|f(Z_i,\tfrac in)\|\,\Delta(j),\qquad \|W_i(f)\|_s\le\|f(Z_i,\tfrac in)\|_{2s}^2.$$
We obtain the following analogue of Theorem 3.2, a maximal inequality for means of random variables.

Lemma 4.2 (A maximal inequality for means). Let $\mathcal F$ satisfy $|\mathcal F|<\infty$ and Assumption 4.1. Then there exists some universal constant $c>0$ such that the following holds: If $\sup_{f\in\mathcal F}\|f\|_\infty\le M$ and $\sup_{f\in\mathcal F}V(f)\le\sigma$, then
$$\mathbb E\max_{f\in\mathcal F}\Big|\frac1nS_n^W(f)\Big|\le c\cdot\min_{q\in\{1,\dots,n\}}\Big[D_n\,r\Big(\frac{\sigma}{D_n}\Big)\sigma + C_\Delta(D_n^\infty)^2\beta(q) + \frac{qM^2H}{n}\Big].\tag{4.1}$$
Furthermore,
$$\mathbb E\max_{f\in\mathcal F}\Big|\frac1nS_n^W(f)\Big|\le c\cdot\Big[D_n\,r\Big(\frac{\sigma}{D_n}\Big)\sigma + q^*\Big(\frac{M^2H}{n(D_n^\infty)^2C_\Delta}\Big)\frac{M^2H}{n}\Big].\tag{4.2}$$
Lemma 4.2 in conjunction with Theorem 3.2 can be used to provide convergence rates in the same fashion as in Corollary 3.3. Recall from (1.5) that $H=1\vee\log|\mathcal F|$.

Corollary 4.3 (Uniform convergence rates). Suppose that $\mathcal F$ satisfies $|\mathcal F|<\infty$, Assumption 2.8 for some $\nu\ge2$, and Assumption 2.10. Let $\bar F:=\sup_{f\in\mathcal F}\bar f$ and assume that for some $\nu_2\in[2,\infty]$, $C_{\bar F,n}:=\sup_{i,u}\|\bar F(Z_i,u)\|_{\nu_2}<\infty$. If
$$\sup_{n\in\mathbb N}\sup_{f\in\mathcal F}V(f)<\infty,\qquad\sup_{n\in\mathbb N}\frac{D^\infty_{\nu_2,n}}{D_n^\infty}<\infty,\qquad\sup_{n\in\mathbb N}\frac{C_{\bar F,n}\,H\,n^{-1/\nu_2}}{r(\sigma/D_n^\infty)}<\infty,\tag{4.3}$$
then $\max_{f\in\mathcal F}|\mathbb G_n(f)|=O_p(\sqrt H)$.

Remark 4.4 (Alternative conditions). In the special case that there exists some constant $R>0$ such that $\sup_{f\in\mathcal F}\frac1n\sum_{i=1}^n\mathbb E[f(Z_i,\frac in)^2\mid\mathcal G_{i-1}]\le R$, it can easily be seen in the proof that the statement of Corollary 4.3 still holds if Assumption 2.8 only holds for $\kappa=1$ and Assumption 2.10 is dropped. A possible application is given in Example 6.8.

We now show asymptotic tightness of the martingale part $\mathbb G_n^{(1)}(f)$, $f\in\mathcal F$. By using a Bernstein-type inequality for martingales and Lemma 4.2, we obtain an analogue of Lemma 3.6 with the same function $m(\cdot)$ as defined there.

Lemma 4.5 (Compatibility lemma 2). Let $\psi:(0,\infty)\to[1,\infty)$ be some function and $k\in\mathbb N$, $\delta>0$.
If $\mathcal F$ fulfills $|\mathcal F|\le k$ and Assumptions 2.8 and 2.10, then there exists some universal constant $c>0$ such that the following holds: If $\sup_{f\in\mathcal F}V(f)\le\delta$ and $\sup_{f\in\mathcal F}\|f\|_\infty\le m(n,\delta,k)$, then
$$\mathbb E\max_{f\in\mathcal F}\big|\mathbb G_n^{(1)}(f)\big|\,\mathbb 1_{\{R_n(f)\le\delta^2\psi(\delta)^2\}}\le c\Big(1+\frac{D_n^\infty}{D_n}\Big)\cdot\psi(\delta)\,\delta\sqrt{H(k)},\tag{4.4}$$
$$\mathbb P\Big(\sup_{f\in\mathcal F}R_n(f)>\delta^2\psi(\delta)^2\Big)\le\frac{c\big(1+q^*(C^{-1}C_\beta^{-1})\frac{D_n^\infty}{D_n}\big)}{\psi(\delta)^2}.\tag{4.5}$$
With the help of Lemma 4.5, we obtain the following maximal inequality.

Theorem 4.6.

Let $\mathcal F$ satisfy Assumptions 2.8 and 2.10, and let $F$ be some envelope function of $\mathcal F$. Furthermore, let $\sigma>0$ and suppose that $\sup_{f\in\mathcal F}V(f)\le\sigma$. Set
$$\psi(\varepsilon)=\sqrt{\log(\varepsilon^{-1}\vee1)}\,\log\log(\varepsilon^{-1}\vee e).\tag{4.6}$$
Then there exists a universal constant $c>0$ such that for each $\eta>0$,
$$\begin{aligned}\mathbb P\Big(\sup_{f\in\mathcal F}\big|\mathbb G_n^{(1)}(f)\big|>\eta\Big)&\le\frac1\eta\Big[c\Big(\frac{D_n^\infty}{D_n}+\frac{D_n}{D_n^\infty}\Big)\int_0^\sigma\psi(\varepsilon)\sqrt{1\vee H(\varepsilon,\mathcal F,V)}\,d\varepsilon+\sqrt n\,\big\|F\,\mathbb 1_{\{F>m(n,\sigma,N(\sigma,\mathcal F,V))\}}\big\|_{1,n}\Big]\\&\quad+c\Big(q^*\big(C^{-1}C_\beta^{-1}\big)\frac{D_n^\infty}{D_n}\Big)\int_0^\sigma\frac{1}{\varepsilon\,\psi(\varepsilon)^2}\,d\varepsilon,\end{aligned}\tag{4.7}$$
where $m(\cdot)$ is from Lemma 3.6.

Remark 4.7. Let $m>0$. The chaining procedure for martingales found in [18] uses the fact that for functions $f,g$ with $|f|\le g$ and $g(\cdot)>m$,
$$|\mathbb G_n^{(1)}(f)|\le|\mathbb G_n^{(1)}(g)|+2\sqrt n\cdot\frac1n\sum_{i=1}^n\mathbb E\big[g(Z_i,\tfrac in)\mid\mathcal G_{i-1}\big]\le|\mathbb G_n^{(1)}(g)|+2\sqrt n\,\frac{R_n(g)}{m}.$$
Afterwards, bounds for the conditional variance $R_n(g)$ are applied. In our case, these bounds are not sharp enough. We therefore employ the inequality
$$|\mathbb G_n^{(1)}(f)|\le|\mathbb G_n^{(1)}(g)|+2|\mathbb G_n^{(2)}(g)|+2\sqrt n\,\frac{\|g\|_{2,n}^2}{m}$$
and are forced to use the “smooth” chaining technique applied in Theorem 3.7.

Remark 4.8 (Alternative conditions). There seems to be no straightforward way to approximate the conditional variance $R_n(f)$ by an appropriate deterministic distance using a slicing device. Instead, we bound $R_n(f)$ from above during the chaining procedure, which leads to the additional factor $\psi(\varepsilon)$ in the entropy integral. In some special cases, the conditions of Theorem 4.6 can be relaxed. Suppose that $V^*$ is some semi-metric on $\mathcal F\times\mathcal F$ such that for all $f_1,f_2\in\mathcal F$, $V(f_1-f_2)\le V^*(f_1,f_2)$, and for $\gamma>0$ small enough, $\sup_{f_1,f_2\in\mathcal F,\,V^*(f_1,f_2)\le\gamma}R_n(f_1-f_2)\le\gamma^2$ almost surely.
Then the statement of Theorem 4.6 still holds in the following form: If $\mathcal F$ satisfies Assumption 2.8 only for $\kappa=1$ (and not necessarily Assumption 2.10), then for any $R>0$,
$$\mathbb P\Big(\sup_{f\in\mathcal F}\big|\mathbb G_n^{(1)}(f)\big|>\eta\Big)\le\frac R\eta\Big[c\Big(\frac{D_n^\infty}{D_n}+\frac{D_n}{D_n^\infty}\Big)\int_0^\sigma\sqrt{1\vee H(\varepsilon,\mathcal F,V^*)}\,d\varepsilon+\sqrt n\,\big\|F\,\mathbb 1_{\{F>m(n,\sigma,N(\sigma,\mathcal F,V^*))\}}\big\|_{1,n}\Big]+\frac cR\Big(q^*\big(C^{-1}C_\beta^{-1}\big)\frac{D_n^\infty}{D_n}\Big).$$
One possible example where this could be applicable is given in Example 6.10.
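Theorem 4.6 and Remark 4.8 depend on the truncation level $m(n,\delta,k)$ from Lemma 3.6 and on $q^*(\cdot)$. The Python sketch below shows one way to evaluate these quantities. It assumes definitions that are not restated in this part of the paper, namely $\beta(q)=\sum_{j\ge q}\Delta(j)$ and $q^*(x)=\min\{q\in\mathbb N:\beta(q)\le qx\}$. It also assumes the hypothetical geometric decay $\Delta(j)=\rho^j$, which gives the closed form $\beta(q)=\rho^q/(1-\rho)$, and it reads (3.9) as $m(n,\delta,k)=r(\delta/D_n)\,D_n^\infty\sqrt n/\sqrt{H(k)}$. Since $b_q=\beta(q)/q$ is decreasing, $q^*(r)=q$ exactly on $[b_q,b_{q-1})$. This turns the maximisation in $r(\delta)=\max\{r>0:q^*(r)r\le\delta\}$ into a scan over $q$.

```python
import math


def beta_geom(q: int, rho: float) -> float:
    """beta(q) = sum_{j >= q} Delta(j) for the hypothetical decay Delta(j) = rho**j."""
    return rho ** q / (1.0 - rho)


def q_star(x: float, rho: float, q_max: int = 100_000) -> int:
    """q*(x) = min{q >= 1 : beta(q) <= q x}."""
    for q in range(1, q_max + 1):
        if beta_geom(q, rho) <= q * x:
            return q
    raise ValueError("no admissible q <= q_max; increase q_max")


def r_of(delta: float, rho: float, q_max: int = 100_000) -> float:
    """r(delta) = max{r > 0 : q*(r) r <= delta}.

    b_q = beta(q)/q is decreasing and q*(r) = q exactly for r in [b_q, b_{q-1}),
    where q*(r) r = q r. Hence every q with b_q <= delta/q contributes the candidate
    min(delta/q, b_{q-1}); the value b_{q-1} is admissible because it lies in the
    interval of q - 1, where (q - 1) b_{q-1} < delta."""
    best, prev_b = 0.0, math.inf
    for q in range(1, q_max + 1):
        b_q = beta_geom(q, rho) / q
        if b_q <= delta / q:
            best = max(best, min(delta / q, prev_b))
        if b_q < best:              # every later candidate is at most b_q
            break
        prev_b = b_q
    return best


def H(k: int) -> float:
    """H(k) = 1 v log(k)."""
    return max(1.0, math.log(k))


def m_level(n: int, delta: float, k: int, D_n: float, D_inf: float, rho: float) -> float:
    """Truncation level (3.9), read as r(delta / D_n) * D_inf * sqrt(n) / sqrt(H(k))."""
    return r_of(delta / D_n, rho) * D_inf * math.sqrt(n) / math.sqrt(H(k))


if __name__ == "__main__":
    rho = 0.5
    for a in (0.01, 0.1, 0.5, 1.0, 2.0):
        r = r_of(a, rho)
        print(f"a = {a:5.2f}   r(a) = {r:.6f}   q*(r(a)) = {q_star(r, rho)}")
    print("m(n=1000, delta=0.5, k=50):", m_level(1000, 0.5, 50, 1.0, 1.0, rho))
```

The checks below test $r(a)\le a$ and $r(a/2)\le r(a)/2$. Both inequalities follow directly from the definition of $r$, because $q^*\ge1$ and $q^*(\cdot)$ is non-increasing.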

To formulate an equicontinuity statement, we use the following assumption to discuss the term which incorporates the envelope function in the upper bound of Theorem 4.6. The assumption is necessary to bound terms of the form $\|f(Z_i,u)-f(\tilde Z_i(\frac in),u)\|$, which naturally arise when a limit for the variance of $\mathbb G_n(f)$ is derived. These cannot be handled with Assumption 2.8.

Assumption 4.9.

For small enough $c>0$ and for all $f\in\mathcal F$,
$$\sup_{u,v\in[0,1]}\frac{1}{c^s}\,\mathbb E\Big[\sup_{|a|_{L_{\mathcal F},s}\le c}\big|\bar f(\tilde Z_0(v),u)-\bar f(\tilde Z_0(v)+a,u)\big|\Big]<\infty,\tag{4.8}$$
and there exists $\bar p\in(1,\infty]$ such that $\sup_{i,u}\|\bar f(Z_i,u)\|_{\bar p}<\infty$ and $\sup_{v,u}\|\bar f(\tilde Z_0(v),u)\|_{\bar p}<\infty$. Let $\bar F:(\mathbb R^d)^{\mathbb N}\times[0,1]\to\mathbb R$ be some function which fulfills $\sup_{f\in\mathcal F}\bar f\le\bar F$. Assumption 3.10 and the conditions above also hold when $\bar f$ is replaced by $\bar F$.

Remark 4.10. In contrast to the continuous case, where all conditions imposed on $f\in\mathcal F$ also transfer to $\sup_{f\in\mathcal F}f$ due to the purely analytic nature of Assumptions 2.5 and 3.10, we here additionally require some envelope function $\bar F$ to fulfill Assumption 3.10 and (4.8), because the supremum over $f\in\mathcal F$ does not interchange with the expectation in (4.8). We now obtain asymptotic equicontinuity of the process $\mathbb G_n(f)$.

Corollary 4.11.

Let $\mathcal F$ satisfy Assumptions 2.8, 2.10, 3.9, 3.10 and 4.9. For $\psi$ from (4.6), suppose that
$$\sup_{n\in\mathbb N}\int_0^\infty\psi(\varepsilon)\sqrt{1\vee H(\varepsilon,\mathcal F,V)}\,d\varepsilon<\infty.\tag{4.9}$$
Furthermore, let $D_n, D_n^\infty\in(0,\infty)$ be independent of $n$, and
$$\sup_{i=1,\dots,n}\frac{D_n^\infty(\frac in)}{\sqrt n}\to0.\tag{4.10}$$
Then the process $\mathbb G_n(f)$ is equicontinuous with respect to $V$, that is, for every $\eta>0$,
$$\lim_{\sigma\to0}\limsup_{n\to\infty}\mathbb P\Big(\sup_{f,g\in\mathcal F,\,V(f-g)\le\sigma}|\mathbb G_n(f)-\mathbb G_n(g)|\ge\eta\Big)=0.$$

Remark 4.12. Compared to Corollary 3.11, condition (4.9) of Corollary 4.11 is not optimal due to the additional log-factor. The reason is that we do not approximate the distance $R_n(\cdot)$ uniformly over the class $\mathcal F$ in an external step, but evaluate the needed bounds for $R_n(\cdot)$ during the chaining process. This is also the reason why our result does not coincide with the i.i.d. case. However, compared to the results of Corollary 8.16, we do not lose much due to this factor in the presence of polynomial dependence. Even in the case of exponential decay, only an additional factor appears in the integral, which can be seen as a factor contributed by the exponential decay itself.

It is possible to show the following analogue of the multivariate central limit theorem in Theorem 3.13 for a class $\mathcal F$ which fulfills Assumption 2.8.

Theorem 4.13.

Suppose that $\mathcal F$ satisfies Assumptions 2.8, 2.10, 3.9, 3.10, 3.12 and 4.9. Let $m\in\mathbb N$ and $f_1,\dots,f_m\in\mathcal F$. Then
$$\frac{1}{\sqrt n}\sum_{i=1}^n\left\{\begin{pmatrix}f_1(Z_i,\frac in)\\ \vdots\\ f_m(Z_i,\frac in)\end{pmatrix}-\mathbb E\begin{pmatrix}f_1(Z_i,\frac in)\\ \vdots\\ f_m(Z_i,\frac in)\end{pmatrix}\right\}\xrightarrow{d}N\Big(0,\big(\Sigma^{(K)}_{f_k,f_l}\big)_{k,l=1,\dots,m}\Big),$$
where $\Sigma^{(K)}$ is from Assumption 3.12.

As a final result, we obtain the following functional central limit theorem (cf. (3.14) for the definition of $\ell^\infty(\mathcal F)$).

Corollary 4.14.

Suppose that $\mathcal F$ satisfies Assumptions 2.8, 2.10, 3.9, 3.10, 3.12 and 4.9. For $\psi$ defined in (4.6), suppose that $\sup_{n\in\mathbb N}\int_0^1\psi(\varepsilon)\sqrt{1\vee H(\varepsilon,\mathcal F,V)}\,d\varepsilon<\infty$. Then in $\ell^\infty(\mathcal F)$,
$$\big[\mathbb G_n(f)\big]_{f\in\mathcal F}\xrightarrow{d}\big[\mathbb G(f)\big]_{f\in\mathcal F},$$
where $(\mathbb G(f))_{f\in\mathcal F}$ is a centered Gaussian process with covariances $\operatorname{Cov}(\mathbb G(f),\mathbb G(g))=\Sigma^{(K)}_{f,g}$, and $\Sigma^{(K)}$ is from Assumption 3.12.
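Remark 4.12 attributes the gap between the entropy conditions (3.12) and (4.9) to the extra factor $\psi(\varepsilon)$. The Python sketch below compares the two integrals numerically. It uses a hypothetical logarithmic entropy $H(\varepsilon)=d\log(\varepsilon^{-1}\vee1)$, which is the order that Example 6.5 states for a one-dimensional Lipschitz class, and implements $\psi$ as written in (4.6). The substitution $\varepsilon=e^{-t}$ removes the singularity at $\varepsilon=0$. The midpoint rule and the cut-off at $t=60$ are ad hoc numerical choices. For $d=1$, the unweighted integral has the closed form $1+\sqrt\pi\,\operatorname{erfc}(1)/2$, and the test uses it as a check. With the formula in (4.6), $\psi(\varepsilon)=0$ for $\varepsilon\ge e^{-1}$, so only small $\varepsilon$ contribute to the weighted integral.

```python
import math
from typing import Callable


def psi(eps: float) -> float:
    """psi(eps) = sqrt(log(1/eps v 1)) * log log(1/eps v e), as in (4.6)."""
    inv = 1.0 / eps
    return math.sqrt(math.log(max(inv, 1.0))) * math.log(math.log(max(inv, math.e)))


def entropy(eps: float, d: float = 1.0) -> float:
    """Hypothetical entropy bound H(eps) = d log(1/eps v 1)."""
    return d * math.log(max(1.0 / eps, 1.0))


def integral_0_1(g: Callable[[float], float], t_max: float = 60.0, steps: int = 200_000) -> float:
    """int_0^1 g(eps) d eps = int_0^inf g(e^{-t}) e^{-t} dt, midpoint rule on [0, t_max]."""
    dt = t_max / steps
    total = 0.0
    for k in range(steps):
        eps = math.exp(-(k + 0.5) * dt)
        total += g(eps) * eps
    return total * dt


# integral appearing in (3.12) and Corollary 3.14
plain = integral_0_1(lambda e: math.sqrt(max(1.0, entropy(e))))
# psi-weighted integral appearing in (4.9) and Corollary 4.14
weighted = integral_0_1(lambda e: psi(e) * math.sqrt(max(1.0, entropy(e))))

if __name__ == "__main__":
    print(f"(3.12)-type integral: {plain:.6f}")
    print(f"(4.9)-type integral:  {weighted:.6f}")
```

Both integrals are finite for logarithmic entropy. This is consistent with Remark 4.12, which notes that the extra log-factor costs little in such cases.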

5. Large deviation inequalities

A large variety of large deviation inequalities using the functional dependence measure have been derived; see for instance [28] and [27] for Nagaev- and Rosenthal-type inequalities. Here, we present a Bernstein-type inequality for $\mathbb G_n(f)$ which can be extended to a large deviation inequality for $\sup_{f\in\mathcal F}|\mathbb G_n(f)|$ using a combination of our chaining scheme from Section 3 and the one from [2]. We provide these results to complete the picture of empirical process theory for the functional dependence measure and to show the power of the decomposition (3.1). In general, however, the derived inequalities are weaker than a combination of Markov's inequality and Theorem 3.2. The reason for this mainly lies in the treatment of the first summand in (3.1) and in the fact that the functional dependence measure is formulated with an $L^\nu$-norm instead of probabilities. This leads to a worsening of $V(\cdot)$ and $\beta(\cdot)$.

For $q\in\mathbb N$ and $\nu\ge2$, define
$$\omega(q):=q^{1/\nu}\log(eq)^{1/2},\qquad\mathcal L(q)=\log\log(e^eq),\qquad\Phi(q)=q\,\mathcal L(q),$$
as well as
$$\tilde\beta(q)=\sum_{j=q}^\infty\Delta(j)\,\omega(j)\,\mathcal L(j),\qquad\tilde V(f)=\|f\|_{2,n}+\sum_{j=1}^\infty\min\{\|f\|_{2,n},D_n\Delta(j)\omega(j)\}\,\mathcal L(j).$$
With the above quantities, we can formulate the following result.

Theorem 5.1 (Bernstein-type large deviation inequality). Let $\mathcal F$ satisfy Assumption 2.5. Then there exist universal constants $c_1,c_2>0$ such that the following holds: For each $q\in\{1,\dots,n\}$ there exists a set $B_n(q)$, independent of $f\in\mathcal F$, such that for all $x>0$,
$$\mathbb P\big(|\mathbb G_n(f)|>x,\ B_n(q)\big)\le c_1\exp\Big(-\frac{c_2x^2}{\tilde V(f)^2+\frac{M\Phi(q)}{\sqrt n}x}\Big)\tag{5.1}$$
and $\mathbb P(B_n(q)^c)\le\frac{D_n^\infty\tilde\beta(q)\sqrt n}{M\Phi(q)}$. Define $\tilde q^*(z):=\min\{q\in\mathbb N:\tilde\beta(q)\le\Phi(q)z\}$. Then for any $y>0$ and $x>0$,
$$\mathbb P\Big(|\mathbb G_n(f)|>x,\ B_n\big(\tilde q^*(\tfrac{M}{\sqrt nD_n^\infty}y)\big)\Big)\le c_1\exp\Big(-\frac{c_2x^2}{\tilde V(f)^2+\Phi\big(\tilde q^*(\frac{M}{\sqrt nD_n^\infty}y)\big)\frac{Mx}{\sqrt n}}\Big)\tag{5.2}$$
and $\mathbb P\big(B_n(\tilde q^*(\frac{M}{\sqrt nD_n^\infty}y))^c\big)\le y$.

Remark 5.2. (i) Theorem 5.1 mimics the well-known large deviation inequalities from [21] (Theorem 5 therein) or [15] in the case of $\alpha$-mixing sequences.
(ii) Compared to Theorem 3.2, $V,\beta,q$ worsen to $\tilde V,\tilde\beta,\Phi(q)$ in Theorem 5.1. The reason is the arising sums over $l=1,\dots,L$ in the second term and over $j=q,q+1,\dots$ in the first term $\sum_{j=q}^\infty\max_{f\in\mathcal F}\frac{1}{\sqrt n}\big|S^W_{n,j+1}(f)-S^W_{n,j}(f)\big|$ of the decomposition (3.1), which force us to include additional log-factors to obtain convergence. The additional factor $j^{1/\nu}$ that appears in $\tilde\beta$ is due to an application of Markov's inequality. It can be argued that this is a relic of the fact that the dependence conditions are stated with moments and not with probabilities, as in the case of mixing.
(iii) Theorem 5.1 can be seen as an improvement of the Bernstein inequalities given in [8], which are only available for random variables with exponential decay (in our setting, the conditions are comparable to $\Delta(k)=O(\exp(-k^a))$ for some $a>0$).

A similar statement is valid in the case of non-continuous classes $\mathcal F$.
We then need the following analogue of Assumption 2.10, where $\beta(\cdot)$ is replaced by $\tilde\beta(\cdot)$ and $q$ is replaced by $\Phi(q)$.

Assumption 5.3.

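The sequence $j\mapsto\Delta(j)\,\omega(j)\,\mathcal L(j)$ is decreasing. There exists some constant $C_{\tilde\beta}>0$ such that $\tilde\beta_{\mathrm{norm}}(q):=\frac{\tilde\beta(q)}{\Phi(q)}$ fulfills, for all $q_1,q_2\in\mathbb N$,
$$\tilde\beta_{\mathrm{norm}}(q_1q_2)\le C_{\tilde\beta}\,\tilde\beta_{\mathrm{norm}}(q_1)\,\tilde\beta_{\mathrm{norm}}(q_2).$$

Theorem 5.4.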

Let $\mathcal F$ satisfy Assumptions 2.8 and 5.3. Then there exist universal constants $c_1^\circ,c_2^\circ>0$ such that the following holds: For each $q\in\{1,\dots,n\}$ there exists a set $B_n^\circ(q)$, independent of $f\in\mathcal F$, such that for all $x>0$,
$$\mathbb P\big(|\mathbb G_n(f)|>x,\ B_n^\circ(q)\big)\le c_1^\circ\exp\Big(-\frac{c_2^\circ x^2}{\tilde V(f)^2+\frac{M\Phi(q)}{\sqrt n}x}\Big)\tag{5.3}$$
and $\mathbb P(B_n^\circ(q)^c)\le\big[4+C_\Delta C_{\tilde\beta}\big]\Big(\frac{\sqrt nD_n^\infty}{M}\cdot\frac{\tilde\beta(q)}{\Phi(q)}\Big)$. Furthermore, for any $x>0$ and $y>0$,
$$\mathbb P\Big(|\mathbb G_n(f)|>x,\ B_n^\circ\big(\tilde q^*(\tfrac{M}{\sqrt nD_n^\infty}y)\big)\Big)\le c_1^\circ\exp\Big(-\frac{c_2^\circ x^2}{\tilde V(f)^2+\tilde q^*(\frac{M}{\sqrt nD_n^\infty}y)\frac{Mx}{\sqrt n}}\Big)\tag{5.4}$$
and $\mathbb P\big(B_n^\circ(\tilde q^*(\frac{M}{\sqrt nD_n^\infty}y))^c\big)\le C_\Delta C_{\tilde\beta}\,y$.

The Bernstein-type inequalities above can be extended to a large deviation inequality for $\sup_{f\in\mathcal F}|\mathbb G_n(f)|$ using a chaining scheme from [2]. This scheme incorporates an entropy integral of the form $\int_0^\sigma\psi(\varepsilon)\,W\big(1\vee H(\varepsilon,\mathcal F,\tilde V)\big)\,d\varepsilon$, where $\psi$ is a log-factor (cf. (4.6)) and $W:\mathbb R\to\mathbb R$ fulfills $H^{1/2}\le W(H)\le H$, depending on the decay of $\Delta(\cdot)$. Details can be found in Section 8.6, Theorem 8.12, of the Supplementary Material (Supplement A). The larger entropy integral comes from the fact that in the proof of Theorem 5.1, we can only recover the linear $\exp(-x)$ part of the Bernstein inequality in the discussion of the first summand in (3.1) (see (8.115) in the Supplementary Material, Supplement A).
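The quantities behind Theorems 5.1 and 5.4 are elementary to evaluate. The Python sketch below computes $\mathcal L(q)$, $\Phi(q)$, $\tilde q^*(z)$ and the right-hand side of (5.1). The function $\tilde\beta$ is passed in as an argument. The stand-in `lambda q: 0.8 ** q` used below is hypothetical and is not computed from any $\Delta(\cdot)$. The values $c_1=c_2=1$ are placeholders for the unspecified universal constants.

```python
import math
from typing import Callable


def L(q: int) -> float:
    """L(q) = log log(e^e q); equals 1 at q = 1 and grows very slowly."""
    return math.log(math.log(math.exp(math.e) * q))


def phi(q: int) -> float:
    """Phi(q) = q L(q), which replaces q in Theorems 5.1 and 5.4."""
    return q * L(q)


def q_tilde_star(z: float, beta_tilde: Callable[[int], float], q_max: int = 10**6) -> int:
    """q~*(z) = min{q in N : beta~(q) <= Phi(q) z}."""
    for q in range(1, q_max + 1):
        if beta_tilde(q) <= phi(q) * z:
            return q
    raise ValueError("no admissible q <= q_max; increase q_max")


def bernstein_rhs(x: float, v_tilde: float, M: float, q: int, n: int,
                  c1: float = 1.0, c2: float = 1.0) -> float:
    """Right-hand side of (5.1); c1 and c2 are placeholder constants."""
    return c1 * math.exp(-c2 * x ** 2 / (v_tilde ** 2 + M * phi(q) * x / math.sqrt(n)))


if __name__ == "__main__":
    beta_tilde = lambda q: 0.8 ** q          # hypothetical stand-in for beta~
    n, M, y, d_inf = 10_000, 1.0, 0.05, 1.0
    # block length used in (5.2): q~*(M y / (sqrt(n) D_n^inf))
    q = q_tilde_star(M * y / (math.sqrt(n) * d_inf), beta_tilde)
    print("q~* =", q, "  Phi(q~*) =", round(phi(q), 3))
    for x in (0.5, 1.0, 2.0, 4.0):
        print(x, bernstein_rhs(x, v_tilde=1.0, M=M, q=q, n=n))
```

Replacing $q$ by $\Phi(q)=q\,\mathcal L(q)$ costs only a doubly logarithmic factor, which is why the worsening discussed in Remark 5.2(ii) is mild.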

6. Applications

In this section, we provide some applications of the main results for smooth function classes (Corollary 3.3 and Corollary 3.14) and non-smooth function classes (Corollary 4.3 and Corollary 4.14). We focus on locally stationary processes and therefore use localization in our functionals, but the results also hold for stationary processes. Let $K:\mathbb R\to\mathbb R$ be some bounded kernel function which is Lipschitz continuous with Lipschitz constant $L_K$ and satisfies $\int K(u)\,du=1$, $\int K(u)^2\,du\in(0,\infty)$ and $\operatorname{supp}K\subset[-\frac12,\frac12]$. For some bandwidth $h>0$, put $K_h(\cdot):=\frac1hK(\frac\cdot h)$.

In the first example, we consider the kernel estimator in the context of nonparametric regression with fixed design and locally stationary noise. We show that under conditions on the bandwidth $h$ which are common in the presence of dependence (cf. [14] or [23]), we obtain the optimal uniform convergence rate $\sqrt{\frac{\log(n)}{nh}}$. For sequences $a_n,b_n$, write $a_n\gtrsim b_n$ if there exists some constant $c>0$ such that $a_n\ge cb_n$ for all $n\in\mathbb N$.

Example 6.1 (Nonparametric regression). Let $X_i$ be some arbitrary process of the form (1.1) with $\sum_{k=0}^\infty\delta^X(k)<\infty$ which fulfills $\sup_{i=1,\dots,n}\|X_i\|_\nu\le C_X\in(0,\infty)$ for some $\nu>2$. Suppose that we observe $Y_i$, $i=1,\dots,n$, given by $Y_i=g(\frac in)+X_i$, where $g:[0,1]\to\mathbb R$ is some function. Estimation of $g$ is performed via
$$\hat g_{n,h}(v):=\frac1n\sum_{i=1}^nK_h\big(\tfrac in-v\big)Y_i.$$
Suppose that either
• $\delta^X(j)\le\kappa j^{-\alpha}$ with some $\kappa>0$, $\alpha>1$, and $h\gtrsim\big(\frac{\log(n)}{n^{1-2/\nu}}\big)^{\frac{\alpha-1}{\alpha}}$, or
• $\delta^X(j)\le\kappa\rho^j$ with some $\kappa>0$, $\rho\in(0,1)$, and $h\gtrsim\frac{\log(n)}{n^{1-2/\nu}}$.
From (6.1) and (6.2) below, it follows that
$$\sup_{v\in[0,1]}\big|\hat g_{n,h}(v)-\mathbb E\hat g_{n,h}(v)\big|=O_p\Big(\sqrt{\frac{\log(n)}{nh}}\Big).$$
First note that, due to Lipschitz continuity of $K$ with Lipschitz constant $L_K$, we have
$$\sup_{|v-v'|\le n^{-2}}\big|(\hat g_{n,h}(v)-\mathbb E\hat g_{n,h}(v))-(\hat g_{n,h}(v')-\mathbb E\hat g_{n,h}(v'))\big|\le\frac{L_Kn^{-2}}{nh^2}\sum_{i=1}^n\big(|X_i|+\mathbb E|X_i|\big)=O_p(n^{-1}).\tag{6.1}$$
For the grid $V_n=\{in^{-2}:i=1,\dots,n^2\}$, which discretizes $[0,1]$ up to distances $n^{-2}$, we obtain by Corollary 3.3 that
$$\sqrt{nh}\sup_{v\in V_n}\big|\hat g_{n,h}(v)-\mathbb E\hat g_{n,h}(v)\big|=\sup_{f\in\mathcal F}|\mathbb G_n(f)|=O_p\big(\sqrt{\log|V_n|}\big)=O_p\big(\log(n)^{1/2}\big),\tag{6.2}$$
where $\mathcal F=\{f_v(x,u)=\frac{1}{\sqrt h}K(\frac{u-v}{h})\,x:\ v\in V_n\}$. The conditions of Corollary 3.3 are easily verified: It holds that $f_v(x,u)=D_{f,n}(u)\cdot\bar f_v(x,u)$ with $D_{f,n}(u)=\frac{1}{\sqrt h}K(\frac{u-v}{h})$ and $\bar f_v(x,u)=x$.
Thus, Assumption 2.5 is satisfied with $\Delta(k)=2\delta^X(k)$, $p=\infty$ and $R(\cdot)=C_R=1$. Furthermore, $D_n=|K|_\infty$, $D_{\nu,n}=\frac{|K|_\infty}{\sqrt h}$, and
$$\|f_v\|_{2,n}\le\frac{1}{\sqrt h}\Big(\frac1n\sum_{i=1}^nK\Big(\frac{i/n-v}{h}\Big)^2\|X_i\|^2\Big)^{1/2}\le C_X|K|_\infty,$$
which shows that $\sup_{f\in\mathcal F}V(f)=O(1)$. The conditions on $h$ emerge from the last condition in (3.4) and from the bounds for $r(\cdot)$ in Table 2.

For the following two examples, we suppose the following properties of the underlying process $X_i$. Similar assumptions are posed in [4] and are fulfilled for a large variety of locally stationary processes.

Assumption 6.2.

Let $M>0$. Let $X_i$ be some process of the form (1.1). For any $u\in[0,1]$, there exists $\tilde X_i(u)=J(\mathcal G_i,u)$, where $J$ is a measurable function, with the following properties: There exist some constants $C_X>0$, $\varsigma\in(0,1]$ such that for all $i=1,\dots,n$ and $u_1,u_2\in[0,1]$,
$$\|\tilde X_0(u)\|_M\le C_X,\qquad\|X_i\|_M\le C_X,\qquad\|X_i-\tilde X_i(\tfrac in)\|_M\le C_Xn^{-\varsigma},\qquad\|\tilde X_0(u_1)-\tilde X_0(u_2)\|_M\le C_X|u_1-u_2|^\varsigma.$$
In the same spirit as in Example 6.1, it is possible to derive uniform rates of convergence for M-estimators of parameters $\theta$ in models of locally stationary processes. Furthermore, let $\nabla^j_\theta$ denote the $j$-th derivative with respect to $\theta$. To apply empirical process theory, we ask for the objective functions to be $(L_{\mathcal F},1,R,C)$-classes in (A1) and to be Lipschitz with respect to $\theta$ in (A2).

Lemma 6.3 (M-estimation, uniform results). Let $\Theta\subset\mathbb R^{d_\Theta}$ be compact and $\theta_0:[0,1]\to\operatorname{interior}(\Theta)$. For each $\theta\in\Theta$, let $\ell_\theta:\mathbb R^k\to\mathbb R$ be some measurable function which is twice continuously differentiable. Let $Z_i=(X_i,\dots,X_{i-k+1})$, and define for $v\in[0,1]$,
$$\hat\theta_{n,h}(v):=\operatorname*{arg\,min}_{\theta\in\Theta}L_{n,h}(v,\theta),\qquad L_{n,h}(v,\theta):=\frac1n\sum_{i=k}^nK_h\big(\tfrac in-v\big)\cdot\ell_\theta(Z_i).$$
Let Assumption 6.2 be fulfilled for some $M\ge2$. Suppose that there exists $C_\Theta>0$ such that for $j\in\{0,1,2\}$,
(A1) $\bar{\mathcal F}_j=\{\nabla^j_\theta\ell_\theta:\theta\in\Theta\}$ is an $(L_{\mathcal F},1,R,C)$-class with $R(z)=1+|z|^{M-1}$,
(A2) for all $z\in\mathbb R^k$ and $\theta,\theta'\in\Theta$, $\big|\nabla^j_\theta\ell_\theta(z)-\nabla^j_\theta\ell_{\theta'}(z)\big|_\infty\le C_\Theta(1+|z|^M)\cdot|\theta-\theta'|$,
(A3) $\theta\mapsto\mathbb E\,\ell_\theta(\tilde Z_0(v))$ attains its global minimum in $\theta_0(v)$ with positive definite $I(v):=\mathbb E\nabla^2_\theta\ell_{\theta_0(v)}(\tilde Z_0(v))$.
Furthermore, suppose that either
• $\delta^X_M(j)\le\kappa j^{-\alpha}$ with some $\kappa>0$, $\alpha>1$, and $h\gtrsim\big(\frac{\log(n)}{n^{1-2/\nu}}\big)^{\frac{\alpha-1}{\alpha}}$, or
• $\delta^X_M(j)\le\kappa\rho^j$ with some $\kappa>0$, $\rho\in(0,1)$, and $h\gtrsim\frac{\log(n)}{n^{1-2/\nu}}$.
Define $\tau_n:=\sqrt{\frac{\log(n)}{nh}}$ and $B_h:=\sup_{v\in[0,1]}\big|\mathbb E\nabla_\theta L_{n,h}(v,\theta_0(v))\big|$ (the bias).
Then $B_h=O(h^\varsigma)$, and as $nh\to\infty$,
$$\sup_{v\in[\frac h2,1-\frac h2]}\big|\hat\theta_{n,h}(v)-\theta_0(v)\big|=O_p\big(\tau_n+B_h\big)$$
and
$$\sup_{v\in[\frac h2,1-\frac h2]}\Big|\big\{\hat\theta_{n,h}(v)-\theta_0(v)\big\}-I(v)^{-1}\nabla_\theta L_{n,h}(v,\theta_0(v))\Big|=O_p\big((\tau_n+h^\varsigma)(\tau_n+B_h)\big).$$

Remark 6.4.
• In the tvAR(1) case $X_i=a(i/n)X_{i-1}+\varepsilon_i$, we can use for instance $\ell_\theta(x_0,x_1)=(x_0-ax_1)^2$, which for $a\in(-1,1)$ is an $(L_{\mathcal F},s,R,C)$-class with $L_{\mathcal F}=(1,a)$ and $R(x)=|x_0|+|x_1|$.
• With more smoothness assumptions on $\nabla_\theta\ell$, or by using a local linear estimation method for $\hat\theta_{n,h}$, the bias term $B_h$ can be shown to be of smaller order, for instance $O(h^2)$ (cf. [4]).
• The theory derived in this paper can also be used to prove asymptotic properties of M-estimators based on objective functions $\ell_\theta$ which are only almost everywhere differentiable in the Lebesgue sense, by following the theory of Chapter 5 in [22]. This is of particular interest for $\ell_\theta$ with additional analytic properties, such as convexity. Since these properties are also needed in the proofs, we do not discuss this in detail.

We give an easy application of the functional central limit theorem from Corollary 3.14, following Example 19.25 in [22].

Example 6.5 (Local mean absolute deviation). For fixed $v\in(0,1)$, put $\bar X_n(v):=\frac1n\sum_{i=1}^nK_h(\frac in-v)X_i$ and define the mean absolute deviation
$$\operatorname{mad}_n(v):=\frac1n\sum_{i=1}^nK_h\big(\tfrac in-v\big)\big|X_i-\bar X_n(v)\big|.$$
Let Assumption 6.2 hold with $M=1$. Suppose that $\mathbb P(\tilde X_0(v)=\mathbb E\tilde X_0(v))=0$ and that for some $\kappa>0$, $\alpha>1$, $\delta^X(j)\le\kappa j^{-\alpha}$. We show that if $nh\to\infty$ and $nh^{2\varsigma}\to0$, then
$$\sqrt{nh}\big(\operatorname{mad}_n(v)-\mathbb E|\tilde X_0(v)-\mu|\big)\xrightarrow{d}N(0,\sigma^2),\tag{6.3}$$
where $\mu=\mathbb E\tilde X_0(v)$, $G$ denotes the distribution function of $\tilde X_0(v)$, and
$$\sigma^2=\int K(u)^2\,du\cdot\sum_{j=0}^\infty\operatorname{Cov}\Big(|\tilde X_0(v)-\mu|+(2G(\mu)-1)\tilde X_0(v),\ |\tilde X_j(v)-\mu|+(2G(\mu)-1)\tilde X_j(v)\Big).$$
The result is obtained by using the decomposition
$$\sqrt{nh}\big(\operatorname{mad}_n(v)-\mathbb E|\tilde X_0(v)-\mu|\big)=\mathbb G_n\big(f_{\bar X_n(v)}-f_\mu\big)+\mathbb G_n(f_\mu)+A_n,$$
$$A_n=\frac{\sqrt{nh}}{n}\sum_{i=1}^nK_h\big(\tfrac in-v\big)\Big\{\mathbb E|X_i-\theta|-\mathbb E|\tilde X_0(v)-\mu|\Big\}\Big|_{\theta=\bar X_n(v)},$$
where $\Theta=\{\theta\in\mathbb R:|\theta-\mu|\le1\}$ and $\mathcal F=\{f_\theta(x,u)=\sqrt h\,K_h(u-v)\,|x-\theta|:\theta\in\Theta\}$. By the triangle inequality, $\mathcal F$ satisfies Assumption 2.5 with $\bar f_\theta(x,u)=|x-\theta|$, $R(\cdot)=C_R=1$, $p=\infty$, $s=1$ and $\Delta(k)=2\delta^X(k)$. Assumption 3.9 is satisfied through Assumption 6.2, and Assumption 3.10 is trivially fulfilled since $\bar f$ does not depend on $u$. Since $\mathcal F$ is a one-dimensional Lipschitz class, $\sup_{n\in\mathbb N}H(\varepsilon,\mathcal F,\|\cdot\|_{2,n})=O(\log(\varepsilon^{-1}\vee1))$. By Corollary 3.14, we obtain that there exists some process $[\mathbb G(f_\theta)]_{\theta\in\Theta}$ such that for $h\to0$ and $nh\to\infty$,
$$\big[\mathbb G_n(f_\theta)\big]_{\theta\in\Theta}\xrightarrow{d}\big[\mathbb G(f_\theta)\big]_{\theta\in\Theta}\quad\text{in }\ell^\infty(\Theta).\tag{6.4}$$
Furthermore, by Assumption 6.2,
$$\begin{aligned}\big\|f_{\bar X_n(v)}(X_i)-f_\mu(X_i)\big\|&\le\|\bar X_n(v)-\mu\|\le\|\bar X_n(v)-\mathbb E\bar X_n(v)\|+\|\mathbb E\bar X_n(v)-\mu\|\\&\le\frac{1}{\sqrt{nh}}\Big(\frac{1}{nh}\sum_{i=1}^nK\Big(\frac{i/n-v}{h}\Big)^2\Big)^{1/2}\sum_{j=0}^\infty\delta^X(j)+\frac1n\sum_{i=1}^nK_h\big(\tfrac in-v\big)\big|\mathbb EX_i-\mathbb E\tilde X_0(v)\big|\\&=O\big((nh)^{-1/2}+h^\varsigma\big).\end{aligned}\tag{6.5}$$
By Lemma 19.24 in [22], we conclude from (6.4) and (6.5) that
$$\mathbb G_n\big(f_{\bar X_n(v)}-f_\mu\big)\xrightarrow{p}0.\tag{6.6}$$
By Assumption 6.2 and bounded variation of $K$,
$$A_n=\sqrt{nh}\Big\{\mathbb E|\tilde X_0(v)-\theta|\Big|_{\theta=\bar X_n(v)}-\mathbb E|\tilde X_0(v)-\mu|\Big\}+O_p\big((nh)^{-1/2}+(nh)^{1/2}h^{\varsigma}\big).\tag{6.7}$$
Due to $\mathbb P(\tilde X_0(v)=\mu)=0$, the function $g(\theta)=\mathbb E|\tilde X_0(v)-\theta|$ is differentiable in $\theta=\mu$ with derivative $2G(\mu)-1$. The delta method delivers
$$\sqrt{nh}\Big\{\mathbb E|\tilde X_0(v)-\theta|\Big|_{\theta=\bar X_n(v)}-\mathbb E|\tilde X_0(v)-\mu|\Big\}=(2G(\mu)-1)\sqrt{nh}\big(\bar X_n(v)-\mu\big)+o_p(1).\tag{6.8}$$
From (6.6), (6.7) and (6.8), we obtain
$$\sqrt{nh}\big(\operatorname{mad}_n(v)-\mathbb E|\tilde X_0(v)-\mu|\big)=\mathbb G_n\big(f_\mu+(2G(\mu)-1)\,\mathrm{id}\big)+o_p(1).$$
Theorem 3.13 now yields (6.3).
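The estimator in Example 6.5 is straightforward to compute. The Python sketch below assumes a hypothetical stationary Gaussian AR(1) input $X_i=0.5X_{i-1}+\varepsilon_i$, which is a trivially locally stationary case. It uses the illustrative kernel $K(u)=\frac32-6u^2$ on $[-\frac12,\frac12]$, which is Lipschitz and integrates to one, as required at the beginning of Section 6. For this input, $\mathbb E|\tilde X_0(v)-\mu|=\sqrt{2/\pi}\,/\sqrt{1-a^2}$, which provides a reference value. The function names are made up for this sketch.

```python
import math
import random


def kernel(u: float) -> float:
    """K(u) = 1.5 - 6u^2 on [-1/2, 1/2] (Lipschitz, integrates to one)."""
    return 1.5 - 6.0 * u * u if abs(u) <= 0.5 else 0.0


def local_mad(x: list, v: float, h: float) -> tuple:
    """Return (bar X_n(v), mad_n(v)) with K_h(t) = K(t/h)/h; x[i-1] holds X_i."""
    n = len(x)
    idx = [i for i in range(1, n + 1) if abs(i / n - v) <= h / 2]   # kernel window
    w = {i: kernel((i / n - v) / h) / h for i in idx}
    x_bar = sum(w[i] * x[i - 1] for i in idx) / n
    mad = sum(w[i] * abs(x[i - 1] - x_bar) for i in idx) / n
    return x_bar, mad


def ar1(n: int, a: float, seed: int = 0, burn_in: int = 500) -> list:
    """Stationary Gaussian AR(1) sample X_1, ..., X_n (hypothetical test input)."""
    rng = random.Random(seed)
    val, out = 0.0, []
    for k in range(n + burn_in):
        val = a * val + rng.gauss(0.0, 1.0)
        if k >= burn_in:
            out.append(val)
    return out


if __name__ == "__main__":
    a = 0.5
    x = ar1(40_000, a, seed=11)
    x_bar, mad = local_mad(x, v=0.5, h=0.2)
    target = math.sqrt(2 / math.pi) / math.sqrt(1 - a * a)
    print(f"bar X_n(v) = {x_bar:.4f}, mad_n(v) = {mad:.4f}, E|X~_0(v) - mu| = {target:.4f}")
```

For a symmetric marginal, $2G(\mu)-1=0$. In that case the correction term $(2G(\mu)-1)\,\mathrm{id}$ in the limit of Example 6.5 vanishes, so estimating $\mu$ by $\bar X_n(v)$ does not affect the first-order behaviour.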

To keep the following examples simple, we restrict ourselves to rather specific models. It is not hard to apply our theory to more general situations.

Model 6.6 (Recursively defined models). The process $X_i$, $i = 1, \dots, n$, follows a recursion

$$X_i = m\big(X_{i-1}, \tfrac{i}{n}\big) + \sigma\big(X_{i-1}, \tfrac{i}{n}\big)\varepsilon_i,$$

where $\varepsilon_i$, $i \in \mathbb{Z}$, is an i.i.d. sequence of random variables and $\sigma, m : \mathbb{R} \times [0,1] \to \mathbb{R}$. Suppose that there exist $\chi_m, C_m, \varsigma > 0$ such that

$$\sup_{x \ne x'}\sup_u \frac{|m(x,u) - m(x',u)|}{|x - x'|} \le \chi_m, \qquad \sup_{u \ne u'}\sup_x \frac{|m(x,u) - m(x,u')|}{(1 + |x|)\cdot|u - u'|^{\varsigma}} \le C_m, \tag{6.9}$$

and $\sup_u |m(0,u)| \le C_m$. Let $\sigma(\cdot)$ satisfy the same properties with constants $\chi_\sigma, C_\sigma > 0$. Let $s > 0$ be such that $\chi_m + \|\varepsilon_0\|_s \cdot \chi_\sigma < 1$. By Proposition 4.4 and Lemma 4.5 in [4], Assumption 6.2 is fulfilled and, with some $\rho \in (0,1)$, $\delta^X_s(k) \le C_X \rho^k$.

Model 6.7 (Linear models). The process $X_i$, $i = 1, \dots, n$, has the form

$$X_i = \sum_{j=0}^\infty a_j\big(\tfrac{i}{n}\big)\varepsilon_{i-j},$$

where $\varepsilon_i$, $i \in \mathbb{Z}$, is an i.i.d. sequence and $a_j : [0,1] \to \mathbb{R}$ are some functions. There exist $M > 0$, $\varsigma > 0$ and some absolutely summable sequences $A = (A_j)_{j \in \mathbb{N}}$, $\bar A = (\bar A_j)_{j \in \mathbb{N}}$ such that $\|\varepsilon_0\|_s < \infty$ and, for $j \in \mathbb{N}$,

$$\sup_{u \in [0,1]} |a_j(u)| \le A_j, \qquad \sup_{u \ne u'} \frac{|a_j(u) - a_j(u')|}{|u - u'|^{\varsigma}} \le \bar A_j.$$

Furthermore, $\inf_u a_0(u) \ge \sigma_{\min} > 0$. Then it is easily seen that Assumption 6.2 is fulfilled and $\delta^X_s(j) \le \|\varepsilon_0\|_s A_j$.

To verify Assumption 2.8 in the case of density and distribution function estimation, the linear Model 6.7 can be treated as a "special case" of the recursive Model 6.6 by identifying $m(X_{i-1}, \frac{i}{n})$ with $\sum_{j=1}^\infty a_j(i/n)\varepsilon_{i-j}$ and $\sigma(X_{i-1}, \frac{i}{n})$ with $a_0(i/n)$. There is a standard method to show that Assumption 2.8 is valid by imposing only minimal moment conditions on the underlying process $X_i$: we will encounter terms of the form

$$\frac{1}{\sigma(z,u)}\, g\Big(\frac{y - m(z,u)}{\sigma(z,u)}\Big),$$

where $y \in \mathbb{R}$ and $g(\cdot)$ is some bounded, continuously differentiable function with $C_{g'} := \sup_{x \in \mathbb{R}} |g'(x)x| < \infty$.
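The geometric decay of the functional dependence measure in Model 6.6 can be made tangible by a coupling simulation: run the recursion once with the original innovations and once with $\varepsilon_{i-k}$ replaced by an independent copy, and average the squared difference of the two values at time $i$. The following Python sketch does this under assumptions of our own, none of which come from the text: Gaussian innovations, the hypothetical choices $m(x,u) = 0.5\sin(x) + 0.3u$ and $\sigma(x,u) = 1 + 0.2|x|/(1+|x|)$ (so that $\chi_m = 0.5$, $\chi_\sigma = 0.2$, $\|\varepsilon_0\|_2 = 1$ and $\chi_m + \|\varepsilon_0\|_2\chi_\sigma = 0.7 < 1$), the start value $X_0 = 0$, and a single fixed time index $i$ instead of the supremum over $i$. It estimates $\|X_i - X_i^{*(i-k)}\|_2$, which for this choice is bounded by $1.2\sqrt{2}\cdot 0.7^k$ because $\sigma < 1.2$ and each recursion step contracts the $L^2$ distance by at most $0.7$.

```python
import math
import random


# Hypothetical model functions satisfying (6.9):
# chi_m = 0.5, chi_sigma = 0.2, and sigma(x, u) lies in [1, 1.2).
def m(x: float, u: float) -> float:
    return 0.5 * math.sin(x) + 0.3 * u


def sigma(x: float, u: float) -> float:
    return 1.0 + 0.2 * abs(x) / (1.0 + abs(x))


def step(x_prev: float, t: int, n: int, eps: float) -> float:
    """One step of X_t = m(X_{t-1}, t/n) + sigma(X_{t-1}, t/n) * eps_t."""
    u = t / n
    return m(x_prev, u) + sigma(x_prev, u) * eps


def coupled_l2_distance(n: int, i: int, k_max: int, reps: int, seed: int = 0) -> list:
    """Monte Carlo estimate of ||X_i - X_i^{*(i-k)}||_2 for k = 0, ..., k_max.

    X_i^{*(i-k)} uses the same innovations as X_i, except that eps_{i-k}
    is replaced by an independent copy.
    """
    if i - k_max < 1:
        raise ValueError("need i - k_max >= 1")
    rng = random.Random(seed)
    sq_sums = [0.0] * (k_max + 1)
    for _ in range(reps):
        eps = [0.0] + [rng.gauss(0.0, 1.0) for _ in range(i)]  # eps[1..i]
        x = [0.0] * (i + 1)  # start value X_0 = 0
        for t in range(1, i + 1):
            x[t] = step(x[t - 1], t, n, eps[t])
        for k in range(k_max + 1):
            t0 = i - k
            # coupled path: independent copy of eps_{i-k}, same innovations afterwards
            y = step(x[t0 - 1], t0, n, rng.gauss(0.0, 1.0))
            for t in range(t0 + 1, i + 1):
                y = step(y, t, n, eps[t])
            sq_sums[k] += (x[i] - y) ** 2
    return [math.sqrt(s / reps) for s in sq_sums]


if __name__ == "__main__":
    estimates = coupled_l2_distance(n=100, i=60, k_max=10, reps=2000, seed=1)
    for k, est in enumerate(estimates):
        bound = 1.2 * math.sqrt(2.0) * 0.7 ** k
        print(f"k={k:2d}  estimate={est:.4f}  bound={bound:.4f}")
```

Only the last $k+1$ steps are recomputed for each coupled path, since the innovations before time $i-k$ are shared by both paths.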
Omitting the second argument of $m, \sigma$ for brevity, we have, for some $\xi_{z,z'}$ between $1$ and $\sigma(z')/\sigma(z)$,

$$\begin{aligned}
&\Big|g\Big(\frac{y - m(z)}{\sigma(z)}\Big) - g\Big(\frac{y - m(z')}{\sigma(z')}\Big)\Big| &&\text{(6.10)}\\
&\quad\le \frac{|g'|_\infty}{\sigma_{\min}}|m(z) - m(z')| + \Big|g\Big(\frac{y - m(z')}{\sigma(z')}\cdot\frac{\sigma(z')}{\sigma(z)}\Big) - g\Big(\frac{y - m(z')}{\sigma(z')}\Big)\Big|\\
&\quad\le \frac{|g'|_\infty}{\sigma_{\min}}|m(z) - m(z')| + \Big|g'\Big(\frac{y - m(z')}{\sigma(z')}\xi_{z,z'}\Big)\frac{y - m(z')}{\sigma(z')}\xi_{z,z'}\Big(\frac{\sigma(z')}{\sigma(z)} - 1\Big)\xi_{z,z'}^{-1}\Big|\\
&\quad\le \frac{|g'|_\infty}{\sigma_{\min}}|m(z) - m(z')| + \frac{2C_{g'}}{\sigma_{\min}}|\sigma(z) - \sigma(z')|. &&\text{(6.11)}
\end{aligned}$$

On the other hand, (6.10) is bounded by $2|g|_\infty$. Using the fact that $\min\{1, x\} \le x^a$ for $x \ge 0$ and arbitrarily small $a \in (0,1]$, we obtain

$$\Big|\frac{1}{\sigma(z)} - \frac{1}{\sigma(z')}\Big| \le \min\big\{\sigma_{\min}^{-1}, \sigma_{\min}^{-2}|\sigma(z) - \sigma(z')|\big\} \le \sigma_{\min}^{-1}\sigma_{\min}^{-a}|\sigma(z) - \sigma(z')|^a$$

and from (6.11) that

$$\Big|\frac{1}{\sigma(z)}g\Big(\frac{y - m(z)}{\sigma(z)}\Big) - \frac{1}{\sigma(z')}g\Big(\frac{y - m(z')}{\sigma(z')}\Big)\Big| \le \frac{|g|_\infty}{\sigma_{\min}}\Big(\frac{|g'|_\infty}{|g|_\infty\sigma_{\min}}\Big)^a|m(z) - m(z')|^a + \frac{2|g|_\infty}{\sigma_{\min}}\Big(\sigma_{\min} \vee \frac{C_{g'}}{|g|_\infty}\Big)^a|\sigma(z) - \sigma(z')|^a. \tag{6.12}$$

Example 6.8 (Density estimation). With some kernel $\tilde K : \mathbb{R} \to [0, \infty)$, we consider the localized estimate of the density $g_{\tilde X(v)}$ of $\tilde X(v)$,

$$\hat g_{n,h}(x,v) = \frac{1}{n}\sum_{i=1}^n K_{h_1}\big(\tfrac{i}{n} - v\big)\tilde K_{h_2}(X_i - x),$$

where $h_1, h_2 > 0$ are some bandwidths and we abbreviate $h = (h_1, h_2)$. Suppose that

• $X_i$ evolves like Model 6.6 or Model 6.7 and, for some $\alpha > (s \wedge \frac12)^{-1}$, $\delta^X_s(j) = O(j^{-\alpha})$,
• $\varepsilon_0$ fulfills $C_\varepsilon := \|\varepsilon_0\|_s < \infty$ and has a density $g_\varepsilon$ with respect to the Lebesgue measure which is bounded, continuously differentiable and satisfies $\sup_{x \in \mathbb{R}} |g_\varepsilon'(x)x| < \infty$,
• there exist $p_{\tilde K} \ge s$ and $C_{\tilde K} > 0$ such that $|\tilde K(u)| \le C_{\tilde K}|u|^{-p_{\tilde K}}$ for $|u|$ large enough. Furthermore, $\int \tilde K(x)\,dx = 1$, $\int \tilde K(x)^2\,dx < \infty$ and $\int \tilde K(x)|x|\,dx < \infty$.

We show that if $\log(n)\big(nh_1h_2^{\alpha(s\wedge\frac12)/(\alpha(s\wedge\frac12) - 1)}\big)^{-1} = O(1)$, then

$$\sup_{x \in \mathbb{R}, v \in [0,1]}\big|\hat g_{n,h}(x,v) - g_{\tilde X(v)}(x)\big| = O_p\Big(\sqrt{\frac{\log(n)}{nh_1h_2}} + \sqrt{nh_1h_2}\,\big(h_2 + h_1^{\varsigma(s \wedge \frac12)}\big)\Big). \tag{6.13}$$

To do so, note that $\sqrt{nh_1h_2}\big(\hat g_{n,h}(x,v) - E\hat g_{n,h}(x,v)\big) = \mathbb{G}_n(f_{x,v})$, where

$$\mathcal{F} = \big\{f_{x,v}(z,u) = \sqrt{h_1}K_{h_1}(u - v)\cdot\sqrt{h_2}\tilde K_{h_2}(z - x) : x \in \mathbb{R}, v \in [0,1]\big\}.$$

With $\bar f_{x,v}(z,u) = \sqrt{h_2}\tilde K_{h_2}(z - x)$ and $\kappa \in \{1,2\}$, we have, by the substitution $\omega = h_2^{-1}\big(m(X_{i-1}, \frac{i}{n}) + \sigma(X_{i-1}, \frac{i}{n})\varepsilon - x\big)$,

$$\begin{aligned}
E[\bar f_{x,v}(X_i,u)^\kappa \,|\, \mathcal{G}_{i-1}]^{1/\kappa} &= \frac{1}{\sqrt{h_2}}E\Big[\tilde K\Big(\frac{m(X_{i-1},\frac{i}{n}) + \sigma(X_{i-1},\frac{i}{n})\varepsilon_i - x}{h_2}\Big)^\kappa\,\Big|\,\mathcal{G}_{i-1}\Big]^{1/\kappa}\\
&= \frac{1}{\sqrt{h_2}}\Big[\int \tilde K\Big(\frac{m(X_{i-1},\frac{i}{n}) + \sigma(X_{i-1},\frac{i}{n})\varepsilon - x}{h_2}\Big)^\kappa g_\varepsilon(\varepsilon)\,d\varepsilon\Big]^{1/\kappa}\\
&= h_2^{\frac{1}{\kappa} - \frac12}\Big[\int \tilde K(\omega)^\kappa\,\frac{1}{\sigma(X_{i-1},\frac{i}{n})}\,g_\varepsilon\Big(\frac{x + h_2\omega - m(X_{i-1},\frac{i}{n})}{\sigma(X_{i-1},\frac{i}{n})}\Big)\,d\omega\Big]^{1/\kappa} =: \bar\mu^{(\kappa)}_{f_{x,v},i}(X^\circ_{i-1}, u)
\end{aligned}$$

with $X^\circ_i = X_i$. By Hölder continuity of the square root, (6.12) and (6.9), we obtain

$$\big|\bar\mu^{(\kappa)}_{f_x,i}(z,u) - \bar\mu^{(\kappa)}_{f_x,i}(z',u)\big| \le C\Big(\int \tilde K(\omega)^\kappa\,d\omega\Big)^{1/\kappa}|z - z'|^{s \wedge \frac{1}{\kappa}},$$

where $C$ depends on $|g_\varepsilon|_\infty$, $|g'_\varepsilon|_\infty$, $\sup_{x \in \mathbb{R}}|g'_\varepsilon(x)x|$, $\chi_m$, $\chi_\sigma$, $\sigma_{\min}$. The class $\mathcal{F}$ therefore satisfies Assumption 2.8 with $p = \infty$, $\nu = 2$, $\Delta(k) = O(\delta^X_s(k)^{s \wedge \frac12}) = O(k^{-\alpha(s \wedge \frac12)})$. Note that $\bar F(z,u) = \sup_{f \in \mathcal{F}} \bar f(z,u) \le |\tilde K|_\infty/\sqrt{h_2} =: C_{\bar F,n}$. We obtain from Corollary 4.3 that, for the grids $V_n = \{in^{-1} : i = 1, \dots, n\}$ and $\mathcal{X}_n = \{i\,n^{-\cdots} : i \in \{-\lceil n^{\cdots}\rceil, \dots, \lceil n^{\cdots}\rceil\}\}$,

$$\sqrt{nh_1h_2}\sup_{x \in \mathcal{X}_n, v \in V_n}\big|\hat g_{n,h}(x,v) - E\hat g_{n,h}(x,v)\big| = \sup_{x \in \mathcal{X}_n, v \in V_n}|\mathbb{G}_n(f_{x,v})| = O_p\big(\sqrt{\log(n)}\big).$$

The discretization argument for (6.13) and the replacement of $E\hat g_{n,h}(x,v)$ by $g_{\tilde X(v)}(x)$ are rather standard and postponed to the Supplementary material (Supplement A, Section 8.7).

Remark 6.9.
• Note that, due to Remark 4.4, all statements of Example 6.8 also hold with $s \wedge \frac12$ replaced by $s \wedge 1$.
• Compared to [14] or [23], which proved similar results in the case that dependence is quantified with $\alpha$-mixing coefficients, we obtain much weaker conditions on the lower bounds of the bandwidth $h$ that guarantee uniform convergence. The reason is that we assume that $\varepsilon_0$ has a Lebesgue density, which was not required in the above papers. If we wanted to use our theory to prove (6.13) without assuming that $\varepsilon_0$ has a Lebesgue density, we would have to impose conditions which allow us to prove the statements of Lemma 2.7 directly.
• … $\mathcal{F}$, we are able to quantify the conditions on the moments and the decay of dependence in an easy way.

Example 6.10 (Empirical distribution function). Let $v \in [0,1]$. As an estimator for the distribution function $G_{\tilde X(v)}$ of $\tilde X(v)$, we consider

$$\hat G_{n,h}(x,v) := \frac{1}{n}\sum_{i=1}^n K_h\big(\tfrac{i}{n} - v\big)\mathbb{1}\{X_i \le x\}, \quad x \in \mathbb{R}.$$

Suppose that

• $X_i$ evolves like Model 6.6 or Model 6.7 and, for some $\alpha > (s \wedge \frac12)^{-1}$, $\delta^X_s(j) = O(j^{-\alpha})$,
• $\varepsilon_0$ fulfills $C_\varepsilon := \|\varepsilon_0\|_s < \infty$, and its distribution function $G_\varepsilon$ is continuously differentiable with derivative $g_\varepsilon$ satisfying $\sup_{x \in \mathbb{R}}|g_\varepsilon(x)x| < \infty$.

Then, as $nh \to \infty$ and $\sqrt{nh}\cdot h^{\varsigma(s \wedge \frac12)} \to 0$,

$$\sqrt{nh}\big[\hat G_{n,h}(x,v) - G_{\tilde X(v)}(x)\big]_{x \in \mathbb{R}} \xrightarrow{d} \big[\mathbb{G}(x)\big]_{x \in \mathbb{R}} \quad \text{in } \ell^\infty(\mathbb{R}), \tag{6.14}$$

where $\mathbb{G}$ is some centered Gaussian process with covariance function

$$\operatorname{Cov}(\mathbb{G}(x), \mathbb{G}(y)) = \int K(u)^2\,du \sum_{j \in \mathbb{Z}}\operatorname{Cov}\big(\mathbb{1}\{\tilde X_0(v) \le x\}, \mathbb{1}\{\tilde X_j(v) \le y\}\big).$$

To prove this result, we use the fact that

$$\sqrt{nh}\big[\hat G_{n,h}(x,v) - E\hat G_{n,h}(x,v)\big]_{x \in \mathbb{R}} = \big[\mathbb{G}_n(f)\big]_{f \in \mathcal{F}}, \tag{6.15}$$

where $\mathcal{F} = \{f_x(z,u) = \sqrt{h}K_h(u - v)\mathbb{1}\{z \le x\} : x \in \mathbb{R}\}$. With $\bar f_x(z,u) = \mathbb{1}\{z \le x\}$, we have

$$E[\bar f_x(X_i,u)^\kappa \,|\, \mathcal{G}_{i-1}]^{1/\kappa} = G_\varepsilon\Big(\frac{x - m(X_{i-1}, \frac{i}{n})}{\sigma(X_{i-1}, \frac{i}{n})}\Big)^{1/\kappa} =: \bar\mu^{(\kappa)}_{f_x,i}(X_{i-1}, u).$$
As in Example 6.8, we see that Assumption 2.8 is satisfied with $X^\circ_i = X_i$, $p = \infty$, $\nu = 2$, $\Delta(k) = O(\delta^X_s(k-1)^{s \wedge \frac12}) = O(k^{-\alpha(s \wedge \frac12)})$. Assumption 3.9 follows directly from Model 6.6 or Model 6.7. Assumption 3.10 is trivially satisfied since $\bar f_x(z,u)$ does not depend on the second argument. For any $x \in \mathbb{R}$ and $c > 0$,

$$\sup_{|a| \le c}\big|\mathbb{1}\{\tilde X(u) \le x\} - \mathbb{1}\{\tilde X(u) + a \le x\}\big| \le \mathbb{1}\{x - c < \tilde X(u) \le x + c\},$$

so that, for $s \in (0,1]$,

$$\begin{aligned}
\sup_{u \in [0,1]} c^{-s}E\Big[\sup_{|a| \le c}\big|\mathbb{1}\{\tilde X(u) \le x\} - \mathbb{1}\{\tilde X(u) + a \le x\}\big|\Big] &\le \sup_{u \in [0,1]} c^{-s}E\Big[G_\varepsilon\Big(\frac{x + c - m(\tilde X_{i-1}(u), u)}{\sigma(\tilde X_{i-1}(u), u)}\Big) - G_\varepsilon\Big(\frac{x - c - m(\tilde X_{i-1}(u), u)}{\sigma(\tilde X_{i-1}(u), u)}\Big)\Big]\\
&\le c^{-s}\min\Big\{1, \frac{2|g_\varepsilon|_\infty c}{\sigma_{\min}}\Big\} \le \Big(\frac{2|g_\varepsilon|_\infty}{\sigma_{\min}}\Big)^s,
\end{aligned}$$

which shows Assumption 4.9. For $\gamma > 0$, we can discretize $[-\gamma^{-2/s}, \gamma^{-2/s}]$ by a grid $\{x_j\}_{j = -N, \dots, N}$ with distances $\gamma^2$, having roughly $\gamma^{-2/s - 2}$ points. Under the given conditions, it is possible to show that, with $x_{N+1} = \infty$ and $x_{-(N+1)} = -\infty$, the brackets $[f_{x_{j-1}}, f_{x_j}]$, $j = -N, \dots, N+1$, cover $\mathcal{F}$. Details can be found in the Supplementary material (Supplement A, Section 8.7). We therefore have $\sqrt{H(\gamma, \mathcal{F}, \|\cdot\|_{2,n})} = O\big(\sqrt{\log(\gamma^{-1})}\big)$. By Table 2, as long as $\alpha(s \wedge \frac12) > 1$, we have $\sup_{n \in \mathbb{N}}\int \psi(\varepsilon)\sqrt{H(\varepsilon, \mathcal{F}, V)}\,d\varepsilon < \infty$. By Corollary 4.14, we obtain for $nh \to \infty$ that (6.15) converges to $[\mathbb{G}(x)]_{x \in \mathbb{R}}$.

Remark 6.11.
• We conjecture that $s \wedge \frac12$ can be replaced by $s \wedge 1$ due to Remark 4.8.
• With similar techniques as presented in 6.12, it is also possible to include weight functions $w : \mathbb{R} \to [0, \infty)$ with $\lim_{x \to \pm\infty} w(x) = \infty$ as additional factors in the convergence (6.14), as done in [25].
• In Model 6.6, it is also reasonable to consider estimating the residual distribution function $G_\varepsilon$ itself. Following the approach of [1], we first have to specify estimators $\hat m, \hat\sigma$ for $m, \sigma$, respectively, and define the empirical residuals $\hat\varepsilon_i = \frac{X_i - \hat m(X_{i-1}, i/n)}{\hat\sigma(X_{i-1}, i/n)}$. Then

$$\hat G_\varepsilon(x) = \frac{1}{n}\sum_{i=1}^n \mathbb{1}\{\hat\varepsilon_i \le x\} = \frac{1}{n}\sum_{i=1}^n \mathbb{1}\Big\{\varepsilon_i \le x\cdot\frac{\hat\sigma(X_{i-1}, i/n)}{\sigma(X_{i-1}, i/n)} + \frac{\hat m(X_{i-1}, i/n) - m(X_{i-1}, i/n)}{\sigma(X_{i-1}, i/n)}\Big\}$$

can be analyzed with empirical process theory and analytic properties of $\hat m, \hat\sigma$.
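The estimators $\hat G_{n,h}(x,v)$ of Example 6.10 and $\hat g_{n,h}(x,v)$ of Example 6.8 are weighted sums and can be sketched in a few lines of Python. The sketch assumes the usual scaling $K_h(x) = K(x/h)/h$, an Epanechnikov kernel for $K$ and a Gaussian kernel for $\tilde K$ (which has the polynomial tail decay and the moments required in Example 6.8); none of these choices is prescribed by the text, and all function names are hypothetical.

```python
import math
import random


def epanechnikov(x: float) -> float:
    """Compactly supported kernel K with integral 1 (assumed choice)."""
    return 0.75 * (1.0 - x * x) if abs(x) <= 1.0 else 0.0


def gaussian_kernel(x: float) -> float:
    """Kernel K-tilde for the density estimate (assumed choice)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def local_weights(n: int, v: float, h: float) -> list:
    """Weights (1/n) K_h(i/n - v), i = 1..n, with K_h(x) = K(x/h) / h."""
    return [epanechnikov((i / n - v) / h) / (n * h) for i in range(1, n + 1)]


def local_edf(xs: list, x: float, v: float, h: float) -> float:
    """G_hat_{n,h}(x, v) = (1/n) sum_i K_h(i/n - v) 1{X_i <= x}."""
    w = local_weights(len(xs), v, h)
    return sum(wi for wi, xi in zip(w, xs) if xi <= x)


def local_density(xs: list, x: float, v: float, h1: float, h2: float) -> float:
    """g_hat_{n,h}(x, v) = (1/n) sum_i K_{h1}(i/n - v) K-tilde_{h2}(X_i - x)."""
    w = local_weights(len(xs), v, h1)
    return sum(wi * gaussian_kernel((xi - x) / h2) / h2 for wi, xi in zip(w, xs))


if __name__ == "__main__":
    # Hypothetical locally stationary input: time-varying AR(1) process
    # X_i = a(i/n) X_{i-1} + eps_i with a(u) = 0.8 * u (a special case of Models 6.6 and 6.7).
    rng = random.Random(42)
    n = 2000
    xs, prev = [], 0.0
    for i in range(1, n + 1):
        prev = 0.8 * (i / n) * prev + rng.gauss(0.0, 1.0)
        xs.append(prev)
    for v in (0.25, 0.75):
        print(v, round(local_edf(xs, 0.0, v, h=0.1), 3),
              round(local_density(xs, 0.0, v, h1=0.1, h2=0.3), 3))
```

Because the weights are nonnegative, $x \mapsto \hat G_{n,h}(x,v)$ is nondecreasing; for $v$ away from the boundary the weights sum to approximately $\int K = 1$, so $\hat G_{n,h}(\cdot, v)$ behaves like a distribution function.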

7. Conclusion

In this paper, we have developed an empirical process theory for locally stationary processes via the functional dependence measure. We have proven maximal inequalities, … $L$- or $L^\infty$-statistics. We have given several examples in nonparametric estimation where our theory is applicable. Since the size of the function class and the stochastic properties of the underlying process can be analyzed separately, we conjecture that our theory also permits an extension of various results from i.i.d. to dependent data, such as empirical risk minimization.

From a technical point of view, the linear and moment-based nature of the functional dependence measure has forced us to modify several approaches from empirical process theory for i.i.d. or mixing variables. A main issue was that the dependence measure only transfers decay rates for continuous functions. We have therefore provided a new chaining technique which preserves continuity of the arguments of the empirical process, and we have extended the results to noncontinuous functions. We were not able to derive Bernstein-type maximal inequalities with an optimal entropy integral; this may be addressed in future work.

In principle, a similar empirical process theory can be established for (1.3) under mixing conditions such as absolute regularity. This would be a generalization of the results found in [21] and [5]. However, in a number of models, deriving a bound for these mixing coefficients may require some effort, while the functional dependence measure is usually easy to bound if the evolution of the process over time is known. As in such a mixing framework, our theory can be applied as long as the decay coefficients of the functional dependence measure are absolutely summable.
However, it turns out that there are significant differences: in our framework, the integrand $\sqrt{H(\varepsilon, \mathcal{F}, \|\cdot\|_{2,n})}$ of the entropy integral is multiplied by some factor depending on $\varepsilon$ while only second moments are needed, whereas in the mixing case there is no additional factor but more moments are needed through a larger norm. Only in special cases are these integrals comparable; the exact connection between the values of the functional dependence measure and $\beta$-mixing coefficients remains an open question.

References

[1] Akritas, M. G. and Van Keilegom, I. (2001). Non-parametric estimation of the residual distribution. Scand. J. Statist.
[2] Alexander, K. S. (1984). Probability inequalities for empirical processes and a law of the iterated logarithm. The Annals of Probability.
[3] Borkar, V. S. (1993). White-noise representations in stochastic realization theory. SIAM J. Control Optim.
[4] Dahlhaus, R., Richter, S. and Wu, W. B. (2019). Towards a general theory for nonlinear locally stationary processes. Bernoulli.
[5] Dedecker, J. and Louhichi, S. (2002). Maximal inequalities and empirical central limit theorems. In Empirical Process Techniques for Dependent Data.
[6] Donsker, M. D. (1952). Justification and extension of Doob's heuristic approach to the Kolmogorov–Smirnov theorems. Ann. Math. Statistics.
[7] Doukhan, P., Massart, P. and Rio, E. (1995). Invariance principles for absolutely regular empirical processes. Ann. Inst. H. Poincaré Probab. Statist.
[8] Doukhan, P. and Neumann, M. H. (2007). Probability and moment inequalities for sums of weakly dependent random variables, with applications. Stochastic Processes and their Applications.
[9] Dudley, R. M. (1966). Weak convergences of probabilities on nonseparable metric spaces and empirical measures on Euclidean spaces. Illinois J. Math.
[10] Dudley, R. M. (1978). Central limit theorems for empirical measures. Ann. Probab.
[11] Francq, C. and Zakoïan, J.-M. (2006). Mixing properties of a general class of GARCH(1,1) models without moment assumptions on the observed process. Econometric Theory.
[12] Freedman, D. A. (1975). On tail probabilities for martingales. Ann. Probab.
[13] Fryzlewicz, P. and Subba Rao, S. (2011). Mixing properties of ARCH and time-varying ARCH processes. Bernoulli.
[14] Hansen, B. E. (2008). Uniform convergence rates for kernel estimation with dependent data. Econometric Theory.
[15] Liebscher, E. (1996). Strong convergence of sums of α-mixing random variables with applications to density estimation. Stochastic Processes and their Applications.
[16] Mayer, U. (2019). Functional weak limit theorem for a local empirical process of non-stationary time series and its application to von Mises-statistics.
[17] Mokkadem, A. (1988). Mixing properties of ARMA processes. Stochastic Process. Appl.
[18] Nishiyama, Y. et al. (2000). Weak convergence of some classes of martingales with jumps. The Annals of Probability.
[19] Pinelis, I. (1994). Optimum bounds for the distributions of martingales in Banach spaces. Ann. Probab.
[20] Pollard, D. (1982). A central limit theorem for empirical processes. J. Austral. Math. Soc. Ser. A.
[21] Rio, E. (1995). The functional law of the iterated logarithm for stationary strongly mixing sequences. Ann. Probab.
[22] van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge. MR1652247
[23] Vogt, M. (2012). Nonparametric regression for locally stationary time series. Ann. Statist.
[24] Wu, W. B. (2005). Nonlinear system theory: another look at dependence. Proc. Natl. Acad. Sci. USA.
[25] Wu, W. B. (2008). Empirical processes of stationary sequences. Statistica Sinica.
[26] Wu, W. B. (2011). Asymptotic theory for stationary processes. Stat. Interface.
[27] Wu, W. B., Liu, W. and Xiao, H. (2013). Probability and moment inequalities under dependence. Statist. Sinica.
[28] Zhang, D. and Wu, W. B. (2017). Gaussian approximation for high dimensional time series. Ann. Statist.

Supplementary Material

Supplement A: Technical proofs. This material contains some details of the proofs in the paper as well as the proofs of the examples.

8. Appendix

Proof of Lemma 2.7.

We have, for each $f \in \mathcal{F}$ and $\nu \ge 1$,

$$\begin{aligned}
\sup_i\big\|\bar f(Z_i,u) - \bar f(Z_i^{*(i-k)},u)\big\|_\nu &\le \sup_i\Big\||Z_i - Z_i^{*(i-k)}|^s_{L_{\mathcal{F}},s}\big(R(Z_i,u) + R(Z_i^{*(i-k)},u)\big)\Big\|_\nu\\
&\le \sup_i\Big\||Z_i - Z_i^{*(i-k)}|^s_{L_{\mathcal{F}},s}\Big\|_{\frac{p\nu}{p-1}}\Big\|R(Z_i,u) + R(Z_i^{*(i-k)},u)\Big\|_{p\nu}\\
&\le \sup_i\Big\|\sum_{j=0}^\infty L_{\mathcal{F},j}\big|X_{i-j} - X_{i-j}^{*(i-k)}\big|^s_\infty\Big\|_{\frac{p\nu}{p-1}}\Big(\|R(Z_i,u)\|_{p\nu} + \big\|R(Z_i^{*(i-k)},u)\big\|_{p\nu}\Big)\\
&\le dC_R\sum_{j=0}^k L_{\mathcal{F},j}\big(\delta^X_{\frac{p\nu s}{p-1}}(k-j)\big)^s.
\end{aligned}$$

This shows the first assertion. Due to

$$\sup_{f \in \mathcal{F}}\big|\bar f(Z_i,u) - \bar f(Z_i^{*(i-k)},u)\big| \le |Z_i - Z_i^{*(i-k)}|^s_{L_{\mathcal{F}},s}\big(R(Z_i,u) + R(Z_i^{*(i-k)},u)\big),$$

the second assertion follows similarly. The last assertion follows from

$$|\bar f(z,u)| \le |\bar f(z,u) - \bar f(0,u)| + |\bar f(0,u)| \le |z|^s_{L_{\mathcal{F}},s}\cdot\big(R(z,u) + R(0,u)\big) + |\bar f(0,u)|,$$

which implies

$$\begin{aligned}
\|\bar f(Z_i,u)\|_\nu &\le \Big\|\sum_{j=0}^\infty L_{\mathcal{F},j}|Z_{i-j}|^s_\infty\Big\|_{\frac{p\nu}{p-1}}\big(\|R(Z_i,u)\|_{p\nu} + R(0,u)\big) + |\bar f(0,u)|\\
&\le d\cdot|L_{\mathcal{F}}|\cdot C_X^s\cdot\big(C_R + |R(0,u)|\big) + |\bar f(0,u)|\\
&\le d\cdot|L_{\mathcal{F}}|\cdot C_X^s\cdot C_R + C_{\bar f}.
\end{aligned}$$

Lemma 8.1.

Let Assumption 2.8 hold for some $\nu \ge 2$. Then, for all $u \in [0,1]$,

$$\begin{aligned}
\delta^{E[f(Z_i,u)|\mathcal{G}_{i-1}]}_\nu(k) &\le |D_{f,n}(u)|\cdot\Delta(k), &&\text{(8.1)}\\
\sup_i\Big\|\sup_{f \in \mathcal{F}}\big|E[f(Z_i,u)|\mathcal{G}_{i-1}] - E[f(Z_i,u)|\mathcal{G}_{i-1}]^{*(i-k)}\big|\Big\|_\nu &\le D^\infty_n(u)\cdot\Delta(k), &&\text{(8.2)}\\
\sup_i\|f(Z_i,u)\|_2 &\le |D_{f,n}(u)|\cdot C_\Delta. &&\text{(8.3)}
\end{aligned}$$

Furthermore,

$$\begin{aligned}
\delta^{E[f(Z_i,u)^2|\mathcal{G}_{i-1}]}_{\nu/2}(k) &\le |D_{f,n}(u)|\cdot\sup_i\|f(Z_i,u)\|_\nu\cdot\Delta(k), &&\text{(8.4)}\\
\Big\|\sup_{f \in \mathcal{F}}\big|E[f(Z_i,u)^2|\mathcal{G}_{i-1}] - E[f(Z_i,u)^2|\mathcal{G}_{i-1}]^{*(i-k)}\big|\Big\|_{\nu/2} &\le D^\infty_n(u)^2\cdot C_\Delta\cdot\Delta(k), &&\text{(8.5)}
\end{aligned}$$

where $C_\Delta := 2\max\{d, \tilde d\}|L_{\mathcal{F}}|C_X^sC_R + C_{\bar f}$.

Proof of Lemma 8.1.

We have

$$\begin{aligned}
&\sup_i\big\|E[f(Z_i,u)|\mathcal{G}_{i-1}] - E[f(Z_i,u)|\mathcal{G}_{i-1}]^{*(i-k)}\big\|_\nu\\
&\quad= |D_{f,n}(u)|\cdot\sup_i\big\|\bar\mu^{(1)}_{f,i}(Z^\circ_{i-1},u) - \bar\mu^{(1)}_{f,i}((Z^\circ_{i-1})^{*(i-k)},u)\big\|_\nu\\
&\quad\le |D_{f,n}(u)|\cdot\sup_i\Big\|\big|Z^\circ_{i-1} - (Z^\circ_{i-1})^{*(i-k)}\big|^s_{L_{\mathcal{F}},s}\Big\|_{\frac{p\nu}{p-1}}\Big\|R(Z^\circ_{i-1},u) + R((Z^\circ_{i-1})^{*(i-k)},u)\Big\|_{p\nu}\\
&\quad\le |D_{f,n}(u)|\cdot\sup_i\Big\|\sum_{j=0}^\infty L_{\mathcal{F},j}\big|X^\circ_{i-1-j} - (X^\circ_{i-1-j})^{*(i-k)}\big|^s_\infty\Big\|_{\frac{p\nu}{p-1}} \times \Big(\|R(Z^\circ_{i-1},u)\|_{p\nu} + \big\|R((Z^\circ_{i-1})^{*(i-k)},u)\big\|_{p\nu}\Big)\\
&\quad\le |D_{f,n}(u)|\cdot d^\circ C_R\sum_{j=0}^{k-1}L_{\mathcal{F},j}\,\delta^{X^\circ}_{\frac{p\nu s}{p-1}}(k-j-1)^s,
\end{aligned}$$

that is, the assertion (8.1) holds with the given $\Delta(k)$. The proof of (8.2) is similar.

We now prove (8.3). We have $E[f(Z_i,u)^2] = E\big[E[f(Z_i,u)^2|\mathcal{G}_{i-1}]\big] = D_{f,n}(u)^2E[\bar\mu^{(2)}_{f,i}(Z^\circ_{i-1},u)^2]$ and thus $\|f(Z_i,u)\|_2 = |D_{f,n}(u)|\cdot\|\bar\mu^{(2)}_{f,i}(Z^\circ_{i-1},u)\|_2$. Since $|\bar\mu^{(2)}_{f,i}(y,u)| \le |\bar\mu^{(2)}_{f,i}(y,u) - \bar\mu^{(2)}_{f,i}(0,u)| + |\bar\mu^{(2)}_{f,i}(0,u)|$, the proof now follows the same lines as the proof of Lemma 2.7.

We now show (8.4) and (8.5). We have

$$\big|\bar\mu^{(2)}_{f,i}(z,u)^2 - \bar\mu^{(2)}_{f,i}(z',u)^2\big| = \big|\bar\mu^{(2)}_{f,i}(z,u) - \bar\mu^{(2)}_{f,i}(z',u)\big|\cdot\big[|\bar\mu^{(2)}_{f,i}(z,u)| + |\bar\mu^{(2)}_{f,i}(z',u)|\big].$$
We then have, by the Cauchy–Schwarz inequality,

$$\Big\|\sup_{f \in \mathcal{F}}\big|\bar\mu^{(2)}_{f,i}(Z^\circ_{i-1},u)^2 - \bar\mu^{(2)}_{f,i}((Z^\circ_{i-1})^{*(i-k)},u)^2\big|\Big\|_{\nu/2} \le \Big\|\sup_{f \in \mathcal{F}}\big|\bar\mu^{(2)}_{f,i}(Z^\circ_{i-1},u) - \bar\mu^{(2)}_{f,i}((Z^\circ_{i-1})^{*(i-k)},u)\big|\Big\|_\nu\cdot\Big\|\sup_{f \in \mathcal{F}}\big|\bar\mu^{(2)}_{f,i}(Z^\circ_{i-1},u)\big|\Big\|_\nu. \tag{8.6}$$

Since $\{\bar\mu^{(2)}_{f,i} : f \in \mathcal{F}, i \in \{1, \dots, n\}\}$ forms an $(L_{\mathcal{F}}, s, R, C)$-class, the first factor in (8.6) is bounded by $\Delta(k)$ as in the proof of Lemma 2.7. Furthermore,

$$|\bar\mu^{(2)}_{f,i}(z,u)| \le |\bar\mu^{(2)}_{f,i}(z,u) - \bar\mu^{(2)}_{f,i}(0,u)| + |\bar\mu^{(2)}_{f,i}(0,u)| \le |z|^s_{L_{\mathcal{F}},s}\big(R(z,u) + R(0,u)\big) + |\bar\mu^{(2)}_{f,i}(0,u)|.$$

Note that

$$\begin{aligned}
\Big\||Z^\circ_{i-1}|^s_{L_{\mathcal{F}},s}\cdot\big[R(Z^\circ_{i-1},u) + R(0,u)\big]\Big\|_\nu &\le \Big\|\sum_{j=0}^\infty L_{\mathcal{F},j}|Z^\circ_{i-1-j}|^s_\infty\Big\|_{\frac{p\nu}{p-1}}\cdot\Big(\|R(Z^\circ_{i-1},u)\|_{p\nu} + |R(0,u)|\Big)\\
&\le d^\circ|L_{\mathcal{F}}|\sup_{i,j}\|X^\circ_{ij}\|^s_{\frac{\nu sp}{p-1}}\cdot\big(C_R + |R(0,u)|\big) \le d^\circ|L_{\mathcal{F}}|C_X^sC_R.
\end{aligned}$$
We now obtain (8.5) from (8.6) with the given $C_\Delta$.

By the Cauchy–Schwarz inequality, we have

$$\begin{aligned}
\delta^{E[f(Z_i,u)^2|\mathcal{G}_{i-1}]}_{\nu/2}(k) &= \sup_i\Big\|E[f(Z_i,u)^2|\mathcal{G}_{i-1}] - E[f(Z_i,u)^2|\mathcal{G}_{i-1}]^{*(i-k)}\Big\|_{\nu/2}\\
&= |D_{f,n}(u)|\cdot\sup_i\Big\|D_{f,n}(u)\big(\bar\mu^{(2)}_{f,i}(Z^\circ_{i-1},u)^2 - \bar\mu^{(2)}_{f,i}((Z^\circ_{i-1})^{*(i-k)},u)^2\big)\Big\|_{\nu/2}\\
&\le |D_{f,n}(u)|\cdot\sup_i\Big\|\bar\mu^{(2)}_{f,i}(Z^\circ_{i-1},u) - \bar\mu^{(2)}_{f,i}((Z^\circ_{i-1})^{*(i-k)},u)\Big\|_\nu \times \Big\|D_{f,n}(u)\bar\mu^{(2)}_{f,i}(Z^\circ_{i-1},u)\Big\|_\nu.
\end{aligned}\tag{8.7}$$

Furthermore,

$$\big\|D_{f,n}(u)\bar\mu^{(2)}_{f,i}(Z^\circ_{i-1},u)\big\|_\nu \le \big\|E[f(Z_i,u)^2|\mathcal{G}_{i-1}]^{1/2}\big\|_\nu \le \|f(Z_i,u)\|_\nu. \tag{8.8}$$

Since Assumption 2.8 holds for $\bar\mu^{(2)}_{f,i}$, the first factor in (8.7) is bounded by $\Delta(k)$ as in the proof of Lemma 2.7. Inserting this and (8.8) into (8.7), we obtain the result (8.4).

Proof of Theorem 3.2.

Denote the three terms on the right-hand side of (3.1) by $A_1, A_2, A_3$. We now discuss the three terms separately. First, we have

$$E A_1 \le \sum_{j=q}^\infty \frac{1}{\sqrt{n}}E\max_{f \in \mathcal{F}}\Big|\sum_{i=1}^n\big(W_{i,j+1}(f) - W_{i,j}(f)\big)\Big|.$$

For fixed $j$, the sequence

$$E_{i,j} := (E_{i,j}(f))_{f \in \mathcal{F}} = \big(W_{i,j+1}(f) - W_{i,j}(f)\big)_{f \in \mathcal{F}} = \big(E[W_i(f)|\varepsilon_{i-j}, \dots, \varepsilon_i] - E[W_i(f)|\varepsilon_{i-j+1}, \dots, \varepsilon_i]\big)_{f \in \mathcal{F}}$$

is an $|\mathcal{F}|$-dimensional martingale difference vector with respect to $\mathcal{G}_i = \sigma(\varepsilon_{i-j}, \varepsilon_{i-j+1}, \dots)$.

For a vector $x = (x_f)_{f \in \mathcal{F}}$ and $s \ge 1$, write $|x|_s := (\sum_{f \in \mathcal{F}}|x_f|^s)^{1/s}$. By Theorem 4.1 in [19], there exists an absolute constant $c > 0$ such that, for $s > 1$,

$$\Big\|\Big|\sum_{i=1}^n E_{i,j}\Big|_s\Big\|_2 \le c\Big\{\Big\|\sup_{i=1,\dots,n}|E_{i,j}|_s\Big\|_2 + \sqrt{s-1}\,\Big\|\Big(\sum_{i=1}^n E\big[|E_{i,j}|_s^2\,\big|\,\mathcal{G}_{i-1}\big]\Big)^{1/2}\Big\|_2\Big\}. \tag{8.9}$$

We have

$$\Big\|\sup_{i=1,\dots,n}|E_{i,j}|_s\Big\|_2 = \Big\|\big(\sup_{i=1,\dots,n}|E_{i,j}|_s^2\big)^{1/2}\Big\|_2 \le \Big\|\Big(\sum_{i=1}^n|E_{i,j}|_s^2\Big)^{1/2}\Big\|_2,$$

therefore both terms in (8.9) are of the same order and it is enough to bound the second term in (8.9). We have

$$\Big\|\Big(\sum_{i=1}^n E\big[|E_{i,j}|_s^2\,\big|\,\mathcal{G}_{i-1}\big]\Big)^{1/2}\Big\|_2 = \Big\|\sum_{i=1}^n E\big[|E_{i,j}|_s^2\,\big|\,\mathcal{G}_{i-1}\big]\Big\|_1^{1/2} \le \Big(\sum_{i=1}^n\big\|E\big[|E_{i,j}|_s^2\,\big|\,\mathcal{G}_{i-1}\big]\big\|_1\Big)^{1/2} \le \Big(\sum_{i=1}^n\big\||E_{i,j}|_s\big\|_2^2\Big)^{1/2}. \tag{8.10}$$

Note that

$$E_{i,j}(f) = W_{i,j+1}(f) - W_{i,j}(f) = E[W_i(f)|\varepsilon_{i-j}, \dots, \varepsilon_i] - E[W_i(f)|\varepsilon_{i-j+1}, \dots, \varepsilon_i] = E\big[W_i(f)^{**(i-j)} - W_i(f)^{**(i-j+1)}\,\big|\,\mathcal{G}_i\big], \tag{8.11}$$

where $H(F_i)^{**(i-j)} := H(F_i^{**(i-j)})$ and $F_i^{**(i-j)} = (\varepsilon_i, \varepsilon_{i-1}, \dots, \varepsilon_{i-j}, \varepsilon^*_{i-j-1}, \varepsilon^*_{i-j-2}, \dots)$.

By Jensen's inequality, Lemma 2.7 and the fact that $(W_i(f)^{**(i-j)}, W_i(f)^{**(i-j+1)})$ has the same distribution as $(W_i(f), W_i(f)^{*(i-j)})$,

$$\begin{aligned}
\big\||E_{i,j}|_s\big\|_2 &= \Big\|\Big(\sum_{f \in \mathcal{F}}|E_{i,j}(f)|^s\Big)^{1/s}\Big\|_2 \le |\mathcal{F}|^{1/s}\Big\|\sup_{f \in \mathcal{F}}\big|E[W_i(f)^{**(i-j)} - W_i(f)^{**(i-j+1)}|\mathcal{G}_i]\big|\Big\|_2\\
&\le e\cdot\Big\|E\Big[\sup_{f \in \mathcal{F}}\big|W_i(f)^{**(i-j)} - W_i(f)^{**(i-j+1)}\big|\,\Big|\,\mathcal{G}_i\Big]\Big\|_2 \le e\cdot\Big\|\sup_{f \in \mathcal{F}}\big|W_i(f)^{**(i-j)} - W_i(f)^{**(i-j+1)}\big|\Big\|_2\\
&= e\cdot\Big\|\sup_{f \in \mathcal{F}}\big|W_i(f) - W_i(f)^{*(i-j)}\big|\Big\|_2 \le e\cdot D^\infty_n\big(\tfrac{i}{n}\big)\Delta(j).
\end{aligned}\tag{8.12}$$

Here we used that $|\mathcal{F}|^{1/s} \le e$ for the choice $s = 2 \vee \log|\mathcal{F}|$ made below. Inserting (8.12) into (8.10) delivers

$$\Big(\sum_{i=1}^n\big\||E_{i,j}|_s\big\|_2^2\Big)^{1/2} \le e\Big(\sum_{i=1}^n D^\infty_n\big(\tfrac{i}{n}\big)^2\Big)^{1/2}\Delta(j).$$

Inserting this bound into (8.9), we obtain

$$\Big\|\Big|\sum_{i=1}^n E_{i,j}\Big|_s\Big\|_2 \le ec\,s^{1/2}n^{1/2}\Big(\frac{1}{n}\sum_{i=1}^n D^\infty_n\big(\tfrac{i}{n}\big)^2\Big)^{1/2}\Delta(j).$$

We conclude with $s := 2 \vee \log|\mathcal{F}|$ that

$$E A_1 \le \frac{1}{\sqrt{n}}\sum_{j=q}^\infty\Big\|\Big|\sum_{i=1}^n E_{i,j}\Big|_s\Big\|_2 \le ec\cdot\sqrt{2 \vee \log|\mathcal{F}|}\cdot\Big(\frac{1}{n}\sum_{i=1}^n D^\infty_n\big(\tfrac{i}{n}\big)^2\Big)^{1/2}\sum_{j=q}^\infty\Delta(j) \le ec\cdot\sqrt{H}\cdot D^\infty_n\beta(q). \tag{8.13}$$

We now discuss $E A_2$.
If $M_Q, \sigma_Q > 0$ and $Q_i(f)$, $i = 1, \dots, m$, are mean-zero independent variables (depending on $f \in \mathcal{F}$) with $|Q_i(f)| \le M_Q$ and $\big(\frac{1}{m}\sum_{i=1}^m\|Q_i(f)\|_2^2\big)^{1/2} \le \sigma_Q$, then there exists some universal constant $c > 0$ such that

$$E\max_{f \in \mathcal{F}}\frac{1}{\sqrt{m}}\Big|\sum_{i=1}^m\big[Q_i(f) - EQ_i(f)\big]\Big| \le c\cdot\Big(\sigma_Q\sqrt{H} + \frac{M_QH}{\sqrt{m}}\Big) \tag{8.14}$$

(see e.g. [5], equation (4.3) in Section 4.1 therein).

… $(W_{k,j} - W_{k,j-1})_k$ is a martingale difference sequence and $W_{k,\tau_l} - W_{k,\tau_{l-1}} = \sum_{j=\tau_{l-1}+1}^{\tau_l}(W_{k,j} - W_{k,j-1})$. Furthermore, we have $\|W_{k,j} - W_{k,j-1}\|_2 \le \|W_k - W_{k,j-1}\|_2 \le \|W_k\|_2$ and

$$\|W_{k,j} - W_{k,j-1}\|_2 = \big\|E[W_k^{**(k-j+1)} - W_k^{**(k-j+2)}|\mathcal{G}_k]\big\|_2 \le \big\|W_k^{**(k-j+1)} - W_k^{**(k-j+2)}\big\|_2 = \big\|W_k - W_k^{*(k-j+1)}\big\|_2 = \delta^{W_k}(j-1),$$

thus $\|W_{k,j} - W_{k,j-1}\|_2 \le \min\{\|W_k\|_2, \delta^{W_k}(j-1)\}$. We conclude with the elementary inequality $\min\{a_1, b_1\} + \min\{a_2, b_2\} \le \min\{a_1 + a_2, b_1 + b_2\}$ that

$$\begin{aligned}
\|T_{i,l}\|_2 &= \Big\|\sum_{k=(i-1)\tau_l+1}^{(i\tau_l)\wedge n}(W_{k,\tau_l} - W_{k,\tau_{l-1}})\Big\|_2 = \Big\|\sum_{j=\tau_{l-1}+1}^{\tau_l}\sum_{k=(i-1)\tau_l+1}^{(i\tau_l)\wedge n}(W_{k,j} - W_{k,j-1})\Big\|_2\\
&\le \sum_{j=\tau_{l-1}+1}^{\tau_l}\Big\|\sum_{k=(i-1)\tau_l+1}^{(i\tau_l)\wedge n}(W_{k,j} - W_{k,j-1})\Big\|_2 \le \sum_{j=\tau_{l-1}+1}^{\tau_l}\Big(\sum_{k=(i-1)\tau_l+1}^{(i\tau_l)\wedge n}\|W_{k,j} - W_{k,j-1}\|_2^2\Big)^{1/2}\\
&\le \sum_{j=\tau_{l-1}+1}^{\tau_l}\min\Big\{\Big(\sum_{k=(i-1)\tau_l+1}^{(i\tau_l)\wedge n}\|W_k\|_2^2\Big)^{1/2}, \Big(\sum_{k=(i-1)\tau_l+1}^{(i\tau_l)\wedge n}\delta^{W_k}(j-1)^2\Big)^{1/2}\Big\}.
\end{aligned}$$

Put

$$\sigma_{i,l} := \Big(\frac{1}{\tau_l}\sum_{k=(i-1)\tau_l+1}^{(i\tau_l)\wedge n}\|W_k\|_2^2\Big)^{1/2}, \qquad \Delta_{i,j,l} := \Big(\frac{1}{\tau_l}\sum_{k=(i-1)\tau_l+1}^{(i\tau_l)\wedge n}\delta^{W_k}(j-1)^2\Big)^{1/2}.$$
$$\begin{aligned}
\Big(\frac{\tau_l}{n}\sum_{\substack{i=1\\ i\text{ even}}}^{\lfloor n/\tau_l\rfloor+1}\frac{1}{\tau_l}\|T_{i,l}(f)\|_2^2\Big)^{1/2} &\le \Big(\frac{\tau_l}{n}\sum_{i=1}^{\lfloor n/\tau_l\rfloor+1}\Big(\sum_{j=\tau_{l-1}+1}^{\tau_l}\min\{\sigma_{i,l}, \Delta_{i,j,l}\}\Big)^2\Big)^{1/2}\\
&\le \Big(\frac{\tau_l}{n}\sum_{i=1}^{\lfloor n/\tau_l\rfloor+1}\big((\tau_l - \tau_{l-1})\min\{\sigma_{i,l}, \Delta_{i,\tau_{l-1}+1,l}\}\big)^2\Big)^{1/2}\\
&= \Big(\frac{\tau_l}{n}\sum_{i=1}^{\lfloor n/\tau_l\rfloor+1}(\tau_l - \tau_{l-1})^2\min\{\sigma_{i,l}^2, \Delta_{i,\tau_{l-1}+1,l}^2\}\Big)^{1/2}\\
&\le (\tau_l - \tau_{l-1})\cdot\Big(\min\Big\{\frac{\tau_l}{n}\sum_{i=1}^{\lfloor n/\tau_l\rfloor+1}\sigma_{i,l}^2, \frac{\tau_l}{n}\sum_{i=1}^{\lfloor n/\tau_l\rfloor+1}\Delta_{i,\tau_{l-1}+1,l}^2\Big\}\Big)^{1/2}\\
&\le \sum_{j=\tau_{l-1}+1}^{\tau_l}\min\Big\{\Big(\frac{\tau_l}{n}\sum_{i=1}^{\lfloor n/\tau_l\rfloor+1}\sigma_{i,l}^2\Big)^{1/2}, \Big(\frac{\tau_l}{n}\sum_{i=1}^{\lfloor n/\tau_l\rfloor+1}\Delta_{i,\tau_{l-1}+1,l}^2\Big)^{1/2}\Big\}\\
&\le \sum_{j=\tau_{l-1}+1}^{\tau_l}\min\Big\{\|f\|_{2,n}, \Big(\frac{1}{n}\sum_{i=1}^n\delta^{W_i}(\tau_{l-1})^2\Big)^{1/2}\Big\}\\
&\le \sum_{j=\tau_{l-1}+1}^{\tau_l}\min\big\{\|f\|_{2,n}, D_n\Delta(\lfloor j/2\rfloor)\big\}.
\end{aligned}\tag{8.15}$$

With $\frac{1}{\sqrt{\tau_l}}|T_{i,l}(f)| \le \sqrt{\tau_l}\|f\|_\infty \le \sqrt{\tau_l}M$ and (8.14), we obtain

$$\sum_{l=1}^L E\max_{f \in \mathcal{F}}\Big|\sqrt{\frac{\tau_l}{n}}\sum_{\substack{i=1\\ i\text{ even}}}^{\lfloor n/\tau_l\rfloor+1}\frac{1}{\sqrt{\tau_l}}T_{i,l}(f)\Big| \le c\sum_{l=1}^L\Big[\sup_f\Big(\frac{\tau_l}{n}\sum_{\substack{i=1\\ i\text{ even}}}^{\lfloor n/\tau_l\rfloor+1}\Big\|\frac{1}{\sqrt{\tau_l}}T_{i,l}(f)\Big\|_2^2\Big)^{1/2}\sqrt{H} + \frac{2\sqrt{\tau_l}MH}{\sqrt{n/\tau_l}}\Big],$$

… ($i$ odd) in $A_2$. With (8.15), we conclude that

$$\begin{aligned}
E A_2 &\le \sum_{l=1}^L\Big[E\max_{f \in \mathcal{F}}\sqrt{\frac{\tau_l}{n}}\Big|\sum_{\substack{1 \le i \le \lfloor n/\tau_l\rfloor+1\\ i\text{ odd}}}\frac{1}{\sqrt{\tau_l}}T_{i,l}(f)\Big| + E\max_{f \in \mathcal{F}}\sqrt{\frac{\tau_l}{n}}\Big|\sum_{\substack{1 \le i \le \lfloor n/\tau_l\rfloor+1\\ i\text{ even}}}\frac{1}{\sqrt{\tau_l}}T_{i,l}(f)\Big|\Big]\\
&\le c\sum_{l=1}^L\Big[\Big(\sum_{j=\tau_{l-1}+1}^{\tau_l}\min\big\{\max_{f \in \mathcal{F}}\|f\|_{2,n}, D_n\Delta(\lfloor j/2\rfloor)\big\}\Big)\cdot\sqrt{H} + \frac{\sqrt{\tau_l}MH}{\sqrt{\lfloor n/\tau_l\rfloor + 1}}\Big].
\end{aligned}\tag{8.16}$$

Note that

$$\sum_{l=1}^L\frac{\sqrt{\tau_l}}{\sqrt{\lfloor n/\tau_l\rfloor + 1}} \le \sum_{l=1}^L\frac{\sqrt{\tau_l}}{\sqrt{n/\tau_l}} = \frac{1}{\sqrt{n}}\sum_{l=1}^L\tau_l \le \frac{2^L + q}{\sqrt{n}} = O\Big(\frac{q}{\sqrt{n}}\Big). \tag{8.17}$$

Furthermore, we have by Lemma 8.2 that

$$\begin{aligned}
\sum_{l=1}^L\sum_{j=\tau_{l-1}+1}^{\tau_l}\min\big\{\max_{f \in \mathcal{F}}\|f\|_{2,n}, D_n\Delta(\lfloor j/2\rfloor)\big\} &\le \sum_{j=2}^\infty\min\big\{\max_{f \in \mathcal{F}}\|f\|_{2,n}, D_n\Delta(\lfloor j/2\rfloor)\big\}\\
&\le 2\bar V\big(\max_{f \in \mathcal{F}}\|f\|_{2,n}\big) = 2\max_{f \in \mathcal{F}}\bar V(\|f\|_{2,n}) = 2\max_{f \in \mathcal{F}}V(f),
\end{aligned}\tag{8.18}$$

where

$$\bar V(x) = x + \sum_{j=1}^\infty\min\{x, D_n\Delta(j)\} \tag{8.19}$$

and the second to last equality holds since $x \mapsto \bar V(x)$ is increasing.

Inserting (8.17) and (8.18) into (8.16), we conclude that, with some universal $c > 0$,

$$E A_2 \le c\Big(\sup_{f \in \mathcal{F}}V(f)\sqrt{H} + \frac{qMH}{\sqrt{n}}\Big) \le c\Big(\sigma\sqrt{H} + \frac{qMH}{\sqrt{n}}\Big). \tag{8.20}$$

Since $S^W_{n,1} = \sum_{i=1}^n W_{i,1}(f)$ is a sum of independent variables with $|W_{i,1}(f)| \le \|f\|_\infty \le M$ and $\|W_{i,1}(f)\|_2 \le \|f\|_2 \le V(f) \le \sigma$, we obtain from (8.14) again

$$E A_3 \le c\Big(\sigma\sqrt{H} + \frac{MH}{\sqrt{n}}\Big). \tag{8.21}$$

If we insert the bounds (8.13), (8.20) and (8.21) into (3.1), we obtain the result (3.2).

We now show (3.3). If $q^*\big(\frac{M\sqrt{H}}{\sqrt{n}D^\infty_n}\big)\frac{H}{n} \le 1$, we have $q^*\big(\frac{M\sqrt{H}}{\sqrt{n}D^\infty_n}\big) \in \{1, \dots, n\}$ and thus, by (3.2),

$$\begin{aligned}
E\max_{f \in \mathcal{F}}\Big|\frac{1}{\sqrt{n}}S_n(f)\Big| &\le c\Big(\sqrt{H}D^\infty_n\beta\Big(q^*\Big(\frac{M\sqrt{H}}{\sqrt{n}D^\infty_n}\Big)\Big) + q^*\Big(\frac{M\sqrt{H}}{\sqrt{n}D^\infty_n}\Big)\frac{MH}{\sqrt{n}} + \sigma\sqrt{H}\Big)\\
&\le 2c\Big(q^*\Big(\frac{M\sqrt{H}}{\sqrt{n}D^\infty_n}\Big)\frac{MH}{\sqrt{n}} + \sigma\sqrt{H}\Big) = 2c\Big(\sqrt{n}M\cdot\min\Big\{q^*\Big(\frac{M\sqrt{H}}{\sqrt{n}D^\infty_n}\Big)\frac{H}{n}, 1\Big\} + \sigma\sqrt{H}\Big).
\end{aligned}\tag{8.22}$$

If $q^*\big(\frac{M\sqrt{H}}{\sqrt{n}D^\infty_n}\big)\frac{H}{n} \ge 1$, we note that the simple bound

$$E\max_{f \in \mathcal{F}}\Big|\frac{1}{\sqrt{n}}S_n(f)\Big| \le \sqrt{n}M \le c\Big(\sqrt{n}M\min\Big\{q^*\Big(\frac{M\sqrt{H}}{\sqrt{n}D^\infty_n}\Big)\frac{H}{n}, 1\Big\} + \sigma\sqrt{H}\Big) \tag{8.23}$$

holds. Putting the two bounds (8.22) and (8.23) together, we obtain the result (3.3).

Lemma 8.2.

Let ω ( k ) be an increasing sequence in k . Then, for any x > , ∞ X j =2 min { x, D n ∆( ⌊ j ⌋ ) } ω ( j ) ≤ ∞ X j =1 min { x, D n ∆( j ) } ω (2 j + 1) . Especially in the case ω ( k ) = 1 , ∞ X j =2 min { x, D n ∆( ⌊ j ⌋ ) } ≤ ∞ X j =1 min { x, D n ∆( j ) } . Proof of Lemma 8.2.

It holds that ∞ X j =2 min { x, D n ∆( ⌊ j ⌋ ) } ω ( j )= ∞ X k =1 min { x, D n ∆( ⌊ k ⌋ ) } ω (2 k ) + ∞ X k =1 min { x, D n ∆( ⌊ k + 12 ⌋ ) } ω (2 k + 1)= ∞ X k =1 min { x, D n ∆( k ) } · { ω (2 k ) + ω (2 k + 1) }≤ ∞ X k =1 min { x, D n ∆( k ) } · ω (2 k + 1) . mpirical process theory for locally stationary processes Proof of Corollary 3.3.

Let σ := sup n ∈ N sup f ∈F V ( f ) < ∞ . For Q ≥

1, deﬁne M n = √ n √ H r ( σQ / D ∞ n ) D ∞ n . Let ¯ F = sup f ∈F ¯ f , and F ( z, u ) = D ∞ n ( u ) · ¯ F ( z, u ). Then F is an envelope function of F .We furthermore have P ( sup i =1 ,...,n F ( Z i , in ) > M n ) ≤ P (cid:16)(cid:0) n n X i =1 F ( Z i , in ) ν (cid:1) /ν > M n n /ν (cid:17) ≤ nM νn · k F k νν,n . (8.24)Inserting the bound k F k νν,n = 1 n n X i =1 D ∞ n ( in ) ν k ¯ F ( Z i , in ) k νν ≤ C ν ∆ · n n X i =1 D ∞ n ( in ) ν ≤ C ν ∆ · ( D ∞ ν,n ) ν into (8.24) and using r ( γa ) ≥ γr ( a ) for γ ≥ , a > P ( sup i =1 ,...,n F ( Z i , in ) > M n ) ≤ (cid:16) Hn − ν r ( σQ / D ∞ n ) (cid:17) ν/ · (cid:16) C ∆ D ∞ ν,n D ∞ n (cid:17) ν ≤ Q ν/ (cid:16) Hn − ν r ( σ D ∞ n ) (cid:17) ν/ · (cid:16) C ∆ D ∞ ν,n D ∞ n (cid:17) ν . (8.25)Using the rough bound k f k ν,n ≤ k F k ν,n and r ( a ) ≤ a for a > f ∈F √ n n X i =1 E [ f ( Z i , in ) {| f ( Z i , in ) | >M n } ] ≤ √ nM ν − n max f ∈F n X i =1 E [ | f ( Z i , in ) | ν ] ≤ nM νn · M n √ n max f ∈F k f k νν,n ≤ (cid:16) C Hn − ν r ( σQ / D ∞ n ) (cid:17) ν/ · σQ / √ H · (cid:16) D ∞ ν,n D ∞ n (cid:17) ν ≤ σQ ν − √ H (cid:16) C Hn − ν r ( σ D ∞ n ) (cid:17) ν/ · (cid:16) D ∞ ν,n D ∞ n (cid:17) ν . (8.26)Abbreviate C n := (cid:16) C Hn − ν r ( σ D ∞ n ) (cid:17) ν/ · (cid:16) D ∞ ν,n D ∞ n (cid:17) ν . n ∈ N C n < ∞ . 
By Theorem 3.2, (8.25) and (8.26), P (cid:16) max f ∈F (cid:12)(cid:12) G n ( f ) (cid:12)(cid:12) > Q √ H (cid:17) ≤ P (cid:16) max f ∈F (cid:12)(cid:12) G n ( f ) (cid:12)(cid:12) > Q √ H, sup i =1 ,...,n ¯ F ( Z i , in ) ≤ M (cid:17) + P ( sup i =1 ,...,n F ( Z i , in ) > M ) ≤ P (cid:16) max f ∈F (cid:12)(cid:12) G n (max { min { f, M } , − M } ) (cid:12)(cid:12) > Q √ H/ (cid:17) + P (cid:16) max f ∈F (cid:12)(cid:12) √ n n X i =1 E [ f ( Z i , in ) {| f ( Z i , in ) | >M } ] > Q √ H/ (cid:17) + P ( sup i =1 ,...,n F ( Z i , in ) > M ) ≤ cQ √ H h σ √ H + q ∗ (cid:16) r ( σQ / D ∞ n ) (cid:17) r ( σQ / D ∞ n ) D ∞ n i + (cid:16) Q ν + 2 σQ ν H (cid:17) C n ≤ cσQ / + (cid:16) Q ν + 2 σQ ν H (cid:17) C n . Since sup n ∈ N C n < ∞ and σ is independent of n , the assertion follows for Q → ∞ . Proof of Lemma 3.5. (i) Since | x | + | x | ≤ m implies | x | , | x | ≤ m , we have I := (cid:12)(cid:12) ϕ ∧ m ( x + x + x ) − ϕ ∧ m ( x ) − ϕ ∧ m ( x ) (cid:12)(cid:12) = (cid:12)(cid:12) ϕ ∧ m ( x + x + x ) − x − x | . Case 1: x + x + x > m . Then, since | x | + | x | ≤ m , we have I = | m − x − x | = m − x − x < x ≤ | x | .Case 2: x + x + x ∈ [ − m, m ]. Then I = | x + x + x − x − x | = | x | .Case 3: x + x + x < − m . Then, since | x | + | x | ≤ m , we have I = |− m − x − x | = m + x + x < − x ≤ | x | .Furthermore, I ≤ | ϕ m ( x + x + x ) | + | x + x | ≤ m + m = 2 m .(ii) The ﬁrst assertion is obvious. If | x | ≤ y , we have | ϕ ∨ m ( x ) | =  x − m, x > m , x ∈ [ − m, m ] − x − m, x < − m =  | x | − m, x > m , x ∈ [ − m, m ] | x | − m, x < − m = ( | x | − m ) | x | >m ≤ ( y − m ) y>m = ( y − m ) ∨ y − m ) { y − m> } ≤ y y>m , which shows the second assertion. mpirical process theory for locally stationary processes z, z ′ ∈ R N it holds that | ϕ ∧ m ( f )( z ) − ϕ ∧ m ( f )( z ′ ) | ≤ | f ( z ) − f ( z ′ ) | , | ϕ ∨ m ( f )( z ) − ϕ ∨ m ( f )( z ′ ) | ≤ | f ( z ) − f ( z ′ ) | (8.27)from which the assertion follows. 
For real numbers a i , b i , we havemax i { a i } = max i { a i − b i + b i } ≤ max i { a i − b i } + max i { b i } , thus | max i { a i }− max i { b i }| ≤ max i | a i − b i | . This implies | max { a, y }− max { a, y ′ }| ≤| y − y ′ | and therefore | ϕ ∧ m ( f )( z ) − ϕ ∧ m ( f )( z ′ ) | = | ( − m ) ∨ ( f ( z ) ∧ m ) − ( − m ) ∨ ( f ( z ′ ) ∧ m ) | ≤ | f ( z ) ∧ m − f ( z ′ ) ∧ m | = | ( − f ( z ′ )) ∨ ( − m ) − ( − f ( z )) ∨ ( − m ) | ≤ | f ( z ) − f ( z ′ ) | . For the second inequality in (8.27), note that ϕ ∨ m ( f )( z ) = ( f ( z ) − m ) ∨ f ( z ) + m ) ∧ . We therefore have | ϕ ∨ m ( f )( z ) − ϕ ∨ m ( f )( z ′ ) | = (cid:12)(cid:12) ( f ( z ) − m ) ∨ − ( f ( z ′ ) − m ) ∨ f ( z )+ m ) ∧ − ( f ( z ′ )+ m ) ∧ | . If f ( z ) , f ( z ′ ) ≥ m , then | ϕ ∨ m ( f )( z ) − ϕ ∨ m ( f )( z ′ ) | ≤ (cid:12)(cid:12) ( f ( z ) − m ) ∨ − ( f ( z ′ ) − m ) ∨ | ≤ | f ( z ) − f ( z ′ ) | . A similar result is obtained for f ( z ) , f ( z ′ ) ≤ − m . If f ( z ) ≥ m , f ( z ′ ) < m , then | ϕ ∨ m ( f )( z ) − ϕ ∨ m ( f )( z ′ ) |≤ (cid:12)(cid:12) ( f ( z ) − m ) − ( f ( z ′ ) + m ) ∧ | = ( | f ( z ) − f ( z ′ ) − m | = f ( z ) − f ( z ′ ) − m ≤ f ( z ) − f ( z ′ ) , f ( z ′ ) ≤ − m, | f ( z ) − m | = f ( z ) − m ≤ f ( z ) − f ( z ′ ) , f ( z ′ ) > − m . A similar result is obtained for f ( z ) ≥ m , f ( z ′ ) ≤ m , which proves (8.27). Proof of Lemma 3.6.

For q ∈ N , put β norm ( q ) := β ( q ) q .(i) q ∗ ( · ) and r ( · ) are well-deﬁned since β norm ( · ) is decreasing (at a rate ≪ q − ) and r q ∗ ( r ) r is increasing (at a rate ≪ r ) and lim r ↓ q ∗ ( r ) r = 0.Let a >

0. We show that r = 2 r ( a ) fulﬁlls q ∗ ( r ) r ≤ a . By deﬁnition of r ( a ), weobtain r ( a ) ≥ r = 2 r ( a ) which gives the result. Since β norm is decreasing, q ∗ isdecreasing. We conclude that q ∗ ( r ) r = 2 · q ∗ (2 r ( a r ( a ≤ · q ∗ ( r ( a r ( a ≤ · a a. The second inequality r ( a ) ≤ a follows from the fact that q ∗ ( r ) r is increasing and q ∗ ( a ) a ≥ a .2(ii) By Theorem 3.2 and the deﬁnition of r ( · ), E max f ∈F (cid:12)(cid:12)(cid:12) G Wn ( f ) (cid:12)(cid:12)(cid:12) ≤ c (cid:16) δ p H ( k ) + q ∗ (cid:16) m ( n, δ, k ) p H ( k ) √ n D ∞ n (cid:17) m ( n, δ, k ) H ( k ) √ n (cid:17) = c (cid:16) δ p H ( k ) + D ∞ n q ∗ ( r ( δ D n )) r ( δ D n ) p H ( k ) (cid:17) = c (1 + D ∞ n D n ) δ p H ( k ) . which shows (3.10).Since k f ( Z i , in ) { f ( Z i , in ) >γm ( n,δ,k ) } k ≤ γm ( n, δ, k ) k f ( Z i , in ) k = 1 γm ( n, δ, k ) k f ( Z i , in ) k , for all f ∈ F with V ( f ) ≤ δ , it holds that √ n k f { f>γm ( n,δ,k ) k ,n ≤ √ nγm ( n, δ, k ) k f k ,n ≤ γ k f k ,n D ∞ n r ( δ D n ) p H ( k ) . (8.28)If k f k ,n ≥ D n ∆(1), we have V ( f ) = k f k ,n + D n ∞ X j =1 ∆( j ) ≥ k f k ,n + D n β (1) . (8.29)In the case k f k ,n < D n ∆(1), the fact that ∆( · ) is decreasing implies that a ∗ =max { j ∈ N : k f k ,n < D n ∆( j ) } is well-deﬁned. We conclude that V ( f ) = k f k ,n + ∞ X j =0 k f k ,n ∧ ( D n ∆( j )) = k f k ,n + a ∗ X j =1 k f k ,n + D n ∞ X j = a ∗ +1 ∆( j )= k f k ,n ( a ∗ + 1) + D n β ( a ∗ ) ≥ k f k ,n a ∗ + β ( a ∗ ) . (8.30)Summarizing the results (8.29) and (8.30), we have V ( f ) ≥ k f k ,n ( a ∗ ∨

1) + D n β ( a ∗ ∨ . We conclude that V ( f ) ≥ min a ∈ N (cid:2) k f k ,n a + D n β ( a ) (cid:3) ≥ k f k ,n ˆ a + D n β (ˆ a ) , where ˆ a = arg min j ∈ N (cid:8) k f k ,n · j + D n β ( j ) (cid:9) .Since δ ≥ V ( f ), we have δ ≥ D n β (ˆ a ) = D n β norm (ˆ a )ˆ a . Thus β norm (ˆ a ) ≤ δ D n ˆ a .By deﬁnition of q ∗ , q ∗ ( δ D n ˆ a ) ≤ ˆ a . Thus q ∗ ( δ D n ˆ a ) δ D n ˆ a ≤ δ D n . By deﬁnition of r ( · ), r ( δ D n ) ≥ δ D n ˆ a . We conclude with k f k ,n ≤ V ( f ) ≤ δ that k f k ,n D ∞ n r ( δ D n ) ≤ D n ˆ a k f k ,n D ∞ n δ = D n V ( f ) k f k ,n D ∞ n δ ≤ D n D ∞ n k f k ,n ≤ D n D ∞ n δ. (8.31) mpirical process theory for locally stationary processes f ∈ A with V ( f ) ≤ δ it holds that √ n k f { f>γm ( n,δ,k ) k ,n ≤ √ nγm ( n, δ, k ) k f k ,n ≤ γ k f k ,n D ∞ n r ( δ D n ) p H ( k ) ≤ γ D n D ∞ n δ p H ( k ) . which shows (3.11). Proof of Theorem 3.7.

In the following, we abbreviate H ( δ ) = H ( δ, F , V ) and N ( δ ) = N ( δ, F , V ).Choose δ = σ and δ j = 2 − j δ . Put m j := 12 m ( n, δ j , N j +1 ) , ( m ( · ) from Lemma 3.6). Choose M n = m . We then have E sup f ∈F (cid:12)(cid:12)(cid:12) G Wn ( f ) (cid:12)(cid:12)(cid:12) ≤ E sup f ∈F ( M n ) (cid:12)(cid:12)(cid:12) G Wn ( f ) (cid:12)(cid:12)(cid:12) + 1 √ n n X i =1 E (cid:2) W i ( F { F >M n } ) (cid:3) , where F ( M n ) := { ϕ ∧ M n ( f ) : f ∈ F} . Due to Lemma 3.5(iii), F ( M n ) still fulﬁlls Assump-tion 2.5.For each j ∈ N , we choose a covering by brackets F prejk := [ l jk , u jk ] ∩ F , k = 1 , ..., N ( δ j )such that V ( u jk − l jk ) ≤ δ j and sup f,g ∈F jk | f − g | ≤ u jk − l jk =: ∆ jk .We now construct inductively a new nested sequence of partitions ( F jk ) k of F from( F prejk ) k in the following way: For each ﬁxed j ∈ N , put {F jk : k } := { j \ i =0 F preik i : k i ∈ { , ..., N ( δ i ) } , i ∈ { , ..., j }} as the intersections of all previous partitions and the j -th partition. Then |{F jk : k }| ≤ N j := N ( δ ) · ... · N ( δ j ). By Lemma 2.1(ii), we havesup f,g ∈F jk | f − g | ≤ ∆ jk , V (∆ jk ) ≤ δ j . In each F jk , ﬁx some f jk ∈ F , and deﬁne π j f := f j,ψ j f where ψ j f := min { i ∈ { , ..., N j } : f ∈ F ji } . Put ∆ j f := ∆ j,ψ j f and I ( σ ) := Z σ p ∨ H ( ε, F , V ) dε, τ := min n j ≥ δ j ≤ I ( σ ) √ n o ∨ . (8.32)Since | f | ≤ g implies | W i ( f ) | ≤ W i ( g ) and k W i ( g ) k ≤ k g ( Z i , in ) k , it holds that | G Wn ( f ) | ≤ √ n n X i =1 (cid:12)(cid:12) W i ( f ) − E W i ( f ) (cid:12)(cid:12) ≤ G Wn ( g ) + 2 √ n n X i =1 k W i ( g ) k ≤ G Wn ( g ) + 2 √ n k g k ,n . 
By (3.7) and (3.8) and the fact that k f − π f k ∞ ≤ M n ≤ m , we have the decompositionsup f ∈F | G Wn ( f ) | ≤ sup f ∈F | G Wn ( π f ) | + sup f ∈F | G Wn ( ϕ ∧ m τ ( f − π τ f )) | + τ − X j =0 sup f ∈F (cid:12)(cid:12)(cid:12) G Wn ( ϕ ∧ m j − m j +1 ( π j +1 f − π j f )) (cid:12)(cid:12)(cid:12) + τ − X j =0 sup f ∈F | G Wn ( R ( j )) |≤ sup f ∈F | G Wn ( π f ) | + n sup f ∈F | G Wn ( ϕ ∧ m τ (∆ τ f )) | + 2 √ n sup f ∈F k ∆ τ f k ,n o + τ − X j =0 sup f ∈F (cid:12)(cid:12)(cid:12) G Wn ( ϕ ∧ m j − m j +1 ( π j +1 f − π j f )) (cid:12)(cid:12)(cid:12) + τ − X j =0 n sup f ∈F (cid:12)(cid:12)(cid:12) G Wn (min (cid:8)(cid:12)(cid:12) ϕ ∨ m j +1 (∆ j +1 f ) (cid:12)(cid:12) , m j (cid:9) ) (cid:12)(cid:12)(cid:12) +2 √ n sup f ∈F k ∆ j +1 f { ∆ j +1 f>m j +1 } k ,n o + τ − X j =0 n sup f ∈F (cid:12)(cid:12)(cid:12) G Wn (min (cid:8)(cid:12)(cid:12) ϕ ∨ m j − m j +1 (∆ j f ) (cid:12)(cid:12) , m j (cid:9) ) (cid:12)(cid:12)(cid:12) +2 √ n sup f ∈F k ∆ j f { ∆ j f>m j − m j +1 } k ,n o =: R + R + R + R + R . (8.33)We now discuss the terms R i , i ∈ { , ..., } from (8.33). Therefore, put C n := c (1 + D ∞ n D n )+ D n D ∞ n . mpirical process theory for locally stationary processes jk = u jk − l jk with l jk , u jk ∈ F , the class { ∆ jk : k ∈ { , ..., N ( δ j ) }} still fulﬁllsAssumption 3.1. We conclude by Lemma 3.5(iii) that for arbitrary m, ˜ m >

0, the classes { ϕ ∧ m (∆ jk ) : k ∈ { , ..., N ( δ j ) }} , {

12 min { ϕ ∨ m (∆ jk ) , m } : k ∈ { , ..., N ( δ j ) }} , { ϕ ∧ m ( π j +1 f − π j f ) : k ∈ { , ..., N ( δ j ) }} fulﬁll Assumption 3.1. • Since |{ π f : f ∈ F ( M n ) }| ≤ N ( δ ) = N ( σ ), k π f k ∞ ≤ M n ≤ m ( n, δ , N ( δ )) and V ( π f ) ≤ σ = δ (by assumption, every f ∈ F fulﬁlls V ( f ) ≤ σ ), we have by(3.10): E R = E sup f ∈F ( M n ) | G Wn ( π f ) | ≤ C n δ p ∨ log N ( δ ) . • It holds that |{ ϕ ∧ m τ (∆ τ f ) : f ∈ F ( M n ) }| ≤ N τ . If g := ϕ ∧ m τ (∆ τ f ), then k g k ∞ ≤ m τ ≤ m ( n, δ τ , N τ +1 ) and V ( g ) ≤ V (∆ τ f ) ≤ δ τ . We conclude by (3.10) that: E sup f ∈F ( M n ) | G Wn ( ϕ ∧ m τ (∆ τ f )) | ≤ C n δ τ · p ∨ log N τ +1 . (8.34)For the second term, we have by deﬁnition of τ in (8.32) and the Cauchy Schwarzinequality: √ n k ∆ τ f k ,n ≤ √ n k ∆ τ f k ,n ≤ √ nV (∆ τ f ) ≤ √ nδ τ ≤ I ( σ ) . (8.35)From (8.34) and (8.35) we obtain E R ≤ C n δ τ p ∨ log N τ +1 + 2 · I ( σ ) . • Since the partitions are nested, it holds that |{ ϕ ∧ m j − m j +1 ( π j +1 f − π j f ) : f ∈F ( M n ) }| ≤ N j +1 . If g := ϕ ∧ m j − m j +1 ( π j +1 f − π j f ), we have k g k ∞ ≤ m j − m j +1 ≤ m j ≤ m ( n, δ j , N j +1 ) and | g | ≤ | π j +1 f − π j f | ≤ ∆ j f. Furthermore, V ( g ) ≤ V (∆ j f ) ≤ δ j . We conclude by (3.10) that: E R ≤ τ − X j =0 E sup f ∈F ( M n ) | G Wn ( ϕ ∧ m j − m j +1 ( π j +1 f − π j f )) | ≤ C n τ − X j =0 δ j p ∨ log N j +1 . • It holds that |{ min { ϕ ∨ m j +1 (∆ j +1 f ) , m j } : f ∈ F ( M n ) }| ≤ N j +1 . If g := min { ϕ ∨ m j +1 (∆ j +1 f ) , m j } ,we have k g k ∞ ≤ m j = m ( n, δ j , N j +1 ) and | g | ≤ ∆ j +1 f. V ( g ) ≤ V (∆ j +1 f ) ≤ δ j +1 ≤ δ j . We conclude by (3.10) that: τ − X j =0 E sup f ∈F ( M n ) | G Wn (min { ϕ ∨ m j +1 (∆ j +1 f ) , m j } ) | ≤ C n τ − X j =0 δ j p ∨ log N j +1 . (8.36)Note that V (∆ j +1 f ) ≤ δ j +1 and m j +1 = m ( n, δ j +1 , N j +2 ). By (3.11), we have √ n k ∆ j +1 f { ∆ j +1 f>m j +1 } k ≤ δ j +1 p ∨ log N j +2 . 
(8.37)From (8.36) and (8.37) we obtain E R ≤ ( C n + 4) τ X j =0 δ j p ∨ log N j +1 . • It holds that |{ min { ϕ ∨ m j − m j +1 (∆ j f ) , m j } : f ∈ F ( M n ) }| ≤ N j +1 . If g := min { ϕ ∨ m j − m j +1 (∆ j f ) , m j } ,we have k g k ∞ ≤ m j = m ( n, δ j , N j +1 ) and | g | ≤ ∆ j f. Thus, V ( g ) ≤ V (∆ j f ) ≤ δ j . We conclude by (3.10) that: τ − X j =0 E sup f ∈F ( M n ) | G Wn (min { ϕ ∨ m j − m j +1 (∆ j +1 f ) , m j } ) | ≤ C n τ − X j =0 δ j · p ∨ log N j +1 . (8.38)Note that V (∆ j f ) ≤ δ j and2( m j − m j +1 ) = m ( n, δ j , N j +1 ) − m ( n, δ j +1 , N j +2 )= D ∞ n n / h r ( δ j D n ) p ∨ log N j +1 − r ( δ j +1 D n ) p ∨ log N j +2 i ≥ D ∞ n n / p ∨ log N j +1 (cid:2) r ( δ j D n ) − r ( δ j +1 D n ) (cid:3) ≥ D ∞ n n / p ∨ log N j +1 r ( δ j D n ) = m j , where the last inequality is due to Lemma 3.6(i). By (3.11) we have √ n k ∆ j f { ∆ j f>m j − m j +1 } k ,n ≤ √ n k ∆ j f { ∆ j f> mj } k ,n m j = m ( n,δ j ,N j +1 ) ≤ δ j p ∨ log N j +1 . (8.39)From (8.38) and (8.39) we obtain R ≤ ( C n + 8) τ − X j =0 δ j p ∨ log N j +1 . mpirical process theory for locally stationary processes R i , i = 1 , ...,

5, we obtain that with some universal constant˜ c > E sup f ∈F ( M n ) (cid:12)(cid:12)(cid:12) G Wn ( f ) (cid:12)(cid:12)(cid:12) ≤ ˜ c · C n h τ X j =0 δ j p ∨ log N j +1 + I ( σ ) i . (8.40)We have (1 ∨ log N j ) / = (cid:16) ∨ P ji =0 log N ( δ i ) (cid:17) / ≤ (cid:16) P ji =0 (1 ∨ H ( δ i )) (cid:17) ≤ P ji =0 (1 ∨ H ( δ i )) / , thus τ X j =0 δ j p ∨ log N j +1 ≤ ∞ X j =0 δ j j X i =0 p ∨ H ( δ i +1 ) ≤ ∞ X i =0 (cid:16) ∞ X j = i δ j (cid:17)p ∨ H ( δ i +1 )= 2 ∞ X i =0 δ i p ∨ H ( δ i +1 ) ≤ ∞ X i =0 δ i +1 p ∨ H ( δ i +1 ) . (8.41)Since H is increasing, we obtain ∞ X i =0 δ i +1 p ∨ H ( δ i +1 ) ≤ ∞ X i =0 δ i p ∨ H ( δ i ) = 2 ∞ X i =0 δ i +1 p ∨ H ( δ i )= 2 ∞ X i =0 Z δ i δ i +1 p ∨ H ( δ i ) dε ≤ ∞ X i =0 Z δ i δ i +1 p ∨ H ( ε ) dε = 2 Z σ p ∨ H ( ε ) dε = 2 · I ( σ ) . (8.42)Inserting (8.42) into (8.41) and then into (8.40), we obtain the result. Proof of Corollary 3.11.

Deﬁne ˜ F := { f − g : f, g ∈ F} . It is easily seen that N ( ε, ˜ F , V ) ≤ N ( ε , F , V ) (cf. [22], Theorem 19.5), thus H ( ε, ˜ F , V ) ≤ H ( ε , F , V ) (8.43)Let σ >

0. Deﬁne F ( z, u ) := 2 D ∞ n ( u ) · ¯ F ( z, u ) , ¯ F ( z, u ) := sup f ∈F | ¯ f ( z, u ) | . Then obviously, F is an envelope function of ˜ F .8By Markov’s inequality, Theorem 3.7 and (8.43), P (cid:16) sup V ( f − g ) ≤ σ, f,g ∈F | G n ( f ) − G n ( g ) | ≥ η (cid:17) ≤ η E sup V ( f − g ) ≤ σ, f,g ∈F | G n ( f ) − G n ( g ) | = 1 η E sup ˜ f ∈ ˜ F ,V ( ˜ f ) ≤ σ | G n ( ˜ f ) |≤ ˜ cη h (1 + D ∞ n D n + D n D ∞ n ) Z σ q ∨ H ( ε, ˜ F , V ) dε + √ n (cid:13)(cid:13) F { F > m ( n,σ, N ( σ )) } (cid:13)(cid:13) i ≤ ˜ cη h √ D ∞ n D n + D n D ∞ n ) Z σ/ p ∨ H ( u, F , V ) du + 4 p ∨ H ( σ ) r ( σ D n ) (cid:13)(cid:13) F { F > n / r ( σ ) √ ∨ H ( σ } (cid:13)(cid:13) ,n i . The ﬁrst term converges to 0 by (3.12) and (3.13) for σ → n ).We now discuss the second term. The continuity conditions from Assumption 2.5 andAssumption 3.10 transfer to ¯ F by the inequality | ¯ F ( z , u ) − ¯ F ( z , u ) | = | sup f ∈F ¯ f ( z , u ) − sup f ∈F ¯ f ( z , u ) | ≤ sup f ∈F | f ( z , u ) − f ( z , u ) | We therefore have as in Lemma 8.9(ii) that for all u, u , u , v , v ∈ [0 , k ¯ F ( Z i , u ) − ¯ F ( ˜ Z i ( in ) , u ) k ≤ C cont · n − αs , (8.44) k ¯ F ( Z i ( v ) , u ) − ¯ F ( ˜ Z i ( v ) , v ) k ≤ C cont · (cid:0) | v − v | αs + | u − u | αs (cid:1) . (8.45)Put c n = n / sup i =1 ,...,n D ∞ n ( in ) r ( σ ) √ ∨ H ( σ ) . Then by Lemma 8.7(ii) and (8.44), k F { F > n / r ( σ ) √ ∨ H ( σ } k ,n ≤ n n X i =1 D ∞ n ( in ) · E h ¯ F ( Z i , in ) {| ¯ F ( Z i , in ) | >c n } i ≤ n n X i =1 D ∞ n ( in ) · E h ¯ F ( ˜ Z i ( in ) , in ) {| ¯ F ( ˜ Z i ( in ) , in ) | >c n } i +16 C cont · n − αs · ( D ∞ n ) . (8.46)Put ˜ W i ( u ) := ¯ F ( ˜ Z i ( u ) , u ) and a n ( u ) := ( D ∞ n ( u )) . By (8.45), k ˜ W i ( u ) − ˜ W i ( u ) k ≤ C cont | u − u | αs . By the assumptions on D f,n ( · ), c n → ∞ and lim sup n →∞ n P ni =1 | a n ( in ) | =lim sup n →∞ ( D ∞ n ) < ∞ . 
We conclude with Lemma 8.8(i) that16 n n X i =1 D ∞ n ( in ) · E h ¯ F ( ˜ Z i ( in ) , in ) {| ¯ F ( ˜ Z i ( in ) , in ) | >c n } i → , that is, the ﬁrst summand in (8.46) tends to 0. Since lim sup n →∞ D ∞ n < ∞ , we obtainthat (8.46) tends to 0. mpirical process theory for locally stationary processes For the noncontinuous arguments, we need an exponential type inequality which onlyassumes that the process has one moment, which is easily derived from a Bernsteininequality. We then obtain the following lemma.

Lemma 8.3.

Assume that Q i ( f ) , i = 1 , ..., m are independent variables indexed by f ∈F which fulﬁll E Q i ( f ) = 0 , m P mi =1 k W i ( f ) k ≤ σ Q and | W i ( f ) | ≤ M Q a.s. ( i = 1 , ..., n ).Then there exists some universal constant c > such that E max f ∈F (cid:12)(cid:12)(cid:12) m m X i =1 W i ( f ) (cid:12)(cid:12)(cid:12) ≤ c (cid:16) σ Q + M Q Hm (cid:17) , (8.47) where H is deﬁned by (1.5). Proof of Lemma 8.3.

By Bernstein’s inequality, we have for each f ∈ F that P (cid:16)(cid:12)(cid:12)(cid:12) m m X i =1 Q i (cid:12)(cid:12)(cid:12) ≥ x (cid:17) ≤ (cid:16) − x m P mi =1 k Q i k + x M Q m (cid:17) ≤ (cid:16) − x M Q m · σ Q + x M Q m (cid:17) , where we used in the last step that k Q i k = E [ Q i ] ≤ M Q k Q i k .With standard arguments (cf. the proof of Lemma 19.33 in [22]), we conclude that thereexists some universal constant c > E max f ∈F (cid:12)(cid:12)(cid:12) m m X i =1 Q i ( f ) (cid:12)(cid:12)(cid:12) ≤ c (cid:16) √ H ( σ Q M Q m ) / + M Q Hm (cid:17) . The result follows by using ( Hσ Q M Q m ) / ≤ M Q Hm + 2 σ Q . Proof of Lemma 4.2.

We use a similar argument as in Theorem 3.2, especially wemake use of the decomposition (3.1). Denote the three summands in (3.1) with A , A , A .We ﬁrst discuss A . We have L X l =1 E max f ∈F nτ l (cid:12)(cid:12)(cid:12) X ≤ i ≤⌊ nτl ⌋ +1 ,i odd τ l T i,l ( f ) (cid:12)(cid:12)(cid:12) . k W k,j ( f ) − W k,j − ( f ) k ≤ {k W k ( f ) k , δ W k ( f )1 ( j − } , we have for each f ∈ F ,1 τ l k T i,l k ≤ τ l X j = τ l − +1 τ l (cid:13)(cid:13)(cid:13) ( iτ l ) ∧ n X k =( i − τ l +1 ( W k,j − W k,j − ) (cid:13)(cid:13)(cid:13) ≤ τ l X j = τ l − +1 τ l ( iτ l ) ∧ n X k =( i − τ l +1 (cid:13)(cid:13)(cid:13) W k,j − W k,j − (cid:13)(cid:13)(cid:13) ≤ τ l X j = τ l − +1 τ l ( iτ l ) ∧ n X k =( i − τ l +1 min {k W k ( f ) k , δ W k ( f )1 ( j − }≤ τ l X j = τ l − +1 min { τ l ( iτ l ) ∧ n X k =( i − τ l +1 k W k ( f ) k , τ l ( iτ l ) ∧ n X k =( i − τ l +1 δ W k ( f )1 ( j − } = 2 τ l X j = τ l − +1 min { σ i,l , ∆ i,j,l } , where σ i,l := 1 τ l ( iτ l ) ∧ n X k =( i − τ l +1 k W k ( f ) k , ∆ i,j,l := 1 τ l ( iτ l ) ∧ n X k =( i − τ l +1 δ W k ( f )1 ( j − . We conclude that1 ⌊ nτ l ⌋ + 1 ⌊ nτl ⌋ +1 X i =1 τ l k T i,l k ≤ τ l X j = τ l − +1 min { nτ l ⌊ nτl ⌋ +1 X i =1 σ i,l , nτ l ⌊ nτl ⌋ +1 X i =1 ∆ i,j,l }≤ τ l X j = τ l − +1 min { n n X i =1 k W i ( f ) k , n n X i =1 δ W i ( j ) } . (8.48)Furthermore, it holds that1 τ l | T i,l | ≤ i k W i ( f ) k ∞ ≤ k f k ∞ ≤ M . (8.49)By Lemma 8.3, (8.47), we have with some universal constant c > √ n E A ≤ c L X l =1 h sup f ∈F (cid:16) ⌊ nτ l ⌋ + 1 ⌊ nτl ⌋ +1 X i =1 τ l k T i,l ( f ) k (cid:17) + 2 M H ⌊ nτ l ⌋ + 1 i ≤ c (cid:16) L X l =1 sup f ∈F τ l X j = τ l − +1 min { n n X i =1 k W i ( f ) k , n n X i =1 δ W i ( j ) } + qM Hn (cid:17) . 
(8.50) mpirical process theory for locally stationary processes L X l =1 sup f ∈F τ l X j = τ l − +1 min { n n X i =1 k W i ( f ) k , n n X i =1 δ W i ( j ) }≤ L X l =1 sup f ∈F τ l X j = τ l − +1 min { n n X i =1 k f ( Z i , in ) k , n n X i =1 D f,n ( in ) k f ( Z i , in ) k · ∆( j ) }≤ ∞ X j =1 min { sup f ∈F k f k ,n , D n sup f ∈F k f k ,n · ∆( j ) } = sup f ∈F k f k ,n · ¯ V (sup f ∈F k f k ,n )= sup f ∈F (cid:0) k f k ,n · ¯ V ( k f k ,n ) (cid:1) ≤ sup f ∈F (cid:2) k f k ,n V ( f ) (cid:3) , (8.51)where we have used the deﬁnition of ¯ V from (8.19) and in the second-to-last equality thefact that x x · ¯ V ( x ) is increasing in x .We also have k W i, ( f ) − E W i, ( f ) k ∞ ≤ k f k ∞ ≤ M and k W i, ( f ) − E W i, ( f ) k ≤ k W i ( f ) k . Thus by Lemma 8.3, (8.47),1 √ n E A ≤ E max f ∈F (cid:12)(cid:12)(cid:12) n n X i =1 ( W i, ( f ) − E W i, ( f )) (cid:12)(cid:12)(cid:12) ≤ c (cid:16) sup f ∈F n n X i =1 k W i ( f ) k + M Hn (cid:17) ≤ c (cid:16) sup f ∈F k f k ,n + M Hn (cid:17) . (8.52)and thusFinally, it holds that1 √ n E A ≤ ∞ X j = q E sup f ∈F (cid:12)(cid:12)(cid:12) n n X i =1 ( W i,j +1 ( f ) − W i,j ( f )) (cid:12)(cid:12)(cid:12) ≤ ∞ X j = q n n X i =1 (cid:13)(cid:13) sup f ∈F | W i,j +1 ( f ) − W i,j ( f ) | (cid:13)(cid:13) . Since | W i,j +1 ( f ) − W i,j ( f ) | = | E [ W i ( f ) ∗∗ ( i − j ) − W i ( f ) ∗∗ ( i − j +1) |G i ] | ≤ E [ | W i ( f ) ∗∗ ( i − j ) − W i ( f ) ∗∗ ( i − j +1) | |G i ] (cf. (8.11) for the introduced notation), we have (cid:13)(cid:13) sup f ∈F | W i,j +1 ( f ) − W i,j ( f ) | (cid:13)(cid:13) ≤ (cid:13)(cid:13) E [max f ∈F | W i ( f ) ∗∗ ( i − j ) − W i ( f ) ∗∗ ( i − j +1) | |G i ] (cid:13)(cid:13) ≤ (cid:13)(cid:13) sup f ∈F | W i ( f ) ∗∗ ( i − j ) − W i ( f ) ∗∗ ( i − j +1) | (cid:13)(cid:13) = (cid:13)(cid:13) sup f ∈F | W i ( f ) − W i ( f ) ∗ ( i − j ) | (cid:13)(cid:13) ≤ D ∞ n ( in ) C ∆ ∆( j ) , (8.53)which shows that 1 √ n E A ≤ ( D ∞ n ) C ∆ β ( q ) . 
(8.54)Collecting the upper bounds (8.50), (8.51), (8.52) and (8.54), we obtain that E max f ∈F (cid:12)(cid:12)(cid:12) n S Wn ( f ) (cid:12)(cid:12)(cid:12) ≤ (4 c + 1) · h sup f ∈F (cid:2) k f k ,n V ( f ) (cid:3) + ( D ∞ n ) C ∆ β ( q ) + qM Hn i . (8.55)By (8.31), V ( f ) ≤ σ implies k f k ,n ≤ D n r ( δ D n ) k f k ,n and thus k f k ,n ≤ D n r ( σ D n ) , thus sup f ∈F (cid:2) k f k ,n V ( f ) (cid:3) ≤ D n r ( σ D n ) σ. (8.56)Inserting (8.56) into (8.55) yields the ﬁrst assertion (4.1) of the lemma.We now show (4.2) with a case distinction. We abbreviate q ∗ = q ∗ ( M Hn ( D ∞ n ) C ∆ ). If q ∗ Hn ≤ q ∗ ∈ { , ..., n } and thus P ≤ c (cid:16) D n r ( σ D n ) σ + ( D ∞ n ) C ∆ β ( q ∗ ) + q ∗ M Hn (cid:17) ≤ c (cid:16) D n r ( σ D n ) σ + q ∗ M Hn (cid:17) = 2 c (cid:16) D n r ( σ D n ) σ + M · min n q ∗ Hn , o(cid:17) . (8.57)If q ∗ Hn ≥

1, choose q = ⌊ nH ⌋ ≤ nH . By simply bounding each summand with M , wehave E max f ∈F (cid:12)(cid:12)(cid:12) n S n ( f ) (cid:12)(cid:12)(cid:12) ≤ M ≤ c (cid:16) D n r ( σ D n ) σ + M (cid:17) ≤ c (cid:16) D n r ( σ D n ) σ + M · min n q ∗ Hn , o(cid:17) . (8.58)holds. Putting the two bounds (8.57) and (8.58) together, we obtain the result (4.2). mpirical process theory for locally stationary processes Lemma 8.4.

Let F be some ﬁnite class of functions. Let R > be arbitrary and assumethat sup f ∈F k f k ∞ ≤ M . Then there exists a universal constant c > such that E max f ∈F (cid:12)(cid:12) G (1) n ( f ) (cid:12)(cid:12) { R n ( f ) ≤ R } ≤ c n R √ H + M H √ n o , (8.59) where H is deﬁned by (1.5). Proof of Lemma 8.4.

By Theorem 3.3 in [19], it holds for x, a > f that P (cid:16)(cid:12)(cid:12) G (1) n ( f ) (cid:12)(cid:12) ≥ x, R n ( f ) ≤ R (cid:17) ≤ (cid:16) − x R + k f k ∞ x √ n ) (cid:17) . Using standard arguments (cf. the proof of Lemma 19.33 in [22]), we obtain (8.59).

Proof of Corollary 4.3.

Let Q ≥

1, and σ := sup n ∈ N sup f ∈F V ( f ) < ∞ . Put M n = √ n √ H r (cid:0) σQ / D ∞ n (cid:1) D ∞ n . Let F ( z, u ) := D ∞ n ( u ) · ¯ F ( z, u ), (recall ¯ F = sup f ∈F ¯ f ). Then P (cid:16) max f ∈F | G n ( f ) | > Q √ H (cid:17) ≤ P (cid:16) max f ∈F | G n ( f ) | > Q √ H, sup i =1 ,...,n F ( Z i , in ) ≤ M n (cid:17) + P (cid:16) sup i =1 ,...,n F ( Z i , in ) > M n (cid:17) ≤ P (cid:16) max f ∈F | G n ( ϕ ∧ M n ( f )) | > Q √ H (cid:17) + P (cid:16) √ n max f ∈F (cid:12)(cid:12) n X i =1 E [ f ( Z i , in ) {| f ( Z i , in ) | >M n } ] (cid:12)(cid:12) > Q √ H (cid:17) + P (cid:16) sup i =1 ,...,n F ( Z i , in ) > M n (cid:17) . (8.60)4For the ﬁrst summand in (8.60), we use the decomposition P (cid:16) max f ∈F | G n ( ϕ ∧ M n ( f )) | > Q √ H (cid:17) ≤ P (cid:16) max f ∈F | G (1) n ( ϕ ∧ M n ( f )) | > Q √ H (cid:17) + P (cid:16) max f ∈F | G (2) n ( ϕ ∧ M n ( f )) | > Q √ H (cid:17) ≤ P (cid:16) max f ∈F | G (1) n ( ϕ ∧ M n ( f )) | > Q √ H , max f ∈F R n ( ϕ ∧ M n ( f )) ≤ σ (cid:17) + P (cid:16) max f ∈F R n ( ϕ ∧ M n ( f )) > σ (cid:17) + P (cid:16) max f ∈F | G (2) n ( ϕ ∧ M n ( f )) | > Q √ H (cid:17) . (8.61)We now discuss the three terms separately. By Lemma 8.4, we have P (cid:16) max f ∈F | G (1) n ( ϕ ∧ M n ( f )) | > Q √ H , max f ∈F R n ( ϕ ∧ M n ( f )) ≤ Q / σ (cid:17) ≤ cQ √ H h σQ / √ H + M n H √ n i ≤ cQ √ H h σQ / √ H + σ √ HQ / i ≤ cQ / . By Lemma 4.2 and (8.63), P (cid:16) max f ∈F R n ( ϕ ∧ M n ( f )) > Q / σ (cid:17) ≤ cσ Q / h D n r ( σ D n ) σ + q ∗ (cid:16) M Hn ( D ∞ n ) C ∆ (cid:17) M Hn i ≤ cσ Q / h σ + q ∗ (cid:16) r ( σQ / D ∞ n ) C ∆ (cid:17) r ( σQ / D ∞ n ) ( D ∞ n ) i ≤ cσ Q / h σ + q ∗ (cid:16) C − C − β (cid:1) · h q ∗ (cid:16) r ( σQ / D ∞ n ) (cid:17) r ( σQ / D ∞ n ) i ( D ∞ n ) i ≤ cσ Q / h σ + q ∗ (cid:16) C − C − β (cid:1) σ Q i |≤ cQ / (cid:2) q ∗ (cid:16) C − C − β (cid:1)(cid:3) . 
By Theorem 3.2 applied to W i ( f ) = E [ f ( Z i , in ) |G i − ], P (cid:16) max f ∈F | G (2) n ( ϕ ∧ M n ( f )) | > Q √ H (cid:17) ≤ cQ √ H · h σ √ H + q ∗ (cid:16) r ( σQ / D ∞ n ) (cid:17) r ( σQ / D ∞ n ) D ∞ n i ≤ cQ √ H (cid:2) σ √ H + σQ / √ H (cid:3) ≤ cσQ / . mpirical process theory for locally stationary processes P (cid:16) max f ∈F | G n ( ϕ ∧ M n ( f )) | > Q √ H (cid:17) ≤ cQ / + 2 cQ / (cid:2) q ∗ (cid:16) C − C − β (cid:1)(cid:3) + 16 cσQ / → Q → ∞ . The second and third summand in (8.60) were already discussed in the proofof Corollary 3.3 ((8.25) and (8.26) therein; note especially that we only need there that k ¯ F ( Z i , in ) k ν ≤ C ¯ F ,n instead of C ∆ which is part of the assumptions), and converge to 0for Q → ∞ under the given assumptions. Proof of Lemma 4.5.

By Lemma 8.4 and since r ( a ) ≤ a (cf. Lemma 3.6(i)), E max f ∈F (cid:12)(cid:12) G (1) n ( f ) (cid:12)(cid:12) { R n ( f ) ≤ δψ ( δ ) } ≤ c n ψ ( δ ) δ p H ( k ) + m ( n, δ, k ) H ( k ) √ n o ≤ c · (cid:2) ψ ( δ ) · δ + D ∞ n r ( δ D n ) (cid:3)p H ( k ) ≤ c · (1 + D ∞ n D n ) · ψ ( δ ) δ p H ( k ) , which shows (4.4).By (8.31), k f k ,n ≤ D n r ( δ D n ) k f k ,n and thus k f k ,n ≤ D n r ( δ D n ). Note that due to r ( a ) ≤ a , E R n ( f ) = 1 n n X i =1 E [ f ( Z i , in ) ] ≤ k f k ,n ≤ ( D n r ( δ D n )) ≤ δ . (8.62)Recall that β norm ( q ) = β ( q ) q . By Assumption 2.10, we have that for any x , x > q = q ∗ ( x ) q ∗ ( x ) satisﬁes β norm (˜ q ) ≤ C β β norm ( q ∗ ( x )) β norm ( q ∗ ( x )) ≤ C β x x . Thus, by deﬁnition of q ∗ , q ∗ ( C β x x ) ≤ q ∗ ( x ) q ∗ ( x ) . (8.63)We obtain that q ∗ (cid:16) r ( δ D n ) C ∆ (cid:17) ≤ q ∗ (cid:16) r ( δ D n ) (cid:17) q ∗ (cid:0) C − C − β (cid:1) . (8.64)6By (8.62), Markov’s inequality, Lemma 4.2 and (8.64), P (cid:16) sup f ∈F R n ( f ) > ψ ( δ ) δ (cid:17) ≤ P (cid:16) sup f ∈F | R n ( f ) − E R n ( f ) | > ψ ( δ ) δ (cid:17) ≤ cψ ( δ ) δ · h D n r ( δ D n ) δ + q ∗ (cid:16) r ( δ D n ) C ∆ (cid:17) r ( δ D n ) ( D ∞ n ) i ≤ cψ ( δ ) δ · h δ + h q ∗ (cid:16) r ( δ D n ) (cid:17) r ( δ D n ) i q ∗ (cid:0) C − C − β (cid:1) ( D ∞ n ) i ≤ cψ ( δ ) δ · h δ + δ q ∗ (cid:0) C − C − β (cid:1) ( D ∞ n D n ) i ≤ c (1 + q ∗ (cid:0) C − C − β (cid:1) ( D ∞ n D n ) ) ψ ( δ ) , which shows (4.5). Proof of Theorem 4.6.

In the following, we abbreviate H ( δ ) = H ( δ, F , V ) and N ( δ ) = N ( δ, F , V ).We use exactly the same setup as in the proof of Theorem 3.7, that is, we choose δ = σ and δ j = 2 − j δ , and m j = 12 m ( n, δ j , N j +1 ) , as well as M n = m . We then use E sup f ∈F (cid:12)(cid:12)(cid:12) G (1) n ( f ) (cid:12)(cid:12)(cid:12) ≤ E sup f ∈F ( M n ) (cid:12)(cid:12)(cid:12) G (1) n ( f ) (cid:12)(cid:12)(cid:12) + 1 √ n n X i =1 E (cid:2) F ( Z i ) { F ( Z i ) >M n } (cid:3) , (8.65)where F ( M n ) := { ϕ ∧ M n ( f ) : f ∈ F} .As in the proof of Theorem 3.7, we construct a nested sequence of partitions ( F jk ) k =1 ,...,N j , j ∈ N of F ( M n ) (where N j := N ( δ ) · ... · N ( δ j )), and a sequence ∆ jk of measurable func-tions such that sup f,g ∈F jk | f − g | ≤ ∆ jk , V (∆ jk ) ≤ δ j . In each F jk , we ﬁx some f jk ∈ F , and deﬁne π j f := f j,ψ j f where ψ j f := min { i ∈{ , ..., N j } : f ∈ F ji } , and put ∆ j f := ∆ j,ψ j f , and I ( σ ) := Z σ ψ ( ε ) p ∨ H ( ε, F , V ) dε, as well as τ := min n j ≥ δ j ≤ I ( σ ) √ n o ∨ . (8.66) mpirical process theory for locally stationary processes f, g with | f | ≤ g , it holds that | G (1) n ( f ) | ≤ | G (1) n ( g ) | + 2 √ n · n n X i =1 E [ g ( Z i , in ) |G i − ] ≤ | G (1) n ( g ) | + 2 | G (2) n ( g ) | + 2 √ n · n n X i =1 E [ g ( Z i , in )] ≤ | G (1) n ( g ) | + 2 | G (2) n ( g ) | + 2 √ n k g k ,n . 
By (3.7) and (3.8) (applied to W i ( f ) = f ( Z i , in ) − E [ f ( Z i , in ) |G i − ] ) and the fact that k f − π f k ∞ ≤ M n ≤ m , we have the decompositionsup f ∈F | G (1) n ( f ) | ≤ sup f ∈F | G (1) n ( π f ) | + sup f ∈F | G (1) n ( ϕ ∧ m τ ( f − π τ f )) | + τ − X j =0 sup f ∈F (cid:12)(cid:12)(cid:12) G (1) n ( ϕ ∧ m j − m j +1 ( π j +1 f − π j f )) (cid:12)(cid:12)(cid:12) + τ − X j =0 sup f ∈F | G (1) n ( R ( j )) |≤ sup f ∈F | G (1) n ( π f ) | + n sup f ∈F | G (1) n ( ϕ ∧ m τ (∆ τ f )) | + 2 sup f ∈F | G (2) n ( ϕ ∧ m τ (∆ τ f )) | +2 √ n sup f ∈F k ∆ τ f k ,n o + τ − X j =0 sup f ∈F (cid:12)(cid:12)(cid:12) G (1) n ( ϕ ∧ m j − m j +1 ( π j +1 f − π j f )) (cid:12)(cid:12)(cid:12) + τ − X j =0 n sup f ∈F (cid:12)(cid:12)(cid:12) G (1) n (min (cid:8)(cid:12)(cid:12) ϕ ∨ m j +1 (∆ j +1 f ) (cid:12)(cid:12) , m j (cid:9) ) (cid:12)(cid:12)(cid:12) +2 sup f ∈F (cid:12)(cid:12)(cid:12) G (2) n (min (cid:8)(cid:12)(cid:12) ϕ ∨ m j +1 (∆ j +1 f ) (cid:12)(cid:12) , m j (cid:9) ) (cid:12)(cid:12)(cid:12) +2 √ n sup f ∈F k ∆ j +1 f { ∆ j +1 f>m j +1 } k ,n o + τ − X j =0 n sup f ∈F (cid:12)(cid:12)(cid:12) G (1) n (min (cid:8)(cid:12)(cid:12) ϕ ∨ m j − m j +1 (∆ j f ) (cid:12)(cid:12) , m j (cid:9) ) (cid:12)(cid:12)(cid:12) +2 sup f ∈F (cid:12)(cid:12)(cid:12) G (2) n (min (cid:8)(cid:12)(cid:12) ϕ ∨ m j − m j +1 (∆ j f ) (cid:12)(cid:12) , m j (cid:9) ) (cid:12)(cid:12)(cid:12) +2 √ n sup f ∈F k ∆ j f { ∆ j f>m j − m j +1 } k ,n o (8.67)8We have for f ∈ F ( M n ): π f = ϕ ∧ M n ( π f ) ,ϕ ∧ m τ (∆ τ f ) ≤ min { ∆ τ f, m τ } ,ϕ ∧ m j − m j − ( π j +1 f − π j f ) ≤ min { ∆ j f, m j } , min { ϕ ∨ m j +1 (∆ j +1 f ) , m j } ≤ min { ∆ j f, m j } , min { ϕ ∨ m j − m j +1 (∆ j f ) , m j } ≤ min { ∆ j f, m j } . (8.68)We therefore deﬁne the eventΩ n := { sup f ∈F ( M n ) R n ( ϕ ∧ M n ( π f )) ≤ σψ ( σ ) }∩ τ \ j =1 (cid:8) sup f ∈F ( M n ) R n (min { ∆ j f, m j } ) ≤ δ j ψ ( δ j ) (cid:9) . 
From (8.67) and (8.68), we obtain
$$\begin{aligned}
\sup_{f\in\mathcal F(M_n)}\big|G_n^{(1)}(f)\big|\,\mathbb 1_{\Omega_n} &\le \sup_{f\in\mathcal F(M_n)}\big|G_n^{(1)}(\pi_0f)\big|\,\mathbb 1_{\{\sup_{f\in\mathcal F(M_n)}R_n(\pi_0f)\le\sigma\psi(\sigma)\}} \\
&\quad + \Big\{\sup_{f\in\mathcal F}\big|G_n^{(1)}(\varphi^{\wedge}_{m_\tau}(\Delta_\tau f))\big|\,\mathbb 1_{\{\sup_{f\in\mathcal F(M_n)}R_n(\min\{\Delta_\tau f,m_\tau\})\le\delta_\tau\psi(\delta_\tau)\}} + 2R_1\Big\} \\
&\quad + \sum_{j=0}^{\tau-1}\sup_{f\in\mathcal F}\big|G_n^{(1)}(\varphi^{\wedge}_{m_j-m_{j+1}}(\pi_{j+1}f-\pi_jf))\big|\,\mathbb 1_{\{\sup_{f\in\mathcal F(M_n)}R_n(\min\{\Delta_jf,m_j\})\le\delta_j\psi(\delta_j)\}} \\
&\quad + \sum_{j=0}^{\tau-1}\sup_{f\in\mathcal F}\big|G_n^{(1)}(\min\{|\varphi^{\vee}_{m_{j+1}}(\Delta_{j+1}f)|,m_j\})\big|\,\mathbb 1_{\{\sup_{f\in\mathcal F(M_n)}R_n(\min\{\Delta_jf,m_j\})\le\delta_j\psi(\delta_j)\}} + 2R_2 \\
&\quad + \sum_{j=0}^{\tau-1}\sup_{f\in\mathcal F}\big|G_n^{(1)}(\min\{|\varphi^{\vee}_{m_j-m_{j+1}}(\Delta_jf)|,m_j\})\big|\,\mathbb 1_{\{\sup_{f\in\mathcal F(M_n)}R_n(\min\{\Delta_jf,m_j\})\le\delta_j\psi(\delta_j)\}} + 2R_3 \\
&=: \tilde R_1 + \{\tilde R_2+2R_1\} + \tilde R_3 + \{\tilde R_4+2R_2\} + \{\tilde R_5+2R_3\}.
\end{aligned}\tag{8.69}$$
We now discuss the terms $\tilde R_i$, $i=1,\dots,5$. The terms $R_i$, $i\in\{1,2,3\}$, were already discussed in the proof of Theorem 3.7. Put
$$\tilde C_n := 2c\Big(1+\frac{D_n^\infty}{D_n}\Big),$$
where $c$ is from Lemma 3.6.

• Since $|\{\pi_0f : f\in\mathcal F(M_n)\}|\le N(\delta_0)$ and $\|\pi_0f\|_\infty\le M_n\le m(n,\delta_0,N(\delta_0))$, we have by Lemma 4.5:
$$E\tilde R_1 = E\sup_{f\in\mathcal F(M_n)}\big|G_n^{(1)}(\pi_0f)\big|\,\mathbb 1_{\{\sup_{f\in\mathcal F(M_n)}R_n(\pi_0f)\le\delta_0\psi(\delta_0)\}}\le\tilde C_n\,\psi(\delta_0)\,\delta_0\sqrt{1\vee\log N(\delta_0)}.$$
• It holds that $|\{\varphi^{\wedge}_{m_\tau}(\Delta_\tau f) : f\in\mathcal F(M_n)\}|\le N_\tau$. If $g := \varphi^{\wedge}_{m_\tau}(\Delta_\tau f)$, then $\|g\|_\infty\le m_\tau\le m(n,\delta_\tau,N_{\tau+1})$. We conclude by Lemma 4.5:
$$E\tilde R_2\le E\sup_{f\in\mathcal F}\big|G_n^{(1)}(\varphi^{\wedge}_{m_\tau}(\Delta_\tau f))\big|\,\mathbb 1_{\{\sup_{f\in\mathcal F(M_n)}R_n(\min\{\Delta_\tau f,m_\tau\})\le\delta_\tau\psi(\delta_\tau)\}}\le\tilde C_n\,\psi(\delta_\tau)\,\delta_\tau\sqrt{1\vee\log N_{\tau+1}}.$$
• Since the partitions are nested, it holds that $|\{\varphi^{\wedge}_{m_j-m_{j+1}}(\pi_{j+1}f-\pi_jf) : f\in\mathcal F(M_n)\}|\le N_{j+1}$. If $g := \varphi^{\wedge}_{m_j-m_{j+1}}(\pi_{j+1}f-\pi_jf)$, we have $\|g\|_\infty\le m_j-m_{j+1}\le m_j\le m(n,\delta_j,N_{j+1})$.
We conclude by Lemma 4.5:
$$E\tilde R_3\le\sum_{j=0}^{\tau-1}E\sup_{f\in\mathcal F}\big|G_n^{(1)}(\varphi^{\wedge}_{m_j-m_{j+1}}(\pi_{j+1}f-\pi_jf))\big|\,\mathbb 1_{\{\sup_{f\in\mathcal F(M_n)}R_n(\min\{\Delta_jf,m_j\})\le\delta_j\psi(\delta_j)\}}\le\tilde C_n\sum_{j=0}^{\tau-1}\psi(\delta_j)\,\delta_j\sqrt{1\vee\log N_{j+1}}.$$
• It holds that $|\{\min\{|\varphi^{\vee}_{m_{j+1}}(\Delta_{j+1}f)|,m_j\} : f\in\mathcal F(M_n)\}|\le N_{j+1}$. If $g := \min\{|\varphi^{\vee}_{m_{j+1}}(\Delta_{j+1}f)|,m_j\}$, we have $\|g\|_\infty\le m_j\le m(n,\delta_j,N_{j+1})$. We conclude by Lemma 4.5:
$$E\tilde R_4\le\sum_{j=0}^{\tau-1}E\sup_{f\in\mathcal F}\big|G_n^{(1)}(\min\{|\varphi^{\vee}_{m_{j+1}}(\Delta_{j+1}f)|,m_j\})\big|\,\mathbb 1_{\{\sup_{f\in\mathcal F(M_n)}R_n(\min\{\Delta_jf,m_j\})\le\delta_j\psi(\delta_j)\}}\le\tilde C_n\sum_{j=0}^{\tau-1}\psi(\delta_j)\,\delta_j\sqrt{1\vee\log N_{j+1}}.$$
• It holds that $|\{\min\{|\varphi^{\vee}_{m_j-m_{j+1}}(\Delta_jf)|,m_j\} : f\in\mathcal F(M_n)\}|\le N_{j+1}$. If $g := \min\{|\varphi^{\vee}_{m_j-m_{j+1}}(\Delta_jf)|,m_j\}$, we have $\|g\|_\infty\le m_j\le m(n,\delta_j,N_{j+1})$. We conclude by Lemma 4.5 that:
$$E\tilde R_5\le\sum_{j=0}^{\tau-1}E\sup_{f\in\mathcal F}\big|G_n^{(1)}(\min\{|\varphi^{\vee}_{m_j-m_{j+1}}(\Delta_jf)|,m_j\})\big|\,\mathbb 1_{\{\sup_{f\in\mathcal F(M_n)}R_n(\min\{\Delta_jf,m_j\})\le\delta_j\psi(\delta_j)\}}\le\tilde C_n\sum_{j=0}^{\tau-1}\psi(\delta_j)\,\delta_j\sqrt{1\vee\log N_{j+1}}.$$
Inserting the bounds for $E\tilde R_i$, $i=1,\dots,5$, and for $R_i$, $i\in\{1,2,3\}$, from the proof of Theorem 3.7 into (8.69), we obtain that with some universal constant $\tilde c>0$:
$$E\sup_{f\in\mathcal F(M_n)}\big|G_n^{(1)}(f)\big|\,\mathbb 1_{\Omega_n}\le\tilde c\Big(1+\frac{D_n^\infty}{D_n}+\frac{D_n}{D_n^\infty}\Big)\Big[\sum_{j=0}^{\tau+1}\psi(\delta_j)\,\delta_j\sqrt{1\vee\log N_{j+1}}+I(\sigma)\Big]. \tag{8.70}$$
Note that, since $\delta_j-\delta_{j+1} = \delta_j/2$ and $\psi$ is nonincreasing,
$$\sum_{j=k}^\infty\delta_j\psi(\delta_j) = 2\sum_{j=k}^\infty\int_{\delta_{j+1}}^{\delta_j}\psi(\delta_j)\,dx\le2\int_0^{\delta_k}\psi(x)\,dx.$$
By partial integration, it is easy to see that there exists some universal constant $c_\psi>0$ such that
$$\Big|\int_0^{\delta_k}\psi(x)\,dx\Big|\le c_\psi\,\delta_k\psi(\delta_k), \tag{8.71}$$
thus
$$\sum_{j=k}^\infty\delta_j\psi(\delta_j)\le2c_\psi\,\delta_k\psi(\delta_k). \tag{8.72}$$
Using (8.72), we can argue as in the proof of Theorem 3.7 (see (8.40), (8.41) and (8.42) therein) that there exists some universal constant $\tilde c_2>0$ such that
$$\sum_{j=0}^\infty\psi(\delta_j)\,\delta_j\sqrt{1\vee\log N_{j+1}}\le\tilde c_2\,I(\sigma).$$
Inserting this into (8.70) yields
$$E\sup_{f\in\mathcal F(M_n)}\big|G_n^{(1)}(f)\big|\,\mathbb 1_{\Omega_n}\le\tilde c\,(3\tilde c_2+1)\Big(1+\frac{D_n^\infty}{D_n}+\frac{D_n}{D_n^\infty}\Big)I(\sigma). \tag{8.73}$$
Discussion of the event $\Omega_n$: We have
$$P(\Omega_n^c)\le P\Big(\sup_{f\in\mathcal F(M_n)}R_n(\varphi^{\wedge}_{M_n}(\pi_0f))>\psi(\sigma)\sigma\Big)+\sum_{j=1}^{\tau+1}P\Big(\sup_{f\in\mathcal F(M_n)}R_n(\min\{\Delta_jf,m_j\})>\psi(\delta_j)\delta_j\Big)=:R_1^\circ+R_2^\circ. \tag{8.74}$$
We now discuss $R_i^\circ$, $i=1,2$. Put $C_n^\circ := 2c\,\big\{q^*\big(C^{-1}C^{-\beta}\big)\frac{D_n^\infty}{D_n}\big\}$, where $c$ is from Lemma 4.5.

• Since $|\{\varphi^{\wedge}_{M_n}(\pi_0f) : f\in\mathcal F(M_n)\}|\le N(\delta_0) = N(\sigma)$, $\|\varphi^{\wedge}_{M_n}(\pi_0f)\|_\infty\le M_n\le m(n,\sigma,N(\sigma))$ and $V(\varphi^{\wedge}_{M_n}(\pi_0f))\le V(\pi_0f)\le\sigma$, we have by Lemma 4.5: $R_1^\circ\le\frac{C_n^\circ}{\psi(\sigma)}$.
• It holds that $|\{\min\{\Delta_jf,m_j\} : f\in\mathcal F(M_n)\}|\le N_{j+1}$. We have $\|\min\{\Delta_jf,m_j\}\|_\infty\le m_j\le m(n,\delta_j,N_{j+1})$ and $V(\min\{\Delta_jf,m_j\})\le V(\Delta_jf)\le\delta_j$. We conclude by Lemma 4.5 that $R_2^\circ\le C_n^\circ\sum_{j=0}^{\tau+1}\frac{1}{\psi(\delta_j)}$.

Inserting the bounds for $R_i^\circ$, $i=1,2$, into (8.74), we obtain
$$P(\Omega_n^c)\le C_n^\circ\sum_{j=0}^\infty\frac{1}{\psi(\delta_j)}. \tag{8.75}$$
We now have $\sum_{j=0}^\infty\frac{1}{\psi(\delta_j)}\le\int_0^\sigma\frac{1}{\varepsilon\psi(\varepsilon)}\,d\varepsilon$. We conclude that for each $\eta>0$,
$$P\Big(\sup_{f\in\mathcal F}\big|G_n^{(1)}(f)\big|>\eta\Big)\le P\Big(\sup_{f\in\mathcal F}\big|G_n^{(1)}(f)\big|>\eta,\Omega_n\Big)+P(\Omega_n^c)\le\frac1\eta E\Big[\sup_{f\in\mathcal F}\big|G_n^{(1)}(f)\big|\,\mathbb 1_{\Omega_n}\Big]+P(\Omega_n^c).$$
Insertion of (8.65), (8.73) and (8.75) gives the result.
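The dyadic comparison behind (8.71) and (8.72) can be checked numerically for a concrete nonincreasing weight. The sketch below uses the hypothetical choice $\psi(x) = 1+\log(1/x)$ on $(0,1]$, for which $\int_0^\delta\psi(x)\,dx = \delta(2+\log(1/\delta))$ in closed form. For this $\psi$, (8.71) holds with $c_\psi = 2$. The script checks $\sum_{j\ge k}\delta_j\psi(\delta_j)\le2\int_0^{\delta_k}\psi$ (using a truncated series) together with (8.71).

```python
import math

def psi(x):
    """Hypothetical nonincreasing weight psi(x) = 1 + log(1/x) on (0, 1]."""
    return 1.0 + math.log(1.0 / x)

def psi_integral(delta):
    """Closed form of int_0^delta psi(x) dx = delta * (2 + log(1/delta))."""
    return delta * (2.0 + math.log(1.0 / delta))

def dyadic_tail(sigma, k, terms=200):
    """Truncated series sum_{j >= k} delta_j psi(delta_j) with delta_j = 2^{-j} sigma."""
    return sum(sigma * 2.0 ** (-j) * psi(sigma * 2.0 ** (-j)) for j in range(k, k + terms))

for sigma in (1.0, 0.5, 0.1):
    for k in range(10):
        d = sigma * 2.0 ** (-k)
        assert dyadic_tail(sigma, k) <= 2.0 * psi_integral(d) + 1e-12  # dyadic sum vs. integral
        assert psi_integral(d) <= 2.0 * d * psi(d) + 1e-12             # (8.71) with c_psi = 2
print("ok")
```

For $\sigma = 1$ and $k = 0$, the series equals $2 + 2\log 2 \approx 3.39$, which is below the bound $2\int_0^1\psi = 4$.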

Proof of Corollary 4.11.

Define $\tilde{\mathcal F}$ as in Corollary 3.11. We obtain
$$P\Big(\sup_{V(f-g)\le\sigma,\ f,g\in\mathcal F}|G_n(f)-G_n(g)|\ge\eta\Big)\le P\Big(\sup_{V(\tilde f)\le\sigma,\ \tilde f\in\tilde{\mathcal F}}\big|G_n^{(1)}(\tilde f)\big|\ge\frac\eta2\Big)+P\Big(\sup_{V(\tilde f)\le\sigma,\ \tilde f\in\tilde{\mathcal F}}\big|G_n^{(2)}(\tilde f)\big|\ge\frac\eta2\Big). \tag{8.76}$$
Now let $F(z,u) := 2D_n^\infty(u)\cdot\bar F(z,u)$, where $\bar F$ is from Assumption 4.9. Then $F$ is an envelope function of $\tilde{\mathcal F}$.

We now discuss the second summand on the right-hand side of (8.76). By Markov's inequality and Theorem 3.7 applied to $W_i(f) = E[f(Z_i,\frac in)\mid\mathcal G_{i-1}]$, we obtain as in the proof of Corollary 3.11 that
$$P\Big(\sup_{V(\tilde f)\le\sigma,\ \tilde f\in\tilde{\mathcal F}}\big|G_n^{(2)}(\tilde f)\big|\ge\frac\eta2\Big)\le\frac{\tilde c}{\eta/2}\Big[\sqrt2\Big(1+\frac{D_n^\infty}{D_n}+\frac{D_n}{D_n^\infty}\Big)\int_0^{\sigma/2}\sqrt{1\vee H(u,\mathcal F,V)}\,du+\frac{4\sqrt{1\vee H(\sigma)}}{r(\sigma/D_n)}\Big\|F\,\mathbb 1_{\{F>n^{1/2}r(\sigma)/\sqrt{1\vee H(\sigma)}\}}\Big\|_{1,n}\Big]. \tag{8.77}$$
The first summand in (8.77) converges to 0 for $\sigma\to0$ (uniformly in $n$) since
$$\sup_{n\in\mathbb N}\int_0^{\sigma/2}\sqrt{1\vee H(u,\mathcal F,V)}\,du\le\sup_{n\in\mathbb N}\int_0^\sigma\psi(\varepsilon)\sqrt{1\vee H(\varepsilon,\mathcal F,V)}\,d\varepsilon<\infty.$$
We now discuss the second summand in (8.77). The continuity conditions from Assumption 4.9 on $\bar F$ yield, as in the proof of Lemma 8.9(ii), that for all $u,u_1,u_2,v_1,v_2\in[0,1]$,
$$\big\|\bar F(Z_i,u)-\bar F(\tilde Z_i(\tfrac in),u)\big\|_2\le C_{cont}\cdot n^{-\alpha s/2}, \tag{8.78}$$
$$\big\|\bar F(\tilde Z_i(v_1),u_1)-\bar F(\tilde Z_i(v_2),u_2)\big\|_2\le C_{cont}\cdot\big(|v_1-v_2|^{\alpha s/2}+|u_1-u_2|^{\alpha s}\big). \tag{8.79}$$
As in the proof of Corollary 3.11, we now obtain with (8.78) and (8.79) that $\|F\,\mathbb 1_{\{F>n^{1/2}r(\sigma)/\sqrt{1\vee H(\sigma)}\}}\|_{1,n}\to0$ as $n\to\infty$, which shows that (8.77) converges to 0 for $\sigma\to0$, $n\to\infty$.

We now consider the first term in (8.76).
By Theorem 4.6, we have with some universal constant $c>0$:
$$\begin{aligned}P\Big(\sup_{V(\tilde f)\le\sigma,\ \tilde f\in\tilde{\mathcal F}}\big|G_n^{(1)}(\tilde f)\big|\ge\frac\eta2\Big)&\le\frac2\eta\Big[c\Big(1+\frac{D_n^\infty}{D_n}+\frac{D_n}{D_n^\infty}\Big)\int_0^\sigma\psi(\varepsilon)\sqrt{1\vee H(\varepsilon,\tilde{\mathcal F},V)}\,d\varepsilon+\frac{4\sqrt{1\vee H(\sigma)}}{r(\sigma/D_n)}\big\|F\,\mathbb 1_{\{F>m(n,\sigma,N(\sigma))\}}\big\|_{1,n}\Big]\\&\quad+c\Big(q^*\big(C^{-1}C^{-\beta}\big)\frac{D_n^\infty}{D_n}\Big)\int_0^\sigma\frac{1}{\varepsilon\psi(\varepsilon)}\,d\varepsilon.\end{aligned}\tag{8.81}$$
For the first summand in (8.81), note that by (8.43),
$$\int_0^\sigma\psi(\varepsilon)\sqrt{1\vee H(\varepsilon,\tilde{\mathcal F},V)}\,d\varepsilon\le\sqrt2\int_0^{\sigma/2}\psi(2\varepsilon)\sqrt{1\vee H(\varepsilon,\mathcal F,V)}\,d\varepsilon\le\sqrt2\int_0^{\sigma/2}\psi(\varepsilon)\sqrt{1\vee H(\varepsilon,\mathcal F,V)}\,d\varepsilon.$$
Together with the uniform boundedness of $D_n$, $D_n^\infty$, we obtain that the first summand in (8.81) converges to 0 for $\sigma\to0$ (uniformly in $n$). The third summand in (8.81) converges to 0 for $\sigma\to0$ (uniformly in $n$) since $\int_0^\infty\frac{1}{\varepsilon\psi(\varepsilon)}\,d\varepsilon<\infty$ and by the uniform boundedness of $D_n$, $D_n^\infty$. The second summand in (8.81) converges to 0 for $n\to\infty$ by (8.80).

The following central limit theorem is formulated for a more general structure of $D_{f,n}(\cdot)$ than in Theorem 3.13. We formulate the conditions on $D_{f,n}(\cdot)$ in the following Assumption 8.5.

Assumption 8.5.

For $f\in\mathcal F$, let $D_{f,n}^\infty := \sup_{i=1,\dots,n}D_{f,n}(\frac in)$. There exist a sequence $h_n>0$ and $v\in[0,1]$ such that for all $u\in[0,1]$, $|v-u|>h_n$ implies $D_{f,n}(u) = 0$. For all $f\in\mathcal F$,
$$\sup_{n\in\mathbb N}\big(h_n^{1/2}\cdot D_{f,n}^\infty\big)<\infty,\qquad\sup_{n\in\mathbb N}\frac1n\sum_{i=1}^nD_{f,n}(\tfrac in)^2<\infty,\qquad\frac{D_{f,n}^\infty}{\sqrt n}\to0,$$
and $\frac{D_{f,n}(\cdot)}{D_{f,n}^\infty}$ has bounded variation uniformly in $n$.

We obtain the following central limit theorem.

Theorem 8.6.

Let $\mathcal F$ satisfy Assumptions 3.9, 3.10 and 8.5. Suppose that either Assumption 2.5 or Assumptions 2.8, 4.9 hold. Let $m\in\mathbb N$ and $f_1,\dots,f_m\in\mathcal F$. Suppose that either

• Case $K=1$: The mapping $u\mapsto E\big[E[\bar f_k(\tilde Z_{j_1}(u),u)\mid\mathcal G_0]\cdot E[\bar f_l(\tilde Z_{j_2}(u),u)\mid\mathcal G_0]\big]$ has bounded variation for all $j_1,j_2\in\mathbb N$, $k,l\in\{1,\dots,m\}$, and the limit
$$\Sigma^{(1)}_{kl} := \lim_{n\to\infty}\int_0^1D_{f_k,n}(u)D_{f_l,n}(u)\cdot\sum_{j\in\mathbb Z}\mathrm{Cov}\big(\bar f_k(\tilde Z_0(u),u),\bar f_l(\tilde Z_j(u),u)\big)\,du$$
exists for all $k,l\in\{1,\dots,m\}$, or
• Case $K=2$: $h_n\to0$, and the limit
$$\Sigma^{(2)}_{kl} := \lim_{n\to\infty}\int_0^1D_{f_k,n}(u)D_{f_l,n}(u)\,du\cdot\sum_{j\in\mathbb Z}\mathrm{Cov}\big(\bar f_k(\tilde Z_0(v),v),\bar f_l(\tilde Z_j(v),v)\big)$$
exists for all $k,l\in\{1,\dots,m\}$.

Let $\Sigma^{(K)} = (\Sigma^{(K)}_{kl})_{k,l=1,\dots,m}$. Then
$$\frac{1}{\sqrt n}\sum_{i=1}^n\Big\{\begin{pmatrix}f_1(Z_i,\frac in)\\ \vdots\\ f_m(Z_i,\frac in)\end{pmatrix}-E\begin{pmatrix}f_1(Z_i,\frac in)\\ \vdots\\ f_m(Z_i,\frac in)\end{pmatrix}\Big\}\xrightarrow{d}N\big(0,\Sigma^{(K)}\big).$$

Proof of Theorem 8.6.

Denote $W_i(f) := f(Z_i,\frac in)$ and $W_i := (W_i(f_1),\dots,W_i(f_m))'$. Let $a = (a_1,\dots,a_m)'\in\mathbb R^m\setminus\{0\}$. We use the decomposition
$$\frac{1}{\sqrt n}\sum_{i=1}^na'(W_i-EW_i) = \sum_{j=0}^\infty\frac{1}{\sqrt n}\sum_{i=1}^na'P_{i-j}W_i.$$
For fixed $J\in\mathbb N\cup\{\infty\}$, put
$$(S_n(J)_k)_{k=1,\dots,m} := S_n(J) := \sum_{j=0}^{J-1}\frac{1}{\sqrt n}\sum_{i=1}^nP_{i-j}W_i.$$
Then, since $P_{i-j}W_i(f_k)$, $i=1,\dots,n$, is a martingale difference sequence and by Lemma 8.9(i),
$$\|S_n(\infty)_k-S_n(J)_k\|_2\le\sum_{j=J}^\infty\Big\|\frac{1}{\sqrt n}\sum_{i=1}^nP_{i-j}W_i(f_k)\Big\|_2 = \sum_{j=J}^\infty\Big(\frac1n\sum_{i=1}^n\|P_{i-j}W_i(f_k)\|_2^2\Big)^{1/2}\le\Big(\frac1n\sum_{i=1}^nD_{f_k,n}(\tfrac in)^2\Big)^{1/2}\cdot\sum_{j=J}^\infty\Delta(j),$$
thus
$$\limsup_{J,n\to\infty}\|S_n(\infty)_k-S_n(J)_k\|_2\le\sup_{n\in\mathbb N}\Big(\frac1n\sum_{i=1}^nD_{f_k,n}(\tfrac in)^2\Big)^{1/2}\cdot\limsup_{J\to\infty}\sum_{j=J}^\infty\Delta(j) = 0. \tag{8.82}$$
Define
$$(S_n^\circ(J)_k)_{k=1,\dots,m} := S_n^\circ(J) := \frac{1}{\sqrt n}\sum_{i=1}^{n-J+1}\sum_{j=0}^{J-1}P_iW_{i+j}.$$
Then we have
$$\|S_n^\circ(J)_k-S_n(J)_k\|_2\le\sum_{j=0}^{J-1}\Big\|\frac{1}{\sqrt n}\sum_{i=1}^jP_{i-j}W_i(f_k)\Big\|_2+\frac{1}{\sqrt n}\sum_{j=0}^{J-1}\Big\|\sum_{i=n-J+j+1}^nP_{i-j}W_i(f_k)\Big\|_2\le\frac{J^2}{\sqrt n}\cdot\sup_{i,j}\|P_{i-j}W_i(f_k)\|_2\le\frac{J^2}{\sqrt n}\cdot\sup_i\|f_k(Z_i,\tfrac in)\|_2.$$
By Lemma 8.9(i), $\sup_i\|f_k(Z_i,\frac in)\|_2\le C_\Delta\cdot D_{f_k,n}^\infty$. Together with $D_{f_k,n}^\infty/\sqrt n\to0$ (Assumption 8.5), this gives
$$\lim_{n\to\infty}\|S_n^\circ(J)_k-S_n(J)_k\|_2 = 0. \tag{8.83}$$
Stationary approximation:

Put $\tilde S_n^\circ(J) = (\tilde S_n^\circ(J)_k)_{k=1,\dots,m}$, where
$$\tilde S_n^\circ(J)_k := \frac{1}{\sqrt n}\sum_{i=1}^{n-J+1}\sum_{j=0}^{J-1}P_if_k(\tilde Z_{i+j}(\tfrac in),\tfrac in).$$
Then we have
$$\|S_n^\circ(J)_k-\tilde S_n^\circ(J)_k\|_2\le\sum_{j=0}^{J-1}\Big(\frac1n\sum_{i=1}^{n-J+1}\big\|P_if_k(Z_{i+j},\tfrac{i+j}{n})-P_if_k(\tilde Z_{i+j}(\tfrac in),\tfrac in)\big\|_2^2\Big)^{1/2}.$$
For each $j,k$, it holds that
$$\begin{aligned}\frac1n\sum_{i=1}^{n-J+1}\big\|P_if_k(Z_{i+j},\tfrac{i+j}{n})-P_if_k(\tilde Z_{i+j}(\tfrac in),\tfrac in)\big\|_2^2&\le\frac2n\sum_{i=1}^{n-J+1}\Big(D_{f_k,n}(\tfrac{i+j}{n})-D_{f_k,n}(\tfrac in)\Big)^2\cdot\sup_i\big\|\bar f_k(Z_{i+j},\tfrac{i+j}{n})\big\|_2^2\\&\quad+\frac2n\sum_{i=1}^{n-J+1}D_{f_k,n}(\tfrac in)^2\cdot\sup_i\big\|\bar f_k(Z_{i+j},\tfrac{i+j}{n})-\bar f_k(\tilde Z_{i+j}(\tfrac in),\tfrac in)\big\|_2^2.\end{aligned}$$
By Lemma 8.9, $\sup_i\|\bar f_k(Z_{i+j},\frac{i+j}{n})\|_2<\infty$. Since $n^{-1/2}D_{f_k,n}(\cdot)$ has bounded variation uniformly in $n$,
$$\frac1n\sum_{i=1}^{n-J+1}\Big(D_{f_k,n}(\tfrac{i+j}{n})-D_{f_k,n}(\tfrac in)\Big)^2\le2\sup_{i=1,\dots,n}\frac{D_{f_k,n}(\frac in)}{\sqrt n}\cdot\frac{1}{\sqrt n}\sum_{i=1}^{n-J+1}\Big|D_{f_k,n}(\tfrac{i+j}{n})-D_{f_k,n}(\tfrac in)\Big|\to0.$$
By Lemma 8.9(ii), $\sup_i\|\bar f_k(Z_{i+j},\frac{i+j}{n})-\bar f_k(\tilde Z_{i+j}(\frac in),\frac in)\|_2\to0$. We therefore obtain
$$\|S_n^\circ(J)_k-\tilde S_n^\circ(J)_k\|_2\to0. \tag{8.84}$$
Note that
$$M_{i,k} := \frac{1}{\sqrt n}\sum_{j=0}^{J-1}P_if_k(\tilde Z_{i+j}(\tfrac in),\tfrac in),\qquad i=1,\dots,n,$$
is a martingale difference sequence with respect to $(\mathcal G_i)$, and $\tilde S_n^\circ(J)_k = \sum_{i=1}^{n-J+1}M_{i,k}$. We can therefore apply a central limit theorem for martingale difference sequences to $a'\tilde S_n^\circ(J) = \sum_{i=1}^{n-J+1}\big(\sum_{k=1}^ma_kM_{i,k}\big)$.

The Lindeberg condition:

Let $\varsigma>0$. Iterated application of Lemma 8.7(i) shows that there are constants $c_1,c_2>0$, depending only on $m$ and $J$, such that
$$\sum_{i=1}^{n-J+1}E\Big[\Big(\sum_{k=1}^ma_kM_{i,k}\Big)^2\mathbb 1_{\{|\sum_{k=1}^ma_kM_{i,k}|>\varsigma\}}\Big]\le c_1\sum_{l=0,1}\sum_{j=0}^{J-1}\sum_{k=1}^m|a_k|^2\cdot\frac1n\sum_{i=1}^{n-J}E\Big[E[f_k(\tilde Z_{i+j}(\tfrac in),\tfrac in)\mid\mathcal G_{i-l}]^2\,\mathbb 1_{\{|E[f_k(\tilde Z_{i+j}(\frac in),\frac in)\mid\mathcal G_{i-l}]|>\frac{\sqrt n\varsigma}{c_2|a|_\infty}\}}\Big].$$
For each $l,j,k$, we have
$$\begin{aligned}&\frac1n\sum_{i=1}^{n-J}E\Big[E[f_k(\tilde Z_{i+j}(\tfrac in),\tfrac in)\mid\mathcal G_{i-l}]^2\,\mathbb 1_{\{|E[f_k(\tilde Z_{i+j}(\frac in),\frac in)\mid\mathcal G_{i-l}]|>\frac{\sqrt n\varsigma}{c_2|a|_\infty}\}}\Big]\\&\le\frac1n\sum_{i=1}^{n-J}D_{f_k,n}(\tfrac in)^2\,E\Big[E[\bar f_k(\tilde Z_i(\tfrac in),\tfrac in)\mid\mathcal G_{i-l}]^2\,\mathbb 1_{\{|E[\bar f_k(\tilde Z_i(\frac in),\frac in)\mid\mathcal G_{i-l}]|>\frac{\sqrt n\varsigma}{c_2|a|_\infty\sup_{i=1,\dots,n}|D_{f_k,n}(\frac in)|}\}}\Big]\\&=\frac1n\sum_{i=1}^{n-J}D_{f_k,n}(\tfrac in)^2\,E\big[\tilde W_i(\tfrac in)^2\,\mathbb 1_{\{|\tilde W_i(\frac in)|>c_n\}}\big],\end{aligned}\tag{8.85}$$
where we have put
$$\tilde W_i(u) := E[\bar f_k(\tilde Z_i(u),u)\mid\mathcal G_{i-l}],\qquad c_n := \frac{\sqrt n\,\varsigma}{c_2|a|_\infty\sup_{i=1,\dots,n}|D_{f_k,n}(\frac in)|}.$$
By Lemma 8.9(ii), $\tilde W_i(u)$ satisfies the assumptions (8.89) of Lemma 8.8. By assumption, $c_n\to\infty$. With $a_n(u) := D_{f_k,n}(u)^2$, we obtain from Lemma 8.8(i) that (8.85) converges to 0, which shows that the Lindeberg condition is satisfied.

Convergence of the variance:

We have
$$\sum_{i=1}^{n-J+1}E\Big[\Big(\sum_{k=1}^ma_kM_{i,k}\Big)^2\,\Big|\,\mathcal G_{i-1}\Big] = \sum_{j_1,j_2=0}^{J-1}\sum_{k,l=1}^ma_ka_l\cdot\frac1n\sum_{i=1}^{n-J+1}D_{f_k,n}(\tfrac in)D_{f_l,n}(\tfrac in)\cdot E\big[P_i\bar f_k(\tilde Z_{i+j_1}(\tfrac in),\tfrac in)\cdot P_i\bar f_l(\tilde Z_{i+j_2}(\tfrac in),\tfrac in)\,\big|\,\mathcal G_{i-1}\big].$$
For fixed $j_1,j_2,k,l$, we define
$$\tilde W_i(u) := E\big[P_i\bar f_k(\tilde Z_{i+j_1}(u),u)\cdot P_i\bar f_l(\tilde Z_{i+j_2}(u),u)\,\big|\,\mathcal G_{i-1}\big],\qquad a_n(u) := D_{f_k,n}(u)D_{f_l,n}(u).$$
Then
$$\frac1n\sum_{i=1}^{n-J+1}D_{f_k,n}(\tfrac in)D_{f_l,n}(\tfrac in)\cdot E\big[P_i\bar f_k(\tilde Z_{i+j_1}(\tfrac in),\tfrac in)\cdot P_i\bar f_l(\tilde Z_{i+j_2}(\tfrac in),\tfrac in)\,\big|\,\mathcal G_{i-1}\big] = \frac1n\sum_{i=1}^{n-J+1}a_n(\tfrac in)\tilde W_i(\tfrac in).$$
By Lemma 8.9(i),(ii), we have
$$\|\tilde W_0(u)-\tilde W_0(v)\|_1\le\|\bar f_k(\tilde Z_{j_1}(u),u)-\bar f_k(\tilde Z_{j_1}(v),v)\|_2\cdot\|\bar f_l(\tilde Z_{j_2}(u),u)\|_2+\|\bar f_l(\tilde Z_{j_2}(u),u)-\bar f_l(\tilde Z_{j_2}(v),v)\|_2\cdot\|\bar f_k(\tilde Z_{j_1}(v),v)\|_2\le4C_{cont}C_{\bar f}\cdot|u-v|^{\varsigma s/2}.$$
Let $A_n := \sup_{i=1,\dots,n}|a_n(\frac in)|$. Since $D_{f,n}(\cdot)/D_{f,n}^\infty$ has bounded variation uniformly in $n$, it follows that $a_n(\cdot)/A_n$ has bounded variation uniformly in $n$. From $D_{f,n}^\infty/\sqrt n\to0$, we obtain $A_n/n\to0$, and
$$\sup_n\Big[\frac1n\sum_{i=1}^n|a_n(\tfrac in)|\Big]\le\sup_n\Big(\frac1n\sum_{i=1}^nD_{f_k,n}(\tfrac in)^2\Big)^{1/2}\cdot\Big(\frac1n\sum_{i=1}^nD_{f_l,n}(\tfrac in)^2\Big)^{1/2}<\infty.$$
It holds that $\sup_n(h_n\cdot A_n)\le\sup_n(h_n^{1/2}D_{f_k,n}^\infty)\cdot\sup_n(h_n^{1/2}D_{f_l,n}^\infty)<\infty$, and
$$|v-u|>h_n\ \Rightarrow\ D_{f_k,n}(u) = 0,\ D_{f_l,n}(u) = 0\ \Rightarrow\ a_n(u) = 0.$$
Thus, Lemma 8.8(ii) is applicable.

Case $K=1$: If $u\mapsto E[P_0\bar f_k(\tilde Z_{j_1}(u),u)\cdot P_0\bar f_l(\tilde Z_{j_2}(u),u)]$ has bounded variation, we have
$$\frac1n\sum_{i=1}^{n-J+1}D_{f_k,n}(\tfrac in)D_{f_l,n}(\tfrac in)\cdot E\big[P_i\bar f_k(\tilde Z_{i+j_1}(\tfrac in),\tfrac in)\cdot P_i\bar f_l(\tilde Z_{i+j_2}(\tfrac in),\tfrac in)\,\big|\,\mathcal G_{i-1}\big]\xrightarrow{p}\lim_{n\to\infty}\int_0^1D_{f_k,n}(u)D_{f_l,n}(u)\cdot E\big[P_0\bar f_k(\tilde Z_{j_1}(u),u)\cdot P_0\bar f_l(\tilde Z_{j_2}(u),u)\big]\,du.$$
and thus
$$\sum_{i=1}^{n-J+1}E\Big[\Big(\sum_{k=1}^ma_kM_{i,k}\Big)^2\,\Big|\,\mathcal G_{i-1}\Big]\xrightarrow{p}\sum_{k,l=1}^ma_ka_l\cdot\lim_{n\to\infty}\int_0^1D_{f_k,n}(u)D_{f_l,n}(u)\cdot\sum_{j_1,j_2=0}^{J-1}E\big[P_0\bar f_k(\tilde Z_{j_1}(u),u)\cdot P_0\bar f_l(\tilde Z_{j_2}(u),u)\big]\,du = a'\Sigma^{(1)}(J)a.$$
For $f,g\in\mathcal F$, the expression $E[P_0\bar f(\tilde Z_{j_1}(u),u)\cdot P_0\bar g(\tilde Z_{j_2}(u),u)]$ can be written as
$$E\big[P_0\bar f(\tilde Z_{j_1}(u),u)\cdot P_0\bar g(\tilde Z_{j_2}(u),u)\big] = E\big[E[\bar f(\tilde Z_{j_1}(u),u)\mid\mathcal G_0]\cdot E[\bar g(\tilde Z_{j_2}(u),u)\mid\mathcal G_0]\big]-E\big[E[\bar f(\tilde Z_{j_1}(u),u)\mid\mathcal G_{-1}]\cdot E[\bar g(\tilde Z_{j_2}(u),u)\mid\mathcal G_{-1}]\big],$$
which shows that the condition stated in the assumption guarantees the bounded variation of $u\mapsto E[P_0\bar f(\tilde Z_{j_1}(u),u)\cdot P_0\bar g(\tilde Z_{j_2}(u),u)]$.

Case $K=2$: If $h_n\to0$, then we obtain similarly
$$\sum_{i=1}^{n-J+1}E\Big[\Big(\sum_{k=1}^ma_kM_{i,k}\Big)^2\,\Big|\,\mathcal G_{i-1}\Big]\xrightarrow{p}\sum_{k,l=1}^ma_ka_l\cdot\lim_{n\to\infty}\int_0^1D_{f_k,n}(u)D_{f_l,n}(u)\,du\cdot\sum_{j_1,j_2=0}^{J-1}E\big[P_0\bar f_k(\tilde Z_{j_1}(v),v)\cdot P_0\bar f_l(\tilde Z_{j_2}(v),v)\big] = a'\Sigma^{(2)}(J)a.$$
By the martingale central limit theorem and (8.83), (8.84), we obtain that
$$a'S_n(J)\xrightarrow{d}N\big(0,a'\Sigma^{(K)}(J)a\big). \tag{8.86}$$
Conclusion:

For $K\in\{1,2\}$, we have
$$a'\Sigma^{(K)}(J)a\to a'\Sigma^{(K)}(\infty)a\qquad(J\to\infty) \tag{8.87}$$
due to
$$\sum_{j_1,j_2:\max\{j_1,j_2\}\ge J}\big\|P_0\bar f_k(\tilde Z_{j_1}(u),u)\cdot P_0\bar f_l(\tilde Z_{j_2}(u),u)\big\|_1\le\sum_{j_1,j_2:\max\{j_1,j_2\}\ge J}\big\|P_0\bar f_k(\tilde Z_{j_1}(u),u)\big\|_2\big\|P_0\bar f_l(\tilde Z_{j_2}(u),u)\big\|_2\to0\qquad(J\to\infty)$$
uniformly in $n$, and
$$\sup_n\int_0^1|D_{f_k,n}(u)D_{f_l,n}(u)|\,du\le\sup_n\Big(\int_0^1D_{f_k,n}(u)^2\,du\Big)^{1/2}\Big(\int_0^1D_{f_l,n}(u)^2\,du\Big)^{1/2}<\infty.$$
The assertion of the theorem now follows from (8.82), (8.86), (8.87), the identity
$$\sum_{j\in\mathbb Z}\mathrm{Cov}\big(\bar f_k(\tilde Z_0(u),u),\bar f_l(\tilde Z_j(u),u)\big) = \sum_{j_1,j_2=0}^\infty E\big[P_0\bar f_k(\tilde Z_{j_1}(u),u)\cdot P_0\bar f_l(\tilde Z_{j_2}(u),u)\big]$$
and the Cramér-Wold device.

Lemma 8.7.

Let $c\in\mathbb R$, $c>0$.
(i) For $x,y\in\mathbb R$, it holds that $(x+y)^2\mathbb 1_{\{|x+y|>c\}}\le8x^2\mathbb 1_{\{|x|>c/2\}}+8y^2\mathbb 1_{\{|y|>c/2\}}$.
(ii) For random variables $W,\tilde W$, it holds that $E[W^2\mathbb 1_{\{|W|>c\}}]\le4E[(W-\tilde W)^2]+4E[\tilde W^2\mathbb 1_{\{|\tilde W|>c/2\}}]$.

Proof of Lemma 8.7. (i) It holds that
$$\begin{aligned}(x+y)^2\mathbb 1_{\{|x+y|>c\}}&\le2\big[x^2+y^2\big]\mathbb 1_{\{|x|>c/2\text{ or }|y|>c/2\}}\\&\le2\big[x^2+y^2\big]\big\{\mathbb 1_{\{|x|>c/2,\,|y|>c/2\}}+\mathbb 1_{\{|x|>c/2,\,|y|\le c/2\}}+\mathbb 1_{\{|x|\le c/2,\,|y|>c/2\}}\big\}\\&\le2\big[x^2\mathbb 1_{\{|x|>c/2\}}+y^2\mathbb 1_{\{|y|>c/2\}}\big]+4x^2\mathbb 1_{\{|x|>c/2\}}+4y^2\mathbb 1_{\{|y|>c/2\}}\\&\le8x^2\mathbb 1_{\{|x|>c/2\}}+8y^2\mathbb 1_{\{|y|>c/2\}}.\end{aligned}$$
(ii) We have
$$E[W^2\mathbb 1_{\{|W|>c\}}]\le2E[(|W|-|\tilde W|)^2\mathbb 1_{\{|W|>c\}}]+2E[\tilde W^2\mathbb 1_{\{|W|>c\}}]\le2E[(W-\tilde W)^2]+2E[\tilde W^2\mathbb 1_{\{|W-\tilde W|+|\tilde W|>c\}}]. \tag{8.88}$$
Furthermore, with Markov's inequality,
$$\begin{aligned}E[\tilde W^2\mathbb 1_{\{|W-\tilde W|+|\tilde W|>c\}}]&\le E[\tilde W^2\mathbb 1_{\{|W-\tilde W|>c/2\}}]+E[\tilde W^2\mathbb 1_{\{|\tilde W|>c/2\}}]\\&\le(c/2)^2P(|W-\tilde W|>c/2)+E[\tilde W^2\mathbb 1_{\{|W-\tilde W|>c/2\}}\mathbb 1_{\{|\tilde W|>c/2\}}]+E[\tilde W^2\mathbb 1_{\{|\tilde W|>c/2\}}]\\&\le E[(W-\tilde W)^2]+2E[\tilde W^2\mathbb 1_{\{|\tilde W|>c/2\}}].\end{aligned}$$
Inserting this inequality into (8.88), we obtain the assertion.

The following lemma generalizes some results from [4] using similar techniques as therein.
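Both parts of Lemma 8.7 are elementary and can be spot-checked numerically. The Python sketch below tests (i) on a grid. It tests (ii) under the empirical distribution of a simulated sample of pairs $(W,\tilde W)$: the lemma holds for every probability measure, so it must in particular hold for an empirical one. This is a sanity check and not a proof. The sampling design (Gaussian $\tilde W$, $W$ a noisy copy) is an arbitrary illustration.

```python
import random

def ind(cond):
    """Indicator as a float."""
    return 1.0 if cond else 0.0

def lemma_87_i(x, y, c, tol=1e-12):
    lhs = (x + y) ** 2 * ind(abs(x + y) > c)
    rhs = 8 * x ** 2 * ind(abs(x) > c / 2) + 8 * y ** 2 * ind(abs(y) > c / 2)
    return lhs <= rhs + tol

def lemma_87_ii(pairs, c, tol=1e-12):
    """Expectations are taken under the empirical measure of pairs = [(W, W_tilde), ...]."""
    n = len(pairs)
    lhs = sum(w ** 2 * ind(abs(w) > c) for w, _ in pairs) / n
    diff = sum((w - wt) ** 2 for w, wt in pairs) / n
    tail = sum(wt ** 2 * ind(abs(wt) > c / 2) for _, wt in pairs) / n
    return lhs <= 4 * diff + 4 * tail + tol

grid = [k / 4 for k in range(-20, 21)]
assert all(lemma_87_i(x, y, c) for x in grid for y in grid for c in (0.5, 1.0, 3.0))

rng = random.Random(0)
pairs = []
for _ in range(5000):
    wt = rng.gauss(0.0, 1.0)
    pairs.append((wt + rng.gauss(0.0, 0.3), wt))  # W is a noisy version of W_tilde
assert all(lemma_87_ii(pairs, c) for c in (0.5, 1.0, 2.0, 3.0))
print("Lemma 8.7 checks passed")
```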

Lemma 8.8.

Let $q\in\{1,2\}$. Let $\tilde W_i(u)$ be a stationary sequence with
$$\sup_{u\in[0,1]}\|\tilde W_0(u)\|_q<\infty,\qquad\|\tilde W_0(u)-\tilde W_0(v)\|_q\le C_W|u-v|^\varsigma. \tag{8.89}$$
Let $a_n:[0,1]\to\mathbb R$ be some sequence of functions with $\limsup_{n\to\infty}\frac1n\sum_{i=1}^n|a_n(\frac in)|<\infty$.
(i) Let $q=2$. Let $c_n$ be some sequence with $c_n\to\infty$. Then
$$\frac1n\sum_{i=1}^n|a_n(\tfrac in)|\cdot E\big[\tilde W_i(\tfrac in)^2\mathbb 1_{\{|\tilde W_i(\frac in)|>c_n\}}\big]\to0.$$
(ii) Let $q=1$. Suppose that there exist $h_n>0$ and $v\in[0,1]$ such that for all $u\in[0,1]$, $|v-u|>h_n$ implies $a_n(u)=0$. Put $A_n = \sup_{i=1,\dots,n}|a_n(\frac in)|$ and suppose that
$$\sup_{n\in\mathbb N}(h_n\cdot A_n)<\infty,\qquad\frac{A_n}{n}\to0,\qquad\frac{a_n(\cdot)}{A_n}\text{ has bounded variation uniformly in }n.$$
Suppose that the limits on the following right-hand sides exist. If $u\mapsto E\tilde W_0(u)$ has bounded variation, then
$$\frac1n\sum_{i=1}^na_n(\tfrac in)\tilde W_i(\tfrac in)\xrightarrow{p}\lim_{n\to\infty}\int_0^1a_n(u)\,E\tilde W_0(u)\,du.$$
If $h_n\to0$, then
$$\frac1n\sum_{i=1}^na_n(\tfrac in)\tilde W_i(\tfrac in)\xrightarrow{p}\lim_{n\to\infty}\int_0^1a_n(u)\,du\cdot E\tilde W_0(v).$$

Proof of Lemma 8.8.

Let $J\in\mathbb N$ be fixed and assume that $n\ge2\cdot2^J$. For $j\in\{1,\dots,2^J\}$, define $I_{j,J,n} := \{i\in\{1,\dots,n\} : \frac in\in(\frac{j-1}{2^J},\frac{j}{2^J}]\}$. Then $(I_{j,J,n})_j$ forms a decomposition of $\{1,\dots,n\}$ in the sense that $\dot\bigcup_{j=1}^{2^J}I_{j,J,n} = \{1,\dots,n\}$. Since
$$\frac in\in\Big(\frac{j-1}{2^J},\frac{j}{2^J}\Big]\iff\frac{j-1}{2^J}\cdot n<i\le n\cdot\frac{j}{2^J},$$
we conclude that $\frac{n}{2^J}-1\le|I_{j,J,n}|\le\frac{n}{2^J}+1$. Thus, since $n\ge2\cdot2^J$,
$$\Big|\frac{|I_{j,J,n}|}{n}-\frac{1}{2^J}\Big|\le\frac1n,\qquad|I_{j,J,n}|\ge\frac{n}{2\cdot2^J}. \tag{8.90}$$
Let $w_i$, $i\in\mathbb N$, be an arbitrary sequence. Then it holds that
$$\Big|\frac1n\sum_{i=1}^nw_i-\frac{1}{2^J}\sum_{j=1}^{2^J}\frac{1}{|I_{j,J,n}|}\sum_{i\in I_{j,J,n}}w_i\Big|\le\sum_{j=1}^{2^J}\Big|\frac{|I_{j,J,n}|}{n}-\frac{1}{2^J}\Big|\cdot\Big|\frac{1}{|I_{j,J,n}|}\sum_{i\in I_{j,J,n}}w_i\Big|\le\frac1n\sum_{j=1}^{2^J}\frac{1}{|I_{j,J,n}|}\sum_{i\in I_{j,J,n}}|w_i|\le\frac{2\cdot2^J}{n}\cdot\frac1n\sum_{i=1}^n|w_i|. \tag{8.91}$$
(i) Application of (8.91) to $w_i = |a_n(\frac in)|\,E[\tilde W_i(\frac in)^2\mathbb 1_{\{|\tilde W_i(\frac in)|>c_n\}}]$ yields
$$\frac1n\sum_{i=1}^n|a_n(\tfrac in)|\,E\big[\tilde W_i(\tfrac in)^2\mathbb 1_{\{|\tilde W_i(\frac in)|>c_n\}}\big]\le\frac{1}{2^J}\sum_{j=1}^{2^J}\frac{1}{|I_{j,J,n}|}\sum_{i\in I_{j,J,n}}|a_n(\tfrac in)|\,E\big[\tilde W_i(\tfrac in)^2\mathbb 1_{\{|\tilde W_i(\frac in)|>c_n\}}\big]+\frac{2\cdot2^J}{n}\cdot\frac1n\sum_{i=1}^n|a_n(\tfrac in)|\cdot\sup_u\|\tilde W_0(u)\|_2^2. \tag{8.92}$$
By Lemma 8.7(ii),
$$\begin{aligned}&\frac{1}{2^J}\sum_{j=1}^{2^J}\frac{1}{|I_{j,J,n}|}\sum_{i\in I_{j,J,n}}|a_n(\tfrac in)|\cdot E\big[\tilde W_i(\tfrac in)^2\mathbb 1_{\{|\tilde W_i(\frac in)|>c_n\}}\big]\\&\le\frac{4}{2^J}\sum_{j=1}^{2^J}\frac{1}{|I_{j,J,n}|}\sum_{i\in I_{j,J,n}}|a_n(\tfrac in)|\cdot E\big[\tilde W_0(\tfrac{j}{2^J})^2\mathbb 1_{\{|\tilde W_0(\frac{j}{2^J})|>c_n/2\}}\big]+\frac{4}{2^J}\sum_{j=1}^{2^J}\frac{1}{|I_{j,J,n}|}\sum_{i\in I_{j,J,n}}|a_n(\tfrac in)|\cdot\big\|\tilde W_0(\tfrac in)-\tilde W_0(\tfrac{j}{2^J})\big\|_2^2\\&\le4\Big[\sup_{j=1,\dots,2^J}E\big[\tilde W_0(\tfrac{j}{2^J})^2\mathbb 1_{\{|\tilde W_0(\frac{j}{2^J})|>c_n/2\}}\big]+C_W^2\,2^{-2J\varsigma}\Big]\cdot\frac{1}{2^J}\sum_{j=1}^{2^J}\frac{1}{|I_{j,J,n}|}\sum_{i\in I_{j,J,n}}|a_n(\tfrac in)|.\end{aligned}\tag{8.93}$$
By (8.90),
$$\frac{1}{2^J}\sum_{j=1}^{2^J}\frac{1}{|I_{j,J,n}|}\sum_{i\in I_{j,J,n}}|a_n(\tfrac in)|\le\frac2n\sum_{i=1}^n|a_n(\tfrac in)|.$$
By the dominated convergence theorem, $\limsup_{n\to\infty}E[\tilde W_0(\frac{j}{2^J})^2\mathbb 1_{\{|\tilde W_0(\frac{j}{2^J})|>c_n/2\}}] = 0$. Furthermore, $\limsup_{n\to\infty}\frac{2\cdot2^J}{n}\cdot\sup_u\|\tilde W_0(u)\|_2^2 = 0$.
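The block decomposition can be made concrete. The Python sketch below (function names are illustrative) builds $I_{j,J,n}$ for $j = 1,\dots,2^J$ using integer arithmetic. For a deterministic test sequence $w_i$, it checks that the blocks partition $\{1,\dots,n\}$ and that (8.90) and (8.91) hold.

```python
def blocks(n, J):
    """I_{j,J,n} = {i in 1..n : i/n in ((j-1)/2^J, j/2^J]} for j = 1, ..., 2^J."""
    B = 2 ** J
    out = [[] for _ in range(B)]
    for i in range(1, n + 1):
        j = (i * B + n - 1) // n  # ceil(i * 2^J / n): the unique j with i/n in ((j-1)/2^J, j/2^J]
        out[j - 1].append(i)
    return out

def check_block_bounds(n, J, w):
    """Check (8.90) and (8.91) for w = (w_1, ..., w_n); requires n >= 2 * 2^J."""
    B = 2 ** J
    assert n >= 2 * B
    I = blocks(n, J)
    assert sorted(i for b in I for i in b) == list(range(1, n + 1))  # disjoint cover of 1..n
    for b in I:
        assert abs(len(b) / n - 1 / B) <= 1 / n + 1e-12  # first part of (8.90)
        assert len(b) >= n / (2 * B)                     # second part of (8.90)
    overall = sum(w) / n
    blockwise = sum(sum(w[i - 1] for i in b) / len(b) for b in I) / B
    bound = (2 * B / n) * sum(abs(x) for x in w) / n     # right-hand side of (8.91)
    assert abs(overall - blockwise) <= bound + 1e-12
    return abs(overall - blockwise), bound

for n in (64, 100, 257, 1000):
    for J in range(1, 5):
        if n >= 2 * 2 ** J:
            w = [((7 * i) % 11) - 5 for i in range(1, n + 1)]
            check_block_bounds(n, J, w)
print("block bounds hold")
```

Integer ceiling division avoids the floating-point rounding that could otherwise place an index $i$ with $i\cdot2^J/n$ exactly integral into the wrong block.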
Inserting (8.93) into (8.92) and applying $\limsup_{n\to\infty}$ and afterwards $\limsup_{J\to\infty}$ yields the assertion.

(ii) Since (8.89) also holds for $\tilde W_0(u)$ replaced by $\tilde W_0(u)-E\tilde W_0(u)$, we may assume in the following w.l.o.g. that $E\tilde W_0(u) = 0$. By (8.91) applied to $w_i = a_n(\frac in)\tilde W_i(\frac in)$, we obtain
$$\Big\|\frac1n\sum_{i=1}^na_n(\tfrac in)\tilde W_i(\tfrac in)-\frac{1}{2^J}\sum_{j=1}^{2^J}\frac{1}{|I_{j,J,n}|}\sum_{i\in I_{j,J,n}}a_n(\tfrac in)\tilde W_i(\tfrac in)\Big\|_1\le\frac{2\cdot2^J}{n}\cdot\frac1n\sum_{i=1}^n|a_n(\tfrac in)|\cdot\sup_u\|\tilde W_0(u)\|_1\to0\qquad(n\to\infty). \tag{8.94}$$
We furthermore have
$$\Big\|\frac{1}{2^J}\sum_{j=1}^{2^J}\frac{1}{|I_{j,J,n}|}\sum_{i\in I_{j,J,n}}a_n(\tfrac in)\tilde W_i(\tfrac in)-\frac{1}{2^J}\sum_{j=1}^{2^J}\frac{1}{|I_{j,J,n}|}\sum_{i\in I_{j,J,n}}a_n(\tfrac in)\tilde W_i(\tfrac{j-1}{2^J})\Big\|_1\le\frac{1}{2^J}\sum_{j=1}^{2^J}\frac{1}{|I_{j,J,n}|}\sum_{i\in I_{j,J,n}}|a_n(\tfrac in)|\cdot\big\|\tilde W_0(\tfrac in)-\tilde W_0(\tfrac{j-1}{2^J})\big\|_1\le\frac2n\sum_{i=1}^n|a_n(\tfrac in)|\cdot C_W(2^{-J})^\varsigma. \tag{8.95}$$
Fix $j\in\{1,\dots,2^J\}$. Put $u_j := \frac{j-1}{2^J}$ and, for a real-valued positive $x$, define $[x] := \min\{k\in\mathbb N : k>x\}$. By stationarity, the following equality holds in distribution:
$$\frac{1}{|I_{j,J,n}|}\sum_{i\in I_{j,J,n}}a_n(\tfrac in)\tilde W_i(u_j)\overset{d}{=}\frac{1}{|I_{j,J,n}|}\sum_{i=1}^{|I_{j,J,n}|}a_n\big(\tfrac in+\tfrac{[u_jn]-1}{n}\big)\tilde W_i(u_j). \tag{8.96}$$
Put $\tilde W_i(u)^\circ := \tilde W_i(u)\,\mathbb 1_{\{\frac in+\frac{[u_jn]-1}{n}\in[v-h_n,v+h_n]\}}$. Since $a_n$ vanishes outside $[v-h_n,v+h_n]$, we may replace $\tilde W_i(u_j)$ by $\tilde W_i(u_j)^\circ$ on the right-hand side of (8.96). By partial summation and since $a_n(\cdot)/A_n$ has bounded variation $B_a$ uniformly in $n$,
$$\begin{aligned}&\Big|\frac{1}{|I_{j,J,n}|}\sum_{i=1}^{|I_{j,J,n}|}a_n\big(\tfrac in+\tfrac{[u_jn]-1}{n}\big)\tilde W_i(u_j)\Big|\\&=\Big|\frac{1}{|I_{j,J,n}|}\sum_{i=1}^{|I_{j,J,n}|-1}\Big\{a_n\big(\tfrac in+\tfrac{[u_jn]-1}{n}\big)-a_n\big(\tfrac{i+1}{n}+\tfrac{[u_jn]-1}{n}\big)\Big\}\sum_{l=1}^i\tilde W_l(u_j)^\circ+\frac{1}{|I_{j,J,n}|}a_n\big(\tfrac{|I_{j,J,n}|}{n}+\tfrac{[u_jn]-1}{n}\big)\sum_{l=1}^{|I_{j,J,n}|}\tilde W_l(u_j)^\circ\Big|\\&\le\frac{(B_a+1)A_n}{|I_{j,J,n}|}\cdot\sup_{i=1,\dots,|I_{j,J,n}|}\Big|\sum_{l=1}^i\tilde W_l(u_j)^\circ\Big|.\end{aligned}\tag{8.97}$$
We have
$$\sup_{i=1,\dots,|I_{j,J,n}|}\Big|\sum_{l=1}^i\tilde W_l(u_j)^\circ\Big| = \sup_{i=1,\dots,|I_{j,J,n}|}\Big|\sum_{l=1\vee(\lceil n(v-h_n)\rceil-[u_jn]+1)}^{i\wedge(\lfloor n(v+h_n)\rfloor-[u_jn]+1)}\tilde W_l(u_j)\Big|.$$
By stationarity, this has the same distribution as a maximum of partial sums $\sup_{i}|\sum_{l=1}^i\tilde W_l(u_j)|$ over at most $m_n$ indices, since $(|I_{j,J,n}|\wedge(\lfloor n(v+h_n)\rfloor-[u_jn]+1))-(1\vee(\lceil n(v-h_n)\rceil-[u_jn]+1))\le m_n := 2nh_n$. In particular, it is stochastically bounded by $\sup_{i=1,\dots,m_n}|\sum_{l=1}^i\tilde W_l(u_j)|$. By assumption, $m_n = 2\cdot\frac{n}{A_n}\cdot A_nh_n\to\infty$.

By the ergodic theorem, $\lim_{m\to\infty}|\frac1m\sum_{l=1}^m\tilde W_l(u_j)| = 0$ a.s.; in particular, $(\frac1m\sum_{l=1}^m\tilde W_l(u_j))_m$ is bounded a.s. We conclude that
$$\frac{1}{m_n}\sup_{i=1,\dots,m_n}\Big|\sum_{l=1}^i\tilde W_l(u_j)\Big|\le\frac{1}{\sqrt{m_n}}\sup_{i=1,\dots,\sqrt{m_n}}\Big|\frac1i\sum_{l=1}^i\tilde W_l(u_j)\Big|+\sup_{i=\sqrt{m_n}+1,\dots,m_n}\Big|\frac1i\sum_{l=1}^i\tilde W_l(u_j)\Big|\to0\quad\text{a.s.}$$
We conclude from (8.97) and (8.90) that
$$\Big|\frac{1}{|I_{j,J,n}|}\sum_{i=1}^{|I_{j,J,n}|}a_n\big(\tfrac in+\tfrac{[u_jn]-1}{n}\big)\tilde W_i(u_j)\Big|\le2\cdot2^J(B_a+1)\cdot\frac{A_nm_n}{n}\cdot\frac{1}{m_n}\sup_{i=1,\dots,|I_{j,J,n}|}\Big|\sum_{l=1}^i\tilde W_l(u_j)^\circ\Big|\xrightarrow{p}0. \tag{8.98}$$
Combining (8.94), (8.95), (8.96) and (8.98) and applying $\limsup_{n\to\infty}$ and afterwards $\limsup_{J\to\infty}$, we obtain
$$\frac1n\sum_{i=1}^na_n(\tfrac in)\big\{\tilde W_i(\tfrac in)-E\tilde W_0(\tfrac in)\big\}\xrightarrow{p}0.$$
If $u\mapsto E\tilde W_0(u)$ has bounded variation, we have with some intermediate value $\xi_{i,n}\in[\frac{i-1}{n},\frac in]$,
$$\Big|\frac1n\sum_{i=1}^na_n(\tfrac in)E\tilde W_0(\tfrac in)-\int_0^1a_n(u)E\tilde W_0(u)\,du\Big|\le\frac1n\sum_{i=1}^n\big|a_n(\tfrac in)E\tilde W_0(\tfrac in)-a_n(\xi_{i,n})E\tilde W_0(\xi_{i,n})\big|\le\frac{A_n}{n}\sum_{i=1}^n\Big|\frac{a_n(\frac in)}{A_n}-\frac{a_n(\xi_{i,n})}{A_n}\Big|\cdot\sup_u\|\tilde W_0(u)\|_1+\frac{A_n}{n}\sum_{i=1}^n\big|E\tilde W_0(\tfrac in)-E\tilde W_0(\xi_{i,n})\big|\to0.$$
If $h_n\to0$, we have
$$\Big|\frac1n\sum_{i=1}^na_n(\tfrac in)E\tilde W_0(\tfrac in)-\frac1n\sum_{i=1}^na_n(\tfrac in)E\tilde W_0(v)\Big|\le\frac1n\sum_{i=1}^n|a_n(\tfrac in)|\cdot\sup_{|u-v|\le h_n}\|\tilde W_0(u)-\tilde W_0(v)\|_1\to0.$$
Since $a_n(\cdot)/A_n$ has bounded variation uniformly in $n$, we have with some intermediate value $\xi_{i,n}\in[\frac{i-1}{n},\frac in]$,
$$\Big|\frac1n\sum_{i=1}^na_n(\tfrac in)-\int_0^1a_n(u)\,du\Big|\le\frac{A_n}{n}\sum_{i=1}^n\Big|\frac{a_n(\frac in)}{A_n}-\frac{a_n(\xi_{i,n})}{A_n}\Big|\to0.$$

Lemma 8.9.

Let $\mathcal F$ satisfy Assumptions 3.9, 3.10. Suppose that either Assumption 2.5 or Assumptions 2.8, 4.9 hold. Then there exist constants $C_{cont}>0$, $C_{\bar f}>0$ such that for any $f\in\mathcal F$,
(i) for any $j\ge0$,
$$\|P_{i-j}f(Z_i,u)\|_2\le D_{f,n}(u)\Delta(j),\qquad\sup_{i=1,\dots,n}\|f(Z_i,u)\|_2\le C_\Delta\cdot D_{f,n}(u),$$
$$\sup_{i,u}\|\bar f(Z_i,u)\|_2\le C_{\bar f},\qquad\sup_{v,u}\|\bar f(\tilde Z_0(v),u)\|_2\le C_{\bar f};$$
(ii) with $x=\frac12$,
$$\|\bar f(Z_i,u)-\bar f(\tilde Z_i(\tfrac in),u)\|_2\le C_{cont}\cdot n^{-\varsigma sx}, \tag{8.99}$$
$$\|\bar f(\tilde Z_i(v_1),u_1)-\bar f(\tilde Z_i(v_2),u_2)\|_2\le C_{cont}\cdot\big(|v_1-v_2|^{\varsigma sx}+|u_1-u_2|^{\varsigma s}\big). \tag{8.100}$$
In the case that Assumption 2.5 is fulfilled, we can choose $x=1$.

Proof of Lemma 8.9. (i) If Assumption 2.5 is satisfied, we have by Lemma 2.7 that
$$\|P_{i-j}f(Z_i,u)\|_2\le\|f(Z_i,u)-f(Z_i^{*(i-j)},u)\|_2 = \delta_2^{f(Z,u)}(j)\le D_{f,n}(u)\Delta(j).$$
If Assumption 2.8 is satisfied, we have by Lemma 8.1 that
$$\|P_{i-j}f(Z_i,u)\|_2 = \|P_{i-j}E[f(Z_i,u)\mid\mathcal G_{i-1}]\|_2\le\big\|E[f(Z_i,u)\mid\mathcal G_{i-1}]-E[f(Z_i,u)\mid\mathcal G_{i-1}]^{*(i-j)}\big\|_2\le D_{f,n}(u)\Delta(j).$$
The second assertion follows from Lemma 2.7 or Lemma 8.1, depending on whether Assumption 2.5 or 2.8 is satisfied.

(ii) Put $\bar C_R := \sup_{v,u}\|\bar R(\tilde Z_0(v),u)\|_2$ and $C_R := \max\{\sup_{i,u}\|R(Z_i,u)\|_p,\ \sup_{u,v}\|R(\tilde Z_0(v),u)\|_p\}$. We first use Assumption 3.10 and Hölder's inequality to obtain
$$\|\bar f(\tilde Z_i(v),u_1)-\bar f(\tilde Z_i(v),u_2)\|_2\le|u_1-u_2|^\varsigma\cdot\big(\|\bar R(\tilde Z_i(v),u_1)\|_2+\|R(\tilde Z_i(v),u_2)\|_2\big) \tag{8.101}$$
$$\le(\bar C_R+C_R)\,|u_1-u_2|^\varsigma. \tag{8.102}$$
Let Assumption 2.5 hold. Then
$$\|\bar f(Z_i,u)-\bar f(\tilde Z_i(v),u)\|_2\le\big\||Z_i-\tilde Z_i(v)|^s_{L_{\mathcal F},s}\big(R(Z_i,u)+R(\tilde Z_i(v),u)\big)\big\|_2\le\big\||Z_i-\tilde Z_i(v)|^s_{L_{\mathcal F},s}\big\|_{\frac{2p}{p-2}}\big(\|R(Z_i,u)\|_p+\|R(\tilde Z_i(v),u)\|_p\big)\le2C_R\big\||Z_i-\tilde Z_i(v)|^s_{L_{\mathcal F},s}\big\|_{\frac{2p}{p-2}}.$$
Furthermore,
$$\big\||Z_i-\tilde Z_i(v)|^s_{L_{\mathcal F},s}\big\|_{\frac{2p}{p-2}}\le\sum_{l=0}^\infty L_{\mathcal F,l}\big\||X_{i-l}-\tilde X_{i-l}(v)|^s\big\|_{\frac{2p}{p-2}} = \sum_{l=0}^iL_{\mathcal F,l}\|X_{i-l}-\tilde X_{i-l}(v)\|^s_{\frac{2ps}{p-2}}\le\sum_{l=0}^iL_{\mathcal F,l}C_X^s\big(|v-\tfrac in|^\varsigma+l^\varsigma n^{-\varsigma}\big)^s\le|v-\tfrac in|^{\varsigma s}\cdot C_X^s|L_{\mathcal F}|_1+n^{-\varsigma s}\cdot C_X^s\sum_{l=0}^\infty L_{\mathcal F,l}\,l^{\varsigma s}.$$
We obtain with $C_{cont} := 2\bar C_R+2C_RC_X^s\{|L_{\mathcal F}|_1+\sum_{j=0}^\infty L_{\mathcal F,j}j^{\varsigma s}\}$ that
$$\|\bar f(Z_i,u)-\bar f(\tilde Z_i(v),u)\|_2\le C_{cont}\cdot\big[|v-\tfrac in|^{\varsigma s}+n^{-\varsigma s}\big]. \tag{8.103}$$
Furthermore, as above,
$$\|\bar f(\tilde Z_i(v_1),u)-\bar f(\tilde Z_i(v_2),u)\|_2\le2C_R\big\||\tilde Z_0(v_1)-\tilde Z_0(v_2)|^s_{L_{\mathcal F},s}\big\|_{\frac{2p}{p-2}}\le2C_R\sum_{l=0}^\infty L_{\mathcal F,l}\|\tilde X_0(v_1)-\tilde X_0(v_2)\|^s_{\frac{2ps}{p-2}}\le2C_RC_X^s|L_{\mathcal F}|_1\cdot|v_1-v_2|^{\varsigma s}. \tag{8.104}$$
From (8.103), we obtain (8.99) with $v=\frac in$. From (8.102) and (8.104), we conclude (8.100).

Now let Assumption 4.9 hold. Assume w.l.o.g. that
$$\sup_{u,v}\sup_{c>0}\frac{1}{c^s}E\Big[\sup_{|a|_{L_{\mathcal F},s}\le c}\big|\bar f(\tilde Z_0(v),u)-\bar f(\tilde Z_0(v)+a,u)\big|^2\Big]^{1/2}\le\bar C_R.$$
Let $c_n>0$ and put $C_{\bar f} := \max\{\sup_{i,u}\|\bar f(Z_i,u)\|_p,\ \sup_{u,v}\|\bar f(\tilde Z_0(v),u)\|_p\}$. Then we have by Jensen's inequality,
$$\begin{aligned}\|\bar f(Z_i,u)-\bar f(\tilde Z_i(v),u)\|_2&\le E\big[|\bar f(Z_i,u)-\bar f(\tilde Z_i(v),u)|^2\mathbb 1_{\{|Z_i-\tilde Z_i(v)|_{L_{\mathcal F},s}\le c_n\}}\big]^{1/2}+E\big[(\bar f(Z_i,u)-\bar f(\tilde Z_i(v),u))^2\mathbb 1_{\{|Z_i-\tilde Z_i(v)|_{L_{\mathcal F},s}>c_n\}}\big]^{1/2}\\&\le E\Big[\sup_{|a|_{L_{\mathcal F},s}\le c_n}\big|\bar f(\tilde Z_i(v),u)-\bar f(\tilde Z_i(v)+a,u)\big|^2\Big]^{1/2}+\big\{\|\bar f(Z_i,u)\|_p+\|\bar f(\tilde Z_i(v),u)\|_p\big\}P\big(|Z_i-\tilde Z_i(v)|_{L_{\mathcal F},s}>c_n\big)^{\frac{p-2}{2p}}\\&\le\bar C_Rc_n^s+2C_{\bar f}\Big(\frac{\||Z_i-\tilde Z_i(v)|_{L_{\mathcal F},s}\|_{\frac{2ps}{p-2}}}{c_n}\Big)^s\\&\le\bar C_Rc_n^s+2C_{\bar f}C_X^s\Big(|L_{\mathcal F}|_1+\sum_{j=0}^\infty L_{\mathcal F,j}j^{\varsigma s}\Big)\cdot\frac{|v-\frac in|^{\varsigma s}+n^{-\varsigma s}}{c_n^s}.\end{aligned}$$
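We obtain with $c_{cont} := \bar C_R+2C_{\bar f}C_X^s(|L_{\mathcal F}|_1+\sum_{j=0}^\infty L_{\mathcal F,j}j^{\varsigma s})$ that
$$\|\bar f(Z_i,u)-\bar f(\tilde Z_i(v),u)\|_2\le c_{cont}\cdot\Big[c_n^s+\frac{|v-\frac in|^{\varsigma s}+n^{-\varsigma s}}{c_n^s}\Big]. \tag{8.105}$$
Furthermore, as above, for any $c>0$,
$$\|\bar f(\tilde Z_i(v_1),u)-\bar f(\tilde Z_i(v_2),u)\|_2\le\bar C_Rc^s+2C_{\bar f}\Big(\frac{\||\tilde Z_0(v_1)-\tilde Z_0(v_2)|_{L_{\mathcal F},s}\|_{\frac{2ps}{p-2}}}{c}\Big)^s\le\bar C_Rc^s+2C_{\bar f}C_X^s|L_{\mathcal F}|_1\cdot\frac{|v_1-v_2|^{\varsigma s}}{c^s}. \tag{8.106}$$
From (8.105), (8.106) and (8.102), we obtain the assertion again with $v=\frac in$. For instance, the choice $c_n := n^{-\varsigma/2}$ in (8.105) gives the bound $2c_{cont}\,n^{-\varsigma s/2}$, which is (8.99) with $x=\frac12$.

Proof of Theorem 5.1.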

We show the result more generally for $G_n^W(f) = \frac{1}{\sqrt n}S_n^W(f)$. The statement of the theorem is obtained for $W_i(f) = f(Z_i,\frac in)$. Let
$$V^\circ(f) = \|f\|_{2,n}+\sum_{j=1}^\infty\min\{\|f\|_{2,n},D_n\Delta(j)\}\,\varphi(j)^{1/2},\qquad\text{where }\varphi(j) = \log\log(e^ej).$$
$V^\circ(f)$ serves as a lower bound for $\tilde V(f)$. For $q\in\{1,\dots,n\}$, we use decomposition (3.1) without the maximum. The set $B_n(q)$ is …
$$\begin{aligned}P\Big(\Big|\frac{1}{\sqrt n}S_n^W(f)\Big|>x,B_n(q)\Big)&\le P\Big(\frac{1}{\sqrt n}\big|S_n^W(f)-S_{n,q}^W(f)\big|>\frac x4,B_n(q)\Big)+P\Big(\sum_{l=1}^L\Big|\sqrt{\frac{\tau_l}{n}}\sum_{\substack{i=1\\ i\text{ even}}}^{\lfloor n/\tau_l\rfloor+1}\frac{1}{\sqrt{\tau_l}}T_{i,l}(f)\Big|>\frac x4\Big)\\&\quad+P\Big(\sum_{l=1}^L\Big|\sqrt{\frac{\tau_l}{n}}\sum_{\substack{i=1\\ i\text{ odd}}}^{\lfloor n/\tau_l\rfloor+1}\frac{1}{\sqrt{\tau_l}}T_{i,l}(f)\Big|>\frac x4\Big)+P\Big(\frac{1}{\sqrt n}\big|S_{n,0}^W(f)\big|>\frac x4\Big)\\&=:A_1+A_2+A_2'+A_3,\end{aligned}$$
where $A_2'$ (odd indices) is bounded in exactly the same way as $A_2$ (even indices). Define for $l\in\mathbb N$,
$$g_1(l) = \sqrt{\log(l+1)}+1,\qquad g_2(l) = \log(l+1)+1,\qquad a(l) = l^{1/2}\log(el)^{1/2}\varphi(l),$$
and for $j\in\mathbb N$, $\gamma(j) = \log_2(j)+1$. By elementary calculations, we see that there exists a universal constant $c\ge1$ such that
$$\sum_{l=1}^L\tau_lg_2(l)\le2\sum_{l=1}^L\sum_{j=\tau_{l-1}+1}^{\tau_l}g_2(l)\le2\sum_{j=1}^q\sum_{l=1}^L\mathbb 1_{\{\tau_{l-1}+1\le j\le\tau_l\}}g_2(l)\le2\sum_{j=1}^qg_2(\gamma(j))\le2q\cdot g_2(\gamma(q))\le c\,\Phi(q).$$
The third-to-last inequality is due to $2^{l-1}+1 = \tau_{l-1}+1\le j\iff l\le\log_2(j-1)+1\le\gamma(j)$ and the monotonicity of $g_2$.
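The counting argument can be checked numerically for $\tau_l = 2^l$ and $q = \tau_L$. The Python sketch below compares $\sum_{l\le L}\tau_lg_2(l)$ with $2\sum_{j\le q}g_2(\gamma(j))$ and with $2q\cdot g_2(\gamma(q))$. It does not involve the constant $c$ or $\Phi(q)$, which are specified elsewhere in the paper.

```python
import math

def g2(l):
    """g_2(l) = log(l + 1) + 1."""
    return math.log(l + 1.0) + 1.0

def gamma(j):
    """gamma(j) = log_2(j) + 1."""
    return math.log2(j) + 1.0

def counting_sums(L):
    """Return the three quantities of the chain for tau_l = 2^l and q = 2^L."""
    q = 2 ** L
    lhs = sum(2 ** l * g2(l) for l in range(1, L + 1))
    middle = 2.0 * sum(g2(gamma(j)) for j in range(1, q + 1))
    right = 2.0 * q * g2(gamma(q))
    return lhs, middle, right

for L in range(1, 15):
    lhs, middle, right = counting_sums(L)
    assert lhs <= middle <= right + 1e-9
print("counting bound verified for L = 1, ..., 14")
```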
In a similar fashion, L X l =1 g ( l ) τ l X j = τ l − +1 min {k f k ,n , D n ∆( ⌊ j ⌋ ) }≤ q X j =1 min {k f k ,n , D n ∆( ⌊ j ⌋ ) } · L X l =1 { τ l − +1 ≤ j ≤ τ l g ( l ) ≤ q X j =1 min {k f k ,n , D n ∆( ⌊ j ⌋ ) } g ( γ ( j )) ≤ V ◦ ( f )by g ( γ ( j )) ≤ ϕ ( j ) / and Lemma 8.2.8Therefore, x x x x V ◦ ( f ) L X l =1 g ( l ) τ l X j = τ l − +1 min {k f k ,n , D n ∆( ⌊ j ⌋ ) } + x q ) L X l =1 τ l g ( l )= L X l =1 y ( l ) + L X l =1 y ( l ) , where y ( l ) := x V ◦ ( f ) g ( l ) P τ l j = τ l − +1 min {k f k ,n , D n ∆( ⌊ j ⌋ ) } , y ( l ) = x q ) τ l g ( l ).We now use a standard Bernstein inequality for independent random variables: If M Q , σ Q > Q i , i = 1 , ..., m mean-zero independent variables with | Q i | ≤ M Q ,( m P mi =1 k Q i k ) / ≤ σ Q , then for any z > P (cid:16) √ m (cid:12)(cid:12)(cid:12) m X i =1 (cid:2) Q i − E Q i (cid:3)(cid:12)(cid:12)(cid:12) > z (cid:17) ≤ · exp (cid:16) − z σ Q + M Q z √ m (cid:17) . (8.107)Using the bound (8.15), √ τ l | T i,l ( f ) | ≤ √ τ l k f k ∞ ≤ √ τ l M and the elementary inequal-ity min { ab , ac } ≤ ab + c ≤ min { ab , ac } we obtain P (cid:16) L X l =1 (cid:12)(cid:12)(cid:12) q nτ l ⌊ nτl ⌋ +1 X i =1 i even √ τ l T i,l ( f ) (cid:12)(cid:12)(cid:12) > x (cid:17) ≤ L X l =1 P (cid:16)(cid:12)(cid:12)(cid:12) q nτ l ⌊ nτl ⌋ +1 X i =1 i even √ τ l T i,l ( f ) (cid:12)(cid:12)(cid:12) > y ( l ) + y ( l ) (cid:17) ≤ L X l =1 exp (cid:16) −

14 min n ( y ( l ) + y ( l )) (cid:16) P τ l j = τ l − +1 min {k f k ,n , D n ∆( ⌊ j ⌋ ) } (cid:17) , y ( l ) + y ( l ) √ τ l M √ nτl o(cid:17) ≤ L X l =1 exp (cid:16) −

14 min n y ( l ) (cid:16) P τ l j = τ l − +1 min {k f k ,n , D n ∆( ⌊ j ⌋ ) } (cid:17) , y ( l ) √ τ l M √ nτl o(cid:17) = 2 L X l =1 exp (cid:16) −

14 min n x g ( l ) V ◦ ( f ) , xg ( l ) M Φ( q ) √ n o(cid:17) ≤ L X l =1 exp (cid:16) − x g ( l ) V ◦ ( f ) (cid:17) + 2 L X l =1 exp (cid:16) − xg ( l ) M Φ( q ) √ n (cid:17) . (8.108) mpirical process theory for locally stationary processes x > √ · V ◦ ( f ), (cid:16) L X l =1 exp (cid:16) − x g ( l ) V ◦ ( f ) (cid:17)(cid:17) exp (cid:16) x V ◦ ( f ) (cid:17) = L X l =1 exp (cid:16) − log( l + 1) · (cid:16) x V ◦ ( f ) (cid:17) (cid:17) ≤ L X l =1 ( l + 1) − ( x V ◦ ( f ) ) ≤ π . Similarly, if x > M Φ( q ) √ n , (cid:16) L X l =0 exp (cid:16) − xg ( l ) M Φ( q ) √ n (cid:17)(cid:17) exp (cid:16) x M Φ( q ) √ n (cid:17) ≤ π . We conclude from (8.108): If x > max {√ · V ◦ ( f ) , M Φ( q ) √ n } , (8.109)then P (cid:16) L X l =1 (cid:12)(cid:12)(cid:12) q nτ l ⌊ nτl ⌋ +1 X i =1 i even √ τ l T i,l ( f ) (cid:12)(cid:12)(cid:12) > x (cid:17) ≤ π h exp (cid:16) − x V ◦ ( f ) (cid:17) + exp (cid:16) − x M Φ( q ) √ n (cid:17)i ≤ π e exp (cid:16) − min n x V ◦ ( f ) , x M Φ( q ) √ n o(cid:17) , (8.110)where in the last step we added the factor e for convenience of the next step of theproof. If (8.109) is not fulﬁlled, then either x ≤ √ · V ◦ ( f ) or x ≤ M Φ( q ) √ n . The upperbound (8.110) then is still true since x ≤ √ · V ◦ ( f ) impliesexp (cid:16) − min n x V ◦ ( f ) , x M Φ( q ) √ n o(cid:17) ≥ exp (cid:16) − x V ◦ ( f ) (cid:17) ≥ exp( − ≤ x ≤ M Φ( q ) √ n . Thus, (8.110) holds for all x > A ≤ π e · exp (cid:16) − x V ◦ ( f ) + M Φ( q ) x √ n (cid:17) . (8.111)0Since k W i ( f ) k ≤ k f ( Z i , in ) k and k W i ( f ) k ∞ ≤ k f k ∞ ≤ M , we obtain from (8.107) A ≤ (cid:16) − x k f k ,n + Mx √ n (cid:17) . (8.112)Since 1 ≤ Φ( q ) and k f k ,n ≤ V ◦ ( f ), this yields a similar bound as (8.111).We now discuss A . Write1 √ n ( S Wn ( f ) − S Wn,q ( f )) = ∞ X j = q √ n ( S Wn,j +1 ( f ) − S Wn,j ( f )) = ∞ X j = q √ n n X i =1 ( W i,j +1 ( f ) − W i,j ( f )) . 
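The estimates (8.108) and (8.112) rest on the Bernstein inequality (8.107), whose numerical constants did not survive in this copy. As an illustration only, the sketch below uses the textbook two-sided form $P(|\sum_{i\le m}(Q_i - EQ_i)| > t) \le 2\exp(-t^2/(2(m\sigma_Q^2 + M_Q t/3)))$. It rewrites this in the $1/\sqrt{m}$ scaling of (8.107) via $t = z\sqrt{m}$ and compares it with the exact tail of a sum of independent Rademacher signs ($M_Q = \sigma_Q = 1$), computed from binomial coefficients. The constants 2 and 1/3 come from the textbook statement and are an assumption here; they need not match those in (8.107).

```python
import math

def bernstein_bound(z: float, m: int, sigma: float, M: float) -> float:
    """Textbook Bernstein bound for P(|sum_{i<=m} (Q_i - E Q_i)| / sqrt(m) > z).

    Assumes independent, mean-zero Q_i with |Q_i| <= M and
    (1/m) * sum_i Var(Q_i) <= sigma**2.
    """
    return 2.0 * math.exp(-z**2 / (2.0 * (sigma**2 + M * z / (3.0 * math.sqrt(m)))))

def rademacher_tail(z: float, m: int) -> float:
    """Exact P(|Q_1 + ... + Q_m| / sqrt(m) > z) for i.i.d. uniform signs Q_i = +-1."""
    threshold = z * math.sqrt(m)
    # If exactly k of the signs are +1, the sum equals 2k - m.
    favourable = sum(math.comb(m, k) for k in range(m + 1) if abs(2 * k - m) > threshold)
    return favourable / 2**m

for m in (10, 100, 1000):
    for z in (0.5, 1.0, 2.0, 3.0):
        exact = rademacher_tail(z, m)
        bound = bernstein_bound(z, m, sigma=1.0, M=1.0)
        assert exact <= bound, (m, z, exact, bound)
        print(f"m={m:4d}  z={z:.1f}  exact tail={exact:.4f}  Bernstein bound={bound:.4f}")
```

The exact tail is far below the bound for moderate $z$, which is typical: Bernstein-type inequalities trade sharpness for generality.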
PutΩ n ( j ) := { sup f ∈F n n X i =1 E [( W i,j +1 ( f ) − W i,j ( f )) |G i − ] ≤ ( M Φ( q )˜ β ( q ) √ n ) ∆( j ) a ( j ) g ( j ) }∩{ sup f ∈F sup i =1 ,...,n | W i,j +1 ( f ) − W i,j ( f ) | ≤ M Φ( q )˜ β ( q ) ∆( j ) a ( j ) } , and B n ( q ) := ∞ \ j = q Ω n ( j ) . (8.113)Note that A ≤ P (cid:16) √ n (cid:12)(cid:12) S Wn ( f ) − S Wn,q ( f ) (cid:12)(cid:12) > x , ∞ \ j = q Ω n ( j ) (cid:17) . (8.114)Here, W i,j +1 ( f ) − W i,j ( f ) is a martingale diﬀerence with respect to G i . Furthermore, ∞ X j = q ∆( j ) a ( j ) g ( j ) ≤ ∞ X j = q ∆( j ) j / log( ej ) = 4 ˜ β ( q ) . By Freedman’s Bernstein-type inequality for martingales (cf. [12]), we have for x ≥ mpirical process theory for locally stationary processes M Φ( q ) √ n that P (cid:16) √ n (cid:12)(cid:12) S Wn ( f ) − S Wn,q ( f ) (cid:12)(cid:12) > x , ∞ \ j = q Ω n ( j ) (cid:17) ≤ ∞ X k = q P (cid:16) √ n (cid:12)(cid:12)(cid:12) n X i =1 ( W i,j +1 ( f ) − W i,j ( f )) (cid:12)(cid:12)(cid:12) > x ∆( j ) a ( j ) g ( j )˜ β ( q ) , Ω n ( j ) (cid:17) ≤ ∞ X k = q exp (cid:16) −

12 ( x ∆( j ) a ( j ) g ( j )˜ β ( q ) ) ( M Φ( q )˜ β ( q ) √ n ) ∆( j ) a ( j ) g ( j ) + M Φ( q )∆( j ) a ( j )˜ β ( q ) √ n · x ∆( j ) a ( j ) g ( j )˜ β ( q ) (cid:17) = 2 ∞ X k = q exp (cid:16) − x g ( j ) ( M Φ( q ) √ n ) g ( j ) + M Φ( q ) xg ( j ) √ n (cid:17) = 2 ∞ X k = q exp (cid:16) − g ( j )4 min n(cid:16) x M Φ( q ) √ n (cid:17) , (cid:16) x M Φ( q ) √ n (cid:17)o(cid:17) ≤ ∞ X k = q exp (cid:16) − g ( j ) x M Φ( q ) √ n (cid:17) . (8.115)We conclude that for x > M Φ( q ) √ n , (cid:16) ∞ X j = q exp (cid:16) − g ( j ) x M Φ( q ) √ n (cid:17)(cid:17) · exp (cid:16) x M Φ( q ) √ n (cid:17) ≤ ∞ X j = q ( j + 1) − ( x M Φ( q ) √ n ) ≤ π , and thus (with an additional factor e ), A ≤ P (cid:16) √ n (cid:12)(cid:12) S Wn ( f ) − S Wn,q ( f ) (cid:12)(cid:12) > x , ∞ \ j = q Ω n ( j ) (cid:17) ≤ π e exp (cid:16) − x M Φ( q ) √ n (cid:17) . (8.116)In the case x ≤ M Φ( q ) √ n , we have π e exp (cid:16) − x M Φ( q ) √ n (cid:17) ≥ π ≥ , thus (8.116) holds for all x > g ( j ) ≥

1, we haveΩ n ( j ) ⊂ { n n X i =1 E [sup f ∈F (cid:12)(cid:12) W i,j +1 ( f ) − W i,j ( f ) (cid:12)(cid:12) |G i − ] ≤ ( M Φ( q )˜ β ( q ) √ n ) ∆( j ) a ( j ) }∩{ (cid:16) n X i =1 sup f ∈F | W i,j +1 ( f ) − W i,j ( f ) | (cid:17) / ≤ M Φ( q )˜ β ( q ) ∆( j ) a ( j ) } , P (Ω n ( j ) c ) ≤ (cid:16) √ n ˜ β ( q ) M Φ( q ) (cid:17) j ) a ( j ) n n X i =1 (cid:13)(cid:13)(cid:13) sup f ∈F (cid:12)(cid:12) W i,j +1 ( f ) − W i,j ( f ) (cid:12)(cid:12)(cid:13)(cid:13)(cid:13) + (cid:16) ˜ β ( q )2 M Φ( q ) (cid:17) j ) a ( j ) n X i =1 (cid:13)(cid:13)(cid:13) sup f ∈F (cid:12)(cid:12) W i,j +1 ( f ) − W i,j ( f ) (cid:12)(cid:12)(cid:13)(cid:13)(cid:13) ≤ (cid:16) ˜ β ( q ) √ nM Φ( q ) (cid:17) ( D ∞ n ) a ( j ) . Therefore, P ( B n ( q ) c ) ≤ P (cid:16) ∞ [ j = q Ω n ( j ) c (cid:17) ≤ (cid:16) D ∞ n ˜ β ( q ) √ nM Φ( q ) (cid:17) ∞ X k = q a ( j ) . (8.117)Note that ∞ X j = q +1 a ( j ) ≤ ∞ X j = q +1 Z jj − a ( j ) dx ≤ Z ∞ q a ( x ) dx ≤ Z ∞ q x log( e e x ) log(log( e e x )) dx = 2log(log( e e q )) , so that ∞ X k = q a ( j ) = 1 a ( q ) + ∞ X j = q +1 a ( j ) ≤ ϕ ( q ) . Summarizing the bounds (8.111), (8.112), (8.116) and (8.117) and using the fact that V ◦ ( f ) = k f k ,n + ∞ X j =1 min {k f k ,n , D n ∆( j ) } ϕ ( j ) / ≤ ˜ V ( f ) , we obtain (5.1).We now show (5.2) by a case distinction. We abbreviate ˜ q ∗ = ˜ q ∗ ( M √ n D ∞ n y ). In the caseΦ(˜ q ∗ ) n ≤

1, we have $\tilde q^* \in \{1, \dots, n\}$ and thus, by (5.1),
$$P\big(\sqrt{n}\,|S_n^W(f)| > x,\ B_n(\tilde q^*)\big) \le c_1 \exp\Big(-c_2\,\frac{x^2}{\tilde V(f)^2 + \frac{M\Phi(\tilde q^*)}{\sqrt{n}}\,x}\Big)$$
and, by definition of $\tilde q^*$, P ( B n (˜ q ∗ ) c ) ≤ ( ˜ β (˜ q ∗ )Φ(˜ q ∗ ) · D ∞ n √ nM ) ≤ y. In the case $\Phi(\tilde q^*)/n >$

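1, we obviously have
$$P\big(\sqrt{n}\,|S_n^W(f)| > x\big) \le P(M\sqrt{n} > x) \le c_1\exp\Big(-c_2\,\frac{x}{M\sqrt{n}}\Big) \le c_1\exp\Big(-c_2\,\frac{x}{M\Phi(\tilde q^*)/\sqrt{n}}\Big) \le c_1\exp\Big(-c_2\,\frac{x^2}{\tilde V(f)^2 + \frac{M\Phi(\tilde q^*)}{\sqrt{n}}\,x}\Big),$$
where the third inequality uses $n < \Phi(\tilde q^*)$ and the last one uses $x^2/(\tilde V(f)^2 + ax) \le x/a$ with $a = M\Phi(\tilde q^*)/\sqrt{n}$. Since this bound holds without any restricting set $B_n(q)$, $q$ can be chosen arbitrarily in this case, and the assertion follows.

Lemma 8.10.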

Let F be a class of functions which satisfies Assumption 4.1. Then there exist universal constants c_1, c_2 > 0 such that the following holds: For each q ∈ {1, ..., n} there exists a set B (2) n ( q ) independent of f ∈ F such that for all x > 0,
P ( n | S Wn ( f ) | > x, B (2) n ( q ) ) ≤ c_1 exp ( − c_2 x M Φ( q ) n · min { x k f k ,n V ( f ) , 1 } ) (8.118)
and P ( B (2) n ( q ) c ) ≤ n ( D ∞ n ) M · C ∆ β ( q )Φ( q ) . Define ˜ q ∗ ( z ) = min { q ∈ N : β ( q ) ≤ Φ( q ) z }. Then for any x > 0, y > 0,
P ( n | S Wn ( f ) | > x, B (2) n (˜ q ∗ ( M n ( D ∞ n ) y )) ) ≤ c_1 exp ( − c_2 x M n Φ(˜ q ∗ ( M n ( D ∞ n ) y )) · min { x k f k ,n V ( f ) , 1 } ) (8.119)
and P ( B (2) n (˜ q ∗ ( M n ( D ∞ n ) y )) c ) ≤ C ∆ y .

Proof of Lemma 8.10.

We use a similar argument as in Theorem 5.1, especially wemake use of the decomposition (3.1).The set B (2) n ( q ) is deﬁned below in (8.125). We then have P (cid:0)(cid:12)(cid:12) n S Wn ( f ) (cid:12)(cid:12) > x, B (2) n ( q ) (cid:1) ≤ P (cid:16) n (cid:12)(cid:12) S Wn ( f ) − S Wn,q ( f ) (cid:12)(cid:12) > x/ , B (2) n ( q ) (cid:17) + P (cid:16) L X l =1 (cid:12)(cid:12)(cid:12) nτ l ⌊ nτl ⌋ +1 X i =1 i even τ l T i,l ( f ) (cid:12)(cid:12)(cid:12) > x/ (cid:17) + P (cid:16) L X l =1 (cid:12)(cid:12)(cid:12) nτ l ⌊ nτl ⌋ +1 X i =1 i odd τ l T i,l ( f ) (cid:12)(cid:12)(cid:12) > x/ (cid:17) + P (cid:16) n (cid:12)(cid:12) S Wn, ( f ) (cid:12)(cid:12) > x/ (cid:17) =: A + A + A . g ( l ) = log( l + 1) + 1, L X l =1 τ l g ( l ) ≤ q ) . Therefore, x x x x V ( f ) L X l =1 τ l X j = τ l − +1 min {k f k ,n , D n ∆( j ) } + x q ) L X l =1 τ l g ( l )= L X l =1 y ( l ) + L X l =1 y ( l ) , where y ( l ) := x V ( f ) P τ l j = τ l − +1 min {k f k ,n , D n ∆( j ) } , y ( l ) = x q ) τ l g ( l ).We have by Lemma 8.3: If M Q , σ Q > Q i , i = 1 , ..., m mean-zeroindependent variables with | Q i | ≤ M Q , m P mi =1 k Q i k ≤ σ Q , then for any z > P (cid:16) √ m (cid:12)(cid:12)(cid:12) m X i =1 (cid:2) Q i − E Q i (cid:3)(cid:12)(cid:12)(cid:12) > z (cid:17) ≤ · exp (cid:16) − z σ Q M Q m + M Q zm (cid:17) . (8.120)Using the bound (8.48) combined with (8.51), τ l | T i,l ( f ) | ≤ k f k ∞ ≤ M and theelementary inequalities min { ab , ac } ≤ ab + c ≤ min { ab , ac } and ( a + b ) ≥ ab , we obtain P (cid:16) L X l =1 (cid:12)(cid:12)(cid:12) nτ l ⌊ nτl ⌋ +1 X i =1 i even τ l T i,l ( f ) (cid:12)(cid:12)(cid:12) > x (cid:17) ≤ L X l =1 P (cid:16)(cid:12)(cid:12)(cid:12) nτ l ⌊ nτl ⌋ +1 X i =1 i even τ l T i,l ( f ) (cid:12)(cid:12)(cid:12) > y ( l ) + y ( l ) (cid:17) ≤ L X l =1 exp (cid:16) −

14 min n ( y ( l ) + y ( l )) k f k ,n P τ l j = τ l − +1 min {k f k ,n , D n ∆( j ) } · M nτl ,y ( l ) + y ( l ) M nτl o(cid:17) ≤ L X l =1 exp (cid:16) − · y ( l ) M nτl min n y ( l ) k f k ,n P τ l j = τ l − +1 min {k f k ,n , D n ∆( j ) } , o(cid:17) ≤ L X l =1 exp (cid:16) − xg ( l )2 M Φ( q ) n · min n x k f k ,n V ( f ) , o(cid:17) . (8.121) mpirical process theory for locally stationary processes x is such that c ( x ) := x M q ) n · min (cid:8) x k f k ,n V ( f ) , (cid:9) ≥

2, then
$$\Big(\sum_{l=1}^{L}\exp\big(-g(l)\,c(x)\big)\Big)\cdot\exp\big(c(x)\big) = \sum_{l=1}^{L}\exp\big(-\log(l+1)\,c(x)\big) = \sum_{l=1}^{L}(l+1)^{-c(x)} \le \frac{\pi^2}{6},$$
since $g(l) = \log(l+1) + 1$ and $c(x) \ge 2$. Insertion into (8.121) leads to P ( L X l =1 | nτ l ⌊ nτl ⌋ +1 X i =1 i even τ l T i,l ( f ) | > x ) ≤ π · e exp( − c ( x )) (8.122)
In the case c(x) <

2, the right hand side of (8.122) is ≥

1. Thus, (8.122) holds for all x > A ≤ π e · exp (cid:16) − · x M Φ( q ) n · min (cid:8) x k f k ,n V ( f ) , (cid:9)(cid:17) . (8.123)Since k W i ( f ) k ≤ k f ( Z i , in ) k and k W i ( f ) k ∞ ≤ k f k ∞ ≤ M , we obtain from (8.120) A ≤ (cid:16) − x k f k ,n · M n + M xn (cid:17) ≤ (cid:16) − x M n · min (cid:8) x k f k ,n , (cid:9)(cid:17) . (8.124)Since 1 ≤ Φ( q ) and k f k ,n ≤ V ( f ), this yields a similar bound as (8.123).We now discuss A . Put B (2) n ( q ) := { sup f ∈F n | S Wn ( f ) − S Wn,q ( f ) | ≤ M Φ( q ) n } . (8.125)Then with Markov’s inequality and using the same calculation as in (8.53), P ( B (2) n ( q ) c ) ≤ nM Φ( q ) · (cid:13)(cid:13) sup f ∈F n | S Wn ( f ) − S Wn,q ( f ) | (cid:13)(cid:13) ≤ nM Φ( q ) · ∞ X k = q n n X i =1 (cid:13)(cid:13) sup f ∈F | W i,j +1 ( f ) − W i,j ( f ) | (cid:13)(cid:13) ≤ nM Φ( q ) · ( D ∞ n ) C ∆ β ( q ) . (8.126)Furthermore, A = P (cid:16) n | S Wn ( f ) − S Wn,q ( f ) | > x , B (2) n ( q ) (cid:17) = { M q ) n > x } ≤ e · exp (cid:16) − x M Φ( q ) n (cid:17) . (8.127)6Summarizing the bounds (8.123), (8.124), (8.127) and (8.126), we obtain the result(8.118).We now show (8.119) by a case distinction. Abbreviate ˜ q ∗ = ˜ q ∗ ( M n ( D ∞ n ) y ). In the caseΦ(˜ q ∗ ) n ≤

1, we have ˜ q ∗ ∈ {1, ..., n} and thus by (8.118) P ( n | S Wn ( f ) | > x, B (2) n (˜ q ∗ ) ) ≤ c_1 exp ( − c_2 x M Φ(˜ q ∗ ) n · min { x k f k ,n V ( f ) , 1 } ) and, by definition of ˜ q ∗ , P ( B (2) n (˜ q ∗ ) c ) ≤ ( D ∞ n ) nM · C ∆ β (˜ q ∗ )Φ(˜ q ∗ ) ≤ C ∆ y , so the assertion follows with B (2) n ( M, y ) = B (2) n (˜ q ∗ ).
In the case Φ(˜ q ∗ ) n >

1, we obviously have P (cid:16) n | S Wn ( f ) | > x (cid:17) ≤ P ( M > x ) ≤ c exp (cid:16) − c xM (cid:17) ≤ c exp (cid:16) − c x M Φ(˜ q ∗ ) n (cid:17) ≤ c exp (cid:16) − c x M Φ(˜ q ∗ ) n · min (cid:8) x k f k ,n V ( f ) , (cid:9)(cid:17) , and the assertion follows with B (2) n ( M, y ) being the whole probability space.
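Steps (8.121) and (8.122) use a summation device that also appears in the proof of Theorem 5.1. With $g(l) = \log(l+1)+1$ one has $\exp(-g(l)c)\cdot\exp(c) = (l+1)^{-c}$, so for $c \ge 2$ the weighted sum $\sum_{l=1}^{L}\exp(-g(l)c)$ is at most $(\pi^2/6)\exp(-c)$, uniformly in $L$. Below is a short numerical check in Python. The truncation level $L$ and the grid of values of $c$ are arbitrary choices for illustration.

```python
import math

def g(l: int) -> float:
    """g(l) = log(l + 1) + 1, as in the proof of Lemma 8.10."""
    return math.log(l + 1) + 1.0

def rescaled_sum(c: float, L: int) -> float:
    """exp(c) * sum_{l=1}^{L} exp(-g(l) * c); each term equals (l + 1) ** (-c)."""
    return math.exp(c) * sum(math.exp(-g(l) * c) for l in range(1, L + 1))

for c in (2.0, 2.5, 3.0, 5.0, 10.0):
    value = rescaled_sum(c, L=10_000)
    zeta_tail = sum((l + 1) ** (-c) for l in range(1, 10_001))
    assert math.isclose(value, zeta_tail, rel_tol=1e-9)
    assert value <= math.pi**2 / 6
    print(f"c = {c:4.1f}: exp(c) * sum = {value:.6f} <= pi^2/6 = {math.pi**2 / 6:.6f}")
```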

Proof of Theorem 5.4.

Let B n ( q ) denote the set from Theorem 5.1 (applied to W i ( f ) = E [ f ( Z i , in ) |G i − ] instead of W i ( f ) = f ( Z i , in ); the proof is similar for this situation). Let B (2) n ( q ) denote the set from Lemma 8.10.Put B ◦ n ( q ) = B n ( q ) ∩ B (2) n ( q ) . Then we have P (cid:0) | G n ( f ) | > x, B ◦ n ( q ) (cid:1) ≤ P (cid:0) G (1) n ( f ) | > x , B (2) n ( q ) (cid:1) + P (cid:0) | G (2) n ( f ) | > x , B n ( q ) (cid:1) ≤ P (cid:16) | G (1) n ( f ) | > x , R n ( f ) ≤ max (cid:8) ˜ V ( f ) , M Φ( q ) √ n x (cid:9)(cid:17) + P (cid:16) R n ( f ) > max (cid:8) ˜ V ( f ) , M Φ( q ) √ n x (cid:9) , B (2) n ( q ) (cid:17) + P (cid:0) | G (2) n ( f ) | > x , B n ( q ) (cid:1) . (8.128)We now discuss the three summands in (8.128) separately. By Theorem 5.1, P (cid:0) | G (2) n ( f ) | > x , B n ( q ) (cid:1) ≤ c exp (cid:16) − c ( x/ ˜ V ( f ) + M Φ( q ) √ n ( x/ (cid:17) . mpirical process theory for locally stationary processes P (cid:16) | G (1) n ( f ) | > x , R n ( f ) ≤ max (cid:8) ˜ V ( f ) , M Φ( q ) √ n x (cid:9)(cid:17) ≤ (cid:16) −

12 ( x/ max { ˜ V ( f ) , M Φ( q ) √ n x } + M √ n x (cid:17) ≤ (cid:16) −

14 ( x/ ˜ V ( f ) + M Φ( q ) √ n x (cid:17) . By Lemma 8.10 applied to W i ( f ) = E [ f ( Z i , in ) |G i − ] and using Φ( q ) ≤ Φ( q ) (cf.(8.129)), P (cid:16) R n ( f ) > max (cid:8) ˜ V ( f ) , M Φ( q ) √ n x (cid:9) , B (2) n ( q ) (cid:17) ≤ c exp (cid:16) − c M Φ( q ) √ n x M Φ( q ) n · min n ˜ V ( f ) k f k ,n V ( f ) , o(cid:17) = c exp (cid:16) − c x M Φ( q ) √ n x (cid:17) . Inserting the above estimates into (8.128), the assertion (5.3) follows. Furthermore byAssumption 5.3, P ( B (2) n ( q ) c ) ≤ C ∆ n ( D ∞ n ) M ˜ β norm ( q ) ≤ C ∆ C ˜ β (cid:16) √ n D ∞ n M ˜ β norm ( q ) (cid:17) . Thus, P ( B ◦ n ( q ) c ) ≤ P ( B n ( q ) c ) + P ( B (2) n ( q ) c ) ≤ [4 + C ∆ C ˜ β ] (cid:16) √ n D ∞ n M ˜ β norm ( q ) (cid:17) . The second assertion (5.4) follows as in Theorem 5.1 with q = ˜ q ∗ ( M √ n D ∞ n y ). Lemma 8.11 (A second compatibility lemma) . Let n ∈ N , δ, a M > and k ∈ N . For H > , put ˜ r ( δ ) := max { r > q ∗ ( r ) r ≤ δ } , and w ( H ) := min { w > w · ˜ r ( w ) ≥ H − } , W ( H ) := Hw ( H ) . and ˜ m ( n, δ, k ) := a M ˜ r ( δ D n )˜ r ( w ( H ( k ))) · D ∞ n n / . Finally, put ˆ C n := 8 c (1 + D ∞ n D n )(1 + C β ( ˜ β (1) ∨ a M ) . (i) Then W is subadditive.(ii) If F fulﬁlls Assumption 3.1 and Assumption 5.3, then sup f ∈F ˜ V ( f ) ≤ δ , sup f ∈F k f k ∞ ≤ ˜ m ( n, δ, k ) implies that for any ψ : (0 , ∞ ) → [1 , ∞ ) , P (cid:16) √ n (cid:12)(cid:12) S Wn ( f ) (cid:12)(cid:12) > ˆ C n ψ ( δ ) δ W ( H ( k )) , B n (cid:17) ≤ c exp (cid:0) − H ( k ) (cid:1) , √ n k f { f>γ · ˜ m ( n,δ,k ) } k ,n ≤ γa M · D ∞ n D n · δ W ( H ( k )) , P (cid:0) B cn (cid:1) ≤ ψ ( δ ) a M , where B n = B n (˜ q ∗ ( m ( n,δ,k ) √ n D ∞ n ψ ( δ ) a M )) , c , c are from Theorem 5.1. Proof of Lemma 8.11. (i) Note that for a, b >

0, we have w ( a + b ) ≤ w ( a ) since w ( a )˜ r ( w ( a )) ≥ a − ≥ ( a + b ) − . Thus W ( a + b ) = ( a + b ) w ( a + b ) ≤ aw ( a + b )+ bw ( a + b ) ≤ aw ( a )+ bw ( b ) ≤ W ( a )+ W ( b ) . (ii) As in the proof of Lemma 4.5 (cf. (8.63)), we obtain that for x , x > q ∗ ( C ˜ β x x ) ≤ ˜ q ∗ ( x )˜ q ∗ ( x ) . Furthermore, for q , q ∈ N we have due to x + x ≤ x x + 1 thatlog log( e e q q ) ≤ log[log( eq ) + log( eq )] ≤ log[log( eq ) · log( eq ) + 1] ≤ log[log( eq ) · log( e e q )] ≤ log log( eq ) + log log( e e q ) ≤ log log( eq ) · log log( e e q ) + 1 ≤ log log( e e q ) · log log( e e q ) , and thus Φ( q q ) ≤ Φ( q )Φ( q ) . (8.129)Furthermore, note that for a ∈ (0 , ( ˜ β (1) ∨ q = ⌈ Φ − ( ( ˜ β (1) ∨ a ) ⌉ satisﬁesΦ( q ) a = Φ( ⌈ Φ − ( ( ˜ β (1) ∨ a ) ⌉ ) a ≥ ( ˜ β (1) ∨ ≥ ˜ β ( q ) , that is, Φ(˜ q ∗ ( a )) ≤ Φ( ⌈ Φ − ( ( ˜ β (1) ∨ a ) ⌉ ) ≤ Φ(2Φ − ( ( ˜ β (1) ∨ a )) ≤ − ( ( ˜ β (1) ∨ a )) ≤

4( ˜ β (1) ∨ a . (8.130) mpirical process theory for locally stationary processes y = ψ ( δ ) a M , we have˜ q ∗ ( ˜ m ( n, δ, k ) √ n D ∞ n y ) = ˜ q ∗ ( ˜ m ( n, δ, k ) √ n D ∞ n ψ ( δ ) a M ) = ˜ q ∗ ( C β ( ˜ β (1) ∨ r r ψ ( δ ) ) , where r = ˜ r ( δ D n ), r = ˜ r ( w ( H ( k ))), and thus with (8.129) and (8.130),Φ(˜ q ∗ ) ˜ m ( n, δ, k ) √ n ≤ Φ(˜ q ∗ ( C β ( ˜ β (1) ∨ r r ψ ( δ ) )) r r D ∞ n a M ≤ Φ (cid:16) ˜ q ∗ ( ( ˜ β (1) ∨ ψ ( δ ) )˜ q ∗ ( r )˜ q ∗ ( r ) (cid:17) r r D ∞ n a M ≤ Φ (cid:16) ˜ q ∗ ( ( ˜ β (1) ∨ ψ ( δ ) ) (cid:17) Φ(˜ q ∗ ( r ))Φ(˜ q ∗ ( r ))) r r D ∞ n a M ≤ D ∞ n D n ψ ( δ ) δw ( H ( k )) a M . By deﬁnition of W ( · ) and Theorem 5.1, we obtain P (cid:16) √ n (cid:12)(cid:12) S Wn ( f ) (cid:12)(cid:12) > ˆ C n ψ ( δ ) δ · W ( H ( k )) , B n (cid:17) ≤ c exp (cid:16) − c ˆ C n ψ ( δ ) δ W ( H ( k )) δ + 4 D ∞ n D n a M ˆ C n δ ψ ( δ ) w ( H ( k )) W ( H ( k )) (cid:17) ≤ c exp (cid:16) − c ˆ C n a M D ∞ n D n ˆ C n H ( k ) (cid:17) ≤ c exp (cid:0) − H ( k ) (cid:1) . Similar as in the proof of Lemma 3.6, we obtain due to Assumption 5.3 that˜ V ( f ) ≥ min a ∈ N (cid:2) k f k ,n (cid:16) a X j =1 ϕ ( j ) (cid:17) + D n ˜ β ( a ) (cid:3) ≥ k f k ,n (cid:16) ˆ a X j =1 ϕ ( j ) (cid:17) + D n ˜ β (ˆ a ) , where ˆ a = arg min a ∈ N {k f k ,n · (cid:0) P aj =1 ϕ ( j ) (cid:1) + D n β ( a ) } . Elementary calculationsshow that for ˆ a ≥ ˆ a X j =1 ϕ ( j ) = 1 + ˆ a X j =2 ϕ ( j ) ≥ Z ˆ a − ϕ ( x ) dx = 1 + (Φ(ˆ a − − − Z ˆ a − e e x ) dx ≥ Φ(ˆ a − − ˆ a − e ≥

14 Φ(ˆ a ) . Clearly, the same holds for ˆ a = 1. We therefore have˜ V ( f ) ≥ k f k ,n Φ(ˆ a ) . (8.131)00 Now, δ ≥ ˜ V ( f ) ≥ D n ˜ β (ˆ a ) = D n ˜ β norm (ˆ a )Φ(ˆ a ). Thus ˜ β norm (ˆ a ) ≤ δ D n Φ(ˆ a ) . By deﬁ-nition of ˜ q ∗ , ˜ q ∗ ( δ D n Φ(ˆ a ) ) ≤ ˆ a . Thus Φ(˜ q ∗ ( δ D n Φ(ˆ a ) )) δ D n Φ(ˆ a ) ≤ δ D n . By deﬁnition of ˜ r ,˜ r ( δ D n ) ≥ δ D n Φ(ˆ a ) .Using this result, (8.131) and the deﬁnition of w ( · ) yields √ n k f { f>γ · ˜ m ( n,δ,k ) } k ,n ≤ γ √ n k f k ,n ˜ m ( n, δ, k ) ≤ γ a M D ∞ n k f k ,n ˜ r ( δ D n )˜ r ( w ( H ( k ))) , and k f k ,n ˜ r ( δ D n )˜ r ( w ( H ( k ))) ≤ D n Φ(ˆ a ) k f k ,n δ r ( w ( H ( k )) ≤ D n V ( f ) k f k ,n δ r ( w ( H ( k )) ≤ δ · r ( w ( H ( k )) ≤ D n δ W ( H ( k )) , which provides √ n k f { f>γ · ˜ m ( n,δ,k ) } k ,n ≤ γ D n D ∞ n a M δ W ( H ( k )) . Finally, Theorem 5.1 implies that P ( B cn ) ≤ y = 4 ψ ( δ ) a M . We here present a chaining version of the large deviation inequality from Theorem 5.1.For the sake of simplicity, we derive the result for some continuous strictly decreasingupper bound ¯ H ( ε ) of H ( ε, F , V ). Theorem 8.12 (Chaining for large deviation inequalities) . There exists a universalconstant c > such that the following holds.Let a M ≥ , M, σ > be arbitrary. Let ψ ( x ) = p log( x − ∨ e ) log log( x − ∨ e e ) .Let F be a class which satisﬁes Assumption 3.1 and 5.3, and sup f ∈F ˜ V ( f ) ≤ σ , sup f ∈F k f k ∞ ≤ M . Deﬁne ˜ I ( σ ) := Z σ ψ ( ε ) W (1 ∨ ¯ H ( ε )) dε. Choose σ ◦ , x > such that ¯ H ( σ ◦ ) = 150 c · x σ + Φ(˜ q ∗ ( M √ n D ∞ n a M )) Mx √ n , x ≥ c ˆ C n ˜ I ( σ ◦ ) , (8.132) mpirical process theory for locally stationary processes where ˆ C n , W is from Lemma 8.11. Then there exists a set Ω n independent of x such that P (cid:16) sup f ∈F (cid:12)(cid:12) G n ( f ) (cid:12)(cid:12) > x, Ω n (cid:17) ≤ (cid:16) − c · x σ + Φ(˜ q ∗ ( M √ n D ∞ n a M )) Mx √ n (cid:17) , and P (Ω cn ) ≤ a M Z σ ◦ xψ ( x ) dx. Proof of Theorem 8.12.

We use the chaining technique from [2], Theorem 2.3 therein.We deﬁne δ := σ ◦ , δ j +1 := max { δ ≤ δ j H ( δ ) ≥ H ( δ j ) } . Since ¯ H ( · ) is continuous, it holds that ¯ H ( δ j +1 ) = 4 ¯ H ( δ j ). Put τ := min { j ≥ δ j ≤ ˜ I ( σ ◦ ) √ n } . Deﬁne η j := 4 ˆ C n ψ ( δ j ) δ j W ( H ( ¯ N j +1 )) , where ˆ C n is from Lemma 8.11 and¯ N j +1 := j +1 Y k =0 exp( ¯ H ( δ k )) ≥ j +1 Y k =0 exp( H ( δ k )) = j +1 Y k =0 N ( δ k ) =: N j +1 . By Lemma 8.11(i), W ( · ) is subadditive, thus τ X j =0 ψ ( δ j ) δ j W ( H ( ¯ N j +1 )) ≤ τ X j =0 ψ ( δ j ) δ j W (1 ∨ j +1 X k =1 ¯ H ( δ k )) ≤ τ X j =0 ψ ( δ j ) δ j j +1 X k =1 W (1 ∨ ¯ H ( δ k )) ≤ τ − X k =0 W (1 ∨ ¯ H ( δ k +1 )) τ X j = k ψ ( δ j ) δ j . (8.133)Similar to (8.71), there exists some universal constant c ψ > τ X j = k ψ ( δ j ) δ j ≤ ∞ X j = k Z δ j δ j / ψ ( δ j ) dx ≤ ∞ X j = k Z δ j δ j +1 ψ ( x ) dx ≤ Z δ k ψ ( x ) dx ≤ c ψ δ k ψ ( δ k ) . (8.134)02Furthermore, by deﬁnition of the sequence ( δ j ) j and since w ( · ) is decreasing but W isincreasing, we have W (1 ∨ ¯ H ( δ j +1 )) ≤ W (4(1 ∨ ¯ H ( δ j ))) ≤ W (1 ∨ ¯ H ( δ j )) . (8.135)Insertion of (8.134) and (8.135) into (8.133) yields τ X j =0 η j ≤ C n τ X j =0 ψ ( δ j ) δ j W ( H ( ¯ N j +1 )) ≤ c ψ ˆ C n ∞ X k =0 δ k ψ ( δ k ) W (1 ∨ ¯ H ( δ j )) ≤ c ψ ˆ C n ∞ X k =0 Z δ k δ k / ψ ( δ k ) W (1 ∨ ¯ H ( δ j )) dε ≤ c ψ ˆ C n ∞ X k =0 Z δ k δ k +1 ψ ( ε ) W (1 ∨ ¯ H ( ε )) dε ≤ c ψ ˆ C n Z σ ◦ ψ ( ε ) W (1 ∨ ¯ H ( ε )) dε = 64 c ψ ˆ C n ˜ I ( σ ◦ ) . (8.136)We set up the same decomposition as in the proof of Theorem 3.7. Deﬁne˜ m j := 12 ˜ m ( n, δ j , ¯ N j +1 ) . Note that x ≥ x x − η τ ) + ( x η τ ) . Deﬁne c := 5 · · c ψ . Condition 8.132 implies x ≥ c ψ ˆ C n ˜ I ( σ ◦ ) , (8.137)and thus with (8.136), we obtain x − η τ ≥ c ψ ˆ C n ˜ I ( σ ◦ ) − η τ ≥ τ − X j =0 η j . 
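The chaining scales are $\delta_0 = \sigma^\circ$ and $\delta_{j+1} = \max\{\delta \le \delta_j : \bar H(\delta) \ge 4\bar H(\delta_j)\}$, so that $\bar H(\delta_{j+1}) = 4\bar H(\delta_j)$ by continuity of $\bar H$. The Python sketch below computes such a sequence by bisection on a logarithmic scale for any continuous, strictly decreasing $\bar H$. It checks the result on the hypothetical entropy bound $\bar H(\varepsilon) = d\log(1/\varepsilon)$ for $\varepsilon < 1$, for which the recursion reduces to $\delta_{j+1} = \delta_j^4$. The stopping threshold plays the role of the cut-off in the definition of $\tau$, but here it is a free parameter, not the quantity from the proof.

```python
import math
from typing import Callable

def next_scale(H: Callable[[float], float], delta: float, floor: float = 1e-300) -> float:
    """Largest d <= delta with H(d) >= 4 * H(delta), for continuous strictly decreasing H.

    Bisection is carried out on log(d), since the scales shrink very quickly.
    """
    target = 4.0 * H(delta)
    lo, hi = math.log(floor), math.log(delta)
    if H(floor) < target:
        raise ValueError("floor is too large for this entropy bound")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if H(math.exp(mid)) >= target:
            lo = mid  # mid is feasible, so the maximum lies in [mid, hi]
        else:
            hi = mid
    return math.exp(lo)

def chaining_scales(H: Callable[[float], float], sigma0: float, threshold: float) -> list[float]:
    """delta_0 = sigma0, delta_1, ..., delta_tau with tau = min{j : delta_j <= threshold}."""
    deltas = [sigma0]
    while deltas[-1] > threshold:
        deltas.append(next_scale(H, deltas[-1]))
    return deltas

def H_bar(eps: float, d: float = 3.0) -> float:
    """Hypothetical entropy bound d * log(1 / eps), continuous and strictly decreasing on (0, 1)."""
    return d * math.log(1.0 / eps)

scales = chaining_scales(H_bar, sigma0=0.5, threshold=1e-6)
for j in range(len(scales) - 1):
    assert math.isclose(H_bar(scales[j + 1]), 4.0 * H_bar(scales[j]), rel_tol=1e-9)
    assert math.isclose(scales[j + 1], scales[j] ** 4, rel_tol=1e-6)
print(scales)
```

Because $\bar H$ quadruples from one scale to the next, the number of scales needed to reach a given threshold grows only like the logarithm of $\bar H$ at that threshold.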
Put ˜ q ∗ j := ˜ q ∗ ( m ( n,δ j ,N j +1 ) √ n D ∞ n ψ ( δ j ) a M ), andΩ n := B n (˜ q ∗ ( M √ n D ∞ n a M )) ∩ τ \ j =0 B n (˜ q ∗ j ) , mpirical process theory for locally stationary processes B n ( q ) is from Theorem 5.1. From (8.33), we obtain the decomposition P (cid:16) sup f ∈F (cid:12)(cid:12) G Wn ( f ) (cid:12)(cid:12) > x, Ω n (cid:17) ≤ P (cid:16) sup f ∈F | G Wn ( π f ) | > x , Ω n (cid:17) + P (cid:16) sup f ∈F | G Wn ( ϕ ∧ ˜ m τ (∆ τ f )) | + 2 √ n sup f ∈F k ∆ τ f k ,n > x η τ , Ω n (cid:17) + P (cid:16) τ − X j =0 sup f ∈F (cid:12)(cid:12)(cid:12) G Wn ( ϕ ∧ ˜ m j − ˜ m j +1 ( π j +1 f − π j f )) (cid:12)(cid:12)(cid:12) > x − η τ , Ω n (cid:17) + P (cid:16) τ − X j =0 n sup f ∈F (cid:12)(cid:12)(cid:12) G Wn (min (cid:8)(cid:12)(cid:12) ϕ ∨ ˜ m j +1 (∆ j +1 f ) (cid:12)(cid:12) , m j (cid:9) ) (cid:12)(cid:12)(cid:12) +2 √ n sup f ∈F k ∆ j +1 f { ∆ j +1 f> ˜ m j +1 } k ,n o > x − η τ , Ω n (cid:17) + P (cid:16) τ − X j =0 n sup f ∈F (cid:12)(cid:12)(cid:12) G Wn (min (cid:8)(cid:12)(cid:12) ϕ ∨ ˜ m j − ˜ m j +1 (∆ j f ) (cid:12)(cid:12) , m j (cid:9) ) (cid:12)(cid:12)(cid:12) +2 √ n sup f ∈F k ∆ j f { ∆ j f> ˜ m j − ˜ m j +1 } k ,n o > x − η τ , Ω n (cid:17) ≤ P (cid:16) sup f ∈F | G Wn ( π f ) | > x , Ω n (cid:17) + P (cid:16) sup f ∈F | G Wn ( ϕ ∧ ˜ m τ (∆ τ f )) | + 2 √ n sup f ∈F k ∆ τ f k ,n > x η τ , Ω n (cid:17) + τ − X j =0 P (cid:16) sup f ∈F (cid:12)(cid:12)(cid:12) G Wn ( ϕ ∧ ˜ m j − ˜ m j +1 ( π j +1 f − π j f )) (cid:12)(cid:12)(cid:12) > η j , Ω n (cid:17) + τ − X j =0 P (cid:16) sup f ∈F (cid:12)(cid:12)(cid:12) G Wn (min (cid:8)(cid:12)(cid:12) ϕ ∨ ˜ m j +1 (∆ j +1 f ) (cid:12)(cid:12) , m j (cid:9) ) (cid:12)(cid:12)(cid:12) +2 √ n sup f ∈F k ∆ j +1 f { ∆ j +1 f> ˜ m j +1 } k ,n > η j , Ω n (cid:17) + τ − X j =0 P (cid:16) sup f ∈F (cid:12)(cid:12)(cid:12) G Wn (min (cid:8)(cid:12)(cid:12) ϕ ∨ ˜ m j − ˜ m j +1 (∆ j f ) (cid:12)(cid:12) , m j (cid:9) ) (cid:12)(cid:12)(cid:12) +2 √ n sup f ∈F k ∆ j f { ∆ j f> ˜ m j − ˜ m j +1 } 
k ,n > η j , Ω n (cid:17) =: R ∗ + R ∗ + R ∗ + R ∗ + R ∗ . (8.138)We now discuss the terms in (8.138) separately.04 • We have by deﬁnition of Theorem 5.1 that R ∗ ≤ P (cid:16) sup f ∈F | G Wn ( π f ) | > x , B n (˜ q ∗ ( M √ n D ∞ n a M )) (cid:17) ≤ N ( σ ◦ ) · sup f ∈F P (cid:16) | G Wn ( π f ) | > x , B n (˜ q ∗ ( M √ n D ∞ n a M )) (cid:17) ≤ exp( H ( σ ◦ )) · c exp (cid:16) − c ( x/ σ + Φ(˜ q ∗ ( M √ n D ∞ n a M )) M ( x/ √ n (cid:17) ≤ c exp (cid:16) − c x σ + Φ(˜ q ∗ ( M √ n D ∞ n a M )) M √ n (cid:17) . • We have by Lemma 8.11 that P (cid:16) sup f ∈F | G Wn ( ϕ ∧ ˜ m τ (∆ τ f )) | > η τ , B n (˜ q ∗ τ ) (cid:17) ≤ exp( H ( N τ +1 )) · c exp( − H ( N τ +1 )) ≤ c ∞ X j =0 exp( − H ( N j +1 )) ≤ c exp( − H ( σ ◦ ))(for the last inequality, see the more detailed calculation for R ∗ below). By theCauchy-Schwarz inequality, the deﬁnition of τ and (8.137), √ n sup f ∈F k ∆ τ f k ,n ≤ √ n k ∆ τ f k ,n ≤ √ nV (∆ τ f ) ≤ √ nδ τ ≤ ˜ I ( σ ◦ ) < x . We conclude that R ∗ ≤ c exp( − H ( σ ◦ )) . • We have by Lemma 8.11(i) that R ∗ ≤ τ − X j =0 P (cid:16) sup f ∈F (cid:12)(cid:12)(cid:12) G Wn ( ϕ ∧ ˜ m j − ˜ m j +1 ( π j +1 f − π j f )) (cid:12)(cid:12)(cid:12) > η j , B n (˜ q ∗ j ) (cid:17) ≤ τ − X j =0 N j +1 · sup f ∈F P (cid:16)(cid:12)(cid:12)(cid:12) G Wn ( ϕ ∧ ˜ m j − ˜ m j +1 ( π j +1 f − π j f )) (cid:12)(cid:12)(cid:12) > η j , B n (˜ q ∗ j ) (cid:17) ≤ τ − X j =0 exp( H ( N j +1 )) · c exp( − H ( ¯ N j +1 )) ≤ c τ − X j =0 exp( − H ( ¯ N j +1 )) ≤ c τ − X j =0 exp( − ¯ H ( δ j +1 )) ≤ c ∞ X j =0 exp( − j +1 ¯ H ( σ ◦ )) ≤ c exp( − ¯ H ( σ ◦ )) . mpirical process theory for locally stationary processes (cid:0) ∞ X j =0 exp( − j +1 ¯ H ( σ ◦ )) (cid:1) exp( ¯ H ( σ ◦ )) = ∞ X j =0 exp( − (4 j +1 −

1) ¯ H ( σ ◦ )) ≤ ∞ X j =0 exp( − (4 j +1 − ≤ . (8.139) • Similarly as for R ∗ , we have by Lemma 8.11(ii) that τ − X j =0 P (cid:16) sup f ∈F (cid:12)(cid:12)(cid:12) G Wn (min (cid:8)(cid:12)(cid:12) ϕ ∨ ˜ m j +1 (∆ j +1 f ) (cid:12)(cid:12) , m j (cid:9) ) (cid:12)(cid:12)(cid:12) > η j , B n (˜ q ∗ j ) (cid:17) ≤ τ − X j =0 N j +1 · c exp( − H ( ¯ N j +1 )) ≤ c exp( − ¯ H ( σ ◦ )) , and, since a M ≥ √ n sup f ∈F k ∆ j +1 f { ∆ j +1 f> ˜ m j +1 } k ,n ≤ D ∞ n D n δ j W ( H ( ¯ N j +1 )) < ˆ C n · δ j W ( H ( ¯ N j +1 )) ≤ η j . This shows that R ∗ ≤ c exp( − H ( σ ◦ )) . • Similarly as for R ∗ , we obtain τ − X j =0 P (cid:16) sup f ∈F (cid:12)(cid:12)(cid:12) G Wn (min (cid:8)(cid:12)(cid:12) ϕ ∨ ˜ m j − ˜ m j +1 (∆ j f ) (cid:12)(cid:12) , m j (cid:9) ) (cid:12)(cid:12)(cid:12) > η j , B n (˜ q ∗ j (cid:17) ≤ τ − X j =0 N j +1 · c exp( − H ( ¯ N j +1 )) ≤ c exp( − ¯ H ( σ ◦ )) . As in the proof of Theorem 3.7 (discussion of R therein), we see that 2( ˜ m j − ˜ m j +1 ) ≥ ˜ m j due to the fact that the inequality˜ r ( δ j D n ) − ˜ r ( δ j +1 D n ) ≥ ˜ r ( δ j D n ) − ˜ r ( δ j D n ) ≥ ˜ r ( δ j D n ) −

12 ˜ r ( δ j D n ) ≥

12 ˜ r ( δ j D n ) .

06 only requires δ j +1 ≤ δ j . Thus, since a M ≥ √ n sup f ∈F k ∆ j f { ∆ j f> ˜ m j − ˜ m j +1 } k ,n ≤ √ n sup f ∈F k ∆ j f { ∆ j f> ˜ m j } k ,n ≤ D ∞ n D n δ j W ( H ( ¯ N j +1 )) < C n δ j W ( H ( ¯ N j +1 )) ≤ η j . This shows that R ∗ ≤ c exp( − ¯ H ( σ ◦ )) . By plugging in the above upper bounds for R ∗ i , i ∈ { , ..., } into (8.138) and using(8.132), we obtain P (cid:16) sup f ∈F | G Wn ( f ) | > x, Ω n (cid:17) ≤ c exp (cid:0) − c · x σ + Φ(˜ q ∗ ( M √ n D ∞ n a M )) Mx √ n (cid:17) . (8.140)Discussion of the residual term: By Lemma 8.11(ii), we have that: P (Ω cn ) ≤ P ( B n (˜ q ∗ ( M √ n D ∞ n a M )) c ) + ∞ X j =0 P ( B n (˜ q ∗ j ) c ) ≤ a M + 4 a M ∞ X j =0 ψ ( δ j ) ≤ a M ∞ X j =0 ψ ( δ j ) . Due to ∞ X j =0 ψ ( δ j ) = ∞ X j =0 δ j − δ j +1 Z δ j δ j +1 ψ ( δ j ) dx ≤ ∞ X j =0 Z δ j δ j +1 δ j ψ ( δ j ) dx ≤ Z σ ◦ xψ ( x ) dx ≤ σ ◦ ) − ∨ e e )) , the result follows. Proof of Lemma 6.3.

Put D v,n ( u ) = √ hK h ( u − v ). By (A1) and Assumption 6.2,Assumption 2.5 is fulﬁlled for F j with ν = 2 and ∆( k ) = O ( δ X M ( k )), C R = 1 + k max { C X , } M . mpirical process theory for locally stationary processes K is Lipschitz continuous and (A2) holds, we havesup | v − v ′ |≤ n − , | θ − θ ′ | ≤ n − (cid:12)(cid:12)(cid:0) ∇ jθ L n,h ( v, θ ) − E ∇ jθ L n,h ( v, θ ) (cid:1) − (cid:0) ∇ jθ L n,h ( v ′ , θ ′ ) − E ∇ jθ L n,h ( v ′ , θ ′ ) (cid:1)(cid:12)(cid:12) ∞ ≤ sup | v − v ′ |≤ n − , | θ − θ ′ | ≤ n − C R h (cid:2) L K | v − v ′ | + C Θ | θ − θ ′ | (cid:3) × n n X i = k (cid:0) | Z i | M + E | Z i | M (cid:1) = O p ( n − ) . Let Θ n be a grid approximation of Θ such that for any θ ∈ Θ, there exists some θ ′ ∈ Θ n such that | θ − θ ′ | ≤ n − . Since Θ ⊂ R d Θ , it is possible to choose Θ n such that | Θ n | = O ( n − d Θ ). Furthermore, deﬁne V n := { in − : i = 1 , ..., n } as an approximation of [0 , F ′ j = { f v,θ : θ ∈ Θ n , v ∈ V n } yields for j ∈ { , , } thatsup v ∈ [ h , − h ] (cid:12)(cid:12) ∇ jθ L n,h ( v, θ ) − E ∇ jθ L n,h ( v, θ ) (cid:12)(cid:12) ∞ = O p (cid:0) τ n (cid:1) . (8.141)Put ˜ L n,h ( v, θ ) = n P ni =1 K h ( i/n − v ) ℓ θ ( ˜ Z i ( v )). With (A1) it is easy to see that (cid:12)(cid:12) E ∇ jθ L n,h ( v, θ ) − E ∇ jθ ˜ L n,h ( v, θ ) (cid:12)(cid:12) ∞ ≤ d j Θ C R n n X i =1 | K h ( i/n − v ) | · k| Z i − ˜ Z i ( v ) | k M × (cid:0) k| Z i | k M − M + k| ˜ Z i ( v ) | k M − M (cid:1) ≤ d j Θ C R | K | ∞ C X (1 + 2 C M − X ) (cid:0) n − + h (cid:1) . (8.142)Finally, since K has bounded variation and R K ( u ) du = 1, uniformly in v ∈ [ h , − h ] itholds that E ∇ jθ ˜ L n,h ( v, θ ) = 1 n n X i =1 K h ( i/n − v ) E ∇ jθ ℓ θ ( ˜ Z ( v )) = E ∇ jθ ℓ θ ( ˜ Z ( v ))+ O (( nh ) − ) . 
(8.143)From (8.141), (8.142) and (8.143) we obtainsup v ∈ [ h , − h ] sup θ ∈ Θ (cid:12)(cid:12) ∇ jθ L n,h ( v, θ ) − E ∇ jθ ℓ θ ( ˜ Z ( v )) (cid:12)(cid:12) ∞ = O p ( τ ( j ) n ) , (8.144)08where τ ( j ) n := τ n + ( nh ) − + h, j ∈ { , } , τ (1) n := τ n + ( nh ) − + B h . By (A3) and (8.144) for j = 0, we obtain with standard arguments that if τ (0) n = o (1),sup v ∈ [ h , − h ] (cid:12)(cid:12) ˆ θ n,h ( v ) − θ ( v ) (cid:12)(cid:12) ∞ = o p (1) . Since ˆ θ n,h ( v ) is a minimizer of θ L n,h ( v, θ ) and ℓ θ is twice continuously diﬀerentiable,we have the representationˆ θ n,h ( v ) − θ ( v ) = −∇ θ L n,h ( v, ¯ θ v ) − ∇ θ L n,h ( v, θ ( v )) , (8.145)where ¯ θ v ∈ Θ fulﬁlls | ¯ θ v − θ ( v ) | ∞ ≤ | ˆ θ n,h ( v ) − θ ( v ) | ∞ = o p (1).By (A2), we have (cid:12)(cid:12) E ∇ θ ℓ θ ( ˜ Z ( v )) (cid:12)(cid:12) θ = θ ( v ) − E ∇ θ ℓ θ ( ˜ Z ( v )) (cid:12)(cid:12) θ =¯ θ v (cid:12)(cid:12) ∞ = O ( | θ ( v ) − ¯ θ v | ) = o p (1) . and thus with (8.144),sup v ∈ [ h , − h ] (cid:12)(cid:12) ∇ θ L n,h ( v, ¯ θ v ) − E ∇ θ ℓ θ ( ˜ Z ( v )) (cid:12)(cid:12) θ = θ ( v ) (cid:12)(cid:12) ∞ = O p ( τ (2) n ) + o p (1) . (8.146)By (A3) and the dominated convergence theorem, E ∇ θ ℓ ( ˜ Z ( v )) = ∇ θ E ℓ ( ˜ Z ( v )) = 0. By(8.144),sup v ∈ [ h , − h ] (cid:12)(cid:12) ∇ θ L n,h ( v, θ ( v )) (cid:12)(cid:12) ∞ = sup v ∈ [ h , − h ] (cid:12)(cid:12) ∇ θ L n,h ( v, θ ( v )) − E ∇ θ ℓ ( ˜ Z ( v )) (cid:12)(cid:12) ∞ = O p ( τ (1) n ) . (8.147)Inserting (8.146) and (8.147) into (8.145), we obtainsup v ∈ [ h , − h ] (cid:12)(cid:12) ˆ θ n,h ( v ) − θ ( u ) (cid:12)(cid:12) ∞ = O p ( τ (1) n ) . This yields an improved version of (8.146):sup v ∈ [ h , − h ] (cid:12)(cid:12) ∇ θ L n,h ( v, ¯ θ v ) − E ∇ θ ℓ θ ( ˜ Z ( v )) (cid:12)(cid:12) θ = θ ( v ) (cid:12)(cid:12) ∞ = O p ( τ (2) n ) . (8.148)Inserting (8.147) and (8.148) into (8.145), we obtain the assertion. Details of Example 6.8.

We first show that the supremum over $x \in \mathbb{R}$, $v \in [0,1]$ can be approximated by a supremum over grids $x \in X_n$, $v \in V_n$. For Q >

0, put c n = Qn s . Deﬁne the event A n = { sup i =1 ,...,n | X i | ≤ c n } . Thenby Markov’s inequality, P ( A cn ) ≤ n · k X i k s s Q s c sn ≤ C sX nc sn (8.149)is arbitrarily small for Q large enough.Put ˆ g ◦ n,h ( x, v ) := n P ni =1 K h ( i/n − v ) ˜ K h ( X i − x ) {| X i |≤ c n } . ThenOn A n , ˆ g ◦ n,h ( · ) = ˆ g n,h ( · ) . (8.150)Furthermore, p nh h (cid:12)(cid:12) E ˆ g n,h ( x, v ) − E ˆ g ◦ n,h ( x, v ) (cid:12)(cid:12) ≤ √ nh h | K | ∞ nh n X i =1 E [ ˜ K h ( X i − x ) {| X i | >c n } ] ≤ p nh h ( h h ) − | K | ∞ c − sn sup i E [ ˜ K ( X i − xh ) | X i | s ] ≤ Q − s ( nh h ) − / | ˜ K | ∞ | K | ∞ C sX = o (1) . (8.151)For | x | > c n , we have ˜ K h ( X i − x ) {| X i |≤ c n } ≤ h − ( c n h ) − p K = h p K − c − p K n and thus √ nh | ˆ g ◦ n,h ( x, v ) − E ˆ g ◦ n,h ( x, v ) | ≤ | K | ∞ C ˜ K h / ( nh ) / h p K − c − p K n ≤ h p K Q p K ( nh h ) / = o (1) . (8.152)By (8.150), (8.151) and (8.152), we have on A n , p nh h sup x ∈ R ,v ∈ [0 , | ˆ g n,h ( x, v ) − E ˆ g n,h ( x, v ) | = p nh h sup x ∈ R ,v ∈ [0 , | ˆ g ◦ n,h ( x, v ) − E ˆ g ◦ n,h ( x, v ) | + o p (1)= p nh h sup | x |≤ c n ,v ∈ [0 , | ˆ g ◦ n,h ( x, v ) − E ˆ g ◦ n,h ( x, v ) | + o p (1)= p nh h sup | x |≤ c n ,v ∈ [0 , | ˆ g n,h ( x, v ) − E ˆ g n,h ( x, v ) | + o p (1) . (8.153)Let X n = { in − : i ∈ {− ⌈ c n ⌉ n , ..., ⌈ c n ⌉ n }} be a grid that approximates each x ∈ [ − c n , c n ] with precision n − , and V n = { in − : i = 1 , ..., n } . Since K, ˜ K are Lipschitzcontinuous, p nh h sup | x − x ′ |≤ n − , | v − v ′ |≤ n − (cid:12)(cid:12)(cid:0) ˆ g n,h ( x, v ) − E ˆ g n,h ( x, v ) (cid:1) − (cid:0) ˆ g n,h ( x ′ , v ) − E ˆ g n,h ( x ′ , v ) (cid:1)(cid:12)(cid:12) ≤ √ n √ h h sup | x − x ′ |≤ n − , | v − v ′ |≤ n − h L ˜ K | K | ∞ | x − x ′ | h + L K | ˜ K | ∞ | v − v ′ | h i = O ( n − ) . 
(8.154)10We conclude from (8.149), (8.153) and (8.154) that p nh h sup x ∈ R ,v ∈ [0 , | ˆ g n,h ( x, v ) − E ˆ g n,h ( x, v ) | = p nh h sup x ∈X n ,v ∈ V n | ˆ g n,h ( x, v ) − E ˆ g n,h ( x, v ) | + O p (1) (8.155)It was already shown that Assumption 2.8 is satisﬁed. Furthermore, we can choose D n = | K | ∞ , D ∞ ν ,n = | K | ∞ √ h with ν = ∞ , and ¯ F ( z, u ) = sup f ∈F ¯ f ( z, u ) ≤ | ˜ K | ∞ √ h =: C ¯ F ,n . Notethat E [( p h ˜ K h ( X i − x )) ] = E Z ˜ K ( u ) g ε (cid:0) x + ωh − m ( X i − , in ) σ ( X i − , i/n ) (cid:1) dω ≤ | g ε | ∞ Z ˜ K ( u ) du, therefore k f x,v k ,n ≤ D n | g ε | ∞ Z ˜ K ( u ) du, which implies σ := sup n ∈ N sup f ∈F V ( f ) < ∞ . Due to ∆( k ) = O ( j − αs ), the last conditionin (4.3) is fulﬁlled if sup n ∈ N log( n ) nh h α ( s ∧

12 ) α ( s ∧

12 ) − < ∞ . By Corollary 4.3, we have p nh h sup x ∈X n ,v ∈ V n (cid:12)(cid:12) ˆ g n,h ( x ) − E ˆ g n,h ( x, v ) (cid:12)(cid:12) = sup f ∈F | G n ( f ) | = O p ( p log |F| ) = O ( p log( n )) . With (8.155), it follows that p nh h sup x ∈ R ,v ∈ [0 , | ˆ g n,h ( x, v ) − E ˆ g n,h ( x, v ) | = O p (cid:0)p log( n ) (cid:1) . We have for the distribution function of X i , G X i ( x ) = E G ε (cid:0) x − m ( X i − , i/n ) σ ( X i − , i/n ) (cid:1) , (8.156)and after diﬀerentiating we obtain the density of X i , g X i ( x ) = E Z ˜ K ( ω ) 1 σ ( X i − , i/n ) g ε (cid:0) x − m ( X i − , i/n ) σ ( X i − , i/n ) (cid:1) dω. (8.157) mpirical process theory for locally stationary processes (cid:12)(cid:12) E ˆ g n,h ( x, v ) − n n X i =1 K h ( i/n − v ) g X i ( x ) (cid:12)(cid:12) ≤ n n X i =1 K h ( i/n − v ) · E Z ˜ K ( ω ) 1 σ ( X i − , i/n ) × (cid:12)(cid:12)(cid:12) g ε (cid:0) x + ωh − m ( X i − , i/n ) σ ( X i − , i/n ) (cid:1) − g ε (cid:0) x − m ( X i − , i/n ) σ ( X i − , i/n ) (cid:1)(cid:12)(cid:12)(cid:12) dω ≤ | g ′ | ∞ | K | ∞ σ min Z ˜ K ( ω ) | ω | dω · h . (8.158)A similar derivation as in (8.157) yields g ˜ X i ( v ) ( x ) = E σ ( ˜ X i − ( v ) , v ) g ε (cid:0) x − m ( ˜ X i − ( v ) , v ) σ ( ˜ X i − ( v ) , v ) (cid:1) . By bounded variation of K ( · ), (6.12) and (6.9), we have with some constant C > (cid:12)(cid:12) n n X i =1 K h ( i/n − v ) g X i ( x ) − g ˜ X i ( v ) ( x ) (cid:12)(cid:12) = 1 n n X i =1 K h ( i/n − v ) (cid:12)(cid:12) g X i ( x ) − g ˜ X i ( v ) ( x ) (cid:12)(cid:12) + O (( nh ) − ) ≤ Cn n X i =1 K h ( i/n − v ) (cid:0) E | m ( X i − , i/n ) − m ( ˜ X i − ( v ) , v ) | s ∧ + E | σ ( X i − , i/n ) − σ ( ˜ X i − ( v ) , v ) | s ∧ (cid:1) + O (( nh ) − ) . = O ( n − ς ( s ∧ + h ς ( s ∧ + ( nh ) − ) . (8.159) Details of Example 6.10.

Calculation of bracketing numbers:

By the smoothness assumptions on $m$, $|m(x,u)|\le C_m+\chi_m|x|$. Thus $\|m(X_{i-1},u)\|_s^s\le C_m^s+\chi_m^s\|X_{i-1}\|_s^s\le C_m^s+\chi_m^sC_X^s =: C_M^s$, and similarly, $\|\sigma(X_{i-1},u)\|_s^s\le C_\sigma^s+\chi_\sigma^sC_X^s =: C_\Sigma^s$.

Let $\gamma>0$ and $C_\gamma = \max\{C_M,C_\Sigma\}\big(\frac{\gamma^2}{3|K|_\infty}\big)^{-1/s}$. Define $A := \{|m(X_{i-1},\frac in)|\le C_\gamma\}\cap\{\sigma(X_{i-1},\frac in)\le C_\gamma\}$. By Markov's inequality,

$$P(A^c)\le\Big(\frac{\|m(X_{i-1},\frac in)\|_s}{C_\gamma}\Big)^s+\Big(\frac{\|\sigma(X_{i-1},\frac in)\|_s}{C_\gamma}\Big)^s\le2\Big(\frac{\max\{C_M,C_\Sigma\}}{C_\gamma}\Big)^s = \frac{2\gamma^2}{3|K|_\infty}. \tag{8.160}$$

Define $x_N := C_\gamma+C_\varepsilon C_\gamma\big(\frac{\gamma^2}{3|K|_\infty}\big)^{-1/s}$, $x_{N+1} := \infty$, $x_{-(N+1)} := -\infty$, $x_{-N} := -x_N$ and

$$x_j := -x_N+\frac{\gamma^2\sigma_{\min}}{|g_\varepsilon|_\infty|K|_\infty}(N+j),\qquad j = -(N-1),\dots,N-1,$$

where $N := \big\lceil\frac{x_N|g_\varepsilon|_\infty|K|_\infty}{\gamma^2\sigma_{\min}}\big\rceil$. Then

$$x_N-x_{N-1} = 2x_N-\frac{\gamma^2\sigma_{\min}}{|g_\varepsilon|_\infty|K|_\infty}(2N-1)\le2x_N-\frac{\gamma^2\sigma_{\min}}{|g_\varepsilon|_\infty|K|_\infty}\Big(\frac{2x_N|g_\varepsilon|_\infty|K|_\infty}{\gamma^2\sigma_{\min}}-1\Big) = \frac{\gamma^2\sigma_{\min}}{|g_\varepsilon|_\infty|K|_\infty},$$

that is, the brackets $[f_{x_{j-1}},f_{x_j}] = \{f\in\mathcal F: f_{x_{j-1}}\le f\le f_{x_j}\}$, $j = -N,\dots,N+1$, cover $\mathcal F$. We now show that $\|f_{x_{j-1}}-f_{x_j}\|_{2,n}\le\gamma$, $j = -N,\dots,N+1$.

By Markov's inequality, we have for $x>0$ that $G_\varepsilon(x) = P(\varepsilon\le x)\ge1-\frac{\|\varepsilon\|_s^s}{x^s}\ge1-C_\varepsilon^sx^{-s}$. On the event $A$, we have

$$G_\varepsilon\Big(\frac{x_N-m(X_{i-1},\frac in)}{\sigma(i/n)}\Big)\ge G_\varepsilon\Big(\frac{C_\gamma-m(X_{i-1},\frac in)}{\sigma(i/n)}+\frac{C_\varepsilon C_\gamma}{\sigma(i/n)}\Big(\frac{\gamma^2}{3|K|_\infty}\Big)^{-1/s}\Big)\ge G_\varepsilon\Big(C_\varepsilon\Big(\frac{\gamma^2}{3|K|_\infty}\Big)^{-1/s}\Big)\ge1-\frac{\gamma^2}{3|K|_\infty}. \tag{8.161}$$

If $j = N+1$, we have by (8.160) and (8.161):

$$\begin{aligned}
E\Big[G_\varepsilon\Big(\frac{x_j-m(X_{i-1},\frac in)}{\sigma(i/n)}\Big)-G_\varepsilon\Big(\frac{x_{j-1}-m(X_{i-1},\frac in)}{\sigma(i/n)}\Big)\Big]^{1/2} &\le\Big(E\Big[\Big(1-G_\varepsilon\Big(\frac{x_N-m(X_{i-1},\frac in)}{\sigma(i/n)}\Big)\Big)\mathbb 1_A\Big]+P(A^c)\Big)^{1/2}\\
&\le\Big(\frac{\gamma^2}{3|K|_\infty}+\frac{2\gamma^2}{3|K|_\infty}\Big)^{1/2} = \frac{\gamma}{|K|_\infty^{1/2}},
\end{aligned}$$

that is, $\|f_{x_{N+1}}-f_{x_N}\|_{2,n}\le\gamma$. A similar calculation holds for $j = -N$.
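The bracket grid can be traced numerically. The following Python sketch assumes the constants $C_\gamma = \max\{C_M,C_\Sigma\}\big(\frac{\gamma^2}{3|K|_\infty}\big)^{-1/s}$, $x_N = C_\gamma+C_\varepsilon C_\gamma\big(\frac{\gamma^2}{3|K|_\infty}\big)^{-1/s}$ and the step $\gamma^2\sigma_{\min}/(|g_\varepsilon|_\infty|K|_\infty)$; the numeric inputs ($C_M$, $C_\Sigma$, $C_\varepsilon$, $|K|_\infty$, $|g_\varepsilon|_\infty$, $\sigma_{\min}$, $s$, $\gamma$) are hypothetical. It builds $x_{-N},\dots,x_N$ and checks that consecutive points are at most one step apart, which is what the bracket bound needs.

```python
import math


def bracket_points(gamma, C_M, C_Sigma, C_eps, K_sup, g_sup, sigma_min, s):
    """Points x_{-N}, ..., x_N of the bracket construction (assumed constants).

    t = (gamma^2 / (3 |K|_inf))^(-1/s), C_gamma = max(C_M, C_Sigma) * t,
    x_N = C_gamma + C_eps * C_gamma * t, step = gamma^2 sigma_min / (|g|_inf |K|_inf),
    N = ceil(x_N / step), x_j = -x_N + step * (N + j) for |j| <= N - 1.
    """
    t = (gamma**2 / (3.0 * K_sup)) ** (-1.0 / s)
    C_gamma = max(C_M, C_Sigma) * t
    x_N = C_gamma + C_eps * C_gamma * t
    step = gamma**2 * sigma_min / (g_sup * K_sup)
    N = math.ceil(x_N / step)
    interior = [-x_N + step * (N + j) for j in range(-(N - 1), N)]
    return [-x_N] + interior + [x_N], step, N


pts, step, N = bracket_points(gamma=0.1, C_M=1.0, C_Sigma=1.0, C_eps=1.0,
                              K_sup=1.0, g_sup=0.4, sigma_min=0.5, s=4.0)
gaps = [b - a for a, b in zip(pts, pts[1:])]
# Every gap is at most one step; together with x_{-(N+1)} = -inf and
# x_{N+1} = +inf this gives 2N + 2 brackets.
print(N, len(pts), max(gaps) <= step * (1 + 1e-9))
```

Only the last gap $x_N-x_{N-1}$ can differ from the step, and the computation above shows it never exceeds it.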
For $j\in\{-N+1,\dots,N\}$, we have by definition of $x_j$, $\|f_{x_{j-1}}-f_{x_j}\|_{2,n} = \Big(\frac{1}{nh}\sum_{i=1}^nK\big(\frac{i/n-v}{h}\big)E\big[(\mathbb 1\{x_{j-1}$

0. We conclude that

$$\sqrt{H(\gamma,\mathcal F,\|\cdot\|_{2,n})}\le\sqrt{\log(C_N)+(2+\tfrac2s)\log(\gamma^{-1})},$$

that is, as long as $\alpha(s\wedge1)>1$, $\sup_{n\in\mathbb N}\int_0^1\sqrt{H(\varepsilon,\mathcal F,V)}\,d\varepsilon<\infty$. Then the conditions of Corollary 4.14 are fulfilled and we obtain that (6.15) converges to $[\mathbb G(f)]_{f\in\mathcal F}$ in $\ell^\infty(\mathcal F)$.

Similarly as in (8.156), we have

$$G_{X_i}(x) = E\,G_\varepsilon\Big(\frac{x-m(X_{i-1},i/n)}{\sigma(X_{i-1},i/n)}\Big),\qquad G_{\tilde X_i(v)}(x) = E\,G_\varepsilon\Big(\frac{x-m(\tilde X_{i-1}(v),v)}{\sigma(\tilde X_{i-1}(v),v)}\Big).$$

By bounded variation of $K$, we obtain with a similar calculation as in (8.159) that

$$\big|E\hat G_{n,h}(x,v)-G_{\tilde X_i(v)}(x)\big|\le\frac1n\sum_{i=1}^nK_h(i/n-v)\big|G_{X_i}(x)-G_{\tilde X_i(v)}(x)\big|+O((nh)^{-1}) = O\big(n^{-\varsigma(s\wedge1)}+h^{\varsigma(s\wedge1)}+(nh)^{-1}\big).$$

$V$-norm and connected quantities

Lemma 8.13 (Summation of polynomial and geometric decay). Let $\alpha>1$ and $q\in\mathbb N$. Then it holds that

(i) $\frac{1}{\alpha-1}q^{-\alpha+1}\le\sum_{j=q}^\infty j^{-\alpha}\le\frac{\max\{\alpha,2^{\alpha-1}\}}{\alpha-1}q^{-\alpha+1}$.

(ii) For $\sigma>0$, $\kappa_0\ge1$ and $\rho\in(0,1)$,

$$b_{\rho,\kappa_0,l}\cdot\sigma\cdot\log(\sigma^{-1})\le\sum_{j=1}^\infty\min\{\sigma,\kappa_0\rho^j\}\le b_{\rho,\kappa_0}\cdot\sigma\cdot\log(\sigma^{-1}\vee e),$$

$$b_{\alpha,\kappa_0,l}\cdot\sigma\cdot\sigma^{-1/\alpha}\le\sum_{j=1}^\infty\min\{\sigma,\kappa_0j^{-\alpha}\}\le b_{\alpha,\kappa_0}\cdot\sigma\cdot\max\{\sigma^{-1/\alpha},1\},$$

where $b_{\rho,\kappa_0}$, $b_{\rho,\kappa_0,l}$, $b_{\alpha,\kappa_0}$, $b_{\alpha,\kappa_0,l}$ are constants only depending on $\rho$, $\kappa_0$, $\alpha$.

Proof of Lemma 8.13. (i) Upper bound: If $q\ge2$, then

$$\sum_{j=q}^\infty j^{-\alpha} = \sum_{j=q}^\infty\int_{j-1}^jj^{-\alpha}\,dx\le\sum_{j=q}^\infty\int_{j-1}^jx^{-\alpha}\,dx = \int_{q-1}^\infty x^{-\alpha}\,dx = \frac{1}{-\alpha+1}x^{-\alpha+1}\Big|_{q-1}^\infty = \frac{1}{\alpha-1}(q-1)^{-\alpha+1} = \frac{1}{\alpha-1}q^{-\alpha+1}\cdot\Big(\frac{q-1}{q}\Big)^{-\alpha+1}\le\frac{2^{\alpha-1}}{\alpha-1}q^{-\alpha+1}.$$

If $q=1$, then $\sum_{j=q}^\infty j^{-\alpha} = 1+\sum_{j=q+1}^\infty j^{-\alpha}\le1+\frac{1}{\alpha-1}q^{-\alpha+1} = \frac{\alpha}{\alpha-1}$.

Lower bound: Using similar decomposition arguments as above, we have

$$\sum_{j=q}^\infty j^{-\alpha}\ge\sum_{j=q}^\infty\int_j^{j+1}x^{-\alpha}\,dx = \int_q^\infty x^{-\alpha}\,dx = \frac{1}{-\alpha+1}x^{-\alpha+1}\Big|_q^\infty = \frac{1}{\alpha-1}q^{-\alpha+1}.$$

(ii) • Exponential decay:

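Upper bound: First let $a := \max\big\{\big\lfloor\frac{\log(\sigma/\kappa_0)}{\log(\rho)}\big\rfloor,0\big\}+1$. Then we have

$$\begin{aligned}
\sum_{j=0}^\infty\min\{\sigma,\kappa_0\rho^j\} &\le\sum_{j=0}^{a-1}\sigma+\kappa_0\sum_{j=a}^\infty\rho^j = a\sigma+\kappa_0\frac{\rho^a}{1-\rho}\le a\sigma+\frac{\kappa_0}{1-\rho}\min\Big\{\frac{\sigma}{\kappa_0},1\Big\}\le a\sigma+\frac{\sigma}{1-\rho}\\
&\le\sigma\cdot\Big[\frac{1}{\log(\rho^{-1})}\max\{\log(\kappa_0/\sigma),0\}+\frac{2}{1-\rho}\Big]\\
&\le\sigma\cdot\Big[\frac{1}{\log(\rho^{-1})}\Big(\max\{\log(\sigma^{-1}),0\}+\log(\kappa_0)\vee0\Big)+\frac{2}{1-\rho}\Big]\le b_{\rho,\kappa_0}\cdot\sigma\cdot\log(\sigma^{-1}\vee e),
\end{aligned}$$

where $b_{\rho,\kappa_0} := \frac{2(\log(\kappa_0)\vee1)}{\log(\rho^{-1})}\Big[1+\frac{\log(\rho^{-1})}{1-\rho}\Big]$.

Lower bound: Put $\beta(q) = \kappa_0\sum_{j=q}^\infty\rho^j = \frac{\kappa_0}{1-\rho}\rho^q$. Then

$$\sum_{j=1}^\infty\min\{\sigma,\kappa_0\rho^j\}\ge\sigma(\hat q-1)+\beta(\hat q),$$

where $\hat q = \min\{q\in\mathbb N:\frac{\sigma}{\kappa_0}\ge\rho^q\}$. We have $\hat q\ge\frac{\log(\sigma/\kappa_0)}{\log(\rho)} =: q_0$ and $\hat q\le q_0+1$. Thus

$$\sum_{j=1}^\infty\min\{\sigma,\kappa_0\rho^j\}\ge\sigma(q_0-1)+\beta(q_0+1).$$

Now consider the case $\frac{\sigma}{\kappa_0}<\rho^2$, that is, $\frac{\log(\sigma/\kappa_0)}{\log(\rho)}>2$. Then $q_0-1\ge\frac{q_0}{2}$, and we obtain

$$\sum_{j=1}^\infty\min\{\sigma,\kappa_0\rho^j\}\ge\frac\sigma2\cdot\frac{\log(\sigma/\kappa_0)}{\log(\rho)}+\frac{\kappa_0\rho}{1-\rho}\rho^{\frac{\log(\sigma/\kappa_0)}{\log(\rho)}} = \frac12\sigma\frac{\log(\sigma/\kappa_0)}{\log(\rho)}+\frac{\rho}{1-\rho}\sigma\ge\frac{1}{2\log(\rho^{-1})}\,\sigma\log(\sigma^{-1}\kappa_0),$$

that is, since $\kappa_0\ge1$, the assertion holds with $b_{\rho,\kappa_0,l} := \frac{1}{2\log(\rho^{-1})}$.

• Polynomial decay: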

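Upper bound: Let $a := \lfloor(\frac{\sigma}{\kappa_0})^{-1/\alpha}\rfloor+1\ge(\frac{\sigma}{\kappa_0})^{-1/\alpha}$. Then we have by the proof of (i):

$$\begin{aligned}
\sum_{j=1}^\infty\min\{\sigma,\kappa_0j^{-\alpha}\} &\le\sum_{j=1}^a\sigma+\kappa_0\sum_{j=a+1}^\infty j^{-\alpha}\le a\sigma+\frac{\kappa_0}{\alpha-1}a^{-\alpha+1}\le a\sigma+\frac{\kappa_0^{1/\alpha}}{\alpha-1}\sigma^{\frac{\alpha-1}{\alpha}}\\
&\le\sigma\cdot\Big[\kappa_0^{1/\alpha}\sigma^{-1/\alpha}+1+\frac{\kappa_0^{1/\alpha}}{\alpha-1}\sigma^{-1/\alpha}\Big]\le\sigma\cdot\Big[\frac{\alpha}{\alpha-1}\kappa_0^{1/\alpha}\sigma^{-1/\alpha}+1\Big]\le b_{\alpha,\kappa_0}\cdot\sigma\cdot\max\{\sigma^{-1/\alpha},1\},
\end{aligned}$$

where $b_{\alpha,\kappa_0} := \frac{2\alpha}{\alpha-1}(\kappa_0\vee1)^{1/\alpha}$.

Lower bound: Put $\beta(q) = \kappa_0\sum_{j=q}^\infty j^{-\alpha}$. By (i), $\beta(q)\ge\frac{\kappa_0}{\alpha-1}q^{-\alpha+1}$. Then

$$\sum_{j=1}^\infty\min\{\sigma,\kappa_0j^{-\alpha}\}\ge\min_{q\in\mathbb N}\{\sigma q+\beta(q)\}\ge\min_{q\in\mathbb N}\Big\{\sigma q+\frac{\kappa_0}{\alpha-1}q^{-\alpha+1}\Big\}.$$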

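Elementary analysis yields that the minimum over $q\in(0,\infty)$, which is a lower bound for the minimum over $\mathbb N$, is achieved for $q = \kappa_0^{1/\alpha}\cdot\sigma^{-1/\alpha} = (\frac{\kappa_0}{\sigma})^{1/\alpha}$, that is,

$$\sum_{j=1}^\infty\min\{\sigma,\kappa_0j^{-\alpha}\}\ge\frac{\alpha}{\alpha-1}\kappa_0^{1/\alpha}\cdot\sigma^{\frac{\alpha-1}{\alpha}},$$

and the assertion holds with $b_{\alpha,\kappa_0,l} := \frac{\alpha}{\alpha-1}\kappa_0^{1/\alpha}$.

Lemma 8.14 (Values of $q^*$, $r(\delta)$).

• Polynomial decay $\Delta(j) = \kappa j^{-\alpha}$ ($\alpha>1$). Then there exist constants $c^{(i)}_{\alpha,\kappa},C^{(i)}_{\alpha,\kappa}>0$, $i=1,2$, only depending on $\kappa,\alpha$ such that

$$c^{(1)}_{\alpha,\kappa}\max\{x^{-1/\alpha},1\}\le q^*(x)\le C^{(1)}_{\alpha,\kappa}\max\{x^{-1/\alpha},1\}\quad\text{and}\quad c^{(2)}_{\alpha,\kappa}\min\{\delta^{\frac{\alpha}{\alpha-1}},\delta\}\le r(\delta)\le C^{(2)}_{\alpha,\kappa}\min\{\delta^{\frac{\alpha}{\alpha-1}},\delta\}.$$

• Geometric decay $\Delta(j) = \kappa\rho^j$ ($\rho\in(0,1)$). Then there exist constants $c^{(i)}_{\rho,\kappa},C^{(i)}_{\rho,\kappa}>0$, $i=1,2$, only depending on $\kappa,\rho$ such that

$$c^{(1)}_{\rho,\kappa}\max\{\log(x^{-1}),1\}\le q^*(x)\le C^{(1)}_{\rho,\kappa}\max\{\log(x^{-1}),1\}\quad\text{and}\quad c^{(2)}_{\rho,\kappa}\frac{\delta}{\log(\delta^{-1}\vee e)}\le r(\delta)\le C^{(2)}_{\rho,\kappa}\frac{\delta}{\log(\delta^{-1}\vee e)}.$$

Proof of Lemma 8.14. (i) By Lemma 8.13(i), $\beta_{norm}(q) = \frac{\beta(q)}{q}\in[c_{\alpha,\kappa}q^{-\alpha},C_{\alpha,\kappa}q^{-\alpha}]$ with $c_{\alpha,\kappa} = \frac{\kappa}{\alpha-1}$, $C_{\alpha,\kappa} = \frac{\kappa\max\{\alpha,2^{\alpha-1}\}}{\alpha-1}$. In the following we assume w.l.o.g. that $C_{\alpha,\kappa}>1$ and $c_{\alpha,\kappa}<1$.

• $q^*(x)$, upper bound: For any $x>0$,

$$q^*(x) = \min\{q\in\mathbb N:\beta_{norm}(q)\le x\}\le\min\Big\{q\in\mathbb N:q\ge\Big(\frac{x}{C_{\alpha,\kappa}}\Big)^{-1/\alpha}\Big\} = \Big\lceil\Big(\frac{x}{C_{\alpha,\kappa}}\Big)^{-1/\alpha}\Big\rceil.$$

In particular, we obtain $q^*(x)\le(\frac{x}{C_{\alpha,\kappa}})^{-1/\alpha}+1\le2C_{\alpha,\kappa}^{1/\alpha}\max\{x^{-1/\alpha},1\}$. The assertion holds with $C^{(1)}_{\alpha,\kappa} := 2\max\{C_{\alpha,\kappa},1\}^{1/\alpha}$.

• $q^*(x)$, lower bound: Similarly to above, $q^*(x)\ge\lceil(\frac{x}{c_{\alpha,\kappa}})^{-1/\alpha}\rceil\ge(\frac{x}{c_{\alpha,\kappa}})^{-1/\alpha} = c_{\alpha,\kappa}^{1/\alpha}x^{-1/\alpha}$. On the other hand, $q^*(x)\ge1\ge c_{\alpha,\kappa}^{1/\alpha}$, which yields the assertion with $c^{(1)}_{\alpha,\kappa} = \min\{c_{\alpha,\kappa},1\}^{1/\alpha}$.

• $r(\delta)$, upper bound: Put $r = 2^{\frac{\alpha}{\alpha-1}}c_{\alpha,\kappa}^{-\frac{1}{\alpha-1}}\delta^{\frac{\alpha}{\alpha-1}}$. Then we have

$$q^*(r)\,r\ge\Big\lceil\Big(\frac{r}{c_{\alpha,\kappa}}\Big)^{-1/\alpha}\Big\rceil r = 2^{\frac{\alpha}{\alpha-1}}c_{\alpha,\kappa}^{-\frac{1}{\alpha-1}}\Big\lceil2^{-\frac{1}{\alpha-1}}c_{\alpha,\kappa}^{\frac{1}{\alpha-1}}\delta^{-\frac{1}{\alpha-1}}\Big\rceil\delta^{\frac{\alpha}{\alpha-1}}\ge2\delta>\delta.$$

By definition of $r(\cdot)$, $r(\delta)\le r$. It was already shown in Lemma 3.6(i) that $r(\delta)\le\delta$ holds for all $\delta>0$. We obtain the assertion with $C^{(2)}_{\alpha,\kappa} = 2^{\frac{\alpha}{\alpha-1}}c_{\alpha,\kappa}^{-\frac{1}{\alpha-1}}$.

• $r(\delta)$, lower bound: First consider the case $\delta<C_{\alpha,\kappa}$. Put $r = 2^{-\frac{\alpha}{\alpha-1}}C_{\alpha,\kappa}^{-\frac{1}{\alpha-1}}\delta^{\frac{\alpha}{\alpha-1}}$. Since $x := 2^{\frac{1}{\alpha-1}}C_{\alpha,\kappa}^{\frac{1}{\alpha-1}}\delta^{-\frac{1}{\alpha-1}}>1$, we have $\lceil x\rceil\le2x$ and thus

$$q^*(r)\,r\le\Big\lceil\Big(\frac{r}{C_{\alpha,\kappa}}\Big)^{-1/\alpha}\Big\rceil r = 2^{-\frac{\alpha}{\alpha-1}}C_{\alpha,\kappa}^{-\frac{1}{\alpha-1}}\Big\lceil2^{\frac{1}{\alpha-1}}C_{\alpha,\kappa}^{\frac{1}{\alpha-1}}\delta^{-\frac{1}{\alpha-1}}\Big\rceil\delta^{\frac{\alpha}{\alpha-1}}\le2\cdot2^{-1}\delta\le\delta.$$

By definition of $r(\cdot)$, $r(\delta)\ge r = 2^{-\frac{\alpha}{\alpha-1}}\min\{(\frac{\delta}{C_{\alpha,\kappa}})^{\frac{1}{\alpha-1}},1\}\delta$. In the case $\delta\ge C_{\alpha,\kappa}$, we have

$$q^*(\delta)\,\delta\le\Big\lceil\Big(\frac{\delta}{C_{\alpha,\kappa}}\Big)^{-1/\alpha}\Big\rceil\delta\le1\cdot\delta\le\delta,$$

thus $r(\delta)\ge\delta = \min\{(\frac{\delta}{C_{\alpha,\kappa}})^{\frac{1}{\alpha-1}},1\}\delta\ge2^{-\frac{\alpha}{\alpha-1}}\min\{(\frac{\delta}{C_{\alpha,\kappa}})^{\frac{1}{\alpha-1}},1\}\delta$. We conclude that the assertion holds with $c^{(2)}_{\alpha,\kappa} = 2^{-\frac{\alpha}{\alpha-1}}C_{\alpha,\kappa}^{-\frac{1}{\alpha-1}}$.

(ii) We have $\beta_{norm}(q) = \frac{\beta(q)}{q} = C_{\rho,\kappa}\frac{\rho^q}{q}$, where $C_{\rho,\kappa} = \frac{\kappa\rho}{1-\rho}$. In the following we assume w.l.o.g. that $C_{\rho,\kappa}>1$.

• $q^*(x)$, upper bound: Put $\psi(x) = \max\{\log(x^{-1}),1\}$. Define $\tilde q = \Big\lceil\frac{\psi(\frac{x}{C_{\rho,\kappa}\log(\rho^{-1})})}{\log(\rho^{-1})}\Big\rceil$. Then we have

$$\beta_{norm}(\tilde q)\le C_{\rho,\kappa}\frac{\rho^{\log((\frac{x}{C_{\rho,\kappa}\log(\rho^{-1})})^{-1})/\log(\rho^{-1})}}{\tilde q}\le\frac{x}{\log(\rho^{-1})\,\tilde q}\le\frac{x}{\psi(\frac{x}{C_{\rho,\kappa}\log(\rho^{-1})})}\le x,$$

thus

$$q^*(x) = \min\{q\in\mathbb N:\beta_{norm}(q)\le x\}\le\tilde q = \Big\lceil\frac{\psi(\frac{x}{C_{\rho,\kappa}\log(\rho^{-1})})}{\log(\rho^{-1})}\Big\rceil.$$

In particular, since $\psi(x/a)\le\psi(x)+\max\{\log(a),0\}$ for $a>0$ and $\psi\ge1$, we obtain

$$q^*(x)\le\frac{\psi(x)+\max\{\log(C_{\rho,\kappa}\log(\rho^{-1})),0\}}{\log(\rho^{-1})}+1\le\frac{1+\max\{\log(C_{\rho,\kappa}\log(\rho^{-1})),0\}+\log(\rho^{-1})}{\log(\rho^{-1})}\,\psi(x),$$

that is, the assertion holds with $C^{(1)}_{\rho,\kappa} = \frac{1+\max\{\log(C_{\rho,\kappa}\log(\rho^{-1})),0\}+\log(\rho^{-1})}{\log(\rho^{-1})}$.

• $q^*(x)$, lower bound: Case 1: Assume that $x<C_{\rho,\kappa}\log(\rho^{-1})\rho^2$. Define $\tilde q = \Big\lceil\frac14\frac{\log((\frac{x}{C_{\rho,\kappa}\log(\rho^{-1})})^{-1})}{\log(\rho^{-1})}\Big\rceil\ge1$. Then $\tilde q\le\frac12\frac{\log((\frac{x}{C_{\rho,\kappa}\log(\rho^{-1})})^{-1})}{\log(\rho^{-1})}$, and thus

$$\beta_{norm}(\tilde q)\ge C_{\rho,\kappa}\frac{(\frac{x}{C_{\rho,\kappa}\log(\rho^{-1})})^{1/2}}{\tilde q}\ge\frac{(C_{\rho,\kappa}\log(\rho^{-1}))^{1/2}x^{1/2}}{\log((\frac{x}{C_{\rho,\kappa}\log(\rho^{-1})})^{-1/2})}>x,$$

since $(\frac{x}{C_{\rho,\kappa}\log(\rho^{-1})})^{-1/2}>\log((\frac{x}{C_{\rho,\kappa}\log(\rho^{-1})})^{-1/2})$. We have therefore shown that for $x<C_{\rho,\kappa}\log(\rho^{-1})\rho^2$,

$$q^*(x)\ge\tilde q = \max\{1,\tilde q\}. \tag{8.162}$$

Case 2: If $x\ge C_{\rho,\kappa}\log(\rho^{-1})\rho^2$, then $\tilde q\le1$, that is, $q^*(x)\ge1 = \max\{1,\tilde q\}$.

We have shown that for all $x>0$, $q^*(x)\ge\max\{1,\tilde q\}$. Since

$$\tilde q\ge\frac14\frac{\log((\frac{x}{C_{\rho,\kappa}\log(\rho^{-1})})^{-1})}{\log(\rho^{-1})}\ge\frac{1}{4\log(\rho^{-1})}\big[\log(x^{-1})+\log(C_{\rho,\kappa}\log(\rho^{-1}))\big]\ge\frac{1}{4\log(\rho^{-1})}\log(x^{-1}),$$

the assertion follows with $c^{(1)}_{\rho,\kappa} = \frac{1}{4\log(\rho^{-1})}$.

• $r(\delta)$, upper bound: Put $\tilde r = 2(c^{(1)}_{\rho,\kappa})^{-1}\frac{\delta}{\log((2^{-1}c^{(1)}_{\rho,\kappa}\delta^{-1})\vee e)}$. Then we have

$$\begin{aligned}
q^*(\tilde r)\,\tilde r &\ge c^{(1)}_{\rho,\kappa}\log(\tilde r^{-1}\vee e)\cdot\tilde r = \frac{2\delta}{\log((2^{-1}c^{(1)}_{\rho,\kappa}\delta^{-1})\vee e)}\cdot\log\big([2^{-1}c^{(1)}_{\rho,\kappa}\delta^{-1}\log((2^{-1}c^{(1)}_{\rho,\kappa}\delta^{-1})\vee e)]\vee e\big)\\
&\ge\frac{2\delta}{\log((2^{-1}c^{(1)}_{\rho,\kappa}\delta^{-1})\vee e)}\cdot\log([2^{-1}c^{(1)}_{\rho,\kappa}\delta^{-1}]\vee e) = 2\delta>\delta.
\end{aligned}$$

By definition of $r(\cdot)$, we obtain $r(\delta)\le\tilde r$. For $a\in(0,1)$, the function $(0,\infty)\to(0,\infty)$, $x\mapsto\frac{\log(x^{-1}\vee e)}{\log((ax^{-1})\vee e)}$ attains its maximum at $x = ae^{-1}$ with maximum value $1+\log(a^{-1})$. Thus

$$\tilde r\le2(c^{(1)}_{\rho,\kappa})^{-1}\big(1+\log(2(c^{(1)}_{\rho,\kappa})^{-1})\big)\cdot\frac{\delta}{\log(\delta^{-1}\vee e)},$$

that is, the assertion holds with $C^{(2)}_{\rho,\kappa} = 2(c^{(1)}_{\rho,\kappa})^{-1}(1+\log(2(c^{(1)}_{\rho,\kappa})^{-1}))$.

• $r(\delta)$, lower bound: Put $\tilde r = 2^{-1}(C^{(1)}_{\rho,\kappa})^{-1}\frac{\delta}{\log((2C^{(1)}_{\rho,\kappa}\delta^{-1})\vee e)}$. Then

$$\begin{aligned}
q^*(\tilde r)\,\tilde r &\le C^{(1)}_{\rho,\kappa}\log(\tilde r^{-1}\vee e)\cdot\tilde r = \frac{2^{-1}\delta}{\log((2C^{(1)}_{\rho,\kappa}\delta^{-1})\vee e)}\cdot\log\big([2C^{(1)}_{\rho,\kappa}\delta^{-1}\log((2C^{(1)}_{\rho,\kappa}\delta^{-1})\vee e)]\vee e\big)\\
&\le\frac{2^{-1}\delta}{\log((2C^{(1)}_{\rho,\kappa}\delta^{-1})\vee e)}\cdot\big[\log((2C^{(1)}_{\rho,\kappa}\delta^{-1})\vee e)+\log\log((2C^{(1)}_{\rho,\kappa}\delta^{-1})\vee e)\big]\le\delta,
\end{aligned}$$

where the last step is due to $\log(x)+\log\log(x)\le2\log(x)$ for $x\ge e$. By definition of $r(\cdot)$, we obtain $r(\delta)\ge\tilde r$. For $a>1$, the function $(0,\infty)\to(0,\infty)$, $x\mapsto\frac{\log(x^{-1}\vee e)}{\log((ax^{-1})\vee e)}$ attains its minimum at $x = e^{-1}$ with minimum value $\frac{1}{1+\log(a)}$. We therefore obtain

$$\tilde r\ge\frac{(C^{(1)}_{\rho,\kappa})^{-1}}{2(1+\log(2C^{(1)}_{\rho,\kappa}))}\cdot\frac{\delta}{\log(\delta^{-1}\vee e)},$$

that is, the assertion holds with $c^{(2)}_{\rho,\kappa} = \frac{(C^{(1)}_{\rho,\kappa})^{-1}}{2(1+\log(2C^{(1)}_{\rho,\kappa}))}$.

Lemma 8.15 (Form of $V$). (i) Polynomial decay $\Delta(j) = \kappa j^{-\alpha}$ (where $\alpha>1$): Then there exist some constants $C^{(3)}_{\alpha,\kappa}$, $c^{(3)}_{\alpha,\kappa}$ only depending on $\kappa$, $\alpha$, $D_n$ such that

$$c^{(3)}_{\alpha,\kappa}\|f\|_{2,n}\max\{\|f\|_{2,n}^{-1/\alpha},1\}\le V(f)\le C^{(3)}_{\alpha,\kappa}\|f\|_{2,n}\max\{\|f\|_{2,n}^{-1/\alpha},1\}.$$

(ii) Geometric decay $\Delta(j) = \kappa\rho^j$ (where $\rho\in(0,1)$): Then there exist some constants $c^{(3)}_{\rho,\kappa}$, $C^{(3)}_{\rho,\kappa}$ only depending on $\kappa$, $\rho$, $D_n$ such that

$$c^{(3)}_{\rho,\kappa}\|f\|_{2,n}\max\{\log(\|f\|_{2,n}^{-1}),1\}\le V(f)\le C^{(3)}_{\rho,\kappa}\|f\|_{2,n}\max\{\log(\|f\|_{2,n}^{-1}),1\}.$$

Proof of Lemma 8.15.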

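The assertions follow from Lemma 8.13(ii) by taking $\kappa_0 = \kappa D_n$. The maximum in the lower bounds is due to the additional summand $\|f\|_{2,n}$ in $V(f)$.

The following lemma expresses the entropy integral in terms of the well-known bracketing numbers with respect to the $\|\cdot\|_{2,n}$-norm in the case that $\sup_{n\in\mathbb N}D_n<\infty$. For this, we use the upper bounds on $V$ given in Lemma 8.15.

Lemma 8.16. (i) Polynomial decay $\Delta(j) = \kappa j^{-\alpha}$ (where $\alpha>1$). Then for any $\sigma\in(0,C^{(3)}_{\alpha,\kappa})$,

$$\int_0^\sigma\sqrt{H(\varepsilon,\mathcal F,V)}\,d\varepsilon\le C^{(3)}_{\alpha,\kappa}\frac{\alpha-1}{\alpha}\int_0^{(\sigma/C^{(3)}_{\alpha,\kappa})^{\frac{\alpha}{\alpha-1}}}u^{-1/\alpha}\sqrt{H(u,\mathcal F,\|\cdot\|_{2,n})}\,du,$$

where $C^{(3)}_{\alpha,\kappa}$ is from Lemma 8.15.

(ii) Exponential decay $\Delta(j) = \kappa\rho^j$ (where $\rho\in(0,1)$). Then for any $\sigma\in(0,e^{-1}C^{(3)}_{\rho,\kappa})$,

$$\int_0^\sigma\sqrt{H(\varepsilon,\mathcal F,V)}\,d\varepsilon\le C^{(3)}_{\rho,\kappa}\int_0^{E^-(\sigma/C^{(3)}_{\rho,\kappa})}\log(u^{-1})\sqrt{H(u,\mathcal F,\|\cdot\|_{2,n})}\,du,$$

where $E^-(x) = \frac{x}{\log(x^{-1})}$ and $C^{(3)}_{\rho,\kappa}$ is from Lemma 8.15.

Proof of Lemma 8.16. (i) By Lemma 8.15, $V(f)\le C^{(3)}_{\alpha,\kappa}\|f\|_{2,n}\max\{\|f\|_{2,n}^{-1/\alpha},1\}$. We abbreviate $c = C^{(3)}_{\alpha,\kappa}$ in the following.

Let $\varepsilon\in(0,c)$ and let $(l_j,u_j)$, $j=1,\dots,N$, be brackets such that $\|u_j-l_j\|_{2,n}\le(\frac\varepsilon c)^{\frac{\alpha}{\alpha-1}}$. Then

$$V(u_j-l_j)\le c\max\big\{\|u_j-l_j\|_{2,n},\|u_j-l_j\|_{2,n}^{\frac{\alpha-1}{\alpha}}\big\}\le c\max\Big\{\Big(\frac\varepsilon c\Big)^{\frac{\alpha}{\alpha-1}},\frac\varepsilon c\Big\}\le c\cdot\frac\varepsilon c = \varepsilon.$$

Therefore, the bracketing numbers fulfill the relation

$$N(\varepsilon,\mathcal F,V)\le N\Big(\Big(\frac\varepsilon c\Big)^{\frac{\alpha}{\alpha-1}},\mathcal F,\|\cdot\|_{2,n}\Big).$$

We conclude that for $\sigma\in(0,c)$,

$$\int_0^\sigma\sqrt{H(\varepsilon,\mathcal F,V)}\,d\varepsilon\le\int_0^\sigma\sqrt{H\Big(\Big(\frac\varepsilon c\Big)^{\frac{\alpha}{\alpha-1}},\mathcal F,\|\cdot\|_{2,n}\Big)}\,d\varepsilon = c\,\frac{\alpha-1}{\alpha}\int_0^{(\sigma/c)^{\frac{\alpha}{\alpha-1}}}u^{-1/\alpha}\sqrt{H(u,\mathcal F,\|\cdot\|_{2,n})}\,du.$$

In the last step, we used the substitution $u = (\frac\varepsilon c)^{\frac{\alpha}{\alpha-1}}$, which leads to $\frac{du}{d\varepsilon} = \frac{\alpha}{\alpha-1}\cdot\frac1c\cdot(\frac\varepsilon c)^{\frac{1}{\alpha-1}} = \frac{\alpha}{\alpha-1}\cdot\frac1c\cdot u^{1/\alpha}$.

(ii) By Lemma 8.15, $V(f)\le C^{(3)}_{\rho,\kappa}E(\|f\|_{2,n})$ with $E(x) = x\max\{\log(x^{-1}),1\}$. We abbreviate $c = C^{(3)}_{\rho,\kappa}$ in the following.

We first collect some properties of $E$. Put $E^-(x) = \frac{x}{\log(x^{-1}\vee e)}$. In the case $x>e^{-1}$, we have $E(E^-(x)) = x$.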
In the case $x\le e^{-1}$, we have

$$E(E^-(x)) = \frac{x}{\log(x^{-1})}\cdot\log\Big(x^{-1}\log(x^{-1})^{-1}\Big)\le\frac{x}{\log(x^{-1})}\log(x^{-1}) = x.$$

This shows that for all $x>0$,

$$E(E^-(x))\le x. \tag{8.163}$$

Furthermore, for $x<e^{-1}$,

$$\log(E^-(x)^{-1}) = \log(x^{-1}\log(x^{-1}))\ge\log(x^{-1}). \tag{8.164}$$

Now let $\varepsilon\in(0,1)$ and let $(l_j,u_j)$, $j=1,\dots,N$, be brackets such that $\|u_j-l_j\|_{2,n}\le E^-(\frac\varepsilon c)$. Then by (8.163),

$$V(u_j-l_j)\le cE\big(E^-(\tfrac\varepsilon c)\big)\le c\cdot\frac\varepsilon c = \varepsilon.$$

Therefore, we have the following relation between the bracketing numbers:

$$N(\varepsilon,\mathcal F,V)\le N\big(E^-(\tfrac\varepsilon c),\mathcal F,\|\cdot\|_{2,n}\big).$$

We conclude that for $\sigma\in(0,ce^{-1})$,

$$\int_0^\sigma\sqrt{H(\varepsilon,\mathcal F,V)}\,d\varepsilon\le\int_0^\sigma\sqrt{H\big(E^-(\tfrac\varepsilon c),\mathcal F,\|\cdot\|_{2,n}\big)}\,d\varepsilon\le c\int_0^{E^-(\sigma/c)}\log(u^{-1})\sqrt{H(u,\mathcal F,\|\cdot\|_{2,n})}\,du.$$

In the last step, we used the substitution $u = E^-(\frac\varepsilon c)$, which leads to

$$\frac{du}{d\varepsilon} = \frac1c\cdot\frac{1+\log((\varepsilon/c)^{-1})}{\log((\varepsilon/c)^{-1})^2},$$

and with (8.164) we obtain

$$d\varepsilon = c\,\frac{\log((\varepsilon/c)^{-1})^2}{1+\log((\varepsilon/c)^{-1})}\,du\le c\log((\varepsilon/c)^{-1})\,du\le c\log\big(E^-(\tfrac\varepsilon c)^{-1}\big)\,du = c\log(u^{-1})\,du.$$
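Several elementary inequalities of this appendix lend themselves to a quick numerical check. The Python sketch below encloses the infinite sums by partial sums plus integral or geometric tails and verifies, for a few arbitrary test values of $\alpha$, $q$, $\sigma$, $\kappa_0$ and $\rho$: Lemma 8.13(i); the two upper bounds of Lemma 8.13(ii) with the constants $b_{\rho,\kappa_0} = \frac{2(\log(\kappa_0)\vee1)}{\log(\rho^{-1})}\big[1+\frac{\log(\rho^{-1})}{1-\rho}\big]$ and $b_{\alpha,\kappa_0} = \frac{2\alpha}{\alpha-1}(\kappa_0\vee1)^{1/\alpha}$ from its proof; inequality (8.164); and the identity $E(E^-(x)) = x$ for $x>e^{-1}$. It is a sanity check under these parameter choices, not a proof.

```python
import math

import numpy as np

J = 10**6  # truncation index for the numerical sums


def poly_tail(q, alpha):
    """Enclosure (lo, hi) of sum_{j>=q} j^{-alpha}: partial sum plus integral tails."""
    j = np.arange(q, J + 1, dtype=float)
    partial = float(np.sum(j ** -alpha))
    return (partial + (J + 1.0) ** (1 - alpha) / (alpha - 1),
            partial + float(J) ** (1 - alpha) / (alpha - 1))


def min_sum_poly_upper(sigma, kappa0, alpha):
    """Upper enclosure of sum_{j>=1} min(sigma, kappa0 * j^{-alpha})."""
    j = np.arange(1, J + 1, dtype=float)
    partial = float(np.sum(np.minimum(sigma, kappa0 * j ** -alpha)))
    return partial + kappa0 * float(J) ** (1 - alpha) / (alpha - 1)


def min_sum_geo_upper(sigma, kappa0, rho, terms=5000):
    """Upper enclosure of sum_{j>=1} min(sigma, kappa0 * rho^j)."""
    j = np.arange(1, terms + 1, dtype=float)
    partial = float(np.sum(np.minimum(sigma, kappa0 * rho ** j)))
    return partial + kappa0 * rho ** (terms + 1) / (1 - rho)


def E(x):
    """E(x) = x * max(log(1/x), 1), the majorant of V in Lemma 8.16(ii)."""
    return x * max(math.log(1.0 / x), 1.0)


def E_minus(x):
    """E^-(x) = x / log(max(1/x, e))."""
    return x / math.log(max(1.0 / x, math.e))


# Lemma 8.13(i): lower <= sum_{j>=q} j^{-alpha} <= upper.
checks_i = []
for alpha in (1.5, 2.0, 3.0):
    for q in (1, 2, 5, 10):
        lo, hi = poly_tail(q, alpha)
        lower = q ** (1 - alpha) / (alpha - 1)
        upper = max(alpha, 2 ** (alpha - 1)) / (alpha - 1) * q ** (1 - alpha)
        checks_i.append(lower <= lo and hi <= upper)

# Lemma 8.13(ii): the two upper bounds with the constants from the proof.
checks_ii = []
rho, alpha = 0.7, 2.0
for sigma in (1e-4, 1e-2, 0.5):
    for kappa0 in (1.0, 3.0):
        b_geo = (2 * max(math.log(kappa0), 1.0) / math.log(1 / rho)
                 * (1 + math.log(1 / rho) / (1 - rho)))
        b_poly = 2 * alpha / (alpha - 1) * max(kappa0, 1.0) ** (1 / alpha)
        checks_ii.append(min_sum_geo_upper(sigma, kappa0, rho)
                         <= b_geo * sigma * math.log(max(1 / sigma, math.e)))
        checks_ii.append(min_sum_poly_upper(sigma, kappa0, alpha)
                         <= b_poly * sigma * max(sigma ** (-1 / alpha), 1.0))

# (8.164) for x < 1/e, and E(E^-(x)) = x for x > 1/e.
xs = np.logspace(-12, -0.5, 200)
checks_e = [math.log(1 / E_minus(x)) >= math.log(1 / x) for x in xs if x < 1 / math.e]
checks_e += [abs(E(E_minus(x)) - x) < 1e-12 for x in (0.4, 0.6, 0.9)]

print(all(checks_i), all(checks_ii), all(checks_e))
```

The enclosures make each comparison one-sided in the safe direction: lower bounds are compared with a lower enclosure of the sum and upper bounds with an upper enclosure.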