Empirical process theory for locally stationary processes
arXiv preprint [math.ST]. Submitted to Bernoulli
NATHAWUT PHANDOIDAEN and STEFAN RICHTER
Institut für angewandte Mathematik, Im Neuenheimer Feld 205, Universität Heidelberg
E-mail: [email protected], E-mail: [email protected]
We provide a framework for empirical process theory of locally stationary processes using the functional dependence measure. Our results extend known results for stationary mixing sequences by another common possibility to measure dependence and allow for additional time dependence. We develop maximal inequalities for expectations and provide functional limit theorems and Bernstein-type inequalities. We show their applicability to a variety of situations; for instance, we prove the weak functional convergence of the empirical distribution function and uniform convergence rates for kernel density and regression estimation if the observations are locally stationary processes.
MSC 2010 subject classifications:
Primary 60F17; secondary 60F10.
Keywords:
Empirical process theory, Functional dependence measure, Maximal inequality, Functional central limit theorem, Locally stationary processes.
1. Introduction
Empirical process theory is a powerful tool to prove uniform convergence rates and weak convergence of composite functionals. The theory for independent variables is well-studied (cf. [22] for an overview) based on the original ideas of [6], [9], [10] and [20]. For more general random variables, various approaches which quantify dependence under mixing have been discussed. A complete theory which uses entropy with bracketing is available for $\phi$-(uniform)-mixing and $\beta$-(absolutely regular)-mixing sequences (cf. [5] for an overview and in particular [7] for the theory of $\beta$-mixing). To our knowledge, no other measure of dependence has been found to allow for such a rich theory. It was mainly formalized for stationary processes. With some effort, a generalization to locally stationary processes should be possible but has never been carried out. Unfortunately, even though mixing is graphically an intuitive assumption, it is hard to prove for time series models due to the supremum over two different sigma-algebras. The verification needs quite some work even for relatively simple stationary models like ARMA (cf. [17]) or GARCH (cf. [11]), and specific continuity conditions on the distribution of the innovations have to be posed. For locally stationary models, even fewer results are known in this direction (cf. for instance [13] for tvARCH models) and additional difficulties arise through the time-varying distributions. Therefore, mixing seems not to be ideally suited for an empirical process theory for locally stationary processes.

In this paper we introduce an empirical process theory based on the functional dependence measure (invented by [24]) instead of mixing coefficients. It was shown in [26] that for a large variety of recursively defined models and linear models, the functional dependence measure is easy to calculate and only moment conditions on the innovations are enforced.
Therefore, stating conditions on the dependence measure instead of the decay of certain mixing coefficients can be viewed as an assumption which is weaker and often easier to verify in theoretical models.

It has been shown in various applications that the functional dependence measure allows, when combined with the rich theory of martingales, for sharp large deviation inequalities (cf. [27] or [28]). Instead of $\beta$- or $\phi$-mixing, where dependence is quantified with probabilities of events coming from different sigma-algebras, the functional dependence measure uses a representation of the given process as a Bernoulli shift process and quantifies dependence with an $L^\nu$-norm. More precisely, we assume that $X_i = (X_{ij})_{j=1,\dots,d}$, $i = 1,\dots,n$, is a $d$-dimensional process of the form
$$X_i = J_{i,n}(\mathcal{G}_i), \tag{1.1}$$
where $\mathcal{G}_i = \sigma(\varepsilon_i, \varepsilon_{i-1}, \dots)$ is the sigma-algebra generated by $\varepsilon_i$, $i \in \mathbb{Z}$, a sequence of i.i.d. random variables in $\mathbb{R}^{\tilde d}$ ($d, \tilde d \in \mathbb{N}$), and $J_{i,n} : (\mathbb{R}^{\tilde d})^{\mathbb{N}} \to \mathbb{R}^d$, $i = 1,\dots,n$, $n \in \mathbb{N}$, is some measurable function. For a real-valued random variable $W$ and some $\nu > 0$, we define $\|W\|_\nu := E[|W|^\nu]^{1/\nu}$. If $\varepsilon_k^*$ is an independent copy of $\varepsilon_k$, independent of $\varepsilon_i$, $i \in \mathbb{Z}$, we define $\mathcal{G}_i^{*(i-k)} := (\varepsilon_i, \dots, \varepsilon_{i-k+1}, \varepsilon^*_{i-k}, \varepsilon_{i-k-1}, \dots)$ and $X_i^{*(i-k)} := J_{i,n}(\mathcal{G}_i^{*(i-k)})$. The uniform functional dependence measure is given by
$$\delta^X_\nu(k) = \sup_{i=1,\dots,n}\ \sup_{j=1,\dots,d} \big\| X_{ij} - X_{ij}^{*(i-k)} \big\|_\nu. \tag{1.2}$$
Although representation (1.1) appears to be rather restrictive, it does cover a large variety of processes: In [3] it was motivated that the set of all processes of the form $X_i = J(\varepsilon_i, \varepsilon_{i-1}, \dots)$ should be equal to the set of all stationary and ergodic processes. We additionally allow $J$ to vary with $i$ and $n$ to cover processes which change their stochastic behavior over time. This is exactly the form of the so-called locally stationary processes discussed in [4]. If both the functional dependence measure and mixing coefficients are available, many examples reveal that they lead to similar decay rates. These discoveries justify the use of the functional dependence measure for empirical process theory and raise hope that similar results as in the mixing framework can be derived.

Since we are working in the time series context, many applications ask for functions $f$ that depend not only on the actual observation of the process but on the whole (infinite) past $Z_i := (X_i, X_{i-1}, X_{i-2}, \dots)$. In the course of this paper, we aim to derive asymptotic results for the empirical process
$$G_n(f) := \frac{1}{\sqrt{n}} \sum_{i=1}^n \big\{ f(Z_i, \tfrac{i}{n}) - E f(Z_i, \tfrac{i}{n}) \big\}, \quad f \in \mathcal{F}, \tag{1.3}$$
where $\mathcal{F} \subset \{ f : (\mathbb{R}^d)^{\mathbb{N}} \times [0,1] \to \mathbb{R} \text{ measurable} \}$.

Let $H(\varepsilon, \mathcal{F}, \|\cdot\|)$ denote the bracketing entropy, that is, the logarithm of the number of $\varepsilon$-brackets with respect to some semi-norm $\|\cdot\|$ that is necessary to cover $\mathcal{F}$ (this is made precise at the end of this section).
We will define a semi-norm $V(\cdot)$ which guarantees weak convergence of (1.3) if the corresponding bracketing entropy integral $\int \sqrt{H(\varepsilon, \mathcal{F}, V)}\, d\varepsilon$ is finite. In the framework of $\beta$-mixing, [21] argues that the choice of the specific norm which is needed to measure the size of the brackets is connected to the dependence structure of $X_i$. A main tool in deriving uniform results over $f \in \mathcal{F}$ exploits the fact that if $X_i$ is $\beta$-mixing with coefficients $\beta(k)$, the same holds for $f(X_i, \tfrac{i}{n})$. When using the functional dependence measure (1.2), the situation is more complicated: In order to quantify the dependence of $f(X_i, \tfrac{i}{n})$ (or more generally, $f(Z_i, \tfrac{i}{n})$) by $\delta^X_\nu$, we have to impose smoothness conditions on $f$ in direction of its first argument. The semi-norm $V(\cdot)$ therefore will not only change with the dependence structure of $X$, but also has to be "compatible" with the function class $\mathcal{F}$. The smoothness condition on $f$ also poses a challenging issue when considering chaining procedures where rare events are excluded by (non-smooth) indicator functions. We will see that despite these facts, our theory is not restricted to smooth function classes. If the distribution of $\varepsilon$ has a Lebesgue density, it is often possible to decompose $G_n(f)$ into a martingale part and an integrated smooth part even if $f$ itself is not smooth.

Our main contributions in this paper are the following: We derive
• maximal inequalities for $G_n(f)$ for classes of functions $\mathcal{F}$,
• a chaining device which preserves smoothness during the chaining procedure,
• conditions to ensure asymptotic tightness and functional convergence of $G_n(f)$, $f \in \mathcal{F}$,
• and Bernstein-type large deviation inequalities.
Specifically, we generalize the results derived in [25] and [16], which consider weak convergence of the empirical distribution function for stationary processes and for piecewise locally stationary processes, to general function classes $\mathcal{F}$.

The paper is organized as follows.
In Section 2, we introduce the main definitions and assumptions on the function class $\mathcal{F}$, define the semi-norm $V(\cdot)$ and give examples of its form. We show that $V(f)^2$ is an upper bound for the variance of $G_n(f)$. Section 3 considers the case where $\mathcal{F}$ consists of smooth functions. We derive maximal inequalities for $G_n(f)$ and a functional central limit theorem under minimal moment conditions. Section 4 focuses on extending the results to non-smooth function classes $\mathcal{F}$, while Section 5 provides large deviation inequalities of Bernstein-type. Our main results are the Corollaries 3.3 and 3.14 as well as the Corollaries 4.3 and 4.14. In Section 6, we use the theory of Sections 3 and 4 to prove uniform convergence rates for nonparametric regression estimation, M-estimation and weak convergence of the empirical distribution function. The aim of that section is to highlight the wide range of applicability of our theory and to provide the typical conditions which have to be imposed as well as some discussion. In Section 7, a conclusion is drawn. We postpone all detailed proofs to the Supplementary Material (Supplement A) but illustrate the main steps in the article.

Let $a \wedge b := \min\{a, b\}$, $a \vee b := \max\{a, b\}$ for $a, b \in \mathbb{R}$, and for $k \in \mathbb{N}$,
$$H(k) := 1 \vee \log(k), \tag{1.4}$$
which naturally appears in large deviation inequalities. For a given finite class $\mathcal{F}$, let $|\mathcal{F}|$ denote its cardinality. We use the abbreviation
$$H = H(|\mathcal{F}|) = 1 \vee \log|\mathcal{F}| \tag{1.5}$$
if no confusion arises. For some semi-norm $\|\cdot\|$, let $N(\varepsilon, \mathcal{F}, \|\cdot\|)$ denote the bracketing numbers, that is, the smallest number of $\varepsilon$-brackets $[l_j, u_j] := \{ f \in \mathcal{F} : l_j \le f \le u_j \}$ (i.e. measurable functions $l_j, u_j \in \mathcal{F}$ with $\|u_j - l_j\| \le \varepsilon$ for all $j$) needed to cover $\mathcal{F}$. Let $H(\varepsilon, \mathcal{F}, \|\cdot\|) := \log N(\varepsilon, \mathcal{F}, \|\cdot\|)$ denote the bracketing entropy. The fact that the limit functions $l_j, u_j$ have to belong to $\mathcal{F}$ is discussed in Remark 2.12.
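To make the coupling behind (1.2) concrete, the following Python sketch treats a stationary Gaussian AR(1) process $X_i = a X_{i-1} + \varepsilon_i$. This is only an illustrative special case of (1.1) and not a model analysed in this section; the function names, the truncation depth of the moving-average representation and the Monte Carlo settings are assumptions made for illustration. In this model, replacing $\varepsilon_{i-k}$ by an independent copy changes $X_i$ by $a^k(\varepsilon_{i-k} - \varepsilon^*_{i-k})$, so that $\delta^X_2(k) = |a|^k \sqrt{2}$ for standard normal innovations.

```python
import math
import random


def ar1_from_innovations(a, eps):
    """Truncated MA(infinity) form X_i = sum_{j>=0} a^j * eps_{i-j}.

    eps[0] plays the role of eps_i, eps[1] of eps_{i-1}, and so on.
    """
    return sum((a ** j) * e for j, e in enumerate(eps))


def dependence_measure_mc(a, k, nu=2.0, n_rep=4000, depth=30, seed=0):
    """Monte Carlo estimate of delta_nu(k) for the (hypothetical) AR(1) example.

    Requires k < depth, since only `depth` innovations are simulated.
    """
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_rep):
        eps = [rng.gauss(0.0, 1.0) for _ in range(depth)]
        coupled = list(eps)
        # Replace eps_{i-k} by an independent copy, as in the definition of X_i^{*(i-k)}.
        coupled[k] = rng.gauss(0.0, 1.0)
        diff = ar1_from_innovations(a, eps) - ar1_from_innovations(a, coupled)
        acc += abs(diff) ** nu
    return (acc / n_rep) ** (1.0 / nu)


def dependence_measure_exact(a, k):
    """Closed form for nu = 2 and standard normal innovations: |a|^k * sqrt(2)."""
    return abs(a) ** k * math.sqrt(2.0)
```

Calling `dependence_measure_mc(0.5, 3)` and `dependence_measure_exact(0.5, 3)` side by side shows the geometric decay of the dependence measure in $k$, which only uses a second-moment condition on the innovations.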
2. Derivation of the semi-norm, main definitions and assumptions on the function class
In this section, we provide the basic assumptions on $\mathcal{F}$ and the definition of the semi-norm $V(\cdot)$ which is used to measure the size of the brackets. Recall that for $\nu > 0$ and a real-valued random variable $W$, we put $\|W\|_\nu := E[|W|^\nu]^{1/\nu}$. Furthermore, for $f \in \mathcal{F}$, let
$$\|f\|_{\nu,n} := \Big( \frac{1}{n} \sum_{i=1}^n \big\| f(Z_i, \tfrac{i}{n}) \big\|_\nu^\nu \Big)^{1/\nu}.$$
Our theory is mainly based on the case $\nu = 2$. A basic property that a semi-norm $V(\cdot)$ has to fulfill when using a chaining procedure is that its square has to be an upper bound of the variance of $G_n(f)$, that is,
$$\mathrm{Var}(G_n(f)) \le V(f)^2.$$
Let $k \in \mathbb{N}$. For a sequence $W_i = \tilde J_{i,n}(\mathcal{G}_i)$ with $\|W_i\|_2 < \infty$, let $P_{i-k} W_i := E[W_i \mid \mathcal{G}_{i-k}] - E[W_i \mid \mathcal{G}_{i-k-1}]$. Then $(P_{i-k} W_i)_{i \in \mathbb{N}}$ is a martingale difference sequence with respect to $(\mathcal{G}_i)_{i \in \mathbb{N}}$, and $W_i - E W_i = \sum_{k=0}^\infty P_{i-k} W_i$. By the projection property of the conditional expectation and an elementary property of $\delta^W_2$ (cf. [24], Theorem 1), we have
$$\|P_{i-k} W_i\|_2 \le \min\{ \|W_i\|_2, \delta^W_2(k) \}. \tag{2.1}$$
Since $\min\{a_1, b_1\} + \min\{a_2, b_2\} \le \min\{a_1 + a_2, b_1 + b_2\}$ for nonnegative real numbers $a_1, b_1, a_2, b_2$, we obtain
$$\begin{aligned}
\mathrm{Var}(G_n(f))^{1/2} &\le \sum_{k=0}^\infty \Big\| \frac{1}{\sqrt{n}} \sum_{i=1}^n P_{i-k} f(Z_i, \tfrac{i}{n}) \Big\|_2 = \sum_{k=0}^\infty \Big( \frac{1}{n} \sum_{i=1}^n \big\| P_{i-k} f(Z_i, \tfrac{i}{n}) \big\|_2^2 \Big)^{1/2} \\
&\le \sum_{k=0}^\infty \Big( \frac{1}{n} \sum_{i=1}^n \min\big\{ \big\| f(Z_i, \tfrac{i}{n}) \big\|_2^2,\ \delta_2^{f(Z, \frac{i}{n})}(k)^2 \big\} \Big)^{1/2} \\
&\le \sum_{k=0}^\infty \min\Big\{ \|f\|_{2,n},\ \Big( \frac{1}{n} \sum_{i=1}^n \delta_2^{f(Z, \frac{i}{n})}(k)^2 \Big)^{1/2} \Big\}.
\end{aligned} \tag{2.2}$$
To further bound (2.2), we therefore have to investigate, for $u \in [0,1]$,
$$\delta_2^{f(Z,u)}(k) = \sup_{i \in \mathbb{Z}} \big\| f(Z_i, u) - f(Z_i^{*(i-k)}, u) \big\|_2. \tag{2.3}$$
Due to its linear nature, it is necessary to impose some quantitative smoothness assumption on $f \in \mathcal{F}$ in order to derive upper bounds for $\delta_2^{f(Z,u)}(k)$ in terms of the functional dependence measure of $X_i$ from (1.2).
When doing so, we "lose" the properties of $f$ and especially of $\|f\|_{2,n}$, that is, our goal should be to bound (2.3) by some quantity which is completely independent of the specific $f$. To obtain a rich enough theory for locally stationary processes, it is necessary to allow $f$ to depend on $n$ and include classes $\mathcal{F}$ where parts of $f$ change the convergence rate of $G_n(f)$. We especially have in mind the case that
$$f(z, u) = \frac{1}{\sqrt{h}} K\Big( \frac{u - v}{h} \Big) \cdot \bar f(z, u),$$
where $\bar f : (\mathbb{R}^d)^{\mathbb{N}} \times [0,1] \to \mathbb{R}$ is measurable, $K : \mathbb{R} \to \mathbb{R}$ is some kernel function, $h = h_n$ some bandwidth and $v \in [0,1]$ some value which either is fixed or varies with $f$. To cover such cases, we require that each $f \in \mathcal{F}$ can be written in the form
$$f(z, u) = D_{f,n}(u) \cdot \bar f(z, u), \quad z \in (\mathbb{R}^d)^{\mathbb{N}},\ u \in [0,1], \tag{2.4}$$
where $D_{f,n}(u) \in \mathbb{R}$ does not depend on $z$. We put
$$\bar{\mathcal{F}} := \{ \bar f : f \in \mathcal{F} \}. \tag{2.5}$$
Given some decreasing sequence $\Delta(k)$ and some $\mathbb{D}_n$ which fulfill
$$\sup_{u \in [0,1]} \sup_{f \in \mathcal{F}} \delta_2^{\bar f(Z,u)}(k) \le \Delta(k), \qquad \sup_{f \in \mathcal{F}} \Big( \frac{1}{n} \sum_{i=1}^n D_{f,n}\big(\tfrac{i}{n}\big)^2 \Big)^{1/2} \le \mathbb{D}_n, \tag{2.6}$$
we obtain from (2.2) that
$$\mathrm{Var}(G_n(f))^{1/2} \le \sum_{k=0}^\infty \min\{ \|f\|_{2,n},\ \mathbb{D}_n \Delta(k) \},$$
which motivates the definition of $V$ in the next subsection.

Definition of $V$

For some decreasing nonnegative sequence $(\Delta(k))_{k \in \mathbb{N}}$ of real numbers with $\sum_{k=0}^\infty \Delta(k) < \infty$ and some nonnegative sequence $(\mathbb{D}_n)_{n \in \mathbb{N}}$ of real numbers, we define
$$V(f) := \|f\|_{2,n} + \sum_{k=1}^\infty \min\{ \|f\|_{2,n},\ \mathbb{D}_n \Delta(k) \}, \quad f \in \mathcal{F}.$$
The following lemma collects some properties of $V$ and especially shows that $V$ is a semi-norm. The proof is obvious and therefore omitted.

Lemma 2.1.
Let $f, g \in \mathcal{F}$ and $a \in \mathbb{R}$. Then
(i) $V(0) = 0$, $V(f + g) \le V(f) + V(g)$ and $V(a \cdot f) = |a| V(f)$,
(ii) $|f| \le g \implies V(f) \le V(g)$,
(iii) $\|f\|_{1,n}, \|f\|_{2,n} \le V(f)$, and $V(f) \le V(\|f\|_\infty) < \infty$ if $\|f\|_\infty < \infty$.

Based on the fact that we will later assume that $\mathcal{F}$ fulfills (2.6) (and thus $G_n(f)$ is properly standardized), it is reasonable to suppose that $\mathbb{D}_n \in (0, \infty)$ is independent of $n \in \mathbb{N}$. In this case, simpler forms of $V$ can be derived for special cases of $\Delta(k)$, which are given in Table 1. Note that if $f(Z_i, \tfrac{i}{n})$, $i = 1,\dots,n$, are independent, then $\delta_2^{f(Z,u)}(k) = 0$ for $k > 0$ and thus $V(f) = \|f\|_{2,n}$. We therefore exactly recover the case of independent variables with our theory.

| $\Delta(j)$ | $c j^{-\alpha}$, $\alpha > 1$, $c > 0$ | $c \rho^j$, $\rho \in (0,1)$, $c > 0$ |
| $V(f)$ | $\|f\|_{2,n} \max\{ \|f\|_{2,n}^{-1/\alpha}, 1 \}$ | $\|f\|_{2,n} \max\{ \log(\|f\|_{2,n}^{-1}), 1 \}$ |
| $\int_0^\sigma \sqrt{H(\varepsilon, \mathcal{F}, V)}\, d\varepsilon$ | $\int_0^{\tilde\sigma} \varepsilon^{-1/\alpha} \sqrt{H(\varepsilon, \mathcal{F}, \|\cdot\|_{2,n})}\, d\varepsilon$ | $\int_0^{\tilde\sigma} \log(\varepsilon^{-1}) \sqrt{H(\varepsilon, \mathcal{F}, \|\cdot\|_{2,n})}\, d\varepsilon$ |

Table 1. Equivalent expressions of $V$ and the corresponding entropy integral taken from Lemma 8.15 and Lemma 8.16 of the Supplementary Material (Supplement A) in Section 8.8, respectively, under the condition that $\mathbb{D}_n \in (0, \infty)$ is independent of $n$. We omitted the lower and upper bound constants, which depend only on $c$, $\rho$, $\alpha$ and $\mathbb{D}_n$. Furthermore, $\tilde\sigma = \tilde\sigma(\sigma)$ fulfills $\tilde\sigma \to 0$ for $\sigma \to 0$.

Remark 2.2 (Comparison with $\beta$-mixing). If $f(Z_i, \tfrac{i}{n})$ is $\beta$-mixing with decay coefficients $\beta(k) = c \cdot k^{-\alpha} = \Delta(k)$, $k \in \mathbb{N}$, for some $c > 0$, $\alpha > 1$, then it was shown in [5] that $G_n(f)$ is asymptotically tight if the entropy integral satisfies
$$\int \sqrt{H\big(\varepsilon, \mathcal{F}, \|\cdot\|_{\frac{2\alpha}{\alpha-1},n}\big)}\, d\varepsilon < \infty. \tag{2.7}$$
In this framework, we therefore have to pay the price for dependence with a higher number of moments of $f(Z_i, u)$ instead of an additional factor $\varepsilon^{-1/\alpha}$ in the entropy integral (cf. Table 1).
There is a special case where both entropy integrals have comparable values: If the brackets $[l, u]$ which yield $H(\varepsilon, \mathcal{F}, \|\cdot\|_{\frac{2\alpha}{\alpha-1},n})$ have the property that $|u - l| = |u - l|^{\frac{2\alpha}{\alpha-1}}$ (we have especially in mind the case that $l, u$ are indicator functions), then it is easy to see that $H(\varepsilon, \mathcal{F}, \|\cdot\|_{\frac{2\alpha}{\alpha-1},n}) \le H(\varepsilon^{\frac{\alpha}{\alpha-1}}, \mathcal{F}, \|\cdot\|_{2,n})$. By the substitution $u = \varepsilon^{\frac{\alpha}{\alpha-1}}$, (2.7) is then upper bounded by
$$\frac{\alpha-1}{\alpha} \int u^{-1/\alpha} \sqrt{H(u, \mathcal{F}, \|\cdot\|_{2,n})}\, du,$$
that is, the integrand is of the same order as in the entropy integral $\int \sqrt{H(\varepsilon, \mathcal{F}, V)}\, d\varepsilon$ (cf. Table 1).

Conditions on $\mathcal{F}$

We now give conditions such that $\mathcal{F}$ satisfies (2.6) based on statements about the functional dependence measure of $X_i$. By the linear nature of the functional dependence measure, it is necessary to establish quantitative smoothness assumptions on the elements of $\mathcal{F}$. We do so by asking for Hölder-type smoothness. For $s \in (0,1]$, a sequence $z = (z_i)_{i \in \mathbb{N}}$ of elements of $\mathbb{R}^d$ (equipped with the maximum norm $|\cdot|_\infty$) and an absolutely summable sequence $\chi = (\chi_i)_{i \in \mathbb{N}}$ of nonnegative real numbers, put
$$|z|_{\chi,s} := \Big( \sum_{i=0}^\infty \chi_i |z_i|_\infty^s \Big)^{1/s}$$
and $|z|_\chi := |z|_{\chi,1}$. We summarize the smoothness conditions on $\mathcal{F}$ in the following definition (recall (2.5)).

Definition 2.3 ($(L_{\mathcal{F}}, s, R, C)$-class).
We call a class $\bar{\mathcal{F}}$ of functions $\bar f : (\mathbb{R}^d)^{\mathbb{N}} \times [0,1] \to \mathbb{R}$ a $(L_{\mathcal{F}}, s, R, C)$-class if $L_{\mathcal{F}} = (L_{\mathcal{F},i})_{i \in \mathbb{N}}$ is a sequence of nonnegative real numbers, $s \in (0,1]$ and $R : (\mathbb{R}^d)^{\mathbb{N}} \times [0,1] \to [0, \infty)$ satisfies for all $u \in [0,1]$, $z, z' \in (\mathbb{R}^d)^{\mathbb{N}}$, $\bar f \in \bar{\mathcal{F}}$,
$$|\bar f(z, u) - \bar f(z', u)| \le |z - z'|^s_{L_{\mathcal{F}},s} \cdot \big[ R(z, u) + R(z', u) \big].$$
Furthermore, $C = (C_R, C_{\bar f}) \in (0, \infty)^2$ satisfies $\sup_u |\bar f(0, u)| \le C_{\bar f}$, $\sup_u |R(0, u)| \le C_R$.

Remark 2.4. The condition on $\bar{\mathcal{F}}$ to be an $(L_{\mathcal{F}}, s, R, C)$-class poses a smoothness condition on each $\bar f \in \bar{\mathcal{F}}$ separately. There is no need for any connection between the different $\bar f \in \bar{\mathcal{F}}$, and it should not be confused with the important example of so-called parametric Lipschitz classes in empirical process theory (cf. [22, Example 19.7]), where it is assumed that there is some parameter space $\Theta \subset \mathbb{R}^p$ such that $\bar{\mathcal{F}} = \{ \bar f_\theta : \theta \in \Theta \}$ and for any two $\theta_1, \theta_2 \in \Theta$, $|\bar f_{\theta_1}(z, u) - \bar f_{\theta_2}(z, u)| \le m(z, u) \cdot |\theta_1 - \theta_2|_\infty$ holds for some measurable function $m$.

We are now able to formulate the basic assumptions which are needed to prove the main results. Recall (2.4).
Assumption 2.5 (Compatibility condition on $\mathcal{F}$). Let $\bar{\mathcal{F}} = \{ \bar f : f \in \mathcal{F} \}$ be a $(L_{\mathcal{F}}, s, R, C)$-class. There exist $\nu \ge 2$ and some $p \in (1, \infty]$, $C_X > 0$ such that
$$\sup_{i,u} \|R(Z_i, u)\|_{\nu p} \le C_R, \qquad \sup_{i,j} \|X_{ij}\|_{\frac{\nu s p}{p-1}} \le C_X. \tag{2.8}$$
It holds that
$$d\, C_R \cdot \sum_{j=0}^{k} L_{\mathcal{F},j} \big( \delta^X_{\frac{\nu s p}{p-1}}(k - j) \big)^s \le \Delta(k), \qquad \sup_{f \in \mathcal{F}} \Big( \frac{1}{n} \sum_{i=1}^n \big| D_{f,n}(\tfrac{i}{n}) \big|^2 \Big)^{1/2} \le \mathbb{D}_n.$$
We furthermore define $D^\infty_n(u) := \sup_{f \in \mathcal{F}} |D_{f,n}(u)|$ and choose $\mathbb{D}^\infty_{\nu,n}$ such that
$$\Big( \frac{1}{n} \sum_{i=1}^n D^\infty_n(\tfrac{i}{n})^\nu \Big)^{1/\nu} \le \mathbb{D}^\infty_{\nu,n}.$$
We abbreviate $\mathbb{D}^\infty_n = \mathbb{D}^\infty_{2,n}$.

Remark 2.6. (i) In the theory of this paper, we will mainly consider the case $\nu = 2$.
(ii) In the case that each $\bar f \in \bar{\mathcal{F}}$ has the simple form $\bar f(z, u) = \bar f(z_0)$ with Hölder-continuous $\bar f$ and Hölder exponent $s \in (0,1]$, we can choose $p = \infty$. Then the stochastic conditions basically translate to
$$\sup_{i,j} \|X_{ij}\|_{\nu s} < \infty, \qquad \delta^X_{\nu s}(k)^s \le \Delta(k),$$
and so the decay of $\Delta(k)$ is determined by $\delta^X_{\nu s}(k)^s$. Note that $\Delta(k)$ has a slower decay rate than $\delta^X_{\nu s}(k)$ if $s$ is strictly less than 1. In this case, our theory gives weaker results than the corresponding results for absolutely regular sequences from [21], since $\beta$-mixing does not rely on the smoothness of $f$. It can be seen that the choice of $s$ is part of a trade-off: A smaller $s$ is connected to weaker moment assumptions on the underlying process $X_i$ and vice versa.

If Assumption 2.5 is fulfilled, we obtain the following main consequences given in Lemma 2.7. The proof is a simple application of Hölder's inequality and is given in Section 8.1 of the Supplementary Material. Our non-asymptotic main results only rely on the statements of Lemma 2.7; therefore they serve as an alternative set of assumptions.
Lemma 2.7.
Let Assumption 2.5 hold for some $\nu \ge 2$. Then,
$$\delta_\nu^{f(Z,u)}(k) \le |D_{f,n}(u)| \cdot \Delta(k),$$
$$\sup_i \Big\| \sup_{f \in \mathcal{F}} \big| f(Z_i, u) - f(Z_i^{*(i-k)}, u) \big| \Big\|_\nu \le D^\infty_n(u) \cdot \Delta(k),$$
$$\sup_i \|f(Z_i, u)\|_\nu \le |D_{f,n}(u)| \cdot C_\Delta,$$
where $C_\Delta := 4d \cdot |L_{\mathcal{F}}| \cdot C_X^s C_R + C_{\bar f}$.

Assumption 2.5 automatically imposes a continuity assumption on every $f$. Such continuity fails, for instance, in the analysis of the empirical distribution function, where $\mathcal{F} = \{ f_x = 1_{\{\cdot \le x\}} : x \in \mathbb{R} \}$. By using the decomposition
$$G_n(f) = G^{(1)}_n(f) + G^{(2)}_n(f),$$
$$G^{(1)}_n(f) := \frac{1}{\sqrt{n}} \sum_{i=1}^n \big\{ f(Z_i, \tfrac{i}{n}) - E[f(Z_i, \tfrac{i}{n}) \mid \mathcal{G}_{i-1}] \big\}, \tag{2.9}$$
$$G^{(2)}_n(f) := \frac{1}{\sqrt{n}} \sum_{i=1}^n \big\{ E[f(Z_i, \tfrac{i}{n}) \mid \mathcal{G}_{i-1}] - E f(Z_i, \tfrac{i}{n}) \big\} \tag{2.10}$$
(cf. [25] for a similar approach), the analysis of $G_n(f)$ can be transferred to that of the martingale $G^{(1)}_n(f)$ and the smoother $G^{(2)}_n(f)$. For $G^{(2)}_n(f)$, the smoothness conditions of Assumption 2.5 have to be transferred from $z \mapsto f(z, u)$ to $g \mapsto E[f(Z_i, u) \mid \mathcal{G}_{i-1} = g]$. Even if the first map is not continuous, there is hope that the latter one is, due to the additional integration over $\varepsilon_i$. To guarantee this, we typically have to assume that $\varepsilon_i$ has a continuous density.

Assumption 2.8 (Compatibility condition on $\mathcal{F}$). Let $\nu \ge 2$. There exists a process $X^\circ_i = (X^\circ_{ij})_{j=1,\dots,d^\circ} = J^\circ_{i,n}(\mathcal{G}_i)$ with the following properties. For $\kappa \in \{1, 2\}$ and any $f \in \mathcal{F}$, there exist functions $\bar\mu^{(\kappa)}_{f,i}$ such that
$$\bar\mu^{(\kappa)}_{f,i}(Z^\circ_{i-1}, u) = E[\bar f(Z_i, u)^\kappa \mid \mathcal{G}_{i-1}]^{1/\kappa}, \quad i = 1,\dots,n,\ u \in [0,1], \tag{2.11}$$
with $Z^\circ_{i-1} := (X^\circ_{i-1}, X^\circ_{i-2}, \dots)$.
The class $\bar{\mathcal{F}}_\kappa := \{ \bar\mu^{(\kappa)}_{f,i} : f \in \mathcal{F},\ i \in \{1,\dots,n\} \}$ is an $(L_{\mathcal{F}}, s, R, C)$-class, and there exist $p \in (1, \infty]$, $C_X > 0$ such that
$$\sup_{i,u} \|R(Z^\circ_{i-1}, u)\|_{\nu p} \le C_R, \qquad \sup_{i,j} \|X^\circ_{ij}\|_{\frac{\nu s p}{p-1}} \le C_X.$$
It holds that
$$d^\circ C_R \sum_{j=0}^{k-1} L_{\mathcal{F},j} \big( \delta^{X^\circ}_{\frac{\nu s p}{p-1}}(k - j - 1) \big)^s \le \Delta(k), \qquad \sup_{f \in \mathcal{F}} \Big( \frac{1}{n} \sum_{i=1}^n \big| D_{f,n}(\tfrac{i}{n}) \big|^2 \Big)^{1/2} \le \mathbb{D}_n.$$

Remark 2.9. (i) Assumption 2.8 naturally mixes properties of $f$ and the one-step evolution of the statistical model posed on $X_i$. This means that we need some additional knowledge on the evolution of the process $X_i$ to verify it.
(ii) Note that we can always choose $X^\circ_i = \varepsilon_i$, $Z^\circ_{i-1} = \mathcal{G}_{i-1}$ and define
$$\bar\mu^{(\kappa)}_{f,i}(\mathcal{G}_{i-1}, u) := E[\bar f(Z_i, u)^\kappa \mid \mathcal{G}_{i-1}]^{1/\kappa}. \tag{2.12}$$
In the case that $X_i$ is recursively defined, the choice (2.12) may lead to a more complicated calculation of $\Delta(k)$. In this case we should instead choose $X^\circ_i = X_i$ if possible.
(iii) In Section 6 we will see examples where $s$ can be chosen arbitrarily within an admissible range. The trade-off connected to this choice mentioned in Remark 2.6(ii) is also present in the framework of Assumption 2.8.
(iv) We require that $\bar{\mathcal{F}}_2$ is a $(L_{\mathcal{F}}, s, R, C)$-class to ensure smoothness of the conditional variance of $G^{(1)}_n$. This allows us to upper bound it by a deterministic distance measure. We think that this is one of the weakest general assumptions that can be imposed. In special cases, stronger properties may be present which also allow for a reduction of moment and dependence conditions, which is connected to the choice of $s$. For details, we refer to Remark 4.4 and Remark 4.8.

Based on Assumption 2.8 it is possible to show similar results as in Lemma 2.7. The details can be found in Lemma 8.1 in Section 8.1 of the Supplementary Material. In the framework of Assumption 2.8, we sometimes need a submultiplicativity assumption on $\Delta(k)$. For $q \in \mathbb{N}$, put
$$\beta(q) = \sum_{j=q}^\infty \Delta(j).$$

Assumption 2.10.
There exists a constant $C_\beta > 0$ such that for each $q_1, q_2 \in \mathbb{N}$,
$$\beta(q_1 q_2) \le C_\beta \cdot \beta(q_1) \beta(q_2).$$

It is easily seen that Assumption 2.10 is fulfilled if $\Delta(k)$ follows a polynomial decay ($\Delta(k) = c k^{-\alpha}$ for $c > 0$, $\alpha > 1$) or an exponential decay ($\Delta(k) = c \rho^k$ for $c > 0$, $\rho \in (0,1)$) [...] $\Delta(k)$ contains a factor of the form [...].

Note that both Assumption 2.5 and Assumption 2.8 imply that $\mathrm{Var}(G_n(f))^{1/2}$ is bounded by $V(f)$. This follows from (2.2) and from the fact that $\mathrm{Var}(G_n(f))^{1/2} \le \mathrm{Var}(G^{(1)}_n(f))^{1/2} + \mathrm{Var}(G^{(2)}_n(f))^{1/2} \le \|f\|_{2,n} + \mathrm{Var}(G^{(2)}_n(f))^{1/2}$.

Lemma 2.11 (Variance bound). Suppose that Assumption 2.5 or Assumption 2.8 holds. Then for $f \in \mathcal{F}$,
$$\mathrm{Var}(G_n(f)) \le V(f)^2.$$

Remark 2.12. We introduced the bracketing numbers $N(\varepsilon, \mathcal{F}, \|\cdot\|)$ of $\mathcal{F}$ with the condition that the limits $l_j, u_j$ of the brackets $[l_j, u_j]$ have to belong to $\mathcal{F}$. This is needed later for the chaining procedure. Only in the case of Assumption 2.8 does this represent an additional condition. In the case of Assumption 2.5, we can simply define new limits
$$\tilde l_j(z, u) := \inf_{f \in [l_j, u_j]} f(z, u), \qquad \tilde u_j(z, u) := \sup_{f \in [l_j, u_j]} f(z, u),$$
which fulfill $[l_j, u_j] \cap \mathcal{F} = [\tilde l_j, \tilde u_j] \cap \mathcal{F}$. Furthermore,
$$|\tilde l_j(z, u) - \tilde l_j(z', u)| \le \sup_{f \in [l_j, u_j]} |f(z, u) - f(z', u)|.$$
Thus, we can add $\tilde l_j, \tilde u_j$ to $\mathcal{F}$ without changing the bracketing numbers $N(\varepsilon, \mathcal{F}, \|\cdot\|)$ and the validity of Assumption 2.5.
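The semi-norm $V$ defined in this section can be evaluated numerically once $\|f\|_{2,n}$, $\mathbb{D}_n$ and the sequence $\Delta(k)$ are given. The following Python sketch does this with a truncated sum; the function names, the default truncation length and the two decay helpers are assumptions made for illustration and are not part of the paper.

```python
def seminorm_v(norm_f, delta, d_n=1.0, n_terms=100_000):
    """Truncated evaluation of V(f) = ||f||_{2,n} + sum_{k>=1} min(||f||_{2,n}, D_n * Delta(k)).

    norm_f  : the value ||f||_{2,n} (nonnegative number)
    delta   : callable k -> Delta(k), assumed decreasing and summable
    d_n     : the constant D_n
    n_terms : number of summands kept; the remaining tail is ignored (approximation)
    """
    total = norm_f
    for k in range(1, n_terms + 1):
        total += min(norm_f, d_n * delta(k))
    return total


def polynomial_decay(c, alpha):
    """Delta(k) = c * k^(-alpha), alpha > 1."""
    return lambda k: c * k ** (-alpha)


def exponential_decay(c, rho):
    """Delta(k) = c * rho^k, rho in (0, 1)."""
    return lambda k: c * rho ** k
```

For small values of `norm_f`, comparing `seminorm_v(norm_f, polynomial_decay(1.0, 2.0))` with `norm_f ** (1 - 1 / 2.0)` gives a feeling for the first column of Table 1, up to the omitted constants. With `delta = lambda k: 0.0` the function returns `norm_f`, which mirrors the independent case.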
3. Empirical process theory for smooth function classes
We provide an approach to obtain maximal inequalities for sums of random variables $W_i(f)$, $i = 1,\dots,n$, indexed by $f \in \mathcal{F}$, by using a decomposition into independent random variables. A similar approach is presented in [5] (Section 4.3 therein) for absolutely regular sequences. We will apply the results to $W_i(f) = f(Z_i, \tfrac{i}{n})$ or $W_i(f) = E[f(Z_i, \tfrac{i}{n}) \mid \mathcal{G}_{i-1}]$ in the case of Assumption 2.5 or Assumption 2.8, respectively. We will impose the following conditions on $W_i(f)$, which are easily verified in the above two cases by Lemma 2.7 or by Lemma 8.1 in the Supplementary Material (Supplement A).

Assumption 3.1.
Suppose that for all measurable $f_1, f_2, f, g$,
$$W_i(f_1 + f_2) = W_i(f_1) + W_i(f_2), \quad \text{and} \quad |f| \le g \Rightarrow |W_i(f)| \le W_i(g).$$
For each $i = 1,\dots,n$, $j \in \mathbb{N}$, $s \in \mathbb{N} \cup \{\infty\}$, $f \in \mathcal{F}$,
$$\Big\| \sup_{f \in \mathcal{F}} \big| W_i(f) - W_i(f)^{*(i-j)} \big| \Big\|_2 \le D^\infty_n(\tfrac{i}{n}) \Delta(j),$$
$$\big\| W_i(f) - W_i(f)^{*(i-j)} \big\|_2 \le |D_{f,n}(\tfrac{i}{n})| \cdot \Delta(j),$$
$$\|W_i(f)\|_s \le \big\| f(Z_i, \tfrac{i}{n}) \big\|_s.$$

To approximate $W_i(f)$ by independent variables, we use a technique from [27] which was refined in [28]. Define
$$W_{i,j}(f) := E[W_i(f) \mid \varepsilon_{i-j}, \varepsilon_{i-j+1}, \dots, \varepsilon_i], \quad j \in \mathbb{N},$$
and
$$S^W_n(f) := \sum_{i=1}^n \{ W_i(f) - E W_i(f) \}, \qquad S^W_{n,j}(f) := \sum_{i=1}^n \{ W_{i,j}(f) - E W_{i,j}(f) \}.$$
Let $q \in \{1,\dots,n\}$ be arbitrary. Put $L := \lfloor \frac{\log(q)}{\log(2)} \rfloor$ and $\tau_l := 2^l$ ($l = 0,\dots,L-1$), $\tau_L := q$. Then we have
$$W_i(f) = W_i(f) - W_{i,q}(f) + \sum_{l=1}^L \big( W_{i,\tau_l}(f) - W_{i,\tau_{l-1}}(f) \big) + W_{i,1}(f)$$
(in the case $q = 1$, the sum in the middle does not appear) and thus
$$S^W_n(f) = \big[ S^W_n(f) - S^W_{n,q}(f) \big] + \sum_{l=1}^L \big[ S^W_{n,\tau_l}(f) - S^W_{n,\tau_{l-1}}(f) \big] + S^W_{n,1}(f).$$
We write
$$S^W_{n,\tau_l}(f) - S^W_{n,\tau_{l-1}}(f) = \sum_{i=1}^{\lfloor n/\tau_l \rfloor + 1} T_{i,l}(f), \qquad T_{i,l}(f) := \sum_{k=(i-1)\tau_l + 1}^{(i \tau_l) \wedge n} \big[ W_{k,\tau_l}(f) - W_{k,\tau_{l-1}}(f) \big].$$
The random variables $T_{i,l}(f)$, $T_{i',l}(f)$ are independent if $|i - i'| > 1$. This leads to the decomposition
$$\begin{aligned}
\max_{f \in \mathcal{F}} \Big| \frac{1}{\sqrt{n}} S^W_n(f) \Big| &\le \max_{f \in \mathcal{F}} \frac{1}{\sqrt{n}} \big| S^W_n(f) - S^W_{n,q}(f) \big| \\
&\quad + \sum_{l=1}^L \Big[ \max_{f \in \mathcal{F}} \Big| \frac{1}{\sqrt{n/\tau_l}} \sum_{\substack{i=1 \\ i \text{ even}}}^{\lfloor n/\tau_l \rfloor + 1} \frac{1}{\sqrt{\tau_l}} T_{i,l}(f) \Big| + \max_{f \in \mathcal{F}} \Big| \frac{1}{\sqrt{n/\tau_l}} \sum_{\substack{i=1 \\ i \text{ odd}}}^{\lfloor n/\tau_l \rfloor + 1} \frac{1}{\sqrt{\tau_l}} T_{i,l}(f) \Big| \Big] \\
&\quad + \max_{f \in \mathcal{F}} \frac{1}{\sqrt{n}} \big| S^W_{n,1}(f) \big|.
\end{aligned} \tag{3.1}$$
While the first term in (3.1) can be made small by assumptions on the dependence of $W_i(f)$ and by the use of a large deviation inequality for martingales in Banach spaces from [19], the second and third term allow the application of Rosenthal-type bounds due to the independence of the summands $T_{i,l}(f)$ and $W_{i,1}(f)$, respectively. Recall that $H = H(|\mathcal{F}|) = 1 \vee \log|\mathcal{F}|$ as in (1.5). We obtain the following maximal inequality.

Theorem 3.2.
Suppose that $\mathcal{F}$ satisfies $|\mathcal{F}| < \infty$ and Assumption 3.1. Then there exists some universal constant $c > 0$ such that the following holds: If $\sup_{f \in \mathcal{F}} \|f\|_\infty \le M$ and $\sup_{f \in \mathcal{F}} V(f) \le \sigma$, then
$$E \max_{f \in \mathcal{F}} \Big| \frac{1}{\sqrt{n}} S^W_n(f) \Big| \le c \cdot \min_{q \in \{1,\dots,n\}} \Big[ \sigma \sqrt{H} + \sqrt{H} \cdot \mathbb{D}^\infty_n \beta(q) + \frac{q M H}{\sqrt{n}} \Big]. \tag{3.2}$$
For $x > 0$, define $q^*(x) := \min\{ q \in \mathbb{N} : \beta(q) \le q \cdot x \}$. Then,
$$E \max_{f \in \mathcal{F}} \Big| \frac{1}{\sqrt{n}} S^W_n(f) \Big| \le 2c \cdot \Big( \sigma \sqrt{H} + q^*\Big( \frac{M \sqrt{H}}{\sqrt{n}\, \mathbb{D}^\infty_n} \Big) \frac{M H}{\sqrt{n}} \Big). \tag{3.3}$$

In the next subsections, we will prove asymptotic tightness and a functional central limit theorem for $G_n(f)$ under the condition that $\mathbb{D}^\infty_n$, $\mathbb{D}_n$ do not depend on $n$. However, uniform convergence rates of $G_n(f)$ for finite $\mathcal{F}$ can be obtained without these conditions but with additional moment assumptions, which is done in the following Corollary 3.3. Values of $q^*(\cdot)$ and $r(\cdot)$ for the two prominent cases that $\Delta(\cdot)$ decays polynomially or exponentially can be found in Table 2.

Corollary 3.3 (Uniform convergence rates). Suppose that $\mathcal{F}$ satisfies $|\mathcal{F}| < \infty$ and Assumption 2.5 for some $\nu \ge 2$. Furthermore, suppose that
$$\sup_{n \in \mathbb{N}} \sup_{f \in \mathcal{F}} V(f) < \infty, \qquad \sup_{n \in \mathbb{N}} \frac{\mathbb{D}^\infty_{\nu,n}}{\mathbb{D}^\infty_n} < \infty, \qquad \sup_{n \in \mathbb{N}} \frac{C_\Delta H}{n^{1 - \frac{2}{\nu}}\, r\big( \frac{\sigma}{\mathbb{D}^\infty_n} \big)^2} < \infty. \tag{3.4}$$
Then,
$$\max_{f \in \mathcal{F}} |G_n(f)| = O_p(\sqrt{H}).$$

Remark 3.4.
• Corollary 3.3 can be used to prove (optimal) convergence rates for kernel density and regression estimators as well as maximum likelihood estimators under dependence. We give some examples in Section 6.
• The first condition in (3.4) guarantees that $G_n(f)$ is properly normalized. The second and third conditions are needed to prove that the contributions of the "rare events", where $|f(Z_i, \tfrac{i}{n})|$ exceeds some threshold $M_n \in (0, \infty)$, are of the same order as $\sqrt{H}$. For this, we may need more than two moments, that is, $\nu > 2$, depending on $\sqrt{H}$ and the behavior of $\mathbb{D}^\infty_n$.
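The quantity $q^*(x)$ from Theorem 3.2 balances the second and third summand in (3.2). The following Python sketch computes $q^*(x)$ by direct search and returns the three summands for a given $q$; it assumes an exponentially decaying $\Delta(j) = c\rho^j$ so that $\beta(q)$ has a closed form. The function names and the search cap `q_max` are assumptions made for illustration, and the universal constant $c$ of the theorem is deliberately left out.

```python
import math


def beta_exponential(c, rho):
    """beta(q) = sum_{j>=q} c * rho^j = c * rho^q / (1 - rho) for Delta(j) = c * rho^j."""
    return lambda q: c * rho ** q / (1.0 - rho)


def q_star(x, beta, q_max=10**6):
    """Smallest q in {1, 2, ...} with beta(q) <= q * x (searched up to q_max)."""
    for q in range(1, q_max + 1):
        if beta(q) <= q * x:
            return q
    raise ValueError("no q <= q_max satisfies beta(q) <= q * x")


def bound_terms(q, sigma, m_sup, h, n, d_inf, beta):
    """The three summands inside the minimum in (3.2), without the constant c.

    sigma : bound on sup_f V(f);  m_sup : bound on sup_f ||f||_infty
    h     : H = 1 v log|F|;       d_inf : the constant D^infty_n
    """
    return (sigma * math.sqrt(h),
            math.sqrt(h) * d_inf * beta(q),
            q * m_sup * h / math.sqrt(n))
```

With `x = m_sup * math.sqrt(h) / (math.sqrt(n) * d_inf)` and `q = q_star(x, beta)`, the second summand returned by `bound_terms` is at most the third one, which is the step that leads from (3.2) to (3.3).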
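| $\Delta(j)$ | $C j^{-\alpha}$, $\alpha > 1$ | $C \rho^j$, $\rho \in (0,1)$ |
| $q^*(x)$ | $\max\{ x^{-1/\alpha}, 1 \}$ | $\max\{ \log(x^{-1}), 1 \}$ |
| $r(\delta)$ | $\min\{ \delta^{\frac{\alpha}{\alpha-1}}, \delta \}$ | $\min\{ \frac{\delta}{\log(\delta^{-1})}, \delta \}$ |

Table 2.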
Equivalent expressions of $q^*(\cdot)$ and $r(\cdot)$ taken from Lemma 8.14 in Section 8.8. We omitted the lower and upper bound constants, which depend only on $C$, $\rho$, $\alpha$.

In this section, we assume that $\mathbb{D}_n, \mathbb{D}^\infty_n \in (0, \infty)$ can be chosen independently of $n$. We now use Theorem 3.2 to obtain a bound for (possibly infinite) function classes $\mathcal{F}$ which consist of continuous functions with respect to their first argument. Let
$$G^W_n(f) := \frac{1}{\sqrt{n}} \sum_{i=1}^n \big( W_i(f) - E W_i(f) \big).$$
The choice of the truncation sequence for the following chaining approach is motivated by [5] (Theorem 3.3 therein). Since Theorem 3.2 only yields maximal inequalities for continuous functions, we are not able to use the standard chaining scheme which involves indicator functions. We therefore provide an adaptation of the typical chaining scheme which does not need indicators but replaces them by truncations of the arising functions via maxima and minima (which preserves their continuity).

For $m > 0$, define the truncation $\varphi^\wedge_m : \mathbb{R} \to \mathbb{R}$ and the corresponding 'peaky' residual $\varphi^\vee_m : \mathbb{R} \to \mathbb{R}$ via
$$\varphi^\wedge_m(x) := (x \vee (-m)) \wedge m, \qquad \varphi^\vee_m(x) := x - \varphi^\wedge_m(x).$$
In the following, assume that for each $j \in \mathbb{N}$ there exists a decomposition $\mathcal{F} = \bigcup_{k=1}^{N_j} \mathcal{F}_{jk}$, where $(\mathcal{F}_{jk})_{k=1,\dots,N_j}$, $j \in \mathbb{N}$, is a sequence of nested partitions. For each $j \in \mathbb{N}$ and $k \in \{1,\dots,N_j\}$, choose a fixed element $f_{jk} \in \mathcal{F}_{jk}$. For $j \in \mathbb{N}$, define $\pi_j f := f_{jk}$ if $f \in \mathcal{F}_{jk}$. Assume that there exists a sequence $(\Delta_j f)_{j \in \mathbb{N}}$ such that for all $j \in \mathbb{N}$, $\sup_{f,g \in \mathcal{F}_{jk}} |f - g| \le \Delta_j f$. Finally, let $(m_j)_{j \in \mathbb{N}}$ be a decreasing sequence which will serve as a truncation sequence.

For $j \in \mathbb{N}$, we use the decomposition
$$f - \pi_j f = \varphi^\wedge_{m_j}(f - \pi_j f) + \varphi^\vee_{m_j}(f - \pi_j f).$$
Since
$$\begin{aligned}
f - \pi_j f &= f - \pi_{j+1} f + \pi_{j+1} f - \pi_j f \\
&= \varphi^\wedge_{m_{j+1}}(f - \pi_{j+1} f) + \varphi^\vee_{m_{j+1}}(f - \pi_{j+1} f) \\
&\quad + \varphi^\wedge_{m_j - m_{j+1}}(\pi_{j+1} f - \pi_j f) + \varphi^\vee_{m_j - m_{j+1}}(\pi_{j+1} f - \pi_j f),
\end{aligned} \tag{3.5}$$
we can write
$$\varphi^\wedge_{m_j}(f - \pi_j f) = \varphi^\wedge_{m_{j+1}}(f - \pi_{j+1} f) + \varphi^\wedge_{m_j - m_{j+1}}(\pi_{j+1} f - \pi_j f) + R(j), \tag{3.6}$$
where
$$R(j) := \varphi^\wedge_{m_j}(f - \pi_j f) - \varphi^\wedge_{m_j}\big( \varphi^\wedge_{m_{j+1}}(f - \pi_{j+1} f) \big) - \varphi^\wedge_{m_j}\big( \varphi^\wedge_{m_j - m_{j+1}}(\pi_{j+1} f - \pi_j f) \big).$$
To bound $R(j)$, we use (i) of the following elementary Lemma 3.5, which is proved in Section 8.3 of the Supplementary Material (Supplement A).

Lemma 3.5.
Let $y, x, x_1, x_2, x_3$ and $m, m' > 0$ be real numbers. Then the following assertions hold:
(i) If $|x_1| + |x_2| \le m$, then $\big| \varphi^{\wedge}_m(x_1 + x_2 + x_3) - \varphi^{\wedge}_m(x_1) - \varphi^{\wedge}_m(x_2) \big| \le \min\{|x_3|, 2m\}$.
(ii) $|\varphi^{\wedge}_m(x)| \le \min\{|x|, m\}$, and if $|x| < y$, then $|\varphi^{\vee}_m(x)| \le \varphi^{\vee}_m(y) \le y \mathbb{1}_{\{y > m\}}$.
(iii) If $\mathcal{F}$ fulfills Assumption 2.5, then Assumption 2.5 also holds for $\{\varphi^{\wedge}_m(f) : f \in \mathcal{F}\}$ and $\{\varphi^{\vee}_m(f) : f \in \mathcal{F}\}$.

Because the partitions are nested, we have $|\pi_{j+1} f - \pi_j f| \le \Delta_j f$. By Lemma 3.5 and (3.5), we have
\[ |R(j)| \le \min\big\{ \big| \varphi^{\vee}_{m_{j+1}}(f - \pi_{j+1} f) + \varphi^{\vee}_{m_j - m_{j+1}}(\pi_{j+1} f - \pi_j f) \big|, 2 m_j \big\} \le \min\big\{ \big| \varphi^{\vee}_{m_{j+1}}(\Delta_{j+1} f) \big|, 2 m_j \big\} + \min\big\{ \big| \varphi^{\vee}_{m_j - m_{j+1}}(\Delta_j f) \big|, 2 m_j \big\}. \tag{3.7} \]
Let $\tau \in \mathbb{N}$. With iterated application of (3.6) and linearity of $f \mapsto W_i(f)$, we then have
\[ G_n^W\big(\varphi^{\wedge}_{m_0}(f - \pi_0 f)\big) = G_n^W\big(\varphi^{\wedge}_{m_1}(f - \pi_1 f)\big) + G_n^W\big(\varphi^{\wedge}_{m_0 - m_1}(\pi_1 f - \pi_0 f)\big) + G_n^W(R(0)) = G_n^W\big(\varphi^{\wedge}_{m_\tau}(f - \pi_\tau f)\big) + \sum_{j=0}^{\tau - 1} G_n^W\big(\varphi^{\wedge}_{m_j - m_{j+1}}(\pi_{j+1} f - \pi_j f)\big) + \sum_{j=0}^{\tau - 1} G_n^W(R(j)), \tag{3.8} \]
which in combination with (3.7) can now be used for chaining. In the following Lemma 3.6, we balance the contribution of the truncated stochastic part and the expectation of the rare events. Recall that $H(k) = 1 \vee \log(k)$ as in (1.4).

Lemma 3.6 (Compatibility lemma). For $\delta > 0$, put $r(\delta) := \max\{r > 0 : q^*(r) r \le \delta\}$. For $n \in \mathbb{N}$, $\delta > 0$ and $k \in \mathbb{N}$ define
\[ m(n, \delta, k) := r\Big(\frac{\delta}{D_n}\Big) \cdot \frac{D_n^{\infty} n^{1/2}}{H(k)^{1/2}}. \tag{3.9} \]
Then the following statements hold:
(i) $r(\cdot)$ is well-defined and for each $a > 0$, $\frac{r(a)}{2} \ge r\big(\frac{a}{2}\big)$ and $r(a) \le a$.
(ii) If $\mathcal{F}$ fulfills $|\mathcal{F}| \le k$ and Assumption 3.1, then $\sup_{f \in \mathcal{F}} V(f) \le \delta$ and $\sup_{f \in \mathcal{F}} \|f\|_{\infty} \le m(n, \delta, k)$ imply
\[ E \max_{f \in \mathcal{F}} \big| G_n^W(f) \big| \le c \Big(1 + \frac{D_n^{\infty}}{D_n}\Big) \delta \sqrt{H(k)}, \tag{3.10} \]
and $\sup_{f \in \mathcal{F}} V(f) \le \delta$ implies that for each $\gamma > 0$,
\[ \sqrt{n} \big\| f \mathbb{1}_{\{f > \gamma \cdot m(n, \delta, k)\}} \big\|_{1,n} \le \frac{1}{\gamma} \frac{D_n}{D_n^{\infty}} \delta \sqrt{H(k)}. \tag{3.11} \]

We now use (3.7), (3.8) and Lemma 3.6 to derive a uniform bound for $E \sup_{f \in \mathcal{F}} |G_n^W(f)|$ in the following Theorem 3.7.

Theorem 3.7.
Let F satisfy Assumption 3.1 and let F be some envelope function of F ,that is, for each f ∈ F it holds that | f | ≤ F . Let σ > and assume that sup f ∈F V ( f ) ≤ σ .Then there exists some universal constant ˜ c > such that E sup f ∈F (cid:12)(cid:12)(cid:12) G Wn ( f ) (cid:12)(cid:12)(cid:12) ≤ ˜ c h (1 + D ∞ n D n + D n D ∞ n ) Z σ q ∨ H (cid:0) ε, F , V (cid:1) d ε + √ n (cid:13)(cid:13) F { F > m ( n,σ, N ( σ , F ,V )) } (cid:13)(cid:13) ,n i , where m ( · ) is from Lemma 3.6.mpirical process theory for locally stationary processes Remark . Lemma 3.6 and Theorem 3.7 are designed for the case that D n , D ∞ n ∈ (0 , ∞ ) are independent of n . If instead V and D n , D ∞ n depend on n , chaining has tobe performed in a different way to get optimal bounds for the corresponding maximalinequality. We give a short idea how the statements change. Let F satisfy Assumptions2.8, 2.10 with ν > . Define V ν,n ( f ) := k f k ν,n + P ∞ j =1 min {k f k ν,n , D n ∆( k ) } . Choose m ∞ ( n, δ, k ) = m ( n, δ, k ) · C β r ( D n D ∞ n ) instead of (3.9), and ν large enough such that √ n (cid:16) D n C β γ √ nr ( D n D ∞ n ) (cid:17) ν − ≤ . Then the following modification of Lemma 3.6 holds. There exists some universal constant c > such that |F| ≤ k, sup f ∈F V ν,n ( f ) ≤ δ and sup f ∈F k f k ∞ ≤ m ∞ ( n, δ, k ) imply E max f ∈F (cid:12)(cid:12) G Wn ( f ) (cid:12)(cid:12) ≤ c (1 + C β ) δ p H ( k ) , and √ n k f { f>γ · m ( n,δ,k ) } k ,n ≤ √ n k f k νν,n m ∞ ( n, δ, k ) ν − γ ν − = k f k ν,n (cid:16) √ n k f k ν,n m ( n, δ, k ) (cid:17) ν − · √ n (cid:16) C β γ √ nr ( D n D ∞ n ) (cid:17) − ( ν − ≤ δH ( k ) ν − · √ n (cid:16) D n C β γ √ nr ( D n D ∞ n ) (cid:17) ν − ≤ δH ( k ) ν − . We now show a functional central limit theorem for G n ( f ). First we conclude fromTheorem 3.7 asymptotic equicontinuity of G n . To do so, we have to discuss the trailingterm in Theorem 3.7 which involves the envelope function. 
This can either be tackled with higher moment assumptions on $f(Z_i, u)$ or by imposing smoothness assumptions on the process $X_i$ and the functions $f \in \mathcal{F}$ with respect to their second argument. Here, we will consider the second approach since most of the smoothness assumptions are naturally needed to prove a central limit theorem anyway (cf. Theorem 3.13). This also has the advantage that we only have to assume the existence of a second moment for $f(Z_i, u)$.

Assumption 3.9.
For each u ∈ [0 , , there exists a process ˜ X i ( u ) = J ( G i , u ) , i ∈ Z ,where J is a measurable function. Furthermore, there exists some C X > , ς ∈ (0 , suchthat for every i ∈ { , ..., n } , u , u ∈ [0 , , k X i − ˜ X i ( in ) k spp − ≤ C X n − ς , k ˜ X i ( u ) − ˜ X i ( u ) k spp − ≤ C X | u − u | ς . For ˜ Z i ( u ) = ( ˜ X i ( u ) , ˜ X i − ( u ) , ... ) it holds that sup v,u k R ( ˜ Z ( v ) , u ) k p < ∞ . Assumption 3.10.
There exists some ς ∈ (0 , such that for every f ∈ F , | ¯ f ( z, u ) − ¯ f ( z, u ) | ≤ | u − u | ς · (cid:0) ¯ R ( z, u ) + ¯ R ( z, u ) (cid:1) , and sup u,v k ¯ R ( ˜ Z ( v ) , u ) k < ∞ . Corollary 3.11.
Let F satisfy Assumption 2.5, 3.9 and 3.10. Suppose that sup n ∈ N Z p ∨ H ( ε, F , V ) dε < ∞ . (3.12) Furthermore, assume that D n , D ∞ n ∈ (0 , ∞ ) are independent of n , and sup i =1 ,...,n D ∞ n ( in ) √ n → . (3.13) Then, the process G n ( f ) is equicontinuous with respect to V , that is, for every η > , lim σ → lim sup n →∞ P (cid:16) sup f,g ∈F ,V ( f − g ) ≤ σ | G n ( f ) − G n ( g ) | ≥ η (cid:17) = 0 . From Theorem 8.6 provided by the Supplementary Material Supplement A, Section 8.5,we directly obtain the following multivariate central limit theorem as a special case whichonly needs second moments of the summands f ( Z i , u ). To keep the presentation simple,we reduce ourselves to two explicit forms of D f,n ( · ) which are given in Assumption 3.12,namely a global and a local version. Theorem 8.6 allows more general choices of D f,n ( · ).In Assumption 3.12, we make use of ˜ X i ( u ), ˜ Z i ( u ) introduced in Assumption 3.9. Assumption 3.12.
Let ω : [0 , → R , K : R → R be some bounded functions. One ofthe following cases holds • Case K = 1 (global version): For all f ∈ F , D f,n ( u ) = ω ( u ) , where ω has boundedvariation and R ω ( u ) du > . For all f, g ∈ F , j , j ∈ N , the mapping u E [ E [ ¯ f ( ˜ Z j ( u ) , u ) |G ] · E [¯ g ( ˜ Z j ( u ) , u ) |G ]] has bounded variation.For f, g ∈ F , define Σ (1) f,g := Z ω ( u ) · X j ∈ Z Cov ( f ( ˜ Z ( u ) , u ) , g ( ˜ Z j ( u ) , u )) du. • Case K = 2 (local version): For all f ∈ F , D f,n ( u ) = ω ( u ) · √ h K (cid:0) u − vh (cid:1) , mpirical process theory for locally stationary processes where v ∈ (0 , is some fixed value, h = h n → , nh → ∞ . ω is continuous in v ,and K has bounded variation, support ⊂ [ − , ] and satisfies R K ( u ) du > .For f, g ∈ F , define Σ (2) f,g := Z K ( u ) du · ω ( v ) X j ∈ Z Cov ( f ( ˜ Z ( v ) , v ) , g ( ˜ Z j ( v ) , v )) . Theorem 3.13.
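Let $\mathcal{F}$ satisfy Assumptions 2.5, 3.9, 3.10 and 3.12. Let $m \in \mathbb{N}$ and $f_1, \dots, f_m \in \mathcal{F}$. Then,
\[ \frac{1}{\sqrt{n}} \sum_{i=1}^n \left\{ \begin{pmatrix} f_1(Z_i, \tfrac{i}{n}) \\ \vdots \\ f_m(Z_i, \tfrac{i}{n}) \end{pmatrix} - E \begin{pmatrix} f_1(Z_i, \tfrac{i}{n}) \\ \vdots \\ f_m(Z_i, \tfrac{i}{n}) \end{pmatrix} \right\} \overset{d}{\to} N\big(0, (\Sigma^{(K)}_{f_k, f_l})_{k,l = 1, \dots, m}\big), \]
where $\Sigma^{(K)}$ is from Assumption 3.12.

As a result of Corollary 3.11, Theorem 3.13 and Theorem 18.14 in [22], we obtain the following functional central limit theorem. The weak convergence takes place in the normed space
\[ \ell^{\infty}(\mathcal{F}) = \big\{ G : \mathcal{F} \to \mathbb{R} \;\big|\; \|G\|_{\infty} := \sup_{f \in \mathcal{F}} |G(f)| < \infty \big\}, \tag{3.14} \]
cf. [22], Example 18.5.

Corollary 3.14.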
Let F satisfy Assumptions 2.5, 3.9, 3.10 and 3.12. Assume that sup n ∈ N Z p ∨ H ( ε, F , V ) dε < ∞ . Then it holds in ℓ ∞ ( F ) that (cid:2) G n ( f ) (cid:3) f ∈F d → (cid:2) G ( f ) (cid:3) f ∈F , where ( G ( f )) f ∈F is a centered Gaussian process with covariances Cov( G ( f ) , G ( g )) = Σ ( K ) f,g , where Σ ( K ) is from Assumption 3.12.
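The truncation $\varphi^{\wedge}_m$ and the residual $\varphi^{\vee}_m$ that drive the chaining scheme of this section are easy to experiment with numerically. The following Python sketch is an illustration only: the function names `phi_trunc`, `phi_peak` and `check_lemma_3_5` are hypothetical, NumPy is an assumed dependency, and the inputs are arbitrary random draws. It checks the exact decomposition $x = \varphi^{\wedge}_m(x) + \varphi^{\vee}_m(x)$ and the elementary bounds of Lemma 3.5 (i) and (ii); for (i), the bound $\min\{|x_3|, 2m\}$ used here follows from the fact that $\varphi^{\wedge}_m$ is 1-Lipschitz and bounded by $m$.

```python
import numpy as np


def phi_trunc(x, m):
    """Truncation (x v (-m)) ^ m, i.e. x clipped to the interval [-m, m]."""
    return np.minimum(np.maximum(x, -m), m)


def phi_peak(x, m):
    """'Peaky' residual x - phi_trunc(x, m); it vanishes whenever |x| <= m."""
    return x - phi_trunc(x, m)


def check_lemma_3_5(m=1.5, size=10_000, seed=0):
    """Numerically check Lemma 3.5 (i), (ii) and the decomposition on random inputs."""
    rng = np.random.default_rng(seed)
    tol = 1e-12  # slack for floating-point rounding

    # (i): draw x1, x2 with |x1| + |x2| <= m and an arbitrary x3.
    x1 = rng.uniform(-m / 2, m / 2, size)
    x2 = rng.uniform(-m / 2, m / 2, size)
    x3 = rng.normal(scale=3 * m, size=size)
    lhs = np.abs(phi_trunc(x1 + x2 + x3, m) - phi_trunc(x1, m) - phi_trunc(x2, m))
    ok_i = np.all(lhs <= np.minimum(np.abs(x3), 2 * m) + tol)

    # (ii): draw x and some y with |x| < y.
    x = rng.normal(scale=3 * m, size=size)
    y = np.abs(x) + rng.uniform(0.01, 1.0, size)
    ok_ii = (
        np.all(np.abs(phi_trunc(x, m)) <= np.minimum(np.abs(x), m) + tol)
        and np.all(np.abs(phi_peak(x, m)) <= phi_peak(y, m) + tol)
        and np.all(phi_peak(y, m) <= y * (y > m) + tol)
    )

    # The two parts always add up to the original value.
    ok_decomp = np.allclose(phi_trunc(x, m) + phi_peak(x, m), x)
    return bool(ok_i), bool(ok_ii), bool(ok_decomp)
```

Calling `check_lemma_3_5()` returns three booleans, one per property. Both maps are built from maxima and minima only, which is why they preserve continuity of the functions they are applied to.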
4. Empirical process theory for non-continuous functions
We now provide an approach for empirical process theory if the class $\mathcal{F}$ consists of non-continuous functions. Our approach is based on the decomposition
\[ G_n(f) = G_n^{(1)}(f) + G_n^{(2)}(f) \]
into a martingale $G_n^{(1)}$ (cf. (2.9)) and a process $G_n^{(2)}$ (cf. (2.10)) with smooth increments. The second part $G_n^{(2)}$ can then be controlled in a similar way as in Section 3 by taking $W_i(f) = E[f(Z_i, \tfrac{i}{n}) \mid \mathcal{G}_{i-1}]$. The term $G_n^{(1)}$ is dealt with by using a Bernstein-type inequality for martingales. Observe that the conditional variance of $G_n^{(1)}(f)$ is bounded from above by
\[ R_n(f)^2 := \frac{1}{n} \sum_{i=1}^n E\big[ f(Z_i, \tfrac{i}{n})^2 \mid \mathcal{G}_{i-1} \big]. \]
The first step is now to bound $R_n(f)$ over $f \in \mathcal{F}$. Again, let $W_i(f)$, $i = 1, \dots, n$, be some sequence of random variables indexed by $f \in \mathcal{F}$. We will apply the following theory to
\[ W_i(f) = E\big[ f(Z_i, \tfrac{i}{n})^2 \mid \mathcal{G}_{i-1} \big], \]
but impose the more general assumptions which are directly implied by Lemma 8.1 in the Supplementary Material (Supplement A) under Assumption 2.8.

Assumption 4.1.
For each i = 1 , ..., n , j ∈ N , s ∈ N ∪ {∞} , f ∈ F , (cid:13)(cid:13)(cid:13) sup f ∈F (cid:12)(cid:12) W i ( f ) − W i ( f ) ∗ ( i − j ) (cid:12)(cid:12) (cid:13)(cid:13)(cid:13) ≤ C ∆ D ∞ n ( in ) ∆( j ) , (cid:13)(cid:13) W i ( f ) − W i ( f ) ∗ ( i − j ) (cid:13)(cid:13) ≤ | D f,n ( in ) | · k f ( Z i , in ) k ∆( j ) , (cid:13)(cid:13) W i ( f ) k s ≤ k f ( Z i , in ) k s . We obtain the following analogue of Theorem 3.2, a maximal inequality for means ofrandom variables.
Lemma 4.2 (A maximal inequality for means) . Let F satisfy |F| < ∞ and Assumption4.1. Then there exists some universal constant c > such that the following holds: If sup f ∈F k f k ∞ ≤ M and sup f ∈F V ( f ) ≤ σ , then E max f ∈F (cid:12)(cid:12)(cid:12) n S Wn ( f ) (cid:12)(cid:12)(cid:12) ≤ c · min q ∈{ ,...,n } h D n r ( σ D n ) σ + C ∆ ( D ∞ n ) β ( q ) + qM Hn i . (4.1) mpirical process theory for locally stationary processes Furthermore, E max f ∈F (cid:12)(cid:12)(cid:12) n S Wn ( f ) (cid:12)(cid:12)(cid:12) ≤ c · h D n r ( σ D n ) σ + q ∗ (cid:0) M Hn ( D ∞ n ) C ∆ (cid:1) M Hn i . (4.2)Lemma 4.2 in conjunction with Theorem 3.2 can be used to provide convergence ratesin the same fashion as done in Corollary 3.3. Recall from (1.5) that H = 1 ∨ log |F| . Corollary 4.3 (Uniform convergence rates) . Suppose that F satisfies |F| < ∞ , As-sumption 2.8 for some ν ≥ , and Assumption 2.10. Let ¯ F := sup f ∈F ¯ f and assume thatfor some ν ∈ [2 , ∞ ] , C ¯ F ,n := sup i,u k ¯ F ( Z i , u ) k ν < ∞ . If sup n ∈ N sup f ∈F V ( f ) < ∞ , sup n ∈ N D ∞ ν ,n D ∞ n < ∞ , sup n ∈ N C F ,n Hn − ν r ( σ D ∞ n ) < ∞ , (4.3) then max f ∈F | G n ( f ) | = O p ( √ H ) . Remark (Alternative conditions) . In the special case that there exists some con-stant R > such that sup f ∈F n P ni =1 E [ f ( Z i , in ) |G i − ] ≤ R , it can easily be seen inthe proof that the statement of Corollary 4.3 still holds if we only ask for Assumption2.8 to hold for κ = 1 and Assumption 2.10 is discarded. A possible application is givenin Example 6.8. We now show asymptotic tightness of the martingale part G (1) n ( f ), f ∈ F . By usinga Bernstein-type inequality for martingales and Lemma 4.2, we obtain an analogue ofLemma 3.6 with the same function m ( · ) as defined there. Lemma 4.5 (Compatibility lemma 2) . Let ψ : (0 , ∞ ) → [1 , ∞ ) be some functionand k ∈ N , δ > . 
If F fulfills |F| ≤ k and Assumptions 2.8, 2.10, then there existssome universal constant c > such that the following holds: If sup f ∈F V ( f ) ≤ δ and sup f ∈F k f k ∞ ≤ m ( n, δ, k ) , then E max f ∈F (cid:12)(cid:12) G (1) n ( f ) (cid:12)(cid:12) { R n ( f ) ≤ δψ ( δ ) } ≤ c (1 + D ∞ n D n ) · ψ ( δ ) δ p H ( k ) , (4.4) P (cid:16) sup f ∈F R n ( f ) > δψ ( δ ) (cid:17) ≤ c (1 + q ∗ (cid:0) C − C − β (cid:1) ( D ∞ n D n ) ) ψ ( δ ) . (4.5)With the help of Lemma 4.5, we obtain the following maximal inequality.2 Theorem 4.6.
Let F satisfy Assumption 2.8 and 2.10, and F be some envelope func-tion of F . Furthermore, let σ > and suppose that sup f ∈F V ( f ) ≤ σ . Set ψ ( ε ) = p log( ε − ∨
1) log log( ε − ∨ e ) . (4.6) Then there exists a universal constant c > such that for each η > , P (cid:16) sup f ∈F (cid:12)(cid:12) G (1) n ( f ) (cid:12)(cid:12) > η (cid:17) ≤ η h c (cid:16) D ∞ n D n + D n D ∞ n (cid:17) · Z σ ψ ( ε ) q ∨ H (cid:0) ε, F , V (cid:1) d ε + √ n (cid:13)(cid:13) F { F > m ( n,σ, N ( σ , F ,V )) } (cid:13)(cid:13) i + c (cid:16) q ∗ (cid:0) C − C − β (cid:1)(cid:16) D ∞ n D n (cid:17) (cid:17) Z σ εψ ( ε ) dε, (4.7) where m ( · ) is from Lemma 3.6. Remark . Let m > . The chaining procedure found in [18] for martingales usesthe fact that for functions f, g with | f | ≤ g and g ( · ) > m , | G (1) n ( f ) | ≤ | G (1) n ( g ) | + 2 √ n · n n X i =1 E [ g ( Z i , in ) |G i − ] ≤ | G (1) n ( g ) | + 2 √ n R n ( g ) m . Afterwards, bounds for the conditional variance R n ( g ) are applied. In our case, thesebounds are not sharp enough. We therefore employ the inequality | G (1) n ( f ) | ≤ | G (1) n ( g ) | + 2 | G (2) n ( g ) | + 2 √ n k g k ,n m and are forced to use the “smooth” chaining technique applied in Theorem 3.7. Remark (Alternative conditions) . There seems to be no straightforward way us-ing a slicing device to approximate the conditional variance R n ( f ) by an appropriatedeterministic distance. Instead, we upper bound R n ( f ) during the chaining procedurewhich leads to the additional factor ψ ( ε ) in the entropy integral. In some special cases,the conditions of Theorem 4.6 can be relaxed. Suppose that V ∗ is some semi-metric on F × F such that for all f , f ∈ F , V ( f − f ) ≤ V ∗ ( f , f ) and for γ > small enough, sup f ,f ∈F ,V ∗ ( f ,f ) ≤ γ R n ( f − f ) ≤ γ almost surely. 
Then the statement of Theorem 4.6 still holds in the following form: If F satisfies Assumption 2.8 only for κ = 1 (and not necessarily Assumption 2.10), then formpirical process theory for locally stationary processes any R > , P (cid:16) sup f ∈F (cid:12)(cid:12) G (1) n ( f ) (cid:12)(cid:12) > η (cid:17) ≤ Rη h c (cid:16) D ∞ n D n + D n D ∞ n (cid:17) · Z σ q ∨ H (cid:0) ε, F , V ∗ (cid:1) d ε + √ n (cid:13)(cid:13) F { F > m ( n,σ, N ( σ , F ,V ∗ )) } (cid:13)(cid:13) i + cR (cid:16) q ∗ (cid:0) C − C − β (cid:1)(cid:16) D ∞ n D n (cid:17) (cid:17) . One possible example where this could be applicable is given in Example 6.10.
To formulate an equicontinuity statement, we use the following assumption to discuss the term which incorporates the envelope function in the upper bound of Theorem 4.6. The assumption is necessary to bound terms of the form
\[ \big\| f(Z_i, u) - f(\tilde{Z}_i(\tfrac{i}{n}), u) \big\|_2, \]
which naturally arise when a limit for the variance of $G_n(f)$ is derived. These cannot be discussed with Assumption 2.8.

Assumption 4.9.
For small enough c > and for all f ∈ F , sup u,v ∈ [0 , c s E h sup | a | L F ,s ≤ c (cid:12)(cid:12) ¯ f ( ˜ Z ( v ) , u ) − ¯ f ( ˜ Z ( v ) + a, u ) (cid:12)(cid:12) i < ∞ , (4.8) and there exists ¯ p ∈ (1 , ∞ ] such that sup i,u k ¯ f ( Z i , u ) k p < ∞ , sup v,u k ¯ f ( ˜ Z ( v ) , u ) k p < ∞ .Let ¯ F : ( R d ) N × [0 , → R be some function which fulfills sup f ∈F ¯ f ≤ ¯ F . Assumption3.10 and the conditions above also hold when ¯ f is replaced by ¯ F . Remark . In opposite to the continuous case, where all conditions imposed on f ∈ F also transfer to sup f ∈F f due to the purely analytic nature of Assumptions 2.5and 3.10, we here additionally require some envelope function ¯ F to fulfill Assumption 3.10and (4.8) because the supremum over f ∈ F does not interchange with the expectation in(4.8). We now obtain asymptotic equicontinuity of the process G n ( f ). Corollary 4.11.
Let F satisfy the Assumptions 2.8, 2.10, 3.9, 3.10 and 4.9. For ψ from (4.6), suppose that sup n ∈ N Z ∞ ψ ( ε ) p ∨ H ( ε, F , V ) dε < ∞ . (4.9)4 Furthermore, let D n , D ∞ n ∈ (0 , ∞ ) be independent of n , and sup i =1 ,...,n D ∞ n ( in ) √ n → . (4.10) Then, the process G n ( f ) is equicontinuous with respect to V , that is, for every η > , lim σ → lim sup n →∞ P (cid:16) sup f,g ∈F ,V ( f − g ) ≤ σ | G n ( f ) − G n ( g ) | ≥ η (cid:17) = 0 . Remark . Compared to Corollary 3.11, the condition (4.9) of Corollary 4.11 is notoptimal due to the additional log -factor. The reason here is that we do not approximatethe distance R n ( · ) uniformly over the class F in an external step but evaluate the neededbounds for R n ( · ) during the chaining process. This is also the reason why our result doesnot coincide with the i.i.d. case. However, in comparison to the results of Corollary 8.16we do not lose much due to this factor in the presence of polynomial dependence. Evenin the case of exponential decay, only an additional factor in the integral appears, whichcan be seen as a factor contributed by the exponential decay itself. It is possible to show the following analogue of a multivariate central limit theorem asin Theorem 3.13 for a class F which fulfills Assumption 2.8. Theorem 4.13.
Suppose that F satisfies Assumptions 2.8, 2.10, 3.9, 3.10, 3.12 and4.9. Let m ∈ N and f , ..., f m ∈ F . Then, √ n n X i =1 n f ( Z i , in ) ... f m ( Z i , in ) − E f ( Z i , in ) ... f m ( Z i , in ) o d → N (0 , (Σ ( K ) f k ,f l ) k,l =1 ,...,m ) , where Σ ( K ) is from Assumption 3.12. As a final result, we obtain the following functional central limit theorem (cf. (3.14) forthe definition of ℓ ∞ ( F )). Corollary 4.14.
Suppose that F satisfies Assumptions 2.8, 2.10, 3.9, 3.10, 3.12 and4.9. For ψ defined in (4.6), suppose that sup n ∈ N Z ψ ( ε ) p ∨ H ( ε, F , V ) dε < ∞ . Then in ℓ ∞ ( F ) , (cid:2) G n ( f ) (cid:3) f ∈F d → (cid:2) G ( f ) (cid:3) f ∈F , mpirical process theory for locally stationary processes where ( G ( f )) f ∈F is a centered Gaussian process with covariances Cov( G ( f ) , G ( g )) = Σ ( K ) f,g and Σ ( K ) is from Assumption 3.12.
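The decomposition $G_n(f) = G_n^{(1)}(f) + G_n^{(2)}(f)$ underlying this section can be made concrete in a toy case. The sketch below is an illustration under assumptions that are not part of the paper: a stationary Gaussian AR(1) process $X_i = a X_{i-1} + \varepsilon_i$ with standard normal innovations (so there is no time dependence), the indicator $f(x) = 1_{\{x \le t\}}$ as a non-continuous function, and hypothetical helper names. In this case $E[f(X_i) \mid \mathcal{G}_{i-1}] = \Phi(t - a X_{i-1})$ is a smooth function of the past even though $f$ is not. The last returned value is the average of $E[f(X_i)^2 \mid \mathcal{G}_{i-1}]$, which bounds the conditional variance of the martingale part.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)


def norm_cdf(x):
    """Standard normal distribution function, evaluated elementwise."""
    return 0.5 * (1.0 + _erf(np.asarray(x, dtype=float) / math.sqrt(2.0)))


def simulate_ar1(n, a, rng):
    """Simulate X_0, ..., X_n from a stationary Gaussian AR(1) with |a| < 1."""
    x = np.empty(n + 1)
    x[0] = rng.normal(scale=1.0 / math.sqrt(1.0 - a * a))  # stationary start
    eps = rng.normal(size=n)
    for i in range(1, n + 1):
        x[i] = a * x[i - 1] + eps[i - 1]
    return x


def decompose_indicator(x, a, t):
    """Split G_n(f) for f = 1{. <= t} into martingale and smooth parts.

    Returns (g1, g2, g, r2), where g1 + g2 = g up to rounding and r2 is the
    average of E[f(X_i)^2 | past] (for an indicator, f^2 = f).
    """
    x_prev, x_curr = x[:-1], x[1:]
    n = len(x_curr)
    f_vals = (x_curr <= t).astype(float)
    cond_mean = norm_cdf(t - a * x_prev)                  # E[f(X_i) | G_{i-1}]
    mean = float(norm_cdf(t * math.sqrt(1.0 - a * a)))    # E f(X_i), stationary law
    g1 = float(np.sum(f_vals - cond_mean) / math.sqrt(n))  # martingale part
    g2 = float(np.sum(cond_mean - mean) / math.sqrt(n))    # smooth part
    g = float(np.sum(f_vals - mean) / math.sqrt(n))        # full empirical process
    r2 = float(np.mean(cond_mean))
    return g1, g2, g, r2
```

For example, `decompose_indicator(simulate_ar1(500, 0.5, np.random.default_rng(0)), 0.5, 0.3)` returns the two parts together with their sum. The smooth part depends on the data only through the continuous map $x \mapsto \Phi(t - a x)$, which is the kind of structure that lets the second part be treated with the tools for continuous functions.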
5. Large deviation inequalities
A large variety of large deviation inequalities using the functional dependence measure have been derived; see for instance [28] and [27] for Nagaev- and Rosenthal-type inequalities. Here, we present a Bernstein-type inequality for $G_n(f)$ which can be extended to a large deviation inequality for $\sup_{f \in \mathcal{F}} |G_n(f)|$ using a combination of our chaining scheme from Section 3 and the one from [2]. We provide these results to complete the picture of empirical process theory for the functional dependence measure and to show the power of the decomposition (3.1); in general, however, the derived inequalities are weaker than a combination of Markov's inequality and Theorem 3.2. The reason for this mainly lies in the treatment of the first summand in (3.1) and the fact that the functional dependence measure is formulated with an $L^{\nu}$-norm instead of probabilities. This leads to a worsening of $V(\cdot)$ and $\beta(\cdot)$.

For q ∈ N , ν ≥
2, define ω ( q ) := q /ν log( eq ) / , L ( q ) = log log( e e q ) , Φ( q ) = q L ( q )as well as˜ β ( q ) = ∞ X j = q ∆( j ) ω ( j ) L ( j ) , ˜ V ( f ) = k f k ,n + ∞ X j =1 min {k f k ,n , D n ∆( j ) ω ( j ) }L ( j ) . With the above quantities, we can formulate the following result.
Theorem 5.1 (Bernstein-type large deviation inequality) . Let F satisfy Assumption2.5. Then there exist universal constants c , c > such that the following holds: Foreach q ∈ { , ..., n } there exists a set B n ( q ) independent of f ∈ F such that for all x > , P (cid:16)(cid:12)(cid:12) G n ( f ) (cid:12)(cid:12) > x, B n ( q ) (cid:17) ≤ c exp (cid:16) − c x ˜ V ( f ) + M Φ( q ) √ n x (cid:17) (5.1) and P ( B n ( q ) c ) ≤ (cid:16) D ∞ n ˜ β ( q ) √ nM Φ( q ) (cid:17) . Define ˜ q ∗ ( z ) := min { q ∈ N : ˜ β ( q ) ≤ Φ( q ) z } . Then for any y > , x > , P (cid:16) | G n ( f ) | > x, B n (˜ q ∗ ( M √ n D ∞ n y )) (cid:17) ≤ c exp (cid:16) − c x ˜ V ( f ) + Φ(˜ q ∗ ( M √ n D ∞ n y )) Mx √ n (cid:17) (5.2)6 and P ( B n (˜ q ∗ ( M √ n D ∞ n y )) c ) ≤ y . Remark . (i) Theorem 5.1 mimics the well-known large deviation inequalitiesfrom [21] (Theorem 5 therein) or [15] in the case of α -mixing sequences.(ii) The reason for the worsening of V, β, q to ˜ V , ˜ β, Φ( q ) in Theorem 5.1 comparedto Theorem 3.2 is due to the arising sums over l = 1 , ..., L in the second termand j = q, q + 1 , ... in the first term P ∞ j = q max f ∈F √ n (cid:12)(cid:12) S Wn,j +1 ( f ) − S Wn,j ( f ) (cid:12)(cid:12) inthe decomposition (3.1), which forces us to include additional log -factors to obtainconvergence. The additional factor j /ν that appears in ˜ β is due to an applicationof Markov’s inequality. It can be argued that this is a relict of the fact that thedependence conditions are stated with moments and not with probabilities as in thecase of mixing.(iii) Theorem 5.1 can be seen as an improvement of the Bernstein inequalities given in[8] which are only available for random variables with exponential decay (in oursetting, the conditions are comparable to ∆( k ) = O (exp( k − a )) for some a > ). A similar statement is valid in the case of non-continuous classes F . 
We then need the following analogue of Assumption 2.10, where $\beta(\cdot)$ is replaced by $\tilde{\beta}(\cdot)$ and $q$ is replaced by $\Phi(q)$.

Assumption 5.3. The sequence $j \mapsto \Delta(j) \omega(j) \mathcal{L}(j)$ is decreasing. There exists some constant $C_{\tilde{\beta}} > 0$ such that $\tilde{\beta}_{norm}(q) := \frac{\tilde{\beta}(q)}{\Phi(q)}$ fulfills for all $q_1, q_2 \in \mathbb{N}$,
\[ \tilde{\beta}_{norm}(q_1 q_2) \le C_{\tilde{\beta}} \, \tilde{\beta}_{norm}(q_1) \, \tilde{\beta}_{norm}(q_2). \]

Theorem 5.4.
Let F satisfy the Assumptions 2.8, 5.3. Then there exist universal con-stants c ◦ , c ◦ > such that the following holds: For each q ∈ { , ..., n } there exists a set B ◦ n ( q ) independent of f ∈ F such that for all x > , P (cid:16)(cid:12)(cid:12) G n ( f ) (cid:12)(cid:12) > x, B ◦ n ( q ) (cid:17) ≤ c ◦ exp (cid:16) − c ◦ x ˜ V ( f ) + M Φ( q ) √ n x (cid:17) (5.3) and P ( B ◦ n ( q ) c ) ≤ [4 + C ∆ C ˜ β ] (cid:16) √ n D ∞ n M ˜ β ( q )Φ( q ) (cid:17) . Furthermore, for any x > , y > , P (cid:16)(cid:12)(cid:12) G n ( f ) (cid:12)(cid:12) > x, B ◦ n (˜ q ∗ ( M √ n D ∞ n y )) (cid:17) ≤ c ◦ exp (cid:16) − c ◦ x ˜ V ( f ) + ˜ q ∗ ( M √ n D ∞ n y ) Mx √ n (cid:17) (5.4) and P ( B ◦ n (˜ q ∗ ( M √ n D ∞ n y ) c ) ≤ C ∆ C ˜ β y .mpirical process theory for locally stationary processes f ∈F | G n ( f ) | using a chaining scheme from [2] which incorporates an entropy integral of the form R σ ψ ( ε ) W (1 ∨ H ( ε, F , ˜ V )) dε , where ψ is a log-factor (cf. (4.6)) and W : R → R fulfills H / ≤ W ( H ) ≤ H , depending on the decay of ∆( · ). Details can be found in Section 8.6,Theorem 8.12 in the Supplementary Material Supplement A. The larger entropy integralcomes from the fact that in the proof of Theorem 5.1, we can only recover the linearexp( − x ) part of the Bernstein inequality in the discussion of the first summand in (3.1)(see (8.115) in the Supplementary Material Supplement A).
6. Applications
In this section, we provide some applications of the main results for smooth function classes (Corollary 3.3 and Corollary 3.14) and nonsmooth function classes (Corollary 4.3 and Corollary 4.14). We will focus on locally stationary processes and therefore use localization in our functionals, but the results also hold for stationary processes accordingly.

Let $K : \mathbb{R} \to \mathbb{R}$ be some bounded kernel function which is Lipschitz continuous with Lipschitz constant $L_K$, $\int K(u) \, du = 1$, $\int K(u)^2 \, du \in (0, \infty)$ and support $\subset [-\tfrac{1}{2}, \tfrac{1}{2}]$. For some bandwidth h >
0, put K h ( · ) := h K ( · h ).In the first example we consider the nonparametric kernel estimator in the context ofnonparametric regression with fixed design and locally stationary noise. We show thatunder conditions on the bandwidth h , which are common in the presence of dependence(cf. [14] or [23]), we obtain the optimal uniform convergence rate q log( n ) nh . Write a n & b n for sequences a n , b n if there exists some constant c > a n ≥ cb n for all n ∈ N . Example (Nonparametric Regression) . Let X i be some arbitrary process of theform (1.1) with P ∞ k =0 δ X ( k ) < ∞ which fulfills sup i =1 ,...,n k X i k ν ≤ C X ∈ (0 , ∞ ) forsome ν > . Suppose that we observe Y i , i = 1 , ..., n given by Y i = g ( in ) + X i , where g : [0 , → R is some function. Estimation of g is performed via ˆ g n,h ( v ) := 1 n n X i =1 K h ( in − v ) Y i . Suppose that either • δ X ( j ) ≤ κj − α with some κ > , α > , and h & ( log( n ) n − ν ) α − α , or • δ X ( j ) ≤ κρ j with some κ > , ρ ∈ (0 , and h & log( n ) n − ν . From (6.1) and (6.2) below it follows that sup v ∈ [0 , | ˆ g n,h ( v ) − E ˆ g n,h ( v ) | = O p (cid:0)r log( n ) nh (cid:1) . First note that due to Lipschitz continuity of K with Lipschitz constant L K , we have sup | v − v ′ |≤ n − (cid:12)(cid:12) (ˆ g n,h ( v ) − E ˆ g n,h ( v )) − (ˆ g n,h ( v ′ ) − E ˆ g n,h ( v ′ )) (cid:12)(cid:12) ≤ · L K n − nh n X i =1 (cid:0) | X i | + E | X i | (cid:1) = O p ( n − ) . (6.1) For the grid V n = { in − , i = 1 , ..., n } , which discretizes [0 , up to distances n − , weobtain by Corollary 3.3 that √ nh sup v ∈ V n | ˆ g n,h ( v ) − E ˆ g n,h ( v ) | = sup f ∈F | G n ( f ) | = O p (cid:0)p log | V n | (cid:1) = O p (cid:0) log( n ) / (cid:1) , (6.2) where F = { f v ( x, u ) = 1 √ h K ( u − vh ) x : v ∈ V n } . The conditions of Corollary 3.3 are easily verified: It holds that f v ( x, u ) = D f,n ( u ) · ¯ f v ( x, u ) with D f,n ( u ) = √ h K ( u − vh ) and ¯ f v ( x, u ) = x . 
Thus, Assumption 2.5 is satisfiedwith ∆( k ) = 2 δ X ( k ) , p = ∞ , R ( · ) = C R = 1 . Furthermore, D n = | K | ∞ , D ν,n = | K | ∞ √ h ,and k f v k ,n ≤ √ h (cid:16) n n X i =1 K ( v − uh ) k X i k (cid:17) / ≤ C X | K | ∞ , which shows that sup f ∈F V ( f ) = O (1) . The conditions on h emerge from the last condi-tion in (3.4) and using the bounds for r ( · ) from Table 2. For the following two examples we suppose the following properties of the underlyingprocess X i . Similar assumptions are posed in [4] and are fulfilled for a large variety oflocally stationary processes. Assumption 6.2.
Let
M > . Let X i be some process of the form (1.1). For any u ∈ [0 , , there exists ˜ X i ( u ) = J ( G i , u ) , where J is a measurable function, with thefollowing properties: There exists some constants C X > , ς ∈ (0 , such that for all i = 1 , ..., n , u , u ∈ [0 , : k ˜ X ( u ) k M ≤ C X , k X i k M ≤ C X , k X i − ˜ X i ( in ) k M ≤ C X n − ς , k ˜ X ( u ) − ˜ X ( u ) k M ≤ C X | u − u | ς . In the same spirit as in Example 6.1, it is possible to derive uniform rates of convergencefor M-estimators of parameters θ in models of locally stationary processes. Furthermore, mpirical process theory for locally stationary processes ∇ jθ denote the j -th derivativewith respect to θ . To apply empirical process theory, we ask for the objective functionsto be ( L F , , R, C )-classes in (A1) and Lipschitz with respect to θ in (A2). Lemma 6.3 (M-estimation, uniform results) . Let Θ ⊂ R d Θ be compact and θ : [0 , → interior (Θ) . For each θ ∈ Θ , let ℓ θ : R k → R be some measurable function which is twicecontinuously differentiable. Let Z i = ( X i , ..., X i − k +1 ) , and define for v ∈ [0 , , ˆ θ n,h ( v ) := arg min θ ∈ Θ L n,h ( v, θ ) , L n,h ( v, θ ) := 1 n n X i = k K h (cid:0) in − v (cid:1) · ℓ θ ( Z i ) Let Assumption 6.2 be fulfilled for some M ≥ . Suppose that there exists C Θ > suchthat for j ∈ { , , } ,(A1) ¯ F j = {∇ jθ ℓ θ : θ ∈ Θ } is an ( L F , , R, C ) -class with R ( z ) = 1 + | z | M − ,(A2) for all z ∈ R k , θ, θ ′ ∈ Θ , (cid:12)(cid:12) ∇ jθ ℓ θ ( z ) − ∇ jθ ℓ θ ′ ( z ) (cid:12)(cid:12) ∞ ≤ C Θ (1 + | z | M ) · | θ − θ ′ | , (A3) θ E ℓ θ ( ˜ Z ( v )) attains its global minimum in θ ( v ) with positive definite I ( v ) := E ∇ θ ℓ θ ( ˜ Z ( v )) .Furthermore, suppose that either • δ X M ( j ) ≤ κj − α with some κ > , α > , and h & ( log( n ) n − ν ) α − α , or • δ X M ( j ) ≤ κρ j with some κ > , ρ ∈ (0 , and h & log( n ) n − ν .Define τ n := q log( n ) nh and B h := sup v ∈ [0 , | E ∇ θ L n,h ( v, θ ( v )) | (the bias). 
Then, B h = O ( h ς ) , and as nh → ∞ , sup v ∈ [ h , − h ] (cid:12)(cid:12) ˆ θ n,h ( v ) − θ ( v ) (cid:12)(cid:12) = O p (cid:0) τ n + B h (cid:1) and sup v ∈ [ h , − h ] (cid:12)(cid:12) { ˆ θ n,h ( v ) − θ ( v ) } − I ( v ) − ∇ θ L n,h ( v, θ ( v )) (cid:12)(cid:12) = O p (( τ n + h ς )( τ n + B h )) . Remark . • In the tvAR(1) case X i = a ( i/n ) X i − + ε i , we can use for instance ℓ θ ( x , x ) = ( x − ax ) , which for a ∈ ( − , is a ((1 , a ) , , | x | + | x | , (0 , -class. • With more smoothness assumptions on ∇ θ ℓ or using a local linear estimationmethod for ˆ θ n,h , the bias term B h can be shown to be of smaller order, for in-stance O ( h ) (cf. [4]). • The theory derived in this paper can also be used to prove asymptotic propertiesof M-estimators based on objective functions ℓ θ which are only almost everywheredifferentiable in the Lebesgue sense by following the theory of chapter 5 in [22].This is of utmost interest for ℓ θ that have additional analytic properties, such asconvexity. Since these properties are also needed in the proofs, we will not discussthis in detail. We give an easy application of the functional central limit theorem from Corollary 3.14following Example 19.25 in [22].
Example (Local mean absolute deviation) . For fixed v ∈ (0 , , put X n ( v ) := n K h (cid:0) in − v (cid:1) X i and define the mean absolute deviationmad n ( v ) := 1 n n X i =1 K h (cid:0) in − v (cid:1) | X i − X n ( v ) | . Let Assumption 6.2 hold with M = 1 . Suppose that P ( ˜ X ( v ) = E ˜ X ( v )) = 0 and that forsome κ > , α > , δ X ( j ) ≤ κj − α . We show that if nh → ∞ and nh ς → , √ nh (cid:0) mad n ( v ) − E | ˜ X ( v ) − µ | (cid:1) d → N (0 , σ ) , (6.3) where µ = E ˜ X ( v ) , G denotes the distribution function of ˜ X ( v ) and σ = Z K ( u ) du · ∞ X j =0 Cov (cid:0) | ˜ X ( v ) − µ | + (2 G ( µ ) −
1) ˜ X ( v ) , | ˜ X j ( v ) − µ | + (2 G ( µ ) −
1) ˜ X j ( v ) (cid:1) . The result is obtained by using the decomposition √ nh (cid:0) mad n ( v ) − E | ˜ X ( v ) − µ | (cid:1) = G n ( f X n ( v ) − f µ ) + G n ( f µ ) + A n ,A n = √ nhn n X i =1 K h (cid:0) in − v (cid:1) { E | X i − θ | − E | ˜ X ( v ) − µ | (cid:9)(cid:12)(cid:12)(cid:12) θ = X n ( v ) , where Θ = { θ ∈ R : | θ − µ | ≤ } and F = { f θ ( x, u ) = √ hK h ( u − v ) | x − θ | : θ ∈ Θ } . By the triangle inequality, F satisfies Assumption 2.5 with ¯ f θ ( x, u ) = | x − θ | , R ( · ) = C R =1 , p = ∞ , s = 1 and ∆( k ) = 2 δ X ( k ) . Assumption 3.9 is satisfied through Assumption6.2, and Assumption 3.10 is trivially fulfilled since ¯ f does not depend on u . Since F is ampirical process theory for locally stationary processes one-dimensional Lipschitz class, sup n ∈ N H ( ε, F , k · k ,n ) = O (log( ε − ∨ . By Corollary3.14, we obtain that there exists some process [ G ( f θ )] θ ∈ Θ such that for h → , nh → ∞ , (cid:2) G n ( f θ ) (cid:3) θ ∈ Θ d → (cid:2) G ( f θ ) (cid:3) θ ∈ Θ in ℓ ∞ (Θ) . (6.4) Furthermore, by Assumption 6.2, k f X n ( v ) ( X i ) − f µ ( X i ) k ≤ k X n ( v ) − µ k ≤ k X n ( v ) − E X n ( v ) k + k E X n ( v ) − µ k ≤ √ nh (cid:16) nh n X i =1 K (cid:0) in − vh (cid:1) (cid:17) / ∞ X j =0 δ X ( j ) + 1 n n X i =1 K h ( in − v ) (cid:12)(cid:12) E X i − E ˜ X ( v ) | = O (( nh ) − / + h ς ) . (6.5) By Lemma 19.24 in [22], we conclude from (6.4) and (6.5) that G n ( f X n ( v ) − f µ ) p → . (6.6) By Assumption 6.2 and bounded variation of K , A n = √ nh (cid:8) E | ˜ X ( v ) − θ | (cid:12)(cid:12) θ = X n ( v ) − E | ˜ X ( v ) − µ | (cid:9) + O p (( nh ) − / + ( nh ) / h − ς ) . (6.7) Due to P ( ˜ X ( v ) = µ ) = 0 , g ( θ ) = E | ˜ X ( v ) − θ | is differentiable in θ = µ with derivative G ( µ ) − . The Delta method delivers √ nh (cid:8) E | ˜ X ( v ) − θ | (cid:12)(cid:12) θ = X n ( v ) − E | ˜ X ( v ) − µ | (cid:9) = (2 G ( µ ) − √ nh ( X n ( v ) − µ ) + o p (1) . 
(6.8) From (6.6), (6.7) and (6.8) we obtain √ nh (cid:0) mad n ( v ) − E | ˜ X ( v ) − µ | (cid:1) = G n ( f µ + (2 G ( µ ) − id ) + o p (1) . Theorem 3.13 now yields (6.3).
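The statistic of this example is straightforward to compute. The following Python sketch is a minimal illustration under stated assumptions: it reads $\bar{X}_n(v)$ as the kernel-weighted local mean $\frac{1}{n} \sum_{i=1}^n K_h(\tfrac{i}{n} - v) X_i$ (in parallel to $\hat{g}_{n,h}$ from the regression example), uses $K_h(\cdot) = \frac{1}{h} K(\tfrac{\cdot}{h})$, and picks a rescaled Epanechnikov kernel as one possible bounded Lipschitz kernel with $\int K(u) \, du = 1$ and support $[-\tfrac{1}{2}, \tfrac{1}{2}]$. The function names are hypothetical and NumPy is an assumed dependency.

```python
import numpy as np


def kernel(u):
    """Rescaled Epanechnikov kernel: bounded, Lipschitz, integrates to 1,
    supported on [-1/2, 1/2]. This particular choice is an assumption."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 0.5, 1.5 * (1.0 - 4.0 * u**2), 0.0)


def kernel_weights(n, v, h):
    """Weights K_h(i/n - v) = K((i/n - v) / h) / h for i = 1, ..., n."""
    grid = np.arange(1, n + 1) / n
    return kernel((grid - v) / h) / h


def local_mean(x, v, h):
    """Kernel-weighted local mean (1/n) * sum_i K_h(i/n - v) * X_i."""
    x = np.asarray(x, dtype=float)
    return float(np.mean(kernel_weights(len(x), v, h) * x))


def local_mad(x, v, h):
    """Local mean absolute deviation
    (1/n) * sum_i K_h(i/n - v) * |X_i - local_mean(x, v, h)|."""
    x = np.asarray(x, dtype=float)
    w = kernel_weights(len(x), v, h)
    center = np.mean(w * x)
    return float(np.mean(w * np.abs(x - center)))
```

For a series `x` of length `n`, `local_mad(x, v=0.5, h=0.2)` evaluates the statistic at the rescaled time point `v`. The bandwidth should satisfy the conditions of the example; the values `0.5` and `0.2` are placeholders and not recommendations.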
To keep the following examples simple, we restrict ourselves to rather specific models. It is not hard to apply our theory to more general situations.
Model 6.6 (Recursively defined models). The process $X_i$, $i = 1, ..., n$, follows a recursion
$$X_i = m\big(X_{i-1}, \tfrac{i}{n}\big) + \sigma\big(X_{i-1}, \tfrac{i}{n}\big)\,\varepsilon_i,$$
where $\varepsilon_i$, $i \in \mathbb{Z}$, is an i.i.d. sequence of random variables and $\sigma, m : \mathbb{R} \times [0,1] \to \mathbb{R}$. Suppose that there exist $\chi_m, C_m, \varsigma > 0$ such that
$$\sup_{x \neq x'} \sup_{u} \frac{|m(x,u) - m(x',u)|}{|x - x'|} \le \chi_m, \qquad \sup_{u \neq u'} \sup_{x} \frac{|m(x,u) - m(x,u')|}{(1 + |x|) \cdot |u - u'|^{\varsigma}} \le C_m, \qquad (6.9)$$
and $\sup_u |m(0,u)| \le C_m$. Let $\sigma(\cdot)$ satisfy the same properties with constants $\chi_\sigma, C_\sigma > 0$. Let $s > 0$ be such that
$$\chi_m + \|\varepsilon\|_{s} \cdot \chi_\sigma < 1.$$
By Proposition 4.4 and Lemma 4.5 in [4], Assumption 6.2 is fulfilled and with some $\rho \in (0,1)$, $\delta^X_{s}(k) \le C_X \rho^k$.

Model 6.7 (Linear models). The process $X_i$, $i = 1, ..., n$, has the form
$$X_i = \sum_{j=0}^{\infty} a_j\big(\tfrac{i}{n}\big)\,\varepsilon_{i-j},$$
where $\varepsilon_i$, $i \in \mathbb{Z}$, is an i.i.d. sequence and $a_j : [0,1] \to \mathbb{R}$ are some functions. There exist $M > 0$, $\varsigma > 0$ and some absolutely summable sequences $A = (A_j)_{j \in \mathbb{N}}$, $\bar A = (\bar A_j)_{j \in \mathbb{N}}$ such that $\|\varepsilon\|_{s} < \infty$ and for $j \in \mathbb{N}$,
$$\sup_{u \in [0,1]} |a_j(u)| \le A_j, \qquad \sup_{u_1 \neq u_2} \frac{|a_j(u_1) - a_j(u_2)|}{|u_1 - u_2|^{\varsigma}} \le \bar A_j.$$
Furthermore, $\inf_u a_0(u) \ge \sigma_{\min} > 0$. Then it is easily seen that Assumption 6.2 is fulfilled and $\delta^X_{s}(j) \le 2\|\varepsilon\|_{s} A_j$.

To verify Assumption 2.8 in the case of density and distribution function estimation, the linear Model 6.7 can be dealt with as a "special case" of the recursive Model 6.6 by identifying $m(X_{i-1}, \frac{i}{n})$ with $\sum_{j=1}^{\infty} a_j(i/n)\,\varepsilon_{i-j}$ and $\sigma(X_{i-1}, \frac{i}{n})$ with $a_0(i/n)$. There exists a standard method to show that Assumption 2.8 is valid by only imposing minimal moment conditions on the underlying process $X_i$: We will see that there will be terms of the form
$$\frac{1}{\sigma(z,u)}\, g\Big(\frac{y - m(z,u)}{\sigma(z,u)}\Big),$$
where $y \in \mathbb{R}$ and $g(\cdot)$ is some bounded continuously differentiable function with $C_{g',1} := \sup_{x \in \mathbb{R}} |g'(x)x| < \infty$.
Omitting the second argument of m, σ for shortness, we have for mpirical process theory for locally stationary processes ξ z,z ′ between 1 and σ ( z ′ ) σ ( z ) , (cid:12)(cid:12)(cid:12) g (cid:16) y − m ( z ) σ ( z ) (cid:17) − g (cid:16) y − m ( z ′ ) σ ( z ′ ) (cid:17)(cid:12)(cid:12)(cid:12) (6.10) ≤ | g ′ | ∞ σ min | m ( z ) − m ( z ′ ) | + (cid:12)(cid:12)(cid:12) g (cid:16) y − m ( z ′ ) σ ( z ′ ) · σ ( z ′ ) σ ( z ) (cid:17) − g (cid:16) y − m ( z ′ ) σ ( z ′ ) (cid:17)(cid:12)(cid:12)(cid:12) ≤ | g ′ | ∞ σ min | m ( z ) − m ( z ′ ) | + (cid:12)(cid:12)(cid:12) g (cid:16) y − m ( z ′ ) σ ( z ′ ) ξ z,z ′ (cid:17) y − m ( z ′ ) σ ( z ′ ) ξ z,z ′ × (cid:16) σ ( z ′ ) σ ( z ) − (cid:17) ξ − z,z ′ (cid:12)(cid:12)(cid:12) ≤ | g ′ | ∞ σ min | m ( z ) − m ( z ′ ) | + 2 C g ′ , σ min | σ ( z ) − σ ( z ′ ) | . (6.11)On the other hand, (6.10) is bounded by 2 | g | ∞ . Using the fact that for x ≥
0, min { , x } ≤ x a for arbitrary small a ∈ (0 , (cid:12)(cid:12) σ ( z ) − σ ( z ′ ) | ≤ min { σ − min , σ − min | σ ( z ) − σ ( z ′ ) |} ≤ σ − min σ − amin | σ ( z ) − σ ( z ′ ) | a and from (6.11) that (cid:12)(cid:12)(cid:12) σ ( z ) g (cid:16) y − m ( z ) σ ( z ) (cid:17) − σ ( z ′ ) g (cid:16) y − m ( z ′ ) σ ( z ′ ) (cid:17)(cid:12)(cid:12)(cid:12) ≤ | g | ∞ σ min (cid:16) | g ′ | ∞ | g | ∞ σ min (cid:17) a | m ( z ) − m ( z ′ ) | a + 2 | g | ∞ σ min (cid:16) σ min ∨ C g ′ , | g | ∞ (cid:17) a | σ ( z ) − σ ( z ′ ) | a . (6.12) Example (Density estimation) . With some kernel ˜ K : R → [0 , ∞ ) , we considerthe localized density estimate of the density g ˜ X ( v ) of ˜ X ( v ) , ˆ g n,h ( x, v ) = 1 n n X i =1 K h ( in − v ) ˜ K h ( X i − x ) , where h , h > are some bandwidths and we abbreviate h = ( h , h ) . Suppose that • X i evolves like Model 6.6 or Model 6.7 and for some α > ( s ∧ ) − , δ X s ( j ) = O ( j − α ) , • ε fulfills C ε := k ε k s < ∞ , has a density g ε with respect to the Lebesgue measurewhich is bounded, continuously differentiable and satisfies sup x ∈ R | g ′ ε ( x ) x | < ∞ . • there exists p ˜ K ≥ s, C ˜ K > such that for u large enough, | ˜ K ( u ) | ≤ C ˜ K | u | − p ˜ K .Furthermore, R ˜ K ( x ) dx = 1 , R ˜ K ( x ) dx < ∞ and R ˜ K ( x ) | x | dx < ∞ .We show that if log( n ) (cid:0) nh h α ( s ∧
12 ) α ( s ∧
12 ) − (cid:1) − = O (1) , sup x ∈ R ,v ∈ [0 , (cid:12)(cid:12) ˆ g n,h ( x, v ) − g ˜ X ( v ) ( x ) (cid:12)(cid:12) = O p (cid:0)s log( n ) nh h + p nh h ( h + h ς ( s ∧ ) (cid:1) . (6.13)4 To do so, note that p nh h (cid:0) ˆ g n,h ( x, v ) − E ˆ g n,h ( x, v ) (cid:1) = G n ( f x,v ) , where F = { f x,v ( z, u ) = p h K h ( u − v ) · p h ˜ K h ( z − x ) : x ∈ R , v ∈ [0 , } . With ¯ f x,v ( z, u ) = √ h ˜ K h ( z − x ) and κ ∈ { , } , we have by a substitution ω = h − ( m ( X i − , in ) + σ ( X i − , in ) ε − x ) , E [ ¯ f x,v ( X i , u ) κ |G i − ] /κ = 1 √ h E h ˜ K (cid:16) m ( X i − , in ) + σ ( X i − , in ) ε i − xh (cid:17) κ (cid:12)(cid:12)(cid:12) G i − i /κ = 1 √ h h Z ˜ K (cid:16) m ( X i − , in ) + σ ( X i − , in ) ε − xh (cid:17) κ g ε ( ε ) dε i /κ = h κ − h Z ˜ K ( ω ) κ σ ( X i − , in ) g ε ( x + h ω − m ( X i − , in ) σ ( X i − , in ) ) dω i /κ =: ¯ µ ( κ ) f x,v ,i ( X ◦ i − , u ) with X ◦ i = X i . By H¨older continuity of the square root, (6.12) and (6.9), we obtain (cid:12)(cid:12) ¯ µ ( κ ) f x ,i ( z, u ) − ¯ µ ( κ ) f x ,i ( z, u ) (cid:12)(cid:12) ≤ C (cid:16) Z ˜ K ( ω ) κ dω (cid:17) /κ | z − z ′ | s ∧ κ , where C depends on | g | ∞ , | g ′ ε | ∞ , sup x ∈ R | g ′ ε ( x ) x | , χ m , χ σ , σ min .The class F therefore satisfies Assumption 2.8 with p = ∞ , ν = 2 , ∆( k ) = O ( δ X s ( k ) s ∧ ) = O ( j − α ( s ∧ ) ) . Note that ¯ F ( z, u ) = sup f ∈F ¯ f ( z, u ) ≤ | ˜ K | ∞ √ h =: C ¯ F ,n . We obtain from Corol-lary 4.3 that for the grids V n = { in − : i = 1 , ..., n } , X n = { in − : i ∈ {− ⌈ n s ⌉ , ..., ⌈ n s ⌉}} , p nh h sup x ∈X n ,v ∈ V n (cid:12)(cid:12) ˆ g n,h ( x, v ) − E ˆ g n,h ( x, v ) (cid:12)(cid:12) = sup x ∈X n ,v ∈ V n | G n ( f x,v ) | = O p (cid:0)p log( n ) (cid:1) . The discretization of (6.13) and the replacement of E ˆ g n,h ( x, v ) by g ˜ X ( v ) ( x ) is ratherstandard and postponed to the Supplementary material Supplement A, Section 8.7. Remark . 
• Note that due to Remark 4.4, all statements of the Example alsohold for s ∧ replaced by s ∧ . • Compared to [14] or [23], which proved similar results in the case that dependenceis quantified with α -mixing coefficients, we get much weaker conditions on the lowerbounds of the bandwidth h in order to guarantee uniform convergence. The reasonhere is that we assume that ε has a Lebesgue density which was not asked for inthe above papers. If we want to use our theory to prove (6.13) without assumingthat ε has a Lebesgue density, we would have to impose conditions which allow usto prove the statements of Lemma 2.7 directly.mpirical process theory for locally stationary processes F , we are able to quantifythe conditions on the moments and the decay of dependence in an easy way. Example (Empirical distribution function) . Let v ∈ [0 , . As an estimator forthe distribution function G ˜ X ( v ) of ˜ X ( v ) , we consider ˆ G n,h ( x, v ) := 1 n n X i =1 K h ( in − v ) { X i ≤ x } , x ∈ R , Suppose that • X i evolves like Model 6.6 or Model 6.7 and for some α > ( s ∧ ) − , δ X s ( j ) = O ( j − α ) , • ε fulfills C ε := k ε k s < ∞ , its distribution function G ε is continuously differen-tiable with derivative g ε and satisfies sup x ∈ R | g ε ( x ) x | < ∞ .Then as nh → ∞ , √ nh · h ς ( s ∧ → , √ nh (cid:2) ˆ G n,h ( x, v ) − G ˜ X ( v ) ( x ) (cid:3) x ∈ R d → (cid:2) G ( x ) (cid:3) x ∈ R in ℓ ∞ ( R ) , (6.14) where G is some centered Gaussian process with covariance function Cov( G ( x ) , G ( y )) = Z K ( u ) du X j ∈ Z Cov( { ˜ X ( v ) ≤ x } , { ˜ X j ( v ) ≤ y } ) . To prove this result, we use the fact that √ nh (cid:2) ˆ G n,h ( x, v ) − E ˆ G n,h ( x, v ) (cid:3) x ∈ R = (cid:2) G n ( f ) (cid:3) f ∈F , (6.15) where F = { f x ( z, u ) = √ hK h ( u − v ) { z ≤ x } : x ∈ R } . With ¯ f x ( z, u ) = { z ≤ x } , we have E [ f x ( X i , u ) κ |G i − ] /κ = G ε (cid:16) x − m ( X i − , in ) σ ( X i − , in ) (cid:17) /κ =: ¯ µ ( κ ) f x ,i ( X i − , u ) . 
As in Example 6.8, we see that Assumption 2.8 is satisfied with X ◦ i = X i , p = ∞ , ν = 2 , ∆( k ) = O ( δ X s ( k − s ) = O ( j − α ( s ∧ ) ) . Assumption 3.9 follows directly from Model 6.6or Model 6.7. Assumption 3.10 is trivially satisfied since ¯ f x ( z, u ) does not depend on thesecond argument. For any x ∈ R , sup | a |≤ c (cid:12)(cid:12) { ˜ X ( u ) ≤ x } − { ˜ X ( u )+ a ≤ x } (cid:12)(cid:12) ≤ (cid:12)(cid:12) { ˜ X ( u ) ≤ x } − { ˜ X ( u ) ≤ x − c } (cid:12)(cid:12) ≤ { x − c< ˜ X ( u ) ≤ x } , so that for s ∈ (0 , ] , sup u ∈ [0 , c s E h sup | a |≤ c (cid:12)(cid:12) { ˜ X ( u ) ≤ x } − { ˜ X ( u )+ a ≤ x } (cid:12)(cid:12) i ≤ sup u ∈ [0 , c s (cid:12)(cid:12)(cid:12) G ε (cid:16) x − m ( ˜ X i − ( u ) , u )) σ ( ˜ X i − ( u ) , u ) (cid:17) − G ε (cid:16) ( x − c ) − m ( ˜ X i − ( u ) , u )) σ ( ˜ X i − ( u ) , u ) (cid:17)(cid:12)(cid:12)(cid:12) ≤ c s min { , | g ε | ∞ cσ min } ≤ (cid:16) | g ε | ∞ σ min (cid:17) s , which shows Assumption 4.9.For γ > , we can discretize [ − γ − s , γ − s ] by a grid { x j } j = − N,...,N with distances γ ,having roughly γ − s − points. Under the given conditions, it is possible to show thatwith x N +1 = ∞ , x − ( N +1) = −∞ , the brackets [ f x j − , f x j ] , j = − N, ..., N + 1 , cover F . Details can be found in the Supplementary material Supplement A, Section 8.7. Wetherefore have q H ( γ, F , k · k ,n ) = O (cid:0)p log( γ − ) (cid:1) . By Table 2, as long as α ( s ∧ ) > , we have sup n ∈ N R ψ ( ε ) p H ( ε, F , V ) dε < ∞ . ByCorollary 4.14 we obtain for nh → ∞ that (6.15) converges to (cid:2) G ( x ) (cid:3) x ∈ R . Remark . • We conjecture that s ∧ can be replaced by s ∧ due to Remark4.8. • With similar techniques as presented in 6.12, it is also possible to include weightfunctions w : R → [0 , ∞ ) with lim x →±∞ w ( x ) = ∞ as additional factors to theconvergence (6.14), as done in [25]. 
• In Model 6.6, it is also reasonable to consider estimation of the residual distribution function $G_\varepsilon$ itself. Following the approach of [1], we first have to specify estimators $\hat m$, $\hat\sigma$ for $m$, $\sigma$, respectively, and define empirical residuals
$$\hat\varepsilon_i = \frac{X_i - \hat m(X_{i-1}, i/n)}{\hat\sigma(X_{i-1}, i/n)}.$$
Then
$$\hat G_\varepsilon(x) = \frac{1}{n}\sum_{i=1}^{n} \mathbb{1}_{\{\hat\varepsilon_i \le x\}} = \frac{1}{n}\sum_{i=1}^{n} \mathbb{1}_{\big\{\varepsilon_i \le x \cdot \frac{\hat\sigma(X_{i-1}, i/n)}{\sigma(X_{i-1}, i/n)} + \frac{\hat m(X_{i-1}, i/n) - m(X_{i-1}, i/n)}{\sigma(X_{i-1}, i/n)}\big\}}$$
can be discussed with empirical process theory and analytic properties of $\hat m$, $\hat\sigma$.
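The two estimators discussed in the examples of this section, the localized kernel density estimate $\hat g_{n,h}(x,v)$ and the localized empirical distribution function $\hat G_{n,h}(x,v)$, can be tried out on simulated data from a recursion of the type in Model 6.6. The following Python sketch is only an illustration and not part of the paper. The coefficient functions `m` and `sigma`, the Gaussian innovations, the Epanechnikov and Gaussian kernels, the bandwidths and all function names are assumptions chosen for the example. The Lipschitz constant of `m` in its first argument is at most 0.7 and `sigma` is constant, so a contraction condition of the form $\chi_m + \|\varepsilon\| \cdot \chi_\sigma < 1$ holds.

```python
import numpy as np

def epanechnikov(x):
    """Epanechnikov kernel on [-1, 1], used here as the kernel K in the time direction."""
    return np.where(np.abs(x) <= 1.0, 0.75 * (1.0 - x**2), 0.0)

def gaussian_kernel(x):
    """Standard normal density, used here as the kernel in the space direction."""
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

def simulate_recursion(n, m, sigma, rng):
    """Simulate X_i = m(X_{i-1}, i/n) + sigma(X_{i-1}, i/n) * eps_i with N(0, 1) innovations."""
    eps = rng.standard_normal(n)
    x = np.empty(n)
    prev = 0.0  # arbitrary starting value (assumption)
    for i in range(n):
        u = (i + 1) / n
        prev = m(prev, u) + sigma(prev, u) * eps[i]
        x[i] = prev
    return x

def time_weights(n, v, h):
    """Weights K_h(i/n - v) / n for i = 1, ..., n, where K_h(x) = K(x / h) / h."""
    grid = np.arange(1, n + 1) / n
    return epanechnikov((grid - v) / h) / (n * h)

def local_density(x, pts, v, h1, h2):
    """Localized density estimate: (1/n) sum_i K_{h1}(i/n - v) * Ktilde_{h2}(X_i - pt)."""
    w = time_weights(len(x), v, h1)
    k = gaussian_kernel((x[None, :] - np.asarray(pts)[:, None]) / h2) / h2
    return k @ w

def local_edf(x, pts, v, h):
    """Localized empirical distribution function: (1/n) sum_i K_h(i/n - v) * 1{X_i <= pt}."""
    w = time_weights(len(x), v, h)
    ind = (x[None, :] <= np.asarray(pts)[:, None]).astype(float)
    return ind @ w

# Hypothetical coefficient functions of a time-varying AR(1) type.
m = lambda x, u: (0.2 + 0.5 * u) * x
sigma = lambda x, u: 1.0

rng = np.random.default_rng(1)
x = simulate_recursion(4000, m, sigma, rng)
pts = np.linspace(-4.0, 4.0, 81)
dens = local_density(x, pts, v=0.5, h1=0.15, h2=0.3)  # density estimate at time v = 0.5
edf = local_edf(x, pts, v=0.5, h=0.15)                # distribution function estimate at v = 0.5
```

Both estimators use the same time weights, so `edf` is a weighted empirical distribution function of the observations near rescaled time $v$, and `dens` is its kernel-smoothed counterpart. Evaluating them on grids in $x$ and $v$ mirrors the discretization step used in the proof of (6.13).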
7. Conclusion
In this paper, we have developed an empirical process theory for locally stationary processes via the functional dependence measure. We have proven maximal inequalities, [...] $L$- or $L^{\infty}$-statistics. We have given several examples in nonparametric estimation where our theory is applicable. Since the size of the function class and the stochastic properties of the underlying process can be analyzed separately, we conjecture that our theory also permits an extension of various results from i.i.d. to dependent data, such as empirical risk minimization.

From a technical point of view, the linear and moment-based nature of the functional dependence measure has forced us to modify several approaches from empirical process theory for i.i.d. or mixing variables. A main issue was the fact that the dependence measure only transfers decay rates for continuous functions. We therefore have provided a new chaining technique which preserves continuity of the arguments of the empirical process and extended the results to noncontinuous functions. We were not able to derive Bernstein-type maximal inequalities with an optimal entropy integral. This may be addressed in future work.

In principle, a similar empirical process theory can be established for (1.3) under mixing conditions such as absolute regularity. This would be a generalization of the results found in [21] and [5]. However, in a number of models the derivation of a bound for these mixing coefficients may require some effort, while the functional dependence measure is usually easy to bound if the evolution of the process over time is known. Similar to such a mixing framework, it is possible to apply our theory as long as the decay coefficients of the functional dependence measure are absolutely summable.
However, it turns out that there are significant differences: In our framework, the integrand $\sqrt{H(\varepsilon, \mathcal{F}, \|\cdot\|_{2,n})}$ of the entropy integral is multiplied by some factor dependent on $\varepsilon$ while only second moments are needed, whereas in the mixing case there is no additional factor but more moments are needed through a larger norm. Only in special cases are these integrals comparable; the exact connection between the values of the functional dependence measure and $\beta$-mixing coefficients remains an open question.

References

[1] Akritas, M. G. and Van Keilegom, I. (2001). Non-parametric estimation of the residual distribution. Scand. J. Statist.
[2] Alexander, K. S. (1984). Probability inequalities for empirical processes and a law of the iterated logarithm. The Annals of Probability.
[3] Borkar, V. S. (1993). White-Noise Representations in Stochastic Realization Theory. SIAM J. Control Optim.
[4] Dahlhaus, R., Richter, S. and Wu, W. B. (2019). Towards a general theory for nonlinear locally stationary processes. Bernoulli.
[5] Dedecker, J. and Louhichi, S. (2002). Maximal inequalities and empirical central limit theorems. In Empirical process techniques for dependent data.
[6] Donsker, M. D. (1952). Justification and extension of Doob's heuristic approach to the Kolmogorov-Smirnov theorems. Ann. Math. Statistics.
[7] Doukhan, P., Massart, P. and Rio, E. (1995). Invariance principles for absolutely regular empirical processes. Ann. Inst. H. Poincaré Probab. Statist.
[8] Doukhan, P. and Neumann, M. H. (2007). Probability and moment inequalities for sums of weakly dependent random variables, with applications. Stochastic Processes and their Applications.
[9] Dudley, R. M. (1966). Weak convergences of probabilities on nonseparable metric spaces and empirical measures on Euclidean spaces. Illinois J. Math.
[10] Dudley, R. M. (1978). Central limit theorems for empirical measures. Ann. Probab.
[11] Francq, C. and Zakoïan, J.-M. (2006). Mixing properties of a general class of GARCH(1,1) models without moment assumptions on the observed process. Econometric Theory.
[12] Freedman, D. A. (1975). On Tail Probabilities for Martingales. Ann. Probab.
[13] Fryzlewicz, P. and Subba Rao, S. (2011). Mixing properties of ARCH and time-varying ARCH processes. Bernoulli.
[14] Hansen, B. E. (2008). Uniform convergence rates for kernel estimation with dependent data. Econometric Theory.
[15] Liebscher, E. (1996). Strong convergence of sums of α-mixing random variables with applications to density estimation. Stochastic Processes and their Applications.
[16] Mayer, U. (2019). Functional weak limit theorem for a local empirical process of non-stationary time series and its application to von Mises-statistics.
[17] Mokkadem, A. (1988). Mixing properties of ARMA processes. Stochastic Process. Appl.
[18] Nishiyama, Y. et al. (2000). Weak convergence of some classes of martingales with jumps. The Annals of Probability.
[19] Pinelis, I. (1994). Optimum bounds for the distributions of martingales in Banach spaces. Ann. Probab.
[20] Pollard, D. (1982). A central limit theorem for empirical processes. J. Austral. Math. Soc. Ser. A.
[21] Rio, E. (1995). The Functional Law of the Iterated Logarithm for Stationary Strongly Mixing Sequences. Ann. Probab.
[22] van der Vaart, A. W. (1998). Asymptotic statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge. MR1652247
[23] Vogt, M. (2012). Nonparametric regression for locally stationary time series. Ann. Statist.
[24] Wu, W. B. (2005). Nonlinear system theory: another look at dependence. Proc. Natl. Acad. Sci. USA.
[25] Wu, W. B. (2008). Empirical processes of stationary sequences. Statistica Sinica.
[26] Wu, W. B. (2011). Asymptotic theory for stationary processes. Stat. Interface.
[27] Wu, W. B., Liu, W. and Xiao, H. (2013). Probability and moment inequalities under dependence. Statist. Sinica.
[28] Zhang, D. and Wu, W. B. (2017). Gaussian approximation for high dimensional time series. Ann. Statist.

Supplementary Material

Supplement A: Technical proofs (doi: COMPLETED BY THE TYPESETTER; .pdf). This material contains some details of the proofs in the paper as well as the proofs of the examples.
8. Appendix
Proof of Lemma 2.7.
We have for each f ∈ F and ν ≥ i (cid:13)(cid:13)(cid:13) ¯ f ( Z i , u ) − ¯ f ( Z ∗ ( i − k ) i , u ) (cid:13)(cid:13)(cid:13) ν ≤ sup i (cid:13)(cid:13)(cid:13) | Z i − Z ∗ ( i − k ) i | sL F ,s (cid:0) R ( Z i , u ) + R ( Z ∗ ( i − k ) i , u ) (cid:1)(cid:13)(cid:13)(cid:13) ν ≤ sup i (cid:13)(cid:13)(cid:13)(cid:12)(cid:12) Z i − Z ∗ ( i − k ) i (cid:12)(cid:12) sL F ,s (cid:13)(cid:13)(cid:13) pp − ν (cid:13)(cid:13)(cid:13) R ( Z i , u ) + R ( Z ∗ ( i − k ) i , u ) (cid:13)(cid:13)(cid:13) pν ≤ sup i (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ∞ X j =0 L F ,j (cid:12)(cid:12) X i − j − X ∗ ( i − k ) i − j (cid:12)(cid:12) s ∞ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) pp − ν (cid:18) k R ( Z i , u ) k pν + (cid:13)(cid:13)(cid:13) R ( Z ∗ ( i − k ) i , u ) (cid:13)(cid:13)(cid:13) pν (cid:19) ≤ dC R k X j =0 L F ,j ( δ X pp − νs ( k − j )) s . This shows the first assertion. Due tosup f ∈F (cid:12)(cid:12) ¯ f ( Z i , u ) − ¯ f ( Z ∗ ( i − k ) i , u ) (cid:12)(cid:12) ≤ | Z i − Z ∗ ( i − k ) i | sL F ,s (cid:0) R ( Z i , u ) + R ( Z ∗ ( i − k ) i , u ) (cid:1) , the second assertion follows similarly. The last assertion follows from | ¯ f ( z, u ) | ≤ | ¯ f ( z, u ) − ¯ f (0 , u ) | + | ¯ f (0 , u ) | ≤ | z | sL F ,s · ( R ( z, u ) + R (0 , u )) + | ¯ f (0 , u ) | which implies k ¯ f ( Z i , u ) k ν ≤ (cid:13)(cid:13)(cid:13) ∞ X j =0 L F ,j | Z i − j | s ∞ (cid:13)(cid:13)(cid:13) pp − ν (cid:0)(cid:13)(cid:13) R ( Z i , u ) (cid:13)(cid:13) pq + R (0 , u ) (cid:1) + | ¯ f (0 , u ) |≤ d · | L F | · C sX · ( C R + | R (0 , u ) | ) + | ¯ f (0 , u ) |≤ d · | L F | · C sX · C R + C ¯ f . mpirical process theory for locally stationary processes Lemma 8.1.
Let Assumption 2.8 hold for some ν ≥ . Then for all u ∈ [0 , , δ E [ f ( Z i ,u ) |G i − ] ν ( k ) ≤ | D f,n ( u ) | · ∆( k ) , (8.1)sup i (cid:13)(cid:13)(cid:13) sup f ∈F (cid:12)(cid:12) E [ f ( Z i , u ) |G i − ] − E [ f ( Z i , u ) |G i − ] ∗ ( i − k ) (cid:12)(cid:12)(cid:13)(cid:13)(cid:13) ν ≤ D ∞ n ( u ) · ∆( k ) , (8.2)sup i k f ( Z i , u ) k ≤ | D f,n ( u ) | · C ∆ . (8.3) Furthermore, δ E [ f ( Z i ,u ) |G i − ] ν/ ( k ) ≤ | D f,n ( u ) | · sup i k f ( Z i , u ) k ν · ∆( k ) , (8.4) (cid:13)(cid:13)(cid:13) sup f ∈F (cid:12)(cid:12) E [ f ( Z i , u ) |G i − ] − E [ f ( Z i , u ) |G i − ] ∗ ( i − k ) (cid:12)(cid:12)(cid:13)(cid:13)(cid:13) ν/ ≤ D ∞ n ( u ) · C ∆ · ∆( k ) , (8.5) where C ∆ := 2 max { d, ˜ d }| L F | C sX C R + C ¯ f . Proof of Lemma 8.1.
We havesup i (cid:13)(cid:13) E [ f ( Z i , u ) |G i − ] − E [ f ( Z i , u ) |G i − ] ∗ ( i − k ) (cid:13)(cid:13) ν = | D f,n ( u ) | · sup i (cid:13)(cid:13) ¯ µ (1) f,i ( Z ◦ i − , u ) − ¯ µ (1) f,i (( Z ◦ i − ) ∗ ( i − k ) , u ) (cid:13)(cid:13) ν ≤ | D f,n ( u ) | · sup i (cid:13)(cid:13)(cid:13)(cid:12)(cid:12) Z ◦ i − − ( Z ◦ i − ) ∗ ( i − k ) (cid:12)(cid:12) sL F ,s (cid:13)(cid:13)(cid:13) pνp − (cid:13)(cid:13)(cid:13) R ( Z ◦ i − , u ) + R (( Z ◦ i − ) ∗ ( i − k ) , u ) (cid:13)(cid:13)(cid:13) pν ≤ | D f,n ( u ) | · sup i (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ∞ X j =0 L F ,j (cid:12)(cid:12) X ◦ i − − j − ( X ◦ i − − j ) ∗ ( i − k ) (cid:12)(cid:12) s ∞ (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) pνp − × (cid:16) k R ( G i − , u ) k pν + (cid:13)(cid:13)(cid:13) R ( G ∗ ( i − k ) i − , u ) (cid:13)(cid:13)(cid:13) pν (cid:17) ≤ | D f,n ( u ) | · d ◦ C R k − X j =0 L F ,j δ pνsp − ( k − j − s , that is, the assertion (8.1) holds with the given ∆( k ). The proof of (8.2) is similar.We now prove (8.3). We have E [ f ( Z i , u ) ] = E [ E [ f ( Z i , u ) |G i − ]] = D f,n ( u ) E [¯ µ (2) f,i ( Z ◦ i − , u ) ]and thus k f ( Z i , u ) k = | D f,n ( u ) | · k ¯ µ (2) f,i ( Z ◦ i − , u ) k . Since | ¯ µ (2) f,i ( y, u ) | ≤ | ¯ µ (2) f,i ( y, u ) − ¯ µ (2) f,i (0 , u ) | + | ¯ µ (2) f,i (0 , u ) | , the proof now follows the same lines as in the proof of Lemma 2.7.2We now show (8.4) and (8.5). We have (cid:12)(cid:12) ¯ µ (2) f,i ( z, u ) − ¯ µ (2) f,i ( z ′ , u ) (cid:12)(cid:12) = (cid:12)(cid:12) ¯ µ (2) f,i ( z, u ) − ¯ µ (2) f,i ( z ′ , u ) (cid:12)(cid:12) · (cid:2) | ¯ µ (2) f,i ( z, u ) | + | ¯ µ (2) f,i ( z ′ , u ) | (cid:3) . 
We then have by the Cauchy Schwarz inequality that (cid:13)(cid:13)(cid:13) sup f ∈F (cid:12)(cid:12) ¯ µ (2) f,i ( Z ◦ i − , u ) − ¯ µ (2) f,i (( Z ◦ i − ) ∗ ( i − k ) , u ) (cid:12)(cid:12) (cid:13)(cid:13)(cid:13) ν/ ≤ (cid:13)(cid:13)(cid:13) sup f ∈F (cid:12)(cid:12) ¯ µ (2) f,i ( Z ◦ i − , u ) − ¯ µ (2) f,i (( Z ◦ i − ) ∗ ( i − k ) , u ) (cid:12)(cid:12) (cid:13)(cid:13)(cid:13) ν · (cid:13)(cid:13)(cid:13) sup f ∈F (cid:12)(cid:12) ¯ µ (2) f,i ( Z ◦ i − , u ) (cid:12)(cid:12) (cid:13)(cid:13)(cid:13) ν . (8.6)Since { ¯ µ (2) f,i : f ∈ F , i ∈ { , ..., n }} forms a ( L F , s, R, C )-class, the first factor in (8.6) isbounded by ∆( k ) as in the proof of Lemma 2.7. Furthermore, | ¯ µ (2) f,i ( z, u ) | ≤ | ¯ µ (2) f,i ( z, u ) − ¯ µ (2) f,i (0 , u ) | + | ¯ µ (2) f,i (0 , u ) |≤ | z | sL F ,s ( R ( z, u ) + R (0 , u )) + | ¯ µ (2) f,i (0 , u ) | . Note that (cid:13)(cid:13)(cid:13) | Z ◦ i − | sL F ,s · (cid:2) R ( Z ◦ i − , u ) + R (0 , u ) (cid:3)(cid:13)(cid:13)(cid:13) ν ≤ (cid:13)(cid:13)(cid:13) ∞ X j =0 L F ,j | Z ◦ i − − j | s ∞ (cid:13)(cid:13)(cid:13) pp − ν · (cid:16) k R ( Z ◦ i − , u ) k pν + | R (0 , u ) | (cid:17) ≤ d ◦ | L F | sup i,j k X ◦ ij k s νspp − · ( C R + | R (0 , u ) | ) ≤ d ◦ | L F | C sX C R . 
We now obtain (8.5) from (8.6) with the given C ∆ .By the Cauchy Schwarz inequality, we have for q ≥ δ E [ f ( Z i ,u ) |G i − ] ν/ ( k )= sup i (cid:13)(cid:13)(cid:13) E [ f ( Z i , u ) |G i − ] − E [ f ( Z i , u ) |G i − ] ∗ ( i − k ) (cid:13)(cid:13)(cid:13) ν/ = | D f,n ( u ) | · sup i (cid:13)(cid:13)(cid:13) D f,n ( u ) (cid:0) ¯ µ (2) f,i ( Z ◦ i − , u ) − ¯ µ (2) f,i (( Z ◦ i − ) ∗ ( i − k ) , u ) (cid:1)(cid:13)(cid:13)(cid:13) ν/ ≤ | D f,n ( u ) | · sup i (cid:13)(cid:13)(cid:13) ¯ µ (2) f,i ( Z ◦ i − , u ) − ¯ µ (2) f,i (( Z ◦ i − ) ∗ ( i − k ) , u ) (cid:13)(cid:13)(cid:13) ν × (cid:13)(cid:13)(cid:13) D f,n ( u )¯ µ (2) f,i ( Z ◦ i − , u ) (cid:13)(cid:13)(cid:13) ν (8.7)Furthermore, (cid:13)(cid:13)(cid:13) D f,n ( u )¯ µ (2) f,i ( Z ◦ i − , u ) (cid:13)(cid:13)(cid:13) ν ≤ k E [ f ( Z i , u ) |G i − ] / k ν ≤ k f ( Z i , u ) k ν . (8.8)Since Assumption 2.8 holds for µ (2) f,i , the first factor in (8.7) is bounded by D f,n ( u )∆( k )as in the proof of Lemma 2.7. Inserting this and (8.8) into (8.7), we obtain the result(8.4). mpirical process theory for locally stationary processes Proof of Theorem 3.2.
Denote the three terms on the right hand side of (3.1) by A , A , A . We now discuss the three terms separately. First, we have E A ≤ ∞ X j = q √ n E max f ∈F (cid:12)(cid:12)(cid:12) n X i =1 ( W i,j +1 ( f ) − W i,j ( f )) (cid:12)(cid:12)(cid:12) . For fixed j , the sequence E i,j := ( E i,j ( f )) f ∈F = (cid:0) ( W i,j +1 ( f ) − W i,j ( f )) (cid:1) f ∈F = ( E [ W i ( f ) | ε i − j , ..., ε i ] − E [ W i ( f ) | ε i − j +1 , ..., ε i ]) f ∈F is a |F| -dimensional martingale difference vector with respect to G i = σ ( ε i − j , ε i − j +1 , ... ).For a vector x = ( x f ) f ∈F and s ≥
1, write | x | s := ( P f ∈F | x f | s ) /s . By Theorem 4.1 in[19] there exists an absolute constant c > s > (cid:13)(cid:13)(cid:13)(cid:12)(cid:12)(cid:12) n X i =1 E i,j (cid:12)(cid:12)(cid:12) s (cid:13)(cid:13)(cid:13) ≤ c n (cid:13)(cid:13)(cid:13) sup i =1 ,...,n | E i,j | s (cid:13)(cid:13)(cid:13) + p s − (cid:13)(cid:13)(cid:13)(cid:16) n X i =1 E [ | E i,j | s |G i − ] (cid:17) / (cid:13)(cid:13)(cid:13) o . (8.9)We have (cid:13)(cid:13)(cid:13) sup i =1 ,...,n | E i,j | s (cid:13)(cid:13)(cid:13) = (cid:13)(cid:13)(cid:13)(cid:0) sup i =1 ,...,n | E i,j | s (cid:1) / (cid:13)(cid:13)(cid:13) ≤ (cid:13)(cid:13)(cid:13)(cid:0) n X i =1 | E i,j | s (cid:1) / (cid:13)(cid:13)(cid:13) , therefore both terms in (8.9) are of the same order and it is enough to bound the secondterm in (8.9). We have (cid:13)(cid:13)(cid:13)(cid:16) n X i =1 E [ | E i,j | s |G i − ] (cid:17) / (cid:13)(cid:13)(cid:13) = (cid:13)(cid:13)(cid:13) n X i =1 E [ | E i,j | s |G i − ] (cid:13)(cid:13)(cid:13) / ≤ (cid:16) n X i =1 (cid:13)(cid:13) E [ | E i,j | s |G i − ] (cid:13)(cid:13) (cid:17) / ≤ (cid:16) n X i =1 (cid:13)(cid:13) | E i,j | s (cid:13)(cid:13) (cid:17) / . (8.10)Note that E i,j ( f ) = W i,j +1 ( f ) − W i,j ( f ) = E [ W i ( f ) | ε i − j , ..., ε i ] − E [ W i ( f ) | ε i − j +1 , ..., ε i ]= E [ W i ( f ) ∗∗ ( i − j ) − W i ( f ) ∗∗ ( i − j +1) |G i ] , (8.11)where H ( F i ) ∗∗ ( i − j ) := H ( F ∗∗ ( i − j ) i ) and F ∗∗ ( i − j ) i = ( ε i , ε i − , ..., ε i − j , ε ∗ i − j − , ε ∗ i − j − , ... 
).4By Jensen’s inequality, Lemma 2.7 and the fact that ( W i ( f ) ∗∗ ( i − j ) , W i ( f ) ∗∗ ( i − j +1) ) hasthe same distribution as ( W i ( f ) , W i ( f ) ∗ ( i − j ) ), k| E i,j | s (cid:13)(cid:13) = | (cid:13)(cid:13)(cid:13)(cid:16) X f ∈F | E i,j ( f ) | s (cid:17) /s (cid:13)(cid:13)(cid:13) ≤ s /s (cid:13)(cid:13)(cid:13) sup f ∈F (cid:12)(cid:12) E [ W i ( f ) ∗∗ ( i − j ) − W i ( f ) ∗∗ ( i − j +1) |G i ] (cid:12)(cid:12)(cid:13)(cid:13)(cid:13) ≤ e · (cid:13)(cid:13)(cid:13) E (cid:2) sup f ∈F (cid:12)(cid:12) W i ( f ) ∗∗ ( i − j ) − W i ( f ) ∗∗ ( i − j +1) (cid:12)(cid:12) (cid:12)(cid:12) G i (cid:3)(cid:13)(cid:13)(cid:13) ≤ e · (cid:13)(cid:13)(cid:13) sup f ∈F (cid:12)(cid:12) W i ( f ) ∗∗ ( i − j ) − W i ( f ) ∗∗ ( i − j +1) (cid:12)(cid:12) (cid:13)(cid:13)(cid:13) = e · (cid:13)(cid:13)(cid:13) sup f ∈F (cid:12)(cid:12) W i ( f ) − W i ( f ) ∗ ( i − j ) (cid:12)(cid:12) (cid:13)(cid:13)(cid:13) ≤ e · D ∞ n ( in )∆( j ) . (8.12)Inserting (8.12) into (8.10) delivers (cid:16) n X i =1 (cid:13)(cid:13) | E i,j | s (cid:13)(cid:13) p (cid:17) / ≤ e (cid:16) n X i =1 D ∞ n ( in ) (cid:17) / ∆( j ) , Inserting this bound into (8.9), we obtain (cid:13)(cid:13)(cid:13)(cid:12)(cid:12)(cid:12) n X i =1 E i,j (cid:12)(cid:12)(cid:12) s (cid:13)(cid:13)(cid:13) ≤ ec s / n / (cid:16) n n X i =1 D ∞ n ( in ) (cid:17) / ∆( j ) . We conclude with s := 2 ∨ log |F| that E A ≤ √ n ∞ X k = q (cid:13)(cid:13)(cid:13)(cid:12)(cid:12)(cid:12) n X i =1 E i,j (cid:12)(cid:12)(cid:12) s (cid:13)(cid:13)(cid:13) ≤ ec · p ∨ log |F| · (cid:16) n n X i =1 D ∞ n ( in ) (cid:17) / ∞ X j = q ∆ p ( j ) ≤ ec · √ H · D ∞ n β ( q ) . (8.13)We now discuss E A . 
If M Q , σ Q > Q i ( f ), i = 1 , ..., m mean-zero in-dependent variables (depending on f ∈ F ) with | Q i ( f ) | ≤ M Q , ( m P mi =1 k Q i ( f ) k ) / ≤ σ Q , then there exists some universal constant c > E max f ∈F √ m (cid:12)(cid:12)(cid:12) m X i =1 (cid:2) Q i ( f ) − E Q i ( f ) (cid:3)(cid:12)(cid:12)(cid:12) ≤ c · (cid:16) σ Q √ H + M Q H √ m (cid:17) , (8.14)(see e.g. [5] (equation (4.3) in Section 4.1 therein). mpirical process theory for locally stationary processes W k,j − W k,j − ) k is a martingale difference sequence and W k,τ l − W k,τ l − = P τ l j = τ l − +1 ( W k,j − W k,j − ). Furthermore, we have k W k,j − W k,j − k ≤ k W k − E [ W k | ε k − j +1 ] k ≤ k W k k and k W k,j − W k,j − k = k E [ W ∗∗ ( k − j +1) k − W ∗∗ ( k − j +2) k |G k ] k ≤ k W ∗∗ ( k − j +1) k − W ∗∗ ( k − j +2) k k = k W k − W ∗ ( k − j +1) k k = δ W k ( j − , thus k W k,j − W k,j − k ≤ min {k W k k , δ W k ( j − } . We conclude with the elementary inequality min { a , b } +min { a , b } ≤ min { a + a , b + b } that k T i,l k = (cid:13)(cid:13)(cid:13) ( iτ l ) ∧ n X k =( i − τ l +1 ( W k,τ l − W k,τ l − ) (cid:13)(cid:13)(cid:13) = (cid:13)(cid:13)(cid:13) τ l X j = τ l − +1 ( iτ l ) ∧ n X k =( i − τ l +1 ( W k,j − W k,j − ) (cid:13)(cid:13)(cid:13) ≤ τ l X j = τ l − +1 (cid:13)(cid:13)(cid:13) ( iτ l ) ∧ n X k =( i − τ l +1 ( W k,j − W k,j − ) (cid:13)(cid:13)(cid:13) ≤ τ l X j = τ l − +1 (cid:16) ( iτ l ) ∧ n X k =( i − τ l +1 k W k,j − W k,j − k (cid:17) / ≤ τ l X j = τ l − +1 min n(cid:16) ( iτ l ) ∧ n X k =( i − τ l +1 k W k k (cid:17) / , (cid:16) ( iτ l ) X k =( i − τ l +1 ( δ W k ( j − (cid:17) / o . Put σ i,l := (cid:16) τ l ( iτ l ) ∧ n X k =( i − τ l +1 k W k k (cid:17) / , ∆ i,j,l := (cid:16) τ l ( iτ l ) ∧ n X k =( i − τ l +1 δ W k ( j − (cid:17) / . 
(cid:16) nτ l ⌊ nτl ⌋ +1 X i =1 i even τ l k T i,l ( f ) k (cid:17) / ≤ (cid:16) nτ l ⌊ nτl ⌋ +1 X i =1 (cid:16) τ l X j = τ l − +1 min n(cid:16) τ l ( iτ l ) ∧ n X k =( i − τ l +1 k W k k (cid:17) / , (cid:16) τ l ( iτ l ) ∧ n X k =( i − τ l +1 δ W k ( j − (cid:17) / o(cid:17) (cid:17) / = (cid:16) nτ l ⌊ nτl ⌋ +1 X i =1 (cid:16)(cid:0) τ l − τ l − (cid:1) min { σ i , ∆ i,τ l − +1 ,l } (cid:17) / = (cid:16) nτ l ⌊ nτl ⌋ +1 X i =1 (cid:0) τ l − τ l − (cid:1) min { σ i,l , ∆ i,τ l − +1 ,l } (cid:17) / ≤ (cid:0) τ l − τ l − (cid:1) · (cid:16) min { nτ l ⌊ nτl ⌋ +1 X i =1 σ i,l , nτ l ⌊ nτl ⌋ +1 X i =1 ∆ i,τ l − +1 ,l } (cid:17) / ≤ τ l X j = τ l − +1 min { (cid:16) nτ l ⌊ nτl ⌋ +1 X i =1 σ i,l (cid:17) / , (cid:16) nτ l ⌊ nτl ⌋ +1 X i =1 ∆ i,τ l − +1 ,l (cid:17) / }≤ τ l X j = τ l − +1 min {k f k ,n , (cid:16) n n X i =1 δ W i ( τ l − ) (cid:17) / }≤ τ l X j = τ l − +1 min {k f k ,n , D n ∆( ⌊ j ⌋ ) } (8.15)With √ τ l (cid:12)(cid:12) T i,l ( f ) (cid:12)(cid:12) ≤ √ τ l k f k ∞ ≤ √ τ l M and (8.14), we obtain L X l =1 h E max f ∈F (cid:12)(cid:12)(cid:12) q nτ l ⌊ nτl ⌋ +1 X i =1 i even √ τ l T i,l ( f ) (cid:12)(cid:12)(cid:12)i ≤ c L X l =1 h sup f (cid:16) nτ l ⌊ nτl ⌋ +1 X i =1 i even (cid:13)(cid:13)(cid:13)(cid:13) √ τ l T i,l ( f ) (cid:13)(cid:13)(cid:13)(cid:13) (cid:17) / √ H + 2 √ τ l M H q nτ l i , mpirical process theory for locally stationary processes i odd) in A . With (8.15), we conclude that E A ≤ L X l =1 h E max f ∈F q nτ l (cid:12)(cid:12)(cid:12) X ≤ i ≤⌊ nτl ⌋ +1 ,i odd √ τ l T i,l ( f ) (cid:12)(cid:12)(cid:12) + E max f ∈F q nτ l (cid:12)(cid:12)(cid:12) X ≤ i ≤⌊ nτl ⌋ +1 ,i even √ τ l T i,l ( f ) (cid:12)(cid:12)(cid:12)i ≤ c L X l =1 h(cid:16) τ l X j = τ l − +1 min { max f ∈F k f k ,n , D n ∆( ⌊ j ⌋ ) } (cid:17) · √ H + √ τ l M H q ⌊ nτ l ⌋ + 1 i . (8.16)Note that L X l =1 √ τ l q ⌊ nτ l ⌋ + 1 ≤ L X l =1 √ τ l q nτ l = 1 √ n L X l =0 τ l = 1 √ n L − X l =1 l ≤ √ n (2 L + q ) ≤ q √ n . 
(8.17)Furthermore, we have by Lemma 8.2 that L X l =1 τ l X j = τ l − +1 min { max f ∈F k f k ,n , D n ∆( ⌊ j ⌋ ) } ≤ ∞ X j =2 min { max f ∈F k f k ,n , D n ∆( ⌊ j ⌋ ) }≤ V (max f ∈F k f k ,n )= 2 max f ∈F ¯ V ( k f k ,n ) = 2 max f ∈F V ( f ) , (8.18)where ¯ V ( x ) = x + ∞ X j =1 min { x, D n ∆( j ) } (8.19)and the second to last equality holds since x ¯ V ( x ) is increasing.Inserting (8.17) and (8.18) into (8.16), we conclude that with some universal c > E A ≤ c (cid:16) sup f ∈F V ( f ) √ H + qM H √ n (cid:17) ≤ c (cid:16) σ √ H + qM H √ n (cid:17) . (8.20)Since S Wn, = P ni =1 W i, ( f ) is a sum of independent variables with | W i, ( f ) | ≤ k f k ∞ ≤ M and k W i, ( f ) k ≤ k f k ≤ V ( f ) ≤ σ , we obtain from (8.14) again E A ≤ c (cid:16) σ √ H + M H √ n (cid:17) . (8.21)If we insert the bounds (8.13), (8.20) and (8.21) into (3.1), we obtain the result (3.2).8We now show (3.3). If q ∗ ( M √ H √ n D ∞ n ) Hn ≤
1, we have q ∗ ( M √ H √ n D ∞ n ) ∈ { , ..., n } and thus by (3.2): E max f ∈F (cid:12)(cid:12)(cid:12) √ n S n ( f ) (cid:12)(cid:12)(cid:12) ≤ c (cid:16) √ H D ∞ n β (cid:16) q ∗ (cid:16) M √ H √ n D ∞ n (cid:17)(cid:17) + q ∗ (cid:16) M √ H √ n D ∞ n (cid:17) M H √ n + σ √ H (cid:17) ≤ c (cid:16) q ∗ (cid:16) M √ H √ n D ∞ n (cid:17) M H √ n + σ √ H (cid:17) = 2 c (cid:16) √ nM · min n q ∗ (cid:16) M √ H √ n D ∞ n (cid:17) Hn , o + σ √ H (cid:17) . (8.22)If q ∗ ( M √ H √ n D ∞ n ) Hn ≥
1, we note that the simple bound E max f ∈F (cid:12)(cid:12)(cid:12) √ n S n ( f ) (cid:12)(cid:12)(cid:12) ≤ √ nM ≤ c (cid:16) √ nM min n q ∗ (cid:16) M √ H √ n D ∞ n (cid:17) Hn , o + σ √ H (cid:17) (8.23)holds. Putting the two bounds (8.22) and (8.23) together, we obtain the result (3.3). Lemma 8.2.
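Let $\omega(k)$ be an increasing sequence in $k$. Then, for any $x > 0$,
\[
\sum_{j=2}^{\infty} \min\{x, \mathbb{D}_n \Delta(\lfloor \tfrac{j}{2} \rfloor)\}\, \omega(j) \le 2 \sum_{j=1}^{\infty} \min\{x, \mathbb{D}_n \Delta(j)\}\, \omega(2j+1).
\]
In particular, in the case $\omega(k) = 1$,
\[
\sum_{j=2}^{\infty} \min\{x, \mathbb{D}_n \Delta(\lfloor \tfrac{j}{2} \rfloor)\} \le 2 \sum_{j=1}^{\infty} \min\{x, \mathbb{D}_n \Delta(j)\}.
\]

Proof of Lemma 8.2.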
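It holds that
\[
\begin{aligned}
\sum_{j=2}^{\infty} \min\{x, \mathbb{D}_n \Delta(\lfloor \tfrac{j}{2} \rfloor)\}\, \omega(j)
&= \sum_{k=1}^{\infty} \min\{x, \mathbb{D}_n \Delta(\lfloor \tfrac{2k}{2} \rfloor)\}\, \omega(2k) + \sum_{k=1}^{\infty} \min\{x, \mathbb{D}_n \Delta(\lfloor \tfrac{2k+1}{2} \rfloor)\}\, \omega(2k+1) \\
&= \sum_{k=1}^{\infty} \min\{x, \mathbb{D}_n \Delta(k)\} \cdot \{\omega(2k) + \omega(2k+1)\} \\
&\le 2 \sum_{k=1}^{\infty} \min\{x, \mathbb{D}_n \Delta(k)\} \cdot \omega(2k+1).
\end{aligned}
\]

Proof of Corollary 3.3.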
Let σ := sup n ∈ N sup f ∈F V ( f ) < ∞ . For Q ≥
1, define M n = √ n √ H r ( σQ / D ∞ n ) D ∞ n . Let ¯ F = sup f ∈F ¯ f , and F ( z, u ) = D ∞ n ( u ) · ¯ F ( z, u ). Then F is an envelope function of F .We furthermore have P ( sup i =1 ,...,n F ( Z i , in ) > M n ) ≤ P (cid:16)(cid:0) n n X i =1 F ( Z i , in ) ν (cid:1) /ν > M n n /ν (cid:17) ≤ nM νn · k F k νν,n . (8.24)Inserting the bound k F k νν,n = 1 n n X i =1 D ∞ n ( in ) ν k ¯ F ( Z i , in ) k νν ≤ C ν ∆ · n n X i =1 D ∞ n ( in ) ν ≤ C ν ∆ · ( D ∞ ν,n ) ν into (8.24) and using r ( γa ) ≥ γr ( a ) for γ ≥ , a > P ( sup i =1 ,...,n F ( Z i , in ) > M n ) ≤ (cid:16) Hn − ν r ( σQ / D ∞ n ) (cid:17) ν/ · (cid:16) C ∆ D ∞ ν,n D ∞ n (cid:17) ν ≤ Q ν/ (cid:16) Hn − ν r ( σ D ∞ n ) (cid:17) ν/ · (cid:16) C ∆ D ∞ ν,n D ∞ n (cid:17) ν . (8.25)Using the rough bound k f k ν,n ≤ k F k ν,n and r ( a ) ≤ a for a > f ∈F √ n n X i =1 E [ f ( Z i , in ) {| f ( Z i , in ) | >M n } ] ≤ √ nM ν − n max f ∈F n X i =1 E [ | f ( Z i , in ) | ν ] ≤ nM νn · M n √ n max f ∈F k f k νν,n ≤ (cid:16) C Hn − ν r ( σQ / D ∞ n ) (cid:17) ν/ · σQ / √ H · (cid:16) D ∞ ν,n D ∞ n (cid:17) ν ≤ σQ ν − √ H (cid:16) C Hn − ν r ( σ D ∞ n ) (cid:17) ν/ · (cid:16) D ∞ ν,n D ∞ n (cid:17) ν . (8.26)Abbreviate C n := (cid:16) C Hn − ν r ( σ D ∞ n ) (cid:17) ν/ · (cid:16) D ∞ ν,n D ∞ n (cid:17) ν . n ∈ N C n < ∞ . 
By Theorem 3.2, (8.25) and (8.26), P (cid:16) max f ∈F (cid:12)(cid:12) G n ( f ) (cid:12)(cid:12) > Q √ H (cid:17) ≤ P (cid:16) max f ∈F (cid:12)(cid:12) G n ( f ) (cid:12)(cid:12) > Q √ H, sup i =1 ,...,n ¯ F ( Z i , in ) ≤ M (cid:17) + P ( sup i =1 ,...,n F ( Z i , in ) > M ) ≤ P (cid:16) max f ∈F (cid:12)(cid:12) G n (max { min { f, M } , − M } ) (cid:12)(cid:12) > Q √ H/ (cid:17) + P (cid:16) max f ∈F (cid:12)(cid:12) √ n n X i =1 E [ f ( Z i , in ) {| f ( Z i , in ) | >M } ] > Q √ H/ (cid:17) + P ( sup i =1 ,...,n F ( Z i , in ) > M ) ≤ cQ √ H h σ √ H + q ∗ (cid:16) r ( σQ / D ∞ n ) (cid:17) r ( σQ / D ∞ n ) D ∞ n i + (cid:16) Q ν + 2 σQ ν H (cid:17) C n ≤ cσQ / + (cid:16) Q ν + 2 σQ ν H (cid:17) C n . Since sup n ∈ N C n < ∞ and σ is independent of n , the assertion follows for Q → ∞ . Proof of Lemma 3.5. (i) Since | x | + | x | ≤ m implies | x | , | x | ≤ m , we have I := (cid:12)(cid:12) ϕ ∧ m ( x + x + x ) − ϕ ∧ m ( x ) − ϕ ∧ m ( x ) (cid:12)(cid:12) = (cid:12)(cid:12) ϕ ∧ m ( x + x + x ) − x − x | . Case 1: x + x + x > m . Then, since | x | + | x | ≤ m , we have I = | m − x − x | = m − x − x < x ≤ | x | .Case 2: x + x + x ∈ [ − m, m ]. Then I = | x + x + x − x − x | = | x | .Case 3: x + x + x < − m . Then, since | x | + | x | ≤ m , we have I = |− m − x − x | = m + x + x < − x ≤ | x | .Furthermore, I ≤ | ϕ m ( x + x + x ) | + | x + x | ≤ m + m = 2 m .(ii) The first assertion is obvious. If | x | ≤ y , we have | ϕ ∨ m ( x ) | = x − m, x > m , x ∈ [ − m, m ] − x − m, x < − m = | x | − m, x > m , x ∈ [ − m, m ] | x | − m, x < − m = ( | x | − m ) | x | >m ≤ ( y − m ) y>m = ( y − m ) ∨ y − m ) { y − m> } ≤ y y>m , which shows the second assertion. mpirical process theory for locally stationary processes z, z ′ ∈ R N it holds that | ϕ ∧ m ( f )( z ) − ϕ ∧ m ( f )( z ′ ) | ≤ | f ( z ) − f ( z ′ ) | , | ϕ ∨ m ( f )( z ) − ϕ ∨ m ( f )( z ′ ) | ≤ | f ( z ) − f ( z ′ ) | (8.27)from which the assertion follows. 
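The truncation maps of Lemma 3.5 can be sanity-checked numerically. The following is a minimal sketch, assuming the definitions $\varphi^{\wedge}_m(x) = (-m) \vee (x \wedge m)$ and $\varphi^{\vee}_m(x) = (x - m) \vee 0 + (x + m) \wedge 0$ used in this proof. The function names, the sampling ranges and the tolerance are illustrative choices and not part of the paper. A random search of this kind cannot replace the proof; it only guards against misreading the statements.

```python
import random


def clip_inner(x: float, m: float) -> float:
    """phi^wedge_m(x) = (-m) v (x ^ m): the part of x inside [-m, m]."""
    return max(-m, min(x, m))


def clip_outer(x: float, m: float) -> float:
    """phi^vee_m(x) = (x - m) v 0 + (x + m) ^ 0: the part of x outside [-m, m]."""
    return max(x - m, 0.0) + min(x + m, 0.0)


def check_lemma_3_5(trials: int = 10_000, seed: int = 0, tol: float = 1e-12) -> bool:
    """Randomly test the three assertions of Lemma 3.5 (hypothetical helper)."""
    rng = random.Random(seed)
    for _ in range(trials):
        m = rng.uniform(0.1, 5.0)

        # (i): if |x1| + |x2| <= m, then
        #      |phi^wedge_m(x1 + x2 + x3) - x1 - x2| <= min(|x3|, 2m).
        x1 = rng.uniform(-m / 2, m / 2)
        x2 = rng.uniform(-m / 2, m / 2)
        x3 = rng.uniform(-10.0, 10.0)
        lhs = abs(clip_inner(x1 + x2 + x3, m) - x1 - x2)
        if lhs > min(abs(x3), 2 * m) + tol:
            return False

        # (ii): if |x| <= y, then |phi^vee_m(x)| <= (y - m) v 0 <= y * 1{y > m}.
        x = rng.uniform(-10.0, 10.0)
        y = abs(x) + rng.uniform(0.0, 3.0)
        middle = max(y - m, 0.0)
        if abs(clip_outer(x, m)) > middle + tol:
            return False
        if middle > (y if y > m else 0.0) + tol:
            return False

        # (iii), via (8.27): both maps are 1-Lipschitz.
        a = rng.uniform(-10.0, 10.0)
        b = rng.uniform(-10.0, 10.0)
        if abs(clip_inner(a, m) - clip_inner(b, m)) > abs(a - b) + tol:
            return False
        if abs(clip_outer(a, m) - clip_outer(b, m)) > abs(a - b) + tol:
            return False
    return True
```

Note that the two maps split every real number, `clip_inner(x, m) + clip_outer(x, m) == x`, which is what makes the truncation decompositions in the chaining proofs possible.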
For real numbers $a_i, b_i$, we have
\[
\max_i \{a_i\} = \max_i \{a_i - b_i + b_i\} \le \max_i \{a_i - b_i\} + \max_i \{b_i\},
\]
thus $|\max_i \{a_i\} - \max_i \{b_i\}| \le \max_i |a_i - b_i|$. This implies $|\max\{a, y\} - \max\{a, y'\}| \le |y - y'|$ and therefore
\[
\begin{aligned}
|\varphi^{\wedge}_m(f)(z) - \varphi^{\wedge}_m(f)(z')|
&= |(-m) \vee (f(z) \wedge m) - (-m) \vee (f(z') \wedge m)| \\
&\le |f(z) \wedge m - f(z') \wedge m| \\
&= |(-f(z')) \vee (-m) - (-f(z)) \vee (-m)| \le |f(z) - f(z')|.
\end{aligned}
\]
For the second inequality in (8.27), note that
\[
\varphi^{\vee}_m(f)(z) = (f(z) - m) \vee 0 + (f(z) + m) \wedge 0.
\]
We therefore have
\[
|\varphi^{\vee}_m(f)(z) - \varphi^{\vee}_m(f)(z')| = \big| (f(z) - m) \vee 0 - (f(z') - m) \vee 0 + (f(z) + m) \wedge 0 - (f(z') + m) \wedge 0 \big|.
\]
If $f(z), f(z') \ge m$, then
\[
|\varphi^{\vee}_m(f)(z) - \varphi^{\vee}_m(f)(z')| \le \big| (f(z) - m) \vee 0 - (f(z') - m) \vee 0 \big| \le |f(z) - f(z')|.
\]
A similar result is obtained for $f(z), f(z') \le -m$. If $f(z) \ge m$, $f(z') < m$, then
\[
|\varphi^{\vee}_m(f)(z) - \varphi^{\vee}_m(f)(z')| \le \big| (f(z) - m) - (f(z') + m) \wedge 0 \big| =
\begin{cases}
|f(z) - f(z') - 2m| = f(z) - f(z') - 2m \le f(z) - f(z'), & f(z') \le -m, \\
|f(z) - m| = f(z) - m \le f(z) - f(z'), & f(z') > -m.
\end{cases}
\]
A similar result is obtained for $f(z) \le -m$, $f(z') > -m$. The remaining cases follow by exchanging the roles of $z$ and $z'$, or are trivial because both truncations vanish. This proves (8.27).

Proof of Lemma 3.6.
For q ∈ N , put β norm ( q ) := β ( q ) q .(i) q ∗ ( · ) and r ( · ) are well-defined since β norm ( · ) is decreasing (at a rate ≪ q − ) and r q ∗ ( r ) r is increasing (at a rate ≪ r ) and lim r ↓ q ∗ ( r ) r = 0.Let a >
0. We show that r = 2 r ( a ) fulfills q ∗ ( r ) r ≤ a . By definition of r ( a ), weobtain r ( a ) ≥ r = 2 r ( a ) which gives the result. Since β norm is decreasing, q ∗ isdecreasing. We conclude that q ∗ ( r ) r = 2 · q ∗ (2 r ( a r ( a ≤ · q ∗ ( r ( a r ( a ≤ · a a. The second inequality r ( a ) ≤ a follows from the fact that q ∗ ( r ) r is increasing and q ∗ ( a ) a ≥ a .2(ii) By Theorem 3.2 and the definition of r ( · ), E max f ∈F (cid:12)(cid:12)(cid:12) G Wn ( f ) (cid:12)(cid:12)(cid:12) ≤ c (cid:16) δ p H ( k ) + q ∗ (cid:16) m ( n, δ, k ) p H ( k ) √ n D ∞ n (cid:17) m ( n, δ, k ) H ( k ) √ n (cid:17) = c (cid:16) δ p H ( k ) + D ∞ n q ∗ ( r ( δ D n )) r ( δ D n ) p H ( k ) (cid:17) = c (1 + D ∞ n D n ) δ p H ( k ) . which shows (3.10).Since k f ( Z i , in ) { f ( Z i , in ) >γm ( n,δ,k ) } k ≤ γm ( n, δ, k ) k f ( Z i , in ) k = 1 γm ( n, δ, k ) k f ( Z i , in ) k , for all f ∈ F with V ( f ) ≤ δ , it holds that √ n k f { f>γm ( n,δ,k ) k ,n ≤ √ nγm ( n, δ, k ) k f k ,n ≤ γ k f k ,n D ∞ n r ( δ D n ) p H ( k ) . (8.28)If k f k ,n ≥ D n ∆(1), we have V ( f ) = k f k ,n + D n ∞ X j =1 ∆( j ) ≥ k f k ,n + D n β (1) . (8.29)In the case k f k ,n < D n ∆(1), the fact that ∆( · ) is decreasing implies that a ∗ =max { j ∈ N : k f k ,n < D n ∆( j ) } is well-defined. We conclude that V ( f ) = k f k ,n + ∞ X j =0 k f k ,n ∧ ( D n ∆( j )) = k f k ,n + a ∗ X j =1 k f k ,n + D n ∞ X j = a ∗ +1 ∆( j )= k f k ,n ( a ∗ + 1) + D n β ( a ∗ ) ≥ k f k ,n a ∗ + β ( a ∗ ) . (8.30)Summarizing the results (8.29) and (8.30), we have V ( f ) ≥ k f k ,n ( a ∗ ∨
1) + D n β ( a ∗ ∨ . We conclude that V ( f ) ≥ min a ∈ N (cid:2) k f k ,n a + D n β ( a ) (cid:3) ≥ k f k ,n ˆ a + D n β (ˆ a ) , where ˆ a = arg min j ∈ N (cid:8) k f k ,n · j + D n β ( j ) (cid:9) .Since δ ≥ V ( f ), we have δ ≥ D n β (ˆ a ) = D n β norm (ˆ a )ˆ a . Thus β norm (ˆ a ) ≤ δ D n ˆ a .By definition of q ∗ , q ∗ ( δ D n ˆ a ) ≤ ˆ a . Thus q ∗ ( δ D n ˆ a ) δ D n ˆ a ≤ δ D n . By definition of r ( · ), r ( δ D n ) ≥ δ D n ˆ a . We conclude with k f k ,n ≤ V ( f ) ≤ δ that k f k ,n D ∞ n r ( δ D n ) ≤ D n ˆ a k f k ,n D ∞ n δ = D n V ( f ) k f k ,n D ∞ n δ ≤ D n D ∞ n k f k ,n ≤ D n D ∞ n δ. (8.31) mpirical process theory for locally stationary processes f ∈ A with V ( f ) ≤ δ it holds that √ n k f { f>γm ( n,δ,k ) k ,n ≤ √ nγm ( n, δ, k ) k f k ,n ≤ γ k f k ,n D ∞ n r ( δ D n ) p H ( k ) ≤ γ D n D ∞ n δ p H ( k ) . which shows (3.11). Proof of Theorem 3.7.
In the following, we abbreviate $\mathbb{H}(\delta) = \mathbb{H}(\delta, \mathcal{F}, V)$ and $\mathbb{N}(\delta) = \mathbb{N}(\delta, \mathcal{F}, V)$.

Choose $\delta_0 = \sigma$ and $\delta_j = 2^{-j} \delta_0$. Put
\[
m_j := \tfrac{1}{2}\, m(n, \delta_j, N_{j+1})
\]
($m(\cdot)$ from Lemma 3.6). Choose $M_n = m_0$. We then have
\[
\mathbb{E} \sup_{f \in \mathcal{F}} \big| \mathbb{G}^W_n(f) \big| \le \mathbb{E} \sup_{f \in \mathcal{F}(M_n)} \big| \mathbb{G}^W_n(f) \big| + \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \mathbb{E}\big[ W_i(F \mathbb{1}_{\{F > M_n\}}) \big],
\]
where $\mathcal{F}(M_n) := \{\varphi^{\wedge}_{M_n}(f) : f \in \mathcal{F}\}$. Due to Lemma 3.5(iii), $\mathcal{F}(M_n)$ still fulfills Assumption 2.5.

For each $j \in \mathbb{N}_0$, we choose a covering by brackets $\mathcal{F}^{pre}_{jk} := [l_{jk}, u_{jk}] \cap \mathcal{F}$, $k = 1, \dots, \mathbb{N}(\delta_j)$, such that $V(u_{jk} - l_{jk}) \le \delta_j$ and $\sup_{f, g \in \mathcal{F}^{pre}_{jk}} |f - g| \le u_{jk} - l_{jk} =: \Delta_{jk}$.

We now construct inductively a new nested sequence of partitions $(\mathcal{F}_{jk})_k$ of $\mathcal{F}$ from $(\mathcal{F}^{pre}_{jk})_k$ in the following way: For each fixed $j \in \mathbb{N}_0$, put
\[
\{\mathcal{F}_{jk} : k\} := \Big\{ \bigcap_{i=0}^{j} \mathcal{F}^{pre}_{i k_i} : k_i \in \{1, \dots, \mathbb{N}(\delta_i)\},\ i \in \{0, \dots, j\} \Big\}
\]
as the intersections of all previous partitions and the $j$-th partition. Then $|\{\mathcal{F}_{jk} : k\}| \le N_j := \mathbb{N}(\delta_0) \cdot \ldots \cdot \mathbb{N}(\delta_j)$. By Lemma 2.1(ii), we have
\[
\sup_{f, g \in \mathcal{F}_{jk}} |f - g| \le \Delta_{jk}, \qquad V(\Delta_{jk}) \le \delta_j.
\]
In each $\mathcal{F}_{jk}$, fix some $f_{jk} \in \mathcal{F}$, and define $\pi_j f := f_{j, \psi_j f}$ where $\psi_j f := \min\{i \in \{1, \dots, N_j\} : f \in \mathcal{F}_{ji}\}$. Put $\Delta_j f := \Delta_{j, \psi_j f}$ and
\[
I(\sigma) := \int_0^{\sigma} \sqrt{1 \vee \mathbb{H}(\varepsilon, \mathcal{F}, V)}\, d\varepsilon, \qquad \tau := \min\Big\{ j \ge 0 : \delta_j \le \frac{I(\sigma)}{\sqrt{n}} \Big\} \vee 1. \tag{8.32}
\]
Since $|f| \le g$ implies $|W_i(f)| \le W_i(g)$ and $\|W_i(g)\|_1 \le \|g(Z_i, \tfrac{i}{n})\|_1$, it holds that
\[
|\mathbb{G}^W_n(f)| \le \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \big| W_i(f) - \mathbb{E} W_i(f) \big| \le \mathbb{G}^W_n(g) + \frac{2}{\sqrt{n}} \sum_{i=1}^{n} \|W_i(g)\|_1 \le \mathbb{G}^W_n(g) + 2 \sqrt{n}\, \|g\|_{1,n}.
\]
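The nested-partition construction used in this proof (intersecting the first $j+1$ bracket coverings and fixing one representative per cell) can be illustrated on a finite toy example. The sketch below is only an illustration under simplifying assumptions: integers stand in for the functions $f \in \mathcal{F}$, each covering is given as a map from elements to block indices, and the names `nested_partitions`, `representatives` and `project` are hypothetical.

```python
from math import prod


def nested_partitions(elements, pre_partitions):
    """Level-by-level common refinement of the given partitions.

    pre_partitions[i] maps each element to its block index in the i-th
    (not necessarily nested) partition. At level j every element is
    labelled by the tuple (k_0, ..., k_j), i.e. by the intersection of
    the blocks it belongs to in partitions 0, ..., j.
    """
    levels = []
    for j in range(len(pre_partitions)):
        levels.append(
            {e: tuple(p[e] for p in pre_partitions[: j + 1]) for e in elements}
        )
    return levels


def representatives(level):
    """Fix one representative per block (here: the smallest element)."""
    reps = {}
    for e in sorted(level):
        reps.setdefault(level[e], e)
    return reps


def project(e, level, reps):
    """Analogue of pi_j: send an element to the representative of its block."""
    return reps[level[e]]


# Toy data: 12 "functions" and three unrelated partitions of them.
elements = list(range(12))
pre = [
    {e: e % 2 for e in elements},
    {e: e % 3 for e in elements},
    {e: e // 6 for e in elements},
]

levels = nested_partitions(elements, pre)
pre_counts = [len(set(p.values())) for p in pre]
block_counts = [len(set(lv.values())) for lv in levels]
# Product bound, the analogue of N_j = N(delta_0) * ... * N(delta_j).
bounds = [prod(pre_counts[: j + 1]) for j in range(len(pre))]
reps_level_1 = representatives(levels[1])
```

By construction, two elements that share a block at level $j+1$ also share a block at level $j$, which is the nestedness needed for the telescoping sum over $\pi_{j+1} f - \pi_j f$. The number of blocks at level $j$ never exceeds the product of the sizes of the first $j+1$ coverings.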
By (3.7) and (3.8) and the fact that k f − π f k ∞ ≤ M n ≤ m , we have the decompositionsup f ∈F | G Wn ( f ) | ≤ sup f ∈F | G Wn ( π f ) | + sup f ∈F | G Wn ( ϕ ∧ m τ ( f − π τ f )) | + τ − X j =0 sup f ∈F (cid:12)(cid:12)(cid:12) G Wn ( ϕ ∧ m j − m j +1 ( π j +1 f − π j f )) (cid:12)(cid:12)(cid:12) + τ − X j =0 sup f ∈F | G Wn ( R ( j )) |≤ sup f ∈F | G Wn ( π f ) | + n sup f ∈F | G Wn ( ϕ ∧ m τ (∆ τ f )) | + 2 √ n sup f ∈F k ∆ τ f k ,n o + τ − X j =0 sup f ∈F (cid:12)(cid:12)(cid:12) G Wn ( ϕ ∧ m j − m j +1 ( π j +1 f − π j f )) (cid:12)(cid:12)(cid:12) + τ − X j =0 n sup f ∈F (cid:12)(cid:12)(cid:12) G Wn (min (cid:8)(cid:12)(cid:12) ϕ ∨ m j +1 (∆ j +1 f ) (cid:12)(cid:12) , m j (cid:9) ) (cid:12)(cid:12)(cid:12) +2 √ n sup f ∈F k ∆ j +1 f { ∆ j +1 f>m j +1 } k ,n o + τ − X j =0 n sup f ∈F (cid:12)(cid:12)(cid:12) G Wn (min (cid:8)(cid:12)(cid:12) ϕ ∨ m j − m j +1 (∆ j f ) (cid:12)(cid:12) , m j (cid:9) ) (cid:12)(cid:12)(cid:12) +2 √ n sup f ∈F k ∆ j f { ∆ j f>m j − m j +1 } k ,n o =: R + R + R + R + R . (8.33)We now discuss the terms R i , i ∈ { , ..., } from (8.33). Therefore, put C n := c (1 + D ∞ n D n )+ D n D ∞ n . mpirical process theory for locally stationary processes jk = u jk − l jk with l jk , u jk ∈ F , the class { ∆ jk : k ∈ { , ..., N ( δ j ) }} still fulfillsAssumption 3.1. We conclude by Lemma 3.5(iii) that for arbitrary m, ˜ m >
0, the classes { ϕ ∧ m (∆ jk ) : k ∈ { , ..., N ( δ j ) }} , {
12 min { ϕ ∨ m (∆ jk ) , m } : k ∈ { , ..., N ( δ j ) }} , { ϕ ∧ m ( π j +1 f − π j f ) : k ∈ { , ..., N ( δ j ) }} fulfill Assumption 3.1. • Since |{ π f : f ∈ F ( M n ) }| ≤ N ( δ ) = N ( σ ), k π f k ∞ ≤ M n ≤ m ( n, δ , N ( δ )) and V ( π f ) ≤ σ = δ (by assumption, every f ∈ F fulfills V ( f ) ≤ σ ), we have by(3.10): E R = E sup f ∈F ( M n ) | G Wn ( π f ) | ≤ C n δ p ∨ log N ( δ ) . • It holds that |{ ϕ ∧ m τ (∆ τ f ) : f ∈ F ( M n ) }| ≤ N τ . If g := ϕ ∧ m τ (∆ τ f ), then k g k ∞ ≤ m τ ≤ m ( n, δ τ , N τ +1 ) and V ( g ) ≤ V (∆ τ f ) ≤ δ τ . We conclude by (3.10) that: E sup f ∈F ( M n ) | G Wn ( ϕ ∧ m τ (∆ τ f )) | ≤ C n δ τ · p ∨ log N τ +1 . (8.34)For the second term, we have by definition of τ in (8.32) and the Cauchy Schwarzinequality: √ n k ∆ τ f k ,n ≤ √ n k ∆ τ f k ,n ≤ √ nV (∆ τ f ) ≤ √ nδ τ ≤ I ( σ ) . (8.35)From (8.34) and (8.35) we obtain E R ≤ C n δ τ p ∨ log N τ +1 + 2 · I ( σ ) . • Since the partitions are nested, it holds that |{ ϕ ∧ m j − m j +1 ( π j +1 f − π j f ) : f ∈F ( M n ) }| ≤ N j +1 . If g := ϕ ∧ m j − m j +1 ( π j +1 f − π j f ), we have k g k ∞ ≤ m j − m j +1 ≤ m j ≤ m ( n, δ j , N j +1 ) and | g | ≤ | π j +1 f − π j f | ≤ ∆ j f. Furthermore, V ( g ) ≤ V (∆ j f ) ≤ δ j . We conclude by (3.10) that: E R ≤ τ − X j =0 E sup f ∈F ( M n ) | G Wn ( ϕ ∧ m j − m j +1 ( π j +1 f − π j f )) | ≤ C n τ − X j =0 δ j p ∨ log N j +1 . • It holds that |{ min { ϕ ∨ m j +1 (∆ j +1 f ) , m j } : f ∈ F ( M n ) }| ≤ N j +1 . If g := min { ϕ ∨ m j +1 (∆ j +1 f ) , m j } ,we have k g k ∞ ≤ m j = m ( n, δ j , N j +1 ) and | g | ≤ ∆ j +1 f. V ( g ) ≤ V (∆ j +1 f ) ≤ δ j +1 ≤ δ j . We conclude by (3.10) that: τ − X j =0 E sup f ∈F ( M n ) | G Wn (min { ϕ ∨ m j +1 (∆ j +1 f ) , m j } ) | ≤ C n τ − X j =0 δ j p ∨ log N j +1 . (8.36)Note that V (∆ j +1 f ) ≤ δ j +1 and m j +1 = m ( n, δ j +1 , N j +2 ). By (3.11), we have √ n k ∆ j +1 f { ∆ j +1 f>m j +1 } k ≤ δ j +1 p ∨ log N j +2 . 
(8.37)From (8.36) and (8.37) we obtain E R ≤ ( C n + 4) τ X j =0 δ j p ∨ log N j +1 . • It holds that |{ min { ϕ ∨ m j − m j +1 (∆ j f ) , m j } : f ∈ F ( M n ) }| ≤ N j +1 . If g := min { ϕ ∨ m j − m j +1 (∆ j f ) , m j } ,we have k g k ∞ ≤ m j = m ( n, δ j , N j +1 ) and | g | ≤ ∆ j f. Thus, V ( g ) ≤ V (∆ j f ) ≤ δ j . We conclude by (3.10) that: τ − X j =0 E sup f ∈F ( M n ) | G Wn (min { ϕ ∨ m j − m j +1 (∆ j +1 f ) , m j } ) | ≤ C n τ − X j =0 δ j · p ∨ log N j +1 . (8.38)Note that V (∆ j f ) ≤ δ j and2( m j − m j +1 ) = m ( n, δ j , N j +1 ) − m ( n, δ j +1 , N j +2 )= D ∞ n n / h r ( δ j D n ) p ∨ log N j +1 − r ( δ j +1 D n ) p ∨ log N j +2 i ≥ D ∞ n n / p ∨ log N j +1 (cid:2) r ( δ j D n ) − r ( δ j +1 D n ) (cid:3) ≥ D ∞ n n / p ∨ log N j +1 r ( δ j D n ) = m j , where the last inequality is due to Lemma 3.6(i). By (3.11) we have √ n k ∆ j f { ∆ j f>m j − m j +1 } k ,n ≤ √ n k ∆ j f { ∆ j f> mj } k ,n m j = m ( n,δ j ,N j +1 ) ≤ δ j p ∨ log N j +1 . (8.39)From (8.38) and (8.39) we obtain R ≤ ( C n + 8) τ − X j =0 δ j p ∨ log N j +1 . mpirical process theory for locally stationary processes R i , i = 1 , ...,
5, we obtain that with some universal constant˜ c > E sup f ∈F ( M n ) (cid:12)(cid:12)(cid:12) G Wn ( f ) (cid:12)(cid:12)(cid:12) ≤ ˜ c · C n h τ X j =0 δ j p ∨ log N j +1 + I ( σ ) i . (8.40)We have (1 ∨ log N j ) / = (cid:16) ∨ P ji =0 log N ( δ i ) (cid:17) / ≤ (cid:16) P ji =0 (1 ∨ H ( δ i )) (cid:17) ≤ P ji =0 (1 ∨ H ( δ i )) / , thus τ X j =0 δ j p ∨ log N j +1 ≤ ∞ X j =0 δ j j X i =0 p ∨ H ( δ i +1 ) ≤ ∞ X i =0 (cid:16) ∞ X j = i δ j (cid:17)p ∨ H ( δ i +1 )= 2 ∞ X i =0 δ i p ∨ H ( δ i +1 ) ≤ ∞ X i =0 δ i +1 p ∨ H ( δ i +1 ) . (8.41)Since H is increasing, we obtain ∞ X i =0 δ i +1 p ∨ H ( δ i +1 ) ≤ ∞ X i =0 δ i p ∨ H ( δ i ) = 2 ∞ X i =0 δ i +1 p ∨ H ( δ i )= 2 ∞ X i =0 Z δ i δ i +1 p ∨ H ( δ i ) dε ≤ ∞ X i =0 Z δ i δ i +1 p ∨ H ( ε ) dε = 2 Z σ p ∨ H ( ε ) dε = 2 · I ( σ ) . (8.42)Inserting (8.42) into (8.41) and then into (8.40), we obtain the result. Proof of Corollary 3.11.
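Define $\tilde{\mathcal{F}} := \{f - g : f, g \in \mathcal{F}\}$. It is easily seen that $\mathbb{N}(\varepsilon, \tilde{\mathcal{F}}, V) \le \mathbb{N}(\tfrac{\varepsilon}{2}, \mathcal{F}, V)^2$ (cf. [22], Theorem 19.5), thus
\[
\mathbb{H}(\varepsilon, \tilde{\mathcal{F}}, V) \le 2\, \mathbb{H}(\tfrac{\varepsilon}{2}, \mathcal{F}, V). \tag{8.43}
\]
Let σ >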
0. Define F ( z, u ) := 2 D ∞ n ( u ) · ¯ F ( z, u ) , ¯ F ( z, u ) := sup f ∈F | ¯ f ( z, u ) | . Then obviously, F is an envelope function of ˜ F .8By Markov’s inequality, Theorem 3.7 and (8.43), P (cid:16) sup V ( f − g ) ≤ σ, f,g ∈F | G n ( f ) − G n ( g ) | ≥ η (cid:17) ≤ η E sup V ( f − g ) ≤ σ, f,g ∈F | G n ( f ) − G n ( g ) | = 1 η E sup ˜ f ∈ ˜ F ,V ( ˜ f ) ≤ σ | G n ( ˜ f ) |≤ ˜ cη h (1 + D ∞ n D n + D n D ∞ n ) Z σ q ∨ H ( ε, ˜ F , V ) dε + √ n (cid:13)(cid:13) F { F > m ( n,σ, N ( σ )) } (cid:13)(cid:13) i ≤ ˜ cη h √ D ∞ n D n + D n D ∞ n ) Z σ/ p ∨ H ( u, F , V ) du + 4 p ∨ H ( σ ) r ( σ D n ) (cid:13)(cid:13) F { F > n / r ( σ ) √ ∨ H ( σ } (cid:13)(cid:13) ,n i . The first term converges to 0 by (3.12) and (3.13) for σ → n ).We now discuss the second term. The continuity conditions from Assumption 2.5 andAssumption 3.10 transfer to ¯ F by the inequality | ¯ F ( z , u ) − ¯ F ( z , u ) | = | sup f ∈F ¯ f ( z , u ) − sup f ∈F ¯ f ( z , u ) | ≤ sup f ∈F | f ( z , u ) − f ( z , u ) | We therefore have as in Lemma 8.9(ii) that for all u, u , u , v , v ∈ [0 , k ¯ F ( Z i , u ) − ¯ F ( ˜ Z i ( in ) , u ) k ≤ C cont · n − αs , (8.44) k ¯ F ( Z i ( v ) , u ) − ¯ F ( ˜ Z i ( v ) , v ) k ≤ C cont · (cid:0) | v − v | αs + | u − u | αs (cid:1) . (8.45)Put c n = n / sup i =1 ,...,n D ∞ n ( in ) r ( σ ) √ ∨ H ( σ ) . Then by Lemma 8.7(ii) and (8.44), k F { F > n / r ( σ ) √ ∨ H ( σ } k ,n ≤ n n X i =1 D ∞ n ( in ) · E h ¯ F ( Z i , in ) {| ¯ F ( Z i , in ) | >c n } i ≤ n n X i =1 D ∞ n ( in ) · E h ¯ F ( ˜ Z i ( in ) , in ) {| ¯ F ( ˜ Z i ( in ) , in ) | >c n } i +16 C cont · n − αs · ( D ∞ n ) . (8.46)Put ˜ W i ( u ) := ¯ F ( ˜ Z i ( u ) , u ) and a n ( u ) := ( D ∞ n ( u )) . By (8.45), k ˜ W i ( u ) − ˜ W i ( u ) k ≤ C cont | u − u | αs . By the assumptions on D f,n ( · ), c n → ∞ and lim sup n →∞ n P ni =1 | a n ( in ) | =lim sup n →∞ ( D ∞ n ) < ∞ . 
We conclude with Lemma 8.8(i) that16 n n X i =1 D ∞ n ( in ) · E h ¯ F ( ˜ Z i ( in ) , in ) {| ¯ F ( ˜ Z i ( in ) , in ) | >c n } i → , that is, the first summand in (8.46) tends to 0. Since lim sup n →∞ D ∞ n < ∞ , we obtainthat (8.46) tends to 0. mpirical process theory for locally stationary processes For the noncontinuous arguments, we need an exponential type inequality which onlyassumes that the process has one moment, which is easily derived from a Bernsteininequality. We then obtain the following lemma.
Lemma 8.3.
Assume that $Q_i(f)$, $i = 1, \dots, m$, are independent variables indexed by $f \in \mathcal{F}$ which fulfill $\mathbb{E} Q_i(f) = 0$, $\frac{1}{m} \sum_{i=1}^{m} \|Q_i(f)\|_1 \le \sigma_Q$ and $|Q_i(f)| \le M_Q$ a.s. ($i = 1, \dots, m$). Then there exists some universal constant $c > 0$ such that
\[
\mathbb{E} \max_{f \in \mathcal{F}} \Big| \frac{1}{m} \sum_{i=1}^{m} Q_i(f) \Big| \le c \Big( \sigma_Q + \frac{M_Q H}{m} \Big), \tag{8.47}
\]
where $H$ is defined by (1.5).

Proof of Lemma 8.3.
By Bernstein's inequality, we have for each $f \in \mathcal{F}$ that
\[
\mathbb{P}\Big( \Big| \frac{1}{m} \sum_{i=1}^{m} Q_i(f) \Big| \ge x \Big)
\le 2 \exp\Big( - \frac{1}{2} \frac{x^2}{\frac{1}{m^2} \sum_{i=1}^{m} \|Q_i(f)\|_2^2 + \frac{x M_Q}{3m}} \Big)
\le 2 \exp\Big( - \frac{1}{2} \frac{x^2}{\frac{M_Q}{m} \cdot \sigma_Q + \frac{x M_Q}{3m}} \Big),
\]
where we used in the last step that $\|Q_i(f)\|_2^2 = \mathbb{E}[Q_i(f)^2] \le M_Q \|Q_i(f)\|_1$.

With standard arguments (cf. the proof of Lemma 19.33 in [22]), we conclude that there exists some universal constant $c > 0$ such that
\[
\mathbb{E} \max_{f \in \mathcal{F}} \Big| \frac{1}{m} \sum_{i=1}^{m} Q_i(f) \Big| \le c \Big( \sqrt{H} \Big( \frac{\sigma_Q M_Q}{m} \Big)^{1/2} + \frac{M_Q H}{m} \Big).
\]
The result follows by using $\big( \frac{H \sigma_Q M_Q}{m} \big)^{1/2} \le \frac{M_Q H}{m} + \sigma_Q$.

Proof of Lemma 4.2.
We use a similar argument as in Theorem 3.2, especially wemake use of the decomposition (3.1). Denote the three summands in (3.1) with A , A , A .We first discuss A . We have L X l =1 E max f ∈F nτ l (cid:12)(cid:12)(cid:12) X ≤ i ≤⌊ nτl ⌋ +1 ,i odd τ l T i,l ( f ) (cid:12)(cid:12)(cid:12) . k W k,j ( f ) − W k,j − ( f ) k ≤ {k W k ( f ) k , δ W k ( f )1 ( j − } , we have for each f ∈ F ,1 τ l k T i,l k ≤ τ l X j = τ l − +1 τ l (cid:13)(cid:13)(cid:13) ( iτ l ) ∧ n X k =( i − τ l +1 ( W k,j − W k,j − ) (cid:13)(cid:13)(cid:13) ≤ τ l X j = τ l − +1 τ l ( iτ l ) ∧ n X k =( i − τ l +1 (cid:13)(cid:13)(cid:13) W k,j − W k,j − (cid:13)(cid:13)(cid:13) ≤ τ l X j = τ l − +1 τ l ( iτ l ) ∧ n X k =( i − τ l +1 min {k W k ( f ) k , δ W k ( f )1 ( j − }≤ τ l X j = τ l − +1 min { τ l ( iτ l ) ∧ n X k =( i − τ l +1 k W k ( f ) k , τ l ( iτ l ) ∧ n X k =( i − τ l +1 δ W k ( f )1 ( j − } = 2 τ l X j = τ l − +1 min { σ i,l , ∆ i,j,l } , where σ i,l := 1 τ l ( iτ l ) ∧ n X k =( i − τ l +1 k W k ( f ) k , ∆ i,j,l := 1 τ l ( iτ l ) ∧ n X k =( i − τ l +1 δ W k ( f )1 ( j − . We conclude that1 ⌊ nτ l ⌋ + 1 ⌊ nτl ⌋ +1 X i =1 τ l k T i,l k ≤ τ l X j = τ l − +1 min { nτ l ⌊ nτl ⌋ +1 X i =1 σ i,l , nτ l ⌊ nτl ⌋ +1 X i =1 ∆ i,j,l }≤ τ l X j = τ l − +1 min { n n X i =1 k W i ( f ) k , n n X i =1 δ W i ( j ) } . (8.48)Furthermore, it holds that1 τ l | T i,l | ≤ i k W i ( f ) k ∞ ≤ k f k ∞ ≤ M . (8.49)By Lemma 8.3, (8.47), we have with some universal constant c > √ n E A ≤ c L X l =1 h sup f ∈F (cid:16) ⌊ nτ l ⌋ + 1 ⌊ nτl ⌋ +1 X i =1 τ l k T i,l ( f ) k (cid:17) + 2 M H ⌊ nτ l ⌋ + 1 i ≤ c (cid:16) L X l =1 sup f ∈F τ l X j = τ l − +1 min { n n X i =1 k W i ( f ) k , n n X i =1 δ W i ( j ) } + qM Hn (cid:17) . 
(8.50) mpirical process theory for locally stationary processes L X l =1 sup f ∈F τ l X j = τ l − +1 min { n n X i =1 k W i ( f ) k , n n X i =1 δ W i ( j ) }≤ L X l =1 sup f ∈F τ l X j = τ l − +1 min { n n X i =1 k f ( Z i , in ) k , n n X i =1 D f,n ( in ) k f ( Z i , in ) k · ∆( j ) }≤ ∞ X j =1 min { sup f ∈F k f k ,n , D n sup f ∈F k f k ,n · ∆( j ) } = sup f ∈F k f k ,n · ¯ V (sup f ∈F k f k ,n )= sup f ∈F (cid:0) k f k ,n · ¯ V ( k f k ,n ) (cid:1) ≤ sup f ∈F (cid:2) k f k ,n V ( f ) (cid:3) , (8.51)where we have used the definition of ¯ V from (8.19) and in the second-to-last equality thefact that x x · ¯ V ( x ) is increasing in x .We also have k W i, ( f ) − E W i, ( f ) k ∞ ≤ k f k ∞ ≤ M and k W i, ( f ) − E W i, ( f ) k ≤ k W i ( f ) k . Thus by Lemma 8.3, (8.47),1 √ n E A ≤ E max f ∈F (cid:12)(cid:12)(cid:12) n n X i =1 ( W i, ( f ) − E W i, ( f )) (cid:12)(cid:12)(cid:12) ≤ c (cid:16) sup f ∈F n n X i =1 k W i ( f ) k + M Hn (cid:17) ≤ c (cid:16) sup f ∈F k f k ,n + M Hn (cid:17) . (8.52)and thusFinally, it holds that1 √ n E A ≤ ∞ X j = q E sup f ∈F (cid:12)(cid:12)(cid:12) n n X i =1 ( W i,j +1 ( f ) − W i,j ( f )) (cid:12)(cid:12)(cid:12) ≤ ∞ X j = q n n X i =1 (cid:13)(cid:13) sup f ∈F | W i,j +1 ( f ) − W i,j ( f ) | (cid:13)(cid:13) . Since | W i,j +1 ( f ) − W i,j ( f ) | = | E [ W i ( f ) ∗∗ ( i − j ) − W i ( f ) ∗∗ ( i − j +1) |G i ] | ≤ E [ | W i ( f ) ∗∗ ( i − j ) − W i ( f ) ∗∗ ( i − j +1) | |G i ] (cf. (8.11) for the introduced notation), we have (cid:13)(cid:13) sup f ∈F | W i,j +1 ( f ) − W i,j ( f ) | (cid:13)(cid:13) ≤ (cid:13)(cid:13) E [max f ∈F | W i ( f ) ∗∗ ( i − j ) − W i ( f ) ∗∗ ( i − j +1) | |G i ] (cid:13)(cid:13) ≤ (cid:13)(cid:13) sup f ∈F | W i ( f ) ∗∗ ( i − j ) − W i ( f ) ∗∗ ( i − j +1) | (cid:13)(cid:13) = (cid:13)(cid:13) sup f ∈F | W i ( f ) − W i ( f ) ∗ ( i − j ) | (cid:13)(cid:13) ≤ D ∞ n ( in ) C ∆ ∆( j ) , (8.53)which shows that 1 √ n E A ≤ ( D ∞ n ) C ∆ β ( q ) . 
(8.54)Collecting the upper bounds (8.50), (8.51), (8.52) and (8.54), we obtain that E max f ∈F (cid:12)(cid:12)(cid:12) n S Wn ( f ) (cid:12)(cid:12)(cid:12) ≤ (4 c + 1) · h sup f ∈F (cid:2) k f k ,n V ( f ) (cid:3) + ( D ∞ n ) C ∆ β ( q ) + qM Hn i . (8.55)By (8.31), V ( f ) ≤ σ implies k f k ,n ≤ D n r ( δ D n ) k f k ,n and thus k f k ,n ≤ D n r ( σ D n ) , thus sup f ∈F (cid:2) k f k ,n V ( f ) (cid:3) ≤ D n r ( σ D n ) σ. (8.56)Inserting (8.56) into (8.55) yields the first assertion (4.1) of the lemma.We now show (4.2) with a case distinction. We abbreviate q ∗ = q ∗ ( M Hn ( D ∞ n ) C ∆ ). If q ∗ Hn ≤ q ∗ ∈ { , ..., n } and thus P ≤ c (cid:16) D n r ( σ D n ) σ + ( D ∞ n ) C ∆ β ( q ∗ ) + q ∗ M Hn (cid:17) ≤ c (cid:16) D n r ( σ D n ) σ + q ∗ M Hn (cid:17) = 2 c (cid:16) D n r ( σ D n ) σ + M · min n q ∗ Hn , o(cid:17) . (8.57)If q ∗ Hn ≥
1, choose q = ⌊ nH ⌋ ≤ nH . By simply bounding each summand with M , wehave E max f ∈F (cid:12)(cid:12)(cid:12) n S n ( f ) (cid:12)(cid:12)(cid:12) ≤ M ≤ c (cid:16) D n r ( σ D n ) σ + M (cid:17) ≤ c (cid:16) D n r ( σ D n ) σ + M · min n q ∗ Hn , o(cid:17) . (8.58)holds. Putting the two bounds (8.57) and (8.58) together, we obtain the result (4.2). mpirical process theory for locally stationary processes Lemma 8.4.
Let F be some finite class of functions. Let R > be arbitrary and assumethat sup f ∈F k f k ∞ ≤ M . Then there exists a universal constant c > such that E max f ∈F (cid:12)(cid:12) G (1) n ( f ) (cid:12)(cid:12) { R n ( f ) ≤ R } ≤ c n R √ H + M H √ n o , (8.59) where H is defined by (1.5). Proof of Lemma 8.4.
By Theorem 3.3 in [19], it holds for x, a > f that P (cid:16)(cid:12)(cid:12) G (1) n ( f ) (cid:12)(cid:12) ≥ x, R n ( f ) ≤ R (cid:17) ≤ (cid:16) − x R + k f k ∞ x √ n ) (cid:17) . Using standard arguments (cf. the proof of Lemma 19.33 in [22]), we obtain (8.59).
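The "standard arguments" invoked here can be outlined as follows. This is a hedged sketch of the usual route from a Bernstein-type tail bound to a maximal inequality, not a transcription of the authors' computation. The symbols $X_f$, $E_f$, $a$, $b$ and $K$ are introduced only for this illustration, and $H = 1 \vee \log|\mathcal{F}|$ is assumed, since definition (1.5) is not reproduced in this section.

```latex
% Assumption: for every f in the finite class F and all x > 0,
%   P(|X_f| > x) <= 2 exp( - (1/2) x^2 / (b + a x) ),   a, b > 0 fixed.
% Then, with a universal constant K,
\[
  \mathbb{E} \max_{f \in \mathcal{F}} |X_f|
  \le K \Big( a \log(1 + |\mathcal{F}|) + \sqrt{b}\, \sqrt{\log(1 + |\mathcal{F}|)} \Big)
  \le 2K \Big( \sqrt{b}\, \sqrt{H} + a H \Big),
  \qquad H := 1 \vee \log |\mathcal{F}| ,
\]
% where the second step uses log(1 + N) <= 2 (1 v log N) for integers N >= 1.
%
% Application: take X_f := G_n^{(1)}(f) 1_{E_f}, with E_f the event inside the
% indicator of (8.59). For x > 0 the event {|X_f| > x} is contained in
% {|G_n^{(1)}(f)| >= x} \cap E_f, so the tail bound of the proof applies with
% b proportional to R^2 and a proportional to M / sqrt(n), which gives
\[
  \mathbb{E} \max_{f \in \mathcal{F}} \big| \mathbb{G}^{(1)}_n(f) \big| \mathbb{1}_{E_f}
  \le c \Big( R \sqrt{H} + \frac{M H}{\sqrt{n}} \Big).
\]
```

A different numerical constant in the exponent of the tail bound only rescales $a$ and $b$ and is absorbed into the universal constant $c$.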
Proof of Corollary 4.3.
Let Q ≥
1, and σ := sup n ∈ N sup f ∈F V ( f ) < ∞ . Put M n = √ n √ H r (cid:0) σQ / D ∞ n (cid:1) D ∞ n . Let F ( z, u ) := D ∞ n ( u ) · ¯ F ( z, u ), (recall ¯ F = sup f ∈F ¯ f ). Then P (cid:16) max f ∈F | G n ( f ) | > Q √ H (cid:17) ≤ P (cid:16) max f ∈F | G n ( f ) | > Q √ H, sup i =1 ,...,n F ( Z i , in ) ≤ M n (cid:17) + P (cid:16) sup i =1 ,...,n F ( Z i , in ) > M n (cid:17) ≤ P (cid:16) max f ∈F | G n ( ϕ ∧ M n ( f )) | > Q √ H (cid:17) + P (cid:16) √ n max f ∈F (cid:12)(cid:12) n X i =1 E [ f ( Z i , in ) {| f ( Z i , in ) | >M n } ] (cid:12)(cid:12) > Q √ H (cid:17) + P (cid:16) sup i =1 ,...,n F ( Z i , in ) > M n (cid:17) . (8.60)4For the first summand in (8.60), we use the decomposition P (cid:16) max f ∈F | G n ( ϕ ∧ M n ( f )) | > Q √ H (cid:17) ≤ P (cid:16) max f ∈F | G (1) n ( ϕ ∧ M n ( f )) | > Q √ H (cid:17) + P (cid:16) max f ∈F | G (2) n ( ϕ ∧ M n ( f )) | > Q √ H (cid:17) ≤ P (cid:16) max f ∈F | G (1) n ( ϕ ∧ M n ( f )) | > Q √ H , max f ∈F R n ( ϕ ∧ M n ( f )) ≤ σ (cid:17) + P (cid:16) max f ∈F R n ( ϕ ∧ M n ( f )) > σ (cid:17) + P (cid:16) max f ∈F | G (2) n ( ϕ ∧ M n ( f )) | > Q √ H (cid:17) . (8.61)We now discuss the three terms separately. By Lemma 8.4, we have P (cid:16) max f ∈F | G (1) n ( ϕ ∧ M n ( f )) | > Q √ H , max f ∈F R n ( ϕ ∧ M n ( f )) ≤ Q / σ (cid:17) ≤ cQ √ H h σQ / √ H + M n H √ n i ≤ cQ √ H h σQ / √ H + σ √ HQ / i ≤ cQ / . By Lemma 4.2 and (8.63), P (cid:16) max f ∈F R n ( ϕ ∧ M n ( f )) > Q / σ (cid:17) ≤ cσ Q / h D n r ( σ D n ) σ + q ∗ (cid:16) M Hn ( D ∞ n ) C ∆ (cid:17) M Hn i ≤ cσ Q / h σ + q ∗ (cid:16) r ( σQ / D ∞ n ) C ∆ (cid:17) r ( σQ / D ∞ n ) ( D ∞ n ) i ≤ cσ Q / h σ + q ∗ (cid:16) C − C − β (cid:1) · h q ∗ (cid:16) r ( σQ / D ∞ n ) (cid:17) r ( σQ / D ∞ n ) i ( D ∞ n ) i ≤ cσ Q / h σ + q ∗ (cid:16) C − C − β (cid:1) σ Q i |≤ cQ / (cid:2) q ∗ (cid:16) C − C − β (cid:1)(cid:3) . 
By Theorem 3.2 applied to W i ( f ) = E [ f ( Z i , in ) |G i − ], P (cid:16) max f ∈F | G (2) n ( ϕ ∧ M n ( f )) | > Q √ H (cid:17) ≤ cQ √ H · h σ √ H + q ∗ (cid:16) r ( σQ / D ∞ n ) (cid:17) r ( σQ / D ∞ n ) D ∞ n i ≤ cQ √ H (cid:2) σ √ H + σQ / √ H (cid:3) ≤ cσQ / . mpirical process theory for locally stationary processes P (cid:16) max f ∈F | G n ( ϕ ∧ M n ( f )) | > Q √ H (cid:17) ≤ cQ / + 2 cQ / (cid:2) q ∗ (cid:16) C − C − β (cid:1)(cid:3) + 16 cσQ / → Q → ∞ . The second and third summand in (8.60) were already discussed in the proofof Corollary 3.3 ((8.25) and (8.26) therein; note especially that we only need there that k ¯ F ( Z i , in ) k ν ≤ C ¯ F ,n instead of C ∆ which is part of the assumptions), and converge to 0for Q → ∞ under the given assumptions. Proof of Lemma 4.5.
By Lemma 8.4 and since r ( a ) ≤ a (cf. Lemma 3.6(i)), E max f ∈F (cid:12)(cid:12) G (1) n ( f ) (cid:12)(cid:12) { R n ( f ) ≤ δψ ( δ ) } ≤ c n ψ ( δ ) δ p H ( k ) + m ( n, δ, k ) H ( k ) √ n o ≤ c · (cid:2) ψ ( δ ) · δ + D ∞ n r ( δ D n ) (cid:3)p H ( k ) ≤ c · (1 + D ∞ n D n ) · ψ ( δ ) δ p H ( k ) , which shows (4.4).By (8.31), k f k ,n ≤ D n r ( δ D n ) k f k ,n and thus k f k ,n ≤ D n r ( δ D n ). Note that due to r ( a ) ≤ a , E R n ( f ) = 1 n n X i =1 E [ f ( Z i , in ) ] ≤ k f k ,n ≤ ( D n r ( δ D n )) ≤ δ . (8.62)Recall that β norm ( q ) = β ( q ) q . By Assumption 2.10, we have that for any x , x > q = q ∗ ( x ) q ∗ ( x ) satisfies β norm (˜ q ) ≤ C β β norm ( q ∗ ( x )) β norm ( q ∗ ( x )) ≤ C β x x . Thus, by definition of q ∗ , q ∗ ( C β x x ) ≤ q ∗ ( x ) q ∗ ( x ) . (8.63)We obtain that q ∗ (cid:16) r ( δ D n ) C ∆ (cid:17) ≤ q ∗ (cid:16) r ( δ D n ) (cid:17) q ∗ (cid:0) C − C − β (cid:1) . (8.64)6By (8.62), Markov’s inequality, Lemma 4.2 and (8.64), P (cid:16) sup f ∈F R n ( f ) > ψ ( δ ) δ (cid:17) ≤ P (cid:16) sup f ∈F | R n ( f ) − E R n ( f ) | > ψ ( δ ) δ (cid:17) ≤ cψ ( δ ) δ · h D n r ( δ D n ) δ + q ∗ (cid:16) r ( δ D n ) C ∆ (cid:17) r ( δ D n ) ( D ∞ n ) i ≤ cψ ( δ ) δ · h δ + h q ∗ (cid:16) r ( δ D n ) (cid:17) r ( δ D n ) i q ∗ (cid:0) C − C − β (cid:1) ( D ∞ n ) i ≤ cψ ( δ ) δ · h δ + δ q ∗ (cid:0) C − C − β (cid:1) ( D ∞ n D n ) i ≤ c (1 + q ∗ (cid:0) C − C − β (cid:1) ( D ∞ n D n ) ) ψ ( δ ) , which shows (4.5). Proof of Theorem 4.6.
In the following, we abbreviate H ( δ ) = H ( δ, F , V ) and N ( δ ) = N ( δ, F , V ).We use exactly the same setup as in the proof of Theorem 3.7, that is, we choose δ = σ and δ j = 2 − j δ , and m j = 12 m ( n, δ j , N j +1 ) , as well as M n = m . We then use E sup f ∈F (cid:12)(cid:12)(cid:12) G (1) n ( f ) (cid:12)(cid:12)(cid:12) ≤ E sup f ∈F ( M n ) (cid:12)(cid:12)(cid:12) G (1) n ( f ) (cid:12)(cid:12)(cid:12) + 1 √ n n X i =1 E (cid:2) F ( Z i ) { F ( Z i ) >M n } (cid:3) , (8.65)where F ( M n ) := { ϕ ∧ M n ( f ) : f ∈ F} .As in the proof of Theorem 3.7, we construct a nested sequence of partitions ( F jk ) k =1 ,...,N j , j ∈ N of F ( M n ) (where N j := N ( δ ) · ... · N ( δ j )), and a sequence ∆ jk of measurable func-tions such that sup f,g ∈F jk | f − g | ≤ ∆ jk , V (∆ jk ) ≤ δ j . In each F jk , we fix some f jk ∈ F , and define π j f := f j,ψ j f where ψ j f := min { i ∈{ , ..., N j } : f ∈ F ji } , and put ∆ j f := ∆ j,ψ j f , and I ( σ ) := Z σ ψ ( ε ) p ∨ H ( ε, F , V ) dε, as well as τ := min n j ≥ δ j ≤ I ( σ ) √ n o ∨ . (8.66) mpirical process theory for locally stationary processes f, g with | f | ≤ g , it holds that | G (1) n ( f ) | ≤ | G (1) n ( g ) | + 2 √ n · n n X i =1 E [ g ( Z i , in ) |G i − ] ≤ | G (1) n ( g ) | + 2 | G (2) n ( g ) | + 2 √ n · n n X i =1 E [ g ( Z i , in )] ≤ | G (1) n ( g ) | + 2 | G (2) n ( g ) | + 2 √ n k g k ,n . 
By (3.7) and (3.8) (applied to W i ( f ) = f ( Z i , in ) − E [ f ( Z i , in ) |G i − ] ) and the fact that k f − π f k ∞ ≤ M n ≤ m , we have the decompositionsup f ∈F | G (1) n ( f ) | ≤ sup f ∈F | G (1) n ( π f ) | + sup f ∈F | G (1) n ( ϕ ∧ m τ ( f − π τ f )) | + τ − X j =0 sup f ∈F (cid:12)(cid:12)(cid:12) G (1) n ( ϕ ∧ m j − m j +1 ( π j +1 f − π j f )) (cid:12)(cid:12)(cid:12) + τ − X j =0 sup f ∈F | G (1) n ( R ( j )) |≤ sup f ∈F | G (1) n ( π f ) | + n sup f ∈F | G (1) n ( ϕ ∧ m τ (∆ τ f )) | + 2 sup f ∈F | G (2) n ( ϕ ∧ m τ (∆ τ f )) | +2 √ n sup f ∈F k ∆ τ f k ,n o + τ − X j =0 sup f ∈F (cid:12)(cid:12)(cid:12) G (1) n ( ϕ ∧ m j − m j +1 ( π j +1 f − π j f )) (cid:12)(cid:12)(cid:12) + τ − X j =0 n sup f ∈F (cid:12)(cid:12)(cid:12) G (1) n (min (cid:8)(cid:12)(cid:12) ϕ ∨ m j +1 (∆ j +1 f ) (cid:12)(cid:12) , m j (cid:9) ) (cid:12)(cid:12)(cid:12) +2 sup f ∈F (cid:12)(cid:12)(cid:12) G (2) n (min (cid:8)(cid:12)(cid:12) ϕ ∨ m j +1 (∆ j +1 f ) (cid:12)(cid:12) , m j (cid:9) ) (cid:12)(cid:12)(cid:12) +2 √ n sup f ∈F k ∆ j +1 f { ∆ j +1 f>m j +1 } k ,n o + τ − X j =0 n sup f ∈F (cid:12)(cid:12)(cid:12) G (1) n (min (cid:8)(cid:12)(cid:12) ϕ ∨ m j − m j +1 (∆ j f ) (cid:12)(cid:12) , m j (cid:9) ) (cid:12)(cid:12)(cid:12) +2 sup f ∈F (cid:12)(cid:12)(cid:12) G (2) n (min (cid:8)(cid:12)(cid:12) ϕ ∨ m j − m j +1 (∆ j f ) (cid:12)(cid:12) , m j (cid:9) ) (cid:12)(cid:12)(cid:12) +2 √ n sup f ∈F k ∆ j f { ∆ j f>m j − m j +1 } k ,n o (8.67)8We have for f ∈ F ( M n ): π f = ϕ ∧ M n ( π f ) ,ϕ ∧ m τ (∆ τ f ) ≤ min { ∆ τ f, m τ } ,ϕ ∧ m j − m j − ( π j +1 f − π j f ) ≤ min { ∆ j f, m j } , min { ϕ ∨ m j +1 (∆ j +1 f ) , m j } ≤ min { ∆ j f, m j } , min { ϕ ∨ m j − m j +1 (∆ j f ) , m j } ≤ min { ∆ j f, m j } . (8.68)We therefore define the eventΩ n := { sup f ∈F ( M n ) R n ( ϕ ∧ M n ( π f )) ≤ σψ ( σ ) }∩ τ \ j =1 (cid:8) sup f ∈F ( M n ) R n (min { ∆ j f, m j } ) ≤ δ j ψ ( δ j ) (cid:9) . 
From (8.67) and (8.68), we obtainsup f ∈F ( M n ) | G (1) n ( f ) | Ω n ≤ sup f ∈F ( M n ) | G (1) n ( π f ) | { sup f ∈F ( Mn ) R n ( π f ) ≤ σψ ( σ ) } + n sup f ∈F | G (1) n ( ϕ ∧ m τ (∆ τ f )) |× { sup f ∈F ( Mn ) R n (min { ∆ τ f, m τ } ) ≤ δ τ ψ ( δ τ ) } + 2 R o + τ − X j =0 sup f ∈F (cid:12)(cid:12)(cid:12) G (1) n ( ϕ ∧ m j − m j +1 ( π j +1 f − π j f )) (cid:12)(cid:12)(cid:12) × { sup f ∈F ( Mn ) R n (min { ∆ j f, m j } ) ≤ δ j ψ ( δ j ) } + τ − X j =0 sup f ∈F (cid:12)(cid:12)(cid:12) G (1) n (min (cid:8)(cid:12)(cid:12) ϕ ∨ m j +1 (∆ j +1 f ) (cid:12)(cid:12) , m j (cid:9) ) (cid:12)(cid:12)(cid:12) × { sup f ∈F ( Mn ) R n (min { ∆ j f, m j } ) ≤ δ j ψ ( δ j ) } + 2 R + τ − X j =0 sup f ∈F (cid:12)(cid:12)(cid:12) G (1) n (min (cid:8)(cid:12)(cid:12) ϕ ∨ m j − m j +1 (∆ j f ) (cid:12)(cid:12) , m j (cid:9) ) (cid:12)(cid:12)(cid:12) × { sup f ∈F ( Mn ) R n (min { ∆ j f, m j } ) ≤ δ j ψ ( δ j ) } + 2 R =: ˜ R + { ˜ R + 2 R } + ˜ R + { ˜ R + 2 R } + { ˜ R + 2 R } . (8.69)We now discuss the terms ˜ R i , i = 1 , ..., R i , i ∈ { , , } werealready discussed in the proof of Theorem 3.7. Put˜ C n := 2 c (1 + D ∞ n D n ) , mpirical process theory for locally stationary processes c is from Lemma 3.6. • Since |{ π f : f ∈ F ( M n ) }| ≤ N ( δ ), k π f k ∞ ≤ M n ≤ m ( n, δ , N ( δ )), we have byLemma 4.5: E ˜ R = E sup f ∈F ( M n ) | G (1) n ( π f ) | { sup f ∈F ( Mn ) R n ( π f ) ≤ δ ψ ( δ ) } ≤ ˜ C n ψ ( δ ) δ p ∨ log N ( δ ) . • It holds that |{ ϕ ∧ m τ (∆ τ f ) : f ∈ F ( M n ) }| ≤ N τ . If g := ϕ ∧ m τ (∆ τ f ), then k g k ∞ ≤ m τ ≤ m ( n, δ τ , N τ +1 ). We conclude by Lemma 4.5: E ˜ R ≤ E sup f ∈F | G (1) n ( ϕ ∧ m τ (∆ τ f )) |× { sup f ∈F ( Mn ) R n (min { ∆ τ f, m τ } ) ≤ δ τ ψ ( δ τ ) } ≤ ˜ C n ψ ( δ τ ) δ τ · p ∨ log N τ +1 . • Since the partitions are nested, it holds that |{ ϕ ∧ m j − m j +1 ( π j +1 f − π j f ) : f ∈F ( M n ) }| ≤ N j +1 . If g := ϕ ∧ m j − m j +1 ( π j +1 f − π j f ), we have k g k ∞ ≤ m j − m j +1 ≤ m j ≤ m ( n, δ j , N j +1 ). 
We conclude by Lemma 4.5: E ˜ R ≤ τ − X j =0 E sup f ∈F (cid:12)(cid:12)(cid:12) G (1) n ( ϕ ∧ m j − m j +1 ( π j +1 f − π j f )) (cid:12)(cid:12)(cid:12) × { sup f ∈F ( Mn ) R n (min { ∆ j f, m j } ) ≤ δ j ψ ( δ j ) } ≤ ˜ C n τ − X j =0 ψ ( δ j ) δ j p ∨ log N j +1 . • It holds that |{ min { ϕ ∨ m j +1 (∆ j +1 f ) , m j } : f ∈ F ( M n ) }| ≤ N j +1 . If g := min { ϕ ∨ m j +1 (∆ j +1 f ) , m j } ,we have k g k ∞ ≤ m j = m ( n, δ j , N j +1 ). We conclude by Lemma 4.5: E ˜ R ≤ τ − X j =0 E sup f ∈F (cid:12)(cid:12)(cid:12) G (1) n (min (cid:8)(cid:12)(cid:12) ϕ ∨ m j +1 (∆ j +1 f ) (cid:12)(cid:12) , m j (cid:9) ) (cid:12)(cid:12)(cid:12) × { sup f ∈F ( Mn ) R n (min { ∆ j f, m j } ) ≤ δ j ψ ( δ j ) } ≤ ˜ C n τ − X j =0 ψ ( δ j ) δ j p ∨ log N j +1 . • It holds that |{ min { ϕ ∨ m j − m j +1 (∆ j f ) , m j } : f ∈ F ( M n ) }| ≤ N j +1 . If g := min { ϕ ∨ m j − m j +1 (∆ j f ) , m j } ,we have k g k ∞ ≤ m j = m ( n, δ j , N j +1 ). We conclude by Lemma 4.5 that: E ˜ R ≤ τ − X j =0 E sup f ∈F (cid:12)(cid:12)(cid:12) G (1) n (min (cid:8)(cid:12)(cid:12) ϕ ∨ m j − m j +1 (∆ j f ) (cid:12)(cid:12) , m j (cid:9) ) (cid:12)(cid:12)(cid:12) × { sup f ∈F ( Mn ) R n (min { ∆ j f, m j } ) ≤ δ j ψ ( δ j ) } ≤ ˜ C n τ − X j =0 ψ ( δ j ) δ j · p ∨ log N j +1 . E ˜ R i , i = 1 , ..., R i , i ∈ { , , } from theproof of Theorem 3.7 into (8.69), we obtain that with some universal constant ˜ c > E sup f ∈F ( M n ) (cid:12)(cid:12)(cid:12) G (1) n ( f ) (cid:12)(cid:12)(cid:12) Ω n ≤ ˜ c (1 + D ∞ n D n + D n D ∞ n ) h τ +1 X j =0 ψ ( δ j ) δ j p ∨ log N j +1 + I ( σ ) i . (8.70)Note that ∞ X j = k δ j ψ ( δ j ) ≤ ∞ X j = k Z δ j δ j +1 ψ ( δ j ) dx ≤ Z δ k ψ ( x ) dx. By partial integration, it is easy to see that there exists some universal constant c ψ > (cid:12)(cid:12) Z δ k ψ ( x ) dx (cid:12)(cid:12) ≤ c ψ δ k ψ ( δ k ) , (8.71)thus ∞ X j = k δ j ψ ( δ j ) ≤ c ψ δ k ψ ( δ k ) . 
(8.72)Using (8.72), we can argue as in the proof of Theorem 3.7 (see (8.40), (8.41) and (8.42)therein) that there exists some universal constant ˜ c > ∞ X j =0 ψ ( δ j ) δ j p ∨ log N j +1 ≤ ˜ c I ( σ ) . Insertion of the results into (8.70) yields E sup f ∈F ( M n ) (cid:12)(cid:12) G (1) n ( f ) (cid:12)(cid:12) Ω n ≤ ˜ c · (3˜ c + 1)(1 + D ∞ n D n + D n D ∞ n ) I ( σ ) . (8.73)Discussion of the event Ω n : We have P (Ω cn ) ≤ P (cid:16) sup f ∈F ( M n ) R n ( ϕ ∧ M n ( π f )) > ψ ( σ ) σ (cid:17) + τ +1 X j =1 P (cid:16) sup f ∈F ( M n ) R n (min { ∆ j f, m j } ) > ψ ( δ j ) δ j (cid:17) =: R ◦ + R ◦ . (8.74)We now discuss R ◦ i , i = 1 ,
2. Put C ◦ n := 2 c n q ∗ (cid:0) C − C − β (cid:1)(cid:0) D ∞ n D n (cid:1) o , where c is from Lemma 4.5. mpirical process theory for locally stationary processes • Since |{ ϕ ∧ M n ( π f ) : f ∈ F ( M n ) }| ≤ N ( δ ) = N ( σ ), k ϕ ∧ M n ( π f ) k ∞ ≤ M n ≤ m ( n, σ, N ( σ )) and V ( ϕ ∧ M n ( π f )) ≤ V ( π f ) ≤ σ , we have by Lemma 4.5: R ◦ ≤ C ◦ n ψ ( σ ) . • It holds that |{ min { ∆ j f, m j } : f ∈ F ( M n ) }| ≤ N j +1 . We have k min { ∆ j f, m j }k ∞ ≤ m j = m ( n, δ j , N j +1 ) and V (min { ∆ j f, m j } ) ≤ V (∆ j f ) ≤ δ j . We conclude byLemma 4.5 that: R ◦ ≤ C ◦ n τ +1 X j =0 ψ ( δ j ) . Inserting the bounds for R ◦ i , i = 1 , P (Ω cn ) ≤ C ◦ n ∞ X j =0 ψ ( δ j ) . (8.75)We now have ∞ X j =0 ψ ( δ j ) ≤ Z σ εψ ( ε ) dε = 2log(log( σ )) . We conclude that for each η > P (cid:16) sup f ∈F | G (1) n ( f ) | > η (cid:17) ≤ P (cid:16) sup f ∈F | G (1) n ( f ) | > η, Ω n (cid:17) + P (Ω cn ) ≤ η E sup f ∈F | G (1) n ( f ) | Ω n + P (Ω cn ) . Insertion of (8.65), (8.73) and (8.75) gives the result.
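The final step of the proof above splits along the event $\Omega_n$ and then applies Markov's inequality, that is, $P(X > \eta) \le \eta^{-1} E[X \mathbb{1}_{A}] + P(A^c)$ for a nonnegative variable $X$ and an event $A$. The following is a minimal numerical sketch of that elementary bound under the empirical measure of a sample; the sample, the event and the function name `split_markov_bound` are hypothetical and only serve as illustration, they are not part of the paper.

```python
import random


def split_markov_bound(x, on_event, eta):
    """Return (lhs, rhs) of P(X > eta) <= E[X 1_A] / eta + P(A^c).

    Both sides are computed under the empirical measure of the sample `x`,
    where `on_event[i]` says whether observation i lies in the event A.
    The values in `x` are assumed to be nonnegative.
    """
    n = len(x)
    lhs = sum(1 for xi in x if xi > eta) / n
    mean_on_event = sum(xi for xi, a in zip(x, on_event) if a) / n
    prob_complement = sum(1 for a in on_event if not a) / n
    return lhs, mean_on_event / eta + prob_complement


random.seed(0)
# Hypothetical nonnegative sample standing in for sup_f |G_n^{(1)}(f)|.
sample = [abs(random.gauss(0.0, 1.0)) for _ in range(2000)]
# Hypothetical "good" event standing in for Omega_n.
good_event = [xi <= 2.5 for xi in sample]

lhs, rhs = split_markov_bound(sample, good_event, eta=1.0)
print("empirical P(X > eta):", lhs)
print("split Markov bound  :", rhs)
```

The bound holds pointwise, since $\mathbb{1}_{\{x > \eta\}} \le \eta^{-1} x \mathbb{1}_{A} + \mathbb{1}_{A^c}$ for $x \ge 0$, so it is satisfied for any sample and any choice of event.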
Proof of Corollary 4.11.
Define ˜ F as in Corollary 3.11. We obtain P (cid:16) sup V ( f − g ) ≤ σ, f,g ∈F | G n ( f ) − G n ( g ) | ≥ η (cid:17) ≤ P (cid:16) sup V ( ˜ f ) ≤ σ, ˜ f ∈ ˜ F | G (1) n ( ˜ f ) | ≥ η (cid:17) + P (cid:16) sup V ( ˜ f ) ≤ σ, ˜ f ∈ ˜ F | G (2) n ( ˜ f ) | ≥ η (cid:17) . (8.76)Now let F ( z, u ) := 2 D ∞ n ( u ) · ¯ F ( z, u ), where ¯ F is from Assumption 4.9. Then obviously, F is an envelope function of ˜ F .We now discuss the second summand on the right hand side in (8.76). By Markov’sinequality and Theorem 3.7 applied to W i ( f ) = E [ f ( Z i , in ) |G i − ], we obtain as in the2proof of Corollary 3.11 that P (cid:16) sup V ( ˜ f ) ≤ σ, ˜ f ∈ ˜ F | G (2) n ( ˜ f ) | ≥ η (cid:17) ≤ ˜ c ( η/ h √ D ∞ n D n + D n D ∞ n ) Z σ/ p ∨ H ( u, F , V ) du + 4 p ∨ H ( σ ) r ( σ D n ) (cid:13)(cid:13) F { F > n / r ( σ ) √ ∨ H ( σ } (cid:13)(cid:13) ,n i . (8.77)The first summand in (8.77) converges to 0 for σ → n ) sincesup n ∈ N Z σ/ p ∨ H ( u, F , V ) du ≤ sup n ∈ N Z σ ψ ( ε ) p ∨ H ( ε, F , V ) dε < ∞ . We now discuss the second summand in (8.77). The continuity conditions from Assump-tion 4.9 on ¯ F yield as in the proof of Lemma 8.9(ii) that for all u, u , u , v , v ∈ [0 , k ¯ F ( Z i , u ) − ¯ F ( ˜ Z i ( in ) , u ) k ≤ C cont · n − αs/ , (8.78) k ¯ F ( Z i ( v ) , u ) − ¯ F ( ˜ Z i ( v ) , v ) k ≤ C cont · (cid:0) | v − v | αs/ + | u − u | αs (cid:1) . (8.79)As in the proof of Corollary 3.11, we now obtain with (8.78) and (8.79) that (cid:13)(cid:13) F { F > n / r ( σ ) √ ∨ H ( σ } (cid:13)(cid:13) ,n → n → ∞ , which shows that (8.77) converges to 0 for σ → n → ∞ .We now consider the first term in (8.76). 
By Theorem 4.6, we have with some universalconstant c > P (cid:16) sup V ( ˜ f ) ≤ σ, ˜ f ∈ ˜ F | G (1) n ( ˜ f ) | ≥ η (cid:17) ≤ η h c (cid:16) D ∞ n D n + D n D ∞ n (cid:17) · Z σ ψ ( ε ) q ∨ H (cid:0) ε, ˜ F , V (cid:1) d ε + 4 p ∨ H ( σ ) r ( σ D n ) (cid:13)(cid:13) F { F > m ( n,σ, N ( σ )) } (cid:13)(cid:13) i + c (cid:16) q ∗ (cid:0) C − C − β (cid:1)(cid:16) D ∞ n D n (cid:17) (cid:17) Z σ εψ ( ε ) dε. (8.81)For the first summand in (8.81), note that by (8.43), Z σ ψ ( ε ) q ∨ H ( ε, ˜ F , V ) dε ≤ √ Z σ/ ψ (2 ε ) p ∨ H ( ε, F , V ) dε ≤ √ Z σ/ ψ ( ε ) p ∨ H ( ε, F , V ) dε. mpirical process theory for locally stationary processes D n , D ∞ n , we obtain that the firstsummand in (8.81) converges to 0 for σ → n ).The third summand in (8.81) converges to 0 for σ → n ) since R ∞ εψ ( ε ) dε < ∞ and by the uniform boundedness of D n , D ∞ n .The second summand in (8.81) converges to 0 for n → ∞ by (8.80). The following central limit theorem is formulated for a more general structure of D f,n ( · )than in Theorem 3.13. We formulate the conditions on D f,n ( · ) in the following Assump-tion 8.5. Assumption 8.5.
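For $f \in \mathcal{F}$, let $D^{\infty}_{f,n} := \sup_{i=1,...,n} D_{f,n}(\frac{i}{n})$. There exist a sequence $h_n > 0$ and $v \in [0,1]$ such that for all $u \in [0,1]$, $|v - u| > h_n$ implies $D_{f,n}(u) = 0$. For all $f \in \mathcal{F}$,
\[
\sup_{n \in \mathbb{N}} \big( h_n^{1/2} \cdot D^{\infty}_{f,n} \big) < \infty, \qquad \sup_{n \in \mathbb{N}} \frac{1}{n} \sum_{i=1}^{n} D_{f,n}\big(\tfrac{i}{n}\big)^2 < \infty, \qquad \frac{D^{\infty}_{f,n}}{\sqrt{n}} \to 0,
\]
and $\frac{D_{f,n}(\cdot)}{D^{\infty}_{f,n}}$ has bounded variation uniformly in $n$.

We obtain the following central limit theorem.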
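To make Assumption 8.5 concrete, the sketch below evaluates its quantities for a kernel-type weight $D_{f,n}(u) = h_n^{-1/2} K((u - v)/h_n)$. This specific form, the Epanechnikov kernel $K$, the bandwidth $h_n = n^{-1/5}$ and all function names are assumptions chosen for illustration; the text above does not prescribe them.

```python
import math


def epanechnikov(x):
    """Epanechnikov kernel, supported on [-1, 1]."""
    return 0.75 * (1.0 - x * x) if abs(x) <= 1.0 else 0.0


def localized_weight(u, v, h):
    """Hypothetical weight D_{f,n}(u) = h^{-1/2} K((u - v) / h)."""
    return epanechnikov((u - v) / h) / math.sqrt(h)


def assumption_quantities(n, v=0.5, exponent=0.2):
    """Evaluate the quantities of Assumption 8.5 on the grid i/n, i = 1..n."""
    h = n ** (-exponent)  # assumed bandwidth h_n = n^{-1/5}
    grid = [localized_weight(i / n, v, h) for i in range(1, n + 1)]
    d_sup = max(grid)  # D^infty_{f,n}
    return {
        "h": h,
        "sqrt_h_times_sup": math.sqrt(h) * d_sup,          # should stay bounded
        "mean_square": sum(d * d for d in grid) / n,       # should stay bounded
        "sup_over_sqrt_n": d_sup / math.sqrt(n),           # should tend to 0
    }


for n in (100, 1000, 10000):
    print(n, assumption_quantities(n))
```

For this choice, $h_n^{1/2} D^{\infty}_{f,n}$ is bounded by $\sup_x K(x)$, the mean square is a Riemann sum for $\int K^2$, and $D^{\infty}_{f,n}/\sqrt{n}$ is of order $(n h_n)^{-1/2}$, so the conditions hold whenever $n h_n \to \infty$.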
Theorem 8.6.
Let $\mathcal{F}$ satisfy Assumptions 3.9, 3.10 and 8.5. Suppose that either Assumption 2.5 or Assumptions 2.8, 4.9 hold. Let $m \in \mathbb{N}$ and $f_1, ..., f_m \in \mathcal{F}$. Suppose that either

• Case $K = 1$: The mapping $u \mapsto E\big[ E[\bar f_k(\tilde Z_{j_1}(u), u) | \mathcal{G}_0] \cdot E[\bar f_l(\tilde Z_{j_2}(u), u) | \mathcal{G}_0] \big]$ has bounded variation for all $j_1, j_2 \in \mathbb{N}_0$, $k, l \in \{1, ..., m\}$, and the limit
\[
\Sigma^{(1)}_{kl} := \lim_{n \to \infty} \int_0^1 D_{f_k,n}(u) D_{f_l,n}(u) \cdot \sum_{j \in \mathbb{Z}} \mathrm{Cov}\big( f_k(\tilde Z_0(u), u), f_l(\tilde Z_j(u), u) \big) \, du
\]
exists for all $k, l \in \{1, ..., m\}$.

• Case $K = 2$: $h_n \to 0$, and the limit
\[
\Sigma^{(2)}_{kl} := \lim_{n \to \infty} \int_0^1 D_{f_k,n}(u) D_{f_l,n}(u) \, du \cdot \sum_{j \in \mathbb{Z}} \mathrm{Cov}\big( f_k(\tilde Z_0(v), v), f_l(\tilde Z_j(v), v) \big)
\]
exists for all $k, l \in \{1, ..., m\}$.

Let $\Sigma^{(K)} = (\Sigma^{(K)}_{kl})_{k,l=1,...,m}$. Then
\[
\frac{1}{\sqrt{n}} \sum_{i=1}^{n} \left\{ \begin{pmatrix} f_1(Z_i, \frac{i}{n}) \\ \vdots \\ f_m(Z_i, \frac{i}{n}) \end{pmatrix} - E \begin{pmatrix} f_1(Z_i, \frac{i}{n}) \\ \vdots \\ f_m(Z_i, \frac{i}{n}) \end{pmatrix} \right\} \xrightarrow{d} N(0, \Sigma^{(K)}).
\]

Proof of Theorem 8.6.
Denote W i ( f ) := f ( Z i , in ) and W i := ( W i ( f ) , ..., W i ( f m )) ′ .Let a = ( a , ..., a m ) ′ ∈ R m \{ } . We use the decomposition1 √ n n X i =1 a ′ ( W i − EW i ) = ∞ X j =0 √ n n X i =1 a ′ P i − j W i . For fixed J ∈ N ∪ {∞} , put( S n ( J )) k =1 ,...,m := S n ( J ) := J − X j =0 √ n n X i =1 P i − j W i . Then, since P i − j W i ( f k ), i = 1 , ..., n is a martingale difference sequence and by Lemma8.9(i), k S n ( ∞ ) k − S n ( J ) k k ≤ ∞ X j = J (cid:13)(cid:13) √ n n X i =1 P i − j W i ( f k ) (cid:13)(cid:13) = ∞ X j = J (cid:16) n n X i =1 k P i − j W i ( f k ) k (cid:17) / ≤ (cid:16) n n X i =1 D f k , ,n ( in ) (cid:17) / · ∞ X j = J ∆( j ) , thuslim sup J,n →∞ k S n ( ∞ ) k − S n ( J ) k k ≤ sup n ∈ N (cid:16) n n X i =1 D f k , ,n ( in ) (cid:17) / · lim sup J →∞ ∞ X j = J ∆( j ) = 0 . (8.82)Define ( S ◦ n ( J ) k ) k =1 ,...,m := S ◦ n ( J ) := 1 √ n n − J +1 X i =1 J − X j =0 P i W i + j . Then we have k S ◦ n ( J ) k − S n ( J ) k k ≤ J − X j =0 k √ n j X i =1 P i − j W i ( f k ) k + 1 √ n J − X j =0 k n X i = n − J + j +1 P i − j W i ( f k ) k ≤ J √ n · sup i =1 ,...,n + j k P i − j W i ( f k ) k ≤ J √ n · sup i =1 ,...,n + j k f k ( Z i , in ) k . mpirical process theory for locally stationary processes i =1 ,...,n + j k f k ( Z i , in ) k ≤ C ∆ , · D ,n ( in ) , which gives lim n →∞ k S ◦ n ( J ) k − S n ( J ) k k = 0 . (8.83) Stationary approximation:
Put ˜ S ◦ n ( J ) = ( ˜ S ◦ n ( J ) k ) k =1 ,...,m , where˜ S ◦ n ( J ) k := 1 √ n n − J +1 X i =1 J − X j =0 P i f k ( ˜ Z i + j ( in ) , in ) . Then we have k S ◦ n ( J ) k − ˜ S ◦ n ( J ) k k ≤ J − X j =0 (cid:16) n n − J +1 X i =1 (cid:13)(cid:13)(cid:13) P i f k ( Z i + j , i + jn ) − P i f k ( ˜ Z i + j ( in ) , in ) (cid:13)(cid:13)(cid:13) (cid:17) / . For each j, k , it holds that1 n n − J +1 X i =1 k P i f k ( Z i + j , i + jn ) − P i f k ( ˜ Z i + j ( in ) , in ) k ≤ n n − J +1 X i =1 (cid:16) D f k ,n ( i + jn ) − D f k ,n ( in ) (cid:17) · sup i k ¯ f ( Z i + j , i + jn ) k + 2 n n − J +1 X i =1 D f,n ( in ) · sup i (cid:13)(cid:13)(cid:13) ¯ f k ( Z i + j , i + jn ) − ¯ f k ( ˜ Z i + j ( in ) , in )] k . By Lemma 8.9, we have sup i k ¯ f ( Z i + j , i + jn ) k < ∞ . Since √ n D f k ,n ( · ) has bounded vari-ation uniformly in n ,1 n n − J +1 X i =1 (cid:16) D f k ,n ( i + jn ) − D f k ,n ( in ) (cid:17) ≤ sup i =1 ,...,n √ n D f k ,n ( in ) · √ n n − J +1 X i =1 (cid:12)(cid:12)(cid:12) D f k ,n ( i + jn ) − D f k ,n ( in ) (cid:12)(cid:12)(cid:12) → . By Lemma 8.9(ii), sup i (cid:13)(cid:13)(cid:13) ¯ f k ( Z i + j , i + jn ) − ¯ f k ( ˜ Z i + j ( in ) , in ) (cid:13)(cid:13)(cid:13) → . We therefore obtain k S ◦ n ( J ) k − ˜ S ◦ n ( J ) k k → . (8.84)6Note that M i,k := 1 √ n J X j =0 P i f k ( ˜ Z i + j ( in ) , in ) , i = 1 , ..., n is a martingale difference sequence with respect to G i − , and˜ S ◦ n ( J ) k = n − J +1 X i =1 M i,k . We can therefore apply a central limit theorem for martingale difference sequences to a ′ ˜ S ◦ n ( J ) = P n − J +1 i =1 ( P mk =1 a k M i,k ). The Lindeberg condition:
Let ς >
0. Iterated application of Lemma 8.7(i) yields thatthere are constants c , c > m, J such that n − J +1 X i =1 E [( m X k =1 a k M i,k ) {| P mk =1 a k M i,k | >ς √ n } ] ≤ c X l =0 , J − X j =0 m X k =1 | a k | · n n − J X i =1 E h E [ f k ( ˜ Z i + j ( in ) , in ) |G i − l ] {| E [ f k ( ˜ Z i + j ( in ) , in ) |G i − l ] | > √ n ςc | a |∞ } i . For each l, j, k , we have1 n n − J X i =1 E h E [ f k ( ˜ Z i + j ( in ) , in ) |G i − l ] {| E [ f k ( ˜ Z i + j ( in ) , in ) |G i − l ] | > √ n ςc | a |∞ } i = 1 n n − J X i =1 D f k ,n ( in ) E h E [ ¯ f k ( ˜ Z i ( in ) , in ) |G i − l ] {| E [ ¯ f k ( ˜ Z i ( in ) , in ) |G i − l ] | > √ n sup i =1 ,...,n | Df,n ( in ) | ςc | a |∞ } i = 1 n n − J X i =1 D f k ,n ( in ) E h ˜ W i ( in ) {| ˜ W i ( in ) | >c n } i , (8.85)where we have put˜ W i ( u ) := E [ ¯ f k ( ˜ Z i ( u ) , u ) |G i − l ] , c n := √ n sup i =1 ,...,n | D f,n ( in ) | ςc | a | ∞ . By Lemma 8.9(ii), ˜ W i ( u ) satisfies the assumptions (8.89) of Lemma 8.8. By assumption, c n → ∞ . With a n ( u ) := D f k ,n ( u ) , we obtain from Lemma 8.8 that (8.85) converges to0, which shows that the Lindeberg condition is satisfied. Convergence of the variance:
We have n − J +1 X i =1 E [( m X k =1 M i,k ) |G i − ]= J − X j ,j =0 m X k ,k =1 a k a l · n n − J +1 X i =1 D f k ,n ( in ) D f l ,n ( in ) · E (cid:2) P i ¯ f k ( ˜ Z i + j ( in ) , in ) · P i ¯ f l ( ˜ Z i + j ( in ) , in ) |G i − (cid:3) . mpirical process theory for locally stationary processes j , j , k , k , we define˜ W i ( u ) := E (cid:2) P i ¯ f k ( ˜ Z i + j ( u ) , u ) · P i ¯ f l ( ˜ Z i + j ( u ) , u ) |G i − (cid:3) , a n ( u ) := D f k ,n ( u ) D f l ,n ( u ) . Then 1 n n − J +1 X i =1 D f k ,n ( in ) D f l ,n ( in ) · E (cid:2) P i ¯ f k ( ˜ Z i + j ( in ) , in ) · P i ¯ f l ( ˜ Z i + j ( in ) , in ) |G i − (cid:3) = 1 n n − J +1 X i =1 a n ( in ) ˜ W i ( in ) . By Lemma 8.9(i),(ii), we have k ˜ W ( u ) − ˜ W ( v ) k ≤ k ¯ f k ( ˜ Z ( u ) , u ) − ¯ f k ( ˜ Z ( v ) , v ) k · k ¯ f l ( ˜ Z ( u )) k + k ¯ f l ( ˜ Z ( u ) , u ) − ¯ f l ( ˜ Z ( v ) , v ) k · k ¯ f k ( ˜ Z ( v )) k ≤ C cont C ¯ f · | u − v | ςs/ Let A n := sup i =1 ,...,n | a n ( in ) | . Since D f,n ( · ) D ∞ f,n has bounded variation uniformly in n , itfollows that a n ( · ) A n has bounded variation uniformly in n . From D ∞ f,n √ n → A n n → n h n n X i =1 | a n ( in ) | i ≤ sup n (cid:16) n n X i =1 D f k ,n ( in ) (cid:17) / · (cid:16) n n X i =1 D f l ,n ( in ) (cid:17) / < ∞ . It holds that sup n ( h n · A n ) ≤ sup n ( h / n D ∞ f k ,n ) · sup n ( h / n D ∞ f l ,n ) < ∞ , and | v − u | > h n ⇒ D f k ,n ( u ) = 0 , D f l ,n ( u ) = 0 , ⇒ a n ( u ) = 0 . Thus, Lemma 8.8(ii) is applicable.Case K = 1: If u E [ P ¯ f k ( ˜ Z j ( u ) , u ) · P ¯ f l ( ˜ Z j ( u ) , u )] has bounded variation, we have1 n n − J +1 X i =1 D f k ,n ( in ) D f l ,n ( in ) · E (cid:2) P i ¯ f k ( ˜ Z i + j ( in ) , in ) · P i ¯ f l ( ˜ Z i + j ( in ) , in ) |G i − (cid:3) p → lim n →∞ Z D f k ,n ( u ) D f l ,n ( u ) · E [ P ¯ f k ( ˜ Z j ( u ) , u ) · P ¯ f l ( ˜ Z j ( u ) , u )] du. 
and thus
\[
\sum_{i=1}^{n-J+1} E\Big[ \Big( \sum_{k=1}^{m} a_k M_{i,k} \Big)^2 \Big| \mathcal{G}_{i-1} \Big] \xrightarrow{p} \sum_{k,l=1}^{m} a_k a_l \cdot \lim_{n \to \infty} \int_0^1 D_{f_k,n}(u) D_{f_l,n}(u) \cdot \sum_{j_1, j_2 = 0}^{J-1} E\big[ P_0 \bar f_k(\tilde Z_{j_1}(u), u) \cdot P_0 \bar f_l(\tilde Z_{j_2}(u), u) \big] \, du = a' \Sigma^{(1)}_{kl}(J) a.
\]
For $f, g \in \mathcal{F}$, the expectation $E[ P_0 \bar f(\tilde Z_{j_1}(u), u) \cdot P_0 \bar g(\tilde Z_{j_2}(u), u) ]$ can be written as
\[
E\big[ P_0 \bar f(\tilde Z_{j_1}(u), u) \cdot P_0 \bar g(\tilde Z_{j_2}(u), u) \big] = E\big[ E[\bar f(\tilde Z_{j_1}(u), u) | \mathcal{G}_0] \cdot E[\bar g(\tilde Z_{j_2}(u), u) | \mathcal{G}_0] \big] - E\big[ E[\bar f(\tilde Z_{j_1}(u), u) | \mathcal{G}_{-1}] \cdot E[\bar g(\tilde Z_{j_2}(u), u) | \mathcal{G}_{-1}] \big],
\]
which shows that the condition stated in the assumption guarantees the bounded variation of $u \mapsto E[ P_0 \bar f(\tilde Z_{j_1}(u), u) \cdot P_0 \bar g(\tilde Z_{j_2}(u), u) ]$.

Case $K = 2$: If $h_n \to 0$, then we obtain similarly
\[
\sum_{i=1}^{n-J+1} E\Big[ \Big( \sum_{k=1}^{m} a_k M_{i,k} \Big)^2 \Big| \mathcal{G}_{i-1} \Big] \xrightarrow{p} \sum_{k,l=1}^{m} a_k a_l \cdot \lim_{n \to \infty} \int_0^1 D_{f_k,n}(u) D_{f_l,n}(u) \, du \cdot \sum_{j_1, j_2 = 0}^{J-1} E\big[ P_0 \bar f_k(\tilde Z_{j_1}(v), v) \cdot P_0 \bar f_l(\tilde Z_{j_2}(v), v) \big] = a' \Sigma^{(2)}_{kl}(J) a.
\]
By the martingale central limit theorem and (8.83), (8.84), we obtain that
\[
a' S_n(J) \xrightarrow{d} N\big(0, a' \Sigma^{(K)}_{kl}(J) a\big). \tag{8.86}
\]

Conclusion: For $K \in \{1, 2\}$, we have
\[
a' \Sigma^{(K)}_{kl}(J) a \to a' \Sigma^{(K)}_{kl}(\infty) a \quad (J \to \infty) \tag{8.87}
\]
due to
\[
\sum_{j_1, j_2 : \max\{j_1, j_2\} \ge J} \big\| P_0 \bar f_k(\tilde Z_{j_1}(u), u) \cdot P_0 \bar f_l(\tilde Z_{j_2}(u), u) \big\|_1 \le \sum_{j_1, j_2 : \max\{j_1, j_2\} \ge J} \big\| P_0 \bar f_k(\tilde Z_{j_1}(u), u) \big\|_2 \big\| P_0 \bar f_l(\tilde Z_{j_2}(u), u) \big\|_2 \to 0 \quad (J \to \infty)
\]
uniformly in $n$ and
\[
\sup_n \int_0^1 | D_{f_k,n}(u) D_{f_l,n}(u) | \, du \le \sup_n \Big( \int_0^1 D_{f_k,n}(u)^2 \, du \Big)^{1/2} \Big( \int_0^1 D_{f_l,n}(u)^2 \, du \Big)^{1/2} < \infty.
\]
By (8.82), (8.86), (8.87), the identity
\[
\sum_{j \in \mathbb{Z}} \mathrm{Cov}\big( \bar f_k(\tilde Z_0(u), u), \bar f_l(\tilde Z_j(u), u) \big) = \sum_{j_1, j_2 = 0}^{\infty} E\big[ P_0 \bar f_k(\tilde Z_{j_1}(u), u) \cdot P_0 \bar f_l(\tilde Z_{j_2}(u), u) \big]
\]
and the Cramer-Wold device, the assertion of the theorem follows.

Lemma 8.7.
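Let $c \in \mathbb{R}$, $c > 0$.

(i) For $x, y \in \mathbb{R}$, it holds that
\[
(x + y)^2 \mathbb{1}_{\{|x + y| > c\}} \le 8 x^2 \mathbb{1}_{\{|x| > \frac{c}{2}\}} + 8 y^2 \mathbb{1}_{\{|y| > \frac{c}{2}\}}.
\]
(ii) For random variables $W, \tilde W$, it holds that
\[
E[W^2 \mathbb{1}_{\{|W| > c\}}] \le 4 E[(W - \tilde W)^2] + 4 E[\tilde W^2 \mathbb{1}_{\{|\tilde W| > \frac{c}{2}\}}].
\]

Proof of Lemma 8.7. (i) It holds that
\begin{align*}
(x + y)^2 \mathbb{1}_{\{|x + y| > c\}} &\le 2 \big[ x^2 + y^2 \big] \mathbb{1}_{\{|x| > \frac{c}{2} \text{ or } |y| > \frac{c}{2}\}} \\
&\le 2 \big[ x^2 + y^2 \big] \big\{ \mathbb{1}_{\{|x| > \frac{c}{2}, |y| > \frac{c}{2}\}} + \mathbb{1}_{\{|x| > \frac{c}{2}, |y| \le \frac{c}{2}\}} + \mathbb{1}_{\{|x| \le \frac{c}{2}, |y| > \frac{c}{2}\}} \big\} \\
&\le 2 \big[ x^2 \mathbb{1}_{\{|x| > \frac{c}{2}\}} + y^2 \mathbb{1}_{\{|y| > \frac{c}{2}\}} \big] + 4 x^2 \mathbb{1}_{\{|x| > \frac{c}{2}\}} + 4 y^2 \mathbb{1}_{\{|y| > \frac{c}{2}\}} \\
&\le 8 x^2 \mathbb{1}_{\{|x| > \frac{c}{2}\}} + 8 y^2 \mathbb{1}_{\{|y| > \frac{c}{2}\}}.
\end{align*}
(ii) We have
\begin{align*}
E[W^2 \mathbb{1}_{\{|W| > c\}}] &\le 2 E[(W - \tilde W)^2 \mathbb{1}_{\{|W| > c\}}] + 2 E[\tilde W^2 \mathbb{1}_{\{|W| > c\}}] \\
&\le 2 E[(W - \tilde W)^2] + 2 E[\tilde W^2 \mathbb{1}_{\{|W - \tilde W| + |\tilde W| > c\}}]. \tag{8.88}
\end{align*}
Furthermore, with Markov's inequality,
\begin{align*}
E[\tilde W^2 \mathbb{1}_{\{|W - \tilde W| + |\tilde W| > c\}}] &\le E[\tilde W^2 \mathbb{1}_{\{|W - \tilde W| > \frac{c}{2}\}}] + E[\tilde W^2 \mathbb{1}_{\{|\tilde W| > \frac{c}{2}\}}] \\
&\le \big(\tfrac{c}{2}\big)^2 P\big(|W - \tilde W| > \tfrac{c}{2}\big) + E[\tilde W^2 \mathbb{1}_{\{|W - \tilde W| > \frac{c}{2}\}} \mathbb{1}_{\{|\tilde W| > \frac{c}{2}\}}] + E[\tilde W^2 \mathbb{1}_{\{|\tilde W| > \frac{c}{2}\}}] \\
&\le E[(W - \tilde W)^2] + 2 E[\tilde W^2 \mathbb{1}_{\{|\tilde W| > \frac{c}{2}\}}].
\end{align*}
Inserting this inequality into (8.88), we obtain the assertion.

The following lemma generalizes some results from [4] using similar techniques as therein.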
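As a numerical sanity check of Lemma 8.7(i), the sketch below evaluates both sides of $(x+y)^2 \mathbb{1}_{\{|x+y|>c\}} \le 8x^2 \mathbb{1}_{\{|x|>c/2\}} + 8y^2 \mathbb{1}_{\{|y|>c/2\}}$ on random inputs. The constants 8 and the thresholds $c/2$ are read off from the proof of the lemma; the helper names and the sampling ranges are hypothetical.

```python
import random


def truncated_square(x, c):
    """Return x^2 * 1{|x| > c}."""
    return x * x if abs(x) > c else 0.0


def lemma_87_i_gap(x, y, c):
    """Right-hand side minus left-hand side of the inequality in Lemma 8.7(i)."""
    lhs = truncated_square(x + y, c)
    rhs = 8.0 * truncated_square(x, c / 2.0) + 8.0 * truncated_square(y, c / 2.0)
    return rhs - lhs


random.seed(1)
# Smallest observed gap over random triples (x, y, c); it should never be negative.
worst_gap = min(
    lemma_87_i_gap(
        random.uniform(-5.0, 5.0),
        random.uniform(-5.0, 5.0),
        random.uniform(0.01, 5.0),
    )
    for _ in range(100_000)
)
print("smallest rhs - lhs over the sample:", worst_gap)
```

A random search of this kind cannot prove the inequality, but a negative gap would immediately reveal a wrongly reconstructed constant or threshold.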
Lemma 8.8.
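Let $q \in \{1, 2\}$. Let $\tilde W_i(u)$ be a stationary sequence with
\[
\sup_{u \in [0,1]} \| \tilde W_0(u) \|_q < \infty, \qquad \| \tilde W_0(u) - \tilde W_0(v) \|_q \le C_W |u - v|^{\varsigma}. \tag{8.89}
\]
Let $a_n : [0,1] \to \mathbb{R}$ be some sequence of functions with $\limsup_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} |a_n(\frac{i}{n})| < \infty$.

(i) Let $q = 2$. Let $c_n$ be some sequence with $c_n \to \infty$. Then
\[
\frac{1}{n} \sum_{i=1}^{n} \big| a_n\big(\tfrac{i}{n}\big) \big| \cdot E\big[ \tilde W_i\big(\tfrac{i}{n}\big)^2 \mathbb{1}_{\{|\tilde W_i(\frac{i}{n})| > c_n\}} \big] \to 0.
\]
(ii) Let $q = 1$. Suppose that there exist $h_n > 0$, $v \in [0,1]$ such that for all $u \in [0,1]$, $|v - u| > h_n$ implies $a_n(u) = 0$. Put $A_n = \sup_{i=1,...,n} |a_n(\frac{i}{n})|$ and suppose that
\[
\sup_{n \in \mathbb{N}} (h_n \cdot A_n) < \infty, \qquad \frac{A_n}{n} \to 0, \qquad \frac{a_n(\cdot)}{A_n} \text{ has bounded variation uniformly in } n.
\]
Suppose that the limits on the following right hand sides exist. If $u \mapsto E \tilde W_0(u)$ has bounded variation, then
\[
\frac{1}{n} \sum_{i=1}^{n} a_n\big(\tfrac{i}{n}\big) \tilde W_i\big(\tfrac{i}{n}\big) \xrightarrow{p} \lim_{n \to \infty} \int_0^1 a_n(u) E \tilde W_0(u) \, du.
\]
If $h_n \to 0$, then
\[
\frac{1}{n} \sum_{i=1}^{n} a_n\big(\tfrac{i}{n}\big) \tilde W_i\big(\tfrac{i}{n}\big) \xrightarrow{p} \lim_{n \to \infty} \int_0^1 a_n(u) \, du \cdot E \tilde W_0(v).
\]

Proof of Lemma 8.8.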
Let J ∈ N be fixed and assume that n ≥ · J . For j ∈{ , ..., J } , Define I j,J,n := { i ∈ { , ..., n } : in ∈ ( j − J , j J ] } . Then ( I j,J,n ) j forms a decom-position of { , ..., n } in the sense that P J j =1 I j,J,n = { , ..., n } . Since in ∈ ( j − J , j J ] ⇐⇒ j − J · n < i ≤ n · j − J ≤ n J , we conclude that n J − ≤ | I j,J,n | ≤ n J . Thus, since n ≥ · J , (cid:12)(cid:12)(cid:12) I j,J,n | n − J (cid:12)(cid:12)(cid:12) ≤ n , | I j,J,n | ≥ n J . (8.90)Let w i , i ∈ N be an arbitrary sequence. Then it holds that (cid:12)(cid:12)(cid:12) n n X i =1 w i − J J X j =1 | I j,J,n | X i ∈ I j,J,n w i (cid:12)(cid:12)(cid:12) ≤ J X j =1 (cid:12)(cid:12)(cid:12) | I j,J,n | n − J (cid:12)(cid:12)(cid:12) · (cid:12)(cid:12)(cid:12) | I j,J,n | X i ∈ I j,J,n w i (cid:12)(cid:12)(cid:12) ≤ n J X j =1 | I j,J,n | X i ∈ I j,J,n | w i |≤ J n n X i =1 | w i | (8.91) mpirical process theory for locally stationary processes w i = a n ( in ) E [ ˜ W i ( in ) {| ˜ W i ( in ) | >c n } ] yields1 n n X i =1 E [ ˜ W i ( in ) {| ˜ W i ( in ) | >c n } ] ≤ J J X j =1 | I j,J,n | X i ∈ I j,J,n E [ ˜ W i ( in ) {| ˜ W i ( in ) | >c n } ] + 2 J n · n n X i =1 a n ( in ) · sup u k ˜ W ( u ) k . (8.92)By Lemma 8.7(ii),12 J J X j =1 | I j,J,n | X i ∈ I j,J,n | a n ( in ) | · E [ ˜ W i ( in ) {| ˜ W i ( in ) | >c n } ] ≤ J J X j =1 | I j,J,n | X i ∈ I j,J,n | a n ( in ) | · E [ ˜ W ( j J ) {| ˜ W ( j J ) | >c n } ]+ 12 J J X j =1 | I j,J,n | X i ∈ I j,J,n | a n ( in ) | · (cid:13)(cid:13) ˜ W ( in ) − ˜ W ( j J ) (cid:13)(cid:13) ≤ h sup j =1 ,..., J E [ ˜ W ( j J ) {| ˜ W ( j J ) | >c n } ] + C W (2 − J ) ς i · J J X j =1 | I j,J,n | X i ∈ I j,J,n | a n ( in ) | . (8.93)By (8.90), 12 J J X j =1 | I j,J,n | X i ∈ I j,J,n | a n ( in ) | ≤ n n X i =1 | a n ( in ) | . By the dominated convergence theorem,lim sup n →∞ E [ ˜ W ( j J ) {| ˜ W ( j J ) | >c n } ] . Furthermore, lim sup n →∞ J n · sup u k ˜ W ( u ) k = 0. 
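The dyadic blocks $I_{j,J,n} = \{ i \in \{1, ..., n\} : \frac{i}{n} \in (\frac{j-1}{2^J}, \frac{j}{2^J}] \}$, $j = 1, ..., 2^J$, used in the first part of this proof can be checked directly. The sketch below builds them with exact integer arithmetic and prints their sizes, each of which differs from $n / 2^J$ by at most one; the function name `dyadic_blocks` and the chosen values of $n$ and $J$ are hypothetical.

```python
def dyadic_blocks(n, J):
    """Return the blocks I_{j,J,n} for j = 1, ..., 2^J as lists of indices.

    Index i belongs to block j iff i/n lies in ((j-1)/2^J, j/2^J],
    which is equivalent to j = ceil(i * 2^J / n).
    """
    size = 2 ** J
    blocks = [[] for _ in range(size)]
    for i in range(1, n + 1):
        j = -((-i * size) // n)  # integer ceiling of i * size / n
        blocks[j - 1].append(i)
    return blocks


n, J = 1003, 3
blocks = dyadic_blocks(n, J)
sizes = [len(b) for b in blocks]
print("block sizes:", sizes)
print("n / 2^J    :", n / 2 ** J)
```

Working with the integer ceiling avoids floating-point rounding at the block boundaries $\frac{i}{n} = \frac{j}{2^J}$, which is where a naive comparison of floats could misplace an index.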
Inserting (8.93) into (8.92) andapplying lim sup n →∞ and afterwards, lim sup J →∞ , yields the assertion.(ii) Since (8.89) also holds for ˜ W ( u ) replaced by ˜ W ( u ) − E ˜ W ( u ), we may assume inthe following that w.l.o.g. that E ˜ W ( u ) = 0.By (8.91) applied to w i = a ( in ) W i ( in ), we obtain (cid:13)(cid:13)(cid:13) n n X i =1 a n ( in ) ˜ W i ( in ) − J J X j =1 | I j,J,n | X i ∈ I j,J,n a n ( in ) ˜ W i ( in ) (cid:13)(cid:13)(cid:13) ≤ J n · n n X i =1 | a n ( in ) | · sup u k W ( u ) k → n → ∞ ) . (8.94)2 We furthermore have (cid:13)(cid:13)(cid:13) J J X j =1 | I j,J,n | X i ∈ I j,J,n a n ( in ) ˜ W i ( in ) − J J X j =1 | I j,J,n | X i ∈ I j,J,n a n ( in ) ˜ W i ( j − J ) (cid:13)(cid:13)(cid:13) ≤ J J X j =1 | I j,J,n | X i ∈ I j,J,n | a n ( in ) | · (cid:13)(cid:13) ˜ W ( in ) − ˜ W ( j − J ) (cid:13)(cid:13) ≤ n n X i =1 | a n ( in ) | · C W (2 − J ) ς . (8.95)Fix j ∈ { , ..., J } . Put u j := j − J and, for a real-valued positive x , define [ x ] :=max { k ∈ N : k > x } . By stationarity, the following equality holds in distribution:1 | I j,J,n | X i ∈ I j,J,n a n ( in ) ˜ W i ( u j ) d = 1 | I j,J,n | | I j,J,n | X i =1 a n ( in + [ u j n ] − n ) ˜ W i ( u j ) . (8.96)Put ˜ W i ( u ) ◦ := ˜ W i ( u ) { in + [ ujn ] − n ∈ [ r n ,r n ] } . By partial summation and since a n ( · ) A n has bounded variation B a uniformly in n ,1 | I j,J,n | | I j,J,n | X i =1 a n ( in + [ u j n ] −
1) ˜ W i ( u j )= 1 | I j,J,n | | I j,J,n |− X i =1 (cid:8) a n ( in + [ u j n ] − − a n ( i + 1 n + [ u j n ] − (cid:9) i X l =1 ˜ W l ( u j ) ◦ + 1 | I j,J,n | A n · | I j,J,n | X l =1 ˜ W l ( u j ) ◦ ≤ B a + 1 | I j,J,n | A n · sup i =1 ,..., | I j,J,n | (cid:12)(cid:12)(cid:12) i X l =1 ˜ W l ( u j ) ◦ (cid:12)(cid:12)(cid:12) (8.97)By stationarity, we havesup i =1 ,..., | I j,J,n | (cid:12)(cid:12)(cid:12) i X l =1 ˜ W l ( u j ) ◦ (cid:12)(cid:12)(cid:12) = sup i =1 ,..., | I j,J,n | (cid:12)(cid:12)(cid:12) i ∧ ( ⌊ n ( v − h n ) ⌋− [ u j n ]+1) X l =1 ∨ ( ⌈ n ( v + h n ) ⌉− [ u j n ]+1) ˜ W l ( u j ) (cid:12)(cid:12)(cid:12) d = sup i =1 ,...,m n (cid:12)(cid:12)(cid:12) i X l =1 ˜ W l ( u j ) (cid:12)(cid:12)(cid:12) , mpirical process theory for locally stationary processes | I j,J,n |∧ ( ⌊ n ( v + h n ) ⌋− [ u j n ]+1)) − (1 ∨ ( ⌈ n ( v − h n ) ⌉− [ u j n ]+1)) ≤ m n := 2 nh n .By assumption, m n = nA n · A n h n → ∞ .By the ergodic theorem, lim m →∞ (cid:12)(cid:12)(cid:12) m m X l =1 ˜ W l ( u j ) (cid:12)(cid:12)(cid:12) = 0 a.s. and especially ( m P ml =1 ˜ W l ( u j )) m is bounded a.s. We conclude that1 m n sup i =1 ,...,m n (cid:12)(cid:12)(cid:12) i X l =1 ˜ W l ( u j ) (cid:12)(cid:12)(cid:12) ≤ √ m n sup i =1 ,..., √ m n (cid:12)(cid:12)(cid:12) i i X l =1 ˜ W l ( u j ) (cid:12)(cid:12)(cid:12) + sup i = √ m n +1 ,...,m n (cid:12)(cid:12)(cid:12) i i X l =1 ˜ W l ( u j ) (cid:12)(cid:12)(cid:12) → . We conclude from (8.97) that1 | I j,J,n | | I j,J,n | X i =1 a n ( in + [ u j n ] −
1) ˜ W i ( u j ) ≤ · J ( B a + 1) · A n · m n n · m n sup i =1 ,..., | I j,J,n | (cid:12)(cid:12)(cid:12) i X l =1 ˜ W l ( u j ) ◦ (cid:12)(cid:12)(cid:12) → . (8.98)Combination of (8.94), (8.95), (8.96) and (8.98) and applying lim sup n →∞ andafterwards lim sup J →∞ , we obtain1 n n X i =1 a n ( in ) (cid:8) ˜ W i ( in ) − E ˜ W ( in ) (cid:9) p → . If u E ˜ W ( u ) has bounded variation, we have with some intermediate value ξ i,n ∈ [ i − n , in ], (cid:12)(cid:12)(cid:12) n n X i =1 a n ( in ) E ˜ W ( in ) − Z a n ( u ) E ˜ W ( u ) du (cid:12)(cid:12)(cid:12) ≤ n n X i =1 (cid:12)(cid:12) a n ( in ) E ˜ W ( in ) − a n ( ξ i,n ) E ˜ W ( ξ i,n ) (cid:12)(cid:12) ≤ A n n · A n n X i =1 | a n ( in ) − a n ( ξ i,n ) | · sup u k ˜ W ( u ) k + A n n n X i =1 (cid:12)(cid:12) E ˜ W ( in ) − E ˜ W ( ξ i,n ) (cid:12)(cid:12) → . h n →
0, we have, with some intermediate value $\xi_{i,n} \in [\frac{i-1}{n}, \frac{i}{n}]$,
\[
\Big| \frac{1}{n} \sum_{i=1}^{n} a_n\big(\tfrac{i}{n}\big) E \tilde W_0\big(\tfrac{i}{n}\big) - \frac{1}{n} \sum_{i=1}^{n} a_n\big(\tfrac{i}{n}\big) E \tilde W_0(v) \Big| \le \frac{1}{n} \sum_{i=1}^{n} \big| a_n\big(\tfrac{i}{n}\big) \big| \cdot \sup_{|u - v| \le h_n} \| \tilde W_0(u) - \tilde W_0(v) \|_1 \to 0.
\]
Since $\frac{a_n(\cdot)}{A_n}$ has bounded variation uniformly in $n$,
\[
\Big| \frac{1}{n} \sum_{i=1}^{n} a_n\big(\tfrac{i}{n}\big) - \int_0^1 a_n(u) \, du \Big| \le \frac{A_n}{n} \cdot \frac{1}{A_n} \sum_{i=1}^{n} \big| a_n\big(\tfrac{i}{n}\big) - a_n(\xi_{i,n}) \big| \to 0.
\]

Lemma 8.9.
Let F satisfy Assumptions 3.9, 3.10. Suppose that either Assumption 2.5or Assumptions 2.8, 4.9 hold. Then there exist constants C cont > , C ¯ f > such that forany f ∈ F ,(i) for any j ≥ , k P i − j f ( Z i , u ) k ≤ D f,n ( u )∆( j ) , sup i =1 ,...,n k f ( Z i , u ) k ≤ C ∆ · D f,n ( u ) , sup i,u k ¯ f ( Z i , u ) k ≤ C ¯ f , sup v,u k ¯ f ( ˜ Z ( v ) , u ) k ≤ C ¯ f . (ii) with x = , k ¯ f ( Z i , u ) − ¯ f ( ˜ Z i ( in ) , u ) k ≤ C cont · n − ςsx , (8.99) k ¯ f ( ˜ Z i ( v ) , u ) − ¯ f ( ˜ Z i ( v ) , u ) k ≤ C cont · (cid:0) | v − v | ςsx + | u − u | ςs (cid:1) . (8.100) In the case that Assumption 2.5 is fulfilled, we can choose x = 1 . Proof of Lemma 8.9. (i) If Assumption 2.5 is satisfied, we have by Lemma 2.7 that k P i − j f ( Z i , u ) k ≤ k f ( Z i , u ) − f ( Z ∗ ( i − j ) i , u ) k = δ f ( Z,u )2 ( j ) ≤ D f,n ( u )∆( j ) . If Assumption 2.8 is satisfied, we have by Lemma 8.1 that k P i − j f ( Z i , u ) k = k P i − j E [ f ( Z i , u ) |G i − ] k ≤ k E [ f ( Z i , u ) |G i − ] − E [ f ( Z i , u ) |G i − ] ∗ ( i − j ) k ≤ D f,n ( u )∆( j ) . The second assertion follows from Lemma 2.7 or Lemma 8.1 depending on if As-sumption 2.5 or 2.8 is satisfied. mpirical process theory for locally stationary processes C R := sup v,u k ¯ R ( ˜ Z ( v ) , u ) k and C R := max { sup i,u k R ( Z i , u ) k , sup u,v k R ( ˜ Z ( v ) , u ) k } .We first use Assumption 3.10 and H¨older’s inequality to obtain k ¯ f ( ˜ Z i ( v ) , u ) − ¯ f ( ˜ Z i ( v ) , u ) k (8.101) ≤ | u − u | ς · (cid:0) k ¯ R ( ˜ Z i ( v ) , u ) k + k R ( ˜ Z i ( v ) , u ) k (cid:1) ≤ C R | u − u | ς . (8.102)Let Assumption 2.5 hold. Then k ¯ f ( Z i , u ) − ¯ f ( ˜ Z i ( v ) , u ) k ≤ k| Z i − ˜ Z i ( v ) | sL F ,s ( R ( Z i , u ) + R ( ˜ Z i ( v ) , u ) k ≤ k| Z i − ˜ Z i ( v ) | sL F ,s k pp − (cid:0) k R ( Z i , u ) k p + k R ( ˜ Z i ( v ) , u ) k p (cid:1) ≤ C R k| Z i − ˜ Z i ( v ) | sL F ,s k pp − . 
Furthermore, k| Z i − ˜ Z i ( v ) | sL F ,s k p ¯ p − ≤ ∞ X l =0 L F ,l k| X i − l − ˜ X i − l ( v ) | s k pp − = i X l =0 L F ,l k X i − l − ˜ X i − l ( v ) k s psp − ≤ i X l =0 L F ,l C sX (cid:0) | v − in | ς + l ς n − ς (cid:1) s ≤ | v − in | ς · C X | L F | + n − ς · C X ∞ X l =0 L F ,l l ςs (cid:9) . We obtain with C cont := 2 ¯ C R + 2 C R C X (cid:8) | L F | + P ∞ j =0 L F ,j j ςs (cid:9) that k ¯ f ( Z i , u ) − ¯ f ( ˜ Z i ( v ) , u ) k ≤ C cont · h | v − in | ςs + n − ςs i . (8.103)Furthermore, as above, k f ( ˜ Z i ( v ) , u ) − f ( ˜ Z i ( v ) , u ) k ≤ C R k| ˜ Z ( v ) − ˜ Z ( v ) | sL F ,s k pp − ≤ C R i X l =0 L F ,l k ˜ X ( v ) − ˜ X ( v ) k s psp − ≤ C R C X | L F | · | v − v | ςs (8.104)From (8.103), we obtain (8.99) with v = in . From (8.102) and (8.104), we conclude(8.100).Now let Assumption 4.9 hold. Assume w.l.o.g. thatsup u,v c s E h sup | a | L F ,s ≤ c (cid:12)(cid:12) ¯ f ( ˜ Z ( v ) , u ) − ¯ f ( ˜ Z ( v ) + a, u ) (cid:12)(cid:12) i ≤ C R . c n > C ¯ f := max { sup i,u k f ( Z i , u ) k p , sup u,v k f ( ˜ Z ( v ) , u ) k p } .Then we have by Jensen’s inequality, (cid:13)(cid:13) ¯ f ( Z i , u ) − ¯ f ( ˜ Z i ( v ) , u ) (cid:13)(cid:13) ≤ E h(cid:12)(cid:12) ¯ f ( Z i , u ) − ¯ f ( ˜ Z i ( v ) , u ) (cid:12)(cid:12) {| Z i − ˜ Z i ( v ) | L F ,s ≤ c n } i / + E h ( ¯ f ( Z i , u ) − ¯ f ( ˜ Z i ( v ) , u ) {| Z i − ˜ Z i ( v ) | L F ,s >c n } i / ≤ E h sup | a | L F ,s ≤ c n (cid:12)(cid:12) ¯ f ( ˜ Z i ( v ) , u ) − ¯ f ( ˜ Z i ( v ) + a, u ) (cid:12)(cid:12) i / + (cid:8)(cid:13)(cid:13) ¯ f ( Z i , u ) (cid:13)(cid:13) p + ¯ f ( ˜ Z i ( v ) , u ) (cid:13)(cid:13) p (cid:9) P ( | Z i − ˜ Z i ( v ) | L F ,s > c n ) ¯ p − p ≤ C R c sn + 2 C ¯ f (cid:16) k| Z i − ˜ Z i ( v ) | L F ,s k ps ¯ p − c n (cid:17) s ≤ C R c sn + 2 C ¯ f C X ( | L F | + ∞ X j =0 L F ,j j ςs ) · {| v − in | ςs + n − ςs } c sn . 
We obtain with c cont := C R + 2 C ¯ f C X ( | L F | + P ∞ j =0 L F ,j j ςs ) that k ¯ f ( Z i , u ) − ¯ f ( ˜ Z i ( v ) , u ) k ≤ c cont · h c sn + | v − in | ςs + n − ςs c sn i . (8.105)Furthermore, as above, for any c > k f ( ˜ Z i ( v ) , u ) − f ( ˜ Z i ( v ) , u ) k ≤ C R c s + 2 C ¯ f (cid:16) k| ˜ Z ( v ) − ˜ Z ( v ) | sL F ,s k p ¯ p − c (cid:17) s ≤ C R c s + 2 C ¯ f C X | L F | · | v − v | ςs c s . (8.106)From (8.105), (8.106) and (8.102), we obtain the assertion again with v = in . Proof of Theorem 5.1.
We show the result more generally for G Wn ( f ) = √ n S Wn ( f ).The statement of the theorem is obtained for W i ( f ) = f ( Z i , in ).Let V ◦ ( f ) = k f k ,n + P ∞ j =1 min {k f k ,n , D n ∆( j ) } ϕ ( j ) / , where ϕ ( j ) = log log( e e j ). V ◦ ( f ) serves as a lower bound for ˜ V ( f ).For q ∈ { , ..., n } , we use decomposition (3.1) without the maximum. The set B n ( q ) is mpirical process theory for locally stationary processes P (cid:0)(cid:12)(cid:12) √ n S Wn ( f ) (cid:12)(cid:12) > x, B n ( q ) (cid:1) ≤ P (cid:16) √ n (cid:12)(cid:12) S Wn ( f ) − S Wn,q ( f ) (cid:12)(cid:12) > x/ , B n ( q ) (cid:17) + P (cid:16) L X l =1 (cid:12)(cid:12)(cid:12) q nτ l ⌊ nτl ⌋ +1 X i =1 i even √ τ l T i,l ( f ) (cid:12)(cid:12)(cid:12) > x/ (cid:17) + P (cid:16) L X l =1 (cid:12)(cid:12)(cid:12) q nτ l ⌊ nτl ⌋ +1 X i =1 i odd √ τ l T i,l ( f ) (cid:12)(cid:12)(cid:12) > x/ (cid:17) + P (cid:16) √ n (cid:12)(cid:12) S Wn, ( f ) (cid:12)(cid:12) > x/ (cid:17) =: A + A + A . Define for l ∈ N , g ( l ) = p log( l + 1) + 1 , g ( l ) = log( l + 1) + 1 , a ( l ) = l / log( el ) / ϕ ( l ) . and for j ∈ N , γ ( j ) = log ( j ) + 1. By elementary calculations, we see that there exists auniversal constant c ≥ L X l =1 τ l g ( l ) ≤ L X l =1 τ l X j = τ l − +1 g ( l ) ≤ q X j =1 L X l =1 { τ l − +1 ≤ j ≤ τ l } g ( l ) ≤ q X j =1 g ( γ ( j )) ≤ q · g ( γ ( q )) ≤ q ) . The third to last inequality is due to 2 l − +1 = τ l − +1 ≤ j ⇐⇒ l ≤ log ( j − ≤ γ ( j )and the monotonicity of g . 
In a similar fashion, L X l =1 g ( l ) τ l X j = τ l − +1 min {k f k ,n , D n ∆( ⌊ j ⌋ ) }≤ q X j =1 min {k f k ,n , D n ∆( ⌊ j ⌋ ) } · L X l =1 { τ l − +1 ≤ j ≤ τ l g ( l ) ≤ q X j =1 min {k f k ,n , D n ∆( ⌊ j ⌋ ) } g ( γ ( j )) ≤ V ◦ ( f )by g ( γ ( j )) ≤ ϕ ( j ) / and Lemma 8.2.8Therefore, x x x x V ◦ ( f ) L X l =1 g ( l ) τ l X j = τ l − +1 min {k f k ,n , D n ∆( ⌊ j ⌋ ) } + x q ) L X l =1 τ l g ( l )= L X l =1 y ( l ) + L X l =1 y ( l ) , where y ( l ) := x V ◦ ( f ) g ( l ) P τ l j = τ l − +1 min {k f k ,n , D n ∆( ⌊ j ⌋ ) } , y ( l ) = x q ) τ l g ( l ).We now use a standard Bernstein inequality for independent random variables: If M Q , σ Q > Q i , i = 1 , ..., m mean-zero independent variables with | Q i | ≤ M Q ,( m P mi =1 k Q i k ) / ≤ σ Q , then for any z > P (cid:16) √ m (cid:12)(cid:12)(cid:12) m X i =1 (cid:2) Q i − E Q i (cid:3)(cid:12)(cid:12)(cid:12) > z (cid:17) ≤ · exp (cid:16) − z σ Q + M Q z √ m (cid:17) . (8.107)Using the bound (8.15), √ τ l | T i,l ( f ) | ≤ √ τ l k f k ∞ ≤ √ τ l M and the elementary inequal-ity min { ab , ac } ≤ ab + c ≤ min { ab , ac } we obtain P (cid:16) L X l =1 (cid:12)(cid:12)(cid:12) q nτ l ⌊ nτl ⌋ +1 X i =1 i even √ τ l T i,l ( f ) (cid:12)(cid:12)(cid:12) > x (cid:17) ≤ L X l =1 P (cid:16)(cid:12)(cid:12)(cid:12) q nτ l ⌊ nτl ⌋ +1 X i =1 i even √ τ l T i,l ( f ) (cid:12)(cid:12)(cid:12) > y ( l ) + y ( l ) (cid:17) ≤ L X l =1 exp (cid:16) −
14 min n ( y ( l ) + y ( l )) (cid:16) P τ l j = τ l − +1 min {k f k ,n , D n ∆( ⌊ j ⌋ ) } (cid:17) , y ( l ) + y ( l ) √ τ l M √ nτl o(cid:17) ≤ L X l =1 exp (cid:16) −
14 min n y ( l ) (cid:16) P τ l j = τ l − +1 min {k f k ,n , D n ∆( ⌊ j ⌋ ) } (cid:17) , y ( l ) √ τ l M √ nτl o(cid:17) = 2 L X l =1 exp (cid:16) −
14 min n x g ( l ) V ◦ ( f ) , xg ( l ) M Φ( q ) √ n o(cid:17) ≤ L X l =1 exp (cid:16) − x g ( l ) V ◦ ( f ) (cid:17) + 2 L X l =1 exp (cid:16) − xg ( l ) M Φ( q ) √ n (cid:17) . (8.108) mpirical process theory for locally stationary processes x > √ · V ◦ ( f ), (cid:16) L X l =1 exp (cid:16) − x g ( l ) V ◦ ( f ) (cid:17)(cid:17) exp (cid:16) x V ◦ ( f ) (cid:17) = L X l =1 exp (cid:16) − log( l + 1) · (cid:16) x V ◦ ( f ) (cid:17) (cid:17) ≤ L X l =1 ( l + 1) − ( x V ◦ ( f ) ) ≤ π . Similarly, if x > M Φ( q ) √ n , (cid:16) L X l =0 exp (cid:16) − xg ( l ) M Φ( q ) √ n (cid:17)(cid:17) exp (cid:16) x M Φ( q ) √ n (cid:17) ≤ π . We conclude from (8.108): If x > max {√ · V ◦ ( f ) , M Φ( q ) √ n } , (8.109)then P (cid:16) L X l =1 (cid:12)(cid:12)(cid:12) q nτ l ⌊ nτl ⌋ +1 X i =1 i even √ τ l T i,l ( f ) (cid:12)(cid:12)(cid:12) > x (cid:17) ≤ π h exp (cid:16) − x V ◦ ( f ) (cid:17) + exp (cid:16) − x M Φ( q ) √ n (cid:17)i ≤ π e exp (cid:16) − min n x V ◦ ( f ) , x M Φ( q ) √ n o(cid:17) , (8.110)where in the last step we added the factor e for convenience of the next step of theproof. If (8.109) is not fulfilled, then either x ≤ √ · V ◦ ( f ) or x ≤ M Φ( q ) √ n . The upperbound (8.110) then is still true since x ≤ √ · V ◦ ( f ) impliesexp (cid:16) − min n x V ◦ ( f ) , x M Φ( q ) √ n o(cid:17) ≥ exp (cid:16) − x V ◦ ( f ) (cid:17) ≥ exp( − ≤ x ≤ M Φ( q ) √ n . Thus, (8.110) holds for all x > A ≤ π e · exp (cid:16) − x V ◦ ( f ) + M Φ( q ) x √ n (cid:17) . (8.111)0Since k W i ( f ) k ≤ k f ( Z i , in ) k and k W i ( f ) k ∞ ≤ k f k ∞ ≤ M , we obtain from (8.107) A ≤ (cid:16) − x k f k ,n + Mx √ n (cid:17) . (8.112)Since 1 ≤ Φ( q ) and k f k ,n ≤ V ◦ ( f ), this yields a similar bound as (8.111).We now discuss A . Write1 √ n ( S Wn ( f ) − S Wn,q ( f )) = ∞ X j = q √ n ( S Wn,j +1 ( f ) − S Wn,j ( f )) = ∞ X j = q √ n n X i =1 ( W i,j +1 ( f ) − W i,j ( f )) . 
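The passage from the min-form bound (8.110) to the Bernstein-form bound (8.111) earlier in this proof rests on the elementary inequality (1/2) min{a/b, a/c} ≤ a/(b + c) ≤ min{a/b, a/c} for positive a, b, c. The sketch below checks it with a = x², b = v², c = m·x, where v stands in for V°(f) and m for M Φ(q)/√n. These names are hypothetical placeholders for illustration, and the numerical constants of the original bounds are deliberately left out.

```python
def min_form(x, v, m):
    """Exponent in min form: min{x^2 / v^2, x / m}."""
    return min(x ** 2 / v ** 2, x / m)


def bernstein_form(x, v, m):
    """Exponent in Bernstein form: x^2 / (v^2 + m * x)."""
    return x ** 2 / (v ** 2 + m * x)


def sandwich_holds(x, v, m, tol=1e-12):
    """Check 0.5 * min_form <= bernstein_form <= min_form for x, v, m > 0."""
    lower = 0.5 * min_form(x, v, m)
    middle = bernstein_form(x, v, m)
    upper = min_form(x, v, m)
    return lower <= middle * (1 + tol) and middle <= upper * (1 + tol)
```

Because the two forms differ by at most a factor 2, a tail bound stated with one exponent can be restated with the other at the cost of changing the constant in the exponent.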
PutΩ n ( j ) := { sup f ∈F n n X i =1 E [( W i,j +1 ( f ) − W i,j ( f )) |G i − ] ≤ ( M Φ( q )˜ β ( q ) √ n ) ∆( j ) a ( j ) g ( j ) }∩{ sup f ∈F sup i =1 ,...,n | W i,j +1 ( f ) − W i,j ( f ) | ≤ M Φ( q )˜ β ( q ) ∆( j ) a ( j ) } , and B n ( q ) := ∞ \ j = q Ω n ( j ) . (8.113)Note that A ≤ P (cid:16) √ n (cid:12)(cid:12) S Wn ( f ) − S Wn,q ( f ) (cid:12)(cid:12) > x , ∞ \ j = q Ω n ( j ) (cid:17) . (8.114)Here, W i,j +1 ( f ) − W i,j ( f ) is a martingale difference with respect to G i . Furthermore, ∞ X j = q ∆( j ) a ( j ) g ( j ) ≤ ∞ X j = q ∆( j ) j / log( ej ) = 4 ˜ β ( q ) . By Freedman’s Bernstein-type inequality for martingales (cf. [12]), we have for x ≥ mpirical process theory for locally stationary processes M Φ( q ) √ n that P (cid:16) √ n (cid:12)(cid:12) S Wn ( f ) − S Wn,q ( f ) (cid:12)(cid:12) > x , ∞ \ j = q Ω n ( j ) (cid:17) ≤ ∞ X k = q P (cid:16) √ n (cid:12)(cid:12)(cid:12) n X i =1 ( W i,j +1 ( f ) − W i,j ( f )) (cid:12)(cid:12)(cid:12) > x ∆( j ) a ( j ) g ( j )˜ β ( q ) , Ω n ( j ) (cid:17) ≤ ∞ X k = q exp (cid:16) −
12 ( x ∆( j ) a ( j ) g ( j )˜ β ( q ) ) ( M Φ( q )˜ β ( q ) √ n ) ∆( j ) a ( j ) g ( j ) + M Φ( q )∆( j ) a ( j )˜ β ( q ) √ n · x ∆( j ) a ( j ) g ( j )˜ β ( q ) (cid:17) = 2 ∞ X k = q exp (cid:16) − x g ( j ) ( M Φ( q ) √ n ) g ( j ) + M Φ( q ) xg ( j ) √ n (cid:17) = 2 ∞ X k = q exp (cid:16) − g ( j )4 min n(cid:16) x M Φ( q ) √ n (cid:17) , (cid:16) x M Φ( q ) √ n (cid:17)o(cid:17) ≤ ∞ X k = q exp (cid:16) − g ( j ) x M Φ( q ) √ n (cid:17) . (8.115)We conclude that for x > M Φ( q ) √ n , (cid:16) ∞ X j = q exp (cid:16) − g ( j ) x M Φ( q ) √ n (cid:17)(cid:17) · exp (cid:16) x M Φ( q ) √ n (cid:17) ≤ ∞ X j = q ( j + 1) − ( x M Φ( q ) √ n ) ≤ π , and thus (with an additional factor e ), A ≤ P (cid:16) √ n (cid:12)(cid:12) S Wn ( f ) − S Wn,q ( f ) (cid:12)(cid:12) > x , ∞ \ j = q Ω n ( j ) (cid:17) ≤ π e exp (cid:16) − x M Φ( q ) √ n (cid:17) . (8.116)In the case x ≤ M Φ( q ) √ n , we have π e exp (cid:16) − x M Φ( q ) √ n (cid:17) ≥ π ≥ , thus (8.116) holds for all x > g ( j ) ≥
1, we haveΩ n ( j ) ⊂ { n n X i =1 E [sup f ∈F (cid:12)(cid:12) W i,j +1 ( f ) − W i,j ( f ) (cid:12)(cid:12) |G i − ] ≤ ( M Φ( q )˜ β ( q ) √ n ) ∆( j ) a ( j ) }∩{ (cid:16) n X i =1 sup f ∈F | W i,j +1 ( f ) − W i,j ( f ) | (cid:17) / ≤ M Φ( q )˜ β ( q ) ∆( j ) a ( j ) } , P (Ω n ( j ) c ) ≤ (cid:16) √ n ˜ β ( q ) M Φ( q ) (cid:17) j ) a ( j ) n n X i =1 (cid:13)(cid:13)(cid:13) sup f ∈F (cid:12)(cid:12) W i,j +1 ( f ) − W i,j ( f ) (cid:12)(cid:12)(cid:13)(cid:13)(cid:13) + (cid:16) ˜ β ( q )2 M Φ( q ) (cid:17) j ) a ( j ) n X i =1 (cid:13)(cid:13)(cid:13) sup f ∈F (cid:12)(cid:12) W i,j +1 ( f ) − W i,j ( f ) (cid:12)(cid:12)(cid:13)(cid:13)(cid:13) ≤ (cid:16) ˜ β ( q ) √ nM Φ( q ) (cid:17) ( D ∞ n ) a ( j ) . Therefore, P ( B n ( q ) c ) ≤ P (cid:16) ∞ [ j = q Ω n ( j ) c (cid:17) ≤ (cid:16) D ∞ n ˜ β ( q ) √ nM Φ( q ) (cid:17) ∞ X k = q a ( j ) . (8.117)Note that ∞ X j = q +1 a ( j ) ≤ ∞ X j = q +1 Z jj − a ( j ) dx ≤ Z ∞ q a ( x ) dx ≤ Z ∞ q x log( e e x ) log(log( e e x )) dx = 2log(log( e e q )) , so that ∞ X k = q a ( j ) = 1 a ( q ) + ∞ X j = q +1 a ( j ) ≤ ϕ ( q ) . Summarizing the bounds (8.111), (8.112), (8.116) and (8.117) and using the fact that V ◦ ( f ) = k f k ,n + ∞ X j =1 min {k f k ,n , D n ∆( j ) } ϕ ( j ) / ≤ ˜ V ( f ) , we obtain (5.1).We now show (5.2) by a case distinction. We abbreviate ˜ q ∗ = ˜ q ∗ ( M √ n D ∞ n y ). In the caseΦ(˜ q ∗ ) n ≤
1, we have ˜ q ∗ ∈ { , ..., n } and thus by (5.1) P (cid:16) √ n | S Wn ( f ) | > x, B n (˜ q ∗ ) (cid:17) ≤ c exp (cid:16) − c x ˜ V ( f ) + M Φ(˜ q ∗ ) √ n x (cid:17) and, by definition of ˜ q ∗ , P ( B n (˜ q ∗ ) c ) ≤ (cid:16) ˜ β (˜ q ∗ )Φ(˜ q ∗ ) · D ∞ n √ nM (cid:17) ≤ y . mpirical process theory for locally stationary processes q ∗ ) n >
1, we obviously have P (cid:16) √ n | S Wn ( f ) | > x (cid:17) ≤ P ( M √ n > x ) ≤ c exp (cid:16) − c xM √ n (cid:17) ≤ c exp (cid:16) − c x M Φ(˜ q ∗ ) √ n (cid:17) ≤ c exp (cid:16) − c x ˜ V ( f ) + M Φ(˜ q ∗ ) √ n x (cid:17) , and the assertion follows holds without any restricting set B n ( q ), we can therefore choose q arbitrarily. Lemma 8.10.
Let F be a class of functions which satisfies Assumption 4.1. Then thereexist universal constants c , c > such that the following holds: For each q ∈ { , ..., n } there exists a set B (2) n ( q ) independent of f ∈ F such that for all x > , P (cid:16) n (cid:12)(cid:12) S Wn ( f ) (cid:12)(cid:12) > x, B (2) n ( q ) (cid:17) ≤ c exp (cid:16) − c x M Φ( q ) n · min (cid:8) x k f k ,n V ( f ) , (cid:9)(cid:17) (8.118) and P ( B (2) n ( q ) c ) ≤ n ( D ∞ n ) M · C ∆ β ( q )Φ( q ) . Define ˜ q ∗ ( z ) = min { q ∈ N : β ( q ) ≤ Φ( q ) x } . Then for any x > , y > , P (cid:16) n (cid:12)(cid:12) S Wn ( f ) (cid:12)(cid:12) > x, B (2) n (˜ q ∗ ( M n ( D ∞ n ) y )) (cid:17) ≤ c exp (cid:16) − c x M n Φ(˜ q ∗ ( M n ( D ∞ n ) y )) · min (cid:8) x k f k ,n V ( f ) , (cid:9)(cid:17) (8.119) and P ( B (2) n (˜ q ∗ ( M n ( D ∞ n ) y )) c ) ≤ C ∆ y . Proof of Lemma 8.10.
We use a similar argument as in Theorem 5.1, especially wemake use of the decomposition (3.1).The set B (2) n ( q ) is defined below in (8.125). We then have P (cid:0)(cid:12)(cid:12) n S Wn ( f ) (cid:12)(cid:12) > x, B (2) n ( q ) (cid:1) ≤ P (cid:16) n (cid:12)(cid:12) S Wn ( f ) − S Wn,q ( f ) (cid:12)(cid:12) > x/ , B (2) n ( q ) (cid:17) + P (cid:16) L X l =1 (cid:12)(cid:12)(cid:12) nτ l ⌊ nτl ⌋ +1 X i =1 i even τ l T i,l ( f ) (cid:12)(cid:12)(cid:12) > x/ (cid:17) + P (cid:16) L X l =1 (cid:12)(cid:12)(cid:12) nτ l ⌊ nτl ⌋ +1 X i =1 i odd τ l T i,l ( f ) (cid:12)(cid:12)(cid:12) > x/ (cid:17) + P (cid:16) n (cid:12)(cid:12) S Wn, ( f ) (cid:12)(cid:12) > x/ (cid:17) =: A + A + A . g ( l ) = log( l + 1) + 1, L X l =1 τ l g ( l ) ≤ q ) . Therefore, x x x x V ( f ) L X l =1 τ l X j = τ l − +1 min {k f k ,n , D n ∆( j ) } + x q ) L X l =1 τ l g ( l )= L X l =1 y ( l ) + L X l =1 y ( l ) , where y ( l ) := x V ( f ) P τ l j = τ l − +1 min {k f k ,n , D n ∆( j ) } , y ( l ) = x q ) τ l g ( l ).We have by Lemma 8.3: If M Q , σ Q > Q i , i = 1 , ..., m mean-zeroindependent variables with | Q i | ≤ M Q , m P mi =1 k Q i k ≤ σ Q , then for any z > P (cid:16) √ m (cid:12)(cid:12)(cid:12) m X i =1 (cid:2) Q i − E Q i (cid:3)(cid:12)(cid:12)(cid:12) > z (cid:17) ≤ · exp (cid:16) − z σ Q M Q m + M Q zm (cid:17) . (8.120)Using the bound (8.48) combined with (8.51), τ l | T i,l ( f ) | ≤ k f k ∞ ≤ M and theelementary inequalities min { ab , ac } ≤ ab + c ≤ min { ab , ac } and ( a + b ) ≥ ab , we obtain P (cid:16) L X l =1 (cid:12)(cid:12)(cid:12) nτ l ⌊ nτl ⌋ +1 X i =1 i even τ l T i,l ( f ) (cid:12)(cid:12)(cid:12) > x (cid:17) ≤ L X l =1 P (cid:16)(cid:12)(cid:12)(cid:12) nτ l ⌊ nτl ⌋ +1 X i =1 i even τ l T i,l ( f ) (cid:12)(cid:12)(cid:12) > y ( l ) + y ( l ) (cid:17) ≤ L X l =1 exp (cid:16) −
14 min n ( y ( l ) + y ( l )) k f k ,n P τ l j = τ l − +1 min {k f k ,n , D n ∆( j ) } · M nτl ,y ( l ) + y ( l ) M nτl o(cid:17) ≤ L X l =1 exp (cid:16) − · y ( l ) M nτl min n y ( l ) k f k ,n P τ l j = τ l − +1 min {k f k ,n , D n ∆( j ) } , o(cid:17) ≤ L X l =1 exp (cid:16) − xg ( l )2 M Φ( q ) n · min n x k f k ,n V ( f ) , o(cid:17) . (8.121) mpirical process theory for locally stationary processes x is such that c ( x ) := x M q ) n · min (cid:8) x k f k ,n V ( f ) , (cid:9) ≥
2, then (cid:16) L X l =1 exp (cid:0) − g ( l ) c ( x ) (cid:1)(cid:17) · exp (cid:0) − c ( x ) (cid:1) = L X l =1 exp (cid:16) − log( l + 1) c ( x ) (cid:17) ≤ L X l =1 ( l + 1) − c ( x ) ≤ π . Insertion into (8.121) leads to P (cid:16) L X l =1 (cid:12)(cid:12)(cid:12) nτ l ⌊ nτl ⌋ +1 X i =1 i even τ l T i,l ( f ) (cid:12)(cid:12)(cid:12) > x (cid:17) ≤ π · e exp( − c ( x )) (8.122)In the case c ( x ) <
2, the right hand side of (8.122) is ≥
1. Thus, (8.122) holds for all x > A ≤ π e · exp (cid:16) − · x M Φ( q ) n · min (cid:8) x k f k ,n V ( f ) , (cid:9)(cid:17) . (8.123)Since k W i ( f ) k ≤ k f ( Z i , in ) k and k W i ( f ) k ∞ ≤ k f k ∞ ≤ M , we obtain from (8.120) A ≤ (cid:16) − x k f k ,n · M n + M xn (cid:17) ≤ (cid:16) − x M n · min (cid:8) x k f k ,n , (cid:9)(cid:17) . (8.124)Since 1 ≤ Φ( q ) and k f k ,n ≤ V ( f ), this yields a similar bound as (8.123).We now discuss A . Put B (2) n ( q ) := { sup f ∈F n | S Wn ( f ) − S Wn,q ( f ) | ≤ M Φ( q ) n } . (8.125)Then with Markov’s inequality and using the same calculation as in (8.53), P ( B (2) n ( q ) c ) ≤ nM Φ( q ) · (cid:13)(cid:13) sup f ∈F n | S Wn ( f ) − S Wn,q ( f ) | (cid:13)(cid:13) ≤ nM Φ( q ) · ∞ X k = q n n X i =1 (cid:13)(cid:13) sup f ∈F | W i,j +1 ( f ) − W i,j ( f ) | (cid:13)(cid:13) ≤ nM Φ( q ) · ( D ∞ n ) C ∆ β ( q ) . (8.126)Furthermore, A = P (cid:16) n | S Wn ( f ) − S Wn,q ( f ) | > x , B (2) n ( q ) (cid:17) = { M q ) n > x } ≤ e · exp (cid:16) − x M Φ( q ) n (cid:17) . (8.127)6Summarizing the bounds (8.123), (8.124), (8.127) and (8.126), we obtain the result(8.118).We now show (8.119) by a case distinction. Abbreviate ˜ q ∗ = ˜ q ∗ ( M n ( D ∞ n ) y ). In the caseΦ(˜ q ∗ ) n ≤
1, we have ˜ q ∗ ∈ { , ..., n } and thus by (8.118) P (cid:16) n | S Wn ( f ) | > x, B n (˜ q ∗ ) (cid:17) ≤ c exp (cid:16) − c x M Φ(˜ q ∗ ) n · min (cid:8) x k f k ,n V ( f ) , (cid:9)(cid:17) and, by definition of ˜ q ∗ , P ( B n (˜ q ∗ ) c ) ≤ ( D ∞ n ) nM · C ∆ β (˜ q ∗ )Φ(˜ q ∗ ) ≤ C ∆ y , the assertion follows with B (2) n ( M, y ) = B (2) n (˜ q ∗ ).In the case Φ(˜ q ∗ ) n >
1, we obviously have P (cid:16) n | S Wn ( f ) | > x (cid:17) ≤ P ( M > x ) ≤ c exp (cid:16) − c xM (cid:17) ≤ c exp (cid:16) − c x M Φ(˜ q ∗ ) n (cid:17) ≤ c exp (cid:16) − c x M Φ(˜ q ∗ ) n · min (cid:8) x k f k ,n V ( f ) , (cid:9)(cid:17) , and the assertion follows with B (2) n ( M, y ) being the whole probability space.
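The summation step used for (8.122), and in the same way in the proof of Theorem 5.1, can be illustrated numerically. With g(l) = log(l + 1) + 1, multiplying the sum of exp(−g(l)·c) by exp(c) gives the sum of (l + 1)^(−c), which is bounded by π²/6 as soon as c ≥ 2. The sketch below only checks this identity and bound. The function names are hypothetical, and the truncation level L is an arbitrary choice for the illustration.

```python
import math


def g(l):
    """Weight g(l) = log(l + 1) + 1 from the proof."""
    return math.log(l + 1) + 1.0


def weighted_sum(c, L):
    """sum_{l=1}^{L} exp(-g(l) * c)."""
    return sum(math.exp(-g(l) * c) for l in range(1, L + 1))


def rescaled_sum(c, L):
    """exp(c) * weighted_sum(c, L), which equals sum_{l=1}^{L} (l + 1)^(-c)."""
    return math.exp(c) * weighted_sum(c, L)


def power_sum(c, L):
    """sum_{l=1}^{L} (l + 1)^(-c), computed directly for comparison."""
    return sum((l + 1) ** (-c) for l in range(1, L + 1))
```

For c ≥ 2 the rescaled sum stays below π²/6 uniformly in L, so the whole sum is at most (π²/6)·exp(−c). This is how the weights g(l) absorb the union bound over the blocks l.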
Proof of Theorem 5.4.
Let B n ( q ) denote the set from Theorem 5.1 (applied to W i ( f ) = E [ f ( Z i , in ) |G i − ] instead of W i ( f ) = f ( Z i , in ); the proof is similar for this situation). Let B (2) n ( q ) denote the set from Lemma 8.10.Put B ◦ n ( q ) = B n ( q ) ∩ B (2) n ( q ) . Then we have P (cid:0) | G n ( f ) | > x, B ◦ n ( q ) (cid:1) ≤ P (cid:0) G (1) n ( f ) | > x , B (2) n ( q ) (cid:1) + P (cid:0) | G (2) n ( f ) | > x , B n ( q ) (cid:1) ≤ P (cid:16) | G (1) n ( f ) | > x , R n ( f ) ≤ max (cid:8) ˜ V ( f ) , M Φ( q ) √ n x (cid:9)(cid:17) + P (cid:16) R n ( f ) > max (cid:8) ˜ V ( f ) , M Φ( q ) √ n x (cid:9) , B (2) n ( q ) (cid:17) + P (cid:0) | G (2) n ( f ) | > x , B n ( q ) (cid:1) . (8.128)We now discuss the three summands in (8.128) separately. By Theorem 5.1, P (cid:0) | G (2) n ( f ) | > x , B n ( q ) (cid:1) ≤ c exp (cid:16) − c ( x/ ˜ V ( f ) + M Φ( q ) √ n ( x/ (cid:17) . mpirical process theory for locally stationary processes P (cid:16) | G (1) n ( f ) | > x , R n ( f ) ≤ max (cid:8) ˜ V ( f ) , M Φ( q ) √ n x (cid:9)(cid:17) ≤ (cid:16) −
12 ( x/ max { ˜ V ( f ) , M Φ( q ) √ n x } + M √ n x (cid:17) ≤ (cid:16) −
14 ( x/ ˜ V ( f ) + M Φ( q ) √ n x (cid:17) . By Lemma 8.10 applied to W i ( f ) = E [ f ( Z i , in ) |G i − ] and using Φ( q ) ≤ Φ( q ) (cf.(8.129)), P (cid:16) R n ( f ) > max (cid:8) ˜ V ( f ) , M Φ( q ) √ n x (cid:9) , B (2) n ( q ) (cid:17) ≤ c exp (cid:16) − c M Φ( q ) √ n x M Φ( q ) n · min n ˜ V ( f ) k f k ,n V ( f ) , o(cid:17) = c exp (cid:16) − c x M Φ( q ) √ n x (cid:17) . Inserting the above estimates into (8.128), the assertion (5.3) follows. Furthermore byAssumption 5.3, P ( B (2) n ( q ) c ) ≤ C ∆ n ( D ∞ n ) M ˜ β norm ( q ) ≤ C ∆ C ˜ β (cid:16) √ n D ∞ n M ˜ β norm ( q ) (cid:17) . Thus, P ( B ◦ n ( q ) c ) ≤ P ( B n ( q ) c ) + P ( B (2) n ( q ) c ) ≤ [4 + C ∆ C ˜ β ] (cid:16) √ n D ∞ n M ˜ β norm ( q ) (cid:17) . The second assertion (5.4) follows as in Theorem 5.1 with q = ˜ q ∗ ( M √ n D ∞ n y ). Lemma 8.11 (A second compatibility lemma) . Let n ∈ N , δ, a M > and k ∈ N . For H > , put ˜ r ( δ ) := max { r > q ∗ ( r ) r ≤ δ } , and w ( H ) := min { w > w · ˜ r ( w ) ≥ H − } , W ( H ) := Hw ( H ) . and ˜ m ( n, δ, k ) := a M ˜ r ( δ D n )˜ r ( w ( H ( k ))) · D ∞ n n / . Finally, put ˆ C n := 8 c (1 + D ∞ n D n )(1 + C β ( ˜ β (1) ∨ a M ) . (i) Then W is subadditive.(ii) If F fulfills Assumption 3.1 and Assumption 5.3, then sup f ∈F ˜ V ( f ) ≤ δ , sup f ∈F k f k ∞ ≤ ˜ m ( n, δ, k ) implies that for any ψ : (0 , ∞ ) → [1 , ∞ ) , P (cid:16) √ n (cid:12)(cid:12) S Wn ( f ) (cid:12)(cid:12) > ˆ C n ψ ( δ ) δ W ( H ( k )) , B n (cid:17) ≤ c exp (cid:0) − H ( k ) (cid:1) , √ n k f { f>γ · ˜ m ( n,δ,k ) } k ,n ≤ γa M · D ∞ n D n · δ W ( H ( k )) , P (cid:0) B cn (cid:1) ≤ ψ ( δ ) a M , where B n = B n (˜ q ∗ ( m ( n,δ,k ) √ n D ∞ n ψ ( δ ) a M )) , c , c are from Theorem 5.1. Proof of Lemma 8.11. (i) Note that for a, b >
0, we have w ( a + b ) ≤ w ( a ) since w ( a )˜ r ( w ( a )) ≥ a − ≥ ( a + b ) − . Thus W ( a + b ) = ( a + b ) w ( a + b ) ≤ aw ( a + b )+ bw ( a + b ) ≤ aw ( a )+ bw ( b ) ≤ W ( a )+ W ( b ) . (ii) As in the proof of Lemma 4.5 (cf. (8.63)), we obtain that for x , x > q ∗ ( C ˜ β x x ) ≤ ˜ q ∗ ( x )˜ q ∗ ( x ) . Furthermore, for q , q ∈ N we have due to x + x ≤ x x + 1 thatlog log( e e q q ) ≤ log[log( eq ) + log( eq )] ≤ log[log( eq ) · log( eq ) + 1] ≤ log[log( eq ) · log( e e q )] ≤ log log( eq ) + log log( e e q ) ≤ log log( eq ) · log log( e e q ) + 1 ≤ log log( e e q ) · log log( e e q ) , and thus Φ( q q ) ≤ Φ( q )Φ( q ) . (8.129)Furthermore, note that for a ∈ (0 , ( ˜ β (1) ∨ q = ⌈ Φ − ( ( ˜ β (1) ∨ a ) ⌉ satisfiesΦ( q ) a = Φ( ⌈ Φ − ( ( ˜ β (1) ∨ a ) ⌉ ) a ≥ ( ˜ β (1) ∨ ≥ ˜ β ( q ) , that is, Φ(˜ q ∗ ( a )) ≤ Φ( ⌈ Φ − ( ( ˜ β (1) ∨ a ) ⌉ ) ≤ Φ(2Φ − ( ( ˜ β (1) ∨ a )) ≤ − ( ( ˜ β (1) ∨ a )) ≤
4( ˜ β (1) ∨ a . (8.130) mpirical process theory for locally stationary processes y = ψ ( δ ) a M , we have˜ q ∗ ( ˜ m ( n, δ, k ) √ n D ∞ n y ) = ˜ q ∗ ( ˜ m ( n, δ, k ) √ n D ∞ n ψ ( δ ) a M ) = ˜ q ∗ ( C β ( ˜ β (1) ∨ r r ψ ( δ ) ) , where r = ˜ r ( δ D n ), r = ˜ r ( w ( H ( k ))), and thus with (8.129) and (8.130),Φ(˜ q ∗ ) ˜ m ( n, δ, k ) √ n ≤ Φ(˜ q ∗ ( C β ( ˜ β (1) ∨ r r ψ ( δ ) )) r r D ∞ n a M ≤ Φ (cid:16) ˜ q ∗ ( ( ˜ β (1) ∨ ψ ( δ ) )˜ q ∗ ( r )˜ q ∗ ( r ) (cid:17) r r D ∞ n a M ≤ Φ (cid:16) ˜ q ∗ ( ( ˜ β (1) ∨ ψ ( δ ) ) (cid:17) Φ(˜ q ∗ ( r ))Φ(˜ q ∗ ( r ))) r r D ∞ n a M ≤ D ∞ n D n ψ ( δ ) δw ( H ( k )) a M . By definition of W ( · ) and Theorem 5.1, we obtain P (cid:16) √ n (cid:12)(cid:12) S Wn ( f ) (cid:12)(cid:12) > ˆ C n ψ ( δ ) δ · W ( H ( k )) , B n (cid:17) ≤ c exp (cid:16) − c ˆ C n ψ ( δ ) δ W ( H ( k )) δ + 4 D ∞ n D n a M ˆ C n δ ψ ( δ ) w ( H ( k )) W ( H ( k )) (cid:17) ≤ c exp (cid:16) − c ˆ C n a M D ∞ n D n ˆ C n H ( k ) (cid:17) ≤ c exp (cid:0) − H ( k ) (cid:1) . Similar as in the proof of Lemma 3.6, we obtain due to Assumption 5.3 that˜ V ( f ) ≥ min a ∈ N (cid:2) k f k ,n (cid:16) a X j =1 ϕ ( j ) (cid:17) + D n ˜ β ( a ) (cid:3) ≥ k f k ,n (cid:16) ˆ a X j =1 ϕ ( j ) (cid:17) + D n ˜ β (ˆ a ) , where ˆ a = arg min a ∈ N {k f k ,n · (cid:0) P aj =1 ϕ ( j ) (cid:1) + D n β ( a ) } . Elementary calculationsshow that for ˆ a ≥ ˆ a X j =1 ϕ ( j ) = 1 + ˆ a X j =2 ϕ ( j ) ≥ Z ˆ a − ϕ ( x ) dx = 1 + (Φ(ˆ a − − − Z ˆ a − e e x ) dx ≥ Φ(ˆ a − − ˆ a − e ≥
14 Φ(ˆ a ) . Clearly, the same holds for ˆ a = 1. We therefore have˜ V ( f ) ≥ k f k ,n Φ(ˆ a ) . (8.131)00 Now, δ ≥ ˜ V ( f ) ≥ D n ˜ β (ˆ a ) = D n ˜ β norm (ˆ a )Φ(ˆ a ). Thus ˜ β norm (ˆ a ) ≤ δ D n Φ(ˆ a ) . By defi-nition of ˜ q ∗ , ˜ q ∗ ( δ D n Φ(ˆ a ) ) ≤ ˆ a . Thus Φ(˜ q ∗ ( δ D n Φ(ˆ a ) )) δ D n Φ(ˆ a ) ≤ δ D n . By definition of ˜ r ,˜ r ( δ D n ) ≥ δ D n Φ(ˆ a ) .Using this result, (8.131) and the definition of w ( · ) yields √ n k f { f>γ · ˜ m ( n,δ,k ) } k ,n ≤ γ √ n k f k ,n ˜ m ( n, δ, k ) ≤ γ a M D ∞ n k f k ,n ˜ r ( δ D n )˜ r ( w ( H ( k ))) , and k f k ,n ˜ r ( δ D n )˜ r ( w ( H ( k ))) ≤ D n Φ(ˆ a ) k f k ,n δ r ( w ( H ( k )) ≤ D n V ( f ) k f k ,n δ r ( w ( H ( k )) ≤ δ · r ( w ( H ( k )) ≤ D n δ W ( H ( k )) , which provides √ n k f { f>γ · ˜ m ( n,δ,k ) } k ,n ≤ γ D n D ∞ n a M δ W ( H ( k )) . Finally, Theorem 5.1 implies that P ( B cn ) ≤ y = 4 ψ ( δ ) a M . We here present a chaining version of the large deviation inequality from Theorem 5.1.For the sake of simplicity, we derive the result for some continuous strictly decreasingupper bound ¯ H ( ε ) of H ( ε, F , V ). Theorem 8.12 (Chaining for large deviation inequalities) . There exists a universalconstant c > such that the following holds.Let a M ≥ , M, σ > be arbitrary. Let ψ ( x ) = p log( x − ∨ e ) log log( x − ∨ e e ) .Let F be a class which satisfies Assumption 3.1 and 5.3, and sup f ∈F ˜ V ( f ) ≤ σ , sup f ∈F k f k ∞ ≤ M . Define ˜ I ( σ ) := Z σ ψ ( ε ) W (1 ∨ ¯ H ( ε )) dε. Choose σ ◦ , x > such that ¯ H ( σ ◦ ) = 150 c · x σ + Φ(˜ q ∗ ( M √ n D ∞ n a M )) Mx √ n , x ≥ c ˆ C n ˜ I ( σ ◦ ) , (8.132) mpirical process theory for locally stationary processes where ˆ C n , W is from Lemma 8.11. Then there exists a set Ω n independent of x such that P (cid:16) sup f ∈F (cid:12)(cid:12) G n ( f ) (cid:12)(cid:12) > x, Ω n (cid:17) ≤ (cid:16) − c · x σ + Φ(˜ q ∗ ( M √ n D ∞ n a M )) Mx √ n (cid:17) , and P (Ω cn ) ≤ a M Z σ ◦ xψ ( x ) dx. Proof of Theorem 8.12.
We use the chaining technique from [2], Theorem 2.3 therein.We define δ := σ ◦ , δ j +1 := max { δ ≤ δ j H ( δ ) ≥ H ( δ j ) } . Since ¯ H ( · ) is continuous, it holds that ¯ H ( δ j +1 ) = 4 ¯ H ( δ j ). Put τ := min { j ≥ δ j ≤ ˜ I ( σ ◦ ) √ n } . Define η j := 4 ˆ C n ψ ( δ j ) δ j W ( H ( ¯ N j +1 )) , where ˆ C n is from Lemma 8.11 and¯ N j +1 := j +1 Y k =0 exp( ¯ H ( δ k )) ≥ j +1 Y k =0 exp( H ( δ k )) = j +1 Y k =0 N ( δ k ) =: N j +1 . By Lemma 8.11(i), W ( · ) is subadditive, thus τ X j =0 ψ ( δ j ) δ j W ( H ( ¯ N j +1 )) ≤ τ X j =0 ψ ( δ j ) δ j W (1 ∨ j +1 X k =1 ¯ H ( δ k )) ≤ τ X j =0 ψ ( δ j ) δ j j +1 X k =1 W (1 ∨ ¯ H ( δ k )) ≤ τ − X k =0 W (1 ∨ ¯ H ( δ k +1 )) τ X j = k ψ ( δ j ) δ j . (8.133)Similar to (8.71), there exists some universal constant c ψ > τ X j = k ψ ( δ j ) δ j ≤ ∞ X j = k Z δ j δ j / ψ ( δ j ) dx ≤ ∞ X j = k Z δ j δ j +1 ψ ( x ) dx ≤ Z δ k ψ ( x ) dx ≤ c ψ δ k ψ ( δ k ) . (8.134)02Furthermore, by definition of the sequence ( δ j ) j and since w ( · ) is decreasing but W isincreasing, we have W (1 ∨ ¯ H ( δ j +1 )) ≤ W (4(1 ∨ ¯ H ( δ j ))) ≤ W (1 ∨ ¯ H ( δ j )) . (8.135)Insertion of (8.134) and (8.135) into (8.133) yields τ X j =0 η j ≤ C n τ X j =0 ψ ( δ j ) δ j W ( H ( ¯ N j +1 )) ≤ c ψ ˆ C n ∞ X k =0 δ k ψ ( δ k ) W (1 ∨ ¯ H ( δ j )) ≤ c ψ ˆ C n ∞ X k =0 Z δ k δ k / ψ ( δ k ) W (1 ∨ ¯ H ( δ j )) dε ≤ c ψ ˆ C n ∞ X k =0 Z δ k δ k +1 ψ ( ε ) W (1 ∨ ¯ H ( ε )) dε ≤ c ψ ˆ C n Z σ ◦ ψ ( ε ) W (1 ∨ ¯ H ( ε )) dε = 64 c ψ ˆ C n ˜ I ( σ ◦ ) . (8.136)We set up the same decomposition as in the proof of Theorem 3.7. Define˜ m j := 12 ˜ m ( n, δ j , ¯ N j +1 ) . Note that x ≥ x x − η τ ) + ( x η τ ) . Define c := 5 · · c ψ . Condition 8.132 implies x ≥ c ψ ˆ C n ˜ I ( σ ◦ ) , (8.137)and thus with (8.136), we obtain x − η τ ≥ c ψ ˆ C n ˜ I ( σ ◦ ) − η τ ≥ τ − X j =0 η j . 
Put ˜ q ∗ j := ˜ q ∗ ( m ( n,δ j ,N j +1 ) √ n D ∞ n ψ ( δ j ) a M ), andΩ n := B n (˜ q ∗ ( M √ n D ∞ n a M )) ∩ τ \ j =0 B n (˜ q ∗ j ) , mpirical process theory for locally stationary processes B n ( q ) is from Theorem 5.1. From (8.33), we obtain the decomposition P (cid:16) sup f ∈F (cid:12)(cid:12) G Wn ( f ) (cid:12)(cid:12) > x, Ω n (cid:17) ≤ P (cid:16) sup f ∈F | G Wn ( π f ) | > x , Ω n (cid:17) + P (cid:16) sup f ∈F | G Wn ( ϕ ∧ ˜ m τ (∆ τ f )) | + 2 √ n sup f ∈F k ∆ τ f k ,n > x η τ , Ω n (cid:17) + P (cid:16) τ − X j =0 sup f ∈F (cid:12)(cid:12)(cid:12) G Wn ( ϕ ∧ ˜ m j − ˜ m j +1 ( π j +1 f − π j f )) (cid:12)(cid:12)(cid:12) > x − η τ , Ω n (cid:17) + P (cid:16) τ − X j =0 n sup f ∈F (cid:12)(cid:12)(cid:12) G Wn (min (cid:8)(cid:12)(cid:12) ϕ ∨ ˜ m j +1 (∆ j +1 f ) (cid:12)(cid:12) , m j (cid:9) ) (cid:12)(cid:12)(cid:12) +2 √ n sup f ∈F k ∆ j +1 f { ∆ j +1 f> ˜ m j +1 } k ,n o > x − η τ , Ω n (cid:17) + P (cid:16) τ − X j =0 n sup f ∈F (cid:12)(cid:12)(cid:12) G Wn (min (cid:8)(cid:12)(cid:12) ϕ ∨ ˜ m j − ˜ m j +1 (∆ j f ) (cid:12)(cid:12) , m j (cid:9) ) (cid:12)(cid:12)(cid:12) +2 √ n sup f ∈F k ∆ j f { ∆ j f> ˜ m j − ˜ m j +1 } k ,n o > x − η τ , Ω n (cid:17) ≤ P (cid:16) sup f ∈F | G Wn ( π f ) | > x , Ω n (cid:17) + P (cid:16) sup f ∈F | G Wn ( ϕ ∧ ˜ m τ (∆ τ f )) | + 2 √ n sup f ∈F k ∆ τ f k ,n > x η τ , Ω n (cid:17) + τ − X j =0 P (cid:16) sup f ∈F (cid:12)(cid:12)(cid:12) G Wn ( ϕ ∧ ˜ m j − ˜ m j +1 ( π j +1 f − π j f )) (cid:12)(cid:12)(cid:12) > η j , Ω n (cid:17) + τ − X j =0 P (cid:16) sup f ∈F (cid:12)(cid:12)(cid:12) G Wn (min (cid:8)(cid:12)(cid:12) ϕ ∨ ˜ m j +1 (∆ j +1 f ) (cid:12)(cid:12) , m j (cid:9) ) (cid:12)(cid:12)(cid:12) +2 √ n sup f ∈F k ∆ j +1 f { ∆ j +1 f> ˜ m j +1 } k ,n > η j , Ω n (cid:17) + τ − X j =0 P (cid:16) sup f ∈F (cid:12)(cid:12)(cid:12) G Wn (min (cid:8)(cid:12)(cid:12) ϕ ∨ ˜ m j − ˜ m j +1 (∆ j f ) (cid:12)(cid:12) , m j (cid:9) ) (cid:12)(cid:12)(cid:12) +2 √ n sup f ∈F k ∆ j f { ∆ j f> ˜ m j − ˜ m j +1 } 
k ,n > η j , Ω n (cid:17) =: R ∗ + R ∗ + R ∗ + R ∗ + R ∗ . (8.138)We now discuss the terms in (8.138) separately.04 • We have by definition of Theorem 5.1 that R ∗ ≤ P (cid:16) sup f ∈F | G Wn ( π f ) | > x , B n (˜ q ∗ ( M √ n D ∞ n a M )) (cid:17) ≤ N ( σ ◦ ) · sup f ∈F P (cid:16) | G Wn ( π f ) | > x , B n (˜ q ∗ ( M √ n D ∞ n a M )) (cid:17) ≤ exp( H ( σ ◦ )) · c exp (cid:16) − c ( x/ σ + Φ(˜ q ∗ ( M √ n D ∞ n a M )) M ( x/ √ n (cid:17) ≤ c exp (cid:16) − c x σ + Φ(˜ q ∗ ( M √ n D ∞ n a M )) M √ n (cid:17) . • We have by Lemma 8.11 that P (cid:16) sup f ∈F | G Wn ( ϕ ∧ ˜ m τ (∆ τ f )) | > η τ , B n (˜ q ∗ τ ) (cid:17) ≤ exp( H ( N τ +1 )) · c exp( − H ( N τ +1 )) ≤ c ∞ X j =0 exp( − H ( N j +1 )) ≤ c exp( − H ( σ ◦ ))(for the last inequality, see the more detailed calculation for R ∗ below). By theCauchy-Schwarz inequality, the definition of τ and (8.137), √ n sup f ∈F k ∆ τ f k ,n ≤ √ n k ∆ τ f k ,n ≤ √ nV (∆ τ f ) ≤ √ nδ τ ≤ ˜ I ( σ ◦ ) < x . We conclude that R ∗ ≤ c exp( − H ( σ ◦ )) . • We have by Lemma 8.11(i) that R ∗ ≤ τ − X j =0 P (cid:16) sup f ∈F (cid:12)(cid:12)(cid:12) G Wn ( ϕ ∧ ˜ m j − ˜ m j +1 ( π j +1 f − π j f )) (cid:12)(cid:12)(cid:12) > η j , B n (˜ q ∗ j ) (cid:17) ≤ τ − X j =0 N j +1 · sup f ∈F P (cid:16)(cid:12)(cid:12)(cid:12) G Wn ( ϕ ∧ ˜ m j − ˜ m j +1 ( π j +1 f − π j f )) (cid:12)(cid:12)(cid:12) > η j , B n (˜ q ∗ j ) (cid:17) ≤ τ − X j =0 exp( H ( N j +1 )) · c exp( − H ( ¯ N j +1 )) ≤ c τ − X j =0 exp( − H ( ¯ N j +1 )) ≤ c τ − X j =0 exp( − ¯ H ( δ j +1 )) ≤ c ∞ X j =0 exp( − j +1 ¯ H ( σ ◦ )) ≤ c exp( − ¯ H ( σ ◦ )) . mpirical process theory for locally stationary processes (cid:0) ∞ X j =0 exp( − j +1 ¯ H ( σ ◦ )) (cid:1) exp( ¯ H ( σ ◦ )) = ∞ X j =0 exp( − (4 j +1 −
1) ¯ H ( σ ◦ )) ≤ ∞ X j =0 exp( − (4 j +1 − ≤ . (8.139) • Similarly as for R ∗ , we have by Lemma 8.11(ii) that τ − X j =0 P (cid:16) sup f ∈F (cid:12)(cid:12)(cid:12) G Wn (min (cid:8)(cid:12)(cid:12) ϕ ∨ ˜ m j +1 (∆ j +1 f ) (cid:12)(cid:12) , m j (cid:9) ) (cid:12)(cid:12)(cid:12) > η j , B n (˜ q ∗ j ) (cid:17) ≤ τ − X j =0 N j +1 · c exp( − H ( ¯ N j +1 )) ≤ c exp( − ¯ H ( σ ◦ )) , and, since a M ≥ √ n sup f ∈F k ∆ j +1 f { ∆ j +1 f> ˜ m j +1 } k ,n ≤ D ∞ n D n δ j W ( H ( ¯ N j +1 )) < ˆ C n · δ j W ( H ( ¯ N j +1 )) ≤ η j . This shows that R ∗ ≤ c exp( − H ( σ ◦ )) . • Similarly as for R ∗ , we obtain τ − X j =0 P (cid:16) sup f ∈F (cid:12)(cid:12)(cid:12) G Wn (min (cid:8)(cid:12)(cid:12) ϕ ∨ ˜ m j − ˜ m j +1 (∆ j f ) (cid:12)(cid:12) , m j (cid:9) ) (cid:12)(cid:12)(cid:12) > η j , B n (˜ q ∗ j (cid:17) ≤ τ − X j =0 N j +1 · c exp( − H ( ¯ N j +1 )) ≤ c exp( − ¯ H ( σ ◦ )) . As in the proof of Theorem 3.7 (discussion of R therein), we see that 2( ˜ m j − ˜ m j +1 ) ≥ ˜ m j due to the fact that the inequality˜ r ( δ j D n ) − ˜ r ( δ j +1 D n ) ≥ ˜ r ( δ j D n ) − ˜ r ( δ j D n ) ≥ ˜ r ( δ j D n ) −
12 ˜ r ( δ j D n ) ≥
12 ˜ r ( δ j D n ) .
06 only requires δ j +1 ≤ δ j . Thus, since a M ≥ √ n sup f ∈F k ∆ j f { ∆ j f> ˜ m j − ˜ m j +1 } k ,n ≤ √ n sup f ∈F k ∆ j f { ∆ j f> ˜ m j } k ,n ≤ D ∞ n D n δ j W ( H ( ¯ N j +1 )) < C n δ j W ( H ( ¯ N j +1 )) ≤ η j . This shows that R ∗ ≤ c exp( − ¯ H ( σ ◦ )) . By plugging in the above upper bounds for R ∗ i , i ∈ { , ..., } into (8.138) and using(8.132), we obtain P (cid:16) sup f ∈F | G Wn ( f ) | > x, Ω n (cid:17) ≤ c exp (cid:0) − c · x σ + Φ(˜ q ∗ ( M √ n D ∞ n a M )) Mx √ n (cid:17) . (8.140)Discussion of the residual term: By Lemma 8.11(ii), we have that: P (Ω cn ) ≤ P ( B n (˜ q ∗ ( M √ n D ∞ n a M )) c ) + ∞ X j =0 P ( B n (˜ q ∗ j ) c ) ≤ a M + 4 a M ∞ X j =0 ψ ( δ j ) ≤ a M ∞ X j =0 ψ ( δ j ) . Due to ∞ X j =0 ψ ( δ j ) = ∞ X j =0 δ j − δ j +1 Z δ j δ j +1 ψ ( δ j ) dx ≤ ∞ X j =0 Z δ j δ j +1 δ j ψ ( δ j ) dx ≤ Z σ ◦ xψ ( x ) dx ≤ σ ◦ ) − ∨ e e )) , the result follows. Proof of Lemma 6.3.
Put D v,n ( u ) = √ hK h ( u − v ). By (A1) and Assumption 6.2,Assumption 2.5 is fulfilled for F j with ν = 2 and ∆( k ) = O ( δ X M ( k )), C R = 1 + k max { C X , } M . mpirical process theory for locally stationary processes K is Lipschitz continuous and (A2) holds, we havesup | v − v ′ |≤ n − , | θ − θ ′ | ≤ n − (cid:12)(cid:12)(cid:0) ∇ jθ L n,h ( v, θ ) − E ∇ jθ L n,h ( v, θ ) (cid:1) − (cid:0) ∇ jθ L n,h ( v ′ , θ ′ ) − E ∇ jθ L n,h ( v ′ , θ ′ ) (cid:1)(cid:12)(cid:12) ∞ ≤ sup | v − v ′ |≤ n − , | θ − θ ′ | ≤ n − C R h (cid:2) L K | v − v ′ | + C Θ | θ − θ ′ | (cid:3) × n n X i = k (cid:0) | Z i | M + E | Z i | M (cid:1) = O p ( n − ) . Let Θ n be a grid approximation of Θ such that for any θ ∈ Θ, there exists some θ ′ ∈ Θ n such that | θ − θ ′ | ≤ n − . Since Θ ⊂ R d Θ , it is possible to choose Θ n such that | Θ n | = O ( n − d Θ ). Furthermore, define V n := { in − : i = 1 , ..., n } as an approximation of [0 , F ′ j = { f v,θ : θ ∈ Θ n , v ∈ V n } yields for j ∈ { , , } thatsup v ∈ [ h , − h ] (cid:12)(cid:12) ∇ jθ L n,h ( v, θ ) − E ∇ jθ L n,h ( v, θ ) (cid:12)(cid:12) ∞ = O p (cid:0) τ n (cid:1) . (8.141)Put ˜ L n,h ( v, θ ) = n P ni =1 K h ( i/n − v ) ℓ θ ( ˜ Z i ( v )). With (A1) it is easy to see that (cid:12)(cid:12) E ∇ jθ L n,h ( v, θ ) − E ∇ jθ ˜ L n,h ( v, θ ) (cid:12)(cid:12) ∞ ≤ d j Θ C R n n X i =1 | K h ( i/n − v ) | · k| Z i − ˜ Z i ( v ) | k M × (cid:0) k| Z i | k M − M + k| ˜ Z i ( v ) | k M − M (cid:1) ≤ d j Θ C R | K | ∞ C X (1 + 2 C M − X ) (cid:0) n − + h (cid:1) . (8.142)Finally, since K has bounded variation and R K ( u ) du = 1, uniformly in v ∈ [ h , − h ] itholds that E ∇ jθ ˜ L n,h ( v, θ ) = 1 n n X i =1 K h ( i/n − v ) E ∇ jθ ℓ θ ( ˜ Z ( v )) = E ∇ jθ ℓ θ ( ˜ Z ( v ))+ O (( nh ) − ) . 
(8.143)From (8.141), (8.142) and (8.143) we obtainsup v ∈ [ h , − h ] sup θ ∈ Θ (cid:12)(cid:12) ∇ jθ L n,h ( v, θ ) − E ∇ jθ ℓ θ ( ˜ Z ( v )) (cid:12)(cid:12) ∞ = O p ( τ ( j ) n ) , (8.144)08where τ ( j ) n := τ n + ( nh ) − + h, j ∈ { , } , τ (1) n := τ n + ( nh ) − + B h . By (A3) and (8.144) for j = 0, we obtain with standard arguments that if τ (0) n = o (1),sup v ∈ [ h , − h ] (cid:12)(cid:12) ˆ θ n,h ( v ) − θ ( v ) (cid:12)(cid:12) ∞ = o p (1) . Since ˆ θ n,h ( v ) is a minimizer of θ L n,h ( v, θ ) and ℓ θ is twice continuously differentiable,we have the representationˆ θ n,h ( v ) − θ ( v ) = −∇ θ L n,h ( v, ¯ θ v ) − ∇ θ L n,h ( v, θ ( v )) , (8.145)where ¯ θ v ∈ Θ fulfills | ¯ θ v − θ ( v ) | ∞ ≤ | ˆ θ n,h ( v ) − θ ( v ) | ∞ = o p (1).By (A2), we have (cid:12)(cid:12) E ∇ θ ℓ θ ( ˜ Z ( v )) (cid:12)(cid:12) θ = θ ( v ) − E ∇ θ ℓ θ ( ˜ Z ( v )) (cid:12)(cid:12) θ =¯ θ v (cid:12)(cid:12) ∞ = O ( | θ ( v ) − ¯ θ v | ) = o p (1) . and thus with (8.144),sup v ∈ [ h , − h ] (cid:12)(cid:12) ∇ θ L n,h ( v, ¯ θ v ) − E ∇ θ ℓ θ ( ˜ Z ( v )) (cid:12)(cid:12) θ = θ ( v ) (cid:12)(cid:12) ∞ = O p ( τ (2) n ) + o p (1) . (8.146)By (A3) and the dominated convergence theorem, E ∇ θ ℓ ( ˜ Z ( v )) = ∇ θ E ℓ ( ˜ Z ( v )) = 0. By(8.144),sup v ∈ [ h , − h ] (cid:12)(cid:12) ∇ θ L n,h ( v, θ ( v )) (cid:12)(cid:12) ∞ = sup v ∈ [ h , − h ] (cid:12)(cid:12) ∇ θ L n,h ( v, θ ( v )) − E ∇ θ ℓ ( ˜ Z ( v )) (cid:12)(cid:12) ∞ = O p ( τ (1) n ) . (8.147)Inserting (8.146) and (8.147) into (8.145), we obtainsup v ∈ [ h , − h ] (cid:12)(cid:12) ˆ θ n,h ( v ) − θ ( u ) (cid:12)(cid:12) ∞ = O p ( τ (1) n ) . This yields an improved version of (8.146):sup v ∈ [ h , − h ] (cid:12)(cid:12) ∇ θ L n,h ( v, ¯ θ v ) − E ∇ θ ℓ θ ( ˜ Z ( v )) (cid:12)(cid:12) θ = θ ( v ) (cid:12)(cid:12) ∞ = O p ( τ (2) n ) . (8.148)Inserting (8.147) and (8.148) into (8.145), we obtain the assertion. Details of Example 6.8.
We first show that the supremum over x ∈ R, v ∈ [0, 1] can be approximated by a supremum over grids x ∈ X_n, v ∈ V_n. For Q >
0, put c n = Qn s . Define the event A n = { sup i =1 ,...,n | X i | ≤ c n } . Thenby Markov’s inequality, P ( A cn ) ≤ n · k X i k s s Q s c sn ≤ C sX nc sn (8.149)is arbitrarily small for Q large enough.Put ˆ g ◦ n,h ( x, v ) := n P ni =1 K h ( i/n − v ) ˜ K h ( X i − x ) {| X i |≤ c n } . ThenOn A n , ˆ g ◦ n,h ( · ) = ˆ g n,h ( · ) . (8.150)Furthermore, p nh h (cid:12)(cid:12) E ˆ g n,h ( x, v ) − E ˆ g ◦ n,h ( x, v ) (cid:12)(cid:12) ≤ √ nh h | K | ∞ nh n X i =1 E [ ˜ K h ( X i − x ) {| X i | >c n } ] ≤ p nh h ( h h ) − | K | ∞ c − sn sup i E [ ˜ K ( X i − xh ) | X i | s ] ≤ Q − s ( nh h ) − / | ˜ K | ∞ | K | ∞ C sX = o (1) . (8.151)For | x | > c n , we have ˜ K h ( X i − x ) {| X i |≤ c n } ≤ h − ( c n h ) − p K = h p K − c − p K n and thus √ nh | ˆ g ◦ n,h ( x, v ) − E ˆ g ◦ n,h ( x, v ) | ≤ | K | ∞ C ˜ K h / ( nh ) / h p K − c − p K n ≤ h p K Q p K ( nh h ) / = o (1) . (8.152)By (8.150), (8.151) and (8.152), we have on A n , p nh h sup x ∈ R ,v ∈ [0 , | ˆ g n,h ( x, v ) − E ˆ g n,h ( x, v ) | = p nh h sup x ∈ R ,v ∈ [0 , | ˆ g ◦ n,h ( x, v ) − E ˆ g ◦ n,h ( x, v ) | + o p (1)= p nh h sup | x |≤ c n ,v ∈ [0 , | ˆ g ◦ n,h ( x, v ) − E ˆ g ◦ n,h ( x, v ) | + o p (1)= p nh h sup | x |≤ c n ,v ∈ [0 , | ˆ g n,h ( x, v ) − E ˆ g n,h ( x, v ) | + o p (1) . (8.153)Let X n = { in − : i ∈ {− ⌈ c n ⌉ n , ..., ⌈ c n ⌉ n }} be a grid that approximates each x ∈ [ − c n , c n ] with precision n − , and V n = { in − : i = 1 , ..., n } . Since K, ˜ K are Lipschitzcontinuous, p nh h sup | x − x ′ |≤ n − , | v − v ′ |≤ n − (cid:12)(cid:12)(cid:0) ˆ g n,h ( x, v ) − E ˆ g n,h ( x, v ) (cid:1) − (cid:0) ˆ g n,h ( x ′ , v ) − E ˆ g n,h ( x ′ , v ) (cid:1)(cid:12)(cid:12) ≤ √ n √ h h sup | x − x ′ |≤ n − , | v − v ′ |≤ n − h L ˜ K | K | ∞ | x − x ′ | h + L K | ˜ K | ∞ | v − v ′ | h i = O ( n − ) . 
(8.154)10We conclude from (8.149), (8.153) and (8.154) that p nh h sup x ∈ R ,v ∈ [0 , | ˆ g n,h ( x, v ) − E ˆ g n,h ( x, v ) | = p nh h sup x ∈X n ,v ∈ V n | ˆ g n,h ( x, v ) − E ˆ g n,h ( x, v ) | + O p (1) (8.155)It was already shown that Assumption 2.8 is satisfied. Furthermore, we can choose D n = | K | ∞ , D ∞ ν ,n = | K | ∞ √ h with ν = ∞ , and ¯ F ( z, u ) = sup f ∈F ¯ f ( z, u ) ≤ | ˜ K | ∞ √ h =: C ¯ F ,n . Notethat E [( p h ˜ K h ( X i − x )) ] = E Z ˜ K ( u ) g ε (cid:0) x + ωh − m ( X i − , in ) σ ( X i − , i/n ) (cid:1) dω ≤ | g ε | ∞ Z ˜ K ( u ) du, therefore k f x,v k ,n ≤ D n | g ε | ∞ Z ˜ K ( u ) du, which implies σ := sup n ∈ N sup f ∈F V ( f ) < ∞ . Due to ∆( k ) = O ( j − αs ), the last conditionin (4.3) is fulfilled if sup n ∈ N log( n ) nh h α ( s ∧
12 ) α ( s ∧
12 ) − < ∞ . By Corollary 4.3, we have p nh h sup x ∈X n ,v ∈ V n (cid:12)(cid:12) ˆ g n,h ( x ) − E ˆ g n,h ( x, v ) (cid:12)(cid:12) = sup f ∈F | G n ( f ) | = O p ( p log |F| ) = O ( p log( n )) . With (8.155), it follows that p nh h sup x ∈ R ,v ∈ [0 , | ˆ g n,h ( x, v ) − E ˆ g n,h ( x, v ) | = O p (cid:0)p log( n ) (cid:1) . We have for the distribution function of X i , G X i ( x ) = E G ε (cid:0) x − m ( X i − , i/n ) σ ( X i − , i/n ) (cid:1) , (8.156)and after differentiating we obtain the density of X i , g X i ( x ) = E Z ˜ K ( ω ) 1 σ ( X i − , i/n ) g ε (cid:0) x − m ( X i − , i/n ) σ ( X i − , i/n ) (cid:1) dω. (8.157) mpirical process theory for locally stationary processes (cid:12)(cid:12) E ˆ g n,h ( x, v ) − n n X i =1 K h ( i/n − v ) g X i ( x ) (cid:12)(cid:12) ≤ n n X i =1 K h ( i/n − v ) · E Z ˜ K ( ω ) 1 σ ( X i − , i/n ) × (cid:12)(cid:12)(cid:12) g ε (cid:0) x + ωh − m ( X i − , i/n ) σ ( X i − , i/n ) (cid:1) − g ε (cid:0) x − m ( X i − , i/n ) σ ( X i − , i/n ) (cid:1)(cid:12)(cid:12)(cid:12) dω ≤ | g ′ | ∞ | K | ∞ σ min Z ˜ K ( ω ) | ω | dω · h . (8.158)A similar derivation as in (8.157) yields g ˜ X i ( v ) ( x ) = E σ ( ˜ X i − ( v ) , v ) g ε (cid:0) x − m ( ˜ X i − ( v ) , v ) σ ( ˜ X i − ( v ) , v ) (cid:1) . By bounded variation of K ( · ), (6.12) and (6.9), we have with some constant C > (cid:12)(cid:12) n n X i =1 K h ( i/n − v ) g X i ( x ) − g ˜ X i ( v ) ( x ) (cid:12)(cid:12) = 1 n n X i =1 K h ( i/n − v ) (cid:12)(cid:12) g X i ( x ) − g ˜ X i ( v ) ( x ) (cid:12)(cid:12) + O (( nh ) − ) ≤ Cn n X i =1 K h ( i/n − v ) (cid:0) E | m ( X i − , i/n ) − m ( ˜ X i − ( v ) , v ) | s ∧ + E | σ ( X i − , i/n ) − σ ( ˜ X i − ( v ) , v ) | s ∧ (cid:1) + O (( nh ) − ) . = O ( n − ς ( s ∧ + h ς ( s ∧ + ( nh ) − ) . (8.159) Details of Example 6.10.
Calculation of bracketing numbers:
By the smoothness as-sumptions on m , | m ( x, u ) | ≤ C m + χ m | x | . Thus k m ( X i − , u ) k s s ≤ C sm + χ sm k X i − k s s ≤ C sm + χ sm C sX =: C sM , and similarly, k σ ( X i − , u ) k s s ≤ C sσ + χ sσ C sX =: C s Σ .Let γ > C γ = max { C M , C Σ } ( γ | K | ∞ ) − s . Define A := {| m ( X i − , in ) | ≤ C γ } ∩ { σ ( X i − , in ) ≤ C γ } . P ( A c ) ≤ (cid:16) k m ( Z i , in ) k s C γ (cid:17) s + (cid:16) k σ ( Z i , in ) k s C γ (cid:17) s ≤ (cid:0) C M C γ (cid:1) s ≤ γ | K | ∞ . (8.160)Define x N := C γ + C ε C γ ( γ | K | ∞ ) − s , x N +1 = ∞ , x − ( N +1) := −∞ , x − N := − x N and x j := x + γ σ min | g ε | ∞ | K | ∞ ( N + j ) , j = − ( N − , ..., N − , where N := ⌈ x N | g ε | ∞ | K | ∞ γ σ min ⌉ . Then, x N − x N − = 2 x N − γ σ min | g ε | ∞ | K | ∞ (2 N − ≤ x N − γ σ min | g ε | ∞ | K | ∞ · (cid:16) x N | g ε | ∞ | K | ∞ γ σ min − (cid:17) ≤ γ σ min | g ε | ∞ | K | ∞ , that is, the brackets [ f x j − , f x j ] = { f ∈ F : f x j − ≤ f ≤ f x j } , j = − N, ..., N + 1 cover F . We now show that k f x j − − f x j k ,n ≤ γ , j = − N, ..., N + 1.By Markov’s inequality, we have for x ∈ R that G ε ( x ) = P ( ε ≤ x ) ≥ − k ε k s s x s ≥ − C sε x − s . On the event A , we have G ε (cid:16) x N − m ( X i − , in ) σ ( i/n ) (cid:17) ≥ G ε (cid:16) C γ − m ( X i − , in ) σ ( i/n ) + C ε C γ σ ( i/n ) (cid:0) γ | K | ∞ (cid:1) − s (cid:17) ≥ G ε (cid:0) C ε (cid:0) γ | K | ∞ (cid:1) − s (cid:1) ≥ − γ | K | ∞ . (8.161)If j = N + 1, we have by (8.160) and (8.161): E h G ε (cid:16) x j − m ( X i − , in ) σ ( i/n ) (cid:17) − G ε (cid:16) x j − − m ( X i − , in ) σ ( i/n ) (cid:17)i / = (cid:16) E h(cid:16) − G ε (cid:16) x N − m ( X i − , in ) σ ( i/n ) (cid:17)(cid:17) A i + P ( A c ) (cid:17) / ≤ (cid:16) γ | K | ∞ + 2 γ | K | ∞ (cid:17) / = γ | K | ∞ , that is, k f x N +1 − f x N k ,n ≤ γ . A similar calculation holds for j = − N . 
mpirical process theory for locally stationary processes j ∈ { , ..., N } , we have by definition of x j , k f x j − − f x j k ,n = (cid:16) nh n X i =1 K (cid:0) i/n − vh (cid:1) E [( { x j −
0. We conclude that q H ( γ, F , k · k ,n ) ≤ r log( C N ) + (2 + 2 s ) log( γ − ) , that is, as long as α ( s ∧ ) >
1, sup n ∈ N R p H ( ε, F , V ) dε < ∞ . Then, the conditions ofCorollary 4.14 are fulfilled and we obtain that (6.15) converges to [ G ( f )] f ∈F in ℓ ∞ ( F ).Similar as in (8.156), we have G X i ( x ) = E G ε (cid:0) x − m ( X i − , i/n ) σ ( X i − , i/n ) (cid:1) , G ˜ X i − ( v ) ( x ) = E G ε (cid:0) x − m ( ˜ X i − ( v ) , v ) σ ( ˜ X i − ( v ) , v ) (cid:1) . By bounded variation of K , we obtain with a similar calculation as in (8.159) that (cid:12)(cid:12) E ˆ G n,h ( x, v ) − G ˜ X ( v ) ( x ) (cid:12)(cid:12) ≤ n n X i =1 K h ( i/n − v ) (cid:12)(cid:12) G X i ( x ) − G ˜ X ( v ) ( x ) (cid:12)(cid:12) + O (( nh ) − )= O ( n − ς ( s ∧ + h ς ( s ∧ + ( nh ) − ) . V -norm and connected quantities Lemma 8.13 (Summation of polynomial and geometric decay) . Let α > and q ∈ N .Then it holds that (i) α − q − α +1 ≤ ∞ X j = q j − α ≤ max { α, − α +1 } α − q − α +1 . (ii) For σ > , κ ≥ b ρ,κ ,l · σ · log( σ − ) ≤ ∞ X j =1 min { σ, κ ρ j } ≤ b ρ,κ · σ · log( σ − ∨ e ) ,b α,κ ,l · σ · σ − α ≤ ∞ X j =1 min { σ, κ j − α } ≤ b α,κ · σ · max { σ − α , } , where b ρ,κ , b ρ,κ ,l , b α,κ , b α,κ ,l are constants only depending on ρ, κ , α . Proof of Lemma 8.13. (i) Upper bound: If q ≥
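2, then
$$\sum_{j=q}^{\infty} j^{-\alpha} = \sum_{j=q}^{\infty} \int_{j-1}^{j} j^{-\alpha}\,dx \le \sum_{j=q}^{\infty} \int_{j-1}^{j} x^{-\alpha}\,dx = \int_{q-1}^{\infty} x^{-\alpha}\,dx = \frac{1}{-\alpha+1}\,x^{-\alpha+1}\Big|_{q-1}^{\infty} = \frac{1}{\alpha-1}(q-1)^{-\alpha+1}$$
$$= \frac{1}{\alpha-1}\,q^{-\alpha+1}\cdot\Big(\frac{q-1}{q}\Big)^{-\alpha+1} \le \frac{2^{\alpha-1}}{\alpha-1}\,q^{-\alpha+1}.$$
If $q = 1$, then
$$\sum_{j=q}^{\infty} j^{-\alpha} = 1 + \sum_{j=q+1}^{\infty} j^{-\alpha} \le 1 + \frac{1}{\alpha-1}\,q^{-\alpha+1} = \frac{\alpha}{\alpha-1}.$$
Lower bound: Using similar decomposition arguments as above, we have
$$\sum_{j=q}^{\infty} j^{-\alpha} \ge \sum_{j=q}^{\infty} \int_{j}^{j+1} x^{-\alpha}\,dx = \int_{q}^{\infty} x^{-\alpha}\,dx = \frac{1}{-\alpha+1}\,x^{-\alpha+1}\Big|_{q}^{\infty} = \frac{1}{\alpha-1}\,q^{-\alpha+1}.$$
(ii) • Exponential decay: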
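The two-sided bound of part (i), $\frac{1}{\alpha-1}q^{-\alpha+1} \le \sum_{j\ge q} j^{-\alpha} \le \frac{\max\{\alpha, 2^{\alpha-1}\}}{\alpha-1}q^{-\alpha+1}$, can be checked numerically. The following Python sketch is only an illustration and not part of the proof; the function names and the truncation point `cutoff` are assumptions made here. It encloses the infinite sum between two computable numbers, using the same integral comparison as in the proof for the remainder, and compares this enclosure with the bounds of the lemma.

```python
import math


def tail_sum_bounds(alpha, q, cutoff=100_000):
    """Enclose sum_{j >= q} j^(-alpha) between two numbers (alpha > 1, q <= cutoff).

    The terms j = q, ..., cutoff are summed directly. The remainder
    sum_{j > cutoff} j^(-alpha) lies between the integrals of x^(-alpha)
    over [cutoff + 1, inf) and over [cutoff, inf).
    """
    partial = math.fsum(j ** (-alpha) for j in range(q, cutoff + 1))
    rest_lo = (cutoff + 1) ** (1.0 - alpha) / (alpha - 1.0)
    rest_hi = cutoff ** (1.0 - alpha) / (alpha - 1.0)
    return partial + rest_lo, partial + rest_hi


def lemma_bounds(alpha, q):
    """Lower and upper bound from part (i) of the lemma."""
    base = q ** (1.0 - alpha) / (alpha - 1.0)
    return base, max(alpha, 2.0 ** (alpha - 1.0)) * base


def check(alpha, q):
    """True if the enclosure of the sum lies inside the bounds of part (i)."""
    s_lo, s_hi = tail_sum_bounds(alpha, q)
    lower, upper = lemma_bounds(alpha, q)
    return lower <= s_lo and s_hi <= upper


if __name__ == "__main__":
    for alpha in (1.5, 2.0, 3.5):
        for q in (1, 2, 10, 50):
            print(alpha, q, check(alpha, q))
```

The case $q = 1$ uses the constant $\alpha$ and the case $q \ge 2$ the constant $2^{\alpha-1}$, which is why the maximum of the two appears in the upper bound.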
Upper bound: First let a := max {⌊ log( σ/κ )log( ρ ) ⌋ , } + 1. Thenwe have ∞ X j =0 min { σ, κ ρ j } ≤ a − X j =0 σ + κ ∞ X j = a ρ j = aσ + κ ρ a − ρ ≤ aσ + κ − ρ min { σκ , } ≤ aσ + σ − ρ ≤ σ · h ρ − ) max { log( κ /σ ) , } + 21 − ρ i ≤ σ · h ρ − ) max { log( σ − ) , } + log( κ ) ∨ ρ − ) + 21 − ρ i ≤ b ρ,κ · σ · log( σ − ∨ e ) , where b ρ,κ := 2(log( κ ) ∨ · ρ − ) (cid:2) ρ − )1 − ρ (cid:3) . mpirical process theory for locally stationary processes β ( q ) = κ P ∞ j = q ρ j = κ − ρ ρ q . Then ∞ X j =1 min { σ, κ ρ j } ≥ σ (ˆ q −
1) + β (ˆ q ) , where ˆ q = min { q ∈ N : σκ ≥ ρ q } . We have ˆ q ≥ log( σ/κ )log( ρ ) =: q and ˆ q ≤ q + 1.Thus ∞ X j =1 min { σ, κ ρ j } ≥ σ ( q −
1) + β ( q + 1) . Now consider the case σκ < ρ , that is, log( σ/κ )log( ρ ) ≥
2. Then, q − ≥ q , and q ≤ log( σ/κ )log( ρ ) . We obtain ∞ X j =1 min { σ, κ ρ j } ≥ σ log( σ/κ )log( ρ ) + κ ρ − ρ ρ log( σ/κ ρ ) = 12 σ log( σ/κ )log( ρ ) + ρ − ρ σ ≥ (cid:16) ρ − ρ + 1log( ρ − ) (cid:17) σ log( σ − κ ) , that is, the assertion holds with b ρ,κ ,l := (cid:0) ρ − ρ + ρ − ) (cid:1) . • Polynomial decay:
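Upper bound: Let $a := \lfloor (\sigma/\kappa)^{-1/\alpha} \rfloor + 1 \ge (\sigma/\kappa)^{-1/\alpha}$. Then we have by (i):
$$\sum_{j=1}^{\infty} \min\{\sigma, \kappa j^{-\alpha}\} \le \sum_{j=1}^{a} \sigma + \kappa \sum_{j=a+1}^{\infty} j^{-\alpha} \le a\sigma + \frac{\kappa}{\alpha-1}\,a^{-\alpha+1} \le a\sigma + \frac{\kappa^{1/\alpha}}{\alpha-1}\,\sigma^{\frac{\alpha-1}{\alpha}}$$
$$\le \sigma\cdot\Big[\kappa^{1/\alpha}\sigma^{-1/\alpha} + 1 + \frac{\kappa^{1/\alpha}}{\alpha-1}\,\sigma^{-1/\alpha}\Big] \le \sigma\cdot\Big[\frac{\alpha}{\alpha-1}\,\kappa^{1/\alpha}\sigma^{-1/\alpha} + 1\Big] \le b_{\alpha,\kappa}\cdot\sigma\cdot\max\{\sigma^{-1/\alpha}, 1\},$$
where $b_{\alpha,\kappa} := \frac{2\alpha}{\alpha-1}(\kappa \vee 1)^{1/\alpha}$.

Lower bound: Put $\beta(q) = \kappa \sum_{j=q}^{\infty} j^{-\alpha}$. By (i), $\beta(q) \ge \frac{\kappa}{\alpha-1}\,q^{-\alpha+1}$. Then
$$\sum_{j=1}^{\infty} \min\{\sigma, \kappa j^{-\alpha}\} \ge \min_{q\in\mathbb{N}}\{\sigma q + \beta(q)\} \ge \min_{q\in\mathbb{N}}\Big\{\sigma q + \frac{\kappa}{\alpha-1}\,q^{-\alpha+1}\Big\}.$$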
16 Elementary analysis yields that the minimum is achieved for q = κ α · σ − a =( κ σ ) α , that is, ∞ X j =1 min { σ, κ j − α } ≥ αα − κ α · σ α − α , the assertion holds with b α,κ ,l := αα − κ α . Lemma 8.14 (Values of q ∗ , r ( δ )) . • Polynomial decay ∆( j ) = κj − α ( α > ). Thenthere exist constants c ( i ) α,κ , C ( i ) α,κ > , i = 1 , only depending on κ, α such that c (1) α,κ max { x − α , } ≤ q ∗ ( x ) ≤ C (1) α,κ max { x − α , } , and c (2) α,κ min { δ αα − , δ } ≤ r ( δ ) ≤ C (2) α,κ min { δ αα − , δ } . • Geometric decay ∆( j ) = κρ j ( ρ ∈ (0 , ). Then there exist constants c ( i ) ρ,κ , C ( i ) ρ,κ > , i = 1 , only depending on κ, ρ such that c (1) ρ,κ max { log( x − ) , } ≤ q ∗ ( x ) ≤ C (1) ρ,κ max { log( x − ) , } , and c (2) ρ,κ δ log( δ − ∨ e ) ≤ r ( δ ) ≤ C (2) ρ,κ δ log( δ − ∨ e ) . Proof of Lemma 8.14. (i) By Lemma 8.13(i), β norm ( q ) = β ( q ) q ∈ [ c α,κ q − α , C α,κ q − α ]with c α,κ = κα − , C α,κ = κ max { α, − α +1 } α − . In the following we assume w.l.o.g. that C α,κ > c α,κ < • q ∗ ( x ) Upper bound: For any x > q ∗ ( x ) = min { q ∈ N : β norm ( q ) ≤ x } ≤ min { q ∈ N : q ≥ ( xC α,κ ) − α } = ⌈ ( xC α,κ ) − α ⌉ . Especially we obtain q ∗ ( x ) ≤ ( xC α,κ ) − α + 1 ≤ C α α,κ max { x − α , } . The asser-tion holds with C (1) α,κ := 2 max { C α,κ , } α . • q ∗ ( x ) Lower bound: Similarly to above, q ∗ ( x ) ≥ ⌈ ( xc α,κ ) − α ⌉ ≥ (cid:0) xc α,κ (cid:1) − α = c α α,κ x − α . On the other hand, q ∗ ( x ) ≥ ≥ c α α,κ , which yields the assertion with c (1) α,κ =min { c α,κ , } α . mpirical process theory for locally stationary processes • r ( δ ) Upper bound: Put r = 2 αα − c − α − α,κ δ αα − . Then we have q ∗ ( r ) r ≥ ⌈ ( rc α,κ ) − α ⌉ r = 2 αα − c − α − α,κ ⌈ − α − c α − α,κ δ − α − ⌉ δ αα − ≥ δ > δ. By definition of r ( · ), r ( δ ) ≤ r . It was already shown in Lemma 3.6(i) that r ( δ ) ≤ δ holds for all δ >
0. We obtain the assertion with C (2) α,κ = 2 αα − c − α − α,κ . • r ( δ ) Lower bound: First consider the case δ < C α,κ .Put r = 2 − αα − C − α − α,κ δ αα − . Since x := 2 α − C α − α,κ δ − α − > ⌈ x ⌉ ≤ x andthus q ∗ ( r ) r ≤ ⌈ ( rC α,κ ) − α ⌉ r = 2 − αα − C − α − α,κ ⌈ α − C α − α,κ δ − α − ⌉ δ αα − ≤ · − δ ≤ δ. By definition of r ( · ), r ( δ ) ≥ r = 2 − αα − min { ( δC α,κ ) α − , } δ .In the case δ > C α,κ , we have q ∗ ( δ ) δ = ⌈ ( δC α,κ ) − α ⌉ δ ≤ · δ ≤ δ, thus r ( δ ) ≥ δ = min { ( δC α,κ ) α − , } δ ≥ − αα − min { ( δC α,κ ) α − , } δ . We con-clude that the assertion holds with c (2) α,κ = 2 − αα − C − α − α,κ .(ii) We have β norm ( q ) = β ( q ) q = C ρ,κ ρ q q , where C ρ,κ = κρ − ρ . In the following we assumew.l.o.g. that C ρ,κ > • q ∗ ( x ) Upper bound: Put ψ ( x ) = max { log( x − ) , } . Define ˜ q = ⌈ ψ ( xCρ,κ log( ρ − )log( ρ − ) ⌉ .Then we have β norm (˜ q ) ≤ C ρ,κ ρ log( (cid:0) xCρ,κ log( ρ − (cid:1) − ) / log( ρ − ) ˜ q ≤ x log( ρ − ) ˜ q ≤ xψ ( xC ρ,κ log( ρ − ) ) ≤ x, thus q ∗ ( x ) = min { q ∈ N : β norm ( q ) ≤ x } ≤ ˜ q = l ψ ( xC ρ,κ log( ρ − ) )log( ρ − ) m . Especially we obtain q ∗ ( x ) ≤ ρ − ) (cid:0) ψ ( x )+log( C ρ,κ log( ρ − (cid:1) +1 ≤ C ρ,κ log( ρ − )))log( ρ − ) ψ ( x ) , that is, the assertion holds with C (1) ρ,κ = C ρ,κ log( ρ − ρ − ) .18 • q ∗ ( x ) Lower Bound: Case 1: Assume that x < C ρ,κ log( ρ − ) ρ . Define ˜ q = ⌈
14 log(( xCρ,κ log( ρ − ) − )log( ρ − ) ⌉ ≥
1. Then ˜ q ≤
12 log(( xCρ,κ log( ρ − ) − )log( ρ − ) , and thus β norm (˜ q ) ≥ C ρ,k (cid:16) xC ρ,κ log( ρ − ) (cid:17) / ˜ q ≥ ( C ρ,κ log( ρ − )) / x / log(( xC ρ,κ log( ρ − ) ) − / ) > x since ( xC ρ,κ log( ρ − ) ) − / > log(( xC ρ,κ log( ρ − ) ) − / ) . We have therefore shown that for x < C ρ,κ log( ρ − ) ρ , q ∗ ( x ) ≥ ˜ q = max { , ˜ q } . (8.162)Case 2: If x ≥ C ρ,κ log( ρ − ) ρ , then ˜ q ≤
1, that is, q ∗ ( x ) ≥ { , ˜ q } . We have shown that for all x > q ∗ ( x ) ≥ max { , ˜ q } . Since˜ q ≥
14 log(( xC ρ,κ log( ρ − ) ) − )log( ρ − ) ≥
14 log( ρ − ) (cid:2) log( x − ) + log( C ρ,κ log( ρ − )) (cid:3) ≥
14 log( ρ − ) log( x − ) , the assertion follows with c (1) ρ,κ =
14 log( ρ − ) . • r ( δ ) Upper bound: Put ˜ r = c (1) ρ,κ ) − δ log((2 − c (1) ρ,κ δ − ) ∨ e ) . Then we have q ∗ (˜ r )˜ r ≥ c (1) ρ,κ log(˜ r − ∨ e ) · ˜ r = 2 δ log((2 − c (1) ρ,κ δ − ) ∨ e ) · log([2 − c (1) ρ,κ δ − log((2 − c (1) ρ,κ δ − ) ∨ e )] ∨ e ) ≥ δ log((2 − c (1) ρ,κ δ − ) ∨ e ) · log([2 − c (1) ρ,κ δ − ] ∨ e ) = 2 δ > δ. By definition of r ( · ), we obtain r ( δ ) ≤ ˜ r. mpirical process theory for locally stationary processes a ∈ (0 , , ∞ ) → (0 , ∞ ) , x log( x − ∨ e )log(( ax − ) ∨ e ) attains itsmaximum at x = ae − with maximum value 1 + log( a − ). Thus˜ r ≤ c (1) ρ,κ ) − (1 + log(2 − ( c (1) ρ,κ ) − )) · δ log( δ − ∨ e ) , that is, the assertion holds with C (2) ρ,κ = 2( c (1) ρ,κ ) − (1 + log(2 − ( c (1) ρ,κ ) − )). • r ( δ ) Lower Bound: Put ˜ r = − ( C (1) ρ,κ ) − δ log((2 C (1) ρ,κ δ − ) ∨ e ) . Then q ∗ (˜ r )˜ r ≤ C (1) ρ,κ log(˜ r − ∨ e ) · ˜ r = 2 − δ log((2 C (1) ρ,κ δ − ) ∨ e ) · log([2 C (1) ρ,κ δ − log((2 C (1) ρ,κ δ − ) ∨ e )] ∨ e ) ≤ − δ log(( C (1) ρ,κ δ − ) ∨ e ) · (cid:2) log((2 C (1) ρ,κ δ − ) ∨ e ) + log log((2 C (1) ρ,κ δ − ) ∨ e ) (cid:3) ≤ δ, where the last step is due to log( x ) + log log( x ) ≤ x ) for x ≥ e . Bydefinition of r ( · ), we obtain r ( δ ) ≥ ˜ r. For a >
1, the function (0 , ∞ ) → (0 , ∞ ) , x log( x − ∨ e )log(( ax − ) ∨ e ) attains its minimumat x = e − with minimum value a ) . We therefore obtain˜ r ≥ ( C (1) ρ,κ ) − C (1) ρ,κ )) δ log( δ − ∨ e ) , that is, the assertion holds with c (2) ρ,κ = ( C (1) ρ,κ ) − C (1) ρ,κ )) . Lemma 8.15 (Form of V ) . (i) Polynomial decay ∆( j ) = κj − α (where α > ): Thenthere exist some constants C (3) α,κ , c (3) α,κ only depending on κ, α, D n such that c (3) α,κ k f k ,n max {k f k − α ,n , } ≤ V ( f ) ≤ C (3) α,κ k f k ,n max {k f k − α ,n , } . (ii) Geometric decay ∆( j ) = κρ j (where ρ ∈ (0 , ): Then there exist some constants c (3) ρ,κ , C (3) ρ,κ only depending on κ, ρ, D n such that c (3) ρ,κ k f k ,n max { log( k f k − ,n ) , } ≤ V ( f ) ≤ C (3) ρ,κ k f k ,n max { log( k f k − ,n ) , } . Proof of Lemma 8.15.
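The assertions follow from Lemma 8.13(ii) by taking the constant there equal to $\kappa D_n$. The maximum in the lower bounds is obtained due to the additional summand $\|f\|_{2,n}$ in $V(f)$.

The following lemma expresses the entropy integral in terms of the well-known bracketing numbers with respect to the $\|\cdot\|_{2,n}$-norm in the case that $\sup_{n\in\mathbb{N}} D_n < \infty$. For this, we use the upper bounds of $V$ given in Lemma 8.15.

Lemma 8.16. (i) Polynomial decay $\Delta(j) = \kappa j^{-\alpha}$ (where $\alpha > 1$). Then for any $\sigma \in (0, C^{(3)}_{\alpha,\kappa})$,
$$\int_0^{\sigma} \sqrt{\mathbb{H}(\varepsilon, \mathcal{F}, V)}\,d\varepsilon \le C^{(3)}_{\alpha,\kappa}\,\frac{\alpha-1}{\alpha} \int_0^{(\sigma/C^{(3)}_{\alpha,\kappa})^{\frac{\alpha}{\alpha-1}}} u^{-1/\alpha} \sqrt{\mathbb{H}(u, \mathcal{F}, \|\cdot\|_{2,n})}\,du,$$
where $C^{(3)}_{\alpha,\kappa}$ is from Lemma 8.15.
(ii) Exponential decay $\Delta(j) = \kappa\rho^{j}$ (where $\rho \in (0,1)$). Then for any $\sigma \in (0, e^{-1} C^{(3)}_{\rho,\kappa})$,
$$\int_0^{\sigma} \sqrt{\mathbb{H}(\varepsilon, \mathcal{F}, V)}\,d\varepsilon \le C^{(3)}_{\rho,\kappa} \int_0^{E^{-}(\sigma/C^{(3)}_{\rho,\kappa})} \log(u^{-1}) \sqrt{\mathbb{H}(u, \mathcal{F}, \|\cdot\|_{2,n})}\,du,$$
where $E^{-}(x) = \frac{x}{\log(x^{-1})}$ and $C^{(3)}_{\rho,\kappa}$ is from Lemma 8.15.

Proof of Lemma 8.16. (i) By Lemma 8.15, $V(f) \le C^{(3)}_{\alpha,\kappa}\,\|f\|_{2,n} \max\{\|f\|_{2,n}^{-1/\alpha}, 1\}$. We abbreviate $c = C^{(3)}_{\alpha,\kappa}$ in the following.

Let $\varepsilon \in (0, c)$ and let $(l_j, u_j)$, $j = 1, \ldots, N$, be brackets such that $\|u_j - l_j\|_{2,n} \le (\varepsilon/c)^{\frac{\alpha}{\alpha-1}}$. Then
$$V(u_j - l_j) \le c \max\big\{\|u_j - l_j\|_{2,n},\ \|u_j - l_j\|_{2,n}^{\frac{\alpha-1}{\alpha}}\big\} \le c \max\Big\{\Big(\frac{\varepsilon}{c}\Big)^{\frac{\alpha}{\alpha-1}}, \frac{\varepsilon}{c}\Big\} \le c\cdot\frac{\varepsilon}{c} = \varepsilon.$$
Therefore, the bracketing numbers fulfill the relation
$$\mathbb{N}(\varepsilon, \mathcal{F}, V) \le \mathbb{N}\Big(\Big(\frac{\varepsilon}{c}\Big)^{\frac{\alpha}{\alpha-1}}, \mathcal{F}, \|\cdot\|_{2,n}\Big).$$
We conclude that for $\sigma \in (0, c)$,
$$\int_0^{\sigma} \sqrt{\mathbb{H}(\varepsilon, \mathcal{F}, V)}\,d\varepsilon \le \int_0^{\sigma} \sqrt{\mathbb{H}\Big(\Big(\frac{\varepsilon}{c}\Big)^{\frac{\alpha}{\alpha-1}}, \mathcal{F}, \|\cdot\|_{2,n}\Big)}\,d\varepsilon = c\,\frac{\alpha-1}{\alpha} \int_0^{(\sigma/c)^{\frac{\alpha}{\alpha-1}}} u^{-1/\alpha} \sqrt{\mathbb{H}(u, \mathcal{F}, \|\cdot\|_{2,n})}\,du.$$
In the last step, we used the substitution $u = (\varepsilon/c)^{\frac{\alpha}{\alpha-1}}$, which leads to
$$\frac{du}{d\varepsilon} = \frac{\alpha}{\alpha-1}\cdot\frac{1}{c}\cdot\Big(\frac{\varepsilon}{c}\Big)^{\frac{1}{\alpha-1}} = \frac{\alpha}{\alpha-1}\cdot\frac{1}{c}\cdot u^{1/\alpha}.$$

(ii) By Lemma 8.15, $V(f) \le C^{(3)}_{\rho,\kappa}\,E(\|f\|_{2,n})$ with $E(x) = x \max\{\log(x^{-1}), 1\}$. We abbreviate $c = C^{(3)}_{\rho,\kappa}$ in the following.

We first collect some properties of $E$. Put $E^{-}(x) = \frac{x}{\log(x^{-1} \vee e)}$. In the case $x > e^{-1}$, we have $E(E^{-}(x)) = x$.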
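In the case $x \le e^{-1}$, we have
$$E(E^{-}(x)) = \frac{x}{\log(x^{-1})}\cdot\log\big(x^{-1}\log(x^{-1})^{-1}\big) \le \frac{x}{\log(x^{-1})}\,\log(x^{-1}) = x.$$
This shows that for all $x > 0$,
$$E(E^{-}(x)) \le x. \tag{8.163}$$
Furthermore, for $x < e^{-1}$,
$$\log(E^{-}(x)^{-1}) = \log\big(x^{-1}\log(x^{-1})\big) \ge \log(x^{-1}). \tag{8.164}$$
Now let $\varepsilon \in (0, 1)$ and let $(l_j, u_j)$, $j = 1, \ldots, N$, be brackets such that $\|u_j - l_j\|_{2,n} \le E^{-}(\varepsilon/c)$. Then by (8.163),
$$V(u_j - l_j) \le c\,E\big(E^{-}(\varepsilon/c)\big) \le c\cdot\frac{\varepsilon}{c} = \varepsilon.$$
Therefore, we have the following relation between the bracketing numbers:
$$\mathbb{N}(\varepsilon, \mathcal{F}, V) \le \mathbb{N}\big(E^{-}(\varepsilon/c), \mathcal{F}, \|\cdot\|_{2,n}\big).$$
We conclude that for $\sigma \in (0, c e^{-1})$,
$$\int_0^{\sigma} \sqrt{\mathbb{H}(\varepsilon, \mathcal{F}, V)}\,d\varepsilon \le \int_0^{\sigma} \sqrt{\mathbb{H}\big(E^{-}(\varepsilon/c), \mathcal{F}, \|\cdot\|_{2,n}\big)}\,d\varepsilon \le c \int_0^{E^{-}(\sigma/c)} \log(u^{-1}) \sqrt{\mathbb{H}(u, \mathcal{F}, \|\cdot\|_{2,n})}\,du.$$
In the last step, we used the substitution $u = E^{-}(\varepsilon/c)$, which leads to
$$\frac{du}{d\varepsilon} = \frac{1}{c}\cdot\frac{1 + \log((\varepsilon/c)^{-1})}{\log((\varepsilon/c)^{-1})^{2}},$$
and with (8.164) we obtain
$$d\varepsilon = c\,\frac{\log((\varepsilon/c)^{-1})^{2}}{1 + \log((\varepsilon/c)^{-1})}\,du \le c \log((\varepsilon/c)^{-1})\,du \le c \log\big(E^{-}(\varepsilon/c)^{-1}\big)\,du = c \log(u^{-1})\,du.$$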