The maximum of the periodogram of Hilbert space valued time series
Clément Cerovecki^a, Vaidotas Characiejus^b, and Siegfried Hörmann^c

^a Département de mathématique, Université libre de Bruxelles, Belgium
^a Department of Mathematics, Katholieke Universiteit Leuven, Belgium
^b Department of Statistics, University of California, Davis, USA
^c Institute of Statistics, Graz University of Technology, Graz, Austria
July 4, 2020
Abstract
We are interested in detecting periodic signals in Hilbert space valued time series when the length of the period is unknown. A natural test statistic is the maximum Hilbert-Schmidt norm of the periodogram operator over all fundamental frequencies. In this paper we analyze the asymptotic distribution of this test statistic. We consider the case where the noise variables are independent and then generalize our results to functional linear processes. Details for implementing the test are provided for the class of functional autoregressive processes. We illustrate the usefulness of our approach by examining air quality data from Graz, Austria. The accuracy of the asymptotic theory in finite samples is evaluated in a simulation experiment.
Keywords: periodogram, periodicities, spectral analysis, time series, functional data, hypothesis testing.
MSC2020:
1 Introduction

Periodic characteristics are present in many time series due to various factors such as different seasons, meteorological phenomena, human economic activity, transport, etc. The interest in detecting, analyzing and modeling periodicities goes back to the origins of time series analysis (for example, Schuster [34], Walker [39], Yule [41], Fisher [10], Grenander and Rosenblatt [12], Jenkins and Priestley [20], Hannan [14], Shimshoni [36], to name just a few).

The primary motivation of this paper is to develop a methodology to detect periodicities in functional time series (FTS). An FTS is a sequence $\{X_t\}_{t \ge 1}$, where each $X_t$ is a curve $\{X_t(u)\}_{u \in \mathcal{U}}$. FTS have been gaining interest in recent years due to the advances of modern technology and the availability of high frequency data. Frequently, FTS arise from measurements obtained by separating a continuous time process $\{Y(u)\}_{u \ge 0}$ into natural consecutive intervals, for instance, days. Then, in an appropriate time scale, we have $X_t(u) = Y(t + u)$ for $u \in \mathcal{U} = [0, 1)$. Examples include electricity price curves (Liebl [24]), high frequency asset price data (Horváth et al. [18]), daily pollution level curves (Aue et al. [1]), daily vehicle traffic curves (Klepsch et al. [22]), etc. It should be noted that such a segmentation already accounts for a periodic structure in the underlying continuous time process. For example, when we segment into daily data, it is because we expect a similar daily fluctuation in each curve. Our interest is then to investigate if there remains a periodic behavior with respect to the discrete time parameter $t$.

While this problem is well explored in the univariate setting (see Section 10.2 of Brockwell and Davis [3] for an overview of the classical tests), developments in the multivariate or functional context are restricted to periodicity tests where the length of the period is known (see MacNeill [26] and Hörmann, Kokoszka, and Nisol [16]).
This paper is motivated by the interest in testing for an unspecified period, which makes the problem considerably more complex and requires an entirely different theoretical approach. Testing for an unspecified period (in residuals or raw data) is relevant, because periodic behavior can have diverse causes and quite often is not evident. Even though sometimes we expect that the data contains, for example, a weekly or monthly periodic component, there are situations when the period of a latent signal is not so evident. For instance, the solar cycle is a nearly periodic 11-year change in the Sun's activity, measured in terms of variations in the number of observed sunspots on the solar surface, discovered by Schwabe [35]. In Section 4 we also show that our test indicates an unexpected periodic component in the air quality data set from Graz, Austria.

Our test is based on the frequency domain approach to FTS analysis, which is rather natural in this context. This topic has been gaining a significant amount of attention in recent years and it is very useful in various problems (see, for example, Panaretos and Tavakoli [29], Hörmann, Kidziński, and Hallin [19], Zhang [42], Characiejus and Rice [4] among others). For the theoretical developments which follow we consider time series with values in an abstract separable Hilbert space. In this way we cover functional and multivariate data. For the latter our results are also new.

Before we describe our approach in detail, we introduce notation that is used throughout the paper. Suppose that $H$ is a real separable Hilbert space equipped with an inner product $\langle\cdot,\cdot\rangle \colon H \times H \to \mathbb{R}$ and the corresponding norm $\|\cdot\| \colon H \to [0, \infty)$. The complexification of $H$ is denoted by $H_{\mathbb{C}} := H \oplus \mathrm{i}H$ and the space $H_{\mathbb{C}}$ inherits the Hilbert space structure from $H$.
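As a concrete finite-dimensional illustration (our own sketch, not part of the paper): for $H = \mathbb{R}^d$ the complexification $H_{\mathbb{C}}$ is just $\mathbb{C}^d$, and the complexified inner product $\langle u, v\rangle = \langle u_1, v_1\rangle + \langle u_2, v_2\rangle + \mathrm{i}(\langle u_2, v_1\rangle - \langle u_1, v_2\rangle)$ for $u = u_1 + \mathrm{i}u_2$, $v = v_1 + \mathrm{i}v_2$ is nothing but the usual Hermitian inner product $\sum_k u_k \overline{v_k}$:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
u1, u2, v1, v2 = (rng.standard_normal(d) for _ in range(4))
u, v = u1 + 1j * u2, v1 + 1j * v2

# complexified inner product, assembled from the four real inner products
lhs = (u1 @ v1 + u2 @ v2) + 1j * (u2 @ v1 - u1 @ v2)
# Hermitian inner product on C^d: sum_k u_k * conj(v_k)
# (np.vdot conjugates its *first* argument, hence the order (v, u))
rhs = np.vdot(v, u)

assert np.allclose(lhs, rhs)
```

In particular $\|u\|^2 = \langle u, u\rangle = \|u_1\|^2 + \|u_2\|^2$, which is the squared norm that appears in the periodogram statistics below.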
The complex inner product is defined as $\langle u, v\rangle_{H_{\mathbb{C}}} = \langle u_1, v_1\rangle + \langle u_2, v_2\rangle + \mathrm{i}(\langle u_2, v_1\rangle - \langle u_1, v_2\rangle)$ for any $u = u_1 + \mathrm{i}u_2$ and $v = v_1 + \mathrm{i}v_2$ in $H_{\mathbb{C}}$ with $u_1, u_2, v_1, v_2 \in H$. For easier notation, we henceforth consider $H$ as a subspace of $H_{\mathbb{C}}$ and use $\langle\cdot,\cdot\rangle$ for both the real and the complex inner product. We do the same for the norm and other definitions to come. $\mathcal{L}(H)$ denotes the space of bounded linear operators on $H$ and it is equipped with the usual operator norm $\|A\| = \sup_{\|x\|=1}\|A(x)\|$. We say that an operator $A$ is Hilbert-Schmidt (trace-class) if its singular values $\{\sigma_k\}_{k \ge 1}$ are square summable (absolutely summable). We define the corresponding Hilbert-Schmidt norm $\|A\|_{\mathcal{S}} = (\sum_{k=1}^{\infty}\sigma_k^2)^{1/2}$ and the trace norm $\|A\|_{\mathcal{T}} = \sum_{k=1}^{\infty}\sigma_k$ (see Weidmann [40] for more details). For $x, y \in H$, the tensor of $x$ and $y$ is the rank one operator $x \otimes y \colon H \to H$ defined by $(x \otimes y)(z) := \langle z, y\rangle x$ for each $z \in H$. In particular this gives rise to the covariance operator $\operatorname{Var}(X) := \mathbb{E}[(X - \mathbb{E}X) \otimes (X - \mathbb{E}X)]$; for $H = \mathbb{R}^d$ this is the usual covariance matrix.

Suppose that $X_1, \ldots, X_n$ are observations of random elements with values in some separable Hilbert space $H$ and define the discrete Fourier transform (DFT) of $X_1, \ldots, X_n$ by setting
$$\mathcal{X}_n(\omega) := \frac{1}{\sqrt{n}}\sum_{t=1}^{n} X_t e^{-\mathrm{i}t\omega}, \qquad n \ge 1,\ \omega \in [-\pi, \pi],$$
where $\mathrm{i} = \sqrt{-1}$. By its very definition $\mathcal{X}_n(\omega)$ is an element of the complex Hilbert space $H_{\mathbb{C}} := H \oplus \mathrm{i}H$ for $\omega \in [-\pi, \pi]$. The periodogram operator is defined by
$$I_n(\omega) := \mathcal{X}_n(\omega) \otimes \mathcal{X}_n(\omega) \tag{1.1}$$
for $n \ge 1$ and $\omega \in [-\pi, \pi]$, and it is a well-known and important tool in time series analysis. It is the main ingredient for estimation of the spectral density operator (see Panaretos and Tavakoli [29]) and it is the key statistic for detection of periodic signals in the data. What is of particular interest is the maximum of the periodogram operator defined by
$$M_n := \max_{j=1,\ldots,q}\|I_n(\omega_j)\|_{\mathcal{S}} = \max_{j=1,\ldots,q}\|\mathcal{X}_n(\omega_j)\|^2, \tag{1.2}$$
where $\omega_1, \ldots, \omega_q$ are the Fourier or fundamental frequencies given by $\omega_j = 2\pi j/n$ with $j = 1, \ldots, q$ and $q = \lfloor (n-1)/2 \rfloor$. In the univariate case the exact distribution of $M_n$ can be derived for independent and identically distributed (iid) Gaussian data (see Fisher [10] as well as Section 10.2 of Brockwell and Davis [3]). Then $M_n$ is the maximum of $q$ iid standard exponential random variables and belongs to the domain of attraction of the standard Gumbel distribution. That is, $M_n - \log q$ converges in distribution to the standard Gumbel distribution as $n \to \infty$ (the cumulative distribution function of the standard Gumbel distribution is given by $F(x) = \exp\{-e^{-x}\}$ for $x \in \mathbb{R}$). If we superimpose a sinusoidal signal $s_t = \alpha\cos(\theta + \omega_j t)$ for some $j \in \{1, \ldots, q\}$ on the observations, then $M_n$ will diverge at a rate proportional to $n$, which in turn leads to a very powerful test statistic.

The assumption of Gaussianity is restrictive and hence an alternative approach is to establish the asymptotic distribution of the appropriately standardized maximum $M_n$ under more general conditions. Walker [38] conjectured that the same result still holds even if the random variables are not normal, provided that the moments of the distribution of $X_1, \ldots, X_n$ up to some sufficiently high order exist. Walker [38] also stated that no proof was known at the time and that the problem of constructing one is undoubtedly extremely difficult. Almost 35 years later, Davis and Mikosch [7] proved that the limit indeed remains the same provided that $\mathbb{E}|X_1|^s < \infty$ with some $s > 2$.

In this paper we test the null hypothesis $H_0$ that the observations are generated by a linear process (no periodic component) against the alternative hypothesis $H_1$ that the observations are generated by a linear process with a superimposed deterministic periodic component with an unknown period. We also establish the consistency of the proposed test (see Theorem 8) without assuming any specific shape or form of the superimposed deterministic periodic component.

The rest of the paper is organized as follows. In Section 2 we formulate our main theorems, which are valid for iid data. In Section 3 we extend these results to linear processes. Then we illustrate in Section 4 how to use our results to construct a test for periodic signals in functional time series at some unknown frequency. We evaluate the finite sample behavior in a simulation study and with a real data example in Section 5. We give a conclusion in Section 6 and provide the proofs in Section 7. In the Appendix, we prove two theorems which are of separate interest and which are needed for proving our main results.

2 Main results

Suppose that $\{X_t\}_{t \ge 1}$ are iid random elements with values in $H$ such that $\mathbb{E}X_1 = 0$ and $\mathbb{E}\|X_1\|^2 < \infty$. Let $\{v_k\}_{k \ge 1}$ be the eigenvectors (principal components) of the covariance operator $\mathbb{E}[X_1 \otimes X_1]$ with their corresponding eigenvalues $\{\lambda_k\}_{k \ge 1}$. The $\{v_k\}_{k \ge 1}$ form an orthonormal basis of $H$ and the $\{\lambda_k\}_{k \ge 1}$ are indexed in non-increasing order. We use the following assumption in some of our results.

Assumption 1. $\lambda_k > \lambda_{k+1}$ for each $k \ge 1$.

We write $V \sim \operatorname{Exp}(\theta)$ to indicate that $V$ follows an exponential distribution with mean $1/\theta$ and $V \sim \operatorname{Hypo}(\theta_1, \ldots, \theta_p)$ if the variable $V$ follows a hypoexponential distribution, i.e.
$V \stackrel{d}{=} \sum_{i=1}^{p} E_i$, where the $E_i$ are independent $\operatorname{Exp}(\theta_i)$ random variables with $1 \le i \le p$. As usual, $N(\mu, \sigma^2)$ denotes the normal distribution with mean $\mu$ and variance $\sigma^2$.

We start by studying the projections of the $X_t$'s onto the space spanned by $\{v_1, \ldots, v_d\}$:
$$X_t^d = \sum_{k=1}^{d}\langle X_t, v_k\rangle v_k, \qquad t \ge 1.$$
The DFT and the periodogram operator of $\{X_t^d\}_{1 \le t \le n}$ are defined by
$$\mathcal{X}_n^d(\omega) = n^{-1/2}\sum_{t=1}^{n} X_t^d e^{-\mathrm{i}t\omega} \quad\text{and}\quad I_n^d(\omega) = \mathcal{X}_n^d(\omega) \otimes \mathcal{X}_n^d(\omega), \tag{2.1}$$
respectively, for $\omega \in [-\pi, \pi]$. Observe that $X_t^d = X_t$ and $\mathcal{X}_n^d(\omega) = \mathcal{X}_n(\omega)$ if $H = \mathbb{R}^d$. So the multivariate setting can be viewed as a special case.

If we assume for the moment that the $X_t$'s are iid Gaussian random elements, then we have that
$$\max_{1 \le j \le q}\|\mathcal{X}_n^d(\omega_j)\|^2 \stackrel{d}{=} \max_{1 \le j \le q}\Big\{\sum_{k=1}^{d}\lambda_k E_{kj}\Big\}, \tag{2.2}$$
where the $E_{kj}$ are independent $\operatorname{Exp}(1)$ random variables for $1 \le k \le d$ and $1 \le j \le q$. This follows from the orthogonality of $\{v_k\}_{k \ge 1}$, which implies that the $\langle X_t, v_k\rangle$ are independent $N(0, \lambda_k)$ random variables. Consequently, the $\|\mathcal{X}_n^d(\omega_j)\|^2$ are independent $\operatorname{Hypo}(\lambda_1^{-1}, \ldots, \lambda_d^{-1})$ random variables. To have a non-degenerate limiting distribution, the variable $M_n$ needs to be centered and scaled. The corresponding sequences depend on the eigenvalues of $\operatorname{Var}(X_1^d)$. If Assumption 1 holds, then we have that $\lambda_1^{-1}(\max_{1 \le j \le q}\|\mathcal{X}_n^d(\omega_j)\|^2 - b_q^d) \stackrel{d}{\to} G$ as $n \to \infty$, where $G$ denotes a standard Gumbel distribution and where
$$b_n^d = \lambda_1\log(n\alpha_{1,d}) \quad\text{with}\quad \alpha_{1,d} = \prod_{j=2}^{d}(1 - \lambda_j/\lambda_1)^{-1} \tag{2.3}$$
(see Lemma 3 in Section 7.2).

If $H = \mathbb{R}^d$ and if $\Sigma := \mathbb{E}[X_t X_t']$ has full rank, we consider the standardized process $\{\Sigma^{-1/2}X_t\}_{t \ge 1}$. Alternatively, we may directly assume that $\operatorname{Var}(X_1) = I_d$, where $I_d$ is the identity matrix. Then $\max_{1 \le j \le q}\|\mathcal{X}_n(\omega_j)\|^2 - c_q \stackrel{d}{\to} G$ as $n \to \infty$, where
$$c_n = \log n + (d-1)\log\log n - \log(d-1)!, \qquad n \ge 1. \tag{2.4}$$

Theorem 1.
Let $\{X_t\}_{t \ge 1}$ be iid random elements in $H$ with $\mathbb{E}\|X_1\|^r < \infty$ for some $r > 2$. Suppose that Assumption 1 holds and $d \ge 1$ is fixed. Then
$$\lambda_1^{-1}\Big(\max_{1 \le j \le q}\|\mathcal{X}_n^d(\omega_j)\|^2 - b_q^d\Big) \stackrel{d}{\to} G \quad\text{as } n \to \infty, \tag{2.5}$$
where $b_q^d$ is given by (2.3).

Theorem 2.
Let $\{X_t\}_{t \ge 1}$ be iid random vectors in $\mathbb{R}^d$ with $\mathbb{E}\|X_1\|^r < \infty$ for some $r > 2$ and $\mathbb{E}[X_1 X_1'] = I_d$, where $I_d$ is the identity matrix. Then
$$\max_{1 \le j \le q}\|\mathcal{X}_n(\omega_j)\|^2 - c_q \stackrel{d}{\to} G \quad\text{as } n \to \infty,$$
where $c_q$ is given by (2.4).

The proofs of Theorem 1 and Theorem 2 are given in Section 7. They rely on a powerful Gaussian approximation due to Chernozhukov et al. [5] (see Proposition 1). If $H = \mathbb{R}$, we recover Theorem 2.1 of Davis and Mikosch [7] as a special case of Theorem 2. To the best of our knowledge, Theorem 1 is the first multivariate generalization of Theorem 2.1 of Davis and Mikosch [7]. We now present an extension of Theorem 1, where we let $d$ grow to infinity as $n \to \infty$.

Theorem 3.
Suppose that $\mathbb{E}\|X_1\|^4 < \infty$ and that Assumption 1 holds. Assume that $\{k\lambda_k\}_{k \ge 1}$ is eventually monotonic, i.e. there exists $k_0 \ge 1$ such that
$$k\lambda_k \ge (k+1)\lambda_{k+1} \tag{2.6}$$
for all $k \ge k_0$. Then convergence (2.5) still holds if $d$ is replaced by a sequence of integers $\{d_n\}_{n \ge 1}$ such that $d_n \to \infty$, and
$$d_n^4\lambda_{d_n}^{-1/2} = o\big(n^{1/6}/\log^{1/2} n\big) \quad\text{and}\quad d_n = O(n^{\gamma}) \tag{2.7}$$
as $n \to \infty$ with
$$\gamma < \min\Big\{\min_{k \ge 2}\big\{k(\lambda_1/\lambda_k - 1)\big\},\ \tfrac{1}{2}\Big\}. \tag{2.8}$$

Since we assume that the $\lambda_k$'s are strictly decreasing and summable, we have that $k\lambda_k = o(1)$ as $k \to \infty$ and hence we only additionally require $\{k\lambda_k\}_{k \ge 1}$ to be eventually monotonic in Theorem 3. The first condition in (2.7) ensures that the Gaussian approximation still holds, while the second condition in (2.7), as well as (2.6) and (2.8), are used to show that the hypoexponential distribution with an increasing number of parameters belongs to the domain of attraction of the Gumbel distribution (see Lemma 4 in Section 7.2). If $d_n \to \infty$ as $n \to \infty$, we can choose a centring sequence $\{b_n\}_{n \ge 1}$ independently of $\{d_n\}_{n \ge 1}$ by setting $b_n = \lim_{d \to \infty} b_n^d$ for $n \ge 1$, where $b_n^d$ is defined by (2.3) (see Lemma 9).

The following theorem establishes a fully functional result, i.e. the convergence in distribution of $\lambda_1^{-1}(M_n - b_q)$ as $n \to \infty$, where $M_n$ is defined by (1.2). The technical conditions are connected with the decay rate of the eigenvalues $\{\lambda_k\}_{k \ge 1}$ of the covariance operator $\operatorname{Var}(X_1)$.

Theorem 4.
Suppose that $\mathbb{E}\|X_1\|^r < \infty$ for some $r \ge 2$ and let Assumption 1 hold. Moreover, suppose that there exists a sequence $\{d_n\}_{n \ge 1}$ which satisfies the conditions of Theorem 3. Consider some sequence $\{\ell_k\}_{k \ge 1}$ of positive numbers such that $\sum_{k=1}^{\infty}\ell_k = 1$ and assume that
$$\sum_{k=1}^{\infty}\ell_k^{-r/2}\,\mathbb{E}|\langle X_1, v_k\rangle|^r < \infty \tag{2.9}$$
and that
$$\sum_{k > d_n}(\lambda_k/\ell_k)^{r/2} = o(1/n) \tag{2.10}$$
as $n \to \infty$. Then $\lambda_1^{-1}(M_n - b_q) \stackrel{d}{\to} G$ as $n \to \infty$, where $b_q = \lim_{d \to \infty} b_q^d$ with $b_q^d$ given by (2.3).

Remark 1. By our assumption $r/2 - 1 \ge 0$, and hence (2.9) implies
$$\sum_{k > d_n}\ell_k^{-r/2}\,\mathbb{E}|\langle X_1, v_k\rangle|^r = o(n^{r/2 - 1}). \tag{2.11}$$
We prove Theorem 4 under the weaker condition (2.11).

If $X_1$ is a Gaussian random element, then $\mathbb{E}|\langle X_1, v_k\rangle|^r = \mathbb{E}|Z|^r \cdot \lambda_k^{r/2}$, where $Z \sim N(0, 1)$, and hence condition (2.10) implies condition (2.9). While under Gaussianity such an equality holds for any $r > 0$, we only need this condition for some fixed $r \ge 4$. To this end, we note that by the Karhunen-Loève expansion any random element $X_1$ in $H$ has the representation
$$X_1 = \sum_{k \ge 1}\langle X_1, v_k\rangle v_k = \sum_{k \ge 1}\sqrt{\lambda_k}\,Z_k v_k,$$
where $\{Z_k\}_{k \ge 1}$ is white noise with mean zero and unit variance. Since $\langle X_1, v_k\rangle = \sqrt{\lambda_k}Z_k$, the condition
$$\sup_{k \ge 1}\mathbb{E}|Z_k|^r = C < \infty \tag{2.12}$$
provides the bound $\mathbb{E}|\langle X_1, v_k\rangle|^r \le C\lambda_k^{r/2}$ for all $k \ge 1$. Consequently, (2.10) together with (2.12) imply (2.9).

Let us provide two examples where the conditions of Theorem 4 are satisfied. We look at the settings where the eigenvalues $\lambda_k$ decay exponentially or polynomially. For numerical sequences $\{\alpha_n\}_{n \ge 1}$ and $\{\beta_n\}_{n \ge 1}$ we write $\alpha_n = \Theta(\beta_n)$ as $n \to \infty$ if there exist $k > 0$, $K > 0$ and $N \ge 1$ such that $k\beta_n \le \alpha_n \le K\beta_n$ for all $n > N$.

Example 1.
Suppose that $\mathbb{E}\|X_1\|^r < \infty$ with $r > 6$ and $\lambda_k = \Theta(\rho^k)$ with $0 < \rho < 1$ as $k \to \infty$. Also, assume that (2.6) as well as (2.12) hold. We choose $d_n = \lfloor c\log(n)\rfloor$ with
$$\frac{2}{r\log(1/\rho)} < c < \frac{1}{3\log(1/\rho)}.$$
Then
$$d_n^4\lambda_{d_n}^{-1/2} = O\big(\log^4(n)\,n^{(c/2)\log(1/\rho)}\big) = o\big(n^{1/6}/\log^{1/2} n\big)$$
as $n \to \infty$ if $c < (3\log(1/\rho))^{-1}$. This shows that (2.7) holds. We set $\ell_k = \varepsilon(1-\varepsilon)^{-1}(1-\varepsilon)^k$ for some $\varepsilon \in (0, 1-\rho)$. Then (2.10) holds since
$$\sum_{k > d_n}(\lambda_k/\ell_k)^{r/2} = O\big((\rho/(1-\varepsilon))^{rd_n/2}\big) = O\big(n^{-(rc/2)\log((1-\varepsilon)/\rho)}\big) = o(1/n)$$
as $n \to \infty$ whenever $c > 2/(r\log(1/\rho))$ and if $\varepsilon$ is small enough. Hence the required conditions hold.

Example 2.
Suppose that $\lambda_k = \Theta(k^{-\nu})$ with $\nu > 1$ as $k \to \infty$. Now choose some large enough $r > 2/(\nu - 1)$ such that for some $\beta$
$$\frac{1}{r(\nu-1)/2 - 1} < \beta < \min\Big\{\frac{1}{3(8+\nu)},\ \min_{k \ge 2}\big\{k(\lambda_1/\lambda_k - 1)\big\},\ \tfrac{1}{2}\Big\} \tag{2.13}$$
and assume that $\mathbb{E}\|X_1\|^r < \infty$. Also, let us assume that (2.6) as well as (2.12) hold. Then we may set $d_n = \lfloor n^{\beta}\rfloor$ and verify condition (2.7) so that Theorem 4 is applicable. To this end we notice that
$$d_n^4\lambda_{d_n}^{-1/2} = O\big(n^{\beta(4+\nu/2)}\big) = o\big(n^{1/6}/\log^{1/2} n\big)$$
as $n \to \infty$ since $\beta < (3(8+\nu))^{-1}$. For the second part of condition (2.7), we require $\beta < \min\{\min_{k \ge 2} k(\lambda_1/\lambda_k - 1),\ 1/2\}$. In order to verify (2.10) we choose $\ell_k$ proportional to $k^{-(1+\varepsilon)}$. Then
$$\sum_{k > d_n}(\lambda_k/\ell_k)^{r/2} = O\Big(\sum_{k > d_n}k^{r(1+\varepsilon-\nu)/2}\Big) = O\big(n^{\beta(r(1+\varepsilon-\nu)/2 + 1)}\big) = o(n^{-1})$$
as $n \to \infty$ if $r > 2/(\nu - 1)$, if $\beta > 1/(r(\nu-1)/2 - 1)$ and if $\varepsilon$ is chosen small enough.

3 Extension to linear processes
We consider an extension of our Theorem 2 and Theorem 4 to linear processes. Suppose that $\{X_t\}_{t \in \mathbb{Z}}$ is a linear process given by
$$X_t = \sum_{k=-\infty}^{\infty} a_k(\varepsilon_{t-k}) \tag{3.1}$$
for each $t \in \mathbb{Z}$, where $\{a_k\}_{k \in \mathbb{Z}} \subset \mathcal{L}(H)$ is such that $\sum_{k=-\infty}^{\infty}\|a_k\| < \infty$ and $\{\varepsilon_t\}_{t \in \mathbb{Z}}$ are iid $H$-valued random elements with zero means. We denote the DFT of $\varepsilon_1, \ldots, \varepsilon_n$ by
$$\mathcal{E}_n(\omega) = n^{-1/2}\sum_{t=1}^{n}\varepsilon_t e^{-\mathrm{i}t\omega}$$
for $\omega \in [-\pi, \pi]$ and $n \ge 1$. We also use the impulse-response operator $A(\omega)$ defined by
$$A(\omega) = \sum_{k=-\infty}^{\infty} a_k e^{-\mathrm{i}k\omega} \tag{3.2}$$
for $\omega \in [-\pi, \pi]$.

The next lemma establishes a relationship between the DFT and the periodogram operator of $X_1, \ldots, X_n$ and the DFT and the periodogram operator of $\varepsilon_1, \ldots, \varepsilon_n$. Essentially, this is a generalization of Theorem 3 of Walker [38] to linear processes with values in separable Hilbert spaces.

Lemma 1. Suppose that $\{X_t\}_{t \in \mathbb{Z}}$ is given by (3.1) and $\sum_{k \ne 0}\log(|k|)\|a_k\| < \infty$. Then
$$\max_{1 \le j \le q}\|\mathcal{X}_n(\omega_j) - A(\omega_j)\mathcal{E}_n(\omega_j)\| = o_P(\log^{-1/2} n) \tag{3.3}$$
and
$$\max_{1 \le j \le q}\|\mathcal{X}_n(\omega_j) \otimes \mathcal{X}_n(\omega_j) - A(\omega_j)\mathcal{E}_n(\omega_j) \otimes A(\omega_j)\mathcal{E}_n(\omega_j)\| = o_P(1) \tag{3.4}$$
as $n \to \infty$, where $A(\omega)$ is given by (3.2) for $\omega \in [-\pi, \pi]$.

We note that we require a weaker summability condition than in Walker [38], where it is assumed that $\sum_{k \ne 0}|k|^{1/2}\|a_k\| < \infty$. Lemma 1 implies that
$$\max_{1 \le j \le q}\|\mathcal{X}_n(\omega_j)\|^2 - \max_{1 \le j \le q}\|A(\omega_j)\mathcal{E}_n(\omega_j)\|^2 = o_P(1) \quad\text{as } n \to \infty. \tag{3.5}$$
With additional assumptions on $A(\omega)$ it is possible to establish the asymptotic distribution of $\max_{1 \le j \le q}\|A^{-1}(\omega_j)\mathcal{X}_n(\omega_j)\|^2$ from that of $\max_{1 \le j \le q}\|\mathcal{E}_n(\omega_j)\|^2$.

Lemma 2.
Suppose that $\{X_t\}_{t \in \mathbb{Z}}$ is given by (3.1), $\sum_{k \ne 0}\log(|k|)\|a_k\| < \infty$, $A^{-1}(\omega)$ exists for each $\omega \in [-\pi, \pi]$ and $\sup_{\omega \in [0, \pi]}\|A^{-1}(\omega)\| < \infty$, where $A(\omega)$ is given by (3.2). Then
$$\max_{1 \le j \le q}\|A^{-1}(\omega_j)\mathcal{X}_n(\omega_j)\|^2 - \max_{1 \le j \le q}\|\mathcal{E}_n(\omega_j)\|^2 = o_P(1) \quad\text{as } n \to \infty.$$

Example 3. Consider an FAR(1) model given by
$$X_t = \rho(X_{t-1}) + \varepsilon_t = \sum_{j=0}^{\infty}\rho^j(\varepsilon_{t-j})$$
for $t \in \mathbb{Z}$ with $\rho \in \mathcal{L}(H)$ such that $\|\rho^{n_0}\| < 1$ for some $n_0 \ge 1$. Since $A(\omega)$ is given by a Neumann series for each $\omega \in [-\pi, \pi]$, we have that $A(\omega) = (I - e^{-\mathrm{i}\omega}\rho)^{-1}$ and hence $A^{-1}(\omega) = I - e^{-\mathrm{i}\omega}\rho$ exists for each $\omega \in [-\pi, \pi]$, and $\sup_{\omega \in [0, \pi]}\|A^{-1}(\omega)\| < \infty$. Lemma 2 allows us to obtain the following theorem.

Theorem 5.
Suppose that $\{X_t\}_{t \in \mathbb{Z}}$ is given by (3.1), the assumptions of Lemma 2 are satisfied and $\{\varepsilon_t\}_{t \in \mathbb{Z}}$ satisfy the assumptions of Theorem 4. Then
$$\lambda_1^{-1}\Big(\max_{1 \le j \le q}\|A^{-1}(\omega_j)\mathcal{X}_n(\omega_j)\|^2 - b_q\Big) \stackrel{d}{\to} G \quad\text{as } n \to \infty.$$
The eigenvalue $\lambda_1$ and those in the definition of $b_q$ are the eigenvalues of the covariance operator $\operatorname{Var}(\varepsilon_1)$.

If we restrict our attention to the multivariate case, i.e. $H = \mathbb{R}^d$, then we can standardize the covariance structure of $\{\varepsilon_t\}_{t \in \mathbb{Z}}$. We have the following result in the finite dimensional setting. We note that in the following theorem we do not require distinct eigenvalues of $\operatorname{Var}(\varepsilon_1)$ as long as they all are positive.

Theorem 6.
Suppose that $H = \mathbb{R}^d$, $\{X_t\}_{t \in \mathbb{Z}}$ is given by (3.1), $\sum_{k \ne 0}\log(|k|)\|a_k\| < \infty$, $A^{-1}(\omega)$ exists for each $\omega \in [-\pi, \pi]$ and $\sup_{\omega \in [0, \pi]}\|A^{-1}(\omega)\| < \infty$, where $A(\omega)$ is given by (3.2) for $\omega \in [-\pi, \pi]$. Suppose that the covariance matrix $\Sigma := \operatorname{Var}(\varepsilon_1)$ is positive definite. Then
$$\max_{1 \le j \le q}\|B^{-1}(\omega_j)\mathcal{X}_n(\omega_j)\|^2 - c_q \stackrel{d}{\to} G \quad\text{as } n \to \infty,$$
where $B(\omega) = A(\omega)\Sigma^{1/2}$ and $c_q$ is given by (2.4).

We conclude by remarking that the spectral density matrix can be expressed as
$$F(\omega) = A(\omega)\Sigma A^*(\omega) = B(\omega)B^*(\omega) \tag{3.6}$$
for $\omega \in [-\pi, \pi]$. Hence, we have that
$$\|B^{-1}(\omega)\mathcal{X}_n(\omega)\|^2 = \operatorname{Tr}\big[F^{-1}(\omega)[\mathcal{X}_n(\omega) \otimes \mathcal{X}_n(\omega)]\big]$$
for $\omega \in [-\pi, \pi]$.

4 Detecting periodic signals
In this section we discuss the application of our results to testing for hidden periodicities in functional time series. Our basic framework is the following: assume that the sequence $\{Y_t\}_{t \in \mathbb{Z}}$ is given by
$$Y_t = \mu + s(t) + X_t \tag{4.1}$$
for $t \in \mathbb{Z}$, where $\mu \in H$ and $s \colon \mathbb{Z} \to H$ is a deterministic periodic function such that $s(t) = s(t+d)$ for all $t \in \mathbb{Z}$ with some $d \ge 2$ and $\sum_{t=1}^{d} s(t) = 0$. We complement the recent results of Hörmann et al. [16], where such tests were developed when the length of the period $d$ is assumed to be known. In the following we do not assume that $d$ is known. We investigate the subsequent testing problem:
$$H_0\colon \text{(4.1) holds with } \|s(t)\| \equiv 0 \quad\text{against}\quad H_1\colon \text{(4.1) holds with } \|s(t)\| \not\equiv 0. \tag{4.2}$$
The noise process $X_t$ can follow any of the different settings discussed in the present paper (multivariate, multivariate with increasing dimension, iid data, linear processes). Of course, every setting requires different, though conceptually similar, test statistics. To keep the paper streamlined we focus here on the infinite dimensional setting. In particular we are going to assume that $X_t$ is an FAR(1) process $X_t = \rho(X_{t-1}) + \varepsilon_t$. For this setup we will work out the details. With $\rho = 0$ this includes the iid case, where we can actually relax Assumption 2 below, since we do not have to estimate $\rho$ then. The proofs of this section are given in Section 7.4.

Suppose for the moment that $\Sigma = \operatorname{Var}(\varepsilon_t)$ and $\rho$ are known. Let $\lambda_j$ be the eigenvalues of $\Sigma$. Then, under $H_0$ and suitable assumptions on the innovations $\varepsilon_t$, we get by Theorem 5 that the test statistic
$$\lambda_1^{-1}\max_{1 \le j \le q}\|(I - e^{-\mathrm{i}\omega_j}\rho)\mathcal{Y}_n(\omega_j)\|^2 - \log(q) + \sum_{j=2}^{\infty}\log(1 - \lambda_j/\lambda_1)$$
converges to the standard Gumbel distribution. Here $\mathcal{Y}_n(\omega_j)$ denotes the discrete Fourier transform of $Y_1, \ldots, Y_n$ (note that under $H_0$ we have $\mathcal{Y}_n(\omega_j) = \mathcal{X}_n(\omega_j)$ for all $1 \le j \le q$). In practice, we need to replace $\rho$ and the $\lambda_j$ by estimators to get a valid test statistic. We will impose the following assumption.

Assumption 2.
Suppose that $\widehat{\rho}$ is an estimator of $\rho$ with $\|\widehat{\rho} - \rho\| = o_P(a_n^{-1})$, where $\log n \le a_n \le \sqrt{n}$. Assume $\|\rho\| < 1$. Assume moreover that the innovations $\varepsilon_t$ satisfy the assumptions of Theorem 4. Finally, we suppose that $\mu = 0$.

Assumption 2 contains the basic assumptions on the innovations which we require in the iid case to apply our theorems. In addition we need a consistent estimator for $\rho$, which is, for example, established in Bosq [2] or Hörmann and Kidziński [15]. Rates of convergence can be found in Guillas [13]. The requirement $\|\rho\| < 1$ guarantees that the FAR(1) equations have a stationary and causal solution. The assumption $\mu = 0$ is a simplification; otherwise we center the data by the sample mean. A constant shift does not alter $\mathcal{Y}_n(\omega_j)$ for $j = 1, \ldots, q$.

Theorem 7.
Define $\hat{\lambda}_j$ to be the eigenvalues of $n^{-1}\sum_{k=2}^{n}\hat{\varepsilon}_k \otimes \hat{\varepsilon}_k$, where $\hat{\varepsilon}_k = X_k - \widehat{\rho}(X_{k-1})$, $k = 2, \ldots, n$. Under $H_0$ and Assumption 2, we have that
$$T_n := \hat{\lambda}_1^{-1}\max_{1 \le j \le q}\|(I - e^{-\mathrm{i}\omega_j}\widehat{\rho}\,)\mathcal{Y}_n(\omega_j)\|^2 - \log(q) + \sum_{j=2}^{a_n}\log(1 - \hat{\lambda}_j/\hat{\lambda}_1) \stackrel{d}{\to} G \quad\text{as } n \to \infty.$$

Remark 2. In Theorem 7 the truncation parameter $a_n$ in the centering constant can be replaced by any $b_n \le a_n$ with $b_n \to \infty$.

Our next result establishes the consistency of our test statistic when $H_0$ is violated. We assume that there exists $d \ge 2$ such that $s(t) = s(t+d)$ and $\|s(t)\| \not\equiv 0$. In the formulation of the theorem below, we allow $d$ and $s(t)$ to depend on $n$.

Theorem 8.
Consider the assumptions of Theorem 7, but assume now that $H_0$ does not hold. Suppose that $\max_{1 \le t \le d}\|s(t)\| = O(1)$, that $\sqrt{n}/d \to \infty$ and that
$$\psi_n := \Big\|\frac{1}{d}\sum_{t=1}^{d} s(t)e^{-\mathrm{i}2\pi t/d}\Big\|\sqrt{\frac{n}{\log n}} \to \infty. \tag{4.3}$$
Suppose moreover that $\widehat{\rho} \stackrel{P}{\to} \rho'$ with $\|\rho'\| < 1$ and $\hat{\lambda}_j \stackrel{P}{\to} \lambda_j'$ with $\sum_{j \ge 1}\lambda_j' < \infty$. Then we have $T_n \stackrel{P}{\to} \infty$ as $n \to \infty$.

Condition (4.3) is a technical condition which is fairly mild and which assures that the periodic signal is strong enough to be picked up by the Fourier transform. The assumptions on $\widehat{\rho}$ and $\hat{\lambda}_j$ are needed because the violation of $H_0$ implies that our process $\{Y_t\}_{t \in \mathbb{Z}}$ is not stationary. Therefore the estimator for $\rho$, neglecting the underlying periodic signal, is in general not consistent. When the length of the period is known, then the estimator can be adapted to remain consistent under the alternative. Here we do not assume that $d$ is known and hence we use the same estimator for $\rho$ as in the stationary case. To work out the asymptotics of the estimator under the alternative is beyond the scope of this paper and hence this is phrased as an assumption.

5 Simulation study and real data example

In this section we compare the asymptotic theory developed in this paper to the finite sample behaviour of the statistic $T_n$ from Section 4. To this end we organize a simulation study which we describe now in detail. The first step is to generate suitable data.

The target in a simulation is to generate synthetic data, so that we have control over the data generating process (DGP). Often, however, we find the available simulation settings for functional data rather unrealistic. We want to explain here a setting which allows us to generate synthetic and at the same time realistic data. To this end we use as our basic building block a real data set which we are well familiar with and which we have used as a toy data set in different papers, namely PM10 curves in Graz, Austria.
PM10 is measured in µg/m³ and describes the amount of particles with a diameter of less than 10 µm per cubic meter of air. Specifically, our data set consists of 182 observation days in the winter season 2010/2011 (October–March). The data are recorded in 30-minute intervals, resulting in 48 observations per day. We have removed the week around New Year's Eve because of high outlying observations due to fireworks, leaving 175 days. In the data preprocessing we have also removed a potential weekday effect, by centering the data with the corresponding weekday averages. To account for heavy tails, we have done a square-root transformation, i.e. we look at $\sqrt{\text{PM10}}$. The preprocessed data are then transformed to functional data by a basis function approach, see Ramsay et al. [32]. We use the R-package fda with the function Data2fd and 21 Fourier basis functions. To the resulting functional time series $Z_1, \ldots, Z_{175}$ we fit an FAR(1) model $Z_t = \psi(Z_{t-1}) + e_t$. The estimator $\widehat{\psi}$ is a PCA based estimator defined as in Bosq [2], p. 218. We set the tuning parameter $k_n = 8$. This parameter determines the number of principal components to use for the estimator. In our example 8 principal components are needed to explain more than 99% of the variability in the data. In general, a linear operator $\psi$ on the function space $L^2$ can be represented in the form $\psi = \sum_{i,j \ge 1}\psi_{i,j}\,v_i \otimes v_j$, where $\{v_i : i \ge 1\}$ are the Fourier basis functions. Hence $\psi$ is equivalent to an infinite dimensional coefficient matrix $\Psi = ((\psi_{ij}))$. In our case, since we use 21 Fourier basis functions to expand the data, $\widehat{\psi}$ corresponds to a 21 × 21 matrix $\widehat{\Psi}$. In Figure 1 we show the upper 9 × 9 part of $\widehat{\Psi}$. This $\widehat{\Psi}$ is close to an upper triangular matrix. It is very different from common settings where mainly diagonal or symmetric matrices are used.
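To make the estimation step concrete, the following numpy sketch implements a PCA-based lag-1 autoregression estimator acting on basis coefficient vectors (our own schematic with hypothetical names and synthetic data; the paper uses the estimator of Bosq [2], p. 218). The lag-1 cross-covariance is combined with a spectrally truncated inverse of the covariance, keeping $k_n$ principal components:

```python
import numpy as np

def estimate_far1(Z, k_n):
    """PCA-based estimator of the FAR(1) operator from basis coefficients.

    Z   : (n, p) array, rows are centred coefficient vectors of the curves.
    k_n : number of retained principal components (regularisation parameter).
    """
    n, p = Z.shape
    C0 = Z.T @ Z / n                       # covariance of Z_t
    C1 = Z[1:].T @ Z[:-1] / (n - 1)        # lag-1 cross-covariance
    lam, V = np.linalg.eigh(C0)            # eigh returns ascending eigenvalues
    idx = np.argsort(lam)[::-1][:k_n]      # leading k_n eigenpairs
    Vk, lam_k = V[:, idx], lam[idx]
    # regularised inverse of C0 on the span of the leading components
    return C1 @ Vk @ np.diag(1.0 / lam_k) @ Vk.T

# sanity check on synthetic data: recover a known contraction Psi
rng = np.random.default_rng(1)
p, n = 6, 4000
Psi = 0.4 * np.eye(p) + 0.1 * rng.standard_normal((p, p)) / np.sqrt(p)
Z = np.zeros((n, p))
for t in range(1, n):
    Z[t] = Psi @ Z[t - 1] + rng.standard_normal(p)
Psi_hat = estimate_far1(Z, k_n=p)
# rough sanity check, not a sharp bound
assert np.linalg.norm(Psi_hat - Psi) / np.linalg.norm(Psi) < 0.5
```

With $k_n$ equal to the full dimension the estimator reduces to least squares; truncating to $k_n = 8$ components mimics the regularisation used for the PM10 fit.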
[Figure 1: The coefficient matrix of the fitted FAR operator (upper 9 × 9 part of $\widehat{\Psi}$), estimated from the $\sqrt{\text{PM10}}$ sample and used for our DGP.]

Now we start with the actual generation of our synthetic data. To this end we compute the residuals $\hat{e}_t = Z_t - \widehat{\psi}(Z_{t-1})$, $2 \le t \le 175$, and generate a functional time series $X_t = \rho(X_{t-1}) + \varepsilon_t$, using $\rho = \widehat{\psi}$ and $\varepsilon_1, \ldots, \varepsilon_n$ being an iid bootstrap sample of size $n$ from $\hat{e}_2, \ldots, \hat{e}_{175}$. We use $X_1 = \varepsilon_1$. Our construction assures that we get a functional time series which is stationary and behaves similarly as the original PM10 data.
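The residual bootstrap just described can be sketched as follows (our own illustration; `rho` and `resid` stand in for $\widehat{\psi}$ and the fitted residuals $\hat{e}_2, \ldots, \hat{e}_{175}$, here replaced by synthetic stand-ins):

```python
import numpy as np

def simulate_far1_bootstrap(rho, resid, n, rng):
    """Generate X_t = rho(X_{t-1}) + eps_t, with innovations eps_t drawn
    iid with replacement from the matrix of fitted residuals."""
    p = rho.shape[0]
    eps = resid[rng.integers(0, len(resid), size=n)]  # iid bootstrap sample
    X = np.zeros((n, p))
    X[0] = eps[0]                                     # X_1 = eps_1, as above
    for t in range(1, n):
        X[t] = rho @ X[t - 1] + eps[t]
    return X

rng = np.random.default_rng(2)
p = 4
rho = np.diag([0.7, 0.4, 0.2, 0.1])        # stand-in for the fitted operator
resid = rng.standard_normal((174, p))      # stand-in for the 174 residuals
X = simulate_far1_bootstrap(rho, resid, n=200, rng=rng)
assert X.shape == (200, p)
```

Because the innovations are resampled from (approximately centred) residuals and the operator is a contraction, the generated series is stationary, as required under the null.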
The core algorithm for our simulations can be described as follows:
Simulation algorithm:
1. Generate n data from the FAR(1) process X_t = ρ(X_{t−1}) + ε_t.
2. Generate a d-periodic signal s(t) and define Y_t = s(t) + X_t.
3. Estimate the autoregression operator ρ.
4. Calculate the residuals ε̂_t = X_t − ρ̂(X_{t−1}).
5. Using the ε̂_t compute estimates λ̂_j for the eigenvalues of Σ = Var(ε_1).
6. Compute T_n and then δ := I{T_n > q_{1−α}}, where q_{1−α} = G^{−1}(1 − α) and I_A is the indicator function of A.
7. Repeat Steps 1–6 2000 times independently to obtain δ_1, …, δ_2000 and calculate the empirical rejection rate r̂ := av(δ_i : 1 ≤ i ≤ 2000).

We use n = 100, 200, 500 and s(t, u) = s(t) = a cos(2πt/d), where d is generated from a Poisson distribution P_λ with λ = 5 and λ = 15. (Note that we guarantee d ≥ 2.) For a we investigate the values a = 0, 1, 2. Clearly, a = 0 corresponds to H_0. In Step 3 we estimate ρ using the estimator outlined in Section 5.1, again with k_n such that we explain more than 99% of the variance in our sample. In Step 6 we need to choose a_n. We use a_n = argmin_{j ≥ 1}{−log(1 − λ̂_j/λ̂_1) ≤ . }. The significance levels for our tests are α ∈ {0.10, 0.05, 0.01}.

For all combinations of n, λ and a we run the experiment 2000 times and report r̂ = r̂(n, λ, a, α) in Table 1. We can see that the respective size is captured fairly accurately even at the relatively small sample size n = 100. Not surprisingly, the test is more powerful for shorter periods and larger sample sizes. Concerning the power we notice that under our setting with a = 1 the signal-to-noise ratio is

(1/d) Σ_{t=1}^d ‖s(t)‖² / E‖X_t‖² ≈ . .

Here we have approximated E‖X_t‖² by n^{−1} Σ_{t=1}^n ‖X_t‖² with n = 10 .

r̂(n, λ, a, α)        a = 0 (≡ H_0)          a = 1                  a = 2
α                    0.10  0.05  0.01       0.10  0.05  0.01       0.10  0.05  0.01
λ = 5   n = 100      0.066 0.029 0.004      0.861 0.799 0.670      1.000 0.999 0.993
        n = 200      0.082 0.038 0.006      0.989 0.983 0.970      1.000 1.000 1.000
        n = 500      0.093 0.054 0.011      1.000 1.000 0.999      1.000 1.000 1.000
λ = 15  n = 100      0.082 0.041 0.005      0.249 0.165 0.071      0.818 0.758 0.606
        n = 200      0.071 0.035 0.006      0.569 0.471 0.293      0.985 0.973 0.922
        n = 500      0.096 0.045 0.007      0.990 0.978 0.942      1.000 1.000 1.000

Table 1: Empirical rejection rates in our simulation study.

We now apply the test directly to the
PM10 data set. In Hörmann et al. [16] the same data were tested for a fixed period d = 7 in order to reveal a potential weekday effect. It was found there that such a weekday effect is significant. The reason is that on weekends the shape of the PM10 curves (again we use √PM10 curves) changes towards a lower level during day time and higher levels during the night time. Since the test we propose here does not require knowledge of the period d, it is of course expected to have smaller power.

We consider two settings: in the first we use the data Z_1, …, Z_175 as described in Section 5.1, i.e. the detrended data, centered by the weekday averages. In addition we consider Z̃_1, …, Z̃_175, where the detrending step is skipped. This data corresponds to the actual √PM10 curves. Instead of plainly computing the test statistic T_n we rather show in Figure 2 plots of

T_n(j) := λ̂_1^{−1} ‖(I − e^{−iω_j} ρ̂) Y_n(ω_j)‖² − log(q) + Σ_{k=2}^{a_n} log(1 − λ̂_k/λ̂_1),  j = 1, …, q = 87.

The horizontal lines represent critical values at levels α = 0.1, 0.05 and 0.01. For the detrended data (left figure) we cannot find a significant violation of H_0. Also for the non-detrended data (right figure) a weekly periodicity, corresponding to frequency 2π/7 (marked by the dashed vertical line), does not stand out significantly. So here we are confronted with the loss in power we mentioned before. However, to our surprise, we did notice a significant periodicity at a low frequency. A closer look into the data shows that it can be explained by a seasonal behavior of PM10, which we did not notice earlier. Taking moving averages sliding over the data, we observed a slightly increasing trend of the base PM10 level towards the high winter, followed again by a decreasing trend towards spring. We remark that it is quite difficult to notice such features by visual inspection, since plotting and visually analysing a sequence of 175 functional observations is not straightforward.

In practice it is advisable to test for a fixed frequency if we have a particular conjecture about the length of the period. The example shows that it is well worth complementing this approach with our new test, as it may reveal periodicities which are not a priori expected.

Figure 2: The statistics T_n(j) plotted for index j = 1, …, q. The left figure is based on (Z_i) (detrended data), the right figure on (Z̃_i) (actual √PM10 curves).
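The plotted statistics and critical lines can be computed along the following lines. This is a hedged sketch assuming pre-whitened, discretized curves; `periodogram_stats` and `gumbel_quantile` are hypothetical helper names, and the FAR filtering step (I − e^{−iω_j}ρ̂) is omitted for brevity.

```python
import numpy as np

def periodogram_stats(X, lam_hat, a_n):
    """Standardized periodogram statistics T_n(j) (sketch, whitened case).

    X       : (n, p) array of (discretized) curves
    lam_hat : estimated eigenvalues of the noise covariance, lam_hat[0] largest
    a_n     : number of eigenvalue ratios kept in the centering term
    """
    n = len(X)
    q = (n - 1) // 2
    t = np.arange(1, n + 1)
    # centering b: log(q) minus the truncated series of log eigenvalue ratios
    centering = np.log(q) - np.sum(np.log1p(-lam_hat[1:a_n + 1] / lam_hat[0]))
    T = np.empty(q)
    for j in range(1, q + 1):
        # functional DFT at the fundamental frequency omega_j = 2*pi*j/n
        dft = X.T @ np.exp(-1j * 2 * np.pi * j * t / n) / np.sqrt(n)
        T[j - 1] = np.sum(np.abs(dft) ** 2) / lam_hat[0] - centering
    return T

def gumbel_quantile(alpha):
    # critical value q_{1-alpha} = G^{-1}(1-alpha) of the standard Gumbel law
    return -np.log(-np.log(1 - alpha))
```

Under H_0 each T_n(j) is then compared against the horizontal Gumbel critical lines, e.g. `gumbel_quantile(0.05)` ≈ 2.97.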
We have investigated the limiting distribution of the maximum norm of the periodogram operator of a Hilbert space valued random sequence. This is a very useful statistic when we are interested in revealing a hidden periodic signal in a functional time series. For the proof of our main results we proceed stepwise from the multivariate, to the high-dimensional (i.e. the dimension is diverging with the sample size) and then to the infinite dimensional case. The method of proof we use is based on recent advances in the normal approximation of high dimensional data in Chernozhukov et al. [5]. Our approach can be used to recover a classical result of Davis and Mikosch [7] for univariate data. In fact, the proof of the univariate results in Davis and Mikosch [7] with our approach would be much shorter. For passing to the infinite dimensional case we had to slightly adapt the result of Chernozhukov et al. [5] to our needs and make the constant in the normal approximation bounds explicit. The application also demands extending our theory beyond independent data. We have presented an extension to linear processes under quite sharp conditions.

Finally, we conducted an empirical study to investigate how this theory works with simulated as well as real data. We investigate the PM10 data set from Graz, Austria, with which we are well familiar and which we have used as an example in different publications (see Stadlober et al. [37] and Hörmann et al. [16]). This is an air quality data set that contains the amount of particulate matter of up to 10 µm in diameter measured in µg/m³. The PM10 data set is also the main building block of our simulated data. We use it to generate synthetic and at the same time realistic data via a resampling scheme. Our simulation study shows that our approach has good finite sample performance. We also compare our test with the test of Hörmann et al. [16] using the PM10 data set. Since here we do not require knowledge of the period, it is expected that we have smaller power. Our test does not detect the same (weekly) periodic component as the test by Hörmann et al. [16], but the new approach reveals another seasonal effect which we did not notice previously.
For the proofs we introduce the following notation and conventions. We use again ‖·‖ as the norm on H, but also for the Euclidean norm in R^d; the specific meaning should be clear from the context. We use N_d(µ, Σ) for the d-variate normal law with mean µ and covariance Σ. The unit sphere in R^d is denoted S^{d−1} = {x ∈ R^d : ‖x‖ = 1}. We define ‖u‖_0 as the number of non-zero components of the vector u ∈ R^d. We will use I_A for the indicator function of a set A and I_d for the identity matrix in R^d.

The main tool of our proofs is a powerful result of Chernozhukov et al. [5]. Suppose that V_1, …, V_n are independent random vectors in R^p with zero means and finite second moments. Let W_1, …, W_n be independent Gaussian random vectors in R^p such that W_i ∼ N_p(0, E[V_i V_i′]) for 1 ≤ i ≤ n. Set S_n^V = n^{−1/2} Σ_{i=1}^n V_i and S_n^W = n^{−1/2} Σ_{i=1}^n W_i for n ≥ 1. Chernozhukov et al. [5] establish a bound for

ρ_n(A^{sp}(s)) = sup_{A ∈ A^{sp}(s)} |P(S_n^V ∈ A) − P(S_n^W ∈ A)|,  (7.1)

where A^{sp}(s) is the class of s-sparsely convex subsets of R^d. A set A is an element of A^{sp}(s) if A is an intersection of finitely many convex sets A_k and if the indicator function of each A_k, x ↦ I_{A_k}(x), depends only on s components of its argument x = (x_1, …, x_d). We state some conditions which will be needed:

(i) n^{−1} Σ_{t=1}^n E|u′V_t|² ≥ b for all u ∈ S^{p−1} with ‖u‖_0 ≤ s;
(ii) n^{−1} Σ_{t=1}^n E|V_{t,j}|^{2+k} ≤ B_n^k for all j = 1, …, p and k = 1, 2;
(iii) E exp(|V_{t,j}|/B_n) ≤ 2 for all t = 1, …, n and j = 1, …, p;
(iv) E max_{1≤j≤p}(|V_{t,j}|/B_n)^q ≤ 2 for all t = 1, …, n,

where b, q > 0 and B_n ≥ 1 (possibly B_n → ∞ as n → ∞).

Proposition 1. (Chernozhukov et al. [5, Proposition 3.2]) Under conditions (i), (ii) and (iii), it holds that

ρ_n(A^{sp}(s)) ≤ C · B_n^{1/3} log^{7/6}(pn) / n^{1/6}  (7.2)

for n ≥ 1. The constant C in (7.2) depends only on b, s and q.

Proof of Theorem 1.
We denote

X̃_t = X_t I{‖X_t‖ ≤ n^{1/r}} − E[X_1 I{‖X_1‖ ≤ n^{1/r}}],

and

X̃_t^d = Σ_{k=1}^d ⟨X̃_t, v_k⟩ v_k and X̃_n^d(ω) = n^{−1/2} Σ_{t=1}^n X̃_t^d e^{−itω}

for n ≥ 1, t ≥ 1, d ≥ 1 and ω ∈ [−π, π]. In view of Lemma 10 in Section 7.5, it suffices to show that λ_1^{−1}(max_{1≤j≤q} ‖X̃_n^d(ω_j)‖² − b_{dn}) →_d G as n → ∞. To this end let us define the R^{2dq}-valued random vectors

S_n^V = n^{−1/2} Σ_{t=1}^n V_t for n ≥ 1, where V_t = (⟨X̃_t, v_1⟩f_t′, …, ⟨X̃_t, v_d⟩f_t′)′

with

f_t = (cos(tω_1), sin(tω_1), …, cos(tω_q), sin(tω_q))′ ∈ R^{2q}.  (7.3)

Note that for the sake of a lighter notation we suppress in some variables the dependence on d and q. We let V_{t,m} and S_{n,m}^V be the m-th elements of the vectors V_t and S_n^V, respectively. For an ordered index set J we let V_{t,J} = (V_{t,j} : j ∈ J)′. Analogously we define S_{n,J}^V. Then

V_{t, 2q(ℓ−1)+2k−1} = ⟨X̃_t, v_ℓ⟩ cos(tω_k) and V_{t, 2q(ℓ−1)+2k} = ⟨X̃_t, v_ℓ⟩ sin(tω_k),

for 1 ≤ ℓ ≤ d and 1 ≤ k ≤ q. Thus, with the sets J_k = J_1 + 2(k − 1), where

J_1 = {1, 2, 2q + 1, 2q + 2, …, 2(d − 1)q + 1, 2(d − 1)q + 2},  (7.4)

we obtain vectors V_{t,J_k} ∈ R^{2d}, k = 1, …, q, where

V_{t,J_k} = (⟨X̃_t, v_1⟩cos(tω_k), ⟨X̃_t, v_1⟩sin(tω_k), …, ⟨X̃_t, v_d⟩cos(tω_k), ⟨X̃_t, v_d⟩sin(tω_k))′.
It holds that

P(max_{1≤k≤q} ‖X̃_n^d(ω_k)‖ ≤ x) = P(‖X̃_n^d(ω_k)‖ ≤ x for all k = 1, …, q)
= P(‖S_{n,J_k}^V‖ ≤ x for all k = 1, …, q)
= P(S_n^V ∈ ∩_{k=1}^q A_k),

where A_k = {y ∈ R^{2dq} : ‖(y_j)_{j∈J_k}‖ ≤ x}. It is important to note that ∩_{k=1}^q A_k is a 2d-sparsely convex set.

Our target is then to apply Proposition 1. To this end we show that conditions (i), (ii) and (iii) hold. Suppose that u ∈ S^{2dq−1} and u = (u_1′, …, u_d′)′ with u_ℓ ∈ R^{2q}. We obtain

n^{−1} Σ_{t=1}^n E|⟨V_t, u⟩|² ≥ n^{−1} Σ_{t=1}^n E|Σ_{k=1}^d ⟨X_t, v_k⟩f_t′u_k|² + 2n^{−1} Σ_{t=1}^n Σ_{k,l=1}^d E[⟨X_t, v_k⟩⟨X̃_t − X_t, v_l⟩] f_t′u_k f_t′u_l =: T_1 + T_2.

We have that T_1 ≥ λ_d/2. Since n^{−1} Σ_{t=1}^n f_t f_t′ = I_{2q}/2 and Σ_{k,l=1}^d |⟨u_l, u_k⟩| ≤ d, we obtain

|T_2| ≤ 2 Σ_{k,l=1}^d E|⟨X_1, v_k⟩⟨X̃_1 − X_1, v_l⟩| |⟨u_l, u_k⟩| ≤ 2d (E‖X_1‖²)^{1/2} (E‖X̃_1 − X_1‖²)^{1/2}.

Clearly E‖X̃_1 − X_1‖² → 0, and hence n^{−1} Σ_{t=1}^n E|⟨V_t, u⟩|² ≥ λ_d/2 + o(1) as n → ∞, and thus condition (i) is satisfied.

To verify (ii) we first notice that for any ℓ and m

|⟨X̃_t, v_ℓ⟩| max{|cos(tω_m)|, |sin(tω_m)|} ≤ ‖X̃_t‖ ≤ ‖X_t‖ I{‖X_t‖ ≤ n^{1/r}} + E[‖X_1‖ I{‖X_1‖ ≤ n^{1/r}}].

Hence, if r < 2 + k, then

n^{−1} Σ_{t=1}^n E|V_{t,j}|^{2+k} ≤ 2^{2+k} E[‖X_1‖^{2+k} I{‖X_1‖ ≤ n^{1/r}}] = O(n^{(2+k)/r − 1})

as n → ∞. For r > 2 + k, we have n^{−1} Σ_{t=1}^n E|V_{t,j}|^{2+k} = O(1). Thus, (ii) is satisfied if we set B_n = cn^{1/r} for n ≥ 1 and some c > 0. Finally, (iii) follows from

E exp(|V_{t,j}|/B_n) ≤ exp(2n^{1/r}/B_n) ≤ 2, if c ≥ 2/log 2.

Hence, (i), (ii) and (iii) hold, which in turn implies that (7.2) holds with B_n = cn^{1/r}, where c ≥ 2/log 2 and r > 2. The bound in (7.2) tends to 0 as n → ∞.

Suppose now that Ỹ_1, Ỹ_2, … are iid Gaussian random elements with values in H such that EỸ_1 = 0 and Var(Ỹ_1) = Var(X̃_1). Then the 2dq-dimensional random vectors V_t have the same covariance matrices as the Gaussian random vectors W_t, where

W_t = (⟨Ỹ_t, v_1⟩f_t′, …, ⟨Ỹ_t, v_d⟩f_t′)′, 1 ≤ t ≤ n.

In analogy to X̃_t^d and X̃_n^d(ω) we now define

Ỹ_t^d = Σ_{k=1}^d ⟨Ỹ_t, v_k⟩v_k and Ỹ_n^d(ω) = n^{−1/2} Σ_{t=1}^n Ỹ_t^d e^{−itω}.

We have shown that

sup_{x∈R} |P(max_{1≤k≤q} ‖X̃_n^d(ω_k)‖ ≤ x) − P(max_{1≤k≤q} ‖Ỹ_n^d(ω_k)‖ ≤ x)| ≤ sup_{A∈A^{sp}(2d)} |P(S_n^V ∈ A) − P(S_n^W ∈ A)| → 0

when n → ∞. Therefore, it remains to prove that

λ_1^{−1}(max_{1≤j≤q} ‖Ỹ_n^d(ω_j)‖² − b_{dq}) = λ_1^{−1}(max_{1≤j≤q} Σ_{k=1}^d |⟨Ỹ_n(ω_j), v_k⟩|² − b_{dq}) →_d G.  (7.5)

To this end we introduce (λ̃_k, ṽ_k), which are the pairs of eigenvalues and eigenfunctions of Var(X̃_1). In a first step we show that

λ_1^{−1}(max_{1≤j≤q} Σ_{k=1}^d |⟨Ỹ_n(ω_j), ṽ_k⟩|² − b_{dn}) →_d G.  (7.6)

Let us denote

C_{kj} = λ̃_k^{−1/2} n^{−1/2} Σ_{t=1}^n ⟨Ỹ_t, ṽ_k⟩cos(tω_j) and S_{kj} = λ̃_k^{−1/2} n^{−1/2} Σ_{t=1}^n ⟨Ỹ_t, ṽ_k⟩sin(tω_j),

with 1 ≤ k ≤ d and 1 ≤ j ≤ q. These 2dq variables are mutually independent and N(0, 1/2) distributed. Thus

max_{1≤j≤q} Σ_{k=1}^d |⟨Ỹ_n(ω_j), ṽ_k⟩|² = max_{1≤j≤q} Σ_{k=1}^d λ̃_k E_{kj},

where E_{kj} = C_{kj}² + S_{kj}² are iid ∼ Exp(1). Moreover, we have

|max_{1≤j≤q} Σ_{k=1}^d λ̃_k E_{kj} − max_{1≤j≤q} Σ_{k=1}^d λ_k E_{kj}| ≤ max_{1≤j≤q} |Σ_{k=1}^d (λ̃_k − λ_k) E_{kj}| ≤ Σ_{k=1}^d |λ̃_k − λ_k| max_{1≤j≤q} E_{kj}.
It is a basic result that max_{1≤j≤q} E_{kj} = O_P(log n), and Lemma 11 yields |λ̃_k − λ_k| ≤ ‖Var(X̃_1) − Var(X_1)‖ = o(n^{−(1−2/r)}) as n → ∞. Hence, combining these results with Lemma 3, we get (7.6).

The last step in the proof is to show (7.5), and this in turn will follow from (7.6) if we prove that

max_{1≤j≤q} Σ_{k=1}^d |⟨Ỹ_n(ω_j), c_k v_k⟩|² − max_{1≤j≤q} Σ_{k=1}^d |⟨Ỹ_n(ω_j), ṽ_k⟩|² = o_P(1), n → ∞.  (7.7)

The absolute value of the left-hand side in (7.7) is bounded by

max_{1≤j≤q} |Σ_{k=1}^d {|⟨Ỹ_n(ω_j), c_k v_k⟩|² − |⟨Ỹ_n(ω_j), ṽ_k⟩|²}|
= max_{1≤j≤q} |Σ_{k=1}^d {|⟨Ỹ_n(ω_j), c_k v_k − ṽ_k⟩|² + 2Re[⟨Ỹ_n(ω_j), c_k v_k − ṽ_k⟩⟨ṽ_k, Ỹ_n(ω_j)⟩]}|
≤ Σ_{k=1}^d max_{1≤j≤q} |⟨Ỹ_n(ω_j), c_k v_k − ṽ_k⟩|²  (7.8)
+ 2 Σ_{k=1}^d max_{1≤j≤q} |⟨Ỹ_n(ω_j), ṽ_k⟩| max_{1≤j≤q} |⟨Ỹ_n(ω_j), c_k v_k − ṽ_k⟩|.  (7.9)

The components of the random vector

(n^{−1/2} Σ_{t=1}^n ⟨Ỹ_t, c_k v_k − ṽ_k⟩cos(tω_1), n^{−1/2} Σ_{t=1}^n ⟨Ỹ_t, c_k v_k − ṽ_k⟩sin(tω_1), …, n^{−1/2} Σ_{t=1}^n ⟨Ỹ_t, c_k v_k − ṽ_k⟩cos(tω_q), n^{−1/2} Σ_{t=1}^n ⟨Ỹ_t, c_k v_k − ṽ_k⟩sin(tω_q))′

are uncorrelated and the covariance matrix is given by 2^{−1} E|⟨Ỹ_1, c_k v_k − ṽ_k⟩|² I_{2q}. A summand in (7.8) is given by

max_{1≤j≤q} |n^{−1/2} Σ_{t=1}^n ⟨Ỹ_t, c_k v_k − ṽ_k⟩ e^{−itω_j}|²
= E|⟨Ỹ_1, c_k v_k − ṽ_k⟩|² max_{1≤j≤q} |(E|⟨Ỹ_1, c_k v_k − ṽ_k⟩|²)^{−1/2} n^{−1/2} Σ_{t=1}^n ⟨Ỹ_t, c_k v_k − ṽ_k⟩ e^{−itω_j}|²
≤ E‖Ỹ_1‖² ‖c_k v_k − ṽ_k‖² max_{1≤j≤q} |(E|⟨Ỹ_1, c_k v_k − ṽ_k⟩|²)^{−1/2} n^{−1/2} Σ_{t=1}^n ⟨Ỹ_t, c_k v_k − ṽ_k⟩ e^{−itω_j}|².

Using Lemma 12 in Section 7.5, ‖c_k v_k − ṽ_k‖² = o(n^{−(1−2/r)}) as n → ∞, and

max_{1≤j≤q} |(E|⟨Ỹ_1, c_k v_k − ṽ_k⟩|²)^{−1/2} n^{−1/2} Σ_{t=1}^n ⟨Ỹ_t, c_k v_k − ṽ_k⟩ e^{−itω_j}|² = O_P(log n)

as n → ∞ (since this is the maximum of q iid standard exponential random variables) shows that (7.8) tends to 0. Similar arguments show that (7.9) goes to 0 in probability. Hence (7.7) holds.

Proof of Theorem 2.
The proof is basically identical to the proof of Theorem 1. The main difference here is that if we consider the approximating Gaussian process {Y_t} with Var(Y_1) = Var(X_1), then

max_{1≤j≤q} ‖n^{−1/2} Σ_{t=1}^n Y_t e^{−itω_j}‖² = max_{1≤j≤q} {Σ_{k=1}^d λ_k E_{kj}},

where the E_{kj} are iid Exp(1) random variables with 1 ≤ k ≤ d and 1 ≤ j ≤ q. If all λ_k are equal, then the Σ_{k=1}^d E_{kj} are iid Gamma(d, 1) random variables. The limiting distribution of the maximum can be found in Example 1 of Kang and Serfozo [21] or Table 3.4.4 of Embrechts et al. [9]. The proof is complete.

Now we let d grow to infinity. In this case we need a version of Proposition 1 where the dependence of the constant C on b and s is explicit. We provide such a result in the following proposition, which may be of independent interest. The proof is outlined in Appendix A.

Proposition 2. Suppose that conditions (i), (ii) and (iv) hold and that {B_n}_{n≥1} is a bounded sequence. Then

ρ_n(A^{sp}(s)) ≤ C · s² log^{7/6}(pn) / (b^{1/2} n^{1/6}),  (7.10)

where C is a constant that does not depend on n, b, p or s.

For the proofs of Theorem 3 and Theorem 4 we don't need a truncation argument. We denote the DFT of Y_1, …, Y_n by Y_n(ω), Y_n^d(ω) is its projection onto span{v_1, …, v_d}, and

M̃_n^d = max_{j=1,…,q} ‖Y_n^d(ω_j)‖² and M_n^d = max_{j=1,…,q} ‖X_n^d(ω_j)‖².

Now we are ready to prove Theorem 3.

Proof of Theorem 3.
We have

|P((M_n^d − b_{dq})/λ_1 ≤ x) − e^{−e^{−x}}| ≤ |P((M̃_n^d − b_{dq})/λ_1 ≤ x) − e^{−e^{−x}}| + ρ_n(A^{sp}(2d)),  (7.11)

where

ρ_n(A^{sp}(2d)) ≥ sup_{x∈R} |P(M_n^d ≤ x) − P(M̃_n^d ≤ x)|.  (7.12)

We consider the normalized partial sums

S_n^V = n^{−1/2} Σ_{t=1}^n ξ_t^d ⊗ f_t = n^{−1/2} Σ_{t=1}^n V_t,

where

ξ_t^d = (⟨X_t, v_1⟩, …, ⟨X_t, v_d⟩)′  (7.13)

and f_t is defined by (7.3). As we showed in the proof of Theorem 1, we have that

P(M_n^d ≤ x²) = P(S_n^V ∈ ∩_{k=1}^q A_k),

where ∩_{k=1}^q A_k is a 2d-sparsely convex set.

Set B = max{E‖X_1‖³, (E‖X_1‖⁴)^{1/2}, (2^{−1}E‖X_1‖^q)^{1/q}}. We aim to apply (7.10) with p = 2dq and s = 2d. Since |V_{t,j}| ≤ ‖X_t‖ for all j = 1, …, 2dq, we see that (ii) and (iv) are satisfied with B_n = B, and condition (i) follows from Lemma 14 in Section 7.5 with b = λ_d/2. Hence, by Proposition 2 we get

ρ_n(A^{sp}(2d_n)) ≤ C · d_n² log^{7/6}(d_n n) / (λ_{d_n}^{1/2} n^{1/6})  (7.14)

for n ≥ 1, where C is a universal constant. Under assumption (2.7), the right hand side of (7.14) goes to 0 as n → ∞.

Since Y_1, Y_2, … are Gaussian random elements, ‖Y_n^{(d)}(ω_1)‖², …, ‖Y_n^{(d)}(ω_q)‖² are iid variables following a Hypo(λ_1^{−1}, …, λ_d^{−1}) distribution. Lemma 4 implies that the first term on the right hand side of (7.11) goes to 0 as n → ∞ under assumptions (2.6) and (2.8). The proof is complete.

Proof of Theorem 4.
We start by noting that

λ_1^{−1}(M_n − b_q) = λ_1^{−1}(M_n − M_n^{d_n}) + λ_1^{−1}(M_n^{d_n} − b_{d_n q}) + λ_1^{−1}(b_{d_n q} − b_q).

The third term converges to zero by Lemma 9, and the second term converges in distribution to a Gumbel random variable under the assumptions of Theorem 3. Hence, by Slutsky's theorem, the convergence in distribution of λ_1^{−1}(M_n − b_q) to the standard Gumbel distribution holds if we verify that the first term tends to zero in probability.

To this end, we define

δ_j = ‖X_n(ω_j)‖² − ‖X_n^d(ω_j)‖² = Σ_{k>d_n} |n^{−1/2} Σ_{t=1}^n ⟨X_t, v_k⟩ e^{−itω_j}|².

For any a > 0, we have

P(|M_n − M_n^d| > a) = P(M_n − M_n^d > a)
= P(max_{j=1,…,q} {‖X_n^d(ω_j)‖² + δ_j} − M_n^d > a)
≤ P(max_{j=1,…,q} δ_j > a)
≤ Σ_{j=1}^q Σ_{k>d} P(n^{−1}|Σ_{t=1}^n ⟨X_t, v_k⟩cos(tω_j)|² > aℓ_k/2) + Σ_{j=1}^q Σ_{k>d} P(n^{−1}|Σ_{t=1}^n ⟨X_t, v_k⟩sin(tω_j)|² > aℓ_k/2).  (7.15)

Since {⟨X_t, v_k⟩cos(tω_j)}_{1≤t≤n} are independent random variables with zero means and E|⟨X_t, v_k⟩cos(tω_j)|^r ≤ E‖X_1‖^r < ∞ for some r > 2, Markov's inequality and Rosenthal's inequality (see Rosenthal [33]) lead to

P(n^{−1}|Σ_{t=1}^n ⟨X_t, v_k⟩cos(tω_j)|² > aℓ_k/2)
≤ C_r (naℓ_k/2)^{−r/2} [Σ_{t=1}^n E|⟨X_t, v_k⟩|^r + (Σ_{t=1}^n E|⟨X_t, v_k⟩|²)^{r/2}]
≤ C_r (naℓ_k/2)^{−r/2} [nE|⟨X_1, v_k⟩|^r + (nλ_k)^{r/2}]
≤ C_r (2/a)^{r/2} [n^{1−r/2} ℓ_k^{−r/2} E|⟨X_1, v_k⟩|^r + (λ_k/ℓ_k)^{r/2}],

where C_r is a constant depending only on r. The second sum in (7.15) can be bounded in an analogous way, and summation over j and k gives conditions (2.9) and (2.10). The proof is complete.

7.2 Domain of attraction of the Gumbel distribution

First, we show that, for fixed d ≥ 1, the hypoexponential distribution with strictly increasing parameters belongs to the domain of attraction of the Gumbel distribution.
Lemma 3.
Let d ≥ 1 be fixed. Suppose that ξ_1, …, ξ_n are iid Hypo(λ_1^{−1}, …, λ_d^{−1}) random variables with λ_k > λ_{k+1} for all 1 ≤ k ≤ d − 1. Then

λ_1^{−1}(max{ξ_1, …, ξ_n} − b_{dn}) →_d G as n → ∞,  (7.16)

where b_{dn} is given by (2.3).

Proof of Lemma 3. Since λ_k > λ_{k+1} for all 1 ≤ k ≤ d − 1, the cdf of Hypo(λ_1^{−1}, …, λ_d^{−1}) is given by

F^{(d)}(x) = Σ_{k=1}^d α_{k,d} F_k(x)  (7.17)

for x ∈ R, where α_{k,d} = Π_{j=1, j≠k}^d (1 − λ_j/λ_k)^{−1} and F_k is the cdf of Exp(λ_k^{−1}) for 1 ≤ k ≤ d. Hence, Theorem 2 of Kang and Serfozo [21] implies that max{ξ_1, …, ξ_n} is asymptotically distributed as the standard Gumbel distribution with the normalizing constants given by (2.3).

Under certain assumptions on the parameters of the hypoexponential distribution and the growth rate of d = d_n, we show that convergence (7.16) still holds even if d = d_n → ∞ as n → ∞.

Lemma 4.
Suppose that condition (2.6) is satisfied and that d = d_n = O(n^γ) as n → ∞, with γ satisfying (2.8). Then convergence (7.16) holds.

Proof of Lemma 4. Fix x ∈ R. Using (7.17) and the fact that Σ_{k=1}^d α_{k,d} = 1, we obtain

P(λ_1^{−1}(max{ξ_1, …, ξ_n} − b_{dn}) ≤ x) = [F^{(d)}(λ_1 x + b_{dn})]^n = [1 − e^{−x}/n − Σ_{k=2}^d α_{k,d} (e^{−x}/(nα_{1,d}))^{λ_1/λ_k}]^n.  (7.18)

We need to show that

Σ_{k=2}^d α_{k,d} (e^{−x}/(nα_{1,d}))^{λ_1/λ_k} = o(n^{−1})  (7.19)

as n → ∞, which implies that (7.18) converges to the Gumbel distribution function.

Denote a_{k,n} := n α_{k,d} (e^{−x}/(nα_{1,d}))^{λ_1/λ_k} and let d_n = O(n^γ) as n → ∞ for some 0 < γ < 1. We first remark that

1/α_{1,d} = Π_{j=2}^d (1 − λ_j/λ_1) ≤ (1 − λ_d/λ_1)^{d−1} ≤ 1.  (7.20)

Observe that condition (2.6) is equivalent to the following condition: there exists k_0 such that

λ_j/λ_k ≥ k/j  (7.21)

for each k ≥ k_0 and 1 ≤ j ≤ k − 1. Let k_1 ≥ k_0 be such that d_n(d_n/n)^{k_1−1} → 0, which is possible since γ < 1. Denote

A_{n,1} = Σ_{k=2}^{k_1−1} a_{k,n} and A_{n,2} = Σ_{k=k_1}^{d_n} a_{k,n}.  (7.22)

For k ≥ k_1, using (7.21) and the fact that λ_j/λ_k ≤ k/j for j > k,

|α_{k,d}| ≤ Π_{j=1}^{k−1} (j/(k−j)) Π_{j=k+1}^d (j/(j−k)) = (d choose k) ≤ (ed/k)^k.  (7.23)

Choose n_0 such that e^{−x}/n ≤ 1 for n ≥ n_0. For n ≥ n_0, using (7.20), (7.21) and (7.23), we obtain

|A_{n,2}| ≤ Σ_{k=k_1}^{d_n} n (ed_n/k)^k (e^{−x}/n)^{λ_1/λ_k} ≤ d_n Σ_{k=k_1}^{d_n} (e^{1−x}/k)^k (d_n/n)^{k−1} ≤ d_n (d_n/n)^{k_1−1} Σ_{k=1}^∞ (e^{1−x}/k)^k,

where d_n(d_n/n)^{k_1−1} → 0 as n → ∞.

Next, for k < k_1, set ν_k = Π_{j=1, j≠k}^{k_1−1} |1 − λ_j/λ_k|^{−1}. Since λ_j/λ_k ≤ k/j for j ≥ k_1 and k < k_1 using (2.6), we obtain

|α_{k,d}| = ν_k Π_{j=k_1}^d (1 − λ_j/λ_k)^{−1} ≤ ν_k Π_{j=k_1}^d (j/(j−k)) = ν_k · d!(k_1−k−1)! / ((k_1−1)!(d−k)!) ≤ ν_k · d^k · (k_1−k−1)!/(k_1−1)!.  (7.24)

Thus, using (7.20) and (7.24),

|A_{n,1}| ≤ Σ_{k=2}^{k_1−1} ν_k · d^k n · ((k_1−k−1)!/(k_1−1)!) · e^{−xλ_1/λ_k} (1/n)^{λ_1/λ_k} = O(max_{2≤k≤k_1−1} {d^k n^{−(λ_1/λ_k − 1)}})  (7.25)

as n → ∞. If γ < min_{2≤k≤k_1−1} {k^{−1}(λ_1/λ_k − 1)}, (7.25) tends to 0 as n → ∞. The proof is complete.

7.3 Linear processes

The method for transferring the iid setting to linear processes is similar as for the central limit theorem and the functional central limit theorem for linear processes under the absolute summability of the a_k's (see e.g. Merlevède et al. [27] and Račkauskas and Suquet [31]).

Lemma 5.
Suppose that {X_t}_{t∈Z} is a linear process defined by (3.1) such that Σ_{k=−∞}^∞ ‖a_k‖ < ∞. Then for n ≥ 1 we have

E max_{1≤j≤q} ‖Σ_{t=1}^n X_t e^{−itω_j}‖² ≤ (Σ_{k=−∞}^∞ ‖a_k‖)² E max_{1≤j≤q} ‖Σ_{t=1}^n ε_t e^{−itω_j}‖².  (7.26)

Proof. The left-hand side of (7.26) is given as

E max_{1≤j≤q} ‖Σ_{k=−∞}^∞ a_k (Σ_{s=1−k}^{n−k} ε_s e^{−isω_j}) e^{−ikω_j}‖²
≤ E max_{1≤j≤q} (Σ_{k=−∞}^∞ ‖a_k (Σ_{s=1−k}^{n−k} ε_s e^{−isω_j}) e^{−ikω_j}‖)²
≤ E max_{1≤j≤q} (Σ_{k=−∞}^∞ ‖a_k‖ ‖Σ_{s=1−k}^{n−k} ε_s e^{−isω_j}‖)².

By the Cauchy–Schwarz inequality and since, for each fixed k, the vector of shifted sums (Σ_{s=1−k}^{n−k} ε_s e^{−isω_j})_{1≤j≤q} has the same distribution (up to unimodular factors) as (Σ_{t=1}^n ε_t e^{−itω_j})_{1≤j≤q}, the last expression is bounded by the right-hand side of (7.26). The proof is complete.
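The index substitution at the heart of this proof (s = t − k) is an exact algebraic identity for linear processes. The following sketch verifies it numerically for a scalar process of finite order with hypothetical coefficients; all names are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 64, 5                       # sample size and finite coefficient range |k| <= K
a = rng.normal(size=2 * K + 1)     # coefficients a_{-K}, ..., a_K
eps = rng.normal(size=n + 2 * K)   # noise eps_{1-K}, ..., eps_{n+K}; index s -> eps[s + K - 1]

def X(t):
    # linear process X_t = sum_k a_k eps_{t-k}
    return sum(a[k + K] * eps[t - k + K - 1] for k in range(-K, K + 1))

omega = 2 * np.pi * 3 / n          # a fundamental frequency omega_j with j = 3
lhs = sum(X(t) * np.exp(-1j * t * omega) for t in range(1, n + 1))
# rearrangement used in the proof of Lemma 5:
rhs = sum(a[k + K] * np.exp(-1j * k * omega)
          * sum(eps[s + K - 1] * np.exp(-1j * s * omega) for s in range(1 - k, n - k + 1))
          for k in range(-K, K + 1))
assert np.allclose(lhs, rhs)
```

The same rearrangement, followed by the triangle inequality, yields the bound (7.26).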
Proof of Lemma 1.
Some little algebra shows that

X_n(ω) − A(ω)E_n(ω) = n^{−1/2} Σ_{k=−∞}^∞ a_k (Σ_{s=1−k}^{n−k} ε_s e^{−isω} − Σ_{t=1}^n ε_t e^{−itω}) e^{−ikω} = n^{−1/2} Σ_{k=−∞}^∞ a_k (Δ_{nk}(ω)) e^{−ikω}.

Since Δ_{nk}(ω) has 2(|k| ∧ n) summands, it follows from Theorem 9 that

E max_{1≤j≤q} ‖Δ_{nk}(ω_j)‖ ≤ (E max_{1≤j≤q} ‖Δ_{nk}(ω_j)‖²)^{1/2} ≤ (2C{|k| log(2|k|) ∧ n log(2n)})^{1/2},

where C > 0. Consequently,

E max_{1≤j≤q} ‖X_n(ω_j) − A(ω_j)E_n(ω_j)‖ ≤ n^{−1/2} Σ_{k=−∞}^∞ ‖a_k‖ E max_{1≤j≤q} ‖Δ_{nk}(ω_j)‖ ≪ n^{−1/2} [Σ_{|k|<n} ‖a_k‖ (|k| log(2|k|))^{1/2} + (n log(2n))^{1/2} Σ_{|k|≥n} ‖a_k‖].

It follows that E max_{1≤j≤q} ‖X_n(ω_j) − A(ω_j)E_n(ω_j)‖ = o(log^{−1/2} n) as n → ∞, which is a sufficient condition for (3.3). Since

X_n(ω) ⊗ X_n(ω) − A(ω)E_n(ω) ⊗ A(ω)E_n(ω)
= (X_n(ω) − A(ω)E_n(ω)) ⊗ (X_n(ω) − A(ω)E_n(ω))
+ (X_n(ω) − A(ω)E_n(ω)) ⊗ A(ω)E_n(ω)
+ A(ω)E_n(ω) ⊗ (X_n(ω) − A(ω)E_n(ω))

for ω ∈ [−π, π] and ‖x ⊗ y‖_S = ‖x‖‖y‖ for x, y ∈ H, we obtain

max_{1≤j≤q} ‖X_n(ω_j) ⊗ X_n(ω_j) − A(ω_j)E_n(ω_j) ⊗ A(ω_j)E_n(ω_j)‖_S
≤ max_{1≤j≤q} ‖X_n(ω_j) − A(ω_j)E_n(ω_j)‖² + 2 max_{1≤j≤q} ‖A(ω_j)E_n(ω_j)‖ max_{1≤j≤q} ‖X_n(ω_j) − A(ω_j)E_n(ω_j)‖.  (7.27)

Also,

E max_{1≤j≤q} ‖A(ω_j)E_n(ω_j)‖ ≤ max_{1≤j≤q} ‖A(ω_j)‖ (E max_{1≤j≤q} ‖E_n(ω_j)‖²)^{1/2} ≤ max_{1≤j≤q} ‖A(ω_j)‖ (C log n)^{1/2}.  (7.28)

Then (7.27) together with (3.3) and (7.28) implies (3.4). The proof is complete.

Proof of Lemma 2.
We have that

|max_{1≤j≤q} ‖A^{−1}(ω_j)X_n(ω_j)‖² − max_{1≤j≤q} ‖E_n(ω_j)‖²|
≤ max_{1≤j≤q} |‖A^{−1}(ω_j)X_n(ω_j)‖² − ‖E_n(ω_j)‖²|
= max_{1≤j≤q} {(‖A^{−1}(ω_j)X_n(ω_j)‖ − ‖E_n(ω_j)‖)(‖A^{−1}(ω_j)X_n(ω_j)‖ + ‖E_n(ω_j)‖)}
≤ max_{1≤j≤q} ‖A^{−1}(ω_j)X_n(ω_j) − E_n(ω_j)‖ (max_{1≤j≤q} ‖A^{−1}(ω_j)X_n(ω_j)‖ + max_{1≤j≤q} ‖E_n(ω_j)‖)
≤ sup_{ω∈[0,π]} ‖A^{−1}(ω)‖ max_{1≤j≤q} ‖X_n(ω_j) − A(ω_j)E_n(ω_j)‖ × (sup_{ω∈[0,π]} ‖A^{−1}(ω)‖ max_{1≤j≤q} ‖X_n(ω_j)‖ + max_{1≤j≤q} ‖E_n(ω_j)‖).

According to Lemma 1, max_{1≤j≤q} ‖X_n(ω_j) − A(ω_j)E_n(ω_j)‖ = o_p(log^{−1/2} n) as n → ∞. Also, it follows from Lemma 5 and Theorem 9 that

max_{1≤j≤q} ‖E_n(ω_j)‖ = O_p(log^{1/2} n) and max_{1≤j≤q} ‖X_n(ω_j)‖ = O_p(log^{1/2} n)

as n → ∞, which completes the proof.

Proof of Theorem 6. Denote Z_t = Σ^{−1/2}ε_t with t = 1, …, n so that Z_n(ω) = Σ^{−1/2}E_n(ω) for ω ∈ [−π, π]. Similarly as in the proof of Lemma 2,

|max_{1≤j≤q} ‖B^{−1}(ω_j)X_n(ω_j)‖² − max_{1≤j≤q} ‖Z_n(ω_j)‖²|
≤ ‖Σ^{−1/2}‖ sup_{ω∈[0,π]} ‖A^{−1}(ω)‖ max_{1≤j≤q} ‖X_n(ω_j) − A(ω_j)E_n(ω_j)‖ × (‖Σ^{−1/2}‖ sup_{ω∈[0,π]} ‖A^{−1}(ω)‖ max_{1≤j≤q} ‖X_n(ω_j)‖ + max_{1≤j≤q} ‖Z_n(ω_j)‖).
Using Lemma 1, Lemma 5 and Theorem 9, (cid:12)(cid:12)(cid:12) max ≤ j ≤ q (cid:107) B − ( ω j ) X n ( ω j ) (cid:107) − max ≤ j ≤ q (cid:107)Z n ( ω j ) (cid:107) (cid:12)(cid:12)(cid:12) = o p (1) , as n → ∞ and we use Theorem 2 to conclude. The proof of Theorem 7 is a simple consequence of the following three lemmas. We remark thatwe can work with X n ( ω j ) instead of Y n ( ω j ), as those quantities are identical under H for any j = 1 , . . . , q . Lemma 6. Under Assumption 2 we have (cid:12)(cid:12)(cid:12) max ≤ j ≤ q (cid:107) ( I − e − i ω j (cid:98) ρ ) X n ( ω j ) (cid:107) − max ≤ j ≤ q (cid:107) ( I − e − i ω j ρ ) X n ( ω j ) (cid:107) (cid:12)(cid:12)(cid:12) = o P (cid:16) log na n (cid:17) . Proof. Let v = ( I − e − i ω j ρ ) X n ( ω j ) and h = e − i ω j ( ρ − (cid:98) ρ ) X n ( ω j ) . Then, using (cid:107) v + h (cid:107) − (cid:107) v (cid:107) = (cid:107) h (cid:107) + (cid:104) v, h (cid:105) + (cid:104) v, h (cid:105) and thus |(cid:107) v + h (cid:107) − (cid:107) v (cid:107) | ≤ (cid:107) h (cid:107) + 2 (cid:107) h (cid:107)(cid:107) v (cid:107) we get (cid:12)(cid:12)(cid:12) max ≤ j ≤ q (cid:107) ( I − e − i ω j (cid:98) ρ ) X n ( ω j ) (cid:107) − max ≤ j ≤ q (cid:107) ( I − e − i ω j ρ ) X n ( ω j ) (cid:107) (cid:12)(cid:12)(cid:12) ≤ max ≤ j ≤ q (cid:12)(cid:12) (cid:107) ( I − e − i ω j (cid:98) ρ ) X n ( ω j ) (cid:107) − (cid:107) ( I − e − i ω j ρ ) X n ( ω j ) (cid:107) (cid:12)(cid:12) ≤ ( (cid:107) (cid:98) ρ − ρ (cid:107) + 2(1 + (cid:107) ρ (cid:107) ) (cid:107) (cid:98) ρ − ρ (cid:107) ) max ≤ j ≤ q (cid:107)X n ( ω j ) (cid:107) . By (3.5) max ≤ j ≤ q (cid:107)X n ( ω j ) (cid:107) = O P (max ≤ j ≤ q (cid:107)E n ( ω j ) (cid:107) ), which in turn is O P (log n ) by Theorem 4. Lemma 7. Under Assumption 2 we have max j ≥ | λ j − ˆ λ j | = o P (cid:16) a n (cid:17) . roof. By Weyl’s lemma it suffices to show that (cid:13)(cid:13) n (cid:80) nt =1 ˆ ε t ⊗ ˆ ε t − Σ (cid:13)(cid:13) = o P ( a − n ). 
(For the sake of simplicity take averages from 1 to $n$.) Since we require 4 moments for the $\varepsilon_t$, it follows that $\bigl\|\frac1n\sum_{t=1}^n\varepsilon_t\otimes\varepsilon_t-\Sigma\bigr\|=O_P(n^{-1/2})$. Hence we have that
\[
\Bigl\|\frac1n\sum_{t=1}^n\hat\varepsilon_t\otimes\hat\varepsilon_t-\Sigma\Bigr\|
\le\frac1n\sum_{t=1}^n\bigl\|\hat\varepsilon_t\otimes\hat\varepsilon_t-\varepsilon_t\otimes\varepsilon_t\bigr\|+\Bigl\|\frac1n\sum_{t=1}^n\varepsilon_t\otimes\varepsilon_t-\Sigma\Bigr\|
\le\frac1n\sum_{t=1}^n\|\hat\varepsilon_t\otimes\hat\varepsilon_t-\varepsilon_t\otimes\varepsilon_t\|+O_P(n^{-1/2})
\]
\[
\le\frac2n\sum_{t=1}^n\|\varepsilon_t\|\|\hat\varepsilon_t-\varepsilon_t\|+\frac1n\sum_{t=1}^n\|\hat\varepsilon_t-\varepsilon_t\|^2+O_P(n^{-1/2})
\le2\Bigl(\frac1n\sum_{t=1}^n\|\varepsilon_t\|^2\Bigr)^{1/2}\Bigl(\frac1n\sum_{t=1}^n\|\hat\varepsilon_t-\varepsilon_t\|^2\Bigr)^{1/2}+\frac1n\sum_{t=1}^n\|\hat\varepsilon_t-\varepsilon_t\|^2+O_P(n^{-1/2}).
\]
Now we have
\[
\frac1n\sum_{t=1}^n\|\hat\varepsilon_t-\varepsilon_t\|^2=\frac1n\sum_{t=1}^n\|(\hat\rho-\rho)X_{t-1}\|^2\le\|\hat\rho-\rho\|^2\times\frac1n\sum_{t=1}^n\|X_{t-1}\|^2=o_P(a_n^{-2}).
\]
Here we used that by the ergodic theorem $\frac1n\sum_{t=1}^n\|X_{t-1}\|^2=O_P(1)$ and by the law of large numbers $\frac1n\sum_{t=1}^n\|\varepsilon_t\|^2=O_P(1)$. Hence the claim follows.

Lemma 8. Under Assumption 1 and Assumption 2 we have
\[
\sum_{j\ge2}\log(1-\lambda_j/\lambda_1)-\sum_{j=2}^{a_n}\log(1-\hat\lambda_j/\hat\lambda_1)=o_P(1).
\]

Proof. Since the series $\sum_{j\ge2}\log(1-\lambda_j/\lambda_1)$ is convergent and $a_n\to\infty$, it suffices to show that
\[
\sum_{j=2}^{a_n}\bigl(\log(1-\lambda_j/\lambda_1)-\log(1-\hat\lambda_j/\hat\lambda_1)\bigr)=o_P(1).
\]
To this end we note that by the mean value theorem and the monotonicity of $\log'x=x^{-1}$ we have
\[
\bigl|\log(1-\lambda_j/\lambda_1)-\log(1-\hat\lambda_j/\hat\lambda_1)\bigr|
\le\Bigl|\frac{\hat\lambda_j}{\hat\lambda_1}-\frac{\lambda_j}{\lambda_1}\Bigr|\times\max\Bigl\{\frac{\lambda_1}{\lambda_1-\lambda_j},\frac{\hat\lambda_1}{\hat\lambda_1-\hat\lambda_j}\Bigr\}
\le\Bigl|\frac{\hat\lambda_j}{\hat\lambda_1}-\frac{\lambda_j}{\lambda_1}\Bigr|\times\max\Bigl\{\frac{\lambda_1}{\lambda_1-\lambda_2},\frac{\hat\lambda_1}{\hat\lambda_1-\hat\lambda_2}\Bigr\}.
\]
By Lemma 7 we have $\max\{\lambda_1/(\lambda_1-\lambda_2),\hat\lambda_1/(\hat\lambda_1-\hat\lambda_2)\}=O_P(1)$ and, using that eventually $1/\hat\lambda_1\le2/\lambda_1$,
\[
\max_{j\ge2}\Bigl|\frac{\hat\lambda_j}{\hat\lambda_1}-\frac{\lambda_j}{\lambda_1}\Bigr|
\le\max_{j\ge2}\Bigl|\frac{\hat\lambda_j}{\hat\lambda_1}-\frac{\lambda_j}{\hat\lambda_1}\Bigr|+\max_{j\ge2}\Bigl|\frac{\lambda_j}{\hat\lambda_1}-\frac{\lambda_j}{\lambda_1}\Bigr|
\le\frac{2}{\lambda_1}\max_{j\ge2}\bigl(|\hat\lambda_j-\lambda_j|+|\hat\lambda_1-\lambda_1|\bigr)
\le\frac{4}{\lambda_1}\max_{j\ge1}|\lambda_j-\hat\lambda_j|=o_P(a_n^{-1}).
\]

Proof of Theorem 8. Define $N=\lfloor n/d\rfloor$, $r=n-dN\in\{0,\dots,d-1\}$ and set $\hat\omega=2\pi N/n$. Clearly, when $r=0$, then $\hat\omega=2\pi/d$ and then
\[
\mathcal S_n(\hat\omega)=\sqrt{\frac Nd}\sum_{t=1}^d s(t)e^{-i\frac{2\pi t}{d}}.
\]
Let us elaborate the term when $r\ne0$. Recall that
\[
\mathcal S_n(\hat\omega):=\frac{1}{\sqrt n}\sum_{t=1}^n s(t)e^{-i\hat\omega t}.
\]
By the $d$-periodicity of $s(t)$ and letting $R_n=\frac{1}{\sqrt n}\sum_{t=n-r+1}^n s(t)e^{-i\hat\omega t}$ we get
\[
\mathcal S_n(\hat\omega)=\frac{1}{\sqrt n}\sum_{t=1}^d s(t)\sum_{m=0}^{N-1}e^{-i\frac{2\pi(t+md)N}{n}}+R_n
=\sum_{t=1}^d s(t)e^{-i\hat\omega t}\times\frac{1}{\sqrt n}\sum_{m=0}^{N-1}e^{-i\frac{2\pi m(n-r)}{n}}+R_n
=\sum_{t=1}^d s(t)e^{-i\hat\omega t}\times\frac{1}{\sqrt n}\sum_{m=0}^{N-1}e^{i\frac{2\pi rm}{n}}+R_n.
\]
Now using the formula $\sum_{m=0}^{N-1}e^{imx}=\frac{\sin(Nx/2)}{\sin(x/2)}e^{ix(N-1)/2}$, we have with $x=2\pi r/n$
\[
\Bigl|\sum_{m=0}^{N-1}e^{i\frac{2\pi rm}{n}}\Bigr|=\Bigl|\frac{\sin(\frac{N\pi r}{n})}{\sin(\frac{\pi r}{n})}\Bigr|\ge\frac{\sin\bigl(\pi\min\{\frac{d-r}{d},\frac{Nr}{n}\}\bigr)}{\sin(\frac{\pi r}{n})}.
\]
For the last inequality we use that $\pi\min\{\frac{d-r}{d},\frac{Nr}{n}\}\in[0,\pi/2]$ and that $\sin(x)$ is increasing in this interval. Recall, moreover, that $x/2\le\sin(x)\le x$ for $x\in[0,\pi/2]$. Since $d/n\to0$, for large enough $n$,
\[
\frac{\sin\bigl(\pi\min\{\frac{d-r}{d},\frac{Nr}{n}\}\bigr)}{\sin(\frac{\pi r}{n})}\ge\frac{n}{2r}\min\Bigl\{\frac{d-r}{d},\frac{Nr}{n}\Bigr\}\ge\frac{N}{2d}.
\]
Because of $\|R_n\|=O(n^{-1/2})$ we can conclude that for $n\ge n_0$ we have
\[
\|\mathcal S_n(\hat\omega)\|\ge\frac{N}{2d\sqrt n}\Bigl\|\sum_{t=1}^d s(t)e^{-i\hat\omega t}\Bigr\|.
\]
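The Dirichlet-kernel identity $|\sum_{m=0}^{N-1}e^{i2\pi rm/n}|=|\sin(N\pi r/n)/\sin(\pi r/n)|$ used in the proof above can be checked numerically. The following standalone Python sketch (illustrative only; the function names and the sample values of $n$ and $d$ are ours, not from the paper) compares the geometric sum with the closed form and with the lower bound $N/(2d)$:

```python
import cmath
import math

def geometric_sum_abs(n, N, r):
    """|sum_{m=0}^{N-1} exp(i * 2*pi*r*m/n)|, computed term by term."""
    return abs(sum(cmath.exp(2j * math.pi * r * m / n) for m in range(N)))

def dirichlet_abs(n, N, r):
    """Closed form |sin(N*pi*r/n) / sin(pi*r/n)|, valid when r/n is not an integer."""
    return abs(math.sin(N * math.pi * r / n) / math.sin(math.pi * r / n))

# Illustrative values: period d = 7 and sample size n = 100,
# so N = floor(n/d) = 14 and remainder r = n - d*N = 2.
n, d = 100, 7
N, r = n // d, n - d * (n // d)

# identity holds up to floating point error
assert abs(geometric_sum_abs(n, N, r) - dirichlet_abs(n, N, r)) < 1e-9
# and the sum indeed exceeds the lower bound N/(2d) used in the proof
assert geometric_sum_abs(n, N, r) >= N / (2 * d)
```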
We note, moreover, that
\[
\Bigl\|\sum_{t=1}^d s(t)\bigl(e^{-i\hat\omega t}-e^{-i\frac{2\pi t}{d}}\bigr)\Bigr\|
\le\sum_{t=1}^d\|s(t)\|\,\Bigl|\hat\omega-\frac{2\pi}{d}\Bigr|\,t
\le\sum_{t=1}^d\|s(t)\|\,\frac{2\pi r}{n}=O(d/N).
\]
Because of (4.3) we may hence conclude that for large enough $n$
\[
\|\mathcal S_n(\hat\omega)\|\ge\frac{\sqrt n}{4d^2}\Bigl\|\sum_{t=1}^d s(t)e^{-i\frac{2\pi t}{d}}\Bigr\|,
\]
and thus
\[
\max_{1\le j\le q}\|(I-e^{-i\omega_j}\hat\rho)\mathcal Y_n(\omega_j)\|
=\max_{1\le j\le q}\|(I-e^{-i\omega_j}\hat\rho)(\mathcal S_n(\omega_j)+\mathcal X_n(\omega_j))\|
\ge\|(I-e^{-i\hat\omega}\hat\rho)\mathcal S_n(\hat\omega)\|-\max_{1\le j\le q}\|(I-e^{-i\omega_j}\hat\rho)\mathcal X_n(\omega_j)\|
\ge(1-\|\hat\rho\|)\psi_n\sqrt{\log n}-\max_{1\le j\le q}\|(I-e^{-i\omega_j}\hat\rho)\mathcal X_n(\omega_j)\|.
\]
Furthermore,
\[
\max_{1\le j\le q}\|(I-e^{-i\omega_j}\hat\rho)\mathcal X_n(\omega_j)\|
=\max_{1\le j\le q}\|(I-e^{-i\omega_j}\hat\rho)(I-e^{-i\omega_j}\rho)^{-1}(I-e^{-i\omega_j}\rho)\mathcal X_n(\omega_j)\|
\le\frac{1+\|\hat\rho\|}{1-\|\rho\|}\max_{1\le j\le q}\|\mathcal X_n(\omega_j)\|=O_P(\sqrt{\log n}).
\]
This last bound is obtained by $\|\hat\rho-\rho\|\xrightarrow{P}0$, Lemma 2 and Theorem 4. So by (4.3) it follows that $\max_{1\le j\le q}\|(I-e^{-i\omega_j}\hat\rho)\mathcal X_n(\omega_j)\|=o_P(\psi_n\sqrt{\log n})$. This shows that $\max_{1\le j\le q}\|(I-e^{-i\omega_j}\hat\rho)\mathcal Y_n(\omega_j)\|^2$ diverges at least at the rate $\psi_n^2\log n$. Since the estimated eigenvalues in the definition of $T_n$ converge by assumption, the statistic $T_n$ must diverge.

Auxiliary lemmas for standardising sequences

Lemma 9. Let $d=d_n\to\infty$. Suppose $\lambda_1>\lambda_2$. Then we have that $b_n-b_n^{(d)}\to0$ as $n\to\infty$, where $b_n=\lambda_1\log\bigl(q\prod_{j=2}^\infty(1-\lambda_j/\lambda_1)^{-1}\bigr)$.

Proof.
We have that
\[
b_n-b_n^{(d)}=\lambda_1\sum_{j=d+1}^\infty\log\bigl(1+\lambda_j/(\lambda_1-\lambda_j)\bigr),
\]
and for any $j\ge2$,
\[
\log\bigl(1+\lambda_j/(\lambda_1-\lambda_j)\bigr)\le\lambda_j/(\lambda_1-\lambda_j)\le\lambda_j/(\lambda_1-\lambda_2).
\]
The claim follows from $\sum_{j=1}^\infty\lambda_j<\infty$.

Auxiliary lemmas for truncation

Lemma 10. Suppose that $d\ge1$ is fixed and $\mathrm E\|X_t\|^r<\infty$ with $r>2$. Then
\[
M_n-\tilde M_n:=\max_{1\le k\le q}\|\mathcal X_n^{(d)}(\omega_k)\|-\max_{1\le k\le q}\|\tilde{\mathcal X}_n^{(d)}(\omega_k)\|=o_P(1)
\]
as $n\to\infty$.

Proof. We have that $\cap_{t=1}^n\{X_t=\tilde X_t\}\subset\{M_n=\tilde M_n\}$. Hence,
\[
P(|M_n-\tilde M_n|>\varepsilon)\le P(M_n\ne\tilde M_n)\le P\bigl(\cup_{t=1}^n\{X_t\ne\tilde X_t\}\bigr)\le nP(\|X_1\|>n^{1/r})\to0
\]
as $n\to\infty$ for each $\varepsilon>0$, since the $X_t$'s have the same distribution and $\mathrm E\|X_1\|^r<\infty$. The proof is complete.

Lemma 11. Suppose that $\mathrm E\|X_1\|^r<\infty$ with some $r\ge2$. Then $\|\mathrm{Var}(X_1)-\mathrm{Var}(\tilde X_1)\|=o(n^{-(1-2/r)})$ as $n\to\infty$.

Proof. We have that
\[
\mathrm{Var}(\tilde X_1)=\mathrm E\bigl[I_{\{\|X_1\|\le n^{1/r}\}}(X_1\otimes X_1)\bigr]-\mathrm E\bigl[X_1I_{\{\|X_1\|\le n^{1/r}\}}\bigr]\otimes\mathrm E\bigl[X_1I_{\{\|X_1\|\le n^{1/r}\}}\bigr]
\]
and
\[
\mathrm E\bigl[X_1I_{\{\|X_1\|\le n^{1/r}\}}\bigr]=-\mathrm E\bigl[X_1I_{\{\|X_1\|>n^{1/r}\}}\bigr],
\]
since $\mathrm EX_1=0$. Hence,
\[
\|\mathrm{Var}(X_1)-\mathrm{Var}(\tilde X_1)\|
=\bigl\|\mathrm E[(X_1\otimes X_1)I_{\{\|X_1\|>n^{1/r}\}}]+\mathrm E[X_1I_{\{\|X_1\|>n^{1/r}\}}]\otimes\mathrm E[X_1I_{\{\|X_1\|>n^{1/r}\}}]\bigr\|
\le2\,\mathrm E\bigl[\|X_1\|^2I_{\{\|X_1\|>n^{1/r}\}}\bigr]
\le2\,\bigl(\mathrm E[\|X_1\|^rI_{\{\|X_1\|>n^{1/r}\}}]\bigr)^{2/r}\cdot n^{-(1-2/r)}.
\]
In the last step we used the Hölder inequality; since $\mathrm E[\|X_1\|^rI_{\{\|X_1\|>n^{1/r}\}}]\to0$ as $n\to\infty$, the bound is $o(n^{-(1-2/r)})$. The proof is complete.

Lemma 12. Suppose that Assumption 1 holds. Denote the eigenvectors of $\mathrm{Var}(\tilde X_1)$ by $\tilde v_1,\tilde v_2,\dots$ with the corresponding eigenvalues $\tilde\lambda_1,\tilde\lambda_2,\dots$, and $c_k=\mathrm{sgn}\langle v_k,\tilde v_k\rangle$ for $k\ge1$. Then $\|\tilde v_k-c_kv_k\|=o(n^{-(1-2/r)})$ as $n\to\infty$ for each $k\ge1$.

Proof.
Using Lemma 2.3 of Horváth and Kokoszka [17],
\[
\|\tilde v_k-c_kv_k\|\le\frac{2\sqrt2}{\alpha_k}\|\mathrm{Var}(X_1)-\mathrm{Var}(\tilde X_1)\|,
\]
where $\alpha_1=\lambda_1-\lambda_2$ and $\alpha_k=\min\{\lambda_{k-1}-\lambda_k,\lambda_k-\lambda_{k+1}\}$ for $k>1$. We use Lemma 11 to conclude the proof.

Lemma 13. Suppose that $\mathrm EX_1=0$ and $\mathrm E\|X_1\|^r<\infty$ with some $r\ge2$. Then for any $v>r$ we have $\mathrm E\|\tilde X_1\|^v=O(n^{v/r-1})$ as $n\to\infty$.

Proof. We have that
\[
\mathrm E\|\tilde X_1\|^v
\le\mathrm E\bigl\|X_1I_{\{\|X_1\|\le n^{1/r}\}}-\mathrm E[X_1I_{\{\|X_1\|\le n^{1/r}\}}]\bigr\|^v
\le\mathrm E\bigl(\|X_1\|I_{\{\|X_1\|\le n^{1/r}\}}+\mathrm E[\|X_1\|I_{\{\|X_1\|\le n^{1/r}\}}]\bigr)^v
\le2^v\,\mathrm E\bigl[\|X_1\|^vI_{\{\|X_1\|\le n^{1/r}\}}\bigr]
=2^v\,\mathrm E\bigl[\|X_1\|^r\|X_1\|^{v-r}I_{\{\|X_1\|\le n^{1/r}\}}\bigr]
\le2^v\,\mathrm E\|X_1\|^r\cdot n^{v/r-1}.
\]
The proof is complete.

Auxiliary lemma for CLT

Lemma 14. Set $V_t:=\xi_t^{(d)}\otimes f_t$ (for vectors $\otimes$ denotes the Kronecker product) with $1\le t\le n$, where $\xi_t^{(d)}$ and $f_t$ are given by (7.13) and (7.3), respectively. Then
\[
\frac{\lambda_d}{2}\le\frac1n\sum_{t=1}^n\mathrm E|u'V_t|^2\le\frac{\lambda_1}{2}
\]
for all $u\in S^{dq-1}$.

Proof of Lemma 14. Denote $u=(u_1',\dots,u_d')'\in\mathbb R^{dq}$ with $u_k\in\mathbb R^q$ for $1\le k\le d$. Since
\[
\mathrm E\bigl[\xi_t^{(d)}(\xi_t^{(d)})'\bigr]=\mathrm{diag}(\lambda_1,\dots,\lambda_d),
\]
we obtain
\[
\frac1n\sum_{t=1}^n\mathrm E|u'V_t|^2
=\frac1n\sum_{t=1}^n\sum_{j,k=1}^d\mathrm E\bigl[\langle X_t,v_j\rangle\langle X_t,v_k\rangle\bigr]\,u_j'f_t\,u_k'f_t
=\sum_{j=1}^d\lambda_j\,u_j'\Bigl(\frac1n\sum_{t=1}^nf_tf_t'\Bigr)u_j.
\]
But note that $\frac1n\sum_{t=1}^nf_tf_t'=\frac12I_q$. Hence,
\[
\frac1n\sum_{t=1}^n\mathrm E|u'V_t|^2=\frac12\sum_{j=1}^d\lambda_j\|u_j\|^2,\tag{7.30}
\]
and (7.30) is maximized if $\|u_1\|=1$ and minimized if $\|u_d\|=1$.
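The identity $\frac1n\sum_{t=1}^n f_tf_t'=\frac12I_q$ used in the proof of Lemma 14 is the orthogonality of the trigonometric design at fundamental frequencies. Assuming the $f_t$ of (7.3) are the usual cosine/sine pairs at frequencies $2\pi j/n$ (an assumption on our part, since (7.3) is not reproduced here), the identity can be checked numerically with the following Python sketch:

```python
import math

def fourier_design(n, js):
    """Design vectors f_t = (cos(w_j t), sin(w_j t))_{j in js} at the
    fundamental frequencies w_j = 2*pi*j/n, for t = 1, ..., n."""
    freqs = [2 * math.pi * j / n for j in js]
    return [
        [fn(w * t) for w in freqs for fn in (math.cos, math.sin)]
        for t in range(1, n + 1)
    ]

def average_outer_product(rows):
    """(1/n) * sum_t f_t f_t', returned as a nested list."""
    n, p = len(rows), len(rows[0])
    return [[sum(r[a] * r[b] for r in rows) / n for b in range(p)] for a in range(p)]

n = 200
G = average_outer_product(fourier_design(n, (1, 2, 5)))
# G should equal (1/2) * identity, up to floating point error
for a, row in enumerate(G):
    for b, g in enumerate(row):
        assert abs(g - (0.5 if a == b else 0.0)) < 1e-9
```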
Acknowledgments

Vaidotas Characiejus would like to acknowledge the support of the Communauté française de Belgique, Actions de Recherche Concertées, Projects Consolidation 2016–2021. The authors would like to thank Professor Kengo Kato for sharing a detailed proof of Nazarov's inequality (Chernozhukov et al. [6]) and Professor Fedor Nazarov, who kindly communicated the proof of Theorem 9 in the univariate case on MathOverflow.

Appendix

A Constants in the high-dimensional CLT

The constant $C$ in (7.2) depends on the parameters $b$ and $s$ (in our setting, this corresponds to $\lambda_d$ and $2d$). When we let $d\to\infty$, we need to make this dependence explicit. This is the purpose of our Proposition 2, which is an extension of Proposition 3.2 of Chernozhukov et al. [5]. We outline here the modifications needed. Proposition 3.2 of Chernozhukov et al. [5] is based on a series of other results, which we now formulate in the adapted version. An important step in this extension is the following lemma, which is a refinement of Lemma A.1 of Chernozhukov et al. [5]. This result is originally due to Nazarov [28]. For the proof, we refer to Chernozhukov et al. [6].

Lemma 15. Let $Y\sim N_p(0,\Sigma)$ be such that $\mathrm EY_j^2\ge b$ for all $j=1,\dots,p$ and with $b>0$. Then for every $y\in\mathbb R^p$ and $\delta>0$,
\[
P(Y\le y+\delta)-P(Y\le y)\le\frac{\delta}{b^{1/2}}\bigl(\sqrt{2\log p}+2\bigr),\tag{A.1}
\]
where the inequalities between vectors are coordinatewise.

For the rest of this section we will use essentially the same notation as in Chernozhukov et al. [5], with the exception of the constants $K_i$ for $i\ge1$, which in our case are independent of the parameters $n$, $p$, $b$ and $s$. Moreover, we use $V_t$ and $W_t$ in (7.1) instead of $X_t$ and $Y_t$, since the latter variables already have a different usage in this paper. It will be assumed throughout that $p\ge3$. Here is some notation needed later.
\[
L_n=\max_{1\le j\le p}\frac1n\sum_{i=1}^n\mathrm E|V_{i,j}|^3;\qquad
M_{n,V}(\varphi)=\frac1n\sum_{i=1}^n\mathrm E\Bigl(\max_{1\le j\le p}|V_{i,j}|^3\,I\bigl\{\max_{1\le j\le p}|V_{i,j}|>\sqrt n/(4\varphi\log p)\bigr\}\Bigr);\qquad
M_n(\varphi)=M_{n,V}(\varphi)+M_{n,W}(\varphi).
\]

Lemma 16 (Modification of Lemma 5.1 in Chernozhukov et al. [5]). Denote
\[
\varrho_n=\sup_{y\in\mathbb R^p,\,v\in[0,1]}\bigl|P\bigl(\sqrt v\,S_n^V+\sqrt{1-v}\,S_n^W\le y\bigr)-P(S_n^W\le y)\bigr|.
\]
Suppose that there exists some constant $b>0$ such that $\frac1n\sum_{i=1}^n\mathrm E[V_{i,j}^2]\ge b$ for all $j=1,\dots,p$. Then for all $\varphi\ge1$ it holds that
\[
\varrho_n\le K\Bigl\{\frac{\varphi^2\log^2p}{n^{1/2}}\Bigl(\varphi L_n\varrho_n+\frac{L_n\log^{1/2}p}{b^{1/2}}+\varphi M_n(\varphi)\Bigr)+\frac{\log^{1/2}p}{\varphi\,b^{1/2}}\Bigr\}.
\]

Proof. Replace in the proof of Lemma 5.1 in Chernozhukov et al. [5] the bound obtained from their Lemma A.1 by our Lemma 15. This lemma is used in two places. At all other places, the constant $K$ required in Chernozhukov et al. [5] is not affected by the value of $b$.

The lemma can easily be extended to hyperrectangles. Let $\mathcal A^{\mathrm{re}}$ be the class of hyperrectangles in $\mathbb R^p$.

Lemma 17 (Modification of Corollary 5.1 in Chernozhukov et al. [5]). Denote
\[
\varrho_n'=\sup_{A\in\mathcal A^{\mathrm{re}},\,v\in[0,1]}\bigl|P\bigl(\sqrt v\,S_n^V+\sqrt{1-v}\,S_n^W\in A\bigr)-P(S_n^W\in A)\bigr|.
\]
Suppose that there exists some constant $b>0$ such that $\frac1n\sum_{i=1}^n\mathrm E[V_{i,j}^2]\ge b$ for all $j=1,\dots,p$. Then for all $\varphi\ge1$ it holds that
\[
\varrho_n'\le K\Bigl\{\frac{\varphi^2\log^2p}{n^{1/2}}\Bigl(\varphi L_n\varrho_n'+\frac{L_n\log^{1/2}p}{b^{1/2}}+\varphi M_n(2\varphi)\Bigr)+\frac{\log^{1/2}p}{\varphi\,b^{1/2}}\Bigr\}.
\]

Lemma 18 (Modification of Theorem 2.1 in Chernozhukov et al. [5]). Suppose that there exists some constant $b\in(0,1]$ such that $\frac1n\sum_{i=1}^n\mathrm E[V_{i,j}^2]\ge b$ for all $j=1,\dots,p$. Then, if $\bar L_n\ge L_n$,
\[
\rho_n(\mathcal A^{\mathrm{re}})\le K\Bigl\{\frac{\log^{7/6}p\,\bar L_n^{1/3}}{b^{1/2}n^{1/6}}+\frac{M_n(\varphi_n)}{\bar L_n}\Bigr\},
\]
where $\varphi_n=\gamma n^{1/6}/(\bar L_n^{1/3}\log^{2/3}p)$ and $\gamma=K_1\vee1$.

Proof. Note that
\[
K\,\frac{\log^{7/6}p\,\bar L_n^{1/3}}{b^{1/2}n^{1/6}}=K\Bigl(\frac{\gamma\log^{1/2}p}{b^{1/2}}\Bigr)\frac{1}{\varphi_n}\ge\frac{K\gamma}{\varphi_n}.
\]
Thus, the result is trivial if $\varphi_n<2$, by choosing $K$ big enough. Otherwise $\varphi_n\ge2$, so that $\varphi=\varphi_n/2\ge1$, and we apply Lemma 17 with $\varphi=\varphi_n/2$.

Lemma 19 (Modification of Proposition 2.1 in Chernozhukov et al. [5]). Suppose that there exists some constant $b\in(0,1]$ such that $\frac1n\sum_{i=1}^n\mathrm E[V_{i,j}^2]\ge b$ for all $j=1,\dots,p$. Suppose, moreover, that condition (ii) holds for some sequence $B_n\ge1$. Then, under (iv) we have
\[
\rho_n(\mathcal A^{\mathrm{re}})\le K\Bigl\{\frac{B_n^{1/3}\log^{7/6}(pn)}{b^{1/2}n^{1/6}}+\frac{B_n^{2/3}\log(pn)}{b^{1/2}n^{\frac{q-2}{3q}}}\Bigr\}.\tag{A.2}
\]

Proof. The proof is based on Lemma 18, choosing $\varphi_n=\gamma(n^{-1}\bar L_n^2\log^4p)^{-1/6}$ and $\bar L_n=B_n+B_nn^{1/2-1/q}\log^{1/2}p$. In Chernozhukov et al. [5] exactly the same terms are used and worked out, but their bound corresponding to our (A.2) does not involve the factor $b^{-1/2}$ (here it comes from our Lemma 18). The dependence on $b$ in their bound remains latent. In particular, it is implicit in the constant corresponding to our $\gamma$ (they denote it $K$). In our case this constant doesn't depend on $b$. We also note that Chernozhukov et al. [5] request in their proof the constraints
\[
\frac{B_n^{1/3}\log^{7/6}(pn)}{n^{1/6}}\le\min\{C\gamma^{-1/2},\gamma/2\}
\quad\text{and}\quad
\frac{B_n^{2/3}\log(pn)}{n^{\frac{q-2}{3q}}}\le\gamma/2,
\]
with some absolute constant $C$. (See inequalities (32) and (33) in Chernozhukov et al. [5].) We can impose these assumptions as well, since otherwise (A.2) becomes trivial by choosing $K=K(\gamma)$ big enough. Here we use again that our $\gamma$ doesn't depend on $b$. Hence, these assumptions will also not invoke dependence of $K$ on $b$.

For the next result we need further terms and definitions, which one can find in Section 3 of Chernozhukov et al. [5]. For convenience we give here a quick review. For a closed convex set $A$ we define a mapping $S_A$, which maps $v\in S^{p-1}$ ($=\{v\in\mathbb R^p:\|v\|=1\}$) to $S_A(v)=\sup\{w'v:w\in A\}$. Then $A=\cap_{v\in S^{p-1}}\{w\in\mathbb R^p:w'v\le S_A(v)\}$. If $A$ is a convex polytope with at most $m$ facets, then it is called $m$-generated.
If $\mathcal V(A)$ are the $m$ unit vectors orthogonal to the facets, then $A=\cap_{v\in\mathcal V(A)}\{w\in\mathbb R^p:w'v\le S_A(v)\}$. For an $m$-generated set $A^m$, set
\[
A^{m,\epsilon}=\cap_{v\in\mathcal V(A^m)}\{w\in\mathbb R^p:w'v\le S_{A^m}(v)+\epsilon\}.
\]
A convex set $A$ admits an approximation with precision $\epsilon$ by an $m$-generated convex set $A^m$ if $A^m\subset A\subset A^{m,\epsilon}$.

We are now ready to define the class $\mathcal A^{\mathrm{si}}(d)$, which is the class of Borel sets $A\subset\mathbb R^p$ such that $A$ admits an approximation with precision $1/n$ by an $m$-generated convex set $A^m$ with $m\le(pn)^d$. (In Chernozhukov et al. [5] a more general class $\mathcal A^{\mathrm{si}}(a,d)$ is introduced, but for us only the case $a=1$ is relevant.) Consider $\mathcal A\subset\mathcal A^{\mathrm{si}}(d)$. For some $A\in\mathcal A$ let $A^m=A^m(A)$ be the approximating $m$-generated set. For the process $V_t$ let
\[
\tilde V_t=(\tilde V_{t,1},\dots,\tilde V_{t,m})'=(v'V_t)_{v\in\mathcal V(A^m)},\qquad t=1,\dots,n,
\]
and consider the following conditions:

(i') $\frac1n\sum_{t=1}^n\mathrm E|\tilde V_{t,j}|^2\ge b$ for all $j=1,\dots,m$;
(ii') $\frac1n\sum_{t=1}^n\mathrm E|\tilde V_{t,j}|^{2+k}\le B_n^k$ for all $j=1,\dots,m$ and $k=1,2$;
(iv') $\mathrm E\max_{1\le j\le m}(|\tilde V_{t,j}|/B_n)^q\le2$ for all $t=1,\dots,n$.

Lemma 20 (Modification of Proposition 3.1 in Chernozhukov et al. [5]). Let $\mathcal A$ be a subclass of $\mathcal A^{\mathrm{si}}(d)$ such that (i'), (ii') and (iv') are satisfied for all $A\in\mathcal A$. Then
\[
\rho_n(\mathcal A)\le K\Bigl\{\frac{B_n^{1/3}\log^{7/6}\bigl((pn)^d\bigr)}{b^{1/2}n^{1/6}}+\frac{B_n^{2/3}\log\bigl((pn)^d\bigr)}{b^{1/2}n^{\frac{q-2}{3q}}}\Bigr\}.\tag{A.3}
\]
The constant $K$ does not depend on $d$.

Proof. Following the proof of Proposition 3.1 in Chernozhukov et al. [5] and applying our Lemma 15 instead of their Lemma A.1, we obtain
\[
|P(S_n^V\in A)-P(S_n^W\in A)|\le\frac{1}{nb^{1/2}}\Bigl(\sqrt{2\log\bigl((pn)^d\bigr)}+2\Bigr)+\bar\rho,
\]
where
\[
\bar\rho=\max\bigl\{|P(S_n^V\in A^m)-P(S_n^W\in A^m)|,\;|P(S_n^V\in A^{m,\epsilon})-P(S_n^W\in A^{m,\epsilon})|\bigr\}.
\]
For $\bar\rho$ we can use Lemma 19, and apply it to $\tilde V_1,\dots,\tilde V_n$.
From this we get the bound in (A.3), which in turn dominates $\frac{1}{nb^{1/2}}(\sqrt{2\log((pn)^d)}+2)$.

Proof of Proposition 2. We need to adapt the proof of Proposition 3.2 in Chernozhukov et al. [5] and make the dependence on $b$ and $s$ explicit. Since we are interested in the case $b\to0$, we can assume that $b\le1$. Here are the steps and modifications. In the following, $C$ is an absolute constant which may vary from place to place.

1. It is sufficient to consider sparsely convex sets $A$ with $\max_{1\le j\le p}|w_j|\le pn^{1/2}$ for all $w=(w_1,\dots,w_p)'\in A$. The argument is the same as in Chernozhukov et al. [5].

2. Consider the subclass $\mathcal A^{\mathrm{sp1}}(s)$ of sets in $\mathcal A^{\mathrm{sp}}(s)$ which contain a ball of radius $1/n$. Using their Lemma D.1 with $\gamma=1$, it is easy to show that $A\in\mathcal A^{\mathrm{sp1}}(s)$ is approximable by an $m$-generated set $A^m$ with precision $1/n$ and with $m\le(pn)^{2s}$, provided $n\ge n_0(\gamma)=n_0(1)$. This latter constraint is not a restriction, since for $n<n_0(1)$ we may just choose a big enough constant $K$. The target is then to show conditions (i'), (ii') and (iv') and apply Lemma 20. Condition (i') follows from condition (i) and the statement in Lemma D.1 that $A^m$ can be chosen to satisfy $\|v\|_0\le s$ for all $v\in\mathcal V(A^m)$. Next, in Chernozhukov et al. [5] it is shown that (ii') holds with $B_n$ replaced by $B_n'=B_ns^{3/2}$ and (iv') with $B_n$ replaced by $B_ns^{1/2}$. Since we require the original $B_n$ to be bounded, we get from (A.3)
\[
\rho_n(\mathcal A^{\mathrm{sp1}}(s))\le K\,\frac{s^2\log^{7/6}(pn)}{b^{1/2}n^{1/6}}.\tag{A.4}
\]

3. Let $\mathcal A^{\mathrm{sp2}}(s)=\mathcal A^{\mathrm{sp}}(s)\setminus\mathcal A^{\mathrm{sp1}}(s)$. Let us first consider the case of an $A\in\mathcal A^{\mathrm{sp2}}(s)$ where we have at least one $A_k$ in the representation $A=\cap_kA_k$ which does not contain a ball of radius $1/n$. Remember that $I_{A_k}(x)$ depends only on $s$ components of $x\in\mathbb R^p$, say $\tilde x=(x_{j_1},\dots,x_{j_s})\in\mathbb R^s$. Define a convex set $\tilde A_k\subset\mathbb R^s$ such that $I_{A_k}(x)=I_{\tilde A_k}(\tilde x)$ for all $x\in\mathbb R^p$. For $J=J(A_k)=(j_1,\dots,j_s)$ we then have $\{S_n^V\in A_k\}=\{S_{n,J}^V\in\tilde A_k\}$. By Lemma A.2 in Chernozhukov et al. [5] it follows that
\[
P(S_n^W\in A)\le P(S_n^W\in A_k)=P(S_{n,J}^W\in\tilde A_k)\le\frac Cn\sqrt{\|\Omega_J^{-1}\|_{\mathcal S}}\le\frac{Cs^{1/4}}{n\,b^{1/2}},
\]
where $\Omega_J=\mathrm{Var}(S_{n,J}^W)$. For the second inequality above we use that $\|\Omega_J^{-1}\|_{\mathcal S}\le\sqrt s/\lambda_{\min}$, where $\lambda_{\min}$ is the smallest eigenvalue of $\Omega_J$ and hence $1/\lambda_{\min}$ is the largest eigenvalue of $\Omega_J^{-1}$. By our condition (i) we have $\lambda_{\min}\ge b$. Next we bound
\[
|P(S_n^V\in A_k)-P(S_n^W\in A_k)|=|P(S_{n,J}^V\in\tilde A_k)-P(S_{n,J}^W\in\tilde A_k)|
\le\sup_{M\in\mathcal C}|P(\Omega_J^{-1/2}S_{n,J}^V\in M)-P(N_s(0,I_s)\in M)|=:\Delta,
\]
where $\mathcal C$ is the class of measurable convex sets in $\mathbb R^s$. In Götze [11] it is shown that for some absolute constant $C$ we have
\[
\Delta\le Cs^{1/4}\beta,\quad\text{where}\quad\beta=\sum_{t=1}^n\mathrm E\|\Omega_J^{-1/2}V_{t,J}/\sqrt n\|^3.
\]
Note that by (ii),
\[
\beta=\|\Omega_J^{-1/2}\|^3\,n^{-3/2}\sum_{t=1}^n\mathrm E\|V_{t,J}\|^3\le\frac{s^{3/2}B_n}{b^{3/2}n^{1/2}}.
\]
We can assume that $b^3n\ge1$; otherwise the bound in Proposition 2 becomes trivial, by choosing the constant big enough. Then $b^{3/2}n^{1/2}\ge b^{1/2}n^{1/6}$ and therefore
\[
P(S_n^V\in A)\le P(S_n^V\in A_k)\le P(S_n^W\in A_k)+|P(S_n^V\in A_k)-P(S_n^W\in A_k)|\le\frac{Cs^{7/4}B_n}{b^{1/2}n^{1/6}}.
\]
This shows that both $P(S_n^V\in A)$ and $P(S_n^W\in A)$ are dominated by $\frac{s^2\log^{7/6}(pn)}{b^{1/2}n^{1/6}}$ (up to a constant, $B_n$ being bounded), and hence this is also true for the difference $|P(S_n^V\in A)-P(S_n^W\in A)|$.

4. The last case we need to handle is when $A\in\mathcal A^{\mathrm{sp2}}(s)$ and $A=\cap_{k=1}^KA_k$, such that each $A_k$ contains a ball with radius $1/n$. We show that both $P(S_n^V\in A)$ and $P(S_n^W\in A)$ are dominated by the bound in (A.4); thus their difference is as well. Like in Step 2, we can find for each $k$ an $m$-generated convex set $A_k^m$ such that $A_k^m\subset A_k\subset A_k^{m,1/n}$. We have $m\le(pn)^{2s}$ and we can choose $A_k^m$ such that for all $v\in\mathcal V(A_k^m)$ we have $\|v\|_0\le s$. In Chernozhukov et al. [5] it is shown that quite generally $K\le p^s$.
Thus, $A^*:=\cap_{k=1}^KA_k^{m,1/n}$ is approximable by an $m'$-generated set with $m'\le p^s(pn)^{2s}\le(pn)^{3s}$. Using the same arguments as in Step 2, we see that (i'), (ii') and (iv') hold, and hence by Lemma 20 with $d=3s$ we have that $|P(S_n^V\in A^*)-P(S_n^W\in A^*)|$ is bounded as in (A.4). Now since $A$ contains no ball of radius $1/n$, we get $P(S_n^W\in\cap_{k=1}^KA_k^{m,-1/n})=0$ and hence
\[
P(S_n^W\in A)\le P(S_n^W\in A^*)
=P\bigl(S_n^W\in\cap_{k=1}^KA_k^{m,1/n}\bigr)-P\bigl(S_n^W\in\cap_{k=1}^KA_k^{m,-1/n}\bigr)
\]
\[
\le P\bigl(v'S_n^W\le S_{A_k^m}(v)+1/n:\ k=1,\dots,K,\ v\in\mathcal V(A_k^m)\bigr)
-P\bigl(v'S_n^W\le S_{A_k^m}(v)-1/n:\ k=1,\dots,K,\ v\in\mathcal V(A_k^m)\bigr)
\le\frac2n\cdot\frac{\sqrt{2\log\bigl((pn)^{3s}\bigr)}+2}{\sqrt b}.
\]
For the last inequality we used Lemma 15. Finally, we observe
\[
P(S_n^V\in A)\le P(S_n^V\in A^*)\le P(S_n^W\in A^*)+|P(S_n^V\in A^*)-P(S_n^W\in A^*)|.
\]
The proof is complete.

B Maximum of linear forms

Suppose that $X_1,\dots,X_n$ are iid zero mean random elements with values in $H$ and that $\{a_{jnt}\}_{1\le j\le q,1\le t\le n}\subset\mathcal L(H)$ are such that $\|a_{jnt}\|\le n^{-1/2}$ for $n\ge1$. Denote
\[
L_{nj}=\sum_{t=1}^na_{jnt}(X_t)
\]
for $n\ge1$ and $1\le j\le q$. We show that $\mathrm E\max_{1\le j\le q}\|L_{nj}\|^2=O(\log n)$ as $n\to\infty$, provided that $\mathrm E\|X_1\|^r<\infty$ with some $r>2$. We first prove an auxiliary lemma that is used in the proof.

Lemma 21. Suppose that $X_1,\dots,X_n$ are independent zero mean random elements with values in $H$ such that $\|X_t\|\le b$ a.s. with some $b>0$ and $p:=P(\|X_t\|\ne0)$ for each $t=1,\dots,n$. Then
\[
P(\|S_n\|\ge x)\le2e^{-\frac{x^2\beta}{b^2}}\Bigl[p\bigl(e^{\frac{x^2\beta^2}{2b^2}}-1\bigr)+1\Bigr]^n
\]
for $x\ge0$ and each $\beta\in\mathbb R$, where $S_n=X_1+\dots+X_n$ for $n\ge1$.

Proof. It follows from Theorem 3.5 of Pinelis [30] that $P(\|S_n\|\ge x)\le2\exp\{-x^2/(2nb^2)\}$ for all $x\ge0$. Let $A_k$ denote the event that $k$ out of the $n$ random elements $X_1,\dots,X_n$ are not equal to $0$, with $k=0,\dots,n$.
Using the fact that $k^{-1}\ge2\beta-\beta^2k$ for $k\ne0$ and $\beta\in\mathbb R$, we obtain
\[
P(\|S_n\|\ge x)=\sum_{k=0}^nP(\|S_n\|\ge x\mid A_k)P(A_k)
\le\sum_{k=1}^n2e^{-\frac{x^2}{2kb^2}}\binom nkp^k(1-p)^{n-k}
\le2e^{-\frac{x^2\beta}{b^2}}\sum_{k=1}^n\binom nke^{\frac{x^2\beta^2k}{2b^2}}p^k(1-p)^{n-k}
\le2e^{-\frac{x^2\beta}{b^2}}\Bigl[p\bigl(e^{\frac{x^2\beta^2}{2b^2}}-1\bigr)+1\Bigr]^n
\]
for $x\ge0$ and $\beta\in\mathbb R$. The proof is complete.

Now we are ready to prove the main result.

Theorem 9. Suppose that $X_1,\dots,X_n$ are iid random elements with values in $H$ such that $\mathrm EX_1=0$ and $\mathrm E\|X_1\|^r<\infty$ with some $r>2$. Then
\[
\mathrm E\max_{1\le j\le q}\|L_{nj}\|^2=O(\log n)
\]
as $n\to\infty$.

The proof of Theorem 9 is based on a decomposition of random elements. Consider some zero mean random element $\xi$ with values in $H$ such that $\mathrm E\|\xi\|^r=1$. Suppose that $\{p_k\}_{k\ge0}$ is a strictly decreasing sequence of probabilities that converges to $0$ as $k\to\infty$. Choose $R\ge1$. The decomposition of $\xi$ is given by
\[
\xi=\hat\xi_0+\sum_{k=1}^R\check\xi_k+\xi_R',\tag{B.1}
\]
where the random elements are defined in the following way. Since we are only interested in the distribution, we can assume without loss of generality that a uniform random variable can be defined on the underlying probability space. Then the space is non-atomic and hence there exists an event $F_R$ such that $P(F_R)=p_R$ and $\|\xi\|\le p_R^{-1/r}$ on $F_R^c$. Define
\[
\hat\xi_R=\xi I_{F_R^c}+p_R^{-1}\mathrm E[\xi I_{F_R}]I_{F_R}
\quad\text{and}\quad
\xi_R'=\xi-\hat\xi_R.
\]
The remaining random elements are defined recursively. Denote by $\hat F_{k-1}$ the event such that $P(\hat F_{k-1})=p_{k-1}$ and $\|\hat\xi_k\|\le p_{k-1}^{-1/r}$ on $\hat F_{k-1}^c$, with $1\le k\le R$. Moreover, define
\[
\hat\xi_{k-1}=\hat\xi_kI_{\hat F_{k-1}^c}+p_{k-1}^{-1}\mathrm E[\hat\xi_kI_{\hat F_{k-1}}]I_{\hat F_{k-1}}
\quad\text{and}\quad
\check\xi_k=\hat\xi_k-\hat\xi_{k-1}.
\]
Then decomposition (B.1) has the following properties:
(i) $\mathrm E\hat\xi_0=\mathrm E\check\xi_1=\dots=\mathrm E\check\xi_R=\mathrm E\xi_R'=0$;
(ii) $\mathrm E\|\hat\xi_0\|^r\le\mathrm E\|\hat\xi_1\|^r\le\cdots$
$\le\mathrm E\|\hat\xi_R\|^r\le\mathrm E\|\xi\|^r$;
(iii) $\mathrm E\|\xi_R'\|^r\le2^r\mathrm E\|\xi\|^r$ and $\mathrm E\|\check\xi_k\|^r\le2^r\mathrm E\|\hat\xi_k\|^r$ for $1\le k\le R$;
(iv) by Hölder's inequality, $\|p_R^{-1}\mathrm E[\xi I_{F_R}]\|\le p_R^{-1/r}$ and $\|p_{k-1}^{-1}\mathrm E[\hat\xi_kI_{\hat F_{k-1}}]\|\le p_{k-1}^{-1/r}$ for $1\le k\le R$, and hence $\|\hat\xi_k\|\le p_k^{-1/r}$ for $0\le k\le R$ and $\|\check\xi_k\|\le2p_{k-1}^{-1/r}\le2p_k^{-1/r}$ for $1\le k\le R$;
(v) $P(\xi_R'\ne0)\le p_R$ and $P(\check\xi_k\ne0)\le p_{k-1}$ for $1\le k\le R$.

Proof of Theorem 9. Assume that $\mathrm E\|X_1\|^r=1$ without loss of generality. We use decomposition (B.1) with $p_k=2^{-k}$ and $R=\log_2n$ (the logarithm to the base 2). Then
\[
L_{nj}=\sum_{t=1}^na_{jnt}(\hat X_{t,0})+\sum_{k=1}^{\log_2n}\sum_{t=1}^na_{jnt}(\check X_{t,k})+\sum_{t=1}^na_{jnt}(X_{t,\log_2n}')
\]
for $1\le j\le q$ and $n\ge1$. Observe that $\hat X_{t,0}=0$ almost surely for $1\le t\le n$, since $p_0=1$. By Hölder's inequality,
\[
n^{-1/2}\sum_{t=1}^n\|X_{t,\log_2n}'\|\le n^{-1/2}N_n^{1-1/r}\Bigl[\sum_{t=1}^n\|X_{t,\log_2n}'\|^r\Bigr]^{1/r},
\]
where the random variable $N_n=\sum_{t=1}^nI_{\{\|X_{t,\log_2n}'\|\ne0\}}$ follows a binomial distribution. Observe that
\[
\mathrm Ee^{N_n}=\bigl[1+P(\|X_{1,\log_2n}'\|\ne0)(e-1)\bigr]^n\le\bigl[1+n^{-1}(e-1)\bigr]^n<e^e.
\]
It follows that any fixed moment of $N_n$ is bounded for all $n\ge1$. Hence,
\[
\mathrm E\max_{1\le j\le q}\Bigl\|\sum_{t=1}^na_{jnt}(X_{t,\log_2n}')\Bigr\|
\le n^{-1/2}\,\mathrm E\Bigl\{N_n^{1-1/r}\bigl[n\,\mathrm E\|X_{1,\log_2n}'\|^r\bigr]^{1/r}\Bigr\}=O(n^{1/r-1/2})
\]
as $n\to\infty$, using Jensen's inequality.

By the triangle inequality,
\[
\mathrm E\max_{1\le j\le q}\Bigl\|\sum_{k=1}^{\log_2n}\sum_{t=1}^na_{jnt}(\check X_{t,k})\Bigr\|^2
\le\mathrm E\Bigl[\sum_{k=1}^{\log_2n}\max_{1\le j\le q}\Bigl\|\sum_{t=1}^na_{jnt}(\check X_{t,k})\Bigr\|\Bigr]^2
\le\Bigl|\sum_{k=1}^{\log_2n}\Bigl(\mathrm E\max_{1\le j\le q}\Bigl\|\sum_{t=1}^na_{jnt}(\check X_{t,k})\Bigr\|^2\Bigr)^{1/2}\Bigr|^2.
\]
Choose $\delta>0$ such that $1/r+\delta<1/2$. We show that
\[
\mathrm E\max_{1\le j\le q}\Bigl\|\sum_{t=1}^n2^{\delta(k-1)}a_{jnt}(\check X_{t,k})\Bigr\|^2\le C^2\log n+O(1)
\]
for each $1\le k\le\log_2n$ with some $C>0$. Note that this yields the proof. More specifically, we show that
\[
\mathrm E\Bigl[\Bigl\|\sum_{t=1}^n2^{\delta(k-1)}a_{jnt}(\check X_{t,k})\Bigr\|^2-C^2\log n\Bigr]_+
=\int_{C\sqrt{\log n}}^\infty2x\,P\Bigl(\Bigl\|\sum_{t=1}^n2^{\delta(k-1)}a_{jnt}(\check X_{t,k})\Bigr\|>x\Bigr)dx\le\frac Cn\tag{B.2}
\]
for each $1\le j\le q$ and $1\le k\le\log_2n$.

Since $\|2^{\delta(k-1)}a_{jnt}(\check X_{t,k})\|\le2p_k^{-(1/r+\delta)}n^{-1/2}$ and $P(\check X_{t,k}\ne0)\le p_{k-1}$ for $1\le t\le n$, we are in the position to apply Lemma 21. Let $s_k:=p_k^{-(1/r+\delta)}=2^{k(1/r+\delta)}$; applying Lemma 21 with $b=2s_kn^{-1/2}$ and $\beta=4\beta'/n$, we thus obtain
\[
P\Bigl(\Bigl\|\sum_{t=1}^n2^{\delta(k-1)}a_{jnt}(\check X_{t,k})\Bigr\|>x\Bigr)
\le2e^{-\frac{x^2\beta'}{s_k^2}}\Bigl[p_{k-1}\bigl(e^{\frac{2x^2\beta'^2}{ns_k^2}}-1\bigr)+1\Bigr]^n.
\]
(B.3)

We split the integral in equation (B.2) into two parts:
\[
\int_{C\sqrt{\log n}}^\infty2x\,P\Bigl(\Bigl\|\sum_{t=1}^n2^{\delta(k-1)}a_{jnt}(\check X_{t,k})\Bigr\|>x\Bigr)dx
=\int_{C\sqrt{\log n}}^{\sqrt n/s_k}2x\,P(\cdot>x)\,dx+\int_{\sqrt n/s_k}^\infty2x\,P(\cdot>x)\,dx.
\]
Using (B.3) and $p_{k-1}\le2s_k^{-2}$, setting $\beta'=\varepsilon s_k^2$ with small $\varepsilon>0$, and using $e^y-1\le2y$, which holds for small enough $y\ge0$ (here $y=2x^2\varepsilon^2s_k^2/n\le2\varepsilon^2$ on this range), we obtain
\[
\int_{C\sqrt{\log n}}^{\sqrt n/s_k}2x\,P(\cdot>x)\,dx
\le\int_{C\sqrt{\log n}}^{\sqrt n/s_k}2x\,e^{-x^2\varepsilon}\Bigl[1+\frac{8\varepsilon^2x^2}{n}\Bigr]^ndx
\le2\int_{C\sqrt{\log n}}^{\sqrt n/s_k}2x\,e^{-x^2\varepsilon(1-8\varepsilon)}dx
=\frac{2}{\varepsilon(1-8\varepsilon)}\Bigl[e^{-C^2\varepsilon(1-8\varepsilon)\log n}-e^{-\frac n{s_k^2}\varepsilon(1-8\varepsilon)}\Bigr]
\le\frac{2}{\varepsilon(1-8\varepsilon)}\,n^{-C^2\varepsilon(1-8\varepsilon)},
\]
where $C$ is chosen in such a way that $C^2\varepsilon(1-8\varepsilon)\ge1$. Set $\rho=1/2-(1/r+\delta)>0$. Using (B.3), now setting $\beta'=\varepsilon s_k\sqrt n/x$ with small $\varepsilon>0$ (so that the bracket exponent equals $2\varepsilon^2$ and $e^{2\varepsilon^2}-1\le4\varepsilon^2$), as well as the inequality $(1+y/n)^n\le e^y$ for $y\ge0$ and $n\ge1$, we obtain
\[
\int_{\sqrt n/s_k}^\infty2x\,P(\cdot>x)\,dx
\le\int_{\sqrt n/s_k}^\infty2x\,e^{-\frac{x\varepsilon\sqrt n}{s_k}}\bigl[p_{k-1}(e^{2\varepsilon^2}-1)+1\bigr]^ndx
\le e^{\frac{8\varepsilon^2n}{s_k^2}}\int_{\sqrt n/s_k}^\infty2x\,e^{-\frac{x\varepsilon\sqrt n}{s_k}}dx
=2e^{-\varepsilon(1-8\varepsilon)\frac n{s_k^2}}\Bigl(\frac1\varepsilon+\frac{s_k^2}{\varepsilon^2n}\Bigr)
\le2e^{-\varepsilon(1-8\varepsilon)n^{2\rho}}\bigl(\varepsilon^{-1}+\varepsilon^{-2}n^{-2\rho}\bigr),
\]
which is $O(n^{-1})$. Here we used the fact that $p_{k-1}\le2s_k^{-2}$ and $s_k\le n^{1/r+\delta}$ for $1\le k\le\log_2n$. The proof is complete.

References

[1] A. Aue, D. Dubart Norinho, and S. Hörmann. On the prediction of stationary functional time series. Journal of the American Statistical Association, 110:378–392, 2015.
[2] D. Bosq.
Linear Processes in Function Spaces, volume 149 of Lecture Notes in Statistics. Springer-Verlag, New York, 2000.
[3] P. Brockwell and R. Davis. Time Series: Theory and Methods. Springer Series in Statistics. Springer-Verlag, New York, 1991.
[4] V. Characiejus and G. Rice. A general white noise test based on kernel lag-window estimates of the spectral density operator. Econometrics and Statistics, 13:175–196, 2020.
[5] V. Chernozhukov, D. Chetverikov, and K. Kato. Central limit theorems and bootstrap in high dimensions. The Annals of Probability, 45:2309–2352, 2017.
[6] V. Chernozhukov, D. Chetverikov, and K. Kato. Detailed proof of Nazarov's inequality. arXiv e-prints, art. arXiv:1711.10696, November 2017.
[7] R. A. Davis and T. Mikosch. The maximum of the periodogram of a non-Gaussian sequence. The Annals of Probability, 27:522–536, 1999.
[8] U. Einmahl. Extensions of results of Komlós, Major, and Tusnády to the multivariate case. Journal of Multivariate Analysis, 28(1):20–68, 1989.
[9] P. Embrechts, C. Klüppelberg, and T. Mikosch. Modelling Extremal Events, volume 33 of Stochastic Modelling and Applied Probability. Springer-Verlag, Berlin Heidelberg, 1997.
[10] R. A. Fisher. Tests of significance in harmonic analysis. Proceedings of the Royal Society of London. Series A, Containing Papers of a Mathematical and Physical Character, 125(796):54–59, 1929.
[11] F. Götze. On the rate of convergence in the multivariate CLT. The Annals of Probability, 19:724–739, 1991.
[12] U. Grenander and M. Rosenblatt. Statistical Analysis of Stationary Time Series, volume 21. John Wiley and Sons, New York, 1957.
[13] S. Guillas. Rates of convergence of autocorrelation estimates for autoregressive Hilbertian processes. Statistics & Probability Letters, 55.
[14] E. J. Hannan. Testing for a jump in the spectral function. Journal of the Royal Statistical Society: Series B (Methodological), 23(2):394–404, 1961.
[15] S. Hörmann and Ł. Kidziński.
A note on estimation in Hilbertian linear models. Scandinavian Journal of Statistics, 42:43–62, 2015.
[16] S. Hörmann, P. Kokoszka, and G. Nisol. Testing for periodicity in functional time series. The Annals of Statistics, 46:2960–2984, 2018.
[17] L. Horváth and P. Kokoszka. Inference for Functional Data with Applications, volume 200 of Springer Series in Statistics. Springer-Verlag New York, 2012.
[18] L. Horváth, P. Kokoszka, and G. Rice. Testing stationarity of functional time series. Journal of Econometrics, 179:66–82, 2014.
[19] S. Hörmann, Ł. Kidziński, and M. Hallin. Dynamic functional principal components. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 77(2):319–348, 2015.
[20] G. M. Jenkins and M. B. Priestley. The spectral analysis of time-series. Journal of the Royal Statistical Society. Series B (Methodological), 19(1):1–12, 1957.
[21] S. Kang and R. Serfozo. Extreme values of phase-type and mixed random variables with parallel-processing examples. Journal of Applied Probability, 36:194–210, 1999.
[22] J. Klepsch, C. Klüppelberg, and T. Wei. Prediction of functional ARMA processes with an application to traffic data. Econometrics and Statistics, 1:128–149, 2017.
[23] A. Laukaitis and A. Račkauskas. Functional data analysis of payment systems. Nonlinear Analysis: Modelling and Control, 7:53–68, 2002.
[24] D. Liebl. Modeling and forecasting electricity prices: A functional data perspective. The Annals of Applied Statistics, 7:1562–1592, 2013.
[25] Z. Lin and W. Liu. On maxima of periodograms of stationary processes. The Annals of Statistics, 37:2676–2695, 2009.
[26] I. B. MacNeill. Tests for periodic components in multiple time series. Biometrika, 61:57–70, 1974.
[27] F. Merlevède, M. Peligrad, and S. Utev. Sharp conditions for the CLT of linear processes in a Hilbert space. Journal of Theoretical Probability, pages 681–693, 1997.
[28] F. Nazarov.
On the maximal perimeter of a convex set in $\mathbb{R}^n$ with respect to a Gaussian measure. In V. Milman and G. Schechtman, editors, Geometric Aspects of Functional Analysis: Israel Seminar 2001–2002, volume 1807 of Lecture Notes in Mathematics, pages 169–187. Springer Berlin Heidelberg, Berlin, Heidelberg, 2003.
[29] V. Panaretos and S. Tavakoli. Fourier analysis of stationary time series in function space. The Annals of Statistics, 41:568–603, 2013.
[30] I. Pinelis. Optimum bounds for the distributions of martingales in Banach spaces. The Annals of Probability, 22:1679–1706, 1994.
[31] A. Račkauskas and C. Suquet. On limit theorems for Banach-space-valued linear processes. Lithuanian Mathematical Journal, 50:71–87, 2010.
[32] J. Ramsay, G. Hooker, and S. Graves. Functional Data Analysis with R and MATLAB. Springer, 1st edition, 2009.
[33] H. Rosenthal. On the subspaces of $L^p$ ($p > 2$) spanned by sequences of independent random variables. Israel Journal of Mathematics, 8:273–303, 1970.
[34] A. Schuster. On the investigation of hidden periodicities with application to a supposed 26 day period of meteorological phenomena. Terrestrial Magnetism, 3(1):13–41, 1898.
[35] S. H. Schwabe. Die Sonne. Astronomische Nachrichten, 20:283–286, 1843.
[36] M. Shimshoni. On Fisher's test of significance in harmonic analysis. Geophysical Journal of the Royal Astronomical Society, 23(4):373–377, 1971.
[37] E. Stadlober, S. Hörmann, and B. Pfeiler. Quality and performance of a PM10 daily forecasting model. Atmospheric Environment, 42(6):1098–1109, 2008.
[38] A. M. Walker. Some asymptotic results for the periodogram of a stationary time series. Journal of the Australian Mathematical Society, 5:107–128, 1965.
[39] G. T. Walker. Correlation in seasonal variations of weather, III: on the criterion for the reality of relationships or periodicities, volume 21 of Memoirs of the India Meteorological Department. Meteorological Office, 1914.
[40] J. Weidmann.
Linear Operators in Hilbert Spaces, volume 68 of Graduate Texts in Mathematics. Springer-Verlag New York, 1980.
[41] G. U. Yule. On a method of investigating periodicities in disturbed series, with special reference to Wolfer's sunspot numbers. Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character, 226:267–298, 1927.
[42] X. Zhang. White noise testing and model diagnostic checking for functional time series.