Sieve Bootstrap for Functional Time Series
Efstathios Paparoditis*
University of Cyprus
Department of Mathematics and Statistics
P.O. Box 20537, CY-1678 Nicosia, Cyprus
e-mail: [email protected]

* Supported in part by a University of Cyprus Research Grant.
Abstract:
A bootstrap procedure for functional time series is proposed which exploits a general vector autoregressive representation of the time series of Fourier coefficients appearing in the Karhunen-Loève expansion of the functional process. A double sieve-type bootstrap method is developed which avoids the estimation of process operators and generates functional pseudo-time series that appropriately mimic the dependence structure of the functional time series at hand. The method uses a finite set of functional principal components to capture the essential driving parts of the infinite dimensional process and a finite order vector autoregressive process to imitate the temporal dependence structure of the corresponding vector time series of Fourier coefficients. By allowing the number of functional principal components as well as the autoregressive order used to increase to infinity (at some appropriate rate) as the sample size increases, consistency of the functional sieve bootstrap can be established. We demonstrate this by proving a basic bootstrap central limit theorem for functional finite Fourier transforms and by establishing bootstrap validity in the context of a fully functional testing problem. A novel procedure to select the number of functional principal components is introduced, while simulations illustrate the good finite sample performance of the new bootstrap method proposed.
MSC 2010 subject classifications:
Primary 62M10, 62M15; secondary 62G09.
Keywords and phrases:
Bootstrap, Fourier transform, principal components, Karhunen-Loève expansion, spectral density operator.
1. Introduction
Statistical inference for time series stemming from stationary functional processes has attracted considerable interest during the last decades and progress has been made in several directions. Estimation and testing procedures have been developed for a wide range of inference problems and for large classes of stationary functional processes; see Bosq (2000), Hörmann and Kokoszka (2012) and Horváth and Kokoszka (2012). However, the asymptotic results derived typically depend in a complicated way on difficult to estimate, infinite dimensional characteristics of the underlying functional process. This restricts considerably the implementability of asymptotic approximations when used in practice to judge the uncertainty of estimation procedures or to calculate critical values of tests. In such situations, bootstrap methods can provide useful alternatives.

Bootstrap procedures for Hilbert space-valued time series proposed so far in the literature are mainly attempts to adapt, to the infinite dimensional functional set-up, bootstrap methods that have been developed for the finite dimensional (i.e., mostly univariate) time series case; cf. Lahiri (2003). Politis and Romano (1994) considered applications of the stationary bootstrap to functional, Hilbert-valued time series and showed its validity for the sample mean for functional processes satisfying certain mixing and boundedness conditions. Dehling et al. (2015) considered applications of the non-overlapping block bootstrap to U-statistics for so-called near epoch dependent functional processes and Sharipov et al. (2016) to change point analysis. Franke and Nyarigue (2016) and Zhou and Politis (2016) developed some theory for different residual-based bootstrap procedures applied to a first order functional autoregressive process. Notice that the transmission of other bootstrap methods for real-valued time series to the functional set-up, like for instance the autoregressive-sieve bootstrap, Kreiss (1988) and Kreiss et al. (2011), seems to be difficult, mainly due to problems associated with the estimation of a number of infinite dimensional autoregressive operators which increases with the sample size.

Applications of bootstrap procedures to certain inference problems in functional time series analysis have also been considered in the literature. For instance, for the construction of prediction intervals, Fernández De Castro et al. (2005) used an approach based on resampling pairs of functional observations by means of kernel-driven resampling probabilities. The same authors also apply a parametric, residual-based bootstrap approach using an estimated first order functional autoregression with i.i.d. resampling of appropriately defined functional residuals. For the same prediction problem, Hyndman and Shang (2009) applied different bootstrap approaches, including bootstrapping the functional curves by randomly disturbing the forecasted scores using residuals obtained from univariate autoregressive fits. Aneiros-Perez et al. (2011) considered nonparametric functional autoregressive models, while Mingotti et al. (2015) the case of the integrated functional autoregressive model.
Apart from the lack of theoretical justification, the aforementioned bootstrap applications do not provide a general bootstrap methodology for functional time series, as they are designed for, and their applicability is restricted to, the particular inference problem considered; see also McMurry and Politis (2011) and Shang (2016) for an overview.

In this paper a general and easy to implement bootstrap procedure for functional time series is proposed which generates bootstrap replicates $X_1^*, X_2^*, \ldots, X_n^*$ of a functional time series $X_1, X_2, \ldots, X_n$ and is applicable to a large class of stationary functional processes. The procedure avoids the explicit estimation of process operators and exploits some basic properties of the stochastic process of Fourier coefficients (scores) appearing in the well-known Karhunen-Loève expansion of the functional random variables. It is in particular shown that, under quite general assumptions, the stochastic process of Fourier coefficients obeys a so-called vector autoregressive representation, and this representation plays a key role in developing a bootstrap procedure for the functional time series at hand. More specifically, to capture the essential driving functional parts of the underlying infinite dimensional process, the first $m$ functional principal components are used and the corresponding $m$-dimensional time series of Fourier coefficients is bootstrapped using a $p$th order vector autoregression fitted to the vector time series of sample Fourier coefficients. In this way, an $m$-dimensional pseudo-time series of Fourier coefficients is generated which imitates the temporal dependence structure of the vector time series of sample Fourier coefficients. Using the (truncated) Karhunen-Loève expansion, these pseudo-Fourier coefficients are then transformed to functional bootstrap replicates of the main driving principal components of the observed functional time series. Adding to these replicates an appropriately resampled functional noise leads finally to the bootstrapped functional pseudo-time series $X_1^*, X_2^*, \ldots, X_n^*$.

In a certain sense, our bootstrap procedure works by using a finite rank (i.e., $m$-dimensional) approximation of the infinite dimensional structure of the underlying functional process and a $p$th order vector autoregressive approximation of its infinite order temporal dependence structure. To achieve consistency and to capture appropriately the entire infinite dimensional structure of the functional process, the number $m$ of functional principal components used as well as the order $p$ of the vector autoregression applied are allowed to increase to infinity (at some appropriate rate) as the sample size $n$ increases to infinity. This double sieve property justifies the use of the term "sieve bootstrap" for the bootstrap procedure proposed.

We show that under quite general conditions, this bootstrap procedure succeeds in imitating correctly the entire infinite dimensional autocovariance structure of the underlying functional process.
Notice that, apart from the problem that instead of the unknown true scores the time series of estimated scores is used, the asymptotic analysis of our bootstrap procedure faces additional challenges which are caused by the fact that vector autoregressions of increasing order and of increasing dimension are considered, and that the lower bound of the corresponding spectral density matrix approaches zero as the dimension of the vector time series of scores used increases to infinity. We demonstrate how the new bootstrap procedure proposed can be successfully applied to different inference problems in functional time series analysis. In particular, we apply the proposed sieve bootstrap procedure to the problem of estimating the distribution of the functional Fourier transform, which is fundamental in a multitude of applications and has attracted interest in the functional time series literature; see Cerovecki and Hörmann (2015) for some recent developments. In this context, a basic bootstrap central limit theorem is established which shows validity of the functional sieve bootstrap for this important class of statistics. Furthermore, we consider applications of the functional sieve bootstrap in the context of fully functional testing and to the two sample mean problem, and show how this bootstrap procedure can be applied to consistently estimate the complicated distribution of the test statistic of interest under the null.

Using the time series of Fourier coefficients in the context of functional time series analysis has been considered by many authors in a variety of applications. Among others we mention Hyndman and Shang (2009) who, for functional autoregressive models and for the sake of prediction, used univariate autoregressions fitted to the scalar time series of scores. In the same context, and more related to the approach proposed in this paper, a multivariate approach to prediction has been proposed by Aue et al. (2014), which works by fitting a vector autoregressive model to the multivariate time series of scores.

The paper is organized as follows. Section 2 derives some basic properties and discusses the autoregressive representation of the vector process of Fourier coefficients appearing in the Karhunen-Loève expansion of the functional process. Apart from being useful for bootstrap purposes, these properties are of interest on their own. The functional sieve bootstrap procedure proposed is described in Section 3, where some properties of the bootstrap functional pseudo-time series are also discussed. Asymptotic validity of the new bootstrap procedure applied to finite Fourier transforms and to fully functional testing is established in Section 4. Section 5 proposes some novel practical, data driven rules to choose the bootstrap parameters and presents some numerical simulations which investigate the finite sample performance of the functional sieve bootstrap. Comparisons with three variants of block bootstrap methods are also given. Technical proofs and auxiliary lemmas are deferred to Section 6.
2. The Process of Fourier Coefficients
We consider a (functional) stochastic process $X=\{X_t,\ t\in\mathbb{Z}\}$ where, for each $t$ (interpreted as time), $X_t$ is a random element of the separable Hilbert space $H:=L^2([0,1],\mathbb{R})$ with parametrization $\tau\mapsto X_t(\tau)\in\mathbb{R}$ for $\tau\in[0,1]$. We denote by $\langle\cdot,\cdot\rangle$ the inner product in $H$ and by $\|\cdot\|$ the induced norm, defined for $x,y\in H$ as $\langle x,y\rangle=\int_{[0,1]}x(t)y(t)\,dt$ and $\|x\|=\langle x,x\rangle^{1/2}$, respectively. Furthermore, for matrices $A$ and $B$ we denote by $\|A\|_F$ the Frobenius norm; we write $A\ge B$ or $B\le A$ if $A-B$ is non-negative hermitian, while for an operator $T$, $\|T\|$ denotes its operator norm and $\|T\|_{HS}$ its Hilbert-Schmidt norm, if $T$ is a Hilbert-Schmidt operator.

For the underlying functional process $X$ it is assumed that its dependence structure satisfies the following assumption.

Assumption 1. $X$ is a purely non-deterministic, $L^4$-$m$-approximable process.

The general notion of $L^p$-$m$-approximability refers to a stochastic process $X=\{X_t,\ t\in\mathbb{Z}\}$ with $X_t$ taking values in $H$, $E\|X_t\|^p<\infty$, and where the random element $X_t$ admits the representation $X_t=f(\varepsilon_t,\varepsilon_{t-1},\ldots)$. Here the $\varepsilon_t$'s are i.i.d. random elements in $H$ and $f$ is some measurable function $f:H^\infty\to H$. If, for $\{\tilde\varepsilon_t,\ t\in\mathbb{Z}\}$ an independent copy of $\{\varepsilon_t,\ t\in\mathbb{Z}\}$ and $X_t^{(M)}=f(\varepsilon_t,\varepsilon_{t-1},\ldots,\varepsilon_{t-M+1},\tilde\varepsilon_{t-M},\tilde\varepsilon_{t-M-1},\ldots)$, the condition
\[
\sum_{k=1}^{\infty}\big(E\|X_k-X_k^{(k)}\|^p\big)^{1/p}<\infty
\]
is satisfied, then $X$ is called $L^p$-$m$-approximable. $L^p$-$m$-approximability is a notion of weak dependence which applies to many commonly used functional time series models, like linear functional processes, functional ARCH processes, etc.; see Hörmann and Kokoszka (2010) for more details.

Let $\mu:=EX_t\in H$ be the mean of $X$ which, by stationarity, is independent of $t$ and for which we assume $\mu=0$ for simplicity. We denote by $C_h$ the autocovariance operator $C_h:H\to H$ at lag $h\in\mathbb{Z}$, defined by $C_h(\cdot)=E\langle X_t-\mu,\cdot\rangle(X_{t+h}-\mu)$. Associated with the autocovariance operator is the autocovariance function $c_h:[0,1]\times[0,1]\to\mathbb{R}$ with $c_h(\tau,\nu)=E(X_t(\tau)-\mu(\tau))(X_{t+h}(\nu)-\mu(\nu))$, $\tau,\nu\in[0,1]$; that is, $C_h$ is an integral operator with kernel function $c_h$.

Assumption 1 implies that $\sum_h\|C_h\|_{HS}<\infty$ and that for every $\omega\in\mathbb{R}$ the spectral density operator
\[
\mathcal{F}_\omega(x)=(2\pi)^{-1}\sum_{h\in\mathbb{Z}}C_h(x)e^{-ih\omega},\quad x\in H,
\]
is well defined, continuous in $\omega$, self-adjoint and trace class, Hörmann et al. (2015); see also Panaretos and Tavakoli (2013) for similar properties under different weak dependence conditions. In what follows we will strengthen somewhat the assumption on the norm summability of the autocovariance operators to the following requirement.

Assumption 2. $\sum_h(1+|h|)^r\|C_h\|_{HS}<\infty$ for some $r\ge 0$.

Furthermore, we assume that the spectral density operator $\mathcal{F}_\omega$ satisfies the following condition.

Assumption 3.
For all $\omega\in[0,\pi]$, the operator $\mathcal{F}_\omega$ is of full rank, i.e., $\ker(\mathcal{F}_\omega)=\{0\}$.

For real-valued univariate processes, $\ker(\mathcal{F}_\omega)=\{0\}$ is equivalent to the condition that the spectral density is strictly positive everywhere in $[0,\pi]$, while for multivariate processes it corresponds to the non-singularity of the spectral density matrix for every frequency $\omega\in[0,\pi]$. Notice that all eigenvalues $\nu_j(\omega)$, $j=1,2,\ldots$, of $\mathcal{F}_\omega$ are positive and that $\sum_{j=1}^{\infty}\nu_j(\omega)<\infty$ by the trace class property of $\mathcal{F}_\omega$. Since $C_0=\int_{-\pi}^{\pi}\mathcal{F}_\omega\,d\omega$, the positivity of $\mathcal{F}_\omega$ implies that the covariance operator $C_0$ has full rank, that is, its eigenvalues $\lambda_j$ satisfy $\lambda_j>0$ for all $j\ge 1$. By the symmetry and compactness of $C_0$, the random element $X_t$ admits the well-known Karhunen-Loève representation
\[
X_t=\sum_{j=1}^{\infty}\langle X_t,v_j\rangle v_j,\quad t\in\mathbb{Z}, \tag{2.1}
\]
where $v_j$, $j=1,2,\ldots$, are the orthonormalized eigenfunctions that correspond to the eigenvalues $\lambda_j$, $j=1,2,\ldots$, of $C_0$. For $t\in\mathbb{Z}$, let $\xi_{j,t}:=\langle X_t,v_j\rangle$, $j\ge 1$,
and consider any subset of indices $M=\{j_1,j_2,\ldots,j_m\}\subset\mathbb{N}$ with $j_1<j_2<\cdots<j_m$, $m<\infty$. Later on, we will concentrate on the specific set $M=\{1,2,\ldots,m\}$ which corresponds to the $m$ largest eigenvalues of the covariance operator $C_0$. Consider now the $m$-dimensional process $\xi^{(M)}=\{\xi^{(M)}_t=(\xi^{(M)}_{j_s,t},\ s=1,2,\ldots,m)^\top,\ t\in\mathbb{Z}\}$.

Observe that $\xi^{(M)}$ is strictly stationary, purely non-deterministic and has mean zero, i.e., $E(\xi^{(M)}_t)=(\langle EX_t,v_{j_s}\rangle,\ s\in M)^\top=0$. Furthermore, its autocovariance matrix function $\Gamma_{\xi^{(M)}}(h)=E(\xi^{(M)}_t\xi^{(M)\top}_{t+h})$, $h\in\mathbb{Z}$, is given by $\Gamma_{\xi^{(M)}}(h)=\big(\langle C_h(v_{j_s}),v_{j_r}\rangle\big)_{s,r=1,2,\ldots,m}$ and satisfies, by Assumption 2,
\[
\sum_{h=-\infty}^{\infty}(1+|h|)^r\|\Gamma_{\xi^{(M)}}(h)\|_F
=\sum_{h=-\infty}^{\infty}(1+|h|)^r\Big(\sum_{s,r=1}^{m}\langle C_h(v_{j_s}),v_{j_r}\rangle^2\Big)^{1/2}
\le\sum_{h=-\infty}^{\infty}(1+|h|)^r\|C_h\|_{HS}<\infty. \tag{2.2}
\]
Note that the bound on the right-hand side above is independent of the set $M$ and that, although by construction it holds true that $\mathrm{Cov}(\xi^{(M)}_{r_1,t},\xi^{(M)}_{r_2,t})=0$ for $r_1\neq r_2$, the random variables $\xi_{r_1,t}$ and $\xi_{r_2,s}$ may be correlated for $t\neq s$. The summability property (2.2) implies that the $m$-dimensional vector process $\xi^{(M)}$ possesses a continuous spectral density matrix $f_{\xi^{(M)}}(\cdot)$ which is given by
\[
f_{\xi^{(M)}}(\omega)=(2\pi)^{-1}\sum_{h=-\infty}^{\infty}\Gamma_{\xi^{(M)}}(h)e^{-i\omega h},\quad\omega\in\mathbb{R}.
\]
Moreover, $f_{\xi^{(M)}}$ satisfies the following boundedness conditions.

Lemma 2.1.
Under Assumptions 1 and 3 and Assumption 2 with $r=0$, the spectral density $f_{\xi^{(M)}}$ satisfies
\[
\delta_M I_m \le f_{\xi^{(M)}}(\omega) \le c\,I_m,\quad\text{for all }\omega\in[0,\pi], \tag{2.3}
\]
where $\delta_M$ and $c$ are real numbers ($\delta_M$ depends on the set $M$), such that $0<\delta_M\le c<\infty$, and $I_m$ is the $m\times m$ identity matrix.

The continuity and the boundedness properties of the spectral density matrix $f_{\xi^{(M)}}(\cdot)$ stated in Lemma 2.1 imply that the process $\xi^{(M)}$ obeys a so-called vector autoregressive representation; Cheng and Pourahmadi (1983), see also Wiener and Masani (1958). That is, there exists an infinite sequence of $m\times m$ matrices $\{A^{(M)}_j,\ j\in\mathbb{N}\}$ and a full rank $m$-dimensional white noise process $\{e^{(M)}_t,\ t\in\mathbb{Z}\}$, such that $\xi^{(M)}_t$ can be expressed as
\[
\xi^{(M)}_t=\sum_{j=1}^{\infty}A^{(M)}_j\xi^{(M)}_{t-j}+e^{(M)}_t,\quad t\in\mathbb{Z}, \tag{2.4}
\]
where the coefficient matrices satisfy $\sum_{j\in\mathbb{N}}(1+j)\|A^{(M)}_j\|_F<\infty$ and $\{e^{(M)}_t,\ t\in\mathbb{Z}\}$ is a zero mean white noise innovation process, that is, $E(e^{(M)}_t)=0$ and $E(e^{(M)}_te^{(M)\top}_s)=\delta_{t,s}\Sigma^{(M)}_e$, with $\delta_{t,s}=1$ if $t=s$, $\delta_{t,s}=0$ otherwise, and $\Sigma^{(M)}_e$ a full rank $m\times m$ covariance matrix. We stress here the fact that (2.4) does not describe a model for the process of Fourier coefficients $\xi^{(M)}_t$ and should not be confused with the so-called linear, infinite order vector autoregressive (VAR($\infty$)) process driven by independent, identically distributed (i.i.d.) innovations. In fact, representation (2.4) is the autoregressive analogue of the well-known (moving average) Wold representation of $\xi^{(M)}_t$ with respect to the same white noise innovation process $\{e^{(M)}_t,\ t\in\mathbb{Z}\}$. This autoregressive representation is valid for any stationary and purely non-deterministic process the spectral density matrix of which is continuous and satisfies the boundedness conditions (2.3); see also Cheng and Pourahmadi (1983) and Pourahmadi (2001) for details. In contrast to the Wold representation, the autoregressive representation (2.4) seems to be more appealing for statistical purposes, since it expresses the vector time series of Fourier coefficients $\xi^{(M)}_t$ as a function of its (in principle) observable past values $\xi^{(M)}_{t-j}$, $j=1,2,\ldots$.

In what follows we assume that the eigenvalues are in descending order, i.e., that $\lambda_1>\lambda_2>\cdots>\lambda_m>\lambda_{m+1}>\cdots$, and we work with the set $M=\{1,2,\ldots,m\}$ of the $m$ largest eigenvalues of $C_0$. The corresponding normalized eigenfunctions (principal components) are denoted by $v_j$, $j=1,2,\ldots,m$, and are (up to a sign) uniquely identified. Furthermore, by Parseval's identity, the quantity $\sum_{j=1}^{m}\lambda_j$ describes the variance of $X_t$ captured by the first $m$ functional principal components. To simplify notation we suppress in the following the upper index $(M)$ and write simply $\xi_t$ for $\xi^{(M)}_t$ respectively $f_\xi$ for $f_{\xi^{(M)}}$, keeping in mind that the $j$th component $\xi_{j,t}=\langle X_t,v_j\rangle$ of $\xi_t=(\xi_{1,t},\xi_{2,t},\ldots,\xi_{m,t})^\top$ is obtained using the orthonormalized eigenfunction $v_j$ which corresponds to the $j$th largest eigenvalue $\lambda_j$ of $C_0$, $j=1,2,\ldots,m$. Furthermore, we write $A_j(m)$, $e_t(m)$, $\delta_m$ and $\Sigma_e(m)$ for $A^{(M)}_j$, $e^{(M)}_t$, $\delta_M$ and $\Sigma^{(M)}_e$, respectively.
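To make the score process concrete, the following minimal sketch computes empirical eigenfunctions $\widehat v_j$, the score series $\widehat\xi_{j,t}=\langle X_t,\widehat v_j\rangle$ and lag-$h$ autocovariance matrices $\widehat\Gamma_\xi(h)$ for curves observed on a regular grid of $[0,1]$, approximating the $L^2$ inner product by a Riemann sum. It is written in Python; the function names, the grid-based discretization and the toy data are assumptions of this illustration, not part of the paper.

```python
import numpy as np

def fpca_scores(X, m):
    """Empirical Karhunen-Loeve decomposition of curves X (shape (n, T): n curves on a
    regular grid of T points in [0,1]). Returns (eigenvalues, eigenfunctions (T x m),
    score series (n x m)) of the sample covariance operator."""
    n, T = X.shape
    w = 1.0 / T                                # quadrature weight approximating d(tau)
    Xc = X - X.mean(axis=0)                    # center the curves
    c0 = (Xc.T @ Xc) / n                       # kernel c_0(tau_i, tau_j) on the grid
    evals, evecs = np.linalg.eigh(c0 * w)      # discretized covariance operator
    order = np.argsort(evals)[::-1][:m]
    lam = evals[order]                         # approximate eigenvalues lambda_1 >= ... >= lambda_m
    v = evecs[:, order] / np.sqrt(w)           # eigenfunctions normalized in L^2([0,1])
    xi = Xc @ v * w                            # scores xi_{j,t} = <X_t - Xbar_n, v_j>
    return lam, v, xi

def score_autocov(xi, h):
    """Sample lag-h autocovariance matrix Gamma_xi(h) of the m-dimensional score series."""
    n = xi.shape[0]
    return xi[: n - h].T @ xi[h:] / n

if __name__ == "__main__":
    # toy curves with lag-one dependence, purely for illustration
    rng = np.random.default_rng(0)
    n, T = 200, 101
    eps = rng.standard_normal((n + 1, T)).cumsum(axis=1) / np.sqrt(T)
    X = eps[1:] + 0.5 * eps[:-1]
    lam, v, xi = fpca_scores(X, m=3)
    print("leading eigenvalues:", np.round(lam, 3))
    print("Gamma_xi(1):\n", np.round(score_autocov(xi, 1), 3))
```

The printed matrices are finite-sample analogues of $\Gamma_{\xi^{(M)}}(h)=\big(\langle C_h(v_{j_s}),v_{j_r}\rangle\big)_{s,r}$ above; their non-vanishing off-diagonal and lagged entries illustrate why a vector, rather than componentwise, model for the scores is natural.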
3. The Functional Sieve Bootstrap Procedure
The basic idea of our procedure is to generate pseudo-replicates $X_1^*,X_2^*,\ldots,X_n^*$ of the functional time series at hand by first bootstrapping the $m$-dimensional time series of Fourier coefficients $\xi_t=(\xi_{1,t},\xi_{2,t},\ldots,\xi_{m,t})^\top$, $t=1,2,\ldots,n$, corresponding to the first $m$ principal components. This $m$-dimensional time series of Fourier coefficients is bootstrapped using the autoregressive representation of $\xi_t$ discussed in Section 2. The generated $m$-dimensional pseudo-time series of Fourier coefficients is then transformed to functional principal pseudo-components by means of the truncated Karhunen-Loève expansion $\sum_{j=1}^{m}\xi_{j,t}v_j$. Adding to this an appropriately resampled functional noise leads to the functional pseudo-time series $X_1^*,X_2^*,\ldots,X_n^*$. However, since the $\xi_t$'s are not observed, we work with the time series of estimated scores. This idea is precisely described in the following functional sieve bootstrap algorithm.

Step 1: Select a number $m=m(n)$ of functional principal components and an autoregressive order $p=p(n)$, both finite and depending on $n$.

Step 2: Let $\widehat\xi_t=(\widehat\xi_{j,t}=\langle X_t,\widehat v_j\rangle,\ j=1,2,\ldots,m)^\top$, $t=1,2,\ldots,n$, be the $m$-dimensional time series of estimated Fourier coefficients, where $\widehat v_j$, $j=1,2,\ldots,m$, are the estimated eigenfunctions corresponding to the estimated eigenvalues $\widehat\lambda_1>\widehat\lambda_2>\cdots>\widehat\lambda_m$ of the sample covariance operator $\widehat C_0=n^{-1}\sum_{t=1}^{n}(X_t-\bar X_n)\otimes(X_t-\bar X_n)$, $\bar X_n=n^{-1}\sum_{t=1}^{n}X_t$.

Step 3: Let $\widehat X_{t,m}=\sum_{j=1}^{m}\widehat\xi_{j,t}\widehat v_j$ and define the functional residuals $\widehat U_{t,m}=X_t-\widehat X_{t,m}$, $t=1,2,\ldots,n$.

Step 4: Fit a $p$th order vector autoregressive process to the $m$-dimensional time series $\widehat\xi_t$, $t=1,2,\ldots,n$; denote by $\widehat A_{j,p}(m)$, $j=1,2,\ldots,p$, the estimates of the autoregressive matrices and by $\widehat e_{t,p}$ the residuals,
\[
\widehat e_{t,p}=\widehat\xi_t-\sum_{j=1}^{p}\widehat A_{j,p}(m)\widehat\xi_{t-j},\quad t=p+1,p+2,\ldots,n.
\]
Different estimators $\widehat A_{j,p}(m)$, $j=1,2,\ldots,p$, can be used, but we focus in the following on Yule-Walker estimators; cf. Brockwell and Davis (1991).

Step 5: Generate an $m$-dimensional pseudo-time series of scores $\xi^*_t=(\xi^*_{1,t},\xi^*_{2,t},\ldots,\xi^*_{m,t})^\top$, $t=1,2,\ldots,n$, using
\[
\xi^*_t=\sum_{j=1}^{p}\widehat A_{j,p}(m)\xi^*_{t-j}+e^*_t,
\]
where $e^*_t$, $t=1,2,\ldots,n$, are i.i.d. random vectors having as distribution the empirical distribution of the centered residual vectors $\tilde e_{t,p}=\widehat e_{t,p}-\bar e_n$, $t=p+1,p+2,\ldots,n$, and $\bar e_n=(n-p)^{-1}\sum_{t=p+1}^{n}\widehat e_{t,p}$.

Step 6: Generate a pseudo-functional time series $X_1^*,X_2^*,\ldots,X_n^*$, where
\[
X^*_t=\sum_{j=1}^{m}\xi^*_{j,t}\widehat v_j+U^*_t,\quad t=1,2,\ldots,n, \tag{3.1}
\]
and $U^*_1,U^*_2,\ldots,U^*_n$ are i.i.d. random functions obtained by choosing with replacement from the set of centered functional residuals $\widehat U_{t,m}-\bar U_n$, $t=1,2,\ldots,n$, with $\bar U_n=n^{-1}\sum_{t=1}^{n}\widehat U_{t,m}$.
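The sketch below is one possible Python implementation of Steps 2-6, conditionally on curves observed on a regular grid. It reuses the fpca_scores helper from the sketch at the end of Section 2, fits the VAR($p$) by solving the multivariate Yule-Walker equations directly, and adds the sample mean back as in Remark 3.1 below. The helper names, the burn-in for the pseudo-score recursion and the discretization are assumptions of this illustration rather than part of the algorithm's formal statement.

```python
import numpy as np

def sample_autocov(xi, h):
    """C(h) = n^{-1} sum_{t>h} xi_t xi_{t-h}^T for the (centered) score series xi (n x m)."""
    n = xi.shape[0]
    return xi[h:].T @ xi[: n - h] / n

def yule_walker_var(xi, p):
    """Multivariate Yule-Walker fit of a VAR(p) to the score series; returns [A_1, ..., A_p]."""
    m = xi.shape[1]
    C = [sample_autocov(xi, h) for h in range(p + 1)]
    R = np.empty((p * m, p * m))
    for j in range(p):
        for h in range(p):
            lag = h - j
            R[j * m:(j + 1) * m, h * m:(h + 1) * m] = C[lag] if lag >= 0 else C[-lag].T
    B = np.hstack(C[1:p + 1])                       # [C(1), ..., C(p)]
    A = np.linalg.solve(R.T, B.T).T                 # solves A R = B  (Yule-Walker system)
    return [A[:, j * m:(j + 1) * m] for j in range(p)]

def fsb_replicate(X, m, p, rng, burn=100):
    """One functional sieve bootstrap pseudo-series X*_1, ..., X*_n (Steps 2-6);
    X holds n curves on a regular grid of T points."""
    n, T = X.shape
    Xbar = X.mean(axis=0)
    _, v, xi = fpca_scores(X, m)                    # Step 2: eigenfunctions and scores
    U = (X - Xbar) - xi @ v.T                       # Step 3: functional residuals U_hat_{t,m}
    A = yule_walker_var(xi, p)                      # Step 4: Yule-Walker VAR(p) fit
    e = xi[p:] - sum(xi[p - j - 1: n - j - 1] @ A[j].T for j in range(p))
    e = e - e.mean(axis=0)                          # centered residual vectors
    U = U - U.mean(axis=0)                          # centered functional residuals
    # Step 5: pseudo-scores from the fitted VAR with i.i.d. resampled innovations
    xi_star = np.zeros((burn + n, m))
    e_star = e[rng.integers(0, e.shape[0], burn + n)]
    for t in range(p, burn + n):
        xi_star[t] = sum(A[j] @ xi_star[t - j - 1] for j in range(p)) + e_star[t]
    xi_star = xi_star[burn:]
    # Step 6: back to curves, adding resampled functional noise (and the sample mean,
    # as in Remark 3.1, since the scores were computed from centered curves)
    U_star = U[rng.integers(0, n, n)]
    return Xbar + xi_star @ v.T + U_star
```

Repeated calls to fsb_replicate, conditionally on the same observed curves, yield the bootstrap replicates used in the applications discussed in the following sections.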
Some comments regarding the above algorithm are in order. Notice first that $X_1^*,X_2^*,\ldots,X_n^*$ are functional pseudo-random variables and that the autoregressive representation of the vector time series of Fourier coefficients is solely used as a tool to bootstrap the $m$ main functional principal components of the functional time series at hand. In fact, it is this autoregressive representation which allows the generation of the pseudo-time series of Fourier coefficients $\xi^*_1,\xi^*_2,\ldots,\xi^*_n$ in Step 4 and Step 5 in a way that imitates the dependence structure of the sample Fourier coefficients $\widehat\xi_1,\widehat\xi_2,\ldots,\widehat\xi_n$. These pseudo-Fourier coefficients are transformed to bootstrapped main principal components by means of the truncated and estimated Karhunen-Loève expansion which, together with the additive functional noise $U^*_t$, leads to the new functional pseudo-observations $X_1^*,X_2^*,\ldots,X_n^*$.

The estimated eigenfunctions $\widehat v_j$ used in Step 2 may point in an opposite direction than the eigenfunctions $v_j$. In asymptotic derivations this is commonly taken care of by considering the sign corrected estimator $\widehat s_j\widehat v_j$, where the (unobserved) random variable $\widehat s_j$ is given by $\widehat s_j=\mathrm{sign}(\langle\widehat v_j,v_j\rangle)$. However, since in our setting adding this sign correction will not affect the asymptotic results derived, we assume for simplicity throughout this paper that $\widehat s_j=1$ for $j=1,2,\ldots,m$.

Remark 3.1.
To simplify notation we have assumed that the mean of $X$ is zero. If $EX_t=\mu\neq 0$ then the sieve bootstrap algorithm can be appropriately modified by defining the pseudo-random element $X^*_t$ in Step 6 as $X^*_t=\bar X_n+\sum_{j=1}^{m}\xi^*_{j,t}\widehat v_j+U^*_t$, $t=1,2,\ldots,n$. Notice that since, under Assumption 1, $\|\bar X_n-\mu\|=O_P(n^{-1/2})$, see Hörmann and Kokoszka (2012), the asymptotic results derived in this paper are not affected, i.e., $EX_t=0$ is not a stringent assumption.

Remark 3.2.
Modifications of the above basic bootstrap algorithm are possible which concern the resampling schemes used to generate the vector of pseudo-innovations $e^*_t$ and/or the bootstrap functional noise $U^*_t$. To elaborate, and as we will see in the sequel, for general stationary processes satisfying Assumption 1, the i.i.d. resampling used to generate the pseudo-innovations $e^*_t$ in Step 5 suffices in order to capture the entire, infinite dimensional second order structure of the underlying functional process $X$. However, a modification of this i.i.d. resampling scheme may be needed if higher order dependence characteristics of the underlying functional process, beyond those of order two, should also be correctly mimicked by the functional pseudo-time series $X_1^*,X_2^*,\ldots,X_n^*$. In such a case, the i.i.d. resampling used to generate the $e^*_t$'s in Step 5 can be replaced by other resampling schemes (i.e., block bootstrap schemes) that are able to capture higher order dependence characteristics of the white noise process $\{e_t,\ t\in\mathbb{Z}\}$ appearing in (2.4).

As usual, all considerations regarding the bootstrap procedure are made conditionally on the observed functional time series $X_1,X_2,\ldots,X_n$. The generation mechanism of the pseudo-time series $X_1^*,X_2^*,\ldots,X_n^*$ enables us to consider the bootstrap functional process $X^*=\{X^*_t,\ t\in\mathbb{Z}\}$, where for $t\in\mathbb{Z}$, $X^*_t=\sum_{j=1}^{m}\boldsymbol{1}_j^\top\xi^*_t\,\widehat v_j+U^*_t$, with $\{\xi^*_t=(\xi^*_{1,t},\ldots,\xi^*_{m,t})^\top,\ t\in\mathbb{Z}\}$ generated as $\xi^*_t=\sum_{j=1}^{p}\widehat A_{j,p}(m)\xi^*_{t-j}+e^*_t$, and the $U^*_t$'s are i.i.d. functional random variables taking values in the set $\{\widehat U_{t,m}-\bar U_n,\ t=1,2,\ldots,n\}$ with probability $1/n$. In the above notation $\boldsymbol{1}_j$ is the $m$-dimensional vector $\boldsymbol{1}_j=(0,\ldots,0,1,0,\ldots,0)^\top$, where the unity appears in the $j$th position.

It is easy to see that $X^*$ is a strictly stationary functional process with mean function $E^*X^*_t=0$ and autocovariance operator $C^*_h:H\to H$ given, for $h\in\mathbb{Z}$, by
\[
C^*_h(\cdot)=\sum_{j_1=1}^{m}\sum_{j_2=1}^{m}\boldsymbol{1}_{j_1}^\top\Gamma^*_h\boldsymbol{1}_{j_2}\,\langle\widehat v_{j_1},\cdot\rangle\,\widehat v_{j_2}+\mathbb{1}(h=0)\,E^*\langle U^*_t,\cdot\rangle U^*_t,
\]
where $\Gamma^*_h=E^*(\xi^*_t\xi^{*\top}_{t+h})$ is the $m\times m$ autocovariance matrix at lag $h$ of $\{\xi^*_t,\ t\in\mathbb{Z}\}$. $C^*_h$ is a Hilbert-Schmidt operator since it is, for $h\neq 0$, a finite rank operator, while for $h=0$ it is the sum of a finite rank operator and of the (Hilbert-Schmidt) empirical covariance operator of the functional pseudo-innovations, $C^*_U=E^*\langle U^*_t,\cdot\rangle U^*_t=n^{-1}\sum_{t=1}^{n}\langle\widehat U_{t,m}-\bar U_n,\cdot\rangle(\widehat U_{t,m}-\bar U_n)$.

If the (estimated) vector autoregressive process used to generate the time series of pseudo-scores $\xi^*_t$ is stable, then the dependence structure of the bootstrap process $X^*$ can be precisely described. This is stated in the following proposition. Notice that the required stability condition of the estimated autoregressive polynomial is fulfilled if, for instance, $\widehat A_{j,p}$, $j=1,2,\ldots,p$, are the Yule-Walker estimators; cf. Brockwell and Davis (1991), Ch. 11.4.

Proposition 3.1. If $p,m\in\mathbb{N}$ are such that the estimators $\widehat A_{j,p}$, $j=1,2,\ldots,p$, used in Step 4 of the functional sieve bootstrap algorithm are well defined and satisfy $\det(\widehat A_{p,m}(z))\neq 0$ for all $|z|\le 1$, where $\widehat A_{p,m}(z)=I_m-\sum_{j=1}^{p}\widehat A_{j,p}(m)z^j$, $z\in\mathbb{C}$, then, conditionally on $X_1,X_2,\ldots,X_n$, the bootstrap process $X^*$ is $L^4$-$m$-approximable.

The $L^4$-$m$-approximability of $X^*$ implies that $\sum_h\|C^*_h\|_{HS}<\infty$, see Hörmann et al.
(2015), which can also be easily verified directly since
\[
\sum_{h\in\mathbb{Z}}\|C^*_h\|_{HS}\le\sum_{h\in\mathbb{Z}}\|\Gamma^*_h\|_F+\|C^*_U\|_{HS}=O_P(1).
\]
Furthermore, and because of the $L^4$-$m$-approximability property, the bootstrap process $X^*$ possesses for every $\omega\in\mathbb{R}$ a spectral density operator $\mathcal{F}^*_{\omega,m}$ defined by
\[
\mathcal{F}^*_{\omega,m}(x)=(2\pi)^{-1}\sum_{h\in\mathbb{Z}}C^*_h(x)e^{-ih\omega},\quad x\in H. \tag{3.2}
\]
$C^*_h$ and $\mathcal{F}^*_{\omega,m}$ are essentially finite rank approximations of the corresponding population operators $C_h$ and $\mathcal{F}_\omega$, respectively. Thus, in order for the bootstrap process $X^*$ to capture the infinite dimensional structure of the underlying functional process and the infinite order dependence structure of the vector time series generating the scores, the dimension $m$ as well as the autoregressive order $p$ used in the functional sieve bootstrap algorithm have to increase to infinity (at some appropriate rate) as the sample size $n$ increases to infinity. This rate should take into account the fact that the true scores and eigenfunctions appearing in the Karhunen-Loève expansion are not observed and, therefore, sample estimates are used instead. Furthermore, the lower bound $\delta_m$ of the spectral density matrix of the scores $f_\xi$ approaches zero as the sample size $n$ increases to infinity. This is due to the fact that the eigenvalues $\nu_j(\omega)$ of the spectral density operator $\mathcal{F}_\omega$ converge to zero as $j\to\infty$. These facts make the asymptotic analysis quite involved and impose several restrictions regarding the behavior of $m$ and $p$ with respect to the sample size $n$, which are summarized in the following assumption.

Assumption 4.
The sequences $m=m(n)$ and $p=p(n)$ satisfy $m\to\infty$ and $p\to\infty$ as $n\to\infty$ such that

(i) $m=O(p^{1/2})$,

(ii) $\dfrac{p}{\sqrt{n}\,\lambda_m}\Big(\sum_{j=1}^{m}\alpha_j^{-2}\Big)^{1/2}\to 0$, where $\alpha_1=\lambda_1-\lambda_2$ and $\alpha_j=\min\{\lambda_{j-1}-\lambda_j,\ \lambda_j-\lambda_{j+1}\}$ for $j=2,3,\ldots,m$,

(iii) $\delta_m^{-1}\sum_{j=p+1}^{\infty}j^{r}\|A_j(m)\|_F\to 0$, with $r\ge 0$ as in Assumption 2, where $\delta_m$ is the lower bound of the spectral density matrix $f_\xi$ given in (2.3),

(iv) $m\,p\,\|\widetilde A_{p,m}-A_{p,m}\|_F=O_P(1)$, where $\widetilde A_{p,m}=(\widetilde A_{1,p}(m),\ldots,\widetilde A_{p,p}(m))$ and $A_{p,m}=(A_{1,p}(m),\ldots,A_{p,p}(m))$. Here, $\widetilde A_{j,p}$, $j=1,2,\ldots,p$, denotes the same estimator as $\widehat A_{j,p}$, $j=1,2,\ldots,p$, based on the true vector time series of scores $\xi_1,\xi_2,\ldots,\xi_n$ instead of their estimates $\widehat\xi_1,\widehat\xi_2,\ldots,\widehat\xi_n$, and $A_{j,p}(m)$, $j=1,2,\ldots,p$, are the coefficient matrices of the best (in the mean square sense) linear predictor of $\xi_t$ based on $\xi_{t-j}$, $j=1,2,\ldots,p$.

Assumption 4(i) restricts the rate with which the dimension $m$ is allowed to increase to infinity compared with that of $p$. Assumption 4(ii) is imposed in order to control the error made by the fact that the bootstrap procedure is based on estimated scores and eigenfunctions instead of the unobserved true quantities, in a context where the dimension $m$ and the autoregressive order $p$ both increase to infinity and the lower bound of the spectral density matrix of the $m$-dimensional vector of scores approaches zero as $m$ increases to infinity. Part (iii) relates the rate of increase of the autoregressive order $p$ to the lower bound of the spectral density matrix $f_\xi$ and to the decay of the norm of the autoregressive matrices to zero. Part (iv) is essentially a requirement on the rate at which $m$ and $p$ are allowed to increase to infinity, taking into account the convergence rate of the estimator $\widetilde A_{j,p}$, $j=1,2,\ldots,p$, based on the true scores. For instance, calculations similar to those in the proof of Lemma 6.3 yield for the Yule-Walker estimator that $\|\widetilde A_{p,m}-A_{p,m}\|_F=O_P\big(mp\,n^{-1/2}(\sqrt{m}\,\lambda_m^{-1}+p)\big)$, which, taking into account Assumption 4(i), implies that Assumption 4(iv) is satisfied if $m,p\to\infty$ slowly enough with $n$ such that $mp=O(\sqrt{n}\,\lambda_m)$ and $p\lambda_m=O(m)$. Notice that, for real-valued random variables, such assumptions relating the rate of increase of the autoregressive parameters to the convergence rate of the estimators used are common in the autoregressive-sieve bootstrap literature; see Kreiss et al. (2011) and Meyer and Kreiss (2015). However, the situation here is much more involved since, in our context, not only the order $p$ but also the dimension $m$ of the vector autoregression has to increase to infinity with the sample size, taking into account the fact that $\lambda_m$ converges to zero as $m$ increases to infinity.

The following lemma illustrates the rate conditions imposed in Assumption 4 by considering two particular examples of the behavior of the difference $\lambda_j-\lambda_{j+1}$, which is related to the rate of decrease of the eigenvalues $\lambda_j$. According to this lemma, $p$ may increase to infinity as $n^a$ for some $a>0$, while the allowed rate for $m$ depends on the rate of decrease of $\lambda_j-\lambda_{j+1}$, respectively of the eigenvalues $\lambda_j$, $j=1,2,\ldots$. If these differences decrease with a geometric rate, then $m$ may increase at most logarithmically in the sample size $n$, while if the same differences decrease with a polynomial rate, then $m$ may increase to infinity faster, like $n^\zeta$ for some appropriate $\zeta>0$.

Lemma 3.1.
Assume that $\widetilde A_{p,m}$ are the Yule-Walker estimators of $A_{p,m}$.

(i) If $\lambda_j-\lambda_{j+1}\ge C_\lambda\rho^{j}$ for $j=1,2,\ldots$, $\rho\in(0,1)$ and $C_\lambda>0$, then Assumption 4(i), (ii) and (iv) is satisfied if $p=O(n^a)$ and
\[
m\le\big(16\log(\rho^{-1})\big)^{-1}\big((1-a)-\delta\big)\log(n),
\]
for $a\in(0,1/2)$ and some $\delta>0$.

(ii) If $\lambda_j-\lambda_{j+1}\ge C_\lambda j^{-\theta}$ for $j=1,2,\ldots$ and for some $\theta>2$ and $C_\lambda>0$, then Assumption 4(i), (ii) and (iv) is satisfied if $p=O(n^a)$ and $m=O(n^\zeta)$, for $a\in(0,1/2)$ and $\zeta\in[\zeta_{\min},\zeta_{\max}]$, where $\zeta_{\min}=a/(2+2\theta)$ and $\zeta_{\max}=\min\{(1-a)/(1+6\theta)-\delta,\ a/2\}$ for some $\delta>0$.

Under the condition that $m$ and $p$ increase to infinity at an appropriate rate with $n$ such that Assumption 4 is satisfied, the following proposition can be established, which shows that the spectral density operator $\mathcal{F}^*_{\omega,m}$ of the bootstrap process $X^*$ converges, in Hilbert-Schmidt norm, to the spectral density operator $\mathcal{F}_\omega$ of the underlying functional process $X$.

Proposition 3.2.
Under Assumptions 1 and 3 and Assumptions 2 and 4 with $r=2$, we have that, as $n\to\infty$,
\[
\sup_{\omega\in[0,\pi]}\|\mathcal{F}^*_{\omega,m}-\mathcal{F}_\omega\|_{HS}\to 0,\quad\text{in probability.}
\]

From the above proposition and the inversion formula for Fourier transforms, we immediately get for the covariance operators $C^*_h$ and $C_h$ of the bootstrap process $X^*$ and of the underlying process $X$ that $\sup_{h\in\mathbb{Z}}\|C^*_h-C_h\|_{HS}\to 0$, in probability, as $n\to\infty$. Thus the bootstrap process $X^*$ imitates asymptotically correctly the entire infinite dimensional autocovariance structure of the functional process $X$. This allows for the use of the bootstrap functional time series $X_1^*,X_2^*,\ldots,X_n^*$ to approximate the distribution of statistics based on the functional time series $X_1,X_2,\ldots,X_n$. Some examples of such statistics are discussed in the next section.

So far we have assumed that the covariance operator $C_0$ has full rank, i.e., that its eigenvalues $\lambda_j$ are distinct, which implies that, for consistency and in order to capture the entire infinite dimensional dependence structure of the underlying functional process $X$, the number $m$ of principal components included has to increase to infinity with the sample size $n$. The situation is much simpler if we assume that $m\in\mathbb{N}$ exists such that $\lambda_m>0$ and $\lambda_j=0$ for all $j>m$. In this case only the finite number of $m$ score time series is needed to describe the entire dependence structure of $X$. We are then essentially in the finite dimensional case, with the $m$-dimensional score process $\{\xi_t=(\langle X_t,v_j\rangle,\ j=1,\ldots,m)^\top,\ t\in\mathbb{Z}\}$ possessing a spectral density matrix which is bounded from below by a positive constant independent of the sample size $n$. Furthermore, as in the proof of Lemma 6.3, and because in this case $\sum_{j=1}^{m}\|\widehat v_j-v_j\|=O_P(n^{-1/2})$, we get that $\|\widehat A_{p,m}-\widetilde A_{p,m}\|_F=O_P(p/\sqrt{n})$. Standard arguments applied in the case of the (finite dimensional) vector autoregressive-sieve bootstrap can then be used (see for instance Meyer and Kreiss (2015)) to show that, under less restrictive conditions than those stated in Assumption 4, $\sup_{\omega\in[0,\pi]}\|\mathcal{F}^*_{\omega,m}-\mathcal{F}_\omega\|_{HS}\to 0$, in probability.
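As a rough finite-sample check of this property, the following sketch compares the empirical lag-$h$ autocovariance kernels of an observed series and of a single bootstrap pseudo-series in Hilbert-Schmidt norm. It assumes the fsb_replicate and fpca_scores helpers from the earlier sketches and uses simulated toy curves; it is a diagnostic illustration only, not part of the theoretical development.

```python
import numpy as np

def autocov_kernel(X, h):
    """Empirical lag-h autocovariance kernel c_h(tau, nu) evaluated on the observation grid."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    return Xc[: n - h].T @ Xc[h:] / n                  # T x T matrix of kernel values

def hs_norm(K, T):
    """Hilbert-Schmidt norm of a kernel given its values on a regular T x T grid of [0,1]^2."""
    return np.sqrt((K ** 2).sum() / T ** 2)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, T = 200, 101
    eps = rng.standard_normal((n + 1, T)).cumsum(axis=1) / np.sqrt(T)
    X = eps[1:] + 0.5 * eps[:-1]                       # toy curves with lag-one dependence
    X_star = fsb_replicate(X, m=3, p=2, rng=rng)       # one bootstrap replicate (sketch above)
    for h in range(4):
        diff = autocov_kernel(X, h) - autocov_kernel(X_star, h)
        print(f"h={h}:  ||C_h - C*_h||_HS approx {hs_norm(diff, T):.3f}")
```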
4. Bootstrap Validity
In this section we investigate the validity of the functional sieve bootstrap applied in order to approximate the distribution of some statistic $T_n=T(X_1,X_2,\ldots,X_n)$ of interest, when the bootstrap analogue $T^*_n=T(X_1^*,X_2^*,\ldots,X_n^*)$ is used. Notice that establishing validity of a bootstrap procedure for time series heavily depends on two issues (see also Kreiss and Paparoditis (2011)): on the dependence structure of the underlying process, which affects the distribution of the statistic of interest, and on the capability of the bootstrap procedure used to mimic appropriately this dependence structure. Furthermore, since proving bootstrap validity is a case by case matter, we demonstrate in the following applications of the proposed functional sieve bootstrap procedure to some statistics that have recently attracted considerable interest in the functional time series literature.

Consider first the distribution of the functional Fourier transform
\[
S_n(\omega)=\sum_{t=1}^{n}X_t e^{-it\omega},\quad\omega\in[-\pi,\pi]. \tag{4.1}
\]
Notice that the sample mean $\bar X_n=n^{-1}S_n(0)$ is just a special case of (4.1). In order to elaborate on the limiting distribution of $S_n(\omega)$ we first fix some notation. We say that a random element $Z\in H_{\mathbb{C}}:=H+iH$ follows a circularly-symmetric complex Gaussian distribution with mean zero and covariance $G$, and we write $Z\sim\mathcal{CN}(0,G)$, if
\[
\begin{pmatrix}\mathrm{Re}(Z)\\ \mathrm{Im}(Z)\end{pmatrix}\sim N_{H\times H}\Big(\begin{pmatrix}0\\0\end{pmatrix},\ \begin{pmatrix}\mathrm{Re}(G)&-\mathrm{Im}(G)\\ \mathrm{Im}(G)&\mathrm{Re}(G)\end{pmatrix}\Big);
\]
see also Cerovecki and Hörmann (2017) for a general discussion of the complex Gaussian distribution.

Under a range of different weak dependence assumptions on the functional process $X$, it has been shown that
\[
n^{-1/2}S_n(\omega)\Rightarrow\mathcal{CN}(0,2\pi\mathcal{F}_\omega) \tag{4.2}
\]
as $n\to\infty$, where $\Rightarrow$ denotes weak convergence on $H_{\mathbb{C}}$. For $\omega=0$, such a limiting behavior has been established for linear functional processes by Merlevède et al. (1997) and for $L^p$-$m$-approximable processes by Horváth et al. (2013). Panaretos and Tavakoli (2013) derived the above limiting distribution of $n^{-1/2}S_n(\omega)$ for $\omega\in[0,\pi]$ under a summability condition of the functional cumulants, while more general results for the same statistic and under weaker conditions have been recently obtained by Cerovecki and Hörmann (2017).

We propose to use the bootstrap statistic $n^{-1/2}S^*_n(\omega)=n^{-1/2}\sum_{t=1}^{n}X^*_te^{-it\omega}$ in order to approximate the distribution of the statistic $n^{-1/2}S_n(\omega)$. The following theorem establishes asymptotic validity of this functional sieve bootstrap proposal for the class of functional Fourier transforms considered. In this theorem, $d$ is any metric metrizing weak convergence on $H_{\mathbb{C}}$.

Theorem 4.1.
Suppose that for $\omega\in[0,\pi]$ the sequence $\{n^{-1/2}S_n(\omega),\ n\in\mathbb{N}\}$ in $H_{\mathbb{C}}$ satisfies (4.2). Suppose further that Assumptions 1 and 3 and Assumptions 2 and 4 with $r=2$ are satisfied. Then, as $n\to\infty$,

(i) $d\big(\mathcal{L}(n^{-1/2}S_n(\omega)),\ \mathcal{L}(n^{-1/2}S^*_n(\omega)\,|\,X_1,X_2,\ldots,X_n)\big)\to 0$, and

(ii) $\|n^{-1}E^*S^*_n(\omega)\otimes S^*_n(\omega)-n^{-1}ES_n(\omega)\otimes S_n(\omega)\|_{HS}\to 0$,

in probability.
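In practice, Theorem 4.1 would be used by Monte Carlo over bootstrap replicates; the sketch below illustrates this for the special case $\omega=0$, i.e., for approximating the law of $\sqrt{n}(\bar X_n-\mu)$, assuming the fsb_replicate helper from the earlier sketch. The pointwise confidence band in the trailing comments is one standard way to use the bootstrap draws and is not prescribed by the paper.

```python
import numpy as np

def fsb_mean_distribution(X, m, p, B, rng):
    """Sieve bootstrap approximation of the law of sqrt(n) * (Xbar_n - mu):
    returns a B x T array whose rows are sqrt(n) * (Xbar*_n - Xbar_n)."""
    n, T = X.shape
    Xbar = X.mean(axis=0)
    draws = np.empty((B, T))
    for b in range(B):
        X_star = fsb_replicate(X, m, p, rng)        # pseudo-series from the sketch in Section 3
        draws[b] = np.sqrt(n) * (X_star.mean(axis=0) - Xbar)
    return draws

# Example use (illustrative):
# draws = fsb_mean_distribution(X, m=3, p=2, B=1000, rng=np.random.default_rng(2))
# n = X.shape[0]
# lower = X.mean(axis=0) - np.quantile(draws, 0.975, axis=0) / np.sqrt(n)   # basic bootstrap
# upper = X.mean(axis=0) - np.quantile(draws, 0.025, axis=0) / np.sqrt(n)   # pointwise band
```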
Remark 4.1. Notice that as a special case of the above theorem we get that, under the assumptions made and as $n\to\infty$, $\sqrt{n}\,\bar X^*_n\Rightarrow N\big(0,\sum_{h\in\mathbb{Z}}C_h\big)$, in probability, and $n\,E^*\bar X^*_n\otimes\bar X^*_n\stackrel{P}{\to}2\pi\mathcal{F}_0$, which provides one of the first instances of a central limit theorem for the bootstrap for functional time series under the weak dependence conditions stated in Assumption 1.

In a variety of functional testing situations one is faced with the problem that the limiting distribution under the null of a fully functional test statistic depends, in a complicated way, on difficult to estimate characteristics of the underlying functional process. This makes the practical implementation of asymptotic results derived in order to calculate critical values of tests a difficult task. To overcome this problem, a common approach in the literature is to consider tests based on finite dimensional projections. However, such tests have non-degenerate power only for alternatives which are not orthogonal to the space captured by the particular projections considered; see Horváth et al. (2013) and Horváth et al. (2014) for examples. Using as an example the two sample mean problem, we demonstrate in the following how the sieve bootstrap procedure proposed in this paper can be successfully applied to approximate the null distribution of a fully functional test.

Let $X=\{X_t,\ t\in\mathbb{Z}\}$ and $Y=\{Y_t,\ t\in\mathbb{Z}\}$ be two independent, strictly stationary functional processes with mean functions $\mu_X=EX_t$ and $\mu_Y=EY_t$, respectively, and consider the testing problem $H_0:\mu_X=\mu_Y$ against the alternative $H_1:\mu_X\neq\mu_Y$. Given two time series $X_1,X_2,\ldots,X_{n_1}$ and $Y_1,Y_2,\ldots,Y_{n_2}$ stemming from $X$ and $Y$, respectively, a natural test statistic for these hypotheses is given by
\[
U_{n_1,n_2}=\frac{n_1n_2}{n_1+n_2}\,\|\bar X_{n_1}-\bar Y_{n_2}\|^2,
\]
where $\bar X_{n_1}=n_1^{-1}\sum_{t=1}^{n_1}X_t$ and $\bar Y_{n_2}=n_2^{-1}\sum_{t=1}^{n_2}Y_t$. If both processes satisfy Assumption 1 and $n_1,n_2\to\infty$ such that $n_1/(n_1+n_2)\to\theta\in(0,1)$, then $U_{n_1,n_2}\stackrel{d}{\to}\int_0^1\Gamma^2(\tau)\,d\tau$, where $\{\Gamma(\tau),\ \tau\in[0,1]\}$ is a mean zero Gaussian process with covariance function $E(\Gamma(\tau_1)\Gamma(\tau_2))=(1-\theta)c_X(\tau_1,\tau_2)+\theta c_Y(\tau_1,\tau_2)$ for $\tau_1,\tau_2\in[0,1]$
and $c_X(\tau_1,\tau_2)=\mathrm{Cov}(X_0(\tau_1),X_0(\tau_2))+\sum_{h\ge1}\mathrm{Cov}(X_0(\tau_1),X_h(\tau_2))+\sum_{h\ge1}\mathrm{Cov}(X_0(\tau_2),X_h(\tau_1))$, with $c_Y(\tau_1,\tau_2)$ defined analogously in terms of the process $Y$. Notice that the kernel functions $c_X$ and $c_Y$ are unknown, which makes the calculation of critical values of the test $U_{n_1,n_2}$ a difficult task.

Since the functional sieve bootstrap procedure proposed satisfactorily imitates the autocovariance structure of the underlying processes, it can be successfully applied to estimate the critical values of the test $U_{n_1,n_2}$. To elaborate, the goal is to generate two independent functional pseudo-time series $X^*_1,X^*_2,\ldots,X^*_{n_1}$ and $Y^*_1,Y^*_2,\ldots,Y^*_{n_2}$ that mimic the autocovariance structure of the processes $X$ and $Y$, respectively, and satisfy, at the same time, the null hypothesis of interest. For this let $X^*_t$ and $Y^*_t$ be generated by means of equation (3.1) of the functional sieve bootstrap algorithm, where for the generation of the $X^*_t$'s the sample scores $\widehat\xi^{(X)}_t=(\widehat\xi^{(X)}_{j,t}=\langle X_t,\widehat v^{(X)}_j\rangle,\ j=1,2,\ldots,m_1)^\top$, $t=1,2,\ldots,n_1$, and for the generation of the $Y^*_t$'s the sample scores $\widehat\xi^{(Y)}_t=(\widehat\xi^{(Y)}_{j,t}=\langle Y_t,\widehat v^{(Y)}_j\rangle,\ j=1,2,\ldots,m_2)^\top$, $t=1,2,\ldots,n_2$, are used in Step 1 of this algorithm. Here $\widehat v^{(X)}_j$, $j=1,\ldots,m_1$, and $\widehat v^{(Y)}_j$, $j=1,\ldots,m_2$, denote the orthonormalized eigenfunctions of the $m_1$ respectively $m_2$ largest eigenvalues of the sample covariance operators $\widehat C^{(X)}_0=n_1^{-1}\sum_{t=1}^{n_1}(X_t-\bar X_{n_1})\otimes(X_t-\bar X_{n_1})$ and $\widehat C^{(Y)}_0=n_2^{-1}\sum_{t=1}^{n_2}(Y_t-\bar Y_{n_2})\otimes(Y_t-\bar Y_{n_2})$, respectively. Notice that generation of $X^*_t$ and $Y^*_t$ by using (3.1) ensures that $E^*X^*_t=E^*Y^*_t=0$, that is, the generated functional pseudo-time series $X^*_1,X^*_2,\ldots,X^*_{n_1}$ and $Y^*_1,Y^*_2,\ldots,Y^*_{n_2}$ satisfy the null hypothesis $H_0$. Now, let $\bar X^*_{n_1}=n_1^{-1}\sum_{t=1}^{n_1}X^*_t$ and $\bar Y^*_{n_2}=n_2^{-1}\sum_{t=1}^{n_2}Y^*_t$ and define the bootstrap analogue of $U_{n_1,n_2}$ as
\[
U^*_{n_1,n_2}=\frac{n_1n_2}{n_1+n_2}\,\|\bar X^*_{n_1}-\bar Y^*_{n_2}\|^2.
\]
The following theorem establishes validity of the sieve bootstrap applied to the functional testing problem considered.
Theorem 4.2.
Let the conditions of Theorem 4.1 be satisfied and assume that $n_1,n_2\to\infty$ such that $n_1/(n_1+n_2)\to\theta\in(0,1)$. Then,
\[
\sup_{x\in\mathbb{R}}\big|P(U_{n_1,n_2}\le x)-P(U^*_{n_1,n_2}\le x\,|\,\mathbf{X}_{n_1},\mathbf{Y}_{n_2})\big|\to 0,\quad\text{in probability},
\]
where $P(U^*_{n_1,n_2}\le\cdot\,|\,\mathbf{X}_{n_1},\mathbf{Y}_{n_2})$ denotes the distribution function of $U^*_{n_1,n_2}$ conditional on $\mathbf{X}_{n_1}=(X_1,X_2,\ldots,X_{n_1})$ and $\mathbf{Y}_{n_2}=(Y_1,Y_2,\ldots,Y_{n_2})$.
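A possible bootstrap implementation of this test is sketched below, again assuming curves on a common grid and the fsb_replicate helper from Section 3. Since fsb_replicate, as written there, adds the sample mean back, the pseudo-samples are re-centered so that the null hypothesis holds in the bootstrap world, as required above. The Monte Carlo p-value computation is an illustrative choice, not part of Theorem 4.2.

```python
import numpy as np

def two_sample_fsb_test(X, Y, m1, m2, p1, p2, B, rng):
    """Sieve bootstrap p-value for H0: mu_X = mu_Y based on the fully functional
    statistic U_{n1,n2}; X (n1 x T) and Y (n2 x T) hold the two samples on a common grid."""
    n1, T = X.shape
    n2 = Y.shape[0]
    w = 1.0 / T                                             # Riemann weight for the L^2 norm
    scale = n1 * n2 / (n1 + n2)
    U_obs = scale * np.sum((X.mean(axis=0) - Y.mean(axis=0)) ** 2) * w
    U_star = np.empty(B)
    for b in range(B):
        # Pseudo-samples generated separately for X and Y; re-centering removes the
        # resampled sample means, so both have E* mean zero and H0 holds by construction.
        Xs = fsb_replicate(X, m1, p1, rng) - X.mean(axis=0)
        Ys = fsb_replicate(Y, m2, p2, rng) - Y.mean(axis=0)
        U_star[b] = scale * np.sum((Xs.mean(axis=0) - Ys.mean(axis=0)) ** 2) * w
    return U_obs, (U_star >= U_obs).mean()                  # observed statistic and p-value
```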
5. Choice of Parameters and Numerical Results
Implementation of the functional sieve bootstrap requires the choice of two tuning parameters: the order $p$ and the dimension $m$. In choosing these parameters, the problem of overfitting caused by selecting a large dimension and/or a high order vector autoregressive model should be seriously taken into account.

Several approaches for selecting the number of principal components in functional data analysis have been proposed in the literature; see among others Yao et al. (2005) and Li et al. (2013) for the use of information type criteria. For our purposes, one useful and simple criterion for selecting the dimension $m$ is based on the ratio of the total variance explained by the number $m$ of principal components included to the variance of $X_t$. According to this rule, $m$ is selected as the smallest positive integer for which the empirical variance ratio ($VR_n$) satisfies $VR_n(m)=\sum_{j=1}^{m}\widehat\lambda_j\big/\sum_{j=1}^{n}\widehat\lambda_j\ge Q$, with $Q$ a predetermined value; $Q=0.80$ and $Q=0.85$ are
two common choices; cf. Horváth and Kokoszka (2012). One drawback of the VR-rule applied to functional time series is that this criterion does not take into account dependence.

To overcome this drawback we introduce in the following a generalized variance ratio criterion. Measuring the total variability of the underlying functional process by the quantity $\int_{(-\pi,\pi]}\|\mathcal{F}_\omega\|^2_{HS}\,d\omega$ yields, by straightforward calculations and evaluating the Hilbert-Schmidt norm using the orthonormal basis $\{v_j,\ j=1,2,\ldots\}$, the expression
\[
\int_{(-\pi,\pi]}\|\mathcal{F}_\omega\|^2_{HS}\,d\omega=\sum_{l=1}^{\infty}\sum_{r=1}^{\infty}\int_{(-\pi,\pi]}\big|f_{\xi_l,\xi_r}(\omega)\big|^2\,d\omega,
\]
where $f_{\xi_l,\xi_r}$ denotes the cross spectral density of the score processes $\{\xi_{l,t}\}$ and $\{\xi_{r,t}\}$. Define next a functional process $X^+_m=\{X^+_t,\ t\in\mathbb{Z}\}$, where $X^+_t=X^+_{t,m}+U^+_{t,m}$, $X^+_{t,m}=\sum_{j=1}^{m}\xi_{j,t}v_j$, $U^+_{t,m}=\sum_{j=m+1}^{\infty}\zeta_{j,t}v_j$, and $\{\zeta_{j,t},\ t\in\mathbb{Z}\}$, $j=m+1,m+2,\ldots$, are independent, i.i.d. processes which are independent from $X^+_{t,m}$ and have mean zero and $\mathrm{Var}(\zeta_{j,t})=\lambda_j$. Observe that for any $m$ fixed, and ignoring estimation errors, it is the dependence structure of $X^+_m$ which is essentially mimicked by the functional sieve bootstrap process $X^*$. This is so since in the bootstrap world $U_{t,m}=X_t-\sum_{j=1}^{m}\xi_{j,t}v_j$ is treated as an i.i.d. process and the (possible) correlation between the processes $\{X_{t,m}=\sum_{j=1}^{m}\xi_{j,t}v_j\}$ and $\{U_{t,m}\}$ is ignored. Let $\mathcal{F}^+_{\omega,m}$ be the spectral density operator of $X^+_m$. Using the same measure of total variability as for the process $X$, we get
\[
\int_{(-\pi,\pi]}\|\mathcal{F}^+_{\omega,m}\|^2_{HS}\,d\omega=\sum_{l=1}^{m}\sum_{r=1}^{m}\int_{(-\pi,\pi]}\big|f_{\xi_l,\xi_r}(\omega)\big|^2\,d\omega+(2\pi)^{-1}\sum_{l=m+1}^{\infty}\lambda_l^2.
\]
Notice that the term $(2\pi)^{-1}\sum_{l=m+1}^{\infty}\lambda_l^2$ is due to integrating the squared Hilbert-Schmidt norm of the spectral density operator of the process $\{U^+_{t,m}\}$. This process is included in the definition of $X^+_m$ because of the functional i.i.d. innovations $U^*_t$ used in Step 6 of the sieve bootstrap algorithm to generate the $X^*_t$'s.

The ratio
\[
GVR(m)=\int_{(-\pi,\pi]}\|\mathcal{F}^+_{\omega,m}\|^2_{HS}\,d\omega\Big/\int_{(-\pi,\pi]}\|\mathcal{F}_\omega\|^2_{HS}\,d\omega
\]
can then be considered as the proportion of total variability of the process $X$ captured by that of the process $X^+_m$. Recall that $|f_{\xi_l,\xi_r}(\omega)|^2=\kappa_{l,r}(\omega)f_{\xi_l,\xi_l}(\omega)f_{\xi_r,\xi_r}(\omega)$, with $\kappa_{l,r}$ the squared coherency between the score processes $\{\xi_{l,t}\}$ and $\{\xi_{r,t}\}$. That is, $GVR$ explicitly takes into account the entire autocovariance structure of the processes $X$ and $X^+_m$, and $GVR(m)$ can also be interpreted as a measure of the loss of information on the dependence structure of $X$ caused by the functional sieve bootstrap procedure based on $m$ principal components. Note that if $X$ is a white noise process, then $GVR(m)=1$ for every value of $m$. In this case we set $m=0$ as the most parsimonious choice, i.e., no vector autoregression is fitted, which implies that the functional sieve bootstrap (correctly) reduces to an i.i.d. bootstrap.

Now, observe that $\lambda_j$, $\int_{(-\pi,\pi]}|f_{\xi_l,\xi_r}(\omega)|^2\,d\omega$ and $\int_{(-\pi,\pi]}\|\mathcal{F}_\omega\|^2_{HS}\,d\omega$ can be consistently estimated by $\widehat\lambda_j$, $2\pi n^{-1}\sum_{j\in F_n}|I_{\xi_l,\xi_r}(\omega_j)|^2$ and $2\pi n^{-1}\sum_{j\in F_n}\|I_{n,\omega_j}\|^2_{HS}$, respectively, where $I_{\xi_l,\xi_r}(\omega)=J_{\xi_l}(\omega)J_{\xi_r}(-\omega)$ and $J_{\xi_s}(\omega)=(2\pi n)^{-1/2}\sum_{t=1}^{n}\xi_{s,t}e^{-i\omega t}$ for any $s\ge 1$.
Furthermore, $I_{n,\omega}$ is the periodogram operator with kernel $I_{n,\omega}(\tau_1,\tau_2)=J_{n,\omega}(\tau_1)J_{n,-\omega}(\tau_2)$, $J_{n,\omega}(\tau)=(2\pi n)^{-1/2}\sum_{t=1}^{n}X_t(\tau)e^{-i\omega t}$, $\omega_j=2\pi j/n$, $F_n=\{-N,\ldots,-1,1,\ldots,N\}$ and $N=[n/2]$. We then select $m$ as the smallest positive integer for which the empirical generalized variance ratio ($GVR_n$) satisfies
\[
GVR_n(m)=\frac{\displaystyle\sum_{l=1}^{m}\sum_{r=1}^{m}\frac{2\pi}{n}\sum_{j\in F_n}\big|\widehat I_{\xi_l,\xi_r}(\omega_j)\big|^2+\frac{1}{2\pi}\sum_{l=m+1}^{n}\widehat\lambda_l^2}{\displaystyle\frac{2\pi}{n}\sum_{j\in F_n}\big\|I_{n,\omega_j}\big\|^2_{HS}}\ \ge\ Q.
\]
Here $\widehat I_{\xi_l,\xi_r}(\omega)=\widehat J_{\xi_l}(\omega)\widehat J_{\xi_r}(-\omega)$ with $\widehat J_{\xi_s}(\omega)=(2\pi n)^{-1/2}\sum_{t=1}^{n}\widehat\xi_{s,t}e^{-i\omega t}$ the finite Fourier transform of the time series of estimated scores.

Remark 5.1.
$GVR_n$ has been developed for the functional sieve bootstrap situation considered in this paper. However, a simple modification of this criterion leads to an alternative to the $VR_n$ rule which is appropriate for dependent functional data and which is of interest on its own. In particular, ignoring the second term of the numerator of $GVR_n$, the following dependent variance ratio ($DVR_n$) criterion is obtained,
\[
DVR_n(m)=\sum_{l=1}^{m}\sum_{r=1}^{m}\sum_{j\in F_n}\big|\widehat I_{\xi_l,\xi_r}(\omega_j)\big|^2\Big/\sum_{j\in F_n}\big\|I_{n,\omega_j}\big\|^2_{HS}.
\]
$DVR_n$ delivers an empirical measure of the loss of information on the dependence structure of $X$ associated with the use of the $m$-dimensional space and can, therefore, be used as a simple criterion to select the number $m$ of principal components in a functional time series setting. Notice that if the Hilbert-Schmidt norm in $GVR$ is replaced by the trace norm of the spectral density operators involved and the additional term $(2\pi)^{-1}\sum_{l=m+1}^{\infty}\lambda_l^2$ is ignored, then the corresponding $DVR(m)$ ratio given by
\[
DVR(m)=\sum_{l=1}^{m}\sum_{r=1}^{m}\int_{-\pi}^{\pi}\big|f_{\xi_l,\xi_r}(\omega)\big|\,d\omega\Big/\sum_{l=1}^{\infty}\sum_{r=1}^{\infty}\int_{-\pi}^{\pi}\big|f_{\xi_l,\xi_r}(\omega)\big|\,d\omega
\]
reduces to the $VR(m)=\sum_{l=1}^{m}\lambda_l/\sum_{l=1}^{\infty}\lambda_l$ ratio.

Notice that both the VR and the GVR criterion refer to a fixed sample size $n$, and the purpose is to select the number of principal components in a way which ensures that a desired fraction $Q$ of the variance of the process is captured by the number of principal components included in the analysis. This is important for our bootstrap proposal, where the objective is to appropriately mimic the dependence structure of the functional time series at hand. However, consistency requires that $m$ increases to infinity with $n$, which is not the case if $Q$ remains fixed with $n$. At the same time, and as we have seen, the rate at which $m$ has to increase to infinity should take into account the rate of decrease of the eigenvalues $\lambda_j$, respectively of the differences $\lambda_j-\lambda_{j+1}$, to zero. One way to accommodate such aspects in our practical selection of $m$ is to combine the discussed VR respectively GVR criterion with an approach for selecting $m$ proposed by Hörmann and Kidziński (2015), which explicitly takes into account the behavior of the eigenvalues $\widehat\lambda_j$. To elaborate, denote by $m_{n,E}$ the number of principal components selected by the rule
\[
m_{n,E}=\operatorname*{argmax}\Big\{j\ge 1:\ \widehat\lambda_1/\widehat\lambda_j\le\sqrt{n}/\log(n)\Big\}.
\]
Notice that $m_{n,E}$ allows the $j$-th principal component to be included in the analysis if the corresponding estimated eigenvalue $\widehat\lambda_j$ is big enough, i.e., if the ratio $\widehat\lambda_1/\widehat\lambda_j$ does not exceed the threshold $\sqrt{n}/\log(n)$. The numerator $\widehat\lambda_1$ acts solely as a normalization to adapt for scaling; for this and for the choice of the particular threshold see Hörmann and Kidziński (2015). Denote now by $m_{n,Q}$ the number of principal components selected using the VR or the GVR criterion for some given $Q$. The practical selection of $m$ we then propose is to set this parameter equal to
\[
\widehat m_n=\max\{m_{n,Q},\ m_{n,E}\}.
\]
According to this proposal, only those principal directions are included in the analysis whose eigenvalues can be estimated with reasonable accuracy, ensuring at the same time that the number of principal components selected explains at least a desired portion of the variability of the time series at hand.
We remark that although for functional time series the GVR criterion is theoretically more appealing, for short time series of $n\le 100$ observations we still recommend the use of the VR-criterion, since it leads to selections of $m$ with a smaller variability, avoiding, therefore, the potential fit of vector autoregressions of large dimensions and/or of high orders, which is an important issue for small sample sizes; see also Section 5.2 for details.

Once the dimension $m$ has been selected, the order $p$ of the vector autoregression fitted can be chosen using the AICC criterion; see Hurvich and Tsai (1993). This criterion is preferred because it is based on an approximately unbiased estimator of the expected Kullback-Leibler information of the fitted model and, more importantly, avoids overfitting. The order $p$ is then selected by minimizing
\[
AICC(p)=n\log\big|\widehat\Sigma_{e,p}\big|+\frac{n(nm+pm^2)}{n-m(p+1)-1},
\]
where $\widehat\Sigma_{e,p}=n^{-1}\sum_{t=p+1}^{n}\widehat e_{t,p}\widehat e_{t,p}^{\top}$ and $\widehat e_{t,p}$ is defined in Step 4 of the functional sieve bootstrap algorithm.
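The following sketch illustrates the two selection rules: $m$ via the VR criterion and $p$ by minimizing the AICC value displayed above. It reuses fpca_scores and yule_walker_var from the earlier sketches; the helper names, the maximal order p_max and the grid-based score estimation are assumptions of this illustration rather than part of the paper.

```python
import numpy as np

def select_m_vr(eigvals, Q=0.85):
    """Smallest m with VR_n(m) = sum_{j<=m} lambda_hat_j / sum_j lambda_hat_j >= Q."""
    lam = np.maximum(np.asarray(eigvals), 0.0)       # guard against tiny negative eigenvalues
    ratio = np.cumsum(lam) / lam.sum()
    return int(np.argmax(ratio >= Q)) + 1

def select_p_aicc(xi, p_max):
    """Order selection for the VAR fitted to the m-dimensional score series xi (n x m),
    minimizing AICC(p) = n log|Sigma_hat_{e,p}| + n(nm + p m^2) / (n - m(p+1) - 1)."""
    n, m = xi.shape
    best_p, best_val = 1, np.inf
    for p in range(1, p_max + 1):
        if n - m * (p + 1) - 1 <= 0:                 # penalty undefined; stop increasing p
            break
        A = yule_walker_var(xi, p)
        e = xi[p:] - sum(xi[p - j - 1: n - j - 1] @ A[j].T for j in range(p))
        Sigma = e.T @ e / e.shape[0]
        _, logdet = np.linalg.slogdet(Sigma)
        val = n * logdet + n * (n * m + p * m ** 2) / (n - m * (p + 1) - 1)
        if val < best_val:
            best_p, best_val = p, val
    return best_p

# Example use with the objects of the earlier sketches:
# lam_all, _, _ = fpca_scores(X, m=X.shape[1])       # full empirical spectrum
# m = select_m_vr(lam_all, Q=0.85)
# _, _, xi = fpca_scores(X, m)
# p = select_p_aicc(xi, p_max=10)
```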
To investigate the finite sample behavior of the functional sieve bootstrap (FSB) we have performed simulations using time series stemming from a first order functional moving average process given by
\[
X_t=\varepsilon_t+\Theta(\varepsilon_{t-1}), \tag{5.1}
\]
as in Aue et al. (2015). To elaborate, $\Theta$ is specified as a scalar multiple of an operator $\Psi:H_D\to H_D$, where $H_D=\mathrm{sp}\{f_1,f_2,\ldots,f_D\}$, $D=21$, and $f_j$, $j=1,2,\ldots,D$, are Fourier basis functions on the interval $[0,1]$. For $x\in H_D$, $x=\sum_{j=1}^{D}c_jf_j$ with $c_j=\langle x,f_j\rangle$, the operator $\Psi$ acts as $\Psi(x)=\sum_{j=1}^{D}\sum_{l=1}^{D}c_j\langle\Psi(f_j),f_l\rangle f_l=(B_\Psi c)'v$, where $c=(c_1,\ldots,c_D)'$ and $v=(f_1,\ldots,f_D)'$ and the matrix $B_\Psi$ has the element in the $j$th column and $l$th row given by $\langle\Psi(f_j),f_l\rangle$. Following Aue et al. (2015), the operator $\Psi$ was chosen at random. For this, a $D\times D$ matrix of independent, normal random variables with mean zero was first generated, where its $(j_1,j_2)$th element has standard deviation $\sigma_{j_1,j_2}=j_1^{-1}j_2^{-1}$. This matrix was then scaled so that the resulting matrix $B_\Psi$ has induced norm equal to 1, and in every iteration of the simulation
runs $B_\Psi$ was newly generated. The corresponding i.i.d. innovations $\varepsilon_t$ in (5.1) were generated as $\varepsilon_t=\sum_{j=1}^{D}Z_{t,j}f_j$, where the $Z_{t,j}$ are i.i.d. Gaussian with mean zero and standard deviation equal to $j^{-1}$.

Table 1. Frequency of selected values of $m$ ($m=1,2,\ldots,7$) by the $VR_n$ and the $GVR_n$ criterion for different sample sizes ($R=1000$ replications).

We first consider the performance of the VR and GVR criteria in selecting the number $m$ of principal components, when $Q=0.85$.
Table 1 shows the frequencies of the selected dimensions $m$ over $R=1000$ replications of the considered FMA(1) model for different sample sizes. As seen from this table, the VR criterion is quite stable over the different sample sizes, leading to the selections $m=3$ or $m=4$ in almost all situations. The GVR criterion exhibits a greater variability for small sample sizes ($n\le 100$) and leads more frequently to the selections $m=4$ and $m=5$ as $n$ increases. Observe that, because the GVR criterion explicitly takes into account the dependence structure of the processes involved, it selects the larger dimension $m=4$ more frequently than the dimension $m=3$, which is the one more frequently selected by the VR criterion. Notice further that the smaller variability of the VR rule for small sample sizes prevents the selection of vector autoregressions of large dimension, which is particularly important in our set-up. Thus, for $n\le 100$ observations we recommend applying the $\widehat m_n$ rule with the VR criterion used to calculate $m_{n,Q}$, together with the AICC criterion, in order to select the values of $m$ and $p$.

To investigate the behavior of $\widehat m_n$ for the FMA(1) model considered, we use a range of sample sizes with $m_{n,Q}$ chosen according to the VR criterion for $n\le 200$ and the GVR criterion for $n>200$, with $Q=0.85$.
Table 1 of the supplementary material shows the results obtained over $R=1000$ repetitions for each of the sample sizes considered. As seen from this table, the behavior of $\widehat m_n$ is dominated for small to moderate sample sizes by $m_{n,Q}$, ensuring the desired description of the variability of the functional time series by the number $m$ of principal components selected. However, as $n$ increases, the behavior of $\widehat m_n$ becomes dominated by $m_{n,E}$, which allows the number of principal components selected, as well as the part of the variance explained, to increase with $n$.

We next consider the behavior of the FSB procedure in estimating the standard deviation of the sample mean $\sqrt{n}\,\bar X_n(\tau_j)=n^{-1/2}\sum_{t=1}^{n}X_t(\tau_j)$, calculated for time series of length $n=100$ observations and for $\tau_j$, $j=1,2,\ldots,T$, $T=21$, equidistant time points in the interval $[0,1]$. The exact standard deviation $\sigma(\tau_j)$ of the sample mean is estimated using 20,000 replications of the moving average model (5.1). All estimates presented are based on $R=1{,}000$ replications and $B=1{,}000$ bootstrap repetitions. Table 2 of the supplementary material shows the FSB estimates obtained using some different values of the bootstrap parameters $m$ and $p$ as well as for the values of these parameters chosen by means of the $\widehat m_n$ and AICC rule, which are denoted by $(\widehat m,\widehat p)$. Note that $(m,p)=(3,3)$ is the pair most frequently chosen by this rule for $n=100$ observations. These estimates also seem not to be very sensitive with respect to the different choices of the parameter $m$ used to truncate the Karhunen-Loève expansion.

Table 2 compares the results of the FSB procedure with those of three different block bootstrap methods: the moving block bootstrap (MBB), the tapered block bootstrap (TBB) and the stationary bootstrap (SB).

Table 2. Averaged absolute bias (ABias), averaged relative bias (RBias) and averaged standard deviation (AStd) of the MBB, TBB, SB and FSB estimates of the standard deviation of the sample mean $\bar X_n$.

               MBB             TBB             SB                  FSB
            b=5    b=9     b=7    b=6     b=5    b=6     (2,3)   (3,3)   (m̂,p̂)
   ABias   0.206  0.208   0.139  0.153   0.255  0.256    0.037   0.054   0.121
   RBias   0.091  0.092   0.061  0.068   0.112  0.113    0.016   0.024   0.053
   AStd    0.321  0.406   0.350  0.312   0.341  0.371    0.445   0.462   0.484

To assess the overall behavior of the different bootstrap estimates, we use the averaged absolute bias (ABias), $T^{-1}\sum_{j=1}^{T}|\bar\sigma^{*}(\tau_j)-\sigma(\tau_j)|$, the averaged relative bias (RBias), $T^{-1}\sum_{j=1}^{T}|\bar\sigma^{*}(\tau_j)/\sigma(\tau_j)-1|$, and the averaged standard deviation of the bootstrap estimates (AStd), calculated as $T^{-1}\sum_{j=1}^{T}\sqrt{\widehat{\mathrm{Var}}(\sigma^{*}(\tau_j))}$, where $\sigma(\tau_j)$ is the estimated exact standard deviation, $\widehat{\mathrm{Var}}(\sigma^{*}(\tau_j))=(R-1)^{-1}\sum_{r=1}^{R}\big(\sigma^{*}_{r}(\tau_j)-\bar\sigma^{*}(\tau_j)\big)^{2}$, with $\sigma^{*}_{r}(\tau_j)$ denoting the bootstrap estimate of $\sigma(\tau_j)$ obtained in the $r$th replication, $r=1,2,\ldots,R$, and $\bar\sigma^{*}(\tau_j)=R^{-1}\sum_{r=1}^{R}\sigma^{*}_{r}(\tau_j)$. For the three block bootstrap methods considered we report the results for the two block sizes, denoted by $b_1$ and $b_2$, for which the corresponding methods achieve the two lowest ABias respectively RBias values. Thus the results presented for the three block bootstrap methods in Table 2 are those having the overall lowest bias. Finally, for the FSB procedure we report the results for the values $(m,p)=(2,3)$ and $(m,p)=(3,3)$ and for the values of these parameters chosen by the $\widehat m_n$ and AICC rule, denoted by $(\widehat m,\widehat p)$.

As seen from Table 2, among the three block bootstrap estimators considered, the MBB estimator seems to behave better than the SB estimator, while both estimators are outperformed by the TBB estimator. However, compared to the FSB estimates, all block bootstrap estimates are quite biased and they are clearly outperformed by the FSB estimates. This is true even in the case where the parameters of the FSB procedure are chosen in a data-dependent way, where the bias of the FSB estimates is smaller than the lowest bias achieved by the block bootstrap methods. The FSB estimates have a larger standard deviation which, however, is not surprising taking into account the fact that this bootstrap method requires the estimation of $m^2p$ autoregressive coefficients. It is worth investigating whether the standard deviation of the FSB estimates can be reduced by using sparse methods to fit the vector autoregression involved in the bootstrap procedure.

The results of a small simulation study investigating the finite sample size and power behavior of the bootstrap based, fully functional test for the two-sample mean problem considered in Section 4.2 are presented in the supplementary material.
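The summary measures used in Table 2 can be computed directly from the definitions given above. The following sketch only fixes the array shapes as assumptions; everything else follows the formulas in the text.

```python
import numpy as np

def bootstrap_summary(sigma_star, sigma_exact):
    """ABias, RBias and AStd of bootstrap standard deviation estimates.

    sigma_star  : (R, T) array of bootstrap estimates sigma*_r(tau_j) over
                  R replications and T grid points tau_j
    sigma_exact : (T,) array of (estimated) exact standard deviations sigma(tau_j)
    """
    R = sigma_star.shape[0]
    mean_star = sigma_star.mean(axis=0)                      # bar sigma*(tau_j)
    abias = np.mean(np.abs(mean_star - sigma_exact))
    rbias = np.mean(np.abs(mean_star / sigma_exact - 1.0))
    var_star = ((sigma_star - mean_star) ** 2).sum(axis=0) / (R - 1)
    astd = np.mean(np.sqrt(var_star))
    return abias, rbias, astd
```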
6. Auxiliary Results and Proofs

Lemma 6.1. Let Assumptions 1, 2 and 3 be satisfied. Denote by $\Psi_j(m)$, $j=1,2,\ldots$, the coefficient matrices of the power series $A_m^{-1}(z)$, where $A_m(z)=I_m-\sum_{j=1}^{\infty}A_j(m)z^{j}$, $|z|\le 1$, and let $\Sigma_e(m)=E\big(e_t(m)e_t^{\top}(m)\big)$. Then,
(i) $\sum_{j=1}^{\infty}(1+j)^{r}\|A_j(m)\|_F=O(1)$,
(ii) $\sum_{j=1}^{\infty}(1+j)^{r}\|\Psi_j(m)\|_F=O(1)$, and
(iii) $0<c_e\le\|\Sigma_e(m)\|_F=O(1)$,
where all bounds on the right hand side are valid uniformly in $m$.

The following version of Baxter's inequality is very useful in our setting because it relates the approximation error of the coefficient matrices of the finite predictor and of the autoregressive representation of the $m$-dimensional process of scores to the lower bound of the spectral density matrix $f_{\xi}(\cdot)$. It is an immediate consequence of Lemma 2.1 and of Theorem 3.2 in Meyer et al. (2016).

Lemma 6.2.
Let Assumption 1, 2 and 3 be satisfied. Then there exists a con-stant
C > which does not depend on m , such that for all ≤ s ≤ r − , p X j =1 (1 + j ) s k A j,p ( m ) − A j ( m ) k F ≤ Cδ − m ∞ X j = p +1 (1 + j ) s +1 k A j ( m ) k F , where δ m is given in Lemma 2.1. The following lemma provides a useful bound between the estimated matricesof the autoregressive parameters based on the vector of scores ξ t and on thevector of their estimates b ξ t , t = 1 , , . . . , n . It deals with the case of the Yule-Walker estimators but similar bounds can be established along the same linesfor other estimators, like for instance for least squares estimators. Lemma 6.3.
Let Assumption 1 be satisfied, let b A p,m = ( b A j,p ( m ) , j = 1 , , . . . , p ) and let e A p,m = ( e A j,p ( m ) , j = 1 , , . . . , p ) be the Yule-Walker estimators of A j,p ( m ) , j = 1 , , . . . , p , based on the time series of true scores ξ , ξ , . . . , ξ n . fstathios Paparoditis/Functional Sieve Bootstrap Then, (cid:13)(cid:13) b A p,m − e A p,m (cid:13)(cid:13) F = O P (cid:16)(cid:16) p √ mλ m + p (cid:17) n n m X j =1 α j o / (cid:17) . Lemma 6.4.
Let Assumption 1 and 2 (with r = 0 ) be satisfied and A p,m ( z ) = I − P pj =1 A j,p ( m ) z j , z ∈ C . There exists p m ∈ N and a positive constant C which does not depend on m such that for m ∈ N and all p > p m , inf | z |≤ /p (cid:12)(cid:12)(cid:12) det ( A p,m ( z )) (cid:12)(cid:12)(cid:12) ≥ Cm − / . To state the next lemma we first fix the following notation. Ψ j ( m ), Ψ j,p ( m ), e Ψ j,p ( m ) and b Ψ j,p ( m ) j = 1 , , . . . denote the coefficient matrices in the powerseries expansions of A − m ( z ), A − p,m ( z ), e A − p,m ( z ) and b A − p,m ( z ), respectively, | z | ≤ ( m ) = Ψ ,p ( m ) = e Ψ ,p ( m ) = b Ψ ,p ( m ) = I m . Furthermore, e t ( m ) = ξ t − P ∞ j =1 A j ( m ) ξ t − j , e t,p ( m ) = ξ t − P pj =1 A j,p ( m ) ξ t − j , e e t,p ( m ) = ξ t − P pj =1 e A j,p ( m ) ξ t − j and b e t,p ( m ) = b ξ t − P pj =1 b A j,p ( m ) b ξ t − j , while e Σ e,p ( m ) = E + ( e e t,p ( m ) − e e n,p ( m ))( e e t,p ( m ) − e e n,p ( m )) ⊤ and b Σ e,p ( m ) = E ∗ ( b e t,p ( m ) − b e n,p ( m ))( b e t,p ( m ) − b e n,p ( m )) ⊤ with e e n,p ( m ) =( n − p ) − P nt = p +1 e e t,p ( m ) and b e n,p ( m ) = ( n − p ) − P nt = p +1 b e t,p ( m ), where E + denotes expectation with respect to the measure assigning probability ( n − p ) − to each e e t,p ( m ), t = p + 1 , p + 2 , . . . , n . Lemma 6.5.
Let Assumptions 1 and 3 as well as Assumptions 2 and 4 (with $r=2$) be satisfied. Then, as $n\to\infty$,
(i) $\sum_{j=1}^{\infty}\|\widetilde{\Psi}_{j,p}(m)-\Psi_{j,p}(m)\|_F\overset{P}{\to}0$,
(ii) $\|\widetilde{\Sigma}_{e,p}(m)-\Sigma_{e,p}(m)\|_F\overset{P}{\to}0$,
(iii) $\sum_{j=1}^{\infty}\|\widehat{\Psi}_{j,p}(m)-\Psi_{j,p}(m)\|_F\overset{P}{\to}0$,
(iv) $\|\widehat{\Sigma}_{e,p}(m)-\Sigma_{e,p}(m)\|_F\overset{P}{\to}0$,
(v) $\sum_{j=1}^{\infty}\|\Psi_{j,p}(m)-\Psi_{j}(m)\|_F\to 0$,
(vi) $\|\Sigma_{e,p}(m)-\Sigma_{e}(m)\|_F\to 0$.

Proof of Lemma 2.1:
Expression (2.2) imediately leads, for all ω ∈ [0 , π ],to an upper bound of f ξ ( M ) ( ω ). To derive a lower bound, recall that Γ ξ ( M ) ( h ) = (cid:0) h C h ( v j r ) , v j s i (cid:1) r,s =1 , ,...,m and observe that f ξ ( M ) ( ω ) = (cid:16) hF ω ( v j r ) , v j s i (cid:17) r,s =1 , ,...,m . Let µ j ( ω ), j = 1 , , . . . , m , be the eigenvalues of f ξ ( M ) ( ω ) (including multi-plicity). It suffices to show that min ≤ j ≤ m µ j ( ω ) ≥ δ M > ω ∈ [0 , π ]. For this let c j ( ω ) = ( c j, ( ω ) , c j, ( ω ) , . . . , c j,m ( ω )) ⊤ ∈ C m , j = 1 , , . . . , m , be the corresponding normalized eigenvectors. Then for every j ∈ { , , . . . , m } , we have µ j ( ω ) = c ⊤ j ( ω ) (cid:16) hF ω ( v j r ) , v j s i (cid:17) r,s =1 , ,...,m c j ( ω )= hF ω ( y j ( ω )) , y j ( ω ) i > , fstathios Paparoditis/Functional Sieve Bootstrap by the positivity of F ω , where y j ( ω ) = P mr =1 c j,r ( ω ) v j r ∈ V M = sp { v j , v j ,. . . , v j m } and k y j k = 1. Because of the norm summability of the autocovariancematrix function Γ ξ M ( h ), the spectral density f ξ ( M ) ( ω ) and consequently theeigenvalues µ j ( ω ), j = 1 , , . . . , m , are continuous functions of ω . Let δ M ( ω ) =min ≤ j ≤ m µ j ( ω ) and notice that δ M ( ω ) is continuous in ω and δ M ( ω ) > ω ∈ [0 , π ]. Define δ M = min ω ∈ [0 ,π ] δ M ( ω ) which is positive by the continuityof δ M ( · ) in the compact interval [0 , π ]. Hence min ≤ j ≤ m µ j ( ω ) ≥ δ M > ω ∈ [0 , π ]. (cid:3) Proof of Proposition 3.1:
Recall the definition of X ∗ t = P mj =1 ⊤ j ξ ∗ t b v j + U ∗ t and observe that ξ ∗ t = P ∞ l =0 b Ψ l,p ( m ) e ∗ t − l , where b Ψ ,p ( m ) = I m and the powerseries b Ψ m,p ( z ) = I m + P ∞ l =1 b Ψ l,p ( m ) z l = ( I m − P pj =1 b A j,p ( m ) z j ) − convergesfor | z | ≤
1. Write X ∗ t = P ∞ l =0 P mj =1 ⊤ j b Ψ l,p ( m ) e ∗ t − l b v j + U ∗ t and define X ∗ t,M = P M − l =0 P mj =1 ⊤ j b Ψ l,p ( m ) e ∗ t − l b v j + P ∞ l = M P mj =1 ⊤ j b Ψ l,p ( m ) e ∗ t − l,t b v j + U ∗ t , where foreach t ∈ Z , { e ∗ s,t , s ∈ Z } is an independent copy of { e ∗ s , s ∈ Z } . Notice that X ∗ M − X ∗ M,M = P ∞ l = M P mj =1 ⊤ j b Ψ l,p ( m )( e ∗ M − l − e ∗ M − l,M ) b v j . By Minkowski’sinequality we have q E k X ∗ M − X ∗ M,M k ≤ vuut E k ∞ X l = M m X j =1 ⊤ j b Ψ l,p ( m ) e ∗ M − l b v j k + vuut E k ∞ X l = M m X j =1 ⊤ j b Ψ l,p ( m ) e ∗ M − l,M b v j k . (6.1)Evaluating the first expectation term we get using k A k F = tr ( AA ⊤ ) and thesubmultiplicative property of the Frobenius matrix norm, that E k ∞ X l = M m X j =1 ⊤ j b Ψ l,p ( m ) e ∗ M − l b v j k = ∞ X l = M tr (cid:0) b Ψ l,p ( m )Σ ∗ ( m ) b Ψ ⊤ l,p ( m ) (cid:1) ≤ k b Σ / e,p ( m ) k F ∞ X l = M k b Ψ l,p ( m ) k F , where b Σ e,p ( m ) = b Σ / e,p ( m ) b Σ / e,p ( m ). An identical expression appears for the sec-ond expectation term on the right hand side of (6.1). Applying Minkowski’sinequality again we get by the exponential decay of k b Ψ l,p ( m ) k F , that ∞ X M =1 q E k X ∗ M − X ∗ M,M k ≤ k b Σ / e,p ( m ) k F ∞ X M =1 ∞ X l = M k b Ψ l,p ( m ) k F = 2 k b Σ / e,p ( m ) k F ∞ X l =1 l k b Ψ l,p ( m ) k F = O P (1) . (cid:3) fstathios Paparoditis/Functional Sieve Bootstrap Proof of Theorem 4.1
Let L + n,m = 1 √ n n X t =1 m X j =1 ξ + j,t v j e − itω , where ξ + t = ( ξ +1 ,t , ξ +2 ,t , . . . , ξ + m,t ) ⊤ , t = 1 , , . . . , n with ξ + t = P pj =1 e A j,p ( m ) ξ + t − j + e + t , where e A j,p ( m ), j = 1 , , . . . , p are the estimators of the autoregressive pa-rameter matrices based on the vector time series of true scores ξ t , t = 1 , , . . . , n and e + t are obtained by i.i.d. resampling from the centered residuals b e t = ξ t − P pj =1 e A j,p ( m ) ξ t − j , t = p + 1 , p + 2 , . . . , n . That is, the pseudo-variable L + n,m is obtained using the true eingefunctions v j and the true scores ξ j,t instead oftheir estimates b v j and b ξ j,t respectively. Decompose n − / S ∗ n ( ω ) as n − / S ∗ n ( ω ) = 1 √ n n X t =1 m X j =1 ξ + j,t v j e − itω + 1 √ n n X t =1 m X j =1 ξ ∗ j,t ( b v j − v j ) e − itω + 1 √ n n X t =1 m X j =1 ( ξ ∗ j,t − ξ + j,t ) v j e − itω + 1 √ n n X t =1 U ∗ t,m e − itω = L + n,m + V ∗ n,m + D ∗ n,m + R ∗ n,m with an obvious notation for L + n,m , V ∗ n,m , D ∗ n,m and R ∗ n,m . Notice that the terms V ∗ n,m and D ∗ n,m are due to the fact that, in the bootstrap procedure, the un-known scores and eigenfunctions are replaced by their sample estimates, while R ∗ n,m is due to the m -dimensional approximation of the infinite dimensionalstructure of the underlying process. Assertion (i) of the theorem follows thenfrom Lemma 6.6, 6.7, 6.8 and 6.9 and Slutsky’s theorem.Consider assertion (ii). Since n − k E ∗ S ∗ n ( ω ) ⊗ S ∗ n ( ω ) − ES n ( ω ) ⊗ S ( ω ) k HS ≤ k n − E ∗ S ∗ n ( ω ) ⊗ S ∗ n ( ω ) − π F ∗ ω,m k HS + 2 π kF ∗ ω,m − F ω k HS + k n − ES n ( ω ) ⊗ S n − π F ω k HS , it suffices in view of Proposition 3.2 and Theorem 2 of Cerovecki and H¨ormann(2015), to show that the first term on the right hand side of the above inequalityconverges to zero in probability. For this we have using n − E ∗ S ∗ n ( ω ) ⊗ S ∗ n ( ω ) = n − P n − − n +1 (1 − | h | /n ) C ∗ h , that this term is bounded by X | h |≥ n k C ∗ h k HS + n − n − X h = − n +1 | h |k C ∗ h k HS . Now, since P h ∈ Z k C ∗ h k HS = O P (1) uniformly in p and m , we get that P | h |≥ n k C ∗ h k HS = o P (1) and by Kronecker’s lemma that n − P n − h = − n +1 | h | k C ∗ h k HS = o P (1). Toverify the uniform boundeness of P h ∈ Z k C ∗ h k HS , notice first that from the ex-pression of C ∗ h given in Section 3.2 we get that P h ∈ Z k C ∗ h k HS ≤ P h ∈ Z k Γ ∗ h k F + fstathios Paparoditis/Functional Sieve Bootstrap k C ∗ U k HS . The square of the second term on the right hand side of the lastinequality equals k E ∗ U ∗ t ⊗ U ∗ t k HS which converges to zero in probability, seethe proof of Proposition 3.2. For the first term we have that P h ∈ Z k Γ ∗ h k F ≤ (cid:0) P ∞ j =0 k b Ψ j,p ( m ) k F (cid:1) k b Σ e,p ( m ) k F = O P (1) uniformly in p and m by Lemma6.1 and Lemma 6.5. (cid:3) Lemma 6.6.
Under the assumptions of Theorem 4.1 it holds true that $R^{*}_{n,m}\overset{P}{\to}0$, as $n\to\infty$.

Lemma 6.7. Under the assumptions of Theorem 4.1 it holds true that $D^{*}_{n,m}\overset{P}{\to}0$, as $n\to\infty$.

Lemma 6.8. Under the assumptions of Theorem 4.1 it holds true that $V^{*}_{n,m}\overset{P}{\to}0$, as $n\to\infty$.

Lemma 6.9. Under the assumptions of Theorem 4.1 it holds true that, for all $\omega\in[-\pi,\pi]$ and as $n\to\infty$, $L^{+}_{n,m}(\omega)\Rightarrow\mathcal N_{\mathbb C}(0,2\pi\mathcal F_{\omega})$, in probability.

Acknowledgements
The author thanks the Editor, the Associate Editor and two referees for their careful reading and thoughtful comments and questions that led to an improved version of the paper.

SUPPLEMENTARY MATERIAL
Online Supplement: "Sieve Bootstrap for Functional Time Series". The online supplement contains the proofs that were omitted in this paper and additional numerical results.
References

[1] ANEIROS-PÉREZ, G., CAO, R. and VILAR-FERNÁNDEZ, J. M. (2011). Functional methods for time series prediction: a nonparametric approach. Journal of Forecasting, 377-392.
[2] AUE, A., DUBART NORINHO, D. and HÖRMANN, S. (2015). On the prediction of stationary functional time series. Journal of the American Statistical Association, 378-392.
[3] BOSQ, D. (2000). Linear Processes in Function Spaces. Springer, Berlin-Heidelberg-New York.
[4] BROCKWELL, P. and DAVIS, R. (1991). Time Series: Theory and Methods. Springer, Berlin-Heidelberg-New York.
[5] CEROVECKI, C. and HÖRMANN, S. (2017). On the CLT for discrete Fourier transforms of functional time series. Journal of Multivariate Analysis, 282-295.
[6] CHENG, R. and POURAHMADI, M. (1993). Baxter's inequality and convergence of finite predictors of multivariate stochastic processes. Probability Theory and Related Fields, 115-124.
[7] DEHLING, H., SHARIPOV, O. S. and WENDLER, M. (2015). Bootstrap for dependent Hilbert space valued random variables with application to von Mises statistics. Journal of Multivariate Analysis, 200-215.
[8] FERNÁNDEZ DE CASTRO, B., GUILLAS, S. and GONZÁLEZ-MANTEIGA, W. (2005). Functional samples and bootstrap for predicting sulfur dioxide levels. Technometrics, 212-222.
[9] FRANKE, J. and NYARIGUE, E. (2016). Residual-based bootstrap for functional autoregressions. Preprint.
[10] HÖRMANN, S. and KOKOSZKA, P. (2010). Weakly dependent functional data. Annals of Statistics, 1845-1884.
[11] HÖRMANN, S. and KOKOSZKA, P. (2012). Functional time series. Handbook of Statistics: Time Series Analysis - Methods and Applications, 157-186.
[12] HÖRMANN, S. and KIDZIŃSKI, L. (2015). A note on estimation in Hilbertian linear models. Scandinavian Journal of Statistics, 43-62.
[13] HÖRMANN, S., KIDZIŃSKI, L. and HALLIN, M. (2015). Dynamic functional principal components. Journal of the Royal Statistical Society: Series B, 319-348.
[14] HORVÁTH, L. and KOKOSZKA, P. (2012). Inference for Functional Data with Applications. Springer, Berlin-Heidelberg-New York.
[15] HORVÁTH, L., KOKOSZKA, P. and REEDER, R. (2013). Estimation of the mean of functional time series and a two sample problem. Journal of the Royal Statistical Society: Series B, 103-122.
[16] HORVÁTH, L., KOKOSZKA, P. and RICE, G. (2014). Testing stationarity of functional time series. Journal of Econometrics, 66-82.
[17] HURVICH, C. M. and TSAI, C.-L. (1993). A corrected Akaike information criterion for vector autoregressive model selection. Journal of Time Series Analysis, 271-279.
[18] HYNDMAN, R. J. and SHANG, H. L. (2009). Forecasting functional time series. Journal of the Korean Statistical Society, 199-211.
[19] KREISS, J.-P. (1988). Asymptotic Statistical Inference for a Class of Stochastic Processes. Habilitationsschrift, Universität Hamburg.
[20] KREISS, J.-P., PAPARODITIS, E. and POLITIS, D. N. (2011). On the range of validity of the autoregressive sieve bootstrap. Annals of Statistics, 2103-2130.
[21] KREISS, J.-P. and PAPARODITIS, E. (2011). Bootstrap methods for dependent data: a review. Journal of the Korean Statistical Society, 357-378.
[22] LAHIRI, S. N. (2003). Resampling Methods for Dependent Data. Springer, Berlin-Heidelberg-New York.
[23] LI, Y., WANG, N. and CARROLL, R. J. (2013). Selecting the number of principal components in functional data. Journal of the American Statistical Association, 1284-1294.
[24] McMURRY, T. and POLITIS, D. N. (2011). Resampling methods for functional data. In The Oxford Handbook of Functional Data Analysis (F. Ferraty and Y. Romain, eds.), 189-209. Oxford University Press.
[25] MERLEVÈDE, F., PELIGRAD, M. and UTEV, S. (1997). Sharp conditions for the CLT of linear processes in a Hilbert space. Journal of Theoretical Probability, 681-693.
[26] MEYER, M. and KREISS, J.-P. (2015). On the vector autoregressive sieve bootstrap. Journal of Time Series Analysis, 377-397.
[27] MEYER, M., JENTSCH, C. and KREISS, J.-P. (2016). Baxter's inequality and sieve bootstrap for random fields. Bernoulli, to appear.
[28] MINGOTTI, N., LILLO, R. E. and ROMO, J. (2015). A random walk test for functional time series. UC3M Working Papers, Statistics and Econometrics.
[29] PANARETOS, V. and TAVAKOLI, S. (2013). Fourier analysis of stationary time series in function spaces. Annals of Statistics, 568-603.
[30] POLITIS, D. N. and ROMANO, J. (1994). Limit theorems for weakly dependent Hilbert space valued random variables with applications to the stationary bootstrap. Statistica Sinica, 461-476.
[31] POURAHMADI, M. (2001). Foundations of Time Series Analysis and Prediction Theory. Wiley, New York.
[32] SHARIPOV, O., TEWES, J. and WENDLER, M. (2016). Sequential block bootstrap in a Hilbert space with application to change point analysis. The Canadian Journal of Statistics, forthcoming.
[33] YAO, F., MÜLLER, H. G. and WANG, J. L. (2005). Functional data analysis for sparse longitudinal data. Journal of the American Statistical Association, 577-590.
[34] WIENER, N. and MASANI, P. (1958). The prediction theory of multivariate stochastic processes, II. Acta Mathematica, 99, 93-137.
[35] SHANG, H. L. (2016). Resampling methods for dependent functional data. Preprint.
[36] ZHOU, T. and POLITIS, D. N. (2016). Kernel estimation of first-order nonparametric functional autoregression model and its bootstrap approximation. Preprint.

Supplement to "Sieve Bootstrap for Functional Time Series"

This supplement contains technical proofs of the results presented in the main paper Paparoditis (2016) as well as some additional numerical results. In particular, Section 1 contains the proofs of the auxiliary lemmas presented in the mentioned paper, Section 2 the proof of Lemma 3.1, Section 3 the proof of Proposition 3.2, Section 4 the proofs of the lemmas related to Theorem 4.1, Section 5 the proof of Theorem 4.2, and Section 6 discusses some implementation issues and presents some additional numerical results.
1. Proofs of auxiliary lemmas

Proof of Lemma 6.1:
Consider (i) and (ii). Let C v be the class of all m × m matrix-valued functions on [ − π, π ] with C m × m -valued Fourier coeffi-cient matrices ( F k , k ∈ Z ) satisfying the condition P h ∈ Z (1 + | h | ) r k F k k F
We first show thatsup − p ≤ h ≤ p k b Γ h − e Γ h k F = O P ( { n − m X j =1 α − j } / ) , (1.1)where b Γ h = n − P n − ht =1 b ξ t b ξ ⊤ t + h , e Γ h = n − P n − ht =1 ξ t ξ ⊤ t + h , h = 0 , , . . . , n − α = λ − λ and α j = min { λ j − − λ j , λ j − λ j +1 } , j = 2 , , . . . , m . To simplify notationwe also write Γ h for Γ ξ ( h ) in what follows. Recall that the covariance matricesintroduced refer to the m -dimensional vector of scores ξ t = ( ξ j,t = h X t , v j i , j =1 , , . . . , m ) ⊤ or to its estimator b ξ t = ( b ξ j,t = h X t , b v j i , j = 1 , , . . . , m ) ⊤ . Since k b Γ h − e Γ h k F ≤ k n − P n − ht =1 ( b ξ t + h − ξ t + h ) b ξ ⊤ t k F + k n − P n − ht =1 ξ t + h ( b ξ t − ξ t ) ⊤ k F itsuffices to consider only one of the two terms on the right hand side of the lastbound. By the triangular and the Cauchy-Schwarz inequality we have k n − n − h X t =1 ( b ξ t + h − ξ t + h ) b ξ ⊤ t k F ≤ n − n − h X t =1 k (cid:0) h X t + h , b v j − v j i , j = 1 , . . . , m (cid:1) ⊤ k× k (cid:0) h X t + h , b v j i , j = 1 , . . . , m (cid:1) ⊤ k≤ (cid:16) m X j =1 k b v j − v j k (cid:17) / n n X t =1 k X t k (cid:16) m X j =1 h X t , b v j i (cid:1) / = O P (cid:16) ( m X j =1 k b v j − v j k ) / (cid:17) , with the O P term uniformly in h . The assertion follows because by Assumption1, P mj =1 k b v j − v j k = O P ( n − P mj =1 α − j ); see H¨ormann and Kokoszka (2010).We next proof the assertion of the lemma. First notice that for invertiblematrices A n and B such that k A n − B k F → n → ∞ , we have the bound k A − n − B − k F = k A − n ( B − A ) B − k F ≤ k A − n − B − k F k B − A n k F k B − k F + k B − k F k B − A n k F , from which we get, for n large enough such that 1 − k A n − B k F k B − k F > k A − n − B − k F ≤ k B − k F k A n − B k F − k A n − B k F k B − k F . (1.2)Then recall the solution of the Yule-Walker equations, A p,m = ( A ,p ( m ) , A ,p ( m ) , . . . , A p,p ( m )) = G G − ,p , where the G ,p ∈ R mp × mp matrix is given by G ,p = Γ Γ . . . Γ p − Γ − Γ . . . Γ p − ... ... ...Γ − p +1 Γ − p +2 . . . Γ and G = (Γ , Γ , . . . , Γ p ) . fstathios Paparoditis/Functional Sieve Bootstrap Let b A p,m = b G b G − ,p where b G ,p and b G are the same matrices as G ,p and G with Γ h replaced by b Γ h and let e A p,m = e G e G − ,p , where e G ,p and e G are the samematrices as G ,p and G with Γ h replaced by e Γ h . We then have k b A p,m − e A p,m k F ≤ k b G − ,p − e G − ,p k F k b G k F + k e G − ,p k F k b G − e G k F . (1.3)We first show that k G − ,p k F = O P (cid:0) √ mλ − m + p (cid:1) . (1.4)Toward this notice first the recursive relation G − ,p +1 = (cid:18) G − ,p
00 0 (cid:19) + R p , (1.5)where R p = − J p A p,m V − / p V − / p ! − V − / p A ⊤ p,m J p V − / p ! , see Brockwell and Davis (1991), Ch. 11.4 and Sowell (1989), where J p = I p ⊗ I m with I m the m × m unity matrix and I p the matrix with ones on the diago-nal from the bottom left to the top right and zero elsewhere, V p = E ( ξ t − P pj =1 A j,p ( m ) ξ t + j )( ξ t − P pj =1 A j,p ( m ) ξ t + j ) ⊤ and A p,m = ( A ,p ( m ) ⊤ , A ,p ( m ) ⊤ ,. . . , A p,p ( m ) ⊤ ) the coefficient matrices that minimize the “forward predictionvariance” E ( ξ t − P pj =1 D j,p ( m ) ξ t + j )( ξ t − P pj =1 D j,p ( m ) ξ t + j ) ⊤ . We then get fromthe recursive relation (1.5) that G − ,p = (cid:18) Γ −
00 0 (cid:19) + p X j =1 R j . Using the definition of the matrix R s and because k V − / s k F = O (1) uniformlyin s and m , we get k R s k F ≤ k V − / s k F (cid:0) k J s A s,m k F (cid:1) ≤ O (1) (cid:0) s X j =1 k A j,s ( m ) k F (cid:1) = O (1) , since, as in Lemma 6.1, P sj =1 k A j,s ( m ) k F = O (1) uniformly in s and m . Thus k P pj =1 R j k F ≤ P pj =1 k R j k F = O ( p ) and using the bound k Γ − k F = qP mj =1 λ − j ≤√ mλ − m we conclude that k G − ,p k F = O (cid:0) √ mλ − m + p (cid:1) .We next show that k b G − ,p − e G − ,p k F = O P (cid:16) ( p √ mλ − m + p ) vuut n m X j =1 α − j (cid:17) . (1.6) fstathios Paparoditis/Functional Sieve Bootstrap For this notice that using (1.2) we get k b G − ,p − e G − ,p k F ≤ k e G − ,p k F k b G ,p − e G ,p k F − k e G − ,p k F k b G ,p − e G ,p k F = k G − ,p k F k b G ,p − e G ,p k F − k e G − ,p k F k b G ,p − e G ,p k F + a lower order term= O P (cid:0) k G − ,p k F k b G ,p − e G ,p k F (cid:1) and since by (1.1), k b G ,p − e G ,p k F ≤ p X i =1 p X j =1 k b Γ i − j − e Γ i − j k F ≤ p max − p +1 ≤ h ≤ p − k b Γ h − e Γ h k F = O P (cid:16) p vuut n − m X j =1 α − j (cid:17) , we get using (1.4) the assertion (1.6).Furthermore, k b G k F ≤ p X j =1 k b Γ j k F ≤ p X j =1 k Γ j k F + p X j =1 k b Γ j − Γ j k F = O (1) + O P (cid:16) p vuut n − m X j =1 α − j (cid:17) , (1.7)where the O (1) term is uniformly in m , and, k b G − e G k F ≤ p X j =1 k b Γ j − e Γ j k F = O P (cid:16) p vuut n − m X j =1 α − j (cid:17) . (1.8)Thus from (1.3) and using the bounds (1.4)-(1.8) we get that k b A p,m − e A p,m k F = O P (cid:16) ( p √ mλ − m + p ) vuut n m X j =1 α − j (cid:17) . (cid:3) fstathios Paparoditis/Functional Sieve Bootstrap Proof of Lemma 6.4:
We first show that the assertion is true for | z | ≤ | det ( A p,m ( z )) | 6 = 0 for | z | ≤ | det ( A p,m ( z ) | ≥ inf | z | =1 | detA p,m ( z ) | . Now, recallthat for ω ∈ [ − π, π ], 2 πf ξ ( ω ) = A − m,p ( e − iω )Σ e ( m ) A − m,p ( e − iω ). Let µ ( ω ) be thelargest eigenvalue of f ξ ( ω ). We then have | det ( A p,m ( e − iω )) | = det (Σ e ( m )) / (2 π | det ( f ξ ( ω )) | ) ≥ c e / (2 π m µ ( ω )) ≥ e Cm − , for some constant e C > m . Notice that the first inequality fol-lows by Lemma 6.1(iii) and the last by the fact that µ ( ω ) is bounded uniformlyin m ; see Lemma 2.1. Thus inf ω ∈ [ − π,π ] | det ( A p,m ( e − iω ) | ≥ Cm − which impliesthat inf | z |≤ | detA p,m ( z ) | ≥ Cm − / with some constant C > m . Extension of this lower bound to the slightly larger region | z | ≤ /p andfor all p > p m for some p m ∈ N , follows exactly along the same lines as the proofof Lemma 3.2 of Meyer and Kreiss (2015); see also Lemma 2.3 of Kreiss et al.(2011). (cid:3) Proof of Lemma 6.5:
To see (i) let A ( r,s ) be the ( r, s )th element of amatrix A and notice that by Cauchy’s inequality for holomorphic functions wehave | e Ψ ( r,s ) j,p ( m ) − Ψ ( r,s ) j,p ( m ) | ≤ (cid:0) p (cid:1) − j max | z | =1+1 /p k e A − p,m ( z ) − A − p,m ( z ) k F (1.9)andmax | z | =1+1 /p k e A − p,m ( z ) − A − p,m ( z ) k F ≤ max | z | =1+1 /p | det ( e A p,m ( z ) | k e A Adjp,m ( z ) − A Adjp,m ( z ) k F + max | z | =1+1 /p (cid:12)(cid:12) det ( e A p,m ( z )) − det ( A p,m ( z )) (cid:12)(cid:12) k A Adjp,m ( z ) k F = R ,n ( z ) + R ,n ( z ) , with an obvious notation for R ,n ( z ) and R ,n ( z ). By Theorem 2.12 of Ipsenand Rehman (2008) we get that | det ( e A p,m ( z )) − det ( A p,m ( z )) | ≤ m k e A p,m ( z ) − A p,m ( z ) k B m − ( z ) , where B max ( z ) = max {k e A p,m ( z ) k , k A p,m ( z ) k } and k A k denotes the spectralnorm (i.e., the largest singular value) of the matrix A . Since k A k ≤ k A k F andusing the bound k A k ≤ | det ( A ) | (cid:16) k A k F √ m + 1 (cid:17) m +1 , for the largest singular value of a non-singular matrix A , see Merikoski andKumar (2005), p. 373, we get by straightforward calculations and in view of fstathios Paparoditis/Functional Sieve Bootstrap Lemma 6.4 and the constant C appearing there, thatmax | z | =1+1 /p B m − ( z ) ≤ m − m ( m − / max | z | =1+1 /p k B max ( z ) k m − F C m − ( m + 1) ( m − / = o P (1) , since max | z | =1+1 /p k A p,m ( z ) k F = O (1) and max | z | =1+1 /p k e A p,m ( z ) k F = O P (1)uniformly in m . Thus | det ( e A p,m ( z )) − det ( A p,m ( z )) | ≤ m k e A p,m ( z ) − A p,m ( z ) k F o P (1) . (1.10)Lemma 6.1 and Assumption 4(iv) lead by Cauchy-Schwarz’s inequality to thebound sup | z |≤ /p k e A p,m ( z ) − A p,m ( z ) k ≤ (1 + 1 /p ) p p X j =1 k e A p,m ( z ) − A p,m ( z ) k≤ O P ( √ p k e A p,m − A p,m k F ) , from which we derive, using (1.10), thatsup | z |≤ /p | det ( e A p,m ( z )) − det ( A p,m ( z )) | ≤ o P (1) m sup | z |≤ /p k e A p,m ( z ) − A p,m ( z ) k F = o P ( m √ p k e A p,m − A p,m k F )and by Lemma 6.4 that R ,n ( z ) ≤ δ − m m X r =1 m X s =1 sup | z |≤ /p | det ( e A ( − r, − s ) p,m ( z )) − det ( A ( − r, − s ) p,m ( z )) | = O P ( m / ) O P ( m ) o P ( m √ p k e A p,m − A p,m k F )= o P ( m / p / k e A p,m − A p,m k F ) . Furthermore, by Lemma 6.4 and the bound (1.10) we get R ( z ) ≤ δ − m max | z | =1+1 /p k A Adjp,m ( z ) k F max | z | =1+1 /p | det ( e A p,m ( z )) − det ( A p,m ( z )) | = o P ( δ − m mp / k e A p,m − A p,m k F )= o P ( m p / k e A p,m − A p,m k F ) . Thus and using equation (1.9), we conclude that ∞ X j =1 k e Ψ j,p ( m ) − Ψ j,p ( m ) k F ≤ ∞ X j =1 m X r =1 m X s =1 | e Ψ ( r,s ) j,p ( m ) − Ψ ( r,s ) j,p ( m ) | = o P ( m / p / k e A p,m − A p,m k F )+ o P ( m p / k e A p,m − A p,m k F )= o P ( m / p − / ) + o P ( p − / ) → , fstathios Paparoditis/Functional Sieve Bootstrap by Assumption 4.Consider (ii) so we have, k e Σ e,p ( m ) − Σ e,p ( m ) k F ≤ k n − p n X t = p +1 (cid:0)e e t,p ( m ) e e ⊤ t,p ( m ) − e t,p ( m ) e ⊤ t,p ( m ) (cid:1) k F + k n − p n X t = p +1 e t,p ( m ) e ⊤ t,p ( m ) − Ee t,p ( m ) e ⊤ t,p ( m ) k F + k e e n,p ( m ) e e ⊤ n,p ( m ) k F = E ,n + E ,n + E ,n , with an obvious notation for E j,n , j = 1 , ,
3. We show that all three termsconverge to zero in probability. By the triangular inequality and in order toshow E ,n P →
0, it suffices to show that E (1)1 ,n = k ( n − p ) − P nt = p +1 ( e e t,p ( m ) − e t,p ( m )) e e ′ t,p ( m ) k F P →
0. For this we use the bound E (1)1 ,n ≤ k n − p n X t = p +1 p X j =1 ( e A j,p ( m ) − A j,p ( m )) ξ t − j e e ⊤ t,p ( m ) k F + k n − p n X t = p +1 p X j =1 ( A j,p ( m ) − A j ( m )) ξ t − j e e ⊤ t,p ( m ) k F + k n − p n X t = p +1 ∞ X j = p +1 A j ( m ) ξ t − j e e ⊤ t,p ( m ) k F . (1.11)Since by straightforward calculations it yields that P pj =1 k ξ t − j e e ⊤ t,p ( m ) k F = O P ( m p ), we get by Assumption 4(iv) and Cauchy-Schwarz’s inequality that k n − p n X t = p +1 p X j =1 ( e A j,p ( m ) − A j,p ( m )) ξ t − j e e ⊤ t,p ( m ) k F = O P (cid:16) k e A p,m − A p,m k F vuut p X j =1 k ξ t − j e e ⊤ t,p ( m ) k F (cid:17) = O P ( m − p − / ) → . For the second term on the right hand side of (1.11) we get by replacing e e t,p ( m ) fstathios Paparoditis/Functional Sieve Bootstrap by e t,p ( m ) and using Lemma 6.2, that E k p X j =1 ( A j,p ( m ) − A j ( m )) ξ t − j e ⊤ t,p ( m ) k F ≤ p X j =1 k A j,p ( m ) − A j ( m ) k F O ( m )= O ( mδ − m ∞ X j = p +1 j k A j ( m ) k F )= O ( mp − δ − m ∞ X j = p +1 j k A j ( m ) k F ) → , by Assumption 4. Finally and by the same assumption, we get for the thirdterm of (1.11) using E ∞ X j = p +1 k A j ( m ) k F k ξ t − j e ⊤ t,p ( m ) k = O ( m ) ∞ X j = p +1 k A j ( m ) k F = O ( mp − ∞ X j = p +1 j k A j ( m ) k F ) , which converges to zero in probability.Since the term E ,n is easier to deal with using similar arguments as for theterm E ,n , we consider the term E ,n . Using e n ( m ) = ( n − p ) − P nt = p +1 e t ( m )we have that E ,n ≤ k ( e e n,p ( m ) − e n ( m ))( e e n,p ( m ) − e n ( m )) ⊤ k F + k e n ( m ) e ⊤ n ( m ) k F + 2 k ( e e n,p ( m ) − e n ( m )) e ⊤ n ( m ) k F . Since e n ( m ) = O P (( n − p ) − / ) uniformly in m and by similar arguments asabove, we have k e e n,p ( m ) − e n ( m ) k ≤ n − p n X t = p +1 vuut p X j =1 k e A j,p ( m ) − A j,p ( m ) k F vuut p X j =1 k ξ t − j k + 1 n − p n X t = p +1 p X j =1 k A j,p ( m ) − A j ( m ) k F k ξ t − j k + 1 n − p n X t = p +1 ∞ X j = p +1 k A j ( m ) k F k ξ t − j k = O P ( m − / p − / ) + O P ( m / δ − m ∞ X j = p +1 j k A j ( m ) k F ) . Thus we conclude using Assumption 4, that E ,n P → fstathios Paparoditis/Functional Sieve Bootstrap Consider (iii). By (i) it suffices to show that P ∞ j =1 k b Ψ j,p ( m ) − e Ψ j,p ( m ) k F P → | z |≤ /p k b A p,m ( z ) − e A p,m ( z ) k F ≤ (cid:0) p ) p p X j =1 k b A j,p ( m ) − e A j,p ( m ) k F ≤ O (1) O P ( √ p k b A p,m − e A p,m k F )= O P (cid:16)(cid:16) p √ mλ m + p (cid:17) vuut pn m X j =1 α − j (cid:17) . By Cauchy’s inequality for holomorphic functions we get for the ( r, s )th elementof the matrices b Ψ j,p ( m ) and e Ψ j,p ( m ), that (cid:12)(cid:12) b Ψ ( r,s ) j,p ( m ) − e Ψ ( r,s ) j,p ( m ) (cid:12)(cid:12) ≤ (cid:0) p (cid:1) − j max | z | =1+1 /p k b A − p,m ( z ) − e A − p,m ( z ) k F andmax | z | =1+1 /p k b A − p,m ( z ) − e A − p,m ( z ) k F ≤ max | z | =1+1 /p | det ( b A p,m ( z ) | k b A Adjp,m ( z ) − e A Adjp,m ( z ) k F + max | z | =1+1 /p (cid:12)(cid:12) det ( b A p,m ( z ) − det ( e A p,m ( z ) (cid:12)(cid:12) k e A Adjp,m ( z ) k F . 
From the above bound and by Lemma 6.3 and Lemma 6.4, we get by the samearguments as those leading to the bounds of R ,n ( z ) and R ,n ( z ), that uniformlyin j , (cid:12)(cid:12) b Ψ ( r,s ) j,p ( m ) − e Ψ ( r,s ) j,p ( m ) (cid:12)(cid:12) ≤ (cid:0) p (cid:1) − j O P (cid:16) m / (cid:16) p √ mλ m + p (cid:17) vuut pn m X j =1 α − j (cid:17) , that is k b Ψ j,p ( m ) − e Ψ j,p ( m ) k F ≤ m X r,s =1 (cid:12)(cid:12) b Ψ ( r,s ) j,p ( m ) − e Ψ ( r,s ) j,p ( m ) (cid:12)(cid:12) = (cid:0) p (cid:1) − j O P (cid:16) m / (cid:16) p √ mλ m + p (cid:17) vuut pn m X j =1 α − j (cid:17) , from which we get ∞ X j =1 k b Ψ j,p ( m ) − e Ψ j,p ( m ) k F = O P (cid:16) pm / (cid:16) p √ mλ m + p (cid:17) vuut pn m X j =1 α − j (cid:17) → , by Assumption 4. fstathios Paparoditis/Functional Sieve Bootstrap To establish (iv) notice that using (ii) it suffices to show that k b Σ e,p ( m ) − e Σ e,p ( m ) k P →
0. By the triangular inequality it suffices to show that k n − p n X t = p +1 (cid:2) ( b e t,p ( m ) − b e n,p ( m )) − ( e e t,p ( m )) − e e n,p ( m ) (cid:3)(cid:0)b e t,p ( m ) − e e t,p ( m ) (cid:1) k P → . Since the above term can be bounded by1 n − p n X t = p +1 k b e t,p ( m ) − e e t,p ( m ) k + k b e n,p ( m ) − e e n,p ( m ) k n − p k b e t,p ( m ) − e e t,p ( m ) k , we show that both terms above converge to zero in probability. We use thebound1 n − p n X t = p +1 k b e t,p ( m ) − e e t,p ( m ) k ≤ p X j =1 k b A j,p ( m ) − e A j,p ( m ) k F n − p n X t = p +1 k b ξ t − j k + 2 n − p n X t = p +1 k b ξ t − ξ t k + 4 p X j =1 k e A j,p ( m ) k F n − p n X t = p +1 k b ξ t − j − ξ t − j k . From Lemma 6.3 we get by straightforward calculations that, ( n − p ) − P nt = p +1 k b ξ t − ξ t k = O P ( n − P mj =1 α − j ) and because ( n − p ) − P nt = p +1 k b ξ t − j k = O P (1), weget p X j =1 k b A j,p ( m ) − e A j,p ( m ) k F n − p n X t = p +1 k b ξ t − j k = k b A p,m − e A p,m k F n − p n X t = p +1 k b ξ t − j k = o P (1) , by Assumption 4. Furthermore, since P pj =1 k e A j,p ( m ) k F = O P (1), we get p X j =1 k e A j,p ( m ) k F n − p n X t = p +1 k b ξ t − j − ξ t − j k = O P (1) O P (cid:16) n m X j =1 α − j (cid:17) = o P (1) . Similar arguments yield k b e n,p ( m ) k ≤ k n − p n X t = p +1 b ξ t k + 2 k p X j =1 b A j,p ( m ) 1 n − p n X t = p +1 b ξ t − j k = O P (cid:16) m ( n − p ) − + n − m X j =1 α − j (cid:17) , and k e e n,p ( m ) k = O P ( m ( n − p ) − ) → fstathios Paparoditis/Functional Sieve Bootstrap
2. Proof of Lemma 3.1 (i) Notice that λ j ≥ C λ ρ j . Since p = O ( n a ) with a ∈ (0 , / m / /p / = O ( log / ( n ) /n a/ ). For Assumption 4(ii)we have p √ nλ m vuut m X j =1 α − j ≤ p √ nC λ ρ m vuut m X j =1 ρ m − j ) ≤ p √ nC λ ρ m × p − ρ → , if n / − a ρ m → ∞ , which is satisfied for m ≤ (cid:16) log ( ρ − ) (cid:0) − a (cid:1) − δ (cid:17) log ( n )for some δ >
0. Finally, straightforward calculations as for Assumption 4(ii)show that mp = O ( √ nλ m ) and pλ m = O ( m ) which imply that Assumption4(iv) is satisfied.(ii) Notice that λ − j ≤ C − λ j θ and recall that p = O ( n a ) with a ∈ (0 , / m / /p / = O ( n ζ/ − a/ ) = O (1) for 0 < ζ ≤ α/
3. Consider Assumption4(ii) and observe that m X j =1 α − j ≤ C − λ m X j =1 j θ ≤ C − λ θ + 1 ( m + 1) θ +1 . Thus p √ nλ m vuut m X j =1 α − j = O (cid:0) n − (1 / − a − θζ − ζ/ (cid:1) → , for ζ ∈ (0 , ζ max ] and ζ max = min n − a θ − δ, a/ o and for some δ >
0. Finally,verify that for ζ < (1 − a )(1 + 6 θ ) − we have that mp = O ( √ nλ m ) and thatif ζ ≥ a/ (2 + 2 θ ), then pλ m = O ( m ).
3. Proof of Proposition 3.2
Recall that the spectral density operator F ω can be expressed as 2 π F ω = P ∞ j =1 P ∞ l =1 P ∞ h = −∞ h C h ( v j ) , v l i e − ihω ( v j ⊗ v l ). Define for m ∈ N , 2 π F ω,m = P mj =1 P ml =1 P ∞ h = −∞ h C h ( v j ) , v l i e − ihω ( v j ⊗ v l ) and verify that since h C h ( v j ) , v l i = E ( ξ j,t ξ l,t + h ), the following expression is also valid,2 π F ω,m ( · ) = m X j =1 m X l =1 ⊤ j Ψ m ( e − iω )Σ e ( m )Ψ m ( e − iω ) l h v j , ·i v l , fstathios Paparoditis/Functional Sieve Bootstrap where Ψ m ( z ) = I m + P ∞ j =1 Ψ j ( m ) z j = ( I m − P ∞ j =1 A j ( m ) z j ) − , | z | ≤
1. Let2 π e F ω,m = P mj =1 P ml =1 ⊤ j Ψ p,m ( e − iω )Σ e,p ( m )Ψ p,m ( e − iω ) l h v j , ·i v l where Ψ p,m ( z ) = P ∞ j =1 Ψ j,p ( m ) z j , | z | ≤
1. Finally recall that2 πF ∗ ω,m = m X j =1 m X l =1 ⊤ j b Ψ p,m ( e − iω ) b Σ e,p ( m ) b Ψ p,m ( e − iω ) l h b v j , ·i b v l + E ∗ U ∗ t ⊗ U ∗ t . Then, kF ∗ ω,m − F ω k HS ≤ kF ∗ ω,m − e F ω,m k HS + k e F ω,m − F ω,m k HS + kF ω,m − F ω k HS . (3.1)The first term on the right hand side above is bounded by kF ∗ ω,m − e F ω,m k HS ≤ k E ∗ U ∗ t ⊗ U ∗ t k HS + k X j,l ′ j b Ψ p,m ( e − iω ) b Σ e,p ( m ) b Ψ p,m ( e − iω ) l (cid:0) h b v j , ·i b v l − h v j , ·i v l (cid:1) k HS + k X j,l ′ j (cid:0) b Ψ p,m ( e − iω ) b Σ e,p ( m ) b Ψ p,m ( e − iω ) − Ψ p,m ( e − iω )Σ e,p ( m )Ψ p,m ( e − iω ) (cid:1) × l h v j , ·i v l k HS . Furthermore, k E ∗ U ∗ t ⊗ U ∗ t k HS ≤ k E + U + t ⊗ U + t k HS + k E ∗ U ∗ t ⊗ U ∗ t − E + U + t ⊗ U + t k HS , where U + t are i.i.d. random variables taking values with probability n − in theset { U ct = U t − U n , t = 1 , , . . . , n } and U n = n − P nt =1 U t . Then k E + U + t ⊗ U + t k HS ≤ k ∞ X j,l = m +1 h b C ( v j ) v l ih v j , ·i v l k HS + k U n k → , in probability, since k b C − C k HS → C is Hilbert-Schmidt.Furthermore, k E ∗ U ∗ t ⊗ U ∗ t − E + U + t ⊗ U + t k HS ≤ k n n X t =1 (cid:0) h b U t , ·i b U t − h U t , ·i U t (cid:1) k HS + kh b U n , ·i b U n − h U n , ·i U n k HS = O P (cid:16)vuut m X j =1 k b v j − v j k (cid:17) → , in probability, where the last equality follows by straightforward calculationsand using b U t − U t = P mj =1 ( h X t , b v j i b v j − h X t , v j i v j ). Similarly, and by the same fstathios Paparoditis/Functional Sieve Bootstrap arguments as above and using Lemma 6.5, we get k X j,l ′ j b Ψ p,m ( e − iω ) b Σ e,p ( m ) b Ψ p,m ( e − iω ) l (cid:0) h b v j , ·i b v l − h v j , ·i v l (cid:1) k HS = O P (cid:16)vuut m X j =1 k b v j − v j k (cid:17) → , Finally, straightforward calculations yield k X j,l ′ j (cid:0) b Ψ p,m ( e − iω ) b Σ e,p ( m ) b Ψ p,m ( e − iω ) − Ψ p,m ( e − iω )Σ e,p ( m )Ψ p,m ( e − iω ) (cid:1) × l h v j , ·i v l k HS = O P ( ∞ X j =1 k b Ψ j,p ( m ) − Ψ j,p ( m ) k F + k Σ e,p ( m ) − b Σ e,p ( m ) k F ) → , by Lemma 6.5(iii) and (iv). This concludes the proof that kF ∗ ω,m − e F ω,m k HS P → k e F ω,m − F ω,m k HS = O P ( P ∞ j =1 k Ψ j,p ( m ) − Ψ j ( m ) k F ) + O P ( k Σ e,p ( m ) − Σ e ( m ) k F ), i.e., k e F ω,m − F ω,m k HS converges to zero in probability by Lemma6.5(v) and (vi). For the third and last term on the right hand side of (3.1) weobtain kF ω,m − F ω k HS ≤ k ∞ X j = m +1 m X l =1 hF ω ( v j ) , v l i ( v j ⊗ v l ) k HS + k m X j =1 ∞ X l = m +1 hF ω ( v j ) , v l i ( v j ⊗ v l ) k HS + k ∞ X j = m +1 ∞ X l = m +1 hF ω ( v j ) , v l i ( v j ⊗ v l ) k HS → , as m → ∞ , since { ( v j ⊗ v l ) , j = 1 , , . . . , l = 1 , , . . . } is a complete orthonormalbasis of H ⊗ H . (cid:3)
4. Proofs of the lemmas used for the proof of Theorem 4.1

Proof of Lemma 6.6:
Note that E ∗ k R ∗ n,m k = 1 n n X t =1 k b U t,m − b U n k ≤ n n X t =1 k b U t,m k + 2 k b U n k . fstathios Paparoditis/Functional Sieve Bootstrap Using k b v j − v j k ≤ √ α − j k b C − C k HS , see H¨ormann and Kokoszka (2010), weget1 n n X t =1 k b U t,m k ≤ n n X t =1 k m X j =1 h X t , v j i ( v j − b v j ) k + 4 n n X t =1 k m X j =1 h X t , v j − b v j i b v j k = 4 m X j =1 m X l =1 h b C ( v j ) , v l ih v j − b v j , v l − b v l i + 4 m X j =1 n n X t =1 h X t , v j − b v j i ≤ k b C k HS (cid:0) m X j =1 k b v j − v j k (cid:1) + 4 k b C k HS m X j =1 k b v j − v j k ≤ k b C k HS k b C − C k HS (cid:16)(cid:0) m X j =1 α − j (cid:1) + m X j =1 α − j (cid:17) = O P (cid:0) n − / m X j =1 α − j (cid:1) , where the last equality follows because k b C − C k HS = O P ( n − / ). Further-more, k b U n k P → b U n = U n + n − P nt =1 P mj =1 (cid:0) h X t , v j i v j − h X t , b v j i b v j (cid:1) , where U n = n − P nt =1 U t,m . (cid:3) Proof of Lemma 6.7:
We have E k √ n n X t =1 m X j =1 ( ξ ∗ j,t − ξ + j,t ) v j k = 1 n n X t,s =1 m X j =1 ⊤ j Eξ ∗ t ( ξ ∗ s − ξ + s ) ⊤ j + 1 n n X t,s =1 m X j =1 ⊤ j Eξ + t ( ξ + s − ξ ∗ s ) ⊤ j = D (1) n,m + D (2) n,m , with an obvious notation for D (1) n,m and D (2) n,m . We consider D (1) n,m only since D (2) n,m can be handled similarly. For this term we have D (1) n,m = 1 n n X t,s =1 m X j =1 ∞ X l =0 ⊤ j e Ψ l,p ( m )Σ ∗ e,p ( m )( b Ψ l + s − t,p ( m ) − e Ψ l + s − t,p ( m )) ⊤ j + 1 n n X t,s =1 m X j =1 ∞ X l =0 ⊤ j e Ψ l,p ( m ) E (cid:2) e ∗ t,p ( m )( e ∗ t,p ( m ) − e + t,p ( m )) (cid:3) e Ψ l + s − t,p ( m ) ⊤ j (4.1)and, using Lemma 6.1 and 6.5 we get for the first term on the right hand sideof (4.1), that, this term is bounded by k Σ ∗ e,p ( m ) k F ∞ X l =0 k m X j =1 ⊤ j e Ψ l,p ( m ) k F ∞ X l =0 k m X j =1 ⊤ j ( b Ψ l,p ( m ) − e Ψ l,p ( m )) k F → , fstathios Paparoditis/Functional Sieve Bootstrap in probability. The second term of (4.1) is bounded by q E k e ∗ t,p ( m ) k q E k e ∗ t,p ( m ) − e ∗ t,p ( m ) k ∞ X l =0 k m X j =1 ⊤ j e Ψ l,p ( m ) k F × ∞ X l =0 k m X j =1 ⊤ j b Ψ l,p ( m ) k F , which converges to zero in probability, because E k e ∗ t,p ( m ) − e + t,p ( m ) k → E k e ∗ t,p ( m ) − e + t,p ( m ) k ≤ n − p n X t = p +1 k b e t,p ( m ) − e e t,p ( m ) k + 4 (cid:0) k b e n k + k e e n k (cid:1) ≤ n − p n X t = p +1 k b ξ t − ξ t k + 4 n − p n X t = p +1 k p X j =1 ( b A j,p ( m ) b ξ t − j − e A j,p ( m ) ξ t − j ) k + 4 (cid:0) k b e n k + k e e n k (cid:1) , and1 n − p n X t = p +1 k b ξ t − ξ t k ≤ n − p n X t = p +1 k X t k m X j =1 k b v j − v j k = O P (cid:0) n − m X j =1 α − j (cid:1) → . Furthermore,1 n − p n X t = p +1 k p X j =1 (cid:0) b A j,p ( m ) b ξ t − j − e A j,p ( m ) ξ t − j (cid:1) k ≤ p X j =1 k b A j,p ( m ) k F n − p n X t = p +1 k b ξ t − j − ξ t − j k F + 2 p X j =1 k b A j,p ( m ) − e A j,p ( m ) k F n − p n X t = p +1 k ξ t − j k = O P (cid:0) n − m X j =1 α − j (cid:1) + O P (cid:0) λ − m n − mp m X j =1 α j (cid:1) → , where the last equality follows using Lemma 6.1 and Lemma 6.3. Finally, k b e n k ≤ k n − p n X t = p +1 b ξ t k + 2 (cid:0) p X j =1 k b A j,p ( m ) k F (cid:1) (cid:0) k n − p n X t = p +1 b ξ t − j k (cid:1) → fstathios Paparoditis/Functional Sieve Bootstrap in probability, since k n − p n X t = p +1 b ξ t k ≤ k n − p n X t = p +1 ξ t k + O P ( n − m X j =1 α − j )= O P ( m/ ( n − p )) + O P ( n − m X j =1 α − j ) → , and p X j =1 k b A j,p ( m ) k F k n − p n X t = p +1 b ξ t − j k = O P (1) × O P (cid:16)p m/ ( n − p ) + vuut n − m X j =1 α − j (cid:17) → . By similar arguments we get k e e n k →
0, in probability. (cid:3)
Proof of Lemma 6.8: E (cid:13)(cid:13) √ n n X t =1 m X j =1 ξ ∗ j,t ( b v j − v j ) (cid:13)(cid:13) = m X j =1 m X l =1 n n X t =1 n X s =1 ⊤ j Γ ∗ t − s l h b v j − v j , b v l − v l i≤ (cid:0) m X j =1 k b v j − v j k (cid:1) n n X t =1 n X s =1 k Γ ∗ t − s k F = O P (cid:0)(cid:0) n − / m X j =1 α − j (cid:1) (cid:1) → . (cid:3) Proof of Lemma 6.9:
Write L + n,m ( ω ) = n − / P nt =1 W + t e − itω where W + t = P mj =1 ⊤ j ξ + t v j with ξ + t = P ∞ l =0 e Ψ l,p ( m ) e + t − l , e Ψ ,p ( m ) = I m , a random elementin H . Notice that E + ( W + t ) = 0, while using ξ t = P ∞ l =0 Ψ l ( m ) e t − l , Ψ ( m ) = I m , we get E + W + t ⊗ W + t + h = ∞ X l =0 m X j =1 m X s =1 ⊤ j e Ψ l,p ( m ) e Σ e,p ( m ) e Ψ ⊤ l + h,p ( m ) s h v j , ·i v s = ∞ X l =0 m X j =1 m X s =1 ⊤ j Ψ l ( m )Σ e ( m )Ψ ⊤ l + h,p ( m ) s h v j , ·i v s + e D n = E h X t − U t,m , ·i ( X t + h − U t + h,m ) + e D n = C h ( · ) − E h U t,m , ·i X t + h − E h X t , ·i U t + h,m + E h U t,m , ·i U t + h,m + e D n , with an obvious notation for e D n . It is easily seen that e D n = O P ( P ∞ l =0 k e Ψ l,p ( m ) − Ψ l ( m ) k F + k e Σ e,p ( m ) − Σ e ( m ) k F ) and therefore k e D n k HS → fstathios Paparoditis/Functional Sieve Bootstrap Lemma 6.5. Hence and using E k U t,m k → m → ∞ , we get that k E + W + t ⊗ W + t + h − C h k HS → n → ∞ .Let ξ ot = P ∞ l =0 Ψ l ( m ) e + t − l and define W ot = P mj =1 ⊤ j ξ ot v j and L on,m ( ω ) = n − / P nt =1 W ot e − itω . It easily follows by simple algebra and using Lemma 6.5that E + k L + n,m ( ω ) − L on,m ( ω ) k = O P ( P ∞ l =0 k e Ψ l,p ( m ) − Ψ l ( m ) k F ) →
0, in prob-ability, that is L + n,m ( ω ) = L on,m ( ω ) + o P (1). Thus to prove the assertion of thelemma it suffices to show that L on,m ( ω ) ⇒ N C (0 , π F ω ). For this we show thatAssumption 2 of Cerovecki and H¨ormann (2015) is satisfied, that is, using thenotation S on,m ( ω ) = P nt =1 W ot e − itω , we show that the following two conditionsare fulfilled, in probability. Z on ( ω ) ≡ n X t =0 P ( W ot ) e − itω is a Cauchy sequence in H , (4.2)and E k E ( S on,m ( ω ) |G ) k = o ( n ) , (4.3)where the operator P is defined as P ( · ) = E ( ·|G ) − E ( ·|G − ) and G s = σ ( W os , W os − , W os − , . . . ). Toward this we first define W os,s = m X j =1 ⊤ j ∞ X l =0 Ψ l ( m ) e + s − l,s v j , where e + t,s = e + t if t > e + t,s = e e t if t ≤ e e + t a copy of e + t which isindependent of e + t for t <
0. We show that ∞ X s =1 q E + k W o − W o ,s k = O P (1) , (4.4)where the O P (1) term is independent of m and p . Notice first that by Minkowski’sinequality q E + k W o − W o ,s k ≤ vuut E + k m X j =1 ⊤ j ∞ X l = s Ψ l ( m ) e + − l v j k + vuut E + k m X j =1 ⊤ j ∞ X l = s Ψ l ( m ) e + − l,s v j k ≤ k e Σ e,p ( m ) k F ∞ X l = s k Ψ l ( m ) k F . Thus ∞ X s =1 q E + k W o − W o ,s k ≤ ∞ X s =1 k e Σ e,p ( m ) k F ∞ X l = s k Ψ l ( m ) k F ≤ k e Σ e,p ( m ) k F ∞ X l =1 l k Ψ l ( m ) k F . fstathios Paparoditis/Functional Sieve Bootstrap Now, since by Lemma 6.1(ii), P ∞ l =1 l k Ψ l ( m ) k F is bounded uniformly in m , and,by Lemma 6.5(ii) and (vi) and Lemma 6.1(iii), k e Σ e,p ( m ) k F is bounded in prob-ability, where the bound is independent of p and m , assertion (4.4) follows.Consider next condition (4.2). For positive integers n > n we have that E + k Z on ( ω ) − Z on ( ω ) k ≤ n X t = n +1 n X t = n +1 | E + hP ( W ot ) , P ( W ot ) i|≤ (cid:16) n X t = n +1 p E + kP ( W ot ) k (cid:17) . Recall the definition of W os,s . Then we have, since E ( W os,s |G ) = E ( W os,s |G − ) =0, that E + kP ( W ot ) k = E + kP ( W os ) − P ( W os,s ) k = E + k E + ( W os − W os,s |G ) − E ( W os − W os,s |G − ) k ≤ E + k E + ( W os − W os,s |G ) k + 2 E + k E ( W os − W os,s |G − ) k ≤ E + k W o − W o ,s k Hence E + k Z on ( ω ) − Z on ( ω ) k ≤ (cid:16) ∞ X s = n q E + k W o − W o ,s k (cid:17) → , as n → ∞ because of (4.4).To establish condition (4.3) notice that E k E ( S on,m ( ω ) |G ) k ≤ n X t =1 n X t =1 | E + h E + ( W ot |G ) , E + ( W ot |G ) i| = n X t =1 n X t =1 | E + h E + ( W ot − W ot ,t |G ) , E + ( W ot − W ot ,t |G ) i|≤ (cid:16) n X t =1 q E + k W o − W o ,t k (cid:17) ≤ (cid:16) ∞ X t =1 q E + k W o − W o ,t k (cid:17) , which is bounded because of (4.4). (cid:3)
5. Proof of Theorem 4.2

In view of Theorem 4.1 and Remark 4.1 of Paparoditis (2016), we get that $\sqrt{n_1}\,\bar X^{*}_{n_1}\Rightarrow \mathcal N(0,C_X)$ and $\sqrt{n_2}\,\bar Y^{*}_{n_2}\Rightarrow \mathcal N(0,C_Y)$, where $C_X=\sum_{h=-\infty}^{\infty}C_{h,X}$ and $C_Y=\sum_{h=-\infty}^{\infty}C_{h,Y}$, with $C_{h,X}$ and $C_{h,Y}$ the autocovariance operators at lag $h$ of the processes $X$ and $Y$, respectively. Since $X^{*}$ and $Y^{*}$ are independent we get, taking into account that $n_1/(n_1+n_2)\to\theta$, the following convergence on $H$ as $n\to\infty$,
\[
G_{n_1,n_2}\;=\;\sqrt{\frac{n_1 n_2}{n_1+n_2}}\,\bar X^{*}_{n_1}\;+\;\sqrt{\frac{n_1 n_2}{n_1+n_2}}\,\bar Y^{*}_{n_2}\;\Rightarrow\;\mathcal N\big(0,(1-\theta)C_X+\theta C_Y\big).
\]
By the continuous mapping theorem we then have $U^{*}_{n_1,n_2}=\|G_{n_1,n_2}\|^{2}\Rightarrow\int\Gamma^{2}(\tau)\,d\tau$, where $\{\Gamma(\tau),\tau\in[0,1]\}$ is a Gaussian process in $H$ with mean zero and covariance $\mathrm{Cov}(\Gamma(\tau_1),\Gamma(\tau_2))=(1-\theta)c_X(\tau_1,\tau_2)+\theta c_Y(\tau_1,\tau_2)$, $\tau_1,\tau_2\in[0,1]$, where $c_X$ and $c_Y$ denote the covariance kernels of the operators $C_X$ and $C_Y$, respectively. $\Box$
6. Additional numerical results
To generate the $m$-dimensional time series of pseudo scores $\xi^{*}_1,\xi^{*}_2,\ldots,\xi^{*}_n$, a set of $p$ starting values has to be chosen. Different alternatives can be used. In order to obtain a time series of length $n$, we generated time series of length $n+L$ using as starting values the observed values $\widehat\xi_1,\widehat\xi_2,\ldots,\widehat\xi_p$ and then discarded the first $L$ observations to eliminate the effects of these starting values. The number $L$ has been chosen by adapting to the multivariate case a proposal made for the univariate case by McLeod and Hipel (1978). To elaborate, we first calculated $\widehat\Gamma_\xi(0)$ given by
\[
\widehat\Gamma_\xi(0)\;=\;\sum_{j=0}^{\infty}\widehat\Psi_{j,p}(m)\,\widehat\Sigma_{e}(m)\,\widehat\Psi_{j,p}^{\top}(m)\;=\;\int_{-\pi}^{\pi}f_{\widehat\xi}(\omega)\,d\omega,\qquad \widehat\Psi_{0,p}(m)=I_m,
\]
where $f_{\widehat\xi}(\omega)$ denotes the spectral density of the VAR($p$) model fitted to the $m$-dimensional time series of estimated scores. We then selected a natural number $S$ such that
\[
\big\|\widehat\Gamma_\xi(0)-\widetilde\Gamma_\xi(0)\big\|_F<\delta,\qquad \text{where}\quad \widetilde\Gamma_\xi(0)=\sum_{j=0}^{S}\widehat\Psi_{j,p}(m)\,\widehat\Sigma_{e}(m)\,\widehat\Psi_{j,p}^{\top}(m)
\]
and $\delta$ has been set equal to a very small number. This essentially implies that observations $X_{t-j}$ for $j\ge S$ have practically no effect on the current value $X_t$. For instance, for the model (5.1) used in the simulations we found in a number of 20 preliminary runs that the values of $S$ obtained (which depend on the estimates $\widehat A_{j,p}(m)$ and $\widehat\Sigma_{e}(m)$) vary between 10 and 18. To be on the safe side, we have set for this model $L=30$ to eliminate the effects of the starting values $\widehat\xi_1,\widehat\xi_2,\ldots,\widehat\xi_p$.
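The burn-in length $S$ described above can be determined numerically along the following lines. This is a sketch only: the recursion for the moving average coefficients, the truncation point `j_max` used to approximate the infinite sum, and all names are illustrative choices; $\delta$ is left as a user-supplied argument since its exact value is not fixed here.

```python
import numpy as np

def ma_coefficients(A_list, n_terms):
    """Moving-average (power-series) coefficient matrices Psi_j of the inverse of
    A(z) = I - sum_j A_j z^j, computed via the standard recursion Psi_0 = I,
    Psi_j = sum_{i=1}^{min(j,p)} A_i Psi_{j-i}."""
    m = A_list[0].shape[0]
    p = len(A_list)
    psi = [np.eye(m)]
    for j in range(1, n_terms + 1):
        new = np.zeros((m, m))
        for i in range(1, min(j, p) + 1):
            new += A_list[i - 1] @ psi[j - i]
        psi.append(new)
    return psi

def burn_in_length(A_list, sigma_e, delta, j_max=500):
    """Smallest S with || Gamma_hat(0) - sum_{j<=S} Psi_j Sigma_e Psi_j^T ||_F < delta.
    Gamma_hat(0) itself is approximated by truncating the series at j_max."""
    psi = ma_coefficients(A_list, j_max)
    terms = [P @ sigma_e @ P.T for P in psi]
    gamma0 = sum(terms)
    partial = np.zeros_like(gamma0)
    for S, term in enumerate(terms):
        partial = partial + term
        if np.linalg.norm(gamma0 - partial, "fro") < delta:
            return S
    return j_max
```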
Table 3 (Table 1 of the supplement). Frequency of selected values of $m_{n,Q}$, $m_{n,E}$ and of $\widehat m_n$ for $m=1,\ldots,7$ and different sample sizes ($R=1000$ replications). [Table entries not reproduced in this version of the text.]

Table 1 shows the results obtained for selecting the number $m$ of principal components according to the rule $\widehat m_n=\max\{m_{n,Q},m_{n,E}\}$, $Q=0.85$, and for different sample sizes. Note that for $n\le 200$ the VR, while for $n>200$ the GVR criterion is used to calculate $m_{n,Q}$.

Table 2 shows the FSB estimates obtained using some different values of the bootstrap parameters $m$ and $p$ as well as for the values of these parameters chosen by means of the $\widehat m_n$ and AICC rule, which are denoted by $(\widehat m,\widehat p)$. Note that $(m,p)=(3,3)$ is the most frequently chosen pair using this data-driven selection rule.
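A schematic sketch of the combined dimension selection rule $\widehat m_n=\max\{m_{n,Q},m_{n,E}\}$ is given below. The reading of the VR criterion as an explained-variance threshold is an assumption made here for illustration, and $m_{n,E}$ is left as a user-supplied quantity since its exact definition is given in the main paper and not reproduced in this supplement.

```python
import numpy as np

def m_from_variance_ratio(eigvals, Q=0.85):
    """Smallest m whose leading eigenvalues explain at least a fraction Q of the
    total variance -- a plausible reading of the VR-type criterion (assumption)."""
    ratios = np.cumsum(eigvals) / np.sum(eigvals)
    return int(min(np.searchsorted(ratios, Q) + 1, len(eigvals)))

def select_dimension(eigvals, m_n_E, Q=0.85):
    """Combined rule m_hat_n = max{m_{n,Q}, m_{n,E}}; m_n_E must be supplied
    by the user according to its definition in the main paper."""
    return max(m_from_variance_ratio(eigvals, Q), int(m_n_E))
```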
Table 4 (Table 2 of the supplement). Estimated exact ($\sigma_{EE}(\tau_j)$) and functional sieve bootstrap (FSB) estimates of the standard deviation of the sample mean $\bar X_n(\tau_j)$ for different values of $\tau_j\in[0,1]$ and for different parameters $m$ and $p$. $\widehat\sigma(\tau_j)$ refers to the mean, while $S(\widehat\sigma(\tau_j))$ to the standard deviation of the FSB estimates.

                       m=2, p=3            m=3, p=3            (m̂, p̂)
  τ_j   σ_EE(τ_j)   σ̂(τ_j)  S(σ̂(τ_j))   σ̂(τ_j)  S(σ̂(τ_j))   σ̂(τ_j)  S(σ̂(τ_j))
  0.00    2.149      2.124    0.392       2.188    0.440       2.025    0.462
  0.05    2.203      2.172    0.404       2.227    0.440       2.072    0.473
  0.10    2.272      2.262    0.441       2.305    0.458       2.141    0.480
  0.15    2.325      2.362    0.466       2.385    0.477       2.196    0.501
  0.20    2.358      2.429    0.484       2.434    0.492       2.227    0.510
  0.25    2.370      2.457    0.488       2.452    0.488       2.240    0.516
  0.30    2.351      2.429    0.488       2.432    0.485       2.231    0.509
  0.35    2.317      2.359    0.462       2.382    0.471       2.203    0.493
  0.40    2.267      2.271    0.435       2.307    0.448       2.138    0.470
  0.45    2.196      2.183    0.419       2.237    0.439       2.062    0.452
  0.50    2.146      2.123    0.401       2.199    0.433       2.026    0.446
  0.55    2.194      2.165    0.405       2.240    0.440       2.075    0.456
  0.60    2.264      2.249    0.419       2.309    0.459       2.148    0.473
  0.65    2.314      2.342    0.441       2.370    0.468       2.204    0.490
  0.70    2.343      2.408    0.464       2.418    0.487       2.241    0.505
  0.75    2.351      2.429    0.475       2.430    0.494       2.244    0.513
  0.80    2.342      2.405    0.474       2.413    0.481       2.235    0.510
  0.85    2.309      2.346    0.459       2.364    0.473       2.198    0.497
  0.90    2.258      2.262    0.431       2.299    0.456       2.133    0.482
  0.95    2.188      2.167    0.399       2.227    0.444       2.061    0.463
  1.00    2.149      2.123    0.392       2.188    0.440       2.025    0.462

Note that Theorem 4.2 justifies the use of percentage points of the distribution of $U^{*}_{n_1,n_2}$ in order to obtain bootstrap critical values for the test $U_{n_1,n_2}$. Furthermore, if $H_1$ is true, that is, if $\|\mu_X-\mu_Y\|^{2}>0$, then $U_{n_1,n_2}\overset{p}{\to}\infty$ as $n_1,n_2\to\infty$; see for instance Theorem 4 of Horváth et al. (2013). Hence the consistency of the test $U_{n_1,n_2}$ based on sieve bootstrap estimated critical values follows.

To investigate the size and power behavior of the bootstrap based, fully functional test $U_{n_1,n_2}$, we conducted a small numerical experiment by adopting the simulation design of Horváth et al. (2013) and considering the functional moving average model $X_t=\Theta(\varepsilon_{t-1})+\varepsilon_t$, with $\Theta$ the integral operator with kernel
\[
\theta(t,s)\;=\;\exp\{-(t+s)/2\}\Big/\!\int\exp(-x)\,dx
\]
and $\{\varepsilon_t\}$ i.i.d. Brownian bridges. Pairs of functional time series of length $n_1$ and $n_2$ have been generated using the above FMA(1) model, with mean functions given by $\mu=0$ for the first and $\mu(\tau)=\gamma\,\tau(1-\tau)$, $\tau\in[0,1]$, for the second time series. The value $\gamma=0$ corresponds to the null hypothesis, while the degree of deviation from the null under the alternative is controlled by the parameter $\gamma$. The rejection frequencies obtained for different sample sizes based on $R=200$ repetitions and $B=1000$ bootstrap replications are reported in Table 3 for different choices of the parameters $m$ and $p$ and for three different nominal levels. Notice that the data-driven values of $m$ and $p$ chosen using the $\widehat m_n$ and AICC rule are denoted in this table by $(\widehat m,\widehat p)$, while $(m,p)=(3,1)$ and $(m,p)=(3,2)$ are the most frequently chosen values of the corresponding parameters using the same rule for $n_1=n_2=100$ and for $n_1=n_2=200$, respectively.
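For completeness, the bootstrap test used in this experiment could be organised along the following lines. This is a hedged sketch, not the paper's implementation: the assumed form of $U_{n_1,n_2}$ as the scaled squared $L^2$ distance of the two sample mean curves follows the standard two-sample statistic (the exact definition is given in Section 4.2 of the main paper), `fsb_resample` stands in for the functional sieve bootstrap of Section 3, and the centring of the bootstrap means is an assumption made so that the resampling mimics the null hypothesis.

```python
import numpy as np

def two_sample_stat(X, Y):
    """Assumed form of U_{n1,n2}: (n1 n2 / (n1 + n2)) * || mean(X) - mean(Y) ||^2.
    X, Y are (n, T) arrays of curves evaluated on a common grid over [0, 1]."""
    n1, n2 = X.shape[0], Y.shape[0]
    diff = X.mean(axis=0) - Y.mean(axis=0)
    return n1 * n2 / (n1 + n2) * np.trapz(diff ** 2, dx=1.0 / (X.shape[1] - 1))

def bootstrap_p_value(X, Y, fsb_resample, B=1000, rng=None):
    """Bootstrap p-value for the two-sample mean test.
    `fsb_resample(sample, rng)` is assumed to return one FSB pseudo-sample of the
    same shape as `sample`; it is not implemented here."""
    rng = np.random.default_rng() if rng is None else rng
    u_obs = two_sample_stat(X, Y)
    n1, n2 = X.shape[0], Y.shape[0]
    dx = 1.0 / (X.shape[1] - 1)
    count = 0
    for _ in range(B):
        Xb = fsb_resample(X, rng)
        Yb = fsb_resample(Y, rng)
        # centred bootstrap means, combined as in the proof of Theorem 4.2
        g = np.sqrt(n1 * n2 / (n1 + n2)) * (Xb.mean(axis=0) - X.mean(axis=0)
                                            + Yb.mean(axis=0) - Y.mean(axis=0))
        u_b = np.trapz(g ** 2, dx=dx)
        count += (u_b >= u_obs)
    return (1 + count) / (B + 1)
```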
Table 5 (Table 3 of the supplement). Size and power behavior of the FSB-based test for the two-sample mean problem ($R=200$ replications, $B=1000$ bootstrap samples).

                    n1 = n2 = 100, level α         n1 = n2 = 200, level α
  γ      (m,p)    0.01    0.05    0.10     (m,p)   0.01    0.05    0.10
  0      (3,1)    0.008   0.055   0.125    (3,2)   0.010   0.050   0.112
         (m̂,p̂)   0.010   0.050   0.095    (m̂,p̂)  0.015   0.045   0.087
  0.2    (3,1)    0.018   0.085   0.170    (3,2)   0.055   0.135   0.210
         (m̂,p̂)   0.035   0.080   0.180    (m̂,p̂)  0.045   0.150   0.245
  0.5    (3,1)    0.180   0.325   0.455    (3,2)   0.435   0.635   0.770
         (m̂,p̂)   0.215   0.455   0.575    (m̂,p̂)  0.355   0.575   0.715
  0.8    (3,1)    0.535   0.790   0.870    (3,2)   0.915   0.955   0.980
         (m̂,p̂)   0.495   0.690   0.815    (m̂,p̂)  0.865   0.960   0.985
  1.0    (3,1)    0.715   0.880   0.940    (3,2)   0.980   1.000   1.000
         (m̂,p̂)   0.735   0.835   0.930    (m̂,p̂)  0.985   1.000   1.000

As this table shows, using the critical values obtained by means of the functional sieve bootstrap procedure, the fully functional test $U_{n_1,n_2}$ retains the nominal size and at the same time shows a good power behavior; the power of the test increases as the deviation from the null and/or the sample size increases.

References

[1] BROCKWELL, P. and DAVIS, R. (1991). Time Series: Theory and Methods. Springer, Berlin-Heidelberg-New York.
[2] CEROVECKI, C. and HÖRMANN, S. (2017). On the CLT for discrete Fourier transforms of functional time series. Journal of Multivariate Analysis, 282-295.
[3] HÖRMANN, S. and KOKOSZKA, P. (2010). Weakly dependent functional data. Annals of Statistics, 1845-1884.
[4] HORVÁTH, L., KOKOSZKA, P. and REEDER, R. (2013). Estimation of the mean of functional time series and a two sample problem. Journal of the Royal Statistical Society: Series B, 103-122.
[5] IPSEN, I. C. F. and REHMAN, R. (2008). Perturbation bounds for determinants and characteristic polynomials. SIAM Journal on Matrix Analysis and Applications, 762-776.
[6] KREISS, J.-P., PAPARODITIS, E. and POLITIS, D. N. (2011). On the range of validity of the autoregressive sieve bootstrap. Annals of Statistics, 2103-2130.
[7] McLEOD, A. I. and HIPEL, K. W. (1978). Simulation procedures for Box-Jenkins models. Water Resources Research, 14, 969-975.
[8] MERIKOSKI, J. K. and KUMAR, R. (2005). Upper bounds for singular values. Linear Algebra and its Applications, 401, 371-379.
[9] MEYER, M. and KREISS, J.-P. (2015). On the vector autoregressive sieve bootstrap. Journal of Time Series Analysis, 377-397.
[10] PAPARODITIS, E. (2016). Sieve bootstrap for functional time series.
[11] SOWELL, F. (1989). A decomposition of block Toeplitz matrices with applications to vector time series. Discussion Paper, GSIA, Carnegie Mellon University.
[12] WIENER, N. and MASANI, P. (1958). The prediction theory of multivariate stochastic processes, II. Acta Mathematica, 99, 93-137.