Adaptive Frequency Band Analysis for Functional Time Series

Abstract

The frequency-domain properties of nonstationary functional time series often contain valuable information. These properties are characterized through its time-varying power spectrum. Practitioners seeking low-dimensional summary measures of the power spectrum often partition frequencies into bands and create collapsed measures of power within bands. However, standard frequency bands have largely been developed through manual inspection of time series data and may not adequately summarize power spectra. In this article, we propose a framework for adaptive frequency band estimation of nonstationary functional time series that optimally summarizes the time-varying dynamics of the series. We develop a scan statistic and search algorithm to detect changes in the frequency domain. We establish theoretical properties of this framework and develop a computationally-efficient implementation. The validity of our method is also justified through numerous simulation studies and an application to analyzing electroencephalogram data in participants alternating between eyes open and eyes closed conditions.


Pramita Bagchi and Scott Bruce

Department of Statistics, George Mason University

MSC 2020 subject classifications: 62M15, 62G20.

Keywords and phrases:

Frequency band estimation, functional time series, locally stationary, multitaper estimation, spectrum analysis.

1. Introduction

arXiv [stat.ME], Feb […]. P. Bagchi and S.A. Bruce / Frequency Band Learning for Functional Time Series

Functional data have emerged as an important object of interest in statistics in recent years as advances in technology have led to an abundance of high-dimensional and high-resolution data. While classical statistical methods often fail in this setting, functional data analysis techniques use the smooth structure of the observed process in order to model non-sparse, high-dimensional and high-resolution data. The term “functional data analysis” was coined by [25] and [26], but the history of this area is much longer, dating back to [13] and [28]. The intrinsic high, or rather infinite, dimensionality of such data poses interesting challenges in both theory and computation and has garnered a considerable amount of attention within the statistics community. While functional data are now quite well studied, a relatively new field of research is emerging which aims to analyze the time-varying characteristics of such functional data. Functional time series data arise in many important problems, such as analyzing forward curves derived from commodity futures [15], daily patterns of geophysical and environmental data [30], demographic quantities, such as age-specific fertility or mortality rates studied over time [31], and neurophysiological data, such as electroencephalography (EEG) and functional magnetic resonance imaging (fMRI), recorded at various locations in the brain [12, 32]. For example, NASA records surface temperatures for more than 5000 locations, and these readings are used to identify yearly temperature anomalies, which are crucial in studying global warming patterns [18]. In practice, such data are typically analyzed separately for each location, which is computationally expensive and fails to account for the spatial structure of the data.
However, it is reasonable to assume that temperatures vary smoothly across locations, and thus it is ideal to analyze the collection of readings across locations as a functional data object.

The frequency-domain properties of time series data, including the aforementioned functional time series examples, often contain valuable information. These properties are characterized through the power spectrum, which describes the contribution to the variability of a time series from waveforms oscillating at different frequencies. Practitioners seeking practical, low-dimensional summary measures of the power spectrum often partition frequencies into bands and create collapsed measures of power within these bands. In practice, frequency band summary measures are used in a wide variety of contexts, such as to summarize seasonal patterns in environmental data, and to measure association between frequency-domain characteristics of EEG data and cognitive processes [16]. In the scientific literature, standard frequency bands used for analysis have largely been developed through manual inspection of time series data. This is accomplished by noting prominent oscillatory patterns in the data and forming frequency bands that largely account for these dominant waveforms. For example, frequency-domain analysis of heart rate variability (HRV) data began in the late 1960s [4] and led to the development of three primary frequency bands used to summarize power spectra: very low frequency (VLF) (≤ […] number of frequency bands needed to adequately summarize the power spectrum may depend on experimental factors. [10] also notes that the endpoints for the alpha frequency band may vary across subjects and proposes a data-adaptive definition for alpha band power.
Both of these examples illustrate the need for a standardized, quantitative approach to frequency band estimation that provides a data-driven determination of the number of frequency bands and their respective endpoints.

For univariate nonstationary time series, a data-adaptive framework for identifying frequency bands that best preserve time-varying dynamics has been introduced in [6]. That work develops a frequency-domain scan and hypothesis testing procedure, together with a search algorithm, to detect and validate changes in nonstationary behavior across frequencies. A rigorous framework for studying the frequency-domain properties of functional time series was established in [22] and [23]. The spectral characteristics of locally stationary functional time series have been developed by [36]. In this paper, we develop a scan statistic for identifying the optimal frequency band structures for characterizing functional time series data. Developing appropriate scan statistics for the functional domain poses two important challenges. As the periodogram itself is an inconsistent estimator of the power spectrum, the univariate scan statistic was developed using a local multitapered periodogram estimator [6]. The corresponding test for detecting changes in the time-varying dynamics of the power spectrum across frequencies uses an asymptotic χ² distribution approximation of the local multitapered periodograms. However, the asymptotic behavior of the multitaper periodogram estimator for functional time series is not well studied. In fact, the approximation of such a quantity by a χ² distribution is not possible in the functional case. Hence, a functional generalization of this result requires a completely different approach.
In this paper, we derive a central limit theorem type result for the functional multitaper periodogram and establish a uniform distributional approximation of the functional scan statistics to a quadratic functional of a Gaussian process. The second challenge is implementation of the developed method, which is challenging largely due to the non-standard limit distribution and high dimensionality of the data. In this work, we propose an efficient algorithm to detect the frequency band structure based on a finite-dimensional projection of the functional scan statistic. Moreover, we propose a computationally efficient and memory-smart modification of the algorithm based on the intrinsic structure of the asymptotic distribution. These modifications enable the proposed methodology to be applied to longer and more densely observed functional time series typically encountered in practice.

The rest of the paper is organized as follows. Section 2 introduces the locally stationary functional time series model and offers a definition of the frequency-banded power spectrum. Section 3 first provides an overview of the local multitaper periodogram estimator for the power spectrum, which is used as a first step in the proposed procedure. It should be noted that the proposed procedure can be carried out with any consistent time-varying spectral estimator, but this work focuses on the local multitaper periodogram estimator due to favorable empirical and theoretical properties established herein. Section 3 then details the components of the proposed analytical framework, including the scan statistics and their theoretical properties, an iterative algorithm for identifying multiple frequency bands, and another test statistic to determine if the power spectrum within a frequency band is stationary with respect to time.
Section 4 contains simulation results to evaluate empirical performance in estimating the number of frequency bands and their endpoints and provides an application to frequency band analysis of EEG data for an individual alternating between eyes open (EO) and eyes closed (EC) resting states. Section 5 offers a discussion of the results and concluding remarks. Proofs for all theoretical results are provided in the Appendix of this paper.

2. Nonstationary Functional Time Series in Frequency Domain

We begin by introducing some notation and results from functional analysis used in the paper. Suppose that $H$ is a separable Hilbert space. $\mathcal{L}(H)$ denotes the Banach space of bounded linear operators $A : H \to H$ with the operator norm given by $|||A|||_\infty = \sup_{\|x\| \le 1} \|Ax\|$. Each operator $A \in \mathcal{L}(H)$ has the adjoint operator $A^\dagger \in \mathcal{L}(H)$, which satisfies $\langle Ax, y\rangle = \langle x, A^\dagger y\rangle$ for each $x, y \in H$. $A \in \mathcal{L}(H)$ is called self-adjoint if $A = A^\dagger$ and non-negative definite if $\langle Ax, x\rangle \ge 0$ for each $x \in H$.

An operator $A \in \mathcal{L}(H)$ is called compact if it can be written in the form $A = \sum_{j \ge 1} s_j(A)\langle e_j, \cdot\rangle f_j$, where $\{e_j : j \ge 1\}$ and $\{f_j : j \ge 1\}$ are orthonormal sets (not necessarily complete) of $H$, $\{s_j(A) : j \ge 1\}$ are the singular values of $A$, and the series converges in the operator norm. We say that a compact operator $A \in \mathcal{L}(H)$ belongs to the Schatten class of order $p \ge 1$, written $A \in S_p(H)$, if $|||A|||_p^p = \sum_{j \ge 1} s_j^p(A) < \infty$. The Schatten class of order $p \ge 1$ is a Banach space with the norm $|||\cdot|||_p$. A compact operator $A \in \mathcal{L}(H)$ is called Hilbert-Schmidt if $A \in S_2(H)$ and trace class if $A \in S_1(H)$. The space of Hilbert-Schmidt operators $S_2(H)$ is also a Hilbert space with the inner product given by $\langle A, B\rangle_{HS} = \sum_{j \ge 1} \langle Ae_j, Be_j\rangle$ for each $A, B \in S_2(H)$, where $\{e_j : j \ge 1\}$ is an orthonormal basis.

Let $L^2([0,1]^k, \mathbb{C})$ for $k \ge 1$ denote the Hilbert space of square-integrable functions $f : [0,1]^k \to \mathbb{C}$ with the inner product given by

$$\langle f, g\rangle = \int_{[0,1]^k} f(x)\,\overline{g(x)}\,dx$$

for each $f, g \in L^2([0,1]^k, \mathbb{C})$, where $\overline{x}$ denotes the complex conjugate of $x \in \mathbb{C}$. We denote the norm of $L^2([0,1]^k, \mathbb{C})$ by $\|\cdot\|_2$. $L^2([0,1]^k, \mathbb{R})$ for $k \ge 1$ is defined analogously. An operator $A \in \mathcal{L}(L^2([0,1]^k, \mathbb{C}))$ is Hilbert-Schmidt if and only if there exists a kernel $k_A \in L^2([0,1]^k \times [0,1]^k, \mathbb{C})$ such that

$$Af(x) = \int_{[0,1]^k} k_A(x, y)\,f(y)\,dy$$

almost everywhere in $[0,1]^k$ for each $f \in L^2([0,1]^k, \mathbb{C})$ (see Theorem 6.11 of [39]). Furthermore,

$$|||A|||_2^2 = \|k_A\|_2^2 = \int_{[0,1]^k}\int_{[0,1]^k} |k_A(x, y)|^2\,dx\,dy$$

and

$$\langle A, B\rangle_{HS} = \langle k_A, k_B\rangle = \int_{[0,1]^k}\int_{[0,1]^k} k_A(x, y)\,\overline{k_B(x, y)}\,dx\,dy$$

for $A, B \in S_2(L^2([0,1]^k, \mathbb{C}))$ with the kernels $k_A, k_B \in L^2([0,1]^k \times [0,1]^k, \mathbb{C})$ respectively. Finally, for $f, g \in L^2([0,1]^k, \mathbb{C})$, we define the tensor product $f \otimes g \in \mathcal{L}(L^2([0,1]^k, \mathbb{C}))$ by setting $(f \otimes g)v = \langle v, g\rangle f$ for all $v \in L^2([0,1]^k, \mathbb{C})$. In particular, since the tensor product $L^2([0,1]^k, \mathbb{C}) \otimes L^2([0,1]^k, \mathbb{C})$ is isomorphic to $S_2(L^2([0,1]^k, \mathbb{C}))$, it defines a Hilbert-Schmidt operator with the kernel in $L^2([0,1]^k \times [0,1]^k, \mathbb{C})$ given by $(f \otimes g)(\tau, \sigma) = f(\tau)\,\overline{g(\sigma)}$ for each $\tau, \sigma \in [0,1]^k$.

The second-order dynamics of a weakly stationary time series of functional data $\{X_h\}_{h \in \mathbb{Z}}$ can be characterized by its spectral density operator, defined as the Fourier transform of the sequence of covariance operators, acting on $L^2([0,1], \mathbb{C})$, i.e.,

$$\mathcal{F}_\omega = \sum_{h \in \mathbb{Z}} \mathbb{E}\big((X_h - \mu) \otimes (X_0 - \mu)\big)\,e^{-\mathrm{i}2\pi\omega h}, \qquad \omega \in (0, 0.5), \tag{2.1}$$

where $\mu = \mathbb{E}X_0$ denotes the mean function. We will assume our data are centered and hence $\mu = 0$. When stationarity is violated, we can no longer define a single spectral density for all time points. To allow for a meaningful definition of this object when stationarity does not hold, we consider a triangular array $\{X_{t,T} : 1 \le t \le T\}_{T \in \mathbb{N}}$ as a doubly indexed functional time series, where $X_{t,T}$ is a random element with values in $L^2([0,1], \mathbb{R})$ for each $1 \le t \le T$ and $T \in \mathbb{N}$. The processes $\{X_{t,T} : 1 \le t \le T\}$ are extended to $t \in \mathbb{Z}$ by setting $X_{t,T} = X_{1,T}$ for $t < 1$ and $X_{t,T} = X_{T,T}$ for $t > T$. Following [36], the sequence of stochastic processes $\{X_{t,T} : t \in \mathbb{Z}\}$ indexed by $T \in \mathbb{N}$ is called locally stationary if for all rescaled times $u \in [0,1]$ there exists an $L^2([0,1], \mathbb{R})$-valued strictly stationary process $\{X_t^{(u)} : t \in \mathbb{Z}\}$ such that

$$\big\|X_{t,T} - X_t^{(u)}\big\|_2 \le \Big(\Big|\frac{t}{T} - u\Big| + \frac{1}{T}\Big) P_{t,T}^{(u)} \quad \text{a.s.} \tag{2.2}$$

for all $1 \le t \le T$, where $P_{t,T}^{(u)}$ is a positive real-valued process such that, for some $\rho > 0$ and $C < \infty$, the process satisfies $\mathbb{E}\big(|P_{t,T}^{(u)}|^\rho\big) < C$ for all $t$ and $T$, uniformly in $u \in [0,1]$. If the second-order dynamics change gradually over time, the second-order dynamics of the stochastic process $\{X_{t,T} : t \in \mathbb{Z}\}_{T \in \mathbb{N}}$ are completely described by the time-varying spectral density operator given by

$$\mathcal{F}_{u,\omega} = \sum_{h \in \mathbb{Z}} \mathbb{E}\big(X_{t+h}^{(u)} \otimes X_t^{(u)}\big)\,e^{-\mathrm{i}2\pi\omega h} \tag{2.3}$$

for $u \in [0,1]$, where $\{X_t^{(u)} : t \in \mathbb{Z}\}$ is the stationary approximation at rescaled time $u$. Following the set-up of [36], in order to establish our asymptotic results, we impose the following technical assumption on the time series under consideration.

Assumption 2.1. Assume $\{X_{t,T} : t \in \mathbb{Z}\}_{T \in \mathbb{N}}$ is a locally stationary zero-mean stochastic process.
Let $\kappa_{k;t_1,\ldots,t_{k-1}}$ be a positive sequence in $L^2([0,1]^k, \mathbb{R})$ independent of $T$ such that, for all $j = 1, \ldots, k-1$ and some $\ell \in \mathbb{N}$,

$$\sum_{t_1,\ldots,t_{k-1} \in \mathbb{Z}} \big(1 + |t_j|^\ell\big)\,\big\|\kappa_{k;t_1,\ldots,t_{k-1}}\big\|_2 < \infty. \tag{2.4}$$

Let us denote

$$Y_t^{(T)} = X_{t,T} - X_t^{(t/T)} \quad \text{and} \quad Y_t^{(u,v)} = \frac{X_t^{(u)} - X_t^{(v)}}{u - v} \tag{2.5}$$

for $T \ge 1$, $1 \le t \le T$ and $u, v \in [0,1]$ such that $u \ne v$. Suppose furthermore that the $k$-th order joint cumulants satisfy

(i) $\big\|\mathrm{Cum}\big(X_{t_1,T}, \ldots, X_{t_{k-1},T}, Y_{t_k}^{(T)}\big)\big\|_2 \le \frac{1}{T}\,\big\|\kappa_{k;t_1-t_k,\ldots,t_{k-1}-t_k}\big\|_2$,
(ii) $\big\|\mathrm{Cum}\big(X_{t_1}^{(u_1)}, \ldots, X_{t_{k-1}}^{(u_{k-1})}, Y_{t_k}^{(u_k,v)}\big)\big\|_2 \le \big\|\kappa_{k;t_1-t_k,\ldots,t_{k-1}-t_k}\big\|_2$,
(iii) $\sup_u \big\|\mathrm{Cum}\big(X_{t_1}^{(u)}, \ldots, X_{t_{k-1}}^{(u)}, X_{t_k}^{(u)}\big)\big\|_2 \le \big\|\kappa_{k;t_1-t_k,\ldots,t_{k-1}-t_k}\big\|_2$,
(iv) $\sup_u \big\|\frac{\partial}{\partial u}\mathrm{Cum}\big(X_{t_1}^{(u)}, \ldots, X_{t_{k-1}}^{(u)}, X_{t_k}^{(u)}\big)\big\|_2 \le \big\|\kappa_{k;t_1-t_k,\ldots,t_{k-1}-t_k}\big\|_2$.

Under these assumptions, the spectral density operator defined in (2.3) is a Hilbert-Schmidt operator, and we shall denote its kernel function by $f_{u,\omega} \in L^2([0,1]^2, \mathbb{C})$, which is twice-differentiable with respect to $u$ and $\omega$.

In order to characterize the time-varying dynamics of $\mathcal{F}_{u,\omega}$, we consider the demeaned power spectrum,

$$g_{u,\omega}(\tau, \sigma) := f_{u,\omega}(\tau, \sigma) - \int_0^1 f_{u,\omega}(\tau, \sigma)\,du, \tag{2.6}$$

and equivalently $\mathcal{G}_{u,\omega} = \mathcal{F}_{u,\omega} - \int_0^1 \mathcal{F}_{u,\omega}\,du$. We assume $g$ admits a partition of the frequency space such that

$$g_{u,\omega}(\tau, \sigma) = \begin{cases} g_u^{(1)}(\tau, \sigma) & \text{for } \omega \in (0, \omega_1), \\ g_u^{(2)}(\tau, \sigma) & \text{for } \omega \in [\omega_1, \omega_2), \\ \quad\vdots \\ g_u^{(p)}(\tau, \sigma) & \text{for } \omega \in [\omega_{p-1}, 0.5). \end{cases} \tag{2.7}$$

Equivalently, the demeaned spectral density operator $\mathcal{G}$ admits the partition

$$\mathcal{G}_{u,\omega} = \begin{cases} \mathcal{G}_u^{(1)} & \text{for } \omega \in (0, \omega_1), \\ \mathcal{G}_u^{(2)} & \text{for } \omega \in [\omega_1, \omega_2), \\ \quad\vdots \\ \mathcal{G}_u^{(p)} & \text{for } \omega \in [\omega_{p-1}, 0.5). \end{cases} \tag{2.8}$$

In practice, assumptions about the frequency-banded structure of the power spectrum are often made implicitly and may not be stated explicitly when conducting frequency band analysis. We aim to estimate the size of the frequency partition, $p$, and the associated frequency partition points $(\omega_1, \omega_2, \ldots, \omega_p)$.
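A small numerical sketch may help fix ideas about the demeaning in (2.6) and the banded structure in (2.7). The Python snippet below is illustrative only: the function name `demean_over_time`, the toy array `f` (a spectrum evaluated at one fixed pair $(\tau, \sigma)$ on a grid of time blocks and frequencies), and the use of a discrete time average in place of the integral over $u$ are all assumptions made here, not part of the paper's implementation.

```python
def demean_over_time(f):
    """Discrete analogue of (2.6).

    f[b][k] is a spectrum value for time block b and frequency index k,
    at one fixed (tau, sigma). Returns g with g[b][k] = f[b][k] minus the
    average of f[.][k] over time blocks.
    """
    n_blocks = len(f)
    n_freq = len(f[0])
    time_avg = [sum(f[b][k] for b in range(n_blocks)) / n_blocks
                for k in range(n_freq)]
    return [[f[b][k] - time_avg[k] for k in range(n_freq)]
            for b in range(n_blocks)]


# Toy spectrum with two frequency bands, split at frequency index 3.
# Band 1 (k < 3): power grows over time. Band 2 (k >= 3): constant in time.
B, n_freq = 4, 6
f = [[(1.0 + b) if k < 3 else 5.0 for k in range(n_freq)] for b in range(B)]
g = demean_over_time(f)

for b in range(B):
    print(g[b])
```

In this toy case the rows of `g` are identical across frequencies within each band and change only at the band boundary, which is the structure (2.7) posits; the second band is stationary, so its demeaned values are zero.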

3. Empirical Band Analysis

We start by approximating the time-varying power spectrum by multitaper local periodograms based on the data. We consider $B$ equally sized non-overlapping temporal blocks. For the sake of notational convenience, suppose that $T$ is a multiple of $B$, and let $T_B = T/B$ be the number of observations in each temporal block. For $b = 1, \ldots, B$, $\omega \in (0, 0.5)$, $k = 1, \ldots, K$ and $T \ge 1$, the functional discrete Fourier transform (fDFT) is defined as a random function with values in $L^2([0,1], \mathbb{C})$ given by

$$\widetilde{X}_T^{(k),b,\omega} := \sum_{t=1}^{T} v_b^k(t)\,I_b(t/T)\,X_{t,T}\,e^{-\mathrm{i}2\pi\omega t}, \tag{3.1}$$

where $v_b^k : \{1, 2, \ldots, T\} \mapsto \mathbb{R}$ is the $k$-th data taper for the $b$-th time segment. We propose the use of sinusoidal tapers of the form

$$v_b^k(t) = \sqrt{\frac{2}{T_B + 1}}\,\sin\!\Big(\frac{\pi k\,[t - (b-1)T_B]}{T_B + 1}\Big), \qquad k = 1, \ldots, K, \tag{3.2}$$

which are orthogonal for $t \in [(b-1)T_B + 1,\, bT_B]$. Sinusoidal tapers are more computationally efficient than the Slepian tapers proposed in [34], which require numerical eigenvalue decomposition to construct the tapers, and can achieve similar spectral concentration with significantly less local bias [29]. Another advantage is that the bandwidth, $bw$, which is the minimum separation in frequency between approximately uncorrelated spectral estimates, can be fully determined by setting the number of tapers, $K$, appropriately [37]. More specifically,

$$bw = \frac{K + 1}{T_B + 1}.$$

The $k$-th single taper estimator of the local power spectrum is

$$\widehat{f}_{b,\omega}^{(k)}(\tau, \sigma) = \widetilde{X}_T^{(k),b,\omega}(\tau)\,\widetilde{X}_T^{(k),b,-\omega}(\sigma) = \widetilde{X}_T^{(k),b,\omega}(\tau)\,\overline{\widetilde{X}_T^{(k),b,\omega}(\sigma)}.$$

The multitaper estimator of the local power spectrum within the $b$-th block is then defined to be the average of all $K$ single taper estimators,

$$\widehat{f}_{b,\omega}^{(mt)}(\tau, \sigma) = \frac{1}{K}\sum_{k=1}^{K} \widehat{f}_{b,\omega}^{(k)}(\tau, \sigma). \tag{3.4}$$

The final estimator of the time-varying power spectrum is then given by

$$\widehat{f}_{u,\omega}^{(mt)}(\tau, \sigma) = \sum_{b=1}^{B} I_b(u)\,\widehat{f}_{b,\omega}^{(mt)}(\tau, \sigma). \tag{3.5}$$

In practice, this estimator requires appropriate selection of two tuning parameters: $B$ and $K$. The number of time blocks, $B$, balances frequency and temporal properties. It should be selected small enough to ensure sufficient frequency resolution and so that the central limit theorem holds for the local tapered periodograms.
It should be selected large enough so that data within each block are approximately stationary, which depends on the signal under study. Asymptotic results presented in the next section indicate the optimal rate for $B$ is $T^{[\ldots]}$. In the absence of scientific guidelines, we recommend selecting $B$ as the factor of $T$ closest to $T^{[\ldots]}$. The number of tapers, $K$, controls the smoothness of local spectral estimates. Asymptotic results presented in the next section indicate an appropriate choice would be $K = \max\big([\ldots]\,\lfloor \min(T_B, B)\,[\ldots] \rfloor\big)$.

We further define the estimated demeaned time-varying spectra as

$$\widehat{g}_{b/B,\omega}(\tau, \sigma) = \widehat{f}_{b/B,\omega}^{(mt)}(\tau, \sigma) - \frac{1}{B}\sum_{l=1}^{B} \widehat{f}_{l/B,\omega}^{(mt)}(\tau, \sigma). \tag{3.6}$$

Let $\widetilde{g}_{u,\omega,\delta}$ be the average of the demeaned time-varying spectrum over frequencies in $[\omega, \omega + \delta)$. In order to identify frequency partition points, we consider an integrated scan statistic defined as

$$Q_{\omega,\delta} = \sum_{b=1}^{B} \int_0^1\!\!\int_0^1 \big(\widehat{g}_{b/B,\omega+\delta}(\tau, \sigma) - \widetilde{g}_{b/B,\omega,\delta}(\tau, \sigma)\big)^2\,d\tau\,d\sigma. \tag{3.7}$$

The basic intuition behind this scan statistic is that the estimator $\widehat{g}$ defined in (3.6) asymptotically behaves like the theoretical demeaned power spectrum defined in (2.7). The scan statistic compares the value of $\widehat{g}$ at $\omega + \delta$ with the average of the same quantity over the interval $(\omega, \omega + \delta)$. Hence, if one of the partition points $\omega_k$ defined in (2.7) is present in the interval $(\omega, \omega + \delta)$, the scan statistic should take a large value, and it should be bounded and close to zero otherwise. For the rest of the paper, we will consider the following asymptotic scheme.

Assumption 3.1. Assume $T \to \infty$, $B \to \infty$, $K \to \infty$, $T/K \to \infty$ and $T/(BK) \to \infty$.

Our first result characterizes the asymptotic behavior of the estimated demeaned power spectrum.

Lemma 3.1. Under Assumption 2.1 and Assumption 3.1, for every $b, b_1, b_2 \in \{1, 2, \ldots, B\}$, $\omega, \omega_1, \omega_2 \in (0, 0.5)$, and $\tau, \tau_1, \tau_2, \sigma, \sigma_1, \sigma_2 \in [0,1]$,

$$\mathbb{E}\big(\widehat{g}_{b/B,\omega}(\tau, \sigma)\big) = g_{u_b,\omega}(\tau, \sigma) + O\big(\log(T_B)/T_B\big) + \log(1/T),$$

$$\mathrm{Cov}\big(\widehat{g}_{b/B,\omega_1}(\tau_1, \sigma_1),\, \widehat{g}_{b/B,\omega_2}(\tau_2, \sigma_2)\big) = \Big(1 - \frac{2}{B}\Big)\frac{1}{K}\,F(u_b, \omega_1, \omega_2, \tau_1, \sigma_1, \tau_2, \sigma_2) + \frac{1}{B^2 K}\sum_{l=1}^{B} F(u_l, \omega_1, \omega_2, \tau_1, \sigma_1, \tau_2, \sigma_2) + o(1),$$

$$\mathrm{Cov}\big(\widehat{g}_{b_1/B,\omega_1}(\tau_1, \sigma_1),\, \widehat{g}_{b_2/B,\omega_2}(\tau_2, \sigma_2)\big) = -\frac{1}{BK}\,F(u_{b_1}, \omega_1, \omega_2, \tau_1, \sigma_1, \tau_2, \sigma_2) - \frac{1}{BK}\,F(u_{b_2}, \omega_1, \omega_2, \tau_1, \sigma_1, \tau_2, \sigma_2) + \frac{1}{B^2 K}\sum_{l=1}^{B} F(u_l, \omega_1, \omega_2, \tau_1, \sigma_1, \tau_2, \sigma_2) + o(1),$$

where

$$F(u, \omega_1, \omega_2, \tau_1, \sigma_1, \tau_2, \sigma_2) := f_{u,\omega}(\tau, \tau)\,f_{u,\omega}(\sigma, \sigma) + f_{u,\omega}(\tau, \sigma)\,f_{u,\omega}(\tau, \sigma)$$

[the subscripts distinguishing the two frequencies and the two argument pairs on the right-hand side are missing in the source], and $u_b$ is the mid-point of the $b$-th block.

Remark. Note that the assumptions $T/(BK) \to \infty$ and $K \to \infty$ together guarantee $T_B \to \infty$. Therefore Lemma 3.1 implies that $\mathbb{E}\big(\widehat{g}_{b/B,\omega}(\tau, \sigma)\big) \to g_{u_b,\omega}(\tau, \sigma)$ under our asymptotic scheme. Moreover, the covariance kernel of the process $\widehat{g}$ is $O(1/K)$. This guarantees that, for fixed $b$ and $\omega$, $\widehat{g}_{b/B,\omega}$ consistently estimates $g_{u_b,\omega}$. A stronger result, uniform in $b$ and $\omega$, can be established under additional moment assumptions. Additionally, the covariance across different time blocks is of the order $O(1/(BK))$, which converges to 0 faster than the covariance kernel within the same block.

Next, we investigate the asymptotic behavior of the scan statistic itself. The next two theorems describe the asymptotic behavior of the scan statistic in the absence and presence of a jump point in the interval $(\omega, \omega + \delta)$, respectively.

Theorem 3.1. Assume $g_{u,\omega}(\tau, \sigma) = g_u^{(0)}(\tau, \sigma)$ for all $\tau, \sigma \in [0,1]$ and all frequencies in $[\omega, \omega + \delta]$. Under Assumption 2.1 and Assumption 3.1,

$$Q_{\omega,\delta} \overset{d}{=} \frac{1}{K}\sum_{b=1}^{B} \big(\|G_b\|_2^2 + o_p(1)\big),$$

where $\{G_b\}$ is a collection of zero-mean Gaussian processes in $L^2([0,1]^2)$ with covariance structure given in (A.7) and (A.8).

By Lemma A.4, the quantity $\frac{1}{B}\sum_b \|G_b\|_2^2 = O_p(1)$ as $B \to \infty$. Therefore, Theorem 3.1 suggests that the quantity $K Q_{\omega,\delta}/B$ is asymptotically bounded when no partition point is present. Hence, as long as this quantity diverges under our asymptotic set-up when a partition point is present, we can construct a consistent test for the existence of a partition point in the frequency domain. The next theorem guarantees that this is indeed the case.

Theorem 3.2. Assume that

$$g_{u,\omega}(\tau, \sigma) = \begin{cases} g_u^{(1)}(\tau, \sigma) & \text{for } \omega \in [\omega, \omega^*), \\ g_u^{(2)}(\tau, \sigma) & \text{for } \omega \in [\omega^*, \omega + \delta]. \end{cases}$$

Under Assumption 2.1 and Assumption 3.1,

$$Q_{\omega,\delta} \overset{d}{=} \frac{1}{K}\sum_{b=1}^{B} \big(\|G_b\|_2^2 + o_p(1)\big) + B\Big(\frac{\omega^* - \omega}{\delta}\Big)^2 \int_0^1\!\!\int_0^1\!\!\int_0^1 \big(g_u^{(1)}(\tau, \sigma) - g_u^{(2)}(\tau, \sigma)\big)^2\,d\tau\,d\sigma\,du + O_p(B/K),$$

where $\{G_b\}$ are as defined in Theorem 3.1.

Remark. If there is no significant frequency partition point in $[\omega, \omega + \delta]$, we have $Q_{\omega,\delta} = O(B/K)$, and in the presence of a change, $Q_{\omega,\delta} = O(B/K) + O(B)$. Therefore, we expect to see a large spike in the scan statistic around the frequency where the dynamics of the power spectrum change.

Remark. Suppose we are interested in testing the null hypothesis that the kernel $g_{u,\omega}$ does not change in $\omega$ on the interval $(\omega, \omega + \delta)$. Let $c_\alpha$ be the $(1 - \alpha)$-th quantile of $\frac{1}{B}\sum_{b=1}^{B} \|G_b\|_2^2$ for large $B$. Then an asymptotic level $\alpha$ test for this hypothesis can be constructed by rejecting the null when $K Q_{\omega,\delta} > B c_\alpha$. In fact, the next corollary suggests that this is a consistent test.

Corollary. Under the assumptions of Theorem 3.2, we have

$$P\Big(Q_{\omega,\delta} > \frac{1}{K}\sum_{b=1}^{B} \|G_b\|_2^2\Big) \to 1.$$

The results discussed in the previous section provide a consistent test to find a single partition point in the frequency space. In this section, we extend this framework to detect multiple frequency partition points using an iterative search algorithm that uses this scan statistic to efficiently explore the frequency space and identify all frequency partition points.

For computational efficiency, we search across frequencies in batches of size $n_{max}$. This limits the number of calculations required to approximate the null distribution of the scan statistic and avoids undue computational burden. In particular, $\omega$ is first fixed at a value near 0, and we conduct a test for partition points in the interval $(\omega, \omega + \delta]$ by comparing the statistic $Q_{\omega,\delta}$ against the null distribution described in Theorem 3.1 for different values of $\delta < n_{max}/T_B$. A Hochberg step-up procedure [14] is then used to test for the presence of a partition in the frequency domain at each particular frequency $\omega + \delta$. If the procedure returns a set of frequencies for which the null hypothesis (no partition point) is rejected, we then add the smallest frequency in this set to the estimated frequency band partition, increase $\omega$ to be just larger than this newly found frequency partition point, and repeat the process. If the procedure does not return any frequencies for which the null hypothesis is rejected, we increase $\omega$ to be just larger than the largest frequency tested in the current batch and repeat the process. The procedure continues until all frequencies have been evaluated as potential partition points. To better visualize this procedure, consider that its traversal across frequencies resembles the movement of an inchworm.
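The multiple-testing step within one batch can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names `hochberg_reject` and `next_partition_point`, the example frequencies and the example p-values are hypothetical, and the code implements only the textbook Hochberg step-up rule followed by the "smallest rejected frequency" selection described above, not the authors' full search.

```python
def hochberg_reject(pvalues, alpha=0.05):
    """Indices rejected by the Hochberg step-up procedure at level alpha.

    With ordered p-values p_(1) <= ... <= p_(m), find the largest rank i
    such that p_(i) <= alpha / (m - i + 1) and reject the i hypotheses
    with the smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    n_reject = 0
    for rank in range(m, 0, -1):  # start from the largest p-value
        if pvalues[order[rank - 1]] <= alpha / (m - rank + 1):
            n_reject = rank
            break
    return sorted(order[:n_reject])


def next_partition_point(freqs, pvalues, alpha=0.05):
    """Smallest tested frequency whose null is rejected, or None."""
    rejected = hochberg_reject(pvalues, alpha)
    if not rejected:
        return None
    return min(freqs[i] for i in rejected)


# Hypothetical batch: candidate frequencies omega + delta_k and their p-values.
freqs = [0.11, 0.12, 0.13, 0.14, 0.15]
pvals = [0.40, 0.30, 0.001, 0.004, 0.012]
print(hochberg_reject(pvals))              # [2, 3, 4]
print(next_partition_point(freqs, pvals))  # 0.13
```

When `next_partition_point` returns `None`, the search would move past the whole batch; otherwise it would restart just beyond the returned frequency, which gives the inchworm-like traversal.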
A complete algorithmic representation of the search procedure in itsentirety is available in Algorithm 1.P-values are obtained by comparing observed test statistics with a simulated dis-tribution of the limiting random variable described in Theorem 3.1. To simulatefrom the null distribution, we generate d draws from the collection of zero-meanGaussian processes { G b } with covariance structure given in (A.7) and (A.8). The co-variance is estimated using the estimates of the demeaned time-varying power spec-trum based on the local multitaper power spectrum estimator ˆ f ( mt ) u , ω described inSection 3.1. Remark . Computation cost is a critical aspect for analyzing functional time se-ries data. In order to make our algorithm more efﬁcient, we use two particular ad-justments(i)

A block diagonal approximation of the asymptotic covariance structure:

Whileapproximating the asymptotic quantiles by simulating { G b }, we ignore the co-variance across different time blocks b given by (A.8). This allows for simulat-ing from a collection of B independent, lower dimensional Gaussian processes,which can be carried out in parallel for computational efﬁciency. Note that thecovariance kernel for G b given in (A.7) is O (1) and the covariance kernel in (A.8)is O (1/ B ). Therefore, for large B , the approximation works reasonably well.(ii) Choice of the tuning parameter n max : Notice that number of calculations re-quired to estimate the covariance kernel given in (A.8) grows with the numberof frequencies between ω and ω + δ . By traversing frequencies by testing insmaller batches, this reduces computation times signiﬁcantly compared to let-ting δ vary over the whole frequency domain in a single pass.All computations in what follows were performed using R T = R = B = . Bagchi and S.A. Bruce/Frequency Band Learning for Functional Time Series Algorithm 1: I NCHWORM F REQUENCY B AND S EARCH A LGORITHM

Input:

Demeaned time-varying multitaper power spectrum estimates, ˆ g b / B , ω k ( τ , σ ), for b = B and ω k = k / T B for k = N B = (cid:98) T B /2 (cid:99)− K , signiﬁcance level α , number of frequencies tested in each pass n max ,number of draws for approximating p -values, d Output:

Estimated number of partition points, (cid:98) p Estimated partition points, (cid:98) ω (cid:98) p = { (cid:98) ω , (cid:98) ω ,..., (cid:98) ω (cid:98) p − } (cid:98) p ← (cid:98) ω (cid:98) p ← {}, stop ← (cid:178) ← K + T B + , k ∗ ← (cid:100) T B (cid:178) (cid:101) , ω ← ω k ∗ while stop (cid:54)= do k min ← (cid:100) T B (cid:178) (cid:101) , k max ← min (cid:161) k min + n max − (cid:165) N B − k ∗ − T B (cid:178) (cid:166)(cid:162) Compute test statistics Q ω , δ k ∀ δ k = k / T B , k ∈ { k min , k min + k max }.Simulate d draws, Q H ω , δ k = (cid:110) Q H , i ω , δ k (cid:111) d i = , from the limiting null distribution of Q ω , δ k .Approximate p -values, ˆ p ( k ) = d (cid:80) d i = I (cid:179) Q H , i ω , δ k > Q ω , δ k (cid:180) to test H ( k ) : g u , ω ( τ , σ ) = g (0) u ( τ , σ ) for ω ∈ [ ω , ω + δ k ].Identify R α = { ω + δ k : H ( k ) rejected} using level- α Hochberg step-up procedure. if R α = {} then k ∗ ← T B ( ω + δ k max ), ω ← k ∗ / N B else ˆ ω ∗ ← min R α , (cid:98) ω (cid:98) p ← (cid:98) ω (cid:98) p ∪ { ˆ ω ∗ }, (cid:98) p ← (cid:98) p + k ∗ ← (cid:167) T B ( ˆ ω ∗ + (cid:178) ) (cid:168) , ω ← k ∗ / N B endif ω > ω N B − (cid:178) then stop ← endreturn (cid:98) p , (cid:98) ω (cid:98) p K =

15 tapers with n max =

40 and d =

The proposed methodology produces homogeneous regions of frequencies in which the power spectrum varies only across time. It is then natural to seek to identify frequency bands for which the second-order structure is stationary. Specifically, within a frequency band $[\omega_1, \omega_2]$, the power spectrum satisfies $f_{u,\omega}(\tau,\sigma) = f_u(\tau,\sigma)$ for all $u, \tau, \sigma \in [0,1]$ and $\omega \in [\omega_1, \omega_2]$. Furthermore, if the process is stationary within this region, the power spectrum is constant across time $u$, i.e., $f_{u,\omega}(\tau,\sigma) = f(\tau,\sigma)$ within that band. In this situation, the demeaned power spectrum satisfies

$$g_{u,\omega}(\tau,\sigma) = f_{u,\omega}(\tau,\sigma) - \int_0^1 f_{u,\omega}(\tau,\sigma)\,du \equiv 0$$

for all $u, \tau, \sigma \in [0,1]$ and $\omega \in [\omega_1, \omega_2]$. Therefore, to test stationarity within a frequency band, we consider the null hypothesis

$$H_0 : g_{u,\omega}(\tau,\sigma) \equiv 0 \quad \text{almost everywhere } u, \tau, \sigma \in [0,1],\ \omega \in [\omega_1, \omega_2],$$

against the alternative

$$H_a : g_{u,\omega}(\tau,\sigma) \neq 0 \quad \text{on a set of positive Lebesgue measure.}$$

To test this hypothesis, we propose the test statistic

$$Q(\omega_1,\omega_2) = \frac{1}{B}\sum_{b=1}^{B} \int_{\omega_1}^{\omega_2}\int_0^1\int_0^1 \left|\hat{g}_{b/B,\omega}(\tau,\sigma)\right|^2 d\tau\, d\sigma\, d\omega.$$

As $\hat{g}$ is a consistent estimator of the true demeaned power spectrum $g$, this statistic is close to 0 under $H_0$ and takes large positive values under the alternative. We now formalize this idea through the asymptotic distribution of this statistic.

Theorem 3.3. Assume that $g_{u,\omega}(\tau,\sigma) = g_u(\tau,\sigma) = 0$ for all $u, \tau, \sigma \in [0,1]$ and $\omega \in [\omega_1, \omega_2]$. Under Assumption 2.1 and Assumption 3.1,

$$Q(\omega_1,\omega_2) \overset{d}{=} \frac{1}{K}\sum_{b=1}^{B}\left(\|H_b\|^2 + o(1)\right), \tag{3.8}$$

where $\{H_b\}$ is a collection of zero-mean Gaussian processes in $L^2([0,1]^2)$ with covariance structure given in (A.5).

Remark. Theorem 3.3 suggests that under $H_0$, $g_u(\tau,\sigma) \equiv 0$, the quantity $K/B \times Q(\omega_1,\omega_2) = \frac{1}{B}\sum_{b=1}^{B}\|H_b\|^2 + o(1)$. Therefore, approximate $p$-values for the test of stationarity can be constructed by comparing $KQ(\omega_1,\omega_2)/B$ with simulated quantiles of $\frac{1}{B}\sum_{b=1}^{B}\|H_b\|^2$, for large $B$. Note that $\frac{1}{B}\sum_{b=1}^{B}\|H_b\|^2$ is $O_p(1)$ as $B \to \infty$ by Lemma A.4.

The following lemma guarantees consistency of the proposed test.

Lemma 3.2. Assume that $g_{u,\omega}(\tau,\sigma) = g_u(\tau,\sigma) \neq 0$. Then $KQ(\omega_1,\omega_2)/B \to \infty$ in probability.
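The Monte Carlo p-values described in the remark above, and the Hochberg step-up rule used by the search algorithm, can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function names `mc_pvalue` and `hochberg_reject` are hypothetical, and the null draws are assumed to have been simulated elsewhere from the limiting null distribution.

```python
def mc_pvalue(q_obs, null_draws):
    """Approximate p-value: share of simulated null draws that exceed
    the observed (scaled) test statistic q_obs."""
    d = len(null_draws)
    return sum(1 for q in null_draws if q > q_obs) / d


def hochberg_reject(pvalues, alpha=0.05):
    """Hochberg step-up procedure at level alpha.

    Returns the sorted indices of rejected hypotheses. With ordered
    p-values p_(1) <= ... <= p_(m), find the largest rank j such that
    p_(j) <= alpha / (m - j + 1) and reject the j smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending p-values
    cutoff = 0
    for rank in range(m, 0, -1):  # step up from the largest p-value
        if pvalues[order[rank - 1]] <= alpha / (m - rank + 1):
            cutoff = rank
            break
    return sorted(order[:cutoff])


if __name__ == "__main__":
    # Toy numbers only; they are not taken from the paper.
    pvals = [mc_pvalue(q, [1.0, 3.0, 5.0, 0.5]) for q in (6.0, 2.0, 0.1)]
    print(pvals, hochberg_reject(pvals, alpha=0.05))
```

In the search algorithm, the smallest rejected candidate frequency would then be taken as the next estimated partition point; that selection step is not shown here.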

4. Finite Sample Properties

In order to evaluate the performance of the search algorithm in ﬁnite samples, weconsider three simulation settings representing appropriate extensions of [6] forfunctional time series. A B-spline basis with 15 basis functions is used to generaterandom realizations of the functional time series. All settings can be represented as f u , ω ( τ , σ ) = φ u , ω f ( τ , σ ) for u ∈ [0,1] and ω ∈ (0,0.5) and φ u , ω = ω ∈ (0,0.5), (4.1) . Bagchi and S.A. Bruce/Frequency Band Learning for Functional Time Series φ u , ω =  − u for ω ∈ (0,0.15)5 for ω ∈ [0.15,0.35)1 + u for ω ∈ [0.35,0.5), (4.2)and φ u , ω =  + sin(8 π u − π /2) for ω ∈ (0,0.15]2 + cos(8 π u ) for ω ∈ (0.15,0.35]2 + cos(16 π u ) for ω ∈ (0.35,0.5). (4.3)See Figure 1 for an illustration of the estimated time-varying auto spectrum for a sin-gle point in the functional domain for each setting. In the ﬁrst setting, we considerfunctional white noise in order to ensure the method maintains appropriate con-trol for false positives. In the second and third settings, both linear and non-linearnonstationary dynamics are considered within frequency bands. These settings arespecially designed to assess performance in detecting time-varying dynamics of dif-ferent forms, as well as subtle changes in dynamics over frequencies.Fig 1: Local multitaper estimates of the time-varying autospectra for a single compo-nent from each of the simulation settings for time series with N = B =

20 time segments. Solid green lines represent the true partition of frequencies.

Table 1 reports the means and standard deviations over 100 replications for the estimated number of frequency bands, $\hat{p}$, and Rand indices, $R(\hat{\omega}, \omega)$. The Rand index [27] summarizes the similarity between the estimated frequency partition, $\hat{\omega}$, and the true partition, $\omega$. Let $a$ be the number of pairs of Fourier frequencies in the same frequency band in $\hat{\omega}$ and the same frequency band in $\omega$, and let $b$ be the number of pairs of Fourier frequencies in different frequency bands in $\hat{\omega}$ and different frequency bands in $\omega$. Then the Rand index is

$$R(\hat{\omega}, \omega) = (a + b) \Big/ \binom{N_B}{2},$$

where $N_B = \lfloor T_B/2 \rfloor - 1$. Results are reported for different values of the block length, $T_B$, the number of approximately stationary time blocks, $B$, and the number of observations in the functional domain, $R$. Furthermore, we fix the family-wise error rate (FWER) control level, $\alpha$, the maximum number of frequencies tested in each pass, $n_{\max} = 30$, the number of draws used to approximate the null distribution of the test statistics, $d$, the bandwidth, $bw$, and the number of tapers, $K = \lfloor bw(N+1) \rfloor - 1$. We use a block diagonal approximation for the covariance function of the Gaussian process used to approximate the limiting null distribution of the scan statistics, and select four approximately equally spaced points in the functional domain for testing. The proposed method provides good estimation accuracy for both the number and location of frequency band partition points, and performance generally improves as $T_B$ and $B$ increase.

Table 1: Mean (standard deviation) of the estimated number of frequency bands, $\hat{p}$, and Rand index values, $R(\hat{\omega}, \omega)$, by $T_B$ and $B$ for $R = 5$ and $R = 10$, over 100 replications.

When the number of time blocks, $B$, is smaller, the proposed method slightly overestimates the number of partition points. This is not unexpected, since the proposed method uses the asymptotic behavior of the scan statistics for testing purposes, which may not hold for smaller values of $B$. For the first setting, Table 1 indicates good performance in correctly estimating only one frequency band. Since the search algorithm is designed to control FWER, the false positive rate remains under control, even as the number of distinct frequencies tested increases with larger values of $T_B$. In some cases, FWER may be controlled more tightly than the nominal level $\alpha$. Estimation accuracy improves as $T_B$ and $B$ increase. However, the accuracy is also impacted by the magnitude of the differences between the underlying demeaned time-varying power spectra across frequency bands, as well as the number of time blocks used to approximate the time-varying behavior. The time-varying dynamics of the power spectrum for adjacent frequency bands in the second setting are dissimilar for nearly all time points. Accordingly, the algorithm is able to correctly estimate the frequency band partition with smaller $T_B$ and $B$ for this setting.
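The Rand index reported in Table 1 can be computed directly from its definition. The sketch below is illustrative only: `band_labels` and `rand_index` are hypothetical helper names, and bands are assumed to be half-open intervals closed on the left, as in setting (4.2).

```python
from bisect import bisect_right
from itertools import combinations


def band_labels(freqs, partition_points):
    """Label each Fourier frequency with the index of the band it falls in.
    A frequency equal to a partition point starts the next band."""
    pts = sorted(partition_points)
    return [bisect_right(pts, w) for w in freqs]


def rand_index(labels_est, labels_true):
    """Share of frequency pairs on which two partitions agree: both put
    the pair in the same band (a), or both put it in different bands (b)."""
    n = len(labels_est)
    agree = 0
    for i, j in combinations(range(n), 2):
        same_est = labels_est[i] == labels_est[j]
        same_true = labels_true[i] == labels_true[j]
        if same_est == same_true:
            agree += 1
    return agree / (n * (n - 1) / 2)


if __name__ == "__main__":
    # Toy grid of frequencies in (0, 0.5); values chosen for illustration.
    freqs = [k / 100 for k in range(1, 50)]
    truth = band_labels(freqs, [0.15, 0.35])
    estimate = band_labels(freqs, [0.16, 0.35])
    print(rand_index(estimate, truth))
```

A value of 1 indicates that the estimated and true partitions group every pair of frequencies identically.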
In contrast, the two frequency bands covering higher frequencies in the third setting have similar time-varying dynamics in the power spectrum at particular time points due to their periodicities (see Figure 1). Also, without a sufficient number of time blocks, it is difficult to distinguish the different periodic time-varying behavior across frequency bands for this setting. Taken together, this explains the need for relatively larger values of $T_B$ and $B$ for accurate estimation of the frequency band partition for the third setting compared to the second setting.

To illustrate the usefulness of the proposed methodology for functional time series analysis, we turn to frequency band analysis of EEG signals. Analyzing EEG signals as nonstationary functional time series is warranted by the high dimensionality, nonstationarity, and strong dependence typically observed for EEG signals. Frequency bands are also commonly used in the scientific literature to generate summary measures of EEG power spectra, so a principled approach to frequency band estimation would be a welcome development. In the scientific literature, there is significant variability in the frequency ranges used to define traditional EEG frequency bands [21]. For the purposes of comparison with the proposed method, we use the following definitions (Hz) [20]: delta (0,4), theta [4,7), alpha [7,12), beta [12,30), and gamma [30,100). We analyze a 4-minute segment of a 72-channel 256 Hz EEG signal from a single participant from the study described in [35]. Participants in the study sat in a wakeful resting state and alternated between eyes open (EO) and eyes closed (EC) conditions at 1-minute intervals. Given this particular design, we expect to see periodic nonstationary behavior similar to that of the third simulation setting introduced in Section 4.1.

For computational efficiency, we downsample the time series to 32 Hz to produce a series of length $T = 7680$, with $T_B = 10 \times 32 = 320$ observations per segment and $B = 7680/320 = 24$ segments. Other parameters (e.g. FWER control level, number of frequencies tested in each pass, local multitaper bandwidth, etc.) follow the same settings used for the simulations in Section 4.1. Since it is possible that different brain regions may be characterized by different frequency band structures, we consider two groups of channels: one representing the parietal and occipital lobes (PO7, PO3, POz, PO4, and PO8), which are associated with attention, and one representing the frontal and central lobes (F1, Fz, F2, FC1, FCz, FC2), which are associated with sensorimotor function [38]. By applying the proposed method to each group, we can better understand whether the frequency band structures that characterize the time-varying dynamics of the power spectrum differ within sub-regions of the functional domain.

Figure 2 presents the log autospectra for the five channels associated with attentional system areas, along with the traditional EEG frequency bands and the EEG frequency bands estimated using the proposed method. Applying the proposed methodology to these data revealed three frequency bands with different time-varying dynamics (Hz): (0,3.8), [3.8,12.6), [12.6,16). Compared with the traditional frequency band partition, the proposed low frequency band (0,3.8) nearly coincides with the traditional delta band (0,4), but the next proposed frequency band [3.8,12.6) covers both the traditional theta [4,7) and alpha [7,12) frequency bands. This suggests that while the delta band exhibits different time-varying characteristics from other bands, the theta and alpha bands exhibit similar time-varying characteristics for these channels. These results are not surprising, as it is well known that alpha band power increases during EC conditions and attenuates with visual stimulation during EO conditions.
As is the case with this participant, similar behavior has also been observed for theta band power, with larger reductions observed in the posterior regions of the brain, including the attentional system area channels currently under study [1, 2]. Hence, it is reasonable that the time-varying dynamics of the alpha and theta bands may be similar for this particular set of EEG channels.

To better understand differences in the time-varying behavior across the estimated bands, Figure 3 displays a smoothed estimate of the frequency-band-specific demeaned time-varying autospectra, $\hat{g}^{(p)}_u(\tau_i, \tau_i)$, for each estimated band $p$ and channel $i$.

Fig 2: Log time-varying autospectra for 5 EEG channels and frequency bands determined by the proposed methodology (solid green lines) and traditional frequency bands (dashed blue lines) are displayed. These EEG channels measure activity in the parietal (P) and occipital (O) lobes, which are associated with attention and visual processing.

For the frontal and central channels (Figure 4), the estimated frequency bands with different time-varying dynamics are (Hz): (0,2.2), [2.2,4.9), [4.9,8.1), [8.1,11.8), [11.8,16.0). These estimated bands align reasonably well with the traditional EEG frequency bands. However, the estimated bands suggest that the traditional delta band, (0,4), should be characterized by two sub-bands, (0,2.2) and [2.2,4.9), which exhibit significantly different time-varying behavior of the power spectrum. Such findings have been noted in the scientific literature, in which the so-called "slow delta" (0.7-2 Hz) and "fast delta" (2-4 Hz) bands exhibit different behavior during the wake-sleep transition [3]. It was also observed in [1] that the magnitude of the difference in theta power between the EO and EC conditions is smaller for frontal and central brain regions than for posterior regions, while the magnitude of the difference for the alpha band is similar across regions.
This is supported by the current analysis and can help explain why the theta and alpha bands are estimated to have similar time-varying behavior for the group of posterior region EEG channels (Figure 2), and different time-varying dynamics for the group of central and frontal region EEG channels (Figure 4).

The smoothed estimates of the frequency-band-specific demeaned time-varying autospectra for these channels (see Figure 5) illustrate the different time-varying behavior of the estimated frequency bands captured by the search algorithm. The two sub-bands covering the traditional delta band indicate very different time-varying behavior. Also, the estimated band that roughly corresponds to the traditional alpha

Fig 3: Smoothed estimator of the frequency-band-specific demeaned time-varying power spectra $g^{(p)}_u$.
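The comparison between estimated and traditional EEG bands made in this section amounts to intersecting intervals on the frequency axis. The following sketch shows one way to tabulate that overlap, using the band definitions quoted from [20] and the posterior-channel estimates reported above. The helper names `overlap` and `band_overlaps` are hypothetical, and interval endpoints are treated as plain numbers, ignoring open versus closed ends.

```python
# Traditional EEG bands in Hz, as defined in the text following [20].
TRADITIONAL = {
    "delta": (0.0, 4.0),
    "theta": (4.0, 7.0),
    "alpha": (7.0, 12.0),
    "beta": (12.0, 30.0),
    "gamma": (30.0, 100.0),
}


def overlap(a, b):
    """Length in Hz of the intersection of intervals a and b (0 if disjoint)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def band_overlaps(estimated, traditional=TRADITIONAL):
    """For each estimated band, map every overlapping traditional band
    to the width of the shared frequency range in Hz."""
    result = []
    for band in estimated:
        shares = {}
        for name, rng in traditional.items():
            width = overlap(band, rng)
            if width > 0:
                shares[name] = width
        result.append(shares)
    return result


if __name__ == "__main__":
    # Estimated bands for the parietal and occipital channels (Hz).
    posterior = [(0.0, 3.8), (3.8, 12.6), (12.6, 16.0)]
    for band, shares in zip(posterior, band_overlaps(posterior)):
        print(band, shares)
```

For the posterior channels, the middle estimated band contains the full theta and alpha ranges, which matches the discussion of Figure 2.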

5. Discussion

The frequency band analysis framework for nonstationary functional time series introduced in this article offers a quantitative approach to identifying frequency bands that best preserve the nonstationary dynamics of the underlying functional time series. This framework allows for estimation of both the number of frequency bands and their corresponding endpoints through the use of a sensible integrated scan statistic within an iterative search algorithm. Another test statistic is also offered to determine which bands, if any, are stationary with respect to time. Motivated by the application to EEG frequency band analysis, it would be interesting to consider extensions of this framework that enable localization of the frequency band estimation framework in the time and functional domains. Such extensions would allow the frequency band estimation framework to automatically adapt to local spectral characteristics without needing to pre-specify particular time segments or subsets of the functional domain for analysis. However, these extensions present significant computational challenges associated with searching over multiple spaces simultaneously.

Fig 4: Log time-varying autospectra for 6 EEG channels and frequency bands determined by the proposed methodology (solid green lines) and traditional frequency bands (dashed blue lines) are displayed. These EEG channels measure activity in the frontal (F) and central (C) lobes, which are associated with sensorimotor function.

We have focused on estimation of frequency bands for a single nonstationary functional time series, but this framework can also be extended to the analysis of multiple functional time series. For example, extending this framework to estimate frequency bands for classification and clustering of functional time series would provide researchers with optimal frequency band features for supervised and unsupervised learning tasks. This extension could be very useful in the study of EEG and fMRI signals to construct frequency band features that are associated with clinical and behavioral outcomes or that can be used to identify groups of time series with similar spectral characteristics.

Fig 5: Smoothed estimator of the frequency-band-specific demeaned time-varying power spectra $g^{(p)}_u$.

Technical Details

A.1. Properties of Multitaper Periodogram of Functional Time Series

We first establish some asymptotic properties of the multitaper periodogram estimator defined in (3.5). Following the notation in [9], we define the quantities

$$H_k(\omega) = \sum_{t=1}^{T_B} e^{-\mathrm{i}2\pi\omega t}\, v_{k,b}(t), \qquad H_{k,l}(\omega) = \sum_{t=1}^{T_B} e^{-\mathrm{i}2\pi\omega t}\, v_{k,b}(t)\, v_{l,b}(t).$$

Note that

$$\int_{-1/2}^{1/2} |H_k(\omega)|^2\, d\omega = \sum_{t=1}^{T_B} \left(v_{k,b}(t)\right)^2 = 1,$$

and that $L(\omega)$ is an upper bound for both the functions $\sqrt{T_B}\, H_k(\omega)$ and $T_B H_{k,k}(\omega)$, where

$$L(\omega) = \begin{cases} T_B, & \text{if } |\omega| \le 1/T_B, \\ 1/|\omega|, & \text{otherwise.} \end{cases}$$

Lemma A.1. Let $\tilde{X}^{(k),b,\omega}_T$ be the $k$-th functional discrete Fourier transformation (fDFT) of the observed time series at the $b$-th block and frequency $\omega$, as defined in (3.1). Under Assumption 2.1 and Assumption 3.1, the $l$-th order cumulant kernel of the fDFT is given by

$$\mathrm{cum}\left(\tilde{X}^{(k_1),b_1,\omega_1}_T(\tau_1), \ldots, \tilde{X}^{(k_l),b_l,\omega_l}_T(\tau_l)\right) = \begin{cases} H_{k_1,\ldots,k_l}\left(\sum_{j=1}^{l}\omega_j\right) f_{u_b,\omega_1,\ldots,\omega_{l-1}}(\tau_1,\ldots,\tau_l) + o(1), & \text{if } b_1 = b_2 = \cdots = b_l = b, \\ 0, & \text{otherwise,} \end{cases}$$

where

$$H_{k_1,\ldots,k_l}(\omega) = \sum_{t=1}^{T_B} v_{k_1,b}(t)\, v_{k_2,b}(t) \cdots v_{k_l,b}(t)\, e^{-\mathrm{i}\omega t}.$$

Proof. Let $B_b$ be the $b$-th time block and write

$$\begin{aligned}
\mathrm{cum}\left(\tilde{X}^{(k_1),b_1,\omega_1}_T(\tau_1), \ldots, \tilde{X}^{(k_l),b_l,\omega_l}_T(\tau_l)\right)
&= \mathrm{cum}\left(\sum_{t \in B_{b_1}} v_{k_1,b_1}(t) X_{t,T}(\tau_1) e^{-\mathrm{i}2\pi\omega_1 t}, \ldots, \sum_{t \in B_{b_l}} v_{k_l,b_l}(t) X_{t,T}(\tau_l) e^{-\mathrm{i}2\pi\omega_l t}\right) \\
&= \sum_{t_1 \in B_{b_1}} \cdots \sum_{t_l \in B_{b_l}} v_{k_1,b_1}(t_1) \cdots v_{k_l,b_l}(t_l)\, e^{-\mathrm{i}2\pi\sum_{k=1}^{l}\omega_k t_k}\, \mathrm{cum}\left(X_{t_1,T}(\tau_1), \ldots, X_{t_l,T}(\tau_l)\right) \\
&= \sum_{t_1 \in B_{b_1}} \cdots \sum_{t_l \in B_{b_l}} v_{k_1,b_1}(t_1) \cdots v_{k_l,b_l}(t_l)\, e^{-\mathrm{i}2\pi\sum_{k=1}^{l}\omega_k t_k}\, \mathrm{cum}\left(X^{(u_{b_1})}_{t_1}(\tau_1), \ldots, X^{(u_{b_l})}_{t_l}(\tau_l)\right) + o(1).
\end{aligned}$$

If $b_i \neq b_j$ for any pair $i, j \in \{1, 2, \ldots, l\}$, the last expression converges to 0 by Assumption 2.1.

For the case $b_1 = b_2 = \cdots = b_l = b$, we write $t_1 = t$ and $t_j = t + v_j$ for $j = 2, \ldots, l$. The expression then simplifies to

$$\sum_{v_2} \cdots \sum_{v_l} \exp\left(-\mathrm{i}\sum_{j=2}^{l} v_j \omega_j\right) \mathrm{cum}\left(X^{(u_b)}_{t}(\tau_1), \ldots, X^{(u_b)}_{t+v_l}(\tau_l)\right) \sum_{t} v_{k_1,b}(t)\, v_{k_2,b}(t+v_2) \times \cdots \times v_{k_l,b}(t+v_l) \exp\left(-\mathrm{i}\sum_{j=1}^{l}\omega_j t\right).$$

The result then follows by Lemma P.4.1 and Lemma P.4.2 from [5].
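The taper quantities $H_k$, $H_{k,l}$ and the bound $L(\omega)$ used above can be checked numerically. The sketch below assumes sinusoidal tapers, $v_k(t) = \sqrt{2/(T_B+1)}\,\sin(\pi k t/(T_B+1))$, which this appendix does not specify. The names `sine_taper`, `taper_transform`, and `upper_bound_L` are hypothetical, and the exponent follows the $e^{-\mathrm{i}2\pi\omega t}$ convention.

```python
import cmath
import math


def sine_taper(k, T):
    """k-th sinusoidal taper of length T (assumed taper family)."""
    scale = math.sqrt(2.0 / (T + 1))
    return [scale * math.sin(math.pi * k * t / (T + 1)) for t in range(1, T + 1)]


def taper_transform(tapers, omega):
    """Sum over t of the product of the given tapers at time t,
    multiplied by exp(-i 2 pi omega t)."""
    T = len(tapers[0])
    total = 0j
    for t in range(1, T + 1):
        prod = 1.0
        for v in tapers:
            prod *= v[t - 1]
        total += prod * cmath.exp(-2j * math.pi * omega * t)
    return total


def upper_bound_L(omega, T):
    """L(omega) = T if |omega| <= 1/T, and 1/|omega| otherwise."""
    return T if abs(omega) <= 1.0 / T else 1.0 / abs(omega)


if __name__ == "__main__":
    T = 64
    v1, v2 = sine_taper(1, T), sine_taper(2, T)
    # Unit energy of a single taper and orthogonality of distinct tapers.
    print(abs(taper_transform([v1, v1], 0.0)))
    print(abs(taper_transform([v1, v2], 0.0)))
    print(upper_bound_L(0.0, T), upper_bound_L(0.25, T))
```

Under this assumed taper family, the first check corresponds to $\sum_t v_k(t)^2 = 1$ and the second to $H_{k,l}(0) = 0$ for $k \neq l$, the two properties the proofs rely on.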

Lemma A.2. Let $\hat{f}^{(mt)}_{b,\omega}$ be the multitaper periodogram estimator, as defined in (3.5). Under Assumption 3.1, we have

$$\mathbb{E}\, \hat{f}^{(mt)}_{b,\omega}(\tau,\sigma) = f_{u_b,\omega}(\tau,\sigma) + O\left(\log(T_B)/T_B\right) + O(1/T), \tag{A.1}$$

$$\mathrm{Cov}\left(\hat{f}^{(mt)}_{b,\omega_1}(\tau_1,\sigma_1), \hat{f}^{(mt)}_{b,\omega_2}(\tau_2,\sigma_2)\right) = \begin{cases} \dfrac{f_{u_b,\omega}(\tau_1,\tau_2)\, f_{u_b,\omega}(\sigma_1,\sigma_2)}{K} + O(1/T_B) + O(1/T), & \text{if } \omega_1 = \omega_2 = \omega, \\ O(1/T_B) + O(1/T), & \text{if } \omega_1 \neq \omega_2. \end{cases}$$

Proof. We start by noting that

$$\mathbb{E}\, \hat{f}^{(k)}_{b,\omega}(\tau,\sigma) = \int_{-1/2}^{1/2} |H_k(\alpha)|^2 f_{u_b,\omega-\alpha}(\tau,\sigma)\, d\alpha + O(1/T),$$

where $u_b$ is the midpoint of the $b$-th segment. This follows from Theorem 5.2.3 from [5] applied to the approximating series $X^{(u_b)}_t$ within the $b$-th block. Now write $f_{u_b,\omega-\alpha} = f_{u_b,\omega} + O(|\alpha|)$; using the form of $L$, we have

$$\mathbb{E}\, \hat{f}^{(k)}_{b,\omega}(\tau,\sigma) = f_{u_b,\omega}(\tau,\sigma) + O\left(\log(T_B)/T_B\right) + O(1/T).$$

Taking an average of the last expression over different tapers, we get

$$\mathbb{E}\, \hat{f}^{(mt)}_{b,\omega}(\tau,\sigma) = f_{u_b,\omega}(\tau,\sigma) + O\left(\log(T_B)/T_B\right) + O(1/T).$$

To calculate the covariance, first note that

$$\mathrm{Cov}\left(\hat{f}^{(k)}_{b_1,\omega_1}(\tau_1,\sigma_1), \hat{f}^{(k)}_{b_2,\omega_2}(\tau_2,\sigma_2)\right) = 0, \quad \text{if } b_1 \neq b_2.$$

Calculations similar to those in the proof of Theorem 5.2.8 in [5] yield

$$\mathrm{Cov}\left(\hat{f}^{(k)}_{b,\omega_1}(\tau_1,\sigma_1), \hat{f}^{(k)}_{b,\omega_2}(\tau_2,\sigma_2)\right) = \left[\, |H_{k,k}(\omega_1+\omega_2)|^2 f_{u_b,\omega_1}(\tau_1,\sigma_2) f_{u_b,\omega_1}(\tau_2,\sigma_1) + |H_{k,k}(\omega_1-\omega_2)|^2 f_{u_b,\omega_1}(\tau_1,\tau_2) f_{u_b,\omega_1}(\sigma_1,\sigma_2) \right] + O(1/T_B) + O(1/T).$$

Therefore, if $\omega_1 \neq \omega_2$ we have

$$\mathrm{Cov}\left(\hat{f}^{(k)}_{b,\omega_1}(\tau_1,\sigma_1), \hat{f}^{(k)}_{b,\omega_2}(\tau_2,\sigma_2)\right) = O(1/T_B) + O(1/T),$$

and

$$\mathrm{Cov}\left(\hat{f}^{(k)}_{b,\omega}(\tau_1,\sigma_1), \hat{f}^{(k)}_{b,\omega}(\tau_2,\sigma_2)\right) = f_{u_b,\omega}(\tau_1,\tau_2) f_{u_b,\omega}(\sigma_1,\sigma_2) + O(1/T_B) + O(1/T).$$

For local periodograms calculated for different tapers,

$$\mathrm{Cov}\left(\hat{f}^{(k)}_{b,\omega_1}(\tau_1,\sigma_1), \hat{f}^{(l)}_{b,\omega_2}(\tau_2,\sigma_2)\right) = \left[\, |H_{k,l}(\omega_1+\omega_2)|^2 f_{u_b,\omega_1}(\tau_1,\sigma_2) f_{u_b,\omega_1}(\tau_2,\sigma_1) + |H_{k,l}(\omega_1-\omega_2)|^2 f_{u_b,\omega_1}(\tau_1,\tau_2) f_{u_b,\omega_1}(\sigma_1,\sigma_2) \right] + O(1/T_B) + O(1/T).$$

By the orthogonality of the tapers, $H_{k,l}(0) = 0$ for $k \neq l$. By the Cauchy-Schwarz inequality, we have $H_{k,l}(\omega) \le \sqrt{H_{k,k}(\omega)}\sqrt{H_{l,l}(\omega)}$, and hence $L$ is indeed an upper bound for $T_B H_{k,l}(\omega)$. Therefore, in general for $k \neq l$,

$$\mathrm{Cov}\left(\hat{f}^{(k)}_{b,\omega_1}(\tau_1,\sigma_1), \hat{f}^{(l)}_{b,\omega_2}(\tau_2,\sigma_2)\right) = O(1/T_B) + O(1/T).$$

Therefore we have

$$\begin{aligned}
\mathrm{Cov}\left(\hat{f}^{(mt)}_{b,\omega}(\tau_1,\sigma_1), \hat{f}^{(mt)}_{b,\omega}(\tau_2,\sigma_2)\right)
&= \frac{1}{K^2}\sum_{k=1}^{K}\sum_{l=1}^{K} \mathrm{Cov}\left(\hat{f}^{(k)}_{b,\omega}(\tau_1,\sigma_1), \hat{f}^{(l)}_{b,\omega}(\tau_2,\sigma_2)\right) \\
&= \frac{1}{K}\left[ f_{u_b,\omega}(\tau_1,\tau_2) f_{u_b,\omega}(\sigma_1,\sigma_2) + f_{u_b,\omega}(\tau_1,\sigma_2) f_{u_b,\omega}(\tau_2,\sigma_1) \right] + O(1/T_B) + O(1/T).
\end{aligned}$$

The next theorem establishes a Central Limit Theorem type result for the multitaper periodogram estimator and is essential for proving Theorem 3.1 and Theorem 3.2.

Theorem A.1. Consider the processes $E_{b,j} \in L^2([0,1]^2)$ defined as

$$E_{b,j}(\tau,\sigma) = \sqrt{K}\left(\hat{f}^{(mt)}_{b,\omega_j}(\tau,\sigma) - f_{u_b,\omega_j}(\tau,\sigma)\right)$$

for $b = 1, \ldots, B$ and $j = 1, \ldots, J$. For fixed $B$, as $T \to \infty$, $K \to \infty$ and $T/BK \to \infty$, the finite-dimensional distributions of $\{E_{b,j}\}_{j,b}$ converge to a multivariate normal distribution. More precisely, for all $(\tau_1,\sigma_1), \ldots, (\tau_d,\sigma_d) \in [0,1]^2$ and for all $d \in \mathbb{N}$,

$$\{E_{b,j}(\tau_1,\sigma_1), \ldots, E_{b,j}(\tau_d,\sigma_d)\}_{b,j} \overset{d}{\to} \{Z_{b,j}(\tau_1,\sigma_1), \ldots, Z_{b,j}(\tau_d,\sigma_d)\}_{b,j},$$

where $\{Z_{b,j}(\tau_1,\sigma_1), \ldots, Z_{b,j}(\tau_d,\sigma_d)\}_{b,j}$ is a multivariate normal random vector with zero mean and covariance structure

$$\mathrm{Cov}\left(Z_{b_1,j}(\tau_1,\sigma_1), Z_{b_2,j}(\tau_2,\sigma_2)\right) = \begin{cases} f_{u_b,\omega_j}(\tau_1,\tau_2) f_{u_b,\omega_j}(\sigma_1,\sigma_2) + f_{u_b,\omega_j}(\tau_1,\sigma_2) f_{u_b,\omega_j}(\tau_2,\sigma_1), & \text{if } b_1 = b_2 = b, \\ 0, & \text{otherwise.} \end{cases}$$

Proof.

We will show the cumulants of the vector { E b , j ( τ , σ ),..., E b , j ( τ d , σ d )} b , j con-verge to the cumulants of the vector { Z b , j ( τ , σ ),..., Z b , j ( τ d , σ d )} b , j . As the cumu-lants of order l of the Gaussian distribution are zero for l >

2, we will show that as T → ∞ and K → ∞ , cum (cid:161) E b , j ( τ , σ ),...,..., E b l , j l ( τ l , σ l ) (cid:162) = (cid:189) o (1) if l (cid:54)= (cid:161) Z b , j ( τ , σ ), Z b , j ( τ , σ ) (cid:162) + o (1) if l = l = l ≥ cum (cid:161) E b , j ( τ , σ ),...,..., E b l , j l ( τ l , σ l ) (cid:162) = K l /2 cum (cid:181) ˆ f ( mt ) b , ω j ( τ , σ ),...,..., ˆ f ( mt ) b l , ω jl ( τ l , σ l ) (cid:182) = K l /2 (cid:88) k ··· (cid:88) k l cum (cid:181) ˆ f ( k ) b , ω j ( τ , σ ),...,..., ˆ f ( k l ) b l , ω jl ( τ l , σ l ) (cid:182) = K l /2 (cid:88) k ··· (cid:88) k l cum ( Y Y ,..., Y l Y l )where Y i = (cid:101) X ( k i ), b i , ω i T ( τ i ) and Y i = (cid:101) X ( k i ), b i , ω i T ( σ i ).Using Theorem 2.3.2 from [5], the last quantity is equal to1 K l /2 (cid:88) k ··· (cid:88) k l (cid:88) ν cum ( Y i j : i j ∈ ν )... cum ( Y i j : i j ∈ ν p ) = : (cid:88) ν C ( ν ) . Bagchi and S.A. Bruce/Frequency Band Learning for Functional Time Series where the sum is over all indecomposable partitions ν = ν ∪ ν ∪ ··· ∪ ν p of(1,1) (1,2)(2,1) (2,2)... ...( l ,1) ( l ,2).As there are a ﬁnite number of partitions, it is enough to show C ( ν ) = o (1) for all in-decomposible partitions ν for l > H k , k ,..., k m ( ω ) = O ( T − m /2 B ) if ω (cid:54)=

0. By theorthonormality and symmetry of the wave function, H k , k ,..., k m ( ω ) = O ( T − m /2 B ) if ω = k i ’s appear an even number of times in the index k ,..., k m ,and H k , k ,..., k m ( ω ) =

0, if ω = k i appear an odd number of times.Let | ν i | denote the number of elements of the set ν i . Note that ν is a partition ofa set of 2 l elements, and therefore (cid:80) pi = | ν i | = l . By Lemma A.1 and the property ofthe H k ,..., k l functions, we note that C ( ν ) = ν i in ν has at least one k i an oddnumber of times and otherwise C ( ν ) = K l /2 (cid:88) k ··· (cid:88) k l p (cid:89) i = O (cid:195) T −| ν i | /2 + (cid:179)(cid:80) j ∈ ν i ω j = π (cid:180) B (cid:33) = l (cid:88) r = O ( K r − l /2 ) O (cid:179) T − l + s ( ν ) B (cid:180) ,where r is the distinct number of k i ’s in a collection k , k ,..., k l and s ( ν ) = the num-ber of ν i in ν , such that (cid:80) j ∈ ν i ω j = r > l /2, then at least one of the k i ’s appear just once, and therefore one of thesets in the partition must have a single occurrence of that index. So it is enough toconsider the case where r ≤ l /2. Now consider the possibilities for the O (cid:179) T − l + s ( ν ) B (cid:180) term. Case 1: If p < l then s ( ν ) ≤ p < l , and therefore O (cid:179) T − l + s ( ν ) B (cid:180) = o (1). Case 2: If p > l , at least 2( p − l ) sets of the partitions have just one element. (Tosee this, suppose l is the number of sets with one element. Then we have 2 l ≥ l + p − l ).) For all those one element sets (cid:80) j ω j (cid:54)= π . Therefore s ( ν ) ≤ p − p − l ) = l − p < l and consequently O (cid:179) T − l + s ( ν ) B (cid:180) = o (1). Case 3: If p = l and at least one set in the partition has a single element, for thatset (cid:80) j ω j (cid:54)= π , therefore s ( ν ) ≤ l − O (cid:179) T − l + s ( ν ) B (cid:180) = o (1). Case 4:

Consider the case where p = l and all the partitions have 2 elements. Notethat as s ( ν ) ≤ p , if r < l /2 the product O ( K r − l /2 ) O (cid:179) T − l + s ( ν ) B (cid:180) = o (1). Therefore it isenough to consider the case where l is even and r = l /2. If l >

2, this means there areat least two distinct k i in the collection and by indecomposibility, one of the sets ν i in the partition must have two distinct k i , making C ( ν ) = . Bagchi and S.A. Bruce/Frequency Band Learning for Functional Time Series A.2. Proof of Results in Section 3

Proof of Lemma 3.1

Noting the deﬁnition of (cid:98) g u , ω in (3.6), the result follows from Lemma A.2 and somesimple algebra. Lemma

A.3 . Consider the processes H b , j ∈ L [0,1] deﬁned as H b , j ( τ , σ ) = (cid:112) K (cid:179) (cid:98) g b / B , ω j ( τ , σ ) − g u b , ω j ( τ , σ ) (cid:180) for b = B and j = J . For ﬁxed B , as T → ∞ , K → ∞ and T / BK → ∞ ,the ﬁnite dimensional distributions of { H b , j } j , b converges to a multivariate normaldistribution. More precisely, for all ( τ , σ ),...,( τ d , σ d ) ∈ [0,1] and for all d ∈ (cid:78) { H b , j ( τ , σ ),..., H b , j ( τ d , σ d )} b , j d → { Z (cid:48) b , j ( τ , σ ),..., Z (cid:48) b , j ( τ d , σ d )} b , j where { Z (cid:48) b , j ( τ , σ ),..., Z (cid:48) b , j ( τ d , σ d )} b , j is a multivariate normal random vector withzero mean and covariance structureCov (cid:179) Z (cid:48) b , j ( τ , σ ), Z (cid:48) b , j ( τ , σ ) (cid:180) = (cid:181) − B (cid:182) F ( u b , ω , ω , τ , σ , τ , σ ) + B B (cid:88) l = F ( u l , ω , ω , τ , σ , τ , σ ).Cov (cid:179) Z (cid:48) b , j ( τ , σ ), Z (cid:48) b , j ( τ , σ ) (cid:180) = − B F ( u b , ω , ω , τ , σ , τ , σ ) − B F ( u b , ω , ω , τ , σ , τ , σ ) + B B (cid:88) l = F ( u b l , ω , ω , τ , σ , τ , σ ), (A.2)where b (cid:54)= b and F ( u , ω , ω , τ , σ , τ , σ ) : = f u , ω ( τ , τ ) f u , ω ( σ , σ ) + f u , ω ( τ , σ ) f u , ω ( τ , σ ).(A.3) Proof.

In view of Lemma 3.1 it is enough to show that the joint cumulants of { H b , j ( τ , σ ),..., H b , j ( τ d , σ d )} b , j for b = B and j = J of order > T → ∞ , K → ∞ and T / K → ∞ . Note that by deﬁnition of (cid:98) g , we have H b , j ( τ , σ ) = E b , j ( τ , σ ) − B B (cid:88) b = E b , j ( τ , σ ).Therefore using the linearity of cumulants (Theorem 2.3.1 (i) & (iii) from [5]) we canwrite cum (cid:161) H b , j ( τ , σ ),...,..., H b l , j l ( τ l , σ l ) (cid:162) as sum of 2 l terms, where each term isof the form cum (cid:161) (cid:101) E b , j ( τ , σ ),...,..., (cid:101) E b l , j l ( τ l , σ l ) (cid:162) . Bagchi and S.A. Bruce/Frequency Band Learning for Functional Time Series where (cid:101) E b , j ( τ , σ ) is either E b , j ( τ , σ ) or the average B (cid:80) Bb = E b , j ( τ , σ ). Without loss ofgenerality, consider a term where the ﬁrst k (cid:101) E b , j ( τ , σ )’s are the averages and the last( l − k ) are E b , j . That particular term can then be simpliﬁed to1 B k B (cid:88) b = ··· B (cid:88) b k = cum (cid:161) E b , j ( τ , σ ),...,..., E b l , j l ( τ l , σ l ) (cid:162) .As B is ﬁnite, the last sum converges to 0 by Theorem A.1. As all the 2 l terms convergeto 0 and l is ﬁnite, this imples the convergence of joint cumulants of H b , j ( τ , σ ) fororder l > Proof of Theorem 3.1

We write Q ω , δ = (cid:82) (cid:82) Q ω , δ ( τ , σ ) d τ d σ , where Q ω , δ ( τ , σ ) = B (cid:88) b = (cid:161) (cid:98) g b / B , ω + δ ( τ , σ ) − (cid:101) g b / B , ω , δ ( τ , σ ) (cid:162) .Noting than the functional : L (cid:161) [0,1] (cid:162) (cid:55)→ (cid:82) is continuous it is enough to show that Q ω , δ ( τ , σ ) d = K B (cid:88) b = (cid:161) G b ( τ , σ ) + o p (1) (cid:162) ,This will be proved in two steps. Speciﬁcally, we will show that:(i) The result holds in ﬁnite dimensional distribution, i.e., for any k and any ﬁxed( τ , σ ),...,( τ k , σ k ) ∈ [0,1] the distribution of the random vector( Q ω , δ ( τ , σ ),..., Q ω , δ ( τ k , σ k ))is asymptotically equal to the distribution of1 K (cid:195) B (cid:88) b = (cid:161) G b ( τ , σ ) + o p (1) (cid:162) ,..., B (cid:88) b = (cid:161) G b ( τ , σ ) + o p (1) (cid:162)(cid:33) .(ii) The process { Q ω , δ ( τ , σ )} ( τ , σ ) ∈ [0,1] is asymptotically tight as a process in L ([0,1] ).Without the loss of generality, we will prove (i) for k =

1, the result for general k can be proved similarly with some more notations. Note that the scan statistics canbe written as Q ω , δ ( τ , σ ) = K B (cid:88) b = A b , K , T where A b , K , T ( τ , σ ) = K (cid:161) (cid:98) g b / B , ω + δ ( τ , σ ) − (cid:101) g b / B , ω , δ ( τ , σ ) (cid:162) . . Bagchi and S.A. Bruce/Frequency Band Learning for Functional Time Series Therefore it is enough to show the process A b , k , T ∈ L ([0,1] ) converges in distribu-tion to G b , where G b is the Gaussian process deﬁned in the statement of the Theo-rem, uniformly over b ∈ {1,2,..., B }. In order to establish that write A b , K , T ( τ , σ ) = K (cid:161) (cid:98) g b / B , ω + δ ( τ , σ ) − (cid:69) ( (cid:98) g b / B , ω + δ ( τ , σ )) − (cid:101) g b / B , ω , δ ( τ , σ ) + (cid:69) ( (cid:101) g b / B , ω , δ ( τ , σ )) (cid:162) + K (cid:161) (cid:98) g b / B , ω + δ ( τ , σ ) − (cid:101) g b / B , ω , δ ( τ , σ ) (cid:162)(cid:161) (cid:69) ( (cid:98) g b / B , ω + δ ( τ , σ )) − (cid:69) ( (cid:101) g b / B , ω , δ ( τ , σ )) (cid:162) − K (cid:161) (cid:69) ( (cid:98) g b / B , ω + δ ( τ , σ )) − (cid:69) ( (cid:101) g b / B , ω , δ ( τ , σ )) (cid:162) . (A.4)Suppose L δ is the number of frequencies ω j in the interval [ ω , ω + δ ). Using Lemma 3.1,under H we have, (cid:69) ( (cid:101) g b / B , ω , δ ( τ , σ )) = L δ (cid:88) ω j ∈ [ ω , ω + δ ) (cid:69) ( (cid:98) g b / B , ω + δ ( τ , σ )) = g u b , ω + δ ( τ , σ ) + O (log( T B )/ T B ) + O (1/ T ).Therefore by Lemma 3.1, the last two terms of (A.4) is of the order O (cid:181) K × log T B T B (cid:182) + O (cid:179) KT (cid:180) , which converges to zero under Assumption 3.1. 
Note that the order of theseresiduals are independent of the choice of block b .The ﬁrst term of (A.4) can be written as T ( E b , j ( τ , σ ),..., E b , j L δ + ( τ , σ )), where E b , j is the process deﬁned in Theorem A.1, the set of frequencies { ω j ,..., ω j L δ } = { ω : ω ∈ [ ω , ω + δ )}, ω j L δ + = ω + δ and the function T : (cid:82) L δ + (cid:55)→ (cid:82) is deﬁned as T ( x ,..., x L δ + ) = x L δ + − L δ L δ (cid:88) i = x i .Therefore an application of the Delta method along with Theorem A.1 guarantee T ( E b , j ( τ , σ ),..., E b , j L δ + ( τ , σ )) d → G b ( τ , σ )where G b is a zero mean Gaussian process with covariance kernel given in Theo-rem 3.1. The weak convergence of A b , K , T ( τ , σ ) to G b ( τ , σ ) then follows by the Contin-uous Mapping Theorem and Slutzky’s Theorem. This in turn proves the asymptoticﬁnite dimensional distributional equivalence in (i).Part (ii) follows from Lemma A.1 by (4.3) of Theorem 2 from [8]. Proof of Theorem 3.2

The proof is similar to the proof of Theorem 3.1. The only difference is in the treatment of the residual (second and third) terms of $A_{b,K,T}(\tau,\sigma)$ defined in (A.4). Note that under the alternative specified in the statement of the Theorem,

$$
\begin{aligned}
\mathbb{E}(\tilde{g}_{b/B,\omega,\delta}(\tau,\sigma)) &= \frac{1}{L_\delta}\sum_{\omega_j \in [\omega,\omega+\delta)} \mathbb{E}(\hat{g}_{b/B,\omega_j}(\tau,\sigma)) \\
&= \frac{|\{\omega_j : \omega_j \in (\omega,\omega^*)\}|}{L_\delta}\, g^{(1)}_{u_b}(\tau,\sigma) + \frac{|\{\omega_j : \omega_j \in [\omega^*,\omega+\delta)\}|}{L_\delta}\, g^{(2)}_{u_b}(\tau,\sigma) + O(\log(T_B)/T_B) + O(1/T) \\
&= \frac{\omega^*-\omega}{\delta}\, g^{(1)}_{u_b}(\tau,\sigma) + \frac{\omega+\delta-\omega^*}{\delta}\, g^{(2)}_{u_b}(\tau,\sigma) + O(\log(T_B)/T_B) + O(1/T) \\
&= g^{(2)}_{u_b}(\tau,\sigma) + \frac{\omega^*-\omega}{\delta}\left(g^{(1)}_{u_b}(\tau,\sigma) - g^{(2)}_{u_b}(\tau,\sigma)\right) + O(\log(T_B)/T_B) + O(1/T).
\end{aligned}
$$

Note that the step replacing the frequency counts by the length ratios $(\omega^*-\omega)/\delta$ and $(\omega+\delta-\omega^*)/\delta$ follows from the fact that the $\omega_j$'s are chosen equally spaced. Therefore the residual is

$$K\left(\mathbb{E}(\hat{g}_{b/B,\omega+\delta}(\tau,\sigma)) - \mathbb{E}(\tilde{g}_{b/B,\omega,\delta}(\tau,\sigma))\right) + o_p(1) = \frac{\omega^*-\omega}{\delta}\left(g^{(1)}_{u_b}(\tau,\sigma) - g^{(2)}_{u_b}(\tau,\sigma)\right) + o_p(1).$$

The rest of the proof is similar to the proof of Theorem 3.1.

Proof of Theorem 3.3

We write

$$Q(\omega_1,\omega_2) = \frac{1}{K}\sum_{b=1}^{B}\int_{\omega_1}^{\omega_2}\int_0^1\int_0^1 \left(\sqrt{K}\,\hat{g}_{b/B,\omega}(\tau,\sigma)\right)^2 d\tau\, d\sigma\, d\omega = \int_{\omega_1}^{\omega_2}\int_0^1\int_0^1 Q(\omega,\tau,\sigma)\, d\tau\, d\sigma\, d\omega,$$

where

$$Q(\omega,\tau,\sigma) = \frac{1}{K}\sum_{b=1}^{B}\left(\sqrt{K}\,\hat{g}_{b/B,\omega}(\tau,\sigma)\right)^2.$$

An application of Lemma 3.1 along with the continuous mapping theorem guarantees under $H_0$ that the finite dimensional distributions of $Q(\omega,\tau,\sigma)$ are asymptotically equivalent to $\frac{1}{K}\sum_{b=1}^{B} H_b^2(\tau,\sigma)$. The rest of the proof is similar to the proof of Theorem 3.1.

Proof of Lemma 3.2

Using a similar expansion as in the proof of Theorem 3.3, under the alternative we write

$$
\begin{aligned}
Q(\omega,\tau,\sigma) &= \frac{1}{K}\sum_{b=1}^{B}\left(\sqrt{K}\,\hat{g}_{b/B,\omega}(\tau,\sigma)\right)^2 \\
&= \frac{1}{K}\sum_{b=1}^{B}\left(\sqrt{K}\left[\hat{g}_{b/B,\omega}(\tau,\sigma) - g_u(\tau,\sigma) + g_u(\tau,\sigma)\right]\right)^2 \\
&= \frac{1}{K}\sum_{b=1}^{B}\left(\sqrt{K}\left[\hat{g}_{b/B,\omega}(\tau,\sigma) - g_u(\tau,\sigma)\right]\right)^2 + B\, g_u^2(\tau,\sigma) + \frac{2\, g_u(\tau,\sigma)}{\sqrt{K}}\sum_{b=1}^{B}\sqrt{K}\left[\hat{g}_{b/B,\omega}(\tau,\sigma) - g_u(\tau,\sigma)\right].
\end{aligned}
$$

As the second term dominates under the asymptotic scheme in Assumption 3.1, the quantity $Q(\omega,\tau,\sigma) = O_p(B)$. Integrating over $\omega$, $\tau$, and $\sigma$, we have $Q(\omega_1,\omega_2) = O_p(B)$ and the result follows.

A.3. Asymptotic Distribution Properties

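By Lemma 3.1 the covariance of the process $H_b$ defined in Theorem 3.3 is given by

$$\mathrm{Cov}\left(H_b(\tau_1,\sigma_1), H_b(\tau_2,\sigma_2)\right) = C(b,b,\omega,\omega,\tau_1,\sigma_1,\tau_2,\sigma_2), \tag{A.5}$$

where

$$
\begin{aligned}
C(b,b,\omega_1,\omega_2,\tau_1,\sigma_1,\tau_2,\sigma_2) ={}& \left(1-\frac{2}{B}\right)\left[f_{u_b,\omega_1}(\tau_1,\tau_2)\, f_{u_b,\omega_2}(\sigma_1,\sigma_2) + f_{u_b,\omega_1}(\tau_1,\sigma_2)\, f_{u_b,\omega_2}(\tau_2,\sigma_1)\right] \\
&+ \frac{1}{B^2}\sum_{l=1}^{B}\left[f_{u_l,\omega_1}(\tau_1,\tau_2)\, f_{u_l,\omega_2}(\sigma_1,\sigma_2) + f_{u_l,\omega_1}(\tau_1,\sigma_2)\, f_{u_l,\omega_2}(\tau_2,\sigma_1)\right],
\end{aligned}
\tag{A.6}
$$

$$
\begin{aligned}
C(b_1,b_2,\omega_1,\omega_2,\tau_1,\sigma_1,\tau_2,\sigma_2) ={}& -\frac{1}{B}\left[f_{u_{b_1},\omega_1}(\tau_1,\tau_2)\, f_{u_{b_1},\omega_2}(\sigma_1,\sigma_2) + f_{u_{b_1},\omega_1}(\tau_1,\sigma_2)\, f_{u_{b_1},\omega_2}(\tau_2,\sigma_1)\right] \\
&- \frac{1}{B}\left[f_{u_{b_2},\omega_1}(\tau_1,\tau_2)\, f_{u_{b_2},\omega_2}(\sigma_1,\sigma_2) + f_{u_{b_2},\omega_1}(\tau_1,\sigma_2)\, f_{u_{b_2},\omega_2}(\tau_2,\sigma_1)\right] \\
&+ \frac{1}{B^2}\sum_{l=1}^{B}\left[f_{u_l,\omega_1}(\tau_1,\tau_2)\, f_{u_l,\omega_2}(\sigma_1,\sigma_2) + f_{u_l,\omega_1}(\tau_1,\sigma_2)\, f_{u_l,\omega_2}(\tau_2,\sigma_1)\right],
\end{aligned}
$$

where $b_1 \neq b_2$.

The covariance structure of the process $G_b$ is given by

$$
\begin{aligned}
\mathrm{Cov}\left(G_b(\tau_1,\sigma_1), G_b(\tau_2,\sigma_2)\right) ={}& C(b,b,\omega+\delta,\omega+\delta,\tau_1,\sigma_1,\tau_2,\sigma_2) + \frac{1}{L_\delta^2}\sum_{j=1}^{L_\delta}\sum_{k=1}^{L_\delta} C(b,b,\omega_j,\omega_k,\tau_1,\sigma_1,\tau_2,\sigma_2) \\
&- \frac{1}{L_\delta}\sum_{j=1}^{L_\delta} C(b,b,\omega+\delta,\omega_j,\tau_1,\sigma_1,\tau_2,\sigma_2) - \frac{1}{L_\delta}\sum_{j=1}^{L_\delta} C(b,b,\omega_j,\omega+\delta,\tau_1,\sigma_1,\tau_2,\sigma_2),
\end{aligned}
\tag{A.7}
$$

and for $b_1 \neq b_2$,

$$
\begin{aligned}
\mathrm{Cov}\left(G_{b_1}(\tau_1,\sigma_1), G_{b_2}(\tau_2,\sigma_2)\right) ={}& C(b_1,b_2,\omega+\delta,\omega+\delta,\tau_1,\sigma_1,\tau_2,\sigma_2) + \frac{1}{L_\delta^2}\sum_{j=1}^{L_\delta}\sum_{k=1}^{L_\delta} C(b_1,b_2,\omega_j,\omega_k,\tau_1,\sigma_1,\tau_2,\sigma_2) \\
&- \frac{1}{L_\delta}\sum_{j=1}^{L_\delta} C(b_1,b_2,\omega+\delta,\omega_j,\tau_1,\sigma_1,\tau_2,\sigma_2) - \frac{1}{L_\delta}\sum_{j=1}^{L_\delta} C(b_1,b_2,\omega_j,\omega+\delta,\tau_1,\sigma_1,\tau_2,\sigma_2),
\end{aligned}
\tag{A.8}
$$

where $C$ is as defined in (A.6).

Lemma A.4. As $B \to \infty$, the quantities $\frac{1}{B}\sum_{b=1}^{B}\|G_b\|^2$ and $\frac{1}{B}\sum_{b=1}^{B}\|H_b\|^2$ are $O_p(1)$.

Proof.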

We will show this for the case of $G_b$. The proof for $H_b$ is similar.

Note that

$$
\begin{aligned}
\mathbb{E}\left(\frac{1}{B}\sum_{b=1}^{B}\|G_b\|^2\right) &= \frac{1}{B}\sum_{b=1}^{B}\mathbb{E}\left(\|G_b\|^2\right) = \frac{1}{B}\sum_{b=1}^{B}\mathbb{E}\left(\int_0^1\int_0^1 G_b^2(\tau,\sigma)\, d\tau\, d\sigma\right) \\
&= \frac{1}{B}\sum_{b=1}^{B}\int_0^1\int_0^1 \mathbb{E}\, G_b^2(\tau,\sigma)\, d\tau\, d\sigma \\
&\to \int_0^1\int_0^1\int_0^1 \left[f_{u,\omega+\delta}(\tau,\tau)\, f_{u,\omega+\delta}(\sigma,\sigma) + f^2_{u,\omega+\delta}(\tau,\sigma)\right] d\tau\, d\sigma\, du \\
&\quad + \frac{1}{L_\delta^2}\sum_{j=1}^{L_\delta}\sum_{k=1}^{L_\delta}\int_0^1\int_0^1\int_0^1 \left[f_{u,\omega_j}(\tau,\tau)\, f_{u,\omega_k}(\sigma,\sigma) + f_{u,\omega_j}(\tau,\sigma)\, f_{u,\omega_k}(\tau,\sigma)\right] d\tau\, d\sigma\, du \\
&\quad - \frac{2}{L_\delta}\sum_{j=1}^{L_\delta}\int_0^1\int_0^1\int_0^1 \left[f_{u,\omega_j}(\tau,\tau)\, f_{u,\omega+\delta}(\sigma,\sigma) + f_{u,\omega_j}(\tau,\sigma)\, f_{u,\omega+\delta}(\tau,\sigma)\right] d\tau\, d\sigma\, du,
\end{aligned}
$$

as $B \to \infty$. As $f_{u,\omega}(\cdot)$ is continuous in $u$ and $f_{u,\omega}$ is square integrable for all $u \in [0,1]$ and $\omega \in (0,0.5]$, the integrals in the limit are finite. Note that we can exchange the integral and expectation in the second line by Fubini's Theorem as the double integral is finite.
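The three groups of terms in the limit above share the structure of the variance of a linear contrast $x_{L+1} - L^{-1}\sum_{i=1}^{L} x_i$: for a symmetric covariance matrix $C$, this variance is $C_{L+1,L+1} + L^{-2}\sum_{j,k} C_{jk} - (2/L)\sum_{j} C_{j,L+1}$. The sketch below checks this against the direct quadratic form $w^\top C w$ with $w = (-1/L,\ldots,-1/L,1)$. The function names and the toy matrix are illustrative assumptions standing in for kernel values at a fixed $(\tau,\sigma)$, not quantities from the paper.

```python
def contrast_variance(C):
    """Variance of x_{L+1} - mean(x_1..x_L) from a symmetric (L+1)x(L+1) covariance.

    The last index plays the role of the frequency omega + delta; the first L
    indices play the role of the frequencies inside the band.
    """
    L = len(C) - 1
    endpoint = C[L][L]                                              # Var of x_{L+1}
    band = sum(C[j][k] for j in range(L) for k in range(L)) / L**2  # Var of the band mean
    cross = 2.0 * sum(C[j][L] for j in range(L)) / L                # twice the covariance
    return endpoint + band - cross


def quadratic_form(C):
    """Same variance computed directly as w' C w with contrast weights w."""
    L = len(C) - 1
    w = [-1.0 / L] * L + [1.0]
    n = L + 1
    return sum(w[j] * C[j][k] * w[k] for j in range(n) for k in range(n))


# Toy positive semi-definite covariance C = A A^T (hypothetical values).
A = [[1.0, 0.2, 0.0],
     [0.3, 1.5, 0.1],
     [0.0, 0.4, 0.8],
     [0.5, 0.0, 1.1]]
C = [[sum(A[i][m] * A[j][m] for m in range(3)) for j in range(4)] for i in range(4)]

print(contrast_variance(C), quadratic_form(C))
```

The two printed values should agree up to rounding; the factor $2/L_\delta$ in the last term arises because the two cross-covariance sums coincide when the covariance is evaluated on the diagonal $\tau_1=\tau_2$, $\sigma_1=\sigma_2$.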
For the variance,

$$\mathrm{Var}\left(\frac{1}{B}\sum_{b=1}^{B}\|G_b\|^2\right) = \frac{1}{B^2}\sum_{b=1}^{B}\mathrm{Var}\left(\|G_b\|^2\right) + \frac{1}{B^2}\sum_{b_1=1}^{B}\sum_{b_2 \neq b_1}\mathrm{Cov}\left(\|G_{b_1}\|^2, \|G_{b_2}\|^2\right).$$

The first term can be bounded as

$$\frac{1}{B^2}\sum_{b=1}^{B}\mathrm{Var}\left(\|G_b\|^2\right) \le \frac{1}{B^2}\sum_{b=1}^{B}\mathbb{E}\left(\|G_b\|^4\right) = \frac{1}{B^2}\sum_{b=1}^{B}\mathbb{E}\left(\int_0^1\int_0^1 G_b^2(\tau,\sigma)\, d\tau\, d\sigma\right)^2 \le \frac{1}{B^2}\sum_{b=1}^{B}\left(\int_0^1\int_0^1 \mathbb{E}\, G_b^4(\tau,\sigma)\, d\tau\, d\sigma\right) = O(1/B).$$

Similarly, with some standard algebra we can show that

$$\frac{1}{B^2}\sum_{b_1=1}^{B}\sum_{b_2 \neq b_1}\mathrm{Cov}\left(\|G_{b_1}\|^2, \|G_{b_2}\|^2\right) = \frac{1}{B^2}\sum_{b_1=1}^{B}\sum_{b_2 \neq b_1}\left[\mathbb{E}\left(\|G_{b_1}\|^2\,\|G_{b_2}\|^2\right) - \mathbb{E}\|G_{b_1}\|^2\,\mathbb{E}\|G_{b_2}\|^2\right] = O(1/B).$$

Therefore as $B \to \infty$, $\mathbb{E}\left(\frac{1}{B}\sum_{b=1}^{B}\|G_b\|^2\right) = O(1)$ and $\mathrm{Var}\left(\frac{1}{B}\sum_{b=1}^{B}\|G_b\|^2\right) \to 0$, so $\frac{1}{B}\sum_{b=1}^{B}\|G_b\|^2$ is $O_p(1)$.

Acknowledgements

Research reported in this publication was supported by the National Institute of General Medical Sciences of the National Institutes of Health under Award Number R01GM140476. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Supplementary Material

R code for "Adaptive Frequency Band Analysis for Functional Time Series" (https://github.com/sbruce23/fEBA). R code, a quick start demo, and descriptions of all functions and parameters needed to generate simulated data introduced in Section 4.1 and to implement the proposed method on data for use in practice can be downloaded from GitHub at the URL above.

References

[1] Barry, R. J., Clarke, A. R., Johnstone, S. J., Magee, C. A., and Rushby, J. A. (2007). EEG differences between eyes-closed and eyes-open resting conditions. Clinical Neurophysiology, 118(12):2765–2773.
[2] Barry, R. J. and De Blasio, F. M. (2017). EEG differences between eyes-closed and eyes-open resting remain in healthy ageing. Biological Psychology, 129:293–304.
[3] Benoit, O., Daurat, A., and Prado, J. (2000). Slow (0.7–2 Hz) and fast (2–4 Hz) delta components are differently correlated to theta, alpha and beta frequency bands during NREM sleep. Clinical Neurophysiology, 111(12):2103–2106.
[4] Billman, G. (2011). Heart rate variability - A historical perspective. Frontiers in Physiology, 2:86.
[5] Brillinger, D. R. (2002). Time Series: Data Analysis and Theory. Philadelphia: SIAM.
[6] Bruce, S. A., Tang, C. Y., Hall, M. H., and Krafty, R. T. (2020). Empirical frequency band analysis of nonstationary time series. Journal of the American Statistical Association, 115:1933–1945.
[7] Causeur, D., Kloareg, M., and Friguet, C. (2009). Control of the FWER in multiple testing under dependence. Communications in Statistics - Theory and Methods, 38(16-17):2733–2747.
[8] Cremers, H. and Kadelka, D. (1986). On weak convergence of integral functionals of stochastic processes with applications to processes taking paths in $L_p^E$. Stochastic Processes and their Applications, 21(2):305–317.
[9] Dahlhaus, R. (1985). Asymptotic normality of spectral estimates. Journal of Multivariate Analysis, 16(3):412–431.
[10] Doppelmayr, M., Klimesch, W., Pachinger, T., and Ripper, B. (1998). Individual differences in brain dynamics: Important implications for the calculation of event-related band power. Biological Cybernetics, 79(1):49–57.
[11] Efron, B. (2007). Correlation and large-scale simultaneous significance testing. Journal of the American Statistical Association, 102(477):93–103.
[12] Glendinning, R. and Fleet, S. (2007). Classifying functional time series. Signal Processing, 87(1):79–100.
[13] Grenander, U. (1950). Stochastic processes and statistical inference. Arkiv för matematik, 1(3):195–277.
[14] Hochberg, Y. (1988). A sharper Bonferroni procedure for multiple tests of significance. Biometrika, 75(4):800–802.
[15] Horváth, L., Liu, Z., Rice, G., and Wang, S. (2020). A functional time series analysis of forward curves derived from commodity futures. International Journal of Forecasting, 36(2):646–665.
[16] Klimesch, W. (1999). EEG alpha and theta oscillations reflect cognitive and memory performance: A review and analysis. Brain Research Reviews, 29(2):169–195.
[17] Klimesch, W., Doppelmayr, M., Russegger, H., Pachinger, T., and Schwaiger, J. (1998). Induced alpha band power changes in the human EEG and attention. Neuroscience Letters, 244(2):73–76.
[18] Lenssen, N. J. L., Schmidt, G. A., Hansen, J. E., Menne, M. J., Persin, A., Ruedy, R., and Zyss, D. (2019). Improvements in the GISTEMP uncertainty model. Journal of Geophysical Research: Atmospheres, 124(12):6307–6326.
[19] Malik, M., Bigger, J. T., Camm, A. J., Kleiger, R. E., Malliani, A., Moss, A. J., and Schwartz, P. J. (1996). Heart rate variability. Standards of measurement, physiological interpretation, and clinical use. European Heart Journal, 17(3):354–381.
[20] Nacy, S. M., Kbah, S. N., Jafer, H. A., and Al-Shaalan, I. (2016). Controlling a servo motor using EEG signals from the primary motor cortex. American Journal of Biomedical Engineering, 6(5):139–146.
[21] Newson, J. J. and Thiagarajan, T. C. (2019). EEG frequency bands in psychiatric disorders: A review of resting state studies. Frontiers in Human Neuroscience, 12:521.
[22] Panaretos, V. M. and Tavakoli, S. (2013). Cramér–Karhunen–Loève representation and harmonic principal component analysis of functional time series. Stochastic Processes and their Applications, 123(7):2779–2807.
[23] Panaretos, V. M. and Tavakoli, S. (2013). Fourier analysis of stationary time series in function space. The Annals of Statistics, 41(2):568–603.
[24] R Core Team (2020). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
[25] Ramsay, J. (1982). When the data are functions. Psychometrika, 47(4):379–396.
[26] Ramsay, J. O. and Dalzell, C. J. (1991). Some tools for functional data analysis. Journal of the Royal Statistical Society, B, 53:539–572.
[27] Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66(336):846–850.
[28] Rao, C. R. (1958). Some statistical methods for comparison of growth curves. Biometrics, 14(1):1–17.
[29] Riedel, K. S. and Sidorenko, A. (1995). Minimum bias multiple taper spectral estimation. IEEE Transactions on Signal Processing, 43(1):188–195.
[30] Rubín, T. and Panaretos, V. M. (2020). Sparsely observed functional time series: estimation and prediction. Electronic Journal of Statistics, 14(1):1137–1210.
[31] Shang, H. L. and Hyndman, R. J. (2017). Grouped functional time series forecasting: An application to age-specific mortality rates. Journal of Computational and Graphical Statistics, 26(2):330–343.
[32] Stoehr, C., Aston, J. A. D., and Kirch, C. (2020). Detecting changes in the covariance structure of functional time series with application to fMRI data. Econometrics and Statistics.
[33] Storey, J. D. (2007). The optimal discovery procedure: A new approach to simultaneous significance testing. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(3):347–368.
[34] Thomson, D. J. (1982). Spectrum estimation and harmonic analysis. Proceedings of the IEEE, 70(9):1055–1096.
[35] Trujillo, L. T., Stanfield, C. T., and Vela, R. D. (2017). The effect of electroencephalogram (EEG) reference choice on information-theoretic measures of the complexity and integration of EEG signals. Frontiers in Neuroscience, 11:425.
[36] van Delft, A. and Eichler, M. (2018). Locally stationary functional time series. Electronic Journal of Statistics, 12(1):107–170.
[37] Walden, A. T., McCoy, E. J., and Percival, D. B. (1995). The effective bandwidth of a multitaper spectral estimator. Biometrika, 82(1):201–214.
[38] Wei, J., Chen, T., Li, C., Liu, G., Qiu, J., and Wei, D. (2018). Eyes-open and eyes-closed resting states with opposite brain activity in sensorimotor and occipital regions: Multidimensional evidences from machine learning perspective. Frontiers in Human Neuroscience, 12:422.
[39] Weidmann, J. (1980).