Adaptive Estimation for Non-stationary Factor Models And A Test for Static Factor Loadings
Weichi Wu∗
Center for Statistical Science, Department of Industrial Engineering, Tsinghua University, 100084 Beijing, China

Zhou Zhou†
Department of Statistical Sciences, University of Toronto, Ontario M5S 3G3, Canada
January 1, 2021
Abstract
This paper considers the estimation and testing of a class of high-dimensional non-stationary time series factor models with evolutionary temporal dynamics. In particular, the entries and the dimension of the factor loading matrix are allowed to vary with time while the factors and the idiosyncratic noise components are locally stationary. We propose an adaptive sieve estimator for the span of the varying loading matrix and the locally stationary factor processes. A uniformly consistent estimator of the effective number of factors is investigated via eigenanalysis of a non-negative definite time-varying matrix. A high-dimensional bootstrap-assisted test for the hypothesis of static factor loadings is proposed by comparing the kernels of the covariance matrices of the whole time series with their local counterparts. We examine our estimator and test via simulation studies and a real data analysis.
Keywords: Time series factor model, local stationarity, high dimensional time series, test of static factor loadings, adaptive estimation.
∗ The author gratefully acknowledges the NSFC Young program (No. 11901337).
† The author gratefully acknowledges NSERC.

1 Introduction

Technology advancement has made it easy to record simultaneously a large number of stochastic processes of interest over a relatively long period of time where the underlying data generating mechanisms are likely to evolve. To model such data, we consider the evolutionary factor model

$$X_{i,n} = A(i/n)\, z_{i,n} + e_{i,n}, \qquad (1)$$

where $\{X_{i,n}\}_{i=1}^n$ is a $p$-dimensional observed time series, $A(t): [0,1] \to \mathbb{R}^{p\times d(t)}$ is a matrix-valued function of possibly time-varying factor loadings and the number of factors $d(t)$ is assumed to be a piece-wise constant function of time, $\{z_{i,n}\}_{i=1}^n$ is a $d(i/n)$-dimensional unobserved sequence of common factors and $\{e_{i,n}\}_{i=1}^n$ are the idiosyncratic components. Here $p = p_n$ may diverge to infinity with the time series length $n$, and $d(t)$ is typically much smaller than $p$ uniformly over $t$. Note that $X_{i,n}$, $z_{i,n}$ and $e_{i,n}$ are allowed to be non-stationary processes. Throughout the article we assume that $\{e_{i,n}\}$ and $\{z_{i,n}\}$ are centered.

The version of model (1) with constant loadings is among the most popular dimension reduction tools for the analysis of multivariate stationary time series (Stock and Watson (2011), Tsay (2013), Wei (2018)). According to the model assumptions adopted and the estimation methods used, recent literature on linear time series factor models mainly falls into two types. The cross-sectional averaging method (summarized in Stock and Watson (2011)), which is popular in the econometric literature of linear factor models, exploits the assumption of weak dependence among the vector components of $e_{i,n}$ and hence achieves de-noising via cross-sectional averaging. See for instance Stock and Watson (2002), Bai and Ng (2002), Bai (2003) and Forni et al. (2000) among many others. One advantage of the cross-sectional averaging method is that it allows for a very high dimensionality. In fact the method requires that $p$ diverges to achieve consistency, and the estimation accuracy improves as $p$ gets larger under the corresponding model assumptions. On the other hand, the linear factor model can also be fitted by exploring the relationship between the factor loading space and the auto-covariance or the spectral density matrices of the time series under appropriate assumptions.
This method dates back at least to the works of Anderson (1963), Brillinger (2001) and Pena and Box (1987) among others for fixed dimensional multivariate time series, and is extended to the high dimensional setting by the excellent recent works of Lam et al. (2011), Lam and Yao (2012), Wang et al. (2019) and others. The latter method allows for stronger contemporary dependence among the vector components and is consistent when $p$ is fixed under the requirement that the idiosyncratic components form a white noise.

To date, the literature on evolutionary linear factor models is scarce and the existing results are focused on extensions of the cross-sectional averaging method. Among others, Breitung and Eickmeier (2011) and Yamamoto and Tanaka (2015) considered testing for structural changes in the factor loadings, and Motta et al. (2011) and Su and Wang (2017) considered the evolutionary model (1) using the cross-sectional averaging method assuming that the number of factors is constant over time. In this paper, we shall extend the second estimation method mentioned in the last paragraph to the case of evolutionary factor loadings with locally stationary factor and idiosyncratic component time series whose data generating mechanisms change smoothly over time. In particular, the number of factors is allowed to vary with time. We adapt the method of sieves (Grenander (1981), Chen (2007)) to estimate the high-dimensional auto-covariance matrices of $X_{i,n}$ at each time point and subsequently estimate the loadings $A(t)$ at each $t$ exploiting the relationship between $A(\cdot)$ and the kernel of the latter local auto-covariance matrices. The sieve method is computationally simple to implement and is adaptive to the unknown smoothness of the target function if certain linear sieves such as the Fourier basis (for periodic functions), the Legendre polynomials and the orthogonal wavelets are used (Chen (2007), Wang and Xiang (2012)).
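As a concrete illustration of one such linear sieve, the sketch below builds the orthonormal shifted Legendre polynomials on $[0,1]$ with numpy and checks their orthonormality numerically. This is only an example of a basis satisfying the kind of conditions discussed here, not code from the paper; the normalization $B_j(t) = \sqrt{2j+1}\,P_j(2t-1)$ is a standard choice.

```python
import numpy as np
from numpy.polynomial import legendre as L

def legendre_basis(J, t):
    """Evaluate the first J orthonormal shifted Legendre polynomials on [0, 1].

    B_j(t) = sqrt(2j + 1) * P_j(2t - 1), j = 0, ..., J-1, so that
    int_0^1 B_j(t) B_k(t) dt = 1{j == k}.
    """
    t = np.asarray(t, dtype=float)
    x = 2.0 * t - 1.0  # map [0, 1] onto [-1, 1]
    out = np.empty((len(t), J))
    for j in range(J):
        coef = np.zeros(j + 1)
        coef[j] = 1.0                      # selects the Legendre polynomial P_j
        out[:, j] = np.sqrt(2 * j + 1) * L.legval(x, coef)
    return out

# check orthonormality by midpoint-rule integration on a fine grid
t = (np.arange(2000) + 0.5) / 2000
B = legendre_basis(5, t)
gram = B.T @ B / len(t)                    # approximates int_0^1 B_j B_k dt
assert np.allclose(gram, np.eye(5), atol=1e-3)
```

The same construction works for trigonometric bases by swapping the basis evaluation, which is what makes the sieve approach easy to adapt to different smoothness classes.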
We will show that the span of $A(\cdot)$ can be estimated at a rate independent of $p$ uniformly over time provided that all factors are strong with order $\sqrt{p}$ Euclidean norms, extending the corresponding result for factor models with static loadings established in Lam et al. (2011). Uniform consistency of the estimated number of factors will be established without assuming that the positive eigenvalues of the corresponding matrix are distinct.

Testing whether $A(\cdot)$ is constant over time is important in the application of (1). In the literature, Su and Wang (2017) considered an $L^2$ test of static factor loadings under the cross-sectional averaging method assuming that each component of $e_{i,n}$ is a martingale difference sequence. Here we shall propose a high-dimensional $L^\infty$ or maximum deviation test which utilizes the observation that the kernel of the full-sample auto-covariance matrices coincides with all of its local counterparts under the null hypothesis of static loadings, while the latter observation is likely to fail when $A(\cdot)$ is time-varying. Using the uniform convergence rates of the estimated factor loadings established in this paper, the test statistic will be shown to be asymptotically equivalent to the maximum deviation of the sum of a high-dimensional locally stationary time series under some mild conditions. A multiplier bootstrap procedure with overlapping blocks is adapted to approximate the critical values of the test. The bootstrap will be shown to be asymptotically correct under the null and powerful under a large class of local alternatives.

The paper is organized as follows. Section 2 introduces the locally stationary factor models and their assumptions. Sections 3 and 4 discuss the estimation of the evolutionary factor loadings and the test of static factor loadings, respectively. Theoretical results are contained in Section 5 and numerical studies are displayed in Section 6. Section 7 illustrates a real world financial data application.
Finally, we provide a supplementary material which includes extra assumptions, tuning parameter selection, power analysis for the test of static factor loadings and all proofs.

We start with introducing some notation. For two series $a_n$ and $b_n$, write $a_n \asymp b_n$ if there exist $0 < m_1 \le m_2 < \infty$ such that $m_1 \le \liminf |a_n|/|b_n| \le \limsup |a_n|/|b_n| \le m_2 < \infty$. Write $a_n \lesssim b_n$ ($a_n \gtrsim b_n$) if there exists a uniform constant $M$ such that $a_n \le M b_n$ ($a_n \ge M b_n$). For any $p$-dimensional (random) vector $v = (v_1, ..., v_p)^\top$, write $\|v\|_u = (\sum_{s=1}^p |v_s|^u)^{1/u}$, and the corresponding $\mathcal{L}^v$ norm $\|V\|_{\mathcal{L}^v} = (E\|V\|^v)^{1/v}$ for $v \ge 1$. For any matrix $F$ let $\lambda_{\max}(F)$ be its largest eigenvalue and $\lambda_{\min}(F)$ its smallest eigenvalue. Let $\|F\|_F = (\mathrm{trace}(F^\top F))^{1/2}$ denote the Frobenius norm, and let $\|F\|_{\min}$ be the smallest nonzero singular value of $F$. For any vector or matrix $A = (a_{ij})$, let $|A|_\infty = \max_{i,j}|a_{ij}|$. For any integer $v$, denote by $I_v$ the $v \times v$ identity matrix. Write $|\mathcal{I}|$ for the length of an interval $\mathcal{I}$. For any matrix $M$, denote by $Vec(M)$ the vector obtained by stacking the columns of $M$. Let $d = \sup_{0 \le t \le 1} d(t)$.

We shall first demonstrate that the evolutionary factor model (1) is equivalent to a factor model with a fixed factor dimension $d$ while the rank of the loading matrix varies with time. The latter representation is beneficial for the theoretical investigation of the model and provides insights on how smooth changes in the loading matrix cause the dimensionality of the factor space to change. Consider a $p \times d$ matrix $A^*(t) = (a_1^*(t), ..., a_d^*(t))$ where $a_s^*(t)$, $1 \le s \le d$, are $p$-dimensional smooth functions, $\sup_{t\in[0,1]} \|a_s^*(t)\|^2 \asymp p^{1-\delta}$, and $\inf_{t\in[0,1]} \|A^*(t)\|_F^2 \asymp p^{1-\delta}$, $\sup_{t\in[0,1]} \|A^*(t)\|_F^2 \asymp p^{1-\delta}$. Here $\delta$ refers to the factor strength and will be discussed in condition (C1) later. Let $z^*_{i,n}$ and $e_{i,n}$ be $d$- and $p$-dimensional locally stationary time series. Here local stationarity refers to a slowly or smoothly time-varying data generating mechanism of a time series; we refer to Dahlhaus (1997) and Zhou and Wu (2009) among others for rigorous treatments. Consider the following model with a fixed-dimensional loading matrix:

$$X_{i,n} = A^*(i/n)\, z^*_{i,n} + e_{i,n}. \qquad (2)$$

Using singular value decomposition (SVD), model (2) can be written as

$$X_{i,n} = p^{(1-\delta)/2}\, U^*(i/n)\Sigma^*(i/n)V^{*\top}(i/n)\, z^*_{i,n} + e_{i,n}, \qquad (3)$$

where $\Sigma^*(i/n)$ is a $p \times d$ rectangular diagonal matrix with diagonal entries $\Sigma^*_{uu}(i/n) = \sigma_u(i/n)$ for $1 \le u \le d$, where $(\sigma_u(i/n))_{1\le u\le d}$ are the singular values of $A^*(i/n)/p^{(1-\delta)/2}$, $U^*(i/n)U^{*\top}(i/n) = I_p$ and $V^{*\top}(i/n)V^*(i/n) = I_d$. Therefore $\max_{1\le u\le d}\sup_{t\in(0,1)}\sigma_u(t)$ is bounded. Further, the $(\sigma_u(i/n))_{1\le u\le d}$ are ordered such that $\sigma_1(i/n) \ge ... \ge \sigma_d(i/n) \ge 0$. Let $d(t)$ be the number of nonzero singular values $\sigma_u(t)$; then equation (3) can be further written as

$$X_{i,n} = p^{(1-\delta)/2}\, \tilde U^*(i/n)\tilde\Sigma^*(i/n)\tilde V^{*\top}(i/n)\, z^*_{i,n} + e_{i,n}, \qquad (4)$$

where $\tilde\Sigma^*(i/n)$ is the matrix obtained by deleting the rows and columns of $\Sigma^*(i/n)$ whose diagonals are zero, and $\tilde U^*(i/n)$ and $\tilde V^{*\top}(i/n)$ are the matrices resulting from the deletion of the corresponding columns and rows of $U^*(i/n)$ and $V^{*\top}(i/n)$, respectively. Let $A(i/n) = p^{(1-\delta)/2}\tilde U^*(i/n)\tilde\Sigma^*(i/n)$ and $z_{i,n} = \tilde V^{*\top}(i/n)z^*_{i,n}$; then $A(i/n)$ is a $p \times d(i/n)$ matrix and $z_{i,n}$ is a length-$d(i/n)$ locally stationary vector. Then model (4) has the form of model (1).

The equivalence between (1) and (2) has the following implications. First, though we allow $d(t)$ to jump from one integer to another over time, it is appropriate to assume $X_{i,n}$ of (1) is locally stationary, since model (2) is locally stationary as long as $z^*_{i,n}$ and $e_{i,n}$ are locally stationary processes and each element of $A^*(t)$ is Lipschitz continuous. Second, we observe from the SVD that $\|A(i/n)\|_{\min}$ will be close to zero in a small neighborhood around the time when $d(t)$ of model (1) changes, in which case it is difficult to estimate the rank of $A^*(\cdot)$ or the number of factors accurately. We will exclude such small neighborhoods in our asymptotic investigations.

2.1 Some Preliminary Assumptions

Let $\tilde z_{i,n} := V^{*\top}(i/n)z^*_{i,n}$ in (3). Observe that $\tilde z_{i,n}$ is the collection of all factors in the sense that $z_{i,n}$ is a sub-vector of $\tilde z_{i,n}$ for each $i$. Adapting the formulation in Zhou and Wu (2009), we model the locally stationary time series $X_{i,n}$, $\tilde z_{i,n}$ and $e_{i,n}$, $1 \le i \le n$, as follows:

$$X_{i,n} = G(i/n, \mathcal{F}_i), \quad \tilde z_{i,n} = Q(i/n, \mathcal{F}_i), \quad e_{i,n} = H(i/n, \mathcal{F}_i), \qquad (5)$$

where the filtration $\mathcal{F}_i = (\epsilon_{-\infty}, ..., \epsilon_{i-1}, \epsilon_i)$ with $\{\epsilon_i\}_{i\in\mathbb{Z}}$ i.i.d. random elements, and $G$, $Q$ and $H$ are $p$-, $d$- and $p$-dimensional measurable nonlinear filters.
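The reduction from model (2) to the form of model (1) described above can be sketched numerically. The following toy example (not from the paper; the loading matrix and tolerance are illustrative, and the $p^{(1-\delta)/2}$ scaling is absorbed into $A^*$) performs the SVD, drops the zero singular values and verifies that the common component is unchanged.

```python
import numpy as np

def reduce_loading(A_star, tol=1e-10):
    """Reduce a p x d loading matrix A*(t0) at a fixed time t0 to the
    p x d(t0) representation of model (1) via SVD, as in equations (3)-(4).

    Returns A = U~ Sigma~ (p x d(t0)), the map V~^T sending the d factors z*
    to the d(t0) factors z, and the effective rank d(t0).
    """
    U, s, Vt = np.linalg.svd(A_star, full_matrices=False)  # A* = U diag(s) Vt
    r = int(np.sum(s > tol))       # d(t0): number of nonzero singular values
    A = U[:, :r] * s[:r]           # p x d(t0) reduced loading
    Vt_tilde = Vt[:r, :]           # d(t0) x d: z = V~^T z*
    return A, Vt_tilde, r

# toy example: p = 6, d = 3, but the third column is redundant at this time point
rng = np.random.default_rng(0)
B = rng.standard_normal((6, 2))
A_star = np.column_stack([B[:, 0], B[:, 1], B[:, 0] + B[:, 1]])  # rank 2
A, Vt_tilde, r = reduce_loading(A_star)
z_star = rng.standard_normal(3)
assert r == 2
# identical common component: A*(t0) z* == A (V~^T z*)
assert np.allclose(A_star @ z_star, A @ (Vt_tilde @ z_star))
```

The check at the end mirrors the statement in the text: the reduced pair $(A, z)$ reproduces $A^* z^*$ exactly, while the loading now has exactly $d(t_0)$ columns.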
The $j$th entries of the time series $X_{i,n}$, $z_{i,n}$ and $e_{i,n}$ can be written as $X_{i,j,n} = G_j(i/n, \mathcal{F}_i)$, $z_{i,j,n} = Q_j(i/n, \mathcal{F}_i)$ and $e_{i,j,n} = H_j(i/n, \mathcal{F}_i)$. Let $\{\epsilon_i'\}_{i\in\mathbb{Z}}$ be an independent copy of $\{\epsilon_i\}_{i\in\mathbb{Z}}$ and denote by $\mathcal{F}_i^{(h)} = (\epsilon_{-\infty}, ..., \epsilon_{h-1}, \epsilon_h', \epsilon_{h+1}, ..., \epsilon_i)$ for $h \le i$, and $\mathcal{F}_i^{(h)} = \mathcal{F}_i$ otherwise. The dependence measures for $X_{i,n}$, $\tilde z_{i,n}$ ($z_{i,n}$) and $e_{i,n}$ in $\mathcal{L}^l$ norm are defined as (Zhou and Wu (2009))

$$\delta_{G,l}(k) := \max_{1\le j\le p}\delta_{G,l,j}(k), \qquad \delta_{G,l,j}(k) := \sup_{t\in[0,1],\,i\in\mathbb{Z}} E^{1/l}\big(|G_j(t,\mathcal{F}_i) - G_j(t,\mathcal{F}_i^{(i-k)})|^l\big),$$
$$\delta_{Q,l}(k) := \max_{1\le j\le d}\delta_{Q,l,j}(k), \qquad \delta_{Q,l,j}(k) := \sup_{t\in[0,1],\,i\in\mathbb{Z}} E^{1/l}\big(|Q_j(t,\mathcal{F}_i) - Q_j(t,\mathcal{F}_i^{(i-k)})|^l\big),$$
$$\delta_{H,l}(k) := \max_{1\le j\le p}\delta_{H,l,j}(k), \qquad \delta_{H,l,j}(k) := \sup_{t\in[0,1],\,i\in\mathbb{Z}} E^{1/l}\big(|H_j(t,\mathcal{F}_i) - H_j(t,\mathcal{F}_i^{(i-k)})|^l\big),$$

which quantify the magnitude of change of the systems $G$, $Q$, $H$ in $\mathcal{L}^l$ norm when the inputs of the systems $k$ steps ahead are replaced by their i.i.d. copies. We will make several requirements on the factor loading matrix $A(t)$. Due to space constraints, those conditions are deferred to the supplementary material. Conditions (A1) and (A2) in the supplemental material regard the boundedness and smoothness of the elements of $A(t)$ so that they can be well approximated by orthonormal basis functions. We present condition (C1) on factor strength (c.f. Section 2.3 of Lam et al. (2011)) as follows.

(C1) Write $A(t) = (a_1(t), ..., a_{d(t)}(t))$ where $a_s(t)$, $1 \le s \le d(t)$, are $p$-dimensional vectors. Then $\sup_{t\in[0,1]}\|a_s(t)\|^2 \asymp p^{1-\delta}$ for $1 \le s \le d(t)$ for some constant $\delta \in [0,1)$, and $A(t)$ satisfies

$$\inf_{t\in[0,1]}\|A(t)\|_F^2 \asymp p^{1-\delta}, \quad \sup_{t\in[0,1]}\|A(t)\|_F^2 \asymp p^{1-\delta}, \quad \inf_{t\in\mathcal{T}_{\eta_n}}\|A(t)\|_{\min} \ge \eta_n^{1/2}p^{(1-\delta)/2} \qquad (6)$$

for a positive sequence $\eta_n = O(1)$ on a collection of intervals $\mathcal{T}_{\eta_n} \subset [0,1]$, where the choice of $\eta_n$ is stated in condition (S4) of the supplemental material.

As in Lam et al. (2011) and Lam and Yao (2012), $\delta = 0$ corresponds to strong factors and $\delta > 0$ to weak factors; under (C1) the $d(t)$ factors in the model are of equal strength $\delta$ on $[0,1]$. Furthermore, $\|A(t)\|_{\min}$ is of the order $\eta_n^{1/2}p^{(1-\delta)/2}$ on $\mathcal{T}_{\eta_n}$, which enables us to correctly identify the number of factors on $\mathcal{T}_{\eta_n}$. By our discussions of model (2), $\mathcal{T}_{\eta_n}$ excludes small neighborhoods around the change points of the number of factors. See Remark 5.1 of Section 5 for the use of $\eta_n$ in practice.

In the supplementary material, we further make assumptions (M1), (M2) and (M3) on the dependence strength, moments and Lipschitz continuity of the processes $G(t,\mathcal{F}_i)$, $H(t,\mathcal{F}_i)$ and $Q(t,\mathcal{F}_i)$, and assumptions (S1), (S2), (S3) and (S4) on the covariance matrices of the common factors $z_{i,n}$ and the idiosyncratic components $e_{i,n}$, which are needed for the spectral decomposition. We also discuss the equivalent assumptions on model (2) in Remark A.2 of the supplemental material.

Let $\Sigma_X(t,k) = E(G(t,\mathcal{F}_{i+k})G^\top(t,\mathcal{F}_i))$ with $\sigma_{x,u,v}(t,k)$ its $(u,v)$th element. When $X_{i,n}$ is locally stationary, $\Sigma_X(t,k)$ is smooth with respect to $t$. Define $\Sigma_z(t,k)$ and $\Sigma_{z,e}(t,k)$ analogously. Observe from equation (1) and conditions (S1)-(S4) in the supplementary that

$$\Sigma_X(t,k) = A(t)\Sigma_z(t,k)A^\top(t) + A(t)\Sigma_{z,e}(t,k), \quad k \ge 1. \qquad (7)$$

Further define $\Lambda(t) = \sum_{k=1}^{k_0}\Sigma_X(t,k)\Sigma_X^\top(t,k)$ for some pre-specified integer $k_0$, and we have

$$\Lambda(t) = A(t)\Big[\sum_{k=1}^{k_0}\big(\Sigma_z(t,k)A^\top(t) + \Sigma_{z,e}(t,k)\big)\big(A(t)\Sigma_z^\top(t,k) + \Sigma_{z,e}^\top(t,k)\big)\Big]A^\top(t). \qquad (8)$$

Therefore in principle $A(t)$ can be identified by the null space of $\Lambda(t)$. As we discussed in the introduction, fitting factor models using relationships between the factor space and the null space of the auto-covariance matrices has a long history. The use of $\Lambda(t)$ was advocated in Lam et al. (2011).

In the following we shall propose a nonparametric sieve-based method for time-varying loading matrix estimation which is adaptive to the smoothness (with respect to $t$) of the covariance function $\Sigma_X(t,k)$. For a pre-selected set of orthonormal basis functions $\{B_j(t)\}_{j=1}^\infty$ satisfying condition (A2) of the supplementary material, we shall approximate $\Sigma_X(t,k)$ by a finite but diverging order basis expansion

$$\Sigma_X(t,k) \approx \sum_{j=1}^{J_n}\Big(\int_0^1\Sigma_X(u,k)B_j(u)\,du\Big)B_j(t), \qquad (9)$$

where the order $J_n$ diverges to infinity. The speed of divergence is determined by the smoothness of $\Sigma_X(t,k)$ with respect to $t$; see (A2) in the supplemental material for more details. Motivated by (9) we propose to estimate $\Lambda(t)$ by the following $\hat\Lambda(t)$. Define

$$\tilde\Sigma_{X,j,k} = \frac{1}{n}\sum_{i=1}^{n-k}X_{i+k,n}X_{i,n}^\top B_j(i/n), \qquad (10)$$
$$\hat M(J_n,t,k) = \sum_{j=1}^{J_n}\tilde\Sigma_{X,j,k}B_j(t), \qquad \hat\Lambda(t) = \sum_{k=1}^{k_0}\hat M(J_n,t,k)\hat M^\top(J_n,t,k). \qquad (11)$$

Let $\lambda_1(\hat\Lambda(t)) \ge \lambda_2(\hat\Lambda(t)) \ge ... \ge \lambda_p(\hat\Lambda(t))$ denote the eigenvalues of $\hat\Lambda(t)$, and $\hat V(t) = (\hat v_1(t), ..., \hat v_{d(t)}(t))$ where the $\hat v_i(t)$'s are the eigenvectors corresponding to $\lambda_1(\hat\Lambda(t)), ..., \lambda_{d(t)}(\hat\Lambda(t))$. Then we estimate the column space of $A(t)$ by

$$\mathrm{Span}(\hat v_1(t), ..., \hat v_{d(t)}(t)). \qquad (12)$$

To implement this approach it is required to determine $d(t)$, which we estimate using a modified version of the eigen-ratio statistic advocated by Lam and Yao (2012); that is, we estimate

$$\hat d_n(t) = \mathrm{argmin}_{1\le i\le R(t)}\ \lambda_{i+1}(\hat\Lambda(t))/\lambda_i(\hat\Lambda(t)), \qquad (13)$$

where $R(t)$ is the largest integer such that

$$\frac{\lambda_{R(t)}(\hat\Lambda(t))}{\sqrt{\sum_{i=1}^{p}\lambda_i^2(\hat\Lambda(t))}} \ge C\eta_n/\log n \qquad (14)$$

for some positive constant $C$.
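The estimator (10)-(13) can be sketched in a few lines of numpy. The snippet below is an illustrative stand-in, not the paper's implementation: it uses shifted Legendre polynomials as an assumed basis choice, fixed toy values of $J_n$ and $k_0$, and a crude threshold in place of the data-driven bound (14); the simulated data have two static strong factors so the expected answer is known.

```python
import numpy as np
from numpy.polynomial import legendre as L

def legendre_B(J, t):
    # first J orthonormal shifted Legendre polynomials on [0, 1]
    x = 2.0 * np.asarray(t, dtype=float) - 1.0
    return np.stack([np.sqrt(2 * j + 1) * L.legval(x, [0] * j + [1])
                     for j in range(J)], axis=-1)

def lambda_hat(X, t, J_n=4, k0=2):
    """Sieve estimate of Lambda(t), cf. (10)-(11). X is n x p."""
    n, p = X.shape
    B_grid = legendre_B(J_n, np.arange(1, n + 1) / n)   # B_j(i/n)
    B_t = legendre_B(J_n, [t])[0]                       # B_j(t)
    Lam = np.zeros((p, p))
    for k in range(1, k0 + 1):
        # (10): basis-weighted lag-k sample autocovariances, one per basis function
        Sig = np.einsum('ij,ia,ib->jab', B_grid[:n - k], X[k:], X[:n - k]) / n
        M = np.tensordot(B_t, Sig, axes=1)              # (11): M_hat(J_n, t, k)
        Lam += M @ M.T
    return Lam

def d_hat(Lam, thresh=0.01):
    """Eigen-ratio estimate (13), with a crude stand-in for the bound R(t) of (14)."""
    lam = np.linalg.eigvalsh(Lam)[::-1]                 # descending eigenvalues
    R = max(1, int(np.sum(lam / np.sqrt(np.sum(lam ** 2)) >= thresh)))
    R = min(R, len(lam) - 1)
    return int(np.argmin(lam[1:R + 1] / lam[:R])) + 1

# toy check: two strong serially correlated factors plus weak idiosyncratic noise
rng = np.random.default_rng(1)
n, p = 1000, 20
A = rng.standard_normal((p, 2))
fac = rng.standard_normal((n, 2))
for i in range(1, n):
    fac[i] += 0.8 * fac[i - 1]                          # AR(1) factors
X = fac @ A.T + 0.05 * rng.standard_normal((n, p))
assert d_hat(lambda_hat(X, t=0.5)) == 2
```

Because the loadings here are constant and strong, the first two eigenvalues of $\hat\Lambda(1/2)$ dominate and the eigen-ratio drops sharply at $i = 2$, which is what the final check exercises.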
Asymptotic theory established in Section 5 guarantees that the percentage of variance explained by the first $d(t)$ eigenvalues will be much larger than $\eta_n/\log n$ on $\mathcal{T}_{\eta_n}$ with high probability, which motivates us to restrict the upper bound of the search range, $R(t)$, using (14).

It is of practical interest to test $H_0: A(t) = A$, where $A$ is a $p \times d$ matrix. Under the null hypothesis, $\Sigma_X(t,k) = A\Sigma_z(t,k)A^\top + A\Sigma_{z,e}(t,k)$ for $k \ge 1$. Observe that testing $H_0$ is more subtle than testing covariance stationarity of $X_{i,n}$, as both $z_{i,n}$ and $e_{i,n}$ can be non-stationary under the null. By equation (1), we have, under $H_0$,

$$\int_0^1\Sigma_X(t,k)\,dt = A\int_0^1\big(\Sigma_z(t,k)A^\top + \Sigma_{z,e}(t,k)\big)\,dt, \quad k > 0.$$

Consider the following quantity $\Gamma_k$ and its estimate $\hat\Gamma_k$:

$$\Gamma_k = \int_0^1\Sigma_X(t,k)\,dt\int_0^1\Sigma_X^\top(t,k)\,dt, \qquad \hat\Gamma_k = \Big(\sum_{i=1}^{n-k}X_{i+k,n}X_{i,n}^\top/n\Big)\Big(\sum_{i=1}^{n-k}X_{i+k,n}X_{i,n}^\top/n\Big)^\top.$$

Let $\Gamma = \sum_{k=1}^{k_0}\Gamma_k$ and $\hat\Gamma = \sum_{k=1}^{k_0}\hat\Gamma_k$. Then the kernel space of $A$ can be estimated by the kernel of $\hat\Gamma$ under $H_0$. Let $\hat F_i$, $i = 1, ..., p-d$, be the orthonormal eigenvectors corresponding to $(\lambda_{d+1}(\hat\Gamma), ..., \lambda_p(\hat\Gamma))$, and $F_i$, $i = 1, ..., p-d$, be the orthonormal eigenvectors of the kernel of $\Gamma$. Write $F = (F_1, ..., F_{p-d})$ and $\hat F = (\hat F_1, ..., \hat F_{p-d})$.

The test is constructed by segmenting the time series into non-overlapping equal-sized blocks of size $m_n$. Without loss of generality, consider $n - k_0 = m_nN_n$ for integers $m_n$ and $N_n$. Define the index set $b_j = ((j-1)m_n + 1, ..., jm_n)$ for $1 \le j \le N_n$. For $1 \le j \le N_n$, let

$$\hat\Sigma_X(j,k) = \sum_{i\in b_j}X_{i+k,n}X_{i,n}^\top/m_n. \qquad (15)$$

Then we define the test statistic $\hat T_n$:

$$\hat T_n = \sqrt{m_n}\max_{1\le k\le k_0}\max_{1\le j\le N_n}\max_{1\le i\le p-\hat d_n}\big|\hat F_i^\top\hat\Sigma_X(j,k)\big|_\infty, \qquad (16)$$

where $\hat d_n$ is an estimate of $d$. The test $\hat T_n$ utilizes the observation that the kernel space of the full-sample statistic $\Gamma$ coincides with that of $\Sigma_X(t,k)$ for each local $t$ under the null, while the latter is likely to fail under the alternative.
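To make the construction concrete, here is a self-contained numerical sketch of the block statistics (15), the test statistic (16), and the block-multiplier bootstrap (19) that is defined next in this section. It uses simulated toy data and fixed tuning values for $m_n$, $w_n$, $k_0$ (in practice these would be chosen by the minimal-volatility method discussed later); all names and defaults are illustrative, not the paper's implementation.

```python
import numpy as np

def static_loading_test(X, d, k0=2, m_n=50, w_n=10, B=500, seed=0):
    """Sketch of the maximum-deviation statistic T_n of (16) and the
    block-multiplier bootstrap kappa_n of (19). X is n x p, d the number
    of factors under the null. Returns (T_n, bootstrap p-value)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    N_n = (n - k0) // m_n                    # number of non-overlapping blocks
    n_eff = m_n * N_n

    # full-sample Gamma-hat and the eigenvectors F-hat of its estimated kernel
    Gam = np.zeros((p, p))
    for k in range(1, k0 + 1):
        S = X[k:].T @ X[:n - k] / n          # full-sample lag-k covariance
        Gam += S @ S.T
    eigvec = np.linalg.eigh(Gam)[1]          # columns ordered by ascending eigenvalue
    F_hat = eigvec[:, :p - d]                # p x (p - d): estimated kernel of Gamma

    # rows vec(F^T X_{i+k} X_i^T) of (17), stacked over k for every time index i
    Zparts = []
    for k in range(1, k0 + 1):
        P = X[k:k + n_eff] @ F_hat           # row i holds (F_hat^T X_{i+k})^T
        Zk = np.einsum('ia,ib->iab', P, X[:n_eff]).reshape(n_eff, -1)
        Zparts.append(Zk)
    Z = np.concatenate(Zparts, axis=1)       # time-ordered rows

    # T_n: sup-norm over blocks of the block sums scaled by sqrt(m_n), cf. (15)-(16)
    block_sums = Z.reshape(N_n, m_n, -1).sum(axis=1)
    T_n = np.abs(block_sums).max() / np.sqrt(m_n)

    # hat Z_i of (18): concatenate across blocks for each within-block index i
    Zi = Z.reshape(N_n, m_n, -1).transpose(1, 0, 2).reshape(m_n, -1)
    csum = np.vstack([np.zeros((1, Zi.shape[1])), np.cumsum(Zi, axis=0)])
    win = csum[w_n:] - csum[:m_n - w_n + 1]  # overlapping window sums S_{j, w_n}
    centered = win - (w_n / m_n) * csum[-1]  # subtract (w_n / m_n) S_{m_n}
    scale = np.sqrt(w_n * (m_n - w_n + 1))
    Ks = np.array([np.abs(centered.T @ rng.standard_normal(m_n - w_n + 1)).max()
                   for _ in range(B)]) / scale
    return T_n, float(np.mean(Ks >= T_n))

# toy example under the null: static loadings, AR(1) factors
rng = np.random.default_rng(7)
n, p, d = 600, 8, 2
A = rng.standard_normal((p, d))
fac = rng.standard_normal((n, d))
for i in range(1, n):
    fac[i] += 0.5 * fac[i - 1]
X = fac @ A.T + 0.5 * rng.standard_normal((n, p))
T_n, pval = static_loading_test(X, d)
assert T_n > 0 and 0.0 <= pval <= 1.0
```

The bootstrap draws i.i.d. standard normal multipliers for the centered overlapping window sums, so each replicate mimics the fluctuation of the block sums while conditioning on the observed data.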
Hence the multiplication of $\hat F^\top$ and $\hat\Sigma_X(j,k)$ should be small in $L^\infty$ norm uniformly in $j$ under the null, while some of those multiplications are likely to be large under the alternative.

We propose a multiplier bootstrap procedure with overlapping blocks to determine the critical values of $\hat T_n$. Define for $1 \le i \le m_n$, $1 \le j \le N_n$, the $(p-d)p$-dimensional vector

$$\hat Z_{i,j,k} = vec\big((\hat F^\top X_{(j-1)m_n+i+k,n}X_{(j-1)m_n+i,n}^\top)^\top\big), \qquad (17)$$

and the $k_0(p-d)p$-dimensional vector $\hat Z_{i,j,\cdot} = (\hat Z_{i,j,1}^\top, ..., \hat Z_{i,j,k_0}^\top)^\top$. Further define

$$\hat Z_i = (\hat Z_{i,1,\cdot}^\top, ..., \hat Z_{i,N_n,\cdot}^\top)^\top \qquad (18)$$

for $1 \le i \le m_n$. Notice that $\hat T_n = \big|\sum_{i=1}^{m_n}\hat Z_i/\sqrt{m_n}\big|_\infty$. For given $m_n$, let $\hat S_{j,w_n} = \sum_{r=j}^{j+w_n-1}\hat Z_r$ for $1 \le j \le m_n - w_n + 1$ and $\hat S_{m_n} = \sum_{r=1}^{m_n}\hat Z_r$, where $w_n = o(m_n)$ and $w_n \to \infty$ is the window size. Define

$$\kappa_n = \frac{1}{\sqrt{w_n(m_n-w_n+1)}}\sum_{j=1}^{m_n-w_n+1}\Big(\hat S_{j,w_n} - \frac{w_n}{m_n}\hat S_{m_n}\Big)R_j, \qquad (19)$$

where $\{R_i\}_{i\in\mathbb{Z}}$ are i.i.d. $N(0,1)$ random variables independent of $\{X_{i,n}, 1 \le i \le n\}$. Then we have the following algorithm for testing static factor loadings:

Algorithm for implementing the multiplier bootstrap:

(1) Select $m_n$ and $w_n$ by the Minimal Volatility (MV) method described in Section D.2 of the supplementary material.

(2) Generate $B$ (say 1000) conditionally i.i.d. copies of $K_r = |\kappa_n^{(r)}|_\infty$, $r = 1, ..., B$, where $\kappa_n^{(r)}$ is obtained by (19) via the $r$th copy of i.i.d. standard normal random variables $\{R_i^{(r)}\}_{i\in\mathbb{Z}}$.

(3) Let $K_{(r)}$, $1 \le r \le B$, be the order statistics of $K_r$, $1 \le r \le B$. Then we reject $H_0$ at level $\alpha$ if $\hat T_n \ge K_{(\lfloor(1-\alpha)B\rfloor)}$. Let $B^* = \max\{r: K_{(r)} < \hat T_n\}$; the corresponding $p$ value of the test can be approximated by $1 - B^*/B$.

To implement our test, $d$ will be estimated by

$$\hat d_n = \mathrm{argmin}_{1\le i\le R}\ \lambda_{i+1}(\hat\Gamma)/\lambda_i(\hat\Gamma) \qquad (20)$$

for some constant $R$ satisfying (14) with $\hat\Lambda(t)$ replaced by $\hat\Gamma$ therein.

Theorem 5.1 provides the estimation accuracy of $\hat\Lambda(t)$ by the sieve method.

Theorem 5.1. Assume conditions (A1), (A2), (C1), (M1), (M2), (M3) and (S1)-(S4) of the supplementary material hold. Define $\iota_n = \sup_{1\le j\le J_n}Lip_j + \sup_{t,1\le j\le J_n}|B_j(t)|$, where $Lip_j$ is the Lipschitz constant of the basis function $B_j(t)$, and $\beta_n = J_n\sup_{t,1\le j\le J_n}|B_j(t)|\sqrt{\iota_n/n} + \sqrt{g_{J_n,K,\tilde M}}$, where the quantity $g_{J_n,K,\tilde M}$ is defined in condition (A2) of the supplementary. Write

$$\nu_n = \frac{J_n\sqrt{p}\sup_{t,1\le j\le J_n}|B_j(t)|}{\sqrt{n}} + \beta_n.$$

Then we have

$$\Big\|\sup_{t\in[0,1]}\big\|\hat\Lambda(t) - \Lambda(t)\big\|_F\Big\|_{\mathcal{L}^2} = O(p^{2-2\delta}\nu_n).$$

From the proof, we shall see that $\|\Lambda(t)\|_F$ is of the order $p^{2-2\delta}$ uniformly for $t \in [0,1]$. In particular, if $\nu_n \to 0$ then the estimation error of $\hat\Lambda(t)$ is asymptotically negligible compared with the magnitude of $\Lambda(t)$. For orthonormal Legendre polynomials and trigonometric polynomials it is easy to derive polynomial-in-$j$ bounds on $Lip_j$ (for instance $Lip_j = O(j)$ for the trigonometric basis). Similar calculations can be performed for a large class of frequently-used basis functions. We refer to Remark A.1 in Section 1 of the supplementary for a detailed discussion of the magnitude of the sieve approximation error $g_{J_n,K,\tilde M}$ for some widely-used basis functions. The first term in the approximation rate of Theorem 5.1 is due to the stochastic variation of $\hat M(J_n,t,k)$, while the second term is due to basis approximation.

Write $\hat B(t) = (\hat b_{d(t)+1}(t), ..., \hat b_p(t))$ where $\hat b_s(t)$, $d(t)+1 \le s \le p$, are the eigenvectors corresponding to $\lambda_{d(t)+1}(\hat\Lambda(t)), ..., \lambda_p(\hat\Lambda(t))$. Obviously $(\hat v_1(t), ..., \hat v_{d(t)}(t), \hat b_{d(t)+1}(t), ..., \hat b_p(t))$ form a set of orthonormal basis of $\mathbb{R}^p$. Define $V(t) = (v_1(t), ..., v_{d(t)}(t))$ where the $v_i(t)$'s are the eigenvectors corresponding to the positive $\lambda_i(\Lambda(t))$'s, and $B(t) = (b_{d(t)+1}(t), ..., b_p(t))$ where $b_s(t)$, $d(t)+1 \le s \le p$, are an orthonormal basis of the null space of $\Lambda(t)$.
Therefore $(v_1(t), ..., v_{d(t)}(t), b_{d(t)+1}(t), ..., b_p(t))$ form an orthonormal basis of $\mathbb{R}^p$.

Theorem 5.2.
Under the conditions of Theorem 5.1 and conditions (S1)-(S4), we have:

(i) For each $t \in \mathcal{T}_{\eta_n}$ there exist orthogonal matrices $\hat O_1(t) \in \mathbb{R}^{d(t)\times d(t)}$ and $\hat O_2(t) \in \mathbb{R}^{(p-d(t))\times(p-d(t))}$ such that

$$\Big\|\sup_{t\in\mathcal{T}_{\eta_n}}\|\hat V(t)\hat O_1(t) - V(t)\|_F\Big\|_{\mathcal{L}^2} = O(\eta_n^{-1}p^{\delta-1}\nu_n), \qquad \Big\|\sup_{t\in\mathcal{T}_{\eta_n}}\|\hat B(t)\hat O_2(t) - B(t)\|_F\Big\|_{\mathcal{L}^2} = O(\eta_n^{-1}p^{\delta-1}\nu_n).$$

(ii) Furthermore, if $\limsup_p\sup_{t\in(0,1)}\lambda_{\max}(E(H(t,\mathcal{F}_i)H^\top(t,\mathcal{F}_i))) < \infty$, we have that

$$p^{-1/2}\max_{1\le i\le n,\ i/n\in\mathcal{T}_{\eta_n}}\|\hat V(i/n)\hat V^\top(i/n)X_{i,n} - A(i/n)z_{i,n}\| = O_p(\eta_n^{-1}p^{(\delta-1)/2}\nu_n + p^{-1/2}).$$

Assertion (i) involves the orthogonal matrices $\hat O_1(t)$ and $\hat O_2(t)$ since we allow multiple eigenvalues at certain time points, which yields non-uniqueness of the eigen-decomposition. Moreover, when all the factors are strong ($\delta = 0$) the rate in (i) is independent of $p$ and reduces to the uniform nonparametric sieve estimation rate for univariate smooth functions, which coincides with the well-known 'blessing of dimensionality' phenomenon for stationary factor models; see for example Lam et al. (2011). Further assuming $p^{\delta-1}\nu_n = o(\eta_n)$, Theorem 5.2 guarantees the uniform consistency of the eigen-space estimator on $\mathcal{T}_{\eta_n}$.

Through studying the estimated eigenvectors, Theorems 5.1 and 5.2 demonstrate the validity of the estimator (12). The next theorem discusses the properties of the estimated eigenvalues $\lambda_i(\hat\Lambda(t))$, $i = 1, ..., p$.

Theorem 5.3.
Assume $\nu_np^{\delta-1} = o(1)$ and that there exists a sequence $g_n \to \infty$ such that $\eta_n(g_n\nu_np^{\delta-1})^{-1/2} \to \infty$. Under the conditions of Theorem 5.2, we have that:

(i) $\big\|\sup_{t\in(0,1)}\max_{1\le j\le d(t)}|\lambda_j(\hat\Lambda(t)) - \lambda_j(\Lambda(t))|\big\|_{\mathcal{L}^2} = O(p^{2-2\delta}\nu_n)$.

(ii) $P\big(\sup_{t\in\mathcal{T}_{\eta_n}}\max_{j=d(t)+1,...,p}\lambda_j(\hat\Lambda(t)) \ge g_n\eta_n^{-1}\nu_n\big) = O(g_n^{-1})$.

(iii) $1 - P\big(\lambda_{j+1}(\hat\Lambda(t))/\lambda_j(\hat\Lambda(t)) \gtrsim \eta_n,\ j = 1, ..., d(t)-1,\ \forall t\in\mathcal{T}_{\eta_n}\big) = O(\nu_n\eta_n^{-1}p^{\delta-1})$.

(iv) $P\big(\sup_{t\in\mathcal{T}_{\eta_n}}\lambda_{d(t)+1}(\hat\Lambda(t))/\lambda_{d(t)}(\hat\Lambda(t)) \gtrsim p^{\delta-1}g_n\nu_n/\eta_n\big) = O(g_n^{-1} + \nu_n\eta_n^{-1}p^{\delta-1})$.

Notice that the rate in (iv) is asymptotically negligible compared with that in (iii). If $d(t)$ has a bounded number of changes, (iii) and (iv) of Theorem 5.3 indicate that the eigen-ratio estimator (13) is able to consistently identify the time-varying number of factors $d(t)$ on intervals with total length approaching 1. We remark that most results on linear factor models require that the positive eigenvalues be distinct. Theorems 5.2 and 5.3 are sufficiently flexible to allow multiple eigenvalues of $\Lambda(t)$.

Remark 5.1.
In practice, if the estimated number of factors $\hat d(t)$ does not change over time, then according to our discussions in Section 2, $\mathcal{T}_{\eta_n} = [0,1]$. Otherwise one can set

$$\hat{\mathcal{T}}_{\eta_n} = (0,1] \setminus \bigcup_{s=1}^{r}\Big(\hat t_s - \frac{1}{\log n},\ \hat t_s + \frac{1}{\log n}\Big), \qquad (21)$$

where $\hat t_s$, $s = 1, ..., r$, are the estimated time points at which $d(t)$ changes. In fact, equation (21) corresponds to choosing $\eta_n \asymp 1/\log n$ when the singular values of $A(t)/p^{(1-\delta)/2}$ are Lipschitz continuous.

Theorem 5.3 yields the following consistency result for the eigen-ratio estimator (13).
Proposition 5.1.
Assume $\nu_np^{\delta-1} = o(1)$. Under the conditions of Theorem 5.2, we have

$$P\big(\exists\, t\in\mathcal{T}_{\eta_n}:\ \hat d_n(t) \neq d(t)\big) = O\Big(\frac{\nu_np^{\delta-1}\log n}{\eta_n}\Big). \qquad (22)$$

Hence $\hat d_n(t)$ is uniformly consistent on $\mathcal{T}_{\eta_n}$ if $\nu_np^{\delta-1}\log n/\eta_n = o(1)$. In the best scenario where $\Sigma_X(t,k)$ is analytic w.r.t. $t$, we will carry out a detailed discussion on the rates of Theorem 5.3 for two classes of basis functions in Remark A.3 of the supplementary material.

We discuss the limiting behaviour of $\hat T_n$ of (16) under $H_0$ in this subsection. First, the following proposition indicates that with asymptotic probability one $\hat d_n$ equals $d$ under $H_0$.

Proposition 5.2.
Assume conditions (A1), (A2), (C1), (M1), (M2), (M3) and conditions (S1)-(S4), and that $p^\delta\log n/\sqrt{n} = o(1)$. Then we have, under $H_0$, $P(\hat d_n \neq d) = O(p^\delta\log n/\sqrt{n})$ as $n \to \infty$.

Define the infeasible statistic $\tilde T_n$, which uses the true quantities $F$ and $d$ to approximate $\hat T_n$:

$$\tilde T_n = \sqrt{m_n}\max_{1\le k\le k_0}\max_{1\le j\le N_n}\max_{1\le i\le p-d}\big|F_i^\top\hat\Sigma_X(j,k)\big|_\infty. \qquad (23)$$

Recall the definitions of $\hat F$ and $F$ in Section 4. We shall see from Proposition 5.3 below that $\hat T_n - \tilde T_n$ is negligible asymptotically under some mild conditions due to the small magnitude of $\|\hat F - F\|_F$.

Proposition 5.3.
Assume conditions (A1), (A2), (C1), (M'), (M1), (M3) and (S) in the supplementary and that $p^\delta\log n/\sqrt{n} = o(1)$; then there exists a set of orthonormal basis $F$ of the null space of $\Gamma$ such that for any sequence $g_n \to \infty$,

$$P\Big(|\hat T_n - \tilde T_n| \ge g_np^\delta\frac{\sqrt{m_n}}{\sqrt{n}}\Omega_n\Big) = O\Big(\frac{1}{g_n} + \frac{p^\delta\log n}{\sqrt{n}}\Big), \qquad (24)$$

where $\Omega_n = (np/m_n)^{1/l}\sqrt{p}$ and $l$ is a positive number such that the $l$th moments of $e_i$ and $z_i$ are finite and the $l$th dependence measures of the two processes are summable. See condition (M') in the supplemental material for more details.

When $l$ is sufficiently large, the order of the rate in (24) is close to $p^{\delta+1/2}\sqrt{m_n}/\sqrt{n}$. Hence, in order for the error in (24) to vanish, $p$ can be as large as $O(n^a)$ for some $a > 0$, with the admissible range of $a$ largest when $\delta = 0$. Furthermore, under the null $\tilde T_n$ is equivalent to the $L^\infty$ norm of the averages of a high-dimensional time series. Specifically, define $\tilde Z_i$ by replacing $\hat F$ with $F$ in the definition of $\hat Z_i$ (c.f. (18)), with its $j$th element denoted by $\tilde Z_{ij}$. Then straightforward calculations indicate that $\tilde T_n = \big|\sum_{i=1}^{m_n}\tilde Z_i/\sqrt{m_n}\big|_\infty$. Therefore we can approximate $\tilde T_n$ by the $L^\infty$ norm of a certain mean-zero Gaussian process via recent developments in high-dimensional Gaussian approximation theory; see for instance Chernozhukov et al. (2013) and Zhang and Cheng (2018). Let $Y_i = (y_{i,1}, ..., y_{i,k_0N_n(p-d)p})$ be centered $k_0N_n(p-d)p$-dimensional Gaussian random vectors that preserve the auto-covariance structure of $\tilde Z_i$ for $1 \le i \le m_n$, and write $Y = \sum_{i=1}^{m_n}Y_i/\sqrt{m_n}$.

Theorem 5.4.
Assume the conditions of Proposition 5.3 hold, $p^\delta\sqrt{m_n}\log n\,\Omega_n/\sqrt{n} = o(1)$, and assume conditions (a)-(f) in Section 2 of the supplemental material. Then under the null hypothesis, we have

$$\sup_{t\in\mathbb{R}}\big|P(\tilde T_n \le t) - P(|Y|_\infty \le t)\big| \lesssim \iota(m_n, k_0N_np(p-d), l, (N_np)^{1/l}), \qquad (25)$$

where $\iota$ is defined in (B.8) of the supplementary material. Furthermore,

$$\sup_{t\in\mathbb{R}}\big|P(\hat T_n \le t) - P(|Y|_\infty \le t)\big| = O\bigg(\Big(\frac{p^\delta\sqrt{m_n}\log n}{\sqrt{n}}\Omega_n\Big)^{1/3} + \iota(m_n, k_0N_np(p-d), l, (N_np)^{1/l})\bigg). \qquad (26)$$

Notice that if the dependence measures of $X_{i,n}$ in condition (e) of the supplemental material satisfy $\delta_{G,2l}(k) = O((k+1)^{-(1+\alpha)})$ for $\alpha$ exceeding a threshold determined by $l$, $\log m_n/\log n$ and $\log p/\log n$, then $\iota(m_n, k_0N_np(p-d), l, (N_np)^{1/l}) = o(1)$. If further $\delta_{G,2l}(k)$ decays geometrically as $k$ increases, then $\iota(m_n, k_0N_np(p-d), l, (N_np)^{1/l})$ reduces to a negative power of $m_n$ determined by a constant $\zeta$ defined in condition (c). Furthermore, we provide detailed explanations of conditions (a)-(f) and examples that satisfy these conditions in the supplemental material.

We now investigate the bootstrap algorithm in Section 4 under $H_0$. With extra assumptions on the moments and the dependence measures (see conditions (i) and (ii) of Theorem 5.5 below), the validity of the bootstrap procedure is supported by the following theorem:

Theorem 5.5.
Denote by $W_{n,p} = k_0N_n(p-d)p$. Assume that the conditions of Theorem 5.4 hold, $w_n \to \infty$, $w_n/m_n = o(1)$, and that there exist $q^* \ge 2l$ and $\epsilon_0 > 0$ such that $\Theta_n := w_n^{-1} + \sqrt{w_n/m_n}\,W_{n,p}^{1/q^*} \lesssim W_{n,p}^{-\epsilon_0}$, and

(i) $\max_{1\le i\le n}\|X_{i,n}\|_{\mathcal{L}^{q^*}} \le M$ for some sufficiently large constant $M$;

(ii) condition (e) of Theorem 5.4 holds with $l$ replaced by $q^*$.

Then we have that, conditional on $X_{i,n}$ and under $H_0$,

$$\sup_{t\in\mathbb{R}}\big|P(\hat T_n \le t) - P(|\kappa_n|_\infty \le t \mid X_{i,n}, 1\le i\le n)\big| = O_p\bigg(\Big(\frac{p^\delta\sqrt{m_n}\log n}{\sqrt{n}}\Omega_n\Big)^{1/3} + \iota(m_n, k_0N_np(p-d), l, (N_np)^{1/l}) + \Theta_n^{1/3}\log^{2/3}(W_{n,p}\Theta_n)\bigg). \qquad (27)$$

The condition $w_n^{-1} + \sqrt{w_n/m_n}\,W_{n,p}^{1/q^*} \lesssim W_{n,p}^{-\epsilon_0}$ holds if $w_n = o(n^c)$ for some $c > 0$ and $\sqrt{w_n/m_n}\,W_{n,p}^{1/q^*+\epsilon_0} = O(1)$. The convergence rate of (27) will go to zero, and hence the bootstrap is asymptotically consistent under $H_0$, if the dependence of $X_{i,n}$ is weak enough that $\iota(m_n, k_0N_np(p-d), l, (N_np)^{1/l}) = o(1)$; see the discussion below (26). We shall discuss the local power of the test in Section C of the supplemental material. Furthermore, the selection of $m_n$ and $w_n$ is discussed in Section D.2 of the supplemental material.

In this subsection we shall compare our method with that in Lam et al. (2011), which is equivalent to fixing $J_n = 0$ in our sieve method. The method studied in Lam et al. (2011) is developed under the stationarity assumption with static factor loadings, and hence the purpose of the simulation is to demonstrate that methodology developed under the stationarity assumption does not directly carry over to the non-stationary setting. To demonstrate the advantage of the adaptive sieve method, our method is also compared with a simple local estimator of $\Lambda(t)$, which was considered in the data analysis section of Lam et al. (2011); we shall call it the local PCA method in our paper.
Specifically, for each $i$, $\Lambda(\frac{i}{n})$ is consistently estimated by

$$\hat\Lambda\Big(\frac{i}{n}\Big) = \sum_{k=1}^{k_0}\hat M\Big(\frac{i}{n},k\Big)\hat M^\top\Big(\frac{i}{n},k\Big), \qquad \hat M\Big(\frac{i}{n},k\Big) = \frac{1}{2m+1}\sum_{j=i-m}^{i+m}X_{j+k}X_j^\top, \qquad (28)$$

where $m$ is a window size such that $m\to\infty$ and $m = o(n)$. Straightforward calculations show that the optimal $m$ is of polynomial order in $n$. The $J_n$ of our method is selected by cross validation, while the $m$ of the local PCA method is selected by the minimal volatility method.

We then present our simulation scenarios. Define smooth functions $a_1(t),\ldots,a_5(t)$ on $[0,1]$, taken as bounded linear, quadratic and cosine functions of $t$. Let $A = (a_1,\ldots,a_p)^\top$ be a $p\times d$ matrix whose $i$th row depends smoothly on $i/p$ and has entries bounded away from zero and infinity. Define the locally stationary process $z_i = G(i/n,\mathcal F_i)$, where $G(t,\mathcal F_i) = \sum_{j=0}^{\infty}a_j(t)\epsilon_{i-j}$ with smooth coefficient functions $a_j(t)$, the filtration $\mathcal F_i = (\epsilon_{-\infty},\ldots,\epsilon_i)$, and $(\epsilon_i)_{i\in\mathbb Z}$ a sequence of i.i.d. $N(0,1)$ random variables. We then define the time-varying matrix

$$A(t) = \begin{pmatrix} A_1a_1(t)\\ A_2a_2(t)\\ A_3a_3(t)\\ A_4a_4(t)\end{pmatrix}, \qquad (29)$$

where $A_1$, $A_2$, $A_3$ and $A_4$ are the sub-matrices of $A$ consisting of the first $\mathrm{round}(p/5)$ rows, the $(\mathrm{round}(p/5)+1)$th to $\mathrm{round}(2p/5)$th rows, the $(\mathrm{round}(2p/5)+1)$th to $\mathrm{round}(3p/5)$th rows, and the $(\mathrm{round}(3p/5)+1)$th to $p$th rows of $A$, respectively. Let $e_i = (e_{i1},\ldots,e_{ip})^\top$ be a $p$-dimensional error vector, with $(e_{ij})_{i\in\mathbb Z,1\le j\le p}$ i.i.d. centered normal random variables independent of $(\epsilon_i)_{i\in\mathbb Z}$.

We consider the cases $p = 50, 100, 200, 500$ and $n = 1000, 1500$. Estimation accuracy is measured by

$$\mathrm{RMSE} = \Big(\frac{1}{np}\sum_{i=1}^{n}\|\tilde L_i\|^2\Big)^{1/2}, \qquad \tilde L_i := \hat V(i/n)\hat V^\top(i/n)X_{i,n} - A(i/n)z_{i,n},$$

where $\tilde L_i$ is a $p\times 1$ vector. The principal angle $\Upsilon_i$ between $A(i/n)$ and its estimate $\hat A(i/n)$ is computed from $\hat V_i^\top V_i$, where $V_i$ and $\hat V_i$ are the orthonormalized $A(i/n)$ and $\hat A(i/n)$, respectively; the principal angle is a well-defined distance between $\mathrm{span}(A(i/n))$ and $\mathrm{span}(\hat A(i/n))$. Finally, the average principal angle is defined as $\bar\Upsilon = \frac{1}{n}\sum_{i=1}^{n}\Upsilon_i$.

We report the RMSE and the average principal angle of the three estimators over 100 simulation samples in Tables 1 and 2, respectively. Our method achieves the smallest RMSE and average principal angle among the three estimators in all simulation scenarios. We choose $k_0 = 3$ in our simulations; other choices of $k_0$ lead to similar conclusions. The RMSE in Table 1 decreases as $n$ and $p$ increase, and the average principal angle decreases as $n$ increases while being essentially independent of $p$.

Table 1: Mean and standard errors (in brackets) of the simulated RMSE for our sieve method, the static loading method ($J_n = 0$) and local PCA for model (29), for $p\in\{50,100,200,500\}$ and $n\in\{1000,1500\}$; the results are multiplied by 1000.

Table 2: Mean and standard errors (in brackets) of the simulated principal angles for our sieve method, the static loading method ($J_n = 0$) and local PCA for model (29), for $p\in\{50,100,200,500\}$ and $n\in\{1000,1500\}$; the results are multiplied by 1000.

We now examine our method of Section 4 for testing the hypothesis of static factor loadings. Define coefficient functions $\theta_1(t)$ and $\theta_2(t)$ that are increasing linear functions of $t$ (with $\theta_2(0) = 0.12$) and a constant function $\theta_3(t)$, all taking values in $(0,1)$, (30), and let $a_1(t)$, $a_2(t)$ and $a_3(t)$ be constant functions, (31). Let $A$ be a $p\times 3$ matrix with i.i.d. entries drawn uniformly from a symmetric interval around zero, and define

$$A(t) = \begin{pmatrix} A_1a_1(t)\\ A_2a_2(t)\\ A_3a_3(t)\end{pmatrix}, \qquad (32)$$

where $A_1$, $A_2$ and $A_3$ are the sub-matrices of $A$ consisting of the first $\mathrm{round}(p/3)$ rows, the $(\mathrm{round}(p/3)+1)$th to $\mathrm{round}(2p/3)$th rows, and the $(\mathrm{round}(2p/3)+1)$th to $p$th rows, respectively. The factors are $z_{i,n} = (z_{i,1,n}, z_{i,2,n}, z_{i,3,n})^\top$, where $z_{i,k,n} = 2\sum_{j=0}^{\infty}\theta_k^j(i/n)\epsilon_{i-j,k}$ for $k = 1, 2, 3$ and $\{\epsilon_{i,k}\}$ are i.i.d. standard normal. We consider the following sub-models for $e_{i,n} = (e_{i,1,n},\ldots,e_{i,p,n})^\top$:

(Model I) $(e_{i,s,n})_{i\in\mathbb Z,1\le s\le p}$ are i.i.d. $t(5)$ random variables standardized to unit variance.

(Model II) $e_{i,s,n} = \tilde e_{i-1,s,n}\tilde e_{i,s,n}$, where $(\tilde e_{i,s,n})_{i\in\mathbb Z,1\le s\le p}$ are i.i.d. $N(0,1)$.

We report results for various combinations of the dimension $p$ and the time series length $n$. From Table 3 we see that the simulated type I errors approximate their nominal levels reasonably well.

Table 3: Simulated type I errors for Model I and Model II

              Model I                      Model II
        n = 1000       n = 1500       n = 1000       n = 1500
level   5%     10%     5%     10%     5%     10%     5%     10%
p=20    4.30%  10.00%  5.67%  10.67%  5.33%  11.00%  5.67%  11.33%
p=50    4.67%  11.33%  5.00%  10.00%  4.00%  8.33%   5.33%  9.67%
p=100   2.67%  5.33%   5.33%  10.67%  3.00%  5.67%   5.00%  11.33%

In this subsection we examine the power of our test procedure of Section 4. Let $\tilde A$ be a $p\times d$ matrix and let $A_i$, $1\le i\le 5$, be the sub-matrices of $\tilde A$ consisting of the $(\mathrm{round}((i-1)p/5)+1)$th to $\mathrm{round}(ip/5)$th rows of $\tilde A$. We then consider a time-varying normalized loading matrix $A(t)$ as follows. For a given $D\in\mathbb R$, let $a_i(t,D)$, $1\le i\le 4$, be functions of the form $1-Dt$ or $1-Dt^2$, let $a_5(t,D)\equiv 1$, and define the normalized loading matrix $A(t) = \tilde A_D(t)(\tilde A_D^\top(t)\tilde A_D(t))^{-1/2}$, where

$$\tilde A_D(t) = \begin{pmatrix} A_1a_1(t,D)\\ A_2a_2(t,D)\\ A_3a_3(t,D)\\ A_4a_4(t,D)\\ A_5a_5(t,D)\end{pmatrix}. \qquad (33)$$

Let $(e_{i,s,n})_{i\in\mathbb Z,1\le s\le p}$ be i.i.d. centered normal random variables and $e_{i,n} = (e_{i,s,n}, 1\le s\le p)^\top$. Let $(\epsilon_i)_{i\in\mathbb Z}$ be i.i.d. $N(0,1)$, independent of $(e_{i,s,n})_{i\in\mathbb Z,1\le s\le p}$. The locally stationary process $z_{i,n}$ considered in this subsection is $z_{i,n} = 4\sum_{j=0}^{\infty}\theta^j(i/n)\epsilon_{i-j}$, where the coefficient function $\theta(t)$ is an increasing linear function of $t$ with values in $(0,1)$. Observe that $D = 0$ corresponds to the situation in which $A(t)$ is time invariant, and as $D$ increases we obtain larger deviations from the null. We examine the cases $p = 25, 50, 100, 150$ and $n = 1500$, increasing $D$ from 0 to 0.
5. The results are displayed in Table 4 and support that our test has decent power. Each simulated rejection rate is based on 100 simulated samples, and each sample uses $B = 1000$ bootstrap samples for the block bootstrap algorithm of Section 5.3.

Table 4: Simulated rejection rates (in %) at different alternatives, with $B = 1000$; columns correspond to increasing deviations $D$ from the null.

p=25    6  13  18  35  33  41  96  97  100 100 100 100
p=50    7  12  14  27  21  39  91  95  100 100 100 100
p=100   4  11  17  33  49  64  79  87  100 100 100 100
p=150   7  12   9  25  23  32  55  65  100 100 100 100

To illustrate the usefulness of our method, we investigate the implied volatility surface of Microsoft call options during the period June 1st, 2014 to June 1st, 2019, with 1260 trading days in total. The data are obtained from OptionMetrics through the WRDS database. For each day we observe the volatility $W_t(u_i,v_j)$, where $u_i$ is the time to maturity and $v_j$ is the delta. A similar type of data has been studied by Lam et al. (2011).

We first examine whether the considered data set has a static loading matrix by performing our test procedure of Section 4. Specifically, we let $u_i$, $i = 1,\ldots,10$, take ten maturity values between 30 and 730 calendar days (including 91, 122, 152, 182, 273, 365 and 547), and we let $v_j$, $j = 1,\ldots,5$, take five delta values, the largest being 0.55. Then we write $\tilde X_t = \mathrm{vec}(W_t(u_i,v_j))_{1\le i\le 10, 1\le j\le 5}$. Due to the well-documented unit-root non-stationarity of $\tilde X_t$ (see, e.g., Fengler et al. (2007)), we study the 50-dimensional differenced time series $X_t := \tilde X_t - \tilde X_{t-1}$.
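The construction of the 50-dimensional series from the daily surface can be sketched as follows; the array layout and names are illustrative assumptions, not part of the original analysis.

```python
import numpy as np

def differenced_surface_series(W):
    """Sketch: build the differenced series X_t from a daily implied-volatility
    surface W[t, i, j] observed on a 10 x 5 grid of (maturity u_i, delta v_j).
    W: (T, 10, 5) array.  Returns X of shape (T-1, 50) with
    X[t] = vec(W[t+1]) - vec(W[t])."""
    T = W.shape[0]
    X_tilde = W.reshape(T, -1)        # vectorize each day's surface
    return np.diff(X_tilde, axis=0)   # first differences remove unit roots
```

Each row of the output corresponds to one day's change in the vectorized surface.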
Figure 1: Estimated rank of the loading matrix at each time point.

Figure 2: Percentage of variance explained by the eigenvectors corresponding to the largest and second largest eigenvalues.

In our data analysis we choose $k_0 = 3$. Using the minimal volatility method we select $N_n = 4$, the number of non-overlapping equal-sized blocks, and choose the window size $w_n = 5$. Via $B = 10000$ bootstrap samples we obtain a $p$-value that provides moderate to strong evidence against static factor loadings.

We then apply our sieve estimator of Section 3 to estimate the time-varying loading matrix. The cross-validation method suggests using the Legendre polynomial basis up to order 6. We find that during the considered period the number of factors, i.e. the rank of the loading matrix, is either one or two at each time point. In Figure 1 we display the rank of the estimated factor loading matrix at each time, and in Figure 2 we show the percentage of variance explained by the eigenvectors corresponding to the first and second largest eigenvalues.

Finally, we examine the performance of our method in predicting the next-day value of the differenced volatility, and we compare it with the local PCA method mentioned in Section 6.1. For the given period, we first apply our sieve-PCA based method of Section 3 to obtain $\hat d(i/n)$, the estimated dimension of $A(i/n)$. Denote $d_{\max} = \max_{1\le i\le n}\hat d(i/n)$. At each time $i/n$ we set $\check A(i/n)$ as the $p\times d_{\max}$ matrix whose $j$th column, $1\le j\le d_{\max}$, is the eigenvector of $\Lambda(i/n)$ (see (8)) associated with $\lambda_j(\Lambda(i/n))$. We estimate the $d_{\max}$-dimensional time series $\tilde z_{i,n}$ by $\hat z_{i,n} = \check A^\top(i/n)X_i$, and we obtain forecasts $z_{i+1,j,pred}$, $1\le j\le d_{\max}$, based on $\hat z_{s,j}$, $1\le j\le d_{\max}$, $1\le s\le i$. Let $z_{i+1,n,pred} = (z_{i+1,j,pred}, 1\le j\le d_{\max})^\top$, and let $J_{d_1,d_2}$ be the $d_2\times d_2$ diagonal matrix whose first $d_1$ diagonal entries are 1 and whose remaining entries are 0. Then the predictor is defined by

$$X_{i+1,pred} = A_{pred}((i+1)/n)\,z_{i+1,n,pred} \qquad (34)$$

with $A_{pred}((i+1)/n) = \check A(i/n)\,J_{\hat d(i/n),d_{\max}}$.
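The predictor (34) can be sketched as below. The per-factor AR(1) forecast used here is an illustrative stand-in for the factor forecasting step (the paper uses either the locally stationary method of Dette and Wu (2020) or a stationary AR fit); the function name and inputs are assumptions for the sketch.

```python
import numpy as np

def one_step_predict(X_hist, Lambda_i, d_hat, d_max):
    """Sketch of predictor (34): project onto the top-d_max eigenvectors of the
    local matrix Lambda_i, forecast each factor with a simple AR(1) fit
    (illustrative stand-in for the factor forecasting step), map back, and keep
    only the first d_hat estimated factor directions via J_{d_hat, d_max}."""
    _, eigvecs = np.linalg.eigh(Lambda_i)
    A_check = eigvecs[:, ::-1][:, :d_max]      # p x d_max eigenvector matrix
    Z = X_hist @ A_check                       # estimated factors, one row per time
    z_pred = np.empty(d_max)
    for j in range(d_max):
        z = Z[:, j]
        phi = (z[1:] @ z[:-1]) / max(z[:-1] @ z[:-1], 1e-12)  # AR(1) coefficient
        z_pred[j] = phi * z[-1]
    J = np.diag((np.arange(d_max) < d_hat).astype(float))     # J_{d_hat, d_max}
    return A_check @ J @ z_pred                # p-dimensional one-step prediction
```

The returned vector is the one-step-ahead prediction of the $p$-dimensional differenced series.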
As an alternative approach, one can also predict $z_{i+1}$ by fitting stationary autoregressive models as in Lam et al. (2011). For comparison, we consider the mean squared prediction error (MSPE) over a given period $j, j+1, \ldots, T$, defined as

$$\mathrm{MSPE} = \frac{\sum_{s=j}^{T}\|X_{s+1,pred} - X_{s+1}\|^2}{p(T-j+1)}. \qquad (35)$$

We examine the MSPE of our sieve-PCA based method and of the local-PCA based method proposed in the data analysis section of Lam et al. (2011), with $z_i$ predicted both by the locally stationary prediction method and by stationary AR models. The period we consider for the MSPE (35) starts from 26 Sep. 2017, which corresponds to $j = 838$, about $2/3$ of the whole sample, where each $X_{s+1,pred}$ is predicted from the implied volatilities observed at $1\le t\le s$. To apply the local-PCA method of Lam et al. (2011) for prediction, we use the data at $s-L\le t\le s$ with $L = 100$ and $L = 200$; Lam et al. (2011) used $L = 99$ for prediction. We summarize the results in Table 5, which shows that our method achieves the smallest MSPE on this data set.

Table 5: MSPE for the sieve-PCA based method and the local-PCA based method. The first row corresponds to predicting $z_{i+1}$ using a stationary AR model, and the second row to predicting $z_{i+1}$ using the method of Dette and Wu (2020). The values are multiplied by 10000.

      Our Method   Local PCA (L=100)   Local PCA (L=200)
AR    1.3          1.39                1.32
LS    1.27*        1.44                1.41

A benchmark procedure for forecasting is to use today's value as the one-step-ahead prediction. The MSPE of this benchmark is about $3\times 10^{-4}$, which is worse than both the local-PCA and the sieve-PCA based approaches.

Figure 3: Cumulative MSPE for the benchmark procedure (Bench), the sieve-PCA method with locally stationary prediction (SP) and with AR-fit prediction (SAP), the local-PCA method with $L = 100$ with locally stationary prediction (PCA100) and AR-fit prediction (PCAR100), and the local-PCA method with $L = 200$ with locally stationary prediction (PCA200) and AR-fit prediction (PCAR200).

In Figure 3 we show the cumulative errors of the benchmark procedure and of the forecasting methods investigated in Table 5. The figure shows an improvement in MSPE at almost every time point when forecasting via our sieve-PCA based approach, compared with all other methods.

Supplemental Material
Abstract

Section A of this supplementary material provides detailed assumptions for the evolutionary factor models and a discussion of the rate in Theorem 5.3 of the main article. Section B gives the assumptions of, and comments on, Theorem 5.4 for the test of static factor loadings. Section C concerns the power performance of the test of static factor loadings. Section D presents the method for selecting the tuning parameters. Section E contains the proofs of Theorems 5.1, 5.2 and 5.3 for the estimation of the null space of the evolutionary loading matrix. Section F includes the proofs of Propositions 5.2 and 5.3 and of Theorems 5.4 and 5.5 for testing static factor loadings. Finally, Section G proves Theorem C.1.

A Assumptions for the time-varying loading matrix
In this section we state our assumptions on the moments and the dependence of the processes $X_i$, $z_i$ and $e_i$. For this purpose we introduce further notation. At time $i/n$, the $d(i/n)$-dimensional factor vector is $z_{i,n} = (z_{i,l_1,n},\ldots,z_{i,l_{d(i/n)},n})^\top$, where the index set $\{l_1,\ldots,l_{d(i/n)}\}\subset\{1,2,\ldots,d\}$. Let $Q(t,\mathcal F_i) = (Q_{l_1}(t,\mathcal F_i),\ldots,Q_{l_{d(t)}}(t,\mathcal F_i))^\top$. Denote by $\Sigma_z(t,k) = E(Q(t,\mathcal F_{i+k})Q^\top(t,\mathcal F_i))$ and $\Sigma_e(t,k) = E(H(t,\mathcal F_{i+k})H^\top(t,\mathcal F_i))$ the $k$th order auto-covariance functions of $z_{i,n}$ and $e_{i,n}$ at time $t$, respectively. Let $\Sigma_{ze}(t,k) = E(Q(t,\mathcal F_{i+k})H^\top(t,\mathcal F_i))$ and $\Sigma_{ez}(t,k) = E(H(t,\mathcal F_{i+k})Q^\top(t,\mathcal F_i))$. Let $C^K_{\tilde M}[0,1]$ be the collection of functions $f$ defined on $[0,1]$ whose $K$th derivative is Lipschitz continuous with Lipschitz constant $\tilde M$. For a matrix $F$, let $\|F\| = \sqrt{\lambda_{\max}(FF^\top)}$; if $F$ is a vector, then $\|F\| = \|F\|_F$. We first present assumptions on $A(t)$.

(A1) Let $a_{ij}(t)$, $1\le i\le p$, $1\le j\le d$, be the $(i,j)$th element of $A(t)$. We assume there exists a sufficiently large constant $M$ such that

$$\sup_{t\in[0,1]}|a_{ij}(t)| \le M. \qquad (A.1)$$

(A2) Assume that the $\sigma_{x,i,j}(t,k)$, $1\le i\le p$, $1\le j\le p$, $1\le k\le k_0$, belong to a common functional space $\Omega_0$ equipped with an orthonormal basis $\{B_j(t)\}$, i.e. $\int_0^1 B_m(t)B_n(t)\,dt = \mathbf 1(m=n)$, where $\mathbf 1(\cdot)$ is the indicator function. Assume $\Omega_0\subset C^K_{\tilde M}[0,1]$ for some $K\ge 2$. Moreover,

$$\max_{1\le i\le p,\,1\le j\le p}\sup_{t\in[0,1]}\Big|\sigma_{x,i,j}(t,k) - \sum_{u=1}^{J_n}\tilde\sigma_{x,i,j,u}(k)B_u(t)\Big| = O(g_{J_n,K,\tilde M}), \qquad (A.2)$$

where $\tilde\sigma_{x,i,j,u}(k) = \int_0^1\sigma_{x,i,j}(t,k)B_u(t)\,dt$ and $g_{J_n,K,\tilde M}\to 0$ as $J_n\to\infty$.

Condition (A1) concerns the boundedness of the loading matrix, while (A2) means that $\Sigma_X(t,k)$ can be approximated by a basis expansion. The approximation error rate $g_{J_n,K,\tilde M}$ diminishes as $J_n$ increases; higher differentiability typically yields a more accurate approximation rate.

Remark A.1. Consider the case where, for $1\le k\le k_0$ and $1\le i,j\le p$, the derivatives $\sigma^{(1)}_{x,i,j}(t,k),\ldots,\sigma^{(v-1)}_{x,i,j}(t,k)$ are absolutely continuous in $t$, where $\sigma^{(m)}_{x,i,j}(t,k) = \partial^m\sigma_{x,i,j}(t,k)/\partial t^m$. We can then specify the rate $g_{J_n,v+1,\tilde M}$ for various basis functions.

(a) Assume that (i) $\max_{1\le i,j\le p,\,1\le k\le k_0}\int_0^1|\sigma^{(v+1)}_{x,i,j}(t,k)|/\sqrt{1-t^2}\,dt \le M < \infty$ for some constant $M$, and that $\sigma^{(v)}_{x,i,j}(t,k)$ is of bounded variation for all $i,j$; then the uniform approximation rate $g_{J_n,v+1,\tilde M}$ is $J_n^{-v+1/2}$ for the normalized Legendre polynomial basis. If the $\sigma_{x,i,j}(t,k)$ are analytic inside and on a certain region of the complex plane, then the approximation rate decays geometrically in $J_n$. See, for instance, Wang and Xiang (2012) for more details.

(b) If $\sigma^{(v)}_{x,i,j}(t,k)$ is Lipschitz continuous, then the uniform approximation rate $g_{J_n,v+1,\tilde M}$ is $J_n^{-v}$ when trigonometric polynomials (if all $\sigma_{x,i,j}(t,k)$ can be extended to periodic functions) or orthogonal wavelets generated by sufficiently high order father and mother wavelets are used. See Chen (2007) for more details.

The assumptions for $z_{i,n}$ and $e_{i,n}$ are as follows.

(M1) Short-range dependence holds for both $z_{i,n}$ and $e_{i,n}$ in the $\mathcal L^l$ norm, i.e.,

$$\Delta_{Q,l,m} := \max_{1\le j\le d}\sum_{k=m}^{\infty}\delta_{Q,l,j}(k) = o(1), \qquad \Delta_{H,l,m} := \max_{1\le j\le p}\sum_{k=m}^{\infty}\delta_{H,l,j}(k) = o(1) \qquad (A.3)$$

as $m\to\infty$, for some constant $l\ge 2$.

(M2) There exists a constant $M$ such that

$$\sup_{t\in[0,1]}\max_{1\le u\le d}E|Q_u(t,\mathcal F_0)|^{2l} \le M, \qquad \sup_{t\in[0,1]}\max_{1\le v\le p}E|H_v(t,\mathcal F_0)|^{2l} \le M.$$

(M3) For $t,s\in[0,1]$ there exists a constant $M$ such that

$$\big(E|Q_u(t,\mathcal F_0) - Q_u(s,\mathcal F_0)|^2\big)^{1/2} \le M|t-s|, \quad 1\le u\le d, \qquad (A.4)$$
$$\big(E|H_v(t,\mathcal F_0) - H_v(s,\mathcal F_0)|^2\big)^{1/2} \le M|t-s|, \quad 1\le v\le p, \qquad (A.5)$$
$$\big(E|G_v(t,\mathcal F_0) - G_v(s,\mathcal F_0)|^2\big)^{1/2} \le M|t-s|, \quad 1\le v\le p. \qquad (A.6)$$

Conditions (M1)-(M3) mean that each coordinate process of $z_{i,n}$ and $e_{i,n}$ is a standard short-memory locally stationary time series as defined in the literature; see for instance Zhou and Wu (2009).
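For illustration, suppose a factor component has a linear form; this is an assumption made here purely for illustration, not one of the conditions above. Then the dependence measures in (M1) can be computed explicitly:

```latex
% Illustrative linear locally stationary component (assumed form)
\[
  Q_u(t,\mathcal F_i) \;=\; \sum_{j=0}^{\infty} a_j(t)\,\epsilon_{i-j},
  \qquad (\epsilon_i)_{i\in\mathbb Z}\ \text{i.i.d.},\ \epsilon_0\in\mathcal L^{l}.
\]
% Replacing \epsilon_{i-k} by an independent copy changes only the k-th summand:
\[
  \delta_{Q,l,u}(k)
  \;=\;\sup_{t\in[0,1]}\big\|a_k(t)\,(\epsilon_{i-k}-\epsilon_{i-k}')\big\|_{\mathcal L^{l}}
  \;=\;\sup_{t\in[0,1]}|a_k(t)|\;\big\|\epsilon_0-\epsilon_0'\big\|_{\mathcal L^{l}},
\]
% so the short-range dependence condition (M1) holds whenever the tails vanish:
\[
  \Delta_{Q,l,m}\;\lesssim\;\sum_{k=m}^{\infty}\sup_{t\in[0,1]}|a_k(t)|\;=\;o(1),
  \qquad m\to\infty .
\]
```

In particular, geometrically decaying coefficients $\sup_t|a_k(t)| = O(\chi^k)$, $\chi\in(0,1)$, satisfy (M1) for every $l$ for which $\epsilon_0\in\mathcal L^l$.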
Furthermore, the moment condition (M2) implies that for $0\le t\le 1$ all elements of the matrices $\Sigma_{ze}(t,k)$ and $\Sigma_e(t,k)$ are uniformly bounded.

We then postulate the following assumptions on the covariance matrices of the common factors $z_{i,n}$ and the idiosyncratic components $e_{i,n}$, which are needed for the spectral decomposition:

(S1) For $t\in[0,1]$ and $k = 1,\ldots,k_0$, all components of $\Sigma_e(t,k)$ are 0.

(S2) For $k = 0,1,\ldots,k_0$, $\Sigma_z(t,k)$ has full rank on a sub-interval of $[0,1]$.

(S3) For $t\in[0,1]$ and $k = 1,\ldots,k_0$, all components of $\Sigma_{ez}(t,k)$ are 0.

(S4) For $t\in[0,1]$ and $1\le k\le k_0$, $\|\Sigma_{ze}(t,k)\|_F = o(\eta_n^{1/2}p^{-\delta})$, where $\eta_n$ is the sequence defined in condition (C1).

Condition (S1) indicates that $\{e_{i,n}\}$ has no auto-covariance up to order $k_0$, which is slightly weaker than the requirement, used in the literature, that $\{e_{i,n}\}$ be a white noise process. Condition (S2) implies that for $1\le i\le n$ there exists a period during which no linear combination of the components of $z_{i,n}$ is white noise that could be absorbed into $e_{i,n}$. (S3) implies that $z_{i,n}$ and $\{e_{i+k,n}\}$ are uncorrelated for any $k\ge 0$. Condition (S4) requires weak correlation between $z_{i+k}$ and $e_i$; it is the non-stationary extension of condition (i) in Theorem 1 of Lam et al. (2011) and condition (C6) of Lam and Yao (2012). Although (C6) of Lam and Yao (2012) assumes a rate of $o(p^{-\delta})$, it requires standardization of the factor loading matrix $A$.

Remark A.2.
We now discuss the equivalent assumptions for the fixed-dimensional loading matrix model (2). Define $\Sigma_{z^*}(t,k)$, $\Sigma_{z^*,e^*}(t,k)$, $\Sigma_{e^*}(t,k)$ and $\Sigma_{e^*,z^*}(t,k)$ analogously to $\Sigma_z(t,k)$, $\Sigma_{z,e}(t,k)$, $\Sigma_e(t,k)$ and $\Sigma_{e,z}(t,k)$, and assume conditions (S) for these quantities. By construction, conditions (M) are satisfied; in particular, under the equivalent model, (A.4) and (A.5) imply (A.6). Conditions (A) are satisfied if each element of $A^*(t)$, $\Sigma_{z^*}(t,k)$ and $\Sigma_{z^*,e^*}(t,k)$, $k = 0,\ldots,k_0$, has a bounded $K$th order derivative. Condition (C1) is satisfied with $\mathcal T_{\eta_n}$ the period during which the smallest nonzero $\sigma_u$ exceeds $\eta_n$. If $d(t)$ is piecewise constant with a bounded number of change points, then $|\mathcal T_{\eta_n}|\to 1$ as $n\to\infty$ and $\eta_n\to 0$. If $d(t)\equiv d$, we can take $\mathcal T_{\eta_n} = [0,1]$ for some sufficiently small positive $\eta_n := \eta > 0$.

A.1 Discussion of the rate in Theorem 5.3

Remark A.3. If we assume that the $\sigma_{x,i,j}(t,k)$ are real analytic, and that normalized Legendre polynomials or trigonometric polynomials (when all $\sigma_{x,i,j}(t,k)$ can be extended to periodic functions) are used as the basis, then taking $J_n = M\log n$ for some large constant $M$, the sieve approximation error is of the order $p^{-\delta}\log n/\sqrt n$, the rate in part (i) of Theorem 5.1 becomes $p^{\delta}\log n/\sqrt n$, and that in part (ii) becomes $p^{-1/2} + p^{\delta/2}\log n/\sqrt n$. Our rates thus match those of Lam et al. (2011) for stationary high-dimensional time series up to a logarithmic factor. For Theorem 5.2, when $\eta_n\equiv\eta$ for some fixed constant $\eta$, the approximation rate is $p^{-\delta}\log n/\sqrt n$ for (i), $p\log n/n$ for (ii) and $p^{\delta}\log n/n$ for (iv), again taking $J_n = M\log n$ for some large constant $M$. These rates coincide with those of Theorem 1 and Corollary 1 of Lam and Yao (2012) up to a logarithmic factor. The above findings are consistent with results in nonparametric sieve estimation for analytic functions, where the uniform convergence rates of sieve estimators (when suitable adaptive sieve bases are used) fall short of the $n^{-1/2}$ parametric rate only by a factor of a logarithm. We also point out that the sieve approximation rates are adaptive to the smoothness: they are slower when the $\sigma_{x,i,j}(t,k)$ are less smooth, in which case $g_{J_n,K,\tilde M}$ converges to zero at an adaptive but slower rate as $J_n$ increases.

A.2 The moment condition (M') for Proposition 5.3
The following mild moment condition (M') is needed for Proposition 5.3.

(M') There exist an integer $\tilde l\ge 2$ and a constant $M$ such that $\max_{1\le j\le d}\sum_{k=1}^{\infty}\delta_{Q,\tilde l,j}(k) < \infty$, $\max_{1\le j\le p}\sum_{k=1}^{\infty}\delta_{H,\tilde l,j}(k) < \infty$ and

$$\sup_{t\in[0,1]}\max_{1\le u\le d}E|Q_u(t,\mathcal F_0)|^{\tilde l} \le M, \qquad \sup_{t\in[0,1]}\max_{1\le v\le p}E|H_v(t,\mathcal F_0)|^{\tilde l} \le M.$$

B Assumptions of and comments on Theorem 5.4

(a) $\sup_{t\in[0,1]}\|A(t)\|_\infty$ is uniformly bounded in $n$ and $p$.

(b)-(c) There exists a constant $0\le\zeta<1$ such that $(N_np)^{2/l} = O(m_n^{\zeta})$ and $k_0N_n(p-d)p \le \exp(m_n^{\zeta})$.

(d) Let $\sigma_{j,j} = \sum_{i,l=1}^{m_n}\mathrm{cov}(\tilde Z_{i,j},\tilde Z_{l,j})/m_n$. There exists $\eta>0$ such that $\min_j\sigma_{j,j}\ge\eta$, i.e., each of the $k_0N_np(p-d)$ univariate sub-series of $\tilde Z_i$ is non-degenerate.

(e) The dependence measures of $X_{i,n}$ satisfy

$$\delta_{G,l}(k) = O\big(((k+1)\log(k+1))^{-2}\big). \qquad (B.7)$$

(f) There exists a constant $M_l$ depending on $l$ such that for $1\le i\le n$ and all $p$-dimensional vectors $c$ with $\|c\| = 1$, the inequality $\|c^\top e_{i,n}\|_{\mathcal L^l} \le M_l\|c^\top e_{i,n}\|_{\mathcal L^2}$ holds. Also, $\max_{1\le i\le n}\lambda_{\max}(\mathrm{cov}(e_{i,n}e_{i,n}^\top))$ is uniformly bounded as $n$ and $p$ diverge.

The rate function of Theorem 5.4 is defined by

$$\iota(n,p,q,D_n) = \min\Big\{ n^{-1/8}M^{1/2}l_n^{7/8} + \gamma + \big(n^{1/8}M^{-1/2}l_n^{-3/8}\big)^{q/(1+q)}\Big(\sum_{j=1}^{p}\Theta_{M,j,q}^{q}\Big)^{1/(1+q)} + \Xi_M^{1/3}\big(1\vee\log(p/\Xi_M)\big)^{2/3}\Big\}, \qquad (B.8)$$

where the minimum is taken over all possible values of $\gamma$ and $M$ subject to $n^{1/8}M^{-1/2}l_n^{-3/8} \ge \max\{D_n(n/\gamma)^{1/q}, l_n^{1/2}\}$. Here $l_n = \log(pn/\gamma)\vee 1$, $\Theta_{M,j,q} = \sum_{k=M}^{\infty}\delta_{G,q,j}(k)$ and $\Xi_M = \max_{1\le j\le p}\sum_{k=M}^{\infty}k\,\delta_{G,2,j}(k)$.

Conditions (a)-(e) control the magnitude of the loading matrix, the dimension and the dependence of the time series $X_{i,n}$, as well as the non-degeneracy of the process $\tilde Z_i$. These conditions are standard in the literature on high-dimensional Gaussian approximation; see for instance Zhang and Cheng (2018). Condition (f) controls the $\mathcal L^l$ norms of projections of $e_{i,n}$ by their $\mathcal L^2$ norms, which essentially requires that the dependence among the components of $e_{i,n}$ not be too strong. (f) is mild in general and is satisfied, for instance, if the components of $e_{i,n}$ are independent, or if $e_{i,n}$ is a sub-Gaussian random vector with a variance proxy that does not depend on $p$ for each $i$. Furthermore, we comment that condition (d) is a mild non-degeneracy condition. For example, assume (i) $e_{s,n}$ is independent of $z_{l,n}$ for $s\ge l$ and of $e_{l,n}$ for $s>l$, $\min_{1\le i\le n,\,1\le j\le p}EX_{i,j}^2 \ge \eta' > 0$ and $\min_{1\le i\le n}\lambda_{\min}(\mathrm{cov}(e_{i,n}e_{i,n}^\top)) \ge \eta'' > 0$ for some constants $\eta'$ and $\eta''$, and (ii) conditions (a)-(c), (e) and (f) hold; then condition (d) holds. To see this, note that under the null hypothesis, for $1\le s\le p-d$, $1\le k\le k_0$, $1\le i\ne l\le n$ and $1\le v\le p$, we have

$$E\big(F_s^\top X_{i+k,n}X_{i,v}\,F_s^\top X_{l+k,n}X_{l,v}\big) = E\big(F_s^\top e_{i+k,n}X_{i,v}\,F_s^\top e_{l+k,n}X_{l,v}\big) = 0, \qquad (B.9)$$

$$E\big((F_s^\top X_{i+k,n}X_{i,v})^2\big) = E\big((F_s^\top e_{i+k,n}X_{i,v})^2\big) = F_s^\top\mathrm{cov}(e_{i+k,n}e_{i+k,n}^\top)F_s\,E(X_{i,v}^2) \ge \eta'\eta''. \qquad (B.10)$$

Consequently, (B.9) and (B.10) imply condition (d).

C Power
In this section we discuss the local power of our bootstrap-assisted testing algorithm in Section 5.3 of the main article for testing static factor loadings. For two matrices $A$ and $B$, denote by $A\circ B$ the Hadamard product of $A$ and $B$. Let $J$ be a $p\times d$ matrix with all entries 1, where $d = \mathrm{rank}(A)$. To simplify the proof, we further assume the following condition (G):

(G1) $e_i$ is independent of $z_s, e_s$ for $i > s$.

(G2) The dependence measures of $z_i$ satisfy

$$\max_{1\le j\le d}\delta_{Q,2,j}(k) = O\big(((k+1)\log(k+1))^{-2}\big). \qquad (C.1)$$

(G3) The dependence measures of $e_i$ satisfy

$$\sum_{k=1}^{\infty}k\sup_{0\le t\le 1}\big\|F^\top H(t,\mathcal F_i) - F^\top H(t,\mathcal F_i^{(i-k)})\big\|_{\mathcal L^2} = O(\log^2 p) \qquad (C.2)$$

for all $p$-dimensional vectors $F$ such that $\|F\|_F = 1$.

(G1) and (G2) are mild further assumptions on the error and factor processes. Condition (G3) controls the strength of the temporal dependence of projections of the error process $e_i$; it essentially requires that each component of $e_i$ be a short-memory process and that the dependence among the components of $e_i$ not be too strong. Elementary calculations show that condition (G3) is satisfied in two important scenarios. The first is when the dependence measures satisfy $\max_{1\le j\le p}\delta_{H,2,j}(k) = O(\chi^k)$ for some $\chi\in(0,1)$. The second is when the components of $e_i$ are independent, i.e., $H_{j_1}(\cdot,\mathcal F_u)$ is independent of $H_{j_2}(\cdot,\mathcal F_v)$ for all $u,v\in\mathbb Z$ whenever $j_1\ne j_2$, $1\le j_1,j_2\le p$, and $\sum_{k=1}^{\infty}k\max_{1\le j\le p}\delta_{H,2,j}(k) = O(1)$.

Lemma C.1. Under conditions (f), (M') and $\max_{1\le j\le p}\delta_{H,2,j}(k) = O(\chi^k)$ for some $\chi\in(0,1)$, (G3) holds.

Proof. Notice that by condition (f) there exists a constant $M_1$ such that

$$\big\|F^\top H(t,\mathcal F_i) - F^\top H(t,\mathcal F_i^{(i-k)})\big\|_{\mathcal L^2} \le 2\big\|F^\top H(t,\mathcal F_i)\big\|_{\mathcal L^2} \le M_1. \qquad (C.3)$$

On the other hand, by the triangle inequality,

$$\big\|F^\top H(t,\mathcal F_i) - F^\top H(t,\mathcal F_i^{(i-k)})\big\|_{\mathcal L^2} \le \sum_{j=1}^{p}|F_j|\,\big\|H_j(t,\mathcal F_i) - H_j(t,\mathcal F_i^{(i-k)})\big\|_{\mathcal L^2} = O(\sqrt p\,\chi^k), \qquad (C.4)$$

where the last bound follows from the Cauchy-Schwarz inequality and condition (M'). As a consequence, (C.3) and (C.4) yield

$$\sup_{0\le t\le 1}\big\|F^\top H(t,\mathcal F_i) - F^\top H(t,\mathcal F_i^{(i-k)})\big\|_{\mathcal L^2} = O(1\wedge\sqrt p\,\chi^k). \qquad (C.5)$$

Notice that when $k = \lfloor a\log p\rfloor$ for a sufficiently large constant $a$, $\sqrt p\,\chi^k = O(1)$. Therefore, by (C.5),

$$\sum_{k=1}^{\lfloor a\log p\rfloor}k\sup_{0\le t\le 1}\big\|F^\top H(t,\mathcal F_i) - F^\top H(t,\mathcal F_i^{(i-k)})\big\|_{\mathcal L^2} + \sum_{k=\lfloor a\log p\rfloor+1}^{\infty}k\sup_{0\le t\le 1}\big\|F^\top H(t,\mathcal F_i) - F^\top H(t,\mathcal F_i^{(i-k)})\big\|_{\mathcal L^2} = O(\log^2 p) + O\Big(\sum_{k=\lfloor a\log p\rfloor+1}^{\infty}k\chi^k\Big) = O(\log^2 p),$$

which finishes the proof. $\Box$

We consider the following class of local alternatives:

$$H_A: A(t) = A_n(t) := A\circ(J + \rho_nD(t)), \qquad (C.6)$$

where $D(t) = (d_{ij}(t))$ is a $p\times d$ matrix, $\max_{i,j}\sup_t|d_{ij}(t)| \le 1$, $\sup_{t\in[0,1]}\|D(t)\circ A(t)\|_F = O(p^{-\delta'})$ for some $\delta'\in[\delta,1]$, and $\rho_n\downarrow 0$. Note that $A_n(t)$ is a function of $\rho_n$ and therefore

$$\int_0^1\Sigma_X(t,k)\,dt = \int_0^1 A_n(t)\big(\Sigma_z(t,k)A_n^\top(t) + \Sigma_{z,e}(t,k)\big)\,dt := \gamma(\rho_n,k), \qquad k>0.$$

Let $\Gamma_k(\rho_n) = \gamma(\rho_n,k)\gamma(\rho_n,k)^\top$ and $\Gamma(\rho_n) = \sum_{k=1}^{k_0}\Gamma_k(\rho_n)$. To simplify notation, write $\Gamma(\rho_n)$, $\rho_n > 0$, as $\Gamma_n$, and let $F_n = (F_{1,n},\ldots,F_{p-d,n})$ be an orthogonal basis of the null space of $\Gamma_n$. Recall the centered $k_0N_n(p-d)p$-dimensional Gaussian random vectors $Y$ and $Y_i$ defined in Theorem 5.4 of the main article and the definition of $\tilde Z_i$. Define $\check Z_i$ in analogy to $\tilde Z_i$ with $F$ replaced by $F_n$.

Theorem C.1. Consider the alternative hypothesis (C.6). Assume in addition that for $t\in[0,1]$, $\mathrm{rank}(A(t)) = d$, $p^{\delta}n^{-1/2}m_n^{1/2}\Omega_n = o(1)$, and $\Delta_n = |\sum_{i=1}^{m_n}E\check Z_i/m_n|_\infty \asymp \rho_np^{-\delta'}$. Then under the conditions of Theorem 5.5 and condition (G), we have the following results.

(a) Under the considered alternative hypothesis, if $\rho_nm_n^{1/2}\log^{1/2}n\,\Omega_n = o(1)$ then

$$\sup_{t\in\mathbb R}\big|P(\hat T_n\le t) - P(|Y|_\infty\le t)\big| = O\Big(\Big(\big(\rho_np^{\delta-\delta'} + \tfrac{p^{\delta}}{\sqrt n}\big)\sqrt{m_n\log n}\,\Omega_n\Big)^{1/3} + \big(\rho_n\sqrt{m_n\log n}\,\Omega_n\big)^{1/3} + \iota(m_n, k_0N_np(p-d), l, (N_np)^{2/l})\Big), \qquad (C.7)$$

where the function $\iota$ is defined in (B.8). Furthermore, if $\sqrt{m_n}\rho_np^{-\delta'}\to\infty$, then $\hat T_n$ diverges at the rate $\sqrt{m_n}\rho_np^{-\delta'}$.

(b) Assume in addition that the conditions of Theorem 5.5 hold; then if $\Theta_n^{1/3}\log^{2/3}(W_{n,p}/\Theta_n) = o(1)$,

$$\sup_{t\in\mathbb R}\big|P(|Y|_\infty\le t) - P(|\kappa_n|_\infty\le t \mid X_i, 1\le i\le n)\big| = O_p\Big(\Theta_n^{1/3}\log^{2/3}(W_{n,p}/\Theta_n) + \big((\tfrac{p^{\delta}}{\sqrt n} + \rho_n)\sqrt{w_n}\,\Omega_n\big)^{l/(1+l)}\Big). \qquad (C.8)$$

Furthermore, $|\kappa_n|_\infty = O_p(\sqrt{w_n}\,\rho_np^{-\delta'}\sqrt{\log n})$. Therefore, by (a) and (b), if $\sqrt{m_n}/(\sqrt{w_n}\log n)\to\infty$ and $\sqrt{m_n}\rho_np^{-\delta'}\to\infty$, the power of the test approaches 1 as $n\to\infty$.

We introduce $\delta'$ to allow $D(t)$ to have a different level of strength than $A$. For instance, if $D(t)$ has a bounded number of non-zero entries, then $\delta' = 1$, while in general $\delta'$ can be any number in $[\delta,1]$. The condition $\Delta_n\asymp\rho_np^{-\delta'}$ is mild.
In fact, it is shown in equation (G.8) in the proof of Proposition G.2 that $|E\check Z_i|_\infty = O(\rho_np^{-\delta'})$, which means $\Delta_n = O(\rho_np^{-\delta'})$. On the other hand, the condition $\Delta_n\asymp\rho_np^{-\delta'}$ requires that there exist at least one eigenvector of the null space of $\sum_{k=1}^{k_0}\Gamma_k$ such that an order $p$ of its entries have the same order of magnitude.

D Selection of Tuning Parameters
D.1 Selection of tuning parameters for the estimation of time-varying factor loading matrices

We discuss the selection of $J_n$ for the estimation of the time-varying factor loadings. Since in practice the smoothness of the underlying functions is unknown, a data-driven method to select $J_n$ is desirable. By Theorem 5.2, the residuals are $\hat e_{i,n} = X_{i,n} - \hat V(\frac in)\hat V^\top(\frac in)X_{i,n}$; notice that $\hat e_{i,n} = (\hat e_{i,1,n},\ldots,\hat e_{i,p,n})^\top$ is a $p$-dimensional vector. We select $J_n$ as the minimizer of the following cross-validation criterion $CV(d)$:

$$CV(d) = \sum_{i=1}^{n}\sum_{s=1}^{p}\frac{\hat e_{i,s,n}^2(d)}{(1 - v_{i,s}(d))^2}, \qquad (D.1)$$

where $v_{i,s}(d)$ is the $s$th diagonal element of $\hat V(\frac in)\hat V^\top(\frac in)$ obtained by setting $J_n = d$, and the $\hat e_{i,s,n}(d)$, $1\le i\le n$, $1\le s\le p$, are the residuals calculated with $J_n = d$. Cross-validation of this type has been widely used in the literature on sieve nonparametric estimation and has been advocated by, for example, Hansen (2014). We choose $J_n = \mathrm{argmin}_d\,CV(d)$ in our empirical studies.

D.2 Selection of tuning parameters $m_n$ and $w_n$ for testing static factor loadings

We select $m_n$ by first selecting $N_n$ and letting $m_n = \lfloor (n-k_0)/N_n\rfloor$. $N_n$ is chosen by the minimal volatility method as follows. For a given data set, let

$$\hat T(N_n) = \sqrt{m_n}\max_{1\le k\le k_0}\max_{1\le j\le N_n}\max_{1\le i\le p-\hat d_n}\big|\hat F_i^\top\hat\Sigma_X(j,k)\big|_\infty \qquad (D.2)$$

be the test statistic obtained using the integer parameter $N_n$, with $\hat d_n = \mathrm{argmin}_{1\le i\le R}\,\lambda_{i+1}(\hat\Gamma)/\lambda_i(\hat\Gamma)$, where $R$ is defined in (20) of the main article.

Consider a set of candidate values of $N_n$, denoted by $\mathcal J = \{J_1,\ldots,J_s\}$, where the $J_v$ are positive integers. For each $J_v$, $1\le v\le s$, we calculate $\hat T(J_v)$ and hence the local standard error

$$SE(\hat T(J_l), h) = \Bigg(\frac{1}{2h}\sum_{u=-h}^{h}\Big(\hat T(J_{u+l}) - \frac{1}{2h+1}\sum_{u=-h}^{h}\hat T(J_{u+l})\Big)^2\Bigg)^{1/2}, \qquad (D.3)$$

where $1+h\le l\le s-h$ and $h$ is a positive integer, say $h = 1$. We then select $N_n$ by

$$\mathrm{argmin}_{1+h\le l\le s-h}\,SE(\hat T(J_l), h), \qquad (D.4)$$

which stabilizes the test statistic.
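The minimal volatility selection rule just described can be sketched as follows. This is a minimal sketch: the statistic is passed in as a callable, and the local standard error mirrors (D.3) with a sample-standard-deviation normalization.

```python
import numpy as np

def minimal_volatility_select(T_hat, candidates, h=1):
    """Sketch of the minimal volatility rule (D.3)-(D.4): choose the candidate
    N_n at which the test statistic is locally most stable.
    T_hat: callable mapping a candidate N_n to the statistic value;
    candidates: increasing list of integers; h: local window half-width."""
    stats = np.array([T_hat(J) for J in candidates])
    s = len(candidates)
    best_l, best_se = None, np.inf
    for l in range(h, s - h):              # indices with a full local window
        window = stats[l - h:l + h + 1]
        se = window.std(ddof=1)            # local standard error, cf. (D.3)
        if se < best_se:
            best_l, best_se = l, se
    return candidates[best_l]
```

The same routine can be reused for $w_n$ by applying it coordinate-wise, as in the multivariate extension described below.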
The idea behind the minimal volatility method is that the test statistic should behave stably as a function of $N_n$ when the latter is in an appropriate range. In our empirical studies we find that the proposed method performs reasonably well, and the results are not sensitive to the choice of $N_n$ (equivalently $m_n$) as long as the value used is not very different from that chosen by (D.4).

After choosing $m_n$, we further choose $w_n$ by the minimal volatility method. In this case we first obtain the $k_0N_n(p-d)p$-dimensional vectors $\{\hat Z_i, 1\le i\le m_n\}$ defined in Section 5.3 of the main article. We then select $w_n$ by a multivariate extension of the minimal volatility method of Zhou (2013), as follows. We consider choosing $w_n$ from a grid $w_1\le\cdots\le w_r$. For each $w_n = w_i$, $1\le i\le r$, and each $1\le r_1\le m_n - w_r + 1$, we calculate the $k_0N_n(p-d)p$-dimensional vector

$$B^o_{i,r_1} = \mathrm{Vec}\Big(\frac{w_i}{m_n - w_i + 1}\sum_{j=1}^{r_1}\Big(\hat S_{j,w_i} - \frac{w_i}{m_n}\hat S_{m_n}\Big)^{\circ 2}\Big),$$

where $\circ 2$ denotes the Hadamard square. Let $B^o_i = (B^{o\top}_{i,1},\ldots,B^{o\top}_{i,m_n-w_r+1})^\top$ be a $k_0N_n(p-d)p(m_n-w_r+1)$-dimensional vector, and let $B$ be the $k_0N_n(p-d)p(m_n-w_r+1)\times r$ matrix whose $i$th column is $B^o_i$. Then for each row, say the $i$th row $B_{i,\cdot}$ of $B$, we calculate $SE(B_{i,\cdot}, h)$ for a given window size $h$ (see (D.3) for the definition of $SE$), obtaining a row vector of length $r-2h$. Stacking these row vectors, we get a new $k_0N_n(p-d)p(m_n-w_r+1)\times(r-2h)$ matrix $B^{\dagger}$. Let $\mathrm{colmax}(B^{\dagger})$ be the $(r-2h)$-dimensional vector whose $i$th element is the maximum entry of the $i$th column of $B^{\dagger}$. We then choose $w_n = w_k$ if the smallest entry of $\mathrm{colmax}(B^{\dagger})$ is its $(k-h)$th element.

E Proofs of Theorems 5.1, 5.2, 5.3 and Proposition 5.1
We now present some additional notation for the proofs. For any random variable W = W ( t, F i ),write W ( g ) = W ( t, F ( g ) i ) where F ( g ) i = ( ǫ −∞ , ..ǫ g − , ǫ ′ g , ǫ g +1 , ..., ǫ i ) where ( ǫ ′ i ) i ∈ Z is an iid copyof ( ǫ i ) i ∈ Z . Let P j = E ( ·|F j ) − E ( ·|F j − ) be the projection operator. In the proof, let C be ageneric constant whose value may vary from line to line. Let a uv ( t ) = 0 if v
$\notin\{l_1,\ldots,l_{d(t)}\}$, where $l_i$, $1\le i\le d(t)$, is the index set of factors defined in Section A.

Proof of Theorem 5.1. For each $k\in\{1,\ldots,k_0\}$, we have that
$$\sup_{t\in[0,1]}\big\|\hat M(J_n,t,k)\hat M^\top(J_n,t,k)-\Sigma_X(t,k)\Sigma_X^\top(t,k)\big\|_F\le 2\|\Sigma_X(t,k)\|_F\,\|\hat M(J_n,t,k)-\Sigma_X(t,k)\|_F+\|\hat M(J_n,t,k)-\Sigma_X(t,k)\|_F^2. \quad (E.1)$$
Notice that by Jensen's inequality, the Cauchy–Schwarz inequality and conditions (C1), (M2), (S1)–(S4), it follows that, uniformly for $t\in[0,1]$,
$$\|\Sigma_X(t,k)\|_F^2\le \sum_{u=1}^p\sum_{v=1}^p\Big(\sum_{u'=1}^d\sum_{v'=1}^d a_{uu'}(t)\,E\big(Q_{u'}(t,\mathcal{F}_{i+k})Q_{v'}(t,\mathcal{F}_i)\big)\,a_{vv'}(t)\Big)^2+2\|A(t)\Sigma_{z,e}(t,k)\|_F^2\le C\sum_{u=1}^p\sum_{v=1}^p\Big(\sum_{u'=1}^d\sum_{v'=1}^d a_{uu'}(t)a_{vv'}(t)\Big)^2+o(p^{2-2\delta})\le C'd^2p^{2-2\delta} \quad (E.2)$$
for some sufficiently large constants $C$ and $C'$ which depend on the constant $M$ in condition (M2). On the other hand, by Lemmas E.4, E.5 and E.6 we have that
$$\Big\|\sup_{t\in[0,1]}\|\hat M(J_n,t,k)-\Sigma_X(t,k)\|_F\Big\|_{L^2}=O\Big(\frac{J_np\sup_{t,1\le j\le J_n}|B_j(t)|}{\sqrt n}+\beta_n\Big). \quad (E.3)$$
Then the theorem follows from equations (E.1), (E.2) and (E.3). $\Box$

Proof of Theorem 5.2.
We first prove (i). It suffices to show that the d ( t ) th largest eigenvalue of Λ ( t ) satisfiesinf t ∈T λ d ( t ) ( Λ ( t )) ' η n p − δ . (E.4)Then the theorem follows from Theorem 5.1, (E.4) and Theorem 1 of Yu et al. (2015). We now show(E.4). Consider the QR decomposition of A ( t ) such that A ( t ) = V ( t ) R ( t ) where V ( t ) ⊤ V ( t ) = I d ( t ) and I d ( t ) is an d ( t ) × d ( t ) identity matrix. Here V ( t ) is a p × d ( t ) matrix and R ( t ) is a d ( t ) × d ( t )matrix. Then (8) of the main article can be written as Λ ( t ) = V ( t ) ˜ Λ ( t ) V ⊤ ( t ) , (E.5)34here ˜ Λ ( t ) = R ( t ) h k X k =1 ( Σ z ( t, k ) A ⊤ ( t ) + Σ z,e ( t, k ))( A ( t ) Σ ⊤ z ( t, k ) + Σ ⊤ z,e ( t, k )) i R ⊤ ( t ) , (E.6)We now discuss the λ d ( t ) ( ˜ Λ ( t )). Write Σ Rz ( t, k ) = R ( t ) Σ z ( t, k ) R ⊤ ( t ) , Σ Rz,e ( t, k ) = R ( t ) Σ z,e ( t, k ) , (E.7) Σ ∗ Rz ( t, k ) = ( Σ Rz ( t, , ..., Σ Rz ( t, k )) , Σ ∗ Rz,e ( t, k ) = ( Σ Rz,e ( t, , ..., Σ Rz,e ( t, k )) . (E.8)Therefore we have˜ Λ ( t ) = ( Σ ∗ Rz ( t, k )( I k ⊗ V ⊤ ( t )) + Σ ∗ Rz,e ( t, k ))( Σ ∗ Rz ( t, k )( I k ⊗ V ⊤ ( t )) + Σ ∗ Rz,e ( t, k )) ⊤ (E.9)where ⊗ denotes the Kronecker product. On the other hand, by (C1) we havesup t ∈ [0 , k R ( t ) k ≍ p − δ , inf t ∈ [0 , k R ( t ) k ≍ p − δ , sup t ∈T ηn k R ( t ) k m ' η n p − δ , inf t ∈T ηn k R ( t ) k m ' η n p − δ . Then by conditions (M1), (M2), (M3), (S2) and (S4) we havesup t ∈ [0 , k Σ ∗ Rz,e ( t, k ) k = o ( p − δ ) , inf t ∈ [0 , k Σ ∗ Rz,e ( t, k ) k = o ( p − δ ) , sup t ∈T ηn σ d ( t ) ( Σ ∗ Rz ( t, k )) ' η n p − δ , inf t ∈T ηn σ d ( t ) ( Σ ∗ Rz ( t, k )) ' η n p − δ , where σ d ( t ) ( Σ ) of a matrix Σ denotes the d ( t ) th singular value of the matrix Σ . Therefor using thearguments in the proof of Theorem 1 of Lam et al. (2011) we have thatinf t ∈T ηn λ d ( t ) ( ˜ Λ ( t )) ' η n p − δ . (E.10)This shows (E.4). Therefore the (i) of the Theorem follows. 
The assertion (ii) follows from result (i), condition (C1) and an argument similar to the proof of Theorem 3 of Lam et al. (2011). Details are omitted for the sake of brevity. $\Box$

The following lemma from Bhatia (1982) is useful for proving Theorem 5.3.

Lemma E.1.
Let M ( n ) be the space of all n × n (complex) matrices. A norm k · k on M ( n ) issaid to be unitary-invariant if k A k = k U AV k for any two unitary matrices U and V. We denoteby Eig A the unordered n -tuple consisting of the eigenvalues of A, each counted as many times asits multiplicity. Let D ( A ) be a diagonal matrix whose diagonal entries are the elements of Eig A. For any norm on M ( n ) define k (Eig A, Eig B ) k = min W (cid:13)(cid:13) D ( A ) − W D ( B ) W − (cid:13)(cid:13) where the minimum is taken over all permutation matrices W. If A, B are Hermitian matrices, wehave for all unitary-invariant norms (including the Frobenius norm ) the inequality k (Eig A, Eig B ) k k A − B k . Proof of Theorem 5.3. (i) follows immediately from Lemma E.1 and Theorem 5.1. For (ii),notice that by Theorem 5.2 we have that k sup t ∈T ηn k ˆ B ( t ) − B ( t ) ˆ O ⊤ ( t ) k F k L = O (cid:18) η − n p δ − (cid:18) J n p sup j,t | B j ( t ) | √ n + β n (cid:19)(cid:19) , (E.11)where B ( t ), ˆ B ( t ) and ˆ O ( t ) are defined in Theorem 5.2. Since ˆ O ( t ) is an ( p − d ( t )) × ( p − d ( t ))orthonormal matrix, it is easy to check that B ( t ) ˆ O ⊤ ( t ) is an orthonormal basis of the null spaceof Λ ( t ). Using this fact, we then follow the decomposition (A.5) of Lam and Yao (2012) to obtainthat uniformly for j = d ( t ) + 1 , ..., pλ j ( ˆ Λ )( t ) = K ,j ( t ) + K ,j ( t ) + K ,j ( t ) , (E.12)with sup t ∈T ηn | K ,j ( t ) | ≤ k X k =1 sup t ∈T ηn k ˆ M ( J n , t, k ) − Σ X ( t, k ) k F , (E.13)sup t ∈T ηn | K ,j ( t ) | ≤ M sup t ∈T ηn k ˆ Λ ( t ) − Λ ( t ) k F k ˆ B ( t ) − B ( t ) ˆ O ⊤ ( t ) k F , (E.14)sup t ∈T ηn | K ,j ( t ) | ≤ M k ˆ B ( t ) − B ( t ) ˆ O ⊤ ( t ) k F k Λ ( t ) k F (E.15)36or some sufficiently large constants M and M . Notice that the RHS of the above inequalities areindependent of j . By (E.3), for K ,j ( t ) we have that k sup t ∈T ηn max d ( t )+1 ≤ j ≤ p | K ,j ( t ) |k L = O (cid:18) J n p sup j,t | B j ( t ) | √ n + β n (cid:19) ! . 
(E.16)Notice that for any two random variables X , Y with finite first moments, for any c >
0, thefollowing inequality holds by Markov inequality P ( | XY | ≥ c E | X | E | Y | ) ≤ P ( | X | ≥ c E | X | ) + P ( | Y | ≥ c E | Y | ) ≤ /c. (E.17)Notice that η − n (cid:16) J n p sup j,t | B j ( t ) | √ n + β n (cid:17) = η − n p − δ ( J n p sup j,t | B j ( t ) | √ n + β n ) p δ − ( J n p sup j,t | B j ( t ) | √ n + β n ).Therefor by (E.17) and the results of Theorem 5.1, Theorem 5.2 and the upperbound of K ,j ( t ),we have that P sup t ∈T ηn max d ( t )+1 ≤ j ≤ p | K ,j ( t ) | ≥ η − n (cid:18) J n p sup j,t | B j ( t ) | √ n + β n (cid:19) g n ! = O (1 /g n ) . (E.18)By (E.2) we shall see that sup t ∈T ηn k Λ ( t ) k F ≍ p − δ , (E.19)which together with Theorem 5.1 and (E.17) provides the probabilistic bound for K ,j ( t ) P (cid:16) sup t ∈T ηn max d ( t )+1 ≤ j ≤ p | K ,j ( t ) | ≥ g n η − n (cid:0) J n p sup j,t | B j ( t ) | √ n + β n (cid:1) (cid:17) = O (1 /g n ) . (E.20)As a result, (ii) follows from (E.16), (E.18) and (E.20).For (iii), notice that by (E.10) and (E.19) it follows thatinf t ∈T ηn k Λ ( t ) k m ' η n p − δ , sup t ∈T ηn k Λ ( t ) k m ' η n p − δ , inf t ∈T ηn k Λ ( t ) k ≍ p − δ , sup t ∈T ηn k Λ ( t ) k ≍ p − δ . (E.21)37hen by (i) P ( λ j +1 ( ˆ Λ ( t ) λ j ( ˆ Λ ( t )) ≥ η n , j = 1 , ..., d ( t ) − , ∀ t ∈ T η n ) ≥ P ( sup t ∈T ηn max ≤ j ≤ d ( t ) | λ j ( ˆ Λ ( t )) − λ j ( Λ ( t )) | ≤ η n p − δ ˜ M ) , (E.22)where ˜ M is a sufficiently large constant such that η n p − δ / ˜ M < / t ∈T ηn λ d ( t ) ( Λ ( t )). Thereforeby (i) and Markov inequality (iii) holds. Finally, (iv) follows from (i), (ii), (E.21) and (E.17). (cid:3) Proof of Proposition 5.1 . Consider g n = η n ν n p δ − log n . Then the conditions of Theorem 5.3 hold.Since P pj =1 λ j ( ˆ Λ ( t )) = k ˆ Λ ( t ) k F , we shall discuss the magnitude of k ˆ Λ ( t ) k F . 
First, notice that byusing similar and simpler argument of proofs of Theorem 5.2 it follows that inf t ∈T ηn k Λ ( t ) k F ≍ p − δ .By Theorem 5.1 and (E.19) we shall see thatsup t ∈T ηn k ˆ Λ ( t ) k F ≍ p − δ , inf t ∈T ηn k ˆ Λ ( t ) k F ≍ p − δ . (E.23)Define the events A n = { sup t ∈T ηn max ≤ j ≤ d ( t ) | λ j ( ˆ Λ ( t )) − λ j ( Λ ( t )) | ≤ η n p − δ ˜ M } , B n = { sup t ∈T ηn λ d ( t )+1 ( ˆ Λ ( t )) λ d ( t ) ( ˆ Λ ( t )) ≥ p δ − g n ν n /η n } . (E.24)By the proof of Theorem 5.3, we shall see that P ( A n ) = 1 − O ( ν n η n p − δ ) , P ( B n ) = 1 − O (1 /g n ) − O ( ν n η n p − δ ) . (E.25)Notice that on A n , by the choice of R ( t ),inf t ∈T ηn min d ( t )+1 ≤ j ≤ R ( t ) λ j +1 ( ˆ Λ ( t )) λ j ( ˆ Λ ( t )) ≥ inf t ∈T ηn min d ( t )+1 ≤ j ≤ R ( t ) λ j +1 ( ˆ Λ ( t )) λ d ( t ) ( ˆ Λ ( t )) ≥ C η n p − δ η n p − δ log n = C / log n (E.26)for some sufficiently small positive constant C , where the last ≥ is due to (E.23). Because on A n ,inf t ∈ T ηn min ≤ j ≤ d ( t ) λ j +1 ( ˆ Λ ( t )) λ j ( ˆ Λ ( t )) ≥ η n (c.f.(E.22)), and the fact that p δ − g n ν n /η n = o (min( η n , (log n ) − ),38e have that P ( ˆ d n ( t ) = d ( t ) , ∀ t ∈ T η n ) ≥ P ( A n ∩ B n ) = 1 − O ( 1 g n ) − O ( ν n η n p − δ ) . (E.27)Therefore the Proposition holds by considering g n = η n ν n p δ − log n . (cid:3) E.1 Auxiliary Lemmas
Lemma E.2.
Recall from the main article that
$$\delta_{G,l}(k)=\sup_{t\in[0,1],\,i\in\mathbb{Z},\,1\le j\le p}\big\|G_j(t,\mathcal{F}_i)-G_j(t,\mathcal{F}_i^{(i-k)})\big\|_{L^l}. \quad (E.28)$$
Under conditions (A1) and (M1)–(M3), there exists a sufficiently large constant $M$ such that, uniformly for $1\le u\le p$ and $t\in[0,1]$,
$$E|G_u(t,\mathcal{F}_0)|^4\le M, \quad (E.29)$$
$$\delta_{G,l}(k)=O\big(d\,\delta_{Q,l}(k)+\delta_{H,l}(k)\big). \quad (E.30)$$
Proof.
By definition we have that for $1\le u\le p$,
$$G_u(t,\mathcal{F}_i)=\sum_{v=1}^d a_{uv}(t)Q_v(t,\mathcal{F}_i)+H_u(t,\mathcal{F}_i).$$
Notice that $d=\sup_{0\le t\le 1}d(t)$ is fixed. Therefore, assumptions (A1), (M2) and the triangle inequality lead to the fourth-moment bound (E.29). Finally, (A1) and (M3) lead to the assertion (E.30). $\Box$

Lemma E.3.
Consider the process $z_{i,u,n}z_{i+k,v,n}$ for some $k>0$ and $1\le u,v\le d$. Then under conditions (C1), (M1)–(M3) we have: (i) $\zeta_{i,u,v}=:\zeta_{u,v}(\frac{i}{n},\mathcal{F}_i)=z_{i,u,n}z_{i+k,v,n}$ is a locally stationary process with associated dependence measures
$$\delta_{\zeta_{u,v},2}(h)\le C\big(\delta_{Q,4}(h)+\delta_{Q,4}(h+k)\big) \quad (E.31)$$
for some universal constant $C>0$ independent of $u$, $v$ and any integer $h$; (ii) for any sequence of numbers $a_i$, $1\le i\le n$, we have
$$\Big(E\Big|\frac{1}{n}\sum_{i=1}^n a_i(\zeta_{i,u,v}-E\zeta_{i,u,v})\Big|^2\Big)^{1/2}\le \frac{C}{n}\Big(\sum_{i=1}^n a_i^2\Big)^{1/2}\Delta_{Q,4,0}. \quad (E.32)$$

Proof. (i) is a consequence of the Cauchy–Schwarz inequality, the triangle inequality and conditions (M1)–(M3). For (ii), write
$$\sum_{i=1}^n a_i(\zeta_{i,u,v}-E\zeta_{i,u,v})=\sum_{i=1}^n a_i\Big(\sum_{g=0}^\infty \mathcal{P}_{i+k-g}\zeta_{i,u,v}\Big)=\sum_{g=0}^\infty\sum_{i=1}^n a_i\mathcal{P}_{i+k-g}\zeta_{i,u,v}. \quad (E.33)$$
By the property of martingale differences and part (i) of this lemma we have
$$\Big\|\sum_{i=1}^n a_i\mathcal{P}_{i+k-g}\zeta_{i,u,v}\Big\|_{L^2}^2=\sum_{i=1}^n a_i^2\,\|\mathcal{P}_{i+k-g}\zeta_{i,u,v}\|_{L^2}^2\le C\sum_{i=1}^n a_i^2\big(\delta_{Q,4}(g)+\delta_{Q,4}(g-k)\big)^2. \quad (E.34)$$
By the triangle inequality and inequalities (E.33) and (E.34), the lemma follows. $\Box$

Corollary E.1.
Under conditions (C1), (M1) and (M2) we have, for each fixed $k>0$, $1\le u,v\le p$ and $1\le w\le d$: (i) $\psi_i=:\psi(\frac{i}{n},\mathcal{F}_i)=e_{i+k,u,n}e_{i,v,n}$, $\phi_i=:\phi(\frac{i}{n},\mathcal{F}_i)=z_{i+k,w,n}e_{i,v,n}$ and $\iota_i=:\iota(\frac{i}{n},\mathcal{F}_i)=e_{i+k,u,n}z_{i,w,n}$ are locally stationary processes with associated dependence measures
$$\max\big(\delta_{\psi,2}(h),\delta_{\phi,2}(h),\delta_{\iota,2}(h)\big)\le C\big(\delta_{Q,4}(h)+\delta_{H,4}(h)+\delta_{Q,4}(h+k)+\delta_{H,4}(h+k)\big) \quad (E.35)$$
for some universal constant $C>0$ independent of $u$, $v$ and $w$.

Proof. The corollary follows by the same argument as the proof of Lemma E.3. $\Box$
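Lemma E.3(i) and Corollary E.1 bound the physical dependence measures of product processes such as $z_{i,u,n}z_{i+k,v,n}$ by those of the underlying components. The coupling construction behind these measures can be checked numerically. The sketch below is a Monte Carlo illustration under simplified assumptions (a univariate linear process with hypothetical geometric coefficients, not the paper's model): it estimates the $L^2$ coupling distance of the lag-$k$ product process when the innovation $h$ steps back is replaced by an independent copy, and shows it decays in $h$ as (E.31) predicts.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear_process(eps, coefs):
    # X_i = sum_j coefs[j] * eps[i - j]; valid for i >= len(coefs) - 1
    n, q = len(eps), len(coefs)
    return np.array([np.dot(coefs, eps[i - q + 1:i + 1][::-1]) for i in range(q - 1, n)])

coefs = 0.6 ** np.arange(8)          # hypothetical geometric coefficients
q, n_rep, i0 = len(coefs), 4000, 20  # coupling position i0 >= q - 1

def delta(h, k=0):
    # Monte Carlo L2 coupling distance for the product X_{i0} * X_{i0+k}
    # when the single innovation eps_{i0-h} is replaced by an i.i.d. copy
    diffs = []
    for _ in range(n_rep):
        eps = rng.standard_normal(i0 + k + 1)
        eps_c = eps.copy()
        eps_c[i0 - h] = rng.standard_normal()   # couple at lag h
        x, xc = linear_process(eps, coefs), linear_process(eps_c, coefs)
        prod = x[i0 - q + 1] * x[i0 + k - q + 1]
        prod_c = xc[i0 - q + 1] * xc[i0 + k - q + 1]
        diffs.append((prod - prod_c) ** 2)
    return np.sqrt(np.mean(diffs))

# the dependence of the product process decays as h grows, mirroring (E.31)
d1, d5 = delta(1, k=2), delta(5, k=2)
print(d1, d5)
```

Since the coefficient at lag $h$ is $0.6^h$, the coupling distance at $h=5$ should be markedly smaller than at $h=1$.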
To save notation in the following proofs, for given $J_n$ and $k$ we write $\hat M(J_n,t,k)$ as $\hat M$ if no confusion arises. Recall the definitions of $\tilde\Sigma_{X,j,k}$ and $\hat M(J_n,t,k)$ in (10) and (11) of the main article. Observe the following decomposition:
$$\tilde\Sigma_{X,j,k}=V_{1,j,k}+V_{2,j,k}+V_{3,j,k}+V_{4,j,k}, \quad (E.36)$$
$$V_{1,j,k}=\frac{1}{n}\sum_{i=1}^{n-k}A\Big(\frac{i+k}{n}\Big)z_{i+k,n}z_{i,n}^\top A^\top\Big(\frac{i}{n}\Big)B_j\Big(\frac{i}{n}\Big), \quad (E.37)$$
$$V_{2,j,k}=\frac{1}{n}\sum_{i=1}^{n-k}A\Big(\frac{i+k}{n}\Big)z_{i+k,n}e_{i,n}^\top B_j\Big(\frac{i}{n}\Big), \quad (E.38)$$
$$V_{3,j,k}=\frac{1}{n}\sum_{i=1}^{n-k}e_{i+k,n}z_{i,n}^\top A^\top\Big(\frac{i}{n}\Big)B_j\Big(\frac{i}{n}\Big),\qquad V_{4,j,k}=\frac{1}{n}\sum_{i=1}^{n-k}e_{i+k,n}e_{i,n}^\top B_j\Big(\frac{i}{n}\Big). \quad (E.39)$$

Lemma E.4.
Under conditions (C1) and (M1)–(M3) we have that
$$\Big\|\sup_{t\in[0,1]}\|\hat M(J_n,t,k)-E\hat M(J_n,t,k)\|_F\Big\|_{L^2}=O\Big(\frac{J_np\sup_{t,1\le j\le J_n}|B_j(t)|}{\sqrt n}\Big).$$
Proof.
Using equations (E.36)–(E.39) we have that
$$\hat M(J_n,t,k)-E\hat M(J_n,t,k)=\sum_{j=1}^{J_n}\sum_{s=1}^4\big(V_{s,j,k}-E(V_{s,j,k})\big)B_j(t):=\sum_{s=1}^4\tilde V_s(t), \quad (E.40)$$
where $\tilde V_s(t)=\sum_{j=1}^{J_n}(V_{s,j,k}-E(V_{s,j,k}))B_j(t)$ for $s=1,2,3,$
4. Consider the s = 1 case and then˜ V ( t ) := J n X j =1 ( V ,j,k − E ( V ,j,k )) B j ( t )= 1 n J n X j =1 B j ( t ) n − k X i =1 A ( i + kn )( z i + k,n z ⊤ i,n − E ( z i + k,n z ⊤ i,n )) A ⊤ ( in ) B j ( in ) . (E.41)Consider ˜ M j = n P n − ki =1 A ( i + kn )( z i + k,n z ⊤ i,n − E ( z i + k,n z ⊤ i,n )) A ⊤ ( in ) B j ( in ). Its ( u, v ) th , 1 ≤ u ≤ p ,1 ≤ v ≤ p element is˜ M j,u,v = 1 n n − k X i =1 d X u ′ =1 d X v ′ =1 a uu ′ ( i + kn )( z i + k,u ′ ,n z i,v ′ ,n − E ( z i + k,u ′ ,n z i,v ′ n )) a vv ′ ( in ) B j ( in ) . (E.42)Therefore it follows from the triangle inequality and Lemma E.3 that, (cid:13)(cid:13)(cid:13) ˜ M j,u,v (cid:13)(cid:13)(cid:13) L ≤ C ∆ Q, , sup j,t | B j ( t ) | n d X v ′ =1 d X u ′ =1 vuut n − k X i =1 a uu ′ ( i + kn ) a vv ′ ( in ) (E.43)for some sufficiently large constant C . Consequently by (C1) and Jansen’s inequality, we get E (cid:16) k ˜ M j k F (cid:17) ≤ C ∆ Q, , sup j,t | B j ( t ) | n p X u =1 p X v =1 d X v ′ =1 d X u ′ =1 vuut n − k X i =1 a uu ′ ( i + kn ) a vv ′ ( in ) C ∆ Q, , d sup j,t | B j ( t ) | n n − k X i =1 p X u =1 p X v =1 d X v ′ =1 d X u ′ =1 a uu ′ ( i + kn ) a vv ′ ( in ) ≍ ∆ Q, , d sup j,t | B j ( t ) | p − δ n (E.44)for 1 ≤ j ≤ J n . On the other hand, since ( u, v ) th element of ˜ V ( t ), which is denoted by ˜ V ,u,v ( t ),satisfies ˜ V ,u,v ( t ) = 1 n J n X j =1 B j ( t ) ˜ M j,u,v . (E.45)Therefore by Jansen’s inequality it follows thatsup t ∈ [0 , k ˜ V ( t ) k F = sup t, ≤ j ≤ J n p X u =1 p X v =1 J n X j =1 B j ( t ) ˜ M j,u,v ! ≤ sup t, ≤ j ≤ J n | B j ( t ) | p X u =1 p X v =1 ( J n X j =1 | ˜ M j,u,v | ) ≤ sup t, ≤ j ≤ J n | B j ( t ) | p X u =1 p X v =1 J n J n X j =1 | ˜ M j,u,v | ≤ sup t, ≤ j ≤ J n | B j ( t ) | J n J n X j =1 k ˜ M j k F . 
(E.46)Therefore we have E ( sup t ∈ [0 , k ˜ V ( t ) k F ) ≤ sup t, ≤ j ≤ J n | B j ( t ) | J n J n X j =1 E ( k ˜ M j k F ) (E.47)Combining (E.44) we have that E ( sup t ∈ [0 , k ˜ V ( t ) k F ) / = O (cid:18) J n sup t, ≤ j ≤ J n | B j ( t ) | p − δ √ n (cid:19) . (E.48)Similarly using Corollary E.1 we have that E (cid:0) sup t ∈ [0 , k ˜ V s ( t ) k F (cid:1) = O J n sup t, ≤ j ≤ J n | B j ( t ) | p − δ/ √ n ! , s = 2 , , (E.49)42nd E (cid:0) sup t ∈ [0 , k ˜ V ( t ) k F (cid:1) = O (cid:18) J n p sup t, ≤ j ≤ J n | B j ( t ) | √ n (cid:19) . (E.50)Then the lemma follows from (E.48), (E.49), (E.50) and triangle inequality. (cid:3) Lemma E.5.
Under conditions (A1) (A2), (C1), (M1), (M2) and (M3) we have that k sup t ∈ [0 , ( E ˆ M ( J n , t, k ) − Σ ∗ k ( t )) k F = O ( J n sup t, ≤ j ≤ J n | B j ( t ) | kp/n ) where Σ ∗ k ( t ) = n P J n j =1 P ni =1 E ( G ( in , F i + k ) G ( in , F i ) ⊤ ) B j ( in ) B j ( t ) .Proof. Consider the ( u, v ) th element of ( E ˆ M ( J n , t, k ) − Σ ∗ k ( t )) u,v . By definition, we have for1 ≤ u, v ≤ p ,( E ˆ M ( J n , t, k ) − Σ ∗ k ( t )) uv = 1 n J n X j =1 n − k X i =1 E (cid:0)(cid:0) G u ( i + kn , F i + k ) − G u ( in , F i + k ) (cid:1) G v ( in , F i ) (cid:1) B j ( in ) B j ( t )+ 1 n J n X j =1 n X i = n − k +1 E (cid:0) G u ( in , F i + k ) G v ( in , F i ) (cid:1) B j ( in ) B j ( t ) . (E.51)By condition (M3) we have that uniformly for 1 ≤ u, v ≤ p ,sup t ∈ [0 , | ( E ˆ M ( J n , t, k ) − Σ ∗ k ( t )) uv | ≤ M ′ J n sup t, ≤ j ≤ J n | B j ( t ) | k/n (E.52)for some sufficiently large constant M ′ independent of u and v . Therefore by the definition ofFrobenius norm, the lemma follows. (cid:3) Lemma E.6.
Let ι n = sup ≤ j ≤ J n Lip j + sup t, ≤ j ≤ J n | B j ( t ) | where Lip j is the Lipschitz constant ofthe basis function B j ( t ) . Then under conditions (A1), (A2), (C1), (M1)–(M3) we have that sup t ∈ [0 , k Σ ∗ k ( t ) − Σ X ( t, k ) k F = O (cid:16) J n sup t, ≤ j ≤ J n | B j ( t ) | pι n n + pg J n ,K, ˜ M (cid:17) , where Σ ∗ k is defined in Lemma E.5. roof. Notice that by definition we have that Σ ∗ k ( t ) = 1 n J n X j =1 n X i =1 Σ X ( in , k ) B j ( in ) B j ( t ) . (E.53)Define that ˜ Σ ∗ k ( t ) = P J n j =1 R Σ X ( s, k ) B j ( s ) dsB j ( t ). Notice that the ( u, v ) th element of Σ ∗ k ( t ) − ˜ Σ ∗ k ( t ) is ( Σ ∗ k ( t ) − ˜ Σ ∗ k ( t )) u,v = J n X j =1 (cid:16) n n X i =1 E ( G u ( in , F i + k ) G v ( in , F i )) B j ( in ) − Z E ( G u ( s, F i + k ) G v ( s, F i )) B j ( s ) ds (cid:17) B j ( t ) . (E.54)Notice that Lemma E.2 and Condition (A2) imply that there exists a sufficiently large constant M ′ depending on M of Lemma E.2, such that those Lipschitz constants of the functions E ( G u ( s, F i + k ) G v ( s, F i )) B j ( s )are bounded by M ′ ι n for all 1 ≤ k ≤ k , 1 ≤ u, v ≤ p . Then using similar argument to the proofof Lemma E.5, we obtain thatsup t ∈ [0 , k Σ ∗ k ( t ) − ˜ Σ ∗ k ( t ) k F = O (cid:16) J n sup t, ≤ j ≤ J n | B j ( t ) | pι n n (cid:17) . (E.55)Similarly by using basis expansion (A.2) in condition (A2) have we have thatsup t ∈ [0 , k ˜ Σ ∗ k ( t ) − Σ X ( t, k ) k F = O ( pg J n ,K, ˜ M ) (E.56)which completes the proof. (cid:3) F Proof of Propositions 5.2 , 5.3 and Theorems 5.4, 5.5
In this section we define $\hat W_i$, $i=1,\ldots,d$, as the orthonormal eigenvectors of $\sum_{k=1}^{k_0}\hat\Gamma_k$ corresponding to its $d$ largest eigenvalues $\lambda_1(\sum_{k=1}^{k_0}\hat\Gamma_k),\ldots,\lambda_d(\sum_{k=1}^{k_0}\hat\Gamma_k)$. Denote by $W_i$, $i=1,\ldots,d$, the orthonormal eigenvectors corresponding to the $d$ positive eigenvalues $\lambda_1(\sum_{k=1}^{k_0}\Gamma_k),\ldots,\lambda_d(\sum_{k=1}^{k_0}\Gamma_k)$ of $\sum_{k=1}^{k_0}\Gamma_k$. Let $W=(W_1,\ldots,W_d)$, $F=(F_1,\ldots,F_{p-d})$, and similarly $\hat W=(\hat W_1,\ldots,\hat W_d)$, $\hat F=(\hat F_1,\ldots,\hat F_{p-d})$. We also require that $\hat W^\top\hat F=\mathbf{0}$ and $W^\top F=\mathbf{0}$, i.e., $((W_i,1\le i\le d),(F_i,1\le i\le p-d))$ and $((\hat W_i,1\le i\le d),(\hat F_i,1\le i\le p-d))$ are both orthonormal bases of $\mathbb{R}^p$. Notice that $\hat\Gamma=\sum_{k=1}^{k_0}\hat\Gamma_k$ and $\Gamma=\sum_{k=1}^{k_0}\Gamma_k$. Since we are testing the null hypothesis of static factor loadings and, in this section, consider local alternatives where $A(\cdot)$ deviates slightly from the null, we assume throughout this section that $d(t)\equiv d$, i.e., the number of factors is fixed. Because in this section it is assumed that there are no change points in the number of factors, we can assume that $\eta_n$ in conditions (C1) and (S) satisfies $\eta_n\ge\eta_0$ for some constant $\eta_0>0$, and $T_{\eta_n}=(0,$
1), which coincides with the discussion in Remark 5.1 of the main article.
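Under the null of static loadings and fixed $d$, the spaces spanned by $W$ and $F$ are obtained from an eigenanalysis built on the lagged autocovariances. The following sketch is a simplified simulation of this eigenanalysis (hypothetical dimensions, AR(1) factors and a scaled orthonormal loading matrix; an illustration of the idea, not the paper's estimator verbatim): form $\sum_k\hat\Gamma_k\hat\Gamma_k^\top$-type matrices from sample autocovariances, take the top-$d$ eigenvectors as $\hat W$ and the remaining ones as $\hat F$, and check that $\hat F$ is nearly orthogonal to the true loading space.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, d, k0 = 5000, 10, 2, 2                  # hypothetical sizes

Q, _ = np.linalg.qr(rng.standard_normal((p, d)))
A = 2.0 * Q                                   # static loadings with orthogonal columns
z = np.zeros((n, d))                          # AR(1) factors: non-degenerate lagged autocovariance
for i in range(1, n):
    z[i] = 0.5 * z[i - 1] + rng.standard_normal(d)
X = z @ A.T + 0.1 * rng.standard_normal((n, p))

# M = sum_k G_k G_k^T with G_k = (1/n) sum_i X_{i+k} X_i^T (sample lag-k autocovariance)
M = np.zeros((p, p))
for k in range(1, k0 + 1):
    G = (X[k:].T @ X[:-k]) / n
    M += G @ G.T

eigvals, eigvecs = np.linalg.eigh(M)          # eigenvalues in ascending order
F_hat = eigvecs[:, :p - d]                    # null-space estimate (p - d smallest eigenvalues)
W_hat = eigvecs[:, p - d:]                    # loading-space estimate (d largest eigenvalues)

err = np.linalg.norm(F_hat.T @ A)             # near zero: F_hat is ~ orthogonal to span(A)
print(err)
```

The noise only enters the off-block components of the sample autocovariances at order $O(1/\sqrt n)$, so the estimated null space is nearly orthogonal to the column space of $A$.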
Corollary F.1.
Assume (A1), (A2), (C1) and (M1)–(M3). Then
$$\big\|\|\Gamma-\hat\Gamma\|_F\big\|_{L^2}=O\Big(\frac{p^{2-\delta}}{\sqrt n}\Big). \quad (F.1)$$
Proof.
It suffices to show that, uniformly for $1\le k\le k_0$,
$$\big\|\|\Gamma_k-\hat\Gamma_k\|_F\big\|_{L^2}=O\Big(\frac{p^{2-\delta}}{\sqrt n}\Big). \quad (F.2)$$
By the proof of Lemma E.4, it follows that for $1\le k\le k_0$,
$$\Big\|\Big\|\frac{1}{n}\sum_{i=1}^{n-k}X_{i+k,n}X_{i,n}^\top-\frac{1}{n}E\Big(\sum_{i=1}^{n-k}X_{i+k,n}X_{i,n}^\top\Big)\Big\|_F\Big\|_{L^2}=O\Big(\frac{p}{\sqrt n}\Big). \quad (F.3)$$
By the proof of Lemma E.5, it follows that
$$\Big\|\frac{1}{n}\sum_{i=1}^{n-k}\Big(E(X_{i+k,n}X_{i,n}^\top)-\Sigma_X\Big(\frac{i}{n},k\Big)\Big)\Big\|_F=O\Big(\frac{p}{n}\Big), \quad (F.4)$$
$$\Big\|\frac{1}{n}\sum_{i=1}^{n-k}\Sigma_X\Big(\frac{i}{n},k\Big)-\int_0^1\Sigma_X(t,k)\,dt\Big\|_F=O\Big(\frac{p}{n}\Big), \quad (F.5)$$
where to prove (F.5) we have used Lemma E.2. Then by (F.3)–(F.5) we have that
$$\Big\|\Big\|\frac{1}{n}\sum_{i=1}^{n-k}X_{i+k,n}X_{i,n}^\top-\int_0^1\Sigma_X(t,k)\,dt\Big\|_F\Big\|_{L^2}=O\Big(\frac{p}{\sqrt n}\Big), \quad (F.6)$$
which together with (E.2) and the definition of $\Gamma_k$ proves (F.2). Therefore the corollary holds. $\Box$

Corollary F.2.
Assume conditions (A1), (A2), (C1), (M1)–(M3) and (S1)–(S4). Then there exist orthogonal matrices $\hat O_1\in\mathbb{R}^{d\times d}$ and $\hat O_2\in\mathbb{R}^{(p-d)\times(p-d)}$ such that
$$\big\|\|\hat W\hat O_1-W\|_F\big\|_{L^2}=O\Big(\frac{p^{\delta}}{\sqrt n}\Big),\qquad \big\|\|\hat F\hat O_2-F\|_F\big\|_{L^2}=O\Big(\frac{p^{\delta}}{\sqrt n}\Big).$$
Proof.
By (E.2) and the triangle inequality we have that k X k =1 k ( Z Σ X ( t, k ) dt )( Z Σ X ( t, k ) dt ) ⊤ k F ≍ p − δ . (F.7)On the other hand, by Weyl’s inequality (Bhatia (1997)) and (12) of Lam et al. (2011) we havethat sup t ∈ [0 , k k X k =1 ( Z Σ X ( t, k ) dt )( Z Σ X ( t, k ) dt ) ⊤ k m ≥ ( Z inf t ∈ (0 , k ( Σ X ( t, dt ) k m ) . (F.8)By (F.7), (F.8) and the proof of Theorem 5.2 we have that λ d ( Γ ) ≍ p − δ (F.9)The corollary follows from Corollary F.1, (F.9) and Theorem 1 of Yu et al. (2015). Details areomitted for the sake of brevity. (cid:3) Proof of Proposition 5.2 . Proof.
It follows from Corollaries F.1 and F.2, together with the proof of Proposition 5.1. $\Box$
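Corollary F.2 guarantees the existence of orthogonal matrices aligning the estimated eigenbases with the population ones, and Corollary F.3 below absorbs the rotation into the choice of basis. Such an optimal rotation can be computed explicitly via orthogonal Procrustes. A minimal sketch (synthetic matrices; the noise level is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)

def align(B_hat, B):
    # Orthogonal Procrustes: O = argmin over orthogonal O of ||B_hat @ O - B||_F,
    # given by O = U @ Vt where B_hat.T @ B = U @ diag(s) @ Vt is an SVD.
    U, _, Vt = np.linalg.svd(B_hat.T @ B)
    return U @ Vt

p, d = 8, 3
B, _ = np.linalg.qr(rng.standard_normal((p, d)))    # population orthonormal basis
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))    # unknown rotation of the estimate
B_hat = B @ Q + 1e-3 * rng.standard_normal((p, d))  # estimated basis: rotated plus small noise

O = align(B_hat, B)
err = np.linalg.norm(B_hat @ O - B)                 # small: the rotation is recovered
print(err)
```

Since the Procrustes solution minimizes the Frobenius distance over all orthogonal rotations, the residual is of the order of the perturbation, mirroring the role of $\hat O_1$, $\hat O_2$ in Corollary F.2.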
Corollary F.3.
Assume the conditions of Proposition 5.2 hold. Then under the null hypothesis there exists an orthonormal basis $F_i$, $1\le i\le p-d$, of the null space of $\Gamma$ such that
$$\big\|\|\hat F-F\|_F\big\|_{L^2}=O\Big(\frac{p^{\delta}}{\sqrt n}\Big), \quad (F.10)$$
where $F=(F_1,\ldots,F_{p-d})$.

Proof. By Corollary F.2, there exists an orthonormal basis $G=(G_1,\ldots,G_{p-d})$ of the null space of $\Gamma$, together with a $(p-d)\times(p-d)$ orthonormal matrix $\hat O_2$, such that
$$\big\|\|\hat F\hat O_2-G\|_F\big\|_{L^2}=O\Big(\frac{p^{\delta}}{\sqrt n}\Big).$$
Take $F=G\hat O_2^\top$ and the corollary is proved. $\Box$

F.1 Proof of Proposition 5.3
Proof.
By Proposition 5.2, it suffices to prove everything on the event A n := { ˆ d n = d } . Define ˆ Z i in analogue to ˜ Z i , with F i replaced by ˆ F i . Then ˆ T n = | P mni =1 ˆ Z i √ m n | ∞ . Notice that for 1 ≤ s ≤ N n − ≤ s ≤ p − d , 1 ≤ s ≤ p , 1 ≤ w ≤ k , the ( k ( p − d ) ps + ( w − p ( p − d ) + ( s − p + s ) th entryof ˜ Z i and ˆ Z i are F ⊤ s X s m n + i + w,n X s m n + i,s and ˆ F ⊤ s X s m n + i + w,n X s m n + i,s respectively where X i,s is the ( s ) th entry of X i,n . Let ˜ Z i,s be the s th entry of ˜ Z i , and ˆ Z i,s be the s th entry of ˆ Z i . Then onevent A n , | ˆ T n − ˜ T n | = max ≤ s ≤ N n − , ≤ s ≤ p, ≤ w ≤ k , ≤ s ≤ p − d | ( F ⊤ s − ˆ F ⊤ s ) m n X i =1 X s m n + i + w,n X s m n + i,s | / √ m n ≤ max ≤ s ≤ N n − , ≤ s ≤ p, ≤ w ≤ k , ≤ s ≤ p − d m n X i =1 | ( F ⊤ s − ˆ F ⊤ s ) X s m n + i + w,n X s m n + i,s | / √ m n (F.11)By the triangle inequality, condition ( M ′ ) and Cauchy inequality, it follows thatmax ≤ s ≤ p − d | ( F ⊤ s − ˆ F ⊤ s ) m n X i =1 X s m n + i + w,n X s m n + i,s | = max ≤ s ≤ p − d | p X u =1 ( F s ,u − ˆ F s ,u ) m n X i =1 X s m n + i + w,u X s m n + i,s |≤ max ≤ s ≤ p − d vuut p X u =1 ( F s ,u − ˆ F s ,u ) vuut p X u =1 ( m n X i =1 X s m n + i + w,u X s m n + i,s ) ≤ k ˆ F − F k F | p X u =1 ( m n X i =1 X s m n + i + w,u X s m n + i,s ) | / . (F.12)47oreover, for 1 ≤ s ≤ N n −
1, 1 ≤ s ≤ p , 1 ≤ w ≤ k , (cid:13)(cid:13) | p X u =1 ( m n X i =1 X s m n + i + w,u X s m n + i,s ) | / (cid:13)(cid:13) L l = (cid:13)(cid:13) p X u =1 ( m n X i =1 X s m n + i + w,u X s m n + i,s ) (cid:13)(cid:13) / L l/ / √ pm n . (F.13)Using the fact that E (cid:16) max ≤ s ≤ N n − , ≤ s ≤ p, ≤ w ≤ k (cid:16)(cid:12)(cid:12)(cid:12) p X u =1 ( m n X i =1 X s m n + i + w,u X s m n + i,s ) (cid:12)(cid:12)(cid:12) / (cid:17) l (cid:17) ≤ X ≤ s ≤ N n − , ≤ s ≤ p, ≤ w ≤ k (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) p X u =1 ( m n X i =1 X s m n + i + w,u X s m n + i,s ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) / (cid:13)(cid:13)(cid:13)(cid:13)(cid:13)(cid:13) l L l , (F.14)we have k max ≤ s ≤ N n − , ≤ s ≤ p, ≤ w ≤ k | p X u =1 ( m n X i =1 X s m n + i + w,u X s m n + i,s ) | / k L l = O (( N n p ) /l √ pm n ) . (F.15)Now the proposition follows from Proposition 5.2, (F.11), (F.12) and (F.15), Corollary F.3 and anapplication of (E.17). (cid:3) In the following proofs, define the k N n ( p − d ) p dimensional vectors ˜ Z i via replacing ˆ F in ˆ Z i of(17) with F . Further define S j,w n = j + w n − X r = j ˜ Z i , S m n = m n X i =1 ˜ Z i for 1 ≤ j ≤ m n − w n + 1 . (F.16)Notice that by conditions (S1) and (S3), ˜ Z i , 1 ≤ i ≤ m n are mean 0 N n ( p − d ) p dimensionalvectors. F.2 Proof of Theorem 5.4.
The key idea of the proof of Theorem 5.4 is to approximate the test statistic by the maximum ofthe mean of some centered high-dimensional non-stationary time series. Then we shall apply L ∞ ≤ s ≤ N n −
1, 1 ≤ s ≤ p − d , 1 ≤ s ≤ p , 1 ≤ w ≤ k , the ( k ( p − d ) ps + ( w − p ( p − d ) + ( s − p + s ) th entry of ˜ Z i is F ⊤ s X s m n + i + w,n X s m n + i,s where X i,s is the ( s ) th entry of X i,n . Let ˜ Z i,s be the s th entry of ˜ Z i . By the proof of LemmaE.3, condition M’, condition (a) and the definition of X i,n , using the fact that k F i k F = 1 for1 ≤ i ≤ p − d , we get δ ˜ Z ( j, k, l ) := max ≤ i ≤ m n max ≤ s ≤ N n k ( p − d ) p E /l ( | ˜ Z i,s − ˜ Z ( i − k ) i,s | l ) = O ((( k + 1) log( k + 1)) − ) . (F.17)Notice that (F.17) satisfies (10) of Assumption 2.3 of Zhang and Cheng (2018). Furthermore, bythe triangle inequality, condition ( M ′ ) and Cauchy inequality, under null hypothesis, we have k F ⊤ s X s m n + i + w,n X s m n + i,s |k L l ≤ k F ⊤ s e s m n + i + w k L l k X s m n + i,s k L l . By triangle inequality and condition (M’), we have k X s m n + i,s k L l is bounded for s , i and s . Byconditions (f) we have that k F ⊤ s e s m n + i + w k L l ≤ M l k F ⊤ s e s m n + i + w k L , (F.18) k F ⊤ s e s m n + i + w k L = F ⊤ s E ( e s m n + i + w e ⊤ s m n + i + w ) F s ≤ max ≤ i ≤ n λ max ( cov ( e i,n e ⊤ i,n )) / , (F.19)where the / in (F.19) is uniformly for all i, n, p , which is due to condition (f). This leads to thatmax ≤ i ≤ m n E ( max ≤ s ≤ N n k ( p − d ) p | ˜ Z i,s | )= max ≤ i ≤ m n E ( max ≤ s ≤ N n − , ≤ s ≤ p, ≤ w ≤ k , ≤ s ≤ p − d | F ⊤ s X s m n + i + w,n X s m n + i,s | ) / ( N n p ) /l , (F.20)which with condition (c) shows that equation (12) of Corollary 2.2 of Zhang and Cheng (2018)holds. Finally, using (F.17), (F.20) and by similar arguments of Lemma E.3 and Lemma 5 ofZhou and Wu (2010), we shall see that max ≤ j ≤ k N n ( p − d ) p σ j,j ≤ M < ∞ for some constant M .This verifies (9) of Zhang and Cheng (2018), which proves the first claim of the theorem.To see the second claim of the theorem, writing δ = g n p δ √ m n √ n Ω n , where g n is a diverging49equence. 
Notice that P ( ˆ T n ≥ t ) ≤ P ( ˜ T n ≥ t − δ ) + P ( | ˜ T n − ˆ T n | ≥ δ ) , (F.21) P ( ˆ T n ≥ t ) ≥ P ( ˜ T n ≥ t + δ , | ˜ T n − ˆ T n | ≤ δ )= P ( ˜ T n ≥ t + δ ) − P ( ˜ T n ≥ t + δ , | ˜ T n − ˆ T n | ≥ δ ) ≥ P ( ˜ T n ≥ t + δ ) − P ( | ˜ T n − ˆ T n | ≥ δ ) . (F.22)Therefore following the first assertion of Theorem 5.4 and Proposition 5.3, we havesup t ∈ R | P ( ˆ T n ≥ t ) − P ( | Y | ∞ ≥ t ) | ≤ sup t | P ( | Y | ∞ ≥ t − δ ) − P ( | Y | ∞ ≥ t + δ ) | + O ( 1 g n + p δ √ n log n + ι ( m n , k N n p ( p − d ) , l, ( N n p ) /l )) . (F.23)Since | Y | ∞ = max( Y , − Y ), by Corollary 1 of Chernozhukov et al. (2015), we have thatsup t | P ( | Y | ∞ ≥ t − δ ) − P ( | Y | ∞ ≥ t + δ ) | = O ( δ p log( n/δ )) . (F.24)Combining (F.23) and (F.24), and letting g n = ( p δ √ m n log n √ n Ω n ) − / , the second claim of Theorem5.4 follows. (cid:3) F.3 Proof of Theorem 5.5
Define υ n = 1 p w ( m n − w n + 1) m n − w n +1 X j =1 ( S j,w n − w n m n S m n ) R j . (F.25)We shall show the following two assertions:sup t ∈ R | P ( | Y | ∞ ≤ t ) − P ( | υ n | ∞ ≤ t | X i,n , ≤ i ≤ n ) | = O p (Θ / n log / ( W n,p Θ n )) , (F.26)sup t ∈ R | P ( | υ n | ∞ ≤ t | X i,n , ≤ i ≤ n ) − P ( | κ n | ∞ ≤ t | X i,n , ≤ i ≤ n ) | = O p (( p δ +1 / √ w n log n √ n ( N n p ) /l ) l/ ( l +1) ) . (F.27)50he theorem then follows from (F.26), (F.27) and Theorem 5.4. In the remaining proofs, we writeconditional on X i,n , 1 ≤ i ≤ n as conditional on X i,n for short if no confusion arises.Step (i): Proof of (F.26). To show (F.26), we shall show that k max ≤ u,v ≤ k N n p ( p − d ) | σ υu,v − σ Yu,v |k L q ∗ = O (cid:16) w − n + p w n /m n W /q ∗ n,p (cid:17) , (F.28)where σ υu,v and σ Yu,v are the ( u, v ) th entry of the covariance matrix of υ n given ˜ Z i and covariancematrix of Y . Notice that (F.28) together with condition (d) implies that there exists a constant η > P ( max ≤ u,v ≤ k N n p ( p − d ) σ υu,v ≥ η ) ≥ − O (cid:18)(cid:16) w − n + p w n /m n W /q ∗ n,p (cid:17) q ∗ (cid:19) . (F.29)Since by assumption w − n + p w n /m n W /q ∗ n,p = o (1), it suffices to consider the conditional Gaussianapproximation on the { X i,n } measurable event { max ≤ u,v ≤ k N n p ( p − d ) σ υu,v ≥ η } . Then by the con-struction of Y and Theorem 2 of Chernozhukov et al. (2015) (we consider the case a p = √ p in there), (F.26) will follow.Now we prove (F.28). Let S j,w n ,s and S m n ,s be the s th element of the vectors S j,w n and S m n ,respectively. By our construction, we have σ υu,v = 1 w n ( m n − w n + 1) m n − w n +1 X j =1 ( S j,w n ,u − w n m n S m n ,u )( S j,w n ,v − w n m n S m n ,v ) ! , σ Yu,v = E S m n ,u S m n ,v m n . (F.30)On one hand, following the proof of Lemma 4 of Zhou (2013) using conditions (i), (ii) and Lemma5 of Zhou and Wu (2010) we have thatmax u,v | E σ υu,v − σ Yu,v | = O (cid:18) w − n + r w n m n (cid:19) . 
(F.31)

Now using (i), (ii), an argument similar to the proof of Lemma 1 of Zhou (2013), the Cauchy–Schwarz inequality and the fact that $\max_{1\le i\le m_n}|X_i|^{q^*}\le \sum_{i=1}^{m_n}|X_i|^{q^*}$, we obtain that
$$\Big\|\max_{u,v}\Big|\frac{1}{w_n(m_n-w_n+1)}\sum_{j=1}^{m_n-w_n+1}\big(S_{j,w_n,u}S_{j,w_n,v}-E(S_{j,w_n,u}S_{j,w_n,v})\big)\Big|\Big\|_{L^{q^*}}=O\big(\sqrt{w_n/m_n}\,W_{n,p}^{2/q^*}\big),\tag{F.32}$$
$$\Big\|\max_{u,v}\Big|\frac{w_n}{m_n^2(m_n-w_n+1)}\sum_{j=1}^{m_n-w_n+1}\big(S_{m_n,u}S_{m_n,v}-E(S_{m_n,u}S_{m_n,v})\big)\Big|\Big\|_{L^{q^*}}=O\big(\sqrt{w_n/m_n}\,W_{n,p}^{2/q^*}\big),\tag{F.33}$$
$$\Big\|\max_{u,v}\Big|\frac{1}{m_n(m_n-w_n+1)}\sum_{j=1}^{m_n-w_n+1}\big(S_{j,w_n,u}S_{m_n,v}-E(S_{j,w_n,u}S_{m_n,v})\big)\Big|\Big\|_{L^{q^*}}=O\big(\sqrt{w_n/m_n}\,W_{n,p}^{2/q^*}\big),\tag{F.34}$$
$$\Big\|\max_{u,v}\Big|\frac{1}{m_n(m_n-w_n+1)}\sum_{j=1}^{m_n-w_n+1}\big(S_{j,w_n,v}S_{m_n,u}-E(S_{j,w_n,v}S_{m_n,u})\big)\Big|\Big\|_{L^{q^*}}=O\big(\sqrt{w_n/m_n}\,W_{n,p}^{2/q^*}\big).\tag{F.35}$$
Combining (F.32)–(F.35) we have
$$\Big\|\max_{u,v}\big|\sigma^{\upsilon}_{u,v}-E\sigma^{\upsilon}_{u,v}\big|\Big\|_{L^{q^*}}=O\big(\sqrt{w_n/m_n}\,W_{n,p}^{2/q^*}\big).\tag{F.36}$$
Therefore (F.28) follows from (F.31) and (F.36).

Step (ii). We now show (F.27). It suffices to consider the event $\{\hat d_n=d\}$. We first show that for $\epsilon\in(0,\infty)$,
$$P\big(|\upsilon_n-\kappa_n|_\infty\ge\epsilon\mid X_{i,n}\big)=O_p\Big(\big(\epsilon^{-1}p^{\delta+1/2}\sqrt{w_n}/\sqrt n\,(N_np)^{1/l}\big)^{l}\Big).\tag{F.37}$$
After that we show that for $\epsilon\in(0,\infty)$,
$$\sup_{t\in\mathbb R}P\big(\big||\upsilon_n|_\infty-t\big|\le\epsilon\mid X_{i,n}\big)=O_p\big(\epsilon\sqrt{\log(n/\epsilon)}\big).\tag{F.38}$$
Combining (F.37) and (F.38), and following the argument of (F.21)–(F.23), we have
$$\sup_{t\in\mathbb R}\big|P(|\upsilon_n|_\infty\le t\mid X_{i,n},1\le i\le n)-P(|\kappa_n|_\infty\le t\mid X_{i,n},1\le i\le n)\big|=O_p\Big(\big(\epsilon^{-1}p^{\delta+1/2}\sqrt{w_n}/\sqrt n\,(N_np)^{1/l}\big)^{l}+\epsilon\sqrt{\log(n/\epsilon)}\Big).\tag{F.39}$$
Taking $\epsilon=\big(p^{\delta+1/2}\sqrt{w_n}/\sqrt n\,(N_np)^{1/l}\big)^{l/(l+1)}\log^{-1/(2l+2)}n$, (F.27) follows.

To show (F.37) it suffices to prove that
$$\Big(E\Big(\Big|\frac{1}{\sqrt{w_n(m_n-w_n+1)}}\sum_{j=1}^{m_n-w_n+1}(\hat S_{j,w_n}-S_{j,w_n})R_j\Big|_\infty^{l}\,\Big|\,X_{i,n}\Big)\Big)^{1/l}=O\big(p^{\delta+1/2}\sqrt{w_n}/\sqrt n\,(N_np)^{1/l}\big),\tag{F.40}$$
$$\Big(E\Big(\Big|\frac{1}{\sqrt{w_n(m_n-w_n+1)}}\sum_{j=1}^{m_n-w_n+1}\frac{w_n}{m_n}(\hat S_{m_n}-S_{m_n})R_j\Big|_\infty^{l}\,\Big|\,X_{i,n}\Big)\Big)^{1/l}=O\big(p^{\delta+1/2}\sqrt{w_n}/\sqrt n\,(N_np)^{1/l}\big).\tag{F.41}$$
We now show (F.40); (F.41) follows mutatis mutandis. Define $\hat S_{j,w_n,r}$ and $S_{j,w_n,r}$ as the $r$th components of the $kN_n(p-d)p$-dimensional vectors $\hat S_{j,w_n}$ and $S_{j,w_n}$. Using the notation of the proof of Theorem 5.4, it follows that
$$\Big|\sum_{j=1}^{m_n-w_n+1}(\hat S_{j,w_n}-S_{j,w_n})R_j\Big|_\infty=\max_{1\le s_1\le N_n-1,\ 1\le s_2\le p,\ 1\le w\le k}\ \max_{1\le s_3\le p-d}\Big|\sum_{j=1}^{m_n-w_n+1}\Big(\sum_{i=j}^{j+w_n-1}(\hat F_{s_3}^\top-F_{s_3}^\top)X_{s_1m_n+i+w,n}X_{s_1m_n+i,s_2}\Big)R_j\Big|.\tag{F.42}$$
Notice that by the Cauchy–Schwarz inequality,
$$\max_{1\le s_3\le p-d}\Big|\sum_{j=1}^{m_n-w_n+1}\Big(\sum_{i=j}^{j+w_n-1}(\hat F_{s_3}^\top-F_{s_3}^\top)X_{s_1m_n+i+w,n}X_{s_1m_n+i,s_2}\Big)R_j\Big|\le\max_{1\le s_3\le p-d}\|\hat F_{s_3}-F_{s_3}\|\,\Big\|\sum_{j=1}^{m_n-w_n+1}\sum_{i=j}^{j+w_n-1}X_{s_1m_n+i+w,n}X_{s_1m_n+i,s_2}R_j\Big\|\le\|\hat F-F\|_F\,\Big\|\sum_{j=1}^{m_n-w_n+1}\sum_{i=j}^{j+w_n-1}X_{s_1m_n+i+w,n}X_{s_1m_n+i,s_2}R_j\Big\|.\tag{F.43}$$
Therefore, by Corollary F.3, (F.42) and (F.43) we have
$$\Big|\sum_{j=1}^{m_n-w_n+1}(\hat S_{j,w_n}-S_{j,w_n})R_j\Big|_\infty\le\frac{p^{\delta}}{\sqrt n}\max_{1\le s_1\le N_n-1,\ 1\le s_2\le p,\ 1\le w\le k}\Big\|\sum_{j=1}^{m_n-w_n+1}\sum_{i=j}^{j+w_n-1}X_{s_1m_n+i+w,n}X_{s_1m_n+i,s_2}R_j\Big\|.\tag{F.44}$$
Notice that, conditional on $X_{i,n}$, the vectors $\sum_{j=1}^{m_n-w_n+1}\sum_{i=j}^{j+w_n-1}X_{s_1m_n+i+w,n}X_{s_1m_n+i,s_2}R_j$ are $p$-dimensional Gaussian random vectors for $1\le s_1\le N_n-1$, $1\le s_2\le p$, $1\le w\le k$, whose extreme probabilistic behaviour depends on their covariance structure. Therefore we shall study $\big\|\sum_{j=1}^{m_n-w_n+1}\sum_{i=j}^{j+w_n-1}X_{s_1m_n+i+w,n}X_{s_1m_n+i,s_2}\big\|$. For this purpose, for any random variable $Y$ write $(E(|Y|^l\mid X_{i,n}))^{1/l}=\|Y\|_{L^l,X_{i,n}}$. Then
\begin{align*}
&E\Big(\max_{1\le s_1\le N_n-1,\ 1\le s_2\le p,\ 1\le w\le k}\Big\|\sum_{j=1}^{m_n-w_n+1}\sum_{i=j}^{j+w_n-1}X_{s_1m_n+i+w,n}X_{s_1m_n+i,s_2}R_j\Big\|^{l}\,\Big|\,X_{i,n}\Big)\\
&\quad\le\sum_{1\le s_1\le N_n-1,\ 1\le s_2\le p,\ 1\le w\le k}E\Big(\Big(\sum_{s_3=1}^{p}\Big(\sum_{j=1}^{m_n-w_n+1}\sum_{i=j}^{j+w_n-1}X_{s_1m_n+i+w,s_3}X_{s_1m_n+i,s_2}R_j\Big)^2\Big)^{l/2}\,\Big|\,X_{i,n}\Big)\\
&\quad\le\sum_{1\le s_1\le N_n-1,\ 1\le s_2\le p,\ 1\le w\le k}\Big(\sum_{s_3=1}^{p}\Big\|\Big(\sum_{j=1}^{m_n-w_n+1}\sum_{i=j}^{j+w_n-1}X_{s_1m_n+i+w,s_3}X_{s_1m_n+i,s_2}R_j\Big)^2\Big\|_{L^{l/2},X_{i,n}}\Big)^{l/2}\\
&\quad=\sum_{1\le s_1\le N_n-1,\ 1\le s_2\le p,\ 1\le w\le k}\Big(\sum_{s_3=1}^{p}\Big\|\sum_{j=1}^{m_n-w_n+1}\sum_{i=j}^{j+w_n-1}X_{s_1m_n+i+w,s_3}X_{s_1m_n+i,s_2}R_j\Big\|_{L^{l},X_{i,n}}^2\Big)^{l/2}.\tag{F.45}
\end{align*}
By the Burkholder inequality and conditions (M'), (M1) and (M3), we have that
$$\Big\|\,\Big\|\sum_{j=1}^{m_n-w_n+1}\sum_{i=j}^{j+w_n-1}X_{s_1m_n+i+w,s_3}X_{s_1m_n+i,s_2}R_j\Big\|_{L^{l},X_{i,n}}\Big\|_{L^{l}}=O\big(w_n\sqrt{m_n-w_n+1}\big).\tag{F.46}$$
Therefore, by straightforward calculations and equations (F.44)–(F.46), we obtain (F.40), and hence (F.27) follows. Finally, (F.38) follows from Corollary 1 of Chernozhukov et al. (2015) and (F.29), which completes the proof. $\Box$

Proof of Theorem C.1
Recall the quantities $\check Z_i$ and $\tilde Z_i$ defined in Section C, and $\hat Z_i$ defined in (18) of the main article. To prove the power result in Theorem C.1 we need to evaluate the magnitude of $E\hat Z_i$ under the local alternatives, which is determined by $E\tilde Z_i$ and $E\check Z_i$. Define $X^*_{i,n}=Az_{i,n}+e_{i,n}$, and further define $\tilde Z^*_i$ by replacing $X_{i,n}$ in $\tilde Z_i$ with $X^*_{i,n}$. Notice that, under the considered alternative, $Y_i$ in fact preserves the auto-covariance structure of $\tilde Z^*_i$. We then prove Theorem C.1 by arguments similar to the proofs of Theorems 5.4 and 5.5, but under the local alternatives.

Proposition G.1.
Write $\Gamma(0)$ as $\Gamma$. Under the conditions of Theorem C.1, we have that for each $n$ there exists an orthogonal basis $F=(F_1,\ldots,F_{p-d})$ such that
$$\|F_n-F\|_F=O\big(\rho_np^{(\delta_1-\delta_2)/2}\big),\tag{G.1}$$
where $F$ forms a basis of the null space of $\Gamma$. Furthermore,
$$\big\|\|\hat F-F_n\|_F\big\|_{L^2}=O\Big(\frac{p^{\delta}}{\sqrt n}\Big),\qquad\big\|\|\hat F-F\|_F\big\|_{L^2}=O\Big(\rho_np^{(\delta_1-\delta_2)/2}+\frac{p^{\delta}}{\sqrt n}\Big).\tag{G.2}$$

Proof. By the definitions of $\Gamma$ and $\Gamma_n$, it follows that $\|\Gamma_n-\Gamma\|_F=O\big(\rho_np^{(\delta_1-\delta_2)/2-\delta}\big)$. Therefore, by the proofs of Corollaries F.2 and F.3, for each $n$ there exists an orthogonal basis $F=(F_1,\ldots,F_{p-d})$ such that
$$\|F_n-F\|_F=O\big(\|\Gamma_n-\Gamma\|_F\,p^{\delta}\big)=O\big(\rho_np^{(\delta_1-\delta_2)/2}\big),\tag{G.3}$$
which shows (G.1). By an argument similar to Corollary F.1 and the triangle inequality, we have that
$$\big\|\|\hat\Gamma-\Gamma_n\|_F\big\|_{L^2}=O\Big(\frac{p^{\delta}}{\sqrt n}\,p^{-\delta}\Big),\tag{G.4}$$
$$\big\|\|\hat\Gamma-\Gamma\|_F\big\|_{L^2}\le\big\|\|\hat\Gamma-\Gamma_n\|_F\big\|_{L^2}+\big\|\|\Gamma_n-\Gamma\|_F\big\|_{L^2}=O\Big(\Big(\rho_np^{(\delta_1-\delta_2)/2}+\frac{p^{\delta}}{\sqrt n}\Big)p^{-\delta}\Big),\tag{G.5}$$
which further, by Theorem 1 of Yu et al. (2015), yield that
$$\big\|\|\hat F-F_n\|_F\big\|_{L^2}=O\big(\big\|\|\hat\Gamma-\Gamma_n\|_F\big\|_{L^2}\,p^{\delta}\big)=O\Big(\frac{p^{\delta}}{\sqrt n}\Big),\tag{G.6}$$
$$\big\|\|\hat F-F\|_F\big\|_{L^2}=O\big(\big\|\|\hat\Gamma-\Gamma\|_F\big\|_{L^2}\,p^{\delta}\big)=O\Big(\rho_np^{(\delta_1-\delta_2)/2}+\frac{p^{\delta}}{\sqrt n}\Big).\tag{G.7}$$
Therefore (G.2) holds. $\Box$

The following proposition studies the bias of the expectations of $\check Z_i$ and $\tilde Z_i$.

Proposition G.2.
Under the conditions of Theorem C.1, we have that
$$|E\tilde Z_i|_\infty=O(\rho_np^{-\delta}),\qquad |E\check Z_i|_\infty=O(\rho_np^{-\delta}).\tag{G.8}$$

Proof.
To show the first equation of (G.8), notice that for $1\le s_1\le N_n-1$, $1\le s_3\le p-d$, $1\le s_2\le p$, $1\le w\le k$, the $\big(k(p-d)p(s_1-1)+(w-1)p(p-d)+(s_3-1)p+s_2\big)$th entry of $E\tilde Z_i$ is $E\big(F_{s_3}^\top X_{s_1m_n+i+w,n}X_{s_1m_n+i,s_2}\big)$, where $X_{i,s_2}$ is the $s_2$th entry of $X_{i,n}$. Then by the Cauchy–Schwarz inequality, there exists a constant $M$ such that uniformly for $1\le s_1\le N_n-1$, $1\le s_3\le p-d$, $1\le s_2\le p$, $1\le w\le k$,
$$\big|E\big(F_{s_3}^\top X_{s_1m_n+i+w,n}X_{s_1m_n+i,s_2}\big)\big|=\big|E\big(F_{s_3}^\top(\rho_nD(t_n)\circ Az_{s_1m_n+i+w,n}+e_{s_1m_n+i+w,n})X_{s_1m_n+i,s_2}\big)\big|\le\rho_nMp^{-\delta},\tag{G.9}$$
where $t_n=(s_1m_n+i+w)/n$. In the above argument we have used the fact that $\|F_u\|_F=1$ for $1\le u\le p-d$. Therefore the first equation of (G.8) follows from (G.9). Write $h_{s_3}=F_{s_3}-F_{s_3,n}$. Similarly,
\begin{align*}
\big|E\big(h_{s_3}^\top X_{s_1m_n+i+w,n}X_{s_1m_n+i,s_2}\big)\big|&=\big|E\big(h_{s_3}^\top(A(t_n)z_{s_1m_n+i+w,n}+e_{s_1m_n+i+w,n})X_{s_1m_n+i,s_2}\big)\big|\\
&\le\big|E\big(h_{s_3}^\top A(t_n)z_{s_1m_n+i+w,n}X_{s_1m_n+i,s_2}\big)\big|+\big|E\big(h_{s_3}^\top e_{s_1m_n+i+w,n}X_{s_1m_n+i,s_2}\big)\big|=O(\rho_nMp^{-\delta}),
\end{align*}
where the last equality is due to (G.1), condition (C1), condition (f), the Cauchy–Schwarz inequality and the submultiplicativity of the Frobenius norm. Therefore $|E\check Z_i-E\tilde Z_i|_\infty=O(\rho_np^{-\delta})$. Together with the first equation of (G.8) we get
$$|E\check Z_i|_\infty=O(\rho_np^{-\delta}),\tag{G.10}$$
and the proof is complete. $\Box$

Proof of Theorem C.1.
We first show (a). A careful inspection of the proof of Proposition 5.3 shows that Proposition 5.3 still holds under the alternative hypothesis considered in Theorem 5.4. Recall that $\hat T_n=\big|\sum_{i=1}^{m_n}\hat Z_i/\sqrt{m_n}\big|_\infty$, where $\hat Z_i$ is defined in analogy with $\tilde Z_i$ with $F$ replaced by $\hat F$. Define
$$\tilde T_n=\Big|\frac{\sum_{i=1}^{m_n}\tilde Z_i}{\sqrt{m_n}}\Big|_\infty,\qquad\tilde T^*_n=\Big|\frac{\sum_{i=1}^{m_n}\tilde Z^*_i}{\sqrt{m_n}}\Big|_\infty.$$
Using Proposition G.1 and following the proof of Proposition 5.3, we have that for any sequence $g_n\to\infty$,
$$P\Big(|\tilde T_n-\hat T_n|\ge g_n\Big(\rho_np^{(\delta_1-\delta_2)/2}+\frac{p^{\delta}}{\sqrt n}\Big)m_n^{1/2}\Omega_n\Big)=O\Big(\frac1{g_n}+\frac{p^{\delta}\log n}{\sqrt n}\Big).\tag{G.11}$$
On the other hand, similarly to the proof of Proposition 5.3, on the event $\{\hat d_n=d\}$,
\begin{align*}
|\tilde T_n-\tilde T^*_n|&\le\max_{1\le s_1\le N_n-1,\ 1\le s_2\le p,\ 1\le w\le k,\ 1\le s_3\le p-d}\Big|\sum_{i=1}^{m_n}F_{s_3}^\top\big(X_{s_1m_n+i+w,n}X_{s_1m_n+i,s_2}-X^*_{s_1m_n+i+w,n}X^*_{s_1m_n+i,s_2}\big)\Big|\Big/\sqrt{m_n}\\
&\le\max_{1\le s_1\le N_n-1,\ 1\le s_2\le p,\ 1\le w\le k}\Big(\sum_{u=1}^{p}\Big(\sum_{i=1}^{m_n}\big(X_{s_1m_n+i+w,u}X_{s_1m_n+i,s_2}-X^*_{s_1m_n+i+w,u}X^*_{s_1m_n+i,s_2}\big)\Big)^2\Big)^{1/2}\Big/\sqrt{m_n}.\tag{G.12}
\end{align*}
Notice that
$$\Big\|\Big(\sum_{u=1}^{p}\Big(\sum_{i=1}^{m_n}\big(X_{s_1m_n+i+w,u}X_{s_1m_n+i,s_2}-X^*_{s_1m_n+i+w,u}X^*_{s_1m_n+i,s_2}\big)\Big)^2\Big)^{1/2}\Big\|_{L^l}=\Big\|\sum_{u=1}^{p}\Big(\sum_{i=1}^{m_n}\big(X_{s_1m_n+i+w,u}X_{s_1m_n+i,s_2}-X^*_{s_1m_n+i+w,u}X^*_{s_1m_n+i,s_2}\big)\Big)^2\Big\|_{L^{l/2}}^{1/2},\tag{G.13}$$
which can be further bounded by the sum of
$$\Big\|\sum_{u=1}^{p}\Big(\sum_{i=1}^{m_n}X_{s_1m_n+i+w,u}\big(X_{s_1m_n+i,s_2}-X^*_{s_1m_n+i,s_2}\big)\Big)^2\Big\|_{L^{l/2}}^{1/2}\quad\text{and}\quad\Big\|\sum_{u=1}^{p}\Big(\sum_{i=1}^{m_n}X^*_{s_1m_n+i,s_2}\big(X_{s_1m_n+i+w,u}-X^*_{s_1m_n+i+w,u}\big)\Big)^2\Big\|_{L^{l/2}}^{1/2}.$$
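The split of the product differences above rests on the elementary add-and-subtract identity $ab-a^*b^*=a(b-b^*)+b^*(a-a^*)$, applied coordinatewise before the triangle inequality. A minimal numerical sketch of this decomposition on synthetic arrays (all sizes, names and data below are illustrative placeholders, not the paper's simulation design):

```python
import numpy as np

rng = np.random.default_rng(0)
m, p = 50, 8  # placeholder sample size and dimension

# X plays the role of the observed series, Xs of its counterpart X* under the alternative
X = rng.normal(size=(m, p))
Xs = X + 0.01 * rng.normal(size=(m, p))  # small perturbation, mimicking a local alternative

u, s2 = 3, 5  # arbitrary coordinate indices
lhs = np.sum(X[:, u] * X[:, s2] - Xs[:, u] * Xs[:, s2])
# add-and-subtract decomposition: a*b - a'*b' = a*(b - b') + b'*(a - a')
term1 = np.sum(X[:, u] * (X[:, s2] - Xs[:, s2]))
term2 = np.sum(Xs[:, s2] * (X[:, u] - Xs[:, u]))
assert np.isclose(lhs, term1 + term2)          # exact identity (up to floating point)
assert abs(lhs) <= abs(term1) + abs(term2) + 1e-12  # triangle inequality used in the proof
```

The identity holds exactly term by term; the proof then bounds the $L^{l/2}$ norms of the two pieces separately.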
Therefore, using the definition of $X^*_{i,n}$, straightforward calculations show that
$$\Big\|\Big(\sum_{u=1}^{p}\Big(\sum_{i=1}^{m_n}\big(X_{s_1m_n+i+w,u}X_{s_1m_n+i,s_2}-X^*_{s_1m_n+i+w,u}X^*_{s_1m_n+i,s_2}\big)\Big)^2\Big)^{1/2}\Big\|_{L^l}=O\big(\sqrt{p\,m_n}\,\rho_n\big).\tag{G.14}$$
As a consequence, by the proof of Proposition 5.3, for any sequence $g_n\to\infty$,
$$P\big(|\tilde T^*_n-\tilde T_n|\ge g_n\rho_nm_n^{1/2}\Omega_n\big)=O\Big(\frac1{g_n}+\frac{p^{\delta}\log n}{\sqrt n}\Big).\tag{G.15}$$
Notice that $Y_i$ preserves the autocovariance structure of $\tilde Z^*_i$. Then it follows from (G.11), (G.14) and the proof of Theorem 5.4 that under the alternative hypothesis,
$$\sup_{t\in\mathbb R}\big|P(\hat T_n\le t)-P(|Y|_\infty\le t)\big|=O\bigg(\Big(\Big(\rho_np^{(\delta_1-\delta_2)/2}+\frac{p^{\delta}}{\sqrt n}\Big)\sqrt{m_n\log n}\,\Omega_n\Big)^{1/2}+\big(\rho_n\sqrt{m_n\log n}\,\Omega_n\big)^{1/2}+\iota\big(m_n,kN_np(p-d),l,(N_np)^{1/l}\big)\bigg),\tag{G.16}$$
which proves (C.7).

We now show the divergence case in (a). With a slight abuse of notation, define the vectors $\check T_n=\sum_{i=1}^{m_n}\check Z_i/\sqrt{m_n}$ and $\hat T_n=\sum_{i=1}^{m_n}\hat Z_i/\sqrt{m_n}$. Following the proof of Proposition 5.3 we have that
$$|\check T_n-\hat T_n|_\infty=O_p\Big(\frac{p^{\delta}}{\sqrt n}m_n^{1/2}\Omega_n\Big).\tag{G.17}$$
Notice that for $1\le s_1\le N_n-1$, $1\le s_3\le p-d$, $1\le s_2\le p$, $1\le w\le k$, the $\big(k(p-d)p(s_1-1)+(w-1)p(p-d)+(s_3-1)p+s_2\big)$th entry of $\check Z_i$ is $F_{s_3,n}^\top X_{s_1m_n+i+w,n}X_{s_1m_n+i,s_2}$, where $X_{i,s_2}$ is the $s_2$th entry of $X_{i,n}$. It is easy to see that
$$|\check T_n|_\infty\ge\frac1{\sqrt{m_n}}\Big|\sum_{i=1}^{m_n}F_{s_3,n}^\top X_{s_1m_n+i+w,n}X_{s_1m_n+i,s_2}\Big|,\tag{G.18}$$
where by the definition of and the conditions on $\Delta_n$, the indices $s_1$, $s_2$, $s_3$, $w$ can be chosen such that
$$\frac1{\sqrt{m_n}}\sum_{i=1}^{m_n}E\big(F_{s_3,n}^\top X_{s_1m_n+i+w,n}X_{s_1m_n+i,s_2}\big)\asymp\sqrt{m_n}\,\rho_np^{-\delta}.\tag{G.19}$$
On the other hand, notice that
$$\frac1{m_n}\mathrm{Var}\Big(\sum_{i=1}^{m_n}F_{s_3,n}^\top X_{s_1m_n+i+w,n}X_{s_1m_n+i,s_2}\Big)=\frac1{m_n}\sum_{i_1,i_2}\mathrm{Cov}\big(F_{s_3,n}^\top X_{s_1m_n+i_1+w,n}X_{s_1m_n+i_1,s_2},\ F_{s_3,n}^\top X_{s_1m_n+i_2+w,n}X_{s_1m_n+i_2,s_2}\big).\tag{G.20}$$
Consider the case $i_1>i_2$. Now using condition (G) and the fact that $X_{s_1m_n+i+w,n}=A\big(\frac{s_1m_n+i+w}{n}\big)z_{s_1m_n+i+w,n}+e_{s_1m_n+i+w,n}$, we have (writing $i_1'=s_1m_n+i_1$ and $i_2'=s_1m_n+i_2$ for short)
\begin{align*}
\mathrm{Cov}\big(F_{s_3,n}^\top X_{i_1'+w}X_{i_1',s_2},\ F_{s_3,n}^\top X_{i_2'+w}X_{i_2',s_2}\big)&=\mathrm{Cov}\Big(F_{s_3,n}^\top A\big(\tfrac{i_1'+w}{n}\big)z_{i_1'+w}X_{i_1',s_2},\ F_{s_3,n}^\top A\big(\tfrac{i_2'+w}{n}\big)z_{i_2'+w}X_{i_2',s_2}\Big)\\
&\quad+\mathrm{Cov}\Big(F_{s_3,n}^\top A\big(\tfrac{i_1'+w}{n}\big)z_{i_1'+w}X_{i_1',s_2},\ F_{s_3,n}^\top e_{i_2'+w}X_{i_2',s_2}\Big).\tag{G.21}
\end{align*}
Notice that (G.1) and condition (C1) lead to
$$\big\|F_{s_3,n}^\top A\big(\tfrac kn\big)\big\|_F=O(\rho_np^{-\delta})\tag{G.22}$$
uniformly for $1\le s_3\le p-d$ and $1\le k\le n$. Therefore,
\begin{align*}
\Big|\mathrm{Cov}\Big(F_{s_3,n}^\top A\big(\tfrac{i_1'+w}{n}\big)z_{i_1'+w}X_{i_1',s_2},\ F_{s_3,n}^\top A\big(\tfrac{i_2'+w}{n}\big)z_{i_2'+w}X_{i_2',s_2}\Big)\Big|&=\Big|F_{s_3,n}^\top A\big(\tfrac{i_1'+w}{n}\big)\mathrm{Cov}\big(z_{i_1'+w}X_{i_1',s_2},\ z_{i_2'+w}X_{i_2',s_2}\big)\Big(F_{s_3,n}^\top A\big(\tfrac{i_2'+w}{n}\big)\Big)^\top\Big|\\
&\le\rho_n^2p^{-2\delta}\big\|\mathrm{Cov}\big(z_{i_1'+w}X_{i_1',s_2},\ z_{i_2'+w}X_{i_2',s_2}\big)\big\|_F.\tag{G.23}
\end{align*}
On the other hand, using condition (G2) and condition (ii) of Theorem 5.5, by the proof of Lemma E.3 it follows that for $1\le u\le d$, $1\le v\le p$, $\Upsilon_{i,u,v}:=\Upsilon_{i,u,v}\big(\tfrac in,\mathcal F_i\big)=z_{i,u,n}x_{i+k,v,n}$ is a locally stationary process with associated dependence measures
$$\delta_{\Upsilon_{u,v},2}(h)=O\big(((h+1)\log(h+1))^{-2}\big).$$
(G.24)

Therefore, by Lemma 5 of Zhou and Wu (2010), uniformly for $1\le i_1,i_2\le m_n$ and $1\le s_2\le p$,
$$\big\|\mathrm{Cov}\big(z_{i_1'+w}X_{i_1',s_2},\ z_{i_2'+w}X_{i_2',s_2}\big)\big\|_F=O\big(\big(|i_1'-i_2'|\log(|i_1'-i_2'|)\big)^{-2}\big).\tag{G.25}$$
Together with (G.23) we have
$$\frac1{m_n}\sum_{i_1',i_2'}\mathrm{Cov}\Big(F_{s_3,n}^\top A\big(\tfrac{i_1'+w}{n}\big)z_{i_1'+w}X_{i_1',s_2},\ F_{s_3,n}^\top A\big(\tfrac{i_2'+w}{n}\big)z_{i_2'+w}X_{i_2',s_2}\Big)=O\big(\rho_n^2p^{-2\delta}\vee1\big).\tag{G.26}$$
By similar arguments using the proof of Lemma 5 of Zhou and Wu (2010) and condition (G3), we have
$$\frac1{m_n}\sum_{i_1',i_2'}\mathrm{Cov}\Big(F_{s_3,n}^\top A\big(\tfrac{i_1'+w}{n}\big)z_{i_1'+w}X_{i_1',s_2},\ F_{s_3,n}^\top e_{i_2'+w}X_{i_2',s_2}\Big)=O\big(\rho_np^{-\delta}\log p\vee1\big).\tag{G.27}$$
By (G.18)–(G.20), (G.26) and (G.27), $\hat T_n$ diverges at the rate $\sqrt{m_n}\,\rho_np^{-\delta}$, which proves (a).

To show (b), we define
$$S^*_{j,w_n}=\sum_{r=j}^{j+w_n-1}\tilde Z^*_r,\qquad S^*_{m_n}=\sum_{i=1}^{m_n}\tilde Z^*_i\qquad\text{for }1\le j\le m_n-w_n+1.\tag{G.28}$$
Then we define
$$\upsilon^*_n=\frac1{\sqrt{w_n(m_n-w_n+1)}}\sum_{j=1}^{m_n-w_n+1}\Big(S^*_{j,w_n}-\frac{w_n}{m_n}S^*_{m_n}\Big)R_j,\tag{G.29}$$
$$\tilde\kappa_n=\frac1{\sqrt{w_n(m_n-w_n+1)}}\sum_{j=1}^{m_n-w_n+1}\Big(\tilde S_{j,w_n}-\frac{w_n}{m_n}\tilde S_{m_n}\Big)R_j.\tag{G.30}$$
Then by the proof of Theorem 5.5 we have that
$$\sup_{t\in\mathbb R}\big|P(|Y|_\infty\le t)-P(|\upsilon^*_n|_\infty\le t\mid X_{i,n},1\le i\le n)\big|=O_p\big(\Theta_n^{1/2}\log^{1/2}(W_{n,p}\Theta_n)\big).\tag{G.31}$$
Straightforward calculations using arguments similar to the proof of step (ii) of Theorem 5.5 and equation (G.2) further show that under the considered local alternative,
$$\big\|\,\big\||\upsilon^*_n-\tilde\kappa_n|_\infty\big\|_{L^l,X_{i,n}}\big\|_{L^l}=O\big(\rho_n\sqrt{w_n}\,\Omega_n\big),\tag{G.32}$$
$$\big\|\,\big\||\kappa_n-\tilde\kappa_n|_\infty\big\|_{L^l,X_{i,n}}\big\|_{L^l}=O\Big(\Big(\frac{p^{\delta}}{\sqrt n}+\rho_np^{(\delta_1-\delta_2)/2}\Big)\sqrt{w_n}\,\Omega_n\Big).\tag{G.33}$$
Therefore
$$\big\|\,\big\||\upsilon^*_n-\kappa_n|_\infty\big\|_{L^l,X_{i,n}}\big\|_{L^l}=O\Big(\Big(\frac{p^{\delta}}{\sqrt n}+\rho_n\Big)\sqrt{w_n}\,\Omega_n\Big).\tag{G.34}$$
As a result, by the arguments of (F.39), (C.8) follows.

To evaluate $\kappa_n$, notice that $\kappa_n$ is a Gaussian vector conditional on $\{X_{i,n}\}$. Therefore
$$E\big(|\kappa_n|_\infty\mid(X_{i,n})_{1\le i\le n}\big)\le\Big|\Big(\sum_{j=1}^{m_n-w_n+1}\Big(\hat S_{j,w_n}-\frac{w_n}{m_n}\hat S_{n}\Big)^{\circ2}\Big)^{\circ1/2}\Big|_\infty\frac{\sqrt{2\log n}}{\sqrt{w_n(m_n-w_n+1)}}\quad a.s.\tag{G.35}$$
In the following we consider the event $\{\hat d_n=d\}$. Observe that, for some large positive constant $M$,
\begin{align*}
\Big|\Big(\sum_{j=1}^{m_n-w_n+1}\Big(\hat S_{j,w_n}-\frac{w_n}{m_n}\hat S_{n}\Big)^{\circ2}\Big)^{\circ1/2}\Big|_\infty
&\le M\Big|\Big(\sum_{j=1}^{m_n-w_n+1}\Big(\hat S_{j,w_n}-\check S_{j,w_n}-\frac{w_n}{m_n}(\hat S_{n}-\check S_{n})\Big)^{\circ2}\Big)^{\circ1/2}\Big|_\infty\tag{G.36}\\
&\quad+M\Big|\Big(\sum_{j=1}^{m_n-w_n+1}\Big(\check S_{j,w_n}-E\check S_{j,w_n}-\frac{w_n}{m_n}(\check S_{n}-E\check S_{n})\Big)^{\circ2}\Big)^{\circ1/2}\Big|_\infty\\
&\quad+M\Big|\Big(\sum_{j=1}^{m_n-w_n+1}\Big(E\check S_{j,w_n}-\frac{w_n}{m_n}E\check S_{n}\Big)^{\circ2}\Big)^{\circ1/2}\Big|_\infty:=I+II+III,
\end{align*}
where $I$, $II$ and $III$ are defined in an obvious way.

For $I$, notice that, similar to the proof of Theorem 5.5 and the first equality of (G.1), we have
$$I/\sqrt{w_n(m_n-w_n+1)}=O_p\Big(\frac{p^{\delta}}{\sqrt n}\sqrt{w_n}\,\Omega_n\Big).\tag{G.37}$$
For $II$, by (G.26), (G.27) and an argument similar to the proof of Theorem 5.5 applied to $\check S_{j,w_n}$ and $\check S_{n}$, we see that
$$II/\sqrt{w_n(m_n-w_n+1)}=O_p\big(\rho_np^{-\delta}\log p\vee\rho_n^2p^{-2\delta}\vee1\big).\tag{G.38}$$
For $III$, by Proposition G.2,
$$III/\sqrt{w_n(m_n-w_n+1)}=O\big(\sqrt{w_n}\,\rho_np^{-\delta}\big).\tag{G.39}$$
Therefore, from (G.35)–(G.39), $|\kappa_n|_\infty=O_p\big(\sqrt{w_n}\,\rho_np^{-\delta}\sqrt{\log n}\big)$, which completes the proof. $\Box$

References
Anderson, T. W. (1963). The use of factor analysis in the statistical analysis of multiple time series. Psychometrika, 28(1):1–25.

Bai, J. (2003). Inferential theory for factor models of large dimensions. Econometrica, 71(1):135–171.

Bai, J. and Ng, S. (2002). Determining the number of factors in approximate factor models. Econometrica, 70(1):191–221.

Bhatia, R. (1982). Analysis of spectral variation and some inequalities. Transactions of the American Mathematical Society, 272(1):323–331.

Bhatia, R. (1997). Matrix Analysis. Springer-Verlag.

Breitung, J. and Eickmeier, S. (2011). Testing for structural breaks in dynamic factor models. Journal of Econometrics, 163(1):71–84.

Brillinger, D. R. (2001). Time Series: Data Analysis and Theory. SIAM.

Chen, X. (2007). Large sample sieve estimation of semi-nonparametric models. Handbook of Econometrics, 6:5549–5632.

Chernozhukov, V., Chetverikov, D., and Kato, K. (2013). Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors. The Annals of Statistics, 41(6):2786–2819.

Chernozhukov, V., Chetverikov, D., and Kato, K. (2015). Comparison and anti-concentration bounds for maxima of Gaussian random vectors. Probability Theory and Related Fields, 162(1):47–70.

Dahlhaus, R. (1997). Fitting time series models to nonstationary processes. The Annals of Statistics, 25(1):1–37.

Dahlhaus, R. (2012). Locally stationary processes. In Handbook of Statistics, volume 30, pages 351–413. Elsevier.

Das, S. and Politis, D. N. (2020). Predictive inference for locally stationary time series with an application to climate data. Journal of the American Statistical Association, pages 1–16.

Dette, H. and Wu, W. (2020+). Prediction in locally stationary time series. Journal of Business & Economic Statistics, to appear.

Fengler, M., Hardle, W., and Mammen, E. (2007). A dynamic semiparametric factor model for implied volatility string dynamics. Technical Report 5.

Forni, M., Hallin, M., Lippi, M., and Reichlin, L. (2000). The generalized dynamic-factor model: Identification and estimation. Review of Economics and Statistics, 82(4):540–554.

Grenander, U. (1981). Abstract Inference. Technical report.

Hansen, B. E. (2014). Nonparametric sieve regression: Least squares, averaging least squares, and cross-validation. Handbook of Applied Nonparametric and Semiparametric Econometrics and Statistics, forthcoming.

Lam, C. and Yao, Q. (2012). Factor modeling for high-dimensional time series: Inference for the number of factors. The Annals of Statistics, 40(2):694–726.

Lam, C., Yao, Q., and Bathia, N. (2011). Estimation of latent factors for high-dimensional time series. Biometrika, 98(4):901–918.

Motta, G., Hafner, C. M., and von Sachs, R. (2011). Locally stationary factor models: Identification and nonparametric estimation. Econometric Theory, pages 1279–1319.

Pena, D. and Box, G. E. (1987). Identifying a simplifying structure in time series. Journal of the American Statistical Association, 82(399):836–843.

Stock, J. H. and Watson, M. (2011). Dynamic factor models. Oxford Handbooks Online.

Stock, J. H. and Watson, M. W. (2002). Forecasting using principal components from a large number of predictors. Journal of the American Statistical Association, 97(460):1167–1179.

Su, L. and Wang, X. (2017). On time-varying factor models: Estimation and testing. Journal of Econometrics, 198(1):84–101.

Tsay, R. S. (2013). Multivariate Time Series Analysis: With R and Financial Applications. John Wiley & Sons.

Wang, D., Liu, X., and Chen, R. (2019). Factor models for matrix-valued high-dimensional time series. Journal of Econometrics, 208(1):231–248.

Wang, H. and Xiang, S. (2012). On the convergence rates of Legendre approximation. Mathematics of Computation, 81(278):861–877.

Wei, W. W. (2018). Multivariate Time Series Analysis and Applications. John Wiley & Sons.

Yamamoto, Y. and Tanaka, S. (2015). Testing for factor loading structural change under common breaks. Journal of Econometrics, 189(1):187–206.

Yu, Y., Wang, T., and Samworth, R. J. (2015). A useful variant of the Davis–Kahan theorem for statisticians. Biometrika, 102(2):315–323.

Zhang, X. and Cheng, G. (2018). Gaussian approximation for high dimensional vector under physical dependence. Bernoulli, 24(4A):2640–2675.

Zhou, Z. (2012). Measuring nonlinear dependence in time-series, a distance correlation approach. Journal of Time Series Analysis, 33(3):438–457.

Zhou, Z. (2013). Heteroscedasticity and autocorrelation robust structural change detection. Journal of the American Statistical Association, 108(502):726–740.

Zhou, Z. and Wu, W. B. (2009). Local linear quantile estimation for nonstationary time series. The Annals of Statistics, 37(5B):2696–2729.

Zhou, Z. and Wu, W. B. (2010). Simultaneous inference of linear models with time varying coefficients. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(4):513–531.
[Figure: cumulative error (upper panels) and percentage (lower panel) plotted against time, comparing the methods Bench, SP, SAP, PCA100, PCAR100, PCA200 and PCAR200.]