An autocovariance-based learning framework for high-dimensional functional time series
AAn autocovariance-based learning framework forhigh-dimensional functional time series
Jinyuan Chang , Cheng Chen , and Xinghao Qiao School of Statistics, Southwestern University of Finance and Economics, P. R. China Department of Statistics, London School of Economics, U.K.
Abstract
Many scientific and economic applications involve the analysis of high-dimensionalfunctional time series, which stands at the intersection between functional time seriesand high-dimensional statistics gathering challenges of infinite-dimensionality with se-rial dependence and non-asymptotics. In this paper, we model observed functional timeseries, which are subject to errors in the sense that each functional datum arises as thesum of two uncorrelated components, one dynamic and one white noise. Motivatedfrom a simple fact that the autocovariance function of observed functional time seriesautomatically filters out the noise term, we propose an autocovariance-based three-step procedure by first performing autocovariance-based dimension reduction and thenformulating a novel autocovariance-based block regularized minimum distance (RMD)estimation framework to produce block sparse estimates, from which we can finally re-cover functional sparse estimates. We investigate non-asymptotic properties of relevantestimated terms under such autocovariance-based dimension reduction framework. Toprovide theoretical guarantees for the second step, we also present convergence analysisof the block RMD estimator. Finally, we illustrate the proposed autocovariance-basedlearning framework using applications of three sparse high-dimensional functional timeseries models. With derived theoretical results, we study convergence properties ofthe associated estimators. We demonstrate via simulated and real datasets that ourproposed estimators significantly outperform the competitors.
Key words : Autocovariance; Block regularized minimum distance estimation; Dimension reduction;High-dimensional functional time series; Non-asymptotics; Sparsity. a r X i v : . [ m a t h . S T ] A ug Introduction
Functional time series refers to functional data objects that are observed consecutively overtime and constitutes an active research area. Existing research has mainly focused on ex-tending standard univariate or low-dimensional multivariate time series methods to the func-tional domain with theoretical guarantees under an asymptotic framework, e.g., Bosq (2000);Bathia et al. (2010); H¨ormann and Kokoszka (2010); Panaretos and Tavakoli (2013); Aueet al. (2015); H¨ormann et al. (2015); Li et al. (2020), just to name a few. Rapid developmentof data collection technology has made high-dimensional functional time series datasets be-come increasingly common. Examples include hourly measured concentrations of variouspollutants, e.g., PM10 trajectories (H¨ormann et al., 2015) collected over a number of sites,daily electricity load curves (Cho et al., 2013) for a large number of households, cumulativeintraday return trajectories (Horv´ath et al., 2014), daily return density curves (Bathia et al.,2010) and functional volatility processes (M¨uller et al., 2011) for a large collection of stocks,and annul temperature curves (Aue and van Delft, 2020) at different measuring stations.The datasets, in this context, consist of p -dimensional vector of functional time series, W t p¨q “ t W t p¨q , . . . , W tp p¨qu T for t “ , . . . , n with (auto)covariance functions Σ Wh p u, v q “ Cov t W t p u q , W t ` h p v qu for any integer h and u, v P U (a compact interval), where p can bediverging with, or even larger than, n in a high-dimensional regime. Suppose the observedcurves W t p¨q are subject to errors in the form of W t p¨q “ X t p¨q ` e t p¨q , u P U , (1)where X t p¨q “ t X t p¨q , . . . , X tp p¨qu T is p -dimensional functional time series of interest and e t p¨q “ t e t p¨q , . . . , e tp p¨qu T is a white noise sequence. In the same manner as Σ Wh p u, v q , wedefine Σ Xh p u, v q and Σ eh p u, v q by replacing W t p¨q with X t p¨q and e t p¨q , respectively. We call e t p¨q is a white noise sequence if E t e t p u qu “ and Σ eh p u, v q “ for any u, v P U and h ‰ . This formulation guarantees that all dynamic elements of W t p¨q are included in the signalterm X t p¨q and all white noise elements are absorbed into e t p¨q . The existence of e t p¨q reflectsthat curves X t p¨q are seldom completely observed. Instead, they are often only measured,with errors, at discrete locations. These noisy discrete data are smoothed to yield ‘observed’curves W t p¨q . Note that t X t p¨qu nt “ and t e t p¨qu nt “ are uncorrelated and unobservable. See2athia et al. (2010) for the univariate case of model (1) with fully nonparametric structure on Σ e . When W p¨q , . . . , W n p¨q are univariate and independent, Hall and Vial (2006) addressedthe same problem under a ‘low noise’ setting assuming that e t p¨q goes to 0 as n goes to . Imposing some parametrically specified structure in the univariate case, Σ e is assumed tobe diagonal in Yao et al. (2005) and banded under the assumption that Σ X is finite rank inDescary and Panaretos (2019).The standard estimation procedure for univariate or low-dimensional functional timeseries models consists of three steps (Aue et al., 2015). Due to the intrinsic infinite-dimensionality of functional data, the first step performs dimension reduction via e.g. func-tional principal components analysis (FPCA) to approximate each observed curve by thefinite Karhunen-Lo`eve representation, which transforms functional time series observationsinto a vector time series of FPC scores. 
The second step transforms the estimation offunction-valued parameters involved in the models to the estimation of some vector- ormatrix-valued parameters based on the estimated FPC scores. The third step utilizes es-timated eigenfunctions to obtain the function-valued estimate of interest from the vector-or matrix-valued estimate obtained in the second step. Estimation in the context of high-dimensional functional time series is often impossible without imposing some lower-dimensionalstructural assumption on the model parameters space. With imposed functional sparsitystructure, the second step needs to consider the estimation under a block (or group) spar-sity constraint resulting from the first step, where variables belonging to the same groupshould be simultaneously included in or excluded from the model. In a regression setup, thegroup-lasso penalized least squares estimation (Yuan and Lin, 2006) can be implementedin the second step to obtain block sparse estimates, from which the third step can recoverfunctional sparse estimates. Similar three-step procedures have been developed to estimatesparse high-dimensional functional models, see e.g., vector functional autoregression (VFAR)(Guo and Qiao, 2020), scalar-on-function linear additive regression (SFLR) (Fan et al., 2015;Kong et al., 2016; Xue and Yao, 2020) and function-on-function linear additive regression(FFLR) (Fan et al., 2014; Luo and Qi, 2017) with serially dependent observations.Under the error contamination model in (1), provided that both FPCA and penalizedleast squares estimation are based on the estimated covariance function of W t p¨q , i.e. p Σ W , the3tandard covariance-based procedure is inappropriate given the fact that Σ W “ Σ X ` Σ e andhence p Σ W is not a consistent estimator for Σ X . In this paper, motivated from a simple factthat Σ Wh “ Σ Xh for any h ‰ , which automatically removes the impact from the noise e t p¨q , we propose an autocovariance-based three-step learning framework. Differing from FPCAvia Karhunen-Lo`eve expansion of W tj p¨q for each j , our first step of dimension reduction isdeveloped under an alternative data-driven basis expansion of X tj p¨q formed by performingeigenanlysis on a positive-definite operator defined based on autocovariance functions of W tj p¨q . Different from the penalized least squares estimation applied in the second step,we make use of the autocovariance information of the basis coefficients to construct somemoment equations and then apply our proposed block regularized method to estimate theassociated block sparse vector- or matrix-valued parameters based on the estimated basiscoefficients obtained in the first step. In the third step, the block sparse estimates obtainedin the second step are re-transformed to sparse function-valued estimates via estimated basisfunctions obtained in the first step.There exist several challenges in the theoretical analysis of the proposed autocovariance-based learning framework for high-dimensional functional time series gathering challengesof non-asymptotics (Wainwright, 2019) and infinite-dimensionality with serial dependence(Jirak, 2016). First, our proposed second step is applied to the estimated basis coefficientsrather than the true coefficients to produce block sparse estimates whereas the conventionalsparse estimation is applied directly to observed data. Accounting for such approximationis a major undertaking. 
Second, under a high-dimensional and serially dependent setting,it is essential to develop non-asymptotic theory that seeks to provide probabilistic boundson relevant estimated terms as a function of n, p and the truncated dimension under ourautocovariance-based dimension reduction framework. Third, compared with non-functionaldata, the infinite-dimensional nature of functional data leads to the additional theoreticalcomplexity that arises from specifying the block structure and controlling the bias termsformed by truncation errors in our dimension reduction step.The main contribution of our paper is fourfold.1. Our autocovariance-based learning framework can address the error contaminationmodel in (1) in the presence of infinite-dimensional signal curve dynamics with the4ddition of ‘genuinely functional’ white noise. It makes the good use of the serialcorrelation information in our estimation, which is most relevant in the context of timeseries modelling.2. To provide theoretical guarantees for the first and third steps and to verify imposedconditions in the second step, we rely on functional stability measures (Guo and Qiao,2020; Fang et al., 2020) to characterize the effect of serial dependence and investigatenon-asymptotic properties of relevant estimated terms under the autocovariance-baseddimension reduction framework we consider.3. We utilize the autocovariance of basis coefficients to construct high-dimensional mo-ment equations with partitioned group structure, based on which we formulate the sec-ond step in a general block regularized minimum distance (RMD) estimation frameworkso as to produce block sparse estimates. Within such framework, the group informa-tion can be explicitly encoded in a convex optimization targeting at minimizing theblock (cid:96) norm objective function subject to the block (cid:96) norm constraint. To theoreti-cally support the second step, we also study convergence properties of the block RMDestimator.4. Exemplarily, we illustrate the autocovariance-based three-step procedure using threesparse high-dimensional functional time series models, i.e. SFLR, FFLR and VFAR.Using our derived theoretical results, we establish convergence rates of the associ-ated estimators in these models. Empirically, we demonstrate the superiority of theseautocovariance-based estimators relative to their covariance-based counterparts.Our paper is set out as follows. In Section 2, we propose a general autocovariance-based three-step procedure with illustration using SFLR as an example. In Section 3, wepresent the first step of autocovariance-based dimension reduction and establish essentialdeviation bounds in elementwise (cid:96) -norm on relevant estimated terms used in subsequentanalysis. In Section 4, we formulate the second step in a general block RMD estimationframework and investigate its theoretical properties. In Section 5, we illustrate the proposedautocovariance-based learning framework using applications of SFLR, FFLR and VFAR,and present convergence analysis of the associated estimators. In Section 6, we examine5he finite-sample performance of the proposed estimators through both an extensive set ofsimulations and an analysis of a public financial dataset. All technical proofs are relegatedto the Supplementary Material. Notation . For a positive integer q, we denote r q s “ t , . . . , q u . Let L p U q be a Hilbertspace of square integrable functions on a compact interval U . The inner product of f, g P L p U q is x f, g y “ ş U f p u q g p u q d u . 
For a Hilbert space H Ă L p U q , we denote the p -foldCartesian product by H p “ H ˆ ¨ ¨ ¨ ˆ H and the tensor product by S “ H b H . For f “ p f , . . . , f p q T and g “ p g , . . . , g p q T in H p , we define x f , g y “ ř pi “ x f i , g i y . We use } f } “ x f , f y { and } f } “ ř pi “ I p} f i } ‰ q with I p¨q being the indicator function to de-note functional versions of induced norm and (cid:96) -norm, respectively. For an integral oper-ator K : H p Ñ H q induced from the kernel function K “ p K ij q q ˆ p with each K ij P S , K p f qp u q “ t ř pj “ x K j p u, ¨q , f j p¨qy , . . . , ř pj “ x K qj p u, ¨q , f j p¨qyu T P H q for any f P H p . For nota-tional economy, we will also use K to denote both the kernel and the operator. We definefunctional versions of Frobenius and matrix (cid:96) -norms by } K } F “ p ř qi “ ř pj “ } K ij } S q { and } K } “ max i Pr q s ř pj “ } K ij } S , respectively, where } K ij } S “ t ş U ş U K ij p u, v q d u d v u { denotes the Hilbert–Schmidt norm of K ij . For any real matrix B “ p b ij q q ˆ p , we write } B } max “ max i Pr q s ,j Pr p s | b ij | and use } B } F “ p ř qi “ ř pj “ | b ij | q { and } B } “ λ { p B T B q to denote its Frobenius norm and (cid:96) -norm, respectively. For two sequences of positive num-bers t a n u and t b n u , we write a n À b n or b n Á a n if there exist a positive constant c such that a n { b n ď c . We write a n — b n if and only if a n À b n and b n À a n hold simultaneously. Suppose we observe weakly stationary functional time series t W t p¨qu t Pr n s with mean zero and(auto)covariance functions Σ Wh p u, v q “ t Σ Wh,jk p u, v qu j,k Pr p s for integer h ě p u, v q P U , whose sample estimators are given by p Σ Wh p u, v q “ n ´ h n ´ h ÿ t “ W t p u q W t ` h p v q T “ t p Σ Wh,jk p u, v qu j,k Pr p s . (2)Our proposed autocovariance-based learning framework consists of the following three steps.Step 1: Due to the infinite-dimensionality of functional data, for each j, we expand signal curves6 tj p¨q through data-driven orthonormal basis functions, t ψ jl p¨qu l “ , and approximatethem using d j -dimensional truncation, X tj p¨q “ ÿ l “ η tjl ψ jl p¨q « η T tj ψ j p¨q , j P r p s , (3)where η tjl “ x X tj , ψ jl y , η tj “ p η tj , . . . , η tjd j q T P R d j and ψ j “ p ψ j , . . . , ψ jd j q T P R d j . Given observed functional time series t W tj p¨qu t Pr n s , we adopt an autocovariance-based dimension reduction approach in Section 3, where we can obtain estimated basisfunctions p ψ j “ p ˆ ψ j , . . . , ˆ ψ jd j q T and estimated basis coefficients p η tj “ p ˆ η tj , . . . , ˆ η tjd j q T with ˆ η tjl “ x W tj , ˆ ψ jl y for l P r d j s . Step 2: Based on the dimension reduction in Step 1, we can transform the estimation offunction-valued parameters of interest under the sparsity constraint to the block sparseestimation of some vector- or matrix-valued parameters. Let E t η tj η T p t ` h q k u “ t σ p h q jklm u l Pr d j s ,m Pr d k s with its estimator p n ´ h q ´ ř n ´ ht “ p η tj p η T p t ` h q k “ t ˆ σ p h q jklm u l Pr d j s ,m Pr d k s for j, k P r p s and h ě . 
To identify these vector- or matrix-valued parameters, we use the autocovariance in-formation among the basis coefficients t η tj u to construct high-dimensional momentequations with partitioned group structure and then rely on estimated autocovarianceterms t ˆ σ p h q jklm : j, k P r p s , l P r d j s , m P r d k s , h ě u to formulate the block RMD estima-tion as introduced in Section 4.Step 3: We utilize t p ψ j p¨qu j Pr p s to recover functional sparse estimates from those block sparseestimates obtained in Step 2.We give some illustration on the rationality of our autocovariance-based procedure. Write Σ Xh p u, v q “ t Σ Xh,jk p u, v qu j,k Pr p s and Σ eh p u, v q “ t Σ eh,jk p u, v qu j,k Pr p s . In the first step, the classi-cal FPCA is implemented by the eigenanalysis of p Σ W ,jj for each j. However, such covariance-based estimation problem is insoluble in the sense that one cannot separate X tj p¨q from W tj p¨q due to Σ W ,jj “ Σ X ,jj ` Σ e ,jj and hence p Σ W ,jj is no longer a consistent estimator forΣ X ,jj . Inspired from Σ
Wh,jj “ Σ Xh,jj for any h ‰ , which automatically filters out the impactfrom e tj p¨q and hence guarantees that p Σ Wh,jj is a legitimate estimator for Σ
Xh,jj , our first step isdeveloped under an alternative data-driven basis expansion of X tj p¨q formed by performingeigenanalysis on a positive-definite operator defined based on p Σ Wh,jj for h ě . In the second7tep, the commonly adopted penalized least squares approach is based on the sample covari-ance among the estimated FPC scores t ˆ σ p q jklm : j, k P r p s , l P r d j s , m P r d k su , see e.g. Konget al. (2016). However, provided that σ p h q jklm “ x ψ jl , Σ Xh,jk p ψ km qy and ˆ σ p h q jklm “ x ˆ ψ jl , p Σ Wh,jk p ˆ ψ km qy ,such covariance-based penalized least squares approach is inappropriate due to the fact that p Σ W ,jk and ˆ σ p q jklm are not consistent estimators for Σ X ,jk and σ p q jklm , respectively. Motivatedfrom Σ Wh,jk “ Σ Xh,jk for any h ‰ σ p h q jklm is a legitimate estimator for σ p h q jklm , the moment equations based on t σ p h q jklm : j, k P r p s , l P r d j s , m P r d k s , h ě u can be wellapproximated by its empirical version relied on t ˆ σ p h q jklm u . We next illustrate the proposed three-step procedure using SFLR as an example. Con-sider high-dimensional SFLR in the form of Y t “ p ÿ j “ ż U X tj p u q β j p u q d u ` ε t , t P r n s , (4)where p -dimensional functional covariates t X t p¨qu t Pr n s satisfying model (1) are independent ofi.i.d. mean-zero random errors t ε t u t Pr n s , and t β j p¨qu j Pr p s are unknown functional coefficients.Given observations tp W t p¨q , Y t qu t Pr n s , our goal is to estimate β p¨q “ t β p¨q , . . . , β p p¨qu T . To guarantee a feasible solution under high-dimensional scaling, we assume that β p¨q isfunctional s -sparse, i.e. s components in t β j p¨qu j Pr p s are nonzero with s being much smallerthan p. We expand each X tj p¨q according to (3) truncated at d j and rewrite (4) as Y t “ p ÿ j “ η T tj b j ` r t ` ε t , where b j “ ş U ψ j p u q β j p u q d u P R d j and r t “ ř pj “ ř l “ d j ` η tjl x ψ jl , β j y is the truncationerror. Given some prescribed positive integer L , we choose t η p t ` h q k : h P r L s , k P r p su asvector-valued instrumental variables. Then b “ p b T , . . . , b T p q T P R ř pj “ d j can be identifiedby the following moment equations: E t η p t ` h q k ε t u “ g hk p b q ` R hk “ , k P r p s , h P r L s , (5)where g hk p b q “ E t η p t ` h q k Y t u´ ř pj “ E t η p t ` h q k η T tj b j u and the bias term R hk “ ´ E t η p t ` h q k r t u . Given t p η tj u t Pr n s ,j Pr p s and t p ψ j p¨qu j Pr p s obtained in the first step, for any b “ p b T , . . . , b T p q T P ř pj “ d j , we define p g hk p b q “ n ´ h n ´ h ÿ t “ p η p t ` h q k Y t ´ n ´ h n ´ h ÿ t “ p ÿ j “ p η p t ` h q k p η T tj b j , k P r p s , h P r L s , (6)which provides the empirical version of g hk p b q “ E t η p t ` h q k Y t u ´ ř pj “ E t η p t ` h q k η T tj b j u . Itfollows from (5) that p g hk p b q « , k P r p s , h P r L s . (7)Based on (7), applying the block RMD estimation introduced in Section 4 results in ablock sparse estimator p b “ p p b T , . . . , p b T p q T . Given that the recovery of functional sparsity in β p¨q is equivalent to estimating the block sparsity in b , we can estimate functional sparsecoefficients in the third step by ˆ β j p¨q “ p ψ j p¨q T p b j , j P r p s . (8) For each j P r p s , we assume that signal curves X tj p¨q admit the Karhunen-Lo`eve expansion X tj p¨q “ ř l “ ξ tjl ν jl p¨q , where ξ tjl “ x X tj , ν jl y corresponds to a sequence of random variableswith E p ξ tjl q “ p ξ tjl , ξ tjl q “ ω jl I p l “ l q . 
Here ω j ě ω j ě ¨ ¨ ¨ ě X ,jj and ν j p¨q , ν j p¨q , . . . are the corresponding orthonormal eigenfunctions satisfying ş U Σ X ,jj p u, v q ν jl p v q d v “ ω jl ν jl p u q for l ě . The commonly adopted FPCA is based on ap-plying Karhunen-Lo`eve expansion to observed curves t W tj p¨qu t Pr n s . However, this covariance-based dimension reduction approach is inappropriate under the error contamination modelin (1) as discussed in Section 2. Hall and Vial (2006) tackled such covariance-based problemunder the assumption that W j p¨q , . . . , W nj p¨q are independent and the noise e tj p¨q goes to 0as n grows to . Without requiring the restrictive ‘low noise’ and independence assumption, we followBathia et al. (2010) to implement an autocovariance-based dimension reduction approachfor observed curves t W tj p¨qu t Pr n s due to the fact Σ Wh,jj “ Σ Xh,jj for any h ‰ , which ensures9hat p Σ Wh,jj is a legitimate estimator for Σ
Xh,jj when h ‰
0. Specifically, we define a nonnegativeoperator K jj to pull together the autocovariance information at different lags: K jj p u, v q “ L ÿ h “ ż U Σ Xh,jj p u, z q Σ Xh,jj p v, z q d z “ L ÿ h “ ż U Σ Wh,jj p u, z q Σ Wh,jj p v, z q d z , (9)where L ą L in practice. It then follows from the infinite-dimensional analog of Proposition 1 inBathia et al. (2010) that, under regularity conditions, K jj has the spectral decomposition K jj p u, v q “ ř l “ λ jl ψ jl p u q ψ jl p v q with nonzero eigenvalues λ ě λ ě ¨ ¨ ¨ ą ψ j p¨q , ψ j p¨q , . . . such that the expansion in (3) holds.This expansion forms the foundation of autocovariance-based dimension reduction for error-contaminated functional time series and generalizes the finite-dimensional formulation inBathia et al. (2010) to the infinite-dimensional setting.With legitimate estimators p Σ Wh,jj for positive integer h in (2), a natural estimator for K jj in (9) can be obtained by p K jj p u, v q “ L ÿ h “ ż U p Σ Wh,jj p u, z q p Σ Wh,jj p v, z q d z “ p n ´ L q L ÿ h “ n ´ L ÿ t,s “ W tj p u q W sj p v qx W p t ` h q j , W p s ` h q j y . (10)Performing eigenanalysis on p K jj leads to the estimated eigenpairs tp ˆ λ jl , ˆ ψ jl qu l ě . The infinite series in the expansion in (3) are then truncated at d j , chosen data-adaptively.In practice, we only observe the erroneous versions t W tj p¨qu t Pr n s instead of the signal com-ponents t X tj p¨qu t Pr n s themselves, and the estimated basis coefficients are given by ˆ η tjl “x W tj , ˆ ψ jl y . As discussed in Section 2, the proposed second and third steps explicitly rely onthe sample autocovariance among estimated basis coefficients, t ˆ σ p h q jklm : j, k P r p s , l P r d j s , m Pr d k s , h P r L su , and the estimated eigenfunctions t ˆ ψ jl p¨q : j P r p s , l P r d j su , respectively,whose convergence properties in elementwise (cid:96) -norm under high-dimensional scaling areinvestigated in Section 3.2 below. 10 .2 Rates in elementwise (cid:96) -norm To characterize the effect of dependence on relevant estimated terms, we will use the func-tional stability measure of t W t p¨qu t P Z proposed in Guo and Qiao (2020). Condition 1.
For t W t p¨qu t P Z , the spectral density operator f Wθ “ p π q ´ ř h P Z Σ Wh e ´ ihθ for θ P r´ π, π s exists and the functional stability measure defined in (11) is finite, i.e. M W “ π ¨ ess sup θ Pr´ π,π s , Φ P H p x Φ , f Wθ p Φ qyx Φ , Σ W p Φ qy ă 8 , (11)where H p “ t Φ P H p : x Φ , Σ W p Φ qy P p , . Here M W in (11) is expressed proportional to functional Rayleigh quotients of f Wθ relativeto Σ W and hence it can more precisely capture the effect of eigenvalues of f Wθ relative to smalldecaying eigenvalues of Σ W , which is essential to handle truly infinite-dimensional functionalobjects t W tj p¨qu . We next define the functional stability measure of all k -dimensional subsetsof t W t p¨qu t P Z , i.e. tp W tj p¨q : j P J q T u t P Z for J Ă r p s with cardinality | J | ď k, by M Wk “ π ¨ ess sup θ Pr´ π,π s , } Φ } ď k, Φ P H p x Φ , f Wθ p Φ qyx Φ , Σ W p Φ qy , k P r p s . (12)Under Condition 1, it is easy to verify that M Wk ď M W ă 8 , which will be used in ournon-asymptotic analysis.Provided that our non-asymptotic results are developed using the infinite-dimensionalanalog of Hanson–Wright inequality (Rudelson and Vershynin, 2013) in a general Hilbertspace H , we need to specify the sub-Gaussian random variables therein. Definition 1.
Let Z t p¨q be a mean zero random variable in H for any fixed t and Σ : H Ñ H be a covariance operator. Then Z t p¨q is a sub-Gaussian process if there exists a constant c ą E p e x x,Z y q ď e c x x, Σ p x qy{ for all x P H . Condition 2. (i) t W t p¨qu t P Z is a sequence of multivariate functional linear processes withsub-Gaussian errors, namely sub-Gaussian functional linear processes, W t p¨q “ ř l “ B l p ε t ´ l q for any t P Z , where B l “ p B l,jk q p ˆ p with each B l,jk P H b H , ε t p¨q “ t ε t p¨q , . . . , ε tp p¨qu T P H p and the components in t ε t p¨qu t P Z are independent sub-Gaussian processes satisfying Defini-tion 1; (ii) The coefficient functions satisfy ř l “ } B l } “ O p q ; (iii) ω ε “ max j Pr p s ş U Σ ε ,jj p u, u q d u “ O p q , where Σ ε ,jj p u, u q “ Cov t ε tj p u q , ε tj p u qu .11he multivariate functional linear process can be seen as the generalization of functionallinear process (Bosq, 2000) to the multivariate setting as well as the extension of multivariatelinear process (Hamilton, 1994) to the functional domain. According to Fang et al. (2020),Condition 2(ii) ensures the stationarity of t W t p¨qu t P Z and, together with Condition 2(iii),implies that ω W “ max j Pr p s ş U Σ W ,jj p u, u q d u “ O p q , which is essential in subsequent analysis. Condition 3. (i) For each j P r p s , it holds that λ j ą λ j ą ¨ ¨ ¨ ą , and there exist somepositive constants c and α ą λ jl ´ λ j p l ` q ě c l ´ α ´ for l ě
1; (ii) For each j P r p s , the linear space spanned by t ν jl p¨qu l “ is the same as that spanned by t ψ jl p¨qu l “ . Condition 3(i) controls the lower bound of eigengaps with larger values of α yieldingtigher gaps and also implies that λ jl ě c α ´ l ´ α . See similar conditions in Hall and Horowitz(2007) and Kong et al. (2016). To simplify notation, we assume the same α across j, butthis condition can be relaxed by allowing α to depend on j and our theoretical results canbe generalized accordingly.We next establish the deviation bounds on estimated eigenpairs, tp ˆ λ jl , ˆ ψ jl qu , and thesample autocovariance among estimated basis coefficients, t ˆ σ p h q jklm u , in elementwise (cid:96) -norm,which play a crucial role in further convergence analysis under high-dimensional scaling. Theorem 1.
Let Conditions hold, d be a positive integer possibly depending on p n, p q . If n Á log p pd q , then there exist some positive constants c and c independent of p n, p, d q suchthat max j Pr p s ,l Pr d s " | ˆ λ jl ´ λ jl | ` ››› ˆ ψ jl ´ ψ jl l α ` ›››* À M W c log p pd q n (13) holds with probability greater than ´ c p pd q ´ c , where M W is defined in (12) . Theorem 2.
Let conditions in Theorem hold and h ě be fixed. If n Á d α ` p M W q log p pd q , then there exist some positive constants c and c independent of p n, p, d q such that max j,k Pr p s ,l,m Pr d s | ˆ σ p h q jklm ´ σ p h q jklm |p l _ m q α ` À M W c log p pd q n (14) holds with probability greater than ´ c p pd q ´ c , where M W is defined in (12) . Remark 1.
The parameter d in Theorems 1 and 2 can be understood as the truncated dimen-sion of infinite-dimensional functional objects under the expansion in (3). In general, d candepend on j, say d j , then the right-sides of (13) and (14) become M W n ´ { log { p ř pj “ d j q .12 Block RMD estimation framework
In this section, we present the proposed second step in a general block RMD estimationframework. Resulting from the dimension reduction step, the estimation of function-valuedparameters involved in sparse high-dimensional functional models can be transformed to theblock sparse estimation of some vector- or matrix-valued parameters, θ “ p θ T , . . . , θ T p q T P R ř pj “ d j ˆ ˜ d with each θ j P R d j ˆ ˜ d , under high-dimensional scaling. For SFLR with a scalarresponse, ˜ d “ . For FFLR and VFAR, ˜ d ě L and q “ pL target moment functions θ ÞÑ g i p θ q mapping θ P R ř pj “ d j ˆ ˜ d to g i p θ q P R d k ˆ ˜ d with i “ p h ´ q p ` k and k P r p s for h P r L s , where both p and q are large, we assume that θ can be identified by the followingmoment equations: g i p θ q ` R i “ , i P r q s , (15)where R i ’s are formed by autocovariance-based truncation errors due to the finite approxi-mation in the first step. We are interested in estimating block sparse θ based on empiricalmappings θ ÞÑ p g i p θ q of θ ÞÑ g i p θ q for i P r q s . See Sections 2 and 5 for detailed expressionsof g i p¨q and p g i p¨q in some exemplified models.We define the block RMD estimator p θ “ p p θ T , . . . , p θ T p q T P R ř pj “ d j ˆ ˜ d as a solution to thefollowing convex optimization problem: p θ “ arg min θ p ÿ j “ } θ j } F subject to max i Pr q s } p g i p θ q} F ď γ n , (16)where γ n ě d “ , the matrixFrobenius norm in (16) degenerates to the vector (cid:96) -norm. For FFLR and VFAR with ˜ d ą , the corresponding optimization tasks are formulated under the matrix Frobenius norm. Thegroup information is encoded in the objective function, which forces the elements of p θ j toeither all be zero or nonzero, thus producing the block sparsity in p θ . It is worth notingthat, without the bias terms R i ’s in (15), our proposed block RMD estimation frameworkcan be seen as a blockwise generalization of the RMD estimation (Belloni et al., 2018) by13eplacing | ¨ | with the } ¨ } F . To solve the large-scale convex optimization problem in (16),we use the R package CVXR (Fu et al., 2020), which is easy to implement and convergesfast. In Sections 5.1, 5.2 and 5.3, we will illustrate our proposed autocovariance-based blockRMD estimation framework using examples of SFLR, FFLR and VFAR, respectively, in thecontext of high-dimensional functional time series.
We begin with some notation that will be used in this section. For a block matrix B “p B ij q i Pr N s ,j Pr N s P R N m ˆ N m with the p i, j q -th block B ij P R m ˆ m , we define } B } p m ,m q max “ max i Pr N s ,j Pr N s } B ij } F . When N “ , we also define } B } p m ,m q “ ř N i “ } B i } F . To simplifynotation in this section and theoretical analysis in Section 5, we assume the same truncateddimension d j “ d across j P r p s , but our theoretical results extend naturally to the moregeneral setting where d j ’s are different.Let g p θ q “ t g p θ q T , . . . , g q p θ q T u T and R “ p R T , . . . , R T q q T P R qd ˆ ˜ d . We focus on the caseof which the moment function θ ÞÑ g p θ q mapping from R pd ˆ ˜ d to R qd ˆ ˜ d is linear with respectto θ in the form of g p θ q “ G θ ` g p q . This together with (15) implies that G θ ` g p q ` R “ , (17)the form of which can be easily verified for SFLR, FFLR and VFAR models we consider inthis paper. Now we reformulate the optimization task in (16) as p θ “ arg min θ } θ } p d, ˜ d q subject to } p g p θ q} p d, ˜ d q max ď γ n , (18)where p g p θ q “ p G θ ` p g p q is the empirical version of g p θ q . It is worth noting that θ is block s -sparse with support S “ t j P r p s : } θ j } F ‰ u and its cardinality s “ | S | . Before presenting properties of the block RMD estimator p θ , we impose some high-levelregularity conditions. Condition 4.
There exists (cid:15) n , δ n ą } p G ´ G } p d,d q max _ } p g p q ´ g p q} p d, ˜ d q max ď (cid:15) n withprobability at least 1 ´ δ n . Condition 5.
There exists (cid:15) ą } R } p d, ˜ d q max ď (cid:15) . ondition 6. There exists δ n ą } p g p θ q} p d, ˜ d q max ď γ n with probability at least1 ´ δ n . Conditions 4 and 5 together ensure that the empirical moment functions are nicely con-centrated around the target moment functions. Using our derived non-asymptotic results inSection 3.2, we can easily specify the concentration bounds in Condition 4 for SFLR, FFLRand VFAR. With further imposed smoothness conditions on coefficient functions, Condi-tion 5 can also be verified. Condition 6 indicates that θ is feasible in the optimizationproblem (18) with high probability, in which case a solution p θ of (18) exists and satisfies } p θ } p d, ˜ d q ď } θ } p d, ˜ d q . The non-block version of such property typically plays a crucial role totackle high-dimensional models in the literature.Let δ “ θ ´ θ . We define a block (cid:96) -sensitivity coefficient κ p θ q “ inf T : | T |ď s inf δ P C T : } δ } p d, ˜ d q ą } G δ } p d, ˜ d q max } δ } p d, ˜ d q , (19)where C T “ t δ P R pd ˆ ˜ d : } δ T c } p d, ˜ d q ď } δ T } p d, ˜ d q u for T Ă r p s . Provided that p δ “ p θ ´ θ P C S under Condition 6 as justified in Lemma 1 of the Supplementary Material, the lower boundof κ p θ q is useful to establish the error bound for } p δ } p d, ˜ d q . See also Belloni et al. (2018) andGautier and Rose (2019) for non-block l q -sensitivity quantities to handle high-dimensionalinstruments. We then need Condition 7 below to determine such lower bound. Beforepresenting this condition, we introduce some notation. Let J Ă r q s and M Ă r p s , let G J,M “ p G jk q j P J,k P M with each G jk P R d ˆ d be the block submatrix of G consisting of allblock rows j P J and all block columns k P M of G . For an integer m ě s, we define σ min p m, G q “ min | M |ď m max | J |ď m σ min p G J,M q and σ max p m, G q “ max | M |ď m max | J |ď m σ max p G J,M q , where σ min p G J,M q and σ max p G J,M q are the smallest and largest singular values of G J,M . Condition 7.
There exists an universal constant c ą µ ą σ max p m, G q ě c and σ min p m, G q{ σ max p m, G q ě µ for m “ s { µ . In Condition 7, the quantity µ serves as a key factor to determine the lower bound of κ p θ q , which is justified in Lemma 3 of the Supplementary Material. When µ is bounded15way from zero, we have a strongly-identified model. When µ Ñ , it corresponds to thescenario with weak instruments. See also Belloni et al. (2018) for similar conditions. We arenow ready to present the theorem on the convergence rate of p θ . Theorem 3.
Suppose that Conditions hold. If } θ } p d, ˜ d q ď K and the regularizationparameter γ n À p K ` q (cid:15) n ` (cid:15) , then with probability at least ´ p δ n ` δ n q , the block RMDestimator p θ satisfies } p θ ´ θ } p d, ˜ d q À sµ ´ tp K ` q (cid:15) n ` (cid:15) u . (20) Remark 2. (i) The error bound in (20) has the familiar variance-bias tradeoff as commonlyconsidered in nonparametrics statistics, suggesting us to carefully select the truncated di-mension d so as to balance variance and bias terms for the optimal estimation.(ii) With commonly imposed smoothness conditions on functional coefficients, it is easyto verify that K _ (cid:15) “ o p s q for SFLR, FFLR and VFAR in Section 5.(iii) For three examples we consider, G is formed by t σ p h q jklm : j, k P r p s , l, m P r d s , h P r L su with the components σ p h q jklm satisfying | σ p h q jklm | ď t E p η tjl qu { r E t η p t ` h q km us { “ λ { jl λ { km Ñ l, m Ñ 8 . Consider a general cross-covariance matrix G “ E p xy T q P R qd ˆ pd with entriesdecaying to zero as d Ñ 8 , where x “ p x , . . . , x qd q T with E p x q “ and y “ p y , . . . , y pd q T with E p y q “ , it is more sensible to impose Condition 7 on its normalized version r G “ D x GD y instead of G itself, where D x “ diag t Var p x q ´ { , . . . , Var p x qd q ´ { u and D y “ diag t Var p y q ´ { , . . . , Var p y pd q ´ { u . For three exemplified models, D x and D y are formed by t λ ´ { jl : j P r p s , l P r d su . Remark 2(iii) motivates us to present the following proposition that will be used in thetheoretical analysis of associate estimators for SFLR, FFLR and VFAR in Section 5.
Proposition 1.
Suppose that all conditions in Theorem hold except that Condition holdsfor r G , then with probability at least ´ p δ n ` δ n q , the block RMD estimator p θ satisfies } p θ ´ θ } p d, ˜ d q À sµ ´ } D x } max } D y } max tp K ` q (cid:15) n ` (cid:15) u . (21)16 Applications
In this section, we present the proposed autocovariance-based estimation procedure withcorresponding convergence analysis using applications of SFLR, FFLR and VFAR modelsunder high-dimensional scaling in Sections 5.1, 5.2 and 5.3, respectively.
Within the learning framework in Section 2, we first perform autocovariance-based dimensionreduction on t W tj p¨qu t Pr n s for each j P r p s . Following the optimization framework in (16), wethen develop the block RMD estimator p b as a solution to the constrained optimizationproblem below: p b “ arg min b p ÿ j “ } b j } subject to max k Pr p s ,h Pr L s } p g hk p b q} ď γ n , where γ n ě p g hk p b q is defined in (6). Finally, we obtainestimated functional coefficients t ˆ β j p¨qu j Pr p s as in (8).We next present the convergence analysis of t ˆ β j p¨qu j Pr p s . To simplify notation, we assumethe same truncated dimension d j “ d across j P r p s . We rewrite (5) in the form of (17),where g “ p g T , . . . , g T p , . . . , g T L , . . . , g T Lp q T , R “ p R T , . . . , R T p , . . . , R T L , . . . , R T Lp q T and G “p G ij q P R pLd ˆ pd whose p i, j q -th block is G ij “ E t η p t ` h q k η T tj u P R d ˆ d with i “ p h ´ q p ` k and k P r p s for h P r L s . Applying Theorem 2 and Proposition 3 in the Appendix on p G and p g p q , respectively, we can verify Condition 4 with the choice of (cid:15) n — M W,Y d α ` t log p pd q{ n u { , where M W,Y is specified in Proposition 3 in the Appendix. Before presenting the maintheorem, we list two regularity conditions.
Condition 8.
For each j P S “ t j P r p s : } β j } ‰ u , β j p¨q “ ř l “ a jl ψ jl p¨q and there existssome positive constant τ ą α ` { | a jl | À l ´ τ for l ě . Condition 9.
Let r G “ p r G ij q be the normalized version of G “ p G ij q by replacing each G ij with r G ij “ E t D k η p t ` h q k η T tj D j u , i “ p h ´ q p ` k, k P r p s for h P r L s and j P r p s , where D j “ diag p λ ´ { j , . . . , λ ´ { jd q . Then there exists an universal constant c and µ ą σ max p m, r G q ě c and σ min p m, r G q{ σ max p m, r G q ě µ for m “ s { µ . t β j p¨q : j P S u based on its expansion throughbasis t ψ jl p¨qu l ě . The parameter τ determines the decay rate of basis coefficients and hencecontrol the level of smoothness with large values yielding smoother functions in t β j p¨q : j P S u . See similar conditions in Hall and Horowitz (2007) and Kong et al. (2016). Noting thatcomponents of G decay to zero as d grows to infinity, we impose Condition 9 on r G , whichcan be viewed as the normalized counterpart of Condition 7 for SFLR.Applying Proposition 1 and Theorem 1 yields the convergence rate of the SFLR estimate p β p¨q “ t ˆ β p¨q , . . . , ˆ β p p¨qu T under functional (cid:96) norm in the following theorem. Theorem 4.
Suppose that Conditions and Condition in the Appendix hold forsub-Gaussian functional linear process t W t p¨qu and sub-Gaussian linear process t Y t u , andalso Conditions hold. If the regularization parameter γ n — s r d α ` M W,Y t log p pd q{ n u { ` d ´ τ ` { s , then the estimate p β p¨q satisfies p ÿ j “ } ˆ β j ´ β j } “ O p " µ ´ s ˆ d α ` M W,Y c log p pd q n ` d α ´ τ ` { ˙* . (22) Remark 3.
The rate of convergence in (22) is governed by both dimensionality parameters p n, p, s q and internal parameters p M W,Y , d, α, τ, µ q . Typically, the rate is better when τ, µ arelarge and M W,Y and α are small. To balance variance and bias terms in (22) for the optimalestimation, we can choose the truncated dimension d satisfying M W,Y log p pd q d τ ` α ` — n. Consider high-dimensional FFLR in the form of Y t p v q “ p ÿ j “ ż U X tj p u q β j p u, v q d u ` ε t p v q , t P r n s , v P V , (23)where t X t p¨qu t Pr n s satisfy model (1) and are independent of i.i.d. mean-zero functional errors t ε t p¨qu t Pr n s , and t β j p¨ , ¨qu j Pr p s are functional coefficients to be estimated. With observed data tp W t p u q , Y t p v qq : p u, v q P U ˆ V , t P r n su , we target to estimate β “ t β p¨ , ¨q , . . . , β p p¨ , ¨qu T under a functional sparsity constraint when p is large. Specifically, we assume β is functional s -sparse with support S “ t j P r p s : } β j } S ‰ u and cardinality s “ | S | ! p. Provided that each observed Y t p¨q is decomposed into the sum of dynamic and white noisecomponents in (23), we approximate Y t p¨q under the Karhunen–Lo`eve expansion truncated18t ˜ d, i.e. Y t p¨q « ζ T t φ p¨q , where ζ t “ p ζ t , . . . , ζ t ˜ d q T and φ “ p φ , . . . , φ ˜ d q T . Note that we canrelax the independence assumption for t ε t p¨qu t Pr n s and model observed response curves via r Y t p¨q “ Y t p¨q ` e Yt p¨q , where Y t p¨q and e Yt p¨q correspond to the dynamic signal and white noiseelements, respectively. Then Y t p¨q can be approximated under the autocovariance-basedexpansion in the sense of (3) and our subsequent analysis still follow.For each j P r p s , we expand X tj p¨q according to (3) truncated at d j . Some specific calcu-lations lead to the representation of (23) as ζ T t “ p ÿ j “ η T tj B j ` r T t ` ε T t , (24)where B j “ ş U ˆ V ψ j p u q β j p u, v q φ p v q T d u d v P R d j ˆ ˜ d and r t “ p r t , . . . , r t ˜ d q T is the truncationerror with each r tm “ ř pj “ ř l “ d j ` η tjl xx ψ jl , β j y , φ m y for m P r ˜ d s . Let B “ p B T , . . . , B T p q T P R ř pj “ d j ˆ ˜ d . We choose t η p t ` h q k : h P r L s , k P r p su as vector-valued instrumental variables,which are assumed to be uncorrelated with the random error ε t in (24). Within the frame-work of (15), we assume that B is the unique solution to the following moment equations: “ E t η p t ` h q k ε T t u “ g hk p B q ` R hk , h P r L s , k P r p s , (25)where g hk p B q “ E t η p t ` h q k ζ T t u ´ ř pj “ E t η p t ` h q k η T tj B j u and R hk “ ´ E t η p t ` h q k r T t u . Given the recovery equivalence between functional sparsity in β and the block sparsity in B , we aim to estimate the block sparse matrix B using the empirical versions B ÞÑ p g hk p B q for h P r L s and k P r p s , p g hk p B q “ n ´ h n ´ h ÿ t “ p η p t ` h q k p ζ T t ´ n ´ h n ´ h ÿ t “ p ÿ j “ p η p t ` h q k p η T tj B j , where p ζ t “ p ˆ ζ t , . . . , ˆ ζ t ˜ d q T with ˆ ζ tm “ x Y t , ˆ φ m y for m P r ˜ d s and t p η tj u t Pr n s ,j Pr p s are obtained inthe first step. In the second step, according to (16), we formulate the block RMD estimator p B by solving the convex optimization problem below: p B “ arg min B p ÿ j “ } B j } F subject to max k Pr p s ,h Pr L s } p g hk p B q} F ď γ n , where γ n ě β j p u, v q “ p ψ j p u q T p B j p φ p v q , p u, v q P U ˆ V , j P r p s , (26)19here t p ψ j p u qu j Pr p s and p φ p v q “ p ˆ φ p v q , . . . 
, ˆ φ ˜ d p v qq T are obtained in the first step.In the following, we investigate the convergence property of t ˆ β j p¨ , ¨qu j Pr p s in (26). Tosimplify notation, we assume the same truncated dimension d j “ d across j P r p s . We firstrewrite (25) in the form of (17) and apply Theorem 2 and Proposition 2 in the Appendix on p G and p g p q to verify Condition 4 with the choice of (cid:15) n — M W,Y d α _ ˜ α ` t log p pd q{ n u { , where M W,Y is specified in Proposition 2 in the Appendix. In a similar fashion to α, the parameter˜ α as specified in Condition 14 in the Appendix determines the tightness of eigengaps ofthe covariance function of t Y t p¨qu . We then impose the following smoothness condition onnonzero coefficient functions.
Condition 10.
For each j P S, β j p u, v q “ ř l,m “ a jlm ψ jl p u q φ m p v q and there exists somepositive constant τ ą α _ ˜ α ` { | a jlm | À p l ` m q ´ τ ´ { for l, m ě . We are now ready to present the convergence rate of the FFLR estimate p β p¨ , ¨q “t ˆ β p¨ , ¨q , . . . , ˆ β p p¨ , ¨qu T under functional (cid:96) norm in Theorem 5. Theorem 5.
Suppose that Conditions and Conditions in the Appendix hold forsub-Gaussian functional linear processes t W t p¨qu and t Y t p¨qu , and also Conditions hold.Let d — ˜ d. If the regularization parameter γ n — s r d α _ ˜ α ` M W,Y t log p pd q{ n u { ` d ´ τ ` { s , thenthe estimate p β p¨ , ¨q satisfies p ÿ j “ } ˆ β j ´ β j } S “ O p " µ ´ s ˆ d α ` α _ ˜ α ` M W,Y c log p pd q n ` d α ´ τ ` { ˙* . (27) Remark 4.
With the same expression of G for both SFLR and FFLR, Condition 9 isrequired in both Theorems 4 and 5. Note we can further remove the assumption of d — ˜ d, and establish the general convergence result in terms of d, ˜ d and other parameters. The high-dimensional VFAR of a fixed lag order H, namely VFAR( H ), takes the form of X t p v q “ H ÿ h “ ż U A p h q p u, v q X t ´ h p u q d u ` ε t p v q , t “ H ` , . . . , n , (28)where t X t p¨qu satisfy model (1), the errors ε t “ p ε t , . . . , ε tp q T are i.i.d. sampled from a p -dimensional vector of mean-zero functional processes, independent of X t ´ p¨q , X t ´ p¨q , . . . , A p h q “ t A p h q ,jj p¨ , ¨qu j,j Pr p s is the unknown functional transition matrix at lag h . To makea feasible fit to (28) under a high-dimensional regime based on observed curves t W t p¨qu t Pr n s , we assume t A p h q u h Pr H s is rowwise functional s -sparse with s “ max j Pr p s s j ! p. To bespecific, for the j -th row of components in t A p h q u , we denote the set of nonzero functionsby S j “ tp j , h q P r p s ˆ r H s : } A p h q ,jj } S ‰ u and its cardinality by s j “ | S j | for j P r p s . For each j P r p s , we approximate X tj p¨q based on the expansion in (3) truncated at d j . With some specific calculations, model (28) can be rowwisely rewritten as η T tj “ H ÿ h “ p ÿ j “ η T p t ´ h q j Ω p h q ,jj ` r T tj ` ε T tj , j P r p s , (29)where Ω p h q ,jj “ ş U ψ j p u q A p h q ,jj p u, v q ψ j p v q T d u d v P R d j ˆ d j and r tj “ p r tj , . . . , r tjd j q T isthe truncation error with each r tjm “ ř Hh “ ř pj “ ř l “ d j ` η p t ´ h q j l xx ψ j l , A p h q ,jj y , ψ jm y for m P r d j s . Let Ω j “ tp Ω p q ,j q T , . . . , p Ω p q ,jp q T , . . . , p Ω p H q ,j q T , . . . , p Ω p H q ,jp q T qu T P R H ř pj d j ˆ d j . Wechoose (cid:32) η p t ` h q k : h P r L s , k P r p s ( as vector-valued instrumental variables, which are as-sumed to be uncorrelated with the random error ε tj in (29). Within the framework of (15),we assume that Ω j is the unique solution to the following moment equations: “ E t η p t ` h q k ε T tj u “ g j,hk p Ω j q ` R j,hk , h P r L s , k P r p s , (30)where g j,hk p Ω j q “ E t η p t ` h q k η T tj u´ ř Hh “ ř pj “ E t η p t ` h q k η T p t ´ h q j Ω p h q ,jj u and R j,hk “ ´ E t η p t ` h q k r T tj u . Given that estimating the functional sparsity in the j -th row of t A p h q u h Pr H s is equivalentto estimating the block sparsity in Ω j for each j, our goal is to estimate the block sparsematrix Ω j using the empirical versions Ω j ÞÑ p g j,hk p Ω j q for h P r L s and k P r p s , where p g j,hk p Ω j q “ n ´ h n ´ h ÿ t “ p η p t ` h q k p η T tj ´ n ´ h n ´ h ÿ t “ H ÿ h “ p ÿ j “ p η p t ` h q k p η T p t ´ h q j Ω p h q jj and t p η tj u t Pr n s ,j Pr p s are obtained in the first step. The second step follows (16) to formulatethe block RMD estimator p Ω j by solving the following optimization task: p Ω j “ arg min Ω j H ÿ h “ p ÿ j “ } Ω p h q jj } F subject to max k Pr p s ,h Pr L s } p g j,hk p Ω j q} F ď γ nj , where γ nj ě A p h q jj p u, v q “ p ψ j p u q T p Ω p h q jj p ψ j p v q , p u, v q P U , j, j P r p s , h P r H s , t p ψ j p¨qu j Pr p s are obtained in the first step.We next present convergence analysis of t ˆ A p h q jj p¨ , ¨q : j, j P r p s , h P r H su . To simplifynotation, we assume the same truncated dimension d j “ d across j P r p s . For each j P r p s , we first express (30) in the form of g j p Ω j q ` R j “ G j Ω j ` g j p q ` R j “ , where g j “ p g T j, , . . . 
, g T j, p , . . . , g T j,L , . . . , g T j,Lp q T , R j “ p R T j, , . . . , R T j, p , . . . , R T j,L , . . . , R T j,Lp q T and G j “ p G j,ii q P R pLd ˆ pHd whose p i, i q -th block is G j,ii “ E t η p t ` h q k η T p t ´ h q j u P R d ˆ d with i “ p h ´ q p ` k, k P r p s for h P r L s and i “ p h ´ q p ` j , j P r p s for h P r H s . Ap-plying Theorem 2 on p G j and p g j p q , we can verify Condition 4 with the choice of (cid:15) n — M W d α ` t log p pd q{ n u { . Similarly, we then give two regularity conditions.
Condition 11.
For each j P r p s and p j , h q P S j , A p h q ,jj p u, v q “ ř l,m “ a p h q jj lm ψ j m p u q ψ jl p v q and there exists some constant τ ą α ` { | a p h q jj lm | À p l ` m q ´ τ ´ { for l, m ě . Condition 12.
For each j P r p s , let r G j “ p r G j,ii q be the normalized version of G j “ p G j,ii q by replacing each G j,ii with r G j,ii “ E t D k η p t ` h q k η T p t ´ h q j D j u for i “ p h ´ q p ` k and i “ p h ´ q p ` j with k, j P r p s , h P r L s and h P r H s , where D j “ diag p λ ´ { j , . . . , λ ´ { jd q . Then there exists an universal constant ˜ c j and µ j ą σ max p m, r G j q ě ˜ c j and σ min p m, r G j q{ σ max p m, r G j q ě µ j for m “ s j { µ j . We finally establish convergence rate of the VFAR estimate t ˆ A p h q jj u j,j Pr p s ,h Pr H s in the senseof functional matrix (cid:96) norm as follows. Theorem 6.
Suppose that Conditions hold for sub-Gaussian functional linear pro-cess t W t p¨qu , and Conditions also hold. If regularization parameters satisfy γ nj — s j r d α ` M W t log p pd q{ n u { ` d ´ τ ` { s for j P r p s and µ “ min j Pr p s µ j , the estimate t ˆ A p h q jj u satisfies max j Pr p s p ÿ j “ H ÿ h “ } ˆ A p h q jj ´ A p h q ,jj } S “ O p " µ ´ s ˆ d α ` M W c log p pd q n ` d α ´ τ ` { ˙* . (31)22 Empirical studies
In this section, we conduct a number of simulations to evaluate the finite-sample performanceof the proposed autocovariance-based estimators for SFLR, FFLR and VFAR models.In each simulated scenario, to mimic the infinite-dimensional nature of signal curves,we generate X tj p u q “ ř l “ η tjl ψ l p u q “ η T tj ψ p u q with η tj “ p η tj , . . . , η tj q T and ψ p¨q “t ψ p¨q , . . . , ψ p¨qu T for t P r n s , j P r p s and u P U “ r , s , where t ψ l p u qu ď l ď is formed by25-dimensional Fourier basis functions, 1 , ? p πlu q , ? p πlu q for l “ , . . . ,
12 andeach η t “ p η T t , . . . , η T tp q T P R p is generated from a stationary vector autoregressive (VAR)model, η t “ Ω η t ´ ` (cid:15) t , with block transition matrix Ω “ p Ω jk q j,k Pr p s P R p ˆ p and (cid:15) t “p (cid:15) t , . . . , (cid:15) t q T , whose components are independently sampled according to (cid:15) tj „ N p , . ´ . j q for j “ , . . . , N p , j ´ q for j “ , . . . , . Therefore, X t p¨q follows a VFAR(1)model satisfying X t p v q “ ş U A p u, v q X t ´ p u q d u ` ε t p v q , where ε tj p v q “ ψ p v q T (cid:15) tj and autoco-efficient functions satisfy A jk p u, v q “ ψ p v q T Ω jk ψ p u q for j, k P r p s and u, v P U . In our simula-tions, we generate n “ , ,
400 serially dependent observations of p “ ,
80 functionalvariables. The observed curves are generated from W tj p u q “ X tj p u q ` e tj p u q , where whitenoise curves e tj p u q “ ř l “ z tjl ψ l p u q and tp z tj , . . . , z tj q T u t Pr n s are independently sampled frommultivariate normal distribution with mean zero and covariance diag p , . , . , . , . q . Foreach of the three models, the data is generated as follows.
VFAR : We generate block sparse Ω with 5% or 10% nonzero blocks for p “
80 or p “ , respectively. Specifically, for the j -th block row, we set the diagonal block Ω jj “ diag p . , . , . , . , . , ´ , . . . , ´ q and randomly choose one off-diagonal block being0 . Ω jj and two off-diagonal blocks being 0 . Ω jj . Such block sparse design on Ω can guaranteethe stationarity of the VFAR(1) process. It is worth noting that estimating VFAR(1) resultsin a very high-dimensional task, since, e.g. even under the most ‘low-dimensional’ settingwith p “ , n “
400 and truncated dimension d “
3, one needs to estimate 40 ˆ “ ,
400 parameters based on only 400 observations. The p -dimensional functional covariates t X t p¨qu t Pr n s for SFLR and FFLR below are generated in the same way as those for VFAR. SFLR : We generate the scalar responses t Y t u t Pr n s from model (4), where ε t ’s are inde-23endent N p , q variables. For each j P S “ t , . . . , u , we generate β j p u q “ ř l “ b jl ψ l p u q for u P U , where b j , b j , b j are sampled from the uniform distribution with support r´ , ´ . sYr . , s and b jl “ p´ q l l ´ for l “ , . . . , . For j P r p sz S, we let β j p u q “ . FFLR : We generate the functional responses t Y t p v q : v P V u t Pr n s with V “ r , s frommodel (23), where ε t p v q “ ř m “ g tm ψ m p v q with g tm ’s being independent N p , q variables.For j P S, we generate β j p u, v q “ ř l,m “ b jml ψ l p u q ψ m p v q for p u, v q P U ˆ V , where componentsin t b jlm u ď l,m ď are sampled from the uniform distribution with support r´ , ´ . s Y r . , s and b jlm “ p´ q l ` m p l ` m q ´ for l or m “ , . . . , . For j P r p sz S, we let β j p u, v q “ L and d j ’s. As our simulated results suggest that the estimators are not sensitiveto the choice of L , we set L “ d j , we take the standard approachby selecting the largest d j eigenvalues of p K jj in (10) such that the cumulative percentageof selected eigenvalues exceeds 90%. To choose the regularization parameter(s) for eachmodel and comparison method, there are several possible methods one could adopt suchas AIC, BIC and cross-validation. The BIC and AIC methods require the calculation ofthe effective degrees of freedom, which leads to a very challenging task given the high-dimensional, functional and dependent nature of the model structure and hence is left forfuture research. In our simulations, we generate a training sample of size n and a separatevalidation sample of the same size. Using the training data, we compute a series of estimatorswith 30 different values of the regularization parameters, i.e. t p b p γ n q j u j Pr p s (or t p B p γ n q j u j Pr p s ) asa function of γ n for SFLR (or FFLR) and t p Ω p γ nj q jk u k Pr p s as a function of γ nj for VFAR,calculate the squared error between observed and fitted values on the validation set, i.e. ř nt “ r Y t ´ ř pj “ t p b p γ n q j u T p η tj s for SFLR, ř nt “ } p ζ t ´ ř pj “ p p B p γ n q j q T p η tj } for FFLR and ř nt “ } p η tj ´ ř pk “ p p Ω p γ nj q jk q T p η p t ´ q k } for VFAR, and choose the one with the smallest error.We compare AUTO with the standard covariance-based estimation framework (COV),which proceeds in the following three steps. The first step performs FPCA on t W tj p¨qu t Pr n s for each j, where the truncated dimension was selected in a similar way as d j . Therefore,estimating SFLR and FFLR models are transformed into fitting multiple linear regressionswith univariate response (Kong et al., 2016) and multivariate response (Fang et al., 2020),respectively and the VFAR estimation is converted to the VAR estimation (Guo and Qiao,24020). The second step considers minimizing the covariance-based criterion, essentially theleast squares with the addition of a group lasso type penalty. Such criterion can be optimizedusing an efficient block version of fast iterative shrinkage-thresholding algorithm developed inGuo and Qiao (2020), which converges faster than the commonly adopted block coordinatedescent algorithm (Fan et al., 2015). 
The third step recovers functional sparse estimatesusing estimated eigenfunctions.We examine the performance of COV and AUTO for three models in terms of the relativeestimation accuracy, i.e. } p A ´ A } F {} A } F for VFAR, p ř pj “ } ˆ β j ´ β j } q { {p ř pj “ } β j } q { forSFLR and p ř pj “ } ˆ β j ´ β j } S q { {p ř pj “ } β j } S q { for FFLR. We ran each simulation 100 times.Figure 1 displays boxplots of relative estimation errors for three models, while Table 2 in theSupplementary Material gives numerical summaries. Several conclusions can be drawn fromFigure 1. First, AUTO significantly outperforms COV for three models under all scenarioswe consider. Second, as discussed in Section 2, AUTO provides consistent estimates, whilethe consistency of COV estimates is jeopardized by the white noise contamination. This canbe demonstrated by our empirical results that AUTO provides more substantially improvedestimates over COV as n increases from 100 to 400 especially for SFLR and FFLR. Third, theperformance of AUTO slightly deteriorates as p increases from 40 to 80, providing empiricalevidence to support that the rates in (22), (27) and (31) for SFLR, FFLR and VFAR models,respectively, all depend on the p log p q { term. We further illustrate our developed methodology using a public financial dataset, which wasobtained from the WRDS database and consists of high-frequency observations of pricesfor S&P 100 index and component stocks (list available in Table 3 of the SupplementaryMaterial, we removed several stocks for which the data were not available so that p “ r , s . We constructcumulative intraday return (CIDR) trajectories (Horv´ath et al., 2014), in percentage, by W tj p u k q “ r log t P tj p u k qu ´ log t P tj p u qus , where P tj p u k q p t P r n s , j P r p s , k P r N sq denotesthe price of the j -th stock at the k -th minute after the opening time on the t -th trading day.25 . . . . . . p=40 p=40 p=40 p=80 p=80 p=80n=100 n=200 n=400 n=100 n=200 n=400 COVAUTO (a) Relative estimation errors for VFAR . . . . . . p=40 p=40 p=40 p=80 p=80 p=80n=100 n=200 n=400 n=100 n=200 n=400 COVAUTO (b) Relative estimation errors for SFLR . . . . p=40 p=40 p=40 p=80 p=80 p=80n=100 n=200 n=400 n=100 n=200 n=400 COVAUTO (c) Relative estimation errors for FFLR Figure 1: The boxplots of relative estimation errors for (a) VFAR, (b) SFLR and (c) FFLR.26uch CIDR curves always start from zero and have nearly the same shape as the originalprice curves, but make the stationarity assumption more plausible.Our interest is in predicting the intraday return of the S&P 100 index based on observedCIDR trajectories of component stocks, W tj p u q , u P U “ r , N s up to time N, where, e.g. N “
Our interest is in predicting the intraday return of the S&P 100 index based on the observed CIDR trajectories of the component stocks, $W_{tj}(u)$, $u \in \mathcal{U} = [0, N]$, up to time $N$, where, e.g., $N = 360$ corresponds to 30 minutes prior to the closing time of the trading day. With this in mind, we construct a sparse SFLR model with erroneous functional predictors as follows:
$$Y_t = \sum_{j=1}^{p} \int_{\mathcal{U}} X_{tj}(u)\,\beta_j(u)\,\mathrm{d}u + \varepsilon_t, \qquad W_{tj}(u) = X_{tj}(u) + e_{tj}(u), \qquad t \in [n],\ j \in [p], \quad (32)$$
where $Y_t$ is the intraday return of the S&P 100 index on the $t$-th trading day, and $X_{tj}(\cdot)$ and $e_{tj}(\cdot)$ represent the signal and noise components of $W_{tj}(\cdot)$, respectively. We split the whole dataset into three subsets: training, validation and test sets consisting of the first 171, the subsequent 40 and the last 40 observations, respectively. We apply the validation-set approach to select the regularization parameters for AUTO and COV, based on which we estimate the sparse functional coefficients in (32) and calculate the mean squared prediction errors (MSPEs) on the test set. For comparison, we also implement the autocovariance-based generalized method-of-moments (AGMM) (Chen et al., 2020) and the covariance-based least squares method (CLS) (Hall and Horowitz, 2007) to fit the univariate version of (32) for each component stock, among which we choose the best models, i.e. those leading to the lowest test MSPEs. Finally, we include the null model (Mean), which uses the mean of the training responses to predict the test responses.

The resulting test MSPEs for different values of $N$ and all comparison approaches are presented in Table 1. We observe a few apparent patterns. First, in all scenarios we consider, AUTO provides the best predictive performance, and the autocovariance-based methods are superior to their covariance-based counterparts. Second, the predictive accuracy of the functional-regression-type methods improves as $N$ approaches 390, since more recent information enters the predictors. Third, AUTO and COV significantly outperform AGMM and CLS, while Mean gives the worst results. This indicates that using multiple selected functional predictors from the trading histories indeed improves the prediction results.

Table 1: Test MSPEs up to different current times $N$.

Method   u ≤ 330   u ≤ 340   u ≤ 350   u ≤ 360   u ≤ 370   u ≤ 380   u ≤ 390
COV       5.487     5.360     5.222     5.090     4.976     4.927     4.882
AGMM      6.506     6.470     6.454     6.441     6.408     6.385     6.364
CLS       6.859     6.798     6.730     6.655     6.583     6.546     6.507
Mean      8.832     8.832     8.832     8.832     8.832     8.832     8.832
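For completeness, the out-of-sample protocol above (chronological split, test MSPE, and the Mean benchmark) can be sketched as follows; the array names and the predictor `predict_auto` are hypothetical placeholders, not the authors' code.

```python
import numpy as np

def mspe(y_true, y_pred):
    """Mean squared prediction error on the test set."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean((y_true - y_pred) ** 2)

# Chronological split: first 171 days for training, next 40 for
# validation (tuning), last 40 for testing.
# y: vector of daily index returns (hypothetical).
# train, val, test = y[:171], y[171:211], y[211:]

# Null model (Mean): predict every test response by the training mean.
# mspe_null = mspe(test, np.full_like(test, train.mean()))

# Fitted model (hypothetical): mspe_auto = mspe(test, predict_auto(W_test))
```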
A Further non-asymptotic results
To provide theoretical support for the proposed estimators in Sections 5.1 and 5.2, we present essential non-asymptotic results for the relevant estimated cross-(auto)covariance terms, based on the functional cross-spectral stability measure (Fang et al., 2020) between $\{W_t(\cdot)\}_{t\in\mathbb{Z}}$ and a $\tilde{p}$-dimensional mean-zero functional time series $\{Y_t(\cdot)\}_{t\in\mathbb{Z}}$ (or vector time series $\{Z_t\}_{t\in\mathbb{Z}}$). Define $\Sigma_h^{W,Y}(u,v) = \mathrm{Cov}\{W_t(u), Y_{t+h}(v)\}$ and $\Sigma_h^{W,Z}(u) = \mathrm{Cov}\{W_t(u), Z_{t+h}\}$ for $h \in \mathbb{Z}$ and $(u,v) \in \mathcal{U} \times \mathcal{V}$.

Condition 13. (i) For $\{W_t(\cdot)\}_{t\in\mathbb{Z}}$ and $\{Y_t(\cdot)\}_{t\in\mathbb{Z}}$, the cross-spectral density function $f_\theta^{W,Y} = (2\pi)^{-1}\sum_{h\in\mathbb{Z}}\Sigma_h^{W,Y} e^{-\mathrm{i}h\theta}$ for $\theta \in [-\pi,\pi]$ exists and the functional cross-spectral stability measure defined in (A.1) is finite, i.e.
$$\mathcal{M}^{W,Y} = 2\pi \cdot \operatorname*{ess\,sup}_{\theta\in[-\pi,\pi],\, \Phi_1\in\mathbb{H}_0^{p},\, \Phi_2\in\mathbb{H}_0^{\tilde{p}}} \frac{|\langle \Phi_1, f_\theta^{W,Y}(\Phi_2)\rangle|}{\sqrt{\langle \Phi_1, \Sigma_0^{W}(\Phi_1)\rangle}\,\sqrt{\langle \Phi_2, \Sigma_0^{Y}(\Phi_2)\rangle}} < \infty, \quad (\mathrm{A.1})$$
where $\mathbb{H}_0^{p} = \{\Phi \in \mathbb{H}^{p} : \langle \Phi, \Sigma_0^{W}(\Phi)\rangle \in (0,\infty)\}$ and $\mathbb{H}_0^{\tilde{p}} = \{\Phi \in \mathbb{H}^{\tilde{p}} : \langle \Phi, \Sigma_0^{Y}(\Phi)\rangle \in (0,\infty)\}$. (ii) For $\{W_t(\cdot)\}_{t\in\mathbb{Z}}$ and $\{Z_t\}_{t\in\mathbb{Z}}$, the cross-spectral density function $f_\theta^{W,Z} = (2\pi)^{-1}\sum_{h\in\mathbb{Z}}\Sigma_h^{W,Z} e^{-\mathrm{i}h\theta}$ for $\theta \in [-\pi,\pi]$ exists and the functional cross-spectral stability measure defined in (A.2) is finite, i.e.
$$\mathcal{M}^{W,Z} = 2\pi \cdot \operatorname*{ess\,sup}_{\theta\in[-\pi,\pi],\, \Phi\in\mathbb{H}_0^{p},\, v\in\mathbb{R}_0^{\tilde{p}}} \frac{|\langle \Phi, f_\theta^{W,Z} v\rangle|}{\sqrt{\langle \Phi, \Sigma_0^{W}(\Phi)\rangle}\,\sqrt{v^{\mathrm{T}}\Sigma_0^{Z} v}} < \infty, \quad (\mathrm{A.2})$$
where $\mathbb{R}_0^{\tilde{p}} = \{v \in \mathbb{R}^{\tilde{p}} : v^{\mathrm{T}}\Sigma_0^{Z} v \in (0,\infty)\}$.

In analogy to (12), we can define the functional cross-spectral stability measure of all $k_1$-dimensional subsets of $\{W_t(\cdot)\}$ and $k_2$-dimensional subsets of $\{Y_t(\cdot)\}$ (or $\{Z_t\}$) as $\mathcal{M}_{k_1,k_2}^{W,Y}$ (or $\mathcal{M}_{k_1,k_2}^{W,Z}$). It is easy to verify that $\mathcal{M}_{k_1,k_2}^{W,Y} \le \mathcal{M}^{W,Y} < \infty$ (or $\mathcal{M}_{k_1,k_2}^{W,Z} \le \mathcal{M}^{W,Z} < \infty$) for $k_1 \in [p]$ and $k_2 \in [\tilde{p}]$; see Fang et al. (2020) for further discussion. For the time series $\{Z_t\}$, the non-functional stability measure reduces to
$$\mathcal{M}^{Z} = 2\pi \cdot \operatorname*{ess\,sup}_{\theta\in[-\pi,\pi],\, v\in\mathbb{R}_0^{\tilde{p}}} \frac{v^{\mathrm{T}} f_\theta^{Z} v}{v^{\mathrm{T}}\Sigma_0^{Z} v},$$
and the stability measure of all $k$-dimensional subsets of $\{Z_t\}$, i.e. $\mathcal{M}_k^{Z}$ for $k \in [\tilde{p}]$, can be defined analogously to (12). For instance, if $\{Z_t\}$ is white noise, then $f_\theta^{Z} = (2\pi)^{-1}\Sigma_0^{Z}$ and hence $\mathcal{M}^{Z} = 1$.

For each $k \in [\tilde{p}]$, we represent $Y_{tk}(\cdot) = \sum_{m=1}^{\infty}\zeta_{tkm}\phi_{km}(\cdot)$ via the Karhunen-Loève expansion, where $\zeta_{tkm} = \langle Y_{tk}, \phi_{km}\rangle$ and $\{(\theta_{km}, \phi_{km})\}_{m\ge 1}$ are the pairs of eigenvalues and eigenfunctions of $\Sigma_0^{Y,kk}$. Let $\{(\hat{\theta}_{km}, \hat{\phi}_{km})\}_{m\ge 1}$ be the estimated eigenpairs of $\hat{\Sigma}_0^{Y,kk}$ and $\hat{\zeta}_{tkm} = \langle Y_{tk}, \hat{\phi}_{km}\rangle$. We next impose a condition on the eigenvalues $\{\theta_{km}\}_{m\ge 1}$ and then develop a deviation bound in elementwise $\ell_\infty$-norm on how $\hat{\sigma}_{h,jklm}^{W,Y} = (n-h)^{-1}\sum_{t=1}^{n-h}\hat{\eta}_{tjl}\hat{\zeta}_{(t+h)km}$ concentrates around $\sigma_{h,jklm}^{W,Y} = \mathrm{Cov}(\eta_{tjl}, \zeta_{(t+h)km})$; this bound plays a crucial role in investigating the convergence property of the FFLR estimate in Section 5.2.
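Before stating that condition, note that $\hat{\sigma}_{h,jklm}^{W,Y}$ is just a lag-$h$ sample cross-covariance of two score arrays, so it can be computed in one vectorized step. A minimal sketch under assumed array conventions, not tied to the paper's code:

```python
import numpy as np

def lagged_cross_cov(eta, zeta, h):
    """Sample lag-h cross-covariances between two score arrays.

    eta  : array of shape (n, p, d), entries eta[t, j, l]  (estimated scores of W)
    zeta : array of shape (n, q, e), entries zeta[t, k, m] (estimated scores of Y)
    Returns sigma[j, k, l, m] = (n - h)^{-1} * sum_t eta[t, j, l] * zeta[t + h, k, m].
    """
    n = eta.shape[0]
    return np.einsum('tjl,tkm->jklm', eta[:n - h], zeta[h:]) / (n - h)
```

The deviation bound in Proposition 2 below controls the elementwise $\ell_\infty$ distance between this array and its population counterpart.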
Condition 14. (i) For each $k \in [\tilde{p}]$, $\theta_{k1} > \theta_{k2} > \cdots > 0$, and there exist some positive constants $\tilde{c}$ and $\tilde{\alpha} > 1$ such that $\theta_{km} - \theta_{k(m+1)} \ge \tilde{c}\, m^{-\tilde{\alpha}-1}$ for $m \ge 1$; (ii) $\max_{k\in[\tilde{p}]}\sum_{m=1}^{\infty}\theta_{km} = O(1)$.
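As a sanity check of Condition 14 (our own illustration, not taken from the paper), the polynomially decaying eigenvalues $\theta_{km} = m^{-2}$ satisfy both parts with $\tilde{\alpha} = 2$ and $\tilde{c} = 1/2$:
$$\theta_{km} - \theta_{k(m+1)} = \frac{(m+1)^2 - m^2}{m^2(m+1)^2} = \frac{2m+1}{m^2(m+1)^2} \ \ge\ \frac{1}{2}\, m^{-3}, \qquad \sum_{m=1}^{\infty} m^{-2} = \frac{\pi^2}{6} < \infty.$$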
Proposition 2. Suppose that Conditions 13 and 14 hold for the sub-Gaussian functional linear processes $\{W_t(\cdot)\}$ and $\{Y_t(\cdot)\}$, and that $h$ is fixed. Let $d$ and $\tilde{d}$ be positive integers possibly depending on $(n, p, \tilde{p})$, and let $\mathcal{M}_{W,Y} = \mathcal{M}^{W} + \mathcal{M}^{Y} + \mathcal{M}^{W,Y}$. If $n \gtrsim (d^{2\alpha+2} \vee \tilde{d}^{2\tilde{\alpha}+2})\,\mathcal{M}_{W,Y}^2 \log(p\tilde{p}d\tilde{d})$, then there exist some positive constants $c_1$ and $c_2$ independent of $(n, p, \tilde{p}, d, \tilde{d})$ such that
$$\max_{j\in[p],\,k\in[\tilde{p}],\,l\in[d],\,m\in[\tilde{d}]} \frac{|\hat{\sigma}_{h,jklm}^{W,Y} - \sigma_{h,jklm}^{W,Y}|}{l^{\alpha+1} \vee m^{\tilde{\alpha}+1}} \ \lesssim\ \mathcal{M}_{W,Y}\sqrt{\frac{\log(p\tilde{p}d\tilde{d})}{n}} \quad (\mathrm{A.3})$$
holds with probability greater than $1 - c_1(p\tilde{p}d\tilde{d})^{-c_2}$.
We next consider a mixed-process scenario consisting of $\{W_t(\cdot)\}$ and $\{Z_t\}$, and establish a deviation bound in elementwise $\ell_\infty$-norm on how $\hat{\varrho}_{h,jkl}^{W,Z} = (n-h)^{-1}\sum_{t=1}^{n-h}\hat{\eta}_{tjl} Z_{(t+h)k}$ concentrates around $\varrho_{h,jkl}^{W,Z} = \mathrm{Cov}(\eta_{tjl}, Z_{(t+h)k})$; this bound is essential in the convergence analysis of the SFLR estimate in Section 5.1.
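The mixed-process estimator admits the same one-line implementation, with the raw series $\{Z_t\}$ replacing the second score array; a sketch in the conventions of the earlier snippet (array names are again hypothetical):

```python
import numpy as np

def lagged_cross_cov_scalar(eta, Z, h):
    """Sample lag-h cross-covariances between a score array and a vector series.

    eta : array of shape (n, p, d), entries eta[t, j, l]
    Z   : array of shape (n, q),    entries Z[t, k]
    Returns rho[j, k, l] = (n - h)^{-1} * sum_t eta[t, j, l] * Z[t + h, k].
    """
    n = eta.shape[0]
    return np.einsum('tjl,tk->jkl', eta[:n - h], Z[h:]) / (n - h)
```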
Proposition 3. Suppose that Condition 13 holds, $\{W_t(\cdot)\}$ is a sub-Gaussian functional linear process, $\{Z_t\}$ is a sub-Gaussian linear process, and $h$ is fixed. Let $d$ be a positive integer possibly depending on $(n, p, \tilde{p})$, and let $\mathcal{M}_{W,Z} = \mathcal{M}^{W} + \mathcal{M}^{Z} + \mathcal{M}^{W,Z}$. If $n \gtrsim \mathcal{M}_{W,Z}^2 \log(p\tilde{p}d)$, then there exist some positive constants $c_1$ and $c_2$ independent of $(n, p, \tilde{p}, d)$ such that
$$\max_{j\in[p],\,k\in[\tilde{p}],\,l\in[d]} \frac{|\hat{\varrho}_{h,jkl}^{W,Z} - \varrho_{h,jkl}^{W,Z}|}{l^{\alpha+1}} \ \lesssim\ \mathcal{M}_{W,Z}\sqrt{\frac{\log(p\tilde{p}d)}{n}} \quad (\mathrm{A.4})$$
holds with probability greater than $1 - c_1(p\tilde{p}d)^{-c_2}$.

References
Aue, A., Norinho, D. D. and Hörmann, S. (2015). On the prediction of stationary functional time series, Journal of the American Statistical Association 110: 378-392.

Aue, A. and van Delft, A. (2020). Testing for stationarity of functional time series in the frequency domain, The Annals of Statistics, to appear.

Bathia, N., Yao, Q. and Ziegelmann, F. (2010). Identifying the finite dimensionality of curve time series, The Annals of Statistics 38: 3352-3386.

Belloni, A., Chernozhukov, V., Chetverikov, D., Hansen, C. and Kato, K. (2018). High-dimensional econometrics and regularized GMM, arXiv:1806.01888v2.

Bosq, D. (2000). Linear Processes in Function Spaces, Springer, New York.

Chen, C., Guo, S. and Qiao, X. (2020). Functional linear regression: dependence and error contamination, Journal of Business & Economic Statistics, to appear.

Cho, H., Goude, Y., Brossat, X. and Yao, Q. (2013). Modelling and forecasting daily electricity load curves: a hybrid approach, Journal of the American Statistical Association 108: 7-21.

Descary, M.-H. and Panaretos, V. M. (2019). Functional data analysis by matrix completion, The Annals of Statistics 47: 1-38.

Fan, Y., Foutz, N., James, G. M. and Jank, W. (2014). Functional forecasting of demand decay rates using online virtual stock markets, The Annals of Applied Statistics 8: 2435-2460.

Fan, Y., James, G. M. and Radchenko, P. (2015). Functional additive regression, The Annals of Statistics 43: 2296-2325.

Fang, Q., Guo, S. and Qiao, X. (2020). A new perspective on dependence in high-dimensional functional/scalar time series: finite sample theory and applications, arXiv:2004.07781.

Fu, A., Narasimhan, B. and Boyd, S. (2020). CVXR: An R package for disciplined convex optimization, Journal of Statistical Software, to appear.

Gautier, E. and Rose, C. (2019). High-dimensional instrumental variables regression and confidence sets, arXiv:1105.2454v6.

Guo, S. and Qiao, X. (2020). On consistency and sparsity for high-dimensional functional time series with application to autoregressions, arXiv:2003.11462.

Hall, P. and Horowitz, J. L. (2007). Methodology and convergence rates for functional linear regression, The Annals of Statistics 35: 70-91.

Hall, P. and Vial, C. (2006). Assessing the finite dimensionality of functional data, Journal of the Royal Statistical Society: Series B 68: 689-705.

Hamilton, J. D. (1994). Time Series Analysis, Princeton University Press, Princeton, NJ.

Hörmann, S., Kidziński, Ł. and Hallin, M. (2015). Dynamic functional principal components, Journal of the Royal Statistical Society: Series B 77: 319-348.

Hörmann, S. and Kokoszka, P. (2010). Weakly dependent functional data, The Annals of Statistics 38: 1845-1884.

Horváth, L., Kokoszka, P. and Rice, G. (2014). Testing stationarity of functional time series, Journal of Econometrics 179: 66-82.

Jirak, M. (2016). Optimal eigen expansions and uniform bounds, Probability Theory and Related Fields 166: 753-799.

Kong, D., Xue, K., Yao, F. and Zhang, H. H. (2016). Partially functional linear regression in high dimensions, Biometrika 103: 147-159.

Lam, C. and Yao, Q. (2012). Factor modelling for high-dimensional time series: inference for the number of factors, The Annals of Statistics 40: 694-726.

Li, D., Robinson, P. M. and Shang, H. L. (2020). Long-range dependent curve time series, Journal of the American Statistical Association 115(530): 957-971.

Luo, R. and Qi, X. (2017). Function-on-function linear regression by signal compression, Journal of the American Statistical Association 112: 690-705.

Müller, H.-G., Sen, R. and Stadtmüller, U. (2011). Functional data analysis for volatility, Journal of Econometrics 165: 233-245.

Panaretos, V. M. and Tavakoli, S. (2013). Fourier analysis of stationary time series in function space, The Annals of Statistics 41: 568-603.

Rudelson, M. and Vershynin, R. (2013). Hanson-Wright inequality and sub-Gaussian concentration, Electronic Communications in Probability 18: 1-9.

Wainwright, M. J. (2019). High-Dimensional Statistics: A Non-Asymptotic Viewpoint, Cambridge University Press, Cambridge.

Xue, K. and Yao, F. (2020). Hypothesis testing in large-scale functional linear regression, Statistica Sinica, to appear.

Yao, F., Müller, H.-G. and Wang, J.-L. (2005). Functional data analysis for sparse longitudinal data, Journal of the American Statistical Association 100: 577-590.

Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables, Journal of the Royal Statistical Society: Series B 68: 49-67.