A Class of Time-Varying Vector Moving Average Models: Nonparametric Kernel Estimation and Application
Yayi Yan, Jiti Gao and Bin Peng

Monash University

October 6, 2020
Abstract
Multivariate dynamic time series models are widely encountered in practical studies, e.g., modelling policy transmission mechanisms and measuring connectedness between economic agents. To better capture the dynamics, this paper proposes a wide class of multivariate dynamic models with time–varying coefficients, which have a general time–varying vector moving average (VMA) representation, and nest, for instance, time–varying vector autoregression (VAR), time–varying vector autoregression moving–average (VARMA), and so forth as special cases. The paper then develops a unified estimation method for the unknown quantities before an asymptotic theory for the proposed estimators is established. In the empirical study, we investigate the transmission mechanism of monetary policy using U.S. data, and uncover a fall in the volatilities of exogenous shocks. In addition, we find that (i) monetary policy shocks have less influence on inflation before and during the so–called Great Moderation, (ii) inflation is more anchored recently, and (iii) the long–run level of inflation is below, but quite close to, the Federal Reserve's target of two percent after the beginning of the Great Moderation period.
Keywords: Multivariate Time Series Model; Nonparametric Kernel Estimation; Trending Stationarity
Corresponding author: Jiti Gao, Department of Econometrics and Business Statistics, Monash University, Caulfield East, Victoria 3145, Australia. Email: [email protected].

The authors of this paper would like to thank George Athanasopoulos, Rainer Dahlhaus, David Frazier, Oliver Linton, Gael Martin, Peter C.B. Phillips and Wei Biao Wu for their constructive comments on earlier versions of this paper. The second author would also like to acknowledge financial support from the Australian Research Council Discovery Grants Program under Grant Numbers DP170104421 and DP200102769.

Introduction
Vector autoregressions (VARs), as well as their extensions like vector autoregressive moving average (VARMA) models and VARs with exogenous variables (VARX), are among the most popular frameworks for modelling dynamic interactions among multiple variables. These models arise mainly as a response to the "incredible" identification conditions embedded in large–scale macroeconomic models (Sims, 1980). VAR modelling begins with minimal restrictions on the multivariate dynamic models. Gradually armed with identification information, VARs, plus their statistical tool–kits like impulse response functions, are powerful approaches for conducting policy analysis. Also, VARs can be applied to other important tasks including data description and forecasting; see Stock and Watson (2001) for a detailed review. Despite their popularity, linear VAR models can always be rejected by data in empirical studies (cf., Tsay, 1998). For example, Stock and Watson (2016b) point out that "changes associated with the Great Moderation go beyond reduction in variances to include changes in dynamics and reduction in predictability."

To go beyond linear VAR models, various parametric time–varying VAR models have been proposed (e.g., Tsay, 1998; Sims and Zha, 2006; and references therein) in order to allow for abrupt structural breaks in economic relationships and obtain efficient estimation. However, model misspecification and parameter instability may undermine the performance of parametric time–varying VAR models. Usually, researchers do not know the true functional forms of the time–varying parameters, so the choice of the functional forms might be somewhat arbitrary. In addition, as pointed out by Hansen (2001), "it may seem unlikely that a structural break could be immediate and might seem more reasonable to allow a structural change to take a period of time to take effect". Therefore, it is reasonable to allow smooth structural changes over a period of time rather than abrupt structural changes.
Another strand of the VAR literature assumes that structural coefficients evolve in a random way, such as Primiceri (2005) and Giannone et al. (2019). Recently, Giraitis et al. (2014) point out that "the theoretical asymptotic properties of estimating such processes via the Kalman, or related filters are unclear". Along this line, Giraitis et al. (2014) and Giraitis et al. (2018) have achieved some useful asymptotic results. However, estimation theory for time–varying impulse response functions, which are of interest in interpreting multivariate dynamic models, is not yet established in the random walk case.

It is worth pointing out that although nonparametric estimation for deterministic time–varying models has received much attention over the past three decades, initially on time series regression models (Robinson, 1989; Cai, 2007; Li et al., 2011; Chen et al., 2012; Zhang and Wu, 2012) and then on univariate autoregressive models (Dahlhaus and Rao, 2006; Richter et al., 2019) in recent years, little has been done on multivariate autoregressive models with deterministic time–varying coefficients. One exception is Dahlhaus et al. (1999), who use a wavelet method to transform the time–varying VAR model into a linear approximation with an orthonormal wavelet basis, and then show that the wavelet estimator attains the usual near–optimal minimax rate of $L_2$ convergence. Up to this point, it is worth bringing up the terminology "local stationarity", which dates back at least to the seminal work of Dahlhaus (1996). While several studies have been conducted along this line (Dahlhaus, 1996 and Zhang and Wu, 2012 on time–varying AR; Dahlhaus and Polonik, 2009 on time–varying ARMA; Rohan and Ramanathan, 2013 and Truquet, 2017 on time–varying ARCH/GARCH), the literature has not ventured much outside the univariate setting. A commonly used method is to approximate a locally stationary process by a stationary approximation on each of the segments (Dahlhaus et al., 2019).
However, it remains unclear to us how to extend this approximation method for the univariate setting to the multivariate case, where the segments on which stationary approximations hold for each univariate component time series may be quite different.

This paper shows the versatility of an alternative approach that is especially designed for a wide class of time–varying VMA(∞) processes. Our approach relies on an explicit decomposition of time–varying VMA(∞) processes into long–run and transitory elements, which is known as the Beveridge–Nelson (BN) decomposition (Beveridge and Nelson, 1981; Phillips and Solo, 1992). The long–run component in the decomposition yields a martingale approximation to the sum of time–varying VMA(∞) processes. We are then able to deal with a wide class of multivariate dynamic models with smooth time–varying coefficients, which have a general time–varying VMA(∞) representation, nesting VAR, VARMA, VARX and so forth as special cases. Specifically, the structural coefficients are unknown functions of re–scaled time, so that the proposed models can better capture the simultaneous relations among the variables of interest over time. Such a modelling strategy is especially useful for analysing time series over a long horizon, since it offers a comprehensive treatment for tracking quantities of interest which are affected by frequently updated policies, environments, systems, etc. In an economic system consisting of inflation, unemployment and interest rates, one priority in Section 4 is inferring the time–varying impacts of interest rate changes, which help stabilize fluctuations in inflation and unemployment in the long run. Under the proposed framework, this is achieved by investigating the corresponding time–varying impulse response functions.

In summary, our contributions are threefold. First, we propose a wide class of time–varying VMA(∞) models, which covers several classes of multivariate dynamic models.
Second, we develop a time–varying counterpart of the conventional BN decomposition and propose a unified estimation method for a class of unknown time–varying functions. We then establish the corresponding asymptotic theory for the proposed models and estimators. Third, in the empirical study of Section 4, we study the changing dynamics of three key U.S. macroeconomic variables (i.e., inflation, unemployment, and the interest rate), and uncover a fall in the volatilities of exogenous shocks. In addition, we find that (i) monetary policy shocks have less influence on inflation before and during the so–called Great Moderation; (ii) inflation is more anchored recently; and (iii) the long–run level of inflation is below, but quite close to, the Federal Reserve's target of two percent after the beginning of the Great Moderation period.

The organization of this paper is as follows. Section 2 proposes a class of time–varying VMA(∞) models. In Section 3, we discuss a class of time–varying VAR models and then establish an estimation theory for the unknown quantities. Section 4 presents an empirical study to show the practical relevance and applicability of the proposed models and estimation theory. Section 5 gives a short conclusion. The main proofs of the theorems are given in Appendix A.
In the online supplementary material, simulation results are given in Appendix B.1 and some technical lemmas and proofs are given in the rest of Appendix B.

Before proceeding further, it is convenient to introduce some notation: $\|\cdot\|$ denotes the Euclidean norm of a vector or the Frobenius norm of a matrix; $\otimes$ denotes the Kronecker product; $I_a$ and $\mathbf{0}_a$ are $a \times a$ identity and null matrices respectively, and $\mathbf{0}_{a \times b}$ stands for an $a \times b$ matrix of zeros; for a function $g(w)$, let $g^{(j)}(w)$ be the $j$-th derivative of $g(w)$, where $j \ge 1$ and $g^{(0)}(w) \equiv g(w)$; $K_h(\cdot) = K(\cdot/h)/h$, where $K(\cdot)$ and $h$ stand for a nonparametric kernel function and a bandwidth respectively; let $\tilde c_k = \int_{-1}^{1} u^k K(u)\,du$ and $\tilde v_k = \int_{-1}^{1} u^k K^2(u)\,du$ for integer $k \ge 0$; $\mathrm{vec}(\cdot)$ stacks the elements of an $m \times n$ matrix as an $mn \times 1$ vector; for an $a \times a$ square matrix $A$, $\mathrm{vech}(A)$ denotes the $a(a+1)/2 \times 1$ vector obtained from $\mathrm{vec}(A)$ by eliminating all supra–diagonal elements of $A$; $\mathrm{tr}(A)$ denotes the trace of $A$; finally, let $\to_P$ and $\to_D$ denote convergence in probability and convergence in distribution, respectively.

A Class of Time–Varying VMA(∞) Models

We start our investigation by considering a class of time–varying VMA(∞) models:

$x_t = \mu_t + \sum_{j=0}^{\infty} B_{j,t}\,\epsilon_{t-j} := \mu_t + B_t(L)\,\epsilon_t$   (2.1)

for $t = 1, \ldots, T$, where $x_t$ is a $d$-dimensional vector of observable variables, $\mu_t$ is a $d$-dimensional unknown trending function, $\epsilon_t$ is a vector of $d$-dimensional random innovations, and $d$ is fixed throughout this paper. Moreover, $B_t(L) = \sum_{j=0}^{\infty} B_{j,t} L^j$, where $L$ is the lag operator, and $B_{j,t}$ is a matrix of $d \times d$ unknown deterministic coefficients.

We first comment on the usefulness of the structure in (2.1), and the corresponding BN decomposition. An application of the BN decomposition gives

$x_t = \mu_t + B_t(1)\,\epsilon_t + \widetilde B_t(L)\,\epsilon_{t-1} - \widetilde B_t(L)\,\epsilon_t$,   (2.2)

where we have used the decomposition of $B_t(L)$ as $B_t(L) = B_t(1) - (1 - L)\widetilde B_t(L)$, in which $\widetilde B_t(L) = \sum_{j=0}^{\infty} \widetilde B_{j,t} L^j$ and $\widetilde B_{j,t} = \sum_{k=j+1}^{\infty} B_{k,t}$. Equation (2.2) indicates that one may establish some general asymptotic properties for partial sums and quadratic forms of $x_t$ with minor restrictions on $\{\epsilon_t\}$. For example, one can show that the simple average of $x_t$ becomes

$\frac{1}{T}\sum_{t=1}^{T} x_t = \frac{1}{T}\sum_{t=1}^{T} \mu_t + \frac{1}{T}\sum_{t=1}^{T} B_t(1)\,\epsilon_t + O_P(T^{-1})$.   (2.3)

Similar to (2.3), asymptotic properties for partial sums and quadratic forms mainly depend on the probabilistic structure of $\{\epsilon_t\}$ and regularity conditions on $\{B_{j,t}\}$.
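As a quick numerical illustration of the identity behind (2.2), the sketch below verifies $B(L)\epsilon_t = B(1)\epsilon_t + \widetilde B(L)\epsilon_{t-1} - \widetilde B(L)\epsilon_t$ for a truncated (finite-order) MA with time-invariant coefficients; all coefficient choices are illustrative and not from the paper.

```python
import numpy as np

# Minimal numerical check of the BN identity used in (2.2),
#   B(L) eps_t = B(1) eps_t + Btilde(L) eps_{t-1} - Btilde(L) eps_t,
# with Btilde_j = sum_{k > j} B_k, for a truncated MA(J).
# All coefficient choices are illustrative, not the paper's.

rng = np.random.default_rng(0)
d, J, T = 2, 5, 50                          # dimension, MA order, sample size
B = [rng.normal(scale=0.5 ** j, size=(d, d)) for j in range(J + 1)]
Btilde = [sum(B[k] for k in range(j + 1, J + 1)) for j in range(J)]
B1 = sum(B)                                 # long-run matrix B(1)
eps = rng.normal(size=(T + J + 1, d))       # innovation sequence

def ma(coefs, t):
    """Apply a finite lag polynomial to eps at (row) time t."""
    return sum(coefs[j] @ eps[t - j] for j in range(len(coefs)))

for t in range(J + 1, T + J + 1):
    lhs = ma(B, t)                                         # B(L) eps_t
    rhs = B1 @ eps[t] + ma(Btilde, t - 1) - ma(Btilde, t)  # BN form
    assert np.allclose(lhs, rhs)
```

The same algebra underlies (2.3): summing the BN form over $t$ telescopes the transitory part, leaving the martingale term $\sum_t B_t(1)\epsilon_t$ plus a lower-order remainder.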
In other words, there is no need to impose any further structure on $x_t$, such as requiring $x_t$ to be a locally stationary time series in a similar way to what has been done in the relevant literature for the univariate case. As a consequence, this facilitates the development of a general theory for the multivariate case. Moreover, it should be added that our settings in (2.2) and (2.3) considerably extend similar treatments by Phillips and Solo (1992) for the univariate linear process case, where both $\mu_t$ and $B_{j,t}$ reduce to constant scalars: $\mu_t = \mu$ and $B_{j,t} = B_j$. Let us now stress that (2.1) covers a wide range of models, which are of general interest in both theory and practice. Below, we list a few examples, of which the parametric counterparts can be seen in Lütkepohl (2005).
Example 1. Suppose that $x_t$ is a $d$-dimensional time–varying VAR($p$) process:

$x_t = A_{1,t}\, x_{t-1} + \cdots + A_{p,t}\, x_{t-p} + \epsilon_t$,   (2.4)

which has been widely studied in the literature with the Bayesian framework being the dominant approach (e.g., Benati and Surico, 2009; Paul, 2019). Similar to Hamilton (1994, p. 260), (2.4) can be expressed as a time–varying MA(∞) process $x_t = \sum_{j=0}^{\infty} B_{j,t}\,\epsilon_{t-j}$, where $B_{0,t} = I_d$ and $B_{j,t} = J \prod_{i=0}^{j-1} \Phi_{t-i}\, J^{\top}$ for $j \ge 1$, with $J = [I_d, \mathbf{0}_{d \times d(p-1)}]$ and

$\Phi_t = \begin{bmatrix} A_{1,t} & \cdots & A_{p-1,t} & A_{p,t} \\ I_d & \cdots & \mathbf{0}_d & \mathbf{0}_d \\ \vdots & \ddots & \vdots & \vdots \\ \mathbf{0}_d & \cdots & I_d & \mathbf{0}_d \end{bmatrix}$.

Example 2.
Suppose that $x_t$ is a $d$-dimensional time–varying VARMA($p, q$) process as follows:

$x_t = A_{1,t}\, x_{t-1} + \cdots + A_{p,t}\, x_{t-p} + \epsilon_t + \Theta_{1,t}\, \epsilon_{t-1} + \cdots + \Theta_{q,t}\, \epsilon_{t-q}$,   (2.5)

which then can be expressed as $x_t = \sum_{b=0}^{\infty} D_{b,t}\,\epsilon_{t-b}$ with $D_{b,t} = \sum_{j=\max(0, b-q)}^{b} B_{j,t}\, \Theta_{b-j,t-j}$, $B_{j,t}$ defined similarly as in Example 1, and $\Theta_{0,t} \equiv I_d$ independent of $t$.

Example 3. Let $x_t$ be a $d$-dimensional time–varying double MA(∞) process:

$x_t = \sum_{j=0}^{\infty} \Psi_{j,t}\, v_{t-j}$ and $v_t = \sum_{l=0}^{\infty} \Theta_{l,t}\, \epsilon_{t-l}$,   (2.6)

in which the innovations $v_t$ also follow a time–varying MA(∞) process. Simple algebra shows that $x_t = \sum_{j=0}^{\infty} B_{j,t}\,\epsilon_{t-j}$, where $B_{j,t} = \sum_{l=0}^{j} \Psi_{l,t}\, \Theta_{j-l,t-l}$.

To facilitate the development of our general theory, we introduce the following assumptions.

Assumption 1. $\max_{t \ge 1} \sum_{j=1}^{\infty} j\, \|B_{j,t}\| < \infty$, $\limsup_{T \to \infty} \sum_{t=1}^{T-1} \sum_{j=1}^{\infty} j\, \|B_{j,t+1} - B_{j,t}\| < \infty$, and $\limsup_{T \to \infty} \sum_{t=1}^{T-1} \|\mu_{t+1} - \mu_t\| < \infty$.

Assumption 2. $\{\epsilon_t\}_{t=-\infty}^{\infty}$ is a martingale difference sequence (m.d.s.) adapted to the filtration $\{\mathcal{F}_t\}$, where $\mathcal{F}_t = \sigma(\epsilon_t, \epsilon_{t-1}, \ldots)$ is the $\sigma$-field generated by $(\epsilon_t, \epsilon_{t-1}, \ldots)$, $E(\epsilon_t \epsilon_t^{\top} \mid \mathcal{F}_{t-1}) = I_d$ almost surely (a.s.), and $\max_{t \ge 1} E[\|\epsilon_t\|^{\delta}] < \infty$ for some $\delta > 4$.

Assumption 1 regulates the matrices $B_{j,t}$, and ensures the validity of the BN decomposition under the time–varying framework. It covers cases such as (i) the parametric setting of Phillips and Solo (1992), and (ii) $B_{j,t} := B_j(\tau_t)$, where $B_j(\cdot)$ satisfies Lipschitz continuity on $[0, 1]$ for all $j$. Assumption 2 imposes conditions on the innovation error terms by replacing the commonly used independent and identically distributed (i.i.d.) innovations (e.g., Dahlhaus and Polonik, 2009) with a martingale difference structure.

We are now ready to present a summary of useful results for Examples 1–3, which explains why model (2.1) serves as a foundation of the examples given above.

Proposition 2.1.
1. Consider Examples 1 and 2. Suppose that the roots of $\det(I_d - A_{1,t} L - \cdots - A_{p,t} L^p) = 0$ all lie outside the unit circle uniformly over $t$, $\limsup_{T \to \infty} \sum_{t=1}^{T-1} \|A_{m,t+1} - A_{m,t}\| < \infty$ for $m = 1, \ldots, p$, and $A_{m,t} = A_{m,0}$ for $t \le 0$ and $m = 1, \ldots, p$. In addition, suppose that in Example 2, $\limsup_{T \to \infty} \sum_{t=1}^{T-1} \|\Theta_{m,t+1} - \Theta_{m,t}\| < \infty$ for $m = 1, \ldots, q$. Then both (2.4) and (2.5) are time–varying MA(∞) processes, in which the MA coefficients satisfy Assumption 1.

2. For Example 3, let $\limsup_{T \to \infty} \sum_{t=1}^{T-1} \sum_{j=1}^{\infty} j\, \|\Psi_{j,t+1} - \Psi_{j,t}\| < \infty$ and $\max_{t \ge 1} \sum_{j=1}^{\infty} j\, \|\Psi_{j,t}\| < \infty$. Moreover, let $\{\Theta_{j,t}\}$ satisfy the same conditions as those for $\{\Psi_{j,t}\}$. Then (2.6) is a time–varying MA(∞) process, in which the MA coefficients satisfy Assumption 1.

We now move on to investigate asymptotic properties for model (2.1). We first propose some estimates for several population moments of $x_t$ in (2.1), which help derive the asymptotic theory throughout this paper. To conserve space, we present the rates of the uniform convergence below, while extra results on point–wise convergence are given in the supplementary Appendix B.

Theorem 2.1. Let Assumptions 1 and 2 hold. In addition, let $\{W_{T,t}(\cdot)\}_{t=1}^{T}$ be a sequence of $m \times d$ matrices of deterministic functions, in which $m$ is fixed, and each functional component is Lipschitz continuous and defined on a compact set $[a, b]$. Moreover, suppose that

1. $\sup_{\tau \in [a,b]} \sum_{t=1}^{T} \|W_{T,t}(\tau)\| = O(1)$;

2. $\sup_{\tau \in [a,b]} \sum_{t=1}^{T-1} \|W_{T,t+1}(\tau) - W_{T,t}(\tau)\| = O(d_T)$, where $d_T = \sup_{\tau \in [a,b],\, t \ge 1} \|W_{T,t}(\tau)\|$.

Then as $T \to \infty$,

1. $\sup_{\tau \in [a,b]} \|\sum_{t=1}^{T} W_{T,t}(\tau)\,(x_t - E(x_t))\| = O_P(\sqrt{d_T \log T})$ provided $T^{4/\delta} d_T \log T \to 0$;

2. $\sup_{\tau \in [a,b]} \|\sum_{t=1}^{T} W_{T,t}(\tau)\,(x_t x_{t+p}^{\top} - E(x_t x_{t+p}^{\top}))\| = O_P(\sqrt{d_T \log T})$ for any fixed integer $p \ge 0$ provided $T^{4/\delta} d_T \log T \to 0$ and $\max_{t \ge 1} E(\|\epsilon_t\|^4 \mid \mathcal{F}_{t-1}) < \infty$ a.s., where $\delta > 4$ is the same as in Assumption 2.

Theorem 2.1 is readily used for studying many useful cases, including weighted kernel estimators (see Lemma B.7 of Appendix B for example), and will be repeatedly used in many of the theoretical derivations of this paper. Theorem 2.1 is also helpful to a broad range of studies, such as those mentioned in Härdle et al. (2000), Fan and Yao (2003), Gao (2007), Li and Racine (2007), Hansen (2008), Wang and Chan (2014), and Li et al. (2016).

Estimation of $\mu_t$

As modelling time–varying means is always an important task in time series analysis (e.g., Wu and Zhao, 2007; Friedrich et al., 2020), we infer $\mu_t$ of model (2.1) below. Up to this point, we have not imposed any specific form on the components $\mu_t$ and $B_{j,t}$ of $x_t$. To carry on with our investigation, we suppose further that $\mu_t = \mu(\tau_t)$ and $B_{j,t} = B_j(\tau_t)$ with $\tau_t = t/T$, so (2.1) can be expressed as

$x_t = \mu(\tau_t) + \sum_{j=0}^{\infty} B_j(\tau_t)\,\epsilon_{t-j}$.   (2.7)

The challenge then lies in the fact that the "residuals" are time–varying linear processes. Some detailed explanations can be found in Dahlhaus (2012).

The following assumptions are necessary for the development of our trend estimation.

Assumption 3. $\mu(\cdot)$ and $B_j(\cdot)$ are a $d \times 1$ vector and a $d \times d$ matrix respectively. Each functional component of $\mu(\cdot)$ and $B_j(\cdot)$ is second order continuously differentiable on $[0, 1]$. Moreover, $\sup_{\tau \in [0,1]} \sum_{j=1}^{\infty} j\, \|B_j^{(\ell)}(\tau)\| < \infty$ for $\ell = 0, 1, 2$.

Assumption 4.
Let $K(\cdot)$ be a symmetric probability kernel function, Lipschitz continuous on $[-1, 1]$. Also, let $h \to 0$ and $Th \to \infty$ as $T \to \infty$.

For $\forall \tau \in (0, 1)$, we estimate $\mu(\tau)$ by the next estimator:

$\widehat\mu(\tau) = \left(\sum_{t=1}^{T} K_h(\tau_t - \tau)\right)^{-1} \sum_{t=1}^{T} x_t\, K_h(\tau_t - \tau)$.   (2.8)

We are now ready to establish an important and useful theorem.

Theorem 2.2.
Let Assumptions 2–4 hold. For $\forall \tau \in (0, 1)$, as $T \to \infty$,

$\sqrt{Th}\left(\widehat\mu(\tau) - \mu(\tau) - \frac{1}{2} h^2 \tilde c_2\, \mu^{(2)}(\tau)\right) \to_D N\!\left(\mathbf{0}_{d \times 1},\, \Sigma_{\mu}(\tau)\right)$,

where $\Sigma_{\mu}(\tau) = \tilde v_0 \left(\sum_{j=0}^{\infty} B_j(\tau)\right)\left(\sum_{j=0}^{\infty} B_j^{\top}(\tau)\right)$, and $\tilde c_2$ and $\tilde v_0$ are defined at the end of Section 1.

Note that $\Sigma_{\mu}(\cdot)$ is the long–run covariance matrix, which in general cannot be estimated directly. To construct confidence intervals practically, we use a dependent wild bootstrap (DWB) method which is initially proposed by Shao (2010) for stationary time series. For the sake of space, the detailed procedure with the associated asymptotic properties is presented in Appendix A.2.

Time–Varying VAR Models

In this section, we pay particular attention to one of the most popular models of the VMA(∞) family, namely the VAR. Many multivariate time series exhibit time–varying simultaneous interrelationships and changes in unconditional volatility (e.g., Justiniano and Primiceri, 2008; Coibion and Gorodnichenko, 2011). Along this line, time–varying VAR models have proven to be especially useful for describing the dynamics of multivariate time series. The majority of time–varying VAR models are investigated under the Bayesian framework, while little has been done using a frequentist approach. Building on Section 2, we consider a time–varying VAR model under the nonparametric framework, and establish the corresponding estimation theory.

Suppose that we observe $(x_{-p+1}, \ldots, x_0, x_1, \ldots, x_T)$ from the following data generating process. Accounting for heteroscedasticity, we consider the next model:

$x_t = a(\tau_t) + \sum_{j=1}^{p} A_j(\tau_t)\, x_{t-j} + \eta_t, \quad \text{with } \eta_t = \omega(\tau_t)\, \epsilon_t$,   (3.1)

where $\omega(\tau)$ is a matrix of unknown functions of $\tau$. Model (3.1) allows dynamic variations for both the coefficients and the covariance matrix. We infer $a(\cdot)$ and the $A_j(\cdot)$'s below, which are respectively a $d \times 1$ vector and $d \times d$ matrices of unknown smooth functions. In addition, we are
In addition, we areinterested in the d × d dimension ω ( · ) which governs the dynamics of the covariance matrix of7 η t } Tt =1 . As mentioned in Primiceri (2005), allowing ω ( · ) to vary over time is important theoreticallyand practically, because a constant covariance matrix implies that the shock to the i th variable of x t has a time–invariant effect on the j th variable of x t , restricting simultaneous interactions amongmultiple variables to be time–invariant. To facilitate the development, we first impose the following conditions.
Assumption 5.
1. The roots of $\det(I_d - A_1(\tau) L - \cdots - A_p(\tau) L^p) = 0$ all lie outside the unit circle uniformly in $\tau \in [0, 1]$.

2. Each element of $A(\tau) = (a(\tau), A_1(\tau), \ldots, A_p(\tau))$ is second order continuously differentiable on $[0, 1]$, and $A(\tau) = A(0)$ for $\tau < 0$.

3. Each element of $\omega(\tau)$ is second order continuously differentiable on $[0, 1]$. Moreover, $\Omega(\tau) = \omega(\tau)\,\omega(\tau)^{\top}$ is positive definite uniformly in $\tau \in [0, 1]$, and $\omega(\tau) = \omega(0)$ for $\tau < 0$.

Assumption 5.1 ensures that model (3.1) is neither unit–root nor explosive, while Assumption 5.2 allows the underlying data generating process to evolve over time in a smooth manner. In addition, the conditions $A(\tau) = A(0)$ and $\omega(\tau) = \omega(0)$ for $\tau < 0$ imply that

$x_t = a(0) + \sum_{j=1}^{p} A_j(0)\, x_{t-j} + \omega(0)\, \epsilon_t$   (3.2)

for all $t \le 0$, which ensures that (3.2) behaves like a stationary parametric VAR($p$) model for $t \le 0$. Similar treatments can be found in Vogt (2012) for the univariate setting. With the above assumptions in hand, the following proposition shows that model (3.1) can be approximated by a time–varying VMA(∞) process satisfying Assumption 3.

Proposition 3.1.
Under Assumption 5, there exists a VMA(∞) process

$\widetilde x_t = \mu(\tau_t) + B_0(\tau_t)\,\epsilon_t + B_1(\tau_t)\,\epsilon_{t-1} + B_2(\tau_t)\,\epsilon_{t-2} + \cdots$   (3.3)

such that $\max_{t \ge 1} E\|x_t - \widetilde x_t\| = O(T^{-1})$, where $\mu(\tau) = a(\tau) + \sum_{j=1}^{\infty} \Psi_j(\tau)\, a(\tau)$, $B_0(\tau) = \omega(\tau)$, $B_j(\tau) = \Psi_j(\tau)\, \omega(\tau)$ and $\Psi_j(\tau) = J\, \Phi^j(\tau)\, J^{\top}$ for $j \ge 1$, $\Phi(\cdot)$ is defined as follows:

$\Phi(\tau) = \begin{bmatrix} A_1(\tau) & \cdots & A_{p-1}(\tau) & A_p(\tau) \\ I_d & \cdots & \mathbf{0}_d & \mathbf{0}_d \\ \vdots & \ddots & \vdots & \vdots \\ \mathbf{0}_d & \cdots & I_d & \mathbf{0}_d \end{bmatrix}$,   (3.4)

and $J = [I_d, \mathbf{0}_{d \times d(p-1)}]$. Moreover, $\mu(\cdot)$ and $B_j(\cdot)$ fulfil Assumption 3.

Under Assumption 5, when $\tau_t$ is in a small neighbourhood of $\tau$, we have

$x_t \approx Z_{t-1}^{\top}\, \mathrm{vec}[A(\tau)] + \eta_t$,   (3.5)

where $Z_{t-1} = z_{t-1} \otimes I_d$ and $z_{t-1} = (1, x_{t-1}^{\top}, \ldots, x_{t-p}^{\top})^{\top}$. The estimators of $A(\tau)$ and $\Omega(\tau)$ are then sequentially given by

$\mathrm{vec}[\widehat A(\tau)] = \left(\sum_{t=1}^{T} Z_{t-1} Z_{t-1}^{\top} K_h(\tau_t - \tau)\right)^{-1} \sum_{t=1}^{T} Z_{t-1}\, x_t\, K_h(\tau_t - \tau)$,

$\widehat\Omega(\tau) = \left(\sum_{t=1}^{T} K_h(\tau_t - \tau)\right)^{-1} \sum_{t=1}^{T} \widehat\eta_t\, \widehat\eta_t^{\top}\, K_h(\tau_t - \tau)$,   (3.6)

where $\widehat\eta_t = x_t - \widehat A(\tau_t)\, z_{t-1}$. The asymptotic properties associated with (3.6) are summarized in the next theorem.
Let Assumptions 2, 4 and 5 hold. Suppose further that $T^{1-4/\delta} h / \log T \to \infty$ as $T \to \infty$ and $\max_{t \ge 1} E(\|\epsilon_t\|^4 \mid \mathcal{F}_{t-1}) < \infty$ a.s. Then the following results hold.

1. $\sup_{\tau \in [h, 1-h]} \|\widehat A(\tau) - A(\tau)\| = O_P\!\left(h^2 + \left(\frac{\log T}{Th}\right)^{1/2}\right)$;

2. Suppose, in addition, that conditional on $\mathcal{F}_{t-1}$, the third and fourth moments of $\epsilon_t$ are identical to the corresponding unconditional moments a.s. For $\forall \tau \in (0, 1)$,

$\sqrt{Th}\begin{pmatrix} \mathrm{vec}\!\left(\widehat A(\tau) - A(\tau) - \frac{1}{2} h^2 \tilde c_2\, A^{(2)}(\tau)\right) \\ \mathrm{vech}\!\left(\widehat\Omega(\tau) - \Omega(\tau) - \frac{1}{2} h^2 \tilde c_2\, \Omega^{(2)}(\tau)\right) \end{pmatrix} \to_D N(\mathbf{0}, V(\tau))$,

where $V(\tau)$ is defined in (A.3) for the sake of presentation.

3. $\widehat V(\tau) \to_P V(\tau)$, where $\widehat V(\tau)$ is defined in (A.5) for the sake of presentation.

The first result of Theorem 3.1 provides the uniform convergence rate for $\widehat A(\tau)$. As a consequence, it allows us to establish a joint asymptotic distribution for the estimates of the coefficients and the innovation covariance in the second result. For $\delta > 5$, the usual optimal bandwidth $h_{opt} = O(T^{-1/5})$ satisfies the condition $T^{1-4/\delta} h / \log T \to \infty$. The third result ensures that confidence intervals can be constructed practically.

Before moving on to impulse responses, we consider a practical issue: the choice of the lag order $p$, which is usually unknown in practice and needs to be decided by the data. We select the number of lags by minimizing the next information criterion:

$\widehat p = \mathrm{argmin}_{1 \le p \le P}\, \mathrm{IC}(p)$,   (3.7)

where $\mathrm{IC}(p) = \log\{\mathrm{RSS}(p)\} + p \cdot \chi_T$, $\mathrm{RSS}(p) = \frac{1}{T} \sum_{t=1}^{T} \widehat\eta_{p,t}^{\top}\, \widehat\eta_{p,t}$, $\chi_T$ is the penalty term, and $P$ is a sufficiently large fixed positive integer. The next theorem shows the validity of (3.7).

Theorem 3.2.
Let Assumptions 2, 4 and 5 hold. Suppose that $T^{1-4/\delta} h / \log T \to \infty$, $\max_{t \ge 1} E(\|\epsilon_t\|^4 \mid \mathcal{F}_{t-1}) < \infty$ a.s., $\chi_T \to 0$, and $(c_T \phi_T)^{-1} \chi_T \to \infty$, where $c_T = \phi_T = h^2 + \left(\frac{\log T}{Th}\right)^{1/2}$. Then $\Pr(\widehat p = p) \to 1$ as $T \to \infty$.

In view of the conditions on $\chi_T$, a natural choice is

$\chi_T = \max\left\{h^4,\; h^2\left(\frac{\log T}{Th}\right)^{1/2},\; \frac{\log T}{Th}\right\} \cdot \log(1/h)$.

In the supplementary Appendix B.1, we conduct intensive simulations to examine the finite sample performance of the above information criterion.
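The selection rule (3.7) can be sketched as follows. For simplicity the residuals below come from a global least-squares fit on a common sample (so the fits are nested and RSS is non-increasing in the candidate lag), rather than the paper's local kernel fit; the penalty follows the natural choice displayed above, and all constants and data are illustrative.

```python
import numpy as np

# Sketch of the lag-selection criterion (3.7): IC(p) = log(RSS(p)) + p * chi_T,
# evaluated on the common sample t = P+1, ..., T so that RSS(p) is
# non-increasing in p for nested fits. Plain OLS replaces the local kernel
# fit for brevity; all numbers are illustrative, not the paper's.

def rss(x, p, P):
    T, _ = x.shape
    z = np.column_stack([np.ones(T - P)] +
                        [x[P - j:T - j] for j in range(1, p + 1)])
    y = x[P:]
    coef, *_ = np.linalg.lstsq(z, y, rcond=None)
    return ((y - z @ coef) ** 2).sum() / (T - P)

rng = np.random.default_rng(3)
T, P, d = 2000, 4, 2
A1 = np.array([[0.6, 0.2], [0.0, 0.5]])
x = np.zeros((T, d))
for t in range(1, T):                       # simulate a VAR(1)
    x[t] = A1 @ x[t - 1] + rng.normal(size=d)

h = T ** (-1 / 5)
chi_T = max(h ** 4,
            h ** 2 * np.sqrt(np.log(T) / (T * h)),
            np.log(T) / (T * h)) * np.log(1 / h)
ic = [np.log(rss(x, p, P)) + p * chi_T for p in range(1, P + 1)]
p_hat = int(np.argmin(ic)) + 1              # candidate lags are 1, ..., P
assert p_hat == 1                           # the data come from a VAR(1)
```

Since the fits are nested, the raw RSS alone would never favour a smaller lag; it is the penalty $p\cdot\chi_T$ that makes the criterion consistent.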
Impulse Response Functions

We now focus on the impulse responses, which capture the dynamic interactions among the variables of interest in a wide range of practical cases.

By Proposition 3.1, the impulse response functions of $x_t$ are asymptotically equivalent to those of $\widetilde x_t$. Hence, recovering the impulse response functions requires estimating the $\Psi_j(\cdot)$'s and $\omega(\cdot)$, which by construction comes down to the estimation of $\Phi(\cdot)$ and $\omega(\cdot)$. Note further that $\Phi(\cdot)$ is a matrix consisting of the coefficients of (3.1). The estimator of $\Phi(\cdot)$ is intuitively defined as $\widehat\Phi(\cdot)$, in which we replace the $A_j(\cdot)$'s of (A.6) with the corresponding estimators obtained from (3.6). Furthermore, we require $\omega(\tau)$ to be a lower–triangular matrix in order to fulfil the identification restriction. Thus, $\widehat\omega(\tau)$ is chosen as the lower triangular matrix from the Cholesky decomposition of $\widehat\Omega(\tau)$ such that $\widehat\Omega(\tau) = \widehat\omega(\tau)\,\widehat\omega^{\top}(\tau)$.

With the above notation in hand, we are now ready to present the estimator of $B_j(\tau)$ as

$\widehat B_j(\tau) = \widehat\Psi_j(\tau)\, \widehat\omega(\tau), \quad \text{where } \widehat\Psi_j(\tau) = J\, \widehat\Phi^j(\tau)\, J^{\top}$.

The corresponding asymptotic results are summarized in the following theorem.
Let Assumptions 2, 4 and 5 hold, and let $T \to \infty$. Suppose further that $T^{1-4/\delta} h / \log T \to \infty$, $\max_{t \ge 1} E(\|\epsilon_t\|^4 \mid \mathcal{F}_{t-1}) < \infty$ a.s., and conditional on $\mathcal{F}_{t-1}$, the third and fourth moments of $\epsilon_t$ are identical to the corresponding unconditional moments a.s. Then for any fixed integer $j \ge 0$,

$\sqrt{Th}\left(\mathrm{vec}\!\left(\widehat B_j(\tau) - B_j(\tau)\right) - \frac{1}{2} h^2 \tilde c_2\, B_j^{(2)}(\tau)\right) \to_D N\!\left(\mathbf{0},\, \Sigma_{B_j}(\tau)\right)$,

where $\Sigma_{B_j}(\tau) = [C_{j,1}(\tau), C_{j,2}(\tau)]\, V(\tau)\, [C_{j,1}(\tau), C_{j,2}(\tau)]^{\top}$,

$B_j^{(2)}(\tau) = C_{j,1}(\tau)\, \mathrm{vec}\!\left(A^{(2)}(\tau)\right) + C_{j,2}(\tau)\, \mathrm{vech}\!\left(\Omega^{(2)}(\tau)\right)$,

$C_{0,1}(\tau) = \mathbf{0}$, $C_{0,2}(\tau) = L_d^{\top}\left(L_d (I_{d^2} + K_{dd})(\omega(\tau) \otimes I_d) L_d^{\top}\right)^{-1}$,

$C_{j,1}(\tau) = (\omega^{\top}(\tau) \otimes I_d)\left(\sum_{m=0}^{j-1} \left(J (\Phi^{\top}(\tau))^{j-1-m}\right) \otimes \left(J \Phi^m(\tau) J^{\top}\right)\right) \cdot [\mathbf{0}_{d^2 p \times d}, I_{d^2 p}]$ for $j \ge 1$,

$C_{j,2}(\tau) = \left(I_d \otimes (J \Phi^j(\tau) J^{\top})\right) L_d^{\top}\left(L_d (I_{d^2} + K_{dd})(\omega(\tau) \otimes I_d) L_d^{\top}\right)^{-1}$ for $j \ge 1$,

in which the elimination matrix $L_d$ satisfies $\mathrm{vech}(F) = L_d\, \mathrm{vec}(F)$ for any $d \times d$ matrix $F$, and the commutation matrix $K_{mn}$ satisfies $K_{mn}\, \mathrm{vec}(G) = \mathrm{vec}(G^{\top})$ for any $m \times n$ matrix $G$.

To close this section, we comment on how to construct the confidence interval. Since $\widehat\Phi(\tau) \to_P \Phi(\tau)$, $\widehat\omega(\tau) \to_P \omega(\tau)$ and $\widehat V(\tau) \to_P V(\tau)$ by Theorem 3.1, it is straightforward to obtain $\widehat\Sigma_{B_j}(\tau) \to_P \Sigma_{B_j}(\tau)$, where $\widehat\Sigma_{B_j}(\tau)$ has a form identical to $\Sigma_{B_j}(\tau)$ but replaces $\Phi(\tau)$, $\omega(\tau)$ and $V(\tau)$ with their respective estimators.

Section 4 shows how to apply the proposed model and estimation method to empirical data. Our findings show that the estimated coefficient matrices and impulse response functions capture various time–varying features.
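Before turning to the application, the mapping from local estimates $(\widehat A(\tau), \widehat\Omega(\tau))$ at a fixed $\tau$ to the impulse-response estimates $\widehat B_j(\tau)$ can be sketched as follows: build the companion matrix as in (3.4), take the lower-triangular Cholesky factor for identification, and form $B_j = J\Phi^j J^{\top}\omega$. The function name and all numbers are ours (illustrative), not the paper's.

```python
import numpy as np

# Sketch of the impulse-response construction of this section:
# Phi from (3.4), omega = lower-triangular Cholesky factor of Omega,
# B_j = J Phi^j J' omega.  Illustrative only.

def irf(A_list, Omega, horizon):
    d, p = A_list[0].shape[0], len(A_list)
    Phi = np.zeros((d * p, d * p))
    Phi[:d] = np.hstack(A_list)              # first block row: A_1, ..., A_p
    Phi[d:, :-d] = np.eye(d * (p - 1))       # identity blocks below
    J = np.hstack([np.eye(d), np.zeros((d, d * (p - 1)))])
    omega = np.linalg.cholesky(Omega)        # lower-triangular identification
    Phi_pow, out = np.eye(d * p), []
    for _ in range(horizon + 1):
        out.append(J @ Phi_pow @ J.T @ omega)    # B_j = Psi_j omega
        Phi_pow = Phi_pow @ Phi
    return out

# For a VAR(1) with A_1 = 0.5 I and Omega = I, B_j = 0.5^j I.
B = irf([0.5 * np.eye(2)], np.eye(2), horizon=3)
assert np.allclose(B[2], 0.25 * np.eye(2))
```

In the time-varying setting one would simply call such a routine at each $\tau$ of interest with the local estimates from (3.6); the Cholesky ordering of the variables embodies the recursive identification scheme used in Section 4.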
Empirical Study

In this section, we study the transmission mechanism of monetary policy, and infer the long–run level of inflation (i.e., trend inflation) and the natural rate of unemployment (NAIRU). Trend inflation and the NAIRU occupy a central position in setting monetary policy, since the Federal Reserve aims to mitigate deviations of inflation and unemployment from their long–run targets. See Stock and Watson (2016a) for more relevant discussions.

As is well documented, inflation is higher and more volatile during 1970–1980, but substantially decreases in the subsequent period, which is often referred to as the Great Moderation (Primiceri, 2005). The literature has considered two main classes of explanations: bad policy or bad luck. The first type of explanation focuses on the changes in the transmission mechanism (e.g., Cogley and Sargent, 2005), while the second regards it as a consequence of changes in the size of exogenous shocks (e.g., Sims and Zha, 2006). In what follows, we revisit the arguments associated with the Great Moderation using our approach. Also, we use the VMA(∞) representation of the VAR($p$) model to infer the paths of trend inflation and the NAIRU over time.

First, we estimate the time–varying VAR($p$) model using three commonly adopted macroeconomic variables of the literature (Primiceri, 2005; Cogley et al., 2010): the inflation rate (measured by 100 times the year–over–year log change in the GDP deflator) and the unemployment rate, representing the non–policy block, and the interest rate (measured by the average value of the Federal funds rate over the quarter), representing the monetary policy block. To isolate the monetary policy shocks, the interest rate is ordered last in the VAR model, and is treated as the monetary policy instrument. The identification requirement is that monetary policy actions affect inflation and unemployment with at least one period of lag (Primiceri, 2005).
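The inflation measure described above (100 times the year-over-year log change in the quarterly GDP deflator) can be sketched as below; the deflator values are made-up placeholders, not the St. Louis Fed data, and the helper name is ours.

```python
import numpy as np

# Sketch of the inflation measure: 100 * (log P_t - log P_{t-4}) for a
# quarterly deflator series P. Illustrative placeholder data only.

def yoy_log_inflation(deflator):
    """deflator: 1-D array of quarterly index levels."""
    p = np.asarray(deflator, dtype=float)
    return 100.0 * (np.log(p[4:]) - np.log(p[:-4]))

# A deflator growing at exactly 2% per year yields a flat inflation series
# equal to 100 * log(1.02), i.e. about 1.98 (log change, not percent change).
deflator = 100.0 * 1.02 ** (np.arange(12) / 4.0)
pi = yoy_log_inflation(deflator)
assert np.allclose(pi, 100.0 * np.log(1.02))
```

The first four quarters are lost to the year-over-year differencing, which is why the returned series is shorter than the input.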
The data are quarterly observations measured at an annual rate from 1954:Q3 to 2020:Q1, taken from the Federal Reserve Bank of St. Louis economic database. Figure 1 plots the three macro variables.

The order of the VAR(p) model and the optimal bandwidth are determined by the information criterion (3.7) and the cross–validation criterion (B.1), respectively. We obtain $\hat p = 3$ together with the data–driven bandwidth $\hat h_{cv}$.

Following Petrova (2019), we measure the long–run trend of $x_t$ by $\mu_t = \lim_{p\to\infty} E_t(x_{t+p}) = (I - A_t)^{-1} a_t$, where $a_t$ is the intercept term and $A_t$ collects the autoregressive coefficients. The main difference between our method and Petrova's method is that we invert the time–varying VAR(p) model to the time–varying MA(∞) model, and then explicitly estimate the underlying trends of inflation and the NAIRU using model (2.7).

Figure 4 plots the estimates of trend inflation and the NAIRU. The underlying trend of inflation is clearly high in the 1970s, but decreases in the subsequent period. After the Great Moderation, the long–run level of inflation is below, but quite close to, the Federal Reserve's target of 2%, which indicates that inflation is more anchored now than in the 1970s. By contrast, the NAIRU is less persistent and fluctuates over time.

4.2 Out–of–Sample Forecasting

In this subsection, we focus on out–of–sample forecasting, and compare the forecasts of our time–varying VAR model with those of a Bayesian time–varying parameter VAR with stochastic volatility (TVP–SV; cf. Primiceri, 2005) and a VAR model with constant parameters (CVAR).

Specifically, we consider 1–8 quarter ahead forecasts. That is, we forecast

$$\bar x_{t+1|t+h} = \frac{1}{h}\sum_{i=1}^{h} x_{t+i}$$

for h = 1, 2, 4 and 8, where $x_t$ includes the values of the three aforementioned macro variables at date t.
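One simple way to produce a forecast of this average target is to iterate one-step VAR forecasts forward and average them. The helper below is a sketch under the simplifying assumption that the coefficients are frozen at their time-t values; the comparison in the text uses each fitted model's own forecast rule.

```python
import numpy as np


def average_forecast(a, lags_coef, history, h):
    """h-step-ahead average forecast bar{x}_{t+1|t+h} = h^{-1} sum x_{t+i}.

    a         : (d,) intercept vector
    lags_coef : list of (d, d) lag matrices A_1, ..., A_p
    history   : most recent p observations, newest first
    """
    hist = [np.asarray(x, dtype=float) for x in history]
    path = []
    for _ in range(h):
        # One-step VAR forecast, then roll it into the lag window.
        x_next = a + sum(A @ hist[j] for j, A in enumerate(lags_coef))
        path.append(x_next)
        hist = [x_next] + hist[:-1]
    return np.mean(path, axis=0)
```

For a univariate AR(1) with coefficient 0.5 and last observation 1, the two-step forecasts are 0.5 and 0.25, so the average target forecast is 0.375.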
The forecasts of $\bar x_{t+1|t+h}$ are constructed by $\hat{\bar x}_{t+1|t+h} = \hat A_t z_t$, in which $\hat A_t$ is estimated from CVAR, TVP–SV and the time–varying VAR model using the data available at time t, and $z_t = [1, x_t^\top, x_{t-1}^\top, x_{t-2}^\top]^\top$. The expanding window scheme is adopted. For comparison, we compute the root mean square error (RMSE) for CVAR as a benchmark, and the RMSE ratios for the other methods. The out–of–sample forecast period covers 1985:Q2 to the end of the sample, about 35 years.

The forecasting results are presented in Table 1, in which the values represent the ratios of the RMSEs of the corresponding method over the RMSEs of the benchmark method (i.e., CVAR). The results show that the time–varying VAR model and TVP–SV perform much better than CVAR, which implies the desirability of introducing parameter variation in forecasting models. In addition, the time–varying VAR model forecasts better than TVP–SV as the forecast horizon increases.

Table 1: The 1–8 quarter ahead forecasts. The values represent the ratios of the RMSEs of the corresponding method over the RMSEs of the benchmark method (i.e., CVAR). In each panel, the numbers in bold font of each column indicate the method that provides the best out–of–sample forecast for the given h.

[Table 1 layout: columns h = 1Q, h = 2Q, h = 4Q, h = 8Q; panels "Inflation rate, 1985:Q1–end" and "Unemployment rate, 1985:Q1–end", each with a TVP–SV row; numerical entries omitted.]
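The ratios reported in Table 1 are straightforward to compute; the small helper below is illustrative (names are ours), with values below one indicating that a method beats the CVAR benchmark.

```python
import numpy as np


def rmse(errors):
    """Root mean square error of a sequence of forecast errors."""
    errors = np.asarray(errors, dtype=float)
    return float(np.sqrt(np.mean(errors ** 2)))


def rmse_ratio(method_errors, benchmark_errors):
    """Ratio of a method's RMSE to the benchmark (CVAR) RMSE."""
    return rmse(method_errors) / rmse(benchmark_errors)
```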
Figure 1: Plots of the inflation rate (left), the unemployment rate (middle) and the interest rate (right).
Figure 2: The estimated volatilities of the innovations in the inflation equation (left), the unemployment equation (middle) and the interest rate equation (right), as well as the associated 95% confidence intervals. The red line denotes the estimated volatilities using the constant VAR model.

Figure 3: The time–varying structural impulse response of inflation to monetary policy shocks, as well as the associated 95% confidence interval.
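Structural impulse responses such as those in Figure 3 are built from the VMA coefficients of the fitted VAR: stacking the lag matrices into the companion form Φ gives B_j = J Φ^j J′, and post-multiplying by the Cholesky impact matrix yields the structural responses. The sketch below assumes hypothetical, frozen-at-τ coefficient inputs.

```python
import numpy as np


def companion(lag_mats):
    """Stack VAR lag matrices A_1, ..., A_p into the companion matrix Phi."""
    p, d = len(lag_mats), lag_mats[0].shape[0]
    phi = np.zeros((d * p, d * p))
    phi[:d, :] = np.hstack(lag_mats)
    if p > 1:
        phi[d:, :-d] = np.eye(d * (p - 1))
    return phi


def structural_irf(lag_mats, omega, horizon):
    """Structural responses Theta_j = J Phi^j J' chol(Omega), j = 0..horizon,
    with the selection matrix J = [I_d, 0, ..., 0]."""
    d = lag_mats[0].shape[0]
    phi = companion(lag_mats)
    J = np.hstack([np.eye(d), np.zeros((d, d * (len(lag_mats) - 1)))])
    b0 = np.linalg.cholesky(omega)
    power = np.eye(phi.shape[0])
    out = []
    for _ in range(horizon + 1):
        out.append(J @ power @ J.T @ b0)
        power = power @ phi
    return out
```

For a univariate AR(1) with coefficient 0.5 and innovation variance 4, the responses at horizons 0, 1, 2 are 2, 1 and 0.5.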
Figure 4: The estimated trends of inflation and the NAIRU, as well as the associated 95% confidence intervals.
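The trend estimates in Figure 4 come from inverting the fitted lag polynomial at each date, μ_t = (I − A_1 − ... − A_p)^{-1} a_t. A minimal sketch (hypothetical inputs; requires the lag polynomial to be stable):

```python
import numpy as np


def var_trend(intercept, lag_mats):
    """Long-run level mu_t = (I - A_1 - ... - A_p)^{-1} a_t implied by the
    VAR coefficients at time t."""
    d = lag_mats[0].shape[0]
    return np.linalg.solve(np.eye(d) - sum(lag_mats), np.asarray(intercept))
```

For a univariate AR(1) with intercept 1 and coefficient 0.5, the implied long-run level is 1/(1 − 0.5) = 2.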
5 Conclusion

This paper has proposed a class of time–varying VMA(∞) models, which nest, for instance, time–varying VAR and time–varying VARMA models as special cases. Both the estimation methodology and the asymptotic theory have been established accordingly. In the empirical study, we have investigated the transmission mechanism of monetary policy using U.S. data, and uncovered a fall in the volatilities of exogenous shocks. Our findings include: (i) monetary policy shocks have less influence on inflation before and during the so–called Great Moderation; (ii) inflation is more anchored recently; and (iii) the long–run level of inflation is below, but quite close to, the Federal Reserve's target of two percent after the beginning of the Great Moderation period. In addition, in Appendix B of the online supplementary material, we have evaluated the finite–sample performance of the proposed model and estimation theory.

There are several directions for possible extensions. The first is to test whether the d–dimensional components of the VAR(p) process are cross–sectionally independent in the case where the dimensionality, d, and the number of lags, p, may diverge along with the sample size, T. The second is model specification testing, to check whether some of the time–varying coefficient matrices, A_j(τ), may in fact be constant matrices, A_j. Existing studies by Gao and Gijbels (2008), Pan et al. (2014), and Chen and Wu (2019) may be useful for both issues. The third is to allow for a cointegrated structure in our settings; the recent work by Zhang et al. (2019) provides a good reference. We leave these issues for future study.

References
Benati, L. and Surico, P. (2009), ‘VAR analysis and the great moderation’, American Economic Review (4), 1636–1652.
Beveridge, S. and Nelson, C. R. (1981), ‘A new approach to decomposition of economic time series into permanent and transitory components with particular attention to measurement of the business cycle’, Journal of Monetary Economics (2), 151–174.
Bühlmann, P. et al. (1998), ‘Sieve bootstrap for smoothing in nonstationary time series’, Annals of Statistics (1), 48–83.
Cai, Z. (2007), ‘Trending time–varying coefficient time series models with serially correlated errors’, Journal of Econometrics (2), 163–188.
Chen, J., Gao, J. and Li, D. (2012), ‘Semiparametric trending panel data models with cross-sectional dependence’, Journal of Econometrics (1), 71–85.
Chen, L. and Wu, W. B. (2019), ‘Testing for trends in high-dimensional time series’, Journal of the American Statistical Association (526), 869–881.
Chu, C.-K. and Marron, J. S. (1991), ‘Comparison of two bandwidth selectors with dependent errors’, Annals of Statistics (4), 1906–1918.
Cogley, T., Primiceri, G. E. and Sargent, T. J. (2010), ‘Inflation-gap persistence in the U.S.’, American Economic Journal: Macroeconomics (1), 43–69.
Cogley, T. and Sargent, T. J. (2005), ‘Drifts and volatilities: Monetary policies and outcomes in the post World War II U.S.’, Review of Economic Dynamics (2), 262–302.
Coibion, O. and Gorodnichenko, Y. (2011), ‘Monetary policy, trend inflation, and the great moderation: An alternative interpretation’, American Economic Review (1), 341–370.
Dahlhaus, R. (1996), ‘On the Kullback–Leibler information divergence of locally stationary processes’, Stochastic Processes and Their Applications (1), 139–168.
Dahlhaus, R. (2012), Locally stationary processes, in ‘Handbook of Statistics’, Vol. 30, Elsevier, pp. 351–413.
Dahlhaus, R., Neumann, M. H. and Von Sachs, R. (1999), ‘Nonlinear wavelet estimation of time-varying autoregressive processes’, Bernoulli (5), 873–906.
Dahlhaus, R. and Polonik, W. (2009), ‘Empirical spectral processes for locally stationary time series’, Bernoulli (1), 1–39.
Dahlhaus, R. and Rao, S. S. (2006), ‘Statistical inference for time-varying ARCH processes’, Annals of Statistics (3), 1075–1114.
Dahlhaus, R., Richter, S. and Wu, W. B. (2019), ‘Towards a general theory for nonlinear locally stationary processes’, Bernoulli (2), 1013–1044.
Fan, J. and Yao, Q. (2003), Nonlinear Time Series: Parametric and Nonparametric Methods, Springer–Verlag, New York.
Freedman, D. A. (1975), ‘On tail probabilities for martingales’, Annals of Probability (1), 100–118.
Friedrich, M., Smeekes, S. and Urbain, J.-P. (2020), ‘Autoregressive wild bootstrap inference for nonparametric trends’, Journal of Econometrics (1), 81–109.
Gao, J. (2007), Nonlinear Time Series: Semi– and Non–Parametric Methods, Chapman & Hall/CRC, London.
Gao, J. and Gijbels, I. (2008), ‘Bandwidth selection in nonparametric kernel testing’, Journal of the American Statistical Association (484), 1584–1594.
Giannone, D., Lenza, M. and Primiceri, G. E. (2019), ‘Priors for the long run’, Journal of the American Statistical Association (526), 565–580.
Giraitis, L., Kapetanios, G. and Yates, T. (2014), ‘Inference on stochastic time-varying coefficient models’, Journal of Econometrics (1), 46–65.
Giraitis, L., Kapetanios, G. and Yates, T. (2018), ‘Inference on multivariate heteroscedastic time varying random coefficient models’, Journal of Time Series Analysis (2), 129–149.
Hall, P. and Heyde, C. C. (1980), Martingale Limit Theory and Its Application, Academic Press.
Hamilton, J. D. (1994), Time Series Analysis, Vol. 2, Princeton University Press, New Jersey.
Hansen, B. E. (2001), ‘The new econometrics of structural change: dating breaks in US labour productivity’, Journal of Economic Perspectives (4), 117–128.
Hansen, B. E. (2008), ‘Uniform convergence rates for kernel estimation with dependent data’, Econometric Theory (3), 726–748.
Härdle, W., Liang, H. and Gao, J. (2000), Partially Linear Models, Springer-Verlag, New York.
Justiniano, A. and Primiceri, G. E. (2008), ‘The time-varying volatility of macroeconomic fluctuations’, American Economic Review (3), 604–641.
Li, D., Chen, J. and Gao, J. (2011), ‘Nonparametric time–varying coefficient panel data models with fixed effects’, Econometrics Journal (3), 387–408.
Li, D., Phillips, P. C. B. and Gao, J. (2016), ‘Uniform consistency of nonstationary kernel-weighted sample covariances for nonparametric regression’, Econometric Theory (3), 655–685.
Li, Q. and Racine, J. (2007), Nonparametric Econometrics: Theory and Practice, Princeton University Press, New Jersey.
Lütkepohl, H. (2005), New Introduction to Multiple Time Series Analysis, Springer Science & Business Media.
Pan, G., Gao, J. and Yang, Y. (2014), ‘Testing independence for a large number of high dimensional random vectors’, Journal of the American Statistical Association (506), 600–612.
Paul, P. (2019), ‘The time-varying effect of monetary policy on asset prices’, Review of Economics and Statistics, forthcoming.
Petrova, K. (2019), ‘A quasi-Bayesian local likelihood approach to time varying parameter VAR models’, Journal of Econometrics (1), 286–306.
Phillips, P. C. B. and Solo, V. (1992), ‘Asymptotics for linear processes’, Annals of Statistics (2), 971–1001.
Primiceri, G. E. (2005), ‘Time varying structural vector autoregressions and monetary policy’, Review of Economic Studies (3), 821–852.
Richter, S., Dahlhaus, R. et al. (2019), ‘Cross validation for locally stationary processes’, Annals of Statistics (4), 2145–2173.
Robinson, P. M. (1989), ‘Nonparametric estimation of time-varying parameters’, in Statistical Analysis and Forecasting of Economic Structural Change, pp. 253–264.
Rohan, N. and Ramanathan, T. (2013), ‘Nonparametric estimation of a time-varying GARCH model’, Journal of Nonparametric Statistics (1), 33–52.
Shao, X. (2010), ‘The dependent wild bootstrap’, Journal of the American Statistical Association (489), 218–235.
Sims, C. A. (1980), ‘Macroeconomics and reality’, Econometrica (1), 1–48.
Sims, C. and Zha, T. (2006), ‘Were there regime switches in U.S. monetary policy?’, American Economic Review (1), 54–81.
Stock, J. H. and Watson, M. W. (2001), ‘Vector autoregressions’, Journal of Economic Perspectives (4), 101–115.
Stock, J. H. and Watson, M. W. (2016a), ‘Core inflation and trend inflation’, Review of Economics and Statistics (4), 770–784.
Stock, J. H. and Watson, M. W. (2016b), Dynamic factor models, factor-augmented vector autoregressions, and structural vector autoregressions in macroeconomics, in ‘Handbook of Macroeconomics’, Vol. 2, Elsevier, pp. 415–525.
Truquet, L. (2017), ‘Parameter stability and semiparametric inference in time varying auto-regressive conditional heteroscedasticity models’, Journal of the Royal Statistical Society: Series B (5), 1391–1414.
Tsay, R. S. (1998), ‘Testing and modeling multivariate threshold models’, Journal of the American Statistical Association (443), 1188–1202.
Vogt, M. (2012), ‘Nonparametric regression for locally stationary time series’, Annals of Statistics (5), 2601–2633.
Wang, Q. and Chan, N. (2014), ‘Uniform convergence rates for a class of martingales with application in non-linear cointegrating regression’, Bernoulli (1), 207–230.
Wu, W. B. and Zhao, Z. (2007), ‘Inference of trends in time series’, Journal of the Royal Statistical Society: Series B (3), 391–410.
Zhang, R., Robinson, P. and Yao, Q. (2019), ‘Identifying cointegration by eigenanalysis’, Journal of the American Statistical Association (526), 916–927.
Zhang, T. and Wu, W. B. (2012), ‘Inference of time-varying regression models’, Annals of Statistics (3), 1376–1402.

Appendix A

For the sake of presentation, we first provide some notation and mathematical symbols in Appendix A.1, and then present the dependent wild bootstrap (DWB) procedure with the associated asymptotic properties in Appendix A.2. Some proofs of the main results are provided in Appendix A.3. Simulations, some secondary results and omitted proofs are given in the online supplementary Appendix B. In what follows, M and O(1) always stand for constants, which may be different at each appearance.

A.1 Notation and Mathematical Symbols
For ease of notation, we define three matrices $\Sigma(\tau)$, $V(\tau)$ and $\Phi(\tau)$ together with their respective estimators. First, for $\forall\tau\in(0,1)$, let

$$\Sigma(\tau) = \begin{pmatrix} 1 & \mu^\top(\tau) & \cdots & \mu^\top(\tau) \\ \mu(\tau) & \Sigma_0(\tau) & \cdots & \Sigma_{p-1}^\top(\tau) \\ \vdots & \vdots & \ddots & \vdots \\ \mu(\tau) & \Sigma_{p-1}(\tau) & \cdots & \Sigma_0(\tau) \end{pmatrix}, \qquad (A.1)$$

in which $\mu(\tau)$ and $B_j(\tau)$ are defined in Proposition 3.1, and $\Sigma_m(\tau) = \mu(\tau)\mu^\top(\tau) + \sum_{j=0}^{\infty} B_j(\tau)B_{j+m}^\top(\tau)$ for $m = 0,\ldots,p-1$. We define the estimator of $\Sigma(\tau)$ as

$$\widehat\Sigma(\tau) = \frac{1}{T}\sum_{t=1}^{T} z_{t-1}z_{t-1}^\top K_h(\tau_t-\tau), \qquad (A.2)$$

where $z_t$ is defined in (3.5).

Next, we let

$$V(\tau) = \begin{pmatrix} V_{1,1}(\tau) & V_{2,1}^\top(\tau) \\ V_{2,1}(\tau) & V_{2,2}(\tau) \end{pmatrix}, \qquad (A.3)$$

where

$$V_{1,1}(\tau) = \tilde v_0\,\Sigma^{-1}(\tau)\otimes\Omega(\tau),$$
$$V_{2,1}(\tau) = \lim_{T\to\infty}\frac{h}{T}\sum_{t=1}^{T} E\big(\mathrm{vech}(\eta_t\eta_t^\top)\eta_t^\top Z_{t-1}^\top\big)K_h^2(\tau_t-\tau)\cdot\big(\Sigma^{-1}(\tau)\otimes I_d\big),$$
$$V_{2,2}(\tau) = \lim_{T\to\infty}\frac{h}{T}\sum_{t=1}^{T} E\big(\mathrm{vech}(\eta_t\eta_t^\top)\mathrm{vech}(\eta_t\eta_t^\top)^\top\big)K_h^2(\tau_t-\tau) - \tilde v_0\,\mathrm{vech}(\Omega(\tau))\,\mathrm{vech}(\Omega(\tau))^\top. \qquad (A.4)$$

The estimator of $V(\tau)$ is then defined as

$$\widehat V(\tau) = \begin{pmatrix} \widehat V_{1,1}(\tau) & \widehat V_{2,1}^\top(\tau) \\ \widehat V_{2,1}(\tau) & \widehat V_{2,2}(\tau) \end{pmatrix}, \qquad (A.5)$$

where $\widehat V_{1,1}(\tau)$, $\widehat V_{2,1}(\tau)$ and $\widehat V_{2,2}(\tau)$ have forms identical to their counterparts in (A.4), but with $\Sigma(\tau)$, $\eta_t$ and $\Omega(\tau)$ replaced by their estimators presented in (A.2) and (3.6).

Finally, recall the following definition:

$$\Phi(\tau) = \begin{pmatrix} A_1(\tau) & \cdots & A_{p-1}(\tau) & A_p(\tau) \\ I_d & \cdots & 0_d & 0_d \\ \vdots & \ddots & \vdots & \vdots \\ 0_d & \cdots & I_d & 0_d \end{pmatrix}. \qquad (A.6)$$

Replacing the $A_j(\tau)$'s in (A.6) with their estimators obtained from (3.6) yields the estimator $\widehat\Phi(\tau)$ straightaway.

A.2 Dependent Wild Bootstrap
We now present the detailed dependent wild bootstrap (DWB) procedure, which is used to establish the confidence interval associated with Theorem 2.2.

1. For $\forall\tau\in(0,1)$, let $\widetilde\mu(\tau)$ be the same as defined in (2.8) using an over–smoothing bandwidth $\tilde h$. Obtain the residuals $\widehat e_t = x_t - \widetilde\mu(\tau_t)$ for $t\ge 1$.

2. Generate the bootstrap sample $x_t^* = \widetilde\mu(\tau_t) + e_t^*$ for $t\ge 1$, where $e_t^* = \xi_t^*\widehat e_t$, and $\{\xi_t^*\}$ is an $l$-dependent process satisfying $E(\xi_t^*) = 0$, $E|\xi_t^*|^2 = 1$, $E|\xi_t^*|^{\delta} < \infty$ with $\delta > 2$, and $E(\xi_t^*\xi_s^*) = a((t-s)/l)$ with some kernel function $a(\cdot)$ and tuning parameter $l$.

3. For $\forall\tau\in(0,1)$, compute $\widehat\mu^*(\tau)$ in the same way as (2.8) but using $\{x_t^*\}$.

4. Repeat Steps 2–3 $J$ times. Let $q_\alpha(\tau)$ be the $\alpha$–quantile of the $J$ statistics $\widehat\mu^*(\tau) - \widetilde\mu(\tau)$, and construct the $(1-\alpha)$ confidence interval of $\mu(\tau)$ as $\big[\widehat\mu(\tau) - q_{1-\alpha/2}(\tau),\ \widehat\mu(\tau) - q_{\alpha/2}(\tau)\big]$.

The above DWB procedure requires a tuning parameter $l$, which is the so–called "block length" (Shao, 2010). The following conditions are required to ensure the validity of the DWB procedure.

Assumption A.1.
Suppose that $l\to\infty$, $\max\{\tilde h, h/\tilde h\}\to 0$ and $l\cdot\max\big\{1/\sqrt{Th},\ \tilde h^2,\ 1/(T\tilde h)\big\}\to 0$. Additionally, let $a(\cdot)$ be a symmetric kernel defined on $[-1,1]$ satisfying $a(0) = 1$, and let $a(\cdot)$ be continuous at $0$ with $a^{(1)}(0) < \infty$.

We summarize the asymptotic properties of the DWB method in the next theorem.
Theorem A.1.
Let Assumptions 2–4 and A.1 hold. For $\forall\tau\in(0,1)$, as $T\to\infty$,

1. $\sup_{w\in\mathbb R^d}\Big|\Pr^*\big[\sqrt{Th}(\widehat\mu^*(\tau)-\widetilde\mu(\tau))\le w\big] - \Pr\big[\sqrt{Th}(\widehat\mu(\tau)-\mu(\tau))\le w\big]\Big| = o_P(1)$,

2. $\liminf_{T\to\infty}\Pr\big(\mu(\tau)\in\big[\widehat\mu(\tau)-q_{1-\alpha/2}(\tau),\ \widehat\mu(\tau)-q_{\alpha/2}(\tau)\big]\big) = 1-\alpha$,

where $\Pr^*$ denotes the probability measure induced by the DWB procedure.

We comment on some practical issues below.

Bandwidth Selection:
Since the observations are dependent, we use the modified cross–validation criterion proposed by Chu and Marron (1991). Specifically, it is a "leave-(2k+1)-out" version of cross–validation, and $h_{mcv}$ is selected to minimize the following objective function:

$$h_{mcv} = \arg\min_h \sum_{t=1}^{T}\big(x_t - \widehat\mu_{k,h}(\tau_t)\big)^\top\big(x_t - \widehat\mu_{k,h}(\tau_t)\big),$$

where $\widehat\mu_{k,h}(\tau) = \Big(\sum_{t:\,|t-\tau T|>k} K\big(\frac{\tau_t-\tau}{h}\big)\Big)^{-1}\sum_{t:\,|t-\tau T|>k} x_t K\big(\frac{\tau_t-\tau}{h}\big)$. Moreover, in the first step of the bootstrap procedure, we follow the suggestion of Bühlmann et al. (1998) and use the over–smoothing bandwidth $\tilde h = c\cdot h_{mcv}^{5/9}$ with $c = 2$.

Tuning parameter:
In the second step of the bootstrap procedure, we choose the kernel function a ( · ) and the bandwidth l as in Shao (2010). A.3 Proofs of the Main Results
Proof of Theorem 2.1.

(1). By Lemma B.3, we have

$$x_t = \mu_t + B_t(1)\varepsilon_t + \widetilde B_t(L)\varepsilon_{t-1} - \widetilde B_t(L)\varepsilon_t,$$

where $B_t(1)$ and $\widetilde B_t(L)$ have been defined in equation (2.2). We are then able to write

$$\sup_{\tau\in[a,b]}\Big\|\sum_{t=1}^{T} W_{T,t}(\tau)\big(x_t - E(x_t)\big)\Big\| \le \sup_{\tau\in[a,b]}\Big\|\sum_{t=1}^{T} W_{T,t}(\tau)B_t(1)\varepsilon_t\Big\| + \sup_{\tau\in[a,b]}\big\|W_{T,1}(\tau)\widetilde B_1(L)\varepsilon_1\big\| + \sup_{\tau\in[a,b]}\big\|W_{T,T}(\tau)\widetilde B_T(L)\varepsilon_T\big\| + \sup_{\tau\in[a,b]}\Big\|\sum_{t=1}^{T-1}\big(W_{T,t+1}(\tau)\widetilde B_{t+1}(L) - W_{T,t}(\tau)\widetilde B_t(L)\big)\varepsilon_t\Big\| := I_{T,1} + I_{T,2} + I_{T,3} + I_{T,4},$$

where the definitions of $I_{T,j}$ for $j = 1,\ldots,4$ should be obvious. By Lemma B.6, $I_{T,1} = O_P\big(\sqrt{d_T\log T}\big)$. Also, it is easy to see that $I_{T,2} = O_P(d_T)$ and $I_{T,3} = O_P(d_T)$, because $E\|\widetilde B_1(L)\varepsilon_1\| < \infty$ and $E\|\widetilde B_T(L)\varepsilon_T\| < \infty$ in view of the fact that

$$\|\widetilde B_1(1)\| \le \sum_{j=0}^{\infty}\|\widetilde B_{j,1}\| < \infty \quad\text{and}\quad \|\widetilde B_T(1)\| \le \sum_{j=0}^{\infty}\|\widetilde B_{j,T}\| < \infty$$

by Lemma B.3. Thus, we just need to consider $I_{T,4}$ below. Note that:

(1) $\sum_{t=1}^{T-1}\|\widetilde B_{t+1}(1) - \widetilde B_t(1)\| = O(1)$ by Lemma B.3;

(2) $T^{2/\delta}d_T/\log T \to 0$ and $\sup_{\tau\in[a,b]}\sum_{t=1}^{T-1}\|W_{T,t+1}(\tau) - W_{T,t}(\tau)\| = O(d_T)$ by the conditions in the body of this theorem;

(3) $\max_{1\le t\le T-1}\|\widetilde B_{t+1}(L)\varepsilon_t\| = O_P(T^{1/\delta})$ by $E\|\widetilde B_{t+1}(L)\varepsilon_t\|^\delta < \infty$ and

$$\max_{1\le t\le T-1}\|\widetilde B_{t+1}(L)\varepsilon_t\| \le \Big(\sum_{t=1}^{T-1}\|\widetilde B_{t+1}(L)\varepsilon_t\|^\delta\Big)^{1/\delta} = O_P(T^{1/\delta}).$$

Hence, write

$$\sup_{\tau\in[a,b]}\Big\|\sum_{t=1}^{T-1}\big(W_{T,t+1}(\tau)\widetilde B_{t+1}(L) - W_{T,t}(\tau)\widetilde B_t(L)\big)\varepsilon_t\Big\| = \sup_{\tau\in[a,b]}\Big\|\sum_{t=1}^{T-1}\big(W_{T,t+1}(\tau) - W_{T,t}(\tau)\big)\widetilde B_{t+1}(L)\varepsilon_t + W_{T,t}(\tau)\big(\widetilde B_{t+1}(L) - \widetilde B_t(L)\big)\varepsilon_t\Big\|$$
$$\le \max_{1\le t\le T-1}\|\widetilde B_{t+1}(L)\varepsilon_t\|\cdot\sup_{\tau\in[a,b]}\sum_{t=1}^{T-1}\|W_{T,t+1}(\tau) - W_{T,t}(\tau)\| + \sup_{\tau\in[a,b],\,1\le t\le T}\|W_{T,t}(\tau)\|\cdot\sum_{t=1}^{T-1}\|\big(\widetilde B_{t+1}(L) - \widetilde B_t(L)\big)\varepsilon_t\| = O_P\big(T^{1/\delta}d_T\big) + O_P(d_T) = o_P\big(\sqrt{d_T\log T}\big).$$

The first result then follows.

(2). Below, we consider $p = 0$ only.
The cases with fixed $p\ge 1$ follow similarly. Write

$$\sup_{\tau\in[a,b]}\Big\|\sum_{t=1}^{T}\mathrm{vec}\big(W_{T,t}(\tau)(x_tx_t^\top - E(x_tx_t^\top))\big)\Big\| \le 2\sup_{\tau\in[a,b]}\Big\|\sum_{t=1}^{T}\big(I_d\otimes W_{T,t}(\tau)\big)\sum_{j=0}^{\infty}\big(B_{j,t}\otimes\mu_t\big)\varepsilon_{t-j}\Big\|$$
$$+ \sup_{\tau\in[a,b]}\Big\|\sum_{t=1}^{T}\big(I_d\otimes W_{T,t}(\tau)\big)\sum_{j=0}^{\infty}\big(B_{j,t}\otimes B_{j,t}\big)\big(\mathrm{vec}(\varepsilon_{t-j}\varepsilon_{t-j}^\top) - \mathrm{vec}(I_d)\big)\Big\|$$
$$+ 2\sup_{\tau\in[a,b]}\Big\|\sum_{t=1}^{T}\big(I_d\otimes W_{T,t}(\tau)\big)\sum_{r=1}^{\infty}\sum_{j=0}^{\infty}\big(B_{j+r,t}\otimes B_{j,t}\big)\mathrm{vec}\big(\varepsilon_{t-j}\varepsilon_{t-j-r}^\top\big)\Big\| := 2I_{T,5} + I_{T,6} + 2I_{T,7},$$

wherein $I_{T,5} = O_P\big(\sqrt{d_T\log T}\big)$ by a proof similar to that of the first result of this theorem.

Consider $I_{T,6}$. Using Lemma B.3, write

$$I_{T,6} \le \sup_{\tau\in[a,b]}\Big\|\sum_{t=1}^{T}\big(I_d\otimes W_{T,t}(\tau)\big)B_t(1)\big(\mathrm{vec}(\varepsilon_t\varepsilon_t^\top) - \mathrm{vec}(I_d)\big)\Big\| + \sup_{\tau\in[a,b]}\big\|\big(I_d\otimes W_{T,1}(\tau)\big)\widetilde B_1(L)\mathrm{vec}\big(\varepsilon_1\varepsilon_1^\top\big)\big\| + \sup_{\tau\in[a,b]}\big\|\big(I_d\otimes W_{T,T}(\tau)\big)\widetilde B_T(L)\mathrm{vec}\big(\varepsilon_T\varepsilon_T^\top\big)\big\|$$
$$+ \sup_{\tau\in[a,b]}\Big\|\sum_{t=1}^{T-1}\big((I_d\otimes W_{T,t+1}(\tau))\widetilde B_{t+1}(L) - (I_d\otimes W_{T,t}(\tau))\widetilde B_t(L)\big)\mathrm{vec}\big(\varepsilon_t\varepsilon_t^\top\big)\Big\| := I_{T,8} + I_{T,9} + I_{T,10} + I_{T,11}.$$

By Lemma B.6, we have $I_{T,8} = O_P\big(\sqrt{d_T\log T}\big)$. Also, $I_{T,9} = O_P(d_T)$ and $I_{T,10} = O_P(d_T)$, because $\|\widetilde B_1(1)\| < \infty$ and $\|\widetilde B_T(1)\| < \infty$ by Lemma B.3. Similar to the proof of the first result, for $I_{T,11}$ we write

$$\sup_{\tau\in[a,b]}\Big\|\sum_{t=1}^{T-1}\big((I_d\otimes W_{T,t+1}(\tau))\widetilde B_{t+1}(L) - (I_d\otimes W_{T,t}(\tau))\widetilde B_t(L)\big)\mathrm{vec}\big(\varepsilon_t\varepsilon_t^\top\big)\Big\|$$
$$\le \sqrt{d}\sup_{\tau\in[a,b],\,1\le t\le T}\|W_{T,t+1}(\tau)\|\cdot\sum_{t=1}^{T-1}\big\|\big(\widetilde B_{t+1}(L) - \widetilde B_t(L)\big)\mathrm{vec}\big(\varepsilon_t\varepsilon_t^\top\big)\big\| + \sqrt{d}\max_t\big\|\widetilde B_t(L)\mathrm{vec}\big(\varepsilon_t\varepsilon_t^\top\big)\big\|\cdot\sup_{\tau\in[a,b]}\sum_{t=1}^{T-1}\|W_{T,t+1}(\tau) - W_{T,t}(\tau)\| = o_P\big(\sqrt{d_T\log T}\big),$$

where we have used the facts that

(1) $T^{4/\delta}d_T/\log T \to 0$;

(2) $\max_{t\ge 1}\big\|\widetilde B_t(L)\mathrm{vec}\big(\varepsilon_t\varepsilon_t^\top\big)\big\| = O_P(T^{2/\delta})$;

(3) $\sup_{\tau\in[a,b]}\sum_{t=1}^{T-1}\|W_{T,t+1}(\tau) - W_{T,t}(\tau)\| = O(d_T)$;

(4) $\sum_{t=1}^{T-1}\big\|\widetilde B_{t+1}(1) - \widetilde B_t(1)\big\| = O(1)$.

Then we can conclude that $I_{T,6} = O_P\big(\sqrt{d_T\log T}\big)$. We now consider $I_{T,7}$.
Using Lemma B.3, we have

$$I_{T,7} \le \sup_{\tau\in[a,b]}\Big\|\sum_{t=1}^{T}\big(I_d\otimes W_{T,t}(\tau)\big)\zeta_t\varepsilon_t\Big\| + \sup_{\tau\in[a,b]}\Big\|\big(I_d\otimes W_{T,1}(\tau)\big)\sum_{r=1}^{\infty}\widetilde B_{r,1}(L)\mathrm{vec}\big(\varepsilon_1\varepsilon_{1-r}^\top\big)\Big\| + \sup_{\tau\in[a,b]}\Big\|\big(I_d\otimes W_{T,T}(\tau)\big)\sum_{r=1}^{\infty}\widetilde B_{r,T}(L)\mathrm{vec}\big(\varepsilon_T\varepsilon_{T-r}^\top\big)\Big\|$$
$$+ \sup_{\tau\in[a,b]}\Big\|\sum_{t=1}^{T-1}\sum_{r=1}^{\infty}\big((I_d\otimes W_{T,t+1}(\tau))\widetilde B_{r,t+1}(L) - (I_d\otimes W_{T,t}(\tau))\widetilde B_{r,t}(L)\big)\mathrm{vec}\big(\varepsilon_t\varepsilon_{t-r}^\top\big)\Big\| := I_{T,12} + I_{T,13} + I_{T,14} + I_{T,15},$$

where $\zeta_t$ is defined in Lemma B.6.

By Lemma B.6, $I_{T,12} = O_P\big(\sqrt{d_T\log T}\big)$. Moreover, $I_{T,13} = O_P(d_T)$ and $I_{T,14} = O_P(d_T)$, because $\sum_{r=1}^{\infty}\|\widetilde B_{r,1}(1)\| < \infty$ and $\sum_{r=1}^{\infty}\|\widetilde B_{r,T}(1)\| < \infty$ by Lemma B.3. For $I_{T,15}$, we write

$$\sup_{\tau\in[a,b]}\Big\|\sum_{t=1}^{T-1}\sum_{r=1}^{\infty}\big((I_d\otimes W_{T,t+1}(\tau))\widetilde B_{r,t+1}(L) - (I_d\otimes W_{T,t}(\tau))\widetilde B_{r,t}(L)\big)\mathrm{vec}\big(\varepsilon_t\varepsilon_{t-r}^\top\big)\Big\|$$
$$\le \sqrt{d}\sup_{\tau\in[a,b],\,1\le t\le T}\|W_{T,t}(\tau)\|\cdot\sum_{t=1}^{T-1}\Big\|\sum_{r=1}^{\infty}\big(\widetilde B_{r,t+1}(L) - \widetilde B_{r,t}(L)\big)\mathrm{vec}\big(\varepsilon_t\varepsilon_{t-r}^\top\big)\Big\| + \sqrt{d}\max_t\Big\|\sum_{r=1}^{\infty}\widetilde B_{r,t}(L)\mathrm{vec}\big(\varepsilon_t\varepsilon_{t-r}^\top\big)\Big\|\cdot\sup_{\tau\in[a,b]}\sum_{t=1}^{T-1}\|W_{T,t+1}(\tau) - W_{T,t}(\tau)\| = o_P\big(\sqrt{d_T\log T}\big),$$

where we have used the facts that

(1) $T^{4/\delta}d_T/\log T \to 0$;

(2) $\max_t\big\|\sum_{r=1}^{\infty}\widetilde B_{r,t}(L)\mathrm{vec}\big(\varepsilon_t\varepsilon_{t-r}^\top\big)\big\| = O_P(T^{2/\delta})$;

(3) $\sum_{t=1}^{T-1}\sum_{r=1}^{\infty}\|\widetilde B_{r,t+1}(1) - \widetilde B_{r,t}(1)\| = O(1)$;

(4) $\sup_{\tau\in[a,b]}\sum_{t=1}^{T-1}\|W_{T,t+1}(\tau) - W_{T,t}(\tau)\| = O(d_T)$.

Based on the above development, the proof of the case with $p = 0$ is complete. The proof is now completed.

Proof of Theorem 2.2.
Write

$$\frac{1}{\sqrt{Th}}\sum_{t=1}^{T}\big(x_t - \mu(\tau_t)\big)K\Big(\frac{\tau_t-\tau}{h}\Big) = \frac{1}{\sqrt{Th}}\sum_{t=1}^{T} B_t(1)\varepsilon_t K\Big(\frac{\tau_t-\tau}{h}\Big) + \frac{1}{\sqrt{Th}}\widetilde B_1(L)\varepsilon_1 K\Big(\frac{\tau_1-\tau}{h}\Big) - \frac{1}{\sqrt{Th}}\widetilde B_T(L)\varepsilon_T K\Big(\frac{\tau_T-\tau}{h}\Big)$$
$$+ \frac{1}{\sqrt{Th}}\sum_{t=1}^{T-1}\Big(\widetilde B_{t+1}(L)K\Big(\frac{\tau_{t+1}-\tau}{h}\Big) - \widetilde B_t(L)K\Big(\frac{\tau_t-\tau}{h}\Big)\Big)\varepsilon_t = \frac{1}{\sqrt{Th}}\sum_{t=1}^{T} B_t(1)\varepsilon_t K\Big(\frac{\tau_t-\tau}{h}\Big) + o_P(1),$$

where the second equality follows from arguments similar to those in the proof of Theorem 2.1.

For the bias term, we have for any $\tau\in(0,1)$,

$$\frac{1}{Th}\sum_{t=1}^{T}\mu(\tau_t)K\Big(\frac{\tau_t-\tau}{h}\Big) = \mu(\tau) + \frac{1}{2}h^2\tilde c\,\mu^{(2)}(\tau) + o(h^2) + O\Big(\frac{1}{Th}\Big).$$

Since

$$\mathrm{Var}\Big(\frac{1}{\sqrt{Th}}\sum_{t=1}^{T} B_t(1)\varepsilon_t K\Big(\frac{\tau_t-\tau}{h}\Big)\Big) = \frac{1}{Th}\sum_{t=1}^{T} B_t(1)B_t^\top(1)K^2\Big(\frac{\tau_t-\tau}{h}\Big) \to \tilde v_0\sum_{j=0}^{\infty} B_j(\tau)\sum_{j=0}^{\infty} B_j^\top(\tau),$$

we then use the Cramér–Wold device to prove its asymptotic normality. That is, we show that for any conformable vector $l$,

$$\frac{1}{\sqrt{Th}}\sum_{t=1}^{T} l^\top B_t(1)\varepsilon_t K\Big(\frac{\tau_t-\tau}{h}\Big) \to_D N\Big(0,\ \tilde v_0\, l^\top\sum_{j=0}^{\infty} B_j(\tau)\sum_{j=0}^{\infty} B_j^\top(\tau)\, l\Big).$$

Let $Z_t(\tau) = \frac{1}{\sqrt{Th}}\, l^\top B_t(1)\varepsilon_t K\big(\frac{\tau_t-\tau}{h}\big)$. By the law of large numbers for martingale differences and the assumption $E\big(\varepsilon_t\varepsilon_t^\top\,|\,\mathcal F_{t-1}\big) = I_d$ a.s., we have for any $\tau\in(0,1)$,

$$\sum_{t=1}^{T} E\big(Z_t^2(\tau)\,|\,\mathcal F_{t-1}\big) \to_P \tilde v_0\, l^\top\sum_{j=0}^{\infty} B_j(\tau)\sum_{j=0}^{\infty} B_j^\top(\tau)\, l.$$

Furthermore, for any $\nu > 0$ and $\tau\in(0,1)$,

$$\sum_{t=1}^{T} E\big(Z_t^2(\tau) I(|Z_t(\tau)| > \nu)\big) \le \frac{1}{Th}\sum_{t=1}^{T} K^2\Big(\frac{\tau_t-\tau}{h}\Big)\big(E|l^\top B_t(1)\varepsilon_t|^\delta\big)^{2/\delta}\Big(\frac{E|l^\top B_t(1)\varepsilon_t|^\delta}{(Th)^{\delta/2}\nu^\delta}\Big)^{(\delta-2)/\delta} = O\Big(\frac{1}{(Th)^{(\delta-2)/2}}\Big) = o(1).$$

By Lemma B.1, the proof is now completed.
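In practice, the limit distribution established here is approximated by the DWB procedure of Appendix A.2. The sketch below is a univariate illustration with convenient choices that are ours, not the paper's: the Bartlett kernel for a(·), Gaussian multipliers drawn via a Cholesky factor of the implied covariance, and a small jitter for numerical stability.

```python
import numpy as np


def dwb_multipliers(T, l, rng):
    """l-dependent N(0,1) multipliers with Cov(xi_t, xi_s) = a((t-s)/l),
    using the Bartlett kernel a(u) = max(1 - |u|, 0)."""
    idx = np.arange(T)
    cov = np.maximum(1.0 - np.abs(idx[:, None] - idx[None, :]) / l, 0.0)
    chol = np.linalg.cholesky(cov + 1e-8 * np.eye(T))  # jitter for stability
    return chol @ rng.standard_normal(T)


def dwb_ci(x, mu_tilde, mu_hat_fn, l, B, alpha, rng):
    """Pointwise DWB confidence band for a univariate trend estimate.

    mu_tilde  : over-smoothed fit used to form residuals (Step 1)
    mu_hat_fn : estimator applied to each bootstrap sample (Step 3)
    """
    resid = x - mu_tilde
    stats = []
    for _ in range(B):
        x_star = mu_tilde + dwb_multipliers(len(x), l, rng) * resid  # Step 2
        stats.append(mu_hat_fn(x_star) - mu_tilde)
    stats = np.asarray(stats)
    q_hi = np.quantile(stats, 1.0 - alpha / 2.0, axis=0)  # Step 4 quantiles
    q_lo = np.quantile(stats, alpha / 2.0, axis=0)
    mu_hat = mu_hat_fn(x)
    return mu_hat - q_hi, mu_hat - q_lo
```

The returned band follows the quantile-flipping construction of Step 4, so the lower endpoint subtracts the upper quantile and vice versa.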
Proof of Theorem 3.1. (1). Similar to the proof of Proposition 3.1, one can show that (cid:80) Tt =1 (cid:13)(cid:13) x t x (cid:62) t − (cid:101) x t (cid:101) x (cid:62) t (cid:13)(cid:13) = O P (1). Therefore,Lemma B.7 are still valid for the time–varying VAR process (3.1). For example, consider the uniformconvergence results, by Lemma B.7,sup τ ∈ [0 , (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) T T (cid:88) t =1 (cid:16) x t x t − E ( (cid:101) x t (cid:101) x (cid:62) t ) (cid:17) K h ( τ t − τ ) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) ≤ sup τ ∈ [0 , (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) T T (cid:88) t =1 (cid:16)(cid:101) x t (cid:101) x (cid:62) t − E ( (cid:101) x t (cid:101) x (cid:62) t ) (cid:17) K h ( τ t − τ ) (cid:13)(cid:13)(cid:13)(cid:13)(cid:13) + (cid:18) T sup τ,τ t | K h ( τ t − τ ) | (cid:19) · T (cid:88) t =1 (cid:13)(cid:13)(cid:13) x t x (cid:62) t − (cid:101) x t (cid:101) x (cid:62) t (cid:13)(cid:13)(cid:13) = O P (cid:32)(cid:114) log TT h (cid:33) + O P (cid:18) T h (cid:19) = O P (cid:32)(cid:114) log TT h (cid:33) . Hence, in the following we will directly apply Lemma B.7 to the time-varying VAR process.For notational simplicity, let S T,k ( τ ) = 1 T T (cid:88) t =1 z t − z (cid:62) t − (cid:18) τ t − τh (cid:19) k K h ( τ t − τ ) for 0 ≤ k ≤ , M ( τ t ) = A ( τ t ) − A ( τ ) − A (1) ( τ )( τ t − τ ) − A (2) ( τ )( τ t − τ ) . We now begin our investigation. 
Since $x_t = Z_{t-1}^\top \mathrm{vec}\left(A(\tau) + A^{(1)}(\tau)(\tau_t-\tau) + \frac{1}{2}A^{(2)}(\tau)(\tau_t-\tau)^2 + M(\tau_t)\right) + \eta_t$, we write

$$\mathrm{vec}(\widehat{A}(\tau) - A(\tau)) = \left(\frac{1}{T}\sum_{t=1}^{T}Z_{t-1}Z_{t-1}^\top K_h(\tau_t-\tau)\right)^{-1}\left(\frac{1}{T}\sum_{t=1}^{T}Z_{t-1}x_t K_h(\tau_t-\tau)\right) - \mathrm{vec}(A(\tau))$$
$$= \left(S_{T,0}^{-1}(\tau)\otimes I_d\right)\left\{(S_{T,1}(\tau)\otimes I_d)\,h\,\mathrm{vec}\left(A^{(1)}(\tau)\right) + (S_{T,2}(\tau)\otimes I_d)\,\frac{1}{2}h^2\,\mathrm{vec}\left(A^{(2)}(\tau)\right)\right\}$$
$$\quad + \left(S_{T,0}^{-1}(\tau)\otimes I_d\right)\left(\frac{1}{T}\sum_{t=1}^{T}(z_{t-1}z_{t-1}^\top\otimes I_d)\,\mathrm{vec}(M(\tau_t))\,K_h(\tau_t-\tau)\right) + \left(S_{T,0}^{-1}(\tau)\otimes I_d\right)\left(\frac{1}{T}\sum_{t=1}^{T}(z_{t-1}\otimes I_d)\,\eta_t\,K_h(\tau_t-\tau)\right)$$
$$:= I_{T,1} + I_{T,2} + I_{T,3}.$$

By standard arguments for the local constant kernel estimator and the uniform convergence results in Lemma B.7, we have $\|I_{T,1} + I_{T,2}\| = O(h^2) + O_P\left(h\sqrt{\log T/(Th)}\right)$ uniformly over $\tau\in[h,1-h]$. By Lemma B.8, $I_{T,3} = O_P\left((\log T/(Th))^{1/2}\right)$ uniformly over $\tau\in[0,1]$.

(2). Write

$$\frac{1}{Th}\sum_{t=1}^{T}\widehat{\eta}_t\widehat{\eta}_t^\top K\left(\frac{\tau_t-\tau}{h}\right) = \frac{1}{Th}\sum_{t=1}^{T}(\eta_t + \widehat{\eta}_t - \eta_t)(\eta_t + \widehat{\eta}_t - \eta_t)^\top K\left(\frac{\tau_t-\tau}{h}\right)$$
$$= \frac{1}{Th}\sum_{t=1}^{T}\eta_t\eta_t^\top K\left(\frac{\tau_t-\tau}{h}\right) + \frac{1}{Th}\sum_{t=1}^{T}(\widehat{\eta}_t-\eta_t)(\widehat{\eta}_t-\eta_t)^\top K\left(\frac{\tau_t-\tau}{h}\right) + \frac{1}{Th}\sum_{t=1}^{T}\eta_t(\widehat{\eta}_t-\eta_t)^\top K\left(\frac{\tau_t-\tau}{h}\right) + \frac{1}{Th}\sum_{t=1}^{T}(\widehat{\eta}_t-\eta_t)\eta_t^\top K\left(\frac{\tau_t-\tau}{h}\right)$$
$$:= I_{T,4} + I_{T,5} + I_{T,6} + I_{T,7}.$$

Let $c_T = h^2 + \sqrt{\log T/(Th)}$. By the first result, for $\forall\,\tau\in(0,1)$ we have

$$\left\|\frac{1}{Th}\sum_{t=1}^{T}(\widehat{\eta}_t-\eta_t)(\widehat{\eta}_t-\eta_t)^\top K\left(\frac{\tau_t-\tau}{h}\right)\right\|\le\sup_{\tau_t\in[h,1-h]}\left\|\widehat{A}(\tau_t) - A(\tau_t)\right\|^2\cdot\frac{1}{Th}\sum_{t=1}^{T}\|z_{t-1}\|^2 K\left(\frac{\tau_t-\tau}{h}\right) = O_P(c_T^2).$$

By Lemma B.8, $I_{T,6}$ and $I_{T,7}$ are both $o_P((Th)^{-1/2})$. Hence,

$$\sqrt{Th}\left(\frac{1}{Th}\sum_{t=1}^{T}\widehat{\eta}_t\widehat{\eta}_t^\top K\left(\frac{\tau_t-\tau}{h}\right) - \frac{1}{Th}\sum_{t=1}^{T}\eta_t\eta_t^\top K\left(\frac{\tau_t-\tau}{h}\right) - o_P(h^2)\right) = o_P(1).$$

The above development yields that

$$\sqrt{Th}\begin{pmatrix}\mathrm{vec}\left(\widehat{A}(\tau) - A(\tau) - h^2\tilde{c}\,A^{(2)}(\tau)\right) - o_P(h^2)\\[2pt] \mathrm{vech}\left(\widehat{\Omega}(\tau) - \Omega(\tau) - h^2\tilde{c}\,\Omega^{(2)}(\tau)\right) - o_P(h^2)\end{pmatrix} = \begin{pmatrix}\left(\Sigma^{-1}(\tau)\otimes I_d\right)\left(\frac{1}{\sqrt{Th}}\sum_{t=1}^{T}Z_{t-1}\eta_t K\left(\frac{\tau_t-\tau}{h}\right)\right)\\[2pt] \frac{1}{\sqrt{Th}}\sum_{t=1}^{T}\mathrm{vech}\left(\eta_t\eta_t^\top - \Omega(\tau_t)\right)K\left(\frac{\tau_t-\tau}{h}\right)\end{pmatrix} + o_P(1) := I_{T,8} + o_P(1).$$

Below, we focus on $I_{T,8}$. First, we show that $\mathrm{Var}(I_{T,8})\to V(\tau)$. Let

$$\mathrm{Var}(I_{T,8}) = \begin{pmatrix}\widetilde{V}_{1,1}(\tau) & \widetilde{V}_{1,2}^\top(\tau)\\ \widetilde{V}_{1,2}(\tau) & \widetilde{V}_{2,2}(\tau)\end{pmatrix},$$

where the definition of each block should be obvious. Moreover, simple algebra shows that $\widetilde{V}_{i,j}(\tau)\to V_{i,j}(\tau)$ for $i,j\in\{1,2\}$.

By construction, $I_{T,8}$ is a summation of m.d.s., so we use Lemma B.1 and the Cramér–Wold device to prove its asymptotic normality. It suffices to show that $d^\top I_{T,8}\to_D N\left(0, d^\top V(\tau)d\right)$ for any conformable unit vector $d$. Let

$$Z_{T,t}(\tau) = \frac{1}{\sqrt{Th}}\,d^\top\begin{pmatrix}\left(\Sigma^{-1}(\tau)\otimes I_d\right)\left(Z_{t-1}\eta_t K\left(\frac{\tau_t-\tau}{h}\right)\right)\\[2pt] \mathrm{vech}\left(\eta_t\eta_t^\top - \Omega(\tau_t)\right)K\left(\frac{\tau_t-\tau}{h}\right)\end{pmatrix}.$$

By the law of large numbers for martingale differences, we have $\sum_{t=1}^{T}Z_{T,t}^2(\tau) - \sum_{t=1}^{T}E\left(Z_{T,t}^2(\tau)\,|\,\mathcal{F}_{t-1}\right)\to_P 0$. Since, conditional on $\mathcal{F}_{t-1}$, the third and fourth moments of $\epsilon_t$ equal the corresponding unconditional moments a.s., we can prove that $\sum_{t=1}^{T}E\left(Z_{T,t}^2(\tau)\,|\,\mathcal{F}_{t-1}\right)\to_P d^\top V(\tau)d$. Furthermore, for any $\nu > 0$ and $\tau\in(0,1)$,

$$\sum_{t=1}^{T}E\left(Z_{T,t}^2(\tau)\,I\left(|Z_{T,t}(\tau)| > \nu\right)\right)\to 0.$$

The result follows by Lemma B.1.

(3). By Lemma B.7 and the second result of Theorem 3.1, we have $\widehat{V}_{1,1}(\tau)\to_P V_{1,1}(\tau)$. Similar to the proof of the second result of this theorem, by the uniform convergence results of $\widehat{A}(\tau)$, we can replace $\widehat{\eta}_t$ with $\eta_t$ in the following calculations. Therefore,

$$\widehat{V}_{2,1}(\tau) = \frac{1}{Th}\sum_{t=1}^{T}\mathrm{vech}\left(\eta_t\eta_t^\top\right)\eta_t^\top Z_{t-1}^\top K^2\left(\frac{\tau_t-\tau}{h}\right)\left(\Sigma^{-1}(\tau)\otimes I_d\right) + o_P(1)\to_P V_{2,1}(\tau),$$

and

$$\widehat{V}_{2,2}(\tau) = \frac{1}{Th}\sum_{t=1}^{T}\mathrm{vech}(\eta_t\eta_t^\top)\,\mathrm{vech}(\eta_t\eta_t^\top)^\top K^2\left(\frac{\tau_t-\tau}{h}\right) - \tilde{v}\,\mathrm{vech}(\Omega(\tau))\,\mathrm{vech}(\Omega(\tau))^\top + o_P(1)\to_P V_{2,2}(\tau).$$

The proof is now completed.

Proof of Theorem 3.2.
We prove that $\lim_{T\to\infty}\Pr\left(\mathrm{IC}(p) < \mathrm{IC}(p_0)\right) = 0$ for all $p\ne p_0$ and $p\le P$. Note that $\mathrm{IC}(p) - \mathrm{IC}(p_0) = \log[\mathrm{RSS}(p)/\mathrm{RSS}(p_0)] + (p - p_0)\chi_T$.

For $p < p_0$, Lemma B.9 implies that $\mathrm{RSS}(p)/\mathrm{RSS}(p_0) > 1 + \nu$ for some $\nu > 0$ with probability approaching 1. Thus, $\log[\mathrm{RSS}(p)/\mathrm{RSS}(p_0)] \ge \nu/2$. Because $\chi_T\to 0$, we have $\mathrm{IC}(p) - \mathrm{IC}(p_0) \ge \nu/2 - (p_0 - p)\chi_T \ge \nu/4$ with large probability. Thus $\Pr(\mathrm{IC}(p) < \mathrm{IC}(p_0))\to 0$ for $p < p_0$.

Next, consider $p > p_0$. Lemma B.9 implies that $\mathrm{RSS}(p)/\mathrm{RSS}(p_0) = 1 + O_P(c_T\phi_T)$. Hence, $\log[\mathrm{RSS}(p)/\mathrm{RSS}(p_0)] = O_P(c_T\phi_T)$. Because $(p - p_0)\chi_T \ge \chi_T$, which converges to zero at a slower rate than $c_T\phi_T$, it follows that $\Pr(\mathrm{IC}(p) < \mathrm{IC}(p_0)) \le \Pr\left(O_P(c_T\phi_T) + \chi_T < 0\right)\to 0$. The proof is now completed.
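The selection rule analyzed in this proof, minimizing $\mathrm{IC}(p)=\log\mathrm{RSS}(p)+p\,\chi_T$, can be sketched numerically. In the sketch below, the OLS fitting step, the penalty choice $\chi_T=\log T/T$ and the data-generating design are our own illustration, not the paper's specification:

```python
import numpy as np

def select_lag_order(x, P, chi_T):
    # Choose p minimizing IC(p) = log RSS(p) + p * chi_T, where RSS(p) is the
    # average squared residual from regressing x_t on (1, x_{t-1}, ..., x_{t-p}).
    T = len(x)
    best_p, best_ic = None, np.inf
    for p in range(1, P + 1):
        y = x[P:]                                             # common effective sample
        X = np.column_stack([np.ones(T - P)] +
                            [x[P - j:T - j] for j in range(1, p + 1)])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = np.mean((y - X @ beta) ** 2)
        ic = np.log(rss) + p * chi_T
        if ic < best_ic:
            best_p, best_ic = p, ic
    return best_p

# AR(2) data with a strong signal: the criterion should recover p0 = 2,
# since under-fitting inflates RSS while over-fitting is penalized by chi_T.
rng = np.random.default_rng(1)
T = 600
x = np.zeros(T)
for t in range(2, T):
    x[t] = 0.6 * x[t - 1] - 0.5 * x[t - 2] + 0.2 * rng.standard_normal()

p_hat = select_lag_order(x, P=6, chi_T=np.log(T) / T)
```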
Proof of Theorem 3.3.
Given the joint distribution of $\mathrm{vec}(\widehat{A}(\tau))$ and $\mathrm{vech}(\widehat{\Omega}(\tau))$ in Theorem 3.1, Theorem 3.3 can be easily obtained by the Delta method, since

$$\sqrt{Th}\,\mathrm{vec}\left(\widehat{B}_j(\tau) - B_j(\tau)\right) = \sqrt{Th}\,\mathrm{vec}\left(J\widehat{\Phi}^j(\tau)J^\top\widehat{\omega}(\tau) - J\Phi^j(\tau)J^\top\omega(\tau)\right).$$

Then, by standard arguments of the Delta method (see Lütkepohl, 2005, p. 111), we can show that

$$\sqrt{Th}\left(\mathrm{vec}\left(\widehat{B}_j(\tau) - B_j(\tau)\right) - h^2\tilde{c}\,B_j^{(2)}(\tau)\right)\to_D N\left(0,\Sigma_{B_j}(\tau)\right),$$

where $\Sigma_{B_j}(\tau)$ has been defined in the body of the theorem.

Online Supplementary Appendix B to "A Class of Time-Varying VMA(∞) Models: Nonparametric Kernel Estimation and Application"

This file includes the simulations, preliminary lemmas and proofs omitted from the main text. Specifically, the simulations are summarized in Appendix B.1; Appendix B.2 presents the preliminary lemmas that help derive the main results of the paper; Appendix B.3 includes the omitted proofs of the main results; and the proofs of the secondary lemmas are presented in Appendix B.4.
Appendix B

B.1 Simulation
In this section, we examine the above theoretical findings through simulations. The Epanechnikov kernel $K(u) = 0.75(1-u^2)I(|u|\le 1)$ is adopted throughout the numerical studies of this paper for simplicity. For each estimation conducted below, we always select the number of lags by (3.7), searching for the estimate of $p_0$ over a sufficiently large range, say $\{1,\ldots,\lfloor\sqrt{Th}\rfloor\}$. Moreover, for each given $p$, the bandwidth is selected by minimizing the following cross-validation criterion function:

$$\mathrm{CV}(h) = \sum_{t=1}^{T}\left(x_t - \widehat{a}_{-t}(\tau_t) - \sum_{j=1}^{p}\widehat{A}_{j,-t}(\tau_t)x_{t-j}\right)^\top\left(x_t - \widehat{a}_{-t}(\tau_t) - \sum_{j=1}^{p}\widehat{A}_{j,-t}(\tau_t)x_{t-j}\right),\qquad\text{(B.1)}$$

where $\widehat{a}_{-t}(\cdot)$ and $\widehat{A}_{j,-t}(\cdot)$ are obtained using (3.3) but leaving the $t$-th observation out. Once $\widehat{p}$ is obtained, the rest of the calculation is relatively straightforward.

We now describe the data generating process. Let $\epsilon_t$ be i.i.d. draws from $N(\mathbf{0}_{2\times 1}, I_2)$ with $E(\epsilon_t\epsilon_t^\top) = I_2$. Consider

$$x_t = a(\tau_t) + A_1(\tau_t)x_{t-1} + A_2(\tau_t)x_{t-2} + \eta_t,\qquad \eta_t = \omega(\tau_t)\epsilon_t,\qquad t = 1,\ldots,T,$$

where $a(\tau)$ is a $2\times 1$ vector of sine and cosine functions of $\pi\tau$, and $A_1(\tau)$, $A_2(\tau)$ and $\omega(\tau)$ are $2\times 2$ matrices whose entries are smooth sine, cosine and linear functions of $\tau$. We consider the sample sizes $T\in\{200, 400, 800\}$, and conduct 1000 replications for each choice of $T$.

Based on 1000 replications, we first report in Table B.1 the percentages of $\widehat{p} < p_0$, $\widehat{p} = p_0 = 2$, and $\widehat{p} > p_0$, which show that the percentage of $\widehat{p} = 2$ increases as the sample size goes up.

Table B.1: The percentages of $\widehat{p} < p_0$, $\widehat{p} = p_0 = 2$, and $\widehat{p} > p_0$.

Next, we evaluate the estimates of $A(\tau)$ and $\Omega(\tau)$, and calculate the root mean squared error (RMSE) as follows:

$$\left\{\frac{1}{NT}\sum_{n=1}^{N}\sum_{t=1}^{T}\left\|\widehat{\theta}^{(n)}(\tau_t) - \theta(\tau_t)\right\|^2\right\}^{1/2}\quad\text{for }\theta(\cdot)\in\{A(\cdot),\Omega(\cdot)\},$$

where $N = 1000$ and $\widehat{\theta}^{(n)}(\tau)$ is the estimate of $\theta(\tau)$ for the $n$-th replication.
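The leave-one-out criterion (B.1) can be sketched for a simplified, intercept-only univariate fit (the function names, the grid of candidate bandwidths and the data-generating process below are our own illustration):

```python
import numpy as np

def epanechnikov(u):
    # Epanechnikov kernel K(u) = 0.75 (1 - u^2) I(|u| <= 1)
    return 0.75 * (1.0 - u**2) * (np.abs(u) <= 1)

def loo_fit(x, t, h):
    # Local constant fit of the trend at tau_t, leaving observation t out.
    T = len(x)
    taus = np.arange(1, T + 1) / T
    w = epanechnikov((taus - taus[t]) / h)
    w[t] = 0.0                               # leave-one-out: drop observation t
    return np.sum(w * x) / np.sum(w)

def cv(x, h):
    # Cross-validation criterion in the spirit of (B.1), intercept-only case:
    # CV(h) = sum_t (x_t - a_hat_{-t}(tau_t))^2.
    return sum((x[t] - loo_fit(x, t, h)) ** 2 for t in range(len(x)))

# Pick the bandwidth minimizing CV over a candidate grid.
rng = np.random.default_rng(2)
T = 200
taus = np.arange(1, T + 1) / T
x = np.sin(np.pi * taus) + 0.2 * rng.standard_normal(T)
grid = [0.05, 0.1, 0.2, 0.4]
h_cv = min(grid, key=lambda h: cv(x, h))
```

The same idea carries over to the vector case in (B.1): each candidate $h$ is scored by the leave-one-out prediction error, and the minimizer is kept.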
Of interest, we also examine the finite-sample coverage probabilities of the confidence intervals based on our asymptotic theory. In the following, we compute the average of the coverage probabilities over the grid points in $\{\tau_t,\ t = 1,\ldots,T\}$. The RMSEs and empirical coverage probabilities are reported in Table B.2. As shown in Table B.2, the RMSE decreases as the sample size goes up. The finite-sample coverage probabilities are smaller than their nominal level (95%) for small $T$, but are fairly close to 95% as $T$ increases.

Table B.2: The RMSEs and the empirical coverage probabilities with 95% nominal level.

              $A(\tau)$                    $\Omega(\tau)$
  T      RMSE      Coverage rate     RMSE      Coverage rate
  200    0.4937    0.9265            0.8228    0.8687
  400    0.3703    0.9291            0.7143    0.9059
  800    0.2785    0.9319            0.6205    0.9226
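The RMSE summary reported above can be computed with a short routine. The array layout is our assumption (replications × grid points × matrix dimensions), not something fixed by the paper:

```python
import numpy as np

def rmse(estimates, truth):
    # RMSE over N replications and T grid points:
    # { (1/(N T)) sum_n sum_t || theta_hat^(n)(tau_t) - theta(tau_t) ||^2 }^{1/2},
    # where || . || is the Frobenius norm of each (matrix-valued) discrepancy.
    diff = estimates - truth[None, ...]               # broadcast truth over replications
    sq = (diff ** 2).reshape(diff.shape[0], diff.shape[1], -1).sum(axis=-1)
    return float(np.sqrt(sq.mean()))
```

For instance, if every entry of a $2\times 2$ estimate is off by one at every grid point and in every replication, each squared Frobenius norm is 4, so the RMSE is 2.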
B.2 Preliminary Lemmas
We present the preliminary lemmas below, which facilitate the development of the main results.
Lemma B.1.
Suppose $\{Z_t,\mathcal{F}_t\}$ is a martingale difference sequence, $S_T = \sum_{t=1}^{T}Z_t$, $U_T^2 = \sum_{t=1}^{T}Z_t^2$ and $s_T^2 = E(U_T^2) = E(S_T^2)$. If $s_T^{-2}U_T^2\to_P 1$ and $\sum_{t=1}^{T}E[Z_{T,t}^2\,I(|Z_{T,t}| > \nu)]\to 0$ for any $\nu > 0$ with $Z_{T,t} = s_T^{-1}Z_t$, then as $T\to\infty$, $s_T^{-1}S_T\to_D N(0,1)$.

Lemma B.1 can be found in Hall and Heyde (1980).
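A quick Monte Carlo check of Lemma B.1 (our own illustration): take a martingale difference sequence $Z_t = \sigma_t\varepsilon_t$ whose conditional standard deviation $\sigma_t$ depends on the past, so the i.i.d. CLT does not apply directly, and verify that the self-normalized sum $U_T^{-1}S_T$ is approximately standard normal:

```python
import numpy as np

rng = np.random.default_rng(42)

def standardized_mds_sum(T, rng):
    # Z_t = sigma_t * eps_t with sigma_t measurable w.r.t. F_{t-1}: a bona fide
    # martingale difference sequence that is not i.i.d.
    eps = rng.standard_normal(T)
    z = np.empty(T)
    z[0] = eps[0]
    for t in range(1, T):
        sigma = np.sqrt(0.5 + 0.5 * min(z[t - 1] ** 2, 4.0))  # depends on the past
        z[t] = sigma * eps[t]
    return z.sum() / np.sqrt(np.sum(z**2))    # S_T / U_T

# By the martingale CLT, the draws below should be approximately N(0, 1).
draws = np.array([standardized_mds_sum(2000, rng) for _ in range(500)])
```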
Lemma B.2.
Let $\{Z_t,\mathcal{F}_t\}$ be a martingale difference sequence. Suppose that $|Z_t|\le M$ for a constant $M$, $t = 1,\ldots,T$. Let $V_T = \sum_{t=1}^{T}\mathrm{Var}(Z_t\,|\,\mathcal{F}_{t-1})\le V$ for some $V > 0$. Then for any given $\nu > 0$,

$$\Pr\left(\left|\sum_{t=1}^{T}Z_t\right| > \nu\right)\le 2\exp\left\{-\frac{\nu^2}{2(V + M\nu)}\right\}.$$

Lemma B.2 is Proposition 2.1 of Freedman (1975).

Lemma B.3.
The following algebraic decompositions hold true.

1. $B_t(L) = \sum_{j=0}^{\infty}B_{j,t}L^j$ can be decomposed as $B_t(L) = B_t(1) - (1-L)\widetilde{B}_t(L)$, where $\widetilde{B}_t(L) = \sum_{j=0}^{\infty}\widetilde{B}_{j,t}L^j$ and $\widetilde{B}_{j,t} = \sum_{k=j+1}^{\infty}B_{k,t}$.
2. $B_t^r(L) = \sum_{j=0}^{\infty}(B_{j+r,t}\otimes B_{j,t})L^j$ can be decomposed as $B_t^r(L) = B_t^r(1) - (1-L)\widetilde{B}_t^r(L)$, where $\widetilde{B}_t^r(L) = \sum_{j=0}^{\infty}\widetilde{B}_{j,t}^r L^j$ and $\widetilde{B}_{j,t}^r = \sum_{k=j+1}^{\infty}(B_{k+r,t}\otimes B_{k,t})$.

In addition, let Assumption 1 hold; then

3. $\max_{t\ge 1}\sum_{j=0}^{\infty}\|\widetilde{B}_{j,t}\| < \infty$;
4. $\limsup_{T\to\infty}\sum_{t=1}^{T-1}\|\widetilde{B}_{t+1}(1) - \widetilde{B}_t(1)\| < \infty$;
5. $\max_{t\ge 1}\sum_{j=0}^{\infty}\|\widetilde{B}_{j,t}^r\| < \infty$;
6. $\max_{t\ge 1}\sum_{r=1}^{\infty}\|\widetilde{B}_t^r(1)\| < \infty$;
7. $\limsup_{T\to\infty}\sum_{t=1}^{T-1}\sum_{r=0}^{\infty}\|\widetilde{B}_{t+1}^r(1) - \widetilde{B}_t^r(1)\| < \infty$.

Lemma B.4.
Let Assumptions 1 and 2 hold. Suppose $\{W_{T,t}\}_{t=1}^{T}$ is a sequence of $m\times d$ deterministic matrices satisfying (1) $\sum_{t=1}^{T}\|W_{T,t}\| = O(1)$, (2) $\max_{t\ge 1}\|W_{T,t}\| = o(1)$, and (3) $\sum_{t=1}^{T-1}\|W_{T,t+1} - W_{T,t}\| = o(1)$. As $T\to\infty$,

$$\sum_{t=1}^{T}W_{T,t}\left(x_t - E(x_t)\right)\to_P 0\quad\text{and}\quad\sum_{t=1}^{T}W_{T,t}\left(x_t x_{t+p}^\top - E\left(x_t x_{t+p}^\top\right)\right)\to_P 0,$$

where $m\ge 1$ is fixed, and $p$ is a fixed non-negative integer.

Lemma B.5.
Let Assumptions 1 and 2 hold, and let $\{W_{T,t}(\cdot)\}_{t=1}^{T}$ be a sequence of $m\times d$ matrices of functions, where $m\ge 1$ is fixed, and each functional component is Lipschitz continuous and defined on a compact set $[a,b]$. Moreover, suppose that (1) $\sup_{\tau\in[a,b]}\sum_{t=1}^{T}\|W_{T,t}(\tau)\| = O(1)$, and (2) $T^{\delta}d_T\log T\to 0$, where $d_T = \sup_{\tau\in[a,b],\,t\ge 1}\|W_{T,t}(\tau)\|$. As $T\to\infty$,

$$\sup_{\tau\in[a,b]}\left\|\sum_{t=1}^{T}W_{T,t}(\tau)B_t(1)\epsilon_t\right\| = O_P\left(\sqrt{d_T}\log T\right).$$

Lemma B.6.
Let the conditions of Lemma B.5 hold. Suppose $T^{\delta}d_T\log T\to 0$, $\max_{t\ge 1}E[\|\epsilon_t\|^4\,|\,\mathcal{F}_{t-1}] < \infty$ a.s. and $\sup_{\tau\in[a,b]}\sum_{t=1}^{T-1}\|W_{T,t+1}(\tau) - W_{T,t}(\tau)\| = o(1)$. As $T\to\infty$,

1. $\sup_{\tau\in[a,b]}\left\|\sum_{t=1}^{T}(I_d\otimes W_{T,t}(\tau))B_t(1)\left(\mathrm{vec}[\epsilon_t\epsilon_t^\top] - \mathrm{vec}[I_d]\right)\right\| = O_P\left(\sqrt{d_T}\log T\right)$;
2. $\sup_{\tau\in[a,b]}\left\|\sum_{t=1}^{T}(I_d\otimes W_{T,t}(\tau))\zeta_t\epsilon_t\right\| = O_P\left(\sqrt{d_T}\log T\right)$;

where $\zeta_t = \sum_{r=1}^{\infty}\sum_{s=0}^{\infty}\{B_{s+r,t}\epsilon_{t-r}\}\otimes B_{s,t}$.

Lemma B.7. Let Assumptions 2, 3, and 4 hold. As $T\to\infty$,

1. for $\tau\in(0,1)$,
$$\frac{1}{T}\sum_{t=1}^{T}x_t\left(\frac{\tau_t-\tau}{h}\right)^k K_h(\tau_t-\tau) - \tilde{c}_k\,\mu(\tau)\to_P 0,\qquad \frac{1}{T}\sum_{t=1}^{T}x_t x_{t+p}^\top\left(\frac{\tau_t-\tau}{h}\right)^k K_h(\tau_t-\tau) - \tilde{c}_k\,\Sigma_p(\tau)\to_P 0,$$
where $\Sigma_p(\tau) = \mu(\tau)\mu^\top(\tau) + \sum_{j=0}^{\infty}B_j(\tau)B_{j+p}^\top(\tau)$ for fixed integers $k, p\ge 0$;
2. given $T^{1-\delta}h\log T\to\infty$,
$$\sup_{\tau\in[h,1-h]}\left\|\frac{1}{T}\sum_{t=1}^{T}x_t\left(\frac{\tau_t-\tau}{h}\right)^k K_h(\tau_t-\tau) - \tilde{c}_k\,\mu(\tau)\right\| = O_P\left(h^2 + \left(\frac{\log T}{Th}\right)^{1/2}\right);$$
3. given $T^{1-\delta}h\log T\to\infty$ and $\max_{t\ge 1}E[\|\epsilon_t\|^4\,|\,\mathcal{F}_{t-1}] < \infty$ a.s.,
$$\sup_{\tau\in[h,1-h]}\left\|\frac{1}{T}\sum_{t=1}^{T}x_t x_{t+p}^\top\left(\frac{\tau_t-\tau}{h}\right)^k K_h(\tau_t-\tau) - \tilde{c}_k\,\Sigma_p(\tau)\right\| = O_P\left(h^2 + \left(\frac{\log T}{Th}\right)^{1/2}\right).$$

Lemma B.8.
Let Assumptions 2, 4 and 5 hold. Suppose $T^{1-\delta}h\log T\to\infty$ and $\max_{t\ge 1}E[\|\epsilon_t\|^4\,|\,\mathcal{F}_{t-1}] < \infty$ a.s. As $T\to\infty$,

1. $\sup_{\tau\in[0,1]}\left\|\frac{1}{Th}\sum_{t=1}^{T}Z_{t-1}\eta_t K\left(\frac{\tau_t-\tau}{h}\right)\right\| = O_P\left(\left(\frac{\log T}{Th}\right)^{1/2}\right)$;
2. $\frac{1}{\sqrt{Th}}\sum_{t=1}^{T}\eta_t(\eta_t - \widehat{\eta}_t)^\top K\left(\frac{\tau_t-\tau}{h}\right) = o_P(1)$ for $\forall\,\tau\in(0,1)$.

Lemma B.9.
Let Assumptions 2, 4 and 5 hold. Suppose $T^{1-\delta}h\log T\to\infty$ and $\max_{t\ge 1}E[\|\epsilon_t\|^4\,|\,\mathcal{F}_{t-1}] < \infty$ a.s. As $T\to\infty$,

1. if $p\ge p_0$, then $\mathrm{RSS}(p) = \frac{1}{T}\sum_{t=1}^{T}E\left(\eta_t^\top\eta_t\right) + O_P(c_T\phi_T)$;
2. if $p < p_0$, then $\mathrm{RSS}(p) = \frac{1}{T}\sum_{t=1}^{T}E\left(\eta_t^\top\eta_t\right) + c + o_P(1)$ with some constant $c > 0$.

B.3 Omitted Proofs of the Main Results
We present the omitted proofs of the main results in this section.
Proof of Proposition 2.1. (1). Start with Example 1. Let $\rho$ denote the largest eigenvalue of $\Phi_t$ uniformly over $t$. Then, $\rho < 1$ and $\max_{t\ge 1}\left\|\prod_{i=0}^{j-1}\Phi_{t-i}\right\|\le M\rho^j$, which yields that

$$\max_{t\ge 1}\sum_{j=1}^{\infty}j\|B_{j,t}\| = \max_{t\ge 1}\sum_{j=1}^{\infty}j\left\|J\prod_{i=0}^{j-1}\Phi_{t-i}J^\top\right\|\le M\sum_{j=1}^{\infty}j\rho^j = O(1).$$

In addition, for any conformable matrices $\{A_i\}$ and $\{B_i\}$, since

$$\prod_{i=1}^{r}A_i - \prod_{i=1}^{r}B_i = \sum_{j=1}^{r}\left(\prod_{k=1}^{j-1}A_k\right)(A_j - B_j)\prod_{k=j+1}^{r}B_k,$$

we then obtain that

$$\limsup_{T\to\infty}\sum_{t=1}^{T-1}\sum_{j=1}^{\infty}j\left\|J\left(\prod_{i=0}^{j-1}\Phi_{t+1-i} - \prod_{i=0}^{j-1}\Phi_{t-i}\right)J^\top\right\| = \limsup_{T\to\infty}\sum_{t=1}^{T-1}\sum_{j=1}^{\infty}j\left\|J\sum_{m=1}^{j}\left(\prod_{k=1}^{m-1}\Phi_{t+2-k}\right)(\Phi_{t+2-m} - \Phi_{t+1-m})\left(\prod_{k=m}^{j}\Phi_{t+1-k}\right)J^\top\right\|$$
$$\le M\sum_{j=1}^{\infty}j^2\rho^{j-1}\limsup_{T\to\infty}\sum_{t=1}^{T-1}\|\Phi_{t+1} - \Phi_t\| = O(1)$$

given the condition in Proposition 2.1.

Consider Example 2. Similar to Example 1,

$$\max_{t\ge 1}\sum_{b=1}^{\infty}b\|D_{b,t}\|\le M\max_{t\ge 1}\sum_{b=1}^{\infty}b\sum_{j=\max(0,b-q)}^{b}\|B_{j,t}\|\le M\sum_{b=1}^{\infty}b\rho^b = O(1).$$
In addition,

$$\limsup_{T\to\infty}\sum_{t=1}^{T-1}\sum_{b=1}^{\infty}b\|D_{b,t+1} - D_{b,t}\|\le\limsup_{T\to\infty}\sum_{t=1}^{T-1}\sum_{b=1}^{\infty}b\sum_{j=\max(0,b-q)}^{b}\|B_{j,t+1} - B_{j,t}\|\,\|\Theta_{b-j,t+1-j}\| + \limsup_{T\to\infty}\sum_{t=1}^{T-1}\sum_{b=1}^{\infty}b\sum_{j=\max(0,b-q)}^{b}\|B_{j,t}\|\,\|\Theta_{b-j,t+1-j} - \Theta_{b-j,t-j}\|$$
$$\le M\limsup_{T\to\infty}\sum_{t=1}^{T-1}\sum_{b=1}^{\infty}b\|B_{b,t+1} - B_{b,t}\| + \sum_{b=1}^{\infty}b\sum_{j=\max(0,b-q)}^{b}\|B_{j,t}\|\limsup_{T\to\infty}\sum_{t=1}^{T-1}\|\Theta_{b-j,t+1-j} - \Theta_{b-j,t-j}\| = O(1).$$

(2). By the condition of Proposition 2.1.3,

$$\max_{t\ge 1}\sum_{j=1}^{\infty}j\|B_{j,t}\|\le\max_{t\ge 1}\sum_{j=1}^{\infty}j\sum_{l=0}^{j}\|\Psi_{l,t}\|\,\|\Theta_{j-l,t-l}\| = \max_{t\ge 1}\sum_{j=0}^{\infty}\sum_{l=j+1}^{\infty}l\,\|\Psi_{j,t}\|\,\|\Theta_{l-j,t-j}\| = \max_{t\ge 1}\sum_{j=0}^{\infty}\|\Psi_{j,t}\|\sum_{l=1}^{\infty}(l+j)\|\Theta_{l,t-j}\|$$
$$= \max_{t\ge 1}\sum_{j=0}^{\infty}j\|\Psi_{j,t}\|\sum_{l=1}^{\infty}\|\Theta_{l,t-j}\| + \max_{t\ge 1}\sum_{j=0}^{\infty}\|\Psi_{j,t}\|\sum_{l=1}^{\infty}l\|\Theta_{l,t-j}\| = O(1).$$

In addition,

$$\limsup_{T\to\infty}\sum_{t=1}^{T-1}\sum_{j=1}^{\infty}j\|B_{j,t+1} - B_{j,t}\|\le\limsup_{T\to\infty}\sum_{t=1}^{T-1}\sum_{j=1}^{\infty}j\sum_{l=0}^{j}\|\Psi_{l,t+1}\|\,\|\Theta_{j-l,t+1-l} - \Theta_{j-l,t-l}\| + \limsup_{T\to\infty}\sum_{t=1}^{T-1}\sum_{j=1}^{\infty}j\sum_{l=0}^{j}\|\Psi_{l,t+1} - \Psi_{l,t}\|\,\|\Theta_{j-l,t-l}\| := I_{T,1} + I_{T,2}.$$

We show below that $I_{T,1}$ is bounded; the bound for $I_{T,2}$ can be established similarly.
$$I_{T,1}\le\max_{t\ge 1}\sum_{j=0}^{\infty}j\|\Psi_{j,t}\|\limsup_{T\to\infty}\sum_{t=1}^{T-1}\sum_{l=1}^{\infty}\|\Theta_{l,t+1-j} - \Theta_{l,t-j}\| + \max_{t\ge 1}\sum_{j=0}^{\infty}\|\Psi_{j,t}\|\limsup_{T\to\infty}\sum_{t=1}^{T-1}\sum_{l=1}^{\infty}l\|\Theta_{l,t+1-j} - \Theta_{l,t-j}\| = O(1).$$

The proof is now completed.
Proof of Proposition 3.1.
Consider the VMA representation of $x_t$:

$$x_t = \mu_t + B_{0,t}\epsilon_t + B_{1,t}\epsilon_{t-1} + B_{2,t}\epsilon_{t-2} + \cdots,$$

where $B_{0,t} = \omega(\tau_t)$, $B_{j,t} = \Psi_{j,t}\omega(\tau_{t-j})$, $\Psi_{j,t} = J\prod_{m=0}^{j-1}\Phi(\tau_{t-m})J^\top$ for $j\ge 1$, $\mu_t = a(\tau_t) + \sum_{j=1}^{\infty}\Psi_{j,t}a(\tau_{t-j})$, and $\tau_{t-j} = \frac{t-j}{T}I(t\ge j)$.

First, we investigate the validity of the VMA representations of $x_t$ and $\tilde{x}_t$. Let $\rho_A$ denote the largest eigenvalue of $\Phi(\tau)$ uniformly over $\tau\in[0,1]$. Then $\rho_A < 1$ and $\max_{t\ge 1}\left\|\prod_{m=0}^{j-1}\Phi(\tau_{t-m})\right\|\le M\rho_A^j$. It follows that $\|E(x_t)\|\le\sum_{j=0}^{\infty}\|\Psi_{j,t}\|\cdot\|a(\tau_{t-j})\|\le M\sum_{j=0}^{\infty}\rho_A^j < \infty$ and

$$\|\mathrm{Var}(x_t)\| = \left\|\sum_{j=0}^{\infty}B_{j,t}B_{j,t}^\top\right\|\le\sum_{j=0}^{\infty}\|B_{j,t}\|^2\le M\sum_{j=0}^{\infty}\rho_A^{2j} < \infty.$$

Similarly, we have $\|E(\tilde{x}_t)\| < \infty$ and $\|\mathrm{Var}(\tilde{x}_t)\| < \infty$.

Then, we need to verify that $\max_{t\ge 1}E\|x_t - \tilde{x}_t\| = O(T^{-1})$. For any conformable matrices $\{A_i\}$ and $\{B_i\}$, since

$$\prod_{i=1}^{r}A_i - \prod_{i=1}^{r}B_i = \sum_{j=1}^{r}\left(\prod_{k=1}^{j-1}A_k\right)(A_j - B_j)\prod_{k=j+1}^{r}B_k,$$

we have

$$\|B_{j,t} - B_j(\tau_t)\| = \left\|J\prod_{m=0}^{j-1}\Phi(\tau_{t-m})J^\top\omega(\tau_{t-j}) - J\Phi^j(\tau_t)J^\top\omega(\tau_t)\right\|$$
$$= \left\|\left(J\prod_{m=0}^{j-1}\Phi(\tau_{t-m})J^\top - J\Phi^j(\tau_t)J^\top\right)\omega(\tau_t) + J\prod_{m=0}^{j-1}\Phi(\tau_{t-m})J^\top\left(\omega(\tau_{t-j}) - \omega(\tau_t)\right)\right\|$$
$$\le M\sum_{i=1}^{j-1}\left\|\Phi^i(\tau_t)\left(\Phi(\tau_{t-i}) - \Phi(\tau_t)\right)\prod_{m=i+1}^{j-1}\Phi(\tau_{t-m})\right\| + M\rho_A^j\frac{j}{T}\le M\sum_{i=1}^{j-1}\frac{i}{T}\rho_A^{j-1} + M\rho_A^j\frac{j}{T} = O(T^{-1}).$$

It follows that

$$E\|x_t - \tilde{x}_t\|\le\sum_{j=1}^{\infty}\|\Psi_{j,t}a(\tau_{t-j}) - \Psi_j(\tau_t)a(\tau_t)\| + \sum_{j=1}^{\infty}\|B_{j,t} - B_j(\tau_t)\|\cdot E\|\epsilon_t\|\le M\sum_{j=1}^{\infty}\left(\sum_{i=1}^{j-1}\frac{i}{T}\rho_A^{j-1} + \rho_A^j\frac{j}{T}\right) = O(T^{-1}).$$

Finally, we check whether the MA coefficients of $\tilde{x}_t$ satisfy Assumption 3.
For $\mu(\tau)$, the series $\sum_{j=0}^{\infty}\Psi_j(\tau)a(\tau)$ converges uniformly on $[0,1]$, since for every $\nu > 0$ there exists an $N_\nu > 0$ such that

$$\|\Psi_{m+1}(\tau)a(\tau) + \cdots + \Psi_n(\tau)a(\tau)\|\le M\sum_{j=m+1}^{n}\rho_A^j < \nu$$

whenever $n > m > N_\nu$.

By the term-by-term differentiability theorem, we have $\mu^{(1)}(\tau) = \sum_{j=0}^{\infty}\left(\Psi_j^{(1)}(\tau)a(\tau) + \Psi_j(\tau)a^{(1)}(\tau)\right)$, where $\Psi_j^{(1)}(\tau) = J\left(\sum_{i=0}^{j-1}\Phi^i(\tau)\Phi^{(1)}(\tau)\Phi^{j-1-i}(\tau)\right)J^\top$. Therefore, we can conclude that $\mu(\cdot)$ and $\Psi_j(\cdot)$ are first-order continuously differentiable. Similarly, we can show the second-order continuous differentiability of $\mu(\cdot)$ and $\Psi_j(\cdot)$.

In addition, since $\sup_{\tau\in[0,1]}\|\Phi^j(\tau)\|\le M\rho_A^j$, we have

$$\sum_{j=0}^{\infty}j\|B_j(\tau)\| = \sum_{j=0}^{\infty}j\|\Psi_j(\tau)\omega(\tau)\|\le M\sum_{j=0}^{\infty}j\rho_A^j < \infty,$$

and

$$\sum_{j=0}^{\infty}j\left\|B_j^{(1)}(\tau)\right\| = \sum_{j=0}^{\infty}j\left\|\Psi_j^{(1)}(\tau)\omega(\tau) + \Psi_j(\tau)\omega^{(1)}(\tau)\right\|\le M\sum_{j=0}^{\infty}\left(j^2\rho_A^{j-1} + j\rho_A^j\right) < \infty.$$

The proof is therefore completed.
Proof of Theorem A.1.
Note that $x_t^* = \tilde{\mu}(\tau_t) + e_t^*$, so we can write

$$\widehat{\mu}^*(\tau) - \tilde{\mu}(\tau) = \left(\sum_{t=1}^{T}W_{T,t}(\tau)\tilde{\mu}(\tau_t) - \tilde{\mu}(\tau)\right) + \sum_{t=1}^{T}W_{T,t}(\tau)e_t^* := I_{T,1} + I_{T,2},$$

where $W_{T,t}(\tau) = K\left(\frac{\tau_t-\tau}{h}\right)/\sum_{t=1}^{T}K\left(\frac{\tau_t-\tau}{h}\right)$.

We start our investigation with $I_{T,1}$, and write

$$I_{T,1} = \frac{1}{Th}\sum_{t=1}^{T}K\left(\frac{\tau_t-\tau}{h}\right)\left(\frac{1}{T\tilde{h}}\sum_{s=1}^{T}\mu(\tau_s)K\left(\frac{\tau_s-\tau_t}{\tilde{h}}\right) - \frac{1}{T\tilde{h}}\sum_{s=1}^{T}\mu(\tau_s)K\left(\frac{\tau_s-\tau}{\tilde{h}}\right)\right) + \frac{1}{\sqrt{T\tilde{h}}}\left(\frac{1}{Th}\sum_{t=1}^{T}K\left(\frac{\tau_t-\tau}{h}\right)Z_T(\tau_t) - Z_T(\tau)\right) + O_P\left(\frac{1}{Th}\right)$$
$$:= I_{T,3} + I_{T,4} + O_P\left(\frac{1}{Th}\right),$$

where the definitions of $I_{T,3}$ and $I_{T,4}$ should be obvious, $Z_T(\tau) = \frac{1}{\sqrt{T\tilde{h}}}\sum_{t=1}^{T}e_t K\left(\frac{\tau_t-\tau}{\tilde{h}}\right)$ and $e_t = \sum_{j=0}^{\infty}B_j(\tau_t)\epsilon_{t-j}$. Similar to the development of Lemma B.4, we can show that $\|I_{T,4}\| = O_P((T\tilde{h})^{-1/2})$, which in connection with Assumption A.1 yields

$$\sqrt{Th}\,\|I_{T,4}\| = O_P\left(\left(h/\tilde{h}\right)^{1/2}\right) = o_P(1).$$

For $I_{T,3}$, by the definition of the Riemann integral, we have

$$I_{T,3} = \int_{-1}^{1}K(u)\int_{-1}^{1}K(v)\left(\mu(\tau + v\tilde{h} + uh) - \mu(\tau + v\tilde{h})\right)dv\,du + O\left(\frac{1}{Th}\right) = \frac{1}{2}h^2\tilde{c}\,\mu^{(2)}(\tau) + O\left(h^2(h + \tilde{h})\right) + O\left(\frac{1}{Th}\right).$$

Thus, we just need to focus on $I_{T,2}$, and show that

$$\frac{1}{\sqrt{Th}}\sum_{t=1}^{T}K\left(\frac{\tau_t-\tau}{h}\right)e_t^*\to_{D^*} N\left(0,\ \tilde{v}\left\{\sum_{j=0}^{\infty}B_j(\tau)\right\}\left\{\sum_{j=0}^{\infty}B_j^\top(\tau)\right\}\right).$$

Using the Cramér–Wold device, it is enough to show that, for any conformable unit vector $d$,

$$\frac{1}{\sqrt{Th}}\sum_{t=1}^{T}K\left(\frac{\tau_t-\tau}{h}\right)d^\top e_t^*\to_{D^*} N\left(0,\ \tilde{v}\,d^\top\left\{\sum_{j=0}^{\infty}B_j(\tau)\right\}\left\{\sum_{j=0}^{\infty}B_j^\top(\tau)\right\}d\right).$$
For $\forall\,\tau\in[h+\tilde{h},\,1-h-\tilde{h}]$, we write

$$\frac{1}{\sqrt{Th}}\sum_{t=1}^{T}K\left(\frac{\tau_t-\tau}{h}\right)d^\top\widehat{e}_t\,\xi_t^* = Z_T^*(\tau) + \frac{1}{\sqrt{Th}}\sum_{t=1}^{T}K\left(\frac{\tau_t-\tau}{h}\right)d^\top(\widehat{e}_t - e_t)\xi_t^* = Z_T^*(\tau) + o_P(1),$$

where $Z_T^*(\tau) = \frac{1}{\sqrt{Th}}\sum_{t=1}^{T}K\left(\frac{\tau_t-\tau}{h}\right)d^\top e_t\,\xi_t^*$, and the second equality follows from

$$EE^*\left\|\frac{1}{\sqrt{Th}}\sum_{t=1}^{T}K\left(\frac{\tau_t-\tau}{h}\right)d^\top(\widehat{e}_t - e_t)\xi_t^*\right\|^2\le\max_{\lfloor T(\tau-h)\rfloor\le t\le\lceil T(\tau+h)\rceil}E\|\widehat{e}_t - e_t\|^2\left(\frac{1}{Th}\sum_{t=1}^{T}\sum_{s=1}^{T}K\left(\frac{\tau_t-\tau}{h}\right)K\left(\frac{\tau_s-\tau}{h}\right)E^*(\xi_t^*\xi_s^*)\right) = O\left(\tilde{h}^4 + 1/(T\tilde{h})\right)O(l) = o(1),$$

where $EE^*[\cdot]$ stands for taking the expectation with respect to the bootstrap draws first, and then taking the expectation with respect to the sample.

In the following, we first show that

$$\mathrm{Var}^*(Z_T^*(\tau)) = \tilde{v}\,d^\top\left\{\sum_{j=0}^{\infty}B_j(\tau)\right\}\left\{\sum_{j=0}^{\infty}B_j^\top(\tau)\right\}d + o_P(1),$$

and then prove its normality by blocking techniques.

Conditional on the original sample, we have

$$\mathrm{Var}^*(Z_T^*(\tau)) = \frac{1}{Th}\sum_{t=1}^{T}\sum_{s=1}^{T}K\left(\frac{\tau_t-\tau}{h}\right)K\left(\frac{\tau_s-\tau}{h}\right)d^\top e_t e_s^\top d\,E^*(\xi_t^*\xi_s^*)$$
$$= \frac{1}{Th}\sum_{t=1}^{T}K^2\left(\frac{\tau_t-\tau}{h}\right)d^\top e_t e_t^\top d + \frac{1}{Th}\sum_{i=1}^{T-1}\sum_{t=1}^{T-i}K\left(\frac{\tau_t-\tau}{h}\right)K\left(\frac{\tau_{t+i}-\tau}{h}\right)d^\top e_t e_{t+i}^\top d\,a(i/l) + \frac{1}{Th}\sum_{i=1}^{T-1}\sum_{t=1}^{T-i}K\left(\frac{\tau_t-\tau}{h}\right)K\left(\frac{\tau_{t+i}-\tau}{h}\right)d^\top e_{t+i}e_t^\top d\,a(i/l).\qquad\text{(B.2)}$$

For the first term on the right-hand side of (B.2), by Lemma B.4, it is straightforward to obtain that

$$\frac{1}{Th}\sum_{t=1}^{T}K^2\left(\frac{\tau_t-\tau}{h}\right)d^\top e_t e_t^\top d = \frac{1}{Th}\sum_{t=1}^{T}K^2\left(\frac{\tau_t-\tau}{h}\right)d^\top E\left(e_t e_t^\top\right)d + o_P(1).$$

For the second and third terms on the right-hand side of (B.2), we have

$$E\left\|\frac{1}{Th}\sum_{i=1}^{T-1}\sum_{t=1}^{T-i}K\left(\frac{\tau_t-\tau}{h}\right)K\left(\frac{\tau_{t+i}-\tau}{h}\right)\left(e_t e_{t+i}^\top - E(e_t e_{t+i}^\top)\right)a(i/l)\right\|\le\frac{1}{Th}\sum_{i=1}^{T-1}a(i/l)\,E\left\|\sum_{t=1}^{T-i}K\left(\frac{\tau_t-\tau}{h}\right)K\left(\frac{\tau_{t+i}-\tau}{h}\right)\left(e_t e_{t+i}^\top - E(e_t e_{t+i}^\top)\right)\right\|.$$

We now take a careful look at $E\left\|\sum_{t=1}^{T-i}K\left(\frac{\tau_t-\tau}{h}\right)K\left(\frac{\tau_{t+i}-\tau}{h}\right)\left(e_t e_{t+i}^\top - E(e_t e_{t+i}^\top)\right)\right\|$. Simple algebra shows that $E(e_t e_{t+i}^\top) = \sum_{j=0}^{\infty}B_{j,t}B_{j+i,t+i}^\top$. Applying vectorization, we can write

$$\frac{1}{Th}\sum_{t=1}^{T-i}K\left(\frac{\tau_t-\tau}{h}\right)K\left(\frac{\tau_{t+i}-\tau}{h}\right)\mathrm{vec}\left[e_t e_{t+i}^\top - E(e_t e_{t+i}^\top)\right]$$
$$= \frac{1}{Th}\sum_{t=1}^{T-i}K\left(\frac{\tau_t-\tau}{h}\right)K\left(\frac{\tau_{t+i}-\tau}{h}\right)\sum_{j=0}^{\infty}(B_{j+i,t+i}\otimes B_{j,t})\,\mathrm{vec}\left[\epsilon_{t-j}\epsilon_{t-j}^\top - I_d\right] + \frac{1}{Th}\sum_{t=1}^{T-i}K\left(\frac{\tau_t-\tau}{h}\right)K\left(\frac{\tau_{t+i}-\tau}{h}\right)\sum_{j=0}^{\infty}\sum_{m=0,\,m\ne j+i}^{\infty}(B_{m,t+i}\otimes B_{j,t})\,\mathrm{vec}\left[\epsilon_{t-j}\epsilon_{t+i-m}^\top\right]$$
$$:= I_{T,5} + I_{T,6}.$$

Let $w_t = \mathrm{vec}\left[\epsilon_t\epsilon_t^\top - I_d\right]$. By Assumption 2, $E\|w_t\|^{\delta/2}\le E\|\epsilon_t\|^{\delta} < \infty$ for some $\delta > 2$, which implies that $\{w_t\}$ is uniformly integrable. Hence, for every $\nu > 0$, there exists a $\lambda_\nu > 0$ such that $E\|w_t I(\|w_t\| > \lambda_\nu)\| < \nu$. Define $w_{1,t} = w_t I(\|w_t\|\le\lambda_\nu)$ and $w_{2,t} = w_t - w_{1,t} = w_t I(\|w_t\| > \lambda_\nu)$. Similar to the proof of Theorem 2.22 in Hall and Heyde (1980),

$$E\|I_{T,5}\|\le E\left\|\frac{1}{Th}\sum_{t=1}^{T-i}K\left(\frac{\tau_t-\tau}{h}\right)K\left(\frac{\tau_{t+i}-\tau}{h}\right)\sum_{j=0}^{\infty}(B_{j+i,t+i}\otimes B_{j,t})\left(w_{1,t-j} - E(w_{1,t-j}\,|\,\mathcal{F}_{t-j-1})\right)\right\|$$
$$\quad + E\left\|\frac{1}{Th}\sum_{t=1}^{T-i}K\left(\frac{\tau_t-\tau}{h}\right)K\left(\frac{\tau_{t+i}-\tau}{h}\right)\sum_{j=0}^{\infty}(B_{j+i,t+i}\otimes B_{j,t})\left(w_{2,t-j} - E(w_{2,t-j}\,|\,\mathcal{F}_{t-j-1})\right)\right\| := I_{T,7} + I_{T,8}.$$

For $I_{T,7}$,

$$I_{T,7}\le\frac{1}{Th}\sum_{j=0}^{\infty}\left\{E\left\|\sum_{t=1}^{T-i}K\left(\frac{\tau_t-\tau}{h}\right)K\left(\frac{\tau_{t+i}-\tau}{h}\right)(B_{j+i,t+i}\otimes B_{j,t})\left(w_{1,t-j} - E(w_{1,t-j}\,|\,\mathcal{F}_{t-j-1})\right)\right\|^2\right\}^{1/2}$$
$$\le\frac{1}{\sqrt{Th}}\max_{t\ge 1}\sum_{j=0}^{\infty}\|B_{j,t}\|\,\|B_{j+i,t+i}\|\cdot\left\{\frac{1}{Th}\sum_{t=1}^{T-i}K^2\left(\frac{\tau_t-\tau}{h}\right)K^2\left(\frac{\tau_{t+i}-\tau}{h}\right)E\|w_{1,t-j}\|^2\right\}^{1/2} := \beta_i\,\phi_{T,1}(\tau),$$

where $\beta_i = \max_{t\ge 1}\sum_{j=0}^{\infty}\|B_{j,t}\|\,\|B_{j+i,t+i}\|$ satisfies

$$\sum_{i=1}^{\infty}\beta_i\le\sum_{i=1}^{\infty}\max_{t\ge 1}\sum_{j=0}^{\infty}\|B_{j,t}\|\,\|B_{j+i,t+i}\|\le\left(\sup_{\tau}\sum_{j=0}^{\infty}\|B_j(\tau)\|\right)^2 < \infty,$$

and $\phi_{T,1}(\tau) = O\left(\lambda_\nu/\sqrt{Th}\right)$. For $I_{T,8}$,

$$I_{T,8}\le 2\max_{t\ge 1}\sum_{j=0}^{\infty}\|B_{j,t}\|\,\|B_{j+i,t+i}\|\cdot\frac{1}{Th}\sum_{t=1}^{T-i}K\left(\frac{\tau_t-\tau}{h}\right)K\left(\frac{\tau_{t+i}-\tau}{h}\right)E\|w_{2,t-j}\| := \beta_i\,\phi_{T,2}(\tau),$$

where $\phi_{T,2}(\tau) = O(\nu)$. As we can make $\nu$ arbitrarily small, it follows that $\phi_T = \sup_{\tau\in[0,1]}(\phi_{T,1}(\tau) + \phi_{T,2}(\tau))\to 0$ as $T\to\infty$.

For $I_{T,6}$, we have

$$E\|I_{T,6}\|\le\frac{1}{\sqrt{Th}}\left(E\left\|\frac{1}{\sqrt{Th}}\sum_{t=1}^{T-i}K\left(\frac{\tau_t-\tau}{h}\right)K\left(\frac{\tau_{t+i}-\tau}{h}\right)\sum_{j=0}^{\infty}\sum_{m=0,\,m\ne j+i}^{\infty}(B_{m,t+i}\otimes B_{j,t})\,\mathrm{vec}\left[\epsilon_{t-j}\epsilon_{t+i-m}^\top\right]\right\|^2\right)^{1/2}\le O(1)\frac{1}{\sqrt{Th}},$$

since $E\left(\mathrm{vec}\left[\epsilon_{t-j}\epsilon_{t+i-m}^\top\right]\mathrm{vec}\left[\epsilon_{s-j}\epsilon_{s+i-m}^\top\right]^\top\right)$, $m\ne j+i$, can only be non-zero if $t = s$.

Based on the above development and $\sum_{i=0}^{\infty}a(i/l) = \sum_{i=0}^{l}a(i/l) = O(l)$, we conclude that

$$E\left\|\frac{1}{Th}\sum_{i=1}^{T-1}\sum_{t=1}^{T-i}K\left(\frac{\tau_t-\tau}{h}\right)K\left(\frac{\tau_{t+i}-\tau}{h}\right)\left(e_t e_{t+i}^\top - E(e_t e_{t+i}^\top)\right)a(i/l)\right\|\le M\phi_T\sum_{i=0}^{\infty}\beta_i + \frac{M}{\sqrt{Th}}\sum_{i=0}^{\infty}a(i/l) = o(1) + O\left(\frac{l}{\sqrt{Th}}\right) = o(1),$$

since $\sum_{i=1}^{\infty}\beta_i < \infty$, $\lim_{T\to\infty}\phi_T = 0$ and $l/\sqrt{Th}\to 0$.

We now just need to focus on $\frac{1}{Th}\sum_{i=1}^{T-1}\sum_{t=1}^{T-i}K\left(\frac{\tau_t-\tau}{h}\right)K\left(\frac{\tau_{t+i}-\tau}{h}\right)E(e_t e_{t+i}^\top)\,a(i/l)$.
Note that

$$\frac{1}{Th}\sum_{i=1}^{T-1}\sum_{t=1}^{T-i}K\left(\frac{\tau_t-\tau}{h}\right)K\left(\frac{\tau_{t+i}-\tau}{h}\right)E(e_t e_{t+i}^\top)\,a(i/l) = \frac{1}{Th}\sum_{i=1}^{T-1}\sum_{t=1}^{T-i}K\left(\frac{\tau_t-\tau}{h}\right)K\left(\frac{\tau_{t+i}-\tau}{h}\right)E(e_t e_{t+i}^\top) + \frac{1}{Th}\sum_{i=1}^{T-1}\sum_{t=1}^{T-i}K\left(\frac{\tau_t-\tau}{h}\right)K\left(\frac{\tau_{t+i}-\tau}{h}\right)E(e_t e_{t+i}^\top)\left(a(i/l) - 1\right).$$

It is then sufficient to show that the second term is $o(1)$, since

$$\frac{1}{Th}\sum_{t=1}^{T}\sum_{s=1}^{T}K\left(\frac{\tau_t-\tau}{h}\right)K\left(\frac{\tau_s-\tau}{h}\right)E(e_t e_s^\top) = \mathrm{Var}\left(\frac{1}{\sqrt{Th}}\sum_{t=1}^{T}K\left(\frac{\tau_t-\tau}{h}\right)e_t\right)\to\tilde{v}\left\{\sum_{j=0}^{\infty}B_j(\tau)\right\}\left\{\sum_{j=0}^{\infty}B_j^\top(\tau)\right\}$$

by the proof of Theorem 2.2.

Let $d_T\to\infty$ satisfy $d_T^2/l\to 0$. The second term is then bounded by

$$\left\|\frac{1}{Th}\sum_{i=1}^{T-1}\sum_{t=1}^{T-i}K\left(\frac{\tau_t-\tau}{h}\right)K\left(\frac{\tau_{t+i}-\tau}{h}\right)E(e_t e_{t+i}^\top)\left(a(i/l)-1\right)\right\|\le M\sum_{i=1}^{d_T}\max_t\left\|E(e_t e_{t+i}^\top)\right\|\,|a(i/l)-1| + M\sum_{i=d_T+1}^{\infty}\max_t\left\|E(e_t e_{t+i}^\top)\right\|\,|a(i/l)-1|$$
$$\le M\sum_{i=1}^{d_T}\left(1 - a(i/l)\right) + M\sum_{i=d_T+1}^{\infty}\max_t\left\|E(e_t e_{t+i}^\top)\right\| = o(1),$$

since

$$\sum_{i=1}^{d_T}\left(1 - a(i/l)\right)\le\sum_{i=1}^{d_T}\left(-a^{(1)}(0)\,i/l + o(i/l)\right)\le M d_T^2/l = o(1),\qquad \sum_{i=d_T+1}^{\infty}\max_t\left\|E(e_t e_{t+i}^\top)\right\| = o(1)\ \text{as }d_T\to\infty.$$

Conditional on the original sample, we now use standard arguments of a blocking technique to show the asymptotic normality of the stochastic term. Let $Z_T^*(\tau) = \sum_{j=1}^{k}X_{T,j}^*(\tau) + \sum_{j=1}^{k}Y_{T,j}^*(\tau)$, where

$$X_{T,j}^*(\tau) = \frac{1}{\sqrt{Th}}\sum_{t=B_j+1}^{B_j+r_1}K\left(\frac{\tau_t-\tau}{h}\right)d^\top e_t\,\xi_t^*,\qquad Y_{T,j}^*(\tau) = \frac{1}{\sqrt{Th}}\sum_{t=B_j+r_1+1}^{B_j+r_1+r_2}K\left(\frac{\tau_t-\tau}{h}\right)d^\top e_t\,\xi_t^*,$$

with $B_j = (j-1)(r_1+r_2)$ and $k = \lceil T/(r_1+r_2)\rceil$. Let $r_1 = r_1(T)$ and $r_2 = r_2(T)$ satisfy $r_1/(Th) + l/r_2\to 0$ and $r_2/r_1 + l/r_1\to 0$.

We first show that $\sum_{j=1}^{k}Y_{T,j}^*(\tau) = o_P(1)$. Since $r_2 > l$ for large enough $T$ and the blocks $Y_{T,j}^*$ are mutually independent conditional on the original data, we have

$$EE^*\left(\sum_{j=1}^{k}Y_{T,j}^*(\tau)\right)^2 = E\sum_{j=1}^{k}E^*\left(Y_{T,j}^*(\tau)\right)^2\le\frac{1}{Th}\sum_{i=-r_2+1}^{r_2-1}a(i/l)\max_t\left\|E(e_t e_{t+i}^\top)\right\|\sum_{j=1}^{k}\sum_{t=B_j+r_1+1}^{B_j+r_1+r_2-|i|}K\left(\frac{\tau_t-\tau}{h}\right)K\left(\frac{\tau_{t+|i|}-\tau}{h}\right)$$
$$\le\frac{M}{Th}\max_{0\le i\le r_2-1}\sum_{j=1}^{k}\sum_{t=B_j+r_1+1}^{B_j+r_1+r_2-i}K\left(\frac{\tau_t-\tau}{h}\right)K\left(\frac{\tau_{t+i}-\tau}{h}\right)\le M\frac{k r_2 h}{Th}\le M\frac{r_2}{r_1} = o(1).$$

We employ the Lindeberg CLT to establish the asymptotic normality of $\sum_{j=1}^{k}X_{T,j}^*(\tau)$, as the blocks $X_{T,j}^*(\tau)$ are independent when $r_2 > l$ for large enough $T$. As discussed before, we have already shown that the asymptotic variance is equal to $\tilde{v}\,d^\top\left\{\sum_{j=0}^{\infty}B_j(\tau)\right\}\left\{\sum_{j=0}^{\infty}B_j^\top(\tau)\right\}d$. We then need to verify that for every $\nu > 0$,

$$\sum_{j=1}^{k}E^*\left[\frac{X_{T,j}^{*2}(\tau)}{E^*\left(\sum_{j=1}^{k}X_{T,j}^*(\tau)\right)^2}\,I\left(\frac{X_{T,j}^{*2}(\tau)}{E^*\left(\sum_{j=1}^{k}X_{T,j}^*(\tau)\right)^2} > \nu^2\right)\right] = o_P(1).$$
Conditional on the original sample, by H\"older's inequality, Chebyshev's inequality and Minkowski's inequality, we have
\begin{align*}
&\sum_{j=1}^{k}E^*\left[\frac{\|X_{T,j}^*(\tau)\|^2}{E^*\|\sum_{j=1}^{k}X_{T,j}^*(\tau)\|^2}\,
I\!\left(\frac{\|X_{T,j}^*(\tau)\|^2}{E^*\|\sum_{j=1}^{k}X_{T,j}^*(\tau)\|^2}>\nu\right)\right]\\
&\le \sum_{j=1}^{k}\left\{E^*\!\left(\frac{\|X_{T,j}^*(\tau)\|^2}{E^*\|\sum_{j=1}^{k}X_{T,j}^*(\tau)\|^2}\right)^{\!\delta/2}\right\}^{2/\delta}
\left\{\frac{E^*\big(\|X_{T,j}^*(\tau)\|^2/E^*\|\sum_{j=1}^{k}X_{T,j}^*(\tau)\|^2\big)^{\delta/2}}{\nu^{\delta/2}}\right\}^{(\delta-2)/\delta}\\
&=\nu^{-(\delta-2)/2}\sum_{j=1}^{k}\frac{E^*\|X_{T,j}^*(\tau)\|^{\delta}}{\big(E^*\|\sum_{j=1}^{k}X_{T,j}^*(\tau)\|^2\big)^{\delta/2}}
\le \nu^{-(\delta-2)/2}\sum_{j=1}^{k}\frac{\sum_{t=B_j+1}^{B_j+r_1}\big|\frac{1}{\sqrt{Th}}K\big(\frac{\tau_t-\tau}{h}\big)d^\top e_t\big|^{\delta}E^*|\xi_t^*|^{\delta}}{\big(E^*\|\sum_{j=1}^{k}X_{T,j}^*(\tau)\|^2\big)^{\delta/2}}\\
&\le \nu^{-(\delta-2)/2}(Th)^{-(\delta-2)/2}\,\frac{\frac{1}{Th}\sum_{t=1}^{T}\big|K\big(\frac{\tau_t-\tau}{h}\big)d^\top e_t\big|^{\delta}E^*|\xi_t^*|^{\delta}}{\big(E^*\|\sum_{j=1}^{k}X_{T,j}^*(\tau)\|^2\big)^{\delta/2}}
=O_P\big((Th)^{-(\delta-2)/2}\big)=o_P(1).
\end{align*}
The proof is now completed.
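The big-block/small-block construction used above can be illustrated with a short sketch. The function name and the toy sizes $(T,r_1,r_2)=(20,4,2)$ are ours, not the paper's; the index arithmetic follows $B_j=(j-1)(r_1+r_2)$ and $k=\lceil T/(r_1+r_2)\rceil$:

```python
import math

def big_small_blocks(T, r1, r2):
    """Partition {1, ..., T} into alternating 'big' blocks of length r1 and
    'small' blocks of length r2, with B_j = (j-1)(r1+r2) and
    k = ceil(T / (r1+r2)); indices beyond T are clipped."""
    k = math.ceil(T / (r1 + r2))
    big, small = [], []
    for j in range(1, k + 1):
        B = (j - 1) * (r1 + r2)
        big.append([t for t in range(B + 1, B + r1 + 1) if t <= T])
        small.append([t for t in range(B + r1 + 1, B + r1 + r2 + 1) if t <= T])
    return big, small

big, small = big_small_blocks(T=20, r1=4, r2=2)
# The blocks partition {1, ..., 20}; each small block separates two
# consecutive big blocks, which is what makes the big blocks asymptotically
# independent once r2 exceeds the taper bandwidth l.
```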
B.4 Proofs of the Preliminary Lemmas
Proof of Lemma B.3.

(1). The first result follows from the standard BN decomposition (e.g., Phillips and Solo, 1992), so the details are omitted.

(2). For the second decomposition, write
\begin{align*}
(1-L)\widetilde B_t^r(L)
&=\sum_{j=0}^{\infty}\Big\{L^j\sum_{k=j+1}^{\infty}(B_{k+r,t}\otimes B_{k,t})-L^{j+1}\sum_{k=j+1}^{\infty}(B_{k+r,t}\otimes B_{k,t})\Big\}\\
&=\sum_{j=0}^{\infty}\Big\{L^{j+1}\sum_{k=j+2}^{\infty}(B_{k+r,t}\otimes B_{k,t})-L^{j+1}\sum_{k=j+1}^{\infty}(B_{k+r,t}\otimes B_{k,t})\Big\}+\sum_{k=1}^{\infty}(B_{k+r,t}\otimes B_{k,t})\\
&=-\sum_{j=0}^{\infty}L^{j+1}(B_{j+1+r,t}\otimes B_{j+1,t})+\sum_{k=1}^{\infty}(B_{k+r,t}\otimes B_{k,t})\\
&=-\sum_{j=0}^{\infty}L^{j}(B_{j+r,t}\otimes B_{j,t})+\sum_{k=0}^{\infty}(B_{k+r,t}\otimes B_{k,t})
=B_t^r(1)-B_t^r(L).
\end{align*}

(3). By Assumption 1,
\[
\max_{t\ge1}\sum_{j=0}^{\infty}\|\widetilde B_{j,t}\|\le \max_{t\ge1}\sum_{j=0}^{\infty}\sum_{k=j+1}^{\infty}\|B_{k,t}\|=\max_{t\ge1}\sum_{j=1}^{\infty}j\|B_{j,t}\|<\infty.
\]

(4). Similarly,
\[
\sum_{t=1}^{T-1}\|\widetilde B_{t+1}(1)-\widetilde B_t(1)\|\le \sum_{t=1}^{T-1}\sum_{j=0}^{\infty}\sum_{k=j+1}^{\infty}\|B_{k,t+1}-B_{k,t}\|=\sum_{t=1}^{T-1}\sum_{j=1}^{\infty}j\|B_{j,t+1}-B_{j,t}\|<\infty.
\]

(5). By Assumption 1,
\[
\max_{t\ge1}\sum_{j=0}^{\infty}\|\widetilde B^r_{j,t}\|
\le \max_{t\ge1}\sum_{j=0}^{\infty}\sum_{k=j+1}^{\infty}\|B_{k+r,t}\otimes B_{k,t}\|
=\max_{t\ge1}\sum_{j=0}^{\infty}\sum_{k=j+1}^{\infty}\|B_{k+r,t}\|\cdot\|B_{k,t}\|
=\max_{t\ge1}\sum_{j=1}^{\infty}j\|B_{j+r,t}\|\cdot\|B_{j,t}\|
\le M\max_{t\ge1}\sum_{j=1}^{\infty}j\|B_{j,t}\|<\infty.
\]

(6). Write
\[
\max_{t\ge1}\sum_{r=1}^{\infty}\|\widetilde B_t^r(1)\|
\le \max_{t\ge1}\sum_{r=1}^{\infty}\sum_{j=0}^{\infty}\sum_{k=j+1}^{\infty}\|B_{k+r,t}\|\cdot\|B_{k,t}\|
=\max_{t\ge1}\sum_{j=0}^{\infty}\sum_{k=j+1}^{\infty}\|B_{k,t}\|\Big(\sum_{r=1}^{\infty}\|B_{k+r,t}\|\Big)
\le \max_{t\ge1}\Big(\sum_{r=1}^{\infty}\|B_{r,t}\|\Big)\sum_{j=1}^{\infty}j\|B_{j,t}\|<\infty.
\]

(7). Write
\begin{align*}
\sum_{t=1}^{T-1}\sum_{r=0}^{\infty}\|\widetilde B^r_{t+1}(1)-\widetilde B^r_t(1)\|
&\le \sum_{t=1}^{T-1}\sum_{r=0}^{\infty}\sum_{j=0}^{\infty}\Big\|\sum_{k=j+1}^{\infty}\big(B_{k+r,t+1}\otimes B_{k,t+1}-B_{k+r,t}\otimes B_{k,t}\big)\Big\|\\
&\le \sum_{t=1}^{T-1}\sum_{r=0}^{\infty}\sum_{j=0}^{\infty}\sum_{k=j+1}^{\infty}\big(\|B_{k+r,t+1}-B_{k+r,t}\|\cdot\|B_{k,t+1}\|+\|B_{k+r,t}\|\cdot\|B_{k,t+1}-B_{k,t}\|\big)\\
&=\sum_{t=1}^{T-1}\sum_{j=0}^{\infty}\sum_{k=j+1}^{\infty}\Big(\|B_{k,t+1}\|\sum_{r=0}^{\infty}\|B_{k+r,t+1}-B_{k+r,t}\|+\|B_{k,t+1}-B_{k,t}\|\sum_{r=0}^{\infty}\|B_{k+r,t}\|\Big)\\
&\le \Big(\sum_{t=1}^{T-1}\sum_{r=1}^{\infty}\|B_{r,t+1}-B_{r,t}\|\Big)\Big(\max_{t\ge1}\sum_{k=1}^{\infty}k\|B_{k,t}\|\Big)
+\Big(\sum_{t=1}^{T-1}\sum_{k=1}^{\infty}k\|B_{k,t+1}-B_{k,t}\|\Big)\Big(\max_{t\ge1}\sum_{r=1}^{\infty}\|B_{r,t}\|\Big)<\infty.
\end{align*}
The proof is now completed.
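The BN-type identity used throughout — $B(L)=B(1)-(1-L)\widetilde B(L)$ with $\widetilde B_j=\sum_{k>j}B_k$ — can be checked numerically on a finite scalar filter. The coefficients and sample path below are illustrative, not taken from the paper:

```python
import numpy as np

# Illustrative MA coefficients b_0..b_3 and an innovation sequence.
rng = np.random.default_rng(0)
b = np.array([1.0, 0.5, 0.25, 0.125])
e = rng.standard_normal(200)

# BN "tilde" coefficients: b~_j = sum_{k > j} b_k.
b_tilde = np.array([b[j + 1:].sum() for j in range(len(b))])

def filt(coef, e, t):
    """(coef(L) e)_t = sum_j coef_j * e_{t-j}."""
    return sum(c * e[t - j] for j, c in enumerate(coef))

t = 50
lhs = filt(b, e, t)                                           # B(L) e_t
rhs = b.sum() * e[t] - (filt(b_tilde, e, t) - filt(b_tilde, e, t - 1))
# rhs = B(1) e_t - (1 - L) B~(L) e_t; the two sides agree exactly,
# since the identity holds coefficient by coefficient.
```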
Proof of Lemma B.4.
By Lemma B.3, we have $x_t=\mu_t+B_t(1)\varepsilon_t+\widetilde B_t(L)\varepsilon_{t-1}-\widetilde B_t(L)\varepsilon_t$, which yields that
\begin{align*}
\sum_{t=1}^{T}W_{T,t}\big(x_t-E(x_t)\big)
&=\sum_{t=1}^{T}W_{T,t}B_t(1)\varepsilon_t+W_{T,1}\widetilde B_1(L)\varepsilon_0-W_{T,T}\widetilde B_T(L)\varepsilon_T
+\sum_{t=1}^{T-1}\big(W_{T,t+1}\widetilde B_{t+1}(L)-W_{T,t}\widetilde B_t(L)\big)\varepsilon_t\\
&:=I_{T,1}+I_{T,2}+I_{T,3}+I_{T,4}.
\end{align*}
For $I_{T,1}$,
\[
E\Big\|\sum_{t=1}^{T}W_{T,t}B_t(1)\varepsilon_t\Big\|^2
=\operatorname{tr}\Big(\sum_{t=1}^{T}W_{T,t}B_t(1)E(\varepsilon_t\varepsilon_t^\top)B_t^\top(1)W_{T,t}^\top\Big)
\le M\sum_{t=1}^{T}\|W_{T,t}\|^2\le M\max_{t\ge1}\|W_{T,t}\|\sum_{t=1}^{T}\|W_{T,t}\|=o(1).
\]
Hence, $\|I_{T,1}\|=o_P(1)$. Also, $\|I_{T,2}\|=o_P(1)$ and $\|I_{T,3}\|=o_P(1)$, since $\max_{t\ge1}\|W_{T,t}\|=o(1)$, $E\|\widetilde B_1(L)\varepsilon_0\|<\infty$ and $E\|\widetilde B_T(L)\varepsilon_T\|<\infty$ by Lemma B.3.

For $I_{T,4}$,
\[
\sum_{t=1}^{T-1}\big(W_{T,t+1}\widetilde B_{t+1}(L)-W_{T,t}\widetilde B_t(L)\big)\varepsilon_t
=\sum_{t=1}^{T-1}(W_{T,t+1}-W_{T,t})\widetilde B_{t+1}(L)\varepsilon_t
+\sum_{t=1}^{T-1}W_{T,t}\big(\widetilde B_{t+1}(L)-\widetilde B_t(L)\big)\varepsilon_t. \tag{B.3}
\]
Note that for the first term on the right hand side of (B.3),
\[
E\Big\|\sum_{t=1}^{T-1}(W_{T,t+1}-W_{T,t})\widetilde B_{t+1}(L)\varepsilon_t\Big\|
\le \max_{t\ge1}E\big\|\widetilde B_{t+1}(L)\varepsilon_t\big\|\cdot\sum_{t=1}^{T-1}\|W_{T,t+1}-W_{T,t}\|=o(1)
\]
by Lemma B.3 and the conditions on $W_{T,t}$.
For the second term on the right hand side of (B.3), write
\begin{align*}
E\Big\|\sum_{t=1}^{T-1}W_{T,t}\big(\widetilde B_{t+1}(L)-\widetilde B_t(L)\big)\varepsilon_t\Big\|
&\le \max_{t\ge1}E\|\varepsilon_t\|\cdot\max_{t\ge1}\|W_{T,t}\|\sum_{t=1}^{T-1}\|\widetilde B_{t+1}(1)-\widetilde B_t(1)\|\\
&\le M\max_{t\ge1}\|W_{T,t}\|\sum_{t=1}^{T-1}\sum_{j=0}^{\infty}\sum_{k=j+1}^{\infty}\|B_{k,t+1}-B_{k,t}\|
=M\max_{t\ge1}\|W_{T,t}\|\sum_{t=1}^{T-1}\sum_{j=1}^{\infty}j\|B_{j,t+1}-B_{j,t}\|=o(1).
\end{align*}
Thus, we have proved that $\|\sum_{t=1}^{T}W_{T,t}(x_t-E(x_t))\|=o_P(1)$.

We now prove $\|\sum_{t=1}^{T}W_{T,t}\big(x_tx_{t+p}^\top-E(x_tx_{t+p}^\top)\big)\|=o_P(1)$. Start from $p=0$ and write
\begin{align*}
x_tx_t^\top&=\mu_t\mu_t^\top+\mu_t\sum_{j=0}^{\infty}\varepsilon_{t-j}^\top B_{j,t}^\top+\sum_{j=0}^{\infty}B_{j,t}\varepsilon_{t-j}\mu_t^\top+\sum_{j=0}^{\infty}B_{j,t}\varepsilon_{t-j}\varepsilon_{t-j}^\top B_{j,t}^\top\\
&\quad+\sum_{r=1}^{\infty}\sum_{j=0}^{\infty}B_{j,t}\varepsilon_{t-j}\varepsilon_{t-j-r}^\top B_{j+r,t}^\top
+\sum_{r=1}^{\infty}\sum_{j=0}^{\infty}B_{j+r,t}\varepsilon_{t-j-r}\varepsilon_{t-j}^\top B_{j,t}^\top,
\end{align*}
which yields that
\begin{align*}
\operatorname{vec}\big[W_{T,t}\big(x_tx_t^\top-E(x_tx_t^\top)\big)\big]
&=(I_d\otimes W_{T,t})\sum_{j=0}^{\infty}(B_{j,t}\otimes\mu_t)\varepsilon_{t-j}
+(I_d\otimes W_{T,t})\sum_{j=0}^{\infty}(\mu_t\otimes B_{j,t})\varepsilon_{t-j}\\
&\quad+(I_d\otimes W_{T,t})\sum_{j=0}^{\infty}(B_{j,t}\otimes B_{j,t})\operatorname{vec}\big[\varepsilon_{t-j}\varepsilon_{t-j}^\top-I_d\big]\\
&\quad+(I_d\otimes W_{T,t})\sum_{r=1}^{\infty}\sum_{j=0}^{\infty}(B_{j+r,t}\otimes B_{j,t})\operatorname{vec}\big[\varepsilon_{t-j}\varepsilon_{t-j-r}^\top\big]\\
&\quad+(I_d\otimes W_{T,t})\sum_{r=1}^{\infty}\sum_{j=0}^{\infty}(B_{j,t}\otimes B_{j+r,t})\operatorname{vec}\big[\varepsilon_{t-j-r}\varepsilon_{t-j}^\top\big].
\end{align*}
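The vectorisation step above rests on the identity $\operatorname{vec}(WM)=(I_d\otimes W)\operatorname{vec}(M)$, a special case of $\operatorname{vec}(ABC)=(C^\top\otimes A)\operatorname{vec}(B)$. A minimal numerical sketch, with arbitrary matrices standing in for $W_{T,t}$ and the centred product $x_tx_t^\top-E(x_tx_t^\top)$:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3
W = rng.standard_normal((d, d))   # stands in for W_{T,t}
M = rng.standard_normal((d, d))   # stands in for x_t x_t' - E(x_t x_t')

vec = lambda A: A.reshape(-1, order="F")   # column-major vectorisation

lhs = vec(W @ M)
rhs = np.kron(np.eye(d), W) @ vec(M)
# lhs and rhs coincide: vec(W M) = (I_d kron W) vec(M).
```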
Consequently, we obtain
\begin{align*}
\Big\|\sum_{t=1}^{T}W_{T,t}\big(x_tx_t^\top-E(x_tx_t^\top)\big)\Big\|
&\le 2\Big\|\sum_{t=1}^{T}(I_d\otimes W_{T,t})\sum_{j=0}^{\infty}(\mu_t\otimes B_{j,t})\varepsilon_{t-j}\Big\|
+\Big\|\sum_{t=1}^{T}(I_d\otimes W_{T,t})\sum_{j=0}^{\infty}(B_{j,t}\otimes B_{j,t})\operatorname{vec}\big[\varepsilon_{t-j}\varepsilon_{t-j}^\top-I_d\big]\Big\|\\
&\quad+2\Big\|\sum_{t=1}^{T}(I_d\otimes W_{T,t})\sum_{r=1}^{\infty}\sum_{j=0}^{\infty}(B_{j+r,t}\otimes B_{j,t})\operatorname{vec}\big[\varepsilon_{t-j}\varepsilon_{t-j-r}^\top\big]\Big\|
:=I_{T,5}+I_{T,6}+I_{T,7}.
\end{align*}
By the development of $\sum_{t=1}^{T}W_{T,t}(x_t-E(x_t))$, it is easy to see that $I_{T,5}$ is $o_P(1)$.

For $I_{T,6}$, by Lemma B.3, write
\begin{align*}
I_{T,6}&\le \Big\|\sum_{t=1}^{T}(I_d\otimes W_{T,t})B^0_t(1)\big(\operatorname{vec}(\varepsilon_t\varepsilon_t^\top)-\operatorname{vec}(I_d)\big)\Big\|
+\big\|(I_d\otimes W_{T,1})\widetilde B^0_1(L)\operatorname{vec}(\varepsilon_0\varepsilon_0^\top)\big\|
+\big\|(I_d\otimes W_{T,T})\widetilde B^0_T(L)\operatorname{vec}(\varepsilon_T\varepsilon_T^\top)\big\|\\
&\quad+\Big\|\sum_{t=1}^{T-1}\big((I_d\otimes W_{T,t+1})\widetilde B^0_{t+1}(L)-(I_d\otimes W_{T,t})\widetilde B^0_t(L)\big)\operatorname{vec}(\varepsilon_t\varepsilon_t^\top)\Big\|
:=I_{T,8}+I_{T,9}+I_{T,10}+I_{T,11}.
\end{align*}
Let $Z_t=\operatorname{vec}(\varepsilon_t\varepsilon_t^\top-I_d)$ for notational simplicity. By Assumption 2, for any $\nu>0$, there exists $\lambda_\nu>0$ such that for all $t$, $E\big[\|Z_t\|^2\cdot I(\|Z_t\|>\lambda_\nu)\big]<\nu$. Then let further $Z_{1,t}=Z_t\cdot I(\|Z_t\|\le\lambda_\nu)$ and $Z_{2,t}=Z_t-Z_{1,t}$. We are now ready to write
\[
I_{T,8}\le \Big\|\sum_{t=1}^{T}(I_d\otimes W_{T,t})B^0_t(1)\big(Z_{1,t}-E(Z_{1,t}\,|\,\mathcal F_{t-1})\big)\Big\|
+\Big\|\sum_{t=1}^{T}(I_d\otimes W_{T,t})B^0_t(1)\big(Z_{2,t}-E(Z_{2,t}\,|\,\mathcal F_{t-1})\big)\Big\|. \tag{B.4}
\]
For the first term on the right hand side of (B.4), by Lemma B.3, write
\begin{align*}
&E\Big\|\sum_{t=1}^{T}(I_d\otimes W_{T,t})B^0_t(1)\big(Z_{1,t}-E(Z_{1,t}\,|\,\mathcal F_{t-1})\big)\Big\|^2\\
&=\operatorname{tr}\bigg\{\sum_{t=1}^{T}\sum_{s=1}^{T}(I_d\otimes W_{T,t})B^0_t(1)E\Big(\big(Z_{1,t}-E(Z_{1,t}\,|\,\mathcal F_{t-1})\big)\big(Z_{1,s}-E(Z_{1,s}\,|\,\mathcal F_{s-1})\big)^\top\Big)B^{0,\top}_s(1)(I_d\otimes W_{T,s}^\top)\bigg\}\\
&\le M\max_{t\ge1}\Big(\sum_{j=0}^{\infty}\|B_{j,t}\|\Big)^2\sum_{t=1}^{T}\|W_{T,t}\|^2E\|Z_{1,t}\|^2
\le M\lambda_\nu^2\max_{t\ge1}\|W_{T,t}\|\sum_{t=1}^{T}\|W_{T,t}\|=o(1).
\end{align*}
For the second term on the right hand side of (B.4), we have
\[
E\Big\|\sum_{t=1}^{T}(I_d\otimes W_{T,t})B^0_t(1)\big(Z_{2,t}-E(Z_{2,t}\,|\,\mathcal F_{t-1})\big)\Big\|^2
\le M\sum_{t=1}^{T}\|W_{T,t}\|^2E\|Z_{2,t}\|^2\le M\nu.
\]
By choosing $\nu$ sufficiently small, it then follows that $I_{T,8}=o_P(1)$. Similar to the proofs of $I_{T,2}$ and $I_{T,3}$, we can prove that $I_{T,9}$ and $I_{T,10}$ are $o_P(1)$. For $I_{T,11}$, we have
\[
I_{T,11}\le \Big\|\sum_{t=1}^{T-1}\big(I_d\otimes(W_{T,t+1}-W_{T,t})\big)\widetilde B^0_{t+1}(L)\operatorname{vec}\big(\varepsilon_t\varepsilon_t^\top\big)\Big\|
+\Big\|\sum_{t=1}^{T-1}(I_d\otimes W_{T,t})\big(\widetilde B^0_{t+1}(L)-\widetilde B^0_t(L)\big)\operatorname{vec}\big(\varepsilon_t\varepsilon_t^\top\big)\Big\|.
\]
Similar to the proof of $I_{T,4}$, by Lemma B.3, we can prove that $I_{T,11}$ is $o_P(1)$. Then we can conclude that $I_{T,6}=o_P(1)$.

For $I_{T,7}$, using Lemma B.3, we have
\begin{align*}
I_{T,7}&\le 2\Big\|\sum_{t=1}^{T}(I_d\otimes W_{T,t})\sum_{r=1}^{\infty}B^r_t(1)\operatorname{vec}\big(\varepsilon_t\varepsilon_{t-r}^\top\big)\Big\|
+2\Big\|(I_d\otimes W_{T,1})\sum_{r=1}^{\infty}\widetilde B^r_1(L)\operatorname{vec}\big(\varepsilon_0\varepsilon_{-r}^\top\big)\Big\|\\
&\quad+2\Big\|(I_d\otimes W_{T,T})\sum_{r=1}^{\infty}\widetilde B^r_T(L)\operatorname{vec}\big(\varepsilon_T\varepsilon_{T-r}^\top\big)\Big\|
+2\Big\|\sum_{t=1}^{T-1}\sum_{r=1}^{\infty}\big((I_d\otimes W_{T,t+1})\widetilde B^r_{t+1}(L)-(I_d\otimes W_{T,t})\widetilde B^r_t(L)\big)\operatorname{vec}\big(\varepsilon_t\varepsilon_{t-r}^\top\big)\Big\|\\
&:=I_{T,12}+I_{T,13}+I_{T,14}+I_{T,15}.
\end{align*}
For $I_{T,12}$, by Lemma B.3, we further write
\begin{align*}
&E\Big\|\sum_{t=1}^{T}(I_d\otimes W_{T,t})\sum_{r=1}^{\infty}B^r_t(1)\operatorname{vec}\big(\varepsilon_t\varepsilon_{t-r}^\top\big)\Big\|^2\\
&=E\operatorname{tr}\bigg\{\sum_{t=1}^{T}\sum_{s=1}^{T}(I_d\otimes W_{T,t})\sum_{r,k=1}^{\infty}B^r_t(1)\operatorname{vec}\big(\varepsilon_t\varepsilon_{t-r}^\top\big)\operatorname{vec}^\top\big(\varepsilon_s\varepsilon_{s-k}^\top\big)B^{k,\top}_s(1)(I_d\otimes W_{T,s}^\top)\bigg\}\\
&\le M\sum_{t=1}^{T}\|W_{T,t}\|^2\Big(\sum_{r=1}^{\infty}\|B^r_t(1)\|\Big)^2
\le M\Big(\max_{t\ge1}\sum_{r=1}^{\infty}\|B^r_t(1)\|\Big)^2\max_{t\ge1}\|W_{T,t}\|\sum_{t=1}^{T}\|W_{T,t}\|=o(1).
\end{align*}
In addition, similar to the proofs of $I_{T,9}$ to $I_{T,11}$, we can show that $I_{T,13}$ to $I_{T,15}$ are $o_P(1)$. Combining the above results, we have proved the case of $p=0$.

Similar to the development for $p=0$, we can consider the case with $p\ge1$, where $p$ is a fixed number. The details are omitted due to similarity. The proof is now complete.

Proof of Lemma B.5.
In the following proof, we cover the interval $[a,b]$ by a finite number of subintervals $\{S_l\}$, which are centered at $s_l$ with length denoted by $\delta_T$. Denote the number of these intervals by $N_T$; then $N_T=O(\delta_T^{-1})$. In addition, let $\delta_T=O(T^{-1}\gamma_T)$ with $\gamma_T=\sqrt{d_T\log T}$.

Write
\[
\sup_{\tau\in[a,b]}\Big\|\sum_{t=1}^{T}W_{T,t}(\tau)B_t(1)\varepsilon_t\Big\|
\le \max_{1\le l\le N_T}\Big\|\sum_{t=1}^{T}W_{T,t}(s_l)B_t(1)\varepsilon_t\Big\|
+\max_{1\le l\le N_T}\sup_{\tau\in S_l}\Big\|\sum_{t=1}^{T}\big(W_{T,t}(\tau)-W_{T,t}(s_l)\big)B_t(1)\varepsilon_t\Big\|
:=J_{T,1}+J_{T,2}.
\]
For $J_{T,2}$, since $W_{T,t}(\cdot)$ is Lipschitz continuous and $\max_{t\ge1}\|B_t(1)\|<\infty$ by Assumption 1, we have
\[
E|J_{T,2}|\le \sum_{t=1}^{T}\max_{1\le l\le N_T}\sup_{\tau\in S_l}\|W_{T,t}(\tau)-W_{T,t}(s_l)\|\,E\|B_t(1)\varepsilon_t\|
\le MT\delta_T\max_{t\ge1}E\|B_t(1)\varepsilon_t\|=O(\gamma_T).
\]
For $J_{T,1}$, we apply the truncation method. Define $\varepsilon_t'=\varepsilon_tI\big(\|\varepsilon_t\|\le T^{1/\delta}\big)$ and $\varepsilon_t''=\varepsilon_t-\varepsilon_t'$, where $\delta$ is defined in Assumption 2, and $I(\cdot)$ is the indicator function.
Write
\begin{align*}
J_{T,1}&=\max_{1\le l\le N_T}\Big\|\sum_{t=1}^{T}W_{T,t}(s_l)B_t(1)\big(\varepsilon_t'+\varepsilon_t''-E(\varepsilon_t'+\varepsilon_t''\,|\,\mathcal F_{t-1})\big)\Big\|\\
&\le \max_{1\le l\le N_T}\Big\|\sum_{t=1}^{T}W_{T,t}(s_l)B_t(1)\big(\varepsilon_t'-E(\varepsilon_t'\,|\,\mathcal F_{t-1})\big)\Big\|
+\max_{1\le l\le N_T}\Big\|\sum_{t=1}^{T}W_{T,t}(s_l)B_t(1)\varepsilon_t''\Big\|
+\max_{1\le l\le N_T}\Big\|\sum_{t=1}^{T}W_{T,t}(s_l)B_t(1)E(\varepsilon_t''\,|\,\mathcal F_{t-1})\Big\|\\
&:=J_{T,3}+J_{T,4}+J_{T,5}.
\end{align*}
Start from $J_{T,4}$. By H\"older's inequality and Markov's inequality,
\begin{align*}
E|J_{T,4}|&\le O(1)\sqrt{d_T/T}\sum_{t=1}^{T}E\|\varepsilon_t''\|
=O(1)\sqrt{d_T/T}\sum_{t=1}^{T}E\big\|\varepsilon_tI\big(\|\varepsilon_t\|\ge T^{1/\delta}\big)\big\|\\
&\le O(1)\sqrt{d_T/T}\sum_{t=1}^{T}\big\{E\|\varepsilon_t\|^{\delta}\big\}^{1/\delta}\big\{\Pr\big(\|\varepsilon_t\|\ge T^{1/\delta}\big)\big\}^{(\delta-1)/\delta}
\le O(1)\sqrt{d_T/T}\sum_{t=1}^{T}\big\{E\|\varepsilon_t\|^{\delta}\big\}^{1/\delta}\Big\{\frac{E\|\varepsilon_t\|^{\delta}}{T}\Big\}^{(\delta-1)/\delta}\\
&=O\big(T^{1/\delta-1/2}\sqrt{d_T}\big)=o\big(\sqrt{d_T\log T}\big),
\end{align*}
where the second inequality follows from H\"older's inequality, and the third inequality follows from Markov's inequality. Similarly, $J_{T,5}=O_P\big(T^{1/\delta-1/2}\sqrt{d_T}\big)=o_P\big(\sqrt{d_T\log T}\big)$.

We now turn to $J_{T,3}$.
For notational simplicity, let $Y_t=W_{T,t}(s_l)B_t(1)\big(\varepsilon_t'-E(\varepsilon_t'\,|\,\mathcal F_{t-1})\big)$ for $1\le t\le T$ and $A_T=2T^{1/\delta}\sqrt{d_T/T}\max_{t\ge1}\|B_t(1)\|$. Simple algebra shows that $\|Y_t\|\le A_T$ uniformly in $t$ and $s_l$. By Assumption 2 and the first condition in the body of this lemma,
\[
\max_{1\le l\le N_T}\sum_{t=1}^{T}E\big(\|Y_t\|^2\,|\,\mathcal F_{t-1}\big)
\le M\sqrt{d_T/T}\max_{1\le l\le N_T}\sum_{t=1}^{T}\|W_{T,t}(s_l)\|\,E\big(\|\varepsilon_t\|^2\,|\,\mathcal F_{t-1}\big)=O_{a.s.}(d_T).
\]
By Lemma B.2 and $T^{2/\delta-1}d_T\log T\to 0$, choose some $\beta$ sufficiently large (say $\beta>4$), and write
\begin{align*}
\Pr\Big(J_{T,3}>\sqrt{\beta M_1}\,\gamma_T\Big)
&=\Pr\bigg(J_{T,3}>\sqrt{\beta M_1}\,\gamma_T,\ \max_{1\le l\le N_T}\Big\|\sum_{t=1}^{T}E\big(Y_tY_t^\top\,|\,\mathcal F_{t-1}\big)\Big\|\le M_1d_T\bigg)\\
&\quad+\Pr\bigg(J_{T,3}>\sqrt{\beta M_1}\,\gamma_T,\ \max_{1\le l\le N_T}\Big\|\sum_{t=1}^{T}E\big(Y_tY_t^\top\,|\,\mathcal F_{t-1}\big)\Big\|>M_1d_T\bigg)\\
&\le \Pr\bigg(J_{T,3}>\sqrt{\beta M_1}\,\gamma_T,\ \max_{1\le l\le N_T}\Big\|\sum_{t=1}^{T}E\big(Y_tY_t^\top\,|\,\mathcal F_{t-1}\big)\Big\|\le M_1d_T\bigg)
+\Pr\bigg(\max_{1\le l\le N_T}\Big\|\sum_{t=1}^{T}E\big(Y_tY_t^\top\,|\,\mathcal F_{t-1}\big)\Big\|>M_1d_T\bigg)\\
&\le N_T\exp\bigg(\frac{-\beta M_1\gamma_T^2}{2M_1d_T+2\gamma_TA_T}\bigg)+0
\le N_T\exp\Big(-\frac{\beta}{4}\log T\Big)=O(\delta_T^{-1})\,T^{-\beta/4}\to 0.
\end{align*}
Based on the above development, the proof is now complete.
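The Hölder–Markov step used to control the truncated tail term (as in the bound on $J_{T,4}$) can be verified exactly on a toy discrete distribution; all the numbers below are illustrative assumptions, not quantities from the paper:

```python
import numpy as np

# A toy discrete distribution (values v with probabilities p) lets the
# truncation bound be checked exactly.
v = np.array([0.1, 0.5, 1.0, 3.0, 8.0])
p = np.array([0.4, 0.3, 0.2, 0.08, 0.02])
delta, c = 3.0, 2.0                       # moment order and truncation level

tail = np.sum(p * np.abs(v) * (np.abs(v) > c))        # E[|X| 1(|X| > c)]
m_delta = np.sum(p * np.abs(v) ** delta)              # E|X|^delta
prob = np.sum(p * (np.abs(v) > c))                    # P(|X| > c)

# Hölder: E[|X| 1(|X|>c)] <= (E|X|^d)^{1/d} * P(|X|>c)^{(d-1)/d};
# Markov then replaces P(|X|>c) by E|X|^d / c^d.
holder = m_delta ** (1 / delta) * prob ** ((delta - 1) / delta)
markov = m_delta ** (1 / delta) * (m_delta / c ** delta) ** ((delta - 1) / delta)
```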
Proof of Lemma B.6.

(1). Similar to the proof of Lemma B.5, we use a finite number of subintervals $\{S_l\}$ to cover the interval $[a,b]$, which are centered at $s_l$ with length $\delta_T$. Denote the number of these intervals by $N_T$; then $N_T=O(\delta_T^{-1})$. In addition, let $\delta_T=O(T^{-1}\gamma_T)$ with $\gamma_T=\sqrt{d_T\log T}$. Write
\begin{align*}
&\sup_{\tau\in[a,b]}\Big\|\sum_{t=1}^{T}\big(I_d\otimes W_{T,t}(\tau)\big)B^0_t(1)\big(\operatorname{vec}(\varepsilon_t\varepsilon_t^\top)-\operatorname{vec}(I_d)\big)\Big\|\\
&\le \max_{1\le l\le N_T}\Big\|\sum_{t=1}^{T}\big(I_d\otimes W_{T,t}(s_l)\big)B^0_t(1)\big(\operatorname{vec}(\varepsilon_t\varepsilon_t^\top)-\operatorname{vec}(I_d)\big)\Big\|
+\max_{1\le l\le N_T}\sup_{\tau\in S_l}\Big\|\sum_{t=1}^{T}\big(I_d\otimes(W_{T,t}(\tau)-W_{T,t}(s_l))\big)B^0_t(1)\big(\operatorname{vec}(\varepsilon_t\varepsilon_t^\top)-\operatorname{vec}(I_d)\big)\Big\|\\
&:=J_{T,1}+J_{T,2}.
\end{align*}
Start from $J_{T,2}$. Similar to the proof of Lemma B.5, since
\[
\|B^0_t(1)\|\le \sum_{j=0}^{\infty}\|B_{j,t}\|^2\le \Big(\sum_{j=0}^{\infty}\|B_{j,t}\|\Big)^2<\infty
\]
by Assumption 1, we have
\[
E|J_{T,2}|\le MT\delta_T\max_{t\ge1}E\big\|B^0_t(1)\big(\operatorname{vec}(\varepsilon_t\varepsilon_t^\top)-\operatorname{vec}(I_d)\big)\big\|=O(\gamma_T).
\]
We then apply the truncation method. Define $u_t=B^0_t(1)\big(\operatorname{vec}(\varepsilon_t\varepsilon_t^\top)-\operatorname{vec}(I_d)\big)$, $u_t'=u_tI\big(\|u_t\|\le T^{2/\delta}\big)$ and $u_t''=u_t-u_t'$.
For $J_{T,1}$, write
\begin{align*}
J_{T,1}&=\max_{1\le l\le N_T}\Big\|\sum_{t=1}^{T}\big(I_d\otimes W_{T,t}(s_l)\big)\big(u_t'+u_t''-E(u_t'+u_t''\,|\,\mathcal F_{t-1})\big)\Big\|\\
&\le \max_{1\le l\le N_T}\Big\|\sum_{t=1}^{T}\big(I_d\otimes W_{T,t}(s_l)\big)\big(u_t'-E(u_t'\,|\,\mathcal F_{t-1})\big)\Big\|
+\max_{1\le l\le N_T}\Big\|\sum_{t=1}^{T}\big(I_d\otimes W_{T,t}(s_l)\big)u_t''\Big\|
+\max_{1\le l\le N_T}\Big\|\sum_{t=1}^{T}\big(I_d\otimes W_{T,t}(s_l)\big)E(u_t''\,|\,\mathcal F_{t-1})\Big\|\\
&:=J_{T,3}+J_{T,4}+J_{T,5}.
\end{align*}
As in the proof of Lemma B.5, we can show that $J_{T,4}=o_P\big(\sqrt{d_T\log T}\big)$ and $J_{T,5}=o_P\big(\sqrt{d_T\log T}\big)$, respectively. We focus on $J_{T,3}$ below.

For any $1\le l\le N_T$, let $Y_t=\big(I_d\otimes W_{T,t}(s_l)\big)\big(u_t'-E(u_t'\,|\,\mathcal F_{t-1})\big)$. We then have $E(Y_t\,|\,\mathcal F_{t-1})=0$ and $\|Y_t\|\le 2T^{2/\delta}\sqrt{d_T/T}\max_{t\ge1}\|B^0_t(1)\|$. Since $\max_{t\ge1}E\big(\|\varepsilon_t\|^4\,|\,\mathcal F_{t-1}\big)<\infty$ a.s., we can write
\[
\max_{1\le l\le N_T}\Big\|\sum_{t=1}^{T}E\big(Y_tY_t^\top\,|\,\mathcal F_{t-1}\big)\Big\|
\le M\sqrt{d_T/T}\max_{t\ge1}E\big(\|u_t\|^2\,|\,\mathcal F_{t-1}\big)\max_{1\le l\le N_T}\sum_{t=1}^{T}\|W_{T,t}(s_l)\|=O_{a.s.}(d_T).
\]
Similar to Lemma B.2, choose some $\beta$ sufficiently large (say $\beta>4$). In view of the fact that $T^{4/\delta-1}d_T\log T\to 0$, we write
\begin{align*}
\Pr\Big(J_{T,3}>\sqrt{\beta M_1}\,\gamma_T\Big)
&=\Pr\bigg(J_{T,3}>\sqrt{\beta M_1}\,\gamma_T,\ \max_{1\le l\le N_T}\Big\|\sum_{t=1}^{T}E\big(Y_tY_t^\top\,|\,\mathcal F_{t-1}\big)\Big\|\le M_1d_T\bigg)\\
&\quad+\Pr\bigg(J_{T,3}>\sqrt{\beta M_1}\,\gamma_T,\ \max_{1\le l\le N_T}\Big\|\sum_{t=1}^{T}E\big(Y_tY_t^\top\,|\,\mathcal F_{t-1}\big)\Big\|>M_1d_T\bigg)\\
&\le \Pr\bigg(J_{T,3}>\sqrt{\beta M_1}\,\gamma_T,\ \max_{1\le l\le N_T}\Big\|\sum_{t=1}^{T}E\big(Y_tY_t^\top\,|\,\mathcal F_{t-1}\big)\Big\|\le M_1d_T\bigg)
+\Pr\bigg(\max_{1\le l\le N_T}\Big\|\sum_{t=1}^{T}E\big(Y_tY_t^\top\,|\,\mathcal F_{t-1}\big)\Big\|>M_1d_T\bigg)\\
&\le N_T\exp\bigg(\frac{-\beta M_1\gamma_T^2}{2M_1d_T+2M\gamma_TT^{2/\delta}\sqrt{d_T/T}}\bigg)+0
\le N_T\exp\Big(-\frac{\beta}{4}\log T\Big)=N_TT^{-\beta/4}=o(1).
\end{align*}
The first result then follows.

(2). Let $\{S_l\}$ be a finite number of subintervals covering the interval $[a,b]$, which are centered at $s_l$ with length $\delta_T$. Denote the number of these intervals by $N_T$; then $N_T=O(\delta_T^{-1})$. In addition, let $\delta_T=O(T^{-1}\gamma_T)$ with $\gamma_T=\sqrt{d_T\log T}$. Then
\[
\sup_{\tau\in[a,b]}\Big\|\sum_{t=1}^{T}\big(I_d\otimes W_{T,t}(\tau)\big)\zeta_t\varepsilon_t\Big\|
\le \max_{1\le l\le N_T}\Big\|\sum_{t=1}^{T}\big(I_d\otimes W_{T,t}(s_l)\big)\zeta_t\varepsilon_t\Big\|
+\max_{1\le l\le N_T}\sup_{\tau\in S_l}\Big\|\sum_{t=1}^{T}\big(I_d\otimes(W_{T,t}(\tau)-W_{T,t}(s_l))\big)\zeta_t\varepsilon_t\Big\|
:=J_{T,1}+J_{T,2}.
\]
Consider $J_{T,2}$.
By the fact that $|\operatorname{tr}(A)|\le d\|A\|$ for any $d\times d$ matrix $A$ and Assumption 1,
\begin{align*}
E\|\zeta_t\varepsilon_t\|
&=E\Big\|\sum_{r=1}^{\infty}\Big(\sum_{s=0}^{\infty}B_{s+r,t}\otimes B_{s,t}\Big)\operatorname{vec}\big(\varepsilon_t\varepsilon_{t-r}^\top\big)\Big\|
\le \bigg\{E\Big\|\sum_{r=1}^{\infty}\Big(\sum_{s=0}^{\infty}B_{s+r,t}\otimes B_{s,t}\Big)\operatorname{vec}\big(\varepsilon_t\varepsilon_{t-r}^\top\big)\Big\|^2\bigg\}^{1/2}\\
&\le \bigg\{\operatorname{tr}\bigg(\sum_{r=1}^{\infty}\Big(\sum_{s=0}^{\infty}B_{s+r,t}\otimes B_{s,t}\Big)(I_d\otimes I_d)\Big(\sum_{s=0}^{\infty}B_{s+r,t}\otimes B_{s,t}\Big)^\top\bigg)\bigg\}^{1/2}
\le M\bigg\{\sum_{r=1}^{\infty}\Big\|\sum_{s=0}^{\infty}B_{s+r,t}\otimes B_{s,t}\Big\|^2\bigg\}^{1/2}\\
&\le M\bigg\{\sum_{r=1}^{\infty}\Big(\sum_{s=0}^{\infty}\|B_{s+r,t}\|\Big)\Big(\sum_{s=0}^{\infty}\|B_{s,t}\|\Big)\bigg\}^{1/2}
=M\bigg\{\Big(\sum_{r=1}^{\infty}r\|B_{r,t}\|\Big)\Big(\sum_{s=0}^{\infty}\|B_{s,t}\|\Big)\bigg\}^{1/2}<\infty.
\end{align*}
Similarly, we have
\[
E|J_{T,2}|\le MT\delta_T\max_{t\ge1}E\|\zeta_t\varepsilon_t\|=O(\gamma_T).
\]
Before investigating $J_{T,1}$, we first show that
\[
\max_{1\le l\le N_T}\sum_{t=1}^{T}\|W_{T,t}(s_l)\|\,E\big(\|\zeta_t\varepsilon_t\|^2\,|\,\mathcal F_{t-1}\big)=O_P(1).
\]
(B.5)

Note that
\[
\max_{1\le l\le N_T}\sum_{t=1}^{T}\|W_{T,t}(s_l)\|\,E\big(\|\zeta_t\varepsilon_t\|^2\,|\,\mathcal F_{t-1}\big)
\le \max_{1\le l\le N_T}\bigg|\sum_{t=1}^{T}\|W_{T,t}(s_l)\|\Big(E\big(\|\zeta_t\varepsilon_t\|^2\,|\,\mathcal F_{t-1}\big)-E\|\zeta_t\varepsilon_t\|^2\Big)\bigg|
+\max_{1\le l\le N_T}\sum_{t=1}^{T}\|W_{T,t}(s_l)\|\,E\|\zeta_t\varepsilon_t\|^2,
\]
and $\max_{1\le l\le N_T}\sum_{t=1}^{T}\|W_{T,t}(s_l)\|\,E\|\zeta_t\varepsilon_t\|^2=O(1)$. Thus, to prove (B.5), it is sufficient to show
\[
\max_{1\le l\le N_T}\bigg|\sum_{t=1}^{T}\|W_{T,t}(s_l)\|\Big(E\big(\|\zeta_t\varepsilon_t\|^2\,|\,\mathcal F_{t-1}\big)-E\|\zeta_t\varepsilon_t\|^2\Big)\bigg|=o_P(1).
\]
In order to do so, we write
\begin{align*}
&\max_{1\le l\le N_T}\bigg|\sum_{t=1}^{T}\|W_{T,t}(s_l)\|\Big(E\big(\|\zeta_t\varepsilon_t\|^2\,|\,\mathcal F_{t-1}\big)-E\|\zeta_t\varepsilon_t\|^2\Big)\bigg|\\
&=\max_{1\le l\le N_T}\bigg|\sum_{t=1}^{T}\|W_{T,t}(s_l)\|\operatorname{tr}\bigg(\sum_{r,r^*=1}^{\infty}B^r_t(1)\big(\varepsilon_{t-r}\varepsilon_{t-r^*}^\top\otimes I_d\big)B^{r^*,\top}_t(1)-\sum_{r=1}^{\infty}B^r_t(1)B^{r,\top}_t(1)\bigg)\bigg|\\
&\le d\cdot\max_{1\le l\le N_T}\Big\|\sum_{t=1}^{T}\|W_{T,t}(s_l)\|\sum_{r=1}^{\infty}\big(B^r_t(1)\otimes B^r_t(1)\big)\big(\operatorname{vec}\big(\varepsilon_{t-r}\varepsilon_{t-r}^\top\otimes I_d\big)-\operatorname{vec}(I_{d^2})\big)\Big\|\\
&\quad+2d\cdot\max_{1\le l\le N_T}\Big\|\sum_{t=1}^{T}\|W_{T,t}(s_l)\|\sum_{r=1}^{\infty}\sum_{j=1}^{\infty}\big(B^{r+j}_t(1)\otimes B^r_t(1)\big)\operatorname{vec}\big(\varepsilon_{t-r}\varepsilon_{t-r-j}^\top\otimes I_d\big)\Big\|
:=J_{T,6}+J_{T,7}.
\end{align*}
Let $F_{r,t}(L)=\sum_{j=1}^{\infty}\big(B^{r+j}_t(1)\otimes B^j_t(1)\big)L^j$. Similar to the second result of Lemma B.3, we have
\[
F_{r,t}(L)=F_{r,t}(1)-(1-L)\widetilde F_{r,t}(L), \tag{B.6}
\]
where $\widetilde F_{r,t}(L)=\sum_{j=1}^{\infty}\widetilde F_{rj,t}L^j$ and $\widetilde F_{rj,t}=\sum_{k=j+1}^{\infty}B^{r+k}_t(1)\otimes B^k_t(1)$. For notational simplicity, denote
\[
\widetilde X_{a,t}=\sum_{j=1}^{\infty}\big(B^j_t(1)\otimes B^j_t(1)\big)\operatorname{vec}\big(\varepsilon_{t-j}\varepsilon_{t-j}^\top\otimes I_d\big),\qquad
\widetilde X_{b,t}=\sum_{r=1}^{\infty}\sum_{j=1}^{\infty}\big(B^{r+j}_t(1)\otimes B^j_t(1)\big)\operatorname{vec}\big(\varepsilon_{t-j}\varepsilon_{t-r-j}^\top\otimes I_d\big).
\]
Applying (B.6) to $\widetilde X_{a,t}$ and $\widetilde X_{b,t}$ yields
\begin{align*}
\widetilde X_{a,t}&=F_{0,t}(1)\operatorname{vec}\big(\varepsilon_t\varepsilon_t^\top\otimes I_d\big)-(1-L)\widetilde F_{0,t}(L)\operatorname{vec}\big(\varepsilon_t\varepsilon_t^\top\otimes I_d\big),\\
\widetilde X_{b,t}&=\sum_{r=1}^{\infty}F_{r,t}(1)\operatorname{vec}\big(\varepsilon_t\varepsilon_{t-r}^\top\otimes I_d\big)-(1-L)\sum_{r=1}^{\infty}\widetilde F_{r,t}(L)\operatorname{vec}\big(\varepsilon_t\varepsilon_{t-r}^\top\otimes I_d\big).
\end{align*}
For $J_{T,6}$, summing up $\widetilde X_{a,t}$ over $t$ yields
\begin{align*}
&\max_{1\le l\le N_T}\Big\|\sum_{t=1}^{T}\|W_{T,t}(s_l)\|\sum_{r=1}^{\infty}\big(B^r_t(1)\otimes B^r_t(1)\big)\big(\operatorname{vec}\big(\varepsilon_{t-r}\varepsilon_{t-r}^\top\otimes I_d\big)-\operatorname{vec}(I_{d^2})\big)\Big\|\\
&\le \max_{1\le l\le N_T}\Big\|\sum_{t=1}^{T}\|W_{T,t}(s_l)\|F_{0,t}(1)\big(\operatorname{vec}\big(\varepsilon_t\varepsilon_t^\top\otimes I_d\big)-\operatorname{vec}(I_{d^2})\big)\Big\|
+\max_{1\le l\le N_T}\Big\|\|W_{T,1}(s_l)\|\widetilde F_{0,1}(L)\operatorname{vec}\big(\varepsilon_0\varepsilon_0^\top\otimes I_d\big)\Big\|\\
&\quad+\max_{1\le l\le N_T}\Big\|\|W_{T,T}(s_l)\|\widetilde F_{0,T}(L)\operatorname{vec}\big(\varepsilon_T\varepsilon_T^\top\otimes I_d\big)\Big\|
+\max_{1\le l\le N_T}\Big\|\sum_{t=1}^{T-1}\big(\|W_{T,t+1}(s_l)\|\widetilde F_{0,t+1}(L)-\|W_{T,t}(s_l)\|\widetilde F_{0,t}(L)\big)\operatorname{vec}\big(\varepsilon_t\varepsilon_t^\top\otimes I_d\big)\Big\|\\
&:=J_{T,8}+J_{T,9}+J_{T,10}+J_{T,11}.
\end{align*}
Similar to the proof of the first result of Lemma B.6, we can show that $J_{T,8}=O_P\big(\sqrt{d_T\log T}\big)$, since
\[
\max_{t\ge1}\|F_{0,t}(1)\|
\le \max_{t\ge1}\sum_{j=1}^{\infty}\Big\|\sum_{k=0}^{\infty}B_{k+j,t}\otimes B_{k,t}\Big\|
\le \max_{t\ge1}\sum_{j=1}^{\infty}\Big(\sum_{k=0}^{\infty}\|B_{k+j,t}\|\Big)\Big(\sum_{k=0}^{\infty}\|B_{k,t}\|\Big)
\le \max_{t\ge1}\Big(\sum_{k=0}^{\infty}\|B_{k,t}\|\Big)\sum_{j=1}^{\infty}j\|B_{j,t}\|<\infty.
\]
Also, we can show that $J_{T,12}=O_P(d_T)$ and $J_{T,13}=O_P(d_T)$, since
\begin{align*}
\max_{t\ge1}\big\|\widetilde F_{0,t}(1)\big\|
&\le \max_{t\ge1}\sum_{j=1}^{\infty}\sum_{k=j+1}^{\infty}\big\|B_t^{k}(1)\otimes B_t^{k}(1)\big\|
\le \max_{t\ge1}\sum_{r=1}^{\infty}\sum_{k=r+1}^{\infty}\sum_{j=0}^{\infty}\|B_{j+k,t}\|\sum_{j=0}^{\infty}\|B_{j,t}\|\\
&\le \max_{t\ge1}\sum_{j=0}^{\infty}\|B_{j,t}\|\bigg(\sum_{r=1}^{\infty}\sum_{k=r+1}^{\infty}(k-r)\|B_{k,t}\|\bigg)
\le \max_{t\ge1}\sum_{j=0}^{\infty}\|B_{j,t}\|\bigg(\sum_{r=1}^{\infty}\frac{r(r+1)}{2}\|B_{r+1,t}\|\bigg)\\
&\le \max_{t\ge1}\sum_{j=0}^{\infty}\|B_{j,t}\|\sum_{j=1}^{\infty}j^{2}\|B_{j,t}\|<\infty.
\end{align*}
For $J_{T,14}$, note that
\begin{align*}
\sup_{\tau\in[a,b]}\sum_{t=1}^{T-1}\Big|\|W_{T,t+1}(\tau)\|-\|W_{T,t}(\tau)\|\Big|
\le \sup_{\tau\in[a,b]}\sum_{t=1}^{T-1}\|W_{T,t+1}(\tau)-W_{T,t}(\tau)\|=o(1)
\end{align*}
and
\begin{align*}
\sum_{t=1}^{T-1}\big\|\widetilde F_{0,t+1}(1)-\widetilde F_{0,t}(1)\big\|
&\le \sum_{t=1}^{T-1}\sum_{r=1}^{\infty}\sum_{k=r+1}^{\infty}\big\|B_{t+1}^{k}(1)\otimes B_{t+1}^{k}(1)-B_{t}^{k}(1)\otimes B_{t}^{k}(1)\big\|\\
&\le \sum_{t=1}^{T-1}\sum_{r=1}^{\infty}\sum_{k=r+1}^{\infty}\big\|B_{t+1}^{k}(1)-B_{t}^{k}(1)\big\|\cdot\Big(\big\|B_{t+1}^{k}(1)\big\|+\big\|B_{t}^{k}(1)\big\|\Big)\\
&\le M\sum_{t=1}^{T-1}\sum_{r=1}^{\infty}\sum_{k=r+1}^{\infty}\sum_{j=0}^{\infty}\|B_{j+k,t+1}\otimes B_{j,t+1}-B_{j+k,t}\otimes B_{j,t}\|\\
&\le M\sum_{t=1}^{T-1}\sum_{r=1}^{\infty}\sum_{k=r+1}^{\infty}\sum_{j=0}^{\infty}\Big(\|B_{j+k,t+1}-B_{j+k,t}\|\cdot\|B_{j,t+1}\|+\|B_{j,t+1}-B_{j,t}\|\cdot\|B_{j+k,t}\|\Big)\\
&\le M\sum_{t=1}^{T-1}\sum_{r=1}^{\infty}\sum_{k=r+1}^{\infty}\bigg(\|B_{k,t+1}-B_{k,t}\|\cdot\sum_{j=0}^{\infty}\|B_{j,t+1}\|+\|B_{k,t}\|\cdot\sum_{j=0}^{\infty}\|B_{j,t+1}-B_{j,t}\|\bigg)\\
&\le M\bigg(\sum_{t=1}^{T-1}\sum_{k=1}^{\infty}k\|B_{k,t+1}-B_{k,t}\|\bigg)\cdot\max_{t\ge1}\sum_{j=0}^{\infty}\|B_{j,t+1}\|
+M\bigg(\max_{t\ge1}\sum_{k=1}^{\infty}k\|B_{k,t}\|\bigg)\cdot\sum_{t=1}^{T-1}\sum_{j=0}^{\infty}\|B_{j,t+1}-B_{j,t}\|\\
&=O(1).
\end{align*}
Based on the above development, we conclude that $J_{T,14}=o_P(1)$.

Next, we focus on $J_{T,2}$, and write
\begin{align*}
&\max_{1\le l\le N_T}\bigg\|\sum_{t=1}^{T}\|W_{T,t}(s_l)\|\sum_{r=1}^{\infty}\sum_{j=1}^{\infty}\big(B_t^{r+j}(1)\otimes B_t^{j}(1)\big)\mathrm{vec}\big(\varepsilon_{t-j}\varepsilon_{t-r-j}^{\top}\otimes I_d\big)\bigg\|\\
&\le \max_{1\le l\le N_T}\bigg\|\sum_{t=1}^{T}\|W_{T,t}(s_l)\|\sum_{r=1}^{\infty}F_{r,t}(1)\mathrm{vec}\big(\varepsilon_{t}\varepsilon_{t-r}^{\top}\otimes I_d\big)\bigg\|
+\max_{1\le l\le N_T}\bigg\|\|W_{T,1}(s_l)\|\sum_{r=1}^{\infty}\widetilde F_{r,1}(L)\mathrm{vec}\big(\varepsilon_{0}\varepsilon_{-r}^{\top}\otimes I_d\big)\bigg\|\\
&\quad+\max_{1\le l\le N_T}\bigg\|\|W_{T,T}(s_l)\|\sum_{r=1}^{\infty}\widetilde F_{r,T}(L)\mathrm{vec}\big(\varepsilon_{T}\varepsilon_{T-r}^{\top}\otimes I_d\big)\bigg\|\\
&\quad+\max_{1\le l\le N_T}\bigg\|\sum_{t=1}^{T-1}\sum_{r=1}^{\infty}\Big(\|W_{T,t+1}(s_l)\|\widetilde F_{r,t+1}(L)-\|W_{T,t}(s_l)\|\widetilde F_{r,t}(L)\Big)\mathrm{vec}\big(\varepsilon_{t}\varepsilon_{t-r}^{\top}\otimes I_d\big)\bigg\|\\
&:= J_{T,21}+J_{T,22}+J_{T,23}+J_{T,24}.
\end{align*}
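The $O(1)$ bounds on sums such as $\sum_{t=1}^{T-1}\|B_{j,t+1}-B_{j,t}\|$ reflect that the time-varying coefficients are smooth functions of $t/T$, so their increments have total variation bounded uniformly in $T$. A quick numerical illustration with a hypothetical scalar coefficient path:

```python
import numpy as np

# If B_{j,t} = b_j(t/T) for a smooth (Lipschitz) function b_j, then
# sum_{t=1}^{T-1} |B_{j,t+1} - B_{j,t}| is bounded by the total variation
# of b_j on [0,1], uniformly in T.
b = lambda tau: np.sin(2 * np.pi * tau)     # illustrative coefficient path

for T in (100, 1000, 10000):
    grid = np.arange(1, T + 1) / T
    tv = np.abs(np.diff(b(grid))).sum()     # sum of increment sizes
    # total variation of sin(2*pi*tau) on [0,1] is 4; the discrete sum never exceeds it
    assert tv <= 4.0 + 1e-8
```

The increment sum stays bounded as $T$ grows, which is exactly what makes the summation-by-parts remainders $J_{T,14}$ and $J_{T,24}$ negligible.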
We can show that $J_{T,22}$ and $J_{T,23}$ are $O_P(d_T)$, since
\begin{align*}
\max_{t\ge1}\bigg\|\sum_{r=1}^{\infty}\widetilde F_{r,t}(1)\bigg\|
&\le \max_{t\ge1}\sum_{r=1}^{\infty}\sum_{j=1}^{\infty}\big\|\widetilde F_{rj,t}\big\|
\le \max_{t\ge1}\sum_{r=1}^{\infty}\sum_{j=1}^{\infty}\sum_{k=j+1}^{\infty}\big\|B_t^{r+k}(1)\big\|\,\big\|B_t^{k}(1)\big\|\\
&\le \max_{t\ge1}\bigg(\sum_{r=1}^{\infty}\big\|B_t^{r}(1)\big\|\bigg)\sum_{j=1}^{\infty}j\big\|B_t^{j}(1)\big\|
\le M\max_{t\ge1}\sum_{j=1}^{\infty}j\sum_{k=0}^{\infty}\|B_{k+j,t}\|\,\|B_{k,t}\|\\
&\le M\max_{t\ge1}\sum_{j=1}^{\infty}j\|B_{j,t}\|\bigg(\sum_{k=0}^{\infty}\|B_{k,t}\|\bigg)<\infty.
\end{align*}
Similar to $J_{T,14}$, we have $J_{T,24}=o_P(1)$, since
\begin{align*}
\sum_{t=1}^{T-1}\sum_{r=1}^{\infty}\big\|\widetilde F_{r,t+1}(1)-\widetilde F_{r,t}(1)\big\|
&\le \sum_{t=1}^{T-1}\sum_{r=1}^{\infty}\sum_{j=1}^{\infty}\sum_{k=j+1}^{\infty}\big\|B_{t+1}^{r+k}(1)\otimes B_{t+1}^{k}(1)-B_{t}^{r+k}(1)\otimes B_{t}^{k}(1)\big\|\\
&\le \sum_{t=1}^{T-1}\sum_{r=1}^{\infty}\sum_{j=1}^{\infty}\sum_{k=j+1}^{\infty}\Big(\big\|B_{t+1}^{r+k}(1)-B_{t}^{r+k}(1)\big\|\,\big\|B_{t+1}^{k}(1)\big\|+\big\|B_{t+1}^{k}(1)-B_{t}^{k}(1)\big\|\,\big\|B_{t}^{r+k}(1)\big\|\Big)\\
&\le \bigg(\sum_{t=1}^{T-1}\sum_{r=1}^{\infty}\big\|B_{t+1}^{r}(1)-B_{t}^{r}(1)\big\|\bigg)\max_{t\ge1}\sum_{j=1}^{\infty}j\big\|B_t^{j}(1)\big\|
+\sum_{t=1}^{T-1}\sum_{j=1}^{\infty}j\big\|B_{t+1}^{j}(1)-B_{t}^{j}(1)\big\|\bigg(\max_{t\ge1}\sum_{r=1}^{\infty}\big\|B_t^{r}(1)\big\|\bigg)\\
&=O(1).
\end{align*}
Now consider term $J_{T,21}$.
Define $u_t=\sum_{r=1}^{\infty}F_{r,t}(1)\mathrm{vec}\big(\varepsilon_{t}\varepsilon_{t-r}^{\top}\otimes I_d\big)$, $u_t'=u_t I\big(\|u_t\|\le T^{1/\delta}\big)$ and $u_t''=u_t-u_t'$. Then we have
\begin{align*}
J_{T,21}&=\max_{1\le l\le N_T}\bigg\|\sum_{t=1}^{T}\|W_{T,t}(s_l)\|\big(u_t'+u_t''-E(u_t'+u_t''\mid\mathcal F_{t-1})\big)\bigg\|\\
&\le \max_{1\le l\le N_T}\bigg\|\sum_{t=1}^{T}\|W_{T,t}(s_l)\|\big(u_t'-E(u_t'\mid\mathcal F_{t-1})\big)\bigg\|
+\max_{1\le l\le N_T}\bigg\|\sum_{t=1}^{T}\|W_{T,t}(s_l)\|u_t''\bigg\|
+\max_{1\le l\le N_T}\bigg\|\sum_{t=1}^{T}\|W_{T,t}(s_l)\|E(u_t''\mid\mathcal F_{t-1})\bigg\|\\
&:= J_{T,211}+J_{T,212}+J_{T,213}.
\end{align*}
Using an argument as in the proof of Lemma B.5, we can show that $J_{T,212}$ and $J_{T,213}$ are $O_P\big(T^{1/\delta}d_T\big)$.

Next, consider $J_{T,211}$. For any $1\le l\le N_T$, let $Y_t=\|W_{T,t}(s_l)\|\big(u_t'-E(u_t'\mid\mathcal F_{t-1})\big)$. We then have $E(Y_t\mid\mathcal F_{t-1})=0$ and $\|Y_t\|\le 2T^{1/\delta}d_T$.
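The truncation step splits $u_t$ at the level $T^{1/\delta}$ so that the bounded part $u_t'$ admits an exponential martingale bound, while the tail parts are controlled by moment conditions; re-centering each piece by its conditional mean preserves the martingale-difference structure. A sketch of the split (the threshold and tail index are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
T, delta = 500, 4.0
tau = T ** (1.0 / delta)                      # truncation level T^{1/delta}

u = rng.standard_t(df=5, size=T)              # heavy-tailed innovations (illustrative)
u_prime = np.where(np.abs(u) <= tau, u, 0.0)  # bounded part u'_t
u_dprime = u - u_prime                        # tail part u''_t

# The split is exact, and u' is bounded by the threshold
assert np.allclose(u_prime + u_dprime, u)
assert np.max(np.abs(u_prime)) <= tau
# Exceedances are rare: by Markov, P(|u_t| > tau) <= E|u_t|^delta / T
assert np.mean(u_dprime != 0) <= 0.2
```

With $E\|u_t\|^{\delta}<\infty$, only an $O(T^{-1})$-fraction of observations falls in the tail part, which is why $J_{T,212}$ and $J_{T,213}$ are of smaller order.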
In addition, we have
\begin{align*}
\max_{1\le l\le N_T}\bigg\|\sum_{t=1}^{T}E\big(Y_tY_t^{\top}\mid\mathcal F_{t-1}\big)\bigg\|
&\le \max_{1\le l\le N_T}\sum_{t=1}^{T}\|W_{T,t}(s_l)\|^{2}E\big(\|u_t\|^{2}\mid\mathcal F_{t-1}\big)
\le M\cdot d_T\max_{1\le l\le N_T}\sum_{t=1}^{T}\|W_{T,t}(s_l)\|\sum_{r=1}^{\infty}\|F_{r,t}(1)\|^{2}\|\varepsilon_{t-r}\|^{2}\\
&\le M\cdot d_T\sum_{r=1}^{\infty}\max_{t\ge1}\|F_{r,t}(1)\|^{2}\bigg(\sum_{t=1}^{T}\|\varepsilon_{t-r}\|^{\delta}\bigg)^{2/\delta}
=O_P\big(d_T T^{2/\delta}\big).
\end{align*}
Therefore, we have $\max_{1\le l\le N_T}\big\|\sum_{t=1}^{T}E(Y_tY_t^{\top}\mid\mathcal F_{t-1})\big\|=O_P\big(d_TT^{2/\delta}\big)$. By Lemma B.2, and choosing $\beta=4$, we have
\begin{align*}
&\Pr\Big(J_{T,211}>\sqrt{\beta M}\sqrt{d_TT^{2/\delta}\log T}\Big)\\
&=\Pr\bigg(J_{T,211}>\sqrt{\beta M}\sqrt{d_TT^{2/\delta}\log T},\ \max_{1\le l\le N_T}\bigg\|\sum_{t=1}^{T}E\big(Y_tY_t^{\top}\mid\mathcal F_{t-1}\big)\bigg\|\le Md_TT^{2/\delta}\bigg)\\
&\quad+\Pr\bigg(J_{T,211}>\sqrt{\beta M}\sqrt{d_TT^{2/\delta}\log T},\ \max_{1\le l\le N_T}\bigg\|\sum_{t=1}^{T}E\big(Y_tY_t^{\top}\mid\mathcal F_{t-1}\big)\bigg\|>Md_TT^{2/\delta}\bigg)\\
&\le \Pr\bigg(J_{T,211}>\sqrt{\beta M}\sqrt{d_TT^{2/\delta}\log T},\ \max_{1\le l\le N_T}\bigg\|\sum_{t=1}^{T}E\big(Y_tY_t^{\top}\mid\mathcal F_{t-1}\big)\bigg\|\le Md_TT^{2/\delta}\bigg)\\
&\quad+\Pr\bigg(\max_{1\le l\le N_T}\bigg\|\sum_{t=1}^{T}E\big(Y_tY_t^{\top}\mid\mathcal F_{t-1}\big)\bigg\|>Md_TT^{2/\delta}\bigg)\\
&\le N_T\exp\bigg(-\frac{\beta Md_TT^{2/\delta}\log T}{Md_TT^{2/\delta}+\sqrt{\beta M}\sqrt{d_TT^{2/\delta}\log T}\,T^{1/\delta}d_T}\bigg)+o(1)
\le N_T\exp\bigg(-\frac{\beta}{2}\log T\bigg)=N_TT^{-\beta/2}=o(1),
\end{align*}
given $d_TT^{2/\delta}\log T\to0$. Hence, we have $J_{T,211}=O_P\big(\{d_TT^{2/\delta}\log T\}^{1/2}\big)$. Combining the above results, we have proved that $\sup_{\tau\in[a,b]}\big|\sum_{t=1}^{T}\|W_{T,t}(\tau)\|E\big(\|\zeta_t\otimes\varepsilon_t\|^{2}\mid\mathcal F_{t-1}\big)\big|=O_P(1)$.

Finally, we turn to $J_{T,3}$, and apply the truncation method. Let $u_t=\zeta_t\otimes\varepsilon_t$, $u_t'=u_tI\big(\|u_t\|\le T^{1/\delta}\big)$ and $u_t''=u_t-u_t'$. Then we have
\begin{align*}
J_{T,3}&=\max_{1\le l\le N_T}\bigg\|\sum_{t=1}^{T}\big(I_d\otimes W_{T,t}(s_l)\big)\big(u_t'+u_t''-E(u_t'+u_t''\mid\mathcal F_{t-1})\big)\bigg\|\\
&\le \max_{1\le l\le N_T}\bigg\|\sum_{t=1}^{T}\big(I_d\otimes W_{T,t}(s_l)\big)\big(u_t'-E(u_t'\mid\mathcal F_{t-1})\big)\bigg\|
+\max_{1\le l\le N_T}\bigg\|\sum_{t=1}^{T}\big(I_d\otimes W_{T,t}(s_l)\big)u_t''\bigg\|\\
&\quad+\max_{1\le l\le N_T}\bigg\|\sum_{t=1}^{T}\big(I_d\otimes W_{T,t}(s_l)\big)E(u_t''\mid\mathcal F_{t-1})\bigg\|
:= J_{T,31}+J_{T,32}+J_{T,33}.
\end{align*}
It is easy to show that $J_{T,32}=O_P\big(T^{1/\delta}d_T\big)$ and $J_{T,33}=O_P\big(T^{1/\delta}d_T\big)$. Thus, we focus on $J_{T,31}$. For any $1\le l\le N_T$, let $Y_t=\big(I_d\otimes W_{T,t}(s_l)\big)\big(u_t'-E(u_t'\mid\mathcal F_{t-1})\big)$; then we have $E(Y_t\mid\mathcal F_{t-1})=0$ and $\|Y_t\|\le 2T^{1/\delta}d_T$. Also,
\begin{align*}
\max_{1\le l\le N_T}\sum_{t=1}^{T}\|W_{T,t}(s_l)\|E\big(\|\zeta_t\otimes\varepsilon_t\|^{2}\mid\mathcal F_{t-1}\big)=O_P(1),
\end{align*}
which yields
\begin{align*}
\max_{1\le l\le N_T}\bigg\|\sum_{t=1}^{T}E\big(Y_tY_t^{\top}\mid\mathcal F_{t-1}\big)\bigg\|
\le Md_T\max_{1\le l\le N_T}\sum_{t=1}^{T}\|W_{T,t}(s_l)\|E\big(\|u_t\|^{2}\mid\mathcal F_{t-1}\big)=O_P(d_T).
\end{align*}
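Lemma B.2 is a Freedman-type exponential inequality: for a bounded martingale difference array with predictable quadratic variation at most $V$, the maximum partial sum exceeds the $\sqrt{V\log T}$ scale only with probability of order $N_TT^{-\beta/2}$. A small simulation of the bounded case, checking the running maximum of the partial sums against that scale (constants illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
T, beta = 5000, 4.0
Y = rng.choice([-1.0, 1.0], size=T)   # bounded m.d.s.: E(Y_t | F_{t-1}) = 0, |Y_t| <= 1
V = float(T)                          # sum of conditional variances: each equals 1

S = np.abs(np.cumsum(Y)).max()        # running maximum of |partial sums|
bound = np.sqrt(beta * V * np.log(T)) # Freedman-type scale sqrt(beta * V * log T)

assert S < bound
```

In the proofs above, the same bound is applied uniformly over the $N_T$ grid points $s_l$, and the union bound over the grid produces the $N_T\exp(-\tfrac{\beta}{2}\log T)=N_TT^{-\beta/2}$ terms.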
Therefore, we have $\max_{1\le l\le N_T}\big\|\sum_{t=1}^{T}E(Y_tY_t^{\top}\mid\mathcal F_{t-1})\big\|=O_P(d_T)$. By Lemma B.2 and choosing $\beta=4$, we have
\begin{align*}
\Pr\Big(J_{T,31}>\sqrt{\beta M}\,\gamma_T\Big)
&=\Pr\bigg(J_{T,31}>\sqrt{\beta M}\,\gamma_T,\ \max_{1\le l\le N_T}\bigg\|\sum_{t=1}^{T}E\big(Y_tY_t^{\top}\mid\mathcal F_{t-1}\big)\bigg\|\le Md_T\bigg)\\
&\quad+\Pr\bigg(J_{T,31}>\sqrt{\beta M}\,\gamma_T,\ \max_{1\le l\le N_T}\bigg\|\sum_{t=1}^{T}E\big(Y_tY_t^{\top}\mid\mathcal F_{t-1}\big)\bigg\|>Md_T\bigg)\\
&\le \Pr\bigg(J_{T,31}>\sqrt{\beta M}\,\gamma_T,\ \max_{1\le l\le N_T}\bigg\|\sum_{t=1}^{T}E\big(Y_tY_t^{\top}\mid\mathcal F_{t-1}\big)\bigg\|\le Md_T\bigg)
+\Pr\bigg(\max_{1\le l\le N_T}\bigg\|\sum_{t=1}^{T}E\big(Y_tY_t^{\top}\mid\mathcal F_{t-1}\big)\bigg\|>Md_T\bigg)\\
&\le N_T\exp\bigg(-\frac{\beta M\gamma_T^{2}}{Md_T+\sqrt{\beta M}\,\gamma_TT^{1/\delta}d_T}\bigg)+o(1)
\le N_T\exp\bigg(-\frac{\beta}{2}\log T\bigg)=N_TT^{-\beta/2}=o(1),
\end{align*}
given $d_TT^{2/\delta}\log T\to0$.

Proof of Lemma B.7. (1). For any fixed $\tau\in(0,1)$, let $W_{T,t}=\frac{1}{T}I_d\big(\frac{\tau_t-\tau}{h}\big)^{k}K_h(\tau_t-\tau)$. It is straightforward to verify the conditions imposed on $W_{T,t}$, so the first result follows from Lemma B.4 immediately.

(2)--(3). Let $W_{T,t}(\tau)=\frac{1}{T}I_d\big(\frac{\tau_t-\tau}{h}\big)^{k}K_h(\tau_t-\tau)$; then the second and third results follow from Theorem 2.1.

Proof of Lemma B.8. (1). The proof is similar to that of Lemma B.6 with $Z_{t-1}$ replacing $\zeta_t$, and is therefore omitted.

(2).
Note that
\begin{align*}
\frac{1}{\sqrt{Th}}\sum_{t=1}^{T}\eta_t(\eta_t-\widehat\eta_t)^{\top}K\Big(\frac{\tau_t-\tau}{h}\Big)
&=\frac{1}{\sqrt{Th}}\sum_{t=1}^{T}\eta_tz_{t-1}^{\top}\big(\widehat A(\tau_t)-A(\tau_t)\big)^{\top}K\Big(\frac{\tau_t-\tau}{h}\Big)\\
&=\frac{1}{\sqrt{Th}}\sum_{t=1}^{T}\eta_tz_{t-1}^{\top}hS_{T,0}^{-1}(\tau_t)S_{T,1}(\tau_t)A^{(1),\top}(\tau_t)K\Big(\frac{\tau_t-\tau}{h}\Big)\\
&\quad+\frac{1}{\sqrt{Th}}\sum_{t=1}^{T}\eta_tz_{t-1}^{\top}\frac{h^{2}}{2}S_{T,0}^{-1}(\tau_t)S_{T,2}(\tau_t)A^{(2),\top}(\tau_t)K\Big(\frac{\tau_t-\tau}{h}\Big)\\
&\quad+\frac{1}{\sqrt{Th}}\sum_{t=1}^{T}\eta_tz_{t-1}^{\top}S_{T,0}^{-1}(\tau_t)\bigg(\frac{1}{Th}\sum_{s=1}^{T}z_{s-1}z_{s-1}^{\top}M^{\top}(\tau_s)K\Big(\frac{\tau_s-\tau_t}{h}\Big)\bigg)K\Big(\frac{\tau_t-\tau}{h}\Big)\\
&\quad+\frac{1}{\sqrt{Th}}\sum_{t=1}^{T}\eta_tz_{t-1}^{\top}S_{T,0}^{-1}(\tau_t)\bigg(\frac{1}{Th}\sum_{s=1}^{T}z_{s-1}\eta_s^{\top}K\Big(\frac{\tau_s-\tau_t}{h}\Big)\bigg)K\Big(\frac{\tau_t-\tau}{h}\Big)\\
&:= J_{T,1}+J_{T,2}+J_{T,3}+J_{T,4}.
\end{align*}
For $J_{T,1}$ to $J_{T,3}$, using Lemma B.7, we can replace the sample covariance matrices with their probability limits at the rate $O_P\big((\log T/(Th))^{1/2}\big)$, and hence it is easy to show that $J_{T,1}$ to $J_{T,3}$ are $o_P(1)$.

For $J_{T,4}$, for notational simplicity, we ignore $S_{T,0}^{-1}(\tau_t)$ and write
\begin{align*}
J_{T,4}&=\frac{1}{(Th)^{3/2}}\sum_{t=1}^{T}\eta_tz_{t-1}^{\top}z_{t-1}\eta_t^{\top}K(0)K\Big(\frac{\tau_t-\tau}{h}\Big)
+\frac{1}{(Th)^{3/2}}\sum_{i=1}^{T-1}\sum_{t=1}^{T-i}\eta_tz_{t-1}^{\top}z_{t+i-1}\eta_{t+i}^{\top}K\Big(\frac{i}{Th}\Big)K\Big(\frac{\tau_t-\tau}{h}\Big)\\
&\quad+\frac{1}{(Th)^{3/2}}\sum_{i=1}^{T-1}\sum_{t=1}^{T-i}\eta_{t+i}z_{t+i-1}^{\top}z_{t-1}\eta_t^{\top}K\Big(\frac{i}{Th}\Big)K\Big(\frac{\tau_t-\tau}{h}\Big)
:= J_{T,41}+J_{T,42}+J_{T,43}.
\end{align*}
It is easy to see that $J_{T,41}=O_P\big((Th)^{-1/2}\big)$.
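The split of $J_{T,4}$ into $J_{T,41}+J_{T,42}+J_{T,43}$ separates the $s=t$ diagonal of the double kernel sum from the $s>t$ and $s<t$ bands, re-indexed by $i=|s-t|$. The bookkeeping can be checked numerically with scalar placeholders for $\eta_t$ and $z_{t-1}$ (in the matrix-valued proof the two bands are kept separately because they are transposes of each other):

```python
import numpy as np

rng = np.random.default_rng(4)
T = 60
eta = rng.standard_normal(T)
z = rng.standard_normal(T)
K = lambda u: np.exp(-0.5 * u ** 2)   # any symmetric kernel; Gaussian for the check
Th = 10.0

# Full double sum over all pairs (t, s)
full = sum(eta[t] * z[t] * z[s] * eta[s] * K((s - t) / Th)
           for t in range(T) for s in range(T))

# Diagonal (s = t) plus the off-diagonal bands (s = t + i and s = t - i)
diag = sum(eta[t] ** 2 * z[t] ** 2 * K(0.0) for t in range(T))
upper = sum(eta[t] * z[t] * z[t + i] * eta[t + i] * K(i / Th)
            for i in range(1, T) for t in range(T - i))

assert abs(full - (diag + 2 * upper)) < 1e-9
```

The diagonal gives the $K(0)$ term $J_{T,41}$, and each band contributes one of the cross terms that are then centered and bounded in mean square.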
For $J_{T,42}$,
\begin{align*}
J_{T,42}&=\frac{1}{(Th)^{3/2}}\sum_{i=1}^{T-1}\sum_{t=1}^{T-i}\eta_tE\big(z_{t-1}^{\top}z_{t+i-1}\big)\eta_{t+i}^{\top}K\Big(\frac{i}{Th}\Big)K\Big(\frac{\tau_t-\tau}{h}\Big)\\
&\quad+\frac{1}{(Th)^{3/2}}\sum_{i=1}^{T-1}\sum_{t=1}^{T-i}\eta_t\Big(z_{t-1}^{\top}z_{t+i-1}-E\big(z_{t-1}^{\top}z_{t+i-1}\big)\Big)\eta_{t+i}^{\top}K\Big(\frac{i}{Th}\Big)K\Big(\frac{\tau_t-\tau}{h}\Big)
:= J_{T,421}+J_{T,422}.
\end{align*}
For $J_{T,421}$,
\begin{align*}
E\|J_{T,421}\|^{2}\le \frac{M}{(Th)^{3}}\sum_{i=1}^{T-1}\sum_{t=1}^{T-i}\Big\{E\big(z_{t-1}^{\top}z_{t+i-1}\big)\Big\}^{2}E\big(\eta_t^{\top}\eta_t\big)E\big(\eta_{t+i}^{\top}\eta_{t+i}\big)K^{2}\Big(\frac{i}{Th}\Big)K^{2}\Big(\frac{\tau_t-\tau}{h}\Big)=O\Big(\frac{1}{Th}\Big),
\end{align*}
which then yields $J_{T,421}=O_P\big((Th)^{-1/2}\big)$. For $J_{T,422}$,
\begin{align*}
J_{T,422}=\frac{1}{(Th)^{3/2}}\sum_{i=1}^{T-1}\sum_{t=1}^{T-i}\eta_t\sum_{m=1}^{p}\Big(x_{t-m}^{\top}x_{t+i-m}-E\big(x_{t-m}^{\top}x_{t+i-m}\big)\Big)\eta_{t+i}^{\top}K\Big(\frac{i}{Th}\Big)K\Big(\frac{\tau_t-\tau}{h}\Big).
\end{align*}
For notational simplicity, assume $p=1$; thus
\begin{align*}
J_{T,422}&=\frac{1}{(Th)^{3/2}}\sum_{i=1}^{T-1}\sum_{t=1}^{T-i}\eta_t\mu_{t-1}^{\top}\sum_{j=0}^{\infty}\Psi_{j,t+i-1}\eta_{t+i-1-j}\eta_{t+i}^{\top}K\Big(\frac{i}{Th}\Big)K\Big(\frac{\tau_t-\tau}{h}\Big)\\
&\quad+\frac{1}{(Th)^{3/2}}\sum_{i=1}^{T-1}\sum_{t=1}^{T-i}\eta_t\sum_{j=0}^{\infty}\eta_{t-1-j}^{\top}\Psi_{j,t-1}^{\top}\mu_{t+i-1}\eta_{t+i}^{\top}K\Big(\frac{i}{Th}\Big)K\Big(\frac{\tau_t-\tau}{h}\Big)\\
&\quad+\frac{1}{(Th)^{3/2}}\sum_{i=1}^{T-1}\sum_{t=1}^{T-i}\eta_t\sum_{j=0}^{\infty}\Big(\eta_{t-1-j}^{\top}\otimes\eta_{t-1-j}^{\top}-E\big(\eta_{t-1-j}^{\top}\otimes\eta_{t-1-j}^{\top}\big)\Big)\mathrm{vec}\big(\Psi_{j,t-1}^{\top}\Psi_{j+i,t+i-1}\big)\eta_{t+i}^{\top}K\Big(\frac{i}{Th}\Big)K\Big(\frac{\tau_t-\tau}{h}\Big)\\
&\quad+\frac{1}{(Th)^{3/2}}\sum_{i=1}^{T-1}\sum_{t=1}^{T-i}\eta_t\sum_{j=0}^{\infty}\sum_{m=0,\,m\neq j+i}^{\infty}\big(\eta_{t+i-1-m}^{\top}\otimes\eta_{t-1-j}^{\top}\big)\mathrm{vec}\big(\Psi_{j,t-1}^{\top}\Psi_{m,t+i-1}\big)\eta_{t+i}^{\top}K\Big(\frac{i}{Th}\Big)K\Big(\frac{\tau_t-\tau}{h}\Big)\\
&:= J_{T,4221}+J_{T,4222}+J_{T,4223}+J_{T,4224}.
\end{align*}
For $J_{T,4221}$,
\begin{align*}
E\|J_{T,4221}\|^{2}\le \frac{M}{(Th)^{3}}\sum_{i=1}^{T-1}\sum_{t=1}^{T-i}E\bigg\|\eta_t\mu_{t-1}^{\top}\sum_{j=0}^{\infty}\Psi_{j,t+i-1}\eta_{t+i-1-j}\bigg\|^{2}E\|\eta_{t+i}\|^{2}K^{2}\Big(\frac{i}{Th}\Big)K^{2}\Big(\frac{\tau_t-\tau}{h}\Big)=O\Big(\frac{1}{Th}\Big).
\end{align*}
Similarly, $J_{T,4222}$ and $J_{T,4223}$ are $O_P\big((Th)^{-1/2}\big)$.
For $J_{T,4224}$,
\begin{align*}
J_{T,4224}&=\frac{1}{(Th)^{3/2}}\sum_{i=1}^{T-1}\sum_{t=1}^{T-i}\eta_t\sum_{j=0}^{\infty}\big(\eta_t^{\top}\otimes\eta_{t-1-j}^{\top}\big)\mathrm{vec}\big(\Psi_{j,t-1}^{\top}\Psi_{i-1,t+i-1}\big)\eta_{t+i}^{\top}K\Big(\frac{i}{Th}\Big)K\Big(\frac{\tau_t-\tau}{h}\Big)\\
&\quad+\frac{1}{(Th)^{3/2}}\sum_{i=1}^{T-1}\sum_{t=1}^{T-i}\eta_t\sum_{j=0}^{\infty}\sum_{m=0,\,m\neq j+i,\,m\neq i-1}^{\infty}\big(\eta_{t+i-1-m}^{\top}\otimes\eta_{t-1-j}^{\top}\big)\mathrm{vec}\big(\Psi_{j,t-1}^{\top}\Psi_{m,t+i-1}\big)\eta_{t+i}^{\top}K\Big(\frac{i}{Th}\Big)K\Big(\frac{\tau_t-\tau}{h}\Big)\\
&:= J_{T,42241}+J_{T,42242}.
\end{align*}
Similar to the proof of $J_{T,4221}$, we can show that $J_{T,42241}=O_P\big((Th)^{-1/2}\big)$.

Let $w_{t,i}=\sum_{j=0}^{\infty}\big(\eta_t\eta_t^{\top}\otimes\eta_{t-1-j}^{\top}\big)\mathrm{vec}\big(\Psi_{j,t-1}^{\top}\Psi_{i-1,t+i-1}^{\top}\big)$. For $J_{T,42242}$,
\begin{align*}
E\|J_{T,42242}\|^{2}
&=\frac{1}{(Th)^{3}}\sum_{i_1=1}^{T-1}\sum_{i_2=1}^{T-1}\sum_{t=1}^{T-i_1}E\big\|\eta_{t+i_1}^{\top}\eta_{t+i_1}\big\|\,E\big\|w_{t+i_1-i_2,i_2}^{\top}w_{t,i_1}\big\|
\cdot K\Big(\frac{i_1}{Th}\Big)K\Big(\frac{i_2}{Th}\Big)K\Big(\frac{\tau_t-\tau}{h}\Big)K\Big(\frac{\tau_{t+i_1-i_2}-\tau}{h}\Big)\\
&\le \frac{M}{(Th)^{2}}\sum_{t=1}^{T}\bigg(\max_{t\ge1}\sum_{i=1}^{T-1}\|\Psi_{i,t}\|\bigg)^{2}\max_{t\ge1}\sum_{j=0}^{\infty}\|\Psi_{j,t}\|\,K^{2}\Big(\frac{\tau_t-\tau}{h}\Big)=O\big((Th)^{-1}\big).
\end{align*}
Hence, $J_{T,42242}=O_P\big((Th)^{-1/2}\big)$. Similar to $J_{T,42}$, $J_{T,43}=O_P\big((Th)^{-1/2}\big)$. The proof is now complete.

Define $\Lambda_p(\tau)=[a(\tau),A_{p,1}(\tau),\ldots,A_{p,p}(\tau)]$, where $A_{p,j}(\tau)=A_j(\tau)$ for $1\le j\le p_0$ and $A_{p,j}(\tau)=0$ for $j>p_0$. Let $z_{p,t-1}=[1,x_{t-1}^{\top},\ldots,x_{t-p}^{\top}]^{\top}$,
\begin{align*}
M_p(\tau_t)=\Lambda_p(\tau_t)-\Lambda_p(\tau)-\Lambda_p^{(1)}(\tau)(\tau_t-\tau)-\frac{1}{2}\Lambda_p^{(2)}(\tau)(\tau_t-\tau)^{2},
\end{align*}
$\overline\Lambda_p(\tau)=[A_{p,p+1}(\tau),\ldots,A_{p,P}(\tau)]$ and $\overline z_{p,t-1}=[x_{t-p-1}^{\top},\ldots,x_{t-P}^{\top}]^{\top}$.
Proof of Lemma B.9. (1). Since $p\ge p_0$, we have $\widehat\eta_{p,t}=\eta_t+\big(\Lambda_p(\tau_t)-\widehat\Lambda_p(\tau_t)\big)z_{p,t-1}$ and
\begin{align*}
\mathrm{RSS}(p)&=\frac{1}{T}\sum_{t=1}^{T}\eta_t^{\top}\eta_t+\frac{1}{T}\sum_{t=1}^{T}z_{p,t-1}^{\top}\big(\Lambda_p(\tau_t)-\widehat\Lambda_p(\tau_t)\big)^{\top}\big(\Lambda_p(\tau_t)-\widehat\Lambda_p(\tau_t)\big)z_{p,t-1}
-\frac{2}{T}\sum_{t=1}^{T}\mathrm{tr}\big(\eta_t(\eta_t-\widehat\eta_{p,t})^{\top}\big)\\
&:= \frac{1}{T}\sum_{t=1}^{T}\eta_t^{\top}\eta_t+I_{T,1}+I_{T,2}.
\end{align*}
Since $\eta_t^{\top}\eta_t-E(\eta_t^{\top}\eta_t)$ is a martingale difference sequence, we have $\frac{1}{T}\sum_{t=1}^{T}\eta_t^{\top}\eta_t=\frac{1}{T}\sum_{t=1}^{T}E(\eta_t^{\top}\eta_t)+O_P\big(T^{-1/2}\big)$. By Theorem 3.1,
\begin{align*}
I_{T,1}&\le \frac{1}{T}\sum_{t=1}^{\lfloor Th\rfloor}\|z_{p,t-1}\|^{2}\big\|\widehat\Lambda_p(\tau_t)-\Lambda_p(\tau_t)\big\|^{2}
+\frac{1}{T}\sum_{t=\lfloor Th\rfloor+1}^{T-\lfloor Th\rfloor}\|z_{p,t-1}\|^{2}\big\|\widehat\Lambda_p(\tau_t)-\Lambda_p(\tau_t)\big\|^{2}
+\frac{1}{T}\sum_{t=T-\lfloor Th\rfloor+1}^{T}\|z_{p,t-1}\|^{2}\big\|\widehat\Lambda_p(\tau_t)-\Lambda_p(\tau_t)\big\|^{2}\\
&\le \sup_{0\le\tau\le h}\big\|\widehat\Lambda_p(\tau)-\Lambda_p(\tau)\big\|^{2}\cdot\frac{1}{T}\sum_{t=1}^{\lfloor Th\rfloor}\|z_{p,t-1}\|^{2}
+\sup_{h\le\tau\le1-h}\big\|\widehat\Lambda_p(\tau)-\Lambda_p(\tau)\big\|^{2}\cdot\frac{1}{T}\sum_{t=\lfloor Th\rfloor+1}^{T-\lfloor Th\rfloor}\|z_{p,t-1}\|^{2}\\
&\quad+\sup_{1-h\le\tau\le1}\big\|\widehat\Lambda_p(\tau)-\Lambda_p(\tau)\big\|^{2}\cdot\frac{1}{T}\sum_{t=T-\lfloor Th\rfloor+1}^{T}\|z_{p,t-1}\|^{2}\\
&=O_P\Big(h\big(h^{2}+(\log T/(Th))^{1/2}\big)^{2}+(1-h)c_T^{2}\Big)
=O_P\Big(\big(h^{5/2}+(\log T/(Th))^{1/2}\big)^{2}\Big).
\end{align*}
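The three-term expansion of $\mathrm{RSS}(p)$ is the exact algebraic identity $\frac{1}{T}\sum_t\|\widehat\eta_t\|^{2}=\frac{1}{T}\sum_t\|\eta_t\|^{2}+\frac{1}{T}\sum_t\|\eta_t-\widehat\eta_t\|^{2}-\frac{2}{T}\sum_t\eta_t^{\top}(\eta_t-\widehat\eta_t)$. A direct numerical check with random placeholder vectors (the perturbation is arbitrary and purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
T, d = 100, 3
eta = rng.standard_normal((T, d))                  # true innovations
eta_hat = eta + 0.1 * rng.standard_normal((T, d))  # fitted residuals (illustrative)

rss = np.mean(np.sum(eta_hat ** 2, axis=1))
term1 = np.mean(np.sum(eta ** 2, axis=1))                # (1/T) sum eta' eta
term2 = np.mean(np.sum((eta - eta_hat) ** 2, axis=1))    # quadratic term I_{T,1}
term3 = -2.0 * np.mean(np.sum(eta * (eta - eta_hat), axis=1))  # cross term I_{T,2}

assert abs(rss - (term1 + term2 + term3)) < 1e-9
```

The first term is the irreducible noise level, while the proof shows the other two vanish when $p\ge p_0$ and stay bounded away from zero when $p<p_0$.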
For $I_{T,2}$,
\begin{align*}
\frac{1}{T}\sum_{t=1}^{T}\eta_t(\eta_t-\widehat\eta_{p,t})^{\top}
&=\frac{1}{T}\sum_{t=1}^{T}\eta_tz_{p,t-1}^{\top}\big(\widehat\Lambda_p(\tau_t)-\Lambda_p(\tau_t)\big)^{\top}\\
&=h\cdot\frac{1}{T}\sum_{t=1}^{T}\eta_tz_{p,t-1}^{\top}S_{T,0}^{-1}(\tau_t)S_{T,1}(\tau_t)A_p^{(1),\top}(\tau_t)
+\frac{1}{2}h^{2}\cdot\frac{1}{T}\sum_{t=1}^{T}\eta_tz_{p,t-1}^{\top}S_{T,0}^{-1}(\tau_t)S_{T,2}(\tau_t)A_p^{(2),\top}(\tau_t)\\
&\quad+\frac{1}{T}\sum_{t=1}^{T}\eta_tz_{p,t-1}^{\top}S_{T,0}^{-1}(\tau_t)\bigg(\frac{1}{Th}\sum_{s=1}^{T}z_{p,s-1}z_{p,s-1}^{\top}M_p^{\top}(\tau_s)K\Big(\frac{\tau_s-\tau_t}{h}\Big)\bigg)\\
&\quad+\frac{1}{T}\sum_{t=1}^{T}\eta_tz_{p,t-1}^{\top}S_{T,0}^{-1}(\tau_t)\bigg(\frac{1}{Th}\sum_{s=1}^{T}z_{p,s-1}\eta_s^{\top}K\Big(\frac{\tau_s-\tau_t}{h}\Big)\bigg)\\
&:= I_{T,21}+I_{T,22}+I_{T,23}+I_{T,24}.
\end{align*}
By the uniform convergence results stated in Theorem 2.1, we can replace the weighted sample covariance matrices with their probability limits plus a rate of $O_P\big((\log T/(Th))^{1/2}\big)$. For $I_{T,21}$, by the fact that $S_{T,1}(\tau)=O_P\big((\log T/(Th))^{1/2}\big)$ for $\tau\in[h,1-h]$,
\begin{align*}
\|I_{T,21}\|=O_P\Big(T^{-1/2}h+h(\log T/(Th))^{1/2}\Big).
\end{align*}
Similarly,
\begin{align*}
\|I_{T,22}\|+\|I_{T,23}\|=O_P\Big(T^{-1/2}h+h(\log T/(Th))^{1/2}\Big).
\end{align*}
For $I_{T,24}$, let $\Sigma_0(\tau)=\mathrm{plim}_{T\to\infty}S_{T,0}(\tau)$; we have
\begin{align*}
I_{T,24}=\frac{1}{T}\sum_{t=1}^{T}\eta_tz_{p,t-1}^{\top}\Sigma_0^{-1}(\tau_t)\bigg(\frac{1}{Th}\sum_{s=1}^{T}z_{p,s-1}\eta_s^{\top}K\Big(\frac{\tau_s-\tau_t}{h}\Big)\bigg)+O_P\Big((Th)^{-1/2}(\log T/(Th))^{1/2}\Big).
\end{align*}
Similar to the proof of $J_{T,4}$ in Lemma B.8, we can show that
\begin{align*}
\frac{1}{T}\sum_{t=1}^{T}\eta_tz_{p,t-1}^{\top}\Sigma_0^{-1}(\tau_t)\bigg(\frac{1}{Th}\sum_{s=1}^{T}z_{p,s-1}\eta_s^{\top}K\Big(\frac{\tau_s-\tau_t}{h}\Big)\bigg)=O_P\big((Th)^{-1}\big).
\end{align*}
Since $(Th)^{-1}+T^{-1/2}h=o(c_T\phi_T)$, result (1) follows.

(2). For $p<p_0$, we have $\widehat\Lambda_p(\tau)-\Lambda_p(\tau)=B_p(\tau)+o_P(1)$ uniformly over $\tau\in[0,1]$, where $B_p(\tau)$ is a nonrandom bias term. Since $\widehat\eta_{p,t}=\eta_t+\big(\Lambda_p(\tau_t)-\widehat\Lambda_p(\tau_t)\big)z_{p,t-1}+\overline\Lambda_p(\tau_t)\overline z_{p,t-1}$, by Theorem 2.1, we have
\begin{align*}
\mathrm{RSS}(p)=\frac{1}{T}\sum_{t=1}^{T}E\big(\eta_t^{\top}\eta_t\big)+\frac{1}{T}\sum_{t=1}^{T}\mathrm{tr}\Big(\big[B_p(\tau_t),\overline\Lambda_p(\tau_t)\big]E\big(z_{P,t-1}z_{P,t-1}^{\top}\big)\big[B_p(\tau_t),\overline\Lambda_p(\tau_t)\big]^{\top}\Big)+o_P(1).
\end{align*}
Since $[B_p(\tau_t),\overline\Lambda_p(\tau_t)]\neq0$ and $E\big(z_{P,t-1}z_{P,t-1}^{\top}\big)$ is a positive definite matrix, the result follows.