Locally trimmed least squares: conventional inference in possibly nonstationary models
Zhishui Hu∗, Ioannis Kasparis† and Qiying Wang‡

June 24, 2020
Abstract
A novel IV estimation method, which we term Locally Trimmed LS (LTLS), is developed that yields estimators with (mixed) Gaussian limit distributions in situations where the data may be weakly or strongly persistent. In particular, we allow for nonlinear predictive-type regressions where the regressor can be a stationary short/long memory process as well as a nonstationary long memory process or a nearly integrated array. The resultant t-tests have conventional limit distributions (i.e. N(0,1)) free of (near to unity and long memory) nuisance parameters. In the case where the regressor is a fractional process, no preliminary estimator for the memory parameter is required. Therefore, the practitioner can conduct inference while being agnostic about the exact dependence structure in the data. The LTLS estimator is obtained by applying certain chronological trimming to the OLS instrument via the utilisation of appropriate kernel functions of time trend variables. The finite sample performance of LTLS based t-tests is investigated with the aid of a simulation experiment. An empirical application to the predictability of stock returns is also provided.

It is well known that under nonstationarity regression estimators do not have conventional limit distributions in general. As a consequence, the inferential procedures developed for stationary data are not applicable under nonstationarity. A number of early studies in the area of nonstationary econometrics (e.g. Phillips and Hansen, 1990; Johansen, 1995; Phillips, 1995; Robinson and Hualde, 2003) develop inferential procedures suitable for nonstationary models; however, these methods are

∗ International Institute of Finance, School of Management, University of Science and Technology of China, Hefei, Anhui 230026, China, email: [email protected].
† University of Cyprus, Nicosia, Cyprus, email: [email protected].
‡ School of Mathematics and Statistics, The University of Sydney, NSW 2006, Australia, email: [email protected].
not valid in general under stationarity. In fact, it is well known that methods such as FMLS (c.f. Phillips, 1995) may exhibit severe size distortions even under local deviations from the (fractional) unit root paradigm. This duality in inference has made empirical work in time series econometrics elusive. Practitioners typically need to make preliminary (sometimes ad hoc) assumptions about the persistence level in the data, or apply some sort of pre-testing (and therefore expose inference to the problems associated with pre-testing) before proceeding to estimation and inference. A number of studies has attempted to address this issue using conservative confidence intervals (for a review see Mikusheva (2007), Phillips (2014) and the references therein). The more recent work of Magdalinos and Phillips (2009; MP hereafter) (see also Kostakis, Magdalinos and Stamatogiannis (2015) for refinements and additional results) follows a completely different direction. MP propose an IV estimator (IVX) that has a mixed Gaussian limit distribution at the expense of an arbitrarily small reduction in the convergence rate, relative to that of the OLS estimator.

In this paper we follow an approach similar to the pioneering work of MP. To fix ideas, consider the simple model

    y_k = β x_k + u_k,   k = 1, ..., n,   (1)

where x_k is a nearly integrated (NI) array, predetermined with respect to some martingale difference error term (u_k). MP construct the so-called IVX instrument by applying the following linear filtering to the OLS instrument (x_k):

    Z_{kn} = Σ_{j=0}^{k-1} (1 + c_z n^{-b})^j (x_{k-j} − x_{k-j-1}),   (2)

for some c_z < 0 and 0 < b < 1. This linear filtering transforms x_k into a mildly integrated process (e.g. see Giraitis and Phillips, 2006; Phillips and Magdalinos, 2007) that is less persistent than a NI array (e.g. x_k).
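For concreteness, the IVX filtering in (2) can be sketched in a few lines. This is a hypothetical implementation based on our reading of the (partly garbled) display, with ρ = 1 + c_z n^{-b}, c_z < 0; the tuning values are illustrative only.

```python
import numpy as np

def ivx_instrument(x, c_z=-1.0, b=0.9):
    """Sketch of the IVX instrument (2): Z_k = sum_j rho**j * (x_{k-j} - x_{k-j-1}),
    with rho = 1 + c_z * n**(-b), c_z < 0, 0 < b < 1 (a mildly integrated filter)."""
    n = len(x)
    rho = 1.0 + c_z * n ** (-b)
    dx = np.diff(x, prepend=0.0)      # x_k - x_{k-1}, taking x_0 = 0
    z = np.empty(n)
    s = 0.0
    for k in range(n):                # recursion Z_k = rho * Z_{k-1} + dx_k
        s = rho * s + dx[k]
        z[k] = s
    return z
```

The recursion is algebraically identical to the weighted sum in (2); the filtered series is less persistent than the level series x_k.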
By choosing b arbitrarily close to unity, the reduction in the signal of the instrument results in an arbitrarily small reduction in the convergence rate of the IVX estimator, relative to that of the OLS, and this is sufficient for a martingale CLT to operate, rendering IVX based inference conventional. The choice of b is important to inference, with smaller b resulting in better size control at the expense of asymptotic power. Note that as b ↑ 1, Z_{kn} approximates the NI process x_k and the IVX estimator resembles the behaviour of the OLS estimator. The recent work of Yang, Long, Peng and Cai (2019) generalises the IVX method to regression models with serially correlated regression (parametric AR) errors, whilst Demetrescu, Georgiev, Rodrigues and Taylor (2020) apply a modified version of the IVX estimator to test for episodic predictability in stock returns.

We consider an alternative method for reducing the signal of the OLS instrument. Let K be an integrable kernel function and set

    Z_{kn} = K[c_n(k/n − τ)] x_k,

where c_n is a positive deterministic sequence such that c_n^{-1} + c_n n^{-1} → 0 and 0 < τ < 1. For simplicity set τ = 1/2 and K(0) = 1. In this case the kernel function extracts information from the OLS instrument for observations near the middle of the sample. In particular, Z_{kn} ≈ x_k when k ≈ n/2, and Z_{kn} ≈ 0 when k is far from n/2. In other words, certain chronological trimming applies around the "chronological point τ". By allowing the c_n sequence to diverge at an arbitrarily slow rate, the resultant IV (LTLS) estimator attains an arbitrarily slower convergence rate relative to the OLS estimator. In principle, it is possible to extract information around multiple chronological points 0 < τ_1 < ... < τ_{l_n} < 1, where l_n is either fixed or l_n → ∞ such that l_n = o(c_n). In this case the relevant instrument is

    Z_{kn} = Σ_{j=1}^{l_n} K[c_n(k/n − τ_j)] x_k.   (3)
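The chronological trimming in (3) amounts to reweighting x_k by kernel weights centred at the cps. A minimal sketch; the quadratic compact-support kernel with K(0) = 1 is an illustrative choice, not the paper's.

```python
import numpy as np

def ltls_instrument(x, c_n, taus, K=lambda u: np.maximum(1.0 - u ** 2, 0.0)):
    """LTLS instrument (3): Z_kn = sum_j K[c_n * (k/n - tau_j)] * x_k."""
    n = len(x)
    k_over_n = np.arange(1, n + 1) / n
    weights = sum(K(c_n * (k_over_n - tau)) for tau in taus)
    return weights * x
```

With a single cp τ = 1/2, Z_{kn} ≈ x_k for k near n/2 and Z_{kn} = 0 outside the kernel's support, which is exactly the trimming effect described above.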
As long as the LTLS estimator converges at a slower rate than the OLS estimator, the limit theory is mixed Gaussian for nonstationary regressor covariates and Gaussian for stationary ones. In particular, the reduction in the signal of the OLS instrument allows a martingale CLT (c.f. Wang, 2014) to operate even if x_k is nonstationary. Notice that if c_n is too small or if too many chronological points (l_n) are employed, then Z_{kn} approximates the OLS instrument and as a consequence LTLS based inference resembles OLS based inference. This can be easily seen if a vanishing sequence c_n is employed: for c_n → 0, Z_{kn} ≈ l_n K(0) x_k.

Our theoretical framework allows for a wide range of stationary and nonstationary linear processes as well as NI arrays. In particular, x_k can be a stationary or a nonstationary fractional process. Consider the LTLS estimator of β in (1) that utilises the instrument of (3), i.e. β̂ = Σ_{k=1}^n Z_{kn} y_k / Σ_{k=1}^n Z_{kn} x_k. Let t ∈ [0,1] and suppose that x_k is a nonstationary process such that for some d_n → ∞, d_n^{-1} x_{⌊nt⌋} ⇒ X_t in D[0,1], where X_t is a continuous process. For instance, X_t can be a fractional BM or a fractional Ornstein-Uhlenbeck process (see Remark 1 below) depending on some memory or near-to-unity nuisance parameter. Then we have

    d_n √(n l_n / c_n) (β̂ − β) →_d MN( 0, E(u_1^2) ∫_R K^2(x)dx / [ (∫_R K(x)dx)^2 ∫_0^1 X_t^2 dt ] ).

Because c_n → ∞ and l_n = o(c_n), the convergence rate of the LTLS estimator is slower than that of the OLS estimator (d_n √n). Further, note that nuisance parameters affect the limit distribution only via the mixing variate [∫_0^1 X_t^2 dt]^{-1}, and as a consequence the studentised LTLS estimator has a standard normal limit distribution.
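The display above can be illustrated by simulation. The following sketch generates model (1) with a NI regressor and computes the LTLS estimator; all tuning values (kernel, c_n, l_n) are illustrative assumptions, not the paper's recommendations.

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta, c = 2000, 0.5, -5.0
K = lambda u: np.maximum(1.0 - u ** 2, 0.0)   # compact-support kernel, K(0) = 1

# NI regressor x_k = (1 + c/n) x_{k-1} + xi_k and y_k = beta * x_k + u_k
xi = rng.standard_normal(n)
u = rng.standard_normal(n)
x = np.empty(n)
x[0] = xi[0]
for k in range(1, n):
    x[k] = (1.0 + c / n) * x[k - 1] + xi[k]
y = beta * x + u

# instrument (3) with l_n chronological points and slowly diverging c_n
c_n = n ** 0.4
l_n = max(int(np.log(n)), 1)                  # l_n -> infinity with l_n = o(c_n)
taus = [(j + 1) / (l_n + 1) for j in range(l_n)]
kk = np.arange(1, n + 1) / n
Z = sum(K(c_n * (kk - t)) for t in taus) * x
beta_hat = (Z @ y) / (Z @ x)                  # LTLS estimate of beta
```

Despite using only the observations near the cps, the estimate is close to the true β, while the reduced instrument signal is what makes the mixed Gaussian limit theory operate.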
Interestingly, the limit variance shown above is the same, up to a constant, as that of the FMLS estimator for the case where x_k ∼ I(1).

We mention that the constant that features in the limit variance of the LTLS estimator above can be made arbitrarily small by an appropriate choice of the kernel function. For example, suppose that K(x) = (2πς^2)^{-1/2} exp(−x^2/(2ς^2)). Then

    E(u_1^2) ∫_R K^2(x)dx / ( ∫_R K(x)dx )^2 = E(u_1^2) / (2√π ς) → 0

as ς → ∞. Nevertheless, choosing a large value of the kernel variance parameter has the same effect as choosing a small value for c_n. Therefore, as ς → ∞, the LTLS estimator approximates the OLS estimator.

It should be further noted that for nonstationary fractional covariates (i.e. I(d), d > 1/2), methods like FMLS (e.g. Phillips, 1995) or the spectral GLS of Robinson and Hualde (2003) (see also Hualde and Robinson, 2010) are asymptotically equivalent to Gaussian pseudo maximum likelihood and therefore asymptotically efficient (c.f. Phillips, 1991). The key feature of these methods is to induce asymptotically mixed Gaussian estimators by a certain modification of the dependent variable that involves (fractionally) differencing the covariates. In the context of (1) such differencing takes the form (I − L)^{d̂} x_k, where L is the lag operator and d̂ is a preliminary estimator for the memory parameter of x_k. Nevertheless, if there is a local deviation (of order O(n^{-1})) from the (fractional) unit root model, the aforementioned methods yield mixed Gaussian limit theory only if the following quasi fractional differencing is applied:

    (I − (1 + c/n) L)^{d̂} x_k,

where c is a local to unity parameter. A non-trivial value for the local to unity parameter, however, renders the aforementioned methods infeasible because of the lack of identifiability of c.
It is well known that if c ≠ 0, inference based on methods like FMLS is prone to severe size distortions even if there is moderate correlation between the regressor and the regression error.

The remainder of this work is organised as follows. Section 2 provides basic limit theory for locally trimmed functionals of stationary and nonstationary processes. This limit theory is utilised

[Footnote: For the Gaussian kernel above, ∫_R K^2(x)dx = (2πς^2)^{-1} ∫_R exp(−x^2/ς^2)dx = (2πς^2)^{-1} √(πς^2) = 1/(2√π ς).]
in Section 3 for exploring the limit properties of LTLS estimation and inference. Section 4 provides a simulation study and Section 5 an empirical application on the predictability of stock returns.

Throughout this paper we make use of the following notation. For two deterministic sequences a_n and b_n, a_n ∼ b_n denotes lim_{n→∞} a_n/b_n = 1. 1{A} is the indicator function of the set A. We may write the integral ∫_R f(x)dx as ∫f. ⇒ denotes weak convergence in the space D[0,1]. For a vector x, ‖x‖ is its inner product norm and x' its transpose. By [x] we denote the integer part of a positive number x. Finally, diag{a_1, ..., a_p} denotes a p × p diagonal matrix with elements {a_1, ..., a_p} on the main diagonal, →_d denotes convergence in distribution, and Y := MN(0, Σ) denotes a Gaussian variate (mixed normal) with characteristic function f(t) = E e^{it'Y} = E e^{−t'Σt/2}.

In this section we develop basic limit theory for locally trimmed (LT) sample functionals of stationary and nonstationary processes. Our basic limit theory is utilised in Section 3 for the asymptotic analysis of the LTLS estimator. Let {x_k}_{1≤k≤n} be a scalar time series process and {X_{nk}}_{1≤k≤n, n≥1} be some scalar random array. Further, let K be an integrable kernel function and g(.) = [g_1(.), ..., g_p(.)]', where, for each i = 1, ..., p, g_i is a measurable function. For l ∈ N and 0 < τ_1 < ...
< τ_l < 1, set

    S^1_{n,l} = (c_n/n) Σ_{k=1}^n g(x_k) (1/l) Σ_{j=1}^l K[c_n(k/n − τ_j)],
    M^1_{n,l} = √(c_n/n) Σ_{k=1}^n g(x_k) (1/√l) Σ_{j=1}^l K[c_n(k/n − τ_j)] u_k,
    S^2_{n,l} = (c_n/n) Σ_{k=1}^n g(X_{nk}) (1/l) Σ_{j=1}^l K[c_n(k/n − τ_j)],
    M^2_{n,l} = √(c_n/n) Σ_{k=1}^n g(X_{nk}) (1/√l) Σ_{j=1}^l K[c_n(k/n − τ_j)] u_k,

where c_n is a sequence of positive constants, l is either fixed or l → ∞ as n → ∞, and u_k together with an appropriate filtration {F_k} forms a martingale difference sequence (such that X_{nk}, x_k are F_{k-1}-measurable). The limit theory of the LTLS estimator relies on the asymptotics of {S^j_{n,l}, M^j_{n,l}}_{j=1,2}. Limit theory for the functionals {S^1_{n,l}, M^1_{n,l}} is relevant for stationary regressors, whilst that for {S^2_{n,l}, M^2_{n,l}} is relevant for nonstationary ones. In fact, it is assumed that X_{nk} satisfies some FCLT. The term S^2_{n,l} resembles certain functionals considered by Phillips, Li and Gao (2017), who study the estimation of cointegrated models with smooth time varying parameters (TVP). The aforementioned work considers terms of the form

    (c_n/n) Σ_{k=1}^n X_{nk}^2 K[c_n(k/n − τ)],   0 < τ < 1,

where X_{nk} is an I(1) process normalised by √n. As explained below, under our assumptions X_{nk} can be an appropriately normalised I(d), d > 1/2, process or a NI array (possibly driven by fractional errors). Therefore the limit results provided in this section are also relevant to the estimation of TVP models for the case where the covariate is a general nonstationary process satisfying some FCLT (see Assumption A3 below).

To facilitate basic limit results, we make use of the following conditions.
A1 (innovations): {η_k, F_k}_{k≥1}, where η_k' = (ξ_{k+1}, u_k) and F_k = σ(u_k, u_{k-1}, ..., u_1; ξ_j, j ≤ k+1), forms a 2-dimensional martingale difference sequence satisfying the following conditions:

(a) sup_{k≥1} E( u_k^2 I(|u_k| ≥ M) | F_{k-1} ) = o_P(1), as M → ∞;
(b) sup_{k≥1} E( ξ_k^2 I(|ξ_k| ≥ M) | F_{k-1} ) = o_P(1), as M → ∞;
(c) there exists a positive definite matrix

    Σ = [ σ_ξ^2   σ_{ξu} ;
          σ_{uξ}  σ_u^2 ]

so that, for all k ≥ 1, E(η_k η_k' | F_{k-1}) = Σ, a.s.

A2 (stationary process): x_k is an ergodic (strictly) stationary random sequence and a functional of ξ_k, ξ_{k-1}, ..., satisfying E‖g(x_k)‖^{2+δ} < ∞ for some δ > 0.

A3 (nonstationary process and invariance principle): X_{nk} = d_n^{-1} x_k, where 0 < d_n^2 = var(x_n) → ∞ and x_k is a functional of ξ_k, ξ_{k-1}, ... (dependence on n is allowed) so that, on D_{R^3}[0,1],

    ( n^{-1/2} Σ_{k=1}^{[nt]} ξ_k,  n^{-1/2} Σ_{k=1}^{[nt]} ξ_{-k},  X_{n,[nt]} ) ⇒ (B_{1t}, B_{2t}, X_t),   (4)

where B_{1t} and B_{2t} are two independent Gaussian processes with mean zero and stationary independent increments, and X_t is a continuous process that depends only on functionals of {B_{1t}}_{0≤t≤1} and {B_{2t}}_{0≤t≤1}.

A4 (kernel function and restrictions on τ_j, l_n and c_n):

(a) K(x) is a positive real function having compact support;
(b) 0 < c_n → ∞ and c_n/n → 0;
(c) τ_j = j/(l_n + 1), where j = 1, ..., l_n, with l_n^{-1} + c_n^{-1} l_n → 0.

We remark that the innovation process {η_k, F_k}_{k≥1} used in A1 is standard in the literature, so that both M^1_{n,l} and M^2_{n,l} have a martingale structure. The uniform integrability conditions (a) and (b) are weak in comparison with the higher moment conditions used in previous works; see, for instance, Wang (2014) and Wang and Phillips (2009a, b). Since Σ is required to be a positive definite constant matrix, condition (c) excludes u_k from being an ARCH or GARCH process.
Condition (c) is required for technical reasons and seems difficult to relax at the moment.

Stationary processes as given in A2 are used extensively in empirical applications; examples include short and long memory (fractional) processes. Typical examples of nonstationary processes satisfying A3 have the form

    x_k = ρ x_{k-1} + Σ_{i=0}^∞ φ_i ξ_{k-i},

where ρ = 1 + c/n with c ∈ R and Σ_{i=0}^∞ φ_i^2 < ∞. For the latter specification, (4) holds with X_t being a fractional Ornstein-Uhlenbeck process. See, for instance, Buchmann and Chan (2007), Wang and Phillips (2009a, b) and Wang (2015).

As for A4, the restriction to compactly supported K(x) can be relaxed if we impose more conditions on l_n. Indeed, in the following main results, A4 can be replaced by:

A4* (kernel function and restrictions on τ_j, l_n and c_n):

(a) K(x) is an eventually monotonic (i.e., there exists A_0 > 0 such that K(x) is monotonic on (−∞, −A_0) and (A_0, ∞)) positive function so that K(x) ≤ C/(1 + |x|) and ∫K < ∞;
(b) 0 < c_n → ∞ and c_n/n → 0;
(c) τ_j = j/(l_n + 1), where j = 1, ..., l_n, with l_n^{-1} + c_n^{-1} l_n log n → 0.

We now introduce the limit theory for LT sample functionals. Since there are essential differences between M^1_{n,l} and M^2_{n,l}, the main results are presented separately for stationary and nonstationary processes.

Theorem 1.
Suppose A2 and A4 or A4* hold. Then, as n → ∞, we have

    S^1_{n,l_n} = Eg(x_1) ∫K + o_P(1).   (5)

If in addition A1 holds, then, as n → ∞,

    M^1_{n,l_n} →_d N( 0, σ_u^2 E[g(x_1) g(x_1)'] ∫K^2 ).   (6)

Theorem 2.
Suppose that A3 and A4 or A4* hold and g(.) is continuous. Then, as n → ∞, we have

    S^2_{n,l_n} = ∫_0^1 g(X_{n,⌊nt⌋}) dt ∫K + o_P(1) →_d ∫_0^1 g(X_t) dt ∫K.   (7)

If in addition A1 holds, jointly with (7), we have

    M^2_{n,l_n} →_d MN( 0, σ_u^2 ∫_0^1 g(X_t) g(X_t)' dt ∫K^2 ).   (8)

Remark 1. If we are only interested in results similar to (5) and (7), conditions A2 and A3 can be weakened. For instance, result (7) still holds if (4) is replaced by X_{n,[nt]} ⇒ X_t on D_R[0,1]. See Lemma 1 in Section 6 for more details. Furthermore, if x_k is a weakly nonstationary process (i.e., I(1/2) and mildly integrated processes, for which FCLTs do not apply) as considered in Phillips and Magdalinos (2007) and Duffy and Kasparis (2018), some preliminary calculations suggest (see also Theorem 2.2 in Duffy and Kasparis, 2018) that

    (c_n/n) Σ_{k=1}^n g(d_n^{-1} x_k) { (1/l_n) Σ_{j=1}^{l_n} K[c_n(k/n − τ_j)] } →_d ∫_R g(x + X_−) φ_σ(x) dx ∫K,

where φ_σ(x) is the density of a N(0, σ^2) variate (σ^2 > 0) and X_− ∼ N(0, σ_−^2) (σ_−^2 ≥ 0). Discussion of this kind of generalisation, together with the investigation of trimmed sample functionals of weakly nonstationary processes, is left for future work.

Remark 2. The continuity requirement in Theorem 2 is not essential for (7) and (8). These results can be extended to the case where g is locally Lebesgue integrable, if we impose more smoothness conditions on X_{nk} (see for example Christopeit (2009) and the references therein). This kind of generalisation involves more complicated derivations and will not be pursued here in order to keep the paper at a reasonable length.

Remark 3. Following the proof of Theorem 1, it is easy to see that results (5) and (6) still hold if A4(c) is replaced by τ_j = j/(l + 1), where j = 1, ..., l, i.e., if l_n ≡ l is fixed.
As for (7) and (8), if A4(c) is replaced by τ_j = j/(l + 1), where j = 1, ..., l, we have

    [S^2_{n,l}, M^2_{n,l}] →_d [ (1/l) Σ_{j=1}^l g(X_{τ_j}) ∫K,  MN( 0, σ_u^2 (1/l) Σ_{j=1}^l g(X_{τ_j}) g(X_{τ_j})' ∫K^2 ) ].

The results above involve rescaled processes (i.e. d_n^{-1} x_k, as given in A3). For the purposes of regression analysis, limit theory for non-rescaled processes (i.e. x_k) is more relevant. Following Park and Phillips (1999, 2001), we assume that the function g(.) = [g_1(.), ..., g_p(.)]' is asymptotically homogeneous, i.e. for large λ,

    g_i(λx) ≈ π_i(λ) H_i(x),   i = 1, ..., p,

where π_i (a positive real valued function) is the "asymptotic order" of g_i and H_i is the "asymptotic homogeneous function" of g_i, which is assumed continuous. Several specifications of interest satisfy these conditions, e.g. polynomial functions, logarithmic functions, indicator functions and distribution-type functions; see Park and Phillips (2001) for more details. Set π(.) := diag{π_1(.), ..., π_p(.)} and H(.) = [H_1(.), ..., H_p(.)]'. The following result is the counterpart of Theorem 2 for additive transformations of non-rescaled sequences.

Theorem 3.
Suppose that: (a) A1, A3 and A4 or A4* hold; (b) for each i = 1, ..., p, there exist a continuous function H_i and π_i : (0, ∞) → (0, ∞), so that

    g_i(λx) = π_i(λ) H_i(x) + R_i(λ, x),

where |R_i(λ, x)| ≤ a_i(λ)(1 + |x|^δ) for some δ > 0 and a_i(λ)/π_i(λ) → 0, as λ → ∞. Then, as n → ∞, we have

    Σ_{k=1}^n π(d_n)^{-1} g(x_k) { Σ_{j=1}^{l_n} K[c_n(k/n − τ_j)] } [ c_n/(n l_n), √(c_n/(n l_n)) u_k ]
      = Σ_{k=1}^n H(X_{nk}) { Σ_{j=1}^{l_n} K[c_n(k/n − τ_j)] } [ c_n/(n l_n), √(c_n/(n l_n)) u_k ] + o_P(1)   (9)
      →_d [ ∫_0^1 H(X_t) dt ∫K,  MN( 0, σ_u^2 ∫_0^1 H(X_t) H(X_t)' dt ∫K^2 ) ].   (10)

Remark 4. As noted in Remark 3, if A4(c) is replaced by τ_j = j/(l + 1), where j = 1, ..., l, we have

    Σ_{k=1}^n π(d_n)^{-1} g(x_k) { Σ_{j=1}^l K[c_n(k/n − τ_j)] } [ c_n/(n l), √(c_n/(n l)) u_k ]
      →_d [ (1/l) Σ_{j=1}^l H(X_{τ_j}) ∫K,  MN( 0, σ_u^2 (1/l) Σ_{j=1}^l H(X_{τ_j}) H(X_{τ_j})' ∫K^2 ) ].

Remark 5. Suppose K* is a real function satisfying A4(a) or A4*(a). Let 0 < τ* < 1. Arguments similar to those in the proofs of Theorems 2 and 3 show that, under the conditions of Theorem 3 with g(.) = [g_1(.), g_2(.
)]', we have

    ( ∫_0^1 H(X_{n,[nt]}) dt, U_{1n}, U_{2n} ) →_d ( ∫_0^1 H(X_t) dt, MN(0, σ_u^2 V_1) ),   (11)
    ( ∫_0^1 H(X_{n,[nt]}) dt, U_{1n}, U_{3n} ) →_d ( ∫_0^1 H(X_t) dt, MN(0, σ_u^2 V_2) ),   (12)

where

    U_{1n} = √(c_n/n) Σ_{k=1}^n π(d_n)^{-1} g(x_k) (1/√l_n) Σ_{j=1}^{l_n} K[c_n(k/n − τ_j)] u_k,
    U_{2n} = √(c_n/n) Σ_{k=1}^n (1/√l_n) Σ_{j=1}^{l_n} K*[c_n(k/n − τ_j)] u_k,
    U_{3n} = √(c_n/n) Σ_{k=1}^n K*[c_n(k/n − τ*)] u_k,

    V_1 = [ ∫_0^1 H(X_t) H(X_t)' dt ∫K^2   ∫_0^1 H(X_t) dt ∫KK* ;
            ∫_0^1 H(X_t)' dt ∫KK*          ∫(K*)^2 ],

    V_2 = [ ∫_0^1 H(X_t) H(X_t)' dt ∫K^2   0 ;
            0                               ∫(K*)^2 ].

The limit results (11) and (12), together with Theorems 1-3, will be utilised in Section 3 next.
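To illustrate Theorem 1 numerically, the following sketch computes S^1_{n,l} for an i.i.d. sequence and g(x) = x^2; the triangular kernel and the tuning sequences are illustrative assumptions. Consistent with (5), the statistic concentrates around Eg(x_1)∫K.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
c_n = n ** 0.4                                  # c_n -> infinity, c_n / n -> 0
l_n = max(int(np.log(n)), 1)                    # l_n -> infinity, l_n = o(c_n)
taus = [(j + 1) / (l_n + 1) for j in range(l_n)]
K = lambda u: np.maximum(1.0 - np.abs(u), 0.0)  # triangular kernel, integral K = 1

x = rng.standard_normal(n)                      # ergodic stationary sequence (cf. A2)
g = x ** 2                                      # g(x) = x^2, so E g(x_1) = 1

kk = np.arange(1, n + 1) / n
w = sum(K(c_n * (kk - t)) for t in taus) / l_n
S1 = (c_n / n) * np.sum(g * w)                  # concentrates around E g(x_1) * int K = 1
```

Only the observations in the l_n kernel windows contribute, yet the c_n/n scaling restores a law-of-large-numbers limit, which is the content of (5).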
The limit theory presented in the previous section is subsequently utilised for deriving the properties of the LTLS estimator and a related t-statistic. We consider nonlinear models of the form

    y_k = µ + β f(x_k) + u_k,   k = 1, ..., n,   (13)

where f is a known regression function, (µ, β) are unknown parameters, and the covariate x_k can be a nonstationary process or a stationary one, amenable to the limit theory of Theorem 2 or Theorem 1 respectively. Further, x_k is predetermined with respect to the error u_k in the sense that x_k is F_{k-1}-measurable and {u_k, F_k} is a martingale difference sequence (c.f. Assumptions A1-A3). Similar nonlinear models with a predetermined covariate have been considered, for example, by Park and Phillips (1999, 2001) and Chan and Wang (2015) in a parametric set-up, and by Wang and Phillips (2009a, b, 2011, 2012) in a nonparametric set-up.

Let K be a kernel function satisfying A4(a) or A4*(a). Let τ_j = j/(l_n + 1), j = 1, ..., l_n, and let c_n and l_n be deterministic sequences satisfying A4(b) and (c) or A4*(b) and (c). We also allow l_n to be a fixed constant. Set

    K_{kn} := Σ_{j=1}^{l_n} K[c_n(k/n − τ_j)].   (14)

Our aim is to estimate the unknown parameter β in (13) by using the following instrument for f(x_k):

    Z_{kn} := f_k K_{kn} := f(x_k) K_{kn}.

As remarked in Section 1, due to the integrability of K, a trimming effect applies around the chronological point(s) (cp(s) hereafter) τ_j, which in turn reduces the signal of the OLS instrument f(x_k). The reduction is more pronounced when the distance between k/n and τ_j is large, and/or the sequence c_n diverges fast. Clearly, for K_{kn} = 1 we get the OLS estimator as a special case. The reduction in the instrument signal enables an extended martingale CLT given by Wang (2014) to operate. As a result the estimator under consideration has a mixed Gaussian limit distribution, making pivotal inference possible.

A trimming method is also crucial for demeaning {y_k}, i.e.
taking into account the unknown intercept µ. Let K*_{kn}, k = 1, ..., n, be additive functionals of a certain integrable kernel function. For any sequence {a_k}_{k=1}^n let

    ā := Σ_{k=1}^n a_k K*_{kn} / Σ_{k=1}^n K*_{kn}   and   ã_k := a_k − ā.   (15)

We will consider two possibilities for K*_{kn}: either

    K*_{kn} := Σ_{j=1}^{l_n} K*[c_n(k/n − τ_j)]   or   K*_{kn} := K*[c_n(k/n − τ*)],   (16)

where K* satisfies A4(a), the τ_j = j/(l_n + 1), j = 1, ..., l_n, are given above and 0 < τ* < 1. The first choice in (16) involves a trimmed sample mean around an array of several cps, whilst the second is

[Footnote: Here we consider models that are nonlinear in x_k only. Our results can be generalised to models that are nonlinear both in x_k and in the parameters, along the lines of Chan and Wang (2015) for instance.]
a trimmed sample mean based on a single fixed cp. Define the LTLS estimator as

    β̂ := Σ_{k=1}^n Z_{kn} ỹ_k / Σ_{k=1}^n Z_{kn} f̃_k.

The employment of a "trimmed" sample mean is crucial for obtaining mixed Gaussian limit theory. Notice that

    β̂ = β + [ Σ_{k=1}^n Z_{kn} f̃_k ]^{-1} { Σ_{k=1}^n f_k K_{kn} u_k − ( Σ_{k=1}^n f_k K_{kn} ) Σ_{k=1}^n K*_{kn} u_k / Σ_{k=1}^n K*_{kn} }.

For nonstationary x_k the two martingale terms shown above converge jointly to a bivariate mixed Gaussian limit. In particular,

    [ √(c_n/(n l_n)) Σ_{k=1}^n f(d_n^{-1} x_k) K_{kn} u_k,  √(c_n/(n l*_n)) Σ_{k=1}^n K*_{kn} u_k ] →_d MN(0, V),

for some random matrix V. Note that if instead standard demeaning were employed (i.e. K* = 1), then

    [ √(c_n/(n l_n)) Σ_{k=1}^n f(d_n^{-1} x_k) K_{kn} u_k,  n^{-1/2} Σ_{k=1}^n u_k ] ↛_d MN(0, V),

for some random matrix V, despite the fact that each of the components on the l.h.s. above converges weakly to some (mixed) Gaussian limit.

To investigate the limit properties of the LTLS estimator β̂ in detail, set

    λ_n := n l_n / c_n   and   λ*_n := n l*_n / c_n,   where   l*_n := { l_n, if K*_{kn} = Σ_{j=1}^{l_n} K*[c_n(k/n − τ_j)];  1, if K*_{kn} = K*[c_n(k/n − τ*)] }.

The sequences λ_n, λ*_n give the order of the terms Σ_{k=1}^n K_{kn} and Σ_{k=1}^n K*_{kn}, which in turn determine the convergence rate of the LTLS estimator. Further, set R* = 1 and Q* = ∫KK* if l*_n = l_n, and R* = Q* = 0 if l*_n = 1.

We have the following main results for the asymptotics of the LTLS estimator β̂. Theorem 4 is for a stationary regressor; limit theory for the nonstationary case is given in Theorem 5.

[Footnote: Note that by standard arguments (Euler summation), Σ_{k=1}^n K*_{kn} ∼ (n l*_n / c_n) ∫K*.]

Theorem 4.
Suppose that: (a) A1, A2 with g = f, and A4 or A4* hold; (b) K* satisfies A4(a) or A4*(a) and 0 < τ* < 1. Then, as n → ∞, we have

    √λ*_n (β̂ − β) →_d σ_u N( 0, Ω^{-2} L M L' ),   (17)

where Ω = { Ef^2(x_1) − [Ef(x_1)]^2 } ∫K, L = ( R*, −Ef(x_1) ∫K / ∫K* ) and

    M = [ Ef^2(x_1) ∫K^2   Ef(x_1) Q* ;
          Ef(x_1) Q*       ∫(K*)^2 ].

Theorem 5.
Suppose that: (a) A1, A3 and A4 or A4* hold; (b) f(x) is an asymptotically homogeneous function, i.e., there exist a continuous function H and π : (0, ∞) → (0, ∞) such that f(λx) = π(λ) H(x) + R(λ, x), where |R(λ, x)| ≤ a(λ)(1 + |x|^δ) for some δ > 0 and a(λ)/π(λ) → 0, as λ → ∞; (c) K* satisfies A4(a) or A4*(a) and 0 < τ* < 1. Then, as n → ∞,

    √λ*_n π(d_n) (β̂ − β) →_d σ_u MN( 0, ( C ∫K )^{-2} A V A' ),   (18)

where

    C = { ∫_0^1 H^2(X_t) dt − [ ∫_0^1 H(X_t) dt ]^2,        if K*_{kn} = Σ_{j=1}^{l_n} K*[c_n(k/n − τ_j)];
          ∫_0^1 H^2(X_t) dt − [ ∫_0^1 H(X_t) dt ] H(X_{τ*}), if K*_{kn} = K*[c_n(k/n − τ*)] },

    A = [ R*, −∫_0^1 H(X_t) dt ∫K / ∫K* ],   and   V = [ ∫_0^1 H^2(X_t) dt ∫K^2   ∫_0^1 H(X_t) dt Q* ;
                                                         ∫_0^1 H(X_t) dt Q*      ∫(K*)^2 ].

Remark 6. Due to the fact that √λ*_n = o(√n), the convergence rate of LTLS for both stationary and nonstationary regressors is slower in comparison with that of the OLS estimator.

Remark 7. When a single cp is used in demeaning y_k, we have R* = Q* = 0. In this case, the right hand side of (18) becomes

    [ −∫_0^1 H(X_t) dt / ∫K* ] / [ ∫_0^1 H^2(X_t) dt − ( ∫_0^1 H(X_t) dt ) H(X_{τ*}) ] × N( 0, σ_u^2 ∫(K*)^2 ).

Simulations presented in Section 4 show that, in finite samples, superior performance is obtained for a certain configuration that involves multiple cps for the instrumentation of x_k (i.e. K) and a single cp for demeaning y_k (i.e. K*). An analogous result can be established when the opposite holds, i.e. a single cp (τ_1, say) is used for the instrumentation of x_k (i.e. K) and multiple cps (i.e. K*) are used for demeaning.
In particular, in the latter case it can be shown that the limit distribution (nonstationary x_k) is

    [ −H(X_{τ_1}) / ∫K* ] / [ H^2(X_{τ_1}) − ( ∫_0^1 H(X_t) dt ) H(X_{τ_1}) ] × N( 0, σ_u^2 ∫(K*)^2 ).

We do not consider this possibility explicitly in the theorems shown above in order to avoid a more complex exposition.

To end this section, we consider the following t-statistic for the hypothesis H_0 : β = β_0 (for some β_0 ∈ R):

    T̂ := C_n (β̂ − β_0) / √( σ̃^2 A_n V_n A_n' ),   (19)

where

    A_n := [ 1, −Σ_{k=1}^n f̃_k K_{kn} / Σ_{k=1}^n K*_{kn} ],   C_n := Σ_{k=1}^n Z_{kn} f̃_k,

    V_n := [ Σ_{k=1}^n K_{kn}^2 f̃_k^2        Σ_{k=1}^n K*_{kn} K_{kn} f̃_k ;
             Σ_{k=1}^n K*_{kn} K_{kn} f̃_k   Σ_{k=1}^n (K*_{kn})^2 ],

and σ̃^2 := n^{-1} Σ_{k=1}^n ũ_k^2, where the ũ_k are residuals from OLS estimation of (13). The limit properties of T̂ under the null hypothesis are demonstrated by Theorem 6 below.

Theorem 6.
Suppose that either the conditions of Theorem 4 or those of Theorem 5 hold. Then, under H_0 : β = β_0, T̂ →_d N(0, 1).

Remark 8. Note that the limit distribution of the test statistic under the null hypothesis is standard normal for both stationary and nonstationary regressors. Under the alternative hypothesis, the divergence rate of T̂ is determined by the convergence rate of the LTLS estimator. In particular, for stationary x_k it can be easily seen that T̂ = O_P(√λ*_n). On the other hand, in the nonstationary case we have T̂ = O_P(√λ*_n π(d_n)), where d_n = √n for x_k NI or I(1) and d_n = n^{d−1/2} for x_k ∼ I(d), 1/2 < d < 3/2. Therefore, a faster divergence rate is attained for more persistent processes. This fact is also corroborated by our simulation results (see Figures 1-3).

We next investigate the finite sample performance of the t-test based on the LTLS estimator. In particular, we test the hypothesis H_0 : β = 0 against H_1 : β ≠ 0 at the 5% significance level. The vector [ξ_k, u_k] process is generated by

    [ ξ_k ; u_k ] ∼ i.i.d. N( 0, [ 1  δ ; δ  1 ] ).

Further, for k = 1, ..., n the process {y_k} is generated by y_{k+1} = β x_k + u_{k+1}, where {x_k} is either a NI array of the form

    x_k = (1 + c/n) x_{k-1} + ξ_k,   (20)

with c ≤ 0 and x_0 = 0, or a type II fractional process (e.g. see Robinson and Hualde, 2003) of the form

    (I − L)^d x_k = ξ_k 1{k ≥ 1}.   (21)

Let φ_ς(x) be the density of a N(0, ς^2) variate. Next, set σ̃_u^2 = n^{-1} Σ_{k=1}^n ũ_k^2, σ̃_ξ^2 = n^{-1} Σ_{k=1}^n ξ̃_k^2 and

    δ̃ = n^{-1} Σ_{k=1}^n ũ_k ξ̃_k / √( σ̃_u^2 σ̃_ξ^2 ),

where ũ_k and ξ̃_k are OLS residuals from the regressions y_{k+1} = µ̃ + β̃ x_k + ũ_{k+1} and x_k = µ̃_x + ρ̃ x_{k-1} + ξ̃_k respectively. Finally, {τ_j}_{j=1}^{l_n} are equispaced points on (0, 1). We consider three set-ups for the kernel functionals and cps:

S1 (set-up 1): K_{kn} = Σ_{j=1}^{l_n} K[c_n(k/n − τ_j)], K*_{kn} = Σ_{j=1}^{l_n} K*[c_n(k/n − τ_j)], K(x) = φ_{…}(x)/…, K*(x) = φ_{…}(x)/…, c_n = n^{…}
, l n = c . n .S2 (set-up 2) K kn = (cid:80) l n j =1 K [ c n ( k/n − τ j )] , K ∗ kn = (cid:80) l n j =1 K ∗ [ c n ( k/n − τ j )] , K ( x ) = ϕ . ( x ) / , K ( x ) ∗ = ϕ ( x ) / , c n = n . , l n = c ˆ αn , ˆ α = 1 − . (cid:12)(cid:12)(cid:12) ˜ δ (cid:12)(cid:12)(cid:12) .S3 (set-up 3) K kn = (cid:80) l n j =1 K [ c n ( k/n − τ j )] , K ∗ kn = K ∗ [ c n ( k/n − . , K ( x ) = ϕ ˆ ς ( x ) , K ( x ) ∗ = ϕ ˆ ς ( x ) / , ˆ ς = ˜ σ u (cid:16) . . (cid:12)(cid:12)(cid:12) ˜ δ (cid:12)(cid:12)(cid:12)(cid:17) , c n = n ˆ α , ˆ α = − . . (cid:12)(cid:12)(cid:12) ˜ δ (cid:12)(cid:12)(cid:12) , l n = log n .In S1 and S2 multiple cps are used for both K kn and K ∗ kn whilst in S3 K ∗ kn involves a single cp . Contrary to S1, in S2 a data driven approach is followed for the determination of the numberof cps ( l n ). As remarked in Section 1, a small c n and/or large number of cps results in a LTLSestimator approximately equal to the OLS estimator. The OLS estimator in general has a goodpower properties but is severely oversized when endogeneity is strong (i.e. when | δ | is close to one).In S2 a large number of cps is utilised when endogeneity is weak whilst for l n drops as | δ | approachesone. A similar data-driven approach is utilised in S3. In this case c n is very small (vanishing) for δ close to zero, whilst c n is large (diverging) for | δ | close to one. Further, in S3 the choice of thekernel variance is also data driven. Preliminary simulations have shown that superior performanceis attained when ς = 0 . for δ ≈ and ς = 1 for | δ | ≈ . Therefore, ˆ ς = ˜ σ u (cid:16) . . (cid:12)(cid:12)(cid:12) ˜ δ (cid:12)(cid:12)(cid:12)(cid:17) provides an interpolation between these values based on the actual data.For S1 and S2 we use the test statistic of (19). For S3 we use A ∗ n := (cid:104) , − (cid:80) nk =1 f ( x k ) K ∗ kn (cid:80) nk =1 K ∗ kn (cid:105) insteadof A n in (19). 
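To fix ideas, the simulation design above can be sketched in a few lines. The following Python snippet is our own illustration, not the authors' code; the bandwidth c_n = √n, the number of cps l_n = 5 and the kernel standard deviation 0.1 are placeholder choices standing in for the exact set-up constants of S1-S3.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_ni(n, c, delta, beta=0.0):
    """Simulate (y, x) from the NI design (20):
    x_k = (1 + c/n) x_{k-1} + xi_k, y_{k+1} = beta*x_k + u_{k+1},
    with (xi_k, u_k) jointly Gaussian and corr(xi, u) = delta."""
    cov = np.array([[1.0, delta], [delta, 1.0]])
    e = rng.multivariate_normal(np.zeros(2), cov, size=n + 1)
    xi, u = e[:, 0], e[:, 1]
    x = np.zeros(n + 1)
    for k in range(1, n + 1):
        x[k] = (1.0 + c / n) * x[k - 1] + xi[k]
    y = beta * x[:-1] + u[1:]          # y_{k+1} regressed on x_k
    return y, x[:-1]

def kernel_weights(n, c_n, l_n, sd):
    """Chronological trimming weights K_kn = sum_j K[c_n(k/n - tau_j)]
    with a Gaussian kernel of standard deviation sd and equispaced cps
    tau_j = j/(l_n + 1) on (0, 1)."""
    tau = np.arange(1, l_n + 1) / (l_n + 1)
    k_over_n = np.arange(1, n + 1) / n
    z = c_n * (k_over_n[:, None] - tau[None, :])       # shape (n, l_n)
    return (np.exp(-0.5 * (z / sd) ** 2) / (sd * np.sqrt(2 * np.pi))).sum(axis=1)

y, x = simulate_ni(n=250, c=0.0, delta=-0.9)
Kkn = kernel_weights(n=250, c_n=250 ** 0.5, l_n=5, sd=0.1)
```

The weights Kkn are close to zero except in neighbourhoods of the cps τ_j, which is the chronological trimming that defines the LTLS instrument.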
Note that, given the configuration of S3, in the nonstationary case Σ_{k=1}^n f(x_k)K*_kn / Σ_{k=1}^n K*_kn = O_p(π_f(d_n)), whilst the term Σ_{k=1}^n f(x_k)K_kn / Σ_{k=1}^n K*_kn that appears in A_n is O_p(π_f(d_n) log n). Therefore, the employment of A*_n results in giving less weight to the term that corresponds to the studentisation of the intercept correction. Note that the utilisation of A*_n does not result in a consistent estimator for the limit variance of β̂ − β. Nevertheless, our simulation results reveal that in finite samples superior performance is attained when A*_n is employed.

Table 1 shows the size properties of the LTLS based t-tests for the case where the regressor is a NI array generated by (20). The number of replication paths is set to 10,000 throughout. We also consider the IVX based test (see eq. (20) in Kostakis et al, 2015) and the OLS based t-test. We allow for several values of the correlation parameter δ, the near-to-unity parameter c ≤ 0 and the sample size (n ∈ {250, 500, 750, 1000}). We use the notation T1, T2 and T3 to denote the LTLS t-statistics that correspond to set-ups S1, S2 and S3 respectively. In general, all LTLS based tests exhibit good size control. Under S1 and S2 the tests are moderately oversized in small samples when c = 0 and |δ| is large. Figures 1 and 2 show the empirical power (n = 250) of the LTLS and IVX tests for c = 0 and for negative c, respectively. It can be seen from these figures that T3 attains better performance than the other LTLS based tests under consideration (i.e. T1 and T2). In particular, the performance of the LTLS t-test under S3 is almost identical to that of the IVX based test. This is somewhat surprising given that under S3 the studentisation used does not lead to a consistent estimator for the limit variance of the LTLS estimator.
As noted above, under S3 the term that provides studentisation for the intercept correction is of slightly smaller order of magnitude (i.e. by a log n factor) than the corresponding term in A_n. The simulation study provided suggests that this misbalancing leads to some finite sample improvement. Hosseinkouchack and Demetrescu (2019) provide finite sample improvements to the IVX method. These authors show that the IVX t-statistic distribution is skewed relative to the N(0, 1) in finite samples when endogeneity is strong. It is reasonable to expect that a similar phenomenon holds for the LTLS distribution in finite samples. It seems that the utilisation of A*_n provides a rebalancing of the test statistic that corrects for deviations from the standard normal distribution. A rigorous analysis of the performance of T3 in finite samples would require developing higher order limit theory. A development in this direction is challenging from a technical point of view and is left for future work.

We next consider the case where the regressor is a nonstationary fractional process (i.e. (21)). The finite sample size performance of T3 and the LS based test procedure is shown in Table 2. It can be seen from Table 2 that the T3 test provides good size control for a wide range of persistence and endogeneity levels. On the other hand, the LS based test may exhibit serious oversizing. In particular, for the strongest negative δ the empirical size ranges from about three times the nominal one at the smallest d to about six times at the largest d. Finally, Figure 3 shows the finite sample power of T3 for n = 250, d ∈ {0.8, 1, 1.2} and three values of δ. As expected, better power performance is attained for more persistent regressors. Preliminary simulation results show that the performance of T1 and T2 in the fractional case is comparable to that in the NI case.

Table 1: Empirical size (NI regressor; 5% nominal). Each panel reports, for every sample size n, the rejection rates of T1, T2, T3, IVX and OLS across five values of δ (ordered from negative to positive).

Panel c = 0
(rows: n, then T1 T2 T3 IVX OLS for each of the five δ values)
250   0.084 0.095 0.060 0.059 0.278  0.059 0.074 0.057 0.056 0.117  0.051 0.052 0.045 0.050 0.053  0.061 0.075 0.052 0.056 0.113  0.087 0.096 0.064 0.061 0.295
500   0.077 0.078 0.062 0.062 0.287  0.059 0.067 0.051 0.054 0.114  0.054 0.053 0.046 0.054 0.054  0.060 0.076 0.057 0.058 0.116  0.080 0.083 0.058 0.055 0.279
750   0.076 0.069 0.062 0.058 0.272  0.059 0.065 0.051 0.052 0.109  0.052 0.050 0.042 0.050 0.051  0.059 0.062 0.054 0.055 0.111  0.080 0.068 0.063 0.057 0.277
1000  0.070 0.067 0.059 0.053 0.278  0.054 0.064 0.051 0.051 0.111  0.049 0.048 0.046 0.051 0.053  0.059 0.067 0.052 0.050 0.108  0.075 0.062 0.058 0.053 0.277

Panel c = −· (columns as above)
250   0.061 0.069 0.060 0.062 0.116  0.051 0.061 0.056 0.056 0.072  0.050 0.052 0.051 0.050 0.051  0.057 0.065 0.056 0.059 0.074  0.068 0.070 0.066 0.066 0.123
500   0.060 0.067 0.062 0.063 0.117  0.051 0.060 0.056 0.059 0.073  0.051 0.055 0.051 0.052 0.054  0.056 0.061 0.057 0.057 0.071  0.062 0.060 0.058 0.058 0.116
750   0.063 0.058 0.062 0.060 0.116  0.058 0.056 0.056 0.059 0.070  0.056 0.055 0.052 0.056 0.053  0.059 0.055 0.059 0.058 0.073  0.065 0.063 0.062 0.062 0.119
1000  0.058 0.055 0.059 0.060 0.116  0.049 0.059 0.052 0.054 0.066  0.047 0.052 0.050 0.050 0.051  0.050 0.056 0.053 0.052 0.066  0.059 0.056 0.057 0.058 0.115

Panel c = −· (columns as above)
(rows: n, then T1 T2 T3 IVX OLS for each of the five δ values)
250   0.058 0.067 0.058 0.062 0.086  0.051 0.058 0.054 0.055 0.063  0.049 0.050 0.050 0.051 0.052  0.056 0.061 0.057 0.057 0.063  0.063 0.059 0.066 0.065 0.090
500   0.058 0.053 0.061 0.063 0.088  0.051 0.060 0.058 0.058 0.065  0.047 0.056 0.051 0.052 0.052  0.050 0.061 0.054 0.055 0.060  0.056 0.056 0.059 0.057 0.085
750   0.058 0.054 0.061 0.060 0.087  0.058 0.053 0.058 0.056 0.064  0.055 0.054 0.053 0.056 0.053  0.056 0.059 0.057 0.055 0.062  0.058 0.057 0.062 0.062 0.088
1000  0.053 0.052 0.058 0.058 0.084  0.049 0.052 0.053 0.053 0.059  0.046 0.048 0.048 0.050 0.051  0.049 0.056 0.050 0.051 0.058  0.054 0.055 0.058 0.058 0.088

Panel c = −· (columns as above)
250   0.056 0.052 0.057 0.060 0.069  0.052 0.054 0.052 0.051 0.057  0.051 0.049 0.051 0.050 0.050  0.055 0.057 0.055 0.055 0.058  0.061 0.056 0.059 0.060 0.071
500   0.054 0.055 0.058 0.060 0.072  0.050 0.055 0.055 0.054 0.058  0.048 0.056 0.051 0.051 0.052  0.049 0.053 0.053 0.055 0.058  0.053 0.052 0.054 0.056 0.067
750   0.053 0.053 0.060 0.059 0.071  0.056 0.055 0.057 0.060 0.060  0.052 0.054 0.055 0.056 0.053  0.056 0.055 0.055 0.055 0.058  0.057 0.054 0.061 0.062 0.074
1000  0.052 0.052 0.057 0.057 0.071  0.047 0.057 0.052 0.050 0.056  0.048 0.049 0.048 0.048 0.049  0.048 0.050 0.048 0.049 0.053  0.052 0.055 0.057 0.055 0.070

Panel c = −· (columns as above)
(rows: n, then T1 T2 T3 IVX OLS for each of the five δ values)
250   0.053 0.053 0.056 0.054 0.058  0.052 0.050 0.049 0.050 0.051  0.049 0.051 0.049 0.049 0.049  0.052 0.054 0.052 0.050 0.053  0.055 0.049 0.055 0.055 0.058
500   0.052 0.053 0.056 0.054 0.059  0.052 0.054 0.052 0.051 0.053  0.048 0.051 0.046 0.047 0.048  0.050 0.053 0.050 0.050 0.050  0.053 0.048 0.055 0.055 0.059
750   0.051 0.050 0.059 0.059 0.064  0.053 0.049 0.054 0.055 0.055  0.053 0.050 0.054 0.053 0.052  0.057 0.051 0.056 0.056 0.058  0.057 0.046 0.059 0.059 0.063
1000  0.054 0.054 0.058 0.055 0.061  0.051 0.053 0.052 0.053 0.053  0.050 0.048 0.048 0.050 0.050  0.050 0.047 0.048 0.049 0.050  0.051 0.050 0.054 0.053 0.058

[Figure 1: Empirical power (NI regressor; 5% nominal size; c = 0); curves T1, T2, T3, IVX.]
[Figure 2: Empirical power (NI regressor; 5% nominal size; c < 0); curves T1, T2, T3, IVX.]

Table 2: Empirical size (fractional regressor; 5% nominal). Each panel corresponds to a value of δ; columns report T3 and LS for d = 0.·, 0.·, 0.·, 1, 1.·, 1.·.

Panel δ = −· (rows: n, then T3 LS for each d)
250   0.051 0.158  0.051 0.184  0.055 0.235  0.060 0.278  0.064 0.308  0.067 0.325
500   0.051 0.161  0.052 0.184  0.058 0.242  0.062 0.287  0.066 0.319  0.068 0.337
750   0.051 0.155  0.052 0.178  0.058 0.230  0.062 0.272  0.064 0.301  0.067 0.322
1000  0.048 0.155  0.050 0.183  0.055 0.229  0.059 0.278  0.065 0.310  0.069 0.327

Panel δ = −· (columns as above)
250   0.051 0.085  0.052 0.093  0.055 0.107  0.057 0.117  0.058 0.121  0.057 0.126
500   0.051 0.085  0.050 0.091  0.051 0.102  0.051 0.114  0.052 0.120  0.052 0.123
750   0.048 0.081  0.046 0.086  0.048 0.098  0.051 0.109  0.052 0.117  0.052 0.119
1000  0.047 0.077  0.047 0.086  0.048 0.102  0.051 0.111  0.056 0.118  0.053 0.120

Panel δ = 0.· (columns as above)
(rows: n, then T3 LS for each d)
250   0.043 0.051  0.044 0.052  0.043 0.053  0.045 0.053  0.044 0.052  0.043 0.052
500   0.048 0.054  0.048 0.055  0.047 0.054  0.046 0.054  0.045 0.055  0.045 0.056
750   0.048 0.053  0.046 0.051  0.043 0.052  0.042 0.051  0.042 0.051  0.043 0.053
1000  0.045 0.049  0.044 0.049  0.044 0.053  0.046 0.053  0.044 0.054  0.044 0.055

[Figure 3: Empirical power (fractional regressor; 5% nominal size); curves T3 for d = 0.8, 1, 1.2.]

Application to the predictability of stock returns
A large literature in empirical finance is devoted to the investigation of the hypothesis that stock returns can be predicted with publicly available information. For a review of existing work see for example Welch and Goyal (2008), and for more recent developments Kostakis, Magdalinos and Stamatogiannis (2015). Typically, empirical work in this area involves inferential procedures for the hypothesis H₀ : β = 0 in the context of predictive regressions of the form

  r_{k+1} = μ + β x_k + u_{k+1},   (22)

where r_k are stock returns relating to some stock index, x_k is some predictive variable and u_{k+1} is a martingale difference regression error. Usually some financial ratio (e.g. dividend yield, earnings to price ratio, book to market ratio) or some macroeconomic variable (e.g. inflation) is considered as a possible predictor of future returns. Phillips (2015) provides a review of the econometric methodology employed in the predictive regressions literature. Most studies (e.g. Welch and Goyal, 2008) utilise methods that are only valid for stationary x_k, despite the fact that there is strong evidence that in certain datasets various financial and macroeconomic variables are consistent with nonstationary processes (e.g. see Kostakis et al, 2015; Table 4). To the best of our knowledge, Campbell and Yogo (2006) is the first work that explicitly attempts to address the possibility that the regressor is nonstationary. In particular, Campbell and Yogo (2006) develop a testing procedure for the case where the predictor is a NI array, based on conservative confidence intervals. Kostakis et al (2015) consider a modified version of the Magdalinos and Phillips (2009) IVX, involving a finite sample correction relating to intercept estimation, to examine the return predictability hypothesis. The IVX estimator yields conventional inference for the case where x_k is a NI or mildly integrated array (e.g. Phillips and Magdalinos, 2007) or a stationary linear process.
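The size distortion of OLS inference that motivates these corrections is easy to reproduce. The sketch below is our own hypothetical illustration, not the paper's code: a small Monte Carlo of the 5% two-sided OLS t-test in a regression of the form (22) with β = 0, a unit-root predictor and strongly endogenous innovations. Under such a design the rejection frequency sits well above the nominal level, in line with the oversizing reported for the OLS column of Table 1.

```python
import numpy as np

rng = np.random.default_rng(1)

def ols_tstat(y, x):
    """t-statistic for beta in the fitted regression y = mu + beta*x + u."""
    X = np.column_stack([np.ones_like(x), x])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    s2 = resid @ resid / (len(y) - 2)
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return b[1] / se

def ols_rejection_rate(n=250, delta=-0.9, reps=1000):
    """Rejection frequency of the 5% OLS t-test of H0: beta = 0 when the
    true beta is 0, the predictor is a unit root (c = 0 in the NI design)
    and corr(xi, u) = delta (strong endogeneity)."""
    rej = 0
    for _ in range(reps):
        e = rng.multivariate_normal(np.zeros(2), [[1, delta], [delta, 1]], size=n + 1)
        xi, u = e[:, 0], e[:, 1]
        x = np.cumsum(xi)                       # x_k = x_{k-1} + xi_k
        if abs(ols_tstat(u[1:], x[:-1])) > 1.96:
            rej += 1
    return rej / reps

rate = ols_rejection_rate()
```

With these placeholder parameter values the empirical size is several times the nominal 5%, which is the distortion that the LTLS and IVX constructions are designed to remove.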
IVX instruments are also employed in the recent work of Demetrescu et al (2020), who propose inferential procedures for detecting episodic predictability in stock returns. The IVX method has also been employed in the recent work of Yang, Long, Peng and Cai (2020), who investigate predictability in the U.S. housing index return.

An important issue that has been largely overlooked in most studies in this area is that stock returns series typically exhibit very weak persistence relative to most popular predictors. In particular, in many datasets short-term returns appear to be close to I(d) processes with d ≈ 0, whilst several predictors appear to be I(d) with d > 1/2, i.e. nonstationary processes. Regressing a stationary process on a possibly nonstationary one leads to misbalancing. As emphasised by Phillips (2015), misbalancing may result in asymptotically vanishing estimators. For instance, if r_k ~ I(d_r) with d_r < 1/2 (stationary long memory) and x_k ~ I(d_x) with d_x > 1/2, then the OLS estimator β̃ for β in (22) satisfies β̃ →_P 0.

Only a few studies in this area attempt to address the issue of misbalancing. Marmer (2007) points out that a nonlinear relationship between returns and predictive variables is a plausible justification for this discrepancy in persistence. It is known, for instance, that integrable and bounded transformations of persistent processes may exhibit very weak signal (e.g. Park and Phillips, 1999, 2001; Park, 2003). Therefore, suppose that r_{k+1} = f(x_k) + u_{k+1}, where f is either integrable and compactly supported or the indicator function 1{x < 0}. The predictor x_k in this case has only a "spatial episodic" impact on returns, when the predictive variable visits the support of f (integrable case) or when it assumes negative values (indicator case).
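The weak signal of integrable transformations is easy to visualise numerically. In the sketch below (our own illustration, with f the integrable Gaussian-shaped function exp(−x²/2) as a stand-in for a generic integrable transformation), the sample average of f along a random walk shrinks as the sample grows, at rate roughly n^{−1/2}; this is the "spatial episodic" effect described above, which makes f(x_k) hard to distinguish from noise.

```python
import numpy as np

rng = np.random.default_rng(2)

def mean_signal(f, n, reps=200):
    """Average of (1/n) * sum_k f(x_k) over replications, where x is a
    Gaussian random walk. For integrable f this average shrinks at rate
    roughly 1/sqrt(n), unlike for functions bounded away from zero."""
    total = 0.0
    for _ in range(reps):
        x = np.cumsum(rng.standard_normal(n))
        total += np.mean(f(x))
    return total / reps

f = lambda x: np.exp(-0.5 * x ** 2)      # integrable, compactly concentrated
small, large = mean_signal(f, 100), mean_signal(f, 10_000)
```

Comparing `small` and `large` shows the average signal at n = 10,000 is an order of magnitude below that at n = 100, even though the regressor itself is highly persistent.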
For DGPs of this kind it is difficult to distinguish r_k from the martingale difference error u_k, despite the fact that r_k is a function of a persistent process (see for example Kasparis, Andreou and Phillips (2015), Figure 6; or Phillips (2015), Figure 2). Marmer (2007) develops a RESET type of functional form test for detecting possibly nonlinear components (e.g. integrable) of some predictor in the stock return series. A similar approach is also followed by Kasparis (2010) and Kasparis et al (2015), who utilise test statistics that involve integrable transformations of the predictor. The presence of integrable transformations in the test statistics results in conventional inference, but can also detect weak signal nonlinear components affecting the returns series (for more details see pp. 473-474 in Kasparis et al, 2015). Bollerslev, Osterrieder, Sizova and Tauchen (2013) follow a different approach for addressing the issue of misbalancing. These authors consider VIX and realised volatility as possible predictors of stock returns. Using preliminary estimations, they find that the aforementioned predictors exhibit long memory, whilst stock returns appear to have a memory parameter d ≈ 0. In view of this, Bollerslev et al (2013) consider prefiltered predictors of the form (I − L)^{d̂} x_k, where x_k is some volatility variable. Notice that the fractionally differenced process has memory approximately equal to that of the returns series (d ≈ 0). Finally, Demetrescu et al (2020) develop inferential procedures capable of detecting episodic predictability in stock returns for the case where the predictors are either I(0) or NI. In particular, they consider a potentially nonlinear relationship between returns and the predictive variables of the form r_{k+1} = f_n(x_k, k/n) + u_{k+1}, where f_n(x_k, k/n) = μ + k_n β(k/n) x_k, β(.) is a TVP depending on the rescaled time trend k/n, and k_n is an appropriate sequence.
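The type II fractional filter used in (21), and the prefiltering (I − L)^{d̂} x_k applied by Bollerslev et al (2013), can be sketched via the usual binomial expansion of (1 − L)^d. The implementation below is our own illustration, not taken from any of the cited papers.

```python
import numpy as np

def fracdiff_weights(d, n):
    """Expansion weights of (1 - L)^d: w_0 = 1, w_j = w_{j-1}(j - 1 - d)/j,
    i.e. w_j = (-1)^j * binom(d, j)."""
    w = np.empty(n)
    w[0] = 1.0
    for j in range(1, n):
        w[j] = w[j - 1] * (j - 1 - d) / j
    return w

def fracdiff(x, d):
    """Type II fractional difference (I - L)^d x_k, truncated at k = 0
    as in (21); d > 0 differences the series, d < 0 integrates it."""
    n = len(x)
    w = fracdiff_weights(d, n)
    # out_k = sum_{j=0}^{k} w_j * x_{k-j}
    return np.array([w[:k + 1] @ x[k::-1] for k in range(n)])
```

Because the truncated type II filters (1 − L)^d and (1 − L)^{−d} are exact finite-sample inverses, applying `fracdiff(x, d)` to a series generated as `fracdiff(xi, -d)` recovers the innovations xi exactly.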
This formulation allows for a "time episodic" impact of the predictor on the returns variable. Demetrescu et al (2020) achieve conventional inference by either utilising IVX instruments or the so-called type II instruments of Breitung and Demetrescu (2015). The method of Demetrescu et al (2020) can be used in conjunction with various instruments, including LTLS. Such a development would require additional theoretical work and is left for future research.
In this work we address the issue of misbalancing by considering predictability over longer horizons. In particular, we employ LTLS based inference in predictive regressions of the form

  r_{k+m} = μ + β x_k + u_{k+m},   (23)

where m ≥ 1. The specification of (23) has been considered by other studies that investigate return predictability over long horizons (see for example Bandi and Perron, 2008; Hjalmarsson, 2011). The data are taken from the updated 2018 Welch and Goyal dataset. The returns variable is constructed from the SP500 index (I_k) as follows: r_{k+m} = ln(I_{k+m}) − ln(I_k). We use monthly and quarterly observations. Therefore, for monthly data r_{k+m} should be understood as m months ahead returns, and for quarterly observations as m quarters ahead. By construction, returns are log-price differences. Therefore, the persistence of the returns series tends to increase as the horizon increases. Table 3 provides memory estimates for the return series over different horizons and frequencies. In particular, we use the local Whittle estimator (LW; e.g. see Robinson, 1995) and the exact local Whittle (ELW) of Shimotsu and Phillips (2005). The bandwidth employed is of the form n^b; we report results for three choices of the bandwidth exponent b, including the value considered by Shimotsu and Phillips (2005). Moreover, we report memory estimates for the earnings to price ratio (EP). The particular series appears to be less persistent than the dividend yield and book to market ratio that are commonly used in empirical work. For this reason we concentrate on EP, whose memory characteristics are closer to those of the returns series. It can be seen from Table 3 that the EP appears to be nonstationary at both frequencies and for all bandwidth choices, with minimal memory estimate 0.76. Further, the memory characteristics of the returns series appear to resemble those of the EP variable over longer horizons, i.e. m = 24 for monthly data and m = 12 for quarterly data, at the larger bandwidth choices.
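The local Whittle objective underlying the estimates in Table 3 can be sketched as follows. This routine is our own illustration rather than the exact estimator used in the paper; the grid search and the default bandwidth exponent b = 0.65 are placeholder choices.

```python
import numpy as np

def local_whittle(x, b=0.65):
    """Local Whittle estimate of the memory parameter d (Robinson, 1995):
    minimise the concentrated objective
        R(d) = log( mean_j lam_j^{2d} I(lam_j) ) - 2d * mean_j log(lam_j)
    over the first m = floor(n**b) Fourier frequencies. The plain LW
    estimator is reliable roughly for d in (-1/2, 3/4); beyond that range
    the exact local Whittle of Shimotsu and Phillips (2005) is preferable."""
    n = len(x)
    m = int(n ** b)
    lam = 2.0 * np.pi * np.arange(1, m + 1) / n
    I = np.abs(np.fft.fft(x)[1:m + 1]) ** 2 / (2.0 * np.pi * n)   # periodogram
    def R(d):
        return np.log(np.mean(lam ** (2.0 * d) * I)) - 2.0 * d * np.mean(np.log(lam))
    grid = np.linspace(-0.45, 1.4, 371)
    return grid[np.argmin([R(d) for d in grid])]
```

For a short-memory series the routine returns a value near 0, while for the EP ratio it would return an estimate well above 1/2, the pattern reported in Table 3.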
Figure 4 reports values of the LTLS T̂-statistics for the hypothesis H₀ : β = 0 vs H₁ : β ≠ 0 (c.f. equation (23)). These values are plotted against the predictability horizon parameter m. We consider three configurations for kernels, cps and bandwidth sequences consistent with the set-ups S1, S2 and S3 given in the previous section. In particular, for S1 and S2 we choose K(x) = ϕ_{0.·σ̃_u}(x), K*(x) = ϕ_{σ̃_u}(x). It can be seen from Figure 4 that there is evidence of predictability only over longer horizons under S1 and S3. For monthly data, the null hypothesis is rejected at the 5% level under S1 and S3 for m greater than 6 and 5, respectively. For quarterly data, the null is rejected under S1 and S3 for m greater than 12 and 10, respectively. These findings are consistent with those of Bandi and Perron (2008), who find strong predictability (by volatility predictors) over longer horizons.

Table 3: Memory estimates (LW and ELW) for three bandwidth choices n^b.

Monthly data
                  b = 0.·        b = 0.·        b = 0.·
                  LW     ELW     LW     ELW     LW     ELW
Returns (m = 1)   -0.09  -0.08   0.07   0.06    0.03   0.04
Returns (m = 12)  -0.036 -0.02   0.45   0.45    0.84   0.86
Returns (m = 24)  0.21   0.22    0.93   0.93    1.04   1.06
EP                0.77   0.85    0.92   1.22    1.02   1.51

Quarterly data
                  b = 0.·        b = 0.·        b = 0.·
                  LW     ELW     LW     ELW     LW     ELW
Returns (m = 1)   -0.09  -0.07   -0.09  -0.08   0.03   0.04
Returns (m = 8)   -0.03  -0.01   0.16   0.17    0.89   0.93
Returns (m = 12)  0.06   0.08    0.82   0.83    1.19   1.14
EP                0.76   0.81    0.79   0.85    0.88   1.17

[Figure 4: Predictability tests — LTLS t-statistics plotted against the predictability horizon m (quarterly data panel).]
[Figure 4 (cont.): LTLS predictability tests (monthly data panel); curves T1, T2, T3 with the 1.96 critical value line.]

Proofs of main results
Throughout this section, C, C₁, C₂, ... denote positive constants that may take a different value at each appearance, and K_kn := Σ_{j=1}^{l_n} K[c_n(k/n − τ_j)] as in (14). We start with two preliminary lemmas, which provide a significant extension of Lemma 4.1 of Hu, Phillips and Wang (2019) and include (5) and (7) as corollaries. The proofs of these two lemmas are given in Sections 6.7 and 6.8, respectively.

Let {X_{n,k}}_{k≥1,n≥1}, where X_{n,k} = (X_{nk,1}, ..., X_{nk,p}), be a vector random array. When there is no confusion, we also use the notation X_{nk} = X_{n,k}. Let {v_k}_{k≥1} be a sequence of random variables, and let G(q) = G(q₁, ..., q_p) and K(x) be Borel functions on ℝ^p and ℝ, respectively. For 0 < τ₁ < τ₂ < ... < τ_l < 1, set

  S_{n,l} = (c_n/n) Σ_{k=1}^n G(X_{nk}) v_k (1/l) Σ_{j=1}^l K[c_n(k/n − τ_j)],

where {c_n}_{n≥1} is a sequence of positive constants. Our first result investigates the asymptotics of S_{n,l}.

Lemma 1.
Suppose that:
(a) there is a continuous limiting process X_t = (X₁(t), ..., X_p(t)) such that X_{n,[nt]} ⇒ X_t on D_{ℝ^p}[0, 1];
(b) sup_{k≥1} E|v_k| < ∞ and there exist A ∈ ℝ and 0 < m := m_n → ∞ satisfying n/m → ∞ so that max_{m≤j≤n−m} E| (1/m) Σ_{k=j+1}^{j+m} v_k − A | = o(1);
(c) G(q) is continuous; K(x) has compact support, or K(x) is eventually monotonic with K(x) ≤ C/(1 + |x|²); and K(x) ≥ 0 with ∫K < ∞.

Then, for any fixed l ≥ 1, c_n → ∞ and c_n/n → 0, we have

  S_{n,l} = (1/l) Σ_{j=1}^l G(X_{n,[nτ_j]}) A ∫K + o_P(1) →_p (1/l) Σ_{j=1}^l G(X_{τ_j}) A ∫K.   (24)

If in addition τ_j = j/(l_n + 1), j = 1, 2, ..., l_n, where l_n⁻¹ + l_n/c_n → 0, then

  S_{n,l_n} = ∫₀¹ G(X_{n,[nt]}) dt A ∫K + o_P(1) →_p ∫₀¹ G(X_t) dt A ∫K.   (25)

Remark. Weak convergence in (a) and continuity of G(q) are essentially necessary for this kind of result. The result can be extended to the case where G(q) is locally Lebesgue integrable if we impose smoother conditions on X_{nk}, but this involves more complicated calculations. We do not pursue the extension, to keep this paper at a reasonable length. It is worth mentioning that no relationship is imposed between v_k and X_{nk}, and condition (b) is satisfied with A = Ev₀ whenever v_k is ergodic (strictly) stationary satisfying E|v₀| < ∞ and n⁻¹ Σ_{k=1}^n v_k →_{L¹} Ev₀.

If we are only interested in the boundedness of S_{n,l}, condition (b) can be weakened, as seen in the following result.

Lemma 2.
Suppose that conditions (a) and (c) of Lemma 1 hold and {v_k}_{k≥1} is an arbitrary random sequence satisfying sup_{k≥1} E|v_k| < ∞. Then, for any l ≥ 1 (allowing l = l_n → ∞), c_n → ∞ and c_n/n → 0, we have

  (c_n/n) Σ_{k=1}^n ‖G(X_{nk})‖ |v_k| (1/l) Σ_{j=1}^l K[c_n(k/n − τ_j)] = O_P(1).   (26)

If in addition K(x) ≤ C₁/(1 + |x|²), τ_j = j/(l_n + 1), j = 1, 2, ..., l_n, l_n log l_n/c_n → 0 and l_n → ∞, then

  (c_n/n) Σ_{k=1}^n ‖G(X_{nk})‖ |v_k| (1/l_n) Σ_{1≤i≠j≤l_n} K[c_n(k/n − τ_i)] K[c_n(k/n − τ_j)] = o_P(1).   (28)
The result for S_{n,l_n}, i.e. (7), follows from Lemma 1 with v_k ≡ 1. We next consider M_{n,l_n}, i.e. (8). Set Q_{k,n} := √(c_n/(n l_n)) α′g(X_{nk}) K_kn, where α ∈ ℝ^p. Noting that ∫₀¹ g(X_{n,[nt]}) dt is a continuous functional of X_{n,[nt]}, the limit result of (8), jointly with (7), will follow if we prove that

  { X_{n,[nt]}, Σ_{k=1}^n Q_{k,n} u_k } ⇒ { X_t, MN( 0, σ_u² ∫₀¹ [α′g(X_t)]² dt ∫K² ) }   (35)

on D_ℝ[0, 1]. First note that, by using Lemmas 1 and 2 with v_k ≡ 1,

  Σ_{k=1}^n Q²_{k,n} = (c_n/n) Σ_{k=1}^n [α′g(X_{nk})]² (1/l_n) Σ_{j=1}^{l_n} K²[c_n(k/n − τ_j)] + o_P(1)
                    = ∫₀¹ [α′g(X_{n,[nt]})]² dt ∫K² + o_P(1) →_d ∫₀¹ [α′g(X_t)]² dt ∫K²,

indicating that

  { X_{n,[nt]}, Σ_{k=1}^n Q²_{k,n} } ⇒ { X_t, ∫₀¹ [α′g(X_t)]² dt ∫K² }.

By Theorem 2.1 of Wang (2014), the limit result of (35) will follow if we prove

  max_{1≤k≤n} |Q_{k,n}| = o_P(1)   (36)

and

  n^{−1/2} Σ_{k=1}^n |Q_{k,n}| = o_P(1).   (37)

In fact, since ‖g‖² is still continuous, it follows from Lemma 2 with v_k = 1 that max_{1≤k≤n} |Q_{k,n}| = o_P(1), yielding (36). Similarly, recalling l_n/c_n → 0, we have

  n^{−1/2} Σ_{k=1}^n |Q_{k,n}| ≤ ‖α‖ n^{−1/2} √(c_n/(n l_n)) Σ_{k=1}^n ‖g(X_{nk})‖ K_kn
    = ‖α‖ √(l_n/c_n) (c_n/(n l_n)) Σ_{k=1}^n ‖g(X_{nk})‖ K_kn = O_P(√(l_n/c_n)) = o_P(1),

which shows (37). The proof of Theorem 2 is complete.
□

To show Theorem 3, we only prove (9), since (10) is a direct consequence of (9) and Theorem 2. Notice that, by condition (b), we may write

  Σ_{k=1}^n π(d_n)⁻¹ g(x_k) K_kn [ c_n/(n l_n), √(c_n/(n l_n)) u_k ]
    = Σ_{k=1}^n [ H(X_{nk}) + π(d_n)⁻¹ R(d_n, X_{nk}) ] K_kn [ c_n/(n l_n), √(c_n/(n l_n)) u_k ]
    = Σ_{k=1}^n H(X_{nk}) K_kn [ c_n/(n l_n), √(c_n/(n l_n)) u_k ] + (∆_{1n}, ∆_{2n}),

where R(λ, x) = [R₁(λ, x), ..., R_p(λ, x)]′ and

  ∆_{1n} = (c_n/(n l_n)) Σ_{k=1}^n π(d_n)⁻¹ R(d_n, X_{nk}) K_kn,
  ∆_{2n} = √(c_n/(n l_n)) Σ_{k=1}^n π(d_n)⁻¹ R(d_n, X_{nk}) K_kn u_k.

Now (9) follows from Theorem 2 with g(x) = H(x) if we prove

  |α′∆_{in}| = o_P(1), i = 1, 2,   (38)

for any α = (α₁, ..., α_p)′ ∈ ℝ^p. We only prove (38) with i = 2, since the proof of |α′∆_{1n}| = o_P(1) is similar but simpler. Recall K_kn := Σ_{j=1}^{l_n} K[c_n(k/n − τ_j)] and set, for A > 0,

  R̃_{n,l_n}(A) = √(c_n/(n l_n)) Σ_{k=1}^n α′π(d_n)⁻¹ R(d_n, X_{nk}) 1{|X_{nk}| ≤ A} K_kn u_k.

Note that, as n → ∞ first and then A → ∞,

  P( α′∆_{2n} ≠ R̃_{n,l_n}(A) ) ≤ P( max_{1≤k≤n} |X_{nk}| ≥ A ) → 0.   (39)

For any ε > 0 and A > 0, we have

  P( |α′∆_{2n}| ≥ ε ) ≤ P( α′∆_{2n} ≠ R̃_{n,l_n}(A) ) + ε⁻² E[ R̃²_{n,l_n}(A) ].
Now |α′∆_{2n}| = o_P(1) follows from (39) and the fact that, as n → ∞, for any A > 0,

  E[ R̃²_{n,l_n}(A) ] ≤ (c_n/(n l_n)) C Σ_{k=1}^n E[ |α′π(d_n)⁻¹R(d_n, X_{nk})|² 1{|X_{nk}| ≤ A} K²_kn ]
    ≤ (c_n/n) C ‖α‖² (A^δ)² ε²_n (1/l_n) Σ_{k=1}^n E K²_kn → 0,

where ε_n = max_{1≤i≤p} |[π_i(d_n)]⁻¹ a_i(d_n)| → 0, and we have used (28) of Lemma 2 (with G(x) ≡ 1 and v_k ≡ 1). The proof of Theorem 3 is now complete.

The proofs of (11) and (12) are essentially the same as that of (9). We only provide an outline for (11). For any α, β ∈ ℝ, let

  Q̃_{k,n} = √(c_n/(n l_n)) ( α H(X_{n,k}) K_kn + β K*_kn ),

where K*_kn := Σ_{j=1}^{l_n} K*[c_n(k/n − τ_j)]. As in the proof of (9), we have

  α U_{1n} + β U_{2n} = Σ_{k=1}^n Q̃_{k,n} u_k + o_P(1).

Note that, by using (31) and Lemmas 1 and 2,

  Σ_{k=1}^n Q̃²_{k,n} = α² ∫₀¹ H²(X_{n,[nt]}) dt ∫K² + 2αβ ∫₀¹ H(X_{n,[nt]}) dt ∫KK* + β² ∫(K*)² + o_P(1),

indicating that

  { X_{n,[nt]}, Σ_{k=1}^n Q̃²_{k,n} } ⇒ { X_t, [α, β] V [α, β]′ }.

Similarly, we may prove that (36) and (37) hold with Q_{k,n} replaced by Q̃_{k,n}. As a consequence, (11) follows from Wang (2014) as in the proof of Theorem 2. □

We only prove Theorem 5, since the proof of Theorem 4 is similar but simpler. Let

  A₁n = (c_n/(n l_n)) Σ_{k=1}^n [π(d_n)⁻¹ f(x_k)]² K_kn,   A₂n = (c_n/(n l_n)) Σ_{k=1}^n π(d_n)⁻¹ f(x_k) K_kn,
  A₃n = (c_n/(n l*_n)) Σ_{k=1}^n π(d_n)⁻¹ f(x_k) K*_kn,
  B₁n = √(c_n/(n l_n)) Σ_{k=1}^n π(d_n)⁻¹ f(x_k) K_kn u_k,   B₂n = √(c_n/(n l*_n)) Σ_{k=1}^n K*_kn u_k.

Recall (15) and Z_kn = f(x_k) K_kn, and note that (c_n/(n l*_n)) Σ_{k=1}^n K*_kn = ∫K* + o(1).
It is readily seen from (9) of Theorem 3 and Theorem 2 that

  λ_n π(d_n)⁻² Σ_{k=1}^n Z_kn f_k
    = (c_n/(n l_n)) Σ_{k=1}^n π(d_n)⁻¹ f(x_k) K_kn [ π(d_n)⁻¹ f(x_k) − Σ_{k=1}^n π(d_n)⁻¹ f(x_k) K*_kn / Σ_{k=1}^n K*_kn ]
    = A₁n − A₂n A₃n / ∫K* + o_P(1) = C_n ∫K + o_P(1),   (40)

where

  C_n = ∫₀¹ H²(X_{n,[nt]}) dt − [ ∫₀¹ H(X_{n,[nt]}) dt ]²,   if K*_kn = Σ_{j=1}^{l_n} K*[c_n(k/n − τ_j)],
  C_n = ∫₀¹ H²(X_{n,[nt]}) dt − [ ∫₀¹ H(X_{n,[nt]}) dt ] H(X_{n,[nτ*]}),   if K*_kn = K*[c_n(k/n − τ*)].

Similarly, we have

  √λ*_n λ_n π(d_n)⁻¹ Σ_{k=1}^n Z_kn u_k
    = √λ*_n λ_n { Σ_{k=1}^n π(d_n)⁻¹ f(x_k) K_kn u_k − [ Σ_{k=1}^n π(d_n)⁻¹ f(x_k) K_kn ][ Σ_{k=1}^n K*_kn u_k ] / Σ_{k=1}^n K*_kn }   (41)
    = √(l*_n/l_n) B₁n − A₂n B₂n / ∫K* + o_P(1) = A_n B_n + o_P(1),   (42)

where A_n = [ R*, − ∫₀¹ H(X_{n,[nt]}) dt ∫K / ∫K* ] and B_n = [B₁n, B₂n]′. Since both C_n and A_n are continuous functionals of X_{n,[nt]}, a simple application of (11) and (12) yields

  √λ*_n π(d_n) ( β̂ − β ) = [ √λ*_n λ_n π(d_n)⁻¹ Σ_{k=1}^n Z_kn u_k ] / [ λ_n π(d_n)⁻² Σ_{k=1}^n Z_kn f_k ]
    = ( C_n ∫K )⁻¹ A_n B_n + o_P(1) →_d σ_u MN( 0, ( C ∫K )⁻² A V A′ ),   (43)

as required. The proof of Theorem 5 is complete. □

We only prove Theorem 6 under the conditions of Theorem 5, since the proof under the conditions of Theorem 4 is similar. In addition to A₂n, B₁n, B₂n, A_n and B_n from the proof of Theorem 5, we define

  V_n = [ ∫₀¹ H²(X_{n,[nt]}) dt ∫K²    ∫₀¹ H(X_{n,[nt]}) dt Q* ;
          ∫₀¹ H(X_{n,[nt]}) dt Q*      ∫(K*)² ].
As in the proof of (40), by letting D_n = diag( π(d_n)√λ_n, √λ*_n ), we have

  λ*_n λ_n π(d_n)⁻² A_n V_n A_n′ = λ*_n λ_n π(d_n)⁻² A_n D_n D_n⁻¹ V_n D_n⁻¹ D_n A_n′
    = [ √(λ*_n/λ_n), − λ_n π(d_n)⁻¹ Σ_{k=1}^n f(x_k) K_kn / ( λ*_n Σ_{k=1}^n K*_kn ) ]
      × [ λ_n π(d_n)⁻² Σ_{k=1}^n K²_kn f²(x_k)              π(d_n)⁻¹ √(λ_n λ*_n) Σ_{k=1}^n K*_kn K_kn f(x_k) ;
          π(d_n)⁻¹ √(λ_n λ*_n) Σ_{k=1}^n K*_kn K_kn f(x_k)    λ*_n Σ_{k=1}^n (K*_kn)² ]
      × [ √(λ*_n/λ_n), − λ_n π(d_n)⁻¹ Σ_{k=1}^n f(x_k) K_kn / ( λ*_n Σ_{k=1}^n K*_kn ) ]′
    = A_n V_n A_n′ + o_P(1),   (44)

where A_n and V_n on the right hand side are the arrays defined in the proof of Theorem 5. Since σ̃² = σ_u² + o_P(1) under the given assumptions, by using similar arguments to the proofs of (42) and (43), it follows from (44) that

  T̂ = √λ*_n λ_n π(d_n)⁻¹ Σ_{k=1}^n Z_kn u_k / √( σ̃² λ*_n λ_n π(d_n)⁻² A_n V_n A_n′ )
     = ( σ_u² A_n V_n A_n′ )^{−1/2} A_n B_n + o_P(1) →_d N(0, 1),

as required. □

We only prove (25), as the proof of (24) is similar but simpler. We start the proof of (25) by assuming that there exists an
$A_0 > 0$ such that $K(x) = 0$ if $|x| \ge A_0$ and $K(x)$ is Lipschitz continuous on $\mathbb{R}$. This restriction will be removed later. Without loss of generality, suppose $A_0 = 1$. Set $\delta_{1n,j} = [n(\tau_j - 1/c_n)] \vee 1$, $\delta_{2n,j} = [n(\tau_j + 1/c_n)] \wedge n$ and $\delta_{0n,j} = [n\tau_j]$. Recall $\tau_j = j/(l_n+1)$. Since
$$|c_n(k/n - \tau_j)| < 1 \quad\text{only if}\quad \delta_{1n,j} \le k \le \delta_{2n,j},\qquad j = 1,\ldots,l_n, \tag{45}$$
by letting $R_{1n,j} = \frac{c_n}{n}\sum_{k=\delta_{1n,j}}^{\delta_{2n,j}} v_k K[c_n(k/n - \tau_j)]$ and
$$R_{2n,j} = \frac{c_n}{n}\sum_{k=\delta_{1n,j}}^{\delta_{2n,j}} \big[G(X_{nk}) - G\big(X_{n,\delta_{0n,j}}\big)\big]\, v_k\, K[c_n(k/n - \tau_j)],$$
we have
$$
S_{n,l_n} = \frac{1}{l_n}\sum_{j=1}^{l_n} \frac{c_n}{n}\sum_{k=1}^n G(X_{nk})\, v_k\, K[c_n(k/n - \tau_j)]
$$
$$
= \frac{1}{l_n}\sum_{j=1}^{l_n} G\big(X_{n,\delta_{0n,j}}\big) \frac{c_n}{n}\sum_{k=\delta_{1n,j}}^{\delta_{2n,j}} v_k K[c_n(k/n - \tau_j)]
+ \frac{1}{l_n}\sum_{j=1}^{l_n} \frac{c_n}{n}\sum_{k=\delta_{1n,j}}^{\delta_{2n,j}} \big[G(X_{nk}) - G\big(X_{n,\delta_{0n,j}}\big)\big] v_k K[c_n(k/n - \tau_j)]
$$
$$
= \frac{1}{l_n}\sum_{j=1}^{l_n} G\big(X_{n,\delta_{0n,j}}\big) R_{1n,j} + \frac{1}{l_n}\sum_{j=1}^{l_n} R_{2n,j}
$$
$$
= \frac{1}{l_n}\sum_{j=1}^{l_n} G\big(X_{n,\delta_{0n,j}}\big)\, A \int K
+ \frac{1}{l_n}\sum_{j=1}^{l_n} G\big(X_{n,\delta_{0n,j}}\big) \Big[R_{1n,j} - A\int K\Big]
+ \frac{1}{l_n}\sum_{j=1}^{l_n} R_{2n,j}
$$
$$
:= \frac{1}{l_n}\sum_{j=1}^{l_n} G\big(X_{n,\delta_{0n,j}}\big)\, A \int K + R_{1n} + R_{2n}.
$$
Since $\frac{1}{l_n}\sum_{j=1}^{l_n} G\big(X_{n,\delta_{0n,j}}\big) = \int_0^1 G(X_{n,[nt]})\,dt + o_P(1) \to_d \int_0^1 G(X_t)\,dt$, it suffices to show that
$$R_{jn} = o_P(1),\qquad j = 1, 2. \tag{46}$$
To prove (46), we start with some preliminaries. Recalling $X_{n,[nt]} \Rightarrow X_t$ on $D_{\mathbb{R}^p}[0,1]$ and that the limit process $X(t)$ is path continuous, we have $X_{n,[nt]} \Rightarrow X_t$ on $D_{\mathbb{R}^p}[0,1]$ in the sense of the uniform topology. See, for instance, Section 18 of Billingsley (1968).
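As a purely numerical illustration of the window structure in (45) — the kernel choice (Epanechnikov) and the values of $n$, $c_n$, $l_n$ below are assumptions for the sketch, not taken from the paper — the following checks that every nonzero weight $K[c_n(k/n - \tau_j)]$ falls inside the window $[\delta_{1n,j}, \delta_{2n,j}]$:

```python
import numpy as np

def K(x):
    # Epanechnikov kernel: support [-1, 1], matching the normalisation A0 = 1
    return 0.75 * np.maximum(1.0 - x**2, 0.0)

n, l_n, c_n = 2000, 9, 50.0              # illustrative values only
tau = np.arange(1, l_n + 1) / (l_n + 1)  # grid points tau_j = j/(l_n + 1)
k = np.arange(1, n + 1)

for tau_j in tau:
    d1 = max(int(n * (tau_j - 1.0 / c_n)), 1)      # delta_{1n,j}
    d2 = min(int(n * (tau_j + 1.0 / c_n)) + 1, n)  # delta_{2n,j}
    w = K(c_n * (k / n - tau_j))
    active = k[w > 0]
    # nonzero weights are confined to the j-th trimming window
    assert d1 <= active.min() and active.max() <= d2
print("all kernel weights lie inside their trimming windows")
```

Each grid point $\tau_j$ thus contributes a local average over roughly $2n/c_n$ observations, which is the chronological trimming exploited throughout the proof.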
This fact implies that
$$\limsup_{N\to\infty}\ \limsup_{n\to\infty}\ P\Big(\max_{1\le k\le n}\|X_{nk}\| \ge N\Big) = 0, \tag{47}$$
and, by the tightness of $\{X_{n,[nt]}\}_{0\le t\le 1}$, for any $\epsilon > 0$ and $\delta > 0$ there is some $\tilde\delta = \tilde\delta(\epsilon,\delta) > 0$ such that
$$P\Big(\sup_{|s-t|\le\tilde\delta}\|X_{n,[nt]} - X_{n,[ns]}\| \ge \delta\Big) \le \epsilon \tag{48}$$
holds for all sufficiently large $n$. In terms of (48), for any $\delta > 0$, we have
$$\lim_{n\to\infty} P\Big(\max_{1\le j\le l_n}\ \max_{\delta_{1n,j}\le l\le k\le \delta_{2n,j}}\|X_{nk} - X_{nl}\| \ge \delta\Big) = 0. \tag{49}$$
We are now ready to prove (46), starting with $j = 1$. For any $N > 0$, we let $G_N(x) = G(x)\,\xi_N(x)$ with
$$
\xi_N(x) =
\begin{cases}
1, & \|x\| \le N,\\
2 - \|x\|/N, & N < \|x\| < 2N,\\
0, & \|x\| \ge 2N,
\end{cases}
$$
and
$$\widetilde R_{1n} = \frac{1}{l_n}\sum_{j=1}^{l_n} G_N\big(X_{n,\delta_{0n,j}}\big)\Big[R_{1n,j} - A\int K\Big].$$
Then, as $n \to \infty$ first and then $N \to \infty$,
$$P\big(R_{1n} \ne \widetilde R_{1n}\big) \le P\Big(\max_{1\le k\le n}\|X_{nk}\| \ge N\Big) \to 0, \tag{50}$$
and
$$|\widetilde R_{1n}| \le \frac{C_N}{l_n}\sum_{j=1}^{l_n}\Big|R_{1n,j} - A\int K\Big|, \tag{51}$$
where $C_N := \sup_x |G_N(x)| < \infty$ is a constant depending only on $N$, due to the continuity of $G(x)$. Result (46) with $j = 1$ will follow if we prove
$$\max_{1\le j\le l_n} E\Big|R_{1n,j} - A\int K\Big| \to 0 \tag{52}$$
as $n \to \infty$. Indeed, by virtue of (51) and (52), we have $E|\widetilde R_{1n}| \to 0$ and then $\widetilde R_{1n} = o_P(1)$ for each $N \ge 1$. This, together with (50), yields $R_{1n} = o_P(1)$.

Since, as $n \to \infty$,
$$\max_{1\le j\le l_n}\Big|\frac{c_n}{n}\sum_{k=\delta_{1n,j}}^{\delta_{2n,j}} K[c_n(k/n - \tau_j)] - \int K\Big| \to 0, \tag{53}$$
to prove (52), it suffices to show that $\max_{1\le j\le l_n} E|A_n(\tau_j)| \to 0$, where
$$A_n(\tau_j) = \frac{c_n}{n}\sum_{k=\delta_{1n,j}}^{\delta_{2n,j}} (v_k - A)\, K[c_n(k/n - \tau_j)].$$
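The Riemann-sum fact (53) — that $(c_n/n)\sum_k K[c_n(k/n-\tau_j)]$ approaches $\int K$ uniformly in $j$ once $c_n \to \infty$ and $c_n/n \to 0$ — can be checked numerically. The kernel and the rate pairs below are illustrative assumptions, not the paper's choices:

```python
import numpy as np

def K(x):
    # Epanechnikov kernel, so that \int K = 1 exactly
    return 0.75 * np.maximum(1.0 - x**2, 0.0)

l_n = 9
for n, c_n in [(500, 10.0), (5000, 30.0), (50000, 100.0)]:  # c_n/n -> 0
    tau = np.arange(1, l_n + 1) / (l_n + 1)
    k = np.arange(1, n + 1)
    sums = [(c_n / n) * K(c_n * (k / n - t)).sum() for t in tau]
    err = max(abs(s - 1.0) for s in sums)  # max_j |Riemann sum - \int K|
    print(f"n={n:6d}, c_n={c_n:6.1f}: max_j |sum - 1| = {err:.2e}")
```

The discrepancy shrinks with $c_n/n$, mirroring the uniform convergence in (53).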
Let $\gamma = \gamma_n$ be integers such that $\gamma \to \infty$ and $\gamma c_n/n \to 0$, and set $T_{1n,j} = [\delta_{1n,j}/\gamma]$ and $T_{2n,j} = [\delta_{2n,j}/\gamma]$. Noting (45), we may write
$$
|A_n(\tau_j)| = \Big|\frac{c_n}{n}\sum_{k=\delta_{1n,j}}^{\delta_{2n,j}} (v_k - A)\, K[c_n(k/n - \tau_j)]\Big|
= \Big|\frac{c_n}{n}\sum_{s=T_{1n,j}}^{T_{2n,j}}\ \sum_{k=s\gamma}^{(s+1)\gamma} (v_k - A)\, K[c_n(k/n - \tau_j)]\Big|
$$
$$
\le \frac{\gamma c_n}{n}\sum_{s=T_{1n,j}}^{T_{2n,j}} K[c_n(s\gamma/n - \tau_j)]\, \frac{1}{\gamma}\Big|\sum_{k=s\gamma}^{(s+1)\gamma}(v_k - A)\Big|
+ \frac{c_n}{n}\sum_{s=T_{1n,j}}^{T_{2n,j}}\ \sum_{k=s\gamma}^{(s+1)\gamma} |v_k - A|\, \Big|K[c_n(k/n - \tau_j)] - K[c_n(s\gamma/n - \tau_j)]\Big|
$$
$$
:= A_{1n}(\tau_j) + A_{2n}(\tau_j).
$$
Since $\sup_{k\ge 1} E|v_k| < \infty$ by condition (b), it is readily seen from the Lipschitz condition on $K(x)$ that
$$
E A_{2n}(\tau_j) \le C\, \frac{\gamma c_n}{n}\, \frac{c_n}{n}\sum_{k=\delta_{1n,j}}^{\delta_{2n,j}} E|v_k - A| \le C\, \frac{\gamma c_n}{n} \to 0,
$$
uniformly in $1 \le j \le l_n$. Similarly, by using condition (b), we have
$$
\max_{1\le j\le l_n} E A_{1n}(\tau_j) \le \max_{\gamma \le s \le n-\gamma} E\Big|\frac{1}{\gamma}\sum_{k=s}^{s+\gamma} v_k - A\Big|\ \max_{1\le j\le l_n} A_{3n}(\tau_j) \to 0,
$$
where
$$A_{3n}(\tau_j) = \frac{\gamma c_n}{n}\sum_{s=T_{1n,j}}^{T_{2n,j}} K[c_n(s\gamma/n - \tau_j)],$$
and we have used the fact that $\max_{1\le j\le l_n}\big|A_{3n}(\tau_j) - \int K\big| \to 0$. Combining all these facts, we prove (52), and complete the proof of $R_{1n} = o_P(1)$.

We next show $R_{2n} = o_P(1)$. Let $\widetilde R_{2n} = \frac{1}{l_n}\sum_{j=1}^{l_n} \widetilde R_{2n,j}$, where
$$\widetilde R_{2n,j} = \frac{c_n}{n}\sum_{k=\delta_{1n,j}}^{\delta_{2n,j}} \big[G_N(X_{nk}) - G_N\big(X_{n,\delta_{0n,j}}\big)\big]\, v_k\, K[c_n(k/n - \tau_j)].$$
In terms of (47), we have
$$P\big(R_{2n} \ne \widetilde R_{2n}\big) \le P\Big(\max_{1\le k\le n}\|X_{nk}\| \ge N\Big) \to 0,$$
as $n \to \infty$ first and then $N \to \infty$. Result $R_{2n} = o_P(1)$ will follow if we prove $\widetilde R_{2n} = o_P(1)$ for each fixed $N \ge 1$. Recall that $G_N(x)$ is continuous with compact support.
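The blocking argument above relies on an $L_1$ law of large numbers for block means, $\max_{\gamma\le s\le n-\gamma}E\big|\gamma^{-1}\sum_{k=s}^{s+\gamma} v_k - A\big| \to 0$ as the block length $\gamma$ grows. A numerical sketch with an assumed AR(1) specification for $v_k$ (illustrative only, not the paper's condition (b)):

```python
import numpy as np

rng = np.random.default_rng(0)
A, rho, n_rep = 1.0, 0.5, 4000  # mean A, AR(1) coefficient, Monte Carlo reps

def block_mean_l1(gamma):
    # simulate n_rep independent blocks of v_k = A + e_k, with
    # e_k = rho * e_{k-1} + eps_k, and estimate E|block mean - A|
    eps = rng.standard_normal((n_rep, gamma))
    e = np.zeros((n_rep, gamma))
    e[:, 0] = eps[:, 0]
    for k in range(1, gamma):
        e[:, k] = rho * e[:, k - 1] + eps[:, k]
    return np.abs((A + e).mean(axis=1) - A).mean()

for gamma in (10, 100, 1000):
    print(f"gamma={gamma:5d}: E|block mean - A| ~ {block_mean_l1(gamma):.3f}")
```

The $L_1$ error of the block mean decays as $\gamma$ increases, which is exactly what the bound on $E A_{1n}(\tau_j)$ uses.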
For any $\epsilon > 0$, there exists a $\delta_\epsilon > 0$ such that $|G_N(x) - G_N(y)| \le \epsilon$ whenever $\|x - y\| \le \delta_\epsilon$. Write
$$\Omega_{\delta_\epsilon} = \Big\{\omega:\ \max_{1\le j\le l_n}\ \max_{\delta_{1n,j}\le l\le k\le \delta_{2n,j}}\|X_{nk} - X_{nl}\| \le \delta_\epsilon\Big\}.$$
By virtue of the facts above and (53), it is readily seen that
$$
\max_{1\le j\le l_n} E\big[|\widetilde R_{2n,j}|\, I(\Omega_{\delta_\epsilon})\big]
\le E\Big\{\max_{1\le j\le l_n}\ \max_{\delta_{1n,j}\le l\le k\le \delta_{2n,j}} |G_N(X_{nk}) - G_N(X_{nl})|\, I(\Omega_{\delta_\epsilon})\ \frac{c_n}{n}\sum_{k=\delta_{1n,j}+1}^{\delta_{2n,j}} |v_k|\, K[c_n(k/n - \tau_j)]\Big\}
$$
$$
\le \epsilon\, \sup_{k\ge 1} E|v_k|\ \frac{c_n}{n}\sum_{k=\delta_{1n,j}+1}^{\delta_{2n,j}} K[c_n(k/n - \tau_j)] \le C_N\,\epsilon,
$$
where $C_N$ is a constant depending only on $N$. Now, for any $\eta_1 > 0$ and $\eta_2 > 0$, let $\epsilon = \eta_1\eta_2$ and $n_0$ be large enough so that, for all $n \ge n_0$ [recall (49)],
$$P\Big(\max_{1\le j\le l_n}\ \max_{\delta_{1n,j}\le l\le k\le \delta_{2n,j}}\|X_{nk} - X_{nl}\| \ge \delta_\epsilon\Big) \le \eta_1.$$
It is readily seen that, for all $n \ge n_0$,
$$
P\big(|\widetilde R_{2n}| \ge \eta_2\big) \le P\big(\bar\Omega_{\delta_\epsilon}\big) + \eta_2^{-1}\,\frac{1}{l_n}\sum_{j=1}^{l_n} E\big[|\widetilde R_{2n,j}|\, I(\Omega_{\delta_\epsilon})\big] \le C_N\, \eta_1,
$$
where $\bar\Omega_{\delta_\epsilon}$ denotes the complement of $\Omega_{\delta_\epsilon}$ and $C_N$ is a constant depending only on $N$. This yields $\widetilde R_{2n} = o_P(1)$ for each fixed $N \ge 1$, and completes the proof of $R_{2n} = o_P(1)$.

We finally remove the restriction on $K$ and then conclude the proof of Lemma 1. If $K$ has compact support, then there exists $A_0 > 0$ such that $K(x) = 0$ holds for all $|x| \ge A_0$.
If $K$ is eventually monotonic, then for any $\epsilon > 0$ we can also choose a constant $A_0 := A_0(\epsilon) > 0$ such that $K(x)$ is monotonic on $(-\infty, -A_0)$ and $(A_0, \infty)$ and $\int_{|x|>A_0} K(x)\,dx < \epsilon$ (to simplify the notation, here we use the same symbol $A_0$ to denote the constant).

Since $K \ge 0$ with $\int K < \infty$, for any $\epsilon > 0$ there exists an $A := A_\epsilon \ge A_0 + 1$ such that
$$\int |K - K_{\epsilon,A}| \le \epsilon,$$
where $K_{\epsilon,A}(x) = 0$ if $|x| \ge A$ and $K_{\epsilon,A}(x)$ is Lipschitz continuous on $\mathbb{R}$. Let $\widetilde K(x) = K(x) - K_{\epsilon,A}(x)$ and
$$S_{n,\epsilon} = \frac{1}{l_n}\sum_{j=1}^{l_n}\frac{c_n}{n}\sum_{k=1}^n G(X_{nk})\, v_k\, \widetilde K[c_n(k/n - \tau_j)].$$
It suffices to show that, as $n \to \infty$ first and then $\epsilon \to 0$,
$$S_{n,\epsilon} = o_P(1). \tag{54}$$
The proof of (54) is similar to that of (46). Indeed, by letting
$$S_{n,\epsilon,N} = \frac{1}{l_n}\sum_{j=1}^{l_n}\frac{c_n}{n}\sum_{k=1}^n G_N(X_{nk})\, v_k\, \widetilde K[c_n(k/n - \tau_j)],$$
we have
$$P\big[S_{n,\epsilon} \ne S_{n,\epsilon,N}\big] \le P\Big(\max_{1\le k\le n}\|X_{nk}\| \ge N\Big) \to 0,$$
as $n \to \infty$ first and then $N \to \infty$. Hence it suffices to show that, for each fixed $N \ge 1$, $S_{n,\epsilon,N} = o_P(1)$ as $n \to \infty$ first and then $\epsilon \to 0$. Note that
$$\sup_{1\le j\le l_n}\Big|\frac{c_n}{n}\sum_{k=1}^n \big|\widetilde K[c_n(k/n - \tau_j)]\big|\, I\big(c_n|k/n - \tau_j| \le A\big) - \int_{-A}^{A} |\widetilde K(x)|\,dx\Big| \to 0,
$$
and, if $K(x)$ is monotonic on $(-\infty, -A)$ and $(A, \infty)$, then for sufficiently large $n$, uniformly for $1 \le j \le l_n$,
$$
\frac{c_n}{n}\sum_{k=1}^n \big|\widetilde K[c_n(k/n - \tau_j)]\big|\, I\big(c_n|k/n - \tau_j| > A\big)
= \frac{c_n}{n}\sum_{k=1}^n K[c_n(k/n - \tau_j)]\, I\big(c_n|k/n - \tau_j| > A\big)
\le \int_{|x| > A - c_n/n} K(x)\,dx \le \int_{|x| > A_0} K(x)\,dx < \epsilon.
$$
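The truncation device used here — approximating a nonnegative integrable $K$ by a Lipschitz, compactly supported $K_{\epsilon,A}$ with $\int|K - K_{\epsilon,A}| \le \epsilon$ — can be sketched numerically. The Gaussian kernel and the linear taper below are one possible construction, assumed for illustration rather than taken from the paper:

```python
import math
import numpy as np

def K(x):
    # Gaussian kernel: nonnegative, integrable, not compactly supported
    return np.exp(-x**2 / 2.0) / math.sqrt(2.0 * math.pi)

def K_trunc(x, A):
    # Lipschitz approximation: K tapered linearly to zero on |x| in [A-1, A],
    # identically zero for |x| >= A
    return K(x) * np.clip(A - np.abs(x), 0.0, 1.0)

x = np.linspace(-10.0, 10.0, 400001)
dx = x[1] - x[0]
for A in (3.0, 5.0, 7.0):
    l1 = np.abs(K(x) - K_trunc(x, A)).sum() * dx  # \int |K - K_{eps,A}|
    print(f"A={A}: int |K - K_eps,A| ~ {l1:.2e}")
```

Increasing $A$ drives $\int|K - K_{\epsilon,A}|$ below any prescribed $\epsilon$, which is all the proof requires.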
Hence, in terms of the uniform boundedness of $G_N(x)$, we have
$$
E|S_{n,\epsilon,N}| \le C_N\, \sup_{k\ge 1} E|v_k|\ \frac{1}{l_n}\sum_{j=1}^{l_n}\frac{c_n}{n}\sum_{k=1}^n \big|\widetilde K[c_n(k/n - \tau_j)]\big| \to 0,
$$
as $n \to \infty$ first and then $\epsilon \to 0$. Hence $S_{n,\epsilon,N} = o_P(1)$ as $n \to \infty$ first and then $\epsilon \to 0$. The proof of (54) is completed. $\Box$

We first prove (27). Using similar arguments as in the proof of (46) or (54), it suffices to show that, as $n \to \infty$,
$$I_n := \frac{c_n}{n}\sum_{k=1}^{n}\sum_{1\le i \le l_n} \cdots$$

References

[1] Journal of Econometrics, 349–374.
[2] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York.
[3] Bollerslev, T., Osterrieder, D., Sizova, N. and Tauchen, G. (2013). Risk and return: long-run relations, fractional cointegration, and return predictability. Journal of Financial Economics, 409–424.
[4] Breitung, J. and Demetrescu, M. (2015). Instrumental variable and variable addition based inference in predictive regressions. Journal of Econometrics 187(1), 358–375.
[5] Buchmann, B. and Chan, N.H. (2007). Asymptotic theory of least squares estimators for nearly unstable processes under strong dependence. Annals of Statistics 35(5), 2001–2017.
[6] Campbell, J.Y. and Yogo, M. (2006). Efficient tests of stock return predictability. Journal of Financial Economics 81, 27–60.
[7] Chan, N. and Wang, Q. (2015). Nonlinear regressions with nonstationary time series. Journal of Econometrics 185, 182–195.
[8] Christopeit, N. (2009). Weak convergence to nonlinear transformations of integrated processes: the multivariate case. Econometric Theory 25, 1180–1207.
[9] Demetrescu, M., Georgiev, I., Rodrigues, P. and Taylor, R. (2020). Testing for episodic predictability in stock returns. Journal of Econometrics, in press.
[10] Duffy, J.A. and Kasparis, I. (2018). Regressions with fractional $d = 1/2$ and weakly nonstationary processes. Mimeo, arXiv:1812.07944.
[11] Giraitis, L. and Phillips, P.C.B. (2006).
Uniform limit theory for stationary autoregression. Journal of Time Series Analysis 27(1), 51–60.
[12] Hall, P. and Heyde, C.C. (1980). Martingale Limit Theory and Its Application. Academic Press, New York.
[13] Hjalmarsson, E. (2011). New methods for inference in long-horizon regressions. Journal of Financial and Quantitative Analysis 46, 815–839.
[14] Hosseinkouchack, M. and Demetrescu, M. (2019). Finite-sample size control of IVX-based tests in predictive regressions. Mimeo.
[15] Hu, Z., Phillips, P.C.B. and Wang, Q. (2019). Nonlinear cointegrating power regression with endogeneity. Preprint, Cowles Foundation Discussion Papers No. 2211.
[16] Hualde, J. and Robinson, P.M. (2010). Semiparametric inference in multivariate fractionally cointegrated systems. Journal of Econometrics 157(2), 492–511.
[17] Johansen, S. (1995). Likelihood-Based Inference in Cointegrated Vector Auto-Regressive Models. Oxford University Press, New York.
[18] Kallenberg, O. (2002). Foundations of Modern Probability, Second Edition. Springer-Verlag, Berlin.
[19] Kasparis, I. (2010). The Bierens test for certain nonstationary models. Journal of Econometrics 158, 221–230.
[20] Kasparis, I., Andreou, E. and Phillips, P.C.B. (2015). Nonparametric predictive regression. Journal of Econometrics 185(2), 468–494.
[21] Kostakis, A., Magdalinos, T. and Stamatogiannis, M.P. (2015). Robust econometric inference for stock return predictability. Review of Financial Studies 28(5), 1506–1553.
[22] Magdalinos, T. and Phillips, P.C.B. (2009). Econometric inference in the vicinity of unity. Mimeo, Singapore Management University.
[23] Marmer, V. (2007). Nonlinearity, nonstationarity and spurious forecasts. Journal of Econometrics 138, 1–27.
[24] Mikusheva, A. (2007). Uniform inference in autoregressive models. Econometrica 75(5), 1411–1452.
[25] Park, J.Y. (2003). Nonstationary nonlinear heteroskedasticity. Journal of Econometrics 110, 383–415.
[26] Park, J.Y. and Phillips, P.C.B. (1999).
Asymptotics for nonlinear transformations of integrated time series. Econometric Theory 15(3), 269–298.
[27] Park, J.Y. and Phillips, P.C.B. (2001). Nonlinear regressions with integrated time series. Econometrica 69(1), 117–161.
[28] Phillips, P.C.B. (1991). Optimal inference in cointegrated systems. Econometrica 59(2), 283–306.
[29] Phillips, P.C.B. (1995). Fully modified least squares and vector autoregression. Econometrica 63(5), 1023–1078.
[30] Phillips, P.C.B. (2014). On confidence intervals for autoregressive roots and predictive regressions. Econometrica 82(3), 1177–1195.
[31] Phillips, P.C.B. (2015). Pitfalls and possibilities in predictive regression. Journal of Financial Econometrics 13(3), 521–555.
[32] Phillips, P.C.B. and Hansen, B. (1990). Statistical inference in instrumental variables regression with I(1) processes. Review of Economic Studies 57(1), 99–125.
[33] Phillips, P.C.B., Li, D. and Gao, J. (2017). Estimating smooth structural change in cointegration models. Journal of Econometrics 196, 180–195.
[34] Phillips, P.C.B. and Magdalinos, T. (2007). Limit theory for moderate deviations from a unit root. Journal of Econometrics 136(1), 115–130.
[35] Robinson, P.M. (1995). Gaussian semiparametric estimation of long range dependence. Annals of Statistics 23(5), 1630–1661.
[36] Robinson, P.M. and Hualde, J. (2003). Cointegration in fractional systems with unknown integration orders. Econometrica 71(6), 1727–1766.
[37] Shimotsu, K. and Phillips, P.C.B. (2005). Exact local Whittle estimation of fractional integration. Annals of Statistics 33(4), 1890–1933.
[38] Wang, Q. (2014). Martingale limit theorem revisited and nonlinear cointegrating regression. Econometric Theory 30(3), 509–535.
[39] Wang, Q. (2015). Limit Theorems for Nonlinear Cointegrating Regression. World Scientific, Singapore.
[40] Wang, Q. and Phillips, P.C.B. (2009a). Asymptotic theory for local time density estimation and nonparametric cointegrating regression.