Locally trimmed least squares: conventional inference in possibly nonstationary models
Zhishui Hu∗, Ioannis Kasparis† and Qiying Wang‡

June 24, 2020
Abstract
A novel IV estimation method, which we term Locally Trimmed LS (LTLS), is developed that yields estimators with (mixed) Gaussian limit distributions in situations where the data may be weakly or strongly persistent. In particular, we allow for nonlinear predictive-type regressions where the regressor can be a stationary short/long memory process as well as a nonstationary long memory process or a nearly integrated array. The resultant t-tests have conventional limit distributions (i.e. N(0,1)) free of (near to unity and long memory) nuisance parameters. In the case where the regressor is a fractional process, no preliminary estimator for the memory parameter is required. Therefore, the practitioner can conduct inference while being agnostic about the exact dependence structure in the data. The LTLS estimator is obtained by applying certain chronological trimming to the OLS instrument via the utilisation of appropriate kernel functions of time trend variables. The finite sample performance of LTLS based t-tests is investigated with the aid of a simulation experiment. An empirical application to the predictability of stock returns is also provided.

It is well known that under nonstationarity regression estimators do not have conventional limit distributions in general. As a consequence, the inferential procedures developed for stationary data are not applicable under nonstationarity. A number of early studies in the area of nonstationary econometrics (e.g. Phillips and Hansen, 1990; Johansen, 1995; Phillips, 1995; Robinson and Hualde, 2003) develop inferential procedures suitable for nonstationary models; however, these methods are

∗ International Institute of Finance, School of Management, University of Science and Technology of China, Hefei, Anhui 230026, China, email: [email protected].
† University of Cyprus, Nicosia, Cyprus, email: [email protected].
‡ School of Mathematics and Statistics, The University of Sydney, NSW 2006, Australia, email: [email protected].
not valid in general under stationarity. In fact, it is well known that methods such as FMLS (c.f. Phillips, 1995) may exhibit severe size distortions even under local deviations from the (fractional) unit root paradigm. This duality in inference has made empirical work in time series econometrics elusive. Practitioners typically need to make preliminary (sometimes ad hoc) assumptions about the persistence level in the data, or apply some sort of pre-testing (and therefore expose inference to the problems associated with pre-testing) before proceeding to estimation and inference. A number of studies has attempted to address this issue using conservative confidence intervals (for a review see Mikusheva (2007), Phillips (2014) and the references therein). The more recent work of Magdalinos and Phillips (2009; MP hereafter) (see also Kostakis, Magdalinos and Stamatogiannis (2015) for refinements and additional results) follows a completely different direction. MP propose an IV estimator (IVX) that has a mixed Gaussian limit distribution at the expense of an arbitrarily small reduction in the convergence rate, relative to that of the OLS estimator.

In this paper we follow an approach similar to the pioneering work of MP. To fix ideas, consider the simple model

    y_k = β x_k + u_k,   k = 1, ..., n,   (1)

where x_k is a nearly integrated (NI) array, predetermined with respect to some martingale difference error term (u_k). MP construct the so-called IVX instrument by applying the following linear filtering to the OLS instrument (x_k):

    Z_{kn} = Σ_{j=0}^{k-1} (1 + c_z n^{-b})^j (x_{k-j} − x_{k-j-1}),   (2)

for some c_z < 0 and 0 < b < 1. This linear filtering transforms x_k into a mildly integrated process (e.g. see Giraitis and Phillips, 2006; Phillips and Magdalinos, 2007) that is less persistent than a NI array (e.g. x_k).
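For concreteness, the IVX filtering in (2) can be sketched in a few lines. This is a hypothetical implementation based on our reading of the (partly garbled) display, with ρ = 1 + c_z n^{-b}, c_z < 0; the tuning values are illustrative only.

```python
import numpy as np

def ivx_instrument(x, c_z=-1.0, b=0.9):
    """Sketch of the IVX instrument (2): Z_k = sum_j rho**j * (x_{k-j} - x_{k-j-1}),
    with rho = 1 + c_z * n**(-b), c_z < 0, 0 < b < 1 (a mildly integrated filter)."""
    n = len(x)
    rho = 1.0 + c_z * n ** (-b)
    dx = np.diff(x, prepend=0.0)      # x_k - x_{k-1}, taking x_0 = 0
    z = np.empty(n)
    s = 0.0
    for k in range(n):                # recursion Z_k = rho * Z_{k-1} + dx_k
        s = rho * s + dx[k]
        z[k] = s
    return z
```

The recursion is algebraically identical to the weighted sum in (2); the filtered series is less persistent than the level series x_k.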
By choosing b arbitrarily close to unity, the reduction in the signal of the instrument results in an arbitrarily small reduction in the convergence rate of the IVX estimator, relative to that of the OLS, and this is sufficient for a martingale CLT to operate, rendering IVX based inference conventional. The choice of b is important to inference, with smaller b resulting in better size control at the expense of asymptotic power. Note that as b ↑ 1, Z_{kn} approximates the NI process x_k and the IVX estimator resembles the behaviour of the OLS estimator. The recent work of Yang, Long, Peng and Cai (2019) generalises the IVX method to regression models with serially correlated regression (parametric AR) errors, whilst Demetrescu, Georgiev, Rodrigues and Taylor (2020) apply a modified version of the IVX estimator to test for episodic predictability in stock returns.

We consider an alternative method for reducing the signal of the OLS instrument. Let K be an integrable kernel function and set

    Z_{kn} = K[c_n(k/n − τ)] x_k,

where c_n is a positive deterministic sequence such that c_n^{-1} + c_n n^{-1} → 0 and 0 < τ < 1. For simplicity set τ = 1/2 and K(0) = 1. In this case the kernel function extracts information from the OLS instrument for observations near the middle of the sample. In particular, Z_{kn} ≈ x_k when k ≈ n/2, and Z_{kn} ≈ 0 when k is far from n/2. In other words, certain chronological trimming applies around the "chronological point τ". By allowing the c_n sequence to diverge at an arbitrarily slow rate, the resultant IV (LTLS) estimator attains an arbitrarily slower convergence rate relative to the OLS estimator. In principle, it is possible to extract information around multiple chronological points 0 < τ_1 < ... < τ_{l_n} < 1, where l_n is either fixed or l_n → ∞ such that l_n = o(c_n). In this case the relevant instrument is

    Z_{kn} = Σ_{j=1}^{l_n} K[c_n(k/n − τ_j)] x_k.   (3)
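The chronological trimming in (3) amounts to reweighting x_k by kernel weights centred at the cps. A minimal sketch; the quadratic compact-support kernel with K(0) = 1 is an illustrative choice, not the paper's.

```python
import numpy as np

def ltls_instrument(x, c_n, taus, K=lambda u: np.maximum(1.0 - u ** 2, 0.0)):
    """LTLS instrument (3): Z_kn = sum_j K[c_n * (k/n - tau_j)] * x_k."""
    n = len(x)
    k_over_n = np.arange(1, n + 1) / n
    weights = sum(K(c_n * (k_over_n - tau)) for tau in taus)
    return weights * x
```

With a single cp τ = 1/2, Z_{kn} ≈ x_k for k near n/2 and Z_{kn} = 0 outside the kernel's support, which is exactly the trimming effect described above.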
As long as the LTLS estimator converges at a slower rate than the OLS estimator, the limit theory is mixed Gaussian for nonstationary regressor covariates and Gaussian for stationary ones. In particular, the reduction in the signal of the OLS instrument allows a martingale CLT (c.f. Wang, 2014) to operate even if x_k is nonstationary. Notice that if c_n is too small or if too many chronological points (l_n) are employed, then Z_{kn} approximates the OLS instrument and as a consequence LTLS based inference resembles OLS based inference. This can be easily seen if a vanishing sequence c_n is employed: for c_n → 0, Z_{kn} ≈ l_n K(0) x_k.

Our theoretical framework allows for a wide range of stationary and nonstationary linear processes as well as NI arrays. In particular, x_k can be a stationary or a nonstationary fractional process. Consider the LTLS estimator of β in (1) that utilises the instrument of (3), i.e. β̂ = Σ_{k=1}^n Z_{kn} y_k / Σ_{k=1}^n Z_{kn} x_k. Let t ∈ [0,1] and suppose that x_k is a nonstationary process such that for some d_n → ∞, d_n^{-1} x_{⌊nt⌋} ⇒ X_t in D[0,1], where X_t is a continuous process. For instance, X_t can be a fractional BM or a fractional Ornstein-Uhlenbeck process (see Remark 1 below) depending on some memory or near-to-unity nuisance parameter. Then we have

    d_n √(n l_n / c_n) (β̂ − β) →_d MN( 0, E(u_1^2) ∫_R K^2(x)dx / [ (∫_R K(x)dx)^2 ∫_0^1 X_t^2 dt ] ).

Because c_n → ∞ and l_n = o(c_n), the convergence rate of the LTLS estimator is slower than that of the OLS estimator (d_n √n). Further, note that nuisance parameters affect the limit distribution only via the mixing variate [∫_0^1 X_t^2 dt]^{-1}, and as a consequence the studentised LTLS estimator has a standard normal limit distribution.
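The display above can be illustrated by simulation. The following sketch generates model (1) with a NI regressor and computes the LTLS estimator; all tuning values (kernel, c_n, l_n) are illustrative assumptions, not the paper's recommendations.

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta, c = 2000, 0.5, -5.0
K = lambda u: np.maximum(1.0 - u ** 2, 0.0)   # compact-support kernel, K(0) = 1

# NI regressor x_k = (1 + c/n) x_{k-1} + xi_k and y_k = beta * x_k + u_k
xi = rng.standard_normal(n)
u = rng.standard_normal(n)
x = np.empty(n)
x[0] = xi[0]
for k in range(1, n):
    x[k] = (1.0 + c / n) * x[k - 1] + xi[k]
y = beta * x + u

# instrument (3) with l_n chronological points and slowly diverging c_n
c_n = n ** 0.4
l_n = max(int(np.log(n)), 1)                  # l_n -> infinity with l_n = o(c_n)
taus = [(j + 1) / (l_n + 1) for j in range(l_n)]
kk = np.arange(1, n + 1) / n
Z = sum(K(c_n * (kk - t)) for t in taus) * x
beta_hat = (Z @ y) / (Z @ x)                  # LTLS estimate of beta
```

Despite using only the observations near the cps, the estimate is close to the true β, while the reduced instrument signal is what makes the mixed Gaussian limit theory operate.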
Interestingly, the limit variance shown above is the same, up to a constant, as that of the FMLS estimator for the case where x_k ∼ I(1).

We mention that the constant that features in the limit variance of the LTLS estimator above can be made arbitrarily small by an appropriate choice of the kernel function. For example, suppose that K(x) = (2πς^2)^{-1/2} exp(−x^2/(2ς^2)). Then

    E(u_1^2) ∫_R K^2(x)dx / ( ∫_R K(x)dx )^2 = E(u_1^2) / (2√π ς) → 0

as ς → ∞. Nevertheless, choosing a large value of the kernel variance parameter has the same effect as choosing a small value for c_n. Therefore, as ς → ∞, the LTLS estimator approximates the OLS estimator.

It should be further noted that for nonstationary fractional covariates (i.e. I(d), d > 1/2), methods like FMLS (e.g. Phillips, 1995) or the spectral GLS of Robinson and Hualde (2003) (see also Hualde and Robinson, 2010) are asymptotically equivalent to Gaussian pseudo maximum likelihood and therefore asymptotically efficient (c.f. Phillips, 1991). The key feature of these methods is to induce asymptotically mixed Gaussian estimators by a certain modification of the dependent variable that involves (fractionally) differencing the covariates. In the context of (1) such differencing takes the form (I − L)^{d̂} x_k, where L is the lag operator and d̂ is a preliminary estimator for the memory parameter of x_k. Nevertheless, if there is a local deviation (of order O(n^{-1})) from the (fractional) unit root model, the aforementioned methods yield mixed Gaussian limit theory only if the following quasi fractional differencing is applied:

    (I − (1 + c/n) L)^{d̂} x_k,

where c is a local to unity parameter. A non-trivial value for the local to unity parameter, however, renders the aforementioned methods infeasible because of the lack of identifiability of c.
It is well known that if c ≠ 0, inference based on methods like FMLS is prone to severe size distortions even if there is moderate correlation between the regressor and the regression error.

The remainder of this work is organised as follows. Section 2 provides basic limit theory for locally trimmed functionals of stationary and nonstationary processes. This limit theory is utilised

[Footnote: For the Gaussian kernel above, ∫_R K^2(x)dx = (2πς^2)^{-1} ∫_R exp(−x^2/ς^2)dx = (2πς^2)^{-1} √(πς^2) = 1/(2√π ς).]
in Section 3 for exploring the limit properties of LTLS estimation and inference. Section 4 provides a simulation study and Section 5 an empirical application on the predictability of stock returns.

Throughout this paper we make use of the following notation. For two deterministic sequences a_n and b_n, a_n ∼ b_n denotes lim_{n→∞} a_n/b_n = 1. 1{A} is the indicator function of the set A. We may write the integral ∫_R f(x)dx as ∫f. ⇒ denotes weak convergence in the space D[0,1]. For a vector x, ‖x‖ is its inner product norm and x' its transpose. By [x] we denote the integer part of a positive number x. Finally, diag{a_1, ..., a_p} denotes a p × p diagonal matrix with elements {a_1, ..., a_p} on the main diagonal, →_d denotes convergence in distribution, and Y := MN(0, Σ) denotes a Gaussian variate (mixed normal) with characteristic function f(t) = E e^{it'Y} = E e^{−t'Σt/2}.

In this section we develop basic limit theory for locally trimmed (LT) sample functionals of stationary and nonstationary processes. Our basic limit theory is utilised in Section 3 for the asymptotic analysis of the LTLS estimator. Let {x_k}_{1≤k≤n} be a scalar time series process and {X_{nk}}_{1≤k≤n, n≥1} be some scalar random array. Further, let K be an integrable kernel function and g(.) = [g_1(.), ..., g_p(.)]', where, for each i = 1, ..., p, g_i is a measurable function. For l ∈ N and 0 < τ_1 < ...
< τ_l < 1, set

    S^1_{n,l} = (c_n/n) Σ_{k=1}^n g(x_k) (1/l) Σ_{j=1}^l K[c_n(k/n − τ_j)],
    M^1_{n,l} = √(c_n/n) Σ_{k=1}^n g(x_k) (1/√l) Σ_{j=1}^l K[c_n(k/n − τ_j)] u_k,
    S^2_{n,l} = (c_n/n) Σ_{k=1}^n g(X_{nk}) (1/l) Σ_{j=1}^l K[c_n(k/n − τ_j)],
    M^2_{n,l} = √(c_n/n) Σ_{k=1}^n g(X_{nk}) (1/√l) Σ_{j=1}^l K[c_n(k/n − τ_j)] u_k,

where c_n is a sequence of positive constants, l is either fixed or l → ∞ as n → ∞, and u_k together with an appropriate filtration {F_k} forms a martingale difference sequence (such that X_{nk}, x_k are F_{k-1}-measurable). The limit theory of the LTLS estimator relies on the asymptotics of {S^j_{n,l}, M^j_{n,l}}_{j=1,2}. Limit theory for the functionals {S^1_{n,l}, M^1_{n,l}} is relevant for stationary regressors, whilst that for {S^2_{n,l}, M^2_{n,l}} is relevant for nonstationary ones. In fact, it is assumed that X_{nk} satisfies some FCLT. The term S^2_{n,l} resembles certain functionals considered by Phillips, Li and Gao (2017), who study the estimation of cointegrated models with smooth time varying parameters (TVP). The aforementioned work considers terms of the form

    (c_n/n) Σ_{k=1}^n X_{nk}^2 K[c_n(k/n − τ)],   0 < τ < 1,

where X_{nk} is an I(1) process normalised by √n. As explained below, under our assumptions X_{nk} can be an appropriately normalised I(d), d > 1/2, process or a NI array (possibly driven by fractional errors). Therefore the limit results provided in this section are also relevant to the estimation of TVP models for the case where the covariate is a general nonstationary process satisfying some FCLT (see Assumption A3 below).

To facilitate basic limit results, we make use of the following conditions.
A1 (innovations): {η_k, F_k}_{k≥1}, where η_k' = (ξ_{k+1}, u_k) and F_k = σ(u_k, u_{k-1}, ..., u_1; ξ_j, j ≤ k+1), forms a 2-dimensional martingale difference sequence satisfying the following conditions:

(a) sup_{k≥1} E( u_k^2 I(|u_k| ≥ M) | F_{k-1} ) = o_P(1), as M → ∞;
(b) sup_{k≥1} E( ξ_k^2 I(|ξ_k| ≥ M) | F_{k-1} ) = o_P(1), as M → ∞;
(c) there exists a positive definite matrix

    Σ = [ σ_ξ^2   σ_{ξu} ;
          σ_{uξ}  σ_u^2 ]

so that, for all k ≥ 1, E(η_k η_k' | F_{k-1}) = Σ, a.s.

A2 (stationary process): x_k is an ergodic (strictly) stationary random sequence and a functional of ξ_k, ξ_{k-1}, ..., satisfying E‖g(x_k)‖^{2+δ} < ∞ for some δ > 0.

A3 (nonstationary process and invariance principle): X_{nk} = d_n^{-1} x_k, where 0 < d_n^2 = var(x_n) → ∞ and x_k is a functional of ξ_k, ξ_{k-1}, ... (dependence on n is allowed) so that, on D_{R^3}[0,1],

    ( n^{-1/2} Σ_{k=1}^{[nt]} ξ_k,  n^{-1/2} Σ_{k=1}^{[nt]} ξ_{-k},  X_{n,[nt]} ) ⇒ (B_{1t}, B_{2t}, X_t),   (4)

where B_{1t} and B_{2t} are two independent Gaussian processes with mean zero and stationary independent increments, and X_t is a continuous process that depends only on functionals of {B_{1t}}_{0≤t≤1} and {B_{2t}}_{0≤t≤1}.

A4 (kernel function and restrictions on τ_j, l_n and c_n):

(a) K(x) is a positive real function having compact support;
(b) 0 < c_n → ∞ and c_n/n → 0;
(c) τ_j = j/(l_n + 1), where j = 1, ..., l_n, with l_n^{-1} + c_n^{-1} l_n → 0.

We remark that the innovation process {η_k, F_k}_{k≥1} used in A1 is standard in the literature, so that both M^1_{n,l} and M^2_{n,l} have a martingale structure. The uniform integrability conditions (a) and (b) are weak in comparison with the higher moment conditions used in previous works; see, for instance, Wang (2014) and Wang and Phillips (2009a, b). Since Σ is required to be a positive definite constant matrix, condition (c) excludes u_k from being an ARCH or GARCH process.
Condition (c) is required for technical reasons and seems difficult to relax at the moment.

Stationary processes as given in A2 are used extensively in empirical applications; examples include short and long memory (fractional) processes. Typical examples of nonstationary processes satisfying A3 have the form

    x_k = ρ x_{k-1} + Σ_{i=0}^∞ φ_i ξ_{k-i},

where ρ = 1 + c/n with c ∈ R and Σ_{i=0}^∞ φ_i^2 < ∞. For the latter specification, (4) holds with X_t being a fractional Ornstein-Uhlenbeck process. See, for instance, Buchmann and Chan (2007), Wang and Phillips (2009a, b) and Wang (2015).

As for A4, the restriction to compactly supported K(x) can be relaxed if we impose more conditions on l_n. Indeed, in the following main results, A4 can be replaced by:

A4* (kernel function and restrictions on τ_j, l_n and c_n):

(a) K(x) is an eventually monotonic (i.e., there exists A_0 > 0 such that K(x) is monotonic on (−∞, −A_0) and (A_0, ∞)) positive function so that K(x) ≤ C/(1 + |x|) and ∫K < ∞;
(b) 0 < c_n → ∞ and c_n/n → 0;
(c) τ_j = j/(l_n + 1), where j = 1, ..., l_n, with l_n^{-1} + c_n^{-1} l_n log n → 0.

We now introduce the limit theory for LT sample functionals. Since there are essential differences between M^1_{n,l} and M^2_{n,l}, the main results are presented separately for stationary and nonstationary processes.

Theorem 1.
Suppose A2 and A4 or A4* hold. Then, as n → ∞, we have

    S^1_{n,l_n} = Eg(x_1) ∫K + o_P(1).   (5)

If in addition A1 holds, then, as n → ∞,

    M^1_{n,l_n} →_d N( 0, σ_u^2 E[g(x_1) g(x_1)'] ∫K^2 ).   (6)

Theorem 2.
Suppose that A3 and A4 or A4* hold and g(.) is continuous. Then, as n → ∞, we have

    S^2_{n,l_n} = ∫_0^1 g(X_{n,⌊nt⌋}) dt ∫K + o_P(1) →_d ∫_0^1 g(X_t) dt ∫K.   (7)

If in addition A1 holds, jointly with (7), we have

    M^2_{n,l_n} →_d MN( 0, σ_u^2 ∫_0^1 g(X_t) g(X_t)' dt ∫K^2 ).   (8)

Remark 1. If we are only interested in results similar to (5) and (7), conditions A2 and A3 can be weakened. For instance, result (7) still holds if (4) is replaced by X_{n,[nt]} ⇒ X_t on D_R[0,1]. See Lemma 1 in Section 6 for more details. Furthermore, if x_k is a weakly nonstationary process (i.e., I(1/2) and mildly integrated processes, for which FCLTs do not apply) as considered in Phillips and Magdalinos (2007) and Duffy and Kasparis (2018), some preliminary calculations suggest (see also Theorem 2.2 in Duffy and Kasparis, 2018) that

    (c_n/n) Σ_{k=1}^n g(d_n^{-1} x_k) { (1/l_n) Σ_{j=1}^{l_n} K[c_n(k/n − τ_j)] } →_d ∫_R g(x + X_−) φ_σ(x) dx ∫K,

where φ_σ(x) is the density of a N(0, σ^2) variate (σ^2 > 0) and X_− ∼ N(0, σ_−^2) (σ_−^2 ≥ 0). Discussion of this kind of generalisation, together with the investigation of trimmed sample functionals of weakly nonstationary processes, is left for future work.

Remark 2. The continuity requirement in Theorem 2 is not essential for (7) and (8). These results can be extended to the case where g is locally Lebesgue integrable, if we impose more smoothness conditions on X_{nk} (see for example Christopeit (2009) and the references therein). This kind of generalisation involves more complicated derivations and will not be pursued here in order to keep the paper at a reasonable length.

Remark 3. Following the proof of Theorem 1, it is easy to see that results (5) and (6) still hold if A4(c) is replaced by τ_j = j/(l + 1), where j = 1, ..., l, i.e., if l_n ≡ l is fixed.
As for (7) and (8), if A4(c) is replaced by τ_j = j/(l + 1), where j = 1, ..., l, we have

    [S^2_{n,l}, M^2_{n,l}] →_d [ (1/l) Σ_{j=1}^l g(X_{τ_j}) ∫K,  MN( 0, σ_u^2 (1/l) Σ_{j=1}^l g(X_{τ_j}) g(X_{τ_j})' ∫K^2 ) ].

The results above involve rescaled processes (i.e. d_n^{-1} x_k, as given in A3). For the purposes of regression analysis, limit theory for non-rescaled processes (i.e. x_k) is more relevant. Following Park and Phillips (1999, 2001), we assume that the function g(.) = [g_1(.), ..., g_p(.)]' is asymptotically homogeneous, i.e. for large λ,

    g_i(λx) ≈ π_i(λ) H_i(x),   i = 1, ..., p,

where π_i (a positive real valued function) is the "asymptotic order" of g_i and H_i is the "asymptotic homogeneous function" of g_i, which is assumed continuous. Several specifications of interest satisfy these conditions, e.g. polynomial functions, logarithmic functions, indicator functions and distribution-type functions; see Park and Phillips (2001) for more details. Set π(.) := diag{π_1(.), ..., π_p(.)} and H(.) = [H_1(.), ..., H_p(.)]'. The following result is the counterpart of Theorem 2 for additive transformations of non-rescaled sequences.

Theorem 3.
Suppose that: (a) A1, A3 and A4 or A4* hold; (b) for each i = 1, ..., p, there exist a continuous function H_i and π_i : (0, ∞) → (0, ∞), so that

    g_i(λx) = π_i(λ) H_i(x) + R_i(λ, x),

where |R_i(λ, x)| ≤ a_i(λ)(1 + |x|^δ) for some δ > 0 and a_i(λ)/π_i(λ) → 0, as λ → ∞. Then, as n → ∞, we have

    Σ_{k=1}^n π(d_n)^{-1} g(x_k) { Σ_{j=1}^{l_n} K[c_n(k/n − τ_j)] } [ c_n/(n l_n), √(c_n/(n l_n)) u_k ]
      = Σ_{k=1}^n H(X_{nk}) { Σ_{j=1}^{l_n} K[c_n(k/n − τ_j)] } [ c_n/(n l_n), √(c_n/(n l_n)) u_k ] + o_P(1)   (9)
      →_d [ ∫_0^1 H(X_t) dt ∫K,  MN( 0, σ_u^2 ∫_0^1 H(X_t) H(X_t)' dt ∫K^2 ) ].   (10)

Remark 4. As noted in Remark 3, if A4(c) is replaced by τ_j = j/(l + 1), where j = 1, ..., l, we have

    Σ_{k=1}^n π(d_n)^{-1} g(x_k) { Σ_{j=1}^l K[c_n(k/n − τ_j)] } [ c_n/(n l), √(c_n/(n l)) u_k ]
      →_d [ (1/l) Σ_{j=1}^l H(X_{τ_j}) ∫K,  MN( 0, σ_u^2 (1/l) Σ_{j=1}^l H(X_{τ_j}) H(X_{τ_j})' ∫K^2 ) ].

Remark 5. Suppose K* is a real function satisfying A4(a) or A4*(a). Let 0 < τ* < 1. Arguments similar to those in the proofs of Theorems 2 and 3 show that, under the conditions of Theorem 3 with g(.) = [g_1(.), g_2(.
)]', we have

    ( ∫_0^1 H(X_{n,[nt]}) dt, U_{1n}, U_{2n} ) →_d ( ∫_0^1 H(X_t) dt, MN(0, σ_u^2 V_1) ),   (11)
    ( ∫_0^1 H(X_{n,[nt]}) dt, U_{1n}, U_{3n} ) →_d ( ∫_0^1 H(X_t) dt, MN(0, σ_u^2 V_2) ),   (12)

where

    U_{1n} = √(c_n/n) Σ_{k=1}^n π(d_n)^{-1} g(x_k) (1/√l_n) Σ_{j=1}^{l_n} K[c_n(k/n − τ_j)] u_k,
    U_{2n} = √(c_n/n) Σ_{k=1}^n (1/√l_n) Σ_{j=1}^{l_n} K*[c_n(k/n − τ_j)] u_k,
    U_{3n} = √(c_n/n) Σ_{k=1}^n K*[c_n(k/n − τ*)] u_k,

    V_1 = [ ∫_0^1 H(X_t) H(X_t)' dt ∫K^2   ∫_0^1 H(X_t) dt ∫KK* ;
            ∫_0^1 H(X_t)' dt ∫KK*          ∫(K*)^2 ],

    V_2 = [ ∫_0^1 H(X_t) H(X_t)' dt ∫K^2   0 ;
            0                               ∫(K*)^2 ].

The limit results (11) and (12), together with Theorems 1-3, will be utilised in Section 3 next.
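To illustrate Theorem 1 numerically, the following sketch computes S^1_{n,l} for an i.i.d. sequence and g(x) = x^2; the triangular kernel and the tuning sequences are illustrative assumptions. Consistent with (5), the statistic concentrates around Eg(x_1)∫K.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
c_n = n ** 0.4                                  # c_n -> infinity, c_n / n -> 0
l_n = max(int(np.log(n)), 1)                    # l_n -> infinity, l_n = o(c_n)
taus = [(j + 1) / (l_n + 1) for j in range(l_n)]
K = lambda u: np.maximum(1.0 - np.abs(u), 0.0)  # triangular kernel, integral K = 1

x = rng.standard_normal(n)                      # ergodic stationary sequence (cf. A2)
g = x ** 2                                      # g(x) = x^2, so E g(x_1) = 1

kk = np.arange(1, n + 1) / n
w = sum(K(c_n * (kk - t)) for t in taus) / l_n
S1 = (c_n / n) * np.sum(g * w)                  # concentrates around E g(x_1) * int K = 1
```

Only the observations in the l_n kernel windows contribute, yet the c_n/n scaling restores a law-of-large-numbers limit, which is the content of (5).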
The limit theory presented in the previous section is subsequently utilised for deriving the properties of the LTLS estimator and a related t-statistic. We consider nonlinear models of the form

    y_k = µ + β f(x_k) + u_k,   k = 1, ..., n,   (13)

where f is a known regression function, (µ, β) are unknown parameters, and the covariate x_k can be a nonstationary process or a stationary one, amenable to the limit theory of Theorem 2 or Theorem 1 respectively. Further, x_k is predetermined with respect to the error u_k in the sense that x_k is F_{k-1}-measurable and {u_k, F_k} is a martingale difference sequence (c.f. Assumptions A1-A3). Similar nonlinear models with a predetermined covariate have been considered, for example, by Park and Phillips (1999, 2001) and Chan and Wang (2015) in a parametric set-up, and by Wang and Phillips (2009a, b, 2011, 2012) in a nonparametric set-up.

Let K be a kernel function satisfying A4(a) or A4*(a). Let τ_j = j/(l_n + 1), j = 1, ..., l_n, and let c_n and l_n be deterministic sequences satisfying A4(b) and (c) or A4*(b) and (c). We also allow l_n to be a fixed constant. Set

    K_{kn} := Σ_{j=1}^{l_n} K[c_n(k/n − τ_j)].   (14)

Our aim is to estimate the unknown parameter β in (13) by using the following instrument for f(x_k):

    Z_{kn} := f_k K_{kn} := f(x_k) K_{kn}.

As remarked in Section 1, due to the integrability of K, a trimming effect applies around the chronological point(s) (cp(s) hereafter) τ_j, which in turn reduces the signal of the OLS instrument f(x_k). The reduction is more pronounced when the distance between k/n and τ_j is large, and/or the sequence c_n diverges fast. Clearly, for K_{kn} = 1 we get the OLS estimator as a special case. The reduction in the instrument signal enables an extended martingale CLT given by Wang (2014) to operate. As a result the estimator under consideration has a mixed Gaussian limit distribution, making pivotal inference possible.

A trimming method is also crucial for demeaning {y_k}, i.e.
taking into account the unknown intercept µ. Let K*_{kn}, k = 1, ..., n, be additive functionals of a certain integrable kernel function. For any sequence {a_k}_{k=1}^n let

    ā := Σ_{k=1}^n a_k K*_{kn} / Σ_{k=1}^n K*_{kn}   and   ã_k := a_k − ā.   (15)

We will consider two possibilities for K*_{kn}: either

    K*_{kn} := Σ_{j=1}^{l_n} K*[c_n(k/n − τ_j)]   or   K*_{kn} := K*[c_n(k/n − τ*)],   (16)

where K* satisfies A4(a), the τ_j = j/(l_n + 1), j = 1, ..., l_n, are given above and 0 < τ* < 1. The first choice in (16) involves a trimmed sample mean around an array of several cps, whilst the second is

[Footnote: Here we consider models that are nonlinear in x_k only. Our results can be generalised to models that are nonlinear both in x_k and in the parameters, along the lines of Chan and Wang (2015) for instance.]
a trimmed sample mean based on a single fixed cp. Define the LTLS estimator as

    β̂ := Σ_{k=1}^n Z_{kn} ỹ_k / Σ_{k=1}^n Z_{kn} f̃_k.

The employment of a "trimmed" sample mean is crucial for obtaining mixed Gaussian limit theory. Notice that

    β̂ = β + [ Σ_{k=1}^n Z_{kn} f̃_k ]^{-1} { Σ_{k=1}^n f_k K_{kn} u_k − ( Σ_{k=1}^n f_k K_{kn} ) Σ_{k=1}^n K*_{kn} u_k / Σ_{k=1}^n K*_{kn} }.

For nonstationary x_k the two martingale terms shown above converge jointly to a bivariate mixed Gaussian limit. In particular,

    [ √(c_n/(n l_n)) Σ_{k=1}^n f(d_n^{-1} x_k) K_{kn} u_k,  √(c_n/(n l*_n)) Σ_{k=1}^n K*_{kn} u_k ] →_d MN(0, V),

for some random matrix V. Note that if instead standard demeaning were employed (i.e. K* = 1), then

    [ √(c_n/(n l_n)) Σ_{k=1}^n f(d_n^{-1} x_k) K_{kn} u_k,  n^{-1/2} Σ_{k=1}^n u_k ] ↛_d MN(0, V),

for some random matrix V, despite the fact that each of the components on the l.h.s. above converges weakly to some (mixed) Gaussian limit.

To investigate the limit properties of the LTLS estimator β̂ in detail, set

    λ_n := n l_n / c_n   and   λ*_n := n l*_n / c_n,   where   l*_n := { l_n, if K*_{kn} = Σ_{j=1}^{l_n} K*[c_n(k/n − τ_j)];  1, if K*_{kn} = K*[c_n(k/n − τ*)] }.

The sequences λ_n, λ*_n give the order of the terms Σ_{k=1}^n K_{kn} and Σ_{k=1}^n K*_{kn}, which in turn determine the convergence rate of the LTLS estimator. Further, set R* = 1 and Q* = ∫KK* if l*_n = l_n, and R* = Q* = 0 if l*_n = 1.

We have the following main results for the asymptotics of the LTLS estimator β̂. Theorem 4 is for a stationary regressor; limit theory for the nonstationary case is given in Theorem 5.

[Footnote: Note that by standard arguments (Euler summation), Σ_{k=1}^n K*_{kn} ∼ (n l*_n / c_n) ∫K*.]

Theorem 4.
Suppose that: (a) A1, A2 with g = f, and A4 or A4* hold; (b) K* satisfies A4(a) or A4*(a) and 0 < τ* < 1. Then, as n → ∞, we have

    √λ*_n (β̂ − β) →_d σ_u N( 0, Ω^{-2} L M L' ),   (17)

where Ω = { Ef^2(x_1) − [Ef(x_1)]^2 } ∫K, L = ( R*, −Ef(x_1) ∫K / ∫K* ) and

    M = [ Ef^2(x_1) ∫K^2   Ef(x_1) Q* ;
          Ef(x_1) Q*       ∫(K*)^2 ].

Theorem 5.
Suppose that: (a) A1, A3 and A4 or A4* hold; (b) f(x) is an asymptotically homogeneous function, i.e., there exist a continuous function H and π : (0, ∞) → (0, ∞) such that f(λx) = π(λ) H(x) + R(λ, x), where |R(λ, x)| ≤ a(λ)(1 + |x|^δ) for some δ > 0 and a(λ)/π(λ) → 0, as λ → ∞; (c) K* satisfies A4(a) or A4*(a) and 0 < τ* < 1. Then, as n → ∞,

    √λ*_n π(d_n) (β̂ − β) →_d σ_u MN( 0, ( C ∫K )^{-2} A V A' ),   (18)

where

    C = { ∫_0^1 H^2(X_t) dt − [ ∫_0^1 H(X_t) dt ]^2,        if K*_{kn} = Σ_{j=1}^{l_n} K*[c_n(k/n − τ_j)];
          ∫_0^1 H^2(X_t) dt − [ ∫_0^1 H(X_t) dt ] H(X_{τ*}), if K*_{kn} = K*[c_n(k/n − τ*)] },

    A = [ R*, −∫_0^1 H(X_t) dt ∫K / ∫K* ],   and   V = [ ∫_0^1 H^2(X_t) dt ∫K^2   ∫_0^1 H(X_t) dt Q* ;
                                                         ∫_0^1 H(X_t) dt Q*      ∫(K*)^2 ].

Remark 6. Due to the fact that √λ*_n = o(√n), the convergence rate of LTLS for both stationary and nonstationary regressors is slower in comparison with that of the OLS estimator.

Remark 7. When a single cp is used in demeaning y_k, we have R* = Q* = 0. In this case, the right hand side of (18) becomes

    [ −∫_0^1 H(X_t) dt / ∫K* ] / [ ∫_0^1 H^2(X_t) dt − ( ∫_0^1 H(X_t) dt ) H(X_{τ*}) ] × N( 0, σ_u^2 ∫(K*)^2 ).

Simulations presented in Section 4 show that, in finite samples, superior performance is obtained for a certain configuration that involves multiple cps for the instrumentation of x_k (i.e. K) and a single cp for demeaning y_k (i.e. K*). An analogous result can be established when the opposite holds, i.e. a single cp (τ_1, say) is used for the instrumentation of x_k (i.e. K) and multiple cps (i.e. K*) are used for demeaning.
In particular, in the latter case it can be shown that the limit distribution (nonstationary x_k) is

    [ −H(X_{τ_1}) / ∫K* ] / [ H^2(X_{τ_1}) − ( ∫_0^1 H(X_t) dt ) H(X_{τ_1}) ] × N( 0, σ_u^2 ∫(K*)^2 ).

We do not consider this possibility explicitly in the theorems shown above in order to avoid a more complex exposition.

To end this section, we consider the following t-statistic for the hypothesis H_0 : β = β_0 (for some β_0 ∈ R):

    T̂ := C_n (β̂ − β_0) / √( σ̃^2 A_n V_n A_n' ),   (19)

where

    A_n := [ 1, −Σ_{k=1}^n f̃_k K_{kn} / Σ_{k=1}^n K*_{kn} ],   C_n := Σ_{k=1}^n Z_{kn} f̃_k,

    V_n := [ Σ_{k=1}^n K_{kn}^2 f̃_k^2        Σ_{k=1}^n K*_{kn} K_{kn} f̃_k ;
             Σ_{k=1}^n K*_{kn} K_{kn} f̃_k   Σ_{k=1}^n (K*_{kn})^2 ],

and σ̃^2 := n^{-1} Σ_{k=1}^n ũ_k^2, where the ũ_k are residuals from OLS estimation of (13). The limit properties of T̂ under the null hypothesis are demonstrated by Theorem 6 below.

Theorem 6.
Suppose that either the conditions of Theorem 4 or those of Theorem 5 hold. Then, under H_0 : β = β_0, T̂ →_d N(0, 1).

Remark 8. Note that the limit distribution of the test statistic under the null hypothesis is standard normal for both stationary and nonstationary regressors. Under the alternative hypothesis, the divergence rate of T̂ is determined by the convergence rate of the LTLS estimator. In particular, for stationary x_k it can be easily seen that T̂ = O_P(√λ*_n). On the other hand, in the nonstationary case we have T̂ = O_P(√λ*_n π(d_n)), where d_n = √n for x_k NI or I(1) and d_n = n^{d−1/2} for x_k ∼ I(d), 1/2 < d < 3/2. Therefore, a faster divergence rate is attained for more persistent processes. This fact is also corroborated by our simulation results (see Figures 1-3).

We next investigate the finite sample performance of the t-test based on the LTLS estimator. In particular, we test the hypothesis H_0 : β = 0 against H_1 : β ≠ 0 at the 5% significance level. The vector [ξ_k, u_k] process is generated by

    [ ξ_k ; u_k ] ∼ i.i.d. N( 0, [ 1  δ ; δ  1 ] ).

Further, for k = 1, ..., n the process {y_k} is generated by y_{k+1} = β x_k + u_{k+1}, where {x_k} is either a NI array of the form

    x_k = (1 + c/n) x_{k-1} + ξ_k,   (20)

with c ≤ 0 and x_0 = 0, or a type II fractional process (e.g. see Robinson and Hualde, 2003) of the form

    (I − L)^d x_k = ξ_k 1{k ≥ 1}.   (21)

Let φ_ς(x) be the density of a N(0, ς^2) variate. Next, set σ̃_u^2 = n^{-1} Σ_{k=1}^n ũ_k^2, σ̃_ξ^2 = n^{-1} Σ_{k=1}^n ξ̃_k^2 and

    δ̃ = n^{-1} Σ_{k=1}^n ũ_k ξ̃_k / √( σ̃_u^2 σ̃_ξ^2 ),

where ũ_k and ξ̃_k are OLS residuals from the regressions y_{k+1} = µ̃ + β̃ x_k + ũ_{k+1} and x_k = µ̃_x + ρ̃ x_{k-1} + ξ̃_k respectively. Finally, {τ_j}_{j=1}^{l_n} are equispaced points on (0, 1). We consider three set-ups for the kernel functionals and cps:

S1 (set-up 1): K_{kn} = Σ_{j=1}^{l_n} K[c_n(k/n − τ_j)], K*_{kn} = Σ_{j=1}^{l_n} K*[c_n(k/n − τ_j)], K(x) = φ_{…}(x)/…, K*(x) = φ_{…}(x)/…, c_n = n^{…}
, l n = c . n .S2 (set-up 2) K kn = (cid:80) l n j =1 K [ c n ( k/n − τ j )] , K ∗ kn = (cid:80) l n j =1 K ∗ [ c n ( k/n − τ j )] , K ( x ) = ϕ . ( x ) / , K ( x ) ∗ = ϕ ( x ) / , c n = n . , l n = c ˆ αn , ˆ α = 1 − . (cid:12)(cid:12)(cid:12) ˜ δ (cid:12)(cid:12)(cid:12) .S3 (set-up 3) K kn = (cid:80) l n j =1 K [ c n ( k/n − τ j )] , K ∗ kn = K ∗ [ c n ( k/n − . , K ( x ) = ϕ ˆ ς ( x ) , K ( x ) ∗ = ϕ ˆ ς ( x ) / , ˆ ς = ˜ σ u (cid:16) . . (cid:12)(cid:12)(cid:12) ˜ δ (cid:12)(cid:12)(cid:12)(cid:17) , c n = n ˆ α , ˆ α = − . . (cid:12)(cid:12)(cid:12) ˜ δ (cid:12)(cid:12)(cid:12) , l n = log n .In S1 and S2 multiple cps are used for both K kn and K ∗ kn whilst in S3 K ∗ kn involves a single cp . Contrary to S1, in S2 a data driven approach is followed for the determination of the numberof cps ( l n ). As remarked in Section 1, a small c n and/or large number of cps results in a LTLSestimator approximately equal to the OLS estimator. The OLS estimator in general has a goodpower properties but is severely oversized when endogeneity is strong (i.e. when | δ | is close to one).In S2 a large number of cps is utilised when endogeneity is weak whilst for l n drops as | δ | approachesone. A similar data-driven approach is utilised in S3. In this case c n is very small (vanishing) for δ close to zero, whilst c n is large (diverging) for | δ | close to one. Further, in S3 the choice of thekernel variance is also data driven. Preliminary simulations have shown that superior performanceis attained when ς = 0 . for δ ≈ and ς = 1 for | δ | ≈ . Therefore, ˆ ς = ˜ σ u (cid:16) . . (cid:12)(cid:12)(cid:12) ˜ δ (cid:12)(cid:12)(cid:12)(cid:17) provides an interpolation between these values based on the actual data.For S1 and S2 we use the test statistic of (19). For S3 we use A ∗ n := (cid:104) , − (cid:80) nk =1 f ( x k ) K ∗ kn (cid:80) nk =1 K ∗ kn (cid:105) insteadof A n in (19). 
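To fix ideas, the simulation design above can be sketched in a few lines. The following Python snippet is our own illustration, not the authors' code; the bandwidth c_n = √n, the number of cps l_n = 5 and the kernel standard deviation 0.1 are placeholder choices standing in for the exact set-up constants of S1-S3.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_ni(n, c, delta, beta=0.0):
    """Simulate (y, x) from the NI design (20):
    x_k = (1 + c/n) x_{k-1} + xi_k, y_{k+1} = beta*x_k + u_{k+1},
    with (xi_k, u_k) jointly Gaussian and corr(xi, u) = delta."""
    cov = np.array([[1.0, delta], [delta, 1.0]])
    e = rng.multivariate_normal(np.zeros(2), cov, size=n + 1)
    xi, u = e[:, 0], e[:, 1]
    x = np.zeros(n + 1)
    for k in range(1, n + 1):
        x[k] = (1.0 + c / n) * x[k - 1] + xi[k]
    y = beta * x[:-1] + u[1:]          # y_{k+1} regressed on x_k
    return y, x[:-1]

def kernel_weights(n, c_n, l_n, sd):
    """Chronological trimming weights K_kn = sum_j K[c_n(k/n - tau_j)]
    with a Gaussian kernel of standard deviation sd and equispaced cps
    tau_j = j/(l_n + 1) on (0, 1)."""
    tau = np.arange(1, l_n + 1) / (l_n + 1)
    k_over_n = np.arange(1, n + 1) / n
    z = c_n * (k_over_n[:, None] - tau[None, :])       # shape (n, l_n)
    return (np.exp(-0.5 * (z / sd) ** 2) / (sd * np.sqrt(2 * np.pi))).sum(axis=1)

y, x = simulate_ni(n=250, c=0.0, delta=-0.9)
Kkn = kernel_weights(n=250, c_n=250 ** 0.5, l_n=5, sd=0.1)
```

The weights Kkn are close to zero except in neighbourhoods of the cps τ_j, which is the chronological trimming that defines the LTLS instrument.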
Note that, given the configuration of S3, in the nonstationary case Σ_{k=1}^n f(x_k)K*_kn / Σ_{k=1}^n K*_kn = O_p(π_f(d_n)), whilst the term Σ_{k=1}^n f(x_k)K_kn / Σ_{k=1}^n K*_kn that appears in A_n is O_p(π_f(d_n) log n). Therefore, the employment of A*_n results in giving less weight to the term that corresponds to the studentisation of the intercept correction. Note that the utilisation of A*_n does not result in a consistent estimator for the limit variance of β̂ − β. Nevertheless, our simulation results reveal that in finite samples superior performance is attained when A*_n is employed.

Table 1 shows the size properties of the LTLS based t-tests for the case where the regressor is a NI array generated by (20). The number of replication paths is set to 10,000 throughout. We also consider the IVX based test (see eq. (20) in Kostakis et al, 2015) and the OLS based t-test. We allow for several values of the correlation parameter δ, the near-to-unity parameter c ≤ 0 and the sample size (n ∈ {250, 500, 750, 1000}). We use the notation T1, T2 and T3 to denote the LTLS t-statistics that correspond to set-ups S1, S2 and S3 respectively. In general, all LTLS based tests exhibit good size control. Under S1 and S2 the tests are moderately oversized in small samples when c = 0 and |δ| is large. Figures 1 and 2 show the empirical power (n = 250) of the LTLS and IVX tests for c = 0 and for negative c, respectively. It can be seen from these figures that T3 attains better performance than the other LTLS based tests under consideration (i.e. T1 and T2). In particular, the performance of the LTLS t-test under S3 is almost identical to that of the IVX based test. This is somewhat surprising given that under S3 the studentisation used does not lead to a consistent estimator for the limit variance of the LTLS estimator.
As noted above, under S3 the term that provides studentisation for the intercept correction is of slightly smaller order of magnitude (i.e. by a log n factor) than the corresponding term in A_n. The simulation study provided suggests that this misbalancing leads to some finite sample improvement. Hosseinkouchack and Demetrescu (2019) provide finite sample improvements to the IVX method. These authors show that the IVX t-statistic distribution is skewed relative to the N(0, 1) in finite samples when endogeneity is strong. It is reasonable to expect that a similar phenomenon holds for the LTLS distribution in finite samples. It seems that the utilisation of A*_n provides a rebalancing of the test statistic that corrects for deviations from the standard normal distribution. A rigorous analysis of the performance of T3 in finite samples would require developing higher order limit theory. A development in this direction is challenging from a technical point of view and is left for future work.

We next consider the case where the regressor is a nonstationary fractional process (i.e. (21)). The finite sample size performance of T3 and the LS based test procedure is shown in Table 2. It can be seen from Table 2 that the T3 test provides good size control for a wide range of persistence and endogeneity levels. On the other hand, the LS based test may exhibit serious oversizing. In particular, for the strongest negative δ the empirical size ranges from about three times the nominal one at the smallest d to about six times at the largest d. Finally, Figure 3 shows the finite sample power of T3 for n = 250, d ∈ {0.8, 1, 1.2} and three values of δ. As expected, better power performance is attained for more persistent regressors. Preliminary simulation results show that the performance of T1 and T2 in the fractional case is comparable to that in the NI case.

Table 1: Empirical size (NI regressor; 5% nominal). Each panel reports, for every sample size n, the rejection rates of T1, T2, T3, IVX and OLS across five values of δ (ordered from negative to positive).

Panel c = 0
(rows: n, then T1 T2 T3 IVX OLS for each of the five δ values)
250   0.084 0.095 0.060 0.059 0.278  0.059 0.074 0.057 0.056 0.117  0.051 0.052 0.045 0.050 0.053  0.061 0.075 0.052 0.056 0.113  0.087 0.096 0.064 0.061 0.295
500   0.077 0.078 0.062 0.062 0.287  0.059 0.067 0.051 0.054 0.114  0.054 0.053 0.046 0.054 0.054  0.060 0.076 0.057 0.058 0.116  0.080 0.083 0.058 0.055 0.279
750   0.076 0.069 0.062 0.058 0.272  0.059 0.065 0.051 0.052 0.109  0.052 0.050 0.042 0.050 0.051  0.059 0.062 0.054 0.055 0.111  0.080 0.068 0.063 0.057 0.277
1000  0.070 0.067 0.059 0.053 0.278  0.054 0.064 0.051 0.051 0.111  0.049 0.048 0.046 0.051 0.053  0.059 0.067 0.052 0.050 0.108  0.075 0.062 0.058 0.053 0.277

Panel c = −· (columns as above)
250   0.061 0.069 0.060 0.062 0.116  0.051 0.061 0.056 0.056 0.072  0.050 0.052 0.051 0.050 0.051  0.057 0.065 0.056 0.059 0.074  0.068 0.070 0.066 0.066 0.123
500   0.060 0.067 0.062 0.063 0.117  0.051 0.060 0.056 0.059 0.073  0.051 0.055 0.051 0.052 0.054  0.056 0.061 0.057 0.057 0.071  0.062 0.060 0.058 0.058 0.116
750   0.063 0.058 0.062 0.060 0.116  0.058 0.056 0.056 0.059 0.070  0.056 0.055 0.052 0.056 0.053  0.059 0.055 0.059 0.058 0.073  0.065 0.063 0.062 0.062 0.119
1000  0.058 0.055 0.059 0.060 0.116  0.049 0.059 0.052 0.054 0.066  0.047 0.052 0.050 0.050 0.051  0.050 0.056 0.053 0.052 0.066  0.059 0.056 0.057 0.058 0.115

Panel c = −· (columns as above)
(rows: n, then T1 T2 T3 IVX OLS for each of the five δ values)
250   0.058 0.067 0.058 0.062 0.086  0.051 0.058 0.054 0.055 0.063  0.049 0.050 0.050 0.051 0.052  0.056 0.061 0.057 0.057 0.063  0.063 0.059 0.066 0.065 0.090
500   0.058 0.053 0.061 0.063 0.088  0.051 0.060 0.058 0.058 0.065  0.047 0.056 0.051 0.052 0.052  0.050 0.061 0.054 0.055 0.060  0.056 0.056 0.059 0.057 0.085
750   0.058 0.054 0.061 0.060 0.087  0.058 0.053 0.058 0.056 0.064  0.055 0.054 0.053 0.056 0.053  0.056 0.059 0.057 0.055 0.062  0.058 0.057 0.062 0.062 0.088
1000  0.053 0.052 0.058 0.058 0.084  0.049 0.052 0.053 0.053 0.059  0.046 0.048 0.048 0.050 0.051  0.049 0.056 0.050 0.051 0.058  0.054 0.055 0.058 0.058 0.088

Panel c = −· (columns as above)
250   0.056 0.052 0.057 0.060 0.069  0.052 0.054 0.052 0.051 0.057  0.051 0.049 0.051 0.050 0.050  0.055 0.057 0.055 0.055 0.058  0.061 0.056 0.059 0.060 0.071
500   0.054 0.055 0.058 0.060 0.072  0.050 0.055 0.055 0.054 0.058  0.048 0.056 0.051 0.051 0.052  0.049 0.053 0.053 0.055 0.058  0.053 0.052 0.054 0.056 0.067
750   0.053 0.053 0.060 0.059 0.071  0.056 0.055 0.057 0.060 0.060  0.052 0.054 0.055 0.056 0.053  0.056 0.055 0.055 0.055 0.058  0.057 0.054 0.061 0.062 0.074
1000  0.052 0.052 0.057 0.057 0.071  0.047 0.057 0.052 0.050 0.056  0.048 0.049 0.048 0.048 0.049  0.048 0.050 0.048 0.049 0.053  0.052 0.055 0.057 0.055 0.070

Panel c = −· (columns as above)
(rows: n, then T1 T2 T3 IVX OLS for each of the five δ values)
250   0.053 0.053 0.056 0.054 0.058  0.052 0.050 0.049 0.050 0.051  0.049 0.051 0.049 0.049 0.049  0.052 0.054 0.052 0.050 0.053  0.055 0.049 0.055 0.055 0.058
500   0.052 0.053 0.056 0.054 0.059  0.052 0.054 0.052 0.051 0.053  0.048 0.051 0.046 0.047 0.048  0.050 0.053 0.050 0.050 0.050  0.053 0.048 0.055 0.055 0.059
750   0.051 0.050 0.059 0.059 0.064  0.053 0.049 0.054 0.055 0.055  0.053 0.050 0.054 0.053 0.052  0.057 0.051 0.056 0.056 0.058  0.057 0.046 0.059 0.059 0.063
1000  0.054 0.054 0.058 0.055 0.061  0.051 0.053 0.052 0.053 0.053  0.050 0.048 0.048 0.050 0.050  0.050 0.047 0.048 0.049 0.050  0.051 0.050 0.054 0.053 0.058

[Figure 1: Empirical power (NI regressor; 5% nominal size; c = 0); curves T1, T2, T3, IVX.]
[Figure 2: Empirical power (NI regressor; 5% nominal size; c < 0); curves T1, T2, T3, IVX.]

Table 2: Empirical size (fractional regressor; 5% nominal). Each panel corresponds to a value of δ; columns report T3 and LS for d = 0.·, 0.·, 0.·, 1, 1.·, 1.·.

Panel δ = −· (rows: n, then T3 LS for each d)
250   0.051 0.158  0.051 0.184  0.055 0.235  0.060 0.278  0.064 0.308  0.067 0.325
500   0.051 0.161  0.052 0.184  0.058 0.242  0.062 0.287  0.066 0.319  0.068 0.337
750   0.051 0.155  0.052 0.178  0.058 0.230  0.062 0.272  0.064 0.301  0.067 0.322
1000  0.048 0.155  0.050 0.183  0.055 0.229  0.059 0.278  0.065 0.310  0.069 0.327

Panel δ = −· (columns as above)
250   0.051 0.085  0.052 0.093  0.055 0.107  0.057 0.117  0.058 0.121  0.057 0.126
500   0.051 0.085  0.050 0.091  0.051 0.102  0.051 0.114  0.052 0.120  0.052 0.123
750   0.048 0.081  0.046 0.086  0.048 0.098  0.051 0.109  0.052 0.117  0.052 0.119
1000  0.047 0.077  0.047 0.086  0.048 0.102  0.051 0.111  0.056 0.118  0.053 0.120

Panel δ = 0.· (columns as above)
(rows: n, then T3 LS for each d)
250   0.043 0.051  0.044 0.052  0.043 0.053  0.045 0.053  0.044 0.052  0.043 0.052
500   0.048 0.054  0.048 0.055  0.047 0.054  0.046 0.054  0.045 0.055  0.045 0.056
750   0.048 0.053  0.046 0.051  0.043 0.052  0.042 0.051  0.042 0.051  0.043 0.053
1000  0.045 0.049  0.044 0.049  0.044 0.053  0.046 0.053  0.044 0.054  0.044 0.055

[Figure 3: Empirical power (fractional regressor; 5% nominal size); curves T3 for d = 0.8, 1, 1.2.]

Application to the predictability of stock returns
A large literature in empirical finance is devoted to the investigation of the hypothesis that stock returns can be predicted with publicly available information. For a review of existing work see for example Welch and Goyal (2008), and for more recent developments Kostakis, Magdalinos and Stamatogiannis (2015). Typically, empirical work in this area involves inferential procedures for the hypothesis H₀ : β = 0 in the context of predictive regressions of the form

  r_{k+1} = μ + β x_k + u_{k+1},   (22)

where r_k are stock returns relating to some stock index, x_k is some predictive variable and u_{k+1} is a martingale difference regression error. Usually some financial ratio (e.g. dividend yield, earnings to price ratio, book to market ratio) or some macroeconomic variable (e.g. inflation) is considered as a possible predictor of future returns. Phillips (2015) provides a review of the econometric methodology employed in the predictive regressions literature. Most studies (e.g. Welch and Goyal, 2008) utilise methods that are only valid for stationary x_k, despite the fact that there is strong evidence that in certain datasets various financial and macroeconomic variables are consistent with nonstationary processes (e.g. see Kostakis et al, 2015; Table 4). To the best of our knowledge, Campbell and Yogo (2006) is the first work that explicitly attempts to address the possibility that the regressor is nonstationary. In particular, Campbell and Yogo (2006) develop a testing procedure for the case where the predictor is a NI array, based on conservative confidence intervals. Kostakis et al (2015) consider a modified version of the Magdalinos and Phillips (2009) IVX, involving a finite sample correction relating to intercept estimation, to examine the return predictability hypothesis. The IVX estimator yields conventional inference for the case where x_k is a NI or mildly integrated array (e.g. Phillips and Magdalinos, 2007) or a stationary linear process.
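The size distortion of OLS inference that motivates these corrections is easy to reproduce. The sketch below is our own hypothetical illustration, not the paper's code: a small Monte Carlo of the 5% two-sided OLS t-test in a regression of the form (22) with β = 0, a unit-root predictor and strongly endogenous innovations. Under such a design the rejection frequency sits well above the nominal level, in line with the oversizing reported for the OLS column of Table 1.

```python
import numpy as np

rng = np.random.default_rng(1)

def ols_tstat(y, x):
    """t-statistic for beta in the fitted regression y = mu + beta*x + u."""
    X = np.column_stack([np.ones_like(x), x])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    s2 = resid @ resid / (len(y) - 2)
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return b[1] / se

def ols_rejection_rate(n=250, delta=-0.9, reps=1000):
    """Rejection frequency of the 5% OLS t-test of H0: beta = 0 when the
    true beta is 0, the predictor is a unit root (c = 0 in the NI design)
    and corr(xi, u) = delta (strong endogeneity)."""
    rej = 0
    for _ in range(reps):
        e = rng.multivariate_normal(np.zeros(2), [[1, delta], [delta, 1]], size=n + 1)
        xi, u = e[:, 0], e[:, 1]
        x = np.cumsum(xi)                       # x_k = x_{k-1} + xi_k
        if abs(ols_tstat(u[1:], x[:-1])) > 1.96:
            rej += 1
    return rej / reps

rate = ols_rejection_rate()
```

With these placeholder parameter values the empirical size is several times the nominal 5%, which is the distortion that the LTLS and IVX constructions are designed to remove.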
IVX instruments are also employed in the recent work of Demetrescu et al (2020), who propose inferential procedures for detecting episodic predictability in stock returns. The IVX method has also been employed in the recent work of Yang, Long, Peng and Cai (2020), who investigate predictability in the U.S. housing index return.

An important issue that has been largely overlooked in most studies in this area is that stock returns series typically exhibit very weak persistence relative to most popular predictors. In particular, in many datasets short-term returns appear to be close to I(d) processes with d ≈ 0, whilst several predictors appear to be I(d) with d > 1/2, i.e. nonstationary processes. Regressing a stationary process on a possibly nonstationary one leads to misbalancing. As emphasised by Phillips (2015), misbalancing may result in asymptotically vanishing estimators. For instance, if r_k ~ I(d_r) with d_r < 1/2 (stationary long memory) and x_k ~ I(d_x) with d_x > 1/2, then the OLS estimator β̃ for β in (22) satisfies β̃ →_P 0.

Only a few studies in this area attempt to address the issue of misbalancing. Marmer (2007) points out that a nonlinear relationship between returns and predictive variables is a plausible justification for this discrepancy in persistence. It is known, for instance, that integrable and bounded transformations of persistent processes may exhibit very weak signal (e.g. Park and Phillips, 1999, 2001; Park, 2003). Therefore, suppose that r_{k+1} = f(x_k) + u_{k+1}, where f is either integrable and compactly supported or the indicator function 1{x < 0}. The predictor x_k in this case has only a "spatial episodic" impact on returns, when the predictive variable visits the support of f (integrable case) or when it assumes negative values (indicator case).
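The weak signal of integrable transformations is easy to visualise numerically. In the sketch below (our own illustration, with f the integrable Gaussian-shaped function exp(−x²/2) as a stand-in for a generic integrable transformation), the sample average of f along a random walk shrinks as the sample grows, at rate roughly n^{−1/2}; this is the "spatial episodic" effect described above, which makes f(x_k) hard to distinguish from noise.

```python
import numpy as np

rng = np.random.default_rng(2)

def mean_signal(f, n, reps=200):
    """Average of (1/n) * sum_k f(x_k) over replications, where x is a
    Gaussian random walk. For integrable f this average shrinks at rate
    roughly 1/sqrt(n), unlike for functions bounded away from zero."""
    total = 0.0
    for _ in range(reps):
        x = np.cumsum(rng.standard_normal(n))
        total += np.mean(f(x))
    return total / reps

f = lambda x: np.exp(-0.5 * x ** 2)      # integrable, compactly concentrated
small, large = mean_signal(f, 100), mean_signal(f, 10_000)
```

Comparing `small` and `large` shows the average signal at n = 10,000 is an order of magnitude below that at n = 100, even though the regressor itself is highly persistent.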
For DGPs of this kind it is difficult to distinguish r_k from the martingale difference error u_k, despite the fact that r_k is a function of a persistent process (see for example Kasparis, Andreou and Phillips (2015), Figure 6; or Phillips (2015), Figure 2). Marmer (2007) develops a RESET type of functional form test for detecting possibly nonlinear components (e.g. integrable) of some predictor in the stock return series. A similar approach is also followed by Kasparis (2010) and Kasparis et al (2015), who utilise test statistics that involve integrable transformations of the predictor. The presence of integrable transformations in the test statistics results in conventional inference, but can also detect weak signal nonlinear components affecting the returns series (for more details see pp. 473-474 in Kasparis et al, 2015). Bollerslev, Osterrieder, Sizova and Tauchen (2013) follow a different approach for addressing the issue of misbalancing. These authors consider VIX and realised volatility as possible predictors of stock returns. Using preliminary estimations, they find that the aforementioned predictors exhibit long memory, whilst stock returns appear to have a memory parameter d ≈ 0. In view of this, Bollerslev et al (2013) consider prefiltered predictors of the form (I − L)^{d̂} x_k, where x_k is some volatility variable. Notice that the fractionally differenced process has memory approximately equal to that of the returns series (d ≈ 0). Finally, Demetrescu et al (2020) develop inferential procedures capable of detecting episodic predictability in stock returns for the case where the predictors are either I(0) or NI. In particular, they consider a potentially nonlinear relationship between returns and the predictive variables of the form r_{k+1} = f_n(x_k, k/n) + u_{k+1}, where f_n(x_k, k/n) = μ + k_n β(k/n) x_k, β(.) is a TVP depending on the rescaled time trend k/n, and k_n is an appropriate sequence.
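The type II fractional filter used in (21), and the prefiltering (I − L)^{d̂} x_k applied by Bollerslev et al (2013), can be sketched via the usual binomial expansion of (1 − L)^d. The implementation below is our own illustration, not taken from any of the cited papers.

```python
import numpy as np

def fracdiff_weights(d, n):
    """Expansion weights of (1 - L)^d: w_0 = 1, w_j = w_{j-1}(j - 1 - d)/j,
    i.e. w_j = (-1)^j * binom(d, j)."""
    w = np.empty(n)
    w[0] = 1.0
    for j in range(1, n):
        w[j] = w[j - 1] * (j - 1 - d) / j
    return w

def fracdiff(x, d):
    """Type II fractional difference (I - L)^d x_k, truncated at k = 0
    as in (21); d > 0 differences the series, d < 0 integrates it."""
    n = len(x)
    w = fracdiff_weights(d, n)
    # out_k = sum_{j=0}^{k} w_j * x_{k-j}
    return np.array([w[:k + 1] @ x[k::-1] for k in range(n)])
```

Because the truncated type II filters (1 − L)^d and (1 − L)^{−d} are exact finite-sample inverses, applying `fracdiff(x, d)` to a series generated as `fracdiff(xi, -d)` recovers the innovations xi exactly.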
This formulation allows for a "time episodic" impact of the predictor on the returns variable. Demetrescu et al (2020) achieve conventional inference by either utilising IVX instruments or the so-called type II instruments of Breitung and Demetrescu (2015). The method of Demetrescu et al (2020) can be used in conjunction with various instruments, including LTLS. Such a development would require additional theoretical work and is left for future research.
In this work we address the issue of misbalancing by considering predictability over longer horizons. In particular, we employ LTLS based inference in predictive regressions of the form

  r_{k+m} = μ + β x_k + u_{k+m},   (23)

where m ≥ 1. The specification of (23) has been considered by other studies that investigate return predictability over long horizons (see for example Bandi and Perron, 2008; Hjalmarsson, 2011). The data are taken from the updated 2018 Welch and Goyal dataset. The returns variable is constructed from the SP500 index (I_k) as follows: r_{k+m} = ln(I_{k+m}) − ln(I_k). We use monthly and quarterly observations. Therefore, for monthly data r_{k+m} should be understood as m months ahead returns, and for quarterly observations as m quarters ahead. By construction, returns are log-price differences. Therefore, the persistence of the returns series tends to increase as the horizon increases. Table 3 provides memory estimates for the return series over different horizons and frequencies. In particular, we use the local Whittle estimator (LW; e.g. see Robinson, 1995) and the exact local Whittle (ELW) of Shimotsu and Phillips (2005). The bandwidth employed is of the form n^b; we report results for three choices of the bandwidth exponent b, including the value considered by Shimotsu and Phillips (2005). Moreover, we report memory estimates for the earnings to price ratio (EP). The particular series appears to be less persistent than the dividend yield and book to market ratio that are commonly used in empirical work. For this reason we concentrate on EP, whose memory characteristics are closer to those of the returns series. It can be seen from Table 3 that the EP appears to be nonstationary at both frequencies and for all bandwidth choices, with minimal memory estimate 0.76. Further, the memory characteristics of the returns series appear to resemble those of the EP variable over longer horizons, i.e. m = 24 for monthly data and m = 12 for quarterly data, at the larger bandwidth choices.
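The local Whittle objective underlying the estimates in Table 3 can be sketched as follows. This routine is our own illustration rather than the exact estimator used in the paper; the grid search and the default bandwidth exponent b = 0.65 are placeholder choices.

```python
import numpy as np

def local_whittle(x, b=0.65):
    """Local Whittle estimate of the memory parameter d (Robinson, 1995):
    minimise the concentrated objective
        R(d) = log( mean_j lam_j^{2d} I(lam_j) ) - 2d * mean_j log(lam_j)
    over the first m = floor(n**b) Fourier frequencies. The plain LW
    estimator is reliable roughly for d in (-1/2, 3/4); beyond that range
    the exact local Whittle of Shimotsu and Phillips (2005) is preferable."""
    n = len(x)
    m = int(n ** b)
    lam = 2.0 * np.pi * np.arange(1, m + 1) / n
    I = np.abs(np.fft.fft(x)[1:m + 1]) ** 2 / (2.0 * np.pi * n)   # periodogram
    def R(d):
        return np.log(np.mean(lam ** (2.0 * d) * I)) - 2.0 * d * np.mean(np.log(lam))
    grid = np.linspace(-0.45, 1.4, 371)
    return grid[np.argmin([R(d) for d in grid])]
```

For a short-memory series the routine returns a value near 0, while for the EP ratio it would return an estimate well above 1/2, the pattern reported in Table 3.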
Figure 4 reports values of the LTLS T̂-statistics for the hypothesis H₀ : β = 0 vs H₁ : β ≠ 0 (c.f. equation (23)). These values are plotted against the predictability horizon parameter m. We consider three configurations for kernels, cps and bandwidth sequences consistent with the set-ups S1, S2 and S3 given in the previous section. In particular, for S1 and S2 we choose K(x) = ϕ_{0.·σ̃_u}(x), K*(x) = ϕ_{σ̃_u}(x). It can be seen from Figure 4 that there is evidence of predictability only over longer horizons under S1 and S3. For monthly data, the null hypothesis is rejected at the 5% level under S1 and S3 for m greater than 6 and 5, respectively. For quarterly data, the null is rejected under S1 and S3 for m greater than 12 and 10, respectively. These findings are consistent with those of Bandi and Perron (2008), who find strong predictability (by volatility predictors) over longer horizons.

Table 3: Memory estimates (LW and ELW) for three bandwidth choices n^b.

Monthly data
                  b = 0.·        b = 0.·        b = 0.·
                  LW     ELW     LW     ELW     LW     ELW
Returns (m = 1)   -0.09  -0.08   0.07   0.06    0.03   0.04
Returns (m = 12)  -0.036 -0.02   0.45   0.45    0.84   0.86
Returns (m = 24)  0.21   0.22    0.93   0.93    1.04   1.06
EP                0.77   0.85    0.92   1.22    1.02   1.51

Quarterly data
                  b = 0.·        b = 0.·        b = 0.·
                  LW     ELW     LW     ELW     LW     ELW
Returns (m = 1)   -0.09  -0.07   -0.09  -0.08   0.03   0.04
Returns (m = 8)   -0.03  -0.01   0.16   0.17    0.89   0.93
Returns (m = 12)  0.06   0.08    0.82   0.83    1.19   1.14
EP                0.76   0.81    0.79   0.85    0.88   1.17

[Figure 4: Predictability tests — LTLS t-statistics plotted against the predictability horizon m (quarterly data panel).]
[Figure 4 (cont.): LTLS predictability tests (monthly data panel); curves T1, T2, T3 with the 1.96 critical value line.]

Proofs of main results
Throughout this section, C, C₁, C₂, ... denote positive constants that may take a different value at each appearance, and K_kn := Σ_{j=1}^{l_n} K[c_n(k/n − τ_j)] as in (14). We start with two preliminary lemmas, which provide a significant extension of Lemma 4.1 of Hu, Phillips and Wang (2019) and include (5) and (7) as corollaries. The proofs of these two lemmas are given in Sections 6.7 and 6.8, respectively.

Let {X_{n,k}}_{k≥1,n≥1}, where X_{n,k} = (X_{nk,1}, ..., X_{nk,p}), be a vector random array. When there is no confusion, we also use the notation X_{nk} = X_{n,k}. Let {v_k}_{k≥1} be a sequence of random variables, and let G(q) = G(q₁, ..., q_p) and K(x) be Borel functions on ℝ^p and ℝ, respectively. For 0 < τ₁ < τ₂ < ... < τ_l < 1, set

  S_{n,l} = (c_n/n) Σ_{k=1}^n G(X_{nk}) v_k (1/l) Σ_{j=1}^l K[c_n(k/n − τ_j)],

where {c_n}_{n≥1} is a sequence of positive constants. Our first result investigates the asymptotics of S_{n,l}.

Lemma 1.
Suppose that:
(a) there is a continuous limiting process X_t = (X₁(t), ..., X_p(t)) such that X_{n,[nt]} ⇒ X_t on D_{ℝ^p}[0, 1];
(b) sup_{k≥1} E|v_k| < ∞ and there exist A ∈ ℝ and 0 < m := m_n → ∞ satisfying n/m → ∞ so that max_{m≤j≤n−m} E| (1/m) Σ_{k=j+1}^{j+m} v_k − A | = o(1);
(c) G(q) is continuous; K(x) has compact support, or K(x) is eventually monotonic with K(x) ≤ C/(1 + |x|²); and K(x) ≥ 0 with ∫K < ∞.

Then, for any fixed l ≥ 1, c_n → ∞ and c_n/n → 0, we have

  S_{n,l} = (1/l) Σ_{j=1}^l G(X_{n,[nτ_j]}) A ∫K + o_P(1) →_p (1/l) Σ_{j=1}^l G(X_{τ_j}) A ∫K.   (24)

If in addition τ_j = j/(l_n + 1), j = 1, 2, ..., l_n, where l_n⁻¹ + l_n/c_n → 0, then

  S_{n,l_n} = ∫₀¹ G(X_{n,[nt]}) dt A ∫K + o_P(1) →_p ∫₀¹ G(X_t) dt A ∫K.   (25)

Remark. Weak convergence in (a) and continuity of G(q) are essentially necessary for this kind of result. The result can be extended to the case where G(q) is locally Lebesgue integrable if we impose smoother conditions on X_{nk}, but this involves more complicated calculations. We do not pursue the extension, to keep this paper at a reasonable length. It is worth mentioning that no relationship is imposed between v_k and X_{nk}, and condition (b) is satisfied with A = Ev₀ whenever v_k is ergodic (strictly) stationary satisfying E|v₀| < ∞ and n⁻¹ Σ_{k=1}^n v_k →_{L¹} Ev₀.

If we are only interested in the boundedness of S_{n,l}, condition (b) can be weakened, as seen in the following result.

Lemma 2.
Suppose that conditions (a) and (c) of Lemma 1 hold and {v_k}_{k≥1} is an arbitrary random sequence satisfying sup_{k≥1} E|v_k| < ∞. Then, for any l ≥ 1 (allowing l = l_n → ∞), c_n → ∞ and c_n/n → 0, we have

  (c_n/n) Σ_{k=1}^n ‖G(X_{nk})‖ |v_k| (1/l) Σ_{j=1}^l K[c_n(k/n − τ_j)] = O_P(1).   (26)

If in addition K(x) ≤ C₁/(1 + |x|²), τ_j = j/(l_n + 1), j = 1, 2, ..., l_n, l_n log l_n/c_n → 0 and l_n → ∞, then

  (c_n/n) Σ_{k=1}^n ‖G(X_{nk})‖ |v_k| (1/l_n) Σ_{1≤i≠j≤l_n} K[c_n(k/n − τ_i)] K[c_n(k/n − τ_j)] = o_P(1).   (28)
The result for S_{n,l_n}, i.e. (7), follows from Lemma 1 with v_k ≡ 1. We next consider M_{n,l_n}, i.e. (8). Set Q_{k,n} := √(c_n/(n l_n)) α′g(X_{nk}) K_kn, where α ∈ ℝ^p. Noting that ∫₀¹ g(X_{n,[nt]}) dt is a continuous functional of X_{n,[nt]}, the limit result of (8), jointly with (7), will follow if we prove that

  { X_{n,[nt]}, Σ_{k=1}^n Q_{k,n} u_k } ⇒ { X_t, MN( 0, σ_u² ∫₀¹ [α′g(X_t)]² dt ∫K² ) }   (35)

on D_ℝ[0, 1]. First note that, by using Lemmas 1 and 2 with v_k ≡ 1,

  Σ_{k=1}^n Q²_{k,n} = (c_n/n) Σ_{k=1}^n [α′g(X_{nk})]² (1/l_n) Σ_{j=1}^{l_n} K²[c_n(k/n − τ_j)] + o_P(1)
                    = ∫₀¹ [α′g(X_{n,[nt]})]² dt ∫K² + o_P(1) →_d ∫₀¹ [α′g(X_t)]² dt ∫K²,

indicating that

  { X_{n,[nt]}, Σ_{k=1}^n Q²_{k,n} } ⇒ { X_t, ∫₀¹ [α′g(X_t)]² dt ∫K² }.

By Theorem 2.1 of Wang (2014), the limit result of (35) will follow if we prove

  max_{1≤k≤n} |Q_{k,n}| = o_P(1)   (36)

and

  n^{−1/2} Σ_{k=1}^n |Q_{k,n}| = o_P(1).   (37)

In fact, since ‖g‖² is still continuous, it follows from Lemma 2 with v_k = 1 that max_{1≤k≤n} |Q_{k,n}| = o_P(1), yielding (36). Similarly, recalling l_n/c_n → 0, we have

  n^{−1/2} Σ_{k=1}^n |Q_{k,n}| ≤ ‖α‖ n^{−1/2} √(c_n/(n l_n)) Σ_{k=1}^n ‖g(X_{nk})‖ K_kn
    = ‖α‖ √(l_n/c_n) (c_n/(n l_n)) Σ_{k=1}^n ‖g(X_{nk})‖ K_kn = O_P(√(l_n/c_n)) = o_P(1),

which shows (37). The proof of Theorem 2 is complete.
□

To show Theorem 3, we only prove (9), since (10) is a direct consequence of (9) and Theorem 2. Notice that, by condition (b), we may write

  Σ_{k=1}^n π(d_n)⁻¹ g(x_k) K_kn [ c_n/(n l_n), √(c_n/(n l_n)) u_k ]
    = Σ_{k=1}^n [ H(X_{nk}) + π(d_n)⁻¹ R(d_n, X_{nk}) ] K_kn [ c_n/(n l_n), √(c_n/(n l_n)) u_k ]
    = Σ_{k=1}^n H(X_{nk}) K_kn [ c_n/(n l_n), √(c_n/(n l_n)) u_k ] + (∆_{1n}, ∆_{2n}),

where R(λ, x) = [R₁(λ, x), ..., R_p(λ, x)]′ and

  ∆_{1n} = (c_n/(n l_n)) Σ_{k=1}^n π(d_n)⁻¹ R(d_n, X_{nk}) K_kn,
  ∆_{2n} = √(c_n/(n l_n)) Σ_{k=1}^n π(d_n)⁻¹ R(d_n, X_{nk}) K_kn u_k.

Now (9) follows from Theorem 2 with g(x) = H(x) if we prove

  |α′∆_{in}| = o_P(1), i = 1, 2,   (38)

for any α = (α₁, ..., α_p)′ ∈ ℝ^p. We only prove (38) with i = 2, since the proof of |α′∆_{1n}| = o_P(1) is similar but simpler. Recall K_kn := Σ_{j=1}^{l_n} K[c_n(k/n − τ_j)] and set, for A > 0,

  R̃_{n,l_n}(A) = √(c_n/(n l_n)) Σ_{k=1}^n α′π(d_n)⁻¹ R(d_n, X_{nk}) 1{|X_{nk}| ≤ A} K_kn u_k.

Note that, as n → ∞ first and then A → ∞,

  P( α′∆_{2n} ≠ R̃_{n,l_n}(A) ) ≤ P( max_{1≤k≤n} |X_{nk}| ≥ A ) → 0.   (39)

For any ε > 0 and A > 0, we have

  P( |α′∆_{2n}| ≥ ε ) ≤ P( α′∆_{2n} ≠ R̃_{n,l_n}(A) ) + ε⁻² E[ R̃²_{n,l_n}(A) ].
Now |α′∆_{2n}| = o_P(1) follows from (39) and the fact that, as n → ∞, for any A > 0,

  E[ R̃²_{n,l_n}(A) ] ≤ (c_n/(n l_n)) C Σ_{k=1}^n E[ |α′π(d_n)⁻¹R(d_n, X_{nk})|² 1{|X_{nk}| ≤ A} K²_kn ]
    ≤ (c_n/n) C ‖α‖² (A^δ)² ε²_n (1/l_n) Σ_{k=1}^n E K²_kn → 0,

where ε_n = max_{1≤i≤p} |[π_i(d_n)]⁻¹ a_i(d_n)| → 0, and we have used (28) of Lemma 2 (with G(x) ≡ 1 and v_k ≡ 1). The proof of Theorem 3 is now complete.

The proofs of (11) and (12) are essentially the same as that of (9). We only provide an outline for (11). For any α, β ∈ ℝ, let

  Q̃_{k,n} = √(c_n/(n l_n)) ( α H(X_{n,k}) K_kn + β K*_kn ),

where K*_kn := Σ_{j=1}^{l_n} K*[c_n(k/n − τ_j)]. As in the proof of (9), we have

  α U_{1n} + β U_{2n} = Σ_{k=1}^n Q̃_{k,n} u_k + o_P(1).

Note that, by using (31) and Lemmas 1 and 2,

  Σ_{k=1}^n Q̃²_{k,n} = α² ∫₀¹ H²(X_{n,[nt]}) dt ∫K² + 2αβ ∫₀¹ H(X_{n,[nt]}) dt ∫KK* + β² ∫(K*)² + o_P(1),

indicating that

  { X_{n,[nt]}, Σ_{k=1}^n Q̃²_{k,n} } ⇒ { X_t, [α, β] V [α, β]′ }.

Similarly, we may prove that (36) and (37) hold with Q_{k,n} replaced by Q̃_{k,n}. As a consequence, (11) follows from Wang (2014) as in the proof of Theorem 2. □

We only prove Theorem 5, since the proof of Theorem 4 is similar but simpler. Let

  A₁n = (c_n/(n l_n)) Σ_{k=1}^n [π(d_n)⁻¹ f(x_k)]² K_kn,   A₂n = (c_n/(n l_n)) Σ_{k=1}^n π(d_n)⁻¹ f(x_k) K_kn,
  A₃n = (c_n/(n l*_n)) Σ_{k=1}^n π(d_n)⁻¹ f(x_k) K*_kn,
  B₁n = √(c_n/(n l_n)) Σ_{k=1}^n π(d_n)⁻¹ f(x_k) K_kn u_k,   B₂n = √(c_n/(n l*_n)) Σ_{k=1}^n K*_kn u_k.

Recall (15) and Z_kn = f(x_k) K_kn, and note that (c_n/(n l*_n)) Σ_{k=1}^n K*_kn = ∫K* + o(1).
It is readily seen from (9) of Theorem 3 and Theorem 2 that

  λ_n π(d_n)⁻² Σ_{k=1}^n Z_kn f_k
    = (c_n/(n l_n)) Σ_{k=1}^n π(d_n)⁻¹ f(x_k) K_kn [ π(d_n)⁻¹ f(x_k) − Σ_{k=1}^n π(d_n)⁻¹ f(x_k) K*_kn / Σ_{k=1}^n K*_kn ]
    = A₁n − A₂n A₃n / ∫K* + o_P(1) = C_n ∫K + o_P(1),   (40)

where

  C_n = ∫₀¹ H²(X_{n,[nt]}) dt − [ ∫₀¹ H(X_{n,[nt]}) dt ]²,   if K*_kn = Σ_{j=1}^{l_n} K*[c_n(k/n − τ_j)],
  C_n = ∫₀¹ H²(X_{n,[nt]}) dt − [ ∫₀¹ H(X_{n,[nt]}) dt ] H(X_{n,[nτ*]}),   if K*_kn = K*[c_n(k/n − τ*)].

Similarly, we have

  √λ*_n λ_n π(d_n)⁻¹ Σ_{k=1}^n Z_kn u_k
    = √λ*_n λ_n { Σ_{k=1}^n π(d_n)⁻¹ f(x_k) K_kn u_k − [ Σ_{k=1}^n π(d_n)⁻¹ f(x_k) K_kn ][ Σ_{k=1}^n K*_kn u_k ] / Σ_{k=1}^n K*_kn }   (41)
    = √(l*_n/l_n) B₁n − A₂n B₂n / ∫K* + o_P(1) = A_n B_n + o_P(1),   (42)

where A_n = [ R*, − ∫₀¹ H(X_{n,[nt]}) dt ∫K / ∫K* ] and B_n = [B₁n, B₂n]′. Since both C_n and A_n are continuous functionals of X_{n,[nt]}, a simple application of (11) and (12) yields

  √λ*_n π(d_n) ( β̂ − β ) = [ √λ*_n λ_n π(d_n)⁻¹ Σ_{k=1}^n Z_kn u_k ] / [ λ_n π(d_n)⁻² Σ_{k=1}^n Z_kn f_k ]
    = ( C_n ∫K )⁻¹ A_n B_n + o_P(1) →_d σ_u MN( 0, ( C ∫K )⁻² A V A′ ),   (43)

as required. The proof of Theorem 5 is complete. □

We only prove Theorem 6 under the conditions of Theorem 5, since the proof under the conditions of Theorem 4 is similar. In addition to A₂n, B₁n, B₂n, A_n and B_n from the proof of Theorem 5, we define

  V_n = [ ∫₀¹ H²(X_{n,[nt]}) dt ∫K²    ∫₀¹ H(X_{n,[nt]}) dt Q* ;
          ∫₀¹ H(X_{n,[nt]}) dt Q*      ∫(K*)² ].
As in the proof of (40), by letting D_n = diag( π(d_n)√λ_n, √λ*_n ), we have

  λ*_n λ_n π(d_n)⁻² A_n V_n A_n′ = λ*_n λ_n π(d_n)⁻² A_n D_n D_n⁻¹ V_n D_n⁻¹ D_n A_n′
    = [ √(λ*_n/λ_n), − λ_n π(d_n)⁻¹ Σ_{k=1}^n f(x_k) K_kn / ( λ*_n Σ_{k=1}^n K*_kn ) ]
      × [ λ_n π(d_n)⁻² Σ_{k=1}^n K²_kn f²(x_k)              π(d_n)⁻¹ √(λ_n λ*_n) Σ_{k=1}^n K*_kn K_kn f(x_k) ;
          π(d_n)⁻¹ √(λ_n λ*_n) Σ_{k=1}^n K*_kn K_kn f(x_k)    λ*_n Σ_{k=1}^n (K*_kn)² ]
      × [ √(λ*_n/λ_n), − λ_n π(d_n)⁻¹ Σ_{k=1}^n f(x_k) K_kn / ( λ*_n Σ_{k=1}^n K*_kn ) ]′
    = A_n V_n A_n′ + o_P(1),   (44)

where A_n and V_n on the right hand side are the arrays defined in the proof of Theorem 5. Since σ̃² = σ_u² + o_P(1) under the given assumptions, by using similar arguments to the proofs of (42) and (43), it follows from (44) that

  T̂ = √λ*_n λ_n π(d_n)⁻¹ Σ_{k=1}^n Z_kn u_k / √( σ̃² λ*_n λ_n π(d_n)⁻² A_n V_n A_n′ )
     = ( σ_u² A_n V_n A_n′ )^{−1/2} A_n B_n + o_P(1) →_d N(0, 1),

as required. □

We only prove (25), as the proof of (24) is similar but simpler. We start the proof of (25) by assuming that there exists an
$A_0 > 0$ such that $K(x) = 0$ if $|x| \ge A_0$ and $K(x)$ is Lipschitz continuous on $\mathbb{R}$. This restriction will be removed later. Without loss of generality, suppose $A_0 = 1$. Set $\delta_{1n,j} = [n(\tau_j - 1/c_n)] \vee 1$, $\delta_{2n,j} = [n(\tau_j + 1/c_n)] \wedge n$ and $\delta_{0n,j} = [n\tau_j]$. Recall $\tau_j = j/(l_n+1)$. Since
$$|c_n(k/n - \tau_j)| < 1 \quad\text{only if}\quad \delta_{1n,j} \le k \le \delta_{2n,j},\qquad j = 1,\ldots,l_n, \tag{45}$$
by letting $R_{1n,j} = \frac{c_n}{n}\sum_{k=\delta_{1n,j}}^{\delta_{2n,j}} v_k K[c_n(k/n - \tau_j)]$ and
$$R_{2n,j} = \frac{c_n}{n}\sum_{k=\delta_{1n,j}}^{\delta_{2n,j}} \big[G(X_{nk}) - G\big(X_{n,\delta_{0n,j}}\big)\big]\, v_k\, K[c_n(k/n - \tau_j)],$$
we have
$$
S_{n,l_n} = \frac{1}{l_n}\sum_{j=1}^{l_n} \frac{c_n}{n}\sum_{k=1}^n G(X_{nk})\, v_k\, K[c_n(k/n - \tau_j)]
$$
$$
= \frac{1}{l_n}\sum_{j=1}^{l_n} G\big(X_{n,\delta_{0n,j}}\big) \frac{c_n}{n}\sum_{k=\delta_{1n,j}}^{\delta_{2n,j}} v_k K[c_n(k/n - \tau_j)]
+ \frac{1}{l_n}\sum_{j=1}^{l_n} \frac{c_n}{n}\sum_{k=\delta_{1n,j}}^{\delta_{2n,j}} \big[G(X_{nk}) - G\big(X_{n,\delta_{0n,j}}\big)\big] v_k K[c_n(k/n - \tau_j)]
$$
$$
= \frac{1}{l_n}\sum_{j=1}^{l_n} G\big(X_{n,\delta_{0n,j}}\big) R_{1n,j} + \frac{1}{l_n}\sum_{j=1}^{l_n} R_{2n,j}
$$
$$
= \frac{1}{l_n}\sum_{j=1}^{l_n} G\big(X_{n,\delta_{0n,j}}\big)\, A \int K
+ \frac{1}{l_n}\sum_{j=1}^{l_n} G\big(X_{n,\delta_{0n,j}}\big) \Big[R_{1n,j} - A\int K\Big]
+ \frac{1}{l_n}\sum_{j=1}^{l_n} R_{2n,j}
$$
$$
:= \frac{1}{l_n}\sum_{j=1}^{l_n} G\big(X_{n,\delta_{0n,j}}\big)\, A \int K + R_{1n} + R_{2n}.
$$
Since $\frac{1}{l_n}\sum_{j=1}^{l_n} G\big(X_{n,\delta_{0n,j}}\big) = \int_0^1 G(X_{n,[nt]})\,dt + o_P(1) \to_d \int_0^1 G(X_t)\,dt$, it suffices to show that
$$R_{jn} = o_P(1),\qquad j = 1, 2. \tag{46}$$
To prove (46), we start with some preliminaries. Recalling $X_{n,[nt]} \Rightarrow X_t$ on $D_{\mathbb{R}^p}[0,1]$ and that the limit process $X(t)$ is path continuous, we have $X_{n,[nt]} \Rightarrow X_t$ on $D_{\mathbb{R}^p}[0,1]$ in the sense of the uniform topology. See, for instance, Section 18 of Billingsley (1968).
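As a purely numerical illustration of the window structure in (45) — the kernel choice (Epanechnikov) and the values of $n$, $c_n$, $l_n$ below are assumptions for the sketch, not taken from the paper — the following checks that every nonzero weight $K[c_n(k/n - \tau_j)]$ falls inside the window $[\delta_{1n,j}, \delta_{2n,j}]$:

```python
import numpy as np

def K(x):
    # Epanechnikov kernel: support [-1, 1], matching the normalisation A0 = 1
    return 0.75 * np.maximum(1.0 - x**2, 0.0)

n, l_n, c_n = 2000, 9, 50.0              # illustrative values only
tau = np.arange(1, l_n + 1) / (l_n + 1)  # grid points tau_j = j/(l_n + 1)
k = np.arange(1, n + 1)

for tau_j in tau:
    d1 = max(int(n * (tau_j - 1.0 / c_n)), 1)      # delta_{1n,j}
    d2 = min(int(n * (tau_j + 1.0 / c_n)) + 1, n)  # delta_{2n,j}
    w = K(c_n * (k / n - tau_j))
    active = k[w > 0]
    # nonzero weights are confined to the j-th trimming window
    assert d1 <= active.min() and active.max() <= d2
print("all kernel weights lie inside their trimming windows")
```

Each grid point $\tau_j$ thus contributes a local average over roughly $2n/c_n$ observations, which is the chronological trimming exploited throughout the proof.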
This fact implies that
$$\limsup_{N\to\infty}\ \limsup_{n\to\infty}\ P\Big(\max_{1\le k\le n}\|X_{nk}\| \ge N\Big) = 0, \tag{47}$$
and, by the tightness of $\{X_{n,[nt]}\}_{0\le t\le 1}$, for any $\epsilon > 0$ and $\delta > 0$ there is some $\tilde\delta = \tilde\delta(\epsilon,\delta) > 0$ such that
$$P\Big(\sup_{|s-t|\le\tilde\delta}\|X_{n,[nt]} - X_{n,[ns]}\| \ge \delta\Big) \le \epsilon \tag{48}$$
holds for all sufficiently large $n$. In terms of (48), for any $\delta > 0$, we have
$$\lim_{n\to\infty} P\Big(\max_{1\le j\le l_n}\ \max_{\delta_{1n,j}\le l\le k\le \delta_{2n,j}}\|X_{nk} - X_{nl}\| \ge \delta\Big) = 0. \tag{49}$$
We are now ready to prove (46), starting with $j = 1$. For any $N > 0$, we let $G_N(x) = G(x)\,\xi_N(x)$ with
$$
\xi_N(x) =
\begin{cases}
1, & \|x\| \le N,\\
2 - \|x\|/N, & N < \|x\| < 2N,\\
0, & \|x\| \ge 2N,
\end{cases}
$$
and
$$\widetilde R_{1n} = \frac{1}{l_n}\sum_{j=1}^{l_n} G_N\big(X_{n,\delta_{0n,j}}\big)\Big[R_{1n,j} - A\int K\Big].$$
Then, as $n \to \infty$ first and then $N \to \infty$,
$$P\big(R_{1n} \ne \widetilde R_{1n}\big) \le P\Big(\max_{1\le k\le n}\|X_{nk}\| \ge N\Big) \to 0, \tag{50}$$
and
$$|\widetilde R_{1n}| \le \frac{C_N}{l_n}\sum_{j=1}^{l_n}\Big|R_{1n,j} - A\int K\Big|, \tag{51}$$
where $C_N := \sup_x |G_N(x)| < \infty$ is a constant depending only on $N$, due to the continuity of $G(x)$. Result (46) with $j = 1$ will follow if we prove
$$\max_{1\le j\le l_n} E\Big|R_{1n,j} - A\int K\Big| \to 0 \tag{52}$$
as $n \to \infty$. Indeed, by virtue of (51) and (52), we have $E|\widetilde R_{1n}| \to 0$ and then $\widetilde R_{1n} = o_P(1)$ for each $N \ge 1$. This, together with (50), yields $R_{1n} = o_P(1)$.

Since, as $n \to \infty$,
$$\max_{1\le j\le l_n}\Big|\frac{c_n}{n}\sum_{k=\delta_{1n,j}}^{\delta_{2n,j}} K[c_n(k/n - \tau_j)] - \int K\Big| \to 0, \tag{53}$$
to prove (52), it suffices to show that $\max_{1\le j\le l_n} E|A_n(\tau_j)| \to 0$, where
$$A_n(\tau_j) = \frac{c_n}{n}\sum_{k=\delta_{1n,j}}^{\delta_{2n,j}} (v_k - A)\, K[c_n(k/n - \tau_j)].$$
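The Riemann-sum fact (53) — that $(c_n/n)\sum_k K[c_n(k/n-\tau_j)]$ approaches $\int K$ uniformly in $j$ once $c_n \to \infty$ and $c_n/n \to 0$ — can be checked numerically. The kernel and the rate pairs below are illustrative assumptions, not the paper's choices:

```python
import numpy as np

def K(x):
    # Epanechnikov kernel, so that \int K = 1 exactly
    return 0.75 * np.maximum(1.0 - x**2, 0.0)

l_n = 9
for n, c_n in [(500, 10.0), (5000, 30.0), (50000, 100.0)]:  # c_n/n -> 0
    tau = np.arange(1, l_n + 1) / (l_n + 1)
    k = np.arange(1, n + 1)
    sums = [(c_n / n) * K(c_n * (k / n - t)).sum() for t in tau]
    err = max(abs(s - 1.0) for s in sums)  # max_j |Riemann sum - \int K|
    print(f"n={n:6d}, c_n={c_n:6.1f}: max_j |sum - 1| = {err:.2e}")
```

The discrepancy shrinks with $c_n/n$, mirroring the uniform convergence in (53).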
Let $\gamma = \gamma_n$ be integers such that $\gamma \to \infty$ and $\gamma c_n/n \to 0$, and set $T_{1n,j} = [\delta_{1n,j}/\gamma]$ and $T_{2n,j} = [\delta_{2n,j}/\gamma]$. Noting (45), we may write
$$
|A_n(\tau_j)| = \Big|\frac{c_n}{n}\sum_{k=\delta_{1n,j}}^{\delta_{2n,j}} (v_k - A)\, K[c_n(k/n - \tau_j)]\Big|
= \Big|\frac{c_n}{n}\sum_{s=T_{1n,j}}^{T_{2n,j}}\ \sum_{k=s\gamma}^{(s+1)\gamma} (v_k - A)\, K[c_n(k/n - \tau_j)]\Big|
$$
$$
\le \frac{\gamma c_n}{n}\sum_{s=T_{1n,j}}^{T_{2n,j}} K[c_n(s\gamma/n - \tau_j)]\, \frac{1}{\gamma}\Big|\sum_{k=s\gamma}^{(s+1)\gamma}(v_k - A)\Big|
+ \frac{c_n}{n}\sum_{s=T_{1n,j}}^{T_{2n,j}}\ \sum_{k=s\gamma}^{(s+1)\gamma} |v_k - A|\, \Big|K[c_n(k/n - \tau_j)] - K[c_n(s\gamma/n - \tau_j)]\Big|
$$
$$
:= A_{1n}(\tau_j) + A_{2n}(\tau_j).
$$
Since $\sup_{k\ge 1} E|v_k| < \infty$ by condition (b), it is readily seen from the Lipschitz condition on $K(x)$ that
$$
E A_{2n}(\tau_j) \le C\, \frac{\gamma c_n}{n}\, \frac{c_n}{n}\sum_{k=\delta_{1n,j}}^{\delta_{2n,j}} E|v_k - A| \le C\, \frac{\gamma c_n}{n} \to 0,
$$
uniformly in $1 \le j \le l_n$. Similarly, by using condition (b), we have
$$
\max_{1\le j\le l_n} E A_{1n}(\tau_j) \le \max_{\gamma \le s \le n-\gamma} E\Big|\frac{1}{\gamma}\sum_{k=s}^{s+\gamma} v_k - A\Big|\ \max_{1\le j\le l_n} A_{3n}(\tau_j) \to 0,
$$
where
$$A_{3n}(\tau_j) = \frac{\gamma c_n}{n}\sum_{s=T_{1n,j}}^{T_{2n,j}} K[c_n(s\gamma/n - \tau_j)],$$
and we have used the fact that $\max_{1\le j\le l_n}\big|A_{3n}(\tau_j) - \int K\big| \to 0$. Combining all these facts, we prove (52), and complete the proof of $R_{1n} = o_P(1)$.

We next show $R_{2n} = o_P(1)$. Let $\widetilde R_{2n} = \frac{1}{l_n}\sum_{j=1}^{l_n} \widetilde R_{2n,j}$, where
$$\widetilde R_{2n,j} = \frac{c_n}{n}\sum_{k=\delta_{1n,j}}^{\delta_{2n,j}} \big[G_N(X_{nk}) - G_N\big(X_{n,\delta_{0n,j}}\big)\big]\, v_k\, K[c_n(k/n - \tau_j)].$$
In terms of (47), we have
$$P\big(R_{2n} \ne \widetilde R_{2n}\big) \le P\Big(\max_{1\le k\le n}\|X_{nk}\| \ge N\Big) \to 0,$$
as $n \to \infty$ first and then $N \to \infty$. Result $R_{2n} = o_P(1)$ will follow if we prove $\widetilde R_{2n} = o_P(1)$ for each fixed $N \ge 1$. Recall that $G_N(x)$ is continuous with compact support.
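The blocking argument above relies on an $L_1$ law of large numbers for block means, $\max_{\gamma\le s\le n-\gamma}E\big|\gamma^{-1}\sum_{k=s}^{s+\gamma} v_k - A\big| \to 0$ as the block length $\gamma$ grows. A numerical sketch with an assumed AR(1) specification for $v_k$ (illustrative only, not the paper's condition (b)):

```python
import numpy as np

rng = np.random.default_rng(0)
A, rho, n_rep = 1.0, 0.5, 4000  # mean A, AR(1) coefficient, Monte Carlo reps

def block_mean_l1(gamma):
    # simulate n_rep independent blocks of v_k = A + e_k, with
    # e_k = rho * e_{k-1} + eps_k, and estimate E|block mean - A|
    eps = rng.standard_normal((n_rep, gamma))
    e = np.zeros((n_rep, gamma))
    e[:, 0] = eps[:, 0]
    for k in range(1, gamma):
        e[:, k] = rho * e[:, k - 1] + eps[:, k]
    return np.abs((A + e).mean(axis=1) - A).mean()

for gamma in (10, 100, 1000):
    print(f"gamma={gamma:5d}: E|block mean - A| ~ {block_mean_l1(gamma):.3f}")
```

The $L_1$ error of the block mean decays as $\gamma$ increases, which is exactly what the bound on $E A_{1n}(\tau_j)$ uses.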
For any $\epsilon > 0$, there exists a $\delta_\epsilon > 0$ such that $|G_N(x) - G_N(y)| \le \epsilon$ whenever $\|x - y\| \le \delta_\epsilon$. Write
$$\Omega_{\delta_\epsilon} = \Big\{\omega:\ \max_{1\le j\le l_n}\ \max_{\delta_{1n,j}\le l\le k\le \delta_{2n,j}}\|X_{nk} - X_{nl}\| \le \delta_\epsilon\Big\}.$$
By virtue of the facts above and (53), it is readily seen that
$$
\max_{1\le j\le l_n} E\big[|\widetilde R_{2n,j}|\, I(\Omega_{\delta_\epsilon})\big]
\le E\Big\{\max_{1\le j\le l_n}\ \max_{\delta_{1n,j}\le l\le k\le \delta_{2n,j}} |G_N(X_{nk}) - G_N(X_{nl})|\, I(\Omega_{\delta_\epsilon})\ \frac{c_n}{n}\sum_{k=\delta_{1n,j}+1}^{\delta_{2n,j}} |v_k|\, K[c_n(k/n - \tau_j)]\Big\}
$$
$$
\le \epsilon\, \sup_{k\ge 1} E|v_k|\ \frac{c_n}{n}\sum_{k=\delta_{1n,j}+1}^{\delta_{2n,j}} K[c_n(k/n - \tau_j)] \le C_N\,\epsilon,
$$
where $C_N$ is a constant depending only on $N$. Now, for any $\eta_1 > 0$ and $\eta_2 > 0$, let $\epsilon = \eta_1\eta_2$ and $n_0$ be large enough so that, for all $n \ge n_0$ [recall (49)],
$$P\Big(\max_{1\le j\le l_n}\ \max_{\delta_{1n,j}\le l\le k\le \delta_{2n,j}}\|X_{nk} - X_{nl}\| \ge \delta_\epsilon\Big) \le \eta_1.$$
It is readily seen that, for all $n \ge n_0$,
$$
P\big(|\widetilde R_{2n}| \ge \eta_2\big) \le P\big(\bar\Omega_{\delta_\epsilon}\big) + \eta_2^{-1}\,\frac{1}{l_n}\sum_{j=1}^{l_n} E\big[|\widetilde R_{2n,j}|\, I(\Omega_{\delta_\epsilon})\big] \le C_N\, \eta_1,
$$
where $\bar\Omega_{\delta_\epsilon}$ denotes the complement of $\Omega_{\delta_\epsilon}$ and $C_N$ is a constant depending only on $N$. This yields $\widetilde R_{2n} = o_P(1)$ for each fixed $N \ge 1$, and completes the proof of $R_{2n} = o_P(1)$.

We finally remove the restriction on $K$ and then conclude the proof of Lemma 1. If $K$ has compact support, then there exists $A_0 > 0$ such that $K(x) = 0$ holds for all $|x| \ge A_0$.
If $K$ is eventually monotonic, then for any $\epsilon > 0$ we can also choose a constant $A_0 := A_0(\epsilon) > 0$ such that $K(x)$ is monotonic on $(-\infty, -A_0)$ and $(A_0, \infty)$ and $\int_{|x|>A_0} K(x)\,dx < \epsilon$ (to simplify the notation, here we use the same symbol $A_0$ to denote the constant).

Since $K \ge 0$ with $\int K < \infty$, for any $\epsilon > 0$ there exists an $A := A_\epsilon \ge A_0 + 1$ such that
$$\int |K - K_{\epsilon,A}| \le \epsilon,$$
where $K_{\epsilon,A}(x) = 0$ if $|x| \ge A$ and $K_{\epsilon,A}(x)$ is Lipschitz continuous on $\mathbb{R}$. Let $\widetilde K(x) = K(x) - K_{\epsilon,A}(x)$ and
$$S_{n,\epsilon} = \frac{1}{l_n}\sum_{j=1}^{l_n}\frac{c_n}{n}\sum_{k=1}^n G(X_{nk})\, v_k\, \widetilde K[c_n(k/n - \tau_j)].$$
It suffices to show that, as $n \to \infty$ first and then $\epsilon \to 0$,
$$S_{n,\epsilon} = o_P(1). \tag{54}$$
The proof of (54) is similar to that of (46). Indeed, by letting
$$S_{n,\epsilon,N} = \frac{1}{l_n}\sum_{j=1}^{l_n}\frac{c_n}{n}\sum_{k=1}^n G_N(X_{nk})\, v_k\, \widetilde K[c_n(k/n - \tau_j)],$$
we have
$$P\big[S_{n,\epsilon} \ne S_{n,\epsilon,N}\big] \le P\Big(\max_{1\le k\le n}\|X_{nk}\| \ge N\Big) \to 0,$$
as $n \to \infty$ first and then $N \to \infty$. Hence it suffices to show that, for each fixed $N \ge 1$, $S_{n,\epsilon,N} = o_P(1)$ as $n \to \infty$ first and then $\epsilon \to 0$. Note that
$$\sup_{1\le j\le l_n}\Big|\frac{c_n}{n}\sum_{k=1}^n \big|\widetilde K[c_n(k/n - \tau_j)]\big|\, I\big(c_n|k/n - \tau_j| \le A\big) - \int_{-A}^{A} |\widetilde K(x)|\,dx\Big| \to 0,
$$
and, if $K(x)$ is monotonic on $(-\infty, -A)$ and $(A, \infty)$, then for sufficiently large $n$, uniformly for $1 \le j \le l_n$,
$$
\frac{c_n}{n}\sum_{k=1}^n \big|\widetilde K[c_n(k/n - \tau_j)]\big|\, I\big(c_n|k/n - \tau_j| > A\big)
= \frac{c_n}{n}\sum_{k=1}^n K[c_n(k/n - \tau_j)]\, I\big(c_n|k/n - \tau_j| > A\big)
\le \int_{|x| > A - c_n/n} K(x)\,dx \le \int_{|x| > A_0} K(x)\,dx < \epsilon.
$$
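The truncation device used here — approximating a nonnegative integrable $K$ by a Lipschitz, compactly supported $K_{\epsilon,A}$ with $\int|K - K_{\epsilon,A}| \le \epsilon$ — can be sketched numerically. The Gaussian kernel and the linear taper below are one possible construction, assumed for illustration rather than taken from the paper:

```python
import math
import numpy as np

def K(x):
    # Gaussian kernel: nonnegative, integrable, not compactly supported
    return np.exp(-x**2 / 2.0) / math.sqrt(2.0 * math.pi)

def K_trunc(x, A):
    # Lipschitz approximation: K tapered linearly to zero on |x| in [A-1, A],
    # identically zero for |x| >= A
    return K(x) * np.clip(A - np.abs(x), 0.0, 1.0)

x = np.linspace(-10.0, 10.0, 400001)
dx = x[1] - x[0]
for A in (3.0, 5.0, 7.0):
    l1 = np.abs(K(x) - K_trunc(x, A)).sum() * dx  # \int |K - K_{eps,A}|
    print(f"A={A}: int |K - K_eps,A| ~ {l1:.2e}")
```

Increasing $A$ drives $\int|K - K_{\epsilon,A}|$ below any prescribed $\epsilon$, which is all the proof requires.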
Hence, in terms of the uniform boundedness of $G_N(x)$, we have
$$
E|S_{n,\epsilon,N}| \le C_N\, \sup_{k\ge 1} E|v_k|\ \frac{1}{l_n}\sum_{j=1}^{l_n}\frac{c_n}{n}\sum_{k=1}^n \big|\widetilde K[c_n(k/n - \tau_j)]\big| \to 0,
$$
as $n \to \infty$ first and then $\epsilon \to 0$. Hence $S_{n,\epsilon,N} = o_P(1)$ as $n \to \infty$ first and then $\epsilon \to 0$. The proof of (54) is completed. $\Box$

We first prove (27). Using similar arguments as in the proof of (46) or (54), it suffices to show that, as $n \to \infty$,
$$I_n := \frac{c_n}{n}\sum_{k=1}^{n}\sum_{1\le i \le l_n} \cdots$$

References

[1] Journal of Econometrics, 349–374.
[2] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York.
[3] Bollerslev, T., Osterrieder, D., Sizova, N. and Tauchen, G. (2013). Risk and return: long-run relations, fractional cointegration, and return predictability. Journal of Financial Economics, 409–424.
[4] Breitung, J. and Demetrescu, M. (2015). Instrumental variable and variable addition based inference in predictive regressions. Journal of Econometrics 187(1), 358–375.
[5] Buchmann, B. and Chan, N.H. (2007). Asymptotic theory of least squares estimators for nearly unstable processes under strong dependence. Annals of Statistics 35(5), 2001–2017.
[6] Campbell, J.Y. and Yogo, M. (2006). Efficient tests of stock return predictability. Journal of Financial Economics 81, 27–60.
[7] Chan, N. and Wang, Q. (2015). Nonlinear regressions with nonstationary time series. Journal of Econometrics 185, 182–195.
[8] Christopeit, N. (2009). Weak convergence to nonlinear transformations of integrated processes: the multivariate case. Econometric Theory 25, 1180–1207.
[9] Demetrescu, M., Georgiev, I., Rodrigues, P. and Taylor, R. (2020). Testing for episodic predictability in stock returns. Journal of Econometrics, in press.
[10] Duffy, J.A. and Kasparis, I. (2018). Regressions with fractional $d = 1/2$ and weakly nonstationary processes. Mimeo, arXiv:1812.07944.
[11] Giraitis, L. and Phillips, P.C.B. (2006).
Uniform limit theory for stationary autoregression. Journal of Time Series Analysis 27(1), 51–60.
[12] Hall, P. and Heyde, C.C. (1980). Martingale Limit Theory and Its Application. Academic Press, New York.
[13] Hjalmarsson, E. (2011). New methods for inference in long-horizon regressions. Journal of Financial and Quantitative Analysis 46, 815–839.
[14] Hosseinkouchack, M. and Demetrescu, M. (2019). Finite-sample size control of IVX-based tests in predictive regressions. Mimeo.
[15] Hu, Z., Phillips, P.C.B. and Wang, Q. (2019). Nonlinear cointegrating power regression with endogeneity. Preprint, Cowles Foundation Discussion Papers No. 2211.
[16] Hualde, J. and Robinson, P.M. (2010). Semiparametric inference in multivariate fractionally cointegrated systems. Journal of Econometrics 157(2), 492–511.
[17] Johansen, S. (1995). Likelihood-Based Inference in Cointegrated Vector Auto-Regressive Models. Oxford University Press, New York.
[18] Kallenberg, O. (2002). Foundations of Modern Probability, Second Edition. Springer-Verlag, Berlin.
[19] Kasparis, I. (2010). The Bierens test for certain nonstationary models. Journal of Econometrics 158, 221–230.
[20] Kasparis, I., Andreou, E. and Phillips, P.C.B. (2015). Nonparametric predictive regression. Journal of Econometrics 185(2), 468–494.
[21] Kostakis, A., Magdalinos, T. and Stamatogiannis, M.P. (2015). Robust econometric inference for stock return predictability. Review of Financial Studies 28(5), 1506–1553.
[22] Magdalinos, T. and Phillips, P.C.B. (2009). Econometric inference in the vicinity of unity. Mimeo, Singapore Management University.
[23] Marmer, V. (2007). Nonlinearity, nonstationarity and spurious forecasts. Journal of Econometrics 138, 1–27.
[24] Mikusheva, A. (2007). Uniform inference in autoregressive models. Econometrica 75(5), 1411–1452.
[25] Park, J.Y. (2003). Nonstationary nonlinear heteroskedasticity. Journal of Econometrics 110, 383–415.
[26] Park, J.Y. and Phillips, P.C.B. (1999).
Asymptotics for nonlinear transformations of integrated time series. Econometric Theory 15(3), 269–298.
[27] Park, J.Y. and Phillips, P.C.B. (2001). Nonlinear regressions with integrated time series. Econometrica 69(1), 117–161.
[28] Phillips, P.C.B. (1991). Optimal inference in cointegrated systems. Econometrica 59(2), 283–306.
[29] Phillips, P.C.B. (1995). Fully modified least squares and vector autoregression. Econometrica 63(5), 1023–1078.
[30] Phillips, P.C.B. (2014). On confidence intervals for autoregressive roots and predictive regressions. Econometrica 82(3), 1177–1195.
[31] Phillips, P.C.B. (2015). Pitfalls and possibilities in predictive regression. Journal of Financial Econometrics 13(3), 521–555.
[32] Phillips, P.C.B. and Hansen, B. (1990). Statistical inference in instrumental variables regression with I(1) processes. Review of Economic Studies 57(1), 99–125.
[33] Phillips, P.C.B., Li, D. and Gao, J. (2017). Estimating smooth structural change in cointegration models. Journal of Econometrics 196, 180–195.
[34] Phillips, P.C.B. and Magdalinos, T. (2007). Limit theory for moderate deviations from a unit root. Journal of Econometrics 136(1), 115–130.
[35] Robinson, P.M. (1995). Gaussian semiparametric estimation of long range dependence. Annals of Statistics 23(5), 1630–1661.
[36] Robinson, P.M. and Hualde, J. (2003). Cointegration in fractional systems with unknown integration orders. Econometrica 71(6), 1727–1766.
[37] Shimotsu, K. and Phillips, P.C.B. (2005). Exact local Whittle estimation of fractional integration. Annals of Statistics 33(4), 1890–1933.
[38] Wang, Q. (2014). Martingale limit theorem revisited and nonlinear cointegrating regression. Econometric Theory 30(3), 509–535.
[39] Wang, Q. (2015). Limit Theorems for Nonlinear Cointegrating Regression. World Scientific, Singapore.
[40] Wang, Q. and Phillips, P.C.B. (2009a). Asymptotic theory for local time density estimation and nonparametric cointegrating regression.