An Asymptotically F-Distributed Chow Test in the Presence of Heteroscedasticity and Autocorrelation
Yixiao Sun
Department of Economics, UC San Diego, USA

Xuexin Wang
School of Economics and WISE, Xiamen University, China

November 12, 2019
Abstract
This study proposes a simple, trustworthy Chow test in the presence of heteroscedasticity and autocorrelation. The test is based on a series heteroscedasticity and autocorrelation robust variance estimator with judiciously crafted basis functions. Like the Chow test in a classical normal linear regression, the proposed test employs the standard F distribution as the reference distribution, which is justified under fixed-smoothing asymptotics. Monte Carlo simulations show that the null rejection probability of the asymptotic F test is closer to the nominal level than that of the chi-square test.
Keywords:
Chow Test, F Distribution, Heteroscedasticity and Autocorrelation, Structural Break.
For predictive modeling and policy analysis using time series data, it is important to check whether a structural relationship is stable over time. The Chow (1960) test is designed to test whether a break takes place at a given period in an otherwise stable relationship. The test is widely used in empirical applications and has been included in standard econometric textbooks.

This paper considers the Chow test in the presence of heteroscedasticity and autocorrelation. There is ample evidence that the Chow test can have very large size distortions if heteroscedasticity and autocorrelation are not accounted for (e.g., Krämer (1989) and Giles and Scott (1992)). Even if we account for them using heteroscedasticity and autocorrelation robust (HAR) variance estimators (e.g., Newey and West (1987) and Andrews (1991)), the test can still over-reject the null hypothesis by a large margin if chi-square critical values are used. (When the Chow test is performed on a single coefficient, normal critical values are typically used for the t statistic; for now, we focus on the Wald-type Chow test for more than one coefficient, so that chi-square critical values are used.) This is a general problem for any HAR inference, as the chi-square approximation ignores the often substantial finite sample randomness of the HAR variance estimator. To address this problem, the recent literature has developed a new type of asymptotics known as fixed-smoothing asymptotics (see, e.g., Kiefer and Vogelsang (2002a,b, 2005) for early seminal contributions). It is now well known that fixed-smoothing approximations are more accurate than the conventional chi-square and normal approximations.

∗ We thank Derrick H. Sun for excellent research assistance.

To establish the asymptotic F theory for the Chow test under fixed-smoothing asymptotics, we have to transform the usual orthonormal bases, such as sine and cosine bases, using the Gram–Schmidt orthonormalization.
This is because, unlike HAR inference in a regression with stationary regressors and regression errors, using the usual bases as in Sun (2013) does not lead to a standard fixed-smoothing asymptotic distribution: the regressors in the regression for the structural break test are identically zero before or after the break point and are thus not stationary. The Gram–Schmidt orthonormalization ensures that the transformed bases are orthonormal with respect to a special inner product that is built into the problem under consideration. The asymptotic F test is very convenient to use, as the F critical values are readily available from standard statistical tables and programming environments.

Monte Carlo simulation experiments show that the F test based on the transformed Fourier bases is as accurate as the nonstandard test based on the usual Fourier bases. The F test and the nonstandard test have the same size-adjusted power as the corresponding chi-square tests but much more accurate size. Given its convenience, competitive power, and higher size accuracy, we recommend the F test for practical use.

Our F test theory generalizes the classical Chow test in a linear normal regression, where the F distribution is the exact finite sample distribution. The main departures are that we do not make the normality assumption and that we allow for heteroscedasticity and autocorrelation of unknown forms. Without restrictive assumptions such as normality and strict exogeneity, it is in general not possible to obtain the exact finite sample distribution. Instead, we employ fixed-smoothing asymptotics to show that the Wald statistic is asymptotically F distributed.

This study contributes to the asymptotic F test theory in the HAR literature. The asymptotic F theory has been developed in a number of papers, including Sun (2011); Sun and Kim (2012); Sun (2013); Hwang and Sun (2017); Lazarus et al. (2018); Liu and Sun (2019); Wang and Sun (2019); Martínez-Iriarte et al. (2019). However, none of these studies considers the case where the regressors take the special form of nonstationarity that we consider here. Cho and Vogelsang (2017) consider fixed-b asymptotics for testing structural breaks, but they consider only kernel HAR variance estimators. As a result, the fixed-smoothing asymptotic distributions they obtain are highly nonstandard.

The rest of this paper is organized as follows. Section 2 presents the basic setting and introduces the test statistics. Section 3 establishes the fixed-smoothing asymptotics of the F and t statistics. Section 4 develops asymptotically valid F and t tests. Section 5 extends the basic regression model to include other covariates whose coefficients are known to be stable over time.

(In the series case, fixed-smoothing asymptotics holds the number of basis functions fixed as the sample size increases. In the kernel case, it holds the truncation lag parameter fixed at a certain proportion of the sample size.)
Given the time series observations {X_t ∈ R^m, Y_t ∈ R}_{t=1}^T, we consider the model

Y_t = X_t · 1{t ≤ [λT]} · β_1 + X_t · 1{t ≥ [λT] + 1} · β_2 + u_t, for t = 1, 2, ..., T,

where the unobserved u_t satisfies E(X_t u_t) = 0. In the above, λ is a known parameter in (0, 1), so that [λT] is the period where the structural break may take place. The effects of X_t on Y_t before and after the break are β_1 ∈ R^m and β_2 ∈ R^m, respectively. We allow X_t u_t to exhibit autocorrelation of unknown forms. In particular, we allow u_t to be heteroskedastic so that E(u_t² | X_t) is a nontrivial function of X_t.

We are interested in testing the null H_0: R_0 β_1 = R_0 β_2 against the alternative H_1: R_0 β_1 ≠ R_0 β_2 for some p × m matrix R_0. When R_0 is the m × m identity matrix, we aim at testing whether β_1 is equal to β_2. For the moment, we consider the case that all coefficients are subject to a possible break. In Section 5, we consider the case that some of the coefficients are known to be time invariant.

Let X_{1t} = X_t · 1{t ≤ [λT]} and X_{2t} = X_t · 1{t ≥ [λT] + 1}. Note that both X_{1t} and X_{2t} are nonstationary. The form of the nonstationarity makes the problem at hand unique. Let β = (β_1′, β_2′)′ and X̃_t = (X_{1t}, X_{2t}). Then

Y_t = X̃_t β + u_t,

and the hypotheses of interest become H_0: Rβ = 0 and H_1: Rβ ≠ 0 for R = [R_0, −R_0] ∈ R^{p×2m}. Denote X̃ = (X̃_1′, ..., X̃_T′)′, Y = (Y_1, ..., Y_T)′, and u = (u_1, ..., u_T)′. We estimate β by OLS:

β̂ = (X̃′X̃)^{−1} X̃′Y.

The OLS estimator β̂ satisfies

√T(β̂ − β) = Q̂^{−1} · (1/√T) Σ_{t=1}^T X̃_t′ u_t,

where

Q̂ = X̃′X̃/T = diag( T^{−1} Σ_{t=1}^{[Tλ]} X_t′X_t,  T^{−1} Σ_{t=[Tλ]+1}^T X_t′X_t )

is block diagonal with zero off-diagonal blocks. To make inferences on β, such as testing whether Rβ is zero, we need to estimate the variance of T^{−1/2} Σ_{t=1}^T X̃_t′ u_t. To this end, we first construct the residual û_t = Y_t − X̃_t β̂, which serves as an estimate of u_t. Given a set of basis functions {φ_j(·)}_{j=1}^K, we then construct the series estimator of the variance as

Ω̂ = (1/K) Σ_{j=1}^K [ (1/√T) Σ_{t=1}^T φ_j(t/T) X̃_t′ û_t ]^⊗,

where, for a vector a, a^⊗ is the outer product of a, that is, a^⊗ = aa′. The asymptotic variance of R√T(β̂ − β) is then estimated by R Q̂^{−1} Ω̂ Q̂^{−1} R′.
The Wald statistic for testing H_0: Rβ = 0 against H_1: Rβ ≠ 0 is

F_T = T · (Rβ̂)′ [ R Q̂^{−1} Ω̂ Q̂^{−1} R′ ]^{−1} (Rβ̂).

When p = 1 and we test H_0: Rβ = 0 against a one-sided alternative, say H_1: Rβ > 0, we can construct the t statistic:

t_T = √T · Rβ̂ / [ R Q̂^{−1} Ω̂ Q̂^{−1} R′ ]^{1/2}.

The forms of the F and t statistics are standard.
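To fix ideas, the construction of F_T can be sketched in a few lines of numpy. Everything below (sample size, break fraction, AR(1) design, the K = 8 Fourier basis functions) is an illustrative choice, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)
T, lam, K, m = 200, 0.5, 8, 2
tb = int(lam * T)                         # break point [lambda * T]

# Illustrative data: X_t = (1, q_t) with AR(1) regressor and AR(1) error,
# and no actual break (H0 holds).
q = np.zeros(T); u = np.zeros(T)
for t in range(1, T):
    q[t] = 0.5 * q[t - 1] + rng.standard_normal()
    u[t] = 0.5 * u[t - 1] + rng.standard_normal()
X = np.column_stack([np.ones(T), q])
Y = X @ np.array([1.0, 1.0]) + u

# Break regressors: X_{1t} = X_t 1{t <= [lam T]}, X_{2t} = X_t 1{t > [lam T]}
ind1 = (np.arange(1, T + 1) <= tb).astype(float)
Xt = np.column_stack([X * ind1[:, None], X * (1 - ind1)[:, None]])

beta_hat = np.linalg.solve(Xt.T @ Xt, Xt.T @ Y)
u_hat = Y - Xt @ beta_hat
Qhat = Xt.T @ Xt / T

# Series HAR variance estimator with the usual Fourier basis functions
r = np.arange(1, T + 1) / T
phi = [np.sqrt(2) * f(2 * np.pi * j * r)
       for j in range(1, K // 2 + 1) for f in (np.cos, np.sin)]
Omega = np.zeros((2 * m, 2 * m))
for pj in phi:
    g = (pj[:, None] * Xt * u_hat[:, None]).sum(axis=0) / np.sqrt(T)
    Omega += np.outer(g, g) / K

# Wald statistic for H0: beta_1 = beta_2 (R_0 = I_m, so p = m)
Rm = np.hstack([np.eye(m), -np.eye(m)])
A = np.linalg.inv(Qhat)
V = Rm @ A @ Omega @ A @ Rm.T             # R Qhat^{-1} Omega Qhat^{-1} R'
Rb = Rm @ beta_hat
F_T = T * Rb @ np.linalg.solve(V, Rb)
print(F_T)
```

With the transformed bases of Section 4, the same F_T (after rescaling) feeds directly into the F test of Proposition 4.1.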
To establish the asymptotic distributions of F_T and t_T, we maintain the following three assumptions.

Assumption 3.1  T^{−1} Σ_{t=1}^{[Tr]} X_t′X_t →_p Q · r uniformly over r ∈ [0, 1], and Q is invertible.

Assumption 3.2  T^{−1/2} Σ_{t=1}^{[Tr]} X_t′u_t →_d Λ W_m(r) for r ∈ [0, 1], where Ω = ΛΛ′ is the long run variance of {X_t′u_t} and W_m(·) is an m × 1 standard Brownian motion.

Assumption 3.3  The basis functions φ_j(·), j = 1, 2, ..., K, are piecewise monotonic and piecewise continuously differentiable.

Lemma 3.1  Let Assumptions 3.1 and 3.2 hold. Then

√T(β̂ − β) := ( √T(β̂_1 − β_1) ; √T(β̂_2 − β_2) ) →_d ( Q^{−1}Λ · (1/λ) ∫_0^λ dW_m(r) ; Q^{−1}Λ · (1/(1−λ)) ∫_λ^1 dW_m(r) ).

If Assumption 3.3 also holds, then

(1/√T) Σ_{t=1}^T φ_j(t/T) X̃_t′ û_t →_d ( Λ ∫_0^λ [φ_j(r) − φ̄_{j,1}] dW_m(r) ; Λ ∫_λ^1 [φ_j(r) − φ̄_{j,2}] dW_m(r) ),

where

φ̄_{j,1} = (1/λ) ∫_0^λ φ_j(s) ds and φ̄_{j,2} = (1/(1−λ)) ∫_λ^1 φ_j(s) ds.

Note that (1/λ) ∫_0^λ dW_m(r) and (1/(1−λ)) ∫_λ^1 dW_m(r) are the average changes of the Brownian motion over the intervals [0, λ] and [λ, 1], respectively. Lemma 3.1 shows that √T(β̂_1 − β_1) and √T(β̂_2 − β_2) are (matrix) proportional to these average changes. Given the independence of the changes of a Brownian motion over non-overlapping intervals, √T(β̂_1 − β_1) and √T(β̂_2 − β_2) are asymptotically independent. Note that φ̄_{j,1} can be regarded as an average of φ_j(·) over the interval [0, λ]. Similarly, φ̄_{j,2} can be regarded as an average of φ_j(·) over the interval [λ, 1]. So φ_j(r) − φ̄_{j,1} and φ_j(r) − φ̄_{j,2} are the demeaned versions of φ_j(r) over the intervals [0, λ] and [λ, 1], respectively.

Using Lemma 3.1, we can prove our main theorem below.

Theorem 3.1  Let Assumptions 3.1–3.3 hold. Then, under the null hypothesis,

F_T →_d [ (1/λ) ∫_0^λ dW_p(r) − (1/(1−λ)) ∫_λ^1 dW_p(r) ]′ × [ (1/K) Σ_{j=1}^K { ∫_0^1 φ̃_j(r; λ) dW_p(r) }^⊗ ]^{−1} × [ (1/λ) ∫_0^λ dW_p(r) − (1/(1−λ)) ∫_λ^1 dW_p(r) ] := F_∞,   (1)

where

φ̃_j(r; λ) = (1/λ) [φ_j(r) − φ̄_{j,1}] · 1{r ≤ λ} − (1/(1−λ)) [φ_j(r) − φ̄_{j,2}] · 1{r > λ}.   (2)

When p = 1,

t_T →_d [ (1/λ) ∫_0^λ dW_p(r) − (1/(1−λ)) ∫_λ^1 dW_p(r) ] / [ (1/K) Σ_{j=1}^K { ∫_0^1 φ̃_j(r; λ) dW_p(r) }^⊗ ]^{1/2} := t_∞.

Like the finite sample distributions, the limiting distributions of F_T and t_T depend on λ and on the number and form of the basis functions.
This is an attractive feature of the fixed-smoothing approximations, as they capture the effects of all these factors. More importantly, the fixed-smoothing approximations capture the randomness of the HAR variance estimator, which clearly affects the finite sample distributions of F_T and t_T. This is why the fixed-smoothing asymptotic approximations are more accurate than the chi-square or normal approximations.
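Since F_∞ is pivotal, its critical values can be simulated once and for all, approximating the Brownian motion by scaled partial sums of iid normals (the same device used in the paper's Monte Carlo section). A scaled-down numpy sketch, in which λ = 0.5, K = 8 Fourier-type functions φ̃_j, p = 2, and the replication counts are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
n, lam, K, p, nrep = 1000, 0.5, 8, 2, 2000
r = np.arange(1, n + 1) / n
seg1 = r <= lam

# phi_tilde_j from (2): Fourier functions demeaned over each segment and
# rescaled by 1/lam and -1/(1 - lam)
Phi = np.array([np.sqrt(2) * f(2 * np.pi * j * r)
                for j in range(1, K // 2 + 1) for f in (np.cos, np.sin)])
Pt = np.zeros_like(Phi)
Pt[:, seg1] = (Phi[:, seg1] - Phi[:, seg1].mean(1, keepdims=True)) / lam
Pt[:, ~seg1] = -(Phi[:, ~seg1] - Phi[:, ~seg1].mean(1, keepdims=True)) / (1 - lam)

draws = np.empty(nrep)
for rep in range(nrep):
    dW = rng.standard_normal((n, p)) / np.sqrt(n)   # increments of W_p
    xi = dW[seg1].sum(0) / lam - dW[~seg1].sum(0) / (1 - lam)
    eta = Pt @ dW                                   # rows: int phi_tilde_j dW_p
    S = eta.T @ eta / K
    draws[rep] = xi @ np.linalg.solve(S, xi)        # draw from F_infinity
crit = np.quantile(draws, 0.95)                     # 5%-level critical value
print(crit)
```

Comparing F_T with such simulated critical values gives the nonstandard fixed-smoothing test; the transformed bases of Section 4 replace this simulation step with standard F critical values.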
The limiting distributions F_∞ and t_∞ in Theorem 3.1 are pivotal but nonstandard. We can approximate the nonstandard distributions using a chi-square or normal distribution. We can also design a new set of basis functions so that F_∞ and t_∞ become the standard F and t distributions after some multiplicative adjustment.

Define

φ̃_0(r; λ) = (1/λ) · 1{r ≤ λ} − (1/(1−λ)) · 1{r > λ}.

Then

(1/λ) ∫_0^λ dW_p(r) − (1/(1−λ)) ∫_λ^1 dW_p(r) = ∫_0^1 φ̃_0(r; λ) dW_p(r) ∼ N( 0, ∫_0^1 [φ̃_0(r; λ)]² dr · I_p ) = N( 0, (1/(λ(1−λ))) I_p ),

so

η_0 := √(λ(1−λ)) [ (1/λ) ∫_0^λ dW_p(r) − (1/(1−λ)) ∫_λ^1 dW_p(r) ] = √(λ(1−λ)) ∫_0^1 φ̃_0(r; λ) dW_p(r) ∼ N(0, I_p).

As a result,

λ(1−λ) F_T →_d η_0′ [ (1/K) Σ_{j=1}^K η_j η_j′ ]^{−1} η_0 and √(λ(1−λ)) t_T →_d η_0 / [ (1/K) Σ_{j=1}^K η_j η_j′ ]^{1/2},

where

η_j = ∫_0^1 φ̃_j(r; λ) dW_p(r) for j = 1, ..., K.

When K is relatively large, it is reasonable to approximate (1/K) Σ_{j=1}^K η_j η_j′ by its mean:

E (1/K) Σ_{j=1}^K η_j η_j′ = I_p · (1/K) Σ_{j=1}^K ∫_0^1 [φ̃_j(r; λ)]² dr.

With such an approximation, we have

λ(1−λ) · [ (1/K) Σ_{j=1}^K ∫_0^1 [φ̃_j(r; λ)]² dr ]^{−1} · F_∞ ∼_a χ²_p and √(λ(1−λ)) · [ (1/K) Σ_{j=1}^K ∫_0^1 [φ̃_j(r; λ)]² dr ]^{−1/2} · t_∞ ∼_a N(0, 1),

where '∼_a' signifies distributional approximation. As a result, we can employ the following approximations:

F*_T := λ(1−λ) · [ (1/(KT)) Σ_{j=1}^K Σ_{i=1}^T ( φ̃_{j,T}(i/T; λ) )² ]^{−1} · F_T ∼_a χ²_p,   (3)

t*_T := √(λ(1−λ)) · [ (1/(KT)) Σ_{j=1}^K Σ_{i=1}^T ( φ̃_{j,T}(i/T; λ) )² ]^{−1/2} · t_T ∼_a N(0, 1),   (4)

where φ̃_{j,T}(r; λ) is the finite sample version of φ̃_j(r; λ) given by

φ̃_{j,T}(r; λ) = (1/λ) [ φ_j(r) − (1/[λT]) Σ_{t=1}^{[λT]} φ_j(t/T) ] · 1{r ≤ λ} − (1/(1−λ)) [ φ_j(r) − (1/(T−[λT])) Σ_{t=[λT]+1}^T φ_j(t/T) ] · 1{r > λ}.   (5)

It is important to point out that the chi-square and normal approximations are not based on the original Wald and t statistics but rather on their modified versions F*_T and t*_T. To a great extent, the chi-square and normal approximations we propose here improve upon the conventional chi-square and normal approximations that are applied directly to the original Wald and t statistics.

Note that the chi-square distribution and standard normal distribution in (3) and (4) are not the asymptotic distributions of F*_T and t*_T for a fixed K. The fixed-K asymptotic distributions are given by

F*_T →_d λ(1−λ) · [ (1/K) Σ_{j=1}^K ∫_0^1 [φ̃_j(r; λ)]² dr ]^{−1} · F_∞ := F*_∞,   (6)

t*_T →_d √(λ(1−λ)) · [ (1/K) Σ_{j=1}^K ∫_0^1 [φ̃_j(r; λ)]² dr ]^{−1/2} · t_∞ := t*_∞.   (7)

These follow directly from Theorem 3.1. The chi-square distribution and standard normal distribution are only approximations to the above nonstandard fixed-K asymptotic distributions.

To obtain convenient fixed-K asymptotic approximations, we note that each η_j, j = 0, 1, ..., K, is normal. For each j ≠ 0, cov(η_0, η_j) is proportional to

∫_0^1 φ̃_0(r; λ) φ̃_j(r; λ) dr = (1/λ²) ∫_0^λ [φ_j(r) − φ̄_{j,1}] dr + (1/(1−λ)²) ∫_λ^1 [φ_j(r) − φ̄_{j,2}] dr = 0.

So η_0 is independent of η_j, j = 1, ..., K. In addition,

cov(η_{j1}, η_{j2}) = ∫_0^1 φ̃_{j1}(r; λ) φ̃_{j2}(r; λ) dr.

Therefore, if {φ̃_j(r; λ)} are orthonormal, then η_j for j = 0, 1, ..., K are independent standard normal vectors. In this case, λ(1−λ) F_∞ is a quadratic form in a standard normal vector with an independent weighting matrix. After some adjustment, we can show that λ(1−λ) F_∞ is a multiple of a standard F random variable and that the adjusted F_T converges to the standard F distribution. Similarly, √(λ(1−λ)) · t_T converges to Student's t distribution.
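The scale factor in (3)–(4) is a deterministic function of the bases and is easy to compute. A numpy sketch with illustrative settings (T = 200, λ = 0.5, K = 8 usual Fourier functions; the χ²_p critical value is simulated here only to keep the sketch self-contained):

```python
import numpy as np

T, lam, K, p = 200, 0.5, 8, 2
rng = np.random.default_rng(1)
r = np.arange(1, T + 1) / T
seg1 = r <= lam                        # grid points with t <= [lam T]

def phi_tilde_T(phi_vals):
    """Finite-sample version (5): demean within each segment, then rescale."""
    out = np.zeros(T)
    out[seg1] = (phi_vals[seg1] - phi_vals[seg1].mean()) / lam
    out[~seg1] = -(phi_vals[~seg1] - phi_vals[~seg1].mean()) / (1 - lam)
    return out

a_KT = 0.0                             # (KT)^{-1} sum_j sum_i phi_tilde^2
for j in range(1, K // 2 + 1):
    for phi in (np.sqrt(2) * np.cos(2 * np.pi * j * r),
                np.sqrt(2) * np.sin(2 * np.pi * j * r)):
        a_KT += np.sum(phi_tilde_T(phi) ** 2) / (K * T)

scale = lam * (1 - lam) / a_KT         # F*_T = scale * F_T, as in (3)
chi2_crit = np.quantile(rng.chisquare(p, 200_000), 0.95)  # ~ chi^2_p 5% value
print(a_KT, scale, chi2_crit)
```

The chi-square version of the test then rejects at the 5% level when scale · F_T exceeds chi2_crit.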
Proposition 4.1  Let Assumptions 3.1–3.3 hold. If {φ̃_j(r; λ)} are orthonormal, then

F̃*_T := ((K − p + 1)/(Kp)) · λ(1−λ) · F_T →_d F_{p,K−p+1},

and

t̃*_T := √(λ(1−λ)) · t_T →_d t_K,

where F_{p,K−p+1} is the standard F distribution with degrees of freedom (p, K − p + 1) and t_K is Student's t distribution with K degrees of freedom.

When {φ̃_j(r; λ)} are orthonormal, we have (1/K) Σ_{j=1}^K ∫_0^1 [φ̃_j(r; λ)]² dr = 1. In view of this, we can see that the definitions of F̃*_T and t̃*_T are similar to those of F*_T and t*_T given in (3) and (4). The only difference is that there is an additional degrees-of-freedom adjustment factor in F̃*_T when p > 1.

To design basis functions such that {φ̃_j(r; λ)} are orthonormal, we need the following lemma.

Lemma 4.1
Let δ(·) be the Dirac delta function, so that

∫_0^1 ∫_0^1 φ_{j1}(r) δ(r − s) φ_{j2}(s) dr ds = ∫_0^1 φ_{j1}(r) φ_{j2}(r) dr.

Then

∫_0^1 φ̃_{j1}(r; λ) φ̃_{j2}(r; λ) dr = ∫_0^1 ∫_0^1 C(r, s; λ) φ_{j1}(r) φ_{j2}(s) dr ds,

where

C(r, s; λ) = [δ(r − s) − 1/λ] · 1{(r, s) ∈ [0, λ] × [0, λ]} / λ² + [δ(r − s) − 1/(1−λ)] · 1{(r, s) ∈ (λ, 1] × (λ, 1]} / (1−λ)².

Let

W̃_p(r; λ) = (1/λ) [ W_p(r) − (r/λ) W_p(λ) ] · 1{0 ≤ r ≤ λ} − (1/(1−λ)) { W_p(r) − W_p(λ) − ((r − λ)/(1−λ)) [W_p(1) − W_p(λ)] } · 1{λ < r ≤ 1}

be the transformed Brownian motion. Then we have

∫_0^1 φ̃_j(r; λ) dW_p(r) = ∫_0^1 φ_j(r) dW̃_p(r; λ),

and

E[ dW̃_p(r; λ) dW̃_p′(s; λ) ] = I_p · C(r, s; λ) dr ds.

Therefore, C(r, s; λ) can be regarded as the covariance kernel of the transformed Brownian motion.

To design the basis functions {φ_j(r)} such that {φ̃_j(r; λ)} are orthonormal on L²[0, 1], we require that {φ_j(r)} be orthonormal with respect to the covariance kernel C(r, s; λ), that is,

∫_0^1 ∫_0^1 C(r, s; λ) φ_{j1}(r) φ_{j2}(s) dr ds = 1{j1 = j2}.   (8)

This can be achieved by applying the Gram–Schmidt orthonormalization to any set of basis functions on L²[0, 1]:

{φ_j}  →(estimation error)→  {φ̃_j}  (may not be orthonormal on L²[0, 1])
  ↓ GS
{φ*_j}  →(estimation error)→  {φ̃*_j}  (orthonormal on L²[0, 1])

Here {φ_j} is the initial set of basis functions, and {φ*_j} is the Gram–Schmidt orthonormalized set. The arrows 'φ_j → φ̃_j' and 'φ*_j → φ̃*_j' reflect the effect of the estimation error in estimating β: had we known β, we would have used the true u_t instead of û_t in constructing the variance estimator, and the key elements of the weighting matrix in (1) in Theorem 3.1 would have been ∫_0^1 φ_j(r) dW_p(r) instead of ∫_0^1 φ̃_j(r; λ) dW_p(r).

The Gram–Schmidt orthonormalization ensures that {φ*_j} are orthonormal with respect to the covariance kernel C(r, s; λ):

∫_0^1 ∫_0^1 φ*_{j1}(r) φ*_{j2}(s) C(r, s; λ) dr ds = 1{j1 = j2}.

In view of

∫_0^1 ∫_0^1 φ*_{j1}(r) φ*_{j2}(s) C(r, s; λ) dr ds = ∫_0^1 φ̃*_{j1}(r; λ) φ̃*_{j2}(r; λ) dr,

we have: {φ̃*_j} are orthonormal on L²[0, 1]. If we use {φ*_j} in constructing the variance estimator, then

λ(1−λ) F_T →_d η_0′ [ (1/K) Σ_{j=1}^K η_j η_j′ ]^{−1} η_0

for η_j = ∫_0^1 φ̃*_j(r; λ) dW_p(r) ∼ iid N(0, I_p), because {φ̃*_j} are orthonormal on L²[0, 1]. Moreover, for j = 1, ..., K, η_j is independent of η_0. Therefore, the asymptotic F theory in Proposition 4.1 holds. Similarly, the asymptotic t theory holds.

Instead of searching for basis functions that satisfy (8), we search for their discrete versions: the basis vectors. For each basis function φ_k(r), the corresponding basis vector is defined as

φ_k = ( φ_k(1/T), φ_k(2/T), ..., φ_k(T/T) )′.

Let C_T := C_T(λ) be the T × T matrix whose (i, j)-th element is

C_T(i, j; λ) = [T · 1{i = j} − 1/λ] · 1{(i/T, j/T) ∈ [0, λ] × [0, λ]} / λ² + [T · 1{i = j} − 1/(1−λ)] · 1{(i/T, j/T) ∈ (λ, 1] × (λ, 1]} / (1−λ)².

By definition, C_T is symmetric and positive semidefinite; it is the discrete version of C(r, s; λ). For any two vectors r_1, r_2 ∈ R^T, we define the inner product

⟨r_1, r_2⟩ = r_1′ C_T r_2 / T².   (9)

Then the discrete analogue of (8) is

⟨φ_{j1}, φ_{j2}⟩ = 1{j1 = j2} for j1, j2 = 1, ..., K.   (10)

Given any basis vectors φ_1, ..., φ_K, we now apply the Gram–Schmidt orthonormalization via the Cholesky decomposition. Let φ = (φ_1, ..., φ_K) be the T × K matrix of basis vectors. Let U_T ∈ R^{K×K} be the upper triangular factor in the Cholesky decomposition of φ′C_Tφ/T², so that φ′C_Tφ/T² = U_T′U_T. Define φ* = φU_T^{−1} := (φ*_1, ..., φ*_K).
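This Cholesky-based construction is straightforward to carry out numerically. A numpy sketch, in which T, λ, K, and the Fourier starting basis are illustrative choices:

```python
import numpy as np

T, lam, K = 200, 0.5, 8
tb = int(lam * T)
r = np.arange(1, T + 1) / T

# Basis vectors phi_k = (phi_k(1/T), ..., phi_k(T/T))' for a Fourier basis
Phi = np.column_stack(
    [np.sqrt(2) * f(2 * np.pi * j * r)
     for j in range(1, K // 2 + 1) for f in (np.cos, np.sin)])

# Discrete covariance kernel C_T(i, j; lam): T 1{i=j} plays the role of
# the Dirac delta; each diagonal block demeans within its own segment.
I1 = np.arange(T) < tb
C = np.zeros((T, T))
C[np.ix_(I1, I1)] = -(1 / lam) / lam**2
C[np.ix_(~I1, ~I1)] = -(1 / (1 - lam)) / (1 - lam)**2
C[np.diag_indices(T)] += np.where(I1, T / lam**2, T / (1 - lam)**2)

G = Phi.T @ C @ Phi / T**2            # Gram matrix under the C_T inner product
U = np.linalg.cholesky(G).T           # upper triangular factor, G = U'U
Phi_star = Phi @ np.linalg.inv(U)     # Gram-Schmidt transformed basis vectors
```

The columns of Phi_star are then orthonormal under the inner product (9), which is the discrete condition (10).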
We then have

(φ*)′C_Tφ*/T² = (U_T′)^{−1} φ′C_Tφ U_T^{−1} / T² = (U_T′)^{−1} U_T′ U_T U_T^{−1} = I_K.

That is, the columns of the matrix φ* satisfy the conditions in (10). Note that the (k_1, k_2)-th element of φ′C_Tφ/T² satisfies

(1/T²) Σ_{j=1}^T Σ_{i=1}^T φ_{k1}(i/T) C_T(i, j; λ) φ_{k2}(j/T) → ∫_0^1 ∫_0^1 C(r, s; λ) φ_{k1}(r) φ_{k2}(s) dr ds = ∫_0^1 φ̃_{k1}(r; λ) φ̃_{k2}(r; λ) dr = cov(η_{k1}, η_{k2})

as T → ∞. This implies that U_T converges to the upper triangular factor of the Cholesky decomposition of var(η_1, ..., η_K). As a result, every transformed basis vector is approximately equal to a linear combination of the original basis vectors. The implied basis functions are thus equal to linear combinations of the original basis functions. Therefore, if Assumption 3.3 holds for the original basis functions, it also holds for the transformed basis functions. It then follows that Proposition 4.1 holds when {φ*_1, ..., φ*_K} are used as the basis vectors in constructing the asymptotic variance estimator. More specifically, if we estimate Ω by

Ω̂ = (1/K) Σ_{j=1}^K [ (1/√T) Σ_{t=1}^T φ*_{j,t} X̃_t′ û_t ]^⊗,

where φ*_{j,t} is the t-th element of the vector φ*_j, then the asymptotic F and t results in Proposition 4.1 hold.

Suppose there is another covariate vector Z_t ∈ R^ℓ whose effect on Y_t does not change over time, so that we have the model

Y_t = X_t · 1{t ≤ [λT]} · β_1 + X_t · 1{t ≥ [λT] + 1} · β_2 + Z_t γ + u_t.

Let Z = (Z_1′, ..., Z_T′)′ and M_Z = I_T − Z(Z′Z)^{−1}Z′. Then

M_Z Y = M_Z X̃ β + M_Z u.

The OLS estimator of β = (β_1′, β_2′)′ is now

β̂ = (X̃′M_ZX̃)^{−1} X̃′M_ZY.

Let

û = (û_1, ..., û_T)′ = M_Z Y − M_Z X̃ β̂ = M_Z u − M_Z X̃ (β̂ − β)

and X̃_z = (X̃_{z,1}′, ..., X̃_{z,T}′)′ = M_Z X̃. Define

Q̂_{X̃·Z} = X̃′M_ZX̃/T and Ω̂ = (1/K) Σ_{j=1}^K [ (1/√T) Σ_{t=1}^T φ_j(t/T) X̃_{z,t}′ û_t ]^⊗.
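The projection step above is the standard Frisch–Waugh partialling-out device. A minimal numpy sketch (dimensions and data purely illustrative, with generic regressors standing in for X̃_t and Z_t) confirming that (X̃′M_ZX̃)^{−1}X̃′M_ZY matches the X̃-block of the full OLS fit:

```python
import numpy as np

rng = np.random.default_rng(2)
T, k_x, k_z = 200, 4, 2
Xt = rng.standard_normal((T, k_x))    # stands in for the break regressors
Z = rng.standard_normal((T, k_z))     # stable covariates
Y = Xt @ np.ones(k_x) + Z @ np.ones(k_z) + rng.standard_normal(T)

# Annihilator M_Z = I - Z (Z'Z)^{-1} Z' and the partialled-out estimator
MZ = np.eye(T) - Z @ np.linalg.solve(Z.T @ Z, Z.T)
beta_partial = np.linalg.solve(Xt.T @ MZ @ Xt, Xt.T @ MZ @ Y)

# Full regression of Y on (X, Z): its X-block coincides with beta_partial
full = np.linalg.lstsq(np.column_stack([Xt, Z]), Y, rcond=None)[0]
print(beta_partial, full[:k_x])
```

The residuals M_Z(Y − X̃β̂) from this fit are exactly the û_t that enter the variance estimator Ω̂ above.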
The Wald statistic for testing H_0: Rβ = 0 against H_1: Rβ ≠ 0 takes the same form as before:

F_T = T · (Rβ̂)′ [ R Q̂_{X̃·Z}^{−1} Ω̂ Q̂_{X̃·Z}^{−1} R′ ]^{−1} (Rβ̂).

When p = 1, we construct the t statistic:

t_T = √T · Rβ̂ / [ R Q̂_{X̃·Z}^{−1} Ω̂ Q̂_{X̃·Z}^{−1} R′ ]^{1/2}.

To establish the asymptotic distributions of F_T and t_T, we maintain the two assumptions below, which are analogous to Assumptions 3.1 and 3.2.

Assumption 5.1  T^{−1} Σ_{t=1}^{[Tr]} (X_t, Z_t)′(X_t, Z_t) →_p Q · r uniformly over r ∈ [0, 1] for an (m + ℓ) × (m + ℓ) invertible matrix Q.

Assumption 5.2  T^{−1/2} Σ_{t=1}^{[Tr]} (X_t, Z_t)′u_t →_d Λ W_{m+ℓ}(r) for r ∈ [0, 1], where ΛΛ′ is the long run variance of the process {(X_t, Z_t)′u_t} and W_{m+ℓ}(·) is an (m + ℓ) × 1 standard Brownian motion.

We partition Q and Λ according to

Q = ( Q_XX, Q_XZ ; Q_ZX, Q_ZZ ) and Λ = ( Λ_X ; Λ_Z ),

where Q_XX ∈ R^{m×m}, Q_ZZ ∈ R^{ℓ×ℓ}, Λ_X ∈ R^{m×(m+ℓ)}, and Λ_Z ∈ R^{ℓ×(m+ℓ)}.

Theorem 5.1  Let Assumptions 3.3, 5.1, and 5.2 hold. Then

(a) R√T(β̂ − β) →_d R_0 Q_XX^{−1} Λ_X ( (1/λ) ∫_0^λ dW_{m+ℓ}(r) − (1/(1−λ)) ∫_λ^1 dW_{m+ℓ}(r) ), where R_0 is the p × m matrix with R = [R_0, −R_0].

(b) R Q̂_{X̃·Z}^{−1} (1/√T) Σ_{t=1}^T φ_j(t/T) X̃_{z,t}′ û_t →_d R_0 Q_XX^{−1} Λ_X ∫_0^1 φ̃_j(r; λ) dW_{m+ℓ}(r) jointly over j = 1, 2, ..., K.

(c) F_T →_d [ (1/λ)∫_0^λ dW_p(r) − (1/(1−λ))∫_λ^1 dW_p(r) ]′ × [ (1/K) Σ_{j=1}^K { ∫_0^1 φ̃_j(r; λ) dW_p(r) }^⊗ ]^{−1} × [ (1/λ)∫_0^λ dW_p(r) − (1/(1−λ))∫_λ^1 dW_p(r) ].

When p = 1,

t_T →_d [ (1/λ)∫_0^λ dW_p(r) − (1/(1−λ))∫_λ^1 dW_p(r) ] / [ (1/K) Σ_{j=1}^K { ∫_0^1 φ̃_j(r; λ) dW_p(r) }^⊗ ]^{1/2}.

Theorem 5.1 shows that the limiting distributions of the Wald and t statistics are the same as in the case without the extra covariate Z_t. The asymptotic F and t limit theory can be developed in exactly the same way as in Section 4. We present the result formally as a corollary.
Corollary 1
Let Assumptions 3.3, 5.1, and 5.2 hold. Suppose that the Gram–Schmidt transformed basis vectors φ*_1, ..., φ*_K are used in constructing the variance estimator, that is,

Ω̂ = (1/K) Σ_{j=1}^K [ (1/√T) Σ_{t=1}^T φ*_{j,t} X̃_{z,t}′ û_t ]^⊗,

where φ*_{j,t} is the t-th element of the vector φ*_j. Then

F̃*_T := ((K − p + 1)/(Kp)) · λ(1−λ) · F_T →_d F_{p,K−p+1} and t̃*_T := √(λ(1−λ)) · t_T →_d t_K.

In this section, we investigate the finite sample properties of the proposed F test. We consider the linear regression model with m = 2 and X_t = (1, q_t). The regressor q_t follows an AR(1) process, and the error u_t follows an independent AR(1) or ARMA(1,1) process. That is,

q_t = ρ q_{t−1} + ε_{q,t} and u_t = ρ u_{t−1} + ε_{u,t} + ψ ε_{u,t−1},

where both ε_{q,t} and ε_{u,t} are iid N(0, 1) and {ε_{q,t}: t = 1, ..., T} are independent of {ε_{u,t}: t = 1, 2, ..., T}. Note that the AR parameter ρ is the same for q_t and u_t. We consider the sample sizes T = 100, 200, and 500. We let λ = 0.5. Without loss of generality, we set β_1 = (0, 0)′ and β_2 = (0, 0)′ under the null. We consider testing H_0: β_1 = β_2 against H_1: β_1 ≠ β_2, so that p = 2.

We consider two pairs of tests, both of which are based on series variance estimators. The first pair uses the (usual) Fourier bases

{ φ_{2j−1}(r) = √2 cos(2πjr), φ_{2j}(r) = √2 sin(2πjr), j = 1, ..., K/2 }.   (11)

Each test in this pair is based on the same test statistic F*_T defined in (3) but uses a different reference distribution: the first test uses the chi-square approximation (χ²_p), while the second uses the nonstandard fixed-smoothing approximation given in (6). We refer to the two tests as 'χ²_p: Fourier Bases' and 'F*_∞: Fourier Bases,' respectively. The nonstandard critical values are simulated: we approximate the standard Brownian motion in the nonstandard distribution using scaled partial sums of 1,000 iid N(0, 1) random variables, and we use 10,000 simulation replications to compute the critical values.

The second pair of tests uses the transformed Fourier bases obtained via the Gram–Schmidt orthonormalization given in Section 4.3. Each of the two tests in this pair is based on the same test statistic F̃*_T defined in Proposition 4.1. The first test uses the standard F approximation, and the second test uses the rescaled chi-square distribution ((K − p + 1)/(Kp)) · χ²_p; equivalently, the second test employs the test statistic F̃_T = λ(1−λ) · F_T and the standard chi-square approximation (χ²_p). We refer to the two tests as 'F: Transformed Bases' and 'χ²_p: Transformed Bases,' respectively. The chi-square test in the second pair is used to illustrate the effectiveness of the F approximation in reducing the size distortion.

The nominal level of all tests is 5%. The number of simulation replications is 10,000. Figures 1 and 2 report the null rejection probability of each test for the sample sizes T = 100 and T = 500 when q_t and u_t follow independent AR(1) processes with the same AR parameter ρ. Several patterns emerge from these two figures:

• Regardless of the bases used, the chi-square tests over-reject the null by a large margin, especially when K is small.

• Regardless of the bases used, the nonstandard test and the F test are much more accurate than the chi-square tests.

• For each given value of K, the null rejection probabilities of the nonstandard test and the F test are close to each other. This shows that, in terms of size accuracy, using the F approximation (when the transformed Fourier bases are employed) is as good as using the nonstandard approximation (when the Fourier bases are employed). However, the F approximation is more convenient to use and, hence, is preferred.

• For each given value of K, the null rejection probabilities of the two chi-square tests are close to each other, although the one based on the transformed Fourier bases is somewhat more accurate.
This shows that the bases do not have a large effect on the quality of the chi-square approximation.

• The nonstandard test and the standard F test can still have quite some size distortion if K is large and the regressor and error processes are persistent. The size distortion comes from the bias of the variance estimator: when K is large, we average over a frequency window that is too wide when the processes are highly persistent, that is, when the spectral density of {X_t′u_t} is not very flat at the origin. So, it is important to use a data-driven K to obtain an accurate test in practice.

• Comparing the two figures, we see that the size distortion of every test becomes smaller when the sample size is larger.

Figure 3 reports the null rejection probabilities when the sample size T is 200 and when the error process may have an MA component and the AR parameter may be negative. The same patterns as in Figures 1 and 2 emerge.

Next, we consider the size properties of the tests with a data-driven K. Note that

R√T(β̂ − β) = (1/√T) Σ_{t=1}^T R (X̃′X̃/T)^{−1} X̃_t′ u_t = (1/√T) Σ_{t=1}^T v_t + o_p(1),

where v_t = R Q̃^{−1} X̃_t′ u_t and Q̃ is the probability limit of X̃′X̃/T. Then

R Q̂^{−1} Ω̂ Q̂^{−1} R′ = (1/K) Σ_{j=1}^K [ (1/√T) Σ_{t=1}^T φ_j(t/T) v̂_t ]^⊗,

where v̂_t = R Q̂^{−1} X̃_t′ û_t. So R Q̂^{−1} Ω̂ Q̂^{−1} R′ can be viewed as the series estimator of the long run variance of {v_t}. We can follow Phillips (2005) and choose K to minimize the mean square error (MSE) of R Q̃^{−1} Ω̂ Q̃^{−1} R′. We fit a VAR(1) model to v̂_t and use the fitted model to compute the data-driven MSE-optimal K.

Figure 1: The empirical null rejection probabilities of different 5% tests when T = 100 for a range of different K values from 2 to 20 with increment 2.

Figure 2: The empirical null rejection probabilities of different 5% tests when T = 500 for a range of different K values from 2 to 20 with increment 2.

Figure 3: The empirical null rejection probabilities of different 5% tests when T = 200 for a range of different K values from 2 to 20 with increment 2.

Table 1 reports the null rejection probabilities and the average values of K used with the data-driven choice of K for different sample sizes. The qualitative observations from Figures 1–3 continue to hold with the data-driven K. In particular, the nonstandard test and the standard F test are more accurate than the corresponding chi-square tests, especially when the latter have large positive size distortions. The null rejection probabilities of the nonstandard test and the standard F test are close to each other. Similarly, the null rejection probabilities of the two chi-square tests are close to each other.
As expected, the average value of K decreases with the persistence of the underlying processes. The higher the persistence, the smaller the average K, and the more effective the nonstandard test and the standard F test are in reducing the size distortion.

Table 1: The empirical null rejection probabilities of different 5% tests with the data-driven choice of K.

                     ρ=0    ρ=0.   ρ=0.   ρ=0.   ρ=−0.  ρ=−0.  ρ=0.   ρ=0.
                     ψ=0    ψ=0    ψ=0    ψ=0    ψ=0    ψ=0    ψ=0.   ψ=.
T = 100
 χ²: Fourier         0.092  0.131  0.227  0.511  0.128  0.091  0.287  0.558
 F*∞: Fourier        0.060  0.076  0.101  0.198  0.067  0.056  0.087  0.170
 χ²: Transformed     0.089  0.124  0.210  0.473  0.119  0.085  0.259  0.516
 F: Transformed      0.064  0.079  0.101  0.209  0.071  0.060  0.088  0.182
 Ave(K)              30.00  18.40  9.71   5.29   16.57  26.34  6.14   4.27
T = 200
 χ²: Fourier         0.069  0.100  0.153  0.396  0.092  0.070  0.197  0.444
 F*∞: Fourier        0.052  0.066  0.079  0.150  0.055  0.050  0.075  0.131
 χ²: Transformed     0.068  0.094  0.142  0.363  0.088  0.067  0.179  0.406
 F: Transformed      0.057  0.069  0.082  0.153  0.058  0.051  0.074  0.135
 Ave(K)              70     28.82  14.32  6.10   24.96  46.02  8.56   4.56
T = 500
 χ²: Fourier         0.055  0.068  0.096  0.222  0.067  0.057  0.119  0.278
 F*∞: Fourier        0.049  0.054  0.062  0.091  0.053  0.048  0.056  0.084
 χ²: Transformed     0.053  0.064  0.091  0.209  0.064  0.055  0.110  0.253
 F: Transformed      0.048  0.053  0.062  0.096  0.051  0.048  0.058  0.086
 Ave(K)              144.51 56.47  26.91  9.03   46.58  96.41  15.19  6.05

To simulate the power of the tests, we let β_1 = (0, 0)′ and β_2 = (δ, δ)′. Figure 4 presents the size-adjusted power curves as functions of δ when the sample size is 200 and when both q_t and u_t follow AR(1) processes. The figure is representative of the other cases. For the two tests in each pair, the size-adjusted powers are the same, as they are based on the same test statistic. Thus, we need only report two power curves: one for the usual Fourier bases and the other for the transformed Fourier bases. The basic message from Figure 4 is that the size-adjusted powers associated with the two sets of bases are very close to each other. This, coupled with its size accuracy and convenience of use, suggests that the F test be used in empirical applications.

This study proposes asymptotic F and t tests for structural breaks that are robust to heteroscedasticity and autocorrelation. The tests are based on a special series HAR variance estimator whose basis functions are crafted via the Gram–Schmidt orthonormalization. Monte Carlo simulations show that the F test is much more accurate than the corresponding chi-square test.

This study assumes that there is a single known break point. The asymptotic F and t theory can be extended to the case with multiple but known break points. The theory can also be extended to allow for a linear trend or other deterministic trends, but we need to redesign the basis functions. In principle, the tests based on series HAR variance estimation can be extended to accommodate the case with an unknown break point along the lines of Cho and Vogelsang (2017). All the basic ingredients have been established in this study: we only need to take the supremum (or another functional) of the Wald or t statistic over λ as the test statistic. However, the convenient F approximation is lost, as the supremum of the standard distributions is no longer standard.
Therefore, it is not clear whether series HAR variance estimators would retain an advantage over kernel HAR variance estimators in that setting.
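The size distortion of the chi-square test documented in Table 1 can be illustrated without any regression at all. Under fixed-smoothing asymptotics (Proposition 4.1), the Wald statistic behaves like a quadratic form in independent standard normal vectors, so the two sets of critical values can be compared by simulating the limiting random variables directly. The sketch below does this in Python; the values p = 2 and K = 16 are illustrative choices, not taken from the paper, and the λ(1−λ) scaling is absorbed into the numerator vector for simplicity.

```python
import numpy as np
from scipy import stats

# Monte Carlo check of the fixed-smoothing limit: with eta_0, eta_1, ..., eta_K
# iid N(0, I_p), the scaled statistic (K - p + 1)/(Kp) * W is exactly
# F_{p, K-p+1} distributed (Hotelling's T^2), while the chi-square
# approximation treats W itself as chi^2_p.
rng = np.random.default_rng(0)
p, K, reps = 2, 16, 100_000          # illustrative values, not from the paper

eta0 = rng.standard_normal((reps, p))        # numerator vector eta_0
etas = rng.standard_normal((reps, K, p))     # eta_1, ..., eta_K
S = etas.transpose(0, 2, 1) @ etas / K       # K^{-1} sum_j eta_j eta_j'
W = np.einsum("ri,rij,rj->r", eta0, np.linalg.inv(S), eta0)

chi2_rej = np.mean(W > stats.chi2.ppf(0.95, df=p))
f_rej = np.mean((K - p + 1) / (K * p) * W > stats.f.ppf(0.95, p, K - p + 1))
print(f"chi-square test: {chi2_rej:.3f}, F test: {f_rej:.3f}")
```

In runs of this sketch, the chi-square critical values reject roughly 9% of the time at the nominal 5% level, while the F critical values stay close to 5%, mirroring the pattern in Table 1: the chi-square approximation ignores the finite-K randomness of the variance estimator, which the F reference distribution captures exactly.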
Figure 4: The size-adjusted power curves for different 5% tests when T = 200. (Each panel plots the curves for the Fourier bases and the transformed bases.)

Appendix of Proofs
Proof of Lemma 3.1.
Under Assumption 3.1, we have
\[
\frac{\tilde X'\tilde X}{T}
= \begin{pmatrix} T^{-1}\sum_{t=1}^{[T\lambda]} X_t'X_t & O \\ O & T^{-1}\sum_{t=[T\lambda]+1}^{T} X_t'X_t \end{pmatrix}
\to_p \begin{pmatrix} \lambda Q & O \\ O & (1-\lambda)Q \end{pmatrix}.
\]
Under Assumption 3.2, we have
\[
\frac{\tilde X'u}{\sqrt T}
= \begin{pmatrix} T^{-1/2}\sum_{t=1}^{[T\lambda]} X_t'u_t \\ T^{-1/2}\sum_{t=[T\lambda]+1}^{T} X_t'u_t \end{pmatrix}
\to_d \begin{pmatrix} \Lambda W_m(\lambda) \\ \Lambda[W_m(1)-W_m(\lambda)] \end{pmatrix}.
\]
Hence
\[
\sqrt T(\hat\beta-\beta)
\to_d \begin{pmatrix} \lambda Q & O \\ O & (1-\lambda)Q \end{pmatrix}^{-1}
\begin{pmatrix} \Lambda W_m(\lambda) \\ \Lambda[W_m(1)-W_m(\lambda)] \end{pmatrix}
= \begin{pmatrix} (\lambda Q)^{-1}\Lambda W_m(\lambda) \\ [(1-\lambda)Q]^{-1}\Lambda[W_m(1)-W_m(\lambda)] \end{pmatrix}
= \begin{pmatrix} Q^{-1}\Lambda\cdot\frac{W_m(\lambda)}{\lambda} \\ Q^{-1}\Lambda\cdot\frac{W_m(1)-W_m(\lambda)}{1-\lambda} \end{pmatrix}
= \begin{pmatrix} Q^{-1}\Lambda\cdot\frac1\lambda\int_0^\lambda dW_m(r) \\ Q^{-1}\Lambda\cdot\frac{1}{1-\lambda}\int_\lambda^1 dW_m(r) \end{pmatrix}.
\]
For the second part of the lemma, we have
\[
\frac{1}{\sqrt T}\sum_{t=1}^T \phi_j\!\Big(\frac tT\Big)\tilde X_t'\hat u_t
= \frac{1}{\sqrt T}\sum_{t=1}^T \phi_j\!\Big(\frac tT\Big)\tilde X_t'\big(Y_t-\tilde X_t\hat\beta\big)
= \frac{1}{\sqrt T}\sum_{t=1}^T \phi_j\!\Big(\frac tT\Big)\tilde X_t'\big(\tilde X_t\beta+u_t-\tilde X_t\hat\beta\big)
= \frac{1}{\sqrt T}\sum_{t=1}^T \phi_j\!\Big(\frac tT\Big)\tilde X_t'u_t
- \Big[\frac1T\sum_{t=1}^T \phi_j\!\Big(\frac tT\Big)\tilde X_t'\tilde X_t\Big]\sqrt T(\hat\beta-\beta).
\]
Now, it is not hard to show that under Assumption 3.3,
\[
\frac1T\sum_{t=1}^T \phi_j\!\Big(\frac tT\Big)\tilde X_t'\tilde X_t
= \begin{pmatrix} T^{-1}\sum_{t=1}^{[T\lambda]}\phi_j\big(\frac tT\big)X_t'X_t & O \\ O & T^{-1}\sum_{t=[T\lambda]+1}^{T}\phi_j\big(\frac tT\big)X_t'X_t \end{pmatrix}
\to_p \begin{pmatrix} \big[\int_0^\lambda\phi_j(r)dr\big]Q & O \\ O & \big[\int_\lambda^1\phi_j(r)dr\big]Q \end{pmatrix}.
\]
Hence,
\[
\frac{1}{\sqrt T}\sum_{t=1}^T \phi_j\!\Big(\frac tT\Big)\tilde X_t'\hat u_t
\to_d \begin{pmatrix} \Lambda\int_0^\lambda\phi_j(r)dW_m(r) \\ \Lambda\int_\lambda^1\phi_j(r)dW_m(r) \end{pmatrix}
- \begin{pmatrix} \big[\int_0^\lambda\phi_j(r)dr\big]Q & O \\ O & \big[\int_\lambda^1\phi_j(r)dr\big]Q \end{pmatrix}
\begin{pmatrix} Q^{-1}\Lambda\cdot\frac{W_m(\lambda)}{\lambda} \\ Q^{-1}\Lambda\cdot\frac{W_m(1)-W_m(\lambda)}{1-\lambda} \end{pmatrix}
= \begin{pmatrix} \Lambda\big\{\int_0^\lambda\phi_j(r)dW_m(r)-\frac1\lambda\int_0^\lambda\phi_j(r)dr\cdot W_m(\lambda)\big\} \\ \Lambda\big\{\int_\lambda^1\phi_j(r)dW_m(r)-\frac{1}{1-\lambda}\int_\lambda^1\phi_j(r)dr\cdot[W_m(1)-W_m(\lambda)]\big\} \end{pmatrix}
= \begin{pmatrix} \Lambda\int_0^\lambda\big[\phi_j(r)-\bar\phi_{j,1}\big]dW_m(r) \\ \Lambda\int_\lambda^1\big[\phi_j(r)-\bar\phi_{j,2}\big]dW_m(r) \end{pmatrix}.
\]

Proof of Theorem 3.1.
We have
\[
R\hat Q^{-1} \to_p (R_1, -R_1)\begin{pmatrix} \lambda^{-1}Q^{-1} & O \\ O & (1-\lambda)^{-1}Q^{-1} \end{pmatrix}
= \big(\lambda^{-1}R_1Q^{-1},\ -(1-\lambda)^{-1}R_1Q^{-1}\big),
\]
\[
\hat\Omega \to_d \frac1K\sum_{j=1}^K
\begin{pmatrix} \Lambda & O \\ O & \Lambda \end{pmatrix}
\begin{pmatrix} \int_0^\lambda\big[\phi_j(r)-\bar\phi_{j,1}\big]dW_m(r) \\ \int_\lambda^1\big[\phi_j(r)-\bar\phi_{j,2}\big]dW_m(r) \end{pmatrix}^{\otimes 2}
\begin{pmatrix} \Lambda & O \\ O & \Lambda \end{pmatrix}',
\]
where $a^{\otimes 2}:=aa'$ for a column vector $a$. Hence,
\[
R\hat Q^{-1}\hat\Omega\hat Q^{-1}R'
\to_d \big(\lambda^{-1}R_1Q^{-1},\ -(1-\lambda)^{-1}R_1Q^{-1}\big)
\begin{pmatrix} \Lambda & O \\ O & \Lambda \end{pmatrix}
\Big\{\frac1K\sum_{j=1}^K
\begin{pmatrix} \int_0^\lambda\big[\phi_j(r)-\bar\phi_{j,1}\big]dW_m(r) \\ \int_\lambda^1\big[\phi_j(r)-\bar\phi_{j,2}\big]dW_m(r) \end{pmatrix}^{\otimes 2}\Big\}
\begin{pmatrix} \Lambda & O \\ O & \Lambda \end{pmatrix}'
\begin{pmatrix} \lambda^{-1}\big(R_1Q^{-1}\big)' \\ -(1-\lambda)^{-1}\big(R_1Q^{-1}\big)' \end{pmatrix}
\]
\[
= \big(\lambda^{-1}R_1Q^{-1}\Lambda,\ -(1-\lambda)^{-1}R_1Q^{-1}\Lambda\big)
\Big\{\frac1K\sum_{j=1}^K
\begin{pmatrix} \int_0^\lambda\big[\phi_j(r)-\bar\phi_{j,1}\big]dW_m(r) \\ \int_\lambda^1\big[\phi_j(r)-\bar\phi_{j,2}\big]dW_m(r) \end{pmatrix}^{\otimes 2}\Big\}
\begin{pmatrix} \lambda^{-1}\big(R_1Q^{-1}\Lambda\big)' \\ -(1-\lambda)^{-1}\big(R_1Q^{-1}\Lambda\big)' \end{pmatrix}
= R_1Q^{-1}\Lambda\,\frac1K\sum_{j=1}^K
\Big\{\int_0^\lambda\frac{\phi_j(r)-\bar\phi_{j,1}}{\lambda}dW_m(r)
- \int_\lambda^1\frac{\phi_j(r)-\bar\phi_{j,2}}{1-\lambda}dW_m(r)\Big\}^{\otimes 2}\big(R_1Q^{-1}\Lambda\big)'.
\]
Similarly,
\[
\sqrt T\,\big[R(\hat\beta-\beta)\big]
\to_d (R_1,-R_1)\begin{pmatrix} Q^{-1}\Lambda\cdot\frac1\lambda\int_0^\lambda dW_m(r) \\ Q^{-1}\Lambda\cdot\frac{1}{1-\lambda}\int_\lambda^1 dW_m(r) \end{pmatrix}
= R_1Q^{-1}\Lambda\Big[\frac1\lambda\int_0^\lambda dW_m(r)-\frac{1}{1-\lambda}\int_\lambda^1 dW_m(r)\Big].
\]
Therefore,
\[
F_T = T\,\big[R(\hat\beta-\beta)\big]'\big[R\hat Q^{-1}\hat\Omega\hat Q^{-1}R'\big]^{-1}\big[R(\hat\beta-\beta)\big]
\to_d \Big\{R_1Q^{-1}\Lambda\Big[\frac1\lambda\int_0^\lambda dW_m(r)-\frac{1}{1-\lambda}\int_\lambda^1 dW_m(r)\Big]\Big\}'
\Big[R_1Q^{-1}\Lambda\,\frac1K\sum_{j=1}^K\Big\{\int_0^1\tilde\phi_j(r;\lambda)dW_m(r)\Big\}^{\otimes 2}\big(R_1Q^{-1}\Lambda\big)'\Big]^{-1}
\Big\{R_1Q^{-1}\Lambda\Big[\frac1\lambda\int_0^\lambda dW_m(r)-\frac{1}{1-\lambda}\int_\lambda^1 dW_m(r)\Big]\Big\}.
\]
Using the fact that $R_1Q^{-1}\Lambda W_m =_d A_pW_p$ for a square and invertible matrix $A_p$, we have
\[
F_T \to_d \Big[\frac1\lambda\int_0^\lambda dW_p(r)-\frac{1}{1-\lambda}\int_\lambda^1 dW_p(r)\Big]'
\Big[\frac1K\sum_{j=1}^K\Big\{\int_0^1\tilde\phi_j(r;\lambda)dW_p(r)\Big\}^{\otimes 2}\Big]^{-1}
\Big[\frac1\lambda\int_0^\lambda dW_p(r)-\frac{1}{1-\lambda}\int_\lambda^1 dW_p(r)\Big].
\]
The proof for the weak convergence of $t_T$ is similar and is omitted to save space.

Proof of Proposition 4.1.
We prove the part for the Wald statistic only, as the proof for the t statistic is similar. Given that $\{\tilde\phi_j(r;\lambda)\}$ are orthonormal on $L^2[0,1]$, we have
\[
\eta_j := \int_0^1 \tilde\phi_j(r;\lambda)\,dW_p(r) \sim \text{iid } N(0, I_p).
\]
As a consequence,
\[
\sum_{j=1}^K \eta_j\eta_j' \sim \mathcal W_p(I_p, K),
\]
the standard Wishart distribution with degrees of freedom K. So
\[
\frac{K-p+1}{Kp}\,\lambda(1-\lambda)\,F_\infty
= \frac{K-p+1}{Kp}\cdot\eta_0'\Big(\frac1K\sum_{j=1}^K\eta_j\eta_j'\Big)^{-1}\eta_0,
\]
where $\eta_0 := \sqrt{\lambda(1-\lambda)}\big[\frac1\lambda\int_0^\lambda dW_p(r)-\frac{1}{1-\lambda}\int_\lambda^1 dW_p(r)\big]$, and $\eta_0, \eta_1, \ldots, \eta_K$ are independent standard normal vectors. Hence $\eta_0'\big(\frac1K\sum_{j=1}^K\eta_j\eta_j'\big)^{-1}\eta_0$ follows Hotelling's $T^2$ distribution. Using the relationship between Hotelling's $T^2$ distribution and the standard F distribution, we have
\[
\frac{K-p+1}{Kp}\,\lambda(1-\lambda)\,F_\infty \sim F_{p,K-p+1}.
\]
It then follows that
\[
\frac{K-p+1}{Kp}\,\lambda(1-\lambda)\,F_T \to_d F_{p,K-p+1}.
\]

Proof of Lemma 4.1.
We have
\[
\int_0^1 \tilde\phi_{j_1}(r;\lambda)\tilde\phi_{j_2}(r;\lambda)\,dr
= \frac{1}{\lambda^2}\int_0^\lambda\Big[\phi_{j_1}(r)-\frac1\lambda\int_0^\lambda\phi_{j_1}(s)ds\Big]\Big[\phi_{j_2}(r)-\frac1\lambda\int_0^\lambda\phi_{j_2}(s)ds\Big]dr
+ \frac{1}{(1-\lambda)^2}\int_\lambda^1\Big[\phi_{j_1}(r)-\frac{1}{1-\lambda}\int_\lambda^1\phi_{j_1}(s)ds\Big]\Big[\phi_{j_2}(r)-\frac{1}{1-\lambda}\int_\lambda^1\phi_{j_2}(s)ds\Big]dr,
\]
where
\[
\frac{1}{\lambda^2}\int_0^\lambda\Big[\phi_{j_1}(r)-\frac1\lambda\int_0^\lambda\phi_{j_1}(s)ds\Big]\Big[\phi_{j_2}(r)-\frac1\lambda\int_0^\lambda\phi_{j_2}(s)ds\Big]dr
= \frac{1}{\lambda^2}\Big[\int_0^\lambda\int_0^\lambda\delta(r-s)\phi_{j_1}(r)\phi_{j_2}(s)\,dr\,ds
- \frac1\lambda\int_0^\lambda\int_0^\lambda\phi_{j_1}(r)\phi_{j_2}(s)\,dr\,ds\Big]
= \int_0^1\int_0^1\frac{\big[\delta(r-s)-\frac1\lambda\big]\,1\{(r,s)\in[0,\lambda]\times[0,\lambda]\}}{\lambda^2}\,\phi_{j_1}(r)\phi_{j_2}(s)\,dr\,ds,
\]
and similarly,
\[
\frac{1}{(1-\lambda)^2}\int_\lambda^1\Big[\phi_{j_1}(r)-\frac{1}{1-\lambda}\int_\lambda^1\phi_{j_1}(s)ds\Big]\Big[\phi_{j_2}(r)-\frac{1}{1-\lambda}\int_\lambda^1\phi_{j_2}(s)ds\Big]dr
= \int_0^1\int_0^1\frac{\big[\delta(r-s)-\frac{1}{1-\lambda}\big]\,1\{(r,s)\in[\lambda,1]\times[\lambda,1]\}}{(1-\lambda)^2}\,\phi_{j_1}(r)\phi_{j_2}(s)\,dr\,ds.
\]
Therefore,
\[
\int_0^1 \tilde\phi_{j_1}(r;\lambda)\tilde\phi_{j_2}(r;\lambda)\,dr
= \int_0^1\int_0^1 C(r,s;\lambda)\,\phi_{j_1}(r)\phi_{j_2}(s)\,dr\,ds.
\]

Proof of Theorem 5.1.

Part (a). Under Assumption 5.1, we have
\[
\hat Q_{\tilde X\cdot Z} = \frac1T\tilde X_z'\tilde X_z = \frac1T\tilde X'M_Z\tilde X
= \frac{\tilde X'\tilde X}{T} - \frac{\tilde X'Z}{T}\Big(\frac{Z'Z}{T}\Big)^{-1}\frac{Z'\tilde X}{T}
\to_p \begin{pmatrix} \lambda Q_{XX} & O \\ O & (1-\lambda)Q_{XX} \end{pmatrix}
- \begin{pmatrix} \lambda Q_{XZ} \\ (1-\lambda)Q_{XZ} \end{pmatrix}Q_{ZZ}^{-1}\big(\lambda Q_{ZX},\ (1-\lambda)Q_{ZX}\big)
:= Q_{\tilde X\cdot Z}.
\]
Under Assumption 5.2, we have
\[
\frac{1}{\sqrt T}\tilde X'M_Zu
= \frac{\tilde X'u}{\sqrt T} - \frac{\tilde X'Z}{T}\Big(\frac{Z'Z}{T}\Big)^{-1}\frac{Z'u}{\sqrt T}
\to_d \begin{pmatrix} \Lambda_XW_{m+\ell}(\lambda) \\ \Lambda_X[W_{m+\ell}(1)-W_{m+\ell}(\lambda)] \end{pmatrix}
- \begin{pmatrix} \lambda Q_{XZ} \\ (1-\lambda)Q_{XZ} \end{pmatrix}Q_{ZZ}^{-1}\Lambda_ZW_{m+\ell}(1)
= \begin{pmatrix} \Lambda_XW_{m+\ell}(\lambda)-\lambda\Lambda_{XZ}W_{m+\ell}(1) \\ \Lambda_X[W_{m+\ell}(1)-W_{m+\ell}(\lambda)]-(1-\lambda)\Lambda_{XZ}W_{m+\ell}(1) \end{pmatrix},
\]
where $\Lambda_{XZ} = Q_{XZ}Q_{ZZ}^{-1}\Lambda_Z$. Hence,
\[
R\sqrt T(\hat\beta-\beta)
\to_d RQ_{\tilde X\cdot Z}^{-1}
\begin{pmatrix} \Lambda_XW_{m+\ell}(\lambda)-\lambda\Lambda_{XZ}W_{m+\ell}(1) \\ \Lambda_X[W_{m+\ell}(1)-W_{m+\ell}(\lambda)]-(1-\lambda)\Lambda_{XZ}W_{m+\ell}(1) \end{pmatrix}.
\]
Using the matrix inverse formula $\big(A - CB^{-1}C'\big)^{-1} = A^{-1} + A^{-1}C\big(B - C'A^{-1}C\big)^{-1}C'A^{-1}$, we have
\[
Q_{\tilde X\cdot Z}^{-1}
= \begin{pmatrix} \lambda Q_{XX} & O \\ O & (1-\lambda)Q_{XX} \end{pmatrix}^{-1}
+ \begin{pmatrix} \lambda Q_{XX} & O \\ O & (1-\lambda)Q_{XX} \end{pmatrix}^{-1}\begin{pmatrix} \lambda Q_{XZ} \\ (1-\lambda)Q_{XZ} \end{pmatrix}
\Big[Q_{ZZ} - \begin{pmatrix} \lambda Q_{XZ} \\ (1-\lambda)Q_{XZ} \end{pmatrix}'\begin{pmatrix} \lambda Q_{XX} & O \\ O & (1-\lambda)Q_{XX} \end{pmatrix}^{-1}\begin{pmatrix} \lambda Q_{XZ} \\ (1-\lambda)Q_{XZ} \end{pmatrix}\Big]^{-1}
\begin{pmatrix} \lambda Q_{XZ} \\ (1-\lambda)Q_{XZ} \end{pmatrix}'\begin{pmatrix} \lambda Q_{XX} & O \\ O & (1-\lambda)Q_{XX} \end{pmatrix}^{-1}
= \begin{pmatrix} \lambda^{-1}Q_{XX}^{-1} & O \\ O & (1-\lambda)^{-1}Q_{XX}^{-1} \end{pmatrix}
+ \begin{pmatrix} Q_0 & Q_0 \\ Q_0 & Q_0 \end{pmatrix},
\]
where $Q_0 = Q_{XX}^{-1}Q_{XZ}Q_{Z\cdot X}^{-1}Q_{ZX}Q_{XX}^{-1}$ for $Q_{Z\cdot X} = Q_{ZZ} - Q_{ZX}Q_{XX}^{-1}Q_{XZ}$. Since the $Q_0$ blocks cancel when pre-multiplied by $(R_1,-R_1)$,
\[
RQ_{\tilde X\cdot Z}^{-1}
= (R_1,-R_1)\begin{pmatrix} \lambda^{-1}Q_{XX}^{-1} & O \\ O & (1-\lambda)^{-1}Q_{XX}^{-1} \end{pmatrix}
= \big[\lambda^{-1}R_1Q_{XX}^{-1},\ -(1-\lambda)^{-1}R_1Q_{XX}^{-1}\big] \quad (12)
\]
and
\[
RQ_{\tilde X\cdot Z}^{-1}
\begin{pmatrix} \Lambda_XW_{m+\ell}(\lambda)-\lambda\Lambda_{XZ}W_{m+\ell}(1) \\ \Lambda_X[W_{m+\ell}(1)-W_{m+\ell}(\lambda)]-(1-\lambda)\Lambda_{XZ}W_{m+\ell}(1) \end{pmatrix}
= R_1Q_{XX}^{-1}\Big(\Lambda_X\frac{W_{m+\ell}(\lambda)}{\lambda}-\Lambda_{XZ}W_{m+\ell}(1)\Big)
- R_1Q_{XX}^{-1}\Big(\Lambda_X\frac{W_{m+\ell}(1)-W_{m+\ell}(\lambda)}{1-\lambda}-\Lambda_{XZ}W_{m+\ell}(1)\Big)
= R_1Q_{XX}^{-1}\Lambda_X\Big(\frac{W_{m+\ell}(\lambda)}{\lambda}-\frac{W_{m+\ell}(1)-W_{m+\ell}(\lambda)}{1-\lambda}\Big).
\]
Hence
\[
R\sqrt T(\hat\beta-\beta)
\to_d R_1Q_{XX}^{-1}\Lambda_X\Big(\frac{W_{m+\ell}(\lambda)}{\lambda}-\frac{W_{m+\ell}(1)-W_{m+\ell}(\lambda)}{1-\lambda}\Big)
= R_1Q_{XX}^{-1}\Lambda_X\Big[\frac1\lambda\int_0^\lambda dW_{m+\ell}(r)-\frac{1}{1-\lambda}\int_\lambda^1 dW_{m+\ell}(r)\Big].
\]

Part (b). We have
\[
R\hat Q_{\tilde X\cdot Z}^{-1}\frac{1}{\sqrt T}\sum_{t=1}^T\phi_j\Big(\frac tT\Big)\tilde X_{z,t}'\hat u_t
= R\hat Q_{\tilde X\cdot Z}^{-1}\frac{1}{\sqrt T}\sum_{t=1}^T\phi_j\Big(\frac tT\Big)\tilde X_{z,t}'\Big(u_t-\tilde X_{z,t}(\hat\beta-\beta)-Z_t\big(Z'Z\big)^{-1}Z'u\Big)
= R\hat Q_{\tilde X\cdot Z}^{-1}\frac{1}{\sqrt T}\sum_{t=1}^T\phi_j\Big(\frac tT\Big)\tilde X_{z,t}'u_t^*
- R\hat Q_{\tilde X\cdot Z}^{-1}\Big[\frac1T\sum_{t=1}^T\phi_j\Big(\frac tT\Big)\tilde X_{z,t}'\tilde X_{z,t}\Big]\sqrt T(\hat\beta-\beta),
\]
where $u_t^* = u_t - Z_t(Z'Z)^{-1}Z'u$.
To find the limit of the first term in the above equation, we note that
\[
\frac{1}{\sqrt T}\sum_{t=1}^T\phi_j\Big(\frac tT\Big)\tilde X_{z,t}'u_t^*
= \frac{1}{\sqrt T}\sum_{t=1}^T\phi_j\Big(\frac tT\Big)\Big(\tilde X_t - Z_t\big(Z'Z\big)^{-1}Z'\tilde X\Big)'\Big(u_t - Z_t\big(Z'Z\big)^{-1}Z'u\Big)
= \frac{1}{\sqrt T}\sum_{t=1}^T\phi_j\Big(\frac tT\Big)\tilde X_t'u_t
- \frac{1}{\sqrt T}\sum_{t=1}^T\phi_j\Big(\frac tT\Big)\tilde X_t'Z_t\big(Z'Z\big)^{-1}Z'u
- \frac{1}{\sqrt T}\sum_{t=1}^T\phi_j\Big(\frac tT\Big)\tilde X'Z\big(Z'Z\big)^{-1}Z_t'u_t
+ \frac{1}{\sqrt T}\sum_{t=1}^T\phi_j\Big(\frac tT\Big)\tilde X'Z\big(Z'Z\big)^{-1}Z_t'Z_t\big(Z'Z\big)^{-1}Z'u
\]
\[
\to_d \begin{pmatrix} \Lambda_X\int_0^\lambda\phi_j(r)dW_{m+\ell}(r) \\ \Lambda_X\int_\lambda^1\phi_j(r)dW_{m+\ell}(r) \end{pmatrix}
- \begin{pmatrix} \int_0^\lambda\phi_j(r)dr\cdot\Lambda_{XZ}W_{m+\ell}(1) \\ \int_\lambda^1\phi_j(r)dr\cdot\Lambda_{XZ}W_{m+\ell}(1) \end{pmatrix}
- \begin{pmatrix} \lambda\cdot\Lambda_{XZ}\int_0^1\phi_j(r)dW_{m+\ell}(r) \\ (1-\lambda)\cdot\Lambda_{XZ}\int_0^1\phi_j(r)dW_{m+\ell}(r) \end{pmatrix}
+ \begin{pmatrix} \lambda\bar\phi_{j,0}\cdot\Lambda_{XZ}W_{m+\ell}(1) \\ (1-\lambda)\bar\phi_{j,0}\cdot\Lambda_{XZ}W_{m+\ell}(1) \end{pmatrix},
\]
where $\bar\phi_{j,0} = \int_0^1\phi_j(r)dr$. Therefore,
\[
R\hat Q_{\tilde X\cdot Z}^{-1}\frac{1}{\sqrt T}\sum_{t=1}^T\phi_j\Big(\frac tT\Big)\tilde X_{z,t}'u_t^*
\to_d R_1Q_{XX}^{-1}\Lambda_X\Big[\lambda^{-1}\int_0^\lambda\phi_j(r)dW_{m+\ell}(r)-(1-\lambda)^{-1}\int_\lambda^1\phi_j(r)dW_{m+\ell}(r)\Big]
- R_1Q_{XX}^{-1}\Lambda_{XZ}\big(\bar\phi_{j,1}-\bar\phi_{j,2}\big)W_{m+\ell}(1), \quad (13)
\]
where we have used $R\hat Q_{\tilde X\cdot Z}^{-1}\to_p RQ_{\tilde X\cdot Z}^{-1} = \big[\lambda^{-1}R_1Q_{XX}^{-1},\ -(1-\lambda)^{-1}R_1Q_{XX}^{-1}\big]$; see (12).

Next,
\[
\frac1T\sum_{t=1}^T\phi_j\Big(\frac tT\Big)\tilde X_{z,t}'\tilde X_{z,t}
= \frac1T\sum_{t=1}^T\phi_j\Big(\frac tT\Big)\tilde X_t'\tilde X_t
- \frac1T\sum_{t=1}^T\phi_j\Big(\frac tT\Big)\tilde X_t'Z_t\big(Z'Z\big)^{-1}Z'\tilde X
- \frac1T\sum_{t=1}^T\phi_j\Big(\frac tT\Big)\tilde X'Z\big(Z'Z\big)^{-1}Z_t'\tilde X_t
+ \frac1T\sum_{t=1}^T\phi_j\Big(\frac tT\Big)\tilde X'Z\big(Z'Z\big)^{-1}Z_t'Z_t\big(Z'Z\big)^{-1}Z'\tilde X
\]
\[
\to_p \begin{pmatrix} \big[\int_0^\lambda\phi_j(r)dr\big]Q_{XX} & O \\ O & \big[\int_\lambda^1\phi_j(r)dr\big]Q_{XX} \end{pmatrix}
- \begin{pmatrix} \big[\int_0^\lambda\phi_j(r)dr\big]Q_{XZ}Q_{ZZ}^{-1} \\ \big[\int_\lambda^1\phi_j(r)dr\big]Q_{XZ}Q_{ZZ}^{-1} \end{pmatrix}\big[\lambda Q_{ZX},\ (1-\lambda)Q_{ZX}\big]
- \Big\{\begin{pmatrix} \big[\int_0^\lambda\phi_j(r)dr\big]Q_{XZ}Q_{ZZ}^{-1} \\ \big[\int_\lambda^1\phi_j(r)dr\big]Q_{XZ}Q_{ZZ}^{-1} \end{pmatrix}\big[\lambda Q_{ZX},\ (1-\lambda)Q_{ZX}\big]\Big\}'
+ \bar\phi_{j,0}\begin{pmatrix} \lambda Q_{XZ} \\ (1-\lambda)Q_{XZ} \end{pmatrix}Q_{ZZ}^{-1}\begin{pmatrix} \lambda Q_{XZ} \\ (1-\lambda)Q_{XZ} \end{pmatrix}'. \quad (14)
\]
So,
\[
R\hat Q_{\tilde X\cdot Z}^{-1}\Big[\frac1T\sum_{t=1}^T\phi_j\Big(\frac tT\Big)\tilde X_{z,t}'\tilde X_{z,t}\Big]
\to_p \big[\lambda^{-1}R_1Q_{XX}^{-1},\ -(1-\lambda)^{-1}R_1Q_{XX}^{-1}\big]
\begin{pmatrix} \big[\int_0^\lambda\phi_j(r)dr\big]Q_{XX} & O \\ O & \big[\int_\lambda^1\phi_j(r)dr\big]Q_{XX} \end{pmatrix}
- \big[\lambda^{-1}R_1Q_{XX}^{-1},\ -(1-\lambda)^{-1}R_1Q_{XX}^{-1}\big]
\begin{pmatrix} \big[\int_0^\lambda\phi_j(r)dr\big]Q_{XZ}Q_{ZZ}^{-1} \\ \big[\int_\lambda^1\phi_j(r)dr\big]Q_{XZ}Q_{ZZ}^{-1} \end{pmatrix}\big[\lambda Q_{ZX},\ (1-\lambda)Q_{ZX}\big]
= \big[\bar\phi_{j,1}R_1,\ -\bar\phi_{j,2}R_1\big]
- \big(\bar\phi_{j,1}-\bar\phi_{j,2}\big)R_1Q_{XX}^{-1}Q_{XZ}Q_{ZZ}^{-1}\big[\lambda Q_{ZX},\ (1-\lambda)Q_{ZX}\big],
\]
where we have used the fact that the last two terms in (14), pre-multiplied by $\big[\lambda^{-1}R_1Q_{XX}^{-1},\ -(1-\lambda)^{-1}R_1Q_{XX}^{-1}\big]$, are equal to zero. It then follows that
\[
R\hat Q_{\tilde X\cdot Z}^{-1}\Big[\frac1T\sum_{t=1}^T\phi_j\Big(\frac tT\Big)\tilde X_{z,t}'\tilde X_{z,t}\Big]\hat Q_{\tilde X\cdot Z}^{-1}
\to_p \big[\bar\phi_{j,1}R_1,\ -\bar\phi_{j,2}R_1\big]
\Big\{\begin{pmatrix} \lambda^{-1}Q_{XX}^{-1} & O \\ O & (1-\lambda)^{-1}Q_{XX}^{-1} \end{pmatrix}
+ \begin{pmatrix} Q_0 & Q_0 \\ Q_0 & Q_0 \end{pmatrix}\Big\}
- \big(\bar\phi_{j,1}-\bar\phi_{j,2}\big)R_1Q_{XX}^{-1}Q_{XZ}Q_{ZZ}^{-1}\big[\lambda Q_{ZX},\ (1-\lambda)Q_{ZX}\big]
\Big\{\begin{pmatrix} \lambda^{-1}Q_{XX}^{-1} & O \\ O & (1-\lambda)^{-1}Q_{XX}^{-1} \end{pmatrix}
+ \begin{pmatrix} Q_0 & Q_0 \\ Q_0 & Q_0 \end{pmatrix}\Big\}
\]
\[
= \big[\lambda^{-1}\bar\phi_{j,1}R_1Q_{XX}^{-1},\ -(1-\lambda)^{-1}\bar\phi_{j,2}R_1Q_{XX}^{-1}\big]
- \big(\bar\phi_{j,1}-\bar\phi_{j,2}\big)R_1Q_{XX}^{-1}Q_{XZ}Q_{ZZ}^{-1}\big[Q_{ZX}Q_{XX}^{-1},\ Q_{ZX}Q_{XX}^{-1}\big]
- \big(\bar\phi_{j,1}-\bar\phi_{j,2}\big)R_1\big[Q_{XX}^{-1}Q_{XZ}Q_{ZZ}^{-1}Q_{ZX}-I_p\big]\big[Q_0,\ Q_0\big].
\]
As a consequence,
\[
R\hat Q_{\tilde X\cdot Z}^{-1}\Big[\frac1T\sum_{t=1}^T\phi_j\Big(\frac tT\Big)\tilde X_{z,t}'\tilde X_{z,t}\Big]\sqrt T(\hat\beta-\beta)
\to_d I_1 + I_2 + I_3, \quad (15)
\]
where
\[
I_1 = \big[\lambda^{-1}\bar\phi_{j,1}R_1Q_{XX}^{-1},\ -(1-\lambda)^{-1}\bar\phi_{j,2}R_1Q_{XX}^{-1}\big]
\begin{pmatrix} \Lambda_XW_{m+\ell}(\lambda)-\lambda\Lambda_{XZ}W_{m+\ell}(1) \\ \Lambda_X[W_{m+\ell}(1)-W_{m+\ell}(\lambda)]-(1-\lambda)\Lambda_{XZ}W_{m+\ell}(1) \end{pmatrix}
\]
\[
= \lambda^{-1}\bar\phi_{j,1}R_1Q_{XX}^{-1}\Lambda_XW_{m+\ell}(\lambda)
- \bar\phi_{j,1}R_1Q_{XX}^{-1}\Lambda_{XZ}W_{m+\ell}(1)
- (1-\lambda)^{-1}\bar\phi_{j,2}R_1Q_{XX}^{-1}\Lambda_X\big[W_{m+\ell}(1)-W_{m+\ell}(\lambda)\big]
+ \bar\phi_{j,2}R_1Q_{XX}^{-1}\Lambda_{XZ}W_{m+\ell}(1)
\]
\[
= \big(\lambda^{-1}\bar\phi_{j,1}+(1-\lambda)^{-1}\bar\phi_{j,2}\big)R_1Q_{XX}^{-1}\Lambda_XW_{m+\ell}(\lambda)
- \big[\big(\bar\phi_{j,1}-\bar\phi_{j,2}\big)R_1Q_{XX}^{-1}\Lambda_{XZ}
+ (1-\lambda)^{-1}\bar\phi_{j,2}R_1Q_{XX}^{-1}\Lambda_X\big]W_{m+\ell}(1)
\]
\[
= \big(\lambda^{-1}\bar\phi_{j,1}+(1-\lambda)^{-1}\bar\phi_{j,2}\big)R_1Q_{XX}^{-1}\Lambda_XW_{m+\ell}(\lambda)
+ \big(\bar\phi_{j,1}-\bar\phi_{j,2}\big)R_1Q_{XX}^{-1}\big(\Lambda_X-\Lambda_{XZ}\big)W_{m+\ell}(1)
- \lambda\big(\lambda^{-1}\bar\phi_{j,1}+(1-\lambda)^{-1}\bar\phi_{j,2}\big)R_1Q_{XX}^{-1}\Lambda_XW_{m+\ell}(1),
\]
\[
I_2 = -\big(\bar\phi_{j,1}-\bar\phi_{j,2}\big)R_1Q_{XX}^{-1}Q_{XZ}Q_{ZZ}^{-1}\big[Q_{ZX}Q_{XX}^{-1},\ Q_{ZX}Q_{XX}^{-1}\big]
\begin{pmatrix} \Lambda_XW_{m+\ell}(\lambda)-\lambda\Lambda_{XZ}W_{m+\ell}(1) \\ \Lambda_X[W_{m+\ell}(1)-W_{m+\ell}(\lambda)]-(1-\lambda)\Lambda_{XZ}W_{m+\ell}(1) \end{pmatrix}
= -\big(\bar\phi_{j,1}-\bar\phi_{j,2}\big)R_1Q_{XX}^{-1}Q_{XZ}Q_{ZZ}^{-1}Q_{ZX}Q_{XX}^{-1}\big[\Lambda_X-Q_{XZ}Q_{ZZ}^{-1}\Lambda_Z\big]W_{m+\ell}(1),
\]
\[
I_3 = -\big(\bar\phi_{j,1}-\bar\phi_{j,2}\big)R_1\big[Q_{XX}^{-1}Q_{XZ}Q_{ZZ}^{-1}Q_{ZX}-I_p\big]\big[Q_0,\ Q_0\big]
\begin{pmatrix} \Lambda_XW_{m+\ell}(\lambda)-\lambda\Lambda_{XZ}W_{m+\ell}(1) \\ \Lambda_X[W_{m+\ell}(1)-W_{m+\ell}(\lambda)]-(1-\lambda)\Lambda_{XZ}W_{m+\ell}(1) \end{pmatrix}
= -\big(\bar\phi_{j,1}-\bar\phi_{j,2}\big)R_1\big[Q_{XX}^{-1}Q_{XZ}Q_{ZZ}^{-1}Q_{ZX}-I_p\big]Q_0\big[\Lambda_X-Q_{XZ}Q_{ZZ}^{-1}\Lambda_Z\big]W_{m+\ell}(1).
\]
Plugging the above three terms $I_1$, $I_2$, and $I_3$ back into (15), we obtain
\[
R\hat Q_{\tilde X\cdot Z}^{-1}\Big[\frac1T\sum_{t=1}^T\phi_j\Big(\frac tT\Big)\tilde X_{z,t}'\tilde X_{z,t}\Big]\sqrt T\big(\hat\beta-\beta\big)
\to_d \big(\lambda^{-1}\bar\phi_{j,1}+(1-\lambda)^{-1}\bar\phi_{j,2}\big)R_1Q_{XX}^{-1}\Lambda_XW_{m+\ell}(\lambda)
- \lambda\big(\lambda^{-1}\bar\phi_{j,1}+(1-\lambda)^{-1}\bar\phi_{j,2}\big)R_1Q_{XX}^{-1}\Lambda_XW_{m+\ell}(1)
+ \big(\bar\phi_{j,1}-\bar\phi_{j,2}\big)R_1Q_{XX}^{-1}\big[I_p - Q_{XZ}Q_{ZZ}^{-1}Q_{ZX}Q_{XX}^{-1} - \big(Q_{XZ}Q_{ZZ}^{-1}Q_{ZX}-Q_{XX}\big)Q_0\big]\big(\Lambda_X-Q_{XZ}Q_{ZZ}^{-1}\Lambda_Z\big)W_{m+\ell}(1)
\]
\[
= \big(\lambda^{-1}\bar\phi_{j,1}+(1-\lambda)^{-1}\bar\phi_{j,2}\big)R_1Q_{XX}^{-1}\Lambda_X\big[W_{m+\ell}(\lambda)-\lambda W_{m+\ell}(1)\big]
+ \big(\bar\phi_{j,1}-\bar\phi_{j,2}\big)R_1Q_{XX}^{-1}\big(\Lambda_X-Q_{XZ}Q_{ZZ}^{-1}\Lambda_Z\big)W_{m+\ell}(1), \quad (16)
\]
where the last equality holds because
\[
Q_{XX}^{-1}\big[I_p - Q_{XZ}Q_{ZZ}^{-1}Q_{ZX}Q_{XX}^{-1} - \big(Q_{XZ}Q_{ZZ}^{-1}Q_{ZX}-Q_{XX}\big)Q_0\big] = Q_{XX}^{-1}.
\]
Indeed, using $Q_0 = Q_{XX}^{-1}Q_{XZ}Q_{Z\cdot X}^{-1}Q_{ZX}Q_{XX}^{-1}$,
\[
\big(Q_{XZ}Q_{ZZ}^{-1}Q_{ZX}-Q_{XX}\big)Q_0
= Q_{XZ}\big[Q_{ZZ}^{-1}Q_{ZX}Q_{XX}^{-1}Q_{XZ}-I_\ell\big]Q_{Z\cdot X}^{-1}Q_{ZX}Q_{XX}^{-1}
= Q_{XZ}Q_{ZZ}^{-1}\big[Q_{ZX}Q_{XX}^{-1}Q_{XZ}-Q_{ZZ}\big]Q_{Z\cdot X}^{-1}Q_{ZX}Q_{XX}^{-1}
= -Q_{XZ}Q_{ZZ}^{-1}Q_{Z\cdot X}Q_{Z\cdot X}^{-1}Q_{ZX}Q_{XX}^{-1}
= -Q_{XZ}Q_{ZZ}^{-1}Q_{ZX}Q_{XX}^{-1},
\]
so the second and third terms in the bracket cancel and the bracket reduces to $I_p$.
Combining (13) and (16), we obtain
\[
R\hat Q_{\tilde X\cdot Z}^{-1}\frac{1}{\sqrt T}\sum_{t=1}^T\phi_j\Big(\frac tT\Big)\tilde X_{z,t}'\hat u_t
\to_d R_1Q_{XX}^{-1}\Lambda_X\Big[\lambda^{-1}\int_0^\lambda\phi_j(r)dW_{m+\ell}(r)-(1-\lambda)^{-1}\int_\lambda^1\phi_j(r)dW_{m+\ell}(r)\Big]
- R_1Q_{XX}^{-1}\Lambda_X\big[\lambda^{-1}\bar\phi_{j,1}+(1-\lambda)^{-1}\bar\phi_{j,2}\big]W_{m+\ell}(\lambda)
+ R_1Q_{XX}^{-1}\Lambda_X\big[\bar\phi_{j,1}+\lambda(1-\lambda)^{-1}\bar\phi_{j,2}-\big(\bar\phi_{j,1}-\bar\phi_{j,2}\big)\big]W_{m+\ell}(1)
\]
\[
= R_1Q_{XX}^{-1}\Lambda_X\Big[\lambda^{-1}\int_0^\lambda\phi_j(r)dW_{m+\ell}(r)-(1-\lambda)^{-1}\int_\lambda^1\phi_j(r)dW_{m+\ell}(r)\Big]
- R_1Q_{XX}^{-1}\Lambda_X\big[\lambda^{-1}\bar\phi_{j,1}+(1-\lambda)^{-1}\bar\phi_{j,2}\big]W_{m+\ell}(\lambda)
+ R_1Q_{XX}^{-1}\Lambda_X\big[(1-\lambda)^{-1}\bar\phi_{j,2}\big]W_{m+\ell}(1)
\]
\[
= R_1Q_{XX}^{-1}\Lambda_X\Big[\lambda^{-1}\int_0^\lambda\big(\phi_j(r)-\bar\phi_{j,1}\big)dW_{m+\ell}(r)
- (1-\lambda)^{-1}\int_\lambda^1\big(\phi_j(r)-\bar\phi_{j,2}\big)dW_{m+\ell}(r)\Big]
= R_1Q_{XX}^{-1}\Lambda_X\int_0^1\tilde\phi_j(r;\lambda)\,dW_{m+\ell}(r).
\]

Part (c). We prove the case for $F_T$ only, as the proof for $t_T$ is similar. Using Parts (a) and (b), we have
\[
F_T = T\,\big(R\hat\beta\big)'\big[R\hat Q_{\tilde X\cdot Z}^{-1}\hat\Omega\hat Q_{\tilde X\cdot Z}^{-1}R'\big]^{-1}R\hat\beta
\to_d \Big[R_1Q_{XX}^{-1}\Lambda_X\Big(\frac1\lambda\int_0^\lambda dW_{m+\ell}(r)-\frac{1}{1-\lambda}\int_\lambda^1 dW_{m+\ell}(r)\Big)\Big]'
\Big[\frac1K\sum_{j=1}^K\Big(R_1Q_{XX}^{-1}\Lambda_X\int_0^1\tilde\phi_j(r;\lambda)dW_{m+\ell}(r)\Big)^{\otimes 2}\Big]^{-1}
\Big[R_1Q_{XX}^{-1}\Lambda_X\Big(\frac1\lambda\int_0^\lambda dW_{m+\ell}(r)-\frac{1}{1-\lambda}\int_\lambda^1 dW_{m+\ell}(r)\Big)\Big].
\]
Note that $R_1Q_{XX}^{-1}\Lambda_XW_{m+\ell}(\lambda) =_d A_pW_p(\lambda)$ for a $p\times p$ invertible matrix $A_p$ such that $A_pA_p' = R_1Q_{XX}^{-1}\Lambda_X\Lambda_X'Q_{XX}^{-1}R_1'$. Using this distributional equivalence, we have
\[
F_T \to_d \Big[A_p\Big(\frac1\lambda\int_0^\lambda dW_p(r)-\frac{1}{1-\lambda}\int_\lambda^1 dW_p(r)\Big)\Big]'
\Big[\frac1K\sum_{j=1}^K\Big(A_p\int_0^1\tilde\phi_j(r;\lambda)dW_p(r)\Big)^{\otimes 2}\Big]^{-1}
\Big[A_p\Big(\frac1\lambda\int_0^\lambda dW_p(r)-\frac{1}{1-\lambda}\int_\lambda^1 dW_p(r)\Big)\Big]
= \Big(\frac1\lambda\int_0^\lambda dW_p(r)-\frac{1}{1-\lambda}\int_\lambda^1 dW_p(r)\Big)'
\Big[\frac1K\sum_{j=1}^K\Big(\int_0^1\tilde\phi_j(r;\lambda)dW_p(r)\Big)^{\otimes 2}\Big]^{-1}
\Big(\frac1\lambda\int_0^\lambda dW_p(r)-\frac{1}{1-\lambda}\int_\lambda^1 dW_p(r)\Big),
\]
as desired.

References
Andrews, D. W. K. (1991). Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica, 59:817–858.

Cho, C.-K. and Vogelsang, T. J. (2017). Fixed-b inference for testing structural change in a time series regression. Econometrics, 5(1).

Chow, G. C. (1960). Tests of equality between sets of coefficients in two linear regressions. Econometrica, 28(3):591–605.

Giles, D. and Scott, M. (1992). Some consequences of using the Chow test in the context of autocorrelated disturbances. Economics Letters, 38(2):145–150.

Hwang, J. and Sun, Y. (2017). Asymptotic F and t tests in an efficient GMM setting. Journal of Econometrics, 198:277–295.

Jansson, M. (2004). On the error of rejection probability in simple autocorrelation robust tests. Econometrica, 72:937–946.

Kiefer, N. M. and Vogelsang, T. J. (2002a). Heteroskedasticity-autocorrelation robust testing using bandwidth equal to sample size. Econometric Theory, 18:1350–1366.

Kiefer, N. M. and Vogelsang, T. J. (2002b). Heteroskedasticity-autocorrelation robust standard errors using the Bartlett kernel without truncation. Econometrica, 70:2093–2095.

Kiefer, N. M. and Vogelsang, T. J. (2005). A new asymptotic theory for heteroskedasticity-autocorrelation robust tests. Econometric Theory, 21:1130–1164.

Krämer, W. (1989). The robustness of the Chow test to autocorrelation among disturbances. In Statistical Analysis and Forecasting of Economic Structural Change, pages 45–52. Springer.

Lazarus, E., Lewis, D. J., Stock, J. H., and Watson, M. W. (2018). HAR inference: Recommendations for practice. Journal of Business & Economic Statistics, 36(4):541–559.

Liu, C. and Sun, Y. (2019). A simple and trustworthy asymptotic t test in difference-in-differences regressions. Journal of Econometrics, 210(2):327–362.

Martínez-Iriarte, J., Sun, Y., and Wang, X. (2019). Asymptotic F tests under possibly weak identification. Working Paper, Department of Economics, UC San Diego.

Newey, W. K. and West, K. D. (1987). A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica, 55(3):703–708.

Phillips, P. C. B. (2005). HAC estimation by automated regression. Econometric Theory, 21(1):116–142.

Sun, Y. (2011). Robust trend inference with series variance estimator and testing-optimal smoothing parameter. Journal of Econometrics, 164:345–366.

Sun, Y. (2013). A heteroskedasticity and autocorrelation robust F test using orthonormal series variance estimator. Econometrics Journal, 16:1–26.

Sun, Y. and Kim, M. S. (2012). Simple and powerful GMM over-identification tests with accurate size. Journal of Econometrics, 166:267–281.

Sun, Y., Phillips, P. C. B., and Jin, S. (2008). Optimal bandwidth selection in heteroskedasticity-autocorrelation robust testing. Econometrica, 76(1):175–194.