An Asymptotically F-Distributed Chow Test in the Presence of Heteroscedasticity and Autocorrelation
Yixiao Sun
Department of Economics, UC San Diego, USA

Xuexin Wang
School of Economics and WISE, Xiamen University, China

November 12, 2019
Abstract
This study proposes a simple, trustworthy Chow test in the presence of heteroscedasticity and autocorrelation. The test is based on a series heteroscedasticity and autocorrelation robust variance estimator with judiciously crafted basis functions. Like the Chow test in a classical normal linear regression, the proposed test employs the standard F distribution as the reference distribution, which is justified under fixed-smoothing asymptotics. Monte Carlo simulations show that the null rejection probability of the asymptotic F test is closer to the nominal level than that of the chi-square test.
Keywords:
Chow Test, F Distribution, Heteroscedasticity and Autocorrelation, Structural Break.
For predictive modeling and policy analysis using time series data, it is important to check whether a structural relationship is stable over time. The Chow (1960) test is designed to test whether a break takes place at a given period in an otherwise stable relationship. The test is widely used in empirical applications and has been included in standard econometric textbooks.

This paper considers the Chow test in the presence of heteroscedasticity and autocorrelation. There is ample evidence that the Chow test can have very large size distortions if heteroscedasticity and autocorrelation are not accounted for (e.g., Krämer (1989) and Giles and Scott (1992)). Even if we account for them using heteroscedasticity and autocorrelation robust (HAR) variance estimators (e.g., Newey and West (1987) and Andrews (1991)), the test can still over-reject the null hypothesis by a large margin if chi-square critical values are used. (When the Chow test is performed on a single coefficient, normal critical values are typically used for the t statistic; for now, we focus on the Wald-type Chow test for more than one coefficient, so that chi-square critical values are used.) This is a general problem for any HAR inference, as the chi-square approximation ignores the often substantial finite sample randomness of the HAR variance estimator. To address this problem, the recent literature has developed a new type of asymptotics known as fixed-smoothing asymptotics (see, e.g., Kiefer and Vogelsang (2002a,b, 2005) for early seminal contributions). It is now well known that fixed-smoothing approximations are more accurate than the conventional chi-square and normal approximations.

∗ We thank Derrick H. Sun for excellent research assistance.

To establish the asymptotic F theory for the Chow test under fixed-smoothing asymptotics, we have to transform the usual orthonormal bases, such as sine and cosine bases, using the Gram–Schmidt orthonormalization.
This is because, unlike HAR inference in a regression with stationary regressors and regression errors, using the usual bases as in Sun (2013) does not lead to a standard fixed-smoothing asymptotic distribution: the regressors in the regression for the structural break test are identically zero before or after the break point and are thus not stationary. The Gram–Schmidt orthonormalization ensures that the transformed bases are orthonormal with respect to a special inner product that is built into the problem under consideration. The asymptotic F test is very convenient to use, as the F critical values are readily available from standard statistical tables and programming environments.

Monte Carlo simulation experiments show that the F test based on the transformed Fourier bases is as accurate as the nonstandard test based on the usual Fourier bases. The F test and the nonstandard test have the same size-adjusted power as the corresponding chi-square tests but much more accurate size. Given its convenience, competitive power, and higher size accuracy, we recommend the F test for practical use.

Our F test theory generalizes the classical Chow test in a linear normal regression, where the F distribution is the exact finite sample distribution. The main departures are that we do not make the normality assumption and that we allow for heteroscedasticity and autocorrelation of unknown forms. Without restrictive assumptions such as normality and strict exogeneity, it is in general not possible to obtain the exact finite sample distribution. Instead, we employ fixed-smoothing asymptotics to show that the Wald statistic is asymptotically F distributed.

This study contributes to the asymptotic F test theory in the HAR literature. The asymptotic F theory has been developed in a number of papers, including Sun (2011); Sun and Kim (2012); Sun (2013); Hwang and Sun (2017); Lazarus et al. (2018); Liu and Sun (2019); Wang and Sun (2019); Martínez-Iriarte et al. (2019). However, none of these studies considers the case where the regressors take the special form of nonstationarity that we consider here. Cho and Vogelsang (2017) consider fixed-b asymptotics for testing structural breaks, but they consider only kernel HAR variance estimators. As a result, the fixed-smoothing asymptotic distributions they obtain are highly nonstandard.

The rest of this paper is organized as follows. Section 2 presents the basic setting and introduces the test statistics. Section 3 establishes the fixed-smoothing asymptotics of the F and t statistics. Section 4 develops asymptotically valid F and t tests. Section 5 extends the basic regression model to include other covariates whose coefficients are known to be stable over time.

(In the series case, fixed-smoothing asymptotics holds the number of basis functions fixed as the sample size increases. In the kernel case, it holds the truncation lag parameter fixed at a certain proportion of the sample size.)
Given the time series observations {X_t ∈ R^m, Y_t ∈ R}_{t=1}^T, we consider the model

Y_t = X_t · 1{t ≤ [λT]} · β_1 + X_t · 1{t ≥ [λT] + 1} · β_2 + u_t, for t = 1, 2, ..., T,

where the unobserved u_t satisfies E(X_t u_t) = 0. In the above, λ is a known parameter in (0, 1), so that [λT] is the period where the structural break may take place. The effects of X_t on Y_t before and after the break are β_1 ∈ R^m and β_2 ∈ R^m, respectively. We allow X_t u_t to exhibit autocorrelation of unknown forms. In particular, we allow u_t to be heteroskedastic so that E(u_t² | X_t) is a nontrivial function of X_t.

We are interested in testing the null H_0: R_0 β_1 = R_0 β_2 against the alternative H_1: R_0 β_1 ≠ R_0 β_2 for some p × m matrix R_0. When R_0 is the m × m identity matrix, we aim at testing whether β_1 is equal to β_2. For the moment, we consider the case that all coefficients are subject to a possible break. In Section 5, we consider the case that some of the coefficients are known to be time invariant.

Let X_{1t} = X_t · 1{t ≤ [λT]} and X_{2t} = X_t · 1{t ≥ [λT] + 1}. Note that both X_{1t} and X_{2t} are nonstationary. The form of the nonstationarity makes the problem at hand unique. Let β = (β_1′, β_2′)′ and X̃_t = (X_{1t}, X_{2t}). Then

Y_t = X̃_t β + u_t,

and the hypotheses of interest become H_0: Rβ = 0 and H_1: Rβ ≠ 0 for R = [R_0, −R_0] ∈ R^{p×2m}. Denote X̃ = (X̃_1′, ..., X̃_T′)′, Y = (Y_1, ..., Y_T)′, and u = (u_1, ..., u_T)′. We estimate β by OLS:

β̂ = (X̃′X̃)^{−1} X̃′Y.

The OLS estimator β̂ satisfies

√T(β̂ − β) = Q̂^{−1} · (1/√T) Σ_{t=1}^T X̃_t′ u_t,

where

Q̂ = X̃′X̃/T = diag( T^{−1} Σ_{t=1}^{[Tλ]} X_t′X_t,  T^{−1} Σ_{t=[Tλ]+1}^T X_t′X_t )

is block diagonal with zero off-diagonal blocks. To make inferences on β, such as testing whether Rβ is zero, we need to estimate the variance of T^{−1/2} Σ_{t=1}^T X̃_t′ u_t. To this end, we first construct the residual û_t = Y_t − X̃_t β̂, which serves as an estimate of u_t. Given a set of basis functions {φ_j(·)}_{j=1}^K, we then construct the series estimator of the variance as

Ω̂ = (1/K) Σ_{j=1}^K [ (1/√T) Σ_{t=1}^T φ_j(t/T) X̃_t′ û_t ]^⊗,

where, for a vector a, a^⊗ is the outer product of a, that is, a^⊗ = aa′. The asymptotic variance of R√T(β̂ − β) is then estimated by R Q̂^{−1} Ω̂ Q̂^{−1} R′.
The Wald statistic for testing H_0: Rβ = 0 against H_1: Rβ ≠ 0 is

F_T = T · (Rβ̂)′ [ R Q̂^{−1} Ω̂ Q̂^{−1} R′ ]^{−1} (Rβ̂).

When p = 1 and we test H_0: Rβ = 0 against a one-sided alternative, say H_1: Rβ > 0, we can construct the t statistic:

t_T = √T · Rβ̂ / [ R Q̂^{−1} Ω̂ Q̂^{−1} R′ ]^{1/2}.

The forms of the F and t statistics are standard.
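To fix ideas, the construction of F_T can be sketched in a few lines of numpy. Everything below (sample size, break fraction, AR(1) design, the K = 8 Fourier basis functions) is an illustrative choice, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)
T, lam, K, m = 200, 0.5, 8, 2
tb = int(lam * T)                         # break point [lambda * T]

# Illustrative data: X_t = (1, q_t) with AR(1) regressor and AR(1) error,
# and no actual break (H0 holds).
q = np.zeros(T); u = np.zeros(T)
for t in range(1, T):
    q[t] = 0.5 * q[t - 1] + rng.standard_normal()
    u[t] = 0.5 * u[t - 1] + rng.standard_normal()
X = np.column_stack([np.ones(T), q])
Y = X @ np.array([1.0, 1.0]) + u

# Break regressors: X_{1t} = X_t 1{t <= [lam T]}, X_{2t} = X_t 1{t > [lam T]}
ind1 = (np.arange(1, T + 1) <= tb).astype(float)
Xt = np.column_stack([X * ind1[:, None], X * (1 - ind1)[:, None]])

beta_hat = np.linalg.solve(Xt.T @ Xt, Xt.T @ Y)
u_hat = Y - Xt @ beta_hat
Qhat = Xt.T @ Xt / T

# Series HAR variance estimator with the usual Fourier basis functions
r = np.arange(1, T + 1) / T
phi = [np.sqrt(2) * f(2 * np.pi * j * r)
       for j in range(1, K // 2 + 1) for f in (np.cos, np.sin)]
Omega = np.zeros((2 * m, 2 * m))
for pj in phi:
    g = (pj[:, None] * Xt * u_hat[:, None]).sum(axis=0) / np.sqrt(T)
    Omega += np.outer(g, g) / K

# Wald statistic for H0: beta_1 = beta_2 (R_0 = I_m, so p = m)
Rm = np.hstack([np.eye(m), -np.eye(m)])
A = np.linalg.inv(Qhat)
V = Rm @ A @ Omega @ A @ Rm.T             # R Qhat^{-1} Omega Qhat^{-1} R'
Rb = Rm @ beta_hat
F_T = T * Rb @ np.linalg.solve(V, Rb)
print(F_T)
```

With the transformed bases of Section 4, the same F_T (after rescaling) feeds directly into the F test of Proposition 4.1.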
To establish the asymptotic distributions of F_T and t_T, we maintain the following three assumptions.

Assumption 3.1  T^{−1} Σ_{t=1}^{[Tr]} X_t′X_t →_p Q · r uniformly over r ∈ [0, 1], and Q is invertible.

Assumption 3.2  T^{−1/2} Σ_{t=1}^{[Tr]} X_t′u_t →_d Λ W_m(r) for r ∈ [0, 1], where Ω = ΛΛ′ is the long run variance of {X_t′u_t} and W_m(·) is an m × 1 standard Brownian motion.

Assumption 3.3  The basis functions φ_j(·), j = 1, 2, ..., K, are piecewise monotonic and piecewise continuously differentiable.

Lemma 3.1  Let Assumptions 3.1 and 3.2 hold. Then

√T(β̂ − β) := ( √T(β̂_1 − β_1) ; √T(β̂_2 − β_2) ) →_d ( Q^{−1}Λ · (1/λ) ∫_0^λ dW_m(r) ; Q^{−1}Λ · (1/(1−λ)) ∫_λ^1 dW_m(r) ).

If Assumption 3.3 also holds, then

(1/√T) Σ_{t=1}^T φ_j(t/T) X̃_t′ û_t →_d ( Λ ∫_0^λ [φ_j(r) − φ̄_{j,1}] dW_m(r) ; Λ ∫_λ^1 [φ_j(r) − φ̄_{j,2}] dW_m(r) ),

where

φ̄_{j,1} = (1/λ) ∫_0^λ φ_j(s) ds and φ̄_{j,2} = (1/(1−λ)) ∫_λ^1 φ_j(s) ds.

Note that (1/λ) ∫_0^λ dW_m(r) and (1/(1−λ)) ∫_λ^1 dW_m(r) are the average changes of the Brownian motion over the intervals [0, λ] and [λ, 1], respectively. Lemma 3.1 shows that √T(β̂_1 − β_1) and √T(β̂_2 − β_2) are (matrix) proportional to these average changes. Given the independence of the changes of a Brownian motion over non-overlapping intervals, √T(β̂_1 − β_1) and √T(β̂_2 − β_2) are asymptotically independent. Note that φ̄_{j,1} can be regarded as an average of φ_j(·) over the interval [0, λ]. Similarly, φ̄_{j,2} can be regarded as an average of φ_j(·) over the interval [λ, 1]. So φ_j(r) − φ̄_{j,1} and φ_j(r) − φ̄_{j,2} are the demeaned versions of φ_j(r) over the intervals [0, λ] and [λ, 1], respectively.

Using Lemma 3.1, we can prove our main theorem below.

Theorem 3.1  Let Assumptions 3.1–3.3 hold. Then, under the null hypothesis,

F_T →_d [ (1/λ) ∫_0^λ dW_p(r) − (1/(1−λ)) ∫_λ^1 dW_p(r) ]′ × [ (1/K) Σ_{j=1}^K { ∫_0^1 φ̃_j(r; λ) dW_p(r) }^⊗ ]^{−1} × [ (1/λ) ∫_0^λ dW_p(r) − (1/(1−λ)) ∫_λ^1 dW_p(r) ] := F_∞,   (1)

where

φ̃_j(r; λ) = (1/λ) [φ_j(r) − φ̄_{j,1}] · 1{r ≤ λ} − (1/(1−λ)) [φ_j(r) − φ̄_{j,2}] · 1{r > λ}.   (2)

When p = 1,

t_T →_d [ (1/λ) ∫_0^λ dW_p(r) − (1/(1−λ)) ∫_λ^1 dW_p(r) ] / [ (1/K) Σ_{j=1}^K { ∫_0^1 φ̃_j(r; λ) dW_p(r) }^⊗ ]^{1/2} := t_∞.

Like the finite sample distributions, the limiting distributions of F_T and t_T depend on λ and on the number and form of the basis functions.
This is an attractive feature of the fixed-smoothing approximations, as they capture the effects of all these factors. More importantly, the fixed-smoothing approximations capture the randomness of the HAR variance estimator, which clearly affects the finite sample distributions of F_T and t_T. This is why the fixed-smoothing asymptotic approximations are more accurate than the chi-square or normal approximations.
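Since F_∞ is pivotal, its critical values can be simulated once and for all, approximating the Brownian motion by scaled partial sums of iid normals (the same device used in the paper's Monte Carlo section). A scaled-down numpy sketch, in which λ = 0.5, K = 8 Fourier-type functions φ̃_j, p = 2, and the replication counts are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
n, lam, K, p, nrep = 1000, 0.5, 8, 2, 2000
r = np.arange(1, n + 1) / n
seg1 = r <= lam

# phi_tilde_j from (2): Fourier functions demeaned over each segment and
# rescaled by 1/lam and -1/(1 - lam)
Phi = np.array([np.sqrt(2) * f(2 * np.pi * j * r)
                for j in range(1, K // 2 + 1) for f in (np.cos, np.sin)])
Pt = np.zeros_like(Phi)
Pt[:, seg1] = (Phi[:, seg1] - Phi[:, seg1].mean(1, keepdims=True)) / lam
Pt[:, ~seg1] = -(Phi[:, ~seg1] - Phi[:, ~seg1].mean(1, keepdims=True)) / (1 - lam)

draws = np.empty(nrep)
for rep in range(nrep):
    dW = rng.standard_normal((n, p)) / np.sqrt(n)   # increments of W_p
    xi = dW[seg1].sum(0) / lam - dW[~seg1].sum(0) / (1 - lam)
    eta = Pt @ dW                                   # rows: int phi_tilde_j dW_p
    S = eta.T @ eta / K
    draws[rep] = xi @ np.linalg.solve(S, xi)        # draw from F_infinity
crit = np.quantile(draws, 0.95)                     # 5%-level critical value
print(crit)
```

Comparing F_T with such simulated critical values gives the nonstandard fixed-smoothing test; the transformed bases of Section 4 replace this simulation step with standard F critical values.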
The limiting distributions F_∞ and t_∞ in Theorem 3.1 are pivotal but nonstandard. We can approximate the nonstandard distributions using a chi-square or normal distribution. We can also design a new set of basis functions so that F_∞ and t_∞ become the standard F and t distributions after some multiplicative adjustment.

Define

φ̃_0(r; λ) = (1/λ) · 1{r ≤ λ} − (1/(1−λ)) · 1{r > λ}.

Then

(1/λ) ∫_0^λ dW_p(r) − (1/(1−λ)) ∫_λ^1 dW_p(r) = ∫_0^1 φ̃_0(r; λ) dW_p(r) ∼ N( 0, ∫_0^1 [φ̃_0(r; λ)]² dr · I_p ) = N( 0, (1/(λ(1−λ))) I_p ),

so

η_0 := √(λ(1−λ)) [ (1/λ) ∫_0^λ dW_p(r) − (1/(1−λ)) ∫_λ^1 dW_p(r) ] = √(λ(1−λ)) ∫_0^1 φ̃_0(r; λ) dW_p(r) ∼ N(0, I_p).

As a result,

λ(1−λ) F_T →_d η_0′ [ (1/K) Σ_{j=1}^K η_j η_j′ ]^{−1} η_0 and √(λ(1−λ)) t_T →_d η_0 / [ (1/K) Σ_{j=1}^K η_j η_j′ ]^{1/2},

where

η_j = ∫_0^1 φ̃_j(r; λ) dW_p(r) for j = 1, ..., K.

When K is relatively large, it is reasonable to approximate (1/K) Σ_{j=1}^K η_j η_j′ by its mean:

E (1/K) Σ_{j=1}^K η_j η_j′ = I_p · (1/K) Σ_{j=1}^K ∫_0^1 [φ̃_j(r; λ)]² dr.

With such an approximation, we have

λ(1−λ) · [ (1/K) Σ_{j=1}^K ∫_0^1 [φ̃_j(r; λ)]² dr ]^{−1} · F_∞ ∼_a χ²_p and √(λ(1−λ)) · [ (1/K) Σ_{j=1}^K ∫_0^1 [φ̃_j(r; λ)]² dr ]^{−1/2} · t_∞ ∼_a N(0, 1),

where '∼_a' signifies distributional approximation. As a result, we can employ the following approximations:

F*_T := λ(1−λ) · [ (1/(KT)) Σ_{j=1}^K Σ_{i=1}^T ( φ̃_{j,T}(i/T; λ) )² ]^{−1} · F_T ∼_a χ²_p,   (3)

t*_T := √(λ(1−λ)) · [ (1/(KT)) Σ_{j=1}^K Σ_{i=1}^T ( φ̃_{j,T}(i/T; λ) )² ]^{−1/2} · t_T ∼_a N(0, 1),   (4)

where φ̃_{j,T}(r; λ) is the finite sample version of φ̃_j(r; λ) given by

φ̃_{j,T}(r; λ) = (1/λ) [ φ_j(r) − (1/[λT]) Σ_{t=1}^{[λT]} φ_j(t/T) ] · 1{r ≤ λ} − (1/(1−λ)) [ φ_j(r) − (1/(T−[λT])) Σ_{t=[λT]+1}^T φ_j(t/T) ] · 1{r > λ}.   (5)

It is important to point out that the chi-square and normal approximations are not based on the original Wald and t statistics but rather on their modified versions F*_T and t*_T. To a great extent, the chi-square and normal approximations we propose here improve upon the conventional chi-square and normal approximations that are applied directly to the original Wald and t statistics.

Note that the chi-square distribution and standard normal distribution in (3) and (4) are not the asymptotic distributions of F*_T and t*_T for a fixed K. The fixed-K asymptotic distributions are given by

F*_T →_d λ(1−λ) · [ (1/K) Σ_{j=1}^K ∫_0^1 [φ̃_j(r; λ)]² dr ]^{−1} · F_∞ := F*_∞,   (6)

t*_T →_d √(λ(1−λ)) · [ (1/K) Σ_{j=1}^K ∫_0^1 [φ̃_j(r; λ)]² dr ]^{−1/2} · t_∞ := t*_∞.   (7)

These follow directly from Theorem 3.1. The chi-square distribution and standard normal distribution are only approximations to the above nonstandard fixed-K asymptotic distributions.

To obtain convenient fixed-K asymptotic approximations, we note that each η_j, j = 0, 1, ..., K, is normal. For each j ≠ 0, cov(η_0, η_j) is proportional to

∫_0^1 φ̃_0(r; λ) φ̃_j(r; λ) dr = (1/λ²) ∫_0^λ [φ_j(r) − φ̄_{j,1}] dr + (1/(1−λ)²) ∫_λ^1 [φ_j(r) − φ̄_{j,2}] dr = 0.

So η_0 is independent of η_j, j = 1, ..., K. In addition,

cov(η_{j1}, η_{j2}) = ∫_0^1 φ̃_{j1}(r; λ) φ̃_{j2}(r; λ) dr.

Therefore, if {φ̃_j(r; λ)} are orthonormal, then η_j for j = 0, 1, ..., K are independent standard normal vectors. In this case, λ(1−λ) F_∞ is a quadratic form in a standard normal vector with an independent weighting matrix. After some adjustment, we can show that λ(1−λ) F_∞ is a multiple of a standard F random variable and that the adjusted F_T converges to the standard F distribution. Similarly, √(λ(1−λ)) · t_T converges to Student's t distribution.
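The scale factor in (3)–(4) is a deterministic function of the bases and is easy to compute. A numpy sketch with illustrative settings (T = 200, λ = 0.5, K = 8 usual Fourier functions; the χ²_p critical value is simulated here only to keep the sketch self-contained):

```python
import numpy as np

T, lam, K, p = 200, 0.5, 8, 2
rng = np.random.default_rng(1)
r = np.arange(1, T + 1) / T
seg1 = r <= lam                        # grid points with t <= [lam T]

def phi_tilde_T(phi_vals):
    """Finite-sample version (5): demean within each segment, then rescale."""
    out = np.zeros(T)
    out[seg1] = (phi_vals[seg1] - phi_vals[seg1].mean()) / lam
    out[~seg1] = -(phi_vals[~seg1] - phi_vals[~seg1].mean()) / (1 - lam)
    return out

a_KT = 0.0                             # (KT)^{-1} sum_j sum_i phi_tilde^2
for j in range(1, K // 2 + 1):
    for phi in (np.sqrt(2) * np.cos(2 * np.pi * j * r),
                np.sqrt(2) * np.sin(2 * np.pi * j * r)):
        a_KT += np.sum(phi_tilde_T(phi) ** 2) / (K * T)

scale = lam * (1 - lam) / a_KT         # F*_T = scale * F_T, as in (3)
chi2_crit = np.quantile(rng.chisquare(p, 200_000), 0.95)  # ~ chi^2_p 5% value
print(a_KT, scale, chi2_crit)
```

The chi-square version of the test then rejects at the 5% level when scale · F_T exceeds chi2_crit.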
Proposition 4.1  Let Assumptions 3.1–3.3 hold. If {φ̃_j(r; λ)} are orthonormal, then

F̃*_T := ((K − p + 1)/(Kp)) · λ(1−λ) · F_T →_d F_{p,K−p+1},

and

t̃*_T := √(λ(1−λ)) · t_T →_d t_K,

where F_{p,K−p+1} is the standard F distribution with degrees of freedom (p, K − p + 1) and t_K is Student's t distribution with K degrees of freedom.

When {φ̃_j(r; λ)} are orthonormal, we have (1/K) Σ_{j=1}^K ∫_0^1 [φ̃_j(r; λ)]² dr = 1. In view of this, we can see that the definitions of F̃*_T and t̃*_T are similar to those of F*_T and t*_T given in (3) and (4). The only difference is that there is an additional degrees-of-freedom adjustment factor in F̃*_T when p > 1.

To design basis functions such that {φ̃_j(r; λ)} are orthonormal, we need the following lemma.

Lemma 4.1
Let δ(·) be the Dirac delta function, so that

∫_0^1 ∫_0^1 φ_{j1}(r) δ(r − s) φ_{j2}(s) dr ds = ∫_0^1 φ_{j1}(r) φ_{j2}(r) dr.

Then

∫_0^1 φ̃_{j1}(r; λ) φ̃_{j2}(r; λ) dr = ∫_0^1 ∫_0^1 C(r, s; λ) φ_{j1}(r) φ_{j2}(s) dr ds,

where

C(r, s; λ) = [δ(r − s) − 1/λ] · 1{(r, s) ∈ [0, λ] × [0, λ]} / λ² + [δ(r − s) − 1/(1−λ)] · 1{(r, s) ∈ (λ, 1] × (λ, 1]} / (1−λ)².

Let

W̃_p(r; λ) = (1/λ) [ W_p(r) − (r/λ) W_p(λ) ] · 1{0 ≤ r ≤ λ} − (1/(1−λ)) { W_p(r) − W_p(λ) − ((r − λ)/(1−λ)) [W_p(1) − W_p(λ)] } · 1{λ < r ≤ 1}

be the transformed Brownian motion. Then we have

∫_0^1 φ̃_j(r; λ) dW_p(r) = ∫_0^1 φ_j(r) dW̃_p(r; λ),

and

E[ dW̃_p(r; λ) dW̃_p′(s; λ) ] = I_p · C(r, s; λ) dr ds.

Therefore, C(r, s; λ) can be regarded as the covariance kernel of the transformed Brownian motion.

To design the basis functions {φ_j(r)} such that {φ̃_j(r; λ)} are orthonormal on L²[0, 1], we require that {φ_j(r)} be orthonormal with respect to the covariance kernel C(r, s; λ), that is,

∫_0^1 ∫_0^1 C(r, s; λ) φ_{j1}(r) φ_{j2}(s) dr ds = 1{j1 = j2}.   (8)

This can be achieved by applying the Gram–Schmidt orthonormalization to any set of basis functions on L²[0, 1]:

{φ_j}  →(estimation error)→  {φ̃_j}  (may not be orthonormal on L²[0, 1])
  ↓ GS
{φ*_j}  →(estimation error)→  {φ̃*_j}  (orthonormal on L²[0, 1])

Here {φ_j} is the initial set of basis functions, and {φ*_j} is the Gram–Schmidt orthonormalized set. The arrows 'φ_j → φ̃_j' and 'φ*_j → φ̃*_j' reflect the effect of the estimation error in estimating β: had we known β, we would have used the true u_t instead of û_t in constructing the variance estimator, and the key elements of the weighting matrix in (1) in Theorem 3.1 would have been ∫_0^1 φ_j(r) dW_p(r) instead of ∫_0^1 φ̃_j(r; λ) dW_p(r).

The Gram–Schmidt orthonormalization ensures that {φ*_j} are orthonormal with respect to the covariance kernel C(r, s; λ):

∫_0^1 ∫_0^1 φ*_{j1}(r) φ*_{j2}(s) C(r, s; λ) dr ds = 1{j1 = j2}.

In view of

∫_0^1 ∫_0^1 φ*_{j1}(r) φ*_{j2}(s) C(r, s; λ) dr ds = ∫_0^1 φ̃*_{j1}(r; λ) φ̃*_{j2}(r; λ) dr,

we have: {φ̃*_j} are orthonormal on L²[0, 1]. If we use {φ*_j} in constructing the variance estimator, then

λ(1−λ) F_T →_d η_0′ [ (1/K) Σ_{j=1}^K η_j η_j′ ]^{−1} η_0

for η_j = ∫_0^1 φ̃*_j(r; λ) dW_p(r) ∼ iid N(0, I_p), because {φ̃*_j} are orthonormal on L²[0, 1]. Moreover, for j = 1, ..., K, η_j is independent of η_0. Therefore, the asymptotic F theory in Proposition 4.1 holds. Similarly, the asymptotic t theory holds.

Instead of searching for basis functions that satisfy (8), we search for their discrete versions: the basis vectors. For each basis function φ_k(r), the corresponding basis vector is defined as

φ_k = ( φ_k(1/T), φ_k(2/T), ..., φ_k(T/T) )′.

Let C_T := C_T(λ) be the T × T matrix whose (i, j)-th element is

C_T(i, j; λ) = [T · 1{i = j} − 1/λ] · 1{(i/T, j/T) ∈ [0, λ] × [0, λ]} / λ² + [T · 1{i = j} − 1/(1−λ)] · 1{(i/T, j/T) ∈ (λ, 1] × (λ, 1]} / (1−λ)².

By definition, C_T is symmetric and positive semidefinite; it is the discrete version of C(r, s; λ). For any two vectors r_1, r_2 ∈ R^T, we define the inner product

⟨r_1, r_2⟩ = r_1′ C_T r_2 / T².   (9)

Then the discrete analogue of (8) is

⟨φ_{j1}, φ_{j2}⟩ = 1{j1 = j2} for j1, j2 = 1, ..., K.   (10)

Given any basis vectors φ_1, ..., φ_K, we now apply the Gram–Schmidt orthonormalization via the Cholesky decomposition. Let φ = (φ_1, ..., φ_K) be the T × K matrix of basis vectors. Let U_T ∈ R^{K×K} be the upper triangular factor in the Cholesky decomposition of φ′C_Tφ/T², so that φ′C_Tφ/T² = U_T′U_T. Define φ* = φU_T^{−1} := (φ*_1, ..., φ*_K).
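This Cholesky-based construction is straightforward to carry out numerically. A numpy sketch, in which T, λ, K, and the Fourier starting basis are illustrative choices:

```python
import numpy as np

T, lam, K = 200, 0.5, 8
tb = int(lam * T)
r = np.arange(1, T + 1) / T

# Basis vectors phi_k = (phi_k(1/T), ..., phi_k(T/T))' for a Fourier basis
Phi = np.column_stack(
    [np.sqrt(2) * f(2 * np.pi * j * r)
     for j in range(1, K // 2 + 1) for f in (np.cos, np.sin)])

# Discrete covariance kernel C_T(i, j; lam): T 1{i=j} plays the role of
# the Dirac delta; each diagonal block demeans within its own segment.
I1 = np.arange(T) < tb
C = np.zeros((T, T))
C[np.ix_(I1, I1)] = -(1 / lam) / lam**2
C[np.ix_(~I1, ~I1)] = -(1 / (1 - lam)) / (1 - lam)**2
C[np.diag_indices(T)] += np.where(I1, T / lam**2, T / (1 - lam)**2)

G = Phi.T @ C @ Phi / T**2            # Gram matrix under the C_T inner product
U = np.linalg.cholesky(G).T           # upper triangular factor, G = U'U
Phi_star = Phi @ np.linalg.inv(U)     # Gram-Schmidt transformed basis vectors
```

The columns of Phi_star are then orthonormal under the inner product (9), which is the discrete condition (10).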
We then have

(φ*)′C_Tφ*/T² = (U_T′)^{−1} φ′C_Tφ U_T^{−1} / T² = (U_T′)^{−1} U_T′ U_T U_T^{−1} = I_K.

That is, the columns of the matrix φ* satisfy the conditions in (10). Note that the (k_1, k_2)-th element of φ′C_Tφ/T² satisfies

(1/T²) Σ_{j=1}^T Σ_{i=1}^T φ_{k1}(i/T) C_T(i, j; λ) φ_{k2}(j/T) → ∫_0^1 ∫_0^1 C(r, s; λ) φ_{k1}(r) φ_{k2}(s) dr ds = ∫_0^1 φ̃_{k1}(r; λ) φ̃_{k2}(r; λ) dr = cov(η_{k1}, η_{k2})

as T → ∞. This implies that U_T converges to the upper triangular factor of the Cholesky decomposition of var(η_1, ..., η_K). As a result, every transformed basis vector is approximately equal to a linear combination of the original basis vectors. The implied basis functions are thus equal to linear combinations of the original basis functions. Therefore, if Assumption 3.3 holds for the original basis functions, it also holds for the transformed basis functions. It then follows that Proposition 4.1 holds when {φ*_1, ..., φ*_K} are used as the basis vectors in constructing the asymptotic variance estimator. More specifically, if we estimate Ω by

Ω̂ = (1/K) Σ_{j=1}^K [ (1/√T) Σ_{t=1}^T φ*_{j,t} X̃_t′ û_t ]^⊗,

where φ*_{j,t} is the t-th element of the vector φ*_j, then the asymptotic F and t results in Proposition 4.1 hold.

Suppose there is another covariate vector Z_t ∈ R^ℓ whose effect on Y_t does not change over time, so that we have the model

Y_t = X_t · 1{t ≤ [λT]} · β_1 + X_t · 1{t ≥ [λT] + 1} · β_2 + Z_t γ + u_t.

Let Z = (Z_1′, ..., Z_T′)′ and M_Z = I_T − Z(Z′Z)^{−1}Z′. Then

M_Z Y = M_Z X̃ β + M_Z u.

The OLS estimator of β = (β_1′, β_2′)′ is now

β̂ = (X̃′M_ZX̃)^{−1} X̃′M_ZY.

Let

û = (û_1, ..., û_T)′ = M_Z Y − M_Z X̃ β̂ = M_Z u − M_Z X̃ (β̂ − β)

and X̃_z = (X̃_{z,1}′, ..., X̃_{z,T}′)′ = M_Z X̃. Define

Q̂_{X̃·Z} = X̃′M_ZX̃/T and Ω̂ = (1/K) Σ_{j=1}^K [ (1/√T) Σ_{t=1}^T φ_j(t/T) X̃_{z,t}′ û_t ]^⊗.
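The projection step above is the standard Frisch–Waugh partialling-out device. A minimal numpy sketch (dimensions and data purely illustrative, with generic regressors standing in for X̃_t and Z_t) confirming that (X̃′M_ZX̃)^{−1}X̃′M_ZY matches the X̃-block of the full OLS fit:

```python
import numpy as np

rng = np.random.default_rng(2)
T, k_x, k_z = 200, 4, 2
Xt = rng.standard_normal((T, k_x))    # stands in for the break regressors
Z = rng.standard_normal((T, k_z))     # stable covariates
Y = Xt @ np.ones(k_x) + Z @ np.ones(k_z) + rng.standard_normal(T)

# Annihilator M_Z = I - Z (Z'Z)^{-1} Z' and the partialled-out estimator
MZ = np.eye(T) - Z @ np.linalg.solve(Z.T @ Z, Z.T)
beta_partial = np.linalg.solve(Xt.T @ MZ @ Xt, Xt.T @ MZ @ Y)

# Full regression of Y on (X, Z): its X-block coincides with beta_partial
full = np.linalg.lstsq(np.column_stack([Xt, Z]), Y, rcond=None)[0]
print(beta_partial, full[:k_x])
```

The residuals M_Z(Y − X̃β̂) from this fit are exactly the û_t that enter the variance estimator Ω̂ above.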
The Wald statistic for testing H_0: Rβ = 0 against H_1: Rβ ≠ 0 takes the same form as before:

F_T = T · (Rβ̂)′ [ R Q̂_{X̃·Z}^{−1} Ω̂ Q̂_{X̃·Z}^{−1} R′ ]^{−1} (Rβ̂).

When p = 1, we construct the t statistic:

t_T = √T · Rβ̂ / [ R Q̂_{X̃·Z}^{−1} Ω̂ Q̂_{X̃·Z}^{−1} R′ ]^{1/2}.

To establish the asymptotic distributions of F_T and t_T, we maintain the two assumptions below, which are analogous to Assumptions 3.1 and 3.2.

Assumption 5.1  T^{−1} Σ_{t=1}^{[Tr]} (X_t, Z_t)′(X_t, Z_t) →_p Q · r uniformly over r ∈ [0, 1] for an (m + ℓ) × (m + ℓ) invertible matrix Q.

Assumption 5.2  T^{−1/2} Σ_{t=1}^{[Tr]} (X_t, Z_t)′u_t →_d Λ W_{m+ℓ}(r) for r ∈ [0, 1], where ΛΛ′ is the long run variance of the process {(X_t, Z_t)′u_t} and W_{m+ℓ}(·) is an (m + ℓ) × 1 standard Brownian motion.

We partition Q and Λ according to

Q = ( Q_XX, Q_XZ ; Q_ZX, Q_ZZ ) and Λ = ( Λ_X ; Λ_Z ),

where Q_XX ∈ R^{m×m}, Q_ZZ ∈ R^{ℓ×ℓ}, Λ_X ∈ R^{m×(m+ℓ)}, and Λ_Z ∈ R^{ℓ×(m+ℓ)}.

Theorem 5.1  Let Assumptions 3.3, 5.1, and 5.2 hold. Then

(a) R√T(β̂ − β) →_d R_0 Q_XX^{−1} Λ_X ( (1/λ) ∫_0^λ dW_{m+ℓ}(r) − (1/(1−λ)) ∫_λ^1 dW_{m+ℓ}(r) ), where R_0 is the p × m matrix with R = [R_0, −R_0].

(b) R Q̂_{X̃·Z}^{−1} (1/√T) Σ_{t=1}^T φ_j(t/T) X̃_{z,t}′ û_t →_d R_0 Q_XX^{−1} Λ_X ∫_0^1 φ̃_j(r; λ) dW_{m+ℓ}(r) jointly over j = 1, 2, ..., K.

(c) F_T →_d [ (1/λ)∫_0^λ dW_p(r) − (1/(1−λ))∫_λ^1 dW_p(r) ]′ × [ (1/K) Σ_{j=1}^K { ∫_0^1 φ̃_j(r; λ) dW_p(r) }^⊗ ]^{−1} × [ (1/λ)∫_0^λ dW_p(r) − (1/(1−λ))∫_λ^1 dW_p(r) ].

When p = 1,

t_T →_d [ (1/λ)∫_0^λ dW_p(r) − (1/(1−λ))∫_λ^1 dW_p(r) ] / [ (1/K) Σ_{j=1}^K { ∫_0^1 φ̃_j(r; λ) dW_p(r) }^⊗ ]^{1/2}.

Theorem 5.1 shows that the limiting distributions of the Wald and t statistics are the same as in the case without the extra covariate Z_t. The asymptotic F and t limit theory can be developed in exactly the same way as in Section 4. We present the result formally as a corollary.
Corollary 1
Let Assumptions 3.3, 5.1, and 5.2 hold. Suppose that the Gram–Schmidt transformed basis vectors φ*_1, ..., φ*_K are used in constructing the variance estimator, that is,

Ω̂ = (1/K) Σ_{j=1}^K [ (1/√T) Σ_{t=1}^T φ*_{j,t} X̃_{z,t}′ û_t ]^⊗,

where φ*_{j,t} is the t-th element of the vector φ*_j. Then

F̃*_T := ((K − p + 1)/(Kp)) · λ(1−λ) · F_T →_d F_{p,K−p+1} and t̃*_T := √(λ(1−λ)) · t_T →_d t_K.

In this section, we investigate the finite sample properties of the proposed F test. We consider the linear regression model with m = 2 and X_t = (1, q_t). The regressor q_t follows an AR(1) process, and the error u_t follows an independent AR(1) or ARMA(1,1) process. That is,

q_t = ρ q_{t−1} + ε_{q,t} and u_t = ρ u_{t−1} + ε_{u,t} + ψ ε_{u,t−1},

where both ε_{q,t} and ε_{u,t} are iid N(0, 1) and {ε_{q,t}: t = 1, ..., T} are independent of {ε_{u,t}: t = 1, 2, ..., T}. Note that the AR parameter ρ is the same for q_t and u_t. We consider the sample sizes T = 100, 200, and 500. We let λ = 0.5. Without loss of generality, we set β_1 = (0, 0)′ and β_2 = (0, 0)′ under the null. We consider testing H_0: β_1 = β_2 against H_1: β_1 ≠ β_2, so that p = 2.

We consider two pairs of tests, both of which are based on series variance estimators. The first pair uses the (usual) Fourier bases

{ φ_{2j−1}(r) = √2 cos(2πjr), φ_{2j}(r) = √2 sin(2πjr), j = 1, ..., K/2 }.   (11)

Each test in this pair is based on the same test statistic F*_T defined in (3) but uses a different reference distribution: the first test uses the chi-square approximation (χ²_p), while the second uses the nonstandard fixed-smoothing approximation given in (6). We refer to the two tests as 'χ²_p: Fourier Bases' and 'F*_∞: Fourier Bases,' respectively. The nonstandard critical values are simulated: we approximate the standard Brownian motion in the nonstandard distribution using scaled partial sums of 1,000 iid N(0, 1) random variables, and we use 10,000 simulation replications to compute the critical values.

The second pair of tests uses the transformed Fourier bases obtained via the Gram–Schmidt orthonormalization given in Section 4.3. Each of the two tests in this pair is based on the same test statistic F̃*_T defined in Proposition 4.1. The first test uses the standard F approximation, and the second test uses the rescaled chi-square distribution ((K − p + 1)/(Kp)) · χ²_p; equivalently, the second test employs the test statistic F̃_T = λ(1−λ) · F_T and the standard chi-square approximation (χ²_p). We refer to the two tests as 'F: Transformed Bases' and 'χ²_p: Transformed Bases,' respectively. The chi-square test in the second pair is used to illustrate the effectiveness of the F approximation in reducing the size distortion.

The nominal level of all tests is 5%. The number of simulation replications is 10,000. Figures 1 and 2 report the null rejection probability of each test for the sample sizes T = 100 and T = 500 when q_t and u_t follow independent AR(1) processes with the same AR parameter ρ. Several patterns emerge from these two figures:

• Regardless of the bases used, the chi-square tests over-reject the null by a large margin, especially when K is small.

• Regardless of the bases used, the nonstandard test and the F test are much more accurate than the chi-square tests.

• For each given value of K, the null rejection probabilities of the nonstandard test and the F test are close to each other. This shows that, in terms of size accuracy, using the F approximation (when the transformed Fourier bases are employed) is as good as using the nonstandard approximation (when the Fourier bases are employed). However, the F approximation is more convenient to use and, hence, is preferred.

• For each given value of K, the null rejection probabilities of the two chi-square tests are close to each other, although the one based on the transformed Fourier bases is somewhat more accurate.
This shows that the bases do not have a large effect on the quality of the chi-square approximation.

• The nonstandard test and the standard F test can still have quite some size distortion if K is large and the regressor and error processes are persistent. The size distortion comes from the bias of the variance estimator: when K is large, we average over a frequency window that is too wide when the processes are highly persistent, that is, when the spectral density of {X_t′u_t} is not very flat at the origin. So, it is important to use a data-driven K to obtain an accurate test in practice.

• Comparing the two figures, we see that the size distortion of every test becomes smaller when the sample size is larger.

Figure 3 reports the null rejection probabilities when the sample size T is 200 and when the error process may have an MA component and the AR parameter may be negative. The same patterns as in Figures 1 and 2 emerge.

Next, we consider the size properties of the tests with a data-driven K. Note that

R√T(β̂ − β) = (1/√T) Σ_{t=1}^T R (X̃′X̃/T)^{−1} X̃_t′ u_t = (1/√T) Σ_{t=1}^T v_t + o_p(1),

where v_t = R Q̃^{−1} X̃_t′ u_t and Q̃ is the probability limit of X̃′X̃/T. Then

R Q̂^{−1} Ω̂ Q̂^{−1} R′ = (1/K) Σ_{j=1}^K [ (1/√T) Σ_{t=1}^T φ_j(t/T) v̂_t ]^⊗,

where v̂_t = R Q̂^{−1} X̃_t′ û_t. So R Q̂^{−1} Ω̂ Q̂^{−1} R′ can be viewed as the series estimator of the long run variance of {v_t}. We can follow Phillips (2005) and choose K to minimize the mean square error (MSE) of R Q̃^{−1} Ω̂ Q̃^{−1} R′. We fit a VAR(1) model to v̂_t and use the fitted model to compute the data-driven MSE-optimal K.

Figure 1: The empirical null rejection probabilities of different 5% tests when T = 100 for a range of different K values from 2 to 20 with increment 2.

Figure 2: The empirical null rejection probabilities of different 5% tests when T = 500 for a range of different K values from 2 to 20 with increment 2.

Figure 3: The empirical null rejection probabilities of different 5% tests when T = 200 for a range of different K values from 2 to 20 with increment 2.

Table 1 reports the null rejection probabilities and the average values of K used with the data-driven choice of K for different sample sizes. The qualitative observations from Figures 1–3 continue to hold with the data-driven K. In particular, the nonstandard test and the standard F test are more accurate than the corresponding chi-square tests, especially when the latter have large positive size distortions. The null rejection probabilities of the nonstandard test and the standard F test are close to each other. Similarly, the null rejection probabilities of the two chi-square tests are close to each other.
As expected, the average value of K decreases with the persistence of the underlying processes. The higher the persistence, the smaller the average K, and the more effective the nonstandard test and the standard F test are in reducing the size distortion.

Table 1: The empirical null rejection probabilities of different 5% tests with the data-driven choice of K.

                     ρ=0    ρ=0.   ρ=0.   ρ=0.   ρ=−0.  ρ=−0.  ρ=0.   ρ=0.
                     ψ=0    ψ=0    ψ=0    ψ=0    ψ=0    ψ=0    ψ=0.   ψ=.
T = 100
 χ²: Fourier         0.092  0.131  0.227  0.511  0.128  0.091  0.287  0.558
 F*∞: Fourier        0.060  0.076  0.101  0.198  0.067  0.056  0.087  0.170
 χ²: Transformed     0.089  0.124  0.210  0.473  0.119  0.085  0.259  0.516
 F: Transformed      0.064  0.079  0.101  0.209  0.071  0.060  0.088  0.182
 Ave(K)              30.00  18.40  9.71   5.29   16.57  26.34  6.14   4.27
T = 200
 χ²: Fourier         0.069  0.100  0.153  0.396  0.092  0.070  0.197  0.444
 F*∞: Fourier        0.052  0.066  0.079  0.150  0.055  0.050  0.075  0.131
 χ²: Transformed     0.068  0.094  0.142  0.363  0.088  0.067  0.179  0.406
 F: Transformed      0.057  0.069  0.082  0.153  0.058  0.051  0.074  0.135
 Ave(K)              70     28.82  14.32  6.10   24.96  46.02  8.56   4.56
T = 500
 χ²: Fourier         0.055  0.068  0.096  0.222  0.067  0.057  0.119  0.278
 F*∞: Fourier        0.049  0.054  0.062  0.091  0.053  0.048  0.056  0.084
 χ²: Transformed     0.053  0.064  0.091  0.209  0.064  0.055  0.110  0.253
 F: Transformed      0.048  0.053  0.062  0.096  0.051  0.048  0.058  0.086
 Ave(K)              144.51 56.47  26.91  9.03   46.58  96.41  15.19  6.05

To simulate the power of the tests, we let β_1 = (0, 0)′ and β_2 = (δ, δ)′. Figure 4 presents the size-adjusted power curves as functions of δ when the sample size is 200 and when both q_t and u_t follow AR(1) processes. The figure is representative of the other cases. For the two tests in each pair, the size-adjusted powers are the same, as they are based on the same test statistic. Thus, we need only report two power curves: one for the usual Fourier bases and the other for the transformed Fourier bases. The basic message from Figure 4 is that the size-adjusted powers associated with the two sets of bases are very close to each other. This, coupled with its size accuracy and convenience of use, suggests that the F test be used in empirical applications.

This study proposes asymptotic F and t tests for structural breaks that are robust to heteroscedasticity and autocorrelation. The tests are based on a special series HAR variance estimator whose basis functions are crafted via the Gram–Schmidt orthonormalization. Monte Carlo simulations show that the F test is much more accurate than the corresponding chi-square test.

This study assumes that there is a single known break point. The asymptotic F and t theory can be extended to the case with multiple but known break points. The theory can also be extended to allow for a linear trend or other deterministic trends, but we need to redesign the basis functions. In principle, the tests based on series HAR variance estimation can be extended to accommodate the case with an unknown break point along the lines of Cho and Vogelsang (2017). All the basic ingredients have been established in this study: we only need to take the supremum (or another functional) of the Wald or t statistic over λ as the test statistic. However, the convenient F approximation is lost, as the supremum of the standard distributions is no longer standard.
Therefore, it is not clear whether series HAR variance estimators would retain an advantage over kernel HAR variance estimators in that setting.
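The size distortion of the chi-square test documented in Table 1 can be illustrated without any regression at all. Under fixed-smoothing asymptotics (Proposition 4.1), the Wald statistic behaves like a quadratic form in independent standard normal vectors, so the two sets of critical values can be compared by simulating the limiting random variables directly. The sketch below does this in Python; the values p = 2 and K = 16 are illustrative choices, not taken from the paper, and the λ(1−λ) scaling is absorbed into the numerator vector for simplicity.

```python
import numpy as np
from scipy import stats

# Monte Carlo check of the fixed-smoothing limit: with eta_0, eta_1, ..., eta_K
# iid N(0, I_p), the scaled statistic (K - p + 1)/(Kp) * W is exactly
# F_{p, K-p+1} distributed (Hotelling's T^2), while the chi-square
# approximation treats W itself as chi^2_p.
rng = np.random.default_rng(0)
p, K, reps = 2, 16, 100_000          # illustrative values, not from the paper

eta0 = rng.standard_normal((reps, p))        # numerator vector eta_0
etas = rng.standard_normal((reps, K, p))     # eta_1, ..., eta_K
S = etas.transpose(0, 2, 1) @ etas / K       # K^{-1} sum_j eta_j eta_j'
W = np.einsum("ri,rij,rj->r", eta0, np.linalg.inv(S), eta0)

chi2_rej = np.mean(W > stats.chi2.ppf(0.95, df=p))
f_rej = np.mean((K - p + 1) / (K * p) * W > stats.f.ppf(0.95, p, K - p + 1))
print(f"chi-square test: {chi2_rej:.3f}, F test: {f_rej:.3f}")
```

In runs of this sketch, the chi-square critical values reject roughly 9% of the time at the nominal 5% level, while the F critical values stay close to 5%, mirroring the pattern in Table 1: the chi-square approximation ignores the finite-K randomness of the variance estimator, which the F reference distribution captures exactly.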
Figure 4: The size-adjusted power curves for different 5% tests when T = 200. (Each panel plots the curves for the Fourier bases and the transformed bases.)

Appendix of Proofs
Proof of Lemma 3.1.
Under Assumption 3.1, we have
\[
\frac{\tilde X'\tilde X}{T}
= \begin{pmatrix} T^{-1}\sum_{t=1}^{[T\lambda]} X_t'X_t & O \\ O & T^{-1}\sum_{t=[T\lambda]+1}^{T} X_t'X_t \end{pmatrix}
\to_p \begin{pmatrix} \lambda Q & O \\ O & (1-\lambda)Q \end{pmatrix}.
\]
Under Assumption 3.2, we have
\[
\frac{\tilde X'u}{\sqrt T}
= \begin{pmatrix} T^{-1/2}\sum_{t=1}^{[T\lambda]} X_t'u_t \\ T^{-1/2}\sum_{t=[T\lambda]+1}^{T} X_t'u_t \end{pmatrix}
\to_d \begin{pmatrix} \Lambda W_m(\lambda) \\ \Lambda[W_m(1)-W_m(\lambda)] \end{pmatrix}.
\]
Hence
\[
\sqrt T(\hat\beta-\beta)
\to_d \begin{pmatrix} \lambda Q & O \\ O & (1-\lambda)Q \end{pmatrix}^{-1}
\begin{pmatrix} \Lambda W_m(\lambda) \\ \Lambda[W_m(1)-W_m(\lambda)] \end{pmatrix}
= \begin{pmatrix} (\lambda Q)^{-1}\Lambda W_m(\lambda) \\ [(1-\lambda)Q]^{-1}\Lambda[W_m(1)-W_m(\lambda)] \end{pmatrix}
= \begin{pmatrix} Q^{-1}\Lambda\cdot\frac{W_m(\lambda)}{\lambda} \\ Q^{-1}\Lambda\cdot\frac{W_m(1)-W_m(\lambda)}{1-\lambda} \end{pmatrix}
= \begin{pmatrix} Q^{-1}\Lambda\cdot\frac1\lambda\int_0^\lambda dW_m(r) \\ Q^{-1}\Lambda\cdot\frac{1}{1-\lambda}\int_\lambda^1 dW_m(r) \end{pmatrix}.
\]
For the second part of the lemma, we have
\[
\frac{1}{\sqrt T}\sum_{t=1}^T \phi_j\!\Big(\frac tT\Big)\tilde X_t'\hat u_t
= \frac{1}{\sqrt T}\sum_{t=1}^T \phi_j\!\Big(\frac tT\Big)\tilde X_t'\big(Y_t-\tilde X_t\hat\beta\big)
= \frac{1}{\sqrt T}\sum_{t=1}^T \phi_j\!\Big(\frac tT\Big)\tilde X_t'\big(\tilde X_t\beta+u_t-\tilde X_t\hat\beta\big)
= \frac{1}{\sqrt T}\sum_{t=1}^T \phi_j\!\Big(\frac tT\Big)\tilde X_t'u_t
- \Big[\frac1T\sum_{t=1}^T \phi_j\!\Big(\frac tT\Big)\tilde X_t'\tilde X_t\Big]\sqrt T(\hat\beta-\beta).
\]
Now, it is not hard to show that under Assumption 3.3,
\[
\frac1T\sum_{t=1}^T \phi_j\!\Big(\frac tT\Big)\tilde X_t'\tilde X_t
= \begin{pmatrix} T^{-1}\sum_{t=1}^{[T\lambda]}\phi_j\big(\frac tT\big)X_t'X_t & O \\ O & T^{-1}\sum_{t=[T\lambda]+1}^{T}\phi_j\big(\frac tT\big)X_t'X_t \end{pmatrix}
\to_p \begin{pmatrix} \big[\int_0^\lambda\phi_j(r)dr\big]Q & O \\ O & \big[\int_\lambda^1\phi_j(r)dr\big]Q \end{pmatrix}.
\]
Hence,
\[
\frac{1}{\sqrt T}\sum_{t=1}^T \phi_j\!\Big(\frac tT\Big)\tilde X_t'\hat u_t
\to_d \begin{pmatrix} \Lambda\int_0^\lambda\phi_j(r)dW_m(r) \\ \Lambda\int_\lambda^1\phi_j(r)dW_m(r) \end{pmatrix}
- \begin{pmatrix} \big[\int_0^\lambda\phi_j(r)dr\big]Q & O \\ O & \big[\int_\lambda^1\phi_j(r)dr\big]Q \end{pmatrix}
\begin{pmatrix} Q^{-1}\Lambda\cdot\frac{W_m(\lambda)}{\lambda} \\ Q^{-1}\Lambda\cdot\frac{W_m(1)-W_m(\lambda)}{1-\lambda} \end{pmatrix}
= \begin{pmatrix} \Lambda\big\{\int_0^\lambda\phi_j(r)dW_m(r)-\frac1\lambda\int_0^\lambda\phi_j(r)dr\cdot W_m(\lambda)\big\} \\ \Lambda\big\{\int_\lambda^1\phi_j(r)dW_m(r)-\frac{1}{1-\lambda}\int_\lambda^1\phi_j(r)dr\cdot[W_m(1)-W_m(\lambda)]\big\} \end{pmatrix}
= \begin{pmatrix} \Lambda\int_0^\lambda\big[\phi_j(r)-\bar\phi_{j,1}\big]dW_m(r) \\ \Lambda\int_\lambda^1\big[\phi_j(r)-\bar\phi_{j,2}\big]dW_m(r) \end{pmatrix}.
\]

Proof of Theorem 3.1.
We have
\[
R\hat Q^{-1} \to_p (R_1, -R_1)\begin{pmatrix} \lambda^{-1}Q^{-1} & O \\ O & (1-\lambda)^{-1}Q^{-1} \end{pmatrix}
= \big(\lambda^{-1}R_1Q^{-1},\ -(1-\lambda)^{-1}R_1Q^{-1}\big),
\]
\[
\hat\Omega \to_d \frac1K\sum_{j=1}^K
\begin{pmatrix} \Lambda & O \\ O & \Lambda \end{pmatrix}
\begin{pmatrix} \int_0^\lambda\big[\phi_j(r)-\bar\phi_{j,1}\big]dW_m(r) \\ \int_\lambda^1\big[\phi_j(r)-\bar\phi_{j,2}\big]dW_m(r) \end{pmatrix}^{\otimes 2}
\begin{pmatrix} \Lambda & O \\ O & \Lambda \end{pmatrix}',
\]
where $a^{\otimes 2}:=aa'$ for a column vector $a$. Hence,
\[
R\hat Q^{-1}\hat\Omega\hat Q^{-1}R'
\to_d \big(\lambda^{-1}R_1Q^{-1},\ -(1-\lambda)^{-1}R_1Q^{-1}\big)
\begin{pmatrix} \Lambda & O \\ O & \Lambda \end{pmatrix}
\Big\{\frac1K\sum_{j=1}^K
\begin{pmatrix} \int_0^\lambda\big[\phi_j(r)-\bar\phi_{j,1}\big]dW_m(r) \\ \int_\lambda^1\big[\phi_j(r)-\bar\phi_{j,2}\big]dW_m(r) \end{pmatrix}^{\otimes 2}\Big\}
\begin{pmatrix} \Lambda & O \\ O & \Lambda \end{pmatrix}'
\begin{pmatrix} \lambda^{-1}\big(R_1Q^{-1}\big)' \\ -(1-\lambda)^{-1}\big(R_1Q^{-1}\big)' \end{pmatrix}
\]
\[
= \big(\lambda^{-1}R_1Q^{-1}\Lambda,\ -(1-\lambda)^{-1}R_1Q^{-1}\Lambda\big)
\Big\{\frac1K\sum_{j=1}^K
\begin{pmatrix} \int_0^\lambda\big[\phi_j(r)-\bar\phi_{j,1}\big]dW_m(r) \\ \int_\lambda^1\big[\phi_j(r)-\bar\phi_{j,2}\big]dW_m(r) \end{pmatrix}^{\otimes 2}\Big\}
\begin{pmatrix} \lambda^{-1}\big(R_1Q^{-1}\Lambda\big)' \\ -(1-\lambda)^{-1}\big(R_1Q^{-1}\Lambda\big)' \end{pmatrix}
= R_1Q^{-1}\Lambda\,\frac1K\sum_{j=1}^K
\Big\{\int_0^\lambda\frac{\phi_j(r)-\bar\phi_{j,1}}{\lambda}dW_m(r)
- \int_\lambda^1\frac{\phi_j(r)-\bar\phi_{j,2}}{1-\lambda}dW_m(r)\Big\}^{\otimes 2}\big(R_1Q^{-1}\Lambda\big)'.
\]
Similarly,
\[
\sqrt T\,\big[R(\hat\beta-\beta)\big]
\to_d (R_1,-R_1)\begin{pmatrix} Q^{-1}\Lambda\cdot\frac1\lambda\int_0^\lambda dW_m(r) \\ Q^{-1}\Lambda\cdot\frac{1}{1-\lambda}\int_\lambda^1 dW_m(r) \end{pmatrix}
= R_1Q^{-1}\Lambda\Big[\frac1\lambda\int_0^\lambda dW_m(r)-\frac{1}{1-\lambda}\int_\lambda^1 dW_m(r)\Big].
\]
Therefore,
\[
F_T = T\,\big[R(\hat\beta-\beta)\big]'\big[R\hat Q^{-1}\hat\Omega\hat Q^{-1}R'\big]^{-1}\big[R(\hat\beta-\beta)\big]
\to_d \Big\{R_1Q^{-1}\Lambda\Big[\frac1\lambda\int_0^\lambda dW_m(r)-\frac{1}{1-\lambda}\int_\lambda^1 dW_m(r)\Big]\Big\}'
\Big[R_1Q^{-1}\Lambda\,\frac1K\sum_{j=1}^K\Big\{\int_0^1\tilde\phi_j(r;\lambda)dW_m(r)\Big\}^{\otimes 2}\big(R_1Q^{-1}\Lambda\big)'\Big]^{-1}
\Big\{R_1Q^{-1}\Lambda\Big[\frac1\lambda\int_0^\lambda dW_m(r)-\frac{1}{1-\lambda}\int_\lambda^1 dW_m(r)\Big]\Big\}.
\]
Using the fact that $R_1Q^{-1}\Lambda W_m =_d A_pW_p$ for a square and invertible matrix $A_p$, we have
\[
F_T \to_d \Big[\frac1\lambda\int_0^\lambda dW_p(r)-\frac{1}{1-\lambda}\int_\lambda^1 dW_p(r)\Big]'
\Big[\frac1K\sum_{j=1}^K\Big\{\int_0^1\tilde\phi_j(r;\lambda)dW_p(r)\Big\}^{\otimes 2}\Big]^{-1}
\Big[\frac1\lambda\int_0^\lambda dW_p(r)-\frac{1}{1-\lambda}\int_\lambda^1 dW_p(r)\Big].
\]
The proof for the weak convergence of $t_T$ is similar and is omitted to save space.

Proof of Proposition 4.1.
We prove the part for the Wald statistic only, as the proof for the t statistic is similar. Given that $\{\tilde\phi_j(r;\lambda)\}$ are orthonormal on $L^2[0,1]$, we have
\[
\eta_j := \int_0^1 \tilde\phi_j(r;\lambda)\,dW_p(r) \sim \text{iid } N(0, I_p).
\]
As a consequence,
\[
\sum_{j=1}^K \eta_j\eta_j' \sim \mathcal W_p(I_p, K),
\]
the standard Wishart distribution with degrees of freedom K. So
\[
\frac{K-p+1}{Kp}\,\lambda(1-\lambda)\,F_\infty
= \frac{K-p+1}{Kp}\cdot\eta_0'\Big(\frac1K\sum_{j=1}^K\eta_j\eta_j'\Big)^{-1}\eta_0,
\]
where $\eta_0 := \sqrt{\lambda(1-\lambda)}\big[\frac1\lambda\int_0^\lambda dW_p(r)-\frac{1}{1-\lambda}\int_\lambda^1 dW_p(r)\big]$, and $\eta_0, \eta_1, \ldots, \eta_K$ are independent standard normal vectors. Hence $\eta_0'\big(\frac1K\sum_{j=1}^K\eta_j\eta_j'\big)^{-1}\eta_0$ follows Hotelling's $T^2$ distribution. Using the relationship between Hotelling's $T^2$ distribution and the standard F distribution, we have
\[
\frac{K-p+1}{Kp}\,\lambda(1-\lambda)\,F_\infty \sim F_{p,K-p+1}.
\]
It then follows that
\[
\frac{K-p+1}{Kp}\,\lambda(1-\lambda)\,F_T \to_d F_{p,K-p+1}.
\]

Proof of Lemma 4.1.
We have
\[
\int_0^1 \tilde\phi_{j_1}(r;\lambda)\tilde\phi_{j_2}(r;\lambda)\,dr
= \frac{1}{\lambda^2}\int_0^\lambda\Big[\phi_{j_1}(r)-\frac1\lambda\int_0^\lambda\phi_{j_1}(s)ds\Big]\Big[\phi_{j_2}(r)-\frac1\lambda\int_0^\lambda\phi_{j_2}(s)ds\Big]dr
+ \frac{1}{(1-\lambda)^2}\int_\lambda^1\Big[\phi_{j_1}(r)-\frac{1}{1-\lambda}\int_\lambda^1\phi_{j_1}(s)ds\Big]\Big[\phi_{j_2}(r)-\frac{1}{1-\lambda}\int_\lambda^1\phi_{j_2}(s)ds\Big]dr,
\]
where
\[
\frac{1}{\lambda^2}\int_0^\lambda\Big[\phi_{j_1}(r)-\frac1\lambda\int_0^\lambda\phi_{j_1}(s)ds\Big]\Big[\phi_{j_2}(r)-\frac1\lambda\int_0^\lambda\phi_{j_2}(s)ds\Big]dr
= \frac{1}{\lambda^2}\Big[\int_0^\lambda\int_0^\lambda\delta(r-s)\phi_{j_1}(r)\phi_{j_2}(s)\,dr\,ds
- \frac1\lambda\int_0^\lambda\int_0^\lambda\phi_{j_1}(r)\phi_{j_2}(s)\,dr\,ds\Big]
= \int_0^1\int_0^1\frac{\big[\delta(r-s)-\frac1\lambda\big]\,1\{(r,s)\in[0,\lambda]\times[0,\lambda]\}}{\lambda^2}\,\phi_{j_1}(r)\phi_{j_2}(s)\,dr\,ds,
\]
and similarly,
\[
\frac{1}{(1-\lambda)^2}\int_\lambda^1\Big[\phi_{j_1}(r)-\frac{1}{1-\lambda}\int_\lambda^1\phi_{j_1}(s)ds\Big]\Big[\phi_{j_2}(r)-\frac{1}{1-\lambda}\int_\lambda^1\phi_{j_2}(s)ds\Big]dr
= \int_0^1\int_0^1\frac{\big[\delta(r-s)-\frac{1}{1-\lambda}\big]\,1\{(r,s)\in[\lambda,1]\times[\lambda,1]\}}{(1-\lambda)^2}\,\phi_{j_1}(r)\phi_{j_2}(s)\,dr\,ds.
\]
Therefore,
\[
\int_0^1 \tilde\phi_{j_1}(r;\lambda)\tilde\phi_{j_2}(r;\lambda)\,dr
= \int_0^1\int_0^1 C(r,s;\lambda)\,\phi_{j_1}(r)\phi_{j_2}(s)\,dr\,ds.
\]

Proof of Theorem 5.1.

Part (a). Under Assumption 5.1, we have
\[
\hat Q_{\tilde X\cdot Z} = \frac1T\tilde X_z'\tilde X_z = \frac1T\tilde X'M_Z\tilde X
= \frac{\tilde X'\tilde X}{T} - \frac{\tilde X'Z}{T}\Big(\frac{Z'Z}{T}\Big)^{-1}\frac{Z'\tilde X}{T}
\to_p \begin{pmatrix} \lambda Q_{XX} & O \\ O & (1-\lambda)Q_{XX} \end{pmatrix}
- \begin{pmatrix} \lambda Q_{XZ} \\ (1-\lambda)Q_{XZ} \end{pmatrix}Q_{ZZ}^{-1}\big(\lambda Q_{ZX},\ (1-\lambda)Q_{ZX}\big)
:= Q_{\tilde X\cdot Z}.
\]
Under Assumption 5.2, we have
\[
\frac{1}{\sqrt T}\tilde X'M_Zu
= \frac{\tilde X'u}{\sqrt T} - \frac{\tilde X'Z}{T}\Big(\frac{Z'Z}{T}\Big)^{-1}\frac{Z'u}{\sqrt T}
\to_d \begin{pmatrix} \Lambda_XW_{m+\ell}(\lambda) \\ \Lambda_X[W_{m+\ell}(1)-W_{m+\ell}(\lambda)] \end{pmatrix}
- \begin{pmatrix} \lambda Q_{XZ} \\ (1-\lambda)Q_{XZ} \end{pmatrix}Q_{ZZ}^{-1}\Lambda_ZW_{m+\ell}(1)
= \begin{pmatrix} \Lambda_XW_{m+\ell}(\lambda)-\lambda\Lambda_{XZ}W_{m+\ell}(1) \\ \Lambda_X[W_{m+\ell}(1)-W_{m+\ell}(\lambda)]-(1-\lambda)\Lambda_{XZ}W_{m+\ell}(1) \end{pmatrix},
\]
where $\Lambda_{XZ} = Q_{XZ}Q_{ZZ}^{-1}\Lambda_Z$. Hence,
\[
R\sqrt T(\hat\beta-\beta)
\to_d RQ_{\tilde X\cdot Z}^{-1}
\begin{pmatrix} \Lambda_XW_{m+\ell}(\lambda)-\lambda\Lambda_{XZ}W_{m+\ell}(1) \\ \Lambda_X[W_{m+\ell}(1)-W_{m+\ell}(\lambda)]-(1-\lambda)\Lambda_{XZ}W_{m+\ell}(1) \end{pmatrix}.
\]
Using the matrix inverse formula $\big(A - CB^{-1}C'\big)^{-1} = A^{-1} + A^{-1}C\big(B - C'A^{-1}C\big)^{-1}C'A^{-1}$, we have
\[
Q_{\tilde X\cdot Z}^{-1}
= \begin{pmatrix} \lambda Q_{XX} & O \\ O & (1-\lambda)Q_{XX} \end{pmatrix}^{-1}
+ \begin{pmatrix} \lambda Q_{XX} & O \\ O & (1-\lambda)Q_{XX} \end{pmatrix}^{-1}\begin{pmatrix} \lambda Q_{XZ} \\ (1-\lambda)Q_{XZ} \end{pmatrix}
\Big[Q_{ZZ} - \begin{pmatrix} \lambda Q_{XZ} \\ (1-\lambda)Q_{XZ} \end{pmatrix}'\begin{pmatrix} \lambda Q_{XX} & O \\ O & (1-\lambda)Q_{XX} \end{pmatrix}^{-1}\begin{pmatrix} \lambda Q_{XZ} \\ (1-\lambda)Q_{XZ} \end{pmatrix}\Big]^{-1}
\begin{pmatrix} \lambda Q_{XZ} \\ (1-\lambda)Q_{XZ} \end{pmatrix}'\begin{pmatrix} \lambda Q_{XX} & O \\ O & (1-\lambda)Q_{XX} \end{pmatrix}^{-1}
= \begin{pmatrix} \lambda^{-1}Q_{XX}^{-1} & O \\ O & (1-\lambda)^{-1}Q_{XX}^{-1} \end{pmatrix}
+ \begin{pmatrix} Q_0 & Q_0 \\ Q_0 & Q_0 \end{pmatrix},
\]
where $Q_0 = Q_{XX}^{-1}Q_{XZ}Q_{Z\cdot X}^{-1}Q_{ZX}Q_{XX}^{-1}$ for $Q_{Z\cdot X} = Q_{ZZ} - Q_{ZX}Q_{XX}^{-1}Q_{XZ}$. Since the $Q_0$ blocks cancel when pre-multiplied by $(R_1,-R_1)$,
\[
RQ_{\tilde X\cdot Z}^{-1}
= (R_1,-R_1)\begin{pmatrix} \lambda^{-1}Q_{XX}^{-1} & O \\ O & (1-\lambda)^{-1}Q_{XX}^{-1} \end{pmatrix}
= \big[\lambda^{-1}R_1Q_{XX}^{-1},\ -(1-\lambda)^{-1}R_1Q_{XX}^{-1}\big] \quad (12)
\]
and
\[
RQ_{\tilde X\cdot Z}^{-1}
\begin{pmatrix} \Lambda_XW_{m+\ell}(\lambda)-\lambda\Lambda_{XZ}W_{m+\ell}(1) \\ \Lambda_X[W_{m+\ell}(1)-W_{m+\ell}(\lambda)]-(1-\lambda)\Lambda_{XZ}W_{m+\ell}(1) \end{pmatrix}
= R_1Q_{XX}^{-1}\Big(\Lambda_X\frac{W_{m+\ell}(\lambda)}{\lambda}-\Lambda_{XZ}W_{m+\ell}(1)\Big)
- R_1Q_{XX}^{-1}\Big(\Lambda_X\frac{W_{m+\ell}(1)-W_{m+\ell}(\lambda)}{1-\lambda}-\Lambda_{XZ}W_{m+\ell}(1)\Big)
= R_1Q_{XX}^{-1}\Lambda_X\Big(\frac{W_{m+\ell}(\lambda)}{\lambda}-\frac{W_{m+\ell}(1)-W_{m+\ell}(\lambda)}{1-\lambda}\Big).
\]
Hence
\[
R\sqrt T(\hat\beta-\beta)
\to_d R_1Q_{XX}^{-1}\Lambda_X\Big(\frac{W_{m+\ell}(\lambda)}{\lambda}-\frac{W_{m+\ell}(1)-W_{m+\ell}(\lambda)}{1-\lambda}\Big)
= R_1Q_{XX}^{-1}\Lambda_X\Big[\frac1\lambda\int_0^\lambda dW_{m+\ell}(r)-\frac{1}{1-\lambda}\int_\lambda^1 dW_{m+\ell}(r)\Big].
\]

Part (b). We have
\[
R\hat Q_{\tilde X\cdot Z}^{-1}\frac{1}{\sqrt T}\sum_{t=1}^T\phi_j\Big(\frac tT\Big)\tilde X_{z,t}'\hat u_t
= R\hat Q_{\tilde X\cdot Z}^{-1}\frac{1}{\sqrt T}\sum_{t=1}^T\phi_j\Big(\frac tT\Big)\tilde X_{z,t}'\Big(u_t-\tilde X_{z,t}(\hat\beta-\beta)-Z_t\big(Z'Z\big)^{-1}Z'u\Big)
= R\hat Q_{\tilde X\cdot Z}^{-1}\frac{1}{\sqrt T}\sum_{t=1}^T\phi_j\Big(\frac tT\Big)\tilde X_{z,t}'u_t^*
- R\hat Q_{\tilde X\cdot Z}^{-1}\Big[\frac1T\sum_{t=1}^T\phi_j\Big(\frac tT\Big)\tilde X_{z,t}'\tilde X_{z,t}\Big]\sqrt T(\hat\beta-\beta),
\]
where $u_t^* = u_t - Z_t(Z'Z)^{-1}Z'u$.
To find the limit of the first term in the above equation, we note that
\[
\frac{1}{\sqrt T}\sum_{t=1}^T\phi_j\Big(\frac tT\Big)\tilde X_{z,t}'u_t^*
= \frac{1}{\sqrt T}\sum_{t=1}^T\phi_j\Big(\frac tT\Big)\Big(\tilde X_t - Z_t\big(Z'Z\big)^{-1}Z'\tilde X\Big)'\Big(u_t - Z_t\big(Z'Z\big)^{-1}Z'u\Big)
= \frac{1}{\sqrt T}\sum_{t=1}^T\phi_j\Big(\frac tT\Big)\tilde X_t'u_t
- \frac{1}{\sqrt T}\sum_{t=1}^T\phi_j\Big(\frac tT\Big)\tilde X_t'Z_t\big(Z'Z\big)^{-1}Z'u
- \frac{1}{\sqrt T}\sum_{t=1}^T\phi_j\Big(\frac tT\Big)\tilde X'Z\big(Z'Z\big)^{-1}Z_t'u_t
+ \frac{1}{\sqrt T}\sum_{t=1}^T\phi_j\Big(\frac tT\Big)\tilde X'Z\big(Z'Z\big)^{-1}Z_t'Z_t\big(Z'Z\big)^{-1}Z'u
\]
\[
\to_d \begin{pmatrix} \Lambda_X\int_0^\lambda\phi_j(r)dW_{m+\ell}(r) \\ \Lambda_X\int_\lambda^1\phi_j(r)dW_{m+\ell}(r) \end{pmatrix}
- \begin{pmatrix} \int_0^\lambda\phi_j(r)dr\cdot\Lambda_{XZ}W_{m+\ell}(1) \\ \int_\lambda^1\phi_j(r)dr\cdot\Lambda_{XZ}W_{m+\ell}(1) \end{pmatrix}
- \begin{pmatrix} \lambda\cdot\Lambda_{XZ}\int_0^1\phi_j(r)dW_{m+\ell}(r) \\ (1-\lambda)\cdot\Lambda_{XZ}\int_0^1\phi_j(r)dW_{m+\ell}(r) \end{pmatrix}
+ \begin{pmatrix} \lambda\bar\phi_{j,0}\cdot\Lambda_{XZ}W_{m+\ell}(1) \\ (1-\lambda)\bar\phi_{j,0}\cdot\Lambda_{XZ}W_{m+\ell}(1) \end{pmatrix},
\]
where $\bar\phi_{j,0} = \int_0^1\phi_j(r)dr$. Therefore,
\[
R\hat Q_{\tilde X\cdot Z}^{-1}\frac{1}{\sqrt T}\sum_{t=1}^T\phi_j\Big(\frac tT\Big)\tilde X_{z,t}'u_t^*
\to_d R_1Q_{XX}^{-1}\Lambda_X\Big[\lambda^{-1}\int_0^\lambda\phi_j(r)dW_{m+\ell}(r)-(1-\lambda)^{-1}\int_\lambda^1\phi_j(r)dW_{m+\ell}(r)\Big]
- R_1Q_{XX}^{-1}\Lambda_{XZ}\big(\bar\phi_{j,1}-\bar\phi_{j,2}\big)W_{m+\ell}(1), \quad (13)
\]
where we have used $R\hat Q_{\tilde X\cdot Z}^{-1}\to_p RQ_{\tilde X\cdot Z}^{-1} = \big[\lambda^{-1}R_1Q_{XX}^{-1},\ -(1-\lambda)^{-1}R_1Q_{XX}^{-1}\big]$; see (12).

Next,
\[
\frac1T\sum_{t=1}^T\phi_j\Big(\frac tT\Big)\tilde X_{z,t}'\tilde X_{z,t}
= \frac1T\sum_{t=1}^T\phi_j\Big(\frac tT\Big)\tilde X_t'\tilde X_t
- \frac1T\sum_{t=1}^T\phi_j\Big(\frac tT\Big)\tilde X_t'Z_t\big(Z'Z\big)^{-1}Z'\tilde X
- \frac1T\sum_{t=1}^T\phi_j\Big(\frac tT\Big)\tilde X'Z\big(Z'Z\big)^{-1}Z_t'\tilde X_t
+ \frac1T\sum_{t=1}^T\phi_j\Big(\frac tT\Big)\tilde X'Z\big(Z'Z\big)^{-1}Z_t'Z_t\big(Z'Z\big)^{-1}Z'\tilde X
\]
\[
\to_p \begin{pmatrix} \big[\int_0^\lambda\phi_j(r)dr\big]Q_{XX} & O \\ O & \big[\int_\lambda^1\phi_j(r)dr\big]Q_{XX} \end{pmatrix}
- \begin{pmatrix} \big[\int_0^\lambda\phi_j(r)dr\big]Q_{XZ}Q_{ZZ}^{-1} \\ \big[\int_\lambda^1\phi_j(r)dr\big]Q_{XZ}Q_{ZZ}^{-1} \end{pmatrix}\big[\lambda Q_{ZX},\ (1-\lambda)Q_{ZX}\big]
- \Big\{\begin{pmatrix} \big[\int_0^\lambda\phi_j(r)dr\big]Q_{XZ}Q_{ZZ}^{-1} \\ \big[\int_\lambda^1\phi_j(r)dr\big]Q_{XZ}Q_{ZZ}^{-1} \end{pmatrix}\big[\lambda Q_{ZX},\ (1-\lambda)Q_{ZX}\big]\Big\}'
+ \bar\phi_{j,0}\begin{pmatrix} \lambda Q_{XZ} \\ (1-\lambda)Q_{XZ} \end{pmatrix}Q_{ZZ}^{-1}\begin{pmatrix} \lambda Q_{XZ} \\ (1-\lambda)Q_{XZ} \end{pmatrix}'. \quad (14)
\]
So,
\[
R\hat Q_{\tilde X\cdot Z}^{-1}\Big[\frac1T\sum_{t=1}^T\phi_j\Big(\frac tT\Big)\tilde X_{z,t}'\tilde X_{z,t}\Big]
\to_p \big[\lambda^{-1}R_1Q_{XX}^{-1},\ -(1-\lambda)^{-1}R_1Q_{XX}^{-1}\big]
\begin{pmatrix} \big[\int_0^\lambda\phi_j(r)dr\big]Q_{XX} & O \\ O & \big[\int_\lambda^1\phi_j(r)dr\big]Q_{XX} \end{pmatrix}
- \big[\lambda^{-1}R_1Q_{XX}^{-1},\ -(1-\lambda)^{-1}R_1Q_{XX}^{-1}\big]
\begin{pmatrix} \big[\int_0^\lambda\phi_j(r)dr\big]Q_{XZ}Q_{ZZ}^{-1} \\ \big[\int_\lambda^1\phi_j(r)dr\big]Q_{XZ}Q_{ZZ}^{-1} \end{pmatrix}\big[\lambda Q_{ZX},\ (1-\lambda)Q_{ZX}\big]
= \big[\bar\phi_{j,1}R_1,\ -\bar\phi_{j,2}R_1\big]
- \big(\bar\phi_{j,1}-\bar\phi_{j,2}\big)R_1Q_{XX}^{-1}Q_{XZ}Q_{ZZ}^{-1}\big[\lambda Q_{ZX},\ (1-\lambda)Q_{ZX}\big],
\]
where we have used the fact that the last two terms in (14), pre-multiplied by $\big[\lambda^{-1}R_1Q_{XX}^{-1},\ -(1-\lambda)^{-1}R_1Q_{XX}^{-1}\big]$, are equal to zero. It then follows that
\[
R\hat Q_{\tilde X\cdot Z}^{-1}\Big[\frac1T\sum_{t=1}^T\phi_j\Big(\frac tT\Big)\tilde X_{z,t}'\tilde X_{z,t}\Big]\hat Q_{\tilde X\cdot Z}^{-1}
\to_p \big[\bar\phi_{j,1}R_1,\ -\bar\phi_{j,2}R_1\big]
\Big\{\begin{pmatrix} \lambda^{-1}Q_{XX}^{-1} & O \\ O & (1-\lambda)^{-1}Q_{XX}^{-1} \end{pmatrix}
+ \begin{pmatrix} Q_0 & Q_0 \\ Q_0 & Q_0 \end{pmatrix}\Big\}
- \big(\bar\phi_{j,1}-\bar\phi_{j,2}\big)R_1Q_{XX}^{-1}Q_{XZ}Q_{ZZ}^{-1}\big[\lambda Q_{ZX},\ (1-\lambda)Q_{ZX}\big]
\Big\{\begin{pmatrix} \lambda^{-1}Q_{XX}^{-1} & O \\ O & (1-\lambda)^{-1}Q_{XX}^{-1} \end{pmatrix}
+ \begin{pmatrix} Q_0 & Q_0 \\ Q_0 & Q_0 \end{pmatrix}\Big\}
\]
\[
= \big[\lambda^{-1}\bar\phi_{j,1}R_1Q_{XX}^{-1},\ -(1-\lambda)^{-1}\bar\phi_{j,2}R_1Q_{XX}^{-1}\big]
- \big(\bar\phi_{j,1}-\bar\phi_{j,2}\big)R_1Q_{XX}^{-1}Q_{XZ}Q_{ZZ}^{-1}\big[Q_{ZX}Q_{XX}^{-1},\ Q_{ZX}Q_{XX}^{-1}\big]
- \big(\bar\phi_{j,1}-\bar\phi_{j,2}\big)R_1\big[Q_{XX}^{-1}Q_{XZ}Q_{ZZ}^{-1}Q_{ZX}-I_p\big]\big[Q_0,\ Q_0\big].
\]
As a consequence,
\[
R\hat Q_{\tilde X\cdot Z}^{-1}\Big[\frac1T\sum_{t=1}^T\phi_j\Big(\frac tT\Big)\tilde X_{z,t}'\tilde X_{z,t}\Big]\sqrt T(\hat\beta-\beta)
\to_d I_1 + I_2 + I_3, \quad (15)
\]
where
\[
I_1 = \big[\lambda^{-1}\bar\phi_{j,1}R_1Q_{XX}^{-1},\ -(1-\lambda)^{-1}\bar\phi_{j,2}R_1Q_{XX}^{-1}\big]
\begin{pmatrix} \Lambda_XW_{m+\ell}(\lambda)-\lambda\Lambda_{XZ}W_{m+\ell}(1) \\ \Lambda_X[W_{m+\ell}(1)-W_{m+\ell}(\lambda)]-(1-\lambda)\Lambda_{XZ}W_{m+\ell}(1) \end{pmatrix}
\]
\[
= \lambda^{-1}\bar\phi_{j,1}R_1Q_{XX}^{-1}\Lambda_XW_{m+\ell}(\lambda)
- \bar\phi_{j,1}R_1Q_{XX}^{-1}\Lambda_{XZ}W_{m+\ell}(1)
- (1-\lambda)^{-1}\bar\phi_{j,2}R_1Q_{XX}^{-1}\Lambda_X\big[W_{m+\ell}(1)-W_{m+\ell}(\lambda)\big]
+ \bar\phi_{j,2}R_1Q_{XX}^{-1}\Lambda_{XZ}W_{m+\ell}(1)
\]
\[
= \big(\lambda^{-1}\bar\phi_{j,1}+(1-\lambda)^{-1}\bar\phi_{j,2}\big)R_1Q_{XX}^{-1}\Lambda_XW_{m+\ell}(\lambda)
- \big[\big(\bar\phi_{j,1}-\bar\phi_{j,2}\big)R_1Q_{XX}^{-1}\Lambda_{XZ}
+ (1-\lambda)^{-1}\bar\phi_{j,2}R_1Q_{XX}^{-1}\Lambda_X\big]W_{m+\ell}(1)
\]
\[
= \big(\lambda^{-1}\bar\phi_{j,1}+(1-\lambda)^{-1}\bar\phi_{j,2}\big)R_1Q_{XX}^{-1}\Lambda_XW_{m+\ell}(\lambda)
+ \big(\bar\phi_{j,1}-\bar\phi_{j,2}\big)R_1Q_{XX}^{-1}\big(\Lambda_X-\Lambda_{XZ}\big)W_{m+\ell}(1)
- \lambda\big(\lambda^{-1}\bar\phi_{j,1}+(1-\lambda)^{-1}\bar\phi_{j,2}\big)R_1Q_{XX}^{-1}\Lambda_XW_{m+\ell}(1),
\]
\[
I_2 = -\big(\bar\phi_{j,1}-\bar\phi_{j,2}\big)R_1Q_{XX}^{-1}Q_{XZ}Q_{ZZ}^{-1}\big[Q_{ZX}Q_{XX}^{-1},\ Q_{ZX}Q_{XX}^{-1}\big]
\begin{pmatrix} \Lambda_XW_{m+\ell}(\lambda)-\lambda\Lambda_{XZ}W_{m+\ell}(1) \\ \Lambda_X[W_{m+\ell}(1)-W_{m+\ell}(\lambda)]-(1-\lambda)\Lambda_{XZ}W_{m+\ell}(1) \end{pmatrix}
= -\big(\bar\phi_{j,1}-\bar\phi_{j,2}\big)R_1Q_{XX}^{-1}Q_{XZ}Q_{ZZ}^{-1}Q_{ZX}Q_{XX}^{-1}\big[\Lambda_X-Q_{XZ}Q_{ZZ}^{-1}\Lambda_Z\big]W_{m+\ell}(1),
\]
\[
I_3 = -\big(\bar\phi_{j,1}-\bar\phi_{j,2}\big)R_1\big[Q_{XX}^{-1}Q_{XZ}Q_{ZZ}^{-1}Q_{ZX}-I_p\big]\big[Q_0,\ Q_0\big]
\begin{pmatrix} \Lambda_XW_{m+\ell}(\lambda)-\lambda\Lambda_{XZ}W_{m+\ell}(1) \\ \Lambda_X[W_{m+\ell}(1)-W_{m+\ell}(\lambda)]-(1-\lambda)\Lambda_{XZ}W_{m+\ell}(1) \end{pmatrix}
= -\big(\bar\phi_{j,1}-\bar\phi_{j,2}\big)R_1\big[Q_{XX}^{-1}Q_{XZ}Q_{ZZ}^{-1}Q_{ZX}-I_p\big]Q_0\big[\Lambda_X-Q_{XZ}Q_{ZZ}^{-1}\Lambda_Z\big]W_{m+\ell}(1).
\]
Plugging the above three terms $I_1$, $I_2$, and $I_3$ back into (15), we obtain
\[
R\hat Q_{\tilde X\cdot Z}^{-1}\Big[\frac1T\sum_{t=1}^T\phi_j\Big(\frac tT\Big)\tilde X_{z,t}'\tilde X_{z,t}\Big]\sqrt T\big(\hat\beta-\beta\big)
\to_d \big(\lambda^{-1}\bar\phi_{j,1}+(1-\lambda)^{-1}\bar\phi_{j,2}\big)R_1Q_{XX}^{-1}\Lambda_XW_{m+\ell}(\lambda)
- \lambda\big(\lambda^{-1}\bar\phi_{j,1}+(1-\lambda)^{-1}\bar\phi_{j,2}\big)R_1Q_{XX}^{-1}\Lambda_XW_{m+\ell}(1)
+ \big(\bar\phi_{j,1}-\bar\phi_{j,2}\big)R_1Q_{XX}^{-1}\big[I_p - Q_{XZ}Q_{ZZ}^{-1}Q_{ZX}Q_{XX}^{-1} - \big(Q_{XZ}Q_{ZZ}^{-1}Q_{ZX}-Q_{XX}\big)Q_0\big]\big(\Lambda_X-Q_{XZ}Q_{ZZ}^{-1}\Lambda_Z\big)W_{m+\ell}(1)
\]
\[
= \big(\lambda^{-1}\bar\phi_{j,1}+(1-\lambda)^{-1}\bar\phi_{j,2}\big)R_1Q_{XX}^{-1}\Lambda_X\big[W_{m+\ell}(\lambda)-\lambda W_{m+\ell}(1)\big]
+ \big(\bar\phi_{j,1}-\bar\phi_{j,2}\big)R_1Q_{XX}^{-1}\big(\Lambda_X-Q_{XZ}Q_{ZZ}^{-1}\Lambda_Z\big)W_{m+\ell}(1), \quad (16)
\]
where the last equality holds because
\[
Q_{XX}^{-1}\big[I_p - Q_{XZ}Q_{ZZ}^{-1}Q_{ZX}Q_{XX}^{-1} - \big(Q_{XZ}Q_{ZZ}^{-1}Q_{ZX}-Q_{XX}\big)Q_0\big] = Q_{XX}^{-1}.
\]
Indeed, using $Q_0 = Q_{XX}^{-1}Q_{XZ}Q_{Z\cdot X}^{-1}Q_{ZX}Q_{XX}^{-1}$,
\[
\big(Q_{XZ}Q_{ZZ}^{-1}Q_{ZX}-Q_{XX}\big)Q_0
= Q_{XZ}\big[Q_{ZZ}^{-1}Q_{ZX}Q_{XX}^{-1}Q_{XZ}-I_\ell\big]Q_{Z\cdot X}^{-1}Q_{ZX}Q_{XX}^{-1}
= Q_{XZ}Q_{ZZ}^{-1}\big[Q_{ZX}Q_{XX}^{-1}Q_{XZ}-Q_{ZZ}\big]Q_{Z\cdot X}^{-1}Q_{ZX}Q_{XX}^{-1}
= -Q_{XZ}Q_{ZZ}^{-1}Q_{Z\cdot X}Q_{Z\cdot X}^{-1}Q_{ZX}Q_{XX}^{-1}
= -Q_{XZ}Q_{ZZ}^{-1}Q_{ZX}Q_{XX}^{-1},
\]
so the second and third terms in the bracket cancel and the bracket reduces to $I_p$.
Combining (13) and (16), we obtain
\[
R\hat Q_{\tilde X\cdot Z}^{-1}\frac{1}{\sqrt T}\sum_{t=1}^T\phi_j\Big(\frac tT\Big)\tilde X_{z,t}'\hat u_t
\to_d R_1Q_{XX}^{-1}\Lambda_X\Big[\lambda^{-1}\int_0^\lambda\phi_j(r)dW_{m+\ell}(r)-(1-\lambda)^{-1}\int_\lambda^1\phi_j(r)dW_{m+\ell}(r)\Big]
- R_1Q_{XX}^{-1}\Lambda_X\big[\lambda^{-1}\bar\phi_{j,1}+(1-\lambda)^{-1}\bar\phi_{j,2}\big]W_{m+\ell}(\lambda)
+ R_1Q_{XX}^{-1}\Lambda_X\big[\bar\phi_{j,1}+\lambda(1-\lambda)^{-1}\bar\phi_{j,2}-\big(\bar\phi_{j,1}-\bar\phi_{j,2}\big)\big]W_{m+\ell}(1)
\]
\[
= R_1Q_{XX}^{-1}\Lambda_X\Big[\lambda^{-1}\int_0^\lambda\phi_j(r)dW_{m+\ell}(r)-(1-\lambda)^{-1}\int_\lambda^1\phi_j(r)dW_{m+\ell}(r)\Big]
- R_1Q_{XX}^{-1}\Lambda_X\big[\lambda^{-1}\bar\phi_{j,1}+(1-\lambda)^{-1}\bar\phi_{j,2}\big]W_{m+\ell}(\lambda)
+ R_1Q_{XX}^{-1}\Lambda_X\big[(1-\lambda)^{-1}\bar\phi_{j,2}\big]W_{m+\ell}(1)
\]
\[
= R_1Q_{XX}^{-1}\Lambda_X\Big[\lambda^{-1}\int_0^\lambda\big(\phi_j(r)-\bar\phi_{j,1}\big)dW_{m+\ell}(r)
- (1-\lambda)^{-1}\int_\lambda^1\big(\phi_j(r)-\bar\phi_{j,2}\big)dW_{m+\ell}(r)\Big]
= R_1Q_{XX}^{-1}\Lambda_X\int_0^1\tilde\phi_j(r;\lambda)\,dW_{m+\ell}(r).
\]

Part (c). We prove the case for $F_T$ only, as the proof for $t_T$ is similar. Using Parts (a) and (b), we have
\[
F_T = T\,\big(R\hat\beta\big)'\big[R\hat Q_{\tilde X\cdot Z}^{-1}\hat\Omega\hat Q_{\tilde X\cdot Z}^{-1}R'\big]^{-1}R\hat\beta
\to_d \Big[R_1Q_{XX}^{-1}\Lambda_X\Big(\frac1\lambda\int_0^\lambda dW_{m+\ell}(r)-\frac{1}{1-\lambda}\int_\lambda^1 dW_{m+\ell}(r)\Big)\Big]'
\Big[\frac1K\sum_{j=1}^K\Big(R_1Q_{XX}^{-1}\Lambda_X\int_0^1\tilde\phi_j(r;\lambda)dW_{m+\ell}(r)\Big)^{\otimes 2}\Big]^{-1}
\Big[R_1Q_{XX}^{-1}\Lambda_X\Big(\frac1\lambda\int_0^\lambda dW_{m+\ell}(r)-\frac{1}{1-\lambda}\int_\lambda^1 dW_{m+\ell}(r)\Big)\Big].
\]
Note that $R_1Q_{XX}^{-1}\Lambda_XW_{m+\ell}(\lambda) =_d A_pW_p(\lambda)$ for a $p\times p$ invertible matrix $A_p$ such that $A_pA_p' = R_1Q_{XX}^{-1}\Lambda_X\Lambda_X'Q_{XX}^{-1}R_1'$. Using this distributional equivalence, we have
\[
F_T \to_d \Big[A_p\Big(\frac1\lambda\int_0^\lambda dW_p(r)-\frac{1}{1-\lambda}\int_\lambda^1 dW_p(r)\Big)\Big]'
\Big[\frac1K\sum_{j=1}^K\Big(A_p\int_0^1\tilde\phi_j(r;\lambda)dW_p(r)\Big)^{\otimes 2}\Big]^{-1}
\Big[A_p\Big(\frac1\lambda\int_0^\lambda dW_p(r)-\frac{1}{1-\lambda}\int_\lambda^1 dW_p(r)\Big)\Big]
= \Big(\frac1\lambda\int_0^\lambda dW_p(r)-\frac{1}{1-\lambda}\int_\lambda^1 dW_p(r)\Big)'
\Big[\frac1K\sum_{j=1}^K\Big(\int_0^1\tilde\phi_j(r;\lambda)dW_p(r)\Big)^{\otimes 2}\Big]^{-1}
\Big(\frac1\lambda\int_0^\lambda dW_p(r)-\frac{1}{1-\lambda}\int_\lambda^1 dW_p(r)\Big),
\]
as desired.

References
Andrews, D. W. K. (1991). Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica, 59:817–858.

Cho, C.-K. and Vogelsang, T. J. (2017). Fixed-b inference for testing structural change in a time series regression. Econometrics, 5(1).

Chow, G. C. (1960). Tests of equality between sets of coefficients in two linear regressions. Econometrica, 28(3):591–605.

Giles, D. and Scott, M. (1992). Some consequences of using the Chow test in the context of autocorrelated disturbances. Economics Letters, 38(2):145–150.

Hwang, J. and Sun, Y. (2017). Asymptotic F and t tests in an efficient GMM setting. Journal of Econometrics, 198:277–295.

Jansson, M. (2004). On the error of rejection probability in simple autocorrelation robust tests. Econometrica, 72:937–946.

Kiefer, N. M. and Vogelsang, T. J. (2002a). Heteroskedasticity-autocorrelation robust testing using bandwidth equal to sample size. Econometric Theory, 18:1350–1366.

Kiefer, N. M. and Vogelsang, T. J. (2002b). Heteroskedasticity-autocorrelation robust standard errors using the Bartlett kernel without truncation. Econometrica, 70:2093–2095.

Kiefer, N. M. and Vogelsang, T. J. (2005). A new asymptotic theory for heteroskedasticity-autocorrelation robust tests. Econometric Theory, 21:1130–1164.

Krämer, W. (1989). The robustness of the Chow test to autocorrelation among disturbances. In Statistical Analysis and Forecasting of Economic Structural Change, pages 45–52. Springer.

Lazarus, E., Lewis, D. J., Stock, J. H., and Watson, M. W. (2018). HAR inference: Recommendations for practice. Journal of Business & Economic Statistics, 36(4):541–559.

Liu, C. and Sun, Y. (2019). A simple and trustworthy asymptotic t test in difference-in-differences regressions. Journal of Econometrics, 210(2):327–362.

Martínez-Iriarte, J., Sun, Y., and Wang, X. (2019). Asymptotic F tests under possibly weak identification. Working Paper, Department of Economics, UC San Diego.

Newey, W. K. and West, K. D. (1987). A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica, 55(3):703–708.

Phillips, P. C. B. (2005). HAC estimation by automated regression. Econometric Theory, 21(1):116–142.

Sun, Y. (2011). Robust trend inference with series variance estimator and testing-optimal smoothing parameter. Journal of Econometrics, 164:345–366.

Sun, Y. (2013). A heteroskedasticity and autocorrelation robust F test using orthonormal series variance estimator. Econometrics Journal, 16:1–26.

Sun, Y. and Kim, M. S. (2012). Simple and powerful GMM over-identification tests with accurate size. Journal of Econometrics, 166:267–281.

Sun, Y., Phillips, P. C. B., and Jin, S. (2008). Optimal bandwidth selection in heteroskedasticity-autocorrelation robust testing. Econometrica, 76(1):175–194.