The effect of a Durbin-Watson pretest on confidence intervals in regression
Paul Kabaila*, Samer Alhelli, Davide Farchione and Nathan Bragg

Department of Mathematics and Statistics, La Trobe University, Australia
Abstract
Consider a linear regression model and suppose that our aim is to find a confidence interval for a specified linear combination of the regression parameters. In practice, it is common to perform a Durbin-Watson pretest of the null hypothesis of zero first-order autocorrelation of the random errors against the alternative hypothesis of positive first-order autocorrelation. If this null hypothesis is accepted then the confidence interval centred on the Ordinary Least Squares estimator is used; otherwise the confidence interval centred on the Feasible Generalized Least Squares estimator is used. We provide new tools for the computation, for any given design matrix and parameter of interest, of graphs of the coverage probability functions of the confidence interval resulting from this two-stage procedure and the confidence interval that is always centred on the Feasible Generalized Least Squares estimator. These graphs are used to choose the better confidence interval, prior to any examination of the observed response vector.
Keywords and Phrases: autocorrelated errors, coverage probability, feasible generalized least squares, linear regression model, restricted maximum likelihood.

* Corresponding author.
E-mail address: [email protected]

1 Introduction
Consider a linear regression model where the parameter of interest θ is a specified linear combination of the regression parameters. Suppose that our aim is to find a confidence interval for θ. Commonly in econometrics, for example when the responses are measured over time, the random errors in the regression model may be autocorrelated. In the absence of autocorrelation the usual confidence interval, centred on the Ordinary Least Squares (OLS) estimator and based on the assumption of independent errors, should be used. We call this the OLS confidence interval. Of course, in the presence of autocorrelation, this interval is no longer valid. In this case, it is common to estimate the first order autocorrelation ψ, assuming that the random errors are a first order autoregressive process, and then to substitute this estimate into the expression for the confidence interval found using generalized least squares. We call this a feasible generalized least squares (FGLS) confidence interval. The fact that the OLS confidence interval is preferable to the FGLS confidence interval when ψ = 0 has led to the proposal of the following two-stage procedure. We carry out a Durbin-Watson, or similar, pretest of the null hypothesis that ψ = 0 against the alternative hypothesis that ψ >
0. If this null hypothesis is accepted then we use the OLS confidence interval; otherwise we use an FGLS confidence interval. We call this the two-stage confidence interval. This confidence interval has been proposed by Wooldridge (2016, p.381), Kennedy (2008, p.119), Anselin (2006, pp.931–2), Verbeek (2004, p.101), Berthouex and Brown (2002, pp.368–9), Salvatore and Reagle (2002, p.208), Giles and Giles (1993), Pokorny (1987, pp.202–7), Folmer (1988), Griffiths and Beesley (1984), Katz (1982, pp.122–5) and Karmel and Polasek (1977, p.355).

A problem with the two-stage procedure is that the pretest may incorrectly accept or reject the null hypothesis, leading to a degradation in the coverage performance of the two-stage confidence interval. An alternative to the two-stage confidence interval is to always use an FGLS confidence interval. There are good reasons for constructing the FGLS confidence interval using the restricted maximum likelihood (REML) estimator of ψ; see Cheang and Reinsel (2000). We will therefore construct the FGLS confidence interval using the REML estimator of ψ.

Our aim is to compare the coverage probabilities of the two-stage and FGLS confidence intervals. It would be nice if one could make some general statement, such as "the FGLS confidence interval always has better coverage properties than the two-stage confidence interval". Our finding, however, is that this comparison depends crucially on the design matrix for the linear regression under consideration. We have therefore chosen to provide the tools for the comparison of the coverage probabilities for any given design matrix and parameter of interest. This comparison must, of course, be carried out prior to any examination of the observed response vector.
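The two-stage procedure described above can be sketched in code. The following Python fragment is illustrative only (the paper's own computations use R): the function name is ours, and the OLS and FGLS intervals, together with the pretest critical value, are assumed to be supplied by the caller.

```python
import numpy as np

def two_stage_interval(y, X, c_alpha, j_ols, j_fgls):
    """Illustrative sketch of the two-stage rule (not the authors' R code).

    j_ols and j_fgls are the precomputed OLS and FGLS confidence intervals,
    and c_alpha is the critical value of the Durbin-Watson pretest.
    """
    # OLS residuals r = y - X * beta_ols
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta_ols
    # Durbin-Watson statistic: sum of squared successive differences of the
    # residuals, divided by the residual sum of squares
    d_hat = np.sum(np.diff(r) ** 2) / np.sum(r ** 2)
    # Accept the null hypothesis of zero first-order autocorrelation when
    # d_hat exceeds the critical value, and then use the OLS interval;
    # otherwise use the FGLS interval.
    return j_ols if d_hat > c_alpha else j_fgls
```

Passing the two candidate intervals in precomputed keeps the decision rule itself transparent: the data enter the choice only through the Durbin-Watson statistic.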
In summary, the decision as to whether one uses the two-stage or FGLS confidence interval is made on a case-by-case basis, depending on the design matrix and parameter of interest at hand.

For simplicity, we have assumed that the random errors are an AR(1) process. As noted in the Remarks section, our methodology is easily extended to the case that the random errors are an ARMA(ℓ₁, ℓ₂) process for any given ℓ₁ and ℓ₂ (ℓ₁ + ℓ₂ > 0). In Remark 8.5 we describe an alternative framework for the construction of a confidence interval for θ that accounts explicitly for possible misspecification of the model for the autocorrelations of the random errors. This alternative framework includes the heteroskedasticity and autocorrelation consistent (HAC) estimator of the correct covariance matrix of the OLS estimator.

The main result of Section 2 is that the coverage probability of the FGLS confidence interval does not depend on either the regression parameters or the variance of the random error. Consequently, for given design matrix, parameter of interest and nominal coverage, the coverage probability of the FGLS confidence interval is a function of ψ. The main result of Section 3 is that the coverage probability of the two-stage confidence interval does not depend on either the regression parameters or the variance of the random error. In fact, for given design matrix, parameter of interest, nominal coverage and level of the Durbin-Watson pretest, the coverage probability of the two-stage confidence interval is a function of ψ. This makes it easy to compare the coverage probabilities of the two-stage and FGLS intervals for any given design matrix, parameter of interest, nominal coverage and level of the Durbin-Watson pretest. We estimate these coverage probabilities using the variance reduction methods described in Section 4 and the Supporting Information, so that this comparison can be quickly carried out.

Figure 1 presents graphs of the coverage probability functions of the FGLS and two-stage confidence intervals, each with nominal coverage 0.95, for two real life data examples.
The coverage probability for each given value of ψ on a grid of equally spaced values in [0, 1) was estimated using 50,000 simulation runs that employ the variance reduction methods described in Section 4 and the Supporting Information. The vertical bars in Figure 1 are approximate 95% confidence intervals for the coverage probabilities estimated for each value of ψ. The computations for this paper were carried out using R programs and packages.

The top panel is for the chicken demand example considered on p.333 of Studenmund (2006) and based on the data in Table 6.2 on p.189. In this example, the response is Y_t, the per capita chicken consumption (in pounds) in year t, and the model is

Y_t = β₀ + β₁ PC_t + β₂ PB_t + β₃ YD_t + e_t,

where PC_t is the price of chicken (in cents per pound), PB_t is the price of beef (in cents per pound), YD_t is the U.S. per capita disposable income (in hundreds of dollars) and e_t denotes the random error in year t. For the top panel of Figure 1, the parameter of interest is a specified one of these regression coefficients.

The bottom panel is for the defense spending example considered on p.342 of Studenmund (2006) and based on the data in Table 9.1 on pp.343–344. In this example, the linear regression model is

log(SDH_t) = β₀ + β₁ log(USD_t) + β₂ log(SY_t) + β₃ log(SP_t) + e_t,

where SDH_t is the CIA's "high" estimate of Soviet defense expenditure (billions of 1970 rubles), USD_t is U.S. defense expenditure (billions of 1980 dollars), SY_t is Soviet GNP (billions of 1970 rubles), SP_t is the ratio of the number of USSR nuclear warheads (NR_t) to the number of U.S. nuclear warheads (NU_t) and e_t denotes the random error in year t. For the bottom panel of Figure 1, the parameter of interest is a specified one of these regression coefficients. In both cases, the FGLS confidence interval outperforms the two-stage confidence interval, in terms of coverage probability.
The resulting recommendation is that the FGLS confidence interval should be used, instead of the two-stage confidence interval.

In Section 6 we provide two further examples of the comparison of the coverage probabilities of the FGLS and two-stage confidence intervals. In Section 7 we describe the gains in simulation efficiency achieved using the variance reduction methods described in Section 4 and the Supporting Information in the context of the examples described in Figures 1 and 2.

Figure 1: The top panel presents graphs of the coverage probability functions for the FGLS and two-stage confidence intervals, each with nominal coverage 0.95, for the chicken demand example. The bottom panel presents graphs of the coverage probability functions for these confidence intervals, and the same nominal coverage, for the defense spending example.

2 The OLS and FGLS confidence intervals
Consider the linear regression model

y = Xβ + e,

where y is an n-vector of responses, X is an n × p known design matrix with linearly independent columns (n > p), β is a p-vector of unknown parameters and e is an n-vector of zero-mean random errors. We suppose that {e_t} is a zero-mean strictly stationary first order autoregressive (AR(1)) process satisfying

e_t = ψ e_{t−1} + u_t

for all integer t, where ψ is an unknown parameter satisfying 0 ≤ ψ <
1, and the u_t's are independent and identically normally distributed with zero mean. Let σ² = E(e_t²), an unknown positive parameter. The restriction to non-negative values of ψ is very reasonable for many econometric data sets (see e.g. Wooldridge, 2016, p.378). Suppose that the parameter of interest is θ = a'β, where a is a specified non-zero p-vector. Let our aim be to find a confidence interval for θ with minimum coverage probability 1 − α. Henceforth, suppose that the design matrix X, the vector a (which is used in the definition of the parameter of interest θ) and 1 − α are given.

The covariance matrix of e is σ² G(ψ), where G(ψ) is an n × n matrix with (i, j)'th element ψ^{|i−j|}. Suppose, for the moment, that ψ is known. The standard estimator of β is

β̂(ψ) = (X' G⁻¹(ψ) X)⁻¹ X' G⁻¹(ψ) y.

The resulting estimator of θ is θ̂(ψ) = a' β̂(ψ). Let

σ̂²(ψ) = (y − X β̂(ψ))' G⁻¹(ψ) (y − X β̂(ψ)) / m,

where m = n − p. In other words, β̂(ψ), θ̂(ψ) and σ̂²(ψ) denote the generalized least squares estimators of β, θ and σ², respectively. Also let

v(ψ) = Var(θ̂(ψ)) / σ² = a' (X' G⁻¹(ψ) X)⁻¹ a.

Let [a ± b] denote the interval [a − b, a + b] (b > 0). For ψ known, the standard confidence interval for θ, with coverage 1 − α, is

J(ψ) = [θ̂(ψ) ± t_{m,1−α/2} (v(ψ))^{1/2} σ̂(ψ)],

where the quantile t_{m,p} is defined by P(T ≤ t_{m,p}) = p for T ∼ t_m. For ψ = 0, this confidence interval reduces to the usual confidence interval for θ, with coverage 1 − α, centred on the ordinary least squares (OLS) estimator. We call this the OLS confidence interval.
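The quantities G(ψ), β̂(ψ), σ̂²(ψ), v(ψ) and the interval J(ψ) can be computed directly from their definitions. The following Python sketch is illustrative only (the paper's computations use R); the function name and the convention of passing the quantile t_{m,1−α/2} in as an argument are our assumptions.

```python
import numpy as np

def gls_interval(y, X, a, psi, t_quantile):
    """Sketch of the interval J(psi) for theta = a'beta, with psi known.

    t_quantile is the quantile t_{m,1-alpha/2}, supplied by the caller,
    e.g. from scipy.stats.t.ppf(1 - alpha/2, m).
    """
    n, p = X.shape
    m = n - p
    idx = np.arange(n)
    G = psi ** np.abs(idx[:, None] - idx[None, :])   # (i,j) entry psi^{|i-j|}
    Ginv = np.linalg.inv(G)
    A = X.T @ Ginv @ X
    beta_hat = np.linalg.solve(A, X.T @ Ginv @ y)    # GLS estimator of beta
    theta_hat = a @ beta_hat                         # estimator of theta = a'beta
    resid = y - X @ beta_hat
    sigma2_hat = (resid @ Ginv @ resid) / m          # GLS estimator of sigma^2
    v = a @ np.linalg.solve(A, a)                    # v(psi) = a'(X'G^{-1}X)^{-1}a
    half = t_quantile * np.sqrt(v * sigma2_hat)
    return theta_hat - half, theta_hat + half
```

For ψ = 0 the matrix G(0) is the identity, so the interval computed by this sketch reduces to the usual OLS confidence interval.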
When ψ is unknown we replace it by the restricted maximum likelihood (REML) estimator ψ̂_REML in J(ψ) to obtain the feasible generalized least squares (FGLS) confidence interval J(ψ̂_REML). In Appendix A, we describe three estimators of ψ, including ψ̂_REML. Let e† = e/σ, so that e† ∼ N(0, G(ψ)). The following theorem, proved in Appendix A, is the main result of this section.

Theorem 1.
The three estimators ψ̂, ψ̂_ML and ψ̂_REML described in Appendix A are all functions of e†. For ψ̃ = 0 and for ψ̃ equal to any one of these three estimators of ψ, the event {θ ∈ J(ψ̃)} is equal to the event

{(b(ψ̃))' e† ∈ [± t_{m,1−α/2} (v(ψ̃))^{1/2} (w(e†, ψ̃))^{1/2}]},   (1)

where

w(e†, ψ) = (1/m) (e†)' G⁻¹(ψ) (I − X (X' G⁻¹(ψ) X)⁻¹ X' G⁻¹(ψ)) e†

and

(b(ψ))' = a' (X' G⁻¹(ψ) X)⁻¹ X' G⁻¹(ψ).

Consequently, P(θ ∈ J(ψ̂)), P(θ ∈ J(ψ̂_ML)) and P(θ ∈ J(ψ̂_REML)) are functions of ψ.

This theorem allows us to easily carry out a numerical comparison of the coverage probability functions of the confidence intervals J(ψ̂), J(ψ̂_ML) and J(ψ̂_REML) for any given design matrix X, vector a (which is used in the definition of the parameter of interest θ) and 1 − α (the desired minimum coverage probability). These coverage probabilities do not depend on either β or σ² and are determined solely by ψ.

We compared these coverage probability functions for the same X, a and 1 − α as those considered in Figure 1. We found that, in terms of coverage probability, J(ψ̂_REML) performs better than J(ψ̂_ML), and J(ψ̂_ML) performs better than J(ψ̂). This finding provides support for the choice we made earlier to always construct the FGLS confidence interval using the REML estimator of ψ.
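To illustrate, ψ̂_REML can be computed by maximizing the REML criterion (A.2) of Appendix A over ψ ∈ [0, 1). The Python sketch below is a stand-in for the authors' R code: the grid search (rather than a smooth optimizer) and all function names are our choices.

```python
import numpy as np

def reml_criterion(psi, y, X):
    """REML criterion (A.2) of Appendix A, up to an additive constant:
    -(m/2) log(S(beta_hat(psi), psi)/m) - (1/2) log|G(psi)|
    - (1/2) log|X'G^{-1}(psi)X|."""
    n, p = X.shape
    m = n - p
    idx = np.arange(n)
    G = psi ** np.abs(idx[:, None] - idx[None, :])
    Ginv = np.linalg.inv(G)
    A = X.T @ Ginv @ X
    beta_hat = np.linalg.solve(A, X.T @ Ginv @ y)
    r = y - X @ beta_hat
    S = r @ Ginv @ r                       # S(beta_hat(psi), psi)
    _, logdetG = np.linalg.slogdet(G)
    _, logdetA = np.linalg.slogdet(A)
    return -0.5 * m * np.log(S / m) - 0.5 * logdetG - 0.5 * logdetA

def psi_reml(y, X, grid=np.linspace(0.0, 0.99, 100)):
    """Grid-search maximizer of the REML criterion over [0, 1)."""
    vals = [reml_criterion(psi, y, X) for psi in grid]
    return float(grid[int(np.argmax(vals))])
```

A grid of 100 points is crude but adequate for illustration; a production implementation would refine the maximizer, e.g. by a one-dimensional optimizer on [0, 1).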
3 The Durbin-Watson pretest and the two-stage confidence interval

The Durbin-Watson test statistic is

d̂ = Σ_{i=2}^{n} (r_i − r_{i−1})² / Σ_{i=1}^{n} r_i²,

where r_i is the i'th component of the vector r = (I − X(X'X)⁻¹X') e of residuals from the model fitted by OLS. It may be shown that

d̂ = r' B r / r' r,

where B is the n × n tridiagonal matrix with main diagonal (1, 2, 2, …, 2, 1) and with every element of the first subdiagonal and first superdiagonal equal to −1. Dividing the numerator and denominator of this expression for d̂ by σ², we find that the Durbin-Watson test statistic

d̂ = (r†)' B r† / (r†)' r†,   (2)

where r† = r/σ, so that r† = (I − X(X'X)⁻¹X') e†. We use this test statistic as follows to test the null hypothesis that ψ = 0 against the alternative hypothesis that ψ >
0. If d̂ > c(α̃) then we accept this null hypothesis; otherwise we reject this null hypothesis. Here c(α̃) is defined to be the value of c such that, under the null hypothesis, P(d̂ ≤ c) = α̃, a specified test size. The method used to compute c(α̃) is described in Appendix B. Henceforth, suppose that α̃ is given.

Consider the following two-stage procedure. We carry out a Durbin-Watson pretest of the null hypothesis that ψ = 0 against the alternative hypothesis that ψ >
0. If this null hypothesis is accepted then we use the OLS confidence interval, with nominal coverage 1 − α; otherwise we use an FGLS confidence interval, with nominal coverage 1 − α. We call this the two-stage confidence interval, with desired minimum coverage 1 − α, and we denote it by K. In other words,

K = J(0) if d̂ > c(α̃), and K = J(ψ̂_REML) otherwise.

The following theorem, which is the main result of this section, is proved in Appendix B using Theorem 1 and the expression (2) for the Durbin-Watson statistic d̂.

Theorem 2.
The coverage probability of the two-stage confidence interval K, P(θ ∈ K), is a function of ψ.

Using this theorem, we can easily carry out a numerical comparison of the coverage probability functions of the FGLS confidence interval J(ψ̂_REML) and the two-stage confidence interval K, for the same values of X, a and 1 − α. These coverage probabilities do not depend on either β or σ² and are determined solely by ψ. As described in the next two sections, we compute these coverage probability functions by simulation.

4 Computation of the coverage probability of the FGLS confidence interval by simulation

We may compute the coverage probability of the FGLS confidence interval J(ψ̂_REML) using "brute force" simulation as follows. Suppose that we carry out M independent simulation runs. On the k'th simulation run we compute an observation of e† and then record whether or not the event (1) occurs. The total number of occurrences of this event has a Binomial(M, p) distribution, where p = P(θ ∈ J(ψ̂_REML)). The estimator of p and its standard error are found using the well-known properties of this distribution.

However, a better way to compute this coverage probability by simulation is to use variance reduction as follows. Let 𝟙(B) = 1 if B is true and 𝟙(B) = 0 if B is false, where B is an arbitrary statement. Also let

h(e†, ψ) = 𝟙((b(ψ))' e† ∈ [± t_{m,1−α/2} (v(ψ))^{1/2} (w(e†, ψ))^{1/2}]),   (3)

so that, by Theorem 1, P(θ ∈ J(ψ̂_REML)) = E(h(e†, ψ̂_REML)). We expect that, with probability close to 1, h(e†, ψ̂_REML) will be close to h(e†, ψ), particularly for large n. Note that E(h(e†, ψ)) = 1 − α. This motivates our use of h(e†, ψ) as a control variate. Therefore

P(θ ∈ J(ψ̂_REML)) = 1 − α + E(h(e†, ψ̂_REML) − h(e†, ψ)).

We expect that
Var(h(e†, ψ̂_REML) − h(e†, ψ)) will be much less than Var(h(e†, ψ̂_REML)), particularly for large n. The resulting computation of P(θ ∈ J(ψ̂_REML)) by simulation is as follows. Suppose that we carry out M independent simulation runs. On the k'th simulation run we compute an observation of e† and then record the value of h(e†, ψ̂_REML) − h(e†, ψ). Then 1 − α + (sample mean of these values) is the estimate of P(θ ∈ J(ψ̂_REML)), with standard error ((sample variance of these values)/M)^{1/2}.

5 Computation of the coverage probability of the two-stage confidence interval by simulation
We may compute the coverage probability of the two-stage confidence interval K using "brute force" simulation as follows. Suppose that we carry out M independent simulation runs. On the k'th simulation run we compute an observation of e† and then record whether or not the event {θ ∈ K} occurs. The total number of occurrences of this event has a Binomial(M, p) distribution, where p = P(θ ∈ K). The estimator of p and its standard error are found using the well-known properties of this distribution. However, a better way to compute this coverage probability by simulation is to use the variance reduction method described in the Supporting Information.

6 Two further examples

Figure 2 presents graphs of the coverage probability functions of the FGLS and two-stage confidence intervals for two real life data examples. Both of these confidence intervals have nominal coverage 0.95. The coverage probability for each given value of ψ on a grid of equally spaced values in [0, 1) was estimated using 50,000 simulation runs that employ the variance reduction methods described in Section 4 and the Supporting Information. The vertical bars in Figure 2 are approximate 95% confidence intervals for the coverage probabilities estimated for each value of ψ.

The top panel of Figure 2 is for the fish demand example considered on p.334 of Studenmund (1992) and based on the data in Table 8.1 on p.290, which was obtained from Historical Statistics of the US, Colonial Times to 1970, Part 1. In this example, the linear regression model is

F_t = β₀ + β₁ RP_t + β₂ log(Yd_t) + β₃ D_t + e_t,

where F_t is the average pounds of fish consumed per capita in year t, RP_t is the price of fish relative to beef in year t, Yd_t is the real per capita disposable income in year t (in billions of dollars), D_t is a dummy variable equal to zero in years before 1966 and one afterwards and e_t denotes the random error in year t. For this panel of Figure 2, the parameter of interest is a specified one of these regression coefficients.
This panel provides an illustration of the case that, while the FGLS confidence interval outperforms the two-stage confidence interval in terms of the coverage probability function, the coverage probability performance of both of these intervals drops substantially as ψ approaches 1.

The bottom panel of Figure 2 is for the consumption of ice cream example considered on p.104 of Verbeek (2004). This data is listed by Hildreth and Lu (1960) and consists of 30 four-weekly observations from 18 March 1951 to 11 July 1953. In this example, the response is y_t, the consumption of ice cream per head (in pints) at time t (measured in consecutive four-weekly segments), and the model is

y_t = β₀ + β₁ X_{1t} + β₂ X_{2t} + β₃ X_{3t} + e_t,

where X_{1t} is the average family income per week (in US dollars), X_{2t} is the price of ice cream (per pint), X_{3t} is the average temperature (in Fahrenheit) and e_t denotes the random error at time t. For this panel of Figure 2, the parameter of interest is a specified one of these regression coefficients. This panel provides yet another illustration of the case that the FGLS confidence interval outperforms the two-stage confidence interval in terms of the coverage probability function. This panel also provides an illustration of the case that both the FGLS and two-stage confidence intervals have coverage probability very close to the nominal coverage 0.95 for ψ = 0.

7 Gains in simulation efficiency achieved by variance reduction

All of the computations reported in this paper were carried out on a PC with an Intel i7 CPU and 32 GB of RAM. The coverage probabilities of the FGLS and two-stage confidence intervals, each with nominal coverage 0.95, were computed using 50,000 simulation runs, with the variance reductions described in Section 4 and the Supporting Information, for each value of ψ on a grid of equally spaced values in [0, 1) and for both panels of both figures.
The use of variance reduction has greatly increased the efficiency of the computations of the graphs shown in both Figures 1 and 2. To assess the improvement in the efficiency of these computations, we also computed the coverage probabilities of the FGLS and two-stage confidence intervals, each with nominal coverage 0.95, using 50,000 simulation runs, without variance reduction, for each value of ψ on the same grid and for both panels of both figures. Using the standard measure of relative efficiency given e.g. on p.51 of Hammersley and Handscomb (1964), we then found the computation time required to estimate these coverage probabilities with the same accuracy (i.e. with the same standard error). The results are reported in Tables 1 and 2.

Figure 2: The top panel presents graphs of the coverage probability functions for the FGLS and two-stage confidence intervals, each with nominal coverage 0.95, for the fish demand example. The bottom panel presents graphs of the coverage probability functions for these confidence intervals, and the same nominal coverage, for the ice cream consumption example.
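The control-variate scheme of Section 4, whose efficiency gains are reported in Tables 1 and 2, can be sketched as follows. This Python fragment is illustrative only: to keep it self-contained it plugs in the simple residual-autocorrelation estimator ψ̂ of Appendix A instead of ψ̂_REML, the quantile t_{m,1−α/2} is passed in by the caller, and the run count is far smaller than the paper's 50,000.

```python
import numpy as np

def coverage_cv(X, a, psi, alpha, t_quantile, M=500, seed=0):
    """Control-variate estimate of the FGLS coverage probability, as in
    Section 4: P(theta in J(psi_hat)) = 1 - alpha + E[h(e+, psi_hat) - h(e+, psi)].
    Sketch only: uses the sample first-order autocorrelation of the OLS
    residuals as psi_hat, in place of psi_hat_REML."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    m = n - p
    idx = np.arange(n)
    G = psi ** np.abs(idx[:, None] - idx[None, :])
    L = np.linalg.cholesky(G)                 # L z ~ N(0, G(psi)) for z ~ N(0, I)
    P = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)   # OLS residual projector

    def h(e, ps):
        """Indicator (3): 1 if b(ps)'e lies in [+/- t (v(ps) w(e, ps))^{1/2}]."""
        Gi = np.linalg.inv(ps ** np.abs(idx[:, None] - idx[None, :]))
        A = X.T @ Gi @ X
        v = a @ np.linalg.solve(A, a)                   # v(ps)
        b_e = np.linalg.solve(A, a) @ (X.T @ Gi) @ e    # b(ps)'e
        resid = e - X @ np.linalg.solve(A, X.T @ Gi @ e)
        w = (resid @ Gi @ resid) / m                    # w(e, ps)
        return float(abs(b_e) <= t_quantile * np.sqrt(v * w))

    diffs = np.empty(M)
    for k in range(M):
        e = L @ rng.normal(size=n)            # observation of e+
        r = P @ e                             # OLS residuals
        psi_hat = np.clip((r[1:] @ r[:-1]) / (r @ r), 0.0, 0.99)
        # control variate h(e+, psi) has known mean 1 - alpha
        diffs[k] = h(e, psi_hat) - h(e, psi)
    est = 1.0 - alpha + diffs.mean()
    se = np.sqrt(diffs.var(ddof=1) / M)
    return est, se
```

Because the two indicators agree on most runs, the differences are mostly zero and the standard error of the control-variate estimate is typically much smaller than that of the brute-force binomial estimate.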
Table 1: Computation times for simulation estimates, with the same accuracy, of the coverage probabilities of the FGLS confidence interval for each value of ψ on a grid of equally spaced values in [0, 1), with and without the variance reduction described in Section 4.

                      With variance reduction   Without variance reduction
Fig. 1 top panel      82 mins                   208 mins
Fig. 1 bottom panel   51 mins                   102 mins
Fig. 2 top panel      103 mins                  180 mins
Fig. 2 bottom panel   115 mins                  324 mins

Table 2: Computation times for simulation estimates, with the same accuracy, of the coverage probabilities of the two-stage confidence interval for each value of ψ on a grid of equally spaced values in [0, 1), with and without the variance reduction described in Section 5.

                      With variance reduction   Without variance reduction
Fig. 1 top panel      81 mins                   282 mins
Fig. 1 bottom panel   53 mins                   134 mins
Fig. 2 top panel      118 mins                  268 mins
Fig. 2 bottom panel   118 mins                  254 mins

8 Remarks

Remark 8.1.
The R programs used to compute the coverage probabilities of the FGLS and two-stage confidence intervals, using simulation with the variance reduction methods described in Section 4 and the Supporting Information, were checked for correctness in the following two ways, for all of the examples considered in the paper. Firstly, these coverage probabilities were computed by simulation, without variance reduction, using simulations of e† and the expression (1) with ψ̃ replaced by ψ̂_REML and 0, respectively, as appropriate. Secondly, these coverage probabilities were also computed by simulation, without variance reduction, using simulations of y = Xβ + e for the particular case that β = 0 (which implies that θ = 0) and σ = 1.

Remark 8.2. Straightforward analogues of Theorem 2 and the variance reduction methods described in the Supporting Information hold if we replace the Durbin-Watson test statistic by the so-called "t-statistic"

ψ̂ / (n⁻¹ Σ_{t=2}^{n} (r_t − ψ̂ r_{t−1})² / Σ_{s=1}^{n−1} r_s²)^{1/2},

where ψ̂ is the estimator of ψ described in Appendix A.

Remark 8.3.
Theorems 1 and 2 and the variance reduction methods described in Section 4 and the Supporting Information extend in the obvious way to the case that {e_t} is assumed to be an ARMA(ℓ₁, ℓ₂) process for any given ℓ₁ and ℓ₂ (ℓ₁ + ℓ₂ > 0).

Remark 8.4.
The framework that we use for the construction of confidence intervals for θ does not preclude the consideration of misspecification of the model for the autocorrelations of the random errors. We expect that, for moderate levels of misspecification and moderate sample sizes, the result will be a negligible change in the performance of the confidence interval for θ constructed assuming that there is no misspecification. Such an assertion can easily be checked using a sensitivity analysis in which the actual data generating process for the random errors is not included in the assumed family of parametric models.

Remark 8.5.
An alternative framework for the construction of confidence intervals for θ is to use a confidence interval centred on the OLS estimator, but with the correct standard error estimated using a heteroskedasticity and autocorrelation consistent (HAC) estimator (see e.g. Andrews, 1991). A remarkable feature of this estimator is that it is consistent for virtually arbitrary autocorrelations. One should, however, not lose sight of the fact that the OLS estimator is typically inefficient by comparison with competitors of the type described by Wooldridge (2016, p.390). Also, confidence intervals based on the OLS estimator with HAC-estimated standard error can perform poorly, in terms of coverage probability, for moderate values of n. In this alternative framework there seems to be little motivation for carrying out any preliminary hypothesis test.

9 Conclusion

It is common in applied econometrics to carry out preliminary data-based model selection, using preliminary hypothesis tests or minimizing a criterion such as the Akaike Information Criterion. This is frequently followed by the construction of a confidence interval for a scalar parameter of interest, using the same data, based on the assumption that the selected model had been given to us a priori, as the true model. This assumption is false because (a) the preliminary model selection sometimes chooses the wrong model and (b) the data used to choose the model is re-used for the construction of the confidence interval without due acknowledgement. It is important to delineate those models and model selection procedures for which the post-model-selection confidence interval has poor coverage properties. A review of some of the literature that carries out this delineation is provided by Kabaila (2009).
Kabaila, Mainzer and Farchione (2015, 2017) and the present paper extend this delineation project to model selection procedures that are of particular interest in the field of econometrics.

In the present paper we consider a preliminary data-based selection of a time series model for the random errors in a linear regression model. We provide the tools needed to assess the effect of this preliminary model selection on the coverage probability of a confidence interval for a given linear combination of the regression parameters and a given design matrix. The first tool is the result that the coverage probabilities of both the FGLS and two-stage confidence intervals do not depend on either the regression parameter vector or the variance of the random error. The second tool is a set of variance reduction methods for the simulations used to estimate these coverage probability functions, leading to the provision of these estimates in a reasonable amount of time. Our proposal is that this assessment be carried out on a case-by-case basis for each given design matrix and parameter of interest. Since this assessment is carried out prior to the examination of the observed response vector, it can validly be used to decide whether the FGLS confidence interval or the two-stage confidence interval should be used.
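The Durbin-Watson critical value c(α̃) is computed in Appendix B by Imhof's (1961) method via the CompQuadForm R package. As a rough cross-check, the following Python sketch uses the same quadratic-form representation of P(d̂ ≤ c) under ψ = 0, but replaces Imhof's numerical inversion by a Monte Carlo approximation based on the eigenvalues of the projected matrix; all function names and the bisection tolerance are our choices.

```python
import numpy as np

def dw_null_cdf(c, X, M=20000, seed=0):
    """Monte Carlo approximation of P(d_hat <= c) under H0: psi = 0.

    With Mproj = I - X(X'X)^{-1}X' and z ~ N(0, I),
    P(d_hat <= c) = P(z' Mproj (B - cI) Mproj z <= 0)
                  = P(sum_i lambda_i Q_i <= 0),
    where the lambda_i are the eigenvalues of Mproj (B - cI) Mproj and the
    Q_i are independent chi-squared(1) variables."""
    n = X.shape[0]
    # Tridiagonal B with diagonal (1, 2, ..., 2, 1) and off-diagonals -1
    B = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    B[0, 0] = B[-1, -1] = 1.0
    Mproj = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
    lam = np.linalg.eigvalsh(Mproj @ (B - c * np.eye(n)) @ Mproj)
    rng = np.random.default_rng(seed)
    chi2 = rng.chisquare(1, size=(M, n))     # independent chi-squared(1) draws
    return float(np.mean(chi2 @ lam <= 0.0))

def dw_critical_value(alpha_tilde, X, tol=1e-3):
    """Bisection for c(alpha_tilde): P(d_hat <= c) = alpha_tilde under H0."""
    lo, hi = 0.0, 4.0                        # d_hat always lies in [0, 4]
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if dw_null_cdf(mid, X) < alpha_tilde:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Fixing the seed makes the approximate null distribution function monotone in c, so the bisection is well behaved; Imhof's method, as used in the paper, avoids the Monte Carlo error entirely.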
Appendix A: FGLS confidence intervals
The three estimators of ψ considered

We consider three estimators of ψ. The first of these is

ψ̂ = Σ_{t=2}^{n} r_t r_{t−1} / Σ_{t=1}^{n} r_t²,

where r_t is the t'th component of the vector r = (I − X(X'X)⁻¹X') e of residuals from the model fitted by OLS. This estimator is the sample first order autocorrelation of the residuals from this fitted model.

The second estimator is ψ̂_ML, the maximum likelihood estimator of ψ, which is obtained (see e.g. Cooper and Thompson, 1977) by maximizing

−(n/2) log(S(β̂(ψ), ψ)/n) − (1/2) log|G(ψ)|   (A.1)

with respect to ψ ∈ [0, 1), where S(β, ψ) = (y − Xβ)' G⁻¹(ψ) (y − Xβ). The third estimator is ψ̂_REML, the restricted maximum likelihood estimator of ψ, which is obtained (see e.g. Cheang and Reinsel, 2000) by maximizing

−(m/2) log(S(β̂(ψ), ψ)/m) − (1/2) log|G(ψ)| − (1/2) log|X' G⁻¹(ψ) X|   (A.2)

with respect to ψ ∈ [0, 1).

Proof of Theorem 1
Let e† = e/σ and note that e† ∼ N(0, G(ψ)). We show that ψ̂, ψ̂_ML and ψ̂_REML are all functions of e†. Let r† = r/σ, so that r† = (I − X(X'X)⁻¹X') e†. Division of the numerator and denominator of the expression for ψ̂ by σ² shows that

ψ̂ = Σ_{t=2}^{n} r†_t r†_{t−1} / Σ_{t=1}^{n} (r†_t)².

In other words, ψ̂ is a function of e†.

The proofs that ψ̂_ML and ψ̂_REML are functions of e† are almost identical and so we present only the proof for ψ̂_ML. It follows from

y − X β̂(ψ) = (I − X (X' G⁻¹(ψ) X)⁻¹ X' G⁻¹(ψ)) e

that

S(β̂(ψ), ψ) = (y − X β̂(ψ))' G⁻¹(ψ) (y − X β̂(ψ)) = e' G⁻¹(ψ) (I − X (X' G⁻¹(ψ) X)⁻¹ X' G⁻¹(ψ)) e.

Thus

S(β̂(ψ), ψ)/σ² = (e†)' G⁻¹(ψ) (I − X (X' G⁻¹(ψ) X)⁻¹ X' G⁻¹(ψ)) e†,

where σ² denotes the true parameter value. Now the criterion (A.1) is equal to

−(n/2) log(S(β̂(ψ), ψ)/(n σ²)) − (n/2) log(σ²) − (1/2) log|G(ψ)|,

where σ² denotes the true parameter value. Thus maximizing (A.1) with respect to ψ ∈ [0, 1) is equivalent to maximizing

−(n/2) log(S(β̂(ψ), ψ)/(n σ²)) − (1/2) log|G(ψ)|

with respect to ψ ∈ [0, 1). Consequently, ψ̂_ML is a function of e†.

Suppose that ψ̃ is one of the three estimators ψ̂, ψ̂_ML and ψ̂_REML of ψ. Also suppose that the nominal coverage, 1 − α, of the confidence interval J(ψ̃) is given. Then

{θ ∈ J(ψ̃)} = {θ ∈ [θ̂(ψ̃) ± t_{m,1−α/2} (v(ψ̃))^{1/2} σ̂(ψ̃)]}
= {(θ̂(ψ̃) − θ)/σ ∈ [± t_{m,1−α/2} (v(ψ̃))^{1/2} σ̂(ψ̃)/σ]}
= {a' ((β̂(ψ̃) − β)/σ) ∈ [± t_{m,1−α/2} (v(ψ̃))^{1/2} σ̂(ψ̃)/σ]}.

Since σ̂²(ψ) = S(β̂(ψ), ψ)/m,

σ̂²(ψ) = (1/m) e' G⁻¹(ψ) (I − X (X' G⁻¹(ψ) X)⁻¹ X' G⁻¹(ψ)) e.

Therefore

σ̂²(ψ̃)/σ² = (1/m) (e†)' G⁻¹(ψ̃) (I − X (X' G⁻¹(ψ̃) X)⁻¹ X' G⁻¹(ψ̃)) e† = w(e†, ψ̃).

Also note that a' ((β̂(ψ̃) − β)/σ) = (b(ψ̃))' e†. Thus the event {θ ∈ J(ψ̃)} is equal to (1).

Appendix B: The two-stage confidence interval
Computation of $P(\hat d > c)$

The critical value $c(\tilde\alpha)$ is defined to be the value of $c$ such that, under the null hypothesis $\psi = 0$, $P(\hat d \le c) = \tilde\alpha$, a specified test size. Observe that
\begin{align*}
P\big(\hat d > c\big)
&= P\left( \frac{\big(r^\dagger\big)' B\, r^\dagger}{\big(r^\dagger\big)' r^\dagger} > c \right)
= P\Big( \big(r^\dagger\big)' (B - cI)\, r^\dagger > 0 \Big) \\
&= P\Big( \big(e^\dagger\big)' \big(I - X(X'X)^{-1}X'\big)(B - cI)\big(I - X(X'X)^{-1}X'\big)\, e^\dagger > 0 \Big),
\end{align*}
where $e^\dagger \sim N(0, G(\psi))$. Writing $Q$ for the symmetric matrix of this quadratic form, this probability equals $P\big(\sum_{i=1}^{n} \lambda_i Z_i^2 > 0\big)$, where $\lambda_1, \dots, \lambda_n$ are the eigenvalues of $G^{1/2}(\psi)\, Q\, G^{1/2}(\psi)$ and $Z_1, \dots, Z_n$ are independent $N(0,1)$ random variables. We compute $P\big(\hat d > c\big)$ using the method of Imhof (1961). This is done using the imhof function in the CompQuadForm package in R. We compute $(X'X)^{-1}$ using the QR decomposition of $X$.

Proof of Theorem 2
Observe that
\begin{align*}
\big\{\theta \in K\big\}
&= \Big( \big\{\theta \in K\big\} \cap \big\{H_0 \text{ accepted}\big\} \Big) \cup \Big( \big\{\theta \in K\big\} \cap \big\{H_0 \text{ rejected}\big\} \Big) \\
&= \Big( \big\{\theta \in J(0)\big\} \cap \big\{\hat d > c(\tilde\alpha)\big\} \Big) \cup \Big( \big\{\theta \in J\big(\hat\psi_{REML}\big)\big\} \cap \big\{\hat d \le c(\tilde\alpha)\big\} \Big).
\end{align*}
Now $\big\{\theta \in J(0)\big\}$ is equal to (1) with $\tilde\psi = 0$, $\big\{\theta \in J\big(\hat\psi_{REML}\big)\big\}$ is equal to (1) with $\tilde\psi$ replaced by $\hat\psi_{REML}$, $\hat\psi_{REML}$ is a function of $e^\dagger$ (by Theorem 1) and the Durbin-Watson test statistic $\hat d$ satisfies (2). Hence whether or not the event $\{\theta \in K\}$ occurs is determined by the random vector $e^\dagger$, which has an $N(0, G(\psi))$ distribution. Thus $P(\theta \in K)$ is a function of $\psi$.

References
Andrews, D.W.K. (1991), Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica, 59, 817–858.

Anselin, L. (2006), Spatial econometrics. Pages 931–932 of Palgrave Handbook of Econometrics: Vol 1, Econometric Theory (T.C. Mills and K. Patterson eds.). Palgrave Macmillan, Basingstoke.

Berthouex, P.M. and L.C. Brown (2002), Statistics for environmental engineers, 2nd edition. CRC, Boca Raton, FL.

Cheang, W-K. and G.C. Reinsel (2000), Bias reduction of autoregressive estimates in time series regression model through restricted maximum likelihood. Journal of the American Statistical Association, 95, 1173–1184.

Cooper, D.M. and R. Thompson (1977), A note on the estimation of the parameters of the autoregressive-moving average process. Biometrika, 64, 625–628.

Folmer, H. (1988), Autocorrelation pre-testing in linear models with AR(1) errors. Pages 39–55 of On Model Uncertainty and its Statistical Implications, Proceedings of a Workshop, Held in Groningen, The Netherlands, September 25–26, 1986 (Theo K. Dijkstra ed.). Springer-Verlag, Berlin.

Giles, J.A. and D.E.A. Giles (1993), Pre-test estimation and testing in econometrics: recent developments. Journal of Economic Surveys, 7, 145–197.

Griffiths, W.E. and P.A.A. Beesley (1984), The small-sample properties of some preliminary test estimators in a linear model with autocorrelated errors. Journal of Econometrics, 25, 49–61.

Hammersley, J.M. and D.C. Handscomb (1965), Monte Carlo methods. Methuen, London.

Hildreth, C. and J. Lu (1960), Demand relations with autocorrelated disturbances. Technical Bulletin No. 276, Michigan State University.

Imhof, J.P. (1961), Computing the distribution of quadratic forms in normal variables. Biometrika, 48, 419–426.

Kabaila, P. (2009), The coverage properties of confidence regions after model selection. International Statistical Review, 77, 405–414.

Kabaila, P., R. Mainzer and D. Farchione (2015), The impact of a Hausman pretest, applied to panel data, on the coverage probability of confidence intervals. Economics Letters, 131, 12–15.

Kabaila, P., R. Mainzer and D. Farchione (2017), Conditional assessment of the impact of a Hausman pretest on confidence intervals.