On Size and Power of Heteroskedasticity and Autocorrelation Robust Tests
David Preinerstorfer and Benedikt M. Pötscher
Department of Statistics, University of Vienna, Oskar-Morgenstern-Platz 1, A-1090 Vienna, Austria. E-mail: {david.preinerstorfer, benedikt.poetscher}@univie.ac.at

Preliminary version: April 2012. First version: January 2013. This version: June 2014.

[Parts of the results in the paper have been presented as the Econometric Theory Lecture at the International Symposium on Econometric Theory and Applications, Shanghai, May 19-21, 2012. We are grateful to the Editor and three referees for helpful comments.]
Abstract
Testing restrictions on regression coefficients in linear models often requires correcting the conventional F-test for potential heteroskedasticity or autocorrelation amongst the disturbances, leading to so-called heteroskedasticity and autocorrelation robust test procedures. These procedures have been developed with the purpose of attenuating size distortions and power deficiencies present for the uncorrected F-test. We develop a general theory to establish positive as well as negative finite-sample results concerning the size and power properties of a large class of heteroskedasticity and autocorrelation robust tests. Using these results we show that nonparametrically as well as parametrically corrected F-type tests in time series regression models with stationary disturbances have either size equal to one or nuisance-infimal power equal to zero under very weak assumptions on the covariance model and under generic conditions on the design matrix. In addition we suggest an adjustment procedure based on artificial regressors. This adjustment resolves the problem in many cases in that the so-adjusted tests do not suffer from size distortions. At the same time their power function is bounded away from zero. As a second application we discuss the case of heteroskedastic disturbances.

AMS Mathematics Subject Classification 2010: 62F03, 62J05, 62F35, 62M10, 62M15

Keywords: Size distortion, power deficiency, invariance, robustness, autocorrelation, heteroskedasticity, HAC, fixed-bandwidth, long-run-variance, feasible GLS
1 Introduction

So-called autocorrelation robust tests have received considerable attention in the econometrics literature in the last two and a half decades. These tests are Wald-type tests which make use of an appropriate nonparametric variance estimator that tries to take into account the autocorrelation in the data. The early papers on such nonparametric variance estimators in econometrics date back several decades; see, e.g., Newey and West (1987) and Andrews (1991). Corrections of this sort typically attenuate the size distortions the uncorrected F-test suffers from under autocorrelation; this improvement, however, is often achieved at the expense of some loss of power. In an attempt to better understand size and power properties of autocorrelation robust tests, higher-order asymptotic properties of these tests have been studied (Velasco and Robinson (2001), Jansson (2004), Sun et al. (2008, 2011), Zhang and Shao (2013a)). [Footnote: Some of the Monte Carlo studies in the literature initialize the disturbance process with its stationary distribution, while others use a fixed starting value for initialization. In both cases size distortions are found for both classes of tests referred to in the text.]

The first-order as well as the higher-order asymptotic results in the literature cited above are all pointwise asymptotic results in the sense that they are derived under the assumption of a fixed underlying data-generating process (DGP). Therefore, while these results tell us something about the limit of the rejection probability, or the rate of convergence to this limit, for a fixed underlying DGP, they do not necessarily inform us about the size of the test or its asymptotic behavior (e.g., the limit of the size as sample size increases), nor about the power function or its asymptotic behavior. The reason is that the asymptotic results do not hold uniformly in the underlying DGP under the typical assumptions on the feasible set of DGPs in this literature. Of course, one could restrict the set of feasible DGPs in such a way that the asymptotic results hold uniformly, but this would require the imposition of unnatural and untenable assumptions on the set of feasible DGPs, as will transpire from the subsequent discussion; cf. also Subsection 3.2.2.

In Section 3 of the present paper we provide a theoretical finite-sample analysis of the size and power properties of autocorrelation robust tests for linear restrictions on the parameters in a linear regression model with autocorrelated errors. Being finite-sample results, the findings of the paper apply equally well regardless of whether we fancy that the variance estimator being used would be consistent or not would sample size go to infinity. Under a mild assumption on the richness of the set of allowed autocorrelation structures in the maintained model, the results in Section 3 imply that in most cases the size of common autocorrelation robust tests is 1 or that the worst case power is 0, in the sense that the set of design matrices to which these results do not apply is a negligible set (Propositions 3.6 and 3.16). Furthermore, we provide a positive result in that we isolate conditions (on the design matrix and on the restrictions to be tested) such that the size of the test can be controlled.
While this result is obtained under the strong assumption that the set of feasible correlation structures coincides with the correlation structures of all stationary autoregressive processes of order 1, it should be noted that the negative results equally well hold under this parametric correlation model. The positive result just mentioned is then used to show how, for the majority of testing problems, autocorrelation robust tests can be adjusted in such a way that they do not suffer from the "size equals 1" and the "worst case power equals 0" problems. In Section 4 we provide an analogous negative result for heteroskedasticity robust tests and discuss why a (nontrivial) positive result is not possible.

The above mentioned results for autocorrelation/heteroskedasticity robust tests can of course also be phrased in terms of properties of the confidence sets that are obtained from these tests via inversion. For example, the "size equals one" results for the tests translate into "infimal coverage probability equals zero" results for the corresponding confidence sets.

We next discuss some related literature. Problems with tests and confidence sets for the intercept in a linear regression model with autoregressive disturbances have been pointed out in Section 5.3 of Dufour (1997) (in a somewhat different setup). These results are specific to testing the intercept and do not apply to other linear restrictions. This is, in particular, witnessed by our positive results for certain testing problems. Furthermore, there is a considerable body of literature concerned with the properties of the standard F-test (i.e., the F-test constructed without any correction for autocorrelation) in the presence of autocorrelation; see the references cited in Krämer et al. (1990) and Banerjee and Magnus (2000). Much of this literature concentrates on the case where the errors follow a stationary autoregressive process of order 1. As the correlation in the errors is not accounted for when considering the standard F-test, it is not too surprising that the standard F-test typically shows deplorable performance for large values of the autocorrelation coefficient ρ; see Krämer (1989), Krämer et al. (1990), Banerjee and Magnus (2000), and Subsection 3.4 for more discussion. Section 3 of the present paper shows that autocorrelation robust tests, despite having built into them a correction for autocorrelation, exhibit a similarly bad behavior. Finally, in a different testing problem (the leading case being testing the correlation of the errors in a spatial regression model) Martellosio (2010) has studied the power of a class of invariant tests including standard tests like the Cliff-Ord test and observed somewhat similar results in that the power of the tests considered typically approaches (as the strength of the correlation increases) either 0 or 1. While his results are similar in spirit to some of our results, his arguments are unfortunately fraught with a host of problems. See Preinerstorfer and Pötscher (2014) for discussion, corrections, and extensions.

The results in Section 3 for autocorrelation robust tests and in Section 4 for heteroskedasticity robust tests are derived as special cases of a more general theory for size and power properties of a larger class of tests that are invariant under a particular group of affine transformations. This theory is provided in Section 5.
One of the mechanisms behind the negative results in the present paper is a concentration mechanism explained subsequent to Theorem 3.3 and in more detail in Subsection 5.2, cf. also Corollary 5.17. A second mechanism generating negative results is described in Theorem 5.19. The theory underlying the positive results mentioned above is provided in Subsection 5.3 and in Theorem 5.21 as well as Proposition 5.23. Furthermore, the results in Section 5 allow for covariance structures more general than the ones discussed in Sections 3 and 4. For example, from the results in Section 5 results similar to the ones in Section 3 could be derived for heteroskedasticity/autocorrelation robust tests of regression coefficients in spatial regression models or in panel data models; for an overview of heteroskedasticity/autocorrelation robust tests in these models see Kelejian and Prucha (2007, 2010) and Vogelsang (2012). We do not provide any such results for lack of space. We note that for the uncorrected standard F-test in this setting negative results have been derived in Krämer (2003) and Krämer and Hanck (2009).

2 The framework

Consider the linear regression model

Y = Xβ + U, (1)

where X is a (real) nonstochastic regressor (design) matrix of dimension n × k and β ∈ R^k denotes the unknown regression parameter vector. We assume rank(X) = k and 1 ≤ k < n. The n × 1 disturbance vector U = (u₁, ..., u_n)′ is normally distributed with mean zero and unknown covariance matrix σ²Σ, where 0 < σ² < ∞ holds (and σ always denotes the positive square root). [Footnote: Although not expressed in the notation, the elements of Y, X, and U (and even the probability space supporting Y and U) may depend on sample size n. Furthermore, the obvious dependence of C on n will also not be shown in the notation. Note that C depends on n even if it is induced by a covariance model for the entire process (u_t)_{t∈N} that does not depend on n.] The matrix Σ varies in a prescribed (nonempty) set C of symmetric and positive definite n × n matrices. Throughout the paper we make the assumption that C is such that σ² and Σ ∈ C can be uniquely determined from σ²Σ. [For example, if the first diagonal element of each Σ ∈ C equals 1 this is satisfied; alternatively, if the largest diagonal element or the trace of each Σ ∈ C is normalized to a fixed constant, C has this property.] Of course, this assumption entails little loss of generality and can, if necessary, always be achieved by a suitable reparameterization of σ²Σ.

The linear model described above induces a collection of distributions on R^n, the sample space of Y. Denoting a Gaussian probability measure with mean μ ∈ R^n and (possibly singular) covariance matrix Φ by P_{μ,Φ} and setting M = span(X), the induced collection of distributions is given by

{P_{μ,σ²Σ} : μ ∈ M, 0 < σ² < ∞, Σ ∈ C}. (2)

Note that each P_{μ,σ²Σ} in (2) is absolutely continuous with respect to (w.r.t.) Lebesgue measure on R^n, since every Σ ∈ C is positive definite by assumption. We consider the problem of testing a linear (better: affine) restriction on the parameter vector β ∈ R^k, namely the problem of testing the null Rβ = r versus the alternative Rβ ≠ r, where R is a q × k matrix of rank q, q ≥ 1, and r ∈ R^q. To be more precise, and to emphasize that the testing problem is in fact a compound one, the testing problem is given by

H₀: Rβ = r, 0 < σ² < ∞, Σ ∈ C   vs.   H₁: Rβ ≠ r, 0 < σ² < ∞, Σ ∈ C. (3)

This is important to stress, because size and power properties of tests critically depend on nuisance parameters and, in particular, on the complexity of C. Define the affine space M₀ = {μ ∈ M : μ = Xβ and Rβ = r} and let M₁ = M \ M₀ = {μ ∈ M : μ = Xβ and Rβ ≠ r}. Adopting these definitions, the above testing problem can also be written as

H₀: μ ∈ M₀, 0 < σ² < ∞, Σ ∈ C   vs.   H₁: μ ∈ M₁, 0 < σ² < ∞, Σ ∈ C. (4)

Two remarks are in order: First, the Gaussianity assumption is not really a restriction for the negative results in the paper, since they hold a fortiori in any enlarged model that allows not only for Gaussian but also for non-Gaussian disturbances. Furthermore, a large portion of the results in the paper (positive or negative) continues to hold for certain classes of non-Gaussian distributions such as, e.g., elliptical distributions, see Subsection 5.5. Second, if X were allowed to be stochastic but independent of U, the results of the paper apply to size and power conditional on X. Because X is observable, one could then argue in the spirit of conditional inference (see, e.g., Robinson (1979)) that conditional size and power, and not their unconditional counterparts, are the more relevant characteristics of a test.

Recall that a (randomized) test is a Borel-measurable function ϕ from the sample space R^n to [0, 1]. In case ϕ = 1_W, the set W is called the rejection region of the test. As usual, the size of a test ϕ is the supremum over all rejection probabilities under the null hypothesis H₀ and thus is given by sup_{μ₀∈M₀} sup_{0<σ²<∞} sup_{Σ∈C} E_{μ₀,σ²Σ}(ϕ), where E_{μ,σ²Σ} refers to expectation under the probability measure P_{μ,σ²Σ}.

Throughout the paper we shall always reserve the symbol β̂(y) for (X′X)⁻¹X′y, where X is the design matrix appearing in (1) and y ∈ R^n. Furthermore, random vectors and random variables are always written in bold capital and bold lower case letters, respectively. Lebesgue measure on R^n will be denoted by λ_{R^n}, whereas Lebesgue measure on an affine subspace A of R^n (but viewed as a measure on the Borel sets of R^n) will be denoted by λ_A, with zero-dimensional Lebesgue measure being interpreted as point mass. We shall write int(A), cl(A), and bd(A) for the interior, closure, and boundary of a set A ⊆ R^n, respectively, taken with respect to the Euclidean topology. The Euclidean norm is denoted by ‖·‖, while d(x, A) denotes the Euclidean distance of the point x ∈ R^n to the set A ⊆ R^n. Let B′ denote the transpose of a matrix B and let span(B) denote the space spanned by the columns of B. For a linear subspace L of R^n we let L⊥ denote its orthogonal complement and we let Π_L denote the orthogonal projection onto L. For a vector x in Euclidean space we define the symbol ⟨x⟩ to denote ±x for x ≠ 0, the sign being chosen in such a way that the first nonzero component of ⟨x⟩ is positive, and we set ⟨0⟩ = 0. The j-th standard basis vector in R^n is denoted by e_j(n).
The set of real matrices of dimension m × n is denoted by R^{m×n}.
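To fix ideas, the following minimal sketch (in Python with NumPy; all names and numerical choices are our own and purely illustrative, not part of the paper) instantiates the model (1) and the testing problem (3) for a small design with an intercept and a trend, drawing Y from a measure P_{μ,σ²Σ} with a normalized Σ:

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 20, 2

    X = np.column_stack([np.ones(n), np.arange(1.0, n + 1)])  # rank(X) = k, 1 <= k < n
    R = np.array([[0.0, 1.0]])   # q = 1: the restriction involves the second coefficient
    r = np.array([0.0])          # null hypothesis R beta = r

    def beta_hat(y):
        # the symbol beta-hat(y) is reserved for (X'X)^{-1} X'y
        return np.linalg.solve(X.T @ X, X.T @ y)

    def draw_Y(beta, sigma2, Sigma):
        # one draw from the Gaussian measure P_{X beta, sigma^2 Sigma}; Sigma is
        # normalized (first diagonal element 1) so sigma^2 and Sigma are identified
        assert np.isclose(Sigma[0, 0], 1.0)
        root = np.linalg.cholesky(sigma2 * Sigma)
        return X @ beta + root @ rng.standard_normal(n)

    Y = draw_Y(np.array([1.0, 0.0]), 2.0, np.eye(n))  # a point satisfying the null
    print(R @ beta_hat(Y) - r)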
We also introduce the following terminology.

Definition 2.1. Let C be a set of symmetric and positive definite n × n matrices. An l-dimensional linear subspace Z of R^n with 0 ≤ l < n is called a concentration space of C if there exists a sequence (Σ_m)_{m∈N} in C such that Σ_m → Σ̄ and span(Σ̄) = Z.

While we shall in the sequel often refer to C as the covariance model, one should keep in mind that the set of all feasible covariance matrices corresponding to (2) is given by {σ²Σ : 0 < σ² < ∞, Σ ∈ C}. In this context we note that two covariance models C and C* can be equivalent in the sense of giving rise to the same set of feasible covariance matrices, but need not have the same concentration spaces. [Footnote: In applying the general results in Section 5.2 or Corollary 5.17 to a particular problem some skill in choosing between equivalent C and C* may thus be required, as one choice for C may lead to more interesting results than does another choice.]

3 Autocorrelation robust tests

In this section we investigate size and power properties of autocorrelation robust tests that have been designed for use in case of stationary disturbances. Studies of the properties of such tests in the literature (Newey and West (1987, 1994), Andrews (1991), Andrews and Monahan (1992), Kiefer et al. (2000), Kiefer and Vogelsang (2002a,b, 2005), Jansson (2002, 2004), Sun et al. (2008, 2011)) maintain assumptions that allow for nonparametric models for the spectral distribution of the disturbances. For example, a typical nonparametric model results from assuming that the disturbance vector consists of n consecutive elements of a weakly stationary process with spectral density equal to

f(ω) = (2π)⁻¹ |Σ_{j=0}^{∞} c_j exp(−ιjω)|²,

where the coefficients c_j are not all equal to zero and satisfy Σ_{j=0}^{∞} j^ξ |c_j| < ∞ for a fixed ξ ≥ 0. Here ι denotes the imaginary unit. Let F_ξ denote the collection of all such spectral densities f. The corresponding covariance model C_ξ is then given by {Σ(f) : f ∈ F_ξ}, where Σ(f) is the n × n correlation matrix

Σ(f) = ( ∫_{−π}^{π} exp(−ιω(i−j)) f(ω) dω / ∫_{−π}^{π} f(ω) dω )_{i,j=1}^{n}.

Certainly, F_ξ contains all spectral densities of stationary autoregressive moving average models of arbitrarily large order. Hence, the following assumption on the covariance model C that we shall impose for most results in this section is very mild and is satisfied by the typical nonparametric model allowed for in the above mentioned literature. It certainly covers the case where C = C_ξ or where C corresponds to an autoregressive model of order p ≥ 1.

Assumption 1. C_AR(1) ⊆ C.

Here C_AR(1) denotes the set of correlation matrices corresponding to n successive elements of a stationary autoregressive process of order 1, i.e., C_AR(1) = {Λ(ρ) : ρ ∈ (−1, 1)}, where the (i,j)-th entry in the n × n matrix Λ(ρ) is given by ρ^{|i−j|}. As hinted at in the introduction, parameter values (μ, σ², Σ) with Σ = Λ(ρ) where ρ gets close to ±1 while σ² is constant will play an important rôle, as they will be instrumental for establishing the bad size and power properties of the tests presented below. We want to stress here that, as ρ → ±1, the corresponding stationary process does not degenerate, since its variance σ² is held constant. [Footnote: If we parameterized in terms of ρ and the innovation variance σ²_ε = σ²(1 − ρ²), letting ρ → ±1 while holding σ² constant would correspond to σ²_ε → 0. But see also Remark B(i) in Subsection 3.2.2 for a discussion that holding σ² constant is actually not a restriction.]

For later use we note that under Assumption 1 the matrices e₊e₊′ and e₋e₋′ are limit points of the covariance model C, where e₊ = (1, ..., 1)′ and e₋ = (−1, 1, ..., (−1)ⁿ)′ are n × 1 vectors (Λ(ρ_m) converges to e₊e₊′ (e₋e₋′, respectively) if ρ_m → 1 (ρ_m → −1, respectively)). Other singular limit points of C are possible, but e₊e₊′ and e₋e₋′ are the only singular limit points of C_AR(1).
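These limit points are easily checked numerically; the following small sketch (our own code, using Λ(ρ) as defined above) computes the distance of Λ(ρ) to e₊e₊′ and of Λ(−ρ) to e₋e₋′ as ρ increases toward 1:

    import numpy as np

    def Lambda(rho, n):
        # AR(1) correlation matrix with (i,j) entry rho^{|i-j|}
        idx = np.arange(n)
        return rho ** np.abs(idx[:, None] - idx[None, :])

    n = 10
    e_plus = np.ones(n)
    e_minus = (-1.0) ** np.arange(1, n + 1)   # (-1, 1, ..., (-1)^n)'

    for rho in (0.9, 0.99, 0.999):
        print(rho,
              np.max(np.abs(Lambda(rho, n) - np.outer(e_plus, e_plus))),
              np.max(np.abs(Lambda(-rho, n) - np.outer(e_minus, e_minus))))
    # both distances shrink as rho -> 1, exhibiting the two singular limit points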
3.1 A special case: the location model

Before we present the results for common nonparametrically based autocorrelation robust tests in the next subsection and for parametrically based tests in Subsection 3.3, it is perhaps helpful to gain some understanding for these results from a very special case, namely from the location model. We should, however, warn the reader that only some, but not all, phenomena that we shall later observe in the case of a general regression model occur in the case of the location model, because it represents an oversimplification of the general case. Hence, while gaining intuition in the location model is certainly helpful, this intuition does not paint a complete and faithful picture of the situation in a general regression model.

Consider now the location model, i.e., model (1) with k = 1 and X = e₊. Let Assumption 1 hold and assume that we want to test β = β₀ against the alternative β ≠ β₀. Consider the commonly used autocorrelation robust test statistic

τ_loc(y) = (β̂(y) − β₀)² / ω̂²(y),

where β̂(y) is the arithmetic mean n⁻¹e₊′y and where ω̂²(y) is one of the usual autocorrelation robust estimators for the variance of the least squares estimator. As usual, the null hypothesis is rejected if τ_loc(y) ≥ C for some user-specified critical value C satisfying 0 < C < ∞. For definiteness of the discussion assume that one has chosen the Bartlett estimator, although any estimator based on weights satisfying Assumption 2 given below could be used instead. It is then not difficult to see (cf. Lemma 3.1 given below) that ω̂²(y) is positive, and hence τ_loc(y) is well-defined, except when y is proportional to e₊; in this case we set τ_loc(y) equal to 0, which, of course, is a completely arbitrary choice, but has no effect on the rejection probability of the resulting test as the event that y is proportional to e₊ has probability zero under all the distributions in the model.

Consider now the points (β₀, 1, Λ(ρ)) in the null hypothesis, where we have set σ² = 1 for simplicity and where we let ρ ∈ (−1, 1) converge to 1. Writing P_ρ for P_{e₊β₀,Λ(ρ)}, i.e., for the distribution of the data, observe that under P_ρ the distribution of β̂(y) − β₀ = n⁻¹e₊′y − β₀ is N(0, n⁻²e₊′Λ(ρ)e₊). Noting that Λ(ρ) → e₊e₊′ for ρ → 1, we see that under P_ρ the distribution of the numerator of the test statistic converges weakly for ρ → 1 to a chi-square distribution with one degree of freedom (since n⁻²e₊′Λ(ρ)e₊ → 1). Furthermore, ω̂²(y) is a quadratic form in the residual vector y − e₊β̂(y) = (I_n − n⁻¹e₊e₊′)y, this vector being distributed under P_ρ as N(0, A(ρ)) with A(ρ) = (I_n − n⁻¹e₊e₊′)Λ(ρ)(I_n − n⁻¹e₊e₊′). Now for ρ → 1 the matrix A(ρ) converges to the zero matrix, and therefore the distribution of the residual vector under P_ρ converges to pointmass at zero. Consequently, the distribution of the quadratic form ω̂²(y) under P_ρ collapses to pointmass at zero. But this shows that all of the mass of the distribution of the test statistic τ_loc under P_ρ escapes to infinity for ρ → 1, entailing convergence of the rejection probabilities P_ρ(τ_loc(y) ≥ C) to 1, although the distributions P_ρ correspond to points (β₀, 1, Λ(ρ)) in the null hypothesis. This of course then implies that the size of the test equals 1.

In a similar vein, consider the points (β₀, 1, Λ(ρ)) in the null hypothesis where now ρ converges to −1.
Note that P_ρ then converges weakly to N(e₊β₀, e₋e₋′), which is the distribution of e₊β₀ + e₋g, where g is a standard normal random variable. [Footnote: To see this note that the covariance function of the disturbances converges to that of a (very simple) harmonic process as ρ → ±1. In view of Gaussianity, this implies convergence of finite-dimensional distributions and hence weak convergence of the entire process, cf. Billingsley (1968), p. 19.] Similar computations as before show that under P_ρ the distribution of the numerator of the test statistic now converges weakly to the distribution of n⁻²(e₊′e₋)²g², and that the distribution of the residual vector converges weakly to the distribution of (I_n − n⁻¹e₊e₊′)e₋g, the weak convergence occurring jointly. Because of ω̂²(y) = ω̂²((I_n − n⁻¹e₊e₊′)y), it follows from the continuous mapping theorem that the distribution of the denominator of the test statistic under P_ρ converges weakly to the distribution of ω̂²((I_n − n⁻¹e₊e₊′)e₋g) (and convergence is joint with the numerator). Note that ω̂²((I_n − n⁻¹e₊e₊′)e₋g) equals ω̂²(e₋ − n⁻¹e₊e₊′e₋)g² by homogeneity of ω̂². Now, if sample size n is even, we see that e₊′e₋ = 0, entailing that the distribution of the test statistic under P_ρ converges to pointmass at zero for ρ → −1 (note that ω̂²(e₋)g² is almost surely positive). As a consequence, if sample size n is even, the rejection probabilities P_ρ(τ_loc(y) ≥ C) converge to zero as ρ → −1 for every C > 0. Next consider the case where n is odd. Then e₊′e₋ = −1, and the distribution of the test statistic under P_ρ now converges to pointmass at n⁻²ω̂⁻²(e₋ + n⁻¹e₊), which is positive (and is well-defined since ω̂²(e₋ + n⁻¹e₊) > 0 holds because e₋ + n⁻¹e₊ is not proportional to e₊). Hence, if n is odd, we learn that the rejection probabilities P_ρ(τ_loc(y) ≥ C) converge to zero or one as ρ → −1, depending on whether C satisfies C > n⁻²ω̂⁻²(e₋ + n⁻¹e₊) or C < n⁻²ω̂⁻²(e₋ + n⁻¹e₊).

In summary we have learned that the size of the autocorrelation robust test in the location model is always equal to one, an "offending" sequence leading to this result being, e.g., (β₀, 1, Λ(ρ)) with ρ → 1. We have also learned that if n is even, or if n is odd and the critical value C is larger than n⁻²ω̂⁻²(e₋ + n⁻¹e₊), the test is severely biased as the rejection probabilities get arbitrarily close to zero in certain parts of the null hypothesis; of course, this implies dismal power properties of the test in certain parts of the alternative hypothesis. The "offending" sequence in this case is again (β₀, 1, Λ(ρ)), but now with ρ → −1.
It is worth noting that in the case where n is odd and C < n⁻²ω̂⁻²(e₋ + n⁻¹e₊) holds, this "offending" sequence does not inform us about biasedness of the test, but rather provides a second sequence along which the null rejection probabilities converge to 1. We note here also that, due to certain invariance properties of the test statistic, in fact any sequence (β₀, σ², Λ(ρ)) with ρ → ±1 and arbitrary behavior of σ², 0 < σ² < ∞, is an "offending" sequence in the same way as (β₀, 1, Λ(ρ)) is. The results obtained above heavily exploit the fact that ρ can be chosen arbitrarily close to ±1 (so that Λ(ρ) becomes singular in the limit). To what extent an assumption restricting the parameter space C in such a way that the matrices Σ ∈ C do not have limit points that are singular can provide an escape route avoiding the size and power problems observed above is discussed in Subsection 3.2.2.

We would like to stress once more that not all cases that can arise in a general regression model (see Theorems 3.3 and 3.7) appear already in the location model discussed above. For example, for other design matrices and/or linear hypotheses to be tested, the rôles of the "offending" sequences ρ → 1 and ρ → −1 can be interchanged.

It is instructive to discuss two heuristic arguments that one might want to use to predict the behavior of the rejection probabilities near the singular points, and to explain why they fail. For the first argument consider parameters (β, σ², Λ(ρ)) with ρ = 1 added to the model. Then the test problem now also contains the problem of testing β = β₀ against β ≠ β₀ in the family P₁ = {P_{e₊β,σ²Λ(1)} : β ∈ R, 0 < σ² < ∞} as a subproblem. Because of Λ(1) = e₊e₊′, this subproblem is equivalent to testing β = β₀ against β ≠ β₀ in the family {N(β, σ²) : β ∈ R, 0 < σ² < ∞} on the basis of a single observation. Obviously, there is no "reasonable" test for the latter testing problem, and thus for the test problem in the family P₁. The intuitively appealing argument now is that the absence of a "reasonable" test in the family P₁ should necessarily imply trouble for tests, and in particular for autocorrelation robust tests, in the original test problem in the family P_orig = {P_{e₊β,σ²Λ(ρ)} : β ∈ R, 0 < σ² < ∞, |ρ| < 1} whenever ρ is close to one. While this argument has some appeal, it seems to rest on some sort of tacit continuity assumption regarding the rejection probabilities at the point ρ = 1, which is unjustified as we now show: If ϕ is any test, i.e., a measurable function on R^n with values in [0, 1], then any test ϕ* that coincides with ϕ on R^n \ span(e₊) has the same rejection probabilities in the model P_orig as has ϕ; and any test ϕ** that coincides with ϕ on span(e₊) has the same rejection probabilities in the model P₁ as has ϕ. This is so since the distributions in P₁ are concentrated on span(e₊), whereas this set is a null set for the distributions in P_orig. As a consequence, the sequence of rejection probabilities of a test ϕ under P_ρ with ρ < 1 (and hence also its limit for ρ → 1) remains unchanged under modifications of ϕ on span(e₊), whereas such a modification will substantially affect the rejection probability under P₁ (e.g., we can make it equal to 0 or to 1 by suitable modifications of ϕ on span(e₊)). This, of course, then shows that rejection probabilities of a test ϕ will in general not be continuous at the point ρ = 1. Put differently, in the case of the test statistic τ_loc the rejection probabilities under P₁ depend only on the (completely arbitrary) way τ_loc is defined on span(e₊), while the rejection probabilities under P_orig are completely unaffected by the way τ_loc is defined on span(e₊). Hence, any attempt to obtain information on the behavior of P_ρ(τ_loc(y) ≥ C) for ρ → 1 from the testing problem in the family P₁ alone is necessarily futile. [At the heart of the matter lies here the fact that, while the distributions in P₁ can be approximated by distributions in P_orig in the sense of weak convergence, this has little consequences for closeness of rejection probabilities in general, especially since the distributions in P₁ and P_orig are orthogonal and the tests one is interested in are not continuous everywhere.]

In a similar way one could try to predict the behavior of the rejection probabilities for ρ → −1 from the testing problem in the family P₋₁ = {P_{e₊β,σ²Λ(−1)} : β ∈ R, 0 < σ² < ∞}, the argument now being as follows: Since n > 1, the parameter β can be estimated without error in the model P₋₁. [Footnote: We stress that the parameters β and σ² are identifiable in the model P₋₁.] Thus, we can test the hypothesis β = β₀ without committing any error, seemingly suggesting that P_ρ(τ_loc(y) ≥ C) should converge to zero for ρ → −1. However, as we have shown above, P_ρ(τ_loc(y) ≥ C) does not always converge to zero for ρ → −1; namely, it converges to one if n is odd and C < n⁻²ω̂⁻²(e₋ + n⁻¹e₊) holds. Summarizing we see that, while the heuristic arguments are interesting, they do not really capture the underlying mechanism; cf. the discussion following Theorem 3.3. Furthermore, the heuristic arguments just discussed are specific to the location model (i.e., to the case X = e₊), whereas severe size distortions can also arise in more general regression models, as will be shown in the next subsection. [Footnote: Note that the arbitrariness in the definition of the test statistic τ_loc(y) on span(e₊) has no effect on the rejection probabilities under the experiment P₋₁. Hence, one could hope to derive the behavior of P_ρ(τ_loc(y) ≥ C) for ρ → −1 from its behavior under P₋₁ by arguing that the map ρ ↦ P_ρ(τ_loc(y) ≥ C) is continuous at ρ = −1. However, this would just amount to reproducing our direct argument given earlier.]
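The size distortion derived above can be reproduced in a few lines. The following Monte Carlo sketch (our own code; Bartlett lag-window, with n, M_n, C, and the ρ grid chosen arbitrarily for illustration) estimates P_ρ(τ_loc(y) ≥ C) in the location model:

    import numpy as np

    rng = np.random.default_rng(1)

    def Lambda(rho, n):
        idx = np.arange(n)
        return rho ** np.abs(idx[:, None] - idx[None, :])

    def bartlett_omega2(u, M):
        # Bartlett estimator of the variance of the sample mean:
        # omega^2 = n^{-1} * sum_{|j|<M} (1 - |j|/M) Gamma_j, Gamma_j = n^{-1} sum u_t u_{t-j}
        n = len(u)
        total = np.dot(u, u) / n
        for j in range(1, min(M, n)):
            total += 2.0 * (1.0 - j / M) * np.dot(u[j:], u[:-j]) / n
        return total / n

    def tau_loc(y, beta0, M):
        b = y.mean()
        om2 = bartlett_omega2(y - b, M)   # residuals from the location model
        return (b - beta0) ** 2 / om2 if om2 > 0 else 0.0

    n, M, C, reps = 40, 8, 3.0, 2000
    for rho in (0.0, 0.9, 0.99, -0.99):
        root = np.linalg.cholesky(Lambda(rho, n) + 1e-12 * np.eye(n))
        rej = np.mean([tau_loc(root @ rng.standard_normal(n), 0.0, M) >= C
                       for _ in range(reps)])
        print(f"rho = {rho:+.2f}: null rejection frequency ~ {rej:.3f}")
    # the frequency climbs toward 1 as rho -> 1 and, since this n is even,
    # falls toward 0 as rho -> -1, matching the discussion above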
3.2 Nonparametrically based autocorrelation robust tests

Commonly used autocorrelation robust tests for the null hypothesis H₀ given by (3) are based on test statistics of the form (Rβ̂(y) − r)′ Ω̂⁻¹(y) (Rβ̂(y) − r), with the statistic typically being undefined if Ω̂(y) is singular. Here

Ω̂(y) = n R(X′X)⁻¹ Ψ̂(y) (X′X)⁻¹ R′ (5)

and Ψ̂ is a nonparametric estimator for n⁻¹E(X′UU′X). The type of estimator Ψ̂ we consider in this subsection is obtained as a weighted sum of sample autocovariances of v̂_t(y) = û_t(y)x′_{t·}, where û_t(y) is the t-th coordinate of the least squares residual vector û(y) = y − Xβ̂(y) and x_{t·} denotes the t-th row vector of X. That is,

Ψ̂(y) = Ψ̂_w(y) = Σ_{j=−(n−1)}^{n−1} w(j, n) Γ̂_j(y) (6)

for every y ∈ R^n, with Γ̂_j(y) = n⁻¹ Σ_{t=j+1}^{n} v̂_t(y)v̂_{t−j}(y)′ if j ≥ 0 and Γ̂_j(y) = Γ̂_{−j}(y)′ else. The associated estimator Ω̂ will be denoted by Ω̂_w. We make the following assumption on the weights.

Assumption 2. The weights w(j, n) for j = −(n−1), ..., n−1 are data-independent and satisfy w(0, n) = 1 as well as w(−j, n) = w(j, n). Furthermore, the symmetric n × n Toeplitz matrix W_n with elements w(i − j, n) is positive definite. [Footnote: For the case where W_n is only nonnegative definite see Subsection 3.2.1.]

The positive definiteness assumption on W_n is weaker than the frequently employed assumption that the Fourier transform w†(ω) of the weights is nonnegative for all ω ∈ [−π, π]. [Footnote: Note that the quadratic form α′W_nα can be represented as ∫_{−π}^{π} |Σ_{j=1}^{n} α_j exp(ιjω)|² w†(ω) dω. If w†(ω) ≥ 0 for all ω ∈ [−π, π] is assumed, the integrand is nonnegative; and if α ≠ 0 it is positive almost everywhere (since it is then a product of two nontrivial trigonometric polynomials).] It certainly implies that Ψ̂_w(y), and hence Ω̂_w(y), is always nonnegative definite, but it will allow us to show more, see Lemma 3.1 below. In many applications the weights take the form w(j, n) = w(|j|/M_n), where the lag-window w is an even function with w(0) = 1 and where M_n > 0 is a bandwidth (truncation lag) parameter. The positive definiteness of W_n is then satisfied, e.g., for the Bartlett lag-window; see Anderson (1971) or Hannan (1970) for more discussion. It is also satisfied for many exponentiated lag-windows as used in Phillips et al. (2006, 2007) and Sun et al. (2011). [Footnote: The estimator in Keener et al. (1991) coincides with (n times) the estimator given by (5) if the rectangular lag-window is used and R = I_k.]

In the typical asymptotic analysis of this sort of tests in the literature, the event where the estimator Ω̂_w is singular is asymptotically negligible (as Ω̂_w converges to a positive definite or almost surely positive definite matrix), and hence there is no need to be specific about the definition of the test statistic on this event. However, if one is concerned with finite-sample properties, one has to think about the definition of the test statistic also in the case where Ω̂_w(y) is singular.
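For concreteness, the estimator (6) and the matrix W_n from Assumption 2 can be implemented in a few lines (our own sketch; the Bartlett lag-window serves merely as an example of weights satisfying the assumption):

    import numpy as np

    def Psi_w(y, X, w):
        # weighted autocovariance estimator (6), built from v_t = u_t(y) * x_t'
        n, k = X.shape
        u = y - X @ np.linalg.solve(X.T @ X, X.T @ y)   # least squares residuals
        V = u[:, None] * X                               # t-th row is v_t'
        Psi = np.zeros((k, k))
        for j in range(n):
            G = V[j:].T @ V[:n - j] / n                  # Gamma_j(y)
            Psi += w[j] * (G if j == 0 else G + G.T)     # uses w(-j, n) = w(j, n)
        return Psi

    n, M = 30, 6
    w = np.maximum(0.0, 1.0 - np.arange(n) / M)          # Bartlett: w(0, n) = 1
    idx = np.arange(n)
    W_n = w[np.abs(idx[:, None] - idx[None, :])]         # Toeplitz matrix of Assumption 2
    print(np.linalg.eigvalsh(W_n).min() > 0)             # True: W_n is positive definite here

    rng = np.random.default_rng(2)
    X = np.column_stack([np.ones(n), rng.standard_normal(n)])
    eig_min = np.linalg.eigvalsh(Psi_w(rng.standard_normal(n), X, w)).min()
    print(eig_min >= -1e-10)                             # Psi_w is nonnegative definite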
We thus define the test statistic as follows:

T(y) = (Rβ̂(y) − r)′ Ω̂_w⁻¹(y) (Rβ̂(y) − r) if det Ω̂_w(y) ≠ 0, and T(y) = 0 if det Ω̂_w(y) = 0. (7)

[Footnote: Some authors (e.g., Kiefer and Vogelsang (2002b, 2005)) choose to normalize also by q, the number of restrictions to be tested. This is of course immaterial as long as one accordingly adjusts the critical value.]

Of course, assigning the test statistic T the value zero on the set where Ω̂_w(y) is singular is arbitrary. However, it will be irrelevant for size and power properties of the test provided we can ensure that the set of y ∈ R^n for which det Ω̂_w(y) = 0 holds is a λ_{R^n}-null set (since all relevant distributions P_{μ,σ²Σ} are absolutely continuous w.r.t. λ_{R^n} due to the fact that every Σ ∈ C is positive definite by assumption). We thus need to study under which circumstances this is ensured. This will be done in the subsequent lemma. It will prove useful to introduce the following matrix for every y ∈ R^n:

B(y) = R(X′X)⁻¹X′ diag(û₁(y), ..., û_n(y)) = R(X′X)⁻¹X′ diag(e₁′(n)Π_{span(X)⊥} y, ..., e_n′(n)Π_{span(X)⊥} y), (8)

as well as the following assumption on the design matrix X (and on the restriction matrix R):

Assumption 3. Let 1 ≤ i₁ < ... < i_s ≤ n denote all the indices for which e_{i_j}(n) ∈ span(X) holds, where e_j(n) denotes the j-th standard basis vector in R^n. If no such index exists, set s = 0. Let X′(¬(i₁, ..., i_s)) denote the matrix which is obtained from X′ by deleting all columns with indices i₁, ..., i_s (if s = 0 no column is deleted). Then rank(R(X′X)⁻¹X′(¬(i₁, ..., i_s))) = q holds.

The lemma is now as follows. Note that the matrix B(y) does not depend on the weights w(j, n).

Lemma 3.1. Suppose Assumption 2 is satisfied. Then the following holds:
1. Ω̂_w(y) is nonnegative definite for every y ∈ R^n.
2. Ω̂_w(y) is singular if and only if rank(B(y)) < q.
3. Ω̂_w(y) = 0 if and only if B(y) = 0.
4. The set of all y ∈ R^n for which Ω̂_w(y) is singular (or, equivalently, for which rank(B(y)) < q) is either a λ_{R^n}-null set or the entire sample space R^n. The latter occurs if and only if Assumption 3 is violated.

Remark 3.2. (i) Setting R = X′X and q = k shows that a necessary and sufficient condition for Ψ̂_w to be λ_{R^n}-almost everywhere nonsingular is that e_i(n) ∉ span(X) for all i = 1, ..., n. [If this condition is not satisfied, Ψ̂_w(y) is singular for every y ∈ R^n.] In particular, it follows that under this simple condition Ω̂_w(y) is nonsingular λ_{R^n}-almost everywhere for every choice of the restriction matrix R.
(ii) In the case q = 1, Assumption 3 is easily seen to be violated if and only if R(X′X)⁻¹X′e_i(n) = 0 or e_i(n) ∈ span(X) holds for every i = 1, ..., n.
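Both the matrix B(y) from (8) and Assumption 3 are straightforward to check numerically for a concrete pair (X, R). The sketch below is our own code (helper names and the example design are ours); it tests Assumption 3 by deleting the columns of X′ whose index corresponds to a standard basis vector lying in span(X):

    import numpy as np

    def B_matrix(y, X, R):
        # the matrix B(y) from (8): R (X'X)^{-1} X' diag(u_1(y), ..., u_n(y))
        u = y - X @ np.linalg.solve(X.T @ X, X.T @ y)
        return R @ np.linalg.solve(X.T @ X, X.T) @ np.diag(u)

    def assumption3_holds(X, R, tol=1e-10):
        # Assumption 3: after deleting the columns of X' with e_i(n) in span(X),
        # the matrix R (X'X)^{-1} X'(remaining columns) must still have rank q
        n, k = X.shape
        P = X @ np.linalg.solve(X.T @ X, X.T)            # projection onto span(X)
        keep = [i for i in range(n)
                if not np.allclose(P[:, i], np.eye(n)[:, i], atol=tol)]
        A = np.linalg.solve(X.T @ X, X[keep].T)          # (X'X)^{-1} X'(kept columns)
        return np.linalg.matrix_rank(R @ A, tol=tol) == R.shape[0]

    rng = np.random.default_rng(2)
    n, k = 15, 3
    X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])
    R = np.array([[0.0, 1.0, 0.0]])
    print(assumption3_holds(X, R))                       # generically True
    y = rng.standard_normal(n)
    print(np.linalg.matrix_rank(B_matrix(y, X, R)) == R.shape[0])  # full rank a.e.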
We learn from the preceding lemma that, provided Assumption 3 is satisfied (which only depends on X and R and hence can be verified by the user), our choice of defining the test statistic T to be zero on the set where Ω̂_w is singular is immaterial and has no effect on the size and power properties of the test. We also learn from that lemma that, in case Assumption 3 is violated, the commonly used autocorrelation robust tests break down completely in a trivial way as Ω̂_w(y) is then singular for every data point y. We are therefore forced to impose Assumption 3 on the design matrix X if we want commonly used autocorrelation robust tests to make any sense at all. We shall thus impose Assumption 3 in the following development. We also note that, given a restriction matrix R, the set of design matrices that lead to a violation of Assumption 3 is a "thin" subset in the set of all n × k matrices of full rank.

As usual, the test based on T rejects H₀ if T(y) ≥ C, where C > 0 is a critical value obtained from the asymptotic distribution of T (derived either under assumptions that guarantee consistency of Ω̂_w or under the assumption of a "fixed bandwidth", i.e., M_n/n constant as n goes to infinity). In the subsequent theorem, which discusses size and power properties of autocorrelation robust tests based on T, we allow for arbitrary (nonrandom) critical values C > 0. [Footnote: Because the theorem is a finite-sample result, we are free to imagine that C depends on sample size n. In fact, there is nothing in the theory that prohibits us from imagining that C depends even on the design matrix X, on the restriction given by (R, r), or on the weights w(j, n).] Because of this, and since the theorem is a finite-sample result, it applies equally well to standard autocorrelation robust tests (for which one fancies that M_n → ∞ and M_n/n → 0 as n would increase to infinity) and to so-called "fixed-bandwidth" tests (which assume M_n/n constant).

Theorem 3.3. Suppose Assumptions 1, 2, and 3 are satisfied. Let T be the test statistic defined in (7) with Ψ̂_w as in (6). Let W(C) = {y ∈ R^n : T(y) ≥ C} be the rejection region, where C is a real number satisfying 0 < C < ∞. Then the following holds:

1. Suppose rank(B(e₊)) = q and T(e₊ + μ*) > C hold for some (and hence all) μ* ∈ M₀, or rank(B(e₋)) = q and T(e₋ + μ*) > C hold for some (and hence all) μ* ∈ M₀. Then

sup_{Σ∈C} P_{μ₀,σ²Σ}(W(C)) = 1 (9)

holds for every μ₀ ∈ M₀ and every 0 < σ² < ∞. In particular, the size of the test is equal to one.

2. Suppose rank(B(e₊)) = q and T(e₊ + μ*) < C hold for some (and hence all) μ* ∈ M₀, or rank(B(e₋)) = q and T(e₋ + μ*) < C hold for some (and hence all) μ* ∈ M₀. Then

inf_{Σ∈C} P_{μ,σ²Σ}(W(C)) = 0 (10)

holds for every μ ∈ M and every 0 < σ² < ∞, and hence inf_{μ₁∈M₁} inf_{Σ∈C} P_{μ₁,σ²Σ}(W(C)) = 0 holds for every 0 < σ² < ∞. In particular, the test is biased. Furthermore, the nuisance-infimal rejection probability at every point μ₁ ∈ M₁ is zero, i.e., inf_{0<σ²<∞} inf_{Σ∈C} P_{μ₁,σ²Σ}(W(C)) = 0. In particular, the infimal power of the test is equal to zero.

3. Suppose B(e₊) = 0 and Rβ̂(e₊) ≠ 0 hold, or B(e₋) = 0 and Rβ̂(e₋) ≠ 0 hold. Then

sup_{Σ∈C} P_{μ₀,σ²Σ}(W(C)) = 1 (11)

holds for every μ₀ ∈ M₀ and every 0 < σ² < ∞. In particular, the size of the test is equal to one.

Remark 3.4. (i) As a point of interest we note that the rejection probabilities P_{μ,σ²Σ}(W(C)) can be shown to depend on (μ, σ², Σ) only through ((Rβ − r)/σ, Σ) (in fact, only through (⟨(Rβ − r)/σ⟩, Σ)), see Lemma A.1 in Appendix A.
(ii) Because of (i), the rejection probabilities P_{μ₀,σ²Σ}(W(C)) are constant w.r.t. (μ₀, σ²) ∈ M₀ × (0, ∞) for every Σ ∈ C. Consequently, we could have equivalently written (9) and (11) by inserting an infimum over (μ₀, σ²) ∈ M₀ × (0, ∞) in between the supremum and P_{μ₀,σ²Σ}(W(C)). Similarly, we could have inserted a supremum over (μ₀, σ²) ∈ M₀ × (0, ∞) in between the infimum and P_{μ₀,σ²Σ}(W(C)) in (10). A similar remark also applies to other results in the paper such as, e.g., Theorems 3.12, 3.15, 4.2, and Corollary 5.17.
(iii) Although trivial, it is useful to note that the conclusions of the preceding theorem also apply to any rejection region W* ∈ B(R^n) which differs from W(C) by a λ_{R^n}-null set.
(iv) By the way T is defined in (7), the condition T(e₊ + μ*) > C (T(e₋ + μ*) > C, respectively) in Part 1 of the preceding theorem already implies rank(B(e₊)) = q (rank(B(e₋)) = q, respectively). For reasons of comparability with Part 2 we have nevertheless included this rank condition in the formulation of Part 1.

Remark 3.5. (i) Inspection of the proof of Theorem 3.3 shows that Assumption 1 can obviously be weakened to the assumption that C contains AR(1) correlation matrices Λ(ρ_m^{(1)}) and Λ(ρ_m^{(2)}) for two sequences ρ_m^{(i)} ∈ (−1, 1) with ρ_m^{(1)} → 1 and ρ_m^{(2)} → −1. In fact, this can be further weakened to the assumption that there exist Σ_m^{(i)} ∈ C with Σ_m^{(1)} → e₊e₊′ and Σ_m^{(2)} → e₋e₋′ for m → ∞.
(ii) For a discussion of how Theorem 3.3 has to be modified in case only e₊e₊′ (or e₋e₋′) arises as a singular accumulation point of C, see Subsection 3.2.2.

The conditions in Parts 1-3 of the theorem only depend on the design matrix X, the restriction (R, r), the vector e₊ (e₋, respectively), the critical value C, and the weights w(j, n) (via T(e₊ + μ*) or T(e₋ + μ*), respectively). Hence, in any particular application it can be decided whether (and which of) these conditions are satisfied. Furthermore, as will become transparent from the examples to follow and from Proposition 3.6 below, in the majority of applications at least one of these conditions will be satisfied, implying that common autocorrelation robust tests have size 1 and/or have power arbitrarily close to 0 in certain parts of the alternative hypothesis. Before we turn to these examples, we want to provide some intuition for Theorem 3.3: Consider a sequence ρ_m ∈ (−1, 1) with ρ_m → 1 (ρ_m → −1, respectively) as m → ∞. Then Σ_m = Λ(ρ_m) ∈ C by Assumption 1, and Λ(ρ_m) → e₊e₊′ (e₋e₋′, respectively) holds. Consequently, P_{μ₀,σ²Σ_m} concentrates more and more around the one-dimensional affine space μ₀ + span(e₊) (μ₀ + span(e₋), respectively) in the sense that it converges weakly to the singular Gaussian distribution P_{μ₀,σ²e₊e₊′} (P_{μ₀,σ²e₋e₋′}, respectively). The conditions in Part 1 (or Part 3) of the preceding theorem then essentially allow one to show that (i) the measure P_{μ₀,σ²e₊e₊′} (P_{μ₀,σ²e₋e₋′}, respectively) is supported by W(C) (more precisely, after W(C) has been modified by a suitable λ_{R^n}-null set), and (ii) that P_{μ₀,σ²e₊e₊′} (P_{μ₀,σ²e₋e₋′}, respectively) puts no mass on the boundary of the (modified) set W(C). By the Portmanteau theorem we can then conclude that the sequence of measures P_{μ₀,σ²Σ_m} puts more and more mass on W(C) in the sense that P_{μ₀,σ²Σ_m}(W(C)) → 1 as m → ∞, which establishes the conclusion of Part 1 of the theorem. The proof of the first claim in Part 2 works along similar lines, but where concentration is now on the complement of the rejection region W(C). For more discussion see Subsection 5.2. The remaining results in Part 2 are obtained from the first claim in Part 2 exploiting invariance and continuity properties of the rejection probabilities. While concentration of the probability measures P_{μ₀,σ²Σ_m} constitutes an important ingredient in the proof of Theorem 3.3, it should, however, be stressed that there are also other cases (cf. Theorems 3.7 and 3.8) where, despite concentration of P_{μ₀,σ²Σ_m} as above, the conditions for an application of the Portmanteau theorem are not satisfied; in fact, in some of these cases size less than 1 and infimal power greater than 0 can be established.
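Since the conditions of Theorem 3.3 are determined by X, (R, r), C, and the weights alone, they can be checked mechanically in a given application. A minimal sketch of such a check follows (our own code; Bartlett weights, a randomly drawn design, and the value C = 2 are illustrative choices). By Lemma 3.1, a positive value of T at e₋ + μ* certifies rank(B(e₋)) = q, so Part 1 or Part 2 of the theorem applies according to whether this value exceeds or falls below C:

    import numpy as np

    def T_stat(y, X, R, r, w):
        # the statistic (7) with Psi_w from (6); returns 0 if Omega_w(y) is singular
        n = X.shape[0]
        XtX_inv = np.linalg.inv(X.T @ X)
        beta = XtX_inv @ X.T @ y
        V = (y - X @ beta)[:, None] * X
        Psi = sum(w[j] * ((V[j:].T @ V[:n - j]) / n if j == 0
                          else (V[j:].T @ V[:n - j] + V[:n - j].T @ V[j:]) / n)
                  for j in range(n))
        Omega = n * R @ XtX_inv @ Psi @ XtX_inv @ R.T
        if abs(np.linalg.det(Omega)) < 1e-12:
            return 0.0
        d = R @ beta - r
        return float(d @ np.linalg.solve(Omega, d))

    rng = np.random.default_rng(3)
    n, M, C = 20, 5, 2.0
    X = np.column_stack([np.ones(n), rng.standard_normal(n)])  # intercept plus one regressor
    R, r = np.array([[0.0, 1.0]]), np.array([0.0])             # tests the slope coefficient
    w = np.maximum(0.0, 1.0 - np.arange(n) / M)                # Bartlett weights
    e_minus = (-1.0) ** np.arange(1, n + 1)
    mu_star = np.zeros(n)                                      # lies in M_0 since r = 0
    t_val = T_stat(e_minus + mu_star, X, R, r, w)
    print(t_val, "size one" if t_val > C else "infimal power zero")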
Example 3.1. (Testing a restriction involving the intercept) Suppose that Assumptions 1, 2, and 3 hold. For definiteness assume that the first column of X corresponds to the intercept (i.e., the first column of X is e₊). Assume also that the restriction involves the intercept, i.e., the first column of R is nonzero. Then it is easy to see that B(e₊) = 0 and Rβ̂(e₊) ≠ 0 hold (the latter since β̂(e₊) = e₁(k)). Consequently, Part 3 of Theorem 3.3 applies and shows that the size of the test T is always 1. Additionally, the power deficiency results in Part 2 of the theorem will apply whenever rank(B(e₋)) = q and T(e₋ + μ*) < C hold. [Whether or not this is the case will depend on C, X, R, and the weights.]

Example 3.2. (Location model) Suppose that Assumptions 1 and 2 hold. Suppose X = e₊ and the hypothesis is β = β₀ (hence k = q = 1). As just noted in Example 3.1, the size of the test T is then always 1 (as Assumption 3 is certainly satisfied). [Footnote: The discussion in this example so far just reproduces results obtained in Subsection 3.1.] In this simple model the conditions for the power deficiencies to arise can be made more explicit: Note that B(e₋) ≠ 0 clearly always holds, and hence rank(B(e₋)) = 1 = q. If n is even, it is also easy to see that T(e₋ + β₀e₊) = 0 < C always holds. Consequently, Part 2 of Theorem 3.3 applies and shows that the power of the test gets arbitrarily close to zero in certain parts of the parameter space as described in the theorem. If n is odd, then T(e₋ + β₀e₊) = n⁻¹Ψ̂_w⁻¹(e₋ + n⁻¹e₊), and the same conclusion applies provided this quantity is less than C. For example, for the (modified) Bartlett lag-window numerical computations show that this quantity is less than 1.563 for every odd n over a wide range of sample sizes and bandwidth choices M_n; hence the same conclusion then applies whenever C has been chosen to be larger than or equal to 1.563.
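The numerical claim just made can be probed directly; the following sketch is our own code (Bartlett lag-window with w(j) = (1 − |j|/M)₊ and a small grid of odd n and bandwidths M < n as illustrative choices; the precise range of validity of the 1.563 bound is as stated in the text):

    import numpy as np

    def psi_w(u, M):
        # scalar version of the estimator (6) for the location model, Bartlett window
        n = len(u)
        psi = np.dot(u, u) / n
        for j in range(1, min(M, n)):
            psi += 2.0 * (1.0 - j / M) * np.dot(u[j:], u[:-j]) / n
        return psi

    worst = 0.0
    for n in range(3, 102, 2):                  # odd sample sizes
        e_plus = np.ones(n)
        e_minus = (-1.0) ** np.arange(1, n + 1)
        u = e_minus + e_plus / n                # residual vector of e_- + beta_0 e_+
        for M in range(2, n):                   # bandwidths with M_n < n
            worst = max(worst, 1.0 / (n * psi_w(u, M)))
    print(worst)  # largest value of n^{-1} Psi_w^{-1}(e_- + n^{-1} e_+) over this grid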
Example 3.3. (Testing a zero restriction on a slope parameter) Consider the same regression model as in Example 3.1 with the same assumptions, but now suppose that the hypothesis is β_i = 0 for some i > 1, i.e., we are interested in testing a slope parameter. Since in this case B(e₊) = 0 and Rβ̂(e₊) = 0 obviously hold, where R = e_i′(k), we need to investigate the behavior of B(e₋) in order to be able to apply Theorem 3.3. If rank(B(e₋)) = 1 holds (which will generically be the case), then size equals 1 in case T(e₋) > C and the power deficiencies arise in case T(e₋) < C.

Example 3.4. (Testing for a change in mean) A special case of the preceding example is the case where k = 2, the first column of X is e₊, and the second column has entries x_t = 0 for 1 ≤ t ≤ t* and x_t = 1 else. We assume t* to be known and to satisfy 1 < t* < n. The hypothesis to be tested is β₂ = 0. It is then easy to see that Assumption 3 is satisfied. Furthermore, some simple computations show that rank(B(e₋)) = q = 1 always holds. Hence, the test T has size 1 if T(e₋) > C, and the power deficiencies arise if T(e₋) < C. In case n as well as n − t* are even, the latter case always arises since T(e₋) = 0 holds. [If n or n − t* is odd, T(e₋) can of course be computed and depends only on n, t*, and the weights (via Ψ̂_w). We omit the details.]
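For the change-in-mean design the decisive quantity T(e₋) can simply be computed. The following sketch (our own code; a compact reimplementation of the statistic (7), Bartlett weights with M_n = 4 as an illustrative choice) evaluates T(e₋) for a few combinations of n and t*:

    import numpy as np

    def T_at(y, X, R, r, w):
        # the statistic (7), as in the earlier sketches
        n = X.shape[0]
        XtX_inv = np.linalg.inv(X.T @ X)
        beta = XtX_inv @ X.T @ y
        V = (y - X @ beta)[:, None] * X
        Psi = sum(w[j] * (V[j:].T @ V[:n - j] + (V[:n - j].T @ V[j:] if j else 0)) / n
                  for j in range(n))
        Om = n * R @ XtX_inv @ Psi @ XtX_inv @ R.T
        d = R @ beta - r
        return 0.0 if abs(np.linalg.det(Om)) < 1e-12 else float(d @ np.linalg.solve(Om, d))

    R, r = np.array([[0.0, 1.0]]), np.array([0.0])
    for n, t_star in [(12, 4), (12, 6), (12, 5), (13, 4)]:
        X = np.column_stack([np.ones(n), (np.arange(1, n + 1) > t_star).astype(float)])
        w = np.maximum(0.0, 1.0 - np.arange(n) / 4)
        e_minus = (-1.0) ** np.arange(1, n + 1)
        print(n, t_star, T_at(e_minus, X, R, r, w))
    # T(e_-) = 0 whenever n and n - t_star are both even, as claimed in the example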
The cases in Theorem 3.3 leading to size 1 or to power deficiencies of the test based on T, while not being exhaustive, are often satisfied in applications. We make this formal in the subsequent proposition in that we prove that, for given restriction (R, r) and critical value C, the conditions in Theorem 3.3 involving X are generically satisfied. The first part of the proposition shows that these conditions are generically satisfied in the universe of all possible n × k design matrices of rank k. Parts 2 and 3 show that the same is true if we impose that the regression model has to contain an intercept. In the subsequent proposition the dependence of B(y), of T(y), as well as of Ω̂_w(y) on X will be important, and thus we shall write B_X(y), T_X(y), and Ω̂_{w,X}(y) for these quantities in the result to follow.

Proposition 3.6. Suppose Assumption 1 holds. Fix (R, r) with rank(R) = q, fix 0 < C < ∞, and fix the weights w(j, n), which are assumed to satisfy Assumption 2. Let T be the test statistic defined in (7) with Ψ̂_w as in (6) and let μ* ∈ M₀ be arbitrary.

1. Define X = {X ∈ R^{n×k} : rank(X) = k}, X₁(e₊) = {X ∈ X : rank(B_X(e₊)) < q}, and X₂(e₊) = {X ∈ X \ X₁(e₊) : T_X(e₊ + μ*) = C}, and similarly define X₁(e₋), X₂(e₋). [Note that X₂(e₊) and X₂(e₋) do not depend on the choice of μ*.] Then X₁(e₊), X₂(e₊), X₁(e₋), and X₂(e₋) are λ_{R^{n×k}}-null sets. The set of all design matrices X ∈ X for which Theorem 3.3 does not apply is a subset of (X₁(e₊) ∪ X₂(e₊)) ∩ (X₁(e₋) ∪ X₂(e₋)) and hence is a λ_{R^{n×k}}-null set. It thus is a "negligible" subset of X in view of the fact that X differs from R^{n×k} only by a λ_{R^{n×k}}-null set.

2. Suppose k ≥ 2, X has e₊ as its first column, i.e., X = (e₊, X̃), and suppose the first column of R consists of zeros only. Define X̃ = {X̃ ∈ R^{n×(k−1)} : rank((e₊, X̃)) = k}, X̃₁(e₋) = {X̃ ∈ X̃ : rank(B_{(e₊,X̃)}(e₋)) < q}, and X̃₂(e₋) = {X̃ ∈ X̃ \ X̃₁(e₋) : T_{(e₊,X̃)}(e₋ + μ*) = C}, and note that X̃₂(e₋) does not depend on the choice of μ*. Then X̃₁(e₋) and X̃₂(e₋) are λ_{R^{n×(k−1)}}-null sets (with the analogously defined sets X̃₁(e₊) and X̃₂(e₊) satisfying X̃₁(e₊) = X̃ and X̃₂(e₊) = ∅). The set of all matrices X̃ ∈ X̃ such that Theorem 3.3 does not apply to the design matrix X = (e₊, X̃) is a subset of X̃₁(e₋) ∪ X̃₂(e₋) and hence is a λ_{R^{n×(k−1)}}-null set. It thus is a "negligible" subset of X̃ in view of the fact that X̃ differs from R^{n×(k−1)} only by a λ_{R^{n×(k−1)}}-null set.

3. Suppose k ≥ 2, X = (e₊, X̃), and suppose the first column of R is nonzero. Then Theorem 3.3 applies to the design matrix X = (e₊, X̃) for every X̃ ∈ X̃ (provided X satisfies Assumption 3). [Footnote: If X does not satisfy Assumption 3, then the test breaks down in a trivial way as already discussed.]

The proof of the proposition actually shows more, namely that the set of design matrices for which Theorem 3.3 does not apply is contained in an algebraic set. We also remark that if the regressor matrix X is viewed as randomly drawn from a distribution that is absolutely continuous w.r.t. λ_{R^{n×k}}, Proposition 3.6 implies that the conditions of Theorem 3.3 are then almost surely satisfied; if X is also independent of U, Theorem 3.3 then establishes negative results for the conditional rejection probabilities for almost all realizations of X.

We next discuss an exceptional case to which Theorem 3.3 does not apply and which is interesting in that a positive result can be established, at least if the covariance model C is assumed to be C_AR(1) or is approximated by C_AR(1) near the singular points in the sense of Remark 3.10(i) below. This positive result will then guide us to an improved version of the test statistic T.

Theorem 3.7.
Suppose C = C_AR(1) and suppose Assumptions 2 and 3 are satisfied. Let T be the test statistic defined in (7) with Ψ̂_w as in (6). Let W(C) = {y ∈ R^n : T(y) ≥ C} be the rejection region, where C is a real number satisfying 0 < C < ∞. If e₊, e₋ ∈ M and Rβ̂(e₊) = Rβ̂(e₋) = 0 is satisfied, then the following holds:

1. The size of the rejection region W(C) is strictly less than 1, i.e.,

sup_{μ₀∈M₀} sup_{0<σ²<∞} sup_{−1<ρ<1} P_{μ₀,σ²Λ(ρ)}(W(C)) < 1.

Furthermore,

inf_{μ₀∈M₀} inf_{0<σ²<∞} inf_{−1<ρ<1} P_{μ₀,σ²Λ(ρ)}(W(C)) > 0.

2. The infimal power is bounded away from zero, i.e.,

inf_{μ₁∈M₁} inf_{0<σ²<∞} inf_{−1<ρ<1} P_{μ₁,σ²Λ(ρ)}(W(C)) > 0.

3. For every 0 < c < ∞,

inf_{μ₁∈M₁, 0<σ²<∞, d(μ₁,M₀)/σ ≥ c} P_{μ₁,σ²Λ(ρ_m)}(W(C)) → 1

holds for m → ∞ and for any sequence ρ_m ∈ (−1, 1) satisfying |ρ_m| → 1. Furthermore, for every sequence 0 < c_m < ∞ and every 0 < ε < 1,

inf_{μ₁∈M₁, d(μ₁,M₀) ≥ c_m} inf_{−1+ε ≤ ρ ≤ 1−ε} P_{μ₁,σ²_mΛ(ρ)}(W(C)) → 1

holds for m → ∞ whenever 0 < σ²_m < ∞ and c_m/σ_m → ∞. [The very last statement holds even without the conditions e₊, e₋ ∈ M and Rβ̂(e₊) = Rβ̂(e₋) = 0.]

4. For every δ, 0 < δ < 1, there exists a C(δ), 0 < C(δ) < ∞, such that

sup_{μ₀∈M₀} sup_{0<σ²<∞} sup_{−1<ρ<1} P_{μ₀,σ²Λ(ρ)}(W(C(δ))) ≤ δ.
The first statement of the theorem says that, in contrast to the cases considered in Theorem 3.3, the size of the test T is now bounded away from 1 for any choice of the critical value C. Moreover, the last part of the theorem shows that the size can be controlled to be less than or equal to any prespecified significance level δ by a suitable choice of the critical value C(δ). Because P_{μ₀,σ²Λ(ρ)}(W(C)) does not depend on μ₀ and σ² but only on ρ (see Proposition 5.4), and because this probability can be computed via simulation, the supremum of this probability over μ₀, σ², and ρ can be easily found by a grid search; exploiting monotonicity of the probability with respect to C, the value of C(δ) can then be found by a simple search algorithm. The theorem furthermore shows that, again in contrast to the scenario considered in Theorem 3.3, the infimal power of the test is at least bounded away from zero. The power even approaches 1 if either ‖(Rβ^{(1)} − r)/σ‖ is bounded away from zero and |ρ| → 1, or if ‖(Rβ^{(1)} − r)/σ‖ → ∞ and |ρ| is bounded away from 1. [Here β^{(1)} is the parameter vector corresponding to μ₁. Note that d(μ₁, M₀) is bounded from above as well as from below by multiples of ‖Rβ^{(1)} − r‖, where the constants involved are positive and depend only on X, R, and r.]
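The grid search plus critical-value search just described is easy to implement. The following sketch is our own code (the design containing e₊ and e₋, the ρ grid, the replication count, and the Bartlett weights are all illustrative choices, and Monte Carlo replaces exact computation of the rejection probabilities):

    import numpy as np

    rng = np.random.default_rng(4)
    n, M, reps, delta = 30, 6, 1000, 0.05
    e_plus = np.ones(n)
    e_minus = (-1.0) ** np.arange(1, n + 1)
    X = np.column_stack([e_plus, e_minus, rng.standard_normal(n)])  # e_+, e_- in M
    R, r = np.array([[0.0, 0.0, 1.0]]), np.array([0.0])  # tested coefficient is not
    w = np.maximum(0.0, 1.0 - np.arange(n) / M)          # that of e_+ or e_-

    def T_stat(y):
        # the statistic (7), as in the earlier sketches
        XtX_inv = np.linalg.inv(X.T @ X)
        beta = XtX_inv @ X.T @ y
        V = (y - X @ beta)[:, None] * X
        Psi = sum(w[j] * (V[j:].T @ V[:n - j] + (V[:n - j].T @ V[j:] if j else 0)) / n
                  for j in range(n))
        Om = n * R @ XtX_inv @ Psi @ XtX_inv @ R.T
        d = R @ beta - r
        return 0.0 if abs(np.linalg.det(Om)) < 1e-12 else float(d @ np.linalg.solve(Om, d))

    def Lambda(rho):
        idx = np.arange(n)
        return rho ** np.abs(idx[:, None] - idx[None, :])

    # null rejection probability depends only on rho: simulate T on a rho grid
    draws = []
    for rho in (-0.99, -0.9, 0.0, 0.9, 0.99):
        root = np.linalg.cholesky(Lambda(rho))
        draws.append(np.array([T_stat(root @ rng.standard_normal(n))
                               for _ in range(reps)]))

    lo, hi = 0.0, 1e6
    for _ in range(80):   # bisection for the smallest C with sup-over-rho size <= delta
        mid = 0.5 * (lo + hi)
        if max(np.mean(Tv >= mid) for Tv in draws) > delta:
            lo = mid
        else:
            hi = mid
    print("approximate C(delta):", hi)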
The preceding theorem required e₊, e₋ ∈ M and Rβ̂(e₊) = Rβ̂(e₋) = 0. To illustrate, these conditions are, e.g., satisfied if e₊ and e₋ constitute the first two columns of the matrix X and the hypothesis tested only involves coefficients β_i with i ≥ 3 (i.e., the first two columns of R are zero). While an intercept will typically be present in a regression model, and thus e₊ appears as one of the regressors (and hence satisfies e₊ ∈ M), e₋ will not necessarily be an element of M, and hence the preceding theorem will not apply. However, the following theorem shows how we can nevertheless extend the same positive results to this case if we apply a simple adjustment to the test statistic T.

Theorem 3.8. Suppose C = C_AR(1) and suppose Assumption 2 is satisfied. Suppose one of the following scenarios applies:

1. e₊ ∈ M with Rβ̂(e₊) = 0 and e₋ ∉ M. Furthermore, k + 1 < n holds and the n × (k+1) matrix X̄ = (X, e₋) (which necessarily has rank k+1) satisfies Assumption 3 relative to the q × (k+1) restriction matrix R̄ = (R, 0). Define β̄(y) = (I_k, 0)(X̄′X̄)⁻¹X̄′y.

2. e₊ ∉ M and e₋ ∈ M with Rβ̂(e₋) = 0. Furthermore, k + 1 < n holds and the n × (k+1) matrix X̄ = (X, e₊) (which necessarily has rank k+1) satisfies Assumption 3 relative to the q × (k+1) restriction matrix R̄ = (R, 0). Define β̄(y) = (I_k, 0)(X̄′X̄)⁻¹X̄′y.

3. e₊ ∉ M and e₋ ∉ M with rank(X, e₊, e₋) = k + 2. Furthermore, k + 2 < n holds and the n × (k+2) matrix X̄ = (X, e₊, e₋) (which necessarily has rank k+2) satisfies Assumption 3 relative to the q × (k+2) restriction matrix R̄ = (R, 0, 0). Define β̄(y) = (I_k, 0, 0)(X̄′X̄)⁻¹X̄′y.

4. e₊ ∉ M and e₋ ∉ M with rank(X, e₊, e₋) = k + 1. Furthermore, k + 1 < n holds and the n × (k+1) matrix X̄ = (X, e₊) (which necessarily has rank k+1) satisfies Assumption 3 relative to the q × (k+1) restriction matrix R̄ = (R, 0). Suppose further that R̄(X̄′X̄)⁻¹X̄′e₋ = 0 holds. Define β̄(y) = (I_k, 0)(X̄′X̄)⁻¹X̄′y.

5. e₊ ∉ M and e₋ ∉ M with rank(X, e₊, e₋) = k + 1. Furthermore, k + 1 < n holds and the n × (k+1) matrix X̄ = (X, e₋) (which necessarily has rank k+1) satisfies Assumption 3 relative to the q × (k+1) restriction matrix R̄ = (R, 0). Suppose further that R̄(X̄′X̄)⁻¹X̄′e₊ = 0 holds. Define β̄(y) = (I_k, 0)(X̄′X̄)⁻¹X̄′y.

In all five scenarios define

T̄(y) = (Rβ̄(y) − r)′ Ω̄_w⁻¹(y)(Rβ̄(y) − r) if det Ω̄_w(y) ≠ 0, and T̄(y) = 0 if det Ω̄_w(y) = 0,

where Ω̄_w(y) = n R̄(X̄′X̄)⁻¹Ψ̄_w(y)(X̄′X̄)⁻¹R̄′, and Ψ̄_w(y) is computed from (6) based on v̄_t(y) = ū_t(y)x̄′_{t·} instead of v̂_t(y). Here ū_t(y) are the residuals from the regression of y on X̄, and x̄_{t·} are the rows of X̄. Let W̄(C) = {y ∈ R^n : T̄(y) ≥ C} be the rejection region, where C is a real number satisfying 0 < C < ∞. Then for each of the five scenarios the conclusions of Theorem 3.7 hold with W(C) replaced by W̄(C).

Theorem 3.3 together with Proposition 3.6 has shown that generically the commonly used test based on the statistic T has severe size or power deficiencies even for C = C_AR(1), while Theorem 3.7 has isolated a special case where this is not so. Theorem 3.8 now shows that in many of the cases falling under the wrath of Theorem 3.3 the ensuing problems can be circumvented (if C = C_AR(1)) by making use of the adjusted version T̄ of the test statistic. The adjustment mechanism is simple and amounts to basing the test statistic on estimators β̄ and Ω̄_w that are obtained from a "working model" that always adds the regressors e₊ and/or e₋ to the design matrix. Note that these regressors effect a purging of the residuals from harmonic components of angular frequency 0 and π. This purging effect, together with the fact that the restrictions to be tested do not involve the coefficients of the "purging" regressors e₊ and e₋, lies at the heart of the positive results expressed in Theorems 3.7 and 3.8.
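A sketch of the adjustment in scenario 1 follows (our own code and example; the working model appends e₋ to the design and pads R with a zero column, so that the tested restriction does not involve the coefficient of the purging regressor):

    import numpy as np

    rng = np.random.default_rng(5)
    n, M = 24, 5
    w = np.maximum(0.0, 1.0 - np.arange(n) / M)
    e_minus = (-1.0) ** np.arange(1, n + 1)

    def T_general(y, X, R, r):
        # the statistic (7) for a given design/restriction, as in the earlier sketches
        XtX_inv = np.linalg.inv(X.T @ X)
        beta = XtX_inv @ X.T @ y
        V = (y - X @ beta)[:, None] * X
        Psi = sum(w[j] * (V[j:].T @ V[:n - j] + (V[:n - j].T @ V[j:] if j else 0)) / n
                  for j in range(n))
        Om = n * R @ XtX_inv @ Psi @ XtX_inv @ R.T
        d = R @ beta - r
        return 0.0 if abs(np.linalg.det(Om)) < 1e-12 else float(d @ np.linalg.solve(Om, d))

    X = np.column_stack([np.ones(n), rng.standard_normal(n)])  # intercept + one regressor
    R, r = np.array([[0.0, 1.0]]), np.array([0.0])             # test the slope (scenario 1)
    X_bar = np.column_stack([X, e_minus])                      # working model adds e_-
    R_bar = np.hstack([R, [[0.0]]])                            # its coefficient is untested

    print("unadjusted T(e_-):  ", T_general(e_minus, X, R, r))        # generically nonzero
    print("adjusted T-bar(e_-):", T_general(e_minus, X_bar, R_bar, r))
    # the working model absorbs e_-: its residual vector vanishes, so the adjusted
    # statistic is no longer driven by the direction along which Lambda(rho) degenerates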
Numerical results that will be presented elsewhere support the theoretical result and show that the adjusted test based on ¯T considerably improves over the unadjusted one based on T.

We next illustrate Theorems 3.7 and 3.8 in the context of Examples 3.1-3.4: In Examples 3.1 and 3.2 we have e₊ ∈ M but Rˆβ(e₊) ≠ 0, hence neither Theorem 3.7 nor Theorem 3.8 is applicable. In contrast, in Example 3.3 we have e₊ ∈ M and Rˆβ(e₊) = 0 since R = e′ᵢ(k) with i > 1. In case e₋ ∉ M, which is the typical case and which is, in particular, satisfied in Example 3.4, we can then use the adjusted test statistic ¯T which is obtained from the auxiliary model using the enlarged design matrix ¯X = (X, e₋). Part 1 of Theorem 3.8 then informs us that the so-adjusted test does not suffer from the severe size/power distortions discussed in Theorem 3.3 for the unadjusted autocorrelation robust test (provided the conditions on ¯X in the theorem are satisfied, which generically will be the case). In case e₋ ∈ M, Theorem 3.7 applies to the problem considered in Example 3.3 whenever Rˆβ(e₋) = 0 holds, showing that in this case already the unadjusted test does not suffer from the severe size/power distortions. Note that here the condition Rˆβ(e₋) = 0 will hold, for example, if e₋ is one of the columns of X and the slope parameter that is subjected to test is not the coefficient of e₋.

Remark 3.9. (i) Suppose the scenario in Part 1 of the above theorem applies except that k + 1 = n holds or ¯X = (X, e₋) does not satisfy Assumption 3. Then the test statistic ¯T is identically zero and the adjustment procedure does not work. A similar remark applies to Parts 2-5.

(ii) Suppose the scenario of Part 4 of the above theorem applies except that ¯R(¯X′¯X)⁻¹¯X′e₋ ≠ 0 holds. Applying Part 3 of Theorem 3.3 to ¯T shows that this test has size 1 and hence the adjustment procedure fails. A similar comment applies to the scenario of Part 5.

Remark 3.10. (i) The results in Theorems 3.7 and 3.8 have assumed C = C_AR(1). The results immediately extend to other covariance models C as long as C is norm-bounded, the only singular accumulation points of C are e₊e₊′ and e₋e₋′, and for every sequence Σₘ ∈ C converging to one of these limit points there exists a sequence (ρₘ)_{m∈N} in (−1, 1) such that Λ^{−1/2}(ρₘ)ΣₘΛ^{−1/2}(ρₘ) → Iₙ for m → ∞ (that is, near the "singular boundary" the covariance model C behaves similarly to C_AR(1)). This can be seen from an inspection of the proof. An extension of Theorems 3.7 and 3.8 to even more general covariance models will be discussed elsewhere.

(ii) For a discussion of a version of Theorem 3.7 for the case where C = C⁺_AR(1) = {Λ(ρ) : 0 ≤ ρ < 1} or C = {Λ(ρ) : −ε < ρ < 1}, ε > 0, see Subsection 3.2.2.
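The behavior of C_AR(1) near its singular accumulation points referred to in Remark 3.10 is easy to check numerically. The sketch below (our illustration, with arbitrary parameter values) builds Λ(ρ) and shows that, as ρ ↑ 1, the spectral mass of Λ(ρ) concentrates on the direction e₊, in line with the limit point e₊e₊′; for ρ ↓ −1 the analogous computation would single out e₋.

```python
import numpy as np

def ar1_corr(rho, n):
    # AR(1) correlation matrix Lambda(rho) with entries rho**|i-j|.
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

n = 20
e_plus = np.ones(n) / np.sqrt(n)
for rho in [0.9, 0.99, 0.999]:
    evals, evecs = np.linalg.eigh(ar1_corr(rho, n))
    # Share of total variance carried by the top eigenvalue, and the
    # alignment of the top eigenvector with e+ (both approach 1).
    print(rho, evals[-1] / evals.sum(), abs(evecs[:, -1] @ e_plus))
```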
We next discuss test statistics of the form (7) that use estimators other than ˆΨ_w.

A. (General quadratic estimators based on ˆv_t) The estimator ˆΨ_w given by (6) is a special case of general quadratic estimators ˆΨ_GQ(y) of the form

ˆΨ_GQ(y) = Σ_{t,s=1}^n w(t, s; n) ˆv_t(y)ˆv_s(y)′  for every y ∈ Rⁿ,

where the n × n weighting matrix W*ₙ = (w(t, s; n))_{t,s} is symmetric and data-independent. While estimators of this more general form have been studied in the early literature on spectral estimation, much of the literature has focused on the special case of weighted autocovariance estimators of the form ˆΨ_w (partly as a consequence of a result in Grenander and Rosenblatt (1957) that the restriction to the smaller class of estimators does not lead to inferior estimators in a certain asymptotic sense). However, if the data are preprocessed by tapering before an estimator like ˆΨ_w is computed from the tapered data, the final estimator belongs to the class of general quadratic estimators. Also, many modern spectral estimators studied in the engineering literature fall into this class (see Thomson (1982)), but not into the more narrow class of weighted autocovariance estimators. Other examples are the estimators proposed in Phillips (2005), Sun (2013), and Zhang and Shao (2013b). We now distinguish two cases:

Case 1:
The weighting matrix W*ₙ = (w(t, s; n))_{t,s} is positive definite. Inspection of the proofs then shows that all results given above for the test based on T with ˆΨ_w remain valid as they stand if ˆΨ_w is replaced by ˆΨ_GQ in the definition of the test statistic.

Case 2:
The weighting matrix W*ₙ = (w(t, s; n))_{t,s} is only assumed to be nonnegative definite (as is, e.g., the case for the estimators considered in Phillips (2005) and Sun (2013)). Arguing similarly as in the proof of Lemma 3.1 one can show the following:

Lemma 3.11.
Suppose W*ₙ = (w(t, s; n))_{t,s} is nonnegative definite and define

ˆΩ_GQ(y) = nR(X′X)⁻¹ˆΨ_GQ(y)(X′X)⁻¹R′.

Then the following hold:

1. ˆΩ_GQ(y) is nonnegative definite for every y ∈ Rⁿ.

2. ˆΩ_GQ(y) is singular if and only if rank(B(y)W*ₙ) < q (or, equivalently, if rank(B(y)W*ₙ^{1/2}) < q holds).
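As an illustration of the class just described (ours, not taken from the cited papers), the following sketch evaluates ˆΨ_GQ for a weighting matrix W*ₙ generated by orthonormal sine tapers; such a W*ₙ is nonnegative definite of rank J and thus falls under Case 2 above. All names and the choice of tapers are illustrative assumptions.

```python
import numpy as np

def psi_gq(V, W):
    # General quadratic estimator: sum_{t,s} w(t,s;n) v_t v_s' = V' W V,
    # where the rows of V play the role of the vectors v_t(y).
    return V.T @ W @ V

def multitaper_weights(n, J):
    # W* = sum_j h_j h_j' for J orthonormal sine tapers; this W* is
    # nonnegative definite (rank J), an instance of Case 2 above.
    t = np.arange(1, n + 1)
    H = np.stack([np.sqrt(2.0 / (n + 1)) * np.sin(np.pi * j * t / (n + 1))
                  for j in range(1, J + 1)])
    return H.T @ H  # n x n

n, k = 50, 2
rng = np.random.default_rng(0)
V = rng.standard_normal((n, k))   # stand-in for the vectors v_t(y)
W = multitaper_weights(n, J=5)
print(psi_gq(V, W))               # k x k, nonnegative definite
```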
A. (i) Assume now that the covariance model is C = C_AR(1)(ε, ε) = {Λ(ρ) : ρ ∈ (−1 + ε, 1 − ε)} for some ε > 0, i.e., ρ is bounded away from ±1. As mentioned above, the size of the test based on T will then be less than 1 and the infimal power will be larger than 0. However, an upshot of Theorem 3.3 still is that the size will be close to 1 and/or the infimal power will be close to 0 for generic design matrices X, provided ε is small (more precisely, for given sample size n this will happen for sufficiently small ε). Hence, even under such an assumption, size/power problems will disappear (or will be moderate) only if one is willing to assume a relatively large ε (in relation to sample size n), making the assumption look even more heroic.

(ii) If C has e₊e₊′ (e₋e₋′, respectively) as its only singular limit point, inspection of the proof of Theorem 3.3 shows that a version of that theorem, in which now every reference to e₋ (e₊, respectively) is deleted, continues to hold. [Of course, size could be reduced to any prescribed value in this situation by increasing the critical value, but this would then come at the price of even further reduced power.] For example, if C = C_AR(1)(ε, 0) = {Λ(ρ) : ρ ∈ (−1 + ε, 1)} with ε > 0, such a version of Theorem 3.3 applies. As an illustration, assume that C = C_AR(1)(ε, 0) and that e₊ ∈ M with Rˆβ(e₊) ≠ 0 holds. Then we can conclude from this version of Theorem 3.3 that the size of the test is equal to 1. Note that this result covers the case of testing in a location model.

(iii) Suppose C has e₊e₊′ as its only singular limit point. Then in the important special case where an intercept is present in the regression and the hypothesis tested does not involve the intercept (in the sense that Rˆβ(e₊) = 0), a positive result (similar to Theorem 3.7) is immediately obtained from Theorem 5.21, namely that the test based on T now has size less than 1 and infimal power larger than 0; moreover, the size can be controlled at any given level δ by an appropriate choice of the critical value C(δ). [To be precise, Assumptions 2 and 3 have to be satisfied, C has to be norm-bounded, and matrices in C that approach e₊e₊′ have to do so in the particular manner required in Theorem 5.21.] An important example, where C has e₊e₊′ as its only singular limit point (and is norm-bounded and satisfies the just mentioned assumption required for Theorem 5.21, cf. Lemma G.1 in Appendix G), is C = C_AR(1)(ε, 0) defined above. While an assumption like C = C_AR(1)(ε, 0) is perhaps a bit more palatable than the assumption C = C_AR(1)(ε, ε), it still imposes an ad hoc restriction on the covariance model C_AR(1) that is debatable, especially if ε is not small (as is, e.g., the case when ρ is restricted to be positive). Furthermore, note that, while the extreme size and power problems (i.e., size equal to one and infimal power equal to zero) are absent in the case we discuss here, less extreme, but nevertheless substantial, size or power problems will generically still be present if ε is small as explained in (i) above. In case there is no intercept in the regression, an appropriate version of Theorem 3.8 can be used to generate an adjusted test by adding the intercept as a regressor, thus bringing one back to the situation just discussed. [With the appropriate modifications, similar remarks apply to the case where e₋e₋′ is the only singular limit point of C.]

(iv) Regarding the preceding discussion in (iii) one should recall that in case C = C_AR(1) Theorems 3.7 and 3.8 show how tests, which have size less than one and infimal power larger than zero, can easily be obtained without any need of bounding ρ away from 1 or −1, and thus without introducing any such ad hoc restrictions on C.
Therefore, it would be desirable to free Theorems 3.7 and 3.8 from the assumption C = C_AR(1). To what extent this can be achieved without introducing implausible assumptions like the ones discussed in the preceding paragraphs will be discussed elsewhere.

B. (i) The results concerning the extreme size distortion and biasedness of the tests under consideration in Theorems 3.3 and 3.15 are obtained by considering "offending" sequences of the form (µ₀, σ², Σₘ) belonging to the null hypothesis where µ₀ ∈ M₀ and where Σₘ converges to e₊e₊′ or e₋e₋′. For example, if Σₘ = Λ(ρₘ) with ρₘ → ±1, then the disturbance processes with covariance matrix σ²Σₘ converge weakly to a harmonic process as discussed subsequent to Assumption 1. However, it follows from Remark 3.4(i) that also the sequences (µ₀, σ²ₘ, Σₘ), where µ₀ and Σₘ are as before and σ²ₘ, 0 < σ²ₘ < ∞, is an arbitrary sequence, are "offending" sequences in the same way. Note that in case Σₘ = Λ(ρₘ) with ρₘ → ±1 the corresponding disturbance processes then need not converge weakly to a harmonic process: As an example, consider the case where one chooses σ²ₘ = σ²_ε(1 − ρ²ₘ)⁻¹ with a constant innovation variance σ²_ε > 0.

(ii) The covariance model C maintained in this section (i.e., Section 3) supposes that the disturbances in the regression model are weakly stationary and that all stationary AR(1) processes are allowed for. For definiteness of the subsequent discussion assume that C = C_AR(1). Now an alternative model assumption could be that the disturbances u_t satisfy u_t = ρu_{t−1} + ε_t, 1 ≤ t ≤ n, where |ρ| < 1, where the innovations ε_t are i.i.d. N(0, σ²_ε), say, and where u₀ is a (possibly random) starting value with mean zero. If u₀ is treated as a fixed random variable (i.e., being the same for all choices of the parameters in the model), then the resulting model is not covered by the results in our paper. [Of course, this does by no means guarantee that usual autocorrelation robust tests have good size and power properties; cf. Footnote 1.] We note, however, that the assumption that u₀ is fixed in the above sense assigns a special meaning to the time point t = 0, and hence may be debatable. Therefore one may rather want to treat u₀, more precisely its distribution, as a further "parameter" of the problem. For example, one could assume that u₀ is N(0, σ²∗)-distributed independently of the innovations ε_t for t ≥ 1, where 0 < σ²∗ < ∞ and where σ²∗ can vary independently of ρ and σ²_ε. But then the resulting covariance model C∗ contains C = C_AR(1) as a subset. Hence, all the results in the paper concerning size equal to 1 or infimal power equal to 0 apply a fortiori to this larger model C∗.

C. In a recent paper Perron and Ren (2011) argue that the impossibility results in Pötscher (2002) for estimating the value of the spectral density at frequency zero are irrelevant in the context of autocorrelation robust testing: In the framework of a Gaussian location model they compare the behavior of common autocorrelation robust tests t_Robust, which are standardized with the help of a spectral density estimate ˆf_n(0), with a benchmark given by the infeasible test statistic t_{f(0)} that uses the value of the unknown spectral density at frequency zero for standardization. They find that common autocorrelation robust tests beat the infeasible test statistic along a sequence of DGPs similar to the ones that have been used in Pötscher (2002) to establish ill-posedness of the spectral density estimation problem. This is certainly true and in fact easy to understand: Consider as another benchmark the infeasible test statistic t_ideal, say, which uses the (unknown) finite-sample variance s²_n of the arithmetic mean for standardization rather than the asymptotic variance 2πf(0), and observe that this statistic is exactly N(0, 1) distributed (under the null) and has well-behaved size and power properties. Because s²_n does in general not converge uniformly to the asymptotic variance 2πf(0) (for the very same reasons that underlie the impossibility result in Pötscher (2002)), t_{f(0)} is not uniformly close to the ideal test t_ideal. The fact that ˆf_n(0) is also not uniformly close to f(0) (due to the ill-posedness results in Pötscher (2002)) is now "helpful" in the sense that it in principle allows for the possibility that 2πˆf_n(0) might be closer to the ideal standardization factor s²_n than is 2πf(0), thus allowing for the possibility that t_Robust might be closer to the ideal test t_ideal than to t_{f(0)}. [Observe that 2πˆf_n(0) as well as s²_n each not being uniformly close to 2πf(0) does in principle not preclude (uniform) closeness between 2πˆf_n(0) and s²_n.] In other words, "aiming" at f(0) in standardizing the test statistic is simply the wrong thing to do. In that sense, the ill-posedness of estimating f(0) is then indeed irrelevant for autocorrelation robust testing (simply because the benchmark t_{f(0)} is irrelevant). As a matter of fact, there is no statement to the contrary in Pötscher (2002): Note that Pötscher (2002) only discusses ill-posedness of the problem of estimating f(0) (considered to be the parameter of interest), and does not make any statements regarding consequences of this ill-posedness for autocorrelation robust tests that use 2πˆf_n(0) as an estimate of the variance nuisance parameter. The claim opening the last but one paragraph on p. 1 in Perron and Ren (2011) is thus simply false. Finally, the preceding discussion begs the question whether or not uniform closeness of 2πˆf_n(0) and s²_n can indeed be established under sufficiently general assumptions on the underlying correlation structure. If possible, this would then immediately transfer the good size and power properties of t_ideal to t_Robust. However, unfortunately this is not possible: Recall from Example 3.2 that in the location model considered in Perron and Ren (2011) the size of common autocorrelation robust tests like t_Robust is always equal to 1.
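The non-uniform closeness of the exact standardization s²_n and the asymptotic variance 2πf(0) invoked above is easy to exhibit numerically. The following sketch (ours) computes n·Var(ȳ) and 2πf(0) for a stationary AR(1) process with unit variance and shows that their ratio degenerates as ρ ↑ 1 at fixed n; the sample size is an arbitrary choice.

```python
import numpy as np

def nvar_mean_ar1(rho, n):
    # n * Var(ybar) for a stationary AR(1) with unit marginal variance:
    # the exact standardization s_n^2 (up to the factor sigma^2 / n).
    j = np.arange(1, n)
    return 1.0 + 2.0 * np.sum((1.0 - j / n) * rho ** j)

def lrv_ar1(rho):
    # 2*pi*f(0) = sum of all autocovariances = (1+rho)/(1-rho).
    return (1.0 + rho) / (1.0 - rho)

n = 100
for rho in [0.5, 0.9, 0.99, 0.999]:
    # The ratio tends to 0 as rho -> 1 for fixed n: the two
    # standardizations are not uniformly close over the AR(1) model.
    print(rho, nvar_mean_ar1(rho, n) / lrv_ar1(rho))
```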
The negative results given in Theorem 3.3 rest on Assumption 1, i.e., C ⊇ C_AR(1), and the fact that there exist sequences Σₘ ∈ C_AR(1) that converge to the singular matrices e₊e₊′ or e₋e₋′, leading to a concentration phenomenon as discussed in the wake of Theorem 3.3. The commonly used nonparametric covariance models like C_ξ discussed at the beginning of Section 3 of course also satisfy C_ξ ⊇ C_AR(p) for every p, where C_AR(p) is the set of all n × n correlation matrices arising from stationary autoregressive processes of order not larger than p. In this case additional singular limit matrices arise which lead to additional conditions under which size equals 1 or infimal power equals 0. We illustrate this shortly for the case where C ⊇ C_AR(2). To this end define for ν ∈ (0, π) the matrix E(ν) as the n × 2 matrix with t-th row equal to (cos(tν), sin(tν)). Furthermore set E(0) = e₊ and E(π) = e₋. In Lemma G.2 in Appendix G we show that the matrices E(ν)E(ν)′ for ν ∈ [0, π] arise as limits of sequences of matrices in C_AR(2). Obviously, E(ν)E(ν)′ is singular whenever n ≥ 3. Specializing ν to the set {0, π} in the subsequent theorem reproduces the conditions appearing in Theorem 3.3 (albeit under the stronger assumptions that C ⊇ C_AR(2) and n ≥ 3).

Theorem 3.12.
Suppose C ⊇ C_AR(2), Assumptions 2 and 3 are satisfied, and n ≥ 3 holds. Let T be the test statistic defined in (7) with ˆΨ_w as in (6). Let W(C) = {y ∈ Rⁿ : T(y) ≥ C} be the rejection region where C is a real number satisfying 0 < C < ∞. Then the following holds:

1. Suppose there exists a ν ∈ [0, π] such that rank(B(z)) = q and T(z + µ∗) > C hold for some (and hence all) µ∗ ∈ M₀ and for λ_{span(E(ν))}-almost all z ∈ span(E(ν)). Then

sup_{Σ∈C} P_{µ₀,σ²Σ}(W(C)) = 1

holds for every µ₀ ∈ M₀ and every 0 < σ² < ∞. In particular, the size of the test is equal to one.

2. Suppose there exists a ν ∈ [0, π] such that rank(B(z)) = q and T(z + µ∗) < C hold for some (and hence all) µ∗ ∈ M₀ and for λ_{span(E(ν))}-almost all z ∈ span(E(ν)). Then

inf_{Σ∈C} P_{µ,σ²Σ}(W(C)) = 0

holds for every µ ∈ M and every 0 < σ² < ∞, and hence

inf_{µ₁∈M₁} inf_{Σ∈C} P_{µ₁,σ²Σ}(W(C)) = 0

holds for every 0 < σ² < ∞. In particular, the test is biased. Furthermore, the nuisance-infimal rejection probability at every point µ₁ ∈ M₁ is zero, i.e.,

inf_{0<σ²<∞} inf_{Σ∈C} P_{µ₁,σ²Σ}(W(C)) = 0.

In particular, the infimal power of the test is equal to zero.

3. Suppose there exists a ν ∈ [0, π] such that B(z) = 0 and Rˆβ(z) ≠ 0 hold for λ_{span(E(ν))}-almost all z ∈ span(E(ν)). Then

sup_{Σ∈C} P_{µ₀,σ²Σ}(W(C)) = 1

holds for every µ₀ ∈ M₀ and every 0 < σ² < ∞. In particular, the size of the test is equal to one.
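The following sketch (ours; the frequency, sample size, and design are arbitrary example choices) constructs E(ν), verifies that E(ν)E(ν)′ is singular for n ≥ 3, and assembles a seasonal design of the kind used in the example discussed next.

```python
import numpy as np

def E(nu, n):
    # n x 2 matrix with t-th row (cos(t*nu), sin(t*nu)); for nu in {0, pi}
    # the second column vanishes and the construction degenerates to e+/e-.
    t = np.arange(1, n + 1)
    return np.column_stack([np.cos(t * nu), np.sin(t * nu)])

n, nu = 12, 2 * np.pi / 4               # quarterly seasonality as an example
En = E(nu, n)
M = En @ En.T                           # candidate singular limit point
print(np.linalg.matrix_rank(M))         # 2 < n: E(nu)E(nu)' is singular
X = np.column_stack([En, np.ones(n)])   # seasonal regressors plus intercept
print(np.linalg.matrix_rank(X))         # full column rank for this design
```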
To illustrate the value added of the preceding theorem when compared to Theorem 3.3 consider the following example: Assume that e₊ and e₋ are both elements of M and Rˆβ(e₊) = Rˆβ(e₋) = 0. Then none of the conditions in Theorem 3.3 are satisfied and thus this theorem is not applicable. Suppose now that the design matrix X contains E(ν) for some ν ∈ (0, π) as a submatrix, i.e., seasonal regressors are included. Without loss of generality assume that X = (E(ν), X⁽²⁾). If we want to test for absence of seasonality at angular frequency ν, this corresponds to R = (I₂, 0) and r = 0. In case Assumption 3 holds, the conditions in Case 3 of the preceding theorem are then obviously satisfied and we conclude that the size of the test for absence of seasonality is equal to one. [In case Assumption 3 is violated, the test breaks down in a trivial way as noted earlier.]

We finally ask what happens if we allow for covariance structures deriving from even higher-order autoregressive models, i.e., C ⊇ C_AR(p) with p >
2. While additional concentration spaces arise and theorems like the one above can be easily obtained from Corollary 5.17, these theorems will often not generate new obstructions to good size and power properties. The reason for this is that any of the newly arising concentration spaces already contains one of the concentration spaces span(E(ν)) for ν ∈ [0, π] as a subset.

The results in Subsection 3.2 were given for autocorrelation robust tests that make use of a nonparametric estimator ˆΩ. In this subsection we show that the phenomena encountered in Subsection 3.2 (size distortions and power deficiencies) are not a consequence of the nonparametric nature of the estimator, but can equally arise if a parametric estimator is being used (and even if the parametric model employed correctly describes the covariance structure of the errors). We illustrate this for the case where the test statistic is obtained from a feasible generalized least squares (GLS) estimator predicated on an AR(1) covariance structure, as well as for the case where the test statistic is obtained from the ordinary least squares (OLS) estimator combined with an estimator for the variance covariance matrix again predicated on the same covariance structure. The theoretical results derived below are in line with Monte Carlo results provided in Park and Mitchell (1980) and Magee (1989).

We start with the estimator ˆρ that will be used in the feasible GLS procedure as well as in the estimator for the variance covariance matrix of the OLS estimator.

Assumption 4.
For a₁ ∈ {1, 2} and a₂ ∈ {n − 1, n} with a₁ ≤ a₂ the estimator ˆρ is of the form

ˆρ(y) = Σ_{t=2}^n ˆu_t(y)ˆu_{t−1}(y) / Σ_{t=a₁}^{a₂} ˆu_t²(y)

for all y ∈ Rⁿ \ N(a₁, a₂) and it is undefined for y ∈ N(a₁, a₂) = {y ∈ Rⁿ : Σ_{t=a₁}^{a₂} ˆu_t²(y) = 0}.

The Yule-Walker estimator, denoted by ˆρ_YW, corresponds to a₁ = 1, a₂ = n, while the least squares estimator ˆρ_LS corresponds to a₁ = 1, a₂ = n − 1.
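A minimal implementation of the estimators of Assumption 4 (ours; the simulated design is an arbitrary example) is as follows; note how the Yule-Walker and least squares variants differ only in the range of the denominator sum.

```python
import numpy as np

def rho_hat(y, X, a1, a2):
    # Estimators of Assumption 4: a1=1, a2=n gives Yule-Walker,
    # a1=1, a2=n-1 gives least squares (1-based indices as in the text,
    # converted to 0-based slices below).
    u = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]   # OLS residuals
    num = np.sum(u[1:] * u[:-1])
    den = np.sum(u[a1 - 1:a2] ** 2)
    if den == 0.0:                 # y lies in the exceptional set N(a1,a2)
        return np.nan
    return num / den

rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), np.arange(1.0, n + 1)])
y = X @ np.array([1.0, 0.1]) + rng.standard_normal(n)
print(rho_hat(y, X, 1, n))         # Yule-Walker: always < 1 in modulus
print(rho_hat(y, X, 1, n - 1))     # least squares: can exceed 1 in modulus
```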
The estimators which use a₁ = 2, a₂ = n − 1 or a₁ = 2, a₂ = n have also been considered in the literature (see, e.g., Park and Mitchell (1980), Magee (1989)).

Remark 3.13. (Some properties of ˆρ) (i) For the Yule-Walker estimator ˆρ_YW we have N(1, n) = M, i.e., ˆρ_YW is well-defined for every y ∈ Rⁿ \ M. Furthermore, ˆρ_YW is bounded away from 1 in modulus uniformly over its domain of definition, i.e., sup_{y∈Rⁿ\M} |ˆρ_YW(y)| < 1. This follows from the fact that |ˆρ_YW(y)| < 1 holds for every y ∈ Rⁿ \ M, that the supremum in question does not change its value if the range for y is replaced by the compact set {y ∈ M⊥ : ‖y‖ = 1}, and the fact that ˆρ_YW is continuous on this set. [It can also be derived from the discussion in Section 3.5 in Grenander and Rosenblatt (1957).]

(ii) The least squares estimator ˆρ_LS exhibits a somewhat different behavior: First, ˆρ_LS is well-defined only on Rⁿ \ N(1, n − 1), with N(1, n − 1) given by {y ∈ Rⁿ : ˆu(y) ∈ span(e_n(n))}. Note that Rⁿ \ N(1, n − 1) is contained in Rⁿ \ M, but is strictly smaller in case e_n(n) is orthogonal to each column of X. Second, ˆρ_LS is not bounded away from one in modulus; in fact, there are even cases where ˆρ_LS is unbounded.

(iii) The behavior of the remaining two estimators ˆρ is similar to the behavior of ˆρ_LS.

(iv) The set N(a₁, a₂) is always a closed subset of Rⁿ. It is guaranteed to be a λ_{Rⁿ}-null set provided k ≤ a₂ − a₁ holds, cf. Lemma 3.14 below. This condition on k is no restriction in the case of the Yule-Walker estimator (since we have assumed k < n from the beginning), and is a very mild condition in the other cases (requiring k ≤ n − 2 or k ≤ n − 3, respectively).

The test statistics defined below make use of the matrix Λ(ˆρ). While Λ(ˆρ) is nonsingular if |ˆρ| ≠ 1, Λ(ˆρ) is singular if |ˆρ| = 1, and hence we need to study the set of y where |ˆρ(y)| = 1 (or ˆρ(y) is undefined).

Lemma 3.14.
Let ˆρ satisfy Assumption 4. Then M ⊆ N(a₁, a₂) ⊆ ¯N(a₁, a₂) where

¯N(a₁, a₂) = {y ∈ Rⁿ : |Σ_{t=2}^n ˆu_t(y)ˆu_{t−1}(y)| = Σ_{t=a₁}^{a₂} ˆu_t²(y)}.

The set ¯N(a₁, a₂) is a closed subset of Rⁿ and is precisely the set where the estimator ˆρ is either not well-defined or is equal to 1 in modulus. The estimator ˆρ is continuous on Rⁿ \ N(a₁, a₂) ⊇ Rⁿ \ ¯N(a₁, a₂). If k ≤ a₂ − a₁ holds, the set ¯N(a₁, a₂) is a λ_{Rⁿ}-null set.

While for the Yule-Walker estimator ¯N(1, n) = N(1, n) holds as a consequence of Remark 3.13(i), for the other estimators ˆρ the corresponding set ¯N(a₁, a₂) can be a proper superset of N(a₁, a₂).

Given an estimator ˆρ satisfying Assumption 4 we now introduce the test statistic

T_FGLS(y) = (R˜β(y) − r)′ ˜Ω⁻¹(y)(R˜β(y) − r)  if y ∈ Rⁿ \ ˜N*(a₁, a₂),  and  T_FGLS(y) = 0  otherwise,

where

˜β(y) = (X′Λ⁻¹(ˆρ(y))X)⁻¹X′Λ⁻¹(ˆρ(y))y,
˜σ²(y) = (n − k)⁻¹(y − X˜β(y))′Λ⁻¹(ˆρ(y))(y − X˜β(y)),
˜Ω(y) = ˜σ²(y)R(X′Λ⁻¹(ˆρ(y))X)⁻¹R′.

Here ˜N*(a₁, a₂) is defined via

Rⁿ \ ˜N*(a₁, a₂) = {y ∈ Rⁿ \ ˜N(a₁, a₂) : ˜σ²(y) ≠ 0, det(R(X′Λ⁻¹(ˆρ(y))X)⁻¹R′) ≠ 0},

where ˜N(a₁, a₂) is given by

Rⁿ \ ˜N(a₁, a₂) = {y ∈ Rⁿ \ ¯N(a₁, a₂) : det(X′Λ⁻¹(ˆρ(y))X) ≠ 0}.

Note that ˜β, ˜σ², and ˜Ω are well-defined on Rⁿ \ ˜N(a₁, a₂), with ˜Ω(y) being nonsingular if and only if y ∈ Rⁿ \ ˜N*(a₁, a₂), see Lemma B.1 in Appendix B. Furthermore, define

T_OLS(y) = (Rˆβ(y) − r)′ ˆΩ⁻¹(y)(Rˆβ(y) − r)  if y ∈ Rⁿ \ ˆN*(a₁, a₂),  and  T_OLS(y) = 0  otherwise,

where ˆβ(y) is the OLS estimator, ˆσ²(y) = (n − k)⁻¹ˆu′(y)ˆu(y), and

ˆΩ(y) = ˆσ²(y)R(X′X)⁻¹X′Λ(ˆρ(y))X(X′X)⁻¹R′.

Here ˆN*(a₁, a₂) is defined via

Rⁿ \ ˆN*(a₁, a₂) = {y ∈ Rⁿ \ ¯N(a₁, a₂) : det(R(X′X)⁻¹X′Λ(ˆρ(y))X(X′X)⁻¹R′) ≠ 0}.

Of course, ˆβ and ˆσ² are well-defined on all of Rⁿ, while ˆΩ is well-defined on Rⁿ \ N(a₁, a₂) ⊇ Rⁿ \ ˆN*(a₁, a₂). Furthermore, ˆΩ(y) is nonsingular for y ∈ Rⁿ \ ˆN*(a₁, a₂), see Lemma B.1 in Appendix B. We note that the exceptional sets ˜N*(a₁, a₂) and ˆN*(a₁, a₂), respectively, appearing in the definition of the test statistics are λ_{Rⁿ}-null sets provided k ≤ a₂ − a₁ holds, see Lemma B.1. [For the case of the Yule-Walker estimator actually ˜N*(1, n) = ˜N(1, n) = ¯N(1, n) = ˆN*(1, n) = N(1, n) = M holds, because Λ(ˆρ_YW(y)) is positive definite for every y ∉ N(1, n) = M in view of |ˆρ_YW(y)| < 1, cf. Remark 3.13(i).]
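The following sketch (ours) implements T_FGLS, exploiting the closed-form tridiagonal inverse of Λ(ρ). The data-generating choices at the bottom are arbitrary, and the value of ρ would in practice be ˆρ(y) computed as in Assumption 4 (e.g., with the `rho_hat` sketch above).

```python
import numpy as np

def lam_inv(rho, n):
    # Closed-form inverse of the AR(1) matrix Lambda(rho): tridiagonal with
    # 1 at the corners, 1 + rho^2 on the interior diagonal, -rho off-diagonal.
    A = np.zeros((n, n))
    np.fill_diagonal(A, 1.0 + rho ** 2)
    A[0, 0] = A[-1, -1] = 1.0
    idx = np.arange(n - 1)
    A[idx, idx + 1] = A[idx + 1, idx] = -rho
    return A / (1.0 - rho ** 2)

def T_fgls(y, X, R, r, rho):
    # Feasible GLS statistic as defined in the text, with the estimated
    # rho supplied by the caller.
    n, k = X.shape
    Li = lam_inv(rho, n)
    XtLiX_inv = np.linalg.inv(X.T @ Li @ X)
    beta = XtLiX_inv @ X.T @ Li @ y
    resid = y - X @ beta
    s2 = resid @ Li @ resid / (n - k)
    Omega = s2 * R @ XtLiX_inv @ R.T
    diff = R @ beta - r
    return float(diff @ np.linalg.solve(Omega, diff))

rng = np.random.default_rng(2)
n = 40
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
y = rng.standard_normal(n)
R, r = np.array([[0.0, 1.0]]), np.array([0.0])
print(T_fgls(y, X, R, r, rho=0.5))
```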
As already noted in Remark 3.13, except for the Yule-Walker estimator we cannot rule out that ˆρ(y) is larger than one in absolute value. For such values of y the matrix Λ(ˆρ(y)), although being nonsingular, is indefinite. [To see this, note that det Λ(ˆρ(y)) = (1 − ˆρ²(y))^{n−1}, which is negative for |ˆρ(y)| > 1 if n is even; hence a negative as well as a positive eigenvalue must exist. For odd n a similar argument applies.] If the event |ˆρ(y)| > 1 occurs for some y, then it occurs on a set of positive λ_{Rⁿ}-measure in view of continuity of ˆρ. As a consequence, ˜Ω(y) and ˆΩ(y) are not guaranteed to be λ_{Rⁿ}-almost everywhere nonnegative definite (except if the Yule-Walker estimator is being used), although they are λ_{Rⁿ}-almost everywhere nonsingular in case k ≤ a₂ − a₁. Of course, the probability of the event |ˆρ(y)| > 1 will depend on the underlying DGP.

Theorem 3.15. Suppose Assumptions 1 and 4 are satisfied and k ≤ a₂ − a₁ holds. Let W_FGLS(C) = {y ∈ Rⁿ : T_FGLS(y) ≥ C} and W_OLS(C) = {y ∈ Rⁿ : T_OLS(y) ≥ C} be the rejection regions corresponding to the test statistics T_FGLS and T_OLS, respectively, where C is a real number satisfying 0 < C < ∞. Then the following holds:

1. Suppose e₊ ∉ ˜N*(a₁, a₂) and T_FGLS(e₊ + µ∗) > C hold for some (and hence all) µ∗ ∈ M₀, or e₋ ∉ ˜N*(a₁, a₂) and T_FGLS(e₋ + µ∗) > C hold for some (and hence all) µ∗ ∈ M₀. Then

sup_{Σ∈C} P_{µ₀,σ²Σ}(W_FGLS(C)) = 1

holds for every µ₀ ∈ M₀ and every 0 < σ² < ∞. In particular, the size of the test is equal to one.

2. Suppose e₊ ∉ ˜N*(a₁, a₂) and T_FGLS(e₊ + µ∗) < C hold for some (and hence all) µ∗ ∈ M₀, or e₋ ∉ ˜N*(a₁, a₂) and T_FGLS(e₋ + µ∗) < C hold for some (and hence all) µ∗ ∈ M₀. Then

inf_{Σ∈C} P_{µ,σ²Σ}(W_FGLS(C)) = 0

holds for every µ ∈ M and every 0 < σ² < ∞, and hence

inf_{µ₁∈M₁} inf_{Σ∈C} P_{µ₁,σ²Σ}(W_FGLS(C)) = 0

holds for every 0 < σ² < ∞. In particular, the test is biased. Furthermore, the nuisance-infimal rejection probability at every point µ₁ ∈ M₁ is zero, i.e.,

inf_{0<σ²<∞} inf_{Σ∈C} P_{µ₁,σ²Σ}(W_FGLS(C)) = 0.

In particular, the infimal power of the test is equal to zero.

3. Suppose that e₊ ∈ M and Rˆβ(e₊) ≠ 0 hold. Then there exists a constant K_FGLS(e₊), which depends only on e₊, R, and X, such that for every µ₀ ∈ M₀, every σ² with 0 < σ² < ∞, and every M ≥ 0 we have

inf_{γ∈R,|γ|≥M} inf_{Σ∈C} P_{µ₀+γe₊,σ²Σ}(W_FGLS(C)) ≤ K_FGLS(e₊) ≤ sup_{Σ∈C} P_{µ₀,σ²Σ}(W_FGLS(C));

note that µ₀ + γe₊ ∈ M₁ for γ ≠ 0. Furthermore, if ˆρ ≡ ˆρ_YW, then K_FGLS(e₊) = 1 and hence

sup_{Σ∈C} P_{µ₀,σ²Σ}(W_FGLS(C)) = 1 (12)

holds for every µ₀ ∈ M₀ and every 0 < σ² < ∞. If e₋ ∈ M and Rˆβ(e₋) ≠ 0 hold then the analogous statements hold with e₊ replaced by e₋ where the constant K_FGLS(e₋) now depends only on e₋, R, and X.

4. Statements analogous to 1.-3. hold true if T_FGLS is replaced by T_OLS, W_FGLS(C) is replaced by W_OLS(C), the set ˜N*(a₁, a₂) is replaced by ˆN*(a₁, a₂), and the constants K_FGLS(·) are replaced by constants K_OLS(·).

Note that in case the Yule-Walker estimator ˆρ_YW is used the exceptional null sets appearing in Parts 1 and 2 (and in the corresponding portion of Part 4) satisfy ˜N*(1, n) = ˆN*(1, n) = M. Part 3 differs somewhat from the corresponding part of the earlier theorem, and tells us that, given the conditions in Part 3 are met, there exist points in the alternative, arbitrarily far away from the null hypothesis, at which power is not larger than the size of the test.
The reason for the difference between Part 3 of Theorem 3.3 and Part 3 of the preceding theorem lies in the fact that the variance covariance matrix estimator ˜Ω used in the present subsection can be indefinite and that the concentration direction e₊ (e₋, respectively) belongs to the null set on which ˜Ω is not defined. This requires one in the proof of the preceding theorem to resort to Theorem 5.19 rather than to using Part 3 of Corollary 5.17 (even when the Yule-Walker estimator ˆρ_YW is used). A similar remark applies also to the corresponding portion of Part 4 of the preceding theorem. In view of the general results in Subsection 5.4 there is little doubt that similar negative results can also be obtained for FGLS or OLS based tests that are constructed on the basis of higher-order autoregressive models or of other more profligate parametric models (as long as C ⊇ C_AR(1) is assumed). Hence it is to be expected that autocorrelation robust tests based on autoregressive estimates (cf. Berk (1974), den Haan and Levin (1997), Sun and Kaplan (2012)) will also suffer from severe size and power problems.

The results given in the preceding theorem reveal serious size and power problems of the tests based on T_FGLS and T_OLS. Note that these problems arise even if C = C_AR(1), i.e., even if the construction of the test statistics makes use of the correct covariance model. If C = C_AR(1) holds, it is interesting to contrast the above results with the size and power properties of the corresponding infeasible tests based on T*_GLS and T*_OLS which are defined in a similar way as T_FGLS and T_OLS are, but with ˆρ replaced by the true value of ρ: These tests are standard F-tests (except for not being standardized by q), have well-known and reasonable size and power properties, and do not suffer from the size and power problems exhibited by their feasible counterparts.

Similar to the situation in Subsection 3.2, the conditions in Parts 1-3 of the preceding theorem only depend on a₁ and a₂ (i.e., on the choice of estimator ˆρ), the design matrix X, the restriction (R, r), the vector e₊ (e₋, respectively), and the critical value C. Hence, in any particular application it can be decided whether or not (and which of) these conditions are satisfied. We furthermore note that remarks analogous to Remarks 3.4 and 3.5 also apply mutatis mutandis to the preceding theorem. We also note that a result analogous to Theorem 3.12 could be given here, but we do not spell out the details.

We next show that the conditions of Theorem 3.15 involving the design matrix X are generically satisfied. The first part of the subsequent proposition shows that these conditions are generically satisfied in the class of all possible design matrices of rank k. Parts 2 and 3 show a corresponding result if we impose that the regression model has to contain an intercept. In the proposition the dependence of several quantities like T_FGLS, T_OLS, ˜N*(a₁, a₂), etc., on the design matrix X will be important and thus we shall write T_{FGLS,X}, T_{OLS,X}, ˜N*_X(a₁, a₂), etc., for these quantities in the result to follow.

Proposition 3.16.
Suppose Assumption 1 holds. Fix (R, r) with rank(R) = q, fix 0 < C < ∞, and fix a₁ ∈ {1, 2} and a₂ ∈ {n − 1, n} in Assumption 4. Suppose k ≤ a₂ − a₁ holds. Let T_{FGLS,X} and T_{OLS,X} be the test statistics defined above and let µ∗ ∈ M₀ be arbitrary.

1. With 𝒳₀ defined in Proposition 3.6 define now

𝒳_{1,FGLS}(e₊) = {X ∈ 𝒳₀ : e₊ ∈ ˜N*_X(a₁, a₂)},
𝒳_{2,FGLS}(e₊) = {X ∈ 𝒳₀ \ 𝒳_{1,FGLS}(e₊) : T_{FGLS,X}(e₊ + µ∗) = C},

and similarly define 𝒳_{1,FGLS}(e₋), 𝒳_{2,FGLS}(e₋). [Note that 𝒳_{2,FGLS}(e₊) and 𝒳_{2,FGLS}(e₋) do not depend on the choice of µ∗.] Then 𝒳_{1,FGLS}(e₊) and 𝒳_{1,FGLS}(e₋) are λ_{Rⁿˣᵏ}-null sets. The same is true for 𝒳_{2,FGLS}(e₊) (𝒳_{2,FGLS}(e₋), respectively) under the provision that it is a proper subset of 𝒳₀ \ 𝒳_{1,FGLS}(e₊) (𝒳₀ \ 𝒳_{1,FGLS}(e₋), respectively). The set of all design matrices X ∈ 𝒳₀ for which Theorem 3.15 does not apply is a subset of

(𝒳_{1,FGLS}(e₊) ∪ 𝒳_{2,FGLS}(e₊)) ∩ (𝒳_{1,FGLS}(e₋) ∪ 𝒳_{2,FGLS}(e₋)).

Hence it is a λ_{Rⁿˣᵏ}-null set provided the preceding provision holds for at least one of 𝒳_{2,FGLS}(e₊) or 𝒳_{2,FGLS}(e₋); it thus is a "negligible" subset of 𝒳₀ in view of the fact that 𝒳₀ differs from Rⁿˣᵏ only by a λ_{Rⁿˣᵏ}-null set.

2. Suppose k ≥ 2 and n ≥ 3 hold and suppose X has e₊ as its first column, i.e., X = (e₊, X̃). With 𝒳̃₀ defined in Proposition 3.6 define

𝒳̃_{1,FGLS}(e₋) = {X̃ ∈ 𝒳̃₀ : e₋ ∈ ˜N*_{(e₊,X̃)}(a₁, a₂)},
𝒳̃_{2,FGLS}(e₋) = {X̃ ∈ 𝒳̃₀ \ 𝒳̃_{1,FGLS}(e₋) : T_{FGLS,(e₊,X̃)}(e₋ + µ∗) = C},

and note that 𝒳̃_{2,FGLS}(e₋) does not depend on the choice of µ∗. Then 𝒳̃_{1,FGLS}(e₋) is a λ_{Rⁿˣ⁽ᵏ⁻¹⁾}-null set. The set 𝒳̃_{2,FGLS}(e₋) is a λ_{Rⁿˣ⁽ᵏ⁻¹⁾}-null set under the provision that it is a proper subset of 𝒳̃₀ \ 𝒳̃_{1,FGLS}(e₋). [The analogously defined sets 𝒳̃_{1,FGLS}(e₊) and 𝒳̃_{2,FGLS}(e₊) satisfy 𝒳̃_{1,FGLS}(e₊) = 𝒳̃₀ and 𝒳̃_{2,FGLS}(e₊) = ∅.] The set of all matrices X̃ ∈ 𝒳̃₀ such that Theorem 3.15 does not apply to the design matrix X = (e₊, X̃) is a subset of 𝒳̃_{1,FGLS}(e₋) ∪ 𝒳̃_{2,FGLS}(e₋) and hence is a λ_{Rⁿˣ⁽ᵏ⁻¹⁾}-null set under the preceding provision; it thus is a "negligible" subset of 𝒳̃₀ in view of the fact that 𝒳̃₀ differs from Rⁿˣ⁽ᵏ⁻¹⁾ only by a λ_{Rⁿˣ⁽ᵏ⁻¹⁾}-null set.

3. Define 𝒳_{1,OLS}(·) and 𝒳_{2,OLS}(·) analogously, but with ˆN*_X(a₁, a₂) replacing ˜N*_X(a₁, a₂) and T_{OLS,X} replacing T_{FGLS,X}. Similarly define 𝒳̃_{1,OLS}(·) and 𝒳̃_{2,OLS}(·). Then Part 1 (Part 2, respectively) holds analogously for 𝒳_{1,OLS}(·) and 𝒳_{2,OLS}(·) (𝒳̃_{1,OLS}(·) and 𝒳̃_{2,OLS}(·), respectively) with obvious changes.

4. Suppose X = (e₊, X̃), and suppose the first column of R is nonzero. Then Part 3 of Theorem 3.15 applies to the design matrix X = (e₊, X̃) for every X̃ ∈ 𝒳̃₀ (for the FGLS- as well as for the OLS-based test).

The preceding genericity result maintains in Part 1 the provision that 𝒳_{2,FGLS}(e₊) is a proper subset of 𝒳₀ \ 𝒳_{1,FGLS}(e₊) or that 𝒳_{2,FGLS}(e₋) is a proper subset of 𝒳₀ \ 𝒳_{1,FGLS}(e₋). Note that the provision depends on the critical value C. If the provision is satisfied for the given C, we can conclude from Part 1 that the set of all design matrices X ∈ 𝒳₀ for which Theorem 3.15 is not applicable to the test statistic T_FGLS is "negligible".
If the provision is not satisfied, i.e., if 𝒳_{2,FGLS}(e₊) = 𝒳₀ \ 𝒳_{1,FGLS}(e₊) and 𝒳_{2,FGLS}(e₋) = 𝒳₀ \ 𝒳_{1,FGLS}(e₋) hold, and thus we cannot draw the desired conclusion for the given value of C, we immediately see that the provision must then be satisfied for any other choice C′ of the critical value; hence, negligibility of the set of design matrices for which Theorem 3.15 is not applicable to the test statistic T_FGLS can then be concluded for any C′ ≠ C. Summarizing we see that the provision is always satisfied except possibly for one particular choice of the critical value. A similar comment applies to Parts 2 and 3 of the proposition. [For example, if T_OLS is used, a₁ = 1, a₂ = n (Yule-Walker estimator), and X is not restricted to be of the form (e₊, X̃), it is not difficult to show that the provision is in fact satisfied for every choice of C. This can also be shown for other choices of a₁ and a₂ and/or for the case where X = (e₊, X̃) under additional assumptions on R. It may actually be true in general, but we do not want to pursue this.]

Similarly as in Subsection 3.2, we next discuss an exceptional case to which Theorem 3.15 does not apply and which allows for a positive result, at least if the covariance model C is assumed to be C_AR(1) or is approximated by C_AR(1) near the singular points (in the sense of Remark 3.10(i)).

Theorem 3.17.
Suppose C = C_AR(1), Assumption 4 is satisfied, and k ≤ a₂ − a₁ holds. Let W_FGLS(C) = {y ∈ Rⁿ : T_FGLS(y) ≥ C} and W_OLS(C) = {y ∈ Rⁿ : T_OLS(y) ≥ C} be the rejection regions corresponding to the test statistics T_FGLS and T_OLS, respectively, where C is a real number satisfying 0 < C < ∞. If e₊, e₋ ∈ M and Rˆβ(e₊) = Rˆβ(e₋) = 0 is satisfied, then the following holds for W(C) = W_FGLS(C) as well as W(C) = W_OLS(C):

1. The size of the rejection region W(C) is strictly less than 1, i.e.,

sup_{µ₀∈M₀} sup_{0<σ²<∞} sup_{−1<ρ<1} P_{µ₀,σ²Λ(ρ)}(W(C)) < 1.

Furthermore,

inf_{µ₀∈M₀} inf_{0<σ²<∞} inf_{−1<ρ<1} P_{µ₀,σ²Λ(ρ)}(W(C)) > 0.
2. The infimal power is bounded away from zero, i.e.,

inf_{µ₁∈M₁} inf_{0<σ²<∞} inf_{−1<ρ<1} P_{µ₁,σ²Λ(ρ)}(W(C)) > 0.
3. Suppose that a₁ = 1 and a₂ = n. Then for every 0 < c < ∞

inf_{µ₁∈M₁, 0<σ²<∞, d(µ₁,M₀)/σ ≥ c} P_{µ₁,σ²Λ(ρₘ)}(W(C)) → 1

holds for m → ∞ and for any sequence ρₘ ∈ (−1, 1) satisfying |ρₘ| → 1. Furthermore, for every sequence 0 < cₘ < ∞ and every 0 < ε < 1

inf_{µ₁∈M₁, d(µ₁,M₀) ≥ cₘ} inf_{−1+ε≤ρ≤1−ε} P_{µ₁,σ²ₘΛ(ρ)}(W(C)) → 1

holds for m → ∞ whenever 0 < σₘ < ∞ and cₘ/σₘ → ∞. [The very last statement holds even without the conditions e₊, e₋ ∈ M and Rˆβ(e₊) = Rˆβ(e₋) = 0.]

4. For every δ, 0 < δ < 1, there exists a C(δ), 0 < C(δ) < ∞, such that

sup_{µ₀∈M₀} sup_{0<σ²<∞} sup_{−1<ρ<1} P_{µ₀,σ²Λ(ρ)}(W(C(δ))) ≤ δ.

A discussion similar to the one following Theorem 3.7 applies also here. Furthermore, a result paralleling Theorem 3.8 can again be obtained by a combined application of Theorem 5.21 and Proposition 5.23. The so-obtained result shows how adjusted test statistics ¯T_FGLS and ¯T_OLS can be constructed that have size/power properties as given in the preceding theorem also in many cases which fall under the wrath of Theorem 3.15 (and for which the tests based on T_FGLS and T_OLS suffer from extreme size or power deficiencies). The adjustment mechanism again amounts to using a "working model" that always adds the regressors e₊ and/or e₋ to the design matrix. We abstain from providing details.

F-test without correction for autocorrelation

As mentioned in the introduction, a considerable body of literature is concerned with the properties of the standard F-test (i.e., the F-test without correction for autocorrelation) in the presence of autocorrelation. Much of this literature concentrates on the case where the errors follow a stationary autoregressive process of order 1, i.e., C = C_AR(1). As the correlation in the errors is not accounted for in the standard F-test, bad performance of the standard F-test for large values of the correlation ρ can be expected. This has been demonstrated formally in Krämer (1989), Krämer et al. (1990), and subsequently in Banerjee and Magnus (2000): These papers determine the limit as ρ → 1 of the rejection probability of the standard F-test and show that (i) this limit is 1 if the regression contains an intercept and the restrictions to be tested involve the intercept (i.e., the n × 1 vector e₊ = (1, ..., 1)′ belongs to the span of the design matrix and Rˆβ(e₊) ≠ 0 holds) or if the regression does not contain an intercept (i.e., e₊ does not belong to the span of the design matrix) and a certain observable quantity, A say, is positive, (ii) it is 0 if the regression does not contain an intercept and the observable quantity A is negative, and (iii) it is a value between 0 and 1 if the regression contains an intercept but the restrictions to be tested do not involve the intercept (i.e., e₊ belongs to the span of the design matrix and Rˆβ(e₊) = 0 holds). It perhaps comes as a surprise that autocorrelation robust tests, which have built into them a correction for autocorrelation, exhibit a similar behavior as shown in Section 3 of the present paper.
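A small Monte Carlo sketch (ours; the sample size, replication count, and the approximate 5% critical value of F(1, 49) are illustrative choices) exhibits the limiting behavior (i) above in the location model: the rejection probability of the standard F-test for the intercept approaches one as ρ ↑ 1.

```python
import numpy as np
from numpy.random import default_rng

def ar1_sample(rho, n, rng):
    # Stationary Gaussian AR(1) disturbances with unit marginal variance.
    u = np.empty(n)
    u[0] = rng.standard_normal()
    innov_sd = np.sqrt(1.0 - rho ** 2)
    for t in range(1, n):
        u[t] = rho * u[t - 1] + innov_sd * rng.standard_normal()
    return u

def reject_rate(rho, n=50, reps=2000, crit=4.04):
    # crit is approximately the 0.95 quantile of F(1, n-1) for n = 50.
    rng = default_rng(3)
    count = 0
    for _ in range(reps):
        y = ar1_sample(rho, n, rng)    # null model: intercept equals 0
        ybar = y.mean()
        s2 = np.sum((y - ybar) ** 2) / (n - 1)
        count += n * ybar ** 2 / s2 >= crit
    return count / reps

for rho in [0.0, 0.5, 0.9, 0.99]:
    print(rho, reject_rate(rho))       # climbs toward 1 as rho -> 1
```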
We mention that, due to the relatively simple structure of the standard F-test statistic as a ratio of quadratic forms, the method of proof in Krämer (1989), Krämer et al. (1990), and Banerjee and Magnus (2000) is by direct computation of the limit (as ρ → 1) of the test statistic. In contrast, the results for the much more complicated test statistics considered in the present paper rely on quite different methods which make use of invariance considerations and are of a more geometric flavor. Needless to say, the just mentioned results in Krämer (1989), Krämer et al. (1990), and Banerjee and Magnus (2000) can be rederived through a straightforward application of the general results in Subsection 5.4 to the standard F-test.

In light of the fact that the standard F-test makes no correction for autocorrelation at all, a perhaps surprising observation is that nevertheless an analogue to Theorems 3.7 and 3.17 can be established for the standard F-test by a simple application of Theorem 5.21. Even more, the adjustment procedure described in Proposition 5.23 can be applied to the standard F-test leading to a result analogous to Theorem 3.8. [Banerjee and Magnus (2000) claim in their Theorem 5 that the expression Pr(F(0) > δ) converges to zero if Mi = 0 and ¯F(0) ≤ δ. In case ¯F(0) = δ the argument given there is, however, incorrect, because F(0) → ¯F(0) = δ in probability does not imply Pr(F(0) > δ) → 0.]
While these results show that the size and power of the so-adjusted standard F-test do not "break down" completely for extreme correlations, they do not tell us much about the performance of the adjusted test for moderate correlations.

We next turn to size and power properties of commonly used heteroskedasticity robust tests. To this end we allow for heteroskedasticity of unknown form as is common in the literature and thus allow that the errors in the regression model have a variance covariance matrix σ²Σ where Σ is an element of the covariance model given by

C_Het = {diag(τ₁², ..., τₙ²) : τᵢ² > 0 for i = 1, ..., n, Σ_{i=1}^n τᵢ² = 1}.

The normalization for Σ chosen is of course arbitrary and could equally well be replaced, e.g., by the normalization τ₁² = 1. The heteroskedasticity robust test statistic considered is given by

T_Het(y) = (Rˆβ(y) − r)′ ˆΩ⁻¹_Het(y)(Rˆβ(y) − r)  if det ˆΩ_Het(y) ≠ 0,  and  T_Het(y) = 0  if det ˆΩ_Het(y) = 0, (13)

where ˆΩ_Het = RˆΨ_Het R′ and ˆΨ_Het is a heteroskedasticity robust estimator. Such estimators were introduced in Eicker (1963, 1967) and have later found their way into the econometrics literature (e.g., White (1980)). They are of the form
ˆΨ_Het(y) = (X′X)⁻¹X′ diag(d₁ˆu₁²(y), ..., dₙˆuₙ²(y)) X(X′X)⁻¹,

where the constants dᵢ > 0 may depend on the design matrix. Common choices are dᵢ = 1, dᵢ = n/(n − k), dᵢ = (1 − hᵢᵢ)⁻¹, or dᵢ = (1 − hᵢᵢ)⁻², where hᵢᵢ denotes the i-th diagonal element of the projection matrix X(X′X)⁻¹X′, see Long and Ervin (2000) for an overview. Another suggestion is dᵢ = (1 − hᵢᵢ)^{−δᵢ} for suitable choice of δᵢ, see Cribari-Neto (2004). For the last three choices of dᵢ we use the convention that we set dᵢ = 1 in case hᵢᵢ = 1. Note that hᵢᵢ = 1 implies ˆuᵢ(y) = 0 for every y, and hence it is irrelevant which real value is assigned to dᵢ in case hᵢᵢ = 1.

Similarly as in Subsection 3.2 we need to ensure that ˆΩ_Het(y) is nonsingular λ_{Rⁿ}-almost everywhere. As shown in the subsequent lemma this is the case provided Assumption 3 introduced in Subsection 3.2 is satisfied. The lemma also shows that in case this assumption is violated the matrix ˆΩ_Het(y) is singular everywhere, leading to a complete and trivial breakdown of the test. Recall the definition of the matrix B(y) given in (8) and note that it is independent of the constants dᵢ.

Lemma 4.1. 1. ˆΩ_Het(y) is nonnegative definite for every y ∈ Rⁿ.

2. ˆΩ_Het(y) is singular if and only if rank(B(y)) < q.

3. ˆΩ_Het(y) = 0 if and only if B(y) = 0.

4. The set of all y ∈ Rⁿ for which ˆΩ_Het(y) is singular (or, equivalently, for which rank(B(y)) < q holds) is a λ_{Rⁿ}-null set if Assumption 3 is satisfied, and is all of Rⁿ otherwise.

Theorem 4.2. Suppose C ⊇ C_Het holds and Assumption 3 is satisfied. Let T_Het be the test statistic defined in (13) and let W_Het(C) = {y ∈ Rⁿ : T_Het(y) ≥ C} be the rejection region where C is a real number satisfying 0 < C < ∞. Then the following holds:

1. Suppose for some i, 1 ≤ i ≤ n, we have rank(B(eᵢ(n))) = q and T_Het(eᵢ(n) + µ∗) > C for some (and hence all) µ∗ ∈ M₀. Then

sup_{Σ∈C} P_{µ₀,σ²Σ}(W_Het(C)) = 1

holds for every µ₀ ∈ M₀ and every 0 < σ² < ∞. In particular, the size of the test is equal to one.

2. Suppose for some i, 1 ≤ i ≤ n, we have rank(B(eᵢ(n))) = q and T_Het(eᵢ(n) + µ∗) < C for some (and hence all) µ∗ ∈ M₀. Then

inf_{Σ∈C} P_{µ,σ²Σ}(W_Het(C)) = 0

holds for every µ ∈ M and every 0 < σ² < ∞, and hence

inf_{µ₁∈M₁} inf_{Σ∈C} P_{µ₁,σ²Σ}(W_Het(C)) = 0

holds for every 0 < σ² < ∞. In particular, the test is biased. Furthermore, the nuisance-infimal rejection probability at every point µ₁ ∈ M₁ is zero, i.e.,

inf_{0<σ²<∞} inf_{Σ∈C} P_{µ₁,σ²Σ}(W_Het(C)) = 0.

In particular, the infimal power of the test is equal to zero.

3. Suppose for some i, 1 ≤ i ≤ n, we have B(eᵢ(n)) = 0 and Rˆβ(eᵢ(n)) ≠ 0. Then

sup_{Σ∈C} P_{µ₀,σ²Σ}(W_Het(C)) = 1

holds for every µ₀ ∈ M₀ and every 0 < σ² < ∞. In particular, the size of the test is equal to one.

We note that Remark 3.4 as well as most of the discussion following Theorem 3.3 apply mutatis mutandis also here. Similarly as in Subsection 3.2 it is also not difficult to show (for typical choices of dᵢ) that the set of design matrices X for which the conditions in Theorem 4.2 are not satisfied is a negligible set. We omit a formal statement. In contrast to the case considered in Subsection 3.2, however, no (nontrivial) analogues to the positive results given in Theorems 3.7 and 3.8 are possible due to the fact that in the present setting there are now too many concentration spaces (which together in fact span all of Rⁿ). Furthermore, the above theorem and its proof exploit only the one-dimensional concentration spaces Zᵢ = span(eᵢ(n)).
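For concreteness, the following sketch (ours) implements T_Het with the dᵢ choices listed above, using the customary HC0-HC3 labels from that literature; the det-based singularity check mirrors the convention in (13) up to an assumed numerical tolerance.

```python
import numpy as np

def T_het(y, X, R, r, hc="HC3"):
    # Heteroskedasticity robust statistic (13) with an Eicker/White-type
    # estimator; the d_i choices follow the list in the text.
    n, k = X.shape
    XtXinv = np.linalg.inv(X.T @ X)
    beta = XtXinv @ X.T @ y
    u = y - X @ beta
    h = np.einsum('ij,jk,ik->i', X, XtXinv, X)   # leverages h_ii
    d = {"HC0": np.ones(n),
         "HC1": np.full(n, n / (n - k)),
         "HC2": np.where(h < 1, 1.0 / (1.0 - h), 1.0),   # d_i = 1 if h_ii = 1
         "HC3": np.where(h < 1, 1.0 / (1.0 - h) ** 2, 1.0)}[hc]
    Psi = XtXinv @ X.T @ np.diag(d * u ** 2) @ X @ XtXinv
    Omega = R @ Psi @ R.T
    if abs(np.linalg.det(Omega)) < 1e-12:        # convention in (13)
        return 0.0
    diff = R @ beta - r
    return float(diff @ np.linalg.solve(Omega, diff))
```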
While every linear space of the form span(e_{i₁}(n), ..., e_{i_p}(n)) for 0 < p < n and 1 ≤ i₁ < ... < i_p ≤ n is a concentration space of the model C_Het, using all these concentration spaces in conjunction with Corollary 5.17 will often not deliver additional obstructions to good size or power properties, the reason being that each of these spaces already contains a concentration space Zᵢ as a subset. As a further point of interest we note that the assumptions imposed in Eicker (1963, 1967) require all variances σ²τᵢ² to be bounded away from zero in order to achieve uniformity in the convergence to the limiting distribution. Hence, Eicker's assumptions rule out the concentration effect that drives the above result. It appears that this insight in Eicker (1963, 1967) has not been fully appreciated in the ensuing econometrics literature.

In connection with the preceding theorem, which points out size distortions and/or power deficiencies of heteroskedasticity robust tests even under a normality assumption, a result in Section 4.2 of Dufour (2003) needs to be mentioned which shows that the size of heteroskedasticity robust tests is always 1 if one allows for a sufficiently large nonparametric class of distributions for the errors U.

We briefly discuss the standard F-test statistic without any correction for heteroskedasticity. Let

T_uncorr(y) = ((n − k)/q)(Rˆβ(y) − r)′(R(X′X)⁻¹R′)⁻¹(Rˆβ(y) − r)/(ˆu′(y)ˆu(y))  if y ∉ M,  and  T_uncorr(y) = 0  if y ∈ M,

and define W_uncorr(C) in the obvious way. It is then easy to see that a variant of Theorem 4.2 also holds with T_uncorr and W_uncorr(C) replacing T_Het and W_Het(C), respectively, if in this variant of the theorem Assumption 3 is dropped, the condition rank(B(eᵢ(n))) = q is replaced by the condition eᵢ(n) ∉ M, and the condition B(eᵢ(n)) = 0 is replaced by the condition eᵢ(n) ∈ M. In a recent paper Ibragimov and Müller (2010) consider the standard t-test for testing µ = 0 versus µ ≠ 0 in a Gaussian location model and discuss a result by Bakirov and Székely (2005) to the effect that the size of this test under heteroskedasticity of unknown form equals the nominal significance level δ, provided n ≥ 2 holds and δ does not exceed a certain threshold (approximately 0.08). In this setting a simple computation shows that T_uncorr(eᵢ(n)) = 1 holds for every i (note that µ∗ = 0), and thus the inequality T_uncorr(eᵢ(n)) < C always holds whenever C > 1. Hence Case 1 of the variant of Theorem 4.2 just discussed does not arise whenever C > 1, while Case 2 applies (note that eᵢ(n) ∉ M = span(e₊)), showing that the standard t-test suffers from severe power deficiencies under heteroskedasticity of unknown form in case n ≥ 2 and δ is as above (note that the square of the t-statistic is the standard F-statistic). [Imposing the assumption that all elements Σ of C ⊆ C_Het have all their diagonal elements bounded from below by a given positive constant ε is only a partial cure. While it saves the heteroskedasticity robust test from the extreme size and power distortions as described in Theorem 4.2, substantial size/power distortions will nevertheless be present if ε is small (relative to sample size). Cf. the discussion in Subsection 3.2.2.]
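The computation T_uncorr(eᵢ(n)) = 1 in the location model can be verified directly; the following one-off check (ours) does so for an arbitrary n and i.

```python
import numpy as np

n, i = 10, 3
y = np.zeros(n); y[i] = 1.0            # y = e_i(n)
ybar = y.mean()
u = y - ybar                           # residuals in the location model
T = (n - 1) * n * ybar ** 2 / (u @ u)  # standard F statistic with k = q = 1
print(T)                               # equals 1 for every i and every n
```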
General Principles Underlying Size and Power Results for Tests of Linear Restrictions in Regression Models with Nonspherical Disturbances

The results on size and power properties given in the previous sections are obtained as special cases of a more general theory that applies to a large class of tests and to general covariance models C (which thus are not restricted to covariance structures resulting from stationary disturbances or from heteroskedasticity). This theory is provided in the present section. We use the notation and assumptions of Section 2. Since invariance properties of tests will play an important rôle in some of the results to follow, the next subsection collects some relevant results related to invariance. In Subsection 5.2 we provide conditions under which the tests considered have highly unpleasant size or power properties. This result is based on a "concentration" effect. In contrast, Subsection 5.3 provides conditions under which tests do not suffer from the size and power problems just mentioned. Subsection 5.4 then specializes the results of the preceding subsections to a class of tests which can be described as nonsphericity-corrected F-type tests. This class of tests contains virtually all so-called heteroskedasticity and autocorrelation robust tests available in the literature as special cases. Furthermore, Subsection 5.4 also contains another negative result, the derivation of which exploits the particular structure of these tests.

Let G be a group of bijective Borel-measurable transformations of Rⁿ into itself, the group operation being the composition of transformations. A function S defined on Rⁿ is said to be invariant under the group G if S(g(y)) = S(y) for all y ∈ Rⁿ and all g ∈ G. A subset A of Rⁿ is said to be invariant under G if g(A) ⊆ A holds for every g ∈ G. Since with g also g⁻¹ belongs to G, this is equivalent to g(A) = A for every g ∈ G, and thus to invariance of the indicator function of A as defined before. [If G is only a collection of bijective transformations on Rⁿ but is not a group, then invariance of A does not imply g(A) = A in general, and in particular does not coincide with the notion of invariance of the indicator function of A.] Clearly, invariance of S : Rⁿ → ¯R, the extended real line, under the group G implies invariance of the super-level sets W = {y : S(y) ≥ C}. Furthermore, a function S defined on Rⁿ is said to be almost invariant under the group G if S(g(y)) = S(y) holds for all g ∈ G and all y ∈ Rⁿ \ N(g) with Borel sets N(g) satisfying λ_{Rⁿ}(N(g)) = 0 and also λ_{Rⁿ}(g′⁻¹(N(g))) = 0 for all g′ ∈ G. [The additional requirement λ_{Rⁿ}(g′⁻¹(N(g))) = 0 for all g′ ∈ G of course implies λ_{Rⁿ}(N(g)) = 0 and may appear artificial at first sight. However, it arises naturally in the context of testing problems that are invariant under the group G and for which the relevant family of probability measures is equivalent to λ_{Rⁿ}, cf. Lehmann and Romano (2005), Section 6.5. Regardless of this, the additional requirement already follows from λ_{Rⁿ}(N(g)) = 0 in case the group G is a group of affine transformations on Rⁿ, which will be the groups we are interested in.] A subset A of Rⁿ is said to be almost invariant if g(A) ⊆ A ∪ N(g) holds for every g ∈ G with the Borel sets N(g) satisfying λ_{Rⁿ}(N(g)) = 0 and λ_{Rⁿ}(g′⁻¹(N(g))) = 0 for all g′ ∈ G. It is easy to see that this is equivalent to g(A) △ A ⊆ N∗(g) for every g ∈ G, with Borel sets N∗(g) satisfying λ_{Rⁿ}(N∗(g)) = 0 and λ_{Rⁿ}(g′⁻¹(N∗(g))) = 0 for all g′ ∈ G; thus it is equivalent to almost invariance of the indicator function of A. Clearly, almost invariance of S : Rⁿ → ¯R under the group G implies almost invariance of the super-level sets W = {y : S(y) ≥ C}.
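To make the invariance notions concrete before the particular groups are introduced, the following sketch (ours; all numerical inputs are arbitrary) checks that the classical F-type statistic is invariant under transformations y ↦ α(y − µ₀) + µ₀′ with µ₀, µ₀′ ∈ M₀, which generate the group G(M₀) defined next.

```python
import numpy as np

def F_stat(y, X, R, r):
    # Classical F-type statistic for testing R beta = r.
    n, k = X.shape
    XtXinv = np.linalg.inv(X.T @ X)
    beta = XtXinv @ X.T @ y
    u = y - X @ beta
    s2 = (u @ u) / (n - k)
    diff = R @ beta - r
    return float(diff @ np.linalg.solve(s2 * R @ XtXinv @ R.T, diff))

rng = np.random.default_rng(4)
n = 30
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
R, r = np.array([[0.0, 1.0]]), np.array([0.0])   # test the slope only
y = rng.standard_normal(n)
mu0 = X @ np.array([2.0, 0.0])      # element of M0 (slope equals r = 0)
mu0p = X @ np.array([-1.0, 0.0])    # another element of M0
alpha = 3.7
# Both values coincide: the statistic is invariant under this group.
print(F_stat(y, X, R, r), F_stat(alpha * (y - mu0) + mu0p, X, R, r))
```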
We are interested in some particular groups of affine transformations. For an affine subspace N of Rⁿ let G(N) = {g_{α,ν,ν′} : α ≠ 0, ν′ ∈ N} for some fixed but arbitrary ν ∈ N, where the affine map g_{α,ν,ν′} is given by g_{α,ν,ν′}(y) = α(y − ν) + ν′ with α ∈ R. Observe that G(N) does not depend on the choice of ν (in particular, if N is a linear subspace, one may choose ν = 0). Hence, G(N) can also be written in a redundant way as G(N) = {g_{α,ν,ν′} : α ≠ 0, ν ∈ N, ν′ ∈ N}. It is easy to see that G(N) is a group w.r.t. composition which is non-abelian except if N is a singleton. For later use we also note that N as well as Rⁿ \ N are invariant under G(N), and that G(N) acts transitively on N (but not on Rⁿ \ N in general). Furthermore, note that the elements of G(N) can also be written as g_{α,ν,ν′}(y) = αy + (1 − α)ν + (ν′ − ν).

Remark 5.1. We make an observation on the structure of G(N). Let G₁(N) denote the collection of transformations g_{α,ν,ν}(y) for every α ≠ 0 and every ν ∈ N, and let G₂(N) denote the collection of transformations g_{1,ν,ν′}(y) for every pair ν, ν′ ∈ N. Obviously, G₁(N) as well as G₂(N) are subsets of G(N), and every element of G(N) is the composition of an element in G₁(N) with an element of G₂(N). While G₂(N) is a subgroup, G₁(N) is not (as it is not closed under composition) except in the trivial case where N is a singleton. However, the group generated by G₁(N) is precisely G(N). As a consequence, any function S which is invariant under the elements of G₁(N) (meaning that S(g(y)) = S(y) for all y ∈ Rⁿ and all g ∈ G₁(N)) is already invariant under the entire group G(N), and a similar statement holds for almost invariance.

Proposition 5.2. A maximal invariant for G(N) is given by

h(y) = ⟨Π_{(N−ν∗)⊥}(y − ν∗)/‖Π_{(N−ν∗)⊥}(y − ν∗)‖⟩,

where ν∗ is an arbitrary element of N. The maximal invariant h in fact does not depend on the choice of ν∗ ∈ N. [Here we use the convention x/‖x‖ = 0 if x = 0.]

Remark 5.3. Specializing to the case N = M₀ it is obvious that Π_{(M₀−µ₀)⊥}(y − µ₀) can be computed as y − Xˆβ_rest(y), where ˆβ_rest denotes the restricted ordinary least squares estimator. It follows that any test that is invariant under G(M₀) depends only on the normalized restricted least squares residuals, in fact only on ⟨(y − Xˆβ_rest(y))/‖y − Xˆβ_rest(y)‖⟩. [For the tests considered in Subsection 5.4 one can obtain this result also directly from the definition of the tests.]

Consider now the problem of testing H₀ versus H₁ as defined in (4). First observe that the sets M₀ and M₁ are invariant under the transformations in G(M₀).
This implies that the parameter spaces Mᵢ × (0, ∞) × C corresponding to Hᵢ (for i = 0, 1) are each invariant under the associated group ¯G(M₀), i.e., the group consisting of all transformations ¯g_{α,µ₀,µ₀′} defined on M × (0, ∞) × C given by

¯g_{α,µ₀,µ₀′}(µ, σ², Σ) = (α(µ − µ₀) + µ₀′, α²σ², Σ)

where α ≠ 0, µ₀ ∈ M₀, µ₀′ ∈ M₀. [Note that the associated group strictly speaking also depends on C, but we suppress this in the notation.] Second, the probability measures associated with H₀ and H₁ clearly satisfy

P_{µ,σ²Σ}(A) = P_{α(µ−µ₀)+µ₀′,α²σ²Σ}(α(A − µ₀) + µ₀′) (14)

for every (µ, σ², Σ) ∈ M × (0, ∞) × C and every Borel set A ⊆ Rⁿ. This shows that the testing problem considered in (4) is invariant under the group G(M₀) in the sense of Lehmann and Romano (2005), Chapters 6 and 8. While trivial, it will be useful to note that (14) continues to hold if Σ ∈ C is replaced by an arbitrary nonnegative definite symmetric n × n matrix Φ. The next proposition discusses invariance properties of the rejection probabilities of an almost invariant test ϕ that will be needed in subsequent subsections. As will be seen later, it is useful to consider in that proposition the rejection probabilities E_{µ,σ²Φ}(ϕ) also for Φ a positive (or sometimes only nonnegative) definite symmetric n × n matrix not necessarily belonging to the assumed covariance model C.

Proposition 5.4. Let ϕ : Rⁿ → [0, 1] be a Borel-measurable function that is almost invariant under G(M₀).

1. For every (µ, σ²) ∈ M × (0, ∞) and for every positive definite symmetric n × n matrix Φ the rejection probabilities satisfy

E_{µ,σ²Φ}(ϕ) = E_{α(µ−µ₀)+µ₀′,α²σ²Φ}(ϕ) (15)

for all α ≠ 0, µ₀ ∈ M₀, µ₀′ ∈ M₀.

2. For every (µ, σ²) ∈ M × (0, ∞) and every positive definite symmetric n × n matrix Φ we have the representation

E_{µ,σ²Φ}(ϕ) = E_{Π_{(M₀−µ₀)⊥}(µ−µ₀)/σ + µ₀, Φ}(ϕ) = E_{⟨Π_{(M₀−µ₀)⊥}(µ−µ₀)/σ⟩ + µ₀, Φ}(ϕ) (16)

where µ₀ is an arbitrary element of M₀. [Note that Π_{(M₀−µ₀)⊥}(µ − µ₀)/σ actually does not depend on the choice of µ₀, and Π_{(M₀−µ₀)⊥}(µ − µ₀) can be computed as µ − Xˆβ_rest(µ).]

3. The rejection probability E_{µ,σ²Φ}(ϕ) depends on (µ, σ²) ∈ M × (0, ∞) and Φ (Φ symmetric and positive definite) only through (⟨Π_{(M₀−µ₀)⊥}(µ − µ₀)/σ⟩, Φ). Furthermore, Π_{(M₀−µ₀)⊥}(µ − µ₀)/σ is in a bijective correspondence with (Rβ − r)/σ where β denotes the coordinates of µ in the basis given by the columns of X. Thus the rejection probability E_{µ,σ²Φ}(ϕ) depends on (µ, σ²) ∈ M × (0, ∞) and Φ only through (⟨(Rβ − r)/σ⟩, Φ).

4. If ϕ is invariant under G(M₀), then (15) and (16) hold even if Φ is only nonnegative definite and symmetric (and consequently in this case also the claim in Part 3 continues to hold for such Φ).

Remark 5.5. (i) For Φ = Σ ∈ C relation (15) expresses the fact that the rejection probability of the almost invariant test ϕ is invariant under the associated group ¯G(M₀).

(ii) Setting α = 1 in (15) and holding σ² and Φ fixed, we see that the rejection probability is, in particular, constant along that translation of M₀ which passes through µ.

(iii) If µ ∈ M₀, choosing µ₀ = µ, α = σ⁻¹ in (15), and fixing µ₀′ ∈ M₀, shows that E_{µ,σ²Φ}(ϕ) = E_{µ₀′,Φ}(ϕ). Hence, for µ ∈ M₀, the rejection probability is constant in (µ, σ²) and only depends on Φ.
(iv) Occasionally we consider tests ϕ that are only required to be almost invariant under thesubgroup of transformations y αy +(1 − α ) µ for a fixed µ ∈ M , i.e., under the group G ( { µ } ).38he results in the above propositions can be easily adapted to this case and we refrain from spellingout the details. We only note that the analogue to (15) in this case is given by E µ,σ Φ ( ϕ ) = E α ( µ − µ )+ µ ,α σ Φ ( ϕ ) (17)for all α = 0.Part 2 of the above proposition has shown that the rejection probability depends on the pa-rameters only through (cid:16)D Π ( M − µ ) ⊥ ( µ − µ ) /σ E , Σ (cid:17) . This quantity is recognized as a maximalinvariant in the next result. Proposition 5.6. Let µ ∈ M be arbitrary. Then (cid:16)D Π ( M − µ ) ⊥ ( µ − µ ) /σ E , Σ (cid:17) is a maximalinvariant for the associated group G ( M ) . We next establish a negative result providing conditions under which (i) the size of a test is 1,and/or (ii) the power function of a test gets arbitrarily close to zero. The theorem is based on a”concentration effect” that we explain now: Suppose one can find a sequence Σ m ∈ C convergingto a singular matrix ¯Σ and let Z denote the span of the columns of ¯Σ. Let µ ∈ M . Sincethe probability measures P µ ,σ Σ m converge weakly to P µ ,σ ¯Σ , which has support µ + Z , theyconcentrate their mass more and more around µ + Z . Suppose first that one can show that µ + Z is essentially contained in the interior of the rejection region W in the sense that the setof points in µ + Z which are not interior points of W has λ µ + Z -measure zero. It then followsthat P µ ,σ Σ m ( W ) converges to P µ ,σ ¯Σ ( W ) ≥ P µ ,σ ¯Σ ( µ + Z ) = 1, establishing that the sizeof the test is 1. Now, in some cases of interest it turns out that µ + Z fails to satisfy the justmentioned ”interiority” condition with respect to the rejection region W , but it also turns out that itdoes satisfy the ”interiority” condition with respect to an ”equivalent” rejection region W ′ , which isobtained by adjoining a λ R n -null set to W (for example, for W ′ = W ∪ ( µ + Z )). Since the rejectionprobabilities corresponding to W and W ′ are identical (as any Σ ∈ C is positive definite) and thusthe two tests have the same size, the above reasoning can then be applied to W ′ , again showingthat the size of the test based on W is 1 for these cases. Part 1 of Theorem 5.7 below formalizes thisreasoning. The same ”concentration effect” reasoning applied to R n \ W instead of W then gives(20). [The remaining claims in Part 2 as well as Part 3 are then consequences of (20) combinedwith continuity or invariance properties of the power function.] It should, however, be stressed thatweak convergence of P µ ,σ Σ m to P µ ,σ ¯Σ together with the inclusion µ + Z ⊆ W (except possiblyfor a λ µ + Z -null set) alone is not sufficient to allow one to draw the conclusion – as tempting as itmay be – that P µ ,σ Σ m ( W ) → P µ ,σ ¯Σ ( W ) = 1 holds. Counterexampleswhere P µ ,σ Σ m converges weakly to P µ ,σ ¯Σ and µ + Z ⊆ W (and thus P µ ,σ ¯Σ ( W ) = 1) holds,but where P µ ,σ Σ m ( W ) converges to a positive number less than 1 are easily found with the help ofTheorem 5.10. We furthermore note that in a different testing context Martellosio (2010) providesa result which also makes use of a ”concentration effect”, but his result is not correct as given. 
Fora discussion of these issues and corrected results see Preinerstorfer and P¨otscher (2014).The ”concentration effect” reasoning underlying Theorem 5.7 of course hinges crucially on the”interiority” condition (either w.r.t. W or w.r.t. R n \ W ), raising the question why we should expectthis to be satisfied in the applications we have in mind, rather than expect that µ + Z intersects39ith both W and R n \ W in such a way that the ”interiority” condition is neither satisfied w.r.t. W nor w.r.t. R n \ W . Consider the case where Z is one-dimensional, a case of paramount importance inthe applications, and suppose also that W is invariant under the group G ( M ). Then we have thedichotomy that ( µ + Z ) \ { µ } either lies entirely in W or in R n \ W , showing that – except possiblyfor the point µ – the set µ + Z never intersects both W and R n \ W . Moreover, if an elementof ( µ + Z ) \ { µ } belongs to the interior of W (of R n \ W , respectively), then ( µ + Z ) \ { µ } inits entirety is a subset of the interior of W (of R n \ W , respectively). Hence, under the mentionedinvariance and for one-dimensional Z , one can expect the ”interiority” conditions in the subsequenttheorem to be satisfied not infrequently. Theorem 5.7. Let W be a Borel set in R n , the rejection region of a test. Furthermore, assumethat Z is a concentration space of the covariance model C . Then the following holds:1. If µ ∈ M satisfies λ µ + Z (bd ( W ∪ ( µ + Z ))) = 0 , (18) then for every < σ < ∞ sup Σ ∈ C P µ ,σ Σ ( W ) = 1 holds; in particular, the size of the test equals . [In case W is of the form { y ∈ R n : T ( y ) ≥ C } for some Borel-measurable function T : R n R and < C < ∞ , a sufficient condition for(18) is that for λ Z -almost every z ∈ Z the test statistic T satisfies T ( µ + z ) > C and islower semicontinuous at µ + z .]2. If µ ∈ M satisfies λ µ + Z (bd (( R n \ W ) ∪ ( µ + Z ))) = 0 , (19) then for every < σ < ∞ inf Σ ∈ C P µ ,σ Σ ( W ) = 0 , (20) and hence inf µ ∈ M inf Σ ∈ C P µ ,σ Σ ( W ) = 0 , holds for every < σ < ∞ . In particular, the test is biased (except in the trivial case whereits size is zero). [In case W is of the form { y ∈ R n : T ( y ) ≥ C } for some Borel-measurablefunction T : R n R and < C < ∞ , a sufficient condition for (19) is that for λ Z -almostevery z ∈ Z the test statistic T satisfies T ( µ + z ) < C and is upper semicontinuous at µ + z .]3. Suppose that condition (20) is satisfied for some µ ∈ M and some < σ < ∞ . Further-more, assume that W is almost invariant under the group G ( { µ } ) . Then for every µ ∈ M we have inf <σ < ∞ inf Σ ∈ C P µ ,σ Σ ( W ) = 0 . [In case W is of the form { y ∈ R n : T ( y ) ≥ C } for some Borel-measurable function T : R n R and < C < ∞ , almost invariance of W under the group G ( { µ } ) follows from almostinvariance of T under G ( { µ } ) .] emark 5.8. (i) The conclusions of the above theorem immediately also apply to every teststatistic T ′ that is λ R n -almost everywhere equal to a test statistic T satisfying the assumptions ofthe theorem.(ii) Let ϕ : R n [0 , 1] be Borel-measurable, i.e., a test. If the set { y : ϕ ( y ) = 1 } satisfies theassumptions on W in Part 1 of the above theorem, then for every 0 < σ < ∞ sup Σ ∈ C E µ ,σ Σ ( ϕ ) = 1holds. If the set { y : ϕ ( y ) = 0 } satisfies the assumptions on R n \ W in Part 2 of the above theoremthen for every 0 < σ < ∞ inf Σ ∈ C E µ ,σ Σ ( ϕ ) = 0holds. A similar remark applies to Part 3 of the theorem, provided ϕ is almost invariant under G ( { µ } ). Remark 5.9. 
If the covariance model C contains AR(1) correlation matrices Λ( ρ m ) for some se-quence ρ m ∈ ( − , 1) with ρ m → ρ m → − 1, respectively), then span ( e + ) (span ( e − ), respectively)is a concentration space of C (cf. Lemma G.1 in Appendix G). Hence Theorem 5.7 applies with Z = span ( e + ) ( Z = span ( e − ), respectively). In particular, if C contains C AR (1) , then Theorem 5.7applies with Z = span ( e + ) as well as with Z = span ( e − ). The next theorem isolates conditions under which a test does not suffer from the extreme size andpower problems encountered in the preceding subsection. In particular, we provide conditions whichguarantee that the size is bounded away from one and that the power function is bounded awayfrom zero. The theorem assumes that the test ϕ – apart from being (almost) invariant under thegroup G ( M ) – is also invariant under addition of elements of J ( C ) defined below. This additionalinvariance assumption will be automatically satisfied in the important special case where ϕ isinvariant under the group G ( M ) and where J ( C ) ⊆ M − µ for some µ ∈ M (and hence for all µ ∈ M ) as then the maps x x + z for z ∈ J ( C ) are elements of G ( M ); see also Proposition 5.23and the attending discussion in Subsection 5.4. A second assumption of the subsequent theorem isthat the covariance model C is bounded which is typically a harmless assumption in applications asit is, e.g., always satisfied if the elements of C are normalized such that the largest diagonal elementis 1, or such that the trace is 1. The theorem also maintains a further assumption on the covariancemodel C related to the way sequences of elements in C approach singular matrices. This conditionhas to be verified for the covariance model C in any particular application. A verification for C AR (1) is given in Appendix G, cf. also Remarks 5.14 and 5.20.For a covariance model C define now J ( C ) = [ n span( ¯Σ) : det ¯Σ = 0, ¯Σ = lim m →∞ Σ m for a sequence Σ m ∈ C o , i.e., J ( C ) is the union of all concentration spaces of the covariance model C . [Note that the subse-quent results remain valid in the case where J ( C ) is empty.] Theorem 5.10. Let ϕ : R n → [0 , be a Borel-measurable function that is almost invariant under G ( M ) . Suppose that ϕ is neither λ R n -almost everywhere equal to nor λ R n -almost everywhere qual to . Suppose further that ϕ ( x + z ) = ϕ ( x ) for every x ∈ R n and every z ∈ J ( C ) . (21) Assume that C is bounded (as a subset of R n × n ). Assume also that for every sequence Σ m ∈ C converging to a singular ¯Σ there exists a subsequence ( m i ) i ∈ N and a sequence of positive realnumbers s m i such that the sequence of matrices D m i = Π span(¯Σ) ⊥ Σ m i Π span(¯Σ) ⊥ /s m i converges to amatrix D which is regular on the orthogonal complement of span( ¯Σ) (meaning that the linear mapcorresponding to D is injective when restricted to the orthogonal complement of span( ¯Σ) ) . Thenthe following holds:1. The size of the test ϕ is strictly less than , i.e., sup µ ∈ M sup <σ < ∞ sup Σ ∈ C E µ ,σ Σ ( ϕ ) < . Furthermore, inf µ ∈ M inf <σ < ∞ inf Σ ∈ C E µ ,σ Σ ( ϕ ) > . 2. Suppose additionally that for every sequence ν m ∈ Π ( M − µ ) ⊥ ( M − µ ) with k ν m k → ∞ andfor every sequence Φ m of positive definite symmetric n × n matrices with Φ m → Φ , Φ positivedefinite, we have lim inf m →∞ E ν m + µ , Φ m ( ϕ ) > , (22) where µ is an element of M . [This condition clearly does not depend on the particular choiceof µ ∈ M .]. 
Then the infimal power is bounded away from zero, i.e., inf µ ∈ M inf <σ < ∞ inf Σ ∈ C E µ ,σ Σ ( ϕ ) > . 3. Suppose that the limit inferior in (22) is for every sequence ν m and Φ m as specified above.Then for every < c < ∞ inf µ ∈ M , <σ < ∞ d ( µ , M ) /σ ≥ c E µ ,σ Σ m ( ϕ ) → holds for m → ∞ and for any sequence Σ m ∈ C satisfying Σ m → ¯Σ with ¯Σ a singular matrix.Furthermore, for every sequence < c m < ∞ inf µ ∈ M ,d ( µ , M ) ≥ c m E µ ,σ m Σ m ( ϕ ) → holds for m → ∞ whenever < σ m < ∞ , c m /σ m → ∞ , and the sequence Σ m ∈ C satisfies Σ m → ¯Σ with ¯Σ a positive definite matrix. [The very last statement even holds withoutrecourse to condition (21) and the condition on C following (21).] Of course, D maps every element of span(¯Σ) into zero by construction. (cid:13)(cid:13)(cid:13)(cid:16) Rβ (1) − r (cid:17) /σ (cid:13)(cid:13)(cid:13) is bounded away from zero and Σ m approaches asingular matrix, or that (cid:13)(cid:13)(cid:13)(cid:16) Rβ (1) − r (cid:17) /σ (cid:13)(cid:13)(cid:13) → ∞ and Σ m approaches a positive definite matrix.Here β (1) is the parameter vector corresponding to µ . Note that d ( µ , M ) is bounded from aboveas well as from below by multiples of (cid:13)(cid:13)(cid:13) Rβ (1) − r (cid:13)(cid:13)(cid:13) , where the constants involved are positive anddepend only on X , R , and r . Remark 5.11. (i) Because J ( C ) as a union of linear spaces is homogenous, condition (21) isequivalent to the condition that ϕ ( x + z ) = ϕ ( x ) holds for every x ∈ R n and every z ∈ span ( J ( C )).(ii) If condition (22) in Theorem 5.10 is replaced by the weaker conditionlim inf m →∞ E d m ( µ − µ )+ µ , Φ m ( ϕ ) > , (25)for every µ ∈ M , for every d m → ∞ and every sequence Φ m of positive definite symmetric n × n matrices with Φ m → Φ, Φ a positive definite matrix, then we can only establish for every µ ∈ M that inf <σ < ∞ inf Σ ∈ C E µ ,σ Σ ( ϕ ) > . If the limes inferior in (25) is 1 for every µ , d m , and Φ m as specified above, then for every µ ∈ M and every 0 < σ ∗ < ∞ we have inf <σ ≤ σ ∗ E µ ,σ Σ m ( ϕ ) → m ∈ C satisfying Σ m → ¯Σ with ¯Σ a singular matrix; and also E µ ,σ m Σ m ( ϕ ) → σ m → m ∈ C satisfies Σ m → ¯Σ with ¯Σ a positive definitematrix. [The very last statement even holds without recourse to condition (21) and the conditionon C following (21).]The subsequent theorem elaborates on Part 1 of Theorem 5.10 and shows that under the addi-tional assumptions one can not only guarantee that the size of the test is smaller than 1, but onecan, for any prescribed significance level δ (0 < δ < δ . The result applies in particular to the important case where the tests areof the form ϕ C = ( T ≥ C ) for some test statistic T . Note that for any C k ↑ ∞ the sequence oftests ϕ C k clearly satisfies condition (26) in the subsequent theorem provided { y : T ( y ) = ∞} is a λ R n -null set. Thus in this case the theorem shows that for any given significance level δ , 0 < δ < C ( δ ) such that the test ϕ C ( δ ) has a size not exceeding δ . Theorem 5.12. Let ϕ k : R n → [0 , for k ≥ be a sequence of Borel-measurable functions each ofwhich satisfies the assumptions for Part 1 of Theorem 5.10, and let C also satisfy the assumptionsof that theorem. Furthermore assume that the sequence ϕ k satisfies E µ ∗ , Φ ( ϕ k ) ↓ as k ↑ ∞ for some µ ∗ ∈ M and all positive definite symmetric n × n matrices Φ . Then for every δ , < δ < , there exists a k = k ( δ ) such that sup µ ∈ M sup <σ < ∞ sup Σ ∈ C E µ ,σ Σ ( ϕ k ) ≤ δ. emark 5.13. 
(i) The assumption in Theorem 5.10 that ϕ k is not λ R n -almost everywhere equalto 0 is of course irrelevant for the result in Theorem 5.12.(ii) Of course, the second part of Part 1 of Theorem 5.10 immediately applies to ϕ k ; and Parts2 and 3 of that theorem also apply to ϕ k provided ϕ k satisfies the respective additional conditions. Remark 5.14. (i) In case the covariance model C equals C AR (1) , the boundedness condition inTheorems 5.10 and 5.12 is clearly satisfied and J ( C ) reduces to span ( e + ) ∪ span ( e − ). Furthermore,the condition on the covariance model C in those theorems expressed in terms of the matrices D m isthen also satisfied as shown in Lemma G.1 in Appendix G. Also note that in this case the sequencesΣ m in Part 3 of Theorem 5.10 converging to a singular matrix are of the form Λ ( ρ m ) with ρ m → ρ m → − C is norm-bounded, has e + e ′ + and e − e ′− as the only singularaccumulation points, and has the property that for every sequence Σ m ∈ C converging to one of theselimit points there exists a sequence ( ρ m ) m ∈ N in ( − , 1) such that Λ − / ( ρ m )Σ m Λ − / ( ρ m ) → I n for m → ∞ (that is, near the ”singular boundary” the covariance model C behaves similar to C AR (1) ).Then J ( C ) is as in (i) and again the conditions on the covariance model C in Theorems 5.10 and5.12 are satisfied. F -type tests In this subsection we specialize the preceding results to a broad class of tests of linear restrictionsin linear regression models with nonspherical errors and derive a further result specific to this class.The class considered in this subsection contains the vast majority of tests proposed in the literaturefor this testing problem. We start with a pair of estimators ˇ β and ˇΩ, where ˇΩ typically has theinterpretation of an estimator of the variance covariance matrix of R ˇ β − r under the null hypothesis.Similar as in previous sections, the estimators are viewed as functions of y ∈ R n , but it proves usefulto allow for cases where the estimators are not defined for some exceptional values of y . We imposethe following assumption on the estimators. Assumption 5. (i) The estimators ˇ β : R n \ N → R k and ˇΩ : R n \ N → R q × q are well-defined andcontinuous on the complement of a closed λ R n -null set N in the sample space R n , with ˇΩ also beingsymmetric on R n \ N .(ii) The set R n \ N is invariant under the group G ( M ) , i.e., y ∈ R n \ N implies αy + Xγ ∈ R n \ N for every α = 0 and every γ ∈ R k .(iii) The estimators satisfy the equivariance properties ˇ β ( αy + Xγ ) = α ˇ β ( y ) + γ and ˇΩ( αy + Xγ ) = α ˇΩ( y ) for every y ∈ R n \ N , for every α = 0 , and for every γ ∈ R k .(iv) ˇΩ is λ R n -almost everywhere nonsingular on R n \ N . We make a few obvious observations: First, the invariance of R n \ N under the group G ( M )expressed in Assumption 5 is equivalent to the same invariance property of N itself. Second, since N is closed by Assumption 5, it follows that either N is empty or otherwise must at least contain M (to see this note that y ∈ N implies αy ∈ N for α arbitrarily close to zero which in turn implies0 ∈ N by closedness of N ). Third, given Assumption 5 holds, the sets (cid:8) y ∈ R n \ N : det ˇΩ( y ) = 0 (cid:9) and (cid:8) y ∈ R n \ N : det ˇΩ( y ) = 0 (cid:9) are invariant under the transformations in G ( M ), and the set N ∗ = N ∪ (cid:8) y ∈ R n \ N : det ˇΩ( y ) = 0 (cid:9) (27)44s a closed λ R n -null set that is also invariant under the transformations in G ( M ); cf. Lemma F.1in Appendix F. 
Hence, the set (cid:8) y ∈ R n \ N : det ˇΩ( y ) = 0 (cid:9) could in principle have been absorbedinto N in the above assumption; however, we shall not do so since keeping the exceptional set N as small as possible will lead to stronger results. Furthermore, M ⊆ N ∗ always holds. To see thisnote that M ⊆ N ⊆ N ∗ holds if N is not empty as noted above; in case N is empty, ˇΩ( y ) iswell-defined for every y and ˇΩ(0) = ˇΩ( α 0) = α ˇΩ(0) must hold, implying ˇΩ(0) = 0 and thus alsoˇΩ( Xγ ) = ˇΩ( α Xγ ) = α ˇΩ(0) = 0. In particular, this shows that either ˇΩ is not defined on M or is zero on M .Given estimators ˇ β and ˇΩ satisfying Assumption 5 we define the test statistic T ( y ) = ( ( R ˇ β ( y ) − r ) ′ ˇΩ − ( y )( R ˇ β ( y ) − r ) , y ∈ R n \ N ∗ , , y ∈ N ∗ . (28)We note that assigning the test statistic the value zero at points y ∈ R n for which either y ∈ N or det( ˇΩ)( y ) = 0 holds is arbitrary, but has no effect on the rejection probabilities of the test, since N ∗ is a λ R n -null set as noted above and since all relevant probability measures P µ,σ Σ are absolutelycontinuous w.r.t. Lebesgue measure on R n .In line with the interpretation of ˇΩ as an estimator for a variance covariance matrix, the leadingcase is when ˇΩ is positive definite almost everywhere (which under Assumption 5 is equivalent tononnegative definiteness almost everywhere). However, sometimes we encounter situations wherethis is not guaranteed for a given fixed sample size (cf. Subsection 3.3), although typically theprobability of being positive definite will go to one for each fixed value of the parameters as samplesize increases. In order to be able to accommodate also such cases, Assumption 5 does not containa requirement that ˇΩ is positive definite almost everywhere. Nevertheless, in light of what has justbeen said, we shall consider the rejection region to be of the form { y ∈ R n : T ( y ) ≥ C } for C a realnumber satisfying 0 < C < ∞ .For some of the results that follow we shall need further conditions on ˇΩ which, however, aremuch weaker than the almost everywhere positive definiteness requirement just mentioned. Assumption 6. There exists v ∈ R q , v = 0 , and a y ∈ R n \ N ∗ such that v ′ ˇΩ − ( y ) v > holds. Since under Assumption 5 the matrix ˇΩ − ( y ) is continuous on R n \ N ∗ , it follows that Assumption6 in fact implies that v ′ ˇΩ − ( y ) v > y ’s. The condition expressed in thenext assumption is also certainly satisfied if ˇΩ is positive definite almost everywhere. At first glanceit may seem that this condition rules out the case where ˇΩ( y ) is allowed to be indefinite on a set ofpositive Lebesgue measure, but this is not so as v is not allowed to depend on y in this condition. Assumption 7. For every v ∈ R q with v = 0 we have λ R n (cid:0)(cid:8) y ∈ R n \ N ∗ : v ′ ˇΩ − ( y ) v = 0 (cid:9)(cid:1) = 0 . The following lemma collects some properties of the test statistic that will be useful in thesequel. Lemma 5.15. Suppose Assumption 5 is satisfied and let T be the test statistic defined in (28).Then the following holds:1. The set R n \ N ∗ is invariant under the elements of G ( M ) .2. The test statistic T is continuous on R n \ N ∗ ; in particular, T is λ R n -almost everywhere con-tinuous on R n . . The test statistic T is invariant under the group G ( M ) . Consequently, the rejection region W ( C ) = { y ∈ R n : T ( y ) ≥ C } and its complement are invariant under G ( M ) .4. The set { y ∈ R n : T ( y ) = C } is a λ R n -null set for every < C < ∞ .5. Suppose < C < ∞ holds. 
Then { y ∈ R n \ N ∗ : T ( y ) > C } (= { y ∈ R n : T ( y ) > C } ) is anopen set in R n , which is guaranteed to be non-empty under Assumption 6. Consequently, underAssumption 6 the rejection region W ( C ) contains a non-empty open set and thus satisfies λ R n ( W ( C )) > .6. Suppose < C < ∞ holds. Then { y ∈ R n \ N ∗ : T ( y ) < C } is a non-empty open set in R n .Consequently, the complement of the rejection region W ( C ) contains a non-empty open setand thus satisfies λ R n ( R n \ W ( C )) > .7. Suppose Assumption 7 and < C < ∞ hold. Then, for every µ ∈ M , every sequence ν m ∈ Π ( M − µ ) ⊥ ( M − µ ) with k ν m k → ∞ , and for every sequence Φ m of positive definitesymmetric n × n matrices with Φ m → Φ , Φ a positive definite matrix, we have that lim inf m →∞ P ν m + µ , Φ m ( W ( C )) = inf v ∈ A (( ν m ) m ≥ ) Pr (cid:16) v ′ ˇΩ − (Φ / G ) v ≥ (cid:17) = inf v ∈ A (( ν m ) m ≥ ) Pr (cid:16) v ′ ˇΩ − (Φ / G ) v > (cid:17) (29) where A (( ν m ) m ≥ ) is the set of all accumulation points of the sequence R ( X ′ X ) − X ′ ν m / (cid:13)(cid:13)(cid:13) R ( X ′ X ) − X ′ ν m (cid:13)(cid:13)(cid:13) , and where G is a standard normal n -vector. A lower bound that does not depend on thesequence ν m is as follows: lim inf m →∞ P ν m + µ , Φ m ( W ( C )) ≥ inf v ∈ R q , k v k =1 Pr (cid:16) v ′ ˇΩ − (Φ / G ) v ≥ (cid:17) = inf v ∈ R q , k v k =1 Pr (cid:16) v ′ ˇΩ − (Φ / G ) v > (cid:17) ≥ Pr (cid:16) ˇΩ(Φ / G ) is nonnegative definite (cid:17) . (30) In particular, if ˇΩ is nonnegative definite λ R n -almost everywhere (implying that Assumption7 is satisfied), this lower bound is . Remark 5.16. (i) Because A (( ν m ) m ≥ ) is a closed subset of the unit ball in R q and because themap v Pr (cid:0) v ′ ˇΩ − (Φ / G ) v ≥ (cid:1) is continuous on the unit ball under Assumption 7, we see thatthe expressions in (29) are positive if and only if λ R n (cid:0)(cid:8) y ∈ R n \ N ∗ : v ′ ˇΩ − ( y ) v ≥ (cid:9)(cid:1) > v ∈ A (( ν m ) m ≥ ). Under Assumption 7 we have λ R n (cid:0)(cid:8) y ∈ R n \ N ∗ : v ′ ˇΩ − ( y ) v ≥ (cid:9)(cid:1) = λ R n (cid:0)(cid:8) y ∈ R n \ N ∗ : v ′ ˇΩ − ( y ) v > (cid:9)(cid:1) for every v = 0 and hence, by continuity of ˇΩ − ( y ) on R n \ N ∗ ,condition (31) for some v = 0 is in turn equivalent to v ′ ˇΩ − ( y ) v > y = y ( v ) ∈ R n \ N ∗ .46ii) Let ˇ β and ˇΩ satisfy Assumption 5, let T be the test statistic defined in (28), and supposethat we now use a ”random” critical value ˇ C = ˇ C ( y ) > y ∈ R n . Suppose that ˇ C is contin-uous on R n \ N and satisfies the invariance condition ˇ C ( αy + Xγ ) = ˇ C ( y ) for every y ∈ R n \ N ,every α = 0, and for every γ ∈ R k . Rewriting the rejection region (cid:8) y ∈ R n : T ( y ) ≥ ˇ C (cid:9) as (cid:8) y ∈ R n : T ( y ) / ˇ C ≥ (cid:9) and observing that ¯Ω( y ) = ˇ C ( y ) ˇΩ( y ) satisfies Assumption 5 shows thatthe results of this subsection also apply to the test with rejection region (cid:8) y ∈ R n : T ( y ) ≥ ˇ C (cid:9) .As a corollary to Theorem 5.7, we now obtain negative size and power results for tests of theform (28). The semicontinuity conditions in Theorem 5.7 are implied by continuity properties ofthe estimators ˇΩ and ˇ β used in the construction of the test. The sufficient conditions so obtainedare easy to verify in practice and become particularly simple in the practically relevant case wheredim ( Z ) = 1, cf. the remark following the corollary. Corollary 5.17. 
Let ˇ β and ˇΩ satisfy Assumption 5 and let T be the test statistic defined in (28).Furthermore, let W ( C ) = { y ∈ R n : T ( y ) ≥ C } with < C < ∞ be the rejection region. Supposethat Z is a concentration space of the covariance model C . Recall that N is the exceptional set inAssumption 5 and that N ∗ is given by (27). Then the following holds:1. Suppose we have for some µ ∗ ∈ M that z ∈ R n \ N ∗ and T ( µ ∗ + z ) > C hold simultaneously λ Z -almost everywhere. Then sup Σ ∈ C P µ ,σ Σ ( W ( C )) = 1 holds for every µ ∈ M and every < σ < ∞ . In particular, the size of the test is equal toone.2. Suppose we have for some µ ∗ ∈ M that z ∈ R n \ N ∗ and T ( µ ∗ + z ) < C hold simultaneously λ Z -almost everywhere. Then inf Σ ∈ C P µ ,σ Σ ( W ( C )) = 0 holds for every µ ∈ M and every < σ < ∞ , and hence inf µ ∈ M inf Σ ∈ C P µ ,σ Σ ( W ( C )) = 0 , holds for every < σ < ∞ . In particular, the test is biased (except in the trivial casewhere its size is zero). Furthermore, the nuisance-infimal rejection probability at every point µ ∈ M is zero, i.e., inf <σ < ∞ inf Σ ∈ C P µ ,σ Σ ( W ( C )) = 0 . In particular, the infimal power of the test is equal to zero.3. Suppose ˇΩ is nonnegative definite on R n \ N . If z ∈ R n \ N , ˇΩ( z ) = 0 , and R ˇ β ( z ) = 0 holdsimultaneously λ Z -almost everywhere, then sup Σ ∈ C P µ ,σ Σ ( W ( C )) = 1 holds for every µ ∈ M and every < σ < ∞ . In particular, the size of the test is equal toone. emark 5.18. (i) Since T in the above corollary is invariant under G ( M ), the condition inthe corollary does not depend on the particular choice of µ ∗ ∈ M . Furthermore, if Z is one-dimensional, the invariance of T shows that T ( µ ∗ + z ) > C already holds for all z ∈ Z with z = 0provided it holds for one z ∈ Z with z = 0. In a similar vein, Part 1 of Lemma 5.15 implies forone-dimensional Z that z ∈ R n \ N ∗ holds for all z ∈ Z with z = 0 if and only if z ∈ R n \ N ∗ holdsfor at least one z ∈ Z with z = 0. In view of Assumption 5 a similar statement also applies to therelations z ∈ R n \ N , ˇΩ( z ) = 0, and R ˇ β ( z ) = 0.(ii) We note that the rejection probabilities under the null hypothesis, i.e., P µ ,σ Σ ( W ( C )), donot depend on (cid:0) µ , σ (cid:1) ∈ M × (0 , ∞ ). Hence Remark 3.4(ii) applies here.(iii) In case the covariance model C contains AR(1) correlation matrices, a remark analogous toRemark 5.9 also applies here. Furthermore, note that the concentration spaces derived from theAR(1) correlation matrices are one-dimensional, and hence the discussion in (i) above applies.The negative result in the preceding corollary does not apply if substantial portions of Z belongto the exceptional set N (which in particular occurs if Z ⊆ M holds and N is not empty as then Z ⊆ M ⊆ N ). For this case we provide a further negative result which is applicable provided (32)given below holds. For example, if Z = span ( e + ) and the design matrix contains an intercept, weimmediately obtain Z ⊆ M , and (32) holds if and only if the column in R corresponding to theintercept is nonzero. The significance of the subsequent theorem is that it provides an upper bound K for the power in certain directions which is less than or equal to a lower bound for the size.This will typically imply biasedness of the test (except if equality holds in (33)). Furthermore, notethat the result implies that the test has size 1 in case ˇΩ is positive definite λ R n -almost everywheresince then K = K = 1 follows. 
The condition on the covariance model C is often satisfied, seeRemark 5.20 following the theorem. Theorem 5.19. Let ˇ β and ˇΩ satisfy Assumptions 5 and 7, let T be the test statistic definedin (28), and let W ( C ) = { y ∈ R n : T ( y ) ≥ C } with < C < ∞ be the rejection region. As-sume that there is a sequence Σ m ∈ C such that Σ m → ¯Σ for m → ∞ where ¯Σ is singular with l := dim span( ¯Σ) > . Suppose that for some sequence of positive real numbers s m the matrix D m = Π span(¯Σ) ⊥ Σ m Π span(¯Σ) ⊥ /s m converges to a matrix D , which is regular on span( ¯Σ) ⊥ , and that Π span(¯Σ) ⊥ Σ m Π span(¯Σ) /s / m → . Suppose further that span( ¯Σ) ⊆ M , and let Z be a matrix, thecolumns of which form a basis for span( ¯Σ) . Assume also that R ˆ β ( z ) = 0 λ span(¯Σ) - a.e. (32) is satisfied. Then for every µ ∈ M , every σ with < σ < ∞ , and every M ≥ we have inf γ ∈ R l , k γ k≥ M inf Σ ∈ C P µ + Zγ,σ Σ ( W ( C )) ≤ K ≤ K ≤ sup Σ ∈ C P µ ,σ Σ ( W ( C )) . (33) The constants K and K are given by K = inf γ ∈ R l Pr (cid:0) ¯ ξ ( γ ) ≥ (cid:1) = inf k γ k =1 Pr (cid:0) ¯ ξ ( γ ) ≥ (cid:1) and K = Z Pr (cid:0) ¯ ξ ( γ ) ≥ (cid:1) dP ,A ( γ )48 ith the random variable ¯ ξ ( γ ) given by ¯ ξ ( γ ) = (cid:16) R ˆ β ( Zγ ) (cid:17) ′ ˇΩ − (cid:16)(cid:16) ¯Σ / + D / (cid:17) G (cid:17) R ˆ β ( Zγ ) on the event (cid:8)(cid:0) ¯Σ / + D / (cid:1) G ∈ R n \ N ∗ (cid:9) and by ¯ ξ ( γ ) = 0 otherwise, where G is a standardnormal n -vector. The matrix A denotes ( Z ′ Z ) − Z ′ ¯Σ Z ( Z ′ Z ) − , which is nonsingular, and P ,A denotes the Gaussian distribution on R l with mean zero and variance covariance matrix A . Remark 5.20. Suppose the covariance model C contains C AR (1) , or, more generally, C containsAR(1) correlation matrices Λ( ρ m ) for some sequence ρ m ∈ ( − , 1) with ρ m → ρ m → − e + e ′ + , span( ¯Σ) = span( e + ), and Z = e + ( ¯Σ = e − e ′− , span( ¯Σ) = span( e − ), and Z = e − ,respectively); cf. Lemma G.1 in Appendix G. Furthermore, condition (32) simplifies to R ˆ β ( e + ) = 0( R ˆ β ( e − ) = 0, respectively).The subsequent theorem specializes the positive result given in Theorems 5.10 and 5.12 to theclass of tests considered in the present subsection. Theorem 5.21. Let ˇ β and ˇΩ satisfy Assumptions 5, 6, and 7. Let T be the test statistic definedin (28). Furthermore, let W ( C ) = { y ∈ R n : T ( y ) ≥ C } with < C < ∞ be the rejection region.Suppose further that T ( y + z ) = T ( y ) for every y ∈ R n and every z ∈ J ( C ) . (34) Assume that C is bounded (as a subset of R n × n ). Assume also that for every sequence Σ m ∈ C converging to a singular ¯Σ there exists a subsequence ( m i ) i ∈ N and a sequence of positive real numbers s m i such that the sequence of matrices D m i = Π span(¯Σ) ⊥ Σ m i Π span(¯Σ) ⊥ /s m i converges to a matrix D which is regular on the orthogonal complement of span( ¯Σ) . Then the following holds:1. The size of the rejection region W ( C ) is strictly less than , i.e., sup µ ∈ M sup <σ < ∞ sup Σ ∈ C P µ ,σ Σ ( W ( C )) < . Furthermore, inf µ ∈ M inf <σ < ∞ inf Σ ∈ C P µ ,σ Σ ( W ( C )) > . 2. Suppose that λ R n (cid:0)(cid:8) y ∈ R n \ N ∗ : v ′ ˇΩ − ( y ) v ≥ (cid:9)(cid:1) > for every v ∈ R q with k v k = 1 . Thenthe infimal power is bounded away from zero, i.e., inf µ ∈ M inf <σ < ∞ inf Σ ∈ C P µ ,σ Σ ( W ( C )) > . 3. Suppose that ˇΩ is nonnegative definite λ R n -almost everywhere. 
Then for every < c < ∞ inf µ ∈ M , <σ < ∞ d ( µ , M ) /σ ≥ c P µ ,σ Σ m ( W ( C )) → olds for m → ∞ and for any sequence Σ m ∈ C satisfying Σ m → ¯Σ with ¯Σ a singular matrix.Furthermore, for every sequence < c m < ∞ inf µ ∈ M ,d ( µ , M ) ≥ c m P µ ,σ m Σ m ( W ( C )) → holds for m → ∞ whenever < σ m < ∞ , c m /σ m → ∞ , and the sequence Σ m ∈ C satisfies Σ m → ¯Σ with ¯Σ a positive definite matrix. [The very last statement even holds withoutrecourse to condition (34) and the condition on C following (34).]4. For every δ , < δ < , there exists a C ( δ ) , < C ( δ ) < ∞ , such that sup µ ∈ M sup <σ < ∞ sup Σ ∈ C P µ ,σ Σ ( W ( C ( δ ))) ≤ δ. Remark 5.22. (i) In case the covariance model C equals C AR (1) , a remark analogous to Remark5.14 also applies here.(ii) Under the assumptions of the preceding theorem, the additional condition in Part 2 of thetheorem is equivalent to v ′ ˇΩ − ( y ) v > v ∈ R q with k v k = 1 and a suitable y = y ( v ) ∈ R n \ N ∗ . Cf. Remark 5.16(i).We now discuss when the preceding theorem can be expected to apply and how the crucialcondition (34) can be enforced. As already noted prior to Theorem 5.10, a sufficient condition for(34) to be satisfied for any test statistic T of the form (28), based on estimators ˇ β and ˇΩ satisfyingAssumption 5, is that J ( C ) ⊆ M − µ for some (and hence all) µ ∈ M holds. This sufficientcondition is equivalent to J ( C ) ⊆ M and R ˆ β ( z ) = 0 for every z ∈ J ( C ), because M − µ coincideswith the set n µ ∈ M : R ˆ β ( µ ) = 0 o . [Note that replacing J ( C ) by span ( J ( C )) in the preceding twosentences leads to equivalent statements because M − µ as well as M are linear spaces.] Nowconsider the general case where J ( C ), or equivalently span ( J ( C )), may not be a subset of M − µ :If there exists a z ∈ span ( J ( C )) ∩ M with z / ∈ M − µ (i.e., with R ˆ β ( z ) = 0), then any test statistic T of the form (28), based on estimators ˇ β and ˇΩ satisfying Assumptions 5 and 7, does not satisfythe invariance condition (34), see Lemma F.3 in Appendix F. Hence, span ( J ( C )) ∩ M ⊆ M − µ ,or in other words R ˆ β ( z ) = 0 for every z ∈ span ( J ( C )) ∩ M , is a necessary condition for (34) to besatisfied for some T as above. We next show how a test statistic of the form (28) satisfying thecrucial invariance condition (34) can in fact be constructed if we impose this necessary condition. Proposition 5.23. Let C be a covariance model and suppose that span ( J ( C )) ∩ M ⊆ M − µ holds.1. Let ¯ M be the linear space spanned by J ( C ) ∪ M . Define ¯ X = ( X, ¯ x , . . . , ¯ x p ) where ¯ x i ∈ span ( J ( C ) ∪ ( M − µ )) are chosen in such a way that the columns of ¯ X form a basis of ¯ M .Assume that k < k + p < n holds. Suppose ¯ θ and ¯Ω are estimators satisfying the analogue ofAssumption 5 obtained by replacing k by k + p , X by ¯ X , and M by ¯ M . Let ¯ N denote the nullset appearing in that analogue of Assumption 5 and ¯ N ∗ = ¯ N ∪ (cid:8) y ∈ R n \ ¯ N : det ¯Ω( y ) = 0 (cid:9) .Define ¯ β = ( I k , 0) ¯ θ . Then ¯ β and ¯Ω satisfy the original Assumption 5 (with N given by ¯ N ),and the test statistic ¯ T given by ¯ T ( y ) = ( ( R ¯ β ( y ) − r ) ′ ¯Ω − ( y )( R ¯ β ( y ) − r ) , y ∈ R n \ ¯ N ∗ , , y ∈ ¯ N ∗ . atisfies the invariance condition (34).2. Let ¯ M and ¯ X be as above and k < k + p < n . Suppose ¯ θ ( y ) = (cid:0) ¯ X ′ ¯ X (cid:1) − ¯ X ′ y is the leastsquares estimator based on ¯ X . 
Then the requirements on ¯ θ postulated in the above mentionedanalogue of Assumption 5 are satisfied, and R ¯ β ( z ) = 0 holds for every z ∈ span ( J ( C )) .Furthermore, if X ∗ = (cid:0) X, x ∗ , . . . , x ∗ p (cid:1) is obtained in the same way as is ¯ X but for anotherchoice of elements x ∗ i ∈ span ( J ( C ) ∪ ( M − µ )) and if θ ∗ denotes the least squares estimatorw.r.t. the design matrix X ∗ , then R ¯ β ( y ) = Rβ ∗ ( y ) holds for every y ∈ R n with β ∗ denoting ( I k , θ ∗ . We next discuss ways of choosing ¯ x , . . . , ¯ x p such that they satisfy the requirements in thepreceding proposition: One natural way is to first find z . . . , z r in J ( C ) that form a basis ofspan J ( C ). From these vectors then select ¯ x = z i . . . , ¯ x p = z i p to complement the columns of X to a basis of ¯ M . An alternative way is based on the observation that adding elements of M − µ to each of the previously found z i j obviously gives rise to another feasible choice of ¯ x i . It hencefollows that an alternative feasible choice for the ¯ x i is to use the projections of the z i j onto theorthogonal complement of M − µ . Of course, if the estimator ¯ θ is chosen to be the least squaresestimator, then Part 2 of the preceding proposition informs us that the particular choice of the ¯ x i has no effect on R ¯ β ( y ) since it is invariant under the choice of the ¯ x i .Part 2 of Proposition 5.23 provides a particular estimator ¯ θ that satisfies the assumptionson ¯ θ maintained in Part 1 of this proposition. Because no particular covariance model C hasbeen specified in Proposition 5.23, we can not provide a similar concrete construction of ¯Ω inthat proposition. The construction of an appropriate ¯Ω has to be done on a case by case basis,depending on the covariance model employed in the particular application. For an example of sucha construction in the context of autocorrelation robust testing see Theorem 3.8. We furthermorenote that similar to the results in Part 2 of Proposition 5.23 such estimators ¯Ω will typically beunchanged whether they are constructed on the basis of the design matrices ¯ X or X ∗ . In particular,this is the case for the estimator constructed in Theorem 3.8.To summarize, the significance of Proposition 5.23 is that it tells us (in conjunction with Theorem5.21) when and how we can construct an adjusted test based on an auxiliary model that does notsuffer from the severe size and power distortions (i.e., size 1 and/or infimal power 0), the adjustmentconsisting of adding appropriate auxiliary regressors to the model. For a concrete implementationsee Theorem 3.8. Remark 5.24. (i) Suppose that the assumptions of Proposition 5.23 hold, except that now p = 0holds. Then J ( C ) ⊆ M and hence R ˆ β ( z ) = 0 holds for every z ∈ J ( C ), implying that actuallythe sufficient condition mentioned prior to the proposition is satisfied. Consequently, as discussedabove, the invariance condition (34) is already satisfied for every T of the form (28) based onestimators ˇ β and ˇΩ satisfying Assumption 5.(ii) Suppose that the assumptions of Proposition 5.23 hold, except that now k + p = n holds(note that k + p ≤ n always holds). Suppose further that T is a test statistic of the form (28) basedon estimators ˇ β and ˇΩ satisfying Assumptions 5 and 6. Then T can never satisfy (34) and henceTheorem 5.21 does not apply in this situation. 
This can be seen as follows: Because of k + p = n itfollows that every y ∈ R n can be written as a linear combination of finitely many z i ∈ J ( C ) plus anelement µ in M . Because invariance w.r.t. addition of elements z ∈ J ( C ) is equivalent to invariancew.r.t. addition of elements z ∈ span ( J ( C )) (cf. Remark 5.11(i)) we see that T ( y ) = T ( µ ) wouldhave to hold under (34). As noted after the introduction of Assumption 5, either M ⊆ N ⊆ N ∗ N is empty. In the second case we have that ˇΩ( µ ) = 0 as a consequence of equivariance.Hence in both cases we arrive at µ ∈ N ∗ and thus at T ( µ ) = 0. But this shows that T is constantequal to zero, contradicting Part 5 of Lemma 5.15.(iii) Proposition 5.23 uses the auxiliary matrix ¯ X and the associated estimators ¯ θ to constructan estimator ¯ β for the parameter β in the originally given regression model (1) and this estimator ¯ β is then used to construct a test statistic ¯ T for the testing problem (4) to which Theorem 5.21 can beapplied. In an alternative view we can consider the auxiliary model Y = ¯ Xθ + U with θ = (cid:0) β ′ , ζ ′ (cid:1) ′ asa model in its own right. [Of course, if we maintain model (1) then ζ = 0 must hold in the auxiliarymodel.] Define the q × ( k + p ) matrix ¯ R = R ( I k , M = (cid:8) µ ∈ ¯ M : µ = ¯ Xθ, ¯ Rθ = r (cid:9) andset ¯ M = ¯ M \ ¯ M , and define a null hypothesis ¯ H and an alternative hypothesis ¯ H analogouslyas in (4). Proposition 5.23 can now be viewed as stating that condition (34) is satisfied for the teststatistic which is obtained by using (28) based on the restriction matrix ¯ R and on the estimators ¯ θ and ¯Ω figuring in Proposition 5.23. Consequently, Theorem 5.21 can be directly applied to this teststatistic (provided ¯Ω satisfies Assumptions 6 and 7). It should be noted that the so-obtained resultnow applies to the problem of testing ¯ H versus ¯ H . However, since M ⊆ ¯ M and M ⊆ ¯ M holdand since T is invariant under translation by elements in span ( J ( C )), we essentially recover thesame result as before. As already noted in Section 2, the negative results given in this paper immediately extend in atrivial way without imposing the Gaussianity assumption on the error vector U in (1) as long asthe assumptions on the feasible error distributions is weak enough to ensure that the implied setof distributions for Y contains the set (cid:8) P µ,σ Σ : µ ∈ M , < σ < ∞ , Σ ∈ C (cid:9) , but possibly containsalso other distributions.Another, less trivial, extension is as follows: Suppose that U is elliptically distributed in thesense that it has the same distribution as ̺ σ Σ / E where 0 < σ < ∞ , Σ ∈ C , E is a random vectoruniformly distributed on the unit sphere S n − , and ̺ is a random variable distributed independentlyof E satisfying Pr( ̺ > 0) = 1. [If ̺ is distributed as the square root of a chi-square with n degreesof freedom we recover the Gaussian situation described in Section 2.] If ϕ is a test that is invariantunder the group G ( M ) then it is easy to see that for µ ∈ M E ( ϕ ( µ + ̺ σ Σ / E )) = E ( ϕ ( µ + Σ / E ))holds. Since this does not depend on the distribution of ̺ at all, we learn that the rejection prob-ability under the null hypothesis is therefore the same as in the Gaussian case. As a consequence,all results concerning only the null behavior of ϕ obtained under Gaussianity in the paper extendimmediately to regression models in which the disturbance vector U is elliptically distributed in theabove sense. 
Furthermore, all results concerning rejection probabilities under the alternative whichare obtained from the behavior of the null rejection probabilities by an approximation argument(e.g., Parts 2 and 3 of Theorem 5.7 as well as of Corollary 5.17, and the corresponding applicationsof these results in Sections 3 and 4) also go through in view of Scheff´e’s lemma provided the densityof ̺ E exists and is continuous almost everywhere. Under an additional absolute continuity assumption this is also true for almost invariant tests ϕ . Appendix: Proofs for Subsection 3.1 Proof of Lemma 3.1: Observe that ˆΩ w ( y ) = B ( y ) W n B ′ ( y ). Given that W n is positive definitedue to Assumption 2, this immediately establishes Parts 1-3 of the Lemma. It remains to prove Part4. Let s be as in Assumption 3 and consider first the case where this assumption is satisfied, i.e.,where rank (cid:0) R ( X ′ X ) − X ′ ( ¬ ( i , . . . i s )) (cid:1) = q holds. If now y is such that ˆΩ w ( y ) is singular it follows,in view of the equivalent condition rank ( B ( y )) < q , that ˆ u l ( y ) = 0 must hold at least for some l / ∈{ i , . . . i s } where l may depend on y . But this means that y satisfies e ′ l ( n ) (cid:16) I n − X ( X ′ X ) − X ′ (cid:17) y =0. Since e ′ l ( n ) (cid:16) I n − X ( X ′ X ) − X ′ (cid:17) = 0 by construction of l , it follows that the set of y for whichˆΩ w ( y ) is singular is contained in a finite union of proper linear subspaces, and hence is a λ R n -nullset. Next consider the case where Assumption 3 is not satisfied. Observe that then s > u i ( y ) = 0 holds for all y ∈ R n and all i ∈ { i , . . . i s } by construction of { i , . . . i s } .But then for every y ∈ R n rank ( B ( y )) = rank (cid:0) R ( X ′ X ) − X ′ ( ¬ ( i , . . . i s )) A ( y ) (cid:1) ≤ rank (cid:0) R ( X ′ X ) − X ′ ( ¬ ( i , . . . i s )) (cid:1) < q is satisfied where A ( y ) is obtained from diag (ˆ u ( y ) , . . . , ˆ u n ( y )) by deleting rows and columns i with i ∈ { i , . . . i s } . This completes the proof. (cid:4) Lemma A.1. Suppose Assumptions 2 and 3 are satisfied. Then ˆ β and ˆΩ w satisfy Assumption 5, 6,and 7 with N = ∅ . In fact, ˆΩ w ( y ) is nonnegative definite for every y ∈ R n , and is positive definite λ R n -almost everywhere. The test statistic T defined in (7), with ˆΨ w as in (6), is invariant under thegroup G ( M ) and the rejection probabilities P µ,σ Σ ( T ≥ C ) depend on (cid:0) µ, σ , Σ (cid:1) ∈ M × (0 , ∞ ) × C only through (( Rβ − r ) /σ, Σ) (in fact, only through ( h ( Rβ − r ) /σ i , Σ) ), where β corresponds to µ via µ = Xβ .Proof. Clearly, ˆ β and ˆΩ w are well-defined and continuous on R n , hence we may set N = ∅ inAssumption 5. Symmetry of ˆΩ w as well as the required equivariance properties of ˆ β and ˆΩ w are obviously satisfied. By Assumption 2 ˆΩ w ( y ) is nonnegative definite for every y ∈ R n . ByAssumptions 2 and 3 and Lemma 3.1 the matrix ˆΩ w is nonsingular (and hence positive definite) λ R n -almost everywhere. Hence Assumptions 5, 6, and 7 are satisfied which proves the first claim.The remaining claims follow immediately from Lemma 5.15 and Proposition 5.4. Proof of Theorem 3.3: By Lemma A.1 we know that ˆ β and ˆΩ w satisfy Assumption 5 and thatˆΩ w ( y ) is nonnegative definite for every y ∈ R n . Furthermore, in view of this lemma and because N = ∅ , the set N ∗ in Corollary 5.17 is precisely the set of y for which rank ( B ( y )) < q , cf. Lemma3.1. By Assumption 1 the spaces Z + = span( e + ) and Z − = span( e − ) are concentration spaces of C . 
The theorem now follows by applying Corollary 5.17 and Remark 5.18(i) to Z + as well as to Z − and by noting that e + ∈ R n \ N ∗ translates into rank ( B ( e + )) = q with a similar translation if e + is replaced by e − . Also note that the size of the test can not be zero in view of Part 5 of Lemma5.15 and Lemma A.1. (cid:4) Proof of Proposition 3.6: ( 1) Define the matrix B ∗ X ( y ) = (det( X ′ X )) B X ( y ) and observethat (for given y ) every element of this matrix is a multivariate polynomial in the elements x ti of X because ( X ′ X ) − can be written as (det( X ′ X )) − adj( X ′ X ) (with the convention that adj( X ′ X ) =1 if k = 1). Because det( X ′ X ) = 0 for X ∈ X holds, we have X ( e + ) = X ∩ (cid:8) X ∈ R n × k : det ( B ∗ X ( e + ) B ∗′ X ( e + )) = 0 (cid:9) . x ti . Thus it is an algebraic set, and hence is either a λ R n × k -null set or is all of R n × k . However, the latter case can not arise because we can choose an n × k matrix X ∈ X , say, such that all its columns are orthogonal to e + (this being possiblesince k < n by assumption) and this matrix then satisfies rank (cid:0) B ∗ X ( e + ) (cid:1) = q . This shows that X ( e + ) is a λ R n × k -null set. Next consider X ( e + ): Observe that for X ∈ X \ X ( e + ) we havedet( ˆΩ w,X ( e + )) = 0 and hence for X ∈ X \ X ( e + ) the relation T X ( e + + µ ∗ ) = C can equivalentlybe written as( R adj( X ′ X ) X ′ e + ) ′ adj ˆΩ w,X ( e + ) ( R adj( X ′ X ) X ′ e + ) − (det( X ′ X )) det( ˆΩ w,X ( e + )) C = 0 . Furthermore, for X ∈ X we can write ˆΩ w,X ( e + ) as (det( X ′ X )) − B ∗ X ( e + ) W n B ∗′ X ( e + ). Notethat B ∗ X ( e + ) W n B ∗′ X ( e + ) is a multivariate polynomial in the variables x ti . Consequently, for X ∈ X \ X ( e + ) the relation T X ( e + + µ ∗ ) = C can, after multiplication by (det( X ′ X )) q − , which isnonzero for X ∈ X , equivalently be written as(det( X ′ X )) ( R adj( X ′ X ) X ′ e + ) ′ adj ( B ∗ X ( e + ) W n B ∗′ X ( e + )) ( R adj( X ′ X ) X ′ e + ) − det( B ∗ X ( e + ) W n B ∗′ X ( e + )) C = 0 . The left-hand side of the above display is now a multivariate polynomial in the elements x ti .The polynomial does not vanish on all of R n × k since the matrix X constructed before providesan element in X \ X ( e + ) for which T X ( e + + µ ∗ ) = 0 < C holds. The proofs for X ( e − ) and X ( e − ) are completely analogous, as is the proof for the fact that R n × k \ X is a λ R n × k -null set.Finally, that the set of all design matrices X ∈ X for which Theorem 3.3 does not apply is a subsetof ( X ( e + ) ∪ X ( e + )) ∩ ( X ( e − ) ∪ X ( e − )) is obvious upon observing that the set of all X ∈ X which do not satisfy Assumption 3 is contained in X ( e + ) as well as in X ( e − ).(2) Similar arguments as in the proof of Part 1 show that ˜ X ( e − ) and ˜ X ( e − ) are each containedin an algebraic set. Define the matrix X ♯ = (cid:16) e + , ˜ X ♯ (cid:17) where the columns of ˜ X ♯ are k − e + as well as e − . It is then easy to see that˜ X ♯ ∈ ˜ X \ ˜ X ( e − ), implying that ˜ X ( e − ) does not coincide with all of ˜ X . Furthermore, simplecomputation shows that T X ♯ ( e − + µ ∗ ) = 0 < C by the assumption on R , which implies that˜ X ( e − ) is a proper subset of ˜ X \ ˜ X ( e − ). It follows now as above that ˜ X ( e − ) and ˜ X ( e − ) are λ R n × ( k − -null sets. The rest of the proof now proceeds as before.(3) See Example 3.1. (cid:4) Proof of Theorem 3.7: We verify the assumptions of Theorem 5.21. By Lemma A.1 Assump-tions 5, 6, and 7 are satisfied. 
Because of C = C AR (1) we have that J ( C ) = span( e + ) ∪ span( e − ),see Lemma G.1, and because e + , e − ∈ M is assumed we conclude that J ( C ) ⊆ M . The assumption R ˆ β ( e + ) = R ˆ β ( e − ) = 0 then implies that even J ( C ) ⊆ M − µ holds. The invariance condition (34)in Theorem 5.21 is thus satisfied, because T is G ( M )-invariant by Lemma 5.15. The assumptionson C in Theorem 5.21 are satisfied in view of Lemma G.1. Finally the assumptions on ˆΩ w in Parts2 and 3 of Theorem 5.21 are satisfied because ˆΩ w is positive definite λ R n -almost everywhere asshown in Lemma A.1. The theorem now follows from Theorem 5.21 using a standard subsequenceargument for Part 3. The claim in parenthesis in Part 3 follows from the corresponding claim inparenthesis in Theorem 5.21 and the observation that the conditions on e + and e − in the theoremwere only used to verify condition (34). (cid:4) Proof of Theorem 3.8: Similar as in the preceding proof verify the assumptions of Theorem5.21 but now for ¯ β and ¯Ω w by additionally making use of Proposition 5.23. Note that the condition54pan ( J ( C )) ∩ M ⊆ M − µ is satisfied in all five parts of the theorem. This is obvious for Parts1-3. For Part 4 this follows from the following argument: Observe that e − = δe + + Xγ must holdby the assumptions of Part 4. Now suppose m ∈ span ( J ( C )) ∩ M . Then α + e + + α − e − = m = Xγ ∗ must hold. These relations together imply ( α + + α − δ ) e + = X ( γ ∗ − α − γ ). Because e + / ∈ M , itfollows that γ ∗ − α − γ = 0. Thus R ( X ′ X ) − X ′ m = Rγ ∗ = α − Rγ = α − ¯ R ( γ ′ : δ ) ′ = α − ¯ R (cid:0) ¯ X ′ ¯ X (cid:1) − ¯ X ′ e − = 0 , which establishes that m ∈ M − µ . The verification for Part 5 is completely analogous. (cid:4) Proof of Lemma 3.11: Since ˆΩ w ( y ) = nB ( y ) W ∗ n B ′ ( y ), Parts 1-3 of the Lemma followimmediately from nonnegative definiteness of W ∗ n . To prove Part 4 observe that ˆΩ w ( y ) is singularif and only if det ( B ( y ) W ∗ n B ′ ( y )) = 0. Now observe that the l.h.s. of this equation is a multivariatepolynomial in y , hence the solution set is an algebraic set and thus is either a λ R n -null set or all of R n . (cid:4) Proof of Theorem 3.12: The proof is completely analogous to the proof of Theorem 3.3 usingLemma G.2 in case ν ∈ (0 , π ). (cid:4) B Appendix: Proofs for Subsection 3.3 Proof of Lemma 3.14: The inclusion M ⊆ N ( a , a ) is trivial since ˆ u ( y ) = 0 for y ∈ M .Because a ∈ { , } , a ∈ { n − , n } with a ≤ a holds, N ( a , a ) is contained in N ( a , a ),establishing the first claim. Closedness of N ( a , a ) is obvious. Given the just established inclusion N ( a , a ) ⊆ N ( a , a ) the alternative description of N ( a , a ) given in the second claim is alsoimmediately seen to be true. Continuity of ˆ ρ on R n \ N ( a , a ) is obvious. Assume now that k ≤ a − a holds. If a = 1, a = n , i.e., ˆ ρ = ˆ ρ Y W , we have N ( a , a ) = N ( a , a ) = M becauseˆ ρ Y W is well-defined and bounded away from one in modulus on R n \ M as shown in Remark 3.13(i).Hence, N ( a , a ) is a λ R n - null set in this case as k < n holds by assumption. To establish thisresult also for the other choices of a and a note that N ( a , a ) is the zero set of a multivariatepolynomial in y . It hence is a λ R n - null set, provided we can show that the polynomial is notidentically zero. Observe that we now have n − k ≥ n − a + a ≥ a = 1, a = n ). Let y (1) , . . . , y ( n − k ) be a basis for M ⊥ . The submatrix obtainedfrom (cid:0) y (1) , . . . 
, y ( n − k ) (cid:1) by selecting the rows with index j satisfying j < a as well as the rows with j > a has dimension ( n − a + a − × ( n − k ) and thus has rank at most n − a + a − < n − k .Consequently, we can find constants c , . . . , c n − k , not all equal to zero, such that the j -th componentof y = P n − ki =1 c i y ( i ) is zero whenever j < a or j > a . Because y ∈ M ⊥ and y = 0 by construction,we have y ∈ R n \ M = R n \ N (1 , n ). Because y = 0 and because the j -th component of y = ˆ u ( y )is zero whenever j < a or j > a , we also have y ∈ R n \ N ( a , a ). Hence, ˆ ρ ( y ) as well asˆ ρ Y W ( y ) are well-defined. Furthermore, they coincide in view of the construction of y = ˆ u ( y ).By what was said above for the Yule-Walker estimator it follows that | ˆ ρ ( y ) | = | ˆ ρ Y W ( y ) | < y ∈ R n \ N ( a , a ), and the polynomial is not identically equal to zero. (cid:4) Lemma B.1. Suppose ˆ ρ satisfies Assumption 4.1. The sets R n \ N ( a , a ) , R n \ N ( a , a ) , and R n \ N ( a , a ) are invariant under the group oftransformations y αy + Xγ where α = 0 , γ ∈ R k .2. The estimators ˜ β , ˜ σ , and ˜Ω are well-defined and continuous on R n \ N ( a , a ) . They satisfythe equivariance conditions ˜ β ( αy + Xγ ) = α ˜ β ( y ) + γ , ˜ σ ( αy + Xγ ) = α ˜ σ ( y ) , and ˜Ω( αy +55 γ ) = α ˜Ω( y ) for α = 0 , γ ∈ R k , and y ∈ R n \ N ( a , a ) . The estimator ˜Ω ( y ) is (well-definedand) nonsingular if and only if y ∈ R n \ N ∗ ( a , a ) . The sets N ( a , a ) and N ∗ ( a , a ) areclosed. If k ≤ a − a holds, N ( a , a ) and N ∗ ( a , a ) are λ R n -null sets.3. The estimator ˆΩ is well-defined and continuous on R n \ N ( a , a ) , whereas ˆ β and ˆ σ arewell-defined and continuous on all of R n . They satisfy the equivariance conditions ˆ β ( αy + Xγ ) = α ˆ β ( y ) + γ , ˆ σ ( αy + Xγ ) = α ˆ σ ( y ) for α = 0 , γ ∈ R k , and y ∈ R n , as well as ˆΩ( αy + Xγ ) = α ˆΩ( y ) for α = 0 , γ ∈ R k , and y ∈ R n \ N ( a , a ) . Furthermore, ˆ σ ( y ) > holds for y ∈ R n \ M ⊇ R n \ N ∗ ( a , a ) , and hence ˆΩ ( y ) is (well-defined and) nonsingular ifand only if y ∈ R n \ N ∗ ( a , a ) . The set N ∗ ( a , a ) is closed. If k ≤ a − a holds, N ∗ ( a , a ) is a λ R n -null set. [Recall from Remark 3.13(iv) that N ( a , a ) is always a closed set, and isa λ R n -null set in case k ≤ a − a .]Proof. (1) The invariance of the first two sets follows since ˆ u ( αy + Xγ ) = α ˆ u ( y ) holds for every y ∈ R n , α = 0, and γ ∈ R k . This property of the residual vector implies ˆ ρ ( αy + Xγ ) = ˆ ρ ( y )for every α = 0, γ ∈ R k and y ∈ R n \ N ( a , a ) ⊇ R n \ N ( a , a ). Together with the alreadyestablished invariance of R n \ N ( a , a ) this implies invariance of R n \ N ( a , a ) upon observingthat Λ − (ˆ ρ ( y )) is well-defined for y ∈ R n \ N ( a , a ). The latter holds because for b ∈ R , | b | 6 = 1the matrix Λ( b ) is nonsingular. [This can, e.g., be seen from the fact that its inverse is given by thesymmetric tridiagonal matrix with diagonal equal to (cid:0) , b , . . . , b , (cid:1) / (cid:0) − b (cid:1) and withthe elements next to the diagonal given by − b/ (cid:0) − b (cid:1) .](2) Using Lemma 3.14 and the just established fact that Λ − (ˆ ρ ( y )) is well-defined for y ∈ R n \ N ( a , a ), we see that ˜ β , ˜ σ , and ˜Ω are well-defined and continuous on R n \ N ( a , a ) ⊆ R n \ N ( a , a ). Observing that ˆ ρ ( αy + Xγ ) = ˆ ρ ( y ) holds for α = 0, γ ∈ R k , and y ∈ R n \ N ( a , a ) ⊇ R n \ N ( a , a ), the claimed equivariance of ˜ β , ˜ σ , and ˜Ω follows. 
The third claim is obvious, andthe fourth claim follows easily from Lemma 3.14. We next prove the last claim for the Yule-Walkerestimator, i.e., for a = 1 and a = n : For this it suffices to show that N ∗ (1 , n ) ⊆ M since M isa proper subspace of R n in view of the assumption k < n . Now for arbitrary y / ∈ M = N (1 , n )we have that ˆ ρ Y W ( y ) is well-defined and satisfies | ˆ ρ Y W ( y ) | < y ∈ R n \ N (1 , n ) as well as positive definiteness of Λ(ˆ ρ Y W ( y )). But this gives positive definiteness,and hence nonsingularity, of X ′ Λ − (ˆ ρ Y W ( y )) X , implying that y ∈ R n \ N (1 , R ( X ′ Λ − (ˆ ρ ( y )) X ) − R ′ . Furthermore, y / ∈ M implies y − X ˜ β ( y ) = 0 andthus ˜ σ ( y ) > ρ Y W ( y )). But this gives y ∈ R n \ N ∗ (1 , a = 1 and a = n . To prove the claim forthe remaining values of a and a we first show that N ( a , a ) is a λ R n -null set: observe that N ( a , a ) is the union of N ( a , a ) and (cid:8) y ∈ R n \ N ( a , a ) : det (cid:0) X ′ Λ − (ˆ ρ ( y )) X (cid:1) = 0 (cid:9) . In viewof Lemma 3.14 it hence suffices to show that the latter set is a λ R n -null set. Using the relation D − = adj ( D ) / det ( D ) (with the convention that adj ( D ) = 1 if D is 1 × 1) and noting thatdet (Λ(ˆ ρ ( y ))) = 0 for y ∈ R n \ N ( a , a ) the set in question can be rewritten as A = { y ∈ R n \ N ( a , a ) : det ( X ′ adj (Λ(ˆ ρ ( y ))) X ) = 0 } . Note that the equation in the set in the above display is polynomial in ˆ ρ ( y ). Upon multiplying theequation defining A by (cid:0)P a t = a ˆ u t ( y ) (cid:1) d , which is non-zero on R n \ N ( a , a ), where d = ( n − k , theset A is seen to be the intersection of R n \ N ( a , a ) with the zero-set of a multivariate polynomialin y . Hence, A is a λ R n -null set provided we can establish that the polynomial is not identically zero.For this it suffices to find an y ∈ R n \ N ( a , a ) such that det (cid:0) X ′ Λ − (ˆ ρ ( y )) X (cid:1) = 0: Set y = y y has been constructed in the proof of Lemma 3.14. Observe that ˆ ρ ( y ) = ˆ ρ Y W ( y ) for theestimator ˆ ρ specified by a and a and hence y ∈ R n \ N ( a , a ) ⊆ R n \ M since | ˆ ρ Y W ( y ) | < (cid:0) X ′ Λ − (ˆ ρ ( y )) X (cid:1) = 0 holds because Λ(ˆ ρ Y W ( y )) is always positive definite(whenever it is defined) as has been established before. This shows that N ( a , a ) is a λ R n -nullset. It remains to show that N ∗ ( a , a ) is a λ R n -null set. For this it suffices to show that B = (cid:8) y ∈ R n \ N ( a , a ) : ˜ σ ( y ) = 0 (cid:9) as well as C = (cid:8) y ∈ R n \ N ( a , a ) : det (cid:0) R ( X ′ Λ − (ˆ ρ ( y )) X ) − R ′ (cid:1) = 0 (cid:9) are λ R n -null sets. Noting that det (Λ(ˆ ρ ( y ))) = 0 as well as det ( X ′ adj (Λ(ˆ ρ ( y ))) X ) = 0 hold for y ∈ R n \ N ( a , a ), the set B can be rewritten as B = n y ∈ R n \ N ( a , a ) : det (Λ(ˆ ρ ( y ))) [det ( X ′ adj (Λ(ˆ ρ ( y ))) X )] ˜ σ ( y ) = 0 o . Again the equation in the set in the above display is polynomial in y and ˆ ρ ( y ). Upon multiplyingthis by (cid:0)P a t = a ˆ u t ( y ) (cid:1) d , which is non-zero on R n \ N ( a , a ), where d = ( n − (2 k + 1), one seesthat B is the intersection of R n \ N ( a , a ) with the zero-set of a multivariate polynomial in y .To establish that B is a λ R n -null set it thus suffices to find an y ∈ R n \ N ( a , a ) with ˜ σ ( y ) > 0. Choose y as above. Then we know that y ∈ R n \ N ( a , a ) and det (cid:0) X ′ Λ − (ˆ ρ ( y )) X (cid:1) = 0hold, i.e., y ∈ R n \ N ( a , a ). 
Furthermore, as shown before Λ(ˆ ρ ( y )) is positive definite (sinceΛ(ˆ ρ ( y )) = Λ(ˆ ρ Y W ( y ))) and y − X ˜ β ( y ) = 0 holds (since y / ∈ M ). Consequently, ˜ σ ( y ) > C is very similar.(3) Well-definedness is trivial and continuity follows from continuity of ˆ ρ on the open set R n \ N ( a , a ) (cf. Lemma 3.14). Equivariance of ˆ β and ˆ σ is obvious, while the equivarianceproperty of ˆΩ follows from invariance of R n \ N ( a , a ) and the equivariance of ˆ ρ established in(1). The third claim is obvious. Closedness of N ∗ ( a , a ) follows from the continuity property ofˆ ρ established in Lemma 3.14. To prove the final claim observe that N ∗ ( a , a ) is the union of the λ R n -null set N ( a , a ) with (cid:8) y ∈ R n \ N ( a , a ) : det (cid:0) R ( X ′ X ) − X ′ Λ(ˆ ρ ( y )) X ( X ′ X ) − R ′ (cid:1) = 0 (cid:9) . Multiplying the equation defining this set by (cid:0)P a t = a ˆ u t ( y ) (cid:1) q ( n − , which is non-zero on R n \ N ( a , a ),one sees that the above set is the intersection of R n \ N ( a , a ) with the zero-set of a multivariatepolynomial in y . Again perusing y constructed before shows that the polynomial is not identicallyzero, which then delivers the desired result.The following lemma is an immediate consequence of Lemma B.1. Lemma B.2. Suppose ˆ ρ satisfies Assumption 4 and k ≤ a − a holds. Then ˜ β and ˜Ω satisfyAssumption 5 with N = N ( a , a ) , and the set N ∗ (cf. equation (27)) is given by N ∗ ( a , a ) .Similarly, ˆ β and ˆΩ satisfy Assumption 5 with N = N ( a , a ) , and the set N ∗ is given by N ∗ ( a , a ) .The sets N ∗ ( a , a ) and N ∗ ( a , a ) are invariant under the group of transformations y αy + Xγ where α = 0 , γ ∈ R k .Proof. The lemma except for the last claim follows from Lemma B.1. The last claim then followsfrom Lemma F.1 in Appendix F, cf. also the discussion following Assumption 5.57 emma B.3. Suppose ˆ ρ satisfies Assumption 4 and k ≤ a − a holds. Then ˜Ω and ˆΩ satisfyAssumptions 6 and 7 with N ∗ = N ∗ ( a , a ) in case of ˜Ω and with N ∗ = N ∗ ( a , a ) in case of ˆΩ .Proof. Consider first the case of the Yule-Walker estimator, i.e., a = 1 and a = n . ThenΛ(ˆ ρ Y W ( y )) is positive definite for every y / ∈ N ( a , a ). Hence ˜Ω ( y ) is positive definite for y / ∈ N ∗ ( a , a ) and ˆΩ ( y ) is positive definite for y / ∈ N ∗ ( a , a ). Consequently, Assumptions6 and 7 are clearly satisfied. Next consider the case where a = 1 or a = n . Then y con-structed in the proof of Lemma 3.14 satisfies y ∈ R n \ N ∗ ( a , a ) as well as y ∈ R n \ N ∗ ( a , a )as shown in the proof of Lemma B.1. Because of ˆ ρ ( y ) = ˆ ρ Y W ( y ), we also see that ˜Ω( y ) aswell as ˆΩ( y ) are positive definite (as the variance covariance estimators based on ˆ ρ coincide withthe ones based on the Yule-Walker estimator). This shows that Assumption 6 is satisfied for˜Ω and ˆΩ. It remains to establish Assumption 7: Let v = 0, v ∈ R q be arbitrary. The pre-ceding argument has shown y ∈ R n \ N ∗ ( a , a ) and y ∈ R n \ N ∗ ( a , a ) and also shows that v ′ ˜Ω − ( y ) v > v ′ ˆΩ − ( y ) v > n y ∈ R n \ N ∗ ( a , a ) : v ′ ˜Ω − ( y ) v = 0 o is the intersection of y ∈ R n \ N ∗ ( a , a ) with the zero-set ofa multivariate polynomial, and similarly for n y ∈ R n \ N ∗ ( a , a ) : v ′ ˆΩ − ( y ) v = 0 o . 
This is provedin a similar manner as in the proof of Lemma B.1 by rewriting all inverse matrices appearing in v ′ ˜Ω − ( y ) v ( v ′ ˆΩ − ( y ) v , respectively) in terms of the adjoints and determinants and observing thatthe determinants are all non-zero for y ∈ R n \ N ∗ ( a , a ) ( y ∈ R n \ N ∗ ( a , a ), respectively). Thisshows that v ′ ˜Ω − ( y ) v = 0 ( v ′ ˆΩ − ( y ) v = 0, respectively) can be rewritten as a polynomial equationin ˆ ρ ( y ). Multiplying this polynomial equation by a suitable power of P a t = a ˆ u t ( y ), which is non-zero on R n \ N ∗ ( a , a ) ( R n \ N ∗ ( a , a ), respectively) shows that these equations can be rewrittenas polynomial equations in y . Proof of Theorem 3.15: We first verify the assumptions of Corollary 5.17. Assumption 5is satisfied for ˜ β and ˜Ω (with N = N ( a , a ) and N ∗ = N ∗ ( a , a )) as well as for ˆ β and ˆΩ (with N = N ( a , a ) and N ∗ = N ∗ ( a , a )) in view of Lemma B.2. In view of Assumption 1 we concludefrom Lemma G.1 that Z + = span ( e + ) as well as Z − = span ( e − ) are concentration spaces of C .Applying Parts 1 and 2 of Corollary 5.17 and Remark 5.18(i) to Z + as well as to Z − establishes(1) and (2) of the theorem as well as the corresponding parts of (4), if we also note that the sizeof the test can not be zero in view of Part 5 of Lemma 5.15 and Lemma B.3. In order to prove(3) of the theorem, we apply Theorem 5.19. First note that ˜Ω satisfies Assumption 7 because ofLemma B.3. Furthermore, choose as the sequence Σ m in that theorem Σ m = Λ ( ρ m ) for somesequence ρ m → ρ m ∈ ( − , e + e + by Lemma G.1, which also provides the matrix D and its required properties. Hence l = 1 and span (cid:0) ¯Σ (cid:1) = span ( e + ) which is contained in M since e + ∈ M has been assumed and M is a linear space. Condition (32) in Theorem 5.19 is satisfiedin view of the assumption R ˆ β ( e + ) = 0 since span (cid:0) ¯Σ (cid:1) = span ( e + ). Inspection of the constants K and K in Theorem 5.19 reveal that K = K =: K F GLS ( e + ) since in the present case γ isone-dimensional. That K F GLS ( e + ) depends only on the quantities given in the theorem is obviousfrom the formulas for K and K . Furthermore, if ˆ ρ ≡ ˆ ρ Y W , then ˜Ω is always positive definiteon R n \ N (1 , n ) = R n \ M , because | ˆ ρ Y W | < ρ Y W ) is positive definite on R n \ N (1 , n ). Inspection of the constants K and K then reveals K = K = 1 in that case. Theclaims in (3) with e + replaced by e − are proved analogously, and so are the remaining claims in(4). (cid:4) Proof of Proposition 3.16: (1) First consider X ,F GLS ( e + ). The condition e + ∈ N ∗ ,X ( a , a )is equivalent to e + ∈ N ,X ( a , a ), or e + ∈ R n \ N ,X ( a , a ) but det (cid:0) X ′ Λ − (ˆ ρ X ( e + )) X (cid:1) = 0, or to58 + ∈ R n \ N ,X ( a , a ) and det (cid:0) X ′ Λ − (ˆ ρ X ( e + )) X (cid:1) = 0 but ˜ σ X ( e + ) det (cid:0) R ( X ′ Λ − (ˆ ρ X ( e + )) X ) − R ′ (cid:1) =0. The first one of these three conditions can be written as n X t =2 ˆ u t ( e + )ˆ u t − ( e + ) ! = a X t = a ˆ u t ( e + ) ! . (35)Since det( X ′ X ) = 0 holds for X ∈ X , the set of X ∈ X satisfying (35) is – after multiplication ofboth sides of (35) by the fourth power of det( X ′ X ) – seen to be included in the zero-set of a multivari-ate polynomial in the variables x ti . Observing that det (Λ(ˆ ρ X ( e + ))) = 0 and P a t = a ˆ u t,X ( e + ) = 0for e + ∈ R n \ N ,X ( a , a ), the second one of the above conditions takes the equivalent form a X t = a ˆ u t ( e + ) ! 
k ( n − det ( X ′ adj (Λ(ˆ ρ X ( e + ))) X ) = 0 , n X t =2 ˆ u t ( e + )ˆ u t − ( e + ) ! = a X t = a ˆ u t ( e + ) ! . (36)For X ∈ X satisfying the inequality in (36), the left-hand side of the equation in the precedingdisplay is easily seen to be a polynomial in the variables x ti and ˆ u t ( e + ). Since det( X ′ X )ˆ u t ( e + ) ispolynomial in the variables x ti and det( X ′ X ) = 0 for X ∈ X , we may rewrite the equation in thepreceding display by multiplying it by the 4 k ( n − -th power of det( X ′ X ). The resulting equivalentequation is obviously a polynomial in the variables x ti . This shows that the set of X ∈ X satisfying(36) is (a subset of) the zero-set of a multivariate polynomial. Recalling that det (Λ(ˆ ρ X ( e + ))) = 0and P a t = a ˆ u t ( e + ) = 0 for e + ∈ R n \ N ,X ( a , a ), and that det (cid:0) X ′ Λ − (ˆ ρ X ( e + )) X (cid:1) = 0 impliesdet ( X ′ adj (Λ(ˆ ρ X ( e + ))) X ) = 0, the third one of the above conditions takes the equivalent form a X t = a ˆ u t ( e + ) ! ( n − (2 k +1+ q ( k − f ( X ) ′ adj (Λ(ˆ ρ X ( e + ))) f ( X ) g ( X ) = 0 (37)subject to n X t =2 ˆ u t ( e + )ˆ u t − ( e + ) ! = a X t = a ˆ u t ( e + ) ! , det (cid:0) X ′ Λ − (ˆ ρ X ( e + )) X (cid:1) = 0 , (38)where f ( X ) = [det ( X ′ adj (Λ(ˆ ρ X ( e + ))) X )) I n − X adj ( X ′ adj (Λ(ˆ ρ X ( e + ))) X ) X ′ adj (Λ(ˆ ρ X ( e + )))] e + g ( X ) = det ( R adj( X ′ adj (Λ(ˆ ρ X ( e + ))) X ) R ′ ) . The left-hand side of the equation in (37) is a polynomial in the variables x ti as well as ˆ u t,X ( e + )for all X ∈ X satisfying the inequality in (38). After multiplying the left-hand side of the equationin (37) by a suitable power of det( X ′ X ), which is non-zero for X ∈ X , (37) can be equivalentlyrecast as an equation that is polynomial in x ti , showing that the set of X ∈ X satisfying (37)and (38) is a subset of the zero-set of a multivariate polynomial. It follows that X ,F GLS ( e + )is a λ R n × k -null set provided we can show that each of the three polynomials in the variables x ti mentioned before is not trivial. For this it certainly suffices to construct a matrix X ∈ X such that e + / ∈ N ∗ ,X ( a , a ) holds: Consider first the case n ≥ 3. Let the first column x ∗· of X ∗ be equal to(1 , , . . . , , ′ , and choose the remaining columns linearly independent in the orthogonal comple-ment of the space spanned by x ∗· and e + . Then X ∗ ∈ X holds and ˆ u X ∗ ( e + ) = (0 , , , . . . , , , ′ ρ X ∗ ( e + ) is well-defined and equals ˆ ρ Y W,X ∗ ( e + ), which is always less than 1 in absolutevalue. Consequently, e + ∈ R n \ N ,X ∗ ( a , a ) holds. Furthermore, Λ(ˆ ρ X ∗ ( e + )) is then positive def-inite and hence det (cid:0) X ∗′ Λ − (ˆ ρ X ∗ ( e + )) X ∗ (cid:1) = 0 and det (cid:0) R ( X ∗′ Λ − (ˆ ρ X ∗ ( e + )) X ∗ ) − R ′ (cid:1) = 0 hold;also ˜ σ X ∗ ( e + ) > ρ X ∗ ( e + )) and the fact that e + / ∈ span ( X ∗ ).But this establishes e + ∈ R n \ N ∗ ,X ∗ ( a , a ) in case n ≥ 3. Next consider the case n = 2. Then k = 1 must hold. The assumption k ≤ a − a entails a = n = 2 and a = 1, i.e., ˆ ρ must bethe Yule-Walker estimator implying that N ∗ ,X ∗ ( a , a ) = span ( X ∗ ). Choose X ∗ as an arbitraryvector linearly independent of e + (which is possible since n = 2 > k ). Then X ∗ ∈ X and e + ∈ R n \ N ∗ ,X ∗ ( a , a ) are satisfied. The proof for X ,F GLS ( e − ) is completely analogous where incase n ≥ X ∗ is now chosen in such a way that x ∗· is equal to ( − , , . . . , , ( − n ) ′ and e − takes the rˆole of e + in the construction of the remaining columns. 
Next consider the set X ,F GLS ( e + ). Observe that for X ∈ X \ X ,F GLS ( e + ) the relation T F GLS,X ( e + + µ ∗ ) = C canequivalently be written as ( R ˜ β X ( e + )) ′ ˜Ω − X ( e + )( R ˜ β X ( e + )) − C = 0 . (39)Similar arguments as above show that for X ∈ X \ X ,F GLS ( e + ) this equation can equivalently bestated as p ( X ) = 0 where p ( X ) is a polynomial in the variables x ti . But this shows that the set of X ∈ X \ X ,F GLS ( e + ) satisfying (39) is (a subset of) an algebraic set. It follows that X ,F GLS ( e + )is a λ R n × k -null set provided the polynomial p is not trivial, or in other words that there exists amatrix X ∈ X \ X ,F GLS ( e + ) that violates (39). But this is guaranteed by the provision in thetheorem. The result for X ,F GLS ( e − ) is proved in exactly the same manner. The remaining claimsof Part 1 are now obvious.(2) Similar arguments as in the proof of Part 1 show that ˜ X ,F GLS ( e − ) and ˜ X ,F GLS ( e − )are each contained in an algebraic set. By the assumed provision it follows immediately that˜ X ,F GLS ( e − ) is a λ R n × ( k − -null set. The same conclusion holds for ˜ X ,F GLS ( e − ) if we can finda matrix X ∗ = (cid:16) e + , ˜ X ∗ (cid:17) such that e − / ∈ N ∗ ,X ∗ ( a , a ). To this end let the n × a =( − , , . . . , , ( − n ) ′ be the first column of ˜ X ∗ and choose the remaining k − e + , e − , and a (which is possiblesince k < n ). Simple computation now shows that ˆ u X ∗ ( e − ) = 0 (note that n ≥ u X ∗ ( e − ) is zero. Consequently, ˆ ρ X ∗ ( e − ) is well-defined andequals ˆ ρ Y W,X ∗ ( e − ), which is always less than 1 in absolute value, and the same argument as in theproof of Part 1 shows that e − / ∈ N ∗ ,X ∗ ( a , a ) is indeed satisfied. The remaining claims of Part 2are now obvious.(3) First consider X ,OLS ( e + ). The condition e + ∈ N ∗ ,X ( a , a ) is equivalent to P a t = a ˆ u t ( e + ) =0, or P a t = a ˆ u t ( e + ) = 0 but det (cid:0) R ( X ′ X ) − X ′ Λ(ˆ ρ ( y )) X ( X ′ X ) − R ′ (cid:1) = 0. Similar arguments as in(1) then show that X ,OLS ( e + ) is a subset of an algebraic set. The matrix X ∗ constructed in (1) iseasily seen to satisfy e + ∈ R n \ N ∗ ,X ∗ ( a , a ). Thus X ,OLS ( e + ) is a λ R n × k -null set. The proof for X ,OLS ( e − ) is exactly the same. Next consider X ,OLS ( e + ). Observe that for X ∈ X \ X ,OLS ( e + )the relation T OLS,X ( e + + µ ∗ ) = C can equivalently be written as( R ˆ β X ( e + )) ′ ˆΩ − X ( e + )( R ˆ β X ( e + )) − C = 0 . (40)The same argument as in the proof of Part (1) shows that the set of X ∈ X \ X ,OLS ( e + ) satisfying(40) is (a subset of) the zero-set of a multivariate polynomial in the variables x ti . It followsthat X ,OLS ( e + ) is a λ R n × k -null set under the maintained provision that it is a proper subset of60 \ X ,OLS ( e + ). The proof for X ,OLS ( e − ) is the same. The proof for ˜ X ,OLS ( e − ) and ˜ X ,OLS ( e − )is similar to the proof for ˜ X ,F GLS ( e − ) and ˜ X ,F GLS ( e − ).(4) Note that the assumptions obviously imply e + ∈ M and R ˆ β ( e + ) = 0. (cid:4) Remark B.4. In case a = 1 and a = n the argument in the above proof simplifies due to thefact that N ∗ ,X (1 , n ) = N ∗ ,X (1 , n ) = span ( X ). Proof of Theorem 3.17: We apply Theorem 5.21. That (˜ β, ˜Ω) as well as (ˆ β, ˆΩ) satisfyAssumptions 5, 6, and 7 has been shown in Lemmata B.2 and B.3. The covariance model C AR (1) satisfies the properties required in Theorem 5.21 as shown in Lemma G.1. 
Furthermore, we have J (cid:0) C AR (1) (cid:1) = span( e + ) ∪ span( e − ), see Lemma G.1, and because e + , e − ∈ M is assumed we concludethat J (cid:0) C AR (1) (cid:1) ⊆ M . The assumption R ˆ β ( e + ) = R ˆ β ( e − ) = 0 then implies that even J (cid:0) C AR (1) (cid:1) ⊆ M − µ holds. The invariance condition (34) in Theorem 5.21 is thus satisfied, because T is G ( M )-invariant by Lemma 5.15. We next show that the additional condition in Part 2 of Theorem5.21 is satisfied. This is trivial in case the Yule-Walker estimator is used (i.e., if a = 1 and a n = n ) since then ˜Ω ( y ) is positive definite for y / ∈ N ∗ ( a , a ) and ˆΩ ( y ) is positive definite for y / ∈ N ∗ ( a , a ) (see the proof of Lemma B.1) and since N ∗ ( a , a ) and N ∗ ( a , a ) are λ R n -null setsby Lemma B.1. If a = 1 or a n = n , then y constructed in the proof of Lemma 3.14 satisfies y ∈ R n \ N ∗ ( a , a ) and y ∈ R n \ N ∗ ( a , a ) (cf. proof of Lemma B.1) as well as ˆ ρ ( y ) = ˆ ρ Y W ( y ),implying that ˜Ω( y ) as well as ˆΩ( y ) are positive definite. As shown in Lemma B.1, the matrix ˜Ωis, in particular, continuous on the open set R n \ N ∗ ( a , a ) and the matrix ˆΩ is continuous on theopen set R n \ N ∗ ( a , a ). Consequently, ˜Ω and ˆΩ are positive definite in a neighborhood of y andthus the additional condition in Part 2 of Theorem 5.21 is satisfied. Finally, the condition a = 1and a = n implies that ˜Ω and ˆΩ are λ R n -almost everywhere positive definite (since then ˆ ρ = ˆ ρ Y W ),verifying the extra condition in Part 3 of Theorem 5.21. (cid:4) C Appendix: Proofs for Section 4 Proof of Theorem 4.2: First observe that ˆ β and ˆΩ Het satisfy Assumptions 5 and 6 with N = ∅ .In fact, ˆΩ Het ( y ) is nonnegative definite for every y ∈ R n , and is positive definite λ R n -almosteverywhere under Assumption 3 by Lemma 4.1. Furthermore, in view of this lemma and because N = ∅ , the set N ∗ in Corollary 5.17 is precisely the set of y for which rank ( B ( y )) < q . It is trivialthat Z i =span( e i ( n )) is a concentration space of C for every i = 1 , . . . , n . The theorem now followsby applying Corollary 5.17 and Remark 5.18(i) to Z i and by noting that e i ( n ) ∈ R n \ N ∗ translatesinto rank ( B ( e i ( n ))) = q . Also note that the size of the test can not be zero in view of Part 5 ofLemma 5.15. (cid:4) D Appendix: Proofs for Subsection 5.1 Proof of Proposition 5.2: SinceΠ ( N − ν ∗ ) ⊥ ( g α,ν,ν ′ ( y ) − ν ∗ ) = α Π ( N − ν ∗ ) ⊥ ( y − ν ) + Π ( N − ν ∗ ) ⊥ ( ν ′ − ν ∗ )= α Π ( N − ν ∗ ) ⊥ ( y − ν ) = α Π ( N − ν ∗ ) ⊥ ( y − ν ∗ ) , h follows, and hence h is constant on the orbits of G ( N ). Now suppose that h ( y ) = h ( y ′ ). If h ( y ) = h ( y ′ ) = 0 holds, it follows thatΠ ( N − ν ∗ ) ⊥ ( y − ν ∗ ) = Π ( N − ν ∗ ) ⊥ ( y ′ − ν ∗ ) = 0 . Consequently, y ′ − y is of the form ν ∗ − ν ∗ for some ν ∗ ∈ N . But this gives y ′ = ( y − ν ∗ ) + ν ∗ = g ,ν ∗ ,ν ∗ ( y ), showing that y ′ is in the same orbit as y . Next consider the case where h ( y ) = h ( y ′ ) = 0.Then Π ( N − ν ∗ ) ⊥ (cid:16)(cid:13)(cid:13)(cid:13) Π ( N − ν ∗ ) ⊥ ( y ′ − ν ∗ ) (cid:13)(cid:13)(cid:13) ( y − ν ∗ ) − c (cid:13)(cid:13)(cid:13) Π ( N − ν ∗ ) ⊥ ( y − ν ∗ ) (cid:13)(cid:13)(cid:13) ( y ′ − ν ∗ ) (cid:17) = 0where c = ± 1. It follows that the argument inside the projection operator is of the form ν ∗ − ν ∗ for some ν ∗ ∈ N . 
Elementary calculations give y ′ = (cid:13)(cid:13)(cid:13) Π ( N − ν ∗ ) ⊥ ( y ′ − ν ∗ ) (cid:13)(cid:13)(cid:13) c (cid:13)(cid:13)(cid:13) Π ( N − ν ∗ ) ⊥ ( y − ν ∗ ) (cid:13)(cid:13)(cid:13) ( y − ν ∗ ) + ν ∗ + 1 c (cid:13)(cid:13)(cid:13) Π ( N − ν ∗ ) ⊥ ( y − ν ∗ ) (cid:13)(cid:13)(cid:13) ( ν ∗ − ν ∗ ) .Since the last term in parenthesis on the right-hand side above is obviously an element of N , wehave obtained y ′ = g ( y ) for some g ∈ G ( N ), i.e., y ′ is in the same orbit as y . This shows that h isa maximal invariant. (cid:4) Proof of Proposition 5.4: (1) From (14) and its extension discussed subsequently to (14), aswell as from the transformation theorem for integrals we obtain E µ,σ Φ ( ϕ ( y )) = E α ( µ − µ )+ µ ′ ,α σ Φ (cid:16) ϕ (cid:16) g − α,µ ,µ ′ ( y ) (cid:17)(cid:17) . By almost invariance of ϕ we have that ϕ ( y ) = ϕ (cid:16) g − α,µ ,µ ′ ( y ) (cid:17) for all y ∈ R n \ N with λ R n ( N ) = 0(where N may depend on g − α,µ ,µ ′ ). Since Φ is positive definite, also P α ( µ − µ )+ µ ′ ,α σ Φ ( N ) = 0holds, and thus the right-hand side of the above display equals E α ( µ − µ )+ µ ′ ,α σ Φ ( ϕ ( y )) whichproves the first claim.(2) Setting α = 1 in (15) shows that the rejection probability is invariant under addition ofelements that belong to M − µ . Since µ = Π ( M − µ ) ( µ − µ ) + Π ( M − µ ) ⊥ ( µ − µ ) + µ wethus conclude that E µ,σ Φ ( ϕ ) = E ν + µ ,σ Φ ( ϕ ) where ν = Π ( M − µ ) ⊥ ( µ − µ ) ∈ M . Now applying(15) with α = σ − and µ ′ = µ to E ν + µ ,σ Φ ( ϕ ) establishes the first equality in (16). The secondequality follows by the same argument by setting α = ± σ − , the sign equaling the sign of the firstnon-zero component of ν if ν = 0, and the choice of sign being irrelevant if ν = 0.(3) The first claim is an immediate consequence of (16). For the second claim it suffices to showthat µ − X ˆ β rest ( µ ) (for µ ∈ M ) is an injective linear function of Rβ − r , bijectivity of this mappingfollowing from dimension considerations. To this end note that µ − X ˆ β rest ( µ ) = Xβ − X (cid:20) ˆ β ( µ ) − ( X ′ X ) − R ′ (cid:16) R ( X ′ X ) − R ′ (cid:17) − (cid:16) R ˆ β ( µ ) − r (cid:17)(cid:21) = Xβ − X (cid:20) β − ( X ′ X ) − R ′ (cid:16) R ( X ′ X ) − R ′ (cid:17) − ( Rβ − r ) (cid:21) = ( X ′ X ) − R ′ (cid:16) R ( X ′ X ) − R ′ (cid:17) − ( Rβ − r )62nd that the matrix premultiplying Rβ − r is of full column rank q .(4) This follows similarly as in (1) observing that for invariant ϕ the exceptional set N is empty. (cid:4) Proof of Proposition 5.6: Set ¯ h ( µ, σ ) = D Π ( M − µ ) ⊥ ( µ − µ ) /σ E . The invariance of (cid:0) ¯ h ( µ, σ ) , Σ (cid:1) follows from a simple computation similar to the one in the proof of Proposition5.2. Now assume that (cid:0) ¯ h ( µ, σ ) , Σ (cid:1) = (cid:0) ¯ h ( µ ′ , σ ′ ) , Σ ′ (cid:1) . We immediately get ¯ h ( µ, σ ) = ¯ h ( µ ′ , σ ′ )and Σ = Σ ′ . The former impliesΠ ( M − µ ) ⊥ (( µ − µ ) − c ( σ/σ ′ ) ( µ ′ − µ )) = 0where c = ± 1. Similar calculations as in the proof of Proposition 5.2 give µ ′ = c ( σ ′ /σ ) ( µ − µ ∗ ) + µ for some µ ∗ ∈ M . Together with Σ = Σ ′ this shows that ( µ ′ , σ ′ , Σ ′ ) is in the same orbit underthe associated group as is ( µ, σ , Σ). (cid:4) E Appendix: Proofs and Auxiliary Results for Subsections5.2 and 5.3 The next lemma is a simple consequence of a continuity property of the characteristic function ofa multivariate Gaussian probability measure and of the Portmanteau theorem. Lemma E.1. 
Let Φ m be a sequence of nonnegative definite symmetric n × n matrices converging toan n × n matrix Φ as m → ∞ , where Φ may be singular, and let µ m ∈ R n be a sequence convergingto µ ∈ R n as m → ∞ . Then P µ m , Φ m converges weakly to P µ, Φ . If, in addition, A ∈ B ( R n ) satisfies λ µ +span(Φ) (bd( A )) = 0 , then P µ m , Φ m ( A ) → P µ, Φ ( A ) . Proof of Theorem 5.7: (1) Since Z is a concentration space of C , there exists a sequence(Σ m ) m ∈ N in C converging to ¯Σ such that span( ¯Σ) = Z . Note that µ + Z is a λ R n -null set becausedim ( Z ) < n in view of Definition 2.1. Because Σ m is positive definite, we thus have P µ ,σ Σ m ( W ) = P µ ,σ Σ m ( W ∪ ( µ + Z )).By Lemma E.1 we then have that P µ ,σ Σ m ( W ∪ ( µ + Z )) converges to P µ ,σ ¯Σ ( W ∪ ( µ + Z )). Butthe later probability is not less than P µ ,σ ¯Σ ( µ + Z ) which equals 1 since P µ ,σ ¯Σ is supported by µ + Z . To prove the claim in parentheses observe that T ( µ + z ) > C and lower semicontinuity of T at µ + z implies that T ( w ) > C holds for all w in a neighborhood of µ + z ; hence such points µ + z belong to int( W ) ⊆ int ( W ∪ ( µ + Z )), and consequently do not belong to bd ( W ∪ ( µ + Z )). Butthis establishes (18) . (2) Apply the same argument as above to R n \ W . Also note that P µ ,σ Σ ( W ) can be approx-imated arbitrarily closely by P µ ,σ Σ ( W ) for suitable µ ∈ M , since k P µ ,σ Σ − P µ ,σ Σ k T V → µ → µ holds by Scheff´e’s Lemma as σ Σ is positive definite.(3) Choose an arbitrary µ ∈ M . By assumption we have inf Σ ∈ C P µ ,σ Σ ( W ) = 0 for a suitable σ > 0. It hence suffices to show that for every Σ ∈ C P µ ,σ Σ ( W ) − P µ ,σ τ Σ ( W ) → τ → ∞ . By almost invariance of W under G ( { µ } ) we have that W △ ( τ W + (1 − τ ) µ )is a λ R n -null set. Hence, by the reproductive property of the normal distribution P µ ,σ τ Σ ( W ) = P µ ,σ τ Σ ( τ W + (1 − τ ) µ ) = P µ + τ − ( µ − µ ) ,σ Σ ( W ) . But, since σ Σ is positive definite, we have by an application of Scheff´e’s Lemma k P µ ,σ Σ − P µ + τ − ( µ − µ ) ,σ Σ k T V → τ → ∞ , and hence P µ ,σ Σ ( W ) − P µ + τ − ( µ − µ ) ,σ Σ ( W ) → 0. The claim in parenthesis isobvious. (cid:4) Lemma E.2. Let ϕ : R n → [0 , be a Borel-measurable function that is almost invariant under G ( M ) . Suppose Φ m is a sequence of positive definite symmetric n × n matrices converging to apositive definite matrix Φ , suppose µ m ∈ M , and suppose the sequence σ m satisfies < σ m < ∞ .Then lim m →∞ E µ m ,σ m Φ m ( ϕ ) = E ν + µ , Φ ( ϕ ) provided ν ∗ m = Π ( M − µ ) ⊥ ( µ m − µ ) /σ m for some µ ∈ M converges to an element ν ∈ R n (whichthen necessarily belongs to M ). [Note that ν ∗ m , and thus the result, does not depend on the choiceof µ ∈ M .]Proof. By Proposition 5.4 we have that E µ m ,σ m Φ m ( ϕ ) = E ν ∗ m + µ , Φ m ( ϕ ). Since ν ∗ m → ν and sinceΦ m → Φ, with Φ positive definite, the result follows from total variation distance convergence of P ν ∗ m + µ , Φ m to P ν + µ , Φ . Remark E.3. (i) Consider the case where ν ∗ m = Π ( M − µ ) ⊥ ( µ m − µ ) /σ m does not converge.Then, as long as the sequence ν ∗ m is bounded, the above lemma can be applied by passing to sub-sequences along which ν ∗ m converges. In the case where the sequence ν ∗ m is unbounded, then, alongsubsequences such that the norm of ν ∗ m diverges, one would expect E µ m ,σ m Φ m ( ϕ ) = E ν ∗ m + µ , Φ m ( ϕ )to converge to 1 for any reasonable test since ν ∗ m + µ moves farther and farther away from M (and Φ m stabilizes at a positive definite matrix). 
Indeed, such a result can be shown for a largeclass of tests, see Lemma 5.15.(ii) In the special case where µ m ≡ µ it is easy to see, using Proposition 5.4, that the limit in theabove lemma is E µ,σ Φ ( ϕ ) if σ m → σ ∈ (0 , ∞ ) and µ ∈ M , is E µ , Φ ( ϕ ) if σ m → ∞ and µ ∈ M ,and is E µ, Φ ( ϕ ) if µ ∈ M . Lemma E.4. Let ϕ : R n → [0 , be a Borel-measurable function that is almost invariant under G ( M ) . Suppose Φ m is a sequence of positive definite symmetric n × n matrices converging to asingular matrix Φ , suppose µ m ∈ M , and σ m is a sequence satisfying < σ m < ∞ . Assume furtherthat ϕ ( x + z ) = ϕ ( x ) holds for every x ∈ R n and every z ∈ span(Φ) . Suppose that for some sequenceof positive real numbers s m the matrix D m = Π span(Φ) ⊥ Φ m Π span(Φ) ⊥ /s m converges to a matrix D ,which is regular on the orthogonal complement of span(Φ) . Then lim m →∞ E µ m ,σ m Φ m ( ϕ ) = E ν + µ ,D +Φ ( ϕ ) = E ν + µ ,D ( ϕ ) provided ν ∗∗ m = Π ( M − µ ) ⊥ ( µ m − µ ) / (cid:16) σ m s / m (cid:17) for some µ ∈ M converges to an element ν ∈ R n (which then necessarily belongs to M ). [Note that ν ∗∗ m , and thus the result, does not depend on thechoice of µ ∈ M .] Furthermore, the matrix D + Φ is positive definite. roof. Because Π span(Φ) ( x − µ m ) ∈ span(Φ), we obtain by the assumed invariance w.r.t. additionof z ∈ span(Φ) ϕ ( x ) = ϕ ( µ m + Π span(Φ) ⊥ ( x − µ m ) + Π span(Φ) ( x − µ m )) = ϕ ( µ m + Π span(Φ) ⊥ ( x − µ m ))for every x . By the transformation theorem we then have on the one hand E µ m ,σ m Φ m ( ϕ ( · )) = E µ m ,σ m Φ m ( ϕ ( µ m + Π span(Φ) ⊥ ( · − µ m ))) = E µ m ,σ m Π span(Φ) ⊥ Φ m Π span(Φ) ⊥ ( ϕ ( · ))= E µ m ,σ m s m D m ( ϕ ( · )) . (41)On the other hand, by the same invariance property of ϕE µ m ,σ m s m D m ( ϕ ( · )) = E µ m ,σ m s m D m ( ϕ ( · + z ))holds for every z ∈ span(Φ). Integrating this w.r.t. a normal distribution P ,σ m s m Φ (in the variable z ) and using the reproductive property of the normal distribution gives E µ m ,σ m s m D m ( ϕ ( x )) = E Q m ( ϕ ( x + z )) = E µ m ,σ m s m ( D m +Φ) ( ϕ ( x )) (42)where Q m denotes the product of the Gaussian measures P µ m ,σ m s m D m and P ,σ m s m Φ . Observe that D + Φ as well as D m + Φ are positive definite. An application of Lemma E.2 now giveslim m →∞ E µ m ,σ m s m ( D m +Φ) ( ϕ ) = E ν + µ ,D +Φ ( ϕ ) . The same argument that has led to (42) now shows that E ν + µ ,D +Φ ( ϕ ) = E ν + µ ,D ( ϕ ). Combiningthis with (41) completes the proof of the display in the theorem. The positive definiteness of D + Φis obvious as noted earlier in the proof. Remark E.5. (i) A remark similar to Remark E.3(i) also applies here. In particular, we typicallycan expect E µ m ,σ m Φ m ( ϕ ) to converge to 1 in case the norm of ν ∗∗ m diverges.(ii) In the special case where µ m ≡ µ it is easy to see, using Proposition 5.4, that the limit inthe above lemma is E µ,κ ( D +Φ) ( ϕ ) = E µ,κD ( ϕ ) if σ m s m → κ ∈ (0 , ∞ ) and µ ∈ M , is E µ ,D +Φ ( ϕ ) = E µ ,D ( ϕ ) if σ m s m → ∞ and µ ∈ M , and is E µ,D +Φ ( ϕ ) = E µ,D ( ϕ ) if µ ∈ M . Remark E.6. 
(i) If s m and s ∗ m are two positive scaling factors such that Π span(Φ) ⊥ Φ m Π span(Φ) ⊥ /s m → D and Π span(Φ) ⊥ Φ m Π span(Φ) ⊥ /s ∗ m → D ∗ with both D and D ∗ being regular on the orthogonal com-plement of span(Φ), then s m /s ∗ m must converge to a positive finite number, i.e., the scaling sequenceis essentially uniquely determined.(ii) Typical choices for s m are s (1) m = (cid:13)(cid:13) Π span(Φ) ⊥ Φ m Π span(Φ) ⊥ (cid:13)(cid:13) (for some choice of norm) or s (2) m = tr(Π span(Φ) ⊥ Φ m Π span(Φ) ⊥ ); note that s (1) m as well as s (2) m are positive, since Φ m is positivedefinite and Φ is singular. With both choices convergence of Π span(Φ) ⊥ Φ m Π span(Φ) ⊥ /s m (at leastalong suitable subsequences) is automatic. Furthermore, since for any choice of norm we have c (cid:13)(cid:13) Π span(Φ) ⊥ Φ m Π span(Φ) ⊥ (cid:13)(cid:13) ≤ tr(Π span(Φ) ⊥ Φ m Π span(Φ) ⊥ ) ≤ c (cid:13)(cid:13) Π span(Φ) ⊥ Φ m Π span(Φ) ⊥ (cid:13)(cid:13) for suit-able 0 < c ≤ c < ∞ , we have convergence of s (1) m /s (2) m to a positive finite number (at least alongsuitable subsequences). Hence, which of the normalization factors s ( i ) m is used in an application ofthe above lemma, typically does not make a difference.65 roof of Theorem 5.10: (1) By the invariance properties of the rejection probability expressedin Proposition 5.4 it suffices to show for an arbitrary fixed µ ∈ M thatsup Σ ∈ C E µ , Σ ( ϕ ) < m ∈ C be a sequence such that E µ , Σ m ( ϕ ) converges to sup Σ ∈ C E µ , Σ ( ϕ ). Since C is assumed to be bounded, we may assumewithout loss of generality that Σ m converges to a matrix ¯Σ (not necessarily in C ). If ¯Σ is positivedefinite, it follows from Lemma E.2 applied to E µ , Σ m ( ϕ ) that sup Σ ∈ C E µ , Σ ( ϕ ) = E µ , ¯Σ ( ϕ ) (since ν = 0). But E µ , ¯Σ ( ϕ ) is less than 1 since ϕ ≤ λ R n -almost everywhere equal to 1. If ¯Σ issingular, then in view of the assumptions of the theorem we can pass to the subsequence Σ m i andthen apply Lemma E.4 to E µ , Σ mi ( ϕ ) to obtain that sup Σ ∈ C E µ , Σ ( ϕ ) = E µ ,D +¯Σ ( ϕ ) (since again ν = 0) for a matrix D with the properties as given in the theorem. But E µ ,D +¯Σ ( ϕ ) is less than1, since D + ¯Σ is positive definite (as noted in Lemma E.4) and since ϕ ≤ λ R n -almosteverywhere equal to 1. This proves the first claim of Part 1 of the theorem. To prove the secondclaim in Part 1, observe that for the same invariance reasons it suffices to show that for an arbitraryfixed µ ∈ M inf Σ ∈ C E µ , Σ ( ϕ ) > E µ , ¯Σ ( ϕ ) for somepositive definite ¯Σ, or equals E µ ,D +¯Σ ( ϕ ) for some positive definite D + ¯Σ. Since ϕ ≥ 0, but ϕ isnot λ R n -almost everywhere equal to 0 by assumption, the result follows.(2) Let µ m ∈ M , 0 < σ m < ∞ , and Σ m ∈ C be sequences such that E µ m ,σ m Σ m ( ϕ ) convergesto inf µ ∈ M inf σ > inf Σ ∈ C E µ ,σ Σ ( ϕ ). Since C is assumed to be bounded, we may assume withoutloss of generality that Σ m converges to a matrix ¯Σ.Consider first the case where ¯Σ is positive definite: Set ν ∗ m = Π ( M − µ ) ⊥ ( µ m − µ ) /σ m . Ifthis sequence is bounded, we may pass to a subsequence m ′ such that ν ∗ m ′ converges to some ν .Applying Lemma E.2 then shows that E µ m ′ ,σ m ′ Σ m ′ ( ϕ ) converges to E ν + µ , ¯Σ ( ϕ ), which is positivesince ϕ ≥ λ R n -almost everywhere equal to 0 and since ¯Σ is positive definite. If the sequence ν ∗ m is unbounded, we may pass to a subsequence m ′ such that k ν ∗ m ′ k → ∞ . 
Since E µ m ′ ,σ m ′ Σ m ′ ( ϕ ) = E ν ∗ m ′ + µ , Σ m ′ ( ϕ ) by Proposition 5.4, it follows from assumption (22) that lim m ′ E µ m ′ ,σ m ′ Σ m ′ ( ϕ ) ispositive.Next consider the case where ¯Σ is singular: Pass to the subsequence m i mentioned in the theoremand set now ν ∗∗ m i = Π ( M − µ ) ⊥ ( µ m i − µ ) / (cid:16) σ m i s / m i (cid:17) . If this sequence is bounded, we may passto a subsequence m ′ i of m i such that ν ∗∗ m ′ i converges to some ν . Applying Lemma E.4 then showsthat E µ m ′ i ,σ m ′ i Σ m ′ i ( ϕ ) converges to E ν + µ ,D +¯Σ ( ϕ ), which is positive since ϕ ≥ λ R n -almosteverywhere equal to 0 and since D + ¯Σ is positive definite. If the sequence ν ∗∗ m i is unbounded, wemay pass to a subsequence m ′ i of m i such that (cid:13)(cid:13)(cid:13) ν ∗∗ m ′ i (cid:13)(cid:13)(cid:13) → ∞ . Since E µ m ′ i ,σ m ′ i Σ m ′ i ( ϕ ) = E µ m ′ i ,σ m ′ i s m ′ i (cid:16) D m ′ i +¯Σ (cid:17) ( ϕ ) = E ν ∗∗ m ′ i + µ ,D m ′ i +¯Σ ( ϕ )by (41), (42), and Proposition 5.4, it follows from assumption (22) and positive definiteness of D + ¯Σthat lim i →∞ E µ m ′ i ,σ m ′ i Σ m ′ i ( ϕ ) is positive. Taken together the preceding arguments establish Part 2of the theorem. 663) To prove the first claim of Part 3 of the theorem observe that we can find µ m ∈ M and σ m with 0 < σ m < ∞ with d ( µ m , M ) /σ m ≥ c such that the expression left of the arrow in (23) differsfrom E µ m ,σ m Σ m ( ϕ ) only by a sequence converging to zero. Let m ′ denote an arbitrary subsequence.We can then find a further subsequence m ′ i such that the corresponding matrix D m ′ i satisfies theassumptions of the theorem. Note that the sequence s m ′ i corresponding to D m ′ i necessarily convergesto zero. But then the norm of ν ∗∗ m ′ i defined above must diverge since d (cid:16) µ m ′ i , M (cid:17) /σ m ≥ c and sinceΠ ( M − µ ) ⊥ is the projection onto the orthogonal complement of M − µ . Because E µ m ′ i ,σ m ′ i Σ m ′ i ( ϕ ) = E ν ∗∗ m ′ i + µ ,D m ′ i +¯Σ ( ϕ )in view of (41), (42), and Proposition 5.4, the result then follows from the assumption that thelimit inferior in (22) is equal to 1, noting that D m ′ i + ¯Σ is positive definite and converges to thepositive definite matrix D + ¯Σ.We next prove the second claim in Part 3. Choose µ m ∈ M with d ( µ m , M ) ≥ c m such that theexpression to the left of the arrow in (24) differs from E µ m ,σ m Σ m ( ϕ ) only by a sequence convergingto zero. Since E µ m ,σ m Σ m ( ϕ ) = E ν ∗ m + µ , Σ m ( ϕ )by Proposition 5.4 where ν ∗ m was defined above and since k ν ∗ m k ≥ c m /σ m → ∞ clearly holds, theresult follows from the assumption that the limit inferior in (22) is equal to 1. [Note that we havenot made use of condition (21) and the condition on C following (21).] (cid:4) Proof of Theorem 5.12: By invariance properties of the rejection probability (cf. Proposition5.4) it suffices to show for the particular µ ∗ ∈ M appearing in (26) that for every δ , 0 < δ < k = k ( δ ) such that sup Σ ∈ C E µ ∗ , Σ ( ϕ k ) ≤ δ. (43)For this it suffices to show that sup Σ ∈ C E µ ∗ , Σ ( ϕ k ) converges to zero for k → ∞ . Let Σ k ∈ C be asequence such that for all k ≥ Σ ∈ C E µ ∗ , Σ ( ϕ k ) ≤ E µ ∗ , Σ k ( ϕ k ) + k − . (44)Since C is assumed to be bounded, we can find for every subsequence k ∗ a further subsubsequence k ′ such that Σ k ′ converges to a matrix ¯Σ (not necessarily in C ). 
Let ε > k ′ in the subsequence such that E µ ∗ , ¯Σ ( ϕ k ′ ) < ε/ Σ ∈ C E µ ∗ , Σ ( ϕ k ′ ) ≤ E µ ∗ , Σ k ′ ( ϕ k ′ ) + k ′− ≤ E µ ∗ , Σ k ′ ( ϕ k ′ ) + k ′− (45)holds for all k ′ ≥ k ′ . Now Lemma E.2 together with Remark E.3(ii) may clearly be applied to thesubsequence k ′ , showing that E µ ∗ , Σ k ′ ( ϕ k ′ ) converges to E µ ∗ , ¯Σ ( ϕ k ′ ) < ε/ 2. But this shows thatlim sup k ′ →∞ sup Σ ∈ C E µ ∗ , Σ ( ϕ k ′ ) < ε . (46)67ase 2: ¯Σ is singular. Then we can find a subsequence k ′ i of k ′ and normalization constants s k ′ i such that the resulting matrices D k ′ i converge to a matrix D with the properties specifiedin Theorem 5.10. Because D + ¯Σ is positive definite, we can in view of (26) find a k ′ i (0) in thesubsequence k ′ i such that E µ ∗ ,D +¯Σ ( ϕ k ′ i (0) ) < ε/ k ′ i shows that E µ ∗ , Σ k ′ i ( ϕ k ′ i (0) )converges to E µ ∗ ,D +¯Σ ( ϕ k ′ i (0) ) < ε/ 2. But by (44) and (26)sup Σ ∈ C E µ ∗ , Σ ( ϕ k ′ i ) ≤ E µ ∗ , Σ k ′ i ( ϕ k ′ i ) + k ′− i ≤ E µ ∗ , Σ k ′ i ( ϕ k ′ i (0) ) + k ′− i holds for all i ≥ i (0). This shows thatlim sup i →∞ sup Σ ∈ C E µ ∗ , Σ ( ϕ k ′ i ) < ε .Taken together we have shown that sup Σ ∈ C E µ ∗ , Σ ( ϕ k ) must converge to zero along the original sequence k which proves (43). (cid:4) F Appendix: Proofs and Auxiliary Results for Subsection5.4 Lemma F.1. Suppose Assumption 5 holds. Then the sets A = (cid:8) y ∈ R n \ N : det ˇΩ( y ) = 0 (cid:9) and A = (cid:8) y ∈ R n \ N : det ˇΩ( y ) = 0 (cid:9) are invariant under G ( M ) , the former set being closed in the relative topology on R n \ N . The set N ∗ = N ∪ (cid:8) y ∈ R n \ N : det ˇΩ( y ) = 0 (cid:9) is a closed λ R n -null set in R n that is invariant under G ( M ) .Proof. The invariance of A and A follows immediately from the invariance of R n \ N and theequivariance of ˇΩ( y ). The relative closedness of A is an immediate consequence of the continuityof ˇΩ( y ) on R n \ N . The invariance of N ∗ follows from invariance of N discussed after Assumption5 and the just established invariance of A . Because N is a λ R n -null set and because ˇΩ( y ) is λ R n -almost everywhere nonsingular on R n \ N , it follows that N ∗ is a λ R n -null set. Finally, we establishclosedness of N ∗ : let y i ∈ N ∗ be a sequence with limit y . If y ∈ N , we are done. If y ∈ R n \ N ,by openness of this set also y i ∈ R n \ N for all but finitely many i must hold and thus det ˇΩ( y i ) = 0.But then continuity of ˇΩ on R n \ N implies det ˇΩ( y ) = 0, and hence y ∈ N ∗ . Proof of Lemma 5.15: (1) Follows from the discussion preceding the lemma and Lemma F.1.(2) Follows immediately from the observation that T coincides on the open set R n \ N ∗ with( R ˇ β ( y ) − r ) ′ ˇΩ − ( y )( R ˇ β ( y ) − r ) which is continuous on this set by Assumption 5.(3) Since N ∗ is invariant under the elements of G ( M ), it is in particular invariant under G ( M ).The result T ( g ( y )) = T ( y ) = 0 for g ∈ G ( M ) then follows trivially for y ∈ N ∗ . Now suppose68 ∈ R n \ N ∗ . Then also g α,µ (1)0 ,µ (2)0 ( y ) ∈ R n \ N ∗ for α = 0, µ ( i )0 ∈ M ( i = 1 , 2) by invariance of R n \ N ∗ . The invariance of T then follows immediately from the equivariance properties of ˇ β andˇΩ expressed in Assumption 5, using that µ ( i )0 ∈ M implies Rγ ( i ) = r for uniquely defined vectors γ ( i ) satisfying µ ( i )0 = Xγ ( i ) .(4) Set O = { y ∈ R n : T ( y ) = C } and note that O ⊆ R n \ N ∗ since C > O = [ y ∈ M ⊥ ( { y ∈ M : y + y ∈ R n \ N ∗ , T ( y + y ) = C } + y ) = [ y ∈ M ⊥ ( O ( y ) + y ) . 
Note that O as well as O ( y ) are clearly measurable sets. By the already established invariance of R n \ N ∗ , the fact that R n \ N ∗ ⊆ R n \ N , and by the equivariance properties of ˇ β and ˇΩ maintainedin Assumption 5, the set O ( y ) equals (cid:26) y ∈ M : (cid:16) R (cid:16) ˇ β ( y ) + ( X ′ X ) − X ′ y (cid:17) − r (cid:17) ′ ˇΩ − ( y ) (cid:16) R (cid:16) ˇ β ( y ) + ( X ′ X ) − X ′ y (cid:17) − r (cid:17) = C (cid:27) if y ∈ ( R n \ N ∗ ) ∩ M ⊥ , and it is empty if y ∈ N ∗ ∩ M ⊥ (since C > y ∈ ( R n \ N ∗ ) ∩ M ⊥ , theset O ( y ) ⊆ M is the image of¯ O ( y ) = n γ ∈ R k : (cid:0) R (cid:0) ˇ β ( y ) + γ (cid:1) − r (cid:1) ′ ˇΩ − ( y ) (cid:0) R (cid:0) ˇ β ( y ) + γ (cid:1) − r (cid:1) = C o under the invertible linear map γ Xγ from R k onto M . Now ¯ O ( y ) is the zero-set of a multivariatereal polynomial (in the components of γ ). The polynomial does not vanish everywhere on R k because the quadratic form making up the polynomial is unbounded on R k (because ˇΩ − ( y ) issymmetric and well-defined if y ∈ R n \ N ∗ and because rank( R ) = q holds). Consequently, ¯ O ( y )has k -dimensional Lebesgue measure zero and hence λ M ( O ( y )) = 0 for every y ∈ ( R n \ N ∗ ) ∩ M ⊥ .We conclude that λ M ( O ( y )) = 0 for every y ∈ M ⊥ .We now identify R n with M × M ⊥ and view Lebesgue measure λ R n on R n as λ M ⊗ λ M ⊥ . Hence, y is identified with ( y , y ) ∈ M × M ⊥ satisfying y = y + y . Fubini’s Theorem then shows λ R n ( O ) = λ M × M ⊥ ( O ) = Z M × M ⊥ O (( y , y )) dλ M × M ⊥ ( y , y )= Z M ⊥ Z M O ( y ) ( y ) dλ M ( y ) dλ M ⊥ ( y ) = Z M ⊥ λ M ( O ( y )) dλ M ⊥ ( y ) = 0 . (5&6) First observe that { y ∈ R n \ N ∗ : T ( y ) > C } = { y ∈ R n : T ( y ) > C } holds in view of C > T . By continuity of T on R n \ N ∗ established in Part 2 and by openness of R n \ N ∗ , the openness of { y ∈ R n \ N ∗ : T ( y ) > C } and { y ∈ R n \ N ∗ : T ( y ) < C } follows. It hencesuffices to show that these two sets are non-empty: Choose an arbitrary y ∈ R n \ N ∗ and set y ( γ ) = y + Xγ for γ ∈ R k . Then y ( γ ) ∈ R n \ N ∗ by invariance of R n \ N ∗ under G ( M ). Now by theequivariance properties of ˇ β and ˇΩ expressed in Assumption 5 T ( y ( γ )) = (cid:0) Rγ + R ˇ β ( y ) − r (cid:1) ′ ˇΩ − ( y ) (cid:0) Rγ + R ˇ β ( y ) − r (cid:1) . Define ¯ γ = ¯ β − ˇ β ( y ) for some ¯ β satisfying R ¯ β = r . Then T ( y (¯ γ )) = 0 < C holds showing that { y ∈ R n \ N ∗ : T ( y ) < C } is non-empty. Finally choose y ∈ R n \ N ∗ and v as in Assumption 6.69hoose δ such that v = Rδ . Then set γ = cδ + ¯ β − ˇ β ( y ) where ¯ β is as before and c is a real number.Observe that then T ( y ( γ )) = c v ′ ˇΩ − ( y ) v . Choosing c sufficiently large shows that T ( y ( γ )) > C can be achieved, establishing that { y ∈ R n \ N ∗ : T ( y ) > C } is non-empty.(7) Let G be a standard normal n × P ν m + µ , Φ m ( W ( C )) = Pr (cid:16) T ( ν m + µ + Φ / m G ) − C ≥ (cid:17) . (47)Set γ m = ( X ′ X ) − X ′ ν m and γ = ( X ′ X ) − X ′ µ . Observe that Rγ = r while k Rγ m k → ∞ as m → ∞ in view of ν m ∈ Π ( M − µ ) ⊥ ( M − µ ) and k ν m k → ∞ . For Φ / m G ∈ R n \ N ∗ (anevent which has probability 1 because N ∗ is a λ R n -null set and Φ m is positive-definite) we may useequivariance of ˇ β and ˇΩ and obtain that T ( ν m + µ + Φ / m G ) − C coincides on this event with( Rγ m + R ˇ β (Φ / m G )) ′ ˇΩ − (Φ / m G )( Rγ m + R ˇ β (Φ / m G )) − C. (48)Observe that Φ / m G → Φ / G as m → ∞ with probability 1. 
Furthermore, ˇ β and ˇΩ − are contin-uous on R n \ N ∗ , a set which has probability 1 under the law of Φ / G (since N ∗ is a λ R n -null setand Φ is positive-definite). From the continuous mapping theorem we conclude that R ˇ β (Φ / m G )and ˇΩ − (Φ / m G ) converge almost surely to R ˇ β (Φ / G ) and ˇΩ − (Φ / G ), respectively. Now let v ∈ A (( ν m ) m ≥ ) and let m i be a subsequence such that (cid:13)(cid:13) Rγ m i (cid:13)(cid:13) − Rγ m i → v . It follows that h ( Rγ m i + R ˇ β (Φ / m i G )) ′ ˇΩ − (Φ / m i G )( Rγ m i + R ˇ β (Φ / m i G )) − C i / (cid:13)(cid:13) Rγ m i (cid:13)(cid:13) converges to v ′ ˇΩ − (Φ / G ) v with probability 1. Since Pr (cid:0) v ′ ˇΩ − (Φ / G ) v = 0 (cid:1) by Assumption 7, it follows thatPr (cid:16) T ( ν m i + µ + Φ / m i G ) − C ≥ (cid:17) → Pr (cid:16) v ′ ˇΩ − (Φ / G ) v ≥ (cid:17) . This shows that lim inf m →∞ P ν m + µ , Φ m ( W ( C )) ≤ lim inf i →∞ P ν mi + µ , Φ mi ( W ( C ))= Pr (cid:16) v ′ ˇΩ − (Φ / G ) v ≥ (cid:17) , implying that lim inf m →∞ P ν m + µ , Φ m ( W ( C )) ≤ inf v ∈ A (( ν m ) m ≥ ) Pr (cid:16) v ′ ˇΩ − (Φ / G ) v ≥ (cid:17) . Conversely, let m i be a subsequence such that P ν mi + µ , Φ mi ( W ( C )) → lim inf m →∞ P ν m + µ , Φ m ( W ( C )) . R q is compact, we may assume that (cid:13)(cid:13)(cid:13) Rγ m i ( j ) (cid:13)(cid:13)(cid:13) − Rγ m i ( j ) converges to some v ∈ A (( ν m ) m ≥ ) along a suitable subsequence m i ( j ) . The same arguments as above then show thatlim inf m →∞ P ν m + µ , Φ m ( W ( C )) = lim inf j →∞ P ν mi ( j ) + µ , Φ mi ( j ) ( W ( C ))= Pr (cid:16) v ′ ˇΩ − (Φ / G ) v ≥ (cid:17) ≥ inf v ∈ A (( ν m ) m ≥ ) Pr (cid:16) v ′ ˇΩ − (Φ / G ) v ≥ (cid:17) . Given Assumption 7, the remaining equalities and inequalities in (29) and (30) are now obvious. (cid:4) Proof of Corollary 5.17: (1) If z ∈ R n \ N ∗ then µ + z ∈ R n \ N ∗ and T is continuous at µ + z for every µ ∈ M by Parts 1 and 2 of Lemma 5.15. If T ( µ ∗ + z ) > C holds, then by theinvariance of T established in Part 3 of Lemma 5.15, we have T ( µ + z ) = T ( µ ∗ + z ) > C for every µ ∈ M . Hence the sufficient conditions in Part 1 of Theorem 5.7 are satisfied and an applicationof this theorem delivers the result.(2) Completely analogous to the proof of (1) noting that the invariance of T required in Part 3of Theorem 5.7 is clearly satisfied.(3) Since N ∗ is a λ R n -null set the test statistic T is λ R n -almost everywhere equal to the teststatistic T ∗ ( y ) = ( T ( y ) y ∈ R n \ N ∗ , ∞ , y ∈ N ∗ .We verify that the sufficient conditions in Part 1 of Theorem 5.7 are satisfied for T ∗ . To that end fix µ ∈ M and let Z ′ ⊆ Z denote the set of all z such that z ∈ R n \ N , ˇΩ( z ) = 0, and R ˇ β ( z ) = 0 hold.By invariance of N (cf. discussion after Assumption 5) and equivariance of ˇΩ we see that z ∈ Z ′ implies µ + z ∈ R n \ N and ˇΩ( µ + z ) = 0, and thus T ∗ ( µ + z ) = ∞ > C holds for every z ∈ Z ′ bydefinition of T ∗ . We next show that T ∗ is lower semicontinuous at µ + z for every z ∈ Z ′ . Let y m be a sequence converging to µ + z . Since R n \ N is open, we may assume that this sequence entirelybelongs to R n \ N . If det ˇΩ( y m ) = 0 eventually holds, we are done since then T ∗ ( y m ) = ∞ eventuallyby construction. By a standard subsequence argument we may thus assume that det ˇΩ( y m ) > R n \ N by assumption. Now note that then T ∗ ( y m ) = T ( y m ) = ( R ˇ β ( y m ) − r ) ′ ˇΩ − ( y m )( R ˇ β ( y m ) − r ) ≥ λ − max ( ˇΩ( y m )) k R ˇ β ( y m ) − r k . 
Since ˇ β is continuous on R n \ N by assumption, we have R ˇ β ( y m ) → R ˇ β ( µ + z ) = R ˇ β ( z ) + r = r where we have made use of equivariance of ˇ β ( z ) and of µ ∈ M . Hence k R ˇ β ( y m ) − r k →k R ˇ β ( z ) k > 0. Furthermore, ˇΩ is continuous on R n \ N by assumption, hence ˇΩ( y m ) → ˇΩ( µ + z ) = 0.Consequently, T ∗ ( y m ) → ∞ , establishing lower semicontinuity of T ∗ . We may now apply Part 1 ofTheorem 5.7 together with Remark 5.8(i) to conclude the proof. (cid:4) Lemma F.2. Let ˇ β and ˇΩ satisfy Assumption 5, let T be the test statistic defined in (28), andlet W ( C ) = { y ∈ R n : T ( y ) ≥ C } with < C < ∞ be the rejection region. Let Φ m be symmet-ric positive definite n × n matrices such that Φ m → Φ for m → ∞ where Φ is singular with l := dim span(Φ) > . Suppose that for some sequence of positive real numbers s m the matrix D m = Π span(Φ) ⊥ Φ m Π span(Φ) ⊥ /s m converges to a matrix D , which is regular on span(Φ) ⊥ , andthat Π span(Φ) ⊥ Φ m Π span(Φ) /s / m → . Suppose further that span(Φ) ⊆ M . Let Z be a matrix, thecolumns of which form a basis for span(Φ) and let G be a standard normal n -vector. Then: . For every µ ∈ M , γ ∈ R l , < σ < ∞ we have s m h T (cid:16) µ + Zγ + σ Φ / m G (cid:17) − C i d → ξ ( γ, σ ) for m → ∞ where the random variable ξ ( γ, σ ) is given by (cid:16) R ˆ β (cid:16) σ − Zγ + Φ / G (cid:17)(cid:17) ′ ˇΩ − (cid:16)(cid:16) Φ / + D / (cid:17) G (cid:17) (cid:16) R ˆ β (cid:16) σ − Zγ + Φ / G (cid:17)(cid:17) for (cid:0) Φ / + D / (cid:1) G / ∈ N ∗ , which is an event that has probability under the law of G , andwhere ξ ( γ, σ ) = 0 else.2. If additionally Assumption 7 holds and R ˆ β ( z ) = 0 λ span(Φ) - a.e. is satisfied, then P µ + Zγ,σ Φ m ( W ( C )) = Pr (cid:16) T (cid:16) µ + Zγ + σ Φ / m G (cid:17) ≥ C (cid:17) → Pr ( ξ ( γ, σ ) ≥ as m → ∞ .Proof. (1) Observe that µ + Zγ ∈ M , that the columns of Φ / as well of Π span(Φ) Φ / m belong to M , and that R n \ N ∗ is invariant under the group G ( M ). Hence, using the equivariance propertiesof ˇ β and ˇΩ expressed in Assumption 5 repeatedly, we obtain that on the event n Φ / m G ∈ R n \ N ∗ o R ˇ β (cid:16) µ + Zγ + σ Φ / m G (cid:17) − r = R ˇ β (cid:16) µ + Zγ + σ Π span(Φ) Φ / m G + σ Π span(Φ) ⊥ Φ / m G (cid:17) − r = R (cid:16) BZγ + σB Π span(Φ) Φ / m G + σs / m ˇ β (cid:16) s − / m Π span(Φ) ⊥ Φ / m G (cid:17)(cid:17) = σR (cid:16) σ − BZγ + K m + s / m ˇ β ( L m ) (cid:17) holds, where B is shorthand for ( X ′ X ) − X ′ , K m = B (cid:16) Π span(Φ) Φ / m − s / m Φ / (cid:17) G , and L m =Φ / G + s − / m Π span(Φ) ⊥ Φ / m G ). Similarly, we obtainˇΩ (cid:16) µ + Zγ + σ Φ / m G (cid:17) = σ ˇΩ (cid:16) Φ / m G (cid:17) = σ ˇΩ (cid:16) Π span(Φ) Φ / m G + Π span(Φ) ⊥ Φ / m G (cid:17) = σ ˇΩ (cid:16) Π span(Φ) ⊥ Φ / m G (cid:17) = σ s m ˇΩ( L m )on the event n Φ / m G ∈ R n \ N ∗ o . Hence, on this event we have s m h T (cid:16) µ + Zγ + σ Φ / m G (cid:17) − C i = (cid:16) R (cid:16) σ − BZγ + K m + s / m ˇ β ( L m ) (cid:17)(cid:17) ′ ˇΩ − ( L m ) × R (cid:16) σ − BZγ + K m + s / m ˇ β ( L m ) (cid:17) − s m C. 
Clearly, K m and L m are jointly normal with mean zero and second moments given by E (cid:0) K m K ′ m (cid:1) = B (cid:16) Π span(Φ) Φ / m − s / m Φ / (cid:17) (cid:16) Π span(Φ) Φ / m − s / m Φ / (cid:17) ′ B ′ , (cid:0) L m L ′ m (cid:1) = Φ + D m + s − / m Π span(Φ) ⊥ Φ / m Φ / + s − / m (cid:16) Π span(Φ) ⊥ Φ / m Φ / (cid:17) ′ , and E (cid:0) K m L ′ m (cid:1) = B (cid:16) Π span(Φ) Φ / m − s / m Φ / (cid:17) (cid:16) Φ / + s − / m Π span(Φ) ⊥ Φ / m (cid:17) ′ . It is easy to see that E (cid:0) K m K ′ m (cid:1) converges to B Φ B ′ because s m → 0, while E (cid:0) L m L ′ m (cid:1) con-verges to Φ + D because of the following: Observe that s − / m Π span(Φ) ⊥ Φ / m is a (not necessarilysymmetric) square root of D m , and hence there exists an orthogonal n × n matrix U m such that s − / m Π span(Φ) ⊥ Φ / m = D / m U m . Let m ′ be an arbitrary subsequence of m . Then we can find asubsequence m ∗ of m ′ along which U m converges to U , say. Using D m → D , we see that along m ∗ the sequence s − / m Π span(Φ) ⊥ Φ / m Φ / converges to D / U Φ / . It remains to show that this limitis zero. By assumption s − / m Π span(Φ) ⊥ Φ m Π span(Φ) converges to 0. By rewriting this sequence as D / m U m Φ / m Π span(Φ) we see, using Φ m → Φ, that it converges to D / U Φ / along m ∗ , showingthat D / U Φ / = 0.Furthermore, E (cid:0) K m L ′ m (cid:1) converges to B Φ because B (cid:16) Π span(Φ) Φ / m − s / m Φ / (cid:17) (cid:16) s − / m Π span(Φ) ⊥ Φ / m (cid:17) ′ = B (cid:16) Π span(Φ) ⊥ Φ m Π span(Φ) /s / m (cid:17) ′ − B Φ / Φ / m Π span(Φ) ⊥ → − B ΦΠ span(Φ) ⊥ = − B (cid:0) Π span(Φ) ⊥ Φ (cid:1) ′ = 0where we have made use of the assumption Π span(Φ) ⊥ Φ m Π span(Φ) /s / m → (cid:18) K m L m (cid:19) d → N (cid:18) , (cid:20) B Φ B ′ B ΦΦ B ′ Φ + D (cid:21)(cid:19) . Note that this limiting normal distribution is also the joint distribution of K = B Φ / G and L = (cid:0) Φ / + D / (cid:1) G . [Observe that Φ / + D / = (Φ + D ) / since Φ D = D Φ = 0 as D vanishes onspan(Φ) by construction.] Now consider the map f on R n + k given by f ( x, y ) = ( f ( x ) , f ( y ) , f ( y ))where f ( x ) = x for x ∈ R k , and where f ( y ) = ˇ β ( y ), f ( y ) = ˇΩ − ( y ) for y ∈ R n \ N ∗ and are zeroelse. Observe that the set of discontinuity points, F say, of f is contained in R k × N ∗ . ButPr (( K , L ) ∈ F ) ≤ Pr (cid:0) ( K , L ) ∈ R k × N ∗ (cid:1) = Pr ( L ∈ N ∗ ) = 0 (49)because N ∗ is a λ R n -null set and the distribution of L is equivalent to Lebesgue measure on R n asΦ + D is positive definite. This shows that f ( K m , L m ) converges in distribution to f ( K , L ) as m → ∞ . Now s m h T (cid:16) µ + Zγ + σ Φ / m G (cid:17) − C i = (cid:16) R (cid:16) σ − BZγ + f ( K m ) + s / m f ( L m ) (cid:17)(cid:17) ′ f ( L m ) × R (cid:16) σ − BZγ + f ( K m ) + s / m f ( L m ) (cid:17) − s m C holds everywhere (note that L m ∈ R n \ N ∗ if and only if Φ / m G ∈ R n \ N ∗ by G ( M )-invariance of R n \ N ∗ ). Because s / m f ( L m ) converges to zero in probability and s m C → (cid:16) R (cid:16) σ − BZγ + f (cid:16) B Φ / G (cid:17)(cid:17)(cid:17) ′ f (cid:16)(cid:16) Φ / + D / (cid:17) G (cid:17) R (cid:16) σ − BZγ + f (cid:16) B Φ / G (cid:17)(cid:17) which coincides with ξ ( γ, σ ). Finally, the claim that (cid:8)(cid:0) Φ / + D / (cid:1) G ∈ R n \ N ∗ (cid:9) is a probability1 event has already been established in (49).(2) This follows from Part 1 if we can establish that Pr ( ξ ( γ, σ ) = 0) = 0. 
Now observe thatˇΩ − ( (cid:0) Φ / + D / (cid:1) G ) = ˇΩ − ( D / G ) by equivariance and that (cid:0) Φ / + D / (cid:1) G ∈ R n \ N ∗ if andonly if D / G ∈ R n \ N ∗ . HencePr ( ξ ( γ, σ ) = 0) = Pr (cid:16) ξ ( γ, σ ) = 0 , (cid:16) Φ / + D / (cid:17) G ∈ R n \ N ∗ (cid:17) = Pr (cid:18)(cid:16) ˆ β (cid:16) σ − Zγ + Φ / G (cid:17)(cid:17) ′ R ′ ˇΩ − ( D / G ) × R (cid:16) ˆ β (cid:16) σ − Zγ + Φ / G (cid:17)(cid:17) = 0 , D / G ∈ R n \ N ∗ (cid:17) = Z Pr (cid:18)(cid:16) ˆ β (cid:0) σ − Zγ + x (cid:1)(cid:17) ′ R ′ ˇΩ − ( D / G ) × R (cid:16) ˆ β (cid:0) σ − Zγ + x (cid:1)(cid:17) = 0 , D / G ∈ R n \ N ∗ (cid:17) dP , Φ ( x )= Z Pr (cid:18)(cid:16) ˆ β (cid:0) σ − Zγ + x (cid:1)(cid:17) ′ R ′ ˇΩ − (cid:16)(cid:16) Φ / + D / (cid:17) G (cid:17) × R (cid:16) ˆ β (cid:0) σ − Zγ + x (cid:1)(cid:17) = 0 , (cid:16) Φ / + D / (cid:17) G ∈ R n \ N ∗ (cid:17) dP , Φ ( x )= Z P , Φ+ D (cid:0)(cid:8) y ∈ R n \ N ∗ : v ( x ) ′ ˇΩ − ( y ) v ( x ) = 0 (cid:9)(cid:1) dP , Φ ( x ) (50)with v ( x ) = R ˆ β (cid:0) σ − Zγ + x (cid:1) , the third equality in the preceding display being true since Φ / G and D / G are independent as E (cid:18) Φ / G (cid:16) D / G (cid:17) ′ (cid:19) = Φ / D / = 0 . Now the integrand in the last line of (50) is zero by Assumption (7) for every x except when v ( x ) = 0. Hence, we are done if we can establish that P , Φ ( v ( x ) = 0) = 0. Because span(Φ) equalsthe span of the columns of Z , we can make the change of variables x = Zc and obtain P , Φ ( v ( x ) = 0) = P ,A ( v ( Zc ) = 0) = P ,A (cid:16) R (cid:16) ˆ β (cid:0) Z (cid:0) σ − γ + c (cid:1)(cid:1)(cid:17) = 0 (cid:17) where A = ( Z ′ Z ) − Z ′ Φ Z ( Z ′ Z ) − . Because A is non-singular, this probability is zero if the eventhas λ R l -measure zero. But λ R l (cid:16)n c : R ˆ β (cid:0) Z (cid:0) σ − γ + c (cid:1)(cid:1) = 0 o(cid:17) = λ span(Φ) (cid:16)n z : R ˆ β ( z ) = 0 o(cid:17) = 0by our assumptions. 74 roof of Theorem 5.19: Fix µ ∈ M and σ , 0 < σ < ∞ . Then for every γ ∈ R l we have P µ + Zγ,σ Σ m ( W ( C )) = Pr (cid:16) s m h T (cid:16) µ + Zγ + σ Σ / m G (cid:17) − C i ≥ (cid:17) which converges to Pr ( ξ ( γ, σ ) ≥ 0) as shown in the preceding lemma (with Σ m and ¯Σ playing therˆoles of Φ m and Φ, respectively). Consequently, for every γ ∈ R l inf Σ ∈ C P µ + Zγ,σ Σ ( W ( C )) ≤ Pr ( ξ ( γ, σ ) ≥ . But now lim inf M →∞ inf k γ k≥ M Pr ( ξ ( γ, σ ) ≥ ≤ lim inf M →∞ inf R ˆ β ( Zγ ) =0 , k γ k≥ M Pr ( ξ ( γ, σ ) ≥ M →∞ inf R ˆ β ( Zγ ) =0 , k γ k≥ M Pr (cid:16) ξ ( γ, σ ) / k γ k ≥ (cid:17) ≤ lim inf M →∞ inf R ˆ β ( Zγ ) =0 , k γ k = M Pr (cid:16) ξ ( γ, σ ) / k γ k ≥ (cid:17) ≤ inf k c k =1 ,R ˆ β ( Zc ) =0 lim inf M →∞ Pr (cid:0) ¯ ξ ( c, M, σ ) ≥ (cid:1) where ¯ ξ ( c, M, σ ) = (cid:16) R (cid:16) ˆ β ( Zc ) + σ ˆ β (cid:16) ¯Σ / G (cid:17) /M (cid:17)(cid:17) ′ ˇΩ − (cid:16)(cid:16) ¯Σ / + D / (cid:17) G (cid:17) × R (cid:16) ˆ β ( Zc ) + σ ˆ β (cid:16) ¯Σ / G (cid:17) /M (cid:17) on the event where (cid:0) ¯Σ / + D / (cid:1) G ∈ R n \ N ∗ and is zero else. The random variable ¯ ξ ( c, M, σ )converges in probability to the random variable ¯ ξ ( c ) as M → ∞ . Hencelim inf M →∞ Pr (cid:0) ¯ ξ ( c, M, σ ) ≥ (cid:1) = Pr (cid:0) ¯ ξ ( c ) ≥ (cid:1) holds for every c ∈ R l satisfying k c k = 1 and R ˆ β ( Zc ) = 0, because Pr (cid:0) ¯ ξ ( c ) = 0 (cid:1) = 0 for such c inview of Assumption 7 observing that P , ¯Σ+ D is equivalent to λ R n as ¯Σ + D is nonsingular. 
Thisproves that lim inf M →∞ inf k γ k≥ M inf Σ ∈ C P µ + Zγ,σ Σ ( W ( C )) ≤ inf k c k =1 ,R ˆ β ( Zc ) =0 Pr (cid:0) ¯ ξ ( c ) ≥ (cid:1) = inf k c k =1 Pr (cid:0) ¯ ξ ( c ) ≥ (cid:1) = inf c ∈ R l Pr (cid:0) ¯ ξ ( c ) ≥ (cid:1) = K , the first two equalities holding because ¯ ξ ( c ) ≡ R ˆ β ( Zc ) = 0 (and in particular if c = 0) andbecause Pr (cid:0) ¯ ξ ( c ) ≥ (cid:1) is homogenous in c . This establishes the first inequality in (33) because theleft-most expression in (33) is monotonically increasing in M . Furthermore,sup Σ ∈ C P µ ,σ Σ ( W ( C )) ≥ P µ ,σ Σ m ( W ( C )) , Σ ∈ C P µ ,σ Σ ( W ( C )) ≥ Pr ( ξ (0 , σ ) ≥ (cid:18)(cid:16) R ˆ β (cid:16) ¯Σ / G (cid:17)(cid:17) ′ ˇΩ − (cid:16)(cid:16) ¯Σ / + D / (cid:17) G (cid:17) R ˆ β (cid:16) ¯Σ / G (cid:17) ≥ , (cid:16) ¯Σ / + D / (cid:17) G ∈ R n \ N ∗ (cid:17) . Now observe that ˇΩ − (cid:0)(cid:0) ¯Σ / + D / (cid:1) G (cid:1) = ˇΩ − ( D / G ) by equivariance and that (cid:0) ¯Σ / + D / (cid:1) G ∈ R n \ N ∗ if and only if D / G ∈ R n \ N ∗ . Then by the same arguments as in (50) we obtainPr ( ξ (0 , σ ) ≥ 0) = Z Pr (cid:18)(cid:16) R ˆ β ( x ) (cid:17) ′ ˇΩ − (cid:16)(cid:16) ¯Σ / + D / (cid:17) G (cid:17) × R (cid:16) ˆ β ( x ) (cid:17) ≥ , (cid:16) ¯Σ / + D / (cid:17) G ∈ R n \ N ∗ (cid:17) dP , ¯Σ ( x )= Z Pr (cid:0) ¯ ξ ( γ ) ≥ (cid:1) dP ,A ( γ ) = K , the last equality resulting from the variable change x = Zγ which is possible since span( ¯Σ) equalsthe space spanned by Z . Finally, the inequality K ≤ K is obvious from the definition of theseconstants. (cid:4) Proof of Theorem 5.21: Define ϕ = ( W ( C )) and note that invariance of ϕ under G ( M ) aswell as the fact that ϕ is λ R n -almost everywhere neither equal to 0 or 1 follows from Lemma 5.15.Part 1 of Theorem 5.10 then implies Part 1 of the theorem. Similarly, Parts 2 and 3 of the theoremfollow from Parts 2 and 3 of Theorem 5.10, respectively, because condition (22) follows from Part 7of Lemma 5.15 combined with Remark 5.16 and because the lower bound in (30) equals 1 under theassumptions of Part 3. To prove Part 4 we use Theorem 5.12. Choose a sequence C k , 0 < C k < ∞ ,that diverges monotonically to infinity and set ϕ k = ( W ( C k )). Then (26) is satisfied and theresult follows from Theorem 5.12 upon setting C ( δ ) = C k ( δ ) . (cid:4) Lemma F.3. Let ˇ β and ˇΩ satisfy Assumptions 5 and 7. Let T be the test statistic defined in (28)and let C be a covariance model. If there is a z ∈ span ( J ( C )) ∩ M with z / ∈ M − µ (i.e., with R ˆ β ( z ) = 0 ), then T does not satisfy the invariance condition (34).Proof. Choose z ∈ span ( J ( C )) ∩ M with z / ∈ M − µ . Because M is a linear space, we also have cz ∈ span ( J ( C )) ∩ M for every c ∈ R . Now cz ∈ M entails that y ∈ R n \ N ∗ implies y + cz ∈ R n \ N ∗ .Using the definition of T and Assumption 5 we obtain T ( y + cz ) = T ( y ) + 2 c (cid:0) R ˇ β ( y ) − r (cid:1) ′ ˇΩ − ( y ) R ˆ β ( z ) + c (cid:16) R ˆ β ( z ) (cid:17) ′ ˇΩ − ( y ) (cid:16) R ˆ β ( z ) (cid:17) for every y ∈ R n \ N ∗ . Because R ˆ β ( z ) = 0, we can in view of Assumption 7 find an y ∈ R n \ N ∗ suchthat (cid:16) R ˆ β ( z ) (cid:17) ′ ˇΩ − ( y ) (cid:16) R ˆ β ( z ) (cid:17) = 0holds. Hence T ( y + cz ) = T ( y ) cannot hold for the so-chosen y and all c = 0. Because cz ∈ span ( J ( C )), Remark 5.11(i) implies that condition (34) is not satisfied.76 roof of Proposition 5.23: (1) By the assumed equivariance (invariance, respectively) of ¯ θ ,¯Ω, and ¯ N (and hence of ¯ N ∗ ) w.r.t. 
Proof of Proposition 5.23: (1) By the assumed equivariance (invariance, respectively) of $\bar{\theta}$, $\bar{\Omega}$, and $\bar{N}$ (and hence of $\bar{N}^*$) w.r.t. the transformations $y \mapsto \alpha y + \bar{X}\eta$, the equivariance (invariance, respectively) of $\bar{\beta}$, $\bar{\Omega}$, and $\bar{N}$ required in Assumption 5 is clearly satisfied. Now choose $z \in J(\mathfrak{C})$ and $y \in \mathbb{R}^n$. If $y \in \bar{N}^*$, then so is $y + z$, because of invariance of $\bar{N}^*$ and because $z \in J(\mathfrak{C}) \subseteq \bar{M}$ holds by construction. Hence, $T(y) = 0 = T(y+z)$ is satisfied in this case. Now let $y \in \mathbb{R}^n \setminus \bar{N}^*$ (and hence also $y + z \in \mathbb{R}^n \setminus \bar{N}^*$). Note that $\bar{\Omega}(y) = \bar{\Omega}(y+z)$ holds by equivariance. It remains to show that $R\bar{\beta}(y) = R\bar{\beta}(y+z)$. Because $z \in J(\mathfrak{C}) \subseteq \bar{M}$ we have $z = X\gamma + (\bar{x}_1,\ldots,\bar{x}_p)\delta$ and thus obtain
$$R\bar{\beta}(y+z) = (R,0)\,\bar{\theta}(y+z) = (R,0)\Big(\bar{\theta}(y) + \big(\gamma',\delta'\big)'\Big) = R\bar{\beta}(y) + R\gamma, \qquad (51)$$
where we have made use of equivariance of $\bar{\theta}$. Now observe that $(\bar{x}_1,\ldots,\bar{x}_p)\delta \in \operatorname{span}\big(J(\mathfrak{C}) \cup (M_0 - \mu_0)\big)$ by construction of the $\bar{x}_i$. Hence, we can find an element $\mu \in M_0$ such that $(\bar{x}_1,\ldots,\bar{x}_p)\delta - (\mu - \mu_0) \in \operatorname{span}(J(\mathfrak{C}))$. Consequently, we obtain
$$z - \Big((\bar{x}_1,\ldots,\bar{x}_p)\delta - (\mu - \mu_0)\Big) = X\gamma + (\mu - \mu_0).$$
The left-hand side is obviously an element of $\operatorname{span}(J(\mathfrak{C}))$, while the right-hand side belongs to $M$, implying that the right-hand side is in $\operatorname{span}(J(\mathfrak{C})) \cap M$, which is a subset of $M_0 - \mu_0$ by assumption. Because $\mu - \mu_0 \in M_0 - \mu_0$, we have established that $X\gamma \in M_0 - \mu_0$, or in other words, that $R\gamma = 0$.
(2) The very first claim is obvious. If $z \in \operatorname{span}(J(\mathfrak{C}))$, then again we have $z = X\gamma + (\bar{x}_1,\ldots,\bar{x}_p)\delta$ and $\bar{\theta}(z) = (\gamma',\delta')'$. Now $R\bar{\beta}(z) = (R,0)\,\bar{\theta}(z) = R\gamma$, and exactly the same argument as above shows that $R\gamma = 0$. For the last claim note that $\bar{X}\bar{\theta}(y) = X^*\theta^*(y)$ holds because $\bar{X}$ and $X^*$ span the same space. This equality can be written as
$$X\bar{\beta}(y) - X\beta^*(y) = \sum_{i=1}^p x_i^*\,\theta^*_{k+i}(y) - \sum_{i=1}^p \bar{x}_i\,\bar{\theta}_{k+i}(y).$$
Because the right-hand side of the above equation belongs to $\operatorname{span}\big(J(\mathfrak{C}) \cup (M_0 - \mu_0)\big)$, we can find $\mu \in M_0$ such that the right-hand side of
$$X\big(\bar{\beta}(y) - \beta^*(y)\big) - (\mu - \mu_0) = \sum_{i=1}^p x_i^*\,\theta^*_{k+i}(y) - \sum_{i=1}^p \bar{x}_i\,\bar{\theta}_{k+i}(y) - (\mu - \mu_0)$$
belongs to $\operatorname{span}(J(\mathfrak{C}))$, while the left-hand side belongs to $M$. Arguing now similarly as in the proof of Part 1, we conclude that $R\bar{\beta}(y) = R\beta^*(y)$. $\blacksquare$
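The mechanism behind Part 1 can be illustrated numerically: once a vector $z$ (playing the rôle of an element of $J(\mathfrak{C})$) is included as an artificial regressor, adding $z$ to $y$ only shifts the coefficient on the artificial regressor and leaves $(R,0)\bar{\theta}$ unchanged. The following minimal sketch is ours, uses OLS on the augmented design with a single artificial regressor, and all concrete choices are illustrative.

```python
import numpy as np

# Minimal sketch of the adjustment by artificial regressors: augment the
# design X by a regressor z; for OLS on the augmented design (X, z) we have
# theta(y + z) = theta(y) + e_{k+1}, so (R, 0) theta is invariant under
# y -> y + z. All concrete choices are illustrative.
rng = np.random.default_rng(2)
n, k = 25, 3
X = rng.standard_normal((n, k))
R = np.array([[1.0, 0.0, 0.0]])                  # restriction on beta only
z = rng.standard_normal(n)                       # artificial regressor
X_bar = np.column_stack([X, z])                  # augmented design (X, z)

theta = lambda y: np.linalg.solve(X_bar.T @ X_bar, X_bar.T @ y)
R_aug = np.column_stack([R, np.zeros((1, 1))])   # the matrix (R, 0)

y = rng.standard_normal(n)
assert np.allclose(R_aug @ theta(y + z), R_aug @ theta(y))
print("(R, 0) theta is unchanged by y -> y + z")
```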
G Appendix: Properties of AR-Correlation Matrices

Lemma G.1.
1. Suppose the covariance model $\mathfrak{C}$ contains $\Lambda(\rho_m)$ for some sequence $\rho_m \in (-1,1)$ with $\rho_m \to 1$ ($\rho_m \to -1$, respectively). Then $\operatorname{span}(e_+)$ ($\operatorname{span}(e_-)$, respectively) is a concentration space of $\mathfrak{C}$.
2. $\mathfrak{C}_{\mathrm{AR}(1)}$ has $\operatorname{span}(e_+)$ and $\operatorname{span}(e_-)$ as its only concentration spaces. Consequently, $J(\mathfrak{C}_{\mathrm{AR}(1)}) = \operatorname{span}(e_+) \cup \operatorname{span}(e_-)$.
3. If $\rho_m \in (-1,1)$ is a sequence converging to $1$, then $\Sigma_m = \Lambda(\rho_m)$ satisfies $\Sigma_m \to \bar{\Sigma} = e_+e_+'$ and
$$D_m = \Pi_{\operatorname{span}(\bar{\Sigma})^\perp}\,\Sigma_m\,\Pi_{\operatorname{span}(\bar{\Sigma})^\perp}\big/s_m \to D \quad \text{as well as} \quad \Pi_{\operatorname{span}(\bar{\Sigma})^\perp}\,\Sigma_m\,\Pi_{\operatorname{span}(\bar{\Sigma})}\big/s_m^{1/2} \to 0,$$
where $s_m = \operatorname{tr}\big(\Pi_{\operatorname{span}(\bar{\Sigma})^\perp}\,\Sigma_m\,\Pi_{\operatorname{span}(\bar{\Sigma})^\perp}\big)$ converges to zero and $D$ is the matrix with $(i,j)$-th element $-n|i-j|\big/\sum_{i,j}|i-j|$ pre- and postmultiplied by $\big(I_n - n^{-1}e_+e_+'\big)$. Furthermore, $D$ is regular on $\operatorname{span}(\bar{\Sigma})^\perp$.
4. If $\rho_m \in (-1,1)$ is a sequence converging to $-1$, then $\Sigma_m = \Lambda(\rho_m)$ satisfies $\Sigma_m \to \bar{\Sigma} = e_-e_-'$ and
$$D_m = \Pi_{\operatorname{span}(\bar{\Sigma})^\perp}\,\Sigma_m\,\Pi_{\operatorname{span}(\bar{\Sigma})^\perp}\big/s_m \to D \quad \text{as well as} \quad \Pi_{\operatorname{span}(\bar{\Sigma})^\perp}\,\Sigma_m\,\Pi_{\operatorname{span}(\bar{\Sigma})}\big/s_m^{1/2} \to 0,$$
where $s_m = \operatorname{tr}\big(\Pi_{\operatorname{span}(\bar{\Sigma})^\perp}\,\Sigma_m\,\Pi_{\operatorname{span}(\bar{\Sigma})^\perp}\big)$ converges to zero and $D$ is the matrix with $(i,j)$-th element $n(-1)^{|i-j|+1}|i-j|\big/\sum_{i,j}|i-j|$ pre- and postmultiplied by $\big(I_n - n^{-1}e_-e_-'\big)$. Furthermore, $D$ is regular on $\operatorname{span}(\bar{\Sigma})^\perp$.

Proof. (1) and (2) are obvious.
(3) Because $\Pi_{\operatorname{span}(\bar{\Sigma})^\perp}\,\Sigma_m\,\Pi_{\operatorname{span}(\bar{\Sigma})^\perp}$ is nonnegative definite, but obviously different from the zero matrix (recall that $n > 1$), $s_m$ is always positive. Clearly, $\Pi_{\operatorname{span}(\bar{\Sigma})^\perp}\,\Sigma_m\,\Pi_{\operatorname{span}(\bar{\Sigma})^\perp}$ converges to $\Pi_{\operatorname{span}(\bar{\Sigma})^\perp}\,\bar{\Sigma}\,\Pi_{\operatorname{span}(\bar{\Sigma})^\perp} = 0$ and hence $s_m \to 0$. By l'Hôpital's rule the limit of $D_m$ can be obtained as the limit of $\Pi_{\operatorname{span}(\bar{\Sigma})^\perp}\,(d\Lambda/d\rho)(\rho_m)\,\Pi_{\operatorname{span}(\bar{\Sigma})^\perp}$ divided by the limit of $\operatorname{tr}\big(\Pi_{\operatorname{span}(\bar{\Sigma})^\perp}\,(d\Lambda/d\rho)(\rho_m)\,\Pi_{\operatorname{span}(\bar{\Sigma})^\perp}\big)$, provided the latter is nonzero. The second limit now equals
$$\operatorname{tr}\Big(\big(I_n - n^{-1}e_+e_+'\big)(d\Lambda/d\rho)(1)\big(I_n - n^{-1}e_+e_+'\big)\Big) = \operatorname{tr}\Big((d\Lambda/d\rho)(1)\big(I_n - n^{-1}e_+e_+'\big)\Big) = \operatorname{tr}\big((d\Lambda/d\rho)(1)\big) - n^{-1}\operatorname{tr}\big(e_+'(d\Lambda/d\rho)(1)e_+\big).$$
Observe that the $(i,j)$-th element of the matrix $(d\Lambda/d\rho)(1)$ is given by $|i-j|$. Hence, the above expression equals
$$-n^{-1}\operatorname{tr}\big(e_+'(d\Lambda/d\rho)(1)e_+\big) = -n^{-1}\sum_{i,j}|i-j|,$$
which is clearly nonzero. The first limit exists and equals
$$\big(I_n - n^{-1}e_+e_+'\big)(d\Lambda/d\rho)(1)\big(I_n - n^{-1}e_+e_+'\big),$$
which shows that $D$ is of the form as claimed in the lemma. We next show that $D$ is regular on $\operatorname{span}(\bar{\Sigma})^\perp = \operatorname{span}(e_+)^\perp$. This is equivalent to showing that the equation system
$$(d\Lambda/d\rho)(1)\,x + \lambda e_+ = 0, \qquad e_+'x = 0$$
has $x = 0$, $\lambda = 0$ as its only solution. We hence need to show that the $(n+1)\times(n+1)$ matrix
$$A = \begin{bmatrix} (d\Lambda/d\rho)(1) & e_+ \\ e_+' & 0 \end{bmatrix}$$
has rank $n+1$. Let $B$ be the $(n+1)\times(n+1)$ matrix given by
$$B = \begin{bmatrix} B_0 & 0 \\ 0 & 1 \end{bmatrix},$$
where the $n\times n$ matrix $B_0$ has $1$ everywhere on the main diagonal, $-1$ everywhere on the first superdiagonal, and zeros elsewhere. Let the $(n+1)\times(n+1)$ matrices $B^*$ and $B^{**}$ be given by
$$B^* = \begin{bmatrix} 0 & 1 \\ I_n & 0 \end{bmatrix}, \qquad B^{**} = \begin{bmatrix} I_n & 0 \\ f' & 1 \end{bmatrix},$$
where $f' = -(n-1, n-2, n-3, \ldots, 1, 0)$. It is easy to see that $B$, $B^*$, as well as $B^{**}$ are non-singular and that
$$B^* B A B^{**} = C = \begin{bmatrix} C_0 & 0 \\ 0 & 1 \end{bmatrix},$$
where $C_0$ is an $n\times n$ matrix that has $1$ everywhere on and above the main diagonal and $-1$ everywhere below it. Clearly, $C_0$ is nonsingular, hence $C$ is, and therefore so is $A$. Finally, we show that the limit of $\Pi_{\operatorname{span}(\bar{\Sigma})^\perp}\,\Sigma_m\,\Pi_{\operatorname{span}(\bar{\Sigma})}\big/s_m^{1/2}$ equals zero. Because $s_m \to 0$, it suffices to show that the limit of $\Pi_{\operatorname{span}(\bar{\Sigma})^\perp}\,\Sigma_m\,\Pi_{\operatorname{span}(\bar{\Sigma})}\big/s_m$ exists and is finite. Now the same arguments as above show that the latter limit is equal to
$$\big(I_n - n^{-1}e_+e_+'\big)(d\Lambda/d\rho)(1)\, n^{-1}e_+e_+'$$
divided by $-n^{-1}\sum_{i,j}|i-j|$.
(4) For the same reasons as in (3), $s_m$ is positive and converges to zero. By the same argument as in (3) the limit of $D_m$ is
$$\Big[\big(I_n - n^{-1}e_-e_-'\big)(d\Lambda/d\rho)(-1)\big(I_n - n^{-1}e_-e_-'\big)\Big] \Big/ \operatorname{tr}\Big(\big(I_n - n^{-1}e_-e_-'\big)(d\Lambda/d\rho)(-1)\big(I_n - n^{-1}e_-e_-'\big)\Big).$$
Note that the denominator is equal to
$$\operatorname{tr}\big((d\Lambda/d\rho)(-1)\big) - n^{-1}\operatorname{tr}\big(e_-'(d\Lambda/d\rho)(-1)e_-\big) = n^{-1}\sum_{i,j}|i-j| \neq 0,$$
observing that the $(i,j)$-th element of $(d\Lambda/d\rho)(-1)$ is given by $(-1)^{|i-j|+1}|i-j|$. We next show that $D$ is regular on $\operatorname{span}(\bar{\Sigma})^\perp = \operatorname{span}(e_-)^\perp$. This is equivalent to showing that the equation system
$$(d\Lambda/d\rho)(-1)\,x + \lambda e_- = 0, \qquad e_-'x = 0$$
has $x = 0$, $\lambda = 0$ as its only solution. We hence need to show that the $(n+1)\times(n+1)$ matrix
$$\tilde{A} = \begin{bmatrix} (d\Lambda/d\rho)(-1) & e_- \\ e_-' & 0 \end{bmatrix}$$
has rank $n+1$. Note that this is equivalent to establishing that the matrix
$$A^\dagger = \begin{bmatrix} (d\Lambda/d\rho)(-1) & (-1)^{n+1}e_- \\ (-1)^{n+1}e_-' & 0 \end{bmatrix}$$
is nonsingular. Now note that $A^\dagger = -EAE$, where $A$ is as in (3) and $E$ is an $(n+1)\times(n+1)$ diagonal matrix with $i$-th diagonal element given by $(-1)^i$. This proves regularity of $D$ on $\operatorname{span}(\bar{\Sigma})^\perp$. The claim for $\Pi_{\operatorname{span}(\bar{\Sigma})^\perp}\,\Sigma_m\,\Pi_{\operatorname{span}(\bar{\Sigma})}\big/s_m^{1/2}$ is proved as in (3). $\blacksquare$
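Both the closed form of $D$ in Part 3 and the non-singularity of the bordered matrix $A$ can be checked numerically. The following minimal sketch is ours; the dimension $n = 5$ and the grid of $\rho$ values are illustrative.

```python
import numpy as np

# Minimal sketch of Lemma G.1(3): for the AR(1) correlation matrix
# Lambda(rho) with entries rho^{|i-j|}, check that s_m -> 0 and D_m -> D
# as rho -> 1, with D given by the closed form in the lemma; also check
# that the bordered matrix A from the regularity argument has full rank.
n = 5
idx = np.arange(n)
absdiff = np.abs(idx[:, None] - idx[None, :])
Lam = lambda rho: rho ** absdiff
e_plus = np.ones(n)
P = np.eye(n) - np.outer(e_plus, e_plus) / n     # projection onto span(e+)^perp

# Closed-form limit D from the lemma.
D = P @ (-n * absdiff / absdiff.sum()) @ P

for rho in (0.9, 0.99, 0.999):
    S = P @ Lam(rho) @ P
    s = np.trace(S)
    print(rho, s, np.max(np.abs(S / s - D)))     # s -> 0 and S/s -> D

# Bordered matrix A = [[dLambda/drho(1), e+], [e+', 0]] has rank n + 1.
dLam1 = absdiff.astype(float)                    # (i,j) element |i-j|
A = np.block([[dLam1, e_plus[:, None]],
              [e_plus[None, :], np.zeros((1, 1))]])
print(np.linalg.matrix_rank(A) == n + 1)         # True
```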
Lemma G.2. For every $\nu \in [0,\pi]$ there exists a sequence $\Sigma_m \in \mathfrak{C}_{\mathrm{AR}(2)}$ converging to $E(\nu)E(\nu)'$.

Proof. For $\nu = 0$ ($\nu = \pi$, respectively) the matrix $E(\nu)E(\nu)'$ equals $e_+e_+'$ ($e_-e_-'$, respectively), and the result thus follows from Lemma G.1. Hence assume that $\nu \in (0,\pi)$. Consider, for $0 < r < 1$, the AR(2) spectral density
$$f_r(\omega) = (2\pi)^{-1}\, c(r)\, \big|1 - 2r\cos(\nu)\exp(-\iota\omega) + r^2\exp(-2\iota\omega)\big|^{-2},$$
where
$$c(r) = \big(1 - r^2\big)\Big(\big(1 + r^2\big)^2 - 4r^2\cos^2(\nu)\Big)\big(1 + r^2\big)^{-1}.$$
Observe that $\int f_r(\omega)\, d\omega = 1$, where the integral extends over $[-\pi,\pi]$. Hence the $n\times n$ variance covariance matrix $\Sigma^{(r)}$ corresponding to $f_r$ belongs to $\mathfrak{C}_{\mathrm{AR}(2)}$. Let $\varepsilon > 0$ and define
$$A(\varepsilon) = \big\{\omega \in [-\pi,\pi] : |\omega - \nu| \geq \varepsilon\big\} \cap \big\{\omega \in [-\pi,\pi] : |\omega + \nu| \geq \varepsilon\big\}.$$
Then it is easy to see that
$$\sup_{\omega \in A(\varepsilon)} |f_r(\omega)| \to 0 \quad \text{as } r \to 1,$$
and that for every $\delta > 0$ and every $\varepsilon > 0$ there exists $r(\varepsilon,\delta)$, $0 < r(\varepsilon,\delta) < 1$, such that
$$\int_{[-\pi,\pi]\setminus A(\varepsilon)} f_r(\omega)\, d\omega > 1 - \delta$$
holds for all $r$ satisfying $r(\varepsilon,\delta) < r < 1$. In view of symmetry of $f_r$ around $\omega = 0$, this shows that for $r$ sufficiently close to $1$ the spectral density $f_r$ is arbitrarily small outside of the union of the neighborhoods $|\omega - \nu| < \varepsilon$ and $|\omega + \nu| < \varepsilon$ and puts mass arbitrarily close to $1/2$ on each of these neighborhoods. Consequently, for every continuous function $g$ on $[-\pi,\pi]$ we have
$$\int_{[-\pi,\pi]} g(\omega) f_r(\omega)\, d\omega \to 0.5\, g(\nu) + 0.5\, g(-\nu) = \int_{[-\pi,\pi]} g(\omega)\, d\big(0.5\,\delta_\nu + 0.5\,\delta_{-\nu}\big)(\omega)$$
as $r \to 1$, where $\delta_x$ denotes unit pointmass at $x$. Specializing to $g(\omega) = \exp(-\iota l\omega)$ shows that $\Sigma^{(r)}$ converges to $E(\nu)E(\nu)'$. $\blacksquare$

Using the arguments in the above proof it is actually not difficult to show that the closure of the set of AR(2) spectral densities in the weak topology is the class of AR(2) spectral densities plus all spectral measures of the form $0.5\,\delta_\nu + 0.5\,\delta_{-\nu}$ for $\nu \in [0,\pi]$. This result extends in an obvious way to higher-order autoregressive models and has an appropriate generalization to (multivariate) autoregressive moving average models; see Theorem 4.1 in Deistler and Pötscher (1984).
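The convergence asserted in Lemma G.2 can again be checked numerically by computing the autocovariances of $f_r$ by numerical integration and comparing $\Sigma^{(r)}$ with $E(\nu)E(\nu)'$, whose $(i,j)$-th element is $\cos((i-j)\nu)$. The following minimal sketch is ours; $n$, $\nu$, and the integration grid are illustrative.

```python
import numpy as np

# Minimal sketch: the n x n covariance matrix Sigma^(r) with spectral
# density f_r approaches E(nu)E(nu)' (entries cos((i-j) nu)) as r -> 1.
# The autocovariances gamma_l = int exp(-i l w) f_r(w) dw are computed
# by a Riemann sum; all concrete choices are illustrative.
n, nu = 5, 1.0
omega = np.linspace(-np.pi, np.pi, 200_001)
dw = omega[1] - omega[0]
idx = np.arange(n)
diff = idx[:, None] - idx[None, :]

def Sigma_r(r):
    c = (1 - r**2) * ((1 + r**2)**2 - 4 * r**2 * np.cos(nu)**2) / (1 + r**2)
    a = 1 - 2 * r * np.cos(nu) * np.exp(-1j * omega) + r**2 * np.exp(-2j * omega)
    f = c / (2 * np.pi) * np.abs(a)**-2
    # f is symmetric in omega, so the autocovariances are real cosine moments
    gamma = np.array([(np.cos(l * omega) * f).sum() * dw for l in range(n)])
    return gamma[np.abs(diff)]

target = np.cos(diff * nu)                            # E(nu)E(nu)'
for r in (0.9, 0.99, 0.999):
    print(r, np.max(np.abs(Sigma_r(r) - target)))     # tends to 0 as r -> 1
```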
References

Anderson, T. W. (1971). The statistical analysis of time series. Wiley Series in Probability and Mathematical Statistics, Wiley, New York.
Andrews, D. W. K. (1991). Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica.
Andrews, D. W. K. and Monahan, J. C. (1992). An improved heteroskedasticity and autocorrelation consistent covariance matrix estimator. Econometrica.
Bakirov, N. and Székely, G. (2005). Student's t-test for Gaussian scale mixtures. Zapiski Nauchnyh Seminarov POMI.
Banerjee, A. N. and Magnus, J. R. (2000). On the sensitivity of the usual t- and F-tests to covariance misspecification. Journal of Econometrics, 157–176.
Bartlett, M. S. (1950). Periodogram analysis and continuous spectra. Biometrika.
Berk, K. N. (1974). Consistent autoregressive spectral estimates. Annals of Statistics.
Billingsley, P. (1968). Convergence of probability measures. John Wiley & Sons, Inc., New York-London-Sydney.
Cribari-Neto, F. (2004). Asymptotic inference under heteroskedasticity of unknown form. Computational Statistics and Data Analysis, 215–233.
Deistler, M. and Pötscher, B. M. (1984). The behaviour of the likelihood function for ARMA models. Advances in Applied Probability.
den Haan, W. J. and Levin, A. T. (1997). A practitioner's guide to robust covariance matrix estimation. In Robust Inference (G. Maddala and C. Rao, eds.), vol. 15 of Handbook of Statistics. Elsevier, 299–342.
Dufour, J.-M. (1997). Some impossibility theorems in econometrics with applications to structural and dynamic models. Econometrica.
Dufour, J.-M. (2003). Identification, weak instruments, and statistical inference in econometrics. Canadian Journal of Economics/Revue canadienne d'économique.
Eicker, F. (1963). Asymptotic normality and consistency of the least squares estimators for families of linear regressions. Ann. Math. Statist.
Eicker, F. (1967). Limit theorems for regressions with unequal and dependent errors. In Proc. Fifth Berkeley Sympos. Math. Statist. and Probability (Berkeley, Calif., 1965/66). Univ. California Press, Berkeley, Calif., Vol. I: Statistics, pp. 59–82.
Flegal, J. M. and Jones, G. L. (2010). Batch means and spectral variance estimators in Markov chain Monte Carlo. Ann. Statist.
Grenander, U. and Rosenblatt, M. (1957). Statistical analysis of stationary time series. John Wiley & Sons, New York.
Hannan, E. (1970). Multiple time series. Wiley Series in Probability and Mathematical Statistics, Wiley, New York.
Hannan, E. J. (1957). The variance of the mean of a stationary process. Journal of the Royal Statistical Society. Series B.
Hansen, B. E. (1992). Consistent covariance matrix estimation for dependent heterogeneous processes. Econometrica.
Heidelberger, P. and Welch, P. D. (1981). A spectral method for confidence interval generation and run length control in simulations. Commun. ACM.
Ibragimov, R. and Müller, U. K. (2010). t-statistic based correlation and heterogeneity robust inference. Journal of Business and Economic Statistics.
Jansson, M. (2002). Consistent covariance matrix estimation for linear processes. Econometric Theory.
Jansson, M. (2004). The error in rejection probability of simple autocorrelation robust tests. Econometrica.
Jowett, G. H. (1955). The comparison of means of sets of observations from sections of independent stochastic series. Journal of the Royal Statistical Society. Series B.
Keener, R. W., Kmenta, J. and Weber, N. C. (1991). Estimation of the covariance matrix of the least-squares regression coefficients when the disturbance covariance matrix is of unknown form. Econometric Theory.
Kelejian, H. H. and Prucha, I. R. (2007). HAC estimation in a spatial framework. Journal of Econometrics, 131–154.
Kelejian, H. H. and Prucha, I. R. (2010). Specification and estimation of spatial autoregressive models with autoregressive and heteroskedastic disturbances. Journal of Econometrics, 53–67.
Kiefer, N. M. and Vogelsang, T. J. (2002a). Heteroskedasticity-autocorrelation robust standard errors using the Bartlett kernel without truncation. Econometrica.
Kiefer, N. M. and Vogelsang, T. J. (2002b). Heteroskedasticity-autocorrelation robust testing using bandwidth equal to sample size. Econometric Theory.
Kiefer, N. M. and Vogelsang, T. J. (2005). A new asymptotic theory for heteroskedasticity-autocorrelation robust tests. Econometric Theory.
Kiefer, N. M., Vogelsang, T. J. and Bunzel, H. (2000). Simple robust testing of regression hypotheses. Econometrica.
Krämer, W. (1989). On the robustness of the F-test to autocorrelation among disturbances. Economics Letters, 37–40.
Krämer, W. (2003). The robustness of the F-test to spatial autocorrelation among regression disturbances. Statistica (Bologna).
Krämer, W., Kiviet, J. and Breitung, J. (1990). The null distribution of the F-test in the linear regression model with autocorrelated disturbances. Statistica (Bologna).
Krämer, W. and Hanck, C. (2009). More on the F-test under nonspherical disturbances. In Statistical Inference, Econometric Analysis and Matrix Algebra (B. Schipp and W. Krämer, eds.). Physica-Verlag HD, 179–184.
Lehmann, E. L. and Romano, J. P. (2005). Testing statistical hypotheses. 3rd ed. Springer Texts in Statistics, Springer, New York.
Long, J. S. and Ervin, L. H. (2000). Using heteroscedasticity consistent standard errors in the linear regression model. The American Statistician.
Magee, L. (1989). An Edgeworth test size correction for the linear model with AR(1) errors. Econometrica.
Martellosio, F. (2010). Power properties of invariant tests for spatial autocorrelation in linear regression. Econometric Theory.
Neave, H. R. (1970). An improved formula for the asymptotic variance of spectrum estimates. The Annals of Mathematical Statistics.
Newey, W. K. and West, K. D. (1987). A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica.
Newey, W. K. and West, K. D. (1994). Automatic lag selection in covariance matrix estimation. The Review of Economic Studies.
Park, R. E. and Mitchell, B. M. (1980). Estimating the autocorrelated error model with trended data. Journal of Econometrics.
Perron, P. and Ren, L. (2011). On the irrelevance of impossibility theorems: the case of the long-run variance. J. Time Ser. Econom., Art. 1, 34 pp.
Phillips, P. C. B. (2005). HAC estimation by automated regression. Econometric Theory.
Phillips, P. C. B., Sun, Y. and Jin, S. (2006). Spectral density estimation and robust hypothesis testing using steep origin kernels without truncation. International Economic Review.
Phillips, P. C. B., Sun, Y. and Jin, S. (2007). Long run variance estimation and robust regression testing using sharp origin kernels with no truncation. Journal of Statistical Planning and Inference.
Politis, D. (2011). Higher-order accurate, positive semidefinite estimation of large-sample covariance and spectral density matrices. Econometric Theory.
Pötscher, B. M. (2002). Lower risk bounds and properties of confidence sets for ill-posed estimation problems with applications to spectral density and persistence estimation, unit roots, and estimation of long memory parameters. Econometrica.
Preinerstorfer, D. (2014). Finite sample properties of tests based on prewhitened nonparametric covariance estimators. Working Paper, Department of Statistics, University of Vienna.
Preinerstorfer, D. and Pötscher, B. M. (2014). On the power of invariant tests for hypotheses on a covariance matrix. Working Paper, Department of Statistics, University of Vienna.
Robinson, G. (1979). Conditional properties of statistical procedures. Annals of Statistics.
Sun, Y. (2013). A heteroskedasticity and autocorrelation robust F test using an orthonormal series variance estimator. Econom. J.
Sun, Y. and Kaplan, D. M. (2012). Fixed-smoothing asymptotics and accurate F approximation using vector autoregressive covariance matrix estimators. Working Paper, Department of Economics, UC San Diego.
Sun, Y., Phillips, P. C. B. and Jin, S. (2008). Optimal bandwidth selection in heteroskedasticity-autocorrelation robust testing. Econometrica.
Sun, Y., Phillips, P. C. B. and Jin, S. (2011). Power maximization and size control in heteroskedasticity and autocorrelation robust tests with exponentiated kernels. Econometric Theory.
Thomson, D. J. (1982). Spectrum estimation and harmonic analysis. Proceedings of the IEEE.
Velasco, C. and Robinson, P. M. (2001). Edgeworth expansions for spectral density estimates and studentized sample mean. Econometric Theory.
Vogelsang, T. J. (2012). Heteroskedasticity, autocorrelation, and spatial correlation robust inference in linear panel models with fixed-effects. Journal of Econometrics, 303–319.
White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica.
Zhang, X. and Shao, X. (2013a). Fixed-smoothing asymptotics for time series. Ann. Statist.
Zhang, X. and Shao, X. (2013b). On a general class of long run variance estimators. Econom. Lett.