On the Size Control of the Hybrid Test for Predictive Ability
Deborah Kim∗
Department of Economics, Northwestern University
[email protected]
August 7, 2020
Abstract
We show that the hybrid test for superior predictability is not pointwise asymptotically of level α under standard conditions, and may lead to rejection rates over 11% when the significance level α is 5% in a simple case. We propose a modified hybrid test which is uniformly asymptotically of level α by properly adapting the generalized moment selection method.

Keywords: Asymptotic size; Generalized moment selection; Reality check; Uniform testing
1. Introduction
A test of superior predictive ability (SPA) compares many forecasting methods. More precisely, it tests whether a certain forecasting method outperforms a finite set of alternative forecasting methods. White (2000) developed a framework for a SPA test and proposed a SPA test called the reality check for data snooping. Hansen (2005) proposed a SPA test featuring improved power in the framework of White (2000). Finally, Song (2012) devised a SPA test, called the hybrid test, which delivers better power against certain local alternative hypotheses under which both of the SPA tests of White (2000) and Hansen (2005) perform poorly.

In this framework, the null hypothesis takes the form H₀ : d ≤ 0, where d ∈ R^M is a vector of risk differences and, in general, M ≥ 2. It then follows that the limiting distribution of standard test statistics depends on exactly which of the elements in the vector d are equal to zero. This property prevents researchers from using any tabulated critical values.

To circumvent the above problem, White (2000) proposed to use a critical value from the so-called least favorable distribution. The approach exploits the fact that the distribution of White (2000)'s test statistic T under d = 0 is stochastically largest over all possible null distributions satisfying d ≤ 0. The distribution under d = 0 is then called the least favorable one. White (2000) proposes to approximate the least favorable distribution using the bootstrap and takes the 1 − α quantile of the distribution as the critical value, where α is a significance level. The resulting critical value converges to a value which is always larger than the 1 − α quantile of the limiting distribution of T under any null distribution, and thus the approach yields a test with correct asymptotic size.

Song (2012) followed White (2000) in the analysis of his hybrid test and used the same least favorable distribution, i.e., the one associated with d = 0. However, in this paper, we show that this null distribution is not the least favorable one for the type of test statistic that Song (2012) considers in the hybrid test, which, in particular, combines two different test statistics. Whereas one of the test statistics is stochastically largest under d = 0, the other one is not. Consequently, the hybrid test, which employs bootstrap approximations to the distribution with d = 0, fails to control the rejection probability under the null and leads to size distortion.

The main contributions of this paper are the following. First, we show that the hybrid test is not pointwise asymptotically of level α under reasonable conditions. Our results illustrate that the cause of the size distortion lies in the fact that the bootstrap procedure behind the hybrid test neither approximates the asymptotic distribution of the test statistic nor the least favorable distribution. Second, we propose a modified hybrid test which is uniformly asymptotically of level α, again under reasonable conditions. This stronger result implies that one would expect the finite sample size of the test not to exceed the significance level for large enough sample sizes. Our proposed modification follows the generalized moment selection method by Andrews and Soares (2010, henceforth AS) after accounting for the fact that the test statistic in the hybrid test does not exhibit certain monotonicity properties that are required for the approach in AS. This last observation, despite being rather technical, may be of independent interest.

This article is organized as follows. Section 2 lays out notation and describes the hybrid test as originally proposed by Song (2012). Section 3 presents the main result on the properties of the hybrid test. Section 4 presents the modified hybrid test and its formal properties. Section 5 explores Monte Carlo simulations of the hybrid test and the modified one. Lastly, Section 6 concludes. The proofs of the formal results are included in APPENDIX B.

∗ I am grateful to Ivan Canay for his valuable guidance and suggestions. I have had the support and encouragement of Yoon-Jae Whang. I thank Joel Horowitz, Eric Auerbach, Myungkou Shin, and Modibo Camara for their helpful comments.
2. The hybrid test for predictive ability
In this section, we introduce notation and the procedure of the hybrid test. Suppose there is a time series {X_t}_{t=1}^∞ from the distribution P and we observe X^(n) ≡ {X_t}_{t=1}^n by time n. Define a τ-ahead unknown random variable ξ_{n+τ} ≡ f(X_{n+τ}) for some function f, the object that we aim to predict. A forecasting method ϕ is a mapping from the sample X^(n) to a forecast for ξ_{n+τ}. We have M + 1 different forecasting methods: a benchmark forecasting method ϕ₀ and a finite set of alternative forecasting methods ϕ_m, m ∈ 𝓜 = {1, · · · , M}.

The objective of the hybrid test is to test whether the benchmark forecasting method is superior to all other alternative forecasting methods in terms of predictive ability. To compare the predictive ability, we assess the risk (of prediction) of the m-th forecasting method using a real-valued function Λ_m ≡ Λ(ϕ_m, P) for m = 0, 1, · · · , M. An example of such a risk is the mean squared error Λ_m = E[(ϕ_m(X^(n)) − ξ_{n+τ})²] for ξ_{n+τ} ∈ R. We say the benchmark forecasting method ϕ₀ dominates the forecasting method ϕ_m (in terms of predictive ability measured by the risk function Λ) if the risk of the forecasting method ϕ_m is greater than or equal to the risk of the benchmark forecasting method ϕ₀. Because we are interested in testing whether the benchmark method ϕ₀ dominates all alternative forecasting methods in 𝓜, the hypotheses can be formulated as

H₀ : Λ₀ ≤ Λ_m for all m ∈ 𝓜, and H₁ : Λ₀ > Λ_m for some m ∈ 𝓜.

The risk of the m-th forecasting method ϕ_m is unknown, but it can be estimated. Let Λ̂_{n,m} be an estimator for Λ_m, for m = 0, 1, · · · , M. We define the risk difference between the benchmark forecasting method ϕ₀ and the m-th forecasting method ϕ_m by d_m. Define the counterpart of d_m in the sample space by d̂_{n,m}, i.e.,

d_m ≡ Λ₀ − Λ_m and d̂_{n,m} ≡ Λ̂_{n,0} − Λ̂_{n,m} for m ∈ 𝓜.
Define d as the M-dimensional vector whose m-th element is d_m for m = 1, · · · , M. Analogously define d̂_n as the M-dimensional vector whose m-th element is d̂_{n,m} for m = 1, · · · , M. With this notation, the hypotheses can be equivalently written as H₀ : d ≤ 0 and H₁ : d_m > 0 for some m ∈ 𝓜.

A test φ_n ≡ φ_n(X₁, · · · , X_n) for the null hypothesis H₀ is said to be pointwise asymptotically of level α if it satisfies

lim sup_{n→∞} E_P[φ_n] ≤ α for all P ∈ 𝒫₀, (1)

where 𝒫₀ is the set of all distributions P satisfying the null hypothesis and the basic assumptions. In turn, the test is said to be uniformly asymptotically of level α if it satisfies

lim sup_{n→∞} sup_{P∈𝒫₀} E_P[φ_n] ≤ α. (2)

Note that (2) implies (1). If either (1) or (2) fails, then we can always find data generating processes under the null such that the rejection probability exceeds α.

In the literature on comparing the predictive ability of forecasting methods, the asymptotic properties of statistical tests are often characterized by the asymptotic distribution of the vector d̂_n. In line with this, Song (2012) imposes the following assumption on the asymptotic behavior of the vector d̂_n.

Assumption 1. Let P be the distribution generating the time series {X_t}_{t=1}^∞.
(i) √n(d̂_n − d) →_d N(0, Σ), where d and Σ depend on P and Σ is an M-dimensional variance-covariance matrix.
(ii) There exists a consistent estimator σ̂²_{n,m} for Σ_{m,m} for m = 1, · · · , M.

West (1996)'s Theorem 4.1 and Hansen (2005)'s Assumption 1 provide regularity conditions under which Assumption 1 is satisfied. It is worth noting that Assumption 1(i) implies that d̂_n converges in probability to a fixed parameter d as the sample size increases to infinity. In this sense, Assumption 1(i) is stronger than assuming that √n(d̂_n − E_P[d̂_n]) converges in distribution to a normal distribution.

The key feature of the hybrid test is to use two pairs of a test statistic and a critical value in order to form a rejection region. The first pair (T̂^r_n, ĉ^{r∗}_n) is adopted from the reality check. We define the one-sided test by φ^r_n ≡ 1{T̂^r_n > ĉ^{r∗}_n}. We call this the one-sided test because the test statistic T̂^r_n was originally devised to test the one-sided null hypothesis H₀.
The second pair (T̂^s_n, ĉ^{s∗}_n) is adopted from the symmetrized test by Linton et al. (2005). We define the two-sided test by φ^s_n ≡ 1{T̂^s_n > ĉ^{s∗}_n}. Again, the name two-sided test comes from the fact that the statistic T̂^s_n was originally proposed to test the two-sided null hypothesis H^s₀ : d ≤ 0 or d ≥ 0. Given the two pairs (T̂^r_n, ĉ^{r∗}_n) and (T̂^s_n, ĉ^{s∗}_n), the hybrid test is defined by

φ_n ≡ φ^r_n(1 − φ^s_n) + φ^s_n. (3)

That is, the hybrid test rejects the null hypothesis H₀ if T̂^r_n > ĉ^{r∗}_n or T̂^s_n > ĉ^{s∗}_n. The test takes the union of the two rejection regions formed by the two tests, φ^r_n and φ^s_n, as its rejection region. Such a rejection region allows the hybrid test to deliver better power against some alternative hypotheses under which the reality check or Hansen (2005)'s SPA test perform poorly. For details on this result, see Song (2012).

The two test statistics are defined as follows:

T̂^r_n ≡ √n max_{m∈𝓜} (d̂_{n,m}/σ̂_{n,m}) and
T̂^s_n ≡ √n min( max_{m∈𝓜} (d̂_{n,m}/σ̂_{n,m}), max_{m∈𝓜} (−d̂_{n,m}/σ̂_{n,m}) ).
To define the two critical values, fix a significance level α ∈ (0, 1) and a tuning parameter γ ∈ (0, 1]. Ideally, one would like values c̄^r(α, γ) and c̄^s(α, γ) satisfying the two conditions:

lim_{n→∞} P{T̂^r_n > c̄^r(α, γ) and T̂^s_n ≤ c̄^s(α, γ)} = α(1 − γ) and
lim_{n→∞} P{T̂^s_n > c̄^s(α, γ)} = αγ.

The two equations above imply that if the critical values ĉ^{r∗}_n ≡ ĉ^{r∗}_n(α, γ) and ĉ^{s∗}_n ≡ ĉ^{s∗}_n(α, γ) converge in probability to c̄^r(α, γ) and c̄^s(α, γ) respectively, then the hybrid test has a limiting rejection probability of α under the null hypothesis. The two values are, however, infeasible as the limit distribution of the test statistics is not pivotal. Therefore, Song (2012) proposes to implement the bootstrap to obtain data-dependent critical values, expecting that this approach would deliver critical values with the above properties.

The procedure to get the bootstrap critical values is the following. Consider a general bootstrap sample {d̂^∗_{n,b} : 1 ≤ b ≤ B}, where we denote the m-th element of d̂^∗_{n,b} as d̂^∗_{n,b,m}. For example, if the observations are stationary, then one can implement the stationary bootstrap. Define a centred bootstrap sample as

d̃^∗_{n,b} ≡ d̂^∗_{n,b} − d̂_n. (4)

Define the bootstrap test statistics {(T̂^{r∗}_{n,b}, T̂^{s∗}_{n,b})}_{b=1}^B, where

T̂^{r∗}_{n,b} ≡ √n max_{m∈𝓜} (d̃^∗_{n,b,m}/σ̂_{n,m}) and (5)
T̂^{s∗}_{n,b} ≡ √n min( max_{m∈𝓜} (d̃^∗_{n,b,m}/σ̂_{n,m}), max_{m∈𝓜} (−d̃^∗_{n,b,m}/σ̂_{n,m}) ),

and {σ̂_{n,m} : m ∈ 𝓜} are not bootstrapped. Song (2012) defines ĉ^{s∗}_n as the (1 − αγ)-quantile of the bootstrap sample {T̂^{s∗}_{n,b}}_{b=1}^B, i.e.,

ĉ^{s∗}_n ≡ inf{ c ∈ R : (1/B) Σ_{b=1}^B 1{T̂^{s∗}_{n,b} ≤ c} ≥ 1 − αγ }.

Given ĉ^{s∗}_n, the critical value ĉ^{r∗}_n is defined as the (1 − α(1 − γ))-quantile of the bootstrap sample {T̂^{r∗}_{n,b} · 1{T̂^{s∗}_{n,b} ≤ ĉ^{s∗}_n}}_{b=1}^B, i.e.,

ĉ^{r∗}_n ≡ inf{ c ∈ R : (1/B) Σ_{b=1}^B 1{T̂^{r∗}_{n,b} · 1{T̂^{s∗}_{n,b} ≤ ĉ^{s∗}_n} ≤ c} ≥ 1 − α(1 − γ) }.
The tuning parameter γ determines the degree to which the two-sided test φ^s_n contributes to the hybrid test in constructing the rejection region. For example, if γ is zero, then the hybrid test coincides with the one-sided test with significance level α. If γ is 1, then the hybrid test corresponds to the two-sided test with significance level α. We restrict the tuning parameter γ to be in (0, 1], as the asymptotic properties of the one-sided test follow White (2000).
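To make the two-step quantile construction above concrete, the following sketch computes ĉ^{s∗}_n and then ĉ^{r∗}_n from arrays of bootstrap statistics, using the inf-definition of the empirical quantile. The function name and the NumPy array interface are our own illustration, not code from Song (2012).

```python
import numpy as np

def hybrid_critical_values(T_r_boot, T_s_boot, alpha, gamma):
    """Two-step bootstrap critical values of the hybrid test (sketch).

    T_r_boot, T_s_boot: length-B arrays of bootstrap statistics T^r*_{n,b}, T^s*_{n,b}.
    Returns (c_r, c_s): c_s is the (1 - alpha*gamma)-quantile of T^s*, and
    c_r is the (1 - alpha*(1-gamma))-quantile of T^r* . 1{T^s* <= c_s}.
    """
    T_r_boot = np.asarray(T_r_boot, dtype=float)
    T_s_boot = np.asarray(T_s_boot, dtype=float)
    B = T_s_boot.size

    # c_s: smallest c with empirical cdf of T^s* at c at least 1 - alpha*gamma.
    s_sorted = np.sort(T_s_boot)
    k_s = int(np.ceil((1.0 - alpha * gamma) * B)) - 1
    c_s = s_sorted[min(max(k_s, 0), B - 1)]

    # c_r: same construction applied to T^r* set to zero on draws where the
    # two-sided bootstrap statistic already exceeds c_s.
    truncated = T_r_boot * (T_s_boot <= c_s)
    r_sorted = np.sort(truncated)
    k_r = int(np.ceil((1.0 - alpha * (1.0 - gamma)) * B)) - 1
    c_r = r_sorted[min(max(k_r, 0), B - 1)]
    return c_r, c_s
```

Given the statistics T̂^r_n and T̂^s_n computed from the sample, the test then rejects H₀ whenever T̂^r_n > c_r or T̂^s_n > c_s.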
3. On the size control of the hybrid test
In this section, we investigate the asymptotic properties of the hybrid test, as these were not formally studied in Song (2012). First, we provide a simple example where the asymptotic rejection probability of the hybrid test exceeds the significance level. Next, we present the main result generalizing the observation. We begin by adding the following two assumptions.
Assumption 2. All diagonal elements of the variance-covariance matrix Σ are positive.

Assumption 3. As the sample size n diverges to infinity, we have

sup_{z∈R^M} | P^∗_n{√n(d̂^∗_{n,b} − d̂_n) ≤ z} − P{√n(d̂_n − d) ≤ z} | →_p 0,

where P^∗_n denotes the probability measure conditional on the sample X^(n).

Assumption 2 ensures that no element of √n(d̂_n − d) degenerates in the limit. Assumption 3 means that the bootstrap distribution approximates the distribution of √n(d̂_n − d) when the sample size n is sufficiently large. This assumption is necessary to justify the bootstrapped critical values. White (2000) and Hansen (2005) provide sufficient conditions on d̂_n and bootstrap procedures for Assumption 3 to hold.

To gain intuition on the asymptotic properties of the hybrid test, we consider a simple example where the number of alternative forecasting methods is two, M = 2. Let P be a distribution satisfying the null hypothesis. In particular, we assume that d₁ = 0 and d₂ < 0. This means that the first alternative forecasting method is as risky as the benchmark forecasting method in terms of predictive ability, whereas the benchmark forecasting method dominates the second alternative forecasting method. Let Z = (Z₁, Z₂) be the normal vector in Assumption 1. We further assume that the risks among the benchmark and alternative forecasting methods are independent in the limit, cov(Z₁, Z₂) = 0. For the sake of simplicity, assume that var(Z₁) = var(Z₂) = 1 and γ = 0.5.

First, we derive the asymptotic distributions of T̂^r_n and T̂^s_n. By Assumption 1 and Assumption 2, we have √n(d̂_{n,1} − d₁, d̂_{n,2} − d₂) →_d (Z₁, Z₂) ∼ N(0, I₂). The condition that d₁ = 0, d₂ < 0 implies that √n d̂_{n,2} diverges to −∞ as n goes to infinity while √n d̂_{n,1} is stochastically bounded. As a result, the two test statistics depend only on √n d̂_{n,1} for large n. That is, the vector of the test statistics converges in distribution to a standard normal distribution, i.e.,

(T̂^r_n, T̂^s_n) ≡ ( √n max(d̂_{n,1}, d̂_{n,2}), √n min(max(d̂_{n,1}, d̂_{n,2}), max(−d̂_{n,1}, −d̂_{n,2})) ) ≈ ( √n d̂_{n,1}, √n d̂_{n,1} ) →_d (Z₁, Z₁). (6)

Next, we derive the asymptotic distribution of the bootstrap version of T̂^s_n. By Assumption 3 we have √n(d̃^∗_{n,b,1}, d̃^∗_{n,b,2}) →_d (V₁, V₂) ∼ N(0, I₂) with probability approaching 1. Note that the vector (d̃^∗_{n,b,1}, d̃^∗_{n,b,2}) is centred at zero while (d̂_{n,1}, d̂_{n,2}) is centred at (d₁, d₂) = (0, d₂). Consequently, T̂^{s∗}_{n,b} depends on both √n d̃^∗_{n,b,1} and √n d̃^∗_{n,b,2} for large n. This contrasts with the fact that T̂^s_n only relies on √n d̂_{n,1}. The bootstrap consistency assumption and the continuous mapping theorem give

T̂^{s∗}_{n,b} ≡ √n min(max(d̃^∗_{n,b,1}, d̃^∗_{n,b,2}), max(−d̃^∗_{n,b,1}, −d̃^∗_{n,b,2})) →_d L

with probability approaching 1, where L ≡ min(max(V₁, V₂), max(−V₁, −V₂)) and the vector (V₁, V₂) follows the bivariate standard normal distribution.
Simple algebra gives the formula for the distribution function of L,

F_L(t) ≡ P{L ≤ t} = −2Φ(t)² + 4Φ(t) − 1, t ∈ [0, ∞),

where Φ is the distribution function of the standard normal distribution. Moreover, the asymptotic distribution F_L implies that the critical value ĉ^{s∗}_n converges in probability to the (1 − α/2)-quantile of F_L, i.e.,

ĉ^{s∗}_n →_p c^s(α) ≡ inf{x ∈ R : F_L(x) ≥ 1 − α/2}. (7)

Notice that the asymptotic distribution of the bootstrap test statistic T̂^{s∗}_{n,b} does not coincide with the asymptotic distribution of the two-sided test statistic T̂^s_n. This discordance between Φ and F_L eventually brings about the size distortion of the hybrid test. To see this, we observe that the probability of rejecting the null hypothesis is larger than P{Z₁ > c^s(α)} = 1 − Φ(c^s(α)) for large n, i.e.,

E[φ_n] = P{T̂^r_n > ĉ^{r∗}_n or T̂^s_n > ĉ^{s∗}_n} ≈ P{Z₁ > min(c^r(α), c^s(α))} ≥ P{Z₁ > c^s(α)},

where c^r(α) is the probability limit of ĉ^{r∗}_n; the approximation holds by Equations (6) and (7); and the inequality holds by the definition of the minimum. A simple calculation then reveals that 1 − Φ(c^s(α)) is greater than the nominal level α for α ∈ (0, 0.5). The gap between 1 − Φ(c^s(α)) and α can be sizable: the value of 1 − Φ(c^s(α)) is 0.158, 0.112, and 0.050 when α is 0.10, 0.05, and 0.01 respectively.

The previous example shows that there exists a fixed data generating process under which the size of the hybrid test is not controlled by the significance level α. The re-centered bootstrap sample mean (d̃^∗_{n,b,1}, d̃^∗_{n,b,2}) plays a critical role in obtaining this conclusion. Specifically, the re-centered bootstrap sample mean steers the asymptotic distribution of the bootstrap test statistics (T̂^{r∗}_{n,b}, T̂^{s∗}_{n,b}) away from the asymptotic distribution of the test statistics (T̂^r_n, T̂^s_n), consequently yielding over-rejection of the null hypothesis.
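The rejection rates quoted above can be checked numerically from the closed form of F_L. The snippet below (our own verification, using only the standard library) inverts F_L(t) = −2Φ(t)² + 4Φ(t) − 1 at the 1 − α/2 level by bisection and evaluates 1 − Φ(c^s(α)).

```python
import math

def Phi(t):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def F_L(t):
    """Limiting cdf of the bootstrap two-sided statistic, for t >= 0."""
    p = Phi(t)
    return -2.0 * p * p + 4.0 * p - 1.0

def c_s(alpha):
    """(1 - alpha/2)-quantile of F_L (the gamma = 0.5 case), by bisection."""
    lo, hi = 0.0, 10.0
    target = 1.0 - alpha / 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if F_L(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(1.0 - Phi(c_s(alpha)), 3))
# prints: 0.1 0.158 / 0.05 0.112 / 0.01 0.05
```

The output reproduces the three over-rejection rates reported in the text.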
One might expect that the same result would hold in general cases because the bootstrap procedure uses re-centered bootstrap sample means even when M is larger than 2. The following theorem states that this conjecture is true.

Theorem 1.
Suppose that Assumptions 1, 2, and 3 hold. Let 2 ≤ M < ∞. Let P be a distribution generating X^(n) satisfying the following conditions:
1. there exists m ∈ 𝓜 such that d_m = 0,
2. there exists m′ ∈ 𝓜 such that d_{m′} < 0, and
3. Σ is a diagonal matrix.
For any γ ∈ (0, 1], there exists an upper bound ᾱ ≡ ᾱ(P, γ) ∈ (0, 0.5) such that the following condition holds:

lim_{n→∞} E_P[φ_n] > α for any α ∈ (0, ᾱ),

where φ_n is the hybrid test defined in Equation (3).

Theorem 1 provides sufficient conditions for a distribution under the null hypothesis for which the asymptotic rejection probability exceeds the nominal level α ∈ (0, ᾱ). The existence of such a distribution implies that the hybrid test is not pointwise asymptotically of level α for α ∈ (0, ᾱ), and thus the size of the hybrid test is not controlled asymptotically.

The size of the upper bound ᾱ could be of practical interest, as one can carry out the hybrid test without concerns about size distortion if ᾱ is smaller than 0.01. The value of ᾱ is, however, a priori unknown and depends on the data generating process. To be more specific, it relies on the number of alternative forecasting methods M as well as the number of alternatives attaining the same risk as the benchmark, i.e., M₀ ≡ |{m ∈ 𝓜 : d_m = 0}|. Once M, M₀, and γ are fixed, ᾱ can be obtained by numerical approximation. To see how large ᾱ could be, we tabulated some values of ᾱ under γ = 0.5 in Table 1. The value of ᾱ varies consistently with the ratio of M₀ to M: ᾱ gets close to γ as the ratio increases to 1 and close to 0 as the ratio diminishes to zero. We present the values of ᾱ under γ = 0.25 and γ = 0.75 in APPENDIX A. The result implies that one cannot use conventional significance levels {0.01, 0.05, 0.10} when the ratio exceeds a half, and that the use of p-values is generally invalid.

Table 1: Values of ᾱ in Theorem 1 with M = 10, · · · , M₀ = kM for k = 0.1, · · · , 1, and γ = 0.5.

While Theorem 1 postulates three conditions for data generating processes leading to size distortion for α ∈ (0, ᾱ), the first condition in Theorem 1 is crucial because it prevents the distribution of the test statistics from degenerating. The condition is satisfied if the set of alternative forecasting methods 𝓜 contains at least one forecasting method that attains the same risk as the benchmark forecasting method. If this condition is violated, all alternative forecasting methods have greater risks than the benchmark forecasting method under the null hypothesis. In this case, the two test statistics diverge to negative infinity while the critical values converge to fixed real numbers regardless. Therefore, if the first condition is violated, the conclusion no longer holds.

The second condition says that the set 𝓜 must contain at least one forecasting method riskier than the benchmark forecasting method. Recall that in the example d₂ < 0 made the asymptotic distribution of T̂^s_n deviate from that of T̂^{s∗}_{n,b}. In the same manner, the second condition causes the asymptotic distribution of the test statistics to differ from the asymptotic distribution of the bootstrap test statistics. If the second condition is not satisfied, then d must be zero under the null hypothesis. If d = 0, the bootstrap test statistics exactly approximate the limiting distribution of the test statistics. The limiting probability of rejecting the null hypothesis therefore becomes the significance level α, rather than exceeding α.

Unlike the first two, the last condition is not a necessary condition. The condition requires that the covariance of (Z_i, Z_j) is zero for any i ≠ j ∈ 𝓜, where Z is the random vector from N(0, Σ) in Assumption 1. In the simple case where M = 2, it can easily be shown that the result still holds even if cov(Z₁, Z₂) > 0. The condition is posited to simplify the proof.
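The mechanism behind Theorem 1 can be explored numerically using the limiting representations derived above: under the stated conditions, both test statistics converge to the maximum of the M₀ binding coordinates of Z, while the bootstrap statistics involve all M coordinates of V. The sketch below (our own illustration, valid only for the Theorem 1 design with M₀ < M and Σ = I) simulates the limiting rejection probability of the hybrid test.

```python
import numpy as np

def limiting_rejection_prob(M, M0, alpha, gamma, reps=200_000, seed=0):
    """Monte Carlo approximation of the limiting rejection probability of the
    hybrid test under the Theorem 1 design: d_m = 0 for M0 coordinates,
    d_m < 0 for the rest, Sigma = I.  Requires M0 < M.  Illustrative sketch."""
    rng = np.random.default_rng(seed)

    # Limiting bootstrap draws: V ~ N(0, I_M); both bootstrap statistics
    # involve all M coordinates because the bootstrap sample is re-centered.
    V = rng.standard_normal((reps, M))
    T_r_star = V.max(axis=1)
    T_s_star = np.minimum(V.max(axis=1), (-V).max(axis=1))

    c_s = np.quantile(T_s_star, 1.0 - alpha * gamma)
    c_r = np.quantile(T_r_star * (T_s_star <= c_s), 1.0 - alpha * (1.0 - gamma))

    # Limiting test statistics: with M0 < M, both reduce to the maximum over
    # the M0 binding coordinates of Z ~ N(0, I).
    Z = rng.standard_normal((reps, M0))
    T_lim = Z.max(axis=1)
    return float(np.mean((T_lim > c_r) | (T_lim > c_s)))
```

For M = 2, M₀ = 1, γ = 0.5, and α = 0.05 this reproduces the roughly 11% rejection rate of the example; scanning α then locates the crossing point ᾱ for a given (M, M₀, γ).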
4. Recovering asymptotic size control
Theorem 1 shows that the hybrid test is not pointwise asymptotically of level α. In this section, we modify the hybrid test to be not only pointwise but also uniformly asymptotically of level α as defined in Equation (2). The latter concept, which is stronger than the former, implies that one can approximately control the finite sample size of the test given a sufficiently large sample.

To fix the size distortion of the hybrid test, we borrow an idea from the moment inequality literature. The size distortion essentially stems from the phenomenon that the bootstrap test statistics do not mimic the asymptotic behavior of the test statistics. Similar issues often arise in moment inequality testing problems, and as a result, many studies have proposed methods to circumvent this problem. However, as explained in Canay and Shaikh (2017), many of them hinge on the property that test statistics are monotone in d̂_n. Because T̂^s_n is not monotone in d̂_n, we cannot simply take one of the off-the-shelf methods and apply it to the hybrid test. Consequently, we alter the generalized moment selection method proposed by AS. Below we explain the procedure.

First, we normalize the test statistics so that their values are zero under the null hypothesis. Specifically, the two modified test statistics T̃^r_n and T̃^s_n are

T̃^r_n ≡ √n max_{m∈𝓜} ((d̂_{n,m}/σ̂_{n,m}) ∨ 0) ≡ S^r(D̂^{−1}_n √n d̂_n) and (8)
T̃^s_n ≡ √n min( max_{m∈𝓜} ((d̂_{n,m}/σ̂_{n,m}) ∨ 0), max_{m∈𝓜} ((−d̂_{n,m}/σ̂_{n,m}) ∨ 0) ) ≡ S^s(D̂^{−1}_n √n d̂_n), (9)

where D̂_n ≡ diag(σ̂_{n,1}, · · · , σ̂_{n,M}), S^q : R^M → R for q ∈ {r, s} are real-valued functions such that S^r(x) = max_{m∈𝓜}(x_m ∨ 0) and S^s(x) = min(max_{m∈𝓜}(x_m ∨ 0), max_{m∈𝓜}(−x_m ∨ 0)), and a ∨ b is the maximum of a and b.

Second, we define the moment selecting vector ψ̂_n = (ψ̂_{n,1}, · · · , ψ̂_{n,M})ᵗ as proposed by AS, where the m-th element is

ψ̂_{n,m} ≡ (√n/κ_n)(d̂_{n,m}/σ̂_{n,m}) · 1{(√n/κ_n)(d̂_{n,m}/σ̂_{n,m}) < −1} for m = 1, · · · , M.

Here κ_n is a non-stochastic sequence of non-negative numbers such that κ_n → ∞ and κ_n/√n → 0 as n diverges to infinity. κ_n is a tuning parameter that a researcher has to choose; AS recommend the choice κ_n = √(log n).

As in AS, we suggest two types of data-dependent critical values. The first type is simulation-based. The critical values c̃^q_n(1 − α) for q ∈ {r, s} are defined by

c̃^q_n(1 − α) ≡ inf{ x ∈ R : P{S^q(Ω̂^{1/2}_n Z + ψ̂_n) ≤ x} ≥ 1 − α }, (10)

where Ω̂_n ≡ D̂^{−1}_n Σ̂_n D̂^{−1}_n, Σ̂_n is the consistent variance-covariance estimator for Σ in Assumption 1, and Ω̂^{1/2}_n is a symmetric positive semi-definite matrix such that Ω̂^{1/2}_n Ω̂^{1/2}_n = Ω̂_n. P is the conditional probability given (Ω̂_n, ψ̂_n), and Z follows the M-dimensional standard normal distribution independently from the sample. c̃^q_n(1 − α) can be obtained by simulating {Z₁, · · · , Z_R} for some large R.

The second type is bootstrap-based. The critical values are defined as follows:

c̃^{q∗}_n(1 − α) ≡ inf{ x ∈ R : P^∗_n{T̃^{q∗}_{n,b} ≤ x} ≥ 1 − α } for q ∈ {r, s}, (11)

where

T̃^{q∗}_{n,b} ≡ S^q(D̂^{−1}_n √n d̃^∗_{n,b} + ψ̂_n) for q ∈ {r, s}, b = 1, · · · , B, (12)

and P^∗_n is the bootstrap conditional probability given the sample.

Definition 1.
With the test statistics T̃^r_n and T̃^s_n defined in Equations (8) and (9), the modified hybrid test is defined by

φ̃_n ≡ 1{ T̃^r_n > c^r_n(1 − α(1 − γ)) or T̃^s_n > c^s_n(1 − αγ) }

for any γ ∈ (0, 1], where (c^r_n, c^s_n) = (c̃^r_n, c̃^s_n) in Equation (10) or (c^r_n, c^s_n) = (c̃^{r∗}_n, c̃^{s∗}_n) in Equation (11).

The key modification is that the moment selecting vector ψ̂_n is added to the centred bootstrap samples d̃^∗_{n,b}. The bootstrap sample d̃^∗_{n,b} centred at zero causes the bootstrap test statistics to deviate from the asymptotic distribution of the actual test statistics when d is not zero. If d_m < 0 for some m ∈ 𝓜, then √n d̂_{n,m}/σ̂_{n,m} diverges to negative infinity while √n d̃^∗_{n,b,m} remains stochastically bounded. The moment selecting vector prevents this deviation by adding a quantity diverging to negative infinity to √n d̃^∗_{n,b,m}/σ̂_{n,m} when √n d̂_{n,m}/σ̂_{n,m} is small enough. As a result, the modified test attains asymptotic size control as follows.

Lemma 1.
Suppose Assumptions 1, 2, and 3 hold. Assume that Σ̂_n is a consistent estimator for Σ. Then for α ∈ (0, 0.5) and γ ∈ (0, 1] we have

lim sup_{n→∞} sup_{P∈𝒫₀} P{ T̃^r_n > c^r_n(1 − α(1 − γ)) or T̃^s_n > c^s_n(1 − αγ) } ≤ α

for (c^r_n, c^s_n) = (c̃^r_n, c̃^s_n) or (c^r_n, c^s_n) = (c̃^{r∗}_n, c̃^{s∗}_n), where 𝒫₀ is the set of all distributions satisfying the null hypothesis and generating the sample X^(n).

The proof of Lemma 1 can be found in APPENDIX B. Intuitively, the result follows from tailoring Lemma 2 and Theorem 1 of AS to our framework. While our test statistics violate Assumptions 1(a) and 3 in AS, our Assumption 1, which imposes a stronger restriction on E_{P_n}[d̂_n] than AS do, allows the uniformity result to hold.
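As an illustration of Definition 1, the following sketch implements the simulation-based version of the modified hybrid test with κ_n = √(log n). The function signature, the eigendecomposition square root of Ω̂_n, and the NumPy quantile convention are our own choices for exposition, not the paper's code.

```python
import numpy as np

def S_r(x):
    """S^r(x) = max_m (x_m v 0), applied along the last axis."""
    return np.maximum(x, 0.0).max(axis=-1)

def S_s(x):
    """S^s(x) = min( max_m (x_m v 0), max_m (-x_m v 0) )."""
    return np.minimum(np.maximum(x, 0.0).max(axis=-1),
                      np.maximum(-x, 0.0).max(axis=-1))

def modified_hybrid_test(d_hat, sigma_hat, Omega_hat, n, alpha, gamma,
                         R=20_000, seed=0):
    """Simulation-based modified hybrid test (sketch of Definition 1).
    Returns True if the test rejects H0: d <= 0."""
    rng = np.random.default_rng(seed)
    d_hat = np.asarray(d_hat, dtype=float)
    sigma_hat = np.asarray(sigma_hat, dtype=float)

    studentized = np.sqrt(n) * d_hat / sigma_hat
    T_r, T_s = S_r(studentized), S_s(studentized)

    # Moment selection with kappa_n = sqrt(log n), as recommended by AS.
    kappa = np.sqrt(np.log(n))
    xi = studentized / kappa
    psi = np.where(xi < -1.0, xi, 0.0)

    # Draw S^q(Omega^{1/2} Z + psi) for R standard normal vectors Z.
    w, Q = np.linalg.eigh(Omega_hat)
    root = Q @ np.diag(np.sqrt(np.maximum(w, 0.0))) @ Q.T
    draws = rng.standard_normal((R, d_hat.size)) @ root.T + psi

    c_r = np.quantile(S_r(draws), 1.0 - alpha * (1.0 - gamma))
    c_s = np.quantile(S_s(draws), 1.0 - alpha * gamma)
    return bool(T_r > c_r or T_s > c_s)
```

Note that `np.quantile` uses linear interpolation rather than the inf-definition in Equation (10); for large R the difference is negligible.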
5. Monte carlo simulation
While Theorem 1 establishes the possibility of size distortion of the hybrid test based on asymptotic arguments, it does not tell us how pronounced the distortion could be in a finite sample. In this section, we explore how significantly the asymptotic result manifests in a finite sample through Monte Carlo simulation. Furthermore, we study the finite sample performance of the modified hybrid test under the null hypothesis.

We use a simulation design similar to the ones considered in Song (2012) and Hansen (2005). As in Section 2, suppose we have a benchmark forecasting method and M distinct alternative forecasting methods. We observe n realized relative risks of the alternative forecasting methods to the benchmark method, d_t ∈ R^M for t = 1, · · · , n. We are interested in testing the null hypothesis that the relative risk is not greater than zero, H₀ : d ≡ E_P[d_t] ≤ 0, using the sample mean d̂_n = Σ_{t=1}^n d_t/n.

For the simulation, we draw realized relative risks independently from a normal distribution, i.e., d_t ∼ i.i.d. N(−μλ_{M₀}, V), where μ is a positive number and λ_{M₀} is an M-dimensional vector whose first M₀ elements are zeros and whose remaining M − M₀ elements are ones. M₀ refers to the number of alternative forecasting methods whose risks are the same as that of the benchmark, i.e., M₀ = |{d_m : d_m = 0, m = 1, · · · , M}|. The relative risk d = −μλ_{M₀} is non-positive and hence the design satisfies the null hypothesis. The i.i.d. observations imply that Assumption 1 and Assumption 3 are satisfied. The variance-covariance matrix V is designed to satisfy the third condition of Theorem 1: the off-diagonal elements of V are zeros, and the M diagonal elements are determined by a random draw from the uniform distribution over [1, 2] at the beginning of the simulation and are fixed during the simulation.

The sample size n is 200, and hence we draw M × n random numbers. The numbers of Monte Carlo repetitions and bootstrap samples are 5,000 and 500 respectively. The number of alternative forecasting methods is chosen from M ∈ {50, 100}. For the significance level α, we consider 0.01, 0.05, and 0.10. We use γ = 0.5 for the hybrid test and κ_n = √(log n) for the tuning parameter in the modified hybrid test, as recommended by AS.

Table 2 reports the simulated rejection probabilities. Hyb. indicates the hybrid test, while Boot. and Simu. refer to the modified test with the bootstrap-based and simulation-based critical values respectively.

Table 2 provides evidence of finite sample size distortion of the hybrid test. Many simulated rejection probabilities of the hybrid test exceed the significance level α when M₀ is strictly less than M. The size distortion is starkest when M₀ is slightly less than M, and the extent of distortion is not marginal. For example, the rejection probabilities of the hybrid test with M = 50 and M₀ = 45 are 0.208, 0.149, and 0.070, which are almost twice, three times, and seven times larger than the corresponding significance levels α = 0.10, 0.05, and 0.01. We have similar results in the case with M = 100 and M₀ = 95.

There is a noticeable pattern in the simulated probabilities. First, when all inequalities are binding, that is, when the second condition in Theorem 1 is not satisfied, the probabilities are close to the nominal level α. This is because, under this data generating process, the bootstrap distribution correctly approximates the distribution of the test statistics and hence the rejection probability converges exactly to the nominal level. Second, as M₀ decreases, the probabilities abruptly increase over the nominal level but then decline gradually. This is because both test statistics converge to max_{m∈𝓜₀} Z_m, which is stochastically decreasing as M₀ decreases, where {Z_m : m = 1, · · · , M} are independent standard normal random variables. Meanwhile, the limiting distributions of the bootstrap test statistics do not depend on M₀. This difference leads to diminishing rejection probabilities along M₀. Finally, the probabilities fall below α when the ratio M₀/M is small: less than 0.4 for α = 0.10, 0.3 for α = 0.05, and 0.2 for α = 0.01 in the case M = 50. This is consistent with our findings from Table 1 that ᾱ decreases as the ratio M₀/M diminishes.

Contrary to the hybrid test, the simulated rejection probabilities of the modified hybrid tests are less than the nominal level except in the two cases with M₀ = M = 50 and α = 0.01. The modified hybrid test appears to be conservative in that the simulated rejection probabilities are close to α/2 when M₀ is strictly less than M. This is because the two test statistics T̃^r_n and T̃^s_n converge in distribution to the same distribution, as T̂^r_n and T̂^s_n do in the previous example. Furthermore, the probabilities show that the two different critical values of the modified hybrid test yield similar results.
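The data generating process described above can be sketched as follows. The function interface and the choice μ = 1 are our own for illustration; in the paper's design the diagonal of V is drawn once at the beginning and held fixed across Monte Carlo repetitions, whereas this helper draws it per call for brevity.

```python
import numpy as np

def draw_relative_risks(n, M, M0, mu, rng):
    """One draw of the Section 5 design (sketch): n i.i.d. vectors
    d_t ~ N(-mu * lambda_M0, V), with V diagonal and Unif[1, 2] entries.
    The first M0 coordinates have mean zero (binding moments)."""
    lam = np.zeros(M)
    lam[M0:] = 1.0                      # last M - M0 risk differences are -mu
    v = rng.uniform(1.0, 2.0, size=M)   # diagonal of V (held fixed in the paper)
    return -mu * lam + np.sqrt(v) * rng.standard_normal((n, M))

rng = np.random.default_rng(0)
d = draw_relative_risks(n=200, M=50, M0=45, mu=1.0, rng=rng)
d_bar = d.mean(axis=0)                  # sample mean, the analogue of \hat d_n
sigma_hat = d.std(axis=0, ddof=1)       # studentization for the test statistics
```

Feeding `d_bar` and `sigma_hat` into the test procedures of Sections 2 and 4 then yields one Monte Carlo repetition.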
6. Conclusion
This article shows that the hybrid test proposed by Song (2012) is not pointwise asymptotically of level $\alpha$. We identify the cause of the size distortion: the least favorable principle underlying the approach of White (2000) no longer holds for the hybrid test. As a result, the bootstrap procedure, which centers the bootstrap sample at zero, approximates a distribution that is irrelevant to the test statistics. We modify the hybrid test by adopting the generalized moment selection method of Andrews and Soares (2010) and show that the modified hybrid test is uniformly asymptotically of level $\alpha$. We expect this article to shed light on the asymptotic properties of the hybrid test and to help practitioners conduct valid inference.

Table 2: Simulated Rejection Probabilities

               alpha = 0.10          alpha = 0.05          alpha = 0.01
  M   M1    Hyb.  Boot. Simu.     Hyb.  Boot. Simu.     Hyb.  Boot. Simu.
 50   50   0.106 0.083 0.083    0.055 0.046 0.043    0.016 0.011 0.011
 50   45   0.208 0.056 0.057    0.149 0.029 0.028    0.070 0.006 0.007
 50   40   0.192 0.052 0.053    0.139 0.029 0.027    0.060 0.005 0.005
 50   35   0.164 0.052 0.051    0.113 0.024 0.026    0.050 0.005 0.005
 50   30   0.139 0.053 0.053    0.095 0.028 0.028    0.047 0.007 0.007
 50   25   0.115 0.052 0.052    0.086 0.030 0.028    0.038 0.007 0.006
 50   20   0.102 0.062 0.060    0.074 0.033 0.033    0.036 0.008 0.008
 50   15   0.075 0.052 0.052    0.051 0.028 0.026    0.024 0.007 0.006
 50   10   0.053 0.055 0.055    0.036 0.027 0.026    0.016 0.008 0.007
 50    5   0.025 0.054 0.054    0.016 0.026 0.025    0.006 0.005 0.004
100  100   0.103 0.081 0.083    0.053 0.042 0.041    0.011 0.009 0.008
100   95   0.219 0.062 0.061    0.157 0.033 0.032    0.073 0.010 0.008
100   90   0.225 0.062 0.061    0.162 0.030 0.031    0.073 0.007 0.006
100   85   0.208 0.065 0.064    0.147 0.035 0.032    0.068 0.008 0.008
100   80   0.190 0.060 0.060    0.137 0.032 0.032    0.059 0.009 0.009
100   75   0.185 0.060 0.061    0.133 0.030 0.029    0.057 0.007 0.008
100   70   0.172 0.057 0.058    0.116 0.029 0.029    0.052 0.008 0.008
100   65   0.155 0.0572 0.057   0.111 0.028 0.027    0.048 0.007 0.007
100   60   0.147 0.057 0.058    0.102 0.030 0.030    0.044 0.007 0.006
100   55   0.136 0.053 0.053    0.090 0.029 0.029    0.038 0.008 0.007

NOTE: Hyb., Boot., and Simu. refer to the hybrid test and the modified hybrid test with bootstrap-based and simulation-based critical values, respectively.
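The proof of Theorem 1 in Appendix B.1 notes that $\bar\varepsilon$, and hence $\bar\alpha$, has no closed form but can be approximated numerically for fixed $M$, $M_1$, and $\gamma$. A minimal sketch of that search, using our reconstructed reading of $a_\gamma$ and $h_\gamma$ from the proof (the branch forms of $a_\gamma$ derived from Equation (25) and the illustrative values of $M$, $M_1$, and $\gamma$ below are assumptions):

```python
import numpy as np

def a_gamma(x, M, gamma):
    """alpha as a function of k = Phi(c^s); reconstructed from Eq. (25)."""
    x = np.asarray(x, dtype=float)
    upper = (1 - 2 * x**M + (2 * x - 1)**M) / gamma   # branch for c^s >= 0
    lower = (1 - 2 * x**M) / gamma                    # branch for c^s < 0
    return np.where(x >= 0.5, upper, lower)

def h_gamma(x, M, M1, gamma):
    # The proof needs h(x) = 1 - x^{M1} - a_gamma(x) > 0 on (1 - eps_bar, 1)
    return 1 - np.asarray(x, dtype=float)**M1 - a_gamma(x, M, gamma)

M, M1, gamma = 10, 5, 0.5                 # illustrative values
xs = np.linspace(1.0, 0.5, 100_001)[1:]   # scan leftwards from x = 1
x_bar = xs[np.argmin(h_gamma(xs, M, M1, gamma) > 0)]  # first x with h <= 0
alpha_bar = min(float(a_gamma(x_bar, M, gamma)), 1 - 2.0**(-M), 1 / (2 * gamma))
print(x_bar, alpha_bar)
```

Under the conditions of Theorem 1, the hybrid test then over-rejects for every $\alpha\in(0,\bar\alpha)$.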
Appendix A: Values of $\bar\alpha$ under various $\gamma$

Table A.1: The values of $\bar\alpha$ in Theorem 1 with $M = 10, 20, \ldots$, $M_1 = kM$ for $k = 0.1, 0.2, \ldots, 1$, and a given $\gamma$.

Appendix B: Proofs
B.1 Proof of Theorem 1
This proof consists of four steps. In the first step, we obtain the asymptotic distributions of the two test statistics. In the second and third steps, we derive the probability limits of $\hat c^{s*}_n$ and $\hat c^{r*}_n$, respectively. In the last step, we show the existence of $\bar\alpha$ which satisfies the conclusion.

Table A.2: The values of $\bar\alpha$ in Theorem 1 with $M = 10, 20, \ldots$, $M_1 = kM$ for $k = 0.1, 0.2, \ldots, 1$, and a given $\gamma$.

Step 1:
Let $P$ be a distribution which satisfies all three conditions in the theorem. Then $P$ conforms to the null hypothesis. Let $\mathcal M_1$ denote the set of indices with zero mean, i.e., $\mathcal M_1 = \{m\in\mathcal M : d_m = 0\}$. By the first two conditions, both sets $\mathcal M\setminus\mathcal M_1$ and $\mathcal M_1$ are non-empty. Define a diagonal matrix $\hat D_n$ by $\hat D_n \equiv \mathrm{diag}(\hat\sigma^2_{n,1},\ldots,\hat\sigma^2_{n,M})$. The third condition, together with Assumption 1 and Assumption 3, implies that
$$\sqrt n\,\hat D_n^{-1/2}(\hat d_n - d) \xrightarrow{d} Z \equiv (Z_1,\ldots,Z_M)' \qquad (13)$$
as $n$ diverges to infinity, where $Z$ is a random vector from the $M$-dimensional standard normal distribution.

Define a function $f:\mathbb R^M\to\mathbb R^2$ by
$$f(x) = \Big(\max_{m\in\mathcal M} x_m,\ \min\big(\max_{m\in\mathcal M} x_m,\ \max_{m\in\mathcal M}(-x_m)\big)\Big)^t \qquad (14)$$
where $x = (x_1,\ldots,x_M)^t$. Clearly, both mappings $x\mapsto\max_{m\in\mathcal M}x_m$ and $x\mapsto\min(\max_{m\in\mathcal M}x_m,\max_{m\in\mathcal M}(-x_m))$ are continuous, which implies that $f$ is also a continuous mapping. The vector of test statistics can be written as $(\hat T^r_n,\hat T^s_n)^t = f(\sqrt n\,\hat D_n^{-1/2}\hat d_n)$.

We want to obtain the limiting distribution of the vector of test statistics. More precisely, we want to show $(\hat T^r_n,\hat T^s_n)^t \xrightarrow{d} (\max_{m\in\mathcal M_1}Z_m,\ \max_{m\in\mathcal M_1}Z_m)^t$. By the definition of weak convergence, this means that
$$\lim_{n\to\infty}\Big|E\big[g\big(f(\sqrt n\,\hat D_n^{-1/2}\hat d_n)\big)\big] - E\big[g\big(\max_{m\in\mathcal M_1}Z_m,\ \max_{m\in\mathcal M_1}Z_m\big)\big]\Big| = 0 \qquad (15)$$
for any bounded continuous function $g:\mathbb R^2\to\mathbb R$.

To this end, we define two events $E_1$ and $E_2$ by
$$E_1 \equiv \Big\{\min_{m\in\mathcal M_1}\frac{\hat d_{n,m}}{\hat\sigma_{n,m}} < \max_{m\in\mathcal M\setminus\mathcal M_1}\frac{\hat d_{n,m}}{\hat\sigma_{n,m}}\Big\} \quad\text{and}\quad E_2 \equiv \Big\{\max_{m\in\mathcal M_1}\frac{\hat d_{n,m}}{\hat\sigma_{n,m}} > \max_{m\in\mathcal M\setminus\mathcal M_1}\Big(-\frac{\hat d_{n,m}}{\hat\sigma_{n,m}}\Big)\Big\}.$$
Then, by rearranging the terms and subtracting $\max_{m\in\mathcal M\setminus\mathcal M_1} d_m/\sigma_m$ on both sides in the event $E_1$, we have
$$\lim_{n\to\infty}P\{E_1\} = \lim_{n\to\infty}P\Big\{-\max_{m\in\mathcal M\setminus\mathcal M_1}\frac{d_m}{\sigma_m} < -\min_{m\in\mathcal M_1}\frac{\hat d_{n,m}}{\hat\sigma_{n,m}} + \max_{m\in\mathcal M\setminus\mathcal M_1}\frac{\hat d_{n,m}}{\hat\sigma_{n,m}} - \max_{m\in\mathcal M\setminus\mathcal M_1}\frac{d_m}{\sigma_m}\Big\} = 0.$$
The last equality holds from the facts that
$$\min_{m\in\mathcal M_1}\frac{\hat d_{n,m}}{\hat\sigma_{n,m}} \xrightarrow{p} 0 \quad\text{and}\quad \max_{m\in\mathcal M\setminus\mathcal M_1}\frac{\hat d_{n,m}}{\hat\sigma_{n,m}} - \max_{m\in\mathcal M\setminus\mathcal M_1}\frac{d_m}{\sigma_m} \xrightarrow{p} 0$$
as $n$ diverges to infinity, which are implied by Equation (13) and the continuous mapping theorem. Similarly, we can show that $\lim_{n\to\infty}P\{E_2\} = 0$. Let $1_{E_j}$ denote the indicator function which takes value one if the event $E_j$ occurs, for $j\in\{1,2\}$, and zero otherwise. Then Equation (15) holds by the following argument. For any bounded continuous function $g:\mathbb R^2\to\mathbb R$, write
$$W_n \equiv g\big(f(\sqrt n\,\hat D_n^{-1/2}\hat d_n)\big) - g\big(\max_{m\in\mathcal M_1}Z_m,\ \max_{m\in\mathcal M_1}Z_m\big)$$
and
$$V_n \equiv g\Big(\max_{m\in\mathcal M_1}\sqrt n\,\hat d_{n,m}/\hat\sigma_{n,m},\ \max_{m\in\mathcal M_1}\sqrt n\,\hat d_{n,m}/\hat\sigma_{n,m}\Big) - g\big(\max_{m\in\mathcal M_1}Z_m,\ \max_{m\in\mathcal M_1}Z_m\big).$$
Then we have
$$\begin{aligned}
\lim_{n\to\infty}|E[W_n]| &\le \lim_{n\to\infty}\big|E\big[W_n\{1_{E_1}+(1-1_{E_1})1_{E_2}\}\big]\big| + \lim_{n\to\infty}\big|E\big[W_n\{1-1_{E_1}\}\{1-1_{E_2}\}\big]\big|\\
&\le \lim_{n\to\infty}2\sup_{x\in\mathbb R^2}|g(x)|\,\big\{P\{E_1\}+P\{E_2\}\big\} + \lim_{n\to\infty}\big|E\big[W_n\{1-1_{E_1}\}\{1-1_{E_2}\}\big]\big|\\
&= \lim_{n\to\infty}\big|E\big[V_n\{1-1_{E_1}\}\{1-1_{E_2}\}\big]\big| \qquad (16)\\
&\le \lim_{n\to\infty}|E[V_n]| + \lim_{n\to\infty}\big|E\big[V_n\{1_{E_1}+1_{E_2}-1_{E_1}1_{E_2}\}\big]\big|\\
&\le \lim_{n\to\infty}|E[V_n]| + \lim_{n\to\infty}2\sup_{x\in\mathbb R^2}|g(x)|\,\big\{P\{E_1\}+P\{E_2\}\big\} = 0.
\end{aligned}$$
The first inequality holds by the triangle inequality. The second inequality uses the fact that $g$ is bounded and that $(1-1_{E_1})1_{E_2}\le 1_{E_2}$. The equality in (16) holds because the probabilities of the two events $E_1$ and $E_2$ converge to zero, and because
$$f(\sqrt n\,\hat D_n^{-1/2}\hat d_n) = \Big(\max_{m\in\mathcal M_1}\sqrt n\,\hat d_{n,m}/\hat\sigma_{n,m},\ \max_{m\in\mathcal M_1}\sqrt n\,\hat d_{n,m}/\hat\sigma_{n,m}\Big)^t$$
conditional on the event $E_1^c\cap E_2^c$. The third inequality holds by the triangle inequality again. In the last line, the first term vanishes by Equation (13), the continuous mapping theorem, and the definition of weak convergence, and the second term, obtained by bounding $g$ with its supremum, converges to zero because the probabilities of the two events converge to zero. Therefore, the vector of test statistics $(\hat T^r_n,\hat T^s_n)$ has the desired limiting distribution.

Step 2:
We obtain the probability limits of the critical values $\hat c^{r*}_n$ and $\hat c^{s*}_n$. Assumption 3 states that
$$\sup_{x\in\mathbb R^M}\Big|P^*\big\{\sqrt n(\hat d^*_{n,b}-\hat d_n)\le x\big\} - P\big\{\sqrt n(\hat d_n-d)\le x\big\}\Big| \xrightarrow{p} 0$$
as $n$ diverges to infinity, where $P^*$ denotes the bootstrap probability measure. Define a random vector $(Z^r,Z^s)^t\equiv f(Z)$, where $f$ is defined in Equation (14). Then the continuous mapping theorem implies that
$$\sup_{x,y\in\mathbb R}\Big|P^*\big\{\hat T^{r*}_{n,b}\le x,\ \hat T^{s*}_{n,b}\le y\big\} - P\big\{Z^r\le x,\ Z^s\le y\big\}\Big|\xrightarrow{p}0 \qquad (17)$$
as $n$ diverges to infinity. Since the mapping $(x,y)\mapsto y$ that selects a coordinate is continuous, we have $\sup_{y\in\mathbb R}\big|P^*\{\hat T^{s*}_{n,b}\le y\}-P\{Z^s\le y\}\big|\xrightarrow{p}0$. Consider the distribution function of $Z^s$; it can be shown to be continuous and strictly increasing. Given this, Lemma 11.2.1 of Lehmann and Romano (2006) gives us that
$$\hat c^{s*}_n\xrightarrow{p}c^s\equiv\inf\big\{y\in\mathbb R : P\{Z^s\le y\}\ge 1-\alpha\gamma\big\} \qquad (18)$$
for any $\alpha\in(0,1)$ and $\gamma\in(0,1)$.

Step 3:
To obtain the probability limit of the second critical value $\hat c^{r*}_n$, we start by showing that
$$\sup_{x\in\mathbb R}|G_n(x)| \equiv \sup_{x\in\mathbb R}\Big|P^*\big\{\hat T^{r*}_{n,b}1\{\hat T^{s*}_{n,b}\le\hat c^{s*}_n\}\le x\big\} - P\big\{Z^r1\{Z^s\le c^s\}\le x\big\}\Big|\xrightarrow{p}0. \qquad (19)$$
The term on the left-hand side can be bounded as follows:
$$\begin{aligned}
\sup_{x\in\mathbb R}|G_n(x)| &\le \sup_{x<0}|G_n(x)| + \sup_{x\ge0}|G_n(x)|\\
&= \sup_{x<0}\Big|P^*\{\hat T^{r*}_{n,b}\le x,\ \hat T^{s*}_{n,b}\le\hat c^{s*}_n\} - P\{Z^r\le x,\ Z^s\le c^s\}\Big| + \sup_{x\ge0}|G_n(x)|\\
&= \sup_{x<0}\Big|P^*\{\hat T^{r*}_{n,b}\le x,\ \hat T^{s*}_{n,b}\le\hat c^{s*}_n\} - P\{Z^r\le x,\ Z^s\le c^s\}\Big|\\
&\quad+\sup_{x\ge0}\Big|P^*\{\hat T^{r*}_{n,b}\le x,\ \hat T^{s*}_{n,b}\le\hat c^{s*}_n\} + P^*\{\hat T^{s*}_{n,b}>\hat c^{s*}_n\} - P\{Z^r\le x,\ Z^s\le c^s\} - P\{Z^s>c^s\}\Big| \qquad (20)\\
&\le 2\sup_{x\in\mathbb R}\Big|P^*\{\hat T^{r*}_{n,b}\le x,\ \hat T^{s*}_{n,b}\le\hat c^{s*}_n\} - P\{Z^r\le x,\ Z^s\le c^s\}\Big| + \Big|P^*\{\hat T^{s*}_{n,b}>\hat c^{s*}_n\} - P\{Z^s>c^s\}\Big|\\
&\le 2\sup_{x\in\mathbb R}\Big|P^*\{\hat T^{r*}_{n,b}\le x,\ \hat T^{s*}_{n,b}\le\hat c^{s*}_n\} - P\{Z^r\le x,\ Z^s\le\hat c^{s*}_n\}\Big|\\
&\quad+2\sup_{x\in\mathbb R}\Big|P\{Z^r\le x,\ Z^s\le\hat c^{s*}_n\} - P\{Z^r\le x,\ Z^s\le c^s\}\Big| + \Big|P^*\{\hat T^{s*}_{n,b}>\hat c^{s*}_n\} - P\{Z^s>c^s\}\Big|.
\end{aligned}$$
The first inequality holds by the triangle inequality. The first equality follows from the fact that, for $\hat T^{r*}_{n,b}1\{\hat T^{s*}_{n,b}\le\hat c^{s*}_n\}$ to take a negative value, the indicator function must equal one. Similarly, we get the second equality by decomposing $G_n(x)$ into the two cases where the indicator function is zero or one. For the remaining inequalities, we use that the supremum is a non-decreasing set operator together with the triangle inequality.

Now we show that all three terms in the last line converge to zero in probability. The convergence of the first term comes from
$$\sup_{x\in\mathbb R}\Big|P^*\{\hat T^{r*}_{n,b}\le x,\ \hat T^{s*}_{n,b}\le\hat c^{s*}_n\} - P\{Z^r\le x,\ Z^s\le\hat c^{s*}_n\}\Big| \le \sup_{x,y\in\mathbb R}\Big|P^*\{\hat T^{r*}_{n,b}\le x,\ \hat T^{s*}_{n,b}\le y\} - P\{Z^r\le x,\ Z^s\le y\}\Big|$$
and Equation (17). To show the convergence of the second term, define the joint distribution function of $(Z^r,Z^s)$ and the marginal distribution function of $Z^s$ by $F_{rs}(x,y)\equiv P\{Z^r\le x,\ Z^s\le y\}$ and $F_s(y)\equiv P\{Z^s\le y\}$. The continuity of $F_{rs}$ and the convergence of $\hat c^{s*}_n$ imply the pointwise convergence $F_{rs}(x,\hat c^{s*}_n) - F_{rs}(x,c^s)\xrightarrow{p}0$ for any $x\in\mathbb R$. The same logic gives the pointwise convergence of the conditional distribution function,
$$\frac{F_{rs}(x,\hat c^{s*}_n)}{F_s(\hat c^{s*}_n)} - \frac{F_{rs}(x,c^s)}{F_s(c^s)}\xrightarrow{p}0$$
for any $x\in\mathbb R$. Now we can extend this pointwise convergence into uniform convergence over the real line by applying Theorem 11.2.9 of Lehmann and Romano (2006) to the two conditional distributions, as the conditional distribution function $F_{rs}(x,c^s)/F_s(c^s)$ is continuous. Once we have the uniform convergence of the conditional distributions, we have
$$\begin{aligned}
\sup_{x\in\mathbb R}\big|P\{Z^r\le x,\ Z^s\le\hat c^{s*}_n\} - P\{Z^r\le x,\ Z^s\le c^s\}\big| &= \sup_{x\in\mathbb R}\big|F_{rs}(x,\hat c^{s*}_n) - F_{rs}(x,c^s)\big|\\
&\le \sup_{x\in\mathbb R}\Big|\frac{F_{rs}(x,\hat c^{s*}_n)}{F_s(\hat c^{s*}_n)} - \frac{F_{rs}(x,c^s)}{F_s(c^s)}\Big|\cdot F_s(\hat c^{s*}_n) + \sup_{x\in\mathbb R}\Big|\frac{F_{rs}(x,c^s)}{F_s(c^s)}\Big|\cdot\big|F_s(\hat c^{s*}_n)-F_s(c^s)\big| \xrightarrow{p} 0.
\end{aligned}$$
The convergence of the third term is straightforward.

Given this result, let us obtain the probability limit of the critical value $\hat c^{r*}_n$. We cannot directly apply Lemma 11.2.1 of Lehmann and Romano (2006) as in Step 2 because the distribution of $Z^r1\{Z^s\le c^s\}$ is discontinuous at zero. Let $\alpha\in(0,\,1-2^{-M})$. Then we have
$$\begin{aligned}
P\{Z^r1\{Z^s\le c^s\}\le 0\} &= P\{Z^r\le 0,\ Z^s\le c^s\} + P\{Z^s>c^s\}\\
&= P\{Z^r\le 0,\ Z^s\le c^s\} + \alpha\gamma\\
&\le \min\big(P\{Z^r\le 0\},\ P\{Z^s\le c^s\}\big) + \alpha\gamma \qquad (21)\\
&= \min\big(2^{-M},\ 1-\alpha\gamma\big) + \alpha\gamma\\
&< 1-\alpha+\alpha\gamma = 1-\alpha(1-\gamma)
\end{aligned}$$
if $\alpha<1-2^{-M}$. The second equality holds by the definition of $c^s$. The third equality holds because $P\{Z^r\le0\} = P\{Z_m\le0\ \text{for all } m\in\mathcal M\} = \Phi^M(0) = 2^{-M}$, where $\Phi(\cdot)$ is the cumulative distribution function of the standard normal distribution, and again by the definition of $c^s$. This result guarantees that the $1-\alpha(1-\gamma)$ quantile of $Z^r1\{Z^s\le c^s\}$ is strictly positive given that $\alpha<1-2^{-M}$. As the distribution function $P\{Z^r1\{Z^s\le c^s\}\le x\}$ is continuous and strictly increasing over the interval $[0,\infty)$, we have
$$\hat c^{r*}_n\xrightarrow{p}c^r\equiv\inf\big\{x\in\mathbb R : P\{Z^r1\{Z^s\le c^s\}\le x\}\ge 1-\alpha(1-\gamma)\big\} \qquad (22)$$
by Lemma 11.2.1 of Lehmann and Romano (2006).

Step 4:
We show that there exists $\bar\alpha$ which makes the probability of rejecting the null hypothesis strictly greater than $\alpha$ for all $\alpha\in(0,\bar\alpha)$.

First, we compute a lower bound for the limiting rejection probability. The test function $\phi_n$ is defined by $\phi_n \equiv 1\{\hat T^s_n>\hat c^{s*}_n\} + 1\{\hat T^s_n\le\hat c^{s*}_n\}1\{\hat T^r_n>\hat c^{r*}_n\}$. Given the distribution $P$, the limiting rejection probability is
$$\lim_{n\to\infty}E_P[\phi_n] = \lim_{n\to\infty}P\{\hat T^s_n>\hat c^{s*}_n \text{ or } \hat T^r_n>\hat c^{r*}_n\} = P\Big\{\max_{m\in\mathcal M_1}Z_m > \min(c^s,c^r)\Big\}. \qquad (23)$$
This holds by the weak convergence result in Equation (15), by the convergence of the critical values in Equations (18) and (22), and by the Slutsky theorem. Define
$$k \equiv k(\alpha) \equiv \Phi(c^s) = 1-\Phi(-c^s). \qquad (24)$$
Note that $k$ is a function of $\alpha$ as $c^s$ depends on $\alpha$. The limiting rejection probability in Equation (23) is bounded from below by $1-k^{M_1}$ because it holds that
$$P\Big\{\max_{m\in\mathcal M_1}Z_m > \min(c^s,c^r)\Big\} \ge P\Big\{\max_{m\in\mathcal M_1}Z_m > c^s\Big\} = 1-k^{M_1}$$
where $M_1 = |\mathcal M_1|\ge1$. Therefore, in order to attain the conclusion, it is sufficient to find $\alpha$ satisfying $1-k^{M_1}>\alpha$.

Now let us consider the relationship between $k$ defined in Equation (24) and $\alpha$. The definition of $c^s$ provides the connection between the two. By the definition of $c^s$, we have
$$\begin{aligned}
\alpha\gamma &= P(Z^s>c^s)\\
&= P\Big(\min\big(\max_{m\in\mathcal M}Z_m,\ -\min_{m\in\mathcal M}Z_m\big)>c^s\Big)\\
&= P\Big(\max_{m\in\mathcal M}Z_m>c^s \text{ and } -\min_{m\in\mathcal M}Z_m>c^s\Big)\\
&= 1-P\Big(\max_{m\in\mathcal M}Z_m\le c^s \text{ or } \min_{m\in\mathcal M}Z_m\ge-c^s\Big)\\
&= 1-\Big\{P\big(\max_{m\in\mathcal M}Z_m\le c^s\big) + P\big(\min_{m\in\mathcal M}Z_m\ge-c^s\big) - P\big(\forall m\in\mathcal M,\ -c^s\le Z_m\le c^s\big)\Big\} \qquad (25)\\
&= \begin{cases} 1-2k^M+(2k-1)^M & \text{if } c^s\ge0\\ 1-2k^M & \text{if } c^s<0.\end{cases}
\end{aligned}$$
The first case holds because Equation (25) equals $1-\{\Phi^M(c^s)+(1-\Phi(-c^s))^M-(\Phi(c^s)-\Phi(-c^s))^M\}$. The second case holds because $P\{-c^s\le Z_m\le c^s\}=0$ for any $m\in\mathcal M$ if $c^s<0$. Recall that the tuning parameter $\gamma\in(0,1]$ is fixed. Following Equation (25), define a function $a_\gamma:[0,1]\to[0,\gamma^{-1}]$ by
$$a_\gamma(x) \equiv \begin{cases}\gamma^{-1}\big(1-2x^M+(2x-1)^M\big) & \text{if } x\in[0.5,1]\\ \gamma^{-1}\big(1-2x^M\big) & \text{if } x\in[0,0.5).\end{cases}$$
It is easy to check that $a_\gamma$ is continuous on $[0,1]$ and $a'_\gamma(x)<0$ for $x\in(0,1)$, so $a_\gamma$ is bijective. In other words, for $k\in[0,1]$ there is a one-to-one relation between $k$ and $\alpha$, and $a_\gamma$ is the inverse function of $k(\alpha)$.

Given this finding, let us obtain the set of values for $\alpha$ satisfying $1-k^{M_1}>\alpha$. Specifically, we find the values of $x\in[0.5,1]$ satisfying the following condition:
$$h_\gamma(x) \equiv 1-x^{M_1}-a_\gamma(x) > 0,$$
where $h_\gamma$ is a real-valued function defined on $[0,1]$ with $h_\gamma(1)=0$ and $\lim_{x\to1^-}h'_\gamma(x)<0$. Hence there exists $\bar\varepsilon\in(0,0.5)$ satisfying $h_\gamma(x)>0$ for any $x\in(1-\bar\varepsilon,1)$. There is no closed-form expression for $\bar\varepsilon$ because it is the solution to an $M$th-degree polynomial equation; however, for fixed $M$ and $M_1$, the value of $\bar\varepsilon$ can be numerically approximated, and so can $\bar\alpha$. Therefore, any value of $\alpha$ in the interval $(0,a_\gamma(1-\bar\varepsilon))$ satisfies $1-k^{M_1}>\alpha$. Recall that Equation (21) in Step 3 requires $\alpha$ to be less than $1-2^{-M}$. As a result, we obtain the desired result by setting $\bar\alpha = \min\big(a_\gamma(1-\bar\varepsilon),\ 1-2^{-M},\ (2\gamma)^{-1}\big)$.

B.2 Proof of Lemma 1

In this section, we prove Lemma 1 in the article by modifying the proofs of Theorem 1 and Lemma 2 in Andrews and Soares (2010). Their results do not directly apply to our setting. Specifically, our test statistics violate Assumption 1(a) and Assumption 3 in Andrews and Soares (2010). Assumption 1(a) requires that the test statistic be monotone in $\hat d_n$, but our statistic $\tilde T^s_n$ is not monotone in $\hat d_n$, as the function $f(x)=\min\big(\max_m(x_m\vee0),\ \max_m(-x_m\vee0)\big)$ is non-increasing in $x$ for $x\le0$ and non-decreasing for $x\ge0$. Assumption 3 requires $f(x)$ to be strictly positive if and only if $x_m>0$ for some $m\in\mathcal M$. It is easy to check that $f(x)=0$ if $M=2$, $x_1>0$, and $x_2>0$. To get around the problem, we use a special feature of the SPA testing problem, namely that $\hat d_n$ converges in probability to a non-stochastic $d$. Andrews and Soares (2010) posit a more general setting than ours in that they assume $\sqrt n(\hat d_n-E_{P_n}[\hat d_n])$ is asymptotically normal. Song (2012) indirectly imposes a constraint on the behaviour of $E_{P_n}[\hat d_n]$ through Assumption 1. While the monotonicity assumption is used to control the asymptotic behaviour of $E_{P_n}[\hat d_n]$, the assumption is not necessary in our setting. Besides, Assumption 3 in Andrews and Soares (2010) is not crucial and can easily be modified. As a result, our proof can be interpreted as a simplified version of Theorem 1 and Lemma 2 in Andrews and Soares (2010).

As in Andrews and Soares (2010), we give the proof only for the case where the simulation-based critical values $\tilde c^q_n(1-\alpha)$ for $q\in\{r,s\}$ are used. The other case, with the bootstrap critical values $\tilde c^{q*}_n(1-\alpha)$ for $q\in\{r,s\}$, can be shown in a similar manner.

Our proof consists of three steps. Step 1 corresponds to Theorem 1 of Andrews and Soares (2010). Step 2 is the version of their Lemma 2(a) without their Assumption 1(a). Step 3 follows their proof of Lemma 2(b) but does not use their Assumption 3.

Before we begin, we define some additional notation. Define a function $\psi:\mathbb R^M\to[-\infty,0]^M$ such that $\psi(\xi)=(\psi_1(\xi),\ldots,\psi_M(\xi))^t$ and
$$\psi_m(\xi) = \begin{cases}\xi_m & \text{if } \xi_m<-1\\ 0 & \text{if } \xi_m\ge-1\end{cases} \qquad m=1,\ldots,M,$$
where $\xi_m$ is the $m$th element of $\xi\in\mathbb R^M$. Given this notation, the moment selecting vector $\hat\psi_n$ can be written as $\psi(\hat\xi_n)$, where $\hat\xi_n\equiv\kappa_n^{-1}\sqrt n\,\hat D_n^{-1/2}\hat d_n$.

Step 1: We obtain a sequence of distributions in $\mathcal P_{n,0}$ along which the limiting rejection probability equals the asymptotic size of the test. For any $\alpha\in(0,1)$ and $\gamma\in(0,1)$, define
$$\mathrm{AsySize} \equiv \limsup_{n\to\infty}\sup_{P\in\mathcal P_{n,0}}P\big\{\tilde T^r_n>\tilde c^r_n(1-\alpha(1-\gamma)) \text{ or } \tilde T^s_n>\tilde c^s_n(1-\alpha\gamma)\big\}.$$
Then we can find a sequence $\{P_n\in\mathcal P_{n,0} : n\ge1\}$ such that
$$\mathrm{AsySize} = \limsup_{n\to\infty}P_n\big\{\tilde T^r_n>\tilde c^r_n(1-\alpha(1-\gamma)) \text{ or } \tilde T^s_n>\tilde c^s_n(1-\alpha\gamma)\big\}.$$
By definition, we can find a subsequence $\{u_n : n\ge1\}$ of $\{n\}$ such that
$$\mathrm{AsySize} = \lim_{n\to\infty}P_{u_n}\big\{\tilde T^r_{u_n}>\tilde c^r_{u_n}(1-\alpha(1-\gamma)) \text{ or } \tilde T^s_{u_n}>\tilde c^s_{u_n}(1-\alpha\gamma)\big\}.$$
We proceed without specifying $q$ unless necessary, as the statements are valid for both choices $q\in\{r,s\}$.

Step 2:
We find the probability limit of the critical value $\tilde c^q_n$.

We start by finding the probability limit of $\hat\xi_n$. Let $d$ be the probability limit of $\hat d_n$ in Assumption 1 along the sequence $\{P_n\}$. Define a vector $d^*=(d^*_1,\ldots,d^*_M)^t$ such that
$$d^*_m = \begin{cases}-\infty & \text{if } d_m<0\\ 0 & \text{if } d_m=0\end{cases} \qquad m=1,\ldots,M.$$
Define a distribution function $L^q(x)\equiv P\{S^q(\Omega_0^{1/2}Z+\psi(d^*))\le x\}$ for $x\in\mathbb R$, and let $c^q_{d^*}(1-\alpha)$ be the $1-\alpha$ quantile of $L^q$. Then it holds that $\hat\xi_n\xrightarrow{p}d^*$ because
$$\hat\xi_n \equiv \kappa_n^{-1}\sqrt n\,\hat D_n^{-1/2}\hat d_n = \hat D_n^{-1/2}D^{1/2}\big(\kappa_n^{-1}\sqrt n\,D^{-1/2}(\hat d_n-d) + \kappa_n^{-1}\sqrt n\,D^{-1/2}d\big) = (I_M+o_p(1))\cdot\big(O_p(\kappa_n^{-1})+d^*+o(1)\big) \xrightarrow{p} d^*$$
along $\{P_n\}$ as $n\to\infty$, by Assumptions 1 and 2, where $D$ is the probability limit of $\hat D_n$.

In Steps 2 and 3, we assume that $c^q_{d^*}(1-\alpha)>0$; this implies that $d^*\ne(-\infty)^M$. Next, we show that $P\{S^q(\Omega^{1/2}Z+\psi(\xi))\le x\}$ for $x>0$ is continuous in $(\xi,\Omega)$ at $(d^*,\Omega_0)$, where $\Omega_0$ is the probability limit of $\hat\Omega_n=\hat D_n^{-1/2}\hat\Sigma_n\hat D_n^{-1/2}$.

We start by showing that $\psi(\xi)\to\psi(d^*)$ for any $\xi\to d^*$. Suppose that $d^*_m=0$ for some $m\in\mathcal M$. Then $\psi_m(\xi)\to\psi_m(d^*)$ because $\psi_m$ is continuous at zero. If $d^*_m=-\infty$, then $\psi_m(d^*)=-\infty$, and it is straightforward that $\psi_m(\xi)\to-\infty$.

For any $(\xi,\Omega)\to(d^*,\Omega_0)$, the result above and the continuity of $S^q$ imply that $S^q(\Omega^{1/2}Z+\psi(\xi))\to S^q(\Omega_0^{1/2}Z+\psi(d^*))$ almost surely in $[Z]$, where $Z\sim N(0,I_M)$. In turn, we have
$$1\{S^q(\Omega^{1/2}Z+\psi(\xi))\le x\} \to 1\{S^q(\Omega_0^{1/2}Z+\psi(d^*))\le x\} \quad\text{almost surely in } [Z]$$
for any $x>0$, because the distribution function of $S^q(\Omega_0^{1/2}Z+\psi(d^*))$ is strictly increasing and continuous for $x>0$ when $d^*\ne(-\infty)^M$. The dominated convergence theorem gives us
$$P\{S^q(\Omega^{1/2}Z+\psi(\xi))\le x\} \to P\{S^q(\Omega_0^{1/2}Z+\psi(d^*))\le x\}$$
for any $x>0$. Therefore we obtain the claim.

Now we obtain the probability limit of the critical value $\tilde c^q_n$. The result above, $(\hat\xi_n,\hat\Omega_n)\xrightarrow{p}(d^*,\Omega_0)$, and the Slutsky theorem imply that
$$L^q_n(x) \equiv P\big\{S^q(\hat\Omega_n^{1/2}Z+\psi(\hat\xi_n))\le x\big\} \xrightarrow{p} L^q(x) \equiv P\big\{S^q(\Omega_0^{1/2}Z+\psi(d^*))\le x\big\}$$
for any $x>0$ along $\{P_n\}$ as $n\to\infty$, where $P$ in $L^q_n$ denotes the conditional probability given $(\hat\xi_n,\hat\Omega_n)$. Note that $\tilde c^q_n(1-\alpha)$ defined in Section 4 is the $1-\alpha$ quantile of $L^q_n$, and $c^q_{d^*}(1-\alpha)$ is the $1-\alpha$ quantile of $L^q$. Because we consider the case where $c^q_{d^*}(1-\alpha)>0$, we have $\tilde c^q_n(1-\alpha)\xrightarrow{p}c^q_{d^*}(1-\alpha)$ by Lemma 5 of Andrews and Guggenberger (2010).

Step 3:
We finally derive the result on the asymptotic size. Note that all the convergence results in Step 2 still hold when we replace $\{P_n\}$ with $\{P_{u_n}\}$. Using the same arguments as in Step 1 of the proof of Theorem 1, we can show that
$$\tilde T^q_{u_n} \equiv S^q\big(\hat D_{u_n}^{-1/2}\sqrt{u_n}\,\hat d_{u_n}\big) \xrightarrow{d} S^q\big(\Omega_0^{1/2}Z+\psi(d^*)\big)$$
along $\{P_{u_n}\}$ as $n\to\infty$. Then we have
$$\liminf_{n\to\infty}P_{u_n}\big\{\tilde T^q_{u_n}\le\tilde c^q_{u_n}(1-\alpha)\big\} \ge P\big\{S^q(\Omega_0^{1/2}Z+\psi(d^*))\le c^q_{d^*}(1-\alpha)\big\} \ge 1-\alpha,$$
where the first inequality uses the weak convergence above together with $\tilde c^q_{u_n}(1-\alpha)\xrightarrow{p}c^q_{d^*}(1-\alpha)>0$, and the second holds by the definition of the quantile $c^q_{d^*}(1-\alpha)$. The choice of $\{P_{u_n}\}$ guarantees that
$$\mathrm{AsySize} \equiv \limsup_{n\to\infty}\sup_{P\in\mathcal P_{n,0}}P\big\{\tilde T^r_n>\tilde c^r_n(1-\alpha(1-\gamma)) \text{ or } \tilde T^s_n>\tilde c^s_n(1-\alpha\gamma)\big\} \le \limsup_{n\to\infty}\Big[P_{u_n}\big\{\tilde T^r_{u_n}>\tilde c^r_{u_n}(1-\alpha(1-\gamma))\big\} + P_{u_n}\big\{\tilde T^s_{u_n}>\tilde c^s_{u_n}(1-\alpha\gamma)\big\}\Big] \le \alpha,$$
where the first inequality holds by sub-additivity of the probability measure.

Step 4:
Finally, we show that the conclusion still holds even if $c^q_{d^*}(1-\alpha)=0$. For any $\alpha\in(0,1)$, we have
$$\begin{aligned}
P_{u_n}\big\{\tilde T^r_{u_n}\le\tilde c^r_{u_n}(1-\alpha)\big\} &\ge P_{u_n}\big\{\tilde T^r_{u_n}\le c^r_{d^*}(1-\alpha)\big\}\\
&= P_{u_n}\big\{\hat D_{u_n}^{-1/2}\sqrt{u_n}\,\hat d_{u_n}\le c^r_{d^*}(1-\alpha)\big\}\\
&\to P\big\{\Omega_0^{1/2}Z+d^*\le c^r_{d^*}(1-\alpha)\big\} = P\big\{S^r(\Omega_0^{1/2}Z+d^*)\le c^r_{d^*}(1-\alpha)\big\} \ge 1-\alpha,
\end{aligned}$$
where the first inequality holds because $\tilde c^r_{u_n}(1-\alpha)$ is non-negative; the first equality holds by the definition of $\tilde T^r_{u_n}$, where the inequality inside the probability holds component-wise; the convergence holds by Assumption 1; and the last equality holds by the definition of $S^r$. Similarly, we have
$$\begin{aligned}
P_{u_n}\big\{\tilde T^s_{u_n}\le\tilde c^s_{u_n}(1-\alpha)\big\} &\ge P_{u_n}\big\{\tilde T^s_{u_n}\le c^s_{d^*}(1-\alpha)\big\}\\
&= P_{u_n}\big\{\hat D_{u_n}^{-1/2}\sqrt{u_n}\,\hat d_{u_n}\le c^s_{d^*}(1-\alpha) \text{ or } -\hat D_{u_n}^{-1/2}\sqrt{u_n}\,\hat d_{u_n}\le c^s_{d^*}(1-\alpha)\big\}\\
&\to P\big\{\Omega_0^{1/2}Z+d^*\le c^s_{d^*}(1-\alpha) \text{ or } -(\Omega_0^{1/2}Z+d^*)\le c^s_{d^*}(1-\alpha)\big\} = P\big\{S^s(\Omega_0^{1/2}Z+d^*)\le c^s_{d^*}(1-\alpha)\big\} \ge 1-\alpha.
\end{aligned}$$

References
Andrews, D. W. and Guggenberger, P. (2010). Asymptotic size and a problem with subsampling and with the m out of n bootstrap. Econometric Theory, 26(2):426–468.

Andrews, D. W. and Soares, G. (2010). Inference for parameters defined by moment inequalities using generalized moment selection. Econometrica, 78(1):119–157.

Canay, I. and Shaikh, A. (2017). Practical and Theoretical Advances in Inference for Partially Identified Models. Volume 2, pages 271–306. Cambridge University Press.

Hansen, P. R. (2005). A test for superior predictive ability. Journal of Business & Economic Statistics, 23(4):365–380.

Kosorok, M. R. (2008). Introduction to Empirical Processes and Semiparametric Inference. Springer.

Lehmann, E. and Romano, J. (2006). Testing Statistical Hypotheses. Springer Texts in Statistics. Springer, New York.

Linton, O., Maasoumi, E., and Whang, Y.-J. (2005). Consistent testing for stochastic dominance under general sampling schemes. The Review of Economic Studies, 72(3):735–765.

Song, K. (2012). Testing predictive ability and power robustification. Journal of Business & Economic Statistics, 30(2):288–296.

West, K. D. (1996). Asymptotic inference about predictive ability. Econometrica, 64(5):1067–1084.

White, H. (2000). A reality check for data snooping. Econometrica, 68(5):1097–1126.