On the Size Control of the Hybrid Test for Predictive Ability
Deborah Kim∗
Department of Economics, Northwestern University
[email protected]
August 7, 2020
Abstract
We show that the hybrid test for superior predictability is not pointwise asymptotically of level α under standard conditions, and may lead to rejection rates over 11% when the significance level α is 5% in a simple case. We propose a modified hybrid test which is uniformly asymptotically of level α by properly adapting the generalized moment selection method.

Keywords: Asymptotic size; Generalized moment selection; Reality check; Uniform testing
1. Introduction
A test of superior predictive ability (SPA) compares many forecasting methods. More precisely, it tests whether a certain forecasting method outperforms a finite set of alternative forecasting methods. White (2000) developed a framework for a SPA test and proposed a SPA test called the reality check for data snooping. Hansen (2005) proposed a SPA test featuring improved power in the framework of White (2000). Finally, Song (2012) devised a SPA test, called the hybrid test, which delivers better power against certain local alternative hypotheses under which both of the SPA tests of White (2000) and Hansen (2005) perform poorly.

In this framework, the null hypothesis takes the form H₀ : d ≤ 0, where d ∈ R^M is a vector of risk differences and, in general, M ≥ 2. It then follows that the limiting distribution of standard test statistics depends on exactly which of the elements in the vector d are equal to zero. This property prevents researchers from using any tabulated critical values.

To circumvent the above problem, White (2000) proposed to use a critical value from the so-called least favorable distribution. The approach exploits the fact that the distribution of White (2000)'s test statistic T under d = 0 is stochastically largest over all possible null distributions satisfying d ≤ 0. The distribution under d = 0 is then called the least favorable one. White (2000) proposes to approximate the least favorable distribution using the bootstrap and takes the 1 − α quantile of the distribution as the critical value, where α is a significance level. The resulting critical value converges to a value which is always larger than the 1 − α quantile of the limiting distribution of T under any null distribution, and thus the approach yields a test with correct asymptotic size.

Song (2012) followed White (2000) in the analysis of his hybrid test and used the same least favorable distribution, i.e., the one associated with d = 0. However, in this paper, we show that this null distribution is not the least favorable one for the type of test statistic that Song (2012) considers in the hybrid test, which, in particular, combines two different test statistics. Whereas one of the test statistics is stochastically largest under d = 0, the other one is not. Consequently, the hybrid test, which employs bootstrap approximations to the distribution with d = 0, fails to control the rejection probability under the null and leads to size distortion.

The main contributions of this paper are the following. First, we show that the hybrid test is not pointwise asymptotically of level α under reasonable conditions. Our results illustrate that the cause of the size distortion lies in the fact that the bootstrap procedure behind the hybrid test neither approximates the asymptotic distribution of the test statistic nor the least favorable distribution. Second, we propose a modified hybrid test which is uniformly asymptotically of level α, again under reasonable conditions. This stronger result implies that one would expect the finite sample size of the test not to exceed the significance level for large enough sample sizes. Our proposed modification follows the generalized moment selection method by Andrews and Soares (2010, henceforth AS) after accounting for the fact that the test statistic in the hybrid test does not exhibit certain monotonicity properties that are required for the approach in AS. This last observation, despite being rather technical, may be of independent interest.

This article is organized as follows. Section 2 lays out notation and describes the hybrid test as originally proposed by Song (2012). Section 3 presents the main result on the properties of the hybrid test. Section 4 presents the modified hybrid test and its formal properties. Section 5 explores Monte Carlo simulations of the hybrid test and the modified one. Lastly, Section 6 concludes. The proofs of the formal results are included in APPENDIX B.

∗ I am grateful to Ivan Canay for his valuable guidance and suggestions. I have had the support and encouragement of Yoon-Jae Whang. I thank Joel Horowitz, Eric Auerbach, Myungkou Shin, and Modibo Camara for their helpful comments.
2. The hybrid test for predictive ability
In this section, we introduce notation and the procedure of the hybrid test. Suppose there is a time series {X_t}_{t=1}^∞ from the distribution P and we observe X^(n) ≡ {X_t}_{t=1}^n by time n. Define a τ-ahead unknown random variable ξ_{n+τ} ≡ f(X_{n+τ}) for some function f, the object that we aim to predict. A forecasting method ϕ is a mapping from the sample X^(n) to a forecast for ξ_{n+τ}. We have M + 1 different forecasting methods: a benchmark forecasting method ϕ₀ and a finite set of alternative forecasting methods ϕ_m, m ∈ 𝓜 = {1, · · · , M}.

The objective of the hybrid test is to test whether the benchmark forecasting method is superior to all other alternative forecasting methods in terms of predictive ability. To compare the predictive ability, we assess the risk (of prediction) of the m-th forecasting method using a real-valued function Λ_m ≡ Λ(ϕ_m, P) for m = 0, 1, · · · , M. An example of such a risk is the mean squared error Λ_m = E[(ϕ_m(X^(n)) − ξ_{n+τ})²] for ξ_{n+τ} ∈ R. We say the benchmark forecasting method ϕ₀ dominates the forecasting method ϕ_m (in terms of predictive ability measured by the risk function Λ) if the risk of the forecasting method ϕ_m is greater than or equal to the risk of the benchmark forecasting method ϕ₀. Because we are interested in testing whether the benchmark method ϕ₀ dominates all alternative forecasting methods in 𝓜, the hypotheses can be formulated as

H₀ : Λ₀ ≤ Λ_m for all m ∈ 𝓜, and H₁ : Λ₀ > Λ_m for some m ∈ 𝓜.

The risk of the m-th forecasting method ϕ_m is unknown, but it can be estimated. Let Λ̂_{n,m} be an estimator for Λ_m, for m = 0, 1, · · · , M. We define the risk difference between the benchmark forecasting method ϕ₀ and the m-th forecasting method ϕ_m by d_m. Define the counterpart of d_m in the sample space by d̂_{n,m}, i.e.,

d_m ≡ Λ₀ − Λ_m and d̂_{n,m} ≡ Λ̂_{n,0} − Λ̂_{n,m} for m ∈ 𝓜.
Define d as the M-dimensional vector whose m-th element is d_m for m = 1, · · · , M. Analogously define d̂_n as the M-dimensional vector whose m-th element is d̂_{n,m} for m = 1, · · · , M. With this notation, the hypotheses can be equivalently written as H₀ : d ≤ 0 and H₁ : d_m > 0 for some m ∈ 𝓜.

A test φ_n ≡ φ_n(X₁, · · · , X_n) for the null hypothesis H₀ is said to be pointwise asymptotically of level α if it satisfies

lim sup_{n→∞} E_P[φ_n] ≤ α for all P ∈ 𝒫₀, (1)

where 𝒫₀ is the set of all distributions P satisfying the null hypothesis and the basic assumptions. In turn, the test is said to be uniformly asymptotically of level α if it satisfies

lim sup_{n→∞} sup_{P∈𝒫₀} E_P[φ_n] ≤ α. (2)

Note that (2) implies (1). If either (1) or (2) fails, then we can always find data generating processes under the null such that the rejection probability exceeds α.

In the literature on comparing the predictive ability of forecasting methods, the asymptotic properties of statistical tests are often characterized by the asymptotic distribution of the vector d̂_n. In line with this, Song (2012) imposes the following assumption on the asymptotic behavior of the vector d̂_n.

Assumption 1. Let P be the distribution generating the time series {X_t}_{t=1}^∞.
(i) √n(d̂_n − d) →_d N(0, Σ), where d and Σ depend on P and Σ is an M-dimensional variance-covariance matrix.
(ii) There exists a consistent estimator σ̂²_{n,m} for Σ_{m,m} for m = 1, · · · , M.

West (1996)'s Theorem 4.1 and Hansen (2005)'s Assumption 1 provide regularity conditions under which Assumption 1 is satisfied. It is worth noting that Assumption 1(i) implies that d̂_n converges in probability to a fixed parameter d as the sample size increases to infinity. In this sense, Assumption 1(i) is stronger than assuming that √n(d̂_n − E_P[d̂_n]) converges in distribution to a normal distribution.

The key feature of the hybrid test is to use two pairs of a test statistic and a critical value in order to form a rejection region. The first pair (T̂^r_n, ĉ^{r∗}_n) is adopted from the reality check. We define the one-sided test by φ^r_n ≡ 1{T̂^r_n > ĉ^{r∗}_n}. We call this the one-sided test because the test statistic T̂^r_n was originally devised to test the one-sided null hypothesis H₀.
The second pair (T̂^s_n, ĉ^{s∗}_n) is adopted from the symmetrized test by Linton et al. (2005). We define the two-sided test by φ^s_n ≡ 1{T̂^s_n > ĉ^{s∗}_n}. Again, the name two-sided test comes from the fact that the statistic T̂^s_n was originally proposed to test the two-sided null hypothesis H^s₀ : d ≤ 0 or d ≥ 0. Given the two pairs (T̂^r_n, ĉ^{r∗}_n) and (T̂^s_n, ĉ^{s∗}_n), the hybrid test is defined by

φ_n ≡ φ^r_n(1 − φ^s_n) + φ^s_n. (3)

That is, the hybrid test rejects the null hypothesis H₀ if T̂^r_n > ĉ^{r∗}_n or T̂^s_n > ĉ^{s∗}_n. The test takes the union of the two rejection regions formed by the two tests, φ^r_n and φ^s_n, as its rejection region. Such a rejection region allows the hybrid test to deliver better power against some alternative hypotheses under which the reality check or Hansen (2005)'s SPA test perform poorly. For details on this result, see Song (2012).

The two test statistics are defined as follows:

T̂^r_n ≡ √n max_{m∈𝓜} (d̂_{n,m}/σ̂_{n,m}) and
T̂^s_n ≡ √n min( max_{m∈𝓜} (d̂_{n,m}/σ̂_{n,m}), max_{m∈𝓜} (−d̂_{n,m}/σ̂_{n,m}) ).
To define the two critical values, fix a significance level α ∈ (0, 1) and a tuning parameter γ ∈ (0, 1]. Ideally, one would like values c̄^r(α, γ) and c̄^s(α, γ) satisfying the two conditions:

lim_{n→∞} P{T̂^r_n > c̄^r(α, γ) and T̂^s_n ≤ c̄^s(α, γ)} = α(1 − γ) and
lim_{n→∞} P{T̂^s_n > c̄^s(α, γ)} = αγ.

The two equations above imply that if the critical values ĉ^{r∗}_n ≡ ĉ^{r∗}_n(α, γ) and ĉ^{s∗}_n ≡ ĉ^{s∗}_n(α, γ) converge in probability to c̄^r(α, γ) and c̄^s(α, γ) respectively, then the hybrid test has a limiting rejection probability of α under the null hypothesis. The two values are, however, infeasible as the limit distribution of the test statistics is not pivotal. Therefore, Song (2012) proposes to implement the bootstrap to obtain data-dependent critical values, expecting that this approach would deliver critical values with the above properties.

The procedure to get the bootstrap critical values is the following. Consider a general bootstrap sample {d̂^∗_{n,b} : 1 ≤ b ≤ B}, where we denote the m-th element of d̂^∗_{n,b} as d̂^∗_{n,b,m}. For example, if the observations are stationary, then one can implement the stationary bootstrap. Define a centred bootstrap sample as

d̃^∗_{n,b} ≡ d̂^∗_{n,b} − d̂_n. (4)

Define the bootstrap test statistics {(T̂^{r∗}_{n,b}, T̂^{s∗}_{n,b})}_{b=1}^B, where

T̂^{r∗}_{n,b} ≡ √n max_{m∈𝓜} (d̃^∗_{n,b,m}/σ̂_{n,m}) and (5)
T̂^{s∗}_{n,b} ≡ √n min( max_{m∈𝓜} (d̃^∗_{n,b,m}/σ̂_{n,m}), max_{m∈𝓜} (−d̃^∗_{n,b,m}/σ̂_{n,m}) ),

and {σ̂_{n,m} : m ∈ 𝓜} are not bootstrapped. Song (2012) defines ĉ^{s∗}_n as the (1 − αγ)-quantile of the bootstrap sample {T̂^{s∗}_{n,b}}_{b=1}^B, i.e.,

ĉ^{s∗}_n ≡ inf{ c ∈ R : (1/B) Σ_{b=1}^B 1{T̂^{s∗}_{n,b} ≤ c} ≥ 1 − αγ }.

Given ĉ^{s∗}_n, the critical value ĉ^{r∗}_n is defined as the (1 − α(1 − γ))-quantile of the bootstrap sample {T̂^{r∗}_{n,b} · 1{T̂^{s∗}_{n,b} ≤ ĉ^{s∗}_n}}_{b=1}^B, i.e.,

ĉ^{r∗}_n ≡ inf{ c ∈ R : (1/B) Σ_{b=1}^B 1{T̂^{r∗}_{n,b} · 1{T̂^{s∗}_{n,b} ≤ ĉ^{s∗}_n} ≤ c} ≥ 1 − α(1 − γ) }.
The tuning parameter γ determines the degree to which the two-sided test φ^s_n contributes to the hybrid test in constructing the rejection region. For example, if γ is zero, then the hybrid test coincides with the one-sided test with significance level α. If γ is 1, then the hybrid test corresponds to the two-sided test with significance level α. We restrict the tuning parameter γ to be in (0, 1], as the asymptotic properties of the one-sided test follow White (2000).
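To make the two-step quantile construction above concrete, the following sketch computes ĉ^{s∗}_n and then ĉ^{r∗}_n from arrays of bootstrap statistics, using the inf-definition of the empirical quantile. The function name and the NumPy array interface are our own illustration, not code from Song (2012).

```python
import numpy as np

def hybrid_critical_values(T_r_boot, T_s_boot, alpha, gamma):
    """Two-step bootstrap critical values of the hybrid test (sketch).

    T_r_boot, T_s_boot: length-B arrays of bootstrap statistics T^r*_{n,b}, T^s*_{n,b}.
    Returns (c_r, c_s): c_s is the (1 - alpha*gamma)-quantile of T^s*, and
    c_r is the (1 - alpha*(1-gamma))-quantile of T^r* . 1{T^s* <= c_s}.
    """
    T_r_boot = np.asarray(T_r_boot, dtype=float)
    T_s_boot = np.asarray(T_s_boot, dtype=float)
    B = T_s_boot.size

    # c_s: smallest c with empirical cdf of T^s* at c at least 1 - alpha*gamma.
    s_sorted = np.sort(T_s_boot)
    k_s = int(np.ceil((1.0 - alpha * gamma) * B)) - 1
    c_s = s_sorted[min(max(k_s, 0), B - 1)]

    # c_r: same construction applied to T^r* set to zero on draws where the
    # two-sided bootstrap statistic already exceeds c_s.
    truncated = T_r_boot * (T_s_boot <= c_s)
    r_sorted = np.sort(truncated)
    k_r = int(np.ceil((1.0 - alpha * (1.0 - gamma)) * B)) - 1
    c_r = r_sorted[min(max(k_r, 0), B - 1)]
    return c_r, c_s
```

Given the statistics T̂^r_n and T̂^s_n computed from the sample, the test then rejects H₀ whenever T̂^r_n > c_r or T̂^s_n > c_s.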
3. On the size control of the hybrid test
In this section, we investigate the asymptotic properties of the hybrid test, as these were not formally studied in Song (2012). First, we provide a simple example where the asymptotic rejection probability of the hybrid test exceeds the significance level. Next, we present the main result generalizing the observation. We begin by adding the following two assumptions.
Assumption 2. All diagonal elements of the variance-covariance matrix Σ are positive.

Assumption 3. As the sample size n diverges to infinity, we have

sup_{z∈R^M} | P^∗_n{√n(d̂^∗_{n,b} − d̂_n) ≤ z} − P{√n(d̂_n − d) ≤ z} | →_p 0,

where P^∗_n denotes the probability measure conditional on the sample X^(n).

Assumption 2 ensures that no element of √n(d̂_n − d) degenerates in the limit. Assumption 3 means that the bootstrap distribution approximates the distribution of √n(d̂_n − d) when the sample size n is sufficiently large. This assumption is necessary to justify the bootstrapped critical values. White (2000) and Hansen (2005) provide sufficient conditions on d̂_n and bootstrap procedures for Assumption 3 to hold.

To gain intuition on the asymptotic properties of the hybrid test, we consider a simple example where the number of alternative forecasting methods is two, M = 2. Let P be a distribution satisfying the null hypothesis. In particular, we assume that d₁ = 0 and d₂ < 0. This means that the first alternative forecasting method is as risky as the benchmark forecasting method in terms of predictive ability, whereas the benchmark forecasting method dominates the second alternative forecasting method. Let Z = (Z₁, Z₂) be the normal vector in Assumption 1. We further assume that the risks among the benchmark and alternative forecasting methods are independent in the limit, cov(Z₁, Z₂) = 0. For the sake of simplicity, assume that var(Z₁) = var(Z₂) = 1 and γ = 0.5.

First, we derive the asymptotic distributions of T̂^r_n and T̂^s_n. By Assumption 1 and Assumption 2, we have √n(d̂_{n,1} − d₁, d̂_{n,2} − d₂) →_d (Z₁, Z₂) ∼ N(0, I₂). The condition that d₁ = 0, d₂ < 0 implies that √n d̂_{n,2} diverges to −∞ as n goes to infinity while √n d̂_{n,1} is stochastically bounded. As a result, the two test statistics depend only on √n d̂_{n,1} for large n. That is, the vector of the test statistics converges in distribution to a standard normal distribution, i.e.,

(T̂^r_n, T̂^s_n) ≡ ( √n max(d̂_{n,1}, d̂_{n,2}), √n min(max(d̂_{n,1}, d̂_{n,2}), max(−d̂_{n,1}, −d̂_{n,2})) ) ≈ ( √n d̂_{n,1}, √n d̂_{n,1} ) →_d (Z₁, Z₁). (6)

Next, we derive the asymptotic distribution of the bootstrap version of T̂^s_n. By Assumption 3 we have √n(d̃^∗_{n,b,1}, d̃^∗_{n,b,2}) →_d (V₁, V₂) ∼ N(0, I₂) with probability approaching 1. Note that the vector (d̃^∗_{n,b,1}, d̃^∗_{n,b,2}) is centred at zero while (d̂_{n,1}, d̂_{n,2}) is centred at (d₁, d₂) = (0, d₂). Consequently, T̂^{s∗}_{n,b} depends on both √n d̃^∗_{n,b,1} and √n d̃^∗_{n,b,2} for large n. This contrasts with the fact that T̂^s_n only relies on √n d̂_{n,1}. The bootstrap consistency assumption and the continuous mapping theorem give

T̂^{s∗}_{n,b} ≡ √n min(max(d̃^∗_{n,b,1}, d̃^∗_{n,b,2}), max(−d̃^∗_{n,b,1}, −d̃^∗_{n,b,2})) →_d L

with probability approaching 1, where L ≡ min(max(V₁, V₂), max(−V₁, −V₂)) and the vector (V₁, V₂) follows the bivariate standard normal distribution.
Simple algebra gives the formula for the distribution function of L,

F_L(t) ≡ P{L ≤ t} = −2Φ(t)² + 4Φ(t) − 1, t ∈ [0, ∞),

where Φ is the distribution function of the standard normal distribution. Moreover, the asymptotic distribution F_L implies that the critical value ĉ^{s∗}_n converges in probability to the (1 − α/2)-quantile of F_L, i.e.,

ĉ^{s∗}_n →_p c^s(α) ≡ inf{x ∈ R : F_L(x) ≥ 1 − α/2}. (7)

Notice that the asymptotic distribution of the bootstrap test statistic T̂^{s∗}_{n,b} does not coincide with the asymptotic distribution of the two-sided test statistic T̂^s_n. This discordance between Φ and F_L eventually brings about the size distortion of the hybrid test. To see this, we observe that the probability of rejecting the null hypothesis is larger than P{Z₁ > c^s(α)} = 1 − Φ(c^s(α)) for large n, i.e.,

E[φ_n] = P{T̂^r_n > ĉ^{r∗}_n or T̂^s_n > ĉ^{s∗}_n} ≈ P{Z₁ > min(c^r(α), c^s(α))} ≥ P{Z₁ > c^s(α)},

where c^r(α) is the probability limit of ĉ^{r∗}_n; the approximation holds by Equations (6) and (7); and the inequality holds by the definition of the minimum. A simple calculation then reveals that 1 − Φ(c^s(α)) is greater than the nominal level α for α ∈ (0, 0.5). The gap between 1 − Φ(c^s(α)) and α can be sizable: the value of 1 − Φ(c^s(α)) is 0.158, 0.112, and 0.050 when α is 0.10, 0.05, and 0.01 respectively.

The previous example shows that there exists a fixed data generating process under which the size of the hybrid test is not controlled by the significance level α. The re-centered bootstrap sample mean (d̃^∗_{n,b,1}, d̃^∗_{n,b,2}) plays a critical role in obtaining this conclusion. Specifically, the re-centered bootstrap sample mean steers the asymptotic distribution of the bootstrap test statistics (T̂^{r∗}_{n,b}, T̂^{s∗}_{n,b}) away from the asymptotic distribution of the test statistics (T̂^r_n, T̂^s_n), consequently yielding over-rejection of the null hypothesis.
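The rejection rates quoted above can be checked numerically from the closed form of F_L. The snippet below (our own verification, using only the standard library) inverts F_L(t) = −2Φ(t)² + 4Φ(t) − 1 at the 1 − α/2 level by bisection and evaluates 1 − Φ(c^s(α)).

```python
import math

def Phi(t):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def F_L(t):
    """Limiting cdf of the bootstrap two-sided statistic, for t >= 0."""
    p = Phi(t)
    return -2.0 * p * p + 4.0 * p - 1.0

def c_s(alpha):
    """(1 - alpha/2)-quantile of F_L (the gamma = 0.5 case), by bisection."""
    lo, hi = 0.0, 10.0
    target = 1.0 - alpha / 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if F_L(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(1.0 - Phi(c_s(alpha)), 3))
# prints: 0.1 0.158 / 0.05 0.112 / 0.01 0.05
```

The output reproduces the three over-rejection rates reported in the text.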
One might expect that the same result would hold in general cases because the bootstrap procedure uses re-centered bootstrap sample means even when M is larger than 2. The following theorem states that this conjecture is true.

Theorem 1.
Suppose that Assumptions 1, 2, and 3 hold. Let 2 ≤ M < ∞. Let P be a distribution generating X^(n) satisfying the following conditions:
1. there exists m ∈ 𝓜 such that d_m = 0,
2. there exists m′ ∈ 𝓜 such that d_{m′} < 0, and
3. Σ is a diagonal matrix.
For any γ ∈ (0, 1], there exists an upper bound ᾱ ≡ ᾱ(P, γ) ∈ (0, 0.5) such that the following condition holds:

lim_{n→∞} E_P[φ_n] > α for any α ∈ (0, ᾱ),

where φ_n is the hybrid test defined in Equation (3).

Theorem 1 provides sufficient conditions for a distribution under the null hypothesis for which the asymptotic rejection probability exceeds the nominal level α ∈ (0, ᾱ). The existence of such a distribution implies that the hybrid test is not pointwise asymptotically of level α for α ∈ (0, ᾱ), and thus the size of the hybrid test is not controlled asymptotically.

The size of the upper bound ᾱ could be of practical interest, as one can carry out the hybrid test without concerns about size distortion if ᾱ is smaller than 0.01. The value of ᾱ is, however, a priori unknown and depends on the data generating process. To be more specific, it relies on the number of alternative forecasting methods M as well as the number of alternatives attaining the same risk as the benchmark, i.e., M₀ ≡ |{m ∈ 𝓜 : d_m = 0}|. Once M, M₀, and γ are fixed, ᾱ can be obtained by numerical approximation. To see how large ᾱ could be, we tabulated some values of ᾱ under γ = 0.5 in Table 1. The value of ᾱ varies consistently with the ratio of M₀ to M: ᾱ gets close to γ as the ratio increases to 1 and close to 0 as the ratio diminishes to zero. We present the values of ᾱ under γ = 0.25 and γ = 0.75 in APPENDIX A. The result implies that one cannot use conventional significance levels {0.01, 0.05, 0.10} when the ratio exceeds a half, and that the use of p-values is generally invalid.

Table 1: Values of ᾱ in Theorem 1 with M = 10, · · · , M₀ = kM for k = 0.1, · · · , 1, and γ = 0.5.

While Theorem 1 postulates three conditions for data generating processes leading to size distortion for α ∈ (0, ᾱ), the first condition in Theorem 1 is crucial because it prevents the distribution of the test statistics from degenerating. The condition is satisfied if the set of alternative forecasting methods 𝓜 contains at least one forecasting method that attains the same risk as the benchmark forecasting method. If this condition is violated, all alternative forecasting methods have greater risks than the benchmark forecasting method under the null hypothesis. In this case, the two test statistics diverge to negative infinity while the critical values converge to fixed real numbers regardless. Therefore, if the first condition is violated, the conclusion no longer holds.

The second condition says that the set 𝓜 must contain at least one forecasting method riskier than the benchmark forecasting method. Recall that in the example d₂ < 0 made the asymptotic distribution of T̂^s_n deviate from that of T̂^{s∗}_{n,b}. In the same manner, the second condition causes the asymptotic distribution of the test statistics to differ from the asymptotic distribution of the bootstrap test statistics. If the second condition is not satisfied, then d must be zero under the null hypothesis. If d = 0, the bootstrap test statistics exactly approximate the limiting distribution of the test statistics. The limiting probability of rejecting the null hypothesis therefore becomes the significance level α, rather than exceeding α.

Unlike the first two, the last condition is not a necessary condition. The condition requires that the covariance of (Z_i, Z_j) is zero for any i ≠ j ∈ 𝓜, where Z is the random vector from N(0, Σ) in Assumption 1. In the simple case where M = 2, it can easily be shown that the result still holds even if cov(Z₁, Z₂) > 0. The condition is posited to simplify the proof.
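The mechanism behind Theorem 1 can be explored numerically using the limiting representations derived above: under the stated conditions, both test statistics converge to the maximum of the M₀ binding coordinates of Z, while the bootstrap statistics involve all M coordinates of V. The sketch below (our own illustration, valid only for the Theorem 1 design with M₀ < M and Σ = I) simulates the limiting rejection probability of the hybrid test.

```python
import numpy as np

def limiting_rejection_prob(M, M0, alpha, gamma, reps=200_000, seed=0):
    """Monte Carlo approximation of the limiting rejection probability of the
    hybrid test under the Theorem 1 design: d_m = 0 for M0 coordinates,
    d_m < 0 for the rest, Sigma = I.  Requires M0 < M.  Illustrative sketch."""
    rng = np.random.default_rng(seed)

    # Limiting bootstrap draws: V ~ N(0, I_M); both bootstrap statistics
    # involve all M coordinates because the bootstrap sample is re-centered.
    V = rng.standard_normal((reps, M))
    T_r_star = V.max(axis=1)
    T_s_star = np.minimum(V.max(axis=1), (-V).max(axis=1))

    c_s = np.quantile(T_s_star, 1.0 - alpha * gamma)
    c_r = np.quantile(T_r_star * (T_s_star <= c_s), 1.0 - alpha * (1.0 - gamma))

    # Limiting test statistics: with M0 < M, both reduce to the maximum over
    # the M0 binding coordinates of Z ~ N(0, I).
    Z = rng.standard_normal((reps, M0))
    T_lim = Z.max(axis=1)
    return float(np.mean((T_lim > c_r) | (T_lim > c_s)))
```

For M = 2, M₀ = 1, γ = 0.5, and α = 0.05 this reproduces the roughly 11% rejection rate of the example; scanning α then locates the crossing point ᾱ for a given (M, M₀, γ).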
4. Recovering asymptotic size control
Theorem 1 shows that the hybrid test is not pointwise asymptotically of level α. In this section, we modify the hybrid test to be not only pointwise but also uniformly asymptotically of level α as defined in Equation (2). The latter concept, which is stronger than the former, implies that one can approximately control the finite sample size of the test given a sufficiently large sample.

To fix the size distortion of the hybrid test, we borrow an idea from the moment inequality literature. The size distortion essentially stems from the phenomenon that the bootstrap test statistics do not mimic the asymptotic behavior of the test statistics. Similar issues often arise in moment inequality testing problems, and as a result, many studies have proposed methods to circumvent this problem. However, as explained in Canay and Shaikh (2017), many of them hinge on the property that test statistics are monotone in d̂_n. Because T̂^s_n is not monotone in d̂_n, we cannot simply take one of the off-the-shelf methods and apply it to the hybrid test. Consequently, we alter the generalized moment selection method proposed by AS. Below we explain the procedure.

First, we normalize the test statistics so that their values are zero under the null hypothesis. Specifically, the two modified test statistics T̃^r_n and T̃^s_n are

T̃^r_n ≡ √n max_{m∈𝓜} ((d̂_{n,m}/σ̂_{n,m}) ∨ 0) ≡ S^r(D̂^{−1}_n √n d̂_n) and (8)
T̃^s_n ≡ √n min( max_{m∈𝓜} ((d̂_{n,m}/σ̂_{n,m}) ∨ 0), max_{m∈𝓜} ((−d̂_{n,m}/σ̂_{n,m}) ∨ 0) ) ≡ S^s(D̂^{−1}_n √n d̂_n), (9)

where D̂_n ≡ diag(σ̂_{n,1}, · · · , σ̂_{n,M}), S^q : R^M → R for q ∈ {r, s} are real-valued functions such that S^r(x) = max_{m∈𝓜}(x_m ∨ 0) and S^s(x) = min(max_{m∈𝓜}(x_m ∨ 0), max_{m∈𝓜}(−x_m ∨ 0)), and a ∨ b is the maximum of a and b.

Second, we define the moment selecting vector ψ̂_n = (ψ̂_{n,1}, · · · , ψ̂_{n,M})ᵗ as proposed by AS, where the m-th element is

ψ̂_{n,m} ≡ (√n/κ_n)(d̂_{n,m}/σ̂_{n,m}) · 1{(√n/κ_n)(d̂_{n,m}/σ̂_{n,m}) < −1} for m = 1, · · · , M.

Here κ_n is a non-stochastic sequence of non-negative numbers such that κ_n → ∞ and κ_n/√n → 0 as n diverges to infinity. κ_n is a tuning parameter that a researcher has to choose; AS recommend the choice κ_n = √(log n).

As in AS, we suggest two types of data-dependent critical values. The first type is simulation-based. The critical values c̃^q_n(1 − α) for q ∈ {r, s} are defined by

c̃^q_n(1 − α) ≡ inf{ x ∈ R : P{S^q(Ω̂^{1/2}_n Z + ψ̂_n) ≤ x} ≥ 1 − α }, (10)

where Ω̂_n ≡ D̂^{−1}_n Σ̂_n D̂^{−1}_n, Σ̂_n is the consistent variance-covariance estimator for Σ in Assumption 1, and Ω̂^{1/2}_n is a symmetric positive semi-definite matrix such that Ω̂^{1/2}_n Ω̂^{1/2}_n = Ω̂_n. P is the conditional probability given (Ω̂_n, ψ̂_n), and Z follows the M-dimensional standard normal distribution independently from the sample. c̃^q_n(1 − α) can be obtained by simulating {Z₁, · · · , Z_R} for some large R.

The second type is bootstrap-based. The critical values are defined as follows:

c̃^{q∗}_n(1 − α) ≡ inf{ x ∈ R : P^∗_n{T̃^{q∗}_{n,b} ≤ x} ≥ 1 − α } for q ∈ {r, s}, (11)

where

T̃^{q∗}_{n,b} ≡ S^q(D̂^{−1}_n √n d̃^∗_{n,b} + ψ̂_n) for q ∈ {r, s}, b = 1, · · · , B, (12)

and P^∗_n is the bootstrap conditional probability given the sample.

Definition 1.
With the test statistics T̃^r_n and T̃^s_n defined in Equations (8) and (9), the modified hybrid test is defined by

φ̃_n ≡ 1{ T̃^r_n > c^r_n(1 − α(1 − γ)) or T̃^s_n > c^s_n(1 − αγ) }

for any γ ∈ (0, 1], where (c^r_n, c^s_n) = (c̃^r_n, c̃^s_n) in Equation (10) or (c^r_n, c^s_n) = (c̃^{r∗}_n, c̃^{s∗}_n) in Equation (11).

The key modification is that the moment selecting vector ψ̂_n is added to the centred bootstrap samples d̃^∗_{n,b}. The bootstrap sample d̃^∗_{n,b} centred at zero causes the bootstrap test statistics to deviate from the asymptotic distribution of the actual test statistics when d is not zero. If d_m < 0 for some m ∈ 𝓜, then √n d̂_{n,m}/σ̂_{n,m} diverges to negative infinity while √n d̃^∗_{n,b,m} remains stochastically bounded. The moment selecting vector prevents this deviation by adding a quantity diverging to negative infinity to √n d̃^∗_{n,b,m}/σ̂_{n,m} when √n d̂_{n,m}/σ̂_{n,m} is small enough. As a result, the modified test attains asymptotic size control as follows.

Lemma 1.
Suppose Assumptions 1, 2, and 3 hold. Assume that Σ̂_n is a consistent estimator for Σ. Then for α ∈ (0, 0.5) and γ ∈ (0, 1] we have

lim sup_{n→∞} sup_{P∈𝒫₀} P{ T̃^r_n > c^r_n(1 − α(1 − γ)) or T̃^s_n > c^s_n(1 − αγ) } ≤ α

for (c^r_n, c^s_n) = (c̃^r_n, c̃^s_n) or (c^r_n, c^s_n) = (c̃^{r∗}_n, c̃^{s∗}_n), where 𝒫₀ is the set of all distributions satisfying the null hypothesis and generating the sample X^(n).

The proof of Lemma 1 can be found in APPENDIX B. Intuitively, the result follows from tailoring Lemma 2 and Theorem 1 of AS to our framework. While our test statistics violate Assumptions 1(a) and 3 in AS, our Assumption 1, which imposes a stronger restriction on E_{P_n}[d̂_n] than AS do, allows the uniformity result to hold.
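As an illustration of Definition 1, the following sketch implements the simulation-based version of the modified hybrid test with κ_n = √(log n). The function signature, the eigendecomposition square root of Ω̂_n, and the NumPy quantile convention are our own choices for exposition, not the paper's code.

```python
import numpy as np

def S_r(x):
    """S^r(x) = max_m (x_m v 0), applied along the last axis."""
    return np.maximum(x, 0.0).max(axis=-1)

def S_s(x):
    """S^s(x) = min( max_m (x_m v 0), max_m (-x_m v 0) )."""
    return np.minimum(np.maximum(x, 0.0).max(axis=-1),
                      np.maximum(-x, 0.0).max(axis=-1))

def modified_hybrid_test(d_hat, sigma_hat, Omega_hat, n, alpha, gamma,
                         R=20_000, seed=0):
    """Simulation-based modified hybrid test (sketch of Definition 1).
    Returns True if the test rejects H0: d <= 0."""
    rng = np.random.default_rng(seed)
    d_hat = np.asarray(d_hat, dtype=float)
    sigma_hat = np.asarray(sigma_hat, dtype=float)

    studentized = np.sqrt(n) * d_hat / sigma_hat
    T_r, T_s = S_r(studentized), S_s(studentized)

    # Moment selection with kappa_n = sqrt(log n), as recommended by AS.
    kappa = np.sqrt(np.log(n))
    xi = studentized / kappa
    psi = np.where(xi < -1.0, xi, 0.0)

    # Draw S^q(Omega^{1/2} Z + psi) for R standard normal vectors Z.
    w, Q = np.linalg.eigh(Omega_hat)
    root = Q @ np.diag(np.sqrt(np.maximum(w, 0.0))) @ Q.T
    draws = rng.standard_normal((R, d_hat.size)) @ root.T + psi

    c_r = np.quantile(S_r(draws), 1.0 - alpha * (1.0 - gamma))
    c_s = np.quantile(S_s(draws), 1.0 - alpha * gamma)
    return bool(T_r > c_r or T_s > c_s)
```

Note that `np.quantile` uses linear interpolation rather than the inf-definition in Equation (10); for large R the difference is negligible.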
5. Monte carlo simulation
While Theorem 1 establishes the possibility of size distortion of the hybrid test based on asymptotic arguments, it does not tell us how pronounced the distortion could be in a finite sample. In this section, we explore how significantly the asymptotic result manifests in a finite sample through Monte Carlo simulation. Furthermore, we study the finite sample performance of the modified hybrid test under the null hypothesis.

We use a simulation design similar to the ones considered in Song (2012) and Hansen (2005). As in Section 2, suppose we have a benchmark forecasting method and M distinct alternative forecasting methods. We observe n realized relative risks of the alternative forecasting methods to the benchmark method, d_t ∈ R^M for t = 1, · · · , n. We are interested in testing the null hypothesis that the relative risk is not greater than zero, H₀ : d ≡ E_P[d_t] ≤ 0, using the sample mean d̂_n = Σ_{t=1}^n d_t/n.

For the simulation, we draw realized relative risks independently from a normal distribution, i.e., d_t ∼ i.i.d. N(−μλ_{M₀}, V), where μ is a positive number and λ_{M₀} is an M-dimensional vector whose first M₀ elements are zeros and whose remaining M − M₀ elements are ones. M₀ refers to the number of alternative forecasting methods whose risks are the same as that of the benchmark, i.e., M₀ = |{d_m : d_m = 0, m = 1, · · · , M}|. The relative risk d = −μλ_{M₀} is non-positive and hence the design satisfies the null hypothesis. The i.i.d. observations imply that Assumption 1 and Assumption 3 are satisfied. The variance-covariance matrix V is designed to satisfy the third condition of Theorem 1: the off-diagonal elements of V are zeros, and the M diagonal elements are determined by a random draw from the uniform distribution over [1, 2] at the beginning of the simulation and are fixed during the simulation.

The sample size n is 200, and hence we draw M × n random numbers. The numbers of Monte Carlo repetitions and bootstrap samples are 5,000 and 500 respectively. The number of alternative forecasting methods is chosen from M ∈ {50, 100}. For the significance level α, we consider 0.01, 0.05, and 0.10. We use γ = 0.5 for the hybrid test and κ_n = √(log n) for the tuning parameter in the modified hybrid test, as recommended by AS.

Table 2 reports the simulated rejection probabilities. Hyb. indicates the hybrid test, while Boot. and Simu. refer to the modified test with the bootstrap-based and simulation-based critical values respectively.

Table 2 provides evidence of finite sample size distortion of the hybrid test. Many simulated rejection probabilities of the hybrid test exceed the significance level α when M₀ is strictly less than M. The size distortion is starkest when M₀ is slightly less than M, and the extent of distortion is not marginal. For example, the rejection probabilities of the hybrid test with M = 50 and M₀ = 45 are 0.208, 0.149, and 0.070, which are almost twice, three times, and seven times larger than the corresponding significance levels α = 0.10, 0.05, and 0.01. We have similar results in the case with M = 100 and M₀ = 95.

There is a noticeable pattern in the simulated probabilities. First, when all inequalities are binding, that is, when the second condition in Theorem 1 is not satisfied, the probabilities are close to the nominal level α. This is because, under this data generating process, the bootstrap distribution correctly approximates the distribution of the test statistics and hence the rejection probability converges exactly to the nominal level. Second, as M₀ decreases, the probabilities abruptly increase over the nominal level but then decline gradually. This is because both test statistics converge to max_{m∈𝓜₀} Z_m, which is stochastically decreasing as M₀ decreases, where {Z_m : m = 1, · · · , M} are independent standard normal random variables. Meanwhile, the limiting distributions of the bootstrap test statistics do not depend on M₀. This difference leads to diminishing rejection probabilities along M₀. Finally, the probabilities fall below α when the ratio M₀/M is small: less than 0.4 for α = 0.10, 0.3 for α = 0.05, and 0.2 for α = 0.01 in the case M = 50. This is consistent with our findings from Table 1 that ᾱ decreases as the ratio M₀/M diminishes.

Contrary to the hybrid test, the simulated rejection probabilities of the modified hybrid tests are less than the nominal level except in the two cases with M₀ = M = 50 and α = 0.01. The modified hybrid test appears to be conservative in that the simulated rejection probabilities are close to α/2 when M₀ is strictly less than M. This is because the two test statistics T̃^r_n and T̃^s_n converge in distribution to the same distribution, as T̂^r_n and T̂^s_n do in the previous example. Furthermore, the probabilities show that the two different critical values of the modified hybrid test yield similar results.
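The data generating process described above can be sketched as follows. The function interface and the choice μ = 1 are our own for illustration; in the paper's design the diagonal of V is drawn once at the beginning and held fixed across Monte Carlo repetitions, whereas this helper draws it per call for brevity.

```python
import numpy as np

def draw_relative_risks(n, M, M0, mu, rng):
    """One draw of the Section 5 design (sketch): n i.i.d. vectors
    d_t ~ N(-mu * lambda_M0, V), with V diagonal and Unif[1, 2] entries.
    The first M0 coordinates have mean zero (binding moments)."""
    lam = np.zeros(M)
    lam[M0:] = 1.0                      # last M - M0 risk differences are -mu
    v = rng.uniform(1.0, 2.0, size=M)   # diagonal of V (held fixed in the paper)
    return -mu * lam + np.sqrt(v) * rng.standard_normal((n, M))

rng = np.random.default_rng(0)
d = draw_relative_risks(n=200, M=50, M0=45, mu=1.0, rng=rng)
d_bar = d.mean(axis=0)                  # sample mean, the analogue of \hat d_n
sigma_hat = d.std(axis=0, ddof=1)       # studentization for the test statistics
```

Feeding `d_bar` and `sigma_hat` into the test procedures of Sections 2 and 4 then yields one Monte Carlo repetition.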
6. Conclusion
This article shows that the hybrid test proposed by Song (2012) is not pointwise asymptotically of level $\alpha$. We identify the cause of the size distortion: the least favorable principle underlying the approach of White (2000) no longer holds for the hybrid test. As a result, the bootstrap procedure, which centers the bootstrap sample at zero, approximates a distribution that is irrelevant to the test statistics. We modify the hybrid test by adopting the generalized moment selection method of Andrews and Soares (2010) and show that the modified hybrid test is uniformly asymptotically of level $\alpha$. We expect this article to shed light on the asymptotic properties of the hybrid test and to help practitioners conduct valid inference.

Table 2: Simulated Rejection Probabilities

               alpha = 0.10          alpha = 0.05          alpha = 0.01
  M   M1    Hyb.  Boot. Simu.     Hyb.  Boot. Simu.     Hyb.  Boot. Simu.
 50   50   0.106 0.083 0.083    0.055 0.046 0.043    0.016 0.011 0.011
 50   45   0.208 0.056 0.057    0.149 0.029 0.028    0.070 0.006 0.007
 50   40   0.192 0.052 0.053    0.139 0.029 0.027    0.060 0.005 0.005
 50   35   0.164 0.052 0.051    0.113 0.024 0.026    0.050 0.005 0.005
 50   30   0.139 0.053 0.053    0.095 0.028 0.028    0.047 0.007 0.007
 50   25   0.115 0.052 0.052    0.086 0.030 0.028    0.038 0.007 0.006
 50   20   0.102 0.062 0.060    0.074 0.033 0.033    0.036 0.008 0.008
 50   15   0.075 0.052 0.052    0.051 0.028 0.026    0.024 0.007 0.006
 50   10   0.053 0.055 0.055    0.036 0.027 0.026    0.016 0.008 0.007
 50    5   0.025 0.054 0.054    0.016 0.026 0.025    0.006 0.005 0.004
100  100   0.103 0.081 0.083    0.053 0.042 0.041    0.011 0.009 0.008
100   95   0.219 0.062 0.061    0.157 0.033 0.032    0.073 0.010 0.008
100   90   0.225 0.062 0.061    0.162 0.030 0.031    0.073 0.007 0.006
100   85   0.208 0.065 0.064    0.147 0.035 0.032    0.068 0.008 0.008
100   80   0.190 0.060 0.060    0.137 0.032 0.032    0.059 0.009 0.009
100   75   0.185 0.060 0.061    0.133 0.030 0.029    0.057 0.007 0.008
100   70   0.172 0.057 0.058    0.116 0.029 0.029    0.052 0.008 0.008
100   65   0.155 0.0572 0.057   0.111 0.028 0.027    0.048 0.007 0.007
100   60   0.147 0.057 0.058    0.102 0.030 0.030    0.044 0.007 0.006
100   55   0.136 0.053 0.053    0.090 0.029 0.029    0.038 0.008 0.007

NOTE: Hyb., Boot., and Simu. refer to the hybrid test and the modified hybrid test with bootstrap-based and simulation-based critical values, respectively.
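The proof of Theorem 1 in Appendix B.1 notes that $\bar\varepsilon$, and hence $\bar\alpha$, has no closed form but can be approximated numerically for fixed $M$, $M_1$, and $\gamma$. A minimal sketch of that search, using our reconstructed reading of $a_\gamma$ and $h_\gamma$ from the proof (the branch forms of $a_\gamma$ derived from Equation (25) and the illustrative values of $M$, $M_1$, and $\gamma$ below are assumptions):

```python
import numpy as np

def a_gamma(x, M, gamma):
    """alpha as a function of k = Phi(c^s); reconstructed from Eq. (25)."""
    x = np.asarray(x, dtype=float)
    upper = (1 - 2 * x**M + (2 * x - 1)**M) / gamma   # branch for c^s >= 0
    lower = (1 - 2 * x**M) / gamma                    # branch for c^s < 0
    return np.where(x >= 0.5, upper, lower)

def h_gamma(x, M, M1, gamma):
    # The proof needs h(x) = 1 - x^{M1} - a_gamma(x) > 0 on (1 - eps_bar, 1)
    return 1 - np.asarray(x, dtype=float)**M1 - a_gamma(x, M, gamma)

M, M1, gamma = 10, 5, 0.5                 # illustrative values
xs = np.linspace(1.0, 0.5, 100_001)[1:]   # scan leftwards from x = 1
x_bar = xs[np.argmin(h_gamma(xs, M, M1, gamma) > 0)]  # first x with h <= 0
alpha_bar = min(float(a_gamma(x_bar, M, gamma)), 1 - 2.0**(-M), 1 / (2 * gamma))
print(x_bar, alpha_bar)
```

Under the conditions of Theorem 1, the hybrid test then over-rejects for every $\alpha\in(0,\bar\alpha)$.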
Appendix A: Values of $\bar\alpha$ under various $\gamma$

Table A.1: The values of $\bar\alpha$ in Theorem 1 with $M = 10, 20, \ldots$, $M_1 = kM$ for $k = 0.1, 0.2, \ldots, 1$, and a given $\gamma$.

Appendix B: Proofs
B.1 Proof of Theorem 1
This proof consists of four steps. In the first step, we obtain the asymptotic distributions of the two test statistics. In the second and third steps, we derive the probability limits of $\hat c^{s*}_n$ and $\hat c^{r*}_n$, respectively. In the last step, we show the existence of $\bar\alpha$ which satisfies the conclusion.

Table A.2: The values of $\bar\alpha$ in Theorem 1 with $M = 10, 20, \ldots$, $M_1 = kM$ for $k = 0.1, 0.2, \ldots, 1$, and a given $\gamma$.

Step 1:
Let $P$ be a distribution which satisfies all three conditions in the theorem. Then $P$ conforms to the null hypothesis. Let $\mathcal M_1$ denote the set of indices with zero mean, i.e., $\mathcal M_1 = \{m\in\mathcal M : d_m = 0\}$. By the first two conditions, both sets $\mathcal M\setminus\mathcal M_1$ and $\mathcal M_1$ are non-empty. Define a diagonal matrix $\hat D_n$ by $\hat D_n \equiv \mathrm{diag}(\hat\sigma^2_{n,1},\ldots,\hat\sigma^2_{n,M})$. The third condition, together with Assumption 1 and Assumption 3, implies that
$$\sqrt n\,\hat D_n^{-1/2}(\hat d_n - d) \xrightarrow{d} Z \equiv (Z_1,\ldots,Z_M)' \qquad (13)$$
as $n$ diverges to infinity, where $Z$ is a random vector from the $M$-dimensional standard normal distribution.

Define a function $f:\mathbb R^M\to\mathbb R^2$ by
$$f(x) = \Big(\max_{m\in\mathcal M} x_m,\ \min\big(\max_{m\in\mathcal M} x_m,\ \max_{m\in\mathcal M}(-x_m)\big)\Big)^t \qquad (14)$$
where $x = (x_1,\ldots,x_M)^t$. Clearly, both mappings $x\mapsto\max_{m\in\mathcal M}x_m$ and $x\mapsto\min(\max_{m\in\mathcal M}x_m,\max_{m\in\mathcal M}(-x_m))$ are continuous, which implies that $f$ is also a continuous mapping. The vector of test statistics can be written as $(\hat T^r_n,\hat T^s_n)^t = f(\sqrt n\,\hat D_n^{-1/2}\hat d_n)$.

We want to obtain the limiting distribution of the vector of test statistics. More precisely, we want to show $(\hat T^r_n,\hat T^s_n)^t \xrightarrow{d} (\max_{m\in\mathcal M_1}Z_m,\ \max_{m\in\mathcal M_1}Z_m)^t$. By the definition of weak convergence, this means that
$$\lim_{n\to\infty}\Big|E\big[g\big(f(\sqrt n\,\hat D_n^{-1/2}\hat d_n)\big)\big] - E\big[g\big(\max_{m\in\mathcal M_1}Z_m,\ \max_{m\in\mathcal M_1}Z_m\big)\big]\Big| = 0 \qquad (15)$$
for any bounded continuous function $g:\mathbb R^2\to\mathbb R$.

To this end, we define two events $E_1$ and $E_2$ by
$$E_1 \equiv \Big\{\min_{m\in\mathcal M_1}\frac{\hat d_{n,m}}{\hat\sigma_{n,m}} < \max_{m\in\mathcal M\setminus\mathcal M_1}\frac{\hat d_{n,m}}{\hat\sigma_{n,m}}\Big\} \quad\text{and}\quad E_2 \equiv \Big\{\max_{m\in\mathcal M_1}\frac{\hat d_{n,m}}{\hat\sigma_{n,m}} > \max_{m\in\mathcal M\setminus\mathcal M_1}\Big(-\frac{\hat d_{n,m}}{\hat\sigma_{n,m}}\Big)\Big\}.$$
Then, by rearranging the terms and subtracting $\max_{m\in\mathcal M\setminus\mathcal M_1} d_m/\sigma_m$ on both sides in the event $E_1$, we have
$$\lim_{n\to\infty}P\{E_1\} = \lim_{n\to\infty}P\Big\{-\max_{m\in\mathcal M\setminus\mathcal M_1}\frac{d_m}{\sigma_m} < -\min_{m\in\mathcal M_1}\frac{\hat d_{n,m}}{\hat\sigma_{n,m}} + \max_{m\in\mathcal M\setminus\mathcal M_1}\frac{\hat d_{n,m}}{\hat\sigma_{n,m}} - \max_{m\in\mathcal M\setminus\mathcal M_1}\frac{d_m}{\sigma_m}\Big\} = 0.$$
The last equality holds from the facts that
$$\min_{m\in\mathcal M_1}\frac{\hat d_{n,m}}{\hat\sigma_{n,m}} \xrightarrow{p} 0 \quad\text{and}\quad \max_{m\in\mathcal M\setminus\mathcal M_1}\frac{\hat d_{n,m}}{\hat\sigma_{n,m}} - \max_{m\in\mathcal M\setminus\mathcal M_1}\frac{d_m}{\sigma_m} \xrightarrow{p} 0$$
as $n$ diverges to infinity, which are implied by Equation (13) and the continuous mapping theorem. Similarly, we can show that $\lim_{n\to\infty}P\{E_2\} = 0$. Let $1_{E_j}$ denote the indicator function which takes value one if the event $E_j$ occurs, for $j\in\{1,2\}$, and zero otherwise. Then Equation (15) holds by the following argument. For any bounded continuous function $g:\mathbb R^2\to\mathbb R$, write
$$W_n \equiv g\big(f(\sqrt n\,\hat D_n^{-1/2}\hat d_n)\big) - g\big(\max_{m\in\mathcal M_1}Z_m,\ \max_{m\in\mathcal M_1}Z_m\big)$$
and
$$V_n \equiv g\Big(\max_{m\in\mathcal M_1}\sqrt n\,\hat d_{n,m}/\hat\sigma_{n,m},\ \max_{m\in\mathcal M_1}\sqrt n\,\hat d_{n,m}/\hat\sigma_{n,m}\Big) - g\big(\max_{m\in\mathcal M_1}Z_m,\ \max_{m\in\mathcal M_1}Z_m\big).$$
Then we have
$$\begin{aligned}
\lim_{n\to\infty}|E[W_n]| &\le \lim_{n\to\infty}\big|E\big[W_n\{1_{E_1}+(1-1_{E_1})1_{E_2}\}\big]\big| + \lim_{n\to\infty}\big|E\big[W_n\{1-1_{E_1}\}\{1-1_{E_2}\}\big]\big|\\
&\le \lim_{n\to\infty}2\sup_{x\in\mathbb R^2}|g(x)|\,\big\{P\{E_1\}+P\{E_2\}\big\} + \lim_{n\to\infty}\big|E\big[W_n\{1-1_{E_1}\}\{1-1_{E_2}\}\big]\big|\\
&= \lim_{n\to\infty}\big|E\big[V_n\{1-1_{E_1}\}\{1-1_{E_2}\}\big]\big| \qquad (16)\\
&\le \lim_{n\to\infty}|E[V_n]| + \lim_{n\to\infty}\big|E\big[V_n\{1_{E_1}+1_{E_2}-1_{E_1}1_{E_2}\}\big]\big|\\
&\le \lim_{n\to\infty}|E[V_n]| + \lim_{n\to\infty}2\sup_{x\in\mathbb R^2}|g(x)|\,\big\{P\{E_1\}+P\{E_2\}\big\} = 0.
\end{aligned}$$
The first inequality holds by the triangle inequality. The second inequality uses the fact that $g$ is bounded and that $(1-1_{E_1})1_{E_2}\le 1_{E_2}$. The equality in (16) holds because the probabilities of the two events $E_1$ and $E_2$ converge to zero, and because
$$f(\sqrt n\,\hat D_n^{-1/2}\hat d_n) = \Big(\max_{m\in\mathcal M_1}\sqrt n\,\hat d_{n,m}/\hat\sigma_{n,m},\ \max_{m\in\mathcal M_1}\sqrt n\,\hat d_{n,m}/\hat\sigma_{n,m}\Big)^t$$
conditional on the event $E_1^c\cap E_2^c$. The third inequality holds by the triangle inequality again. In the last line, the first term vanishes by Equation (13), the continuous mapping theorem, and the definition of weak convergence, and the second term, obtained by bounding $g$ with its supremum, converges to zero because the probabilities of the two events converge to zero. Therefore, the vector of test statistics $(\hat T^r_n,\hat T^s_n)$ has the desired limiting distribution.

Step 2:
We obtain the probability limits of the critical values $\hat c^{r*}_n$ and $\hat c^{s*}_n$. Assumption 3 states that
$$\sup_{x\in\mathbb R^M}\Big|P^*\big\{\sqrt n(\hat d^*_{n,b}-\hat d_n)\le x\big\} - P\big\{\sqrt n(\hat d_n-d)\le x\big\}\Big| \xrightarrow{p} 0$$
as $n$ diverges to infinity, where $P^*$ denotes the bootstrap probability measure. Define a random vector $(Z^r,Z^s)^t\equiv f(Z)$, where $f$ is defined in Equation (14). Then the continuous mapping theorem implies that
$$\sup_{x,y\in\mathbb R}\Big|P^*\big\{\hat T^{r*}_{n,b}\le x,\ \hat T^{s*}_{n,b}\le y\big\} - P\big\{Z^r\le x,\ Z^s\le y\big\}\Big|\xrightarrow{p}0 \qquad (17)$$
as $n$ diverges to infinity. Since the mapping $(x,y)\mapsto y$ that selects a coordinate is continuous, we have $\sup_{y\in\mathbb R}\big|P^*\{\hat T^{s*}_{n,b}\le y\}-P\{Z^s\le y\}\big|\xrightarrow{p}0$. Consider the distribution function of $Z^s$; it can be shown to be continuous and strictly increasing. Given this, Lemma 11.2.1 of Lehmann and Romano (2006) gives us that
$$\hat c^{s*}_n\xrightarrow{p}c^s\equiv\inf\big\{y\in\mathbb R : P\{Z^s\le y\}\ge 1-\alpha\gamma\big\} \qquad (18)$$
for any $\alpha\in(0,1)$ and $\gamma\in(0,1)$.

Step 3:
To obtain the probability limit of the second critical value $\hat c^{r*}_n$, we start by showing that
$$\sup_{x\in\mathbb R}|G_n(x)| \equiv \sup_{x\in\mathbb R}\Big|P^*\big\{\hat T^{r*}_{n,b}1\{\hat T^{s*}_{n,b}\le\hat c^{s*}_n\}\le x\big\} - P\big\{Z^r1\{Z^s\le c^s\}\le x\big\}\Big|\xrightarrow{p}0. \qquad (19)$$
The term on the left-hand side can be bounded as follows:
$$\begin{aligned}
\sup_{x\in\mathbb R}|G_n(x)| &\le \sup_{x<0}|G_n(x)| + \sup_{x\ge0}|G_n(x)|\\
&= \sup_{x<0}\Big|P^*\{\hat T^{r*}_{n,b}\le x,\ \hat T^{s*}_{n,b}\le\hat c^{s*}_n\} - P\{Z^r\le x,\ Z^s\le c^s\}\Big| + \sup_{x\ge0}|G_n(x)|\\
&= \sup_{x<0}\Big|P^*\{\hat T^{r*}_{n,b}\le x,\ \hat T^{s*}_{n,b}\le\hat c^{s*}_n\} - P\{Z^r\le x,\ Z^s\le c^s\}\Big|\\
&\quad+\sup_{x\ge0}\Big|P^*\{\hat T^{r*}_{n,b}\le x,\ \hat T^{s*}_{n,b}\le\hat c^{s*}_n\} + P^*\{\hat T^{s*}_{n,b}>\hat c^{s*}_n\} - P\{Z^r\le x,\ Z^s\le c^s\} - P\{Z^s>c^s\}\Big| \qquad (20)\\
&\le 2\sup_{x\in\mathbb R}\Big|P^*\{\hat T^{r*}_{n,b}\le x,\ \hat T^{s*}_{n,b}\le\hat c^{s*}_n\} - P\{Z^r\le x,\ Z^s\le c^s\}\Big| + \Big|P^*\{\hat T^{s*}_{n,b}>\hat c^{s*}_n\} - P\{Z^s>c^s\}\Big|\\
&\le 2\sup_{x\in\mathbb R}\Big|P^*\{\hat T^{r*}_{n,b}\le x,\ \hat T^{s*}_{n,b}\le\hat c^{s*}_n\} - P\{Z^r\le x,\ Z^s\le\hat c^{s*}_n\}\Big|\\
&\quad+2\sup_{x\in\mathbb R}\Big|P\{Z^r\le x,\ Z^s\le\hat c^{s*}_n\} - P\{Z^r\le x,\ Z^s\le c^s\}\Big| + \Big|P^*\{\hat T^{s*}_{n,b}>\hat c^{s*}_n\} - P\{Z^s>c^s\}\Big|.
\end{aligned}$$
The first inequality holds by the triangle inequality. The first equality follows from the fact that, for $\hat T^{r*}_{n,b}1\{\hat T^{s*}_{n,b}\le\hat c^{s*}_n\}$ to take a negative value, the indicator function must equal one. Similarly, we get the second equality by decomposing $G_n(x)$ into the two cases where the indicator function is zero or one. For the remaining inequalities, we use that the supremum is a non-decreasing set operator together with the triangle inequality.

Now we show that all three terms in the last line converge to zero in probability. The convergence of the first term comes from
$$\sup_{x\in\mathbb R}\Big|P^*\{\hat T^{r*}_{n,b}\le x,\ \hat T^{s*}_{n,b}\le\hat c^{s*}_n\} - P\{Z^r\le x,\ Z^s\le\hat c^{s*}_n\}\Big| \le \sup_{x,y\in\mathbb R}\Big|P^*\{\hat T^{r*}_{n,b}\le x,\ \hat T^{s*}_{n,b}\le y\} - P\{Z^r\le x,\ Z^s\le y\}\Big|$$
and Equation (17). To show the convergence of the second term, define the joint distribution function of $(Z^r,Z^s)$ and the marginal distribution function of $Z^s$ by $F_{rs}(x,y)\equiv P\{Z^r\le x,\ Z^s\le y\}$ and $F_s(y)\equiv P\{Z^s\le y\}$. The continuity of $F_{rs}$ and the convergence of $\hat c^{s*}_n$ imply the pointwise convergence $F_{rs}(x,\hat c^{s*}_n) - F_{rs}(x,c^s)\xrightarrow{p}0$ for any $x\in\mathbb R$. The same logic gives the pointwise convergence of the conditional distribution function,
$$\frac{F_{rs}(x,\hat c^{s*}_n)}{F_s(\hat c^{s*}_n)} - \frac{F_{rs}(x,c^s)}{F_s(c^s)}\xrightarrow{p}0$$
for any $x\in\mathbb R$. Now we can extend this pointwise convergence into uniform convergence over the real line by applying Theorem 11.2.9 of Lehmann and Romano (2006) to the two conditional distributions, as the conditional distribution function $F_{rs}(x,c^s)/F_s(c^s)$ is continuous. Once we have the uniform convergence of the conditional distributions, we have
$$\begin{aligned}
\sup_{x\in\mathbb R}\big|P\{Z^r\le x,\ Z^s\le\hat c^{s*}_n\} - P\{Z^r\le x,\ Z^s\le c^s\}\big| &= \sup_{x\in\mathbb R}\big|F_{rs}(x,\hat c^{s*}_n) - F_{rs}(x,c^s)\big|\\
&\le \sup_{x\in\mathbb R}\Big|\frac{F_{rs}(x,\hat c^{s*}_n)}{F_s(\hat c^{s*}_n)} - \frac{F_{rs}(x,c^s)}{F_s(c^s)}\Big|\cdot F_s(\hat c^{s*}_n) + \sup_{x\in\mathbb R}\Big|\frac{F_{rs}(x,c^s)}{F_s(c^s)}\Big|\cdot\big|F_s(\hat c^{s*}_n)-F_s(c^s)\big| \xrightarrow{p} 0.
\end{aligned}$$
The convergence of the third term is straightforward.

Given this result, let us obtain the probability limit of the critical value $\hat c^{r*}_n$. We cannot directly apply Lemma 11.2.1 of Lehmann and Romano (2006) as in Step 2 because the distribution of $Z^r1\{Z^s\le c^s\}$ is discontinuous at zero. Let $\alpha\in(0,\,1-2^{-M})$. Then we have
$$\begin{aligned}
P\{Z^r1\{Z^s\le c^s\}\le 0\} &= P\{Z^r\le 0,\ Z^s\le c^s\} + P\{Z^s>c^s\}\\
&= P\{Z^r\le 0,\ Z^s\le c^s\} + \alpha\gamma\\
&\le \min\big(P\{Z^r\le 0\},\ P\{Z^s\le c^s\}\big) + \alpha\gamma \qquad (21)\\
&= \min\big(2^{-M},\ 1-\alpha\gamma\big) + \alpha\gamma\\
&< 1-\alpha+\alpha\gamma = 1-\alpha(1-\gamma)
\end{aligned}$$
if $\alpha<1-2^{-M}$. The second equality holds by the definition of $c^s$. The third equality holds because $P\{Z^r\le0\} = P\{Z_m\le0\ \text{for all } m\in\mathcal M\} = \Phi^M(0) = 2^{-M}$, where $\Phi(\cdot)$ is the cumulative distribution function of the standard normal distribution, and again by the definition of $c^s$. This result guarantees that the $1-\alpha(1-\gamma)$ quantile of $Z^r1\{Z^s\le c^s\}$ is strictly positive given that $\alpha<1-2^{-M}$. As the distribution function $P\{Z^r1\{Z^s\le c^s\}\le x\}$ is continuous and strictly increasing over the interval $[0,\infty)$, we have
$$\hat c^{r*}_n\xrightarrow{p}c^r\equiv\inf\big\{x\in\mathbb R : P\{Z^r1\{Z^s\le c^s\}\le x\}\ge 1-\alpha(1-\gamma)\big\} \qquad (22)$$
by Lemma 11.2.1 of Lehmann and Romano (2006).

Step 4:
We show that there exists $\bar\alpha$ which makes the probability of rejecting the null hypothesis strictly greater than $\alpha$ for all $\alpha\in(0,\bar\alpha)$.

First, we compute a lower bound for the limiting rejection probability. The test function $\phi_n$ is defined by $\phi_n \equiv 1\{\hat T^s_n>\hat c^{s*}_n\} + 1\{\hat T^s_n\le\hat c^{s*}_n\}1\{\hat T^r_n>\hat c^{r*}_n\}$. Given the distribution $P$, the limiting rejection probability is
$$\lim_{n\to\infty}E_P[\phi_n] = \lim_{n\to\infty}P\{\hat T^s_n>\hat c^{s*}_n \text{ or } \hat T^r_n>\hat c^{r*}_n\} = P\Big\{\max_{m\in\mathcal M_1}Z_m > \min(c^s,c^r)\Big\}. \qquad (23)$$
This holds by the weak convergence result in Equation (15), by the convergence of the critical values in Equations (18) and (22), and by the Slutsky theorem. Define
$$k \equiv k(\alpha) \equiv \Phi(c^s) = 1-\Phi(-c^s). \qquad (24)$$
Note that $k$ is a function of $\alpha$ as $c^s$ depends on $\alpha$. The limiting rejection probability in Equation (23) is bounded from below by $1-k^{M_1}$ because it holds that
$$P\Big\{\max_{m\in\mathcal M_1}Z_m > \min(c^s,c^r)\Big\} \ge P\Big\{\max_{m\in\mathcal M_1}Z_m > c^s\Big\} = 1-k^{M_1}$$
where $M_1 = |\mathcal M_1|\ge1$. Therefore, in order to attain the conclusion, it is sufficient to find $\alpha$ satisfying $1-k^{M_1}>\alpha$.

Now let us consider the relationship between $k$ defined in Equation (24) and $\alpha$. The definition of $c^s$ provides the connection between the two. By the definition of $c^s$, we have
$$\begin{aligned}
\alpha\gamma &= P(Z^s>c^s)\\
&= P\Big(\min\big(\max_{m\in\mathcal M}Z_m,\ -\min_{m\in\mathcal M}Z_m\big)>c^s\Big)\\
&= P\Big(\max_{m\in\mathcal M}Z_m>c^s \text{ and } -\min_{m\in\mathcal M}Z_m>c^s\Big)\\
&= 1-P\Big(\max_{m\in\mathcal M}Z_m\le c^s \text{ or } \min_{m\in\mathcal M}Z_m\ge-c^s\Big)\\
&= 1-\Big\{P\big(\max_{m\in\mathcal M}Z_m\le c^s\big) + P\big(\min_{m\in\mathcal M}Z_m\ge-c^s\big) - P\big(\forall m\in\mathcal M,\ -c^s\le Z_m\le c^s\big)\Big\} \qquad (25)\\
&= \begin{cases} 1-2k^M+(2k-1)^M & \text{if } c^s\ge0\\ 1-2k^M & \text{if } c^s<0.\end{cases}
\end{aligned}$$
The first case holds because Equation (25) equals $1-\{\Phi^M(c^s)+(1-\Phi(-c^s))^M-(\Phi(c^s)-\Phi(-c^s))^M\}$. The second case holds because $P\{-c^s\le Z_m\le c^s\}=0$ for any $m\in\mathcal M$ if $c^s<0$. Recall that the tuning parameter $\gamma\in(0,1]$ is fixed. Following Equation (25), define a function $a_\gamma:[0,1]\to[0,\gamma^{-1}]$ by
$$a_\gamma(x) \equiv \begin{cases}\gamma^{-1}\big(1-2x^M+(2x-1)^M\big) & \text{if } x\in[0.5,1]\\ \gamma^{-1}\big(1-2x^M\big) & \text{if } x\in[0,0.5).\end{cases}$$
It is easy to check that $a_\gamma$ is continuous on $[0,1]$ and $a'_\gamma(x)<0$ for $x\in(0,1)$, so $a_\gamma$ is bijective. In other words, for $k\in[0,1]$ there is a one-to-one relation between $k$ and $\alpha$, and $a_\gamma$ is the inverse function of $k(\alpha)$.

Given this finding, let us obtain the set of values for $\alpha$ satisfying $1-k^{M_1}>\alpha$. Specifically, we find the values of $x\in[0.5,1]$ satisfying the following condition:
$$h_\gamma(x) \equiv 1-x^{M_1}-a_\gamma(x) > 0,$$
where $h_\gamma$ is a real-valued function defined on $[0,1]$ with $h_\gamma(1)=0$ and $\lim_{x\to1^-}h'_\gamma(x)<0$. Hence there exists $\bar\varepsilon\in(0,0.5)$ satisfying $h_\gamma(x)>0$ for any $x\in(1-\bar\varepsilon,1)$. There is no closed-form expression for $\bar\varepsilon$ because it is the solution to an $M$th-degree polynomial equation; however, for fixed $M$ and $M_1$, the value of $\bar\varepsilon$ can be numerically approximated, and so can $\bar\alpha$. Therefore, any value of $\alpha$ in the interval $(0,a_\gamma(1-\bar\varepsilon))$ satisfies $1-k^{M_1}>\alpha$. Recall that Equation (21) in Step 3 requires $\alpha$ to be less than $1-2^{-M}$. As a result, we obtain the desired result by setting $\bar\alpha = \min\big(a_\gamma(1-\bar\varepsilon),\ 1-2^{-M},\ (2\gamma)^{-1}\big)$.

B.2 Proof of Lemma 1

In this section, we prove Lemma 1 in the article by modifying the proofs of Theorem 1 and Lemma 2 in Andrews and Soares (2010). Their results do not directly apply to our setting. Specifically, our test statistics violate Assumption 1(a) and Assumption 3 in Andrews and Soares (2010). Assumption 1(a) requires that the test statistic be monotone in $\hat d_n$, but our statistic $\tilde T^s_n$ is not monotone in $\hat d_n$, as the function $f(x)=\min\big(\max_m(x_m\vee0),\ \max_m(-x_m\vee0)\big)$ is non-increasing in $x$ for $x\le0$ and non-decreasing for $x\ge0$. Assumption 3 requires $f(x)$ to be strictly positive if and only if $x_m>0$ for some $m\in\mathcal M$. It is easy to check that $f(x)=0$ if $M=2$, $x_1>0$, and $x_2>0$. To get around the problem, we use a special feature of the SPA testing problem, namely that $\hat d_n$ converges in probability to a non-stochastic $d$. Andrews and Soares (2010) posit a more general setting than ours in that they assume $\sqrt n(\hat d_n-E_{P_n}[\hat d_n])$ is asymptotically normal. Song (2012) indirectly imposes a constraint on the behaviour of $E_{P_n}[\hat d_n]$ through Assumption 1. While the monotonicity assumption is used to control the asymptotic behaviour of $E_{P_n}[\hat d_n]$, the assumption is not necessary in our setting. Besides, Assumption 3 in Andrews and Soares (2010) is not crucial and can easily be modified. As a result, our proof can be interpreted as a simplified version of Theorem 1 and Lemma 2 in Andrews and Soares (2010).

As in Andrews and Soares (2010), we give the proof only for the case where the simulation-based critical values $\tilde c^q_n(1-\alpha)$ for $q\in\{r,s\}$ are used. The other case, with the bootstrap critical values $\tilde c^{q*}_n(1-\alpha)$ for $q\in\{r,s\}$, can be shown in a similar manner.

Our proof consists of three steps. Step 1 corresponds to Theorem 1 of Andrews and Soares (2010). Step 2 is the version of their Lemma 2(a) without their Assumption 1(a). Step 3 follows their proof of Lemma 2(b) but does not use their Assumption 3.

Before we begin, we define some additional notation. Define a function $\psi:\mathbb R^M\to[-\infty,0]^M$ such that $\psi(\xi)=(\psi_1(\xi),\ldots,\psi_M(\xi))^t$ and
$$\psi_m(\xi) = \begin{cases}\xi_m & \text{if } \xi_m<-1\\ 0 & \text{if } \xi_m\ge-1\end{cases} \qquad m=1,\ldots,M,$$
where $\xi_m$ is the $m$th element of $\xi\in\mathbb R^M$. Given this notation, the moment selecting vector $\hat\psi_n$ can be written as $\psi(\hat\xi_n)$, where $\hat\xi_n\equiv\kappa_n^{-1}\sqrt n\,\hat D_n^{-1/2}\hat d_n$.

Step 1: We obtain a sequence of distributions in $\mathcal P_{n,0}$ along which the limiting rejection probability equals the asymptotic size of the test. For any $\alpha\in(0,1)$ and $\gamma\in(0,1)$, define
$$\mathrm{AsySize} \equiv \limsup_{n\to\infty}\sup_{P\in\mathcal P_{n,0}}P\big\{\tilde T^r_n>\tilde c^r_n(1-\alpha(1-\gamma)) \text{ or } \tilde T^s_n>\tilde c^s_n(1-\alpha\gamma)\big\}.$$
Then we can find a sequence $\{P_n\in\mathcal P_{n,0} : n\ge1\}$ such that
$$\mathrm{AsySize} = \limsup_{n\to\infty}P_n\big\{\tilde T^r_n>\tilde c^r_n(1-\alpha(1-\gamma)) \text{ or } \tilde T^s_n>\tilde c^s_n(1-\alpha\gamma)\big\}.$$
By definition, we can find a subsequence $\{u_n : n\ge1\}$ of $\{n\}$ such that
$$\mathrm{AsySize} = \lim_{n\to\infty}P_{u_n}\big\{\tilde T^r_{u_n}>\tilde c^r_{u_n}(1-\alpha(1-\gamma)) \text{ or } \tilde T^s_{u_n}>\tilde c^s_{u_n}(1-\alpha\gamma)\big\}.$$
We proceed without specifying $q$ unless necessary, as the statements are valid for both choices $q\in\{r,s\}$.

Step 2:
We find the probability limit of the critical value $\tilde c^q_n$.

We start by finding the probability limit of $\hat\xi_n$. Let $d$ be the probability limit of $\hat d_n$ in Assumption 1 along the sequence $\{P_n\}$. Define a vector $d^*=(d^*_1,\ldots,d^*_M)^t$ such that
$$d^*_m = \begin{cases}-\infty & \text{if } d_m<0\\ 0 & \text{if } d_m=0\end{cases} \qquad m=1,\ldots,M.$$
Define a distribution function $L^q(x)\equiv P\{S^q(\Omega_0^{1/2}Z+\psi(d^*))\le x\}$ for $x\in\mathbb R$, and let $c^q_{d^*}(1-\alpha)$ be the $1-\alpha$ quantile of $L^q$. Then it holds that $\hat\xi_n\xrightarrow{p}d^*$ because
$$\hat\xi_n \equiv \kappa_n^{-1}\sqrt n\,\hat D_n^{-1/2}\hat d_n = \hat D_n^{-1/2}D^{1/2}\big(\kappa_n^{-1}\sqrt n\,D^{-1/2}(\hat d_n-d) + \kappa_n^{-1}\sqrt n\,D^{-1/2}d\big) = (I_M+o_p(1))\cdot\big(O_p(\kappa_n^{-1})+d^*+o(1)\big) \xrightarrow{p} d^*$$
along $\{P_n\}$ as $n\to\infty$, by Assumptions 1 and 2, where $D$ is the probability limit of $\hat D_n$.

In Steps 2 and 3, we assume that $c^q_{d^*}(1-\alpha)>0$; this implies that $d^*\ne(-\infty)^M$. Next, we show that $P\{S^q(\Omega^{1/2}Z+\psi(\xi))\le x\}$ for $x>0$ is continuous in $(\xi,\Omega)$ at $(d^*,\Omega_0)$, where $\Omega_0$ is the probability limit of $\hat\Omega_n=\hat D_n^{-1/2}\hat\Sigma_n\hat D_n^{-1/2}$.

We start by showing that $\psi(\xi)\to\psi(d^*)$ for any $\xi\to d^*$. Suppose that $d^*_m=0$ for some $m\in\mathcal M$. Then $\psi_m(\xi)\to\psi_m(d^*)$ because $\psi_m$ is continuous at zero. If $d^*_m=-\infty$, then $\psi_m(d^*)=-\infty$, and it is straightforward that $\psi_m(\xi)\to-\infty$.

For any $(\xi,\Omega)\to(d^*,\Omega_0)$, the result above and the continuity of $S^q$ imply that $S^q(\Omega^{1/2}Z+\psi(\xi))\to S^q(\Omega_0^{1/2}Z+\psi(d^*))$ almost surely in $[Z]$, where $Z\sim N(0,I_M)$. In turn, we have
$$1\{S^q(\Omega^{1/2}Z+\psi(\xi))\le x\} \to 1\{S^q(\Omega_0^{1/2}Z+\psi(d^*))\le x\} \quad\text{almost surely in } [Z]$$
for any $x>0$, because the distribution function of $S^q(\Omega_0^{1/2}Z+\psi(d^*))$ is strictly increasing and continuous for $x>0$ when $d^*\ne(-\infty)^M$. The dominated convergence theorem gives us
$$P\{S^q(\Omega^{1/2}Z+\psi(\xi))\le x\} \to P\{S^q(\Omega_0^{1/2}Z+\psi(d^*))\le x\}$$
for any $x>0$. Therefore we obtain the claim.

Now we obtain the probability limit of the critical value $\tilde c^q_n$. The result above, $(\hat\xi_n,\hat\Omega_n)\xrightarrow{p}(d^*,\Omega_0)$, and the Slutsky theorem imply that
$$L^q_n(x) \equiv P\big\{S^q(\hat\Omega_n^{1/2}Z+\psi(\hat\xi_n))\le x\big\} \xrightarrow{p} L^q(x) \equiv P\big\{S^q(\Omega_0^{1/2}Z+\psi(d^*))\le x\big\}$$
for any $x>0$ along $\{P_n\}$ as $n\to\infty$, where $P$ in $L^q_n$ denotes the conditional probability given $(\hat\xi_n,\hat\Omega_n)$. Note that $\tilde c^q_n(1-\alpha)$ defined in Section 4 is the $1-\alpha$ quantile of $L^q_n$, and $c^q_{d^*}(1-\alpha)$ is the $1-\alpha$ quantile of $L^q$. Because we consider the case where $c^q_{d^*}(1-\alpha)>0$, we have $\tilde c^q_n(1-\alpha)\xrightarrow{p}c^q_{d^*}(1-\alpha)$ by Lemma 5 of Andrews and Guggenberger (2010).

Step 3:
We finally derive the result on the asymptotic size. Note that all the convergence results in Step 2 still hold when we replace $\{P_n\}$ with $\{P_{u_n}\}$. Using the same arguments as in Step 1 of the proof of Theorem 1, we can show that
$$\tilde T^q_{u_n} \equiv S^q\big(\hat D_{u_n}^{-1/2}\sqrt{u_n}\,\hat d_{u_n}\big) \xrightarrow{d} S^q\big(\Omega_0^{1/2}Z+\psi(d^*)\big)$$
along $\{P_{u_n}\}$ as $n\to\infty$. Then we have
$$\liminf_{n\to\infty}P_{u_n}\big\{\tilde T^q_{u_n}\le\tilde c^q_{u_n}(1-\alpha)\big\} \ge P\big\{S^q(\Omega_0^{1/2}Z+\psi(d^*))\le c^q_{d^*}(1-\alpha)\big\} \ge 1-\alpha,$$
where the first inequality uses the weak convergence above together with $\tilde c^q_{u_n}(1-\alpha)\xrightarrow{p}c^q_{d^*}(1-\alpha)>0$, and the second holds by the definition of the quantile $c^q_{d^*}(1-\alpha)$. The choice of $\{P_{u_n}\}$ guarantees that
$$\mathrm{AsySize} \equiv \limsup_{n\to\infty}\sup_{P\in\mathcal P_{n,0}}P\big\{\tilde T^r_n>\tilde c^r_n(1-\alpha(1-\gamma)) \text{ or } \tilde T^s_n>\tilde c^s_n(1-\alpha\gamma)\big\} \le \limsup_{n\to\infty}\Big[P_{u_n}\big\{\tilde T^r_{u_n}>\tilde c^r_{u_n}(1-\alpha(1-\gamma))\big\} + P_{u_n}\big\{\tilde T^s_{u_n}>\tilde c^s_{u_n}(1-\alpha\gamma)\big\}\Big] \le \alpha,$$
where the first inequality holds by sub-additivity of the probability measure.

Step 4:
Finally, we show that the conclusion still holds even if $c^q_{d^*}(1-\alpha)=0$. For any $\alpha\in(0,1)$, we have
$$\begin{aligned}
P_{u_n}\big\{\tilde T^r_{u_n}\le\tilde c^r_{u_n}(1-\alpha)\big\} &\ge P_{u_n}\big\{\tilde T^r_{u_n}\le c^r_{d^*}(1-\alpha)\big\}\\
&= P_{u_n}\big\{\hat D_{u_n}^{-1/2}\sqrt{u_n}\,\hat d_{u_n}\le c^r_{d^*}(1-\alpha)\big\}\\
&\to P\big\{\Omega_0^{1/2}Z+d^*\le c^r_{d^*}(1-\alpha)\big\} = P\big\{S^r(\Omega_0^{1/2}Z+d^*)\le c^r_{d^*}(1-\alpha)\big\} \ge 1-\alpha,
\end{aligned}$$
where the first inequality holds because $\tilde c^r_{u_n}(1-\alpha)$ is non-negative; the first equality holds by the definition of $\tilde T^r_{u_n}$, where the inequality inside the probability holds component-wise; the convergence holds by Assumption 1; and the last equality holds by the definition of $S^r$. Similarly, we have
$$\begin{aligned}
P_{u_n}\big\{\tilde T^s_{u_n}\le\tilde c^s_{u_n}(1-\alpha)\big\} &\ge P_{u_n}\big\{\tilde T^s_{u_n}\le c^s_{d^*}(1-\alpha)\big\}\\
&= P_{u_n}\big\{\hat D_{u_n}^{-1/2}\sqrt{u_n}\,\hat d_{u_n}\le c^s_{d^*}(1-\alpha) \text{ or } -\hat D_{u_n}^{-1/2}\sqrt{u_n}\,\hat d_{u_n}\le c^s_{d^*}(1-\alpha)\big\}\\
&\to P\big\{\Omega_0^{1/2}Z+d^*\le c^s_{d^*}(1-\alpha) \text{ or } -(\Omega_0^{1/2}Z+d^*)\le c^s_{d^*}(1-\alpha)\big\} = P\big\{S^s(\Omega_0^{1/2}Z+d^*)\le c^s_{d^*}(1-\alpha)\big\} \ge 1-\alpha.
\end{aligned}$$

References
Andrews, D. W. and Guggenberger, P. (2010). Asymptotic size and a problem with subsampling and with the m out of n bootstrap. Econometric Theory, 26(2):426–468.

Andrews, D. W. and Soares, G. (2010). Inference for parameters defined by moment inequalities using generalized moment selection. Econometrica, 78(1):119–157.

Canay, I. and Shaikh, A. (2017). Practical and Theoretical Advances in Inference for Partially Identified Models. Volume 2, pages 271–306. Cambridge University Press.

Hansen, P. R. (2005). A test for superior predictive ability. Journal of Business & Economic Statistics, 23(4):365–380.

Kosorok, M. R. (2008). Introduction to Empirical Processes and Semiparametric Inference. Springer.

Lehmann, E. and Romano, J. (2006). Testing Statistical Hypotheses. Springer Texts in Statistics. Springer, New York.

Linton, O., Maasoumi, E., and Whang, Y.-J. (2005). Consistent testing for stochastic dominance under general sampling schemes. The Review of Economic Studies, 72(3):735–765.

Song, K. (2012). Testing predictive ability and power robustification. Journal of Business & Economic Statistics, 30(2):288–296.

West, K. D. (1996). Asymptotic inference about predictive ability. Econometrica, 64(5):1067–1084.

White, H. (2000). A reality check for data snooping. Econometrica, 68(5):1097–1126.