Equal Predictive Ability Tests for Panel Data with an Application to OECD and IMF Forecasts

Abstract

This paper develops novel tests to compare the predictive ability of two forecasters using panels. We consider two different equal predictive ability (EPA) hypotheses. The first hypothesis states that the predictive ability of two forecasters is equal on average over all periods and units. Under the second one, the EPA hypothesis holds jointly for all units. We study the asymptotic properties of the proposed tests using sequential limits under strong and weak cross-sectional dependence. Their finite sample properties are investigated via Monte Carlo simulations. They are applied to compare the economic growth forecasts of the OECD and the IMF using data from OECD countries.


arXiv [econ.EM]

Oguzhan Akgun (a), Alain Pirotte (a), Giovanni Urga (b) and Zhenlin Yang (c)

(a) CRED, University Paris II Panthéon-Assas, France
(b) Cass Business School, London, United Kingdom, and Bergamo University, Italy
(c) School of Economics, Singapore Management University, Singapore

March 6, 2020


Keywords: Cross-Sectional Dependence; Forecast Evaluation; Heterogeneity; Hypothesis Testing.

JEL classification: C12, C23.

† We wish to thank participants in the 18th International Workshop on Spatial Econometrics and Statistics (AgroParisTech, Paris, France, 23-24 May 2019), in particular the discussant Paul Elhorst and Davide Fiaschi, the 39th International Symposium on Forecasting (Thessaloniki, Greece, 16-19 June 2019), the 25th International Panel Data Conference (Vilnius University, Vilnius, Lithuania, 4-5 July 2019), and the 22nd Dynamic Econometrics Conference (Nuffield College, University of Oxford, Oxford, United Kingdom, 9-10 September 2019). The usual disclaimer applies. This paper is a part of Oguzhan Akgun's PhD research project, and it was developed while Giovanni Urga and Zhenlin Yang were visiting professors at University Paris II Panthéon-Assas - CRED (France) during the academic years 2017-2018 and 2018-2019, and Alain Pirotte was visiting professor at Bergamo University during the academic years 2017-2018 and 2018-2019 under the scheme "Progetto ITALY - Azione 2: Grants for Visiting Professor and Scholar": we wish to thank both institutions for financial support. Financial support is also acknowledged from the Centre for Econometric Analysis (CEA) of Cass Business School for several visits of Alain Pirotte and Oguzhan Akgun at CEA.

Introduction

Formal tests of the null hypothesis of no difference in forecast accuracy using two time series of forecast errors have been widely discussed and formalized in the literature, for instance by Vuong (1989), Diebold and Mariano (1995, hereafter DM), West (1996), Clark and McCracken (2001, 2015), Giacomini and White (2006, hereafter GW), and Clark and West (2007), among others. The literature on panel data, by contrast, is scarce, with a few exceptions. The first is Davies, Lahiri, et al. (1995, hereafter DL), who focus on testing the unbiasedness and efficiency of forecasts made by several different agents for the same unit. Their analysis is based on a three-dimensional panel data regression where the dimensions are the agents generating the forecasts, target years and forecast horizons. The second is Timmermann and Zhu (2019, hereafter TZ), who focus on predictions produced for several different units.

The main aim of this paper is to propose tests for the equal predictive ability (EPA) hypothesis for panel data, taking into account both the time series and the cross-sectional features of the data. We propose tests that compare the predictive ability of two forecasters based on $n$ units, hence $n$ pairs of time series of observed forecast errors of length $T$. Various panel data tests of EPA are proposed, extending that of DM, which concerns a single time series. Contrary to DL, our tests are developed for forecasts made for different panel units.

We develop two types of tests. The first one focuses on EPA on average over all panel units and over time. This test is useful and of economic importance when the researcher is not interested in the differences in predictive ability for a specific unit but in the overall differences.
In the second type of tests we focus on the null hypothesis which states that the EPA holds for each panel unit.

The applied literature comparing the accuracy of two or more forecasts with panel data suggests that the forecast errors of similar units, such as countries, are affected by global common shocks, for instance by the global financial crisis. Pain, Lewis, Dang, Jin, and Richardson (2014) show that the economic growth projections of the OECD for the period 2007-2012 are systematically upward biased. A similar tendency exists for other forecasters, such as the IMF. Moreover, these effects are carried into the loss differentials, as the evidence that we provide in this paper shows. The results of Pain et al. (2014) also indicate that the effect of these common shocks is heterogeneous across economies and that some country clusters exist. Hence, it is expected that a small number of common factors cannot capture all the dependencies in the forecast errors and the loss differentials based on them.

Following these insights, we model the loss differentials by an approximate factor model where a small number of common factors affect all units in the panel with heterogeneous loadings and the idiosyncratic errors are cross-sectionally weakly correlated. We therefore allow the loss differentials to contain weak cross-sectional dependence (WCD) and strong cross-sectional dependence (SCD) simultaneously. To develop our tests, we follow the literature on principal components (PC) analysis of large dimensional approximate factor models (Bai, 2003; Bai & Ng, 2002) and covariance matrix estimation methods which are robust to spatial dependence based on geographic or economic distances between units (Kelejian & Prucha, 2007, hereafter KP).
We also propose a novel partial sample covariance estimator for the case of unknown distances which is robust to arbitrary WCD. We explore the asymptotic properties of the test statistics that we propose using sequential limits and the small sample properties via a Monte Carlo simulation exercise.

Our paper is most related to that of TZ but our tests differ from theirs in some important aspects. We consider the joint EPA hypothesis and we explore different ways to deal with WCD. Their overall test statistic $J_{n,T}$ is equivalent to our $S^{(3)}_{n,T}$ in the sense that they are both based on the estimation of the variance of the sample mean using cross-sectional averages of observations, a method proposed by Driscoll and Kraay (1998, hereafter DK). Hence, both statistics are robust to arbitrary CD. However, their performance is relatively weak in the case of small $T$ and large $n$, a well known problem in the literature. We propose test statistics that distinguish the cases of WCD and SCD, which provides some gain in small sample properties, as we show by Monte Carlo simulations.

TZ do not consider tests for the joint EPA hypothesis, which is important in practice, especially for macroeconomic forecasts. They consider overall EPA tests, tests for known time and cross-sectional clusters and tests for individual cross-sections. In their tests for time and cross-sectional clusters, the null hypothesis states that the EPA hypothesis holds for each cluster. To derive test statistics for this hypothesis, they assume that the observations are dependent within clusters and independent among them. Our tests can be easily generalized to testing the clustered EPA hypothesis while relaxing this assumption of independence.

Our joint EPA hypothesis can be seen as a special case of their clustered EPA hypothesis where the number of clusters equals $n$.
Apart from the fact that they assume independence among clusters, an important difference between their tests and the joint EPA tests we propose is that their tests may lack power, even if the clustered EPA hypothesis is violated, in the case that the overall EPA is satisfied. The tests we propose are consistent when the EPA is violated for even one cluster (unit).

Our testing framework is also more general than that of TZ as we do not place assumptions on the forecast errors, such as requiring that they follow a factor structure. Instead, following DM, we motivate our test statistics with assumptions on the loss differentials themselves and not on the models or methods of forecasting, as in West (1996) and GW. Moreover, contrary to TZ, we rely on an approximate factor model where the idiosyncratic errors are not necessarily independent among cross-sectional units.

Finally, the paper also contributes to the empirical literature. These tests are applied to compare the economic growth forecast errors of the OECD and the IMF. We investigate the equality of accuracy for different time periods and country samples.

The remainder of the paper is as follows: In Section 2, we present our model, the hypotheses of interest and statistics for panel tests of EPA. Section 3 investigates the small sample properties of these new tests. In Section 4, the predictive ability of the OECD and IMF are compared using their economic growth forecasts. Section 5 concludes.

We are interested in the forecast errors concerning an economic variable observed for time $t = 1, 2, \ldots, T$ and units $i = 1, 2, \ldots, n$. We assume that the loss differential of the errors takes the form

$$\Delta L_{i,t} = L(e_{1i,t}) - L(e_{2i,t}) = \mu_i + v_{i,t}, \tag{1}$$

$$v_{i,t} = \lambda_i' f_t + \varepsilon_{i,t}, \tag{2}$$

$$\varepsilon_{i,t} = \sum_{j=1}^{n} r_{ij}\,\epsilon_{j,t}, \tag{3}$$

where $L(\cdot)$ is a generic loss function and $e_{li,t}$ is the forecast error made by forecaster $l = 1, 2$ at time $t$ for unit $i$.
$f_t$ is an $m \times 1$ vector of common factors, $\lambda_i$ is the associated $m \times 1$ vector of factor loadings, and $r_{ij}$ are fixed but unknown elements of an $n \times n$ matrix $R_n$. The variables $f_t$ and $\epsilon_{i,t}$ are assumed to be zero mean weakly stationary time series allowed to be autocorrelated through time. In particular, we assume that $\epsilon_{i,t} = \sum_{h=0}^{\infty} c_{ih}\,\psi_{i,t-h}$, where $\psi_{i,t-h}$ are independently distributed random variables with zero mean and unit variance, and $c_{ih}$ are absolutely summable sequences of fixed coefficients for each $i$. We assume that the idiosyncratic terms $\varepsilon_{i,t}$ are weakly cross-sectionally dependent, in the sense that

$$\lim_{n\to\infty} \mathrm{Var}\left(n^{-1}\sum_{i=1}^{n}\varepsilon_{i,t}\right) = \lim_{n\to\infty}\left[n^{-2}\sum_{i,j=1}^{n} E(\epsilon_{i,t}\epsilon_{j,t})\, r_{i.}' r_{j.}\right] = 0, \quad \forall\, t = 1, 2, \ldots, T, \tag{4}$$

where $r_{i.} = (r_{i1}, r_{i2}, \ldots, r_{in})'$. The common component of the process, in contrast, implies SCD such that the variance of its cross-sectional average is bounded away from zero for any $n$ sufficiently large:

$$\mathrm{Var}\left(n^{-1}\sum_{i=1}^{n}\lambda_i' f_t\right) = n^{-2}\sum_{i,j=1}^{n}\lambda_i' E(f_t f_t')\lambda_j \ge \delta > 0, \quad \forall\, t = 1, 2, \ldots, T. \tag{5}$$

WCD can be modeled by a factor model with a possibly infinite number of weak common factors or by using spatial models. The definition of a weak common factor is due to Chudik, Pesaran, Tosetti, et al. (2011). Let us write (3) as a factor model without an idiosyncratic term: $\varepsilon_{i,t} = r_{i.}'\epsilon_{.,t}$ where $\epsilon_{.,t} = (\epsilon_{1,t}, \epsilon_{2,t}, \ldots, \epsilon_{n,t})'$. Then a variable $\epsilon_{i,t}$ is said to be a weak common factor if its associated factor loadings $r_{ij}$ satisfy $\lim_{n\to\infty}\sum_{j=1}^{n}|r_{ij}| = \delta < \infty$. If each variable $\epsilon_{i,t}$ satisfies this, we have the usual condition of spatial econometrics on the boundedness of the row and column sums of the spatial weights matrix $R_n$. If further structure is imposed on the WCD, the model in (3) contains as special cases all commonly used spatial processes like spatial autoregression (SAR), spatial moving average (SMA), and spatial error components (SEC), as well as their higher order versions.
In what follows we propose tests for both cases of known and unknown distances between panel units.

Assuming that $\mu_i$ are fixed parameters, a hypothesis of interest is

$$H_{0,1}: \bar\mu = 0, \tag{6}$$

where $\bar\mu = \frac{1}{n}\sum_{i=1}^{n}\mu_i$. This hypothesis states that the forecasts generated by the two agents are equally accurate on average over all $i$ and $t$. It is plausible to consider this in a micro forecasting study where the units can be seen as random draws from a population. If the researcher is not interested in the difference in predictive ability for a fixed unit but in the predictive ability on average, this hypothesis should be considered.

In a macro forecasting study, the differences for each unit can have a specific economic importance and may be of interest from a policy perspective. For instance, a question of interest is whether the forecasts made by agents are more accurate for a particular group of countries or for all countries in the sample. In this case, the null hypothesis can be formulated such that the predictive equality holds for each unit as

$$H_{0,2}: E(\Delta L_{i,t}) = \mu_i = 0, \quad \text{for all } i = 1, 2, \ldots, n. \tag{7}$$

Consider the sample mean loss differential over time and units:

$$\Delta\bar L_{n,T} = \frac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T}\Delta L_{i,t}.$$

We provide testing procedures for the overall EPA implied in (6) based on $\Delta\bar L_{n,T}$. Under conditions which we discuss below, in particular WCD of the loss differential series, meaning that $\lambda_i' f_t = 0$ and the idiosyncratic errors satisfy the condition in (4), this statistic satisfies a central limit theorem (CLT) given by

$$\sqrt{nT}\,(\Delta\bar L_{n,T} - \bar\mu_n)/\sigma_{n,T} \xrightarrow{D} N(0,1), \tag{8}$$

where $\bar\mu_n = n^{-1}\sum_{i=1}^{n}\mu_i$ and

$$\sigma^2_{n,T} = \frac{1}{nT}\sum_{i,j=1}^{n}\sum_{t,s=1}^{T} E(v_{i,t}v_{j,s}).$$

If the loss differentials carry SCD as defined in (5), the convergence rate is modified such that

$$\sqrt{T}\,(\Delta\bar L_{n,T} - \bar\mu_n)/\sigma_{n,T} \xrightarrow{D} N(0,1), \tag{9}$$

where

$$\sigma^2_{n,T} = \frac{1}{n^2T}\sum_{i,j=1}^{n}\sum_{t,s=1}^{T} E(v_{i,t}v_{j,s}).$$

The case of no CD.

Suppose that the loss differential is generated by (1)-(3) with $\lambda_i' f_t = 0$ and $r_{ij} = 0$ for every $i \ne j$. If the weak stationarity assumption is satisfied for each $i$, a sequential application of the CLT for weakly stationary time series (see, e.g. Anderson, 1971, Theorem 7.7.8) and the CLT for independent but heterogeneous sequences (see, e.g. White, 2001, Theorem 5.10) provides the result in (8) with $\sigma^2_{n,T} = \bar\sigma^2_n = n^{-1}\sum_{i=1}^{n}\sigma_i^2$, $\sigma_i^2 = \sum_{s=-\infty}^{\infty}\gamma_{v_i}(s)$, $\gamma_{v_i}(s) = E(v_{i,t}v_{i,t-s})$. The conditions for this result to be valid can be seen by writing $\sqrt{nT}\,(\Delta\bar L_{n,T} - \bar\mu_n)$ as $\frac{1}{\sqrt n}\sum_{i=1}^{n}\sqrt{T}\,(\Delta\bar L_{i,T} - \mu_i)$, where $\Delta\bar L_{i,T} = \frac{1}{T}\sum_{t=1}^{T}\Delta L_{i,t}$. As $T \to \infty$, $\sqrt{T}\,(\Delta\bar L_{i,T} - \mu_i) \xrightarrow{D} Z_i$, where $Z_i \sim N(0, \sigma_i^2)$, under the weak stationarity assumption. Then, the convergence of $\frac{1}{\sqrt n}\sum_{i=1}^{n} Z_i/\bar\sigma_n$, as $n \to \infty$, follows from Theorem 5.10 of White (2001), provided that $\{Z_i\}_{i=1}^{n}$ are independent, as they are, $E|Z_i|^{2+\delta} < C < \infty$ for some $\delta > 0$ and all $i$, and $\bar\sigma^2_n > \delta' > 0$ for all $n$ sufficiently large.

Suppose that we want to test hypothesis (6). We consider the test statistic

$$S^{(1)}_{n,T} = \frac{\Delta\bar L_{n,T}}{\hat{\bar\sigma}_{n,T}/\sqrt{nT}} \xrightarrow{D} N(0,1), \tag{10}$$

where $\hat{\bar\sigma}^2_{n,T} = n^{-1}\sum_{i=1}^{n}\hat\sigma^2_{i,T}$, and $\hat\sigma^2_{i,T}$ is a consistent estimate of $\sigma_i^2$ based on the $i$th time series of loss differentials,

$$\hat\sigma^2_{i,T} = \frac{1}{T}\sum_{t,s=1}^{T} k_T\left(\frac{|t-s|}{l_T+1}\right)\Delta\tilde L_{i,t}\,\Delta\tilde L_{i,s}, \tag{11}$$

where $\Delta\tilde L_{i,t} = \Delta L_{i,t} - \Delta\bar L_{i,T}$ and $k_T(\cdot)$ is the time series kernel function. Under general conditions, Andrews (1991) showed that $\hat\sigma^2_{i,T} \xrightarrow{p} \sigma_i^2$ as $T \to \infty$ with $l_T \to \infty$, $l_T = o(T)$. If the conditions implying $\hat\sigma^2_{i,T} \xrightarrow{p} \sigma_i^2$ are satisfied, it immediately follows that $\hat{\bar\sigma}^2_{n,T} - \sigma^2_{n,T} \xrightarrow{p} 0$.

The case of WCD.

Suppose that in (1)-(3), $\lambda_i' f_t = 0$ but $r_{ij} \ne 0$ for some $i \ne j$. In this case of WCD, the loss differentials $\Delta L_{i,t}$ are no longer independent across $i$, and therefore the variance estimator $\hat{\bar\sigma}^2_{n,T}$ given above is no longer valid. Nevertheless, the CLT in (8) is still satisfied with

$$\sigma^2_{n,T} = \frac{1}{nT}\sum_{i,j=1}^{n}\sum_{t,s=1}^{T} r_{i.}'\,\boldsymbol{\gamma}_{\epsilon}(|t-s|)\, r_{j.},$$

where $\boldsymbol{\gamma}_{\epsilon}(|t-s|) = \mathrm{diag}[\gamma_{\epsilon_1}(|t-s|), \gamma_{\epsilon_2}(|t-s|), \ldots, \gamma_{\epsilon_n}(|t-s|)]$ and $\gamma_{\epsilon_i}(s) = E(\epsilon_{i,t}\epsilon_{i,t-s})$. To see this, write $\sqrt{nT}\,(\Delta\bar L_{n,T} - \bar\mu_n)$ as $\frac{1}{\sqrt n}\sum_{i=1}^{n}\sqrt{T}\,(\Delta\bar L_{i,T} - \mu_i) = \frac{1}{\sqrt n}\sum_{i=1}^{n} r_{i.}'\left(\frac{1}{\sqrt T}\sum_{t=1}^{T}\epsilon_{.,t}\right)$, which follows from (3). Then, by the CLT for weakly stationary time series and the Cramer-Wold device (see, e.g. White, 2001, Proposition 5.1), as $T \to \infty$, $\frac{1}{\sqrt T}\sum_{t=1}^{T} s_n^{-1/2}\epsilon_{.,t} \xrightarrow{D} Z$, where $s_n = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2)$ and $Z \sim N(0, I_n)$, under mutual independence of the components of $\epsilon_{.,t}$. Now the result follows from the application of the CLT for spatially correlated triangular arrays of Kelejian and Prucha (1998). Given that $\max_{1\le i\le n}\sum_{j=1}^{n}|r_{ij}| < \infty$ and $\max_{1\le j\le n}\sum_{i=1}^{n}|r_{ij}| < \infty$, as $n \to \infty$, $\frac{1}{\sqrt n}\, e_n' R_n s_n^{1/2} Z \xrightarrow{D} N(0, \sigma^2)$, where $e_n$ is an $n$-dimensional vector of ones and $\sigma^2 = \lim_{n\to\infty} n^{-1} e_n' R_n s_n R_n' e_n$; hence (8) is satisfied.

For a single cross-section subject to WCD, KP proposed a spatial heteroskedasticity and autocorrelation consistent (SHAC) estimator of the variance-covariance matrix, which can be extended to give a WCD-robust estimator of $\sigma^2_{n,T}$. Such an estimator is

$$\hat\sigma^2_{2,n,T} = \frac{1}{nT}\sum_{i,j=1}^{n} k_S\left(\frac{d_{ij}}{d_n}\right)\sum_{t,s=1}^{T} k_T\left(\frac{|t-s|}{l_T+1}\right)\Delta\tilde L_{i,t}\,\Delta\tilde L_{j,s}, \tag{12}$$

leading to the test statistic

$$S^{(2)}_{n,T} = \frac{\Delta\bar L_{n,T}}{\hat\sigma_{2,n,T}/\sqrt{nT}} \xrightarrow{D} N(0,1), \tag{13}$$

where $k_S(\cdot)$ is the space kernel function, $d_{ij} = d_{ji} \ge 0$ is the distance between units $i$ and $j$, and $d_n$ is the threshold distance, which is an increasing function of $n$ such that $d_n \to \infty$ as $n \to \infty$.
The estimator $\hat\sigma^2_{2,n,T}$ is a panel data generalization of the non-parametric covariance estimator proposed by KP. It is used by Pesaran and Tosetti (2011). Moscone and Tosetti (2012) use a similar estimator, with the difference being that they set $k_T(\cdot) = 1$.

Consistency of (12) follows from the arguments by Moscone and Tosetti (2012). To see this, define the space-time kernel by

$$k_{ST}\left(\frac{d_{ij}}{d_n}, \frac{|t-s|}{l_T+1}\right) = k_S\left(\frac{d_{ij}}{d_n}\right) k_T\left(\frac{|t-s|}{l_T+1}\right).$$

Consistency of the variance estimator requires that $k_{ST}(x): \mathbb{R} \to [0, 1]$ satisfy (i) $k_{ST}(0) = 1$ and $k_{ST}(x) = 0$ for $|x| > 1$, (ii) $k_{ST}(x) = k_{ST}(-x)$, and (iii) $|k_{ST}(x) - 1| \le C|x|^{\delta}$ for some $\delta \ge 1$ and $0 < C < \infty$. Then, $\hat\sigma^2_{2,n,T} - \sigma^2_{n,T} \xrightarrow{p} 0$ […] $\max_{1\le i\le n}\sum_{j=1}^{n} \mathbb{1}\{d_{ij} \le d_n\} \le s_n$, where $s_n$ is the number of units for which $d_{ij} \le d_n$ and satisfies $s_n = O(n^{\kappa})$ such that $0 \le \kappa <$ […], and $\sum_{j=1}^{n}|r_{j.}' r_{i.}|\, d_{ij}^{\eta} < \infty$, $\eta \ge$ […].

For the case of unknown distances, a partial sample counterpart of (12) is

$$\hat\sigma^2_{2,n,T} = \frac{1}{n_p T}\sum_{i,j=1}^{n_p}\sum_{t,s=1}^{T} k_T\left(\frac{|t-s|}{l_T+1}\right)\Delta\tilde L_{i,t}\,\Delta\tilde L_{j,s},$$

where $n_p = \lceil n^{[\ldots]} \rceil$ and $\lceil\cdot\rceil$ denotes the smallest integer greater than or equal to its argument. Similar variance estimators are used by Bai and Ng (2006) and Moscone and Tosetti (2015). The first study focuses on factor models whereas the second one deals with panel regression models with small $T$. Our variance estimator generalizes theirs by allowing a large $T$ with the help of a time series kernel function. Moscone and Tosetti (2015) show the consistency of their estimator under the conditions required for the consistency of the kernel based estimator discussed above, such as the boundedness of the row and column sums of the matrix $R_n$. We can expect the consistency result to hold in our more general case as well, but the details of this claim are beyond the scope of our paper.

Remark 1.

In the case of WCD, in addition to non-parametric estimation, one can use parametric methods to estimate the covariance matrix. When the model for the spatial dependence structure of the loss differentials is correctly specified, we can expect to have more powerful tests compared to the case of non-parametric estimation.
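
As an illustration of the non-parametric route, the statistics $S^{(1)}_{n,T}$ in (10)-(11) and $S^{(2)}_{n,T}$ in (12)-(13) can be sketched in NumPy as follows. This is a minimal sketch rather than the authors' implementation: the function names, the `(n, T)` array layout of the loss differentials, and the choice of the Bartlett kernel for both $k_T(\cdot)$ and $k_S(\cdot)$ are assumptions made for the example.

```python
import numpy as np


def bartlett(x):
    """Bartlett kernel: 1 - |x| for |x| <= 1 and 0 otherwise."""
    return np.clip(1.0 - np.abs(x), 0.0, None)


def time_kernel_matrix(T, lag):
    """T x T matrix with entries k_T(|t - s| / (l_T + 1))."""
    idx = np.arange(T)
    return bartlett(np.abs(idx[:, None] - idx[None, :]) / (lag + 1.0))


def s1_statistic(dL, lag):
    """Overall EPA statistic S^(1) of (10)-(11); dL has shape (n, T)."""
    n, T = dL.shape
    centred = dL - dL.mean(axis=1, keepdims=True)  # subtract unit means
    K = time_kernel_matrix(T, lag)
    # Kernel-weighted long-run variance estimate for each unit, as in (11).
    sigma2_i = np.einsum("it,ts,is->i", centred, K, centred) / T
    sigma2_bar = sigma2_i.mean()
    return dL.mean() / np.sqrt(sigma2_bar / (n * T))


def s2_statistic(dL, lag, dist, d_n):
    """Overall EPA statistic S^(2) of (12)-(13).

    dist is an (n, n) symmetric distance matrix and d_n the threshold
    distance; the Bartlett space kernel is an assumption of this sketch.
    """
    n, T = dL.shape
    centred = dL - dL.mean(axis=1, keepdims=True)
    K = time_kernel_matrix(T, lag)
    cross = centred @ K @ centred.T   # entry (i, j): kernel-weighted sum over t, s
    space = bartlett(dist / d_n)      # k_S(d_ij / d_n)
    sigma2 = (space * cross).sum() / (n * T)
    return dL.mean() / np.sqrt(sigma2 / (n * T))
```

When every off-diagonal distance exceeds `d_n`, the space kernel keeps only the own-unit terms and `s2_statistic` coincides with `s1_statistic`. For arbitrary distance matrices the product-kernel variance estimate is not guaranteed to be positive in finite samples, so a practical implementation would need to check for that case.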

Remark 2.

The tests $S^{(1)}_{n,T}$ and $S^{(2)}_{n,T}$ are directly applicable to a single cross-section, which is a special case where $T = 1$. To show the normality of the test statistics it suffices to replace the long-run variances with the variance of the loss differentials. Then, we can apply the CLT for independent but heterogeneous sequences (see, e.g. White, 2001, Theorem 5.10) and the CLT for spatially correlated triangular arrays of Kelejian and Prucha (1998) to show the normality of $S^{(1)}_{n,T}$ and $S^{(2)}_{n,T}$, respectively. When $T = 1$ the test statistic $S^{(1)}_{n,T}$ corresponds to one of the tests for a single cross-section by TZ. Our test $S^{(2)}_{n,T}$ is more general as it allows for WCD in the loss differentials. As discussed by TZ, when the loss differentials display SCD, the interpretation of these test statistics changes and they can be used to test the EPA conditional on the common factors.

The case of SCD.

This is the most general case of the model defined by (1)-(3), with no specific restriction imposed on the parameters. A CLT as in (9) can be obtained under general conditions with

$$\sigma^2_{n,T} = \frac{1}{n^2T}\sum_{i,j=1}^{n}\sum_{t,s=1}^{T}\lambda_i' E(f_t f_s')\lambda_j + \frac{1}{n^2T}\sum_{i,j=1}^{n}\sum_{t,s=1}^{T} r_{i.}'\,\boldsymbol{\gamma}_{\epsilon}(|t-s|)\, r_{j.}.$$

We write $\sqrt{T}\,(\Delta\bar L_{n,T} - \bar\mu_n)$ as $\frac{1}{\sqrt T}\sum_{t=1}^{T}(\Delta\bar L_{n,t} - \bar\mu_n) = \frac{1}{\sqrt T}\sum_{t=1}^{T}\bar v_{n,t}$, where $\Delta\bar L_{n,t} = \frac{1}{n}\sum_{i=1}^{n}\Delta L_{i,t}$ and $\bar v_{n,t} = \frac{1}{n}\sum_{i=1}^{n} v_{i,t}$. Suppose that $v_{i,t}$ is $\alpha$-mixing of size $r/(r-1)$ with $r >$ […]. Then $\bar v_{n,t}$ is $\alpha$-mixing of size $r/(r-1)$ as well. If $E|\bar v_{n,t}|^{r} < \delta < \infty$ for some $r \ge$ […] and $\bar\sigma^2_{n,T} = \mathrm{Var}[T^{-1/2}\sum_{t=1}^{T}\bar v_{n,t}] > \delta > 0$, then $\sqrt{T}\,\bar v_{n,T}/\bar\sigma_{n,T} \sim N(0, 1)$ for all $T$ sufficiently large, from which the result in (9) follows. In this case the convergence rate of the sample mean is $\sqrt{T}$ rather than $\sqrt{nT}$.

In this case of SCD, the variance estimator given in (12) can be modified by setting $k_S(\cdot) = 1$ and leaving $k_T(\cdot)$ unrestricted. This variance estimator does not require any knowledge of a distance measure between the units. Moreover, it assigns weights equal to one to all covariances and is hence robust to SCD as well as WCD. The test statistic takes the form

$$S^{(3)}_{n,T} = \frac{\Delta\bar L_{n,T}}{\hat\sigma_{3,n,T}/\sqrt{T}} \xrightarrow{D} N(0,1), \tag{14}$$

where

$$\hat\sigma^2_{3,n,T} = \frac{1}{n^2T}\sum_{i,j=1}^{n}\sum_{t,s=1}^{T} k_T\left(\frac{|t-s|}{l_T+1}\right)\Delta\tilde L_{i,t}\,\Delta\tilde L_{j,s}. \tag{15}$$

The variance estimator (15) was proposed by DK and is valid when $T$ is large, regardless of whether $n$ is finite or infinite. Consistency of the estimator follows immediately from the conditions given above, except that now $v_{i,t}$ is required to be $\alpha$-mixing of size $2r/(r-1)$ with $r >$ […] and $\lambda_i$ to be uniformly bounded. Then the null distribution in (14) follows.

It is known that when the number of units in the panel is close to the number of time series observations this estimator performs poorly. An alternative way to estimate the covariance matrix is to exploit the factor structure of the DGP. The PC estimation of the factor model defined by (1)-(3) is investigated by Stock and Watson (2002), Bai and Ng (2002), and Bai (2003), among others. This method minimizes the sum of squared residuals $SSR = \sum_{i=1}^{n}\sum_{t=1}^{T}(\Delta\tilde L_{i,t} - \lambda_i' f_t)^2$ subject to $\mathrm{Var}(f_t) = I_m$. Then the estimates of the common factors, $\hat f_t$, are given by $\sqrt{T}$ times the first $m$ eigenvectors of the matrix $\sum_{i=1}^{n}\Delta\tilde{\mathbf{L}}_{i.}\Delta\tilde{\mathbf{L}}_{i.}'$ with $\Delta\tilde{\mathbf{L}}_{i.} = (\Delta\tilde L_{i,1}, \Delta\tilde L_{i,2}, \ldots, \Delta\tilde L_{i,T})'$, and the factor loadings can be estimated as $\hat\lambda_i = \frac{1}{T}\sum_{t=1}^{T}\hat f_t\,\Delta\tilde L_{i,t}$. Then the overall EPA hypothesis can be tested using

$$S^{(4)}_{n,T} = \frac{\Delta\bar L_{n,T}}{\hat\sigma_{4,n,T}/\sqrt{T}} \xrightarrow{D} N(0,1), \tag{16}$$

where

$$\hat\sigma^2_{4,n,T} = \frac{1}{n^2T}\sum_{i,j=1}^{n}\sum_{t,s=1}^{T} k_T\left(\frac{|t-s|}{l_T+1}\right)\hat\lambda_i'\hat f_t\hat f_s'\hat\lambda_j + \frac{1}{n^2T}\sum_{i,j=1}^{n} k_S\left(\frac{d_{ij}}{d_n}\right)\sum_{t,s=1}^{T} k_T\left(\frac{|t-s|}{l_T+1}\right)\hat\varepsilon_{i,t}\,\hat\varepsilon_{j,s}, \tag{17}$$

with $\hat\varepsilon_{i,t} = \Delta\tilde L_{i,t} - \hat\lambda_i'\hat f_t$. The conditions under which the estimates $\hat\lambda_i$ and $\hat f_t$ are consistent are given in Bai and Ng (2002). Consistency of the variance estimator (17) follows directly under these conditions together with the conditions on consistent estimation of the long-run variance as in Andrews (1991). These lead to the null distribution given in (16). As in the case of pure WCD of the loss differentials, if a distance metric is not available we can use

$$\hat\sigma^2_{4,n,T} = \frac{1}{n^2T}\sum_{i,j=1}^{n}\sum_{t,s=1}^{T} k_T\left(\frac{|t-s|}{l_T+1}\right)\hat\lambda_i'\hat f_t\hat f_s'\hat\lambda_j + \frac{1}{n}\left[\frac{1}{n_pT}\sum_{i,j=1}^{n_p}\sum_{t,s=1}^{T} k_T\left(\frac{|t-s|}{l_T+1}\right)\hat\varepsilon_{i,t}\,\hat\varepsilon_{j,s}\right]. \tag{18}$$

Remark 3.

In (17) and (18), the first terms involving the common components dominate the second ones. This is because under WCD of the process $\varepsilon_{i,t}$ the latter terms vanish; hence, they are asymptotically negligible. This means that under SCD, one can use simpler estimators which do not take into account the WCD of the error terms. Such an estimator can be obtained by setting $k_S(\cdot) = 1$ if $i = j$ and $k_S(\cdot) = 0$ otherwise in (17). Our Monte Carlo results, which are available upon request, show that this variance estimator works very well in the case of SCD.

In this section we are concerned with testing the hypothesis (7), i.e., $H_{0,2}: \mu_1 = \mu_2 = \cdots = \mu_n = 0$. The discussion in Section 2.3.1 is based on the large $T$ and small $n$ scenario. In Section 2.3.2 we propose extensions to the case of large $T$ and large $n$.

Large T and small n

In the case of fixed $n$, by the CLT for weakly stationary time series and the Cramer-Wold device, the joint limiting distribution of the vector of loss differential series $\Delta\bar{\mathbf{L}}_T = (\Delta\bar L_{1,T}, \Delta\bar L_{2,T}, \ldots, \Delta\bar L_{n,T})'$ is given by

$$\sqrt{T}\,\Omega_n^{-1/2}(\Delta\bar{\mathbf{L}}_T - \mu) \xrightarrow{D} N(0, I_n), \quad \text{as } T \to \infty,$$

where $\mu = (\mu_1, \mu_2, \ldots, \mu_n)'$ and

$$\Omega_n = \frac{1}{T}\sum_{i,j=1}^{n}\sum_{t,s=1}^{T} h_i h_j'\, E(v_{i,t}v_{j,s}),$$

with $h_i$ being the $i$th column of $I_n$.

The case of no CD.

Under cross-sectional independence of the loss differential series, we have $\Omega_n = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2)$. Therefore, the first test statistic considered is

$$J^{(1)}_{n,T} = T\,\Delta\bar{\mathbf{L}}_T'\,\hat\Omega_{1,n}^{-1}\,\Delta\bar{\mathbf{L}}_T \xrightarrow{D} \chi^2_n, \tag{19}$$

where $\hat\Omega_{1,n}$ is a consistent estimator of $\Omega_n$ with diagonal elements $\hat\sigma^2_{i,T}$ given in (11). Consistency of the estimator $\hat\Omega_{1,n}$ follows directly from the fact that its components are consistent under the conditions given, for instance, by Andrews (1991). Hence, this test statistic is robust against arbitrary time dependence, as is $S^{(1)}_{n,T}$.

The case of CD.

In the case of small $n$, neither the kernel based estimators of the spatial dependence structure nor the partial sample estimator is consistent. Nevertheless, the estimator suggested by DK remains applicable and is robust to arbitrary CD. Similar to the steps leading to the overall EPA test $S^{(3)}_{n,T}$, we construct a test statistic which does not require a known distance metric and is robust to WCD or SCD. The test statistic is given by

$$J^{(3)}_{n,T} = T\,\Delta\bar{\mathbf{L}}_T'\,\hat\Omega_{3,n}^{-1}\,\Delta\bar{\mathbf{L}}_T \xrightarrow{D} \chi^2_n,$$

where

$$\hat\Omega_{3,n} = \frac{1}{T}\sum_{i,j=1}^{n}\sum_{t,s=1}^{T} k_T\left(\frac{|t-s|}{l_T+1}\right) h_i h_j'\,\Delta\tilde L_{i,t}\,\Delta\tilde L_{j,s}.$$

Although this statistic has the advantage of being robust to any kind and strength of CD, it is feasible only when $n < T$.

Large T and large n

When $n$ grows with $T$, the limiting chi-square distribution is not meaningful. In this case, if the loss differentials are cross-sectionally independent, a standardized chi-square test can be used. This test statistic is given by

$$Z^{(1)}_{n,T} = \frac{J^{(1)}_{n,T} - n}{\sqrt{2n}} \xrightarrow{D} N(0,1). \tag{20}$$

The stated null distribution can be obtained by sequential asymptotics, letting first $T$ approach infinity. Then, the quadratic form given in (19) is a sum of the squares of $n$ independent standard normal random variables; hence (20) is verified.

The case of WCD.

An extension of $S^{(2)}_{n,T}$ gives the second test statistic, which is robust to arbitrary time and cross-sectional dependence:

$$J^{(2)}_{n,T} = T\,\Delta\bar{\mathbf{L}}_T'\,\hat\Omega_{2,n}^{-1}\,\Delta\bar{\mathbf{L}}_T, \qquad \hat\Omega_{2,n} = \frac{1}{T}\sum_{i,j=1}^{n} k_S\left(\frac{d_{ij}}{d_n}\right)\sum_{t,s=1}^{T} k_T\left(\frac{|t-s|}{l_T+1}\right) h_i h_j'\,\Delta\tilde L_{i,t}\,\Delta\tilde L_{j,s}. \tag{21}$$

In our Monte Carlo study, we use the $\chi^2_n$ critical values to investigate the small sample properties of this test statistic. However, the validity of this null distribution is not obvious, as the consistency of the non-parametric variance estimator (21) requires large $n$ but the test statistic has infinite variance as $n \to \infty$.

To see if the small sample properties of the test improve, we calculate the following standardized version of this test statistic, as is done in the case of independent cross-sections:

$$Z^{(2)}_{n,T} = \frac{J^{(2)}_{n,T} - n}{\sqrt{2n}} \xrightarrow{D} N(0,1).$$

The case of SCD.

As in this case of large $n$ we can estimate the common factors in the loss differentials and their loadings consistently, we can generalize the test statistic $S^{(3)}_{n,T}$ in order to test the joint EPA hypothesis. This naive test statistic is given by

$$J^{(4)}_{n,T} = T\,\Delta\bar{\mathbf{L}}_T'\,\hat\Omega_{4,n}^{-1}\,\Delta\bar{\mathbf{L}}_T,$$

where

$$\hat\Omega_{4,n} = \hat\Lambda\left[\frac{1}{T}\sum_{t,s=1}^{T} k_T\left(\frac{|t-s|}{l_T+1}\right)\hat f_t\hat f_s'\right]\hat\Lambda' + \hat\Sigma_n,$$

and

$$\hat\Sigma_n = \frac{1}{T}\sum_{i,j=1}^{n} k_S\left(\frac{d_{ij}}{d_n}\right)\sum_{t,s=1}^{T} k_T\left(\frac{|t-s|}{l_T+1}\right) h_i h_j'\,\hat\varepsilon_{i,t}\,\hat\varepsilon_{j,s}. \tag{22}$$

Once more, we use the $\chi^2_n$ critical values for this test statistic in our Monte Carlo simulations, although the validity of this distribution is not obvious because PC estimates of the common factors require large $n$ but the test statistic has infinite variance as $n \to \infty$. Again, one can use a centered and scaled version of this statistic given by

$$Z^{(4)}_{n,T} = \frac{J^{(4)}_{n,T} - n}{\sqrt{2n}} \xrightarrow{D} N(0,1).$$

Monte Carlo Study

To investigate the small sample properties of the test statistics given above, we conduct a set of Monte Carlo simulations. For each DGP described below, 2000 samples are generated for the dimensions $T \in \{10, 20, 30, 50, 100\}$ and $n \in \{10, 20, 30, 50, 100\}$. All tests are applied at the nominal size of 5%.

Two different DGPs are considered to explore the effect of WCD and SCD on the performance of the tests. DGP1 contains only spatial dependence. In this case, for each unit $i$, two independent forecast error series $(e_{1,it}, e_{2,it})$ are generated using spatial AR(1) processes defined as

$$\zeta_{l,it} = \rho \sum_{j=1}^{n} w_{ij} \zeta_{l,jt} + u_{l,it}, \quad u_{l,it} \sim N(0, 1), \quad l = 1, 2,$$

where $w_{ij}$ is the element of the spatial matrix $W_n$ in row $i$ and column $j$. To make the power results comparable across experiments, the average unconditional variance of the forecast error series $e_{l,it}$, $l = 1, 2$, is held fixed. Such series are generated as

$$e_{l,\cdot t} = \frac{1}{\sqrt{\bar{s}}} S_n u_{l,\cdot t}, \qquad (23)$$

where $u_{l,\cdot t} = (u_{l,1t}, u_{l,2t}, \ldots, u_{l,nt})'$, $S_n = (I_n - \rho W_n)^{-1}$ and $\bar{s} = n^{-1} \mathrm{tr}(S_n S_n')$. It can be shown that the average of the diagonal elements of the variance-covariance matrix of this process is equal to one. In this DGP a quadratic loss function is used.

DGP2 contains common factors as well as spatial dependence. In this case, following GW, we generate the loss differential directly and hence do not rely on a specific loss function. It is given by

$$\Delta L_{i,t} = \phi\,(\mu_i + \lambda_{1i} f_{1t} + \lambda_{2i} f_{2t} + \varepsilon_{i,t}).$$

To investigate the size properties we set $\mu_i = 0$ for each $i = 1, 2, \ldots, n$ and generate the factor loadings as λ_{1i}, λ_{2i} ∼ N(1, . ). The common factors are generated as f_{1t}, f_{2t} ∼ N(0, ); hence, they do not incorporate autocorrelation. The error series $\varepsilon_{i,t}$ are generated in the same spirit as in (23). We finally set φ = 1 / . e i,t = √ . e i,t and report the results from testing the equality of forecast accuracy of e i,t and e i,t . In the heterogeneous scenario, we generate the third series according to e i,t = √θ_i e i,t where θ_i ∼ U(0. , . ). Similarly, in the case of DGP2, we set µ i = 1 . i in the case of the homogeneous alternative and µ_i ∼ U(− . , . ) in the case of the heterogeneous alternative.

It is important to note that under the heterogeneous alternative, the unconditional expectations of the loss differentials are equal to zero in all DGPs. Hence, the overall EPA hypothesis holds. On the other hand, for each unit, the expected value of the loss differential is different from zero. Therefore, the joint EPA hypothesis does not hold. As a consequence, we expect the overall EPA tests not to have increasing power against the heterogeneous alternative, whereas we expect the joint EPA tests to be consistent.

Three different spatial AR(1) parameters are considered for both DGPs: $\rho = 0$, 0.5 and 0.9. To save space we report results only for $\rho = 0.5$. As the error series and common factors are generated for each unit as white noise, it is implicitly assumed that these are one-step ahead forecasts.

As we generate one-step ahead forecasts, the time series kernel is $k_T(\cdot) = 1$ if $t = s$ and $k_T(\cdot) = 0$ otherwise. Spatial interactions between units are created with a rook-type weight matrix, where two units in the panel are neighbors if their Euclidean distance is less than or equal to one. In the computation of the spatial kernel $k_S(\cdot)$, we use these distances. In addition, we use distances based on the wrong assumption that the units are located on a line. We use the Bartlett kernel for all experiments and, following KP, we set the spatial kernel bandwidth to ⌈ n / ⌉. For the tests with common factors we assume that the number of common factors is correctly specified.

Before discussing the size and power properties of the robust tests, as a benchmark we refer to the results on the non-robust tests $S^{(1)}_{n,T}$ and $J^{(1)}_{n,T}$. The size and power of these tests are reported in Table 1. As expected, all tests are incorrectly sized.

The results on the size properties of the robust tests with DGP1 are given in Table 2. The size of the kernel-robust test $S^{(2)}_{n,T}$ of the overall EPA hypothesis improves with either $T$ or $n$. First we focus on the results when the distance metric is correctly specified. In the smallest sample, with $T = 10$ and $n = 10$, this setting provides an empirical size of 9.9%. For $T = 100$ with $n = 10$ the corresponding value equals 8.65%, whereas for $T = 10$ with $n = 100$ it is 8.35%. In the largest sample its size is 6.25%, which is close to the nominal value of 5%. When the distance between the panel units is not correctly specified, the size of the test still improves with either dimension. However, as expected, the size distortions are slightly larger in this case. In the smallest and largest samples its size equals 12.5% and 8.3%, respectively.
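The DGP1 design described in the simulation setup can be illustrated with a short sketch. It is only a minimal illustration under stated assumptions: the units are placed on a rectangular grid with unit spacing (so that "Euclidean distance less than or equal to one" yields rook neighbors), the weight matrix is row-normalized, and the function names `rook_weights` and `simulate_dgp1_errors` are hypothetical. None of these choices is confirmed by the paper beyond the verbal description.

```python
import numpy as np


def rook_weights(rows: int, cols: int) -> np.ndarray:
    """Rook contiguity matrix for units on a rows x cols grid.

    Assumption: unit grid spacing and row normalization (not stated in the paper).
    """
    n = rows * cols
    W = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    W[i, rr * cols + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)


def simulate_dgp1_errors(W: np.ndarray, T: int, rho: float,
                         rng: np.random.Generator) -> np.ndarray:
    """Draw an n x T panel of forecast errors e = S u / sqrt(s_bar), as in (23)."""
    n = W.shape[0]
    S = np.linalg.inv(np.eye(n) - rho * W)   # S_n = (I_n - rho W_n)^{-1}
    s_bar = np.trace(S @ S.T) / n            # average diagonal of S_n S_n'
    u = rng.standard_normal((n, T))          # u_{l,it} ~ N(0, 1)
    return (S @ u) / np.sqrt(s_bar)


rng = np.random.default_rng(0)
W = rook_weights(2, 5)                        # n = 10 units on a 2 x 5 grid
e1 = simulate_dgp1_errors(W, T=50, rho=0.5, rng=rng)
e2 = simulate_dgp1_errors(W, T=50, rho=0.5, rng=rng)
loss_diff = e1**2 - e2**2                     # quadratic loss differential
```

Dividing by the square root of `s_bar` is what keeps the average unconditional variance of the errors equal to one for any value of `rho`, so that experiments with different degrees of spatial dependence remain comparable.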
The test which uses the partial sample estimator of the variance has similar size values. However, its size improves only with $T$. When $T = 10$ and $n = 10$ its size is slightly larger than that of the kernel-robust test with a misspecified distance. When $T = 10$ with $n = 100$ the size distortion increases (13.4%). In the case of large $T$, however, it performs better than the kernel-robust test with either the correct or the incorrect distance. For instance, when $T = 100$ and $n = 10$ its size equals 6.25%. The test $S^{(3)}_{n,T}$ performs very well, especially when $T$ is large and $n$ is small. In most of the combinations of $T$ and $n$ it shows better properties than $S^{(2)}_{n,T}$. When $T$ is greater than 50, it is correctly sized except when $n = 100$; however, even in this case its size equals 6.95%, which makes it the preferred test over any version of $S^{(2)}_{n,T}$. The test $S^{(4)}_{n,T}$ shows good properties even though it wrongly assumes that the loss differential series contain common factors. For small values of $n$ with moderate to large $T$ it is even preferred over $S^{(2)}_{n,T}$. Overall, these tests are correctly sized as $T$ gets large. Lastly, for these tests we see that asymptotically the way of dealing with WCD does not play an important role: the tests which use the correct distance metric, the misspecified distances and the partial sample estimator of the variance are equivalent in large samples.

Typically, the performance of the joint EPA tests deteriorates with $n$ and improves with $T$. When the distance metric of the kernel-robust version of the tests, $J^{(2)}_{n,T}$, is correctly specified, its size equals 22.55% for $T = 10$ with $n = 10$. As $T$ gets larger its size converges to the nominal value. In the extreme case of $T = 10$ with $n = 100$, the rejection rate of the correct null is nearly 100%. For this large value of $n$ as well, its size improves as $T$ gets larger, but even when $T = 100$ its size is about twice as big as the nominal value.
For the joint tests we see that the size properties of the test with a misspecified distance metric are nearly the same as those of the test using the correct distances. The standardized test $Z^{(2)}_{n,T}$ provides some improvement over $J^{(2)}_{n,T}$. It can be seen that when $T$ is large, the performance of $J^{(2)}_{n,T}$ deteriorates with $n$ whereas $Z^{(2)}_{n,T}$ is nearly correctly sized for all values of $n$. The test $J^{(3)}_{n,T}$ is infeasible when $n > T$. It requires large $T$ compared to $n$ to have good size properties. However, even when $n = 10$ and $T = 100$ its size is 10.15%. The factor-robust test $J^{(4)}_{n,T}$ is over-sized for almost all sample sizes, but its performance improves with $T$. In this case as well, the kernel method is robust to the misspecification of the distance metric, and the standardization of the test statistic provides some improvement, especially in terms of large $n$ properties.

The size results for DGP2 are reported in Table 3. As expected, for this DGP the overall test $S^{(2)}_{n,T}$ is grossly over-sized and its performance does not improve with increases in the sample size in any dimension. The test $S^{(3)}_{n,T}$ shows very good properties except when $T$ is very small. In particular, when $T > 30$ it is correctly sized even for large values of $n$. Conclusions are similar for the factor-robust tests $S^{(4)}_{n,T}$. The tests which use kernel-robust estimation of the spatial interactions, with or without the correct distance metric, as well as the test which uses the partial sample estimator, are correctly sized for moderate to large values of $T$. As in the case of DGP1, their size properties are asymptotically equal.

As for the previous DGP, the results concerning the joint EPA tests are less encouraging. The test $J^{(2)}_{n,T}$ does not show huge size distortions, especially for moderate $T$; however, we observe that it is undersized for large time series dimensions. The test $J^{(3)}_{n,T}$ behaves in line with theoretical expectations, in that it has lower size distortions for large $T$ and small $n$. However, its size equals 10.65% even for $T = 100$ when $n = 10$. The factor-robust tests of the joint EPA hypothesis are over-sized in all sample sizes under consideration. The standardized test $Z^{(4)}_{n,T}$ improves the performance, but this improvement remains limited.

To summarize, for both DGPs the overall EPA hypothesis can be tested with a size close to the nominal value for almost all sample sizes. In particular, the test $S^{(3)}_{n,T}$ has very good properties. For DGP1, for small $T$ and large $n$, the kernel-robust test is preferred over the test based on the partial sample estimator, provided that the distance metric is correctly specified. Otherwise the partial sample estimator provides a test with better properties. Finally, among the joint tests, $J^{(3)}_{n,T}$ is preferred over the others; however, it requires samples with large $T$ and small $n$.

The power results of the tests for DGP1 under the homogeneous alternative hypothesis are given in Table 4. In the previous subsection, we have seen that the size of all overall EPA tests approaches the nominal level. Here, we see that the power of the test $S^{(2)}_{n,T}$ converges to 100%; hence the test is consistent in all its forms.
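When reading the rejection frequencies in Tables 1 to 5, it may help to keep the simulation noise in mind. The sketch below computes the standard error of an empirical rejection rate based on 2000 replications and the implied approximate 95% band around the nominal 5% level, using the usual binomial formula with a normal approximation. The band is not reported in the paper; it is shown here only as a reading aid, and the function name `mc_standard_error` is hypothetical.

```python
import math


def mc_standard_error(p: float, reps: int) -> float:
    """Standard error of a rejection frequency estimated from `reps` independent draws."""
    return math.sqrt(p * (1.0 - p) / reps)


nominal = 0.05
reps = 2000                           # number of Monte Carlo samples per design
se = mc_standard_error(nominal, reps)
lower = nominal - 1.96 * se           # approximate 95% band under a correctly sized test
upper = nominal + 1.96 * se
print(f"se = {se:.4f}, band = [{lower:.3f}, {upper:.3f}]")
```

Under these assumptions a correctly sized test would produce empirical sizes roughly between 4% and 6% in about 95% of designs, which gives a rough yardstick for judging how far a reported size is from the nominal value.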
For moderate to large $T$, the test $S^{(3)}_{n,T}$ is correctly sized. We observe that its power is only slightly lower than that of $S^{(2)}_{n,T}$ in these sample sizes. Even though they wrongly assume that there are common factors in the DGP, the factor-robust tests $S^{(4)}_{n,T}$ have power very close to that of $S^{(3)}_{n,T}$, and they are consistent.

The previous results showed that the joint EPA tests are correctly sized only for large $T$ and small $n$. Here we see that their power is generally low, with the exception of the standardized test $Z^{(2)}_{n,T}$. For the samples with $T = 100$ and $n = 10$, its power equals 52.7%. The corresponding value for the test $J^{(2)}_{n,T}$ is 34.65% (Table 4), so $Z^{(2)}_{n,T}$ has a clear advantage over $J^{(2)}_{n,T}$ even when $n$ is small. However, once more, the power of all tests approaches 100% as $n$ and $T$ increase.

Table 5 reports the power results of the tests for DGP2 under the homogeneous alternative hypothesis. For this DGP, we have seen that the tests $S^{(2)}_{n,T}$ are over-sized even asymptotically. Hence, we focus on the tests $S^{(3)}_{n,T}$ and $S^{(4)}_{n,T}$. The power of both tests is very similar for all sample sizes. Their power improves only with $T$, but it reaches 100% even for moderate $T$ under this parametrization of the DGP.

To save space, we do not report the power results under the heterogeneous alternative hypothesis. However, the main finding is summarized in Figure 1, where the size-adjusted power of the factor-robust tests $S^{(4)}_{n,T}$ and $J^{(4)}_{n,T}$ is shown. Under the homogeneous alternative, the size-adjusted power of both the overall and the joint EPA tests approaches 100%. Under the heterogeneous alternative, in contrast, the size-adjusted power of the overall test equals the nominal size. This is because under this alternative hypothesis the expected value of the loss differential, averaged over units, equals zero. However, the expected values are different from zero for each panel unit; hence the joint EPA test has power against this alternative.

Empirical Application

In this section, we use the proposed tests to compare the OECD and IMF GDP growth forecasts. The data for the IMF forecast errors come from their Historical WEO Forecasts Database. The database includes historical $\tau$-steps ahead forecast values, $\tau = 1, 2, \ldots, 5$, for the GDP growth rate. The data cover up to 192 countries and start from the early 1990s. We collected similar data from the past vintages of the Economic Outlook of the OECD. The Economic Outlook contains only 1-step ahead forecasts. Both organizations publish their forecasts twice a year. In our application we focus on their summer forecasts made for the following year. These forecasts are published in June every year by the OECD, whereas the IMF forecasts are published in July. The publishing dates are close; hence the forecast errors are comparable.

Eventually we have a balanced panel data set of GDP growth forecast errors of 29 OECD countries from the two organizations between 1998 and 2016. To investigate the role of heterogeneity and of the change in the dimensions of the panel data set, we also apply the tests to a sample of G7 countries between 1991 and 2016. This data set comes from Turner (2017).

We implement the tests described above on the two data sets. We create two different loss series: absolute loss and quadratic loss. The absolute error loss differential is created as

$$\Delta L^{(1)}_{i,t} = |e_{1,it}| - |e_{2,it}|,$$

where, as throughout the application, the first organization is the OECD. This loss function is important when we compare the magnitude of the (absolute) bias made by the two organizations. The quadratic loss differential is generated as

$$\Delta L^{(2)}_{i,t} = e^2_{1,it} - e^2_{2,it}.$$

This loss function is arguably the most frequently used one, and it is useful for comparing the variance of the forecast errors. For instance, if the forecasts of both organizations are unbiased, the expectation of the absolute error loss is zero and the quadratic loss permits a direct comparison of the variances.

We begin the analysis with the DM tests applied to each country. We compute the DM test statistic for all countries, between the years 1998 and 2016, using the OECD data set.
In the computations, we use a Bartlett kernel with a bandwidth parameter of 0 because we have 1-step ahead forecasts. The results are given in Table 6.

First, in terms of the sign of the statistics, a considerable amount of heterogeneity can be observed in the sample. For both types of loss functions roughly half of the statistics are negative. Second, most of these statistics are statistically insignificant, the exceptions being BEL, CAN, ESP, HUN and NZL. For BEL, a country for which the predictive ability of the IMF is superior, the EPA hypothesis can be rejected at the 5% and 10% levels with absolute and quadratic losses, respectively. In the case of CAN, we can reject the EPA hypothesis with absolute loss at the 10% significance level. For CAN too, the IMF predicts the economic growth rate better than the OECD. In the case of ESP and HUN, the differences in predictive ability are significant with both the absolute and quadratic losses. For ESP the OECD predictions outperform those of the IMF, whereas for HUN the IMF predictions outperform those of the OECD. For NZL we can reject the EPA hypothesis with absolute loss at the 5% level.
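The country-by-country DM statistics described above can be sketched as follows. This is a minimal illustration, assuming a two-sided test against the standard normal distribution and a bandwidth of 0, so that only the lag-0 variance of the loss differential enters. The function name `dm_statistic` and the simulated error series are hypothetical and do not reproduce the data behind Table 6.

```python
import math

import numpy as np


def dm_statistic(e1, e2, loss="quadratic"):
    """DM statistic and two-sided normal p-value for one unit, bandwidth 0.

    e1, e2: forecast errors of the first and second forecaster.
    A positive statistic means the first forecaster has the larger average loss.
    """
    e1 = np.asarray(e1, dtype=float)
    e2 = np.asarray(e2, dtype=float)
    if loss == "absolute":
        d = np.abs(e1) - np.abs(e2)
    elif loss == "quadratic":
        d = e1**2 - e2**2
    else:
        raise ValueError("loss must be 'absolute' or 'quadratic'")
    T = d.size
    d_bar = d.mean()
    var0 = np.mean((d - d_bar) ** 2)           # lag-0 autocovariance only
    stat = math.sqrt(T) * d_bar / math.sqrt(var0)
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return float(stat), p_value


# Hypothetical errors for one country over 19 years (1998 to 2016).
rng = np.random.default_rng(1)
e_first = rng.normal(0.0, 1.5, size=19)        # stand-in for the first organization
e_second = rng.normal(0.0, 1.5, size=19)       # stand-in for the second organization
for loss in ("absolute", "quadratic"):
    stat, p = dm_statistic(e_first, e_second, loss)
    print(f"{loss:9s}: DM = {stat:6.2f}, p = {p:.3f}")
```

With the sign convention of the loss differentials in the text, where the first organization is the OECD, a significantly positive statistic would point to the IMF as the more accurate forecaster for that country, and a significantly negative one to the OECD.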

As found in our Monte Carlo simulations, an increase in the number of cross-sectional units increases the power of the EPA tests. To see whether we can reject the EPA hypothesis by using cross-sectional information, we apply the panel tests to the data set. However, the gain from using panels depends on the degree and the nature of CD. Before proceeding to the panel tests of EPA, we therefore analyze the CD in the two panel data sets of OECD and G7 countries.

Here, we use two tests of CD. The first is the LM test of the absence of CD by Breusch and Pagan (1980), and the second is the bias-corrected version of it developed by Pesaran, Ullah, and Yamagata (2008). The first is a test of the joint significance of the pairwise correlations between all units in the panel. The null hypothesis of this test is the absence of CD between any pair of units in the panel, and the statistic is distributed as $\chi^2_q$ with $q = n(n-1)/2$. Hence, the test is more suitable for the cases of fixed and small $n$. The second test statistic is a bias-corrected version of the LM test after standardization. It is asymptotically normal as $n \to \infty$ and more suitable for large panels.

We apply these tests both to the original data and to the residuals from a linear factor model estimated by the PC methods as in Bai and Ng (2002) and Bai (2003). For the OECD sample the number of PCs to be extracted from the panel is chosen by the information criterion IC p proposed by Bai and Ng (2002). This suggests the existence of 6 common factors. For the G7 sample we extract the first 2 PCs, as in this case of small $n$ the information criteria are inconsistent. Although this number is arbitrary, for this sample we apply the tests which do not require the consistent estimation of the number of common factors; hence this strategy does not affect the conclusions of our application. The results are given in Table 7. As can be seen, the null hypothesis of no CD is rejected by every test, on both data sets and for both loss functions, at conventional significance levels. This means that the tests which allow for common factors and spatial dependence are more reliable on these samples.

To see the time series profile of the common factors in the loss differential series, we report in Figure 2 the plot of the first three PCs of the standardized quadratic loss differential series for the OECD sample, numbered in decreasing order with respect to their eigenvalues. The associated factor loading estimates are reported in Table 8. These estimates of the common factors in the quadratic loss differentials clearly show the effect of the financial crisis.

The first common factor peaks in 2009. As the estimated loadings are positive for most of the countries in the sample, this shows that the first organization, the OECD, had a lower predictive ability than the IMF in this period. However, the second common factor is negative in this period.
Hence, it compensates the effect of the first one for the countries with a positive loading. There are eight countries in the sample for which the loadings of the first two factors are positive. The third common factor shows the effect in 2008. It is negative in this year, and for more than half of the countries its loading is negative. This means that in 2008 as well, the OECD had a lower predictive ability for most of the countries. However, there is a considerable amount of heterogeneity in the factor loadings, and therefore in the relative predictive ability of the two organizations.

Here we apply the panel EPA tests to both data sets. Following the insights of the Monte Carlo results, we apply the factor-robust tests $S^{(4)}_{n,T}$ and $J^{(4)}_{n,T}$ to the OECD data set and the cluster-robust tests $S^{(3)}_{n,T}$ and $J^{(3)}_{n,T}$ to the G7 sample. $S^{(4)}_{n,T}$ and $J^{(4)}_{n,T}$ are computed using (17) and (22), respectively. As a benchmark, we also report the results from the tests assuming no CD, namely $S^{(1)}_{n,T}$ and $J^{(1)}_{n,T}$. The results are given in Table 9. As before, for the time series kernels we use a bandwidth of 0. For the spatial kernels we use the geographic distances between countries, measured as the distance between the most populated cities of each country pair. Alternatively, we tried the distance between the capital cities, but the results are similar and not reported here. The data on geographical distance come from the CEPII GeoDist dataset (Mayer and Zignago, 2011). We chose the 25th percentile of the sample of distances as the bandwidth parameter in all kernel functions.

For the OECD sample the statistics of the overall EPA hypothesis are positive for the absolute loss, whereas they are negative for the quadratic loss. Hence, the OECD has a lower prediction performance in terms of bias and a higher performance in terms of variance. However, these differences are small and statistically insignificant. Using the non-robust joint EPA tests we cannot reject the EPA hypothesis.
With robust tests, however, the null is rejected for both loss functions. Hence, we can conclude that there are significant differences between the prediction performance of the two institutions.

The results are similar for the G7 sample. In this case, the statistics of the overall EPA hypothesis are negative for both loss functions, but once more they are statistically insignificant. However, using the robust tests of the joint EPA hypothesis on the quadratic loss, we can reject the null of no difference between the predictive ability of the two institutions.

This paper concerned the problem of testing equal predictive ability (EPA) hypotheses using panel data. The test proposed by Diebold and Mariano (1995) is generalized to a panel data framework, taking into account the complexities arising from using either micro or macro data. We derived test statistics for two different EPA hypotheses. The first hypothesis, the overall EPA, states that the predictive ability of the two forecasters is equal on average over all time periods and cross-sectional units, whereas under the second hypothesis, the joint EPA, the equality of prediction performance holds jointly for each unit in the panel. Our proposed tests are robust to different forms of cross-sectional dependence in the loss differentials, arising from spatial dependence (weak cross-sectional dependence), common factors (strong cross-sectional dependence) or both.

The small sample properties of the proposed tests are found to be good in a set of Monte Carlo simulations. In particular, the overall EPA tests robust to strong cross-sectional dependence are correctly sized. This is the case even in the experiments which do not involve common factors but only spatial dependence.
However, their power is relatively low compared to that of test statistics which are robust only to spatial dependence, when the forecast errors do not contain common factors.

The tests are used to compare the prediction performance of two major organizations, the OECD and the IMF, on their historical economic growth forecasts. In a sample of 29 OECD countries covering the period between 1998 and 2016, we found that the IMF has an overall better performance in terms of bias, whereas the OECD makes predictions with less variance. However, the overall differences are not statistically significant. It is possible to reject the joint EPA hypothesis in favor of the alternative hypothesis that there is at least one panel unit for which the predictive power of the two institutions is different. As a robustness check, the tests are applied to a sample of G7 countries between 1991 and 2016. In this sample, the OECD predictions are better on average in terms of both bias and variance, though once again the overall differences are statistically insignificant. We can reject the null of the joint EPA hypothesis using the quadratic loss function.

The main findings of this paper suggest further developments. A possible extension of the testing procedures proposed in this paper is to allow for distinguishing between the sources of the differences in predictive ability. The predictive ability of different forecasters may differ across periods while on average they have equal predictive power. To deal with this situation, the conditional EPA tests of Giacomini and White (2006) may be extended to our panel data framework. This is part of an ongoing research agenda.
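To make the distinction between the overall and the joint EPA hypotheses concrete, the sketch below computes two naive statistics on a simulated panel in which the unit-level expected loss differentials are non-zero but cancel out on average. It is only an illustration in the spirit of the non-robust benchmarks: it assumes no cross-sectional or serial dependence, the formulas are textbook pooled versions rather than the paper's exact statistics, and the function name `naive_epa_statistics` and all parameter values are hypothetical.

```python
import numpy as np


def naive_epa_statistics(dL: np.ndarray):
    """Naive overall and joint EPA statistics for an n x T panel of loss differentials.

    Assumes independence across units and over time (an assumption of this sketch).
    overall: approximately N(0, 1) under the overall EPA null.
    joint:   approximately chi-squared with n degrees of freedom under the joint null.
    """
    n, T = dL.shape
    unit_means = dL.mean(axis=1)
    unit_vars = dL.var(axis=1)                      # lag-0 variance per unit
    overall = np.sqrt(n * T) * dL.mean() / np.sqrt(unit_vars.mean())
    joint = np.sum(T * unit_means**2 / unit_vars)   # sum of squared unit-level DM statistics
    return float(overall), float(joint)


rng = np.random.default_rng(2)
n, T = 10, 100
# Heterogeneous alternative: unit means of +0.5 and -0.5 that average to zero.
mu = np.where(np.arange(n) % 2 == 0, 0.5, -0.5)
dL = mu[:, None] + rng.standard_normal((n, T))

overall, joint = naive_epa_statistics(dL)
print(f"overall = {overall:.2f} (5% two-sided critical value 1.96)")
print(f"joint   = {joint:.2f} (5% critical value of chi-squared(10) is about 18.31)")
```

In such a design the pooled mean carries no signal, so the overall statistic behaves as under the null, while the joint statistic accumulates the unit-level deviations and is far above its critical value. This mirrors the pattern found in the empirical application, where the overall differences are insignificant but the joint EPA hypothesis is rejected.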

References

Anderson, T. W. (1971). The statistical analysis of time series. John Wiley & Sons.
Andrews, D. W. (1991). Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica, (3), 817–858.
Bai, J. (2003). Inferential theory for factor models of large dimensions. Econometrica, (1), 135–171.
Bai, J., & Ng, S. (2002). Determining the number of factors in approximate factor models. Econometrica, (1), 191–221.
Bai, J., & Ng, S. (2006). Confidence intervals for diffusion index forecasts and inference for factor-augmented regressions. Econometrica, (4), 1133–1150.
Breusch, T. S., & Pagan, A. R. (1980). The Lagrange multiplier test and its applications to model specification in econometrics. The Review of Economic Studies, (1), 239–253.
Chudik, A., Pesaran, M. H., Tosetti, E., et al. (2011). Weak and strong cross-section dependence and estimation of large panels. Econometrics Journal, (1), 45–90.
Clark, T. E., & McCracken, M. W. (2001). Tests of equal forecast accuracy and encompassing for nested models. Journal of Econometrics, (1), 85–110.
Clark, T. E., & McCracken, M. W. (2015). Nested forecast model comparisons: A new approach to testing equal accuracy. Journal of Econometrics, (1), 160–177.
Clark, T. E., & West, K. D. (2007). Approximately normal tests for equal predictive accuracy in nested models. Journal of Econometrics, (1), 291–311.
Davies, A., Lahiri, K., et al. (1995). A new framework for analyzing survey forecasts using three-dimensional panel data. Journal of Econometrics, (1), 205–228.
Diebold, F., & Mariano, R. (1995). Comparing predictive accuracy. Journal of Business & Economic Statistics, (3), 253–63.
Driscoll, J. C., & Kraay, A. C. (1998). Consistent covariance matrix estimation with spatially dependent panel data. Review of Economics and Statistics, (4), 549–560.
Giacomini, R., & White, H. (2006). Tests of conditional predictive ability. Econometrica, (6), 1545–1578.
Kelejian, H. H., & Prucha, I. R. (1998). A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. The Journal of Real Estate Finance and Economics, (1), 99–121.
Kelejian, H. H., & Prucha, I. R. (2007). HAC estimation in a spatial framework. Journal of Econometrics, (1), 131–154.
Moscone, F., & Tosetti, E. (2012). HAC estimation in spatial panels. Economics Letters, (1), 60–65.
Moscone, F., & Tosetti, E. (2015). Robust estimation under error cross section dependence. Economics Letters, , 100–104.
Pain, N., Lewis, C., Dang, T.-T., Jin, Y., & Richardson, P. (2014). OECD forecasts during and after the financial crisis (Working Paper No. 1107). OECD Economics Department.
Pesaran, M. H., & Tosetti, E. (2011). Large panels with common factors and spatial correlation. Journal of Econometrics, (2), 182–202.
Pesaran, M. H., Ullah, A., & Yamagata, T. (2008). A bias-adjusted LM test of error cross-section independence. The Econometrics Journal, (1), 105–127.
Stock, J. H., & Watson, M. W. (2002). Forecasting using principal components from a large number of predictors. Journal of the American Statistical Association, (460), 1167–1179.
Timmermann, A., & Zhu, Y. (2019). Comparing forecasting performance with panel data (Discussion Paper No. DP13746). Centre for Economic Policy Research.
Turner, D. (2017). Designing fan charts for GDP growth forecasts to better reflect downturn risks (Working Paper No. 1428). OECD Economics Department.
Vuong, Q. H. (1989). Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica, (2), 307–333.
West, K. D. (1996). Asymptotic inference about predictive ability. Econometrica, (5), 1067–1084.
White, H. (2001). Asymptotic theory for econometricians. Academic Press, Revised Edition.

Table 1: Small Sample Properties of the Non-Robust Tests S(1)n,T and J(1)n,T

    Overall EPA Tests: S(1)n,T          | Joint EPA Tests: J(1)n,T
 n\T     10     20     30     50    100 |     10     20     30     50    100
DGP1: Size
  10  13.55  13.05  12.10  12.70  13.40 |  19.35  10.90   8.35   8.15   6.85
  20  13.05  12.50  12.15  10.45  11.45 |  24.85  12.35  10.25   7.30   6.05
  30  11.40  11.70  11.85  11.40  10.50 |  32.35  14.85  11.15   8.05   6.95
  50  14.30  11.60  10.35  11.10  10.10 |  45.70  18.80  13.65   8.95   6.60
 100  13.50  11.05  12.75  12.40  11.40 |  67.15  28.50  17.10  11.20   7.95
DGP2: Size
  10  48.00  47.30  44.85  42.90  48.45 |  23.90  19.95  17.10  15.00  16.70
  20  62.15  62.20  60.40  58.40  56.80 |  28.55  22.90  20.85  17.95  18.00
  30  67.65  64.25  65.05  65.55  65.35 |  29.85  23.35  23.80  22.30  19.40
  50  72.20  72.35  72.00  74.15  73.00 |  34.05  26.50  23.55  24.05  20.55
 100  79.90  80.75  81.90  79.20  80.00 |  35.70  30.05  27.85  24.20  24.60
DGP1: Power
  10  24.15  33.05  38.15  54.80  76.25 |  22.35  16.60  18.65  26.10  44.50
  20  32.00  49.20  61.35  78.30  95.90 |  30.00  24.45  26.80  38.35  64.50
  30  40.95  61.10  74.65  89.50  99.35 |  39.45  29.45  32.40  46.20  77.25
  50  55.50  78.25  91.05  98.50 100.00 |  55.40  39.10  45.70  60.15  91.65
 100  79.85  96.35  99.30 100.00 100.00 |  78.45  60.90  67.10  83.75  99.45
DGP2: Power
  10  95.40  99.60 100.00 100.00 100.00 |  92.05  99.15  99.90 100.00 100.00
  20  98.15  99.80 100.00 100.00 100.00 |  95.70  99.60 100.00 100.00 100.00
  30  98.75  99.95 100.00 100.00 100.00 |  97.20  99.75 100.00 100.00 100.00
  50  99.05 100.00 100.00 100.00 100.00 |  98.45 100.00 100.00 100.00 100.00
 100  99.50 100.00 100.00 100.00 100.00 |  99.30 100.00 100.00 100.00 100.00

Note: Overall EPA Tests are introduced in Section 2.2 and Joint EPA Tests are in Section 2.3. The nominal size is 5%. Power is calculated under the homogeneous alternative hypothesis.

Table 2: Size - DGP1: No Common Factors, Spatial Dependence
Overall EPA Tests (left panel) and Joint EPA Tests (right panel)

Overall   Joint      n\T     10     20     30     50    100 |     10     20     30     50    100
S(2)      J(2)        10   9.90   9.05   8.15   8.95   8.65 |  22.55   9.75   7.90   6.60   5.75
                      20   8.65   8.00   7.90   6.85   7.15 |  52.80  22.25  15.00   9.90   6.65
                      30   7.75   6.85   8.35   6.40   6.85 |  67.65  29.55  17.75  10.70   8.00
                      50   9.70   6.85   6.35   7.30   6.45 |  88.20  41.90  26.15  13.40   7.95
                     100   8.35   6.90   7.40   6.80   6.25 |  99.35  68.40  43.25  20.55  10.45
S(2)[ms]  J(2)[ms]    10  12.50  11.05  10.25  10.80  11.00 |  21.60  11.15   8.00   6.85   6.65
                      20  11.05  10.25  10.20   8.80   9.25 |  37.85  16.40  11.90   8.65   6.25
                      30   9.65   9.20  10.45   8.60   8.35 |  47.30  20.20  13.75   9.05   7.30
                      50  11.75   9.20   7.85   9.60   8.40 |  65.05  27.30  17.30  10.05   6.45
                     100   9.75   8.40   9.70   9.00   8.30 |  95.25  48.45  28.55  14.95   9.20
S(2)[ps]  Z(2)        10  12.90   9.10   7.70   7.00   6.25 |  20.95   9.05   6.90   5.50   5.30
                      20  12.15  10.00   9.05   6.80   6.60 |  49.10  19.00  12.70   8.00   5.05
                      30  11.35   8.70   9.20   6.55   6.75 |  64.10  25.55  15.00   8.15   5.50
                      50  13.10   9.60   7.65   7.50   6.80 |  85.20  36.30  21.50  10.80   5.35
                     100  13.40   9.85  10.00   9.00   7.65 |  99.00  62.40  36.30  15.45   7.35
S(3)      J(3)        10   8.30   7.15   5.85   5.80   5.90 |      -  56.30  33.05  18.30  10.15
                      20   9.65   7.40   6.90   5.95   5.65 |      -      -  89.20  55.60  22.95
                      30   8.70   6.85   7.45   5.30   5.15 |      -      -      -  92.05  45.50
                      50  10.85   7.40   5.25   5.90   5.70 |      -      -      -      -  91.00
                     100   9.95   7.15   6.95   6.45   5.35 |      -      -      -      -      -
S(4)      J(4)        10   8.85   7.80   6.05   5.60   5.20 |  66.20  33.40  24.55  17.30  12.50
                      20   9.55   8.10   7.30   6.20   6.20 |  92.65  53.65  35.90  23.15  13.40
                      30   8.40   7.20   7.25   5.90   5.80 |  98.25  67.10  43.75  26.95  15.85
                      50  10.65   7.05   5.75   6.45   6.25 |  99.95  82.50  58.70  30.75  17.05
                     100   9.00   6.85   7.75   6.95   6.00 | 100.00  96.55  78.65  44.40  21.60
S(4)[ms]  J(4)[ms]    10   8.90   8.00   6.15   5.60   4.90 |  65.00  34.30  26.10  18.90  14.00
                      20  10.25   8.15   7.75   6.70   6.50 |  87.40  47.65  33.35  23.55  14.05
                      30   8.75   7.70   8.10   6.45   6.35 |  94.55  55.30  37.80  23.25  15.65
                      50  11.50   7.85   6.65   7.20   6.80 |  99.60  70.45  48.35  25.50  15.10
                     100   9.70   7.70   9.00   8.05   6.95 | 100.00  89.70  66.70  35.65  18.50
S(4)[ps]  Z(4)        10   9.50   8.50   7.05   6.90   6.60 |  64.25  31.70  22.65  15.90  11.20
                      20  10.30   8.20   8.20   6.45   6.80 |  91.65  50.25  31.90  20.30  11.40
                      30   9.65   7.15   8.25   6.10   6.00 |  97.80  63.30  39.20  22.40  13.45
                      50  11.45   7.85   6.55   6.55   6.40 |  99.95  78.30  54.15  24.85  12.65
                     100  11.70   7.70   8.60   7.80   7.00 | 100.00  94.80  74.05  37.25  16.10

Note: See the note of Table 1. [ms] indicates that the test uses a misspecified distance metric, [ps] refers to the partial sample estimator of the WCD.

Table 3: Size - DGP2: Common Factors, Spatial Dependence
Overall EPA Tests (left panel) and Joint EPA Tests (right panel)

Overall   Joint      n\T     10     20     30     50    100 |     10     20     30     50    100
S(2)      J(2)        10  30.70  30.50  26.60  24.70  29.60 |  13.80   8.40   6.05   5.65   4.15
                      20  29.95  29.35  28.05  25.90  25.00 |  17.45   7.70   5.30   2.45   2.55
                      30  37.25  34.65  35.55  34.75  32.60 |  21.40   7.60   5.85   3.65   2.80
                      50  47.35  43.55  42.90  44.35  42.70 |  28.95   9.10   6.55   4.40   2.80
                     100  59.25  59.10  57.95  56.25  57.95 |  38.90  12.90   7.15   5.00   3.05
S(2)[ms]  J(2)[ms]    10  37.65  36.05  33.50  31.30  38.05 |  18.85  14.15  10.80  10.00   9.30
                      20  45.50  44.85  43.45  41.35  40.25 |  21.30  13.05  10.55   7.50   7.30
                      30  52.75  49.50  51.15  50.80  50.00 |  24.45  12.85  12.25   9.80   7.90
                      50  60.85  59.30  58.90  60.20  59.80 |  28.25  15.25  13.15  11.10   8.50
                     100  68.15  67.25  67.55  66.40  67.35 |  33.50  17.10  12.35  10.00   8.90
S(2)[ps]  Z(2)        10  24.25  22.95  20.30  18.90  22.35 |  13.35   7.90   5.85   5.20   3.70
                      20  34.80  33.65  32.90  30.50  28.95 |  16.45   7.15   4.70   2.30   2.20
                      30  40.40  37.60  38.35  37.35  35.85 |  19.80   6.65   5.25   3.15   2.25
                      50  45.70  41.75  41.30  41.95  40.50 |  26.50   7.90   5.70   3.90   2.15
                     100  53.95  54.25  54.15  51.60  52.25 |  36.25  11.30   6.40   4.40   2.95
S(3)      J(3)        10  10.15   8.45   6.00   6.30   5.65 |      -  56.20  34.60  18.90  10.65
                      20   9.35   7.35   6.70   5.20   5.85 |      -      -  90.50  53.15  23.55
                      30  10.25   6.50   7.70   5.75   5.40 |      -      -      -  90.85  46.00
                      50   8.65   6.85   6.50   5.45   4.95 |      -      -      -      -  91.40
                     100   8.45   7.05   5.25   5.65   4.70 |      -      -      -      -      -
S(4)      J(4)        10   9.90   7.95   5.65   6.05   5.25 |  66.80  36.15  26.45  17.30  14.00
                      20   9.20   7.20   6.25   4.95   5.30 |  90.65  51.75  34.65  20.35  13.15
                      30  10.00   6.40   7.55   5.45   5.25 |  96.55  62.15  38.75  22.75  11.05
                      50   8.50   6.60   6.30   5.40   4.85 |  99.75  74.85  48.15  26.40  12.60
                     100   8.45   6.95   5.20   5.60   4.55 | 100.00  92.70  66.15  33.65  11.05
S(4)[ms]  J(4)[ms]    10   9.85   8.05   5.65   6.05   5.30 |  66.00  36.15  26.60  18.55  15.45
                      20   9.25   7.20   6.30   4.95   5.35 |  85.45  48.35  34.20  21.45  15.75
                      30  10.05   6.40   7.65   5.50   5.25 |  92.35  56.00  36.25  22.85  14.60
                      50   8.55   6.65   6.35   5.40   4.85 |  97.65  64.25  43.70  25.70  14.25
                     100   8.45   7.00   5.20   5.60   4.55 | 100.00  85.95  59.25  31.55  14.00
S(4)[ps]  Z(4)        10   9.75   7.95   5.75   6.05   5.40 |  65.15  34.80  24.45  16.25  12.65
                      20   9.15   7.15   6.15   4.90   5.20 |  89.80  49.10  31.85  17.45  10.90
                      30  10.10   6.40   7.60   5.50   5.25 |  95.70  58.25  35.00  19.60   8.65
                      50   8.60   6.65   6.35   5.40   4.85 |  99.65  70.55  42.65  21.65  10.30
                     100   8.45   6.90   5.20   5.60   4.55 | 100.00  90.75  61.05  27.20   8.20

Note: See the note of Table 2.

Table 4: Power Under Homogeneous Alternative - DGP1: No Common Factors, Spatial Dependence
Overall EPA Tests (left panel) and Joint EPA Tests (right panel)

Overall   Joint      n\T     10     20     30     50    100 |     10     20     30     50    100
S(2)      J(2)        10  19.05  26.60  30.80  47.60  70.80 |  25.40  14.75  15.50  21.00  34.65
                      20  24.55  40.05  53.05  71.15  93.20 |  57.40  33.90  29.25  34.30  54.50
                      30  33.55  51.85  67.20  85.15  98.85 |  72.40  43.70  38.05  43.55  69.15
                      50  48.55  71.60  87.80  97.50 100.00 |  91.75  60.40  54.15  57.80  84.50
                     100  71.15  94.05  98.40  99.95 100.00 |  99.65  86.80  79.75  82.35  97.50
S(2)[ms]  J(2)[ms]    10  21.80  29.85  34.90  51.60  73.80 |  24.60  15.90  17.50  23.45  39.85
                      20  28.65  45.30  57.30  75.10  95.00 |  43.85  28.70  27.15  36.45  59.80
                      30  37.60  56.25  71.15  87.85  99.25 |  54.50  34.30  32.60  44.45  73.25
                      50  52.35  75.15  89.75  98.05 100.00 |  73.55  45.45  48.40  57.60  88.25
                     100  74.30  95.20  98.70  99.95 100.00 |  97.00  74.35  71.60  80.75  98.35
S(2)[ps]  Z(2)        10  20.25  23.30  26.95  41.00  64.15 |  42.15  31.60  31.40  38.15  52.70
                      20  26.95  38.70  52.00  68.80  92.00 |  77.05  58.95  54.70  58.65  76.85
                      30  36.15  52.85  65.55  83.35  98.60 |  88.70  70.20  66.95  71.10  87.50
                      50  50.55  72.45  85.80  97.10  99.95 |  98.30  89.15  86.80  87.25  97.15
                     100  73.25  94.45  98.45  99.95 100.00 | 100.00  99.35  98.30  98.75  99.95
S(3)      J(3)        10  16.45  21.70  25.35  39.10  63.50 |      -  61.60  41.75  34.45  36.20
                      20  23.75  37.25  48.85  66.20  90.40 |      -      -  93.60  76.45  70.15
                      30  31.80  48.80  63.00  82.15  98.25 |      -      -      -  97.65  90.10
                      50  47.00  67.65  84.25  96.25  99.95 |      -      -      -      -  99.85
                     100  68.50  92.20  97.80  99.90 100.00 |      -      -      -      -      -
S(4)      J(4)        10  17.60  20.65  25.15  36.80  59.00 |  69.45  39.95  33.85  33.75  37.25
                      20  24.45  37.70  48.85  66.15  89.95 |  94.55  66.55  54.55  51.50  63.90
                      30  31.85  49.35  64.50  82.40  97.90 |  98.55  78.70  64.75  63.25  78.20
                      50  47.20  69.30  85.20  96.75 100.00 |  99.95  91.45  83.35  78.55  91.75
                     100  69.35  92.70  97.95  99.95 100.00 | 100.00  99.55  96.20  94.70  99.00
S(4)[ms]  J(4)[ms]    10  17.15  20.85  25.20  36.80  58.75 |  69.70  41.65  35.30  35.45  38.20
                      20  24.95  38.00  49.90  66.35  89.85 |  90.95  62.15  52.75  51.70  65.30
                      30  32.90  50.30  65.30  83.55  98.00 |  96.20  70.95  61.95  62.10  79.85
                      50  48.70  70.55  86.30  97.10 100.00 |  99.45  86.05  77.10  76.15  92.95
                     100  71.00  93.25  98.15  99.95 100.00 | 100.00  98.00  93.35  94.05  99.25
S(4)[ps]  Z(4)        10  17.35  22.10  26.05  39.10  62.30 |  67.60  38.30  31.90  31.40  35.10
                      20  25.20  37.00  49.50  66.60  90.15 |  93.60  63.65  50.30  47.55  59.90
                      30  33.60  49.60  63.95  82.20  97.95 |  98.05  74.60  61.40  59.15  74.60
                      50  47.90  69.10  85.40  96.60 100.00 |  99.95  89.45  79.85  73.55  89.70
                     100  70.85  92.90  98.05  99.95 100.00 | 100.00  99.20  94.85  92.45  98.75

Note: See the note of Table 2.

Table 5: Power Under Homogeneous Alternative - DGP2: Common Factors, Spatial Dependence
Overall EPA Tests (left panel) and Joint EPA Tests (right panel)

| Overall EPA Tests | n \ T | 10 | 20 | 30 | 50 | 100 | Joint EPA Tests | n \ T | 10 | 20 | 30 | 50 | 100 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $S^{(2)}_{n,T}$ | 10 | 91.80 | 98.90 | 99.85 | 100.00 | 100.00 | $J^{(2)}_{n,T}$ | 10 | 86.35 | 97.20 | 99.75 | 100.00 | 100.00 |
| | 20 | 92.75 | 99.20 | 99.95 | 100.00 | 100.00 | | 20 | 93.35 | 99.40 | 100.00 | 100.00 | 100.00 |
| | 30 | 95.60 | 99.50 | 100.00 | 100.00 | 100.00 | | 30 | 96.70 | 99.90 | 100.00 | 100.00 | 100.00 |
| | 50 | 97.10 | 99.85 | 100.00 | 100.00 | 100.00 | | 50 | 99.55 | 99.95 | 100.00 | 100.00 | 100.00 |
| | 100 | 98.45 | 99.95 | 100.00 | 100.00 | 100.00 | | 100 | 99.90 | 100.00 | 100.00 | 100.00 | 100.00 |
| $S^{(2)}_{n,T}[ms]$ | 10 | 93.80 | 99.30 | 99.95 | 100.00 | 100.00 | $J^{(2)}_{n,T}[ms]$ | 10 | 89.90 | 98.40 | 99.85 | 100.00 | 100.00 |
| | 20 | 96.30 | 99.55 | 100.00 | 100.00 | 100.00 | | 20 | 94.80 | 99.75 | 100.00 | 100.00 | 100.00 |
| | 30 | 97.60 | 99.75 | 100.00 | 100.00 | 100.00 | | 30 | 97.35 | 99.75 | 100.00 | 100.00 | 100.00 |
| | 50 | 98.25 | 99.95 | 100.00 | 100.00 | 100.00 | | 50 | 99.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| | 100 | 99.15 | 100.00 | 100.00 | 100.00 | 100.00 | | 100 | 99.95 | 100.00 | 100.00 | 100.00 | 100.00 |
| $S^{(2)}_{n,T}[ps]$ | 10 | 88.75 | 98.30 | 99.85 | 100.00 | 100.00 | $Z^{(2)}_{n,T}$ | 10 | 85.90 | 97.10 | 99.75 | 100.00 | 100.00 |
| | 20 | 93.55 | 99.30 | 100.00 | 100.00 | 100.00 | | 20 | 92.80 | 99.40 | 100.00 | 100.00 | 100.00 |
| | 30 | 95.50 | 99.60 | 100.00 | 100.00 | 100.00 | | 30 | 96.25 | 99.90 | 100.00 | 100.00 | 100.00 |
| | 50 | 96.25 | 99.85 | 100.00 | 100.00 | 100.00 | | 50 | 99.40 | 99.95 | 100.00 | 100.00 | 100.00 |
| | 100 | 98.35 | 99.90 | 100.00 | 100.00 | 100.00 | | 100 | 99.90 | 100.00 | 100.00 | 100.00 | 100.00 |
| $S^{(3)}_{n,T}$ | 10 | 74.45 | 93.05 | 98.95 | 99.95 | 100.00 | $J^{(3)}_{n,T}$ | 10 | | 99.70 | 99.85 | 100.00 | 100.00 |
| | 20 | 74.90 | 94.75 | 99.35 | 100.00 | 100.00 | | 20 | | | 100.00 | 100.00 | 100.00 |
| | 30 | 77.65 | 95.60 | 99.25 | 100.00 | 100.00 | | 30 | | | | 100.00 | 100.00 |
| | 50 | 78.10 | 95.35 | 99.45 | 99.95 | 100.00 | | 50 | | | | | 100.00 |
| | 100 | 78.95 | 96.60 | 99.65 | 100.00 | 100.00 | | 100 | | | | | |
| $S^{(4)}_{n,T}$ | 10 | 73.75 | 92.85 | 98.90 | 99.95 | 100.00 | $J^{(4)}_{n,T}$ | 10 | 98.00 | 99.25 | 99.85 | 100.00 | 100.00 |
| | 20 | 74.40 | 94.55 | 99.30 | 100.00 | 100.00 | | 20 | 99.95 | 100.00 | 100.00 | 100.00 | 100.00 |
| | 30 | 77.35 | 95.45 | 99.20 | 100.00 | 100.00 | | 30 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| | 50 | 77.70 | 95.25 | 99.45 | 99.95 | 100.00 | | 50 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| | 100 | 78.85 | 96.55 | 99.65 | 100.00 | 100.00 | | 100 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| $S^{(4)}_{n,T}[ms]$ | 10 | 73.70 | 92.80 | 98.90 | 99.95 | 100.00 | $J^{(4)}_{n,T}[ms]$ | 10 | 98.00 | 99.35 | 99.90 | 100.00 | 100.00 |
| | 20 | 74.45 | 94.55 | 99.30 | 100.00 | 100.00 | | 20 | 99.90 | 100.00 | 100.00 | 100.00 | 100.00 |
| | 30 | 77.35 | 95.50 | 99.20 | 100.00 | 100.00 | | 30 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| | 50 | 77.75 | 95.35 | 99.45 | 99.95 | 100.00 | | 50 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| | 100 | 78.85 | 96.55 | 99.65 | 100.00 | 100.00 | | 100 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| $S^{(4)}_{n,T}[ps]$ | 10 | 73.65 | 92.95 | 98.95 | 99.95 | 100.00 | $Z^{(4)}_{n,T}$ | 10 | 97.75 | 99.20 | 99.85 | 100.00 | 100.00 |
| | 20 | 74.60 | 94.50 | 99.20 | 100.00 | 100.00 | | 20 | 99.90 | 100.00 | 100.00 | 100.00 | 100.00 |
| | 30 | 77.45 | 95.45 | 99.20 | 100.00 | 100.00 | | 30 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| | 50 | 77.80 | 95.25 | 99.45 | 99.95 | 100.00 | | 50 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| | 100 | 78.85 | 96.55 | 99.65 | 100.00 | 100.00 | | 100 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |

Note: See the note of Table 2.

Table 6: DM Test Statistics for Each Country – OECD Sample (1998-2016)

Columns: Country, Absolute Loss, Quadratic Loss.

| Country | Absolute Loss | Quadratic Loss | Country | Absolute Loss | Quadratic Loss |
|---|---|---|---|---|---|
| AUS | -0.6155 (0.5382) | -0.4050 (0.6855) | ISL | -0.5325 (0.5943) | -0.4712 (0.6375) |
| AUT | 0.6885 (0.4912) | 0.1479 (0.8824) | ITA | 1.0697 (0.2848) | 1.1711 (0.2415) |
| BEL | 2.0138 (0.0440) | 1.6625 (0.0964) | JPN | 1.4231 (0.1547) | 1.0345 (0.3009) |
| CAN | 1.7833 (0.0745) | 1.5011 (0.1333) | KOR | 0.5976 (0.5501) | 0.5560 (0.5782) |
| CHE | 1.0464 (0.2954) | 1.0980 (0.2722) | LUX | 0.9249 (0.3550) | 1.0136 (0.3108) |
| CZE | -1.0617 (0.2884) | -1.0003 (0.3172) | MEX | -0.5196 (0.6034) | -0.3816 (0.7027) |
| DEU | -0.3686 (0.7124) | -1.1310 (0.2581) | NLD | 0.0813 (0.9352) | 0.8709 (0.3838) |
| DNK | 0.0445 (0.9645) | -0.7032 (0.4819) | NOR | 0.0084 (0.9933) | -0.6276 (0.5302) |
| ESP | -1.6955 (0.0900) | -1.6919 (0.0907) | NZL | -2.0726 (0.0382) | -1.5350 (0.1248) |
| FIN | 0.4240 (0.6716) | 0.1252 (0.9003) | POL | -0.4466 (0.6552) | -0.9600 (0.3370) |
| FRA | 1.4205 (0.1555) | 1.4507 (0.1469) | PRT | -0.0675 (0.9461) | 0.1274 (0.8987) |
| GBR | -0.2435 (0.8076) | -1.1233 (0.2613) | SWE | -0.6610 (0.5086) | -0.1636 (0.8701) |
| GRC | -1.0708 (0.2843) | -1.4509 (0.1468) | TUR | -0.0736 (0.9414) | -0.3015 (0.7630) |
| HUN | 2.3868 (0.0170) | 1.8742 (0.0609) | USA | 0.2005 (0.8411) | 0.0081 (0.9935) |
| IRL | 0.4724 (0.6366) | 0.6562 (0.5117) | | | |

Note: The statistics are calculated as $S^{(0)}_{i,T} = \sqrt{T}\,(\Delta \bar{L}_{i,T} / \hat{\sigma}_{i,T}) \xrightarrow{D} N(0,1)$, where $\hat{\sigma}_{i,T}$ is computed as in (11) with a bandwidth equal to zero. The values shown in parentheses are p-values.

Table 7: CD Tests Results

| | OECD Sample: Absolute Loss | OECD Sample: Quadratic Loss | G7 Sample: Absolute Loss | G7 Sample: Quadratic Loss |
|---|---|---|---|---|
| Original Data: BP Test | | | | |
| Original Data: Modified BP Test | | | | |
| Defactored Data: BP Test | | | | |
| Defactored Data: Modified BP Test | | | | |

... p-values.

Table 8: Factor Loadings in Quadratic Loss Differential Series in OECD Sample

Columns: Country, Factor 1, Factor 2, Factor 3.

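| Country | Factor 1 | Factor 2 | Factor 3 | Country | Factor 1 | Factor 2 | Factor 3 |
|---|---|---|---|---|---|---|---|
| AUS | 0.19 | 0.66 | -0.02 | ISL | 0.04 | 0.21 | -0.24 |
| AUT | -0.43 | -0.21 | -0.27 | ITA | 0.86 | -0.26 | 0.31 |
| BEL | 0.05 | -0.85 | 0.22 | JPN | 0.27 | 0.41 | -0.44 |
| CAN | -0.40 | -0.46 | -0.09 | KOR | 0.67 | 0.48 | -0.05 |
| CHE | 0.71 | -0.26 | 0.41 | LUX | 0.44 | -0.48 | -0.33 |
| CZE | 0.25 | -0.05 | -0.37 | MEX | 0.83 | -0.09 | 0.00 |
| DEU | 0.69 | 0.56 | -0.15 | NLD | 0.65 | 0.04 | 0.19 |
| DNK | -0.13 | -0.06 | 0.75 | NOR | -0.64 | 0.57 | -0.17 |
| ESP | -0.65 | 0.50 | 0.15 | NZL | -0.10 | 0.28 | 0.84 |
| FIN | 0.77 | 0.49 | -0.07 | POL | -0.06 | -0.55 | 0.02 |
| FRA | 0.30 | -0.24 | 0.66 | PRT | 0.16 | -0.18 | -0.07 |
| GBR | -0.12 | 0.70 | 0.41 | SWE | 0.50 | -0.18 | -0.56 |
| GRC | -0.21 | -0.49 | -0.16 | TUR | 0.18 | -0.06 | -0.04 |
| HUN | 0.34 | -0.85 | 0.24 | USA | 0.61 | 0.44 | 0.45 |
| IRL | 0.67 | -0.03 | -0.49 | | | | |

Table 9: Empirical Results of the Panel Tests for the EPA Hypotheses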

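| Overall Tests | Absolute Loss | Quadratic Loss | Joint Tests | Absolute Loss | Quadratic Loss |
|---|---|---|---|---|---|
| OECD Sample | | | | | |
| $S^{(1)}_{n,T}$ | | | $J^{(1)}_{n,T}$ | | |
| $S^{(4)}_{n,T}$ | | | $J^{(4)}_{n,T}$ | | |
| $S^{(1)}_{n,T}$ | -0.498 | -1.187 | $J^{(1)}_{n,T}$ | | |
| $S^{(3)}_{n,T}$ | -0.371 | -1.089 | $J^{(3)}_{n,T}$ | | |

... p-values.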
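The note to Table 6 defines the country-level DM statistic as $\sqrt{T}\,\Delta \bar{L}_{i,T} / \hat{\sigma}_{i,T}$ with a bandwidth of zero and a standard normal limit. The following Python sketch illustrates that calculation for a single unit. It rests on assumptions that are not confirmed by the text shown here: with zero bandwidth, the variance estimate in (11) is taken to reduce to the lag-0 sample variance of the loss differential (divisor $T$), the differential is taken as the first loss series minus the second, and the p-value is two-sided. The function names and the example data are hypothetical.

```python
import math
import numpy as np

def two_sided_normal_pvalue(stat):
    """Two-sided p-value of a statistic under the N(0, 1) limit."""
    return math.erfc(abs(stat) / math.sqrt(2.0))

def dm_statistic(loss_a, loss_b):
    """Return (statistic, p-value) for one unit's two loss series.

    Assumption: with zero bandwidth, the variance estimate uses only the
    lag-0 autocovariance of the loss differential (divisor T).
    """
    d = np.asarray(loss_a, dtype=float) - np.asarray(loss_b, dtype=float)
    T = d.size
    d_bar = d.mean()
    sigma_hat = math.sqrt(np.mean((d - d_bar) ** 2))
    stat = math.sqrt(T) * d_bar / sigma_hat
    return stat, two_sided_normal_pvalue(stat)

# Illustrative forecast errors only (not data from the paper).
errors_a = np.array([0.5, 1.2, 0.3, 0.9, 1.1])
errors_b = np.array([0.4, 1.0, 0.6, 0.7, 1.0])
stat, p = dm_statistic(np.abs(errors_a), np.abs(errors_b))  # absolute loss
print(f"DM statistic = {stat:.4f}, p-value = {p:.4f}")
```

As a consistency check against Table 6, `two_sided_normal_pvalue(-0.6155)` is approximately 0.538, in line with the value (0.5382) reported for AUS under absolute loss.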

Figure 1: Size Adjusted Power of Selected Tests Under Different Alternative Hypotheses for DGP2 (5% Nominal Size). Panels: (a) $S^{(4)}_{n,T}$, Homogeneous Alternative; (b) $S^{(4)}_{n,T}$, Heterogeneous Alternative; (c) $J^{(4)}_{n,T}$, Homogeneous Alternative; (d) $J^{(4)}_{n,T}$, Heterogeneous Alternative. Horizontal axis: T (10 to 100). Vertical axis: Size Adjusted Power (Nominal value %). Curves: n = 10, 20, 30, 50, 100.
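Figure 1 reports size adjusted power at a 5% nominal size. The adjustment procedure is not described in this part of the text, so the Python sketch below shows one common approach as an assumption: the asymptotic critical value is replaced by the empirical quantile of the statistic simulated under the null, and power is the rejection frequency under the alternative against that critical value. The function name, the `two_sided` switch, and the simulated draws are hypothetical.

```python
import numpy as np

def size_adjusted_power(stats_null, stats_alt, nominal=0.05, two_sided=True):
    """Rejection frequency (in %) using an empirical critical value.

    stats_null: statistics simulated under the null hypothesis
    stats_alt:  statistics simulated under the alternative
    """
    null = np.asarray(stats_null, dtype=float)
    alt = np.asarray(stats_alt, dtype=float)
    if two_sided:
        # Normal-type statistics: reject for large absolute values.
        null, alt = np.abs(null), np.abs(alt)
    # Empirical critical value: (1 - nominal) quantile under the null.
    crit = np.quantile(null, 1.0 - nominal)
    return 100.0 * float(np.mean(alt > crit))

# Illustrative draws only: N(0, 1) under the null, shifted mean otherwise.
rng = np.random.default_rng(0)
null_draws = rng.standard_normal(2000)
alt_draws = rng.standard_normal(2000) + 2.0
print(size_adjusted_power(null_draws, alt_draws))
```

For statistics that reject only in the upper tail, such as chi-square type joint tests, `two_sided=False` applies the same quantile rule to the raw values.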

[Figure: horizontal axis Year; vertical axis Forecast errors (-4 to 4); legend: Crisis, Factor 1, Factor 2, Factor 3]