Cointegrating Polynomial Regressions with Power Law Trends: A New Angle on the Environmental Kuznets Curve

Abstract

The Environmental Kuznets Curve (EKC) predicts an inverted U-shaped relationship between economic growth and environmental pollution. Current analyses frequently employ models which restrict the nonlinearities in the data to be explained by the economic growth variable only. We propose a Generalized Cointegrating Polynomial Regression (GCPR) with flexible time trends to proxy time effects such as technological progress and/or environmental awareness. More specifically, a GCPR includes flexible powers of deterministic trends and integer powers of stochastic trends. We estimate the GCPR by nonlinear least squares and derive its asymptotic distribution. Endogeneity of the regressors can introduce nuisance parameters into this limiting distribution, but a simulated approach nevertheless enables us to conduct valid inference. Moreover, a subsampling KPSS test can be used to check the stationarity of the errors. A comprehensive simulation study shows good performance of the simulated inference approach and the subsampling KPSS test. We illustrate the GCPR approach on a dataset of 18 industrialised countries containing GDP and CO2 emissions. We conclude that: (1) the evidence for an EKC is significantly reduced when a nonlinear time trend is included, and (2) a linear cointegrating relation between GDP and CO2 around a power law trend also provides an accurate description of the data.


Yicong Lin† and Hanno Reuvers
Department of Econometrics and Data Science, Vrije Universiteit Amsterdam, 1081 HV Amsterdam, The Netherlands
Department of Econometrics, Erasmus University Rotterdam, 3062 PA Rotterdam, The Netherlands
September 7, 2020

JEL Classification: C12, C13, C32, O44, Q20

Keywords: Cointegration Testing, Environmental Kuznets Curve, Generalized Cointegrating Polynomial Regression, Nonlinear Least Squares, Power Law Trends

∗ Earlier versions of this paper have been presented at the Econometrics Internal Seminar (EIS) at Erasmus University Rotterdam, and at the 2019 CFE meeting in London. We gratefully acknowledge the comments by the participants. We extend our thanks to Eric Beutner, Dick van Dijk, and Stephan Smeekes for their valuable feedback.
† Corresponding author: [email protected]

1 Introduction

On page 370 of their seminal paper, Grossman and Krueger (1995) conclude:

“Contrary to the alarmist cries of some environmental groups, we find no evidence that economic growth does unavoidable harm to the natural habitat. Instead we find that while increases in GDP may be associated with worsening environmental conditions in very poor countries, air and water quality appear to benefit from economic growth once some critical level of income has been reached.”

The quote above suggests an inverted U-shaped relationship between environmental degradation and economic growth. This relationship is currently known as the Environmental Kuznets Curve (EKC) and it forms an active research area. Indeed, some 25 years after its first conception, there now exists a rich literature that (1) reports empirical evidence on the existence or nonexistence of the EKC, (2) provides economic theory to explain the EKC, and/or (3) refines the econometric tools that are used to analyse the EKC.¹ The Web of Science returns a list of over 2,900 articles when the search query “Environmental Kuznets Curve” is entered.²

The studies on the EKC have been criticised on two main points. First, the GDP variable was initially treated as a stationary variable even though unit root tests do not reject the null hypothesis of a unit root. This has further consequences since EKC regressions include higher integer powers of GDP as well. The combination of nonstationarity and nonlinearity places the EKC in the nonlinear cointegration literature, and appropriate econometric techniques should be employed. Such techniques have been provided in Wagner (2015) and Wagner and Hong (2016) under the name of Cointegrating Polynomial Regressions (CPRs), that is, regressions containing deterministic variables, integrated processes, and integer powers of integrated processes. These CPRs are estimated by fully modified OLS to allow for standard inference on the coefficients.³ Model diagnostics and a multiple-country analysis are discussed in Wang et al. (2018) and Wagner et al. (2019), respectively.

Second, there is an ongoing debate in the EKC literature on the model specification, and more specifically, on omitted variables. Omitted variables are a valid concern because adoption of clean technology, pollution control policy, increasing energy efficiency, and increasing environmental awareness may all influence pollution levels, yet they are also difficult to quantify (and for that reason often excluded from the reduced-form model).⁴ ⁵ It has been argued that the inclusion of deterministic time trends will control for such omitted variables. In empirical applications, this typically translates into the assumption that linear deterministic trends control for omitted variables and provide a valid EKC specification. On the contrary, we will argue here, and later also in the empirical application, that omitted nonlinear trends are more likely to result in erroneous EKC results.

¹ Further references to these specific areas of research can be found in the review articles by Dasgupta et al. (2002), Stern (2004), and Carson (2009), among others.
² Web of Science, accessed on August 27, 2020, http://
³ The recent work by Stypka et al. (2017) confirms the importance of treating the growth variable as nonstationary. However, it appears less important to use an estimation procedure that incorporates the fact that several integer powers of the same integrated process appear as regressors. Namely, Stypka et al. (2017) also find that the “standard estimator”, which treats higher order powers of the integrated regressor as additional I(1) variables, has the same limiting distribution as the CPR estimator (yet a slightly worse finite sample performance).
⁴ Nordhaus (2014) discusses the link between climate change and technological changes. As another example, Figure 2 in Gillingham and Stock (2018) reports a steady decline in the price of solar panels and a steady growth in solar panel sales. Cheaper solar energy can substitute fossil energy, thereby reducing pollution.
⁵ A policy variable, ‘Repudiation of Contracts by Government’, was included by Panayotou (1997) to proxy the quality of environmental policies and institutions.

The small simulation setting in Table 1 illustrates this point. We consider an omitted nonlinear deterministic trend and estimate an EKC type of regression: $y_t = \tau_1 + \tau_2 t + \phi_1 x_t + \phi_2 x_t^2 + u_t$. We test $H_0: \phi_2 \geq 0$ against $H_a: \phi_2 < 0$, since a negative coefficient in front of $x_t^2$ is the typical result economists associate with the existence of the EKC. It is seen how a (correctly sized) Wald test mistakes the negative curvature of a deterministic trend for negative curvature caused by a squared integrated variable. In other words, a negative and significant coefficient in front of the square of GDP might be caused by omitted nonlinear deterministic trends rather than being any evidence for the EKC.

The current paper augments the Cointegrating Polynomial Regressions of Wagner and Hong (2016) with power law deterministic trends. The powers of these time trends are estimated and thereby provide additional flexibility in explaining the nonlinear trending behaviour observed in the data. We provide the limiting distribution of the estimator and propose a simulated approach for parameter inference. Additionally, we show how a KPSS-type test remains useful in verifying the stationarity of the error process, hence avoiding spurious results or misspecification of the cointegrating relation. A Monte Carlo study sheds light on the finite sample properties of the simulated approach and the stationarity test.
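The mechanism behind Table 1 can be illustrated with a small simulation. The sketch below is not the paper's Table 1 design (its parameter values are not reproduced here). It assumes a hypothetical data generating process $y_t = \tau t^{\theta} + x_t + u_t$, in which $x_t$ is a random walk with a small drift (an assumption meant to mimic trending log GDP) and $u_t$ is i.i.d. normal and independent of $x_t$. The fitted EKC regression contains an intercept, a linear trend, $x_t$ and $x_t^2$, and the one-sided test uses the OLS $t$-statistic for $\phi_2$ with the 5% normal critical value. The function name `ekc_rejection_rate` and all default values are illustrative.

```python
import numpy as np


def ekc_rejection_rate(theta, tau=1.0, drift=0.2, T=200, reps=500, seed=1):
    """Share of replications in which a one-sided 5% t-test rejects
    H0: phi_2 >= 0 in favour of Ha: phi_2 < 0.

    Hypothetical design: y_t = tau * t**theta + x_t + u_t, so the true phi_2 is 0,
    while the fitted EKC regression only contains an intercept and a linear trend.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(1, T + 1, dtype=float)
    rejections = 0
    for _ in range(reps):
        x = np.cumsum(drift + rng.standard_normal(T))  # trending I(1) regressor
        u = rng.standard_normal(T)                      # stationary error, independent of x
        y = tau * t**theta + x + u
        Z = np.column_stack([np.ones(T), t, x, x**2])   # regress y_t on 1, t, x_t, x_t^2
        coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ coef
        s2 = resid @ resid / (T - Z.shape[1])
        se_phi2 = np.sqrt(s2 * np.linalg.inv(Z.T @ Z)[3, 3])
        rejections += int(coef[3] / se_phi2 < -1.645)   # 5% one-sided normal critical value
    return rejections / reps


print(ekc_rejection_rate(theta=1.0))  # trend correctly captured by tau_1 + tau_2 t
print(ekc_rejection_rate(theta=0.5))  # omitted concave trend
```

With $\theta = 1$ the linear trend in the regression is correctly specified and, because the errors are normal and independent of $x_t$, the $t$-statistic is exactly $t$-distributed under $\phi_2 = 0$, so the rejection rate should be close to 5%. With $\theta = 0.5$ the concave trend is omitted, and any excess rejection reflects the mechanism described above rather than an EKC.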
As an empirical application, we revisit a dataset on 18 countries over the timespan 1870-2014 and study the Environmental Kuznets Curve. For each of these countries, we find that the flexible deterministic trends sufficiently capture the nonlinearities in the data and render higher integer powers of log per capita GDP redundant.

Our paper builds upon several different strands of literature. Clearly, we rely on results from the literature on Cointegrating Polynomial Regressions (see references above). Additionally, various references on power law trends are closely related to the current work. Phillips (2007) provides a detailed analysis of the power law trend regression. An extension of such power law regressions to spatial lattices is covered in Robinson (2012). Finally, there is also recent work by Hu et al. (2019) on power law functions applied to stochastic trends.

This paper is organized as follows. Section 2 introduces the model and the estimation framework. Asymptotic properties of the estimators and parameter inference are discussed in Section 3. The Monte Carlo simulations in Section 4 compare asymptotic results and the finite sample distributions. An in-depth discussion of the Environmental Kuznets Curve can be found in Section 5. Section 6 concludes. All proofs can be found in the Appendix.

Finally, some words on notation. The integer part of the number $a \in \mathbb{R}_+$ is denoted by $[a]$. For a vector $x \in \mathbb{R}^n$, its $p$-norm is denoted by $\|x\|_p = \big(\sum_{i=1}^n |x_i|^p\big)^{1/p}$. For a matrix $A$, say of dimension $(n \times m)$, the induced $p$-norm and Frobenius norm are defined as $\|A\|_p = \sup_{x \neq 0} \|Ax\|_p / \|x\|_p$ and $\|A\|_F = \sqrt{\sum_{i=1}^n \sum_{j=1}^m a_{ij}^2}$, respectively. For $p$-norms, we will omit the subscripts whenever $p = 2$. $I_n$ denotes the $(n \times n)$ identity matrix.
Two special linear algebra operators are the Hadamard product (element-wise multiplication), denoted by “$\odot$”, and the Kronecker product, denoted by “$\otimes$”. We omit the integration bounds whenever the integral is taken over $[0, 1]$. The symbol “$\Rightarrow$” signifies weak convergence, “$\overset{d}{=}$” stands for equality in distribution, and “$\xrightarrow{p}$” and “$\xrightarrow{d}$” denote convergence in probability and in distribution. If convergence occurs conditionally on the sample, then we add a superscript “*” to the standard notation. The probabilistic Landau symbols are $O_p(\cdot)$ and $o_p(\cdot)$. Finally, the generic constant $C$ can change from line to line.

Our model specification is a hybrid of the power law regressions from Robinson (2012) and the cointegrating polynomial regression (CPR) introduced by Wagner and Hong (2016). It combines integrated regressors (and their integer powers) with a flexible deterministic trend specification. The resulting Generalized Cointegrating Polynomial Regression (GCPR) for $y_t$ is given by

$$y_t = \sum_{i=1}^{d} \tau_i t^{\theta_i} + \sum_{i=1}^{m} \sum_{j=1}^{p_i} \phi_{ij} x_{it}^{j} + u_t = d_t(\theta)'\tau + \sum_{i=1}^{m} x_{(i)t}'\phi_i + u_t = d_t(\theta)'\tau + s_t'\phi + u_t, \tag{2.1}$$

where $\theta = [\theta_1, \theta_2, \ldots, \theta_d]'$, $\tau = [\tau_1, \tau_2, \ldots, \tau_d]'$, $\phi_i = [\phi_{i1}, \phi_{i2}, \ldots, \phi_{i,p_i}]'$, $d_t(\theta) = [t^{\theta_1}, \ldots, t^{\theta_d}]'$, and $x_{(i)t} = [x_{it}, x_{it}^2, \ldots, x_{it}^{p_i}]'$ collects the integer powers of the $i$th integrated regressor ($i = 1, 2, \ldots, m$). The final equality in (2.1) relies on the definitions $s_t = [x_{(1)t}', x_{(2)t}', \ldots, x_{(m)t}']'$ and $\phi = [\phi_1', \phi_2', \ldots, \phi_m']'$. The error term $u_t$ is stationary (see Assumption 2 for more details).

We consider nonlinear least squares (NLS) estimators of the unknown parameters in (2.1). As such, we define the objective function $Q_T(\theta, \tau, \phi) = \sum_{t=1}^{T} \big(y_t - d_t(\theta)'\tau - s_t'\phi\big)^2$ and compute

$$\big(\hat\theta_T, \hat\tau_T, \hat\phi_T\big) = \arg\min_{(\theta, \tau, \phi) \in \Theta \times \mathbb{R}^d \times \mathbb{R}^p} Q_T(\theta, \tau, \phi), \tag{2.2}$$

where $p = \sum_{i=1}^{m} p_i$,
$$\Theta = \big\{ \theta_1, \theta_2, \ldots, \theta_d : -\tfrac{1}{2} < \theta_L \leq \theta_1;\ \theta_j - \theta_{j-1} \geq \delta,\ j = 2, \ldots, d;\ \theta_d \leq \theta_U < \infty \big\} \subset \mathbb{R}^d,$$
for some lower bound $\theta_L$, some upper bound $\theta_U$, and $\delta > 0$. Note that $\Theta$ is closed and bounded and therefore compact.

The optimization problem in (2.2) is easy to solve. For any given $\theta \in \Theta$, the minimizers for $\tau$ and $\phi$ can be found from the OLS regression

$$\begin{bmatrix} \tau(\theta) \\ \phi(\theta) \end{bmatrix} = \Big( \sum_{t=1}^{T} z_t(\theta) z_t(\theta)' \Big)^{-1} \Big( \sum_{t=1}^{T} z_t(\theta) y_t \Big), \quad \text{with } z_t(\theta) = [d_t(\theta)', s_t']'. \tag{2.3}$$

We can thus minimize the concentrated criterion function $\tilde{Q}_T(\theta) = Q_T\big(\theta, \tau(\theta), \phi(\theta)\big)$ to obtain $\hat\theta_T$ and run a subsequent OLS regression to find $\hat\tau_T$ and $\hat\phi_T$.

Remark 1

The fixed powers of $x_{it}$ allow us to test for their significance and thereby distinguish between nonlinearities caused by deterministic and/or stochastic trends. This is important for our empirical application on the Environmental Kuznets Curve, see Section 5. Hu et al. (2019) study a model with a flexible power of the integrated regressor. That is, these authors derive the limiting distribution of the NLS estimators for $\beta$ and $\gamma$ when $y_t = \beta |x_t|^{\gamma} + u_t$ with $\beta \neq 0$.

Remark 2

Equations (2.1)-(2.2) assume that each of the $d$ elements in $\theta$ needs to be estimated. First, we envision model specifications where $d$ is small such that the deterministic trends cannot represent the integrated regressors (see, e.g., Phillips (1998)). Second, the applied researcher might prefer to fix certain elements of this vector. For example, requiring $\theta_1 = 0$ and $\theta_2 = 1$ includes an intercept and a linear trend in the model. A typical model specification could be

$$y_t = \tau_1 + \tau_2 t + \tau_3 t^{\theta_3} + s_t'\phi + u_t. \tag{2.4}$$

Our approach and proofs are easily adapted to the case in which some elements of $\theta$ are prespecified.

We subsequently study the asymptotic properties of the NLS estimators. To this end, we first collect all the unknown parameters in the vector $\gamma = [\theta', \tau', \phi']'$. This vector is assumed to be an element of the parameter space $\Gamma = \Theta \times \mathbb{R}^{d+p}$. The true parameter vector is $\gamma_0 = [\theta_0', \tau_0', \phi_0']'$.

Assumption 1

For all $1 \leq i \leq d$, we have $\tau_i \neq 0$.

Assumption 2

Let $\zeta_t = [\eta_t', \varepsilon_t']'$ be a sequence of i.i.d. random vectors with $E(\zeta_t) = 0$, $\Sigma = E(\zeta_t \zeta_t')$, and $E\|\zeta_t\|^q < \infty$ for some $q > \ldots$.
(a) $u_t = \sum_{k=0}^{\infty} \psi_k \eta_{t-k}$ with $\sum_{k=0}^{\infty} k|\psi_k| < \infty$.
(b) $x_t = \sum_{s=1}^{t} v_s$, where $v_t = \sum_{k=0}^{\infty} \Psi_k \varepsilon_{t-k}$ with $\sum_{k=0}^{\infty} \|\Psi_k\| < \infty$ and $\det\big(\sum_{k=0}^{\infty} \Psi_k\big) \neq 0$.

The first assumption is needed to avoid identification issues. That is, if $\tau_i = 0$ for some $i \in \{1, \ldots, d\}$, then the corresponding $\theta_i$ is not identified and the Davies problem arises when testing $H_0: \tau_i = 0$. We avoid these difficulties in the current paper, and this is reflected in our model specification (2.1). That is, we consider flexible powers of the deterministic trends but fixed powers of the stochastic trends, thus allowing us to test zero restrictions on (elements of) $\phi$. This is of crucial importance in the EKC application to see whether nonlinear effects in the economic growth variable remain significant after nonlinear time trends have been added to the model. For different model settings, Assumption 1 has been relaxed in the literature. Baek et al. (2015) and Cho and Phillips (2018) study the asymptotic behaviour of a quasi-likelihood ratio test when Assumption 1 is violated and the conditional mean of the data contains strictly stationary regressors and a flexible time trend. Alternatively, one can use drifting parameter sequences with different identification strengths as in Andrews and Cheng (2012).

Assumption 2 excludes cointegration among elements of $x_t$ and defines it as the partial sum of a short memory process. The latter implies that

$$\frac{1}{\sqrt{T}} \sum_{t=1}^{[rT]} \begin{bmatrix} u_t \\ v_t \end{bmatrix} \Rightarrow B(r) = \begin{bmatrix} B_u(r) \\ B_v(r) \end{bmatrix}, \tag{3.1}$$

where $B(r)$ denotes an $(m+1)$-dimensional Brownian motion with long-run covariance matrix $\Omega = \begin{bmatrix} \Omega_{uu} & \Omega_{uv} \\ \Omega_{vu} & \Omega_{vv} \end{bmatrix}$.
The one-sided long-run covariance matrix

$$\Delta = \sum_{h=0}^{\infty} E\begin{bmatrix} u_t u_{t+h} & u_t v_{t+h}' \\ v_t u_{t+h} & v_t v_{t+h}' \end{bmatrix} = \begin{bmatrix} \Delta_{uu} & \Delta_{uv} \\ \Delta_{vu} & \Delta_{vv} \end{bmatrix}$$

is partitioned similarly. Subscripts are used to refer to specific elements. For example, $B_{v_i}$ and $\Delta_{v_i u}$ denote the $i$th elements of $B_v$ and $\Delta_{vu}$, respectively.

A concise exposition of our results requires additional notation. An enumeration of various definitions is presented below.

(1) Introduce scaling matrices: $D_{d,T}(\theta) = \mathrm{diag}[T^{\theta_1}, T^{\theta_2}, \ldots, T^{\theta_d}]$ for the time trends and their coefficients, and $D_{s,T} = \mathrm{diag}[D_{(1),T}, \ldots, D_{(m),T}]$ for the integer powers of I(1) regressors, where $D_{(i),T} = \mathrm{diag}[T^{1/2}, T, \ldots, T^{p_i/2}]$. Moreover, we define two $(2d+p) \times (2d+p)$ nonrandom block matrices $L_{\tau,T}$ and $D_{\theta,T}$ such that
$$L_{\tau,T} = \begin{bmatrix} I_d & -\mathrm{diag}[\tau]\ln T & 0 \\ 0 & I_d & 0 \\ 0 & 0 & I_p \end{bmatrix}, \qquad D_{\theta,T} = \sqrt{T} \begin{bmatrix} D_{d,T}(\theta) & 0 & 0 \\ 0 & D_{d,T}(\theta) & 0 \\ 0 & 0 & D_{s,T} \end{bmatrix},$$
and
$$G_{\gamma,T} = D_{\theta,T} L_{\tau,T}'^{-1} = \sqrt{T} \begin{bmatrix} D_{d,T}(\theta) & 0 & 0 \\ D_{d,T}(\theta)\, \mathrm{diag}[\tau] \ln T & D_{d,T}(\theta) & 0 \\ O_{p \times d} & O_{p \times d} & D_{s,T} \end{bmatrix}.$$
(2) Define vectors $d(r;\theta) = [r^{\theta_1}, r^{\theta_2}, \ldots, r^{\theta_d}]'$, $B_{(i)}(r) = [B_{v_i}(r), B_{v_i}^2(r), \ldots, B_{v_i}^{p_i}(r)]'$ and their stacked random vector process $j(r;\gamma) = \big[(\tau \odot d(r;\theta))' \ln(r),\ d(r;\theta)',\ B_{(1)}'(r), \ldots, B_{(m)}'(r)\big]'$.
(3) For the second-order bias terms, we define $b_i = \big[1,\ 2\int B_{v_i}(r)\,dr,\ \ldots,\ p_i \int B_{v_i}^{p_i-1}(r)\,dr\big]'$ and $B_{vu} = \big[0_{d\times 1}',\ 0_{d\times 1}',\ b_1'\Delta_{v_1 u},\ \ldots,\ b_m'\Delta_{v_m u}\big]'$.

Theorem 1

Under Assumptions 1-2, we have
$$G_{\gamma_0,T}\big(\hat\gamma_T - \gamma_0\big) \Rightarrow \Big(\int j(r;\gamma_0)\, j(r;\gamma_0)'\, dr\Big)^{-1} \Big(\int j(r;\gamma_0)\, dB_u(r) + B_{vu}\Big), \quad \text{as } T \to \infty.$$

The proof of Theorem 1 is closely related to the work by Chan and Wang (2015). These authors provide the asymptotic distribution of NLS estimators with nonstationary time series under a set of general conditions (see their Theorem 3.1). We verify that these conditions are also fulfilled when the scaling matrix $G_{\gamma_0,T}$ is non-diagonal and depends on the true parameter vector $\gamma_0$. The results in Chan and Wang (2015) and Wang et al. (2018) suggest that Assumption 2 can be replaced by a long memory specification for $\Delta x_t$. However, long memory parameters will enter the limiting distribution and inference will be complicated further.

We now illustrate Theorem 1 with two examples. These examples highlight two mathematical features that will complicate parameter inference.

Example 1

We consider the model $y_t = \tau_0 t^{\theta_0} + u_t$, where the innovations satisfy Assumption 2. The limiting distribution of the parameter estimators depends solely on the mean square Riemann-Stieltjes integrals $\int \tau_0 r^{\theta_0} \ln(r)\, dB_u$ and $\int r^{\theta_0}\, dB_u$ and is therefore normally distributed (e.g., Section 2.3 in Tanaka (2017)):

$$\begin{bmatrix} T^{\theta_0+1/2} & 0 \\ \tau_0 \ln(T)\, T^{\theta_0+1/2} & T^{\theta_0+1/2} \end{bmatrix} \begin{bmatrix} \hat\theta_T - \theta_0 \\ \hat\tau_T - \tau_0 \end{bmatrix} \xrightarrow{d} N\left(0,\ \Omega_{uu} (2\theta_0+1)^3 \begin{bmatrix} 2\tau_0^2 & -\tau_0(2\theta_0+1) \\ -\tau_0(2\theta_0+1) & (2\theta_0+1)^2 \end{bmatrix}^{-1}\right). \tag{3.2}$$

The scaling matrix on the left-hand side of (3.2), $\begin{bmatrix} T^{\theta_0+1/2} & 0 \\ \tau_0 \ln(T)\, T^{\theta_0+1/2} & T^{\theta_0+1/2} \end{bmatrix}$, depends on $\theta_0$ and $\tau_0$ and is non-diagonal.

Example 2

If $y_t = \tau_0 t^{\theta_0} + \phi_0 x_t + u_t$, then the limiting distribution of the NLS estimator is:

$$\begin{bmatrix} T^{\theta_0+1/2} & 0 & 0 \\ \tau_0 \ln(T)\, T^{\theta_0+1/2} & T^{\theta_0+1/2} & 0 \\ 0 & 0 & T \end{bmatrix} \begin{bmatrix} \hat\theta_T - \theta_0 \\ \hat\tau_T - \tau_0 \\ \hat\phi_T - \phi_0 \end{bmatrix} \Rightarrow \begin{bmatrix} \int \big(\tau_0 r^{\theta_0}\ln r\big)^2 dr & \int \tau_0 r^{2\theta_0}\ln(r)\, dr & \int \tau_0 r^{\theta_0}\ln(r) B_v\, dr \\ \int \tau_0 r^{2\theta_0}\ln(r)\, dr & \int r^{2\theta_0}\, dr & \int r^{\theta_0} B_v\, dr \\ \int \tau_0 r^{\theta_0}\ln(r) B_v\, dr & \int r^{\theta_0} B_v\, dr & \int B_v^2\, dr \end{bmatrix}^{-1} \times \left( \begin{bmatrix} \int \tau_0 r^{\theta_0}\ln(r)\, dB_u \\ \int r^{\theta_0}\, dB_u \\ \int B_v\, dB_u \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ \Delta_{vu} \end{bmatrix} \right).$$

This limiting distribution exhibits second order bias when $\Delta_{vu} \neq 0$, or when $B_u$ and $B_v$ are correlated.

Two features of the limiting distribution of $G_{\gamma_0,T}(\hat\gamma_T - \gamma_0)$ deserve further comment. First, as emphasised in Example 1, the scaling matrix $G_{\gamma_0,T}$ has two uncommon properties: (1) this matrix depends on the true parameter vectors $\tau_0$ and $\theta_0$, and (2) $G_{\gamma_0,T}$ is not diagonal. These peculiarities are caused by the nonlinearity and nonstationarity of the model. More specifically, these features can be traced back to the presence of functions like $f(t;\tau,\theta) = \tau t^{\theta}$. Limiting distributions with a similar mathematical structure can be found in the structural breaks literature, cf. model setting II.b of Perron and Zhu (2005) and its detailed analysis in Beutner et al. (2020).

Second, the nonstationary regressor $x_{it}$ enters the model (2.1) through a polynomial transformation of the form $g(x, \phi_i) = \phi_{i1} x + \phi_{i2} x^2 + \ldots + \phi_{ip_i} x^{p_i}$ ($i = 1, 2, \ldots, m$). In the terminology of Park and Phillips (2001), this part of the regression function is a linear combination of H-regular functions. It is well documented in the literature, e.g. Chang et al. (2001) and Chan and Wang (2015), that this leads to second order bias terms and hence nonstandard inference (except in the special case of strictly exogenous nonstationary regressors).

Remark 3

Asymptotic results with a diagonal scaling matrix can be obtained at the expense of a singular joint limiting distribution. For example, reconsider Example 1 and note that
$$\begin{bmatrix} T^{\theta_0+1/2} & 0 \\ 0 & T^{\theta_0+1/2}/\ln(T) \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ -\tau_0 & 1/\ln(T) \end{bmatrix} \begin{bmatrix} T^{\theta_0+1/2} & 0 \\ \tau_0 \ln(T)\, T^{\theta_0+1/2} & T^{\theta_0+1/2} \end{bmatrix}.$$
Since $\lim_{T\to\infty} \begin{bmatrix} 1 & 0 \\ -\tau_0 & 1/\ln(T) \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ -\tau_0 & 0 \end{bmatrix}$, the continuous mapping theorem implies
$$\begin{bmatrix} T^{\theta_0+1/2} & 0 \\ 0 & T^{\theta_0+1/2}/\ln(T) \end{bmatrix} \begin{bmatrix} \hat\theta_T - \theta_0 \\ \hat\tau_T - \tau_0 \end{bmatrix} \xrightarrow{d} \begin{bmatrix} 1/\tau_0 \\ -1 \end{bmatrix} \times N\big(0,\ \Omega_{uu}(2\theta_0+1)^3\big),$$
and we recover the limiting distribution reported in Theorem 6.3 of Phillips (2007).

General Considerations

Let us assume for the moment that $\theta$ is known. The resulting model, $y_t = d_t(\theta)'\tau + s_t'\phi + u_t$, is now linear in the unknown parameters $[\tau', \phi']'$. The OLS estimator is $[\hat\tau_T(\theta)', \hat\phi_T(\theta)']' = \big(\sum_{t=1}^{T} z_t(\theta) z_t(\theta)'\big)^{-1}\big(\sum_{t=1}^{T} z_t(\theta) y_t\big)$ and parameter inference is relatively straightforward. For example, we can apply the Fully Modified (FM) corrections as in Wagner and Hong (2016) to obtain a zero-mean Gaussian mixture limiting distribution that allows for standard inference. Given that $\theta$ is unknown in practice, it seems natural to first compute $\hat\theta_T$ by minimisation of $\tilde Q_T(\theta)$, and to subsequently compute

$$\begin{bmatrix} \hat\tau_T \\ \hat\phi_T \end{bmatrix} = \Big(\sum_{t=1}^{T} z_t(\hat\theta_T)\, z_t(\hat\theta_T)'\Big)^{-1} \sum_{t=1}^{T} z_t(\hat\theta_T)\, y_t. \tag{3.3}$$

The latter estimator is linear in $y_1, y_2, \ldots, y_T$ and fully modified adjustments seem possible. However, there are two issues. First, this estimator does not allow us to conduct inference on $\theta$. Second, it is not completely clear how the estimation error in $\hat\theta_T$ influences the limiting results.
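The two-step computation described above (minimise the concentrated criterion $\tilde Q_T(\theta)$ built from (2.3), then run OLS at $\hat\theta_T$ as in (3.3)) can be sketched in Python with NumPy and SciPy. This is a minimal sketch under simplifying assumptions, not the authors' implementation: it assumes a single power law trend ($d = 1$, no intercept) and a single integrated regressor with powers $1, \ldots, p$, and it replaces a careful search over $\Theta$ by a coarse grid followed by SciPy's bounded scalar minimiser. The function names and the bounds $(-0.4, 2.0)$ for $\theta$ are hypothetical choices.

```python
import numpy as np
from scipy.optimize import minimize_scalar


def regressors(theta, t, x, p):
    """z_t(theta) = [t**theta, x_t, x_t**2, ..., x_t**p] for d = 1 and m = 1."""
    return np.column_stack([t**theta] + [x**j for j in range(1, p + 1)])


def concentrated_ssr(theta, y, t, x, p):
    """Concentrated criterion: SSR after profiling out (tau, phi) by OLS, as in (2.3)."""
    Z = regressors(theta, t, x, p)
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    return float(resid @ resid), coef


def fit_gcpr(y, x, p=2, theta_bounds=(-0.4, 2.0), grid_size=121):
    """Grid search over theta, local bounded refinement, then OLS at theta_hat (3.3)."""
    t = np.arange(1, len(y) + 1, dtype=float)
    grid = np.linspace(theta_bounds[0], theta_bounds[1], grid_size)
    values = [concentrated_ssr(th, y, t, x, p)[0] for th in grid]
    k = int(np.argmin(values))
    lo, hi = grid[max(k - 1, 0)], grid[min(k + 1, grid_size - 1)]
    res = minimize_scalar(lambda th: concentrated_ssr(th, y, t, x, p)[0],
                          bounds=(lo, hi), method="bounded")
    theta_hat = float(res.x)
    _, coef = concentrated_ssr(theta_hat, y, t, x, p)
    return theta_hat, coef[0], coef[1:]  # theta_hat, tau_hat, phi_hat


# Simulated example: y_t = 2 t^0.6 + 0.5 x_t - 0.01 x_t^2 + u_t
rng = np.random.default_rng(0)
T = 500
x = np.cumsum(rng.standard_normal(T))
t = np.arange(1, T + 1, dtype=float)
y = 2.0 * t**0.6 + 0.5 * x - 0.01 * x**2 + rng.standard_normal(T)
print(fit_gcpr(y, x))
```

The grid step guards against local minima of $\tilde Q_T(\theta)$, which the bounded minimiser on its own would not detect; with $d > 1$ the same idea applies with a multivariate search over $\Theta$.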
There is also some good news. Namely, if the estimator in (3.3) is used to calculate residuals, then these residuals can be used to construct consistent kernel estimators for the long-run variance (LRV) matrices $\Delta$ and $\Omega$. With $V_t(\gamma) = \big[y_t - d_t(\theta)'\tau - s_t'\phi,\ \Delta x_t'\big]'$, these estimators are defined as

$$\hat\Delta_T = \frac{1}{T}\sum_{t=1}^{T}\sum_{s=1}^{t} k\Big(\frac{|t-s|}{b_T}\Big) V_s(\hat\gamma_T)\, V_t(\hat\gamma_T)', \qquad \hat\Omega_T = \frac{1}{T}\sum_{t=1}^{T}\sum_{s=1}^{T} k\Big(\frac{|t-s|}{b_T}\Big) V_s(\hat\gamma_T)\, V_t(\hat\gamma_T)', \tag{3.4}$$

for some kernel function $k(\cdot)$ and bandwidth parameter $b_T$. The first element in $V_t(\hat\gamma_T)$ is indeed the residual $\hat u_t = y_t - d_t(\hat\theta_T)'\hat\tau_T - s_t'\hat\phi_T$. The remaining elements are $\Delta x_t = v_t$.

Assumption 3
(a) $k(0) = 1$, $k(\cdot)$ is continuous at zero, and $\sup_{x \geq 0} |k(x)| < \infty$.
(b) $\int_0^\infty \bar k(x)\, dx < \infty$, where $\bar k(x) = \sup_{y \geq x} |k(y)|$.
(c) The bandwidth parameters $\{b_T : T \geq 1\}$ satisfy $\{b_T\} \subseteq (0, \infty)$ as well as $\lim_{T\to\infty}\big(b_T^{-1} + T^{-1/2} b_T \ln T\big) = 0$.

The conditions on the kernel function $k(\cdot)$, Assumptions 3(a)-(b), are identical to those in Jansson (2002). Jansson (2002) remarks that these assumptions “would appear to be satisfied by any kernel in actual use”. Commonly used kernel functions such as the Bartlett, Parzen, and Quadratic Spectral kernels indeed satisfy all these assumptions. Assumption 3(c) differs from the usual requirement, $\lim_{T\to\infty}\big(b_T^{-1} + T^{-1/2} b_T\big) = 0$, by a factor $\ln T$. The difference is due to the estimation error in $\hat\theta_T$. This error causes the residuals $\{\hat u_t\}$ to be less close to the innovations $\{u_t\}$, and we balance this by letting the number of included higher-lag autocovariance matrices grow at a slower rate.

Note: The deterministic component in the CPR model by Wagner and Hong (2016) is a linear combination of the elements in the vector $[1, t, t^2, \ldots, t^q]'$. The FM corrections are thus immediate if $\theta$ is known and takes values in the natural numbers. Based on Lemma 2 in the Appendix, it is also relatively straightforward to derive such corrections when $\theta$ is known but its elements are not necessarily natural numbers.

Note: We have investigated the asymptotic behaviour of a Fully Modified version of the estimator in (3.3). Our efforts in bounding the estimation error of $\hat\theta_T$ lead to a term in the covariance asymptotics that is $O_p(\ln T)$ instead of $o_p(1)$. This (as well as our Monte Carlo simulations) suggests that this Fully Modified estimator is not asymptotically valid.

Theorem 2

Under Assumptions 1-3, we have $\hat\Delta_T \xrightarrow{p} \Delta$ and $\hat\Omega_T \xrightarrow{p} \Omega$.

The limiting distribution in Theorem 1 is nonpivotal and thus unsuited for direct inference. We will use a simulated approach to account for the nuisance parameters, i.e. $\tau_0$, $\theta_0$ and the parameters describing the covariance structure of $B(r)$. The main idea is to replace nuisance parameters by consistent estimates and to rely on a Monte Carlo (MC) simulation to approximate the limiting distribution. The empirical quantiles of these MC draws allow us to test hypotheses and/or conduct inference. Clearly, this kind of approach will provide exact inference when the limiting distribution is invariant with respect to the nuisance parameters (e.g. Dufour and Khalaf (2002) and Dufour (2006)). In the absence of such invariance, results such as those in Wang et al. (2018) and Bergamelli et al. (2019) show that the simulation approach can retain an asymptotic justification.
The following algorithm is an adaptation of Wang et al.’s (2018) simulated estimation. Among other things, the current setting has to control for more nuisance parameters because of the flexible trend specification.

Step 1: Estimate $\hat\gamma_T$ and use the residuals $\{\hat u_t\}$ to compute the estimators $\hat\Delta_T$ and $\hat\Omega_T$ from (3.4).

Step 2: Repeat for $j = 1, \ldots, J$:
(a) Draw an $(m+1)$-dimensional sequence $\{e_n\}_{n=1}^{N}$ i.i.d. from $N(0, I_{m+1})$.
(b) Compute $[\hat\mu_n, \hat\upsilon_n']' = \hat\Omega_T^{1/2} e_n$ and construct the partial sum process $\{\hat\chi_n = [\hat\chi_{1n}, \ldots, \hat\chi_{mn}]'\}_{n=1}^{N}$ according to $\hat\chi_n = \hat\chi_{n-1} + \hat\upsilon_n$ and $\hat\chi_0 = 0$.
(c) Set $\hat w_n = [n, \hat\chi_n']'$, and construct a simulated draw as:
$$\hat J_N^{(j)}\big(\hat\gamma_T, \hat\Omega_T, \hat\Delta^-_{vu}\big) = \Big(G_{\hat\gamma,N}'^{-1}\Big[\sum_{n=1}^{N} \dot f(\hat w_n, \hat\gamma_T)\, \dot f(\hat w_n, \hat\gamma_T)'\Big] G_{\hat\gamma,N}^{-1}\Big)^{-1} \Big(G_{\hat\gamma,N}'^{-1}\Big[\sum_{n=1}^{N} \dot f(\hat w_n, \hat\gamma_T)\, \hat\mu_n\Big] + \hat B^-_{vu}\Big),$$
where $\hat\Delta^-_{vu}$ is a consistent estimator of the subblock of $\Delta^- = \begin{bmatrix} \Delta^-_{uu} & \Delta^-_{uv} \\ \Delta^-_{vu} & \Delta^-_{vv} \end{bmatrix} = \Sigma - \Delta'$, and $\hat B^-_{vu} = \big[0_{d\times 1}',\ 0_{d\times 1}',\ \hat b_1'\hat\Delta^-_{v_1 u},\ \ldots,\ \hat b_m'\hat\Delta^-_{v_m u}\big]'$ with $\hat b_i = \Big[1,\ \frac{2}{N}\sum_{n=1}^{N}\Big(\frac{\hat\chi_{in}}{\sqrt{N}}\Big),\ \ldots,\ \frac{p_i}{N}\sum_{n=1}^{N}\Big(\frac{\hat\chi_{in}}{\sqrt{N}}\Big)^{p_i-1}\Big]'$.

Step 3: Use the empirical quantiles of elements of $\{\hat J_N^{(1)}, \ldots, \hat J_N^{(J)}\}$ to conduct inference.

Step 3 has been kept general for notational convenience. A more concrete example is as follows. To construct a two-sided equal-tailed confidence interval for $\theta$, we calculate the $\alpha/2$ and $1-\alpha/2$ empirical quantiles of the first elements of $\{\hat J_N^{(1)}, \ldots, \hat J_N^{(J)}\}$, say $c_{\alpha/2}$ and $c_{1-\alpha/2}$, respectively. The implied confidence interval is $\big[\hat\theta - c_{1-\alpha/2}\, T^{-(\hat\theta+1/2)},\ \hat\theta - c_{\alpha/2}\, T^{-(\hat\theta+1/2)}\big]$.

Theorem 3

Suppose Assumptions 1-3 hold and let $N = [\kappa T^{\alpha}]$ for some $\kappa > 0$ and $0 < \alpha \leq \min\{1, 1 + 2\tilde\theta\}$ with $\tilde\theta \in (-1/2, \theta_L)$. Then, we have
$$\Big\{G_{\hat\gamma,N}'^{-1}\Big[\sum_{n=1}^{N} \dot f(\hat w_n, \hat\gamma_T)\, \dot f(\hat w_n, \hat\gamma_T)'\Big] G_{\hat\gamma,N}^{-1}\Big\}^{-1} \Big\{G_{\hat\gamma,N}'^{-1}\Big[\sum_{n=1}^{N} \dot f(\hat w_n, \hat\gamma_T)\, \hat\mu_n\Big] + \hat B^-_{vu}\Big\} \xrightarrow{d^*} \Big(\int j(r;\gamma_0)\, j(r;\gamma_0)'\, dr\Big)^{-1} \Big(\int j(r;\gamma_0)\, dB_u(r) + B_{vu}\Big),$$
in probability.

Theorem 3 establishes the asymptotic validity of the simulation approach. That is, for a large enough $J$, the empirical quantiles of the simulated distribution will coincide with those of the asymptotic distribution. According to Theorem 3, the length of the simulated time series, $N$, should grow more slowly as $\theta_L$ approaches $-1/2$. The actual choice of $\theta_L$ should satisfy $\theta_L < \theta_i$ for all true powers ($1 \leq i \leq d$).

The correct specification of the nonlinear cointegrating relation will result in a stationary error process $\{u_t\}_{t\in\mathbb{Z}}$. Stationarity tests can thus be used to detect spurious relationships and/or the omission of relevant terms from the cointegrating regression. We consider a KPSS-type test statistic for the null hypothesis of stationarity. The test statistic reads

$$K_T^+ = \hat\Omega_{u.v}^{-1}\, \frac{1}{T}\sum_{t=1}^{T}\Big(\frac{1}{\sqrt{T}}\sum_{i=1}^{t} \hat u_i^+\Big)^2, \tag{3.5}$$

where $\hat\Omega_{u.v}$ is a consistent estimator of $\Omega_{u.v} = \Omega_{uu} - \Omega_{uv}\Omega_{vv}^{-1}\Omega_{vu}$, $\hat u_t^+ = y_t^+ - d_t(\hat\theta_T)'\hat\tau_T - s_t'\hat\phi_T$, and $y_t^+ = y_t - \hat\Omega_{uv}\hat\Omega_{vv}^{-1}\Delta x_t$. The statistic is (stochastically) bounded under the null hypothesis but diverges under the alternative. Several authors have reported model settings in which the asymptotic null distribution of $K_T^+$ is known, e.g. Kwiatkowski et al. (1992) and Wagner and Hong (2016). The estimation of $\theta$ contaminates the limiting distribution of (3.5) with nuisance parameters. Choi and Saikkonen (2010), Wagner and Hong (2016), Jiang et al. (2019), and Lin and Reuvers (2020) have shown that subsampling can resolve this issue. We will follow their approach and use subsamples of size $q_T$ to compute the test statistics.

Theorem 4

Under Assumptions 1–3 and if $\lim_{T\to\infty}\big(q_T^{-1} + (\ln T)(q_T/T)^{\theta_L+1/2}\big) = 0$, then for any $\ell \in \{1, \dots, T - q_T + 1\}$ we have
$$K^{+}_{q_T,\ell} = \widehat{\Omega}^{-1}_{u.v}\,\frac{1}{q_T}\sum_{t=\ell}^{\ell+q_T-1}\Big(\frac{1}{\sqrt{q_T}}\sum_{i=\ell}^{t}\hat u^{+}_i\Big)^2 \Rightarrow \int_0^1\big[W(r)\big]^2\,dr, \qquad (3.6)$$
where $\widehat{\Omega}_{u.v}$ is a consistent estimator of $\Omega_{u.v} = \Omega_{uu} - \Omega_{uv}\Omega^{-1}_{vv}\Omega_{vu}$ (Theorem 2) and $W(\cdot)$ denotes a standard Brownian motion.

Theorem 4 does not provide any guidance on the choice of the starting value $\ell$ and the subsample size $q_T$. First, for a given $q_T$, Choi and Saikkonen (2010) argue that the use of a single subsample (instead of all $T$ observations) implies a significant loss of power. We follow their example and combine all $M = [T/q_T]$ subresidual series of length $q_T$ using a Bonferroni procedure. That is, we create subresidual series by selecting adjacent blocks of $q_T$ residuals while alternating between the start and end of the sample. We calculate the KPSS-type test statistic for each subseries, say $K_1, \dots, K_M$, and reject the null of stationarity at significance level $\alpha$ whenever $\max\{K_1, \dots, K_M\}$ exceeds $c_{\alpha/M}$, which is defined by $P\big(\int_0^1[W(r)]^2\,dr \ge c_{\alpha/M}\big) = \alpha/M$. Finally, we select the block size $q_T$ using Romano and Wolf's (2001) minimum volatility rule. The approach is now completely data-driven.

Note: Proposition 5 in Wagner and Hong (2016) shows that the limiting distribution of $K^{+}_T$ is free of nuisance parameters if $\theta$ is known and only a single integrated regressor occurs with integer powers greater than one. This result does not carry over to the current setting because of the estimation error in $\hat\theta_T$.

This section lists various Monte Carlo simulations showing that the asymptotic approximations from Section 3 provide useful guidance in finite samples. Further details on the implementation are as follows. We consider $T \in \{\ldots\}$.
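The subsampling Bonferroni KPSS procedure described above (statistic (3.6) on each block, alternating blocks, critical value $c_{\alpha/M}$) can be sketched as follows. This is a minimal sketch, not the authors' code, under several assumptions: the residuals $\hat u^{+}_t$ and a consistent $\widehat{\Omega}_{u.v}$ are already available (`omega` is passed in), the block size `q` is taken as given (the minimum volatility rule is not implemented), "alternating" is read as taking non-overlapping blocks from the start and the end in turn, and $c_{\alpha/M}$ is approximated by simulating $\int_0^1 W(r)^2\,dr$ on a grid. All function names are hypothetical.

```python
import numpy as np


def kpss_block_stat(u_block, omega):
    """KPSS-type statistic on one residual block, cf. (3.6):
    (1/q) * sum_t (q^{-1/2} * partial sum up to t)^2, divided by omega."""
    u_block = np.asarray(u_block, dtype=float)
    q = u_block.size
    partial = np.cumsum(u_block) / np.sqrt(q)
    return float(np.sum(partial ** 2) / (q * omega))


def alternating_blocks(u, q):
    """Split u into M = floor(T/q) non-overlapping blocks of length q,
    taking blocks alternately from the start and from the end of the sample."""
    u = np.asarray(u, dtype=float)
    n_blocks = u.size // q
    left, right = 0, u.size
    blocks = []
    for m in range(n_blocks):
        if m % 2 == 0:  # next block from the start
            blocks.append(u[left:left + q])
            left += q
        else:  # next block from the end
            blocks.append(u[right - q:right])
            right -= q
    return blocks


def int_w2_critical_value(level, n_draws=10_000, n_grid=250, seed=0):
    """Monte Carlo approximation of c with P(int_0^1 W(r)^2 dr >= c) = level,
    using a random-walk discretisation of the Brownian motion W."""
    rng = np.random.default_rng(seed)
    steps = rng.standard_normal((n_draws, n_grid)) / np.sqrt(n_grid)
    w = np.cumsum(steps, axis=1)
    integrals = np.mean(w ** 2, axis=1)  # Riemann sum of W(r)^2 on [0, 1]
    return float(np.quantile(integrals, 1.0 - level))


def bonferroni_kpss(u_hat, q, omega, alpha=0.05):
    """Reject stationarity if max_m K_m exceeds c_{alpha/M}."""
    stats = [kpss_block_stat(b, omega) for b in alternating_blocks(u_hat, q)]
    crit = int_w2_critical_value(alpha / len(stats))
    k_max = max(stats)
    return k_max, crit, k_max > crit
```

Because the Bonferroni bound only requires the marginal null distribution of each $K_m$, the blocks need not be independent; the price is a conservative test, in line with the simulation results reported below.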
The long-run covariance matrices in (3.4) are computed using the Bartlett kernel, $k(x) = 1 - |x|$ for $|x| \le 1$ … $J = \ldots$ and $N = T$ (because $\theta_L = \ldots$ suffices in our settings). We test at 5% significance and report results based on $2.\ldots \times 10^{\ldots}$ Monte Carlo (MC) replications.

Foreshadowing the empirical application, we use the DGP
$$y_t = \tau_0 + \tau_1 t + \tau_2 t^{\theta} + \phi_1 x_t + \phi_2 x_t^2 + u_t, \qquad (4.1)$$
where $x_t = \sum_{s=1}^{t} v_s$. The parameter values are $\theta = \ldots$, $\tau = [\tau_0, \tau_1, \tau_2]' = [7, \ldots, -\ldots\times 10^{-\ldots}]'$, and $\phi = [5, \phi_2]'$. The disturbance vector $[u_t, v_t]'$ is generated from the VAR(1) specification
$$\begin{bmatrix}u_t\\ v_t\end{bmatrix} = A\begin{bmatrix}u_{t-1}\\ v_{t-1}\end{bmatrix} + \begin{bmatrix}\eta_t\\ \epsilon_t\end{bmatrix}, \qquad \begin{bmatrix}\eta_t\\ \epsilon_t\end{bmatrix} \overset{i.i.d.}{\sim} N\left(0, \begin{bmatrix}1 & \rho\\ \rho & 1\end{bmatrix}\right). \qquad (4.2)$$
In (4.2), we construct the autoregressive matrix $A$ in two steps: (1) generate a $(2\times 2)$ random matrix $U$ from $U[0,1]$ to construct the orthogonal matrix $H = U(U'U)^{-1/2}$, and (2) compute $A = HLH'$ using (A) $L = \mathrm{diag}[0, \ldots]$, (B) $L = \mathrm{diag}[0.\ldots, \ldots]$, (C) $L = \mathrm{diag}[0.\ldots, \ldots]$, and (D) $L = \mathrm{diag}[0.\ldots, \ldots]$. Settings (A)–(D) gradually increase the serial correlation in the error processes. The parameter $\rho \in \{0.\ldots, 0.\ldots, 0.\ldots\}$ governs the amount of endogeneity.

Note: Figure 1 is an exception. This figure reports power curves and here we decided to use $T \in \{\ldots\}$ instead. This reduces the computational burden and avoids large parameter ranges where empirical power is equal to one.

Note: We start the VAR recursions from $[u_0, v_0]' = 0$ and subsequently use a presample of 50 observations to reduce the influence of these initial values.

We conduct four simulation experiments. Our first simulation experiments relate to testing the null of linear cointegration, i.e. we test $H_0: \phi_2 = 0$ versus $H_a: \phi_2 \ne 0$. The empirical size (i.e. with $\phi_2 = 0$) is computed using four different estimators: (1) the NLS estimator with simulated critical values as described in Section 3.2 (SimNLS); (2) the NLS estimator with simulated critical values and the true value for $\theta$ (SimNLS($\theta$)); (3) a heuristic FMOLS estimator which uses $\hat\theta$ but ignores its estimation error (FMOLS); and (4) the FMOLS estimator based on the true $\theta$ (FMOLS($\theta$)). The results are listed in Table 2. It is immediately clear that simulated critical values improve size control. We point out three other observations. First, the empirical size is rather insensitive to changes in $\rho$, whereas the introduction of serial correlation makes the test (more) oversized. This behaviour is well documented in simulation settings where $\theta$ is known and restricted to be a natural number, cf. Wagner and Hong (2016) or … SimNLS($\theta$) and FMOLS($\theta$). The model is now linear in its parameters and NLS estimation is no longer necessary. That is, we find ourselves in the model specification previously analysed in detail by Wagner and Hong (2016). The comparison of SimNLS($\theta$) and FMOLS($\theta$) indicates that simulated inference is also advantageous in this setting.

The subsequent simulations are about power. We simulate power curves by varying $\phi_2$ over the interval $(0, 0.15]$ (Figure 1). Since the outcomes are rather insensitive to changes in the endogeneity parameter $\rho$, we keep it fixed at $\rho = 0.50$. Throughout settings (A)–(D), empirical power increases monotonically with $\phi_2$. There are also power gains from increasing the sample size, although this pattern is slightly distorted in setting (D) because the test is oversized when serial correlation is high. Overall, the power curves behave as expected.

Table 3 reports the empirical coverage and average confidence interval (CI) length of 95% confidence intervals for $\theta$. The coverage is always below the nominal level of 95% and can drop as low as 54% when the sample size is small ($T = \ldots$). … When the true long-run variances are used (Coverage($\Omega$)), coverage is almost exactly 95% throughout all designs. As expected, the average width of the CIs decreases with sample size.

Finally, as a fourth set of Monte Carlo experiments, we look at the finite-sample properties of the KPSS test (Table 4). Comparing KPSS and KPSS($\theta$), we see that knowledge of the true value of $\theta$ is beneficial, as it always brings the empirical size closer to 5%. This difference aside, our KPSS outcomes are comparable to the results reported in Table 1 of Choi and Saikkonen (2010). That is, the Bonferroni correction leads to conservative tests for up to moderate levels of serial correlation. At high levels of serial correlation (setting (D)) we approach unit root behaviour and the KPSS tests are oversized.

We examine the evidence for an EKC for a collection of 18 countries over the period 1870–2014 ($T = 145$), using CO2 emissions as a proxy for air pollution. The origin of these data is as follows. We used population and GDP data from the Maddison Project (see https://…/ggdc/historicaldevelopment/maddison/). Our carbon dioxide data are fossil-fuel CO2 emissions as made available by the Carbon Dioxide Information Analysis Center (CDIAC, see https://cdiac.ess-dive.lbl.gov). Both GDP and CO2 emissions are expressed per capita and subsequently log-transformed. In accordance with the notation of this paper, we denote them by $x_t$ and $y_t$, respectively.
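Both the simulation study above and the models below require the NLS estimator of $\theta$. For fixed $\theta$ the GCPR is linear in all remaining parameters, so NLS can be carried out as a grid search in which each grid point is fitted by OLS and the $\theta$ with the smallest residual sum of squares (RSS) is kept; the paper uses such a grid search for (M3). The sketch below is a minimal illustration under stated assumptions: the helper names are hypothetical, the time index runs from $t = 1$, and the user supplies the grid. Grid points near $\theta = 0$ and $\theta = 1$ should be avoided because $t^{\theta}$ then becomes collinear with the constant or the linear trend.

```python
import numpy as np


def gcpr_design(t, x, theta, x_powers=(1,)):
    """Regressors [1, t, t**theta, x**p for p in x_powers] for a fixed theta."""
    columns = [np.ones_like(t), t, t ** theta] + [x ** p for p in x_powers]
    return np.column_stack(columns)


def profile_nls(y, x, theta_grid, x_powers=(1,)):
    """Grid-search NLS: for each theta the model is linear, so fit it by OLS
    and keep the theta with the smallest residual sum of squares (RSS)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    t = np.arange(1, y.size + 1, dtype=float)
    best_theta, best_rss, best_coef = None, np.inf, None
    for theta in theta_grid:
        X = gcpr_design(t, x, theta, x_powers)
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ coef) ** 2))
        if rss < best_rss:
            best_theta, best_rss, best_coef = float(theta), rss, coef
    return best_theta, best_rss, best_coef
```

With `x_powers=(1,)` the sketch fits a linear cointegrating relation around a flexible trend; `x_powers=(1, 2)` adds the squared regressor as in (5.1). Recording `rss` for every grid point instead of only the minimum gives an RSS curve of the kind shown in Figure 2(e). Inference on $\theta$ still requires the simulated approach of Section 3.2.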
The same data set (or subsets thereof) has also been studied by Wagner (2015), Chan and Wang (2015), Wang et al. (2018), Wagner et al. (2019), and Lin and Reuvers (2020), which conveniently allows us to compare results. All user choices (kernel specification, bandwidth selection, etc.) are kept the same as in the simulation study (see Section 4).

Prior to the analysis of the econometric models, we discuss several features of the time series …

Note: The stationarity properties of the series have been extensively studied and commented on in these papers. We will not repeat this analysis but refer the interested reader to the Supplement, where such results can be found.

An inverted U-shaped relationship between GDP and CO2 (both in log per capita) is clearly visible in Figure 2(a), and results like these have triggered the research on the Environmental Kuznets Curve. However, the heat-map time indication also shows that time increases almost monotonically along the curve. Time effects, e.g. increasing environmental awareness or advances in sustainable technologies, can be valid alternative explanations for these nonlinearities, and their omission can (falsely) exaggerate the influence of GDP. It is for this reason that we developed and analysed the Generalized Cointegrating Polynomial Regression (GCPR). More evidence for the importance of time effects is available in Figure 2(b). This figure depicts the same per capita series after detrending. The inverted U-shape is now (visually) less pronounced or even absent.

Finally, we consider two competing possibilities to extend the traditional linear cointegration specification $y_t = \tau_0 + \tau_1 t + \phi_1 x_t + u_t$. This model does not account for any nonlinear behaviour over time and is therefore ill-suited to fit the data displayed in Figure 2(c). Cointegrating polynomial regressions use integer powers of $x_t$ to describe the curvature over time. Following Hu et al.
(2019), we can allow for an integrated regressor with a flexible power and estimate $y_t = \tau_0 + \tau_1 t + \phi_1 x_t + \phi_2 x_t^{\theta} + u_t$. The residual sum of squares (RSS) of the NLS estimator for this specification is shown in Figure 2(d). The absence of a minimum at $\theta = 2$ … $x_t$. Moreover, the lack of any minimum might be interpreted as a sign that log per capita GDP is not the source of nonlinearity. Alternatively, we can opt for a flexible deterministic trend as in $y_t = \tau_0 + \tau_1 t + \tau_2 t^{\theta} + \phi_1 x_t + u_t$. The RSS now exhibits a clear minimum, see Figure 2(e). Considerations like these motivate the use of GCPRs. We will argue below that this last model specification is well suited to capture the important features of the pollution data.

We continue the empirical analysis with a comparison of three model specifications. All three models are of the form
$$y_t = \tau_0 + \tau_1 t + \tau_2 t^{\theta} + \phi_1 x_t + \phi_2 x_t^2 + u_t. \qquad (5.1)$$
Model (M1) is the specification above with $\tau_2 = 0$. This model specification (possibly with the additional constraint $\tau_1 = 0$) has been explored in various papers, e.g. Piaggio and Padilla (2012), Wagner (2015), and Wang et al. (2018). For model (M1), an inverted-U relationship results when $\phi_1 > 0$ and $\phi_2 < 0$. If both coefficients have the correct signs, then the turning point (the level of economic growth at which environmental improvement starts) can be computed as $\exp(-\phi_1/(2\phi_2))$. Model (M1) is restrictive in the sense that nonlinear time effects (clearly visible in Figure 2) can only be explained using the term $\phi_2 x_t^2$.

The model specifications (M2) and (M3) include deterministic nonlinear time trends. For model (M2), we allow for $\tau_2 \ne 0$ with $\theta = 2$. The model in (5.1) without further restrictions is referred to as (M3). In the latter model, the NLS estimator for $\theta$ is computed by a grid search over the values $\Theta = [0.\ldots, \ldots] \cup [1.\ldots, 10]$ and simulated inference is used (Section 3.2). Table 5 depicts … affect the parameter estimates for $\phi_1$ and $\phi_2$. Judging only by the signs of $\hat\phi_1$ and $\hat\phi_2$ (thus ignoring potential nonstationarity of the errors), the EKC exists for 17 out of 18, 9 out of 18, and 8 out of 18 countries for (M1), (M2), and (M3), respectively. Moreover, the significance of squared log per capita GDP (that is, $\phi_2$) decreases when nonlinear deterministic time trends are included. For model (M3), $\phi_2$ is never significantly different from zero at the 10% level, and the evidence in favour of an EKC becomes rather meagre. The results of the KPSS tests for these models can be found in Table 5 under "Stationarity tests". In general, the cointegrating relations seem well specified, except perhaps for Belgium, Denmark, and the UK.

Note: The data for Austria, Belgium, and Finland are mentioned in both Wagner (2015) and Wagner et al. (2019) to behave in line with the EKC. We discuss Belgium in the main text, but the interested reader can find the same figures for Austria and Finland in Supplement Section H.3. The conclusions are the same.

Note: The outcomes of the Perron and Yabu (2009) test (see Supplement) indicate that log per capita GDP is likely to have a deterministic trend component. It is thus recommended to include a deterministic trend in the model for log per capita CO2 emissions, and to look at the relationship between GDP and CO2 emissions (in log per capita) after partialling out the effect of the linear trend.

The insignificance of $\phi_2$ in model (M3) leads us back to the model specification that was introduced earlier, namely
$$y_t = \tau_0 + \tau_1 t + \tau_2 t^{\theta} + \phi_1 x_t + u_t. \qquad \text{(M4)}$$
Model (M4) specifies a linear cointegrating relation around a flexible time trend and does not incorporate nonlinear effects in log per capita GDP. That is, the model specification does not allow for an EKC. As before, we check parameter estimates and test for stationarity of the error terms (the columns labeled "(M4)" in Table 5).

1. …
$$y_t = -\ldots + \ldots\,t - \ldots\times 10^{-\ldots}\,t^{\hat\theta} + 1.006\,x_t + \hat u_t. \qquad (5.2)$$
The flexible power on the linear trend is estimated to be $\hat\theta = \ldots.603$, resulting in nonlinear behaviour over time. Moreover, the negative coefficient in front of $t^{\hat\theta}$ provides a contribution that slopes down over time. Abstracting from time effects, a 1% increase in GDP leads to an estimated 1.006% increase in fossil-fuel CO2 emissions.
2. The outcomes of the KPSS test do not point towards a misspecified cointegrating relation (Table 5). The flexible deterministic trend is apparently sufficient to describe the nonlinear behaviour of log per capita CO2 emissions over time; that is, squared log per capita GDP is not needed in the model. Visual evidence is found in Figures 2(a), 2(b) and 2(f), where the incorporation of increasingly flexible time effects removes any apparent nonlinear relationship between log per capita GDP and CO2 emissions.
3. The estimates for $\theta$ and their confidence intervals are reported in Figure 3. Since the parameter space for $\theta$ is bounded below by $-1/2$, we have truncated the confidence intervals at this value. This reflects the belief that values less than $-1/2$ are impossible (within our model setting). For Japan and Portugal, we find $\hat\theta = \ldots.05$, i.e. a value at the boundary of $\Theta$. For these two countries we suggest omitting the flexible trend altogether.
4. Figure 4 compares the fit of the CPR model (M1) with the fit of the GCPR model (M4). The fit of both models is comparable for most of the time span. However, the fit of model (M4) is often better at the start and end of the sample, say 1870–1890.

Note: Deciding on the correct specification of the cointegrating relations for each of the 18 countries is implicitly a joint test. The interpretation of the individual outcomes therefore suffers from the multiple testing problem. A multivariate stationarity test is discussed in Lin and Reuvers (2020).

Note: Model specification (M4) has the additional advantage of being invariant to the possible presence of a drift component in log per capita GDP, also see footnote 13.

Summary and conclusion
05, i.e. a value at the boundary of Θ . For these twocountries we suggest to omit the ﬂexible trend altogether.4. Figure 4 compares the ﬁt of CPR model (M1) with the ﬁt of the GCPR model (M4). The ﬁtof both models is comparable for most of the time span. However, the ﬁt of model (M4) isoften better at the start and end of the sample, say 1870-1890. Deciding on the correct speciﬁcation of the cointegrating relations for each of the 18 countries is implicitly a jointtest. The interpretation of the individual outcomes therefore su ﬀ ers from the multiple testing problem. A multivariatestationarity test is discussed in Lin and Reuvers (2020). Model speciﬁcation (M4) has the additional advantage of being invariant to the possible presence of a driftcomponent in log per capita GDP, also see footnote 13. Summary and conclusion

In this paper we have extended the cointegrating polynomial regression (CPR) model of Wagner and Hong (2016) with power law deterministic trends. The unknown powers are estimated jointly with the parameters in the cointegrating relation. The limiting distribution is nonstandard because it involves a non-diagonal scaling matrix and the usual second-order bias effects. We therefore suggest a simulation-based approach to conduct inference. The usual subsampling KPSS-type test for stationarity of the innovations of the nonlinear cointegrating relation remains valid. Our results are supported by Monte Carlo simulations. The empirical application on the Environmental Kuznets Curve shows that a flexible trend can fully capture the nonlinearity in the data, thereby making higher-order powers of log per capita GDP redundant. Our resulting model is linear in log per capita GDP and suggests an alternative explanation in which time effects (e.g. technological progress, environmental awareness) cause the recent slowdown in pollution. Contrary to the opening quote in the introduction, our data do not suggest that air quality will benefit from economic growth. Finally, the narrative of this paper as well as the empirical application are centred around the Environmental Kuznets Curve. However, our model setting can also find application elsewhere.

References

Adams, R. A. and C. Essex (2016). Calculus: A Complete Course. Pearson Canada.
Andrews, D. W. and X. Cheng (2012). Estimation and inference with weak, semi-strong, and strong identification. Econometrica 80, 2153–2211.
Andrews, D. W. K. (1991). Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica 59, 817–858.
Baek, Y. I., J. S. Cho, and P. C. B. Phillips (2015). Testing linearity using power transforms of regressors. Journal of Econometrics 187, 376–384.
Bergamelli, M., A. Bianchi, L. Khalaf, and G. Urga (2019). Combining p-values to test for multiple structural breaks in cointegrated regressions. Journal of Econometrics 211, 461–482.
Beutner, E., Y. Lin, and S. Smeekes (2020). GLS estimation and confidence sets for the date of a single break in models with trends. Working Paper.
Carson, R. T. (2009). The Environmental Kuznets Curve: seeking empirical regularity and theoretical structure. Review of Environmental Economics and Policy 4, 3–23.
Chan, N. and Q. Wang (2015). Nonlinear regressions with nonstationary time series. Journal of Econometrics 185, 182–195.
Chang, Y., J. Y. Park, and P. C. B. Phillips (2001). Nonlinear econometric models with cointegrated and deterministically trending regressors. The Econometrics Journal 4, 1–36.
Cho, J. S. and P. C. B. Phillips (2018). Sequentially testing polynomial model hypotheses using power transforms of regressors. Journal of Applied Econometrics 33, 141–159.
Choi, I. and P. Saikkonen (2010). Tests for nonlinear cointegration. Econometric Theory 26, 682–709.
Dasgupta, S., B. K. H. Wang, and D. Wheeler (2002). Confronting the Environmental Kuznets Curve. Journal of Economic Perspectives 16, 147–168.
Davies, R. B. (1977). Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika 64, 247–254.
Davies, R. B. (1987). Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika 74, 33–43.
Dufour, J.-M. (2006). Monte Carlo tests with nuisance parameters: A general approach to finite-sample inference and nonstandard asymptotics. Journal of Econometrics 133, 443–477.
Dufour, J.-M. and L. Khalaf (2002). Simulation based finite and large sample tests in multivariate regressions. Journal of Econometrics 111, 303–322.
Durlauf, S. N. and P. C. B. Phillips (1988). Trends versus random walks in time series analysis. Econometrica 56, 1333–1354.
Gillingham, K. and J. H. Stock (2018). The cost of reducing greenhouse gas emissions. Journal of Economic Perspectives 32, 53–72.
Grossman, G. M. and A. B. Krueger (1995). Economic growth and the environment. The Quarterly Journal of Economics 110, 353–377.
Hamilton, J. D. (1994). Time Series Analysis. Princeton University Press.
Hansen, B. E. (1992). Consistent covariance matrix estimation for dependent heterogeneous processes. Econometrica 60, 967–972.
Hong, S. H. and P. C. B. Phillips (2010). Testing linearity in cointegrating relations with an application to purchasing power parity. Journal of Business & Economic Statistics 28, 96–114.
Hu, Z., P. C. B. Phillips, and Q. Wang (2019). Nonlinear cointegrating power function regression with endogeneity. Working paper.
Jansson, M. (2002). Consistent covariance matrix estimation for linear processes. Econometric Theory 18, 1449–1459.
Jiang, B., Y. Lu, and J. Y. Park (2019). Testing for stationarity at high frequency. Journal of Econometrics 215, 341–374.
Kwiatkowski, D., P. C. B. Phillips, P. Schmidt, and Y. Shin (1992). Testing the null hypothesis of stationarity against the alternative of a unit root: How sure are we that economic time series have a unit root? Journal of Econometrics 54, 159–178.
Lin, Y. and H. Reuvers (2020). Efficient estimation by fully modified GLS with an application to the Environmental Kuznets Curve. Working paper.
Nordhaus, W. D. (2014). The perils of the learning model for modeling endogenous technological change. The Energy Journal 35.
Panayotou, T. (1997). Demystifying the environmental Kuznets curve: Turning a black box into a policy tool. Environment and Development Economics 2, 465–484.
Park, J. Y. and P. C. B. Phillips (2001). Nonlinear regressions with integrated time series. Econometrica 69, 117–161.
Perron, P. and T. Yabu (2009). Estimating deterministic trends with an integrated or stationary noise component. Journal of Econometrics 151, 56–69.
Perron, P. and X. Zhu (2005). Structural breaks with deterministic and stochastic trends. Journal of Econometrics 129, 65–119.
Phillips, P. C. B. (1986). Understanding spurious regressions in econometrics. Journal of Econometrics 33, 311–340.
Phillips, P. C. B. (1998). New tools for understanding spurious regressions. Econometrica 66, 1299–1325.
Phillips, P. C. B. (2007). Regression with slowly varying regressors and nonlinear trends. Econometric Theory 23, 557–614.
Phillips, P. C. B. and V. Solo (1992). Asymptotics for linear processes. The Annals of Statistics 20, 971–1001.
Piaggio, M. and E. Padilla (2012). CO2 emissions and economic activity: Heterogeneity across countries and non-stationary series. Energy Policy 46, 370–381.
Robinson, P. M. (2012). Inference on power law spatial trends. Bernoulli 18, 644–677.
Romano, J. P. and M. Wolf (2001). Subsampling intervals in autoregressive models with linear time trend. Econometrica 69, 1283–1314.
Soong, T. T. (1973). Random Differential Equations in Science and Engineering. Academic Press, Inc.
Stern, D. I. (2004). The rise and fall of the environmental Kuznets curve. World Development 32, 1419–1439.
Stypka, O., M. Wagner, P. Grabarczyk, and R. Kawka (2017). The asymptotic validity of "standard" fully modified OLS estimation and inference in cointegrating polynomial regressions. Working Paper.
Tanaka, K. (2017). Time Series Analysis: Nonstationary and Noninvertible Distribution Theory. John Wiley & Sons.
Wagner, M. (2015). The Environmental Kuznets Curve, cointegration and nonlinearity. Journal of Applied Econometrics 30, 948–967.
Wagner, M., P. Grabarczyk, and S. H. Hong (2019). Fully modified OLS estimation and inference for seemingly unrelated cointegrating polynomial regressions and the Environmental Kuznets Curve for carbon dioxide emissions. Journal of Econometrics 214, 216–255.
Wagner, M. and S. H. Hong (2016). Cointegrating polynomial regressions: Fully modified OLS estimation and inference. Econometric Theory 32, 1289–1315.
Wang, Q., D. Wu, and K. Zhu (2018). Model checks for nonlinear cointegrating regression. Journal of Econometrics 207, 261–284.

Table 1:

The empirical size (in %) of a t-test for $H_0: \phi_2 \ge 0$ versus $H_a: \phi_2 < 0$ in $y_t = \tau_0 + \tau_1 t + \phi_1 x_t + \phi_2 x_t^2 + u_t$. The variable $x_t$ is generated as a random walk process.

DGP: $y_t = -\tau\, t^{\ldots} + u_t$

τ (in 10^{-…})    0     1     2     3     4     5     6
T = 100          5.5   8.5  13.2  19.8  26.7  33.6  39.0
T = 200          5.5  37.4  57.9  65.2  68.6  70.6  72.5

Note: Further simulation details and a theoretical explanation of these Monte Carlo outcomes can be found in the Supplement. The mechanisms that cause this behaviour of the hypothesis test are reminiscent of the literature on spurious regressions and spurious detrending (for example Phillips (1986) and Durlauf and Phillips (1988), respectively).

Table 2: The empirical size (in %) of the coefficient test $H_0: \phi_2 = 0$ versus $H_a: \phi_2 \ne 0$. The Monte Carlo results are based on: simulated inference with $\theta$ estimated by NLS (SimNLS), simulated inference with the true $\theta$ (SimNLS(θ)), a Fully Modified estimator with estimated $\theta$ (FMOLS), and a Fully Modified estimator informed about the true value of $\theta$ (FMOLS(θ)).

Setting          (A)               | (B)               | (C)               | (D)
ρ                …     …     …     | …     …     …     | …     …     …     | …     …     …
T = …
  SimNLS         …
  SimNLS(θ)     5.43  5.38  5.31  |  5.42  5.00  5.08 |  7.16  7.25  7.01 | 19.29 19.48 19.28
  FMOLS        10.70 10.67 11.12  | 18.66 18.52 17.88 | 25.61 25.52 24.30 | 42.97 43.53 42.15
  FMOLS(θ)      7.50  7.00  6.58  | 14.69 14.20 12.90 | 21.08 20.28 19.12 | 38.98 38.54 37.88
T = …
  SimNLS         …
  SimNLS(θ)     5.50  5.21  5.09  |  4.77  4.55  4.83 |  5.54  5.29  5.28 | 10.00  9.65  9.43
  FMOLS         9.33  9.33 10.60  | 14.87 14.68 14.62 | 19.05 19.22 18.30 | 31.59 31.62 30.72
  FMOLS(θ)      6.08  6.02  5.79  | 10.90 10.85 10.20 | 14.58 14.65 13.32 | 26.02 25.98 25.02
T = …
  SimNLS         …
  SimNLS(θ)     5.31  5.34  5.25  |  4.44  4.67  4.70 |  4.75  4.74  4.55 |  5.74  5.71  5.19
  FMOLS         8.41  8.58 10.35  | 11.92 11.78 12.32 | 14.32 14.47 14.18 | 20.48 20.39 19.55
  FMOLS(θ)      5.49  5.38  5.45  |  8.33  8.40  7.85 | 10.62 10.08  9.78 | 14.92 15.22 13.87

Note: The DGP is $y_t = \tau_0 + \tau_1 t + \tau_2 t^{\theta} + \phi_1 x_t + \phi_2 x_t^2 + u_t$ (numerical values are given in Section 4) with $x_t = \sum_{s=1}^{t} v_s$. The innovations follow $[u_t, v_t]' = A[u_{t-1}, v_{t-1}]' + [\eta_t, \epsilon_t]'$ with $[\eta_t, \epsilon_t]' \overset{i.i.d.}{\sim} N\left(0, \begin{bmatrix}1 & \rho\\ \rho & 1\end{bmatrix}\right)$. $A = HLH'$ with $H = U(U'U)^{-1/2}$ and $U$ being a $(2\times 2)$ matrix of uniformly distributed random variables on $[0, 1]$. Settings for $L$: (A) $L = \mathrm{diag}[0, \ldots]$, (B) $L = \mathrm{diag}[0.\ldots, \ldots]$, (C) $L = \mathrm{diag}[0.\ldots, \ldots]$, (D) $L = \mathrm{diag}[0.\ldots, \ldots]$.

[Figure 1 panels: (a) Setting (A), (b) Setting (B), (c) Setting (C), (d) Setting (D)]

Figure 1:

The power curve for the test $H_0: \phi_2 = 0$ versus $H_a: \phi_2 \ne 0$. The simulation DGP is $y_t = \tau_0 + \tau_1 t + \tau_2 t^{\theta} + \phi_1 x_t + \phi_2 x_t^2 + u_t$. Results are shown for settings (A)–(D) with $\rho = 0.50$. Simulation details can be found in Section 4.

Table 3: Simulation results on the confidence intervals for $\theta$. We report the empirical coverage, the coverage (Ω) computed with the true LRVs, and the average length of 95% confidence intervals. All computations use simulated inference, see Section 3.2.

Setting           (A)               | (B)               | (C)               | (D)
ρ                 …     …     …     | …     …     …     | …     …     …     | …     …     …
T = …
  Coverage        …
  Coverage(Ω)   94.55 94.64 95.44  | 94.49 94.38 94.07 | 94.72 94.59 94.45 | 95.66 95.76 96.14
  Length         0.66  0.70  0.81  |  0.87  0.89  0.92 |  1.14  1.14  1.16 |  1.76  1.79  1.76
T = …
  Coverage        …
  Coverage(Ω)   94.98 94.89 94.99  | 95.22 94.80 94.34 | 95.06 95.04 95.07 | 95.61 95.53 95.97
  Length         0.12  0.13  0.15  |  0.17  0.17  0.18 |  0.24  0.24  0.24 |  0.39  0.39  0.39
T = …
  Coverage        …
  Coverage(Ω)   94.94 95.03 94.89  | 94.75 94.64 94.40 | 95.01 95.02 94.51 | 95.07 95.26 95.51
  Length         0.01  0.01  0.02  |  0.02  0.02  0.02 |  0.03  0.03  0.03 |  0.06  0.06  0.06

Note: The DGP is $y_t = \tau_0 + \tau_1 t + \tau_2 t^{\theta} + \phi_1 x_t + \phi_2 x_t^2 + u_t$ (numerical values are given in Section 4) with $x_t = \sum_{s=1}^{t} v_s$. The innovations follow $[u_t, v_t]' = A[u_{t-1}, v_{t-1}]' + [\eta_t, \epsilon_t]'$ with $[\eta_t, \epsilon_t]' \overset{i.i.d.}{\sim} N\left(0, \begin{bmatrix}1 & \rho\\ \rho & 1\end{bmatrix}\right)$. $A = HLH'$ with $H = U(U'U)^{-1/2}$ and $U$ being a $(2\times 2)$ matrix of uniformly distributed random variables on $[0, 1]$. Settings for $L$: (A) $L = \mathrm{diag}[0, \ldots]$, (B) $L = \mathrm{diag}[0.\ldots, \ldots]$, (C) $L = \mathrm{diag}[0.\ldots, \ldots]$, (D) $L = \mathrm{diag}[0.\ldots, \ldots]$.

Table 4: The empirical size (in %) of the subsampling Bonferroni KPSS tests. The row labeled 'KPSS' is computable in practice. We additionally report simulation outcomes on the same test when it is informed about the true value of $\theta$, see the row labeled 'KPSS(θ)'.

Setting           (A)               | (B)               | (C)               | (D)
ρ                 …     …     …     | …     …     …     | …     …     …     | …     …     …
T = …
  KPSS            …
  KPSS(θ)        0.69  0.78  1.90  |  1.32  1.46  1.89 |  2.30  2.23  2.23 | 16.48 16.51 16.42
T = …
  KPSS            …
  KPSS(θ)        1.19  1.24  2.69  |  1.82  2.08  2.70 |  2.43  2.38  2.50 |  8.92  8.86  8.24
T = …
  KPSS            …
  KPSS(θ)        1.36  1.76  3.13  |  2.27  2.24  2.56 |  2.32  2.27  2.54 |  4.35  3.98  3.68

Note: The DGP is $y_t = \tau_0 + \tau_1 t + \tau_2 t^{\theta} + \phi_1 x_t + \phi_2 x_t^2 + u_t$ with $\phi_2 = \ldots$ and $x_t = \sum_{s=1}^{t} v_s$. The innovations follow $[u_t, v_t]' = A[u_{t-1}, v_{t-1}]' + [\eta_t, \epsilon_t]'$ with $[\eta_t, \epsilon_t]' \overset{i.i.d.}{\sim} N\left(0, \begin{bmatrix}1 & \rho\\ \rho & 1\end{bmatrix}\right)$. $A = HLH'$ with $H = U(U'U)^{-1/2}$ and $U$ being a $(2\times 2)$ matrix of uniformly distributed random variables on $[0, 1]$. Settings for $L$: (A) $L = \mathrm{diag}[0, \ldots]$, (B) $L = \mathrm{diag}[0.\ldots, \ldots]$, (C) $L = \mathrm{diag}[0.\ldots, \ldots]$, (D) $L = \mathrm{diag}[0.\ldots, \ldots]$.

Table 5: Parameter estimates and output of the KPSS-type test for stationarity as computed for model specifications (M1)–(M4). The columns $\widehat{KPSS}$ and $M_{opt}$ provide the numerical values of the KPSS tests and the number of chosen residual subblocks, respectively.

[Table body not recoverable: for each country (Australia, Austria, Belgium, Canada, Denmark, Finland, France, Germany, Italy, Japan, Netherlands, Norway, Portugal, Spain, Sweden, Switzerland, UK, USA) the table lists the parameter estimates $\hat\phi_1$, $\hat\phi_2$ and $\hat\theta$ for (M1)–(M4) and, under "Stationarity tests", $\widehat{KPSS}$ and $M_{opt}$ for (M1)–(M4).]

Note: Asterisks denote rejection of the null hypothesis at the ***1%, **5%, and *10% significance level. Depending on the specific table entry, the null hypothesis refers to either a coefficient being zero or (nonlinear) cointegration.

[Figure 2 panels (a)–(f)]

Figure 2: Overview graphs for Belgium over 1870–2014. (a) log(GDP) versus log(CO2) (both per capita). (b) As subfigure (a) but using detrended variables. (c) The log per capita CO2 emissions time series for Belgium. (d) The residual sum of squares (RSS) for the nonlinear model specification $y_t = \tau_0 + \tau_1 t + \phi_1 x_t + \phi_2 x_t^{\theta} + u_t$ for various values of $\theta$. (e) The RSS as a function of $\theta$ for the flexible nonlinear trend specification $y_t = \tau_0 + \tau_1 t + \tau_2 t^{\theta} + \phi_1 x_t + u_t$. (f) The relation between $x_t$ and $y_t$ after partialling out the constant, linear trend, and flexible deterministic trend.

Figure 3: Estimates and 95% confidence intervals for $\theta$ in model specification (M4).

Figure 4: Estimation results for CO2 emissions: actual values (black), fitted values under the CPR model $y_t = \tau_0 + \tau_1 t + \phi_1 x_t + \phi_2 x_t^2 + u_t$ (red), and fitted values under the GCPR model $y_t = \tau_0 + \tau_1 t + \tau_2 t^{\theta} + \phi_1 x_t + u_t$ (blue).

Cointegrating Polynomial Regressions with Power Law Trends: A New Angle on the Environmental Kuznets Curve

Yicong Lin and Hanno Reuvers

A Useful Lemmas

In this section, we first show some preliminary results that will be used in the proofs of the main theorems (Section B).

Lemma 1
(i) For $a_L > -1$, we have $\sup_{a \ge a_L}\Big|\frac{1}{T}\sum_{t=1}^{T}\big(\frac{t}{T}\big)^{a}\Big| \le C$.
(ii) Under Assumption 2, for any $a > -1/2$ and any $k \ge 0$, $E\Big(\frac{1}{\sqrt{T}}\sum_{t=1}^{T}\big(\frac{t}{T}\big)^{a}(\ln t)^{k}u_t\Big)^2 \le C(\ln T)^{2k}$.
(iii) Under Assumption 2, for some $a_L$ and $a_U$ such that $-1/2 < a_L < a_U < \infty$, and any $k \ge 0$, $E\Big(\sup_{a\in[a_L,a_U]}\Big|\frac{1}{\sqrt{T}}\sum_{t=1}^{T}\big(\frac{t}{T}\big)^{a}(\ln t)^{k}u_t\Big|\Big)^2 \le C(\ln T)^{2k}$.
(iv) If $a_L$ and $a_U$ satisfy $-1 < a_L < a_U < \infty$, and if $k \in \{0, 1, 2, \dots\}$, then
$$\sup_{a\in[a_L,a_U]}\Big|\frac{1}{T}\sum_{t=1}^{T}\Big(\frac{t}{T}\Big)^{a}\Big(\ln\frac{t}{T}\Big)^{k} - \int_0^1 r^{a}(\ln r)^{k}\,dr\Big| \le C\,\frac{(\ln T)^{k+1}}{T^{1+\min(a_L,0)}}.$$

Proof
(i)

This is shown in lemma 4 of Robinson (2012). (ii)

Note that
$$E\left(\frac{1}{\sqrt{T}}\sum_{t=1}^{T}\left(\frac{t}{T}\right)^{a}(\ln t)^{k}u_t\right)^{2} = \frac{1}{T}\sum_{t=1}^{T}\sum_{s=1}^{T}\left(\frac{t}{T}\right)^{a}\left(\frac{s}{T}\right)^{a}(\ln t)^{k}(\ln s)^{k}E(u_tu_s) \le \frac{(\ln T)^{2k}}{T}\sum_{t=1}^{T}\sum_{s=1}^{T}\left(\frac{t}{T}\right)^{a}\left(\frac{s}{T}\right)^{a}\left|E(u_tu_s)\right| \le \frac{2(\ln T)^{2k}}{T}\sum_{t=1}^{T}\sum_{s=0}^{t-1}\left(\frac{t}{T}\right)^{a}\left(\frac{t-s}{T}\right)^{a}\left|\gamma_s\right|, \qquad (A.1)$$
where we define $\gamma_s = E(u_tu_{t-s})$. For the given index ranges, we also have $|t-s| \le t$ such that
$$E\left(\frac{1}{\sqrt{T}}\sum_{t=1}^{T}\left(\frac{t}{T}\right)^{a}(\ln t)^{k}u_t\right)^{2} \le \frac{2(\ln T)^{2k}}{T}\sum_{t=1}^{T}\left(\frac{t}{T}\right)^{2a}\sum_{s=0}^{\infty}|\gamma_s|. \qquad (A.2)$$
The first summation in the RHS of (A.2) is bounded in view of Lemma 1(i), and $\sum_{s=0}^{\infty}|\gamma_s| < \infty$ due to Assumption 2(a) (cf. Appendix 3.A in Hamilton (1994)).

(iii) Using the equality $\left(\frac{t}{T}\right)^{a} = \left(\frac{1}{T}\right)^{a} + \sum_{s=1}^{t-1}\left[\left(\frac{s+1}{T}\right)^{a} - \left(\frac{s}{T}\right)^{a}\right]$ and a change in the order of summation, we find
$$\sum_{t=1}^{T}\left(\frac{t}{T}\right)^{a}(\ln t)^{k}u_t = \left(\frac{1}{T}\right)^{a}\sum_{t=1}^{T}(\ln t)^{k}u_t + \sum_{s=1}^{T-1}\left[\left(\frac{s+1}{T}\right)^{a} - \left(\frac{s}{T}\right)^{a}\right]\sum_{t=s+1}^{T}(\ln t)^{k}u_t$$
$$= \left(\frac{1}{T}\right)^{a}\sum_{t=1}^{T}(\ln t)^{k}u_t + \sum_{s=1}^{T-1}\left[\left(\frac{s+1}{T}\right)^{a} - \left(\frac{s}{T}\right)^{a}\right]\left(\sum_{t=1}^{T}(\ln t)^{k}u_t - \sum_{t=1}^{s}(\ln t)^{k}u_t\right)$$
$$= \left(\frac{1}{T}\right)^{a}\sum_{t=1}^{T}(\ln t)^{k}u_t + \sum_{t=1}^{T}(\ln t)^{k}u_t - \left(\frac{1}{T}\right)^{a}\sum_{t=1}^{T}(\ln t)^{k}u_t - \sum_{s=1}^{T-1}\left[\left(\frac{s+1}{T}\right)^{a} - \left(\frac{s}{T}\right)^{a}\right]\sum_{t=1}^{s}(\ln t)^{k}u_t,$$
where the last step uses the telescoping sum $\sum_{s=1}^{T-1}\left[((s+1)/T)^{a} - (s/T)^{a}\right] = 1 - (1/T)^{a}$. The first and third terms cancel, so that
$$E\left(\sup_{a\in[a_L,a_U]}\left|\frac{1}{\sqrt{T}}\sum_{t=1}^{T}\left(\frac{t}{T}\right)^{a}(\ln t)^{k}u_t\right|\right) \le E\left|\frac{1}{\sqrt{T}}\sum_{t=1}^{T}(\ln t)^{k}u_t\right| + E\left(\sup_{a\in[a_L,a_U]}\left|\frac{1}{\sqrt{T}}\sum_{s=1}^{T-1}\left[\left(\frac{s+1}{T}\right)^{a} - \left(\frac{s}{T}\right)^{a}\right]\sum_{t=1}^{s}(\ln t)^{k}u_t\right|\right). \qquad (A.3)$$
For the first term in the RHS of (A.3), we have $E\left|\frac{1}{\sqrt{T}}\sum_{t=1}^{T}(\ln t)^{k}u_t\right| \le \left(E\left(\frac{1}{\sqrt{T}}\sum_{t=1}^{T}(\ln t)^{k}u_t\right)^{2}\right)^{1/2} \le C(\ln T)^{k}$ by Lemma 1(ii) with $a = 0$. For the second term, note that
$$\left|\frac{1}{\sqrt{T}}\sum_{s=1}^{T-1}\left[\left(\frac{s+1}{T}\right)^{a} - \left(\frac{s}{T}\right)^{a}\right]\sum_{t=1}^{s}(\ln t)^{k}u_t\right| \le \frac{1}{\sqrt{T}}\sum_{s=1}^{T-1}\left(\frac{s}{T}\right)^{a}\left|\left(1+\frac{1}{s}\right)^{a} - 1\right|\left|\sum_{t=1}^{s}(\ln t)^{k}u_t\right|. \qquad (A.4)$$
To deal with the supremum of $\left|(1+1/s)^{a} - 1\right|$ over $[a_L, a_U]$, we define $g_a(x) = (1+x)^{a} - 1$ for $0 \le x \le 1$. If $-1 < a \le 1$, then $|g_a(x)| \le |a|x$ by Bernoulli's inequality. If $a \ge 1$, then convexity of $g_a(x)$ implies $g_a(x) \le (1-x)g_a(0) + xg_a(1) \le (2^{a} - 1)x$. We conclude that $|g_a(x)| \le Cx$ for all $a_L \le a \le a_U$ and $x \in [0,1]$. Hence
$$E\left(\sup_{a\in[a_L,a_U]}\left|\frac{1}{\sqrt{T}}\sum_{s=1}^{T-1}\left[\left(\frac{s+1}{T}\right)^{a} - \left(\frac{s}{T}\right)^{a}\right]\sum_{t=1}^{s}(\ln t)^{k}u_t\right|\right) \le E\left(\frac{1}{\sqrt{T}}\sum_{s=1}^{T-1}\left(\frac{s}{T}\right)^{a_L}\sup_{a\in[a_L,a_U]}\left|\left(1+\frac{1}{s}\right)^{a} - 1\right|\left|\sum_{t=1}^{s}(\ln t)^{k}u_t\right|\right) \le \frac{C}{\sqrt{T}}\sum_{s=1}^{T-1}\left(\frac{s}{T}\right)^{a_L}\frac{1}{s}E\left|\sum_{t=1}^{s}(\ln t)^{k}u_t\right| \le CT^{-(a_L+1/2)}\sum_{s=1}^{T-1}s^{a_L-1/2}(\ln s)^{k} \le C(\ln T)^{k}\left(\frac{1}{T}\sum_{s=1}^{T}\left(\frac{s}{T}\right)^{a_L-1/2}\right) \le C(\ln T)^{k},$$
where we used $E\left|\sum_{t=1}^{s}(\ln t)^{k}u_t\right| \le \left(E\left(\sum_{t=1}^{s}(\ln t)^{k}u_t\right)^{2}\right)^{1/2} \le Cs^{1/2}(\ln s)^{k}$ (the steps in the proof of (ii) require a small modification to establish this) in the third inequality, and (i) to obtain the final inequality. The proof is complete since we have bounded both terms in the RHS of (A.3).
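Two ingredients of the proof of (iii) lend themselves to a quick numerical sanity check: the summation-by-parts identity behind (A.3), written as $\sum_{t=1}^{T}(t/T)^{a}c_t = S_T - \sum_{s=1}^{T-1}[((s+1)/T)^{a} - (s/T)^{a}]S_s$ with $S_s = \sum_{t=1}^{s}c_t$, and the bound $|g_a(x)| \le Cx$ on $[0,1]$. The Python sketch below is illustrative only. The helper names (`abel_gap`, `g`, `g_constant`) are hypothetical, and an arbitrary seeded Gaussian sequence stands in for $(\ln t)^{k}u_t$. Floating-point agreement illustrates the identities but does not prove them.

```python
import random


def abel_gap(a, c):
    """Absolute gap between sum_t (t/T)^a c_t and its summation-by-parts form
    S_T - sum_{s=1}^{T-1} [((s+1)/T)^a - (s/T)^a] S_s, with S_s = c_1 + ... + c_s."""
    T = len(c)
    lhs = sum((t / T) ** a * c[t - 1] for t in range(1, T + 1))
    partial = [0.0] * (T + 1)  # partial[s] holds S_s
    for s in range(1, T + 1):
        partial[s] = partial[s - 1] + c[s - 1]
    rhs = partial[T] - sum(
        (((s + 1) / T) ** a - (s / T) ** a) * partial[s] for s in range(1, T)
    )
    return abs(lhs - rhs)


def g(a, x):
    """g_a(x) = (1 + x)^a - 1."""
    return (1.0 + x) ** a - 1.0


def g_constant(a):
    """A constant C with |g_a(x)| <= C x on [0, 1]: Bernoulli's inequality
    gives |a| for -1 < a <= 1; the chord of the convex map gives 2^a - 1 for a >= 1."""
    return abs(a) if a <= 1 else 2.0 ** a - 1.0


if __name__ == "__main__":
    random.seed(0)
    c = [random.gauss(0.0, 1.0) for _ in range(50)]
    # Only rounding error should remain in the identity.
    print(max(abel_gap(a, c) for a in (-0.7, 0.0, 0.5, 2.3)))
```

The identity holds for any real $a$ because the sum starts at $t = 1$. This is why the first increment is written as $(1/T)^{a}$ rather than as a difference involving $0^{a}$.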
(iv) If we divide the integral into integration intervals of width $1/T$, then we find
$$\sup_{a\in[a_L,a_U]}\left|\frac{1}{T}\sum_{t=1}^{T}\left(\frac{t}{T}\right)^{a}\left(\ln\frac{t}{T}\right)^{k} - \int_0^1 r^{a}(\ln r)^{k}\,dr\right| = \sup_{a\in[a_L,a_U]}\left|\sum_{t=1}^{T}\int_{(t-1)/T}^{t/T}\left(\frac{t}{T}\right)^{a}\left(\ln\frac{t}{T}\right)^{k}dr - \sum_{t=1}^{T}\int_{(t-1)/T}^{t/T}r^{a}(\ln r)^{k}\,dr\right|$$
$$= \sup_{a\in[a_L,a_U]}\left|\frac{1}{T}\left(\frac{1}{T}\right)^{a}\left(\ln\frac{1}{T}\right)^{k} - \int_0^{1/T}r^{a}(\ln r)^{k}\,dr + \sum_{t=2}^{T}\int_{(t-1)/T}^{t/T}\left[\left(\frac{t}{T}\right)^{a}\left(\ln\frac{t}{T}\right)^{k} - r^{a}(\ln r)^{k}\right]dr\right|$$
$$\le \sup_{a\in[a_L,a_U]}\left|\left(\frac{1}{T}\right)^{a+1}\left(\ln\frac{1}{T}\right)^{k}\right| + \sup_{a\in[a_L,a_U]}\left|\int_0^{1/T}r^{a}(\ln r)^{k}\,dr\right| + \sup_{a\in[a_L,a_U]}\sum_{t=2}^{T}\int_{(t-1)/T}^{t/T}\left|\left(\frac{t}{T}\right)^{a}\left(\ln\frac{t}{T}\right)^{k} - r^{a}(\ln r)^{k}\right|dr =: I_a + I_b + I_c, \qquad (A.5)$$
using the triangle inequality. Clearly, $I_a$ is bounded by $T^{-(a_L+1)}(\ln T)^{k}$. For $I_b$ we can use the standard integral (cf. Adams and Essex (2016)), namely $\int_0^{1/T}r^{a}(\ln r)^{k}\,dr = \frac{(-1)^{k}}{a+1}\left(\frac{1}{T}\right)^{a+1}(\ln T)^{k} - \frac{k}{a+1}\int_0^{1/T}r^{a}(\ln r)^{k-1}\,dr$ for $k \ge 1$ and $a \neq -1$, to obtain
$$\int_0^{1/T}r^{a}(\ln r)^{k}\,dr = (-1)^{k}\left(\frac{1}{T}\right)^{a+1}\sum_{j=0}^{k-1}\frac{k!}{(k-j)!}\frac{1}{(a+1)^{1+j}}(\ln T)^{k-j} + \frac{(-1)^{k}k!}{(a+1)^{k}}\int_0^{1/T}r^{a}\,dr = (-1)^{k}\left(\frac{1}{T}\right)^{a+1}\sum_{j=0}^{k}\frac{k!}{(k-j)!}\frac{1}{(a+1)^{1+j}}(\ln T)^{k-j}.$$
We therefore conclude that
$$I_b \le \sum_{j=0}^{k}\frac{k!}{(k-j)!}\sup_{a\in[a_L,a_U]}\frac{1}{(a+1)^{1+j}}\left(\frac{1}{T}\right)^{a+1}(\ln T)^{k-j} \le \sum_{j=0}^{k}\frac{k!}{(k-j)!}\frac{1}{(a_L+1)^{1+j}}\left(\frac{1}{T}\right)^{a_L+1}(\ln T)^{k-j} \le CT^{-(a_L+1)}(\ln T)^{k}.$$
It remains to bound the term $I_c$. Changing the integration variable to $r = t/T - s$ yields
$$I_c = \sup_{a\in[a_L,a_U]}\sum_{t=2}^{T}\int_0^{1/T}\left|\left(\frac{t}{T}\right)^{a}\left(\ln\frac{t}{T}\right)^{k} - \left(\frac{t}{T}-s\right)^{a}\left[\ln\left(\frac{t}{T}-s\right)\right]^{k}\right|ds. \qquad (A.6)$$
We subsequently derive an upper bound for the integrand using an approach which mimics the derivations in (D.14) and (D.15) in Robinson (2012).
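The closed form for $\int_0^{1/T}r^{a}(\ln r)^{k}\,dr$ used to bound $I_b$, and the limit $\int_0^1 r^{a}(\ln r)^{k}\,dr = (-1)^{k}k!/(a+1)^{k+1}$ that appears in Lemma 1(iv), can be cross-checked numerically. This minimal sketch uses function names of our own choosing, which are hypothetical. It compares the closed form with the integration-by-parts recursion and with a composite Simpson rule after the substitution $r = e^{-u}/T$. `riemann_gap` measures the Riemann-sum error that Lemma 1(iv) bounds.

```python
import math


def closed_form(a, k, T):
    """(-1)^k T^{-(a+1)} sum_{j=0}^{k} k!/(k-j)! (ln T)^{k-j} / (a+1)^{j+1}."""
    ln_T = math.log(T)
    total = sum(
        math.factorial(k) / math.factorial(k - j) * ln_T ** (k - j) / (a + 1) ** (j + 1)
        for j in range(k + 1)
    )
    return (-1) ** k * T ** (-(a + 1)) * total


def by_recursion(a, k, T):
    """Integration-by-parts recursion in k, started from int_0^{1/T} r^a dr."""
    if k == 0:
        return T ** (-(a + 1)) / (a + 1)
    boundary = (-1) ** k * T ** (-(a + 1)) * math.log(T) ** k / (a + 1)
    return boundary - k / (a + 1) * by_recursion(a, k - 1, T)


def by_quadrature(a, k, T, n=20000):
    """Composite Simpson rule after r = e^{-u} / T, which maps (0, 1/T] to [0, inf);
    the tail beyond u = 60 / (a + 1) is ignored (it is of order e^{-60})."""
    upper = 60.0 / (a + 1)
    h = upper / n  # n must be even for Simpson's rule

    def integrand(u):
        return math.exp(-(a + 1) * u) * (-math.log(T) - u) ** k

    acc = integrand(0.0) + integrand(upper)
    acc += sum((4 if i % 2 else 2) * integrand(i * h) for i in range(1, n))
    return T ** (-(a + 1)) * acc * h / 3.0


def riemann_gap(a, k, T):
    """|T^{-1} sum_t (t/T)^a (ln(t/T))^k - int_0^1 r^a (ln r)^k dr|."""
    s = sum((t / T) ** a * math.log(t / T) ** k for t in range(1, T + 1)) / T
    return abs(s - (-1) ** k * math.factorial(k) / (a + 1) ** (k + 1))
```

For $a = 1$ and $k = 0$ the Riemann sum equals $(T+1)/(2T)$. The gap is therefore exactly $1/(2T)$, which gives a simple check on `riemann_gap`.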
For any $2/T \le \ell \le 1$ and $0 \le s \le 1/T$ (hence $0 \le s/\ell \le 1/2$), we have
$$\left|\ell^{a}(\ln\ell)^{k} - (\ell-s)^{a}\left(\ln(\ell-s)\right)^{k}\right| = \left|\left[\ell^{a} - (\ell-s)^{a}\right](\ln\ell)^{k} + (\ell-s)^{a}\left[(\ln\ell)^{k} - (\ln(\ell-s))^{k}\right]\right| \le \left|\left[\ell^{a} - (\ell-s)^{a}\right](\ln\ell)^{k}\right| + \left|(\ell-s)^{a}\left[(\ln\ell)^{k} - (\ln(\ell-s))^{k}\right]\right| = \ell^{a}|\ln\ell|^{k}\left|1 - (1-s/\ell)^{a}\right| + \ell^{a}(1-s/\ell)^{a}\left|(\ln\ell)^{k} - (\ln(\ell-s))^{k}\right| =: II_a + II_b, \qquad (A.7)$$
by the triangle inequality and the fact that $\left|(\ell-s)^{a}\right| = (\ell-s)^{a}$. For $II_a$, similar arguments as those found below (A.4) give $\left|1 - (1-x)^{a}\right| \le Cx$, and hence

$$II_a \le C\ell^{a_L}|\ln\ell|^{k}\frac{s}{\ell} \le C\ell^{a_L-1}|\ln\ell|^{k}s \le C\ell^{a_L-1}|\ln\ell|^{k}\frac{1}{T} \le C\ell^{a_L-1}\frac{(\ln T)^{k}}{T}, \qquad (A.8)$$
since $|\ln\ell| \le |\ln T|$ for all $1/T \le \ell \le 1$. For

$II_b$, we first note that $1/2 \le 1 - s/\ell \le 1$, so that $0 < (1-s/\ell)^{a} \le (1-s/\ell)^{-1} \le 2$ for $a \ge a_L > -1$. Moreover, we use the factorization $p^{n} - q^{n} = (p-q)\sum_{j=0}^{n-1}p^{n-1-j}q^{j}$ to obtain
$$\left|(\ln\ell)^{k} - (\ln(\ell-s))^{k}\right| = \left|\ln\ell - \ln(\ell-s)\right|\left|\sum_{j=0}^{k-1}(\ln\ell)^{k-1-j}\left(\ln(\ell-s)\right)^{j}\right| = \left|\ln(1-s/\ell)\right|\left|\sum_{j=0}^{k-1}(\ln\ell)^{k-1-j}\left(\ln(\ell-s)\right)^{j}\right| \le \left|\ln(1-s/\ell)\right|\sum_{j=0}^{k-1}|\ln\ell|^{k-1-j}|\ln(\ell-s)|^{j} \le k\left|\ln(1-s/\ell)\right|(\ln T)^{k-1} \le 2k\frac{s}{\ell}(\ln T)^{k-1}, \qquad (A.9)$$
where we used two facts. First, for any $x > -1$, we have the inequality $\frac{x}{1+x} \le \ln(1+x) \le x$. This implies that $\left|\ln(1-s/\ell)\right| = -\ln(1-s/\ell) \le \frac{s/\ell}{1-s/\ell} \le 2\frac{s}{\ell}$. Second, $1/T \le \ell - s \le 1$, so that $|\ln(\ell-s)| \le \ln T$ (and likewise $|\ln\ell| \le \ln T$). Combining all previous results for $II_b$ gives

$$II_b \le C\ell^{a}\frac{s}{\ell}(\ln T)^{k-1} \le C\ell^{a_L-1}\frac{(\ln T)^{k-1}}{T}.$$
Since $1/T \le \ell \le 1$, we use the bounds on $II_a$ and $II_b$ to bound the integrand of (A.6) as follows:
$$I_c \le C\sup_{a\in[a_L,a_U]}\sum_{t=2}^{T}\int_0^{1/T}\left(\frac{t}{T}\right)^{a_L-1}\frac{(\ln T)^{k}}{T}\,ds \le C\frac{(\ln T)^{k}}{T^{2}}\sum_{t=1}^{T}\left(\frac{t}{T}\right)^{a_L-1}.$$
The asymptotic order of $\sum_{t=1}^{T}(t/T)^{a_L-1}$ depends on the value of $a_L$. We distinguish three cases: (1) if $a_L < 0$, then $\sum_{t=1}^{T}(t/T)^{a_L-1} = T^{1-a_L}\sum_{t=1}^{T}t^{a_L-1} = T^{1-a_L}O(1)$, (2) if $a_L = 0$, then $\sum_{t=1}^{T}(t/T)^{a_L-1} = T\sum_{t=1}^{T}t^{-1} = T\,O(\ln T)$, and (3) if $a_L > 0$, then $\sum_{t=1}^{T}(t/T)^{a_L-1} = O(T)$ by Lemma 1(i). Overall, we have
$$I_c \le C\frac{(\ln T)^{k}}{T^{2}}\sum_{t=1}^{T}\left(\frac{t}{T}\right)^{a_L-1} = O\left(\frac{(\ln T)^{k}}{T^{a_L+1}}\mathbb{1}\{a_L<0\} + \frac{(\ln T)^{k+1}}{T}\mathbb{1}\{a_L=0\} + \frac{(\ln T)^{k}}{T}\mathbb{1}\{a_L>0\}\right). \qquad (A.10)$$
It is seen that $I_a$, $I_b$, and $I_c$ converge to zero as $T\to\infty$. The proof follows from (A.5). ∎

Lemma 2

Let Assumption 2 hold. For any $a$ such that $-1/2 < a_L \le a \le a_U < \infty$, any $i \in \{1, 2, \ldots, m\}$, any $j \in \{1, 2, \ldots, p_i\}$, and $k \in \{0, 1, 2, \ldots\}$, we have:
(i) $\frac{1}{\sqrt{T}}\sum_{t=1}^{T}\left(\frac{x_{it}}{\sqrt{T}}\right)^{j}u_t \Rightarrow \int_0^1 B_{v_i}^{j}(r)\,dB_u(r) + j\Delta_{v_iu}\int_0^1 B_{v_i}^{j-1}(r)\,dr$,
(ii) $\frac{1}{\sqrt{T}}\sum_{t=1}^{T}\left(\frac{t}{T}\right)^{a}\left(\ln\frac{t}{T}\right)^{k}u_t \Rightarrow \int_0^1 r^{a}(\ln r)^{k}\,dB_u(r)$,
(iii) $\frac{1}{T}\sum_{t=1}^{T}\left(\frac{t}{T}\right)^{a}\left(\ln\frac{t}{T}\right)^{k}\left(\frac{x_{it}}{\sqrt{T}}\right)^{j} \Rightarrow \int_0^1 r^{a}(\ln r)^{k}B_{v_i}^{j}(r)\,dr$.
Moreover, the weak convergence in (i)-(iii) holds jointly.

Proof For $r \in (0,1]$, let $f(r) = r^{a}(\ln r)^{k}$. Two partial sum processes are defined as $S_T(r) = \frac{1}{\sqrt{T}}\sum_{s=1}^{[rT]}u_s$ and $X_{i,T}(r) = \frac{1}{\sqrt{T}}x_{i,[rT]} = \frac{1}{\sqrt{T}}\sum_{t=1}^{[rT]}v_{it}$. Finally, set $f_T(r) = \left(\frac{[rT]}{T}\right)^{a}\left(\ln\frac{[rT]}{T}\right)^{k}$ for $r \in [1/T, 1]$. (i) This result follows from lemma 1 of Hong and Phillips (2010). (ii)

We have
$$\frac{1}{\sqrt{T}}\left(\frac{t}{T}\right)^{a}\left(\ln\frac{t}{T}\right)^{k}u_t = f_T\left(\frac{t}{T}\right)\frac{u_t}{\sqrt{T}} = f_T\left(\frac{t}{T}\right)\left[S_T\left(\frac{t}{T}\right) - S_T\left(\frac{t-1}{T}\right)\right] = \left[f_T\left(\frac{t}{T}\right)S_T\left(\frac{t}{T}\right) - f_T\left(\frac{t-1}{T}\right)S_T\left(\frac{t-1}{T}\right)\right] - \left[f_T\left(\frac{t}{T}\right) - f_T\left(\frac{t-1}{T}\right)\right]S_T\left(\frac{t-1}{T}\right) \qquad (A.11)$$
and hence
$$\frac{1}{\sqrt{T}}\sum_{t=1}^{T}\left(\frac{t}{T}\right)^{a}\left(\ln\frac{t}{T}\right)^{k}u_t = \left(\frac{1}{T}\right)^{a}\left(\ln\frac{1}{T}\right)^{k}\frac{u_1}{\sqrt{T}} + \frac{1}{\sqrt{T}}\sum_{t=2}^{T}\left(\frac{t}{T}\right)^{a}\left(\ln\frac{t}{T}\right)^{k}u_t \overset{(A.11)}{=} f_T\left(\frac{1}{T}\right)S_T\left(\frac{1}{T}\right) + \left[f_T(1)S_T(1) - f_T\left(\frac{1}{T}\right)S_T\left(\frac{1}{T}\right)\right] - \sum_{t=2}^{T}\left[f_T\left(\frac{t}{T}\right) - f_T\left(\frac{t-1}{T}\right)\right]S_T\left(\frac{t-1}{T}\right) \overset{f_T(1)=0}{=} -\sum_{t=2}^{T}\int_{(t-1)/T}^{t/T}S_T(r)\,df_T(r), \qquad (A.12)$$
where we used the fact that $S_T(\cdot)$ is piecewise constant. In view of Assumption 2, we can suitably extend the probability space and have the following uniformly strong approximation of the partial sum process $S_T$ (see, for example, page 562 of Phillips (2007)):
$$\sup_{1\le t\le T}\left|S_T\left(\frac{t-1}{T}\right) - B_u\left(\frac{t-1}{T}\right)\right| = o_{a.s.}\left(\frac{1}{T^{1/2-1/q}}\right), \qquad (A.13)$$
for $q > 2$. Continuing from (A.12), this uniformly strong approximation gives
$$\frac{1}{\sqrt{T}}\sum_{t=1}^{T}\left(\frac{t}{T}\right)^{a}\left(\ln\frac{t}{T}\right)^{k}u_t = -\sum_{t=2}^{T}\int_{(t-1)/T}^{t/T}B_u(r)\,df_T(r) + o_{a.s.}\left(\frac{1}{T^{1/2-1/q}}\right)$$
$$= -\int_{1/T}^{1}B_u(r)\,df_T(r) + o_{a.s.}\left(\frac{1}{T^{1/2-1/q}}\right)$$
$$= B_u\left(\frac{1}{T}\right)f_T\left(\frac{1}{T}\right) + \int_{1/T}^{1}f_T(r)\,dB_u(r) + o_{a.s.}\left(\frac{1}{T^{1/2-1/q}}\right)$$
$$= \int_0^1 f(r)\,dB_u(r) - \int_0^{1/T}f(r)\,dB_u(r) + B_u\left(\frac{1}{T}\right)f_T\left(\frac{1}{T}\right) + \int_{1/T}^{1}\left[f_T(r) - f(r)\right]dB_u(r) + o_{a.s.}\left(\frac{1}{T^{1/2-1/q}}\right), \qquad (A.14)$$
where the third line is obtained using integration by parts of the mean square Riemann-Stieltjes integral, cf. theorem 2.7 in Tanaka (2017). It remains to show that $\int_0^{1/T}f(r)\,dB_u(r)$, $B_u(1/T)f_T(1/T)$, and $\int_{1/T}^{1}\left[f_T(r) - f(r)\right]dB_u(r)$ are asymptotically negligible. These quantities have zero mean, so it suffices to show that their variances vanish as $T\to\infty$. By the isometry property and steps similar to those above (A.6), we have
$$\mathrm{Var}\left(\int_0^{1/T}f(r)\,dB_u(r)\right) = \Omega_{uu}\int_0^{1/T}\left[f(r)\right]^{2}dr \le CT^{-(2a_L+1)}(\ln T)^{2k} \to 0, \qquad (A.15)$$
as $T\to\infty$. Also, $\mathrm{Var}\left(B_u(1/T)f_T(1/T)\right) = \frac{1}{T}\Omega_{uu}\left[f_T(1/T)\right]^{2} = \Omega_{uu}\left(\frac{1}{T}\right)^{2a+1}\left(\ln\frac{1}{T}\right)^{2k} \to 0$. To control the variance of $\int_{1/T}^{1}\left[f_T(r) - f(r)\right]dB_u(r)$, we look at
$$\int_{1/T}^{1}|f(r) - f_T(r)|^{2}dr = \sum_{t=2}^{T}\int_{(t-1)/T}^{t/T}\left|f(r) - \left(\frac{t-1}{T}\right)^{a}\left(\ln\frac{t-1}{T}\right)^{k}\right|^{2}dr = \sum_{t=1}^{T-1}\int_{t/T}^{(t+1)/T}\left|r^{a}(\ln r)^{k} - \left(\frac{t}{T}\right)^{a}\left(\ln\frac{t}{T}\right)^{k}\right|^{2}dr = \sum_{t=1}^{T-1}\int_0^{1/T}\left|\left(\frac{t}{T}+s\right)^{a}\left[\ln\left(\frac{t}{T}+s\right)\right]^{k} - \left(\frac{t}{T}\right)^{a}\left(\ln\frac{t}{T}\right)^{k}\right|^{2}ds. \qquad (A.16)$$
Now let $\ell \in \{1/T, 2/T, \ldots, (T-1)/T\}$ and recall that $0 \le s \le 1/T$ (hence also $0 \le s/\ell \le 1$). Then
$$\left|(\ell+s)^{a}(\ln(\ell+s))^{k} - \ell^{a}(\ln\ell)^{k}\right| = \left|\left[(\ell+s)^{a} - \ell^{a}\right](\ln(\ell+s))^{k} + \ell^{a}\left[(\ln(\ell+s))^{k} - (\ln\ell)^{k}\right]\right| \le \left|\left[(\ell+s)^{a} - \ell^{a}\right](\ln(\ell+s))^{k}\right| + \left|\ell^{a}\left[(\ln(\ell+s))^{k} - (\ln\ell)^{k}\right]\right| = \ell^{a}\left|(1+s/\ell)^{a} - 1\right||\ln(\ell+s)|^{k} + \ell^{a}\left|(\ln(\ell+s))^{k} - (\ln\ell)^{k}\right| =: II_c + II_d. \qquad (A.17)$$
By the inequality $|g_a(x)| \le Cx$ below (A.4) and the fact that $|\ln(\ell+s)| \le |\ln\ell| + |\ln(1+s/\ell)| \le \ln T + s/\ell$, we obtain $II_c \le C\ell^{a_L}\frac{s}{\ell}\left|\ln T + \frac{s}{\ell}\right|^{k} \le C\ell^{a_L-1}\frac{(\ln T)^{k}}{T}$. Moreover, the factorisation $p^{n} - q^{n} = (p-q)\sum_{j=0}^{n-1}p^{n-1-j}q^{j}$ yields
$$II_d = \ell^{a}\left|\ln(1+s/\ell)\right|\left|\sum_{j=0}^{k-1}(\ln(\ell+s))^{k-1-j}(\ln\ell)^{j}\right| \le k\ell^{a_L}\frac{s}{\ell}\left|\ln T + 1\right|^{k-1} \le C\ell^{a_L-1}\frac{(\ln T)^{k-1}}{T}. \qquad (A.18)$$
By combination of the bounds on $II_c$ and

$II_d$, we conclude that $\left|(\ell+s)^{a}(\ln(\ell+s))^{k} - \ell^{a}(\ln\ell)^{k}\right|^{2} \le C\ell^{2a_L-2}\frac{(\ln T)^{2k}}{T^{2}}$ and arrive at the following upper bound on the RHS of (A.16):
$$\int_{1/T}^{1}|f(r) - f_T(r)|^{2}dr \le C\frac{(\ln T)^{2k}}{T^{3}}\sum_{t=1}^{T}\left(\frac{t}{T}\right)^{2a_L-2} = O\left(\frac{(\ln T)^{2k}}{T^{2(a_L+1/2)}}\mathbb{1}\{a_L<1/2\} + \frac{(\ln T)^{2k+1}}{T^{2}}\mathbb{1}\{a_L=1/2\} + \frac{(\ln T)^{2k}}{T^{2}}\mathbb{1}\{a_L>1/2\}\right). \qquad (A.19)$$
The RHS of (A.19) goes to zero as $T\to\infty$, thereby establishing that $\int_{1/T}^{1}\left[f_T(r) - f(r)\right]dB_u(r)$ is also asymptotically negligible. The proof of part (ii) is now complete. (iii) We have
$$\frac{1}{T}\sum_{t=1}^{T}\left(\frac{t}{T}\right)^{a}\left(\ln\frac{t}{T}\right)^{k}\left(\frac{x_{it}}{\sqrt{T}}\right)^{j} = \sum_{t=1}^{T}\int_{(t-1)/T}^{t/T}f_T(r)X_{i,T}^{j}(r)\,dr = \int_0^1 f(r)X_{i,T}^{j}(r)\,dr + \int_{1/T}^{1}\left[f_T(r) - f(r)\right]X_{i,T}^{j}(r)\,dr =: III_a + III_b. \qquad (A.20)$$
Given the CMT and $X_{i,T} \Rightarrow B_{v_i}$, term $III_a$ will converge weakly to $\int_0^1 f(r)B_{v_i}^{j}(r)\,dr$ if we can show that $x \mapsto \int_0^1 f(r)x^{j}(r)\,dr$ is a continuous functional. Let $x, y \in D[0,1]$. Then
$$\left|\int_0^1 f(r)x^{j}(r)\,dr - \int_0^1 f(r)y^{j}(r)\,dr\right| = \left|\int_0^1 f(r)\left(x^{j}(r) - y^{j}(r)\right)dr\right| \le \int_0^1|f(r)|\,dr\,\sup_{r\in[0,1]}\left|x^{j}(r) - y^{j}(r)\right| \le C\sup_{r\in[0,1]}|x(r) - y(r)|, \qquad (A.21)$$
because $\int_0^1|f(r)|\,dr = k!/(1+a)^{k+1}$ is bounded; the RHS of (A.21) tends to zero as $\sup_{r\in[0,1]}|x(r) - y(r)| \to 0$. Continuity of the functional now follows from (A.21). If we apply the Cauchy-Schwarz inequality to $III_b$, then we find

$$III_b \le \left[\int_{1/T}^{1}|f(r) - f_T(r)|^{2}dr\right]^{1/2}\left[\int_{1/T}^{1}X_{i,T}^{2j}(r)\,dr\right]^{1/2}.$$
Since $\int_{1/T}^{1}|f(r) - f_T(r)|^{2}dr = o(1)$ by (A.19) and $\int_{1/T}^{1}X_{i,T}^{2j}(r)\,dr = \int_0^1 X_{i,T}^{2j}(r)\,dr \Rightarrow \int_0^1 B_{v_i}^{2j}(r)\,dr$, we conclude that $III_b = o_p(1)$. Now combine the limiting results for $III_a$ and

$III_b$ to complete the argument. ∎
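Two quantities from the proof of Lemma 2(ii) can be illustrated directly. For this sketch only, we assume i.i.d. errors with variance $\sigma^2$, which is stronger than Assumption 2. Under that assumption, the variance of $T^{-1/2}\sum_{t}(t/T)^{a}(\ln(t/T))^{k}u_t$ is $\sigma^2 T^{-1}\sum_t f_T(t/T)^2$, which should approach $\sigma^2\int_0^1 r^{2a}(\ln r)^{2k}\,dr = \sigma^2(2k)!/(2a+1)^{2k+1}$ for $a > -1/2$. The helper `scaled_sum(b, T)` computes $T^{-1}\sum_{t=1}^{T}(t/T)^{b-1}$. Its three growth regimes drive the case analysis in (A.10) (with $b = a_L$) and in (A.19) (with $b = 2a_L - 1$). All function names are hypothetical.

```python
import math


def variance_T(a, k, T, sigma2=1.0):
    """Exact variance of T^{-1/2} sum_t (t/T)^a (ln(t/T))^k u_t when the u_t are
    i.i.d. with variance sigma2 (a simplifying assumption for this sketch)."""
    return sigma2 * sum(((t / T) ** a * math.log(t / T) ** k) ** 2
                        for t in range(1, T + 1)) / T


def variance_limit(a, k, sigma2=1.0):
    """sigma2 * int_0^1 r^{2a} (ln r)^{2k} dr = sigma2 (2k)! / (2a+1)^{2k+1}, a > -1/2."""
    return sigma2 * math.factorial(2 * k) / (2 * a + 1) ** (2 * k + 1)


def scaled_sum(b, T):
    """T^{-1} sum_{t=1}^T (t/T)^{b-1}: grows like T^{-b} if b < 0, like ln T if
    b = 0, and tends to 1/b if b > 0."""
    return sum((t / T) ** (b - 1) for t in range(1, T + 1)) / T


if __name__ == "__main__":
    print(variance_T(0.5, 1, 20000), variance_limit(0.5, 1))  # both close to 0.25
    for b in (-0.5, 0.0, 0.5):
        print(b, [round(scaled_sum(b, T), 4) for T in (10**3, 10**4, 10**5)])
```

For $b = 0$ the helper returns the harmonic number $H_T \approx \ln T + 0.5772$. For $b = -0.5$ it returns $T^{1/2}\sum_{t\le T}t^{-3/2}$, whose ratio to $T^{1/2}$ tends to $\zeta(3/2) \approx 2.612$.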

Lemma 3

Let $f(w_t,\gamma) = d_t(\theta)'\tau + s_t'\phi$, where $w_t = [t, x_t']'$ and $x_t = [x_{1t}, x_{2t}, \ldots, x_{mt}]'$ stacks all the stochastic trends. If $\dot{f}(w_t,\gamma)$ and $\ddot{f}(w_t,\gamma)$ denote the first and second derivatives of $f(w_t,\gamma)$ with respect to $\gamma$, then
(i) $\dot{f}(w_t,\gamma) = \left[(\tau\odot d_t(\theta))'\ln t,\ d_t(\theta)',\ s_t'\right]'$,
(ii) $$L_{\tau_0,T}\dot{f}(w_t,\gamma) = \begin{bmatrix}[(\tau-\tau_0)\odot d_t(\theta)]\ln t + [\tau_0\odot d_t(\theta)]\ln\frac{t}{T}\\ d_t(\theta)\\ s_t\end{bmatrix},\quad\text{and hence}\quad L_{\tau_0,T}\dot{f}(w_t,\gamma_0) = \begin{bmatrix}[\tau_0\odot d_t(\theta_0)]\ln\frac{t}{T}\\ d_t(\theta_0)\\ s_t\end{bmatrix},$$
(iii) We have
$$\dot{f}(w_t,\gamma) - \dot{f}(w_t,\gamma_0) = \begin{bmatrix}\left((\tau-\tau_0)\odot\left(d_t(\theta)-d_t(\theta_0)\right) + \tau_0\odot\left(d_t(\theta)-d_t(\theta_0)\right) + (\tau-\tau_0)\odot d_t(\theta_0)\right)\ln t\\ d_t(\theta)-d_t(\theta_0)\\ 0_{p\times 1}\end{bmatrix},$$
which implies
$$L_{\tau_0,T}\left(\dot{f}(w_t,\gamma) - \dot{f}(w_t,\gamma_0)\right) = \begin{bmatrix}\left((\tau-\tau_0)\odot\left(d_t(\theta)-d_t(\theta_0)\right) + (\tau-\tau_0)\odot d_t(\theta_0)\right)\ln t + \tau_0\odot\left(d_t(\theta)-d_t(\theta_0)\right)\ln\frac{t}{T}\\ d_t(\theta)-d_t(\theta_0)\\ 0_{p\times 1}\end{bmatrix}. \qquad (A.22)$$
(iv) $$\ddot{f}(w_t,\gamma) = \begin{bmatrix}\mathrm{diag}[\tau]\,\mathrm{diag}[d_t(\theta)](\ln t)^{2} & \mathrm{diag}[d_t(\theta)]\ln t & O_{d\times p}\\ \mathrm{diag}[d_t(\theta)]\ln t & O_{d\times d} & O_{d\times p}\\ O_{p\times d} & O_{p\times d} & O_{p\times p}\end{bmatrix},$$
(v) $$G_{\gamma,T}^{\prime -1}\ddot{f}(w_t,\gamma)G_{\gamma,T}^{-1} = \begin{bmatrix}\ddot{F}_{11,t} & \ddot{F}_{12,t} & O_{d\times p}\\ \ddot{F}_{21,t} & O_{d\times d} & O_{d\times p}\\ O_{p\times d} & O_{p\times d} & O_{p\times p}\end{bmatrix},$$
where the blocks in this symmetric matrix are given by
$$\ddot{F}_{11,t} = \frac{1}{T}D_{d,T}(\theta)^{-1}\left(\mathrm{diag}[\tau]\,\mathrm{diag}[d_t(\theta)](\ln t)^{2} - 2\,\mathrm{diag}[\tau]\,\mathrm{diag}[d_t(\theta)]\ln t\ln T\right)D_{d,T}(\theta)^{-1},\qquad \ddot{F}_{12,t} = \ddot{F}_{21,t} = \frac{1}{T}D_{d,T}(\theta)^{-1}\mathrm{diag}[d_t(\theta)]\ln t\,D_{d,T}(\theta)^{-1}.$$
(vi) Define a symmetric block matrix $M_T = \left[M_{T,ij}\right]_{1\le i,j\le 3} = G_{\gamma_0,T}^{\prime -1}\left[\sum_{t=1}^{T}\dot{f}(w_t,\gamma_0)\dot{f}(w_t,\gamma_0)'\right]G_{\gamma_0,T}^{-1}$ and a stacked vector $z_T = \left[z_{T,i}\right]_{1\le i\le 3} = G_{\gamma_0,T}^{\prime -1}\left[\sum_{t=1}^{T}\dot{f}(w_t,\gamma_0)u_t\right]$, where $M_{T,ij} = M_{T,ji}'$, and
$$M_{T,11} = \frac{1}{T}\sum_{t=1}^{T}\left(\tau_0\odot D_{d,T}(\theta_0)^{-1}d_t(\theta_0)\right)\left(\tau_0\odot D_{d,T}(\theta_0)^{-1}d_t(\theta_0)\right)'\left(\ln\frac{t}{T}\right)^{2},$$
$$M_{T,12} = \frac{1}{T}\sum_{t=1}^{T}\left(\tau_0\odot D_{d,T}(\theta_0)^{-1}d_t(\theta_0)\right)\left[D_{d,T}(\theta_0)^{-1}d_t(\theta_0)\right]'\ln\frac{t}{T},$$
$$M_{T,13} = \frac{1}{T}\sum_{t=1}^{T}\left(\tau_0\odot D_{d,T}(\theta_0)^{-1}d_t(\theta_0)\right)\left(D_{s,T}^{-1}s_t\right)'\ln\frac{t}{T},$$
$$M_{T,22} = \frac{1}{T}\sum_{t=1}^{T}\left[D_{d,T}(\theta_0)^{-1}d_t(\theta_0)\right]\left[D_{d,T}(\theta_0)^{-1}d_t(\theta_0)\right]',$$
$$M_{T,23} = \frac{1}{T}\sum_{t=1}^{T}\left[D_{d,T}(\theta_0)^{-1}d_t(\theta_0)\right]\left(D_{s,T}^{-1}s_t\right)',$$
$$M_{T,33} = \frac{1}{T}\sum_{t=1}^{T}\left(D_{s,T}^{-1}s_t\right)\left(D_{s,T}^{-1}s_t\right)'. \qquad (A.23)$$
Moreover,
$$z_{T,1} = \frac{1}{\sqrt{T}}\sum_{t=1}^{T}\left(\tau_0\odot D_{d,T}(\theta_0)^{-1}d_t(\theta_0)\right)\left(\ln\frac{t}{T}\right)u_t,\qquad z_{T,2} = \frac{1}{\sqrt{T}}\sum_{t=1}^{T}\left[D_{d,T}(\theta_0)^{-1}d_t(\theta_0)\right]u_t,\qquad z_{T,3} = \frac{1}{\sqrt{T}}\sum_{t=1}^{T}\left(D_{s,T}^{-1}s_t\right)u_t. \qquad (A.24)$$
Proof

All results follow from linearity of the Hadamard product. ∎
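Lemma 3(i) can be checked against central finite differences. The sketch below assumes a hypothetical layout in which $s_t$ stacks $x_{it}, x_{it}^2, \ldots, x_{it}^{p_i}$ for each $i$ in turn. It orders $\gamma = (\theta', \tau', \phi')'$ as in Lemma 3(i). All function names and the numerical values in the usage example are illustrative assumptions.

```python
import math


def d_vec(t, theta):
    """Deterministic trends d_t(theta) = (t^theta_1, ..., t^theta_d)."""
    return [t ** th for th in theta]


def s_vec(x, powers):
    """Hypothetical layout of s_t: x_1t, ..., x_1t^{p_1}, x_2t, ..., x_mt^{p_m}."""
    return [xi ** j for xi, p in zip(x, powers) for j in range(1, p + 1)]


def f(t, x, theta, tau, phi, powers):
    """f(w_t, gamma) = d_t(theta)' tau + s_t' phi."""
    return (sum(ta * dt for ta, dt in zip(tau, d_vec(t, theta)))
            + sum(ph * st for ph, st in zip(phi, s_vec(x, powers))))


def f_dot(t, x, theta, tau, phi, powers):
    """Analytic gradient [(tau o d_t(theta))' ln t, d_t(theta)', s_t']'."""
    d = d_vec(t, theta)
    return [ta * dt * math.log(t) for ta, dt in zip(tau, d)] + d + s_vec(x, powers)


def f_dot_numeric(t, x, theta, tau, phi, powers, h=1e-6):
    """Central finite differences of f with respect to gamma = (theta, tau, phi)."""
    gamma = list(theta) + list(tau) + list(phi)
    n_th, n_ta = len(theta), len(tau)

    def f_at(g):
        return f(t, x, g[:n_th], g[n_th:n_th + n_ta], g[n_th + n_ta:], powers)

    grad = []
    for i in range(len(gamma)):
        up, down = list(gamma), list(gamma)
        up[i] += h
        down[i] -= h
        grad.append((f_at(up) - f_at(down)) / (2.0 * h))
    return grad


if __name__ == "__main__":
    args = (37, [1.3, -0.4], [0.5, 1.2], [0.8, -0.3], [0.6, -0.2, 0.1, 0.05], [3, 1])
    for analytic, numeric in zip(f_dot(*args), f_dot_numeric(*args)):
        print(f"{analytic: .8f} {numeric: .8f}")
```

With $t = 37$, two power-law trends and $s_t$ built from $x_t = (1.3, -0.4)'$ with $p_1 = 3$ and $p_2 = 1$, the two gradients should agree closely. Only the $\theta$-block involves $\ln t$, which is the block that the rescaling $L_{\tau_0,T}$ in Lemma 3(ii) acts on.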

Lemma 4

For a constant $\delta > 0$, we define
$$\mathcal{N}_{\delta,T}(\gamma_0) = \left\{\gamma\in\Gamma : \left\|D_{d,T}(\theta_0)(\theta-\theta_0)\right\| < \delta T^{-1/2}\ln T,\ \left\|D_{s,T}(\phi-\phi_0)\right\| < \delta T^{-1/2}\ln T,\ \left\|D_{d,T}(\theta_0)\left((\tau-\tau_0) + \left[\tau_0\odot(\theta-\theta_0)\right]\ln T\right)\right\| < \delta T^{-1/2}\ln T\right\}, \qquad (A.25)$$
where $\gamma_0\in\Gamma$ is fixed. Under Assumption 2, for all $k \ge 0$,
(i) $\sup_{\gamma\in\mathcal{N}_{\delta,T}(\gamma_0)}\frac{(\ln T)^{k}}{T}\sum_{t=1}^{T}\left\|D_{d,T}(\theta_0)^{-1}\left(d_t(\theta)-d_t(\theta_0)\right)\right\|^{2} = o(1)$,
(ii) $\sup_{\gamma\in\mathcal{N}_{\delta,T}(\gamma_0)}\frac{(\ln T)^{k}}{T}\sum_{t=1}^{T}\left\|D_{d,T}(\theta_0)^{-1}\left[\tau_0\odot\left(d_t(\theta)-d_t(\theta_0)\right)\right]\ln\frac{t}{T}\right\|^{2} = o(1)$,
(iii) $\sup_{\gamma\in\mathcal{N}_{\delta,T}(\gamma_0)}\frac{(\ln T)^{k}}{T}\sum_{t=1}^{T}\left\|D_{d,T}(\theta_0)^{-1}\left[(\tau-\tau_0)\odot d_t(\theta_0)\right]\ln t\right\|^{2} = o(1)$,
(iv) $\sup_{\gamma\in\mathcal{N}_{\delta,T}(\gamma_0)}\frac{(\ln T)^{k}}{T}\sum_{t=1}^{T}\left\|D_{d,T}(\theta_0)^{-1}\left[(\tau-\tau_0)\odot\left(d_t(\theta)-d_t(\theta_0)\right)\right]\ln t\right\|^{2} = o(1)$,
(v) $\sup_{\gamma\in\mathcal{N}_{\delta,T}(\gamma_0)}\sum_{t=1}^{T}\left|f(w_t,\gamma)-f(w_t,\gamma_0)\right|^{2} = O_p\left((\ln T)^{4}\right)$,
(vi) $\sup_{\gamma\in\mathcal{N}_{\delta,T}(\gamma_0)}\frac{(\ln T)^{k}}{T^{2}}\sum_{t=1}^{T}\left\|D_{d,T}(\theta_0)^{-2}\mathrm{diag}[d_t(\theta)]\right\|^{2} = o(1)$,
(vii) $\sup_{\gamma\in\mathcal{N}_{\delta,T}(\gamma_0)}\frac{(\ln T)^{2}}{T}\left\|\sum_{t=1}^{T}D_{d,T}(\theta_0)^{-2}\mathrm{diag}[d_t(\theta)]u_t(\ln t)^{k}\right\| = o_p(1)$.

Proof When needed, let $i\in\{1, 2, \ldots, d\}$ be arbitrary. The $i$th components of $\theta$ and $\tau$ are written $\theta_i$ and $\tau_i$, respectively, and those of $\theta_0$ and $\tau_0$ are written $\theta_{0i}$ and $\tau_{0i}$. (i) The $i$th component of $D_{d,T}(\theta_0)^{-1}\left(d_t(\theta)-d_t(\theta_0)\right)$ equals $T^{-\theta_{0i}}\left(t^{\theta_i}-t^{\theta_{0i}}\right)$.
By the mean-value theorem, we have
$$\frac{(\ln T)^{k}}{T}\sum_{t=1}^{T}T^{-2\theta_{0i}}\left|t^{\theta_i}-t^{\theta_{0i}}\right|^{2} = \frac{(\ln T)^{k}}{T}\sum_{t=1}^{T}\left(\frac{t}{T}\right)^{2\theta_{0i}}\left|t^{\theta_i-\theta_{0i}}-1\right|^{2} \le \frac{(\ln T)^{k}}{T}\sum_{t=1}^{T}\left(\frac{t}{T}\right)^{2\theta_{0i}}(\ln t)^{2}|\theta_i-\theta_{0i}|^{2} \le \frac{(\ln T)^{k+2}}{T}\sum_{t=1}^{T}\left(\frac{t}{T}\right)^{2\theta_{0i}}|\theta_i-\theta_{0i}|^{2} \le (\ln T)^{k+2}\left[\sup_{\theta_L\le\theta\le\theta_U}\frac{1}{T}\sum_{t=1}^{T}\left(\frac{t}{T}\right)^{2\theta}\right]|\theta_i-\theta_{0i}|^{2}. \qquad (A.26)$$
The supremum in the RHS of (A.26) is bounded in view of Lemma 1(i). Moreover, if $\gamma\in\mathcal{N}_{\delta,T}(\gamma_0)$, then $T^{\theta_{0i}}|\theta_i-\theta_{0i}| < \delta T^{-1/2}\ln T$ or, equivalently, $|\theta_i-\theta_{0i}|^{2} < \delta^{2}T^{-(1+2\theta_{0i})}(\ln T)^{2}$. We conclude that
$$\frac{(\ln T)^{k}}{T}\sum_{t=1}^{T}T^{-2\theta_{0i}}\left|t^{\theta_i}-t^{\theta_{0i}}\right|^{2} \le C(\ln T)^{k+4}T^{-(1+2\theta_{0i})}, \qquad (A.27)$$
which converges to zero because $1+2\theta_{0i} \ge 1+2\theta_L > 0$. The result follows since $i$ was arbitrary. (ii) We again look at the $i$th component of $D_{d,T}(\theta_0)^{-1}\left[\tau_0\odot\left(d_t(\theta)-d_t(\theta_0)\right)\right]\ln\frac{t}{T}$ and find
$$\frac{(\ln T)^{k}}{T}\sum_{t=1}^{T}T^{-2\theta_{0i}}\tau_{0i}^{2}\left|t^{\theta_i}-t^{\theta_{0i}}\right|^{2}\left(\ln\frac{t}{T}\right)^{2} \le (\ln T)^{k+2}\tau_{0i}^{2}\left[\sup_{\theta_L\le\theta\le\theta_U}\frac{1}{T}\sum_{t=1}^{T}\left(\frac{t}{T}\right)^{2\theta}\left(\ln\frac{t}{T}\right)^{2}\right]|\theta_i-\theta_{0i}|^{2},$$
taking steps identical to those taken in (A.26). The supremum in the RHS is bounded because $\sup_{\theta_L\le\theta\le\theta_U}\left|\frac{1}{T}\sum_{t=1}^{T}\left(\frac{t}{T}\right)^{2\theta}\left(\ln\frac{t}{T}\right)^{2} - \int_0^1 r^{2\theta}(\ln r)^{2}\,dr\right| \to 0$ by Lemma 1(iv), and $\int_0^1 r^{2\theta}(\ln r)^{2}\,dr$ is finite for all $\theta\in[\theta_L,\theta_U]$. The proof is easily completed after recalling that $|\theta_i-\theta_{0i}|^{2} < \delta^{2}T^{-(1+2\theta_{0i})}(\ln T)^{2}$ whenever $\gamma\in\mathcal{N}_{\delta,T}(\gamma_0)$. (iii) The contribution of the $i$th component of $D_{d,T}(\theta_0)^{-1}\left[(\tau-\tau_0)\odot d_t(\theta_0)\right]\ln t$ to the sum $\frac{(\ln T)^{k}}{T}\sum_{t=1}^{T}\left\|D_{d,T}(\theta_0)^{-1}\left[(\tau-\tau_0)\odot d_t(\theta_0)\right]\ln t\right\|^{2}$ is
$$\frac{(\ln T)^{k}}{T}\sum_{t=1}^{T}T^{-2\theta_{0i}}(\tau_i-\tau_{0i})^{2}t^{2\theta_{0i}}(\ln t)^{2} \le (\ln T)^{k+2}\left[\sup_{\theta_L\le\theta\le\theta_U}\frac{1}{T}\sum_{t=1}^{T}\left(\frac{t}{T}\right)^{2\theta}\right](\tau_i-\tau_{0i})^{2}. \qquad (A.28)$$
The supremum is bounded, so it remains to say something about $(\tau_i-\tau_{0i})^{2}$.
By the triangle inequality and the properties of norms (namely $\|\mathrm{diag}[a]\| \le \|a\|$), we have
$$\left\|D_{d,T}(\theta_0)(\tau-\tau_0)\right\| \le \left\|D_{d,T}(\theta_0)\left[\tau-\tau_0+\left[\tau_0\odot(\theta-\theta_0)\right]\ln T\right]\right\| + \left\|D_{d,T}(\theta_0)\,\mathrm{diag}[\tau_0](\theta-\theta_0)\right\|\ln T \le \left\|D_{d,T}(\theta_0)\left[\tau-\tau_0+\left[\tau_0\odot(\theta-\theta_0)\right]\ln T\right]\right\| + \|\tau_0\|\left\|D_{d,T}(\theta_0)(\theta-\theta_0)\right\|\ln T \le \delta\left(1+\|\tau_0\|\right)T^{-1/2}(\ln T)^{2}, \qquad (A.29)$$
for all $\gamma\in\mathcal{N}_{\delta,T}(\gamma_0)$. (A.29) implies that $(\tau_i-\tau_{0i})^{2} \le \delta^{2}\left(1+\|\tau_0\|\right)^{2}(\ln T)^{4}T^{-(1+2\theta_{0i})}$, which goes to zero as $T\to\infty$. Now combine this finding with the RHS of (A.28) to establish the result. (iv) Use similar arguments as in the proofs of (i) and (iii). (v)

By definition of $f(w_t,\gamma)$ (see Lemma 3), it follows that
$$f(w_t,\gamma)-f(w_t,\gamma_0) = d_t(\theta)'\tau + s_t'\phi - \left[d_t(\theta_0)'\tau_0 + s_t'\phi_0\right] = \left[d_t(\theta)-d_t(\theta_0)\right]'(\tau-\tau_0) + d_t(\theta_0)'(\tau-\tau_0) + \left[d_t(\theta)-d_t(\theta_0)\right]'\tau_0 + s_t'(\phi-\phi_0),$$
and, by the $c_r$-inequality, that
$$\sum_{t=1}^{T}\left|f(w_t,\gamma)-f(w_t,\gamma_0)\right|^{2} \le C\left\{\sum_{t=1}^{T}\left|\left(d_t(\theta)-d_t(\theta_0)\right)'(\tau-\tau_0)\right|^{2} + \sum_{t=1}^{T}\left|d_t(\theta_0)'(\tau-\tau_0)\right|^{2} + \sum_{t=1}^{T}\left|\left(d_t(\theta)-d_t(\theta_0)\right)'\tau_0\right|^{2} + \sum_{t=1}^{T}\left|s_t'(\phi-\phi_0)\right|^{2}\right\} =: C\left\{IV_a + IV_b + IV_c + IV_d\right\}. \qquad (A.30)$$
It remains to bound the terms $IV_a$-$IV_d$ uniformly over $\mathcal{N}_{\delta,T}(\gamma_0)$. We repeatedly rely on $|a'b| \le \|a\|\|b\|$ (Cauchy-Schwarz). We have
$$IV_a = \sum_{t=1}^{T}\left|\left[D_{d,T}(\theta_0)^{-1}\left(d_t(\theta)-d_t(\theta_0)\right)\right]'\left[D_{d,T}(\theta_0)(\tau-\tau_0)\right]\right|^{2} \le T\left\|D_{d,T}(\theta_0)(\tau-\tau_0)\right\|^{2}\frac{1}{T}\sum_{t=1}^{T}\left\|D_{d,T}(\theta_0)^{-1}\left(d_t(\theta)-d_t(\theta_0)\right)\right\|^{2} \to 0$$
uniformly over $\mathcal{N}_{\delta,T}(\gamma_0)$ as $T\to\infty$ by (A.29) and Lemma 4(i). The bound on $IV_b$ is derived similarly, that is

$$IV_b \le T\left\|D_{d,T}(\theta_0)(\tau-\tau_0)\right\|^{2}\frac{1}{T}\sum_{t=1}^{T}\left\|D_{d,T}(\theta_0)^{-1}d_t(\theta_0)\right\|^{2} = T\left\|D_{d,T}(\theta_0)(\tau-\tau_0)\right\|^{2}\frac{1}{T}\sum_{t=1}^{T}\sum_{i=1}^{d}\left(\frac{t}{T}\right)^{2\theta_{0i}} \le dT\left\|D_{d,T}(\theta_0)(\tau-\tau_0)\right\|^{2}\frac{1}{T}\sum_{t=1}^{T}\left(\frac{t}{T}\right)^{2\theta_L}, \qquad (A.32)$$
which is $O\left((\ln T)^{4}\right)$ uniformly over $\mathcal{N}_{\delta,T}(\gamma_0)$ because $\left\|D_{d,T}(\theta_0)(\tau-\tau_0)\right\|^{2} = O\left(T^{-1}(\ln T)^{4}\right)$ (using (A.29)) and $\frac{1}{T}\sum_{t=1}^{T}(t/T)^{2\theta_L}$ is bounded (see Lemma 1(i)). For the third term, we establish
$$IV_c \le \|\tau_0\|^{2}\sum_{t=1}^{T}\sum_{i=1}^{d}\left|t^{\theta_i}-t^{\theta_{0i}}\right|^{2} = \|\tau_0\|^{2}\sum_{i=1}^{d}\sum_{t=1}^{T}t^{2\theta_{0i}}\left|t^{\theta_i-\theta_{0i}}-1\right|^{2} \le \|\tau_0\|^{2}\sum_{i=1}^{d}\sum_{t=1}^{T}t^{2\theta_{0i}}(\ln t)^{2}(\theta_i-\theta_{0i})^{2} \le \|\tau_0\|^{2}(\ln T)^{2}\,T\sum_{i=1}^{d}T^{2\theta_{0i}}(\theta_i-\theta_{0i})^{2}\left[\frac{1}{T}\sum_{t=1}^{T}\left(\frac{t}{T}\right)^{2\theta_L}\right] = \|\tau_0\|^{2}(\ln T)^{2}\,T\left\|D_{d,T}(\theta_0)(\theta-\theta_0)\right\|^{2}\left[\frac{1}{T}\sum_{t=1}^{T}\left(\frac{t}{T}\right)^{2\theta_L}\right] = O\left((\ln T)^{4}\right)$$
using the mean-value theorem, Lemma 1(i), and the fact that $\left\|D_{d,T}(\theta_0)(\theta-\theta_0)\right\|^{2} = O\left(T^{-1}(\ln T)^{2}\right)$ on $\mathcal{N}_{\delta,T}(\gamma_0)$. Finally, consider the bound on $IV_d$. We have

$$
IVd\le T\big\|D_{s,T}(\phi-\phi_0)\big\|^2\,\frac1T\sum_{t=1}^T\big\|D_{s,T}^{-1}s_t\big\|^2\le T\big\|D_{s,T}(\phi-\phi_0)\big\|^2\sum_{i=1}^m\sum_{j=1}^{p_i}\frac1T\sum_{t=1}^T\Big(\frac{x_{it}}{\sqrt T}\Big)^{2j},
$$
which is $O_p\big((\ln T)^2\big)$ since $\|D_{s,T}(\phi-\phi_0)\|^2\le\delta^2T^{-1}(\ln T)^2$ on $N_{\delta,T}(\gamma_0)$ and since Assumption 2 guarantees that all terms of the form $\frac1T\sum_{t=1}^T(x_{it}/\sqrt T)^{2j}$ converge to integrals of Brownian motions.

(vi) It follows by $\|\mathrm{diag}[a]\|\le\|a\|$, $\|AB\|\le\|A\|\|B\|$, the $c_r$-inequality, and the triangle inequality that
$$
\begin{aligned}
\frac{(\ln T)^k}{T^2}\sum_{t=1}^T\big\|D_{d,T}(\theta_0)^{-1}\mathrm{diag}[d_t(\theta)]D_{d,T}(\theta_0)^{-1}\big\|^2&\le\frac{(\ln T)^k}{T^2}\big\|D_{d,T}(\theta_0)^{-1}\big\|^2\sum_{t=1}^T\big\|D_{d,T}(\theta_0)^{-1}d_t(\theta)\big\|^2\\
&\le C\frac{(\ln T)^k}{T}\big\|D_{d,T}(\theta_0)^{-1}\big\|^2\,\frac1T\sum_{t=1}^T\big\|D_{d,T}(\theta_0)^{-1}\big(d_t(\theta)-d_t(\theta_0)\big)\big\|^2\\
&\quad+C\frac{(\ln T)^k}{T}\big\|D_{d,T}(\theta_0)^{-1}\big\|^2\,\frac1T\sum_{t=1}^T\big\|D_{d,T}(\theta_0)^{-1}d_t(\theta_0)\big\|^2.
\end{aligned}\tag{A.33}
$$
Both terms are negligible since $\frac{(\ln T)^k}{T}\|D_{d,T}(\theta_0)^{-1}\|^2\le(\ln T)^kT^{-(1+2\theta_L)}\to0$, $\frac1T\sum_{t=1}^T\|D_{d,T}(\theta_0)^{-1}(d_t(\theta)-d_t(\theta_0))\|^2$ is $o(1)$ on $N_{\delta,T}(\gamma_0)$ (Lemma 4(i)), and $\frac1T\sum_{t=1}^T\|D_{d,T}(\theta_0)^{-1}d_t(\theta_0)\|^2$ is bounded in view of Lemma 1(i).
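Lemma 1(i) and the negligibility argument after (A.33) rest on two elementary facts: for $\theta>-1/2$ the Riemann sum $\frac1T\sum_{t=1}^T(t/T)^{2\theta}$ stays bounded (it tends to $1/(1+2\theta)$), and $(\ln T)^kT^{-(1+2\theta_L)}\to0$ whenever $\theta_L>-1/2$. The Python sketch below checks both numerically. It is only an illustration under the stated restriction $\theta_L>-1/2$; the function names `avg_power` and `vanishing_factor` are hypothetical.

```python
import math

def avg_power(T, theta):
    """(1/T) * sum_{t=1}^T (t/T)^(2*theta): the Riemann sum behind Lemma 1(i)."""
    return sum((t / T) ** (2 * theta) for t in range(1, T + 1)) / T

def vanishing_factor(T, k, theta_L):
    """(ln T)^k * T^(-(1 + 2*theta_L)): the prefactor shown to vanish after (A.33)."""
    return math.log(T) ** k * T ** (-(1 + 2 * theta_L))

theta = -0.25  # any theta > -1/2 keeps the sum bounded; here the limit is 1/(1 + 2*theta) = 2
for T in (100, 10_000, 100_000):
    print(T, avg_power(T, theta))

# k = 2, theta_L = -0.25: the prefactor is (ln T)^2 / sqrt(T), decreasing for these T.
print([vanishing_factor(T, 2, -0.25) for T in (1e8, 1e10, 1e12)])
```

For large $k$ the decay of $(\ln T)^kT^{-(1+2\theta_L)}$ only becomes visible at very large $T$ (for $k=6$ and $\theta_L=-1/4$ the factor is still increasing at $T=10^5$), which is why the sketch uses $k=2$.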
(vii) Use similar steps as in (A.33) to obtain
$$
T^{-1}\ln T\,\Big\|\sum_{t=1}^TD_{d,T}(\theta_0)^{-1}\mathrm{diag}[d_t(\theta)]D_{d,T}(\theta_0)^{-1}u_t(\ln t)^k\Big\|\le\ln T\,T^{-(\theta_L+1/2)}\Big\{\Big\|\underbrace{T^{-1/2}\sum_{t=1}^TD_{d,T}(\theta_0)^{-1}d_t(\theta_0)u_t(\ln t)^k}_{V^{(a)}}\Big\|+\Big\|\underbrace{T^{-1/2}\sum_{t=1}^TD_{d,T}(\theta_0)^{-1}\big(d_t(\theta)-d_t(\theta_0)\big)u_t(\ln t)^k}_{V^{(b)}}\Big\|\Big\}.
$$
Note that the $i$th component of the vector $V^{(a)}$ equals $T^{-1/2}\sum_{t=1}^T(t/T)^{\theta_{0,i}}u_t(\ln t)^k$. Lemma 1(iii) and Chebyshev's inequality imply that $V^{(a)}$ is $O_p\big((\ln T)^k\big)$. Furthermore, the mean-value theorem implies
$$
\begin{aligned}
\big\|V^{(b)}\big\|&\le\Big\|\big[D_{d,T}(\theta_0)(\theta-\theta_0)\big]\odot\Big[T^{-1/2}\sum_{t=1}^TD_{d,T}(\theta_0)^{-2}d_t(\tilde\theta)u_t(\ln t)^{k+1}\Big]\Big\|\\
&\le\big\|D_{d,T}(\theta_0)(\theta-\theta_0)\big\|\,\Big\|T^{-1/2}\sum_{t=1}^TD_{d,T}(\theta_0)^{-2}d_t(\tilde\theta)u_t(\ln t)^{k+1}\Big\|\\
&\le\big\|D_{d,T}(\theta_0)(\theta-\theta_0)\big\|\,\big\|D_{d,T}(\theta_0)^{-2}D_{d,T}(\tilde\theta)\big\|\,\Big\|T^{-1/2}\sum_{t=1}^TD_{d,T}(\tilde\theta)^{-1}d_t(\tilde\theta)u_t(\ln t)^{k+1}\Big\|\\
&\le\big(\delta T^{-1/2}\ln T\big)\,T^{-\theta_L+\|\theta-\theta_0\|_\infty}\,O_p\big((\ln T)^{k+1}\big),
\end{aligned}\tag{A.34}
$$
using Lemma 1(iii) and $\gamma\in N_{\delta,T}(\gamma_0)$, where $\tilde\theta$ lies between $\theta$ and $\theta_0$. Having established the stochastic orders of $V^{(a)}$ and $V^{(b)}$, it is straightforward to verify the claim in (vii). ∎

B Proof of Main Theorems

C Proof of Theorem 1

C.1 Theorem 1

This section is devoted to the proof of Theorems 1-4. Before continuing, we recall that the criterion function is defined as $Q_T(\gamma)=\sum_{t=1}^T\big(y_t-f(w_t,\gamma)\big)^2$ with $f(w_t,\gamma)=d_t(\theta)'\tau+s_t'\phi$ and $w_t=[t,x_t']'$. From theorem 3.1 in Chan and Wang (2015), we have $G_{\gamma_0,T}(\hat\gamma_T-\gamma_0)=M_T^{-1}z_T+o_p(1)$ if the following five conditions are fulfilled:

(i) $\|G^{-1}_{\gamma_0,T}\|\to0$ as $T\to\infty$;

(ii) $\sup_{\gamma:\|G_{\gamma_0,T}(\gamma-\gamma_0)\|\le k_T}\Big\|G'^{-1}_{\gamma_0,T}\sum_{t=1}^T\big[\dot f(w_t,\gamma)\dot f(w_t,\gamma)'-\dot f(w_t,\gamma_0)\dot f(w_t,\gamma_0)'\big]G^{-1}_{\gamma_0,T}\Big\|=o_p(1)$;

(iii) $\sup_{\gamma:\|G_{\gamma_0,T}(\gamma-\gamma_0)\|\le k_T}\Big\|G'^{-1}_{\gamma_0,T}\sum_{t=1}^T\ddot f(w_t,\gamma)\big[f(w_t,\gamma)-f(w_t,\gamma_0)\big]G^{-1}_{\gamma_0,T}\Big\|=o_p(1)$;

(iv) $\sup_{\gamma:\|G_{\gamma_0,T}(\gamma-\gamma_0)\|\le k_T}\Big\|G'^{-1}_{\gamma_0,T}\sum_{t=1}^T\ddot f(w_t,\gamma)u_tG^{-1}_{\gamma_0,T}\Big\|=o_p(1)$;

(v) for any $\alpha_i=[\alpha_{i,1},\dots,\alpha_{i,d+p}]'\in\mathbb R^{d+p}$, $i=1,2$, $\big[\alpha_1'M_T\alpha_1,\ \alpha_2'z_T\big]\Rightarrow\big[\alpha_1'M\alpha_1,\ \alpha_2'z\big]$, where $M>0$ and $P(\|z\|<\infty)=1$, with
$$
M_T=G'^{-1}_{\gamma_0,T}\Big(\sum_{t=1}^T\dot f(w_t,\gamma_0)\dot f(w_t,\gamma_0)'\Big)G^{-1}_{\gamma_0,T},\qquad z_T=G'^{-1}_{\gamma_0,T}\Big(\sum_{t=1}^T\dot f(w_t,\gamma_0)u_t\Big).\tag{C.1}
$$

The original result in theorem 3.1 of Chan and Wang (2015) does not explicitly allow for deterministic trends and a scaling matrix $G_{\gamma,T}$ that is parameter dependent. However, all steps in the proof remain valid after allowing for these features.

We make two remarks before verifying these conditions.
First, we set $k_T=\delta\ln T$ and we will verify conditions (ii)-(iv) while replacing the supremum over the set $\tilde N_{\delta,T}(\gamma_0)=\{\gamma\in\Gamma:\|G_{\gamma_0,T}(\gamma-\gamma_0)\|\le\delta\ln T\}$ by a supremum over the set $N_{\delta,T}(\gamma_0)$ given in (A.25), that is,
$$
N_{\delta,T}(\gamma_0)=\Big\{\gamma\in\Gamma:\ \big\|D_{d,T}(\theta_0)(\theta-\theta_0)\big\|<\delta T^{-1/2}\ln T,\ \big\|D_{s,T}(\phi-\phi_0)\big\|<\delta T^{-1/2}\ln T,\ \Big\|D_{d,T}(\theta_0)\Big((\tau-\tau_0)+\big[\tau_0\odot(\theta-\theta_0)\big]\ln T\Big)\Big\|<\delta T^{-1/2}\ln T\Big\}.\tag{C.2}
$$
Since $\tilde N_{\delta,T}(\gamma_0)\subset N_{\delta,T}(\gamma_0)$, this replacement is innocuous. Second, note that $\|a\odot b\|\le\|a\odot b\|_1\le\|a\|\|b\|$ holds for conformable vectors $a$ and $b$.

Proof of Theorem 1 (i)

From $\|L_{\tau_0,T}\|\le\|L_{\tau_0,T}\|_F=\big(d+p+\|\tau_0\|^2(\ln T)^2\big)^{1/2}\le C\ln T$, we have
$$
\big\|G^{-1}_{\gamma_0,T}\big\|\le\big\|L_{\tau_0,T}\big\|\,\big\|D^{-1}_{\theta_0,T}\big\|\le C(\ln T)\,T^{-1/2}\big(T^{-\theta_L}+T^{-1/2}\big)\to0\quad\text{as }T\to\infty.
$$

(ii) Use $\sum_ta_ta_t'-\sum_tb_tb_t'=\sum_t(a_t-b_t)(a_t-b_t)'+\sum_t(a_t-b_t)b_t'+\sum_tb_t(a_t-b_t)'$ to obtain the bound
$$
\begin{aligned}
&\Big\|G'^{-1}_{\gamma_0,T}\sum_{t=1}^T\big[\dot f(w_t,\gamma)\dot f(w_t,\gamma)'-\dot f(w_t,\gamma_0)\dot f(w_t,\gamma_0)'\big]G^{-1}_{\gamma_0,T}\Big\|\\
&\quad\le\Big\|D^{-1}_{\theta_0,T}\Big[\sum_{t=1}^TL_{\tau_0,T}\big(\dot f(w_t,\gamma)-\dot f(w_t,\gamma_0)\big)\big(\dot f(w_t,\gamma)-\dot f(w_t,\gamma_0)\big)'L_{\tau_0,T}'\Big]D^{-1}_{\theta_0,T}\Big\|\\
&\qquad+2\Big\|D^{-1}_{\theta_0,T}\Big[\sum_{t=1}^TL_{\tau_0,T}\big(\dot f(w_t,\gamma)-\dot f(w_t,\gamma_0)\big)\dot f(w_t,\gamma_0)'L_{\tau_0,T}'\Big]D^{-1}_{\theta_0,T}\Big\|\\
&\quad\le\sum_{t=1}^T\big\|D^{-1}_{\theta_0,T}L_{\tau_0,T}\big(\dot f(w_t,\gamma)-\dot f(w_t,\gamma_0)\big)\big\|^2\\
&\qquad+2\sqrt{(\ln T)^2\sum_{t=1}^T\big\|D^{-1}_{\theta_0,T}L_{\tau_0,T}\big(\dot f(w_t,\gamma)-\dot f(w_t,\gamma_0)\big)\big\|^2}\,\sqrt{(\ln T)^{-2}\sum_{t=1}^T\big\|D^{-1}_{\theta_0,T}L_{\tau_0,T}\dot f(w_t,\gamma_0)\big\|^2}.
\end{aligned}\tag{C.3}
$$
For any $k\ge0$, the $c_r$-inequality and (A.22) yield a further upper bound as
$$
\begin{aligned}
(\ln T)^k\sum_{t=1}^T\big\|D^{-1}_{\theta_0,T}L_{\tau_0,T}\big(\dot f(w_t,\gamma)-\dot f(w_t,\gamma_0)\big)\big\|^2\le C(\ln T)^k\Big[&\frac1T\sum_{t=1}^T\Big\|D_{d,T}(\theta_0)^{-1}\Big((\tau-\tau_0)\odot\big(d_t(\theta)-d_t(\theta_0)\big)\Big)\ln t\Big\|^2\\
&+\frac1T\sum_{t=1}^T\big\|D_{d,T}(\theta_0)^{-1}\big(d_t(\theta)-d_t(\theta_0)\big)\big\|^2\\
&+\frac1T\sum_{t=1}^T\Big\|D_{d,T}(\theta_0)^{-1}\Big(\tau_0\odot\big(d_t(\theta)-d_t(\theta_0)\big)\Big)\ln\frac tT\Big\|^2\\
&+\frac1T\sum_{t=1}^T\Big\|D_{d,T}(\theta_0)^{-1}\Big((\tau-\tau_0)\odot d_t(\theta_0)\Big)\ln t\Big\|^2\Big].
\end{aligned}\tag{C.4}
$$
It follows from properties (i)-(iv) of Lemma 4 that each term on the RHS of (C.4) is $o(1)$ uniformly over $N_{\delta,T}(\gamma_0)$. Moreover, since $\big(\ln\frac tT\big)^2\le(\ln T)^2$ for every $t=1,2,\dots,T$, we find that
$$
\begin{aligned}
(\ln T)^{-2}\sum_{t=1}^T\big\|D^{-1}_{\theta_0,T}L_{\tau_0,T}\dot f(w_t,\gamma_0)\big\|^2=&\ (\ln T)^{-2}\frac1T\sum_{t=1}^T\Big\|D_{d,T}(\theta_0)^{-1}\big(\tau_0\odot d_t(\theta_0)\big)\ln\frac tT\Big\|^2\\
&+(\ln T)^{-2}\frac1T\sum_{t=1}^T\big\|D_{d,T}(\theta_0)^{-1}d_t(\theta_0)\big\|^2+(\ln T)^{-2}\frac1T\sum_{t=1}^T\big\|D_{s,T}^{-1}s_t\big\|^2
\end{aligned}\tag{C.5}
$$
is $O_p(1)$ in view of Lemmas 1(i) and 2(iii). The combination of (C.3), (C.4), and (C.5) leads to the desired result.
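The matrix identity that opens part (ii) holds for any two sequences of vectors, so it can be confirmed numerically. In the sketch below, `a` and `b` are random stand-ins for $\dot f(w_t,\gamma)$ and $\dot f(w_t,\gamma_0)$, and the helper names are hypothetical.

```python
import random

def outer_sum(us, vs):
    """Return sum_t u_t v_t' as an n x n nested list, for sequences of n-vectors."""
    n = len(us[0])
    return [[sum(u[i] * v[j] for u, v in zip(us, vs)) for j in range(n)] for i in range(n)]

def identity_gap(a, b):
    """Largest absolute entry of
    [sum a a' - sum b b'] - [sum (a-b)(a-b)' + sum (a-b) b' + sum b (a-b)']."""
    diff = [[ai - bi for ai, bi in zip(at, bt)] for at, bt in zip(a, b)]
    aa, bb = outer_sum(a, a), outer_sum(b, b)
    dd, db, bd = outer_sum(diff, diff), outer_sum(diff, b), outer_sum(b, diff)
    n = len(a[0])
    return max(abs(aa[i][j] - bb[i][j] - (dd[i][j] + db[i][j] + bd[i][j]))
               for i in range(n) for j in range(n))

random.seed(1)
a = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(50)]  # stand-in for f-dot(w_t, gamma)
b = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(50)]  # stand-in for f-dot(w_t, gamma_0)
print(identity_gap(a, b))  # zero up to floating-point rounding
```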
(iii) The Cauchy-Schwarz inequality implies
$$
\Big\|G'^{-1}_{\gamma_0,T}\sum_{t=1}^T\ddot f(w_t,\gamma)\big[f(w_t,\gamma)-f(w_t,\gamma_0)\big]G^{-1}_{\gamma_0,T}\Big\|\le\sqrt{(\ln T)^4\sum_{t=1}^T\big\|G'^{-1}_{\gamma_0,T}\ddot f(w_t,\gamma)G^{-1}_{\gamma_0,T}\big\|^2}\,\sqrt{(\ln T)^{-4}\sum_{t=1}^T\big|f(w_t,\gamma)-f(w_t,\gamma_0)\big|^2}.\tag{C.6}
$$
Using the identity in Lemma 3(v), we can bound the first term on the RHS of (C.6) as
$$
(\ln T)^4\sum_{t=1}^T\big\|G'^{-1}_{\gamma_0,T}\ddot f(w_t,\gamma)G^{-1}_{\gamma_0,T}\big\|^2\le C(\ln T)^4\sum_{t=1}^T\big\|\ddot F_{1,t}\big\|^2+C(\ln T)^4\sum_{t=1}^T\big\|\ddot F_{2,t}\big\|^2\le C\frac{(\ln T)^8}{T^2}\sum_{t=1}^T\big\|D_{d,T}(\theta_0)^{-1}\mathrm{diag}[d_t(\theta)]D_{d,T}(\theta_0)^{-1}\big\|^2,\tag{C.7}
$$
which is $o(1)$ uniformly over $N_{\delta,T}(\gamma_0)$ by Lemma 4(vi). Note that the second inequality in (C.7) makes use of the facts that (1) all matrices in $\ddot F_{1,t}$ and $\ddot F_{2,t}$ are diagonal and therefore commute, and (2) the triangle inequality gives $\|\mathrm{diag}[\tau]\|\le\|\tau\|\le\|D_{d,T}(\theta_0)^{-1}\|\,\|D_{d,T}(\theta_0)(\tau-\tau_0)\|+\|\tau_0\|\le C(\ln T)^2T^{-(\theta_L+1/2)}+\|\tau_0\|\le C$ when $\gamma\in N_{\delta,T}(\gamma_0)$ and $T$ is sufficiently large. The first term in (C.6) is $o(1)$ and the second is $O(1)$ (see Lemma 4(v)) over $N_{\delta,T}(\gamma_0)$. The claim follows.
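Parts (iii) and (iv) lean on elementary norm facts: $\|a\odot b\|\le\|a\odot b\|_1\le\|a\|\|b\|$ (the second remark after (C.2)) and $\|\mathrm{diag}[a]\|\le\|a\|$. The matrix norm is assumed here to be the spectral norm, consistent with the property $\|\cdot\|\le\|\cdot\|_F$ used in the proof of Theorem 3, so that $\|\mathrm{diag}[a]\|=\max_i|a_i|$. A randomized check in Python, with hypothetical helper names:

```python
import math
import random

def l2(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))

def spectral_norm_diag(v):
    """Spectral norm of diag[v]: its largest singular value, max_i |v_i|."""
    return max(abs(x) for x in v)

def norm_facts_hold(a, b, tol=1e-12):
    had = [x * y for x, y in zip(a, b)]  # elementwise (Hadamard) product of a and b
    l1 = sum(abs(h) for h in had)
    return (l2(had) <= l1 + tol                        # ||a.b|| <= ||a.b||_1
            and l1 <= l2(a) * l2(b) + tol              # ||a.b||_1 <= ||a|| ||b|| (Cauchy-Schwarz)
            and spectral_norm_diag(a) <= l2(a) + tol)  # ||diag[a]|| <= ||a||

random.seed(2)
results = [norm_facts_hold([random.gauss(0.0, 1.0) for _ in range(4)],
                           [random.gauss(0.0, 1.0) for _ in range(4)]) for _ in range(1000)]
print(all(results))  # True
```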
(iv) By Lemma 3(v), we have
$$
\Big\|G'^{-1}_{\gamma_0,T}\sum_{t=1}^T\ddot f(w_t,\gamma)u_tG^{-1}_{\gamma_0,T}\Big\|\le C\Big\|\sum_{t=1}^T\ddot F_{1,t}u_t\Big\|+C\Big\|\sum_{t=1}^T\ddot F_{2,t}u_t\Big\|,\tag{C.8}
$$
where
$$
\Big\|\sum_{t=1}^T\ddot F_{1,t}u_t\Big\|\le\|\mathrm{diag}[\tau]\|\,\Big\|T^{-1}\sum_{t=1}^TD_{d,T}(\theta_0)^{-1}\mathrm{diag}[d_t(\theta)]D_{d,T}(\theta_0)^{-1}u_t(\ln t)^2\Big\|+\|\mathrm{diag}[\tau]\|\,(2\ln T)\,\Big\|T^{-1}\sum_{t=1}^TD_{d,T}(\theta_0)^{-1}\mathrm{diag}[d_t(\theta)]D_{d,T}(\theta_0)^{-1}u_t\ln t\Big\|,\tag{C.9}
$$
and $\big\|\sum_{t=1}^T\ddot F_{2,t}u_t\big\|=\big\|T^{-1}\sum_{t=1}^TD_{d,T}(\theta_0)^{-1}\mathrm{diag}[d_t(\theta)]D_{d,T}(\theta_0)^{-1}u_t\ln t\big\|$. All relevant terms are $o_p(1)$ over $N_{\delta,T}(\gamma_0)$ by Lemma 4(vii).

Regarding (C.7), consider $(\ln T)^4\sum_{t=1}^T\|\ddot F_{1,t}\|^2$ as an example. Using the definition of $\ddot F_{1,t}$ in Lemma 3(v) and the triangle inequality, we have $(\ln T)^4\sum_{t=1}^T\|\ddot F_{1,t}\|^2\le C\frac{(\ln T)^8}{T^2}\sum_{t=1}^T\|D_{d,T}(\theta_0)^{-1}\mathrm{diag}[\tau]\mathrm{diag}[d_t(\theta)]D_{d,T}(\theta_0)^{-1}\|^2\le C\frac{(\ln T)^8}{T^2}\|\mathrm{diag}[\tau]\|^2\sum_{t=1}^T\|D_{d,T}(\theta_0)^{-1}\mathrm{diag}[d_t(\theta)]D_{d,T}(\theta_0)^{-1}\|^2$.
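The rate $V^{(a)}=O_p((\ln T)^k)$ behind Lemma 4(vii), on which part (iv) relies, comes from a variance bound combined with Chebyshev's inequality. Assuming, for simplicity only, that $u_t$ is i.i.d. with unit variance (the paper allows serial dependence), the $i$th component of $V^{(a)}$ has variance $\frac1T\sum_{t=1}^T(t/T)^{2\theta_{0,i}}(\ln t)^{2k}\le(\ln T)^{2k}\frac1T\sum_{t=1}^T(t/T)^{2\theta_{0,i}}$. The sketch evaluates this variance divided by $(\ln T)^{2k}$ and shows that it stays bounded; the function names are hypothetical.

```python
import math

def var_component(T, theta, k):
    """Variance of T^{-1/2} sum_t (t/T)^theta u_t (ln t)^k when u_t is i.i.d. with unit
    variance (a simplifying assumption): (1/T) sum_t (t/T)^(2 theta) (ln t)^(2k)."""
    return sum((t / T) ** (2 * theta) * math.log(t) ** (2 * k) for t in range(1, T + 1)) / T

def scaled_var(T, theta, k):
    """Variance divided by (ln T)^(2k); boundedness plus Chebyshev gives O_p((ln T)^k)."""
    return var_component(T, theta, k) / math.log(T) ** (2 * k)

for T in (100, 1_000, 10_000, 100_000):
    print(T, scaled_var(T, theta=0.3, k=2))
```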
(v) The convergence results in Lemma 2 applied to the definitions in Lemma 3(vi) provide
$$
(M_T,z_T)\Rightarrow\Big(\int_0^1j(r;\gamma_0)j(r;\gamma_0)'dr,\ \int_0^1j(r;\gamma_0)dB_u(r)+B_{vu}\Big)\quad\text{as }T\to\infty.
$$
∎

C.2 Theorem 2

Proof of Theorem 2

Changing the summation indices, we can express the one-sided long-run covariance estimator as
$$
\hat\Delta_T(\hat\gamma_T,b_T)=\sum_{i=0}^{T-1}k\Big(\frac i{b_T}\Big)\Big(\frac1T\sum_{t=1}^{T-i}V_{t+i}(\hat\gamma_T)V_t(\hat\gamma_T)'\Big),\tag{C.10}
$$
where we explicitly indicate the dependence on the parameter estimator $\hat\gamma_T$ and bandwidth $b_T$. If we define $\hat\Sigma_T(\hat\gamma_T)=\frac1T\sum_{t=1}^TV_t(\hat\gamma_T)V_t(\hat\gamma_T)'$ and $\hat\Gamma_T(\hat\gamma_T,b_T)=\sum_{i=1}^{T-1}k\big(\frac i{b_T}\big)\big(\frac1T\sum_{t=1}^{T-i}V_{t+i}(\hat\gamma_T)V_t(\hat\gamma_T)'\big)$, then $\hat\Delta_T(\hat\gamma_T,b_T)=\hat\Sigma_T(\hat\gamma_T)+\hat\Gamma_T(\hat\gamma_T,b_T)$ (using $k(0)=1$). We make two observations. First, the two-sided long-run covariance matrix estimator can be written as
$$
\hat\Omega_T(\hat\gamma_T,b_T)=\hat\Sigma_T(\hat\gamma_T)+\hat\Gamma_T(\hat\gamma_T,b_T)+\hat\Gamma_T(\hat\gamma_T,b_T)'.\tag{C.11}
$$
It thus suffices to study the asymptotic behavior of $\hat\Sigma_T(\hat\gamma_T)$ and $\hat\Gamma_T(\hat\gamma_T,b_T)$ only. Second, the bottom-right subblock of $V_t(\gamma)V_t(\gamma)'$ equals $v_tv_t'$ (no parameter estimation uncertainty here). The consistency results for this subblock are immediate from theorem 2 of Jansson (2002). We will therefore restrict our attention to the $(1,1)$th elements of $\hat\Sigma_T(\hat\gamma_T)$ and $\hat\Gamma_T(\hat\gamma_T,b_T)$. That is, we will show
$$
\big[\hat\Sigma_T(\hat\gamma_T)\big]_{11}=\frac1T\sum_{t=1}^T\hat u_t^2\xrightarrow{p}E(u_t^2),\tag{C.12}
$$
and
$$
\big[\hat\Gamma_T(\hat\gamma_T,b_T)\big]_{11}=\sum_{i=1}^{T-1}k\Big(\frac i{b_T}\Big)\Big(\frac1T\sum_{t=1}^{T-i}\hat u_{t+i}\hat u_t\Big)\xrightarrow{p}\lim_{T\to\infty}\frac1T\sum_{i=1}^{T-1}\sum_{t=1}^{T-i}E(u_{t+i}u_t).\tag{C.13}
$$
The consistency proofs for the other elements in the first row/column of these matrices follow easily using similar arguments. The following result will be used throughout:
$$
\begin{aligned}
\hat u_t-u_t&=z_t(\theta_0)'\begin{bmatrix}\tau_0\\\phi_0\end{bmatrix}-z_t(\hat\theta_T)'\begin{bmatrix}\hat\tau_T\\\hat\phi_T\end{bmatrix}\\
&=\big[z_t(\theta_0)-z_t(\hat\theta_T)\big]'\begin{bmatrix}\tau_0\\\phi_0\end{bmatrix}+\big[z_t(\theta_0)-z_t(\hat\theta_T)\big]'\begin{bmatrix}\hat\tau_T-\tau_0\\\hat\phi_T-\phi_0\end{bmatrix}-z_t(\theta_0)'\begin{bmatrix}\hat\tau_T-\tau_0\\\hat\phi_T-\phi_0\end{bmatrix}\\
&=\big[d_t(\theta_0)-d_t(\hat\theta_T)\big]'\tau_0+\big[d_t(\theta_0)-d_t(\hat\theta_T)\big]'(\hat\tau_T-\tau_0)-z_t(\theta_0)'\begin{bmatrix}\hat\tau_T-\tau_0\\\hat\phi_T-\phi_0\end{bmatrix}.
\end{aligned}\tag{C.14}
$$

(i) We first show (C.12). Standard arguments provide $\frac1T\sum_{t=1}^Tu_t^2\xrightarrow{p}E(u_t^2)$, so it suffices to show that $\frac1T\sum_{t=1}^T(\hat u_t^2-u_t^2)=o_p(1)$. First, by Cauchy-Schwarz we find
$$
\Big|\frac1T\sum_{t=1}^T\big(\hat u_t^2-u_t^2\big)\Big|=\Big|\frac1T\sum_{t=1}^T(\hat u_t-u_t)^2+\frac2T\sum_{t=1}^Tu_t(\hat u_t-u_t)\Big|\le\frac1T\sum_{t=1}^T(\hat u_t-u_t)^2+2\sqrt{\frac1T\sum_{t=1}^Tu_t^2}\sqrt{\frac1T\sum_{t=1}^T(\hat u_t-u_t)^2}.\tag{C.15}
$$
It remains to establish $\frac1T\sum_{t=1}^T(\hat u_t-u_t)^2=o_p(1)$.
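To make (C.10) and (C.11) concrete, the following Python sketch computes scalar analogues of $\hat\Sigma_T$, $\hat\Gamma_T$, $\hat\Delta_T=\hat\Sigma_T+\hat\Gamma_T$ and $\hat\Omega_T=\hat\Sigma_T+\hat\Gamma_T+\hat\Gamma_T'$ from a residual series. The Bartlett kernel is an assumption made for illustration; the text only requires a kernel $k(\cdot)$ with $k(0)=1$ and a bandwidth $b_T$. The function names are hypothetical.

```python
def bartlett(x):
    """Bartlett kernel k(x) = max(0, 1 - |x|), assumed here for illustration."""
    return max(0.0, 1.0 - abs(x))

def lrv_pieces(v, b, kernel=bartlett):
    """Scalar analogues of Sigma_hat, Gamma_hat, Delta_hat in (C.10) and Omega_hat in (C.11)."""
    T = len(v)
    sigma = sum(x * x for x in v) / T                                # lag-0 term
    gamma = sum(kernel(i / b) * sum(v[t + i] * v[t] for t in range(T - i)) / T
                for i in range(1, T))                                # weighted lags 1..T-1
    delta = sigma + gamma          # one-sided estimator; the lag-0 weight is k(0) = 1
    omega = sigma + 2.0 * gamma    # two-sided: Sigma + Gamma + Gamma' in the scalar case
    return sigma, gamma, delta, omega

sigma, gamma, delta, omega = lrv_pieces([1.0, -1.0, 1.0, -1.0], b=2.0)
print(sigma, gamma, delta, omega)  # 1.0 -0.375 0.625 0.25
```

In the scalar case the relation $\hat\Omega+(\hat\Sigma-\hat\Delta')=\hat\Delta$, the sample counterpart of the identity used in (C.28), can be read off directly from the returned values.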
Using (C.14) and $\frac1T\sum_{t=1}^T(a_t+b_t)^2\le\frac1T\sum_{t=1}^Ta_t^2+\frac1T\sum_{t=1}^Tb_t^2+2\sqrt{\frac1T\sum_{t=1}^Ta_t^2}\sqrt{\frac1T\sum_{t=1}^Tb_t^2}$, we see that the result follows if the following three statements are true:
$$
\tau_0'\Big[\frac1T\sum_{t=1}^T\big[d_t(\hat\theta_T)-d_t(\theta_0)\big]\big[d_t(\hat\theta_T)-d_t(\theta_0)\big]'\Big]\tau_0=o_p(1),\tag{C.16a}
$$
$$
(\hat\tau_T-\tau_0)'\Big[\frac1T\sum_{t=1}^T\big[d_t(\hat\theta_T)-d_t(\theta_0)\big]\big[d_t(\hat\theta_T)-d_t(\theta_0)\big]'\Big](\hat\tau_T-\tau_0)=o_p(1),\tag{C.16b}
$$
$$
\begin{bmatrix}\hat\tau_T-\tau_0\\\hat\phi_T-\phi_0\end{bmatrix}'\frac1T\sum_{t=1}^Tz_t(\theta_0)z_t(\theta_0)'\begin{bmatrix}\hat\tau_T-\tau_0\\\hat\phi_T-\phi_0\end{bmatrix}=o_p(1).\tag{C.16c}
$$
We first look at the absolute value of the $(i,j)$th component of $\frac1T\sum_{t=1}^T\big[d_t(\hat\theta_T)-d_t(\theta_0)\big]\big[d_t(\hat\theta_T)-d_t(\theta_0)\big]'$, i.e.
$$
\begin{aligned}
\Big|\frac1T\sum_{t=1}^T\big(t^{\hat\theta_i}-t^{\theta_{0,i}}\big)\big(t^{\hat\theta_j}-t^{\theta_{0,j}}\big)\Big|&\le\frac1T\sum_{t=1}^Tt^{\theta_{0,i}+\theta_{0,j}}\big|t^{\hat\theta_i-\theta_{0,i}}-1\big|\,\big|t^{\hat\theta_j-\theta_{0,j}}-1\big|\\
&\le\frac CT\sum_{t=1}^Tt^{\theta_{0,i}+\theta_{0,j}}(\ln t)^2\big|\hat\theta_i-\theta_{0,i}\big|\,\big|\hat\theta_j-\theta_{0,j}\big|\\
&\le C\frac{(\ln T)^2}T\big|T^{\theta_{0,i}+1/2}(\hat\theta_i-\theta_{0,i})\big|\,\big|T^{\theta_{0,j}+1/2}(\hat\theta_j-\theta_{0,j})\big|\,\frac1T\sum_{t=1}^T\Big(\frac tT\Big)^{\theta_{0,i}+\theta_{0,j}}.
\end{aligned}\tag{C.17}
$$
The RHS of (C.17) is $O_p\big(T^{-1}(\ln T)^2\big)$ by the convergence result from Theorem 1 and Lemma 1(i). The statements in (C.16a) and (C.16b) follow easily. We subsequently introduce the scaling matrix $D_{\theta_0,T}=\begin{bmatrix}D_{d,T}(\theta_0)&O\\O&D_{s,T}\end{bmatrix}$. The LHS of (C.16c) can now be expressed as
$$
\Big(D_{\theta_0,T}\begin{bmatrix}\hat\tau_T-\tau_0\\\hat\phi_T-\phi_0\end{bmatrix}\Big)'\Big[\frac1T\sum_{t=1}^T\big[D^{-1}_{\theta_0,T}z_t(\theta_0)\big]\big[D^{-1}_{\theta_0,T}z_t(\theta_0)\big]'\Big]\Big(D_{\theta_0,T}\begin{bmatrix}\hat\tau_T-\tau_0\\\hat\phi_T-\phi_0\end{bmatrix}\Big).\tag{C.18}
$$
The results in Lemma 2(iii) imply that $\frac1T\sum_{t=1}^T\big[D^{-1}_{\theta_0,T}z_t(\theta_0)\big]\big[D^{-1}_{\theta_0,T}z_t(\theta_0)\big]'\Rightarrow\int_0^1\tilde j(r;\theta_0)\tilde j(r;\theta_0)'dr$, where $\tilde j(r;\theta)=\big[d(r;\theta)',B_{(1)}(r)',\dots,B_{(m)}(r)'\big]'$. A comparison of the elements of $D_{\theta_0,T}\begin{bmatrix}\hat\tau_T-\tau_0\\\hat\phi_T-\phi_0\end{bmatrix}$ with the convergence rates of these estimators leads us to conclude that (C.16c) is also true.

C.3 Proof of (C.13)

(ii) To prove (C.13), we again show that the parameter estimation error is asymptotically negligible. If this holds, then the remainder of the proof follows from Jansson (2002). To this end, we write
$$
\begin{aligned}
\sum_{i=1}^{T-1}k\Big(\frac i{b_T}\Big)\Big(\frac1T\sum_{t=1}^{T-i}\big(\hat u_{t+i}\hat u_t-u_{t+i}u_t\big)\Big)=&\sum_{i=1}^{T-1}k\Big(\frac i{b_T}\Big)\Big(\frac1T\sum_{t=1}^{T-i}u_{t+i}(\hat u_t-u_t)\Big)+\sum_{i=1}^{T-1}k\Big(\frac i{b_T}\Big)\Big(\frac1T\sum_{t=1}^{T-i}(\hat u_{t+i}-u_{t+i})u_t\Big)\\
&+\sum_{i=1}^{T-1}k\Big(\frac i{b_T}\Big)\Big(\frac1T\sum_{t=1}^{T-i}(\hat u_{t+i}-u_{t+i})(\hat u_t-u_t)\Big)=:I+II+III.
\end{aligned}\tag{C.19}
$$
We provide details for $I=o_p(1)$ and omit the explicit proofs for $II$ and $III$. Similar (and tedious) calculations are applicable there.
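The step from $|t^{\hat\theta_i-\theta_{0,i}}-1|$ to $(\ln t)|\hat\theta_i-\theta_{0,i}|$ in (C.17), like the corresponding step for IVc, is the mean-value bound $|t^x-1|\le|x|\ln t\,t^{|x|}$ for $t\ge1$. When $|x|\ln T\le1$, which is the relevant regime because the estimation error of $\hat\theta_T$ shrinks faster than $1/\ln T$, the factor $t^{|x|}$ is at most $e$, so after squaring the constant $C$ can be taken as $e^2$. A Python check of the bound on a grid (names hypothetical):

```python
import math

def mvt_sides(t, x):
    """Return (|t^x - 1|, |x| * ln(t) * t^|x|); the mean-value theorem gives lhs <= rhs."""
    return abs(t ** x - 1.0), abs(x) * math.log(t) * t ** abs(x)

T = 10_000
x = 1.0 / math.log(T)  # mimics |theta_hat - theta_0| * ln T <= 1, so t^|x| <= e for t <= T
ok = True
for t in range(1, T + 1):
    for s in (-1.0, 1.0):  # check both signs of the estimation error
        lhs, rhs = mvt_sides(t, s * x)
        ok = ok and lhs <= rhs * (1 + 1e-12) + 1e-15
print(ok)  # True
```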
Using (C.14), we can decompose $I$ into
$$
\begin{aligned}
I=&\sum_{i=1}^{T-1}k\Big(\frac i{b_T}\Big)\Big(\frac1T\sum_{t=1}^{T-i}u_{t+i}\big[d_t(\theta_0)-d_t(\hat\theta_T)\big]'\tau_0\Big)+\sum_{i=1}^{T-1}k\Big(\frac i{b_T}\Big)\Big(\frac1T\sum_{t=1}^{T-i}u_{t+i}\big[d_t(\theta_0)-d_t(\hat\theta_T)\big]'(\hat\tau_T-\tau_0)\Big)\\
&-\sum_{i=1}^{T-1}k\Big(\frac i{b_T}\Big)\Big(\frac1T\sum_{t=1}^{T-i}u_{t+i}z_t(\theta_0)'\begin{bmatrix}\hat\tau_T-\tau_0\\\hat\phi_T-\phi_0\end{bmatrix}\Big)=:I_a+I_b-I_c.
\end{aligned}\tag{C.20}
$$
We adjust Hansen's (1992) argument slightly and look at the quantities $\frac{T^{1/2}}{b_T\ln T}|I_i|$ for $i\in\{a,b,c\}$. If these quantities are stochastically bounded, then the result follows because $\frac{b_T\ln T}{T^{1/2}}\to0$. We have
$$
\begin{aligned}
\frac{T^{1/2}}{b_T\ln T}|I_a|&\le\frac{T^{1/2}}{b_T\ln T}\sum_{i=1}^{T-1}\Big|k\Big(\frac i{b_T}\Big)\Big|\,\Big|\frac1T\sum_{t=1}^{T-i}u_{t+i}\big[d_t(\theta_0)-d_t(\hat\theta_T)\big]'\tau_0\Big|\\
&\le\frac{T^{1/2}}{b_T\ln T}\sum_{i=1}^{T-1}\Big|k\Big(\frac i{b_T}\Big)\Big|\sqrt{\frac1T\sum_{t=1}^{T-i}u_{t+i}^2}\sqrt{\frac1T\sum_{t=1}^{T-i}\Big(\big[d_t(\theta_0)-d_t(\hat\theta_T)\big]'\tau_0\Big)^2}\\
&\le\frac1{b_T}\sum_{i=1}^{T-1}\Big|k\Big(\frac i{b_T}\Big)\Big|\sqrt{\frac1T\sum_{t=1}^Tu_t^2}\sqrt{\tau_0'\Big[\frac1{(\ln T)^2}\sum_{t=1}^T\big[d_t(\hat\theta_T)-d_t(\theta_0)\big]\big[d_t(\hat\theta_T)-d_t(\theta_0)\big]'\Big]\tau_0}.
\end{aligned}\tag{C.21}
$$
The RHS of (C.21) is bounded in probability due to lemma 1 of Jansson (2002), the fact that $\frac1T\sum_{t=1}^Tu_t^2\xrightarrow{p}E(u_t^2)$, and (C.17).
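The bounds for $I_a$, $I_b$ and $I_c$ all carry the kernel sum $b_T^{-1}\sum_i|k(i/b_T)|$ in front and finish with the rate condition $b_T\ln T/T^{1/2}\to0$. The Python sketch evaluates both for the Bartlett kernel, for which the kernel sum equals $(b+1)/(2b)$ when $b$ is an integer smaller than $T$, and for the bandwidth rule $b_T=T^{1/3}$. Neither the kernel nor the bandwidth rule is taken from the paper; both, and the function names, are hypothetical choices for illustration.

```python
import math

def bartlett(x):
    """Bartlett kernel, used here only as one example of an admissible kernel."""
    return max(0.0, 1.0 - abs(x))

def kernel_mass(T, b, kernel=bartlett):
    """(1/b_T) * sum_{i=0}^{T-1} |k(i/b_T)|: the kernel sum multiplying (C.21)-(C.23)."""
    return sum(abs(kernel(i / b)) for i in range(T)) / b

def rate(T, power=1 / 3):
    """b_T * ln T / T^(1/2) for the hypothetical bandwidth choice b_T = T^power."""
    return T ** power * math.log(T) / math.sqrt(T)

print(kernel_mass(1_000, 10), kernel_mass(100_000, 300))  # both close to (b + 1) / (2b)
print([rate(T) for T in (1e3, 1e6, 1e9)])                 # slowly decreasing towards 0
```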
Similarly, use
$$
\frac{T^{1/2}}{b_T\ln T}|I_b|\le\frac1{b_T}\sum_{i=1}^{T-1}\Big|k\Big(\frac i{b_T}\Big)\Big|\sqrt{\frac1T\sum_{t=1}^Tu_t^2}\times\sqrt{(\hat\tau_T-\tau_0)'\Big[\frac1{(\ln T)^2}\sum_{t=1}^T\big[d_t(\hat\theta_T)-d_t(\theta_0)\big]\big[d_t(\hat\theta_T)-d_t(\theta_0)\big]'\Big](\hat\tau_T-\tau_0)}\tag{C.22}
$$
to show that $\frac{T^{1/2}}{b_T\ln T}|I_b|=O_p(1)$. (Hansen (1992) multiplies his terms by $T^{1/2}/b_T$.) Finally, we have
$$
\begin{aligned}
\frac{T^{1/2}}{b_T\ln T}|I_c|&\le\frac{T^{1/2}}{b_T\ln T}\sum_{i=1}^{T-1}\Big|k\Big(\frac i{b_T}\Big)\Big|\,\Big|\frac1T\sum_{t=1}^{T-i}u_{t+i}z_t(\theta_0)'\begin{bmatrix}\hat\tau_T-\tau_0\\\hat\phi_T-\phi_0\end{bmatrix}\Big|\\
&\le\frac1{b_T}\sum_{i=1}^{T-1}\Big|k\Big(\frac i{b_T}\Big)\Big|\sqrt{\frac1T\sum_{t=1}^Tu_t^2}\\
&\qquad\times\sqrt{\Big(\frac{\sqrt T}{\ln T}D_{\theta_0,T}\begin{bmatrix}\hat\tau_T-\tau_0\\\hat\phi_T-\phi_0\end{bmatrix}\Big)'\Big[\frac1T\sum_{t=1}^T\big[D^{-1}_{\theta_0,T}z_t(\theta_0)\big]\big[D^{-1}_{\theta_0,T}z_t(\theta_0)\big]'\Big]\Big(\frac{\sqrt T}{\ln T}D_{\theta_0,T}\begin{bmatrix}\hat\tau_T-\tau_0\\\hat\phi_T-\phi_0\end{bmatrix}\Big)}.
\end{aligned}\tag{C.23}
$$
Now note that $\frac{\sqrt T}{\ln T}D_{d,T}(\theta_0)(\hat\tau_T-\tau_0)$ and $\sqrt T\,D_{s,T}(\hat\phi_T-\phi_0)$ are $O_p(1)$. This completes the proof. ∎

C.4 Proof of Theorem 3

Proof of Theorem 3

For brevity, we define
$$
\hat J_N\big(\hat\gamma_T,\hat\Omega,\hat\Delta^-_{vu}\big)=\Big[G'^{-1}_{\hat\gamma,N}\Big(\sum_{n=1}^N\dot f(\hat w_n,\hat\gamma_T)\dot f(\hat w_n,\hat\gamma_T)'\Big)G^{-1}_{\hat\gamma,N}\Big]^{-1}\Big[G'^{-1}_{\hat\gamma,N}\Big(\sum_{n=1}^N\dot f(\hat w_n,\hat\gamma_T)\hat\mu_n\Big)+\hat B^-_{vu}\Big].
$$
As a first step, we show $\hat J_N(\hat\gamma_T,\hat\Omega,\hat\Delta^-_{vu})=\hat J_N(\gamma_0,\hat\Omega,\hat\Delta^-_{vu})+o^*_p(1)$. Direct calculation yields
$$
\begin{aligned}
R_N:=G_{\gamma_0,N}G^{-1}_{\hat\gamma,N}&=\begin{bmatrix}D_{d,N}(\theta_0)&D_{d,N}(\theta_0)\mathrm{diag}[\tau_0]\ln N&O\\O&D_{d,N}(\theta_0)&O\\O_{p\times d}&O_{p\times d}&D_{s,N}\end{bmatrix}\begin{bmatrix}D_{d,N}(\hat\theta_T)^{-1}&-D_{d,N}(\hat\theta_T)^{-1}\mathrm{diag}[\hat\tau_T]\ln N&O\\O&D_{d,N}(\hat\theta_T)^{-1}&O\\O_{p\times d}&O_{p\times d}&D_{s,N}^{-1}\end{bmatrix}\\
&=\begin{bmatrix}D_{d,N}(\theta_0)D_{d,N}(\hat\theta_T)^{-1}&D_{d,N}(\theta_0)D_{d,N}(\hat\theta_T)^{-1}\mathrm{diag}[\tau_0-\hat\tau_T]\ln N&O\\O&D_{d,N}(\theta_0)D_{d,N}(\hat\theta_T)^{-1}&O\\O_{p\times d}&O_{p\times d}&I_p\end{bmatrix}.
\end{aligned}
$$
We have $R_N\xrightarrow{p}I_{d+p}$. To see this, note that (1) a typical diagonal element of $D_{d,N}(\theta_0)D_{d,N}(\hat\theta_T)^{-1}$ is $N^{\theta_{0,i}-\hat\theta_i}$, for which
$$
N^{|\hat\theta_i-\theta_{0,i}|}=\exp\big(|\hat\theta_i-\theta_{0,i}|\ln N\big)\le\exp\Big((\ln N)\,N^{-(\theta_L+1/2)}\Big(\frac NT\Big)^{\theta_L+1/2}\big|T^{\theta_{0,i}+1/2}(\hat\theta_i-\theta_{0,i})\big|\Big)\xrightarrow{p}1,
$$
and (2)
$$
\big\|\mathrm{diag}[\tau_0-\hat\tau_T]\ln N\big\|\le\big\|D_{d,T}(\theta_0)^{-1}\big\|\frac{\ln T}{\sqrt T}(\ln N)\Big\|\frac{\sqrt T}{\ln T}D_{d,T}(\theta_0)(\hat\tau_T-\tau_0)\Big\|\le C\frac{\ln T}{T^{1/2+\theta_L}}(\ln N)\,O_p(1)=o_p(1).
$$
Define $\tilde N_{\delta^*,N}(\gamma_0)$ similarly to $\tilde N_{\delta,T}(\gamma_0)$ (defined below (C.1)). Consequently, if there exists a constant $\delta^*>0$ such that $P\big(\hat\gamma_T\in\tilde N_{\delta^*,N}(\gamma_0)\big)\to1$, then
$$
\begin{aligned}
G'^{-1}_{\hat\gamma,N}\Big(\sum_{n=1}^N\dot f(\hat w_n,\hat\gamma_T)\dot f(\hat w_n,\hat\gamma_T)'\Big)G^{-1}_{\hat\gamma,N}&=R_N'G'^{-1}_{\gamma_0,N}\Big(\sum_{n=1}^N\dot f(\hat w_n,\hat\gamma_T)\dot f(\hat w_n,\hat\gamma_T)'\Big)G^{-1}_{\gamma_0,N}R_N\\
&=R_N'\Big[G'^{-1}_{\gamma_0,N}\Big(\sum_{n=1}^N\dot f(\hat w_n,\gamma_0)\dot f(\hat w_n,\gamma_0)'\Big)G^{-1}_{\gamma_0,N}+o_p(1)\Big]R_N
\end{aligned}\tag{C.24}
$$
$$
=G'^{-1}_{\gamma_0,N}\Big(\sum_{n=1}^N\dot f(\hat w_n,\gamma_0)\dot f(\hat w_n,\gamma_0)'\Big)G^{-1}_{\gamma_0,N}+o_p(1),\tag{C.25}
$$
where (C.24) follows from the same arguments as in (C.3). The condition $P\big(\hat\gamma_T\in\tilde N_{\delta^*,N}(\gamma_0)\big)\to1$ holds because, with probability approaching one,
$$
\big\|G_{\gamma_0,N}(\hat\gamma_T-\gamma_0)\big\|\le\big\|G_{\gamma_0,N}G^{-1}_{\gamma_0,T}\big\|\,\big\|G_{\gamma_0,T}(\hat\gamma_T-\gamma_0)\big\|\le C\delta\ln T=:\delta^*\ln T,
$$
where
$$
G_{\gamma_0,N}G^{-1}_{\gamma_0,T}=\sqrt{\frac NT}\begin{bmatrix}D_{d,N}(\theta_0)D_{d,T}(\theta_0)^{-1}&D_{d,N}(\theta_0)D_{d,T}(\theta_0)^{-1}\mathrm{diag}[\tau_0]\ln\frac NT&O\\O&D_{d,N}(\theta_0)D_{d,T}(\theta_0)^{-1}&O\\O_{p\times d}&O_{p\times d}&D_{s,N}D_{s,T}^{-1}\end{bmatrix},
$$
and thus, by the norm property $\|\cdot\|\le\|\cdot\|_F$,
$$
\big\|G_{\gamma_0,N}G^{-1}_{\gamma_0,T}\big\|^2\le C\,\frac NT\Big(\big\|D_{d,N}(\theta_0)D_{d,T}(\theta_0)^{-1}\big\|_F^2\Big(1+\ln^2\frac NT\Big)+\big\|D_{s,N}D_{s,T}^{-1}\big\|_F^2\Big)\le C\Big(\frac NT\Big)^{1+2\theta_L}\Big(1+\ln^2\frac NT\Big)+C\Big(\frac NT\Big)^2\le C.
$$
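The direct calculation of $R_N$ can be checked numerically in the simplest configuration, one power-law trend and no stochastic regressors, where each factor is a $2\times2$ upper-triangular matrix (any common scalar factor in $G_{\gamma,N}$ cancels in $R_N$). The sketch compares a brute-force $G_{\gamma_0,N}G^{-1}_{\hat\gamma,N}$ with the closed form for $R_N$ given in the proof and confirms that $R_N$ is the identity when the estimates equal the true values. Parameter values and helper names are hypothetical.

```python
import math

def g_block(theta, tau, N):
    """One-trend, no-regressor version of the first factor of R_N:
    [[N^theta, N^theta * tau * ln N], [0, N^theta]]."""
    s = N ** theta
    return [[s, s * tau * math.log(N)], [0.0, s]]

def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul2(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def r_n_numeric(theta0, tau0, theta_hat, tau_hat, N):
    """R_N = G_{gamma_0,N} G_{gamma_hat,N}^{-1}, computed by brute force."""
    return matmul2(g_block(theta0, tau0, N), inv2(g_block(theta_hat, tau_hat, N)))

def r_n_closed_form(theta0, tau0, theta_hat, tau_hat, N):
    """Closed form: N^{theta0 - theta_hat} * [[1, (tau0 - tau_hat) ln N], [0, 1]]."""
    s = N ** (theta0 - theta_hat)
    return [[s, s * (tau0 - tau_hat) * math.log(N)], [0.0, s]]

args = (0.4, 1.5, 0.401, 1.49, 500)  # hypothetical true values, estimates and N
print(r_n_numeric(*args))
print(r_n_closed_form(*args))
```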
Next, we consider $G'^{-1}_{\hat\gamma,N}\sum_{n=1}^N\dot f(\hat w_n,\hat\gamma_T)\hat\mu_n$, which we decompose as
$$
G'^{-1}_{\hat\gamma,N}\Big(\sum_{n=1}^N\dot f(\hat w_n,\hat\gamma_T)\hat\mu_n\Big)=R_N'\Big[G'^{-1}_{\gamma_0,N}\sum_{n=1}^N\dot f(\hat w_n,\gamma_0)\hat\mu_n+G'^{-1}_{\gamma_0,N}\sum_{n=1}^N\big(\dot f(\hat w_n,\hat\gamma_T)-\dot f(\hat w_n,\gamma_0)\big)\hat\mu_n\Big].
$$
We know $R_N\xrightarrow{p}I_{d+p}$. Moreover, by the triangle and Cauchy-Schwarz inequalities, we have
$$
\Big\|G'^{-1}_{\gamma_0,N}\sum_{n=1}^N\big(\dot f(\hat w_n,\hat\gamma_T)-\dot f(\hat w_n,\gamma_0)\big)\hat\mu_n\Big\|\le\sqrt{N\sum_{n=1}^N\big\|G'^{-1}_{\gamma_0,N}\big(\dot f(\hat w_n,\hat\gamma_T)-\dot f(\hat w_n,\gamma_0)\big)\big\|^2}\,\sqrt{\frac1N\sum_{n=1}^N\hat\mu_n^2}.\tag{C.26}
$$
Using $G_{\gamma_0,N}=D_{\theta_0,N}L'^{-1}_{\tau_0,N}$, we see from (A.22) that the first term on the RHS of (C.26) does not depend on $\{e_1,\dots,e_N\}$.
As in (C.4), we also have $N\sum_{n=1}^N\big\|G'^{-1}_{\gamma_0,N}\big(\dot f(\hat w_n,\hat\gamma_T)-\dot f(\hat w_n,\gamma_0)\big)\big\|^2\le C\sum_{i=1}^4I_i$ with
$$
\begin{aligned}
I_1&=\sum_{n=1}^N\Big\|D_{d,N}(\theta_0)^{-1}\Big((\hat\tau_T-\tau_0)\odot\big(d_n(\hat\theta_T)-d_n(\theta_0)\big)\Big)\ln n\Big\|^2=O_p\Big(\frac{(\ln T)^6}{T^{1+2\theta_L}}\,\frac N{T^{1+2\theta_L}}\Big)=o_p(1),\\
I_2&=\sum_{n=1}^N\big\|D_{d,N}(\theta_0)^{-1}\big(d_n(\hat\theta_T)-d_n(\theta_0)\big)\big\|^2=O_p\Big(\frac{(\ln T)^2N}{T^{1+2\theta_L}}\Big)=o_p(1),\\
I_3&=\sum_{n=1}^N\Big\|D_{d,N}(\theta_0)^{-1}\Big(\tau_0\odot\big(d_n(\hat\theta_T)-d_n(\theta_0)\big)\Big)\ln\frac nN\Big\|^2=O_p\Big(\frac{(\ln T)^4N}{T^{1+2\theta_L}}\Big)=o_p(1),\\
I_4&=\sum_{n=1}^N\Big\|D_{d,N}(\theta_0)^{-1}\Big((\hat\tau_T-\tau_0)\odot d_n(\theta_0)\Big)\ln n\Big\|^2=O_p\Big(\frac{(\ln T)^4N}{T^{1+2\theta_L}}\Big)=o_p(1),
\end{aligned}
$$
where the stochastic orders are established as in Lemma 4(i)-(iv) (and thus omitted). For the second term on the RHS of (C.26), note that
$$
\frac1N\sum_{n=1}^N\hat\mu_n^2\le\frac1N\sum_{n=1}^N\big(\hat\mu_n^2+\hat\upsilon_n'\hat\upsilon_n\big)=\frac1N\sum_{n=1}^Ne_n'\hat\Omega_Te_n\le\big\|\hat\Omega_T\big\|\Big(\frac1N\sum_{n=1}^Ne_n'e_n\Big)=O^*_p(1)
$$
because $\hat\Omega_T\xrightarrow{p}\Omega$ and $e_n\overset{\text{i.i.d.}}{\sim}N(0,I_{m+1})$. Overall, we have $G'^{-1}_{\hat\gamma,N}\big[\sum_{n=1}^N\dot f(\hat w_n,\hat\gamma_T)\hat\mu_n\big]=G'^{-1}_{\gamma_0,N}\sum_{n=1}^N\dot f(\hat w_n,\gamma_0)\hat\mu_n+o^*_p(1)$.
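The bound on $\frac1N\sum_n\hat\mu_n^2$ uses $[\hat\mu_n,\hat\upsilon_n']'=\hat\Omega^{1/2}e_n$ (as in (C.27)), so that $\hat\mu_n^2+\hat\upsilon_n'\hat\upsilon_n=e_n'\hat\Omega e_n\le\|\hat\Omega\|e_n'e_n$. The Python sketch below reproduces this for the case $m=1$, taking $\hat\Omega^{1/2}$ to be the symmetric square root, computed in closed form for $2\times2$ matrices via the Cayley-Hamilton theorem. The value of $\hat\Omega$, the number of draws and the helper names are hypothetical. With a non-symmetric factor such as a Cholesky factor the equality $\hat\mu_n^2+\hat\upsilon_n^2=e_n'\hat\Omega e_n$ would generally fail, which is why the symmetric root is used here.

```python
import math
import random

def sqrtm_2x2(m):
    """Symmetric square root of a 2x2 symmetric positive definite matrix (Cayley-Hamilton):
    sqrt(M) = (M + s I) / sqrt(trace(M) + 2 s), with s = sqrt(det M)."""
    (a, b), (_, d) = m
    s = math.sqrt(a * d - b * b)
    t = math.sqrt(a + d + 2.0 * s)
    return [[(a + s) / t, b / t], [b / t, (d + s) / t]]

def largest_eig_2x2(m):
    """Largest eigenvalue (= spectral norm) of a 2x2 symmetric PSD matrix."""
    (a, b), (_, d) = m
    return 0.5 * (a + d + math.sqrt((a - d) ** 2 + 4.0 * b * b))

omega_hat = [[2.0, 0.6], [0.6, 1.0]]  # hypothetical Omega_hat with m = 1
root = sqrtm_2x2(omega_hat)
random.seed(3)
N = 2_000
draws = []
for _ in range(N):
    e = (random.gauss(0.0, 1.0), random.gauss(0.0, 1.0))  # e_n ~ N(0, I_2)
    mu = root[0][0] * e[0] + root[0][1] * e[1]            # mu_hat_n
    ups = root[1][0] * e[0] + root[1][1] * e[1]           # upsilon_hat_n
    quad = sum(e[i] * omega_hat[i][j] * e[j] for i in range(2) for j in range(2))
    draws.append((mu, ups, quad, e[0] ** 2 + e[1] ** 2))
print(sum(d[0] ** 2 for d in draws) / N)  # sample mean of mu_hat_n^2, near Omega_hat[0][0] = 2
```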
Combining this result with (C.25) gives $\hat J_N\big(\hat\gamma_T,\hat\Omega,\hat\Delta^-_{vu}\big) = \hat J_N\big(\gamma,\hat\Omega,\hat\Delta^-_{vu}\big) + o_p^*(1)$.

Finally, we consider $\hat J_N\big(\gamma,\hat\Omega,\hat\Delta^-_{vu}\big)$ itself. By independence between $\{e_n\}$ and $\{\hat\Omega,\hat\Delta^-_{vu}\}$, consistency of $\hat\Omega$, and an FCLT for i.i.d. sequences, we have
$$\frac{1}{\sqrt N}\sum_{n=1}^{[rN]}\begin{bmatrix}\hat\mu_n\\ \hat\upsilon_n\end{bmatrix} = \hat\Omega^{1/2}\frac{1}{\sqrt N}\sum_{n=1}^{[rN]}e_n \xrightarrow{d^*} B(r). \tag{C.27}$$
Note that the elements of $\hat\Omega$ and $\hat\Delta$ always enter the construction multiplicatively. By virtue of (C.27) and a direct application of Lemma 2,
$$G_{\gamma,N}'^{-1}\Big[\sum_{n=1}^N\dot f(\hat w_n,\gamma)\hat\mu_n\Big] + \hat B^-_{vu} \xrightarrow{d^*} \int_0^1 j(r;\gamma)\,dB_u(r) + \begin{bmatrix}0_{d\times1}\\ 0_{d\times1}\\ \Omega_{v_1u}b_1\\ \vdots\\ \Omega_{v_mu}b_m\end{bmatrix} + \begin{bmatrix}0_{d\times1}\\ 0_{d\times1}\\ \Delta^-_{v_1u}b_1\\ \vdots\\ \Delta^-_{v_mu}b_m\end{bmatrix} = \int_0^1 j(r;\gamma)\,dB_u(r) + \begin{bmatrix}0_{d\times1}\\ 0_{d\times1}\\ \Delta_{v_1u}b_1\\ \vdots\\ \Delta_{v_mu}b_m\end{bmatrix}, \tag{C.28}$$
where we use $\Omega + \Delta^- = (\Delta+\Delta'-\Sigma) + (\Sigma-\Delta') = \Delta$. Similarly, we have
$$G_{\gamma,N}'^{-1}\Big[\sum_{n=1}^N\dot f(\hat w_n,\gamma)\dot f(\hat w_n,\gamma)'\Big]G_{\gamma,N}^{-1} \xrightarrow{d^*} \int_0^1 j(r;\gamma)j(r;\gamma)'\,dr. \tag{C.29}$$
By (C.28) and (C.29), we obtain the theorem. ∎

D Proof of Theorem 4

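The proof below concerns a KPSS-type statistic computed from the residuals $\hat u^+_t$ on a subsample of length $q_T$ that starts at observation $\ell$; under the null its limit is $\int_0^1[W(r)]^2dr$. As a minimal sketch, assuming that $\hat u^+_t$ and a long-run variance estimate $\hat\Omega_{u.v}$ are already available, the statistic can be computed as below. The function names and the choice to evaluate every block start $\ell = 1,\dots,T-q_T+1$ are illustrative assumptions, not necessarily the paper's exact subsampling scheme.

```python
import numpy as np

def kpss_block(u_plus, ell, q, omega_uv):
    """KPSS-type statistic on the block of length q starting at observation ell (1-based).

    Returns omega_uv^{-1} * q^{-2} * sum_t S_t^2, where S_t are the partial sums of the block.
    The block is not demeaned because u_plus is assumed to be a regression residual.
    """
    block = np.asarray(u_plus, dtype=float)[ell - 1 : ell - 1 + q]
    if block.size != q:
        raise ValueError("block extends beyond the end of the sample")
    partial_sums = np.cumsum(block)
    return float(np.sum(partial_sums ** 2) / (q ** 2 * omega_uv))

def subsample_kpss(u_plus, q, omega_uv):
    """Hypothetical subsampling scheme: the statistic for every start ell = 1, ..., T - q + 1."""
    T = len(u_plus)
    return np.array([kpss_block(u_plus, ell, q, omega_uv) for ell in range(1, T - q + 2)])

# Toy check: the partial sums of [1, -1, 1, -1] are [1, 0, 1, 0], so the statistic is 2 / 16.
print(kpss_block([1.0, -1.0, 1.0, -1.0], ell=1, q=4, omega_uv=1.0))  # 0.125
```

The block length matters for the proof: one admissible choice is $q_T \approx T^{1/2}$, for which $(\ln T)(q_T/T)^{\theta_L+1/2} \approx (\ln T)\,T^{-(\theta_L+1/2)/2} \to 0$ as long as $\theta_L + 1/2 > 0$.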

Without loss of generality, we set $\ell = 1$. Subsequently, note that
$$\begin{aligned}\frac{1}{\sqrt{q_T}}\sum_{t=1}^{[rq_T]}\hat u^+_t ={}& \frac{1}{\sqrt{q_T}}\sum_{t=1}^{[rq_T]}\big(u_t-\Omega_{uv}\Omega_{vv}^{-1}v_t\big) + \big(\Omega_{uv}\Omega_{vv}^{-1}-\hat\Omega_{uv}\hat\Omega_{vv}^{-1}\big)\frac{1}{\sqrt{q_T}}\sum_{t=1}^{[rq_T]}v_t\\ &- \frac{1}{\sqrt{q_T}}\sum_{t=1}^{[rq_T]}\Big(d_t(\hat\theta_T)'\hat\tau_T - d_t(\theta)'\tau\Big) - \frac{1}{\sqrt{q_T}}\sum_{t=1}^{[rq_T]}s_t'\big(\hat\phi_T-\phi\big) =: V_{Ia} + V_{Ib} - V_{Ic} - V_{Id}.\end{aligned} \tag{D.1}$$
Assumption 2 justifies the use of a functional central limit theorem for linear processes, e.g. Phillips and Solo (1992). Therefore, $V_{Ia}\Rightarrow B_{u.v}(r)$ and
$$\Omega_{u.v}^{-1}\frac{1}{q_T}\sum_{t=1}^{q_T}\Big(\frac{1}{\sqrt{q_T}}\sum_{i=1}^{t}\big(u_i-\Omega_{uv}\Omega_{vv}^{-1}v_i\big)\Big)^2 \Rightarrow \int_0^1\big[W(r)\big]^2dr$$
by the continuous mapping theorem for functionals. Theorem 4 will thus follow if we can show that $V_{Ib}$, $V_{Ic}$ and $V_{Id}$ are asymptotically negligible.

Because Assumptions 1-3 are required to hold, Theorem 2 implies that $\hat\Omega_T\xrightarrow{p}\Omega$ and hence $\|\Omega_{uv}\Omega_{vv}^{-1}-\hat\Omega_{uv}\hat\Omega_{vv}^{-1}\|\xrightarrow{p}0$. Since $q_T^{-1/2}\sum_{t=1}^{[rq_T]}v_t = O_p(1)$ by the same functional central limit theorem, it follows that $V_{Ib} = o_p(1)$. We decompose $V_{Ic}$ into three parts:

$$V_{Ic} = \frac{1}{\sqrt{q_T}}\sum_{t=1}^{[rq_T]}\big(d_t(\hat\theta_T)-d_t(\theta)\big)'\tau + \frac{1}{\sqrt{q_T}}\sum_{t=1}^{[rq_T]}d_t(\theta)'\big(\hat\tau_T-\tau\big) + \frac{1}{\sqrt{q_T}}\sum_{t=1}^{[rq_T]}\big(d_t(\hat\theta_T)-d_t(\theta)\big)'\big(\hat\tau_T-\tau\big) =: V_{Ic}(1) + V_{Ic}(2) + V_{Ic}(3). \tag{D.2}$$
For the first part,
$$|V_{Ic}(1)| = \Big|\frac{1}{\sqrt{q_T}}\sum_{t=1}^{[rq_T]}\sum_{i=1}^d\big(t^{\hat\theta_i}-t^{\theta_i}\big)\tau_i\Big| \le C(\ln q_T)\Big(\frac{q_T}{T}\Big)^{\theta_L+1/2}\sum_{i=1}^d|\tau_i|\,\Big|T^{\theta_i+1/2}\big(\hat\theta_i-\theta_i\big)\Big|\,\frac{1}{q_T}\sum_{t=1}^{q_T}\Big(\frac{t}{q_T}\Big)^{\theta_i} = (\ln q_T)\Big(\frac{q_T}{T}\Big)^{\theta_L+1/2}O_p(1)$$
by the mean value theorem, Lemma 1(i), and $T^{\theta_i+1/2}(\hat\theta_i-\theta_i) = O_p(1)$. By the Cauchy-Schwarz and triangle inequalities, we have
$$|V_{Ic}(2)| = \Big|\frac{1}{\sqrt{q_T}}\sum_{t=1}^{[rq_T]}\big[D_{d,q_T}(\theta)^{-1}d_t(\theta)\big]'\big[D_{d,q_T}(\theta)(\hat\tau_T-\tau)\big]\Big| \le (\ln T)\Big(\frac{q_T}{T}\Big)^{\theta_L+1/2}\Big(\frac{1}{q_T}\sum_{t=1}^{[rq_T]}\big\|D_{d,q_T}(\theta)^{-1}d_t(\theta)\big\|\Big)\Big\|\frac{\sqrt T}{\ln T}D_{d,T}(\theta)\big(\hat\tau_T-\tau\big)\Big\| = (\ln T)\Big(\frac{q_T}{T}\Big)^{\theta_L+1/2}O_p(1),$$
where we used $\frac{\sqrt T}{\ln T}D_{d,T}(\theta)(\hat\tau_T-\tau) = O_p(1)$ (see Theorem 1).
Similarly, we bound
$$|V_{Ic}(3)| \le (\ln T)\Big(\frac{q_T}{T}\Big)^{\theta_L+1/2}\Big(\frac{1}{q_T}\sum_{t=1}^{[rq_T]}\big\|D_{d,q_T}(\theta)^{-1}\big(d_t(\hat\theta_T)-d_t(\theta)\big)\big\|\Big)\Big\|\frac{\sqrt T}{\ln T}D_{d,T}(\theta)\big(\hat\tau_T-\tau\big)\Big\|.$$
Because
$$\frac{1}{q_T}\sum_{t=1}^{[rq_T]}\Big|q_T^{-\theta_i}\big(t^{\hat\theta_i}-t^{\theta_i}\big)\Big| \le C(\ln q_T)\,\big|\hat\theta_i-\theta_i\big|\Big(\frac{1}{q_T}\sum_{t=1}^{[rq_T]}\Big(\frac{t}{q_T}\Big)^{\theta_i}\Big) = (\ln q_T)\,O_p\big(T^{-(\theta_L+1/2)}\big)$$
for any $i = 1,2,\dots,d$, we see that $V_{Ic}(3) = o_p(1)$. Overall, $V_{Ic}(1)$, $V_{Ic}(2)$ and $V_{Ic}(3)$ are all three asymptotically negligible under the prerequisite that $(\ln T)(q_T/T)^{\theta_L+1/2}\to0$.

Finally, we turn to the term $V_{Id}$. From
$$\begin{aligned}|V_{Id}| &= \Big|\Big(\frac{q_T}{T}\Big)^{1/2}\frac{1}{q_T}\sum_{t=1}^{[rq_T]}\big(D_{s,q_T}^{-1}s_t\big)'D_{s,q_T}D_{s,T}^{-1}\Big[\sqrt T D_{s,T}\big(\hat\phi_T-\phi\big)\Big]\Big| \le \Big(\frac{q_T}{T}\Big)^{1/2}\frac{1}{q_T}\sum_{t=1}^{[rq_T]}\big\|D_{s,q_T}^{-1}s_t\big\|\,\Big\|\sqrt T D_{s,T}\big(\hat\phi_T-\phi\big)\Big\|\\ &\le \Big(\frac{q_T}{T}\Big)^{1/2}\Big(\frac{1}{q_T}\sum_{t=1}^{[rq_T]}\big\|D_{s,q_T}^{-1}s_t\big\|^2\Big)^{1/2}\Big\|\sqrt T D_{s,T}\big(\hat\phi_T-\phi\big)\Big\|,\end{aligned}$$
we see that $|V_{Id}| = O_p\big((q_T/T)^{1/2}\big)$ because (1) $\frac{1}{q_T}\sum_{t=1}^{[rq_T]}\|D_{s,q_T}^{-1}s_t\|^2 = \operatorname{tr}\big(\frac{1}{q_T}\sum_{t=1}^{[rq_T]}D_{s,q_T}^{-1}s_ts_t'D_{s,q_T}^{-1}\big) = O_p(1)$, and (2) $\sqrt T D_{s,T}(\hat\phi_T-\phi) = O_p(1)$. ∎

E Further Explanations for MC Results in the Introduction

The innovation sequences $\{u_t\}$ and $\{v_t\}$ are mutually independent and generated as $u_t \overset{i.i.d.}{\sim} N(0,\sigma_u^2)$ and $v_t \overset{i.i.d.}{\sim} N(0,\sigma_v^2)$. We set $\sigma_u = \sigma_v$. The test statistic is
$$t_{c_3=0} = \frac{\hat c_3}{\sqrt{\hat\sigma^2\,[(X'X)^{-1}]_{44}}},$$
with $(T\times4)$ matrix
$$X = \begin{bmatrix}1 & 1 & \cdots & 1\\ 1 & 2 & \cdots & T\\ x_1 & x_2 & \cdots & x_T\\ x_1^2 & x_2^2 & \cdots & x_T^2\end{bmatrix}',$$
where $\hat c_3$, the coefficient on $x_t^2$, is the fourth element of the OLS estimator $\hat c = (X'X)^{-1}X'y$, and $\hat\sigma^2 = T^{-1}\|y-X\hat c\|^2$. We reject the null whenever the test statistic is less than the 5% quantile of a standard normally distributed random variable.

We start with the derivation of the limiting distribution of $\hat c$. All sums below run over $t = 1,\dots,T$ and all integrals over $[0,1]$. Define the scaling matrix $D_T = \operatorname{diag}(T^{1/2}, T^{3/2}, T, T^{3/2})$, such that
$$D_T^{-1}X'XD_T^{-1} = \begin{bmatrix}1 & T^{-2}\sum t & T^{-3/2}\sum x_t & T^{-2}\sum x_t^2\\ * & T^{-3}\sum t^2 & T^{-5/2}\sum tx_t & T^{-3}\sum tx_t^2\\ * & * & T^{-2}\sum x_t^2 & T^{-5/2}\sum x_t^3\\ * & * & * & T^{-3}\sum x_t^4\end{bmatrix} \Rightarrow \begin{bmatrix}1 & \tfrac12 & \int B(r)dr & \int B(r)^2dr\\ * & \tfrac13 & \int rB(r)dr & \int rB(r)^2dr\\ * & * & \int B(r)^2dr & \int B(r)^3dr\\ * & * & * & \int B(r)^4dr\end{bmatrix}, \tag{E.1}$$
by Lemma 2(iii) and with $B(\cdot)$ denoting a Brownian motion such that $E\big(B(s)-B(r)\big)^2 = \sigma_v^2(s-r)$ for $s>r$. By the same lemma, after using the DGP $y_t = \tau t^2 + u_t$, we have
$$\frac{1}{T^{5/2}}D_T^{-1}X'y = \begin{bmatrix}T^{-3}\sum y_t\\ T^{-4}\sum ty_t\\ T^{-7/2}\sum x_ty_t\\ T^{-4}\sum x_t^2y_t\end{bmatrix} = \tau\begin{bmatrix}T^{-3}\sum t^2\\ T^{-4}\sum t^3\\ T^{-7/2}\sum t^2x_t\\ T^{-4}\sum t^2x_t^2\end{bmatrix} + O_p\big(T^{-5/2}\big) \Rightarrow \tau\begin{bmatrix}\int r^2dr\\ \int r^3dr\\ \int r^2B(r)dr\\ \int r^2B(r)^2dr\end{bmatrix}. \tag{E.2}$$
The combination of (E.1) and (E.2) results in $T^{-5/2}D_T\hat c = O_p(1)$. For the fourth element this translates into $T^{-1}\hat c_3 = O_p(1)$. Straightforward yet tedious calculations show $T^{-4}\hat\sigma^2 = O_p(1)$. The asymptotic behaviour of the t-statistic is therefore
$$t_{c_3=0} = \frac{\hat c_3}{\sqrt{\hat\sigma^2[(X'X)^{-1}]_{44}}} = \frac{T^{1/2}\,\hat c_3/T}{\sqrt{\frac{\hat\sigma^2}{T^4}\big[(D_T^{-1}X'XD_T^{-1})^{-1}\big]_{44}}} = O_p\big(T^{1/2}\big). \tag{E.3}$$
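The $O_p(T^{1/2})$ growth of the t-statistic can be illustrated with a small simulation. This is a sketch under stated assumptions, not the authors' simulation code: it assumes that $x_t = x_{t-1} + v_t$ is a driftless random walk started at $x_0 = 0$, uses the quadratic-trend DGP $y_t = \tau t^2 + u_t$ from (E.2), and the function name `ekc_tstat`, the value $\tau = 0.01$, the sample sizes and the number of replications are illustrative choices.

```python
import numpy as np

def ekc_tstat(T, tau, rng, sigma_u=1.0, sigma_v=1.0):
    """t-statistic for the x_t^2 coefficient in the OLS regression of y_t on [1, t, x_t, x_t^2].

    Illustrative DGP (assumption): y_t = tau * t**2 + u_t and x_t = x_{t-1} + v_t with x_0 = 0,
    where {u_t} and {v_t} are mutually independent i.i.d. normal sequences.
    """
    t = np.arange(1, T + 1, dtype=float)
    u = sigma_u * rng.standard_normal(T)
    x = np.cumsum(sigma_v * rng.standard_normal(T))
    y = tau * t ** 2 + u
    X = np.column_stack([np.ones(T), t, x, x ** 2])
    xtx_inv = np.linalg.inv(X.T @ X)
    c_hat = xtx_inv @ (X.T @ y)                      # OLS estimator (X'X)^{-1} X'y
    sigma2_hat = np.sum((y - X @ c_hat) ** 2) / T    # sigma^2 estimate with divisor T
    return c_hat[3] / np.sqrt(sigma2_hat * xtx_inv[3, 3])

rng = np.random.default_rng(2020)
for T in (100, 400, 1600):
    stats = np.array([ekc_tstat(T, tau=0.01, rng=rng) for _ in range(200)])
    # mean |t| should grow roughly like sqrt(T); the last number is the rejection
    # frequency of the one-sided 5% test (reject when t < -1.645)
    print(T, np.mean(np.abs(stats)), np.mean(stats < -1.645))
```

Under these assumptions, quadrupling $T$ should roughly double the typical magnitude of the statistic, whereas setting `tau=0.0` should give a statistic that behaves like a standard normal draw.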
Equation (E.3) shows that the t-statistic is stochastically unbounded whenever $\tau \neq 0$. Its sign is governed by $\tau$, see (E.2). If $\tau$ equals zero, then $y_t = u_t$ and the terms that currently dominate the asymptotic distribution are absent; the t-statistic is then asymptotically standard normally distributed.

The t-statistic should be adjusted in the presence of serial correlation and/or endogeneity. For simplicity, we did not include such effects in the MC simulations. Similarly, our estimator for $\sigma_u^2$ exploits the fact that $\{u_t\}$ is an i.i.d. sequence.

F Limiting Distribution for Example 1

Referring to Theorem 1, we find
$$\begin{bmatrix}T^{\theta+1/2} & 0\\ \tau\ln(T)\,T^{\theta+1/2} & T^{\theta+1/2}\end{bmatrix}\begin{bmatrix}\hat\theta_T-\theta\\ \hat\tau_T-\tau\end{bmatrix} \Rightarrow \begin{bmatrix}\int_0^1\big(\tau r^\theta\ln r\big)^2dr & \int_0^1\tau r^{2\theta}\ln r\,dr\\ \int_0^1\tau r^{2\theta}\ln r\,dr & \int_0^1r^{2\theta}dr\end{bmatrix}^{-1}\begin{bmatrix}\int_0^1\tau r^\theta\ln r\,dB_u\\ \int_0^1r^\theta dB_u\end{bmatrix}.$$
We have to show that the quantity on the RHS is normally distributed with mean and variance as provided in (3.2) of the main paper. Consider an arbitrary vector $c = [c_1, c_2]'$ and define
$$A_c = c'\begin{bmatrix}\int_0^1\tau r^\theta\ln r\,dB_u\\ \int_0^1r^\theta dB_u\end{bmatrix} = \int_0^1\big[c_1\tau r^\theta\ln r + c_2r^\theta\big]dB_u \overset{d}{=} \Omega_{uu}^{1/2}\int_0^1\big[c_1\tau r^\theta\ln r + c_2r^\theta\big]dW_u.$$
Because the integrand is deterministic, $A_c$ is Gaussian, so to characterise $A_c$ it suffices to derive its mean and variance. Equation (4.190) in the same reference yields $E(A_c) = \Omega_{uu}^{1/2}\int_0^1\big[c_1\tau r^\theta\ln r + c_2r^\theta\big]\,dE(W_u) = 0$. Moreover, by (2.16) in Tanaka (2017),
$$\operatorname{Var}(A_c) = \Omega_{uu}\int_0^1\big[c_1\tau r^\theta\ln r + c_2r^\theta\big]^2dr = \Omega_{uu}\,c'\begin{bmatrix}\int_0^1\big(\tau r^\theta\ln r\big)^2dr & \int_0^1\tau r^{2\theta}\ln r\,dr\\ \int_0^1\tau r^{2\theta}\ln r\,dr & \int_0^1r^{2\theta}dr\end{bmatrix}c.$$
Since $c$ was arbitrary, the Cramér-Wold device gives
$$\begin{bmatrix}\int_0^1\tau r^\theta\ln r\,dB_u\\ \int_0^1r^\theta dB_u\end{bmatrix} \sim N\left(0,\ \Omega_{uu}\begin{bmatrix}\int_0^1\big(\tau r^\theta\ln r\big)^2dr & \int_0^1\tau r^{2\theta}\ln r\,dr\\ \int_0^1\tau r^{2\theta}\ln r\,dr & \int_0^1r^{2\theta}dr\end{bmatrix}\right).$$
Finally, use $\int_0^1\big(r^\theta\ln r\big)^2dr = \frac{2}{(2\theta+1)^3}$, $\int_0^1r^{2\theta}\ln r\,dr = -\frac{1}{(2\theta+1)^2}$, and basic linear algebra to recover the claim of Example 1.

G FMOLS Estimator

We here comment on the asymptotic properties of the FMOLS estimator. To be specific, we analyse the asymptotic properties of $\tilde D_{\theta,T}\begin{bmatrix}\hat\tau^+_T-\tau\\ \hat\phi^+_T-\phi\end{bmatrix}$, with $\tilde D_{\theta,T} = \sqrt T\begin{bmatrix}D_{d,T}(\theta) & O_{d\times p}\\ O_{p\times d} & D_{s,T}\end{bmatrix}$ and
$$\begin{bmatrix}\hat\tau^+_T\\ \hat\phi^+_T\end{bmatrix} = \Big(\sum_{t=1}^Tz_t(\hat\theta_T)z_t(\hat\theta_T)'\Big)^{-1}\Big(\sum_{t=1}^Tz_t(\hat\theta_T)y^+_t - A^*\Big),$$
where $y^+_t$ and $A^*$ are the usual second-order bias corrections. That is, $y^+_t = y_t - \hat\Omega_{uv}\hat\Omega_{vv}^{-1}\Delta x_t$ and $A^* = [0_{d\times1}', A_1^{*\prime},\dots,A_m^{*\prime}]'$ with $A_i^* = \hat\Delta^+_{v_iu}\big[T,\ 2\sum_{t=1}^Tx_{it},\ \dots,\ p_i\sum_{t=1}^Tx_{it}^{p_i-1}\big]'$, where $\hat\Delta^+_{v_iu}$ is the $i$th row of $\hat\Delta^+_{vu} = \hat\Delta_{vu} - \hat\Delta_{vv}\hat\Omega_{vv}^{-1}\hat\Omega_{vu}$ ($i = 1,2,\dots,m$). If the convergence speed of $\hat\theta_T$ is sufficiently fast, then its estimation error is asymptotically negligible and the limiting distribution of $\tilde D_{\theta,T}\begin{bmatrix}\hat\tau^+_T-\tau\\ \hat\phi^+_T-\phi\end{bmatrix}$ is mixed normal.

We now focus on the limiting distribution. By linear algebra manipulations, we find
$$\tilde D_{\theta,T}\begin{bmatrix}\hat\tau^+_T-\tau\\ \hat\phi^+_T-\phi\end{bmatrix} = \Big(\tilde D_{\theta,T}^{-1}\sum_{t=1}^Tz_t(\hat\theta_T)z_t(\hat\theta_T)'\tilde D_{\theta,T}^{-1}\Big)^{-1}\tilde D_{\theta,T}^{-1}\Big(\sum_{t=1}^Tz_t(\hat\theta_T)\tilde u^+_t - A^*\Big), \tag{G.1}$$
where $\tilde u^+_t = \big(z_t(\theta)-z_t(\hat\theta_T)\big)'\begin{bmatrix}\tau\\ \phi\end{bmatrix} + u_t - \hat\Omega_{uv}\hat\Omega_{vv}^{-1}\Delta x_t$.
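The FMOLS estimator defined above can be sketched for the special case of a single integrated regressor ($m = 1$) entering through the powers $x_t,\dots,x_t^p$. This is a hedged illustration, not the authors' implementation, and the function names are hypothetical. Assumptions: the trend exponents $\theta$ are passed in as given (the true value mimics the infeasible FMOLS($\theta$), an estimate $\hat\theta_T$ mimics the feasible version); the first-stage residuals come from OLS; the long-run covariances use a Bartlett kernel with a fixed bandwidth; $x_0 = 0$ when forming $\Delta x_t$; and $\Delta = \sum_{h\ge0}k(h)\Gamma(h)$ with $\Omega = \Delta + \Delta' - \Gamma(0)$, mirroring the identity $\Omega = \Delta + \Delta' - \Sigma$ from Appendix C.

```python
import numpy as np

def bartlett_lrcov(w, bandwidth):
    """One-sided (Delta) and two-sided (Omega) long-run covariances with a Bartlett kernel.

    w is a (T, k) array. Gamma(h)[a, b] = (1/T) * sum_t w[t, a] * w[t + h, b],
    Delta = sum_{h=0}^{M} k(h) Gamma(h) and Omega = Delta + Delta' - Gamma(0).
    """
    T = w.shape[0]
    delta = np.zeros((w.shape[1], w.shape[1]))
    for h in range(bandwidth + 1):
        weight = 1.0 - h / (bandwidth + 1.0)
        delta += weight * (w[: T - h].T @ w[h:]) / T
    gamma0 = w.T @ w / T
    return delta + delta.T - gamma0, delta

def fmols(y, x, thetas, p, bandwidth=8):
    """FMOLS for y_t = d_t(theta)'tau + sum_{j=1}^p phi_j x_t^j + u_t with one I(1) regressor."""
    T = len(y)
    t = np.arange(1, T + 1, dtype=float)
    D = np.column_stack([t ** th for th in thetas])          # deterministic block d_t(theta)
    S = np.column_stack([x ** j for j in range(1, p + 1)])   # stochastic block s_t
    Z = np.hstack([D, S])
    beta_ols = np.linalg.lstsq(Z, y, rcond=None)[0]
    u_hat = y - Z @ beta_ols                                  # first-stage residuals
    v = np.diff(x, prepend=0.0)                               # Delta x_t, assumes x_0 = 0
    omega, delta = bartlett_lrcov(np.column_stack([u_hat, v]), bandwidth)
    ratio = omega[0, 1] / omega[1, 1]                         # Omega_uv Omega_vv^{-1} (index 0 = u, 1 = v)
    y_plus = y - ratio * v
    delta_plus_vu = delta[1, 0] - delta[1, 1] * ratio         # Delta_vu - Delta_vv Omega_vv^{-1} Omega_vu
    a_star = np.zeros(Z.shape[1])                             # zeros for the deterministic block
    for j in range(1, p + 1):                                 # Delta^+ * [T, 2 sum x, ..., p sum x^{p-1}]
        a_star[len(thetas) + j - 1] = delta_plus_vu * j * np.sum(x ** (j - 1))
    return np.linalg.solve(Z.T @ Z, Z.T @ y_plus - a_star)

rng = np.random.default_rng(42)
T = 2000
e = rng.standard_normal((T, 2))
v = e[:, 0]
u = 0.5 * e[:, 0] + e[:, 1]                                   # endogeneity through corr(u_t, v_t)
x = np.cumsum(v)
t = np.arange(1, T + 1)
y = 1.0 + 0.3 * t ** 0.5 + 0.8 * x - 0.01 * x ** 2 + u
est = fmols(y, x, thetas=[0.0, 0.5], p=2)
print(est)  # ordering: [tau for t^0, tau for t^0.5, phi_1, phi_2]
```

Including $\theta = 0$ among the exponents is simply a way to put an intercept in $d_t(\theta)$ for this illustration.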
We will discuss $\tilde D_{\theta,T}^{-1}\sum_{t=1}^Tz_t(\hat\theta_T)z_t(\hat\theta_T)'\tilde D_{\theta,T}^{-1}$ and $\tilde D_{\theta,T}^{-1}\big[\sum_{t=1}^Tz_t(\hat\theta_T)\tilde u^+_t - A^*\big]$ separately after having enumerated several intermediate results.

Lemma 5

Define $\tilde j(r;\theta) = \big[d(r;\theta)', B_{(1)}'(r),\dots,B_{(m)}'(r)\big]'$ and $B_{u.v} = B_u - \Omega_{uv}\Omega_{vv}^{-1}B_v$. Then, under Assumptions 1-3, we have

(i) $\tilde D_{\theta,T}^{-1}\sum_{t=1}^Tz_t(\hat\theta_T)z_t(\hat\theta_T)'\tilde D_{\theta,T}^{-1} \Rightarrow \int_0^1\tilde j(r;\theta)\tilde j(r;\theta)'\,dr$,

(ii) $\tilde D_{\theta,T}^{-1}\Big[\sum_{t=1}^Tz_t(\theta)\big(u_t-\hat\Omega_{uv}\hat\Omega_{vv}^{-1}v_t\big) - A^*\Big] \Rightarrow \int_0^1\tilde j(r;\theta)\,dB_{u.v}(r)$,

(iii) $\tilde D_{\theta,T}^{-1}\sum_{t=1}^Tz_t(\theta)\big(z_t(\hat\theta_T)-z_t(\theta)\big)'\begin{bmatrix}\tau\\ \phi\end{bmatrix} = O_p(\ln T)$,

(iv) $\tilde D_{\theta,T}^{-1}\sum_{t=1}^T\big(z_t(\hat\theta_T)-z_t(\theta)\big)\big(z_t(\hat\theta_T)-z_t(\theta)\big)'\begin{bmatrix}\tau\\ \phi\end{bmatrix} = O_p\big((\ln T)^2T^{-(\theta_L+1/2)}\big)$,

(v) $\tilde D_{\theta,T}^{-1}\sum_{t=1}^T\big(z_t(\hat\theta_T)-z_t(\theta)\big)\big(u_t-\hat\Omega_{uv}\hat\Omega_{vv}^{-1}v_t\big) = o_p(1)$.

Proof. (i) We can always add and subtract such that the LHS of (i) reads
$$\tilde D_{\theta,T}^{-1}\sum_{t=1}^Tz_t(\hat\theta_T)z_t(\hat\theta_T)'\tilde D_{\theta,T}^{-1} = \tilde D_{\theta,T}^{-1}\sum_{t=1}^Tz_t(\theta)z_t(\theta)'\tilde D_{\theta,T}^{-1} + \Big[\tilde D_{\theta,T}^{-1}\sum_{t=1}^Tz_t(\hat\theta_T)z_t(\hat\theta_T)'\tilde D_{\theta,T}^{-1} - \tilde D_{\theta,T}^{-1}\sum_{t=1}^Tz_t(\theta)z_t(\theta)'\tilde D_{\theta,T}^{-1}\Big]. \tag{G.2}$$
Lemma 2(iii) implies that the first term on the RHS of (G.2) converges to $\int_0^1\tilde j(r;\theta)\tilde j(r;\theta)'\,dr$.
It remains to show that the term in brackets vanishes. By $\sum_ta_ta_t' - \sum_tb_tb_t' = \sum_t(a_t-b_t)(a_t-b_t)' + \sum_t(a_t-b_t)b_t' + \sum_tb_t(a_t-b_t)'$ and the Cauchy-Schwarz inequality, we have
$$\begin{aligned}&\Big\|\tilde D_{\theta,T}^{-1}\sum_{t=1}^Tz_t(\hat\theta_T)z_t(\hat\theta_T)'\tilde D_{\theta,T}^{-1} - \tilde D_{\theta,T}^{-1}\sum_{t=1}^Tz_t(\theta)z_t(\theta)'\tilde D_{\theta,T}^{-1}\Big\|\\ &\quad\le \sum_{t=1}^T\big\|\tilde D_{\theta,T}^{-1}\big(z_t(\hat\theta_T)-z_t(\theta)\big)\big\|^2 + 2\sum_{t=1}^T\big\|\tilde D_{\theta,T}^{-1}z_t(\theta)\big\|\,\big\|\tilde D_{\theta,T}^{-1}\big(z_t(\hat\theta_T)-z_t(\theta)\big)\big\|\\ &\quad\le \sum_{t=1}^T\big\|\tilde D_{\theta,T}^{-1}\big(z_t(\hat\theta_T)-z_t(\theta)\big)\big\|^2 + 2\sqrt{\sum_{t=1}^T\big\|\tilde D_{\theta,T}^{-1}z_t(\theta)\big\|^2}\sqrt{\sum_{t=1}^T\big\|\tilde D_{\theta,T}^{-1}\big(z_t(\hat\theta_T)-z_t(\theta)\big)\big\|^2}.\end{aligned}$$
We have $\sum_{t=1}^T\|\tilde D_{\theta,T}^{-1}z_t(\theta)\|^2 = \operatorname{tr}\big(\sum_{t=1}^T\tilde D_{\theta,T}^{-1}z_t(\theta)z_t(\theta)'\tilde D_{\theta,T}^{-1}\big) \Rightarrow \operatorname{tr}\big(\int_0^1\tilde j(r;\theta)\tilde j(r;\theta)'\,dr\big)$.
Next, note that $\sum_{t=1}^T\|\tilde D_{\theta,T}^{-1}(z_t(\hat\theta_T)-z_t(\theta))\|^2 = \frac1T\sum_{t=1}^T\|D_{d,T}(\theta)^{-1}(d_t(\hat\theta_T)-d_t(\theta))\|^2$. A typical contribution to the latter sum of norms can be bounded by
$$\begin{aligned}\frac1T\sum_{t=1}^T\Big[T^{-\theta_i}\big(t^{\hat\theta_i}-t^{\theta_i}\big)\Big]^2 &\le C\big(\hat\theta_i-\theta_i\big)^2\frac1T\sum_{t=1}^T\Big(\frac tT\Big)^{2\theta_i}(\ln t)^2\\ &\le CT^{-2(\theta_i+1/2)}(\ln T)^2\Big[T^{\theta_i+1/2}\big(\hat\theta_i-\theta_i\big)\Big]^2\sup_{\theta_L\le\theta\le\theta_U}\Big|\frac1T\sum_{t=1}^T\Big(\frac tT\Big)^{2\theta}\Big| \le T^{-2(\theta_L+1/2)}(\ln T)^2\,O_p(1) = o_p(1),\end{aligned} \tag{G.3}$$
where we used the mean value theorem and Lemma 1(i). The claim follows.

(ii) $\hat\Omega_{uv}$ and $\hat\Omega_{vv}$ consistently estimate $\Omega_{uv}$ and $\Omega_{vv}$, respectively (Theorem 2). It therefore suffices to look at $\tilde D_{\theta,T}^{-1}\sum_{t=1}^Tz_t(\theta)\big(u_t-\Omega_{uv}\Omega_{vv}^{-1}v_t\big)$ and $\tilde D_{\theta,T}^{-1}A^*$. Lemma 2(ii) with $u^+_t = u_t-\Omega_{uv}\Omega_{vv}^{-1}v_t$ instead of $u_t$ gives the limiting result $\frac{1}{\sqrt T}\sum_{t=1}^T\big(x_{it}/\sqrt T\big)^ju^+_t \Rightarrow \int_0^1B^j_{v_i}(r)\,dB_{u.v}(r) + j\Delta^+_{v_iu}\int_0^1B^{j-1}_{v_i}(r)\,dr$, which implies
$$\tilde D_{\theta,T}^{-1}\sum_{t=1}^Tz_t(\theta)\big(u_t-\Omega_{uv}\Omega_{vv}^{-1}v_t\big) \Rightarrow \int_0^1\tilde j(r;\theta)\,dB_{u.v}(r) + \tilde B^+_{vu}, \tag{G.4}$$
where $\tilde B^+_{vu} = \big[0_{d\times1}', b_1'\Delta^+_{v_1u},\dots,b_m'\Delta^+_{v_mu}\big]'$. The term $-\tilde D_{\theta,T}^{-1}A^*$ is constructed to asymptotically cancel out the term $\tilde B^+_{vu}$ on the RHS of (G.4).
(iii) Using $z_t(\hat\theta_T)-z_t(\theta) = \begin{bmatrix}d_t(\hat\theta_T)-d_t(\theta)\\ 0_{p\times1}\end{bmatrix}$, we have
$$\tilde D_{\theta,T}^{-1}\sum_{t=1}^Tz_t(\theta)\big(z_t(\hat\theta_T)-z_t(\theta)\big)'\begin{bmatrix}\tau\\ \phi\end{bmatrix} = \begin{bmatrix}\frac{1}{\sqrt T}\sum_{t=1}^TD_{d,T}(\theta)^{-1}d_t(\theta)\big(d_t(\hat\theta_T)-d_t(\theta)\big)'\tau\\ \frac{1}{\sqrt T}\sum_{t=1}^TD_{s,T}^{-1}s_t\big(d_t(\hat\theta_T)-d_t(\theta)\big)'\tau\end{bmatrix}.$$
The elements of this vector take one of two forms: $\frac{1}{\sqrt T}\sum_{t=1}^T\big(\frac tT\big)^{\theta_i}\sum_{k=1}^d\tau_k\big(t^{\hat\theta_k}-t^{\theta_k}\big)$ or $\frac{1}{\sqrt T}\sum_{t=1}^T\big(\frac{x_{it}}{\sqrt T}\big)^j\sum_{k=1}^d\tau_k\big(t^{\hat\theta_k}-t^{\theta_k}\big)$. We show that both contributions are $O_p(\ln T)$. By the mean value theorem and Lemma 1(i),
$$\begin{aligned}\Big|\frac{1}{\sqrt T}\sum_{t=1}^T\Big(\frac tT\Big)^{\theta_i}\sum_{k=1}^d\tau_k\big(t^{\hat\theta_k}-t^{\theta_k}\big)\Big| &= \Big|\frac{1}{\sqrt T}\sum_{k=1}^d\tau_k\sum_{t=1}^T\Big(\frac tT\Big)^{\theta_i}t^{\theta_k}\big(t^{\hat\theta_k-\theta_k}-1\big)\Big| \le C\sum_{k=1}^d|\tau_k|\,\Big|T^{\theta_k+1/2}\big(\hat\theta_k-\theta_k\big)\Big|\,\frac1T\sum_{t=1}^T\Big(\frac tT\Big)^{\theta_i+\theta_k}\ln t\\ &\le C(\ln T)\sum_{k=1}^d|\tau_k|\,\Big|T^{\theta_k+1/2}\big(\hat\theta_k-\theta_k\big)\Big|\Big(\frac1T\sum_{t=1}^T\Big(\frac tT\Big)^{\theta_i+\theta_k}\Big) = O_p(\ln T).\end{aligned} \tag{G.5}$$
Similarly, from the mean value theorem and the Cauchy-Schwarz inequality, we see that
$$\begin{aligned}\Big|\frac{1}{\sqrt T}\sum_{t=1}^T\Big(\frac{x_{it}}{\sqrt T}\Big)^j\sum_{k=1}^d\tau_k\big(t^{\hat\theta_k}-t^{\theta_k}\big)\Big| &= \Big|\frac{1}{\sqrt T}\sum_{k=1}^d\tau_k\sum_{t=1}^T\Big(\frac{x_{it}}{\sqrt T}\Big)^jt^{\theta_k}\big(t^{\hat\theta_k-\theta_k}-1\big)\Big| \le C\sum_{k=1}^d|\tau_k|\,\Big|T^{\theta_k+1/2}\big(\hat\theta_k-\theta_k\big)\Big|\,\frac1T\sum_{t=1}^T\Big|\frac{x_{it}}{\sqrt T}\Big|^j\Big(\frac tT\Big)^{\theta_k}\ln t\\ &\le C(\ln T)\sum_{k=1}^d|\tau_k|\,\Big|T^{\theta_k+1/2}\big(\hat\theta_k-\theta_k\big)\Big|\sqrt{\frac1T\sum_{t=1}^T\Big(\frac{x_{it}}{\sqrt T}\Big)^{2j}}\sqrt{\frac1T\sum_{t=1}^T\Big(\frac tT\Big)^{2\theta_L}}.\end{aligned} \tag{G.6}$$
From (G.5) and (G.6) we conclude that $\tilde D_{\theta,T}^{-1}\sum_{t=1}^Tz_t(\theta)\big(z_t(\hat\theta_T)-z_t(\theta)\big)'\begin{bmatrix}\tau\\ \phi\end{bmatrix} = O_p(\ln T)$.

(iv) Use $z_t(\hat\theta_T)-z_t(\theta) = \begin{bmatrix}d_t(\hat\theta_T)-d_t(\theta)\\ 0_{p\times1}\end{bmatrix}$ to obtain
$$\tilde D_{\theta,T}^{-1}\sum_{t=1}^T\big(z_t(\hat\theta_T)-z_t(\theta)\big)\big(z_t(\hat\theta_T)-z_t(\theta)\big)'\begin{bmatrix}\tau\\ \phi\end{bmatrix} = \begin{bmatrix}D_{d,T}(\theta)^{-1}\frac{1}{\sqrt T}\sum_{t=1}^T\big(d_t(\hat\theta_T)-d_t(\theta)\big)\big(d_t(\hat\theta_T)-d_t(\theta)\big)'\tau\\ 0_{p\times1}\end{bmatrix}.$$
For any $i\in\{1,2,\dots,d\}$, the norm of the $i$th component of the nonzero block is
$$\begin{aligned}\Big|\sum_{k=1}^d\frac{\tau_k}{T^{\theta_i+1/2}}\sum_{t=1}^T\big(t^{\hat\theta_i}-t^{\theta_i}\big)\big(t^{\hat\theta_k}-t^{\theta_k}\big)\Big| &\le \sum_{k=1}^d\frac{|\tau_k|}{T^{\theta_i+1/2}}\sum_{t=1}^Tt^{\theta_i+\theta_k}\big|t^{\hat\theta_i-\theta_i}-1\big|\,\big|t^{\hat\theta_k-\theta_k}-1\big| \le C\sum_{k=1}^d|\tau_k|\,\big|\hat\theta_i-\theta_i\big|\,\big|\hat\theta_k-\theta_k\big|\frac{1}{T^{\theta_i+1/2}}\sum_{t=1}^Tt^{\theta_i+\theta_k}(\ln t)^2\\ &\le C\frac{(\ln T)^2}{T^{\theta_L+1/2}}\Big|T^{\theta_i+1/2}\big(\hat\theta_i-\theta_i\big)\Big|\sum_{k=1}^d|\tau_k|\,\Big|T^{\theta_k+1/2}\big(\hat\theta_k-\theta_k\big)\Big|\Big(\frac1T\sum_{t=1}^T\Big(\frac tT\Big)^{\theta_i+\theta_k}\Big) = O_p\Big(\frac{(\ln T)^2}{T^{\theta_L+1/2}}\Big).\end{aligned}$$

(v) By similar steps as before, and invoking Theorem 2, it is easy to show that it suffices to bound $T^{-(\theta_i+1/2)}\sum_{t=1}^T\big(t^{\hat\theta_i}-t^{\theta_i}\big)\big(u_t-\Omega_{uv}\Omega_{vv}^{-1}v_t\big)$. Writing $u^+_t = u_t-\Omega_{uv}\Omega_{vv}^{-1}v_t$ (as before), we have
$$\begin{aligned}T^{-(\theta_i+1/2)}\sum_{t=1}^T\big(t^{\hat\theta_i}-t^{\theta_i}\big)u^+_t &= \frac{1}{\sqrt T}\sum_{t=1}^T\Big(\frac tT\Big)^{\theta_i}\big(t^{\hat\theta_i-\theta_i}-1\big)u^+_t = \big(\hat\theta_i-\theta_i\big)\frac{1}{\sqrt T}\sum_{t=1}^T(\ln t)\Big(\frac tT\Big)^{\theta_i}u^+_t + o_p(1)\\ &= T^{-(\theta_i+1/2)}\Big[T^{\theta_i+1/2}\big(\hat\theta_i-\theta_i\big)\Big]\frac{1}{\sqrt T}\sum_{t=1}^T\ln\Big(\frac tT\Big)\Big(\frac tT\Big)^{\theta_i}u^+_t + T^{-(\theta_i+1/2)}\Big[T^{\theta_i+1/2}\big(\hat\theta_i-\theta_i\big)\Big](\ln T)\frac{1}{\sqrt T}\sum_{t=1}^T\Big(\frac tT\Big)^{\theta_i}u^+_t + o_p(1)\\ &= \frac{O_p(1)}{T^{\theta_i+1/2}} + \frac{\ln T}{T^{\theta_i+1/2}}O_p(1) + o_p(1).\end{aligned}$$
This establishes (v). ∎

θ is unknown and has to be estimated.
An additional simulation study was conducted to verify this claim. That is, we extend the Monte Carlo study for testing $H_0: \phi = 0$ against $H_a: \phi \neq 0$, using $\rho = 0.50$ as in the last column of Table 2. For sample sizes as large as 15000, the empirical size of the feasible FMOLS estimator that relies on $\hat\theta_T$ fluctuates around 11% (Figure 5). This indeed points towards a lack of asymptotic validity. In contrast, FMOLS($\theta$) yields an empirical size close to 5%.

Figure 5: Empirical size of the feasible and infeasible FMOLS estimators, see the note for Table 2.

H Further Empirical Results

H.1 Unit Root Tests

Table 6: The t-statistics for the ADF and DF-GLS unit root tests. The columns with header 'const' and 'const & trend' refer to the inclusion of only an intercept or of both an intercept and a linear trend. Rejection of the unit root hypothesis at the 10% and 5% level is indicated with one and two stars, respectively.

            ADF                             DF-GLS
            const           const & trend   const           const & trend
Country          GDP     CO2     GDP     CO2     GDP     CO2     GDP     CO2
Australia      0.287  -2.549  -2.050  -1.986   2.046   1.379  -1.577  -0.732
Austria       -0.055  -2.118  -1.943  -2.738   1.478  -1.143  -1.655      …*
Belgium        0.153  -2.336  -1.705  -2.818   2.041  -0.794  -1.287  -2.644
Canada        -0.575  -1.133  -2.020  -1.120   1.117   0.874  -1.894  -0.387
Denmark       -0.235  -2.446  -2.326  -0.136   1.393   0.410  -1.505   0.084
Finland       -0.362  -1.327  -2.315      …*       …       …       …       …
France        -0.557  -2.438  -1.823  -1.858   1.087  -0.267  -1.470  -1.212
Germany       -0.374     …**  -2.767     …**       …       …       …       …
Norway        -0.680  -2.044  -2.064  -2.318   0.749   0.331  -1.017  -1.292
Portugal      -1.432  -0.455  -1.697  -1.676  -0.708   0.593  -0.741  -1.923
Spain          0.402  -1.243  -1.354  -1.994   1.487   0.959  -1.077  -2.014
Sweden        -0.789  -2.075  -2.289  -1.625   0.258   0.180  -1.513  -0.968
Switzerland   -1.093  -1.963  -2.785  -1.989   2.272   0.368  -2.447  -1.237
UK            -0.179  -0.721  -1.262  -0.402   2.446  -0.622  -0.608  -0.013
USA           -0.349  -2.055  -2.871  -1.322   2.409  -0.101      …*  -0.812

Note: ***, ** and * denote rejection of the null hypothesis at the 1%, 5% and 10% significance level, respectively.

H.2 Perron and Yabu (2009) Test for Deterministic Trend Coefficient

The Perron and Yabu (2009) test is used to test for the presence of a deterministic trend function in the log per capita GDP series, see Table 7. The test allows for integrated or stationary errors. The details of the procedure can be found on page 61 of Perron and Yabu (2009). The asymptotic distribution of this test statistic is standard normal (for example, $z_{0.975} = 1.96$).

Table 7: Perron and Yabu (2009) test statistic for each of the 18 countries.

Country       $\widehat{PY}$    Country       $\widehat{PY}$    Country       $\widehat{PY}$
Australia     3.17              France        2.41              Portugal      2.16
Austria       2.19              Germany       1.91              Spain         2.31
Belgium       3.52              Italy         2.11              Sweden        7.12
Canada        3.33              Japan         2.93              Switzerland   3.91
Denmark       5.58              Netherlands   2.27              UK            3.60
Finland       4.27              Norway        5.85              USA           4.12

H.3 Overviews for Austria and Finland

Figure 6: Overview graphs for Austria over 1870-2014. (a) log(GDP) versus log(CO2) (both per capita). (b) As subfigure (a) but using detrended variables. (c) The log per capita CO2 emissions time series for Austria. (d) The residual sum of squares (RSS) for the nonlinear model specification $y_t = \tau_1 + \tau_2 t + \phi_1 x_t + \phi_2 x_t^\theta + u_t$ for various values of $\theta$. (e) The RSS as a function of $\theta$ for the flexible nonlinear trend specification $y_t = \tau_1 + \tau_2 t + \tau_3 t^\theta + \phi_1 x_t + u_t$. (f) The relation between $x_t$ and $y_t$ after partialling out the constant, linear trend, and flexible deterministic trend.

Figure 7: Overview graphs for Finland over 1870-2014. (a) log(GDP) versus log(CO2) (both per capita). (b) As subfigure (a) but using detrended variables. (c) The log per capita CO2 emissions time series for Finland. (d) The residual sum of squares (RSS) for the nonlinear model specification $y_t = \tau_1 + \tau_2 t + \phi_1 x_t + \phi_2 x_t^\theta + u_t$ for various values of $\theta$. (e) The RSS as a function of $\theta$ for the flexible nonlinear trend specification $y_t = \tau_1 + \tau_2 t + \tau_3 t^\theta + \phi_1 x_t + u_t$. (f) The relation between $x_t$ and $y_t$ after partialling out the constant, linear trend, and flexible deterministic trend.

H.4 Residual Series for Models (M1)-(M4)

Figure 8: The residual series for each country under model specification (M1): $y_t = \tau_1 + \tau_2 t + \phi_1 x_t + \phi_2 x_t^2 + u_t$.

Figure 9: The residual series for each country under model specification (M2): $y_t = \tau_1 + \tau_2 t + \tau_3 t^2 + \phi_1 x_t + \phi_2 x_t^2 + u_t$.

Figure 10: The residual series for each country under model specification (M3): $y_t = \tau_1 + \tau_2 t + \tau_3 t^\theta + \phi_1 x_t + \phi_2 x_t^2 + u_t$.

Figure 11: The residual series for each country under model specification (M4): $y_t = \tau_1 + \tau_2 t + \tau_3 t^\theta + \phi_1 x_t + u_t$.