Inference without smoothing for large panels with cross-sectional and temporal dependence
JAVIER HIDALGO AND MARCIA SCHAFGANS
Abstract.
This paper addresses inference in large panel data models in the presence of both cross-sectional and temporal dependence of unknown form. We are interested in making inferences that do not rely on the choice of any smoothing parameter, as is the case with the often employed “HAC” estimator for the covariance matrix. To that end, we propose a cluster estimator for the asymptotic covariance of the estimators and valid bootstrap schemes that do not require the selection of a bandwidth or smoothing parameter and accommodate the nonparametric nature of both temporal and cross-sectional dependence. Our approach is based on the observation that the spectral representation of the fixed effect panel data model is such that the errors become approximately temporally uncorrelated. Our proposed bootstrap schemes can be viewed as wild bootstraps in the frequency domain. We present some Monte Carlo simulations to shed some light on the small sample performance of our inferential procedure.
JEL classification:
C12, C13, C23
Keywords:
Large panel data models. Cross-sectional strong dependence. Central Limit Theorems. Clustering. Discrete Fourier Transformation. Nonparametric bootstrap algorithms.

1. INTRODUCTION
Nowadays we often encounter panel data sets where both the number of individuals, n, and the time dimension, T, are large or increase without limit. Phillips and Moon (1999) and Pesaran and Yamagata (2008) provide some theoretical results for the parameter estimators in large panel data models, that is, where both n and T tend to infinity. These works were done under the assumption of no dependence among the cross-sectional units. Yet, it is well recognized that the latter assumption is not very realistic, and there has been a surge of work on how to provide valid inferences when this type of dependence is present. The issues are closely related to Zellner's (1962) SURE (Seemingly Unrelated Regression) model, be it that here both dimensions are allowed to increase without limit.

Once one accepts the possibility that the errors of the model may exhibit cross-sectional and/or temporal dependence, a key component to make valid inferences is the consistent estimation of the asymptotic covariance matrix of the estimators. For that purpose, we might proceed by explicitly assuming some specific dependence structure on the error term. In our context, this route appears to be quite cumbersome, mainly for two reasons. First, it is quite difficult to specify an appropriate model in the presence of cross-sectional dependence as there are ample generic models capable of justifying such a dependence. Some examples are the Spatial Autoregressive (
SAR) model of Cliff and Ord (1973), which has its origins in Whittle (1954), Andrews' (2005) proposal to capture common shocks (e.g., macroeconomic, technological, legal/institutional) across observations, and
We would like to thank Professor Serena Ng, Silvia Gonçalves, two anonymous referees, and participants at the Bristol ESG Meetings for their helpful comments.
Pesaran's (2006) factor model. Second, in many settings it may be quite unrealistic to assume that the temporal dependence is the same for all individuals, so finding a correct specification may be infeasible as n increases without limit. Inferential properties based on parameter estimates that use a specific (wrong) structure, moreover, may be worse than those based on the least squares estimates (LSE). The latter observation was first documented in Engle (1974) and later examined in Nicholls and Pagan (1977), who illustrated the adverse consequences of imposing incorrect temporal dependence assumptions on inferences, say when the practitioner assumes an AR(1) model instead of the true underlying AR(2) specification.

As the task of finding an appropriate model for the dependence can be very daunting, one of our main aims in this paper is then to provide inferences in panel data not only when the error term (potentially) exhibits both temporal and cross-sectional dependence, but more importantly doing so without relying on any parametric functional form for such a dependence. Under these circumstances, one standard methodology is based on the HAC estimator, whose implementation requires the choice of one (or more) bandwidth parameter(s). While this approach is often invoked and used in the context of time series regression models, application of spatial HAC estimators is less common. The use of HAC estimators in spatial econometrics was advocated by Conley (1999), and Kelejian and Prucha (2007) studied its use in Cliff-Ord type spatial models. Recently, a HAC estimator accounting for both temporal and spatial correlation has been considered by Kim and Sun (2013). The implementation requires not only the selection of a bandwidth parameter but, more importantly, an associated measure of distance between the cross-sectional units. This explicitly assumes that there is some type of ordering among the individuals or cross-sectional units which, in contrast with the time dimension, is not unambiguous.
Even if one accepts the existence of such an ordering, it is likely that various economic and/or geographical distance measures, each requiring their own bandwidth, may be required to encapsulate the order. For instance, simply relying on the geographic "as the crow flies" distance measure for ordering is questionable, as one cannot expect that two cross-sectional units located in the Rockies would behave the same as if they were in the Midwest. Clearly, a distance measure which captures the topography and other economic measures may be required. In addition, if we recognize that the temporal dependence may not be the same for all individuals, even the selection of a bandwidth parameter to account for the temporal dependence may become infeasible. Any cross-validation algorithm used to determine the bandwidth parameter for temporal dependence may then need to be performed for each individual.

To deal with the potential caveats of HAC estimators, we shall propose a cluster based estimator which is able to take into account both types of dependence and permits the temporal dependence to vary across individuals, see Condition C1 mentioned before. We avoid the use of the HAC methodology altogether. In addition, we provide a new CLT that accounts for an unknown and general temporal-spatial dependence structure that permits strong spatial dependence. Our approach allows for a more general dependence structure than permitted by Kim and Sun (2013) and Driscoll and Kraay (1998). Our new results can therefore be regarded as providing primitive conditions that guarantee Kim and Sun's and Driscoll and Kraay's assumption of the existence of a suitable CLT.

Footnote: In a time series regression model context several proposals, both in the time and frequency domain, have been employed, and bootstrap applications commonly approximate the long run covariance by using a long AR polynomial (sieve method). Other methods include the use of orthogonal polynomials, see, e.g., Sun (2013) and Phillips (2005), instead of the use of Fourier sequences. All of them have in common that they require the choice of a bandwidth parameter and/or base function. Lazarus et al. (2018) provide an interesting simulation study.

Our approach is based on the observation that the spectral representation of the fixed effect panel data model (2.1) is such that the errors become approximately temporally uncorrelated whilst heteroskedastic. It is this observation that enables us to conduct inference without any smoothing. To provide finite sample improvements for inference based on our cluster estimator, we present and examine bootstrap schemes which also do not require the choice of any bandwidth parameter, contrary to the sieve or moving block bootstrap (henceforth denoted MBB). Two bootstrap algorithms are presented: one where we assume homogeneous temporal dependence, which we shall denote as the naïve bootstrap, and a second one, denoted the wild bootstrap, where we allow for heterogeneous temporal dependence. Our bootstrap schemes can be viewed as wild bootstraps in the frequency domain which are shown to have good finite sample properties. We compare our proposal to other methods that also do not require any ordering of the cross-sectional units. In particular, we consider Driscoll and Kraay's HAC estimator and the fixed-b asymptotic framework advocated by Vogelsang (2012).
We also consider the MBB bootstrap applied to the vector containing all the individual observations at each point in time, as proposed by Gonçalves (2011).

While our estimator does permit more general spatio-temporal dependence and does not require any smoothing parameters, in line with Robinson (1989), the approach examined in Section 2 precludes the presence of conditional heteroskedasticity. In Section 4, we examine how we can relax this by introducing a multiplicative error structure, v_pt = σ(w_p) σ(ϱ_t) u_pt, where w_p and ϱ_t can be functions of the fixed effects and/or variables which are correlated with the included regressors. It is worth noting that we do not need to observe these variables, as is the case when w_p, say, is the fixed effect. That is, we can allow for "groupwise" heteroskedasticity, and applications in development economics are commonplace, see Deaton (1996) and Greene (2018). Of particular interest, here, is the realisation that our cluster based inference is robust to the presence of heteroskedasticity that is only cross-sectional in nature (i.e., where σ(ϱ_t) is constant). In the presence of a non-constant σ(ϱ_t), we propose a simple way to robustify our cluster based inference. Whereas more general forms of heteroskedasticity, where v_pt = σ(x_pt) u_pt, can be permitted, their implementation would require the use of nonparametric methods, which would require the selection of a bandwidth parameter to estimate the heteroskedasticity function. We shall indicate how we should proceed if this were the case. Finally, a benefit of our estimator is that it permits the temporal dependence to vary across individuals, which is more realistic. It is important to point out that the MBB would not be valid in these settings as it depends on some type of temporal homogeneity or even stationarity.

The remainder of the paper is organized as follows. In the next section we discuss the regularity conditions for our model and describe the main results.
In Section 3 we introduce our bandwidth parameter free bootstrap schemes and we demonstrate their validity. Section 4 discusses a generalisation of our model that permits (conditional) heteroskedasticity. Section 5 presents a Monte Carlo simulation experiment to shed some light on the finite sample performance of our cluster estimator and its comparison to others, and we illustrate the finite sample benefits of our bootstrap schemes. In Section 6 we summarize. The proofs of our main results are given in Appendix A, which employs a series of lemmas given in Appendix B.
2. THE REGULARITY CONDITIONS AND MAIN RESULTS
We shall begin by considering the panel data model

y_pt = β′x_pt + η_p + α_t + u_pt, p = 1, ..., n, t = 1, ..., T,   (2.1)

where β is a k × 1 vector of unknown parameters, x_pt is a k × 1 vector of covariates, α_t and η_p represent respectively the time and individual fixed effects, and {u_pt}_{t∈Z}, p ∈ N+, are sequences of zero mean errors with heterogeneous variance E(u_pt²) = σ_p², p ∈ N+. We allow for general (unknown) temporal and cross-sectional dependence structures of the sequence {u_pt}_{t∈Z}, p ∈ N+, detailed in Condition C1, and of the covariates {x_pt}_{t∈Z}, p ∈ N+, detailed in Condition C2. Further details are provided in our discussion of these conditions below. For simplicity, we shall assume that the sequences {x_pt}_{t∈Z}, p ∈ N+, are mutually independent of the error term {u_pt}_{t∈Z}, p ∈ N+, whilst allowing for dependence of the covariates with the fixed effects η_p and/or α_t. In Section 4, we shall relax this condition, allowing for heteroskedasticity. A straightforward extension that allows for lagged endogenous variables {y_{p,t−ℓ}}_{ℓ=1}^{k}, as in Hidalgo and Schafgans (2017), requires the use of the instrumental variable estimator, where {x_{p,t−ℓ}}_{ℓ=1}^{k} provide natural instruments for {y_{p,t−ℓ}}_{ℓ=1}^{k}. We have avoided this generalization as it would detract from the main contribution of the paper and it would only add some extra technicalities and/or considerations which are well known and understood when n = 1.

Our first aim in the paper is to perform inference on the slope parameters β in the presence of a very general and unknown spatio-temporal dependence structure. To that end, we first need to extend a Central Limit Theorem provided in Phillips and Moon (1999), see also Hahn and Kuersteiner (2002).
The reason for this is that in their work the sequences of random variables, say {ψ_pt}_{t∈Z}, p ∈ N+, are assumed to be independent, that is, {ψ_pt}_{t∈Z} and {ψ_qt}_{t∈Z} are mutually independent for any p ≠ q, which is ruled out in our context as we permit cross-sectional dependence. Moreover, as we shall allow for "strong-dependence" in our error and regressor sequences, we cannot use results and arguments based on any type of "strong-mixing" conditions, so that results in Jenish and Prucha (2009) are not applicable, all the more so as we allow u_pt to exhibit temporal dependence. A second aim of the paper is to extend the work of Driscoll and Kraay (1998) by examining, in the presence of individual and temporal fixed effects, a cluster estimator of the asymptotic covariance of the estimator of the slope parameters that does not require the ordering of the observations (in the cross-sectional dimension) or the selection of a bandwidth parameter.

Footnote: In fact, all that is needed is that the first conditional moment of the error is zero and the second conditional moment is equal to the unconditional one.
The fixed effect model and the estimator for the slope parameters we consider is well known. Denoting for any generic sequence {ς_pt}_{t=1}^{T}, p = 1, ..., n, the transformation

ς̃_pt = ς_pt − ς̄_·t − ς̄_p· + ς̄_··;   (2.2)

ς̄_·t = (1/n) Σ_{p=1}^{n} ς_pt; ς̄_p· = (1/T) Σ_{t=1}^{T} ς_pt; and ς̄_·· = (1/nT) Σ_{t=1}^{T} Σ_{p=1}^{n} ς_pt,

the estimator of β is obtained by performing least squares on the transformed model (where the individual and time effects are removed)

ỹ_pt = β′x̃_pt + ũ_pt, p = 1, ..., n and t = 1, ..., T,   (2.3)

so that β̂ is defined as

β̂ = ( Σ_{p=1}^{n} Σ_{t=1}^{T} x̃_pt x̃′_pt )^{−1} Σ_{p=1}^{n} Σ_{t=1}^{T} x̃_pt ỹ_pt.   (2.4)

It is obvious that we can take E x_pt = 0, as x̃_pt is invariant to additive constants, say µ_t or ν_p, to x_pt.

In this paper, we shall focus on an equivalent frequency domain formulation of (2.1) and (2.3). It is the application of the Discrete Fourier Transform (DFT) to our model, as will become clear shortly, that plays an important role in describing and motivating the cluster estimator of the asymptotic covariance matrix of β̂, or equivalently β̃ given in (2.7) below, and the bootstrap schemes described in Section 3. For this purpose, we denote the DFT for generic sequences {ς_pt}_{t=1}^{T}, p ≥
1, by

J_{ς,p}(λ_j) = T^{−1/2} Σ_{t=1}^{T} ς_pt e^{−itλ_j}, j = 1, ..., T̃ = [T/2], λ_j = 2πj/T,   (2.5)

and J_{ς,p}(λ_j) = J_{ς,p}(−λ_{T−j}), j = T̃ + 1, ..., T. We can then rewrite (2.3) as

J_{ỹ,p}(λ_j) = β′J_{x̃,p}(λ_j) + J_{ũ,p}(λ_j), p = 1, ..., n; j = 1, ..., T − 1.   (2.6)

Given that our sequences {ς̃_pt}_{t=1}^{T}, p ≥ 1, are centered around their sample means, we can leave out the frequency λ_j for j = T (and 0) as J_{ς̃,p}(0) = T^{−1/2} Σ_{t=1}^{T} ς̃_pt = 0. The interesting property of J_{ũ,p}(λ_j), j = 1, ..., T − 1, that allows us to formulate our new cluster estimator that accounts for both types of dependence, is that it is serially uncorrelated over the Fourier frequencies for large T, whilst possibly heteroskedastic. Based on the frequency domain formulation of our model (2.6), we can also compute our estimator of β as

β̃ = ( Σ_{p=1}^{n} Σ_{j=1}^{T−1} J_{x̃,p}(λ_j) J′_{x̃,p}(−λ_j) )^{−1} Σ_{p=1}^{n} Σ_{j=1}^{T−1} J_{x̃,p}(λ_j) J_{ỹ,p}(−λ_j).   (2.7)

We introduce our regularity conditions next. To that end, and in what follows, we denote for any generic sequence {v_pt}_{t∈Z}, p ∈ N,

ϕ_v(p, q) = Cov(v_pt; v_qt), for any p, q ≥ 1.
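As a numerical sanity check of the two formulations, the sketch below (our own illustration, not part of the paper; the AR(1) design and all variable names are ours) verifies on simulated data that the within estimator (2.4) and its frequency domain counterpart (2.7) coincide, which follows from Parseval's identity once the zero frequency of the demeaned data is dropped.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, beta = 30, 64, 1.5          # scalar regressor, k = 1

x = rng.standard_normal((n, T))   # illustrative DGP with AR(1) errors
u = np.zeros((n, T)); e = rng.standard_normal((n, T))
for t in range(1, T):
    u[:, t] = 0.5 * u[:, t - 1] + e[:, t]
y = beta * x + rng.standard_normal((n, 1)) + rng.standard_normal((1, T)) + u

def within(z):
    # two-way transformation (2.2): z_pt - z_.t - z_p. + z_..
    return z - z.mean(0, keepdims=True) - z.mean(1, keepdims=True) + z.mean()

xt, yt = within(x), within(y)
beta_hat = (xt * yt).sum() / (xt * xt).sum()       # time-domain LSE (2.4)

Jx = np.fft.fft(xt, axis=1) / np.sqrt(T)           # DFTs (2.5); J(0) = 0 after demeaning
Jy = np.fft.fft(yt, axis=1) / np.sqrt(T)
num = (Jx[:, 1:] * np.conj(Jy[:, 1:])).sum().real  # frequencies j = 1, ..., T-1
den = (Jx[:, 1:] * np.conj(Jx[:, 1:])).sum().real
beta_tilde = num / den                             # frequency-domain LSE (2.7)

assert abs(beta_hat - beta_tilde) < 1e-8           # identical by Parseval's identity
```

The agreement is exact up to floating-point error, since the only discarded frequency carries no mass for within-transformed data.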
Condition C1: {u_pt}_{t∈Z}, p ∈ N+, are zero mean sequences of random variables such that:

(i) u_pt = Σ_{k=0}^{∞} d_k(p) ξ_{p,t−k}, with Σ_{k=0}^{∞} k d_k < ∞, d_k =: sup_p |d_k(p)|, where E(ξ_pt | V_{p,t−1}) = 0, E(ξ_pt² | V_{p,t−1}) = σ_{ξ,p}², and the ξ_pt have finite fourth moments, with V_{p,t} denoting the σ-algebra generated by {ξ_ps, s ≤ t}.

(ii) For all t ∈ Z and p ∈ N+, ξ_pt = Σ_{ℓ=1}^{∞} a_ℓ(p) ε_ℓt, with sup_{p∈N+} Σ_{ℓ=1}^{∞} |a_ℓ(p)| < ∞ and sup_{ℓ≥1} Σ_{p=1}^{n} |a_ℓ(p)| < ∞, where the sequences {ε_ℓt}_{t∈Z}, ℓ ∈ N+, are zero mean independent identically distributed (iid) random variables.

(iii) The fourth cumulant of {u_pt}_{t∈Z}, p ∈ N+, satisfies

lim_{T→∞} sup_{p∈N+} Σ_{t1,t2,t3=1}^{T} |Cum(u_{pt1}; u_{pt2}; u_{pt3}; u_{p0})| < ∞.

Condition C2: {x_pt}_{t∈Z}, p ∈ N+, are sequences of random variables such that:

(i) x_pt = Σ_{k=0}^{∞} c_k(p) χ_{p,t−k}, with Σ_{k=0}^{∞} k c_k < ∞, c_k =: sup_p ‖c_k(p)‖, where ‖B‖ denotes the norm of the matrix B, E(χ_pt | Υ_{p,t−1}) = 0, Cov(χ_pt | Υ_{p,t−1}) = Σ_{χ,p} and E‖χ_pt‖⁴ < ∞, with Υ_{p,t} denoting the σ-algebra generated by {χ_ps, s ≤ t}.

(ii) The sequences of random variables {χ_pt}_{t∈Z}, p ∈ N+, are such that χ_pt = Σ_{ℓ=1}^{∞} b_ℓ(p) η_ℓt, with sup_{p∈N+} Σ_{ℓ=1}^{∞} |b_ℓ(p)| < ∞ and sup_{ℓ≥1} Σ_{p=1}^{n} |b_ℓ(p)| < ∞, where the sequences {η_ℓt}_{t∈Z}, ℓ ∈ N+, are zero mean iid random variables.

(iii) Denoting Σ_{x,p} = E(x_pt x′_pt), we have

0 < Σ_x = lim_{n→∞} (1/n) Σ_{p=1}^{n} Σ_{x,p}   (2.8)

and the fourth cumulant of {x_pt}_{t∈Z}, p ∈ N+, satisfies

lim_{T→∞} sup_{p∈N+} Σ_{t1,t2,t3=1}^{T} |Cum(x_{pt1,a}; x_{pt2,b}; x_{pt3,c}; x_{p0,d})| < ∞, a, b, c, d = 1, ..., k,

where x_pt,a denotes the a-th element of x_pt.

Condition C3:
For all p ∈ N+, the sequences {u_pt}_{t∈Z} and {x_pt}_{t∈Z} are mutually independent and

0 < max_{1≤p≤n} Σ_{q=1}^{n} ‖ϕ(p, q)‖ < ∞,   (2.9)

where ϕ(p, q) := ϕ_u(p, q) ϕ_x(p, q).

We now comment on our conditions. Conditions C1 and C2 imply that {u_pt}_{t∈Z} and {x_pt}_{t∈Z}, p ∈ N+, are linear processes that permit the usual SAR (or more generally
SARMA) model.
Indeed, by definition of the
SAR model, with W a spatial weight matrix, we have

u = (I − ωW)^{−1} ε = (I + Ξ) ε, Ξ = (ψ_q(p))_{p,q=1}^{n},

so that u_p = ε_p + Σ_{q=1}^{n} ψ_q(p) ε_q, which implies that the SAR model satisfies Condition C1. Unlike the SAR model, Condition C1(ii) does not allow Σ_{p=1}^{n} |a_ℓ(p)| to grow with n. One can allow the weights a_ℓ(p) to depend on the sample size "n", as is often done in SAR models with weight matrices W row-normalized, but it does not add anything significant. Our conditions, therefore, appear to be weaker than those typically assumed when cross-sectional dependence is allowed, while being similar to those of Lee and Robinson (2013). As the sequences may exhibit long memory spatial dependence, the condition of strong mixing for the spatial dependence in Jenish and Prucha (2012) is ruled out. This appears to be the case as Ibragimov and Rozanov (1978) showed: if the sequence {γ_{u,pq}(j)}_{j∈Z} is not summable, the process {u_pt}_{t∈Z}, p ∈ N+, cannot be strong-mixing. The long memory dependence also rules out that the process is Near Epoch Dependent with size greater than 1/2, which appears to be a necessary condition for standard asymptotic CLT results.

Conditions C1 and C2 also restrict the temporal dependence of the sequences {x_pt}_{t∈Z} and {u_pt}_{t∈Z} for each p. Even though there are several results available allowing their temporal dependence to exhibit long memory, see Robinson and Hidalgo (1997) or Hidalgo (2003), we have decided to assume the temporal dependence of the regressors and errors to be weakly dependent to simplify the arguments. It is worth pointing out that our Conditions C1(i) and C2(i) can be relaxed to some extent to allow some type of mixing condition such as L2-Near Epoch Dependence with size greater than or equal to 2. The latter condition is often invoked when we allow the errors to have a nonlinear type of dependence structure or if (2.1) were replaced by a nonlinear panel data model y_pt = g(x_pt; β) + η_p + α_t + u_pt, p = 1, ..., n, t = 1, ..., T. In fact, we expect the conclusions of our results to hold under such a mixing condition, as has been shown in numerous situations. Conditions C1 and C2 allow the (conditional) variances E(ξ_pt² | V_{p,t−1}) = σ_{ξ,p}² and Cov(χ_pt | Υ_{p,t−1}) = Σ_{χ,p} to differ across individuals. This follows from our conditions because, for instance, E(ξ_pt²) = σ_ε² Σ_{ℓ=1}^{∞} a_ℓ²(p) clearly depends on p. Furthermore, we allow for some trending behaviour of the sequences {x_pt}_{t∈Z}, p ∈ N+, as we allow the mean of x_pt to depend on time.

An important consequence of Conditions C1 and C2 is that the covariance structure of the sequences {u_pt}_{t∈Z} and {x_pt}_{t∈Z}, p ∈ N+, is multiplicative. For instance,
Condition C1 implies that, for all p, q ∈ N+,

E(u_pt u_qs) = E( Σ_{k=0}^{∞} d_k(p) ξ_{p,t−k} Σ_{ℓ=0}^{∞} d_ℓ(q) ξ_{q,s−ℓ} )

= E(ξ_p ξ_q) [ Σ_{ℓ=0}^{∞} d_{t−s+ℓ}(p) d_ℓ(q) 1(t > s) + Σ_{ℓ=0}^{∞} d_ℓ(p) d_{s−t+ℓ}(q) 1(t ≤ s) ]   (2.10)

= ϕ_u(p, q) γ_{u;pq}(t − s).

Following the spatio-temporal literature, see Cressie and Huang (1999), we can denote this covariance structure as separable. Of course, there are nonseparable covariance structures, see Gneiting (2002), and tests for separability are available, see Fuentes (2006) or Matsuda and Yajima (2004). Notice that in the absence of cross-sectional dependence, E(ξ_p ξ_q) = σ_{ξ,p}² 1(p = q) and E(u_pt u_qs) = σ_{ξ,p}² γ_{u;pp}(t − s) 1(p = q). Here, and in what follows, 1(A) denotes the indicator function.

Remark 1.
The condition sup_{p∈N+} Σ_{ℓ=0}^{∞} |a_ℓ(p)| < ∞ guarantees that for any reordering of the sequence {|a_ℓ(p)|}_{ℓ∈N+}, say {|a_{ℓ(τ)}(p)|}_{ℓ(τ)∈N+}, we have that a_{ℓ(τ)}(p) = O(ℓ(τ)^{−ζ}) for some ζ > 1. Similarly, the requirement sup_{ℓ≥1} Σ_{p=1}^{n} |a_ℓ(p)| < ∞ will mean that a_ℓ(p) = O(p^{−ζ}) for some ζ > 1, uniformly in ℓ ≥ 1. Similar arguments follow for {|b_ℓ(p)|}_{ℓ∈N+}, p ≥ 1.

Condition C3 imposes that the sequences {x_pt}_{t∈Z} and {u_pt}_{t∈Z}, p ∈ N+, are independent, although we envisage that it can be relaxed to require only conditional independence in first and second moments. To simplify the arguments somewhat, we have preferred to keep the condition as it stands. Even though we allow long memory spatial dependence of the individual sequences, the absolute summability requirement in (2.9) limits the combined cross-sectional dependence; that is, the dependence of the sequence {z_pt = u_pt x_pt}_{t∈Z}, p ∈ N+, is "weakly spatially dependent", see also Hidalgo and Schafgans (2017). We have adopted the convention that γ_{u;pp}(t − s) = E(u_pt u_ps)/ϕ_u(p, p). Importantly, as we assume that the errors and regressors are uncorrelated, the spectral density matrix of the sequences {z_pt =: u_pt x_pt}_{t∈Z}, p ∈ N+, is given by the convolution of the spectral density matrix of {x_pt}_{t∈Z} and the spectral density function of {u_pt}_{t∈Z}, that is,

f_p(λ) =: ∫_{−π}^{π} f_{u,p}(υ) f_{x,p}(λ − υ) dυ, p ∈ N+,   (2.11)

where Conditions C1 and C2 imply that f_p(λ) is twice continuously differentiable.
By Fuller's (1996) Theorem 3.4.1, or Corollary 3.4.1.2, the Fourier coefficients of f_p(λ) are given by γ_p(j) = γ_{x,p}(j) γ_{u,p}(j), p ∈ N+, so that

sup_{p,q=1,...,n} Σ_{ℓ=−∞}^{∞} ‖γ_pq(ℓ)‖ < ∞; Cov(z_pt; z_qs) = γ_pq(t − s) ϕ(p, q).

With the convention that γ_{u,pq}(0) = γ_{x,pq}(0) = 1, Cov(z_pt, z_qt) = ϕ(p, q) =: ϕ_u(p, q) ϕ_x(p, q) as defined in Condition C3.
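The separable structure (2.10) can be checked by simulation. The following sketch (ours; the geometric weights d_k(p) = ρ_p^k and the uniform loadings a_ℓ(p) are hypothetical choices) simulates the doubly linear process of Conditions C1(i)-(ii), truncating the MA(∞) at K lags, and compares a Monte Carlo estimate of E(u_{p,t} u_{q,s}) at a fixed lag with the product form E(ξ_p ξ_q) Σ_ℓ d_{t−s+ℓ}(p) d_ℓ(q).

```python
import numpy as np

rng = np.random.default_rng(1)
n, K, L, R = 4, 40, 6, 20000          # individuals, MA truncation, factors, MC draws
t1, t0 = 59, 57                       # two time points, lag h = t1 - t0 = 2
T = t1 + 1

rho = np.array([0.2, 0.4, 0.6, 0.8])
d = rho[:, None] ** np.arange(K)[None, :]   # d_k(p) = rho_p^k, shape (n, K)
a = rng.uniform(0.2, 1.0, (n, L))           # cross-sectional loadings a_l(p)

eps = rng.standard_normal((R, L, T))        # iid factors eps_lt, one panel per draw
xi = np.einsum('pl,rlt->rpt', a, eps)       # xi_pt = sum_l a_l(p) eps_lt
# u_pt = sum_{k<K} d_k(p) xi_{p,t-k}, evaluated at the two time points
u1 = np.einsum('pk,rpk->rp', d, xi[:, :, t1 - np.arange(K)])
u0 = np.einsum('pk,rpk->rp', d, xi[:, :, t0 - np.arange(K)])

mc_cov = (u1[:, :, None] * u0[:, None, :]).mean(0)   # MC estimate of E(u_{p,t1} u_{q,t0})

h = t1 - t0
temporal = np.array([[(d[p, h:] * d[q, :K - h]).sum() for q in range(n)]
                     for p in range(n)])
analytic = (a @ a.T) * temporal             # E(xi_p xi_q) * sum_l d_{h+l}(p) d_l(q)

err = np.abs(mc_cov - analytic).max() / np.abs(analytic).max()
assert err < 0.1                            # Monte Carlo matches the product form (2.10)
```

The cross-sectional factor E(ξ_p ξ_q) = Σ_ℓ a_ℓ(p) a_ℓ(q) multiplies a purely temporal factor, which is exactly the separability in (2.10).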
Remark 2.
It is worth noticing that (2.9) ensures that ϕ(p, q) = O(q^{−1−δ}) or ϕ(p, q) = O(p^{−1−δ}) for some δ > 0, so that

lim_{n→∞} (1/n) Σ_{p,q=1}^{n} ϕ(p, q) < ∞.

The latter displayed expression can be regarded as a type of weak dependence in the cross-sectional dimension, see also Robinson (2011) or Lee and Robinson (2013). In addition, the ergodicity in second mean, that is,

(1/n²) Σ_{p,q=1}^{n} (ϕ_u(p, q) + ϕ_x(p, q)) = o(1),

implies that ϕ_u(p, q) = O(q^{−ς_u}) and ϕ_x(p, q) = O(q^{−ς_x}) such that ς_u + ς_x = 1 + δ > 1.

Conditions C1–C3, therefore, imply that the "average" long-run variance of the sequences {z_pt =: u_pt x_pt}_{t∈Z}, p ∈ N+, is given by

Φ =: 2π lim_{n→∞} (1/n) Σ_{p,q=1}^{n} f_pq(0) ϕ(p, q) < ∞,   (2.12)

where 2π f_pq(0) = Σ_{ℓ=−∞}^{∞} γ_pq(ℓ). Observe that standard algebra yields that

Φ =: lim_{n→∞} lim_{T→∞} (1/nT) E[ ( Σ_{p=1}^{n} Σ_{t=1}^{T} x_pt u_pt ) ( Σ_{p=1}^{n} Σ_{t=1}^{T} x′_pt u_pt ) ]

= lim_{n→∞} lim_{T→∞} (1/nT) Σ_{p,q=1}^{n} Σ_{t,s=1}^{T} E(x_pt x′_qs) E(u_pt u_qs),   (2.13)

or, using its spectral domain formulation,

Φ = lim_{n→∞} lim_{T→∞} (1/nT) E[ ( Σ_{j=1}^{T−1} Σ_{p=1}^{n} J_{x,p}(λ_j) J_{u,p}(−λ_j) ) ( Σ_{j=1}^{T−1} Σ_{p=1}^{n} J′_{x,p}(−λ_j) J_{u,p}(λ_j) ) ]

= lim_{n→∞} lim_{T→∞} (1/nT) Σ_{j=1}^{T−1} Σ_{p,q=1}^{n} E(J_{x,p}(λ_j) J′_{x,q}(−λ_j)) E(J_{u,p}(−λ_j) J_{u,q}(λ_j)).   (2.14)

Finally, we denote

V = Σ_x^{−1} Φ Σ_x^{−1},   (2.15)

where Σ_x > 0 by Condition C2.

Theorem 1.
Under Conditions C1–C3, we have that, as n, T → ∞,

(nT)^{1/2} (β̃ − β) →_d N(0, V).

Proof. The proof of this result, based on either the time or frequency domain formulation, will be given in Appendix A. All other proofs are relegated to this appendix as well. □
Remark 3.
While the result could be shown to hold with finite n, a setting considered by Robinson (1998), the presence of the time fixed effect would require special attention, since the dependence structures of u_pt and n^{−1} Σ_{p=1}^{n} u_pt are not quite the same when n is finite.

With V defined in (2.15), Theorem 1 indicates that to make inferences on β, we need to provide a consistent estimator of Φ. A first glance at (2.13) or (2.14) suggests that this might be complicated or computationally burdensome due to the general spatio-temporal dependence structure of the data. As we pointed out in the introduction, the standard approach to deal with such dependence, that is, a HAC type of estimator, has various potential drawbacks in the presence of cross-sectional dependence. While choosing a bandwidth parameter associated with the cross-sectional dependence requires or induces an artificial and/or nontrivial ordering, the presence of individual heterogeneous temporal dependence (as assumed in Conditions C1 and C
2) would even render any cross validation method used to choose the temporal bandwidth parameter intractable. While Kim and Sun's (2013) approach is subject to both these criticisms, Driscoll and Kraay (1998) avoid the need to specify an ordering of individuals by introducing a HAC estimator of cross-sectional averages, so that one can consider their estimator as a hybrid between a HAC and a cluster one: they employ the HAC methodology to deal with the temporal dependence, whereas they employ a cluster type of estimator to account for the cross-sectional dependence. We advocate an approach that does not require any ordering and/or selection of a bandwidth parameter, permits a more general spatio-temporal dependence than allowed by either Driscoll and Kraay (1998) or Kim and Sun (2013), and permits the cross-sectional dependence to be "long-memory", which the latter work ruled out. Moreover, our approach permits the temporal dependence to be heterogeneous across individuals, which is more realistic.

Our approach can be regarded as a natural extension of the earlier work by Robinson (1998) on inference without smoothing in a time series regression model context. In his case, abstracting from cross-sectional dependence,

Φ =: lim_{n→∞} (2π/n) Σ_{p=1}^{n} f_pp(0).

Applying his estimator to our model would yield the estimator

(2π/n) Σ_{p=1}^{n} (1/T) Σ_{j=1}^{T} I_{u,p}(λ_j) I_{x,p}(−λ_j) = (1/n) Σ_{p=1}^{n} Σ_{ℓ=−T+1}^{T−1} γ̂_{x,p}(ℓ) γ̂_{u,p}(ℓ),   (2.16)

where γ̂_{x,p}(j) and γ̂_{u,p}(j) are respectively the standard sample moment estimators of γ_{x,p}(j) and γ_{u,p}(j), and I_{u,p}(λ) = T^{−1} ( Σ_{t=1}^{T} u_pt e^{itλ} )( Σ_{t=1}^{T} u_pt e^{−itλ} )′, with I_{x,p}(λ) similarly defined. When cross-sectional dependence is allowed, the latter arguments suggest that (2.16) is not a consistent (cluster) estimator of Φ.
The reason for this (see also the proof of Proposition 1 below) is that

(1/n) Σ_{p=1}^{n} Σ_{ℓ=−T+1}^{T−1} γ_{x,p}(ℓ) γ_{u,p}(ℓ) ↛ Φ,

as expected, since the first moment of (2.16) does not capture the cross-sectional dependence. The purpose of the next section is therefore to provide a consistent "cluster" estimator of Φ that accounts for the presence of cross-sectional dependence.
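The point can be seen at the population level. In the sketch below (our example; for transparency we take z_pt serially uncorrelated, so only lag-zero terms matter, and posit a hypothetical geometrically decaying cross-sectional covariance), the per-individual sum (2.16) converges to the trace term (1/n) Σ_p ϕ(p, p) and misses the off-diagonal mass that Φ = (1/n) Σ_{p,q} ϕ(p, q) retains.

```python
import numpy as np

n, rho_u, rho_x = 50, 0.6, 0.2
idx = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
phi_u = rho_u ** idx          # Cov(u_pt, u_qt): decaying in |p - q| (summable, as in (2.9))
phi_x = rho_x ** idx          # Cov(x_pt, x_qt)
phi = phi_u * phi_x           # Cov(z_pt, z_qt) for z_pt = x_pt u_pt

Phi = phi.sum() / n                  # (1/n) sum_{p,q} phi(p,q): the target
Phi_no_cross = np.trace(phi) / n     # probability limit of the per-individual sum here

assert Phi > Phi_no_cross            # (2.16) understates the long-run variance
```

Here Phi_no_cross equals 1 by construction, while Phi is roughly 27% larger, so standard errors built from (2.16) would be too small whenever the cross-sectional covariances are positive.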
Cluster estimator of Φ.

We shall present a simple cluster estimator of Φ using the "frequency" domain methodology. Obviously, there is a time domain analogue, which we shall briefly describe at the end of the section. Our cluster estimator appears to be the first one which permits both time and cross-sectional dependence and gives a formal justification of its statistical properties. Our estimator therefore becomes an extension of previous cluster estimators in the literature, such as that in Arellano (1987) (where only temporal dependence is present) or Bester, Conley and Hansen (2011) (where only cross-sectional dependence is present).

Our main motivation to propose a cluster estimator using the frequency domain methodology comes from the well known observation that for all j ≠ k, J_{u,p}(λ_j) and J_{u,q}(λ_k) can be considered as being uncorrelated although possibly heteroskedastic. This observation was employed in the landmark paper by Hannan (1963) on adaptive estimation in a time series regression model. The fact that we may therefore consider J_{x̃,p}(λ_j) J_{u,p}(−λ_j) as a sequence of uncorrelated and heteroskedastic random variables in j, although not in p, suggests that, in a spirit similar to White's (1980) estimator, we may estimate Φ by

Φ̆ = (1/T) Σ_{j=1}^{T−1} ( n^{−1/2} Σ_{p=1}^{n} J_{x̃,p}(λ_j) J_{û,p}(−λ_j) )( n^{−1/2} Σ_{p=1}^{n} J′_{x̃,p}(−λ_j) J_{û,p}(λ_j) ).   (2.17)

Based on the DFT formulation, we denote the estimator of Σ_x by

Σ̃_x = (1/nT) Σ_{j=1}^{T−1} Σ_{p=1}^{n} J_{x̃,p}(λ_j) J′_{x̃,p}(−λ_j).

The following proposition establishes the consistency of our cluster estimator for the "average" long-run variance of the sequences {z_pt =: u_pt x_pt}_{t∈Z}, p ∈ N+.

Proposition 1.
Under the conditions of Theorem 1, we have that (a) Φ̆ − Φ = o_p(1); (b) Σ̃_x − Σ_x = o_p(1).

Denoting V̂ =: Σ̃_x^{−1} Φ̆ Σ̃_x^{−1}, we now obtain the following corollary.

Corollary 1.
Under the conditions of Theorem 1, we have that

(nT)^{1/2} V̂^{−1/2} (β̃ − β) →_d N(0, I).

Proof. The proof is standard from Theorem 1 and Proposition 1, and is therefore omitted. □
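For concreteness, here is a minimal sketch of Φ̆ in (2.17), Σ̃_x and V̂ for a scalar regressor (k = 1); the DGP, with AR(1) temporal dependence and a spatial MA(1) across neighbouring units, is our illustrative choice and not from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
n, T, beta = 40, 128, 1.0

# illustrative DGP: AR(1) in time, spatial MA(1) across neighbouring units
x = rng.standard_normal((n, T))
w = np.zeros((n, T)); e = rng.standard_normal((n, T))
for t in range(1, T):
    w[:, t] = 0.5 * w[:, t - 1] + e[:, t]
u = w + 0.6 * np.roll(w, 1, axis=0)
y = beta * x + rng.standard_normal((n, 1)) + rng.standard_normal((1, T)) + u

def within(z):
    return z - z.mean(0, keepdims=True) - z.mean(1, keepdims=True) + z.mean()

xt, yt = within(x), within(y)
b = (xt * yt).sum() / (xt * xt).sum()             # within LSE of beta
res = yt - b * xt                                 # residuals u-hat

Jx = np.fft.fft(xt, axis=1)[:, 1:] / np.sqrt(T)   # DFTs at j = 1, ..., T-1
Ju = np.fft.fft(res, axis=1)[:, 1:] / np.sqrt(T)

S = (Jx * np.conj(Ju)).sum(0) / np.sqrt(n)        # n^{-1/2} sum_p J_x,p(l_j) J_u,p(-l_j)
Phi_breve = (np.abs(S) ** 2).sum() / T            # cluster estimator (2.17)
Sigma_x = (np.abs(Jx) ** 2).sum() / (n * T)       # DFT estimator of Sigma_x
V_hat = Phi_breve / Sigma_x ** 2

t_stat = np.sqrt(n * T) * (b - beta) / np.sqrt(V_hat)
```

Under Theorem 1 and Corollary 1 the studentized statistic t_stat is approximately standard normal, so it can be compared with N(0, 1) critical values without choosing any bandwidth or ordering the units.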
We now describe the time domain analogue estimator of Φ. For that purpose, using Σ_{t=1}^{T} e^{itλ_ℓ} = 0 if 1 ≤ ℓ ≤ T − 1, we have after standard algebra that

Φ̆ = (1/n) Σ_{p,q=1}^{n} Σ_{|ℓ|=0}^{T−1} γ̂_{x,pq}(ℓ) γ̂_{u,pq}(ℓ),

where, due to (2.2),

γ̂_{x,pq}(ℓ) = (1/T) Σ_{t=1}^{T−|ℓ|} x̃_pt x̃′_{q,t+ℓ}; γ̂_{u,pq}(ℓ) = (1/T) Σ_{t=1}^{T−|ℓ|} û_pt û_{q,t+ℓ} 1(ℓ ≥ 0) + (1/T) Σ_{t=1}^{T−|ℓ|} û_qt û_{p,t−ℓ} 1(ℓ < 0),

and û_pt = ỹ_pt − β̃′x̃_pt, p = 1, ..., n; t = 1, ..., T.

3. BOOTSTRAP SCHEMES
Our motivation to introduce bootstrap schemes emanates from findings in our Monte-Carlo experiment, which suggest that the asymptotic distribution of $(nT)^{1/2}\, \widehat{V}^{-1/2} ( \widetilde{\beta} - \beta )$ does not appear to provide a good approximation to its finite sample distribution. In such situations, the use of the bootstrap has been advocated, as it has been shown to improve the finite sample performance. The general spatio-temporal dependence inherent in our model suggests that a valid bootstrap mechanism may not be easy to implement, since one of the basic requirements for its validity is that it has to preserve the covariance structure of the data/model. Drawing analogies from the time series literature, one might be tempted to use the block bootstrap ($BB$) principle, as it is not clear how the sieve bootstrap can be implemented under cross-sectional dependence in the absence of a clear ordering of the data. Applying a $BB$ in both dimensions, however, would also be sensitive to the particular ordering chosen by the practitioner and be subject to the absence of weak stationarity, where the dependence structure of, say, $(x_{p,t}, \ldots, x_{p+m,t})'$ and $(x_{p+1,t}, \ldots, x_{p+m+1,t})'$ need not be identical.

Avoiding the need to establish a particular ordering of the cross-sectional units, Gonçalves (2011) proposes to apply a moving block bootstrap ($MBB$) to the vector containing all individual observations for each $t$; that is, it only applies a $BB$ in the time dimension. The $MBB$, however, does require the choice of the block size and is known to be sensitive to this choice in finite samples. In the absence of temporal dependence, the block size equals one, and the approach is similar to Hidalgo and Schafgans (2017).

Here we propose valid bootstrap schemes with the interesting feature that they are computationally simple (there is no need to estimate, either by parametric or nonparametric methods, the time and/or cross-sectional dependence of the error term) and do not require the choice of any bandwidth parameter for their implementation, thereby avoiding any level of arbitrariness.

Both bootstrap schemes considered are in the frequency domain. We recall the DFT for generic sequences $\{\varsigma_{pt}\}_{t=1}^{T}$, $p \ge 1$, as $J_{\varsigma,p}(\lambda_j) = T^{-1/2} \sum_{t=1}^{T} \varsigma_{pt} e^{-it\lambda_j}$, $j = 1, \ldots, \widetilde{T} = [T/2]$, $\lambda_j = 2\pi j / T$, and define its periodogram $I_{\varsigma,p}(\lambda_j) = |J_{\varsigma,p}(\lambda_j)|^2$, $j = 1, \ldots, \widetilde{T}$; $p = 1, \ldots, n$.

The first scheme, labelled the naïve bootstrap, imposes the following condition.
Condition C4:
Homogeneous time dependence: $d_k(p)$ and $c_k(p)$ defined in Conditions C1 and C2 do not depend on $p$.

Denoting $\sigma_u^2(p) = E u_{pt}^2$ and $f_{u,p}(\lambda)$ the spectral density function of the sequence $\{u_{pt}\}_{t \in \mathbb{Z}}$, for any $p = 1, 2, \ldots$, Condition C4 implies that
\[
\frac{f_{u,p}(\lambda)}{\sigma_u^2(p)} =: g_u(\lambda), \qquad p = 1, 2, 3, \ldots. \tag{3.1}
\]
That is, the spectral density function normalized by the variance does not depend on $p$. This enables us to use the average periodogram of standardized residuals in constructing the valid bootstrap model. What really matters here is that the "correlation" structure is the same.

The naïve bootstrap involves resampling from the residuals of the model via the following simple steps.

STEP 1: Obtain the residuals $\widehat{u}_{pt} = \widetilde{y}_{pt} - \widetilde{\beta}' \widetilde{x}_{pt}$, $p = 1, \ldots, n$; $t = 1, \ldots, T$, compute $\widetilde{\sigma}_{\widehat{u}}^2(p) = T^{-1} \sum_{t=1}^{T} \widehat{u}_{pt}^2$, and obtain the standardized residuals $\check{u}_{pt} = \widehat{u}_{pt} / \widetilde{\sigma}_{\widehat{u}}(p)$.

STEP 2: Denoting $\widehat{U}_t = \{\widehat{u}_{pt}\}_{p=1}^{n}$, do standard random sampling from the empirical distribution of the residuals $\{\widehat{U}_t\}_{t=1}^{T}$. That is, we assign probability $T^{-1}$ to each $n \times 1$ vector $\widehat{U}_t$. Denote the bootstrap sample by $\{U^*_t\}_{t=1}^{T}$, where $U^*_t = \{u^*_{pt}\}_{p=1}^{n}$. Compute the bootstrap analogue of (2.3) as
\[
J_{y^*,p}(\lambda_j) = \widetilde{\beta}' J_{\widetilde{x},p}(\lambda_j) + \Bigg( \frac{1}{n} \sum_{q=1}^{n} I_{\check{u},q}(\lambda_j) \Bigg)^{1/2} J_{u^*,p}(\lambda_j)
\]
for $p = 1, \ldots, n$ and $j = 1, \ldots, T-1$.

STEP 3: Compute the corresponding bootstrap analogue of (2.7) as
\[
\widetilde{\beta}^* = \Bigg( \sum_{p=1}^{n} \sum_{j=1}^{T-1} J_{\widetilde{x},p}(\lambda_j) J'_{\widetilde{x},p}(-\lambda_j) \Bigg)^{-1} \sum_{p=1}^{n} \sum_{j=1}^{T-1} J_{\widetilde{x},p}(\lambda_j) J_{\widetilde{y}^*,p}(-\lambda_j), \tag{3.2}
\]
with $J_{\widetilde{y}^*,p}(\lambda_j) = J_{y^*,p}(\lambda_j) - n^{-1} \sum_{q=1}^{n} J_{y^*,q}(\lambda_j)$.

Remark 4.
Since $\bar{\widehat{u}}_{p} = 0$, there is no need to recenter in Step 1. The standardization of the residuals (the variance is not the same for all individuals) is used in Step 2 to impose the appropriate dependence structure on our bootstrap regression. As the bootstrap is done on the vector containing all individual observations for each $t$, there is no need for standardization otherwise.

Remark 5.
We use the average periodogram of the standardized residuals to impose the appropriate dependence structure on our bootstrap regression in Step 2. When the time dependence is homogeneous among the cross-sectional units, $n^{-1} \sum_{q=1}^{n} I_{\check{u},q}(\lambda_j) = \sigma_u^{-2}(p) f_{u,p}(\lambda_j)(1 + o_p(1)) =: g_u(\lambda_j)(1 + o_p(1))$, see also (3.1). That is, if the temporal dependence were given by an AR(1) model (see also Remark 6 below), the right side becomes the spectral density function of an AR(1) sequence, where the innovation sequence has variance equal to 1. In addition, as we bootstrap from $\widehat{u}_{pt}$, which are the residuals, we ensure that the variance of $u^*_{pt}$ is that of $u_{pt}$.

Remark 6.
Alternatively, we could have used random sampling from the normalized DFT of the residuals, as considered by Hidalgo (2003). In that case, denoting $\mathcal{T}_{\widehat{u}}(\lambda_j) = \{ J_{\widehat{u},p}(\lambda_j) / |J_{\widehat{u},p}(\lambda_j)| \}_{p=1}^{n}$, the $\mathcal{T}_{u^*,p}(\lambda_j)$ form independent draws from the empirical distribution of $\widetilde{\mathcal{T}}_{\widehat{u}}(\lambda_j) = ( \mathcal{T}_{\widehat{u}}(\lambda_j) - \bar{\mathcal{T}}_{\widehat{u}} ) / \widehat{\sigma}_{\mathcal{T}}$, where $\bar{\mathcal{T}}_{\widehat{u}} = [T/2]^{-1} \sum_{j=1}^{[T/2]} \mathcal{T}_{\widehat{u}}(\lambda_j)$ and $\widehat{\sigma}_{\mathcal{T}}^2 = [T/2]^{-1} \sum_{j=1}^{[T/2]} ( \mathcal{T}_{\widehat{u}}(\lambda_j) - \bar{\mathcal{T}}_{\widehat{u}} )^2$. The bootstrap analogue of (2.3) would then be obtained using $J_{y^*,p}(\lambda_j) = \widetilde{\beta}' J_{\widetilde{x},p}(\lambda_j) + ( n^{-1} \sum_{q=1}^{n} I_{\check{u},q}(\lambda_j) )^{1/2}\, \widetilde{\sigma}_{\widehat{u}}(p)\, \mathcal{T}_{u^*,p}(\lambda_j)$. Our scheme uses Step 2, which has better finite sample properties, as observed in Hidalgo (2003).

The key feature of this naïve bootstrap is that there is no need to choose any bandwidth parameter for its implementation. Under Condition C4, uniformly in $j = 1, \ldots, T-1$, we have that
\[
I_{\check{u},p}(\lambda_j) = \widetilde{\sigma}_{\widehat{u}}^{-2}(p) \Big\{ I_{u,p}(\lambda_j) + (\widetilde{\beta} - \beta)' I_{x,p}(\lambda_j) (\widetilde{\beta} - \beta) + 2 (\widetilde{\beta} - \beta)' \operatorname{Re} \big[ J_{x,p}(\lambda_j) J_{u,p}(-\lambda_j) \big] \Big\} = \sigma_u^{-2}(p)\, I_{u,p}(\lambda_j) (1 + o_p(1)),
\]
and
\[
E\, I_{u,p}(\lambda_j) = f_{u,p}(\lambda_j)(1 + o(1)), \qquad E^* \big( J_{u^*,p}(\lambda_j) J_{u^*,p}(-\lambda_\ell) \big) = \begin{cases} 0, & \text{if } j \neq \ell, \\ \sigma_u^2(p)(1 + o_p(1)), & \text{otherwise}, \end{cases} \qquad \widetilde{\sigma}_{\widehat{u}}^2(p) = \sigma_u^2(p)(1 + o_p(1)).
\]
The last displayed expressions suggest that, under Condition C4, we can consider $( n^{-1} \sum_{q=1}^{n} I_{\check{u},q}(\lambda_j) )^{1/2} J_{u^*,p}(\lambda_j)$ as some type of wild bootstrap in the frequency domain, because under homogeneous time dependence
\[
E^* \Bigg| \Bigg( \frac{1}{n} \sum_{q=1}^{n} I_{\check{u},q}(\lambda_j) \Bigg)^{1/2} J_{u^*,p}(\lambda_j) \Bigg|^2 = \big( \sigma_u^{-2}(p) f_{u,p}(\lambda_j) \big) \cdot \sigma_u^2(p)\, (1 + o_p(1)) = f_{u,p}(\lambda_j)(1 + o_p(1)).
\]
The following theorem is used to establish the validity of our naïve bootstrap scheme.

Theorem 2. (Naïve Bootstrap) Under Conditions C1–C4, we have that, in probability,
\[
(nT)^{1/2} \big( \widetilde{\beta}^* - \widetilde{\beta} \big) \overset{d^*}{\to} N(0, V).
\]

With the bootstrap cluster estimator of the asymptotic covariance given by
\[
\breve{\Phi}^* = \frac{1}{T} \sum_{j=1}^{T-1} \Bigg( \frac{1}{n^{1/2}} \sum_{p=1}^{n} J_{\widetilde{x},p}(\lambda_j) J_{\widehat{u}^*,p}(-\lambda_j) \Bigg) \Bigg( \frac{1}{n^{1/2}} \sum_{p=1}^{n} J'_{\widetilde{x},p}(-\lambda_j) J_{\widehat{u}^*,p}(\lambda_j) \Bigg), \tag{3.3}
\]
the next proposition establishes the consistency of the bootstrap cluster estimator.

Proposition 2. (Naïve Bootstrap) Under the assumptions of Theorem 2, we have $\breve{\Phi}^* - \breve{\Phi} = o_{p^*}(1)$.
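Steps 1–3 of the naïve bootstrap are simple to code. The following is an illustrative sketch, not the authors' implementation: it takes the regressor to be scalar, uses simulated Gaussian placeholders for the within-transformed regressors, the residuals and $\widetilde{\beta}$, resamples whole cross-sections $\widehat{U}_t$ (which preserves the cross-sectional dependence), and rescales the resampled DFT by the square root of the average periodogram of the standardized residuals, as in Step 2.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 20, 32
x = rng.standard_normal((n, T))       # placeholder for the within-transformed regressors
u_hat = rng.standard_normal((n, T))   # placeholder for the residuals u^_pt
beta_hat = 0.5                        # placeholder for beta~

# Step 1: individual-by-individual standardization of the residuals
sig = np.sqrt((u_hat ** 2).mean(axis=1, keepdims=True))
u_check = u_hat / sig

# Step 2: resample entire cross-sections U_t with replacement (keeps the
# spatial dependence intact; cf. Remark 5, we bootstrap the raw residuals)
idx = rng.integers(0, T, size=T)
u_star = u_hat[:, idx]

def dft(s):
    return np.fft.fft(s, axis=1) / np.sqrt(s.shape[1])

j = np.arange(1, T)
Jx = dft(x)[:, j]
# average periodogram of the standardized residuals, imposing the common
# temporal dependence structure of (3.1)
I_bar = (np.abs(dft(u_check)[:, j]) ** 2).mean(axis=0)
Jy_star = beta_hat * Jx + np.sqrt(I_bar) * dft(u_star)[:, j]

# Step 3: bootstrap estimate, scalar-regressor version of (3.2) with
# cross-sectional demeaning of J_{y*}
Jy_t = Jy_star - Jy_star.mean(axis=0)
beta_star = (Jx * np.conj(Jy_t)).real.sum() / (np.abs(Jx) ** 2).sum()
```

Repeating the resampling and estimation loop many times yields the empirical bootstrap distribution of $(nT)^{1/2}(\widetilde{\beta}^* - \widetilde{\beta})$; no bandwidth enters at any point.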
The previous results can be extended to incorporate the more realistic situation where the temporal dynamics might differ by individual, as allowed by Conditions C1 and C2. This bootstrap, labelled the wild bootstrap, merges ideas from Hidalgo (2003) and Chan and Ogden (2009). As the DFT residuals are heterogeneous whilst independent over the Fourier frequencies, it applies the wild-type bootstrap approach to the increasing dimensional vector $\{J_{\widehat{u},p}(\lambda_j)\}_{p=1}^{n}$. It requires a modification of the above bootstrap, which primarily involves replacing STEP 2. For completeness we provide all steps:
STEP 1′: Obtain the residuals $\widehat{u}_{pt} = \widetilde{y}_{pt} - \widetilde{\beta}' \widetilde{x}_{pt}$, $p = 1, \ldots, n$; $t = 1, \ldots, T$.

STEP 2′: Denote by $\{\eta_j\}_{j=1}^{\widetilde{T}}$ a sequence of independent identically distributed random variables with mean zero and unit variance. We then compute the bootstrap analogue of (2.3) as
\[
J_{y^*,p}(\lambda_j) = \widetilde{\beta}' J_{\widetilde{x},p}(\lambda_j) + J_{\widehat{u},p}(\lambda_j)\, \eta_j, \qquad p = 1, \ldots, n; \ j = 1, \ldots, \widetilde{T},
\]
with $J_{y^*,p}(\lambda_j) = \overline{J_{y^*,p}(\lambda_{T-j})}$ and $\eta_j = \eta_{T-j}$ for $j = \widetilde{T}+1, \ldots, T-1$.

STEP 3′: Compute the corresponding bootstrap analogue of (2.7) as
\[
\widetilde{\beta}^* = \Bigg( \sum_{p=1}^{n} \sum_{j=1}^{T-1} J_{\widetilde{x},p}(\lambda_j) J'_{\widetilde{x},p}(-\lambda_j) \Bigg)^{-1} \sum_{p=1}^{n} \sum_{j=1}^{T-1} J_{\widetilde{x},p}(\lambda_j) J_{\widetilde{y}^*,p}(-\lambda_j),
\]
with $J_{\widetilde{y}^*,p}(\lambda_j) = J_{y^*,p}(\lambda_j) - n^{-1} \sum_{q=1}^{n} J_{y^*,q}(\lambda_j)$.

Remark 7.
For a discussion regarding the requirement that $\eta_j = \eta_{T-j}$ for $j = \widetilde{T}+1, \ldots, T-1$, which guarantees that the implied bootstrap series is real-valued, we refer to Hidalgo (2003). The validity of the wild bootstrap scheme follows from the following proposition.
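Steps 1′–3′ can be sketched as follows. This is again an illustrative sketch with simulated placeholders for $\widetilde{x}_{pt}$, $\widehat{u}_{pt}$ and $\widetilde{\beta}$, using Rademacher draws for $\eta_j$; the mirroring $\eta_j = \eta_{T-j}$ of Step 2′ is what keeps the implied bootstrap series real-valued.

```python
import numpy as np

rng = np.random.default_rng(2)
n, T = 20, 32
x = rng.standard_normal((n, T))       # placeholder for the within-transformed regressors
u_hat = rng.standard_normal((n, T))   # placeholder for the residuals u^_pt
beta_hat = 0.5                        # placeholder for beta~

def dft(s):
    return np.fft.fft(s, axis=1) / np.sqrt(s.shape[1])

Jx, Ju = dft(x), dft(u_hat)
Th = T // 2                           # T~ = [T/2]

# Step 2': one external draw eta_j per frequency j = 1,...,T~ (Rademacher
# here), mirrored as eta_{T-j} = eta_j for the remaining frequencies
eta = np.zeros(T)
eta[1:Th + 1] = rng.choice([-1.0, 1.0], size=Th)
eta[Th + 1:] = eta[1:Th][::-1]

# frequency-by-frequency wild perturbation of the residual DFT
Jy_star = beta_hat * Jx + Ju * eta

# Step 3': scalar-regressor bootstrap estimate with cross-sectional demeaning
j = np.arange(1, T)
Jyd = Jy_star[:, j] - Jy_star[:, j].mean(axis=0)
Jxd = Jx[:, j]
beta_star = (Jxd * np.conj(Jyd)).real.sum() / (np.abs(Jxd) ** 2).sum()
```

Because $J_{\widehat{u},p}(\lambda_{T-j}) = \overline{J_{\widehat{u},p}(\lambda_j)}$ for real residuals and $\eta_{T-j} = \eta_j$, the perturbed array $J_{y^*,p}$ retains conjugate symmetry, so inverting the DFT returns a real-valued bootstrap sample.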
Proposition 3. (Wild Bootstrap) Under Conditions C1–C3, in probability,
\[
(nT)^{1/2} \big( \widetilde{\beta}^* - \widetilde{\beta} \big) \overset{d^*}{\to} N(0, V) \qquad \text{and} \qquad \breve{\Phi}^* - \breve{\Phi} = o_{p^*}(1).
\]
We conclude by stating the validity of the standardised bootstrap statistic.
Corollary 2.
Under Conditions C1–C3, we have that, in probability,
\[
(nT)^{1/2}\, \widehat{V}^{*\,-1/2} \big( \widetilde{\beta}^* - \widetilde{\beta} \big) \overset{d^*}{\to} N(0, I),
\]
where $\widehat{V}^* = \widetilde{\Sigma}_x^{-1} \breve{\Phi}^* \widetilde{\Sigma}_x^{-1}$. Proof. The proof is standard after Theorem 2 and Propositions 1, 2 and 3. $\square$

4. (CONDITIONAL) HETEROSKEDASTICITY

In this section, we extend our model to permit general forms of heteroskedasticity. Specifically, we begin by considering
\[
y_{pt} = \beta' \acute{x}_{pt} + \eta_p + \alpha_t + v_{pt}, \qquad p = 1, \ldots, n, \ t = 1, \ldots, T, \tag{4.1}
\]
where
\[
v_{pt} =: \sigma(w_p)\, \sigma(\varrho_t)\, u_{pt}. \tag{4.2}
\]
The error $\{u_{pt}\}_{t \in \mathbb{Z}}$, $p \in \mathbb{N}^+$, satisfies the same regularity conditions given in Condition C1, exhibiting general spatial and temporal dependence. The sequences $\{w_p\}_{p \in \mathbb{N}^+}$ and $\{\varrho_t\}_{t \in \mathbb{Z}}$, which can even be functions of the fixed effects, are not required to be mutually independent of the regressors $\{\acute{x}_{pt}\}_{t \in \mathbb{Z}}$, $p \in \mathbb{N}^+$. Without loss of generality, we normalize $\sigma_{u,p} = 1$ in Condition C1, as $\sigma_{u,p}$ is not separately identified from $\sigma(w_p)$ and $\sigma(\varrho_t)$. The error $\{u_{pt}\}_{t \in \mathbb{Z}}$, $p \in \mathbb{N}^+$, is assumed to be independent of the regressors $\{\acute{x}_{pt}\}_{t \in \mathbb{Z}, p \in \mathbb{N}^+}$, $\{w_p\}_{p \in \mathbb{N}^+}$ and $\{\varrho_t\}_{t \in \mathbb{Z}}$, see also footnote 2. Here the errors $v_{pt}$ permit conditional heteroskedasticity. This is an extension of the so-called groupwise heteroskedasticity, where observations belonging to different groups have distinct variances, see for instance Greene (2018). This type of heteroskedasticity is not uncommon in applications such as development economics, where it has been suggested that observations within a village or stratum would have the same (conditional) variance while differences over villages or strata exist (Deaton, 1996); that is, the variance depends on some specific village variable(s).

Before we modify our Condition C3, define $\{\ddot{x}_{pt}\}_{t \in \mathbb{Z}}$, $p \in \mathbb{N}^+$, the sequence that applies the usual transformation to remove the fixed effects, see (2.2), to the sequence $\{\acute{x}_{pt}\}_{t \in \mathbb{Z}}$, $p \in \mathbb{N}^+$, such that
\[
\ddot{x}_{pt} = \acute{x}_{pt} - \frac{1}{n} \sum_{q=1}^{n} \acute{x}_{qt} - \frac{1}{T} \sum_{s=1}^{T} \acute{x}_{ps} + \frac{1}{nT} \sum_{q=1}^{n} \sum_{s=1}^{T} \acute{x}_{qs}.
\]
Observe that, as happens with $\widetilde{x}_{pt}$, we can take $E(\ddot{x}_{pt}) = 0$. Our new Condition C3′ is given next.
Condition C3′: For all $p \in \mathbb{N}^+$, the sequence $\{u_{pt}\}_{t \in \mathbb{Z}}$ is independent of $\{\acute{x}_{pt}\}_{t \in \mathbb{Z}}$, $\{w_p\}_{p \in \mathbb{N}^+}$ and $\{\varrho_t\}_{t \in \mathbb{Z}}$, and
\[
0 < \max_{1 \le p \le n} \sum_{q=1}^{n} \| \varphi(p,q) \| < \infty, \tag{4.3}
\]
where $\varphi(p,q) := \varphi_u(p,q)\, \varphi_{\ddot{x}}(p,q)$ and $\varphi_{\ddot{x}}(p,q) = \mathrm{Cov}\big( \sigma(w_p) \ddot{x}_{pt};\, \sigma(w_q) \ddot{x}'_{qt} \big)$, for any $p, q \ge 1$.

Condition C3′ restricts the joint dependence of $v_{pt}$ and $\ddot{x}_{pt}$ ($\acute{x}_{pt}$) as needed to ensure the existence of a consistent estimator of the "average" long-run variance of the sequences $\{z_{pt} =: v_{pt} \ddot{x}_{pt}\}_{t \in \mathbb{Z}}$, $p \in \mathbb{N}^+$, in this framework. This is an obvious extension of our previous Condition C3, since our expression for $\Phi$ under our generalisation becomes
\[
\Phi =: \lim_{n \to \infty} \lim_{T \to \infty} \frac{1}{nT} E \Bigg( \sum_{p=1}^{n} \sum_{t=1}^{T} \ddot{x}_{pt} v_{pt} \Bigg) \Bigg( \sum_{p=1}^{n} \sum_{t=1}^{T} \ddot{x}'_{pt} v_{pt} \Bigg) = \lim_{n \to \infty} \lim_{T \to \infty} \frac{1}{nT} \sum_{p,q=1}^{n} \sum_{t,s=1}^{T} E \big( \{\sigma(\varrho_t)\sigma(\varrho_s)\} \{\sigma(w_p)\ddot{x}_{pt}\} \{\sigma(w_q)\ddot{x}'_{qs}\} \big)\, E(u_{pt} u_{qs}). \tag{4.4}
\]
We shall now give some examples. We can allow
\[
(i)\ \ \acute{x}_{pt} =: x_{pt} + h_1(w_p) + h_2(\varrho_t) \qquad \text{and} \qquad (ii)\ \ \acute{x}_{pt} =: h_3(w_p; \varrho_t)\, x_{pt}, \tag{4.5}
\]
for given $h_1$, $h_2$ and $h_3$, where $x_{pt}$ satisfies Condition C2. It is clear that under the additive structure in (i), the transformed variables that account for the fixed effects, recall (2.2), satisfy
\[
\widetilde{\acute{x}}_{pt} =: \ddot{x}_{pt} \equiv \widetilde{x}_{pt}, \qquad p = 1, \ldots, n; \ t = 1, \ldots, T,
\]
which renders this, potentially, the most straightforward setting. In this case we have
\[
\Phi = \lim_{n \to \infty} \lim_{T \to \infty} \frac{1}{nT} E \Bigg( \sum_{p=1}^{n} \sum_{t=1}^{T} x_{pt} \sigma(w_p)\sigma(\varrho_t) u_{pt} \Bigg) \Bigg( \sum_{p=1}^{n} \sum_{t=1}^{T} x'_{pt} \sigma(w_p)\sigma(\varrho_t) u_{pt} \Bigg) = \lim_{n \to \infty} \lim_{T \to \infty} \frac{1}{nT} \sum_{p,q=1}^{n} \sum_{t,s=1}^{T} E \big( \mathring{x}_{pt} \mathring{x}'_{qs} \big)\, E(u_{pt} u_{qs}),
\]
where $\mathring{x}_{pt} = x_{pt} \sigma(w_p)\sigma(\varrho_t)$. The behaviour of the second moments of $\mathring{x}_{pt}$ is essentially that of $x_{pt}$, because
\[
\mathrm{Cov}(\mathring{x}_{pt}, \mathring{x}_{qs}) = E\big(\sigma(w_p)\sigma(w_q)\big)\, E\big(\sigma(\varrho_t)\sigma(\varrho_s)\big)\, \mathrm{Cov}(x_{pt}, x_{qs}).
\]

Remark 8. In the last displayed expression we have only assumed that
\[
E(x_{pt} \mid w_p, \varrho_t) = 0; \qquad E(x_{pt} x_{qs} \mid w_p, w_q; \varrho_t, \varrho_s) = E(x_{pt} x_{qs}) =: \mathrm{Cov}(x_{pt}, x_{qs}),
\]
so some type of dependence between $x_{pt}$ and $(w_p, \varrho_t)$ is still allowed.

With the multiplicative structure in (ii), the situation is basically the same, since
\[
\Phi = \lim_{n \to \infty} \lim_{T \to \infty} \frac{1}{nT} E \Bigg( \sum_{p=1}^{n} \sum_{t=1}^{T} x_{pt} h_3(w_p; \varrho_t) \sigma(w_p)\sigma(\varrho_t) u_{pt} \Bigg) \Bigg( \sum_{p=1}^{n} \sum_{t=1}^{T} x'_{pt} h_3(w_p; \varrho_t) \sigma(w_p)\sigma(\varrho_t) u_{pt} \Bigg) = \lim_{n \to \infty} \lim_{T \to \infty} \frac{1}{nT} \sum_{p,q=1}^{n} \sum_{t,s=1}^{T} E \big( \mathring{x}_{pt} \mathring{x}'_{qs} \big)\, E(u_{pt} u_{qs}),
\]
where now $\mathring{x}_{pt} = h_3(w_p; \varrho_t)\sigma(w_p)\sigma(\varrho_t) x_{pt}$, and $|\mathrm{Cov}(\mathring{x}_{pt}, \mathring{x}_{qs})| \le K |\mathrm{Cov}(x_{pt}, x_{qs})|$ using the Markov inequality. The same caveats mentioned in the last remark apply in this case.

We now turn to the consistent estimator of the "average" long-run variance of the sequences $\{z_{pt} =: v_{pt} \ddot{x}_{pt}\}_{t \in \mathbb{Z}}$, $p \in \mathbb{N}^+$, in this framework (recognizing that we have established the necessary regularity conditions for its existence). Following a rescaling of our regressors,
\[
\dot{x}_{pt} =: \ddot{x}_{pt}\, \sigma(w_p)\sigma(\varrho_t),
\]
for given $\sigma(w_p)\sigma(\varrho_t)$, our estimator for $\Phi$, see also (2.17), becomes
\[
\breve{\Phi} = \frac{1}{T} \sum_{j=1}^{T-1} \Bigg( \frac{1}{n^{1/2}} \sum_{p=1}^{n} J_{\widetilde{\dot{x}},p}(\lambda_j) J_{\widehat{u},p}(-\lambda_j) \Bigg) \Bigg( \frac{1}{n^{1/2}} \sum_{p=1}^{n} J'_{\widetilde{\dot{x}},p}(-\lambda_j) J_{\widehat{u},p}(\lambda_j) \Bigg), \tag{4.6}
\]
where $\widehat{u}_{pt} := \widehat{v}_{pt} / (\widehat{\sigma}(w_p)\widehat{\sigma}(\varrho_t))$ and $\widetilde{\dot{x}}_{pt} = \widehat{\sigma}(w_p)\widehat{\sigma}(\varrho_t)\, \widetilde{\acute{x}}_{pt}$.
Implementation of this estimator only requires a consistent estimator of $\sigma(\varrho_t)$ (up to an unknown scale of proportionality), and a natural estimator we can use is
\[
\widehat{\sigma}^2(\varrho_t) = \frac{1}{n} \sum_{p=1}^{n} \widehat{v}_{pt}^2.
\]
The estimator for $\sigma(w_p)$, $\widehat{\sigma}(w_p)$, indeed cancels out when considering the product $J_{\widetilde{\dot{x}},p}(\lambda_j) J_{\widehat{u},p}(-\lambda_j)$, as
\[
J_{\widetilde{\dot{x}},p}(\lambda_j) J_{\widehat{u},p}(-\lambda_j) = \frac{1}{T} \Bigg( \sum_{t=1}^{T} \widetilde{\acute{x}}_{pt}\, \widehat{\sigma}(w_p)\widehat{\sigma}(\varrho_t)\, e^{-it\lambda_j} \Bigg) \Bigg( \sum_{t=1}^{T} \frac{\widehat{v}_{pt}}{\widehat{\sigma}(\varrho_t)\widehat{\sigma}(w_p)}\, e^{it\lambda_j} \Bigg) = \frac{1}{T} \Bigg( \sum_{t=1}^{T} \widetilde{\acute{x}}_{pt}\, \widehat{\sigma}(\varrho_t)\, e^{-it\lambda_j} \Bigg) \Bigg( \sum_{t=1}^{T} \frac{\widehat{v}_{pt}}{\widehat{\sigma}(\varrho_t)}\, e^{it\lambda_j} \Bigg).
\]
Moreover, this result shows that when $\sigma(\varrho_t)$ is a constant, our results in Sections 2 and 3 continue to hold true. That is, our estimators in the previous sections are robust to groupwise heteroskedasticity in the cross-sectional unit, a result supported by our Monte-Carlo simulations in Table 4 in the next section.

The intuition for the validity of this estimator comes from the standard observation that $\widehat{\sigma}(\varrho_t)/\sigma(\varrho_t) \overset{P}{\to} 1$, so that
\[
\frac{\widehat{v}_{pt}}{\widehat{\sigma}(\varrho_t)} \simeq \frac{v_{pt}}{\widehat{\sigma}(\varrho_t)} = \frac{v_{pt}}{\sigma(\varrho_t)}(1 + o_p(1)) =: \sigma(w_p)\, u_{pt}\, (1 + o_p(1)), \qquad \text{and} \qquad \frac{\widehat{v}_{pt}}{\widehat{\sigma}(\varrho_t)\, \sigma(w_p)} =: u_{pt}(1 + o_p(1))
\]
from the above arguments. The details, while lengthy, follow arguments that have been employed in other contexts many times.

Our bootstrap algorithms also require some obvious and minimal changes. The only adjustment to the wild bootstrap algorithm relates to the use of the robust estimator of $\Phi$ provided in (4.6). For the naïve bootstrap, a straightforward modification involves the following steps.

STEP 1′′: Obtain the residuals $\widehat{v}_{pt} = \widetilde{y}_{pt} - \widetilde{\beta}' \widetilde{\acute{x}}_{pt}$, $p = 1, \ldots, n$; $t = 1, \ldots, T$, compute
\[
\widehat{\sigma(w_p)\sigma(\varrho_t)} = \Bigg( \frac{1}{T} \sum_{t=1}^{T} \widehat{v}_{pt}^2 \Bigg)^{1/2} \Bigg( \frac{1}{n} \sum_{p=1}^{n} \widehat{v}_{pt}^2 \Bigg)^{1/2},
\]
and obtain the standardized residuals $\widehat{u}_{pt} = \widehat{v}_{pt} / \widehat{\sigma(w_p)\sigma(\varrho_t)}$.

STEP 2′′: Denoting $\widehat{U}_t = \{\widehat{u}_{pt}\}_{p=1}^{n}$, do standard random sampling from the empirical distribution of the residuals $\{\widehat{U}_t\}_{t=1}^{T}$. That is, we assign probability $T^{-1}$ to each $n \times 1$ vector $\widehat{U}_t$. Denote the bootstrap sample by $\{U^*_t\}_{t=1}^{T}$, where $U^*_t = \{u^*_{pt}\}_{p=1}^{n}$. Let $V^*_t = \{ \widehat{\sigma(w_p)\sigma(\varrho_t)}\, u^*_{pt} \}_{p=1}^{n}$. Compute the bootstrap analogue of (2.3) as
\[
J_{y^*,p}(\lambda_j) = \widetilde{\beta}' J_{\widetilde{\acute{x}},p}(\lambda_j) + \Bigg( \frac{1}{n} \sum_{q=1}^{n} I_{\widehat{u},q}(\lambda_j) \Bigg)^{1/2} J_{v^*,p}(\lambda_j)
\]
for $p = 1, \ldots, n$ and $j = 1, \ldots, T-1$.

STEP 3′′: Compute the corresponding bootstrap analogue of (2.7) as
\[
\widetilde{\beta}^* = \Bigg( \sum_{p=1}^{n} \sum_{j=1}^{T-1} J_{\widetilde{\acute{x}},p}(\lambda_j) J'_{\widetilde{\acute{x}},p}(-\lambda_j) \Bigg)^{-1} \sum_{p=1}^{n} \sum_{j=1}^{T-1} J_{\widetilde{\acute{x}},p}(\lambda_j) J_{\widetilde{y}^*,p}(-\lambda_j),
\]
with $J_{\widetilde{y}^*,p}(\lambda_j) = J_{y^*,p}(\lambda_j) - n^{-1} \sum_{q=1}^{n} J_{y^*,q}(\lambda_j)$.

Remark 9.
Step 2′′ assumes that the temporal dependence of the $n \times 1$ vector $\{v_{pt}/(\sigma(w_p)\sigma(\varrho_t))\}_{p=1}^{n}$ is homogeneous, so that we can use the average periodogram to impose the proper dependence structure on $u^*_{pt}$ (drawings from the empirical distribution of $\{\widehat{v}_{pt} / \widehat{\sigma(w_p)\sigma(\varrho_t)}\}_{p=1}^{n}$).

We now discuss the scenario where the conditional moment of the error term depends on the regressors $\acute{x}_{pt} =: x_{pt}$ themselves, i.e.,
\[
y_{pt} = \beta' \acute{x}_{pt} + \eta_p + \alpha_t + v_{pt}, \qquad \text{with } v_{pt} =: \sigma(\acute{x}_{pt})\, u_{pt}.
\]
As mentioned in the introduction, this would require us to estimate the conditional expectation $\sigma(\acute{x}_{pt})$ nonparametrically. Several methods are available, such as the kernel regression method or sieve estimation. As this approach would require the selection of a bandwidth parameter, which we set out to avoid in this paper, we do not consider it in detail, although we outline how to proceed. Regardless of the approach used, we anticipate that the estimator would be quite accurate, as the number of observations in large panel data will normally be huge. For instance, in a typical data set with $T = 20$ and $n = 1000$, we can use $20{,}000$ observations to estimate the nonparametric function. The estimator for $\Phi$, see also (4.6), becomes
\[
\breve{\Phi} = \frac{1}{T} \sum_{j=1}^{T-1} \Bigg( \frac{1}{n^{1/2}} \sum_{p=1}^{n} J_{\widetilde{\dot{x}},p}(\lambda_j) J_{\widehat{u},p}(-\lambda_j) \Bigg) \Bigg( \frac{1}{n^{1/2}} \sum_{p=1}^{n} J'_{\widetilde{\dot{x}},p}(-\lambda_j) J_{\widehat{u},p}(\lambda_j) \Bigg),
\]
where $\widehat{u}_{pt} := \widehat{v}_{pt} / \widehat{\sigma}(\acute{x}_{pt})$ and $\widetilde{\dot{x}}_{pt} = \widehat{\sigma}(\acute{x}_{pt})\, \widetilde{x}_{pt}$. For the associated naïve bootstrap procedure, we can proceed as above, where $\widehat{\sigma(w_p)\sigma(\varrho_t)}$ is replaced by $\widehat{\sigma}(\acute{x}_{pt})$.

5. FINITE SAMPLE BEHAVIOUR
In this section, we discuss the finite sample performance of our cluster-based inference procedure in the presence of cross-sectional and temporal dependence of unknown form. We contrast this performance with the HAC-based inference procedure proposed by Driscoll and Kraay (1998), which, unlike ours, requires the choice of smoothing parameters that may be arbitrary and erroneous. We also provide evidence of the potential finite sample improvements of our frequency domain bootstrap schemes, and implement the MBB time domain bootstrap on the vector containing all individual observations for each $t$. Our frequency domain approaches have the benefit that they do not rely on the choice of any smoothing parameter or require an ordering of cross-sectional units, which, as we argued before, may be arbitrary and erroneous. Another benefit of our estimator that we address in our simulations is the fact that it permits heterogeneity in the temporal dependence. In our simulations, we also consider a multiplicative error structure that permits groupwise heteroskedasticity, and we reveal the robustness of our estimator to this setting.

In our Monte-Carlo experiments, we first consider the following data generating process:
\[
y_{pt} = \alpha_t + \eta_p + \beta x_{pt} + u_{pt} \qquad \text{for } p = 1, \ldots, n \text{ and } t = 1, \ldots, T.
\]
The time fixed effects $\alpha_t$ and individual fixed effects $\eta_p$ are drawn independently from normal distributions with mean 1, and $\beta$ is set equal to zero. The independently drawn errors and regressors are postulated to exhibit a variety of scenarios for the temporal and cross-sectional dependence, which are assumed to be the same for simplicity.

To evaluate the performance of our proposed cluster estimator, we analyze the empirical size and power for testing the significance of our parameter, $H_0: \beta = 0$ against $H_A: \beta \neq 0$, at the nominal 5% level, for various pairs of $n$ and $T$, using 5,000 simulations.
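As a concrete illustration of this design, the following sketch simulates one panel draw under an AR(1) error with cross-sectionally dependent innovations, in the spirit of the spatial weighting scheme described in Section 5.1. All numeric choices here, including the distance-decay exponent of the spatial weights and the variances of the fixed effects, are illustrative assumptions, not the exact values of the experiment.

```python
import numpy as np

rng = np.random.default_rng(3)
n, T, rho, beta = 50, 32, 0.5, 0.0
burn = 49                                  # burn-in periods, as in the experiments

# random locations on a line, s_p ~ U[0, n] (Lee-Robinson style)
s = rng.uniform(0, n, size=n)

# distance-decaying spatial weights; exponent 2 is an illustrative "weak
# dependence" choice, and rows are scaled so that Var(eta_pt) = 1
C = (1.0 + np.abs(s[:, None] - s[None, :])) ** (-2.0)
C /= np.sqrt((C ** 2).sum(axis=1, keepdims=True))

def ar1_spatial(rho):
    """AR(1) series driven by cross-sectionally dependent innovations."""
    e = rng.standard_normal((n, burn + T))
    eta = C @ e                            # eta_pt = sum_l c_l(p) e_lt
    z = np.zeros((n, burn + T))
    for t in range(1, burn + T):
        z[:, t] = rho * z[:, t - 1] + np.sqrt(1 - rho ** 2) * eta[:, t]
    return z[:, burn:]                     # keep the last T periods

u = ar1_spatial(rho)                       # error term
x = ar1_spatial(rho)                       # regressor with the same dependence

alpha = rng.normal(1.0, 1.0, size=T)       # time fixed effects (illustrative scale)
eta_p = rng.normal(1.0, 1.0, size=n)       # individual fixed effects
y = alpha[None, :] + eta_p[:, None] + beta * x + u
```

Under the null, $\beta = 0$; repeating this draw, applying the within transformation, and computing the Wald or bootstrap statistics of Sections 2 and 3 yields the empirical rejection rates reported below.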
In addition to presenting the rejection rates based on the asymptotic distribution of the Wald statistic $nT \widehat{\beta}'_{FE} \widehat{V}^{-1} \widehat{\beta}_{FE}$, with $\widehat{V} =: \widetilde{\Sigma}_x^{-1} \breve{\Phi}\, \widetilde{\Sigma}_x^{-1}$, where $\breve{\Phi}$ is defined in (2.17) (or, equivalently, the asymptotic t-test, as $\beta$ is scalar), we present rejection rates based on the empirical distribution of the bootstrapped test statistic
\[
nT \big( \widehat{\beta}^*_{FE} - \widehat{\beta}_{FE} \big)' \big[ \widehat{V}^* \big]^{-1} \big( \widehat{\beta}^*_{FE} - \widehat{\beta}_{FE} \big),
\]
where $\widehat{\beta}^*_{FE}$ and $\widehat{V}^*$ are the bootstrapped estimators of $\beta$ and $V$ defined in Section 3. As inference based on the asymptotic distribution might not provide a good approximation to the finite sample one, this allows us to assess the finite sample improvements our bootstrap schemes may yield.

We compare the finite sample performance of our cluster-based inference procedure to the HAC based inference procedure, and select the bandwidth parameter, denoted $m_T$, using the parametric AR(1) plug-in method suggested in Andrews (1991), where $m_T$ is chosen to be an upward rounded integer. This lag window is designed to minimize (approximately) the mean square error of the standard error. For the HAC based inference, we provide rejection rates of the Wald statistic $nT \widehat{\beta}'_{FE} \widehat{V}_{m_T}^{-1} \widehat{\beta}_{FE}$ based on the asymptotic critical values (asy) and the critical values based on the fixed-b asymptotics (fixb) of Kiefer and Vogelsang (2005), as this is shown to lead to more reliable inference, see also Vogelsang (2012). (Kiefer and Vogelsang (2002) discuss the use of HAC estimators with bandwidth equal to the sample size, $b = 1$; this bandwidth free approach does come at the cost of power relative to Andrews' popular data driven optimal bandwidth selection, see also Vogelsang (2012).) With $\widehat{V}_{m_T} =: \widetilde{\Sigma}_x^{-1} \widehat{\Phi}_{m_T} \widetilde{\Sigma}_x^{-1}$, $\widehat{\Phi}_{m_T}$ is defined as
\[
\widehat{\Phi}_{m_T} = \frac{1}{nT} \sum_{p=1}^{n} \sum_{q=1}^{n} \sum_{t=1}^{T} \sum_{s=1}^{T} K\Big( \frac{|t-s|}{m_T} \Big) \widehat{z}_{pt} \widehat{z}'_{qs},
\]
where $\widehat{z}_{pt} = \widetilde{x}_{pt} \widehat{u}_{pt}$ and $K(h) = (1 - |h|)\, 1(|h| \le 1)$ is the Bartlett kernel. The fixed-b asymptotic distribution is non-standard, and our critical values are obtained by simulation.

We also provide critical values for HAC based inference that rely on the pairs moving block bootstrap proposed by Gonçalves (2011). She obtained bootstrapped samples $z^*_{it} = (y^*_{it}, x^{*\prime}_{it})'$ by arranging $k$ resampled blocks of $\ell$ observations from the set of $T - \ell + 1$ overlapping blocks $\{B_{1,\ell}, \ldots, B_{T-\ell+1,\ell}\}$, with $B_{t,\ell} = \{z_{t,n}, z_{t+1,n}, \ldots, z_{t+\ell-1,n}\}$ and $z_{t,n} = (z_{1t}, \ldots, z_{nt})'$, in sequence (for notational simplicity, $T = k\ell$). When $\ell = 1$, this corresponds to the standard iid bootstrap on $\{z_{t,n}\}_{t=1}^{T}$. The MBB based critical values are based on the standardized test statistic
\[
nT \big( \widehat{\beta}^*_{FE} - \widehat{\beta}_{FE} \big)' \big[ \widehat{V}^*_\ell \big]^{-1} \big( \widehat{\beta}^*_{FE} - \widehat{\beta}_{FE} \big).
\]
Here $\widehat{V}^*_\ell = ( \widetilde{\Sigma}^*_x )^{-1} \breve{\Phi}^*_\ell ( \widetilde{\Sigma}^*_x )^{-1}$, $\widetilde{\Sigma}^*_x = \frac{1}{nT} \sum_{p=1}^{n} \sum_{t=1}^{T} \widetilde{x}^*_{pt} \widetilde{x}^{*\prime}_{pt}$, and
\[
\breve{\Phi}^*_\ell = \frac{1}{k} \sum_{j=1}^{k} \Bigg( \ell^{-1/2} \sum_{t=1}^{\ell} n^{-1/2}\, \widehat{s}^*_{n,(j-1)\ell+t} \Bigg) \Bigg( \ell^{-1/2} \sum_{t=1}^{\ell} n^{-1/2}\, \widehat{s}^*_{n,(j-1)\ell+t} \Bigg)',
\]
where $\widehat{s}^*_{nt} = \sum_{p=1}^{n} \widetilde{x}^*_{pt} ( \widetilde{y}^*_{pt} - \widetilde{x}^{*\prime}_{pt} \widehat{\beta}^*_{FE} )$, with $\widetilde{y}^*_{pt} = y^*_{pt} - \bar{y}^*_{p\cdot} - \bar{y}^*_{\cdot t} + \bar{y}^*_{\cdot\cdot}$ and $\widetilde{x}^*_{pt} = x^*_{pt} - \bar{x}^*_{p\cdot} - \bar{x}^*_{\cdot t} + \bar{x}^*_{\cdot\cdot}$ (see also Götze and Künsch, 1996). The block size used is given by the integer part of the automatic bandwidth chosen by Andrews (1991), as proposed by Gonçalves (2011).

5.1. Simulations with Homogeneous Time Dependence.
In the first set of simulations, we assume that the time dependence is homogeneous among individuals $p = 1, \ldots, n$. In particular, we assume that the error and regressors are mutually independent, homoscedastic, first order autoregressive random variables; two values of the autoregressive parameter $\rho$ are considered. The error term, therefore, takes the form
\[
u_{pt} = \rho u_{p,t-1} + (1 - \rho^2)^{1/2} \eta_{pt},
\]
where $\eta_{pt}$ characterizes the spatial dependence inherent in the error. We consider both weak and strong cross-sectional dependence scenarios for $u_{pt}$ ($\eta_{pt}$). To describe the cross-sectional dependence, we follow Lee and Robinson (2013) and draw random locations for individual units along a line, denoted $s = (s_1, \ldots, s_n)'$ with $s_p \sim IIDU[0, n]$ for $p = 1, \ldots, n$. Using the linear representation $\eta_{pt} = \sigma_p ( \sum_{\ell=1}^{\infty} c_\ell(p) e_{\ell t} )$ with $e_{\ell t} \sim IIDN(0, 1)$, we set $c_\ell(p) = (1 + |s_\ell - s_p|)^{-\kappa}$, with the decay exponent $\kappa$ large enough to permit only weak dependence; $\sigma_p$ is such that $Var(\eta_{pt}) = 1$. For the strong spatial dependence setting, a smaller decay exponent is used instead, see also Hidalgo and Schafgans (2017). The same discussion holds for the independently drawn, strictly exogenous regressor $x_{pt}$.

Let $W_q(r)$ denote a $q$-dimensional vector of independent standard Wiener processes, and define $\widetilde{W}_q(r) = W_q(r) - r W_q(1)$. Given the use of the Bartlett kernel, the limiting distribution of the t-test is $W_1(1)/\sqrt{C_1}$, with
\[
C_q = \frac{2}{b} \int_0^1 \widetilde{W}_q(r) \widetilde{W}_q(r)'\, dr - \frac{1}{b} \int_0^{1-b} \Big[ \widetilde{W}_q(r+b) \widetilde{W}_q(r)' + \widetilde{W}_q(r) \widetilde{W}_q(r+b)' \Big] dr,
\]
where $b \in (0, 1]$ with $m_T = bT$ (see Theorem 4, Vogelsang, 2012); the limiting distribution of the Wald test is $W_q(1)' C_q^{-1} W_q(1)$. We obtain the critical values using 500,000 simulations. We generated the spatial data with $49 + T$ periods and take the last $T$ periods as our sample, using 0 as the starting value. Table 1.
Monte Carlo Simulations with Homogeneous Time Dependence
Empirical size of test for significance of $\beta$

                 Weak Spatial Dependence              Strong Spatial Dependence
                 HS (Cluster)     DK (HAC)            HS (Cluster)     DK (HAC)
(n, T)           asy   nb    wb   asy   fixb  mbb     asy   nb    wb   asy   fixb  mbb

Time Dependence: AR(1), lower value of ρ
(50, 16)         .180  .074  .134 .253  .163  .028    .177  .068  .133 .261  .176  .030
(50, 32)         .126  .067  .091 .192  .131  .042    .129  .056  .091 .210  .148  .043
(50, 64)         .080  .054  .068 .128  .092  .049    .091  .050  .076 .158  .119  .056
(100, 16)        .172  .070  .120 .249  .153  .033    .183  .073  .134 .261  .174  .031
(100, 32)        .122  .057  .094 .185  .126  .050    .121  .053  .088 .200  .143  .037
(100, 64)        .082  .056  .070 .132  .098  .064    .096  .055  .084 .153  .110  .054

Time Dependence: AR(1), higher value of ρ
(50, 16)         .320  .131  .276 .410  .258  .009    .312  .106  .257 .415  .279  .013
(50, 32)         .242  .097  .189 .327  .209  .013    .260  .093  .201 .368  .246  .022
(50, 64)         .168  .058  .107 .261  .169  .026    .174  .068  .124 .281  .195  .037
(100, 16)        .316  .125  .254 .414  .255  .007    .302  .130  .253 .400  .268  .011
(100, 32)        .252  .084  .204 .350  .224  .017    .242  .086  .173 .344  .229  .015
(100, 64)        .174  .067  .118 .249  .167  .026    .174  .069  .139 .269  .188  .033

… $\mu_t$, which is independently drawn ($\mu_t \sim IIDN(1, \cdot)$).

In Table 1, we report the empirical size for testing the significance of $\beta$ at the 5% level of significance, based on our cluster estimator of the variance of $\widetilde{\beta}$, in the columns labelled HS (Cluster). In addition to presenting the rejection rates based on the asymptotic critical values (asy), we report the empirical size based on the naïve bootstrap (nb) and the wild bootstrap (wb). The empirical sizes based on the HAC based inference procedure proposed by Driscoll and Kraay are reported in the columns labelled DK (HAC). For the HAC based inference, we provide rejection rates based on the asymptotic critical values (asy), the critical values based on the fixed-b asymptotics (fixb) of Kiefer and Vogelsang (2005), and Gonçalves' (2011) MBB (mbb). We used the parametric AR(1) plug-in method suggested by Andrews (1991) to determine the window lag $m_T$ and the block length $\ell$.

The results from Table 1 reveal that our cluster based inference performs remarkably well, even in the presence of strong cross-sectional dependence, for moderately large panels. The rejection rates based on the asymptotic critical values tend to be closer to the nominal rejection rates as $n$ and $T$ increase. The finite sample performance using these asymptotic critical values does suffer, in particular, from $T$ being small, more so when the temporal dependence is stronger. This suggests that the cluster variance's finite sample performance appears to require larger $T$ in order for us to be able to rely on the asymptotic critical values. Nevertheless, finite sample improvements in inference can be made using either frequency domain bootstrap scheme, as rejection rates based on them are typically closer to the nominal rejection rates, with the differences typically smaller as sample sizes increase.
Given that we assume the temporal dynamics to be the same for all individuals in this simulation, both bootstrap schemes are valid. The naïve bootstrap approach tends to perform better, in the sense of providing a size closer to the nominal rejection rate.

Our cluster based inference, using the naïve bootstrap for small panels, suggests large improvements in size relative to HAC based inference. While the use of fixed-b asymptotic critical values for HAC based inference does indeed improve its performance, in accordance with Vogelsang (2012), the gains in size achieved by our cluster based estimator remain significant and are larger when the temporal or spatial dependence is stronger. Our cluster based inference, however, does not necessarily perform better than the HAC based inference that uses the critical values based on Gonçalves' pairwise MBB. Her approach indeed performs very well in this setting, where the temporal dependence is homogeneous across individuals. As we will see in Table 3, relaxing this assumption, which is more realistic, does reveal a marked improvement of our cluster based performance over HAC based inference using the MBB. But even in the homogeneous setting, it should be noted that the MBB approach is sensitive to the chosen block size, and its selection here was appropriate given the imposed AR(1) temporal dependence (which is unknown in practical applications). Contrary to the MBB, we do not need to choose a block size.

In Table 2, we present the empirical power of our test for the significance of the slope, for a fixed positive value of $\beta$, for a subset of $(n, T)$ pairs, and compare the performance of our cluster-based inference procedure to the HAC based inference procedure proposed by Driscoll and Kraay as before. The results from Table 2 show that our cluster based inference has good power to reject $H_0: \beta = 0$, with power increasing in $T$ relative to $n$. The power of our cluster based inference, using the naïve bootstrap, compares well with the power of HAC based inference. Where the size distortions for HAC based inference are smallest, any apparent power loss of cluster based inference disappears. Both cluster based inference and HAC based inference have a comparable loss of power when both the spatial and temporal dependence are large.

5.2.
Simulations with Heterogeneous Time Dependence.
In our second set of simulations, we allow individual heterogeneity in the time dependence of the error and the strictly exogenous regressor.
Table 2. Monte Carlo Simulations with Homogeneous Time Dependence. Empirical power of the test for significance of β under a nonzero alternative. For each (n, T) combination, rejection rates are reported under weak and strong spatial dependence for HS (Cluster) inference with asymptotic, naïve bootstrap and wild bootstrap critical values (asy, nb, wb) and for DK (HAC) inference with asymptotic, fixed-b and MBB critical values (asy, fixb, mbb); the two panels correspond to AR(1) time dependence with two values of the autoregressive coefficient ρ.

The error u_pt is generated using various heterogeneous ARMA processes

(1 − ρ_{1,p} L)(1 + ρ_2 L + ρ_3 L²) u_pt = (1 + θ_{1,p} L + θ_2 L² + θ_3 L³) η_pt,

with L denoting the lag operator, such that, e.g., L u_pt = u_{p,t−1}; ρ_{1,p} and θ_{1,p} are individual-specific AR and MA coefficients, and (ρ_2, ρ_3) and (θ_2, θ_3) are additional non-varying higher order AR and MA coefficients. As before, η_pt characterizes the spatial dependence. A similar description holds for the independently drawn, strictly exogenous regressor, x_it, which is assumed to have the same spatial and temporal dependence as the error for simplicity. We allow the variance of u_pt to vary across individuals p = 1, ..., n.

We consider four heterogeneous specifications: Mixed AR(1), Mixed AR(1)/MA(1), Mixed AR(3), and Mixed AR(3)/MA(3). The individual-specific parameters ρ_{1,p} and θ_{1,p}, where non-zero, reflect equidistant points on a fixed interval. The full details of these heterogeneous specifications are provided at the bottom of Table 3.

In Table 3, we report the empirical size for testing the significance of β in the presence of heterogeneous time dependence for panels where n = 100 and T ranges from 64 to 256. As before, we consider both weak and strong spatial dependence scenarios. For HAC based inference we again used the parametric AR(1) plug-in method suggested by Andrews (1991) to determine the window lag m_T and the block length ℓ; this common approach recognizes neither the temporal heterogeneity nor the higher order (autoregressive) nature of the temporal dependence under consideration.
Table 3. Monte Carlo Simulations with Heterogeneous Time Dependence. Empirical size of the test for significance of β. For each specification, rejection rates are reported under weak and strong spatial dependence for HS (Cluster) inference (asy, nb, wb) and DK (HAC) inference (asy, fixb, mbb). The (n, T) = (100, 64) rows read:

                      Weak spatial dependence            Strong spatial dependence
                      HS (Cluster)     DK (HAC)          HS (Cluster)     DK (HAC)
                      asy  nb   wb    asy  fixb mbb      asy  nb   wb    asy  fixb mbb
Mixed AR(1)          .101 .055 .079  .189 .132 .062     .114 .065 .094  .207 .148 .052
Mixed AR(1)/MA(1)    .080 .049 .066  .176 .132 .083     .092 .061 .087  .185 .136 .064
Mixed AR(3)          .064 .051 .062  .147 .135 .120     .074 .055 .062  .150 .134 .125
Mixed AR(3)/MA(3)    .068 .048 .062  .134 .113 .094     .072 .058 .068  .148 .129 .106

In the representation (1 − ρ_{1,p}L)(1 + ρ_2 L + ρ_3 L²) u_pt = (1 + θ_{1,p}L + θ_2 L² + θ_3 L³) η_pt, and denoting ρ_p = (ρ_{1,p}, ρ_2, ρ_3)′ and θ_p = (θ_{1,p}, θ_2, θ_3)′, the parameterisations are as follows. Mixed AR(1): equidistant ρ_{1,p} with ρ_2 = ρ_3 = 0 and θ_p = 0 for all p. Mixed AR(1)/MA(1): the first n/2 individuals follow the AR(1) scheme, while the remaining n/2 have ρ_p = 0 and equidistant θ_{1,p} with θ_2 = θ_3 = 0. Mixed AR(3) and Mixed AR(3)/MA(3): as above, with non-zero common higher order coefficients (ρ_2, ρ_3) and (θ_2, θ_3), respectively.

The results in Table 3 show that our cluster estimator of the variance is robust to the presence of individual-specific time dependence. The rejection rates based on the asymptotic critical values in the heterogeneous AR(1) time dependence setting, with {ρ_{1,p}}_{p=1}^n on the equidistant grid described above, are comparable to the rejection rates in the homogeneous AR(1) setting. (Associated simulations consider the power to reject H_0 : β = 0 under a nonzero β.) As in the homogeneous time dependence setting, the rejection rates based on the asymptotic critical values approach the nominal rejection rate of 5% as the sample size increases. The rejection rates based on both frequency-based bootstrap schemes show that finite sample improvements in inference can be made. The improvements achieved when applying the wild bootstrap, proven to be valid in the heterogeneous time dependence scenario, are more modest than those suggested by the naïve bootstrap, which assumes homogeneous time dependence.
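The wild bootstrap scheme compared here can be sketched compactly. This is a simplified illustration in the spirit of the frequency-domain perturbation of Proposition 3, not the paper's exact algorithm: the cross-products of regressor and residual DFTs are multiplied by i.i.d. standard normal weights, one per Fourier frequency; the treatment of the zero frequency and the final scaling are our simplifications, and all names are ours.

```python
import numpy as np

def frequency_wild_bootstrap_stat(x_tilde, u_hat, rng):
    """One draw of a wild-bootstrap statistic of the form
    (nT)^{-1/2} sum_j [ sum_p J_x,p(j) J_u,p(-j) ] eta_j,
    where J_.,p(j) is the discrete Fourier transform of individual p's
    series at Fourier frequency lambda_j and eta_j ~ i.i.d. N(0,1) are
    shared across individuals. x_tilde, u_hat: (n, T) arrays of the
    demeaned regressor and the residuals."""
    n, T = x_tilde.shape
    Jx = np.fft.fft(x_tilde, axis=1) / np.sqrt(T)   # J_x,p(j), j = 0,...,T-1
    Ju = np.fft.fft(u_hat, axis=1) / np.sqrt(T)
    # For real series, J_u,p(-j) is the complex conjugate of J_u,p(j).
    cross = (Jx * np.conj(Ju)).sum(axis=0)          # sum_p J_x,p(j) J_u,p(-j)
    eta = rng.standard_normal(T - 1)                # weights for j = 1,...,T-1
    stat = (cross[1:] * eta).sum() / np.sqrt(n * T) # zero frequency skipped
    return stat.real

rng = np.random.default_rng(0)
x = rng.standard_normal((50, 64))
u = rng.standard_normal((50, 64))
draws = np.array([frequency_wild_bootstrap_stat(x, u, rng) for _ in range(200)])
print(draws.std())  # bootstrap spread used to build critical values
```

No block size or bandwidth enters anywhere: repeating the draw and taking quantiles of `draws` yields the bootstrap critical values directly.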
Our cluster based inference reveals a similar pattern when we permit higher order heterogeneous autoregressive/moving average temporal dependence, with the naïve bootstrap performing remarkably well again, suggesting that the naïve bootstrap may be robust to violations of homogeneous time dependence such as those considered in these simulations. Whereas the wild bootstrap does perform less well than expected, in particular in the presence of strong spatial dependence, the discrepancy between the rejection rates based on the two bootstrap schemes does appear to be smaller than in the homogeneous time dependence scenario.

Importantly, our cluster based inference suggests large improvements over HAC based inference in these heterogeneous time dependence settings, whether we use the asymptotic or the fixed-b asymptotic critical values or base its rejection rates on the MBB. The inferior HAC based inference may be explained by the inappropriate use of a single smoothing parameter in these heterogeneous settings, as is common practice, in addition to the fact that the parametric AR(1) plug-in method does not account for processes other than AR(1), possibly of higher (autoregressive) order. Our cluster based inference benefits from not requiring the choice of any smoothing parameter, and is therefore not subject to this deterioration in size. Aside from the ease of implementation, the robustness of our approach to the presence of individual-specific time dependence is a particularly attractive feature of our cluster robust inference.

5.3. Simulations with (Conditional) Heteroskedasticity.
Finally, we consider simulations that make use of our "modified" cluster based inference, which permits general forms of heteroskedasticity. Here the data generating process is given by

y_pt = α_t + η_p + β x́_pt + σ₁(w_p) σ₂(ϱ_t) u_pt, for p = 1, ..., n and t = 1, ..., T.

We consider both an additive and a multiplicative specification for x́_pt, in particular

x́_pt = x_pt + w_p + ϱ_t   and   x́_pt = x_pt (w_p ϱ_t).

Here u_pt and x_pt are drawn independently with weak temporal and weak cross-sectional dependence; w_p and ϱ_t are additional regressors, where w_p exhibits strong spatial dependence and ϱ_t follows an AR(1) with coefficient equal to 0.7. Without loss of generality, β = 0 again. Due to the presence of the multiplicative error σ₁(w_p) σ₂(ϱ_t) u_pt =: v_pt, this setting does permit (conditional) heteroskedasticity, with Var(v_pt | x_pt, w_p) = σ₁²(w_p) σ₂²(ϱ_t) after a normalization of the variance of u_pt to one, for simplicity. In particular, we consider

σ₁(w_p) σ₂(ϱ_t) = σ · [exp(δ₁ w_p) + 1] [exp(δ₂ ϱ_t) + 1]

for several values of δ₁ and δ₂. The severity of heteroskedasticity, which we can measure using the coefficient of variation of σ₁(w_p) σ₂(ϱ_t), increases with the values of δ₁ and δ₂. The coefficient of variation is defined as the ratio of the standard deviation of σ₁(w_p) σ₂(ϱ_t) to its mean. The average coefficient of variation of σ₁(w_p) σ₂(ϱ_t) over our simulations ranges from roughly 43% (δ₁ = 0.5) to 260% (δ₁ = 2).
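The severity measure can be computed directly. In this sketch, w_p and ϱ_t are drawn i.i.d. standard normal rather than with the strong spatial and AR(1) dependence of the actual design, and the δ values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 100, 64
w = rng.standard_normal(n)      # cross-sectional factor (spatially dependent in the paper)
rho_t = rng.standard_normal(T)  # temporal factor (AR(1) with coefficient 0.7 in the paper)

def cv_of_scale(delta1, delta2, w, rho_t):
    """Coefficient of variation (sd/mean) of the heteroskedastic scale
    sigma1(w_p) * sigma2(rho_t) = [exp(delta1*w_p)+1][exp(delta2*rho_t)+1]
    evaluated over the full (p, t) grid."""
    scale = np.outer(np.exp(delta1 * w) + 1.0, np.exp(delta2 * rho_t) + 1.0)
    return scale.std() / scale.mean()

for d1, d2 in [(0.5, 0.5), (2.0, 0.5), (2.0, 0.0)]:
    print(d1, d2, round(cv_of_scale(d1, d2, w, rho_t), 3))
```

Note that δ₂ = 0 makes the temporal factor constant (equal to 2), so the heteroskedasticity is purely cross-sectional: the case covered by the original, unmodified cluster estimator.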
The constant σ is chosen in such a way that the expected variability of σ₁(w_p) σ₂(ϱ_t) equals one, for comparability across simulations.

In Table 4, we report the empirical size for testing the significance of β in the presence of (conditional) heteroskedasticity. The average coefficient of variation for each specification across the simulations is given in the first column. We provide two sets of simulations for our cluster based inference: first we apply the original cluster based inference, which is robust to the presence of heteroskedasticity that is only cross-sectional in nature, followed by the heteroskedasticity robust cluster based inference. In the top panel, we report the results based on the additive specification of the regressor. The multiplicative specification of the regressor is in the bottom panel. As before, we compare the empirical size of our (robust) cluster based inference with the HAC based inference, in particular the version using the MBB based critical values. As we impose an AR(1) temporal dependence, we have ensured that the use of the parametric AR(1) plug-in method suggested by Andrews (1991) to determine the window lag m_T and the block length ℓ required for this approach is suitable.

The results in Table 4 show that, under the additive formulation of the regressor, the performance of the cluster based inference and the robust cluster based inference (which accounts for a non-constant σ₂(ϱ_t)) are quite similar. The robust cluster based inference is required for our first three formulations, where σ(w_p, ϱ_t) = σ₁(w_p) σ₂(ϱ_t), whereas the final formulation permits the original cluster based inference. Compared to the results in Table 1 (the case with weak spatial and temporal dependence), the rejection rates in the presence of (conditional) heteroskedasticity are only slightly larger (in part explained by the need to use estimates for σ₂(ϱ_t)). Since, under the additive formulation, the demeaned x́_pt coincides with the demeaned x_pt, a comparison with the results in Table 1 is more straightforward here than under the multiplicative formulation. Rejection rates that rely on the naïve bootstrap compare favourably with those of the HAC based inference that uses the MBB, and there does not appear to be a serious deterioration in the performance of the (robust) cluster based inference when the severity of heteroskedasticity increases, either via δ₁ or δ₂, in this setting.

Under the multiplicative formulation of the regressor, the performance of the robust cluster based inference is clearly superior in our first three specifications, where robust cluster based inference is required. The cluster based inference that does not account for (conditional) heteroskedasticity that is not purely cross-sectional in nature (i.e., in the presence of a non-constant σ₂(ϱ_t)) deteriorates quite quickly with δ₂, the parameter reflecting the severity of temporal heteroskedasticity. The rejection rates that use the robust estimator of the long-run variance, (4.6), are much closer to the nominal 5% rejection rates, whether we use the asymptotic critical values or the bootstrap algorithms. In fact, rejection rates that rely on the naïve bootstrap again compare quite well with the HAC based inference that uses the MBB, which reveals the robustness of our estimator to this type of (conditional) heteroskedasticity. This is a welcome result, given that our estimator is simple to apply and does not require the choice of any smoothing parameter.

6. CONCLUSIONS
In this paper we extend the literature on inference in panel data models in the presence ofboth temporal and cross-sectional dependence of unknown form. While a standard methodology,based on the
HAC estimator, is often invoked and used in the context of time series regression
models, in the presence of cross-sectional dependence its implementation has only recently been considered; see Kim and Sun (2013), Driscoll and Kraay (1998) or Vogelsang (2012).

Table 4. Monte Carlo Simulations with Conditional Heteroskedasticity. Empirical size of the test for significance of β under weak spatial dependence. Columns report the average coefficient of variation (CV) followed by rejection rates for the original HS (Cluster) inference (asy, nb, wb), the robust HS (Cluster) inference (asy, nb, wb) and DK (HAC) inference (asy, fixb, mbb). The (n, T) = (100, 64) rows read:

x́ additive:
  σ(w_p, ϱ_t) = σ·[exp(0. w_p)+1][exp(0. ϱ_t)+1]  CV .428:   .085 .054 .068 | .089 .054 .068 | .138 .100 .052
  σ(w_p, ϱ_t) = σ·[exp(0. w_p)+1][exp(0. ϱ_t)+1]  CV .733:   .086 .054 .068 | .089 .052 .069 | .132 .100 .055
  σ(w_p, ϱ_t) = σ·[exp(2 w_p)+1][exp(0. ϱ_t)+1]   CV 2.560:  .090 .057 .070 | .099 .055 .073 | .133 .098 .051
  σ(w_p, ϱ_t) = σ·[exp(2 w_p)+1]                  CV 2.117:  .088 .052 .070 | .093 .050 .070 | .134 .101 .056
x́ multiplicative:
  σ(w_p, ϱ_t) = σ·[exp(0. w_p)+1][exp(0. ϱ_t)+1]  CV .428:   .077 .062 .077 | .070 .057 .069 | .160 .142 .066
  σ(w_p, ϱ_t) = σ·[exp(0. w_p)+1][exp(0. ϱ_t)+1]  CV .733:   .128 .113 .125 | .076 .063 .071 | .162 .140 .051
  σ(w_p, ϱ_t) = σ·[exp(2 w_p)+1][exp(0. ϱ_t)+1]   CV 2.560:  .131 .115 .123 | .077 .063 .069 | .162 .140 .049
  σ(w_p, ϱ_t) = σ·[exp(2 w_p)+1]                  CV 2.117:  .074 .049 .067 | .086 .051 .068 | .129 .099 .063

ϱ_t follows an AR(1) with coefficient 0.7; σ is chosen to ensure that the expected σ₁²(w_p) σ₂²(ϱ_t) equals 1.

To deal with various potential caveats of the HAC estimator, we propose a cluster based estimator which is able to take into account both types of dependence and allows the temporal dependence to be heterogeneous across individuals, extending the work of Arellano (1987) and Driscoll and Kraay (1998) in a substantial way. We provide a new CLT that accounts for an unknown and general
temporal-spatial dependence structure that permits strong spatial dependence. We thereby provide primitive conditions that guarantee Kim and Sun's (2013), Driscoll and Kraay's (1998) and Gonçalves' (2011) assumption of the existence of a suitable CLT.

Our approach is based on the insightful observation that the spectral representation of the fixed effect panel data model is such that the errors become approximately temporally uncorrelated and heteroskedastic, allowing the use of a cluster estimator of the long run variance in the frequency domain. As the cluster estimator may not be reliable in small samples, and therefore may not provide a good approximation to make accurate inferences, we present and examine bootstrap schemes in the frequency domain that are also bandwidth parameter free.

Our simulation results reveal that our cluster estimator performs quite well even in the presence of strong spatial dependence. For large panels, inference based on our cluster estimator is properly sized even in the presence of heterogeneous time dependence, unlike Driscoll and Kraay's HAC based inference of cross-sectional averages that ignores such heterogeneity. Our bootstrap schemes provide small sample improvements, where inference that uses the naïve bootstrap, in particular, is well sized, and reveal large improvements in size relative to HAC based inference when fixed-b asymptotic critical values are used. Improvements over MBB based inference are more limited, except in the presence of heterogeneous time dependence. We have shown the robustness of our cluster based inference to the presence of "groupwise" heteroskedasticity. To enable us to adapt to the presence of "groupwise" heteroskedasticity that is not purely cross-sectional in nature, a simple robust cluster based inference procedure was proposed that also does not require the selection of any smoothing parameter.
Appendix A: PROOF OF MAIN RESULTS
We first introduce some notation. For a generic function h, we shall abbreviate h(λ_j) by h(j), and for generic sequences {ψ_pt}_{t=1}^T, p = 1, ..., n,

J_{ψ,·}(j) = (1/T^{1/2}) Σ_{t=1}^T (1/n) Σ_{q=1}^n ψ_qt e^{−itλ_j}.

Using expression (10.3.12) of Brockwell and Davis (1991), we also have the useful relation

J_{u,p}(j) = B_{u,p}(−j) J_{ξ,p}(j) + Y_{u,p}(j)    (A.1)
J_{x,p}(j) = B_{x,p}(−j) J_{χ,p}(j) + Y_{x,p}(j),   p = 1, ..., n,

where B_{u,p}(j) := B_{u,p}(e^{iλ_j}), B_{x,p}(j) := B_{x,p}(e^{iλ_j}) and

Y_{u,p}(j) = Σ_{ℓ=0}^∞ d_ℓ(p) e^{−iℓλ_j} (1/T^{1/2}) { Σ_{t=1−ℓ}^{T−ℓ} − Σ_{t=1}^{T} } ξ_pt e^{−itλ_j}    (A.2)
Y_{x,p}(j) = Σ_{ℓ=0}^∞ c_ℓ(p) e^{−iℓλ_j} (1/T^{1/2}) { Σ_{t=1−ℓ}^{T−ℓ} − Σ_{t=1}^{T} } χ_pt e^{−itλ_j}.

Finally, we shall make use of the well known result

E[ J_{χ,p}(j) J_{χ,q}(−k) ] = ϕ_x(p, q) I(j = k)    (A.3)
E[ J_{ξ,p}(j) J_{ξ,q}(−k) ] = ϕ_u(p, q) I(j = k).

PROOF OF THEOREM 1.
For completeness, we provide the proof using the time domain estimator, ˆ β, and the frequencydomain estimator, ˜ β . We begin with ˆ β . Without loss of generality assume that x pt is scalar. Using (2 .
2) and standardarguments, we obtain T X t =1 n X p =1 e x pt e u pt = T X t =1 n X p =1 x pt u pt − T X t =1 n X p =1 ( x · t + x p · − x ·· ) u pt − T X t =1 n X p =1 ( u · t + u p · − u ·· ) x pt + o p (cid:16) ( nT ) / (cid:17) . (A.4)Because the second and third terms on the right of ( A.
4) are handled similarly, we shall only lookat the second. Now E T X t =1 n X p =1 x · t u pt = T X t,s =1 n X p,q =1 E ( x · t x · s ) γ u,pq ( t − s ) ϕ u ( p, q )= 1 n n X p ,q ,p ,q =1 ϕ x ( p , q ) ϕ u ( p , q ) T X t,s =1 γ x,p q ( t − s ) γ u,p q ( t − s ) ≤ C Tn n X p ,q =1 | ϕ x ( p , q ) | n X p ,q =1 | ϕ u ( p , q ) | = o ( nT ) . The latter displayed expression holds true because Conditions C C T X t,s =1 sup p,q (cid:12)(cid:12) γ x,pq ( t − s ) (cid:12)(cid:12) + sup p,q (cid:12)(cid:12) γ u,pq ( t − s ) (cid:12)(cid:12) < C , (A.5)whereas Condition C
3, see also Remark 2, implies that n X q =1 ϕ u ( p, q ) n X q =1 ϕ x ( p, q ) = o ( n ) (A.6)so that n X p ,p =1 ϕ u ( p , p ) n X q ,q =1 ϕ x ( q , q ) = o (cid:0) n (cid:1) . (A.7)Proceeding similarly with P Tt =1 P np =1 x p · u pt and x ·· P Tt =1 P np =1 u pt , we can conclude using ( A. nT ) / T X t =1 n X p =1 e x pt e u pt = 1( nT ) / T X t =1 n X p =1 x pt u pt + o p (1) d → N (0 , Φ)by Lemma B.8. From here it is standard to conclude that ( nT ) / (cid:16)b β − β (cid:17) → d N (cid:0) , Σ − ΦΣ − (cid:1) . For two nonnegative sequences { α p } and (cid:8) β p (cid:9) , P α p β p < C implies that P α p P β p = o ( n ) if P (cid:0) α p + β p (cid:1) = o ( n ). NFERENCE WITHOUT SMOOTHING FOR PANELS 31
We now show that ( nT ) / (cid:16)e β − β (cid:17) → d N (cid:0) , Σ − ΦΣ − (cid:1) . Proceeding similarly as we did above,we shall examine 1( nT ) / n X p =1 T − X j =1 J x,p ( j ) J u,p ( − j ) − nT ) / n X p =1 T − X j =1 J x,p ( j ) J u, · ( − j ) (A.8) − nT ) / n X p =1 T − X j =1 J x,p ( j ) J u, · ( − j ) .The first term of ( A.
8) converges in distribution to N (0 , Φ) by Lemma B.9. So, to complete theproof it suffices to show that the last two terms of ( A.
8) are o p (1). We examine the second termonly, with the third term being handled similarly. By standard algebra and ( A. n / n X p,q =1 T / T − X j =1 B x,p ( j ) B u,q ( j ) J χ,p ( j ) J ξ,q ( − j )+ 1 n / n X p,q =1 T / T − X j =1 B x,p ( j ) J χ,p ( j ) {J u,q ( − j ) − B u,q ( j ) J ξ,q ( − j ) } + 1 n / n X p,q =1 T / T − X j =1 B u,p ( j ) J ξ,q ( − j ) {J x,q ( − j ) − B x,q ( j ) J χ,p ( j ) } (A.9)+ 1 n / n X p,q =1 T / T − X j =1 ( J x,q ( − j ) − B x,q ( j ) J χ,p ( j )) × ( J u,q ( − j ) − B u,q ( j ) J ξ,q ( − j )) .We examine the second term of ( A.
9) first. Using ( A. T n n X p ,p ,q ,q =1 ϕ u ( q , q ) ϕ x ( p , p ) 1 T T − X j =1 sup p ,p | f x,p p ( j ) | = 1 T n n X q ,q =1 ϕ u ( q , q ) n X p ,p =1 ϕ x ( p , p )= o (cid:0) T − (cid:1) ,by Lemma B.1 and ( A. A.
9) are o p (cid:0) T − / (cid:1) . So tocomplete the proof we need to examine the first term of ( A. A. (cid:0) sup p,q | f x,pq ( j ) | + sup p,q | f u,pq ( j ) | (cid:1) ≤ C , bounded by1 T n T − X j =1 sup p,q | f x,pq ( j ) | | f u,pq ( j ) | n X p ,p =1 ϕ x ( p , p ) n X q ,q =1 ϕ u ( q , q ) = o (1) .This concludes the proof of the theorem. (cid:3) PROOF OF PROPOSITION 1.
We begin with part ( a ). We need to show that, for any k , k = 1 , ..., k ,˘Φ k ,k = 1 T T − X j =1 n / n X p =1 J e x,p,k ( j ) J b u,p ( − j ) n / n X p =1 J e x,p,k ( − j ) J b u,p ( j ) P → Φ k ,k .To simplify the notation we shall assume that k = 1. Now, after observing that J b u,p ( j ) = J e u,p ( j ) − (cid:16)e β − β (cid:17) J e x,p ( j ) ,we have that ˘Φ =: ˘Φ , is1 T T − X j =1 n / n X p =1 J e x,p ( j ) J u,p ( − j ) n / n X p =1 J e x,p ( − j ) J u,p ( j ) +2 (cid:16)e β − β (cid:17) T T − X j =1 n / n X p =1 I e x,p ( j ) n / n X p =1 J e x,p ( − j ) J u,p ( j ) + (cid:16)e β − β (cid:17) T T − X j =1 n / n X p =1 I e x,p ( j ) . (A.10)The third term of ( A.
10) is O p (cid:0) T − (cid:1) by Lemma B.7 and e β − β = O p (cid:16) ( nT ) − / (cid:17) . The secondterm of ( A.
10) is also o p (1) by Cauchy-Schwarz’s inequality if we show that the first term convergesin probability to Φ. Since J e x,p ( j ) = J x,p ( j ) − J x, · ( j ) , (A.11)this result holds true if we show that1 T T − X j =1 n / n X p =1 J x,p ( j ) J u,p ( − j ) n / n X p =1 J x,p ( − j ) J u,p ( j ) P → Φ (A.12)and 1 T T − X j =1 n / n X p =1 J x, · ( j ) J u,p ( − j ) n / n X p =1 J x,p ( − j ) J u,p ( j ) + 1 T T − X j =1 n / n X p =1 J x, · ( j ) J u,p ( − j ) n / n X p =1 J x, · ( − j ) J u,p ( j ) = o p (1) . (A.13)First we examine ( A. A. T T − X j =1 n X p =1 E ( J x, · ( j ) J x,p ( − j )) E ( J u,p ( − j ) J u,p ( j ))= CT n T − X j =1 n X p =1 n X r =1 ϕ x ( p, r ) n X q =1 ϕ u ( p, q ) (cid:26) CT (cid:27) . NFERENCE WITHOUT SMOOTHING FOR PANELS 33 using Lemma B.1, after we observe that the factor in brackets is n / J x, · ( j ) J u, · ( − j ). Using( A. , we conclude that the last displayed expression is o (1). Next, we observe that Lemma B.5implies, for instance, that E ( J u, · ( − j ) J u,p ( j ) J u, · ( − k ) J u,q ( k )) − E ( J u, · ( − j ) J u,p ( j ))= ϕ u ( p, q ) 1 n n X p ,q =1 ϕ u ( p , q ) (cid:26) ( j = k ) + CT (cid:27) .The variance of the first term on the left of ( A. , therefore, is bounded by1 T T − X j,k =1 n X p,q =1 ϕ ( p, q ) 1 n n X p ,q =1 ϕ u ( p , q ) n X p ,q =1 ϕ x ( p , q ) (cid:26) ( j = k ) + CT (cid:27) = o (cid:18) T (cid:19) using Condition C A. A.
13) is o p (1). The sameconclusion holds true for the second term of ( A. a ), it remains to show ( A. A. A. A. − ( A.
16) are o p (1);1 nT T − X j =1 n X p =1 B x,p ( − j ) B u,p ( j ) J χ,p ( j ) J ξ,p ( − j ) n X p =1 B x,p ( − j ) B u,p ( j ) J χ,p ( j ) J ξ,p ( − j ) − Φ, (A.14)1 nT T − X j =1 n X p =1 B x,p ( − j ) J χ,p ( j ) Y u,p ( − j ) n X p =1 B u,p ( j ) J ξ,p ( − j ) Y x,p ( j ) , (A.15)1 nT T − X j =1 n X p =1 Y x,p ( j ) Y u,p ( − j ) n X p =1 Y u,p ( − j ) Y x,p ( j ) (A.16)We begin by showing that ( A.
14) is o p (1). First, the expectation of ( A.
14) is1 n n X p,q =1 ϕ ( p, q ) 1 T T − X j =1 B x,p ( − j ) B x,q ( j ) B u,p ( j ) B u,q ( − j ) − Φ = O (cid:0) T − (cid:1) because, by continuous differentiability of f x,pq ( − λ ) f u,pq ( λ ), we have that1 T T − X j =1 B x,p ( − j ) B x,q ( j ) B u,p ( j ) B u,q ( − j ) − Z π f x,pq ( − λ ) f u,pq ( λ ) dλ = O (cid:0) T − (cid:1) . Next, because ( A.
3) implies that E { ( J χ,p ( j ) J ξ,p ( − j ) J χ,q ( − j ) J ξ,q ( j ) − E ( · ))( J χ,p ( − k ) J ξ,p ( k ) J χ,q ( k ) J ξ,q ( − k ) − E ( · )) } = ϕ x ( p , p ) ϕ x ( q , q ) ϕ u ( q , p ) ϕ u ( p , q ) ( j = k )+ ϕ x ( p , p ) ϕ x ( q , q ) ϕ u ( p , p ) ϕ u ( q , q ) ( j = k )+2 ϕ x ( p , p ) ϕ x ( q , q ) ∞ X ℓ =1 c ℓ ( p ) c ℓ ( p ) c ℓ ( q ) c ℓ ( q ) ( j = k )+ ∞ X ℓ =1 c ℓ ( p ) c ℓ ( p ) c ℓ ( q ) c ℓ ( q ) ∞ X ℓ =1 d ℓ ( p ) d ℓ ( p ) d ℓ ( q ) d ℓ ( q ) (cid:16) ( j = k ) + κ ,ξ κ ,χ T (cid:17) ,standard algebra yields that the second moment of ( A.
14) is o (1), when recognizing ∞ X ℓ =1 d ℓ ( p ) d ℓ ( p ) d ℓ ( q ) d ℓ ( q ) ≤ ∞ X ℓ =1 d ℓ ( p ) d ℓ ( p ) ∞ X ℓ =1 d ℓ ( q ) d ℓ ( q )= ϕ u ( p , p ) ϕ u ( q , q ) (A.17) ∞ X ℓ =1 c ℓ ( p ) c ℓ ( p ) c ℓ ( q ) c ℓ ( q ) ≤ ∞ X ℓ =1 c ℓ ( p ) c ℓ ( p ) ∞ X ℓ =1 c ℓ ( q ) c ℓ ( q )= ϕ x ( p , p ) ϕ x ( q , q ) (A.18)and n X p =1 ϕ x ( p , p ) ϕ u ( p , q ) ≤ n X p =1 ϕ /αx ( p , p ) α n X p =1 ϕ / − αu ( p , q ) − α = O (1) (A.19)since P np =1 ϕ x ( p , p ) ϕ u ( p , p ) = O (1) implies ϕ x ( p , p ) = O (cid:0) p − α (cid:1) and ϕ u ( p , p ) = O (cid:16) p − β (cid:17) with α + β > A. p |B x,p ( − j ) B u,p ( j ) | < C , the second moment of ( A.
15) isbounded by 1( nT ) T − X j,k =1 n X p ,q ,p ,q =1 | E {J χ,p ( j ) J χ,q ( − k ) Y x,p ( j ) Y x,q ( − k ) } E { Y u,p ( − j ) Y u,q ( k ) J ξ,p ( − j ) J ξ,q ( k ) }| .From here, proceeding as with ( A.
14) but using Lemmas B.1 and B.2 as needed, we easily concludethat ( A.
15) = o p (1) by Markov’s inequality, since for instance E {J χ,p ( j ) J χ,q ( k ) Y x,p ( − j ) Y x,q ( − k ) } = E ( J χ,p ( j ) J χ,q ( k )) E (Y x,p ( − j ) Y x,q ( − k ))+ E ( J χ,p ( j ) Y x,p ( − j )) E ( J χ,q ( k ) Y x,q ( − k ))+ E ( J χ,p ( j ) Y x,q ( − k )) E ( J χ,q ( k ) Y x,p ( − j ))+ cum ( J χ,p ( j ) ; J χ,q ( k ) ; Y x,p ( − j ) ; Y x,q ( − k )) . NFERENCE WITHOUT SMOOTHING FOR PANELS 35
The proof of part ( a ) now concludes since ( A.
16) = o p (1) by standard algebra and Lemmas B.1and B.2.Part ( b ). Because the continuous differentiability of f x,p ( λ ), we have that T − P Tj =1 f x,p ( j ) → R π f x,p ( λ ) dλ =: Σ x,p , see Brillinger (1981 , p. 15), so we can conclude by Lemma B.6 and ( A. nT n X p =1 T − X j =1 I x, · ( j ) and 2 nT n X p =1 T − X j =1 J x, · ( j ) J x,p ( j ) = o p (1) .are both o p (1). However this is the case proceeding similarly as with the proof of ( A. (cid:3) PROOF OF THEOREM 2.
Because Lemma B.7 implies that ( nT ) − P np =1 P T − j =1 I e x,p ( j ) P → Σ x and abbreviating b f u ( j ) = n P nq =1 I b u,q ( j ), it suffices to show( i ) 1 T / n / n X p =1 T − X j =1 J e x,p ( j ) (cid:16) b f / u ( j ) − f / u ( j ) (cid:17) J u ∗ ,p ( − j ) = o p ∗ (1) (A.20)( ii ) 1 T / n / n X p =1 T − X j =1 J e x,p ( λ j ) f / u ( j ) J u ∗ ,p ( − j ) d ∗ → N (0 , Φ) (A.21)We begin with part ( ii ). The left hand side of ( A.
21) is1 T / n / n X p =1 T − X j =1 f / u ( j ) B x,p ( j ) J χ,p ( j ) J u ∗ ,p ( − j ) (A.22)+ 1 T / n / n X p =1 T − X j =1 f / u ( j ) (cid:0) J e x,p ( j ) − B x,p ( j ) J χ,p ( j ) (cid:1) J u ∗ ,p ( − j ) .The second (bootstrap) moment of the second term of ( A.
22) is1 nT n X p,q =1 T − X j =1 f u ( j ) b σ u,pq (cid:0) J e x,p ( j ) − B x,p ( j ) J χ,p ( j ) (cid:1) (cid:0) J e x,q ( − j ) − B x,q ( − j ) J χ,q ( − j ) (cid:1) (A.23)using E ∗ ( J u ∗ ,p ( j ) J u ∗ ,q ( − k )) = b σ u,pq ( j = k ) ; b σ u,pq = 1 T T X t =1 b u pt b u qt , (A.24)By Lemma B.1 and ( A. E (cid:0) J e x,p ( j ) − B x,p ( j ) J χ,p ( j ) (cid:1) (cid:0) J e x,q ( − j ) − B x,p ( − j ) J χ,p ( − j ) (cid:1) = CT ϕ x ( p, q ) ; b σ u,pq = ϕ u ( p, q ) (cid:16) O p (cid:16) T − / (cid:17)(cid:17) . Hence it easily follows that the expected value of equation ( A.
23) is o (1) and consequently thesecond term of ( A.
22) is o p ∗ (1), after we observe that ( A.
23) is a nonnegative expression.
Turning to the first term of ( A. , let us denoteΞ ∗ s,t ( n ) = 1 n / n X p =1 χ ps u ∗ pt ; G ( j ) =: B x,p ( j ) f / u,p ( j ) . (A.25)Standard algebra yields that the first term of ( A.
22) is1 e T / T T X t,s =1 Ξ ∗ s,t ( n ) e T X j =1 G ( j ) e i ( t − s ) λ j = 1 T / T X t,s =1 φ ( | t − s | ) Ξ ∗ s,t ( n ) + CT / T X t,s =1 Ξ ∗ s,t ( n ) , (A.26)where to simplify the notation we assume that ϕ x ( p, p ) = ϕ u ( p, p ) = 1 for all p = 1 , ..., n and φ ( r ) is the r th Fourier coefficient of G ( j ). Hence the right hand side of ( A.
26) can now be writtenas φ (0) T / T − ℓ X t =1 n / n X p =1 χ pt u ∗ pt + T − X ℓ =1 φ ( ℓ ) T / T − ℓ X t =1 n / n X p =1 χ pt u ∗ p,t + ℓ + n X p =1 χ p,t + ℓ u ∗ pt . (A.27)Because φ ( r ) = O (cid:0) r − (cid:1) by Conditions C C , given the independence of the sequences ofrandom variables n − / P np =1 χ pt u ∗ p,t + ℓ and n − / P np =1 χ p,t + ℓ u ∗ pt in t , to complete the proof ofpart ( ii ), it suffices to to show thatΛ ∗ t,n =: 1 n / n X p =1 χ pt u ∗ p,t + ℓ d ∗ → N , T − ℓT lim n →∞ n n X p,q =1 ϕ ( p, q ) .The second bootstrap moment of Λ ∗ t,n is1 n n X p,q =1 χ pt χ qt T T − ℓ X r =1 b u p,r + ℓ b u q,r + ℓ = 1 n n X p,q =1 χ pt χ qt T T − ℓ X r =1 u p,r + ℓ u q,r + ℓ (1 + o p (1)) ,by standard algebra and Theorem 1. Now, Conditions C C n n X p,q =1 ( E (cid:0) χ pt χ qt (cid:1) T T − ℓ X r =1 E ( u p,r + ℓ u q,r + ℓ ) ) = T − ℓT n n X p,q =1 ϕ ( p, q ) . NFERENCE WITHOUT SMOOTHING FOR PANELS 37
Moreover, because E ( u p ,t + ℓ u q ,t + ℓ u p ,s + ℓ u q ,s + ℓ ) = E ( u p t u q t u p s u q s ) E n n X p,q =1 χ pt χ qt T T − ℓ X t =1 u pt u qt = 1 n n X p ,q ,p ,q =1 E (cid:0) χ p t χ q t χ p t χ q t (cid:1) T T − ℓ X t,s =1 E ( u p t u q t u p s u q s )= 1 n n X p ,q ,p ,q =1 T T − ℓ X t,s =1 (cid:8) E (cid:0) χ p t χ q t (cid:1) E (cid:0) χ p t χ q t (cid:1) + E (cid:0) χ p t χ q t (cid:1) E (cid:0) χ p t χ q t (cid:1) + E (cid:0) χ p t χ p t (cid:1) E (cid:0) χ q t χ q t (cid:1) + cum (cid:0) χ p t ; χ q t ; χ p t ; χ q t (cid:1)(cid:9) × { E ( u p t u q t ) E ( u p s u q s ) + E ( u p t u q s ) E ( u p s u q t )+ E ( u p t u p s ) E ( u q t u q s ) + cum ( u p t ; u q t ; u p s ; u q s ) } = 1 n T n X p ,q ,p ,q =1 T − ℓ X t,s =1 E (cid:0) χ p t χ q t (cid:1) E (cid:0) χ p t χ q t (cid:1) E ( u p t u q t ) E ( u p s u q s ) (1 + o (1))= T − ℓT n n X p,q =1 ϕ ( p, q ) (1 + o (1))because E ( u ps u qr ) = ϕ u ( p, q ) γ u,pq ( r − s ), P Tr,s =1 (cid:12)(cid:12) γ u,pq ( r − s ) (cid:12)(cid:12) = O ( T ) and ( A. E ∗ (cid:12)(cid:12) Λ ∗ t,n (cid:12)(cid:12) − T − ℓT n P np,q =1 ϕ ( p, q ) = o p (1).Thus, it remains to show the Lindeberg’s condition to complete the proof of part ( ii ). To thatend, it suffices to show that 1 n n X p =1 E ∗ (cid:0) χ pt u ∗ p,t + ℓ (cid:1) = o p (1) .The left hand side of the last displayed expression is1 n n X p =1 (cid:13)(cid:13) χ pt (cid:13)(cid:13) T T − ℓ X t =1 b u p,t + ℓ = 1 n n X p =1 (cid:13)(cid:13) χ pt (cid:13)(cid:13) T T − ℓ X t =1 u p,t + ℓ (1 + o p (1))= O p (cid:0) n − (cid:1) ,which completes the proof of part ( ii ).Next we prove part ( i ). The left side of ( A.
20) is1 T / n / n X p =1 T − X j =1 (cid:16) b f / u ( j ) − f / u ( j ) (cid:17) B x,p ( j ) J χ,p ( j ) J u ∗ ,p ( − j ) (A.28)+ 1 T / n / n X p =1 T − X j =1 (cid:16) b f / u ( j ) − f / u ( j ) (cid:17) (cid:0) J e x,p ( j ) − B x,p ( j ) J χ,p ( j ) (cid:1) w u ∗ ,p ( − j ) .We shall only show explicitly that the first term of ( A.
28) is o p ∗ (1), the second term followingsimilarly if not easier proceeding as with the second term of ( A.
22) and Lemma B.1. Now by ( A. A.
28) has second bootstrap moment given by1 T T X t =1 nT T − X j =1 n b f / u ( j ) − f / u ( j ) o f x ( j ) n X p,q =1 b u pt b u qt J χ,p ( j ) J χ,q ( − j ) .Because the last displayed expression is a nonnegative expression, to show that it is o p (1), itsuffices to show that its first moment converges to zero. To that end, we first observe that n b f / u ( j ) − f / u ( j ) o ≤ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n n X q =1 I b u,q ( j ) − f u ( j ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) = o p (1) (A.29)using standard arguments and Theorem 1 under Condition C
4. On the other hand, proceedingsimilarly as in Proposition 1, we obtain easily that1 n n X p,q =1 b u pt J χ,p ( j ) b u qt J χ,q ( − j ) = 1 n n X p,q =1 u pt J χ,p ( j ) u qt J χ,q ( − j ) (1 + o p (1)) ,and thus the proof of part ( i ), and thereby the theorem, is completed if E n X p,q =1 u pt u qt J χ,p ( j ) J χ,q ( − j ) = O ( n ) .But the left hand side of the last displayed expression is n X p,q =1 ϕ u ( p, q ) 1 T T X t,s =1 E ( x pt x qs ) e − i ( t − s ) λ j = C n X p,q =1 ϕ u ( p, q ) ϕ x ( p, q ) = O ( n )by Condition C , which completes the proof of the theorem. (cid:3) PROOF OF PROPOSITION 2.
As with the proof of Proposition 1, we shall assume that k = 1. Now, after observing that J u ∗ p ( j ) = J e u ∗ ,p ( j ) − (cid:16)e β ∗ − e β (cid:17) J e x,p ( j ) ,we have that ˘Φ ∗ equals the sum of the following expressions ( A. − ( A. T T − X j =1 b f u ( j ) n / n X p =1 J e x,p ( j ) J u ∗ ,p ( − j ) n / n X p =1 J e x,p ( − j ) J u ∗ ,p ( j ) − ˘Φ (A.30)2 (cid:16)e β ∗ − e β (cid:17) T T − X j =1 b f / u ( j ) n / n X p =1 I e x,p ( j ) n / n X p =1 J e x,p ( − j ) J u ∗ ,p ( j ) (A.31) (cid:16)e β ∗ − e β (cid:17) T T − X j =1 n / n X p =1 I e x,p ( j ) . (A.32)That ( A.
32) is o p ∗ (1) follows straightforwardly by Theorem 2 and Lemma B.7 and ( A.
31) is o p ∗ (1) by Cauchy-Schwarz’s inequality if we show that ( A.
30) is o p ∗ (1). To that end, using ( A. and ( A. E ∗ ( A.
30) = 1 nT T − X j =1 b f u ( j ) n X p,q =1 J x,p ( j ) J x,q ( − j ) b σ u,pq − ˘Φ+ 1 nT T − X j =1 b f u ( j ) n X p,q =1 J x, · ( j ) J x, · ( − j ) b σ u,pq . Because b σ u,pq = ϕ u ( p, q ) (1 + o p (1)) and ˘Φ − Φ = o p (1) by Proposition 1, proceeding as in theproof of Theorem 2 part ( i ), it suffices to examine the behaviour of1 nT T − X j =1 f u ( j ) n X p,q =1 { ϕ u ( p, q ) J x,p ( j ) J x,q ( − j ) } − Φ (A.33)+ 1 T T − X j =1 f u ( j ) J x, · ( j ) J x, · ( − j ) 1 n n X p,q =1 ϕ u ( p, q ) . (A.34)( A.
34) is o p (1) as we now show. As it is a nonnegative sequence, it suffices to show that its firstmean converges to zero. Using ( A.
1) and then Lemmas B.1 and B.2, we have that its first momentis proportional to 1 n n X p,q =1 ϕ x ( p, q ) 1 n n X p,q =1 ϕ u ( p, q ) = o (1)by ( A. A.
33) is o (1), it then remains to show that the (boot-strap) variance of ( A. , with J e x,p ( j ) replaced by J x,p ( j ) , converges to zero. Using ( A. T T − X j =1 b f u ( j ) n n X p ,q p ,q =1 J x,p ( j ) J x,q ( − j ) J x,p ( − j ) J x,q ( j ) b σ u,p p b σ u,q q + κ ,ξ (1 + o p (1)) T n T − X j,k =1 n b f u ( j ) b f u ( k ) × n X p ,q p ,q =1 ϕ u ( p , q ) ϕ u ( p , q ) J x,p ( j ) J x,q ( − j ) J x,p ( − k ) J x,q ( k ) , with Lemma B.4 guaranteeing cum ∗ (cid:0) u ∗ p t , u ∗ q t , u ∗ p t , u ∗ q t (cid:1) = κ ,ξ ϕ u ( p , q ) ϕ u ( p , q ) (1 + o p (1)) .From here we proceed as before after noticing that b σ u,p p = ϕ u ( p , p ) (1 + o p (1)). This com-pletes the proof of the proposition. (cid:3) PROOF OF PROPOSITION 3.
As with the proof of Theorem 2, it suffices to show that
$$\frac{1}{T^{1/2}n^{1/2}}\sum_{j=1}^{T-1}\sum_{p=1}^{n}J_{\tilde x,p}(j)J_{\hat u,p}(-j)\,\eta_j\ \overset{d^*}{\to}\ N(0,\Phi). \tag{A.35}$$
Because the $\eta_j$ are normally distributed, it suffices to show that
$$E^*\Bigg|\frac{1}{T^{1/2}n^{1/2}}\sum_{j=1}^{T-1}\sum_{p=1}^{n}J_{\tilde x,p}(j)J_{\hat u,p}(-j)\,\eta_j\Bigg|^{2}\ \overset{P}{\to}\ \Phi.$$
This is the case, as we now show. The left hand side of the last displayed expression is
$$\frac{1}{nT}\sum_{j=1}^{T-1}\sum_{p,q=1}^{n}J_{\tilde x,p}(j)J_{\tilde x,q}(-j)J_{\hat u,p}(-j)J_{\hat u,q}(j)=\frac{1}{nT}\sum_{j=1}^{T-1}\sum_{p,q=1}^{n}J_{\tilde x,p}(j)J_{\tilde x,q}(-j)J_{u,p}(-j)J_{u,q}(j)+o_p(1)$$
as $\hat u_{pt}-u_{pt}=\big(\tilde\beta-\beta\big)x_{pt}$ and $\tilde\beta-\beta=O_p\big(T^{-1/2}n^{-1/2}\big)$. Using (A.11) and proceeding as in the proof of part ($a$) of Proposition 1, we now have that the right hand side is
$$\frac{1}{nT}\sum_{j=1}^{T-1}\sum_{p,q=1}^{n}J_{x,p}(j)J_{x,q}(-j)J_{u,p}(-j)J_{u,q}(j)+\frac{2}{nT}\sum_{j=1}^{T-1}\sum_{p,q=1}^{n}J_{x,p}(j)J_{x,\cdot}(-j)J_{u,p}(-j)J_{u,q}(j)$$
$$+\frac{1}{nT}\sum_{j=1}^{T-1}\sum_{p,q=1}^{n}J_{x,\cdot}(j)J_{x,\cdot}(-j)J_{u,p}(-j)J_{u,q}(j)+o_p(1).$$
The first term converges in probability to $\Phi$, whereas the second term follows by Cauchy-Schwarz's inequality if the third term is also $o_p(1)$. But that term is $o_p(1)$ proceeding as in the proof of part ($a$) of Proposition 1 using Lemma B.5. Again observe that the expression is nonnegative. This concludes the proof. $\Box$

Appendix B: LEMMAS
First, denoting $\Upsilon_{\ell,p}(j)=\big\{\sum_{t=1-\ell}^{T-\ell}-\sum_{t=1}^{T}\big\}\xi_{pt}e^{-it\lambda_j}$ and $\Psi_{\ell,p}(j)=\big\{\sum_{t=1-\ell}^{T-\ell}-\sum_{t=1}^{T}\big\}\chi_{pt}e^{-it\lambda_j}$, we have that $Y_{u,p}(j)$ and $Y_{x,p}(j)$ given in (A.2) can be decomposed as
$$Y_{u,p}(j)=Y^{(1)}_{u,p}(j)+Y^{(2)}_{u,p}(j) \tag{B.1}$$
$$Y_{x,p}(j)=Y^{(1)}_{x,p}(j)+Y^{(2)}_{x,p}(j),$$
where
$$Y^{(1)}_{u,p}(j)=\frac{1}{T^{1/2}}\sum_{\ell=0}^{T}d_\ell(p)e^{-i\ell\lambda_j}\Upsilon_{\ell,p}(j);\qquad Y^{(2)}_{u,p}(j)=\frac{1}{T^{1/2}}\sum_{\ell=T+1}^{\infty}d_\ell(p)e^{-i\ell\lambda_j}\Upsilon_{\ell,p}(j)$$
$$Y^{(1)}_{x,p}(j)=\frac{1}{T^{1/2}}\sum_{\ell=0}^{T}c_\ell(p)e^{-i\ell\lambda_j}\Psi_{\ell,p}(j);\qquad Y^{(2)}_{x,p}(j)=\frac{1}{T^{1/2}}\sum_{\ell=T+1}^{\infty}c_\ell(p)e^{-i\ell\lambda_j}\Psi_{\ell,p}(j).$$

Lemma B.1.
Assuming C and C , we have that for $p,q=1,..,n$ and some finite $\upsilon_u,\upsilon_x>0$,
$$E\big(Y^{(1)}_{w,p}(j)Y^{(1)}_{w,q}(-k)\big)=\frac{\upsilon_w\,\varphi_w(p,q)}{T};\qquad w=:u \text{ or } x \tag{B.2}$$
$$E\big(Y^{(2)}_{w,p}(j)Y^{(2)}_{w,q}(-k)\big)=o\big(T^{-1}\big)\,\varphi_w(p,q)\,\mathbb{1}(j=k);\qquad w=:u \text{ or } x. \tag{B.3}$$
Proof.
We examine only the case when $w=:u$, with the proof for $w=:x$ similarly handled. We begin with (B.3). Because for $\ell\ge T$, $E(\Upsilon_{\ell,p}(j)\Upsilon_{\ell,q}(-k))=2T\varphi_u(p,q)\mathbb{1}(j=k)$, we obtain that the left hand side of (B.3) is
$$2\sum_{\ell_1,\ell_2=T+1}^{\infty}d_{\ell_1}(p)d_{\ell_2}(q)\,\varphi_u(p,q)\,\mathbb{1}(j=k).$$
The conclusion then follows because Condition C implies $\sum_{\ell=T+1}^{\infty}\sup_p|d_\ell(p)|=o(T^{-1})$. Next we consider (B.2), whose left hand side is
$$\frac{1}{T}\sum_{\ell_1,\ell_2=0}^{T}d_{\ell_1}(p)d_{\ell_2}(q)E(\Upsilon_{\ell_1,p}(j)\Upsilon_{\ell_2,q}(-k))=\frac{\varphi_u(p,q)\,\upsilon_u}{T}$$
since $\Upsilon_{\ell,p}(j)=\big\{\sum_{t=1-\ell}^{0}-\sum_{t=T-\ell+1}^{T}\big\}\xi_{pt}e^{-it\lambda_j}$ when $\ell\le T$, so that
$$E(\Upsilon_{\ell,p}(j)\Upsilon_{\ell,q}(-k))=2\varphi_u(p,q)\sum_{t=1}^{\ell}e^{it(\lambda_j-\lambda_k)}.$$
We now conclude because $\sum_{\ell=0}^{\infty}\ell\sup_p|d_\ell(p)|<\infty$ by Condition C. $\Box$

Lemma B.2.
Assuming C and C , we have that for $p,q=1,..,n$,

($a$)
$$E\big(Y^{(1)}_{u,p}(j)J_{\xi,q}(-k)\big)=\varphi_u(p,q)\,\frac{1}{T}\sum_{\ell=0}^{T}d_\ell(p)e^{-i\ell\lambda_j}\sum_{t=1}^{\ell}e^{it\lambda_{j-k}}$$
$$E\big(Y^{(2)}_{u,p}(j)J_{\xi,q}(-k)\big)=\varphi_u(p,q)\,\mathbb{1}(j=k)\,o\big(T^{-1/2}\big)$$

($b$)
$$E\big(Y^{(1)}_{x,p}(j)J_{\chi,q}(-k)\big)=\varphi_x(p,q)\,\frac{1}{T}\sum_{\ell=0}^{T}c_\ell(p)e^{-i\ell\lambda_j}\sum_{t=1}^{\ell}e^{it\lambda_{j-k}}$$
$$E\big(Y^{(2)}_{x,p}(j)J_{\chi,q}(-k)\big)=\varphi_x(p,q)\,\mathbb{1}(j=k)\,o\big(T^{-1/2}\big).$$

Proof. As in the proof of Lemma B.1 we shall only show part ($a$). To that end, we first notice that Condition C implies
$$E(\Upsilon_{\ell,p}(j)J_{\xi,q}(-k))=\frac{\varphi_u(p,q)}{T^{1/2}}\Bigg(\mathbb{1}(j=k)\,\mathbb{1}(\ell\ge T)+\sum_{t=T-\ell+1}^{T}e^{it\lambda_{j-k}}\,\mathbb{1}(\ell<T)\Bigg).$$
From here the proof concludes by standard algebra. $\Box$

Lemma B.3.
Assuming C and C , we have that
$$\big|\mathrm{cum}\big(\xi_{p_1t};\xi_{p_2t};\xi_{p_3t};\xi_{p_4t}\big)\big|\le|\kappa_{4,\xi}|\,\varphi_u(p_1,p_2)\,\varphi_u(p_3,p_4)$$
$$\big|\mathrm{cum}\big(\chi_{p_1t};\chi_{p_2t};\chi_{p_3t};\chi_{p_4t}\big)\big|\le|\kappa_{4,\chi}|\,\varphi_x(p_1,p_2)\,\varphi_x(p_3,p_4). \tag{B.4}$$

Proof. Using inequality (A. ), the proof follows easily since by definition
$$\mathrm{cum}\big(\xi_{p_1t};\xi_{p_2t};\xi_{p_3t};\xi_{p_4t}\big)=\kappa_{4,\xi}\sum_{\ell=1}^{\infty}a_\ell(p_1)a_\ell(p_2)a_\ell(p_3)a_\ell(p_4).$$
The proof is similar for the second expression in (B.4), where (A.18) is used instead of (A. ). $\Box$

Lemma B.4.
Assuming C and C , for some $\tau>1$,
$$\big|\mathrm{cum}(u_{p_1t_1};u_{p_2t_2};u_{p_3t_3};u_{p_4t_4})\big|\le\frac{C|\kappa_{4,\xi}|\,\varphi_u(p_1,p_2)\varphi_u(p_3,p_4)}{(t_2-t_1)^{\tau}(t_3-t_2)^{\tau}(t_4-t_3)^{\tau}}$$
$$\big|\mathrm{cum}(x_{p_1t_1};x_{p_2t_2};x_{p_3t_3};x_{p_4t_4})\big|\le\frac{C|\kappa_{4,\chi}|\,\varphi_x(p_1,p_2)\varphi_x(p_3,p_4)}{(t_2-t_1)^{\tau}(t_3-t_2)^{\tau}(t_4-t_3)^{\tau}}.$$

Proof. As in the proof of Lemma B.3, we handle the first displayed inequality only. Without loss of generality we take $t_1\le t_2\le t_3\le t_4$. Condition C implies
$$\mathrm{cum}(u_{p_1t_1};u_{p_2t_2};u_{p_3t_3};u_{p_4t_4})=\sum_{k=1}^{\infty}d_k(p_1)d_{k+t_2-t_1}(p_2)d_{k+t_3-t_1}(p_3)d_{k+t_4-t_1}(p_4)\times\mathrm{cum}\big(\xi_{p_1t};\xi_{p_2t};\xi_{p_3t};\xi_{p_4t}\big).$$
From here we conclude using Lemma B.3 and the fact that Condition C implies $\sup_p|d_k(p)|=O(k^{-\tau})$ for some $\tau>1$. $\Box$

Lemma B.5.
Assuming C and C , we have that for $w=:u$ or $x$,
$$E\big(J_{w,p_1}(j)J_{w,p_2}(-k)\big)=f_{w,p_1p_2}(j)\,\varphi_w(p_1,p_2)\Big\{\mathbb{1}(j=k)+\frac{C}{T}\Big\} \tag{B.5}$$
and
$$E\big(J_{w,p_1}(j)J_{w,p_2}(-j)J_{w,p_3}(k)J_{w,p_4}(-k)\big)=\varphi_w(p_1,p_2)\varphi_w(p_3,p_4)\Big\{\mathbb{1}(j=k)+\frac{C}{T}\Big\}. \tag{B.6}$$

Proof. Consider $w=:u$, say. By (A. ), the left hand side of (B.5) is
$$E\Big(\big(B_{u,p_1}(-j)J_{\xi,p_1}(j)+Y_{u,p_1}(j)\big)\big(B_{u,p_2}(k)J_{\xi,p_2}(-k)+Y_{u,p_2}(-k)\big)\Big),$$
which using (A.3) equals the right hand side of (B.5) by Lemmas B.1 and B.2. Next, the left hand side of (B.6) is
$$E(J_{u,p_1}(j)J_{u,p_2}(-j))E(J_{u,p_3}(k)J_{u,p_4}(-k))+E(J_{u,p_1}(j)J_{u,p_3}(k))E(J_{u,p_2}(-j)J_{u,p_4}(-k))$$
$$+E(J_{u,p_1}(j)J_{u,p_4}(-k))E(J_{u,p_3}(k)J_{u,p_2}(-j))+\mathrm{cum}\big(J_{u,p_1}(j);J_{u,p_2}(-j);J_{u,p_3}(k);J_{u,p_4}(-k)\big).$$
Using (B.5), the first three terms of the last displayed expression are proportional to
$$f_{u,p_1p_2}(j)f_{u,p_3p_4}(j)\,\varphi_u(p_1,p_2)\varphi_u(p_3,p_4)\,\mathbb{1}(j=k),$$
while the absolute value of the last term is bounded by
$$\frac{1}{T^{2}}\sum_{t_1,t_2,t_3,t_4=1}^{T}\big|\mathrm{cum}(u_{p_1t_1};u_{p_2t_2};u_{p_3t_3};u_{p_4t_4})\big|\le\frac{C|\kappa_{4,\xi}|}{T^{2}}\sum_{t_1,...,t_4=1}^{T}\frac{\varphi_u(p_1,p_2)\varphi_u(p_3,p_4)}{(t_2-t_1)^{\tau}(t_3-t_2)^{\tau}(t_4-t_3)^{\tau}}\le\frac{C}{T}\varphi_u(p_1,p_2)\varphi_u(p_3,p_4)$$
because $\tau>1$. $\Box$

Lemma B.6.
Assuming C − C , we have that as $n,T\to\infty$,
$$E\Bigg|\frac{1}{n}\sum_{p=1}^{n}\{I_{x,p}(j)-f_{x,p}(j)\}\Bigg|^{2}=o(1). \tag{B.7}$$

Proof. Standard algebra yields that the left hand side of (B.7) is bounded by
$$2E\Bigg|\frac{1}{n}\sum_{p=1}^{n}\big\{J_{x,p}(j)J'_{x,p}(-j)-E\big(J_{x,p}(j)J'_{x,p}(-j)\big)\big\}\Bigg|^{2}+2\Bigg|\frac{1}{n}\sum_{p=1}^{n}\{E I_{x,p}(j)-f_{x,p}(j)\}\Bigg|^{2}.$$
Now $n^{-1}\sum_{p=1}^{n}|E I_{x,p}(j)-f_{x,p}(j)|=O(T^{-1})$ is standard as $f_{x,p}(\lambda)$ is twice continuously differentiable, whereas Lemma B.5 implies that the first term of the last displayed expression is bounded by
$$\frac{C}{n^{2}}\sum_{p,q=1}^{n}\varphi_x^{2}(p,q)\Big(1+\frac{C}{T}\Big)=o(1)$$
by Condition C3, see also Remark 1. $\Box$
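As a purely illustrative aside (not part of the formal argument), the content of Lemma B.6, namely that averaging raw periodogram ordinates over the cross-section rather than smoothing over neighbouring frequencies already yields a consistent estimate at a fixed Fourier frequency, can be checked numerically. The independent AR(1) design and all names below are assumptions of this sketch, not the paper's conditions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, phi = 400, 256, 0.5  # illustrative panel dimensions and AR(1) coefficient

# Simulate n independent AR(1) units: x_{p,t} = phi * x_{p,t-1} + eps_{p,t}.
burn = 100
eps = rng.standard_normal((n, T + burn))
x = np.zeros((n, T + burn))
for t in range(1, T + burn):
    x[:, t] = phi * x[:, t - 1] + eps[:, t]
x = x[:, burn:]

# Periodogram ordinates I_{x,p}(lambda_j), normalised so that E I is about 2*pi*f(lambda_j).
I = np.abs(np.fft.fft(x, axis=1)) ** 2 / T

# 2*pi*f(lambda) for an AR(1) with unit innovation variance.
lam = 2 * np.pi * np.arange(T) / T
target = 1.0 / (1.0 + phi**2 - 2.0 * phi * np.cos(lam))

j = 10  # a fixed Fourier frequency: no bandwidth is chosen anywhere
avg_I = I[:, j].mean()  # cross-sectional average: close to target[j]
one_I = I[0, j]         # a single ordinate: noisy, not consistent
print(avg_I, one_I, target[j])
```

The cross-sectional average concentrates around the (scaled) spectral density as $n$ grows, which is the mechanism that allows inference "without smoothing".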
Lemma B.7.
Under C − C , we have that as $n,T\to\infty$,
$$\frac{1}{T}\sum_{j=1}^{T-1}\Bigg\{\Bigg(\frac{1}{n}\sum_{p=1}^{n}I_{\tilde x,p}(j)\Bigg)^{2}-\Bigg(\frac{1}{n}\sum_{p=1}^{n}I_{x,p}(j)\Bigg)^{2}\Bigg\}=o_p(1) \tag{B.8}$$
$$\frac{1}{T}\sum_{j=1}^{T-1}\Bigg(\frac{1}{n}\sum_{p=1}^{n}I_{x,p}(j)\Bigg)^{2}-\int_{-\pi}^{\pi}\Bigg(\lim_{n\to\infty}\frac{1}{n}\sum_{p=1}^{n}f_{x,p}(\lambda)\Bigg)^{2}d\lambda=o_p(1). \tag{B.9}$$

Proof. Noticing that
$$\frac{1}{n}\sum_{p=1}^{n}\big\{I_{\tilde x,p}(j)-I_{x,p}(j)\big\}=-I_{x,\cdot}(j),$$
we obtain that the left hand side of (B.8) equals
$$\frac{1}{T}\sum_{j=1}^{T-1}I_{x,\cdot}^{2}(j)-\frac{2}{T}\sum_{j=1}^{T-1}I_{x,\cdot}(j)\,\frac{1}{n}\sum_{p=1}^{n}I_{x,p}(j).$$
We shall examine the first term of the last displayed expression, with the second one being handled similarly, if not more easily. Now, by definition
$$I_{x,\cdot}(j)=\frac{1}{n^{2}}\sum_{p,q=1}^{n}J_{x,p}(j)J_{x,q}(-j),$$
so that Lemma B.5, in particular (B.6), implies that
$$E I_{x,\cdot}^{2}(j)=\frac{1}{n^{4}}\sum_{p_1,...,p_4=1}^{n}\varphi_x(p_1,p_2)\varphi_x(p_3,p_4)\Big\{1+\frac{C}{T}\Big\}=o(1)$$
because $n^{-2}\sum_{p_1,p_2=1}^{n}\varphi_x(p_1,p_2)=o(1)$ by ergodicity. This completes the proof of (B.8). Next, (B.9) follows if
$$\frac{1}{T}\sum_{j=1}^{T-1}\Bigg(\frac{1}{n}\sum_{p=1}^{n}\big\{I_{x,p}(j)-E(I_{x,p}(j))\big\}\Bigg)^{2}=o_p(1) \tag{B.10}$$
$$\frac{1}{T}\sum_{j=1}^{T-1}\Bigg(\frac{1}{n}\sum_{p=1}^{n}\big\{I_{x,p}(j)-E(I_{x,p}(j))\big\}\Bigg)\Bigg(\frac{1}{n}\sum_{p=1}^{n}E(I_{x,p}(j))\Bigg)=o_p(1), \tag{B.11}$$
because the continuous differentiability of $f_{x,p}(\lambda)$ implies
$$\frac{1}{T}\sum_{j=1}^{T-1}\Bigg(\frac{1}{n}\sum_{p=1}^{n}E(I_{x,p}(j))\Bigg)^{2}-\int_{-\pi}^{\pi}\Bigg(\lim_{n\to\infty}\frac{1}{n}\sum_{p=1}^{n}f_{x,p}(\lambda)\Bigg)^{2}d\lambda=o(1)$$
by standard arguments. Now (B.10) holds true by Lemma B.6 and (B.11) follows by Cauchy-Schwarz's inequality. $\Box$
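Again as a numerical aside (assumed AR(1) design, illustrative names only), the frequency average appearing in Lemma B.7 can be tied to a time-domain quantity: by Parseval's identity, the average of the periodogram over all Fourier frequencies equals the sample second moment exactly, and both estimate the integrated spectrum.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T, phi = 200, 512, 0.5  # illustrative sizes and AR(1) coefficient

burn = 100
eps = rng.standard_normal((n, T + burn))
x = np.zeros((n, T + burn))
for t in range(1, T + burn):
    x[:, t] = phi * x[:, t - 1] + eps[:, t]
x = x[:, burn:]

I = np.abs(np.fft.fft(x, axis=1)) ** 2 / T  # periodogram ordinates, shape (n, T)

# Parseval: (1/T) sum_j I_{x,p}(lambda_j) = (1/T) sum_t x_{p,t}^2, exactly.
freq_avg = I.mean()          # average over frequencies and over the cross-section
time_avg = (x ** 2).mean()   # time/cross-section average of x^2

# Both estimate the integrated spectrum, here Var(x_t) = 1/(1 - phi^2).
print(freq_avg, time_avg, 1.0 / (1.0 - phi**2))
```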
The next lemma extends a Central Limit Theorem in Phillips and Moon (1999) to the case where their independence condition fails.
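Before the formal statement, a small Monte Carlo sketch of the kind of joint limit involved may help; the geometric-decay cross-sectional covariance, the sample sizes and all names here are assumptions of the illustration, not the paper's conditions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, T, R, rho = 50, 50, 2000, 0.6  # illustrative sizes and decay rate

# Cross-sectional covariance rho^{|p-q|}: row sums are O(1), so the total
# dependence sum_{p,q} |cov(p,q)| is O(n), a form of weak cross-sectional dependence.
C = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
L = np.linalg.cholesky(C)

stats = np.empty(R)
for r in range(R):
    x = rng.standard_normal((T, n)) @ L.T  # correlated regressors across p
    u = rng.standard_normal((T, n)) @ L.T  # correlated errors, independent of x
    # The standardized double sum (Tn)^{-1/2} sum_t sum_p x_{pt} u_{pt}, as in (B.12).
    stats[r] = (x * u).sum() / np.sqrt(T * n)

z = (stats - stats.mean()) / stats.std()
skew, kurt = (z**3).mean(), (z**4).mean()
print(stats.mean(), skew, kurt)  # roughly 0, 0 and 3 under approximate normality
```

Despite the cross-sectional correlation, the standardized double sum behaves like a Gaussian draw, which is what (B.12) asserts in the limit.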
Lemma B.8.
Let $\{u_{pt}\}_{t\in\mathbb{Z}}$ and $\{x_{pt}\}_{t\in\mathbb{Z}}$, $p\in\mathbb{N}^{+}$, satisfy Conditions C − C . Then as $n,T\to\infty$,
$$\frac{1}{T^{1/2}}\sum_{t=1}^{T}\frac{1}{n^{1/2}}\sum_{p=1}^{n}x_{pt}u_{pt}\ \overset{d}{\to}\ N(0,\Phi). \tag{B.12}$$
Proof.
First, Hidalgo and Schafgans' (2017) Theorem 1 implies that
$$z_{n,t}=\frac{1}{n^{1/2}}\sum_{p=1}^{n}x_{pt}u_{pt}\ \overset{d}{\to}\ N(0,\Omega_t),\qquad t=1,...,T, \tag{B.13}$$
and also for any $r,s\ge 0$,
$$\frac{1}{n^{1/2}}\sum_{p=1}^{n}\chi_{p,t+r}\xi_{p,t+s}\ \overset{d}{\to}\ N(0,\Omega_{t,r,s}).$$
Now, Phillips and Moon's (1999) Theorem 2 cannot be employed as the latter result requires that the left hand side of (B.13), $\{z_{n,t}\}_{t\ge 1}$, is a sequence of independent random variables. Dropping the subscript "$p$" for notational convenience, we have that
$$u_t x_t=(D_u(L)\xi_t)(C_x(L)\chi_t), \tag{B.14}$$
where
$$D_u(L)=\sum_{\ell=0}^{\infty}d_\ell L^{\ell};\qquad C_x(L)=\sum_{\ell=0}^{\infty}c_\ell L^{\ell}$$
by Conditions C1 and C2. We now employ a "second-order" BN decomposition similar to that in Phillips and Solo (1992, p. 978-979). First, we notice that standard algebra yields that the right hand side of (B.
14) is
$$\sum_{\ell=0}^{\infty}d_\ell c_\ell\,\xi_{t-\ell}\chi_{t-\ell}+\Bigg(\sum_{\ell=0}^{\infty}\sum_{k=\ell+1}^{\infty}+\sum_{k=0}^{\infty}\sum_{\ell=k+1}^{\infty}\Bigg)d_\ell c_k\,\xi_{t-\ell}\chi_{t-k}$$
$$=\sum_{\ell=0}^{\infty}d_\ell c_\ell\,\xi_{t-\ell}\chi_{t-\ell}+\sum_{k=1}^{\infty}\Bigg(\sum_{\ell=0}^{\infty}d_\ell c_{\ell+k}\,\xi_{t-\ell}\chi_{t-k-\ell}\Bigg)+\sum_{\ell=1}^{\infty}\Bigg(\sum_{k=0}^{\infty}c_k d_{k+\ell}\,\chi_{t-k}\xi_{t-k-\ell}\Bigg)$$
$$=\sum_{\ell=0}^{\infty}d_\ell c_\ell\,\xi_{t-\ell}\chi_{t-\ell}+\sum_{k=1}^{\infty}\Bigg(\sum_{\ell=0}^{\infty}d_\ell c_{\ell+k}L^{\ell}\Bigg)\xi_t\chi_{t-k}+\sum_{\ell=1}^{\infty}\Bigg(\sum_{k=0}^{\infty}c_k d_{k+\ell}L^{k}\Bigg)\chi_t\xi_{t-\ell}$$
$$=\varrho_0(L)\xi_t\chi_t+\sum_{k=1}^{\infty}\varrho_k(L)\xi_t\chi_{t-k}+\sum_{\ell=1}^{\infty}g_\ell(L)\chi_t\xi_{t-\ell},$$
where $\varrho_k(L)=\sum_{\ell=0}^{\infty}d_\ell c_{\ell+k}L^{\ell}$ and $g_\ell(L)=\sum_{k=0}^{\infty}c_k d_{k+\ell}L^{k}$. Observe that $\varrho_0(L)=g_0(L)$.
Next, because for a generic polynomial $h(L)=\sum_{\ell=0}^{\infty}h_\ell L^{\ell}$ we have the identity $h(L)=h(1)-(1-L)\tilde h(L)$, where $\tilde h(L)=\sum_{\ell=0}^{\infty}\tilde h_\ell L^{\ell}$ with $\tilde h_\ell=\sum_{p=\ell+1}^{\infty}h_p$, we can write the right hand side of the last displayed equality as
$$\varrho_0(1)\xi_t\chi_t+\xi_t\sum_{k=1}^{\infty}\varrho_k(1)\chi_{t-k}+\chi_t\sum_{\ell=1}^{\infty}g_\ell(1)\xi_{t-\ell} \tag{B.15}$$
$$-(1-L)\sum_{k=1}^{\infty}\widetilde{dc}_k\,\xi_{t-k}\chi_{t-k}-(1-L)\sum_{k=1}^{\infty}\tilde\varrho_k(L)\xi_t\chi_{t-k}-(1-L)\sum_{\ell=1}^{\infty}\tilde g_\ell(L)\chi_t\xi_{t-\ell}.$$
Observe that
$$\sum_{k=1}^{\infty}\widetilde{dc}_kL^{k}=\tilde\varrho_0(L),\qquad \tilde\varrho_k(L)=\sum_{\ell=0}^{\infty}\tilde\upsilon_{\ell,k}L^{\ell}\ \text{with}\ \tilde\upsilon_{\ell,k}=\sum_{p=\ell+1}^{\infty}d_p c_{p+k},$$
$$\tilde g_\ell(L)=\sum_{k=0}^{\infty}\tilde\omega_{k,\ell}L^{k}\ \text{with}\ \tilde\omega_{k,\ell}=\sum_{p=k+1}^{\infty}c_p d_{p+\ell},$$
and $\xi_t\sum_{k=1}^{\infty}\varrho_k(1)\chi_{t-k}$ and $\chi_t\sum_{\ell=1}^{\infty}g_\ell(1)\xi_{t-\ell}$ are mutually independent martingale differences. Given (B.15), we can write the left hand side of (B.
12) as the sum of six terms. The contribution due to the fourth term of (B.15) is
$$\sum_{k=1}^{\infty}\widetilde{dc}_k\,\frac{1}{T^{1/2}n^{1/2}}\sum_{p=1}^{n}\xi_{p,t-k}\chi_{p,t-k}=O_p\big(T^{-1/2}\big)$$
because $E\big(n^{-1/2}\sum_{p=1}^{n}\xi_{p,t-k}\chi_{p,t-k}\big)^{2}<C$ and by summability of the sequence $\{\widetilde{dc}_k\}_{k\in\mathbb{N}^{+}}$. Next, the contributions due to the fifth and sixth terms of (B.15) follow similarly and hence they are $o_p(1)$. So, we need to examine the contribution due to the first three terms of (B.15) on the left side of (B.12), namely
$$\frac{\varrho_0(1)}{(Tn)^{1/2}}\sum_{t=1}^{T}\sum_{p=1}^{n}\xi_{pt}\chi_{pt}+\frac{1}{(Tn)^{1/2}}\sum_{t=1}^{T}\sum_{p=1}^{n}\xi_{pt}\tilde\chi_{pt}+\frac{1}{(Tn)^{1/2}}\sum_{t=1}^{T}\sum_{p=1}^{n}\tilde\xi_{pt}\chi_{pt}, \tag{B.16}$$
where
$$\tilde\chi_{pt}=:\sum_{k=1}^{\infty}\varrho_k(1)\chi_{p,t-k};\qquad \tilde\xi_{pt}=:\sum_{\ell=1}^{\infty}g_\ell(1)\xi_{p,t-\ell}.$$
The result that the first term of (B.16) converges to a normal random variable follows by (the proof of) Hidalgo and Schafgans' (2017) Theorem 1 and Phillips and Moon's (1999) Theorem 2, as $n^{-1/2}\sum_{p=1}^{n}\xi_{pt}\chi_{pt}$ are independent sequences in $t$. Because the second and third terms of (B.16) are handled identically, we examine only the second one, which we write as
$$\sum_{k=1}^{K}\varrho_k(1)\frac{1}{(Tn)^{1/2}}\sum_{t=1}^{T}\sum_{p=1}^{n}\xi_{pt}\chi_{p,t-k}+\sum_{k=K+1}^{\infty}\varrho_k(1)\frac{1}{(Tn)^{1/2}}\sum_{t=1}^{T}\sum_{p=1}^{n}\xi_{pt}\chi_{p,t-k}. \tag{B.17}$$
By summability of $\varrho_k(1)$ and given that
$$E\Bigg|\frac{1}{(Tn)^{1/2}}\sum_{t=1}^{T}\sum_{p=1}^{n}\xi_{pt}\chi_{p,t-k}\Bigg|^{2}=\frac{1}{Tn}\sum_{t=1}^{T}\sum_{p,q}\varphi(p,q)\le C$$
by Condition C3, we obtain that by choosing $K$ large enough the second term of (B.17) is $o_p(1)$. The first term of (B.17) on the other hand converges to a normal random variable proceeding as with the first term of (B.16). $\Box$

Lemma B.9.
Under the same conditions as Lemma B.8, we have that
$$\frac{1}{\tilde T^{1/2}}\sum_{j=1}^{\tilde T}\frac{1}{n^{1/2}}\sum_{p=1}^{n}J_{x,p}(j)J_{u,p}(-j)\ \overset{d}{\to}\ N(0,\Phi). \tag{B.18}$$

Proof.
Using ( A.
1) and ( B.
5) of Lemma B.5, we have that the left side of ( B.
18) is governed by1 e T / e T X j =1 n / n X p =1 B x,p ( j ) B u,p ( − j ) J χ,p ( j ) J ξ,p ( − j )= 1 e T / e T X j =1 T T X t,s =1 Ξ s,t ( n ; j ) e i ( t − s ) λ j , (B.19)where Ξ s,t ( n ; j ) = 1 n / n X p =1 G p ( j ) χ ps ξ pt ; G p ( j ) =: B x,p ( j ) B u,p ( − j ) . (B.20)Because (cid:8) χ pt (cid:9) t ∈ Z and (cid:8) ξ pt (cid:9) t ∈ Z , p ∈ N + , are mutually independent iid zero mean sequences, wehave that Ξ s,t ( n ) is independent of Ξ r,m ( n ) if s = r and t = m and uncorrelated if s = r and t = m or s = r and t = m . By Lemma B.8, it follows that Ξ s,t ( n ; j ) → d N (cid:16) , e V ( j ) (cid:17) , where e V ( j ) = lim n →∞ n n X p,q =1 f x,pq ( j ) f u,pq ( j ) ϕ ( p, q )and E k Ξ s,t ( n ) k < C .Next, the right hand side of ( B.
19) is2 / T / T X t,s =1 n / n X p =1 χ ps ξ pt e T X j =1 g p ( j ) e i ( t − s ) λ j = 1 T / T X t,s =1 n / n X p =1 φ p ( t − s ) χ ps ξ pt (cid:18) CT (cid:19) (B.21)using Brillinger’s (1981) Exercise 1.7.14( b ), where φ p ( s ) denotes the s − th Fourier coefficient of g p ( λ j ) defined in ( B. ∞ X ℓ = −∞ φ p ( ℓ ) = 12 n Z π − π g p ( λ ) dλ = 12 π Z π − π f x,p ( λ ) f u,p ( λ ) dλ .Now, the right hand side of ( B.
21) can be written as
$$\frac{1}{T^{1/2}}\sum_{t=1}^{T}\frac{1}{n^{1/2}}\sum_{p=1}^{n}\phi_p(0)\chi_{pt}\xi_{pt}+\frac{1}{T^{1/2}}\sum_{\ell=1}^{T-1}\sum_{t=1}^{T-\ell}\frac{1}{n^{1/2}}\sum_{p=1}^{n}\phi_p(\ell)\big(\chi_{pt}\xi_{p,t+\ell}+\chi_{p,t+\ell}\xi_{pt}\big).$$
From here, we conclude the proof proceeding as we did in Lemma B.8 since, say,
$$\frac{1}{n^{1/2}}\sum_{p=1}^{n}\phi_p(\ell)\chi_{pt}\xi_{p,t+\ell}$$
is a sequence of independent random variables in the $t$ dimension which converges to a Gaussian random variable by arguments similar to those in the proof of Hidalgo and Schafgans' (2017) Theorem 1, and
$$\frac{1}{T^{1/2}}\sum_{\ell=b}^{T-1}\sum_{t=1}^{T-\ell}\frac{1}{n^{1/2}}\sum_{p=1}^{n}\phi_p(\ell)\chi_{pt}\xi_{p,t+\ell}=o_p(1)$$
by choosing $b$ large enough since $\phi_p(\ell)=O(\ell^{-2})$. $\Box$

References

[1]
Andrews, D.W.K. (1991): "Heteroskedasticity and autocorrelation consistent covariance matrix estimation," Econometrica, 817-858.
[2] Andrews, D.W.K. (2005): "Cross-Section Regression with Common Shocks," Econometrica, 1551-1585.
[3] Arellano, M. (1987): "Computing robust standard errors for within group estimators," Oxford Bulletin of Economics and Statistics, 431-434.
[4] Bester, C., Conley, J. and Hansen, C. (2011): "Inference with dependent data using cluster covariance estimators," Journal of Econometrics, 137-151.
[5] Bester, C., Conley, J., Hansen, C. and Vogelsang, T.J. (2016): "Fixed-b asymptotics for spatially dependent robust nonparametric covariance matrix estimators," Econometric Theory.
[6] Brillinger, D. (1981): Time Series: Data Analysis and Theory. Holt, Rinehart and Winston, New York.
[7] Brockwell, P.J. and Davis, R.A. (1991): Time Series: Theory and Methods. Springer-Verlag, New York.
[8] Chan, C. and Ogden, R.D. (2009): "Bootstrapping sums of independent but not identically distributed continuous processes with applications to functional data," Journal of Multivariate Analysis, 1291-1303.
[9] Cliff, A. and Ord, J. (1973): Spatial Autocorrelation. Pion, London.
[10] Conley, T.G. (1999): "GMM estimation with cross sectional dependence," Journal of Econometrics, 92, 1-44.
[11] Cressie, N. and Huang, H.-C. (1999): "Classes of nonseparable, spatio-temporal stationary covariance functions," Journal of the American Statistical Association, 1330-1340.
[12] Deaton, A. (1996): The Analysis of Household Surveys: A Microeconometric Approach to Development Policy. Johns Hopkins University Press, Baltimore, MD.
[13] Driscoll, J.C. and Kraay, A.C. (1998): "Consistent covariance matrix estimation with spatially dependent panel data," The Review of Economics and Statistics, 549-560.
[14] Efron, B. (1979): "Bootstrap methods: another look at the jackknife," Annals of Statistics, 1-26.
[15] Engle, R.F. (1974): "Specification of the Disturbance Term for Efficient Estimation," Econometrica, 135-146.
[16] Fuentes, M. (2006): "Testing for separability of spatial-temporal covariance functions," Journal of Statistical Planning and Inference, 447-466.
[17] Fuller, W.A. (1996): Introduction to Statistical Time Series. Second Edition. John Wiley, New York.
[18] Gonçalves, S. (2011): "The moving blocks bootstrap for panel linear regression models with individual fixed effects," Econometric Theory, 1048-1082.
[19] Götze, F. and Künsch, H.R. (1996): "Second-order correctness of the blockwise bootstrap for stationary observations," Annals of Statistics, 1914-1933.
[20] Gneiting, T. (2002): "Nonseparable, stationary covariance functions for space-time data," Journal of the American Statistical Association, 590-600.
[21] Greene, W.H. (2018): Econometric Analysis. Pearson, New York, NY.
[22] Hahn, J. and Kuersteiner, G. (2002): "Asymptotically unbiased inference for a dynamic panel model with fixed effects when both n and T are large," Econometrica, 1639-1657.
[23] Hannan, E.J. (1963): "Regression for time series," in Rosenblatt, M. (Ed.), Time Series Analysis. Wiley, New York, pp. 17-36.
[24] Hidalgo, J. (2003): "An alternative bootstrap to moving blocks for time series regression models," Journal of Econometrics, 369-399.
[25] Hidalgo, J. and Schafgans, M. (2018): "Inference without smoothing for large panels with cross sectional and temporal dependence," STICERD Discussion Paper EM/2018/597, London School of Economics.
[26] Hidalgo, J. and Schafgans, M. (2017): "Inference and testing breaks in large dynamic panels with strong cross sectional dependence," Journal of Econometrics, 259-274.
[27] Hoechle, D. (2007): "Robust standard errors for panel regressions with cross-sectional dependence," Stata Journal, 281-312.
[28] Ibragimov, I.A. and Rozanov, Y.A. (1978): Gaussian Random Processes. Springer, New York.
[29] Jenish, N. and Prucha, I.R. (2009): "Central limit theorems and uniform laws of large numbers for arrays of random fields," Journal of Econometrics, 86-98.
[30] Jenish, N. and Prucha, I.R. (2012): "On spatial processes and asymptotic inference under near-epoch dependence," Journal of Econometrics, 178-190.
[31] Kelejian, H.H. and Prucha, I.R. (2007): "HAC estimation in a spatial framework," Journal of Econometrics, 131-154.
[32] Kiefer, N.M. and Vogelsang, T.J. (2002): "Heteroskedasticity-autocorrelation robust standard errors using the Bartlett kernel without truncation," Econometrica, 2093-2095.
[33] Kiefer, N.M. and Vogelsang, T.J. (2005): "A new asymptotic theory for heteroskedasticity-autocorrelation robust tests," Econometric Theory, 1130-1164.
[34] Kim, M.S. and Sun, Y. (2013): "Heteroskedasticity and spatiotemporal dependence robust inference for linear panel models with fixed effects," Journal of Econometrics, 85-108.
[35] Lazarus, E., Lewis, D.J., Stock, J.H. and Watson, M.W. (2018): "HAR inference: recommendations for practice," Journal of Business and Economic Statistics, 541-559.
[36] Lee, J. and Robinson, P.M. (2013): "Series estimation under cross-sectional dependence," STICERD Discussion Paper EM/2013/570, London School of Economics.
[37] Matsuda, Y. and Yajima, Y. (2004): "On testing for separable correlations of multivariate time series," Journal of Time Series Analysis, 501-528.
[38] Nicholls, D.F. and Pagan, A.R. (1977): "Specification of the Disturbance for Efficient Estimation–An Extended Analysis," Econometrica, 211-217.
[39] Pesaran, M.H. (2006): "Estimation and inference in large heterogeneous panels with a multifactor error structure," Econometrica, 967-1012.
[40] Pesaran, M.H. and Yamagata, T. (2008): "Testing slope heterogeneity in large panels," Journal of Econometrics, 50-93.
[41] Phillips, P.C.B. and Moon, R. (1999): "Linear regression limit theory for nonstationary panel data," Econometrica, 1057-1111.
[42] Phillips, P.C.B. and Solo, V. (1992): "Asymptotics for linear processes," Annals of Statistics, 971-1001.
[43] Politis, D.N. and White, H. (2004): "Automatic block-length selection for the dependent bootstrap," Econometric Reviews, 53-70.
[44] Robinson, P.M. (1998): "Inference without smoothing in the presence of autocorrelation," Econometrica, 1163-1182.
[45] Robinson, P.M. (2011): "Asymptotic theory for nonparametric regression with spatial data," Journal of Econometrics, 5-19.
[46] Robinson, P.M. and Hidalgo, J. (1997): "Time series regression with long-range dependence," Annals of Statistics, 77-104.
[47] Vogelsang, T.J. (2012): "Heteroskedasticity, autocorrelation, and spatial correlation robust inference in linear panel models with fixed-effects," Journal of Econometrics, 303-319.
[48] White, H. (1980): "A heteroscedasticity-consistent covariance matrix estimator and a direct test for heteroscedasticity," Econometrica, 817-838.
[49] Whittle, P. (1954): "On stationary processes in the plane," Biometrika, 434-449.
[50] Zellner, A. (1962): "An efficient method of estimating seemingly unrelated regressions and tests for aggregation bias," Journal of the American Statistical Association, 348-368.

Economics Department, London School of Economics, Houghton Street, London WC2A 2AE, UK
E-mail address: [email protected]

Economics Department, London School of Economics, Houghton Street, London WC2A 2AE, UK
E-mail address: