Inference in Unbalanced Panel Data Models with Interactive Fixed Effects
Daniel Czarnowske∗ Amrei Stammann†

April 8, 2020

In this article, we study the limiting behavior of Bai (2009)'s interactive fixed effects estimator in the presence of randomly missing data. In extensive simulation experiments, we show that the inferential theory derived by Bai (2009) and Moon and Weidner (2017) approximates the behavior of the estimator fairly well. However, we find that the fraction and pattern of randomly missing data affect the performance of the estimator. Additionally, we use the interactive fixed effects estimator to reassess the baseline analysis of Acemoglu et al. (2019). Allowing for a more general form of unobserved heterogeneity than the authors, we confirm significant effects of democratization on growth.
JEL Classification:
C01, C13, C23, C38, C55, O10
Keywords:
Economic Development, Interactive Fixed Effects, Factor Models, Model Selection, Principal Components, Unbalanced Panel Data

∗ Heinrich-Heine-University Duesseldorf, Universitätsstr. 1, 40225 Düsseldorf, Germany, phone: +49 211 81-10620, e-mail: [email protected]
† Heinrich-Heine-University Duesseldorf, Universitätsstr. 1, 40225 Düsseldorf, Germany, phone: +49 211 81-15307, e-mail: [email protected]

Introduction
Economists are often concerned that unobserved heterogeneity is correlated with some variables of interest and thus leads to inconsistent estimates of the corresponding common parameters $\boldsymbol\beta$. If panel data are available, so-called fixed effects models are frequently used to address this issue. One critical assumption of these models is that the unobserved heterogeneity is additively separable in both panel dimensions. For instance, if a panel consists of $N$ individuals observed for $T$ time periods, the researcher has to assume that individual- and/or time-specific effects enter the model additively. If this is not the case, for instance because both effects interact multiplicatively, conventional fixed effects models cannot solve the underlying endogeneity problem. Exactly this concern motivates so-called interactive fixed effects (IFE) estimators, which model the time-varying unobserved heterogeneity as a low-rank factor structure $\boldsymbol\lambda_i'\mathbf f_t$, where $\boldsymbol\lambda_i$ and $\mathbf f_t$ are individual- and time-specific effects, respectively (see among others Holtz-Eakin, Newey, and Rosen 1988; Pesaran 2006; Bai 2009). Throughout this article, we refer to $\boldsymbol\lambda_i$ as factor loadings and to $\mathbf f_t$ as common factors.

Holtz-Eakin, Newey, and Rosen (1988) propose an estimator for panels with large $N$ but small $T$, which is based on a quasi-differencing approach similar to Anderson and Hsiao (1982). They first eliminate the factor loadings from the estimation equation and then estimate the remaining common factors and parameters using lagged covariates as instrumental variables. Although this estimator is consistent under fixed-$T$ asymptotics, it is well known that for large $T$ the number of instruments and parameters leads to biased estimates (see Newey and Smith 2004). More recently, the literature has considered estimators that require both $N$ and $T$ to be sufficiently large.
Pesaran (2006) suggests a common correlated effects (CCE) estimator in the spirit of Mundlak (1978) and Chamberlain (1982, 1984), which uses cross-sectional averages of the dependent variable and the regressors to control for the unobserved common factors. His estimator is at least $\sqrt{N}$-consistent without the need to know the true rank of the factor structure or to impose strong factor assumptions as in Bai (2009) and Moon and Weidner (2015, 2017). However, using cross-sectional averages as proxies for the unobserved common factors requires some parametric assumptions on the joint probability distribution of the dependent variable and the covariates. Bai (2009) suggests a different estimator that treats the common factors and factor loadings as nuisance parameters to be estimated. His estimator is closely related to Bai (2003)'s principal components estimator and achieves $\sqrt{NT}$-consistency irrespective of cross-sectional and/or time-serial dependence in the idiosyncratic error term. However, such dependence leads to an asymptotic bias in the limiting distribution of the estimator that can be corrected (see Bai 2009). Moon and Weidner (2017) derive an additional bias correction for the Nickell (1981) bias stemming from the inclusion of predetermined and weakly exogenous regressors. Because the true number of factors is usually unknown, Moon and Weidner (2015) study the case where the number of factors used to estimate $\boldsymbol\beta$ exceeds the true number: the estimator then remains at least $\sqrt{\min(N,T)}$-consistent and, under certain assumptions, has the same limiting distribution as shown by Bai (2009). Including too many irrelevant factors may therefore cost efficiency. However, given a consistent estimator for $\boldsymbol\beta$, the number of factors can be estimated using estimators for pure factor models (see among others Buja and Eyuboglu 1992; Bai and Ng 2002; Hallin and Liška 2007; Alessi, Barigozzi, and Capasso 2010; Onatski 2010; Ahn and Horenstein 2013; Dobriban and Owen 2019). A recent comparison of some popular estimators is given in Choi and Jeong (2019). For the IFE estimator, no such comparison exists so far.

1. Bonhomme and Manresa (2015) suggest a related but different approach. Instead of imposing rank restrictions on the time-varying unobserved heterogeneity, they use a clustering approach to assign each cross-sectional unit to a specific group, where the corresponding group-specific heterogeneity is allowed to vary over time.
2. The factor structure can be roughly interpreted as a series approximation of the time-varying unobserved heterogeneity.
3. For a detailed discussion of the different interactive fixed effects estimators, we refer the reader to Bai (2009) and Moon and Weidner (2015, 2017).

Additionally, some of the observations in a panel are often missing. One frequent reason is attrition. For instance, some individuals drop out of a panel because they move or leave the participating household. In some of these cases, those individuals are replaced. In macroeconomic panels, it also occurs that some countries are divided into several independent countries. Further, some survey designs replace individuals because of non-response. All these cases lead to very different patterns of missing data that, in the absence of sample selection, usually do not affect the properties of the estimators (see Fernández-Val and Weidner 2018). However, in the presence of missing data, the principal components estimator of Bai (2009) requires an additional data augmentation step based on the EM algorithm of Stock and Watson (1998, 2002) (see the Appendix of Bai 2009 and Bai, Liao, and Yang 2015). Bai, Liao, and Yang (2015) illustrate the consistency of the EM-type estimator in extensive simulation studies, but do not further investigate its limiting behavior.

Our article makes the following contributions. First, we extend the work of Bai, Liao, and Yang (2015) and show that the limiting behavior of their suggested estimator can be fairly well approximated by the inferential theory of Bai (2009) and Moon and Weidner (2017) derived for balanced panels. In extensive simulation experiments, we show that the fraction and pattern of missing data may affect the properties of the estimator, in contrast to other popular unobserved effects estimators like the conventional within estimator (see Fernández-Val and Weidner 2018). Further, we present some algorithms that reduce the computational costs in the presence of missing data.
Second, because the limiting theory of Bai (2009) and Moon and Weidner (2017) assumes that the true number of factors is known, we additionally investigate the finite sample performance of some frequently used estimators for the number of factors: Bai and Ng (2002), Onatski (2010), Ahn and Horenstein (2013), and Dobriban and Owen (2019). Although we find that all estimators perform well for balanced data and different configurations of the idiosyncratic error term, their accuracy varies substantially with different fractions and patterns of randomly missing data. Third, we contribute to the growth literature by reassessing the baseline analysis of Acemoglu et al. (2019) using the IFE estimator of Bai (2009). We qualitatively confirm their main results and find significant effects of democratization on growth. In their preferred specification, we estimate a long-run effect of 18 %, which is close to the 20 % reported by the authors, but the instantaneous effect of democratization almost halves to 0.6 %.

The article is organized as follows. We introduce the model and the corresponding estimator in Section 2. We briefly review some estimators for the number of factors in Section 3. We provide results of extensive simulation experiments in Section 4. We reassess Acemoglu et al. (2019) using the IFE estimator in Section 5. We briefly discuss the handling of endogenous regressors as in Moon and Weidner (2017) and Moon, Shum, and Weidner (2018) and consider an alternative estimator suggested by Moon and Weidner (2019) in Section 6. Finally, we conclude in Section 7.

Throughout this article, we follow conventional notation: scalars are represented in standard type, vectors and matrices in boldface, and all vectors are column vectors. Further, let $\mathbf A$ be an $N \times N$ matrix and $\mathbf B$ be an $N \times T$ matrix.
We denote by $\mu_r[\mathbf A]$ the $r$-th largest eigenvalue of $\mathbf A$, we refer to $[\mathbf B]_{ij}$ as the $ij$-th element of $\mathbf B$, and we define $\mathbf M_{\mathbf B} = \mathbf I_N - \mathbf P_{\mathbf B}$, where $\mathbf I_N$ is the $N \times N$ identity matrix, $\mathbf P_{\mathbf B} = \mathbf B(\mathbf B'\mathbf B)^{\dagger}\mathbf B'$, and $(\cdot)^{\dagger}$ denotes the Moore-Penrose pseudoinverse.

In this article, we analyze the following unobserved effects model:

$$ y_{it} = \mathbf x_{it}'\boldsymbol\beta + \boldsymbol\lambda_i'\mathbf f_t + e_{it}, \tag{1} $$

where $i$ and $t$ are individual and time indexes, $\mathbf x_{it} = [x_{it,1}, \ldots, x_{it,K}]'$ is a vector of $K$ explanatory variables, $\boldsymbol\beta$ is the corresponding vector of common parameters, and $e_{it}$ is an idiosyncratic error term. To allow for missing data, we introduce the set of observed index pairs $\mathcal D \subseteq \{(i,t) \mid i \in \{1, \ldots, N\} \wedge t \in \{1, \ldots, T\}\}$, where $n = |\mathcal D|$ is the sample size and $N$ and $T$ are the numbers of individuals and time periods, respectively. Further, the unobserved effects are expressed as a factor structure of rank $R \ll \min(N,T)$, where $\boldsymbol\lambda_i = [\lambda_{i1}, \ldots, \lambda_{iR}]'$ is a vector of factor loadings and $\mathbf f_t = [f_{t1}, \ldots, f_{tR}]'$ is a vector of common factors. Note that (1) collapses to the conventional additive unobserved effects model if $\boldsymbol\lambda_i = [\alpha_i, 1]'$ and $\mathbf f_t = [1, \delta_t]'$. In contrast to this conventional model, the factor structure can capture more general patterns of heterogeneity. For instance, temporal shocks induced by financial crises might affect each country's output differently (see Bai 2009 for additional motivating examples).

Following Bai (2009) and Moon and Weidner (2017), we treat $\boldsymbol\lambda_i$ and $\mathbf f_t$ as parameters to be estimated and allow the common factors and loadings to be arbitrarily correlated with the regressors.
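To make the additive special case concrete, the following NumPy sketch stacks $\boldsymbol\lambda_i = [\alpha_i, 1]'$ and $\mathbf f_t = [1, \delta_t]'$ into $\boldsymbol\Lambda$ and $\mathbf F$ and checks that $\boldsymbol\Lambda\mathbf F'$ reproduces $\alpha_i + \delta_t$, a factor structure of rank at most two. It also illustrates why a conventional two-way within transformation removes additive effects but not a generic interaction $\lambda_i f_t$. The dimensions, the seed, and the helper `two_way_within` are illustrative assumptions, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 6, 4
alpha = rng.normal(size=N)  # individual effects alpha_i
delta = rng.normal(size=T)  # time effects delta_t

# Loadings lambda_i = [alpha_i, 1]' (N x 2) and factors f_t = [1, delta_t]' (T x 2)
Lam = np.column_stack([alpha, np.ones(N)])
F = np.column_stack([np.ones(T), delta])
interactive = Lam @ F.T                     # entries lambda_i' f_t
additive = alpha[:, None] + delta[None, :]  # entries alpha_i + delta_t
print(np.allclose(interactive, additive), np.linalg.matrix_rank(interactive))


def two_way_within(A):
    """Hypothetical helper: subtract row and column means, add back the grand mean."""
    return A - A.mean(axis=1, keepdims=True) - A.mean(axis=0, keepdims=True) + A.mean()


# The within transformation wipes out additive effects ...
print(np.abs(two_way_within(additive)).max())
# ... but leaves (lambda_i - mean) * (f_t - mean) of a rank-one interaction behind
lam, f = rng.normal(size=N), rng.normal(size=T)
print(np.abs(two_way_within(np.outer(lam, f))).max())
```

The last line is generically far from zero, which is the endogeneity concern that motivates treating $\boldsymbol\lambda_i'\mathbf f_t$ as a factor structure rather than as additive effects.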
To stress the similarity with the conventional fixed effects model, we follow the existing literature and refer to (1) as the interactive fixed effects model.

For a given value of $R$, Moon and Weidner (2015, 2017) suggest

$$ \hat{\boldsymbol\beta} = \operatorname*{arg\,min}_{\boldsymbol\beta \in \mathbb R^K} Q(\boldsymbol\beta) \tag{2} $$

as the IFE estimator for $\boldsymbol\beta$, where

$$ Q(\boldsymbol\beta) = \min_{\boldsymbol\Lambda, \mathbf F} \frac{1}{n} \sum_{(i,t) \in \mathcal D} \left(y_{it} - \mathbf x_{it}'\boldsymbol\beta - \boldsymbol\lambda_i'\mathbf f_t\right)^2 \tag{3} $$

is the profile objective function, $\boldsymbol\Lambda = [\boldsymbol\lambda_1, \ldots, \boldsymbol\lambda_N]'$ is an $N \times R$ matrix of factor loadings, and $\mathbf F = [\mathbf f_1, \ldots, \mathbf f_T]'$ is a $T \times R$ matrix of common factors. Note that the minimizing common factors $\hat{\mathbf f}_t(\boldsymbol\beta)$ and loadings $\hat{\boldsymbol\lambda}_i(\boldsymbol\beta)$ are not uniquely determined without imposing further normalizing restrictions. However, their column spaces are, and so is their product $\hat{\boldsymbol\Lambda}(\boldsymbol\beta)\hat{\mathbf F}(\boldsymbol\beta)'$. We return to this issue in the next subsection. Further, the objective function is globally non-convex due to the rank constraint imposed on the factor structure.

Moon and Weidner (2017) show consistency of the IFE estimator in an asymptotic framework where $N, T \to \infty$. Most notably, their assumptions require the true number of factors to be known, some regularity conditions, weak exogeneity of the regressors, and some conditions on so-called "low-rank" regressors. The latter ensure that these "low-rank" regressors are not entirely absorbed by the factor structure, which would otherwise lead to an identification problem. This is very similar to the conventional fixed effects model, where coefficients of time-invariant regressors are not identified. Note that Bai (2009) also shows consistency using different asymptotics that require strict exogeneity of all regressors. However, his framework is not suitable for our purposes because we consider predetermined regressors in our empirical illustration.

4. Time-invariant and/or common regressors can be projected out by applying suitable within transformations before estimating the parameters of interest (see Bai 2009).

We briefly describe the asymptotic distribution of the IFE estimator derived by Moon and Weidner (2017). To avoid ambiguity, by randomly missing observations we mean that the dependent variable is independent of the attrition process conditional on the covariates and the factor structure; that is, we assume that the observations are conditionally missing at random. Under asymptotic sequences where $N/T \to \kappa$ with $0 < \kappa < \infty$, the IFE estimator has the following limiting distribution:

$$ \hat{\boldsymbol\beta} \overset{d}{\to} \mathcal N\!\left(\boldsymbol\beta - \bar N^{-1}\mathbf B - \bar T^{-1}\mathbf C^{(1)} - \bar T^{-1}\mathbf C^{(2)},\; n^{-1}\mathbf V\right), \tag{4} $$

where $\bar N = n/T$, $\bar T = n/N$, $\mathbf B$, $\mathbf C^{(1)}$, and $\mathbf C^{(2)}$ are leading bias terms, and $\mathbf V$ is a covariance matrix. $\mathbf C^{(1)}$ stems from the inclusion of weakly exogenous regressors, like predetermined variables, and is a generalization of the Nickell (1981) bias. $\mathbf B$ and $\mathbf C^{(2)}$ arise if the idiosyncratic error term is heteroskedastic or correlated across individuals and time periods, respectively.

Moon and Weidner (2017) show that, for balanced data, (3) can be transformed to

$$ Q(\boldsymbol\beta) = \frac{1}{n} \sum_{r=R+1}^{\min(N,T)} \mu_r\!\left[\mathbf W(\boldsymbol\beta)'\mathbf W(\boldsymbol\beta)\right], \tag{5} $$

where $\mathbf W(\boldsymbol\beta)$ is an $N \times T$ matrix with $[\mathbf W(\boldsymbol\beta)]_{it} = y_{it} - \mathbf x_{it}'\boldsymbol\beta$. However, in the presence of missing data, some of the entries in $\mathbf W(\boldsymbol\beta)$ are missing, and we cannot simply apply the eigendecomposition as in the balanced case.

We follow the suggestion of Bai (2009) and Bai, Liao, and Yang (2015) and combine (5) with an EM algorithm proposed by Stock and Watson (1998, 2002). Intuitively, we augment the missing data in the E-step and apply the eigendecomposition to the completed data in the M-step. Following Bai, Liao, and Yang (2015), we can rearrange (1) to

$$ y_{it} - \mathbf x_{it}'\boldsymbol\beta = [\mathbf W(\boldsymbol\beta)]_{it} = \boldsymbol\lambda_i'\mathbf f_t + e_{it}, \tag{6} $$

which means that, for a known $\boldsymbol\beta$, we can augment missing observations in $\mathbf W(\boldsymbol\beta)$ with estimates of $\boldsymbol\Lambda$ and $\mathbf F$. Like Bai (2009), we impose the following normalizing restrictions to uniquely determine the common factors and loadings: $\mathbf F'\mathbf F/T = \mathbf I_R$ and $\boldsymbol\Lambda'\boldsymbol\Lambda = \operatorname{diag}(\boldsymbol\alpha)$, where $\boldsymbol\alpha \in \mathbb R^R$. Given these normalizing restrictions, $\hat{\mathbf F}(\boldsymbol\beta)$ equals the first $R$ eigenvectors of $\mathbf W(\boldsymbol\beta)'\mathbf W(\boldsymbol\beta)$ multiplied by $\sqrt{T}$, and $\hat{\boldsymbol\Lambda}(\boldsymbol\beta) = \mathbf W(\boldsymbol\beta)\hat{\mathbf F}(\boldsymbol\beta)/T$.
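The principal components step, the EM-type augmentation, and the eigenvalue form of the objective can be sketched in a few lines of NumPy, assuming $\boldsymbol\beta$ is held fixed so that $\mathbf W(\boldsymbol\beta)$ is a given $N \times T$ array. The function names, the zero initialization of missing cells, and the max-absolute-change stopping rule are our own illustrative choices, not the authors' implementation; the loop mirrors Steps 1 to 3 of the EM algorithm defined below.

```python
import numpy as np


def estimate_factors(W, R):
    """Principal components of an N x T matrix W under F'F/T = I_R.

    F is the first R eigenvectors of W'W times sqrt(T), and Lam = W F / T,
    so that Lam'Lam is diagonal.
    """
    T = W.shape[1]
    _, eigvec = np.linalg.eigh(W.T @ W)       # eigenvalues in ascending order
    F = eigvec[:, ::-1][:, :R] * np.sqrt(T)   # leading R eigenvectors, rescaled
    Lam = W @ F / T
    return Lam, F


def em_augment(W, observed, R, tol=1e-10, max_iter=10_000):
    """Fill the missing cells of W (observed == False) with fitted values Lam F'."""
    W_perp = np.zeros(W.shape)                           # missing cells start at zero
    for _ in range(max_iter):
        W_tilde = np.where(observed, W, W_perp)          # Step 1: P_D(W) + P_D^perp(W_perp)
        Lam, F = estimate_factors(W_tilde, R)            # Step 2
        W_perp_new = Lam @ F.T                           # Step 3
        change = np.max(np.abs(W_perp_new - W_perp)[~observed], initial=0.0)
        W_perp = W_perp_new
        if change < tol:
            break
    return np.where(observed, W, W_perp)


def profile_objective(W_tilde, R, n):
    """(1/n) times the sum of the eigenvalues of W'W beyond the R largest."""
    mu = np.sort(np.linalg.eigvalsh(W_tilde.T @ W_tilde))[::-1]
    return mu[R:min(W_tilde.shape)].sum() / n


# Illustration: an exact rank-2 "W(beta)" with about 10 % of cells missing
rng = np.random.default_rng(1)
N, T, R = 40, 25, 2
W_full = rng.normal(size=(N, R)) @ rng.normal(size=(T, R)).T
observed = rng.random((N, T)) > 0.1
W_aug = em_augment(np.where(observed, W_full, np.nan), observed, R)
print("max imputation error:", np.abs(W_aug - W_full).max())

# Balanced case: the eigenvalue form equals the least-squares fit of Lam F'
W = W_full + 0.1 * rng.normal(size=(N, T))
Lam, F = estimate_factors(W, R)
print(profile_objective(W, R, N * T), np.sum((W - Lam @ F.T) ** 2) / (N * T))
```

The last comparison holds because, by the Eckart-Young theorem, the residual sum of squares of the best rank-$R$ approximation equals the sum of the discarded eigenvalues of $\mathbf W'\mathbf W$.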
5. Chen, Fernández-Val, and Weidner (2019) use the same conjecture for non-linear models with interactive fixed effects.
6. Other valid normalizing restrictions are discussed in Bai and Ng (2013).
7. If $T > N$, it is computationally more efficient to impose $\boldsymbol\Lambda'\boldsymbol\Lambda/N = \mathbf I_R$ and $\mathbf F'\mathbf F = \operatorname{diag}(\boldsymbol\alpha)$ and to estimate $\hat{\boldsymbol\Lambda}(\boldsymbol\beta)$ as the first $R$ eigenvectors of $\mathbf W(\boldsymbol\beta)\mathbf W(\boldsymbol\beta)'$ multiplied by $\sqrt{N}$ and $\hat{\mathbf F}(\boldsymbol\beta) = \mathbf W(\boldsymbol\beta)'\hat{\boldsymbol\Lambda}(\boldsymbol\beta)/N$.

Let

$$ \left[\mathcal P_{\mathcal D}(\mathbf A)\right]_{it} = \begin{cases} [\mathbf A]_{it} & \text{if } (i,t) \in \mathcal D, \\ 0 & \text{otherwise} \end{cases} \tag{7} $$

be a projection operator that replaces missing observations of any $N \times T$ matrix $\mathbf A$ with zeros. Likewise, $\mathcal P_{\mathcal D}^{\perp}(\mathbf A)$ is the complementary operator that replaces non-missing observations with zeros.

Definition.
EM algorithm
Given $\boldsymbol\beta$ and $R$, initialize $\mathbf W^{\perp} = \mathbf 0_{N \times T}$ and repeat the following steps until convergence:
Step 1: Set $\widetilde{\mathbf W}(\boldsymbol\beta) = \mathcal P_{\mathcal D}(\mathbf W(\boldsymbol\beta)) + \mathcal P_{\mathcal D}^{\perp}(\mathbf W^{\perp})$.
Step 2: Compute $\hat{\mathbf F}(\boldsymbol\beta)$ and $\hat{\boldsymbol\Lambda}(\boldsymbol\beta)$ from $\widetilde{\mathbf W}(\boldsymbol\beta)$.
Step 3: Update $\mathbf W^{\perp} = \hat{\boldsymbol\Lambda}(\boldsymbol\beta)\hat{\mathbf F}(\boldsymbol\beta)'$.

For a given $\boldsymbol\beta$ and $R$, we start by replacing missing observations in $\mathbf W(\boldsymbol\beta)$ with zeros. We denote this augmented matrix by $\widetilde{\mathbf W}(\boldsymbol\beta)$. Afterwards, we estimate $\hat{\mathbf F}(\boldsymbol\beta)$ and $\hat{\boldsymbol\Lambda}(\boldsymbol\beta)$ from $\widetilde{\mathbf W}(\boldsymbol\beta)$, replace the missing observations in $\widetilde{\mathbf W}(\boldsymbol\beta)$ with $\hat{\boldsymbol\lambda}_i(\boldsymbol\beta)'\hat{\mathbf f}_t(\boldsymbol\beta)$, and repeat these two steps until convergence.

Let $\widetilde{\mathbf W}(\boldsymbol\beta)$ denote the augmented matrix after convergence. A general IFE objective function is then given by

$$ \widetilde Q(\boldsymbol\beta) = \frac{1}{n} \sum_{r=R+1}^{\min(N,T)} \mu_r\!\left[\widetilde{\mathbf W}(\boldsymbol\beta)'\widetilde{\mathbf W}(\boldsymbol\beta)\right], \tag{8} $$

where, in the case of balanced data, $\widetilde{\mathbf W}(\boldsymbol\beta)$ is simply $\mathbf W(\boldsymbol\beta)$. Thus, with missing data, we complement the IFE estimator suggested by Moon and Weidner (2015, 2017) with an additional data augmentation step inside the objective function.

Before we describe the estimation of the leading bias terms and the covariance matrix necessary for inference, we need to know how to project the estimated common factors and/or loadings out of an arbitrary $n$-dimensional vector $\mathbf v$. Let $\breve{\mathbf v}$, $\grave{\mathbf v}$, and $\acute{\mathbf v}$ denote the residuals of the following least-squares problems:
$$ \breve v_{it} = v_{it} - \hat{\boldsymbol\lambda}_i'\hat{\mathbf a}_t - \hat{\mathbf f}_t'\hat{\mathbf b}_i, \qquad (\hat{\mathbf A}, \hat{\mathbf B}) = \operatorname*{arg\,min}_{\mathbf A \in \mathbb R^{T \times R},\, \mathbf B \in \mathbb R^{N \times R}} \sum_{(i,t) \in \mathcal D} \left(v_{it} - \hat{\boldsymbol\lambda}_i'\mathbf a_t - \hat{\mathbf f}_t'\mathbf b_i\right)^2, \tag{9} $$

$$ \grave v_{it} = v_{it} - \hat{\boldsymbol\lambda}_i'\hat{\mathbf a}_t, \qquad \hat{\mathbf A} = \operatorname*{arg\,min}_{\mathbf A \in \mathbb R^{T \times R}} \sum_{(i,t) \in \mathcal D} \left(v_{it} - \hat{\boldsymbol\lambda}_i'\mathbf a_t\right)^2, \tag{10} $$

$$ \acute v_{it} = v_{it} - \hat{\mathbf f}_t'\hat{\mathbf b}_i, \qquad \hat{\mathbf B} = \operatorname*{arg\,min}_{\mathbf B \in \mathbb R^{N \times R}} \sum_{(i,t) \in \mathcal D} \left(v_{it} - \hat{\mathbf f}_t'\mathbf b_i\right)^2, \tag{11} $$

where $\mathbf A = [\mathbf a_1, \ldots, \mathbf a_T]'$ and $\mathbf B = [\mathbf b_1, \ldots, \mathbf b_N]'$. In the case of balanced data, these residuals have a straightforward expression: $\breve{\mathbf v} = \operatorname{vec}(\mathbf M_{\hat{\boldsymbol\Lambda}}\mathbf V\mathbf M_{\hat{\mathbf F}})$, $\grave{\mathbf v} = \operatorname{vec}(\mathbf M_{\hat{\boldsymbol\Lambda}}\mathbf V)$, and $\acute{\mathbf v} = \operatorname{vec}(\mathbf V\mathbf M_{\hat{\mathbf F}})$, where $\mathbf V$ is an $N \times T$ matrix with elements $v_{it}$. However, in the presence of missing data, we cannot simply replace missing observations in $\mathbf V$ with zeros and apply the same expressions as in the balanced case. We can still use a standard ordinary least squares estimator to obtain the residuals, but with increasing sample sizes this quickly becomes infeasible, even for moderately large data sets. Suppose we have a data set of $N = 200$ individuals observed for $T = 50$ time periods and consider five factors. Even in this moderate example, the sparse regressor matrix corresponding to the common factors and factor loadings already has $(N + T)R = 1{,}250$ columns.
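For intuition, the following NumPy sketch computes the balanced-data closed forms $\mathbf M_{\hat{\boldsymbol\Lambda}}\mathbf V\mathbf M_{\hat{\mathbf F}}$, $\mathbf M_{\hat{\boldsymbol\Lambda}}\mathbf V$, and $\mathbf V\mathbf M_{\hat{\mathbf F}}$ and compares them with an alternating-projections loop in the spirit of the MAP approach described below, which sweeps out one column of $\hat{\boldsymbol\Lambda}$ or $\hat{\mathbf F}$ at a time using only observed cells. The helper names are hypothetical; the sketch assumes every individual and every period has at least one observation (otherwise a denominator is zero), keeps missing cells at zero, and stops on a simple max-absolute-change tolerance.

```python
import numpy as np


def annihilator(B):
    """M_B = I - B (B'B)^+ B' with the Moore-Penrose pseudoinverse."""
    return np.eye(B.shape[0]) - B @ np.linalg.pinv(B.T @ B) @ B.T


def map_residuals(V, observed, Lam=None, F=None, tol=1e-12, max_iter=100_000):
    """Residuals of V after projecting out Lam a_t and/or F b_i on observed cells.

    Lam is N x R, F is T x R, observed is a boolean N x T mask.
    Missing cells are set to zero and never enter the sums.
    """
    obs = observed.astype(float)
    M = np.where(observed, V, 0.0)
    for _ in range(max_iter):
        M_old = M.copy()
        if Lam is not None:
            for r in range(Lam.shape[1]):
                lam = Lam[:, [r]] * obs                 # lambda_ir on D_t, zero elsewhere
                coef = (lam * M).sum(axis=0) / (lam ** 2).sum(axis=0)  # one value per t
                M = M - lam * coef
        if F is not None:
            for r in range(F.shape[1]):
                f = F[:, r][None, :] * obs              # f_tr on D_i, zero elsewhere
                coef = (f * M).sum(axis=1, keepdims=True) / (f ** 2).sum(axis=1, keepdims=True)
                M = M - f * coef
        if np.max(np.abs(M - M_old)) < tol:
            break
    return M


# Balanced check against the closed forms
rng = np.random.default_rng(2)
N, T, R = 30, 20, 2
Lam, F = rng.normal(size=(N, R)), rng.normal(size=(T, R))
V = rng.normal(size=(N, T))
full = np.ones((N, T), dtype=bool)
breve = annihilator(Lam) @ V @ annihilator(F)
print(np.abs(map_residuals(V, full, Lam, F) - breve).max())
```

Each inner update only touches an $N \times T$ array, so memory grows with $NT$ rather than with the $(N+T)R$ columns of the explicit regressor matrix; the price is an iterative loop whose speed depends on how collinear the projected-out directions are.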
Fortunately, we can borrow insights from the literature on conventional fixed effects models and use sparse solvers such as Halperin (1962) and Fong and Saunders (2011) to mitigate this problem (see Guimarães and Portugal 2010; Gaure 2013; Stammann 2018).

In the presence of missing data, we recommend computing the residuals with the method of alternating projections (MAP, see Halperin 1962) suggested by Stammann (2018). Let $\mathcal D_t = \{i \mid (i,t) \in \mathcal D\}$ and $\mathcal D_i = \{t \mid (i,t) \in \mathcal D\}$. We define the following scalar expressions:

$$ \left[\mathbf M_{\hat{\boldsymbol\lambda}_r}\mathbf v\right]_{it} = v_{it} - \hat\lambda_{ir}\frac{\sum_{j \in \mathcal D_t}\hat\lambda_{jr}v_{jt}}{\sum_{j \in \mathcal D_t}\hat\lambda_{jr}^2} \qquad\text{and}\qquad \left[\mathbf M_{\hat{\mathbf f}_r}\mathbf v\right]_{it} = v_{it} - \hat f_{tr}\frac{\sum_{s \in \mathcal D_i}\hat f_{sr}v_{is}}{\sum_{s \in \mathcal D_i}\hat f_{sr}^2}. $$

Definition.
MAP algorithm (for unbalanced data)
Step 0: Initialize $\mathbf{Mv} = \mathbf v$.
Step 1: (If $\hat{\boldsymbol\Lambda}$ has to be projected out, e.g. in (9) and (10).) For $r = 1, \ldots, R$, set $\mathbf{Mv} = \mathbf M_{\hat{\boldsymbol\lambda}_r}\mathbf{Mv}$.
Step 2: (If $\hat{\mathbf F}$ has to be projected out, e.g. in (9) and (11).) For $r = 1, \ldots, R$, set $\mathbf{Mv} = \mathbf M_{\hat{\mathbf f}_r}\mathbf{Mv}$.
Step 3: Repeat Step 1 and/or Step 2 until convergence, e.g. until $\|\mathbf{Mv}^{(i)} - \mathbf{Mv}^{(i-1)}\| < \epsilon$, where $i$ is the iteration number and $\epsilon$ is a tolerance parameter.
After convergence, $\mathbf{Mv}$ is a close approximation to $\breve{\mathbf v}$, $\grave{\mathbf v}$, or $\acute{\mathbf v}$.

Next, we describe how to draw inference under the assumption that $e_{it}$ is homoskedastic. In this case, $\hat{\boldsymbol\beta}$ is asymptotically unbiased and its covariance can be estimated as $\hat{\mathbf V} = \hat\sigma^2\hat{\mathbf D}^{-1}$, where $\hat\sigma^2 = n^{-1}\sum_{(i,t) \in \mathcal D}\hat e_{it}^2$, $\hat e_{it} = y_{it} - \mathbf x_{it}'\hat{\boldsymbol\beta} - \hat{\boldsymbol\lambda}_i'\hat{\mathbf f}_t$, and

$$ \hat{\mathbf D} = \sum_{(i,t) \in \mathcal D}\breve{\mathbf x}_{it}\breve{\mathbf x}_{it}'. \tag{12} $$

By contrast, if $e_{it}$ is heteroskedastic, the IFE estimator is asymptotically biased, but the bias can be removed with appropriate bias corrections. The Nickell (1981) bias stemming from the inclusion of predetermined or weakly exogenous variables can be corrected in a similar fashion.

Following Moon and Weidner (2015, 2017), a bias-corrected estimator for $\boldsymbol\beta$ is

$$ \tilde{\boldsymbol\beta} = \hat{\boldsymbol\beta} + \hat{\mathbf B} + \hat{\mathbf C}^{(1)} + \hat{\mathbf C}^{(2)}, \tag{13} $$

where $\hat{\mathbf B} = \hat{\mathbf D}^{-1}\hat{\mathbf B}_{\boldsymbol\beta}$, $\hat{\mathbf C}^{(1)} = \hat{\mathbf D}^{-1}\hat{\mathbf C}^{(1)}_{\boldsymbol\beta}$, and $\hat{\mathbf C}^{(2)} = \hat{\mathbf D}^{-1}\hat{\mathbf C}^{(2)}_{\boldsymbol\beta}$ are estimators for the asymptotic biases described in subsection 2.1.
Further, $\hat{\mathbf B}_{\boldsymbol\beta}$, $\hat{\mathbf C}^{(1)}_{\boldsymbol\beta}$, and $\hat{\mathbf C}^{(2)}_{\boldsymbol\beta}$ are $K$-dimensional vectors with elements

$$ \hat B_{\boldsymbol\beta,k} = \sum_{(i,t) \in \mathcal D}\hat e_{it}^2\left[\mathcal P_{\mathcal D}(\grave{\mathbf X}_k)\hat{\boldsymbol\Theta}'\right]_{ii}, \tag{14} $$

$$ \hat C^{(1)}_{\boldsymbol\beta,k} = \sum_{i=1}^{N}\sum_{l=1}^{L}\sum_{t \in \mathcal D_i,\, t > l}\left[\mathbf P_{\hat{\mathbf F}}\right]_{t,t-l}\,\hat e_{i,t-l}\,x_{it,k}, \tag{15} $$

$$ \hat C^{(2)}_{\boldsymbol\beta,k} = \sum_{(i,t) \in \mathcal D}\hat e_{it}^2\left[\mathcal P_{\mathcal D}(\acute{\mathbf X}_k)'\hat{\boldsymbol\Theta}\right]_{tt} + \sum_{i=1}^{N}\sum_{m=1}^{M}\sum_{t \in \mathcal D_i,\, t > m}\hat e_{it}\,\hat e_{i,t-m}\left(\left[\mathcal P_{\mathcal D}(\acute{\mathbf X}_k)'\hat{\boldsymbol\Theta}\right]_{t,t-m} + \left[\mathcal P_{\mathcal D}(\acute{\mathbf X}_k)'\hat{\boldsymbol\Theta}\right]_{t-m,t}\right), \tag{16} $$

where $M$ and $L$ are bandwidth parameters for the truncated kernel of Newey and West (1987), $\hat{\boldsymbol\Theta} = \hat{\boldsymbol\Lambda}(\hat{\boldsymbol\Lambda}'\hat{\boldsymbol\Lambda})^{-1}(\hat{\mathbf F}'\hat{\mathbf F})^{-1}\hat{\mathbf F}'$, and $\mathcal P_{\mathcal D}(\acute{\mathbf X}_k)$ and $\mathcal P_{\mathcal D}(\grave{\mathbf X}_k)$ are $N \times T$ matrices with elements $\acute x_{it,k}$ and $\grave x_{it,k}$ if $(i,t) \in \mathcal D$ and zero otherwise. The first terms of $\hat B_{\boldsymbol\beta,k}$ and $\hat C^{(2)}_{\boldsymbol\beta,k}$ are symmetric counterparts and estimate the biases induced by cross-sectional and time-serial heteroskedasticity in the idiosyncratic error term, respectively. The second term of $\hat C^{(2)}_{\boldsymbol\beta,k}$ corrects for the bias stemming from time-serial correlation (see Bai 2009, Remark 6, and Moon and Weidner 2015). In this article, we do not consider cross-sectional correlation in the idiosyncratic error term because, unlike time periods, cross-sectional units usually have no natural ordering: a large distance between their indices does not imply a small correlation of the corresponding error terms. Thus, a kernel method like the one used for time-serial correlation is not suitable for this purpose. Appropriate estimators for the covariance of $\tilde{\boldsymbol\beta}$ are given by

$$ \widetilde{\mathbf V}_j = \hat{\mathbf D}^{-1}\hat{\boldsymbol\Omega}_j\hat{\mathbf D}^{-1}, \qquad j \in \{1, 2\}, \tag{17} $$

$$ \hat{\boldsymbol\Omega}_1 = \sum_{(i,t) \in \mathcal D}\hat e_{it}^2\,\breve{\mathbf x}_{it}\breve{\mathbf x}_{it}', \tag{18} $$

$$ \hat{\boldsymbol\Omega}_2 = \sum_{i=1}^{N}\left(\sum_{t \in \mathcal D_i}\hat e_{it}\breve{\mathbf x}_{it}\right)\left(\sum_{t \in \mathcal D_i}\hat e_{it}\breve{\mathbf x}_{it}\right)'. \tag{19} $$

$\widetilde{\mathbf V}_1$ is a White (1980)-type heteroskedasticity-robust covariance estimator, and $\widetilde{\mathbf V}_2$ is a cluster-robust covariance estimator that accounts for arbitrary time-serial correlation by assigning observations to individual-specific clusters. Alternatively, the clustered covariance estimator can be replaced by the estimator of Newey and West (1987).

Bai (2009) and Moon and Weidner (2017) show consistency and derive the asymptotic distribution of the IFE estimator under the assumption that the number of factors is known. To avoid ambiguity with $R$, we denote the true number of factors by $R^0$. In practice, this number is rarely known unless economic theory gives a clear prediction about the number of factors. Even then, it may be necessary to support the theoretical prediction with empirical evidence. Therefore, we need a reliable method to estimate the number of factors.

For pure factor models, i.e. the unobserved effects model without covariates, there is already an extensive literature on the estimation of the number of factors (see among others Buja and Eyuboglu 1992; Bai and Ng 2002; Hallin and Liška 2007; Alessi, Barigozzi, and Capasso 2010; Onatski 2010; Ahn and Horenstein 2013; Dobriban and Owen 2019).
8. In the presence of cross-sectional correlation, Bai (2009, Remark 7) presents a partial sample estimator. Alternatively, Bai and Liao (2017) suggest estimating the cross-sectional correlation with an inverse covariance estimator and incorporating the corresponding weights in the objective function.

Note that $y_{it} - \mathbf x_{it}'\hat{\boldsymbol\beta} = \boldsymbol\lambda_i'\mathbf f_t + e_{it} - \mathbf x_{it}'(\hat{\boldsymbol\beta} - \boldsymbol\beta)$ is essentially a pure factor model. Thus, given an appropriate estimator for $\boldsymbol\beta$ such that the error $\mathbf x_{it}'(\hat{\boldsymbol\beta} - \boldsymbol\beta)$ is asymptotically negligible, we can consistently estimate the number of factors using the estimators developed for pure factor models (see Bai 2009, Remark 5 and Appendix).

Next, we need an appropriate estimator for $\boldsymbol\beta$. Bai (2009) argues, without rigorous proof, that $\hat{\boldsymbol\beta}$ is $\sqrt{NT}$-consistent as long as the number of factors is at least $R^0$. The intuition is very similar to the inclusion of irrelevant variables in a standard OLS regression: including redundant common factors does not affect the consistency of the IFE estimator, only its precision (see Bai 2009, Remark 4). Under some more restrictive assumptions than those imposed by Bai (2009) and Moon and Weidner (2017), Moon and Weidner (2015) confirm that the asymptotic distribution of $\hat{\boldsymbol\beta}$ with $R > R^0$ is in fact identical to the one with $R = R^0$. However, under assumptions similar to those of Bai (2009) and Moon and Weidner (2017), the authors can only show $\sqrt{\min(N,T)}$-consistency of $\hat{\boldsymbol\beta}$.

The related literature gives two practical pieces of advice. First, given a known upper bound on the number of factors, we can consistently estimate $\boldsymbol\beta$ and afterwards the number of factors using estimators developed for pure factor models (see Bai 2009). Second, because including irrelevant factors does not affect the limiting distribution, valid inference does not necessarily require a consistent estimate of the true number of factors. However, this flexibility comes with an efficiency loss in finite samples, and thus reliable methods to estimate the number of factors are still useful to ensure that the number of factors used to estimate $\boldsymbol\beta$ is not substantially larger than $R^0$ (see Moon and Weidner 2015).

9. Some of the very restrictive assumptions imposed by Moon and Weidner (2015), like independent and identically standard normally distributed error terms, are mainly due to technical reasons. In simulation experiments, the authors violate this assumption and still find support for their theoretical results.

Throughout this article, we restrict ourselves to the estimators for the number of factors suggested by Bai and Ng (2002), Onatski (2010), Ahn and Horenstein (2013), and Dobriban and Owen (2019), applied to the pure factor model $\mathbf W(\hat{\boldsymbol\beta})$, where $\boldsymbol\beta$ is estimated with $R = R^0$. Bai and Ng (2002) introduce various model selection criteria based on minimizing the sum of squared residuals plus a penalty function of the number of estimated parameters. Onatski (2010), Ahn and Horenstein (2013), and Dobriban and Owen (2019) segment the eigenvalue spectrum of the covariance of $\mathbf W(\hat{\boldsymbol\beta})$ to find a cutoff point between the common factors and the remaining noise stemming from the idiosyncratic error term. Onatski (2010) proposes the edge distribution (ED) estimator based on differences of consecutive eigenvalues. Ahn and Horenstein (2013) suggest using ratios (ER) and growth rates (GR) instead of differences. Buja and Eyuboglu (1992) suggest a specific version of parallel analysis (PA), which compares the eigenvalues to those obtained from independent data. Intuitively, the eigenvalues of independent data provide a clear threshold to separate common factors from noise. Such independent data can be generated by randomly permuting the entries within each column of $\mathbf W(\hat{\boldsymbol\beta})$, which preserves the variances of the data but breaks the correlation pattern induced by the common factors. Recently, Dobriban (forthcoming) provides the theoretical justification for the accuracy of PA, and Dobriban and Owen (2019) propose a deflated version that improves the detection accuracy of smaller but important factors in the presence of large factors.

In the presence of missing data, we follow Gagliardini, Ossola, and Scaillet (2019) and apply the different estimators to $\mathcal P_{\mathcal D}(\mathbf W(\hat{\boldsymbol\beta}))$ instead of $\mathbf W(\hat{\boldsymbol\beta})$.

We study the inference drawn from the interactive fixed effects model in the presence of randomly missing data. In contrast to the balanced case, the IFE estimator requires an additional data augmentation step to estimate the common factors and loadings on complete data (see Bai 2009; Bai, Liao, and Yang 2015). This is done using the EM algorithm proposed by Stock and Watson (1998, 2002). Assuming the true number of factors is known, we first analyze whether the inferential theory derived for the IFE estimator is a reasonable approximation in the presence of randomly missing data. For this purpose, we compare relative biases (Bias), average ratios of standard errors to standard deviations, and empirical sizes of $z$-tests with 5 % nominal size (Size) for different patterns of randomly missing data and configurations of the idiosyncratic error term with those from a balanced panel. Second, because the number of factors is usually unknown, we consider different estimators for the number of factors and compare their performance as well.
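The ratio-based estimators of Ahn and Horenstein (2013) mentioned above can be sketched as follows. This is a simplified reading with a hypothetical function name: it uses the eigenvalues of $\mathbf W\mathbf W'/(NT)$, skips any preliminary demeaning, and omits the mock eigenvalue Ahn and Horenstein add to allow for zero factors, so consult the original paper before relying on it. With missing data, one would pass $\mathcal P_{\mathcal D}(\mathbf W(\hat{\boldsymbol\beta}))$, i.e. the matrix with zeros in the missing cells.

```python
import numpy as np


def er_gr_estimators(W, k_max):
    """Eigenvalue ratio (ER) and growth ratio (GR) estimates of the number of factors.

    mu_1 >= ... >= mu_m are the eigenvalues of W W' / (N T), m = min(N, T), and
    V(k) = mu_{k+1} + ... + mu_m. ER maximises mu_k / mu_{k+1}; GR maximises
    log(V(k-1) / V(k)) / log(V(k) / V(k+1)) over k = 1, ..., k_max.
    """
    N, T = W.shape
    m = min(N, T)
    if not 1 <= k_max <= m - 2:
        raise ValueError("k_max must lie between 1 and min(N, T) - 2")
    mu = np.sort(np.linalg.eigvalsh(W @ W.T / (N * T)))[::-1][:m]
    tail = np.cumsum(mu[::-1])[::-1]        # tail[k] = V(k) in the notation above
    k = np.arange(1, k_max + 1)
    er = mu[k - 1] / mu[k]
    gr = np.log(tail[k - 1] / tail[k]) / np.log(tail[k] / tail[k + 1])
    return int(k[np.argmax(er)]), int(k[np.argmax(gr)])


# Three strong factors plus standard normal noise
rng = np.random.default_rng(4)
N, T, R0 = 100, 60, 3
W = 2.0 * rng.normal(size=(N, R0)) @ rng.normal(size=(T, R0)).T + rng.normal(size=(N, T))
print(er_gr_estimators(W, k_max=8))

# Zero-filled version P_D(W) with roughly 20 % of cells missing at random
observed = rng.random((N, T)) > 0.2
print(er_gr_estimators(np.where(observed, W, 0.0), k_max=8))
```

Zero-filling shrinks and distorts the spectrum, which is one way to see why the accuracy of these estimators can depend on the fraction and pattern of missing data.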
We analyze the estimators suggested by Bai and Ng(2002), Onatski (2010), Ahn and Horenstein (2013), and Dobriban and Owen (2019). Fromthe various information criteria introduced by Bai and Ng (2002), we focus on IC and BIC which are also used in other studies (see Onatski 2010; Ahn and Horenstein 2013). To assesthe performance, we compare the average estimated number of factors of all estimators.As Moon and Weidner (2015), we consider a static panel data model with one regressorand two factors: y it = βx it + (cid:88) r =1 λ ir f tr + e it ,x it = 1 + (cid:88) r =1 ( λ ir + χ ir ) ( f tr + f t − ,r ) + w it ,i = 1 , . . . , N , t = 1 , . . . , T , and e it is an idiosyncratic error term. The regressor is generatedsuch that there is a correlation between the common factors and loadings. Throughout allexperiments, we generate f tr and w it as iid. N (0 , and λ ir and χ ir as iid. N (1 , .In spirit of Bai and Ng (2002) and Ahn and Horenstein (2013), we consider four different12onfigurations for the idiosyncratic error term: i) homoskedastic, ii) homoskedastic with fattails, iii) cross-sectional heteroskedastic, and iv) cross-sectional heteroskedastic with time-serial correlation. More precisely, i) e it ∼ iid. N (0 , , ii) e it = (cid:112) / ν it , where ν it has a t -distribution with five degrees of freedom, iii) e it ∼ iid. N (0 , if i is odd and e it ∼ iid. N (0 , else, and iv) e it = 0 . e it − + ν it , where ν it ∼ iid. N (0 , / if i is odd and ν it ∼ iid. N (0 , / else. For configuration iv), we ensure that e it is drawn from its stationary distribution bydiscarding the initial 1,000 time periods. Note that the variance of the idiosyncratic errorterm is equal across all configurations.We consider three different patterns where a fraction of ψ ∈ { , . , . } observations aremissing at random. The overall sample size is equal to N T (1 − ψ ) . Figure 1 gives a graphicalillustration about the three missing data patterns. 
In the first pattern, we irregularly drop $NT\psi$ observations from the entire panel data set. This pattern is also analyzed by Bai, Liao, and Yang (2015) and mimics a situation in surveys where individuals refuse or forget to answer certain questions. The other patterns are borrowed from Czarnowske and Stammann (2019) and reflect situations where individuals who drop out of a survey are either replaced or not. To describe patterns 2 and 3, we divide all individuals into two types. Type 1 consists of $N_1 = 2\psi N$ individuals that are observed for $T_1 = T/2$ time periods. The remaining $N_2 = N - N_1$ individuals are of type 2 and are observed over the entire time horizon ($T_2 = T$). Patterns 2 and 3 differ only in the point in time at which the time series of a type 1 individual begins. In pattern 2, all time series start in $t = 1$, whereas in pattern 3, the initial period is chosen randomly with equal probability from $\{\cdot, \cdot, \ldots, T - T_1\}$. All unbalanced data sets are generated from balanced panels by randomly dropping observations according to the corresponding missing data pattern.

Figure 1: Patterns of Randomly Missing Observations (panels: Pattern 1, Pattern 2, Pattern 3; horizontal axis: Time Periods; vertical axis: Individuals)

We consider panel data sets of different average sizes, $\bar N \in \{120, 240\}$ and $\bar T \in \{24, 48, 96\}$, where $N = \bar N/(1 - \psi)$ and $T = \bar T/(1 - \psi)$. This allows us to compare the results across different fractions of missing data and to check whether the conjecture of Fernández-Val and Weidner (2018) applies to the IFE estimator as well. All results are based on 1,000 replications and summarized in tables 1–6. All computations were done on a Linux Mint 18.1 workstation using R version 3.6.3 (R Core Team 2019).

First, we analyze the finite sample properties of the IFE estimator. In configuration iii), we correct for the asymptotic bias ($B$) induced by cross-sectional heteroskedasticity and use an appropriate covariance estimator in the spirit of White (1980). In configuration iv), we additionally correct for the asymptotic bias ($C$) induced by time-serial correlation and use a cluster-robust covariance estimator.
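Returning to the three missing data patterns described above, they can be generated as boolean masks over a balanced panel. The sketch below assumes NumPy and uses hypothetical helper names (`missing_mask`, `full_sizes`). It also assumes that $NT\psi$ and $2\psi N$ are integers and that $T$ is even. In pattern 3 the start of a type 1 spell is drawn uniformly among all positions that keep the spell inside the panel; this is an assumption on our part, because the exact set is not legible in the text.

```python
import numpy as np


def missing_mask(N, T, psi, pattern, rng=None):
    """Boolean (N, T) mask, True where an observation is kept.

    Pattern 1: drop N*T*psi cells completely at random.
    Patterns 2 and 3: N1 = 2*psi*N randomly chosen type 1 individuals are
    observed for T1 = T/2 consecutive periods only; their spell starts in the
    first period (pattern 2) or at a uniformly drawn admissible period (pattern 3).
    Assumes N*T*psi and 2*psi*N are integers and T is even.
    """
    rng = np.random.default_rng(rng)
    observed = np.ones((N, T), dtype=bool)
    if pattern == 1:
        drop = rng.choice(N * T, size=round(N * T * psi), replace=False)
        observed.flat[drop] = False
    elif pattern in (2, 3):
        N1, T1 = round(2 * psi * N), T // 2
        for i in rng.choice(N, size=N1, replace=False):
            # 0-based start; pattern 3 draws from {0, ..., T - T1} so the spell fits
            start = 0 if pattern == 2 else int(rng.integers(0, T - T1 + 1))
            observed[i] = False
            observed[i, start:start + T1] = True
    else:
        raise ValueError("pattern must be 1, 2, or 3")
    return observed


def full_sizes(N_bar, T_bar, psi):
    """Dimensions of the balanced panel: N = N_bar/(1 - psi), T = T_bar/(1 - psi)."""
    return round(N_bar / (1 - psi)), round(T_bar / (1 - psi))
```

Setting `y[~observed] = np.nan` on a simulated balanced panel then yields an unbalanced data set. `full_sizes` converts the average sizes into the dimensions of the balanced panel from which observations are dropped.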
We choose the bandwidth for the estimation of $C$ according to the rule of thumb proposed by Newey and West (1994): $M = 4(T/100)^{2/9}$. The results are summarized in tables 1–3. For configurations i)–iii), we observe biases, ratios, and sizes that are almost identical to the balanced case, irrespective of the fraction and pattern of missing data. Thus, the asymptotic properties of the IFE estimator for balanced data are a fairly good approximation for unbalanced data in these configurations. This is different for configuration iv). Here we observe biases that are about twice as large as in the balanced case. Although all ratios are close to one, these larger biases distort the nominal sizes and lead to over-rejection. In contrast to the other configurations, the various missing data patterns affect the finite sample properties of the estimator differently, and a larger fraction of missing data leads to worse properties.

Next, we analyze the different estimators for the number of factors suggested by Bai and Ng (2002), Onatski (2010), Ahn and Horenstein (2013), and Dobriban and Owen (2019). The initial estimator used to obtain the pure factor model includes a number of factors R that increases with min(N, T) according to a rule of thumb. This choice differs from other studies, such as Bai and Ng (2002), Onatski (2010), and Ahn and Horenstein (2013), which keep the number of factors fixed irrespective of the sample size. For ER and GR, we use the mock eigenvalue proposed by Ahn and Horenstein (2013) to allow for the possibility of selecting zero common factors. All results are summarized in tables 4–6. We first analyze the case of balanced data. For $T \geq \cdot$, all estimators have little bias. Additionally, for BIC, ED, and PA the biases are low irrespective of the sample size, whereas ER and GR slightly underestimate the true number of factors. These findings are in line with Ahn and Horenstein (2013) for pure factor models and suggest that the error in estimating β is asymptotically negligible.
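The ER and GR estimators mentioned above are simple functions of the ordered eigenvalues $\tilde\mu_1 \geq \tilde\mu_2 \geq \dots$ of $XX'/(NT)$. The sketch below assumes NumPy, a complete residual matrix, and a hypothetical function name, `er_gr`. It uses the mock eigenvalue $\tilde\mu_0 = \sum_{k=1}^{m}\tilde\mu_k/\ln m$ with $m = \min(N, T)$, as we understand it from Ahn and Horenstein (2013). The two criteria are:

- ER maximizes $\tilde\mu_k/\tilde\mu_{k+1}$;
- GR maximizes $\ln(1 + \tilde\mu_k^*)/\ln(1 + \tilde\mu_{k+1}^*)$, where $\tilde\mu_k^* = \tilde\mu_k/\sum_{j>k}\tilde\mu_j$.

Both maximizations run over $k = 0, \ldots, k_{\max}$.

```python
import numpy as np


def er_gr(X, k_max):
    """Return (ER, GR) estimates of the number of factors in the (N, T) matrix X."""
    N, T = X.shape
    m = min(N, T)
    if k_max + 2 > m:
        raise ValueError("k_max must satisfy k_max + 2 <= min(N, T)")
    gram = X @ X.T if N <= T else X.T @ X                    # the smaller Gram matrix
    mu = np.sort(np.linalg.eigvalsh(gram / (N * T)))[::-1]   # mu_1 >= ... >= mu_m
    mu = np.concatenate(([mu.sum() / np.log(m)], mu))        # prepend mock eigenvalue mu_0
    tail = np.cumsum(mu[::-1])[::-1]                         # tail[j] = mu_j + ... + mu_m
    k = np.arange(k_max + 1)
    er = mu[k] / mu[k + 1]
    # mu*_k = mu_k / V(k) with V(k) = sum_{j > k} mu_j = tail[k + 1]
    mu_star = mu[: k_max + 2] / tail[1 : k_max + 3]
    gr = np.log1p(mu_star[k]) / np.log1p(mu_star[k + 1])
    return int(np.argmax(er)), int(np.argmax(gr))
```

With a strong two-factor signal both criteria typically return 2. With pure noise, the mock eigenvalue typically makes them return 0, which is the possibility the text refers to.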
This is also an additional robustness check for Moon and Weidner (2015), who expect that their main results also apply to error terms that are not iid. standard normally distributed. For unbalanced data, we observe that the missing data patterns as well as the fraction of missing data affect the performance of all estimators differently. In general, we find that ER and GR are more likely to underestimate the number of factors, whereas the others tend to overestimate it. Further, the performance gets worse as the fraction
10. This rule of thumb was suggested by Bai and Ng (2002) in footnote 10 and traces back to Schwert (1989).

Table 1: Properties of $\hat\beta$ - Missing Data Pattern 1
Entries in each cell: ψ = 0 / intermediate ψ / largest ψ
N    T    Bias                    Ratio                 Size
Homoskedastic
120  24   0.07 / -0.01 / 0.14     0.91 / 0.88 / 0.92    0.07 / 0.08 / 0.08
120  48   0.05 / 0.00 / 0.03      1.00 / 0.92 / 0.95    0.05 / 0.08 / 0.05
120  96   0.06 / 0.01 / 0.04      0.99 / 1.00 / 0.97    0.05 / 0.06 / 0.06
240  24   0.06 / 0.02 / 0.04      0.91 / 0.96 / 0.90    0.07 / 0.05 / 0.08
240  48   0.03 / 0.01 / 0.02      0.96 / 0.97 / 0.98    0.05 / 0.06 / 0.05
240  96   0.01 / 0.00 / 0.01      0.98 / 0.99 / 0.99    0.05 / 0.04 / 0.05
Homoskedastic with Fat Tails
120  24   0.15 / 0.06 / 0.10      0.92 / 0.85 / 0.90    0.07 / 0.09 / 0.08
120  48   -0.01 / 0.03 / 0.05     0.94 / 0.97 / 0.91    0.06 / 0.07 / 0.06
120  96   0.00 / 0.02 / 0.02      0.97 / 0.88 / 0.95    0.06 / 0.06 / 0.06
240  24   0.03 / 0.03 / 0.04      0.92 / 0.89 / 0.93    0.07 / 0.06 / 0.07
240  48   -0.01 / 0.00 / 0.01     0.97 / 0.93 / 0.98    0.06 / 0.07 / 0.06
240  96   -0.01 / -0.03 / 0.00    0.97 / 0.99 / 0.98    0.05 / 0.06 / 0.06
Cross-Sectional Heteroskedastic
120  24   0.05 / -0.03 / 0.13     0.90 / 0.92 / 0.92    0.08 / 0.07 / 0.07
120  48   -0.01 / -0.01 / 0.00    0.97 / 0.92 / 0.96    0.06 / 0.07 / 0.06
120  96   0.01 / -0.01 / 0.05     0.95 / 0.94 / 0.96    0.07 / 0.06 / 0.06
240  24   -0.04 / 0.02 / 0.06     0.89 / 0.89 / 0.92    0.07 / 0.08 / 0.08
240  48   0.02 / 0.03 / 0.00      0.97 / 0.95 / 0.95    0.06 / 0.05 / 0.06
240  96   0.02 / -0.01 / 0.01     0.98 / 0.98 / 1.00    0.06 / 0.05 / 0.04
Cross-Sectional Heteroskedastic with Time-Serial Correlation
120  24   -1.19 / -1.20 / -1.17   0.87 / 0.93 / 0.97    0.14 / 0.15 / 0.17
120  48   -0.32 / -0.52 / -0.51   1.01 / 0.99 / 0.98    0.05 / 0.09 / 0.09
120  96   -0.16 / -0.24 / -0.25   0.96 / 0.97 / 0.97    0.07 / 0.06 / 0.08
240  24   -1.28 / -1.31 / -1.20   0.86 / 0.92 / 0.98    0.23 / 0.26 / 0.30
240  48   -0.37 / -0.57 / -0.56   0.96 / 0.94 / 0.97    0.08 / 0.15 / 0.17
240  96   -0.12 / -0.24 / -0.29   0.98 / 1.00 / 0.98    0.06 / 0.08 / 0.12
Note:
Bias refers to relative biases in %, Ratio denotes the average ratios of standard errors and standard deviations, and Size is the empirical size of z-tests with 5 % nominal size. Results are based on 1,000 repetitions.

Table 2: Properties of $\hat\beta$ - Missing Data Pattern 2
Entries in each cell: ψ = 0 / intermediate ψ / largest ψ
N    T    Bias                    Ratio                 Size
Homoskedastic
120  24   0.07 / -0.01 / 0.06     0.91 / 0.92 / 0.92    0.07 / 0.08 / 0.07
120  48   0.05 / 0.04 / 0.07      1.00 / 0.96 / 1.00    0.05 / 0.05 / 0.05
120  96   0.06 / 0.00 / 0.04      0.99 / 0.96 / 0.97    0.05 / 0.06 / 0.06
240  24   0.06 / 0.08 / 0.05      0.91 / 0.89 / 0.94    0.07 / 0.08 / 0.06
240  48   0.03 / -0.01 / 0.01     0.96 / 0.94 / 0.98    0.05 / 0.06 / 0.06
240  96   0.01 / 0.00 / 0.02      0.98 / 0.94 / 0.98    0.05 / 0.07 / 0.06
Homoskedastic with Fat Tails
120  24   0.15 / 0.05 / 0.12      0.92 / 0.87 / 0.88    0.07 / 0.08 / 0.07
120  48   -0.01 / 0.00 / 0.03     0.94 / 0.93 / 0.94    0.06 / 0.06 / 0.06
120  96   0.00 / -0.01 / 0.00     0.97 / 0.98 / 0.97    0.06 / 0.06 / 0.06
240  24   0.03 / 0.07 / 0.04      0.92 / 0.90 / 0.88    0.07 / 0.07 / 0.07
240  48   -0.01 / -0.01 / 0.05    0.97 / 0.91 / 0.95    0.06 / 0.06 / 0.07
240  96   -0.01 / 0.02 / 0.03     0.97 / 0.96 / 0.96    0.05 / 0.06 / 0.07
Cross-Sectional Heteroskedastic
120  24   0.05 / 0.06 / 0.04      0.90 / 0.90 / 0.90    0.08 / 0.07 / 0.07
120  48   -0.01 / 0.01 / 0.05     0.97 / 0.94 / 0.92    0.06 / 0.06 / 0.08
120  96   0.01 / -0.01 / 0.02     0.95 / 0.97 / 0.96    0.07 / 0.05 / 0.06
240  24   -0.04 / 0.06 / 0.01     0.89 / 0.93 / 0.89    0.07 / 0.07 / 0.08
240  48   0.02 / 0.00 / 0.00      0.97 / 0.96 / 0.95    0.06 / 0.06 / 0.06
240  96   0.02 / 0.01 / 0.01      0.98 / 0.93 / 0.97    0.06 / 0.06 / 0.06
Cross-Sectional Heteroskedastic with Time-Serial Correlation
120  24   -1.19 / -1.52 / -1.76   0.87 / 0.96 / 0.91    0.14 / 0.19 / 0.30
120  48   -0.32 / -0.69 / -0.82   1.01 / 0.95 / 0.95    0.05 / 0.12 / 0.18
120  96   -0.16 / -0.32 / -0.39   0.96 / 0.98 / 1.00    0.07 / 0.08 / 0.09
240  24   -1.28 / -1.61 / -1.80   0.86 / 0.85 / 0.83    0.23 / 0.37 / 0.53
240  48   -0.37 / -0.65 / -0.83   0.96 / 0.96 / 0.88    0.08 / 0.17 / 0.31
240  96   -0.12 / -0.29 / -0.38   0.98 / 1.02 / 1.00    0.06 / 0.08 / 0.16
Note:
Bias refers to relative biases in %, Ratio denotes the average ratios of standard errors and standard deviations, and Size is the empirical size of z-tests with 5 % nominal size. Results are based on 1,000 repetitions.

Table 3: Properties of $\hat\beta$ - Missing Data Pattern 3
Entries in each cell: ψ = 0 / intermediate ψ / largest ψ
N    T    Bias                    Ratio                 Size
Homoskedastic
120  24   0.07 / 0.09 / 0.02      0.91 / 0.90 / 0.90    0.07 / 0.08 / 0.08
120  48   0.05 / 0.04 / 0.00      1.00 / 0.92 / 0.96    0.05 / 0.08 / 0.06
120  96   0.06 / 0.03 / 0.04      0.99 / 0.96 / 0.99    0.05 / 0.06 / 0.05
240  24   0.06 / 0.08 / 0.03      0.91 / 0.92 / 0.90    0.07 / 0.08 / 0.08
240  48   0.03 / 0.01 / 0.03      0.96 / 0.92 / 0.95    0.05 / 0.07 / 0.06
240  96   0.01 / 0.03 / 0.00      0.98 / 0.98 / 0.96    0.05 / 0.06 / 0.06
Homoskedastic with Fat Tails
120  24   0.15 / 0.14 / 0.07      0.92 / 0.87 / 0.88    0.07 / 0.08 / 0.08
120  48   -0.01 / 0.02 / 0.03     0.94 / 0.86 / 0.89    0.06 / 0.06 / 0.07
120  96   0.00 / 0.03 / 0.00      0.97 / 0.97 / 0.94    0.06 / 0.06 / 0.06
240  24   0.03 / 0.04 / 0.05      0.92 / 0.89 / 0.94    0.07 / 0.07 / 0.06
240  48   -0.01 / 0.01 / 0.01     0.97 / 0.95 / 0.93    0.06 / 0.07 / 0.07
240  96   -0.01 / 0.02 / 0.01     0.97 / 0.94 / 0.94    0.05 / 0.07 / 0.06
Cross-Sectional Heteroskedastic
120  24   0.05 / 0.13 / 0.13      0.90 / 0.91 / 0.90    0.08 / 0.07 / 0.09
120  48   -0.01 / 0.07 / 0.05     0.97 / 0.95 / 0.93    0.06 / 0.07 / 0.07
120  96   0.01 / 0.01 / 0.00      0.95 / 0.99 / 0.96    0.07 / 0.05 / 0.06
240  24   -0.04 / 0.00 / 0.01     0.89 / 0.94 / 0.92    0.07 / 0.05 / 0.07
240  48   0.02 / 0.02 / 0.02      0.97 / 0.99 / 1.00    0.06 / 0.06 / 0.05
240  96   0.02 / 0.00 / -0.01     0.98 / 0.95 / 0.98    0.06 / 0.06 / 0.06
Cross-Sectional Heteroskedastic with Time-Serial Correlation
120  24   -1.19 / -1.53 / -1.84   0.87 / 0.92 / 0.94    0.14 / 0.21 / 0.33
120  48   -0.32 / -0.66 / -0.87   1.01 / 0.97 / 0.94    0.05 / 0.10 / 0.21
120  96   -0.16 / -0.29 / -0.45   0.96 / 0.98 / 1.01    0.07 / 0.08 / 0.11
240  24   -1.28 / -1.62 / -1.93   0.86 / 0.94 / 0.90    0.23 / 0.37 / 0.59
240  48   -0.37 / -0.70 / -0.92   0.96 / 0.97 / 0.92    0.08 / 0.17 / 0.36
240  96   -0.12 / -0.29 / -0.45   0.98 / 0.97 / 0.97    0.06 / 0.10 / 0.21
Note:
Bias refers to relative biases in %, Ratio denotes the average ratios of standard errors and standard deviations, and Size is the empirical size of z-tests with 5 % nominal size. Results are based on 1,000 repetitions.
of missing data increases. While the accuracy of the different estimators in pattern 1 is still very close to that in balanced panels, this is only partially the case in the other two patterns. Intuitively, if the missing data pattern consists of large blocks without any observations, the information available to estimate the common factors and loadings is substantially lower. Because these estimates are used to augment the missing observations, the result is noisy estimates. This explains why the performance in patterns 2 and 3, which contain such large blocks, is worse than in pattern 1.

To sum up, we find that the properties of the IFE estimator in the presence of randomly missing data are fairly well approximated by the asymptotic theory derived by Bai (2009) and Moon and Weidner (2017). Further, the accuracy of the different estimators for the number of factors differs substantially across fractions and patterns of randomly missing data. Overall, these findings are very different from those for conventional fixed effects models, where neither the fraction nor the pattern of randomly missing data affects inference (see Czarnowske and Stammann 2019 for fixed effects binary choice models).

Table 4: Expected Value of $\hat R$ - Missing Data Pattern 1
Rows: N ∈ {120, 240}, T ∈ {24, 48, 96}; columns: IC, BIC, ER, GR, ED, and PA, each for the three fractions ψ of missing data; panels: Homoskedastic; Homoskedastic with Fat Tails; Cross-Sectional Heteroskedastic; Cross-Sectional Heteroskedastic with Time-Serial Correlation.
Note: IC and BIC denote the information criteria of Bai and Ng (2002), ER and GR are the estimators of Ahn and Horenstein (2013), ED is the estimator of Onatski (2010), and PA is the deflated parallel analysis suggested by Dobriban and Owen (2019). The true number of factors is two. The initial estimator for β uses a rule-of-thumb number of factors R that depends on min(N, T). Results are based on 1,000 repetitions.

Table 5: Expected Value of $\hat R$ - Missing Data Pattern 2
Rows: N ∈ {120, 240}, T ∈ {24, 48, 96}; columns: IC, BIC, ER, GR, ED, and PA, each for the three fractions ψ of missing data; panels: Homoskedastic; Homoskedastic with Fat Tails; Cross-Sectional Heteroskedastic; Cross-Sectional Heteroskedastic with Time-Serial Correlation.
Note: IC and BIC denote the information criteria of Bai and Ng (2002), ER and GR are the estimators of Ahn and Horenstein (2013), ED is the estimator of Onatski (2010), and PA is the deflated parallel analysis suggested by Dobriban and Owen (2019). The true number of factors is two. The initial estimator for β uses a rule-of-thumb number of factors R that depends on min(N, T). Results are based on 1,000 repetitions.

Table 6: Expected Value of $\hat R$ - Missing Data Pattern 3
Rows: N ∈ {120, 240}, T ∈ {24, 48, 96}; columns: IC, BIC, ER, GR, ED, and PA, each for the three fractions ψ of missing data; panels: Homoskedastic; Homoskedastic with Fat Tails; Cross-Sectional Heteroskedastic; Cross-Sectional Heteroskedastic with Time-Serial Correlation.
Note: IC and BIC denote the information criteria of Bai and Ng (2002), ER and GR are the estimators of Ahn and Horenstein (2013), ED is the estimator of Onatski (2010), and PA is the deflated parallel analysis suggested by Dobriban and Owen (2019). The true number of factors is two. The initial estimator for β uses a rule-of-thumb number of factors R that depends on min(N, T). Results are based on 1,000 repetitions.

Empirical Illustration
Whether democracy causes economic growth is a long-standing and very controversial question among economists. Recently, Acemoglu et al. (2019) provided evidence that democratization has a very substantial positive impact on GDP per capita. Using annual data on 175 countries observed between 1960 and 2010, their main findings suggest a long-run effect of about 20 %. The data set constructed by the authors is very well suited for our purposes, as it is naturally unbalanced and covers a very long time horizon with several common shocks induced by technological progress and financial crises. Overall, the sample consists of 6,934 observations, of which 3,558 are classified as democratic. Across 88 different countries, we observe 122 transitions to democracy and 71 transitions to non-democracy. The average GDP per capita, measured in year 2000 dollars, is 8,150 for democratic and 2,074 for non-democratic countries. 71 countries are observed over the entire time horizon; on average, the data set covers 136 countries per year and 40 years per country. The fraction and pattern of missing data are comparable to the setting with $\psi = 0.\cdot$ and pattern 3 in our simulation study.

We reassess the baseline analysis of Acemoglu et al. (2019) using the IFE estimator and the following specification:

$$y_{it} = \beta D_{it} + \sum_{j=1}^{p} \gamma_j y_{i,t-j} + \alpha_i + \delta_t + \lambda_i' f_t + e_{it}, \qquad (20)$$

where $i$ and $t$ are country and time indexes, $D_{it}$ is an indicator for being a democracy, and $y_{it}$ is the natural logarithm of GDP per capita. $\alpha_i$ and $\delta_t$ are additive fixed effects that capture time-invariant country characteristics and control for the global business cycle, respectively. In contrast to Acemoglu et al. (2019), we further decompose the time-varying unobservable shocks into a factor structure $\lambda_i' f_t$ and a remaining idiosyncratic component $e_{it}$. This allows us to capture common shocks ($f_t$), which simultaneously affect the growth and democratization of a country in different ways ($\lambda_i$).
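To make the estimation of the factor structure $\lambda_i' f_t$ concrete, here is a minimal least-squares sketch in the spirit of Bai (2009) for a panel with missing cells. It is an illustration, not the authors' implementation, and rests on the following assumptions:

- NumPy is available, and the function name `ife_unbalanced` is hypothetical.
- The additive effects have already been projected out.
- Lagged outcomes are stacked as additional regressors in `X`.
- Before each principal components step, missing cells are filled with the current fitted factor structure. This EM-type imputation is our choice for simplicity.

No bias correction is applied.

```python
import numpy as np


def ife_unbalanced(Y, X, observed, R, n_iter=1000, tol=1e-9):
    """Least-squares IFE estimates of beta on an unbalanced panel (a sketch).

    Y: (N, T) outcomes, X: (K, N, T) regressors, observed: (N, T) boolean mask.
    Alternates between (a) a rank-R principal components fit of the residuals,
    with missing cells filled by the current fit, and (b) OLS of
    y_it - lambda_i'f_t on x_it over the observed cells.
    """
    Xo = X[:, observed].T                            # (n_obs, K) observed regressors
    yo = Y[observed]
    beta = np.linalg.lstsq(Xo, yo, rcond=None)[0]    # pooled OLS start, i.e. R = 0
    fit = np.zeros(Y.shape)
    for _ in range(n_iter):
        resid = Y - np.tensordot(beta, X, axes=1)    # y_it - x_it'beta
        filled = np.where(observed, resid, fit)      # impute missing cells
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        fit = (U[:, :R] * s[:R]) @ Vt[:R]            # rank-R estimate of lambda_i'f_t
        beta_new = np.linalg.lstsq(Xo, yo - fit[observed], rcond=None)[0]
        done = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if done:
            break
    return beta, fit
```

Given the other block, each step minimizes the sum of squared residuals over the observed cells, so the objective never increases. Because the objective is not convex, the pooled OLS starting value can matter in practice.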
The dynamic specification permits a distinction between short- and long-run effects of democratization, where the former is $\hat\beta$ and the latter can be computed as $\hat\phi = \hat\beta / (1 - \sum_{j=1}^{p}\hat\gamma_j)$. We use $p \in \{1, 2, 4\}$, where $p = 4$ is the preferred specification of Acemoglu et al. (2019).

To reduce the number of parameters during the optimization, we project the country- and time-specific effects out of the dependent variable and all regressors before estimating β and γ (see Bai 2009, section 8). After the projection, the model becomes

$$\ddot y_{it} = \beta \ddot D_{it} + \sum_{j=1}^{p} \gamma_j \ddot y_{i,t-j} + \ddot\lambda_i' \ddot f_t + e_{it}, \qquad (21)$$

where the two dots denote variables after projecting out both additive effects. In the absence of any common factors ($R = 0$), this is simply the conventional fixed effects model.

For valid inference, it is important to know the true number of factors, or at least to have an estimate that is larger than, but close to, the true number. Because the true number of factors is unknown, we proceed as follows. First, we estimate each specification with $R = 10$ to obtain the pure factor models $P_D(\ddot W(\hat\beta, \hat\gamma))$, where $\ddot w_{it}(\hat\beta, \hat\gamma) = \ddot y_{it} - \hat\beta \ddot D_{it} - \sum_{j=1}^{p}\hat\gamma_j \ddot y_{i,t-j}$. Note that this number of factors equals the rule of thumb used in the simulation experiments. Afterwards, we apply the estimators suggested by Bai and Ng (2002), Onatski (2010), Ahn and Horenstein (2013), and Dobriban and Owen (2019) to estimate the number of factors. Table 7 summarizes the results. The estimates are almost identical across specifications. Both model selection criteria of Bai and Ng (2002) predict a substantially larger number of factors than the other estimators, with IC always selecting the upper bound. This is in line with Ahn and Horenstein (2013), who find that the information criteria are quite sensitive to the chosen upper bound on the number of factors and tend to overestimate.
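Two computational steps of the specification above can be sketched as follows, assuming NumPy and hypothetical function names.

- `two_way_within` projects country and time effects out of a variable on an unbalanced panel by alternately subtracting country means and time means over the observed cells. This method of alternating projections is our choice; the text does not say how the projection is computed. It requires every country and every year to have at least one observation.
- `long_run_effect` computes $\hat\phi = \hat\beta/(1 - \sum_{j=1}^{p}\hat\gamma_j)$.

```python
import numpy as np


def two_way_within(Z, observed, n_iter=1000, tol=1e-12):
    """Project country and time effects out of Z on the observed cells.

    Alternating projections: subtract country means, then time means, both
    computed over observed cells only, until the update becomes negligible.
    Missing cells are returned as NaN.
    """
    Z = np.where(observed, Z, 0.0).astype(float)
    w = observed.astype(float)
    for _ in range(n_iter):
        Z_old = Z.copy()
        Z -= (Z.sum(axis=1) / w.sum(axis=1))[:, None] * w   # remove alpha_i
        Z -= (Z.sum(axis=0) / w.sum(axis=0))[None, :] * w   # remove delta_t
        if np.max(np.abs(Z - Z_old)) < tol:
            break
    return np.where(observed, Z, np.nan)


def long_run_effect(beta, gammas):
    """Long-run effect phi = beta / (1 - sum_j gamma_j)."""
    return beta / (1.0 - np.sum(gammas))
```

As a check, take the reported persistence as $\sum_j\hat\gamma_j$ and plug in the rounded Interactive values for $R = 3$ and $p = 4$ from Table 8 ($\hat\beta = 0.622$, persistence 0.966). This gives about 18.29, close to the reported 18.264; the gap comes from rounding the persistence coefficient.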
We partially observe the same behavior in our simulation experiments. In contrast to the model selection criteria, the estimators of Onatski (2010), Ahn and Horenstein (2013), and Dobriban and Owen (2019) all suggest one or three common factors. Additionally, figure 2 shows the singular values of the pure factor models (non-filled dots) and those of a permuted version of the data (filled dots). More precisely, we randomly shuffle each column of $P_D(\ddot W(\hat\beta, \hat\gamma))$ and compute the maximum of each singular value across 199 randomized samples. Note that this is essentially a graphical illustration of the parallel analysis without the deflation proposed by Dobriban and Owen (2019). The large gap between the first and the second singular value explains why most of the estimators that try to decompose the eigenvalue spectrum predict one factor. However, if we compare the spectra with those of the permuted data, we find that factors two and three have some additional explanatory power, even if it is quite low in terms of variance explained. If we additionally consider the results of Moon and Weidner (2015), who showed that overestimating the number of factors is better than underestimating it, $R = 3$ is our preferred choice.

Table 8 summarizes the results of different additive and interactive fixed effects estimators (Interactive). As Acemoglu et al. (2019), we report results for the fixed effects estimator (Within), the Arellano-Bond estimator (AB, see Arellano and Bond 1991), and the Hahn-Hausman-Kuersteiner estimator (HHK, see Hahn, Hausman, and Kuersteiner 2004). However, instead of the conventional fixed effects estimator used by the authors, we report results of a bias-corrected within estimator with bandwidth $L = 5$ that accounts for the asymptotic bias induced by the predetermined regressor (see Nickell 1981). For Interactive, we report results for $R \in \{1, 2, 3\}$. To correct for the Nickell (1981) bias and the biases induced by cross-sectional heteroskedasticity and time-serial correlation, we use the asymptotic bias corrections proposed by Bai (2009) and Moon and Weidner (2017) with $L = 5$ and $M = 3$. Similar to Acemoglu et al. (2019), we report estimates and standard errors of the short- and long-run effects of democratization and of the persistence of GDP processes. Further, all standard errors are heteroskedasticity robust and clustered at the country level to allow for arbitrary patterns of time-serial correlation.

All estimators reveal a strong and significant persistence of GDP processes across all specifications. The coefficients of democratization obtained by Within and by Interactive with $R \geq 2$ are always significant at the 5 % level. By contrast, the HHK coefficients are significant only for $p = 4$, and the AB coefficients only for $p = 4$ and, marginally, for $p = 1$.

If we focus on $p = 4$, which is Acemoglu et al. (2019)'s preferred specification, the fixed effects models used by the authors reveal short-run effects of a transition to democracy between 0.828 % and 1.178 % and long-run effects between 16.448 % and 29.262 %. However, after controlling for additional time-varying unobserved heterogeneity, we find short- and long-run effects that are substantially lower than those reported by the authors. Our preferred Interactive estimator with $R = 3$ yields short- and long-run estimates of 0.622 % and 18.264 %.

Next, we consider two different sensitivity checks. First, the estimation of the asymptotic biases requires bandwidth choices. We check the sensitivity of the results by analyzing all combinations of the bandwidths $L \in \{\cdot, \ldots, \cdot\}$ and $M \in \{\cdot, \ldots, \cdot\}$. Second, we report estimates of Interactive for $R \in \{4, \ldots, 10\}$.

Table 7: Estimated Number of Factors
Specification   IC   BIC   ER   GR   ED   PA
p = 1           10   7     1    1    1    3
p = 2           10   8     1    1    1    3
p = 4           10   8     1    1    3    3
Note: IC and BIC denote the information criteria of Bai and Ng (2002), ER and GR are the estimators of Ahn and Horenstein (2013), ED is the estimator of Onatski (2010), and PA is the deflated parallel analysis suggested by Dobriban and Owen (2019). Estimators applied to $P_D(\ddot W(\hat\beta, \hat\gamma))$. The initial estimator for β and γ uses R = 10.

Figure 2: Largest Singular Values in Descending Order (panels: Specification 1, Specification 2, Specification 3; horizontal axis: Factor Number; vertical axis: Singular Value)
The twenty largest singular values for $P_D(\ddot W(\hat\beta, \hat\gamma))/\sqrt{NT}$ are denoted by non-filled dots and those for permuted data by filled dots. The values for permuted data are based on 199 replications. The initial estimator for β and γ uses R = 10.
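The permutation benchmark behind Figure 2 can be sketched as follows. The sketch assumes NumPy 1.20 or later for `Generator.permuted` and uses hypothetical function names. Each column of the matrix is shuffled independently and the singular values are recomputed. The per-rank maximum over the permutations serves as the benchmark, with 199 permutations by default as in the text. Like the figure, this omits the deflation step of Dobriban and Owen (2019), so `count_above` is only an undeflated variant of parallel analysis.

```python
import numpy as np


def permuted_singular_values(W, n_perm=199, n_values=20, rng=None):
    """Singular values of W and their column-permutation benchmark.

    Shuffling each column independently destroys the common factor structure
    across rows while keeping every column's distribution. Both outputs are
    scaled by 1/sqrt(N*T), as in Figure 2.
    """
    rng = np.random.default_rng(rng)
    N, T = W.shape
    scale = np.sqrt(N * T)
    sv_data = np.linalg.svd(W, compute_uv=False)[:n_values] / scale
    bench = np.zeros_like(sv_data)
    for _ in range(n_perm):
        Wp = rng.permuted(W, axis=0)                  # shuffle within each column
        sv = np.linalg.svd(Wp, compute_uv=False)[:n_values] / scale
        bench = np.maximum(bench, sv)                 # per-rank maximum
    return sv_data, bench


def count_above(sv_data, bench):
    """Number of leading singular values that exceed their permutation benchmark."""
    above = sv_data > bench
    return len(above) if above.all() else int(np.argmin(above))
```

Plotting `sv_data` and `bench` against the factor number gives a figure of the same kind as Figure 2.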
As shown by Moon and Weidner (2015), once all relevant common factors are controlled for, including additional redundant factors should affect only the precision of the IFE estimator. Tables 9 and 10 summarize the results. With respect to the different bandwidth choices, we find that the results of Interactive are very robust to all combinations of bandwidths, as indicated by the narrow intervals reported in the table. In contrast, the estimates of Within are more sensitive. For p = 4, we find long-run effects between 21.571 % and 32.289 %. With respect to the number of factors, we find that after controlling for more than three factors, the estimated persistence of the GDP process starts declining. The same pattern was also noted in the empirical illustration of Moon and Weidner (2015). The authors argue that the dynamic specification might be misspecified in the sense that the lagged outcome variables simply capture time-serial correlation in the idiosyncratic error term instead of true state dependence. Because the factor structure also captures time-serial correlation, this might indicate that there is no true state dependence. In contrast, the coefficients of democratization become larger.
11. This estimator was also used by Chen, Chernozhukov, and Fernández-Val (2019) for the same illustration, but on a balanced subset of the data. The authors also proposed a split-panel jackknife bias correction to reduce the many-moment bias of the Arellano-Bond estimator (see Newey and Smith 2004).
12. All covariance estimators use a degrees-of-freedom adjustment to improve their finite sample properties.

Table 8: Effect of Democracy on Logarithmic GDP per Capita (× 100)
                               Within     AB         HHK        Interactive
                                                                R = 1      R = 2      R = 3
Specification 1 - p = 1
Democracy                      1.051      0.959      0.781      0.745      0.755      0.809
                               (0.293)    (0.477)    (0.455)    (0.329)    (0.287)    (0.305)
Persistence of GDP process     0.983      0.946      0.938      0.960      0.973      0.967
                               (0.006)    (0.009)    (0.011)    (0.007)    (0.006)    (0.007)
Long-run effect of democracy   60.489     17.608     12.644     18.644     27.690     24.546
                               (28.073)   (10.609)   (8.282)    (9.460)    (12.250)   (11.131)
Specification 2 - p = 2
Democracy                      0.671      0.797      0.582      0.488      0.554      0.559
                               (0.247)    (0.417)    (0.387)    (0.292)    (0.263)    (0.279)
Persistence of GDP process     0.975      0.946      0.941      0.956      0.968      0.966
                               (0.005)    (0.009)    (0.010)    (0.007)    (0.005)    (0.006)
Long-run effect of democracy   26.513     14.882     9.929      11.030     17.258     16.557
                               (12.026)   (9.152)    (7.258)    (7.174)    (8.881)    (9.005)
Specification 3 - p = 4
Democracy                      0.828      0.875      1.178      0.513      0.600      0.622
                               (0.225)    (0.374)    (0.370)    (0.267)    (0.259)    (0.249)
Persistence of GDP process     0.972      0.947      0.953      0.958      0.964      0.966
                               (0.005)    (0.009)    (0.009)    (0.006)    (0.006)    (0.005)
Long-run effect of democracy   29.262     16.448     25.032     12.226     16.749     18.264
                               (10.281)   (8.436)    (10.581)   (6.780)    (7.956)    (8.030)
Note:
Within, AB, HHK, and Interactive denote the bias-corrected fixed effects estimator, the Arellano-Bond estimator, the Hahn-Hausman-Kuersteiner estimator, and the IFE estimator. Standard errors in parentheses are heteroskedasticity robust and clustered at the country level. Within and Interactive use bandwidths L = 5 and M = 3 for the estimation of the asymptotic biases. The results of AB and HHK are taken from table 2 in Acemoglu et al. (2019).

Table 9: Sensitivity to Different Bandwidth Choices
                               Within             Interactive
                                                  R = 1              R = 2              R = 3
Specification 1 - p = 1
Democracy                      [0.986; 1.057]     [0.744; 0.775]     [0.755; 0.806]     [0.806; 0.867]
Persistence of GDP process     [0.974; 0.984]     [0.958; 0.960]     [0.972; 0.973]     [0.966; 0.968]
Long-run effect of democracy   [38.127; 64.280]   [18.551; 18.872]   [27.336; 29.065]   [24.368; 25.713]
Specification 2 - p = 2
Democracy                      [0.633; 0.674]     [0.488; 0.522]     [0.554; 0.609]     [0.548; 0.583]
Persistence of GDP process     [0.968; 0.976]     [0.954; 0.956]     [0.967; 0.968]     [0.966; 0.967]
Long-run effect of democracy   [19.509; 27.723]   [10.987; 11.369]   [17.231; 18.721]   [16.158; 17.150]
Specification 3 - p = 4
Democracy                      [0.773; 0.849]     [0.504; 0.547]     [0.596; 0.635]     [0.620; 0.661]
Persistence of GDP process     [0.964; 0.974]     [0.957; 0.958]     [0.964; 0.964]     [0.965; 0.966]
Long-run effect of democracy   [21.571; 32.289]   [11.945; 12.672]   [16.548; 17.715]   [18.113; 19.352]
Note:
Effect of democracy on logarithmic GDP per capita (× 100). Within and Interactive denote the bias-corrected fixed effects estimator and the IFE estimator. The intervals denote the ranges of all estimates across different combinations of $L \in \{\cdot, \ldots, \cdot\}$ and $M \in \{\cdot, \ldots, \cdot\}$.

Finally, we consider an additional specification without predetermined regressors ($p = 0$). Again, we estimate the number of factors from a pure factor model, where the initial estimate is based on $R = 10$. The estimates are identical to those for $p = 4$ and provide further support for our preferred choice of $R = 3$. The corresponding estimate of democratization is -1.251 % (standard error = 1.286 %), which is in line with Barro (1996), who reports a negative and/or insignificant effect of democracy on growth.

To sum up, we find some additional support for the hypothesis of Acemoglu et al. (2019): democracy does cause growth. Using the IFE estimator to control for time-varying unobserved heterogeneity, we obtain results that are qualitatively similar to those of the authors. If we compare HHK to Interactive with $R = 3$ in the authors' preferred specification $p = 4$, we find that the short-run effect of democratization is roughly halved. However, the corresponding long-run effect of 18.264 % is still close to the 20 % reported by Acemoglu et al.
(2019).

Table 10: Sensitivity to the Number of Factors

                              R = 4    R = 5    R = 6    R = 7    R = 8    R = 9    R = 10
Specification 1 - p = 1
Democracy                     0.417    0.160    1.101    1.553    1.615    1.619    2.197
                              (0.511)  (0.486)  (0.516)  (0.564)  (0.602)  (0.634)  (0.718)
Persistence of GDP process    0.848    0.854    0.754    0.702    0.600    0.554    0.453
                              (0.018)  (0.020)  (0.030)  (0.035)  (0.034)  (0.039)  (0.040)
Long-run effect of democracy  2.749    1.095    4.476    5.220    4.037    3.635    4.019
                              (3.363)  (3.339)  (2.140)  (1.939)  (1.551)  (1.479)  (1.394)
Specification 2 - p = 2
Democracy                     0.388    0.868    1.178    1.424    1.905    1.150    1.418
                              (0.469)  (0.446)  (0.511)  (0.565)  (0.614)  (0.651)  (0.705)
Persistence of GDP process    0.810    0.689    0.586    0.485    0.369    0.342    0.247
                              (0.019)  (0.031)  (0.036)  (0.031)  (0.038)  (0.042)  (0.056)
Long-run effect of democracy  2.041    2.795    2.845    2.767    3.019    1.747    1.882
                              (2.478)  (1.458)  (1.266)  (1.129)  (1.013)  (1.018)  (0.968)
Specification 3 - p = 4
Democracy                     0.763    1.091    1.452    1.474    1.513    1.474    0.476
                              (0.455)  (0.446)  (0.534)  (0.567)  (0.557)  (0.596)  (0.589)
Persistence of GDP process    0.628    0.593    0.380    0.213    0.174    0.187    -0.202
                              (0.025)  (0.038)  (0.049)  (0.050)  (0.056)  (0.092)  (0.075)
Long-run effect of democracy  2.053    2.683    2.341    1.874    1.831    1.814    0.396
                              (1.211)  (1.124)  (0.883)  (0.729)  (0.692)  (0.766)  (0.493)
Note:
Effect of democracy on logarithmic GDP per capita (× 100). Estimates obtained by the interactive fixed effects estimator for R ∈ {4, …, 10}. Standard errors in parentheses are heteroskedasticity-robust and clustered at the country level. The asymptotic biases are estimated with bandwidths L = 5 and M = 3.

Further Extensions
Although we focused on the IFE estimator of Bai (2009), we want to point out two natural extensions of our findings. First, in the presence of regressors that are endogenous with respect to the idiosyncratic error term, Moon and Weidner (2017) and Moon, Shum, and Weidner (2018) suggest a minimum distance estimator with interactive fixed effects in the spirit of Chernozhukov and Hansen (2006, 2008). Second, because the objective function of the IFE estimator is generally non-convex and may have multiple local minima, Moon and Weidner (2019) suggest an alternative estimator that avoids this potentially difficult optimization problem by minimizing a convex objective function instead.
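Both extensions embed the least-squares problem that defines the IFE estimator, in which $\beta$, $\Lambda$, and $F$ are chosen to minimize $\sum_{i,t} (y_{it} - x_{it}'\beta - \lambda_i' f_t)^2$. To illustrate why this problem is non-convex yet still tractable, the following Python/NumPy sketch alternates between principal components of the residual matrix and least squares for $\beta$, in the spirit of the iterative procedure of Bai (2009). It is a minimal sketch under stated assumptions: a balanced panel (so no data augmentation step), a known number of factors $R$, no additive effects, and no bias correction. The function name `ife` and its arguments are hypothetical and not taken from any package.

```python
import numpy as np


def ife(Y, X, R, beta0=None, tol=1e-10, max_iter=1000):
    """Sketch of an iterative IFE estimator for a *balanced* panel (assumption).

    Y: (N, T) outcomes; X: (K, N, T) regressors; R: known number of factors.
    Returns beta (K,), loadings Lam (N, R), and factors F (T, R) with F'F/T = I.
    """
    K, N, T = X.shape
    beta = np.zeros(K) if beta0 is None else np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        # Step (i): factors = leading R right singular vectors of the residual matrix
        W = Y - np.tensordot(beta, X, axes=1)
        _, _, Vt = np.linalg.svd(W, full_matrices=False)
        F = Vt[:R].T                      # (T, R), orthonormal columns
        M_F = np.eye(T) - F @ F.T         # projects out the estimated factor space
        # Step (ii): least squares for beta after projecting out the factors
        XM = X @ M_F                      # each X_k M_F, shape (K, N, T)
        A = np.einsum("knt,lnt->kl", XM, X)
        b = np.einsum("knt,nt->k", XM, Y)
        beta_new = np.linalg.solve(A, b)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    # Final factors and loadings at the last beta
    W = Y - np.tensordot(beta, X, axes=1)
    _, _, Vt = np.linalg.svd(W, full_matrices=False)
    F = np.sqrt(T) * Vt[:R].T             # normalization F'F / T = I_R
    Lam = W @ F / T                       # least-squares loadings given F
    return beta, Lam, F


if __name__ == "__main__":
    # Simulated example: x_it loads on the same factor structure as the error
    rng = np.random.default_rng(0)
    N, T, R = 100, 50, 2
    L = rng.normal(size=(N, R)) @ rng.normal(size=(R, T))
    X = (L + rng.normal(size=(N, T)))[np.newaxis]
    Y = 1.5 * X[0] + L + rng.normal(size=(N, T))
    beta_hat, _, _ = ife(Y, X, R)
    print(beta_hat)
```

In the simulated example, pooled OLS would be biased because the regressor is correlated with $\lambda_i' f_t$, whereas the sketch should land near the true value 1.5. Because the objective is non-convex, the iteration can stop at a local minimum; comparing the objective across several starting values `beta0` is a simple safeguard.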
Extension 1: Minimum Distance Estimator
Suppose that $x_{it}$ can be further decomposed into $K_1$ endogenous and $K_2$ exogenous regressors such that $K = K_1 + K_2$. To avoid ambiguity, we label endogenous and exogenous regressors with the superscripts end and exo, respectively. Further, let $z_{it} = [z_{it,1}, \ldots, z_{it,M}]'$ be a vector of excluded exogenous instruments with $M \geq K_1$. Moon and Weidner (2017) suggest the following minimum distance estimator. In a first step, an estimator for $\beta^{end}$ is obtained by

$$\hat{\beta}^{end} = \arg\min_{\beta^{end} \in \mathbb{R}^{K_1}} \hat{\pi}(\beta^{end})' \, \Sigma \, \hat{\pi}(\beta^{end}), \qquad (22)$$

where $\hat{\pi}(\beta^{end})$ is the IFE estimator of $\pi$ in

$$y_{it} - x_{it}^{end\prime} \beta^{end} = x_{it}^{exo\prime} \beta^{exo} + z_{it}' \pi + \lambda_i' f_t + e_{it} \qquad (23)$$

and $\Sigma$ is a positive definite $M \times M$ weighting matrix. At the true value of $\beta^{end}$, $\pi$ is zero given the exclusion restriction on $z_{it}$, so minimizing the quadratic form in (22) selects the value of $\beta^{end}$ under which the instruments have no remaining explanatory power. In a second step, $\hat{\beta}^{exo}$ is the IFE estimator of $\beta^{exo}$ in

$$y_{it} - x_{it}^{end\prime} \hat{\beta}^{end} = x_{it}^{exo\prime} \beta^{exo} + \lambda_i' f_t + e_{it}. \qquad (24)$$

The properties of the minimum distance estimator are studied in Moon, Shum, and Weidner (2018), who extend the random coefficient demand model of Berry, Levinsohn, and Pakes (1995) with interactive fixed effects to account for unobserved product-market-specific heterogeneity, such as advertising. Under assumptions very similar to those in Moon and Weidner (2017), the authors show consistency and derive the asymptotic distribution of the minimum distance estimator. Because their estimator embeds the IFE estimator, the same algorithms and estimators studied in this article can be applied. Further, Lee, Moon, and Weidner (2012) use the same estimator to account for measurement error in the dependent variable in dynamic interactive fixed effects models.

Extension 2: Nuclear Norm Minimizing Estimator

Moon and Weidner (2019) show that the rank constraint imposed on the factor structure leads to a non-convex optimization problem. The authors suggest an alternative estimator based on a convex relaxation of this constraint.
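To see why the relaxation is convex, consider a single regressor: the map $\beta \mapsto \|Y - \beta X\|_*$, the nuclear norm (sum of singular values) of a matrix that is affine in $\beta$, is convex, so any local minimizer is a global one. The Python/NumPy sketch below relies only on that property. It assumes a balanced panel with $N \times T$ matrices $Y$ and $X$, takes the residual matrix to be $Y - \beta X$, and ignores the operator $P_D$ that appears in the estimator of Moon and Weidner (2019); golden-section search stands in for a general convex solver and only covers $K = 1$. The function names are hypothetical.

```python
import numpy as np


def nuclear_norm(A):
    """Nuclear norm: the sum of all singular values of A."""
    return np.linalg.svd(A, compute_uv=False).sum()


def nuclear_norm_beta(Y, X, lo=-10.0, hi=10.0, tol=1e-8):
    """Sketch: minimize ||Y - beta * X||_* over a scalar beta in [lo, hi].

    The objective is convex in beta, so golden-section search converges to a
    global minimizer as long as one lies inside [lo, hi] (assumption).
    """
    g = (np.sqrt(5.0) - 1.0) / 2.0            # inverse golden ratio, about 0.618
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = nuclear_norm(Y - c * X), nuclear_norm(Y - d * X)
    while b - a > tol:
        if fc < fd:                           # a minimizer lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = nuclear_norm(Y - c * X)
        else:                                 # a minimizer lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = nuclear_norm(Y - d * X)
    return 0.5 * (a + b)
```

Because convexity rules out spurious local minima, no multiple starting values are needed here, unlike for the rank-constrained least-squares problem. The price is the slower convergence rate discussed in the text, which is why such an estimate would only serve as the starting value of the post nuclear norm estimation routine.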
Formally, the nuclear norm minimizing estimator proposed by Moon and Weidner (2019) is

$$\hat{\beta}^{\star} = \arg\min_{\beta \in \mathbb{R}^{K}} \sum_{r=1}^{\min(N,T)} \sigma_r\left[P_D\left(W(\beta)\right)\right], \qquad (25)$$

where $\sigma_r[\cdot]$ denotes the $r$-th largest singular value, so that the objective is the nuclear norm of $P_D(W(\beta))$. Moon and Weidner (2019) show consistency of this estimator, but only at the rate $\sqrt{\min(N,T)}$, which is slower than the $\sqrt{NT}$ rate of the IFE estimator. As a consequence, the convex relaxation leads to a loss of efficiency compared to the IFE estimator.

To recover the properties of the IFE estimator, Moon and Weidner (2019) suggest estimating the number of factors from $P_D(W(\hat{\beta}^{\star}))$ and then applying an iterative post-estimation routine. After a finite number of iterations, the resulting estimator has the same limiting distribution as the IFE estimator. The post-estimation routine can be summarized as follows:

Definition.
Post nuclear norm estimation. Given $\hat{\beta}^{\star}$ and $R$, initialize $\hat{\beta} = \hat{\beta}^{\star}$ and repeat the following steps a finite number of times:
Step 1: Compute $\widehat{F}(\hat{\beta})$ and $\widehat{\Lambda}(\hat{\beta})$ from $\widetilde{W}(\hat{\beta})$.
Step 2: Compute $\breve{y}$ and $\breve{x}_k$ for all $k \in \{1, \ldots, K\}$.
Step 3: Update $\hat{\beta} = (\breve{X}'\breve{X})^{-1}\breve{X}'\breve{y}$, where $\breve{X} = [\breve{x}_1, \ldots, \breve{x}_K]$.

The assumption that unobserved heterogeneity is constant over time is often very restrictive. Especially in panels that cover a long time horizon, such as macroeconomic panels of countries, it is unlikely that a global shock affects all countries equally. Interactive fixed effects estimators offer researchers new possibilities to account for this more general form of heterogeneity in their analyses (see among others Holtz-Eakin, Newey, and Rosen 1988; Pesaran 2006; Bai 2009). However, these panels are often naturally unbalanced, which requires an additional data augmentation step for the estimator of Bai (2009) (see the Appendix of Bai 2009 and Bai, Liao, and Yang 2015).

In this article, we analyzed the finite sample behavior of Bai (2009)'s interactive fixed effects estimator in the presence of randomly missing data. Simulation experiments confirmed that the inferential theory derived by Bai (2009) and Moon and Weidner (2017) for balanced data also provides a reasonable approximation for the unbalanced case. However, we also found that the finite sample performance can be affected by the fraction and pattern of missing data. Future research could address this issue and provide an inferential theory that takes the additional uncertainty induced by data augmentation into account. This might help to improve the finite sample behavior of Bai (2009)'s estimator in the presence of randomly missing data.

References
Acemoglu, Daron, Suresh Naidu, Pascual Restrepo, and James A. Robinson. 2019. "Democracy Does Cause Growth." Journal of Political Economy 127 (1): 47–100.
Ahn, Seung C., and Alex R. Horenstein. 2013. "Eigenvalue Ratio Test for the Number of Factors." Econometrica 81 (3): 1203–1227.
Alessi, Lucia, Matteo Barigozzi, and Marco Capasso. 2010. "Improved penalization for determining the number of factors in approximate factor models." Statistics & Probability Letters 80 (23): 1806–1813.
Anderson, T.W., and Cheng Hsiao. 1982. "Formulation and estimation of dynamic models using panel data." Journal of Econometrics 18 (1): 47–82.
Arellano, Manuel, and Stephen Bond. 1991. "Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations." The Review of Economic Studies 58 (2): 277–297.
Bai, Jushan. 2003. "Inferential Theory for Factor Models of Large Dimensions." Econometrica 71 (1): 135–171.
Bai, Jushan. 2009. "Panel Data Models with Interactive Fixed Effects." Econometrica 77 (4): 1229–1279.
Bai, Jushan, and Yuan Liao. 2017. "Inferences in panel data with interactive effects using large covariance matrices." Journal of Econometrics 200 (1): 59–78.
Bai, Jushan, Yuan Liao, and Jisheng Yang. 2015. "Unbalanced Panel Data Models with Interactive Effects." In The Oxford Handbook of Panel Data,
Econometrica 70 (1): 191–221.
2013. "Principal components estimation and identification of static factors." Journal of Econometrics 176 (1): 18–29.
Barro, Robert J. 1996. "Democracy and Growth." Journal of Economic Growth
Econometrica 63 (4): 841–890.
Bonhomme, Stéphane, and Elena Manresa. 2015. "Grouped Patterns of Heterogeneity in Panel Data." Econometrica 83 (3): 1147–1184.
Buja, Andreas, and Nermin Eyuboglu. 1992. "Remarks on Parallel Analysis." Multivariate Behavioral Research 27 (4): 509–540.
Chamberlain, Gary. 1982. "Multivariate regression models for panel data." Journal of Econometrics 18 (1): 5–46.
Chamberlain, Gary. 1984. "Chapter 22 Panel data." In Handbook of Econometrics, 2: 1247–1318.
Chen, Mingli, Iván Fernández-Val, and Martin Weidner. 2019. "Nonlinear Factor Models for Network and Panel Data." arXiv preprint arXiv:1412.5647.
Chen, Shuowen, Victor Chernozhukov, and Iván Fernández-Val. 2019. "Mastering Panel Metrics: Causal Impact of Democracy on Growth." AEA Papers and Proceedings
Journal of Econometrics 132 (2): 491–525.
Chernozhukov, Victor, and Christian Hansen. 2008. "Instrumental variable quantile regression: A robust inference approach." Journal of Econometrics 142 (1): 379–398.
Choi, In, and Hanbat Jeong. 2019. "Model selection for factor analysis: Some new criteria and performance comparisons." Econometric Reviews 38 (6): 577–596.
Czarnowske, Daniel, and Amrei Stammann. 2019. "Binary Choice Models with High-Dimensional Individual and Time Fixed Effects." arXiv preprint arXiv:1904.04217.
Dobriban, Edgar. Forthcoming. "Permutation methods for factor analysis and PCA." Annals of Statistics.
Dobriban, Edgar, and Art B. Owen. 2019. "Deterministic parallel analysis: an improved method for selecting factors and principal components." Journal of the Royal Statistical Society: Series B (Statistical Methodology) 81 (1): 163–183.
Fernández-Val, Iván, and Martin Weidner. 2018. "Fixed Effects Estimation of Large-T Panel Data Models." Annual Review of Economics 10 (1): 109–138.
Fong, David Chin-Lung, and Michael Saunders. 2011. "LSMR: An Iterative Algorithm for Sparse Least-Squares Problems." SIAM Journal on Scientific Computing 33 (5): 2950–2971.
Gagliardini, Patrick, Elisa Ossola, and Olivier Scaillet. 2019. "A diagnostic criterion for approximate factor structure." Journal of Econometrics 212 (2): 503–521.
Gaure, Simen. 2013. "OLS with multiple high dimensional category variables." Computational Statistics & Data Analysis
Stata Journal 10 (4): 628–649.
Hahn, Jinyong, Jerry Hausman, and Guido Kuersteiner. 2004. "Estimation with weak instruments: Accuracy of higher-order bias and MSE approximations." The Econometrics Journal
Journal of the American Statistical Association 102 (478): 603–617.
Halperin, Israel. 1962. "The product of projection operators." Acta Sci. Math. (Szeged)
Econometrica 56 (6): 1371–1395.
Lee, Nayoung, Hyungsik Roger Moon, and Martin Weidner. 2012. "Analysis of interactive fixed effects dynamic linear panel regression with measurement error." Economics Letters 117 (1): 239–242.
Moon, Hyungsik Roger, Matthew Shum, and Martin Weidner. 2018. "Estimation of random coefficients logit demand models with interactive fixed effects." Journal of Econometrics 206 (2): 613–644.
Moon, Hyungsik Roger, and Martin Weidner. 2015. "Linear Regression for Panel With Unknown Number of Factors as Interactive Fixed Effects." Econometrica 83 (4): 1543–1579.
Moon, Hyungsik Roger, and Martin Weidner. 2017. "Dynamic Linear Panel Regression Models with Interactive Fixed Effects." Econometric Theory 33 (1): 158–195.
Moon, Hyungsik Roger, and Martin Weidner. 2019. "Nuclear Norm Regularized Estimation of Panel Regression Models." arXiv preprint arXiv:1810.10987.
Mundlak, Yair. 1978. "On the Pooling of Time Series and Cross Section Data." Econometrica 46 (1): 69–85.
Newey, Whitney K., and Richard J. Smith. 2004. "Higher Order Properties of GMM and Generalized Empirical Likelihood Estimators." Econometrica 72 (1): 219–255.
Newey, Whitney K., and Kenneth D. West. 1987. "A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix." Econometrica 55 (3): 703–708.
Newey, Whitney K., and Kenneth D. West. 1994. "Automatic Lag Selection in Covariance Matrix Estimation." The Review of Economic Studies 61 (4): 631–653.
Nickell, Stephen. 1981. "Biases in Dynamic Models with Fixed Effects." Econometrica 49 (6): 1417–1426.
Onatski, Alexei. 2010. "Determining the Number of Factors from Empirical Distribution of Eigenvalues." The Review of Economics and Statistics 92 (4): 1004–1016.
Pesaran, M. Hashem. 2006. "Estimation and Inference in Large Heterogeneous Panels with a Multifactor Error Structure." Econometrica 74 (4): 967–1012.
R Core Team. 2019. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.
Schwert, G. William. 1989. "Tests for Unit Roots: A Monte Carlo Investigation." Journal of Business & Economic Statistics
arXiv preprint arXiv:1707.01815.
Stock, James H., and Mark W. Watson. 1998. "Diffusion indexes." NBER Working Paper No. 6702.
Stock, James H., and Mark W. Watson. 2002. "Macroeconomic forecasting using diffusion indexes." Journal of Business & Economic Statistics 20 (2): 147–162.
White, Halbert. 1980. "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity."