Inference in Unbalanced Panel Data Models with Interactive Fixed Effects

Abstract

In this article, we study the limiting behavior of Bai (2009)'s interactive fixed effects estimator in the presence of randomly missing data. In extensive simulation experiments, we show that the inferential theory derived by Bai (2009) and Moon and Weidner (2017) approximates the behavior of the estimator fairly well. However, we find that the fraction and pattern of randomly missing data affect the performance of the estimator. Additionally, we use the interactive fixed effects estimator to reassess the baseline analysis of Acemoglu et al. (2019). Allowing for a more general form of unobserved heterogeneity than the authors, we confirm significant effects of democratization on growth.



Daniel Czarnowske ∗ Amrei Stammann †

April 8, 2020

JEL Classification: C01, C13, C23, C38, C55, O10

Keywords: Economic Development, Interactive Fixed Effects, Factor Models, Model Selection, Principal Components, Unbalanced Panel Data

∗ Heinrich-Heine-University Duesseldorf, Universitätsstr. 1, 40225 Düsseldorf, Germany, phone: +49 21181-10620, e-mail: [email protected]
† Heinrich-Heine-University Duesseldorf, Universitätsstr. 1, 40225 Düsseldorf, Germany, phone: +49 21181-15307, e-mail: [email protected]

Introduction

Economists are often concerned that unobserved heterogeneity is correlated with some variables of interest and thus leads to inconsistent estimates of the corresponding common parameters $\boldsymbol{\beta}$. If panel data are available, so-called fixed effects models are frequently used to address this issue. One critical assumption of these models is that the unobserved heterogeneity is additively separable in both panel dimensions. For instance, if a panel consists of $N$ individuals observed for $T$ time periods, the researcher has to assume that individual- and/or time-specific effects enter the model additively. If this is not the case, for instance because both effects are multiplicatively interacted, conventional fixed effects models cannot solve the underlying endogeneity problem. Exactly this concern motivates so-called interactive fixed effects (IFE) estimators that model the time-varying unobserved heterogeneity as a low-rank factor structure $\boldsymbol{\lambda}_i' \mathbf{f}_t$, where $\boldsymbol{\lambda}_i$ and $\mathbf{f}_t$ are individual- and time-specific effects, respectively (see, among others, Holtz-Eakin, Newey, and Rosen 1988; Pesaran 2006; Bai 2009). Throughout this article, we refer to $\boldsymbol{\lambda}_i$ as factor loadings and $\mathbf{f}_t$ as common factors.

Holtz-Eakin, Newey, and Rosen (1988) propose an estimator for panels with large $N$ but small $T$ which is based on a quasi-differencing approach similar to Anderson and Hsiao (1982). First, they eliminate the factor loadings from the estimation equation and then estimate the remaining common factors and parameters using lagged covariates as instrumental variables. Although this estimator is consistent under fixed-$T$ asymptotics, it is well known that for large $T$ the number of instruments and parameters leads to biased estimates (see Newey and Smith 2004). Recently, the literature has considered estimators that require $N$ and $T$ to be sufficiently large.
Pesaran (2006) suggests a common correlated effects (CCE) estimator in the spirit of Mundlak (1978) and Chamberlain (1982, 1984), which uses cross-sectional averages of the dependent variable and the regressors to control for the unobserved common factors. His estimator is at least $\sqrt{N}$ consistent without the need to know the true rank of the factor structure or to impose strong factor assumptions as in Bai (2009) and Moon and Weidner (2015, 2017). However, in order to use cross-sectional averages as proxy variables for the unobserved common factors, we require some parametric assumptions on the joint probability distributions of the dependent variable and the covariates. Bai (2009) suggests a different estimator that treats the common factors and factor loadings as nuisance parameters to be estimated. His estimator is closely related to Bai (2003)'s principal components estimator

1. Bonhomme and Manresa (2015) suggest a related but different approach. Instead of imposing rank restrictions on the time-varying unobserved heterogeneity, they use a clustering approach to assign each cross-sectional unit to a specific group, where the corresponding group-specific heterogeneity is allowed to vary over time.
2. The factor structure can be roughly interpreted as a series approximation of the time-varying unobserved heterogeneity.
3. For a detailed discussion of the different interactive fixed effects estimators, we refer the reader to Bai (2009) and Moon and Weidner (2015, 2017).

$\sqrt{NT}$ consistency irrespective of cross-sectional and/or time-serial dependence in the idiosyncratic error term. However, the presence of cross-sectional and/or time-serial dependence leads to an asymptotic bias in the limiting distribution of the estimator that can be corrected (see Bai 2009). Moon and Weidner (2017) derive an additional bias correction for the Nickell (1981) bias stemming from the inclusion of predetermined and weakly exogenous regressors. Because the true number of factors is usually unknown, Moon and Weidner (2015) show that under certain assumptions, and as long as the number of factors used to estimate $\boldsymbol{\beta}$ is larger than the true number, the estimator may have the same limiting distribution as shown by Bai (2009) but remains at least $\sqrt{\min(N, T)}$ consistent. There may therefore be a loss of efficiency due to the inclusion of too many irrelevant factors. However, given a consistent estimator for $\boldsymbol{\beta}$, the number of factors can be estimated using estimators for pure factor models (see, among others, Buja and Eyuboglu 1992; Bai and Ng 2002; Hallin and Liška 2007; Alessi, Barigozzi, and Capasso 2010; Onatski 2010; Ahn and Horenstein 2013; Dobriban and Owen 2019). A recent comparison of some popular estimators is given in Choi and Jeong (2019).
For the IFE estimator such a comparison does not exist so far.

Additionally, it is often the case that some of the observations in a panel are missing. One frequent reason is attrition. For instance, some individuals drop out of a panel because they move or leave the participating household. In some of these cases, those individuals are replaced. In macroeconomic panels, it also occurs that some countries are divided into several independent countries. Further, some survey designs replace individuals because of non-response. All these cases lead to very different patterns of missing data that usually, in the absence of sample selection, do not affect the properties of the estimators (see Fernández-Val and Weidner 2018). However, in the presence of missing data, the principal component estimator of Bai (2009) requires an additional data augmentation step based on the EM algorithm of Stock and Watson (1998, 2002) (see the Appendix of Bai 2009 and Bai, Liao, and Yang 2015). Bai, Liao, and Yang (2015) show consistency of the EM-type estimator in extensive simulation studies, but do not further investigate the limiting behavior of the estimator.

Our article makes the following contributions. First, we extend the work of Bai, Liao, and Yang (2015) and show that the limiting behavior of their suggested estimator can be fairly well approximated by the inferential theory of Bai (2009) and Moon and Weidner (2017) derived for balanced panels. In extensive simulation experiments, we show that the fraction and pattern of missing data may affect the properties of the estimator, which is contrary to other popular unobserved effects estimators like the conventional within estimator (see Fernández-Val and Weidner 2018). Further, we present some algorithms that reduce the computational costs in the presence of missing data.
Second, because the limiting theory of Bai (2009) and Moon and Weidner (2017) assumes that the true number of factors is known, we additionally investigate the finite sample performance of some frequently used estimators for the number of factors: Bai and Ng (2002), Onatski (2010), Ahn and Horenstein (2013), and Dobriban and Owen (2019). Although we find that all estimators perform well for balanced data and different configurations of the idiosyncratic error term, their accuracy varies substantially with different fractions and patterns of randomly missing data. Third, we contribute to the growth literature by reassessing the baseline analysis of Acemoglu et al. (2019) using the IFE estimator of Bai (2009). We qualitatively confirm their main results and find significant effects of democratization on growth. In their preferred specification, we estimate a long-run effect of 18 %, which is close to the 20 % reported by the authors, but the instantaneous effect of democratization almost halves to 0.6 %.

The article is organized as follows. We introduce the model and the corresponding estimator in Section 2. We briefly review some estimators for the number of factors in Section 3. We provide results of extensive simulation experiments in Section 4. We reassess Acemoglu et al. (2019) using the IFE estimator in Section 5. We briefly discuss the handling of endogenous regressors as in Moon and Weidner (2017) and Moon, Shum, and Weidner (2018) and consider an alternative estimator suggested by Moon and Weidner (2019) in Section 6. Finally, we conclude in Section 7.

Throughout this article, we follow conventional notation: scalars are represented in standard type, vectors and matrices in boldface, and all vectors are column vectors. Further, let $\mathbf{A}$ be an $N \times N$ matrix and $\mathbf{B}$ be an $N \times T$ matrix.
We denote $\mu_r[\mathbf{A}]$ as the $r$-th largest eigenvalue of $\mathbf{A}$, we refer to $[\mathbf{B}]_{ij}$ as the $ij$-th element of $\mathbf{B}$, and we define $\mathbf{M}_{\mathbf{B}} = \mathbf{I}_N - \mathbf{P}_{\mathbf{B}}$, where $\mathbf{I}_N$ is the $N \times N$ identity matrix, $\mathbf{P}_{\mathbf{B}} = \mathbf{B}(\mathbf{B}'\mathbf{B})^{\dagger}\mathbf{B}'$, and $(\cdot)^{\dagger}$ is the Moore-Penrose pseudoinverse.

In this article, we analyze the following unobserved effects model:

$$ y_{it} = \mathbf{x}_{it}'\boldsymbol{\beta} + \boldsymbol{\lambda}_i'\mathbf{f}_t + e_{it}, \tag{1} $$

where $i$ and $t$ are individual- and time-specific indexes, $\mathbf{x}_{it} = [x_{it,1}, \dots, x_{it,K}]'$ is a vector of $K$ explanatory variables, $\boldsymbol{\beta}$ is the corresponding vector of common parameters, and $e_{it}$ is an idiosyncratic error term. To allow for the possibility of missing data, we introduce $\mathcal{D}$ as a subset of observed index pairs such that $\mathcal{D} \subseteq \{(i, t) \mid i \in \{1, \dots, N\} \wedge t \in \{1, \dots, T\}\}$, where $n = |\mathcal{D}|$ is the sample size and $N$ and $T$ are the number of individuals and time periods, respectively. Further, the unobserved effects are expressed as a factor structure of rank $R \ll \min(N, T)$, where $\boldsymbol{\lambda}_i = [\lambda_{i1}, \dots, \lambda_{iR}]'$ is a vector of factor loadings and $\mathbf{f}_t = [f_{t1}, \dots, f_{tR}]'$ is a vector of common factors. Note that (1) collapses to the conventional additive unobserved effects model if $\boldsymbol{\lambda}_i = [\alpha_i, 1]'$ and $\mathbf{f}_t = [1, \delta_t]'$. In contrast to this conventional model, the factor structure can capture more general patterns of heterogeneity. For instance, temporal shocks induced by financial crises might affect each country's output differently (see Bai (2009) for some additional motivating examples).

Following Bai (2009) and Moon and Weidner (2017), we treat $\boldsymbol{\lambda}_i$ and $\mathbf{f}_t$ as parameters to be estimated, and allow the common factors and loadings to be arbitrarily related with the regressors.
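The special case mentioned above, in which (1) collapses to the additive two-way model, can be checked numerically. The following is only a minimal sketch: the dimensions, the random draws, and all variable names are illustrative assumptions and do not come from the article. It also shows one possible way to encode the index set $\mathcal{D}$ as a boolean mask.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 6, 4                                   # illustrative panel dimensions (assumed)

alpha = rng.normal(size=N)                    # individual effects alpha_i
delta = rng.normal(size=T)                    # time effects delta_t

# Rows of Lam are lambda_i' = [alpha_i, 1]; rows of F are f_t' = [1, delta_t].
Lam = np.column_stack([alpha, np.ones(N)])
F = np.column_stack([np.ones(T), delta])

factor_part = Lam @ F.T                       # N x T matrix with entries lambda_i' f_t
additive_part = alpha[:, None] + delta[None, :]
same = np.allclose(factor_part, additive_part)

# One way to represent D: True where the pair (i, t) is observed.
observed = rng.random((N, T)) > 0.2
n = int(observed.sum())                       # sample size n = |D|
```

Here `same` evaluates to `True`, because $[\alpha_i, 1][1, \delta_t]' = \alpha_i + \delta_t$ holds entry by entry, so the additive model is a rank-two instance of the factor structure.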
To stress the similarity with the conventional fixed effects model, we follow the existing literature and refer to (1) as the interactive fixed effects model.

For a given value of $R$, Moon and Weidner (2015, 2017) suggest

$$ \hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta} \in \mathbb{R}^K} Q(\boldsymbol{\beta}) \tag{2} $$

as IFE estimator for $\boldsymbol{\beta}$, where

$$ Q(\boldsymbol{\beta}) = \min_{\boldsymbol{\Lambda}, \mathbf{F}} \frac{1}{n} \sum_{(i,t) \in \mathcal{D}} \left( y_{it} - \mathbf{x}_{it}'\boldsymbol{\beta} - \boldsymbol{\lambda}_i'\mathbf{f}_t \right)^2 \tag{3} $$

is the profile objective function, $\boldsymbol{\Lambda} = [\boldsymbol{\lambda}_1, \dots, \boldsymbol{\lambda}_N]'$ is an $N \times R$ matrix of factor loadings, and $\mathbf{F} = [\mathbf{f}_1, \dots, \mathbf{f}_T]'$ is a $T \times R$ matrix of common factors. Note that the minimizing common factors $\hat{\mathbf{f}}_t(\boldsymbol{\beta})$ and loadings $\hat{\boldsymbol{\lambda}}_i(\boldsymbol{\beta})$ are not uniquely determined without imposing further normalizing restrictions. However, their column spaces are, which means their product is unique. We will return to this issue in the next subsection. Further, the objective function is globally non-convex due to the rank constraint imposed on the factor structure.

Moon and Weidner (2017) show consistency of the IFE estimator using an asymptotic framework where $N, T \to \infty$. Most notably, their assumptions require the true number of factors to be known, some regularity conditions, weak exogeneity of the regressors, and some conditions with respect to so-called "low-rank" regressors. The latter assumptions ensure that these "low-rank" regressors are not entirely absorbed by the factor structure, which would otherwise lead to an identification problem. This is very similar to the conventional fixed effects model, where coefficients of time-invariant regressors are not identified. Note that Bai (2009) also shows consistency using different asymptotics that require strict exogeneity of all regressors. However, his framework is not suitable for our purposes because we will consider predetermined regressors in our empirical illustration.

We briefly describe the asymptotic distribution of the IFE estimator derived by Moon and

4. Time-invariant and/or common regressors can be projected out by applying suitable within transformations before estimating the parameters of interest (see Bai 2009).

To avoid ambiguity, by randomly missing observations we mean that the dependent variable is independent of the attrition process conditional on the covariates and factor structure. Thus, we assume that the observations are conditionally missing at random. Under asymptotic sequences where $N/T \to \kappa$ and $0 < \kappa < \infty$, the IFE estimator has the following limiting distribution:

$$ \hat{\boldsymbol{\beta}} \overset{d}{\to} \mathcal{N}\left( \boldsymbol{\beta} - \bar{N}^{-1}\mathbf{B} - \bar{T}^{-1}\mathbf{C}_1 - \bar{T}^{-1}\mathbf{C}_2, \; n^{-1}\mathbf{V} \right), \tag{4} $$

where $\bar{N} = n/T$, $\bar{T} = n/N$, $\mathbf{B}$, $\mathbf{C}_1$, and $\mathbf{C}_2$ are leading bias terms, and $\mathbf{V}$ is a covariance matrix. $\mathbf{C}_1$ stems from the inclusion of weakly exogenous regressors, like predetermined variables, and is a generalization of the Nickell (1981) bias. $\mathbf{B}$ and $\mathbf{C}_2$ arise if the idiosyncratic error term is heteroskedastic or correlated across individuals and time periods, respectively.

Moon and Weidner (2017) show that (3) can be transformed to

$$ Q(\boldsymbol{\beta}) = \frac{1}{n} \sum_{r=R+1}^{\min(N,T)} \mu_r\left[ \mathbf{W}(\boldsymbol{\beta})'\mathbf{W}(\boldsymbol{\beta}) \right], \tag{5} $$

where $\mathbf{W}(\boldsymbol{\beta})$ is an $N \times T$ matrix with $[\mathbf{W}(\boldsymbol{\beta})]_{it} = y_{it} - \mathbf{x}_{it}'\boldsymbol{\beta}$. However, in the presence of missing data, some of the entries in $\mathbf{W}(\boldsymbol{\beta})$ are missing and we cannot simply apply the eigendecomposition as in the balanced case.

We follow the suggestion of Bai (2009) and Bai, Liao, and Yang (2015) and combine (5) with an EM algorithm proposed by Stock and Watson (1998, 2002). Intuitively, we augment the missing data in the E-step and apply the eigendecomposition to complete data in the M-step. Following Bai, Liao, and Yang (2015), we can rearrange (1) to

$$ y_{it} - \mathbf{x}_{it}'\boldsymbol{\beta} = [\mathbf{W}(\boldsymbol{\beta})]_{it} = \boldsymbol{\lambda}_i'\mathbf{f}_t + e_{it}, \tag{6} $$

which means that, for a known $\boldsymbol{\beta}$, we can augment missing observations in $\mathbf{W}(\boldsymbol{\beta})$ with estimates of $\boldsymbol{\Lambda}$ and $\mathbf{F}$. As in Bai (2009), we impose the following normalizing restrictions to uniquely determine the common factors and loadings: $\mathbf{F}'\mathbf{F}/T = \mathbf{I}_R$ and $\boldsymbol{\Lambda}'\boldsymbol{\Lambda} = \operatorname{diag}(\boldsymbol{\alpha})$, where $\boldsymbol{\alpha} \in \mathbb{R}^R$. Given these normalizing restrictions, $\widehat{\mathbf{F}}(\boldsymbol{\beta})$ is equal to the first $R$ eigenvectors of $\mathbf{W}(\boldsymbol{\beta})'\mathbf{W}(\boldsymbol{\beta})$ multiplied by $\sqrt{T}$ and $\widehat{\boldsymbol{\Lambda}}(\boldsymbol{\beta}) = \mathbf{W}(\boldsymbol{\beta})\widehat{\mathbf{F}}(\boldsymbol{\beta})/T$.
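The normalization and the E-step/M-step intuition described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `pca_factors` and `em_impute`, the zero starting values, and the stopping rule (largest change in the imputed cells below `tol`) are choices made here for the example.

```python
import numpy as np

def pca_factors(W, R):
    """Principal components under F'F/T = I_R and Lambda'Lambda diagonal."""
    T = W.shape[1]
    _, eigvec = np.linalg.eigh(W.T @ W)           # eigenvalues in ascending order
    F_hat = eigvec[:, ::-1][:, :R] * np.sqrt(T)   # first R eigenvectors times sqrt(T)
    Lam_hat = W @ F_hat / T
    return F_hat, Lam_hat

def em_impute(W, observed, R, tol=1e-8, max_iter=1000):
    """Fill the unobserved cells of W with lambda_i' f_t (W = W(beta) for a given beta).

    `observed` is a boolean N x T mask for D; unobserved cells of W may hold NaN.
    """
    W_perp = np.zeros(W.shape)                    # start by filling gaps with zeros
    for _ in range(max_iter):
        W_tilde = np.where(observed, W, W_perp)   # augment the missing cells
        F_hat, Lam_hat = pca_factors(W_tilde, R)  # eigendecomposition on complete data
        W_new = Lam_hat @ F_hat.T                 # updated fitted factor structure
        change = np.max(np.abs(W_new - W_perp)[~observed], initial=0.0)
        W_perp = W_new
        if change < tol:                          # assumed convergence criterion
            break
    return np.where(observed, W, W_perp), F_hat, Lam_hat
```

A call such as `W_tilde, F_hat, Lam_hat = em_impute(W, observed, R=2)` returns the augmented matrix together with the factor and loading estimates from the last iteration. With balanced data the loop stops after one pass and reduces to the plain principal components step.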

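5. Chen, Fernández-Val, and Weidner (2019) use the same conjecture for non-linear models with interactive fixed effects.
6. Other valid normalizing restrictions are discussed in Bai and Ng (2013).
7. If $T > N$, it is computationally more efficient to impose $\boldsymbol{\Lambda}'\boldsymbol{\Lambda}/N = \mathbf{I}_R$ and $\mathbf{F}'\mathbf{F} = \operatorname{diag}(\boldsymbol{\alpha})$ and estimate $\widehat{\boldsymbol{\Lambda}}(\boldsymbol{\beta})$ as the first $R$ eigenvectors of $\mathbf{W}(\boldsymbol{\beta})\mathbf{W}(\boldsymbol{\beta})'$ multiplied by $\sqrt{N}$ and $\widehat{\mathbf{F}}(\boldsymbol{\beta}) = \mathbf{W}(\boldsymbol{\beta})'\widehat{\boldsymbol{\Lambda}}(\boldsymbol{\beta})/N$.

Let

$$ \left[ \mathcal{P}_{\mathcal{D}}(\mathbf{A}) \right]_{it} = \begin{cases} [\mathbf{A}]_{it} & \text{if } (i, t) \in \mathcal{D} \\ 0 & \text{otherwise} \end{cases} \tag{7} $$

be a projection operator that replaces missing observations of any $N \times T$ matrix $\mathbf{A}$ with zeros. Likewise, $\mathcal{P}^{\perp}_{\mathcal{D}}(\mathbf{A})$ is the complementary operator that replaces non-missing observations with zeros.

Definition. EM algorithm
Given $\boldsymbol{\beta}$ and $R$, initialize $\mathbf{W}^{\perp} = \mathbf{0}_{N \times T}$ and repeat the following steps until convergence:
Step 1: Set $\widetilde{\mathbf{W}}(\boldsymbol{\beta}) = \mathcal{P}_{\mathcal{D}}(\mathbf{W}(\boldsymbol{\beta})) + \mathcal{P}^{\perp}_{\mathcal{D}}(\mathbf{W}^{\perp})$
Step 2: Compute $\widehat{\mathbf{F}}(\boldsymbol{\beta})$ and $\widehat{\boldsymbol{\Lambda}}(\boldsymbol{\beta})$ from $\widetilde{\mathbf{W}}(\boldsymbol{\beta})$
Step 3: Update $\mathbf{W}^{\perp} = \widehat{\boldsymbol{\Lambda}}(\boldsymbol{\beta})\widehat{\mathbf{F}}(\boldsymbol{\beta})'$

For a given $\boldsymbol{\beta}$ and $R$, we start by replacing missing observations in $\mathbf{W}(\boldsymbol{\beta})$ with zeros. We denote this augmented matrix as $\widetilde{\mathbf{W}}(\boldsymbol{\beta})$. Afterwards, we estimate $\widehat{\mathbf{F}}(\boldsymbol{\beta})$ and $\widehat{\boldsymbol{\Lambda}}(\boldsymbol{\beta})$ from $\widetilde{\mathbf{W}}(\boldsymbol{\beta})$, replace the missing observations in $\widetilde{\mathbf{W}}(\boldsymbol{\beta})$ with $\hat{\boldsymbol{\lambda}}_i'(\boldsymbol{\beta})\hat{\mathbf{f}}_t(\boldsymbol{\beta})$, and repeat these two steps until convergence.

Let $\widetilde{\mathbf{W}}(\boldsymbol{\beta})$ denote the augmented matrix after convergence. A general IFE objective function is then given by

$$ \widetilde{Q}(\boldsymbol{\beta}) = \frac{1}{n} \sum_{r=R+1}^{\min(N,T)} \mu_r\left[ \widetilde{\mathbf{W}}(\boldsymbol{\beta})'\widetilde{\mathbf{W}}(\boldsymbol{\beta}) \right], \tag{8} $$

where in case of balanced data $\widetilde{\mathbf{W}}(\boldsymbol{\beta})$ is simply $\mathbf{W}(\boldsymbol{\beta})$. Thus, in case of missing data, we complement the IFE estimator suggested by Moon and Weidner (2015, 2017) with an additional data augmentation step inside the objective function.

Before we describe the estimation of the leading bias terms and the covariance matrix necessary for inference, we need to know how to project the estimated common factors and/or loadings out of an arbitrary $n$-dimensional vector $\mathbf{v}$. Let $\breve{\mathbf{v}}$, $\grave{\mathbf{v}}$, and $\acute{\mathbf{v}}$ denote the residuals of the following least squares problems:

$$ \breve{v}_{it} = v_{it} - \hat{\boldsymbol{\lambda}}_i'\hat{\mathbf{a}}_t - \hat{\mathbf{f}}_t'\hat{\mathbf{b}}_i, \quad \left( \widehat{\mathbf{A}}, \widehat{\mathbf{B}} \right) = \arg\min_{\mathbf{A} \in \mathbb{R}^{T \times R},\, \mathbf{B} \in \mathbb{R}^{N \times R}} \sum_{(i,t) \in \mathcal{D}} \left( v_{it} - \hat{\boldsymbol{\lambda}}_i'\mathbf{a}_t - \hat{\mathbf{f}}_t'\mathbf{b}_i \right)^2, \tag{9} $$

$$ \grave{v}_{it} = v_{it} - \hat{\boldsymbol{\lambda}}_i'\hat{\mathbf{a}}_t, \quad \widehat{\mathbf{A}} = \arg\min_{\mathbf{A} \in \mathbb{R}^{T \times R}} \sum_{(i,t) \in \mathcal{D}} \left( v_{it} - \hat{\boldsymbol{\lambda}}_i'\mathbf{a}_t \right)^2, \tag{10} $$

$$ \acute{v}_{it} = v_{it} - \hat{\mathbf{f}}_t'\hat{\mathbf{b}}_i, \quad \widehat{\mathbf{B}} = \arg\min_{\mathbf{B} \in \mathbb{R}^{N \times R}} \sum_{(i,t) \in \mathcal{D}} \left( v_{it} - \hat{\mathbf{f}}_t'\mathbf{b}_i \right)^2, \tag{11} $$

where $\mathbf{A} = [\mathbf{a}_1, \dots, \mathbf{a}_T]'$ and $\mathbf{B} = [\mathbf{b}_1, \dots, \mathbf{b}_N]'$. In case of balanced data, these residuals have a straightforward expression: $\breve{\mathbf{v}} = \operatorname{vec}(\mathbf{M}_{\widehat{\boldsymbol{\Lambda}}}\mathbf{V}\mathbf{M}_{\widehat{\mathbf{F}}})$, $\grave{\mathbf{v}} = \operatorname{vec}(\mathbf{M}_{\widehat{\boldsymbol{\Lambda}}}\mathbf{V})$, and $\acute{\mathbf{v}} = \operatorname{vec}(\mathbf{V}\mathbf{M}_{\widehat{\mathbf{F}}})$, where $\mathbf{V}$ is an $N \times T$ matrix with elements denoted by $v_{it}$. However, in the presence of missing data, we cannot simply augment missing observations in $\mathbf{V}$ with zeros and apply the same expressions as in the balanced case. We can still use a standard ordinary least squares estimator to obtain the residuals, but with increasing sample sizes this problem quickly becomes infeasible (even for moderately large data sets). Suppose we have a data set consisting of $N = 200$ individuals observed for $T = 50$ time periods and consider five factors. Even in this moderate example, the rank of the sparse regressor matrix corresponding to the common factors and factor loadings is already $(N + T)R = 1{,}250$.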
Fortunately, we can use insights from the literature about conventional fixed effects models and use sparse solvers such as those of Halperin (1962) and Fong and Saunders (2011) to mitigate this problem (see Guimarães and Portugal 2010; Gaure 2013; Stammann 2018).

In the presence of missing data, we recommend computing the residuals with the method of alternating projections (MAP, see Halperin 1962) suggested by Stammann (2018). Let $\mathcal{D}_t = \{ i \mid (i, t) \in \mathcal{D} \}$ and $\mathcal{D}_i = \{ t \mid (i, t) \in \mathcal{D} \}$. We define the following scalar expressions:

$$ \left[ \mathbf{M}_{\hat{\boldsymbol{\lambda}}_r}\mathbf{v} \right]_{it} = v_{it} - \hat{\lambda}_{ir} \frac{\sum_{j \in \mathcal{D}_t} \hat{\lambda}_{jr} v_{jt}}{\sum_{j \in \mathcal{D}_t} \hat{\lambda}_{jr}^2} \quad \text{and} \quad \left[ \mathbf{M}_{\hat{\mathbf{f}}_r}\mathbf{v} \right]_{it} = v_{it} - \hat{f}_{tr} \frac{\sum_{s \in \mathcal{D}_i} \hat{f}_{sr} v_{is}}{\sum_{s \in \mathcal{D}_i} \hat{f}_{sr}^2}. $$

Definition.

MAP algorithm (for unbalanced data)
Step 0: Initialize $\mathbf{Mv} = \mathbf{v}$.
Step 1: (If $\widehat{\boldsymbol{\Lambda}}$ has to be projected out, e.g. in (9) and (10)) For $r = 1, \dots, R$, set $\mathbf{Mv} = \mathbf{M}_{\hat{\boldsymbol{\lambda}}_r}\mathbf{Mv}$.
Step 2: (If $\widehat{\mathbf{F}}$ has to be projected out, e.g. in (9) and (11)) For $r = 1, \dots, R$, set $\mathbf{Mv} = \mathbf{M}_{\hat{\mathbf{f}}_r}\mathbf{Mv}$.
Step 3: Repeat steps 1 and/or 2 until convergence, e.g. $\lVert \mathbf{Mv}^{(i)} - \mathbf{Mv}^{(i-1)} \rVert < \epsilon$, where $i$ is the iteration number and $\epsilon$ is a tolerance parameter. After convergence, $\mathbf{Mv}$ is a close approximation to $\breve{\mathbf{v}}$, $\grave{\mathbf{v}}$, or $\acute{\mathbf{v}}$.

Next, we describe how to draw inference under the assumption that $e_{it}$ is homoskedastic. In this case, $\hat{\boldsymbol{\beta}}$ is asymptotically unbiased and the corresponding covariance can be estimated as $\widehat{\mathbf{V}} = \hat{\sigma}^2 \widehat{\mathbf{D}}^{-1}$, where $\hat{\sigma}^2 = n^{-1} \sum_{(i,t) \in \mathcal{D}} \hat{e}_{it}^2$, $\hat{e}_{it} = y_{it} - \mathbf{x}_{it}'\hat{\boldsymbol{\beta}} - \hat{\boldsymbol{\lambda}}_i'\hat{\mathbf{f}}_t$, and

$$ \widehat{\mathbf{D}} = \sum_{(i,t) \in \mathcal{D}} \breve{\mathbf{x}}_{it}\breve{\mathbf{x}}_{it}'. \tag{12} $$

In contrast, if $e_{it}$ is not homoskedastic, the IFE estimator is asymptotically biased, but can be corrected using appropriate bias corrections. The Nickell (1981) bias stemming from the inclusion of predetermined or weakly exogenous variables can be corrected in a similar fashion.

Following Moon and Weidner (2015, 2017), a bias-corrected estimator for $\boldsymbol{\beta}$ is

$$ \tilde{\boldsymbol{\beta}} = \hat{\boldsymbol{\beta}} + \widehat{\mathbf{B}} + \widehat{\mathbf{C}}_1 + \widehat{\mathbf{C}}_2, \tag{13} $$

where $\widehat{\mathbf{B}} = \widehat{\mathbf{D}}^{-1}\widehat{\mathbf{B}}^{\beta}$, $\widehat{\mathbf{C}}_1 = \widehat{\mathbf{D}}^{-1}\widehat{\mathbf{C}}^{\beta}_1$, and $\widehat{\mathbf{C}}_2 = \widehat{\mathbf{D}}^{-1}\widehat{\mathbf{C}}^{\beta}_2$ are estimators for the asymptotic biases described in subsection 2.1.
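Returning to the MAP algorithm for unbalanced data defined earlier in this subsection, the sweeps over the scalar projections can be sketched as follows. This is a hedged illustration rather than the implementation of Stammann (2018): the function name `map_residuals`, the matrix layout (an $N \times T$ array plus a boolean mask for $\mathcal{D}$), the Euclidean norm in the stopping rule, and the default tolerance are assumptions made for the example. The sketch further assumes that every individual and every time period has at least one observation.

```python
import numpy as np

def map_residuals(V, observed, Lam=None, F=None, tol=1e-10, max_iter=10_000):
    """Project estimated loadings and/or factors out of V on the observed cells.

    V: N x T array (cells outside `observed` are ignored and may hold NaN),
    observed: boolean N x T mask for D,
    Lam: N x R loadings to project out (or None), F: T x R factors (or None).
    """
    Mv = np.where(observed, V, 0.0)               # Step 0; missing cells stay at zero
    for _ in range(max_iter):
        previous = Mv.copy()
        if Lam is not None:                       # Step 1: one projection per period t
            for r in range(Lam.shape[1]):
                lam = Lam[:, [r]] * observed      # lambda_ir on D, zero elsewhere
                coef = (lam * Mv).sum(axis=0) / (lam ** 2).sum(axis=0)
                Mv = Mv - lam * coef
        if F is not None:                         # Step 2: one projection per unit i
            for r in range(F.shape[1]):
                f = F[:, r] * observed            # f_tr on D, zero elsewhere
                coef = (f * Mv).sum(axis=1) / (f ** 2).sum(axis=1)
                Mv = Mv - f * coef[:, None]
        if np.linalg.norm(Mv - previous) < tol:   # Step 3: stop when sweeps stall
            break
    return Mv
```

Passing only `Lam` approximates the residuals in (10), passing only `F` approximates those in (11), and passing both approximates those in (9). Applied to each regressor in turn, the same routine yields the $\breve{\mathbf{x}}_{it}$ needed for (12), and no $(N + T)R$-column regressor matrix is ever formed.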
Further, $\widehat{\mathbf{B}}^{\beta}$, $\widehat{\mathbf{C}}^{\beta}_1$, and $\widehat{\mathbf{C}}^{\beta}_2$ are $K$-dimensional vectors with elements

$$ \widehat{B}^{\beta}_k = \sum_{(i,t) \in \mathcal{D}} \hat{e}_{it}^2 \left[ \mathcal{P}_{\mathcal{D}}(\grave{\mathbf{X}}_k)\,\widehat{\boldsymbol{\Theta}}' \right]_{ii}, \tag{14} $$

$$ \widehat{C}^{\beta}_{1,k} = \sum_{i=1}^{N} \sum_{l=1}^{L} \sum_{t > l \in \mathcal{D}_i} \left[ \mathbf{P}_{\widehat{\mathbf{F}}} \right]_{t,t-l} \hat{e}_{i,t-l}\, x_{it,k}, \tag{15} $$

$$ \widehat{C}^{\beta}_{2,k} = \sum_{(i,t) \in \mathcal{D}} \hat{e}_{it}^2 \left[ \mathcal{P}_{\mathcal{D}}(\acute{\mathbf{X}}_k)'\,\widehat{\boldsymbol{\Theta}} \right]_{tt} + \sum_{i=1}^{N} \sum_{m=1}^{M} \sum_{t > m \in \mathcal{D}_i} \hat{e}_{it}\hat{e}_{i,t-m} \left( \left[ \mathcal{P}_{\mathcal{D}}(\acute{\mathbf{X}}_k)'\,\widehat{\boldsymbol{\Theta}} \right]_{t,t-m} + \left[ \mathcal{P}_{\mathcal{D}}(\acute{\mathbf{X}}_k)'\,\widehat{\boldsymbol{\Theta}} \right]_{t-m,t} \right), \tag{16} $$

where $M$ and $L$ are bandwidth parameters for the truncated kernel of Newey and West (1987), $\widehat{\boldsymbol{\Theta}} = \widehat{\boldsymbol{\Lambda}}(\widehat{\boldsymbol{\Lambda}}'\widehat{\boldsymbol{\Lambda}})^{-1}(\widehat{\mathbf{F}}'\widehat{\mathbf{F}})^{-1}\widehat{\mathbf{F}}'$, and $\mathcal{P}_{\mathcal{D}}(\acute{\mathbf{X}}_k)$ and $\mathcal{P}_{\mathcal{D}}(\grave{\mathbf{X}}_k)$ are $N \times T$ matrices with elements denoted by $\acute{x}_{it,k}$ and $\grave{x}_{it,k}$ if $(i, t) \in \mathcal{D}$ and zero otherwise. $\widehat{B}^{\beta}_k$ and the first term of $\widehat{C}^{\beta}_{2,k}$ are symmetric and estimate the biases induced by individual and time-serial heteroskedasticity in the idiosyncratic error term, respectively. The second term in $\widehat{C}^{\beta}_{2,k}$ corrects for the bias stemming from time-serial correlation (see Bai 2009, remark 6, and Moon and Weidner 2015). In this article, we do not consider cross-sectional correlation in the idiosyncratic error term, because usually a large distance between cross-sectional units does not imply small correlation of the corresponding error terms. Thus, a kernel method as used to deal with time-serial correlation is not suitable for this purpose.

Appropriate estimators for the covariance of $\tilde{\boldsymbol{\beta}}$ are given by

$$ \widetilde{\mathbf{V}}_j = \widehat{\mathbf{D}}^{-1}\widehat{\boldsymbol{\Omega}}_j\widehat{\mathbf{D}}^{-1} \quad \forall\, j \in \{1, 2\}, \tag{17} $$

$$ \widehat{\boldsymbol{\Omega}}_1 = \sum_{(i,t) \in \mathcal{D}} \hat{e}_{it}^2\, \breve{\mathbf{x}}_{it}\breve{\mathbf{x}}_{it}', \tag{18} $$

$$ \widehat{\boldsymbol{\Omega}}_2 = \sum_{i=1}^{N} \left( \sum_{t \in \mathcal{D}_i} \hat{e}_{it}\breve{\mathbf{x}}_{it} \right) \left( \sum_{t \in \mathcal{D}_i} \hat{e}_{it}\breve{\mathbf{x}}_{it} \right)'. \tag{19} $$

$\widetilde{\mathbf{V}}_1$ is a White (1980)-type heteroskedasticity-robust covariance estimator and $\widetilde{\mathbf{V}}_2$ is a cluster-robust covariance estimator that takes into account arbitrary time-serial correlation by assigning observations to individual-specific clusters. Alternatively, the clustered covariance estimator can be substituted by Newey and West (1987)'s estimator.

Bai (2009) and Moon and Weidner (2017) show consistency and derive the asymptotic distribution of the IFE estimator under the assumption that the number of factors is known. To avoid ambiguity with $R$, we denote the true number of factors as $R^0$. In practice, this assumption is often very unlikely to hold unless economic theory gives a clear prediction about the number of factors. However, even in this case, it might be necessary to support the theoretical prediction with some empirical evidence. Therefore, we need a reliable method to estimate the number of factors.

For pure factor models, i.e. the unobserved effects model without covariates, there is already an extensive literature on the estimation of the number of factors (see among others Buja and Eyuboglu 1992; Bai and Ng 2002; Hallin and Liška 2007; Alessi, Barigozzi, and

8. In the presence of cross-sectional correlation, Bai (2009, remark 7) presents a partial sample estimator. Alternatively, Bai and Liao (2017) suggest estimating the cross-sectional correlation using an inverse covariance estimator and incorporating the corresponding weights in the objective function.

$y_{it} - \mathbf{x}_{it}'\hat{\boldsymbol{\beta}} = \boldsymbol{\lambda}_i'\mathbf{f}_t + e_{it} - \mathbf{x}_{it}'(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})$ is essentially a pure factor model. Thus, given an appropriate estimator for $\boldsymbol{\beta}$, such that the error $\mathbf{x}_{it}'(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})$ is asymptotically negligible, we can consistently estimate the number of factors using the estimators developed for pure factor models (see Bai 2009, remark 5 and Appendix).

Next, we need to define an appropriate estimator. Bai (2009) argues, without rigorous proof, that $\hat{\boldsymbol{\beta}}$ is $\sqrt{NT}$ consistent as long as the number of factors is at least $R^0$. The intuition is very similar to the inclusion of irrelevant variables in a standard OLS regression. Including redundant common factors does not affect the consistency of the IFE estimator, but it does affect its precision (see Bai 2009, remark 4). Under more restrictive assumptions than those imposed by Bai (2009) and Moon and Weidner (2017), Moon and Weidner (2015) confirm that the asymptotic distribution of $\hat{\boldsymbol{\beta}}$ with $R > R^0$ is in fact identical to the one with $R = R^0$. However, imposing assumptions similar to those of Bai (2009) and Moon and Weidner (2017), the authors can only show $\sqrt{\min(N, T)}$ consistency of $\hat{\boldsymbol{\beta}}$.

The related literature gives two pieces of practical advice. First, let $R$ be a known upper bound on the number of factors; then we can consistently estimate $\boldsymbol{\beta}$ and afterwards the number of factors using estimators developed for pure factor models (see Bai 2009). Second, because including irrelevant factors does not affect the limiting distribution, valid inference does not necessarily require a consistent estimate of the true number of factors. However, this flexibility is associated with an efficiency loss in finite samples, and thus reliable methods to estimate the number of factors are still useful to ensure that the number of factors used to estimate $\boldsymbol{\beta}$ is not substantially larger than $R^0$ (see Moon and Weidner 2015).

Throughout this article, we restrict ourselves to the estimators for the number of factors suggested by Bai and Ng (2002), Onatski (2010), Ahn and Horenstein (2013), and Dobriban and Owen (2019) applied to the pure factor model $\mathbf{W}(\hat{\boldsymbol{\beta}})$ given $\boldsymbol{\beta}$ is estimated with $R = R$. Bai and Ng (2002) introduce various model selection criteria based on minimizing the sum of squared residuals plus some penalty function of the number of estimated parameters. Onatski (2010), Ahn and Horenstein (2013), and Dobriban and Owen (2019) segment the eigenvalue spectrum of the covariance of $\mathbf{W}(\hat{\boldsymbol{\beta}})$ to find a cutoff point between the common factors and the remaining noise stemming from the idiosyncratic error term. Onatski (2010) proposes the edge distribution (ED) estimator based on differences of consecutive eigenvalues. Ahn and Horenstein (2013) suggest using ratios (ER) and growth rates (GR) instead of differences. Buja and Eyuboglu (1992) suggest a specific version of parallel analysis (PA), which compares the eigenvalues to those obtained from independent data. Intuitively, the eigenvalues of independent data provide a clear threshold to separate common factors from

9. Some of the very restrictive assumptions imposed by Moon and Weidner (2015), like independent and identically standard normally distributed error terms, are mainly due to technical reasons. In simulation experiments, the authors violate this assumption and still find support for their theoretical results.

$\mathbf{W}(\hat{\boldsymbol{\beta}})$, which preserves the variances of the data but breaks the correlation pattern induced by the common factors. Recently, Dobriban (forthcoming) provides the theoretical justification for the accuracy of PA, and Dobriban and Owen (2019) propose a deflated version that improves the detection accuracy of smaller but important factors in the presence of large factors.

In the presence of missing data, we follow Gagliardini, Ossola, and Scaillet (2019) and apply the different estimators to $\mathcal{P}_{\mathcal{D}}(\mathbf{W}(\hat{\boldsymbol{\beta}}))$ instead of $\mathbf{W}(\hat{\boldsymbol{\beta}})$.

We study the inference drawn from the interactive fixed effects model in the presence of randomly missing data. Contrary to the balanced case, the IFE estimator requires an additional data augmentation step to estimate the common factors and loadings on complete data (see Bai 2009; Bai, Liao, and Yang 2015). This is done using the EM algorithm proposed by Stock and Watson (1998, 2002). Given that we know the true number of factors, we first analyze whether the inferential theory derived for the IFE estimator is a reasonable approximation in the presence of randomly missing data. For this purpose, we compare relative biases (Bias), average ratios of standard errors and standard deviations, and empirical sizes of $z$-tests with 5 % nominal size (Size) for different patterns of randomly missing data and configurations of the idiosyncratic error term with those from a balanced panel. Because the number of factors is usually unknown, we second consider different estimators for the number of factors and compare their performance as well. We analyze the estimators suggested by Bai and Ng (2002), Onatski (2010), Ahn and Horenstein (2013), and Dobriban and Owen (2019).
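To make the eigenvalue-based estimators more concrete, the following sketch implements an eigenvalue ratio rule in the spirit of Ahn and Horenstein (2013). It reflects only one reading of their ER estimator (choose the $k$ that maximizes the ratio of the $k$-th to the $(k+1)$-th largest eigenvalue); the function name `eigenvalue_ratio`, the scaling by $NT$, and the user-chosen bound `k_max` are assumptions made for illustration.

```python
import numpy as np

def eigenvalue_ratio(W, k_max):
    """Return the k in 1..k_max that maximizes mu_k / mu_{k+1}.

    W is an N x T matrix without NaNs, e.g. W(beta_hat) for balanced data.
    Requires k_max < min(N, T).
    """
    N, T = W.shape
    singular_values = np.linalg.svd(W, compute_uv=False)   # sorted in descending order
    mu = singular_values ** 2 / (N * T)                    # eigenvalues of W W' / (NT)
    ratios = mu[:k_max] / mu[1:k_max + 1]
    return int(np.argmax(ratios)) + 1
```

With missing data, the approach described above corresponds to calling the function on the zero-filled matrix, for example `eigenvalue_ratio(np.where(observed, W, 0.0), k_max)`, where `observed` is a hypothetical boolean mask for $\mathcal{D}$.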
From the various information criteria introduced by Bai and Ng (2002), we focus on IC and BIC, which are also used in other studies (see Onatski 2010; Ahn and Horenstein 2013). To assess the performance, we compare the average estimated number of factors of all estimators.

As in Moon and Weidner (2015), we consider a static panel data model with one regressor and two factors:

$$y_{it} = \beta x_{it} + \sum_{r=1}^{2} \lambda_{ir} f_{tr} + e_{it},$$

$$x_{it} = 1 + \sum_{r=1}^{2} (\lambda_{ir} + \chi_{ir}) (f_{tr} + f_{t-1,r}) + w_{it},$$

where $i = 1, \dots, N$, $t = 1, \dots, T$, and $e_{it}$ is an idiosyncratic error term. The regressor is generated such that it is correlated with the common factors and loadings. Throughout all experiments, we generate $f_{tr}$ and $w_{it}$ as iid. N(0, ) and $\lambda_{ir}$ and $\chi_{ir}$ as iid. N(1, ).

In the spirit of Bai and Ng (2002) and Ahn and Horenstein (2013), we consider four different configurations for the idiosyncratic error term: i) homoskedastic, ii) homoskedastic with fat tails, iii) cross-sectional heteroskedastic, and iv) cross-sectional heteroskedastic with time-serial correlation. More precisely, i) $e_{it} \sim$ iid. N(0, ), ii) $e_{it} = \sqrt{\ /\ }\,\nu_{it}$, where $\nu_{it}$ has a $t$-distribution with five degrees of freedom, iii) $e_{it} \sim$ iid. N(0, ) if $i$ is odd and $e_{it} \sim$ iid. N(0, ) else, and iv) $e_{it} = 0.\ e_{it-1} + \nu_{it}$, where $\nu_{it} \sim$ iid. N(0, / ) if $i$ is odd and $\nu_{it} \sim$ iid. N(0, / ) else. For configuration iv), we ensure that $e_{it}$ is drawn from its stationary distribution by discarding the initial 1,000 time periods. Note that the variance of the idiosyncratic error term is equal across all configurations.

We consider three different patterns where a fraction of ψ ∈ { , . , . } observations are missing at random. The overall sample size is equal to $NT(1 - \psi)$. Figure 1 gives a graphical illustration of the three missing data patterns. In the first pattern, we irregularly drop $NT\psi$ observations from the entire panel data set. This pattern is also analyzed by Bai, Liao, and Yang (2015) and mimics a situation in surveys where individuals refuse or forget to answer certain questions. The other patterns are borrowed from Czarnowske and Stammann (2019) and reflect situations where individuals who drop out of a survey are or are not replaced. To describe patterns 2 and 3, we divide all individuals into two types. Type 1 consists of $N_1 = 2\psi N$ individuals that are observed for $T_1 = T/2$ time periods. The remaining $N_2 = N - N_1$ individuals are of type 2 and are observed over the entire time horizon ($T_2 = T$). Patterns 2 and 3 differ only in the point in time when the time series of a type 1 individual begins. In pattern 2, all time series start in $t = 1$, whereas in pattern 3, the initial period is chosen randomly with equal probability from { , , . . . , $T - T_1$ }. All unbalanced data sets are generated from balanced panels by randomly dropping observations given the corresponding missing data pattern.

We consider panel data sets of different average sizes: $\bar{N} \in \{120, 240\}$ and $\bar{T} \in \{24, 48, 96\}$, where $N = \bar{N}/(1 - \psi)$ and $T = \bar{T}/(1 - \psi)$. This allows us to compare the results across different fractions of missing data and check whether the conjecture of Fernández-Val and Weidner (2018) applies to the IFE estimator as well. All results are based on 1,000 replications and summarized in tables 1–6. All computations were done on a Linux Mint 18.1 workstation using R Version 3.6.3 (R Core Team 2019).

First, we analyze the finite sample properties of the IFE estimator. In configuration iii), we correct for the asymptotic bias ($B$) induced by cross-sectional heteroskedasticity and use an appropriate covariance estimator in the spirit of White (1980). In configuration iv), we additionally correct for the asymptotic bias ($C$) induced by time-serial correlation and use a cluster robust covariance estimator.

Figure 1: Patterns of Randomly Missing Observations. Panels: Pattern 1, Pattern 2, Pattern 3; horizontal axis: Time Periods; vertical axis: Individuals.
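The three missing data patterns described in the simulation design can be illustrated with a short sketch. It is written in Python with NumPy purely for illustration and is not the authors' R code. The function name `observed_mask`, the panel dimensions, and the value ψ = 0.2 are hypothetical. For pattern 3, the set of admissible start periods is assumed to contain every start that keeps the block of $T_1 = T/2$ periods inside the panel.

```python
import numpy as np


def observed_mask(n, t, psi, pattern, rng):
    """Return an (n, t) boolean array; True marks an observed cell."""
    mask = np.ones((n, t), dtype=bool)
    if psi == 0:
        return mask  # balanced panel
    if pattern == 1:
        # Pattern 1: drop N*T*psi cells at random from the whole panel.
        n_drop = int(round(n * t * psi))
        drop = rng.choice(n * t, size=n_drop, replace=False)
        mask.flat[drop] = False
        return mask
    if pattern not in (2, 3):
        raise ValueError("pattern must be 1, 2, or 3")
    # Patterns 2 and 3: N1 = 2*psi*N type 1 individuals are observed
    # for T1 = T/2 consecutive periods; all others are fully observed.
    n1 = int(round(2 * psi * n))
    t1 = t // 2
    type1 = rng.choice(n, size=n1, replace=False)
    mask[type1, :] = False
    for i in type1:
        # Pattern 2 starts in the first period; pattern 3 draws the start
        # uniformly from all feasible positions (an assumption, see above).
        start = 0 if pattern == 2 else int(rng.integers(0, t - t1 + 1))
        mask[i, start:start + t1] = True
    return mask


rng = np.random.default_rng(42)
for pattern in (1, 2, 3):
    m = observed_mask(150, 30, 0.2, pattern, rng)
    print(pattern, int(m.sum()), float(m.mean()))
```

In each pattern the number of observed cells equals $NT(1 - \psi)$; only the location of the missing cells differs, scattered in pattern 1 and in contiguous blocks in patterns 2 and 3.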
We choose the bandwidth for the estimation of $C$ according to the rule of thumb proposed by Newey and West (1994): $M = 4(T/100)^{2/9}$. The results are summarized in tables 1–3. For configurations i)–iii), we observe biases, ratios, and sizes that are almost identical to the balanced case irrespective of the fraction and pattern of missing data. Thus, the asymptotic properties of the IFE estimator for balanced data are a fairly good approximation for the unbalanced case in these configurations. This is different for configuration iv). Here we observe biases that are twice as large compared to the balanced case. Although all ratios are close to one, these larger biases distort the sizes of the tests and lead to over-rejection. Contrary to the other configurations, the various missing data patterns affect the finite sample properties of the estimator differently and a larger fraction of missing data leads to worse properties.

Next we analyze the different estimators for the number of factors suggested by Bai and Ng (2002), Onatski (2010), Ahn and Horenstein (2013), and Dobriban and Owen (2019). The initial estimator to obtain the pure factor model uses R = ⌈(min(N, T)/ ) / ⌉. This choice is different from other studies like Bai and Ng (2002), Onatski (2010), and Ahn and Horenstein (2013) who keep the number of factors fixed irrespective of the sample size. For ER and GR we use the mock eigenvalue proposed in Ahn and Horenstein (2013) to allow for the possibility to select zero common factors. All results are summarized in tables 4–6. First we analyze the case of balanced data. For T ≥ , all estimators have little bias. Additionally, for BIC, ED, and PA the biases are low irrespective of the sample size whereas ER and GR slightly underestimate the true number of factors. These findings are in line with Ahn and Horenstein (2013) for pure factor models and suggest that the error in estimating $\beta$ is asymptotically negligible. This is also an additional robustness check for Moon and Weidner (2015), who expect that their main results also apply to error terms that are not iid. standard normally distributed. For unbalanced panels, we observe that the missing data patterns as well as the fraction of missing data affect the performance of all estimators differently. In general we find that ER and GR are more likely to underestimate, whereas the others tend to overestimate the number of factors. Further, the performance gets worse as the fraction

10. This rule of thumb was suggested by Bai and Ng (2002) in footnote 10 and traces back to Schwert (1989).

Table 1: Properties of $\hat{\beta}$ - Missing Data Pattern 1

Entries in each cell: ψ = 0 . / ψ = 0 . / ψ = 0 .

| N | T | Bias | Ratio | Size |
|---|---|---|---|---|
| Homoskedastic | | | | |
| 120 | 24 | 0.07 / -0.01 / 0.14 | 0.91 / 0.88 / 0.92 | 0.07 / 0.08 / 0.08 |
| 120 | 48 | 0.05 / 0.00 / 0.03 | 1.00 / 0.92 / 0.95 | 0.05 / 0.08 / 0.05 |
| 120 | 96 | 0.06 / 0.01 / 0.04 | 0.99 / 1.00 / 0.97 | 0.05 / 0.06 / 0.06 |
| 240 | 24 | 0.06 / 0.02 / 0.04 | 0.91 / 0.96 / 0.90 | 0.07 / 0.05 / 0.08 |
| 240 | 48 | 0.03 / 0.01 / 0.02 | 0.96 / 0.97 / 0.98 | 0.05 / 0.06 / 0.05 |
| 240 | 96 | 0.01 / 0.00 / 0.01 | 0.98 / 0.99 / 0.99 | 0.05 / 0.04 / 0.05 |
| Homoskedastic with Fat Tails | | | | |
| 120 | 24 | 0.15 / 0.06 / 0.10 | 0.92 / 0.85 / 0.90 | 0.07 / 0.09 / 0.08 |
| 120 | 48 | -0.01 / 0.03 / 0.05 | 0.94 / 0.97 / 0.91 | 0.06 / 0.07 / 0.06 |
| 120 | 96 | 0.00 / 0.02 / 0.02 | 0.97 / 0.88 / 0.95 | 0.06 / 0.06 / 0.06 |
| 240 | 24 | 0.03 / 0.03 / 0.04 | 0.92 / 0.89 / 0.93 | 0.07 / 0.06 / 0.07 |
| 240 | 48 | -0.01 / 0.00 / 0.01 | 0.97 / 0.93 / 0.98 | 0.06 / 0.07 / 0.06 |
| 240 | 96 | -0.01 / -0.03 / 0.00 | 0.97 / 0.99 / 0.98 | 0.05 / 0.06 / 0.06 |
| Cross-Sectional Heteroskedastic | | | | |
| 120 | 24 | 0.05 / -0.03 / 0.13 | 0.90 / 0.92 / 0.92 | 0.08 / 0.07 / 0.07 |
| 120 | 48 | -0.01 / -0.01 / 0.00 | 0.97 / 0.92 / 0.96 | 0.06 / 0.07 / 0.06 |
| 120 | 96 | 0.01 / -0.01 / 0.05 | 0.95 / 0.94 / 0.96 | 0.07 / 0.06 / 0.06 |
| 240 | 24 | -0.04 / 0.02 / 0.06 | 0.89 / 0.89 / 0.92 | 0.07 / 0.08 / 0.08 |
| 240 | 48 | 0.02 / 0.03 / 0.00 | 0.97 / 0.95 / 0.95 | 0.06 / 0.05 / 0.06 |
| 240 | 96 | 0.02 / -0.01 / 0.01 | 0.98 / 0.98 / 1.00 | 0.06 / 0.05 / 0.04 |
| Cross-Sectional Heteroskedastic with Time-Serial Correlation | | | | |
| 120 | 24 | -1.19 / -1.20 / -1.17 | 0.87 / 0.93 / 0.97 | 0.14 / 0.15 / 0.17 |
| 120 | 48 | -0.32 / -0.52 / -0.51 | 1.01 / 0.99 / 0.98 | 0.05 / 0.09 / 0.09 |
| 120 | 96 | -0.16 / -0.24 / -0.25 | 0.96 / 0.97 / 0.97 | 0.07 / 0.06 / 0.08 |
| 240 | 24 | -1.28 / -1.31 / -1.20 | 0.86 / 0.92 / 0.98 | 0.23 / 0.26 / 0.30 |
| 240 | 48 | -0.37 / -0.57 / -0.56 | 0.96 / 0.94 / 0.97 | 0.08 / 0.15 / 0.17 |
| 240 | 96 | -0.12 / -0.24 / -0.29 | 0.98 / 1.00 / 0.98 | 0.06 / 0.08 / 0.12 |

Note:

Bias refers to relative biases in %, Ratio denotes the average ratios of standard errors and standard deviations, and Size is the empirical size of $z$-tests with 5 % nominal size. Results are based on 1,000 repetitions.

Table 2: Properties of $\hat{\beta}$ - Missing Data Pattern 2

Entries in each cell: ψ = 0 . / ψ = 0 . / ψ = 0 .

| N | T | Bias | Ratio | Size |
|---|---|---|---|---|
| Homoskedastic | | | | |
| 120 | 24 | 0.07 / -0.01 / 0.06 | 0.91 / 0.92 / 0.92 | 0.07 / 0.08 / 0.07 |
| 120 | 48 | 0.05 / 0.04 / 0.07 | 1.00 / 0.96 / 1.00 | 0.05 / 0.05 / 0.05 |
| 120 | 96 | 0.06 / 0.00 / 0.04 | 0.99 / 0.96 / 0.97 | 0.05 / 0.06 / 0.06 |
| 240 | 24 | 0.06 / 0.08 / 0.05 | 0.91 / 0.89 / 0.94 | 0.07 / 0.08 / 0.06 |
| 240 | 48 | 0.03 / -0.01 / 0.01 | 0.96 / 0.94 / 0.98 | 0.05 / 0.06 / 0.06 |
| 240 | 96 | 0.01 / 0.00 / 0.02 | 0.98 / 0.94 / 0.98 | 0.05 / 0.07 / 0.06 |
| Homoskedastic with Fat Tails | | | | |
| 120 | 24 | 0.15 / 0.05 / 0.12 | 0.92 / 0.87 / 0.88 | 0.07 / 0.08 / 0.07 |
| 120 | 48 | -0.01 / 0.00 / 0.03 | 0.94 / 0.93 / 0.94 | 0.06 / 0.06 / 0.06 |
| 120 | 96 | 0.00 / -0.01 / 0.00 | 0.97 / 0.98 / 0.97 | 0.06 / 0.06 / 0.06 |
| 240 | 24 | 0.03 / 0.07 / 0.04 | 0.92 / 0.90 / 0.88 | 0.07 / 0.07 / 0.07 |
| 240 | 48 | -0.01 / -0.01 / 0.05 | 0.97 / 0.91 / 0.95 | 0.06 / 0.06 / 0.07 |
| 240 | 96 | -0.01 / 0.02 / 0.03 | 0.97 / 0.96 / 0.96 | 0.05 / 0.06 / 0.07 |
| Cross-Sectional Heteroskedastic | | | | |
| 120 | 24 | 0.05 / 0.06 / 0.04 | 0.90 / 0.90 / 0.90 | 0.08 / 0.07 / 0.07 |
| 120 | 48 | -0.01 / 0.01 / 0.05 | 0.97 / 0.94 / 0.92 | 0.06 / 0.06 / 0.08 |
| 120 | 96 | 0.01 / -0.01 / 0.02 | 0.95 / 0.97 / 0.96 | 0.07 / 0.05 / 0.06 |
| 240 | 24 | -0.04 / 0.06 / 0.01 | 0.89 / 0.93 / 0.89 | 0.07 / 0.07 / 0.08 |
| 240 | 48 | 0.02 / 0.00 / 0.00 | 0.97 / 0.96 / 0.95 | 0.06 / 0.06 / 0.06 |
| 240 | 96 | 0.02 / 0.01 / 0.01 | 0.98 / 0.93 / 0.97 | 0.06 / 0.06 / 0.06 |
| Cross-Sectional Heteroskedastic with Time-Serial Correlation | | | | |
| 120 | 24 | -1.19 / -1.52 / -1.76 | 0.87 / 0.96 / 0.91 | 0.14 / 0.19 / 0.30 |
| 120 | 48 | -0.32 / -0.69 / -0.82 | 1.01 / 0.95 / 0.95 | 0.05 / 0.12 / 0.18 |
| 120 | 96 | -0.16 / -0.32 / -0.39 | 0.96 / 0.98 / 1.00 | 0.07 / 0.08 / 0.09 |
| 240 | 24 | -1.28 / -1.61 / -1.80 | 0.86 / 0.85 / 0.83 | 0.23 / 0.37 / 0.53 |
| 240 | 48 | -0.37 / -0.65 / -0.83 | 0.96 / 0.96 / 0.88 | 0.08 / 0.17 / 0.31 |
| 240 | 96 | -0.12 / -0.29 / -0.38 | 0.98 / 1.02 / 1.00 | 0.06 / 0.08 / 0.16 |

Note:

Bias refers to relative biases in %, Ratio denotes the average ratios of standard errors and standard deviations, and Size is the empirical size of $z$-tests with 5 % nominal size. Results are based on 1,000 repetitions.

Table 3: Properties of $\hat{\beta}$ - Missing Data Pattern 3

Entries in each cell: ψ = 0 . / ψ = 0 . / ψ = 0 .

| N | T | Bias | Ratio | Size |
|---|---|---|---|---|
| Homoskedastic | | | | |
| 120 | 24 | 0.07 / 0.09 / 0.02 | 0.91 / 0.90 / 0.90 | 0.07 / 0.08 / 0.08 |
| 120 | 48 | 0.05 / 0.04 / 0.00 | 1.00 / 0.92 / 0.96 | 0.05 / 0.08 / 0.06 |
| 120 | 96 | 0.06 / 0.03 / 0.04 | 0.99 / 0.96 / 0.99 | 0.05 / 0.06 / 0.05 |
| 240 | 24 | 0.06 / 0.08 / 0.03 | 0.91 / 0.92 / 0.90 | 0.07 / 0.08 / 0.08 |
| 240 | 48 | 0.03 / 0.01 / 0.03 | 0.96 / 0.92 / 0.95 | 0.05 / 0.07 / 0.06 |
| 240 | 96 | 0.01 / 0.03 / 0.00 | 0.98 / 0.98 / 0.96 | 0.05 / 0.06 / 0.06 |
| Homoskedastic with Fat Tails | | | | |
| 120 | 24 | 0.15 / 0.14 / 0.07 | 0.92 / 0.87 / 0.88 | 0.07 / 0.08 / 0.08 |
| 120 | 48 | -0.01 / 0.02 / 0.03 | 0.94 / 0.86 / 0.89 | 0.06 / 0.06 / 0.07 |
| 120 | 96 | 0.00 / 0.03 / 0.00 | 0.97 / 0.97 / 0.94 | 0.06 / 0.06 / 0.06 |
| 240 | 24 | 0.03 / 0.04 / 0.05 | 0.92 / 0.89 / 0.94 | 0.07 / 0.07 / 0.06 |
| 240 | 48 | -0.01 / 0.01 / 0.01 | 0.97 / 0.95 / 0.93 | 0.06 / 0.07 / 0.07 |
| 240 | 96 | -0.01 / 0.02 / 0.01 | 0.97 / 0.94 / 0.94 | 0.05 / 0.07 / 0.06 |
| Cross-Sectional Heteroskedastic | | | | |
| 120 | 24 | 0.05 / 0.13 / 0.13 | 0.90 / 0.91 / 0.90 | 0.08 / 0.07 / 0.09 |
| 120 | 48 | -0.01 / 0.07 / 0.05 | 0.97 / 0.95 / 0.93 | 0.06 / 0.07 / 0.07 |
| 120 | 96 | 0.01 / 0.01 / 0.00 | 0.95 / 0.99 / 0.96 | 0.07 / 0.05 / 0.06 |
| 240 | 24 | -0.04 / 0.00 / 0.01 | 0.89 / 0.94 / 0.92 | 0.07 / 0.05 / 0.07 |
| 240 | 48 | 0.02 / 0.02 / 0.02 | 0.97 / 0.99 / 1.00 | 0.06 / 0.06 / 0.05 |
| 240 | 96 | 0.02 / 0.00 / -0.01 | 0.98 / 0.95 / 0.98 | 0.06 / 0.06 / 0.06 |
| Cross-Sectional Heteroskedastic with Time-Serial Correlation | | | | |
| 120 | 24 | -1.19 / -1.53 / -1.84 | 0.87 / 0.92 / 0.94 | 0.14 / 0.21 / 0.33 |
| 120 | 48 | -0.32 / -0.66 / -0.87 | 1.01 / 0.97 / 0.94 | 0.05 / 0.10 / 0.21 |
| 120 | 96 | -0.16 / -0.29 / -0.45 | 0.96 / 0.98 / 1.01 | 0.07 / 0.08 / 0.11 |
| 240 | 24 | -1.28 / -1.62 / -1.93 | 0.86 / 0.94 / 0.90 | 0.23 / 0.37 / 0.59 |
| 240 | 48 | -0.37 / -0.70 / -0.92 | 0.96 / 0.97 / 0.92 | 0.08 / 0.17 / 0.36 |
| 240 | 96 | -0.12 / -0.29 / -0.45 | 0.98 / 0.97 / 0.97 | 0.06 / 0.10 / 0.21 |

Note:

of missing data increases. While the accuracy of the different estimators in pattern 1 is still very close to that in balanced panels, this is only partially the case in the other two patterns. Intuitively, if the missing data pattern consists of large blocks without any observations, the information used to estimate the common factors and loadings, which are used to augment the missing observations, is substantially lower and leads to noisy estimates. This explains why the performance in patterns 2 and 3, which contain such large blocks, is relatively worse compared to pattern 1.

To sum up, we find that the properties of the IFE estimator in the presence of randomly missing data are fairly well approximated by the asymptotic theory derived by Bai (2009) and Moon and Weidner (2017). Further, the accuracy of the different estimators for the number of factors differs substantially across fractions and patterns of randomly missing data. Overall, these findings are very different from those of conventional fixed effects models where neither the fraction nor the pattern of randomly missing data affects inference (see Czarnowske and Stammann 2019 for fixed effects binary choice models).

Table 4: Expected Value of $\hat{R}$ - Missing Data Pattern 1

[Table body not recoverable. Columns: N, T, and IC, BIC, ER, GR, ED, PA, each reported for the three values of ψ. Panels: Homoskedastic; Homoskedastic with Fat Tails; Cross-Sectional Heteroskedastic; Cross-Sectional Heteroskedastic with Time-Serial Correlation.]

Note: IC and BIC denote the information criteria of Bai and Ng (2002), ER and GR are the estimators of Ahn and Horenstein (2013), ED is the estimator of Onatski (2010), and PA is the deflated parallel analysis suggested by Dobriban and Owen (2019). True number of factors is two. The initial estimator for $\beta$ uses R = ⌈(min(N, T)/ ) / ⌉ factors. Results are based on 1,000 repetitions.

Table 5: Expected Value of $\hat{R}$ - Missing Data Pattern 2

[Table body not recoverable. Columns and panels as in table 4.]

Note: IC and BIC denote the information criteria of Bai and Ng (2002), ER and GR are the estimators of Ahn and Horenstein (2013), ED is the estimator of Onatski (2010), and PA is the deflated parallel analysis suggested by Dobriban and Owen (2019). True number of factors is two. The initial estimator for $\beta$ uses R = ⌈(min(N, T)/ ) / ⌉ factors. Results are based on 1,000 repetitions.

Table 6: Expected Value of $\hat{R}$ - Missing Data Pattern 3

[Table body not recoverable. Columns and panels as in table 4.]

Note: IC and BIC denote the information criteria of Bai and Ng (2002), ER and GR are the estimators of Ahn and Horenstein (2013), ED is the estimator of Onatski (2010), and PA is the deflated parallel analysis suggested by Dobriban and Owen (2019). True number of factors is two. The initial estimator for $\beta$ uses R = ⌈(min(N, T)/ ) / ⌉ factors. Results are based on 1,000 repetitions.

Empirical Illustration

Whether democracy causes economic growth is a long-standing and very controversial question among economists. Recently, Acemoglu et al. (2019) provide evidence that democratization has a very substantial positive impact on GDP per capita. Using annual data on 175 countries observed between 1960 and 2010, their main findings suggest a long-run effect of about 20 %. The data set constructed by the authors is very well suited for our purposes as it is naturally unbalanced and covers a very long time horizon with several common shocks induced by technological progress and financial crises. Overall the sample consists of 6934 observations, of which 3558 are classified as democratic. Across 88 different countries, there are 122 transitions to democracy and 71 to non-democracy. The average GDPs, measured in year 2000 dollars, are 8150 for democratic and 2074 for non-democratic countries. 71 countries were observed over the entire time horizon such that, on average, the data set covers 136 countries and 40 years. The fraction and pattern of missing data are comparable to the setting with ψ = 0 . and pattern 3 in our simulation study.

We reassess the baseline analysis of Acemoglu et al. (2019) using the IFE estimator and the following specification:

$$y_{it} = \beta D_{it} + \sum_{j=1}^{p} \gamma_j y_{it-j} + \alpha_i + \delta_t + \lambda_i' f_t + e_{it}, \quad (20)$$

where $i$ and $t$ are country and time specific indexes, $D_{it}$ is an indicator for being a democracy, and $y_{it}$ is the corresponding natural logarithm of GDP per capita. $\alpha_i$ and $\delta_t$ are additive fixed effects that capture time-invariant country characteristics and control for the global business cycle, respectively. Contrary to Acemoglu et al. (2019), we further decompose the time-varying unobservable shocks into a factor structure $\lambda_i' f_t$ and a remaining idiosyncratic component $e_{it}$. This allows us to capture common shocks ($f_t$), which simultaneously affect the growth and democratization of a country in different ways ($\lambda_i$). The dynamic specification permits a distinction between short- and long-run effects of democratization, where the former is $\hat{\beta}$ and the latter can be computed as $\hat{\phi} = \hat{\beta} / (1 - \sum_{j=1}^{p} \hat{\gamma}_j)$. We use $p \in \{1, 2, 4\}$, where $p = 4$ is the preferred specification of Acemoglu et al. (2019).

To reduce the number of parameters during the optimization, we project the country- and time-specific effects out of the dependent variable and all regressors before estimating $\beta$ and $\gamma$ (see Bai 2009 section 8). The model after the projection becomes

$$\ddot{y}_{it} = \beta \ddot{D}_{it} + \sum_{j=1}^{p} \gamma_j \ddot{y}_{it-j} + \ddot{\lambda}_i' \ddot{f}_t + e_{it}, \quad (21)$$

where the two dots above denote variables after projecting out both additive effects. In the absence of any common factors ($R = 0$), this is simply the conventional fixed effects model.

For valid inference it is important to know the true number of factors, or at least an estimate that is larger but close to the true number. Because the true number of factors is unknown, we proceed as follows: First, we estimate each specification with $R = 10$ to obtain the pure factor models $P_D(\ddot{W}(\hat{\beta}, \hat{\gamma}))$, where $\ddot{w}_{it}(\hat{\beta}, \hat{\gamma}) = \hat{\beta} \ddot{D}_{it} + \sum_{j=1}^{p} \hat{\gamma}_j \ddot{y}_{it-j}$. Note that the number of factors chosen is equal to the rule of thumb used during the simulation experiments. Afterwards, we apply the estimators suggested by Bai and Ng (2002), Onatski (2010), Ahn and Horenstein (2013), and Dobriban and Owen (2019) to estimate the number of factors. Table 7 summarizes the results. The estimates are almost identical across different specifications. Both model selection criteria of Bai and Ng (2002) predict a substantially larger number of factors compared to the other estimators, where IC always predicts the upper bound. This is in line with Ahn and Horenstein (2013) who find that the information criteria are quite sensitive to the chosen upper bound on the number of factors and tend to overestimate.
We partially observe the same behavior during our simulation experiments. Contrary to the model selection criteria, the estimators of Onatski (2010), Ahn and Horenstein (2013), and Dobriban and Owen (2019) all suggest one or three common factors. Additionally, figure 2 shows the singular values of the pure factor models (non-filled dots) and those for a permuted version of the data (filled dots). More precisely, we randomly shuffle each column of $P_D(\ddot{W}(\hat{\beta}, \hat{\gamma}))$ and compute the maximum of each singular value across 199 randomized samples. Note that this is essentially a graphical illustration of the parallel analysis without the deflation proposed by Dobriban and Owen (2019). The large gap between the first and the second common factor explains why most of the estimators that try to decompose the eigenvalue spectrum predict one factor. However, if we compare the spectra with those of permuted data, we find that factors two and three have some additional explanatory power even if it is quite low in terms of variance explained. If we additionally consider the results of Moon and Weidner (2015), who showed that overestimating the number of factors is better than underestimating, $R = 3$ is our preferred choice.

Table 8 summarizes the results of different additive and interactive fixed effects estimators (Interactive). As in Acemoglu et al. (2019), we report results for the fixed effects estimator (Within), the Arellano-Bond estimator (AB, see Arellano and Bond 1991), and the Hahn-Hausman-Kuersteiner estimator (HHK, see Hahn, Hausman, and Kuersteiner 2004). However, instead of the conventional fixed effects estimator used by the authors, we report results of a bias-corrected within estimator with bandwidth $L = 5$ that accounts for the asymptotic bias induced by the predetermined regressor (see Nickell 1981). For Interactive, we report results for $R \in \{1, 2, 3\}$. To correct for the Nickell (1981) bias and those biases induced by cross-sectional heteroskedasticity and time-serial correlation, we use the asymptotic bias corrections proposed by Bai (2009) and Moon and Weidner (2017) with $L = 5$ and $M = 3$. Similar to Acemoglu et al. (2019), we report estimates and standard errors of the short- and long-run effects of democratization and the persistence of GDP processes. Further, all standard errors are heteroskedasticity robust and clustered at the country level to allow for arbitrary patterns of time-serial correlation. We find that all estimators reveal a strong and significant persistence of GDP processes across all specifications. The coefficients of democratization obtained by Within and Interactive with $R \geq 2$ are always significant at the 5 % level, whereas those of AB and HHK are only significant for $p = 4$. If we focus on $p = 4$, which is the preferred specification of Acemoglu et al. (2019), the fixed effects models used by the authors reveal short-run effects of a transition to democracy between 0.828 % and 1.178 % and long-run effects between 16.448 % and 29.262 %. However, after controlling for additional time-varying unobserved heterogeneity, we find short- and long-run effects that are substantially lower compared to those reported by the authors. Our preferred Interactive with $R = 3$ yields short- and long-run estimates of 0.622 % and 18.264 %.

Next we consider two different sensitivity checks. First, the estimation of the asymptotic biases requires different bandwidth choices. We check the sensitivity of the results by analyzing all combinations of the following bandwidth choices: L ∈ { , . . . , } and M ∈ { , . . . , }. Second, we report estimates of Interactive for $R \in \{4, \dots, 10\}$.

Table 7: Estimated Number of Factors

| Specification | IC | BIC | ER | GR | ED | PA |
|---|---|---|---|---|---|---|
| p = 1 | 10 | 7 | 1 | 1 | 1 | 3 |
| p = 2 | 10 | 8 | 1 | 1 | 1 | 3 |
| p = 4 | 10 | 8 | 1 | 1 | 3 | 3 |

Note: IC and BIC denote the information criteria of Bai and Ng (2002), ER and GR are the estimators of Ahn and Horenstein (2013), ED is the estimator of Onatski (2010), and PA is the deflated parallel analysis suggested by Dobriban and Owen (2019). Estimators applied to $P_D(\ddot{W}(\hat{\beta}, \hat{\gamma}))$. The initial estimator for $\beta$ and $\gamma$ uses $R = 10$.

Figure 2: Largest Singular Values in Descending Order. Panels: Specification 1, Specification 2, Specification 3; horizontal axis: Factor Number; vertical axis: Singular Value. The twenty largest singular values for $P_D(\ddot{W}(\hat{\beta}, \hat{\gamma})) / \sqrt{NT}$ are denoted by non-filled dots and those for permuted data by filled dots. The values for permuted data are based on 199 replications. The initial estimator for $\beta$ and $\gamma$ uses $R = 10$.
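The permutation check behind figure 2 (shuffle each column, recompute the singular values, and keep the maximum of each singular value across 199 shuffled samples) could be sketched as follows. This is a minimal illustration in Python with NumPy, not the authors' R implementation. It assumes a complete matrix with individuals in rows and time periods in columns, so the projection $P_D$ for missing data and the deflation of Dobriban and Owen (2019) are both left out. The function name `permutation_singular_values` and the simulated one-factor matrix are hypothetical.

```python
import numpy as np


def permutation_singular_values(w, n_perm=199, seed=0):
    """Singular values of w / sqrt(N*T) and, for each position, the
    maximum singular value across column-wise permuted copies of w."""
    rng = np.random.default_rng(seed)
    n, t = w.shape
    scale = np.sqrt(n * t)
    observed = np.linalg.svd(w / scale, compute_uv=False)
    max_perm = np.zeros_like(observed)
    for _ in range(n_perm):
        # Shuffle every column independently: column variances are kept,
        # but the correlation induced by common factors is broken.
        shuffled = np.column_stack([rng.permutation(col) for col in w.T])
        sv = np.linalg.svd(shuffled / scale, compute_uv=False)
        max_perm = np.maximum(max_perm, sv)
    return observed, max_perm


# Hypothetical example: one strong common factor plus noise.
rng = np.random.default_rng(1)
loadings = rng.normal(size=(60, 1))
factors = rng.normal(size=(40, 1))
w = 3.0 * loadings @ factors.T + rng.normal(size=(60, 40))

observed, max_perm = permutation_singular_values(w, n_perm=199)
n_above = int(np.sum(observed > max_perm))
print("singular values above the permuted maximum:", n_above)
```

Singular values that clearly exceed their permuted counterparts point to factors with explanatory power beyond what the column variances alone would produce, which is the comparison drawn between the non-filled and filled dots.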
As shown by Moon and Weidner (2015), the inclusion of additional redundant common factors should only affect the precision of the IFE estimator after controlling for all relevant common factors. Tables 9 and 10 summarize the results. With respect to the different bandwidth choices, we find that the results of Interactive are very robust to all combinations of bandwidth choices as indicated by the narrow intervals reported in the table. In contrast, the estimates of Within are more sensitive. For $p = 4$, we find long-run effects between 21.571 % and 32.289 %. With respect to the number of factors, we find that after controlling for more than three, the estimated persistence of the GDP process starts declining. The same pattern was also recognized in the empirical illustration of Moon and Weidner (2015). The authors argue that the dynamic specification might be misspecified in the sense that the lagged outcome variables simply capture time-serial correlation in the idiosyncratic error term instead of true state dependence. Because the factor structure also captures time-serial correlation, this might indicate that there is no true state dependence. In contrast, the coefficients of democratization become larger

11. This estimator was also used by Chen, Chernozhukov, and Fernández-Val (2019) for the same illustration, but on a balanced subset of the data. The authors also proposed a split-panel jackknife bias correction to reduce the many moment bias of the Arellano-Bond estimator (see Newey and Smith 2004).

12. All covariance estimators use a degrees-of-freedom adjustment to improve their finite sample properties.

Table 8: Effect of Democracy on Logarithmic GDP per Capita (× 100)

| | Within | AB | HHK | Interactive R = 1 | Interactive R = 2 | Interactive R = 3 |
|---|---|---|---|---|---|---|
| Specification 1 - p = 1 | | | | | | |
| Democracy | 1.051 | 0.959 | 0.781 | 0.745 | 0.755 | 0.809 |
| | (0.293) | (0.477) | (0.455) | (0.329) | (0.287) | (0.305) |
| Persistence of GDP process | 0.983 | 0.946 | 0.938 | 0.960 | 0.973 | 0.967 |
| | (0.006) | (0.009) | (0.011) | (0.007) | (0.006) | (0.007) |
| Long-run effect of democracy | 60.489 | 17.608 | 12.644 | 18.644 | 27.690 | 24.546 |
| | (28.073) | (10.609) | (8.282) | (9.460) | (12.250) | (11.131) |
| Specification 2 - p = 2 | | | | | | |
| Democracy | 0.671 | 0.797 | 0.582 | 0.488 | 0.554 | 0.559 |
| | (0.247) | (0.417) | (0.387) | (0.292) | (0.263) | (0.279) |
| Persistence of GDP process | 0.975 | 0.946 | 0.941 | 0.956 | 0.968 | 0.966 |
| | (0.005) | (0.009) | (0.010) | (0.007) | (0.005) | (0.006) |
| Long-run effect of democracy | 26.513 | 14.882 | 9.929 | 11.030 | 17.258 | 16.557 |
| | (12.026) | (9.152) | (7.258) | (7.174) | (8.881) | (9.005) |
| Specification 3 - p = 4 | | | | | | |
| Democracy | 0.828 | 0.875 | 1.178 | 0.513 | 0.600 | 0.622 |
| | (0.225) | (0.374) | (0.370) | (0.267) | (0.259) | (0.249) |
| Persistence of GDP process | 0.972 | 0.947 | 0.953 | 0.958 | 0.964 | 0.966 |
| | (0.005) | (0.009) | (0.009) | (0.006) | (0.006) | (0.005) |
| Long-run effect of democracy | 29.262 | 16.448 | 25.032 | 12.226 | 16.749 | 18.264 |
| | (10.281) | (8.436) | (10.581) | (6.780) | (7.956) | (8.030) |

Note:

Within, AB, HHK, and Interactive denote the bias-corrected fixed effects estimator, the Arellano-Bond estimator, the Hahn-Hausman-Kuersteiner estimator, and the IFE estimator. Standard errors in parentheses are heteroskedasticity robust and clustered at the country level. Within and Interactive use bandwidths $L = 5$ and $M = 3$ for the estimation of the asymptotic biases. The results of AB and HHK are taken from table 2 in Acemoglu et al. (2019).

Table 9: Sensitivity to Different Bandwidth Choices

| | Within | Interactive R = 1 | Interactive R = 2 | Interactive R = 3 |
|---|---|---|---|---|
| Specification 1 - p = 1 | | | | |
| Democracy | [0.986; 1.057] | [0.744; 0.775] | [0.755; 0.806] | [0.806; 0.867] |
| Persistence of GDP process | [0.974; 0.984] | [0.958; 0.960] | [0.972; 0.973] | [0.966; 0.968] |
| Long-run effect of democracy | [38.127; 64.280] | [18.551; 18.872] | [27.336; 29.065] | [24.368; 25.713] |
| Specification 2 - p = 2 | | | | |
| Democracy | [0.633; 0.674] | [0.488; 0.522] | [0.554; 0.609] | [0.548; 0.583] |
| Persistence of GDP process | [0.968; 0.976] | [0.954; 0.956] | [0.967; 0.968] | [0.966; 0.967] |
| Long-run effect of democracy | [19.509; 27.723] | [10.987; 11.369] | [17.231; 18.721] | [16.158; 17.150] |
| Specification 3 - p = 4 | | | | |
| Democracy | [0.773; 0.849] | [0.504; 0.547] | [0.596; 0.635] | [0.620; 0.661] |
| Persistence of GDP process | [0.964; 0.974] | [0.957; 0.958] | [0.964; 0.964] | [0.965; 0.966] |
| Long-run effect of democracy | [21.571; 32.289] | [11.945; 12.672] | [16.548; 17.715] | [18.113; 19.352] |

Note: Effect of democracy on logarithmic GDP per capita ( × ). Within and Interactive denote the bias-corrected fixed effects estimator and the IFE estimator. The intervals denote the ranges of all estimates across different combinations of L ∈ { , ..., } and M ∈ { , ..., }.

Finally, we consider an additional specification without predetermined regressors (p = 0). Again, we estimate the number of factors from a pure factor model, where the initial estimate is based on R = 10. The estimates are identical to those of p = 4 and provide further support for our preferred choice of R = 3. The corresponding estimate of the effect of democratization is -1.251 % (standard error = 1.286 %) and is in line with Barro (1996), who reports a negative and/or insignificant effect of democracy on growth.

To sum up, we find some additional support for the hypothesis of Acemoglu et al. (2019): democracy does cause growth. Using the IFE estimator to control for time-varying unobserved heterogeneity, we obtain results that are qualitatively similar to those of the authors. If we compare HHK to Interactive with R = 3 in the authors' preferred specification p = 4, we find that the short-run effect of democratization is roughly halved (0.622 compared to 1.178). However, the corresponding long-run effect of 18.264 % is still close to the 20 % reported by Acemoglu et al. (2019).

Table 10: Sensitivity to the Number of Factors

| | R = 4 | R = 5 | R = 6 | R = 7 | R = 8 | R = 9 | R = 10 |
|---|---|---|---|---|---|---|---|
| Specification 1: p = 1 | | | | | | | |
| Democracy | 0.417 | 0.160 | 1.101 | 1.553 | 1.615 | 1.619 | 2.197 |
| | (0.511) | (0.486) | (0.516) | (0.564) | (0.602) | (0.634) | (0.718) |
| Persistence of GDP process | 0.848 | 0.854 | 0.754 | 0.702 | 0.600 | 0.554 | 0.453 |
| | (0.018) | (0.020) | (0.030) | (0.035) | (0.034) | (0.039) | (0.040) |
| Long-run effect of democracy | 2.749 | 1.095 | 4.476 | 5.220 | 4.037 | 3.635 | 4.019 |
| | (3.363) | (3.339) | (2.140) | (1.939) | (1.551) | (1.479) | (1.394) |
| Specification 2: p = 2 | | | | | | | |
| Democracy | 0.388 | 0.868 | 1.178 | 1.424 | 1.905 | 1.150 | 1.418 |
| | (0.469) | (0.446) | (0.511) | (0.565) | (0.614) | (0.651) | (0.705) |
| Persistence of GDP process | 0.810 | 0.689 | 0.586 | 0.485 | 0.369 | 0.342 | 0.247 |
| | (0.019) | (0.031) | (0.036) | (0.031) | (0.038) | (0.042) | (0.056) |
| Long-run effect of democracy | 2.041 | 2.795 | 2.845 | 2.767 | 3.019 | 1.747 | 1.882 |
| | (2.478) | (1.458) | (1.266) | (1.129) | (1.013) | (1.018) | (0.968) |
| Specification 3: p = 4 | | | | | | | |
| Democracy | 0.763 | 1.091 | 1.452 | 1.474 | 1.513 | 1.474 | 0.476 |
| | (0.455) | (0.446) | (0.534) | (0.567) | (0.557) | (0.596) | (0.589) |
| Persistence of GDP process | 0.628 | 0.593 | 0.380 | 0.213 | 0.174 | 0.187 | -0.202 |
| | (0.025) | (0.038) | (0.049) | (0.050) | (0.056) | (0.092) | (0.075) |
| Long-run effect of democracy | 2.053 | 2.683 | 2.341 | 1.874 | 1.831 | 1.814 | 0.396 |
| | (1.211) | (1.124) | (0.883) | (0.729) | (0.692) | (0.766) | (0.493) |

Note: Effect of democracy on logarithmic GDP per capita ( × ). Results obtained by the interactive fixed effects estimator for R ∈ {4, ..., 10}. Standard errors in parentheses are heteroskedasticity robust and clustered at the country level. Bandwidths L = 5 and M = 3 are used for the estimation of the asymptotic biases.

Further Extensions

Although we analyzed the IFE estimator of Bai (2009), we want to point out two natural extensions of our findings. First, in the presence of regressors that are endogenous with respect to the idiosyncratic error term, Moon and Weidner (2017) and Moon, Shum, and Weidner (2018) suggest a minimum distance estimator with interactive fixed effects in the spirit of Chernozhukov and Hansen (2006, 2008). Second, because the objective function of the IFE estimator is generally non-convex, Moon and Weidner (2019) suggest an alternative estimator that avoids the potentially difficult optimization problem with multiple local minima by optimizing a convex objective function instead.
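As a rough numerical illustration of the second point, the following Python sketch compares the two kinds of objective functions on simulated data and then refines a nuclear norm starting value by alternating between factor extraction and least squares. It is a sketch under strong simplifying assumptions and not the implementation used in this article: the panel is balanced (so no data augmentation or projection onto the observed entries is needed), there is a single regressor, minimization is done by grid search, and all function names, the simulation design, and the choice of projected variables in the refinement loop are hypothetical.

```python
import numpy as np


def residual_singular_values(beta, Y, X):
    """Singular values of W(beta) = Y - beta * X, largest first."""
    return np.linalg.svd(Y - beta * X, compute_uv=False)


def ife_profile_objective(beta, Y, X, R):
    """Least-squares objective after concentrating out R factors.

    For a balanced panel this is the sum of the squared singular values
    of W(beta) beyond the R largest ones; it need not be convex in beta.
    """
    s = residual_singular_values(beta, Y, X)
    return float(np.sum(s[R:] ** 2))


def nuclear_norm_objective(beta, Y, X):
    """Sum of all singular values of W(beta); convex in beta."""
    return float(np.sum(residual_singular_values(beta, Y, X)))


def post_estimation(beta, Y, X, R, n_iter=10):
    """Refine beta by alternating factor extraction and OLS (assumed form)."""
    N, T = Y.shape
    for _ in range(n_iter):
        # Step 1: loadings and factors from the leading singular vectors of W(beta).
        U, _, Vt = np.linalg.svd(Y - beta * X, full_matrices=False)
        M_lam = np.eye(N) - U[:, :R] @ U[:, :R].T
        M_f = np.eye(T) - Vt[:R].T @ Vt[:R]
        # Step 2: project the estimated loadings and factors out of y and x.
        y_breve = M_lam @ Y @ M_f
        x_breve = M_lam @ X @ M_f
        # Step 3: OLS update (a scalar, because there is a single regressor).
        beta = float(np.sum(x_breve * y_breve) / np.sum(x_breve ** 2))
    return beta


# Simulated balanced panel; all numbers are arbitrary illustration values.
rng = np.random.default_rng(0)
N, T, R, beta0 = 50, 40, 2, 1.0
lam = rng.normal(size=(N, R))
f = rng.normal(size=(T, R))
X = lam @ f.T + rng.normal(size=(N, T))  # regressor correlated with the factors
Y = beta0 * X + lam @ f.T + rng.normal(size=(N, T))

grid = np.linspace(-1.0, 3.0, 81)
ife_vals = np.array([ife_profile_objective(b, Y, X, R) for b in grid])
nuc_vals = np.array([nuclear_norm_objective(b, Y, X) for b in grid])

beta_ife_grid = float(grid[np.argmin(ife_vals)])
beta_nuc_grid = float(grid[np.argmin(nuc_vals)])
beta_post = post_estimation(beta_nuc_grid, Y, X, R)

print(beta_ife_grid, beta_nuc_grid, beta_post)
```

Plotting `ife_vals` and `nuc_vals` against `grid` shows the shape of both objectives. The second differences of `nuc_vals` are non-negative up to rounding, as convexity requires, whereas no such guarantee exists for `ife_vals`.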

Extension 1: Minimum Distance Estimator

Suppose that $x_{it}$ can be further decomposed into $K_1$ endogenous and $K_2$ exogenous regressors such that $K = K_1 + K_2$. To avoid ambiguity, we label endogenous and exogenous regressors with an appropriate superscript. Further, let $z_{it} = [z_{it,1}, \dots, z_{it,M}]$ be a vector of excluded exogenous instruments with $M \geq K_1$. Moon and Weidner (2017) suggest the following minimum distance estimator. In a first step, an estimator for $\beta^{end}$ is obtained by

$$\hat{\beta}^{end} = \arg\min_{\beta^{end} \in \mathbb{R}^{K_1}} \hat{\pi}\left(\beta^{end}\right)' \, \Sigma \, \hat{\pi}\left(\beta^{end}\right), \tag{22}$$

where $\hat{\pi}(\beta^{end})$ is the IFE estimator of $\pi$ in

$$y_{it} - x_{it}^{end\prime} \beta^{end} = x_{it}^{exo\prime} \beta^{exo} + z_{it}' \pi + \lambda_i' f_t + e_{it} \tag{23}$$

and $\Sigma$ is a positive definite $M \times M$ weighting matrix. At the true value of $\beta^{end}$, $\pi$ is zero given the exclusion restriction on $z_{it}$. In a second step, $\hat{\beta}^{exo}$ is the IFE estimator of

$$y_{it} - x_{it}^{end\prime} \hat{\beta}^{end} = x_{it}^{exo\prime} \beta^{exo} + \lambda_i' f_t + e_{it}. \tag{24}$$

The properties of the minimum distance estimator are studied in Moon, Shum, and Weidner (2018), where the authors extend the random coefficient demand model of Berry, Levinsohn, and Pakes (1995) with interactive fixed effects to account for unobserved product-market specific heterogeneity, such as advertising. Under assumptions very similar to those in Moon and Weidner (2017), the authors show consistency and derive the asymptotic distribution of the minimum distance estimator. Because their estimator embeds the IFE estimator, we can apply the same algorithms and estimators studied in this article. Further, Lee, Moon, and Weidner (2012) use the same estimator to account for measurement errors in the dependent variable in dynamic interactive fixed effects models.

Extension 2: Nuclear Norm Minimizing Estimator

Moon and Weidner (2019) show that the imposed rank constraint on the factor structure leads to a non-convex optimization problem. The authors suggest an alternative estimator based on a convex relaxation of this constraint.
More precisely, they show that an estimator for $\beta$ is

$$\hat{\beta}^{\star} = \arg\min_{\beta \in \mathbb{R}^{K}} \frac{1}{n} \sum_{r=1}^{\min(N,T)} \sigma_r\left[ P_D\left(W(\beta)\right) \right], \tag{25}$$

where $\sigma_r[\cdot]$ denotes the $r$-th largest singular value. Moon and Weidner (2019) show consistency of this estimator, but only at a rate of $\sqrt{\min(N, T)}$. As a consequence, the convex relaxation leads to a certain loss of efficiency compared to the IFE estimator.

To recover the properties of the IFE estimator, Moon and Weidner (2019) suggest estimating the number of factors from $P_D(W(\hat{\beta}^{\star}))$ and afterwards applying an iterative post estimation routine. After a finite number of iterations, the estimator has the same limiting distribution as the IFE estimator. The post estimation routine can be summarized as follows:

Definition.

Post nuclear norm estimation

Given $\hat{\beta}^{\star}$ and $R$, initialize $\hat{\beta} = \hat{\beta}^{\star}$ and repeat the following steps a finite number of times:

Step 1: Compute $\hat{F}(\hat{\beta})$ and $\hat{\Lambda}(\hat{\beta})$ from $\tilde{W}(\hat{\beta})$.

Step 2: Compute $\breve{y}$ and $\breve{x}_k$ for all $k \in \{1, \dots, K\}$.

Step 3: Update $\hat{\beta} = (\breve{X}' \breve{X})^{-1} \breve{X}' \breve{y}$, where $\breve{X} = [\breve{x}_1, \dots, \breve{x}_K]$.

Conclusion

The assumption that unobserved heterogeneity is constant over time is often very restrictive. Especially in panels that cover a long time horizon, like macroeconomic panels of countries, it is unlikely that a global shock affects all countries equally. Interactive fixed effects estimators offer researchers new possibilities to consider this more general form of heterogeneity in their analysis (see among others Holtz-Eakin, Newey, and Rosen 1988; Pesaran 2006; Bai 2009). However, these panels are often naturally unbalanced, demanding an additional data augmentation step for the estimator of Bai (2009) (see Appendix of Bai 2009 and Bai, Liao, and Yang 2015).

In this article, we analyzed the finite sample behavior of Bai (2009)'s interactive fixed effects estimator in the presence of randomly missing data. Simulation experiments confirmed that the inferential theory derived by Bai (2009) and Moon and Weidner (2017) for balanced data also provides a reasonable approximation for the unbalanced case. However, we also found that the finite sample performance can be affected by the fraction and pattern of missing data.

Future research could address this issue and provide an inferential theory that takes the additional uncertainty induced by data augmentation into account. This might help to improve the finite sample behavior of Bai (2009)'s estimator in the presence of randomly missing data.

References

Acemoglu, Daron, Suresh Naidu, Pascual Restrepo, and James A. Robinson. 2019. “Democracy Does Cause Growth.” Journal of Political Economy 127 (1): 47–100.

Ahn, Seung C., and Alex R. Horenstein. 2013. “Eigenvalue Ratio Test for the Number of Factors.” Econometrica 81 (3): 1203–1227.

Alessi, Lucia, Matteo Barigozzi, and Marco Capasso. 2010. “Improved penalization for determining the number of factors in approximate factor models.” Statistics & Probability Letters 80 (23): 1806–1813.

Anderson, T.W., and Cheng Hsiao. 1982. “Formulation and estimation of dynamic models using panel data.” Journal of Econometrics 18 (1): 47–82.

Arellano, Manuel, and Stephen Bond. 1991. “Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations.” The Review of Economic Studies 58 (2): 277–297.

Bai, Jushan. 2003. “Inferential Theory for Factor Models of Large Dimensions.” Econometrica 71 (1): 135–171.

Bai, Jushan. 2009. “Panel Data Models with Interactive Fixed Effects.” Econometrica 77 (4): 1229–1279.

Bai, Jushan, and Yuan Liao. 2017. “Inferences in panel data with interactive effects using large covariance matrices.” Journal of Econometrics 200 (1): 59–78.

Bai, Jushan, Yuan Liao, and Jisheng Yang. 2015. “Unbalanced Panel Data Models with Interactive Effects.” In The Oxford Handbook of Panel Data, [...]

[...] Econometrica 70 (1): 191–221.

Bai, Jushan, and Serena Ng. 2013. “Principal components estimation and identification of static factors.” Journal of Econometrics 176 (1): 18–29.

Barro, Robert J. 1996. “Democracy and Growth.” Journal of Economic Growth [...]

[...] Econometrica 63 (4): 841–890.

Bonhomme, Stéphane, and Elena Manresa. 2015. “Grouped Patterns of Heterogeneity in Panel Data.” Econometrica 83 (3): 1147–1184.

Buja, Andreas, and Nermin Eyuboglu. 1992. “Remarks on Parallel Analysis.” Multivariate Behavioral Research 27 (4): 509–540.

Chamberlain, Gary. 1982. “Multivariate regression models for panel data.” Journal of Econometrics 18 (1): 5–46.

Chamberlain, Gary. 1984. “Chapter 22 Panel data,” 2:1247–1318. Handbook of Econometrics.

Chen, Mingli, Iván Fernández-Val, and Martin Weidner. 2019. “Nonlinear Factor Models for Network and Panel Data.” arXiv preprint arXiv:1412.5647.

Chen, Shuowen, Victor Chernozhukov, and Iván Fernández-Val. 2019. “Mastering Panel Metrics: Causal Impact of Democracy on Growth.” AEA Papers and Proceedings [...]

[...] Journal of Econometrics 132 (2): 491–525.

Chernozhukov, Victor, and Christian Hansen. 2008. “Instrumental variable quantile regression: A robust inference approach.” Journal of Econometrics 142 (1): 379–398.

Choi, In, and Hanbat Jeong. 2019. “Model selection for factor analysis: Some new criteria and performance comparisons.” Econometric Reviews 38 (6): 577–596.

Czarnowske, Daniel, and Amrei Stammann. 2019. “Binary Choice Models with High-Dimensional Individual and Time Fixed Effects.” arXiv preprint arXiv:1904.04217.

Dobriban, Edgar. Forthcoming. “Permutation methods for factor analysis and PCA.” Annals of Statistics.

Dobriban, Edgar, and Art B. Owen. 2019. “Deterministic parallel analysis: an improved method for selecting factors and principal components.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 81 (1): 163–183.

Fernández-Val, Iván, and Martin Weidner. 2018. “Fixed Effects Estimation of Large-T Panel Data Models.” Annual Review of Economics 10 (1): 109–138.

Fong, David Chin-Lung, and Michael Saunders. 2011. “LSMR: An Iterative Algorithm for Sparse Least-Squares Problems.” SIAM Journal on Scientific Computing 33 (5): 2950–2971.

Gagliardini, Patrick, Elisa Ossola, and Olivier Scaillet. 2019. “A diagnostic criterion for approximate factor structure.” Journal of Econometrics 212 (2): 503–521.

Gaure, Simen. 2013. “OLS with multiple high dimensional category variables.” Computational Statistics & Data Analysis [...]

[...] Stata Journal 10 (4): 628–649.

Hahn, Jinyong, Jerry Hausman, and Guido Kuersteiner. 2004. “Estimation with weak instruments: Accuracy of higher-order bias and MSE approximations.” The Econometrics Journal [...]

[...] Journal of the American Statistical Association 102 (478): 603–617.

Halperin, Israel. 1962. “The product of projection operators.” Acta Sci. Math. (Szeged) [...]

[...] Econometrica 56 (6): 1371–1395.

Lee, Nayoung, Hyungsik Roger Moon, and Martin Weidner. 2012. “Analysis of interactive fixed effects dynamic linear panel regression with measurement error.” Economics Letters 117 (1): 239–242.

Moon, Hyungsik Roger, Matthew Shum, and Martin Weidner. 2018. “Estimation of random coefficients logit demand models with interactive fixed effects.” Journal of Econometrics 206 (2): 613–644.

Moon, Hyungsik Roger, and Martin Weidner. 2015. “Linear Regression for Panel With Unknown Number of Factors as Interactive Fixed Effects.” Econometrica 83 (4): 1543–1579.

Moon, Hyungsik Roger, and Martin Weidner. 2017. “Dynamic Linear Panel Regression Models with Interactive Fixed Effects.” Econometric Theory 33 (1): 158–195.

Moon, Hyungsik Roger, and Martin Weidner. 2019. “Nuclear Norm Regularized Estimation of Panel Regression Models.” arXiv preprint arXiv:1810.10987.

Mundlak, Yair. 1978. “On the Pooling of Time Series and Cross Section Data.” Econometrica 46 (1): 69–85.

Newey, Whitney K., and Richard J. Smith. 2004. “Higher Order Properties of GMM and Generalized Empirical Likelihood Estimators.” Econometrica 72 (1): 219–255.

Newey, Whitney K., and Kenneth D. West. 1987. “A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix.” Econometrica 55 (3): 703–708.

Newey, Whitney K., and Kenneth D. West. 1994. “Automatic Lag Selection in Covariance Matrix Estimation.” The Review of Economic Studies 61 (4): 631–653.

Nickell, Stephen. 1981. “Biases in Dynamic Models with Fixed Effects.” Econometrica 49 (6): 1417–1426.

Onatski, Alexei. 2010. “Determining the Number of Factors from Empirical Distribution of Eigenvalues.” The Review of Economics and Statistics 92 (4): 1004–1016.

Pesaran, M. Hashem. 2006. “Estimation and Inference in Large Heterogeneous Panels with a Multifactor Error Structure.” Econometrica 74 (4): 967–1012.

R Core Team. 2019. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.

Schwert, G. William. 1989. “Tests for Unit Roots: A Monte Carlo Investigation.” Journal of Business & Economic Statistics [...]

[...] arXiv preprint arXiv:1707.01815.

Stock, James H., and Mark W. Watson. 1998. “Diffusion indexes.” NBER Working Paper No. 6702.

Stock, James H., and Mark W. Watson. 2002. “Macroeconomic forecasting using diffusion indexes.” Journal of Business & Economic Statistics 20 (2): 147–162.

White, Halbert. 1980. “A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity.”