Forecasting with Bayesian Grouped Random Effects in Panel Data
Boyuan Zhang, University of Pennsylvania
First Version: June 30, 2020. This Version: October 6, 2020.
Abstract
In this paper, we estimate and leverage the latent constant group structure to generate point, set, and density forecasts for short dynamic panel data. We implement a nonparametric Bayesian approach to simultaneously identify coefficients and group membership in the random effects, which are heterogeneous across groups but fixed within a group. This method allows us to flexibly incorporate subjective prior knowledge on the group structure that potentially improves the predictive accuracy. In Monte Carlo experiments, we demonstrate that our Bayesian grouped random effects (BGRE) estimators produce accurate estimates and score predictive gains over standard panel data estimators. With a data-driven group structure, the BGRE estimators exhibit clustering accuracy comparable to the Kmeans algorithm and outperform a two-step Bayesian grouped estimator whose group structure relies on Kmeans. In the empirical analysis, we apply our method to forecast the investment rate across a broad range of firms and illustrate that the estimated latent group structure improves forecasts relative to standard panel data estimators.
JEL CLASSIFICATION: C11, C14, C23, C53, G31
KEY WORDS: Panel Data; Grouped Heterogeneity; Random Effects; Dirichlet Process; Set Forecast; Density Forecast; Investment

Department of Economics, Perelman Center for Political Science and Economics, University of Pennsylvania, 133 S. 36th St., Philadelphia, PA 19104-6297. Email: [email protected]. We would like to thank Karun Adusumilli, Francis Diebold, Maximilian Göbel, Philippe Goulet Coulombe, Frank Schorfheide for helpful comments and suggestions.
With the increasing availability of panel data, many works have examined and demonstrated its central role in empirical research throughout the social and business sciences. Analysis of panel data has various advantages over that of pure cross-sectional or time-series data. The most important one is that panel data provide researchers with a flexible way to model both heterogeneity among individuals, firms, regions, and countries, and possible structural changes over time. Apart from its principal role in model estimation, it is interesting and essential to study its relevance for forecasting. Among the novel methods that have emerged recently, the latent group structure in the heterogeneity has attracted wide attention. In this paper, we allow for grouped patterns of unobserved heterogeneity in dynamic panel data models and evaluate whether this latent structure improves the predictive performance in an extensive collection of short time series.

In the dynamic panel data model, it is common to assume that each cross-sectional unit has a unique intercept. This assumption introduces a large number of parameters that become a burden in estimation. In models that have as many parameters as individual units, fixed effects estimators are known to suffer from the "incidental parameters" problem (Neyman and Scott, 1948), which can bring about significant biases in estimates of common parameters.
This problem becomes severe in short panels even if the number of units goes to infinity (Chamberlain, 1980; Nickell, 1981), and the fixed effects themselves are often poorly estimated. Unreliable estimates raise concerns about the predictive power of panel data models, as inaccurate estimates affect forecasts in all aspects.

To address this issue, econometricians attempt to reduce the number of unknown parameters by dividing units into a finite number of groups. The premise of this idea is that units in the same group share the unit-specific parameters. Previous works include Bonhomme and Manresa (2015), Ando and Bai (2016), Su et al. (2016), Bester and Hansen (2016), Su et al. (2019), Bonhomme et al. (2019), and Cheng et al. (2019). Moreover, finite mixture models provide a well-known probabilistic approach to model-based clustering (McNicholas and Murphy, 2010; Frühwirth-Schnatter, 2011). With a finite number of groups, econometricians can avoid the "incidental parameters" problem under several particular assumptions and derive consistent estimators for the common parameters. Another important strand of literature implements generalized method of moments (GMM) methods to eliminate bias; see Arellano and Bond (1991), Arellano and Bover (1995), and Blundell and Bond (1998). Though these methods successfully solve the "incidental parameters" problem, they do not allow for any latent group structure.
When N and T are large enough, the information criterion could select the true group structure. However, with a short time span, these criteria might fail to achieve their goal. As noted in Bonhomme and Manresa (2015, hereafter BM), the choice of the number of groups is crucial to estimation and inference for model parameters. Misspecification in the group number forces the algorithm to consider incorrect group memberships.
We will later show that it is the information criteria that substantially affect the performance of the Grouped Fixed Effects (GFE) estimator proposed by BM.

The contributions of this paper are fourfold. First, closely following Kim and Wang (2019, hereafter KW) and Liu (2020), we develop a posterior sampling algorithm that addresses the nonparametric estimation of latent grouped effects and propose the Bayesian Grouped Random Effects (BGRE) estimator. The number of groups is treated as an unknown parameter that is estimated jointly with the component-specific parameters under the assumption that group membership remains constant over time. Instead of using a finite mixture model, which requires presetting the number of groups, we use a Dirichlet Process (DP) prior, in particular the stick-breaking prior, that allows for infinitely many potential groups. The entire posterior sampler builds upon the blocked Gibbs sampling proposed by Ishwaran and James (2001).

Second, we leverage the researcher's prior knowledge of the latent group structure to improve estimation and forecasting. In particular, we summarize and incorporate the information of the subjective group structure in the prior distribution of the membership probabilities. If the subjective prior on the group structure is more precise than a random guess, even with an incorrect presumed number of groups, we show that including it in the prior improves the performance of the BGRE estimators as it guides the group membership estimates. Unlike the Pólya urn Gibbs sampler (Escobar and West, 1995), the blocked Gibbs sampler avoids marginalizing over the prior and thus allows for direct sampling of the nonparametric posterior, leading to computational and inferential advantages.
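The stick-breaking construction underlying the DP prior can be sketched in a few lines. This is a generic illustration truncated at K components, in the spirit of the blocked Gibbs sampler of Ishwaran and James (2001), not the paper's exact implementation; all names and values are hypothetical.

```python
import numpy as np

def stick_breaking_weights(alpha, K, rng):
    """Truncated stick-breaking construction of DP mixture weights.

    Draw v_k ~ Beta(1, alpha) and set pi_k = v_k * prod_{j<k}(1 - v_j),
    with v_K = 1 so the K truncated weights sum to one.
    """
    v = rng.beta(1.0, alpha, size=K)
    v[-1] = 1.0  # truncation: the last stick absorbs the remaining mass
    pieces = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * pieces

rng = np.random.default_rng(0)
pi = stick_breaking_weights(alpha=1.0, K=20, rng=rng)
```

Smaller concentration parameters `alpha` put more mass on the first few sticks, so the prior favors a small number of non-empty groups even though infinitely many are potentially available.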
Third, our posterior sampling algorithm is closely connected to the Kmeans algorithm (MacQueen, 1967) under certain assumptions. In particular, both algorithms assign units to the closest centroid when forming the clusters and recalculate the means of the new clusters afterward. To compare the performance of clustering, we modify our algorithm to incorporate Kmeans and construct a two-step BGRE estimator where individuals are clustered in the first step using Kmeans, and the group-specific heterogeneity is estimated in the second step. In the simulation section, we document that our BGRE estimators dominate the two-step GRE estimator in terms of the performance of both clustering and forecasting. We also find that the two-step GRE estimators with the Kmeans algorithm severely underestimate the number of groups under all data generating processes, whereas the BGRE estimators deliver accurate estimates.

Last but not least, we examine the performance of the BGRE estimators using various sets of simulated data and real data. The Monte Carlo study shows that grouped heterogeneity brings gains in estimating the group structure and in one-step-ahead point, set, and density forecasting relative to commonly used predictors with different parametric priors on individual effects. In particular, our estimators outperform BM's Grouped Fixed Effects (GFE) estimator in various settings of the data generating process. The better performance is primarily due to the accurate estimate of the group structure. Regarding other predictors, we show that failing to model the group structure and to pool information across units severely deteriorates the results for both estimation and forecasting. Finally, we use our method to forecast the investment rate across a broad range of firms. The BGRE estimators offer better performance than standard panel data models in forecasting. This reveals that incorporating the latent group structure provides a great amount of flexibility and improves the predictive power of the underlying panel data model.

Our paper relates to several branches of the literature. Our work is closely related to BM, KW, and Bonhomme et al. (2019, hereafter BLM). All three papers aim to estimate the unobserved heterogeneity in a linear dynamic panel data model and develop statistical inference methods. BM estimate the parameters of the model using the GFE estimator that minimizes the least-squares criterion over all possible groupings of the cross-sectional units. They jointly estimate the individual types and the model's parameters given the number of groups and perform model selection afterward. On the other hand, BLM modify this method and split the procedure into two steps, with the Kmeans clustering algorithm used in the first step. From the Bayesian point of view, KW propose a full Bayesian estimator that
Kmeans algorithm. We conduct various Monte Carlo experiments in section 4 to examine the performance of the proposed estimator in a controlled environment in light of point, density, and set forecasts. We also examine the performance of a few variants of the BGRE estimator. In section 5, we conduct an empirical analysis in which we forecast the investment rate across firms. Finally, we conclude in section 6. A description of the data sets, additional empirical results, and derivations are relegated to the appendix.
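As an illustration of the two-step idea discussed above, the following sketch clusters units with a plain Kmeans (Lloyd's iterations) on the outcome panel and then estimates a group-level intercept by pooling observations within each cluster. All names, the deterministic initialization, and the parameter values are hypothetical; the paper's two-step GRE estimator performs a full Bayesian second stage rather than a simple within-cluster mean.

```python
import numpy as np

def two_step_grouped_effects(y, K, n_iter=50):
    """Illustrative two-step grouped estimator (hypothetical sketch).

    Step 1: cluster the N units with Lloyd's Kmeans on the N x T panel y,
            initializing centroids at units with evenly spaced mean ranks.
    Step 2: estimate a group intercept as the pooled mean within each cluster.
    """
    N, T = y.shape
    order = np.argsort(y.mean(axis=1))
    centers = y[order[np.linspace(0, N - 1, K).astype(int)]].astype(float)
    for _ in range(n_iter):
        # assign each unit to the closest centroid (squared Euclidean distance)
        d = ((y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        g = d.argmin(axis=1)
        # recompute centroids; keep the old one if a cluster empties
        for k in range(K):
            if (g == k).any():
                centers[k] = y[g == k].mean(axis=0)
    alpha_hat = np.array([y[g == k].mean() if (g == k).any() else np.nan
                          for k in range(K)])
    return g, alpha_hat

# two well-separated groups of 30 units each, T = 8 periods
rng = np.random.default_rng(1)
y = np.vstack([rng.normal(0.0, 0.1, (30, 8)),
               rng.normal(5.0, 0.1, (30, 8))])
g, alpha_hat = two_step_grouped_effects(y, K=2)
```

Because the second step conditions entirely on the first-step clustering, any misclassification by Kmeans propagates into the estimated group effects, which is one channel behind the underestimation of the number of groups documented in the simulations.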
We consider a panel with observations for cross-sectional units $i = 1, \ldots, N$ in periods $t = 1, \ldots, T$. Given a panel data set $\{(y_{it}, x_{it})\}$, a simple linear dynamic panel data model with grouped patterns of heterogeneity takes the following form:
$$ y_{it} = \alpha_{g_i t} + \rho y_{it-1} + \beta_i' x_{it} + \varepsilon_{it}, \qquad \varepsilon_{it} \overset{iid}{\sim} N\big(0, \sigma^2_{g_i}\big), \tag{2.1} $$
where $x_{it}$ is a $p \times 1$ vector of covariates that is uncorrelated with $\varepsilon_{it}$ but is allowed to be arbitrarily correlated with $\alpha_{g_i t}$. $\alpha_{g_i t}$ denotes the time-varying group-specific heterogeneity. The subscript $g_i \in \{1, \ldots, K\}$ is the group membership variable with unknown and unconstrained $K$. $y_{it-1}$ is the lagged outcome variable, $\rho$ is the homogeneous AR(1) parameter common to all cross-sectional units, and $\beta_i$ is a $p \times 1$ vector of heterogeneous coefficients. $\varepsilon_{it}$ is the idiosyncratic error term featuring zero mean and grouped heteroskedasticity $\sigma^2_{g_i}$, with cross-sectional homoskedasticity being a special case where $\sigma^2_{g_i} = \sigma^2$. This setting leads to a heterogeneous panel with a group pattern modeled through $\alpha_{g_i t}$ and $\sigma^2_{g_i}$.

By stacking all observations for unit $i$, we get an aggregated model:
$$ y_i = \alpha_{g_i} + \rho y_{i,-1} + x_i \beta_i + \varepsilon_i, \qquad \varepsilon_i \overset{iid}{\sim} N(0, \Sigma_{g_i}), \tag{2.2} $$
where $y_i = [y_{i1}, y_{i2}, \ldots, y_{iT}]'$, $y_{i,-1} = [y_{i0}, y_{i1}, \ldots, y_{iT-1}]'$, $x_i = [x_{i1}, x_{i2}, \ldots, x_{iT}]'$ is $T \times p$, $\alpha_{g_i} = [\alpha_{g_i 1}, \alpha_{g_i 2}, \ldots, \alpha_{g_i T}]'$, $\varepsilon_i = [\varepsilon_{i1}, \varepsilon_{i2}, \ldots, \varepsilon_{iT}]'$, and $\Sigma_{g_i} = \sigma^2_{g_i} I_T$.
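A data generating process of the form (2.1) can be simulated as follows. The number of groups, intercepts, variances, and coefficients below are hypothetical choices for illustration, with a time-invariant $\alpha$, a single regressor, and $\beta_i = 1$ for all units.

```python
import numpy as np

def simulate_grouped_panel(N=100, T=10, rho=0.5, seed=0):
    """Simulate a grouped dynamic panel in the spirit of equation (2.1),
    with hypothetical values: two groups with intercepts alpha = (-1, 1),
    group standard deviations sigma = (0.5, 1.0), one exogenous regressor,
    and a homogeneous unit coefficient beta_i = 1."""
    rng = np.random.default_rng(seed)
    alpha = np.array([-1.0, 1.0])
    sigma = np.array([0.5, 1.0])
    g = rng.integers(0, 2, size=N)          # group memberships g_i
    x = rng.normal(size=(N, T))             # exogenous regressor x_it
    y = np.zeros((N, T))
    y_lag = rng.normal(size=N)              # initial condition y_i0
    for t in range(T):
        eps = sigma[g] * rng.normal(size=N)  # grouped heteroskedastic errors
        y[:, t] = alpha[g] + rho * y_lag + x[:, t] + eps
        y_lag = y[:, t]
    return y, x, g

y, x, g = simulate_grouped_panel()
```

Under these values the stationary group means are roughly $\alpha_k / (1 - \rho)$, so the two groups generate clearly separated outcome paths, which is convenient for checking clustering accuracy in controlled experiments.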
To indicate the component from which each observation stems, we introduce a group membership variable $G = [g_1, \ldots, g_N]$ taking values in $\{1, \ldots, K\}^N$. Define the set of units that belong to group $k$: $C_k = \{ i \in \{1, 2, \ldots, N\} \mid g_i = k \}$, and let $|C_k|$ denote the cardinality of the set $C_k$.

Following Sun (2005), Lin and Ng (2012) and BM, we assume that individual group membership does not vary over time. In addition, for any groups $i \neq j$, we assume that they have different paths of random effects, i.e., $\alpha_i \neq \alpha_j$, and that no single unit can simultaneously belong to these two groups: $C_i \cap C_j = \emptyset$.

The main goal of this paper is to estimate the grouped random effects $\alpha_{g_i}$, the common parameter $\rho$, the heterogeneous coefficients $\beta_i$, and the group membership $G$ using the full sample, and to provide point, set, and density forecasts of $y_{iT+h}$ for each unit $i$. Throughout this paper, we focus on the one-step-ahead forecast where $h = 1$. For the multiple-step forecast, the procedure simulates $y_{iT+h}$ forward in accordance with (2.1) given the estimates of the parameters and realizations of the data.

Our goal is to generate one-step-ahead forecasts of $y_{i,T+1}$ for $i = 1, \ldots, N$ conditional on the history of observations,
$$ Y = [y_1, y_2, \ldots, y_N], \quad y_i = [y_{i1}, y_{i2}, \ldots, y_{iT}], \qquad X = [x_1, x_2, \ldots, x_N], \quad x_i = [x_{i1}, x_{i2}, \ldots, x_{iT}], $$
and newly available exogenous variables $x_{iT+1}$ at $T +$
1. For illustration purposes, we drop $X$ and $x_{iT+1}$ from the notation, but we always condition on these exogenous variables.

The posterior predictive distribution for unit $i$ is given by
$$ p(y_{iT+1} \mid Y) = \int p(y_{iT+1} \mid Y, \Theta)\, p(\Theta \mid Y)\, d\Theta, \tag{2.3} $$
where $\Theta$ is the vector of parameters $\Theta = \{\rho, \beta_i, \alpha_{g_i}, \Sigma_{g_i}, g_i\}$. This density is the posterior expectation of the following function,
$$ p(y_{iT+1} \mid Y, \Theta) = \sum_{k=1}^{K} \mathbb{1}(g_i = k)\, p(y_{iT+1} \mid Y, \rho, \beta_i, \alpha_k, \Sigma_k), \tag{2.4} $$
which is invariant to relabeling the components of the mixture. Therefore, given $M^*$ posterior draws, the density estimated from the MCMC draws is
$$ \hat{p}(y_{iT+1} \mid Y) = \frac{1}{M^*} \sum_{j=1}^{M^*} \left( \sum_{k=1}^{K^{(j)}} \mathbb{1}(g_i = k)\, p\big(y_{iT+1} \mid Y, \rho^{(j)}, \beta_i^{(j)}, \alpha_k^{(j)}, \Sigma_k^{(j)}\big) \right). \tag{2.5} $$
Therefore, we can draw samples from $\hat{p}(y_{iT+1} \mid Y)$ by simulating (2.1) forward conditional on the posterior draws of $\Theta$ and the observations.

We evaluate the point forecasts via the Root Mean Square Forecast Error (RMSFE) under the quadratic loss function averaged across units. Let $\hat{y}_{iT+1}$ represent the predicted value
of $y_{iT+1}$ conditional on the information up to $T$; the loss function is written as
$$ L(\hat{y}_{N,T+1}, y_{N,T+1}) = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_{iT+1} - y_{iT+1})^2 = \frac{1}{N} \sum_{i=1}^{N} \hat{\varepsilon}_{iT+1}^2, \tag{2.6} $$
where $y_{i,T+1}$ is the realization at $T+1$ and $\hat{\varepsilon}_{iT+1}$ denotes the forecast error. The optimal posterior forecast under the quadratic loss function is obtained by minimizing the posterior risk,
$$ \hat{y}_{N,T+1} = \operatorname*{argmin}_{\hat{y} \in \mathbb{R}^N} \int L(\hat{y}, y_{N,T+1})\, p(y_{N,T+1} \mid Y)\, dy_{N,T+1} = \operatorname*{argmin}_{\hat{y} \in \mathbb{R}^N} \frac{1}{N} \sum_{i=1}^{N} E\big[(\hat{y}_i - y_{iT+1})^2 \mid Y\big]. \tag{2.7} $$
This implies that the optimal posterior forecast is the posterior mean,
$$ \hat{y}_{i,T+1} = E(y_{iT+1} \mid Y), \quad \text{for } i = 1, \ldots, N. \tag{2.8} $$

We construct set forecasts $CS_{iT+1}$ from the posterior predictive distribution of each unit. In particular, we adopt a Bayesian approach and report the highest posterior density interval (HPDI), which is the narrowest connected interval with coverage probability $1 - \alpha$.
Put differently, it requires that the probability of $y_{iT+1} \in CS_{iT+1}$ conditional on having observed the history $Y$ is at least $1 - \alpha$, i.e.,
$$ P(y_{iT+1} \in CS_{iT+1}) \ge 1 - \alpha, \quad \text{for all } i, \tag{2.9} $$
and this interval is the shortest among all possible singly connected candidate sets. Let $\delta^l_i$ be the lower bound and $\delta^u_i$ be the upper bound; then $CS_{iT+1} = [\delta^l_i, \delta^u_i]$.

The assessment of set forecasts in simulation studies and empirical applications is based on two metrics: (1) the cross-sectional coverage frequency,
$$ Cov_{T+1} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\{ y_{iT+1} \in CS_{iT+1} \}, \tag{2.10} $$
and (2) the average length of the sets $CS_{iT+1}$,
$$ AvgL_{T+1} = \frac{1}{N} \sum_{i=1}^{N} (\delta^u_i - \delta^l_i). \tag{2.11} $$
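Given posterior predictive draws for each unit, the HPDI and the two metrics (2.10)-(2.11) can be approximated as below. The shortest-connected-interval search assumes a unimodal predictive density, and all inputs are simulated placeholders rather than the paper's actual posterior output.

```python
import numpy as np

def hpd_interval(draws, alpha=0.05):
    """Shortest connected interval containing a (1 - alpha) share of the
    posterior predictive draws; for a unimodal predictive density this
    approximates the HPDI."""
    s = np.sort(draws)
    m = int(np.ceil((1.0 - alpha) * len(s)))  # draws the interval must cover
    widths = s[m - 1:] - s[:len(s) - m + 1]   # width of every candidate window
    lo = widths.argmin()                       # narrowest window wins
    return s[lo], s[lo + m - 1]

def set_forecast_metrics(pred_draws, y_real, alpha=0.05):
    """Cross-sectional coverage frequency (2.10) and average length (2.11)."""
    bounds = np.array([hpd_interval(d, alpha) for d in pred_draws])
    cover = ((y_real >= bounds[:, 0]) & (y_real <= bounds[:, 1])).mean()
    avg_len = (bounds[:, 1] - bounds[:, 0]).mean()
    return cover, avg_len

rng = np.random.default_rng(0)
pred = rng.normal(0.0, 1.0, size=(50, 2000))  # N=50 units, 2000 draws each
y_real = rng.normal(0.0, 1.0, size=50)        # placeholder realizations
cov, avg_len = set_forecast_metrics(pred, y_real)
```

For a standard normal predictive distribution the 95% HPDI is roughly $(-1.96, 1.96)$, so a well-calibrated forecaster should report coverage near 0.95 with an average length near 3.9.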
To compare the performance of density forecasts across estimators, we examine the continuous ranked probability score (CRPS) across units. The CRPS is frequently used to assess the relative accuracy of two probabilistic forecasting models. It is a quadratic measure of the difference between the forecast cumulative distribution function (CDF), $F^{T+1}_i(y)$, and the empirical CDF of the observation, with the formula
$$ CRPS_{T+1} = \frac{1}{N} \sum_{i=1}^{N} CRPS(F^{T+1}_i, y_{iT+1}) = \frac{1}{N} \sum_{i=1}^{N} \int \big( F^{T+1}_i(y) - \mathbb{1}\{ y_{iT+1} \le y \} \big)^2 \, dy, \tag{2.12} $$
where $y_{iT+1}$ is the realization at $T+1$. We also report the log predictive score,
$$ LPS_{T+1} = \frac{1}{N} \sum_{i=1}^{N} \ln \hat{p}(y_{iT+1} \mid Y). \tag{2.13} $$

To evaluate the statistical superiority of pooling within $K$ clusters, we report the bias, standard deviation, average length of the 95% credible set, and frequentist coverage of the posterior mean estimate of $\rho$ across Monte Carlo repetitions. For the random effects $\alpha$, we only present the average bias as it may not be of interest for most empirical analyses.

To estimate the number of groups, we derive a point estimator from its posterior distribution, typically the posterior mean, which is consistent with a quadratic loss function. In the empirical analysis, we also consider the posterior mode suggested by Malsiner-Walli et al.
(2016), which is equal to the most frequent number of non-empty components visited during MCMC sampling. These approaches constitute an automatic and straightforward strategy to estimate the unknown number of groups without using model selection criteria or marginal likelihoods.
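The posterior-mode estimator just described counts, at each MCMC iteration, the number of non-empty components and takes the most frequent count. A minimal sketch, assuming the sampler stores an M x N matrix of membership draws (a hypothetical input format):

```python
import numpy as np
from collections import Counter

def posterior_mode_num_groups(membership_draws):
    """Estimate the number of groups as the most frequent number of
    non-empty components across MCMC draws (the posterior mode).
    Row j of `membership_draws` holds the sampled labels g_1,...,g_N
    at draw j; a component is non-empty if some unit carries its label."""
    counts = [len(np.unique(row)) for row in membership_draws]
    return Counter(counts).most_common(1)[0][0]

# hypothetical chain over N = 4 units: most draws visit 3 non-empty components
draws = np.array([[0, 0, 1, 2],
                  [0, 1, 1, 2],
                  [0, 1, 2, 2],
                  [0, 0, 1, 1]])
print(posterior_mode_num_groups(draws))  # → 3
```

Because the estimator only tallies non-empty components, it is unaffected by label switching across draws, which is what makes it automatic relative to criteria that require a fixed labeling or a marginal likelihood.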
Put differently, it requires that the probability of $y_{iT+1} \in CS_{iT+1}$, conditional on having observed the history $Y$, be at least $1 - \alpha$, i.e.,
\[
P(y_{iT+1} \in CS_{iT+1} \mid Y) \ge 1 - \alpha, \quad \text{for all } i, \quad (2.9)
\]
and this interval is the shortest among all possible connected candidate sets. Let $\delta_i^l$ be the lower bound and $\delta_i^u$ the upper bound; then $CS_{iT+1} = [\delta_i^l, \delta_i^u]$.

The assessment of set forecasts in the simulation studies and empirical applications is based on two metrics: (1) the cross-sectional coverage frequency,
\[
Cov_{T+1} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\{y_{iT+1} \in CS_{iT+1}\}, \quad (2.10)
\]
and (2) the average length of the sets $CS_{iT+1}$,
\[
AvgL_{T+1} = \frac{1}{N} \sum_{i=1}^{N} (\delta_i^u - \delta_i^l). \quad (2.11)
\]

To compare the density forecast performance of the various estimators, we examine the continuous ranked probability score (CRPS) across units. The CRPS is frequently used to assess the relative accuracy of two probabilistic forecasting models. It is a quadratic measure of the difference between the forecast cumulative distribution function (CDF), $F_{T+1}^i(y)$, and the empirical CDF of the observation:
\[
CRPS_{T+1} = \frac{1}{N} \sum_{i=1}^{N} CRPS(F_{T+1}^i, y_{iT+1}) = \frac{1}{N} \sum_{i=1}^{N} \int \big(F_{T+1}^i(y) - \mathbb{1}\{y_{iT+1} \le y\}\big)^2\, dy, \quad (2.12)
\]
where $y_{iT+1}$ is the realization at $T+1$. In addition, we report the log predictive score (LPS),
\[
LPS_{T+1} = \frac{1}{N} \sum_{i=1}^{N} \ln \hat{p}(y_{iT+1} \mid Y). \quad (2.13)
\]

To evaluate the statistical gains from pooling within $K$ clusters, we report the bias, standard deviation, average length of the 95% credible set, and frequentist coverage of the posterior mean estimate of $\rho$ across Monte Carlo repetitions. For the random effects $\alpha$, we only present the average bias, as the remaining statistics may not be of interest for most empirical analyses.

To estimate the number of groups, we derive a point estimator from its posterior distribution, typically the posterior mean, which is consistent with a quadratic loss function. In the empirical analysis, we also consider the posterior mode suggested by Malsiner-Walli et al.
(2016), which is equal to the most frequent number of non-empty components visited during MCMC sampling. These approaches constitute an automatic and straightforward strategy for estimating the unknown number of groups without resorting to model selection criteria or marginal likelihoods.

Until now, our main focus has been the group structure in the intercepts $\alpha_i$, while $\rho$ and $\beta_i$ are left unchanged as in a standard panel data model. We can easily extend our model to allow for joint group-specific heterogeneity in $\alpha$, $\beta$, and $\rho$. The extended model is written as
\[
y_{it} = \theta_{g_i}' \tilde{x}_{it} + \varepsilon_{it}, \quad \varepsilon_{it} \overset{iid}{\sim} N\big(0, \sigma_{g_i}^2\big), \quad (2.14)
\]
where $\tilde{x}_{it} = [1, \; y_{it-1}, \; x_{it}']'$ and $\theta_{g_i} = [\alpha_{g_i}, \; \rho_{g_i}, \; \beta_{g_i}']'$. For a group $k$, with a joint conjugate prior for the parameters $\theta_k$, we modify our block Gibbs sampler to draw $\alpha_k$, $\rho_k$, and $\beta_k$ simultaneously from their joint posterior distribution. The detailed derivations of the posterior distributions are presented in Appendix A.3.

In this section, we provide the details of the Bayesian analysis. In Section 3.1, we document the specification of the prior distribution for all parameters, including the auxiliary variable in the random coefficient model and the subjective group prior used when econometricians have prior knowledge of the group structure. Section 3.2 outlines the posterior sampler; the proposed algorithm is shown in Appendix A.1.1. Finally, we offer preliminary thoughts on the connection between our Bayesian method and unsupervised machine learning methods in Section 3.3.
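To make the grouped data generating process concrete before turning to the priors, a model of the form (2.14) can be simulated in a few lines. This is only an illustrative sketch: the number of groups, the group-specific parameter values, and the scalar regressor are assumptions made here, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, K = 60, 6, 3                         # illustrative panel dimensions

# Assumed group-specific parameters theta_k = (alpha_k, rho_k, beta_k)
# and group-specific error standard deviations sigma_k.
theta = np.array([[ 1.0, 0.3,  0.5],
                  [ 0.0, 0.6, -0.2],
                  [-1.0, 0.1,  1.0]])
sigma = np.array([0.5, 1.0, 0.3])
g = rng.integers(0, K, size=N)             # latent group membership g_i

# Simulate y_it = alpha_{g_i} + rho_{g_i} y_{i,t-1} + beta_{g_i} x_it + eps_it,
# i.e. (2.14) with a single exogenous regressor x_it.
x = rng.normal(size=(N, T))
y = np.zeros((N, T + 1))                   # column 0 holds the initial values
for t in range(T):
    alpha, rho, beta = theta[g].T          # unit-level parameters via g_i
    y[:, t + 1] = alpha + rho * y[:, t] + beta * x[:, t] \
        + sigma[g] * rng.normal(size=N)
```

Every unit in the same group shares the same triplet $(\alpha_k, \rho_k, \beta_k)$ and error variance, which is exactly the pooling the estimators below try to recover.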
Two sets of prior specifications are considered in this section. In the first, we concentrate on random effects models and implement a full Bayesian analysis: we specify a hyperprior for the distribution of unobserved heterogeneity and then construct a joint posterior for the coefficients of this hyperprior as well as for the actual unit-specific and common coefficients. In the second, econometricians can provide useful information on the latent group structure and incorporate it into the prior.
In this paper, we focus on the random coefficients model in which the heterogeneous parameters $\alpha_{g_i t}$ and $\sigma_{g_i}^2$ are independent and are assumed to be independent of the initial value $y_{i0}$ of each unit. The specification can be extended to a correlated random coefficients model by modeling the joint distribution of the heterogeneous parameters and the initial values $y_{i0}$.

A typical choice in the nonparametric Bayesian literature is the Dirichlet process (DP) prior, or stick-breaking prior. With group probabilities $\pi_k$ and prior mean and variance $(\mu_\alpha, \Sigma_\alpha)$, a draw of $\alpha_{g_i t}$ from the DP prior can be viewed as a mixture of point masses with probability mass function
\[
\alpha_{g_i t} \sim \sum_{k=1}^{K} \pi_k\, \delta_{\alpha_{kt}}, \quad \text{with } \alpha_{kt} \sim N(\mu_\alpha, \Sigma_\alpha), \quad (3.1)
\]
where $\delta_x$ denotes the Dirac delta function concentrated at $x$, each $\alpha_{kt}$ is drawn from a normal distribution, and $K$ is unknown. $\mu_\alpha$ is set to the OLS estimate of $\alpha$ assuming $K = 1$, and $\Sigma_\alpha$ equals $200\, \hat{\Sigma}_\alpha$, where $\hat{\Sigma}_\alpha$ is the standard deviation of the OLS estimator.
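A draw from the discrete mixture in (3.1) can be sketched as follows: atoms $\alpha_{kt}$ are drawn from the normal base distribution, and each unit's effect is one of these atoms selected with probability $\pi_k$. The truncation level, base-measure values, and uniform weights below are illustrative assumptions only (in the paper $K$ is unbounded and the weights come from the stick-breaking prior):

```python
import numpy as np

rng = np.random.default_rng(1)
K, T, N = 10, 6, 200                   # assumed truncation, periods, units

# Atoms: each alpha_k = (alpha_k1, ..., alpha_kT) drawn from the base measure.
mu_alpha, sd_alpha = 0.0, 1.0          # illustrative base-measure parameters
alpha_atoms = rng.normal(mu_alpha, sd_alpha, size=(K, T))

# Illustrative group probabilities pi_k (uniform here for simplicity).
pi = np.full(K, 1.0 / K)

# A draw of alpha_{g_i, t}: sample a group label, then read off its atom.
g = rng.choice(K, size=N, p=pi)
alpha_i = alpha_atoms[g]               # shape (N, T); support has at most K points
```

The key feature is visible directly: although there are $N = 200$ units, the drawn effects take at most $K$ distinct values, which is what induces clustering.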
In the same fashion, we can define the DP prior for the grouped heteroskedasticity $\sigma_{g_i}^2$ given identical group probabilities $\pi_k$:
\[
\sigma_{g_i}^2 \sim \sum_{k=1}^{K} \pi_k\, \delta_{\sigma_k^2}, \quad \text{with } \sigma_k^2 \sim IG\left(\frac{\nu_\sigma}{2}, \frac{\delta_\sigma}{2}\right), \quad (3.2)
\]
where each component is drawn from an inverse-Gamma distribution.

Put together, the posterior draws of the group-related coefficients can be characterized by a grouped triplet $\{\pi_k, \alpha_k, \sigma_k^2\}$ for $k = 1, 2, \ldots$, with $\alpha_k = [\alpha_{k1}, \alpha_{k2}, \ldots, \alpha_{kT}]$. Importantly, the distributions of both $\alpha_k$ and $\sigma_k^2$ are discrete, because draws can only take values in the set $\{(\alpha_k, \sigma_k^2) : k \in \mathbb{Z}^+\}$. This nonparametric nature makes the Dirichlet process prior an ideal choice for clustering problems, especially when the number of distinct clusters is unknown beforehand. The group parameters $(\alpha_k, \sigma_k^2)$ are assumed to follow the base distribution $B_0$, which is an independent (non-conjugate) Multivariate Normal-Inverse-Gamma (IMNIG) distribution.

On the other hand, the group probabilities are formalized through an infinite-dimensional stick-breaking prior governed by the concentration parameter $a$,
\[
\pi_k = \xi_k \prod_{j < k} (1 - \xi_j) \quad \text{for } k > 1, \quad \text{and } \pi_1 = \xi_1. \quad (3.3)
\]
The $\xi_k$, called stick lengths, are independent random variables drawn from the beta distribution $Beta(1, a)$. This construction can be viewed as a stick-breaking procedure: at each step, we independently and randomly break off part of what remains of a stick of unit length and assign the length of this break to the current value of $\pi_k$. The smaller $a$ is, the less of the stick is left for subsequent values (on average), yielding more concentrated distributions. The concentration parameter $a$ thus specifies how strong this discretization is. As $a \to 0$, the mass concentrates on the first few atoms; as $a \to \infty$, the realizations become continuous-valued draws from the base distribution.
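The stick-breaking construction (3.3) and the role of $a$ are easy to simulate. The sketch below uses a finite truncation purely for illustration (the values of $a$ and the truncation level are assumptions, not the paper's settings):

```python
import numpy as np

def stick_breaking_weights(a, K, rng):
    """First K stick-breaking weights pi_k for concentration parameter a."""
    xi = rng.beta(1.0, a, size=K)                  # stick lengths xi_k ~ Beta(1, a)
    leftover = np.concatenate(([1.0], np.cumprod(1.0 - xi)[:-1]))
    return xi * leftover                           # pi_k = xi_k prod_{j<k}(1 - xi_j)

rng = np.random.default_rng(2)
for a in (0.5, 5.0):
    pi = stick_breaking_weights(a, K=50, rng=rng)
    # Smaller a leaves less stick for later components, so mass concentrates
    # on the first few weights; larger a spreads it out.
    print(f"a={a}: sum of first 5 weights = {pi[:5].sum():.3f}")
```

Since the expected leftover stick after $k$ breaks is $(a/(1+a))^k$, small $a$ effectively yields a handful of groups while large $a$ spreads mass over many, which is why treating $a$ as a parameter lets the data choose the number of groups.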
Escobar and West (1995) show that the number of estimated groups under a DP prior is sensitive to $a$, which indicates that a data-driven estimate is more reasonable. To determine how discrete we want the distribution to be, and how many groups are needed given the data, it is convenient to treat $a$ as a parameter within the nonparametric Bayesian framework. Put differently, we can place a relatively diffuse Gamma hyperprior on $a$ and update it based on the observations. This step generates a posterior estimate of $a$, which implicitly chooses the optimal $K$ without re-estimating the model for different numbers of groups.

Finally, the prior distribution for the common parameter $\rho$ is chosen to be a normal distribution,
\[
\rho \sim N(0, \sigma_\rho^2). \quad (3.4)
\]
The prior for the heterogeneous parameter $\beta_i$ follows
\[
\beta_i \sim N(0, \Sigma_\beta), \quad \text{with } \Sigma_\beta \propto I_p. \quad (3.5)
\]
To sum up, in the random coefficients model we specify Dirichlet process priors for the grouped random effects $\alpha_{g_i t}$ and heteroskedasticity $\sigma_{g_i}^2$, a stick-breaking process for the group probabilities $\pi_k$, a hyperprior for the concentration parameter $a$, and normal priors for the common parameter $\rho$ and the heterogeneous parameter $\beta_i$.

Frequently, researchers can provide a group structure on all, or at least part, of the units based on personal expertise and the nature of the individuals. For example, firms from the same industry may share a similar growth pattern with relatively high probability; countries at the same level of development form comparable fiscal policies. Though this
knowledge may be imprecise, it can be encoded as a vector of subjective group probabilities $\omega_i = [\omega_{i1}, \omega_{i2}, \ldots, \omega_{iK_p}]$ for each unit $i = 1, \ldots, N$, where $K_p$ is the preset number of groups. Namely, before estimating the group memberships, the researcher assigns each unit to the groups with a set of subjective group-specific probabilities, and these probabilities enter the algorithm through a prior distribution for $\omega_i$. We name this prior for $\omega_i$ the Subjective Group Probability (SGP) prior.
In practice, one could provide a table (for example, Table 1) documenting the subjective probability of a unit falling into each specific group.

Table 1: An example of prior group probabilities

If, for a unit $i$, the researcher is fairly confident in her knowledge, she can set one of the $\omega_{ik}$ to 100%. Doing so for every $i = 1, 2, \ldots, N$ is equivalent to the case where the researcher exactly partitions the $N$ units into $K_p$ predetermined groups.

Building on the prior for the random effects model in Section 3.1.1, we allow the researcher's prior knowledge to be incorporated while inheriting the ability to reallocate units and change the number of groups along the MCMC sampling. These flexible features enable the block Gibbs sampler to automatically correct and update an imprecise subjective prior, especially when $K_p$ does not match the true number of groups.

To incorporate these subjective group probabilities, it is important to choose a proper prior for $\omega$. The Dirichlet distribution is an applicable candidate among assorted densities since
it is conjugate to the multinomial distribution of the group memberships. Suppose the number of non-empty groups ($K^*$) in an iteration equals the presumed $K_p$. Let $\omega_i = [\omega_{i1}, \omega_{i2}, \ldots, \omega_{iK_p}]$ be the vector of group-specific probabilities for unit $i$; we set the prior density for $\omega_i$ to an asymmetric Dirichlet distribution,
\[
\omega_i \sim Dir(a_{i1}, a_{i2}, \ldots, a_{iK_p}), \quad (3.6)
\]
where the $a_{ik}$ are concentration parameters and strictly positive. Conditional on $\omega_i$, the group membership $g_i$ is assumed to be drawn from a multinomial distribution,
\[
g_i \sim Multinomial(\omega_i), \quad \text{i.e.,} \quad P(g_i = k \mid \omega_i) = \omega_{ik}, \quad \text{for } k = 1, \ldots, K_p. \quad (3.7)
\]
It can be shown that the posterior of $\omega_i$ given $g_i$ is also a Dirichlet distribution with modified hyperparameters: $Dir\big(a_{i1} + \mathbb{1}(g_i = 1),\, a_{i2} + \mathbb{1}(g_i = 2),\, \ldots,\, a_{iK_p} + \mathbb{1}(g_i = K_p)\big)$.
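The conjugate update just described is a standard Dirichlet-multinomial computation. A minimal sketch for one unit with $K_p = 3$ and illustrative subjective probabilities (all numbers here are assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

# Subjective prior probabilities for unit i over K_p = 3 groups; with the
# a_ik summing to 1, a_ik is also the prior mean of omega_ik.
a_i = np.array([0.6, 0.3, 0.1])

# (3.6): omega_i ~ Dir(a_i1, ..., a_iKp); (3.7): g_i | omega_i ~ Multinomial.
omega_i = rng.dirichlet(a_i)
g_i = rng.choice(len(a_i), p=omega_i)           # group label in {0, 1, 2}

# Conjugate update: posterior is Dir(a_ik + 1(g_i = k)).
a_post = a_i + (np.arange(len(a_i)) == g_i)
posterior_mean = a_post / a_post.sum()          # E(omega_ik | g_i)
```

Each Gibbs iteration only needs to add the indicator of the current membership draw to the concentration parameters, so the subjective prior is revised toward whichever group the data favor.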
Hence we can directly sample $\omega_i$ from its posterior distribution. Another important property that makes the Dirichlet distribution a suitable prior is that we can tie the prior probabilities directly to the expected value of $\omega_{ik}$,
\[
E(\omega_{ik}) = \frac{a_{ik}}{\sum_{k'=1}^{K_p} a_{ik'}}. \quad (3.8)
\]
To integrate the researcher's prior knowledge, one only needs to choose a set of $\{a_{ik}\}$ such that the expected probabilities match her subjective probabilities on the groups. Since the membership probabilities $\omega_{ik}$ are updated based on the observations, and we allow for reallocating units and changing the number of groups along the MCMC sampling, a revision of our block Gibbs sampler is needed to accommodate these changes. The details of the new algorithm are presented in Appendix A.2. In practice, we can restrict $\sum_{k=1}^{K_p} a_{ik}$ to be 1, so that $a_{ik}$ represents both the subjective probability of unit $i$ belonging to group $k$ and the prior mean of $\omega_{ik}$.

Draws from the joint posterior distribution can be obtained using blocked Gibbs sampling. The proposed algorithm is based on Ishwaran and James (2001) and Walker (2007). Though the algorithm in Ishwaran and James (2001) has been widely used for sampling stick-breaking priors, it alone cannot fulfill our need to estimate the number of groups without a predetermined level or upper bound, since it requires a finite-dimensional prior and truncation. To
Hencewe can direct sample ω i from it posterior distribution.Another important property of Dirichlet distribution that enables itself to be the mostsuitable prior is that we can tie our prior probability directly with the expected value of ω ik , E p ω ik q “ a ik ř K p i “ k a ik . (3.8)To integrate the researcher’s prior knowledge, one only need to deliberately choose a setof t a ik u such that the expected probability matches her subjective probability on groups.Since the membership probabilities ω ik are updated based on observations, and we allowfor reallocating units and changing the number of groups along the MCMC sampling, arevision of our block Gibbs sampler is needed to adjust for such changes. The details of thenew algorithm are presented in Appendix A.2. In practice, we can restrict ř K p i “ k a ik to be 1so that a ik represents both the subjective group probability for unit i belonging to group k and the prior mean of ω ik . Draws from the joint posterior distribution can be obtained by using blocked Gibbs sampling.The proposed algorithm is based on Ishwaran and James (2001) and Walker (2007). Thoughthe algorithm in Ishwaran and James (2001) has been widely used for sampling stick-breakingpriors, it alone can’t fulfill our need for estimating the number of groups without any prede-termined level or upper bound since it requires a finite-dimensional prior and truncation. To his Version: October 6, 2020his Version: October 6, 2020 K , we implement slice-samplingproposed by Walker (2007) and modify the framework of Ishwaran and James (2001) withadditional posterior sampling steps. Using the conjugate priors specified in Section 3.1, eachparameter is directly drawn from its posterior distribution.In the Appendix A, we provide detailed derivations for the conditional distributions overwhich the Gibbs sampler iterates. We focus on the time-varying grouped random effectsmodel with grouped heteroskedasticity, which is the most sophisticated specification. 
Otherspecifications can be estimated by merely ignoring time effects in α ’s or shutting down theheteroskedasticity. One feature of our proposed block Gibbs sampler is that it partitions N units into G groups,and, at the same time, generates posterior draws for parameters. This Gibbs sampler andour BGRE estimator inevitably remind us of one of the most popular clustering algorithmsin the area of unsupervised machine learning: the Kmeans algorithm. Indeed, the Kmeansalgorithm plays a crucial role in BM and BLM, who estimate the grouped fixed-effects fromthe frequentists’ point of view. In this subsection, we seek to illustrate the similarity andconnection between our block Gibbs sampler and the Kmeans algorithm in the limit.We start with the Kmeans clustering algorithm. Given a set of observations p z , z , . . . , z N q , where each observation contains the dependent variables and covariates, p y i , x i q . Kmeansclustering aims to partition the N observations into K sets so as to minimize the within-cluster sum of squares,min t C k u Kk “ K ÿ k “ ÿ i P C k } z i ´ µ k } where µ k “ | C k | ÿ i P C k z i . (3.9)The algorithm alternates between reassigning points to clusters and recomputing themeans. For the assignment step, one computes the squared Euclidean distance from eachpoint to each cluster mean, and then assign each observation to the cluster with the nearestmean. The update step of the algorithm recalculates centroid for observations assigned toeach cluster and updates µ k for all k .According to the block Gibbs sampler, we assign unit i to group k conditional on the his Version: October 6, 2020his Version: October 6, 2020
Two sets of specifications of prior distributions are considered in this section. In the first specification, we concentrate on random effects models and implement a full Bayesian analysis. In addition, we specify a hyperprior for the distribution of unobserved heterogeneity and then construct a joint posterior for the coefficients of this hyperprior as well as the actual unit-specific and common coefficients. In the second specification, econometricians can provide useful information on the latent group structure and incorporate it in the prior.

In this paper, we focus on the random coefficients model where the heterogeneous parameters α_{g_i,t} and σ_{g_i} are independent and are assumed to be independent of the initial value y_{i0} of each unit. The specification can be extended to a correlated random coefficients model by modeling the joint distribution of the heterogeneous parameters and the initial values y_{i0}.

A typical choice in the nonparametric Bayesian literature is the Dirichlet Process (DP) prior, or stick-breaking prior. With group probabilities π_k and prior mean and variance (µ_α, Σ_α), a draw of α_{g_i,t} from the DP prior can be viewed as a mixture of point masses with probability mass function

    α_{g_i,t} ~ Σ_{k=1}^K π_k δ_{α_{kt}},  with α_{kt} ~ N(µ_α, Σ_α),    (3.1)

where δ_x denotes the Dirac delta function concentrated at x, each α_{kt} is drawn from a normal distribution, and K is unknown. µ_α is set to the OLS estimate of α assuming a single group, and Σ_α equals 200 × Σ̂_α, where Σ̂_α is the standard deviation of the OLS estimator.
In the same fashion, we can define the DP prior for the grouped heteroskedasticity σ²_{g_i} given identical group probabilities π_k:

    σ²_{g_i} ~ Σ_{k=1}^K π_k δ_{σ²_k},  with σ²_k ~ IG(ν_σ, δ_σ),    (3.2)

where each component is drawn from an inverse-Gamma distribution. Put together, the posterior draws of the group-related coefficients can be characterized by a grouped triplet {π_k, α_k, σ²_k} for k = 1, 2, ..., with α_k = [α_{k1}, α_{k2}, ..., α_{kT}]. Importantly, the distributions of both α_k and σ²_k are discrete, because draws can only take values in the set {(α_k, σ²_k) : k ∈ Z⁺}. This nonparametric nature makes the Dirichlet Process prior an ideal choice for clustering problems, especially when the number of distinct clusters is unknown beforehand. The group parameters (α_k, σ²_k) are assumed to follow the base distribution B, an independent (non-conjugate) Multivariate Normal-Inverse-Gamma (IMNIG) distribution.

The group probabilities, in turn, are formalized through an infinite-dimensional stick-breaking prior governed by the concentration parameter a,

    π_k = ξ_k ∏_{j<k} (1 − ξ_j) for k > 1,  and  π_1 = ξ_1,    (3.3)

where the ξ_k, called stick lengths, are independent random variables drawn from the Beta(1, a) distribution. This construction can be viewed as a stick-breaking procedure: at each step, we independently and randomly break off part of what is left of a stick of unit length and assign the length of this break to the current π_k. The smaller a is, the less of the stick is left for subsequent values (on average), yielding more concentrated distributions. The concentration parameter a thus specifies how strong this discretization is: as a → 0, the realizations concentrate on a few atoms, while as a → ∞, the realizations become continuous-valued draws from the base distribution.
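The stick-breaking construction in (3.3) is straightforward to simulate. The sketch below truncates the infinite sequence once the leftover stick is negligible and pairs each weight with an atom from a normal base distribution; the tolerance and the values a = 0.5, µ_α = 0, Σ_α = 1 are illustrative choices, not the paper's settings.

```python
import numpy as np

def stick_breaking(a, tol=1e-8, rng=None):
    """Stick-breaking weights pi_k = xi_k * prod_{j<k}(1 - xi_j),
    xi_k ~ Beta(1, a), truncated once the leftover stick < tol."""
    rng = np.random.default_rng(rng)
    pis, leftover = [], 1.0
    while leftover > tol:
        xi = rng.beta(1.0, a)
        pis.append(leftover * xi)   # length broken off at this step
        leftover *= 1.0 - xi        # remainder of the stick
    return np.array(pis)

rng = np.random.default_rng(0)
pi = stick_breaking(a=0.5, rng=0)           # group probabilities
atoms = rng.normal(0.0, 1.0, size=pi.size)  # alpha_k ~ N(mu_alpha, Sigma_alpha)
```

With a small a, the first few weights absorb most of the mass, which is exactly the discreteness that drives the clustering behavior described above.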
Escobar and West (1995) show that the number of estimated groups under a DP prior is sensitive to a, which indicates that a data-driven estimate is more reasonable. To determine how discrete we want the distribution to be and how many groups are needed given the data, it is convenient to treat a as a parameter within the nonparametric Bayesian framework. Put differently, we can set up a relatively general Gamma hyperprior for a and update it based on the observations. This step generates a posterior estimate of a, which implicitly chooses the optimal K without re-estimating the model with different numbers of groups.

Finally, the prior distribution for the common parameter ρ is chosen to be a normal distribution,

    ρ ~ N(0, σ²_ρ).    (3.4)

The prior for the heterogeneous parameter β_i is likewise normal,

    β_i ~ N(0, Σ_β),  with Σ_β proportional to I_p.    (3.5)

To sum up, in the random coefficients model we specify Dirichlet Process priors for the grouped random effects α_{g_i,t} and the heteroskedasticity σ²_{g_i}, a stick-breaking process for the group probabilities π_k, a hyperprior for the concentration parameter a, and normal priors for the common parameter ρ and the heterogeneous parameters β_i.

Frequently, researchers can provide a group structure for all, or at least part, of the units based on personal expertise and the nature of the individuals. For example, firms from the same industry may share a similar growth pattern with relatively high probability; countries at the same level of development form comparable fiscal policies. Though this knowledge is typically imperfect, it can be encoded as a vector of group probabilities ω_i = [ω_{i1}, ω_{i2}, ..., ω_{iK_p}] for each unit i = 1, ..., N, where K_p is the preset number of groups. Namely, before estimating the group membership, the researcher assigns each unit to the different groups with a set of subjective group-specific probabilities, and these probabilities enter the algorithm through a prior distribution for ω_i. We name this prior for ω_i the Subjective Group Probability (SGP) Prior.
In practice, one could provide a table (for example, Table 1) documenting the subjective probability of each unit falling into a specific group.

Table 1: An example of prior group probability

In the extreme case where, for every i = 1, 2, ..., N, the researcher is fairly confident in her knowledge, she sets one of the ω_{ik} to 100%. This is equivalent to the case where the researcher exactly partitions the N units into K_p predetermined groups.

Building on the prior for the random effects model in Section 3.1.1, we allow for incorporating the researcher's prior knowledge while inheriting the ability to reallocate units and change the number of groups along the MCMC sampling. These flexible features enable the block Gibbs sampler to automatically correct and update an imprecise subjective prior, especially when K_p does not match the true number of groups.

To incorporate these subjective group probabilities, it is important to choose a proper prior for ω. The Dirichlet distribution is an applicable candidate among assorted densities, since it is conjugate to the multinomial distribution of the group memberships whenever the number of active groups (K*) in an iteration equals the presumed K_p. Let ω_i = [ω_{i1}, ω_{i2}, ..., ω_{iK_p}] be the vector of group-specific probabilities for unit i. We set the prior density for ω_i to an asymmetric Dirichlet distribution,

    ω_i ~ Dir(a_{i1}, a_{i2}, ..., a_{iK_p}),    (3.6)

where the concentration parameters a_{ik} are strictly positive. Conditional on ω_i, the group membership g_i is assumed to be drawn from a multinomial distribution,

    g_i ~ Multinomial(ω_i),  i.e.,  P(g_i = k | ω_i) = ω_{ik},  for k = 1, ..., K_p.    (3.7)

It can be shown that the posterior distribution of ω_i given g_i is also Dirichlet, with modified hyperparameters: Dir(a_{i1} + 1{g_i = 1}, a_{i2} + 1{g_i = 2}, ..., a_{iK_p} + 1{g_i = K_p}).
Hence we can directly sample ω_i from its posterior distribution. Another important property that makes the Dirichlet distribution a suitable prior is that we can tie the prior probabilities directly to the expected value of ω_{ik},

    E(ω_{ik}) = a_{ik} / Σ_{k'=1}^{K_p} a_{ik'}.    (3.8)

To integrate the researcher's prior knowledge, one only needs to choose a set of {a_{ik}} such that the expected probability matches her subjective probability for each group. Since the membership probabilities ω_{ik} are updated based on the observations, and we allow for reallocating units and changing the number of groups along the MCMC sampling, a revision of our block Gibbs sampler is needed to adjust for such changes. The details of the new algorithm are presented in Appendix A.2. In practice, we can restrict Σ_{k=1}^{K_p} a_{ik} to be 1, so that a_{ik} represents both the subjective probability of unit i belonging to group k and the prior mean of ω_{ik}.

Draws from the joint posterior distribution can be obtained by blocked Gibbs sampling. The proposed algorithm is based on Ishwaran and James (2001) and Walker (2007). Though the algorithm in Ishwaran and James (2001) has been widely used for sampling stick-breaking priors, it alone cannot fulfill our need to estimate the number of groups without any predetermined level or upper bound, since it requires a finite-dimensional prior and truncation. To avoid truncating the number of groups K, we implement the slice sampling proposed by Walker (2007) and modify the framework of Ishwaran and James (2001) with additional posterior sampling steps. Using the conjugate priors specified in Section 3.1, each parameter is directly drawn from its posterior distribution.

In Appendix A, we provide detailed derivations of the conditional distributions over which the Gibbs sampler iterates. We focus on the time-varying grouped random effects model with grouped heteroskedasticity, which is the most sophisticated specification.
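The conjugate update in (3.6)-(3.8) can be sketched in a few lines. The prior vector a_i = (0.7, 0.2, 0.1) is a hypothetical subjective probability whose entries sum to 1, so each a_ik doubles as the prior mean E(ω_ik).

```python
import numpy as np

def update_omega(a_i, g_i, rng=None):
    """Dirichlet-multinomial update: the posterior of omega_i given g_i is
    Dirichlet with the g_i-th concentration parameter incremented by 1."""
    rng = np.random.default_rng(rng)
    post = np.asarray(a_i, dtype=float).copy()
    post[g_i] += 1.0  # Dir(a_i1 + 1{g_i=1}, ..., a_iKp + 1{g_i=Kp})
    return rng.dirichlet(post), post

a_i = [0.7, 0.2, 0.1]                        # subjective group probabilities
omega_draw, post = update_omega(a_i, g_i=1)  # unit observed in group 2
post_mean = post / post.sum()                # posterior analogue of (3.8)
```

Observing g_i = 1 (the second group, with zero-based indexing) pulls the posterior mean of ω_i2 from 0.2 up to 0.6, exactly the self-correcting behavior of an imprecise subjective prior described above.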
Other specifications can be estimated by merely ignoring the time effects in the α's or shutting down the heteroskedasticity.

One feature of our proposed block Gibbs sampler is that it partitions the N units into G groups and, at the same time, generates posterior draws for the parameters. This Gibbs sampler and our BGRE estimator inevitably remind us of one of the most popular clustering algorithms in unsupervised machine learning: the Kmeans algorithm. Indeed, the Kmeans algorithm plays a crucial role in BM and BLM, who estimate grouped fixed effects from the frequentist point of view. In this subsection, we illustrate the similarity and connection between our block Gibbs sampler and the Kmeans algorithm in the limit.

We start with the Kmeans clustering algorithm. Consider a set of observations (z_1, z_2, ..., z_N), where each observation contains the dependent variable and the covariates, (y_i, x_i). Kmeans clustering aims to partition the N observations into K sets so as to minimize the within-cluster sum of squares,

    min_{{C_k}} Σ_{k=1}^K Σ_{i∈C_k} ||z_i − µ_k||²,  where  µ_k = (1/|C_k|) Σ_{i∈C_k} z_i.    (3.9)

The algorithm alternates between reassigning points to clusters and recomputing the means. In the assignment step, one computes the squared Euclidean distance from each point to each cluster mean and assigns each observation to the cluster with the nearest mean. In the update step, the algorithm recalculates the centroid of the observations assigned to each cluster and updates µ_k for all k.

In the block Gibbs sampler, we assign unit i to group k conditional on the remaining parameters, with probability

    p(g_i = k | ρ, β, α, Σ, G^(i), Y, X) = p(y_i | ρ, β_i, α_k, σ_k, Y, X) 1(u_i < π_k) / Σ_{j=1}^{K*} p(y_i | ρ, β_i, α_j, σ_j, Y, X) 1(u_i < π_j)
    = c_ik exp[−½ (y_i − ρy_{−1,i} − x_iβ_i − α_k)′ Σ_k⁻¹ (y_i − ρy_{−1,i} − x_iβ_i − α_k)] / Σ_{j=1}^{K*} c_ij exp[−½ (y_i − ρy_{−1,i} − x_iβ_i − α_j)′ Σ_j⁻¹ (y_i − ρy_{−1,i} − x_iβ_i − α_j)]
    = c_ik exp[−½ (ỹ_i − α_k)′ Σ_k⁻¹ (ỹ_i − α_k)] / Σ_{j=1}^{K*} c_ij exp[−½ (ỹ_i − α_j)′ Σ_j⁻¹ (ỹ_i − α_j)],    (3.10)

where c_ik = (2π)^(−T/2) |Σ_k|^(−1/2) 1(u_i < π_k), ỹ_i = y_i − ρy_{−1,i} − x_iβ_i, and y_{−1,i} denotes the lagged values of y_i. If we assume homoskedasticity, i.e., Σ_k = Σ for all k, then in the limit as Σ → 0 the value of p(g_i = k | ρ, β, α, Σ, G^(i), Y, X) approaches zero for all k except the one corresponding to the smallest weighted distance (ỹ_i − α_k)′ Σ_k⁻¹ (ỹ_i − α_k). In this case, this step is akin to the assignment step of Kmeans, but using a weighted Euclidean distance. Then, conditional on the newly estimated group memberships, we update the group random effects α_k through a Bayesian linear regression using only the units in group k. This step exactly recalculates the means of the new clusters, establishing the equivalence of the update step.

With this similarity in mind, it is natural to include the Kmeans algorithm in our Monte Carlo experiment and explore its performance relative to our BGRE estimator in terms of clustering accuracy. Notably, following BLM, we construct a 2-step GRE estimator equipped with the Kmeans algorithm in the first step. The performance of this 2-step estimator is assessed in Section 4.3.3.

In this section, we conduct Monte Carlo simulation experiments to examine the performance of various Grouped Random Effects (GRE) estimators under different data generating processes (DGPs) and prior assumptions. These DGPs differ in whether the random effects are time-invariant or time-varying and in whether heterogeneity is introduced in the variance of the innovations.
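The limiting argument can be checked numerically. The sketch below evaluates a homoskedastic version of the assignment probability in (3.10) (Σ_k = σ²I, with the slice indicators 1(u_i < π_k) set aside for illustration) and shows that shrinking σ² collapses it onto the group with the smallest distance, i.e., the Kmeans assignment step; all numbers are illustrative.

```python
import numpy as np

def assignment_probs(y_tilde, alphas, sigma2):
    """Group probabilities as in (3.10) with Sigma_k = sigma2 * I and the
    slice indicators dropped. Working on the log scale keeps the ratio
    numerically stable."""
    d = np.array([((y_tilde - a) ** 2).sum() for a in alphas])  # sq. distances
    logp = -0.5 * d / sigma2
    logp -= logp.max()            # log-sum-exp shift
    p = np.exp(logp)
    return p / p.sum()

y_tilde = np.array([1.0, 1.1])    # y_i - rho*y_{-1,i} - x_i*beta_i
alphas = [np.array([0.0, 0.0]), np.array([1.0, 1.0]), np.array([3.0, 3.0])]
soft = assignment_probs(y_tilde, alphas, sigma2=1.0)   # smooth probabilities
hard = assignment_probs(y_tilde, alphas, sigma2=1e-6)  # ~ one-hot
nearest = int(np.argmin([((y_tilde - a) ** 2).sum() for a in alphas]))
```

As σ² → 0, `hard` puts essentially all of its mass on `nearest`, mirroring a Kmeans assignment, while σ² = 1 retains genuine posterior uncertainty about the membership.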
Such designs allow us to examine not only how our approach performs under DGPs with particular features, but also how reliably it estimates the number of clusters. We consider a setting with sample size N = 100 and T = 11, where the last period is reserved for evaluating the one-step-ahead forecasts. The true number of groups is K = 4. Given N and K, we partition the entire sample into K balanced blocks with N/K units in each block. For each DGP, 100 datasets are generated, and we run the block Gibbs sampler on each dataset for several thousand iterations after a burn-in of 5,000 draws.
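The balanced partition of units into groups can be sketched as follows; the remainder rule for non-integer N/K follows the footnote to the DGPs, and the labels are illustrative.

```python
import numpy as np

def balanced_groups(N, K):
    """Assign units to K balanced blocks: floor(N/K) units in groups
    1..K-1; the last group absorbs any remainder."""
    base = N // K
    sizes = [base] * (K - 1) + [N - base * (K - 1)]
    return np.repeat(np.arange(1, K + 1), sizes)

g = balanced_groups(100, 4)   # 25 units per group when N/K is an integer
```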
The Monte Carlo simulation is based on the dynamic panel data model in (2.1), in which we suppress the exogenous predictors x_it for simplicity. In short, we consider four linear dynamic DGPs in this section. DGP1 and DGP2 involve time-invariant random effects, while time-varying random effects are allowed in DGP3. Moreover, DGP1 and DGP3 assume homoskedasticity, whereas DGP2 has heteroskedastic innovations. DGP4 is a standard panel data model without a group structure. Throughout these DGPs, the random effects α_k and the idiosyncratic errors ε_it are normally distributed, independent across i, k, and t, and mutually independent; ε_it is independent of all regressors. The data are simulated according to the following DGPs:

DGP1:
Time-invariant grouped random effects, homoskedasticity (Grp Ti-Homo). This DGP is the most naive panel data model with a group pattern in the random effects:

    y_it = α_{g_i} + ρ y_{it−1} + ε_it,

with ε_it ~ iid N(0, σ²_ε) and α_k ~ iid N(k, σ²_α) for k = 1, 2, ..., K.

DGP2:
Time-invariant grouped random effects, heteroskedasticity (Grp Ti-Hetero). This DGP incorporates heteroskedasticity, which leads to a slightly more complicated process that is harder to estimate:

    y_it = α_{g_i} + ρ y_{it−1} + ε_it,

with ε_it ~ iid N(0, σ²_{g_i}), where the group-level variance σ²_k is a linear function of the group index k, and α_k ~ iid N(k, σ²_α) for k = 1, 2, ..., K.

DGP3:
Time-varying grouped random effects, homoskedasticity (Grp Tv-Homo). So far, we have focused on time-invariant models. But when estimating on real data, it is reasonable to believe that the random effects could have a time pattern. Hence, we introduce various patterns of time variation:

    y_it = α_it + ρ y_{it−1} + ε_it,

with ε_it ~ iid N(0, σ²_ε) and α_it ~ iid N(α_{g_i,t}, σ²_α), where the group mean α_{g_i,t} varies across periods and groups as depicted in Figure 1. To enrich the patterns of time-varying random effects, we construct 4 different paths. Group 1 has a constant mean for α_i. The means for α_i in group 2 are also constant but experience a structural change at T = 5. Groups 3 and 4 are equipped with monotonically increasing and decreasing means, respectively.

[Figure 1: Mean of Random Effects for DGP3, K = 4]

(Footnote: If N/K is not an integer, we use ⌊N/K⌋ units for groups 1, 2, ..., K − 1; the last group contains the residual units.)

DGP4:
Time-invariant random effects, homoskedasticity, no group structure (Std Ti-Homo). This is the standard panel data model with unit-specific random effects and identical variance of the innovations:

    y_it = α_i + ρ y_{it−1} + ε_it,

with ε_it ~ iid N(0, σ²_ε) and α_i ~ iid N(0, σ²_α).

We consider four types of BGRE estimators that differ in the assumptions made on the random effects (RE) and the variance of the errors: (1) time-invariant grouped RE with homoskedasticity (Ti-Homo); (2) time-invariant grouped RE with heteroskedasticity (Ti-Hetero); (3) time-varying grouped RE with homoskedasticity (Tv-Homo); and (4) time-varying grouped RE with heteroskedasticity (Tv-Hetero). For instance, the Ti-Homo estimator assumes the true model to have time-invariant grouped random effects and the variance of the error terms to be constant across units. Besides the results shown below, we modify the DGPs and conduct further experiments using (a) a larger variance of the error terms, (b) a shorter time span, and (c) a different true number of groups K. These additional results are available in Appendix E.

Regarding alternative estimators, we consider the following Bayesian estimators that make different prior assumptions on the random effects α_i. (1) Bayesian pooled estimator (Pooled): α_i is treated as a common parameter, like ρ; this means all units share the same level of α_i. (2) Flat-prior estimator (Flat): assume p(α_i) ∝ 1; this amounts to drawing samples from a posterior whose mode is the MLE estimate. Given the estimate of the common parameter, there is no pooling across units, and the α_i's are estimated using only their own history. (3) Parametric-prior estimator (Param): assume α_i ~ N(µ, π), where a Normal-Inverse-Gamma hyperprior is further imposed on (µ, π); this prior can be thought of as a limit case of the DP prior when the concentration parameter a → 0, so there is only one cluster and (µ, π) are directly drawn from the base distribution.

Table 2 shows the estimation comparison among the alternative predictors. For DGP1, the Ti-Homo and Ti-Hetero estimators are the best in every aspect. This is as expected, since they correctly model the time-invariant random effects. Among these two estimators, allowing for group-level heteroskedasticity lowers the estimated number of groups, as Ti-Hetero underestimates the number of groups. The coverage probability, however, is not well-controlled: both are below the nominal coverage of 0.95. The Flat estimator also performs well in terms of the RMSE of ρ. Nonetheless, its coverage probability is relatively low: only 23% of the credible sets contain the true values, due to the relatively large biases in the α_i. The remaining predictors are considerably worse.

(Footnote: For the Tv-Homo and Tv-Hetero estimators, as we allow for time effects in α_i, we use the most recent α_iT to make the one-step-ahead prediction. This is equivalent to assuming that the law of motion of α_it is a random walk. Modeling the trend of α_it would result in a more accurate forecast, but this is beyond the scope of this paper.)

(Footnote: The Normal-Inverse-Gamma hyperprior for (µ, π) used in the Monte Carlo simulation is as follows: µ | π ~ N(m, vπ), with m equal to the pooled OLS estimator of α_i, and π ~ IG(ν_π/2, δ_π/2).)
This implies that even the correctly specified estimators tend to slightly underestimate the number of groups: for DGP1, the estimated average K equals 3.60 while the truth is 4.

For DGP2, we keep the time-invariant random effects while introducing heteroskedasticity. Ti-Homo and Ti-Hetero still dominate. Ti-Hetero generates the best results, with an accurate estimate of the number of groups, as it correctly models the heteroskedasticity, which in turn improves the estimation efficiency. The Flat estimator closely follows them, and the rest are worse.
Regarding DGP3, when time-varying random effects are introduced into the model, the Tv-Homo and Tv-Hetero estimators yield the best performance, and the estimated average K's are close to the truth. These two estimators accept somewhat larger biases in exchange for small standard deviations and short credible intervals. It is worth noting that, unlike Ti-Homo in DGP1 and Ti-Hetero in DGP2, though correctly specified, their bias for α_i is still comparatively high. This is because, for simplicity, we do not model the law of motion of α_it and simply assume α_{i,T+1} = α_{iT}, which results in a large bias in α_i.

As regards DGP4, which does not have a group structure, the Flat estimator is the best, since it does not pool cross-sectional information but estimates the unit-specific random effects. All of the BGRE estimators have almost identical performance, with the estimated number of groups close or equal to 1. Since the Pooled and Param estimators also assume no group structure, both produce estimates similar to the BGRE estimators.

Table 3 reports the predictive performance of a range of parametric forecasts. For DGP1, the best forecasts are generated by the Ti-Homo estimator, as it is correctly specified in this environment. It has the smallest RMSFE, the shortest average length of the credible set, correct coverage probability, the largest LPS, and the smallest CRPS. Although it allows for heteroskedasticity along with the time-invariant random effects, Ti-Hetero generates point forecasts as accurate as Ti-Homo's. But Ti-Hetero introduces additional uncertainty, revealed by a slightly wider credible set and a worse density forecast. Moreover, the estimators involving time-varying random effects (Tv-Homo and Tv-Hetero) worsen the forecasts. Finally, incorrectly imposing no latent group pattern substantially deteriorates the predictive performance in all respects.
Regarding DGP3, where time-varying random effects are introduced into the model, the Tv-Homo and Tv-Hetero estimators yield the best performance, and the estimated average $K$s are close to the truth. Their biases are arguably low, and they come with small standard deviations and short credible intervals. It is worth noting that, unlike Ti-Homo in DGP1 and Ti-Hetero in DGP2, although correctly specified, their bias for $\alpha_i$ is still comparatively high. This is because, for simplicity, we do not model the law of motion of $\alpha_{it}$ and simply assume $\alpha_{iT+1} = \alpha_{iT}$, which results in a large bias in $\alpha_i$.

As regards DGP4, which has no group structure, the Flat estimator is the best, since it does not pool cross-sectional information but estimates unit-specific random effects. All of the BGRE estimators perform almost identically, with the estimated number of groups close or equal to 1. Since the Pooled and Param estimators assume no group structure, both deliver estimates similar to those of the BGRE estimators.

Table 3 reports the predictive performance of a range of parametric forecasts. For DGP1, the best forecasts are generated by the Ti-Homo estimator, as it is correctly specified in this environment: it has the smallest RMSFE, the shortest average length of the credible set, correct coverage probability, the largest LPS, and the smallest CRPS. Although it additionally allows for heteroskedasticity along with the time-invariant random effects, Ti-Hetero generates a point forecast as accurate as Ti-Homo's, but it introduces extra uncertainty, revealed by a slightly wider credible set and a worse density forecast. Moreover, the estimators involving time-varying random effects (Tv-Homo and Tv-Hetero) worsen the forecast. Finally, incorrectly imposing no latent group pattern substantially deteriorates the predictive performance in all respects.

Table 2: Estimate Comparison

                 |------------------ rho-hat ------------------|  alpha-hat_i   Cluster
                 RMSE      Bias      Std      AvgL     Cov        Bias          Avg K
DGP 1 (Grp Ti Ho.)
  Ti-Homo        0.0198    0.0113    0.0120   0.0468   0.83      -0.0744        3.60
  Ti-Hetero      0.0202    0.0118    0.0120   0.0468   0.78      -0.0776        3.55
  Tv-Homo        0.2403    0.2387    0.0187   0.0712   0.06      -1.5255        1.82
  Tv-Hetero      0.2405    0.2389    0.0108   0.0689   0.07      -1.5301        1.89
  Pooled         0.2449    0.2447    0.0069   0.0268   0.00      -1.5588        1
  Flat           0.0369   -0.0344    0.0121   0.0469   0.23       0.2166        100
  Param          0.2711    0.2437    0.1148   0.4545   0.38      -1.5474        1
DGP 2 (Grp Ti He.)
  Ti-Homo        0.0226    0.0097    0.0150   0.0583   0.85      -0.0681        3.74
  Ti-Hetero      0.0112    0.0036    0.0082   0.0321   0.95      -0.0261        3.98
  Tv-Homo        0.1924    0.1885    0.0234   0.0893   0.14      -1.2289        11.44
  Tv-Hetero      0.0965    0.0894    0.0255   0.0979   0.30      -0.5925        3.42
  Pooled         0.2318    0.2316    0.0079   0.0310   0.00      -1.4905        1
  Flat           0.0493   -0.0469    0.0146   0.0567   0.06       0.2984        100
  Param          0.2576    0.2303    0.1115   0.4407   0.38      -1.4792        1
DGP 3 (Grp Tv Ho.)
  Ti-Homo        0.2726    0.2724    0.0119   0.0463   0.00      -2.1937        2.00
  Ti-Hetero      0.2741    0.2738    0.0121   0.0470   0.00      -2.2031        2.32
  Tv-Homo        0.0580    0.0525    0.0221   0.0860   0.33      -0.3679        3.89
  Tv-Hetero     0.0589    0.0534    0.0222   0.0863   0.33      -0.3743        3.85
  Pooled         0.1925    0.1923    0.0081   0.0314   0.00      -1.6381        1
  Flat           0.3230    0.3227    0.0126   0.0492   0.00      -2.5462        100
  Param          0.2172    0.1912    0.1021   0.4033   0.54      -1.6269        1
DGP 4 (Std Ti Ho.)
  Ti-Homo        0.2177    0.2170    0.0164   0.0635   0.01       0.0038        1.03
  Ti-Hetero      0.2168    0.2159    0.0165   0.0644   0.01       0.0035        1.02
  Tv-Homo        0.2216    0.2210    0.0161   0.0627   0.00       0.0037        1.01
  Tv-Hetero      0.2204    0.2198    0.0164   0.0638   0.00       0.0038        1.16
  Pooled         0.2204    0.2198    0.0161   0.0628   0.00       0.0037        1
  Flat           0.1838   -0.1817    0.0277   0.1076   0.00      -0.0032        100
  Param          0.2321    0.2201    0.0714   0.2856   0.06       0.0154        1
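The columns of Table 2 can be reproduced from Monte Carlo output along the following lines. This is a schematic sketch: `rho_hat`, `ci_lo`, and `ci_hi` stand for the posterior means and 95% credible-set bounds collected across repetitions, and the toy data are for illustration only:

```python
import numpy as np

def estimation_metrics(rho_hat, ci_lo, ci_hi, rho_true):
    """RMSE, bias, std, average credible-set length, and coverage
    of a vector of point estimates across Monte Carlo repetitions."""
    err = rho_hat - rho_true
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "Bias": float(np.mean(err)),
        "Std": float(np.std(rho_hat, ddof=1)),
        "AvgL": float(np.mean(ci_hi - ci_lo)),           # average interval length
        "Cov": float(np.mean((ci_lo <= rho_true) & (rho_true <= ci_hi))),
    }

rng = np.random.default_rng(1)
rho_true = 0.5
rho_hat = rho_true + rng.normal(0.01, 0.02, size=1000)   # toy estimates
ci_lo, ci_hi = rho_hat - 0.04, rho_hat + 0.04            # toy credible sets
print(estimation_metrics(rho_hat, ci_lo, ci_hi, rho_true))
```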
$\rho$ and $\alpha_i$. But Ti-Homo fails in the set and density forecasts, which illustrates the importance of modeling heteroskedasticity. Again, the remaining estimators suffer, apart from the Flat estimator.

In DGP3, Ti-Homo and Ti-Hetero do badly by not capturing the time effects in $\alpha_{g_i}$. Tv-Homo and Tv-Hetero are the best, beating the rest by a large margin, and are equally accurate in this setup.
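For reference, the density-forecast metrics used in Table 3, the log predictive score (LPS) and the continuous ranked probability score (CRPS), are available in closed form when the predictive density is Gaussian. The sketch below assumes such a density with mean `mu` and standard deviation `sigma`; the CRPS formula is the standard closed form for a normal predictive distribution:

```python
import numpy as np
from math import erf

def norm_pdf(z):
    return np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)

def norm_cdf(z):
    return 0.5 * (1 + np.vectorize(erf)(z / np.sqrt(2)))

def gaussian_lps(y, mu, sigma):
    """Average log predictive score of N(mu, sigma^2) at realizations y."""
    z = (np.asarray(y) - mu) / sigma
    return float(np.mean(-0.5 * z ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)))

def gaussian_crps(y, mu, sigma):
    """Average CRPS of a N(mu, sigma^2) predictive density (closed form)."""
    z = (np.asarray(y) - mu) / sigma
    crps = sigma * (z * (2 * norm_cdf(z) - 1) + 2 * norm_pdf(z) - 1 / np.sqrt(np.pi))
    return float(np.mean(crps))

y = np.array([0.1, -0.3, 0.2])   # toy realizations
print(gaussian_lps(y, mu=0.0, sigma=1.0))
print(gaussian_crps(y, mu=0.0, sigma=1.0))
```

A higher LPS and a lower CRPS indicate a better density forecast, matching the direction used in the text.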
The coverage probability for these two estimators is slightly lower than that of Ti-Homo and Ti-Hetero, in part due to the uncertainty introduced by the larger number of parameters of interest. The Pooled, Flat, and Param estimators neglect both the group structure and the time-varying random effects, and hence generate poor forecasts.

Regarding DGP4, all estimators besides Param produce comparable forecasts, as all of them detect no group structure in this environment (all estimated $K$s are close to 1). However, Param suffers from high variance, and it always generates the widest credible interval and the worst density forecast, as in the other DGPs.

This section conducts four sets of Monte Carlo simulation experiments that compare the BGRE estimator with several alternatives: (1) the GFE estimator proposed by BM, (2) a two-step GRE estimator with Kmeans, (3) the BGRE estimator with the subjective group prior, and (4) the BGRE estimator with the true $K$ imposed. The main text omits some of the estimators considered in the previous section and focuses on the correctly specified estimator for each DGP.

In addition to the four DGPs specified in Section 4.1, we design three new DGPs that inherit the main features of DGP1, DGP2, and DGP3, including the balanced group structure and the unit variance structure. However, instead of drawing $\alpha_{g_i}$ from the normal distribution, we use a constant $\alpha_k$ for each group. In this way, we can focus on the clustering results rather than running repetitions to average out the randomness brought by the random effects. We impose $K = 4$.

DGP5:
Time-invariant grouped fixed effects, homoskedasticity.

$$y_{it} = \alpha_{g_i} + \rho y_{it-1} + \varepsilon_{it},$$
with $\rho$ as in the baseline DGPs, $\alpha_k = k$ for $k = 1, \ldots, K$, and $\varepsilon_{it} \overset{iid}{\sim} N(0, 1)$.

DGP6:
Time-invariant grouped fixed effects, heteroskedasticity.

$$y_{it} = \alpha_{g_i} + \rho y_{it-1} + \varepsilon_{it},$$

with $\rho$ as in the baseline DGPs, $\alpha_k = k$ for $k = 1, \ldots, K$, and $\varepsilon_{it} \overset{iid}{\sim} N\!\left(0, \sigma^2_{g_i}\right)$, where $\sigma_k$ varies across the groups $k = 1, \ldots, K$.

DGP7:
Time-varying grouped fixed effects, homoskedasticity.

$$y_{it} = \alpha_{g_i t} + \rho y_{it-1} + \varepsilon_{it},$$

with $\varepsilon_{it} \overset{iid}{\sim} N(0, 1)$ and $\alpha_{g_i t}$ varying across periods and groups as depicted in Figure 1.

In this experiment, we compare our BGRE estimator with BM's GFE estimator. In particular, we assess the performance of the point forecast of $y$ and the accuracy of the coefficient estimates (the group random effects $\alpha$ and the common parameter $\rho$).

To compare the estimators, we use the default numerical settings in BM. It is worth noting that BM relies on information criteria to select the optimal number of groups ex post. Hence we consider at most 10 groups and estimate the number of groups $K$ according to the following Akaike information criterion (AIC):

$$\mathrm{AIC}(k) = \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} \left( y_{it} - \hat\rho^{(k)} y_{i,t-1} - \hat\beta_i^{(k)\prime} x_{i,t-1} - \hat\alpha^{(k)}_{\hat g_i t} \right)^2 + \hat\sigma^{2(k)} \, \frac{k(T + N - k)}{NT},$$

where $\hat\sigma^{2(k)}$ is a consistent estimate of the variance of $\varepsilon_{it}$:

$$\hat\sigma^{2(k)} = \frac{1}{NT - K_{\max} T - N - (p+1)} \sum_{i=1}^{N} \sum_{t=1}^{T} \left( y_{it} - \hat\rho^{(k)} y_{i,t-1} - \hat\beta_i^{(k)\prime} x_{i,t-1} - \hat\alpha^{(k)}_{\hat g_i t} \right)^2.$$

The results are shown in Table 4. As BM propose two algorithms, we present the results for four versions of the GFE estimator. The first two estimators (GFE_a0 and

The default settings are as follows: (1) Number of groups = 4; Number of covariates = 1; Standard errors: 0 (no standard errors). (2) For algorithm 0, Number of simulations = 100. (3) For algorithm 1, Number of simulations = 10, Number of neighbors = 10, Number of steps = 10.

We also tried the alternative choice $\hat\sigma^{2} \, \frac{kT + N + p + 1}{NT} \ln(NT)$ for the penalty. This corresponds to the default BIC used in Bonhomme and Manresa (2015). We found that, in this case, the BIC selected the smallest possible number of groups for all DGPs, i.e., no group structure, whereas the truth is $K =$
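A grouped panel such as DGP5 can be simulated in a few lines. The value of $\rho$ below is illustrative only, since the paper's exact value is not reproduced here; groups are balanced and $\alpha_k = k$ as in the DGP description:

```python
import numpy as np

def simulate_dgp5(N=100, T=10, K=4, rho=0.5, seed=0):
    """Simulate y_it = alpha_{g_i} + rho * y_{i,t-1} + eps_it with
    constant group effects alpha_k = k and eps_it ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    g = np.repeat(np.arange(1, K + 1), N // K)   # balanced group memberships
    alpha = g.astype(float)                      # alpha_k = k
    y = np.zeros((N, T + 1))
    y[:, 0] = alpha / (1 - rho)                  # initialize at the stationary mean
    for t in range(1, T + 1):
        y[:, t] = alpha + rho * y[:, t - 1] + rng.normal(size=N)
    return y[:, 1:], g

y, g = simulate_dgp5()
print(y.shape)  # (100, 10)
```

DGP6 follows by drawing the innovations with group-specific standard deviations, and DGP7 by letting `alpha` depend on `t`.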
4. Moreover, other forms of the BIC would always select the largest $K$. Due to the inaccurate estimate of the group structure and the substantially poor performance, we do not show the results with the default BIC.

These two algorithms can generate different estimates, as shown in the case of DGP6 below.
GFE_a1) are equipped with the Iterative and the Variable Neighborhood Search algorithm, respectively. We impose the true number of groups $K$ for the other two estimators (GFE*_a0 and GFE*_a1), i.e., we do not perform model selection but set $\hat K = K = 4$. Even with the true $K$ imposed, GFE*_a0 and GFE*_a1 are still straggling and worse than their counterparts with $\hat K$ selected by the information criterion. GFE*_a0 and GFE*_a1 generate relatively high biases for both $\alpha_i$ and $\rho$. In the case of DGP7, Tv-Homo and Tv-Hetero still perform better than the GFE estimators whose models are chosen by the information criterion. Furthermore, Tv-Homo and Tv-Hetero perform only marginally worse than the GFE*_a0 and GFE*_a1 estimators with the true $K$ imposed. This is because they overestimate the number of groups in some posterior draws, and hence, on average, the posterior mean forecast and estimates are slightly off.

Moreover, the accuracy of the GFE estimator is profoundly affected by the choice of the information criterion. We implement several information criteria proposed by Bai and Ng (2002) in this Monte Carlo experiment and find that no single criterion consistently selects the correct number of groups, or even comes close to the truth. As the GFE estimator is designed for the time-varying model, we finally select the AIC given above, which chooses a model closest to the true model in the time-varying DGPs. But even this deliberately selected AIC fails to improve the performance of the GFE estimator in DGP7. Once we switch to other forms of the AIC or BIC, these results no longer hold.
These facts also emphasize the importance of not relying on ex post model selection, and they highlight the advantage of our Bayesian estimators.

In this section, we explore whether a subjective group structure improves the accuracy of forecasts and group clustering. We conduct two Monte Carlo experiments corresponding to the SGP prior defined in Section 3.1.2.

We consider five scenarios, each of which differs in the structure of the subjective prior probability and hence the prior group probability $\pi$. The exact specification is given in Table 5. The first three scenarios set the preset number of groups to the truth $K$, whereas
Table 4 (excerpt):

                 rho-hat / alpha-hat / Group
                 RMSFE     Error     Std      Bias      Error     Avg K
DGP 5 (Grp Ti Ho.)
  Ti-Homo        0.7806    0.0261    0.7801   0.0214   -0.1359    4.63
  Ti-Hetero      0.7829    0.0254    0.7825   0.0212   -0.1353    4.81
  Tv-Homo        0.8062    0.0904    0.8011   0.3261   -2.1184    1.00
  Tv-Hetero      0.7995    0.0883    0.7946   0.3262   -2.1183    1.52
  GFE_a0
  GFE_a1
  GFE*_a0
  GFE*_a1

Note: GFE* = GFE estimator with the true K; a1 = algorithm 1: Variable Neighborhood Search; a0 = algorithm 0: Iterative Search.
($\omega_{ik} = 1/K$ for all $i, k$) in scenario 3. Scenario 2 is an intermediate case where the researcher is less confident in her knowledge and correctly assigns a unit to its group with a prior probability of 70% (the other groups equally split the remaining 30%). For scenarios 4 and 5, the number of groups differs from the truth. We assume the researcher divides all units into $K' \neq K$ even groups with a prior probability of 100%.

Table 5: Simulation Design: Subjective Group Probability

Scenario   Groups   Prior structure
1          K        very confident, assign 100% to the correct group
2          K        less confident, assign 70% to the correct group
3          K        uninformed, evenly assign 1/K x 100% to each group
4          K - 1    confident, evenly divide units into K - 1 groups with 100%
5          K + 1    confident, evenly divide units into K + 1 groups with 100%

Remember that the prior knowledge is the most accurate in scenario 1, where the researcher effectively knows the true group structure. In this regard, a clear gain in estimation emerges, as the RMSE for $\rho$ and the biases for $\rho$ and $\alpha_i$ generated by SGP-RE1

The last two scenarios aim to evaluate the performance of the SGP prior when the number of groups is wrong. Instead of randomly assigning a unit to each group with a set of probabilities, we assume the econometrician is confident in her prior and sets 100% for a particular group. In particular, we assign the first $N/K'$ units to group 1, the next $N/K'$ units to group 2, and so on. We also run other designs for scenarios 4 and 5 with different prior probabilities. The results show that, as long as a certain share of units is correctly clustered into groups, the performance of the SGP-RE estimators is slightly better than that of the BGRE estimator. The full results are presented in Appendix E.5.
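The subjective prior probabilities of Table 5 can be encoded as an $N \times K$ matrix $\omega$, where $\omega_{ik}$ is the prior probability that unit $i$ belongs to group $k$. The sketch below assumes the true memberships `g` are known, as they are in the simulation design:

```python
import numpy as np

def subjective_prior(g, K, p_correct):
    """Build omega (N x K): each unit places prior probability p_correct on
    its true group, and the remainder is split evenly over the other groups."""
    N = len(g)
    omega = np.full((N, K), (1.0 - p_correct) / (K - 1))
    omega[np.arange(N), g] = p_correct
    return omega

g = np.array([0, 1, 2, 3, 0, 1])                    # true group labels (0-based)
omega1 = subjective_prior(g, K=4, p_correct=1.0)    # scenario 1: 100% correct
omega2 = subjective_prior(g, K=4, p_correct=0.7)    # scenario 2: 70% correct
omega3 = np.full((len(g), 4), 1.0 / 4)              # scenario 3: uninformed
print(omega2[0])  # [0.7 0.1 0.1 0.1]
```

Scenarios 4 and 5 follow the same construction with $K' = K - 1$ or $K' = K + 1$ columns and block assignment of units to groups.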
are the smallest among all scenarios. Although the preset number of groups equals the truth ($K' = K$) in scenario 3, the uninformative prior forces the algorithm to consider other, incorrect groups with a large probability ($= 1 - 1/K$), and hence deteriorates the performance of both the estimates and the forecasts. In terms of the one-step-ahead forecast, SGP-RE1 leads the rest by scoring the lowest values on each metric (and the highest LPS), closely followed by SGP-RE2. SGP-RE3 suffers in terms of the point and set forecasts. As for scenarios 4 and 5, despite the flexibility of allowing $a$ and $K'$ to change along the MCMC sampling, the incorrect specification of the group number deteriorates the estimation, and both fail to deliver reliable estimates of $\rho$ and $\alpha$. Nonetheless, such prior structures help the point and density forecasts. Both estimators beat the benchmark and generate LPS values comparable to those of SGP-RE1 and SGP-RE2.
This valuable improvement mainly results from the fact that the algorithm can exploit prior knowledge of the group structure that correctly partitions merely a fraction of the units, and adapt the number of groups accordingly.

In short, regarding the overall performance of the SGP prior, the best case is when the researcher has a relatively confident prior and knows the true number of groups. In this case, the SGP-RE estimator dominates the Tv-Hetero estimator from every angle. In practice, however, we rarely come up with such a precise prior, owing to an incomplete understanding of the population behind the data. Instead, we might be less confident in our knowledge or even specify more or fewer groups than the truth. Under these circumstances, the SGP-RE estimator can still deliver a better density forecast, because it exploits the prior information and adapts via the scheme featured in our Bayesian method.

In this section, we compare our BGRE estimator with a two-step GRE estimator, where units are clustered into groups in the first step using
the Kmeans algorithm, and the model is then estimated in the second step with group-specific heterogeneity. Unlike BLM, we implement the Bayesian framework in the second step to echo the other Bayesian estimators presented in the previous section. This two-step procedure allows us to examine the clustering accuracy of Kmeans relative to our full Bayesian estimates, as the two-step GRE estimators can be viewed as GRE estimators with the group structure fixed in the first step.
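The first step of the two-step procedure can be sketched with scikit-learn's KMeans applied to the unit-level trajectories. Only the clustering step is shown; the second (Bayesian) estimation step is not reproduced here, and the simulated data are purely illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

def two_step_groups(y, K):
    """Step 1: cluster N units on their T-dimensional trajectories.
    y is an N x T array; returns estimated group labels."""
    km = KMeans(n_clusters=K, n_init=10, random_state=0).fit(y)
    return km.labels_

rng = np.random.default_rng(0)
g = np.repeat(np.arange(4), 25)                       # 4 true groups, N = 100
y = g[:, None] * 2.0 + rng.normal(0, 0.5, (100, 8))   # well-separated trajectories
labels = two_step_groups(y, K=4)
print(len(np.unique(labels)))  # 4
```

In the second step, the estimated `labels` would replace the latent memberships, so any clustering error made here propagates into the group-level coefficient estimates, which is the mechanism behind the results discussed below.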
Note: the value of each relative metric is capped at 100% to enhance readability. The number in each subpanel represents the (original) value of the metric for the benchmark model (Tv-Hetero).
Figure 3: Monte Carlo Experiment: Forecast, SGP Prior
We report the results for the estimates and forecasts in Figures 4 and 5. Each bar represents the performance of the two-step GRE estimators relative to that of the original GRE estimator: a bar above zero indicates that the two-step estimator underperforms the benchmark, while the two-step estimator is better when its bar is negative. The main models are correctly specified for each DGP, i.e., Ti-Homo for DGP1, Ti-Hetero for DGP2, and Tv-Homo for DGP3.

Figure 4 presents the point estimates for each DGP. We document the root mean squared error, absolute bias, standard deviation, and average length of the 95% credible set for the common parameter $\rho$, while the metric for the random effects is the absolute bias. According to these measures, the two-step GRE estimators perform worse than the Bayesian GRE estimators, as they introduce much higher bias in the estimates of $\rho$ (and hence a larger RMSE for $\rho$) and $\alpha_i$. It is worth noting that equipping the estimator with Kmeans barely affects the standard deviation and the average length of the 95% credible set of $\rho$.

The inferior performance of the two-step GRE estimators is due to the inaccurate estimate of the group structure. Table 6 reports the estimated number of groups from the two-step GRE estimators with Kmeans and from the BGRE estimators. Regarding the clustering performance, the Kmeans algorithm severely underestimates the number of groups, preferring far fewer groups, while the true number is 4. Meanwhile, our BGRE approach estimates the number of groups accurately, though it slightly underestimates it in DGP1.

Table 6: Number of groups, Kmeans vs. BGRE

          DGP 1   DGP 2   DGP 3
Kmeans     2.20    2.20    2.26
BGRE       3.60    3.98    3.85

Figure 5 shows the point, set, and density forecasts for each DGP.
As Kmeans fails to estimate the group structure, none of the two-step GRE estimators outperforms the GRE estimators. That is, Kmeans clustering does not help make a more accurate forecast; instead, it generates much higher forecast bias and standard deviation. The full results are presented in Appendix E.4.
Note: the value of each relative metric is capped at 200% to enhance readability.
Figure 5: Monte Carlo Experiment: Forecast, Two-Step GRE with Kmeans
Note: the value of each relative metric is capped at 50% to enhance readability.
In this section, we illustrate the use of the Bayesian Grouped Random Effects estimators in a cross-firm study. We revisit the investment regression and use a different version of the dynamic grouped panel model to forecast the investment rate for a panel of firms across all industries. Instead of using the traditional Tobin's Q-type investment regression, we implement a new scheme proposed by Gala et al. (2019), who directly estimate the corporate investment rate without Tobin's Q. Again, our main focus is the one-step-ahead point, set, and density forecast. Due to space limitations, we only report the forecast results for the most recent year in the main text. Summary statistics and additional implementation details are provided in Appendices F and G.

We consider a general model with grouped latent heterogeneity in $\alpha_i$. Following Hsiao and Tahmiscioglu (1997) and Gala et al. (2019), the investment equation is specified as

$$\left(\frac{I}{K}\right)_{it} = \alpha_{g_i t} + \rho \left(\frac{I}{K}\right)_{i,t-1} + \beta_{1i} \left(\frac{CF}{K}\right)_{i,t-1} + \beta_{2i} \ln K_{i,t-1} + \beta_{3i} \ln\left(\frac{Y}{K}\right)_{i,t-1} + \varepsilon_{it}, \qquad (5.1)$$

where the capital stock $K_{it}$ is defined as net property, plant, and equipment; $I_{it}$ is capital investment; $CF_{it}$ is a liquidity variable defined as cash flow minus dividends; $Y_{it}$ is end-of-year sales; and the $\varepsilon_{it}$ are normally distributed error terms. The subscript $i$ denotes companies, and $t$ denotes time. Unlike the commonly specified investment equation using Tobin's Q, the additional terms, including the natural logarithm of lagged capital and the sales-to-capital ratio, are based on the regression proposed by Gala et al. (2019). The lagged value of the investment rate is included as an explanatory variable to avoid endogeneity problems.

As we focus on forecasting, we can relax a few assumptions to achieve better predictive performance.
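Constructing the regressors in (5.1) from a firm-year panel is mechanical. The sketch below assumes a pandas DataFrame with hypothetical column names `capx` (capital investment), `ppent` (net property, plant, and equipment), `cf` (cash flow minus dividends), and `sale` (sales); the investment rate divides investment by the beginning-of-year capital stock, as in the footnote:

```python
import pandas as pd
import numpy as np

def build_regression_panel(df):
    """Create (I/K), (CF/K), ln K, ln(Y/K) and their one-year lags.
    df holds one row per firm-year."""
    out = df.sort_values(["firm", "year"]).copy()
    # investment rate: capex over beginning-of-year capital stock
    out["ik"] = out["capx"] / out.groupby("firm")["ppent"].shift(1)
    out["cfk"] = out["cf"] / out["ppent"]
    out["lnk"] = np.log(out["ppent"])
    out["lnyk"] = np.log(out["sale"] / out["ppent"])
    for col in ["ik", "cfk", "lnk", "lnyk"]:
        out[col + "_lag"] = out.groupby("firm")[col].shift(1)
    return out.dropna()

df = pd.DataFrame({
    "firm": ["A"] * 4 + ["B"] * 4,
    "year": [2000, 2001, 2002, 2003] * 2,
    "capx": [1.0, 1.2, 1.1, 1.3, 2.0, 2.1, 2.2, 2.0],
    "ppent": [10.0, 11, 12, 12.5, 20, 21, 22, 23],
    "cf": [2.0, 2.2, 2.1, 2.4, 3.0, 3.1, 3.3, 3.2],
    "sale": [15.0, 16, 17, 18, 30, 31, 33, 34],
})
panel = build_regression_panel(df)
print(panel.shape[0])  # 4 usable firm-years after lagging
```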
These assumptions include time-invariant random effects $\alpha_{g_i}$, homoskedasticity in $\sigma_i$, and homogeneous coefficients for all dependent variables ($\beta_{i \cdot} = \beta_i$). Table 7 summarizes the estimators and their properties that we consider in this section. The implementation of time-invariant RE and homoskedasticity is similar to the one in the previous section, i.e., we construct four versions of the BGRE estimator: Ti-Homo, Ti-Hetero, Tv-Homo, and Tv-Hetero. Despite the fact that homogeneous slopes have been frequently rejected in empirical

(Footnote: The investment rate for a firm in a particular year is defined as the fraction of capital expenditures in property, plant, and equipment in terms of the beginning-of-year capital stock.)
Table 7: Estimators and their properties

                        Time-invariant α_i   Homoskedasticity   Group structure
Homogeneous coef.
  Ti-Homo                       X                   X                  X
  Ti-Hetero                     X                                      X
  Tv-Homo                                           X                  X
  Tv-Hetero                                                            X
  Flat                          X                   X
  Pooled                        X                   X
  Param                         X                   X
Heterogeneous coef.
  Ti-Homo                       X                   X                  X
  Ti-Hetero                     X                                      X
  Tv-Homo                                           X                  X
  Tv-Hetero                                                            X
  Flat                          X                   X
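To make the grouped structure in (5.1) concrete, the following minimal simulation generates a dynamic panel whose intercepts are shared within latent groups. The group count, coefficient values, and the generic regressors standing in for $(CF/K)$, $\ln K$, and $\ln(Y/K)$ are illustrative assumptions, not the paper's estimates.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, K = 337, 15, 3                     # firms, years, latent groups (illustrative)
alpha = np.array([-0.3, 0.0, 0.4])       # group-level random effects, one per group
g = rng.integers(0, K, size=N)           # latent group membership g_i
rho = 0.5                                # common AR coefficient
beta = np.array([0.2, -0.1, 0.1])        # homogeneous slope coefficients
X = rng.normal(size=(N, T, 3))           # stand-ins for (CF/K), ln K, ln(Y/K)

Y = np.zeros((N, T))                     # investment rate (I/K)_it, initialized at 0
for t in range(1, T):
    eps = rng.normal(scale=0.1, size=N)  # normally distributed errors
    Y[:, t] = alpha[g] + rho * Y[:, t - 1] + X[:, t - 1, :] @ beta + eps
```

Units in the same group share an intercept $\alpha_{g_i}$ while slopes are common, which corresponds to the homogeneous-coefficient BGRE specification with time-invariant effects.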
The individual company data are obtained from the COMPUSTAT Annual database. To account for potential structural breaks and the advanced speed of capital accumulation in recent decades, our sample is composed of a balanced panel of firms for the years 2000 to 2019 that includes firms from all industries with no missing values in the accounting data. We keep only firm-years that have the non-missing information required to construct the primary variables of interest, namely the capital stock $K$, investment $I$, liquidity $CF$, and sales revenues $Y$. Further details on constructing the sample can be found in Appendix F. The final sample comprises 337 firms, with 20 observations for each firm.

To examine the performance of the various estimators with limited observations, we choose to use a rolling window of 15 years. In this sense, we create five balanced panels which end in years 2014, ..., 2018 ($t = T$), respectively. The observations in the next year ($t = T + 1$) are reserved for forecast evaluation.
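The rolling-window design above can be enumerated directly; the tuple representation of each panel below is an assumption for illustration.

```python
# Five 15-year rolling estimation windows over the 2000-2019 balanced panel;
# the window ending in T is used for estimation and year T + 1 for evaluation.
years = list(range(2000, 2020))
window = 15

panels = []
for T_end in range(2014, 2019):                 # windows end in 2014, ..., 2018
    in_sample = [y for y in years if T_end - window < y <= T_end]
    panels.append((in_sample, T_end + 1))       # (estimation years, hold-out year)

assert len(panels) == 5
assert panels[0][0] == list(range(2000, 2015))  # first window: 2000-2014
assert panels[-1][1] == 2019                    # last hold-out year
```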
We begin the empirical analysis by comparing the performance of the point, set, and density forecasts for the last panel (in-sample period: 2005 - 2018). We aim to forecast the investment rate in 2019. We consider all the model specifications depicted in Table 7, and their performance is presented in Table 8. Throughout the analysis, the Flat estimator serves as the benchmark, as it essentially assumes individual effects. In Table 8, the third column shows the RMSFE for the one-step-ahead forecast. For the panel considered in the table, we first notice that the best model is Tv-Hetero (time-varying random effects, heteroskedasticity) in the homogeneous coefficients specification. It outperforms the benchmark Flat estimator by 25%. Ti-Hetero also delivers an accurate point forecast, which suggests that time effects provide merely marginal improvement.

Under the heterogeneous coefficients specification, though all of the BGRE estimators beat the Flat estimator, their RMSFEs are relatively larger. This may arise from the fact that heteroskedasticity alone can capture a great amount of the individual effects, so additionally imposing heterogeneous coefficients in $\beta_i$ may overfit the model and lead to poor forecasts.

The fourth column documents the average number of latent groups in $\alpha_i$. Most of our BGRE estimators deem a group structure with more than six underlying components.
This paper studies the estimation and prediction of a dynamic panel data model with latent grouped random effects. We adopt a nonparametric Bayesian approach to identify the coefficients and the group membership in the random effects simultaneously. This approach avoids the severe issue introduced by ex-post model selection and allows us to incorporate any form of prior knowledge on the group structure. In Monte Carlo experiments, we show that the BGRE estimators have the edge over standard Bayesian estimators. Regarding clustering, the BGRE estimators generate comparable performance with the
Kmeans algorithm. Our empirical application to investment rates across firms reveals that the estimated latent group structure provides a great amount of flexibility and improves the point, set, and density forecasts. The present work raises interesting issues for further research. First, it may be appealing
References
Ando, T. and J. Bai (2016): "Panel data models with grouped factor structure under unknown group membership," Journal of Applied Econometrics, 31, 163–191.
Antoniak, C. E. (1974): "Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems," The Annals of Statistics, 1152–1174.
Arellano, M. and S. Bond (1991): "Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations," The Review of Economic Studies, 58, 277–297.
Arellano, M. and O. Bover (1995): "Another look at the instrumental variable estimation of error-components models," Journal of Econometrics, 68, 29–51.
Bai, J. and S. Ng (2002): "Determining the number of factors in approximate factor models," Econometrica, 70, 191–221.
Bester, C. A. and C. B. Hansen (2016): "Grouped effects estimators in fixed effects models," Journal of Econometrics, 190, 197–208.
Biernacki, C., G. Celeux, and G. Govaert (2000): "Assessing a mixture model for clustering with the integrated completed likelihood," IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 719–725.
Blundell, R. and S. Bond (1998): "Initial conditions and moment restrictions in dynamic panel data models," Journal of Econometrics, 87, 115–143.
Bonhomme, S., T. Lamadon, and E. Manresa (2019): "Discretizing unobserved heterogeneity," University of Chicago, Becker Friedman Institute for Economics Working Paper.
Bonhomme, S. and E. Manresa (2015): "Grouped patterns of heterogeneity in panel data," Econometrica, 83, 1147–1184.
Celeux, G., F. Forbes, C. P. Robert, and Titterington (2006): "Deviance information criteria for missing data models," Bayesian Analysis, 1, 651–673.
Chamberlain, G. (1980): "Analysis of covariance with qualitative data," The Review of Economic Studies, 47, 225–238.
Cheng, X., F. Schorfheide, and P. Shao (2019): "Clustering for Multi-Dimensional Heterogeneity."
Escobar, M. D. and M. West (1995): "Bayesian density estimation and inference using mixtures," Journal of the American Statistical Association, 90, 577–588.
Frühwirth-Schnatter, S. (2004): "Estimating marginal likelihoods for mixture and Markov switching models using bridge sampling techniques," The Econometrics Journal, 7, 143–167.
Frühwirth-Schnatter, S. (2011): "Panel data analysis: a survey on model-based clustering of time series," Advances in Data Analysis and Classification, 5, 251–280.
Gala, V. D., J. F. Gomes, and T. Liu (2019): "Investment without q," Journal of Monetary Economics.
Geweke, J. and G. Amisano (2010): "Comparing and evaluating Bayesian predictive distributions of asset returns," International Journal of Forecasting, 26, 216–230.
Hastie, D. I., S. Liverani, and S. Richardson (2015): "Sampling from Dirichlet process mixture models with unknown concentration parameter: mixing issues in large data implementations," Statistics and Computing, 25, 1023–1037.
Hsiao, C. and A. K. Tahmiscioglu (1997): "A panel analysis of liquidity constraints and firm investment," Journal of the American Statistical Association, 92, 455–465.
Ishwaran, H. and L. F. James (2001): "Gibbs sampling methods for stick-breaking priors," Journal of the American Statistical Association, 96, 161–173.
Kalli, M., J. E. Griffin, and S. G. Walker (2011): "Slice sampling mixture models," Statistics and Computing, 21, 93–105.
Keribin, C. (2000): "Consistent estimation of the order of mixture models," Sankhyā: The Indian Journal of Statistics, Series A, 49–66.
Kim, J. and L. Wang (2019): "Hidden group patterns in democracy developments: Bayesian inference for grouped heterogeneity," Journal of Applied Econometrics, 34, 1016–1028.
Lin, C.-C. and S. Ng (2012): "Estimation of panel data models with parameter heterogeneity when group membership is unknown," Journal of Econometric Methods, 1, 42–55.
Liu, L. (2020): "Density Forecasts in Panel Data Models: A Semiparametric Bayesian Perspective."
Liu, L., H. R. Moon, and F. Schorfheide (2019): "Forecasting with a panel tobit model," Tech. rep., National Bureau of Economic Research.
Liverani, S., D. I. Hastie, L. Azizi, M. Papathomas, and S. Richardson (2015): "PReMiuM: An R package for profile regression mixture models using Dirichlet processes," Journal of Statistical Software, 64, 1.
MacQueen, J. (1967): "Some methods for classification and analysis of multivariate observations," in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Oakland, CA, USA, vol. 1, 281–297.
Malsiner-Walli, G., S. Frühwirth-Schnatter, and B. Grün (2016): "Model-based clustering based on sparse finite Gaussian mixtures," Statistics and Computing, 26, 303–324.
McNicholas, P. D. and T. B. Murphy (2010): "Model-based clustering of longitudinal data," Canadian Journal of Statistics, 38, 153–168.
Molitor, J., M. Papathomas, M. Jerrett, and S. Richardson (2010): "Bayesian profile regression with an application to the National Survey of Children's Health," Biostatistics, 11, 484–498.
Neal, R. M. (2000): "Markov chain sampling methods for Dirichlet process mixture models," Journal of Computational and Graphical Statistics, 9, 249–265.
Neyman, J. and E. L. Scott (1948): "Consistent estimates based on partially consistent observations," Econometrica, 1–32.
Nickell, S. (1981): "Biases in dynamic models with fixed effects," Econometrica: Journal of the Econometric Society, 1417–1426.
Papaspiliopoulos, O. and G. O. Roberts (2008): "Retrospective Markov chain Monte Carlo methods for Dirichlet process hierarchical models," Biometrika, 95, 169–186.
Su, L., Z. Shi, and P. C. Phillips (2016): "Identifying latent structures in panel data," Econometrica, 84, 2215–2264.
Su, L., X. Wang, and S. Jin (2019): "Sieve estimation of time-varying panel data models with latent structures," Journal of Business & Economic Statistics, 37, 334–349.
Sun, Y. (2005): "Estimation and inference in panel structure models," Available at SSRN 794884.
Walker, S. G. (2007): "Sampling the Dirichlet mixture model with slices," Communications in Statistics—Simulation and Computation, 36, 45–54.
Yau, C., O. Papaspiliopoulos, G. O. Roberts, and C. Holmes (2011): "Bayesian non-parametric hidden Markov models with applications in genomics," Journal of the Royal Statistical Society, 73, 37–57.
Supplemental Appendix to "Forecasting with Bayesian Grouped Random Effects in Panel Data"
Boyuan Zhang
A Posterior Distributions and Algorithms
A.1 Random Effects Model
Below, we present the conditional posterior distributions for the time-varying random effects model with heteroskedasticity, which is the most complicated scenario. For the other models, such as its time-invariant counterpart and the homoskedastic model, adjustments can easily be made by eliminating the time effects and the heteroskedasticity. To facilitate the derivation, we stack the observations and parameters:

Observations: $Y = [y_1, y_2, \ldots, y_N]$, $y_i = [y_{i1}, y_{i2}, \ldots, y_{iT}]'$,
Covariates: $X = [x_1, x_2, \ldots, x_N]$, $x_i = [x_{i1}, x_{i2}, \ldots, x_{iT}]'$,
Random effects: $\alpha = [\alpha_1, \alpha_2, \ldots]$,
Covariance matrices: $\Sigma = [\sigma_1^2, \sigma_2^2, \ldots]$,
Heterogeneous coefficients: $\beta = [\beta_1, \ldots, \beta_N]$,
Stick lengths: $\Xi = [\xi_1, \xi_2, \ldots]$,
Group membership: $G = [g_1, \ldots, g_N]$,
Auxiliary variables: $u = [u_1, u_2, \ldots, u_N]$,
Hyperparameters: $\phi = [\mu_\alpha, \Sigma_\alpha, \nu_\sigma, \delta_\sigma]$.

The posterior of the unknown objects in the random effects model is
$$\begin{aligned}
p(\rho, \beta, \alpha, \Sigma, \Xi, a, G \mid Y, X)
&\propto p(Y \mid X, \rho, \beta, \alpha, \Sigma, G)\, p(\rho, \beta, \alpha, \Sigma, \Xi, a, G) \\
&\propto p(Y \mid X, \rho, \beta, \alpha, \Sigma, G)\, p(\alpha, \Sigma \mid \phi)\, p(\Xi \mid a)\, p(G \mid \Xi)\, p(\rho)\, p(\beta)\, p(a) \\
&= \prod_{i=1}^{N} p(y_i \mid x_i, \rho, \beta_i, \alpha_{g_i}, \sigma^2_{g_i}) \prod_{j=1}^{\infty} p(\alpha_j, \sigma^2_j \mid \phi) \prod_{j=1}^{\infty} p(\xi_j \mid a) \prod_{i=1}^{N} p(g_i \mid \Xi)\, p(\rho) \prod_{i=1}^{N} p(\beta_i)\, p(a) \\
&= \left[\prod_{i=1}^{N} p(y_i \mid x_i, \rho, \beta_i, \alpha_{g_i}, \sigma^2_{g_i})\, p(g_i \mid \Xi)\, p(\beta_i)\right] \left[\prod_{j=1}^{\infty} p(\alpha_j, \sigma^2_j \mid \phi)\, p(\xi_j \mid a)\right] p(\rho)\, p(a). \quad (A.1)
\end{aligned}$$

In the following derivation and algorithm, we adopt the slice sampler (Walker, 2007), which avoids the approximation in Ishwaran and James (2001). Walker (2007) augments the posterior distribution with a set of auxiliary variables $u = [u_1, u_2, \ldots, u_N]$, which are i.i.d. standard uniform random variables, i.e., $u_i \overset{iid}{\sim} U(0, 1)$.
Then the augmented posterior is written as
$$\begin{aligned}
p(\rho, \beta, \alpha, \Sigma, \Xi, a, G, u \mid Y, X)
&\propto \left[\prod_{i=1}^{N} p(y_i \mid x_i, \rho, \beta_i, \alpha_{g_i}, \sigma^2_{g_i})\, \mathbf{1}(u_i < \pi_{g_i})\, p(\beta_i)\right] \left[\prod_{j=1}^{\infty} p(\alpha_j, \sigma^2_j \mid \phi)\, p(\xi_j \mid a)\right] p(\rho)\, p(a) \\
&= \left[\prod_{i=1}^{N} p(y_i \mid x_i, \rho, \beta_i, \alpha_{g_i}, \sigma^2_{g_i})\, p(u_i \mid \pi_{g_i})\, \pi_{g_i}\, p(\beta_i)\right] \left[\prod_{j=1}^{\infty} p(\alpha_j, \sigma^2_j \mid \phi)\, p(\xi_j \mid a)\right] p(\rho)\, p(a), \quad (A.2)
\end{aligned}$$
where $\pi_{g_i} = p(g_i \mid \Xi)$ and $\mathbf{1}(\cdot)$ is the indicator function, which is equal to zero unless the specified condition is satisfied. The original posterior can be recovered by integrating out $u_i$ for $i = 1, 2, \ldots, N$. As we do not limit the upper bound on the number of groups, it is impossible to sample from an infinite-dimensional posterior density. The merit of slice sampling is that it reduces the dimension and allows us to solve a manageable problem with finite dimensions, as we will see below.

With the auxiliary variables $u = [u_1, u_2, \ldots, u_N]$, we define the largest possible number of potential components as
$$K^* = \min\left\{k : u^* > 1 - \sum_{j=1}^{k} \pi_j\right\}, \quad (A.3)$$
where
$$u^* = \min_{1 \le i \le N} u_i. \quad (A.4)$$
This specification ensures that for any group $k > K^*$ and any unit $i \in \{1, 2, \ldots, N\}$, we have $u_i > \pi_k$ (see the proof in Proposition B.1). This crucial property limits the dimensions of $\alpha$ and $\Sigma$ to $K^*$, as the densities of $\alpha_k$ and $\sigma^2_k$ equal zero for $k > K^*$ due to $\mathbf{1}(u_i < \pi_k) = 0$, which will become clear in the subsequent posterior derivation.

Next, we define the number of active groups
$$K_a = \max_{1 \le i \le N} g_i. \quad (A.5)$$
It can be shown that $K_a \le K^*$ (see the proof in Proposition B.1).

Conditional posterior of $\alpha$ (grouped random effects).
$$p(\alpha \mid \rho, \beta, \Sigma, \Xi, a, G, u, Y, X) \propto \left[\prod_{i=1}^{N} p(y_i \mid x_i, \rho, \beta_i, \alpha_{g_i}, \Sigma_{g_i})\, \mathbf{1}(u_i < \pi_{g_i})\right] \left[\prod_{j=1}^{K^*} p(\alpha_j, \sigma^2_j \mid \phi)\right].$$
For $k \in \{1, 2, \ldots, K_a\}$, define the set of units that belong to group $k$,
$$C_k = \{i \in \{1, 2, \ldots, N\} \mid g_i = k\}; \quad (A.6)$$
then the posterior density of $\alpha_k$ reads as
$$\begin{aligned}
p(\alpha_k \mid \rho, \beta, \Sigma, \Xi, a, G, u, Y, X)
&\propto \left[\prod_{i \in C_k} p(y_i \mid x_i, \rho, \beta_i, \alpha_k, \sigma^2_k)\right] p(\alpha_k \mid \phi) \\
&\propto \exp\left[-\frac{1}{2} \sum_{i \in C_k} (y_i - \rho y_{-1,i} - x_i \beta_i - \alpha_k)' \Sigma_k^{-1} (y_i - \rho y_{-1,i} - x_i \beta_i - \alpha_k)\right] \exp\left[-\frac{1}{2} (\alpha_k - \mu_\alpha)' \Sigma_\alpha^{-1} (\alpha_k - \mu_\alpha)\right] \\
&\propto \exp\left[-\frac{1}{2} (\alpha_k - \bar{\mu}_{\alpha,k})' \bar{\Sigma}_{\alpha,k}^{-1} (\alpha_k - \bar{\mu}_{\alpha,k})\right],
\end{aligned}$$
where $y_{-1,i}$ denotes the lagged values of $y_i$.
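The truncation device in (A.3)-(A.4) is easy to operationalize: grow the stick-breaking weights until the minimum slice variable exceeds the remaining stick mass. The function below is our own sketch, assuming $\xi_j \sim \mathrm{Beta}(1, a)$ sticks for the Dirichlet process weights; it is not the paper's code.

```python
import numpy as np

def truncation_level(a, u, rng):
    """Smallest K* with u* > 1 - sum_{j<=K*} pi_j, eqs. (A.3)-(A.4),
    where pi_j = xi_j * prod_{l<j}(1 - xi_l) and xi_j ~ Beta(1, a)."""
    u_star = u.min()                      # u* = min_i u_i
    pi, remaining = [], 1.0               # `remaining` = 1 - sum of weights so far
    while u_star <= remaining:            # condition in (A.3) not yet met
        xi = rng.beta(1.0, a)
        pi.append(remaining * xi)
        remaining *= 1.0 - xi
    return len(pi), np.array(pi)

rng = np.random.default_rng(1)
u = rng.uniform(size=200)                 # auxiliary slice variables u_i ~ U(0, 1)
K_star, pi = truncation_level(2.0, u, rng)
```

Only components $k \le K^*$ with $u_i < \pi_k$ can receive unit $i$, so each Gibbs pass touches finitely many groups even though the mixture is nominally infinite.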
0, which will be clear in the subsequentposterior derivation.Next, we define the number of active groups K a “ max ď i ď N g i . (A.5)It can be shown that K a ď K ˚ . Conditional posterior of α (grouped random effects) . p p α | ρ, β, Σ , Ξ , a, G, u, Y, X q9 « N ź i “ p p y i | x i , ρ, β i , α g i , Σ g i q p u i ă π g i q ff « ź j “ p p α j , σ j | φ q ff , For k P t , , ..., K a u , define a set of unit that belongs to group k , C k “ t i P t , , ..., N u| g i “ k u , (A.6)then the posterior density for α k read as p p α k | ρ, β, Σ , Ξ , a, G, u, Y, X q9 « ź i P C k p p y i | x i , ρ, β i , α k , σ k q ff p p α k | φ q9 exp « ´ ÿ i P C k p y i ´ ρy ´ ,i ´ x i β i ´ α k q Σ ´ k p y i ´ ρy ´ ,i ´ x i β i ´ α k q ff exp “ ´ p α k ´ µ α q Σ ´ α p α k ´ µ α q ‰ exp “ ´ p α k ´ ¯ µ α k q ¯Σ ´ α p α k ´ ¯ µ α k q ‰ , where y ´ ,i are lagged values for y i . Assuming an independent normal conjugate prior for See proof in proposition B.1 See proof in proposition B.1 his Version: October 6, 2020his Version: October 6, 2020
$\alpha_k$, the posterior for $\alpha_k$ is given by
\[
\alpha_k \mid \rho, \beta, \Sigma, \Xi, a, G, u, Y, X \sim N\big( \bar{\mu}_{\alpha_k}, \bar{\Sigma}_{\alpha_k} \big), \tag{A.7}
\]
where
\[
\bar{\Sigma}_{\alpha_k} = \Big( \Sigma_\alpha^{-1} + \sum_{i \in C_k} \Sigma_i^{-1} \Big)^{-1}, \qquad
\bar{\mu}_{\alpha_k} = \bar{\Sigma}_{\alpha_k} \Big[ \Sigma_\alpha^{-1} \mu_\alpha + \sum_{i \in C_k} \Sigma_i^{-1} \tilde{y}_i \Big], \qquad
\tilde{y}_i = y_i - \rho y_{-1,i} - x_i \beta_i.
\]
If group $k$ is empty, we draw $\alpha_k$ from its prior $N(\mu_\alpha, \Sigma_\alpha)$.

\textbf{Conditional posterior of $\Sigma$ (grouped variance).} Under cross-sectional independence, for $k = 1, 2, \ldots, K_a$,
\[
p(\sigma_k^2 \mid \rho, \beta, \alpha, \Xi, a, G, u, Y, X) \propto \Big[ \prod_{i \in C_k} p(y_i \mid x_i, \rho, \beta_i, \alpha_k, \sigma_k^2) \Big] \, p(\sigma_k^2 \mid \phi).
\]
Assuming an inverse-gamma prior $\sigma_k^2 \sim IG(v_\sigma, \delta_\sigma)$, the posterior distribution of $\sigma_k^2$ is
\begin{align*}
p(\sigma_k^2 \mid \rho, \beta, \alpha, G, u, Y, X)
&\propto \prod_{i \in C_k} \Big[ (\sigma_k^2)^{-\frac{T}{2}} \exp\Big( -\frac{\sum_{t=1}^{T} (y_{it} - \rho y_{it-1} - \beta_i' x_{it} - \alpha_{kt})^2}{2 \sigma_k^2} \Big) \Big] \Big( \frac{1}{\sigma_k^2} \Big)^{v_\sigma + 1} \exp\Big( -\frac{\delta_\sigma}{\sigma_k^2} \Big) \\
&= \Big( \frac{1}{\sigma_k^2} \Big)^{v_\sigma + \frac{T |C_k|}{2} + 1} \exp\Big( -\frac{\delta_\sigma + \frac{1}{2} \sum_{i \in C_k} \sum_{t=1}^{T} (y_{it} - \rho y_{it-1} - \beta_i' x_{it} - \alpha_{kt})^2}{\sigma_k^2} \Big).
\end{align*}
This implies
\[
\sigma_k^2 \mid \rho, \beta, \alpha, \Xi, a, G, u, Y, X \sim IG\big( \bar{v}_{\sigma,k}, \bar{\delta}_{\sigma,k} \big), \tag{A.8}
\]
where
\[
\bar{v}_{\sigma,k} = v_\sigma + \frac{T |C_k|}{2}, \qquad
\bar{\delta}_{\sigma,k} = \delta_\sigma + \frac{1}{2} \sum_{i \in C_k} \sum_{t=1}^{T} (\tilde{y}_{it} - \alpha_{kt})^2, \qquad
|C_k| = \#\{ i : g_i = k \}, \qquad
\tilde{y}_{it} = y_{it} - \rho y_{it-1} - \beta_i' x_{it}.
\]
If group $k$ is empty, we draw $\sigma_k^2$ from its prior $IG(v_\sigma, \delta_\sigma)$.

\textbf{Conditional posterior of $\rho$ (common coefficient).} Using a normal conjugate prior $\rho \sim N(\mu_\rho, \Sigma_\rho)$, we solve a standard Bayesian linear regression to obtain the posterior density of the common coefficient $\rho$:
\begin{align*}
p(\rho \mid \beta, \alpha, \Sigma, \Xi, a, G, u, Y, X)
&\propto \Big[ \prod_{i=1}^{N} p(y_i \mid x_i, \rho, \beta_i, \alpha_{g_i}, \Sigma_{g_i}) \Big] p(\rho) \\
&\propto \exp\Big[ -\tfrac{1}{2} \sum_{i=1}^{N} (y_i - \alpha_{g_i} - x_i \beta_i - \rho y_{-1,i})' \Sigma_{g_i}^{-1} (y_i - \alpha_{g_i} - x_i \beta_i - \rho y_{-1,i}) \Big] \exp\Big[ -\tfrac{1}{2} (\rho - \mu_\rho)' \Sigma_\rho^{-1} (\rho - \mu_\rho) \Big].
\end{align*}
This implies
\[
\rho \mid \beta, \alpha, \Sigma, \Xi, a, G, u, Y, X \sim N\big( \bar{\mu}_\rho, \bar{\Sigma}_\rho \big), \tag{A.9}
\]
where
\[
\bar{\Sigma}_\rho = \Big( \Sigma_\rho^{-1} + \sum_{i=1}^{N} y_{-1,i}' \Sigma_{g_i}^{-1} y_{-1,i} \Big)^{-1}, \qquad
\bar{\mu}_\rho = \bar{\Sigma}_\rho \Big[ \Sigma_\rho^{-1} \mu_\rho + \sum_{i=1}^{N} y_{-1,i}' \Sigma_{g_i}^{-1} \hat{y}_i \Big], \qquad
\hat{y}_i = y_i - \alpha_{g_i} - x_i \beta_i.
\]
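The conjugate updates above all share the same template: a normal prior combined with a Gaussian likelihood yields a normal posterior whose precision and mean stack the prior and data terms. The following sketch implements that generic update; this is our illustrative code under simplifying assumptions (scalar error variance, function name `normal_conjugate_draw` is ours), not the paper's implementation.

```python
import numpy as np

def normal_conjugate_draw(X, y, sigma2, mu0, Sigma0, rng):
    """One Gibbs draw from theta | y ~ N(mu_bar, Sigma_bar) for the model
    y = X @ theta + e, e ~ N(0, sigma2 * I), prior theta ~ N(mu0, Sigma0).
    This is the template behind the posteriors derived above."""
    prior_prec = np.linalg.inv(Sigma0)
    # Posterior precision = prior precision + data precision
    post_prec = prior_prec + (X.T @ X) / sigma2
    Sigma_bar = np.linalg.inv(post_prec)
    # Posterior mean weights prior mean and least-squares information
    mu_bar = Sigma_bar @ (prior_prec @ mu0 + (X.T @ y) / sigma2)
    return rng.multivariate_normal(mu_bar, Sigma_bar)

rng = np.random.default_rng(0)
# Example: recover an AR coefficient from y_it = 0.7 * y_lag + noise
y_lag = rng.normal(size=(500, 1))
y = 0.7 * y_lag[:, 0] + rng.normal(scale=0.1, size=500)
rho_draw = normal_conjugate_draw(y_lag, y, 0.01, np.zeros(1), 100.0 * np.eye(1), rng)
```

With many observations and a diffuse prior, the draw concentrates near the least-squares estimate, which is the behavior the derivations above formalize.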
\textbf{Conditional posterior of $\beta$ (heterogeneous coefficients).} As $\varepsilon_{it}$ is independent across units, we solve for $\beta_i$ for each unit separately. We transform the model into a standard linear
model with a known form of heteroskedasticity,
\[
y_{it} - \alpha_{g_i t} - \rho y_{it-1} = \beta_i' x_{it} + \varepsilon_{it}, \qquad \varepsilon_{it} \sim N(0, \sigma_{g_i}^2).
\]
Using a normal conjugate prior $\beta_i \sim N(\mu_\beta, \Sigma_\beta)$, the posterior distribution for unit $i$ is written as
\begin{align*}
p(\beta_i \mid \rho, \alpha, \Sigma, \Xi, a, G, u, Y, X)
&\propto p(y_i \mid x_i, \rho, \beta_i, \alpha_{g_i}, \sigma_{g_i}^2) \, p(\beta_i) \\
&\propto \exp\Big[ -\frac{\sum_{t=1}^{T} (y_{it} - \alpha_{g_i t} - \rho y_{it-1} - x_{it}' \beta_i)^2}{2 \sigma_{g_i}^2} \Big] \exp\Big[ -\tfrac{1}{2} (\beta_i - \mu_\beta)' \Sigma_\beta^{-1} (\beta_i - \mu_\beta) \Big].
\end{align*}
This implies
\[
\beta_i \mid \rho, \alpha, \Sigma, \Xi, a, G, u, Y, X \sim N\big( \bar{\mu}_{\beta_i}, \bar{\Sigma}_{\beta_i} \big), \tag{A.10}
\]
where
\[
\bar{\Sigma}_{\beta_i} = \Big( \Sigma_\beta^{-1} + \sigma_{g_i}^{-2} \sum_{t=1}^{T} x_{it} x_{it}' \Big)^{-1}, \qquad
\bar{\mu}_{\beta_i} = \bar{\Sigma}_{\beta_i} \Big[ \Sigma_\beta^{-1} \mu_\beta + \sigma_{g_i}^{-2} \sum_{t=1}^{T} x_{it} \hat{y}_{it} \Big], \qquad
\hat{y}_{it} = y_{it} - \alpha_{g_i t} - \rho y_{it-1}.
\]

\textbf{Conditional posterior of $\Xi$ (stick length).}
\begin{align*}
p(\Xi \mid \rho, \beta, \alpha, \Sigma, a, G, u, Y, X)
&\propto \Big[ \prod_{i=1}^{N} p(u_i \mid \pi_{g_i}) \, \pi_{g_i} \Big] \Big[ \prod_{j=1}^{K^*} p(\xi_j \mid a) \Big] \\
&= \Big[ \prod_{i=1}^{N} p(u_i \mid \pi_{g_i}) \, \xi_{g_i} \prod_{l < g_i} (1 - \xi_l) \Big] \Big[ \prod_{j=1}^{K^*} p(\xi_j \mid a) \Big].
\end{align*}
For $k = 1, 2, \ldots, K_a$,
\begin{align*}
p(\xi_k \mid \rho, \beta, \alpha, \Sigma, a, G, u, Y, X)
&\propto \Big( \prod_{i \in C_k} \xi_k \Big) (1 - \xi_k)^{\sum_{j=1}^{N} \mathbb{1}(g_j > k)} (1 - \xi_k)^{a - 1} \\
&= \xi_k^{|C_k|} (1 - \xi_k)^{a + \sum_{j=1}^{N} \mathbb{1}(g_j > k) - 1}.
\end{align*}
Therefore, the posterior distribution of $\xi_k$ is
\[
\xi_k \mid \rho, \beta, \alpha, \Sigma, a, G, u, Y, X \sim \text{Beta}\Big( |C_k| + 1, \; a + \sum_{j=1}^{N} \mathbb{1}(g_j > k) \Big). \tag{A.11}
\]
Given $\Xi = [\xi_1, \xi_2, \ldots, \xi_{K_a}]$, update $\pi_1, \pi_2, \ldots, \pi_{K_a}$:
\[
\pi_k \mid G, \Xi =
\begin{cases}
\xi_1, & k = 1, \\
\xi_k \prod_{j < k} (1 - \xi_j), & k = 2, \ldots, K_a.
\end{cases} \tag{A.12}
\]

\textbf{Conditional posterior of $a$ (concentration parameter).} For the DP concentration parameter, the standard posterior derivation does not apply because the number of components in the current sampler is unrestricted. Instead, we implement the two-step procedure proposed by Escobar and West (1995, pp. 8--9). Following their approach, we first draw a latent variable $\eta$ from
\[
\eta \sim \text{Beta}(a + 1, N). \tag{A.13}
\]
Then, conditional on $\eta$ and $K_a$, we sample $a$ from a mixture of two Gamma distributions:
\[
p(a \mid \eta, K_a) = \pi_a \, \text{Gamma}(m + K_a, \, n - \log \eta) + (1 - \pi_a) \, \text{Gamma}(m + K_a - 1, \, n - \log \eta), \tag{A.14}
\]
with the weight $\pi_a$ defined by
\[
\frac{\pi_a}{1 - \pi_a} = \frac{m + K_a - 1}{N (n - \log \eta)}.
\]
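The Escobar--West two-step update above can be sketched as follows. This is our illustrative code, not the authors' implementation; `m` and `n` denote the Gamma prior hyperparameters as in the text, and the function name and example values are ours.

```python
import numpy as np

def draw_concentration(a_old, K_a, N, m, n, rng):
    """Two-step update of the DP concentration parameter a
    (Escobar and West, 1995), matching the mixture form above."""
    # Step 1: latent variable eta | a, N ~ Beta(a + 1, N)
    eta = rng.beta(a_old + 1.0, N)
    rate = n - np.log(eta)              # common rate of both Gamma components
    # Step 2: mixture weight pi_a / (1 - pi_a) = (m + K_a - 1) / (N * rate)
    odds = (m + K_a - 1.0) / (N * rate)
    pi_a = odds / (1.0 + odds)
    # Pick a component, then draw a from Gamma(shape, rate)
    shape = m + K_a if rng.uniform() < pi_a else m + K_a - 1.0
    return 1.0 / rate * rng.gamma(shape)

rng = np.random.default_rng(3)
a_new = draw_concentration(a_old=1.0, K_a=4, N=100, m=2.0, n=2.0, rng=rng)
```

Because the Beta draw tends toward zero as `N` grows, the rate `n - log(eta)` increases with the sample size, which shrinks the concentration parameter unless many groups are active.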
\textbf{Conditional posterior of $u$ (auxiliary variable).} Conditional on the group ``stick lengths'' $\xi_k$ and the group membership indices $G$, it is straightforward to show that the posterior density of $u_i$ is uniform on $(0, \pi_{g_i})$, that is,
\[
u_i \mid \Xi, G \sim \text{Unif}(0, \pi_{g_i}), \tag{A.15}
\]
where $\pi_{g_i} = \xi_{g_i} \prod_{j < g_i} (1 - \xi_j)$. Moreover, it is worth noting that the values of $K^*$ and $u^*$ need to be updated according to (A.3) and (A.4) after this step.

\textbf{Conditional posterior of $G$ (group membership).} We derive the posterior distribution of $g_i$ conditional on $G^{(-i)}$, where $G^{(-i)}$ is the set of all membership indices except $g_i$, i.e., $G^{(-i)} = G \setminus g_i$. Hence, for $k = 1, 2, \ldots, K^*$,
\[
p(g_i = k \mid \rho, \beta, \alpha, \Sigma, \Xi, a, G^{(-i)}, u, Y, X) \propto p(y_i \mid \rho, \beta_i, \alpha_k, \sigma_k^2, Y, X) \, \mathbb{1}(u_i < \pi_k).
\]
As this is a discrete distribution, we normalize the point masses to obtain a valid distribution:
\[
p(g_i = k \mid \rho, \beta, \alpha, \Sigma, \Xi, a, G^{(-i)}, u, Y, X) = \frac{p(y_i \mid \rho, \beta_i, \alpha_k, \sigma_k^2, Y, X) \, \mathbb{1}(u_i < \pi_k)}{\sum_{j=1}^{K^*} p(y_i \mid \rho, \beta_i, \alpha_j, \sigma_j^2, Y, X) \, \mathbb{1}(u_i < \pi_j)}. \tag{A.16}
\]
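The slice-restricted membership draw in (A.16) can be sketched numerically: the indicator confines the candidate set to groups whose probability exceeds the auxiliary variable, and the likelihood is normalized over that finite set. This is our illustrative code (names and values are ours), working in log-likelihoods for numerical stability.

```python
import numpy as np

def draw_membership(loglik_i, pi, u_i, rng):
    """Draw g_i from the normalized discrete distribution in (A.16).

    loglik_i : length-K* array of log p(y_i | rho, beta_i, alpha_k, sigma2_k)
    pi       : length-K* array of group probabilities
    u_i      : auxiliary slice variable for unit i
    """
    # 1(u_i < pi_k) restricts the support; by construction u_i < pi_{g_i},
    # so at least one candidate always survives the slice.
    candidates = np.where(pi > u_i)[0]
    logw = loglik_i[candidates]
    w = np.exp(logw - logw.max())      # stabilized normalization
    w /= w.sum()
    return candidates[rng.choice(len(candidates), p=w)] + 1   # 1-based label

rng = np.random.default_rng(4)
loglik = np.array([-50.0, -2.0, -40.0])   # the unit fits group 2 best
pi = np.array([0.5, 0.3, 0.1])
g_i = draw_membership(loglik, pi, u_i=0.05, rng=rng)  # all 3 groups pass the slice
```

A larger `u_i` would knock low-probability groups out of the candidate set entirely, which is how the slice keeps the sum in (A.16) finite despite the unbounded number of DP components.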
A.1.1 Blocked Gibbs Sampler and Algorithm

Initialization:
(i) Preset the initial number of active groups $K_a$. As derived by Antoniak (1974), the expected number of unique groups is $E[K \mid a] \approx a \log\big( \frac{a + N}{a} \big)$. We set $K_a$ to this expected value, with the concentration parameter $a$ replaced by its prior mean.
(ii) In ignorance of group heterogeneity (K =
1) and heteroskedasticity, run OLS to get $\hat{\alpha}_{OLS}$, $\hat{\rho}_{OLS}$, $\hat{\beta}_{i,OLS}$ and $\mathrm{Cov}(\hat{\alpha}_{OLS})$. These OLS estimators serve as the mean and covariance matrix in the related priors.
(iii) Generate $K^*$ random samples from the distribution $N(\hat{\alpha}_{OLS}, \mathrm{Cov}(\hat{\alpha}_{OLS}))$.
(iv) Initialize the group membership $G$ by sampling from (A.16), ignoring $\mathbb{1}(u_i < \pi_{g_i})$. Remove empty groups.

For each iteration $s = 1, 2, \ldots, N_{sim}$:
(i) Number of active groups: $K_a = \max_{1 \le i \le N} g_i^{(s-1)}$.
(ii) Group ``stick lengths'': for $k = 1, 2, \ldots, K_a$, draw $\xi_k$ from the Beta distribution in (A.11):
\[
\xi_k \mid \rho^{(s-1)}, \beta^{(s-1)}, \alpha^{(s-1)}, \Sigma^{(s-1)}, a^{(s-1)}, G^{(s-1)}, u^{(s-1)}, Y, X \sim \text{Beta}\Big( |C_k| + 1, \; a + \sum_{j=1}^{N} \mathbb{1}(g_j > k) \Big),
\]
and calculate the group probabilities in accordance with (A.12).
(iii) Group heterogeneity: for $k = 1, 2, \ldots, K_a$, draw $\alpha_k^{(s)}$ from the normal distribution in (A.7):
\[
\alpha_k \mid \rho^{(s-1)}, \beta^{(s-1)}, \Sigma^{(s-1)}, a^{(s-1)}, G^{(s-1)}, u^{(s-1)}, Y, X \sim N\big( \bar{\mu}_{\alpha_k}, \bar{\Sigma}_{\alpha_k} \big).
\]
(iv) Group heteroskedasticity: for $k = 1, 2, \ldots, K_a$ and $t = 1, 2, \ldots, T$, draw $\sigma_k^{2(s)}$ from the inverse-Gamma distribution in (A.8):
\[
\sigma_k^2 \mid \rho^{(s-1)}, \beta^{(s-1)}, \alpha^{(s)}, G^{(s-1)}, u^{(s-1)}, Y, X \sim IG\big( \bar{v}_{\sigma,k}, \bar{\delta}_{\sigma,k} \big).
\]
(v) Label switching: after each iteration, an additional random permutation step is added to the MCMC scheme, which randomly permutes the current labeling of the components. Random permutation ensures that the sampler explores all $K!$ modes of the full posterior distribution and prevents the sampler from being trapped around a single posterior mode. Following Liu (2020), we update $\{ \alpha_k^{(s)}, \sigma_k^{2(s)}, \pi_k^{(s)}, g_i^{(s-1)} \}$ with three Metropolis--Hastings label-switching moves developed by Papaspiliopoulos and Roberts (2008) (steps (a) and (b)) and Hastie et al. (2015) (step (c)).
All these label-switching moves aim to improve numerical convergence.\footnote{Without this step, the one-at-a-time updates of the allocations mean that clusters rarely switch labels, and consequently the ordering will be largely determined by the (perhaps random) initialization of the sampler.}\footnote{See Algorithm C.4 in the appendix.}
(a) Randomly select two nonempty groups $i$ and $j$, and swap the group labels $g_i^{(s-1)}$ and $g_j^{(s-1)}$
for all units in these groups; accept the new labels with probability
\[
\min\Big( 1, \frac{\pi_i^{N_j} \pi_j^{N_i}}{\pi_i^{N_i} \pi_j^{N_j}} \Big) = \min\big( 1, (\pi_i / \pi_j)^{N_j - N_i} \big),
\]
where $N_i$ and $N_j$ are the numbers of units in groups $i$ and $j$, respectively.
(b) Randomly select two adjacent groups $l$ and $l+1$, $\{ l, l+1 \} \subset \{ 1, 2, \ldots, K_a \}$, and swap the group labels $g_l^{(s-1)}$ and ``stick lengths'' $\xi_l^{(s)}$; accept the new labels and stick lengths with probability
\[
\min\Big( 1, \frac{\tilde{p}_l^{\,N_{l+1}} \, \tilde{p}_{l+1}^{\,N_l}}{\pi_l^{N_l} \, \pi_{l+1}^{N_{l+1}}} \Big),
\]
where $\tilde{p}_l$ and $\tilde{p}_{l+1}$ are the new group probabilities derived from the new $\xi_l^{(s)}$ and $\xi_{l+1}^{(s)}$.
(c) Randomly select two adjacent groups $k$ and $k+1$, $\{ k, k+1 \} \subset \{ 1, 2, \ldots, K_a \}$, swap the group labels $g_i^{(s-1)}$ and ``stick lengths'' $\xi_k^{(s)}$, and update the group-specific parameters $\{ \alpha_k^{(s)}, \sigma_k^{2(s)} \}$; accept the new labels and stick lengths with probability
\[
\min\Big\{ 1, \big( R_1 / \tilde{R} \big)^{N_{k+1}} \big( R_2 / \tilde{R} \big)^{N_k} \Big\},
\]
where
\[
R_1 = \frac{1 + a + N_{k+1} + \sum_{l > k+1} N_l}{a + N_{k+1} + \sum_{l > k+1} N_l}, \qquad
R_2 = \frac{a + N_k + \sum_{l > k+1} N_l}{1 + a + N_k + \sum_{l > k+1} N_l}, \qquad
\tilde{R} = \frac{\pi_{k+1} R_1 + \pi_k R_2}{\pi_k + \pi_{k+1}}.
\]
The new group probabilities are defined as $p_k = \pi_{k+1} R_1 / \tilde{R}$ and $p_{k+1} = \pi_k R_2 / \tilde{R}$.
Additionally, we update the ``stick lengths'' for groups $k$ and $k+1$:
\[
\xi_k' = \frac{p_k}{\prod_{l < k} (1 - \xi_l)}, \qquad
\xi_{k+1}' = \frac{p_{k+1}}{(1 - \xi_k') \prod_{l < k} (1 - \xi_l)}.
\]
(vi) Auxiliary variables: for $i = 1, 2, \ldots, N$, draw $u_i$ from the uniform distribution in (A.15):
\[
u_i \mid \Xi^{(s)}, G^{(s)} \sim \text{Unif}\big( 0, \pi_{g_i}^{(s)} \big).
\]
Then calculate $u^*$ according to (A.4).
(vii) DP concentration parameter:
(a) Draw the latent variable $\eta$ from the Beta distribution in (A.13): $\eta \sim \text{Beta}(a + 1, N)$.
(b) Draw the concentration parameter $a$ from the mixture of Gamma distributions in (A.14):
\[
a \mid \eta, K_a \sim
\begin{cases}
\text{Gamma}(m + K_a, \, n - \log \eta) & \text{with prob. } \pi_a, \\
\text{Gamma}(m + K_a - 1, \, n - \log \eta) & \text{with prob. } 1 - \pi_a,
\end{cases}
\]
where $\pi_a$ is defined by
\[
\frac{\pi_a}{1 - \pi_a} = \frac{m + K_a - 1}{N (n - \log \eta)}.
\]
(viii) Potential groups: start with $\tilde{K} = K_a$.
(a) Group probabilities:
(1) If $\sum_{j=1}^{\tilde{K}} \pi_j^{(s)} > 1 - u^*$, set $K^* = \tilde{K}$ and stop.
(2) Otherwise, let $\tilde{K} = \tilde{K} +$
1, draw $\xi_{\tilde{K}} \sim \text{Beta}\big( 1, a^{(s)} \big)$, update $\pi_{\tilde{K}} = \xi_{\tilde{K}} \prod_{j < \tilde{K}} (1 - \xi_j)$, and go to step (1).
(b) Group parameters: for $k = K_a + 1, \ldots, K^*$, draw $\alpha_k^{(s)}$ and $\sigma_k^{2(s)}$ from their prior distributions.

Note: this particular choice of $\xi_k'$ and $\xi_{k+1}'$ ensures that the only group probabilities that change are those associated with groups k and k +
1, and the rest are unchanged. Moreover, it can be shown that $(1 - \xi_k')(1 - \xi_{k+1}') = (1 - \xi_k)(1 - \xi_{k+1})$. See the appendices of Hastie et al. (2015) for more details.
(ix) Common AR(1) parameter: draw $\rho$ from the normal distribution in (A.9):
\[
\rho \mid \beta^{(s-1)}, \alpha^{(s)}, \Sigma^{(s)}, a^{(s)}, G^{(s-1)}, u^{(s)}, Y, X \sim N\big( \bar{\mu}_\rho, \bar{\Sigma}_\rho \big).
\]
(x) Heterogeneous parameters: draw $\beta_i$ from the normal distribution in (A.10):
\[
\beta_i \mid \rho^{(s)}, \alpha^{(s)}, \Sigma^{(s)}, a^{(s)}, G^{(s-1)}, u^{(s)}, Y, X \sim N\big( \bar{\mu}_{\beta_i}, \bar{\Sigma}_{\beta_i} \big).
\]
(xi) Group membership: for $i = 1, 2, \ldots, N$ and $k = 1, 2, \ldots, K^*$, draw $g_i$ from the multinomial distribution in (A.16):
\[
p(g_i = k \mid \rho^{(s)}, \beta^{(s)}, \alpha^{(s)}, \Sigma^{(s)}, \xi^{(s)}, a^{(s)}, G^{(-i)}, u^{(s)}, Y, X) = \frac{p(y_i \mid \rho^{(s)}, \beta_i^{(s)}, \alpha_k^{(s)}, \Sigma_k^{(s)}) \, \mathbb{1}(u_i^{(s)} < \pi_k)}{\sum_{j=1}^{K^*} p(y_i \mid \rho^{(s)}, \beta_i^{(s)}, \alpha_j^{(s)}, \Sigma_j^{(s)}) \, \mathbb{1}(u_i^{(s)} < \pi_j)}.
\]
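The stick-breaking bookkeeping that recurs throughout the algorithm, drawing $\xi_k$ from the Beta posterior in (A.11) and mapping the sticks to group probabilities via (A.12), can be sketched as follows. This is our illustrative code, not the authors'; the function name and example memberships are ours.

```python
import numpy as np

def update_sticks(g, K_a, a, rng):
    """Draw stick lengths xi_k as in (A.11) and map them to group
    probabilities pi_k via stick breaking as in (A.12)."""
    g = np.asarray(g)
    xi = np.empty(K_a)
    for k in range(1, K_a + 1):
        n_k = np.sum(g == k)            # |C_k|, units currently in group k
        n_gt = np.sum(g > k)            # units allocated past group k
        xi[k - 1] = rng.beta(n_k + 1, a + n_gt)
    # pi_1 = xi_1; pi_k = xi_k * prod_{j<k} (1 - xi_j) for k >= 2
    pi = xi * np.concatenate(([1.0], np.cumprod(1.0 - xi[:-1])))
    return xi, pi

rng = np.random.default_rng(2)
g = np.array([1] * 60 + [2] * 30 + [3] * 10)   # memberships for N = 100 units
xi, pi = update_sticks(g, K_a=3, a=1.0, rng=rng)
```

The probabilities sum to less than one by construction; the leftover mass $\prod_k (1 - \xi_k)$ is what step (viii) carves into additional potential groups until the slice threshold $1 - u^*$ is exceeded.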
A.2 Random Effects Model with Subjective Group Probability Prior

This algorithm is designed for the random effects model in which the econometrician has prior knowledge of the group structure and presumes the number of groups $K_p$. Building on the algorithm for the random effects model in Section A.1, we allow the researcher's prior knowledge to be incorporated while retaining the ability to reallocate units and change the number of groups along the MCMC sampling.
We use the same notation as in Appendix A.1:
\begin{align*}
\text{Observations:} \quad & Y = [y_1, y_2, \ldots, y_N], \quad y_i = [y_{i1}, y_{i2}, \ldots, y_{iT}]', \\
\text{Covariates:} \quad & X = [x_1, x_2, \ldots, x_N], \quad x_i = [x_{i1}, x_{i2}, \ldots, x_{iT}]', \\
\text{Random effects:} \quad & \alpha = [\alpha_1, \alpha_2, \ldots, \alpha_K], \\
\text{Covariance matrices:} \quad & \Sigma = [\Sigma_1, \Sigma_2, \ldots, \Sigma_K], \\
\text{Group membership:} \quad & G = [g_1, \ldots, g_N], \\
\text{Stick lengths:} \quad & \Xi = [\xi_1, \xi_2, \ldots], \\
\text{Group probabilities:} \quad & \pi = [\pi_1, \ldots, \pi_N], \\
\text{Membership probabilities:} \quad & \omega = [\omega_1, \ldots, \omega_N], \quad \omega_i = [\omega_{i1}, \omega_{i2}, \ldots, \omega_{iK}], \\
\text{Auxiliary variables:} \quad & u = [u_1, u_2, \ldots, u_N], \\
\text{Hyperparameters:} \quad & \phi = [\mu_\alpha, \Sigma_\alpha, \nu_\sigma, \delta_\sigma].
\end{align*}
Notice that we define two sets of probabilities, $\pi$ and $\omega$. In practice, they play distinct roles in the algorithm. The group probability $\pi$ captures each group's probability based on the entire sample and, most importantly, determines the upper bound of the auxiliary variable $u_{g_i}$, which has a direct effect on the potential number of groups $K^*$.
On the other hand, the membership probability $\omega_i$ represents the probabilities of unit $i$ belonging to each of the $K$ groups, through which the researcher's prior knowledge enters the algorithm.

As regards the choice of priors, we adopt independent Multivariate Normal--Inverse-Gamma Dirichlet Process priors for the grouped random effects $\alpha_{g_i t}$ and heteroskedasticity $\sigma_{g_i}^2$, a normal prior for the common parameter $\rho$, an asymmetric Dirichlet prior for the membership probability $\omega$ with concentration parameters chosen by the econometrician, a multinomial prior for the group membership $g_i$, a Beta prior for the stick lengths $\xi$, and a mixture Gamma prior for the concentration parameter $a$.
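To make the role of $\omega_i$ concrete, here is a minimal sketch of how a subjective prior enters: $\omega_i$ is drawn from an asymmetric Dirichlet whose concentration vector tilts unit $i$ toward particular groups, and the label $g_i$ is then drawn from $\omega_i$. The concentration values and variable names here are our own illustrative choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(7)
K = 3
# Asymmetric concentration vector: the researcher believes unit i most
# likely belongs to group 1
a_i = np.array([8.0, 1.0, 1.0])
omega_i = rng.dirichlet(a_i)            # membership probabilities, sum to 1
g_i = rng.choice(K, p=omega_i) + 1      # multinomial draw of the 1-based label

# Averaging many prior draws recovers the prior mean a_i / sum(a_i) = (0.8, 0.1, 0.1)
draws = rng.dirichlet(a_i, size=20000)
prior_mean = draws.mean(axis=0)
```

Larger concentration values encode stronger prior conviction: as the entries of `a_i` are scaled up, the Dirichlet draws concentrate around the same mean, so the data must work harder to move a unit away from the researcher's presumed group.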
The posterior of the unknown objects in this random effects model is
\begin{align*}
p(\rho, \beta, \alpha, \Sigma, G, \omega \mid Y, X)
&\propto p(Y \mid X, \rho, \beta, \alpha, \Sigma, G, \omega) \, p(\rho, \beta, \alpha, \Sigma, G, \pi) \\
&\propto p(Y \mid X, \rho, \beta, \alpha, \Sigma, G) \, p(\alpha, \Sigma \mid \phi) \, p(G \mid \omega) \, p(\rho) \, p(\beta) \, p(\omega \mid a) \\
&= \prod_{i=1}^{N} p(y_i \mid x_i, \rho, \beta_i, \alpha_{g_i}, \sigma_{g_i}^2) \prod_{j=1}^{K} p(\alpha_j, \sigma_j^2 \mid \phi) \prod_{i=1}^{N} p(g_i \mid \omega_i) \prod_{i=1}^{N} p(\omega_i \mid a_i) \prod_{i=1}^{N} p(\beta_i) \, p(\rho) \\
&= \prod_{i=1}^{N} \big[ p(y_i \mid x_i, \rho, \beta_i, \alpha_{g_i}, \sigma_{g_i}^2) \, p(g_i \mid \omega_i) \, p(\omega_i \mid a_i) \, p(\beta_i) \big] \prod_{j=1}^{K} p(\alpha_j, \sigma_j^2 \mid \phi) \, p(\rho).
\end{align*}
To allow for automatic adjustment of the number of groups, we introduce a set of auxiliary variables $u = [u_1, u_2, \ldots, u_N]$ as proposed by Walker (2007) and rewrite the posterior above as
\[
p(\rho, \beta, \alpha, \Sigma, \Xi, a, G, u, \omega, \pi \mid Y, X) \propto \prod_{i=1}^{N} \big[ p(y_i \mid x_i, \rho, \beta_i, \alpha_{g_i}, \sigma_{g_i}^2) \, p(g_i \mid \omega_i) \, p(\omega_i \mid a_i) \, \mathbb{1}(u_i < \pi_{g_i}) \, p(\beta_i) \big] \prod_{j=1}^{K} p(\alpha_j, \sigma_j^2 \mid \phi) \, p(\rho).
\]
The number of potential groups $K^*$ and the number of active groups $K_a$ are defined in equations (A.3) and (A.5).

\textbf{Conditional posterior of $\alpha$ (grouped random effects).} Identical to (A.7).

\textbf{Conditional posterior of $\Sigma$ (grouped variance).} Identical to (A.8).

\textbf{Conditional posterior of $\rho$ (common coefficient).} Identical to (A.9).
Conditional posterior of β (heterogeneous coefficients) . Identical to (A.10). Conditional posterior of Ξ (stick length) . Identical to (A.11). Then generate π inaccordance to (A.12). Conditional posterior of a (concentration parameter) . Identical to (A.14). Conditional posterior of u (auxiliary variable) . Identical to (A.15). Conditional posterior of ω (membership probability) . Sampling from the posteriorof π can be implemented as follows. As we adopt Dirichlet prior for π and Multinomial prior his Version: October 6, 2020his Version: October 6, 2020
This algorithm is designed for the random effects model in which the econometrician has prior knowledge about the group structure and presumes a number of groups $K_p$. Building on the algorithm for the random effects model in Section A.1, we allow the researcher's prior knowledge to be incorporated while inheriting the features of reallocating units and changing the number of groups along the MCMC sampling.

We use the same notation as in Appendix A.1:

Observations: $Y = [y_1, y_2, \ldots, y_N]$, $y_i = [y_{i1}, y_{i2}, \ldots, y_{iT}]'$,
Covariates: $X = [x_1, x_2, \ldots, x_N]$, $x_i = [x_{i1}, x_{i2}, \ldots, x_{iT}]'$,
Random effects: $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_K]$,
Covariance matrices: $\Sigma = [\Sigma_1, \Sigma_2, \ldots, \Sigma_K]$,
Group membership: $G = [g_1, \ldots, g_N]$,
Stick lengths: $\Xi = [\xi_1, \xi_2, \ldots]$,
Group probabilities: $\pi = [\pi_1, \pi_2, \ldots]$,
Membership probabilities: $\omega = [\omega_1, \ldots, \omega_N]$, $\omega_i = [\omega_{i1}, \omega_{i2}, \ldots, \omega_{iK}]$,
Auxiliary variables: $u = [u_1, u_2, \ldots, u_N]$,
Hyperparameters: $\phi = [\mu_\alpha, \Sigma_\alpha, \nu_\sigma, \delta_\sigma]$.

Notice that we define two sets of probabilities, $\pi$ and $\omega$, which play distinct roles in the algorithm. The group probability $\pi$ captures the probability of each group based on the entire sample and, most importantly, determines the upper bound of the auxiliary variable $u_{g_i}$, which has a direct effect on the potential number of groups $K^*$.
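For intuition, the stick-breaking construction behind the group probabilities $\pi$ and the truncating role of the slice variables $u_i$ can be sketched as follows. This is a simplified numpy illustration with made-up stick lengths and memberships, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def stick_breaking(xi):
    """Convert stick lengths xi_k into group probabilities pi_k."""
    xi = np.asarray(xi, dtype=float)
    # pi_k = xi_k * prod_{j<k} (1 - xi_j): break off a fraction of the stick
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - xi)[:-1]))
    return xi * remaining

# Hypothetical stick lengths (in the sampler these are Beta draws).
xi = np.array([0.5, 0.5, 0.5, 0.5])
pi = stick_breaking(xi)          # [0.5, 0.25, 0.125, 0.0625]

# Slice variables u_i ~ U(0, pi_{g_i}) truncate the infinite mixture:
# only groups with pi_k > min_i u_i can receive members this sweep.
g = np.array([0, 0, 1, 2])       # hypothetical memberships
u = rng.uniform(0.0, pi[g])
K_star = int(np.sum(pi > u.min()))  # potential number of groups
```

The finite truncation at `K_star` is what makes sampling from the Dirichlet Process mixture feasible, since all groups beyond it have probability below every slice variable.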
On the other hand, the membership probability $\omega_i$ represents the probabilities that unit $i$ belongs to each of the $K$ groups; it is through $\omega_i$ that the researcher's prior knowledge enters the algorithm.

As regards the choice of priors, we adopt independent Multivariate Normal--Inverse-Gamma Dirichlet Process priors for the grouped random effects $\alpha_{g_i}$ and heteroskedasticity $\sigma^2_{g_i}$, a normal prior for the common parameter $\rho$, an asymmetric Dirichlet prior for the membership probability $\omega_i$ with concentration parameters chosen by the econometrician, a multinomial prior for the group membership $g_i$, a Beta prior for the stick lengths $\xi$, and a mixture-of-Gammas prior for the concentration parameter $a$.

The posterior of the unknown objects in this random effects model is:
\[
\begin{aligned}
p(\rho, \beta, \alpha, \Sigma, G, \omega \mid Y, X)
&\propto p(Y \mid X, \rho, \beta, \alpha, \Sigma, G, \omega)\, p(\rho, \beta, \alpha, \Sigma, G, \omega) \\
&\propto p(Y \mid X, \rho, \beta, \alpha, \Sigma, G)\, p(\alpha, \Sigma \mid \phi)\, p(G \mid \omega)\, p(\rho)\, p(\beta)\, p(\omega \mid a) \\
&= \prod_{i=1}^{N} p(y_i \mid x_i, \rho, \beta_i, \alpha_{g_i}, \sigma^2_{g_i}) \prod_{j=1}^{K} p(\alpha_j, \sigma^2_j \mid \phi) \prod_{i=1}^{N} p(g_i \mid \omega_i) \prod_{i=1}^{N} p(\omega_i \mid a_i) \prod_{i=1}^{N} p(\beta_i)\, p(\rho) \\
&= \prod_{i=1}^{N} \big[ p(y_i \mid x_i, \rho, \beta_i, \alpha_{g_i}, \sigma^2_{g_i})\, p(g_i \mid \omega_i)\, p(\omega_i \mid a_i)\, p(\beta_i) \big] \prod_{j=1}^{K} p(\alpha_j, \sigma^2_j \mid \phi)\, p(\rho).
\end{aligned}
\]
To allow for automatic adjustment of the number of groups, we introduce a set of auxiliary variables $u = [u_1, u_2, \ldots, u_N]$ as proposed by Walker (2007) and rewrite the posterior above as
\[
p(\rho, \beta, \alpha, \Sigma, \Xi, a, G, u, \omega, \pi \mid Y, X)
\propto \prod_{i=1}^{N} \big[ p(y_i \mid x_i, \rho, \beta_i, \alpha_{g_i}, \sigma^2_{g_i})\, p(g_i \mid \omega_i)\, p(\omega_i \mid a_i)\, \mathbb{1}(u_i < \pi_{g_i})\, p(\beta_i) \big] \prod_{j=1}^{K} p(\alpha_j, \sigma^2_j \mid \phi)\, p(\rho).
\]
The number of potential groups $K^*$ and the number of active groups $K^a$ are defined in equations (A.3) and (A.5).

Conditional posterior of $\alpha$ (grouped random effects). Identical to (A.7).

Conditional posterior of $\Sigma$ (grouped variances). Identical to (A.8).

Conditional posterior of $\rho$ (common coefficient). Identical to (A.9).
Conditional posterior of $\beta$ (heterogeneous coefficients). Identical to (A.10).

Conditional posterior of $\Xi$ (stick lengths). Identical to (A.11). Then generate $\pi$ in accordance with (A.12).

Conditional posterior of $a$ (concentration parameter). Identical to (A.14).

Conditional posterior of $u$ (auxiliary variables). Identical to (A.15).

Conditional posterior of $\omega$ (membership probabilities). Sampling from the posterior of $\omega_i$ can be implemented as follows. As we adopt a Dirichlet prior for $\omega_i$ and a multinomial prior for $g_i$, for $i = 1, \ldots, N$ the posterior is written as
\[
\begin{aligned}
p(\omega_i \mid \rho, \beta, \alpha, \Sigma, G, Y, X)
&\propto p(g_i \mid \omega_i)\, p(\omega_i \mid a_i) \\
&\propto \big( \omega_{i1}^{\mathbb{1}(g_i = 1)} \cdots \omega_{iK_p}^{\mathbb{1}(g_i = K_p)} \big) \big( \omega_{i1}^{a_{i1} - 1} \cdots \omega_{iK_p}^{a_{iK_p} - 1} \big) \\
&= \omega_{i1}^{a_{i1} + \mathbb{1}(g_i = 1) - 1} \cdots \omega_{iK_p}^{a_{iK_p} + \mathbb{1}(g_i = K_p) - 1}. \qquad \text{(A.17)}
\end{aligned}
\]
This implies
\[
\omega_i \mid \rho, \beta, \alpha, \Sigma, G, Y, X \sim \mathrm{Dir}\big( a_{i1} + \mathbb{1}(g_i = 1), \ldots, a_{iK_p} + \mathbb{1}(g_i = K_p) \big). \qquad \text{(A.18)}
\]
It is worth noting that, during MCMC sampling, we allow for more or fewer groups than the researcher expects, i.e., the potential number of groups $K^{*(s)}$ could be larger or smaller than $K_p$ in some iteration $s$. In such circumstances, we modify the Dirichlet posterior distribution in (A.17) to account for these changes. We present the posterior in three cases.

Case 1: $K^{*(s)} = K_p$. The posterior distribution in (A.17) is still valid.

Case 2: $K^{*(s)} > K_p$. We have to address the new groups. For each additional group $k^\dagger$ with $K_p < k^\dagger \le K^{*(s)}$ we have, for all $i$,
\[
\mathbb{1}\big(g_i^{(s-1)} = k^\dagger\big) = 0 \ \text{ since } k^\dagger > K^{a(s-1)}, \quad \text{and} \quad a_{ik^\dagger} = 0,
\]
where $K^{a(s-1)} = \max_{1 \le i \le N} g_i^{(s-1)}$ denotes the number of active groups in the previous iteration. To ensure a non-negative posterior probability $\omega_{ik^\dagger}$ for the new groups, we assume that, for some $\epsilon$,
\[
\sum_{k = K_p + 1}^{K^{*(s)}} \tilde{a}_{ik} = \epsilon \quad \text{and} \quad \tilde{a}_{im} = \tilde{a}_{in}, \ \forall\, m, n > K_p,
\]
and adjust the prior membership probabilities $a_{ik}$ for $k \le K_p$ by multiplying them by $1 - \epsilon$.
This step artificially assigns nonzero probabilities to the new groups and forms a set of new hyperparameters $\tilde{a}_{ik}$ such that $\sum_{k} \tilde{a}_{ik} = 1$. Then we draw $\omega_i$ from the posterior density
\[
\omega_i \mid \rho, \beta, \alpha, \Sigma, G, Y, X \sim \mathrm{Dir}\big( \tilde{a}_{i1} + \mathbb{1}(g_i = 1), \ldots, \tilde{a}_{iK^{*(s)}} + \mathbb{1}(g_i = K^{*(s)}) \big). \qquad \text{(A.19)}
\]
In the simulation, we set $\epsilon$ to a small positive value.
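A sketch of this Case 2 adjustment with numpy: the original concentration parameters are shrunk by $1 - \epsilon$, the $\epsilon$ mass is spread evenly over the new groups, and $\omega_i$ is drawn from the Dirichlet posterior in (A.19). The function name and example values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

def omega_posterior_case2(a_i, g_i, K_star, eps=0.01):
    """Pad the prior concentrations a_i (length K_p) out to K_star > K_p
    groups, then draw omega_i | g_i from the Dirichlet posterior (A.19)."""
    K_p = len(a_i)
    n_new = K_star - K_p
    # Shrink original concentrations by (1 - eps); spread eps over new groups.
    a_tilde = np.concatenate([(1.0 - eps) * np.asarray(a_i, dtype=float),
                              np.full(n_new, eps / n_new)])
    # Dirichlet posterior concentrations: a_tilde_k + 1{g_i = k}.
    post = a_tilde + (np.arange(1, K_star + 1) == g_i)
    return rng.dirichlet(post)

# Unit assigned to a "new" group 4 while the researcher presumed K_p = 3.
omega_i = omega_posterior_case2(a_i=[2.0, 1.0, 1.0], g_i=4, K_star=5)
```

Because the new groups receive only $\epsilon$ prior mass, a unit stays attracted to the researcher's presumed groups unless the likelihood strongly favors a new one.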
Case 3: $K^{*(s)} < K_p$. We have fewer groups than the researcher assumes, and a unit $i$ might be assigned to a group that is no longer considered in the current iteration (i.e., $K^{*(s)} < g_i^{(s-1)}$). In this case, we need to select and renormalize a subset of the $a_{ik}$, since some groups are dismissed. For each unit $i$, we select the $K^{*(s)}$ most frequent non-empty groups among the groups visited in the previous iteration $s - 1$. If there are not enough candidates, we add back non-selected groups among the first $K^{*(s)}$ of the $K_p$ groups. Then we normalize the selected $a_{ik}$ to obtain $\hat{a}_{ik}$. Finally, we draw $\omega_i$ from the posterior density
\[
\omega_i \mid \rho, \beta, \alpha, \Sigma, G, Y, X \sim \mathrm{Dir}\big( \hat{a}_{i1} + \mathbb{1}(g_i = 1), \ldots, \hat{a}_{iK^{*(s)}} + \mathbb{1}(g_i = K^{*(s)}) \big). \qquad \text{(A.20)}
\]
In practice, if all selected $a_{ik}$ are zero for some $i$, we simply set $a_{ik} = 1 / K^{*(s)}$.

Conditional posterior of $G$ (group membership). We derive the posterior distribution of $g_i$ conditional on $G^{(-i)}$, where $G^{(-i)}$ is the set of all membership indices except $g_i$, i.e., $G^{(-i)} = G \setminus g_i$. Hence, for $k = 1, 2, \ldots, K^*$,
\[
p(g_i = k \mid \rho, \beta, \alpha, \Sigma, G^{(-i)}, \pi, \omega, Y, X) \propto p(y_i \mid \rho, \beta_i, \alpha_k, \Sigma_k, Y, X)\, \omega_{ik}\, \mathbb{1}(u_i < \pi_k).
\]
As this is a discrete distribution, we normalize the point masses to get a valid distribution,
\[
p(g_i = k \mid \rho, \beta_i, \alpha, \Sigma, G^{(-i)}, \pi, \omega, Y, X)
= \frac{p(y_i \mid \rho, \beta_i, \alpha_k, \Sigma_k, Y, X)\, \omega_{ik}\, \mathbb{1}(u_i < \pi_k)}
       {\sum_{j=1}^{K^*} p(y_i \mid \rho, \beta_i, \alpha_j, \Sigma_j, Y, X)\, \omega_{ij}\, \mathbb{1}(u_i < \pi_j)}. \qquad \text{(A.21)}
\]

A.3 Random Effects Model with Group Structures in $\alpha$, $\rho$ and $\beta$

In this subsection, we present the conditional posterior distributions for the time-invariant random effects model with group structures in $\alpha$, $\rho$ and $\beta$. The model is
\[
y_{it} = \theta_{g_i}' \check{x}_{it} + \varepsilon_{it}, \qquad \varepsilon_{it} \overset{iid}{\sim} N\big(0, \sigma^2_{g_i}\big),
\]
where $\check{x}_{it} = [1, y_{i,t-1}, x_{it}']'$ and $\theta_{g_i} = [\alpha_{g_i}, \rho_{g_i}, \beta_{g_i}']'$.
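A minimal simulation of this grouped specification may help fix ideas. The parameter values below are made up for illustration; only the structure (group-specific $\theta_k = [\alpha_k, \rho_k, \beta_k]$ and $\sigma^2_k$) follows the model above:

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, K = 100, 10, 3

# Hypothetical group-specific coefficients theta_k = [alpha_k, rho_k, beta_k].
theta = np.array([[ 1.0, 0.3,  0.5],
                  [-1.0, 0.6, -0.5],
                  [ 0.0, 0.1,  1.0]])
sigma2 = np.array([0.25, 0.25, 0.25])   # group-specific error variances
g = rng.integers(0, K, size=N)          # latent group memberships

y = np.zeros((N, T + 1))                # y[:, 0] is the initial condition
x = rng.normal(size=(N, T))
for t in range(1, T + 1):
    # x_check_it = [1, y_{i,t-1}, x_it];  y_it = theta_{g_i}' x_check + eps_it
    x_check = np.column_stack([np.ones(N), y[:, t - 1], x[:, t - 1]])
    eps = rng.normal(scale=np.sqrt(sigma2[g]))
    y[:, t] = np.einsum('ij,ij->i', x_check, theta[g]) + eps
```

With $|\rho_k| < 1$ for every group, each unit's series is stationary, and units in the same group share an identical law of motion.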
We use the same notation as in Appendix A.1 aside from the coefficients $\theta$:

Observations: $Y = [y_1, y_2, \ldots, y_N]$, $y_i = [y_{i1}, y_{i2}, \ldots, y_{iT}]'$,
Covariates: $X = [x_1, x_2, \ldots, x_N]$, $x_i = [x_{i1}, x_{i2}, \ldots, x_{iT}]'$,
Stacked coefficients: $\theta = [\theta_1, \theta_2, \ldots]$,
Covariance matrices: $\Sigma = [\sigma^2_1, \sigma^2_2, \ldots]$,
Stick lengths: $\Xi = [\xi_1, \xi_2, \ldots]$,
Group membership: $G = [g_1, \ldots, g_N]$,
Auxiliary variables: $u = [u_1, u_2, \ldots, u_N]$,
Hyperparameters: $\phi = [\mu_\theta, \Sigma_\theta, \nu_\sigma, \delta_\sigma]$.
The posterior of the unknown objects in this random effects model is
\[
p(\theta, \Sigma, \Xi, a, G, u \mid Y, X)
\propto \Bigg[ \prod_{i=1}^{N} p(y_i \mid x_i, \theta_{g_i}, \sigma^2_{g_i})\, \mathbb{1}(u_i < \pi_{g_i}) \Bigg] \Bigg[ \prod_{j=1}^{\infty} p(\theta_j, \sigma^2_j \mid \phi)\, p(\xi_j \mid a) \Bigg] p(a). \qquad \text{(A.22)}
\]
The number of potential groups $K^*$ and the number of active groups $K^a$ are defined in equations (A.3) and (A.5).

Conditional posterior of $\theta$ (grouped coefficients).
\[
p(\theta \mid \Sigma, \Xi, a, G, u, Y, X)
\propto \Bigg[ \prod_{i=1}^{N} p(y_i \mid x_i, \theta_{g_i}, \sigma^2_{g_i})\, \mathbb{1}(u_i < \pi_{g_i}) \Bigg] \Bigg[ \prod_{j=1}^{\infty} p(\theta_j, \sigma^2_j \mid \phi) \Bigg].
\]
For $k \in \{1, 2, \ldots, K^a\}$, define the set of units that belong to group $k$,
\[
C_k = \{ i \in \{1, 2, \ldots, N\} \mid g_i = k \}. \qquad \text{(A.23)}
\]
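The sets $C_k$ are simply the inverse image of the membership vector $G$; in code this is a one-pass grouping (a generic sketch, not tied to the paper's implementation):

```python
from collections import defaultdict

def group_members(G):
    """Map each active group label k to the list C_k = [i : g_i = k]."""
    C = defaultdict(list)
    for i, g_i in enumerate(G):
        C[g_i].append(i)
    return dict(C)

C = group_members([1, 1, 2, 3, 2, 1])
# C[1] == [0, 1, 5], C[2] == [2, 4], C[3] == [3]
```

Each Gibbs step for $\theta_k$ and $\sigma^2_k$ then touches only the data of units in $C_k$.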
Then the posterior density for $\theta_k$ reads
\[
\begin{aligned}
p(\theta_k \mid \Sigma, \Xi, a, G, u, Y, X)
&\propto \Bigg[ \prod_{i \in C_k} p(y_i \mid x_i, \theta_k, \sigma^2_k) \Bigg] p(\theta_k \mid \phi) \\
&\propto \exp\Bigg[ -\frac{1}{2} \sum_{i \in C_k} (y_i - \check{x}_i \theta_k)' \Sigma_k^{-1} (y_i - \check{x}_i \theta_k) \Bigg] \exp\Big[ -\frac{1}{2} (\theta_k - \mu_\theta)' \Sigma_\theta^{-1} (\theta_k - \mu_\theta) \Big] \\
&\propto \exp\Big[ -\frac{1}{2} (\theta_k - \bar{\mu}_{\theta_k})' \bar{\Sigma}_{\theta_k}^{-1} (\theta_k - \bar{\mu}_{\theta_k}) \Big].
\end{aligned}
\]
Assuming an independent normal conjugate prior for $\theta_k$, the posterior for $\theta_k$ is given by
\[
\theta_k \mid \Sigma, \Xi, a, G, u, Y, X \sim N\big( \bar{\mu}_{\theta_k}, \bar{\Sigma}_{\theta_k} \big), \qquad \text{(A.24)}
\]
where
\[
\bar{\Sigma}_{\theta_k} = \Bigg( \Sigma_\theta^{-1} + \sigma_k^{-2} \sum_{i \in C_k} \check{x}_i' \check{x}_i \Bigg)^{-1}, \qquad
\bar{\mu}_{\theta_k} = \bar{\Sigma}_{\theta_k} \Bigg[ \Sigma_\theta^{-1} \mu_\theta + \sigma_k^{-2} \sum_{i \in C_k} \check{x}_i' y_i \Bigg].
\]
If group $k$ is empty, we draw $\theta_k$ from its prior $N(\mu_\theta, \Sigma_\theta)$.

Conditional posterior of $\Sigma$ (grouped variances). Under cross-sectional independence, for $k = 1, 2, \ldots, K^a$,
\[
p(\sigma^2_k \mid \theta, \Xi, a, G, u, Y, X) \propto \Bigg[ \prod_{i \in C_k} p(y_i \mid x_i, \theta_k, \sigma^2_k) \Bigg] p(\sigma^2_k \mid \phi).
\]
Assuming an inverse-gamma prior $\sigma^2_k \sim IG(\nu_\sigma, \delta_\sigma)$, the posterior distribution of $\sigma^2_k$ is
\[
\begin{aligned}
p(\sigma^2_k \mid \theta, \Xi, a, G, u, Y, X)
&\propto \prod_{i \in C_k} \Bigg[ (\sigma^2_k)^{-T/2} \exp\Bigg( -\frac{\sum_{t=1}^{T} (y_{it} - \theta_k' \check{x}_{it})^2}{2 \sigma^2_k} \Bigg) \Bigg] \Big( \frac{1}{\sigma^2_k} \Big)^{\nu_\sigma + 1} \exp\Big( -\frac{\delta_\sigma}{\sigma^2_k} \Big) \\
&= \Big( \frac{1}{\sigma^2_k} \Big)^{\nu_\sigma + \frac{T |C_k|}{2} + 1} \exp\Bigg( -\frac{\delta_\sigma + \frac{1}{2} \sum_{i \in C_k} \sum_{t=1}^{T} (y_{it} - \theta_k' \check{x}_{it})^2}{\sigma^2_k} \Bigg).
\end{aligned}
\]
This implies
\[
\sigma^2_k \mid \theta, \Xi, a, G, u, Y, X \sim IG\big( \bar{\nu}_{\sigma,k}, \bar{\delta}_{\sigma,k} \big), \qquad \text{(A.25)}
\]
where
\[
\bar{\nu}_{\sigma,k} = \nu_\sigma + \frac{T |C_k|}{2}, \qquad
\bar{\delta}_{\sigma,k} = \delta_\sigma + \frac{1}{2} \sum_{i \in C_k} \sum_{t=1}^{T} (y_{it} - \theta_k' \check{x}_{it})^2, \qquad
|C_k| = \#\{ i : g_i = k \}.
\]
If group $k$ is empty, we draw $\sigma^2_k$ from its prior $IG(\nu_\sigma, \delta_\sigma)$.

Conditional posterior of $\Xi$ (stick lengths). Identical to (A.11).

Conditional posterior of $a$ (concentration parameter). Identical to (A.14).

Conditional posterior of $u$ (auxiliary variables). Identical to (A.15).

Conditional posterior of $G$ (group membership). Identical to (A.16).

B Proofs
Proposition B.1
Suppose that we have a model with posterior as given in Section 3.2. Given the definitions of the number of potential components $K^*$ (eq. (A.3)), the minimum of the auxiliary variables $u^*$ (eq. (A.4)), and the number of active groups $K$ (eq. (A.5)), we have:

(i) $u_i > \pi_k$ for all $i = 1, 2, \ldots, N$ and all $k > K^*$;

(ii) $K \le K^*$.

Proof: (i) By definition, $u^* = \min_{1 \le i \le N} u_i$, so for every $i$,
\[
u_i \ge u^* > 1 - \sum_{j=1}^{K^*} \pi_j = \sum_{j = K^* + 1}^{\infty} \pi_j \ge \pi_k, \qquad \forall\, k > K^*.
\]
(ii) Let $i$ be a unit such that $g_i = K$. According to the posterior of $G$, group $K$ is occupied only if $u_i < \pi_K$; otherwise $p(g_i = K \mid \cdot) = 0$. Then, by definition,
\[
u^* \le u_i < \pi_K \ \Rightarrow\ 1 - u^* > 1 - \pi_K \ge \sum_{j=1}^{K-1} \pi_j.
\]
Since $K^*$ is the smallest number such that $1 - u^* < \sum_{j=1}^{K^*} \pi_j$, it follows that $K \le K^*$. $\square$

C Convergence Diagnostic
To assess convergence, we examine the trace plot, cumulative mean, and autocorrelation of the posterior draws for different coefficients. In particular, the data generating process used here is DGP7, in which we assume time-varying grouped random effects and homoskedasticity. We evaluate the most complicated BGRE estimator, Tv-Hetero (time-varying $\alpha_i$, heteroskedasticity), and report the convergence diagnostics for $\alpha$, $\sigma^2$, and $\rho$.

Figure C.1: Convergence Diagnostics, $\alpha$ ($i = 1$, $t = 1$)

Due to the time effects and heteroskedasticity, we randomly present one of the $\alpha$'s, for unit $i = 1$ at $t = 1$, and the variance of the error term $\sigma^2$ for unit $i = 1$.
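The cumulative-mean and autocorrelation diagnostics shown in these figures can be computed along the following lines. This is a generic numpy sketch for any scalar MCMC chain, not the paper's plotting code:

```python
import numpy as np

def cumulative_mean(draws):
    """Running mean of an MCMC chain; should flatten out after burn-in."""
    draws = np.asarray(draws, dtype=float)
    return np.cumsum(draws) / np.arange(1, len(draws) + 1)

def autocorrelation(draws, max_lag=20):
    """Sample autocorrelation of the chain at lags 0, 1, ..., max_lag."""
    x = np.asarray(draws, dtype=float)
    x = x - x.mean()
    var = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - h], x[h:]) / var
                     for h in range(max_lag + 1)])
```

A chain that mixes well shows a quickly stabilizing cumulative mean and autocorrelations that decay toward zero within a few lags.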
Figure C.2: Convergence Diagnostics, $\sigma^2$ ($i = 1$) and $\rho$

D Computation Details

E Additional Simulation Results
E.1 Main MC Simulation: Larger Variance
In this section, we present additional simulation results for DGP1, DGP2, DGP3 and DGP4 with a larger error variance. The other settings remain the same as in the main text: $N = 100$, $T = 10$, and the true number of groups $K$ is unchanged.

Tables E.1 and E.2 show the estimation and forecast comparisons among the alternative predictors. For DGP1 and DGP2, the results are similar to those with the smaller variance in the main text: the best models are Ti-Homo and Ti-Hetero, respectively. In DGP3, however, the Tv-Homo and Tv-Hetero estimators, which are expected to stand out since they correctly model the time effects, do not offer the best performance. A likely reason is that estimation becomes substantially more difficult in the presence of both time-varying random effects and much noisier error terms, making it hard to determine the group structure accurately. Regarding DGP4, Ti-Homo and Ti-Hetero deliver outstanding performance relative to the other estimators. As there is no group structure in this DGP, the Flat estimator should be the best; it does generate accurate estimates and forecasts, but Ti-Homo and Ti-Hetero still stand out. This is mainly because Ti-Homo and Ti-Hetero partition similar units into several groups, which averages out the noisy error terms and hence yields strong performance. These results also suggest that, even when there is no group structure in the sample, our BGRE estimators have an edge over estimators that either pool all information (Pooled) or treat each unit separately (Flat).
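The intuition that grouping averages out noisy errors, which drives the strong DGP4 performance of Ti-Homo and Ti-Hetero, can be seen in a stylized calculation (numpy; this is an illustration of the averaging effect, not the paper's estimator): the mean over a group of similar units has a much smaller error than each unit's own sample mean.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, sigma = 100, 10, 1.0
alpha_true = 0.5                 # all units share one intercept (no groups)
y = alpha_true + sigma * rng.normal(size=(N, T))

alpha_flat = y.mean(axis=1)      # unit-by-unit means ("Flat")
alpha_grouped = y.mean()         # mean over all similar units pooled together

rmse_flat = np.sqrt(np.mean((alpha_flat - alpha_true) ** 2))
rmse_grouped = abs(alpha_grouped - alpha_true)
# Pooling N similar units shrinks the estimation error by roughly sqrt(N).
```

The flat estimator's error scales like $\sigma/\sqrt{T}$, while the grouped mean's scales like $\sigma/\sqrt{NT}$, which is why partitioning similar units helps even when the true number of groups is one.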
Table E.1: Monte Carlo Experiment: Point Estimates, Larger σ

                          --------------- ρ̂ ---------------    α̂_i    Cluster
                  RMSE     Bias      Std     AvgL     Cov     PBias    Avg K
DGP 1 (Grp Ti Ho.)
  Ti-Homo        0.0336   0.0227   0.0169   0.0658   0.70   -0.1446     3.08
  Ti-Hetero      0.0342   0.0241   0.0165   0.0643   0.63   -0.1548     2.96
  Tv-Homo        0.2386   0.2363   0.0181   0.0686   0.05   -1.5126     1.30
  Tv-Hetero      0.2436   0.2419   0.0155   0.0597   0.05   -1.5488     1.41
  Pooled         0.2217   0.2215   0.0088   0.0342   0      -1.4113     1
  Flat           0.0665  -0.0642   0.0164   0.0639   0.03    0.4058   100
  Param          0.2493   0.2205   0.1119   0.4429   0.51   -1.4000     1
DGP 2 (Grp Ti He.)
  Ti-Homo        0.0146   0.0058   0.0104   0.0404   0.94   -0.0405     4.19
  Ti-Hetero      0.0084   0.0033   0.0062   0.0242   0.91   -0.0231     3.96
  Tv-Homo        0.2231   0.2215   0.0193   0.0744   0.02   -1.4385    10.59
  Tv-Hetero      0.1639   0.1611   0.0218   0.0827   0.09   -1.0595     3.08
  Pooled         0.2495   0.2494   0.0063   0.0245   0      -1.6040     1
  Flat           0.0262  -0.0237   0.0105   0.0409   0.32    0.1506   100
  Param          0.2743   0.2480   0.1135   0.4488   0.29   -1.5926     1
DGP 3 (Grp Tv Ho.)
  Ti-Homo        0.2074   0.2064   0.0172   0.0670   0.02    0.0052     1.03
  Ti-Hetero      0.2063   0.2054   0.0169   0.0659   0.01    0.0050     1.03
  Tv-Homo        0.2102   0.2095   0.0168   0.0653   0       0.0052     1.03
  Tv-Hetero      0.2113   0.2106   0.0169   0.0658   0       0.0051     1.17
  Pooled         0.2102   0.2096   0.0166   0.0646   0       0.0050     1
  Flat           0.1870  -0.1849   0.0277   0.1080   0      -0.0047   100
  Param          0.2162   0.2098   0.0505   0.2012   0.01    0.0168     1
DGP 4 (Std Ti Ho.)
  Ti-Homo        0.0145   0.0063   0.0097   0.0376   0.88   -0.0439     4.45
  Ti-Hetero      0.0148   0.0069   0.0098   0.0384   0.89   -0.0481     4.23
  Tv-Homo        0.3096   0.3093   0.0137   0.0530   0      -1.9922     1.99
  Tv-Hetero      0.3108   0.3105   0.0135   0.0522   0      -1.9998     2.04
  Pooled         0.2529   0.2528   0.0062   0.0240   0      -1.6281     1
  Flat           0.0256  -0.0230   0.0100   0.0388   0.37    0.1481   100
  Param          0.2775   0.2515   0.1161   0.4593   0.26   -1.6167     1
Table E.2: Monte Carlo Experiment: Forecast, Larger σ

                     --- Point Forecast ---    - Set Forecast -   Density Forecast
                  RMSFE    Error      Std      AvgL     Cov      LPS      CRPS
DGP 1 (Grp Ti Ho.)
  Ti-Homo        1.2281   0.0012   1.2210    4.8430   0.95   -1.6271   0.6939
  Ti-Hetero      1.2297   0.0041   1.2224    4.8338   0.95   -1.6305   0.6952
  Tv-Homo        1.2863   0.0180   1.2724    5.1561   0.95   -1.6746   0.7266
  Tv-Hetero      1.2902   0.0193   1.2760    5.1626   0.95   -1.6776   0.7285
  Pooled         1.3385   0.3476   1.2832    5.4955   0.96   -1.7161   0.7567
  Flat           1.2637  -0.1433   1.2494    4.8660   0.95   -1.6564   0.7147
  Param          1.3415   0.3499   1.2856    8.2549   1      -1.8465   0.7971
DGP 2 (Grp Ti He.)
  Ti-Homo        0.6989   0.0055   0.6949    2.7235   0.93   -1.0583   0.3819
  Ti-Hetero      0.6926   0.0024   0.6884    2.5818   0.96   -0.8640   0.3600
  Tv-Homo        0.8502   0.0101   0.8425    2.5678   0.89   -1.2358   0.4496
  Tv-Hetero      0.7313   0.0080   0.7230    2.7180   0.95   -0.9323   0.3812
  Pooled         0.8699   0.4283   0.7507    3.7531   0.95   -1.2926   0.4832
  Flat           0.7172  -0.0414   0.7120    2.8231   0.93   -1.0899   0.3935
  Param          0.8751   0.4284   0.7567    7.1947   1      -1.5801   0.5658
DGP 3 (Grp Tv Ho.)
  Ti-Homo        1.2684  -0.0325   1.2614    5.0355   0.95   -1.6600   0.7162
  Ti-Hetero      1.2686  -0.0326   1.2616    5.0343   0.95   -1.6603   0.7164
  Tv-Homo        1.2750   0.0022   1.2616    5.0568   0.95   -1.6653   0.7200
  Tv-Hetero      1.2749   0.0026   1.2614    5.0580   0.95   -1.6653   0.7197
  Pooled         1.2678  -0.0328   1.2608    5.0383   0.95   -1.6596   0.7158
  Flat           1.2781  -0.0377   1.2711    4.7935   0.94   -1.6690   0.7235
  Param          1.2835  -0.0214   1.2608    9.2910   1      -1.8663   0.7878
DGP 4 (Std Ti Ho.)
  Ti-Homo        0.6435  -0.0112   0.6400    2.5472   0.95   -0.9807   0.3636
  Ti-Hetero      0.6436  -0.0103   0.6401    2.6020   0.95   -0.9837   0.3639
  Tv-Homo        0.7069   0.0247   0.6988    2.8327   0.95   -1.0760   0.3993
  Tv-Hetero      0.7072   0.0247   0.6991    2.8513   0.95   -1.0771   0.3994
  Pooled         0.8200   0.4198   0.7011    3.5977   0.97   -1.2356   0.4660
  Flat           0.6720  -0.0580   0.6663    2.6339   0.95   -1.0246   0.3800
  Param          0.8243   0.4203   0.7059    7.0829   1      -1.5529   0.5467
E.2 Main MC Simulation: Shorter Time Periods
Here, we show the additional simulation results of DGP1, DGP2 and DGP4 with smallperiod, i.e, T “
5. The rest settings remain the same: N “ σ “ . and the truenumber of groups is K “ T ˆ ρ ˆ α i ClusterRMSE Bias Std AvgL Cov PBias Avg KDGP 1(Grp Ti Ho.) Ti-Homo 0.0379 0.0276 0.0198 0.0766 0.67 -0.1414 3.16Ti-Hetero 0.0387 0.029 0.0198 0.0772 0.66 -0.1488 3.09Tv-Homo 0.3577 0.3566 0.0199 0.0777 0.02 -1.8223 1.37Tv-Hetero 0.3654 0.3646 0.0195 0.0760 0.01 -1.8584 1.63Pooled 0.2789 0.2785 0.0120 0.0467 0.01 -1.4050 1Flat 0.0591 -0.0554 0.0190 0.0738 0.16 0.2782 100Param 0.3146 0.2783 0.1385 0.5503 0.43 -1.4006 1DGP 2(Grp Ti He.) Ti-Homo 0.0523 0.0380 0.0245 0.0952 0.66 -0.189 2.91Ti-Hetero 0.0230 0.0126 0.0147 0.0574 0.86 -0.0639 3.40Tv-Homo 0.3099 0.3052 0.0288 0.1115 0.05 -1.5558 5.65Tv-Hetero 0.2099 0.2018 0.0377 0.1455 0.17 -1.0403 2.87Pooled 0.2613 0.2609 0.0138 0.0537 0 -1.3253 1Flat 0.0833 -0.0798 0.0232 0.0902 0.04 0.4030 100Param 0.2965 0.2602 0.1365 0.5417 0.51 -1.3210 1DGP 4(Std Ti Ho.) Ti-Homo 0.2345 0.2329 0.0269 0.1051 0 0.0018 1Ti-Hetero 0.2344 0.2328 0.0269 0.105 0 0.0017 1.01Tv-Homo 0.2357 0.2342 0.0270 0.1051 0 0.0016 1Tv-Hetero 0.2363 0.2347 0.0272 0.1062 0 0.0018 1.24Pooled 0.2344 0.2328 0.0270 0.1051 0 0.0017 1Flat 0.3601 -0.3571 0.0459 0.1790 0 -0.0029 100Param 0.2516 0.2323 0.0941 0.3756 0.19 0.0066 1 his Version: October 6, 2020his Version: October 6, 2020
Table E.4: Monte Carlo Experiment: Forecast, Smaller T

                         Point Forecast           Set Forecast   Density Forecast
                     RMSFE    Error     Std      AvgL    Cov       LPS     CRPS
DGP 1 (Grp Ti Ho.)
  Ti-Homo           0.8491   0.0553   0.8407   3.3362   0.95    -1.2561   0.4798
  Ti-Hetero         0.8505   0.0585   0.8418   3.3743   0.95    -1.2605   0.4810
  Tv-Homo           0.9310   0.1221   0.9114   3.6884   0.95    -1.3514   0.5275
  Tv-Hetero         0.9352   0.1248   0.9156   3.6947   0.95    -1.3549   0.5296
  Pooled            1.0817   0.6143   0.8796   4.2897   0.96    -1.4992   0.6152
  Flat              0.8790  -0.1242   0.8653   3.4044   0.95    -1.2946   0.4980
  Param             1.0852   0.6144   0.8836   7.5513   1       -1.6938   0.6666
DGP 2 (Grp Ti He.)
  Ti-Homo           1.1264   0.0956   1.1131   4.2430   0.93    -1.5308   0.6138
  Ti-Hetero         1.0782   0.0397   1.0698   3.9245   0.95    -1.3183   0.5621
  Tv-Homo           1.2929   0.1517   1.2713   4.0156   0.89    -1.6690   0.6880
  Tv-Hetero         1.1323   0.1113   1.1129   4.1041   0.94    -1.3952   0.5948
  Pooled            1.2862   0.6034   1.1257   5.0186   0.93    -1.6744   0.7084
  Flat              1.1320  -0.166    1.1129   4.3232   0.93    -1.5477   0.6210
  Param             1.2889   0.6022   1.1294   7.9859   0.99    -1.8040   0.7577
DGP 4 (Std Ti Ho.)
  Ti-Homo           0.8550  -0.0081   0.8500   3.4368   0.96    -1.2665   0.4841
  Ti-Hetero         0.8550  -0.0082   0.8500   3.4362   0.96    -1.2668   0.4841
  Tv-Homo           0.8607  -0.026    0.8501   3.4509   0.95    -1.2733   0.4875
  Tv-Hetero         0.8620  -0.0259   0.8514   3.4513   0.95    -1.2745   0.4883
  Pooled            0.8550  -0.0082   0.8500   3.4362   0.95    -1.2665   0.4841
  Flat              0.8907   0.0036   0.886    3.2123   0.92    -1.3147   0.5052
  Param             0.8671  -0.0032   0.8498   8.5456   1       -1.6545   0.5970
E.3 Main MC Simulation: Different K

In this section, we present the simulation results for DGP1, DGP2, and DGP3 with different numbers of groups; in particular, we consider four values of K, the largest being K = 8. The remaining settings are unchanged from the main simulation, with T = 10.

Figure E.1 presents the relative performance of the BGRE estimators against the flat estimator under different K. In particular, we show the results for the correctly specified estimator in each DGP, i.e., the Ti-Homo estimator for DGP 1, the Ti-Hetero estimator for DGP 2, and the Tv-Homo estimator for DGP 3. For DGP 1, the accuracy of the estimates and the predictive power of the BGRE estimator gradually vanish as K increases. At K = 8, the BGRE estimator still marginally dominates the flat estimator in all respects except the bias of α. Moving to DGP 2, the BGRE estimator outperforms the flat estimator for all K, suggesting that it successfully captures heterogeneity in variance and sophisticated group patterns, even with eight distinct clusters. Regarding DGP 3, where we introduce time variation in α, the BGRE estimator outperforms the benchmark model in terms of forecasting. Moreover, the RMSFE, the average length of the credible set, and the LPS all trend down, indicating that the predictive gains of the BGRE estimator grow as the true model becomes more sophisticated. It is also noteworthy that, while the average length of the credible set for ρ is relatively large, the BGRE estimator generates a much lower RMSE for ρ and a much lower absolute bias for α than the flat estimator.
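The point-, set-, and density-forecast measures reported in these tables (RMSFE, average credible-set length and coverage, LPS, CRPS) are standard, and the sketch below shows how they can be computed for a one-step-ahead Gaussian predictive density. This is a minimal illustration of ours with hypothetical names, assuming a normal predictive density for each unit; in the paper the predictive densities are posterior mixtures, so this is not the paper's implementation.

```python
import numpy as np
from scipy import stats

def forecast_metrics(y, mu, sigma, level=0.95):
    """Forecast-evaluation metrics for Gaussian predictive densities
    N(mu[i], sigma[i]^2), one per cross-sectional unit.
    y: realized values; returns (RMSFE, AvgL, Cov, LPS, CRPS)."""
    y, mu, sigma = map(np.asarray, (y, mu, sigma))
    # point forecast: root mean squared forecast error
    rmsfe = np.sqrt(np.mean((y - mu) ** 2))
    # set forecast: equal-tailed predictive interval at the given level
    zcrit = stats.norm.ppf(0.5 + level / 2)
    lo, hi = mu - zcrit * sigma, mu + zcrit * sigma
    avg_len = np.mean(hi - lo)                 # AvgL
    coverage = np.mean((y >= lo) & (y <= hi))  # Cov
    # density forecast: average log predictive score at the realization
    lps = np.mean(stats.norm.logpdf(y, mu, sigma))
    # closed-form CRPS of a normal predictive density (lower is better)
    z = (y - mu) / sigma
    crps = np.mean(sigma * (z * (2 * stats.norm.cdf(z) - 1)
                            + 2 * stats.norm.pdf(z) - 1 / np.sqrt(np.pi)))
    return rmsfe, avg_len, coverage, lps, crps
```

For a correctly specified N(0, 1) forecast of N(0, 1) data, coverage should be close to 0.95 and the expected CRPS equals 1/√π ≈ 0.564, which provides a quick sanity check of the implementation.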
Figure E.1: Monte Carlo Experiment: BGRE Estimator, Different K. Panels: (a) DGP1, Estimates; (b) DGP1, Forecasts; (c) DGP2, Estimates; (d) DGP2, Forecasts; (e) DGP3, Estimates; (f) DGP3, Forecasts.

Table E.5: Monte Carlo Experiment: Point Estimates, Different K, DGP1 (Grp Ti Ho.)

Table E.6: Monte Carlo Experiment: Forecast, Different K, DGP1 (Grp Ti Ho.)

Table E.7: Monte Carlo Experiment: Point Estimates, Different K, DGP2 (Grp Ti He.)

Table E.8: Monte Carlo Experiment: Forecast, Different K, DGP2 (Grp Ti He.)

Table E.9: Monte Carlo Experiment: Point Estimates, Different K, DGP3 (Grp Tv Ho.)

Table E.10: Monte Carlo Experiment: Forecast, Different K, DGP3 (Grp Tv Ho.)
E.4 Two-Step GRE Estimator
Table E.11: Monte Carlo Experiment: Two-Step GRE with Kmeans, Point Estimates

                         Estimates of ρ                       α_i     Cluster
                     RMSE     Bias      Std     AvgL    Cov  PBias     Avg K
DGP 1 (Grp Ti Ho.)
  Ti-Homo          0.0627   0.0599   0.0114   0.0444   0.25  -0.3836    2.2
  Ti-Hetero        0.0611   0.0583   0.0116   0.0452   0.27  -0.3734    2.2
  Tv-Homo          0.1567   0.1544   0.0186   0.0721   0.11  -0.9927    2.2
  Tv-Hetero        0.1560   0.1537   0.0189   0.0738   0.10  -0.9887    2.2
DGP 2 (Grp Ti He.)
  Ti-Homo          0.0550   0.0513   0.0133   0.0517   0.27  -0.3388    2.2
  Ti-Hetero        0.0456   0.0429   0.0099   0.0386   0.27  -0.2822    2.2
  Tv-Homo          0.1196   0.1143   0.0203   0.0789   0.21  -0.7572    2.2
  Tv-Hetero        0.1458   0.1427   0.0196   0.0764   0.15  -0.9400    2.2
DGP 3 (Grp Tv Ho.)
  Ti-Homo          0.2863   0.2861   0.0114   0.0446   0.00  -2.2885    2.26
  Ti-Hetero        0.2807   0.2804   0.0115   0.0448   0.00  -2.2489    2.26
  Tv-Homo          0.1196   0.1171   0.0166   0.0648   0.08  -0.8168    2.26
  Tv-Hetero        0.1172   0.1144   0.017    0.0661   0.08  -0.7982    2.26
DGP 4 (Std Ti Ho.)
  Ti-Homo          0.0832   0.0796   0.0210   0.0819   0.10   0.0014    2.03
  Ti-Hetero        0.0833   0.0797   0.0210   0.0819   0.08   0.0015    2.03
  Tv-Homo          0.0942   0.0908   0.0218   0.0851   0.05   0.0018    2.03
  Tv-Hetero        0.0943   0.0908   0.0218   0.0851   0.06   0.0019    2.03
Table E.12: Monte Carlo Experiment: Two-Step GRE with Kmeans, Forecast

                         Point Forecast           Set Forecast   Density Forecast
                     RMSFE    Error     Std      AvgL    Cov       LPS     CRPS
DGP 1 (Grp Ti Ho.)
  Ti-Homo           0.8607   0.0777   0.8502   3.3899   0.95    -1.2715   0.4866
  Ti-Hetero         0.8610   0.0751   0.8507   3.3949   0.95    -1.2714   0.4867
  Tv-Homo           0.8457   0.0084   0.8369   3.3324   0.95    -1.2545   0.4775
  Tv-Hetero         0.8457   0.0086   0.8369   3.3393   0.95    -1.2551   0.4774
DGP 2 (Grp Ti He.)
  Ti-Homo           1.0969   0.0856   1.0860   4.2250   0.93    -1.5155   0.6031
  Ti-Hetero         1.0993   0.0723   1.0896   4.0301   0.94    -1.4115   0.5857
  Tv-Homo           1.0886   0.0054   1.0764   4.2115   0.93    -1.5073   0.5971
  Tv-Hetero         1.0900   0.0079   1.0775   3.9970   0.94    -1.3806   0.5758
DGP 3 (Grp Tv Ho.)
  Ti-Homo           1.3326   0.3341   1.2852   5.0617   0.95    -1.7082   0.7589
  Ti-Hetero         1.3393   0.3219   1.2952   5.0363   0.95    -1.7071   0.7616
  Tv-Homo           1.3031   0.0273   1.2961   4.2823   0.90    -1.7175   0.7477
  Tv-Hetero         1.3052   0.0263   1.2982   4.2703   0.90    -1.7150   0.7483
DGP 4 (Std Ti Ho.)
  Ti-Homo           0.8532  -0.0232   0.8488   3.2353   0.94    -1.2638   0.4822
  Ti-Hetero         0.8532  -0.0231   0.8488   3.2420   0.94    -1.2641   0.4823
  Tv-Homo           0.8537  -0.0018   0.8458   3.2581   0.94    -1.2640   0.4826
  Tv-Hetero         0.8537  -0.0014   0.8458   3.2654   0.94    -1.2646   0.4826
E.5 Subjective Priors With Knowledge on Groups
Table E.13: Monte Carlo Experiment: Estimates, SGP prior, DGP3

                    Estimates of ρ                        α_i     Cluster
               RMSE     Bias      Std     AvgL    Cov     Bias     Avg K
  SGP-RE1    0.0396   0.0294   0.0225   0.0871   0.75   -0.2072    4.00
  SGP-RE2    0.0463   0.0378   0.0228   0.0892   0.64   -0.2650    3.98
  SGP-RE3    0.0760   0.0716   0.0214   0.0834   0.26   -0.4993    3.58
  SGP-RE4    0.0821   0.0793   0.0210   0.0818   0.02   -0.5533    4.00
  SGP-RE5    0.0691   0.0654   0.0214   0.0834   0.09   -0.4583    6.00
  Tv-Hetero  0.0599   0.0549   0.0220   0.0849   0.31   -0.3842    3.85
  Flat       0.3243   0.3240   0.0126   0.0493   0      -2.5626    100

Table E.14: Monte Carlo Experiment: Forecast, SGP prior, DGP3

                    Point Forecast           Set Forecast   Density Forecast
               RMSFE    Error     Std      AvgL    Cov       LPS     CRPS
  SGP-RE1     1.0266   0.0198   1.0178   3.7766   0.93    -1.4553   0.5823
  SGP-RE2     1.0465   0.0214   1.0377   3.8382   0.93    -1.4716   0.5926
  SGP-RE3     1.1337   0.0287   1.1250   4.1172   0.93    -1.5542   0.6455
  SGP-RE4     1.0682   0.0310   1.0590   4.0305   0.94    -1.4934   0.6063
  SGP-RE5     1.0784   0.0293   1.0696   3.9689   0.93    -1.5070   0.6114
  Tv-Hetero   1.0952   0.0255   1.0866   3.9531   0.93    -1.6450   0.6192
  Flat        1.2400   0.4211   1.1608   5.3189   0.97    -1.6450   0.7048
E.5.1 Fixed K Estimator: Imposing the True Number of Groups
As shown in the previous subsection, the Bayesian GRE estimator works reasonably well in finite samples at determining the number of groups. In this subsection, we instead assume that the number of groups is known and focus on clustering. We present a table of clustering accuracy, in which each row shows the fraction of units that are correctly assigned to the true
group. As an orthodox clustering algorithm, Kmeans is also included as a benchmark. To avoid cluttering the tables in the main text, we do not present results for the suboptimal estimators. More precisely, for the DGPs involving time-invariant random effects, we only document the results for Ti-Homo and Ti-Hetero, since the other estimators are arguably worse at clustering, based on the simulations presented in the previous section; the same rule applies to the time-varying DGPs.

Table E.15 shows the accuracy of clustering for each estimator. Overall, accuracy is high for Kmeans and for the correctly specified estimator in each DGP, while our BGRE estimators are slightly dominated by the Kmeans algorithm. The reasons are straightforward. Our BGRE estimators simultaneously estimate parameters and group units, whereas Kmeans merely performs clustering. The additional estimation steps in our block Gibbs sampler depend on priors and parametric assumptions that can affect the clustering; the Kmeans algorithm, on the other hand, forms clusters purely from the spatial relationships between units, free of any such assumption. These differences yield the discrepancies in accuracy between Kmeans and the BGRE estimators, which are acceptable as they are within 10% most of the time. Compared with the performance of Kmeans within the two-step GRE estimator (Table 6), imposing the correct number of groups indeed improves the clustering ability of Kmeans. Nevertheless, the true number of groups is rarely known in practice.

Table E.15: Monte Carlo Experiment: Accuracy of Clustering, Fixed K

                                Group 1   Group 2   Group 3   Group 4
DGP 5 (Grp Ti Ho.)  Ti-Homo    86.87%    83.62%    60.42%    91.64%
                    Ti-Hetero  73.59%    66.17%    56.47%    95.82%
                    Kmeans     88.00%    96.00%    68.00%   100.00%
DGP 6 (Grp Ti He.)  Ti-Homo    78.17%    78.66%    66.58%    99.14%
                    Ti-Hetero  89.63%    84.22%    85.18%    99.98%
                    Kmeans     84.00%   100.00%    88.00%   100.00%
DGP 7 (Grp Tv Ho.)  Tv-Homo    99.33%    68.99%    93.40%    84.61%
                    Tv-Hetero  98.92%    71.53%    93.41%    75.87%
                    Kmeans     96.00%    88.00%   100.00%    88.00%

Next, we visualize the clusters to provide a clearer view of the clustering performance. We construct a posterior similarity matrix, a matrix containing the posterior probabilities that observations i and j belong to the same cluster, estimated empirically from the MCMC draws. This design avoids the problem of reassigning group members in each posterior draw
and shows a clear group structure.

Figures E.2, E.3, and E.4 present the similarity matrices for the simulations using DGP5, DGP6, and DGP7, respectively. The colors depict the degree of similarity. Ideally, a perfect estimator would reveal four light yellow squares in the heatmap, leaving the remaining area dark blue. As DGP5 features time-invariant random effects and homoskedasticity, the Ti-Homo and Ti-Hetero estimators reveal a clear partition that matches the design of DGP5. Though a few units are incorrectly clustered, four yellow squares on the diagonal indicate that the posterior partition is reliable. The Tv-Homo and Tv-Hetero estimators, however, deliver inferior estimates and present one major group instead of four.

Turning to DGP6, the best partition is generated by the Ti-Hetero estimator, which is correctly specified under this DGP. Even though the data density of group 2 heavily overlaps with those of groups 1 and 3, owing to the relatively small mean and large variance of α_i, the Ti-Hetero estimator succeeds in delivering a group pattern that clearly distinguishes these three groups.
The Ti-Homo estimator also performs well despite ignoring heteroskedasticity, but it generates much vaguer boundaries between groups 1, 2, and 3. The Tv-Homo and Tv-Hetero results are severely garbled; neither depicts the correct partition.

As for DGP7, Tv-Homo and Tv-Hetero perform best, which is expected under this DGP. We see a clear four-group pattern in the similarity matrices in panels (c) and (d). A few yellow and light blue stripes in the off-diagonal blocks suggest that the Tv-Homo and Tv-Hetero estimators wrongly allocate a few units in the posterior draws, especially units in groups 2 and 4. Indeed, the paths of the random effects in these two groups are quite similar: as depicted in Figure 1, the red line (group 2) can roughly be viewed as a step-function approximation of the green line (group 4). Ti-Homo and Ti-Hetero struggle as they ignore the time effect in α_i by construction.
Figure E.2: Heatmap for Similarity Matrix, DGP5, fix K
Figure E.3: Heatmap for Similarity Matrix, DGP6, fix K
Figure E.4: Heatmap for Similarity Matrix, DGP7, fix K

F Data Set
The individual company raw annual data are obtained from the COMPUSTAT database. We construct the sample using data from 1999 to 2019. We do not use data going back to the 1970s in order to avoid potential structural breaks in the variables of interest and to reflect the accelerated pace of capital accumulation in recent decades. The primary variables of interest are:

• K = Capital stock: net property, plant, and equipment. [PPENT]
• I = Investment: capital expenditures in property, plant, and equipment. [CAPX]
• Y = Sales: net sales revenues. [SALE]
• CF = Cash flow: income after taxes and interest plus depreciation minus dividends. [EBITDA - TXT - XINT - DVT]

Additional variables used in the alternative model specification:
• Q = Tobin's Q: defined as (E + B - INV) / K - 1.
• E = Market value of equity: the sum of common equity and preferred equity. [PRCC_f * CSHO + PSTK]
• B = Book value of debt: the sum of short-term and long-term debt. [DLC + DLTT]
• INV = Market value of inventories. [INVT]

The variable names and formulas in brackets are the corresponding items in COMPUSTAT. We process the raw data according to the following rules:

1. Observations where capital stock or sales are zero or negative are eliminated.
2. Firms that have missing values in the primary variables of interest during 1999-2019 are excluded.
3. We eliminate any firm-year observation in which the firm was involved in a merger or acquisition.
4. Each firm must have valid annual observations from 1999 to 2019.

The final sample comprises 337 firms with 20 observations per firm. The summary statistics are reported in Table F.1.

Table F.1: Descriptive Statistics for the Variables of Interest

          Min     25%   Median   Mean    75%    Max    StD   Skew.  Kurt.
I/K       0.03   0.11    0.16    0.17   0.21   0.53   0.09   1.41   2.53
CF/K     -1.13   0.12    0.26    0.38   0.51   2.55   0.48   1.55   5.94
Y/K      -1.53   0.54    1.35    1.19   1.95   4.19   1.17  -0.23  -0.09
N/K      -8.63  -5.36   -4.19   -4.56  -3.46  -1.77   1.52  -0.74  -0.12
log(K)   -0.37   5.16    6.77    6.60   8.32   9.82   2.26  -0.63   0.21
q        -0.55   0.83    2.96    7.37   8.32  90.06  12.92   4.00  19.63
Notes: The descriptive statistics are computed across the N and T dimensions of the panel.
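The variable construction above is simple arithmetic on the listed COMPUSTAT items. A minimal sketch for one firm-year, assuming the raw items are available as a mapping keyed by the item names listed above (the function name, the `PRCC_F` spelling of the price item, and the dict layout are ours):

```python
def construct_variables(row):
    """Build the model variables for one firm-year.

    `row` maps COMPUSTAT item names (PPENT, CAPX, SALE, EBITDA, TXT,
    XINT, DVT, PRCC_F, CSHO, PSTK, DLC, DLTT, INVT) to float values.
    Returns None for firm-years that fail the non-positivity filter.
    """
    K = row["PPENT"]                               # capital stock
    Y = row["SALE"]                                # sales
    # filter rule 1: drop observations with non-positive capital or sales
    if K <= 0 or Y <= 0:
        return None
    I = row["CAPX"]                                # investment
    # cash flow: income after taxes and interest plus depreciation
    # minus dividends
    CF = row["EBITDA"] - row["TXT"] - row["XINT"] - row["DVT"]
    E = row["PRCC_F"] * row["CSHO"] + row["PSTK"]  # market value of equity
    B = row["DLC"] + row["DLTT"]                   # book value of debt
    Q = (E + B - row["INVT"]) / K - 1.0            # Tobin's Q
    return {"K": K, "I": I, "Y": Y, "CF": CF, "Q": Q}
```

The merger-and-acquisition and balanced-panel filters (rules 3 and 4) require firm-level metadata and the full time series, so they are not shown here.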
G Additional Empirical Results
In this section, we present the full results of the empirical analysis, with detailed year-by-year estimates.

Table G.1: Empirical Application: Predict Investment Rate, RMSFE

                        2015     2016     2017     2018     2019
Homogeneous Coef.
  Ti-Homo             0.0917   0.1395   0.2625   0.1166   0.1108
  Ti-Hetero           0.0750   0.1159   0.3550   0.0674   0.0822
  Tv-Homo             0.0927   0.1382   0.2590   0.1165   0.1177
  Tv-Hetero           0.0783   0.1156   0.3686   0.0692   0.0812
  Pooled              0.0926   0.1386   0.2625   0.1160   0.1150
  Flat                0.1034   0.1491   0.2703   0.1328   0.1100
  Param               0.1958   0.2295   0.2466   0.2492   0.2043
Heterogeneous Coef.
  Ti-Homo             0.1103   0.1006   1.8575   0.1041   0.1144
  Ti-Hetero           0.1104   0.0999   1.8802   0.1028   0.1152
  Tv-Homo             0.1582   0.1729   0.2863   0.1782   0.1070
  Tv-Hetero           0.1097   0.1009   1.8644   0.1062   0.1101
  Flat                0.1649   0.1906   1.7937   0.1833   0.1164
Table G.2: Empirical Application: Predict Investment Rate, Average Number of Groups

                        2015     2016     2017     2018     2019
Homogeneous Coef.
  Ti-Homo                2        2        1        2        2
  Ti-Hetero            6.62     7.8      6.66     7.79     7.86
  Tv-Homo                1        1        1        1        1
  Tv-Hetero            6.75     6.79     6.8      7.64     6.75
  Pooled                 1        1        1        1        1
  Flat                 337      337      337      337      337
  Param                  1        1        1        1        1
Heterogeneous Coef.
  Ti-Homo              7.75     6.66     6.78     8.05     7.48
  Ti-Hetero            7.09     6.65     7.23     6.67     6.68
  Tv-Homo                1        1        1        1        1
  Tv-Hetero            6.07     6.11     6.79     6.57     7.46
  Flat                 337      337      337      337      337

Table G.3: Empirical Application: Predict Investment Rate, Frequentist Coverage Rate

                        2015     2016     2017     2018     2019
Homogeneous Coef.
  Ti-Homo             0.9822   0.9703   0.9733   0.9733   0.9614
  Ti-Hetero           0.9525   0.9466   0.9674   0.9585   0.9021
  Tv-Homo             0.9822   0.9644   0.9792   0.9733   0.9525
  Tv-Hetero           0.9466   0.9347   0.9496   0.9466   0.8813
  Pooled              0.9852   0.9644   0.9792   0.9763   0.9555
  Flat                0.9792   0.9792   0.9703   0.9733   0.9703
  Param                  1        1        1        1        1
Heterogeneous Coef.
  Ti-Homo             0.9407   0.9585   0.9555   0.9525   0.8724
  Ti-Hetero           0.9436   0.9466   0.9674   0.9496   0.8724
  Tv-Homo             0.9852   0.9792   0.9792   0.9792   0.9703
  Tv-Hetero           0.9466   0.9466   0.9407   0.9258   0.8338
  Flat                0.9733   0.9822   0.9733   0.9763   0.9733
Table G.4: Empirical Application: Predict Investment Rate, Length of 95% Credible Set

                        2015     2016     2017     2018     2019
Homogeneous Coef.
  Ti-Homo             0.5737   0.5692   0.5710   0.5912   0.5908
  Ti-Hetero           0.4149   0.3129   0.3223   0.3030   0.4012
  Tv-Homo             0.5665   0.5620   0.5628   0.5851   0.5875
  Tv-Hetero           0.2886   0.2801   0.2809   0.3810   0.3867
  Pooled              0.5759   0.5716   0.5716   0.5946   0.5966
  Flat                0.5709   0.5664   0.5729   0.5976   0.6041
  Param               6.7334   6.7548   6.9387   6.8507   6.8211
Heterogeneous Coef.
  Ti-Homo             0.2881   0.3008   0.403    0.2925   0.2837
  Ti-Hetero           0.2889   0.3024   0.4033   0.2921   0.2841
  Tv-Homo             0.6368   0.6605   0.7106   0.6414   0.6344
  Tv-Hetero           0.2827   0.2961   0.3978   0.2840   0.2741
  Flat                0.6660   0.7119   0.7948   0.6826   0.6753

Table G.5: Empirical Application: Predict Investment Rate, LPS

                        2015     2016     2017     2018     2019
Homogeneous Coef.
  Ti-Homo             0.8021   0.5577  -0.3237   0.6752   0.7039
  Ti-Hetero           1.5788   1.5833   1.6552   1.7086   1.3724
  Tv-Homo             0.8044   0.5601  -0.3055   0.6754   0.6671
  Tv-Hetero           1.5618   1.5680   1.6063   1.6896   1.2981
  Pooled              0.7952   0.5556  -0.3197   0.6716   0.6746
  Flat                0.7532   0.4900  -0.5030   0.5868   0.6935
  Param              -1.2611  -1.2671  -1.2779  -1.2760  -1.2722
Heterogeneous Coef.
  Ti-Homo             1.3901   1.5670   1.2802   1.6146   1.1904
  Ti-Hetero           1.3059   1.5676   1.5278   1.6140   1.1883
  Tv-Homo             0.6598   0.5030   0.4711   0.5313   0.6879
  Tv-Hetero           1.5067   1.5490   1.5286   1.5520   1.1013
  Flat                0.5675   0.4889   0.4966   0.4385   0.6205
Table G.6: Empirical Application: Predict Investment Rate, CRPS

                        2015     2016     2017     2018     2019
Homogeneous Coef.
  Ti-Homo             0.0529   0.0633   0.0712   0.0599   0.0605
  Ti-Hetero           0.0398   0.0399   0.0511   0.0315   0.0464
  Tv-Homo             0.0532   0.0635   0.0721   0.0599   0.0634
  Tv-Hetero           0.0354   0.0394   0.0517   0.0332   0.0454
  Pooled              0.0535   0.0637   0.0712   0.0603   0.0627
  Flat                0.0537   0.0640   0.0723   0.0620   0.0604
  Param               0.3510   0.3550   0.3649   0.3601   0.3554
Heterogeneous Coef.
  Ti-Homo             0.0389   0.0382   0.1257   0.0343   0.0485
  Ti-Hetero           0.0391   0.0381   0.1271   0.0346   0.0485
  Tv-Homo             0.0639   0.072    0.0790   0.0696   0.0615
  Tv-Hetero           0.0384   0.0379   0.1248   0.0353   0.0486
  Flat                0.0702   0.0751   0.1587   0.0721   0.065
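Metrics of the kind reported in Tables G.1 and G.3–G.6 can be computed from posterior predictive draws. The sketch below is ours, not the paper's code, and assumes the point forecast is the posterior predictive mean and the credible set is equal-tailed; it covers RMSFE, coverage rate, credible-set length, and the sample-based CRPS (E|X - y| - 0.5 E|X - X'|). The LPS additionally requires evaluating the predictive density at the realization, so it is omitted:

```python
import numpy as np

def evaluate_forecasts(draws, actual, alpha=0.05):
    """Forecast evaluation from posterior predictive draws.

    draws  : (M, N) array, M predictive draws for each of N units
    actual : (N,) array of realized values
    Returns the RMSFE of the posterior-mean forecast, the frequentist
    coverage rate and average length of the (1 - alpha) equal-tailed
    credible set, and the average CRPS computed from the draws.
    """
    draws = np.asarray(draws, dtype=float)
    actual = np.asarray(actual, dtype=float)

    point = draws.mean(axis=0)
    rmsfe = np.sqrt(np.mean((point - actual) ** 2))

    lo = np.quantile(draws, alpha / 2, axis=0)
    hi = np.quantile(draws, 1 - alpha / 2, axis=0)
    coverage = np.mean((actual >= lo) & (actual <= hi))
    length = np.mean(hi - lo)

    # sample-based CRPS: E|X - y| - 0.5 E|X - X'|, averaged over units;
    # the pairwise term is O(M^2) in the number of draws
    term1 = np.mean(np.abs(draws - actual), axis=0)
    term2 = 0.5 * np.mean(
        np.abs(draws[:, None, :] - draws[None, :, :]), axis=(0, 1)
    )
    crps = np.mean(term1 - term2)
    return rmsfe, coverage, length, crps
```

For large numbers of draws, the pairwise CRPS term is usually computed from sorted draws instead of the full M × M difference array, but the direct form above makes the definition transparent.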