Panel Data Quantile Regression with Grouped Fixed Effects
JIAYING GU AND STANISLAV VOLGUSHEV
Abstract.
This paper introduces estimation methods for grouped latent heterogeneity in panel data quantile regression. We assume that the observed individuals come from a heterogeneous population with a finite number of types. The number of types and group membership are not assumed to be known in advance and are estimated by means of a convex optimization problem. We provide conditions under which group membership is estimated consistently and establish asymptotic normality of the resulting estimators. Simulations show that the method works well in finite samples when $T$ is reasonably large. To illustrate the proposed methodology we study the effects of the adoption of Right-to-Carry concealed weapon laws on violent crime rates using panel data of 51 U.S. states from 1977 - 2010.

1. Introduction
It is widely accepted in applied Econometrics that individual latent effects constitute an important feature of many economic applications. When panel data are available, a common approach is to incorporate latent structures in a completely nonrestrictive way, i.e. the fixed effect approach. The fixed effect approach is attractive as it imposes minimal assumptions on the structure of the latent effects and on the correlation between the latent effects and the observed covariates and hence has become a very common empirical tool (see Hsiao (2003) for a textbook treatment).

A major challenge of the fixed effects approach lies in the fact that it introduces a large number of parameters which grows linearly with the number of individuals. For a few specific models this can be avoided by differencing out individual effects and learning about the common parameter of interest. However, for most models, including quantile regression, this simple differencing method is not available. The literature contains various approaches that put additional structure on latent effects in order to reduce the number of parameters and obtain more interpretable models. One popular approach is to introduce some parametric distributional structure on the latent effects, see for example Mundlak (1978), Chamberlain (1982) and the correlated random effects literature. An alternative is to assume that the fixed effects have a group structure and hence only take a few distinct values, which is the approach we take in this paper.

There is ample evidence from empirical studies that it is often reasonable to consider a number of homogeneous groups (clusters) within a heterogeneous population. This discrete approach was taken by Heckman and Singer (1982) for duration analysis of unemployment spells of a heterogeneous population of workers. Bester and Hansen (2016) argue that in many applications individuals or firms are grouped naturally by some observable covariates such as classes, schools or industry codes. It is also widely accepted in the discrete choice model literature that individual agents are classified as a number of latent types (for instance Keane and Wolpin (1997) among many others).

Version: August 7, 2018. The authors would like to thank Jacob Bien for bringing the convex clustering literature to their attention. We are also grateful to two anonymous Referees and the Associate Editor whose comments helped to considerably improve the presentation of this manuscript.

Estimating cluster structure has a long history in Statistics and Economics, and has generated a rich and mature literature. A general overview is given in Kaufman and Rousseeuw (2009). Among the many available clustering algorithms, the k-means algorithm (MacQueen (1967)) is one of the most popular methods. It has been successfully utilized in many economic applications, for instance Lin and Ng (2012), Bonhomme and Manresa (2015) and Ando and Bai (2016). Finite mixture models provide an alternative, likelihood based approach. In the latter, grouping is usually achieved by maximizing the likelihood of the observed data. Sun (2005) builds a multinomial logistic regression model to infer the group pattern while nonparametric finite mixture models are considered in Allman, Mathias, and Rhodes (2009) and Kasahara and Shimotsu (2009) among many others.

The focus of the present paper is on quantile regression for panel data with grouped individual heterogeneity. Panel data quantile regression has recently attracted a lot of attention, and there is a rich and growing literature that proposes various approaches to dealing with individual heterogeneity in this setting.
In a pioneering contribution, Koenker (2004) takes the fixed effect approach and introduces individual latent effects as location shifts. These individual effects are regularized through an $\ell_1$ penalty which shrinks them towards a common value. Lamarche (2010) proposes an optimal way to choose the corresponding penalty parameter in order to optimize the asymptotic efficiency of the common parameters of the conditional quantile function, see Harding and Lamarche (2017) for an extension of this approach. Another line of work that focuses on estimating common parameters while putting no structure on individual effects includes Kato, Galvao, and Montes-Rojas (2012), Galvao and Wang (2015) and Galvao and Kato (2016). Alternative approaches have also emerged. Abrevaya and Dahl (2008) take a random effect view of these individual latent effects. They consider a correlated random-effect model in the spirit of Chamberlain (1982) where the individual effects are modeled through a linear regression of some covariates. This is further developed in Arellano and Bonhomme (2016) and Chetverikov, Larsen, and Palmer (2016) where the conditional quantile function of the unobserved heterogeneity is modelled as a function of observable covariates.

Our contribution, which builds upon Koenker (2004), is a linear quantile regression method that accommodates grouped fixed effects. The advantages of our proposal over existing proposals are twofold. First, grouped fixed effects maintain the merit of unrestricted correlation between the latent effects and the observables and strike a good balance between the classical fixed effects approach and the other extreme which completely ignores latent heterogeneity.
Second, in contrast to Koenker (2004), where the fixed effects are treated as nuisance parameters and are regularized to achieve a more efficient estimator for the global parameter, our method allows the researcher to learn the particular group structure of the latent effects together with common parameters of interest in the model. To the best of our knowledge, panel data quantile regression with grouped fixed effects has not been considered in the literature before. The only paper that goes in this direction is Su, Shi, and Phillips (2016). While the general framework developed in that paper does include a version of quantile regression with smoothed quantile objective function, the theoretical analysis requires the smoothing parameter to be fixed. This results in a non-vanishing bias and hence does not correspond to quantile regression in a strict sense.

Footnote: For related literature on non-separable panel data models see Evdokimov (2010), Chernozhukov, Fernández-Val, Hoderlein, Holzmann, and Newey (2015) and the references therein.

We do not assume any prior knowledge of the group structure and combine the quantile regression loss function with the recently proposed convex clustering penalty of Hocking, Vert, Bach, and Joulin (2011). The convex clustering method introduces an $\ell_1$-constraint on the pair-wise difference of the individual fixed effects, which tends to push the fixed effects into clusters. The number of clusters is controlled by a penalty parameter. The resulting optimization problem remains convex and can be solved in a fast and reliable fashion. Further modifications and a theoretical analysis of convex clustering were considered in Zhu, Xu, Leng, and Yan (2014), Tan and Witten (2015) and Radchenko and Mukherjee (2017). All of those authors combine $\ell_1$ penalties with the classical $\ell_2$ loss, and only consider clustering for cross-sectional data.
Their theoretical results are not directly applicable to panel data or the non-smooth quantile loss function, which is the main focus of this paper (all of the available theoretical results explicitly make use of the differentiability of the $\ell_2$ loss function in their proofs).

Our main theoretical contribution is to show consistency of the estimated grouping for a suitable range of penalty parameters when $n$ and $T$ tend to infinity jointly. We also propose a completely data-driven information criterion that facilitates the practical implementation of the method and prove its consistency for group selection as well as asymptotic normality of the resulting parameter estimators.

The remaining part of this paper is organized as follows. Section 2 contains a detailed description of the proposed methodology and provides details on its practical implementation. Assumptions and theoretical results are included in Section 3. Section 4 presents the convex optimization problem and its computational details. Monte Carlo simulation results are included in Section 5, where we investigate the finite-sample behavior of the proposed methodology. In Section 6 we apply the method to study the effect of the adoption of Right-to-Carry concealed weapon laws on violent crime rates using panel data of 51 U.S. states from 1977 - 2010. All proofs are collected in Section 8 while additional simulation results and details for the empirical application are relegated to the Appendix.

2. Methodology
Assume that for individuals $i = 1, ..., n$ we observe repeated measures $(X_{it}, Y_{it})_{t=1,...,T}$ where $X_{it}$ denote covariates and $Y_{it}$ are responses. We shall maintain the assumption that data are i.i.d. within individuals and independent across individuals. The main object of interest in this paper is the conditional $\tau$-quantile function of $Y_{i1}$ given $X_{i1}$, which we will denote by $q_{i,\tau}$. We assume that $q_{i,\tau}$ is of the form

$$q_{i,\tau}(x) = \beta(\tau)^\top x + \alpha_i(\tau), \quad i = 1, ..., n$$

with individual fixed effects $\alpha_i(\tau)$ taking only a finite number, say $K$, of different values, say $\alpha_{(01)}(\tau), ..., \alpha_{(0K)}(\tau)$. We explicitly allow the group membership, and even the number of groups, to be unknown and to depend on $\tau$ but will not stress this dependence in the notation for the sake of simplicity. Our main objective is to jointly estimate the number of groups, unknown group structure, and parameters $\alpha_{(01)}, ..., \alpha_{(0K)}, \beta$ from the observations.

Footnote: Here, $T$ is assumed to be the same across individuals for notational simplicity. All results that follow can be extended to individual-specific values of $T_i$ as long as the ratio $(\max_{i=1,...,n} T_i)/(\min_{i=1,...,n} T_i)$ is uniformly bounded. In this case the theory goes through without changes if all instances of $T$ are replaced by $n^{-1}\sum_{i=1,...,n} T_i$.

Footnote: We follow Koenker (2004) in treating the $\alpha_i$ as fixed parameters. An alternative approach which leads to equivalent results is to treat the $\alpha_i$ as random (with no restrictions placed on the dependence with $X_{it}$). In this case the model can be written as $Q_{Y_{it}|X_{it},\alpha_i(\tau)}(\tau) = \beta(\tau)^\top x + \alpha_i(\tau)$; here $Q_{Y_{it}|X_{it},\alpha_i(\tau)}$ denotes the conditional quantile function of $Y_{it}$ given $(X_{it}, \alpha_i(\tau))$ (see for instance Kato, Galvao, and Montes-Rojas (2012), Galvao and Wang (2015) and Galvao and Kato (2016) for this interpretation). Both interpretations lead to the same asymptotic results.

Footnote: As pointed out by a Referee, one could also consider combining the objective functions corresponding to several quantiles as was done in Koenker (2004) and force all coefficients $\alpha_i$ to be independent of $\tau$. This would result in efficiency gains if all $\alpha_i$ are purely location-shift effects but can introduce bias otherwise. We leave this extension for future research.

To achieve this, we consider penalized estimators of the form

$$(\hat\alpha_1, ..., \hat\alpha_n, \hat\beta) := \arg\min_{\alpha_1,...,\alpha_n,\beta} \Theta(\alpha_1, ..., \alpha_n, \beta) \tag{1}$$

where

$$\Theta(\alpha_1, ..., \alpha_n, \beta) := \sum_{i,t} \rho_\tau(Y_{it} - X_{it}^\top\beta - \alpha_i) + \sum_{i \neq j} \lambda_{ij}|\alpha_i - \alpha_j|.$$

Here $\rho_\tau$ denotes the usual 'check function', $\rho_\tau(u) = u(\tau - I\{u < 0\})$, and the weights $\lambda_{i,j}$ are allowed to depend on $n, T$ and the data; one particular choice is discussed below. The form of the penalty is motivated by the work of Hocking, Vert, Bach, and Joulin (2011). Intuitively, large values of $\lambda_{ij}$ will push different coefficients closer together and result in clustered structure of the estimators $\hat\alpha_i$. High-level conditions on the weights $\lambda_{i,j}$ which guarantee consistency of the resulting grouping procedure are provided in Theorem 3.1.

There are various possible choices for the penalty parameters $\lambda_{i,j}$. We propose to use weights of the form

$$\check\lambda_{ij} := \lambda|\check\alpha_i - \check\alpha_j|^{-2} \tag{2}$$

where $(\check\alpha_1, ..., \check\alpha_n)$ are the fixed effects quantile regression estimators

$$(\check\alpha_1, ..., \check\alpha_n, \check\beta) := \arg\min_{\alpha_1,...,\alpha_n,\beta} \sum_{i,t} \rho_\tau(Y_{it} - X_{it}^\top\beta - \alpha_i) \tag{3}$$

of Kato, Galvao, and Montes-Rojas (2012) and $\lambda$ is a tuning parameter. This form of weighting by preliminary estimators is motivated by the work of Zou (2006) on the adaptive lasso. Intuitively, weighting by preliminary estimated distances tends to give smaller penalties to coefficients from different groups, thus reducing some of the bias that is typically present in the classical lasso.

Given the developments above, it remains to find a value for the tuning parameter $\lambda$.
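To make the structure of the penalized objective in (1) and the adaptive weights in (2) concrete, the following minimal Python sketch evaluates both for given parameter values. It is an illustration under stated assumptions, not the authors' implementation: the function names, the nested-list data layout and the `power` argument (the exponent applied to the preliminary distances, defaulting to 2) are hypothetical choices made for this example, and the sketch only evaluates the objective rather than minimizing it.

```python
from itertools import combinations


def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def adaptive_weights(alpha_prelim, lam, power=2.0):
    """Weights lam * |a_i - a_j|^(-power) built from preliminary estimates.

    `power` is an illustrative parameter (assumption); identical preliminary
    estimates receive an infinite weight.
    """
    weights = {}
    for i, j in combinations(range(len(alpha_prelim)), 2):
        gap = abs(alpha_prelim[i] - alpha_prelim[j])
        weights[(i, j)] = lam / gap ** power if gap > 0 else float("inf")
    return weights


def penalized_objective(y, x, alpha, beta, tau, weights):
    """Quantile loss plus pairwise fusion penalty, as in objective (1).

    y[i][t] is a scalar response and x[i][t] a list of covariates
    (hypothetical layout). `weights` maps unordered pairs (i, j), i < j,
    to lambda_ij.
    """
    fit = 0.0
    for i, (y_i, x_i) in enumerate(zip(y, x)):
        for y_it, x_it in zip(y_i, x_i):
            xb = sum(b * v for b, v in zip(beta, x_it))
            fit += check_loss(y_it - xb - alpha[i], tau)
    penalty = 0.0
    for (i, j), w in weights.items():
        diff = abs(alpha[i] - alpha[j])
        if diff > 0:
            # The sum over i != j counts every unordered pair twice.
            penalty += 2.0 * w * diff
    return fit + penalty


# Tiny usage example with two individuals and two time periods.
y = [[1.0, 2.0], [3.0, 5.0]]
x = [[[1.0], [0.0]], [[1.0], [1.0]]]
value = penalized_objective(y, x, alpha=[0.5, 2.5], beta=[1.0], tau=0.5,
                            weights={(0, 1): 0.25})
```

Pairs whose preliminary estimates are far apart receive small weights, so their fitted effects are barely pulled together, while pairs with close preliminary estimates are penalized heavily and tend to be fused.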
The high-level results in Theorem 3.1 together with findings in Kato, Galvao, and Montes-Rojas (2012) provide a theoretical range for those values (see the discussion following Theorem 3.1 for additional details), but this range is not directly useful in practice since only rates and not constants are provided. Moreover, despite the fact that the weights $\check\lambda_{ij} := \lambda|\check\alpha_i - \check\alpha_j|^{-2}$ lead to asymptotically unbiased estimates, bias can still be a problem in finite samples. A typical approach in the literature to reduce bias which results from lasso-type penalties is to view the lasso problem solution as a candidate model (in our case, a candidate grouping of $\alpha_i$) and re-fit based on this candidate model (see Belloni and Chernozhukov (2009) or Su, Shi, and Phillips (2016) among many others).

To deal with bias issues and the choice of $\lambda$ in practice, we propose to combine the re-fitting idea with a simple information criterion which will simultaneously reduce the bias problem and provide a simple way to select a final model. A formal description of our approach is given in Algorithm 1.

Theorem 3.2 provides a formal justification of Algorithm 1 under high-level conditions on $\hat C, p_{n,T}$. In particular we prove that the group structure is estimated consistently with probability tending to one and derive the asymptotic distribution of the resulting estimators $\hat\alpha^{IC}_i, \hat\beta^{IC}$. In order to make the proposed estimation procedure fully data-driven, we need to specify a choice for the tuning parameters $\hat C$ and $p_{n,T}$.
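The candidate-model view can be sketched in a few lines of Python. The sketch below assumes that the penalized fits and the re-fitted losses for each grid value have already been computed by some quantile regression routine; it only shows how a candidate grouping is read off the fitted effects and how an information criterion of the form used in Algorithm 1 (re-fitted loss plus a penalty proportional to the number of groups) picks one grid value. The function names and the numerical tolerance `tol` are hypothetical and introduced purely for illustration.

```python
def groups_from_alpha(alpha_hat, tol=1e-8):
    """Read a candidate grouping off the fitted individual effects.

    Individuals whose fitted effects coincide (up to `tol`, an assumed
    numerical tolerance) are placed in the same group; groups are ordered
    by the value of the fitted effect.
    """
    order = sorted(range(len(alpha_hat)), key=lambda i: alpha_hat[i])
    groups = []
    for i in order:
        if groups and abs(alpha_hat[i] - alpha_hat[groups[-1][-1]]) <= tol:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups


def select_by_ic(refit_losses, n_groups, c_hat, p_nt):
    """Pick the grid index minimizing refit loss + c_hat * K * p_nt.

    refit_losses[l] is the re-fitted check loss for grid value l and
    n_groups[l] the corresponding number of groups; c_hat and p_nt are
    the tuning constants of the criterion, supplied by the caller.
    """
    ic = [loss + c_hat * k * p_nt for loss, k in zip(refit_losses, n_groups)]
    best = min(range(len(ic)), key=lambda l: ic[l])
    return best, ic


# Usage: five individuals, three candidate models with 1, 2 and 3 groups.
candidate = groups_from_alpha([1.0, 3.0, 1.0, 3.0, 2.0])
best, ic_values = select_by_ic([100.0, 90.0, 89.0], [1, 2, 3],
                               c_hat=1.0, p_nt=5.0)
```

In this toy example the model with two groups is selected: moving from two to three groups lowers the re-fitted loss by less than the additional penalty.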
In our simulations, we found that the following choices lead to good results:

$$p_{n,T} = nT^{1/4}/10, \qquad \hat C := \tau(1-\tau)\hat s(\tau) \tag{4}$$

with

$$\hat s(\tau) := \big(\hat F^{-1}(\tau + h_{n,T}) - \hat F^{-1}(\tau - h_{n,T})\big)/(2h_{n,T})$$

where $\hat F(y) := \frac{1}{nT}\sum_{i,t} I\{Y_{it} - X_{it}^\top\check\beta - \check\alpha_i \le y\}$ denotes the empirical cdf of the regression residuals from the fixed effects quantile regression estimator given in (3), $\hat F^{-1}$ denotes the corresponding empirical quantile function, and $h_{n,T} \to 0$ is a bandwidth parameter.

Footnote: As pointed out by a Referee, an alternative approach to obtain preliminary estimators for $\alpha_i$ would be to run separate quantile regressions for each individual. This did not improve the performance of our procedure in the simulations that we tried.

Footnote: The exact constant $1/10$ in the factor $p_{n,T}$ does not matter asymptotically. The value $1/10$ was found to work well for a wide range of values of $n, T$ and for various models; details are provided in the Monte Carlo section 5.2. There we also show that the impact of the precise form of the factor in $p_{n,T}$ becomes less pronounced as $T$ increases.

input: Data $(X_{it}, Y_{it})$, grid of values $\lambda_1, ..., \lambda_L$, quantile level of interest $\tau$
output: Estimated number of groups $\hat K^{IC}$, estimated group membership $\hat I^{IC}_1, ..., \hat I^{IC}_{\hat K}$, estimated coefficients $\hat\alpha^{IC}_k, \hat\beta^{IC}$

for $i \leftarrow 1$ to $n$ do
    compute $\check\alpha_i$ given in (3)
end
for $\ell \leftarrow 1$ to $L$ do
    Compute
    $$(\hat\alpha_{1,\ell}, ..., \hat\alpha_{n,\ell}, \hat\beta_\ell) := \arg\min_{(\alpha_1,...,\alpha_n,\beta)} \Big\{ \sum_{i,t} \rho_\tau(Y_{it} - X_{it}^\top\beta - \alpha_i) + \lambda_\ell \sum_{i \neq j} \frac{|\alpha_i - \alpha_j|}{|\check\alpha_i - \check\alpha_j|^{2}} \Big\}$$
    Let $\hat\alpha_{(1,\ell)} < ... < \hat\alpha_{(K_\ell,\ell)}$ denote the unique values of $\hat\alpha_{1,\ell}, ..., \hat\alpha_{n,\ell}$, and define $\hat I_{j,\ell} := \{i : \hat\alpha_{i,\ell} = \hat\alpha_{(j,\ell)}\}$ as the estimated groups.
    Compute re-fitted estimators
    $$(\tilde\alpha_{1,\ell}, ..., \tilde\alpha_{K_\ell,\ell}, \tilde\beta_\ell) := \arg\min_{(\alpha_1,...,\alpha_{K_\ell},\beta)} \sum_{k=1}^{K_\ell} \sum_{i \in \hat I_{k,\ell}} \sum_t \rho_\tau(Y_{it} - X_{it}^\top\beta - \alpha_k).$$
    Compute the IC criterion
    $$IC(\ell) := \sum_{k=1}^{K_\ell} \sum_{i \in \hat I_{k,\ell}} \sum_t \rho_\tau(Y_{it} - X_{it}^\top\tilde\beta_\ell - \tilde\alpha_{k,\ell}) + \hat C K_\ell\, p_{n,T},$$
    where the choice of $\hat C$ and $p_{n,T}$ is given in (4).
end
Set $\hat\ell^{IC} := \arg\min_{\ell=1,...,L} IC(\ell)$ and denote by $\hat K^{IC} := K_{\hat\ell^{IC}}$ the corresponding number of groups. Set $\hat I^{IC}_k := \hat I_{k,\hat\ell^{IC}}$, $\hat\alpha^{IC}_k := \tilde\alpha_{k,\hat\ell^{IC}}$, $\hat\beta^{IC} := \tilde\beta_{\hat\ell^{IC}}$.

Algorithm 1: Grouping via IC criterion

To motivate this particular choice of constant $\hat C$, observe the following expansion, which is derived in detail in the proof of Theorem 3.2:

$$\sum_{i,t} \rho_\tau(Y_{it} - X_{it}^\top\check\beta - \check\alpha_i) - \rho_\tau(Y_{it} - X_{it}^\top\beta - \alpha_i) = -\sum_i \frac{\tau(1-\tau)}{2E[f_{Y_{i1}|X_{i1}}(q_{i,\tau}(X_{i1}) \mid X_{i1})]} + o_P(n).$$

This shows that plugging in the estimated (by fixed effects quantile regression) instead of true errors underestimates the objective function evaluated at the residuals by roughly the first term on the right-hand side in the above expression. This term needs to be dominated by the penalty if we want to avoid selecting models that are too large, and so it is natural to scale the penalty by a constant which is proportional to $\sum_i 1/E[f_{Y_{i1}|X_{i1}}(q_{i,\tau}(X_{i1}) \mid X_{i1})]$ in order to ensure reasonable performance across different data generating processes. Under the simplifying assumption that $f_{Y_{i1}|X_{i1}}(q_{i,\tau}(X_{i1}) \mid X_{i1}) =: f_\varepsilon(0)$ does not depend on $i, X_{i1}$, this term equals $n/f_\varepsilon(0)$, and under the same assumptions $\hat s$ provides a consistent estimator of $1/f_\varepsilon(0)$, see Koenker (2005). Note that the sparsity term introduced here plays a similar role as the noise variance in classical information criteria such as AIC and BIC in least squares regression.

3. Theoretical analysis
In this section we provide a theoretical analysis of the methodology proposed in Section 2. We begin by stating an assumption on the true (but unknown) underlying group structure.

(C) For each quantile $\tau$ of interest, there exists a fixed number $K_\tau$, values $\alpha_{(01)}(\tau) < ... < \alpha_{(0K_\tau)}(\tau)$ and disjoint sets $I_1(\tau), ..., I_{K_\tau}(\tau)$ with $\cup_k I_k(\tau) = \{1, ..., n\}$, $|I_k(\tau)|/n \to \mu_k(\tau) \in (0,1)$ and $\alpha_i(\tau) = \alpha_j(\tau) = \alpha_{(0k)}(\tau)$ for $i, j \in I_k(\tau)$. There exists $\varepsilon > 0$ with $\min_{k=1,...,K_\tau - 1} |\alpha_{(0k)}(\tau) - \alpha_{(0k+1)}(\tau)| \ge \varepsilon$.

Assumption (C) implies that the individual fixed effects are grouped into $K$ distinct groups and that the group centers are separated. Note that the number of groups as well as group membership is allowed to differ across quantiles. For the sake of a concise notation, the dependence of the number of groups and group centers on $\tau$ will from now on be dropped unless there is risk of confusion. Note also that we require the number of groups to be fixed (i.e. independent of $n, T$ and non-random) and exogenous, i.e. independent of the covariates $X_{it}$.

Next we collect some technical assumptions on the data generating process. Define $Z_{it}^\top = (1, X_{it}^\top)$ and let $\mathcal{Z}$ denote the support of $Z_{it}$.

(A1) Assume that $\sup_i \|Z_{it}\| \le M < \infty$ a.s. and that $c_\lambda \le \inf_i \lambda_{\min}(E[Z_{it}Z_{it}^\top]) \le \sup_i \lambda_{\max}(E[Z_{it}Z_{it}^\top]) \le C_\lambda$ for some fixed constants $c_\lambda > 0$ and $C_\lambda < \infty$.

(A2) The conditional distribution functions $F_{Y_{i1}|Z_{i1}}(y \mid z)$ are twice differentiable w.r.t. $y$, with the corresponding derivatives $f_{Y_{i1}|Z_{i1}}(y \mid z)$ and $f'_{Y_{i1}|Z_{i1}}(y \mid z)$. Assume that
$$f_{\max} := \sup_i \sup_{y \in \mathbb{R}, z \in \mathcal{Z}} |f_{Y_{i1}|Z_{i1}}(y \mid z)| < \infty, \qquad \overline{f'} := \sup_i \sup_{y \in \mathbb{R}, z \in \mathcal{Z}} |f'_{Y_{i1}|Z_{i1}}(y \mid z)| < \infty.$$

(A3) Denote by $\mathcal{T}$ an open neighbourhood of $\tau$. Assume that there exists a constant $f_{\min} \le f_{\max}$ such that
$$0 < f_{\min} \le \inf_i \inf_{\eta \in \mathcal{T}} \inf_{z \in \mathcal{Z}} f_{Y_{i1}|Z_{i1}}(q_{i,\eta}(z) \mid z).$$
Assumptions (A1)-(A3) are fairly standard and routinely imposed in the quantile regression literature. Similar assumptions have been made, for instance in Kato, Galvao, and Montes-Rojas (2012) [see assumptions (B1)-(B3) in that paper].

3.1. Analysis of the estimators in (1).

To state our first main result define
$$\Lambda_D := \sup_{i \in I_k, j \in I_{k'}, k \neq k'} \lambda_{i,j}, \qquad \Lambda_S := \inf_k \inf_{i,j \in I_k} \lambda_{i,j}.$$
In words, $\Lambda_D$ corresponds to the largest penalty corresponding to the difference between two individual effects from different groups while $\Lambda_S$ describes the smallest penalty between two effects from the same group. Our first result provides high-level conditions on $\Lambda_S, \Lambda_D$ that guarantee asymptotically correct grouping.

Theorem 3.1.
Let assumptions (A1)-(A3), (C) hold and assume that $\min(n, T) \to \infty$, $\log n = o(T)$ and
$$\frac{\Lambda_D}{\Lambda_S} = o_P(1), \qquad n\Lambda_D = o_P(T^{1/2}), \qquad \frac{T^{1/2}(\log n)^{1/2}}{n\Lambda_S} = o_P(1). \tag{5}$$
Denote the ordered unique values of $\hat\alpha_1, ..., \hat\alpha_n$ by $\hat\alpha_{(1)} < ... < \hat\alpha_{(\hat K)}$ (i.e. $\hat K$ denotes the number of distinct values taken by $\hat\alpha_1, ..., \hat\alpha_n$, which we interpret as the estimated number of groups) and define the sets $\hat I_k := \{i : \hat\alpha_i = \hat\alpha_{(k)}\}$, $k = 1, ..., \hat K$. Then
$$P\big(\hat K = K, \hat I_k = I_k, k = 1, ..., K\big) \to 1.$$

Next we discuss the implications of this general result for the specific choice $\check\lambda_{i,j}$ given in (2). Define
$$\check\Lambda_D := \sup_{i \in I_k, j \in I_{k'}, k \neq k'} \check\lambda_{i,j}, \qquad \check\Lambda_S := \inf_k \inf_{i,j \in I_k} \check\lambda_{i,j}.$$
From Kato, Galvao, and Montes-Rojas (2012) we obtain the bound
$$\sup_{i=1,...,n} |\check\alpha_i - \alpha_i| = O_P\big((\log n)^{1/2}/T^{1/2}\big).$$

Footnote: More precisely, from the paragraph following equation (A.14) in the latter paper; note that this result is derived under the assumption that $n$ grows at most polynomially with $T$.

Now if $i, j \in I_k$ then $\alpha_i = \alpha_j$ and thus
$$1/\check\Lambda_S = \Big\{\inf_k \inf_{i,j \in I_k} |\check\alpha_i - \check\alpha_j|^{-2}\lambda\Big\}^{-1} = \lambda^{-1} \sup_k \sup_{i,j \in I_k} |\check\alpha_i - \check\alpha_j|^{2} = O_P\Big(\frac{\log n}{\lambda T}\Big).$$
Moreover, under (C) we have $\inf_{k \neq k'} \inf_{i \in I_k, j \in I_{k'}} |\alpha_i - \alpha_j| \ge \varepsilon > 0$ and thus
$$\check\Lambda_D \le \lambda \Big\{\inf_{k \neq k'} \inf_{i \in I_k, j \in I_{k'}} |\check\alpha_i - \check\alpha_j|\Big\}^{-2} \le \lambda/(\varepsilon - o_P(1))^{2} = O_P(\lambda).$$
Given this choice of weights, the condition $\check\Lambda_D/\check\Lambda_S = o_P(1)$ is satisfied provided that $T/\log n \to \infty$. The other conditions in (5) take the form
$$T^{1/2} \gg n\lambda \gg T^{-1/2}(\log n)^{3/2}.$$
Assuming that $(\log n)^{3/2} = o(T)$, this provides a range of possible values for $\lambda$ which will ensure that (5) holds.

3.2. Analysis of the information criterion in Algorithm 1.
In this section we provide theoretical guarantees for the performance of the information criterion based estimators $\hat\beta^{IC}, \hat\alpha^{IC}_k, \hat I^{IC}_k, \hat K^{IC}$. Our main result shows that, under fairly general conditions on the penalty parameter $p_{n,T}$, the IC procedure selects the correct number of groups with probability tending to one. Moreover, the estimators $(\hat\alpha^{IC}_1, ..., \hat\alpha^{IC}_{\hat K^{IC}}, \hat\beta^{IC})$ are shown to enjoy the 'oracle property', i.e. they have the same asymptotic distribution as estimators which are based on the true (but unknown) grouping of individuals. Before making this statement more formal, we need some additional notation. Let
$$(\hat\alpha^{(OR)}_{(1)}, ..., \hat\alpha^{(OR)}_{(K)}, \hat\beta^{(OR)}) := \arg\min_{(\alpha_1,...,\alpha_K,\beta)} \sum_k \sum_{i \in I_k} \sum_t \rho_\tau(Y_{it} - X_{it}^\top\beta - \alpha_k) \tag{6}$$
denote the infeasible 'oracle' which uses the true group membership. The asymptotic variance of the oracle estimator is conveniently expressed in terms of the following two limits which we assume to exist:
$$\Sigma_{1,\tau} := \tau(1-\tau) \lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{K} \sum_{i \in I_k} E[\tilde Z_{ik}\tilde Z_{ik}^\top], \qquad \Sigma_{2,\tau} := \lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{K} \sum_{i \in I_k} E[\tilde Z_{ik}\tilde Z_{ik}^\top f_{Y_{i1}|X_{i1}}(q_{i,\tau}(X_{i1}) \mid X_{i1})],$$
where $\tilde Z_{ik} := (e_k^\top, X_{i1}^\top)^\top$ and $e_k$ denotes the $k$'th unit vector in $\mathbb{R}^K$. Additionally, we need the following condition on the grid $\lambda_1, ..., \lambda_L$.

(G) For each $(n, T)$, denote the grid values by $\lambda_{1,n,T}, ..., \lambda_{L,n,T}$ where $L$ can depend on $n, T$. There exists a sequence $j_n$ such that $T^{-1/2}(\log n)^{3/2} \ll n\lambda_{j_n} \ll T^{1/2}$.

Assumption (G) is fairly mild. It only requires that among the candidate values for $\lambda$ there exists one value so that $\check\lambda_{i,j}$ satisfies the assumptions of Theorem 3.1. In practice, we recommend choosing a grid of values that results in sufficiently many different numbers of groups.

Theorem 3.2.
Let assumptions (A1)-(A3), (C), (G) hold and assume that $\min(n, T) \to \infty$ and $n$ grows at most polynomially in $T$ (i.e. $n = O(T^b)$ for some $b < \infty$) and $(\log T)(\log n)/T \to 0$. Assume that there exists $\varepsilon > 0$ such that $\hat C > \varepsilon$ with probability tending to one and that
$$nT \gg p_{n,T} \gg n, \qquad \hat C = O_P(1).$$
Then $P(\hat K^{IC} = K) \to 1$ and
$$\sqrt{nT}\Big((\hat\alpha^{IC}_1, ..., \hat\alpha^{IC}_{\hat K^{IC}}, (\hat\beta^{IC})^\top) - (\alpha_{(01)}, ..., \alpha_{(0K)}, \beta^\top)\Big) \xrightarrow{D} N(0, \Sigma_{2,\tau}^{-1}\Sigma_{1,\tau}\Sigma_{2,\tau}^{-1}),$$
$$\sqrt{nT}\Big((\hat\alpha^{(OR)}_{(1)}, ..., \hat\alpha^{(OR)}_{(K)}, (\hat\beta^{(OR)})^\top) - (\alpha_{(01)}, ..., \alpha_{(0K)}, \beta^\top)\Big) \xrightarrow{D} N(0, \Sigma_{2,\tau}^{-1}\Sigma_{1,\tau}\Sigma_{2,\tau}^{-1}).$$

Remark 3.3.
Theorem 3.2 and Theorem 3.1 hold point-wise in the parameter space, and we expect that deriving a similar result uniformly in the parameter space (in particular, if cluster centers are allowed to depend on $n, T$ and if their separation is lost, see Leeb and Pötscher (2008) for such findings in the context of classical lasso penalized regression) is impossible. It is a well established fact in the Statistics and Econometrics literature that inference which is based on such 'point-wise' asymptotic results can be unreliable. Recently, several approaches to alleviate this problem and achieve uniformly valid post-regularization inference have been proposed (see, among others, Belloni, Chernozhukov, and Kato (2014), Lockhart, Taylor, Tibshirani, and Tibshirani (2014) and van de Geer, Bühlmann, Ritov, and Dezeure (2014)). Applying similar ideas to the present setting is a very important question which we leave for future research.

Footnote: Appropriate forms of asymptotic normality of the oracle and IC estimators continue to hold without assuming that the limits $\Sigma_{1,\tau}, \Sigma_{2,\tau}$ exist. This assumption is made for notational convenience.

Footnote: Strictly speaking, $\hat\alpha^{IC}_k$ is not defined for all $k \le K$ if $\hat K^{IC} < K$. Since the probability of this event tends to zero, we can simply define $\hat\alpha^{IC}_k = 0$ for $\hat K^{IC} < k \le K$.

4. Details on the optimization problem in Algorithm 1
To implement the proposed quantile panel data regression with group fixed effects, we need to solve the optimization problem stated in (1). A natural normalization of the objective and the penalty function leads to
$$\min_{\alpha_1,...,\alpha_n,\beta} \frac{1}{nT} \sum_{i,t} \rho_\tau(Y_{it} - X_{it}^\top\beta - \alpha_i) + \frac{\tilde\lambda}{n(n-1)} \sum_{i \neq j} \frac{|\alpha_i - \alpha_j|}{|\check\alpha_i - \check\alpha_j|^{2}}. \tag{7}$$
This is equivalent to the objective (1) except that $\tilde\lambda$ is adjusted according to $n$ and $T$ so that we can use a generic grid for $\tilde\lambda$ rather than letting the grid support change with $(n, T)$. In practice, the grid support of $\tilde\lambda \in \{0, \tilde\lambda_1, ..., \tilde\lambda_\ell, ..., \tilde\lambda_L\}$ is chosen such that the number of distinct values of the solution $\{\hat\alpha_{1,\ell}, ..., \hat\alpha_{n,\ell}\}$ for $\ell = 1, ..., L$ takes all possible integer values in the set $\{1, 2, ..., n\}$. This is always achievable as long as the grid width of $\tilde\lambda$ is small enough. Since for each fixed $\tilde\lambda_\ell$, (7) is a linear programming problem which can be solved efficiently by any reliable solver, this is not computationally expensive.

To make this section self contained, we provide some details of the primal problem stated in (7) and its corresponding dual problem. Define $\lambda_{ij} := |\check\alpha_i - \check\alpha_j|^{-2}$ and observe that we can re-write
$$|\alpha_i - \alpha_j| = 2\Big(\tfrac{1}{2} - I\{0 \le \alpha_i - \alpha_j\}\Big)\Big(-(\alpha_i - \alpha_j)\Big).$$
With this notation (7) can be equivalently expressed as follows:
$$\min_{u, v, w_1, w_2, \alpha, \beta} \frac{\tau}{nT} \sum_{i,t} u_{it} + \frac{1-\tau}{nT} \sum_{i,t} v_{it} + \frac{4\tilde\lambda}{n(n-1)} \Big(\frac{1}{2} \sum_{j=1}^{n(n-1)/2} w_{1j} + \frac{1}{2} \sum_{j=1}^{n(n-1)/2} w_{2j}\Big)$$
subject to
$$u_{it} = \max\{Y_{it} - X_{it}^\top\beta - \alpha_i, 0\}$$
$$v_{it} = \max\{X_{it}^\top\beta + \alpha_i - Y_{it}, 0\}$$
$$w_{1j} = \max\{-\theta_j, 0\}$$
$$w_{2j} = \max\{\theta_j, 0\}$$
$$Y_{it} = u_{it} - v_{it} + \alpha_i + X_{it}^\top\beta$$
$$0 = w_{1j} - w_{2j} + \theta_j$$
with $\theta$ being a vector of length $n(n-1)/2$ that consists of the entries $(\alpha_i - \alpha_j)\lambda_{ij}$ for $i < j$. We can represent $\theta$ as $A\alpha$ where $A$ is an $n(n-1)/2 \times n$ matrix taking the form
$$A = \begin{pmatrix}
\lambda_{12} & -\lambda_{12} & 0 & \cdots & 0 \\
\lambda_{13} & 0 & -\lambda_{13} & \cdots & 0 \\
\vdots & & & \ddots & \vdots \\
\lambda_{1n} & 0 & \cdots & 0 & -\lambda_{1n} \\
0 & \lambda_{23} & -\lambda_{23} & \cdots & 0 \\
\vdots & & & \ddots & \vdots \\
0 & \cdots & 0 & \lambda_{n-1,n} & -\lambda_{n-1,n}
\end{pmatrix}.$$

The corresponding dual problem of (7) can be stated as:
$$\max_{a_1, a_2} a_1^\top Y$$
subject to
$$X^\top a_1 = (1-\tau) X^\top 1_{nT}$$
$$Z^\top a_1 + \frac{4nT\tilde\lambda}{n(n-1)} A^\top a_2 = (1-\tau) Z^\top 1_{nT} + \frac{2nT\tilde\lambda}{n(n-1)} A^\top 1_{n(n-1)/2}$$
with $Z$ being the incidence matrix that identifies the $n$ individuals. The solutions for $\alpha$ and $\beta$ in the primal problem are then obtained as the dual solution of the dual problem. We implement the dual problem using the Mosek optimization software of Andersen (2010) through the R interface Rmosek of Friberg (2012). We have also implemented the estimation procedure using the quantreg package in R and the code will be made available for public use.

5. Monte Carlo Simulations
Finite Sample Performance of the Proposed Estimator.
To assess the finite sample performance of the proposed convex clustering panel quantile regression estimator, we apply the method to simulated data sets. In particular, we consider data generated from two models and two error distributions for a range of $n$ and $T$. The responses, $Y_{it}$, are generated by either a location shift model
$$Y_{it} = \alpha_i + X_{it}\beta + u_{it} \tag{8}$$
or a location scale shift model
$$Y_{it} = \alpha_i + X_{it}\beta + (1 + X_{it}\gamma) u_{it} \tag{9}$$
where the individual latent effects $\alpha_i$ are generated from three groups taking values { , , } with equal proportions. The covariate $X_{it}$ is generated such that it has a non-zero interclass correlation coefficient. In particular, $X_{it} = \rho\alpha_i + \gamma_i + v_{it}$ with $\gamma_i$ and $v_{it}$ independent and identically distributed over $i$ and $(i,t)$, respectively. We conduct simulation experiments with $\rho \in$ { , . } to investigate both cases where the fixed effect is independent of or correlated with the covariate. The true parameters are $\beta = 1$ and $\gamma = 1/10$. The error terms $u_{it}$ are i.i.d. following either a standard normal distribution or a Student t distribution with three degrees of freedom. Results reported are based on 2000 repetitions.

Footnote: Mosek is a commercial state-of-the-art convex optimization solver that provides a free academic license. We use its interior point algorithm to solve our linear programming problem.

Footnote: The estimation procedure implemented using the quantreg package calls the sfn method, which uses the Frisch-Newton algorithm and exploits the sparse algebra to compute iterates.

We first investigate the performance of using the information criterion for estimating the number of groups. Table 1 and Table 2 report the proportion of estimated number of groups under the two models for different combinations of $n$ and $T$ for $\tau = 0.5$ and $\tau = 0.75$. We use an equally spaced grid of $\lambda$ values with width 1/200 and support [0, . ], so that the estimated number of groups along these $\lambda$ values covers the integers in the range $[1, n]$. For the IC criteria, the sparsity function $\hat s(\tau)$ is estimated with bandwidth chosen based on the Hall and Sheather (1988) rule implemented in the quantreg package (see discussion in Koenker (2005)).

The results suggest that the probability of getting the correct number of groups for $\tau = 0.5$ is higher than for $\tau = 0.75$. When $T \geq 30$, the estimates for the number of groups for both error distributions and for different $n$ are mostly satisfactory. The performance for t errors deteriorates compared to that with normal errors, especially for higher quantiles. For $T = 15$ and quantiles other than the median the proposed method should be used with caution. Including correlation between individual effects and predictors does not lead to dramatic changes in the accuracy for estimating the number of groups and group membership.

Table 3 and Table 4 summarize the finite sample properties of $\hat\beta^{IC}(\tau)$ for $\tau = 0.5$ and $\tau = 0.75$, respectively, and compare with the QRFE estimator where no penalization on the individual fixed effect is used (i.e. $\lambda = 0$).

The standard errors used for constructing confidence intervals (nominal coverage 95%) are based on the nid option with Hall-Sheather bandwidth in the package quantreg. Results based on the Bofinger bandwidth selection are similar and not reported here. For the QRFE, the Bofinger bandwidth rule was used since the Hall-Sheather rule resulted in substantial under-coverage with $T = 30$ for some of the models.

When covariates and fixed effects are independent, the RMSE of the PQR-FEgroup estimator, $\hat\beta(\tau)$, is smaller than that of the QRFE for all settings considered. This shows that our penalization gains efficiency for estimating $\beta$ when there is group structure in the fixed effects. The results do not change much from normal errors to t errors and from the median to higher quantiles.

Introducing correlation between predictors and group membership leads to a bias for the grouped effect estimator, while the fixed effects estimator does not suffer from additional bias. This bias can be quite noticeable for small values of $T$, especially at the 75% quantile. The bias becomes negligible as $T$ increases, so there is no contradiction to our asymptotic theory. An intuitive explanation for this behaviour is that for smaller $T$ it is difficult to get a perfect grouping, and a wrong grouping leads to bias since there is dependence between predictors and group structure.

Footnote: Koenker (2004) used a similar data generating process with $\rho = 0$ for $X$ and pointed out that the interclass correlation induced by $\gamma_i$ is crucial for the penalized quantile regression fixed effect estimator to outperform the unpenalized QRFE estimator.

Last, we report in Table 5 and Table 6 the proportion of perfect classification of individual effects and the average value of the percentage of correct classification together with their standard errors.
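The two classification measures just mentioned can be illustrated with a minimal sketch. It is not the authors' code: the function names `correct_classification_rate` and `summarize` are hypothetical, group labels are treated as arbitrary (so the estimated labels are matched to the true ones by trying every relabeling, which is only practical for a small number of groups), and both label vectors are assumed to contain the same number of distinct groups.

```python
from itertools import permutations


def correct_classification_rate(true_labels, est_labels):
    """Share of individuals assigned to the right group, maximised over relabelings.

    Assumption of this sketch: both vectors use the same number of distinct groups.
    """
    true_groups = sorted(set(true_labels))
    est_groups = sorted(set(est_labels))
    if len(true_groups) != len(est_groups):
        raise ValueError("comparison requires the same number of groups")
    best_hits = 0
    for perm in permutations(true_groups):
        # Map each estimated label to a candidate true label.
        mapping = dict(zip(est_groups, perm))
        hits = sum(mapping[e] == t for t, e in zip(true_labels, est_labels))
        best_hits = max(best_hits, hits)
    return best_hits / len(true_labels)


def summarize(replications):
    """Return (proportion of perfect classification, average correct rate).

    `replications` is a list of (true_labels, est_labels) pairs, one per repetition.
    """
    rates = [correct_classification_rate(t, e) for t, e in replications]
    perfect = sum(r == 1.0 for r in rates) / len(rates)
    average = sum(rates) / len(rates)
    return perfect, average
```

For instance, `summarize` applied to one repetition with a perfect grouping and one repetition with a single misclassified individual out of six separates the two measures: half of the repetitions are perfect, while the average correct classification rate stays above 90%.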
Since the comparison of the estimated membership and the true membership only makes sense when $\hat K = K$, the estimated memberships are based on $\lambda$ for which $\hat K = K$ (see Su, Shi, and Phillips (2016) for a similar approach). Results suggest that for $T \geq 30$ and $\tau = 0.5$, the group membership estimation is quite satisfactory. While the proportion of perfect matches is low even for $T = 30$, the average proportion of correct classification shows that those effects are typically due to very few misclassified individuals. For $T = 15$ perfect classification is almost impossible while average correct classification rates remain reasonable. Adding correlation between individual effects and covariates leads to a deterioration of the probability for achieving a perfect grouping for location-scale models, especially at higher quantiles, but does not have a strong impact on other results. Overall the simulations suggest that for small $T$, there is just not enough information available for each individual to hope for perfect classification.

5.2. Further analysis of tuning parameters in the IC Criteria.
The discussion at the end of Section 2 provides a motivation for the tuning parameters $\hat C$ in the IC criteria. Here we further investigate the impact of rescaling $p_{n,T}$ by different factors. For illustration we consider DGP1 in the previous section where data are generated based on the model (9). We use a grid of constants $c \in$ [0. , .3] with width 0.01 and plot the associated performance of the estimated number of groups, the RMSE of $\hat\beta^{IC}(\tau)$ and the coverage rate for $p_{n,T} = c\,nT^{1/4}$. Figure 1 and Figure 2 contain corresponding results for the location-scale shift model with t errors and $\tau = 0.5, 0.75$, respectively. For $T$ as small as 15, the performance is quite sensitive to the chosen constant. As predicted by the theory this dependence becomes somewhat less prominent as $T$ increases. Overall the choice $p_{n,T} = nT^{1/4}/10$ shows good performance for settings that we tried in this simulation. The patterns are similar for those with the normal error and the location-shift models and likewise for DGP2; results are reported in Figure 6 - Figure 11 in the Appendix for the sake of completeness.

6. Empirical Example
To further illustrate our proposed methodology, we revisit the empirical inquiry on the much-debated "More Guns Less Crime" hypothesis. Lott and Mustard (1997) provide the first empirical analysis which claims that the adoption of Right-to-Carry (RTC) laws, which allow local authorities to issue a concealed weapon permit to all applicants that are eligible, reduces crime. Ever since its publication, there has been much academic and political debate that challenges the findings. Ayres and Donohue (2003) show that the negative effect of RTC laws has no statistical significance under a more reasonable model specification and inference using both the state and county level data between 1977 - 1999 for 51 U.S. states. Their conclusion is echoed by the National Research Council (2004) (NRC) report which finds little reliable statistical support for the "More Guns Less Crime" hypothesis. Recently, Aneja, Donohue, and Zhang (2014) revisited the hypothesis using the updated panel data from 1977 - 2010. They correct several mistakes in the dataset used in earlier analysis and discuss the shortcoming of using county level crime data. We refer the readers to more details provided in Aneja, Donohue, and Zhang (2014). For our empirical analysis, we use their updated state level data kindly provided by the authors.

Footnote: The data and the detailed description of the data source can be downloaded from https://works.bepress.com/john_donohue/107/ .

[Figure 1: 3 x 3 grid of panels. Rows: proportion of correct groups, RMSE of beta, 95% coverage. Columns: T = 15, T = 30, T = 60. Horizontal axis: constant. Curves: n = 30, n = 60, n = 90.]

Figure 1. Use different constants in $p_{n,T}$ for the IC criteria for location-scale shift model with t error on DGP1: For an equally spaced grid on [0. , .3] with width 0.01, the three columns represent different magnitudes of $T$ while each figure in the row overlays the curves for $n \in \{30, 60, 90\}$ for various performance measures. The first row plots the proportion of correctly estimated number of groups. The second row plots the RMSE of $\hat\beta^{IC}(\tau)$ where $\tau = 0.5$ and the third plots the coverage rate for nominal size 5%.

[Figure 2: 3 x 3 grid of panels. Rows: proportion of correct groups, RMSE of beta, 95% coverage. Columns: T = 15, T = 30, T = 60. Horizontal axis: constant. Curves: n = 30, n = 60, n = 90.]

Figure 2. Use different constants in $p_{n,T}$ for the IC criteria for location-scale shift model with t error on DGP1: For an equally spaced grid on [0. , .3] with width 0.01, the three columns represent different magnitudes of $T$ while each figure in the row overlays the curves for $n \in \{30, 60, 90\}$ for various performance measures. The first row plots the proportion of correctly estimated number of groups. The second row plots the RMSE of $\hat\beta^{IC}(\tau)$ where $\tau = 0.75$ and the third plots the coverage rate for nominal size 5%. Results are based on 400 repetitions.

The main parameter of interest is the effect of the indicator of the RTC laws (denoted as lawind) on crime rates. In our analysis, we focus on the violent crime rate. A similar analysis is possible for other categories of offense. In addition to the law indicator, there is also information on the incarceration rate (sentenced prisoners per 100,000 residents; denoted as prisoner) in the states in the previous year, real per capita personal income (denoted as rpcpi) and other demographic variables on population proportions in different age-gender groups for various ethnicities. As argued in Ayres and Donohue (2003) and Aneja, Donohue, and Zhang (2014), to avoid multicollinearity and to account for the fact that 90% of violent crimes in the U.S. are committed by male offenders, we follow their specification and control for only the proportion of African-American males in the age groups 10-19, 20-29 and 30-39. Inevitably, there are many different model specifications that can be considered for this empirical inquiry and it is impossible to report all the results. Our goal is to emphasize the heterogeneous law adoption effect for states that have a low violent crime rate versus those with a high crime rate. We also provide evidence of clustering behavior of the state fixed effects, which are incorporated to capture states' unobserved heterogeneity, and show that by taking advantage of the dimension reduction of grouping the fixed effects, the parameters of interest on other control variables enjoy better statistical precision.

Our main model specification is
$$q_{i,\tau}(\log(\text{violent}_{it})) = \alpha_i(\tau) + \gamma_t(\tau) + \beta(\tau)\,\text{lawind}_{it} + \theta(\tau)^\top X_{it} \tag{10}$$
where the additional control variables $X_{it}$ include lagged incarceration rate (prisoners), real per capita personal income (rpcpi) and the three demographic variables (afam1019, afam2029, afam3039). We note that high incarceration rates in a given state may be a feedback towards rising violent crime, and therefore including those as control variables might lead to endogeneity issues.
As a robustness check, we also report the corresponding results in the appendix for model (10) without the incarceration rate as a control covariate. The effects stay mostly unchanged.

Figure 3 reports the panel data quantile regression estimates with state fixed effects for $\tau \in$ { . , . , . , . , . } and their associated point-wise confidence intervals. Similar to the findings in Aneja, Donohue, and Zhang (2014), we see that the RTC law adoption has a positive effect on the violent crime rate. In addition, this effect is significant for lower quantiles while there is no statistical significance for such an effect for states at higher quantiles of the violent crime rate. All other control variables have expected signs, with a negative effect of the lagged incarceration rate indicating that states with stricter laws have lower violent crime rates, although this effect is not very precisely estimated. Higher real per capita personal income has a negative effect on violent crime at lower quantiles; this effect is diminishing for higher quantiles of the violent crime rate. The proportion of African-American population in the age group 30 - 39 has a positive effect on violent crime and this effect becomes more prominent for higher crime rate states. Figure 5 shows the corresponding state fixed effect estimates for different quantile levels. There is some evidence of clustered behavior of these fixed effect estimates, although these are estimates with statistical errors. Using our proposed methodology, the estimated numbers of groups for the five quantile levels are respectively { , , , , }.

Figure 4 plots the corresponding panel data quantile regression estimates with the estimated optimal grouped state fixed effects for $\tau \in$ { . , . , . , . , . } and their associated point-wise confidence intervals.
The pattern of the quantile effects for all the control variables stays roughly the same compared to the fixed effect quantile regression estimates, while the variance of these estimates under the grouped fixed effects is noticeably smaller. This is similar to what we observe in the simulation section where the common parameters in quantile panel data regression with grouped fixed effects have lower variances than those of the estimates based on "individual heterogeneity". However, we would also like to point out that the standard errors are based on refitted models with estimated group structure, which does not account for the uncertainty of model selection on the fixed effects and should be interpreted with caution. A proper inference method based on the proposed methodology accounting for such uncertainty is important and is part of our future research agenda.

[Figure 3: six panels, one per coefficient (lawind, prisoners, rpcpi, afam1019, afam2029, afam3039). Horizontal axis: taus.]

Figure 3. Panel data quantile regression estimates with state fixed effects for various $\tau$ based on model specification (10): For $\tau \in$ { . , . , . , . , . }, the solid black points plot the coefficient estimates for the effects of the RTC law adoption and other control variables on the violent crime rate based on panel data of 51 U.S. states for 1977 - 2010. The shaded area is the pointwise 95% confidence interval for which the standard errors are computed using the Hendricks-Koenker sandwich covariance matrix estimates with the Hall-Sheather bandwidth rule. The red solid line marks the fixed effect panel data mean regression estimates and the dotted red lines plot the 95% confidence interval with robust clustered (at the state level) standard errors.

7. Conclusions and future extensions

The present paper suggests a simple and computationally efficient way to incorporate group fixed effects into a panel data quantile regression by means of a convex clustering penalty. We develop theoretical results on consistent group structure estimation and discuss the asymptotic properties of the resulting joint and group-specific estimators.

There are several directions that we plan to explore in the future. First, our theory focused on individual fixed effects while assuming common slope coefficients. It is equally interesting to allow for group structure in some of the slope coefficients while keeping other slope coefficients common across individuals, perhaps even allowing for individual fixed effects. This can be achieved by straightforward modifications of the penalization approach which we explored so far, but a more detailed theoretical analysis of this approach remains beyond the scope of the present paper.

[Figure 4: six panels, one per coefficient (lawind, prisoners, rpcpi, afam1019, afam2029, afam3039). Horizontal axis: taus.]

Figure 4. Panel data quantile regression estimates with grouped state fixed effects for various $\tau$ based on model specification (10): For $\tau \in$ { . , . , . , . , . }, the solid black points plot the coefficient estimates for the effects of the RTC law adoption and other control variables on the violent crime rate based on the proposed methodology with panel data of 51 U.S. states for 1977 - 2010. The shaded area is the pointwise 95% confidence interval where the standard errors are computed using the Hendricks-Koenker sandwich covariance matrix estimates with the Hall-Sheather bandwidth rule. The red solid line marks the fixed effect panel data mean regression estimates and the dotted red lines plot the 95% confidence interval with robust clustered (at the state level) standard errors.

[Figure 5: three panels (tau = 0.1, tau = 0.5, tau = 0.9). Horizontal axis: Index. Vertical axis: Fixed Effects.]

Figure 5. The estimated state fixed effects and the corresponding grouped fixed effects for various $\tau$ based on model specification (10): For $\tau \in \{0.1, 0.5, 0.9\}$, the hollow black points plot the ordered panel data quantile regression estimates for individual state fixed effects. The red solid points mark out the estimated grouping and the corresponding group fixed effect estimates using the proposed methodology.

Second, one can take the standpoint that in many applications there is no exact group structure. In such settings, an alternative interpretation of the penalty which we investigated is as a way of regularizing problems that have too many parameters. Such an interpretation is in the spirit of the proposals of Koenker (2004) and Lamarche (2010), and a detailed investigation of the resulting bias-variance trade-off warrants further research.

Finally, a deeper analysis of issues that are related to uniformity of distributional approximation in the entire parameter space was not addressed here, but remains an important theoretical and practical question which we hope to address in the future.
References
Abrevaya, J., and C. Dahl (2008): "The Effects of Birth Inputs on Birthweight," Journal of Business & Economic Statistics, 26, 379–397.
Allman, E., C. Matias, and J. Rhodes (2009): "Identifiability of parameters in latent structure models with many observed variables," Annals of Statistics, 37, 3099–3132.
Andersen, E. D. (2010): "The Mosek Optimization Tools Manual, Version 6.0," Available from .
Ando, T., and J. Bai (2016): "Panel Data Models with Grouped Factor Structure Under Unknown Group Membership," Journal of Applied Econometrics, 31, 163–191.
Aneja, A., J. Donohue, and A. Zhang (2014): "The impact of Right to Carry Laws and the NRC report: the latest lessons from the empirical evaluation of law and policy," NBER working paper No. 18294.
Arellano, M., and S. Bonhomme (2016): "Nonlinear Panel Data Estimation via Quantile Regressions," Econometrics Journal, 19, 64–94.
Ayres, I., and J. Donohue (2003): "Shooting down the 'more guns less crime' hypothesis," Stanford Law Review, 55, 1193–1312.
Belloni, A., and V. Chernozhukov (2009): "Least squares after model selection in high-dimensional sparse models."
Belloni, A., V. Chernozhukov, and K. Kato (2014): "Uniform post-selection inference for least absolute deviation regression and other Z-estimation problems," Biometrika (Oberwolfach 2012), 102(1), 77–94.
Bester, A., and C. Hansen (2016): "Grouped Effects Estimators in Fixed Effects Models," Journal of Econometrics, 190, 197–208.
Bonhomme, S., and E. Manresa (2015): "Grouped Pattern of Heterogeneity in Panel Data," Econometrica, 83, 1147–1184.
Chamberlain, G. (1982): "Multivariate Regression Models for Panel Data," Journal of Econometrics, 18, 5–46.
Chernozhukov, V., I. Fernández-Val, S. Hoderlein, H. Holzmann, and W. Newey (2015): "Nonparametric Identification in Panels using Quantiles," Journal of Econometrics, 188, 378–392.
Chetverikov, D., B. Larsen, and C. Palmer (2016): "IV Quantile Regression for Group-Level Treatments, with an Application on the Distributional Effects of Trade," Econometrica, 84, 809–833.
National Research Council (2004): Firearms and Violence: a Critical Review. The National Academies Press: Washington.
Evdokimov, K. (2010): "Identification and Estimation of a Nonparametric Panel Data Model with Unobserved Heterogeneity," preprint, Princeton University.
Friberg, H. A. (2012): "Users Guide to the R-to-Mosek Interface," Available from http://rmosek.r-forge.r-project.org .
Galvao, A. F., and K. Kato (2016): "Smoothed quantile regression for panel data," Journal of Econometrics, 193(1), 92–112.
Galvao, A. F., and L. Wang (2015): "Efficient minimum distance estimator for quantile regression fixed effects panel data," Journal of Multivariate Analysis, 133, 1–26.
Hall, P., and S. Sheather (1988): "On the Distribution of a Studentized Quantile," Journal of the Royal Statistical Society, Series B, 50, 381–391.
Harding, M., and C. Lamarche (2017): "Penalized quantile regression with semiparametric correlated effects: An application with heterogeneous preferences," Journal of Applied Econometrics, 32(2), 342–358.
Heckman, J., and B. Singer (1982): "The Identification Problem in Econometric Models for Duration Data," in Advances in Econometrics, ed. by W. Hildenbrand. Cambridge University Press.
Hocking, T., J. Vert, F. Bach, and A. Joulin (2011): "Clusterpath: an Algorithm for Clustering Using Convex Fusion Penalties," in Proceedings of the International Conference on Machine Learning, ed. by L. Getoor, and T. Scheffer. Omnipress: Madison.
Hsiao, C. (2003): Analysis of Panel Data. Cambridge University Press.
Kasahara, H., and K. Shimotsu (2009): "Nonparametric Identification of Finite Mixture Models of Dynamic Discrete Choices," Econometrica, 77, 135–175.
Kato, K., A. Galvao, and G. Montes-Rojas (2012): "Asymptotics for Panel Quantile Regression Models with Individual Effects," Journal of Econometrics, 170, 76–91.
Kaufman, L., and P. Rousseeuw (2009): Finding Groups in Data: an Introduction to Cluster Analysis. Wiley: New York.
Keane, M., and K. Wolpin (1997): "The Career Decisions of Young Men," Journal of Political Economy, 105, 473–522.
Koenker, R. (2004): "Quantile Regression for Longitudinal Data," Journal of Multivariate Analysis, 91, 74–89.
Koenker, R. (2005): Quantile Regression, no. 38. Cambridge University Press.
Lamarche, C. (2010): "Robust Penalized Quantile Regression Estimation for Panel Data," Journal of Econometrics, 157, 396–408.
Leeb, H., and B. M. Pötscher (2008): "Sparse estimators and the oracle property, or the return of Hodges estimator," Journal of Econometrics, 142(1), 201–211.
Lin, C.-C., and S. Ng (2012): "Estimation of Panel Data Models with Parameter Heterogeneity When Group Membership is Unknown," Journal of Econometric Methods, 1, 42–55.
Lockhart, R., J. Taylor, R. J. Tibshirani, and R. Tibshirani (2014): "A significance test for the lasso," Annals of Statistics, 42(2), 413.
Lott, J., and D. Mustard (1997): "Crime, Deterrence and Right-to-Carry Concealed Handguns," Journal of Legal Studies, 26, 1–68.
MacQueen, J. (1967): "Some Methods for Classification and Analysis of Multivariate Observations," in Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, ed. by L. Le Cam, and J. Neyman. University of California Press: Berkeley.
Mundlak, Y. (1978): "On the Pooling of Time Series and Cross Section Data," Econometrica, 46, 69–85.
Radchenko, P., and G. Mukherjee (2017): "Convex Clustering via $\ell_1$ Fusion Penalization," Journal of the Royal Statistical Society Series B, forthcoming.
Su, L., Z. Shi, and P. C. Phillips (2016): "Identifying latent structures in panel data," Econometrica, 84(6), 2215–2264.
Sun, Y. (2005): "Estimation and Inference in Panel Structure Models," Working paper, University of California, San Diego.
Tan, K., and D. Witten (2015): "Statistical Properties of Convex Clustering," Electronic Journal of Statistics, 9, 2324–2347.
van de Geer, S., P. Bühlmann, Y. Ritov, and R. Dezeure (2014): "On asymptotically optimal confidence regions and tests for high-dimensional models," The Annals of Statistics, 42(3), 1166–1202.
van der Vaart, A. W., and J. A. Wellner (1996): Weak Convergence and Empirical Processes. Springer.
Zhu, C., H. Xu, C. Leng, and S. Yan (2014): "Convex optimization procedure for clustering: theoretical revisit," in Advances in Neural Information Processing Systems, pp. 1619–1627.
Zou, H. (2006): "The adaptive lasso and its oracle properties," Journal of the American Statistical Association, 101(476), 1418–1429.

8. Proofs
We begin by collecting some useful facts and defining additional notation. Define $\psi_\tau(x) := I\{x \leq 0\} - \tau$. We will repeatedly make use of Knight's identity (see Koenker (2005), p. 121), which holds for $u \neq 0$ and, written in terms of this $\psi_\tau$, reads
$$\rho_\tau(u - v) - \rho_\tau(u) = v\,\psi_\tau(u) + \int_0^v \big( I\{u \leq s\} - I\{u \leq 0\} \big)\, ds. \tag{11}$$
Additionally, let $\gamma_i := (\alpha_i, \beta^\top)^\top$. The symbols $a_n \lesssim b_n$, $a_n \gtrsim b_n$ will mean that there exists a non-random constant $C \in (0, \infty)$ which is independent of $n, T, \tau$ such that $P(a_n \leq C b_n) = 1$ and $P(a_n \geq C b_n) = 1$, respectively. Define $\varepsilon^\tau_{it} := Y_{it} - Z_{it}^\top \gamma_i(\tau)$ and let $F_{\varepsilon^\tau_{it} | X_{it}}(u | X_{it}) = F_{Y_{it} | X_{it}}(Z_{it}^\top \gamma_i(\tau) + u | X_{it})$ denote the conditional cdf of $\varepsilon^\tau_{it}$ given $X_{it}$. When there is no risk of confusion, we will also write $\varepsilon_{it}$ instead of $\varepsilon^\tau_{it}$.

8.1. Proof of Theorem 3.1.

We begin by stating some useful technical results which will be proved at the end of this section.
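As a sanity check on Knight's identity, the following minimal sketch compares both sides numerically. It is only an illustration under stated assumptions: the function names are hypothetical, the check function is taken as $\rho_\tau(u) = u(\tau - I\{u \leq 0\})$, the linear term is written out with indicators as $-v(\tau - I\{u \leq 0\})$, and the integral is evaluated in closed form rather than by quadrature.

```python
import random


def check_loss(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - 1{u <= 0})."""
    return u * (tau - (1.0 if u <= 0 else 0.0))


def knight_integral(u, v):
    """Closed form of the integral of 1{u <= s} - 1{u <= 0} over s from 0 to v."""
    ind0 = 1.0 if u <= 0 else 0.0
    lo, hi = min(0.0, v), max(0.0, v)
    # Length of the part of [lo, hi] on which s >= u.
    length = max(0.0, hi - max(u, lo))
    unsigned = length - ind0 * (hi - lo)
    # For v < 0 the integral runs backwards, which flips the sign.
    return unsigned if v >= 0 else -unsigned


def knight_gap(u, v, tau):
    """Left-hand side minus right-hand side of Knight's identity."""
    lhs = check_loss(u - v, tau) - check_loss(u, tau)
    ind0 = 1.0 if u <= 0 else 0.0
    rhs = -v * (tau - ind0) + knight_integral(u, v)
    return lhs - rhs


def max_gap(n_draws=10000, seed=0):
    """Largest absolute gap over random draws of (u, v, tau) with u != 0."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(n_draws):
        u = rng.uniform(-3.0, 3.0)
        v = rng.uniform(-3.0, 3.0)
        tau = rng.uniform(0.05, 0.95)
        if u == 0.0:
            continue
        worst = max(worst, abs(knight_gap(u, v, tau)))
    return worst
```

Calling `max_gap()` should return a value at the level of floating point rounding error, which is what one expects if the two sides agree for all signs of $u$ and $v$.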
Lemma 8.1. For any fixed $\beta \in \mathbb{R}^p$ define $\varepsilon^\tau_{it,\beta} := Y_{it} - X_{it}^\top \beta - \alpha_i(\tau)$. Then we have under assumptions (A1)-(A3)
$$\sum_{t=1}^T \rho_\tau(\varepsilon^\tau_{it,\beta} - a_1) - \rho_\tau(\varepsilon^\tau_{it,\beta} - a_2) = (a_1 - a_2) \sum_{t=1}^T \psi_\tau(\varepsilon^\tau_{it,\beta}) + \tilde r^{(1)}_{n,i}(a_1, a_2) + \tilde r^{(2)}_{n,i}(a_1, a_2)$$
where
$$\sup_i \tilde r^{(1)}_{n,i}(a_1, a_2) \lesssim T |a_1 - a_2| \max(|a_1|, |a_2|), \qquad \sup_i \tilde r^{(2)}_{n,i}(a_1, a_2) = |a_1 - a_2|\, O_P\big(T^{1/2} (\log n)^{1/2}\big).$$

Lemma 8.2. Under assumptions (A1)-(A3) there exist $\varepsilon > 0$, $\infty > c_1, c_2 > 0$ such that for all $i = 1, ..., n$
$$c_2 \|\gamma - \gamma_i\|^2 \geq E[\rho_\tau(Y_{it} - Z_{it}^\top \gamma)] - E[\rho_\tau(Y_{it} - Z_{it}^\top \gamma_i)] \geq c_1 \big( \|\gamma - \gamma_i\|^2 \wedge \varepsilon^2 \big). \tag{12}$$

Lemma 8.3. Under assumption (A1) define for fixed $B \in \mathbb{R}$
$$s_{n,1}(B) := \sup_i \sup_{|\gamma| \leq B} \Big| \sum_t \Big( \rho_\tau(Y_{it} - Z_{it}^\top \gamma) - \rho_\tau(Y_{it}) - E[\rho_\tau(Y_{it} - Z_{it}^\top \gamma) - \rho_\tau(Y_{it})] \Big) \Big|. \tag{13}$$
We have for any fixed $B < \infty$, provided that $\min(n, T) \to \infty$, $\log n = o(T)$,
$$s_{n,1}(B) = O_P\big(T^{1/2} (\log n)^{1/2}\big). \tag{14}$$

Proof of Theorem 3.1

Step 1: first bounds
In this step we shall prove that
$$\|\hat\beta - \beta_0\|^2 + \sup_i |\hat\alpha_i - \alpha_{0i}|^2 = O_P\big(T^{-1/2} (\log n)^{1/2} + \Lambda_D n/T\big). \tag{15}$$
Combine the results in Lemma 8.2 and Lemma 8.3 to find that any minimizer of $\Theta(\alpha_1, ..., \alpha_n, \beta)$ must satisfy
$$c_1 T \sum_i \Big\{ \big( \|\beta - \beta_0\|^2 + |\alpha_i - \alpha_{0i}|^2 \big) \wedge \varepsilon^2 \Big\} \lesssim n s_{n,1} + \Lambda_D n^2.$$
Let $N_\Delta := \#\{ i : \|\beta - \beta_0\|^2 + |\alpha_i - \alpha_{0i}|^2 \geq \Delta \}$. Then for any $0 < \Delta < \varepsilon^2$
$$T N_\Delta \Delta = O_P(n s_{n,1} + \Lambda_D n^2),$$
i.e. by Lemma 8.3
$$N_\Delta = n\, O_P\big( T^{-1/2} (\log n)^{1/2} + \Lambda_D n/T \big) \Delta^{-1}$$
and in particular $N_\Delta = o_P(n)$ as long as $\Delta \gg T^{-1/2} (\log n)^{1/2} + \Lambda_D n/T$. Provided that $\Lambda_D n/T = o_P(1)$ we obtain
$$\|\hat\beta - \beta_0\|^2 = O_P\big( T^{-1/2} (\log n)^{1/2} + \Lambda_D n/T \big). \tag{16}$$
Define $D_{n,T} := T^{-1/2} (\log n)^{1/2} + \Lambda_D n/T$.

Next we will prove that, provided $n \Lambda_D = o_P(T^{1/2})$, $T^{1/2} (\log n)^{1/2} / (n \Lambda_S) = o_P(1)$, also
$$\sup_i |\hat\alpha_i - \alpha_{0i}|^2 = O_P(D_{n,T}).$$
To this end, it suffices to prove that $\sup_i |\hat\alpha_i - \alpha_{0i}|^2 = O_P(d_{n,T})$ for any $n \Lambda_S \gg d_{n,T} \gg D_{n,T}$. Define
$$\tilde\alpha_i = \begin{cases} \hat\alpha_i & \text{if } |\hat\alpha_i - \alpha_{0i}|^2 \leq d_{n,T}, \\ \alpha_{0i} + d_{n,T}^{1/2}\, \mathrm{sgn}(\hat\alpha_i - \alpha_{0i}) & \text{if } |\hat\alpha_i - \alpha_{0i}|^2 > d_{n,T}. \end{cases}$$
Define the set $E := \{ i : \tilde\alpha_i = \hat\alpha_i \}$. Observe that
$$\Big| |\tilde\alpha_i - \tilde\alpha_j| - |\hat\alpha_i - \hat\alpha_j| \Big| \leq |\hat\alpha_i - \tilde\alpha_i| + |\hat\alpha_j - \tilde\alpha_j| \quad \forall\, i, j,$$
$$|\tilde\alpha_i - \tilde\alpha_j| - |\hat\alpha_i - \hat\alpha_j| < -|\hat\alpha_i - \alpha_{0i}| + d_{n,T}^{1/2} \quad \text{if } \exists k : i \in I_k \cap E^C,\ j \in I_k \cap E,$$
$$|\tilde\alpha_i - \tilde\alpha_j| - |\hat\alpha_i - \hat\alpha_j| \leq 0 \quad \text{if } \exists k : i, j \in I_k.$$
Thus
$$\sum_{i,j} \lambda_{i,j} \Big\{ |\hat\alpha_i - \hat\alpha_j| - |\tilde\alpha_i - \tilde\alpha_j| \Big\} = \Big( 2 \sum_{i \in E^C} \sum_{j \in E} + \sum_{i \in E^C} \sum_{j \in E^C} \Big) \lambda_{i,j} \Big\{ |\hat\alpha_i - \hat\alpha_j| - |\tilde\alpha_i - \tilde\alpha_j| \Big\}$$
$$= 2 \sum_k \sum_{i \in I_k \cap E^C} \Big( \sum_{j \in I_k \cap E} + \sum_{j \in I_k^C \cap E} \Big) \lambda_{i,j} \Big\{ |\hat\alpha_i - \hat\alpha_j| - |\tilde\alpha_i - \tilde\alpha_j| \Big\} + \sum_k \sum_{i \in I_k \cap E^C} \Big( \sum_{j \in I_k \cap E^C} + \sum_{j \in I_k^C \cap E^C} \Big) \lambda_{i,j} \Big\{ |\hat\alpha_i - \hat\alpha_j| - |\tilde\alpha_i - \tilde\alpha_j| \Big\}$$
$$\geq \sum_k \sum_{i \in I_k \cap E^C} \Big( \Lambda_S |I_k \cap E| - n \Lambda_D \Big) \big\{ |\hat\alpha_i - \alpha_{0i}| - d_{n,T}^{1/2} \big\} - \sum_k \sum_{i \in I_k \cap E^C} \sum_{j \in I_k^C \cap E^C} \lambda_{i,j} \Big\{ |\hat\alpha_i - \tilde\alpha_i| + |\hat\alpha_j - \tilde\alpha_j| \Big\}$$
$$\geq \sum_k \sum_{i \in I_k \cap E^C} \Big( \Lambda_S |I_k \cap E| - n \Lambda_D \Big) \big\{ |\hat\alpha_i - \alpha_{0i}| - d_{n,T}^{1/2} \big\} + \sum_k \sum_{i \in I_k \cap E^C} \Big\{ - n \Lambda_D \big\{ |\hat\alpha_i - \alpha_{0i}| - d_{n,T}^{1/2} \big\} - \Lambda_D \sum_{j \in I_k^C \cap E^C} \big\{ |\hat\alpha_j - \alpha_{0j}| - d_{n,T}^{1/2} \big\} \Big\}$$
$$\geq \sum_k \sum_{i \in I_k \cap E^C} \Big( \Lambda_S |I_k \cap E| - n \Lambda_D \Big) \big\{ |\hat\alpha_i - \alpha_{0i}| - d_{n,T}^{1/2} \big\}.$$
Now since $N_{d_{n,T}} = o_P(n)$ and by definition of $\tilde\alpha_i$ it follows that under (C)
$$\max_k \Big| \frac{|I_k \cap E|}{n \mu_k} - 1 \Big| = o_P(1),$$
and since by assumption $\Lambda_D / \Lambda_S = o_P(1)$ we obtain
$$\sum_{i,j} \lambda_{i,j} \Big\{ |\hat\alpha_i - \hat\alpha_j| - |\tilde\alpha_i - \tilde\alpha_j| \Big\} \gtrsim n \Lambda_S \sum_{i \in E^C} \big\{ |\hat\alpha_i - \alpha_{0i}| - d_{n,T}^{1/2} \big\}.$$
Next we note that for any $i$ with $|\hat\alpha_i - \alpha_{0i}|^2 \geq (2 + c_2/c_1) d_{n,T}$ we have
$$\frac{1}{T} \sum_t \rho_\tau(Y_{it} - X_{it}^\top \hat\beta - \hat\alpha_i) - \rho_\tau(Y_{it} - X_{it}^\top \hat\beta - \tilde\alpha_i)$$
$$\geq \int \rho_\tau(y - x^\top \hat\beta - \hat\alpha_i) - \rho_\tau(y - x^\top \hat\beta - \tilde\alpha_i)\, dP_{Y_{i1}, X_{i1}}(x, y) - s_{n,1}/T$$
$$= \int \rho_\tau(y - x^\top \hat\beta - \hat\alpha_i) - \rho_\tau(y - x^\top \beta_0 - \alpha_{0i})\, dP_{Y_{i1}, X_{i1}}(x, y) - \int \rho_\tau(y - x^\top \hat\beta - \tilde\alpha_i) - \rho_\tau(y - x^\top \beta_0 - \alpha_{0i})\, dP_{Y_{i1}, X_{i1}}(x, y) - s_{n,1}/T$$
$$\geq c_1 \big( \{ \|\hat\beta - \beta_0\|^2 + |\hat\alpha_i - \alpha_{0i}|^2 \} \wedge \varepsilon^2 \big) - c_2 \big( \|\hat\beta - \beta_0\|^2 + |\tilde\alpha_i - \alpha_{0i}|^2 \big) - s_{n,1}/T > d_{n,T}.$$
For $i$ with $|\hat\alpha_i - \alpha_{0i}|^2 < (2 + c_2/c_1) d_{n,T}$ note that by Lemma 8.1
$$\Big| \sum_t \rho_\tau(Y_{it} - X_{it}^\top \hat\beta - \hat\alpha_i) - \rho_\tau(Y_{it} - X_{it}^\top \hat\beta - \tilde\alpha_i) \Big| \lesssim |\hat\alpha_i - \tilde\alpha_i| \Big( \sum_t \psi_\tau(\varepsilon^\tau_{it,\hat\beta}) + T d_{n,T}^{1/2} + O_P\big(T^{1/2} (\log n)^{1/2}\big) \Big)$$
$$\lesssim \big\{ |\hat\alpha_i - \alpha_{0i}| - d_{n,T}^{1/2} \big\} \Big( T \|\hat\beta - \beta_0\| + T d_{n,T}^{1/2} + O_P\big(T^{1/2} (\log n)^{1/2}\big) \Big)$$
where the $O_P$ terms are uniform in $i$. Thus
$$\sum_i \sum_t \rho_\tau(Y_{it} - X_{it}^\top \hat\beta - \hat\alpha_i) - \rho_\tau(Y_{it} - X_{it}^\top \hat\beta - \tilde\alpha_i) \gtrsim - \Big( T d_{n,T}^{1/2} + O_P\big(T^{1/2} (\log n)^{1/2}\big) \Big) \sum_{i \in E^C} \big\{ |\hat\alpha_i - \alpha_{0i}| - d_{n,T}^{1/2} \big\}.$$
Summarizing we have proved that
$$\Theta(\hat\alpha_1, ..., \hat\alpha_n, \hat\beta) - \Theta(\tilde\alpha_1, ..., \tilde\alpha_n, \hat\beta) \gtrsim \Big[ n \Lambda_S - \Big( T d_{n,T}^{1/2} + O_P\big(T^{1/2} (\log n)^{1/2}\big) \Big) \Big] \sum_{i \in E^C} \big\{ |\hat\alpha_i - \alpha_{0i}| - d_{n,T}^{1/2} \big\}.$$
Under the conditions $n \Lambda_D = o_P(T^{1/2})$, $T^{1/2} (\log n)^{1/2} / (n \Lambda_S) = o_P(1)$ the last line is strictly positive with probability tending to one unless $E^C = \emptyset$ with probability tending to one. Thus the proof of (15) is complete.

Step 2: recovery of clusters with probability tending to one
To simplify notation, assume that individuals $1,\dots,N_1$ belong to cluster 1, individuals $N_1+1,\dots,N_1+N_2$ to cluster 2, and so on. Since all clusters can be handled by similar arguments, we only consider the first cluster. Let $\hat\alpha_{(1)},\dots,\hat\alpha_{(L)}$ denote the distinct values of $\hat\alpha_1,\dots,\hat\alpha_{N_1}$, ordered in increasing order, and let $n_{1,k} := \#\{i : \hat\alpha_i = \hat\alpha_{(k)}\}$. Again, to simplify notation assume w.l.o.g. that $\hat\alpha_1 = \dots = \hat\alpha_{n_{1,1}} = \hat\alpha_{(1)}$. To prove the result, we proceed in an iterative way. We will prove by contradiction that $L = 1$, i.e. all estimators of individuals from cluster 1 take the same value. Assume that $L \ge 2$. We first show that $n_{1,1} > N_1/2$. Assume that $n_{1,1} < N_1/2$. Define $\tilde\alpha_i = \hat\alpha_{(2)}$ for $i = 1,\dots,n_{1,1}$ and $\tilde\alpha_i = \hat\alpha_i$ for $i > n_{1,1}$. By (15) and Lemma 8.1 we find that
\begin{align*}
\Big| \sum_i \sum_{t=1}^T \rho_\tau(Y_{it} - X_{it}^\top\hat\beta - \hat\alpha_i) - \rho_\tau(Y_{it} - X_{it}^\top\hat\beta - \tilde\alpha_i) \Big|
&\lesssim n_{1,1}(\hat\alpha_{(2)} - \hat\alpha_{(1)}) \Big\{ \Big| \sum_t \psi_\tau(\varepsilon^\tau_{it,\hat\beta}) \Big| + O_P(T^{1/2}(\log n)^{1/2}) + O_P(T^{1/2}(\log n)^{1/2}) \Big\} \\
&\lesssim n_{1,1}(\hat\alpha_{(2)} - \hat\alpha_{(1)})\, O_P(T^{1/2}(\log n)^{1/2}).
\end{align*}
Next, observe that by construction, under (C) and using the fact that $\sup_i |\hat\alpha_i - \alpha_{0i}| = o_P(1)$,
\begin{align*}
|\tilde\alpha_i - \tilde\alpha_j| - |\hat\alpha_i - \hat\alpha_j| &= -|\hat\alpha_{(2)} - \hat\alpha_{(1)}|, && 1 \le i \le n_{1,1},\ n_{1,1} < j \le N_1 \ \text{ or } \ 1 \le j \le n_{1,1},\ n_{1,1} < i \le N_1, \\
\Big| |\tilde\alpha_i - \tilde\alpha_j| - |\hat\alpha_i - \hat\alpha_j| \Big| &\le |\hat\alpha_{(2)} - \hat\alpha_{(1)}|, && 1 \le i \le n_{1,1},\ N_1 < j \ \text{ or } \ 1 \le j \le n_{1,1},\ N_1 < i, \\
|\tilde\alpha_i - \tilde\alpha_j| - |\hat\alpha_i - \hat\alpha_j| &= 0, && \text{else}.
\end{align*}
From this we obtain
\begin{align*}
\Theta(\tilde\alpha_1,\dots,\tilde\alpha_n,\hat\beta) - \Theta(\hat\alpha_1,\dots,\hat\alpha_n,\hat\beta)
\lesssim{}& n_{1,1}(\hat\alpha_{(2)} - \hat\alpha_{(1)})\, O_P(T^{1/2}(\log n)^{1/2}) - (\hat\alpha_{(2)} - \hat\alpha_{(1)}) \Lambda_S\, n_{1,1}(N_1 - n_{1,1}) \\
&+ (\hat\alpha_{(2)} - \hat\alpha_{(1)})\, n_{1,1} \Lambda_D\, O_P(n) \\
<{}& 0
\end{align*}
with probability tending to one as $n, T \to \infty$, since by assumption $\Lambda_D/\Lambda_S = o_P(1)$, $n\Lambda_S \gg T^{1/2}(\log n)^{1/2}$, and since we assumed $n_{1,1} < N_1/2$, so that $N_1 - n_{1,1} \ge N_1/2 \gtrsim n$. However, this is a contradiction to the fact that $(\hat\alpha_1,\dots,\hat\alpha_n,\hat\beta)$ minimizes $\Theta$.

In a similar fashion, one can prove that $n_{1,L} > N_1/2$. Just define $\tilde\alpha_{N_1}, \dots, \tilde\alpha_{N_1 - n_{1,L}+1} = \hat\alpha_{(L-1)}$ and proceed as above. Since $n_{1,L} + n_{1,1} \le N_1$ and we have already proved that $n_{1,1} > N_1/2$, this contradicts $L \ge 2$, and hence $L = 1$. All other clusters can be handled in a similar fashion and that completes the proof of the second step. $\Box$

Proof of Lemma 8.1
Apply Knight's identity (11) to find that
\begin{align*}
\sum_{t=1}^T \rho_\tau(\varepsilon^\tau_{it,\beta} - \delta) - \rho_\tau(\varepsilon^\tau_{it,\beta})
={}& -\delta \sum_t \psi_\tau(\varepsilon^\tau_{it,\beta}) + \sum_t \int_0^\delta E\Big[ I\{\varepsilon^\tau_{it,\beta} \le s\} - I\{\varepsilon^\tau_{it,\beta} \le 0\} \Big] ds \\
&+ \int_0^\delta \sum_t \Big\{ I\{\varepsilon^\tau_{it,\beta} \le s\} - I\{\varepsilon^\tau_{it,\beta} \le 0\} - E\Big[ I\{\varepsilon^\tau_{it,\beta} \le s\} - I\{\varepsilon^\tau_{it,\beta} \le 0\} \Big] \Big\} ds.
\end{align*}
Hence it follows that
\begin{align*}
\sum_{t=1}^T \rho_\tau(\varepsilon^\tau_{it,\beta} - a_1) - \rho_\tau(\varepsilon^\tau_{it,\beta} - a_2)
={}& (a_2 - a_1) \sum_t \psi_\tau(\varepsilon^\tau_{it,\beta}) + \int_{a_2}^{a_1} \sum_t E\Big[ I\{\varepsilon^\tau_{it,\beta} \le s\} - I\{\varepsilon^\tau_{it,\beta} \le 0\} \Big] ds \\
&+ \int_{a_2}^{a_1} \sum_t \Big\{ I\{\varepsilon^\tau_{it,\beta} \le s\} - I\{\varepsilon^\tau_{it,\beta} \le 0\} - E\Big[ I\{\varepsilon^\tau_{it,\beta} \le s\} - I\{\varepsilon^\tau_{it,\beta} \le 0\} \Big] \Big\} ds \\
=:{}& (a_2 - a_1) \sum_{t=1}^T \psi_\tau(\varepsilon^\tau_{it,\beta}) + \tilde r^{(1)}_{n,i}(a_1,a_2) + \tilde r^{(2)}_{n,i}(a_1,a_2).
\end{align*}
Now by a Taylor expansion
\begin{align*}
\sup_i \Big| \sum_t \int_{a_2}^{a_1} E\Big[ I\{\varepsilon^\tau_{it,\beta} \le s\} - I\{\varepsilon^\tau_{it,\beta} \le 0\} \Big] ds \Big|
&= \sup_i \Big| \int_{a_2}^{a_1} \sum_t E\big[ F_{Y_i|X_i}(\beta^\top X_{it} + s \mid X_{it}) - F_{Y_i|X_i}(\beta^\top X_{it} \mid X_{it}) \big] ds \Big| \\
&\lesssim T |a_1 - a_2| \max(|a_1|,|a_2|),
\end{align*}
so the bound on $\tilde r^{(1)}_{n,i}(a_1,a_2)$ is established. Next define the classes of functions
\begin{align*}
\mathcal{G}_1 &:= \Big\{ (y,x) \mapsto I\{y - \beta^\top x \le s\} - I\{y - \beta^\top x \le 0\} \,\Big|\, s \in \mathbb{R}, \beta \in \mathbb{R}^d \Big\}, \\
\mathcal{G}_2 &:= \Big\{ (y,x) \mapsto I\{y - \beta^\top x \le s\} \,\Big|\, s \in \mathbb{R}, \beta \in \mathbb{R}^d \Big\}.
\end{align*}
Note that the class of functions $\mathcal{G}_2$ has envelope function $F \equiv 1$. Thus by Lemma 2.6.15 and Theorem 2.6.7 of van der Vaart and Wellner (1996) the class of functions $\mathcal{G}_2$ satisfies, for any probability measure $Q$, $N(\varepsilon, \mathcal{G}_2, L_2(Q)) \le K(1/\varepsilon)^V$ for some finite constants $K, V$ (here, $N(\varepsilon, \mathcal{G}, L_2(Q))$ denotes the covering number, see Section 2.1 of van der Vaart and Wellner (1996)). Moreover, $\mathcal{G}_1 \subseteq \{ g_1 - g_2 \mid g_1, g_2 \in \mathcal{G}_2 \}$, and elementary computations with covering numbers show that $N(\varepsilon, \mathcal{G}_1, L_2(Q)) \le \tilde K (1/\varepsilon)^{\tilde V}$ for some finite constants $\tilde V, \tilde K$. Hence we find that by Theorem 2.14.9 of van der Vaart and Wellner (1996), for any $h > 0$,
\[
P^*\Big( \sup_{g \in \mathcal{G}_1} \frac{1}{\sqrt{T}} \Big| \sum_t g(Y_{it}, X_{it}) - E[g(Y_{it}, X_{it})] \Big| \ge h \Big) \le \Big( \frac{D h}{\sqrt{\tilde V}} \Big)^{\tilde V} e^{-2h^2}
\]
for some constant $D$ that depends only on $\tilde K$ (here, $P^*$ denotes outer probability). Letting $h = \sqrt{\log n}$ and applying the union bound for probabilities we obtain
\[
\sup_i \sup_{\beta \in \mathbb{R}^d, s \in \mathbb{R}} \Big| \sum_t \Big\{ I\{\varepsilon^\tau_{it,\beta} \le s\} - I\{\varepsilon^\tau_{it,\beta} \le 0\} - E\Big[ I\{\varepsilon^\tau_{it,\beta} \le s\} - I\{\varepsilon^\tau_{it,\beta} \le 0\} \Big] \Big\} \Big| = O_P(T^{1/2}(\log n)^{1/2}).
\]
Hence
\[
\sup_i \sup_{\beta \in \mathbb{R}^d} \Big| \int_{a_2}^{a_1} \sum_t \Big\{ I\{\varepsilon^\tau_{it,\beta} \le s\} - I\{\varepsilon^\tau_{it,\beta} \le 0\} - E\Big[ I\{\varepsilon^\tau_{it,\beta} \le s\} - I\{\varepsilon^\tau_{it,\beta} \le 0\} \Big] \Big\} ds \Big| = O_P(T^{1/2}(\log n)^{1/2})\, |a_1 - a_2|.
\]
Thus the bound on $\tilde r^{(2)}_{n,i}(a_1,a_2)$ follows and the proof is complete. $\Box$

Proof of Lemma 8.2
Observe that by Knight's identity (11)
\begin{align*}
E\big[ \rho_\tau(Y_{it} - Z_{it}^\top\gamma) - \rho_\tau(Y_{it} - Z_{it}^\top\gamma_{0i}) \big]
&= E\big[ \rho_\tau(Y_{it} - Z_{it}^\top\gamma_{0i} - Z_{it}^\top(\gamma - \gamma_{0i})) - \rho_\tau(Y_{it} - Z_{it}^\top\gamma_{0i}) \big] \\
&= E\Big[ -(\gamma - \gamma_{0i})^\top Z_{it}\, \psi_\tau(\varepsilon_{it}) + \int_0^{(\gamma - \gamma_{0i})^\top Z_{it}} I\{\varepsilon_{it} \le s\} - I\{\varepsilon_{it} \le 0\}\, ds \Big] \\
&= E\Big[ \int_0^{(\gamma - \gamma_{0i})^\top Z_{it}} F_{\varepsilon_{it}|X_{it}}(s \mid X_{it}) - F_{\varepsilon_{it}|X_{it}}(0 \mid X_{it})\, ds \Big].
\end{align*}
Now under assumption (A2), $|F_{\varepsilon_{it}|X_{it}}(s \mid X_{it}) - F_{\varepsilon_{it}|X_{it}}(0 \mid X_{it})| \le |s|\, \overline{f'}$ a.s., and thus given (A1)
\[
E\Big| \int_0^{(\gamma - \gamma_{0i})^\top Z_{it}} F_{\varepsilon_{it}|X_{it}}(s \mid X_{it}) - F_{\varepsilon_{it}|X_{it}}(0 \mid X_{it})\, ds \Big| \le \overline{f'}\, E\big[ ((\gamma - \gamma_{0i})^\top Z_{it})^2 \big] \le M^2\, \overline{f'}\, \|\gamma - \gamma_{0i}\|^2.
\]
This shows the upper bound in (12). For the lower bound, note that $s \mapsto F_{\varepsilon_{it}|X_{it}}(s \mid X_{it})$ is non-decreasing almost surely. Moreover, $f_{\varepsilon_{it}|X_{it}}(0 \mid X_{it}) \ge f_{\min}$ a.s. by (A3) and thus by (A2) and (A3) we have almost surely
\[
\inf_{|s| \le f_{\min}/(2\overline{f'})} f_{\varepsilon_{it}|X_{it}}(s \mid X_{it}) \ge f_{\min}/2.
\]
Define $\delta_i := (\gamma - \gamma_{0i}) \min\{ 1, f_{\min}/(2 M \overline{f'} \|\gamma - \gamma_{0i}\|) \}$. Noting that $s \mapsto F_{\varepsilon_{it}|X_{it}}(s \mid X_{it})$ is non-decreasing almost surely, it follows that a.s.
\begin{align*}
\int_0^{(\gamma - \gamma_{0i})^\top Z_{it}} F_{\varepsilon_{it}|X_{it}}(s \mid X_{it}) - F_{\varepsilon_{it}|X_{it}}(0 \mid X_{it})\, ds
&\ge \int_0^{\delta_i^\top Z_{it}} F_{\varepsilon_{it}|X_{it}}(s \mid X_{it}) - F_{\varepsilon_{it}|X_{it}}(0 \mid X_{it})\, ds \\
&\ge \frac{f_{\min}}{4} (\delta_i^\top Z_{it})^2,
\end{align*}
where the last inequality follows since by definition $|\delta_i^\top Z_{it}| \le f_{\min}/(2\overline{f'})$ a.s. Finally, under assumption (A1), $E[(\delta_i^\top Z_{it})^2] \ge \|\delta_i\|^2 c_\lambda$.
Summarizing, we find
\[
E\big[ \rho_\tau(Y_{it} - Z_{it}^\top\gamma) - \rho_\tau(Y_{it} - Z_{it}^\top\gamma_{0i}) \big] \ge \frac{f_{\min} c_\lambda}{4} \|\delta_i\|^2 = \frac{f_{\min} c_\lambda}{4} \Big( \|\gamma - \gamma_{0i}\|^2 \wedge \frac{f_{\min}^2}{4 M^2 \overline{f'}^2} \Big),
\]
which proves the lower bound in (12). Thus the proof of the Lemma is complete. $\Box$

Proof of Lemma 8.3
Consider the class of functions
\[
\mathcal{G}_B := \Big\{ (y,z) \mapsto g_\gamma(y,z) := \frac{(\rho_\tau(y - z^\top\gamma) - \rho_\tau(y))\, I\{|z| \le M\} + MB}{2MB} \,\Big|\, \|\gamma\| \le B \Big\}.
\]
Note that by construction $0 \le g_\gamma(y,z) \le 1$ for $\|\gamma\| \le B$ and moreover $\sup_{y,z} |g_\gamma(y,z) - g_{\gamma'}(y,z)| \le \|\gamma - \gamma'\|/(2B)$. This shows the existence of constants $V, K_B < \infty$ such that for all $i = 1,\dots,n$, $N_{[\,]}(\varepsilon, \mathcal{G}_B, L_2(P_i)) \le (K_B/\varepsilon)^V$ for $0 < \varepsilon < K_B$, where $K_B$ depends on $B$ only and $P_i$ denotes the measure corresponding to $(Y_i, Z_i)$. Thus we have by Theorem 2.14.9 of van der Vaart and Wellner (1996),
\[
P^*\Big( \sup_\gamma \frac{1}{\sqrt{T}} \Big| \sum_t g_\gamma(Y_{it}, Z_{it}) - E[g_\gamma(Y_{it}, Z_{it})] \Big| \ge h \Big) \le \Big( \frac{D_B h}{\sqrt{V}} \Big)^V e^{-2h^2},
\]
where the constant $D_B$ depends only on $K_B$ and $P^*$ denotes outer probability. Set $h = \sqrt{\log n}$ to bound the right-hand side above by $o(n^{-1})$. Defining the events
\[
E_{i,n} := \Big\{ \sup_\gamma \frac{1}{\sqrt{T}} \Big| \sum_t g_\gamma(Y_{it}, Z_{it}) - E[g_\gamma(Y_{it}, Z_{it})] \Big| \ge \sqrt{\log n} \Big\}
\]
we obtain
\[
P^*(\cup_i E_{i,n}) \le n \sup_i P^*(E_{i,n}) \le n\, o(n^{-1}) = o(1).
\]
Finally, note that under (A1) we have a.s.
\[
\frac{\rho_\tau(Y_{it} - Z_{it}^\top\gamma) - \rho_\tau(Y_{it}) - E[\rho_\tau(Y_{it} - Z_{it}^\top\gamma) - \rho_\tau(Y_{it})]}{2MB} = g_\gamma(Y_{it}, Z_{it}) - E[g_\gamma(Y_{it}, Z_{it})] \quad \forall\, i, t.
\]
This completes the proof. $\Box$
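The last step of the proof combines the exponential tail bound with a union bound over the $n$ individuals. As a rough numerical illustration, the following sketch evaluates $n \cdot (D_B h/\sqrt{V})^V e^{-2h^2}$ at $h = \sqrt{\log n}$. The values chosen for `D` and `V` are hypothetical placeholders, since the proof only asserts that such finite constants exist.

```python
import math


def tail_bound(h, D, V):
    """Right-hand side (D*h/sqrt(V))**V * exp(-2*h**2) of the tail inequality."""
    return (D * h / math.sqrt(V)) ** V * math.exp(-2.0 * h * h)


def union_bound(n, D, V):
    """n times the tail bound evaluated at h = sqrt(log n)."""
    h = math.sqrt(math.log(n))
    return n * tail_bound(h, D, V)


if __name__ == "__main__":
    D, V = 5.0, 4.0  # hypothetical constants, chosen only for illustration
    for n in (10**2, 10**4, 10**6, 10**8):
        # the bound is trivial (larger than 1) for small n and then decays
        print(n, union_bound(n, D, V))
```

With $h = \sqrt{\log n}$ the factor $e^{-2h^2}$ equals $n^{-2}$, so the union bound behaves like a power of $\log n$ divided by $n$, which is the $o(1)$ statement used in the proof.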
Proof of Theorem 3.2.

We begin by stating a useful technical result that will be proved at the end of this section.

Lemma 8.4. Under assumptions (A1)-(A3)
\begin{align*}
\sum_{t=1}^T \rho_\tau(Y_{it} - Z_{it}^\top(\gamma_{0i} + \delta)) - \rho_\tau(Y_{it} - Z_{it}^\top\gamma_{0i})
={}& \delta^\top \sum_{t=1}^T Z_{it} \psi_\tau(\varepsilon_{it}) + \frac{1}{2} T \delta^\top E[Z_{it} Z_{it}^\top f_{\varepsilon_i|X_i}(0 \mid X_{it})] \delta \\
&+ r^{(1)}_{n,i}(\delta) + r^{(2)}_{n,i}(\delta),
\end{align*}
where, defining $\ell_{n,T} := \max\{\log n, \log T\}$, there exists a constant $C$ independent of $n, T, \delta$ such that
\[
\sup_i \sup_{T^{-1}\ell_{n,T} \le \|\delta\| \le 1} \frac{|r^{(1)}_{n,i}(\delta)|}{\|\delta\|^{3/2}} = O_P(T^{1/2} \ell_{n,T}^{1/2}), \qquad \sup_i |r^{(2)}_{n,i}(\delta)| \le T C \|\delta\|^3.
\]

Proof of Theorem 3.2. The proof proceeds in several steps. First, we note that the 'oracle' estimation problem (6) corresponds to a classical, fixed-dimensional quantile regression with true parameter vector $(\alpha_{(01)}, \dots, \alpha_{(0K)}, \beta_0^\top)$ and $nT$ independent observations $(Y_{it}, \tilde Z_{it})$, where $\tilde Z_{it}^\top = (e_k^\top, X_{it}^\top)$, $i \in I_k$, $t = 1,\dots,T$, and $e_k$ denotes the $k$'th unit vector in $\mathbb{R}^K$.
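To make this reduction concrete, the sketch below assembles the stacked regressors $\tilde Z_{it} = (e_k, X_{it})$ for a known grouping and evaluates the pooled check-loss objective of the resulting fixed-dimensional quantile regression. The function names, the toy panel and the group labels are illustrative assumptions and do not come from the paper; the check function is taken to be the usual $\rho_\tau(u) = u(\tau - 1\{u < 0\})$.

```python
def check_loss(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def oracle_design(x, groups, K):
    """Stack rows Z_it = (e_k, X_it), where k = groups[i] is the known group of i.

    x[i][t] is the covariate vector (list of floats) of individual i at time t.
    Returns a list of (i, t, row) triples; each row has length K + d.
    """
    rows = []
    for i, x_i in enumerate(x):
        e_k = [1.0 if k == groups[i] else 0.0 for k in range(K)]
        for t, x_it in enumerate(x_i):
            rows.append((i, t, e_k + list(x_it)))
    return rows


def oracle_objective(theta, y, rows, tau):
    """Pooled check loss sum_{i,t} rho_tau(Y_it - Z_it' theta),
    with theta = (alpha_(1), ..., alpha_(K), beta)."""
    total = 0.0
    for i, t, z in rows:
        fitted = sum(z_j * th_j for z_j, th_j in zip(z, theta))
        total += check_loss(y[i][t] - fitted, tau)
    return total


if __name__ == "__main__":
    # toy panel: n = 3 individuals, T = 2 periods, d = 1 covariate, K = 2 groups
    x = [[[0.0], [1.0]], [[2.0], [3.0]], [[1.0], [0.0]]]
    groups = [0, 1, 0]
    alpha, beta = [1.0, -1.0], [0.5]
    y = [[alpha[groups[i]] + beta[0] * x_it[0] for x_it in x_i]
         for i, x_i in enumerate(x)]
    rows = oracle_design(x, groups, K=2)
    print(rows[0][2], oracle_objective(alpha + beta, y, rows, tau=0.5))
```

Because the group dummies enter as ordinary regressors, any standard quantile regression solver could be applied to the stacked rows; the sketch only builds the design and the objective and does not perform the minimization.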
A straightforward extension of classical proof techniques in parametric quantile regression shows that under assumptions (A1)-(A3) and (C) the oracle estimator is asymptotically normal as claimed.

Second, we observe that by definition of the optimization problem the estimated group structure $\hat I_{1,\ell}, \dots, \hat I_{K_\ell,\ell}$ is the same for all values of $\ell$ with $\lambda_\ell$ that give rise to the same number of groups. Since the value of $IC(\ell)$ depends only on $\hat I_{1,\ell}, \dots, \hat I_{K_\ell,\ell}$, it suffices to minimize $IC$ over those values of $\ell$ that correspond to different numbers of groups. Denote the distinct estimated numbers of groups by $\hat K_1, \dots, \hat K_R$, the corresponding estimated groupings by $\hat I_{(1,\hat K_r)}, \dots, \hat I_{(\hat K_r,\hat K_r)}$, and the corresponding values of $IC$ by $IC_{\hat K_1}, \dots, IC_{\hat K_R}$. By assumption (G) and Theorem 3.1,
\[
P\Big( \exists\, r : \hat K_r = K,\ \hat I_{(k,\hat K_r)} = I_k,\ k = 1,\dots,K \Big) \to 1. \tag{17}
\]
Hence it suffices to prove that
\[
P\Big( \arg\min_r IC_{\hat K_r} = K \Big) \to 1. \tag{18}
\]
Once this result is established, we directly obtain
\[
P\Big( (\hat\alpha^{IC}_1, \dots, \hat\alpha^{IC}_{\hat K^{IC}}, (\hat\beta^{IC})^\top) = (\hat\alpha^{(OR)}_{(1)}, \dots, \hat\alpha^{(OR)}_{(K)}, (\hat\beta^{(OR)})^\top) \Big) \to 1,
\]
and thus the asymptotic distribution of $(\hat\alpha^{IC}_1, \dots, \hat\alpha^{IC}_{\hat K^{IC}}, (\hat\beta^{IC})^\top)$ matches that of the oracle estimator.

We will now prove (18). From Theorem 3.2 in Kato, Galvao, and Montes-Rojas (2012) we know that under (A1)-(A3) and the additional assumptions that $n \to \infty$ but $T$ grows at most polynomially in $n$,
\[
\check\beta - \beta_0 = O_P\big( (T/\log n)^{-3/4} \vee (nT)^{-1/2} \big).
\]
If $n \to \infty$ and $T$ grows at most polynomially in $n$ it follows that $\check\beta - \beta_0 = o_P(T^{-1/2})$. Moreover, standard quantile regression arguments show that
\[
\check\alpha_i - \alpha_{0i} = -\frac{1}{E[f_{\varepsilon^\tau_{it}|X_{it}}(0 \mid X_{it})]} \frac{1}{T} \sum_t \psi_\tau(\varepsilon^\tau_{it}) + R_{n,i},
\]
where $\sup_i |R_{n,i}| = O_P\big( (\log T / T)^{3/4} \big)$.
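A linear representation of this type can be checked by simulation in the simplest special case without covariates, where $\check\alpha_i$ reduces to the sample $\tau$-quantile of $Y_{i1}, \dots, Y_{iT}$. The sketch below is only an illustration under several assumptions that are not taken from the paper: standard normal observations, $\tau = 0.5$ (so the true quantile is $0$), and the sign convention $\psi_\tau(u) = \tau - 1\{u \le 0\}$, under which the leading term enters with a positive sign; with the opposite sign convention for $\psi_\tau$ the leading term changes sign.

```python
import math
import random

TAU = 0.5  # quantile level; the true tau-quantile of N(0, 1) is then q = 0
F_Q = 1.0 / math.sqrt(2.0 * math.pi)  # standard normal density at q = 0


def sample_quantile(values, tau):
    """Empirical tau-quantile, taken as the ceil(tau*T)-th order statistic."""
    ordered = sorted(values)
    k = max(1, math.ceil(tau * len(ordered)))
    return ordered[k - 1]


def remainder(y):
    """R = (q_hat - q) - (1/f(q)) * (1/T) * sum_t (tau - 1{Y_t <= q}), with q = 0."""
    T = len(y)
    psi_mean = sum(TAU - (1.0 if v <= 0.0 else 0.0) for v in y) / T
    return sample_quantile(y, TAU) - psi_mean / F_Q


def max_remainder(n, T, seed=0):
    """sup over i = 1..n of |R_{n,i}| for n independent samples of length T."""
    rng = random.Random(seed)
    return max(
        abs(remainder([rng.gauss(0.0, 1.0) for _ in range(T)]))
        for _ in range(n)
    )


if __name__ == "__main__":
    for T in (500, 2000, 8000):
        # compare the largest remainder with the T^{-1/2} scale of the leading term
        print(T, max_remainder(20, T), T ** -0.5)
```

In such experiments the largest remainder is typically much smaller than $T^{-1/2}$, which is the order of the leading term.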
Next apply Lemma 8.4 to find that provided (log T ) (log n ) T → (cid:88) i,t ρ τ ( Y it − Z (cid:62) it ˇ γ i ) − ρ τ ( ε τit )= (cid:88) i (ˇ γ i − γ i ) (cid:62) (cid:88) t Z it ψ τ ( ε τit ) + T (cid:88) i (ˇ γ i − γ i ) (cid:62) E [ Z i Z (cid:62) i f ε τi | X i (0 | X i )](ˇ γ i − γ i ) + o P ( n )= (cid:88) i ( ˇ α i − α i ) (cid:88) t ψ τ ( ε τit ) + T (cid:88) i ( ˇ α i − α i ) E [ f ε τi | X i (0 | X i )] + o P ( n )= − (cid:88) i E [ f ε τi | X i (0 | X i )] (cid:16) √ T (cid:88) t ψ τ ( ε τit ) (cid:17) + o P ( n )= − (cid:88) i τ (1 − τ )2 E [ f ε τi | X i (0 | X i )] + o P ( n ) . Next, observe that by asymptotic normality of the oracle estimatorsup k =1 ,...,K (cid:107) ˆ γ ( OR )( k ) − γ (0 k ) (cid:107) = O P (( nT ) − / )where we defined ˆ γ ( OR )( k ) := ( ˆ α ( OR )( k ) , ˆ β ( OR ) ). Again applying Lemma 8.4 we obtain (cid:88) k (cid:88) i ∈ I k (cid:88) t ρ τ ( Y it − Z (cid:62) it ˆ γ ( OR )( k ) ) − ρ τ ( ε τit )= (cid:88) k (ˆ γ ( OR )( k ) − γ (0 k ) ) (cid:62) (cid:88) i ∈ I k (cid:88) t Z it ψ τ ( ε it ) + nT O P (cid:16) sup k (cid:107) ˜ γ k − γ (0 k ) (cid:107) (cid:17) + o P ( n )= o P ( n ) . Combining the results obtained so far we have(19) (cid:88) k (cid:88) i ∈ I k (cid:88) t ρ τ ( Y it − Z (cid:62) it ˆ γ ( OR )( k ) ) − inf α ,...,α n ,β (cid:88) i,t ρ τ ( Y it − X (cid:62) it β − α i ) ≥ − (cid:88) i τ (1 − τ )2 E [ f ε i | X i (0 | X i )] + o P ( n ) . Next, let V n ( L ) denote the set of all disjoint partitions of { , ..., n } into L subsets. Observethat by (12) we have under assumption (C)inf L Summarizing, we find that under (C)inf L By Knight’s identity (11) we have ρ τ ( Y it − Z (cid:62) it ( γ i + δ )) − ρ τ ( Y it − Z (cid:62) it γ i )= − δ (cid:62) Z it ψ τ ( ε it ) + (cid:90) Z (cid:62) it δ F ε i | X i ( s | X it ) − F ε i | X i (0 | X it ) ds + (cid:90) Z (cid:62) it δ I { ε it ≤ s } − I { ε it ≤ } − ( F ε i | X i ( s | X it ) − F ε i | X i (0 | X it )) ds. 
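Knight's identity, which drives the decomposition above, can be verified numerically. The sketch below assumes the standard definitions $\rho_\tau(u) = u(\tau - 1\{u < 0\})$ and $\psi_\tau(u) = \tau - 1\{u \le 0\}$, which are not restated in this section, and uses the closed form of the integral $\int_0^v (1\{u \le s\} - 1\{u \le 0\})\, ds$; the function names are hypothetical.

```python
import random


def rho(u, tau):
    """Check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def psi(u, tau):
    """Score function psi_tau(u) = tau - 1{u <= 0}."""
    return tau - (1.0 if u <= 0 else 0.0)


def knight_integral(u, v):
    """Closed form of int_0^v (1{u <= s} - 1{u <= 0}) ds."""
    if v > 0 and 0 < u < v:
        return v - u
    if v < 0 and v < u <= 0:
        return u - v
    return 0.0


def knight_gap(u, v, tau):
    """Absolute difference between the two sides of
    rho(u - v) - rho(u) = -v * psi(u) + int_0^v (1{u <= s} - 1{u <= 0}) ds."""
    lhs = rho(u - v, tau) - rho(u, tau)
    rhs = -v * psi(u, tau) + knight_integral(u, v)
    return abs(lhs - rhs)


if __name__ == "__main__":
    rng = random.Random(1)
    worst = max(
        knight_gap(rng.uniform(-2, 2), rng.uniform(-2, 2), rng.uniform(0.05, 0.95))
        for _ in range(10000)
    )
    print("largest gap over random draws:", worst)
```

The integral term is non-negative for every $u$ and $v$, which is what makes the identity convenient for the lower bounds used in the proofs.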
Define r (1) n,i ( δ ) := (cid:88) t (cid:90) Z (cid:62) it δ I { ε it ≤ s } − I { ε it ≤ } − ( F ε i | X i ( s | X it ) − F ε i | X i (0 | X it )) ds − T δ (cid:62) E [ Z i Z (cid:62) i f ε i | X i (0 | X i )] + (cid:88) t f ε i | X i (0 | X it )( Z (cid:62) it δ ) ,r (2) n,i ( δ ) := (cid:88) t (cid:110) (cid:90) Z (cid:62) it δ F ε i | X i ( s | X it ) − F ε i | X i (0 | X it ) ds − f ε i | X i (0 | X it )( Z (cid:62) it δ ) (cid:111) . By a Taylor expansion we obtain (cid:12)(cid:12)(cid:12) (cid:90) Z (cid:62) it δ F ε i | X i ( s | X it ) − F ε i | X i ( s | X it ) ds − f ε i | X i (0 | X it )( Z (cid:62) it δ ) (cid:12)(cid:12)(cid:12) ≤ ( Z (cid:62) it δ ) f (cid:48) ≤ M f (cid:48) (cid:107) δ (cid:107) , and thus the bound on r ( i ) n, is established. Next we note that E (cid:104) (cid:90) Z (cid:62) it δ I { ε it ≤ s } − I { ε it ≤ } − ( F ε i | X i ( s | X it ) − F ε i | X i (0 | X it )) ds (cid:105) = 0 u and Volgushev 33 since the conditional expectation given Z it equals zero almost surely and moreover | I it ( δ ) | := (cid:12)(cid:12)(cid:12) (cid:90) Z (cid:62) it δ I { ε it ≤ s } − I { ε it ≤ } − ( F ε i | X i ( s | X it ) − F ε i | X i (0 | X it )) ds (cid:12)(cid:12)(cid:12) ≤ M (cid:107) δ (cid:107) + M (cid:107) δ (cid:107) I {| ε it | ≤ M (cid:107) δ (cid:107)} , | I it ( δ ) − I it ( δ (cid:48) ) | ≤ M (cid:107) δ − δ (cid:48) (cid:107) . Note that in particular for (cid:107) δ (cid:107) ≤ | I it ( δ ) | ≤ M ( M + 1) (cid:107) δ (cid:107) , E [ I it ( δ )] ≤ M + 4 M f (cid:48) ) (cid:107) δ (cid:107) . 
Define c ,M := M ( M + 1) , c ,M := 2( M + 4 M f (cid:48) ) and apply the Bernstein inequality toshow that for any 1 ≥ (cid:107) δ (cid:107) ≥ T − (cid:96) n,T , < a < ∞ P (cid:16)(cid:12)(cid:12)(cid:12) (cid:88) t I it ( δ ) (cid:12)(cid:12)(cid:12) > a(cid:96) / n,T T / (cid:107) δ (cid:107) / (cid:17) ≤ (cid:16) − a (cid:96) n,T T (cid:107) δ (cid:107) / T c ,M (cid:107) δ (cid:107) + ac ,M (cid:96) / n,T T / (cid:107) δ (cid:107) / / (cid:17) = 2 exp (cid:16) − a (cid:96) n,T / c ,M + ac ,M (cid:96) / n,T ( T (cid:107) δ (cid:107) ) − / / (cid:17) . For 0 < a < (cid:96) / n,T c ,M /c ,M the last line above is bounded by 2( n ∨ T ) − a / (4 c ,M ) . Denoteby G T a grid of values δ , ..., δ | G T | such that T − ≤ (cid:107) δ j (cid:107) ≤ j ∈ G T andsup (cid:96) n,T T − ≤(cid:107) δ (cid:107)≤ inf ˜ δ ∈ G T (cid:107) δ − ˜ δ (cid:107) = o ( T − ) . Note that it is possible to find such a G T with | G T | = O ( T d +1) ). It follows thatsup i sup (cid:96) n,T T − ≤(cid:107) δ (cid:107)≤ (cid:12)(cid:12)(cid:12) (cid:80) t I it ( δ ) (cid:12)(cid:12)(cid:12) (cid:107) δ (cid:107) / ≤ sup i sup δ ∈ G T (cid:12)(cid:12)(cid:12) (cid:80) t I it ( δ ) (cid:12)(cid:12)(cid:12) (cid:107) δ (cid:107) / + T / T M o ( T − )= sup i sup δ ∈ G T (cid:12)(cid:12)(cid:12) (cid:80) t I it ( δ ) (cid:12)(cid:12)(cid:12) (cid:107) δ (cid:107) / + o ( T / ) . Finally, note that for 0 < a < (cid:96) / n,T c ,M /c ,M P (cid:16) sup i sup δ ∈ G T (cid:12)(cid:12)(cid:12) (cid:80) t I it ( δ ) (cid:12)(cid:12)(cid:12) (cid:107) δ (cid:107) / > a(cid:96) / n,T T / (cid:17) ≤ n | G T | n ∨ T ) − a / (4 c ,M ) = O ( nT d +1) )( n ∨ T ) − a / (4 c ,M ) . Since (cid:96) n,T → ∞ we can pick a such that the last line above is o (1), and hencesup i sup δ ∈ G T (cid:12)(cid:12)(cid:12) (cid:80) t I it ( δ ) (cid:12)(cid:12)(cid:12) (cid:107) δ (cid:107) / = O P ( (cid:96) / n,T T / ) . 
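The exponential bound used above is an instance of Bernstein's inequality: for independent, centered random variables $X_1, \dots, X_T$ with $|X_t| \le b$ and $\mathrm{Var}(X_t) \le \sigma^2$, one has $P(|\sum_t X_t| > x) \le 2\exp\big(-\tfrac{x^2/2}{T\sigma^2 + bx/3}\big)$. The sketch below compares this bound with a Monte Carlo tail frequency for uniform variables on $[-1, 1]$; the distribution, the threshold and the number of repetitions are chosen purely for illustration and are not taken from the paper.

```python
import math
import random


def bernstein_bound(x, T, sigma2, b):
    """Bernstein bound 2*exp(-(x^2/2) / (T*sigma2 + b*x/3)) for a sum of T terms."""
    return 2.0 * math.exp(-(x * x / 2.0) / (T * sigma2 + b * x / 3.0))


def empirical_tail(x, T, reps, seed=0):
    """Monte Carlo estimate of P(|sum_t X_t| > x) for X_t ~ Uniform[-1, 1]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        s = sum(rng.uniform(-1.0, 1.0) for _ in range(T))
        if abs(s) > x:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    T, x = 200, 20.0
    # Uniform[-1, 1] has variance 1/3 and is bounded by b = 1
    print("bound:    ", bernstein_bound(x, T, 1.0 / 3.0, 1.0))
    print("empirical:", empirical_tail(x, T, reps=2000))
```

In the proof the roles of $\sigma^2$ and $b$ are played by the variance and sup-norm bounds on $I_{it}(\delta)$, and the union bound over the grid $G_T$ and over $i$ then only costs a polynomial factor in $n$ and $T$.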
Panel Quantile regression with group fixed effects Finally, observe that, denoting by (cid:107) A (cid:107) ∞ the maximum norm of the entries of the matrix A ,sup i (cid:12)(cid:12)(cid:12) − T δ (cid:62) E [ Z i Z (cid:62) i f ε i | X i (0 | X i )] + (cid:88) t f ε i | X i (0 | X it )( Z (cid:62) it δ ) (cid:12)(cid:12)(cid:12) = sup i (cid:12)(cid:12)(cid:12) δ (cid:62) (cid:110) (cid:88) t Z i Z (cid:62) i f ε i | X i (0 | X i ) − E [ Z i Z (cid:62) i f ε i | X i (0 | X i )] (cid:111) δ (cid:12)(cid:12)(cid:12) (cid:46) (cid:107) δ (cid:107) sup i (cid:13)(cid:13)(cid:13) (cid:88) t Z i Z (cid:62) i f ε i | X i (0 | X i ) − E [ Z i Z (cid:62) i f ε i | X i (0 | X i )] (cid:13)(cid:13)(cid:13) = (cid:107) δ (cid:107) O P ( (cid:112) T log n )where the last line follows by a straightforward application of the Hoeffding inequality. Thusthe proof of Lemma 8.4 is complete. (cid:50) u and Volgushev 35 Normal error t errorn T 1 2 3 4 ≥ ≥ DGP1: Independence between α i and x it . Model 1: location shift model30 15 0 0 . 074 0 . 504 0 . 324 0 . 098 0 0 . 168 0 . 490 0 . 266 0 . . 002 0 . 803 0 . 167 0 . 028 0 0 . 007 0 . 744 0 . 202 0 . . 000 0 . 984 0 . 016 0 . 000 0 0 . 000 0 . 966 0 . 032 0 . . 056 0 . 502 0 . 315 0 . 128 0 0 . 122 0 . 428 0 . 319 0 . . 000 0 . 856 0 . 127 0 . 018 0 0 . 002 0 . 767 0 . 186 0 . . 000 0 . 992 0 . 008 0 . 000 0 0 . 000 0 . 978 0 . 021 0 . . 040 0 . 465 0 . 339 0 . 156 0 0 . 113 0 . 392 0 . 322 0 . . 000 0 . 872 0 . 114 0 . 013 0 0 . 000 0 . 778 0 . 182 0 . . 000 0 . 996 0 . 004 0 . 000 0 0 . 000 0 . 984 0 . 016 0 . . 059 0 . 512 0 . 336 0 . 092 0 0 . 158 0 . 496 0 . 266 0 . . 000 0 . 814 0 . 159 0 . 026 0 0 . 006 0 . 759 0 . 198 0 . . 000 0 . 982 0 . 017 0 . 000 0 0 . 000 0 . 964 0 . 034 0 . . 038 0 . 520 0 . 324 0 . 118 0 0 . 112 0 . 437 0 . 330 0 . . 000 0 . 857 0 . 126 0 . 016 0 0 . 002 0 . 776 0 . 182 0 . . 000 0 . 994 0 . 006 0 . 000 0 0 . 000 0 . 980 0 . 020 0 . . 032 0 . 506 0 . 318 0 . 144 0 0 . 108 0 . 372 0 . 328 0 . . 000 0 . 
876 0 . 110 0 . 013 0 0 . 001 0 . 794 0 . 170 0 . . 000 0 . 992 0 . 008 0 . 000 0 0 . 000 0 . 979 0 . 020 0 . DGP2: Correlation between α i and x it . Model 1: location shift model30 15 0 0 . 080 0 . 496 0 . 320 0 . 104 0 0 . 173 0 . 478 0 . 268 0 . . 003 0 . 788 0 . 178 0 . 031 0 0 . 012 0 . 722 0 . 229 0 . . 000 0 . 986 0 . 014 0 . 000 0 0 . 000 0 . 970 0 . 028 0 . . 062 0 . 492 0 . 314 0 . 132 0 0 . 128 0 . 426 0 . 315 0 . . 000 0 . 852 0 . 135 0 . 013 0 0 . 002 0 . 736 0 . 217 0 . . 000 0 . 992 0 . 008 0 . 000 0 0 . 000 0 . 980 0 . 020 0 . . 045 0 . 463 0 . 332 0 . 160 0 0 . 116 0 . 378 0 . 327 0 . . 000 0 . 854 0 . 133 0 . 012 0 0 . 002 0 . 732 0 . 220 0 . . 000 0 . 994 0 . 006 0 . 000 0 0 . 000 0 . 976 0 . 024 0 . . 138 0 . 453 0 . 330 0 . 078 0 0 . 228 0 . 456 0 . 249 0 . . 008 0 . 724 0 . 224 0 . 044 0 0 . 040 0 . 666 0 . 251 0 . . 000 0 . 972 0 . 025 0 . 003 0 0 . 000 0 . 923 0 . 068 0 . . 099 0 . 442 0 . 312 0 . 147 0 0 . 176 0 . 367 0 . 313 0 . . 001 0 . 748 0 . 210 0 . 042 0 0 . 016 0 . 631 0 . 272 0 . . 000 0 . 976 0 . 024 0 . 000 0 0 . 000 0 . 960 0 . 036 0 . . 098 0 . 389 0 . 324 0 . 189 0 0 . 145 0 . 330 0 . 316 0 . . 001 0 . 756 0 . 207 0 . 036 0 0 . 016 0 . 612 0 . 288 0 . . 000 0 . 978 0 . 022 0 . 000 0 0 . 000 0 . 950 0 . 050 0 . Table 1. Frequency of estimated number of groups as k = 1 , . . . K . We aggregate the frequencyfor K ≥ K = 3. Results are based on 2000simulation repetitions for quantile level τ = 0 . Panel Quantile regression with group fixed effects Normal error t errorn T 1 2 3 4 ≥ ≥ DGP1: Independence between α i and x it . Model 1: location shift model30 15 0 0 . 159 0 . 484 0 . 284 0 . 072 0 . 002 0 . 330 0 . 408 0 . 210 0 . . 011 0 . 734 0 . 207 0 . 048 0 . 000 0 . 200 0 . 560 0 . 207 0 . . 000 0 . 966 0 . 032 0 . 002 0 . 000 0 . 011 0 . 866 0 . 112 0 . . 129 0 . 398 0 . 325 0 . 148 0 . 000 0 . 188 0 . 337 0 . 306 0 . . 002 0 . 760 0 . 198 0 . 040 0 . 000 0 . 105 0 . 472 0 . 328 0 . . 000 0 . 970 0 . 029 0 . 000 0 . 000 0 . 
000 0 . 861 0 . 128 0 . . 110 0 . 364 0 . 331 0 . 196 0 . 001 0 . 142 0 . 314 0 . 293 0 . . 000 0 . 768 0 . 194 0 . 038 0 . 000 0 . 072 0 . 450 0 . 341 0 . . 000 0 . 966 0 . 034 0 . 000 0 . 000 0 . 000 0 . 852 0 . 136 0 . . 158 0 . 472 0 . 300 0 . 070 0 . 002 0 . 334 0 . 390 0 . 214 0 . . 007 0 . 754 0 . 200 0 . 039 0 . 000 0 . 179 0 . 581 0 . 216 0 . . 000 0 . 968 0 . 031 0 . 001 0 . 000 0 . 007 0 . 868 0 . 118 0 . . 114 0 . 404 0 . 334 0 . 147 0 . 000 0 . 178 0 . 342 0 . 306 0 . . 002 0 . 771 0 . 188 0 . 038 0 . 000 0 . 082 0 . 474 0 . 353 0 . . 000 0 . 974 0 . 025 0 . 001 0 . 000 0 . 000 0 . 870 0 . 123 0 . . 093 0 . 354 0 . 346 0 . 208 0 . 000 0 . 134 0 . 292 0 . 310 0 . . 000 0 . 766 0 . 198 0 . 036 0 . 000 0 . 058 0 . 456 0 . 342 0 . . 000 0 . 955 0 . 044 0 . 000 0 . 000 0 . 000 0 . 854 0 . 132 0 . DGP2: Correlation between α i and x it . Model 1: location shift model30 15 0 0 . 172 0 . 494 0 . 265 0 . 070 0 . 006 0 . 327 0 . 416 0 . 200 0 . . 014 0 . 716 0 . 222 0 . 050 0 . 000 0 . 214 0 . 552 0 . 208 0 . . 000 0 . 954 0 . 044 0 . 002 0 . 000 0 . 014 0 . 844 0 . 132 0 . . 137 0 . 414 0 . 308 0 . 141 0 . 001 0 . 178 0 . 342 0 . 304 0 . . 003 0 . 758 0 . 202 0 . 037 0 . 000 0 . 109 0 . 464 0 . 328 0 . . 000 0 . 974 0 . 026 0 . 000 0 . 000 0 . 000 0 . 869 0 . 118 0 . . 110 0 . 349 0 . 336 0 . 205 0 . 002 0 . 139 0 . 320 0 . 294 0 . . 000 0 . 768 0 . 198 0 . 034 0 . 000 0 . 074 0 . 447 0 . 349 0 . . 000 0 . 964 0 . 034 0 . 002 0 . 000 0 . 000 0 . 836 0 . 149 0 . . 228 0 . 446 0 . 264 0 . 062 0 . 014 0 . 326 0 . 414 0 . 203 0 . . 038 0 . 660 0 . 257 0 . 045 0 . 000 0 . 276 0 . 502 0 . 196 0 . . 000 0 . 932 0 . 064 0 . 004 0 . 000 0 . 039 0 . 769 0 . 180 0 . . 176 0 . 377 0 . 304 0 . 144 0 . 004 0 . 169 0 . 338 0 . 303 0 . . 018 0 . 652 0 . 257 0 . 073 0 . 000 0 . 157 0 . 420 0 . 311 0 . . 000 0 . 954 0 . 044 0 . 002 0 . 000 0 . 004 0 . 775 0 . 192 0 . . 146 0 . 309 0 . 342 0 . 204 0 . 006 0 . 108 0 . 296 0 . 312 0 . . 010 0 . 660 0 . 258 0 . 072 0 . 000 0 . 
118 0 . 373 0 . 342 0 . . 000 0 . 948 0 . 050 0 . 002 0 . 000 0 . 002 0 . 753 0 . 212 0 . Table 2. Frequency of estimated number of groups as k = 1 , . . . K . We aggregate the frequencyfor K ≥ K = 3. Results are based on 2000simulation repetitions for quantile level τ = 0 . u and Volgushev 37 N o r m a l e rr o r t e rr o r P Q R - F E g r o up Q R F EP Q R - F E g r o up Q R F E n T B i a s R M S E C o v e r ag e B i a s R M S E C o v e r ag e B i a s R M S E C o v e r ag e B i a s R M S E C o v e r ag e D G P : I nd e p e nd e n c e b e t w ee n α i a nd x i t . M o d e l :l o c a t i o n s h i f t m o d e l − . . . − . . . . . . . . . 931 30300 . . . . . . − . . . − . . . 908 30600 . . . . . . . . . . . . 934 60150 . . . . . . . . . . . . 780 60300 . . . . . . − . . . − . . . 942 60600 . . . . . . . . . . . . 942 9015 − . . . . . . − . . . − . . . 828 90300 . . . . . . . . . . . . 968 90600 . . . . . . . . . . . . M o d e l :l o c a t i o n - s c a l e s h i f t m o d e l . . . − . . . . . . . . . 932 30300 . . . . . . . . . . . . 890 30600 . . . . . . . . . . . . 935 60150 . . . . . . . . . . . . 774 60300 . . . . . . . . . . . . 946 60600 . . . . . . . . . . . . 938 90150 . . . . . . − . . . − . . . 830 90300 . . . . . . . . . . . . 966 90600 . . . . . . . . . . . . D G P : C o rr e l a t i o nb e t w ee n α i a nd x i t . M o d e l :l o c a t i o n s h i f t m o d e l . . . − . . . . . . . . . 931 30300 . . . . . . . . . − . . . 908 30600 . . . . . . . . . . . . 934 60150 . . . . . . . . . . . . 780 60300 . . . . . . . . . − . . . 942 60600 . . . . . . . . . . . . 942 90150 . . . . . . . . . − . . . 828 90300 . . . . . . . . . . . . 968 90600 . . . . . . . . . . . . M o d e l :l o c a t i o n - s c a l e s h i f t m o d e l . . . − . . . . . . . . . 933 30300 . . . . . . . . . − . . . 892 30600 . . . . . . . . . . . . 938 60150 . . . . . . . . . . . . 775 60300 . . . . . . . . . . . . 943 60600 . . . . . . . . . . . . 938 90150 . . . . . . . . . − . . . 
[Table 3 body: for each DGP (DGP1: independence between $\alpha_i$ and $x_{it}$; DGP2: correlation between $\alpha_i$ and $x_{it}$) and each model (location shift, location-scale shift), rows indexed by $(n, T)$ and columns Bias, RMSE and Coverage for PQR-FEgroup and QRFE under normal and $t$ errors.]

Table 3. Comparison of bias and root mean squared error of $\hat\beta(\tau)$ based on the group fixed effect quantile regression (PQR-FEgroup) and the fixed effect quantile regression estimator (QRFE). Results are based on […] simulation repetitions for quantile level $\tau =$ […]. DGP1 assumes that $x_{it}$ is independent of the fixed effect $\alpha_i$. DGP2 assumes that $x_{it} =$ […] $\alpha_i + \gamma_i + v_{it}$.

[Table 4 body: same layout as Table 3, with rows indexed by $(n, T)$ and columns Bias, RMSE and Coverage for PQR-FEgroup and QRFE under normal and $t$ errors.]

Table 4. Comparison of bias and root mean squared error of $\hat\beta(\tau)$ based on the group fixed effect quantile regression (PQR-FEgroup) and the fixed effect quantile regression estimator (QRFE). Results are based on […] simulation repetitions for quantile level $\tau =$ […]. DGP1 assumes that $x_{it}$ is independent of the fixed effect $\alpha_i$. DGP2 assumes that $x_{it} =$ […] $\alpha_i + \gamma_i + v_{it}$.

Normal error | t error
n T | Perfect Match, Avg Match, Std Error | Perfect Match, Avg Match, Std Error
DGP1: Independence between $\alpha_i$ and $x_{it}$. Model 1: location shift model
30 15 0 . 025 0 . 659 0 . 285 0 . 008 0 . 636 0 . . 384 0 . 877 0 . 222 0 . 223 0 . 820 0 . . 918 0 . 989 0 . 069 0 . 822 0 . 976 0 . . 003 0 . 651 0 . 307 0 . 000 0 . 590 0 . . 225 0 . 900 0 . 202 0 . 088 0 . 841 0 . . 884 0 . 992 0 . 064 0 . 735 0 . 985 0 . . 000 0 . 646 0 . 317 0 . 000 0 . 562 0 . . 119 0 . 912 0 . 187 0 . 024 0 . 839 0 . . 856 0 . 995 0 . 044 0 . 654 0 . 985 0 . . 028 0 . 670 0 . 285 0 . 012 0 . 638 0 . . 394 0 . 880 0 . 221 0 . 225 0 . 836 0 . . 906 0 . 988 0 . 072 0 . 800 0 . 975 0 . . 002 0 . 663 0 . 309 0 . 000 0 . 595 0 . . 218 0 . 903 0 . 201 0 . 084 0 . 850 0 . . 856 0 . 994 0 . 047 0 . 708 0 . 983 0 . . 000 0 . 662 0 . 323 0 . 000 0 . 
550 0 . . 118 0 . 908 0 . 198 0 . 026 0 . 851 0 . . 827 0 . 994 0 . 051 0 . 634 0 . 984 0 . DGP2: Correlation between α i and x it . Model 1: location shift model30 15 0 . 024 0 . 655 0 . 289 0 . 008 0 . 624 0 . . 365 0 . 865 0 . 234 0 . 199 0 . 817 0 . . 923 0 . 991 0 . 061 0 . 824 0 . 978 0 . . 002 0 . 650 0 . 305 0 . 000 0 . 590 0 . . 202 0 . 895 0 . 212 0 . 076 0 . 829 0 . . 884 0 . 993 0 . 052 0 . 738 0 . 985 0 . . 000 0 . 643 0 . 318 0 . 000 0 . 554 0 . . 104 0 . 900 0 . 208 0 . 025 0 . 817 0 . . 850 0 . 994 0 . 053 0 . 646 0 . 984 0 . . 006 0 . 636 0 . 263 0 . 002 0 . 608 0 . . 215 0 . 826 0 . 250 0 . 119 0 . 780 0 . . 818 0 . 979 0 . 096 0 . 649 0 . 949 0 . . 000 0 . 612 0 . 293 0 . 000 0 . 545 0 . . 083 0 . 837 0 . 254 0 . 026 0 . 764 0 . . 715 0 . 981 0 . 091 0 . 516 0 . 968 0 . . 000 0 . 585 0 . 312 0 . 000 0 . 514 0 . . 023 0 . 832 0 . 268 0 . 006 0 . 744 0 . . 648 0 . 983 0 . 085 0 . 398 0 . 962 0 . Table 5. Membership estimation for τ = 0 . Panel Quantile regression with group fixed effects Normal error t errorn T Perfect Match Avg Match Std Error Perfect Match Avg Match Std Error DGP1: Independence between α i and x it . Model 1: location shift model30 15 0 . 012 0 . 666 0 . 240 0 . 000 0 . 574 0 . . 242 0 . 849 0 . 220 0 . 034 0 . 737 0 . . 846 0 . 982 0 . 082 0 . 471 0 . 923 0 . . 000 0 . 634 0 . 258 0 . 000 0 . 558 0 . . 098 0 . 873 0 . 198 0 . 002 0 . 732 0 . . 754 0 . 984 0 . 072 0 . 260 0 . 934 0 . . 000 0 . 633 0 . 267 0 . 000 0 . 530 0 . . 041 0 . 880 0 . 201 0 . 000 0 . 734 0 . . 690 0 . 984 0 . 071 0 . 182 0 . 939 0 . . 014 0 . 668 0 . 240 0 . 000 0 . 573 0 . . 248 0 . 858 0 . 215 0 . 040 0 . 751 0 . . 824 0 . 980 0 . 086 0 . 451 0 . 926 0 . . 000 0 . 640 0 . 260 0 . 000 0 . 566 0 . . 100 0 . 875 0 . 204 0 . 002 0 . 747 0 . . 729 0 . 985 0 . 069 0 . 270 0 . 938 0 . . 000 0 . 630 0 . 278 0 . 000 0 . 531 0 . . 045 0 . 885 0 . 195 0 . 001 0 . 741 0 . . 646 0 . 978 0 . 084 0 . 164 0 . 938 0 . DGP2: Correlation between α i and x it . 
Model 1: location shift model30 15 0 . 012 0 . 674 0 . 187 0 . 000 0 . 584 0 . . 255 0 . 873 0 . 163 0 . 052 0 . 758 0 . . 772 0 . 977 0 . 078 0 . 368 0 . 908 0 . . 000 0 . 727 0 . 189 0 . 000 0 . 523 0 . . 188 0 . 921 0 . 124 0 . 008 0 . 699 0 . . 855 0 . 994 0 . 032 0 . 258 0 . 911 0 . . 000 0 . 624 0 . 207 0 . 000 0 . 482 0 . . 048 0 . 868 0 . 177 0 . 002 0 . 667 0 . . 665 0 . 989 0 . 044 0 . 172 0 . 924 0 . . 002 0 . 618 0 . 161 0 . 000 0 . 538 0 . . 072 0 . 776 0 . 183 0 . 008 0 . 665 0 . . 468 0 . 945 0 . 103 0 . 140 0 . 819 0 . . 000 0 . 556 0 . 162 0 . 000 0 . 476 0 . . 000 0 . 761 0 . 188 0 . 000 0 . 591 0 . . 355 0 . 954 0 . 094 0 . 055 0 . 825 0 . . 000 0 . 527 0 . 172 0 . 000 0 . 428 0 . . 002 0 . 748 0 . 212 0 . 000 0 . 554 0 . . 240 0 . 953 0 . 103 0 . 010 0 . 805 0 . Table 6. Membership estimation for τ = 0 . 75 for two different error distributions: Perfect Match states thepercentage of perfect membership estimation out of the 2000 repetitions. Average match reports the meanof the percentage of correct membership estimation and the standard error reports the associated standarddeviation. u and Volgushev 41 Appendix A. Additional simulation results A.1. More investigations on the tuning parameters in the IC criteria. In the mainmanuscript, we reported the influence of the turning parameters in the IC criteria on theperformance of the group and common parameters estimation for location-scale shift modelwith t errors on DGP1 where the predictor X it and the fixed effects α i are independent.Figure 6 and Figure 7 shows the corresponding results for location shift model with t erroron DGP1. The corresponding plots for DGP2 are collected in Figure 8 - Figure 11. l l l l l l l l l l l l l l l . . . Prop of correct constant P r op o f c o rr e c t g r oup s l l l l l l l l l l l l l l l l l n = 30n = 60n = 90 l l l l l l l l l l l l l l l . . . 
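The three performance measures tracked across the IC constants in Figures 6 to 11 (proportion of correctly estimated number of groups, RMSE of the common parameter estimate, and coverage of the nominal 95% interval) are plain averages over simulation repetitions. The following is a minimal Python sketch of such a summary, assuming a scalar common parameter and a normal-approximation interval of the form estimate ± 1.96 standard errors. The function name `summarize_repetitions` and the toy inputs are hypothetical illustrations and are not taken from the paper's code or results.

```python
import numpy as np


def summarize_repetitions(k_hat, k_true, beta_hat, se_hat, beta_true, z=1.96):
    """Summarize simulation repetitions for one value of the IC constant.

    k_hat     : estimated number of groups in each repetition
    k_true    : true number of groups
    beta_hat  : estimate of the (scalar) common parameter in each repetition
    se_hat    : estimated standard error in each repetition
    beta_true : true value of the common parameter
    z         : normal critical value (1.96 for a nominal 95% interval; assumption)
    """
    k_hat = np.asarray(k_hat)
    beta_hat = np.asarray(beta_hat, dtype=float)
    se_hat = np.asarray(se_hat, dtype=float)

    # Share of repetitions in which the number of groups is recovered exactly.
    prop_correct = float(np.mean(k_hat == k_true))

    # Root mean squared error of the common parameter estimate.
    rmse = float(np.sqrt(np.mean((beta_hat - beta_true) ** 2)))

    # Share of repetitions whose interval contains the true value.
    lower = beta_hat - z * se_hat
    upper = beta_hat + z * se_hat
    coverage = float(np.mean((lower <= beta_true) & (beta_true <= upper)))

    return {"prop_correct": prop_correct, "rmse": rmse, "coverage": coverage}


# Toy inputs (made up for illustration, four repetitions).
result = summarize_repetitions(
    k_hat=[3, 3, 2, 3],
    k_true=3,
    beta_hat=[1.0, 1.2, 0.8, 1.0],
    se_hat=[0.1, 0.05, 0.2, 0.1],
    beta_true=1.0,
)
print(result)
```

In a study of the kind described here, such a summary would be computed once for each value of the constant on the grid and for each (n, T) pair, giving one point per curve in each panel.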
[Figure 6: 3 × 3 grid of panels. Rows: proportion of correct groups ("Prop of correct"), "RMSE of beta", "95% Coverage". Columns: T = 15, T = 30, T = 60. Horizontal axis: constant. Curves: n = 30, n = 60, n = 90.]

Figure 6. Effect of using different constants in $p_{n,T}$ for the IC criteria for the location shift model with t errors on DGP1. The constants lie on an equally spaced grid on [0.…, ….3] with width 0.01. The three columns represent different magnitudes of T, while each figure in a row overlays the curves for $n \in \{30, 60, 90\}$ for the various performance measures. The first row plots the proportion of correctly estimated number of groups. The second row plots the RMSE of $\hat\beta^{IC}(\tau)$ where $\tau$ = 0.…
[Figure 7: 3 × 3 grid of panels. Rows: proportion of correct groups ("Prop of correct"), "RMSE of beta", "95% Coverage". Columns: T = 15, T = 30, T = 60. Horizontal axis: constant. Curves: n = 30, n = 60, n = 90.]

Figure 7. Effect of using different constants in $p_{n,T}$ for the IC criteria for the location shift model with t errors on DGP1. The constants lie on an equally spaced grid on [0.…, ….3] with width 0.01. The three columns represent different magnitudes of T, while each figure in a row overlays the curves for $n \in \{30, 60, 90\}$ for the various performance measures. The first row plots the proportion of correctly estimated number of groups. The second row plots the RMSE of $\hat\beta^{IC}(\tau)$ where $\tau = 0.75$, and the third plots the coverage rate for nominal size 5%.
[Figure 8: 3 × 3 grid of panels. Rows: proportion of correct groups ("Prop of correct"), "RMSE of beta", "95% Coverage". Columns: T = 15, T = 30, T = 60. Horizontal axis: constant. Curves: n = 30, n = 60, n = 90.]

Figure 8. Effect of using different constants in $p_{n,T}$ for the IC criteria for the location-scale shift model with t errors on DGP2. The constants lie on an equally spaced grid on [0.…, ….3] with width 0.01. The three columns represent different magnitudes of T, while each figure in a row overlays the curves for $n \in \{30, 60, 90\}$ for the various performance measures. The first row plots the proportion of correctly estimated number of groups. The second row plots the RMSE of $\hat\beta^{IC}(\tau)$ where $\tau$ = 0.…
[Figure 9: 3 × 3 grid of panels. Rows: proportion of correct groups ("Prop of correct"), "RMSE of beta", "95% Coverage". Columns: T = 15, T = 30, T = 60. Horizontal axis: constant. Curves: n = 30, n = 60, n = 90.]

Figure 9. Effect of using different constants in $p_{n,T}$ for the IC criteria for the location-scale shift model with t errors on DGP2. The constants lie on an equally spaced grid on [0.…, ….3] with width 0.01. The three columns represent different magnitudes of T, while each figure in a row overlays the curves for $n \in \{30, 60, 90\}$ for the various performance measures. The first row plots the proportion of correctly estimated number of groups. The second row plots the RMSE of $\hat\beta^{IC}(\tau)$ where $\tau = 0.75$, and the third plots the coverage rate for nominal size 5%. Results are based on 400 repetitions.
[Figure 10: 3 × 3 grid of panels. Rows: proportion of correct groups ("Prop of correct"), "RMSE of beta", "95% Coverage". Columns: T = 15, T = 30, T = 60. Horizontal axis: constant. Curves: n = 30, n = 60, n = 90.]

Figure 10. Effect of using different constants in $p_{n,T}$ for the IC criteria for the location shift model with t errors on DGP2. The constants lie on an equally spaced grid on [0.…, ….3] with width 0.01. The three columns represent different magnitudes of T, while each figure in a row overlays the curves for $n \in \{30, 60, 90\}$ for the various performance measures. The first row plots the proportion of correctly estimated number of groups. The second row plots the RMSE of $\hat\beta^{IC}(\tau)$ where $\tau$ = 0.…
[Figure 11: 3 × 3 grid of panels. Rows: proportion of correct groups ("Prop of correct"), "RMSE of beta", "95% Coverage". Columns: T = 15, T = 30, T = 60. Horizontal axis: constant. Curves: n = 30, n = 60, n = 90.]

Figure 11. Effect of using different constants in $p_{n,T}$ for the IC criteria for the location shift model with t errors on DGP2. The constants lie on an equally spaced grid on [0.…, ….3] with width 0.01. The three columns represent different magnitudes of T, while each figure in a row overlays the curves for $n \in \{30, 60, 90\}$ for the various performance measures. The first row plots the proportion of correctly estimated number of groups. The second row plots the RMSE of $\hat\beta^{IC}(\tau)$ where $\tau = 0.75$, and the third plots the coverage rate for nominal size 5%. Results are based on 400 repetitions.

Appendix B. Additional Empirical Application Analysis

As a robustness check, we report here the results of the empirical analysis for model (10) without the incarceration rate as a control covariate. Figure 12 presents the corresponding common parameter estimates using the fixed effect approach, while Figure 13 presents the results using our proposed grouped fixed effect estimator. Figure 14 reports the raw and the grouped fixed effect estimates.

[Figure 12: five panels, one per covariate (lawind, rpcpi, afam1019, afam2029, afam3039), plotting coefficient estimates against the quantile level (horizontal axis: taus).]

Figure 12. Panel data quantile regression estimates with state fixed effects for various $\tau$ based on model specification (10) excluding the incarceration rate as a control variable. For each of the five quantile levels $\tau$ considered, the solid black points plot the coefficient estimates for the effects of the RTC law adoption and other control variables on the violent crime rate based on panel data on 51 U.S. states for 1977 - 2010. The shaded areas are the pointwise 95% confidence intervals, where the standard errors are computed using the Hendricks-Koenker sandwich covariance matrix estimates with the Hall-Sheather bandwidth rule. The red solid line marks the fixed effect panel data mean regression estimates, and the dotted red lines plot the 95% confidence interval with robust clustered standard errors.

[Figure 13: five panels, one per covariate (lawind, rpcpi, afam1019, afam2029, afam3039), plotting coefficient estimates against the quantile level (horizontal axis: taus).]

Figure 13. Panel data quantile regression estimates with grouped state fixed effects for various $\tau$ based on model specification (10) excluding the incarceration rate as a control variable. For each of the five quantile levels $\tau$ considered, the solid black points plot the coefficient estimates for the effects of the RTC law adoption and other control variables on the violent crime rate based on the proposed methodology with panel data on 51 U.S. states for 1977 - 2010. The shaded areas are the pointwise 95% confidence intervals, where the standard errors are computed using the Hendricks-Koenker sandwich covariance matrix estimates with the Hall-Sheather bandwidth rule. The red solid line marks the fixed effect panel data mean regression estimates, and the dotted red lines plot the 95% confidence interval with robust clustered standard errors.

[Figure 14: three panels titled tau = 0.1, tau = 0.5 and tau = 0.9. Horizontal axis: Index. Vertical axis: Fixed Effects.]

Figure 14. The estimated state fixed effect and the corresponding estimated group structure for various $\tau$ based on model specification (10) excluding the incarceration rate as a control variable: For $\tau \in \{0.1, 0.5, 0.9\}$