Panel Data Quantile Regression with Grouped Fixed Effects

Abstract

This paper introduces estimation methods for grouped latent heterogeneity in panel data quantile regression. We assume that the observed individuals come from a heterogeneous population with a finite number of types. The number of types and group membership is not assumed to be known in advance and is estimated by means of a convex optimization problem. We provide conditions under which group membership is estimated consistently and establish asymptotic normality of the resulting estimators. Simulations show that the method works well in finite samples when T is reasonably large. To illustrate the proposed methodology we study the effects of the adoption of Right-to-Carry concealed weapon laws on violent crime rates using panel data of 51 U.S. states from 1977 - 2010.


JIAYING GU AND STANISLAV VOLGUSHEV

1. Introduction

It is widely accepted in applied Econometrics that individual latent effects constitute an important feature of many economic applications. When panel data are available, a common approach is to incorporate latent structures in a completely nonrestrictive way, i.e. the fixed effect approach. The fixed effect approach is attractive as it imposes minimal assumptions on the structure of the latent effects and on the correlation between the latent effects and the observed covariates and hence has become a very common empirical tool (see Hsiao (2003) for a textbook treatment).

A major challenge of the fixed effects approach lies in the fact that it introduces a large number of parameters which grows linearly with the number of individuals. For a few specific models this can be avoided by differencing out individual effects and learning about the common parameter of interest. However, for most models, including quantile regression, this simple differencing method no longer exists. The literature contains various approaches that put additional structure on latent effects in order to reduce the number of parameters and obtain more interpretable models. One popular approach is to introduce some parametric distributional structure on the latent effects, see for example Mundlak (1978), Chamberlain (1982) and the correlated random effects literature. An alternative is to assume that the fixed effects have a group structure and hence only take a few distinct values, which is the approach we take in this paper.

There is ample evidence from empirical studies that it is often reasonable to consider a number of homogeneous groups (clusters) within a heterogeneous population. This discrete approach was taken by Heckman and Singer (1982) for duration analysis of unemployment spells of a heterogeneous population of workers. Bester and Hansen (2016) argue that in many applications individuals or firms are grouped naturally by some observable covariates such as classes, schools or industry codes. It is also widely accepted in the discrete choice model literature that individual agents are classified as a number of latent types (for instance Keane and Wolpin (1997) among many others).

Version: August 7, 2018. The authors would like to thank Jacob Bien for bringing the convex clustering literature to their attention. We are also grateful to two anonymous Referees and the Associate Editor whose comments helped to considerably improve the presentation of this manuscript.

Estimating cluster structure has a long history in Statistics and Economics, and has generated a rich and mature literature. A general overview is given in Kaufman and Rousseeuw (2009). Among the many available clustering algorithms, the k-means algorithm (MacQueen (1967)) is one of the most popular methods. It has been successfully utilized in many economic applications, for instance Lin and Ng (2012), Bonhomme and Manresa (2015) and Ando and Bai (2016). Finite mixture models provide an alternative, likelihood based approach. In the latter, grouping is usually achieved by maximizing the likelihood of the observed data. Sun (2005) builds a multinomial logistic regression model to infer the group pattern while nonparametric finite mixture models are considered in Allman, Mathias, and Rhodes (2009) and Kasahara and Shimotsu (2009) among many others.

The focus of the present paper is on quantile regression for panel data with grouped individual heterogeneity. Panel data quantile regression has recently attracted a lot of attention, and there is a rich and growing literature that proposes various approaches to dealing with individual heterogeneity in this setting. In a pioneering contribution, Koenker (2004) takes the fixed effect approach and introduces individual latent effects as location shifts.
These individual effects are regularized through an $\ell_1$ penalty which shrinks them towards a common value. Lamarche (2010) proposes an optimal way to choose the corresponding penalty parameter in order to optimize the asymptotic efficiency of the common parameters of the conditional quantile function, see Harding and Lamarche (2017) for an extension of this approach. Another line of work that focuses on estimating common parameters while putting no structure on individual effects includes Kato, Galvao, and Montes-Rojas (2012), Galvao and Wang (2015) and Galvao and Kato (2016). Alternative approaches have also emerged. Abrevaya and Dahl (2008) take a random effect view of these individual latent effects. They consider a correlated random-effect model in the spirit of Chamberlain (1982) where the individual effects are modeled through a linear regression of some covariates. This is further developed in Arellano and Bonhomme (2016) and Chetverikov, Larsen, and Palmer (2016) where the conditional quantile function of the unobserved heterogeneity is modelled as a function of observable covariates.

Our contribution, which builds upon Koenker (2004), is a linear quantile regression method that accommodates grouped fixed effects. The advantages of our proposal over existing proposals are twofold. First, grouped fixed effects maintain the merit of unrestricted correlation between the latent effects and the observables and strike a good balance between the classical fixed effects approach and the other extreme which completely ignores latent heterogeneity. Second, in contrast to Koenker (2004), where the fixed effects are treated as nuisance parameters and are regularized to achieve a more efficient estimator for the global parameter, our method allows the researcher to learn the particular group structure of the latent effects together with common parameters of interest in the model.
To the best of our knowledge, panel data quantile regression with grouped fixed effects has not been considered in the literature before. The only paper that goes in this direction is Su, Shi, and Phillips (2016). While the general framework developed in that paper does include a version of quantile regression with smoothed quantile objective function, the theoretical analysis requires the smoothing parameter to be fixed. This results in a non-vanishing bias and hence does not correspond to quantile regression in a strict sense.

For related literature on non-separable panel data models see Evdokimov (2010), Chernozhukov, Fernández-Val, Hoderlein, Holzmann, and Newey (2015) and the references therein.

We do not assume any prior knowledge of the group structure and combine the quantile regression loss function with the recently proposed convex clustering penalty of Hocking, Vert, Bach, and Joulin (2011). The convex clustering method introduces an $\ell_1$-constraint on the pair-wise difference of the individual fixed effects, which tends to push the fixed effects into clusters. The number of clusters is controlled by a penalty parameter. The resulting optimization problem remains convex and can be solved in a fast and reliable fashion. Further modifications and a theoretical analysis of convex clustering were considered in Zhu, Xu, Leng, and Yan (2014), Tan and Witten (2015) and Radchenko and Mukherjee (2017). All of those authors combine $\ell_1$ penalties with the classical $\ell_2$ loss, and only consider clustering for cross-sectional data.
Their theoretical results are not directly applicable to panel data or the non-smooth quantile loss function which is the main objective in this paper (all of the available theoretical results explicitly make use of the differentiability of the $\ell_2$ loss function in their proofs).

Our main theoretical contribution is to show consistency of the estimated grouping for a suitable range of penalty parameters when $n$ and $T$ tend to infinity jointly. We also propose a completely data-driven information criterion that facilitates the practical implementation of the method and prove its consistency for group selection as well as asymptotic normality of the resulting parameter estimators.

The remaining part of this paper is organized as follows. Section 2 contains a detailed description of the proposed methodology and provides details on its practical implementation. Assumptions and theoretical results are included in Section 3. Section 4 presents the convex optimization problem and its computational details. Monte Carlo simulation results are included in Section 5, where we investigate the finite sample behavior of the proposed methodology. In Section 6 we apply the method to an empirical application, studying the effect of the adoption of Right-to-Carry concealed weapon laws on violent crime rates using panel data of 51 U.S. states from 1977 - 2010. All proofs are collected in Section 8 while additional simulation results and details for the empirical application are relegated to the Appendix.

2. Methodology

Assume that for individuals $i = 1, \dots, n$ we observe repeated measures $(X_{it}, Y_{it})_{t=1,\dots,T}$ where $X_{it}$ denote covariates and $Y_{it}$ are responses. We shall maintain the assumption that data are i.i.d. within individuals and independent across individuals. The main object of interest in this paper is the conditional $\tau$-quantile function of $Y_i$ given $X_i$, which we will denote by $q_{i,\tau}$. We assume that $q_{i,\tau}$ is of the form

$$q_{i,\tau}(x) = \beta(\tau)^\top x + \alpha_i(\tau), \quad i = 1, \dots, n,$$

with individual fixed effects $\alpha_i(\tau)$ taking only a finite number, say $K$, of different values, say $\alpha_{(01)}(\tau), \dots, \alpha_{(0K)}(\tau)$. We explicitly allow the group membership, and even the number of groups, to be unknown and to depend on $\tau$ but will not stress this dependence in the notation for the sake of simplicity. Our main objective is to jointly estimate the number of groups, the unknown group structure, and the parameters $\alpha_{(01)}, \dots, \alpha_{(0K)}, \beta$ from the observations.

Here, $T$ is assumed to be the same across individuals for notational simplicity. All results that follow can be extended to individual-specific values of $T_i$ as long as the ratio $(\max_{i=1,\dots,n} T_i)/(\min_{i=1,\dots,n} T_i)$ is uniformly bounded. In this case the theory goes through without changes if all instances of $T$ are replaced by $n^{-1}\sum_{i=1,\dots,n} T_i$.

We follow Koenker (2004) in treating the $\alpha_i$ as fixed parameters. An alternative approach which leads to equivalent results is to treat the $\alpha_i$ as random (with no restrictions placed on the dependence with $X_{it}$). In this case the model can be written as $Q_{Y_{it} | X_{it}, \alpha_i(\tau)}(\tau) = \beta(\tau)^\top x + \alpha_i(\tau)$; here $Q_{Y_{it} | X_{it}, \alpha_i(\tau)}$ denotes the conditional quantile function of $Y_{it}$ given $(X_{it}, \alpha_i(\tau))$ (see for instance Kato, Galvao, and Montes-Rojas (2012), Galvao and Wang (2015) and Galvao and Kato (2016) for this interpretation). Both interpretations lead to the same asymptotic results.

As pointed out by a Referee, one could also consider combining the objective functions corresponding to several quantiles as was done in Koenker (2004) and force all coefficients $\alpha_i$ to be independent of $\tau$. This would result in efficiency gains if all $\alpha_i$ are purely location-shift effects but can introduce bias otherwise. We leave this extension for future research.

To achieve this, we consider penalized estimators of the form

$$(\hat\alpha_1, \dots, \hat\alpha_n, \hat\beta) := \arg\min_{\alpha_1, \dots, \alpha_n, \beta} \Theta(\alpha_1, \dots, \alpha_n, \beta) \tag{1}$$

where

$$\Theta(\alpha_1, \dots, \alpha_n, \beta) := \sum_{i,t} \rho_\tau(Y_{it} - X_{it}^\top \beta - \alpha_i) + \sum_{i \neq j} \lambda_{ij} |\alpha_i - \alpha_j|.$$

Here $\rho_\tau$ denotes the usual 'check function', i.e. $\rho_\tau(u) = u(\tau - 1\{u < 0\})$, and the weights $\lambda_{i,j}$ are allowed to depend on $n, T$ and the data; one particular choice is discussed below. The form of the penalty is motivated by the work of Hocking, Vert, Bach, and Joulin (2011). Intuitively, large values of $\lambda_{ij}$ will push different coefficients closer together and result in a clustered structure of the estimators $\hat\alpha_i$. High-level conditions on the weights $\lambda_{i,j}$ which guarantee consistency of the resulting grouping procedure are provided in Theorem 3.1.

There are various possible choices for the penalty parameters $\lambda_{i,j}$. We propose to use weights of the form

$$\check\lambda_{ij} := \lambda |\check\alpha_i - \check\alpha_j|^{-2} \tag{2}$$

where $(\check\alpha_1, \dots, \check\alpha_n)$ are the fixed effects quantile regression estimators

$$(\check\alpha_1, \dots, \check\alpha_n, \check\beta) := \arg\min_{\alpha_1, \dots, \alpha_n, \beta} \sum_{i,t} \rho_\tau(Y_{it} - X_{it}^\top \beta - \alpha_i) \tag{3}$$

of Kato, Galvao, and Montes-Rojas (2012) and $\lambda$ is a tuning parameter. This form of weighting by preliminary estimators is motivated by the work of Zou (2006) on the adaptive lasso. Intuitively, weighting by preliminary estimated distances tends to give smaller penalties to coefficients from different groups, thus reducing some of the bias that is typically present in the classical lasso.

Given the developments above, it remains to find a value for the tuning parameter $\lambda$.
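To make the objective in (1) with the weights in (2) concrete, the following is a minimal sketch in plain Python. It assumes a single scalar covariate and that the preliminary estimates from (3) are already available and pairwise distinct; the function names (`check_loss`, `adaptive_weights`, `penalized_objective`) are hypothetical and chosen for illustration only. The sketch evaluates the objective and does not minimize it.

```python
def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def adaptive_weights(alpha_prelim, lam):
    """Weights lam * |a_i - a_j|^(-2) as in (2), for all ordered pairs i != j.

    Assumes the preliminary estimates are pairwise distinct; a tie would
    lead to a division by zero.
    """
    n = len(alpha_prelim)
    weights = {}
    for i in range(n):
        for j in range(n):
            if i != j:
                dist = abs(alpha_prelim[i] - alpha_prelim[j])
                weights[(i, j)] = lam / dist ** 2
    return weights


def penalized_objective(y, x, alpha, beta, weights, tau):
    """Objective Theta from (1) for a scalar covariate.

    y[i][t], x[i][t]: response and covariate of individual i at time t.
    alpha[i]: candidate fixed effect of individual i; beta: common slope.
    """
    loss = sum(
        check_loss(y[i][t] - x[i][t] * beta - alpha[i], tau)
        for i in range(len(y))
        for t in range(len(y[i]))
    )
    penalty = sum(w * abs(alpha[i] - alpha[j]) for (i, j), w in weights.items())
    return loss + penalty
```

Because the weight of a pair is inversely proportional to the squared preliminary distance, pairs whose preliminary estimates are close receive a large penalty on their difference, while well-separated pairs are penalized only lightly.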
The high-level results in Theorem 3.1 together with findings in Kato, Galvao, and Montes-Rojas (2012) provide a theoretical range for those values (see the discussion following Theorem 3.1 for additional details), but this range is not directly useful in practice since only rates and not constants are provided. Moreover, despite the fact that the weights $\check\lambda_{ij} := \lambda |\check\alpha_i - \check\alpha_j|^{-2}$ lead to asymptotically unbiased estimates, bias can still be a problem in finite samples. A typical approach in the literature to reduce bias which results from lasso-type penalties is to view the lasso problem solution as a candidate model (in our case, a candidate grouping of $\alpha_i$) and re-fit based on this candidate model (see Belloni and Chernozhukov (2009) or Su, Shi, and Phillips (2016) among many others).

To deal with bias issues and the choice of $\lambda$ in practice, we propose to combine the re-fitting idea with a simple information criterion which will simultaneously reduce the bias problem and provide a simple way to select a final model. A formal description of our approach is given in Algorithm 1.

Theorem 3.2 provides a formal justification of Algorithm 1 under high-level conditions on $\hat C, p_{n,T}$. In particular we prove that the group structure is estimated consistently with probability tending to one and derive the asymptotic distribution of the resulting estimators $\hat\alpha^{IC}_i, \hat\beta^{IC}$. In order to make the proposed estimation procedure fully data-driven, we need to specify a choice for the tuning parameters $\hat C$ and $p_{n,T}$.
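To make the role of these two tuning parameters concrete, the selection step of Algorithm 1 can be sketched as follows in plain Python. The sketch treats $\hat C$ and $p_{n,T}$ as given inputs and assumes that the penalized fits and the re-fitted check losses have already been computed by some quantile regression routine. The helper names and the numerical tolerance `tol` used to decide when two fitted effects coincide are assumptions of this illustration, not part of the procedure described in the text.

```python
def groups_from_estimates(alpha_hat, tol=1e-8):
    """Partition individuals by (numerically) equal fitted effects.

    Groups are returned in increasing order of the fitted value. The
    tolerance is an assumption for floating point output; the text defines
    groups through exactly equal values.
    """
    order = sorted(range(len(alpha_hat)), key=lambda i: alpha_hat[i])
    groups = []
    for i in order:
        if groups and abs(alpha_hat[i] - alpha_hat[groups[-1][-1]]) <= tol:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups


def select_by_ic(candidates, c_hat, p_nt):
    """Pick the grid index that minimizes IC = refit_loss + c_hat * K * p_nt.

    candidates: list of (refit_loss, n_groups), one entry per grid value.
    Returns the index of the minimizer and the list of IC values.
    """
    ic_values = [loss + c_hat * k * p_nt for loss, k in candidates]
    best = min(range(len(ic_values)), key=ic_values.__getitem__)
    return best, ic_values
```

For example, `select_by_ic([(100.0, 5), (104.0, 3), (130.0, 2)], 1.0, 3.0)` compares the values 115, 113 and 136 and therefore selects the candidate with three groups: the small increase in re-fitted loss from merging five groups into three is outweighed by the reduced penalty.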
In our simulations, we found that the following choices lead to good results:

$$p_{n,T} = nT^{1/4}/10, \qquad \hat C := \tau(1-\tau)\hat s(\tau) \tag{4}$$

with

$$\hat s(\tau) := \frac{\hat F^{-1}(\tau + h_{n,T}) - \hat F^{-1}(\tau - h_{n,T})}{2h_{n,T}},$$

where $\hat F(y) := \frac{1}{nT}\sum_{i,t} I\{Y_{it} - X_{it}^\top \check\beta - \check\alpha_i \leq y\}$ denotes the empirical cdf of the regression residuals from the fixed effects quantile regression estimator given in (3), $\hat F^{-1}$ denotes the corresponding empirical quantile function, and $h_{n,T} \to 0$ is a bandwidth parameter.

As pointed out by a Referee, an alternative approach to obtain preliminary estimators for $\alpha_i$ would be to run separate quantile regressions for each individual. This did not improve the performance of our procedure in the simulations that we tried.

The exact constant $1/10$ in the factor $p_{n,T}$ does not matter asymptotically. The value $1/10$ was found to work well for a wide range of values of $n, T$ and for various models; details are provided in the Monte Carlo section 5.2. There we also show that the impact of the precise form of the factor in $\hat p_{n,T}$ becomes less pronounced as $T$ increases.

Algorithm 1: Grouping via IC criterion

input: Data $(X_{it}, Y_{it})$, grid of values $\lambda_1, \dots, \lambda_L$, quantile level of interest $\tau$
output: Estimated number of groups $\hat K^{IC}$, estimated group membership $\hat I^{IC}_1, \dots, \hat I^{IC}_{\hat K}$, estimated coefficients $\hat\alpha^{IC}_k, \hat\beta^{IC}$

for $i \leftarrow 1$ to $n$ do
    compute $\check\alpha_i$ given in (3)
end
for $\ell \leftarrow 1$ to $L$ do
    Compute
    $$(\hat\alpha_{1,\ell}, \dots, \hat\alpha_{n,\ell}, \hat\beta_\ell) := \arg\min_{(\alpha_1, \dots, \alpha_n, \beta)} \Big\{ \sum_{i,t} \rho_\tau(Y_{it} - X_{it}^\top \beta - \alpha_i) + \lambda_\ell \sum_{i \neq j} \frac{|\alpha_i - \alpha_j|}{|\check\alpha_i - \check\alpha_j|^2} \Big\}$$
    Let $\hat\alpha_{(1,\ell)} < \dots < \hat\alpha_{(K_\ell,\ell)}$ denote the unique values of $\hat\alpha_{1,\ell}, \dots, \hat\alpha_{n,\ell}$, and define $\hat I_{j,\ell} := \{i : \hat\alpha_{i,\ell} = \hat\alpha_{(j,\ell)}\}$ as the estimated groups.
    Compute re-fitted estimators
    $$(\tilde\alpha_{1,\ell}, \dots, \tilde\alpha_{K_\ell,\ell}, \tilde\beta_\ell) := \arg\min_{(\alpha_1, \dots, \alpha_{K_\ell}, \beta)} \sum_{k=1}^{K_\ell} \sum_{i \in \hat I_{k,\ell}} \sum_t \rho_\tau(Y_{it} - X_{it}^\top \beta - \alpha_k).$$
    Compute the IC criterion
    $$IC(\ell) := \sum_{k=1}^{K_\ell} \sum_{i \in \hat I_{k,\ell}} \sum_t \rho_\tau(Y_{it} - X_{it}^\top \tilde\beta_\ell - \tilde\alpha_{k,\ell}) + \hat C K_\ell p_{n,T},$$
    where the choice of $\hat C$ and $p_{n,T}$ is given in (4).
end
Set $\hat\ell^{IC} := \arg\min_{\ell = 1, \dots, L} IC(\ell)$ and denote by $\hat K^{IC} := K_{\hat\ell^{IC}}$ the corresponding number of groups. Set $\hat I^{IC}_k := \hat I_{k,\hat\ell^{IC}}$, $\hat\alpha^{IC}_k := \tilde\alpha_{k,\hat\ell^{IC}}$, $\hat\beta^{IC} := \tilde\beta_{\hat\ell^{IC}}$.

To motivate this particular choice of constant $\hat C$, observe the following expansion, which is derived in detail in the proof of Theorem 3.2:

$$\sum_{i,t} \Big\{ \rho_\tau(Y_{it} - X_{it}^\top \check\beta - \check\alpha_i) - \rho_\tau(Y_{it} - X_{it}^\top \beta - \alpha_i) \Big\} = -\sum_i \frac{\tau(1-\tau)}{2E[f_{Y_i|X_i}(q_{i,\tau}(X_i) | X_i)]} + o_P(n).$$

This shows that plugging in the estimated (by fixed effects quantile regression) instead of true errors underestimates the objective function evaluated at the residuals by roughly the first term on the right-hand side in the above expression. This term needs to be dominated by the penalty if we want to avoid selecting models that are too large, and so it is natural to scale the penalty by a constant which is proportional to $\sum_i 1/E[f_{Y_i|X_i}(q_{i,\tau}(X_i) | X_i)]$ in order to ensure reasonable performance across different data generating processes. Under the simplifying assumption that $f_{Y_i|X_i}(q_{i,\tau}(X_i) | X_i) =: f_\varepsilon(0)$ does not depend on $i, X_i$, this term equals $n/f_\varepsilon(0)$, and under the same assumptions $\hat s$ provides a consistent estimator for the latter, see Koenker (2005). Note that the sparsity term introduced here plays a similar role as the noise variance in classical information criteria such as AIC and BIC in least squares regression.

3. Theoretical analysis

In this section we provide a theoretical analysis of the methodology proposed in Section 2. We begin by stating an assumption on the true (but unknown) underlying group structure.

(C) For each quantile $\tau$ of interest, there exists a fixed number $K_\tau$, values $\alpha_{(01)}(\tau) < \dots < \alpha_{(0K_\tau)}(\tau)$ and disjoint sets $I_1(\tau), \dots, I_{K_\tau}(\tau)$ with $\cup_k I_k(\tau) = \{1, \dots, n\}$, $|I_k(\tau)|/n \to \mu_k(\tau) \in (0,1)$ and $\alpha_i(\tau) = \alpha_j(\tau) = \alpha_{(0k)}(\tau)$ for $i, j \in I_k(\tau)$. There exists $\varepsilon > 0$ with $\min_{k=1,\dots,K_\tau - 1} |\alpha_{(0k)}(\tau) - \alpha_{(0k+1)}(\tau)| \geq \varepsilon$.

Assumption (C) implies that the individual fixed effects are grouped into $K$ distinct groups and that the group centers are separated. Note that the number of groups as well as group membership is allowed to differ across quantiles. For the sake of a concise notation, the dependence of the number of groups and group centers on $\tau$ will from now on be dropped unless there is risk of confusion. Note also that we require the number of groups to be fixed (i.e. independent of $n, T$ and non-random) and exogenous, i.e. independent of the covariates $X_{it}$.

Next we collect some technical assumptions on the data generating process. Define $Z_{it}^\top = (1, X_{it}^\top)$ and let $\mathcal{Z}$ denote the support of $Z_{it}$.

(A1) Assume that $\sup_i \|Z_{it}\| \leq M < \infty$ a.s. and that
$$c_\lambda \leq \inf_i \lambda_{\min}(E[Z_{it} Z_{it}^\top]) \leq \sup_i \lambda_{\max}(E[Z_{it} Z_{it}^\top]) \leq C_\lambda$$
for some fixed constants $c_\lambda > 0$ and $C_\lambda < \infty$.

(A2) The conditional distribution functions $F_{Y_i|Z_i}(y|z)$ are twice differentiable w.r.t. $y$, with the corresponding derivatives $f_{Y_i|Z_i}(y|z)$ and $f'_{Y_i|Z_i}(y|z)$. Assume that
$$f_{\max} := \sup_i \sup_{y \in \mathbb{R}, z \in \mathcal{Z}} |f_{Y_i|Z_i}(y|z)| < \infty, \qquad f' := \sup_{y \in \mathbb{R}, z \in \mathcal{Z}} |f'_{Y_i|Z_i}(y|z)| < \infty.$$

(A3) Denote by $\mathcal{T}$ an open neighbourhood of $\tau$. Assume that there exists a constant $f_{\min} \leq f_{\max}$ such that
$$0 < f_{\min} \leq \inf_i \inf_{\eta \in \mathcal{T}} \inf_{z \in \mathcal{Z}} f_{Y_i|Z_i}(q_{i,\eta}(z) | z).$$
Assumptions (A1)-(A3) are fairly standard and routinely imposed in the quantile regression literature. Similar assumptions have been made, for instance in Kato, Galvao, and Montes-Rojas (2012) [see assumptions (B1)-(B3) in that paper].

3.1. Analysis of the estimators in (1). To state our first main result define

$$\Lambda_D := \sup_{i \in I_k, j \in I_{k'}, k \neq k'} \lambda_{i,j}, \qquad \Lambda_S := \inf_k \inf_{i,j \in I_k} \lambda_{i,j}.$$

In words, $\Lambda_D$ corresponds to the largest penalty corresponding to the difference between two individual effects from different groups while $\Lambda_S$ describes the smallest penalty between two effects from the same group. Our first result provides high-level conditions on $\Lambda_S, \Lambda_D$ that guarantee asymptotically correct grouping.

Theorem 3.1. Let assumptions (A1)-(A3), (C) hold and assume that $\min(n, T) \to \infty$, $\log n = o(T)$ and

$$\frac{\Lambda_D}{\Lambda_S} = o_P(1), \qquad n\Lambda_D = o_P(T^{1/2}), \qquad \frac{T^{1/2}(\log n)^{1/2}}{n\Lambda_S} = o_P(1). \tag{5}$$

Denote the ordered unique values of $\hat\alpha_1, \dots, \hat\alpha_n$ by $\hat\alpha_{(1)} < \dots < \hat\alpha_{(\hat K)}$ (i.e. $\hat K$ denotes the number of distinct values taken by $\hat\alpha_1, \dots, \hat\alpha_n$ which we interpret as the estimated number of groups) and define the sets $\hat I_k := \{i : \hat\alpha_i = \hat\alpha_{(k)}\}$, $k = 1, \dots, \hat K$. Then

$$P\big(\hat K = K, \hat I_k = I_k, k = 1, \dots, K\big) \to 1.$$

Next we discuss the implications of this general result for the specific choice $\check\lambda_{i,j}$ given in (2). Define

$$\check\Lambda_D := \sup_{i \in I_k, j \in I_{k'}, k \neq k'} \check\lambda_{i,j}, \qquad \check\Lambda_S := \inf_k \inf_{i,j \in I_k} \check\lambda_{i,j}.$$

From Kato, Galvao, and Montes-Rojas (2012) we obtain the bound

$$\sup_{i=1,\dots,n} |\check\alpha_i - \alpha_i| = O_P\big((\log n)^{1/2}/T^{1/2}\big)$$

(more precisely, from the paragraph following equation (A.14) in the latter paper; note that this result is derived under the assumption that $n$ grows at most polynomially with $T$). Now if $i, j \in I_k$ then $\alpha_i = \alpha_j$ and thus

$$1/\check\Lambda_S = \Big\{ \inf_k \inf_{i,j \in I_k} |\check\alpha_i - \check\alpha_j|^{-2} \lambda \Big\}^{-1} = \lambda^{-1} \sup_k \sup_{i,j \in I_k} |\check\alpha_i - \check\alpha_j|^2 = O_P\Big(\frac{\log n}{\lambda T}\Big).$$

Moreover, under (C) we have $\inf_{k \neq k'} \inf_{i \in I_k, j \in I_{k'}} |\alpha_i - \alpha_j| \geq \varepsilon > 0$ and hence

$$\check\Lambda_D \leq \lambda \Big\{ \inf_{k \neq k'} \inf_{i \in I_k, j \in I_{k'}} |\check\alpha_i - \check\alpha_j| \Big\}^{-2} \leq \lambda/(\varepsilon - o_P(1))^2 = O_P(\lambda).$$

Given this choice of weights, the condition $\Lambda_D/\Lambda_S = o_P(1)$ is satisfied provided that $T/\log n \to \infty$. The other conditions in (5) take the form

$$T^{1/2} \gg n\lambda \gg T^{-1/2}(\log n)^{3/2}.$$

Assuming that $(\log n)^{3/2} = o(T)$, this provides a range of possible values for $\lambda$ which will ensure that (5) holds.

3.2. Analysis of the information criterion in Algorithm 1.

In this section we provide theoretical guarantees for the performance of the information criterion based estimators $\hat\beta^{IC}, \hat\alpha^{IC}_k, \hat I^{IC}_k, \hat K^{IC}$. Our main result shows that, under fairly general conditions on the penalty parameter $p_{n,T}$, the IC procedure selects the correct number of groups with probability tending to one. Moreover, the estimators $(\hat\alpha^{IC}_1, \dots, \hat\alpha^{IC}_{\hat K^{IC}}, \hat\beta^{IC})$ are shown to enjoy the 'oracle property', i.e. they have the same asymptotic distribution as estimators which are based on the true (but unknown) grouping of individuals. Before making this statement more formal, we need some additional notation. Let

$$(\hat\alpha^{(OR)}_{(1)}, \dots, \hat\alpha^{(OR)}_{(K)}, \hat\beta^{(OR)}) := \arg\min_{(\alpha_1, \dots, \alpha_K, \beta)} \sum_k \sum_{i \in I_k} \sum_t \rho_\tau(Y_{it} - X_{it}^\top \beta - \alpha_k) \tag{6}$$

denote the infeasible 'oracle' which uses the true group membership. The asymptotic variance of the oracle estimator is conveniently expressed in terms of the following two limits which we assume to exist:

$$\Sigma_{1,\tau} := \tau(1-\tau) \lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{K} \sum_{i \in I_k} E[\tilde Z_{ik} \tilde Z_{ik}^\top], \qquad \Sigma_{2,\tau} := \lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{K} \sum_{i \in I_k} E[\tilde Z_{ik} \tilde Z_{ik}^\top f_{Y_i|X_i}(q_{i,\tau}(X_i) | X_i)],$$

where $\tilde Z_{ik} := (e_k^\top, X_i^\top)^\top$ and $e_k$ denotes the $k$'th unit vector in $\mathbb{R}^K$. Additionally, we need the following condition on the grid $\lambda_1, \dots, \lambda_L$.

Appropriate forms of asymptotic normality of the oracle and IC estimators continue to hold without assuming that the limits exist. This assumption is made for notational convenience.

(G) For each $(n, T)$, denote the grid values by $\lambda_{1,n,T}, \dots, \lambda_{L,n,T}$ where $L$ can depend on $n, T$. There exists a sequence $j_n$ such that $T^{-1/2}(\log n)^{\ldots} \ll n\lambda_{j_n} \ll T^{1/2}$.

Assumption (G) is fairly mild. It only requires that among the candidate values for $\lambda$ there exists one value so that $\check\lambda_{i,j}$ satisfies the assumptions of Theorem 3.1. In practice, we recommend choosing a grid of values that results in sufficiently many different numbers of groups.

Theorem 3.2. Let assumptions (A1)-(A3), (C), (G) hold and assume that $\min(n, T) \to \infty$ and $n$ grows at most polynomially in $T$ (i.e. $n = O(T^b)$ for some $b < \infty$) and $(\log T)^{\ldots}(\log n)^{\ldots}/T \to 0$. Assume that there exists $\varepsilon > 0$ such that $\hat C > \varepsilon$ with probability tending to one and that

$$nT \gg p_{n,T} \gg n, \qquad \hat C = O_P(1).$$

Then $P(\hat K^{IC} = K) \to 1$ and

$$\sqrt{nT}\Big( (\hat\alpha^{IC}_1, \dots, \hat\alpha^{IC}_{\hat K^{IC}}, (\hat\beta^{IC})^\top) - (\alpha_{(01)}, \dots, \alpha_{(0K)}, \beta^\top) \Big) \xrightarrow{D} N(0, \Sigma_{2,\tau}^{-1} \Sigma_{1,\tau} \Sigma_{2,\tau}^{-1}),$$

$$\sqrt{nT}\Big( (\hat\alpha^{(OR)}_{(1)}, \dots, \hat\alpha^{(OR)}_{(K)}, (\hat\beta^{(OR)})^\top) - (\alpha_{(01)}, \dots, \alpha_{(0K)}, \beta^\top) \Big) \xrightarrow{D} N(0, \Sigma_{2,\tau}^{-1} \Sigma_{1,\tau} \Sigma_{2,\tau}^{-1}).$$

Strictly speaking, $\hat\alpha^{IC}_k$ is not defined if $\hat K^{IC} < K$. Since the probability of this event tends to zero, we can simply define $\hat\alpha^{IC}_k = 0$ for $\hat K^{IC} < k \leq K$.

Remark 3.3. Theorem 3.2 and Theorem 3.1 hold point-wise in the parameter space, and we expect that deriving a similar result uniformly in the parameter space (in particular, if cluster centers are allowed to depend on $n, T$ and if their separation is lost, see Leeb and Pötscher (2008) for such findings in the context of classical lasso penalized regression) is impossible. It is a well established fact in the Statistics and Econometrics literature that inference which is based on such 'point-wise' asymptotic results can be unreliable. Recently, several approaches to alleviate this problem and achieve uniformly valid post-regularization inference have been proposed (see, among others, Belloni, Chernozhukov, and Kato (2014), Lockhart, Taylor, Tibshirani, and Tibshirani (2014) and van de Geer, Bühlmann, Ritov, and Dezeure (2014)). Applying similar ideas to the present setting is a very important question which we leave for future research.

4. Details on the optimization problem in Algorithm 1

To implement the proposed quantile panel data regression with group fixed effect, we need to solve the optimization problem stated in (1). A natural normalization of the objective and the penalty function leads to

$$\min_{\alpha_1, \dots, \alpha_n, \beta} \frac{1}{nT} \sum_{i,t} \rho_\tau(Y_{it} - X_{it}^\top \beta - \alpha_i) + \frac{\tilde\lambda}{n(n-1)} \sum_{i \neq j} \frac{|\alpha_i - \alpha_j|}{|\check\alpha_i - \check\alpha_j|^2}. \tag{7}$$

This is equivalent to the objective (1) except that $\tilde\lambda$ is adjusted according to $n$ and $T$ so that we can use a generic grid for $\tilde\lambda$ rather than letting the grid support change with $(n, T)$. In practice, the grid support of $\tilde\lambda \in \{0, \tilde\lambda_1, \dots, \tilde\lambda_\ell, \dots, \tilde\lambda_L\}$ is chosen such that the number of distinct values of the solution $\{\hat\alpha_{1,\ell}, \dots, \hat\alpha_{n,\ell}\}$ for $\ell = 1, \dots, L$ takes all possible integer values in the set $\{1, 2, \dots, n\}$. This is always achievable as long as the grid width of $\tilde\lambda$ is small enough. Since for each fixed $\tilde\lambda_\ell$, (7) is a linear programming problem which can be efficiently solved by any reliable solver, this is not computationally expensive.

To make this section self contained, we provide some details of the primal problem stated in (7) and its corresponding dual problem. Define $\lambda_{ij} := |\check\alpha_i - \check\alpha_j|^{-2}$ and observe that we can re-write

$$|\alpha_i - \alpha_j| = 2\Big(\tfrac{1}{2} - 1\{0 \leq \alpha_i - \alpha_j\}\Big)\Big(-(\alpha_i - \alpha_j)\Big).$$

With this notation (7) can be equivalently expressed as follows:

$$\min_{u, v, w_1, w_2, \alpha, \beta} \frac{\tau}{nT} \sum_{i,t} u_{it} + \frac{1-\tau}{nT} \sum_{i,t} v_{it} + \frac{4\tilde\lambda}{n(n-1)} \Big( \frac{1}{2} \sum_{j=1}^{n(n-1)/2} w_{1j} + \frac{1}{2} \sum_{j=1}^{n(n-1)/2} w_{2j} \Big)$$

subject to

$$u_{it} = \max\{Y_{it} - X_{it}^\top \beta - \alpha_i, 0\}, \qquad v_{it} = \max\{X_{it}^\top \beta + \alpha_i - Y_{it}, 0\},$$
$$w_{1j} = \max\{-\theta_j, 0\}, \qquad w_{2j} = \max\{\theta_j, 0\},$$
$$Y_{it} = u_{it} - v_{it} + \alpha_i + X_{it}^\top \beta, \qquad 0 = w_{1j} - w_{2j} + \theta_j,$$

with $\theta$ being a vector of length $n(n-1)/2$ that consists of the entries $(\alpha_i - \alpha_j)\lambda_{ij}$ for $i < j$.
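The equivalence between (7) and its split form can be checked numerically on a toy example. The sketch below is plain Python under simplifying assumptions (a scalar covariate, the same $T$ for every individual, pairwise distinct preliminary estimates); all function names are hypothetical. It only evaluates both objectives at a given point and does not solve the linear program.

```python
def check_loss(r, tau):
    """Check function rho_tau(r) = r * (tau - 1{r < 0})."""
    return r * (tau - (1.0 if r < 0 else 0.0))


def abs_via_indicator(x):
    """|x| written as 2 * (1/2 - 1{0 <= x}) * (-x)."""
    indicator = 1.0 if 0 <= x else 0.0
    return 2.0 * (0.5 - indicator) * (-x)


def objective_direct(y, x, alpha, beta, alpha_prelim, lam_tilde, tau):
    """Normalized objective (7), evaluated directly."""
    n, T = len(y), len(y[0])
    loss = sum(check_loss(y[i][t] - x[i][t] * beta - alpha[i], tau)
               for i in range(n) for t in range(T)) / (n * T)
    penalty = sum(abs(alpha[i] - alpha[j]) / (alpha_prelim[i] - alpha_prelim[j]) ** 2
                  for i in range(n) for j in range(n) if i != j)
    return loss + lam_tilde / (n * (n - 1)) * penalty


def objective_split(y, x, alpha, beta, alpha_prelim, lam_tilde, tau):
    """Same objective via the positive parts u, v, w1, w2 of the split form."""
    n, T = len(y), len(y[0])
    u_sum = v_sum = 0.0
    for i in range(n):
        for t in range(T):
            r = y[i][t] - x[i][t] * beta - alpha[i]
            u_sum += max(r, 0.0)    # u_it: positive part of the residual
            v_sum += max(-r, 0.0)   # v_it: negative part of the residual
    w1_sum = w2_sum = 0.0
    for i in range(n):
        for j in range(i + 1, n):   # one entry of theta per pair i < j
            theta = (alpha[i] - alpha[j]) / (alpha_prelim[i] - alpha_prelim[j]) ** 2
            w1_sum += max(-theta, 0.0)
            w2_sum += max(theta, 0.0)
    fit = (tau * u_sum + (1.0 - tau) * v_sum) / (n * T)
    pen = 4.0 * lam_tilde / (n * (n - 1)) * (0.5 * w1_sum + 0.5 * w2_sum)
    return fit + pen
```

The factor 4 in the split form arises because the sum over $i \neq j$ counts every pair twice while $\theta$ has one entry per pair $i < j$, and because $|\theta_j| = 2(\tfrac{1}{2} w_{1j} + \tfrac{1}{2} w_{2j})$.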
We can represent $\theta$ as $A\alpha$ where $A$ is an $\frac{n(n-1)}{2} \times n$ matrix taking the form

$$A = \begin{pmatrix}
\lambda_{12} & -\lambda_{12} & 0 & \cdots & 0 \\
\lambda_{13} & 0 & -\lambda_{13} & \cdots & 0 \\
\vdots & & & & \vdots \\
\lambda_{1n} & 0 & \cdots & 0 & -\lambda_{1n} \\
0 & \lambda_{23} & -\lambda_{23} & \cdots & 0 \\
\vdots & & & & \vdots \\
0 & \cdots & 0 & \lambda_{n-1,n} & -\lambda_{n-1,n}
\end{pmatrix}.$$

The corresponding dual problem of (7) can be stated as:

$$\max_{a_1, a_2} a_1^\top Y$$

subject to

$$X^\top a_1 = (1-\tau) X^\top 1_{nT},$$
$$Z^\top a_1 + \frac{4nT\tilde\lambda}{n(n-1)} A^\top a_2 = (1-\tau) Z^\top 1_{nT} + \frac{2nT\tilde\lambda}{n(n-1)} A^\top 1_{n(n-1)/2},$$
$$a_1 \in [0,1]^{nT}, \qquad a_2 \in [0,1]^{n(n-1)/2},$$

with $Z$ being the incidence matrix that identifies the $n$ individuals. The solutions for $\alpha$ and $\beta$ in the primal problem are then obtained as the dual variables of the dual problem. We implement the dual problem using the Mosek optimization software of Andersen (2010) through the R interface Rmosek of Friberg (2012). We have also implemented the estimation procedure using the quantreg package in R and the code will be made available for public use.

5. Monte Carlo Simulations

5.1. Finite Sample Performance of the Proposed Estimator.

To assess the ﬁnitesample performance of the proposed convex clustering panel quantile regression estimator,we apply the method to simulated data sets. In particular, we consider data generatedfrom two models and two error distributions for a range of n and T . The responses, Y it , aregenerated by either a location shift model(8) Y it = α i + X it β + u it or a location scale shift model(9) Y it = α i + X it β + (1 + X it γ ) u it where the individual latent eﬀects α i are generated from three groups taking values { , , } with equal proportions. The covariate X it is generated such that it has a non-zero interclasscorrelation coeﬃcient. In particular, X it = ρα i + γ i + v it Mosek is a commercial state-of-the-art convex optimization solver that provides a free academic license.We use its interior point algorithm to solve our linear programming problem. The estimation procedureimplemented using the quantreg package calls the sfn method, which uses the Frisch-Newton algorithmand exploits the sparse algebra to compute iterates. Panel Quantile regression with group fixed effects with γ i and v it independent and identically distributed over i and i, t respectively. Weconduct simulation experiments with ρ ∈ { , . } to investigate both cases where the ﬁxedeﬀect is independent or correlated with the covariate. The true parameters are β = 1 and γ = 1 /

10. The error terms u it are i.i.d. following either a standard normal distribution ora student t distribution with three degrees of freedom. Results reported are based on 2000repetitions.We ﬁrst investigate the performance of using the information criterion for estimatingthe number of groups. Table 1 and Table 2 report the proportion of estimated number ofgroups under the two models for diﬀerent combinations of n and T for τ = 0 . τ = 0 . λ values withwidth 1 /

200 and support [0 , . λ values covers the integers in range [1 , n ]. For the IC criteria, thesparsity function ˆ s ( τ ) is estimated with bandwidth chosen based on the Hall and Sheather(1988) rule implemented in the quantreg package (see discussion in Koenker (2005)).The results suggest that the probability of getting the correct number of groups for τ = 0 . τ = 0 .

75. When T ≥

30, the estimates for the numberof groups for both error distributions and for diﬀerent n are mostly satisfactory. Theperformance for t error deteriorates compared to those with normal error, especially forhigher quantiles. For T = 15 and quantiles other than the median the proposed methodshould be used with caution. Including correlation between individual eﬀects and predictorsdoes not lead to dramatic changes in the accuracy for estimating the number of groups andgroup membership.Table 3 and Table 4 summarize the ﬁnite sample properties of ˆ β IC ( τ ) for τ = 0 . τ = 0 .

75, respectively and compare with the QRFE estimator where no penalization on theindividual ﬁxed eﬀect is used (i.e. λ = 0).The standard errors used for constructing conﬁdence intervals (nominal coverage 95%)are based on the nid option with Hall-Sheather bandwidth in the package quantreg. Resultsbased on the Boﬁnger bandwidth selection are similar and not reported here. For the QRFE,the Boﬁnger bandwidth rule was used since the Hall-Sheather rule resulted in substantialunder-coverage with T = 30 for some of the models.When covariates and ﬁxed eﬀects are independent, the RMSE of the PQR-FEgroupestimator, ˆ β ( τ ), is smaller than that of the QRFE for all settings considered. This showsthat our penalization gains eﬃciency for estimating β when there is group structure in theﬁxed eﬀects. The results do not change much from normal error to t error and from medianto higher quantiles.Introducing correlation between predictors and group membership leads to a bias for thegrouped eﬀect estimator, while the ﬁxed eﬀects estimator does not suﬀer from additionalbias. This bias can be quite noticeable for small values of T , especially at the 75% quantile.The bias becomes negligible as T increases, so there is no contradiction to our asymptotictheory. An intuitive explanation for this behaviour is that for smaller T it is diﬃcult to get Koenker (2004) used a similar data generating process with ρ = 0 for X and pointed out that the interclasscorrelation induced by γ i is crucial for the penalized quantile regression ﬁxed eﬀect estimator to have superiorperformance than the unpenalized QRFE estimator. u and Volgushev 13 a perfect grouping, and a wrong grouping leads to bias since there is dependence betweenpredictors and group structure.Last, we report in Table 5 and Table 6 the proportion of perfect classiﬁcation of in-dividual eﬀects and the average value of the percentage of correct classiﬁcation togetherwith their standard errors. 
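The two classification measures mentioned here can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `share_correct` and `classification_summary` are hypothetical, and the sketch assumes that estimated and true group labels are coded as integers 0, ..., K-1 and that labels are compared up to a relabeling of the groups.

```python
from itertools import permutations

import numpy as np


def share_correct(true_groups, est_groups, n_groups):
    """Share of individuals classified correctly, maximised over relabelings.

    Group labels are only identified up to a permutation, so every relabeling
    of the estimated groups is tried (feasible for a handful of groups).
    """
    true_groups = np.asarray(true_groups)
    est_groups = np.asarray(est_groups)
    best = 0.0
    for perm in permutations(range(n_groups)):
        relabeled = np.array(perm)[est_groups]
        best = max(best, float(np.mean(relabeled == true_groups)))
    return best


def classification_summary(true_groups, est_groups_by_rep, n_groups):
    """Return (proportion of perfect classification, average share correct)."""
    shares = np.array([share_correct(true_groups, est, n_groups)
                       for est in est_groups_by_rep])
    return float(np.mean(shares == 1.0)), float(np.mean(shares))
```

Each element of `est_groups_by_rep` would hold the estimated memberships from one simulation repetition; enumerating all permutations is only sensible for a small number of groups such as the three used in this design.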
Since the comparison of the estimated membership and the true membership only makes sense when $\hat K = K$, the estimated memberships are based on $\lambda$ for which $\hat K = K$ (see Su, Shi, and Phillips (2016) for a similar approach). Results suggest that for $T \geq 30$ and $\tau = 0.5$, the group membership estimation is quite satisfactory. While the proportion of perfect matches is low even for $T = 30$, the average proportion of correct classification shows that those effects are typically due to very few misclassified individuals. For $T = 15$ perfect classification is almost impossible while average correct classification rates remain reasonable. Adding correlation between individual effects and covariates leads to a deterioration of the probability of achieving a perfect grouping for location-scale models, especially at higher quantiles, but does not have a strong impact on other results. Overall the simulations suggest that for small $T$, there is just not enough information available for each individual to hope for perfect classification.

5.2. Further analysis of tuning parameters in the IC Criteria.

The discussion at the end of Section 2 provides a motivation for the tuning parameters $\hat C$ in the IC criteria. Here we further investigate the impact of rescaling $p_{n,T}$ by different factors. For illustration we consider DGP1 in the previous section where data are generated based on the model (9). We use a grid of constants $c \in [0.\ldots, \ldots 3]$ with width 0.01 and plot the associated performance of the estimated number of groups, the RMSE of $\hat\beta_{IC}(\tau)$ and the coverage rate for $p_{n,T} = c\,nT^{1/\ldots}$. Figure 1 and Figure 2 contain corresponding results for the location-scale shift model with t errors and $\tau = 0.5, 0.75$, respectively. For $T$ as small as 15, the performance is quite sensitive to the chosen constant. As predicted by the theory this dependence becomes somewhat less prominent as $T$ increases. Overall the choice $p_{n,T} = nT^{1/\ldots}/10$ shows good performance for the settings that we tried in this simulation. The patterns are similar for the normal error and the location-shift models and likewise for DGP2; results are reported in Figure 6 - Figure 11 in the Appendix for the sake of completeness.
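For reference, the simulation design of Section 5.1 (models (8) and (9)) can be sketched in Python as follows. This is an illustrative sketch rather than the authors' code: the function name `simulate_panel` is hypothetical, the group values `(1.0, 2.0, 3.0)` are placeholders, and standard normal draws for $\gamma_i$ and $v_{it}$ are an assumption, since the text only states that they are i.i.d.

```python
import numpy as np


def simulate_panel(n, T, rho, model="location", errors="normal",
                   group_values=(1.0, 2.0, 3.0), beta=1.0, gamma=0.1, seed=0):
    """Draw one panel (Y, X, alpha, groups) from design (8) or (9).

    group_values are placeholder values, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    # Assign individuals to groups in (as close as possible to) equal proportions.
    groups = rng.permutation(np.arange(n) % len(group_values))
    alpha = np.asarray(group_values)[groups]
    # X_it = rho * alpha_i + gamma_i + v_it; gamma_i induces within-unit correlation.
    gamma_i = rng.standard_normal(n)      # assumed N(0, 1)
    v = rng.standard_normal((n, T))       # assumed N(0, 1)
    X = rho * alpha[:, None] + gamma_i[:, None] + v
    if errors == "normal":
        u = rng.standard_normal((n, T))
    elif errors == "t3":
        u = rng.standard_t(df=3, size=(n, T))
    else:
        raise ValueError("errors must be 'normal' or 't3'")
    if model == "location":
        scale = 1.0                       # model (8)
    elif model == "location-scale":
        scale = 1.0 + gamma * X           # model (9)
    else:
        raise ValueError("model must be 'location' or 'location-scale'")
    Y = alpha[:, None] + beta * X + scale * u
    return Y, X, alpha, groups
```

A call such as `simulate_panel(30, 15, rho=0.5, model="location-scale", errors="t3")` would produce one data set of the kind examined in this section; the value `rho=0.5` is again only an example.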

6. Empirical Example

To further illustrate our proposed methodology, we revisit the empirical inquiry on the much-debated "More Guns Less Crime" hypothesis. Lott and Mustard (1997) provide the first empirical analysis which claims that the adoption of Right-to-Carry (RTC) laws, which allow local authorities to issue a concealed weapon permit to all applicants that are eligible, reduces crime. Ever since its publication, there has been much academic and political debate that challenges the findings. Ayres and Donohue (2003) show that the negative effect of RTC laws has no statistical significance under a more reasonable model specification and inference using both the state and county level data between 1977 - 1999 for 51 U.S. states. Their conclusion is echoed by the National Research Council (2004) (NRC) report which finds little reliable statistical support for the "More Guns Less Crime" hypothesis. Recently, Aneja, Donohue, and Zhang (2014) revisited the hypothesis using the updated panel data from 1977 - 2010. They correct several mistakes in the dataset used in earlier analysis and discuss the shortcoming of using county level crime data. We refer the readers to more details provided in Aneja, Donohue, and Zhang (2014). For our empirical analysis, we use their updated state level data kindly provided by the authors.

Footnote: The data and the detailed description of the data source can be downloaded from https://works.bepress.com/john_donohue/107/ .

[Figure 1: nine panels in three rows and three columns (columns: T = 15, 30, 60), each plotted against the constant with curves for n = 30, 60, 90. Rows: proportion of correct groups, RMSE of beta, 95% coverage.]

Figure 1. Use different constants in $p_{n,T}$ for the IC criteria for location-scale shift model with t error on DGP1: For an equally spaced grid on $[0.\ldots, \ldots 3]$ with width 0.01, the three columns represent different magnitudes of $T$ while each figure in the row overlays the curves for $n \in \{30, 60, 90\}$ for various performance measures. The first row plots the proportion of correctly estimated number of groups. The second row plots the RMSE of $\hat\beta_{IC}(\tau)$ where $\tau = 0.5$ and the third plots the coverage rate for nominal size 5%. Results are based on 400 repetitions.

[Figure 2: nine panels in three rows and three columns (columns: T = 15, 30, 60), each plotted against the constant with curves for n = 30, 60, 90. Rows: proportion of correct groups, RMSE of beta, 95% coverage.]

Figure 2. Use different constants in $p_{n,T}$ for the IC criteria for location-scale shift model with t error on DGP1: For an equally spaced grid on $[0.\ldots, \ldots 3]$ with width 0.01, the three columns represent different magnitudes of $T$ while each figure in the row overlays the curves for $n \in \{30, 60, 90\}$ for various performance measures. The first row plots the proportion of correctly estimated number of groups. The second row plots the RMSE of $\hat\beta_{IC}(\tau)$ where $\tau = 0.75$ and the third plots the coverage rate for nominal size 5%. Results are based on 400 repetitions.

The main parameter of interest is the effect of the indicator of the RTC laws (denoted as lawind) on crime rates. In our analysis, we focus on the violent crime rate. A similar analysis is possible for other categories of offense. In addition to the law indicator, there is also information on the incarceration rate (sentenced prisoners per 100,000 residents; denoted as prisoners) in the states in the previous year, real per capita personal income (denoted as rpcpi) and other demographic variables on population proportions in different age-gender groups for various ethnicities. As argued in Ayres and Donohue (2003) and Aneja, Donohue, and Zhang (2014), to avoid multicollinearity and to account for the fact that 90% of violent crimes in the U.S. are committed by male offenders, we follow their specification and control for only the proportion of African-American males in the age groups 10-19, 20-29 and 30-39. Inevitably, there are many different model specifications that can be considered for this empirical inquiry and it is impossible to report all the results. Our goal is to emphasize the heterogeneous law adoption effect for states that have low violent crime rates versus those with high crime rates. We also provide evidence of clustering behavior of the state fixed effects, which are incorporated to capture states' unobserved heterogeneity, and show that by taking advantage of the dimension reduction of grouping the fixed effects, the parameters of interest on other control variables enjoy better statistical precision.

Our main model specification is

(10) $q_{i,\tau}(\log(\text{violent}_{it})) = \alpha_i(\tau) + \gamma_t(\tau) + \beta(\tau)\,\text{lawind}_{it} + \theta(\tau)^\top X_{it}$

where the additional control variables $X_{it}$ include lagged incarceration rate (prisoners), real per capita personal income (rpcpi) and the three demographic variables (afam1019, afam2029, afam3039). We note that high incarceration rates in a given state may be a feedback towards rising violent crime, and therefore including those as control variables might lead to endogeneity issues.
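To make specification (10) concrete, the following Python sketch evaluates the sum of quantile check losses for given state effects, year effects and slope coefficients. It is only an illustration under stated assumptions: the function names and array layout are hypothetical, and the sketch covers the unpenalized fit criterion only, so the convex clustering penalty on the state effects would have to be added to it.

```python
import numpy as np


def check_loss(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0))


def objective_model_10(y, lawind, X, alpha, gamma, beta, theta, tau):
    """Sum of check losses for a specification of the form (10).

    y, lawind : (n, T) arrays of log violent crime rates and law indicators
    X         : (n, T, p) array of additional controls
    alpha     : (n,) state effects; gamma : (T,) year effects
    beta      : scalar law effect; theta : (p,) coefficients on the controls
    """
    fitted = alpha[:, None] + gamma[None, :] + beta * lawind + X @ theta
    return float(np.sum(check_loss(y - fitted, tau)))
```

Minimising this criterion over all arguments for a fixed `tau` corresponds to quantile regression with state and year effects; the arrays here are placeholders and do not refer to the actual data set.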
As a robustness check, we also report the corresponding results in the appendix for model (10) without the incarceration rate as a control covariate. The effects stay mostly unchanged.

Figure 3 reports the panel data quantile regression estimates with state fixed effects for five quantile levels $\tau$ and their associated point-wise confidence intervals. Similar to the findings in Aneja, Donohue, and Zhang (2014), we see that the RTC law adoption has a positive effect on the violent crime rate. In addition this effect is significant for lower quantiles while there is no statistical significance for such an effect for states at higher quantiles of the violent crime rate. All other control variables have expected signs, with a negative effect of the lagged incarceration rate indicating that states with stricter laws have lower violent crime rates, although this effect is not very precisely estimated. Higher real per capita personal income has a negative effect on violent crime at lower quantiles; this effect is diminishing for higher quantiles of the violent crime rate. The proportion of African-American population in the age group 30 - 39 has a positive effect on violent crime and this effect becomes more prominent for higher crime rate states. Figure 5 shows the corresponding state fixed effect estimates for different quantile levels. There is some evidence of clustered behavior of these fixed effect estimates, although these are estimates with statistical errors. Using our proposed methodology, the estimated numbers of groups for the five quantile levels are respectively { , , , , }.

Figure 4 plots the corresponding panel data quantile regression estimates with the estimated optimal grouped state fixed effects for the same five quantile levels $\tau$ and their associated point-wise confidence intervals. The pattern of the quantile effects for all the control variables stays roughly the same compared to the fixed effect quantile regression estimates, while the variances of these estimates under the grouped fixed effects are noticeably smaller. This is similar to what we observe in the simulation section where the common parameters in quantile panel data regression with grouped fixed effects have lower variances than those of the estimates based on "individual heterogeneity". However, we would also like to point out that the standard errors are based on refitted models with estimated group structure, which does not account for the uncertainty of model selection on the fixed effects and should be interpreted with caution. A proper inference method based on the proposed methodology accounting for such uncertainty is important and is part of our future research agenda.

[Figure 3: six panels (lawind, prisoners, rpcpi, afam1019, afam2029, afam3039), coefficient estimates plotted against taus.]

Figure 3. Panel data quantile regression estimates with state fixed effects for various $\tau$ based on model specification (10): For five quantile levels $\tau$, the solid black points plot the coefficient estimates for the effects of the RTC law adoption and other control variables on the violent crime rate based on panel data of 51 U.S. states for 1977 - 2010. The shaded area is the pointwise 95% confidence interval for which the standard errors are computed using the Hendricks-Koenker sandwich covariance matrix estimates with the Hall-Sheather bandwidth rule. The red solid line marks the fixed effect panel data mean regression estimates and the dotted red lines plot the 95% confidence interval with robust clustered (at the state level) standard errors.

[Figure 4: six panels (lawind, prisoners, rpcpi, afam1019, afam2029, afam3039), coefficient estimates plotted against taus.]

Figure 4. Panel data quantile regression estimates with grouped state fixed effects for various $\tau$ based on model specification (10): For five quantile levels $\tau$, the solid black points plot the coefficient estimates for the effects of the RTC law adoption and other control variables on the violent crime rate based on the proposed methodology with panel data of 51 U.S. states for 1977 - 2010. The shaded area is the pointwise 95% confidence interval where the standard errors are computed using the Hendricks-Koenker sandwich covariance matrix estimates with the Hall-Sheather bandwidth rule. The red solid line marks the fixed effect panel data mean regression estimates and the dotted red lines plot the 95% confidence interval with robust clustered (at the state level) standard errors.

[Figure 5: three panels (tau = 0.1, tau = 0.5, tau = 0.9), fixed effects plotted against index.]

Figure 5. The estimated state fixed effects and the corresponding grouped fixed effects for various $\tau$ based on model specification (10): For $\tau \in \{0.1, 0.5, 0.9\}$, the hollow black points plot the ordered panel data quantile regression estimates for individual state fixed effects. The red solid points mark out the estimated grouping and the corresponding group fixed effect estimates using the proposed methodology.

7. Conclusions and future extensions

The present paper suggests a simple and computationally efficient way to incorporate group fixed effects into a panel data quantile regression by means of a convex clustering penalty. We develop theoretical results on consistent group structure estimation and discuss the asymptotic properties of the resulting joint and group-specific estimators.

There are several directions that we plan to explore in the future. First, our theory focused on individual fixed effects while assuming common slope coefficients. It is equally interesting to allow for group structure in some of the slope coefficients while keeping other slope coefficients common across individuals, perhaps even allowing for individual fixed effects. This can be achieved by straightforward modifications of the penalization approach which we explored so far, but a more detailed theoretical analysis of this approach remains beyond the scope of the present paper.

Second, one can take the standpoint that in many applications there is no exact group structure. In such settings, an alternative interpretation of the penalty which we investigated is as a way of regularizing problems that have too many parameters. Such an interpretation is in the spirit of the proposals of Koenker (2004) and Lamarche (2010), and a detailed investigation of the resulting bias-variance trade-off warrants further research.

Finally, a deeper analysis of issues that are related to uniformity of distributional approximation in the entire parameter space was not addressed here, but remains an important theoretical and practical question which we hope to address in the future.

References

Abrevaya, J., and C. Dahl (2008): “The Effects of Birth Inputs on Birthweight,” Journal of Business & Economic Statistics, 26, 379–397.

Allman, E., C. Mathias, and J. Rhodes (2009): “Identifiability of parameters in latent structure models with many observed variables,” Annals of Statistics, 37, 3099–3132.

Andersen, E. D. (2010): “The Mosek Optimization Tools Manual, Version 6.0,” Available from .

Ando, T., and J. Bai (2016): “Panel Data Models with Grouped Factor Structure Under Unknown Group Membership,” Journal of Applied Econometrics, 31, 163–191.

Aneja, A., J. Donohue, and A. Zhang (2014): “The impact of Right to Carry Laws and the NRC report: the latest lessons from the empirical evaluation of law and policy,” NBER working paper No. 18294.

Arellano, M., and S. Bonhomme (2016): “Nonlinear Panel Data Estimation via Quantile Regressions,” Econometrics Journal, 19, 64–94.

Ayres, I., and J. Donohue (2003): “Shooting down the ‘more guns less crime’ hypothesis,” Stanford Law Review, 55, 1193–1312.

Belloni, A., and V. Chernozhukov (2009): “Least squares after model selection in high-dimensional sparse models.”

Belloni, A., V. Chernozhukov, and K. Kato (2014): “Uniform post-selection inference for least absolute deviation regression and other Z-estimation problems,” Biometrika (Oberwolfach 2012), 102(1), 77–94.

Bester, A., and C. Hansen (2016): “Grouped Effects Estimators in Fixed Effects Models,” Journal of Econometrics, 190, 197–208.

Bonhomme, S., and E. Manresa (2015): “Grouped Pattern of Heterogeneity in Panel Data,” Econometrica, 83, 1147–1184.

Chamberlain, G. (1982): “Multivariate Regression Models for Panel Data,” Journal of Econometrics, 18, 5–46.

Chernozhukov, V., I. Fernández-Val, S. Hoderlein, H. Holzmann, and W. Newey (2015): “Nonparametric Identification in Panels using Quantiles,” Journal of Econometrics, 188, 378–392.

Chetverikov, D., B. Larsen, and C. Palmer (2016): “IV Quantile Regression for Group-Level Treatments, with an Application on the Distributional Effects of Trade,” Econometrica, 84, 809–833.

Council, N. R. (2004): Firearms and Violence: a Critical Review. The National Academies Press: Washington.

Evdokimov, K. (2010): “Identification and Estimation of a Nonparametric Panel Data Model with Unobserved Heterogeneity,” preprint, Princeton University.

Friberg, H. A. (2012): “Users Guide to the R-to-Mosek Interface,” Available from http://rmosek.r-forge.r-project.org .

Galvao, A. F., and K. Kato (2016): “Smoothed quantile regression for panel data,” Journal of Econometrics, 193(1), 92–112.

Galvao, A. F., and L. Wang (2015): “Efficient minimum distance estimator for quantile regression fixed effects panel data,” Journal of Multivariate Analysis, 133, 1–26.

Hall, P., and S. Sheather (1988): “On the Distribution of a Studentized Quantile,” Journal of the Royal Statistical Society, Series B, 50, 381–391.

Harding, M., and C. Lamarche (2017): “Penalized quantile regression with semiparametric correlated effects: An application with heterogeneous preferences,” Journal of Applied Econometrics, 32(2), 342–358.

Heckman, J., and B. Singer (1982): “The Identification Problem in Econometric Models for Duration Data,” in Advances in Econometrics, ed. by W. Hildenbrand. Cambridge University Press.

Hocking, T., J. Vert, F. Bach, and A. Joulin (2011): “Clusterpath: an Algorithm for Clustering Using Convex Fusion Penalties,” in Proceedings of the International Conference of Machine Learning, ed. by L. Getoor, and T. Scheffer. Omnipress: Madison.

Hsiao, C. (2003): Analysis of Panel Data. Cambridge University Press.

Kasahara, H., and K. Shimotsu (2009): “Nonparametric Identification of Finite Mixture Models of Dynamic Discrete Choices,” Econometrica, 77, 135–175.

Kato, K., A. Galvao, and G. Montes-Rojas (2012): “Asymptotics for Panel Quantile Regression Models with Individual Effects,” Journal of Econometrics, 170, 76–91.

Kaufman, L., and P. Rousseeuw (2009): Finding Groups in Data: an Introduction to Cluster Analysis. Wiley: New York.

Keane, M., and K. Wolpin (1997): “The Career Decisions of Young Men,” Journal of Political Economy, 105, 473–522.

Koenker, R. (2004): “Quantile Regression for Longitudinal Data,” Journal of Multivariate Analysis, 91, 74–89.

Koenker, R. (2005): Quantile Regression, no. 38. Cambridge University Press.

Lamarche, C. (2010): “Robust Penalized Quantile Regression Estimation for Panel Data,” Journal of Econometrics, 157, 396–408.

Leeb, H., and B. M. Pötscher (2008): “Sparse estimators and the oracle property, or the return of Hodges estimator,” Journal of Econometrics, 142(1), 201–211.

Lin, C.-C., and S. Ng (2012): “Estimation of Panel Data Models with Parameter Heterogeneity When Group Membership is Unknown,” Journal of Econometric Methods, 1, 42–55.

Lockhart, R., J. Taylor, R. J. Tibshirani, and R. Tibshirani (2014): “A significance test for the lasso,” Annals of Statistics, 42(2), 413.

Lott, J., and D. Mustard (1997): “Crime, Deterrence and Right-to-Carry Concealed Handguns,” Journal of Legal Studies, 26, 1–68.

MacQueen, J. (1967): “Some Methods for Classification and Analysis of Multivariate Observations,” in Proceedings of the 5th Berkeley Symposium of Mathematical Statistics and Probability, ed. by L. Le Cam, and J. Neyman. University of California Press: Berkeley.

Mundlak, Y. (1978): “On the Pooling of Time Series and Cross Section Data,” Econometrica, 46, 69–85.

Radchenko, P., and G. Mukherjee (2017): “Convex Clustering via $\ell_1$ Fusion Penalization,” Journal of the Royal Statistical Society Series B, forthcoming.

Su, L., Z. Shi, and P. C. Phillips (2016): “Identifying latent structures in panel data,” Econometrica, 84(6), 2215–2264.

Sun, Y. (2005): “Estimation and Inference in Panel Structure Models,” Working paper, University of California, San Diego.

Tan, K., and D. Witten (2015): “Statistical Properties of Convex Clustering,” Electronic Journal of Statistics, 9, 2324–2347.

van de Geer, S., P. Bühlmann, Y. Ritov, and R. Dezeure (2014): “On asymptotically optimal confidence regions and tests for high-dimensional models,” The Annals of Statistics, 42(3), 1166–1202.

van der Vaart, A. W., and J. A. Wellner (1996): Weak Convergence and Empirical Processes. Springer.

Zhu, C., H. Xu, C. Leng, and S. Yan (2014): “Convex optimization procedure for clustering: theoretical revisit,” in Advances in Neural Information Processing Systems, pp. 1619–1627.

Zou, H. (2006): “The adaptive lasso and its oracle properties,” Journal of the American Statistical Association, 101(476), 1418–1429.

8. Proofs

We begin by collecting some useful facts and defining additional notation. We will repeatedly make use of Knight's identity (see Koenker (2005), p. 121), which holds for $u \neq 0$:

(11) $\rho_\tau(u - v) - \rho_\tau(u) = -v\psi_\tau(u) + \int_0^v \big( I\{u \leq s\} - I\{u \leq 0\} \big)\, ds.$

Additionally, let $\gamma_i := (\alpha_i, \beta^\top)^\top$. The symbols $a_n \lesssim b_n$, $a_n \gtrsim b_n$ will mean that there exists a non-random constant $C \in (0, \infty)$ which is independent of $n, T, \tau$ such that $P(a_n \leq C b_n) = 1$ and $P(a_n \geq C b_n) = 1$, respectively. Define $\varepsilon^\tau_{it} := Y_{it} - Z_{it}^\top \gamma_i(\tau)$ and let $F_{\varepsilon^\tau_{it} | X_{it}}(u | X_{it}) = F_{Y_{it} | X_{it}}(Z_{it}^\top \gamma_i(\tau) + u | X_{it})$ denote the conditional cdf of $\varepsilon^\tau_{it}$ given $X_{it}$. When there is no risk of confusion, we will also write $\varepsilon_{it}$ instead of $\varepsilon^\tau_{it}$. Define $\psi_\tau(x) := (I\{x \leq 0\} - \tau)$.

8.1. Proof of Theorem 3.1.

We begin by stating some useful technical results which will be proved at the end of this section.
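Knight's identity (11), on which the arguments in this section rely, can be checked numerically. The Python sketch below is an illustration only; it assumes the sign convention $\psi_\tau(u) = \tau - I\{u < 0\}$ of Koenker (2005), under which the first term on the right-hand side reads $-v\psi_\tau(u)$, and it evaluates the integral term in closed form. The function names are hypothetical.

```python
import numpy as np


def rho(u, tau):
    """Check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0))


def knight_rhs(u, v, tau):
    """Right-hand side of Knight's identity for scalar u != 0 and scalar v.

    Uses psi_tau(u) = tau - 1{u < 0} (an assumed sign convention) and the
    closed form of the integral of 1{u <= s} - 1{u <= 0} over s from 0 to v.
    """
    psi = tau - (u < 0)
    if v >= 0:
        integral = max(v - u, 0.0) if u > 0 else 0.0
    else:
        integral = max(u - v, 0.0) if u < 0 else 0.0
    return -v * psi + integral


# Compare both sides on random draws; u = 0 has probability zero here.
rng = np.random.default_rng(0)
max_gap = 0.0
for _ in range(1000):
    u, v = rng.normal(size=2)
    tau = rng.uniform(0.05, 0.95)
    lhs = rho(u - v, tau) - rho(u, tau)
    max_gap = max(max_gap, abs(lhs - knight_rhs(u, v, tau)))
```

After the loop, `max_gap` holds the largest absolute difference between the two sides over the random draws, which should be of the order of floating point rounding error.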

Lemma 8.1. For any fixed $\beta \in \mathbb{R}^p$ define $\varepsilon^\tau_{it,\beta} := Y_{it} - X_{it}^\top \beta - \alpha_i(\tau)$. Then we have under assumptions (A1)-(A3)

$$\sum_{t=1}^T \rho_\tau(\varepsilon^\tau_{it,\beta} - a_1) - \rho_\tau(\varepsilon^\tau_{it,\beta} - a_2) = (a_1 - a_2) \sum_{t=1}^T \psi_\tau(\varepsilon^\tau_{it,\beta}) + \tilde r^{(1)}_{n,i}(a_1, a_2) + \tilde r^{(2)}_{n,i}(a_1, a_2)$$

where

$$\sup_i \tilde r^{(1)}_{n,i}(a_1, a_2) \lesssim T |a_1 - a_2| \max(|a_1|, |a_2|), \qquad \sup_i \tilde r^{(2)}_{n,i}(a_1, a_2) = |a_1 - a_2|\, O_P\big(T^{1/2} (\log n)^{1/2}\big).$$

Lemma 8.2. Under assumptions (A1)-(A3) there exist $\varepsilon > 0$, $\infty > c_1, c_2 > 0$ such that for all $i = 1, ..., n$

(12) $c_1 \|\gamma - \gamma_i\|^2 \geq E[\rho_\tau(Y_{it} - Z_{it}^\top \gamma)] - E[\rho_\tau(Y_{it} - Z_{it}^\top \gamma_i)] \geq c_2 \big( \|\gamma - \gamma_i\|^2 \wedge \varepsilon^2 \big).$

Lemma 8.3. Under assumption (A1) define for fixed $B \in \mathbb{R}$

(13) $s_{n,1}(B) := \sup_i \sup_{|\gamma| \leq B} \Big| \sum_t \Big( \rho_\tau(Y_{it} - Z_{it}^\top \gamma) - \rho_\tau(Y_{it}) - E[\rho_\tau(Y_{it} - Z_{it}^\top \gamma) - \rho_\tau(Y_{it})] \Big) \Big|.$

We have for any fixed $B < \infty$, provided that $\min(n, T) \to \infty$, $\log n = o(T)$,

(14) $s_{n,1}(B) = O_P\big(T^{1/2} (\log n)^{1/2}\big).$

Proof of Theorem 3.1

Step 1: first bounds

In this step we shall prove that(15) (cid:107) ˆ β − β (cid:107) + sup i | ˆ α i − α i | = O P ( T − / (log n ) / + Λ D n/T ) . Combine the results in Lemma 8.2 and Lemma 8.3 to ﬁnd that any minimizer of Θ( α , ..., α n , β )must satisfy c T (cid:88) i (cid:110) ( (cid:107) β − β (cid:107) + | α i − α i | ) ∧ ε (cid:111) (cid:46) ns n, + Λ D n . u and Volgushev 23 Let N ∆ := { i : (cid:107) β − β (cid:107) + | α i − α i | ≥ ∆ } . Then for any 0 < ∆ < ε T N ∆ ∆ = O P ( ns n, + Λ D n ) , i.e. by Lemma 8.3 N ∆ = nO P (( T − / (log n ) / ) + Λ D n/T )∆ − and in particular N ∆ = o P ( n ) as long as ∆ (cid:29) T − / (log n ) / + Λ D n/T . Provided thatΛ D n/T = o P (1) we obtain(16) (cid:107) ˆ β − β (cid:107) = O P ( T − / (log n ) / + Λ D n/T ) . Deﬁne D n,T := T − / (log n ) / + Λ D n/T . Next we will prove that, provided n Λ D = o P ( T / ) , T / (log n ) / / ( n Λ S ) = o P (1) alsosup i | ˆ α i − α i | = O P ( D n,T ) . To this end, it suﬃces to prove that sup i | ˆ α i − α i | = O P ( d n,T ) for any n Λ S (cid:29) d n,T (cid:29) D n,T .Deﬁne ˜ α i = (cid:26) ˆ α i if | ˆ α i − α i | ≤ d n ,α i + d / n sgn( ˆ α i − α i ) if | ˆ α i − α i | > d n . Deﬁne the set E := { i : ˜ α i = ˆ α i } . Observe that (cid:12)(cid:12)(cid:12) | ˜ α i − ˜ α j | − | ˆ α i − ˆ α j | (cid:12)(cid:12)(cid:12) ≤ | ˆ α i − ˜ α i | + | ˆ α j − ˜ α j | ∀ i, j, | ˜ α i − ˜ α j | − | ˆ α i − ˆ α j | < −| ˆ α i − α i | + d / n,T ∃ k : i ∈ I k ∩ E C , j ∈ I k ∩ E, | ˜ α i − ˜ α j | − | ˆ α i − ˆ α j | ≤ ∃ k : i, j ∈ I k . Panel Quantile regression with group fixed effects

Thus (cid:88) i,j λ i,j (cid:110) | ˆ α i − ˆ α j | − | ˜ α i − ˜ α j | (cid:111) = (cid:16) (cid:88) i ∈ E C (cid:88) j ∈ E + (cid:88) i ∈ E C (cid:88) j ∈ E C (cid:17) λ i,j (cid:110) | ˆ α i − ˆ α j | − | ˜ α i − ˜ α j | (cid:111) = 2 (cid:88) k (cid:88) i ∈ I k ∩ E C (cid:16) (cid:88) j ∈ I k ∩ E + (cid:88) j ∈ I Ck ∩ E (cid:17) λ i,j (cid:110) | ˆ α i − ˆ α j | − | ˜ α i − ˜ α j | (cid:111) + (cid:88) k (cid:88) i ∈ I k ∩ E C (cid:16) (cid:88) j ∈ I k ∩ E C + (cid:88) j ∈ I Ck ∩ E C (cid:17) λ i,j (cid:110) | ˆ α i − ˆ α j | − | ˜ α i − ˜ α j | (cid:111) ≥ (cid:88) k (cid:88) i ∈ I k ∩ E C (cid:16) Λ S | I k ∩ E | − n Λ D (cid:17) {| ˆ α i − α i | − d / n,T }− (cid:88) k (cid:88) i ∈ I k ∩ E C (cid:88) j ∈ I Ck ∩ E C λ i,j (cid:110) | ˆ α i − ˜ α i | + | ˆ α j − ˜ α j | (cid:111) ≥ (cid:88) k (cid:88) i ∈ I k ∩ E C (cid:16) Λ S | I k ∩ E | − n Λ D (cid:17) {| ˆ α i − α i | − d / n,T } + (cid:88) k (cid:88) i ∈ I k ∩ E C (cid:110) − n Λ D {| ˆ α i − α i | − d / n,T } − Λ D (cid:88) j ∈ I Ck ∩ E C {| ˆ α j − α j | − d / n,T } (cid:111) ≥ (cid:88) k (cid:88) i ∈ I k ∩ E C (cid:16) Λ S | I k ∩ E | − n Λ D (cid:17) {| ˆ α i − α i | − d / n,T } . Now since N d n = o P ( n ) and by deﬁnition of ˜ α n it follows that under (C)max k (cid:12)(cid:12)(cid:12) | I k ∩ E | nµ k − (cid:12)(cid:12)(cid:12) = o P (1) , and since by assumption Λ D / Λ S = o P (1) we obtain (cid:88) i,j λ i,j (cid:110) | ˆ α i − ˆ α j | − | ˜ α i − ˜ α j | (cid:111) (cid:38) n Λ S (cid:88) i ∈ E C {| ˆ α i − α i | − d / n,T } . 
u and Volgushev 25 Next we note that for any i with | ˆ α i − α i | ≥ (2 + c /c ) d n,T we have1 T (cid:88) t ρ τ ( Y it − X (cid:62) it ˆ β − ˆ α i ) − ρ τ ( Y it − X (cid:62) it ˆ β − ˜ α i ) ≥ (cid:90) ρ τ ( y − x (cid:62) ˆ β − ˆ α i ) − ρ τ ( y − x (cid:62) ˆ β − ˜ α i ) dP Y i ,X i ( x, y ) − s n, /T = (cid:90) ρ τ ( y − x (cid:62) ˆ β − ˆ α i ) − ρ τ ( y − x (cid:62) β − α i ) dP Y i ,X i ( x, y ) − (cid:90) ρ τ ( y − x (cid:62) ˆ β − ˜ α i ) − ρ τ ( y − x (cid:62) β − α i ) dP Y i ,X i ( x, y ) − s n, /T ≥ c ( {(cid:107) ˆ β − β (cid:107) + | ˆ α i − α i | } ∧ ε ) − c ( (cid:107) ˆ β − β (cid:107) + | ˜ α i − α i | ) − s n, /T> d n,T . For i with | ˆ α i − α i | < (2 + c /c ) d n,T note that by Lemma 8.1 (cid:12)(cid:12)(cid:12) (cid:88) t ρ τ ( Y it − X (cid:62) it ˆ β − ˆ α i ) − ρ τ ( Y it − X (cid:62) it ˆ β − ˜ α i ) (cid:12)(cid:12)(cid:12) (cid:46) | ˆ α i − ˜ α i | (cid:16) (cid:88) t ψ τ ( ε τit, ˆ β ) + T d / n,T + O P ( T / (log n ) / ) (cid:17) (cid:46) {| ˆ α i − α i | − d / n,T } (cid:16) T (cid:107) ˆ β − β (cid:107) + T d / n,T + O P ( T / (log n ) / ) (cid:17) where the O P terms are uniform in i . Thus (cid:88) i (cid:88) t ρ τ ( Y it − X (cid:62) it ˆ β − ˆ α i ) − ρ τ ( Y it − X (cid:62) it ˆ β − ˜ α i ) (cid:38) − (cid:16) T d / n,T + O P ( T / (log n ) / ) (cid:17) (cid:88) i ∈ E C {| ˆ α i − α i | − d / n,T } Summarizing we have proved thatΘ( ˆ α , ..., ˆ α n , ˆ β ) − Θ( ˜ α , ..., ˜ α n , ˆ β ) (cid:38) (cid:104) n Λ S − (cid:16) T d / n,T + O P ( T / (log n ) / ) (cid:17)(cid:105) (cid:88) i ∈ E C {| ˆ α i − α i | − d / n,T } . Under the conditions n Λ D = o P ( T / ) , T / (log n ) / / ( n Λ S ) = o P (1) the last line is strictlypositive with probability tending to one unless E C = ∅ with probability tending to one.Thus the proof of (15) is complete. Step 2: recovery of clusters with probability to one

To simplify notation, assume that individual 1 , ..., N belongs to cluster 1, individual N + 1 , ..., N + N to cluster 2 and so on. Since all cluster can be handled by similararguments we only consider the ﬁrst cluster. Let ˆ α (1) , ..., ˆ α ( L ) denote the distinct values ofˆ α , ..., ˆ α N , ordered in increasing order, and let n ,k := { i : ˆ α i = ˆ α ( k ) } . Again, to simplifynotation assume w.o.l.g. that ˆ α = ... = ˆ α n , = ˆ α (1) . To prove the result, we proceed in an Panel Quantile regression with group fixed effects iterative way. We will prove by contradiction that L = 1, i.e. all estimators of individualsfrom cluster 1 take the same value. Assume that L ≥ n , > N /

2. Assume that n , < N /

2. Deﬁne˜ α i = ˆ α (2) for i = 1 , ..., n , and ˜ α i = ˆ α i for i > n , . By (15) Lemma 8.1 we ﬁnd that (cid:12)(cid:12)(cid:12) (cid:88) i T (cid:88) t =1 ρ τ ( Y it − X (cid:62) it ˆ β − ˆ α i ) − ρ τ ( Y it − X (cid:62) it ˆ β − ˜ α i ) (cid:12)(cid:12)(cid:12) (cid:46) n , ( ˆ α (2) − ˆ α (1) ) (cid:110)(cid:12)(cid:12)(cid:12) (cid:88) t ψ τ ( ε τit, ˆ β ) (cid:12)(cid:12)(cid:12) + O P ( T / (log n ) / ) + O P ( T / (log n ) / ) (cid:111) (cid:46) n , ( ˆ α (2) − ˆ α (1) ) O P ( T / (log n ) / )Next, observe that by construction, under (C) and using the fact that sup i | ˆ α i − α i | = o P (1), | ˜ α i − ˜ α j | − | ˆ α i − ˆ α j | = −| ˆ α (2) − ˆ α (1) | , ≤ i ≤ n , , n , < j ≤ N or 1 ≤ j ≤ n , , n , < i ≤ N , (cid:12)(cid:12)(cid:12) | ˜ α i − ˜ α j | − | ˆ α i − ˆ α j | (cid:12)(cid:12)(cid:12) ≤ | ˆ α (2) − ˆ α (1) | , ≤ i ≤ n , , N < j or 1 ≤ j ≤ n , , N < i, | ˜ α i − ˜ α j | − | ˆ α i − ˆ α j | = 0 , else . From this we obtainΘ( ˜ α , ..., ˜ α n , ˆ β ) − Θ( ˆ α , ..., ˆ α n , ˆ β ) (cid:46) n , ( ˆ α (2) − ˆ α (1) ) O P ( T / (log n ) / ) − ( ˆ α (2) − ˆ α (1) )Λ S n , ( N − n , )+ ( ˆ α (2) − ˆ α (1) ) n , Λ D O P ( n ) < n, T since by assumption Λ D / Λ S = o P (1) , n Λ S (cid:29) T / (log n ) / and since we assumed n , < N / N − n , ≥ N / (cid:38) n . However, this is a contradiction to the fact that ˆ α , ..., ˆ α n , ˆ β minimizes Θ.In a similar fashion, one can prove that n ,L > N /

2. Just deﬁne ˜ α N , ..., ˜ α N − n ,L +1 =ˆ α ( L − and proceed as above. Since n ,L + n , ≤ N and we have already proved that n , > N / L ≥

2, and hence L = 1. All other clusters can be handled in a similar fashion and that completes the proof of the second step. □

Proof of Lemma 8.1

Apply Knight's identity (11) to find that
\[
\sum_{t=1}^{T} \big\{ \rho_\tau(\varepsilon^{\tau}_{it,\beta} - \delta) - \rho_\tau(\varepsilon^{\tau}_{it,\beta}) \big\}
= -\delta \sum_t \psi_\tau(\varepsilon^{\tau}_{it,\beta})
+ \sum_t \int_0^{\delta} E\big[ I\{\varepsilon^{\tau}_{it,\beta} \le s\} - I\{\varepsilon^{\tau}_{it,\beta} \le 0\} \big] \, ds
+ \int_0^{\delta} \sum_t \Big\{ I\{\varepsilon^{\tau}_{it,\beta} \le s\} - I\{\varepsilon^{\tau}_{it,\beta} \le 0\} - E\big[ I\{\varepsilon^{\tau}_{it,\beta} \le s\} - I\{\varepsilon^{\tau}_{it,\beta} \le 0\} \big] \Big\} \, ds .
\]
Hence it follows that
\[
\sum_{t=1}^{T} \big\{ \rho_\tau(\varepsilon^{\tau}_{it,\beta} - a_1) - \rho_\tau(\varepsilon^{\tau}_{it,\beta} - a_2) \big\}
= (a_2 - a_1) \sum_t \psi_\tau(\varepsilon^{\tau}_{it,\beta})
+ \int_{a_2}^{a_1} \sum_t E\big[ I\{\varepsilon^{\tau}_{it,\beta} \le s\} - I\{\varepsilon^{\tau}_{it,\beta} \le 0\} \big] \, ds
+ \int_{a_2}^{a_1} \sum_t \Big\{ I\{\varepsilon^{\tau}_{it,\beta} \le s\} - I\{\varepsilon^{\tau}_{it,\beta} \le 0\} - E\big[ I\{\varepsilon^{\tau}_{it,\beta} \le s\} - I\{\varepsilon^{\tau}_{it,\beta} \le 0\} \big] \Big\} \, ds
=: (a_2 - a_1) \sum_{t=1}^{T} \psi_\tau(\varepsilon^{\tau}_{it,\beta}) + \tilde r^{(1)}_{n,i}(a_1, a_2) + \tilde r^{(2)}_{n,i}(a_1, a_2) .
\]
Now by a Taylor expansion
\[
\sup_i \Big| \sum_t \int_{a_2}^{a_1} E\big[ I\{\varepsilon^{\tau}_{it,\beta} \le s\} - I\{\varepsilon^{\tau}_{it,\beta} \le 0\} \big] \, ds \Big|
= \sup_i \Big| \int_{a_2}^{a_1} \sum_t E\big[ F_{Y_i|X_i}(\beta^\top X_{it} + s \mid X_{it}) - F_{Y_i|X_i}(\beta^\top X_{it} \mid X_{it}) \big] \, ds \Big|
\lesssim T \, |a_1 - a_2| \max(|a_1|, |a_2|),
\]
so the bound on $\tilde r^{(1)}_{n,i}(a_1, a_2)$ is established. Next define the classes of functions
\[
\mathcal{G}_1 := \big\{ (y, x) \mapsto I\{y - \beta^\top x \le s\} - I\{y - \beta^\top x \le 0\} \,\big|\, s \in \mathbb{R}, \beta \in \mathbb{R}^d \big\},
\qquad
\mathcal{G}_2 := \big\{ (y, x) \mapsto I\{y - \beta^\top x \le s\} \,\big|\, s \in \mathbb{R}, \beta \in \mathbb{R}^d \big\} .
\]
Note that the class of functions $\mathcal{G}_2$ has envelope function $F \equiv 1$. Thus by Lemma 2.6.15 and Theorem 2.6.7 of van der Vaart and Wellner (1996) the class of functions $\mathcal{G}_2$ satisfies, for any probability measure $Q$,
\[
N(\varepsilon, \mathcal{G}_2, L_2(Q)) \le K (1/\varepsilon)^{V}
\]
for some finite constants $K, V$ (here, $N(\varepsilon, \mathcal{G}, L_2(Q))$ denotes the covering number, see Section 2.1 of van der Vaart and Wellner (1996)). Moreover, $\mathcal{G}_1 \subseteq \{ g_1 - g_2 \mid g_1, g_2 \in \mathcal{G}_2 \}$, and elementary computations with covering numbers show that $N(\varepsilon, \mathcal{G}_1, L_2(Q)) \le \tilde K (1/\varepsilon)^{\tilde V}$ for some finite constants $\tilde V, \tilde K$. Hence we find that by Theorem 2.14.9 of van der Vaart and Wellner (1996), for any $h > 0$,
\[
P^{*}\Big( \sup_{g \in \mathcal{G}_1} \frac{1}{\sqrt{T}} \Big| \sum_t g(Y_{it}, X_{it}) - E[g(Y_{it}, X_{it})] \Big| \ge h \Big) \le \Big( \frac{D h}{\sqrt{\tilde V}} \Big)^{\tilde V} e^{-2h^2}
\]
for some constant $D$ that depends only on $\tilde K$ (here, $P^{*}$ denotes outer probability). Letting $h = \sqrt{\log n}$ and applying the union bound for probabilities we obtain
\[
\sup_i \sup_{\beta \in \mathbb{R}^d, s \in \mathbb{R}} \Big| \sum_t \Big\{ I\{\varepsilon^{\tau}_{it,\beta} \le s\} - I\{\varepsilon^{\tau}_{it,\beta} \le 0\} - E\big[ I\{\varepsilon^{\tau}_{it,\beta} \le s\} - I\{\varepsilon^{\tau}_{it,\beta} \le 0\} \big] \Big\} \Big| = O_P\big( T^{1/2} (\log n)^{1/2} \big) .
\]
Hence
\[
\sup_i \sup_{\beta \in \mathbb{R}^d} \Big| \int_{a_2}^{a_1} \sum_t \Big\{ I\{\varepsilon^{\tau}_{it,\beta} \le s\} - I\{\varepsilon^{\tau}_{it,\beta} \le 0\} - E\big[ I\{\varepsilon^{\tau}_{it,\beta} \le s\} - I\{\varepsilon^{\tau}_{it,\beta} \le 0\} \big] \Big\} \, ds \Big| = O_P\big( T^{1/2} (\log n)^{1/2} \big) \, |a_1 - a_2| .
\]
Thus the bound on $\tilde r^{(2)}_{n,i}(a_1, a_2)$ follows and the proof is complete. □

Proof of Lemma 8.2

Observe that by Knight's identity (11)
\[
\begin{aligned}
E\big[ \rho_\tau(Y_{it} - Z_{it}^\top \gamma) - \rho_\tau(Y_{it} - Z_{it}^\top \gamma_i) \big]
&= E\big[ \rho_\tau(Y_{it} - Z_{it}^\top \gamma_i - Z_{it}^\top(\gamma - \gamma_i)) - \rho_\tau(Y_{it} - Z_{it}^\top \gamma_i) \big] \\
&= E\Big[ -(\gamma - \gamma_i)^\top Z_{it} \psi_\tau(\varepsilon_{it}) + \int_0^{(\gamma - \gamma_i)^\top Z_{it}} I\{\varepsilon_{it} \le s\} - I\{\varepsilon_{it} \le 0\} \, ds \Big] \\
&= E\Big[ \int_0^{(\gamma - \gamma_i)^\top Z_{it}} F_{\varepsilon_{it}|X_{it}}(s \mid X_{it}) - F_{\varepsilon_{it}|X_{it}}(0 \mid X_{it}) \, ds \Big] .
\end{aligned}
\]
Now under assumption (A2), $| F_{\varepsilon_{it}|X_{it}}(s \mid X_{it}) - F_{\varepsilon_{it}|X_{it}}(0 \mid X_{it}) | \le |s| f'$ a.s., and thus given (A1)
\[
E\Big| \int_0^{(\gamma - \gamma_i)^\top Z_{it}} F_{\varepsilon_{it}|X_{it}}(s \mid X_{it}) - F_{\varepsilon_{it}|X_{it}}(0 \mid X_{it}) \, ds \Big|
\le f' E\big[ ((\gamma - \gamma_i)^\top Z_{it})^2 \big] \le M^2 f' \|\gamma - \gamma_i\|^2 .
\]
This shows the upper bound in (12). For the lower bound, note that $s \mapsto F_{\varepsilon_{it}|X_{it}}(s \mid X_{it})$ is non-decreasing almost surely. Moreover, $f_{\varepsilon_{it}|X_{it}}(0 \mid X_{it}) \ge f_{\min}$ a.s. by (A3) and thus by (A2) and (A3) we have almost surely
\[
\inf_{|s| \le f_{\min}/(2 f')} f_{\varepsilon_{it}|X_{it}}(s \mid X_{it}) \ge f_{\min}/2 .
\]
Define $\delta_i := (\gamma - \gamma_i) \min\{ 1, f_{\min} / (2 M f' \|\gamma - \gamma_i\|) \}$. Noting that $s \mapsto F_{\varepsilon_{it}|X_{it}}(s \mid X_{it})$ is non-decreasing almost surely, it follows that a.s.
\[
\int_0^{(\gamma - \gamma_i)^\top Z_{it}} F_{\varepsilon_{it}|X_{it}}(s \mid X_{it}) - F_{\varepsilon_{it}|X_{it}}(0 \mid X_{it}) \, ds
\ge \int_0^{\delta_i^\top Z_{it}} F_{\varepsilon_{it}|X_{it}}(s \mid X_{it}) - F_{\varepsilon_{it}|X_{it}}(0 \mid X_{it}) \, ds
\ge \frac{f_{\min}}{4} (\delta_i^\top Z_{it})^2 ,
\]
where the last inequality follows since by definition $|\delta_i^\top Z_{it}| \le f_{\min}/(2 f')$ a.s. Finally, under assumption (A1), $E[(\delta_i^\top Z_{it})^2] \ge \|\delta_i\|^2 c_\lambda$.

Summarizing, we find
\[
E\big[ \rho_\tau(Y_{it} - Z_{it}^\top \gamma) - \rho_\tau(Y_{it} - Z_{it}^\top \gamma_i) \big]
\ge \frac{f_{\min} c_\lambda}{4} \|\delta_i\|^2
= \frac{f_{\min} c_\lambda}{4} \Big( \|\gamma - \gamma_i\|^2 \wedge \frac{f_{\min}^2}{4 M^2 f'^2} \Big),
\]
which proves the lower bound in (12). Thus the proof of the Lemma is complete. □

Proof of Lemma 8.3

Consider the class of functions
\[
\mathcal{G}_B := \Big\{ (y, z) \mapsto g_\gamma(y, z) := \frac{ (\rho_\tau(y - z^\top \gamma) - \rho_\tau(y)) I\{|z| \le M\} + M B }{ 2 M B } \,\Big|\, \|\gamma\| \le B \Big\} .
\]
Note that by construction $0 \le g_\gamma(y, z) \le 1$ for $\|\gamma\| \le B$ and moreover $\sup_{y,z} | g_\gamma(y, z) - g_{\gamma'}(y, z) | \le \|\gamma - \gamma'\| / (2B)$. This shows the existence of constants $V, K_B < \infty$ such that for all $i = 1, \dots, n$
\[
N_{[\,]}(\varepsilon, \mathcal{G}_B, L_2(P_i)) \le (K_B / \varepsilon)^{V} \quad \text{for } 0 < \varepsilon < K_B ,
\]
where $K_B$ depends on $B$ only and $P_i$ denotes the measure corresponding to $(Y_i, Z_i)$. Thus we have by Theorem 2.14.9 of van der Vaart and Wellner (1996),
\[
P^{*}\Big( \sup_\gamma \frac{1}{\sqrt{T}} \Big| \sum_t g_\gamma(Y_{it}, Z_{it}) - E[g_\gamma(Y_{it}, Z_{it})] \Big| \ge h \Big) \le \Big( \frac{D_B h}{\sqrt{V}} \Big)^{V} e^{-2h^2} ,
\]
where the constant $D_B$ depends only on $K_B$ and $P^{*}$ denotes outer probability. Set $h = \sqrt{\log n}$ to bound the right-hand side above by $o(n^{-1})$. Defining the events
\[
E_{i,n} := \Big\{ \sup_\gamma \frac{1}{\sqrt{T}} \Big| \sum_t g_\gamma(Y_{it}, Z_{it}) - E[g_\gamma(Y_{it}, Z_{it})] \Big| \ge \sqrt{\log n} \Big\}
\]
we obtain
\[
P^{*}(\cup_i E_{i,n}) \le n \sup_i P^{*}(E_{i,n}) \le n \, o(n^{-1}) = o(1) .
\]
Finally, note that under (A1) we have a.s.
\[
\frac{ \rho_\tau(Y_{it} - Z_{it}^\top \gamma) - \rho_\tau(Y_{it}) - E[ \rho_\tau(Y_{it} - Z_{it}^\top \gamma) - \rho_\tau(Y_{it}) ] }{ 2 M B } = g_\gamma(Y_{it}, Z_{it}) - E[g_\gamma(Y_{it}, Z_{it})] \quad \forall \, i, t .
\]
This completes the proof. □
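Knight's identity (11) is used in the proofs of Lemmas 8.1 and 8.2 above. The following minimal Python sketch checks it numerically on simulated data. It assumes the conventions $\rho_\tau(u) = u(\tau - I\{u < 0\})$ and $\psi_\tau(u) = \tau - I\{u \le 0\}$, which are not restated in this section, and the helper names `rho`, `psi` and `knight_integral` are hypothetical.

```python
import numpy as np


def rho(u, tau):
    """Check function rho_tau(u) = u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))


def psi(u, tau):
    """psi_tau(u) = tau - 1{u <= 0}; this sign convention is an assumption."""
    u = np.asarray(u, dtype=float)
    return tau - (u <= 0)


def knight_integral(u, v):
    """Closed form of the integral from 0 to v of (1{u <= s} - 1{u <= 0}) ds."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    pos = (u > 0) * np.maximum(v - u, 0.0)   # non-zero only when v > u > 0
    neg = (u <= 0) * np.maximum(u - v, 0.0)  # non-zero only when v < u <= 0
    return pos + neg


rng = np.random.default_rng(0)
u = rng.normal(size=10_000)  # plays the role of the residual epsilon
v = rng.normal(size=10_000)  # plays the role of the shift delta
tau = 0.75

lhs = rho(u - v, tau) - rho(u, tau)
rhs = -v * psi(u, tau) + knight_integral(u, v)
max_gap = float(np.max(np.abs(lhs - rhs)))
print(max_gap)
```

Under these conventions the two sides agree up to floating-point error, for both signs of the shift.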

Proof of Theorem 3.2.

We begin by stating a useful technical result that will be proved at the end of this section.

Lemma 8.4.

Under assumptions (A1)-(A3)
\[
\sum_{t=1}^{T} \big\{ \rho_\tau(Y_{it} - Z_{it}^\top(\gamma + \delta)) - \rho_\tau(Y_{it} - Z_{it}^\top \gamma) \big\}
= \delta^\top \sum_{t=1}^{T} Z_{it} \psi_\tau(\varepsilon_{it}) + \frac{1}{2} T \delta^\top E\big[ Z_{it} Z_{it}^\top f_{\varepsilon_i|X_i}(0 \mid X_{it}) \big] \delta + r^{(1)}_{n,i}(\delta) + r^{(2)}_{n,i}(\delta),
\]
where, defining $\ell_{n,T} := \max\{\log n, \log T\}$, there exists a constant $C$ independent of $n, T, \delta$ such that
\[
\sup_i \sup_{T^{-1} \ell_{n,T} \le \|\delta\| \le 1} \frac{ | r^{(1)}_{n,i}(\delta) | }{ \|\delta\|^{3/2} } = O_P\big( T^{1/2} \ell_{n,T}^{1/2} \big), \qquad \sup_i | r^{(2)}_{n,i}(\delta) | \le T C \|\delta\|^{3} .
\]

Proof of Theorem 3.2

The proof proceeds in several steps. First, we note that the 'oracle' estimation problem (6) corresponds to a classical, fixed-dimensional quantile regression with true parameter vector $(\alpha_{(01)}, \dots, \alpha_{(0K)}, \beta^\top)$ and $nT$ independent observations $(Y_{it}, \tilde Z_{it})$, where $\tilde Z_{it}^\top = (e_k^\top, X_{it}^\top)$ for $i \in I_k$, $t = 1, \dots, T$, and $e_k$ denotes the $k$'th unit vector in $\mathbb{R}^K$.

A straightforward extension of classical proof techniques in parametric quantile regressionshows that under assumptions (A1)-(A3) and (C) the oracle estimator is asymptoticallynormal as claimed.Second, we observe that by deﬁnition of the optimization problem the estimated groupstructure ˆ I ,(cid:96) , ..., ˆ I K (cid:96) ,(cid:96) is the same for all values of (cid:96) with λ (cid:96) that give rise to the same numberof groups. Since the value of IC ( (cid:96) ) depends only on ˆ I ,(cid:96) , ..., ˆ I K (cid:96) ,(cid:96) , it suﬃces to minimize IC over those values of (cid:96) that correspond to diﬀerent numbers of groups. Denote thedistinct estimated numbers of groups by ˆ K , ..., ˆ K R , the corresponding estimated groupingsby ˆ I (1 ˆ K r ) , ..., ˆ I ( ˆ K r ˆ K r ) , and the corresponding values of IC by IC ˆ K , ..., IC ˆ K R . By assumption(G) and Theorem 3.1, the probability of the event(17) P (cid:16) ∃ r : ˆ K r = K, ˆ I ( k ˆ K r ) = I k , k = 1 , ..., K (cid:17) → . Hence it suﬃces to prove that(18) P (cid:16) arg min r IC K r = K (cid:17) → . Once this result is established, we directly obtain P (cid:16) ( ˆ α IC , ..., ˆ α IC ˆ K IC , ( ˆ β IC ) (cid:62) ) = ( ˆ α ( OR )(1) , ..., ˆ α ( OR )( K ) , ( ˆ β ( OR ) ) (cid:62) ) (cid:17) → , and thus the asymptotic distribution of ( ˆ α IC , ..., ˆ α IC ˆ K IC , ( ˆ β IC ) (cid:62) ) matches that of the oracleestimator.We will now prove (18). From Theorem 3.2 in Kato, Galvao, and Montes-Rojas (2012)we know that under (A1)-(A3) and the additional assumptions that n → ∞ but T growsat most polynomially in n ˇ β − β = O P (( T / log n ) − / ∨ ( nT ) − / ) . If n → ∞ and T grows at most polynomially in n it follows that ˇ β − β = o P ( T − / ).Moreover, standard quantile regression arguments show thatˇ α i − α i = − E [ f ε τit | X it (0 | X it )] 1 T (cid:88) t ψ τ ( ε τit ) + R n,i u and Volgushev 31 where sup i | R n,i | = O p (cid:16)(cid:16) log TT (cid:17) / (cid:17) . 
Next apply Lemma 8.4 to ﬁnd that provided (log T ) (log n ) T → (cid:88) i,t ρ τ ( Y it − Z (cid:62) it ˇ γ i ) − ρ τ ( ε τit )= (cid:88) i (ˇ γ i − γ i ) (cid:62) (cid:88) t Z it ψ τ ( ε τit ) + T (cid:88) i (ˇ γ i − γ i ) (cid:62) E [ Z i Z (cid:62) i f ε τi | X i (0 | X i )](ˇ γ i − γ i ) + o P ( n )= (cid:88) i ( ˇ α i − α i ) (cid:88) t ψ τ ( ε τit ) + T (cid:88) i ( ˇ α i − α i ) E [ f ε τi | X i (0 | X i )] + o P ( n )= − (cid:88) i E [ f ε τi | X i (0 | X i )] (cid:16) √ T (cid:88) t ψ τ ( ε τit ) (cid:17) + o P ( n )= − (cid:88) i τ (1 − τ )2 E [ f ε τi | X i (0 | X i )] + o P ( n ) . Next, observe that by asymptotic normality of the oracle estimatorsup k =1 ,...,K (cid:107) ˆ γ ( OR )( k ) − γ (0 k ) (cid:107) = O P (( nT ) − / )where we deﬁned ˆ γ ( OR )( k ) := ( ˆ α ( OR )( k ) , ˆ β ( OR ) ). Again applying Lemma 8.4 we obtain (cid:88) k (cid:88) i ∈ I k (cid:88) t ρ τ ( Y it − Z (cid:62) it ˆ γ ( OR )( k ) ) − ρ τ ( ε τit )= (cid:88) k (ˆ γ ( OR )( k ) − γ (0 k ) ) (cid:62) (cid:88) i ∈ I k (cid:88) t Z it ψ τ ( ε it ) + nT O P (cid:16) sup k (cid:107) ˜ γ k − γ (0 k ) (cid:107) (cid:17) + o P ( n )= o P ( n ) . Combining the results obtained so far we have(19) (cid:88) k (cid:88) i ∈ I k (cid:88) t ρ τ ( Y it − Z (cid:62) it ˆ γ ( OR )( k ) ) − inf α ,...,α n ,β (cid:88) i,t ρ τ ( Y it − X (cid:62) it β − α i ) ≥ − (cid:88) i τ (1 − τ )2 E [ f ε i | X i (0 | X i )] + o P ( n ) . Next, let V n ( L ) denote the set of all disjoint partitions of { , ..., n } into L subsets. Observethat by (12) we have under assumption (C)inf L

Summarizing, we ﬁnd that under (C)inf

L K we have by (17), (19) and the assumptions on p n,T , ˆ C , with probability tending toone IC K r − inf s IC K s (cid:38) p n,T − n + o P ( n ) (cid:29) . It follows that, with probability tending to one, arg min (cid:96) IC ( (cid:96) ) ≤ K . Moreover, for K r < K we have by (20) and the assumptions on p n,T , ˆ C , with probability tending to one IC K r − inf s IC K s (cid:38) − Kp n,T + nT − O P ( nT / (log n ) / ) (cid:29) . Hence, with probability tending to one, K ≥ arg min r IC K r ≥ K and thus (18) follows. (cid:50) Proof of Lemma 8.4

By Knight’s identity (11) we have ρ τ ( Y it − Z (cid:62) it ( γ i + δ )) − ρ τ ( Y it − Z (cid:62) it γ i )= − δ (cid:62) Z it ψ τ ( ε it ) + (cid:90) Z (cid:62) it δ F ε i | X i ( s | X it ) − F ε i | X i (0 | X it ) ds + (cid:90) Z (cid:62) it δ I { ε it ≤ s } − I { ε it ≤ } − ( F ε i | X i ( s | X it ) − F ε i | X i (0 | X it )) ds. Deﬁne r (1) n,i ( δ ) := (cid:88) t (cid:90) Z (cid:62) it δ I { ε it ≤ s } − I { ε it ≤ } − ( F ε i | X i ( s | X it ) − F ε i | X i (0 | X it )) ds − T δ (cid:62) E [ Z i Z (cid:62) i f ε i | X i (0 | X i )] + (cid:88) t f ε i | X i (0 | X it )( Z (cid:62) it δ ) ,r (2) n,i ( δ ) := (cid:88) t (cid:110) (cid:90) Z (cid:62) it δ F ε i | X i ( s | X it ) − F ε i | X i (0 | X it ) ds − f ε i | X i (0 | X it )( Z (cid:62) it δ ) (cid:111) . By a Taylor expansion we obtain (cid:12)(cid:12)(cid:12) (cid:90) Z (cid:62) it δ F ε i | X i ( s | X it ) − F ε i | X i ( s | X it ) ds − f ε i | X i (0 | X it )( Z (cid:62) it δ ) (cid:12)(cid:12)(cid:12) ≤ ( Z (cid:62) it δ ) f (cid:48) ≤ M f (cid:48) (cid:107) δ (cid:107) , and thus the bound on r ( i ) n, is established. Next we note that E (cid:104) (cid:90) Z (cid:62) it δ I { ε it ≤ s } − I { ε it ≤ } − ( F ε i | X i ( s | X it ) − F ε i | X i (0 | X it )) ds (cid:105) = 0 u and Volgushev 33 since the conditional expectation given Z it equals zero almost surely and moreover | I it ( δ ) | := (cid:12)(cid:12)(cid:12) (cid:90) Z (cid:62) it δ I { ε it ≤ s } − I { ε it ≤ } − ( F ε i | X i ( s | X it ) − F ε i | X i (0 | X it )) ds (cid:12)(cid:12)(cid:12) ≤ M (cid:107) δ (cid:107) + M (cid:107) δ (cid:107) I {| ε it | ≤ M (cid:107) δ (cid:107)} , | I it ( δ ) − I it ( δ (cid:48) ) | ≤ M (cid:107) δ − δ (cid:48) (cid:107) . Note that in particular for (cid:107) δ (cid:107) ≤ | I it ( δ ) | ≤ M ( M + 1) (cid:107) δ (cid:107) , E [ I it ( δ )] ≤ M + 4 M f (cid:48) ) (cid:107) δ (cid:107) . 
Deﬁne c ,M := M ( M + 1) , c ,M := 2( M + 4 M f (cid:48) ) and apply the Bernstein inequality toshow that for any 1 ≥ (cid:107) δ (cid:107) ≥ T − (cid:96) n,T , < a < ∞ P (cid:16)(cid:12)(cid:12)(cid:12) (cid:88) t I it ( δ ) (cid:12)(cid:12)(cid:12) > a(cid:96) / n,T T / (cid:107) δ (cid:107) / (cid:17) ≤ (cid:16) − a (cid:96) n,T T (cid:107) δ (cid:107) / T c ,M (cid:107) δ (cid:107) + ac ,M (cid:96) / n,T T / (cid:107) δ (cid:107) / / (cid:17) = 2 exp (cid:16) − a (cid:96) n,T / c ,M + ac ,M (cid:96) / n,T ( T (cid:107) δ (cid:107) ) − / / (cid:17) . For 0 < a < (cid:96) / n,T c ,M /c ,M the last line above is bounded by 2( n ∨ T ) − a / (4 c ,M ) . Denoteby G T a grid of values δ , ..., δ | G T | such that T − ≤ (cid:107) δ j (cid:107) ≤ j ∈ G T andsup (cid:96) n,T T − ≤(cid:107) δ (cid:107)≤ inf ˜ δ ∈ G T (cid:107) δ − ˜ δ (cid:107) = o ( T − ) . Note that it is possible to ﬁnd such a G T with | G T | = O ( T d +1) ). It follows thatsup i sup (cid:96) n,T T − ≤(cid:107) δ (cid:107)≤ (cid:12)(cid:12)(cid:12) (cid:80) t I it ( δ ) (cid:12)(cid:12)(cid:12) (cid:107) δ (cid:107) / ≤ sup i sup δ ∈ G T (cid:12)(cid:12)(cid:12) (cid:80) t I it ( δ ) (cid:12)(cid:12)(cid:12) (cid:107) δ (cid:107) / + T / T M o ( T − )= sup i sup δ ∈ G T (cid:12)(cid:12)(cid:12) (cid:80) t I it ( δ ) (cid:12)(cid:12)(cid:12) (cid:107) δ (cid:107) / + o ( T / ) . Finally, note that for 0 < a < (cid:96) / n,T c ,M /c ,M P (cid:16) sup i sup δ ∈ G T (cid:12)(cid:12)(cid:12) (cid:80) t I it ( δ ) (cid:12)(cid:12)(cid:12) (cid:107) δ (cid:107) / > a(cid:96) / n,T T / (cid:17) ≤ n | G T | n ∨ T ) − a / (4 c ,M ) = O ( nT d +1) )( n ∨ T ) − a / (4 c ,M ) . Since (cid:96) n,T → ∞ we can pick a such that the last line above is o (1), and hencesup i sup δ ∈ G T (cid:12)(cid:12)(cid:12) (cid:80) t I it ( δ ) (cid:12)(cid:12)(cid:12) (cid:107) δ (cid:107) / = O P ( (cid:96) / n,T T / ) . Panel Quantile regression with group fixed effects

Finally, observe that, denoting by (cid:107) A (cid:107) ∞ the maximum norm of the entries of the matrix A ,sup i (cid:12)(cid:12)(cid:12) − T δ (cid:62) E [ Z i Z (cid:62) i f ε i | X i (0 | X i )] + (cid:88) t f ε i | X i (0 | X it )( Z (cid:62) it δ ) (cid:12)(cid:12)(cid:12) = sup i (cid:12)(cid:12)(cid:12) δ (cid:62) (cid:110) (cid:88) t Z i Z (cid:62) i f ε i | X i (0 | X i ) − E [ Z i Z (cid:62) i f ε i | X i (0 | X i )] (cid:111) δ (cid:12)(cid:12)(cid:12) (cid:46) (cid:107) δ (cid:107) sup i (cid:13)(cid:13)(cid:13) (cid:88) t Z i Z (cid:62) i f ε i | X i (0 | X i ) − E [ Z i Z (cid:62) i f ε i | X i (0 | X i )] (cid:13)(cid:13)(cid:13) = (cid:107) δ (cid:107) O P ( (cid:112) T log n )where the last line follows by a straightforward application of the Hoeﬀding inequality. Thusthe proof of Lemma 8.4 is complete. (cid:50) u and Volgushev 35 Normal error t errorn T 1 2 3 4 ≥ ≥ DGP1: Independence between α i and x it . Model 1: location shift model30 15 0 0 .

074 0 .

504 0 .

324 0 .

098 0 0 .

168 0 .

490 0 .

266 0 . .

002 0 .

803 0 .

167 0 .

028 0 0 .

007 0 .

744 0 .

202 0 . .

000 0 .

984 0 .

016 0 .

000 0 0 .

000 0 .

966 0 .

032 0 . .

056 0 .

502 0 .

315 0 .

128 0 0 .

122 0 .

428 0 .

319 0 . .

000 0 .

856 0 .

127 0 .

018 0 0 .

002 0 .

767 0 .

186 0 . .

000 0 .

992 0 .

008 0 .

000 0 0 .

000 0 .

978 0 .

021 0 . .

040 0 .

465 0 .

339 0 .

156 0 0 .

113 0 .

392 0 .

322 0 . .

000 0 .

872 0 .

114 0 .

013 0 0 .

000 0 .

778 0 .

182 0 . .

000 0 .

996 0 .

004 0 .

000 0 0 .

000 0 .

984 0 .

016 0 . .

059 0 .

512 0 .

336 0 .

092 0 0 .

158 0 .

496 0 .

266 0 . .

000 0 .

814 0 .

159 0 .

026 0 0 .

006 0 .

759 0 .

198 0 . .

000 0 .

982 0 .

017 0 .

000 0 0 .

000 0 .

964 0 .

034 0 . .

038 0 .

520 0 .

324 0 .

118 0 0 .

112 0 .

437 0 .

330 0 . .

000 0 .

857 0 .

126 0 .

016 0 0 .

002 0 .

776 0 .

182 0 . .

000 0 .

994 0 .

006 0 .

000 0 0 .

000 0 .

980 0 .

020 0 . .

032 0 .

506 0 .

318 0 .

144 0 0 .

108 0 .

372 0 .

328 0 . .

000 0 .

876 0 .

110 0 .

013 0 0 .

001 0 .

794 0 .

170 0 . .

000 0 .

992 0 .

008 0 .

000 0 0 .

000 0 .

979 0 .

020 0 . DGP2: Correlation between α i and x it . Model 1: location shift model30 15 0 0 .

080 0 .

496 0 .

320 0 .

104 0 0 .

173 0 .

478 0 .

268 0 . .

003 0 .

788 0 .

178 0 .

031 0 0 .

012 0 .

722 0 .

229 0 . .

000 0 .

986 0 .

014 0 .

000 0 0 .

000 0 .

970 0 .

028 0 . .

062 0 .

492 0 .

314 0 .

132 0 0 .

128 0 .

426 0 .

315 0 . .

000 0 .

852 0 .

135 0 .

013 0 0 .

002 0 .

736 0 .

217 0 . .

000 0 .

992 0 .

008 0 .

000 0 0 .

000 0 .

980 0 .

020 0 . .

045 0 .

463 0 .

332 0 .

160 0 0 .

116 0 .

378 0 .

327 0 . .

000 0 .

854 0 .

133 0 .

012 0 0 .

002 0 .

732 0 .

220 0 . .

000 0 .

994 0 .

006 0 .

000 0 0 .

000 0 .

976 0 .

024 0 . .

138 0 .

453 0 .

330 0 .

078 0 0 .

228 0 .

456 0 .

249 0 . .

008 0 .

724 0 .

224 0 .

044 0 0 .

040 0 .

666 0 .

251 0 . .

000 0 .

972 0 .

025 0 .

003 0 0 .

000 0 .

923 0 .

068 0 . .

099 0 .

442 0 .

312 0 .

147 0 0 .

176 0 .

367 0 .

313 0 . .

001 0 .

748 0 .

210 0 .

042 0 0 .

016 0 .

631 0 .

272 0 . .

000 0 .

976 0 .

024 0 .

000 0 0 .

000 0 .

960 0 .

036 0 . .

098 0 .

389 0 .

324 0 .

189 0 0 .

145 0 .

330 0 .

316 0 . .

001 0 .

756 0 .

207 0 .

036 0 0 .

016 0 .

612 0 .

288 0 . .

000 0 .

978 0 .

022 0 .

000 0 0 .

000 0 .

950 0 .

050 0 . Table 1.

Frequency of estimated number of groups as k = 1, . . . K. We aggregate the frequency for K ≥ 5. The true number of groups is K = 3. Results are based on 2000 simulation repetitions for quantile level τ = 0 .
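The frequencies reported in such a table can be tabulated from the estimated number of groups in each simulation repetition. The sketch below is only an illustration: the function name `group_count_frequencies`, the pooling threshold argument `cap` and the input values are hypothetical and are not taken from the simulation study.

```python
from collections import Counter


def group_count_frequencies(k_hats, cap=5):
    """Relative frequency of each estimated group count, pooling counts >= cap."""
    if not k_hats:
        raise ValueError("k_hats must be non-empty")
    pooled = Counter(min(k, cap) for k in k_hats)
    n = len(k_hats)
    return {k: pooled.get(k, 0) / n for k in range(1, cap + 1)}


# Hypothetical estimates from eight repetitions (illustrative values only).
k_hats = [3, 3, 4, 2, 3, 6, 5, 3]
freqs = group_count_frequencies(k_hats)
print(freqs)
```

The last key of the returned dictionary collects every repetition in which at least `cap` groups were estimated, so the frequencies sum to one.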

Normal error t errorn T 1 2 3 4 ≥ ≥ DGP1: Independence between α i and x it . Model 1: location shift model30 15 0 0 .

159 0 .

484 0 .

284 0 .

072 0 .

002 0 .

330 0 .

408 0 .

210 0 . .

011 0 .

734 0 .

207 0 .

048 0 .

000 0 .

200 0 .

560 0 .

207 0 . .

000 0 .

966 0 .

032 0 .

002 0 .

000 0 .

011 0 .

866 0 .

112 0 . .

129 0 .

398 0 .

325 0 .

148 0 .

000 0 .

188 0 .

337 0 .

306 0 . .

002 0 .

760 0 .

198 0 .

040 0 .

000 0 .

105 0 .

472 0 .

328 0 . .

000 0 .

970 0 .

029 0 .

000 0 .

861 0 .

128 0 . .

110 0 .

364 0 .

331 0 .

196 0 .

001 0 .

142 0 .

314 0 .

293 0 . .

000 0 .

768 0 .

194 0 .

038 0 .

000 0 .

072 0 .

450 0 .

341 0 . .

000 0 .

966 0 .

034 0 .

000 0 .

852 0 .

136 0 . .

158 0 .

472 0 .

300 0 .

070 0 .

002 0 .

334 0 .

390 0 .

214 0 . .

007 0 .

754 0 .

200 0 .

039 0 .

000 0 .

179 0 .

581 0 .

216 0 . .

000 0 .

968 0 .

031 0 .

001 0 .

000 0 .

007 0 .

868 0 .

118 0 . .

114 0 .

404 0 .

334 0 .

147 0 .

000 0 .

178 0 .

342 0 .

306 0 . .

002 0 .

771 0 .

188 0 .

038 0 .

000 0 .

082 0 .

474 0 .

353 0 . .

000 0 .

974 0 .

025 0 .

001 0 .

000 0 .

870 0 .

123 0 . .

093 0 .

354 0 .

346 0 .

208 0 .

000 0 .

134 0 .

292 0 .

310 0 . .

000 0 .

766 0 .

198 0 .

036 0 .

000 0 .

058 0 .

456 0 .

342 0 . .

000 0 .

955 0 .

044 0 .

000 0 .

854 0 .

132 0 . DGP2: Correlation between α i and x it . Model 1: location shift model30 15 0 0 .

172 0 .

494 0 .

265 0 .

070 0 .

006 0 .

327 0 .

416 0 .

200 0 . .

014 0 .

716 0 .

222 0 .

050 0 .

000 0 .

214 0 .

552 0 .

208 0 . .

000 0 .

954 0 .

044 0 .

002 0 .

000 0 .

014 0 .

844 0 .

132 0 . .

137 0 .

414 0 .

308 0 .

141 0 .

001 0 .

178 0 .

342 0 .

304 0 . .

003 0 .

758 0 .

202 0 .

037 0 .

000 0 .

109 0 .

464 0 .

328 0 . .

000 0 .

974 0 .

026 0 .

000 0 .

869 0 .

118 0 . .

110 0 .

349 0 .

336 0 .

205 0 .

002 0 .

139 0 .

320 0 .

294 0 . .

000 0 .

768 0 .

198 0 .

034 0 .

000 0 .

074 0 .

447 0 .

349 0 . .

000 0 .

964 0 .

034 0 .

002 0 .

000 0 .

836 0 .

149 0 . .

228 0 .

446 0 .

264 0 .

062 0 .

014 0 .

326 0 .

414 0 .

203 0 . .

038 0 .

660 0 .

257 0 .

045 0 .

000 0 .

276 0 .

502 0 .

196 0 . .

000 0 .

932 0 .

064 0 .

004 0 .

000 0 .

039 0 .

769 0 .

180 0 . .

176 0 .

377 0 .

304 0 .

144 0 .

004 0 .

169 0 .

338 0 .

303 0 . .

018 0 .

652 0 .

257 0 .

073 0 .

000 0 .

157 0 .

420 0 .

311 0 . .

000 0 .

954 0 .

044 0 .

002 0 .

000 0 .

004 0 .

775 0 .

192 0 . .

146 0 .

309 0 .

342 0 .

204 0 .

006 0 .

108 0 .

296 0 .

312 0 . .

010 0 .

660 0 .

258 0 .

072 0 .

000 0 .

118 0 .

373 0 .

342 0 . .

000 0 .

948 0 .

050 0 .

002 0 .

000 0 .

002 0 .

753 0 .

212 0 . Table 2.

Frequency of estimated number of groups as k = 1 , . . . K . We aggregate the frequencyfor K ≥ K = 3. Results are based on 2000simulation repetitions for quantile level τ = 0 . u and Volgushev 37 N o r m a l e rr o r t e rr o r P Q R - F E g r o up Q R F EP Q R - F E g r o up Q R F E n T B i a s R M S E C o v e r ag e B i a s R M S E C o v e r ag e B i a s R M S E C o v e r ag e B i a s R M S E C o v e r ag e D G P : I nd e p e nd e n c e b e t w ee n α i a nd x i t . M o d e l :l o c a t i o n s h i f t m o d e l − . . . − . . . . . . . . .

931 30300 . . . . . . − . . . − . . .

908 30600 . . . . . . . . . . . .

934 60150 . . . . . . . . . . . .

780 60300 . . . . . . − . . . − . . .

942 60600 . . . . . . . . . . . .

942 9015 − . . . . . . − . . . − . . .

828 90300 . . . . . . . . . . . .

968 90600 . . . . . . . . . . . . M o d e l :l o c a t i o n - s c a l e s h i f t m o d e l . . . − . . . . . . . . .

932 30300 . . . . . . . . . . . .

890 30600 . . . . . . . . . . . .

935 60150 . . . . . . . . . . . .

774 60300 . . . . . . . . . . . .

946 60600 . . . . . . . . . . . .

938 90150 . . . . . . − . . . − . . .

830 90300 . . . . . . . . . . . .

966 90600 . . . . . . . . . . . . D G P : C o rr e l a t i o nb e t w ee n α i a nd x i t . M o d e l :l o c a t i o n s h i f t m o d e l . . . − . . . . . . . . .

931 30300 . . . . . . . . . − . . .

908 30600 . . . . . . . . . . . .

934 60150 . . . . . . . . . . . .

780 60300 . . . . . . . . . − . . .

942 60600 . . . . . . . . . . . .

942 90150 . . . . . . . . . − . . .

828 90300 . . . . . . . . . . . .

968 90600 . . . . . . . . . . . . M o d e l :l o c a t i o n - s c a l e s h i f t m o d e l . . . − . . . . . . . . .

933 30300 . . . . . . . . . − . . .

892 30600 . . . . . . . . . . . .

938 60150 . . . . . . . . . . . .

775 60300 . . . . . . . . . . . .

943 60600 . . . . . . . . . . . .

938 90150 . . . . . . . . . − . . .

831 90300 . . . . . . . . . . . .

966 90600 . . . . . . . . . . . . T a b l e . C o m p a r i s o n o f b i a s a nd r oo t m e a n s q u a r e d e rr o r o f ˆ β ( τ ) b a s e d o n t h e g r o upﬁ x e d e ﬀ ec t q u a n t il e r e g r e ss i o n ( P Q R - F E g r o up ) a nd t h e ﬁ x e d e ﬀ ec t q u a n t il e r e g r e ss i o n e s t i m a t o r ( Q R F E ) . R e s u l t s a r e b a s e d o n s i m u l a t i o n r e p e t i t i o n s f o r q u a n t il e l e v e l τ = . . D G P ss u m e s t h a t x i t i s i nd e p e nd e n t o f t h e ﬁ x e d e ﬀ ec t α i . D G P ss u m e s t h a t x i t = . α i + γ i + v i t . Panel Quantile regression with group fixed effects N o r m a l e rr o r t e rr o r P Q R - F E g r o up Q R F EP Q R - F E g r o up Q R F E n T B i a s R M S E C o v e r ag e B i a s R M S E C o v e r ag e B i a s R M S E C o v e r ag e B i a s R M S E C o v e r ag e D G P : I nd e p e nd e n c e b e t w ee n α i a nd x i t . M o d e l :l o c a t i o n s h i f t m o d e l: n o r m a l e rr o r − . . . − . . . . . . . . .

944 30300 . . . − . . . − . . . − . . .

890 30600 . . . . . . . . . − . . .

938 60150 . . . . . . . . . − . . .

786 60300 . . . . . . − . . . . . .

945 60600 . . . . . . . . . . . .

902 9015 − . . . . . . . . . . . .

825 90300 . . . . . . . . . . . .

963 90600 . . . . . . . . . . . . M o d e l :l o c a t i o n - s c a l e s h i f t m o d e l: n o r m a l e rr o r − . . . − . . . − . . . − . . .

942 3030 − . . . − . . . − . . . − . . .

886 30600 . . . − . . . − . . . − . . .

933 6015 − . . . − . . . − . . . − . . .

784 6030 − . . . − . . . − . . . − . . .

936 60600 . . . − . . . − . . . − . . .

892 9015 − . . . − . . . − . . . − . . .

824 9030 − . . . − . . . − . . . − . . .

958 90600 . . . − . . . − . . . − . . . D G P : C o rr e l a t i o nb e t w ee n α i a nd x i t . M o d e l :l o c a t i o n s h i f t m o d e l . . . − . . . . . . . . .

944 30300 . . . − . . . . . . − . . .

890 30600 . . . . . . . . . − . . .

938 60150 . . . . . . . . . − . . .

786 60300 . . . . . . . . . . . .

945 60600 . . . . . . . . . . . .

902 90150 . . . . . . . . . . . .

825 90300 . . . . . . . . . . . .

963 90600 . . . . . . . . . . . . M o d e l :l o c a t i o n - s c a l e s h i f t m o d e l . . . − . . . . . . − . . .

942 30300 . . . − . . . . . . − . . .

888 30600 . . . − . . . . . . − . . .

935 60150 . . . − . . . . . . − . . .

784 60300 . . . − . . . . . . − . . .

936 60600 . . . − . . . . . . − . . .

896 90150 . . . − . . . . . . − . . .

828 90300 . . . − . . . . . . − . . .

960 90600 . . . − . . . . . . − . . . T a b l e . C o m p a r i s o n o f b i a s a nd r oo t m e a n s q u a r e d e rr o r o f ˆ β ( τ ) b a s e d o n t h e g r o upﬁ x e d e ﬀ ec t q u a n t il e r e g r e ss i o n ( P Q R - F E g r o up ) a nd t h e ﬁ x e d e ﬀ ec t q u a n t il e r e g r e ss i o n e s t i m a t o r ( Q R F E ) . R e s u l t s a r e b a s e d o n s i m u l a t i o n r e p e t i t i o n s f o r q u a n t il e l e v e l τ = . . D G P ss u m e s t h a t x i t i s i nd e p e nd e n t o f t h e ﬁ x e d e ﬀ ec t α i . D G P ss u m e s t h a t x i t = . α i + γ i + v i t . u and Volgushev 39 Normal error t errorn T Perfect Match Avg Match Std Error Perfect Match Avg Match Std Error DGP1: Independence between α i and x it . Model 1: location shift model30 15 0 .

025 0 .

659 0 .

285 0 .

008 0 .

636 0 . .

384 0 .

877 0 .

222 0 .

223 0 .

820 0 . .

918 0 .

989 0 .

069 0 .

822 0 .

976 0 . .

003 0 .

651 0 .

307 0 .

000 0 .

590 0 . .

225 0 .

900 0 .

202 0 .

088 0 .

841 0 . .

884 0 .

992 0 .

064 0 .

735 0 .

985 0 . .

000 0 .

646 0 .

317 0 .

000 0 .

562 0 . .

119 0 .

912 0 .

187 0 .

024 0 .

839 0 . .

856 0 .

995 0 .

044 0 .

654 0 .

985 0 . .

028 0 .

670 0 .

285 0 .

012 0 .

638 0 . .

394 0 .

880 0 .

221 0 .

225 0 .

836 0 . .

906 0 .

988 0 .

072 0 .

800 0 .

975 0 . .

002 0 .

663 0 .

309 0 .

000 0 .

595 0 . .

218 0 .

903 0 .

201 0 .

084 0 .

850 0 . .

856 0 .

994 0 .

047 0 .

708 0 .

983 0 . .

000 0 .

662 0 .

323 0 .

000 0 .

550 0 . .

118 0 .

908 0 .

198 0 .

026 0 .

851 0 . .

827 0 .

994 0 .

051 0 .

634 0 .

984 0 . DGP2: Correlation between α i and x it . Model 1: location shift model30 15 0 .

024 0 .

655 0 .

289 0 .

008 0 .

624 0 . .

365 0 .

865 0 .

234 0 .

199 0 .

817 0 . .

923 0 .

991 0 .

061 0 .

824 0 .

978 0 . .

002 0 .

650 0 .

305 0 .

000 0 .

590 0 . .

202 0 .

895 0 .

212 0 .

076 0 .

829 0 . .

884 0 .

993 0 .

052 0 .

738 0 .

985 0 . .

000 0 .

643 0 .

318 0 .

000 0 .

554 0 . .

104 0 .

900 0 .

208 0 .

025 0 .

817 0 . .

850 0 .

994 0 .

053 0 .

646 0 .

984 0 . .

006 0 .

636 0 .

263 0 .

002 0 .

608 0 . .

215 0 .

826 0 .

250 0 .

119 0 .

780 0 . .

818 0 .

979 0 .

096 0 .

649 0 .

949 0 . .

000 0 .

612 0 .

293 0 .

000 0 .

545 0 . .

083 0 .

837 0 .

254 0 .

026 0 .

764 0 . .

715 0 .

981 0 .

091 0 .

516 0 .

968 0 . .

000 0 .

585 0 .

312 0 .

000 0 .

514 0 . .

023 0 .

832 0 .

268 0 .

006 0 .

744 0 . .

648 0 .

983 0 .

085 0 .

398 0 .

962 0 . Table 5.

Membership estimation for τ = 0 .

                     Normal error                             t error
  n   T   Perfect Match  Avg Match  Std Error   Perfect Match  Avg Match  Std Error
DGP1: Independence between α_i and x_it.
Model 1: location shift model
 30  15       0.012        0.666      0.240         0.000        0.574       0.…
 30  30       0.242        0.849      0.220         0.034        0.737       0.…
 30  60       0.846        0.982      0.082         0.471        0.923       0.…
 60  15       0.000        0.634      0.258         0.000        0.558       0.…
 60  30       0.098        0.873      0.198         0.002        0.732       0.…
 60  60       0.754        0.984      0.072         0.260        0.934       0.…
 90  15       0.000        0.633      0.267         0.000        0.530       0.…
 90  30       0.041        0.880      0.201         0.000        0.734       0.…
 90  60       0.690        0.984      0.071         0.182        0.939       0.…
Model 2: location-scale shift model
 30  15       0.014        0.668      0.240         0.000        0.573       0.…
 30  30       0.248        0.858      0.215         0.040        0.751       0.…
 30  60       0.824        0.980      0.086         0.451        0.926       0.…
 60  15       0.000        0.640      0.260         0.000        0.566       0.…
 60  30       0.100        0.875      0.204         0.002        0.747       0.…
 60  60       0.729        0.985      0.069         0.270        0.938       0.…
 90  15       0.000        0.630      0.278         0.000        0.531       0.…
 90  30       0.045        0.885      0.195         0.001        0.741       0.…
 90  60       0.646        0.978      0.084         0.164        0.938       0.…
DGP2: Correlation between α_i and x_it.
Model 1: location shift model
 30  15       0.012        0.674      0.187         0.000        0.584       0.…
 30  30       0.255        0.873      0.163         0.052        0.758       0.…
 30  60       0.772        0.977      0.078         0.368        0.908       0.…
 60  15       0.000        0.727      0.189         0.000        0.523       0.…
 60  30       0.188        0.921      0.124         0.008        0.699       0.…
 60  60       0.855        0.994      0.032         0.258        0.911       0.…
 90  15       0.000        0.624      0.207         0.000        0.482       0.…
 90  30       0.048        0.868      0.177         0.002        0.667       0.…
 90  60       0.665        0.989      0.044         0.172        0.924       0.…
Model 2: location-scale shift model
 30  15       0.002        0.618      0.161         0.000        0.538       0.…
 30  30       0.072        0.776      0.183         0.008        0.665       0.…
 30  60       0.468        0.945      0.103         0.140        0.819       0.…
 60  15       0.000        0.556      0.162         0.000        0.476       0.…
 60  30       0.000        0.761      0.188         0.000        0.591       0.…
 60  60       0.355        0.954      0.094         0.055        0.825       0.…
 90  15       0.000        0.527      0.172         0.000        0.428       0.…
 90  30       0.002        0.748      0.212         0.000        0.554       0.…
 90  60       0.240        0.953      0.103         0.010        0.805       0.…

Table 6. Membership estimation for τ = 0.75 for two different error distributions: Perfect Match states the percentage of perfect membership estimation out of the 2000 repetitions. Average match reports the mean of the percentage of correct membership estimation and the standard error reports the associated standard deviation.
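The three summary columns of Tables 5 and 6 can be illustrated with a short sketch. The Python below is an illustration only and not the authors' code: the function names are hypothetical, group labels are assumed to be identified only up to a permutation (so the match rate is maximised over relabelings), and the population standard deviation is assumed for the Std Error column. The paper may make different choices on each of these points.

```python
from itertools import permutations
from statistics import mean, pstdev


def match_rate(true_labels, est_labels):
    """Share of units placed in the correct group, maximised over relabelings."""
    true_ids = sorted(set(true_labels))
    est_ids = sorted(set(est_labels))
    # Pad with None so every estimated label has a target even when the
    # estimated number of groups exceeds the true number.
    pad = max(len(est_ids) - len(true_ids), 0)
    targets = true_ids + [None] * pad
    best = 0
    for perm in permutations(targets, len(est_ids)):
        relabel = dict(zip(est_ids, perm))
        hits = sum(relabel[e] == t for t, e in zip(true_labels, est_labels))
        best = max(best, hits)
    return best / len(true_labels)


def summarize(true_labels, replications):
    """Perfect Match, Avg Match and Std Error over simulation replications."""
    rates = [match_rate(true_labels, est) for est in replications]
    return {
        "perfect_match": sum(r == 1.0 for r in rates) / len(rates),
        "avg_match": mean(rates),
        "std_error": pstdev(rates),  # assumption: population standard deviation
    }


# Toy example: 6 units in 3 groups, 3 hypothetical replications.
truth = [0, 0, 1, 1, 2, 2]
reps = [
    [1, 1, 0, 0, 2, 2],  # correct up to relabeling
    [0, 0, 0, 1, 2, 2],  # one unit misclassified
    [0, 0, 1, 1, 1, 1],  # two groups merged into one
]
summary = summarize(truth, reps)
print(summary)
```

The exhaustive search over relabelings is only practical for a small number of groups, which is the setting of these simulations.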

Appendix A. Additional simulation results

A.1. More investigations on the tuning parameters in the IC criteria.

In the main manuscript, we reported the influence of the tuning parameters in the IC criteria on the performance of the group and common parameter estimation for the location-scale shift model with t errors on DGP1, where the predictor X_it and the fixed effects α_i are independent. Figure 6 and Figure 7 show the corresponding results for the location shift model with t errors on DGP1. The corresponding plots for DGP2 are collected in Figure 8 - Figure 11.

[Plot: panels Prop of correct (three panels), RMSE of beta (T = 15), RMSE of beta (T = 30), RMSE of beta (T = 60), 95% Coverage (T = 15), 95% Coverage (T = 30), 95% Coverage (T = 60); x-axis: constant; y-axes: Prop of correct groups, RMSE, % coverage; legend: n = 30, n = 60, n = 90.]

Figure 6. Use different constants in p_{n,T} for the IC criteria for location shift model with t error on DGP1: For an equally spaced grid on [0.…, …

[Plot: panels 95% Coverage (T = 15), 95% Coverage (T = 30), 95% Coverage (T = 60); x-axis: constant; y-axis: % coverage; legend: n = 30, n = 60, n = 90.]

Figure 7. Use different constants in p_{n,T} for the IC criteria for location shift model with t error on DGP1: For an equally spaced grid on [0.…, … .75 and the third plots the coverage rate for nominal size 5%.

[Plot: panels Prop of correct (three panels), RMSE of beta (T = 15), RMSE of beta (T = 30), RMSE of beta (T = 60), 95% Coverage (T = 15), 95% Coverage (T = 30), 95% Coverage (T = 60); x-axis: constant; y-axes: Prop of correct groups, RMSE, % coverage; legend: n = 30, n = 60, n = 90.]

Figure 8. Use different constants in p_{n,T} for the IC criteria for location-scale shift model with t error on DGP2: For an equally spaced grid on [0.…, …

[Plot: panels 95% Coverage (T = 15), 95% Coverage (T = 30), 95% Coverage (T = 60); x-axis: constant; y-axis: % coverage; legend: n = 30, n = 60, n = 90.]

Figure 9. Use different constants in p_{n,T} for the IC criteria for location-scale shift model with t error on DGP2: For an equally spaced grid on [0.…, … .75 and the third plots the coverage rate for nominal size 5%. Results are based on 400 repetitions.

[Plot: panels Prop of correct (three panels), RMSE of beta (T = 15), RMSE of beta (T = 30), RMSE of beta (T = 60), 95% Coverage (T = 15), 95% Coverage (T = 30), 95% Coverage (T = 60); x-axis: constant; y-axes: Prop of correct groups, RMSE, % coverage; legend: n = 30, n = 60, n = 90.]

Figure 10. Use different constants in p_{n,T} for the IC criteria for location shift model with t error on DGP2: For an equally spaced grid on [0.…, …

[Plot: panels 95% Coverage (T = 15), 95% Coverage (T = 30), 95% Coverage (T = 60); x-axis: constant; y-axis: % coverage; legend: n = 30, n = 60, n = 90.]

Figure 11. Use different constants in p_{n,T} for the IC criteria for location shift model with t error on DGP2: For an equally spaced grid on [0.…, … .75 and the third plots the coverage rate for nominal size 5%. Results are based on 400 repetitions.
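The three quantities plotted in Figures 6 to 11 are standard Monte Carlo summaries. As a minimal sketch, with hypothetical function names and toy inputs, they could be computed from simulation output as follows. The normal-approximation confidence interval is an assumption, and nothing here reproduces the paper's IC criterion or its estimator.

```python
from math import sqrt
from statistics import NormalDist


def prop_correct_groups(estimated_counts, true_count):
    """Share of replications in which the number of groups is estimated correctly."""
    return sum(k == true_count for k in estimated_counts) / len(estimated_counts)


def rmse(estimates, truth):
    """Root mean squared error of the coefficient estimates around the truth."""
    return sqrt(sum((b - truth) ** 2 for b in estimates) / len(estimates))


def coverage(estimates, std_errors, truth, level=0.95):
    """Empirical coverage of normal-approximation confidence intervals."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    hits = sum(abs(b - truth) <= z * s for b, s in zip(estimates, std_errors))
    return hits / len(estimates)


# Toy inputs for one hypothetical value of the tuning constant.
beta_true = 1.0
beta_hat = [1.0, 1.5, 0.85]
se_hat = [0.1, 0.1, 0.1]
groups_hat = [3, 3, 2, 4]

print(prop_correct_groups(groups_hat, true_count=3))
print(rmse(beta_hat, beta_true))
print(coverage(beta_hat, se_hat, beta_true))
```

Repeating these three calls for every constant on the grid and every (n, T) pair would give the points of one curve per panel.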

Appendix B. Additional Empirical Application Analysis

As a robustness check, we report here the results of the empirical analysis for model (10) without the incarceration rate as a control covariate. Figure 12 presents the corresponding common parameter estimates using the fixed effect approach, while Figure 13 presents the results using our proposed grouped fixed effect estimator. Figure 14 reports the raw and the grouped fixed effect estimates.

[Plot: panels lawind, rpcpi, afam1019, afam2029, afam3039; x-axis: taus; y-axis: cf[, ].]

Figure 12. Panel data quantile regression estimates with state fixed effects for various τ based on model specification (10) excluding the incarceration rate as a control variable: For τ ∈ {…, …, …, …, …}, the solid black points plot the coefficient estimates for the effects of the RTC law adoption and other control variables on the violent crime rate based on panel data on 51 U.S. states for 1977 - 2010. The shaded areas are the pointwise 95% confidence intervals, where the standard errors are computed using the Hendricks-Koenker sandwich covariance matrix estimates with the Hall-Sheather bandwidth rule. The red solid line marks the fixed effect panel data mean regression estimates, and the dotted red lines plot the 95% confidence interval with robust clustered standard errors.

[Plot: panels lawind, rpcpi, afam1019, afam2029, afam3039; x-axis: taus; y-axis: bstar[k, ].]

Figure 13. Panel data quantile regression estimates with grouped state fixed effects for various τ based on model specification (10) excluding the incarceration rate as a control variable: For τ ∈ {…, …, …, …, …}, the solid black points plot the coefficient estimates for the effects of the RTC law adoption and other control variables on the violent crime rate based on the proposed methodology with panel data on 51 U.S. states for 1977 - 2010. The shaded areas are the pointwise 95% confidence intervals, where the standard errors are computed using the Hendricks-Koenker sandwich covariance matrix estimates with the Hall-Sheather bandwidth rule. The red solid line marks the fixed effect panel data mean regression estimates, and the dotted red lines plot the 95% confidence interval with robust clustered standard errors.

[Plot: panels tau = 0.1, tau = 0.5, tau = 0.9; x-axis: Index; y-axis: Fixed Effects.]

Figure 14. The estimated state fixed effect and the corresponding estimated group structure for various τ based on model specification (10) excluding the incarceration rate as a control variable: For τ ∈ {0.1, 0.5, 0.9}