Irregular Identification of Structural Models with Nonparametric Unobserved Heterogeneity

Abstract

One of the most important empirical findings in microeconometrics is the pervasiveness of heterogeneity in economic behaviour (cf. Heckman 2001). This paper shows that cumulative distribution functions and quantiles of the nonparametric unobserved heterogeneity have an infinite efficiency bound in many structural economic models of interest. The paper presents a relatively simple check of this fact. The usefulness of the theory is demonstrated with several relevant examples in economics, including, among others, the proportion of individuals with severe long-term unemployment duration, the average marginal effect and the proportion of individuals with a positive marginal effect in a correlated random coefficient model with heterogeneous first-stage effects, and the distribution and quantiles of random coefficients in linear, binary and Mixed Logit models. Monte Carlo simulations illustrate the finite sample implications of our findings for the distribution and quantiles of the random coefficients in the Mixed Logit model.

arXiv [econ.EM]

Juan Carlos Escanciano ∗ Universidad Carlos III de Madrid

May 18th, 2020

Keywords: Irregular Identification; Semiparametric Models; Nonparametric Unobserved Heterogeneity.

JEL classification: C14; C31; C33; C35

∗ Department of Economics, Universidad Carlos III de Madrid, Calle Madrid 126, Getafe 28907, Madrid, Spain. E-mail: [email protected]. Web Page: https://sites.google.com/view/juancarlosescanciano. Research funded by the Spanish Grant PGC2018-096732-B-I00.

Introduction

A tenet in empirical microeconometrics research is the pervasiveness of heterogeneity in behaviour of otherwise observationally equivalent individuals (cf. Heckman 2001). This paper shows that, for a large class of structural economic models, regular identification of functionals of nonparametric unobserved heterogeneity (UH), that is, identification of these functionals with a finite efficiency bound, implies certain necessary smoothness conditions on the functional, leading to a practically simple check for regularity (or lack thereof). In particular, this paper uses these implications to show that cumulative distribution functions (CDFs) and quantiles of UH often have infinite efficiency bounds in many empirically relevant economic models with nonparametric UH. These results have important practical implications, as these parameters are relevant for policy analysis, and they explain why any inferences on such parameters are expected to be unstable in empirical work. In particular, if a parameter is irregularly identified, then no regular estimator with a parametric rate of convergence exists (see Chamberlain 1986).

These observations are applicable to a wide class of models with nonparametric UH. We consider first continuous mixtures, which have been commonly employed as a modeling device to account for UH in a variety of economic settings ranging from labour to industrial organization; see Compiani and Kitamura (2016) for a recent review. The canonical example is a tightly specified structural parametric model that is made flexible by allowing all (or a subset) of parameters to be individual specific, thereby accounting for UH. We show that if the mapping from the individual specific parameters to the conditional likelihood is smooth, then there will be many functionals of UH that will not be regularly identified. Heuristically, smoothness of the conditional likelihood translates into a multicollinearity problem, as we further explain below. There are important economic applications that fall under this setting, see, e.g., Heckman and Singer (1984a, 1984b) for the study of unemployment duration. We demonstrate the usefulness of these results in the context of duration data by establishing an infinite efficiency bound for the distribution and quantiles of UH in the structural model of unemployment duration with two spells and nonparametric UH recently proposed by Alvarez, Borovičková and Shimer (2016).

The results are then extended to several classes of Random Coefficients (RC) models. These models have a long history in economics; see, e.g., Masten (2017) for a review of the literature. Applying our results to these models is technically more involved because these models have discontinuous conditional likelihoods given UH. We consider first RC models where UH is independent of regressors and establish an infinite efficiency bound for the distribution and quantiles of UH in binary and linear RC models. Establishing the zero information in the linear RC model is particularly challenging because the discontinuity in the conditional likelihood leads to potential discontinuities in the scores of the model. Given these results, we extend them to a triangular RC model with a continuous endogenous variable, where we show irregular identification of the average marginal effect (AME) and the proportion of individuals with a positive marginal effect. The irregularity of the AME is driven by a positive mass of individuals with small first-stage effects. The irregular identification of the CDF and quantiles of the distribution of random or correlated effects holds more generally.

The models treated up to this point are indexed by the distribution of UH, and only by that distribution. However, a simple and powerful observation of this paper is that our analysis can be trivially extended to more complex semiparametric models indexed by UH and additional (possibly infinite-dimensional) parameters. We illustrate this point with several examples, including semiparametric mixture models where some parameters are fixed and others are random. A leading example is the popular RC Logit or Mixed Logit model, which is one of the most commonly used models in applied choice analysis. This model was introduced by Boyd and Mellman (1980) and Cardell and Dunbar (1980) and it is widely used in environmental economics, industrial economics, marketing, public economics, transportation economics and other fields. Applying our results to this model we obtain an infinite efficiency bound for CDFs and quantiles of the RC. The Mixed Logit example nicely illustrates the most appealing feature of our method of proof, which is its simplicity. Two lines of proof and a simple application of dominated convergence suffice. This should be contrasted with direct efficiency bound calculations, which are particularly challenging for this model (or for any of the models we consider, for that matter). These results have practical implications for proposed estimators of the Mixed Logit model. We report Monte Carlo simulations supporting our theoretical findings for "fixed grid" estimators of the distribution and quantiles in the Mixed Logit model (cf. Bajari, Fox and Ryan 2007 and Fox, Kim and Yang 2016). Further illustrations demonstrating the utility of our results in semiparametric settings are gathered in an Appendix and include examples on mixed proportional duration models and measurement error models with two measurements identified by means of Kotlarski's lemma.

The parameters (functionals) we consider are of interest in their own right. For example, labour economists are interested in the proportion of individuals at risk of severe long-term unemployment, and more generally, social scientists are interested in evaluating the effects of treatments and policy interventions (e.g. average marginal effects and average signs). The functionals that we entertain, such as CDFs and quantiles of UH, are also used as inputs in subsequent counterfactual exercises. Our research limits the kind of inferences that are attainable with these parameters in models where UH is nonparametric.

What can be done to obtain regular identification of CDFs and quantiles of UH in these models? We show in several examples that functional form assumptions that restrict the conditional likelihood of observables given heterogeneity do not generally help for the purpose of achieving regularity of quantiles and CDFs if UH is still nonparametric. Thus, our results show that restricting UH is somewhat necessary to attain finite efficiency bounds for the distribution and quantiles of UH in many of the aforementioned models. Commonly used strategies in practice, such as the use of parametric distributions for UH or considering discrete heterogeneity, indeed restore the regular identification of functionals of UH but can be deemed too strong. We find necessary conditions of regular identification under semiparametric restrictions on UH, although we recognize that giving general primitive assumptions for these conditions seems difficult. Our recommendation for inference on CDFs and quantiles of UH is to use flexible semiparametric specifications such as sieve methods; see, e.g., Shen (1997), Chen (2007), Bajari, Fox and Ryan (2007), Hu and Schennach (2008), Bester and Hansen (2007), Chen and Liao (2014), Fox, Kim and Yang (2016) and references therein, coupled with regularization (penalization) to reduce the high variance of estimates of functionals of UH when the conditional likelihood is a very smooth function of UH, as illustrated in this paper with the Mixed Logit model.

The rest of the paper is organized as follows. After a literature review, Section 3 sets notation and considers the class of continuous mixtures, where the method is most transparent. This section illustrates the theoretical results in the structural model of Alvarez, Borovičková and Shimer (2016). Section 4 extends the analysis to several classes of RC models. Section 5 extends the analysis further to semiparametric models, illustrating the theory with the Mixed Logit model. Section 6 discusses different strategies, some of them considered in the literature, to regularize the estimation of CDFs and quantiles of UH. Section 7 reports the results of some Monte Carlo simulations for the CDF and quantiles of the distribution of UH in the Mixed Logit model. Section 8 concludes. An Appendix contains proofs of the main results, further results on nonlinear RC models, examples and simulations.

Our paper relates to a number of studies providing sufficient conditions for nonparametric identification of the distribution of UH in the aforementioned models. See, among many others, Elbers and Ridder (1982), Heckman and Singer (1984a, 1984b) and Alvarez, Borovičková and Shimer (2016) for structural models of unemployment duration, Beran and Hall (1992), Beran, Feuerverger and Hall (1996), and Hoderlein, Klemelä and Mammen (2010) for linear RC, Ichimura and Thompson (1998), Gautier and Kitamura (2013) and Hoderlein and Sherman (2015) for binary RC, Briesch, Chintagunta and Matzkin (2010) and Fox, Kim, Ryan and Bajari (2012) for RC multinomial choice models, Hoderlein, Holzmann and Meister (2017) for triangular RC models, Masten (2017) for simultaneous RC models, and Lewbel and Pendakur (2017) for nonlinear RC models. For a review of nonparametric identification results see Matzkin (2007, 2013) and Lewbel (2019). What differentiates our paper from these and other related studies is our focus on establishing whether identification is regular or not.

Establishing an infinite efficiency bound for functionals of UH in these models is a priori a rather challenging task. The main reason is that characterizing the so-called tangent space of the model and projections onto it is generally quite complicated in the models we study here, and it may explain the relative lack of theoretical work on semiparametric efficiency bounds in RC and related models. See Newey (1990) for a review of semiparametric efficiency bounds and some of the related concepts. Our method of proof avoids the complications in directly computing the tangent space, projections and the Fisher information, which is the standard approach in the literature for obtaining efficiency bounds (see, e.g., Chamberlain 1986, Khan and Tamer 2010). Our indirect method of proof is much simpler. The basic tool is a dominated convergence theorem, with regularity conditions that are easy to check in many models (although not in all models). The main building block is a fundamental result by van der Vaart (1991), who found a necessary condition for regular estimation of a parameter. The main observation of our paper consists in systematically exploiting the implications that van der Vaart's (1991) necessary condition has on the smoothness of certain influence functions. van der Vaart (1991), Groeneboom and Wellner (1992) and Bickel, Klaassen, Ritov and Wellner (1998) have also used the necessary condition of van der Vaart (1991) to show that CDFs are irregularly identified in some specific univariate exponential and uniform mixture models. Relative to this work, our contribution is to derive sufficient conditions for a general method of proof, thereby extending the scope of applications to models of economic interest. In particular, we allow for multidimensional UH, semiparametric models and non-smooth conditional likelihoods such as those that arise with RC models.

Although not the focus of this paper, a large class of models for which our results are applicable are panel data models with fixed effects. Within this setting, Chamberlain (1992) established regular identification of the AME in a linear RC panel data model, while Arellano and Bonhomme (2012) showed the identification of the full distribution of UH in a model with limited serial dependence in errors. Graham and Powell (2012) pointed out the irregular identification of the AME when regressors exhibit little variation across periods, while Bonhomme (2011) derived conditions for regular and irregular identification of moments of UH in nonlinear panel data. Our research is highly complementary to these papers, as we consider different models and our approach for proving irregular identification is different and exploits the smoothness implications of regular identification.

We illustrate the theoretical results with some Monte Carlo simulations implementing the "fixed grid" nonparametric CDF estimator of Bajari, Fox and Ryan (2007) and Fox, Kim, Ryan and Bajari (2011), and further investigated in Fox, Kim and Yang (2016). We contribute to the literature on the Mixed Logit model by proving the infinite efficiency bound for the CDF and quantiles of the nonparametric distribution of RC. We report further finite sample evidence on the performance of their computationally attractive "fixed grid" estimator for CDFs and quantiles, as well as some regularized variants, complementing recent work in econometrics by Horowitz and Nesheim (2019) and Heiss, Hetzenecker and Osterhaus (2019).

Basic Setting and Results

Let $\{(Z_i, \alpha_i)\}_{i=1}^{n}$ denote an independent and identically distributed (iid) sample with the same distribution as $(Z, \alpha)$. The observed data is $Z_1, \dots, Z_n$, while $\alpha_i$ denotes the $i$-th individual's UH. Assume each observation $Z_i$ has a probability law $P$ and a density with respect to (wrt) a $\sigma$-finite measure $\mu$ given by

$$f_\eta(z) = \int_A f_{z/\alpha}(z)\, d\eta(\alpha), \tag{1}$$

where $f_{z/\alpha}(z)$ denotes the known conditional density of $Z$ given $\alpha$, and $\eta$ is the unknown distribution of $\alpha$ with support on $A \subseteq \mathbb{R}^{d_\alpha}$ (the results can potentially be extended to abstract heterogeneity spaces, but for simplicity of exposition we focus on the Euclidean case). The assumption of known conditional density $f_{z/\alpha}(z)$ is relaxed in Section 5.

Suppose we are interested in estimating a moment of UH,

$$\phi(\eta) = E_\eta[r(\alpha)],$$

for a measurable function $r(\cdot) \in L_2(\eta)$, where, henceforth, $E_\eta$ denotes the expectation under the distribution $\eta$ and $L_p(\nu)$ denotes the space of (equivalence classes of) real-valued measurable functions $h$ such that $\int |h|^p\, d\nu < \infty$, for a generic measure $\nu$. Henceforth, we drop the sets of integration in integrals and the qualification $\nu$-almost surely for simplicity of notation. So, for example, a function in $L_2(\nu)$ is discontinuous when there is no continuous function in its equivalence class. Also, we drop the reference to the measure $\nu$ in $L_2(\nu)$ when $\nu = P$, and write simply $L_2$. We will be concerned with regular identification of $\phi(\eta)$, i.e. identification of $\phi(\eta)$ with a finite efficiency bound, when UH is nonparametric as formally defined below.

The basic message of this paper is based on two observations. First, from a general result in van der Vaart (1991), we prove that a necessary condition for regular identification of $\phi(\eta)$ when UH is nonparametric is the existence of a measurable function $s(Z)$ with zero mean and finite variance such that

$$r(\alpha) - \phi(\eta) = \int s(z) f_{z/\alpha}(z)\, d\mu(z). \tag{2}$$

Second, if the mapping $\alpha \to f_{z/\alpha}$ is continuous (smooth), then under mild regularity conditions, (2) implies that $r(\cdot)$ must also be continuous (smooth). The bulk of this paper is a formalization of the second observation and its application to some economic models of interest.

The precise sense of UH being nonparametric is the usual one, formalized as follows. Let $H$ denote a class of distributions on $A$, and assume $\eta \in H$. Let $\eta_t \in H$ be a parametric submodel indexed by $t \in [0, \varepsilon)$, for some $\varepsilon > 0$, with $\eta_t = \eta$ at $t = 0$, such that for a $b \in L_2(\eta)$ the classical mean square differentiability condition holds,

$$\int \left[ \frac{d\eta_t^{1/2} - d\eta^{1/2}}{t} - \frac{1}{2}\, b\, d\eta^{1/2} \right]^2 \to 0 \quad \text{as } t \downarrow 0. \tag{3}$$

Then, a formal definition of nonparametric UH is given as follows. Denote by $T(\eta)$ the linear span of the $b$'s in (3) and let $L_2^0(\nu)$ denote the subspace of functions in $L_2(\nu)$ with zero $\nu$-mean.

Definition 3.1

UH is nonparametric if $T(\eta)$ is dense in $L_2^0(\eta)$.

Henceforth, we assume, unless otherwise stated, that UH is nonparametric. The first result in this section, which follows from an application of van der Vaart (1991), shows that, in the presence of nonparametric UH in model (1), regular identification of $E_\eta[r(\alpha)]$ necessarily requires that (2) holds.

Lemma 3.1

If UH is nonparametric, then (2) is necessary for regular identification of $\phi(\eta)$.

We note that Severini and Tripathi (2006, 2012) and Bonhomme (2011) have found related results in the context of nonparametric instrumental variables and nonlinear panel data models, respectively. Also, Escanciano (2020) has shown that (2) is also sufficient for semiparametric identification of $\phi(\eta)$ in model (1). Note that we are not assuming here that $\eta$ or $s$ in (2) are identified. This generality is important because these functions may not be identified in many structural economic models under weak assumptions, which does not prevent us from identifying and estimating certain functionals of them (cf. Hurwicz 1950).

We now proceed with the main insight of this paper, which is that if the mapping $\alpha \to f_{z/\alpha}$ is continuous (smooth), then, under regularity conditions, $r(\cdot)$ must also be continuous (smooth). This simple observation follows by dominated convergence, and it implies non-regularity of CDFs, signs, quantiles, and other functionals of UH in "smooth models" satisfying the following assumption. Let $N$ denote an open subset of $A \subset \mathbb{R}^{d_\alpha}$.

Assumption 1. (i) $\alpha \to f_{z/\alpha}(z)$ is continuous on $N$ a.e.-$\mu$; (ii) for all $\alpha \in N$ there exists a neighborhood of $\alpha$, say $\Gamma \subset N$, such that for all $s$ satisfying (2),

$$\int |s(z)| \sup_{\alpha \in \Gamma} f_{z/\alpha}(z)\, d\mu(z) < \infty. \tag{4}$$

Assumption 1(i) is easy to check. Assumption 1(ii) is a dominance condition. The main complication in checking Assumption 1(ii) is that $s$ belongs to $L_2(P)$ but not necessarily to $L_1(\mu)$ or $L_2(\mu)$. We verify these conditions in a number of examples below.

Lemma 3.2

Let the conditional density $f_{z/\alpha}(z)$ satisfy Assumption 1. Then, $r(\alpha)$ in (2) is continuous in $\alpha$ on $N$.

Of course, if $\eta$ is identified, so is $\phi(\eta)$ (since $r$ is known). Identification of $\phi(\eta)$ follows from (2) because we can find an identified function $\tilde{s}(Z)$, depending only on $f_{z/\alpha}$ and $r$, such that $r(\alpha) = E[\tilde{s}(Z) \mid \alpha]$ holds, and thus by iterated expectations $\phi(\eta) = E_\eta[r(\alpha)] = E_\eta[E[\tilde{s}(Z) \mid \alpha]] = E[\tilde{s}(Z)]$.

Corollary 3.1

Let Assumption 1 hold. The CDF $\phi(\eta) = E_\eta[1(\alpha \le \alpha_r)]$, for $\alpha_r \in N$, is not regularly identified.

Quantiles of UH are nonlinear functionals, and are not covered by the previous results. To extend the theory to a more general setting including nonlinear functionals we need to introduce some notation. A functional $\phi(\eta) : H \to \mathbb{R}$ is said to be differentiable if there exists an $r_\phi \in L_2^0(\eta)$ such that for all paths satisfying (3), it holds that

$$\lim_{t \to 0} \frac{\phi(\eta_t) - \phi(\eta)}{t} = E_\eta[r_\phi(\alpha)\, b(\alpha)].$$

Under nonparametric UH such $r_\phi$ is unique, as in Newey (1994). This function $r_\phi$ plays the role of the preceding moment function $r$.

To illustrate with an example, consider the scalar UH case and assume $\eta$ is absolutely continuous with a strictly positive Lebesgue density in a neighborhood of $\phi(\eta)$, where $\phi(\eta)$ is such that

$$\int_{-\infty}^{\phi(\eta)} d\eta(\alpha) = \tau, \qquad \tau \in (0, 1). \tag{5}$$

That is, $\phi(\eta)$ is the $\tau$-quantile of $\eta$. It is well known, see, e.g., Lemma 21.3 in van der Vaart (1998), that the quantile functional is differentiable under the conditions above with influence function

$$r_\phi(\alpha) = -\frac{1(\alpha < \phi(\eta)) - \tau}{\dot{\eta}(\phi(\eta))},$$

where $\dot{\eta}$ is the density pertaining to $\eta$. From our results, the discontinuity of the influence function $r_\phi(\cdot)$ implies irregular identification. The next result formalizes this finding.

Corollary 3.2

Let Assumption 1 hold. Assume $\eta$ is absolutely continuous with a strictly positive Lebesgue density in a neighborhood of $\phi(\eta)$ satisfying (5). If $\phi(\eta) \in N$, then the $\tau$-quantile of the nonparametric UH distribution is not regularly identified.

Remark 3.1

Henceforth, whenever we discuss identification of quantiles, we implicitly assume that the components of UH have densities that satisfy the conditions in Corollary 3.2. This example illustrates how our results are applicable to nonlinear differentiable functionals.
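
The smoothing mechanism behind Lemma 3.2 and Corollary 3.1 can be checked numerically in a toy case. The sketch below is only an illustration under assumptions that are not taken from the paper: the conditional law is taken to be $Z \mid \alpha \sim N(\alpha, 1)$, the function `score` is a hypothetical bounded $s(z)$ with a jump at zero, and all names are made up for the example. Even with a discontinuous $s$, the right-hand side of (2) is a smooth function of $\alpha$ (here it equals $-\mathrm{erf}(\alpha/\sqrt{2})$), whereas the moment function of the CDF, $r(\alpha) = 1(\alpha \le \alpha_r)$, jumps by one.

```python
import math

def normal_pdf(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def score(z):
    """Hypothetical bounded score s(z) with a jump at z = 0 (an assumption)."""
    return 1.0 if z <= 0.0 else -1.0

def smoothed_moment(alpha, lo=-12.0, hi=12.0, n=2400):
    """Midpoint rule for the right-hand side of (2) when Z | alpha ~ N(alpha, 1)."""
    h = (hi - lo) / n
    return h * sum(
        score(lo + (k + 0.5) * h) * normal_pdf(lo + (k + 0.5) * h - alpha)
        for k in range(n)
    )

def cdf_moment(alpha, alpha_r=0.0):
    """Moment function r(alpha) = 1(alpha <= alpha_r) behind the CDF functional."""
    return 1.0 if alpha <= alpha_r else 0.0

def max_jump(values):
    """Largest change between neighbouring grid points."""
    return max(abs(b - a) for a, b in zip(values, values[1:]))

alphas = [(k - 300) / 100.0 for k in range(601)]  # grid on [-3, 3], step 0.01
max_jump_smooth = max_jump([smoothed_moment(a) for a in alphas])
max_jump_indicator = max_jump([cdf_moment(a) for a in alphas])
print(max_jump_smooth, max_jump_indicator)
```

On this grid the smoothed function changes by less than the grid step between neighbours, while the indicator changes by exactly one, so no choice of $s$ in this toy family can reproduce the CDF moment function through (2).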

We now discuss the complications of the more standard approach of computing the Fisher Information or the efficiency bound. Define the so-called tangent space of scores $S := \{ s \in L_2 : s(z) = E[b(\alpha) \mid Z] \text{ for some } b \in T(\eta) \}$. Then, a standard result in linear inverse problems is that all solutions $s$ of equation (2) have the same orthogonal projection onto the closure of $S$ (see Engl, Hanke and Neubauer, 1996). Denote by $s^*$ such orthogonal projection, the so-called efficient score. The efficiency bound is given by the variance of $s^*(Z)$ (see e.g. Newey 1990, van der Vaart 1998, Bickel et al. 1998, and Escanciano 2020). Thus, an alternative to our approach is to compute $s^*(Z)$ and check that it has infinite variance. However, computing $s^*(Z)$ can be cumbersome, particularly because characterizing the mean squared closure of $S$ can be a rather difficult task in the models we analyze here. In fact, to the best of our knowledge, the analytical expression for $s^*$ remains unknown for the functionals and models we study. In passing, we note that these arguments show that it suffices to check the dominance condition (4) for $s$ in the closure of $S$. This additional information will turn out to be quite useful in some of our applications, such as the linear RC model.

We illustrate the applicability of the previous results in the context of a structural model of unemployment with nonparametric UH. Nonparametric heterogeneity has played a critical role in rationalizing unemployment duration ever since the seminal contributions by Elbers and Ridder (1982) and Heckman and Singer (1984a, 1984b). Recent work by Alvarez et al. (2016) is motivated from this perspective. These authors have shown nonparametric identification of the distribution of UH in their nonparametric structural model for unemployment with two spells. Specifically, Alvarez, Borovičková and Shimer (2016) propose a structural model for transitions in and out of employment that implies a duration of unemployment given by the first passage time of a Brownian motion with drift, a random variable with an inverse Gaussian distribution. The parameters of the inverse Gaussian distribution are allowed to vary in arbitrary ways to account for UH in workers. These authors investigate nonparametric identification of the distribution of UH, $\eta$, when two unemployment spells $Z_i = (t_{i1}, t_{i2})$ are observed on the set $T^2$, $T \subseteq [0, \infty)$. The reduced form parameters $\alpha = (\alpha_1, \alpha_2)' \in \mathbb{R} \times [0, \infty)$ are functions of structural parameters. The distribution of $Z_i$ is absolutely continuous with Lebesgue density $f_\eta(t_1, t_2)$ given, up to a normalizing constant, by

$$f_\eta(t_1, t_2) = \int_{\mathbb{R} \times [0, \infty)} \frac{\alpha_2^2}{t_1^{3/2} t_2^{3/2}} \exp\left( -\frac{(\alpha_1 t_1 - \alpha_2)^2}{2 t_1} - \frac{(\alpha_1 t_2 - \alpha_2)^2}{2 t_2} \right) d\eta(\alpha_1, \alpha_2). \tag{6}$$

Alvarez, Borovičková and Shimer (2016) show that $\eta$ is nonparametrically identified up to the sign of $\alpha_1$, but they do not investigate if specific functionals of this distribution are regularly or irregularly identified, which is the focus of study here. Specifically, we show that the CDF of $\eta$ at a point, and other functionals of $\eta$ with discontinuous influence functions, such as quantiles, have infinite efficiency bounds. These functionals are important parameters. For example, $\phi(\eta) = E_\eta[1(\alpha_1 \le \alpha_{10})\, 1(\alpha_{20} \le \alpha_2)]$, for fixed $\alpha_{10} < 0 < \alpha_{20}$ with large absolute values of $\alpha_{10}$ and $\alpha_{20}$, quantifies the proportion of individuals at risk of severe long-term unemployment (an individual with parameters $\alpha_1$ and $\alpha_2$, $\alpha_1 \le \alpha_{10}$ and $\alpha_{20} \le \alpha_2$, has a probability larger than or equal to $1 - \exp(2\alpha_{10}\alpha_{20})$ of remaining unemployed forever). We apply our previous results to this example for a generic moment $\phi(\eta) = E_\eta[r(\alpha_1, \alpha_2)]$, under the following mild condition.

Assumption 2. (i) Let the set $T \subseteq [0, \infty)$ be a convex set with a non-empty interior; (ii) the moment function $r$ is locally bounded.

Proposition 3.1

Under Assumption 2, if $\phi(\eta) = E_\eta[r(\alpha_1, \alpha_2)]$ is regularly identified, then

$$r(\cdot) \in \left\{ b(\alpha_1, \alpha_2) \in L_2(\eta) : b(\alpha_1, \alpha_2) = C_1 + C_2\, \alpha_2^2\, e^{2\alpha_1\alpha_2}\, h(\alpha_1^2, \alpha_2^2) \right\},$$

for constants $C_1$ and $C_2$ and a continuous function $h(u, v)$ defined on $(0, \infty)^2$ that, if $T$ is bounded, is infinitely differentiable at $u \in (0, \infty)$, for all $v \in (0, \infty)$.

For the purpose of proving an infinite efficiency bound for CDFs and quantiles, only the continuity part of Proposition 3.1 is needed. Thus, an implication of Proposition 3.1 is that the CDF of UH at the fixed point $(\alpha_{10}, \alpha_{20})$, i.e. $\phi(\eta) = E[1(\alpha_1 \le \alpha_{10})\, 1(\alpha_2 \le \alpha_{20})]$, is not regularly identified because $r_\phi(\alpha_1, \alpha_2) = 1(\alpha_1 \le \alpha_{10})\, 1(\alpha_2 \le \alpha_{20})$ is not continuous when $(\alpha_{10}, \alpha_{20})$ is in the interior of the support of $\eta$.

Corollary 3.3

Under Assumption 2(i), the CDFs and quantiles of UH in the model (6) are not regularly identified.
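
The remark that some workers remain unemployed forever with positive probability can be checked numerically for a single spell. The sketch below is a hedged illustration, not the authors' code: it assumes the per-spell kernel in (6) is completed with the usual first-passage constant $1/\sqrt{2\pi}$ (the text states (6) only up to a normalizing constant), and the function names and grid settings are hypothetical. For a negative drift $\alpha_1 < 0$ the density integrates to $\exp(2\alpha_1\alpha_2) < 1$, so the missing mass $1 - \exp(2\alpha_1\alpha_2)$ is the probability of never exiting unemployment.

```python
import math

def spell_density(t, a1, a2):
    """Inverse Gaussian first-passage density of one spell.

    The 1/sqrt(2*pi) constant is an assumption; the text gives (6) only up
    to a normalizing constant.
    """
    return a2 / (math.sqrt(2.0 * math.pi) * t ** 1.5) * math.exp(
        -(a1 * t - a2) ** 2 / (2.0 * t)
    )

def exit_probability(a1, a2, t_min=1e-4, t_max=400.0, n=100000):
    """Trapezoid rule on a geometric grid for the integral of the density over t."""
    ratio = (t_max / t_min) ** (1.0 / n)
    total = 0.0
    t_prev = t_min
    f_prev = spell_density(t_min, a1, a2)
    for _ in range(n):
        t = t_prev * ratio
        f = spell_density(t, a1, a2)
        total += 0.5 * (f + f_prev) * (t - t_prev)
        t_prev, f_prev = t, f
    return total

p_exit_negative_drift = exit_probability(-0.5, 1.0)  # compare with exp(2 * a1 * a2)
p_exit_positive_drift = exit_probability(1.0, 1.0)   # a positive drift exits with probability one
print(p_exit_negative_drift, math.exp(-1.0), p_exit_positive_drift)
```

The kernel is also visibly smooth in $(\alpha_1, \alpha_2)$, which is the property that Assumption 1(i) requires in this model.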

Random coefficient models have long been used in economics to model nonparametric UH. There is by now an extensive literature on nonparametric identification of UH in these models, see, e.g., Masten (2017) and references therein. In this paper we focus on establishing irregular identification of CDFs and quantiles of the distributions of RC. To the best of our knowledge, this is the first paper to do so in this generality.

A general class of random coefficient models, including nonlinear models, is given by

$$Y_i = m(X_i, \alpha_i), \tag{7}$$

where $Z_i = (Y_i, X_i)$ are observed, but $\alpha_i$ is unobserved and independent of $X_i$ with support $A$. Assume $m : \mathcal{X} \times A \to \mathbb{R}^{r}$ is a measurable map, where $\mathcal{X}$ is the support of $X$. The functional form of $m$ is known, and the nonparametric part is given by the distribution of $\alpha_i$. The assumptions of known $m$ and the independence of $\alpha_i$ and $X_i$ are relaxed below. The density of the data is

$$f_\eta(y, x) = \int_A 1(y = m(x, \alpha))\, d\eta(\alpha),$$

where $1(A)$ denotes the indicator function of the event $A$. In this setting, the dominating measure $\mu$ is defined on $\mathcal{Z} = \mathcal{Y} \times \mathcal{X}$ as $\mu(B_1 \times B_2) = \nu_Y(B_1)\, \nu_X(B_2)$, where $B_1$ and $B_2$ are Borel sets of $\mathcal{Y}$ and $\mathcal{X}$, respectively, $\nu_Y$ is either the counting measure for discrete outcomes or the Lebesgue measure $\lambda(\cdot)$ for continuous outcomes, and $\nu_X(\cdot)$ is the probability measure for $X$. The main challenge we face with RC models is that $f_{z/\alpha}(z) = 1(y = m(x, \alpha))$ is not continuous, and thus the previous results need to be generalized. The generalization is non-trivial, particularly for continuous outcomes, and in some cases it requires delicate technical work. We consider first the binary choice RC model. Section 10.1 in the Appendix contains some generic results for nonlinear RC, as well as discussion on some RC models for which our conclusions do not hold.

The binary choice random coefficient model is given by

$$Y_i = 1(X_i'\alpha_i \ge 0),$$

where we observe $Z_i = (Y_i, X_i)$ but $\alpha_i$ is unobservable. The random vector $\alpha_i$ is independent of $X_i$, normalized to $|\alpha_i| = 1$ and satisfies $P(\alpha_i = 0) = 0$. As in the existing literature, we assume $\eta$ is absolutely continuous wrt the uniform spherical measure $\sigma(\cdot)$ in $S^{d_\alpha - 1}$, where $S^{d_\alpha - 1} = \{ b \in \mathbb{R}^{d_\alpha} : |b| = 1 \}$ denotes the unit sphere in $\mathbb{R}^{d_\alpha}$. The density of the data for a positive outcome (i.e. the choice probability function) is given by

$$f_\eta(x) = \int_{S^{d_\alpha - 1}} 1(x's \ge 0)\, d\eta(s). \tag{8}$$

Ichimura and Thompson (1998) and Gautier and Kitamura (2013) found sufficient conditions for nonparametric identification of $\eta$, but they did not investigate whether identification was regular or irregular, which is the focus here.

By (8) and Lemma 3.1 a necessary condition for regular identification of $\phi(\eta) = E_\eta[r(\alpha)]$ under nonparametric UH is

$$r(\alpha) - \phi(\eta) = \int 1(x'\alpha \ge 0)\, s(1, x)\, d\nu_X(x), \tag{9}$$

for some $s \in L_2$. The following result provides necessary conditions for regular identification. Write $\alpha = (\alpha_1, \alpha_2')'$.

Proposition 4.1

If the distribution of $X/|X|$ is absolutely continuous, then $r(\cdot)$ in (9) must be uniformly continuous on $S^{d_\alpha - 1}$. If $X = (1, \tilde{X})$ and $\alpha_2'\tilde{X}$ is absolutely continuous, then $r(\alpha_1, \alpha_2)$ is an absolutely continuous function of $\alpha_1$.

An implication of this proposition is that functionals such as the CDF and quantiles of random coefficients are not regularly identified in the binary RC model. To the best of our knowledge, this result is new in the literature.

Corollary 4.1

Under the conditions of Proposition 4.1, the CDFs and quantiles of UH in the binary RC model are not regularly identified.
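
The continuity claim in Proposition 4.1 can be visualized in the simplest case $d_\alpha = 2$. The sketch below rests on assumptions made only for illustration: $X/|X|$ is taken to be uniform on the unit circle, `score` is a hypothetical bounded and discontinuous $s(1, x)$ written as a function of the angle of $x$, and the integral in (9) is replaced by a Riemann sum. With $\alpha = (\cos\theta, \sin\theta)$ the right-hand side of (9) averages $s$ over a half circle centred at $\theta$, so it moves by at most $\sup|s| \cdot \Delta\theta / \pi$ when $\theta$ moves by $\Delta\theta$, even though the conditional likelihood itself is an indicator.

```python
import math

M = 3600  # number of quadrature nodes on the unit circle
PSI = [2.0 * math.pi * (j + 0.5) / M for j in range(M)]  # angle of x = (cos psi, sin psi)

def score(psi):
    """Hypothetical bounded, discontinuous s(1, x), as a function of the angle of x."""
    return 1.0 if math.cos(3.0 * psi) >= 0.0 else -1.0

S = [score(p) for p in PSI]

def rhs_of_9(theta):
    """Riemann sum for the integral in (9) with alpha = (cos theta, sin theta).

    Uses x'alpha = cos(psi - theta) for unit vectors and a uniform nu_X.
    """
    return sum(s for p, s in zip(PSI, S) if math.cos(p - theta) >= 0.0) / M

step = 2.0 * math.pi / 720
values = [rhs_of_9(k * step) for k in range(721)]
max_jump = max(abs(b - a) for a, b in zip(values, values[1:]))
lipschitz_bound = step / math.pi + 4.0 / M  # sup|s| * step / pi plus discretisation slack
print(max_jump, lipschitz_bound)
```

A CDF-type moment function, such as an indicator of a half-space in $\alpha$, jumps by one across its boundary and therefore cannot be written in the form (9) in this toy design.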

The linear RC model has a long history in econometrics, see, e.g., Hildreth and Houck (1968) and Swamy (1970). This model is given by

$$Y_i = X_i'\alpha_i,$$

where we observe a $d_z$-dimensional vector $Z_i = (Y_i, X_i)$, but $\alpha_i$ is unobservable and independent of $X_i$. The dimension of $X_i$ and $\alpha_i$ is $d_\alpha$, so $d_z = d_\alpha + 1$. As in Hoderlein, Klemelä and Mammen (2010), we normalize $X_i$ so that $|X_i| = 1$. The density of the data is

$$f_\eta(z) = \int_{\mathbb{R}^{d_\alpha}} 1(y = x'\alpha)\, d\eta(\alpha). \tag{10}$$

Nonparametric identification and estimation of $\eta$ has been studied by Beran and Hall (1992), Beran, Feuerverger and Hall (1996), and Hoderlein, Klemelä and Mammen (2010), among others. These authors exploit the relation between (10) and the Radon transform. In this paper we study necessary conditions for regular identification of $\phi(\eta) = E_\eta[r(\alpha)]$, for a measurable function $r(\cdot)$ with $E_\eta[r^2(\alpha)] < \infty$, and regular identification of quantiles of the components of $\alpha$.

By Lemma 3.1 a necessary condition for regular identification of $\phi(\eta) = E_\eta[r(\alpha)]$ under nonparametric UH is

$$r(\alpha) - \phi(\eta) = \int s(x'\alpha, x)\, d\nu_X(x), \tag{11}$$

for some $s \in L_2$. Under suitable conditions scores in the tangent space $S = \{ s \in L_2 : s(z) = E[b(\alpha) \mid Z] \text{ for some } b \in T(\eta) \}$ are continuous, but providing conditions under which elements of the closure of $S$ are continuous is much harder. In fact, without additional restrictions elements in the closure of $S$ can be potentially very discontinuous. We shall provide regularity conditions below that guarantee that any element of the closure of $S$ can be written as $s(z) = g(z)/f_\eta(z)$, where $g(z)$ has a square-integrable weak derivative with respect to the first argument $y$. As we show below, this last condition will be instrumental for checking the sufficient conditions for the dominated convergence theorem in Lemma 3.2.

Let $\eta_x$ denote the Lebesgue density of $x'\alpha$ when $\alpha$ has distribution $\eta$. The set $\eta\, T(\eta)$ is defined as $\eta\, T(\eta) := \{ \eta b : b \in T(\eta) \}$, while the definition of a Sobolev space $H^{\rho}(A)$ is provided after (24) in the Appendix.

Assumption 3. For $d_\alpha >$ […] and $N$ as in Assumption 1: (i) the distribution $\eta$ is bounded, has bounded support, with a corresponding density $\eta_x$ that is continuous and satisfies $\inf_{\alpha \in N} \eta_x(x'\alpha) \ge 1/l(x)$ for a positive measurable function $l(\cdot)$ such that $E_X[l(X)] < \infty$; (ii) $X$ is absolutely continuous with a bounded density $f_X(\cdot)$; (iii) $\eta\, T(\eta) \subseteq H^{\rho}(A)$, where $\rho + (d_\alpha -$ […]$)/$[…] $>$ […]; (iv) $r$ belongs to the closure of $T(\eta)$.

The bounded support of Assumption 3(i) is often considered in the literature, see, e.g., Hoderlein, Klemelä and Mammen (2010). If the infinite efficiency bound holds in a model with bounded support of $\alpha$, it also holds in the more general model where the support is unrestricted. A sufficient condition for the continuity of $\eta_x$ is that the Fourier transform of the density of $\eta$ is integrable, which was also assumed in Hoderlein, Klemelä and Mammen (2010). Assumptions 3(i-ii) establish a link between the tails of $\eta$ and $f_X(\cdot)$. Assumption 3(iii) imposes a mild smoothness condition on the tangent space of UH. This assumption and Assumption 3(iv) allow but do not require nonparametric UH.

Proposition 4.2

Under Assumption 3, if $r$ satisfies (11), then it must be continuous on $N$.

Corollary 4.2

Under the conditions of Proposition 4.2, the CDFs and quantiles of UH are not regularly identified in the linear RC model.

The independence assumption between regressors and UH rules out important models and parameters in economics, such as the Average Marginal Effect (AME) $\phi(\eta) = E_\eta[\gamma_i]$ and the Proportion of individuals with a Positive AME (PPAME), $\phi(\eta) = E_\eta[1(\gamma_i > 0)]$, where $\gamma_i$ is the coefficient of an endogenous continuous variable in a RC triangular system. We extend our previous results to these cases. We will show that under nonparametric UH these important parameters are not regularly identified. These results appear to be new in the literature under this generality. For simplicity, we focus on a triangular model, but the same arguments are applicable to a wide class of random coefficient models, including simultaneous equation models, nonlinear models with endogeneity, or variations of these models that include covariates, multiple endogenous variables, and mixed random and non-random coefficients.

Consider the triangular model:
$$Y_1 = \gamma Y_2 + U_1, \qquad Y_2 = \delta X + U_2, \tag{12}$$
where $\gamma$, $U_1$, $\delta$ and $U_2$ are RC, and we observe $Z = (Y_1, Y_2, X)'$. The variable $Y_2$ is a continuous treatment variable, possibly endogenous, in the sense that $U_1$ and $U_2$ are correlated, and $X$ is an instrument, independent of all the random coefficients. Suppose the researcher is interested in the AME $\phi(\eta) = E_\eta[\gamma]$ or the PPAME $\phi(\eta) = E_\eta[1(\gamma > 0)]$. We will provide conditions under which both parameters have an infinite efficiency bound. To see this, we obtain the reduced forms
$$Y_1 = \gamma\delta X + \gamma U_2 + U_1 \equiv \pi_1 X + \pi_0, \qquad Y_2 = \delta X + U_2,$$
which, with some abuse of notation, are jointly written as $Y = \alpha_0 + \alpha_1 X$, where $Y = (Y_1, Y_2)'$, $\alpha = (\alpha_0, \alpha_1)$, $\alpha_0 = (\pi_0, U_2)'$ and $\alpha_1 = (\pi_1, \delta)'$. Proposition 4.2 can then be applied to the reduced form. Because the corresponding influence functions for the AME and PPAME are $r_{AME}(\alpha) = \pi_1/\delta$ and $r_{PPAME}(\alpha) = 1(\pi_1 > 0, \delta > 0) + 1(\pi_1 < 0, \delta < 0)$, respectively, and they are discontinuous functions of $\alpha_1 = (\pi_1, \delta)'$, non-regularity follows from Proposition 4.2.

Consider the following assumption. Let $N$ be an open set in the interior of $A$, the support of the reduced form random coefficient $\alpha$.

Assumption 4 (i) Assumption 3 holds with the reduced form $Y = \alpha_0 + \alpha_1 X$; (ii) $X$ is independent of the random coefficients $(\gamma, U_1, \delta, U_2)$; (iii) $(p_0, u_2, 0, d) \in N$ for some $(p_0, u_2, d)$; (iv) $(p_0, u_2, p_1, 0) \in N$ for some $(p_0, u_2, p_1)$.

Proposition 4.3

Suppose (12) and Assumption 4(i-ii) hold. If in addition Assumption 4(iii) or Assumption 4(iv) holds, then the PPAME is not regularly identified. If Assumption 4(iv) holds and $E[\gamma] < \infty$, then the AME is not regularly identified.

Proposition 4.3 proves non-regularity for the AME and the PPAME. The condition $E[\gamma] < \infty$ ensures that the AME is a continuous functional in $L_2(\eta)$. If $f_\delta$ denotes the (Lebesgue) density of $\delta$ and $h(u) = E[\pi_1 \mid \delta = u] f_\delta(u)$, then a sufficient condition for $E[\gamma] < \infty$ is $\lim_{u \to 0^+} h(u)/u^\rho < \infty$ for some $\rho >$ $E[\pi_1] < \infty$; see Khuri and Casella (2002, pg. 45). Intuitively, non-regularity of the AME comes from the presence of a set of individuals with near-zero first-stage effects (Assumption 4(iv)), although $P(\delta = 0) = 0$. When the instrument satisfies a monotonicity restriction, in the sense that $P(\delta > 0) = 1$ or $P(\delta < 0) = 1$, regular identification of the AME might be possible. Indeed, Heckman and Vytlacil (1998) and Wooldridge (1997, 2003, 2008) show that with homogeneous first-stage effects regular estimation by IV methods holds. Masten (2017, Proposition 4) gives conditions for nonparametric identification of the distribution of $\gamma$, but he did not discuss efficiency bounds for the AME or the PPAME under his conditions. Khan and Tamer (2010) and Graham and Powell (2012) show irregularity of the AME in different models where $E[\gamma] = \infty$. We show irregularity of the AME in a setting where $E[\gamma] < \infty$. See also Florens et al. (2008), Masten and Torgovitsky (2016), and the extensive literature following the seminal contributions by Imbens and Angrist (1994) and Heckman and Vytlacil (2005) for identification results on conditional and weighted AME or their discrete versions.

The PPAME is non-regular under more general conditions than the AME, because it has a discontinuous influence function under more general conditions than that of the AME. Heckman, Smith and Clements (1997) provide bounds for the analog to PPAME in the binary treatment case, and identification when gains are not anticipated at the time of the program. The irregularity of the PPAME also follows from a more general principle that we describe in the next section: if irregularity holds in a model with exogenous effects, it also holds in the model with endogenous effects.
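The discontinuity that drives these results can be seen numerically. The following minimal Python sketch evaluates the two reduced-form influence functions quoted above, the ratio of the reduced-form slope to the first-stage effect for the AME and the sign-agreement indicator for the PPAME, at first-stage effects on either side of zero. The function names and the evaluation points are illustrative assumptions and are not taken from the paper.

```python
def r_ame(pi, delta):
    """AME influence function in reduced-form terms: pi / delta."""
    return pi / delta


def r_ppame(pi, delta):
    """PPAME influence function: 1 if pi and delta share a strict sign, else 0."""
    return int(pi > 0 and delta > 0) + int(pi < 0 and delta < 0)


# Hypothetical evaluation points: a fixed reduced-form slope pi and
# first-stage effects delta approaching zero from either side.
pi = 1.0
for delta in (1e-1, 1e-3, -1e-3, -1e-1):
    print(f"delta={delta:+.0e}  r_AME={r_ame(pi, delta):+.1e}  "
          f"r_PPAME={r_ppame(pi, delta)}")
```

As delta crosses zero, the ratio changes sign and grows without bound while the indicator jumps between one and zero, so neither function can be matched by a continuous function of the reduced-form coefficients in a neighborhood of such points.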

This section extends our results to semiparametric models. The main point is as follows: if a functional is non-regularly identified in a model, it will be non-regularly identified in a larger model that nests the original model as a special case. Information can only decrease (or remain the same) when we know less. This basic observation has important implications, and it substantially widens the applicability of our results, as illustrated with the Mixed Logit model here and with further examples in the Appendix.

Consider first a conditional semiparametric mixture model with density
$$f_{\eta,\theta}(y, x) = \int f_{y/x,\alpha}(y; \theta)\, d\eta(\alpha),$$
where $\theta$ is an additional unknown parameter, finite or infinite-dimensional. The basic idea here is that irregularity of $\phi(\eta) = E_\eta[r(\alpha)]$ in the model where $\theta$ is known implies irregularity in the model where $\theta$ is unknown.

We illustrate our point with the random coefficients Logit model, also known as the Mixed Logit, one of the most commonly used models in applied choice analysis. Fox, Kim, Ryan and Bajari (2012) have recently shown nonparametric identification for the semiparametric Mixed Logit model. Here, we show that the identification of the CDF and quantiles of the distribution of RC is necessarily irregular when UH is nonparametric. The CDF and quantiles of this distribution are important parameters in applications of discrete choice.

The data $Z_i = (Y_i, X_i)$ is a random sample from the density (wrt $\mu$ below),
$$f_\lambda(y, x) = \int f_{y/x,\alpha}(y; \theta)\, d\eta(\alpha),$$
where $\lambda = (\theta, \eta) \in \Theta \times H$, $\theta = (\theta_1, \ldots, \theta_J)'$,
$$f_{y/x,\alpha}(y; \theta) = \frac{\exp(\theta_y + x_y'\alpha)}{\sum_{j=1}^{J} \exp(\theta_j + x_j'\alpha)},$$
$x = (x_0, x_1, \ldots, x_J) \in \mathcal{X}$ and $y \in \mathcal{Y} = \{0, 1, \ldots, J\}$. The consumer can choose between $j = 1, \ldots, J$, $J < \infty$, mutually exclusive inside goods and one outside good ($y = 0$). The utility for the outside good is normalized so that $\theta_0 = 0$ and $x_0 = 0$. The random coefficients $\alpha$ are independent of the regressors $X$, and have a distribution $\eta$. The main result below also applies to the correlated random coefficient case. In fact, non-regular identification for CDFs and quantiles is proved even when $\theta$ is known. This will imply non-regularity when $\theta$ is unknown and/or when the random coefficients depend on the characteristics.

The measure $\mu$ is defined on $\mathcal{Z} = \mathcal{Y} \times \mathcal{X}$ as $\mu(B_1 \times B_2) = \tau(B_1)\nu_X(B_2)$, where $B_1 \subset \mathcal{Y}$, $B_2$ is a Borel set of $\mathcal{X}$, $\tau(\cdot)$ is the counting measure and $\nu_X(\cdot)$ is the probability measure for $X$. The vector $\alpha$ and covariates $x_y$ are $K$-dimensional. The parameter space $\Theta$ is an open set of $\mathbb{R}^J$. The set $H$ consists of measurable functions $\eta : \mathbb{R}^K \to \mathbb{R}$ whose support $A$ has a non-empty interior and $\int_A d\eta(\alpha) = 1$.

Applying the necessary condition for regular identification to a continuous linear functional $\phi(\eta) \in \mathbb{R}$ with influence function $r_\phi$ in the model where $\theta$ is known, it must be true that for some $s \in L_2$,
$$r_\phi(\alpha) - \phi(\eta) = \int f_{y/x,\alpha}(y; \theta)\, s(y, x)\, d\mu(y, x). \tag{13}$$
It is straightforward to show that the right-hand side in (13) is continuous in $\alpha$ in the interior of its support. In fact, more is true in general: it is an analytic function of $\alpha$ (a function that is infinitely differentiable with a convergent power series expansion). But continuity suffices for proving the non-regularity of CDFs and quantiles of $\eta$. This follows without computing least favorable distributions and efficiency bounds, simply by dominated convergence. We gather the proof here to illustrate the simplicity of our method of proof.
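The continuity claim can be checked numerically. The sketch below computes Mixed Logit choice probabilities for a single consumer and shows that a small perturbation of the random coefficient moves every probability only slightly, whereas the indicator that defines a CDF jumps. It assumes that the outside good enters the denominator with utility normalized to zero, so that the probabilities over y = 0, ..., J sum to one; the function name `choice_probs` and all numerical values are hypothetical.

```python
import math


def choice_probs(x, alpha, theta):
    """Choice probabilities for y = 0, 1, ..., J.

    x     : list of J characteristic vectors (inside goods), each of length K
    alpha : random coefficient vector of length K
    theta : list of J product intercepts
    Assumption: the outside good (y = 0) has utility normalized to zero.
    """
    utils = [0.0] + [t + sum(xk * ak for xk, ak in zip(xj, alpha))
                     for xj, t in zip(x, theta)]
    top = max(utils)                       # subtract the max for stability
    expu = [math.exp(u - top) for u in utils]
    total = sum(expu)
    return [e / total for e in expu]


x = [[0.2, 0.9], [0.7, 0.1], [0.5, 0.5]]   # J = 3 goods, K = 2 characteristics
theta = [0.0, 0.0, 0.0]
alpha = [1.0, -0.5]
alpha_eps = [1.0 + 1e-6, -0.5]             # tiny perturbation of alpha

p = choice_probs(x, alpha, theta)
p_eps = choice_probs(x, alpha_eps, theta)
max_change = max(abs(a - b) for a, b in zip(p, p_eps))
print("largest change in a choice probability:", max_change)

# By contrast, the CDF indicator 1(alpha_1 <= c) jumps at c = 1.0.
c = 1.0
print("indicator before and after:", int(alpha[0] <= c), int(alpha_eps[0] <= c))
```

Integrating a bounded score against these probabilities preserves the smoothness in the random coefficient, which is why a discontinuous influence function cannot be matched.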

Proposition 5.1

$r_\phi$ in (13) is continuous in the interior of $A$.

Proof of Proposition 5.1: Write
$$\int f_{y/x,\alpha}(y; \theta)\, s(y, x)\, d\mu(y, x) = \sum_{j=0}^{J} \int f_{y/x,\alpha}(j; \theta)\, s(j, x)\, v_X(dx).$$
Each of the summands in the last expression is continuous in $\alpha$ in the interior of its support, by continuity and boundedness of $f_{y/x,\alpha}(j; \theta)$ and the dominated convergence theorem. ■

Proposition 5.1 implies that identification of the CDF and quantiles of the distribution of $\eta$ under the conditions specified in Fox et al. (2012) must be irregular. Bajari, Fox and Ryan (2007) propose a simple estimator of the CDF of $\eta$, and Fox, Kim and Yang (2016) show its consistency (in the weak topology) and obtain its rates of convergence. Proposition 5.1 implies that the estimator in Fox et al. (2016), or any other estimator for that matter, cannot achieve regular parametric rates of convergence. The lack of regularity is not evident from the rates established in Fox et al. (2016). Let $F$ be the CDF pertaining to $\eta$ and $\hat{F}_\eta$ the "fixed grid" estimator of Bajari et al. (2007), Fox et al. (2011) and Fox et al. (2016) based on $D$ grid points ($D \equiv D(n)$, where $n$ is the sample size). The order of the bias established in Fox et al. (2016) is $D^{-\bar{s}/K}$, where $\bar{s}$ is the smoothness of the mapping $\alpha \to f_{y/x,\alpha}$ (here $\bar{s} = \infty$). This suggests that parametric rates might be attainable, but our results show that this is not possible (at least in a local uniform sense). The order of the variance for $\hat{F}_\eta$ is inversely related to the minimum eigenvalue of the $D \times D$ matrix $\Psi_D$ with $(d_1, d_2)$-th element, $1 \leq d_1, d_2 \leq D$, given by
$$E[g'(X, \alpha_{d_1})\, g(X, \alpha_{d_2})], \tag{14}$$
where $g(x, \alpha_d) = (f_{y/x,\alpha_d}(0; \theta), \ldots, f_{y/x,\alpha_d}(J; \theta))'$ are conditional choice probabilities when UH is evaluated at the $d$-th grid point $\alpha_d$, $d = 1, \ldots, D$. This minimum eigenvalue quantifies the level of multicollinearity in the least squares regression of Fox et al. (2016), and we conjecture that, given the high smoothness of the mapping $\alpha \to f_{y/x,\alpha}$, this term will go to zero exponentially fast, so it will be the main determinant in the (slow) rate of convergence of $\hat{F}_\eta$. A detailed theoretical analysis of this issue is beyond the scope of this paper, but see the discussion in the next section and the Monte Carlo simulations below, which support these claims.

The previous examples show that regular identification of CDFs and quantiles of UH in the models considered may require restricting the nature of heterogeneity. In this section we investigate how common approaches considered in the literature address the lack of regularity of these functionals. Additionally, we provide a necessary condition for CDFs and quantiles to be regularly identified when UH is semiparametric and a discussion on how smoothness of $\alpha \to f_{z/\alpha}$ translates into a multicollinearity problem for sieve and related estimators.

Our first observation is derived from the main idea in the previous section: functional form assumptions that restrict the conditional likelihood may not help with the irregular identification of CDFs and quantiles if the mapping $\alpha \to f_{z/\alpha}$ is still smooth while UH is nonparametric. For example, knowing the finite dimensional parameters of a semiparametric mixture, knowing the functional forms of the idiosyncratic error terms in Kotlarski's lemma, or knowing the functional form of the baseline hazard in the mixed proportional hazard model does not help in restoring regular identification of CDFs and quantiles of UH when UH is nonparametric.

We discuss how restrictions on UH translate into regularity of functionals of UH. Denote by $\overline{T(\eta)}$ the mean squared closure of $T(\eta)$ in $L_2(\eta)$. That UH is not nonparametric formally means that $\overline{T(\eta)}$ is a strict subset of $L_2(\eta)$. The extension of the necessary condition for regular identification of $\phi(\eta) = E_\eta[r(\alpha)]$, for a measurable function $r(\cdot)$ with $E_\eta[r(\alpha)] < \infty$, is given in the following lemma.
Let $\Pi_{\overline{V}}$ denote the orthogonal projection operator onto $\overline{V}$, where $\overline{V}$ denotes the closure of $V$ in the norm topology.

Lemma 6.1

The necessary condition for regular identification of $\phi(\eta) = E_\eta[r(\alpha)]$ is
$$\Pi_{\overline{T(\eta)}}\, r(\alpha) = \Pi_{\overline{T(\eta)}}\, E[s(Z) \mid \alpha], \quad \text{for some } s \in L_2. \tag{15}$$
The mismatch in smoothness between $r(\alpha)$ and $E[s(Z) \mid \alpha]$, which was the source of irregularity in the examples studied, may now be resolved by the projection onto $\overline{T(\eta)}$. We briefly discuss how different restrictions on UH translate into regularity of CDFs and quantiles in view of this general characterization.

A popular approach in practice is to consider a parametric distribution for the UH. A leading example of a parametric model is a finite mixture with known and finite support points. Parametric heterogeneity leads to a finite dimensional tangent space $T(\eta)$, which is then closed, $\overline{T(\eta)} = T(\eta)$, and which is generated by the scores of the specified distribution. Denote by $l_\eta$ the score of UH, i.e. $\overline{T(\eta)} = T(\eta) = \mathrm{span}(l_\eta)$, assume $E_\eta[l_\eta(\alpha) l_\eta'(\alpha)]$ is non-singular, and define the projected score $s(Z) = E[l_\eta(\alpha) \mid Z]$. Then, simple algebra shows that a solution to (15) in $s$ is given by $s_r$ defined by $s_r(Z) = \lambda_r' s(Z)$, where $\lambda_r$ is a solution to
$$E[s(Z) s'(Z)]\, \lambda_r = E[r(\alpha)\, l_\eta'(\alpha)]. \tag{16}$$
If the Fisher information for $\eta$ is positive, which means $E[s(Z) s'(Z)]$ is non-singular, then there is a unique solution $\lambda_r$ of (16), and $\phi(\eta)$ is regularly identified. More generally, $\phi(\eta)$ may be regularly identified even when $\eta$ is not, and this corresponds to the system in (16) having some solution in $\lambda_r$. The drawback of the parametric approach is the high misspecification risk, which can be quantified by the dimension and form of the model's tangent space. If the dimension of $T(\eta)$ is $D$, then the tangent space of the model is at most $D$-dimensional and given by $S := \{s \in L_2 : s(z) = \lambda' s(z) \text{ for some } \lambda \in \mathbb{R}^D\}$.
Estimators for functionals of UH will in general be inconsistent when the model is misspecified.

As usual, a semiparametric approach is more robust to misspecification. In Lemma 6.1 we have derived the necessary condition for regular identification of moments when UH is semiparametric, so $\overline{T(\eta)}$ is a strict subset of $L_2(\eta)$ of infinite dimension. Examples of semiparametric models include finite mixtures with unknown support points and sieve methods with incomplete sieve basis. Existing rate results for finite mixtures with unknown support points suggest irregularity of the CDFs in general (see, e.g., Chen 1995 and Heinrich and Kahn 2018), although we are not aware of any paper investigating semiparametric efficiency bounds for finite mixtures with unknown support points. We recognize that, although the condition for semiparametric restrictions in Lemma 6.1 is general, it may be hard to find primitive conditions for it, as computing the closure of $T(\eta)$ and the projections onto it may not be straightforward in applications.

As a practical approach, we recommend a sieve method where the span of $\{l_\eta(\alpha)\}$ increases with the sample size, i.e. $D \to \infty$ as $n \to \infty$. Without loss of generality, normalize $l_\eta$ so that $E_\eta[l_\eta(\alpha) l_\eta'(\alpha)]$ is the identity matrix. A key quantity for sieve estimation is the minimum eigenvalue of the Fisher information matrix $E[s(Z) s'(Z)]$, denoted by $\xi_{\min} \equiv \xi_{\min}(D)$; see Fox, Kim and Yang (2016) and (16). We provide a useful bound for $\xi_{\min}$. To that end, we assume the score operator $Ab = E[b(\alpha) \mid Z]$ from $L_2(\eta)$ to $L_2$ is compact. A well known sufficient condition for this is
$$\int \frac{f_{z/\alpha}^2(z)}{f_\eta(z)}\, d\eta(\alpha)\, d\mu(z) < \infty. \tag{17}$$
Under this condition, $A$ has a sequence of singular values $\{\mu_d\}_{d=1}^{\infty}$ (see Engl, Hanke and Neubauer, 1996). Then, the following bound follows essentially from Blundell, Chen and Kristensen (2007, Lemma 1).

Lemma 6.2

If (17) holds, then $\xi_{\min}(D) \leq \mu_D$.

Since $\mu_D \to 0$ as $D \to \infty$, Lemma 6.2 implies that also $\xi_{\min}(D) \to 0$. This is the multicollinearity problem mentioned above. Furthermore, the score operator $A$ is an integral operator with kernel $K(z, \alpha) = f_{z/\alpha}(z)/f_\eta(z)$, and it is well known that the smoother the mapping $\alpha \to K(z, \alpha)$, the faster the singular values $\mu_D$ go to zero. In particular, for analytic kernels the singular values decay exponentially fast to zero (Hille and Tamarkin 1931). The minimum eigenvalue $\xi_{\min}(D)$ is also closely related to the sieve measure of ill-posedness $\tau_D$ proposed in econometrics (see Chen 2007 and Blundell, Chen and Kristensen 2007) through the relation
$$\tau_D = \frac{1}{\xi_{\min}(D)}.$$
Prior to this paper, Blundell, Chen and Kristensen (2007, Lemma 1) obtained the bound $\tau_D \geq 1/\mu_D$ in a nonparametric IV setting. Thus, the modest contribution here is the interpretation in terms of the minimum eigenvalue of the Fisher information matrix. For applications of sieve estimators along this line and the important role of $\tau_D$ (or $\xi_{\min}(D)$) see, e.g., Chen (2007), Bajari, Fox and Ryan (2007), Hu and Schennach (2008), Bester and Hansen (2007), Chen and Liao (2014), Fox, Kim and Yang (2016) and references therein. The next section investigates the finite sample performance of the sieve "fixed grid" method of Fox, Kim and Yang (2016) and a regularized version to reduce the variance of estimates of the CDFs and quantiles of UH.

This section illustrates some of the theoretical ideas in a Monte Carlo study on the Mixed Logit model. Specifically, we consider the "fixed grid" nonparametric estimator of Bajari et al. (2007) and Fox et al. (2016), and evaluate the performance of this estimator for estimating the CDF and quantiles of UH. We also provide a variant of this estimator that performs a Singular Value Decomposition (SVD) of the resulting design matrix to reduce the variance of the estimator.
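The multicollinearity problem can be illustrated with a small numerical experiment. The sketch below stacks Mixed Logit choice probabilities across simulated consumers at three values of the random coefficient and computes sample correlations between the resulting columns: two nearby values produce almost collinear columns. The logit form with a zero-utility outside good, the sample size, the uniform characteristics and the three coefficient values are all assumptions made for this illustration, and the function names are hypothetical.

```python
import math
import random


def choice_probs(x, alpha):
    """Logit probabilities for y = 0..J; outside good utility is 0 (assumption)."""
    utils = [0.0] + [sum(a * b for a, b in zip(xj, alpha)) for xj in x]
    top = max(utils)
    expu = [math.exp(u - top) for u in utils]
    total = sum(expu)
    return [e / total for e in expu]


def stacked_probs(X, alpha):
    """Stack the choice probabilities of all consumers into one column."""
    return [p for x in X for p in choice_probs(x, alpha)]


def correlation(u, v):
    """Sample Pearson correlation between two equally long lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


random.seed(0)
J, K, n = 3, 2, 200                       # hypothetical design
X = [[[random.random() for _ in range(K)] for _ in range(J)] for _ in range(n)]

col_base = stacked_probs(X, [1.0, 1.0])
col_near = stacked_probs(X, [1.05, 1.0])  # nearby coefficient value
col_far = stacked_probs(X, [-2.0, 3.0])   # distant coefficient value

corr_near = correlation(col_base, col_near)
corr_far = correlation(col_base, col_far)
print("correlation with nearby point:", corr_near)
print("correlation with distant point:", corr_far)
```

As the number of grid points grows, more and more pairs of columns are of the nearby kind, which is one way to read the statement that the minimum eigenvalue shrinks with D.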
To introduce the estimator, consider a discrete approximation of the distribution of UH of the form
$$\eta(\alpha) \approx \sum_{d=1}^{D} \theta_d\, \delta_{\alpha_d}(\alpha), \tag{18}$$
where $\theta_d$ are probabilities, adding up to one, over a finite support $\{\alpha_d\}_{d=1}^{D}$ of size $D$ in $A$. In Fox et al. (2016) $D$, and thus the discrete support, is allowed to increase with the sample size $n$. Define $Y_{i,j}$ as the binary indicator that equals 1 whenever individual $i$'s choice is $j$, and zero otherwise. Define the regression error term $\varepsilon_{i,j} = Y_{i,j} - f_\eta(j, X_i)$. The least squares estimator uses the regression equation
$$Y_{i,j} = \int f_{y/X_i,\alpha}(j)\, d\eta(\alpha) + \varepsilon_{i,j},$$
with the approximation in (18) to obtain the approximated linear regression model
$$Y_{i,j} \approx \sum_{d=1}^{D} \theta_d\, f_{y/X_i,\alpha_d}(j) + \varepsilon_{i,j}.$$
Fox et al. (2016) propose running a regression of $Y_{i,j}$ on the regressors $Z_{i,j}^d := f_{y/X_i,\alpha_d}(j)$ subject to the constraints on the probabilities $\theta_d$, i.e.
$$\hat{\theta} = \arg\min_{\theta \in \Delta_d} \frac{1}{nJ} \sum_{i=1}^{n} \sum_{j=0}^{J} \left( Y_{i,j} - \sum_{d=1}^{D} \theta_d Z_{i,j}^d \right)^2, \tag{19}$$
where $\theta = (\theta_1, \ldots, \theta_D)' \in \Delta_d = \left\{ (p_1, \ldots, p_D) : 0 \leq p_d \leq 1,\ \sum_{d=1}^{D} p_d = 1 \right\}$. The least squares problem in (19) is convex and can be efficiently solved by standard routines (such as lsqlin in Matlab). The estimator of the CDF of $\eta$ at $\alpha$ is then given by
$$\hat{F}_\eta(\alpha) = \sum_{d=1}^{D} \hat{\theta}_d\, 1(\alpha_d \leq \alpha), \tag{20}$$
and from the CDF we define the quantile estimators as usual.

For simplicity of computation, in the Monte Carlo we apply this estimator to the Mixed Logit model without fixed parameters, so
$$f_{y/x,\alpha}(y) = \frac{\exp(x_y'\alpha)}{\sum_{j=1}^{J} \exp(x_j'\alpha)},$$
$x = (x_0, x_1, \ldots, x_J) \in \mathcal{X}$ and $y \in \mathcal{Y} = \{0, 1, \ldots, J\}$. Smoothness of the mapping $\alpha \to f_{y/x,\alpha}$ translates into high correlation of the regressors $Z_{i,j}^d$ when $D$ is large (for $d$'s corresponding to nearby $\alpha_d$'s), suggesting that methods that account for multicollinearity may reduce the variances of the resulting estimators.

We thank Jeremy Fox for sharing the Matlab code to implement their estimator.
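The fixed grid least squares step can be sketched in a few lines of standard-library Python. This is not the Matlab code used in the paper: it replaces lsqlin with plain projected gradient descent onto the probability simplex, drops the 1/(nJ) scaling (which does not change the minimizer), and assumes a logit form in which the outside good has zero utility. The data generating design, the four grid points and every function name are hypothetical.

```python
import math
import random


def choice_probs(x, alpha):
    """Logit probabilities for y = 0..J; outside good utility is 0 (assumption)."""
    utils = [0.0] + [sum(a * b for a, b in zip(xj, alpha)) for xj in x]
    top = max(utils)
    expu = [math.exp(u - top) for u in utils]
    total = sum(expu)
    return [e / total for e in expu]


def project_simplex(v):
    """Euclidean projection of v onto {p : p_d >= 0, sum_d p_d = 1}."""
    running, tau = 0.0, 0.0
    for k, u in enumerate(sorted(v, reverse=True), start=1):
        running += u
        shift = (running - 1.0) / k
        if u - shift > 0:
            tau = shift
    return [max(x - tau, 0.0) for x in v]


def fixed_grid_weights(Z, y, steps=3000):
    """Minimize ||y - Z theta||^2 over the simplex by projected gradient descent."""
    D = len(Z[0])
    gram = [[sum(r[a] * r[b] for r in Z) for b in range(D)] for a in range(D)]
    zty = [sum(r[a] * yi for r, yi in zip(Z, y)) for a in range(D)]
    lip = sum(gram[a][a] for a in range(D))  # trace bounds the largest eigenvalue
    theta = [1.0 / D] * D
    for _ in range(steps):
        grad = [sum(gram[a][b] * theta[b] for b in range(D)) - zty[a]
                for a in range(D)]
        theta = project_simplex([theta[a] - grad[a] / lip for a in range(D)])
    return theta


def cdf_estimate(theta, grid, point):
    """Weight on grid points that are componentwise <= point, as in (20)."""
    return sum(w for w, a in zip(theta, grid)
               if all(ak <= pk for ak, pk in zip(a, point)))


# Hypothetical design: J = 3 inside goods, K = 2 characteristics, D = 4 grid points.
random.seed(1)
J, K, n = 3, 2, 300
grid = [[-2.0, -2.0], [-2.0, 1.0], [1.0, -2.0], [1.0, 1.0]]
true_w = [0.5, 0.0, 0.0, 0.5]

Z, y = [], []
for _ in range(n):
    x = [[random.random() for _ in range(K)] for _ in range(J)]
    alpha_i = random.choices(grid, weights=true_w)[0]
    choice = random.choices(range(J + 1), weights=choice_probs(x, alpha_i))[0]
    cols = [choice_probs(x, a) for a in grid]
    for j in range(J + 1):
        Z.append([cols[d][j] for d in range(len(grid))])
        y.append(1.0 if choice == j else 0.0)

theta_hat = fixed_grid_weights(Z, y)
print("weights:", [round(w, 3) for w in theta_hat])
print("CDF at (0, 0):", round(cdf_estimate(theta_hat, grid, [0.0, 0.0]), 3))
```

Projected gradient descent is chosen here only because it needs no external solver; any quadratic programming routine that handles the simplex constraints would serve the same purpose.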
We suggest using the SVD of the $nJ \times D$ design matrix $Z = (Z_{i,j}^d)$, by adding the linear constraint $V_{p-D}'\theta = 0$ to (19), where $V_{p-D} = (v_{p-D}, v_{p-D+1}, \ldots, v_D)$ denotes the last $p - D$ left singular vectors of $Z$ (where, as usual, they are ordered according to the singular values from largest to smallest). This is the classical Principal Component Regression adapted to the constrained case where the $\theta$'s are probabilities. The resulting estimator is
$$\tilde{\theta} = \arg\min_{\theta \in \Delta_d,\ V_{p-D}'\theta = 0} \frac{1}{nJ} \sum_{i=1}^{n} \sum_{j=0}^{J} \left( Y_{i,j} - \sum_{d=1}^{D} \theta_d Z_{i,j}^d \right)^2,$$
which solves a convex problem and can be equally computed by routines such as lsqlin in Matlab. Let $\tilde{F}_\eta(\alpha) = \sum_{d=1}^{D} \tilde{\theta}_d\, 1(\alpha_d \leq \alpha)$ denote the corresponding CDF estimator. We compare below the performance of the resulting CDF and quantile estimators based on $\hat{\theta}$ and $\tilde{\theta}$, respectively.

The Monte Carlo setting we consider is taken from a recent study by Heiss, Hetzenecker and Osterhaus (2019). The data generating process is as follows. The number of products (not including the outside good) is $J = 3$. The number of product characteristics is $K = 2$. The characteristics are generated as independent uniforms on [0 , . The random coefficient distribution is a mixture of two bivariate normal distributions with probability weights (1 / , / , means ( − . , − . 2) and (1 . , . 3) and equal variances Σ = Σ = Σ given by Σ = " . . . 15 0 . . To generate the grid $\{\alpha_d\}_{d=1}^{D}$ we use a Halton sequence with points spread on [ − , × [ − , . The fixed grid covers the support of the true distribution with probability close to one. We consider different values for the number of points in the grid $D \in \{25, 100, 500\}$ and sample sizes $n \in \{100, 500, 1000\}$. For computing $\tilde{\theta}$ we set the number of components $p$ to 5 throughout (we have experimented with values of $p$ between 3 and 10 and obtained qualitatively similar results). We set $p$ deterministically in simulations to save time, but in practice we recommend cross-validation to select $p$. The number of Monte Carlo simulations is $M = 500$. To evaluate the performance of the CDF estimators we compute the integrated absolute bias
$$Bias(\hat{F}) = \frac{1}{ML} \sum_{m=1}^{M} \sum_{l=1}^{L} \left| \hat{F}_{\eta,m}(\alpha_l) - F(\alpha_l) \right|,$$
where $\{\alpha_l\}_{l=1}^{L}$ is an additional equally spaced grid over [ − , × [ − , 5] with $L = 121$, $\hat{F}_{\eta,m}$ is the fixed grid CDF estimator (cf. 20) for the $m$-th Monte Carlo simulation, and $F$ denotes the true CDF pertaining to $\eta$. We also report the Root integrated Mean Squared Error defined as
$$RMSE(\hat{F}) = \sqrt{\frac{1}{ML} \sum_{m=1}^{M} \sum_{l=1}^{L} \left( \hat{F}_{\eta,m}(\alpha_l) - F(\alpha_l) \right)^2}.$$
The quantities $Bias(\tilde{F})$ and $RMSE(\tilde{F})$ are analogously defined.

Table 1 reports the bias and root mean squared errors for the CDF estimators $\hat{F}$ and $\tilde{F}$.
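The two performance measures translate directly into code. The sketch below is a minimal Python version, assuming the estimated CDFs are stored as one list per Monte Carlo replication and evaluated on a common grid; the function names and the toy numbers are hypothetical and are not results from the paper.

```python
import math


def integrated_abs_bias(F_hat, F_true):
    """Average absolute deviation over replications m and evaluation points l.

    F_hat[m][l] : estimated CDF in replication m at evaluation point l
    F_true[l]   : true CDF at evaluation point l
    """
    M, L = len(F_hat), len(F_true)
    total = sum(abs(F_hat[m][l] - F_true[l]) for m in range(M) for l in range(L))
    return total / (M * L)


def root_integrated_mse(F_hat, F_true):
    """Square root of the average squared deviation over m and l."""
    M, L = len(F_hat), len(F_true)
    total = sum((F_hat[m][l] - F_true[l]) ** 2
                for m in range(M) for l in range(L))
    return math.sqrt(total / (M * L))


# Toy inputs: M = 2 replications, L = 3 evaluation points (hypothetical values).
F_true = [0.2, 0.5, 0.9]
F_hat = [[0.1, 0.5, 1.0],
         [0.3, 0.7, 0.9]]

bias = integrated_abs_bias(F_hat, F_true)
rmse = root_integrated_mse(F_hat, F_true)
print("integrated absolute bias:", bias)
print("root integrated MSE:", rmse)
```

Note that the bias measure averages absolute deviations replication by replication, so it is bounded above by the root integrated mean squared error.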

The first observation is that the bias is small even for small sample sizes such as $n = 100$, and it does not depend much on $D$, which is consistent with our discussion in Section 5.1. The regularization causes $\tilde{F}$ to have a slightly larger bias than $\hat{F}$ in some cases, although the difference is not substantial, and for small samples the bias of $\tilde{F}$ is even smaller. On the other hand, the variance of $\hat{F}$ is systematically larger than that of $\tilde{F}$, particularly for moderate and large values of $D$, consistent with our claims that the level of multicollinearity increases dramatically with the number of points $D$.

Table 1. Bias and RMSE for CDFs in Mixed Logit

   n     D    Bias(F^)   Bias(F~)   RMSE(F^)   RMSE(F~)
  100    25    0.0781     0.0729     0.1791     0.1059
  500    25    0.0663     0.0713     0.1380     0.0933
 1000    25    0.0605     0.0708     0.1231     0.0904
  100   100    0.0799     0.0682     0.1896     0.0999
  500   100    0.0606     0.0639     0.1428     0.0855
 1000   100    0.0511     0.0630     0.1284     0.0831
  100   500    0.0784     0.0651     0.1906     0.0982
  500   500    0.0541     0.0602     0.1452     0.0835
 1000   500    0.0440     0.0592     0.1303     0.0805

M = 500 simulations. F^ denotes the fixed grid estimator and F~ its SVD-regularized version.

Table 2 reports the RMSE for the medians of the marginal distributions of UH (denoted by RMSEQ1 and RMSEQ2 for $\hat{F}$ and RMSEQ1-PCR and RMSEQ2-PCR for $\tilde{F}$, respectively). Results for other quantile levels are reported in the Appendix. We do not report the bias separately to save space, but we note that the bias for quantiles is much larger than the bias for CDFs. We observe substantial gains in terms of RMSE from the regularization by SVD, with the benefits increasing with the number of grid points. Importantly, in both cases, CDFs and quantiles, the reported results are consistent with much slower rates of convergence than parametric, lending support to the infinite efficiency bounds established in this paper.

Table 2. RMSE for Medians of Marginals of UH in the Mixed Logit

   n     D    RMSEQ1   RMSEQ1-PCR   RMSEQ2   RMSEQ2-PCR
  100    25   1.6624     0.8061     1.4621     0.7085
  500    25   0.8492     0.5232     0.8713     0.4155
 1000    25   0.8008     0.4923     0.7386     0.3254
  100   100   1.6084     0.6315     1.8392     0.6514
  500   100   0.9411     0.2995     0.9409     0.2790
 1000   100   0.8947     0.1874     0.8976     0.1832
  100   500   1.6373     0.6360     1.6270     0.5974
  500   500   1.0599     0.2710     0.9917     0.2639
 1000   500   0.9374     0.1879     0.9669     0.1766

M = 500 simulations.

We have established irregular identification of CDFs and quantiles (or more generally, functionals with discontinuous influence functions) of nonparametric UH in some structural economic models. Example applications include the structural model of unemployment with two spells in Alvarez et al. (2015), the binary and linear RC models (possibly with correlated effects), the AME in a triangular model with near-zero first-stage effects, and the distribution and quantiles of UH in the Mixed Logit model. These are only some applications, but the results are applicable more widely. Further examples in the Appendix include mixed proportional duration models, and measurement error models with two measurements identified by means of Kotlarski's lemma. Furthermore, as we discuss in the Appendix, we expect our approach to be applicable to many situations where the so-called Information Operator (see e.g. Begun, Hall, Huang and Wellner (1983)) is a smoothing operator.

The most appealing feature of our method of proof is its simplicity, relative to alternative approaches that directly compute efficiency bounds, which are particularly difficult to compute in the models we have studied. Instead, we exploit some necessary smoothness conditions that the influence function of a regularly identified functional must satisfy. The Mixed Logit example illustrates how easily our method of proof can be applied. In contrast, directly computing the Fisher information and the efficiency bound in this model is rather challenging (and these were unknown prior to this paper). The practical implications of the irregularity of CDFs and quantiles have been investigated in a Monte Carlo study. We have found substantial benefits from regularizing the fixed grid estimator of Bajari et al. (2007), Fox et al. (2011) and Fox et al. (2016), without sacrificing much of its appealing computational simplicity. Future research on the theoretical properties of regularized estimators is warranted.

Appendix A: Proofs of Main Results

Proof of Lemma 3.1: First, the functional $\eta \to \phi(\eta) = E_\eta[r(\alpha)]$ is differentiable with influence function $\chi(\alpha) = \Pi_{\overline{T(\eta)}}\, r(\alpha)$, where $\Pi_{\overline{V}}$ denotes the orthogonal projection operator onto the closure of $V$, $\overline{V}$. To see this, note that by linearity of $\eta \to \phi(\eta)$, for all $b \in T(\eta)$,
$$\lim_{t \to 0} \frac{\phi(\eta_t) - \phi(\eta)}{t} = E_\eta[r(\alpha) b(\alpha)] = E_\eta\left[ \left( \Pi_{\overline{T(\eta)}}\, r(\alpha) \right) b(\alpha) \right].$$
Since UH is nonparametric, $\Pi_{\overline{T(\eta)}}\, r(\alpha) = r(\alpha) - \phi(\eta)$. On the other hand, by Lemma 25.34 in van der Vaart (1998) the adjoint of the score operator is given by $A^* s = E[s(Z) \mid \alpha] - E[s(Z)]$. The lemma then follows from Theorem 3.1 and Theorem 4.1 in van der Vaart (1991), which establish that a necessary condition for positive Fisher information for $\phi(\eta)$ is $r(\alpha) - \phi(\eta) = E[s(Z) \mid \alpha]$, since $E[s(Z)] = 0$. ■

Proof of Lemma 3.2: Let $\alpha_n, \alpha \in N$ be such that $\alpha_n \to \alpha$, and define $h_n(z) = s(z) f_{z/\alpha_n}(z)$. Note that (i) implies $h_n(z) \to h(z) := s(z) f_{z/\alpha}(z)$ a.e.-$\mu$. Also, by the dominance condition, for a sufficiently large $n$,
$$\int |h_n(z)|\, d\mu(z) < \infty.$$
We conclude by dominated convergence that
$$\int s(z) f_{z/\alpha_n}(z)\, d\mu(z) \to \int s(z) f_{z/\alpha}(z)\, d\mu(z). \; ■$$

Proof of Corollary 3.1: By Lemma 3.2, if the influence function of the functional is discontinuous then the functional is not regularly identified. Since the indicator is not continuous, this proves the corollary. ■

Proof of Corollary 3.2: Lemma 21.3 in van der Vaart (1998) shows the pathwise differentiability of the quantile functional with an influence function
$$r_\phi(\alpha) = -\frac{1(\alpha < \phi(\eta)) - \tau}{\dot{\eta}(\phi(\eta))},$$
so that $\eta \to \phi(\eta)$ satisfies, for all $b \in T(\eta)$,
$$\lim_{t \to 0} \frac{\phi(\eta_t) - \phi(\eta)}{t} = E_\eta[r_\phi(\alpha) b(\alpha)].$$
From van der Vaart (1991) it follows that a necessary condition for the quantile functional to be differentiable is
$$r_\phi(\alpha) - \phi(\eta) = \int s(z) f_{z/\alpha}(z)\, d\mu(z).$$
By Lemma 3.2, if the influence function of the functional is discontinuous then the functional is not regularly identified. Since the influence function of the quantile is not continuous, this proves the corollary. ■

Proof of Proposition 3.1 : By substitution of f z/α ( t , t ) we obtain E [ s ( Z ) | α ] = Z T s ( t , t ) f z/α ( t , t ) dt dt = Cβ e αβ h ( α , α ) , where h ( u, v ) = Z T s ( t , t ) 1 t / t / s ( u, v ; t ) s ( u, v ; t ) dt dt and s ( u, v ; t ) = exp (cid:18) − ut − v t (cid:19) , t ∈ T , ( u, v ) ∈ (0 , ∞ ) . We check that the conditions for an application of the Leibniz’s rule hold. These conditions are The partial derivative ∂ m s ( u, v ; t ) s ( u, v ; t ) /∂ m u exists and is a continuous function on anopen neighborhood B of ( u, v ) , for a.s. ( t , t ) ∈ T . There is a positive function h m ( t , t ) such thatsup ( u,v ) ∈ B (cid:12)(cid:12)(cid:12)(cid:12) ∂ m s ( u, v ; t ) s ( u, v ; t ) ∂ m u (cid:12)(cid:12)(cid:12)(cid:12) ≤ h m ( t , t ) (21)and Z T s ( t , t ) 1 t / t / h m ( t , t ) dt dt < ∞ . (22)Simple diﬀerentiation and induction show that for any integer m ≥ ∂ m s ( u, v ; t ) s ( u, v ; t ) ∂ m u = 2 − m ( − m ( t + t ) m s ( u, v ; t ) s ( u, v ; t ) . u ∗ and v ∗ such that (21) holds with h m ( t , t ) = 2 − m ( t + t ) m s ( u ∗ , v ∗ ; t ) s ( u ∗ , v ∗ ; t ) . Furthermore, by E [ s ( Z ) | α ] < ∞ for all α in a local neighborhood (by local boundedness of r ) , and the boundedness of T , condition (22) holds. The continuity of h ( u, v ) is a special case ofthe previous arguments with m = 0 (note the term ( t + t ) m is one and the boundedness of T is not needed in this case). (cid:4) Proof of Proposition 4.1 : Deﬁne b ( α ) = E [ s ( Y i = 1 , X i ) | α i = α ]= Z x ′ α ≥ s (1 , x ) dv X ( x ) . We prove that b is continuous and by compactness of the sphere is therefore uniformly continuous.Since the halfspaces 1 ( x ′ α ≥

0) and 1 ( x ′ α ≥

0) intersect in sets having surface measure of order | α − α | , it follows from the absolutely continuity of the angular component of X that | b ( α ) − b ( α ) | = O ( | α − α | ) . When x = (1 , ˜ x ) , then b ( α ) = Z x ′ α ≥ − α ) s (1 , , ˜ x ) dv X (˜ x ) , = Z u ≥ − α ) s α ( u ) f α ( u ) du, where s α ( u ) = E h s ( Y i = 1 , , ˜ X i ) (cid:12)(cid:12)(cid:12) α ′ ˜ X i = u i and f α denotes the density of α ′ ˜ X i . The absolutecontinuity in α follows from the integrability of s α ( u ) f α ( u ) and Royden (1968, Chapter 5). (cid:4) Proof of Corollary 4.1 : The proof follows as in Corollaries 3.1 and 3.2. (cid:4)

For a function a ∈ L ( λ ) ∩ L ( λ ) , deﬁne the Fourier transform ˆ a ( t ) = R e it ′ α a ( α ) dα, where i = √− . Use the notation ˜ g ( p, x ) = Z e ipy g ( y, x ) dy, for the Fourier transform with respect to just the ﬁrst argument (for g ( · , x ) ∈ L ( λ ) ∩ L ( λ )) . Deﬁne the norms | g | ,ρ = Z S dα − Z R | ˜ g ( p, x ) | (1 + | p | ) ρ dpdx (23)and | g | ρ = Z | ˆ g ( t ) | (1 + | t | ) ρ dt. (24)26he Sobolev space H ρ ( A ) is deﬁned as the set of measurable functions g such that | g | ρ < ∞ . Proof of Proposition 4.2 : Deﬁne the score operator A : T ( η ) → L Ab ( z ) = Rbη ( z ) f η ( z ) 1( f η ( z ) > , where R denotes the Radon transform Ra ( y, x ) = Z a ( α )1( y = x ′ α ) dα. Deﬁne g ( z ) = s ( z ) f η ( z ) and a ( α ) = b ( α ) η ( α ) . Since f η ( z ) and η are bounded, it follows that g and a are in L ( λ ) ∩ L ( λ ) . From the deﬁnition of Ra ( y, x )sup y,x | Ra ( y, x ) | ≤ Z | a ( α ) | dα < ∞ , (25)and since the supports of α and X are bounded, the support of Y is also bounded and Ra ∈ L ( λ ) , so we can view R : L ( λ ) → L ( λ ) . First, we show that if s belongs to the closure of the range of A, then g ( z ) = s ( z ) f η ( z )belongs to the closure of the range of R. Indeed, if s n is a sequence in the range of A convergingto s in L , then g n = s n f η ( z ) ≡ Ra n and clearly Z | g n ( z ) − g ( z ) | dz ≤ Z | s n ( z ) − s ( z ) | f η ( z ) dz → . Next, we shall show that any function g in the closure of the range of R will have an squaredintegrable weak derivative with respect to the ﬁrst argument (in y ) . By Theorem 2.4.1 in Rammand Katsevich (1996) and Assumption 3(iii) it follows that | g | ,ρ < ∞ for ρ = ρ + ( d α − / . 
While by well known results in Fourier analysis, with ∂ y g denoting the weak derivative withrespect to y Z S dα − Z (cid:12)(cid:12)(cid:12) f ∂ y g ( p, x ) (cid:12)(cid:12)(cid:12) dpdx ≤ Z S dα − Z | p | | e g ( p, x ) | dpdx ≤ Z S dα − Z | ˜ g ( p, x ) | (1 + | p | ) ρ dpdx< ∞ , and similarly, by Cauchy-Schwarz Z S dα − Z (cid:12)(cid:12)(cid:12) f ∂ y g ( p, x ) (cid:12)(cid:12)(cid:12) dpdx ≤ Z S dα − Z (cid:0) | p | (cid:1) / | e g ( p, x ) | dpdx ≤ C (cid:18)Z S dα − Z (1 + | p | ) − ρ dpdx (cid:19) / < ∞ , because ρ > . f ∂ y g ( p, x ) ∈ L ( λ ) ∩ L ( λ ) and by Plancherell’s theorem ∂ y g ( · ) ∈ L ( λ ) , as we claimed.Deﬁne ϕ ( · ) = ∂ y g ( · ) ∈ L ( λ ) . We proceed to verify the conditions of the dominated conver-gence theorem, see Lemma 3.2. First, we show that g ( y, x ) is continuous in y. Indeed, by thebounded support assumption g ( y, x ) = Z y −∞ ϕ ( u, x ) dx is absolutely continuous in y (see Royden 1968, Chapter 5).Next, by independence of α i and X i , P [ Y i ≤ y | X i = x ] = P [ x ′ α i ≤ y ] , and taking derivatives we conclude f η ( z ) = η ,x ( y ) . Thus, f η ( z ) is also continuous in y byAssumption 3(i). Moreover, inf α ∈ N η ,x ( x ′ α ) ≥ /l ( x ) > , which yields the continuity of α → s ( x ′ α, x ) in N. Furthermore, by Cauchy-Schwarz and Z sup α ∈ Γ | s ( x ′ α, x ) | f X ( x ) dx = Z sup α ∈ Γ | g ( x ′ α, x ) | sup α ∈ Γ (cid:12)(cid:12)(cid:12)(cid:12) f X ( x ) f η ( x ′ α, x ) (cid:12)(cid:12)(cid:12)(cid:12) dx ≤ (cid:18)Z | ϕ ( u, x ) | dudx (cid:19) / Z sup α ∈ Γ (cid:12)(cid:12)(cid:12)(cid:12) f X ( x ) f η ( x ′ α, x ) (cid:12)(cid:12)(cid:12)(cid:12) dx ! / ≤ C Z l ( x ) f X ( x ) dx ≤ C. Thus, by dominated convergence r must be continuous in N . (cid:4) Proof of Corollary 4.2 : The proof follows as in Corollaries 3.1 and 3.2. (cid:4)

Proof of Proposition 4.3 : A necessary condition for a reduced form functional φ ( η ) = E η [ r ( α )] to be regularly identiﬁed is r ( α ) − φ ( η ) = Z s ( α + α x, x ) dv X ( x ) , α = ( α ′ , α ′ ) = ( π , U , π , δ ) ′ . Thus, by Proposition 4.2 r ( α ) must be continuous in N. However, the inﬂuence function for thePPAME r P P AME ( α ) = 1( π > δ >

0) + 1( π < δ < p , u , , d ) or ( p , u , p , . Conclude that the PPAME is notregularly identiﬁed. As for AME, by E [ γ ] < ∞ this functional is diﬀerentiable in the sense ofvan der Vaart (1991) with an inﬂuence function r AME ( β ) = π /δ. Since there is no continuous28unction that is η − a.s equal to r AME ( β ) = π /δ when ( p , u , p ,

0) is a point in the interior ofthe support, we conclude that the AME is not regularly identiﬁed. (cid:4)

Proof of Lemma 6.1: By Lemma 25.34 in van der Vaart (1998), the so-called score operator is given by
$$Ab(z) = E[b(\alpha) \mid Z = z], \qquad b \in T(\eta).$$
Thus, by the law of iterated expectations,
$$E[Ab(Z)s(Z)] = E[b(\alpha)s(Z)] = E\big[b(\alpha)E[s(Z) \mid \alpha]\big] = E\big[b(\alpha)\,\Pi_{T(\eta)}E[s(Z) \mid \alpha]\big].$$
In Lemma 3.2 we have shown that the functional $\eta \to \phi(\eta) = E_\eta[r(\alpha)]$ is differentiable with influence function $\chi(\alpha) = \Pi_{T(\eta)} r(\alpha)$. The lemma then follows from Theorem 3.1 in van der Vaart (1991). ■
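The iterated-expectations step in this proof says that $s \mapsto E[s(Z) \mid \alpha]$ acts as the adjoint of the score operator $b \mapsto E[b(\alpha) \mid Z]$. A minimal numerical sketch of that identity follows, assuming a hypothetical finite model in which both $\alpha$ and $Z$ take finitely many values; the dimensions, the random probabilities and all variable names are illustrative assumptions, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite model: alpha takes K values, Z takes J values.
K, J = 4, 6
eta = rng.dirichlet(np.ones(K))               # P(alpha = k)
f_z_given_a = rng.dirichlet(np.ones(J), K)    # row k holds P(Z = j | alpha = k)

joint = eta[:, None] * f_z_given_a            # P(alpha = k, Z = j)
f_z = joint.sum(axis=0)                       # marginal P(Z = j)

b = rng.normal(size=K)
b = b - eta @ b                               # centre b so that E[b(alpha)] = 0
s = rng.normal(size=J)                        # arbitrary function of Z

Ab = (joint * b[:, None]).sum(axis=0) / f_z   # (Ab)(z) = E[b(alpha) | Z = z]
A_star_s = f_z_given_a @ s                    # (A*s)(alpha) = E[s(Z) | alpha]

lhs = np.sum(f_z * Ab * s)                    # E[Ab(Z) s(Z)]
rhs = np.sum(eta * b * A_star_s)              # E[b(alpha) E[s(Z) | alpha]]
print(lhs, rhs)
```

Both sides reduce to the same double sum over the joint distribution, so they agree up to floating-point error in this toy setting.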

Proof of Lemma 6.2: The sieve measure of ill-posedness (cf. Blundell, Chen and Kristensen 2007) is
$$\tau_D = \sup_{b \in T(\eta),\, b \neq 0} \frac{\|b\|}{\|Ab\|}.$$
Since $T(\eta) = \mathrm{span}(l_\eta)$ and $E_\eta[l_\eta(\alpha) l_\eta'(\alpha)]$ is the identity, then $b = \lambda' l_\eta$ and $\|b\|^2 = \lambda'\lambda = |\lambda|^2$, while $\|Ab\|^2 = \lambda' E[s(Z)s'(Z)]\lambda$. Thus,
$$\tau_D^2 = \sup_{\lambda \in \mathbb{R}^D,\, \lambda \neq 0} \frac{|\lambda|^2}{\lambda' E[s(Z)s'(Z)]\lambda} = \frac{1}{\inf_{\lambda \in \mathbb{R}^D,\, |\lambda| = 1} \lambda' E[s(Z)s'(Z)]\lambda} = \frac{1}{\xi_{\min}(D)}.$$
The bound then follows from Lemma 1 in Blundell, Chen and Kristensen (2007). ■
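The key step above is a Rayleigh-quotient argument: the supremum of $|\lambda|^2 / \lambda' M \lambda$ over nonzero $\lambda$ equals the reciprocal of the smallest eigenvalue of $M$. The sketch below illustrates this numerically, under the assumption that $\xi_{\min}(D)$ stands for the smallest eigenvalue of $M = E[s(Z)s'(Z)]$; the simulated draws standing in for $s(Z)$ and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 5

# Simulated draws standing in for the D-vector s(Z) (purely illustrative).
S = rng.normal(size=(2000, D)) @ rng.normal(size=(D, D))
M = S.T @ S / S.shape[0]                   # sample analogue of E[s(Z) s(Z)']

eigvals, eigvecs = np.linalg.eigh(M)       # eigenvalues in ascending order
xi_min = eigvals[0]                        # smallest eigenvalue of M
tau_sq = 1.0 / xi_min                      # candidate value of the supremum

# Brute-force check: the ratio |lam|^2 / (lam' M lam) over random directions
# never exceeds 1 / xi_min ...
lams = rng.normal(size=(10000, D))
ratios = np.einsum("ij,ij->i", lams, lams) / np.einsum("ij,jk,ik->i", lams, M, lams)

# ... and the bound is attained at the eigenvector of the smallest eigenvalue.
v = eigvecs[:, 0]
attained = (v @ v) / (v @ M @ v)
print(ratios.max(), attained, tau_sq)
```

A small `xi_min` translates directly into a large measure of ill-posedness, which is why near-singular moment matrices signal slow attainable rates.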

10 Appendix B: Further Results

In this section we describe a generic approach that can be used for generic nonlinear RC models with continuous outcomes. We also illustrate how certain invertible RC models are ruled out by our conditions. For the generic RC model in (7), the regularity condition reads
$$r(\alpha) - \phi(\eta) = E[s(m(X_i, \alpha), X_i)]. \qquad (26)$$
Again, the main difficulty in proving that the right hand side of (26) is continuous is that the score function $s(\cdot)$ is only known to be in $L_2$ (thus, $s$ is potentially very discontinuous). To overcome this difficulty, we resort to Fourier analysis and use the so-called Parseval's identity (see Rudin 1987, pg. 187). To describe the method, assume $X$ is absolutely continuous with density $f_X(x)$, and define $g(z) = s(z) f_\eta(z)$ and
$$w(z, \alpha) = 1(y = m(x, \alpha)) \frac{f_X(x)}{f_\eta(z)} 1(f_\eta(z) > 0).$$
Note that $g \in L_1(\lambda)$, and since $f_\eta$ is bounded, also $g \in L_2(\lambda)$. Let $\eta_{m,x}$ denote the density of $m(x, \alpha)$ when $\alpha$ has density $\eta$. Under our conditions below, $w(\cdot, \alpha) \in L_1(\lambda) \cap L_2(\lambda)$, and by Parseval's identity, if $r$ satisfies (26) then
$$r(\alpha) - \phi(\eta) = \int \hat g(t)\, \overline{\hat w(t, \alpha)}\, dt, \qquad (27)$$
where, for a generic function $h \in L_1(\lambda)$, $\hat h(t) = (2\pi)^{-d_z/2} \int e^{-it'z} h(z)\, dz$ denotes the Fourier transform, with $i = \sqrt{-1}$, $\bar v$ denotes the complex conjugate of $v$, and
$$\overline{\hat w(t, \alpha)} = (2\pi)^{-d_z/2} \int \frac{f_X(x)}{\eta_{m,x}(m(x, \alpha))} e^{i(t_1 m(x, \alpha) + t_2' x)}\, dx.$$
This integral representation is now amenable to our Lemma 3.2 under the following assumption.

Assumption 5 (i) The vector $X$ is absolutely continuous with a bounded density $f_X(\cdot)$; (ii) the density $\eta_{m,x}$ is continuous and satisfies $\inf_{\alpha \in N} \eta_{m,x}(m(x, \alpha)) > 1/l(x)$ for an a.s. positive measurable function $l(\cdot)$ such that $E_X[l(X)] < \infty$; (iii) the function $\alpha \to m(x, \alpha)$ is continuous a.s. in $x$; (iv) for all $\hat g$ satisfying (27),
$$\int |\hat g(t)| \sup_{\alpha \in \Gamma} \left| \overline{\hat w(t, \alpha)} \right| dt < \infty. \qquad (28)$$

Proposition 10.1 Under Assumption 5 and if $r$ satisfies (11), then $r(\cdot)$ must be continuous on $N$.

Proof of Proposition 10.1: First, we need to check that $g$ and $w(z, \alpha)$ are in $L_1(\lambda) \cap L_2(\lambda)$, so we can apply Parseval's identity. From $s \in L_2$ and the definition $g(z) = s(z) f_\eta(z)$, it is clear that $g \in L_1(\lambda)$. Next, note
$$f_\eta(z) \leq \int_{\mathbb{R}^d} d\eta(\alpha) = 1.$$
Hence, $g$ also belongs to $L_2(\lambda)$. Furthermore, by independence of $\alpha_i$ and $X_i$,
$$P[Y_i \leq y \mid X_i = x] = P[m(x, \alpha_i) \leq y],$$
and taking derivatives we conclude $f_\eta(z) = \eta_{m,x}(y)$. Then, for $p = 1$ or $2$,
$$\int |w(z, \alpha)|^p dz = \int \left| \frac{f_X(x)}{\eta_{m,x}(m(x, \alpha))} \right|^p dx \leq \int l^p(x) |f_X(x)|^p dx \leq C \int l^p(x) f_X(x)\, dx < \infty,$$
because $f_X$ is bounded. Then, we can apply Parseval's identity and obtain
$$r(\alpha) - \phi(\eta) = \int \hat g(t)\, \overline{\hat w(t, \alpha)}\, dt.$$
We now proceed to verify the conditions of Lemma 3.2 with $\hat g(\cdot)$ playing the role of $s$ and $\overline{\hat w(t, \alpha)}$ that of the conditional density. Note
$$\overline{\hat w(t, \alpha)} = (2\pi)^{-d_z/2} \int \frac{f_X(x)}{\eta_{m,x}(m(x, \alpha))} e^{i(t_1 m(x, \alpha) + t_2' x)}\, dx.$$
Under the conditions of the proposition, the function $\alpha \to \hat w(t, \alpha)$ is continuous on $N$, since $\eta_{m,x}(\cdot)$ and $m(x, \cdot)$ are continuous and $\eta_{m,x}(m(x, \alpha))$ is bounded away from zero on $N$. Furthermore, the dominance condition holds from (28). Conclude by applying dominated convergence one more time under the dominance condition of Assumption 5(iv). ■
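The Parseval step used in this proof can be checked in a discrete analogue. The sketch below is only an illustration under stated assumptions: it uses random finite sequences in place of $g$ and $w$, and the unitary (`norm="ortho"`) discrete Fourier transform as a stand-in for the symmetric $(2\pi)^{-d_z/2}$ normalisation; none of the names come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 256

# Arbitrary real sequences standing in for g and w (illustrative only).
g = rng.normal(size=N)
w = rng.normal(size=N)

# Unitary DFT: the discrete counterpart of a symmetric Fourier normalisation.
G = np.fft.fft(g, norm="ortho")
W = np.fft.fft(w, norm="ortho")

lhs = np.sum(g * np.conj(w))   # inner product in the original domain
rhs = np.sum(G * np.conj(W))   # inner product in the frequency domain
print(lhs, rhs)
```

The two inner products coincide up to rounding, which mirrors how the proof trades an integral of a possibly very irregular $s$ against a smooth kernel for an integral in the Fourier domain.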

Among the conditions of Assumption 5, the most important one is (28). We will see that in a class of invertible models this condition fails to be satisfied. Consider the canonical monotonic nonseparable model
$$Y_i = m(X_i, \alpha_i)$$
with a scalar $\alpha_i$ and where $\alpha \to m(x, \alpha)$ is strictly increasing with inverse $m^{-1}(y, x)$. Then, if we define $s(Y_i, X_i) = 1(m^{-1}(Y_i, X_i) \leq 0)$, the regularity condition of Lemma 3.1 is satisfied with $r(\alpha) = 1(\alpha \leq 0)$, proving that the necessary condition for regular identification of the CDF at 0 (or at any other point in fact) holds. In invertible models like this, regularity of CDFs and quantiles is satisfied even in cases where $m$ is not known, but identified. Our results do not apply to invertible models where heterogeneity can be recovered as an identified function of observables.

To give a specific example, consider the model $Y_i = X_i + \alpha_i$, where $s(Y_i, X_i) = 1(Y_i \leq X_i)$ solves (2) with $r(\alpha) = 1(\alpha \leq 0)$, which is discontinuous at 0. This is of course an unrealistic model, but the idea is simply to illustrate which of our assumptions is key for the results to hold. In this example, Assumption 5(i-ii) is satisfied under mild conditions, since $\eta_{m,x}(m(x, \alpha)) = \eta(\alpha)$, but the integrability condition (28) fails, since for $s(Y_i, X_i) = 1(Y_i \leq X_i)$,
$$\int |\hat g(t)| \sup_{\alpha \in \Gamma} \left| \overline{\hat w(t, \alpha)} \right| dt = \inf_{\alpha \in \Gamma} \eta(\alpha) \int |\hat g(t)|\, dt = \infty,$$
where $\hat g(t) = \int_{\alpha \leq 0} \eta(\alpha) e^{it\alpha}\, d\alpha$. Note that the discontinuity implies the lack of integrability.
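To make the invertible example concrete, the following Monte Carlo sketch simulates $Y_i = X_i + \alpha_i$ and checks that the sample mean of $1(Y_i \leq X_i)$ recovers the CDF of $\alpha$ at 0 with ordinary binomial precision. The normal distribution for $\alpha$, the uniform distribution for $X$, the sample size and the seed are all assumptions chosen for illustration.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(42)
n = 200_000

# Hypothetical design: alpha ~ N(0.5, 1), X ~ U(-2, 2), independent of alpha.
alpha = rng.normal(loc=0.5, scale=1.0, size=n)
x = rng.uniform(-2.0, 2.0, size=n)
y = x + alpha                                    # invertible model Y = X + alpha

# Sample mean of s(Y, X) = 1(Y <= X), an estimate of P(alpha <= 0).
cdf_hat = np.mean(y <= x)

# Closed-form P(alpha <= 0) for alpha ~ N(0.5, 1), via the error function.
cdf_true = 0.5 * (1.0 + erf((0.0 - 0.5) / sqrt(2.0)))

# Binomial standard error of a sample proportion at the true value.
se = np.sqrt(cdf_true * (1.0 - cdf_true) / n)
print(cdf_hat, cdf_true, se)
```

Because $\alpha_i = Y_i - X_i$ is an observed quantity here, the CDF is estimated by a simple sample proportion, which is the sense in which invertible models fall outside the irregularity results.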

There is a growing literature in econometrics identifying the distribution of latent variables by means of Kotlarski's Lemma (see Prakasa Rao (1983) for a description of the method). In this setting we observe $Z = (Y_1, Y_2)$ satisfying
$$Y_1 = \alpha_1 + \alpha_2, \qquad Y_2 = \alpha_1 + \alpha_3,$$
where $\alpha = (\alpha_1, \alpha_2, \alpha_3)'$ is a vector of UH with independent components, and (with some abuse of notation) Lebesgue densities $\eta_j$, for $j = 1, 2, 3$. The density of the data is given by
$$f_\eta(y_1, y_2) = \int 1(y_1 = \alpha_1 + \alpha_2) 1(y_2 = \alpha_1 + \alpha_3) \eta_1(\alpha_1) \eta_2(\alpha_2) \eta_3(\alpha_3)\, d\alpha_1 d\alpha_2 d\alpha_3 = \int \eta_2(y_1 - \alpha_1) \eta_3(y_2 - \alpha_1) \eta_1(\alpha_1)\, d\alpha_1.$$
Consider a parametric submodel where $\eta_2$ and $\eta_3$ are known and continuous. The model then reduces to our original setting where $f_{z/\alpha}(z) = \eta_2(y_1 - \alpha_1) \eta_3(y_2 - \alpha_1)$ is known and continuous in $\alpha_1$. If the dominance condition of Lemma 3.2 is satisfied, then the CDF and quantiles of $\eta_1$ will be irregularly identified.

The Mixed Proportional Hazard Model leads to a conditional density for duration $Y$ given a vector of covariates $X$ given by
$$f_\eta(y, x) = \int \phi(x) \psi(y) \alpha e^{-\phi(x) \Psi(y) \alpha}\, d\eta(\alpha),$$
where $\phi(x)$ is a transformation of covariates, $\Psi(y)$ is the baseline cumulative hazard, with derivative $\psi$, and $\alpha$ denotes UH. In a submodel where $\phi(x)$ and $\psi(y)$ are known, the model fits our original formulation with $f_{z/\alpha}(z) = \phi(x) \psi(y) \alpha e^{-\phi(x) \Psi(y) \alpha}$ known and continuous as a function of $\alpha$. Indeed, Horowitz (1999) has established very slow rates of convergence (logarithmic) for the CDF of $\alpha$, consistent with the irregular identification.

The necessary condition for regular estimation in van der Vaart (1991) is quite general, and in its abstract form reads as
$$\tilde\psi \in R(A^*),$$
where $\tilde\psi$ is the so-called gradient, which for our original moment functional is $\tilde\psi(\alpha) = r(\alpha) - \phi(\eta)$, and $A^*$ is the adjoint of the so-called score operator $A$. In many semiparametric models, $A^*$ is a smoothing integral operator, in the sense that
$$A^* s = \int s(z) k(z, \alpha)\, d\mu(z)$$
is an operator from $L_2$ to $L_2(\eta)$ with a kernel function $k$ such that $\alpha \to k(z, \alpha)$ is smooth, at least for some submodel. We expect our results to be potentially applicable in this general setting.

We report here further results for estimation of quantiles in the Mixed Logit Model. The setting is that of the Monte Carlo section, the only difference being that quantile levels $\tau$ other than the median ($\tau = 0.5$) are considered. Table 3 reports the RMSE. We observe that, as expected, the RMSEs at more extreme quantiles are larger than those for the median. Again, the gains from regularization are substantial, particularly for large values of R.

Table 3. RMSE for τ-Quantiles of marginals of UH

τ      n      D     RMSEQ1   RMSEQ1-PCR   RMSEQ2   RMSEQ2-PCR
0.25   100    25    1.6739   0.9308       1.7238   0.7270
0.25   500    25    1.3247   0.8674       1.3270   0.6305
0.25   1000   25    1.0596   0.8075       1.1569   0.6250
0.25   100    100   1.8369   0.6098       1.8217   0.6608
0.25   500    100   1.4038   0.4929       1.3763   0.5504
0.25   1000   100   1.3153   0.4832       1.2041   0.5036
0.25   100    500   1.8075   0.5893       1.8696   0.6275
0.25   500    500   1.4529   0.4520       1.4698   0.4894
0.25   1000   500   1.2954   0.4444       1.2481   0.4500
0.75   100    25    1.5719   0.9803       1.6573   0.9928
0.75   500    25    1.1938   0.8077       1.3158   0.7953
0.75   1000   25    0.8941   0.7045       1.1489   0.7525
0.75   100    100   1.8192   0.8616       1.7989   0.8099
0.75   500    100   1.2178   0.6029       1.1809   0.5936
0.75   1000   100   0.8947   0.5495       0.9476   0.5358
0.75   100    500   1.9017   0.8107       1.9402   0.8329
0.75   500    500   1.2381   0.5666       1.2324   0.5467
0.75   1000   500   0.9606   0.4885       0.9533   0.5102

References

Alvarez, F., Borovičková, K. and R. Shimer (2016): “Decomposing Duration Dependence in a Stopping Time Model,” unpublished manuscript.

Arellano, M. and S. Bonhomme (2012): “Identifying Distributional Characteristics in Random Coefficients Panel Data Models,” Review of Economic Studies, 79, 987-1020.

Bajari, P., Fox, J.T., Ryan, S. (2007): “Linear Regression Estimation of Discrete Choice Models with Nonparametric Distributions of Random Coefficients,” American Economic Review, 72, 459-463.

Begun, J. M., W. J. Hall, W. M. Huang, and J. A. Wellner (1983): “Information and Asymptotic Efficiency in Parametric-Nonparametric Models,” The Annals of Statistics, 11, 432-452.

Beran, R. and P. Hall (1992): “Estimating Coefficient Distributions in Random Coefficient Regressions,” The Annals of Statistics, 20, 1970-1984.

Beran, R., Feuerverger, A., and Hall, P. (1996): “On Nonparametric Estimation of Intercept and Slope Distributions in Random Coefficient Regression,” The Annals of Statistics, 24, 2569-2592.

Bester, A., and C. Hansen (2007): “Flexible Correlated Random Effects Estimation in Panel Models with Unobserved Heterogeneity,” unpublished manuscript.

Bickel, P. J., C. A. J. Klaassen, Y. Ritov, and J. A. Wellner (1998): Efficient and Adaptive Estimation for Semiparametric Models. New York: Springer-Verlag.

Blundell, R., X. Chen, and D. Kristensen (2007): “Semi-nonparametric IV Estimation of Shape-invariant Engel Curves,” Econometrica, 75, 1613-1670.

Bonhomme, S. (2011): “Panel Data, Inverse Problems, and the Estimation of Policy Parameters,” unpublished manuscript.

Boyd, J.H., and Mellman, R.E. (1980): “Effect of Fuel Economy Standards on the US Automotive Market: An Hedonic Demand Analysis,” Transportation Research B.

Briesch, R. A., P. K. Chintagunta and R. L. Matzkin (2010): “Nonparametric Discrete Choice Models With Unobserved Heterogeneity,” Journal of Business & Economic Statistics, 28, 291-307.

Cardell, N.S. and Dunbar, F.C. (1980): “Measuring the Societal Impacts of Automobile Downsizing,” Transportation Research B.

Chamberlain, G. (1986): “Asymptotic Efficiency in Semi-Parametric Models with Censoring,” Journal of Econometrics, 34, 305-334.

Chamberlain, G. (1992): “Efficiency Bounds for Semiparametric Regression,” Econometrica, 60, 567-596.

Chen, J. (1995): “Optimal Rate of Convergence for Finite Mixture Models,” The Annals of Statistics, 23, 221-233.

Chen, X. (2007): “Large Sample Sieve Estimation of Semi-Nonparametric Models,” in: Handbook of Econometrics, vol. 7. Elsevier.

Chen, X. and Z. Liao (2014): “Sieve M-Inference of Irregular Parameters,” Journal of Econometrics, 182, 70-86.

Elbers, C. and G. Ridder (1982): “True and Spurious Duration Dependence: The Identifiability of the Proportional Hazard Model,” Review of Economic Studies, 49, 403-409.

Engl, H. W., M. Hanke, and A. Neubauer (1996): Regularization of Inverse Problems. Dordrecht: Kluwer Academic Publishers.

Escanciano, J.C. (2020): “Semiparametric Identification and Fisher Information,” unpublished manuscript.

Florens, J.P., J.J. Heckman, C. Meghir and E. Vytlacil (2008): “Identification of Treatment Effects Using Control Functions in Models with Continuous, Endogenous Treatment and Heterogeneous Effects,” Econometrica, 76, 1191-1206.

Fox, J. T., K.-I. Kim, S. P. Ryan, and P. Bajari (2011): “A Simple Estimator for the Distribution of Random Coefficients,” Quantitative Economics, 2, 381-418.

Fox, J. T., K.-I. Kim, S. P. Ryan, and P. Bajari (2012): “The Random Coefficients Logit Model Is Identified,” Journal of Econometrics, 166 (2), 204-212.

Fox, J. T., K.-I. Kim, C. Yang (2016): “A Simple Nonparametric Approach to Estimating the Distribution of Random Coefficients in Structural Models,” Journal of Econometrics, 195, 236-254.

Gautier, E. and Y. Kitamura (2013): “Nonparametric Estimation in Random Coefficients Binary Choice Models,” Econometrica, 81, 581-607.

Graham, B. W., and J. L. Powell (2012): “Identification and Estimation of Average Partial Effects in ‘Irregular’ Correlated Random Coefficient Panel Data Models,” Econometrica, 80, 2105-2152.

Groeneboom P. and Wellner J.A. (1992). Information Bounds and Nonparametric Maximum Likelihood Estimation. DMV Seminar, vol 19. Birkhäuser, Basel.

Heckman, J. J. (2001): “Micro Data, Heterogeneity, and the Evaluation of Public Policy: Nobel Lecture,” Journal of Political Economy, 109(4), 673-748.

Heckman, J.J. and B. Singer (1984a): “The Identifiability of the Proportional Hazard Model,” Review of Economic Studies, 51, 231-241.

Heckman, J.J. and B. Singer (1984b): “A Method for Minimizing the Impact of Distributional Assumptions in Econometric Models for Duration Data,” Econometrica, 52, 271-320.

Heckman, J.J., Smith, J. and N. Clements (1993): “Making The Most Out of Programme Evaluations and Social Experiments: Accounting For Heterogeneity in Programme Impacts,” Review of Economic Studies, 64, 487-535.

Heckman, J.J. and E. Vytlacil (1998): “Instrumental Variables Methods for the Correlated Random Coefficient Model: Estimating the Average Rate of Return to Schooling When the Return is Correlated with Schooling,” Journal of Human Resources, 33, 974-987.

Heckman, J.J. and E. Vytlacil (2005): “Structural Equations, Treatment Effects and Econometric Policy Evaluation,” Econometrica, 73, 669-738.

Heiss, F., S. Hetzenecker and M. Osterhaus (2019): “Nonparametric Estimation of the Random Coefficients Model: An Elastic Net Approach,” unpublished manuscript.

Heinrich, P. and J. Kahn (2018): “Optimal Rates for Finite Mixture Estimation,” The Annals of Statistics, 46, 2844-2870.

Hildreth, C. and J.P. Houck (1968): “Some Estimators for a Linear Model with Random Coefficients,” Journal of the American Statistical Association, 63, 584-92.

Hille, E. and J. D. Tamarkin (1931): “On the Characteristic Values of Linear Integral Equations,” Acta Mathematica, 57, 1-76.

Hoderlein, S., H. Holzmann, and A. Meister (2017): “The Triangular Model with Random Coefficients,” Journal of Econometrics, 201, 144-169.

Hoderlein, S., S. Klemela, and E. Mammen (2010): “Analyzing the Random Coefficient Model Nonparametrically,” Econometric Theory, 26, 804-837.

Hoderlein S. and Sherman, B. (2015): “Identification and estimation in a correlated random coefficients binary response model,” Journal of Econometrics, 188, 135-149.

Horowitz, J. L. (1999): “Semiparametric Estimation of a Proportional Hazard Model with Unobserved Heterogeneity,” Econometrica, 67, 1001-1028.

Horowitz, J. L. and L. Nesheim (2019): “Using Penalized Likelihood to Select Parameters in a Random Coefficients Multinomial Logit Model,” forthcoming Journal of Econometrics.

Hu, Y., and S. M. Schennach (2008): “Instrumental Variable Treatment of Nonclassical Measurement Error Models,” Econometrica, 76 (1), 195-216.

Hurwicz L. (1950): “Generalization of the Concept of Identification,” in Statistical Inference in Dynamic Economic Models, ed. T.C. Koopmans. New York: Wiley.

Ichimura, H., and T. Thompson (1998): “Maximum Likelihood Estimation of a Binary Choice Model with Random Coefficients of Unknown Distribution,” Journal of Econometrics, 86, 269-295.

Imbens, G., and J. Angrist (1994): “Identification and Estimation of Local Average Treatment Effects,” Econometrica, 61, 467-476.

Khan, S. and E. Tamer (2010): “Irregular Identification, Support Conditions, and Inverse Weight Estimation,” Econometrica, 6, 2021-2042.

Khuri, A. and G. Casella (2002): “The Existence of the First Negative Moment Revisited,” The American Statistician, 56, 44-47.

Lewbel, A. and K. Pendakur (2017): “Unobserved Preference Heterogeneity in Demand Using Generalized Random Coefficients,” Journal of Political Economy, 125, 1100-1148.

Lewbel, A. (2019): “The Identification Zoo – Meanings of Identification in Econometrics,” Journal of Economic Literature, 57(4), 835-903.

Masten, M.A. (2017): “Random Coefficients on Endogenous Variables in Simultaneous Equations Models,” Review of Economic Studies, 85, 1193-1250.

Masten, M.A. and A. Torgovitsky (2016): “Identification of Instrumental Variables Correlated Random Coefficients Models,” The Review of Economics and Statistics, 98, 1001-1005.

Matzkin, R. L. (2007): “Nonparametric Identification,” in J.J. Heckman and E.E. Leamer (eds.), Handbook of Econometrics, Vol. 6b, pp. 5307-5368, Elsevier, New York.

Matzkin, R. L. (2013): “Nonparametric Identification in Structural Economic Models,” Annual Review of Economics, 5.

Newey, W. K. (1990): “Semiparametric Efficiency Bounds,” Journal of Applied Econometrics, 5, 99-135.

Newey, W. K. (1994): “The Asymptotic Variance of Semiparametric Estimators,” Econometrica, 62, 1349-1382.

Prakasa Rao B. L. S. (1983). Nonparametric Functional Estimation. New York: Acad. Press.

Ramm, A. G. and A. I. Katsevich (1996). The Radon Transform and Local Tomography. CRC Press.

Royden, H. L. (1968). Real Analysis. Second Edition, The Macmillan Company, New York.

Rudin, W. (1987). Real and Complex Analysis.

Severini, T. A., and G. Tripathi (2006): “Some Identification Issues in Nonparametric Linear Models with Endogenous Regressors,” Econometric Theory, 22(2), 258-278.

Severini, T. A., and G. Tripathi (2012): “Efficiency Bounds for Estimating Linear Functionals of Nonparametric Regression Models with Endogenous Regressors,” Journal of Econometrics, 170(2), 491-498.

Shen, X. (1997): “On Methods of Sieves and Penalization,” The Annals of Statistics, 25, 2555-2591.

Swamy, P.A.V.B. (1970): “Efficient Inference in a Random Coefficient Model,” Econometrica, 38, 311-23.

van der Vaart, A. W. (1991): “On Differentiable Functionals,” The Annals of Statistics, 19, 178-204.

van der Vaart, A. W. (1998). Asymptotic Statistics, vol. 3 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge.

Wooldridge, J. (1997): “On Two Stage Least Squares Estimation of the Average Treatment Effect in a Random Coefficient Model,” Economics Letters, 56, 129-133.

Wooldridge, J. (2003): “Further Results on Instrumental Variables Estimation of Average Treatment Effects in the Correlated Random Coefficient Model,” Economics Letters, 79, 185-191.

Wooldridge, J. (2008): “Instrumental Variables Estimation of the Average Treatment Effect in the Correlated Random Coefficient Model,” in Modelling and Evaluating Treatment Effects in Econometrics (