Irregular Identification of Structural Models with Nonparametric Unobserved Heterogeneity
Juan Carlos Escanciano ∗ Universidad Carlos III de Madrid
May 18th, 2020
Abstract
One of the most important empirical findings in microeconometrics is the pervasiveness of heterogeneity in economic behaviour (cf. Heckman 2001). This paper shows that cumulative distribution functions and quantiles of the nonparametric unobserved heterogeneity have an infinite efficiency bound in many structural economic models of interest. The paper presents a relatively simple check of this fact. The usefulness of the theory is demonstrated with several relevant examples in economics, including, among others, the proportion of individuals with severe long-term unemployment duration, the average marginal effect and the proportion of individuals with a positive marginal effect in a correlated random coefficient model with heterogeneous first-stage effects, and the distribution and quantiles of random coefficients in linear, binary and Mixed Logit models. Monte Carlo simulations illustrate the finite sample implications of our findings for the distribution and quantiles of the random coefficients in the Mixed Logit model.
Keywords:
Irregular Identification; Semiparametric Models; Nonparametric Unobserved Heterogeneity.
JEL classification:
C14; C31; C33; C35

∗ Department of Economics, Universidad Carlos III de Madrid, Calle Madrid 126, Getafe 28907, Madrid, Spain. Web Page: https://sites.google.com/view/juancarlosescanciano. Research funded by the Spanish Grant PGC2018-096732-B-I00.

Introduction
A tenet of empirical microeconometrics research is the pervasiveness of heterogeneity in the behaviour of otherwise observationally equivalent individuals (cf. Heckman 2001). This paper shows that, for a large class of structural economic models, regular identification of functionals of nonparametric unobserved heterogeneity (UH), that is, identification of these functionals with a finite efficiency bound, implies certain necessary smoothness conditions on the functional, leading to a practically simple check for regularity (or lack thereof). In particular, this paper uses these implications to show that cumulative distribution functions (CDFs) and quantiles of UH often have infinite efficiency bounds in many empirically relevant economic models with nonparametric UH. These results have important practical implications, as these parameters are relevant for policy analysis, and they explain why inferences on such parameters are expected to be unstable in empirical work. In particular, if a parameter is irregularly identified, then no regular estimator with a parametric rate of convergence exists (see Chamberlain 1986).

These observations are applicable to a wide class of models with nonparametric UH. We first consider continuous mixtures, which have been commonly employed as a modeling device to account for UH in a variety of economic settings ranging from labour to industrial organization; see Compiani and Kitamura (2016) for a recent review. The canonical example is a tightly specified structural parametric model that is made flexible by allowing all (or a subset) of its parameters to be individual specific, thereby accounting for UH. We show that if the mapping from the individual specific parameters to the conditional likelihood is smooth, then many functionals of UH are not regularly identified.
Heuristically, smoothness of the conditional likelihood translates into a multicollinearity problem, as we further explain below. There are important economic applications that fall under this setting; see, e.g., Heckman and Singer (1984a, 1984b) for the study of unemployment duration. We demonstrate the usefulness of these results in the context of duration data by establishing an infinite efficiency bound for the distribution and quantiles of UH in the structural model of unemployment duration with two spells and nonparametric UH recently proposed by Alvarez, Borovičková and Shimer (2016).

The results are then extended to several classes of Random Coefficients (RC) models. These models have a long history in economics; see, e.g., Masten (2017) for a review of the literature. Applying our results to these models is technically more involved because these models have discontinuous conditional likelihoods given UH. We first consider RC models where UH is independent of regressors and establish an infinite efficiency bound for the distribution and quantiles of UH in binary and linear RC models. Establishing zero information in the linear RC model is particularly challenging because the discontinuity in the conditional likelihood leads to potential discontinuities in the scores of the model. We then extend these results to a triangular RC model with a continuous endogenous variable, where we show irregular identification of the average marginal effect (AME) and the proportion of individuals with a positive marginal effect. The irregularity of the AME is driven by a positive mass of individuals with small first-stage effects. The irregular identification of the CDF and quantiles of the distribution of random or correlated effects holds more generally.

The models treated up to this point are indexed by the distribution of UH, and only by that distribution.
However, a simple and powerful observation of this paper is that our analysis extends readily to more complex semiparametric models indexed by UH and additional (possibly infinite-dimensional) parameters. We illustrate this point with several examples, including semiparametric mixture models where some parameters are fixed and others are random. A leading example is the popular RC Logit or Mixed Logit model, which is one of the most commonly used models in applied choice analysis. This model was introduced by Boyd and Mellman (1980) and Cardell and Dunbar (1980), and it is widely used in environmental economics, industrial economics, marketing, public economics, transportation economics and other fields. Applying our results to this model, we obtain an infinite efficiency bound for CDFs and quantiles of the RC. The Mixed Logit example nicely illustrates the most appealing feature of our method of proof, which is its simplicity: two lines of proof and a simple application of dominated convergence suffice. This should be contrasted with direct efficiency bound calculations, which are particularly challenging for this model (or, for that matter, for any of the models we consider). These results have practical implications for proposed estimators of the Mixed Logit model. We report Monte Carlo simulations supporting our theoretical findings for “fixed grid” estimators of the distribution and quantiles in the Mixed Logit model (cf. Bajari, Fox and Ryan 2007 and Fox, Kim and Yang 2016). Further illustrations demonstrating the utility of our results in semiparametric settings are gathered in an Appendix and include examples on mixed proportional duration models and measurement error models with two measurements identified by means of Kotlarski’s lemma.

The parameters (functionals) we consider are of interest in their own right.
For example, labour economists are interested in the proportion of individuals at risk of severe long-term unemployment, and, more generally, social scientists are interested in evaluating the effects of treatments and policy interventions (e.g., average marginal effects and average signs). The functionals that we entertain, such as CDFs and quantiles of UH, are also used as inputs in subsequent counterfactual exercises. Our results limit the kind of inferences that are attainable with these parameters in models where UH is nonparametric.

What can be done to obtain regular identification of CDFs and quantiles of UH in these models? We show in several examples that functional form assumptions that restrict the conditional likelihood of observables given heterogeneity do not generally help to achieve regularity of quantiles and CDFs if UH is still nonparametric. Thus, our results show that restricting UH is, to some extent, necessary to attain finite efficiency bounds for the distribution and quantiles of UH in many of the aforementioned models. Commonly used strategies in practice, such as the use of parametric distributions for UH or of discrete heterogeneity, indeed restore the regular identification of functionals of UH but can be deemed too strong. We find necessary conditions for regular identification under semiparametric restrictions on UH, although we recognize that giving general primitive assumptions for these conditions seems difficult.
Our recommendation for inference on CDFs and quantiles of UH is to use flexible semiparametric specifications such as sieve methods (see, e.g., Shen (1997), Chen (2007), Bajari, Fox and Ryan (2007), Hu and Schennach (2008), Bester and Hansen (2007), Chen and Liao (2014), Fox, Kim and Yang (2016) and references therein), coupled with regularization (penalization) to reduce the high variance of estimates of functionals of UH when the conditional likelihood is a very smooth function of UH, as illustrated in this paper with the Mixed Logit model.

The rest of the paper is organized as follows. After a literature review, Section 3 sets notation and considers the class of continuous mixtures, where the method is most transparent. This section illustrates the theoretical results in the structural model of Alvarez, Borovičková and Shimer (2016). Section 4 extends the analysis to several classes of RC models. Section 5 further extends the analysis to semiparametric models, illustrating the theory with the Mixed Logit model. Section 6 discusses different strategies, some of them considered in the literature, to regularize the estimation of CDFs and quantiles of UH. Section 7 reports the results of Monte Carlo simulations for the CDF and quantiles of the distribution of UH in the Mixed Logit model. Section 8 concludes. An Appendix contains proofs of the main results, further results on nonlinear RC models, examples and simulations.
Our paper relates to a number of studies providing sufficient conditions for nonparametric identification of the distribution of UH in the aforementioned models. See, among many others, Elbers and Ridder (1982), Heckman and Singer (1984a, 1984b) and Alvarez, Borovičková and Shimer (2016) for structural models of unemployment duration; Beran and Hall (1992), Beran, Feuerverger and Hall (1996), and Hoderlein, Klemelä and Mammen (2010) for linear RC; Ichimura and Thompson (1998), Gautier and Kitamura (2013) and Hoderlein and Sherman (2015) for binary RC; Briesch, Chintagunta and Matzkin (2010) and Fox, Kim, Ryan and Bajari (2012) for RC multinomial choice models; Hoderlein, Holzmann and Meister (2017) for triangular RC models; Masten (2017) for simultaneous RC models; and Lewbel and Pendakur (2017) for nonlinear RC models. For a review of nonparametric identification results, see Matzkin (2007, 2013) and Lewbel (2019). What differentiates our paper from these and other related studies is our focus on establishing whether identification is regular or not.

Establishing an infinite efficiency bound for functionals of UH in these models is a priori a rather challenging task. The main reason is that characterizing the so-called tangent space of the model and projections onto it is generally quite complicated in the models we study here, which may explain the relative lack of theoretical work on semiparametric efficiency bounds in RC and related models. See Newey (1990) for a review of semiparametric efficiency bounds and some of the related concepts. Our method of proof avoids the complications of directly computing the tangent space, projections and the Fisher information, which is the standard approach in the literature for obtaining efficiency bounds (see, e.g., Chamberlain 1986, Khan and Tamer 2010). Our indirect method of proof is much simpler.
The basic tool is a dominated convergence theorem, with regularity conditions that are easy to check in many models (although not in all models). The main building block is a fundamental result by van der Vaart (1991), who found a necessary condition for regular estimation of a parameter. The main observation of our paper consists in systematically exploiting the implications that van der Vaart’s (1991) necessary condition has on the smoothness of certain influence functions. van der Vaart (1991), Groeneboom and Wellner (1992) and Bickel, Klaassen, Ritov and Wellner (1998) have also used the necessary condition of van der Vaart (1991) to show that CDFs are irregularly identified in some specific univariate exponential and uniform mixture models. Relative to this work, our contribution is to derive sufficient conditions for a general method of proof, thereby extending the scope of applications to models of economic interest. In particular, we allow for multidimensional UH, semiparametric models and non-smooth conditional likelihoods such as those that arise with RC models.

Although not the focus of this paper, a large class of models to which our results are applicable are panel data models with fixed effects. Within this setting, Chamberlain (1992) established regular identification of the AME in a linear RC panel data model, while Arellano and Bonhomme (2012) showed identification of the full distribution of UH in a model with limited serial dependence in errors. Graham and Powell (2012) pointed out the irregular identification of the AME when regressors exhibit little variation across periods, while Bonhomme (2011) derived conditions for regular and irregular identification of moments of UH in nonlinear panel data.
Our research is highly complementary to these papers, as we consider different models and our approach for proving irregular identification is different and exploits the smoothness implications of regular identification.

We illustrate the theoretical results with Monte Carlo simulations implementing the “fixed grid” nonparametric CDF estimator of Bajari, Fox and Ryan (2007) and Fox, Kim, Ryan and Bajari (2011), further investigated in Fox, Kim and Yang (2016). We contribute to the literature on the Mixed Logit model by proving the infinite efficiency bound for the CDF and quantiles of the nonparametric distribution of RC. We report further finite sample evidence on the performance of their computationally attractive “fixed grid” estimator for CDFs and quantiles, as well as some regularized variants, complementing recent work in econometrics by Horowitz and Nesheim (2019) and Heiss, Hetzenecker and Osterhaus (2019).
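The “fixed grid” idea mentioned above can be sketched roughly as follows. This is our own simplified illustration, not the estimator as implemented by Bajari, Fox and Ryan (2007) or Fox, Kim and Yang (2016): we assume a binary logit with a single scalar random coefficient, a hypothetical uniform grid of candidate coefficient values, and a hypothetical data generating process. Outcomes are regressed on the grid-point choice probabilities with nonnegative weights, and a heavily weighted extra row stands in for the sum-to-one constraint. The sketch also prints the condition number of the grid design matrix, which makes the multicollinearity of smooth logit kernels visible.

```python
import numpy as np
from scipy.optimize import lsq_linear

rng = np.random.default_rng(0)

def logistic(v):
    return 1.0 / (1.0 + np.exp(-v))

# Hypothetical DGP: binary logit with one random coefficient beta_i ~ N(1, 0.5^2).
n = 5000
x = rng.uniform(-3.0, 3.0, size=n)
beta = rng.normal(1.0, 0.5, size=n)
y = (rng.uniform(size=n) < logistic(x * beta)).astype(float)

# Fixed grid of candidate coefficient values (a hypothetical choice).
grid = np.linspace(-1.0, 3.0, 21)
# Column r: logit choice probability if every individual had coefficient grid[r].
design = logistic(np.outer(x, grid))

# Least squares over nonnegative weights; one heavily weighted extra row
# pushes the weights to sum to one (a penalty, not an exact constraint).
penalty = 1e3
A = np.vstack([design, penalty * np.ones(grid.size)])
b = np.append(y, penalty)
weights = lsq_linear(A, b, bounds=(0.0, np.inf), method="bvls").x

def cdf_hat(t):
    """Estimated CDF of the random coefficient at t: total weight on grid points <= t."""
    return weights[grid <= t].sum()

print("condition number of the design:", np.linalg.cond(design))
print("estimated F(1.0):", cdf_hat(1.0))
```

The very large condition number reflects that neighbouring grid columns are nearly collinear, which is one way to see why unregularized fixed-grid estimates of CDFs and quantiles can be highly variable.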
Basic Setting and Results
Let $\{(Z_i,\alpha_i)\}_{i=1}^{n}$ denote an independent and identically distributed (iid) sample with the same distribution as $(Z,\alpha)$. The observed data are $Z_1,\ldots,Z_n$, while $\alpha_i$ denotes the $i$-th individual's UH. Assume each observation $Z_i$ has a probability law $P$ and a density with respect to (wrt) a $\sigma$-finite measure $\mu$ given by

$$f_\eta(z)=\int_{\mathcal{A}} f_{z/\alpha}(z)\,d\eta(\alpha), \qquad (1)$$

where $f_{z/\alpha}(z)$ denotes the known conditional density of $Z$ given $\alpha$, and $\eta$ is the unknown distribution of $\alpha$ with support on $\mathcal{A}\subseteq\mathbb{R}^{d_\alpha}$ (the results can potentially be extended to abstract heterogeneity spaces, but for simplicity of exposition we focus on the Euclidean case). The assumption of a known conditional density $f_{z/\alpha}(z)$ is relaxed in Section 5.

Suppose we are interested in estimating a moment of UH,

$$\phi(\eta)=E_\eta[r(\alpha)],$$

for a measurable function $r(\cdot)\in L_2(\eta)$, where, henceforth, $E_\eta$ denotes the expectation under the distribution $\eta$ and $L_p(\nu)$ denotes the space of (equivalence classes of) real-valued measurable functions $h$ such that $\int |h|^p\,d\nu<\infty$, for a generic measure $\nu$. Henceforth, for simplicity of notation, we drop the sets of integration in integrals and the qualification $\nu$-almost surely. So, for example, a function in $L_2(\nu)$ is discontinuous when there is no continuous function in its equivalence class. Also, we drop the reference to the measure $\nu$ in $L_2(\nu)$ when $\nu=P$, and write simply $L_2$. We will be concerned with regular identification of $\phi(\eta)$, i.e. identification of $\phi(\eta)$ with a finite efficiency bound, when UH is nonparametric as formally defined below.

The basic message of this paper rests on two observations. First, from a general result in van der Vaart (1991), we prove that a necessary condition for regular identification of $\phi(\eta)$ when UH is nonparametric is the existence of a measurable function $s(Z)$ with zero mean and finite variance such that

$$r(\alpha)-\phi(\eta)=\int s(z)\,f_{z/\alpha}(z)\,d\mu(z). \qquad (2)$$

Second, if the mapping $\alpha\mapsto f_{z/\alpha}$ is continuous (smooth), then under mild regularity conditions, (2) implies that $r(\cdot)$ must also be continuous (smooth). The bulk of this paper is a formalization of the second observation and its application to some economic models of interest.

The precise sense of UH being nonparametric is the usual one, formalized as follows. Let $\mathcal{H}$ denote a class of distributions on $\mathcal{A}$, and assume $\eta\in\mathcal{H}$. Let $\eta_t\in\mathcal{H}$ be a parametric submodel indexed by $t\in[0,\varepsilon)$, for some $\varepsilon>0$, such that for some $b\in L_2(\eta)$ the classical mean square differentiability condition holds,

$$\int\left[\frac{d\eta_t^{1/2}-d\eta^{1/2}}{t}-\frac{1}{2}\,b\,d\eta^{1/2}\right]^2\to 0\quad\text{as } t\downarrow 0. \qquad (3)$$

Then, a formal definition of nonparametric UH is given as follows. Denote by $T(\eta)$ the linear span of the $b$'s in (3) and let $L_2^0(\nu)$ denote the subspace of functions in $L_2(\nu)$ with zero $\nu$-mean.

Definition 3.1
UH is nonparametric if $T(\eta)$ is dense in $L_2^0(\eta)$.

Henceforth, unless otherwise stated, we assume that UH is nonparametric. The first result in this section, which follows from an application of van der Vaart (1991), shows that, in the presence of nonparametric UH in model (1), regular identification of $E_\eta[r(\alpha)]$ necessarily requires that (2) holds.

Lemma 3.1
If UH is nonparametric, then (2) is necessary for regular identification of $\phi(\eta)$.

We note that Severini and Tripathi (2006, 2012) and Bonhomme (2011) have found related results in the context of nonparametric instrumental variables and nonlinear panel data models, respectively. Also, Escanciano (2020) has shown that (2) is also sufficient for semiparametric identification of $\phi(\eta)$ in model (1). Note that we are not assuming here that $\eta$ or $s$ in (2) are identified. This generality is important because these functions may not be identified in many structural economic models under weak assumptions, which does not prevent us from identifying and estimating certain functionals of them (cf. Hurwicz 1950).

We now proceed with the main insight of this paper, which is that if the mapping $\alpha\mapsto f_{z/\alpha}$ is continuous (smooth), then, under regularity conditions, $r(\cdot)$ must also be continuous (smooth). This simple observation follows by dominated convergence, and it implies non-regularity of CDFs, signs, quantiles, and other functionals of UH in “smooth models” satisfying the following assumption. Let $N$ denote an open subset of $\mathcal{A}\subset\mathbb{R}^{d_\alpha}$.

Assumption 1 (i) $\alpha\mapsto f_{z/\alpha}(z)$ is continuous on $N$, $\mu$-a.e.; (ii) for all $\alpha\in N$ there exists a neighborhood of $\alpha$, say $\Gamma\subset N$, such that for all $s$ satisfying (2),

$$\int |s(z)|\,\sup_{\alpha\in\Gamma} f_{z/\alpha}(z)\,d\mu(z)<\infty. \qquad (4)$$

Assumption 1(i) is easy to check. Assumption 1(ii) is a dominance condition. The main complication in checking Assumption 1(ii) is that $s$ belongs to $L_2(P)$ but not necessarily to $L_1(\mu)$ or $L_2(\mu)$. We verify these conditions in a number of examples below.

Lemma 3.2
Let the conditional density $f_{z/\alpha}(z)$ satisfy Assumption 1. Then, $r(\alpha)$ in (2) is continuous in $\alpha$ on $N$.

Of course, if $\eta$ is identified, so is $\phi(\eta)$ (since $r$ is known). Identification of $\phi(\eta)$ also follows from (2), because we can find an identified function $\tilde s(Z)$, depending only on $f_{z/\alpha}$ and $r$, such that $r(\alpha)=E[\tilde s(Z)\mid\alpha]$ holds, and thus, by iterated expectations,

$$\phi(\eta)=E_\eta[r(\alpha)]=E_\eta\big[E[\tilde s(Z)\mid\alpha]\big]=E[\tilde s(Z)].$$

Corollary 3.1
Let Assumption 1 hold. The CDF $\phi(\eta)=E_\eta[1(\alpha\le\alpha_r)]$, for $\alpha_r\in N$, is not regularly identified.

Quantiles of UH are nonlinear functionals, and are not covered by the previous results. To extend the theory to a more general setting including nonlinear functionals, we need to introduce some notation. A functional $\phi(\eta):\mathcal{H}\to\mathbb{R}$ is said to be differentiable if there exists an $r_\phi\in L_2(\eta)$ such that for all paths satisfying (3),

$$\lim_{t\to 0}\frac{\phi(\eta_t)-\phi(\eta)}{t}=E_\eta[r_\phi(\alpha)\,b(\alpha)].$$

Under nonparametric UH such $r_\phi$ is unique, as in Newey (1994). This function $r_\phi$ plays the role of the preceding moment function $r$.

To illustrate with an example, consider the scalar UH case and assume $\eta$ is absolutely continuous with a strictly positive Lebesgue density in a neighborhood of $\phi(\eta)$, where $\phi(\eta)$ is such that

$$\int_{-\infty}^{\phi(\eta)} d\eta(\alpha)=\tau,\qquad \tau\in(0,1). \qquad (5)$$

That is, $\phi(\eta)$ is the $\tau$-quantile of $\eta$. It is well known (see, e.g., Lemma 21.3 in van der Vaart 1998) that the quantile functional is differentiable under the conditions above with influence function

$$r_\phi(\alpha)=-\frac{1(\alpha\le\phi(\eta))-\tau}{\dot\eta(\phi(\eta))},$$

where $\dot\eta$ is the density pertaining to $\eta$. By our results, the discontinuity of the influence function $r_\phi(\cdot)$ at $\phi(\eta)$ implies irregular identification. The next result formalizes this finding.

Corollary 3.2
Let Assumption 1 hold. Assume $\eta$ is absolutely continuous with a strictly positive Lebesgue density in a neighborhood of $\phi(\eta)$ satisfying (5). If $\phi(\eta)\in N$, then the $\tau$-quantile of the nonparametric UH distribution is not regularly identified.

Remark 3.1
Henceforth, whenever we discuss identification of quantiles, we implicitly assumethat the components of UH have densities that satisfy the conditions in Corollary 3.2. Thisexample illustrates how our results are applicable to nonlinear differentiable functionals.
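The mechanism behind Lemma 3.2 and Corollary 3.1 can be seen numerically in a toy setting. The sketch below assumes a Gaussian location kernel, $f_{z/\alpha}(z)$ equal to the $N(\alpha,1)$ density (our choice for illustration, not a model studied in the paper), and evaluates the right-hand side of (2) for the discontinuous candidate $s(z)=1(z\le 0)$. The result is, up to quadrature error, $\Phi(-\alpha)$, which varies smoothly in $\alpha$, whereas the CDF moment $r(\alpha)=1(\alpha\le 0)$ jumps by one at $\alpha=0$.

```python
import numpy as np
from scipy.stats import norm

# Quadrature grid on a truncated real line (the truncation is our assumption).
z = np.linspace(-10.0, 10.0, 20001)
dz = z[1] - z[0]

def rhs_of_2(s, alpha):
    """Riemann-sum approximation of the integral of s(z) f_{z/alpha}(z) dz,
    with f_{z/alpha} taken to be the N(alpha, 1) density (illustrative kernel)."""
    return np.sum(s(z) * norm.pdf(z - alpha)) * dz

def step_score(zz):
    """A discontinuous candidate s(z) = 1(z <= 0)."""
    return (zz <= 0.0).astype(float)

# Evaluate the right-hand side of (2) on a fine grid of alpha values around 0.
alphas = np.linspace(-0.05, 0.05, 101)
image = np.array([rhs_of_2(step_score, a) for a in alphas])

# The image changes by less than 1e-3 between neighbouring alphas, whereas
# r(alpha) = 1(alpha <= 0) jumps by 1 at alpha = 0.
largest_jump = np.max(np.abs(np.diff(image)))
print(f"largest change in the image: {largest_jump:.2e}")
```

For bounded $s$, dominated convergence alone delivers continuity of the image; for unbounded $s$ in $L_2$, this is where the dominance condition (4) enters.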
We now discuss the complications of the more standard approach of computing the Fisher information or the efficiency bound. Define the so-called tangent space of scores $\mathcal{S}:=\{s\in L_2: s(z)=E[b(\alpha)\mid Z=z]\text{ for some } b\in T(\eta)\}$. Then, a standard result in linear inverse problems is that all solutions $s$ of equation (2) have the same orthogonal projection onto the closure of $\mathcal{S}$ (see Engl, Hanke and Neubauer, 1996). Denote by $s^*$ this orthogonal projection, the so-called efficient score. The efficiency bound is given by the variance of $s^*(Z)$ (see, e.g., Newey 1990, van der Vaart 1998, Bickel et al. 1998, and Escanciano 2020). Thus, an alternative to our approach is to compute $s^*(Z)$ and check that it has infinite variance. However, computing $s^*(Z)$ can be cumbersome, particularly because characterizing the mean squared closure of $\mathcal{S}$ can be a rather difficult task in the models we analyze here. In fact, to the best of our knowledge, the analytical expression for $s^*$ remains unknown for the functionals and models we study. In passing, we note that these arguments show that it suffices to check the dominance condition (4) for $s$ in the closure of $\mathcal{S}$. This additional information will turn out to be quite useful in some of our applications, such as the linear RC model.

We illustrate the applicability of the previous results in the context of a structural model of unemployment with nonparametric UH. Nonparametric heterogeneity has played a critical role in rationalizing unemployment duration ever since the seminal contributions by Elbers and Ridder (1982) and Heckman and Singer (1984a, 1984b). Recent work by Alvarez et al. (2016) is motivated from this perspective. These authors have shown nonparametric identification of the distribution of UH in their nonparametric structural model for unemployment with two spells.
Specifically, Alvarez, Borovičková and Shimer (2016) propose a structural model for transitions in and out of employment that implies a duration of unemployment given by the first passage time of a Brownian motion with drift, a random variable with an inverse Gaussian distribution. The parameters of the inverse Gaussian distribution are allowed to vary in arbitrary ways to account for UH across workers. These authors investigate nonparametric identification of the distribution of UH, $\eta$, when two unemployment spells $Z_i=(t_{1i},t_{2i})$ are observed on the set $T\times T$, $T\subseteq[0,\infty)$. The reduced form parameters $\alpha=(\alpha_1,\alpha_2)'\in\mathbb{R}\times[0,\infty)$ are functions of structural parameters. The distribution of $Z_i$ is absolutely continuous with Lebesgue density $f_\eta(t_1,t_2)$ given, up to a normalizing constant, by

$$f_\eta(t_1,t_2)=\int_{\mathbb{R}\times[0,\infty)}\alpha_2^2\,t_1^{-3/2}\,t_2^{-3/2}\,e^{-\frac{(\alpha_1 t_1-\alpha_2)^2}{2t_1}-\frac{(\alpha_1 t_2-\alpha_2)^2}{2t_2}}\,d\eta(\alpha_1,\alpha_2). \qquad (6)$$

Alvarez, Borovičková and Shimer (2016) show that $\eta$ is nonparametrically identified up to the sign of $\alpha_1$, but they do not investigate whether specific functionals of this distribution are regularly or irregularly identified, which is the focus of study here. Specifically, we show that the CDF of $\eta$ at a point, and other functionals of $\eta$ with discontinuous influence functions, such as quantiles, have infinite efficiency bounds. These functionals are important parameters. For example,

$$\phi(\eta)=E_\eta[1(\alpha_1\le\bar\alpha_1)\,1(\alpha_2\ge\bar\alpha_2)],$$

for fixed $\bar\alpha_1<0<\bar\alpha_2$ with large absolute values of $\bar\alpha_1$ and $\bar\alpha_2$, quantifies the proportion of individuals at risk of severe long-term unemployment: an individual with parameters $\alpha_1\le\bar\alpha_1$ and $\alpha_2\ge\bar\alpha_2$ has a probability larger than or equal to $1-\exp(2\bar\alpha_1\bar\alpha_2)$ of remaining unemployed forever. We apply our previous results to this example for a generic moment $\phi(\eta)=E_\eta[r(\alpha_1,\alpha_2)]$, under the following mild condition.

Assumption 2 (i) Let the set $T\subseteq[0,\infty)$ be a convex set with a non-empty interior; (ii) the moment function $r$ is locally bounded.

Proposition 3.1
Under Assumption 2, if $\phi(\eta)=E_\eta[r(\alpha_1,\alpha_2)]$ is regularly identified, then

$$r(\cdot)\in\left\{b(\alpha_1,\alpha_2)\in L_2(\eta): b(\alpha_1,\alpha_2)=C_1+C_2\,\alpha_2^2\,e^{2\alpha_1\alpha_2}\,h(\alpha_1^2,\alpha_2^2)\right\},$$

for constants $C_1$ and $C_2$ and a continuous function $h(u,v)$ defined on $(0,\infty)^2$ that, if $T$ is bounded, is infinitely differentiable in $u\in(0,\infty)$, for all $v\in(0,\infty)$.

For the purpose of proving an infinite efficiency bound for CDFs and quantiles, only the continuity part of Proposition 3.1 is needed. Thus, an implication of Proposition 3.1 is that the CDF of UH at the fixed point $(\bar\alpha_1,\bar\alpha_2)$, i.e. $\phi(\eta)=E[1(\alpha_1\le\bar\alpha_1)1(\alpha_2\le\bar\alpha_2)]$, is not regularly identified, because $r_\phi(\alpha_1,\alpha_2)=1(\alpha_1\le\bar\alpha_1)1(\alpha_2\le\bar\alpha_2)$ is not continuous when $(\bar\alpha_1,\bar\alpha_2)$ is in the interior of the support of $\eta$.

Corollary 3.3
Under Assumption 2(i), the CDFs and quantiles of UH in the model (6) are notregularly identified.
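The structure of the kernel in (6) can be checked numerically. Since (6) is stated only up to a normalizing constant, the sketch below restores a factor $1/\sqrt{2\pi}$ per spell (our assumption about the constant) and integrates one spell's density over $t\in(0,\infty)$. For $\alpha_1<0$ the total mass should equal $\exp(2\alpha_1\alpha_2)$, so the probability of never leaving unemployment is $1-\exp(2\alpha_1\alpha_2)$, consistent with the bound used for the long-term unemployment functional; for $\alpha_1\ge 0$ the mass should be one. The function names are hypothetical.

```python
import numpy as np
from scipy.integrate import quad

def spell_density(t, a1, a2):
    """First-passage-time density of a Brownian motion with drift a1 to a barrier a2:
    a2 / sqrt(2 pi t^3) * exp(-(a1 t - a2)^2 / (2 t)).
    One spell's factor in (6), with the 1/sqrt(2 pi) constant restored (assumption)."""
    return a2 / np.sqrt(2.0 * np.pi * t**3) * np.exp(-(a1 * t - a2) ** 2 / (2.0 * t))

def exit_probability(a1, a2):
    """Probability that the spell ends in finite time, by numerical integration."""
    value, _abserr = quad(spell_density, 0.0, np.inf, args=(a1, a2))
    return value

def never_exit_closed_form(a1, a2):
    """Closed form for the probability of never leaving unemployment."""
    return 1.0 - np.exp(2.0 * a1 * a2) if a1 < 0 else 0.0

for a1, a2 in [(-0.5, 1.0), (-1.0, 2.0), (0.5, 1.0)]:
    print(a1, a2, 1.0 - exit_probability(a1, a2), never_exit_closed_form(a1, a2))
```

The integrand is an infinitely differentiable function of $(\alpha_1,\alpha_2)$ for each $t>0$, which is the smoothness that Proposition 3.1 exploits.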
Random coefficient models have long been used in economics to model nonparametric UH. There is by now an extensive literature on nonparametric identification of UH in these models; see, e.g., Masten (2017) and references therein. In this paper we focus on establishing irregular identification of CDFs and quantiles of the distributions of RC. To the best of our knowledge, this is the first paper to do so in this generality.

A general class of random coefficient models, including nonlinear models, is given by

$$Y_i=m(X_i,\alpha_i), \qquad (7)$$

where $Z_i=(Y_i,X_i)$ is observed, but $\alpha_i$ is unobserved and independent of $X_i$ with support $\mathcal{A}$. Assume $m:\mathcal{X}\times\mathcal{A}\to\mathbb{R}^{r}$ is a measurable map, where $\mathcal{X}$ is the support of $X$. The functional form of $m$ is known, and the nonparametric part is given by the distribution of $\alpha_i$. The assumptions of known $m$ and independence of $\alpha_i$ and $X_i$ are relaxed below. The density of the data is

$$f_\eta(y,x)=\int_{\mathcal{A}}1(y=m(x,\alpha))\,d\eta(\alpha),$$

where $1(A)$ denotes the indicator function of the event $A$. In this setting, the dominating measure $\mu$ is defined on $\mathcal{Z}=\mathcal{Y}\times\mathcal{X}$ as $\mu(B_1\times B_2)=\nu_Y(B_1)\,\nu_X(B_2)$, where $B_1$ and $B_2$ are Borel sets of $\mathcal{Y}$ and $\mathcal{X}$, respectively, $\nu_Y$ is either the counting measure for discrete outcomes or the Lebesgue measure $\lambda(\cdot)$ for continuous outcomes, and $\nu_X(\cdot)$ is the probability measure of $X$. The main challenge we face with RC models is that $f_{z/\alpha}(z)=1(y=m(x,\alpha))$ is not continuous, and thus the previous results need to be generalized. The generalization is non-trivial, particularly for continuous outcomes, and in some cases it requires delicate technical work. We first consider the binary choice RC model. Section 10.1 in the Appendix contains some generic results for nonlinear RC, as well as a discussion of some RC models for which our conclusions do not hold.

The binary choice random coefficient model is given by

$$Y_i=1(X_i'\alpha_i\ge 0),$$

where we observe $Z_i=(Y_i,X_i)$ but $\alpha_i$ is unobservable.
The random vector $\alpha_i$ is independent of $X_i$, normalized to $|\alpha_i|=1$, and satisfies $P(\alpha_i=0)=0$. As in the existing literature, we assume $\eta$ is absolutely continuous wrt the uniform spherical measure $\sigma(\cdot)$ on $\mathbb{S}^{d_\alpha-1}$, where $\mathbb{S}^{d_\alpha-1}=\{b\in\mathbb{R}^{d_\alpha}:|b|=1\}$ denotes the unit sphere in $\mathbb{R}^{d_\alpha}$. The density of the data for a positive outcome (i.e. the choice probability function) is given by

$$f_\eta(x)=\int_{\mathbb{S}^{d_\alpha-1}}1(x's\ge 0)\,d\eta(s). \qquad (8)$$

Ichimura and Thompson (1998) and Gautier and Kitamura (2013) found sufficient conditions for nonparametric identification of $\eta$, but they did not investigate whether identification was regular or irregular, which is the focus here.

By (8) and Lemma 3.1, a necessary condition for regular identification of $\phi(\eta)=E_\eta[r(\alpha)]$ under nonparametric UH is

$$r(\alpha)-\phi(\eta)=\int 1(x'\alpha\ge 0)\,s(1,x)\,d\nu_X(x), \qquad (9)$$

for some $s\in L_2$. The following result provides necessary conditions for regular identification. Write $\alpha=(\alpha_1,\alpha_2')'$.

Proposition 4.1
If the distribution of $X/|X|$ is absolutely continuous, then $r(\cdot)$ in (9) must be uniformly continuous on $\mathbb{S}^{d_\alpha-1}$. If $X=(1,\tilde X)$ and $\alpha_2'\tilde X$ is absolutely continuous, then $r(\alpha_1,\alpha_2)$ is an absolutely continuous function of $\alpha_1$.

An implication of this proposition is that functionals such as the CDF and quantiles of random coefficients are not regularly identified in the binary RC model. To the best of our knowledge, this result is new in the literature.

Corollary 4.1
Under the conditions of Proposition 4.1, the CDFs and quantiles of UH in thebinary RC model are not regularly identified.
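A numerical sketch of the continuity behind Proposition 4.1, under assumptions of our own: $d_\alpha=2$, $X/|X|$ uniform on the unit circle (an absolutely continuous direction), and the hypothetical discontinuous choice $s(1,x)=1(x_1>0)$. Writing $\alpha=(\cos\theta,\sin\theta)$, the right-hand side of (9) is the overlap of two half-circles, $(\pi-|\theta|)/(2\pi)$ for $|\theta|\le\pi$, which is continuous in $\theta$. A step in $\theta$, such as a CDF indicator, therefore cannot be matched by this $s$.

```python
import numpy as np

# X/|X| uniform on the unit circle (assumed), discretized by a midpoint grid of angles.
N = 100_000
u = (np.arange(N) + 0.5) * (2.0 * np.pi / N)
x = np.column_stack([np.cos(u), np.sin(u)])

def s_step(x):
    """Hypothetical discontinuous choice of s(1, x): indicator that x_1 > 0."""
    return (x[:, 0] > 0.0).astype(float)

def rhs_of_9(theta):
    """Integral of 1(x'alpha >= 0) s(1, x) dnu_X(x) at alpha = (cos theta, sin theta)."""
    alpha = np.array([np.cos(theta), np.sin(theta)])
    return np.mean((x @ alpha >= 0.0) * s_step(x))

thetas = np.linspace(-np.pi, np.pi, 361)
values = np.array([rhs_of_9(t) for t in thetas])
closed_form = (np.pi - np.abs(thetas)) / (2.0 * np.pi)
print("max deviation from closed form:", np.max(np.abs(values - closed_form)))
```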
The linear RC model has a long history in econometrics; see, e.g., Hildreth and Houck (1968) and Swamy (1970). This model is given by

$$Y_i=X_i'\alpha_i,$$

where we observe a $d_z$-dimensional vector $Z_i=(Y_i,X_i)$, but $\alpha_i$ is unobservable and independent of $X_i$. The dimension of $X_i$ and $\alpha_i$ is $d_\alpha$, so $d_z=d_\alpha+1$. As in Hoderlein, Klemelä and Mammen (2010), we normalize $X_i$ so that $|X_i|=1$. The density of the data is

$$f_\eta(z)=\int_{\mathbb{R}^{d_\alpha}}1(y=x'\alpha)\,d\eta(\alpha). \qquad (10)$$

Nonparametric identification and estimation of $\eta$ have been studied by Beran and Hall (1992), Beran, Feuerverger and Hall (1996), and Hoderlein, Klemelä and Mammen (2010), among others. These authors exploit the relation between (10) and the Radon transform. In this paper we study necessary conditions for regular identification of $\phi(\eta)=E_\eta[r(\alpha)]$, for a measurable function $r(\cdot)$ with $E_\eta[r^2(\alpha)]<\infty$, and regular identification of quantiles of the components of $\alpha$.

By Lemma 3.1, a necessary condition for regular identification of $\phi(\eta)=E_\eta[r(\alpha)]$ under nonparametric UH is

$$r(\alpha)-\phi(\eta)=\int s(x'\alpha,x)\,d\nu_X(x), \qquad (11)$$

for some $s\in L_2$. Under suitable conditions, scores in the tangent space $\mathcal{S}=\{s\in L_2: s(z)=E[b(\alpha)\mid Z=z]\text{ for some } b\in T(\eta)\}$ are continuous, but providing conditions under which elements of the closure of $\mathcal{S}$ are continuous is much harder. In fact, without additional restrictions, elements in the closure of $\mathcal{S}$ can potentially be very discontinuous. We provide regularity conditions below that guarantee that any element of the closure of $\mathcal{S}$ can be written as $s(z)=g(z)/f_\eta(z)$, where $g(z)$ has a square-integrable weak derivative with respect to the first argument $y$. As we show below, this last condition is instrumental for checking the sufficient conditions for the dominated convergence theorem in Lemma 3.2.

Let $\dot\eta_x$ denote the Lebesgue density of $x'\alpha$ when $\alpha$ has distribution $\eta$.
The set $\eta T(\eta)$ is defined as $\eta T(\eta):=\{\eta b: b\in T(\eta)\}$, while the definition of the Sobolev space $H^{\rho}(A)$ is provided after (24) in the Appendix.

Assumption 3. For $d_\alpha>1$ and $N$ as in Assumption 1: (i) the distribution $\eta$ is bounded and has bounded support, with a corresponding density $\eta_x$ that is continuous and satisfies $\inf_{\alpha\in N}\eta_x(x'\alpha)\ge1/l(x)$ for a positive measurable function $l(\cdot)$ such that $E_X[l(X)]<\infty$; (ii) $X$ is absolutely continuous with a bounded density $f_X(\cdot)$; (iii) $\eta T(\eta)\subseteq H^{\rho}(A)$, where $\rho+(d_\alpha-1)/2>1$; (iv) $r$ belongs to the closure of $T(\eta)$.

The bounded support in Assumption 3(i) is common in the literature; see, e.g., Hoderlein, Klemelä and Mammen (2010). If the infinite efficiency bound holds in a model where $\alpha$ has bounded support, it also holds in the more general model where the support is unrestricted. A sufficient condition for the continuity of $\eta_x$ is that the Fourier transform of the density of $\eta$ is integrable, which was also assumed in Hoderlein, Klemelä and Mammen (2010). Assumptions 3(i)-(ii) establish a link between the tails of $\eta$ and $f_X(\cdot)$. Assumption 3(iii) imposes a mild smoothness condition on the tangent space of UH. This assumption and Assumption 3(iv) allow, but do not require, nonparametric UH.
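Because the model is linear in $\alpha$, the normalization $|X_i|=1$ is without loss of generality: dividing both sides of $Y_i=X_i'\alpha_i$ by $|X_i|$ leaves $\alpha_i$, and hence $\eta$, unchanged. A minimal numerical check follows; the helper `normalize_linear_rc`, the design (an intercept plus one Gaussian regressor) and the Gaussian random coefficients are hypothetical choices made only for illustration.

```python
import numpy as np

def normalize_linear_rc(y, x):
    """Rescale (Y_i, X_i) so that |X_i| = 1 in the linear RC model Y_i = X_i' alpha_i.

    Dividing both sides by |X_i| > 0 keeps the same random coefficient alpha_i,
    so the distribution eta of alpha is not affected by the normalization.
    """
    norms = np.linalg.norm(x, axis=1)
    if np.any(norms == 0):
        raise ValueError("X_i must be nonzero to be normalized")
    return y / norms, x / norms[:, None]

rng = np.random.default_rng(0)
n, d_alpha = 1000, 2
x = np.column_stack([np.ones(n), rng.normal(size=n)])                  # hypothetical design: intercept + regressor
alpha = rng.normal(loc=[1.0, -0.5], scale=0.3, size=(n, d_alpha))      # hypothetical random coefficients
y = np.sum(x * alpha, axis=1)                                          # Y_i = X_i' alpha_i

y_n, x_n = normalize_linear_rc(y, x)
print(np.allclose(np.linalg.norm(x_n, axis=1), 1.0))   # regressors now lie on the unit sphere
print(np.allclose(y_n, np.sum(x_n * alpha, axis=1)))   # the same alpha_i generates the rescaled data
```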
Proposition 4.2
Under Assumption 3, if $r$ satisfies (11), then $r$ must be continuous on $N$.

Corollary 4.2
Under the conditions of Proposition 4.2, the CDFs and quantiles of UH are not regularly identified in the linear RC model.
The independence assumption between regressors and UH rules out important models and parameters in economics, such as the Average Marginal Effect (AME) $\phi(\eta)=E_\eta[\gamma_i]$ and the Proportion of individuals with a Positive Marginal Effect (PPAME) $\phi(\eta)=E_\eta[1(\gamma_i>0)]$, where $\gamma_i$ is the coefficient of an endogenous continuous variable in an RC triangular system. We extend our previous results to these cases and show that, under nonparametric UH, these important parameters are not regularly identified. These results appear to be new in the literature at this level of generality. For simplicity, we focus on a triangular model, but the same arguments apply to a wide class of random coefficient models, including simultaneous equation models, nonlinear models with endogeneity, and variations of these models that include covariates, multiple endogenous variables, and a mix of random and non-random coefficients.

Consider the triangular model

$$Y_1=\gamma Y_2+U_1,\qquad Y_2=\delta X+U_2, \qquad (12)$$

where $\gamma$, $U_1$, $\delta$ and $U_2$ are RC, and we observe $Z=(Y_1,Y_2,X)'$. The variable $Y_2$ is a continuous treatment variable, possibly endogenous in the sense that $U_1$ and $U_2$ are correlated, and $X$ is an instrument, independent of all the random coefficients. Suppose the researcher is interested in the AME $\phi(\eta)=E_\eta[\gamma]$ or the PPAME $\phi(\eta)=E_\eta[1(\gamma>0)]$. We provide conditions under which both parameters have an infinite efficiency bound. To see this, obtain the reduced forms

$$Y_1=\gamma\delta X+\gamma U_2+U_1\equiv\pi_1X+\pi_0,\qquad Y_2=\delta X+U_2,$$

which, with some abuse of notation, are jointly written as $\mathbf{Y}=\alpha_0+\alpha_1X$, where $\mathbf{Y}=(Y_1,Y_2)'$, $\alpha=(\alpha_0,\alpha_1)$, $\alpha_0=(\pi_0,U_2)'$ and $\alpha_1=(\pi_1,\delta)'$. Proposition 4.2 can then be applied to the reduced form. The corresponding influence functions for the AME and PPAME are $r_{AME}(\alpha)=\pi_1/\delta$ and $r_{PPAME}(\alpha)=1(\pi_1>0,\delta>0)+1(\pi_1<0,\delta<0)$, respectively; since they are discontinuous functions of $\alpha_1=(\pi_1,\delta)'$, non-regularity follows from Proposition 4.2.

Consider the following assumption. Let $N$ be an open set in the interior of $A$, the support of the reduced-form random coefficient $\alpha$.

Assumption 4. (i) Assumption 3 holds with the reduced form $\mathbf{Y}=\alpha_0+\alpha_1X$; (ii) $X$ is independent of the random coefficients $(\gamma,U_1,\delta,U_2)$; (iii) $(p_0,u_2,0,d)\in N$ for some $(p_0,u_2,d)$; (iv) $(p_0,u_2,p_1,0)\in N$ for some $(p_0,u_2,p_1)$.

Proposition 4.3
Suppose (12) and Assumption 4(i)-(ii) hold. If, in addition, Assumption 4(iii) or Assumption 4(iv) holds, then the PPAME is not regularly identified. If Assumption 4(iv) holds and $E[\gamma^2]<\infty$, then the AME is not regularly identified.

Proposition 4.3 proves non-regularity for the AME and the PPAME. The condition $E[\gamma^2]<\infty$ ensures that the AME is a continuous functional in $L_2(\eta)$. If $f_\delta$ denotes the (Lebesgue) density of $\delta$ and $h(u)=E[\pi_1^2\mid\delta=u]f_\delta(u)$, so that $E[\gamma^2]=\int h(u)u^{-2}\,du$, then a sufficient condition for $E[\gamma^2]<\infty$ is $\lim_{u\to0^+}h(u)/u^{\rho}<\infty$ for some $\rho>1$ and $E[\pi_1^2]<\infty$; see Khuri and Casella (2002, p. 45). Intuitively, non-regularity of the AME comes from the presence of a set of individuals with near-zero first-stage effects (Assumption 4(iv)), even though $P(\delta=0)=0$. When the instrument satisfies a monotonicity restriction, in the sense that $P(\delta>0)=1$ or $P(\delta<0)=1$, regular identification of the AME might be possible. Indeed, Heckman and Vytlacil (1998) and Wooldridge (1997, 2003, 2008) show that regular estimation by IV methods is possible with homogeneous first-stage effects. Masten (2017, Proposition 4) gives conditions for nonparametric identification of the distribution of $\gamma$, but does not discuss efficiency bounds for the AME or the PPAME under his conditions. Khan and Tamer (2010) and Graham and Powell (2012) show irregularity of the AME in different models where $E[\gamma^2]=\infty$. We show irregularity of the AME in a setting where $E[\gamma^2]<\infty$. See also Florens et al. (2008), Masten and Torgovitsky (2016), and the extensive literature following the seminal contributions of Imbens and Angrist (1994) and Heckman and Vytlacil (2005) for identification results on conditional and weighted AMEs or their discrete versions.

The PPAME is non-regular under more general conditions than the AME, because its influence function is discontinuous under more general conditions. Heckman, Smith and Clements (1997) provide bounds for the analog of the PPAME in the binary treatment case, and identification when gains are not anticipated at the time of the program. The irregularity of the PPAME also follows from a more general principle that we describe in the next section: if irregularity holds in a model with exogenous effects, it also holds in the model with endogenous effects.
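The reduced form and the two influence functions can be verified numerically. The sketch below assumes, purely for illustration, Gaussian structural coefficients with first-stage effects $\delta$ of both signs (so near-zero values of $\delta$ occur, in the spirit of Assumption 4(iv)) and a uniform instrument. It checks that $\pi_1/\delta$ recovers $\gamma$ and that the sign-agreement indicator $r_{PPAME}$ equals $1(\gamma>0)$. As functions of $(\pi_1,\delta)$, the first blows up and the second jumps wherever $\pi_1$ or $\delta$ crosses zero, which is exactly the discontinuity that Proposition 4.2 rules out for regularly identified functionals.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
# Hypothetical structural random coefficients for model (12):
#   Y1 = gamma * Y2 + U1,   Y2 = delta * X + U2.
gamma = rng.normal(0.5, 1.0, n)
delta = rng.normal(0.3, 1.0, n)        # first-stage effects of both signs
u2 = rng.normal(size=n)
u1 = 0.5 * u2 + rng.normal(size=n)     # U1 correlated with U2, so Y2 is endogenous
x = rng.uniform(size=n)                # instrument independent of (gamma, U1, delta, U2)

# Reduced-form random coefficients: Y1 = pi1 * X + pi0, Y2 = delta * X + U2.
pi1 = gamma * delta
pi0 = gamma * u2 + u1
y2 = delta * x + u2
y1 = gamma * y2 + u1
print(np.allclose(y1, pi1 * x + pi0))            # reduced form reproduces Y1

# Influence functions written in terms of alpha_1 = (pi1, delta)'.
r_ame = pi1 / delta
r_ppame = ((pi1 > 0) & (delta > 0)) | ((pi1 < 0) & (delta < 0))
print(np.allclose(r_ame, gamma))                 # pi1 / delta recovers gamma
print(np.array_equal(r_ppame, gamma > 0))        # sign agreement <=> gamma > 0
```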
This section extends our results to semiparametric models. The main point is simple: if a functional is non-regularly identified in a model, it is also non-regularly identified in any larger model that nests the original one as a special case, since information can only decrease (or remain the same) when we know less. This basic observation has important implications, and it substantially widens the applicability of our results, as illustrated here with the Mixed Logit model and in the Appendix with further examples.
Consider first a conditional semiparametric mixture model with density

$$f_{\eta,\theta}(y,x)=\int f_{y/x,\alpha}(y;\theta)\,d\eta(\alpha),$$

where $\theta$ is an additional unknown parameter, finite- or infinite-dimensional. The basic idea is that irregularity of $\phi(\eta)=E_\eta[r(\alpha)]$ in the model where $\theta$ is known implies irregularity in the model where $\theta$ is unknown.

We illustrate this point with the random coefficients Logit model, also known as the Mixed Logit, one of the most commonly used models in applied choice analysis. Fox, Kim, Ryan and Bajari (2012) have recently shown nonparametric identification in the semiparametric Mixed Logit model. Here we show that identification of the CDF and quantiles of the distribution of the RC is necessarily irregular when UH is nonparametric. The CDF and quantiles of this distribution are important parameters in applications of discrete choice.

The data $Z_i=(Y_i,X_i)$ are a random sample from the density (with respect to the measure $\mu$ defined below)

$$f_\lambda(y,x)=\int f_{y/x,\alpha}(y;\theta)\,d\eta(\alpha),$$

where $\lambda=(\theta,\eta)\in\Theta\times H$, $\theta=(\theta_1,\dots,\theta_J)'$,

$$f_{y/x,\alpha}(y;\theta)=\frac{\exp(\theta_y+x_y'\alpha)}{\sum_{j=0}^{J}\exp(\theta_j+x_j'\alpha)},$$

$x=(x_0,x_1,\dots,x_J)\in\mathcal{X}$ and $y\in\mathcal{Y}=\{0,1,\dots,J\}$. The consumer chooses among $J<\infty$ mutually exclusive inside goods $j=1,\dots,J$ and one outside good ($y=0$). The utility of the outside good is normalized so that $\theta_0=0$ and $x_0=0$. The random coefficients $\alpha$ are independent of the regressors $X$ and have distribution $\eta$. The main result below also applies to the correlated random coefficient case. In fact, non-regular identification of CDFs and quantiles is proved even when $\theta$ is known, which implies non-regularity when $\theta$ is unknown and/or when the random coefficients depend on the characteristics. The measure $\mu$ is defined on $\mathcal{Z}=\mathcal{Y}\times\mathcal{X}$ as $\mu(B_1\times B_2)=\tau(B_1)\nu_X(B_2)$, where $B_1\subset\mathcal{Y}$, $B_2$ is a Borel set of $\mathcal{X}$, $\tau(\cdot)$ is the counting measure and $\nu_X(\cdot)$ is the probability measure of $X$.
The vector $\alpha$ and the covariates $x_y$ are $K$-dimensional. The parameter space $\Theta$ is an open set of $\mathbb{R}^J$. The set $H$ consists of measurable functions $\eta:\mathbb{R}^K\to\mathbb{R}$ whose support $A$ has a non-empty interior and $\int_A d\eta(\alpha)=1$.

Applying the necessary condition for regular identification to a continuous linear functional $\phi(\eta)\in\mathbb{R}$ with influence function $r_\phi$ in the model where $\theta$ is known, it must be true that, for some $s\in L_2$,

$$r_\phi(\alpha)-\phi(\eta)=\int f_{y/x,\alpha}(y;\theta)\,s(y,x)\,d\mu(y,x). \qquad (13)$$

It is straightforward to show that the right-hand side of (13) is continuous in $\alpha$ in the interior of its support. In fact, more is true in general: it is an analytic function of $\alpha$ (an infinitely differentiable function with a convergent power series expansion). But continuity suffices for proving the non-regularity of CDFs and quantiles of $\eta$. This follows simply by dominated convergence, without computing least favorable distributions or efficiency bounds. We give the proof here to illustrate the simplicity of our method of proof.
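The continuity claim can also be checked numerically. The sketch below approximates the right-hand side of (13), $b(\alpha)=\sum_{j=0}^{J}\int f_{y/x,\alpha}(j;\theta)s(j,x)\,\nu_X(dx)$, by an average over simulated characteristics; the helper names, the intercepts $\theta$, the uniform characteristics and the test function $s$ (bounded but discontinuous in $x$) are all hypothetical. Even though $s$ is discontinuous, $b(\alpha)$ moves by only $O(\varepsilon)$ when $\alpha$ moves by $\varepsilon$, so no such $b$ can reproduce the jump of an indicator $1(\alpha\le a)$.

```python
import numpy as np

def mixed_logit_probs(x, alpha, theta):
    """Choice probabilities f_{y/x,alpha}(j; theta), j = 0, ..., J.

    x: (J+1, K) characteristics, row 0 is the outside good (x_0 = 0);
    theta: (J+1,) intercepts with theta[0] = 0 (normalization).
    """
    v = theta + x @ alpha              # utilities theta_j + x_j' alpha
    ev = np.exp(v - v.max())           # subtracting the max leaves the ratios unchanged
    return ev / ev.sum()

def b_of_alpha(alpha, xs, s, theta):
    """Monte Carlo version of sum_j int f_{y/x,alpha}(j; theta) s(j, x) nu_X(dx)."""
    return np.mean([mixed_logit_probs(x, alpha, theta) @ s(x) for x in xs])

rng = np.random.default_rng(2)
J, K = 3, 2
theta = np.array([0.0, 0.2, -0.1, 0.3])     # hypothetical intercepts, theta_0 = 0
xs = []
for _ in range(500):
    x = rng.uniform(size=(J + 1, K))
    x[0] = 0.0                               # outside good: x_0 = 0
    xs.append(x)

def s(x):
    # Hypothetical bounded test function s(j, x), discontinuous in x; one value per j.
    return np.where(x[:, 0] > 0.5, 1.0, -1.0)

a = np.array([0.5, -1.0])
diffs = []
for eps in [1e-1, 1e-2, 1e-3]:
    d = abs(b_of_alpha(a + eps, xs, s, theta) - b_of_alpha(a, xs, s, theta))
    diffs.append(d)
    print(f"eps={eps:.0e}  |b(a+eps) - b(a)| = {d:.2e}")
```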
Proposition 5.1. $r_\phi$ in (13) is continuous in the interior of $A$.

Proof of Proposition 5.1: Write

$$\int f_{y/x,\alpha}(y;\theta)\,s(y,x)\,d\mu(y,x)=\sum_{j=0}^{J}\int f_{y/x,\alpha}(j;\theta)\,s(j,x)\,\nu_X(dx).$$

Each summand is continuous in $\alpha$ in the interior of its support, by the continuity and boundedness of $f_{y/x,\alpha}(j;\theta)$ and the dominated convergence theorem. ∎

Proposition 5.1 implies that identification of the CDF and quantiles of $\eta$ under the conditions specified in Fox et al. (2012) must be irregular. Bajari, Fox and Ryan (2007) propose a simple estimator of the CDF of $\eta$, and Fox, Kim and Yang (2016) show its consistency (in the weak topology) and obtain its rates of convergence. Proposition 5.1 implies that the estimator in Fox et al. (2016), or any other estimator for that matter, cannot achieve regular parametric rates of convergence. The lack of regularity is not evident from the rates established in Fox et al. (2016). Let $F$ be the CDF pertaining to $\eta$ and $\hat F_\eta$ the "fixed grid" estimator of Bajari et al. (2007), Fox et al. (2011) and Fox et al. (2016) based on $D$ grid points ($D\equiv D(n)$, where $n$ is the sample size). The order of the bias established in Fox et al. (2016) is $D^{-\bar s/K}$, where $\bar s$ is the smoothness of the mapping $\alpha\mapsto f_{y/x,\alpha}$ (here $\bar s=\infty$). This suggests that parametric rates might be attainable, but our results show that this is not possible (at least in a locally uniform sense). The order of the variance of $\hat F_\eta$ is inversely related to the minimum eigenvalue of the $D\times D$ matrix $\Psi_D$ whose $(d_1,d_2)$-th element, $1\le d_1,d_2\le D$, is

$$E\left[g'(X,\alpha_{d_1})\,g(X,\alpha_{d_2})\right], \qquad (14)$$

where $g(x,\alpha_d)=(f_{y/x,\alpha_d}(0;\theta),\dots,f_{y/x,\alpha_d}(J;\theta))'$ are the conditional choice probabilities when UH is evaluated at the $d$-th grid point $\alpha_d$, $d=1,\dots,D$. This minimum eigenvalue quantifies the level of multicollinearity in the least squares regression of Fox et al. (2016), and we conjecture that, given the high smoothness of the mapping $\alpha\mapsto f_{y/x,\alpha}$, it goes to zero exponentially fast, so that it is the main determinant of the (slow) rate of convergence of $\hat F_\eta$. A detailed theoretical analysis of this issue is beyond the scope of this paper, but see the discussion in the next section and the Monte Carlo simulations below, which support these claims.

The previous examples show that regular identification of CDFs and quantiles of UH in the models considered may require restricting the nature of heterogeneity. In this section we investigate how common approaches in the literature address the lack of regularity of these functionals. Additionally, we provide a necessary condition for CDFs and quantiles to be regularly identified when UH is semiparametric, and discuss how smoothness of $\alpha\mapsto f_{z/\alpha}$ translates into a multicollinearity problem for sieve and related estimators.

Our first observation is derived from the main idea of the previous section: functional form assumptions that restrict the conditional likelihood may not help with the irregular identification of CDFs and quantiles if the mapping $\alpha\mapsto f_{z/\alpha}$ is still smooth while UH is nonparametric. For example, knowing the finite-dimensional parameters of a semiparametric mixture, knowing the functional forms of the idiosyncratic error terms in Kotlarski's lemma, or knowing the functional form of the baseline hazard in the mixed proportional hazard model does not restore regular identification of CDFs and quantiles of UH when UH is nonparametric.

We now discuss how restrictions on UH translate into regularity of functionals of UH. Denote by $\overline{T}(\eta)$ the mean-squared closure of $T(\eta)$ in $L_2(\eta)$. That UH is not nonparametric formally means that $\overline{T}(\eta)$ is a strict subset of $L_2(\eta)$.
The extension of the necessary condition for regular identification of $\phi(\eta)=E_\eta[r(\alpha)]$, for a measurable function $r(\cdot)$ with $E_\eta[r^2(\alpha)]<\infty$, is given in the following lemma. Let $\Pi_V$ denote the orthogonal projection operator onto $\overline{V}$, where $\overline{V}$ denotes the closure of $V$ in the norm topology.

Lemma 6.1
The necessary condition for regular identification of $\phi(\eta)=E_\eta[r(\alpha)]$ is

$$\Pi_{T(\eta)}r(\alpha)=\Pi_{T(\eta)}E[s(Z)\mid\alpha],\quad\text{for some }s\in L_2. \qquad (15)$$

The mismatch in smoothness between $r(\alpha)$ and $E[s(Z)\mid\alpha]$, which was the source of irregularity in the examples studied, may now be removed by the projection onto $T(\eta)$. We briefly discuss how different restrictions on UH translate into regularity of CDFs and quantiles in view of this general characterization.

A popular approach in practice is to consider a parametric distribution for UH. A leading example of a parametric model is a finite mixture with known, finitely many support points. Parametric heterogeneity leads to a finite-dimensional tangent space $T(\eta)$, which is therefore closed, $\overline{T}(\eta)=T(\eta)$, and which is generated by the scores of the specified distribution. Denote by $l_\eta$ the score of UH, i.e. $T(\eta)=\overline{T}(\eta)=\mathrm{span}(l_\eta)$, assume $E_\eta[l_\eta(\alpha)l_\eta'(\alpha)]$ is non-singular, and define the projected score $s(Z)=E[l_\eta(\alpha)\mid Z]$. Then simple algebra shows that a solution to (15) in $s$ is given by $s_r$, defined by $s_r(Z)=\lambda_r's(Z)$, where $\lambda_r$ is a solution to

$$E[s(Z)s'(Z)]\,\lambda_r=E[r(\alpha)l_\eta(\alpha)]. \qquad (16)$$

If the Fisher information for $\eta$ is positive, which means that $E[s(Z)s'(Z)]$ is non-singular, then (16) has a unique solution $\lambda_r$, and $\phi(\eta)$ is regularly identified. More generally, $\phi(\eta)$ may be regularly identified even when $\eta$ is not, which corresponds to the system (16) having some solution $\lambda_r$. The drawback of the parametric approach is the high risk of misspecification, which can be quantified by the dimension and form of the model's tangent space. If the dimension of $T(\eta)$ is $D$, then the tangent space of the model is at most $D$-dimensional and given by $\mathcal{S}:=\{s\in L_2: s(z)=\lambda's(z)\text{ for some }\lambda\in\mathbb{R}^D\}$.
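The "simple algebra" behind (16) can be spelled out as follows. With $T(\eta)=\mathrm{span}(l_\eta)$, the projection onto $T(\eta)$ is a least squares projection on $l_\eta$, and for a candidate $s_r=\lambda_r's$ the law of iterated expectations gives:

```latex
\begin{align*}
\Pi_{T(\eta)} f(\alpha)
  &= E_\eta\!\big[f(\alpha)\,l_\eta'(\alpha)\big]\,
     \big(E_\eta[l_\eta(\alpha)l_\eta'(\alpha)]\big)^{-1} l_\eta(\alpha),\\
E\big[E[s_r(Z)\mid\alpha]\,l_\eta'(\alpha)\big]
  &= E\big[s_r(Z)\,l_\eta'(\alpha)\big]
   = \lambda_r'\,E\big[s(Z)\,E[l_\eta(\alpha)\mid Z]'\big]
   = \lambda_r'\,E\big[s(Z)s'(Z)\big].
\end{align*}
```

Since both sides of (15) are then linear combinations of $l_\eta$ through the same invertible matrix $\big(E_\eta[l_\eta l_\eta']\big)^{-1}$, (15) holds for $s=s_r$ if and only if $E[r(\alpha)l_\eta'(\alpha)]=\lambda_r'E[s(Z)s'(Z)]$; transposing gives $E[s(Z)s'(Z)]\lambda_r=E[r(\alpha)l_\eta(\alpha)]$, the linear system in (16).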
Estimators of functionals of UH will in general be inconsistent when the model is misspecified.

As usual, a semiparametric approach is more robust to misspecification. In Lemma 6.1 we have derived the necessary condition for regular identification of moments when UH is semiparametric, so that $\overline{T}(\eta)$ is an infinite-dimensional strict subset of $L_2(\eta)$. Examples of semiparametric models include finite mixtures with unknown support points and sieve methods with an incomplete sieve basis. Existing rate results for finite mixtures with unknown support points suggest irregularity of the CDFs in general (see, e.g., Chen 1995 and Heinrich and Kahn 2018), although we are not aware of any paper investigating semiparametric efficiency bounds for finite mixtures with unknown support points. We recognize that, although the necessary condition under semiparametric restrictions in Lemma 6.1 is general, it may be hard to find primitive conditions for it, as computing the closure of $T(\eta)$ and the projections onto it may not be straightforward in applications.

As a practical approach, we recommend a sieve method where the span of $\{l_\eta(\alpha)\}$ increases with the sample size, i.e. $D\to\infty$ as $n\to\infty$. Without loss of generality, normalize $l_\eta$ so that $E_\eta[l_\eta(\alpha)l_\eta'(\alpha)]$ is the identity matrix. A key quantity for sieve estimation is the minimum eigenvalue of the Fisher information matrix $E[s(Z)s'(Z)]$, denoted by $\xi_{\min}\equiv\xi_{\min}(D)$; see Fox, Kim and Yang (2016) and (16). We provide a useful bound for $\xi_{\min}$. To that end, we assume that the score operator $Ab=E[b(\alpha)\mid Z]$ from $L_2(\eta)$ to $L_2$ is compact. A well-known sufficient condition for this is

$$\int\!\!\int\frac{f_{z/\alpha}^2(z)}{f_\eta(z)}\,d\eta(\alpha)\,d\mu(z)<\infty. \qquad (17)$$

Under this condition, $A$ has a sequence of singular values $\{\mu_d\}_{d=1}^{\infty}$ (see Engl, Hanke and Neubauer, 1996). Then the following bound follows essentially from Blundell, Chen and Kristensen (2007, Lemma 1).

Lemma 6.2
If (17) holds, then $\xi_{\min}(D)\le\mu_D^2$.

Since $\mu_D\to0$ as $D\to\infty$, Lemma 6.2 implies that also $\xi_{\min}(D)\to0$. This is the multicollinearity problem mentioned above. Furthermore, the score operator $A$ is an integral operator with kernel $K(z,\alpha)=f_{z/\alpha}(z)/f_\eta(z)$, and it is well known that the smoother the mapping $\alpha\mapsto K(z,\alpha)$, the faster the singular values $\mu_D$ go to zero. In particular, for analytic kernels the singular values decay to zero exponentially fast (Hille and Tamarkin 1931). The minimum eigenvalue $\xi_{\min}(D)$ is also closely related to the sieve measure of ill-posedness $\tau_D$ proposed in econometrics (see Chen 2007 and Blundell, Chen and Kristensen 2007) through the relation

$$\tau_D=\frac{1}{\sqrt{\xi_{\min}(D)}}.$$

Prior to this paper, Blundell, Chen and Kristensen (2007, Lemma 1) obtained the bound $\tau_D\ge1/\mu_D$ in a nonparametric IV setting. Thus, the modest contribution here is the interpretation in terms of the minimum eigenvalue of the Fisher information matrix. For applications of sieve estimators along these lines and the important role of $\tau_D$ (or $\xi_{\min}(D)$), see, e.g., Chen (2007), Bajari, Fox and Ryan (2007), Hu and Schennach (2008), Bester and Hansen (2007), Chen and Liao (2014), Fox, Kim and Yang (2016) and references therein. The next section investigates the finite sample performance of the sieve "fixed grid" method of Fox, Kim and Yang (2016) and of a regularized version that reduces the variance of estimates of the CDFs and quantiles of UH.

This section illustrates some of the theoretical ideas in a Monte Carlo study of the Mixed Logit model. Specifically, we consider the "fixed grid" nonparametric estimator of Bajari et al. (2007) and Fox et al. (2016), and evaluate its performance for estimating the CDF and quantiles of UH. We also provide a variant of this estimator that performs a Singular Value Decomposition (SVD) of the resulting design matrix to reduce the variance of the estimator.
To introduce the estimator, consider a discrete approximation of the distribution of UH of the form

$$\eta(\alpha)\approx\sum_{d=1}^{D}\theta_d\,\delta_{\alpha_d}(\alpha), \qquad (18)$$

where the $\theta_d$ are probabilities, adding up to one, over a finite support $\{\alpha_d\}_{d=1}^{D}$ of size $D$ in $A$. In Fox et al. (2016), $D$, and thus the discrete support, is allowed to increase with the sample size $n$. Let $Y_{i,j}$ be the binary indicator that equals 1 whenever individual $i$'s choice is $j$, and zero otherwise, and define the regression error term $\varepsilon_{i,j}=Y_{i,j}-f_\eta(j,X_i)$. The least squares estimator uses the regression equation

$$Y_{i,j}=\int f_{y/X_i,\alpha}(j)\,d\eta(\alpha)+\varepsilon_{i,j},$$

together with the approximation in (18), to obtain the approximate linear regression model

$$Y_{i,j}\approx\sum_{d=1}^{D}\theta_d\,f_{y/X_i,\alpha_d}(j)+\varepsilon_{i,j}.$$

Fox et al. (2016) propose running a regression of $Y_{i,j}$ on the regressors $Z^d_{i,j}:=f_{y/X_i,\alpha_d}(j)$ subject to the constraints on the probabilities $\theta_d$, i.e.

$$\hat\theta=\arg\min_{\theta\in\Delta_D}\frac{1}{nJ}\sum_{i=1}^{n}\sum_{j=0}^{J}\Big(Y_{i,j}-\sum_{d=1}^{D}\theta_dZ^d_{i,j}\Big)^2, \qquad (19)$$

where $\theta=(\theta_1,\dots,\theta_D)'\in\Delta_D=\{(p_1,\dots,p_D): 0\le p_d\le1,\ \sum_{d=1}^{D}p_d=1\}$. The least squares problem in (19) is convex and can be solved efficiently by standard routines (such as lsqlin in Matlab). The estimator of the CDF of $\eta$ at $\alpha$ is then given by

$$\hat F_\eta(\alpha)=\sum_{d=1}^{D}\hat\theta_d\,1(\alpha_d\le\alpha), \qquad (20)$$

and the quantile estimators are derived from the CDF as usual.

For simplicity of computation, in the Monte Carlo we apply this estimator to the Mixed Logit model without fixed parameters, so

$$f_{y/x,\alpha}(y)=\frac{\exp(x_y'\alpha)}{\sum_{j=0}^{J}\exp(x_j'\alpha)},$$

where $x=(x_0,x_1,\dots,x_J)\in\mathcal{X}$ and $y\in\mathcal{Y}=\{0,1,\dots,J\}$. The smoothness of the mapping $\alpha\mapsto f_{y/x,\alpha}$ translates into high correlation of the regressors $Z^d_{i,j}$ when $D$ is large (for indices $d$ corresponding to nearby grid points $\alpha_d$), suggesting that methods that account for multicollinearity may reduce the variances of the resulting estimators.

We thank Jeremy Fox for sharing the Matlab code to implement their estimator.
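A minimal Python sketch of the fixed-grid estimator (19)-(20) follows. It is not the Matlab code of Fox et al.: scipy's SLSQP solver stands in for lsqlin, the grid is a regular 5 × 5 grid rather than a Halton sequence, and the data generating process (a single bivariate normal $\eta$ with uniform characteristics) and all function names are hypothetical. The optional argument `p` adds the principal-component constraint proposed in the next paragraph ($\theta$ orthogonal to the trailing right singular vectors of the design), and the printed smallest eigenvalues of $Z'Z/(n(J+1))$ illustrate the near-singularity of the design.

```python
import numpy as np
from scipy.optimize import minimize

def choice_probs(x, alpha):
    """Mixed Logit f_{y/x,alpha}(j), j = 0..J, without intercepts; x[0] = 0 is the outside good."""
    v = x @ alpha
    ev = np.exp(v - v.max())
    return ev / ev.sum()

def design_matrix(xs, grid):
    """Rows (i, j) in i-major order, columns d: Z^d_{i,j} = f_{y/X_i, alpha_d}(j)."""
    return np.vstack([np.column_stack([choice_probs(x, a) for a in grid]) for x in xs])

def fixed_grid_ls(Z, y, p=None):
    """Least squares over the simplex, as in (19).

    If p is given, also impose V'theta = 0 for the last D - p right singular
    vectors V of Z (principal-component version of the constrained regression).
    """
    D = Z.shape[1]
    G, c = Z.T @ Z, Z.T @ y
    fun = lambda t: 0.5 * t @ G @ t - c @ t      # same minimizer as the sum of squared residuals
    jac = lambda t: G @ t - c
    cons = [{"type": "eq", "fun": lambda t: np.sum(t) - 1.0, "jac": lambda t: np.ones(D)}]
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    v1 = np.abs(vt[0])          # Z > 0 entrywise, so the leading singular vector has one sign
    theta0 = v1 / v1.sum()      # feasible starting point for both versions
    if p is not None and p < D:
        V_rest = vt[p:].T       # D x (D - p) trailing right singular vectors
        cons.append({"type": "eq", "fun": lambda t: V_rest.T @ t, "jac": lambda t: V_rest.T})
    res = minimize(fun, theta0, jac=jac, method="SLSQP", bounds=[(0.0, 1.0)] * D,
                   constraints=cons, options={"maxiter": 500, "ftol": 1e-12})
    return res.x

def cdf_hat(theta, grid, a):
    """CDF estimator (20): sum_d theta_d 1(alpha_d <= a), with <= taken componentwise."""
    return float(theta @ np.all(grid <= a, axis=1))

rng = np.random.default_rng(3)
n, J, K = 300, 3, 2
g1 = np.linspace(-3.0, 3.0, 5)
grid = np.array([(u, v) for u in g1 for v in g1])      # D = 25 support points
xs, ys = [], []
for _ in range(n):
    x = rng.uniform(size=(J + 1, K))
    x[0] = 0.0
    alpha_i = rng.normal([0.5, -0.5], 0.5)             # hypothetical UH distribution
    choice = rng.choice(J + 1, p=choice_probs(x, alpha_i))
    xs.append(x)
    ys.append(np.eye(J + 1)[choice])
Z, y = design_matrix(xs, grid), np.concatenate(ys)

theta_hat = fixed_grid_ls(Z, y)            # fixed-grid estimator
theta_tilde = fixed_grid_ls(Z, y, p=5)     # SVD-constrained variant
print("smallest eigenvalues of Z'Z / rows:", np.linalg.eigvalsh(Z.T @ Z / len(y))[:3])
origin = np.array([0.0, 0.0])
print("F_hat(0,0) =", cdf_hat(theta_hat, grid, origin), " F_tilde(0,0) =", cdf_hat(theta_tilde, grid, origin))
```

The quadratic form `fun` drops the constant $\sum Y_{i,j}^2$ and the scaling $1/(nJ)$, which changes the objective value but not the minimizer.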
We suggest using the SVD of the $nJ\times D$ design matrix $Z=(Z^d_{i,j})$ by adding the linear constraint $V_{D-p}'\theta=0$ to (19), where $V_{D-p}=(v_{p+1},v_{p+2},\dots,v_D)$ denotes the last $D-p$ right singular vectors of $Z$ (ordered, as usual, according to the singular values from largest to smallest). This is the classical Principal Component Regression adapted to the constrained case where the $\theta$'s are probabilities. The resulting estimator is

$$\tilde\theta=\arg\min_{\theta\in\Delta_D,\ V_{D-p}'\theta=0}\frac{1}{nJ}\sum_{i=1}^{n}\sum_{j=0}^{J}\Big(Y_{i,j}-\sum_{d=1}^{D}\theta_dZ^d_{i,j}\Big)^2,$$

which solves a convex problem and can equally be computed by routines such as lsqlin in Matlab. Let $\tilde F_\eta(\alpha)=\sum_{d=1}^{D}\tilde\theta_d1(\alpha_d\le\alpha)$ denote the corresponding CDF estimator. We compare below the performance of the resulting CDF and quantile estimators based on $\hat\theta$ and $\tilde\theta$, respectively.

The Monte Carlo setting is taken from a recent study by Heiss, Hetzenecker and Osterhaus (2019). The data generating process is as follows. The number of products (not including the outside good) is $J=3$. The number of product characteristics is $K=2$. The characteristics are generated as independent uniforms on [0 , . The random coefficient distribution is a mixture of two bivariate normal distributions with probability weights (1 / , / , means ( − . , − .
2) and (1 . , .
3) and equal variances Σ = Σ = Σ given byΣ = " . . .
15 0 . . To generate the grid $\{\alpha_d\}_{d=1}^{D}$ we use a Halton sequence with points spread on [ − , × [ − , . The fixed grid covers the support of the true distribution with probability close to one. We consider different numbers of grid points $D\in\{25,100,500\}$ and sample sizes $n\in\{100,500,1000\}$. For computing $\tilde\theta$ we set the number of components $p$ to 5 throughout (we have investigated values of $p$ between 3 and 10 and obtained qualitatively similar results). We set $p$ deterministically in the simulations to save time, but in practice we recommend cross-validation to select $p$. The number of Monte Carlo simulations is $M=500$. To evaluate the performance of the CDF estimators we compute the integrated absolute bias
$$\mathrm{Bias}(\hat F)=\frac{1}{ML}\sum_{m=1}^{M}\sum_{l=1}^{L}\left|\hat F_{\eta,m}(\alpha_l)-F(\alpha_l)\right|,$$

where $\{\alpha_l\}_{l=1}^{L}$ is an additional equally spaced grid over [ − , × [ − , 5] with $L=121$, $\hat F_{\eta,m}$ is the fixed grid CDF estimator (cf. (20)) for the $m$-th Monte Carlo simulation, and $F$ denotes the true CDF pertaining to $\eta$. We also report the root integrated mean squared error, defined as

$$\mathrm{RMSE}(\hat F)=\sqrt{\frac{1}{ML}\sum_{m=1}^{M}\sum_{l=1}^{L}\left(\hat F_{\eta,m}(\alpha_l)-F(\alpha_l)\right)^2}.$$

The quantities $\mathrm{Bias}(\tilde F)$ and $\mathrm{RMSE}(\tilde F)$ are defined analogously. Table 1 reports the bias and root mean squared errors of the CDF estimators $\hat F$ and $\tilde F$.
The first observation is that the bias is small even for small sample sizes such as $n=100$, and it does not depend much on $D$, which is consistent with our discussion in Section 5.1. The regularization causes $\tilde F$ to have a slightly larger bias than $\hat F$ in some cases, although the difference is not substantial, and for small samples the bias of $\tilde F$ is even smaller. On the other hand, the variance of $\hat F$ is systematically larger than that of $\tilde F$, particularly for moderate and large values of $D$, consistent with our claim that the level of multicollinearity increases dramatically with the number of points $D$.

Table 1. Bias and RMSE for CDFs in Mixed Logit

n      D     Bias(F̂)   Bias(F̃)   RMSE(F̂)   RMSE(F̃)
100    25    0.0781    0.0729    0.1791    0.1059
500    25    0.0663    0.0713    0.1380    0.0933
1000   25    0.0605    0.0708    0.1231    0.0904
100    100   0.0799    0.0682    0.1896    0.0999
500    100   0.0606    0.0639    0.1428    0.0855
1000   100   0.0511    0.0630    0.1284    0.0831
100    500   0.0784    0.0651    0.1906    0.0982
500    500   0.0541    0.0602    0.1452    0.0835
1000   500   0.0440    0.0592    0.1303    0.0805
Note: M = 500 simulations.

Table 2 reports the RMSE for the medians of the marginal distributions of UH (denoted by RMSEQ1 and RMSEQ2 for $\hat F$, and RMSEQ1-PCR and RMSEQ2-PCR for $\tilde F$, respectively). Results for other quantile levels are reported in the Appendix. We do not report the bias separately to save space, but we note that the bias for quantiles is much larger than the bias for CDFs. We observe substantial gains in terms of RMSE from the regularization by SVD, with the benefits increasing with the number of grid points. Importantly, for both CDFs and quantiles, the reported results are consistent with rates of convergence much slower than parametric, lending support to the infinite efficiency bounds established in this paper.

Table 2. RMSE for Medians of Marginals of UH in the Mixed Logit

n      D     RMSEQ1   RMSEQ1-PCR   RMSEQ2   RMSEQ2-PCR
100    25    1.6624   0.8061       1.4621   0.7085
500    25    0.8492   0.5232       0.8713   0.4155
1000   25    0.8008   0.4923       0.7386   0.3254
100    100   1.6084   0.6315       1.8392   0.6514
500    100   0.9411   0.2995       0.9409   0.2790
1000   100   0.8947   0.1874       0.8976   0.1832
100    500   1.6373   0.6360       1.6270   0.5974
500    500   1.0599   0.2710       0.9917   0.2639
1000   500   0.9374   0.1879       0.9669   0.1766
Note: M = 500 simulations.

We have established irregular identification of CDFs and quantiles (or, more generally, of functionals with discontinuous influence functions) of nonparametric UH in some structural economic models. Example applications include the structural model of unemployment with two spells in Alvarez et al. (2015), the binary and linear RC models (possibly with correlated effects), the AME in a triangular model with near-zero first-stage effects, and the distribution and quantiles of UH in the Mixed Logit model. These are only some applications; the results apply more widely. Further examples in the Appendix include mixed proportional duration models and measurement error models with two measurements identified by means of Kotlarski's lemma. Furthermore, as we discuss in the Appendix, we expect our approach to be applicable to many situations where the so-called Information Operator (see, e.g., Begun, Hall, Huang and Wellner (1983)) is a smoothing operator.

The most appealing feature of our method of proof is its simplicity relative to alternative approaches that directly compute efficiency bounds, which are particularly difficult to compute in the models we have studied. Instead, we exploit necessary smoothness conditions that the influence function of a regularly identified functional must satisfy. The Mixed Logit example illustrates how easily our method of proof can be applied. In contrast, directly computing the Fisher information and the efficiency bound in this model is rather challenging (both were unknown prior to this paper).
The practical implications of the irregularity of CDFs and quantiles have been investigated in a Monte Carlo study. We have found substantial benefits from regularizing the fixed grid estimator of Bajari et al. (2007), Fox et al. (2011) and Fox et al. (2016), without sacrificing much of its appealing computational simplicity. Future research on the theoretical properties of regularized estimators is warranted.
Appendix A: Proofs of Main Results
Proof of Lemma 3.1: First, the functional $\eta\mapsto\phi(\eta)=E_\eta[r(\alpha)]$ is differentiable with influence function $\chi(\alpha)=\Pi_{T(\eta)}r(\alpha)$, where $\Pi_V$ denotes the orthogonal projection operator onto $\overline{V}$, the closure of $V$.
To see this, note that by linearity of $\eta\mapsto\phi(\eta)$, for all $b\in T(\eta)$,

$$\lim_{t\to0}\frac{\phi(\eta_t)-\phi(\eta)}{t}=E_\eta[r(\alpha)b(\alpha)]=E_\eta\big[\big(\Pi_{T(\eta)}r(\alpha)\big)b(\alpha)\big].$$

Since UH is nonparametric, $\Pi_{T(\eta)}r(\alpha)=r(\alpha)-\phi(\eta)$. On the other hand, by Lemma 25.34 in van der Vaart (1998), the adjoint of the score operator is given by $A^*s=E[s(Z)\mid\alpha]-E[s(Z)]$. The lemma then follows from Theorem 3.1 and Theorem 4.1 in van der Vaart (1991), which establish that a necessary condition for positive Fisher information for $\phi(\eta)$ is $r(\alpha)-\phi(\eta)=E[s(Z)\mid\alpha]$, since $E[s(Z)]=0$. ∎

Proof of Lemma 3.2:
Let $\alpha_n,\alpha\in N$ be such that $\alpha_n\to\alpha$, and define $h_n(z)=s(z)f_{z/\alpha_n}(z)$. Condition (i) implies $h_n(z)\to h(z):=s(z)f_{z/\alpha}(z)$ $\mu$-a.e. Also, by the dominance condition, for all sufficiently large $n$, $|h_n|$ is bounded by a $\mu$-integrable function that does not depend on $n$. We conclude by dominated convergence that

$$\int s(z)f_{z/\alpha_n}(z)\,d\mu(z)\to\int s(z)f_{z/\alpha}(z)\,d\mu(z).$$

∎

Proof of Corollary 3.1:
By Lemma 3.2, if the influence function of the functional is discontinuous, then the functional is not regularly identified. Since the indicator function is not continuous, this proves the corollary. ∎
Proof of Corollary 3.2:
Lemma 21.3 in van der Vaart (1998) shows the pathwise differentiability of the quantile functional, with influence function

$$r_\phi(\alpha)=-\frac{1\{\alpha<\phi(\eta)\}-\tau}{\dot\eta(\phi(\eta))}.$$

That is, the map $\eta\mapsto\phi(\eta)$ satisfies, for all $b\in T(\eta)$,

$$\lim_{t\to0}\frac{\phi(\eta_t)-\phi(\eta)}{t}=E_\eta[r_\phi(\alpha)b(\alpha)].$$

From van der Vaart (1991) it follows that a necessary condition for the quantile functional to be differentiable in the model for the data (i.e., regularly identified) is

$$r_\phi(\alpha)=\int s(z)f_{z/\alpha}(z)\,d\mu(z),$$

where no centering constant appears because $r_\phi$ already satisfies $E_\eta[r_\phi(\alpha)]=0$. By Lemma 3.2, if the influence function of the functional is discontinuous, then the functional is not regularly identified. Since the influence function of the quantile is not continuous, this proves the corollary. ∎
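The discontinuity used in this proof is easy to see numerically. The sketch below takes, purely for illustration, $\eta$ to be a standard normal distribution and $\tau=0.5$ (the helper `quantile_influence` is a hypothetical name). It evaluates the influence function on either side of the quantile and finds a jump of size $1/\dot\eta(\phi(\eta))$, while the influence function still averages to approximately zero under $\eta$.

```python
import numpy as np
from scipy.stats import norm

def quantile_influence(alpha, q, dens_at_q, tau):
    """r_phi(alpha) = -(1{alpha < q} - tau) / eta_dot(q) for the tau-quantile q of eta."""
    ind = (np.asarray(alpha) < q).astype(float)
    return -(ind - tau) / dens_at_q

tau = 0.5
q = norm.ppf(tau)           # median of the (hypothetical) standard normal eta
f_q = norm.pdf(q)           # eta_dot(q), the density at the quantile
eps = 1e-9
jump = quantile_influence(q + eps, q, f_q, tau) - quantile_influence(q - eps, q, f_q, tau)
print(jump, 1.0 / f_q)      # jump of size 1 / eta_dot(q) at alpha = q

rng = np.random.default_rng(5)
draws = rng.standard_normal(200_000)
print(quantile_influence(draws, q, f_q, tau).mean())   # approximately 0 (centered)
```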
Proof of Proposition 3.1: By substitution of $f_{z/\alpha}(t_1,t_2)$ we obtain
$$E[s(Z)\mid\alpha] = \int_{T} s(t_1,t_2)\, f_{z/\alpha}(t_1,t_2)\,dt_1\,dt_2 = C\beta e^{\alpha\beta} h(\alpha_1,\alpha_2),$$
where
$$h(u,v) = \int_{T} s(t_1,t_2)\,\frac{1}{t_1^{3/2}\,t_2^{3/2}}\, s(u,v;t_1)\,s(u,v;t_2)\,dt_1\,dt_2$$
and
$$s(u,v;t) = \exp\left(-\frac{ut}{2} - \frac{v}{2t}\right),\qquad t\in T,\ (u,v)\in(0,\infty)^2.$$
We check that the conditions for an application of Leibniz's rule hold. These conditions are:
1. The partial derivative $\partial^m [s(u,v;t_1)s(u,v;t_2)]/\partial u^m$ exists and is a continuous function on an open neighborhood $B$ of $(u,v)$, for a.s. $(t_1,t_2)\in T$.
2. There is a positive function $h_m(t_1,t_2)$ such that
$$\sup_{(u,v)\in B}\left|\frac{\partial^m [s(u,v;t_1)s(u,v;t_2)]}{\partial u^m}\right| \le h_m(t_1,t_2) \qquad (21)$$
and
$$\int_T s(t_1,t_2)\,\frac{1}{t_1^{3/2}\,t_2^{3/2}}\,h_m(t_1,t_2)\,dt_1\,dt_2 < \infty. \qquad (22)$$
Simple differentiation and induction show that, for any integer $m \ge 1$,
$$\frac{\partial^m [s(u,v;t_1)s(u,v;t_2)]}{\partial u^m} = 2^{-m}(-1)^m (t_1+t_2)^m\, s(u,v;t_1)\,s(u,v;t_2).$$
Hence, there exist $u^*$ and $v^*$ such that (21) holds with $h_m(t_1,t_2) = 2^{-m}(t_1+t_2)^m s(u^*,v^*;t_1)\,s(u^*,v^*;t_2)$. Furthermore, since $E[s(Z)\mid\alpha] < \infty$ for all $\alpha$ in a local neighborhood (by local boundedness of $r$) and $T$ is bounded, condition (22) holds. The continuity of $h(u,v)$ is a special case of the previous arguments with $m = 0$ (note that the term $(t_1+t_2)^m$ is then one, and the boundedness of $T$ is not needed in this case). ∎

Proof of Proposition 4.1: Define
$$b(\alpha) = E[s(Y_i = 1, X_i)\mid \alpha_i = \alpha] = \int_{x'\alpha\ge 0} s(1,x)\,d\nu_X(x).$$
We prove that $b$ is continuous; by compactness of the sphere, it is therefore uniformly continuous. Since the halfspaces $\{x: x'\alpha \ge 0\}$ and $\{x: x'\bar\alpha \ge 0\}$ differ on sets having surface measure of order $|\alpha-\bar\alpha|$, it follows from the absolute continuity of the angular component of $X$ that $|b(\alpha)-b(\bar\alpha)| = O(|\alpha-\bar\alpha|)$. When $x = (1,\tilde x)$ and $\alpha = (\alpha_1,\alpha_2')'$, then
$$b(\alpha) = \int_{\tilde x'\alpha_2 \ge -\alpha_1} s(1,1,\tilde x)\,d\nu_X(\tilde x) = \int_{u\ge -\alpha_1} s_\alpha(u)\, f_\alpha(u)\,du,$$
where $s_\alpha(u) = E[s(Y_i=1,1,\tilde X_i)\mid \alpha_2'\tilde X_i = u]$ and $f_\alpha$ denotes the density of $\alpha_2'\tilde X_i$. The absolute continuity in $\alpha$ follows from the integrability of $s_\alpha(u)f_\alpha(u)$ and Royden (1968, Chapter 5). ∎

Proof of Corollary 4.1: The proof follows as in Corollaries 3.1 and 3.2. ∎
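The derivative formula that drives the Leibniz-rule argument in the proof of Proposition 3.1 can be checked numerically. The sketch below assumes $s(u,v;t) = \exp(-ut/2 - v/(2t))$, which is our reading of the kernel in that proof (it is the form that produces the factor $2^{-m}(-1)^m(t_1+t_2)^m$). It compares central finite-difference derivatives in $u$ of $s(u,v;t_1)s(u,v;t_2)$ with the closed form for $m = 1,2,3$; the evaluation point and step size are arbitrary choices.

```python
import math


def kernel(u: float, v: float, t: float) -> float:
    """Assumed kernel s(u, v; t) = exp(-u t / 2 - v / (2 t))."""
    return math.exp(-u * t / 2.0 - v / (2.0 * t))


def product(u: float, v: float, t1: float, t2: float) -> float:
    """s(u, v; t1) * s(u, v; t2)."""
    return kernel(u, v, t1) * kernel(u, v, t2)


def closed_form(m: int, u: float, v: float, t1: float, t2: float) -> float:
    """2^{-m} (-1)^m (t1 + t2)^m s(u, v; t1) s(u, v; t2)."""
    return (-0.5 * (t1 + t2)) ** m * product(u, v, t1, t2)


def finite_difference(m: int, u: float, v: float, t1: float, t2: float, h: float = 1e-3) -> float:
    """Central finite differences of order m = 1, 2, 3 in the u argument."""
    f = lambda x: product(x, v, t1, t2)
    if m == 1:
        return (f(u + h) - f(u - h)) / (2 * h)
    if m == 2:
        return (f(u + h) - 2 * f(u) + f(u - h)) / h ** 2
    if m == 3:
        return (f(u + 2 * h) - 2 * f(u + h) + 2 * f(u - h) - f(u - 2 * h)) / (2 * h ** 3)
    raise ValueError("only m = 1, 2, 3 are implemented")


u, v, t1, t2 = 1.3, 0.7, 0.5, 2.0
rel_errors = {}
for m in (1, 2, 3):
    fd = finite_difference(m, u, v, t1, t2)
    exact = closed_form(m, u, v, t1, t2)
    rel_errors[m] = abs(fd / exact - 1.0)
    print(f"m = {m}: finite difference {fd:.8f}, closed form {exact:.8f}")
```

Because each $u$-derivative only multiplies the product by $-(t_1+t_2)/2$, the dominating function $h_m$ in (21) is polynomial in $(t_1,t_2)$ times the kernel, which is why boundedness of $T$ suffices for (22).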
For a function $a \in L_1(\lambda)\cap L_2(\lambda)$, define the Fourier transform $\hat a(t) = \int e^{it'\alpha}a(\alpha)\,d\alpha$, where $i = \sqrt{-1}$. Use the notation
$$\tilde g(p,x) = \int e^{ipy}\, g(y,x)\,dy$$
for the Fourier transform with respect to the first argument only (for $g(\cdot,x)\in L_1(\lambda)\cap L_2(\lambda)$). Define the norms
$$|g|_{1,\rho}^2 = \int_{S^{d_\alpha-1}}\int_{\mathbb R}|\tilde g(p,x)|^2(1+|p|^2)^{\rho}\,dp\,dx \qquad (23)$$
and
$$|g|_{\rho}^2 = \int|\hat g(t)|^2(1+|t|^2)^{\rho}\,dt. \qquad (24)$$
The Sobolev space $H^\rho(A)$ is defined as the set of measurable functions $g$ such that $|g|_\rho < \infty$.

Proof of Proposition 4.2: Define the score operator $A: T(\eta)\to L_2$,
$$Ab(z) = \frac{R(b\eta)(z)}{f_\eta(z)}\,1(f_\eta(z)>0),$$
where $R$ denotes the Radon transform
$$Ra(y,x) = \int a(\alpha)\,1(y = x'\alpha)\,d\alpha.$$
Define $g(z) = s(z)f_\eta(z)$ and $a(\alpha) = b(\alpha)\eta(\alpha)$. Since $f_\eta$ and $\eta$ are bounded, $g$ and $a$ are in $L_1(\lambda)\cap L_2(\lambda)$. From the definition of $Ra(y,x)$,
$$\sup_{y,x}|Ra(y,x)| \le \int|a(\alpha)|\,d\alpha < \infty, \qquad (25)$$
and since the supports of $\alpha$ and $X$ are bounded, the support of $Y$ is also bounded and $Ra\in L_2(\lambda)$, so we can view $R$ as an operator $R: L_2(\lambda)\to L_2(\lambda)$.

First, we show that if $s$ belongs to the closure of the range of $A$, then $g(z) = s(z)f_\eta(z)$ belongs to the closure of the range of $R$. Indeed, if $s_n$ is a sequence in the range of $A$ converging to $s$ in $L_2$, then $g_n = s_n f_\eta \equiv Ra_n$ and, since $f_\eta$ is bounded,
$$\int|g_n(z)-g(z)|^2\,dz \le \sup_z f_\eta(z)\int|s_n(z)-s(z)|^2 f_\eta(z)\,dz \to 0.$$
Next, we show that any function $g$ in the closure of the range of $R$ has a square-integrable weak derivative with respect to its first argument $y$. By Theorem 2.4.1 in Ramm and Katsevich (1996) and Assumption 3(iii), it follows that $|g|_{1,\rho}<\infty$ for $\rho = \rho_0 + (d_\alpha-1)/2$.
By standard results in Fourier analysis, with $\partial_y g$ denoting the weak derivative with respect to $y$,
$$\int_{S^{d_\alpha-1}}\int\left|\widetilde{\partial_y g}(p,x)\right|^2dp\,dx \le \int_{S^{d_\alpha-1}}\int|p|^2\,|\tilde g(p,x)|^2\,dp\,dx \le \int_{S^{d_\alpha-1}}\int|\tilde g(p,x)|^2(1+|p|^2)^{\rho}\,dp\,dx < \infty,$$
and similarly, by Cauchy-Schwarz,
$$\int_{S^{d_\alpha-1}}\int\left|\widetilde{\partial_y g}(p,x)\right|dp\,dx \le \int_{S^{d_\alpha-1}}\int(1+|p|^2)^{1/2}\,|\tilde g(p,x)|\,dp\,dx \le C\left(\int_{S^{d_\alpha-1}}\int(1+|p|^2)^{1-\rho}\,dp\,dx\right)^{1/2} < \infty,$$
because $\rho > 3/2$. Hence $\widetilde{\partial_y g}(p,x)\in L_1(\lambda)\cap L_2(\lambda)$ and, by Plancherel's theorem, $\partial_y g(\cdot)\in L_2(\lambda)$, as claimed.

Define $\varphi(\cdot) = \partial_y g(\cdot) \in L_2(\lambda)$. We proceed to verify the conditions of the dominated convergence theorem; see Lemma 3.2. First, we show that $g(y,x)$ is continuous in $y$. Indeed, by the bounded support assumption,
$$g(y,x) = \int_{-\infty}^{y}\varphi(u,x)\,du$$
is absolutely continuous in $y$ (see Royden 1968, Chapter 5). Next, by independence of $\alpha_i$ and $X_i$, $P[Y_i\le y\mid X_i = x] = P[x'\alpha_i\le y]$, and taking derivatives we conclude that $f_\eta(z) = \eta_x(y)$, the density of $x'\alpha_i$ evaluated at $y$. Thus, $f_\eta(z)$ is also continuous in $y$ by Assumption 3(i). Moreover, $\inf_{\alpha\in N}\eta_x(x'\alpha)\ge 1/l(x) > 0$, which yields the continuity of $\alpha\mapsto s(x'\alpha,x)$ on $N$. Furthermore, by Cauchy-Schwarz,
$$\int\sup_{\alpha\in\Gamma}|s(x'\alpha,x)|\,f_X(x)\,dx \le \int\sup_{\alpha\in\Gamma}|g(x'\alpha,x)|\,\sup_{\alpha\in\Gamma}\left|\frac{f_X(x)}{f_\eta(x'\alpha,x)}\right|dx \le \left(\int|\varphi(u,x)|^2\,du\,dx\right)^{1/2}\left(\int\sup_{\alpha\in\Gamma}\left|\frac{f_X(x)}{f_\eta(x'\alpha,x)}\right|^2dx\right)^{1/2} \le C\int l(x)f_X(x)\,dx \le C.$$
Thus, by dominated convergence, $r$ must be continuous on $N$. ∎

Proof of Corollary 4.2: The proof follows as in Corollaries 3.1 and 3.2. ∎
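The step $P[Y_i\le y\mid X_i=x] = P[x'\alpha_i\le y]$ implies that, for a unit vector $x$, the Radon transform of a density $a$ evaluated at $(y,x)$ is the density of $x'\alpha$ at $y$. The sketch below checks this by simulation for an assumed bivariate standard normal $a$, whose Radon transform at any unit $x$ is the $N(0,1)$ density, a smooth function of $y$. The direction angle, sample size, and bin grid are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 400_000
# Assumed density a(alpha): bivariate standard normal on R^2.
alpha = rng.standard_normal((n_draws, 2))

theta = 0.7                                    # arbitrary direction on the unit circle
x = np.array([np.cos(theta), np.sin(theta)])   # unit vector, so x'alpha ~ N(0, 1)
y = alpha @ x                                  # projections x'alpha

# Histogram estimate of the density of x'alpha, i.e. of Ra(., x).
edges = np.linspace(-4.0, 4.0, 81)
hist, edges = np.histogram(y, bins=edges, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

# Closed-form Radon transform of the standard normal density at a unit x.
radon_exact = np.exp(-0.5 * centers ** 2) / np.sqrt(2.0 * np.pi)
max_err = float(np.max(np.abs(hist - radon_exact)))
print(f"max abs deviation from the N(0,1) density: {max_err:.4f}")
```

Smoothness of $Ra$ in $y$ is the property that the proof transfers, through the Sobolev bound, to every element of the closure of the range of $R$.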
Proof of Proposition 4.3: A necessary condition for a reduced-form functional $\phi(\eta) = E_\eta[r(\alpha)]$ to be regularly identified is
$$r(\alpha)-\phi(\eta) = \int s(\alpha_1+\alpha_2x,\,x)\,d\nu_X(x),\qquad \alpha = (\alpha_1',\alpha_2')' = (\pi_0, U, \pi_1, \delta)'.$$
Thus, by Proposition 4.2, $r(\alpha)$ must be continuous on $N$. However, the influence function for the PPAME,
$$r_{PPAME}(\alpha) = 1(\pi_1>0,\ \delta>0) + 1(\pi_1<0,\ \delta<0),$$
is discontinuous at points of the form $(p_0,u,0,d)$ or $(p_0,u,p_1,0)$. We conclude that the PPAME is not regularly identified. As for the AME, since $E[\gamma]<\infty$, this functional is differentiable in the sense of van der Vaart (1991) with influence function $r_{AME}(\beta) = \pi_1/\delta$. Since there is no continuous function that is $\eta$-a.s. equal to $r_{AME}(\beta) = \pi_1/\delta$ when $(p_0,u,p_1,0)$ is a point in the interior of the support, we conclude that the AME is not regularly identified. ∎
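The two influence functions in this proof can be probed directly. The sketch below codes $r_{PPAME}$ and $r_{AME} = \pi_1/\delta$, with `pi1` and `delta` standing for the third and fourth components of $\alpha$ (the component names are our labeling), and evaluates them on either side of $\pi_1 = 0$ and $\delta = 0$. The PPAME influence function jumps by one no matter how close the two points are, and the AME influence function is unbounded near $\delta = 0$, so neither agrees a.s. with a continuous function there.

```python
def r_ppame(pi1: float, delta: float) -> float:
    """Influence function of the PPAME: 1(pi1 > 0, delta > 0) + 1(pi1 < 0, delta < 0)."""
    return float((pi1 > 0 and delta > 0) or (pi1 < 0 and delta < 0))


def r_ame(pi1: float, delta: float) -> float:
    """Influence function of the AME: pi1 / delta."""
    return pi1 / delta


for eps in (1e-1, 1e-4, 1e-8):
    jump_pi = r_ppame(eps, 1.0) - r_ppame(-eps, 1.0)       # across pi1 = 0
    jump_delta = r_ppame(1.0, eps) - r_ppame(1.0, -eps)    # across delta = 0
    ame_near_zero = r_ame(1.0, eps)                        # grows like 1 / eps
    print(f"eps = {eps:g}: jumps {jump_pi}, {jump_delta}; AME value {ame_near_zero:g}")
```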
Proof of Lemma 6.1: By Lemma 25.34 in van der Vaart (1998), the so-called score operator is given by
$$Ab(z) = E[b(\alpha)\mid Z = z],\qquad b\in T(\eta).$$
Thus, by the law of iterated expectations,
$$E[Ab(Z)s(Z)] = E[b(\alpha)s(Z)] = E\big[b(\alpha)E[s(Z)\mid\alpha]\big] = E\big[b(\alpha)\,\Pi_{T(\eta)}E[s(Z)\mid\alpha]\big].$$
In Lemma 3.2 we have shown that the functional $\eta\mapsto\phi(\eta) = E_\eta[r(\alpha)]$ is differentiable with influence function $\chi(\alpha) = \Pi_{T(\eta)}r(\alpha)$. The lemma then follows from Theorem 3.1 in van der Vaart (1991). ∎
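In a finite toy model, the iterated-expectations step in this proof is a matrix computation. The sketch below draws an arbitrary joint distribution for a discrete $\alpha$ with $K$ support points and a discrete $Z$ with $J$ support points, forms $Ab(z) = E[b(\alpha)\mid Z=z]$, and checks that $E[Ab(Z)s(Z)] = E[b(\alpha)E[s(Z)\mid\alpha]]$. Discreteness, the sizes $K$ and $J$, and the random $b$ and $s$ are assumptions for illustration; the projection $\Pi_{T(\eta)}$ is not modeled.

```python
import numpy as np

rng = np.random.default_rng(42)
K, J = 4, 6                                            # support sizes of alpha and Z (assumed)

eta = rng.dirichlet(np.ones(K))                        # marginal law of alpha
f_z_given_alpha = rng.dirichlet(np.ones(J), size=K)    # row k: law of Z given alpha = k
joint = eta[:, None] * f_z_given_alpha                 # P(alpha = k, Z = j)
p_z = joint.sum(axis=0)                                # marginal law of Z

b = rng.standard_normal(K)                             # direction b(alpha)
s = rng.standard_normal(J)                             # candidate score s(Z)

# Score operator: Ab(z) = E[b(alpha) | Z = z].
Ab = (joint * b[:, None]).sum(axis=0) / p_z

lhs = np.sum(p_z * Ab * s)                             # E[Ab(Z) s(Z)]
E_s_given_alpha = f_z_given_alpha @ s                  # E[s(Z) | alpha]
rhs = np.sum(eta * b * E_s_given_alpha)                # E[b(alpha) E[s(Z) | alpha]]
print(lhs, rhs)
```

The identity says that the adjoint of $A$ maps $s$ to $E[s(Z)\mid\alpha]$, which is why the smoothness of $\alpha\mapsto f_{z/\alpha}$ transfers to every element of the range of the adjoint.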
Proof of Lemma 6.2: The sieve measure of ill-posedness (cf. Blundell, Chen and Kristensen 2007) is
$$\tau_D = \sup_{b\in T(\eta),\,b\ne 0}\frac{\|b\|}{\|Ab\|}.$$
Since $T(\eta) = \mathrm{span}(l_\eta)$ and $E_\eta[l_\eta(\alpha)l_\eta'(\alpha)]$ is the identity, we can write $b = \lambda'l_\eta$, so that $\|b\|^2 = \lambda'\lambda = |\lambda|^2$, while $\|Ab\|^2 = \lambda'E[s(Z)s'(Z)]\lambda$. Thus,
$$\tau_D^2 = \sup_{\lambda\in\mathbb R^D,\,\lambda\ne0}\frac{|\lambda|^2}{\lambda'E[s(Z)s'(Z)]\lambda} = \frac{1}{\inf_{\lambda\in\mathbb R^D,\,|\lambda|=1}\lambda'E[s(Z)s'(Z)]\lambda} = \frac{1}{\xi_{\min}(D)}.$$
The bound then follows from Lemma 1 in Blundell, Chen and Kristensen (2007). ∎
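The closed form $\tau_D = \xi_{\min}(D)^{-1/2}$ is easy to compute once $E[s(Z)s'(Z)]$ is available. The sketch below uses a hypothetical Gram matrix with eigenvalues $1/k^2$, $k = 1,\dots,D$, expressed in a randomly rotated basis (an assumed polynomial decay, not taken from the paper). It computes $\tau_D$ from the smallest eigenvalue and confirms by random search over $\lambda$ that no direction exceeds it.

```python
import numpy as np


def sieve_ill_posedness(gram: np.ndarray) -> float:
    """tau_D = 1 / sqrt(smallest eigenvalue of E[s(Z) s(Z)'])."""
    return 1.0 / np.sqrt(np.linalg.eigvalsh(gram).min())


rng = np.random.default_rng(7)
D = 5
xi = 1.0 / np.arange(1, D + 1) ** 2                  # assumed eigenvalues 1/k^2
Q, _ = np.linalg.qr(rng.standard_normal((D, D)))     # random orthogonal basis
gram = Q @ np.diag(xi) @ Q.T                         # hypothetical E[s(Z) s(Z)']

tau = sieve_ill_posedness(gram)                      # equals D for this decay

# Ratio |lambda| / sqrt(lambda' gram lambda) for many random directions.
lams = rng.standard_normal((10_000, D))
quad = np.einsum("ij,jk,ik->i", lams, gram, lams)
ratios = np.linalg.norm(lams, axis=1) / np.sqrt(quad)
print(tau, ratios.max())
```

Under this assumed decay $\tau_D$ grows linearly in $D$, which illustrates how the bound deteriorates as the sieve dimension increases when the eigenvalues of $E[s(Z)s'(Z)]$ approach zero.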
10 Appendix B: Further Results
In this section we describe a generic approach for nonlinear RC models with continuous outcomes. We also illustrate how certain invertible RC models are ruled out by our conditions. For the generic RC model in (7), the regularity condition reads
$$r(\alpha)-\phi(\eta) = E[s(m(X_i,\alpha),X_i)]. \qquad (26)$$
Again, the main difficulty in proving that the right-hand side of (26) is continuous is that the score function $s(\cdot)$ is only known to be in $L_2$ (thus, $s$ is potentially very discontinuous). To overcome this difficulty, we resort to Fourier analysis and use Parseval's identity (see Rudin 1987, p. 187). To describe the method, assume $X$ is absolutely continuous with density $f_X(x)$, and define $g(z) = s(z)f_\eta(z)$ and
$$w(z,\alpha) = 1(y = m(x,\alpha))\,\frac{f_X(x)}{f_\eta(z)}\,1(f_\eta(z)>0).$$
Note that $g\in L_1(\lambda)$ and, since $f_\eta$ is bounded, also $g\in L_2(\lambda)$. Let $\eta_{m,x}$ denote the density of $m(x,\alpha)$ when $\alpha$ has density $\eta$. Under our conditions below, $w(\cdot,\alpha)\in L_1(\lambda)\cap L_2(\lambda)$, and by Parseval's identity, if $r$ satisfies (26) then
$$r(\alpha)-\phi(\eta) = \int\hat g(t)\,\overline{\hat w(t,\alpha)}\,dt, \qquad (27)$$
where, for a generic function $h\in L_2(\lambda)$, $\hat h(t) = (2\pi)^{-d_z/2}\int e^{-it'z}h(z)\,dz$ denotes the Fourier transform, with $i = \sqrt{-1}$ and $t = (t_1,t_2')'$, $\bar v$ denotes the complex conjugate of $v$, and
$$\overline{\hat w(t,\alpha)} = (2\pi)^{-d_z/2}\int\frac{f_X(x)}{\eta_{m,x}(m(x,\alpha))}\,e^{i(t_1m(x,\alpha)+t_2'x)}\,dx.$$
This integral representation is now amenable to our Lemma 3.2 under the following assumption.
Assumption 5 (i) The vector $X$ is absolutely continuous with a bounded density $f_X(\cdot)$; (ii) the density $\eta_{m,x}$ is continuous and satisfies $\inf_{\alpha\in N}\eta_{m,x}(m(x,\alpha)) > 1/l(x)$ for an a.s. positive measurable function $l(\cdot)$ such that $E_X[l(X)]<\infty$; (iii) the function $\alpha\mapsto m(x,\alpha)$ is continuous a.s. in $x$; (iv) for all $\hat g$ satisfying (27),
$$\int|\hat g(t)|\sup_{\alpha\in\Gamma}\left|\hat w(t,\alpha)\right|dt<\infty. \qquad (28)$$

Proposition 10.1
Under Assumption 5, if $r$ satisfies (11), then $r(\cdot)$ must be continuous on $N$.

Proof of Proposition 10.1: First, we need to check that $g$ and $w(\cdot,\alpha)$ are in $L_1(\lambda)\cap L_2(\lambda)$, so that we can apply Parseval's identity. From $s\in L_2$ and the definition $g(z) = s(z)f_\eta(z)$, it is clear that $g\in L_1(\lambda)$. Next, note that
$$f_\eta(z)\le\int_{\mathbb R^d}d\eta(\alpha) = 1,$$
so $g$ also belongs to $L_2(\lambda)$. Furthermore, by independence of $\alpha_i$ and $X_i$, $P[Y_i\le y\mid X_i = x] = P[m(x,\alpha_i)\le y]$, and taking derivatives we conclude that $f_\eta(z) = \eta_{m,x}(y)$. Then, for $p = 1$ or $2$,
$$\int|w(z,\alpha)|^p\,dz = \int\left|\frac{f_X(x)}{\eta_{m,x}(m(x,\alpha))}\right|^p dx \le \int l^p(x)\,|f_X(x)|^p\,dx \le C\int l^p(x)\,f_X(x)\,dx<\infty,$$
because $f_X$ is bounded. We can then apply Parseval's identity and obtain
$$r(\alpha)-\phi(\eta) = \int\hat g(t)\,\overline{\hat w(t,\alpha)}\,dt.$$
We now proceed to verify the conditions of Lemma 3.2, with $\hat g(\cdot)$ playing the role of $s$ and $\overline{\hat w(t,\alpha)}$ that of the conditional density. Note that
$$\overline{\hat w(t,\alpha)} = (2\pi)^{-d_z/2}\int\frac{f_X(x)}{\eta_{m,x}(m(x,\alpha))}\,e^{i(t_1m(x,\alpha)+t_2'x)}\,dx.$$
Under the conditions of the proposition, the function $\alpha\mapsto\hat w(t,\alpha)$ is continuous on $N$, since $\eta_{m,x}(\cdot)$ and $m(x,\cdot)$ are continuous and $\eta_{m,x}(m(x,\alpha))$ is bounded away from zero on $N$. Furthermore, the dominance condition holds by (28). We conclude by applying dominated convergence once more, under the dominance condition in Assumption 5(iv). ∎
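Parseval's identity in (27) has an exact discrete analogue that shows why the unitary normalization $(2\pi)^{-d_z/2}$ is used: a unitary Fourier transform preserves inner products. The sketch below checks $\sum g\,\bar w = \sum \hat g\,\overline{\hat w}$ on a two-dimensional grid, mirroring $z = (y,x)$ with a scalar $x$, using NumPy's `fft2` with `norm="ortho"` as the discrete unitary transform. The random arrays are stand-ins for $g$ and $w(\cdot,\alpha)$, not the model's functions.

```python
import numpy as np

rng = np.random.default_rng(3)
shape = (32, 32)                       # grid over z = (y, x), assumed d_z = 2

g = rng.standard_normal(shape)         # real stand-in for g(z) = s(z) f_eta(z)
w = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)  # stand-in for w(z, alpha)

# Unitary DFT: the discrete counterpart of (2 pi)^{-d_z/2} int e^{-i t'z} h(z) dz.
g_hat = np.fft.fft2(g, norm="ortho")
w_hat = np.fft.fft2(w, norm="ortho")

lhs = np.sum(g * np.conj(w))           # <g, w> in z-space
rhs = np.sum(g_hat * np.conj(w_hat))   # <g_hat, w_hat> in frequency space
print(lhs, rhs)
```

Moving to frequency space is what allows the rough factor $s$ to be replaced by $\hat g$, while all dependence on $\alpha$ sits in the smooth factor $\hat w(t,\alpha)$.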
Among the conditions of Assumption 5, the most important one is (28). We now show that this condition fails in a class of invertible models. Consider the canonical monotonic nonseparable model
$$Y_i = m(X_i,\alpha_i),$$
with a scalar $\alpha_i$ and where $\alpha\mapsto m(x,\alpha)$ is strictly increasing with inverse $m^{-1}(y,x)$. Then, if we define $s(Y_i,X_i) = 1(m^{-1}(Y_i,X_i)\le 0)$, the regularity condition of Lemma 3.1 is satisfied with $r(\alpha) = 1(\alpha\le 0)$, proving that the necessary condition for regular identification of the CDF at 0 (or, in fact, at any other point) holds. In invertible models like this, regularity of CDFs and quantiles holds even when $m$ is not known but is identified. Our results do not apply to invertible models where heterogeneity can be recovered as an identified function of observables.

To give a specific example, consider the model $Y_i = X_i + \alpha_i$, where $s(Y_i,X_i) = 1(Y_i\le X_i)$ solves (2) with $r(\alpha) = 1(\alpha\le 0)$, which is discontinuous at 0. This is of course an unrealistic model, but the idea is simply to illustrate which of our assumptions is key for the results to hold. In this example, Assumption 5(i)-(ii) is satisfied under mild conditions, since $\eta_{m,x}(m(x,\alpha)) = \eta(\alpha)$, but the integrability condition (28) fails, since for $s(Y_i,X_i) = 1(Y_i\le X_i)$,
$$\int|\hat g(t)|\sup_{\alpha\in\Gamma}|\hat w(t,\alpha)|\,dt = \frac{1}{\inf_{\alpha\in\Gamma}\eta(\alpha)}\int|\hat g(t)|\,dt = \infty,$$
where $\hat g(t) = \int_{\alpha\le 0}\eta(\alpha)e^{it\alpha}\,d\alpha$. Note that the discontinuity implies the lack of integrability: the jump of $1(\alpha\le 0)\eta(\alpha)$ at zero makes $|\hat g(t)|$ decay only like $\eta(0)/|t|$, which is not integrable.
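The last remark can be quantified. If $\eta$ is smooth with $\eta(0) > 0$, integrating by parts gives $\hat g(t) = \eta(0)/(it) + O(t^{-2})$ for $\hat g(t) = \int_{\alpha\le0}\eta(\alpha)e^{it\alpha}d\alpha$, so $|\hat g(t)|$ decays only like $\eta(0)/|t|$. The sketch below assumes that $\eta$ is the standard normal density, approximates $\hat g(t)$ by the trapezoidal rule on $[-10, 0]$ (the mass below $-10$ is negligible), and shows that $t\,|\hat g(t)|$ approaches $\eta(0) = 1/\sqrt{2\pi}\approx 0.3989$.

```python
import numpy as np


def g_hat(t: float, lower: float = -10.0, n_steps: int = 100_000) -> complex:
    """Trapezoidal approximation of int_{lower}^{0} eta(a) exp(i t a) da with eta = N(0,1) density."""
    a = np.linspace(lower, 0.0, n_steps + 1)
    h = a[1] - a[0]
    f = np.exp(-0.5 * a ** 2) / np.sqrt(2.0 * np.pi) * np.exp(1j * t * a)
    return complex(h * (f.sum() - 0.5 * (f[0] + f[-1])))


eta0 = 1.0 / np.sqrt(2.0 * np.pi)
for t in (10.0, 50.0, 200.0):
    print(f"t = {t:6.1f}   t*|g_hat(t)| = {t * abs(g_hat(t)):.5f}   (eta(0) = {eta0:.5f})")
```

Since $t\,|\hat g(t)|$ settles at a positive constant, $\int|\hat g(t)|\,dt$ diverges logarithmically, which is the failure of (28) described above.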
There is a growing literature in econometrics identifying the distribution of latent variables by means of Kotlarski's Lemma (see Prakasa Rao (1983) for a description of the method). In this setting we observe $Z = (Y_1,Y_2)$ satisfying
$$Y_1 = \alpha_1+\alpha_2,\qquad Y_2 = \alpha_1+\alpha_3,$$
where $\alpha = (\alpha_1,\alpha_2,\alpha_3)'$ is a vector of UH with independent components and (with some abuse of notation) Lebesgue densities $\eta_j$, for $j = 1,2,3$. The density of the data is given by
$$f_\eta(y_1,y_2) = \int 1(y_1 = \alpha_1+\alpha_2)\,1(y_2 = \alpha_1+\alpha_3)\,\eta_1(\alpha_1)\,\eta_2(\alpha_2)\,\eta_3(\alpha_3)\,d\alpha_1\,d\alpha_2\,d\alpha_3 = \int\eta_2(y_1-\alpha_1)\,\eta_3(y_2-\alpha_1)\,\eta_1(\alpha_1)\,d\alpha_1.$$
Consider a parametric submodel where $\eta_2$ and $\eta_3$ are known and continuous. The model then reduces to our original setting, where $f_{z/\alpha}(z) = \eta_2(y_1-\alpha_1)\,\eta_3(y_2-\alpha_1)$ is known and continuous in $\alpha_1$. If the dominance condition of Lemma 3.2 is satisfied, then the CDF and quantiles of $\eta_1$ are irregularly identified.

The Mixed Proportional Hazard Model leads to a conditional density for the duration $Y$ given a vector of covariates $X$ of the form
$$f_\eta(y,x) = \int\phi(x)\,\psi(y)\,\alpha\,e^{-\phi(x)\Psi(y)\alpha}\,d\eta(\alpha),$$
where $\phi(x)$ is a transformation of covariates, $\Psi(y)$ is the baseline cumulative hazard with derivative $\psi$, and $\alpha$ denotes UH. In the submodel where $\phi(x)$ and $\psi(y)$ are known, the model fits our original formulation with $f_{z/\alpha}(z) = \phi(x)\psi(y)\,\alpha\,e^{-\phi(x)\Psi(y)\alpha}$, which is known and continuous as a function of $\alpha$. Indeed, Horowitz (1999) has established very slow (logarithmic) rates of convergence for the CDF of $\alpha$, consistent with irregular identification.

The necessary condition for regular estimation in van der Vaart (1991) is quite general, and in its abstract form reads $\tilde\psi\in\mathcal R(A^*)$, where $\tilde\psi$ is the so-called gradient, which for our original moment functional is $\tilde\psi(\alpha) = r(\alpha)-\phi(\eta)$, and $A^*$ is the adjoint of the so-called score operator $A$. In many semiparametric models, $A^*$ is a smoothing integral operator, in the sense that
$$A^*s = \int s(z)\,k(z,\alpha)\,d\mu(z)$$
is an operator from $L_2$ to $L_2(\eta)$ with a kernel function $k$ such that $\alpha\mapsto k(z,\alpha)$ is smooth, at least for some submodel. We expect our results to be potentially applicable in this general setting.

We report here further results for the estimation of quantiles in the Mixed Logit Model. The setting is that of the Monte Carlo section, the only difference being that quantile levels $\tau$ other than the median ($\tau = 0.5$) are considered. Table 3 reports the RMSE. We observe that, as expected, the RMSEs at the more extreme quantiles are larger than those for the median. Again, the gains from regularization are substantial, particularly for large values of R.

Table 3. RMSE for τ-quantiles of marginals of UH

τ     n     D    RMSEQ1  RMSEQ1-PCR  RMSEQ2  RMSEQ2-PCR
0.25  100   25   1.6739  0.9308      1.7238  0.7270
0.25  500   25   1.3247  0.8674      1.3270  0.6305
0.25  1000  25   1.0596  0.8075      1.1569  0.6250
0.25  100   100  1.8369  0.6098      1.8217  0.6608
0.25  500   100  1.4038  0.4929      1.3763  0.5504
0.25  1000  100  1.3153  0.4832      1.2041  0.5036
0.25  100   500  1.8075  0.5893      1.8696  0.6275
0.25  500   500  1.4529  0.4520      1.4698  0.4894
0.25  1000  500  1.2954  0.4444      1.2481  0.4500
0.75  100   25   1.5719  0.9803      1.6573  0.9928
0.75  500   25   1.1938  0.8077      1.3158  0.7953
0.75  1000  25   0.8941  0.7045      1.1489  0.7525
0.75  100   100  1.8192  0.8616      1.7989  0.8099
0.75  500   100  1.2178  0.6029      1.1809  0.5936
0.75  1000  100  0.8947  0.5495      0.9476  0.5358
0.75  100   500  1.9017  0.8107      1.9402  0.8329
0.75  500   500  1.2381  0.5666      1.2324  0.5467
0.75  1000  500  0.9606  0.4885      0.9533  0.5102

References
Alvarez, F., K. Borovičková, and R. Shimer (2016): “Decomposing Duration Dependence in a Stopping Time Model,” unpublished manuscript.
Arellano, M. and S. Bonhomme (2012): “Identifying Distributional Characteristics in Random Coefficients Panel Data Models,” Review of Economic Studies, 79, 987-1020.
Bajari, P., J.T. Fox, and S. Ryan (2007): “Linear Regression Estimation of Discrete Choice Models with Nonparametric Distributions of Random Coefficients,” American Economic Review, 72, 459-463.
Begun, J. M., W. J. Hall, W. M. Huang, and J. A. Wellner (1983): “Information and Asymptotic Efficiency in Parametric-Nonparametric Models,” The Annals of Statistics, 11, 432-452.
Beran, R. and P. Hall (1992): “Estimating Coefficient Distributions in Random Coefficient Regressions,” The Annals of Statistics, 20, 1970-1984.
Beran, R., A. Feuerverger, and P. Hall (1996): “On Nonparametric Estimation of Intercept and Slope Distributions in Random Coefficient Regression,” The Annals of Statistics, 24, 2569-2592.
Bester, A., and C. Hansen (2007): “Flexible Correlated Random Effects Estimation in Panel Models with Unobserved Heterogeneity,” unpublished manuscript.
Bickel, P. J., C. A. J. Klaassen, Y. Ritov, and J. A. Wellner (1998): Efficient and Adaptive Estimation for Semiparametric Models. New York: Springer-Verlag.
Blundell, R., X. Chen, and D. Kristensen (2007): “Semi-nonparametric IV Estimation of Shape-invariant Engel Curves,” Econometrica, 75, 1613-1670.
Bonhomme, S. (2011): “Panel Data, Inverse Problems, and the Estimation of Policy Parameters,” unpublished manuscript.
Boyd, J.H., and R.E. Mellman (1980): “Effect of Fuel Economy Standards on the US Automotive Market: An Hedonic Demand Analysis,” Transportation Research B.
Briesch, R. A., P. K. Chintagunta, and R. L. Matzkin (2010): “Nonparametric Discrete Choice Models With Unobserved Heterogeneity,” Journal of Business & Economic Statistics, 28, 291-307.
Cardell, N.S. and F.C. Dunbar (1980): “Measuring the Societal Impacts of Automobile Downsizing,” Transportation Research B.
Chamberlain, G. (1986): “Asymptotic Efficiency in Semi-Parametric Models with Censoring,” Journal of Econometrics, 34, 305-334.
Chamberlain, G. (1992): “Efficiency Bounds for Semiparametric Regression,” Econometrica, 60, 567-596.
Chen, J. (1995): “Optimal Rate of Convergence for Finite Mixture Models,” The Annals of Statistics, 23, 221-233.
Chen, X. (2007): “Large Sample Sieve Estimation of Semi-Nonparametric Models,” in: Handbook of Econometrics, vol. 7. Elsevier.
Chen, X. and Z. Liao (2014): “Sieve M-Inference of Irregular Parameters,” Journal of Econometrics, 182, 70-86.
Elbers, C. and G. Ridder (1982): “True and Spurious Duration Dependence: The Identifiability of the Proportional Hazard Model,” Review of Economic Studies, 49, 403-409.
Engl, H. W., M. Hanke, and A. Neubauer (1996): Regularization of Inverse Problems. Dordrecht: Kluwer Academic Publishers.
Escanciano, J.C. (2020): “Semiparametric Identification and Fisher Information,” unpublished manuscript.
Florens, J.P., J.J. Heckman, C. Meghir, and E. Vytlacil (2008): “Identification of Treatment Effects Using Control Functions in Models with Continuous, Endogenous Treatment and Heterogeneous Effects,” Econometrica, 76, 1191-1206.
Fox, J. T., K.-I. Kim, S. P. Ryan, and P. Bajari (2011): “A Simple Estimator for the Distribution of Random Coefficients,” Quantitative Economics, 2, 381-418.
Fox, J. T., K.-I. Kim, S. P. Ryan, and P. Bajari (2012): “The Random Coefficients Logit Model Is Identified,” Journal of Econometrics, 166(2), 204-212.
Fox, J. T., K.-I. Kim, and C. Yang (2016): “A Simple Nonparametric Approach to Estimating the Distribution of Random Coefficients in Structural Models,” Journal of Econometrics, 195, 236-254.
Gautier, E. and Y. Kitamura (2013): “Nonparametric Estimation in Random Coefficients Binary Choice Models,” Econometrica, 81, 581-607.
Graham, B. W., and J. L. Powell (2012): “Identification and Estimation of Average Partial Effects in ‘Irregular’ Correlated Random Coefficient Panel Data Models,” Econometrica, 80, 2105-2152.
Groeneboom, P. and J.A. Wellner (1992): Information Bounds and Nonparametric Maximum Likelihood Estimation. DMV Seminar, vol. 19. Basel: Birkhäuser.
Heckman, J. J. (2001): “Micro Data, Heterogeneity, and the Evaluation of Public Policy: Nobel Lecture,” Journal of Political Economy, 109(4), 673-748.
Heckman, J.J. and B. Singer (1984a): “The Identifiability of the Proportional Hazard Model,” Review of Economic Studies, 51, 231-241.
Heckman, J.J. and B. Singer (1984b): “A Method for Minimizing the Impact of Distributional Assumptions in Econometric Models for Duration Data,” Econometrica, 52, 271-320.
Heckman, J.J., J. Smith, and N. Clements (1993): “Making The Most Out of Programme Evaluations and Social Experiments: Accounting For Heterogeneity in Programme Impacts,” Review of Economic Studies, 64, 487-535.
Heckman, J.J. and E. Vytlacil (1998): “Instrumental Variables Methods for the Correlated Random Coefficient Model: Estimating the Average Rate of Return to Schooling When the Return is Correlated with Schooling,” Journal of Human Resources, 33, 974-987.
Heckman, J.J. and E. Vytlacil (2005): “Structural Equations, Treatment Effects, and Econometric Policy Evaluation,” Econometrica, 73, 669-738.
Heiss, F., S. Hetzenecker, and M. Osterhaus (2019): “Nonparametric Estimation of the Random Coefficients Model: An Elastic Net Approach,” unpublished manuscript.
Heinrich, P. and J. Kahn (2018): “Optimal Rates for Finite Mixture Estimation,” The Annals of Statistics, 46, 2844-2870.
Hildreth, C. and J.P. Houck (1968): “Some Estimators for a Linear Model with Random Coefficients,” Journal of the American Statistical Association, 63, 584-592.
Hille, E. and J. D. Tamarkin (1931): “On the Characteristic Values of Linear Integral Equations,” Acta Mathematica, 57, 1-76.
Hoderlein, S., H. Holzmann, and A. Meister (2017): “The Triangular Model with Random Coefficients,” Journal of Econometrics, 201, 144-169.
Hoderlein, S., J. Klemelä, and E. Mammen (2010): “Analyzing the Random Coefficient Model Nonparametrically,” Econometric Theory, 26, 804-837.
Hoderlein, S. and B. Sherman (2015): “Identification and Estimation in a Correlated Random Coefficients Binary Response Model,” Journal of Econometrics, 188, 135-149.
Horowitz, J. L. (1999): “Semiparametric Estimation of a Proportional Hazard Model with Unobserved Heterogeneity,” Econometrica, 67, 1001-1028.
Horowitz, J. L. and L. Nesheim (2019): “Using Penalized Likelihood to Select Parameters in a Random Coefficients Multinomial Logit Model,” forthcoming in Journal of Econometrics.
Hu, Y., and S. M. Schennach (2008): “Instrumental Variable Treatment of Nonclassical Measurement Error Models,” Econometrica, 76(1), 195-216.
Hurwicz, L. (1950): “Generalization of the Concept of Identification,” in Statistical Inference in Dynamic Economic Models, ed. T.C. Koopmans. New York: Wiley.
Ichimura, H., and T. Thompson (1998): “Maximum Likelihood Estimation of a Binary Choice Model with Random Coefficients of Unknown Distribution,” Journal of Econometrics, 86, 269-295.
Imbens, G., and J. Angrist (1994): “Identification and Estimation of Local Average Treatment Effects,” Econometrica, 61, 467-476.
Khan, S. and E. Tamer (2010): “Irregular Identification, Support Conditions, and Inverse Weight Estimation,” Econometrica, 6, 2021-2042.
Khuri, A. and G. Casella (2002): “The Existence of the First Negative Moment Revisited,” The American Statistician, 56, 44-47.
Lewbel, A. and K. Pendakur (2017): “Unobserved Preference Heterogeneity in Demand Using Generalized Random Coefficients,” Journal of Political Economy, 125, 1100-1148.
Lewbel, A. (2019): “The Identification Zoo – Meanings of Identification in Econometrics,” Journal of Economic Literature, 57(4), 835-903.
Masten, M.A. (2017): “Random Coefficients on Endogenous Variables in Simultaneous Equations Models,” Review of Economic Studies, 85, 1193-1250.
Masten, M.A. and A. Torgovitsky (2016): “Identification of Instrumental Variables Correlated Random Coefficients Models,” The Review of Economics and Statistics, 98, 1001-1005.
Matzkin, R. L. (2007): “Nonparametric Identification,” in J.J. Heckman and E.E. Leamer (eds.), Handbook of Econometrics, Vol. 6b, pp. 5307-5368. New York: Elsevier.
Matzkin, R. L. (2013): “Nonparametric Identification in Structural Economic Models,” Annual Review of Economics, 5.
Newey, W. K. (1990): “Semiparametric Efficiency Bounds,” Journal of Applied Econometrics, 5, 99-135.
Newey, W. K. (1994): “The Asymptotic Variance of Semiparametric Estimators,” Econometrica, 62, 1349-1382.
Prakasa Rao, B. L. S. (1983): Nonparametric Functional Estimation. New York: Academic Press.
Ramm, A. G. and A. I. Katsevich (1996): The Radon Transform and Local Tomography. CRC Press.
Royden, H. L. (1968): Real Analysis. Second Edition. New York: The Macmillan Company.
Rudin, W. (1987): Real and Complex Analysis.
Severini, T. A., and G. Tripathi (2006): “Some Identification Issues in Nonparametric Linear Models with Endogenous Regressors,” Econometric Theory, 22(2), 258-278.
Severini, T. A., and G. Tripathi (2012): “Efficiency Bounds for Estimating Linear Functionals of Nonparametric Regression Models with Endogenous Regressors,” Journal of Econometrics, 170(2), 491-498.
Shen, X. (1997): “On Methods of Sieves and Penalization,” The Annals of Statistics, 25, 2555-2591.
Swamy, P.A.V.B. (1970): “Efficient Inference in a Random Coefficient Model,” Econometrica, 38, 311-323.
van der Vaart, A. W. (1991): “On Differentiable Functionals,” The Annals of Statistics, 19, 178-204.
van der Vaart, A. W. (1998): Asymptotic Statistics, vol. 3 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge: Cambridge University Press.
Wooldridge, J. (1997): “On Two Stage Least Squares Estimation of the Average Treatment Effect in a Random Coefficient Model,” Economics Letters, 56, 129-133.
Wooldridge, J. (2003): “Further Results on Instrumental Variables Estimation of Average Treatment Effects in the Correlated Random Coefficient Model,” Economics Letters, 79, 185-191.
Wooldridge, J. (2008): “Instrumental Variables Estimation of the Average Treatment Effect in the Correlated Random Coefficient Model,” in Modelling and Evaluating Treatment Effects in Econometrics (