Identification of Random Coefficient Latent Utility Models
arXiv [econ.EM]

Roy Allen, Department of Economics, University of Western Ontario
John Rehbeck, Department of Economics, The Ohio State University
February 3, 2020
Abstract
This paper provides nonparametric identification results for random coefficient distributions in perturbed utility models. We cover discrete and continuous choice models. We establish identification using variation in mean quantities, and the results apply when an analyst observes aggregate demands but not whether goods are chosen together. We require exclusion restrictions and independence between random slope coefficients and random intercepts. We do not require large-support regressors or parametric assumptions.

∗ Any remaining errors are our own.

1 Introduction
Latent utility models with linear random coefficients have been used extensively. They have a long history in discrete choice and have become increasingly popular due to computational advances (see, e.g., Train [2009]). Following Berry et al. [1995], they now form the core demand system of most applied work on demand for differentiated products. Progress has been made on identification of these models in discrete choice, but gaps remain, even in semiparametric settings. For example, nonparametric identification of the distribution of random coefficients in the random coefficients nested logit model has not been established without unbounded regressors. More broadly, there is growing interest in models that allow complementarity, but even less is known about identification of random coefficients in these models.

The main contribution of this paper is to establish nonparametric identification of the moments of random coefficients in a general class of latent utility models. The framework applies to discrete and continuous choices. As a special case, we establish identification for a bundles model with limited consideration of either alternatives or characteristics (Example 2). Identification depends only on the average structural function [Blundell and Powell, 2003]. Thus the results can be applied when one observes the average demands of individuals without observing whether goods are chosen together. Leveraging the main result, we can identify the distribution of random coefficients when it is characterized by its moments (e.g., normal distributions). Specialized to discrete choice, the main contribution is new because it does not require any regressor to be unbounded.

Two key ingredients let us get traction for identification. First, we assume independence between random slopes and random intercepts. This is a standard assumption in the widely used random coefficients logit model. We use this assumption to integrate out the random intercepts.
This smooths out demand when conditioning on … known, we do not.

Footnote: Heckman [2001] attributes the first use to Domencich and McFadden [1975] in economics.

Footnote: See Nevo [2000], pp. 524–526.

Footnote: il Kim [2014] has established identification in the special case where intercept location coefficients are 0.

Footnote: Recent work includes Gentzkow [2007], McFadden and Fosgerau [2012], Fosgerau et al. [2019], Allen and Rehbeck [2019a], Ershov et al. [2018], Monardo [2019], Iaria and Wang [2019], and Wang [2020]. This work is an outgrowth of the discrete choice additive random utility model [McFadden, 1981] and differs from classic continuous demand systems (e.g., Deaton and Muellbauer [1980]) by focusing on characteristic variation rather than variation in a budget constraint.

The fundamental shape restriction we exploit is that, after integrating out random intercepts, integrated mean choices are the gradient of a convex function. This follows from an application of the envelope theorem. Similar tools from convex analysis have also been used for identification of hedonic models [Ekeland et al., 2002, 2004, Heckman et al., 2010, Chernozhukov et al., 2019b], matching [Galichon and Salanié, 2015], dynamic discrete choice [Chiong et al., 2016], discrete choice panel models [Shi et al., 2018], and perturbed models with additively separable heterogeneity [Allen and Rehbeck, 2019a], among others.

By exploiting the envelope theorem, we can treat several models in a common framework. There is little work on identification of optimizing models with linear random coefficients outside of discrete choice. Exceptions include Dunker et al. [2018] for discrete games, and Dunker et al. [2017] and Iaria and Wang [2019] for a random coefficients version of the Gentzkow [2007] discrete bundles model. We differ by requiring identification of only the average structural function ("mean demands"), without needing to observe the frequency with which goods are chosen together.

Identification with linear random coefficients has also been established in settings that do not assume an optimizing model; see, for example, the simultaneous equations analysis in Masten [2017] and references therein.

Identification of linear random coefficients has been extensively studied in discrete choice. Despite this, nonparametric identification has only been established either … Several of these papers additionally assume that the large support regressors have the same coefficient across goods. In contrast, Fox et al. [2012] and il Kim [2014] do not assume large support or a homogeneous regressor, but assume the distribution of the random intercept is known (e.g., logit). Chernozhukov et al. [2019a] discuss identification of ratios of certain moments of the distribution of random coefficients without requiring large support, but do not provide conditions under which the full distribution is identified.

Footnote: In discrete choice, assuming this function is known translates to the distribution of random intercepts being known (e.g., logit). Lewbel and Pendakur [2017] show that one can drop the assumption that this mapping is known in some settings if one imposes an additional additive separability assumption.

Footnote: Matzkin [1994] reviews other identification results using shape restrictions motivated by economic theory. See also work on optimal transport, as in Galichon [2018].

Footnote: Wang [2020] also works with the average structural function but does not identify the distribution of random coefficients.

The remainder of the paper proceeds as follows. Section 2 provides details on the class of latent utility models we study and examples of behavior that this covers. Section 3 provides the main result, which identifies arbitrary order moments of random coefficients, and shows that a single independence and scale assumption can be used to identify all other moments. Section 4 discusses how to recover different welfare objects and perform counterfactuals.
Finally, Section 5 discusses relations to some existing papers, shows how the results can be taken to settings with non-linear random coefficients, and discusses some testable properties of the framework.
This paper studies the random coefficients perturbed utility model, in which optimizing choices satisfy
$$Y(X,\beta,\varepsilon) \in \operatorname*{argmax}_{y \in B} \sum_{k=1}^{K} y_k (\beta_k X_k) + D(y,\varepsilon). \tag{1}$$

Footnote: Exceptions include Kashaev [2018] and Matzkin [2019], but neither paper studies nonparametric identification for the distribution of linear random coefficients.

Here $Y(\cdot)$ denotes the quantity vector for $K$ different goods. The vector $X_k = (X_{k,1}, \ldots, X_{k,d_k})$ denotes observable shifters of the desirability of good $k$, and $\beta_k = (\beta_{k,1}, \ldots, \beta_{k,d_k})$ denotes random coefficients on these shifters, which may be good-specific. The index $\beta_k X_k$ shifts the marginal utility of good $k$. We collect $X = (X_1, \ldots, X_K)$ and $\beta = (\beta_1, \ldots, \beta_K)$. The term $D(y,\varepsilon)$ is a disturbance that depends on unobservables $\varepsilon$ of unrestricted dimension. When $D(y,\varepsilon) = \sum_{k=1}^{K} \varepsilon_k y_k$, $\varepsilon_k$ can be interpreted as a random intercept for the desirability of the $k$-th good. In general, we refer to $D(y,\varepsilon)$ as the random intercept. The set $B \subseteq \mathbb{R}^K$ is a feasibility set. It is introduced purely for exposition, since $D(y,\varepsilon)$ can equal $-\infty$, which allows random feasibility sets.

The focus of this paper is on identification of moments of the distribution of $\beta$. Our results do not require specification of the budget $B$, the disturbance $D$, or the distribution of $\varepsilon$. For concreteness, we provide some examples.

Example 1 (Discrete Choice). Consider a discrete choice model with latent utility for good $k$ of the form
$$v_k = \beta_k X_k + \varepsilon_k.$$
When $\beta$ is random, this is a linear random coefficients model as studied in Hausman and Wise [1978], Boyd and Mellman [1980], and Cardell and Dunbar [1980], among many others. This fits into the setup of (1) by setting $D(y,\varepsilon) = \sum_{k=1}^{K} y_k \varepsilon_k$, letting $B = \{y \in \mathbb{R}^K \mid \sum_{k=1}^{K} y_k = 1,\ y_k \geq 0\}$ be the probability simplex, and letting $Y(X,\beta,\varepsilon) \in \{0,1\}^K$ be a vector of indicators denoting which good is chosen. In many applications, an "outside good" is set to have a utility of 0.
This can be mapped to our setup by replacing the budget with $B = \{y \in \mathbb{R}^K \mid \sum_{k=1}^{K} y_k \leq 1,\ y_k \geq 0\}$; this allows $Y(X,\beta,\varepsilon) = (0, \ldots, 0) \in B$, which can be interpreted as the choice of the outside option.

We also cover what is sometimes called the perturbed representation of choice, which can model market shares or individuals who like variety. For example, Anderson et al. [1988] show that logit models are related to the maximization problem
$$\max_{y \in \Delta_K} \sum_{k=1}^{K} y_k (\beta_k X_k) - \sum_{k=1}^{K} y_k \log(y_k),$$
with $\Delta_K$ the probability simplex.

Footnote: This allows the agent to randomize when there are utility ties.

Hofbauer and Sandholm [2002] show that by replacing the additive entropy term with a general disturbance, the setup covers all additive random utility models of discrete choice once random intercepts are integrated out. Fudenberg et al. [2015] study a model in which the disturbance is additively separable. Fosgerau et al. [2019] and Allen and Rehbeck [2019b] show how to model complementarity with the perturbed utility representation.

Example 2 (Bundles with Limited Consideration). Gentzkow [2007] presents a model of choice of bundles involving online and print news. The model involves multiple goods, and individuals can choose more than one good at the same time. A random coefficients version of the model has been studied in Dunker et al. [2017]. Let $v_{j,k}$ denote the utility associated with quantity $j$ of the first good and quantity $k$ of the second good. Specify utilities
$$v_{0,0} = 0, \quad v_{1,0} = \beta_1 X_1 + \varepsilon_{1,0}, \quad v_{0,1} = \beta_2 X_2 + \varepsilon_{0,1}, \quad v_{1,1} = v_{1,0} + v_{0,1} + \varepsilon_{1,1}.$$
Here, $\varepsilon_{1,1}$ denotes a utility boost or loss from purchasing both goods relative to the sum of their individual utilities. It describes complementarity or substitutability between the goods. For each quantity vector $\vec{y} = (y_1, y_2)$, set the utility as
$$\sum_{k=1}^{2} y_k (\beta_k X_k) + (y_1 \varepsilon_{1,0} + y_2 \varepsilon_{0,1} + y_1 y_2 \varepsilon_{1,1}),$$
and let the budget be $B = \{0,1\}^2$. Then the optimizing quantity vector $Y(X,\beta,\varepsilon) \in \{0,1\}^2$ fits into the setup of (1).

This can be modified to include latent budgets.
One may interpret these as mental "consideration sets" [Eliaz and Spiegler, 2011, Masatlioglu et al., 2012, Manzini and Mariotti, 2014, Aguiar, 2017] or general latent feasibility sets [Manski, 1977, Conlon and Mortimer, 2013, Brady and Rehbeck, 2016]. In addition, we can allow limited consideration of the characteristics of goods.

Footnote: See Gabaix [2019] for a survey of "behavioral inattention."

To model these types of limited attention, consider a version of the bundles model given by
$$Y(X,\beta,\varepsilon) \in \operatorname*{argmax}_{y \in \{0,1\}^2} \sum_{k=1}^{2} y_k (\beta_k X_k) + D(y,\varepsilon),$$
where
$$D(y,\varepsilon) = \begin{cases} y_1 \varepsilon_{1,0} + y_2 \varepsilon_{0,1} + y_1 y_2 \varepsilon_{1,1} & \text{if } y \in B(\varepsilon), \\ -\infty & \text{otherwise.} \end{cases}$$
Here, the set $B(\varepsilon)$ is a latent feasibility set, which could arise because an individual may not consider all goods or because the analyst cannot observe when goods are out of stock. Some components of $\beta$ can be zero with positive probability, reflecting that individuals may not notice or care about certain characteristics.

This setup can be generalized to allow more goods, with some goods continuous and some goods discrete. What is key for our analysis is that the index $\beta_k X_k$ shifts (only) the marginal utility of good $k$.

This paper establishes identification of moments of $\beta$ using the average structural function [Blundell and Powell, 2003]
$$\bar{Y}(x) = \int Y(x,\beta,\varepsilon)\, d\tau(\beta,\varepsilon)$$
for some probability measure $\tau$ that does not depend on covariates $x$. We assume that the measure $\tau$ satisfies a key independence condition.

Assumption 1 (Slope-Intercept Independence). The random variables $\beta$ and $\varepsilon$ are independent under the measure $\tau$, and the average structural function is finite.

While independence between $\beta$ and $\varepsilon$ is restrictive, it is a standard assumption in applications of the random coefficients logit model in discrete choice. It has been exploited for identification in Fox et al. [2012] and Chernozhukov et al. [2019a]. However, independence is not imposed in some papers studying identification. For example, Ichimura and Thompson [1998] and Gautier and Kitamura [2013] do not impose independence of the slope and intercept.

Under Assumption 1, the average structural function can be written as
$$\bar{Y}(x) = \int\!\!\int Y(x,\beta,\varepsilon)\, d\mu(\varepsilon)\, d\nu(\beta)$$
for some probability measures $\mu$ and $\nu$. Technically, full independence is not needed as long as we can factor the average structural function in this way.

For an example of an average structural function, suppose $(Y, X, \beta, \varepsilon)$ are random variables that satisfy $Y = Y(X,\beta,\varepsilon)$ almost surely. Moreover, assume $X$, $\beta$, and $\varepsilon$ are mutually independent, and suppose a continuous version of the conditional mean of $Y$ given $X$ exists. Then
$$E[Y \mid X = x] = \bar{Y}(x) = \int\!\!\int Y(x,\beta,\varepsilon)\, d\mu(\varepsilon)\, d\nu(\beta)$$
for $x$ in the support of $X$, where $\mu$ is the marginal distribution of $\varepsilon$ and $\nu$ is the marginal distribution of $\beta$.

The results in this paper apply to general average structural functions $\bar{Y}(x)$, not only the conditional mean. Thus, while slope-intercept independence is important for our results, independence between $X$ and $(\beta,\varepsilon)$ is not. Therefore, the results in this paper are relevant for settings with endogeneity.

The goal of this paper is not to provide a new method to identify the average structural function, but rather to use that function to identify other features of a utility maximizing model. There is a large literature on identifying structural functions. Blundell and Powell [2003] describe how to use control functions to identify the average structural function $\bar{Y}(x)$. Altonji and Matzkin [2005] identify derivatives of the average structural function using certain conditional independence or symmetry conditions. Berry [1994], Berry et al. [1995], Newey and Powell [2003], Berry and Haile [2014], and Dunker et al. [2017], among others, use instrumental variables to identify an average structural function from aggregate data.
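To make the average structural function concrete, the Python sketch below simulates a version of the bundles model with limited consideration from Example 2. Everything numeric is a hypothetical assumption chosen for illustration: normal slopes drawn independently of the intercepts (as Assumption 1 requires), each good noticed independently with probability 0.8, and a latent feasibility set $B(\varepsilon)$ that keeps only bundles built from noticed goods. The helper names (`disturbance`, `utility`, `choose`, `draw`, `average_structural_function`) are ours, not the paper's. Unconsidered bundles are encoded through $D(y,\varepsilon) = -\infty$, as in Example 2, and the function returns only the mean quantity of each good; it never records how often the two goods are bought together.

```python
import random

NEG_INF = float("-inf")
BUNDLES = [(0, 0), (1, 0), (0, 1), (1, 1)]


def disturbance(y, eps, considered):
    """D(y, eps) of Example 2: interaction terms on B(eps), minus infinity elsewhere."""
    if y not in considered:
        return NEG_INF
    e10, e01, e11 = eps
    return y[0] * e10 + y[1] * e01 + y[0] * y[1] * e11


def utility(y, x, beta, eps, considered):
    """Objective in (1): sum_k y_k * (beta_k * x_k) + D(y, eps)."""
    index = sum(yk * bk * xk for yk, bk, xk in zip(y, beta, x))
    return index + disturbance(y, eps, considered)


def choose(x, beta, eps, considered):
    """Optimal bundle; (0, 0) is always considered here, so the maximum is finite."""
    return max(BUNDLES, key=lambda y: utility(y, x, beta, eps, considered))


def draw(rng, p_notice=0.8):
    """One draw of (beta, eps, B(eps)) from a hypothetical distribution.

    beta is drawn independently of eps (Assumption 1). Each good is noticed with
    probability p_notice, and B(eps) keeps the bundles that use only noticed goods.
    """
    beta = (rng.gauss(1.0, 0.5), rng.gauss(1.0, 0.5))
    eps = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.5, 1.0))
    noticed = (rng.random() < p_notice, rng.random() < p_notice)
    considered = {y for y in BUNDLES if all(n or yk == 0 for yk, n in zip(y, noticed))}
    return beta, eps, considered


def average_structural_function(x, n=20000, seed=0):
    """Monte Carlo estimate of Ybar(x): mean quantity of each good, not joint purchases."""
    rng = random.Random(seed)
    totals = [0, 0]
    for _ in range(n):
        y = choose(x, *draw(rng))
        totals[0] += y[0]
        totals[1] += y[1]
    return totals[0] / n, totals[1] / n


ybar = average_structural_function((0.2, -0.1))
print("estimated mean demands:", ybar)
```

Averaging only the two marginal quantities mirrors the point that the analysis needs mean demands rather than the frequency of joint purchases.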
Footnote: Recall that the support of $X$ is the smallest closed set $S$ such that $P(X \in S) = 1$.

Footnote: A key step in applying these methods is injectivity of the map from a vector of unobservable endogenous variables, usually denoted $\xi$, to a market-level observable. See Allen [2019] or Lemma 3 in Allen and Rehbeck [2019a] for injectivity results that cover the present model when the utility index for good $k$ is $\beta_k x_k + \xi_k$. Related injectivity results have appeared in Galichon and Salanié [2015] and Chiong et al. [2017].
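The injectivity in $\xi$ mentioned above can be illustrated with one well-known way of exploiting it: the contraction mapping of Berry et al. [1995], which inverts market shares for the market-level unobservables. The Python sketch below is illustrative only and is not part of this paper's identification argument. It assumes a mixed logit with an outside good, a single hypothetical random coefficient $\beta$ on each good's scalar characteristic (the two-point distribution `NU` and all numbers are made up), and utility index $\beta x_k + \xi_k$.

```python
import math

# Hypothetical two-point distribution of a scalar random coefficient: (probability, beta).
NU = [(0.6, 0.5), (0.4, 2.0)]


def shares(xi, x):
    """Mixed-logit inside-good shares with an outside good of utility 0.

    The utility index of good k is beta * x[k] + xi[k], averaged over NU.
    """
    out = [0.0] * len(xi)
    for p, b in NU:
        w = [math.exp(d + b * xk) for d, xk in zip(xi, x)]
        denom = 1.0 + sum(w)
        for k, wk in enumerate(w):
            out[k] += p * wk / denom
    return out


def invert_shares(observed, x, tol=1e-12, max_iter=10000):
    """Berry et al. (1995) contraction: xi <- xi + log(s_obs) - log(s(xi))."""
    xi = [0.0] * len(observed)
    for _ in range(max_iter):
        s = shares(xi, x)
        new = [d + math.log(so) - math.log(sk) for d, so, sk in zip(xi, observed, s)]
        if max(abs(a - b) for a, b in zip(new, xi)) < tol:
            return new
        xi = new
    raise RuntimeError("contraction did not converge")


xi_true = [0.3, -0.4, 0.1]
x = [0.2, -0.5, 1.0]
observed = shares(xi_true, x)
xi_hat = invert_shares(observed, x)
print("recovered xi:", [round(v, 6) for v in xi_hat])
```

Because the share map is injective in $\xi$, the iteration returns the unique vector that reproduces the observed shares.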
An important feature of the analysis is that only the average structural function is required to be identified over an appropriate region. Thus, the full distribution of $Y(x,\cdot,\cdot)$ induced by the product measure $\nu \times \mu$ over $(\beta,\varepsilon)$ is not necessary for identification. For common discrete choice models, the average structural function and the full distribution of $Y(x,\cdot,\cdot)$ contain the same information, but this is not true in general. This is particularly important when combining this analysis with work allowing endogeneity between $X$ and $(\beta,\varepsilon)$. In particular, as mentioned earlier, there are well-understood methods to identify the average structural function in the presence of endogeneity. In contrast, less is known about identification of the entire distribution of $Y(x,\cdot,\cdot)$ in the presence of endogeneity.

In addition, requiring only the average structural function implies that the analysis can be applied to settings outside of discrete choice without observing whether goods are chosen together. Of course, if the full distribution of $Y(x,\cdot,\cdot)$ is identified, then these results apply as well. We note that this paper does not study identification of the distribution of $\varepsilon$ in the original latent utility model (1). However, it is possible to identify the distribution of $\varepsilon$ in some cases. For example, Dunker et al. [2017] show how to identify the distribution of random intercepts in a full-consideration random coefficients bundles model, provided the analyst has aggregate data on the frequency with which goods are chosen together.

We make use of an aggregation result that first integrates out the distribution of $\varepsilon$.

Lemma 1 (Allen and Rehbeck [2019a]). Let $Y(\cdot)$ satisfy (1).
For any measure $\mu$ over $\varepsilon$ such that $\int Y(x,\beta,\varepsilon)\, d\mu(\varepsilon)$ and $\int D(Y(x,\beta,\varepsilon),\varepsilon)\, d\mu(\varepsilon)$ exist and are finite, it follows that
$$\int Y(x,\beta,\varepsilon)\, \mu(d\varepsilon) \in \operatorname*{argmax}_{y \in \bar{B}} \sum_{k=1}^{K} y_k (\beta_k x_k) + \bar{D}(y),$$
for $\bar{B}$ the convex hull of $B$ and
$$\bar{D}(y) = \sup_{\tilde{Y} \in \mathcal{Y} \,:\, \int \tilde{Y}(\varepsilon)\, d\mu(\varepsilon) = y} \int D\big(\tilde{Y}(\varepsilon), \varepsilon\big)\, d\mu(\varepsilon),$$
where $\mathcal{Y}$ is the set of $\varepsilon$-measurable functions that map to $B$. In addition, the (integrated) indirect utility function
$$V(\beta_1 x_1, \ldots, \beta_K x_K) = \max_{y \in \bar{B}} \sum_{k=1}^{K} y_k (\beta_k x_k) + \bar{D}(y)$$
satisfies
$$V(\beta_1 x_1, \ldots, \beta_K x_K) = \int \left( \max_{y \in B} \sum_{k=1}^{K} y_k (\beta_k x_k) + D(y,\varepsilon) \right) d\mu(\varepsilon).$$

Footnote: Imbens and Newey [2009] identify average and quantile structural functions with multidimensional heterogeneity in the outcome equation. Torgovitsky [2015] and D'Haultfœuille and Février [2015] identify the entire structural function with one-dimensional unobservable heterogeneity in the outcome equation. A multidimensional counterpart has been studied in Gunsilius [2019]. These papers all identify features of structural functions in the presence of endogeneity.

Note that assuming $Y(\cdot)$ satisfies (1) requires the argmax set to be nonempty. This is a behavioral restriction that imposes sufficient structure for the result to go through, and it imposes minimal restrictions on $D$. In particular, $D$ can be $-\infty$ for certain combinations of $(y,\varepsilon)$ and need not be continuous. This allows us to treat limited consideration models as in Example 2.

We leverage the aggregation result from Lemma 1 to use calculus-based techniques for identification. To illustrate how aggregation can lead to smoothness, recall that $Y(x,\beta,\varepsilon)$ in discrete choice is a vector of indicators denoting which good is chosen (assuming no ties). Derivatives with respect to $x$ either do not exist at certain points or are zero and contain little information. We smooth choices by working with
$$\bar{Y}(x,\beta) := \int Y(x,\beta,\varepsilon)\, d\mu(\varepsilon)$$
with $\mu$ as in Assumption 1.
In discrete choice, once $\varepsilon$ is integrated out, $\bar{Y}(x,\beta)$ can be interpreted as the vector of choice probabilities conditional on only the utility indices. However, this general framework allows us to use the same tools to address discrete and continuous choice. For example, choices could involve a single discrete choice, discrete bundle choice, a prospective matching, continuous quantities of several goods, or time use, among other settings.

Footnote: The supremum is taken to be $-\infty$ when there is no $\tilde{Y} \in \mathcal{Y}$ such that $\int \tilde{Y}(\varepsilon)\, d\mu(\varepsilon) = y$.
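The smoothing delivered by Lemma 1 can be checked numerically in the logit special case, where the random intercepts are i.i.d. standard Gumbel; this distribution is an assumption made only for the illustration. The Python sketch below fixes hypothetical utility indices `u` (playing the role of $\beta_k x_k$ for a single $\beta$) and compares: simulated choice frequencies with logit probabilities; the average realized maximum utility with $V(u) = \log \sum_k e^{u_k}$, up to the Euler–Mascheroni constant, which is the mean of a standard Gumbel variable; a finite-difference gradient of $V$ with the logit probabilities, as the envelope theorem (Lemma 2) predicts; and the value of the entropy-perturbed problem at the logit probabilities.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # mean of a standard Gumbel variable


def log_sum_exp(u):
    """Logit value function V(u) = log(sum_k exp(u_k)), computed stably."""
    m = max(u)
    return m + math.log(sum(math.exp(uk - m) for uk in u))


def softmax(u):
    """Logit choice probabilities exp(u_k) / sum_j exp(u_j)."""
    m = max(u)
    w = [math.exp(uk - m) for uk in u]
    total = sum(w)
    return [wk / total for wk in w]


def simulate(u, n=100_000, seed=0):
    """Integrate out i.i.d. standard Gumbel intercepts by simulation.

    Returns (choice frequencies, average realized maximum utility).
    """
    rng = random.Random(seed)
    counts = [0] * len(u)
    total_max = 0.0
    for _ in range(n):
        # If E ~ Exponential(1), then -log(E) is a standard Gumbel draw.
        values = [uk - math.log(rng.expovariate(1.0)) for uk in u]
        best = max(range(len(u)), key=values.__getitem__)
        counts[best] += 1
        total_max += values[best]
    return [c / n for c in counts], total_max / n


def entropy_objective(y, u):
    """Perturbed utility sum_k y_k u_k - sum_k y_k log y_k for y in the simplex."""
    linear = sum(yk * uk for yk, uk in zip(y, u))
    return linear - sum(yk * math.log(yk) for yk in y if yk > 0)


def numerical_gradient(f, u, h=1e-6):
    """Central finite-difference gradient of f at u."""
    grad = []
    for k in range(len(u)):
        up, down = list(u), list(u)
        up[k] += h
        down[k] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


u = [0.5, -0.2, 0.0]  # hypothetical utility indices beta_k * x_k
freq, mean_max = simulate(u)
probs = softmax(u)
print("simulated frequencies:", [round(f, 3) for f in freq])
print("logit probabilities:  ", [round(p, 3) for p in probs])
print("gradient of V:        ", [round(g, 3) for g in numerical_gradient(log_sum_exp, u)])
print("E[max] - gamma:", round(mean_max - EULER_GAMMA, 3), " V(u):", round(log_sum_exp(u), 3))
print("entropy objective at logit probabilities:", round(entropy_objective(probs, u), 6))
```

The individual indicators are step functions of `u`, while their average is the smooth gradient of a convex function, which is the structure the identification argument differentiates.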
We impose some additional high-level sufficient conditions that strengthen the conclusions of Lemma 1.
Assumption 2.
Assume the following:
(i) $\bar{Y}(x,\beta) = \operatorname*{argmax}_{y \in B} \sum_{k=1}^{K} y_k (\beta_k x_k) + D(y)$.
(ii) $B \subseteq \mathbb{R}^K$ is a nonempty, closed, and convex set.
(iii) $D : \mathbb{R}^K \to \mathbb{R} \cup \{-\infty\}$ is concave, upper semi-continuous, and finite at some $y \in B$.

Allen and Rehbeck [2019a] provide lower-level conditions that, when combined with Lemma 1, imply this assumption. Part (i) strengthens the conclusion of Lemma 1 to obtain a unique maximizer. Concavity in part (iii) is milder than it first appears and delivers no additional restrictions on $\bar{Y}(x,\beta)$ when the other assumptions are maintained; see the discussion in Allen and Rehbeck [2019a].

To further present the foundation of the identification results, we present a version of the envelope theorem.

Lemma 2.
Let Assumption 2 hold. It follows that
$$\bar{Y}_k(x,\beta) = \partial_k V(\beta_1 x_1, \ldots, \beta_K x_K). \tag{2}$$
Here, $\bar{Y}_k(x,\beta)$ is the $k$-th component of $\bar{Y}(x,\beta)$, and $\partial_k V(\beta_1 x_1, \ldots, \beta_K x_K)$ is the derivative of $V$ with respect to its $k$-th argument, evaluated at the point $(\beta_1 x_1, \ldots, \beta_K x_K)$. We use similar notation for the rest of the paper. Differentiability of $V$ is implied by the fact that $\bar{Y}$ is the unique maximizer. This is the primary implication of Assumption 2 that we use in this paper.

Fox et al. [2012] use a structure similar to (2), showing that when $V$ is known, it is possible to identify moments of the distribution of $\beta$. We differ because we do not require an analyst to specify $V$. Instead, we require certain moments to be nonzero as a relevance condition. Appendix B.2 provides further details and a comparison with their approach. A related structure is considered in Lewbel and Pendakur [2017], who identify the distribution of random coefficients when an analogue of $\partial_k V$ is known in advance or additively separable in its arguments. We do not impose this structure.

We also leverage a symmetry property of mixed partial derivatives that results from the optimizing behavior in Assumption 2. For a vector of indices $\gamma = (\gamma_1, \ldots, \gamma_M) \in \{1, \ldots, K\}^M$ and a sufficiently differentiable function $f : \mathbb{R}^K \to \mathbb{R}$, let
$$\partial_\gamma f := \partial_{\gamma_1} \cdots \partial_{\gamma_M} f.$$

Lemma 3.
Suppose $V$ is $M$-times continuously differentiable in a neighborhood of $\vec{u} \in \mathbb{R}^K$. Let $\gamma, \delta \in \{1, \ldots, K\}^M$ be vectors of indices in which each index occurs the same number of times in both $\gamma$ and $\delta$. It follows that $\partial_\gamma V(\vec{u}) = \partial_\delta V(\vec{u})$.

This result states that the order in which we take partial derivatives does not matter. For example, when $M = 2$ it follows for $j, k \in \{1, \ldots, K\}$ that $\partial_{j,k} V(\vec{u}) = \partial_{k,j} V(\vec{u})$. The lemma follows by repeated application of the $M = 2$ case.

With the foundations in place, we now turn to the task of identifying moments of random coefficients. We focus on conditions under which certain $M$-th order moments of the distribution of $\beta$ are identified. In particular, if the assumptions hold for all $M$, then all moments of the distribution of random coefficients are identified.

We assume regressors are continuous and satisfy an exclusion restriction.

Assumption 3.
All covariates are continuous. In addition, each $x_k$ is a vector of regressors specific to the $k$-th good.

We now provide some intuition for the main result (Theorem 1). We consider identifying second moments of $\beta$ ($M = 2$) when there are two goods ($K = 2$) and each good has a single covariate ($d_k = 1$). We denote the derivative of a function $f$ with respect to the covariates of the $j$-th good, $x_j$, as $\partial_{x_j} f$. Differentiating the envelope theorem (Lemma 2) and evaluating at $x = 0$ yields
$$\partial_{x_j} \bar{Y}_k(0,\beta) = \partial_{j,k} V(0)\, \beta_j.$$
This uses the fact that $x_j$ is continuous and excluded from the utility index of other goods. This can be repeated with other mixed partial derivatives. Importantly, by evaluating derivatives at the point $x = 0$, the terms $\partial_{j,k} V(0)$ do not depend on $\beta$. Thus, when integrating over the values of the random coefficients, the term involving $V$ passes outside of the integral. In particular, integrating over $\beta$ yields the following system of equations:
$$\begin{aligned}
\partial_{x_1}\partial_{x_1} \bar{Y}_2(0) &= \partial_{1,1,2} V(0) \int \beta_1^2\, d\nu(\beta), \\
\partial_{x_1}\partial_{x_2} \bar{Y}_2(0) &= \partial_{1,2,2} V(0) \int \beta_1 \beta_2\, d\nu(\beta), \\
\partial_{x_2}\partial_{x_1} \bar{Y}_1(0) &= \partial_{2,1,1} V(0) \int \beta_2 \beta_1\, d\nu(\beta), \\
\partial_{x_2}\partial_{x_2} \bar{Y}_1(0) &= \partial_{2,2,1} V(0) \int \beta_2^2\, d\nu(\beta),
\end{aligned} \tag{3}$$
where we have implicitly assumed that differentiation and integration can be interchanged.

Assume that the derivatives of $\bar{Y}$ are identified. At first glance, this is a system of four equations with seven unknowns (clearly the $\beta_1\beta_2$ and $\beta_2\beta_1$ moments are equal). However, when $V$ is sufficiently differentiable, partial derivatives of $V$ do not depend on the order of differentiation (Lemma 3), so $\partial_{1,1,2} V(0) = \partial_{2,1,1} V(0)$ and $\partial_{1,2,2} V(0) = \partial_{2,2,1} V(0)$; this eliminates two unknowns. A scale assumption that $\int \beta_1^2\, d\nu(\beta)$ is known a priori eliminates one more unknown and gives a system with four equations and four unknowns. We show that this is enough to identify all second moments of $\beta$.

To see constructively how the moments are identified, note that, by symmetry of derivatives, the first and third equations identify $\int \beta_1 \beta_2\, d\nu(\beta)$. Using this, we identify $\partial_{1,2,2} V(0)$ from the second equation. Again using symmetry of derivatives and combining this with the last equation identifies $\int \beta_2^2\, d\nu(\beta)$. Once all moments are identified, the remaining third order derivatives of $V$ can be identified at 0.

Footnote: Here we abuse notation and, for a function $f$, let $\partial_s f(0) = \partial_s f(z)|_{z=0}$.

We now provide formal conditions that justify the intuitive argument for any number of goods, covariates, and order of moment $M$.

Assumption 4.
For the natural number $M$, $\int \beta_{1,1}^M\, d\nu(\beta)$ is finite, known a priori, and nonzero.

Assumption 4 holds if we set $\beta_{1,1} = 1$, for example, but it is considerably more general. It allows heterogeneity in the sign of $\beta_{1,1}$, for example. In general, if one wants to identify all moments of $\beta$ using the main result, then Assumption 4 must hold for every $M$. This holds when the distribution of $\beta_{1,1}$ is known a priori and has nonzero moments of all orders. If Assumption 4 is dropped, the results in this paper establish identification of the ratio of any two nonzero $M$-th order moments. Thus, Assumption 4 can be appropriately modified by instead holding fixed the value of some other nonzero $M$-th order moment of the form $\int \beta_{k_1,\ell_1} \cdots \beta_{k_M,\ell_M}\, d\nu(\beta)$. We show in Section 3.1 that if $\beta_{1,1}$ is independent of all other components of $\beta$, then identification is possible using a single scale assumption on the first moment.

Recall that, with minor abuse of notation, we set
$$\bar{Y}(x) = \int \bar{Y}(x,\beta)\, d\nu(\beta).$$
We require the following regularity conditions.
Assumption 5.
For the natural number $M$, the following conditions hold:
(i) For each good $k$, one can interchange integration and differentiation for all $M$-th order partial derivatives at $x = 0$, so that
$$\partial_{x_{k_1,\ell_1}} \cdots \partial_{x_{k_M,\ell_M}} \bar{Y}_k(0) = \int \partial_{x_{k_1,\ell_1}} \cdots \partial_{x_{k_M,\ell_M}} \bar{Y}_k(0,\beta)\, d\nu(\beta)$$
holds.
(ii) Each $M$-th order moment $\int \beta_{k_1,\ell_1} \cdots \beta_{k_M,\ell_M}\, d\nu(\beta)$ exists and is finite.
(iii) $V$ is $(M+1)$-times continuously differentiable in a neighborhood of $0$.
(iv) For each $\gamma \in \{1, \ldots, K\}^{M+1}$, $\partial_\gamma V(0) \neq 0$.
(v) $\bar{Y}(x)$ is known in a neighborhood of $x = 0$, or, more generally, it is known in a neighborhood of $x = 0$ with respect to the weakly positive orthant of $\mathbb{R}^{\sum_{k=1}^{K} d_k}$.

Footnote: This part also requires the equations
$$\partial_{x_1}\partial_{x_1} \bar{Y}_1(0) = \partial_{1,1,1} V(0) \int \beta_1^2\, d\nu(\beta), \qquad \partial_{x_2}\partial_{x_2} \bar{Y}_2(0) = \partial_{2,2,2} V(0) \int \beta_2^2\, d\nu(\beta)$$
to identify $\partial_{1,1,1} V(0)$ and $\partial_{2,2,2} V(0)$.

Footnote: More formally, the second condition can be written as follows: for some neighborhood $H$ of $x = 0$ in $\mathbb{R}^{\sum_{k=1}^{K} d_k}$, $\bar{Y}(x)$ is known on $H \cap \mathbb{R}_+^{\sum_{k=1}^{K} d_k}$, where $\mathbb{R}_+ = \mathbb{R} \cap [0, \infty)$.

These regularity conditions parallel assumptions in Fox et al. [2012]. To interpret part (i), note that $\nu$ can be a discrete probability measure over $\beta$ with finite support. For such discrete measures, (i) holds whenever $\bar{Y}_k(x,\beta)$ is $M$-times differentiable in $x$ for every $\beta$ in its support. Part (ii) formalizes that the moments we wish to identify exist and are finite.

Parts (iii) and (iv) can be linked to derivatives of the function $\bar{Y}(x)$ via the envelope theorem (Lemma 2). Indeed, differentiating the envelope theorem for the $k$-th good, evaluating the derivative of $\bar{Y}_k$ with respect to $x_{j,\ell}$ at $x = 0$, and taking expectations yields
$$\frac{\partial \bar{Y}_k(0)}{\partial x_{j,\ell}} = \partial_{j,k} V(0) \int \beta_{j,\ell}\, d\nu(\beta). \tag{4}$$
Thus, when $V$ is $(M+1)$-times continuously differentiable, it follows that $\bar{Y}$ is $M$-times continuously differentiable. Moreover, if one sees empirically that $\partial \bar{Y}_k(0)/\partial x_{j,\ell} \neq 0$, then $\partial_{j,k} V(0) \neq 0$. … $\gamma$ has Lebesgue measure 0. In general, whether (iv) holds depends on features of the distribution of $\varepsilon$ and the choice of $D$, which are example specific. For example, part (iv) rules out pure characteristics discrete choice models as in Berry and Pakes [2007] and Dunker et al. [2017]. These models do not include a random intercept, and so the value function for the pure characteristics model, $V_{PC}$, can be written as
$$V_{PC}(\beta_1 x_1, \ldots, \beta_K x_K) = \sup_{y \in B} \sum_{k=1}^{K} y_k (\beta_k x_k)$$
without the additive disturbance $D$, where $B$ is the probability simplex. $V_{PC}$ does not have a nonzero derivative at $x = 0$ … $M \geq$ … $V_{PC}$ also does not always induce a unique maximizer.

More generally, condition (iv) requires that goods in the demand system are related. For example, if the original $K$-good demand system can be written as $K$ separate 1-good demand systems, then derivatives of the form $\partial_{j,k} V(0)$ will be zero for $j \neq k$. This is because, under this separability assumption, the utility index of good $j$ does not alter the demand for the $k$-th good. In general, (iv) cannot be relaxed for the main result to hold without additional assumptions. For example, if we impose that $\beta_j = \beta_k$ (a.s.) for all $j, k \in \{1, \ldots, K\}$, then one can identify ratios of moments under the weaker assumption that $\partial_j^{M+1} V(0) \neq 0$ for … $j$. See Appendix B.3.

Condition (v) states that $\bar{Y}(x)$ is identified over a small region near $x = 0$. The constructive identification results in Fox et al. [2012] and Chernozhukov et al. [2019a] have also made use of variation around zero. In contrast, most of the literature instead requires identification of $\bar{Y}$ either for all $x$ or for a set over which $x$ is unbounded along some dimensions.

To interpret condition (v), suppose that $X$, $\beta$, and $\varepsilon$ are all independent, and we identify $\bar{Y}$ from a continuous version of the conditional mean of $Y$ given $X$. In this case, condition (v) is implied when the support of $X$ contains an open ball around $x = 0$. The second, more general part of (v) highlights that the results also apply when the average structural function is identified over a weakly positive region. Thus, our results do not rule out prices. We can handle this case because we only need to identify certain derivatives of $\bar{Y}$ at 0. In this case, these derivatives are identified from "one-sided" limits involving non-negative numbers.

The final assumption used for identification is that a sufficiently rich set of $M$-th order moments of $\beta$ are nonzero.

Assumption 6.
For the natural number $M$ and each tuple of good indices $(k_1, \ldots, k_M) \in \{1, \ldots, K\}^M$, there is a corresponding tuple of characteristic indices $(\ell_1, \ldots, \ell_M) \in \prod_{m=1}^{M} \{1, \ldots, d_{k_m}\}$ such that the $M$-th order moment
$$\int \beta_{k_1,\ell_1} \cdots \beta_{k_M,\ell_M}\, \nu(d\beta)$$
exists and is nonzero.

This is a relevance condition. It is not necessary to know in advance which indices $(\ell_1, \ldots, \ell_M)$ satisfy this condition. A sufficient condition for this is that for every $k$-th good there is a regressor $\ell_k \in \{1, \ldots, d_k\}$ such that either $\beta_{k,\ell_k} \geq 0$ … or $\beta_{k,\ell_k} \leq 0$ … $\beta_{k,1} = 1$ … $k$-th good by setting $\ell_1 = \cdots = \ell_M = 1$. This is a common assumption in the literature [Berry and Haile, 2009, Briesch et al., 2010, Dunker et al., 2017]. However, Ichimura and Thompson [1998] and Gautier and Kitamura [2013] establish identification of random coefficients models for binary discrete choice using a more general halfspace condition.

With these assumptions, we can now state the main result of the paper.
Theorem 1.
Let Assumptions 1–6 hold with the same natural number $M$. Each $M$-th order moment of the form
$$\int \beta_{k_1,\ell_1} \cdots \beta_{k_M,\ell_M}\, d\nu(\beta)$$
is identified. In addition, for each $\gamma \in \{1, \ldots, K\}^{M+1}$, $\partial_\gamma V(0)$ is identified.

Footnote: See the discussion after Lemma A.2 in Appendix A.2.

Theorem 1 can be used directly to establish semiparametric identification of the distribution of $\beta$ for certain parametric families without specifying other objects (e.g., $V$). For example, if $\beta$ is normally distributed, then Theorem 1 identifies the distribution when the assumptions hold for $M \in \{1, 2\}$, because normal distributions are characterized by their means and covariances. Recall that while we identify non-centered moments, we can use this information to identify centered moments. More generally, for any distribution of $\beta$ that is determined by its moments up to order $M$, this result establishes identification of the distribution.

Corollary 1.
Suppose Assumptions 1-6 hold for each $M \le \bar{M}$ (where $\bar{M}$ could be $\infty$) and the distribution of $\beta$ is determined by its first $\bar{M}$ moments. Then the distribution of $\beta$ is identified.
Fox et al. [2012] and Kim [2014] describe a sufficient condition for a distribution to be determined by its moments. Distributions with compact support are determined by their moments. Lognormal distributions are an example of distributions that are not determined by their integer moments [Heyde, 1963]. That is, other distributions share all integer moments with a given lognormal distribution. However, the parameters may still be identified within the lognormal class.
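For the normal example, Theorem 1 with $M \in \{1, 2\}$ delivers the non-centered moments $\int \beta_i \, d\nu(\beta)$ and $\int \beta_i \beta_j \, d\nu(\beta)$. A minimal sketch of the conversion to the mean vector and covariance matrix, which together pin down a normal distribution, follows. The numerical inputs are hypothetical and the function name `centered_moments` is illustrative.

```python
def centered_moments(mean, raw_second):
    """Turn identified non-centered moments into a covariance matrix.

    mean[i] = E[beta_i] and raw_second[i][j] = E[beta_i * beta_j]; the result is
    Cov(beta_i, beta_j) = E[beta_i * beta_j] - E[beta_i] * E[beta_j].
    """
    n = len(mean)
    return [[raw_second[i][j] - mean[i] * mean[j] for j in range(n)] for i in range(n)]


# Hypothetical identified moments for a bivariate normal beta.
mean = [1.0, -0.5]
raw_second = [[1.25, -0.3],
              [-0.3, 0.5]]

cov = centered_moments(mean, raw_second)
print("mean:", mean)
print("covariance:", cov)
```

Here the covariance is approximately `[[0.25, 0.2], [0.2, 0.25]]`; together with the mean vector it determines $N(\text{mean}, \text{cov})$. A valid covariance must be positive semidefinite, which holds here since $0.25 \cdot 0.25 - 0.2^2 = 0.0225 > 0$.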
Remark. The proof of Theorem 1 is constructive. While a detailed approach to estimation is beyond the scope of this paper, the constructive results can be used to show how consistent estimation of the ratios of certain derivatives allows one to consistently estimate moments of the distribution of random coefficients. See Appendix B.1 for a brief outline. Estimation error of the $M$-th order moments can be bounded when the analyst has a suitable estimator of the $M$-th order derivatives of the average structural function. For example, Chen and Christensen [2018] provide an approach to estimate derivatives of average structural functions in the presence of endogeneity.
Identifying the distribution of $\beta$ using Corollary 1 requires Assumption 4, which specifies all moments of the form $\int \beta_{1,1}^{M} \, d\nu(\beta)$. When the distribution of $\beta$ is identified from its moments, one must therefore specify the marginal distribution of $\beta_{1,1}$ in advance to apply Corollary 1. The common assumption $\beta_{1,1} = 1$ does this, but in general it may be restrictive to assume that $\beta_{1,1}$ has a known distribution. This section describes an alternative assumption that ensures identification of moments of $\beta$. In particular, we assume $\beta_{1,1}$ is independent of the other components of $\beta$.
With this independence assumption, we show that a single scale assumption on the first moment of $\beta_{1,1}$ allows us to identify a rich collection of moments. This contrasts with Theorem 1, which uses an assumption on the $M$-th order moment of $\beta_{1,1}$ to identify only $M$-th order moments of $\beta$.
Assumption 7.
For the measure $\nu$ as defined in Assumption 1, $\beta_{1,1}$ is independent of all other components of $\beta$. In addition, $\left|\int \beta_{1,1} \, d\nu(\beta)\right|$ is finite, known a priori, and nonzero.
Alternatively, one could fix the absolute value of a moment of $\beta_{1,1}$ of some other order, but we focus on the first moment since it facilitates interpretation. Independence between $\beta_{1,1}$ and the other components is considerably weaker than assuming $\beta_{1,1} = 1$. For example, it allows $\beta_{1,1}$ to be negative for some individuals and positive for others. Thus, different individuals can be repelled by or attracted to higher values of $x_{1,1}$.
Replacing Assumption 4 with Assumption 7, we obtain the following counterpart of Theorem 1.
Proposition 1.
Let $K \ge 2$ and let Assumptions 1-3 and 5-7 hold for each $M \le \bar{M}$ (where $\bar{M}$ could be $\infty$). Then for each $M \le \bar{M}$, every $M$-th order moment of the form
$$\int \beta_{k_1,\ell_1} \cdots \beta_{k_M,\ell_M} \, \nu(d\beta)$$
is identified. In addition, for each $\gamma \in \{1, \dots, K\}^{M+1}$, $\partial_{\gamma} V(0)$ is identified.
If $K = 1$, the same conclusions hold given the additional assumption that for each $M \le \bar{M}$, there exists an order $M-1$ moment such that
$$\int \beta_{1,\ell_1} \cdots \beta_{1,\ell_{M-1}} \, d\nu(\beta) \neq 0,$$
where $\ell_m \neq 1$ for every $m \in \{1, \dots, M-1\}$.
Relative to Theorem 1, independence of $\beta_{1,1}$ from the other components allows us to relate $M$-th and $(M-1)$-th order moments. To see this, consider an $M$-th order moment in which $\beta_{1,1}$ appears exactly once, so that $(k_m, \ell_m) \neq (1,1)$ for $m \ge 2$. Using independence, we obtain
$$\int \beta_{1,1} \beta_{k_2,\ell_2} \cdots \beta_{k_M,\ell_M} \, d\nu(\beta) = \int \beta_{1,1} \, d\nu(\beta) \int \beta_{k_2,\ell_2} \cdots \beta_{k_M,\ell_M} \, d\nu(\beta).$$
In the proof, we show that $\int \beta_{1,1} \, d\nu(\beta)$ can be identified when $\left|\int \beta_{1,1} \, d\nu(\beta)\right|$ is finite, known a priori, and nonzero. With this knowledge, we can identify the ratio of $M$-th order moments to $(M-1)$-th order moments and apply induction to identify all moments of order $M \le \bar{M}$.
Remark. When $K \ge 2$, the assumptions of Proposition 1 impose that there are multiple relevant characteristics. The additional assumption in the $K = 1$ case similarly requires a relevant characteristic other than $x_{1,1}$.
Remark. Provided $\int \beta_{1,1} \, \nu(d\beta)$ exists and is nonzero, it is a normalization to set $\left|\int \beta_{1,1} \, \nu(d\beta)\right| = 1$. In other words, this imposes no additional restrictions on the model. To see this, note that if we divide the original latent utility model by $\left|\int \beta_{1,1} \, \nu(d\beta)\right|$, then the argmax set does not change and none of the assumptions in Proposition 1 are affected by this division.
A natural intuition is that when $\beta_{1,1} > 0$ almost surely, one can divide by $\beta_{1,1}$ and rewrite the problem with $\beta_{1,1} = 1$. This is true if we only inspect the original latent utility model (Equation 1), but it is no longer true once we impose independence or certain other assumptions on the integrated model in Assumption 2. Recall that Assumption 1 means $\beta$ and $\varepsilon$ are independent in the definition of the average structural function $Y$. In general, this assumption is not invariant to division by $\beta_{1,1}$. This means that setting $\beta_{1,1} = 1$ is, in general, a substantive restriction rather than a normalization.
Welfare Analysis and Counterfactuals
We now turn to identification of certain welfare and counterfactual objects. Identification is established given identification of certain features of $V$, the indirect utility function obtained when random intercepts are integrated out. We first provide three results that identify differences in $V$. Using these results, we then discuss welfare analysis and counterfactuals.
The reason we identify $V$ is that the envelope theorem determines certain average choices:
$$Y(x, \beta) = \nabla V(\beta_1' x_1, \dots, \beta_K' x_K).$$
To consider counterfactuals at new values of covariates, we require identification of the right-hand side at values other than 0.
We first provide conditions under which identification of the partial derivatives of $V$ at 0 allows us to extrapolate the function directly. Specifically, we assume $V$ is a real analytic function. That is, $V$ has derivatives of all orders and agrees with its Taylor series in a neighborhood of every point. Real analytic functions have the important property that local information can be used to reconstruct the function globally (on a connected domain) by extrapolation. In this respect they resemble common parametric classes of functions. However, the set of real analytic functions is infinite dimensional.
Corollary 2.
Let the assumptions of Theorem 1 hold for every natural number $M$, or let the assumptions of Proposition 1 hold with $\bar{M} = \infty$. If $V$ is a real analytic function, then it is identified up to an additive constant.
One way to drop the assumption that $V$ is real analytic is to instead assume $\beta_{k,1} = 1$ almost surely for each good $k$. With this assumption, let $\tilde{x}$ be a value that is zero for every characteristic except the first characteristic of each good. Then the envelope theorem (Lemma 2) specializes to
$$Y_k(\tilde{x}, \beta) = \partial_k V(\tilde{x}_{1,1}, \dots, \tilde{x}_{K,1}).$$
The right-hand side does not depend on $\beta$, and so by taking expectations, the average structural function identifies the derivative of $V$ at the point $(\tilde{x}_{1,1}, \dots, \tilde{x}_{K,1})$. By integrating the derivatives we can identify differences in $V$, as we now formalize.
Proposition 2.
Let Assumption 2 hold and assume $\beta_{k,1} = 1$ almost surely for each $k \in \{1, \dots, K\}$. Suppose $Y(x)$ is identified for all $x = (x_1, \dots, x_K)$ satisfying $x_{k,1} \in [\underline{x}_{k,1}, \bar{x}_{k,1}]$ for each $k$, and $x_{k,j} = 0$ for $j > 1$. Then differences in $V$ are identified over the region $\times_{k=1}^{K} [\underline{x}_{k,1}, \bar{x}_{k,1}]$. In particular, if $\underline{x}_{k,1} = -\infty$ and $\bar{x}_{k,1} = \infty$ for each $k$, then $V$ is identified up to an additive constant.
The results on welfare and counterfactual analysis require that derivatives of $V$ be identified at certain values $(\beta_1' x_1, \dots, \beta_K' x_K)$. If the support of $\beta$ is compact, then it is not necessary to identify $V$ everywhere, and so it is not necessary to have $\underline{x}_{k,1} = -\infty$ and $\bar{x}_{k,1} = \infty$ to apply Proposition 2.
Finally, we mention a third way to identify differences in $V$. A key distinction is that it requires identification of the distribution of $Y(x, \beta)$ for fixed $x$, rather than identification of the average structural function as in the rest of the paper. We adapt the following lemma.
Lemma 4 (McCann [1995]; statement from Chernozhukov et al. [2019b], Corollary 2). Let $W = f(\eta)$, where $W$ and $\eta$ have the same finite dimension. Suppose $f$ is the gradient of a convex function, $\eta$ has a known distribution that is absolutely continuous with respect to Lebesgue measure, the distribution of $W$ is known, and $\eta$ and $W$ have finite variance. Then $f$ is identified.
This result can be applied to our setting by adapting the envelope theorem,
$$Y(x, \beta) = \nabla V(\beta_1' x_1, \dots, \beta_K' x_K).$$
Interpret $W = Y(x, \beta)$, $f = \nabla V$, and $\eta = (\beta_1' x_1, \dots, \beta_K' x_K)$. When $x$ is fixed and the distribution of $\beta$ is identified (from previous arguments), the distribution of $\eta$ is known. The function $V$ is convex, and so the lemma provides conditions under which $\nabla V$ is identified. Importantly, the lemma can be applied at a single $x$, so full support of the covariates is not needed to apply the result. Such an $x$ cannot be arbitrary, however. For example, when $x = 0$, $\eta$ is degenerate at 0 and hence not absolutely continuous.
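In one dimension the gradient of a convex function is a nondecreasing map, and Lemma 4 reduces to quantile matching, $f = F_W^{-1} \circ F_\eta$. The sketch below is a hedged illustration under purely hypothetical choices: a single good with a scalar characteristic, $x = 1$, $\beta$ standard normal (so $\eta = \beta x$ is standard normal), and the convex function $V(t) = \log(1 + e^t)$, whose derivative is the logistic function. Only the marginal distribution of $W = f(\eta)$ is used, not the pairs $(\eta, W)$. In more than one dimension the identified object is a monotone (optimal transport) map, which this scalar sketch does not compute.

```python
import math
import random
from statistics import NormalDist

random.seed(0)
N = 20000

# Known distribution of eta = beta * x (assumption: beta ~ N(0, 1) and x = 1).
eta_dist = NormalDist(mu=0.0, sigma=1.0)


def f(t):
    """Hypothetical f = V' for the convex V(t) = log(1 + exp(t))."""
    return 1.0 / (1.0 + math.exp(-t))


# Observed data: draws of W = f(eta). Sorting discards any pairing with eta.
w_sorted = sorted(f(random.gauss(0.0, 1.0)) for _ in range(N))


def empirical_quantile(sorted_values, level):
    """Nearest-rank empirical quantile of an already sorted sample."""
    i = min(len(sorted_values) - 1, max(0, int(level * len(sorted_values))))
    return sorted_values[i]


def f_hat(t):
    """Recover f(t) as the quantile of W at level F_eta(t)."""
    return empirical_quantile(w_sorted, eta_dist.cdf(t))


for t in (-1.0, 0.0, 0.5, 1.0):
    print(t, round(f(t), 4), round(f_hat(t), 4))
```

With 20,000 simulated draws, the recovered values agree with the true logistic function up to sampling error.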
The lemma can still be applied for $x$ near but not equal to 0. Moreover, to apply the lemma, the distribution of $\beta$ cannot be degenerate, i.e. there must be truly "random" coefficients. If $\beta$ is almost surely equal to a constant, then the distribution of $\eta$ is not absolutely continuous and the lemma does not apply.
Importantly, to apply Lemma 4 in our setting, the distribution of $Y(x, \beta)$ must be identified at some fixed $x$. One example in which this holds is when $\varepsilon$ is absent from the original latent utility model, so that $Y(x, \beta)$ corresponds to the observable choices given $x$ and $\beta$. Such structure could be appropriate in a continuous choice model in which all unobservable heterogeneity is captured by the random slopes $\beta$, and in which individual choices (rather than, e.g., average choices for a group of individuals) are observed.
We now describe how identification of $V$ leads to identification of certain welfare objects. First, recall that $V$ may be interpreted as the indirect utility conditional on the utility index (Lemma 1), where the random intercept $\varepsilon$ is integrated out under the measure $\mu$. To interpret $V(\cdot)$ as a welfare object, suppose $\beta$ is an individual-specific term that is random across the population but constant across decisions for the same individual. Interpret the random intercept $\varepsilon$ as an idiosyncratic taste shock across decision problems. Then $V(\beta_1' x_1, \dots, \beta_K' x_K)$ is an individual-specific (integrated) indirect utility. Once the values of covariates are fixed, the conditions of Corollary 1 identify the distribution of $V(\beta_1' x_1, \dots, \beta_K' x_K)$ under the measure $\nu$ up to an additive constant.
Thus, we can identify the distribution of individual-specific indirect utilities. By further integrating out the distribution of $\beta$, we also identify differences in the average indirect utility via Lemma 1:
$$\int V(\beta_1' x_1, \dots, \beta_K' x_K) \, d\nu(\beta) = \int \int \sup_{y \in B} \left[ \sum_{k=1}^{K} y_k (\beta_k' x_k) + D(y, \varepsilon) \right] d\mu(\varepsilon) \, d\nu(\beta). \tag{5}$$
This result holds regardless of whether the distribution of $\varepsilon$ is identified, since $V$ is a welfare-relevant summary measure of the distribution of $\varepsilon$. Indeed, we do not establish identification of the distribution of
$$\sup_{y \in B} \left[ \sum_{k=1}^{K} y_k (\beta_k' x_k) + D(y, \varepsilon) \right]$$
according to the product measure $\mu \times \nu$ over $(\beta, \varepsilon)$. In particular, since this paper does not study identification of the distribution of $\varepsilon$, we do not identify the distribution of indirect utilities including random intercepts.
Note that the units of (5) are relative to the scale assumption used to identify the distribution of $\beta$. If we impose the scale assumptions in Theorem 1 to apply Corollary 2, then the distribution of $\beta_{1,1}$ is fixed. Thus, the units of Equation 5 are set by the distribution of the conversion rate between $x_{1,1}$ and utils. In contrast, if we impose the conditions of Proposition 1, then the scale is determined by $\left|\int \beta_{1,1} \, \nu(d\beta)\right|$. In this case, the units of Equation 5 are relative to the average conversion rate between $x_{1,1}$ and utils.
An alternative measure of average indirect utility is
$$\int \frac{1}{|\beta_{1,1}|} V(\beta_1' x_1, \dots, \beta_K' x_K) \, d\nu(\beta) = \int \int \frac{1}{|\beta_{1,1}|} \sup_{y \in B} \left[ \sum_{k=1}^{K} y_k (\beta_k' x_k) + D(y, \varepsilon) \right] d\mu(\varepsilon) \, d\nu(\beta). \tag{6}$$
This sets the conversion rate between $x_{1,1}$ and utils to $\pm 1$. Importantly, this preserves whether the first characteristic is desirable or undesirable. It also forces the intensity of preference to be constant across individuals. This welfare measure is most interpretable when the coefficient on this regressor has a homogeneous sign. For example, if $x_{1,1}$ is the (negative) price of good 1, then $\beta_{1,1} < 0$ would correspond to an individual who is attracted to higher prices.
Once $V$ is identified, we can also answer certain counterfactual questions involving quantities at new values of covariates. To this end, recall from Lemma 2 that
$$Y_k(x, \beta) = \partial_k V(\beta_1' x_1, \dots, \beta_K' x_K). \tag{7}$$
Here, $Y_k(x, \beta)$ is the demand for good $k$ fixing covariates and the random slopes $\beta$, but integrating out the distribution of the random intercept $\varepsilon$. We interpret $\beta$ as an individual-specific parameter that is constant across decision problems, while $\varepsilon$ is an idiosyncratic shock that can vary across decision problems. Thus, $Y_k(x, \beta)$ is the individual-specific average quantity of the $k$-th good. Once $V$ and the distribution of $\beta$ are identified, we can identify the distribution of $Y_k(x, \cdot)$ from Equation 7.
Conceptually, this shows it is possible to start with identification of the average structural function ("mean choices") around $x = 0$ and obtain identification of the distribution of $Y_k(x, \beta)$ at all values of the covariates. This also implies that any value $x$ at which $Y$ can be identified directly from data (as opposed to through the theoretical analysis just described) provides overidentifying information.
We now provide additional discussion of the main results in the paper.
Remark. Our analysis can be applied to settings in which covariates do not vary around zero. For example, we could recenter via $\tilde{X} = X - E[X]$ so that identification uses variation in the average structural function around the mean, rather than around 0. This is noted as well in Fox et al. [2012]. Importantly, the assumptions in this paper are not typically invariant to recentering. Thus, the assumptions must be made given a particular centering at which the average structural function is identified. In addition, the location of recentering defines where the random slopes do not alter preferences, since $\beta_k' x_k = 0$ when $x_k = 0$. In words, the choice of centering sets a location of taste homogeneity with regard to the random slopes (but not the random intercepts).
Remark. The results may be adapted to certain models in which coefficients do not enter linearly. Suppose that instead of the linear index $\beta_k' x_k$, we have $x_k^{\rho_k}$ for a scalar shifter $x_k$ and a random exponent $\rho_k$. Applying the envelope theorem yields
$$\partial_{x_j} Y_k(x, \rho) = \partial_{j,k} V(x_1^{\rho_1}, \dots, x_K^{\rho_K}) \, \rho_j x_j^{\rho_j - 1},$$
where $\rho = (\rho_1, \dots, \rho_K)$ collects the random exponents. In particular, the partial derivatives of $V$ can be evaluated at a vector of ones, since $x_j^{\rho_j} = x_j^{\rho_j - 1} = 1$ when $x_j = 1$, regardless of $\rho$. Integrating out $\rho$, evaluating the above equation at covariates equal to one, and using symmetry of mixed partial derivatives, we obtain
$$\frac{\partial_{x_j} Y_k(\mathbf{1})}{\partial_{x_k} Y_j(\mathbf{1})} = \frac{E[\rho_j]}{E[\rho_k]},$$
provided $\partial_{j,k} V(\mathbf{1}) \neq 0$ and $E[\rho_k] \neq 0$. Ratios of higher order moments can be identified by considering additional derivatives of the average structural function and by using symmetry of mixed partials of $V$ evaluated at the vector of ones.
Remark. Recall that the envelope theorem yields
$$\partial_{x_j} Y_k(0, \beta) = \partial_{j,k} V(0) \, \beta_j.$$
Thus, second order mixed partial derivatives of $V$ describe how changes in the utility index of good $j$ alter the demand for good $k$. The sign of $\partial_{j,k} V(0)$ describes whether goods are local complements ($\partial_{j,k} V(0) \ge 0$) or substitutes ($\partial_{j,k} V(0) \le 0$). Theorem 1 provides conditions under which this derivative is identified, and thus we obtain information on complementarity and substitutability with random coefficients. Several papers have studied complementarity in the bundles model (Example 2) when characteristics shift the marginal utility of a good homogeneously (Gentzkow [2007], Fox and Lazzati [2017], Chernozhukov et al. [2015], Allen and Rehbeck [2019d,c]). To our knowledge, the only paper that studies identification of complementarity in the bundles model with heterogeneous tastes for characteristics is Dunker et al. [2017]. Rather than working only with the average structural function ("mean demands"), Dunker et al. [2017] require identification of the frequencies with which goods are chosen together.
Remark. We do not require the assumption that coefficients are the same across goods, $\beta_j = \beta_k$. One interpretation of this assumption is that preferences are driven by observable characteristics [Gorman, 1980, Lancaster, 1966], not by the label of the good ($j$ vs. $k$). In this paper, we assume that the shifters associated with $\beta_k$ vary for good $k$. Thus, setting $\beta_j = \beta_k$ means the shifters that vary for good $j$ are the same as for good $k$. This assumption is inconsistent with some empirical settings, especially outside of discrete choice. For example, it is not satisfied in the bundles model of Gentzkow [2007], where internet speed varies for online news but not for print news.
Placing restrictions that relate coefficients across different goods allows one either to weaken the conditions used for identification or to use alternative techniques. See Appendix B.3 for additional discussion and the relation to Chernozhukov et al. [2019a].
Remark. The conditions of Theorem 1 have testable implications because of theoretical relationships between different moments.
To see this, we revisitthe system of equations (3) used previously to illustrate the identification technique.Dividing the first and third equations, and the second and fourth equations, we obtain B x B x Y p qB x B x Y p q “ ş β ν p dβ q ş β β ν p dβ q B x B x Y p qB x B x Y p q “ ş β ν p dβ q ş β β ν p dβ q . Multiplying these equations and using the Cauchy-Schwarz inequality yields thetestable restriction B x B x Y p qB x B x Y p q ¨ B x B x Y p qB x B x Y p q ě . Note that this inequality only concerns the average structural function close to 0.
Remark. The results in this paper establish identification of certain moments of the distribution of random coefficients. It is natural to wonder whether the distribution can be identified without requiring that it be uniquely determined by its moments. A related question has been studied previously. Specifically, consider the setting
$$Y = A + B' Z,$$
where $Z$ is independent of $(A, B)$ and the distribution of $(Y, Z)$ is identified. Building on Belisle et al. [1997], Masten [2017] shows that, if the support of $Z$ is bounded, then identification of either the distribution of $(A, B)$ or even the marginal distributions of $B_k$ requires that the distribution of $(A, B)$ be determined by its moments. In light of this, it is possible that, given only the conditions of Theorem 1 for each $M$, identification of the distribution of $\beta$ requires that it be determined by its moments. Recall that we only assume identification of the average structural function $Y(x)$ in a bounded set around 0.
References
Victor H Aguiar. Random categorization and bounded rationality. Economics Letters, 159:46–52, 2017.
Roy Allen. Injectivity and the law of demand. Available at SSRN 3437946, 2019.
Roy Allen and John Rehbeck. Identification with additively separable heterogeneity. Econometrica, 87(3):1021–1054, 2019a.
Roy Allen and John Rehbeck. Revealed stochastic choice with attributes. Available at SSRN 2818041, 2019b.
Roy Allen and John Rehbeck. Latent complementarity in bundles models. Working Paper, 2019c.
Roy Allen and John Rehbeck. Hicksian complementarity and perturbed utility models. Working Paper, 2019d.
Joseph G Altonji and Rosa L Matzkin. Cross section and panel data estimators for nonseparable models with endogenous regressors. Econometrica, 73(4):1053–1102, 2005.
Simon P Anderson, André De Palma, and J-F Thisse. A representative consumer theory of the logit model. International Economic Review, pages 461–466, 1988.
Simon P Anderson, André De Palma, and Jacques François Thisse. Discrete choice theory of product differentiation. Cambridge, MA: MIT Press, 1992.
Claude Belisle, Jean-Claude Massé, Thomas Ransford, et al. When is a probability measure determined by infinitely many projections? The Annals of Probability, 25(2):767–786, 1997.
Steven Berry and Ariel Pakes. The pure characteristics demand model. International Economic Review, 48(4):1193–1225, 2007.
Steven Berry, James Levinsohn, and Ariel Pakes. Automobile prices in market equilibrium. Econometrica, pages 841–890, 1995.
Steven T Berry. Estimating discrete-choice models of product differentiation. The RAND Journal of Economics, pages 242–262, 1994.
Steven T Berry and Philip A Haile. Nonparametric identification of multinomial choice demand models with heterogeneous consumers. Technical report, National Bureau of Economic Research, 2009.
Steven T Berry and Philip A Haile. Identification in differentiated products markets using market level data. Econometrica, 82(5):1749–1797, 2014.
Richard Blundell and James L Powell. Endogeneity in nonparametric and semiparametric regression models. Econometric Society Monographs, 36:312–357, 2003.
J Hayden Boyd and Robert E Mellman. The effect of fuel economy standards on the US automotive market: an hedonic demand analysis. Transportation Research Part A: General, 14(5-6):367–378, 1980.
Richard L Brady and John Rehbeck. Menu-dependent stochastic feasibility. Econometrica, 84(3):1203–1223, 2016.
Richard A Briesch, Pradeep K Chintagunta, and Rosa L Matzkin. Nonparametric discrete choice models with unobserved heterogeneity. Journal of Business & Economic Statistics, 28(2):291–307, 2010.
N Scott Cardell and Frederick C Dunbar. Measuring the societal impacts of automobile downsizing. Transportation Research Part A: General, 14(5-6):423–434, 1980.
Xiaohong Chen and Timothy M Christensen. Optimal sup-norm rates and uniform inference on nonlinear functionals of nonparametric IV regression. Quantitative Economics, 9(1):39–84, 2018.
Victor Chernozhukov, Whitney K Newey, and Andres Santos. Constrained conditional moment restriction models. arXiv preprint arXiv:1509.06311, 2015.
Victor Chernozhukov, Iván Fernández-Val, and Whitney K Newey. Nonseparable multinomial choice models in cross-section and panel data. Journal of Econometrics, 211(1):104–116, 2019a.
Victor Chernozhukov, Alfred Galichon, Marc Henry, and Brendan Pass. Single market nonparametric identification of multi-attribute hedonic equilibrium models. arXiv preprint arXiv:1709.09570, 2019b.
Khai Chiong, Yu-Wei Hsieh, and Matthew Shum. Counterfactual estimation in semiparametric discrete-choice models. Technical report, 2017. URL http://dx.doi.org/10.2139/ssrn.2979446.
Khai Xiang Chiong, Alfred Galichon, and Matt Shum. Duality in dynamic discrete-choice models. Quantitative Economics, 7(1):83–115, 2016.
Christopher T Conlon and Julie Holland Mortimer. Demand estimation under incomplete product availability. American Economic Journal: Microeconomics, 5(4):1–30, 2013.
Angus Deaton and John Muellbauer. An almost ideal demand system. The American Economic Review, 70(3):312–326, 1980.
Xavier D'Haultfœuille and Philippe Février. Identification of nonseparable triangular models with discrete instruments. Econometrica, 83(3):1199–1210, 2015.
T Domencich and D McFadden. Urban travel demand: a behavioural approach. North-Holland, Amsterdam, 1975.
Fabian Dunker, Stefan Hoderlein, and Hiroaki Kaido. Nonparametric identification of endogenous and heterogeneous aggregate demand models: complements, bundles and the market level. Working Paper, 2017.
Fabian Dunker, Stefan Hoderlein, Hiroaki Kaido, and Robert Sherman. Nonparametric identification of the distribution of random coefficients in binary response static games of complete information. Journal of Econometrics, 206(1):83–102, 2018.
Ivar Ekeland, James J Heckman, and Lars Nesheim. Identifying hedonic models. American Economic Review, 92(2):304–309, 2002.
Ivar Ekeland, James J Heckman, and Lars Nesheim. Identification and estimation of hedonic models. Journal of Political Economy, 112(S1):S60–S109, 2004.
Kfir Eliaz and Ran Spiegler. Consideration sets and competitive marketing. The Review of Economic Studies, 78(1):235–262, 2011.
Daniel Ershov, Jean-William Laliberté, and Scott Orr. Mergers in a model with complementarity. Technical report, 2018. Working Paper.
Mogens Fosgerau, Julien Monardo, and André De Palma. The inverse product differentiation logit model. 2019.
Jeremy T Fox. A note on nonparametric identification of distributions of random coefficients in multinomial choice models. Technical report, National Bureau of Economic Research, 2017.
Jeremy T Fox and Amit Gandhi. Nonparametric identification and estimation of random coefficients in multinomial choice models. The RAND Journal of Economics, 47(1):118–139, 2016.
Jeremy T Fox and Natalia Lazzati. A note on identification of discrete choice models for bundles and binary games. Quantitative Economics, 8(3):1021–1036, 2017.
Jeremy T Fox, Kyoo il Kim, Stephen P Ryan, and Patrick Bajari. The random coefficients logit model is identified. Journal of Econometrics, 166(2):204–212, 2012.
Drew Fudenberg, Ryota Iijima, and Tomasz Strzalecki. Stochastic choice and revealed perturbed utility. Econometrica, 83(6):2371–2409, 2015.
Xavier Gabaix. Chapter 4 - Behavioral inattention. In B. Douglas Bernheim, Stefano DellaVigna, and David Laibson, editors, Handbook of Behavioral Economics - Foundations and Applications 2, volume 2 of Handbook of Behavioral Economics: Applications and Foundations 1, pages 261–343. North-Holland, 2019. doi: https://doi.org/10.1016/bs.hesbe.2018.11.001.
Alfred Galichon. Optimal transport methods in economics. Princeton University Press, 2018.
Alfred Galichon and Bernard Salanié. Cupid's invisible hand: Social surplus and identification in matching models. Technical report, 2015. URL http://dx.doi.org/10.2139/ssrn.2979446.
Eric Gautier and Yuichi Kitamura. Nonparametric estimation in random coefficients binary choice models. Econometrica, 81(2):581–607, 2013.
Matthew Gentzkow. Valuing new goods in a model with complementarity: Online newspapers. American Economic Review, 97(3):713–744, 2007.
William M Gorman. A possible procedure for analysing quality differentials in the egg market. The Review of Economic Studies, 47(5):843–856, 1980.
Florian Gunsilius. Nonparametric point-identification of multivariate models with binary instruments. Working Paper, 2019.
Jerry A Hausman and David A Wise. A conditional probit model for qualitative choice: Discrete decisions recognizing interdependence and heterogeneous preferences. Econometrica: Journal of the Econometric Society, pages 403–426, 1978.
James J Heckman. Micro data, heterogeneity, and the evaluation of public policy: Nobel lecture. Journal of Political Economy, 109(4):673–748, 2001.
James J Heckman, Rosa L Matzkin, and Lars Nesheim. Nonparametric identification and estimation of nonadditive hedonic models. Econometrica, 78(5):1569–1591, 2010.
Chris C Heyde. On a property of the lognormal distribution. Journal of the Royal Statistical Society: Series B (Methodological), 25(2):392–393, 1963.
Josef Hofbauer and William H Sandholm. On the global convergence of stochastic fictitious play. Econometrica, 70(6):2265–2294, 2002.
Alessandro Iaria and Ao Wang. Identification and estimation of demand for bundles. Technical report, 2019. Working Paper.
Hidehiko Ichimura and T Scott Thompson. Maximum likelihood estimation of a binary choice model with random coefficients of unknown distribution. Journal of Econometrics, 86(2):269–295, 1998.
Kyoo il Kim. Identification of the distribution of random coefficients in static and dynamic discrete choice models. The Korean Economic Review, 30(2):191–216, 2014.
Guido W Imbens and Whitney K Newey. Identification and estimation of triangular simultaneous equations models without additivity. Econometrica, 77(5):1481–1512, 2009.
Nail Kashaev. Identification of semiparametric discrete outcome models with bounded covariates. arXiv preprint arXiv:1811.05555, 2018.
Kelvin J Lancaster. A new approach to consumer theory. Journal of Political Economy, 74(2):132–157, 1966.
Arthur Lewbel and Krishna Pendakur. Unobserved preference heterogeneity in demand using generalized random coefficients. Journal of Political Economy, 125(4):1100–1148, 2017.
Charles F Manski. The structure of random utility models. Theory and Decision, 8(3):229–254, 1977.
Paola Manzini and Marco Mariotti. Stochastic choice and consideration sets. Econometrica, 82(3):1153–1176, 2014.
Yusufcan Masatlioglu, Daisuke Nakajima, and Erkut Y Ozbay. Revealed attention. American Economic Review, 102(5):2183–2205, 2012.
Matthew A Masten. Random coefficients on endogenous variables in simultaneous equations models. The Review of Economic Studies, 85(2):1193–1250, 2017.
Rosa L Matzkin. Restrictions of economic theory in nonparametric methods. Handbook of Econometrics, 4:2523–2558, 1994.
Rosa L Matzkin. Constructive identification in some nonseparable discrete choice models. Journal of Econometrics, 211(1):83–103, 2019.
Robert J McCann. Existence and uniqueness of monotone measure-preserving maps. Duke Mathematical Journal, 80(2):309–324, 1995.
Daniel McFadden. Econometric models of probabilistic choice. In Charles F. Manski and Daniel McFadden, editors, Structural Analysis of Discrete Data with Econometric Applications, pages 198–272. Cambridge, MA: MIT Press, 1981.
Daniel L McFadden and Mogens Fosgerau. A theory of the perturbed consumer with general budgets. Technical report, National Bureau of Economic Research, 2012.
Julien Monardo. The flexible inverse logit (FIL) model. Available at SSRN 3388972, 2019.
Aviv Nevo. A practitioner's guide to estimation of random-coefficients logit models of demand. Journal of Economics & Management Strategy, 9(4):513–548, 2000.
Whitney K Newey and James L Powell. Instrumental variable estimation of nonparametric models. Econometrica, 71(5):1565–1578, 2003.
R Tyrrell Rockafellar. Convex analysis. Number 28. Princeton University Press, 1970.
Xiaoxia Shi, Matthew Shum, and Wei Song. Estimating semi-parametric panel multinomial choice models using cyclic monotonicity. Econometrica, 86(2):737–761, 2018.
Alexander Torgovitsky. Identification of nonseparable models using instruments with small support. Econometrica, 83(3):1185–1197, 2015.
Kenneth E Train. Discrete Choice Methods with Simulation. Cambridge University Press, 2009.
Ao Wang. A BLP demand model of product-level market shares with complementarity. Technical report, 2020.
Appendix A Proofs of Main Results
A.1 Preliminary Lemmas
Proof of Lemma 1.
This follows line by line from the proof of Allen and Rehbeck [2019a], Theorem 1. The statement of that result included an additional Assumption 1, which was not used in the proof as long as the underlying choice is appropriately measurable. Here, we start with $Y$ of the form $Y(X, \beta, \varepsilon)$, which is automatically a measurable function.
Proof of Lemma 2.
See Allen and Rehbeck [2019a], Lemma 1. The result may also be proven directly from Rockafellar [1970], Theorems 23.5 and 25.1.
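As a hedged numerical illustration of the envelope theorem behind Lemma 2, the sketch below considers a hypothetical binary perturbed utility problem with no random intercept, $\max_{y \in [0,1]} \; y u - y \log y - (1-y)\log(1-y)$. It computes the maximizer by ternary search (the objective is strictly concave), evaluates the value function $V(u)$ at the maximizer, and compares a finite-difference derivative of $V$ with the maximizer. The entropy perturbation is an assumption chosen for convenience; for it, the maximizer is also known in closed form as the logistic function $1/(1 + e^{-u})$, which serves as a second check.

```python
import math


def xlogx(t):
    """t * log(t) with the convention 0 * log(0) = 0."""
    return 0.0 if t <= 0.0 else t * math.log(t)


def objective(y, u):
    """Hypothetical perturbed utility: y * u plus an entropy perturbation."""
    return y * u - xlogx(y) - xlogx(1.0 - y)


def maximizer(u, tol=1e-12):
    """Ternary search for the argmax of the strictly concave objective on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1, u) < objective(m2, u):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


def value(u):
    """Value function V(u) = max_y objective(y, u), computed numerically."""
    return objective(maximizer(u), u)


def envelope_gap(u, h=1e-5):
    """Finite-difference V'(u) minus the maximizer y*(u); near 0 by the envelope theorem."""
    dv = (value(u + h) - value(u - h)) / (2.0 * h)
    return dv - maximizer(u)


for u in (-2.0, 0.0, 1.5):
    print(u, maximizer(u), 1.0 / (1.0 + math.exp(-u)), envelope_gap(u))
```

Because the value function is flat in $y$ at the optimum, small errors in the numerically computed maximizer barely affect $V(u)$, which is the same mechanism that makes the envelope theorem work.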
Proof of Lemma 3.
The function $V$ is convex. The result then follows from Rockafellar [1970], Theorem 4.5, plus repeated differentiation.
A.2 Proof of Theorem 1
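The following lemmas maintain the assumptions of Theorem 1. These assumptions provide the requisite smoothness and ensure that the following arguments do not divide by zero.
To simplify presentation, we introduce some additional notation. Let $(\gamma, \xi)$ be a tuple with $\gamma \in \{1, \dots, K\}^M$ denoting good indices, and let $\xi_m \in \{1, \dots, d_{\gamma_m}\}$ describe which characteristic corresponds to the $\gamma_m$-th good, for $m \in \{1, \dots, M\}$. We set
$$\partial_{(\gamma,\xi)} Y_k(0, \beta) = \partial_{x_{\gamma_1,\xi_1}} \cdots \partial_{x_{\gamma_M,\xi_M}} Y_k(0, \beta).$$
As shorthand, we also write the product of the coefficients of $\beta$ for the characteristics of $(\gamma, \xi)$ as
$$\beta_{(\gamma,\xi)} = \beta_{\gamma_1,\xi_1} \cdots \beta_{\gamma_M,\xi_M}.$$
Similarly, $\partial_{\gamma} V(0) = \partial_{\gamma_1} \cdots \partial_{\gamma_M} V(0)$ denotes the corresponding partial derivatives of $V$ with respect to the utility indices.
Lemma A.1.
$$\partial_{(\gamma,\xi)} Y_k(0, \beta) = \partial_{\gamma} \partial_k V(0) \, \beta_{(\gamma,\xi)}.$$
Proof. Lemma 2 establishes
$$Y_k(x, \beta) = \partial_k V(\beta_1' x_1, \dots, \beta_K' x_K).$$
Differentiating with respect to $x_{\gamma_1,\xi_1}$ and evaluating at $x = 0$ yields
$$\partial_{x_{\gamma_1,\xi_1}} Y_k(0, \beta) = \partial_{\gamma_1} \partial_k V(0) \, \beta_{\gamma_1,\xi_1}.$$
By repeating the differentiation process before evaluating at $x = 0$, we obtain the result. This uses that the $j$-th regressors $x_j$ are excluded from the desirability indices of the other goods.
Lemma A.2.
$$\partial_{(\gamma,\xi)} Y_k(0) = \partial_{\gamma} \partial_k V(0) \int \beta_{(\gamma,\xi)} \, d\nu(\beta).$$
Proof.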
We obtain
$$\begin{aligned}
\partial_{(\gamma,\xi)} Y_k(0) &= \partial_{x_{\gamma_1,\xi_1}} \cdots \partial_{x_{\gamma_M,\xi_M}} Y_k(0) \\
&= \int \partial_{x_{\gamma_1,\xi_1}} \cdots \partial_{x_{\gamma_M,\xi_M}} Y_k(0, \beta)\, d\nu(\beta) \\
&= \int \partial_\gamma \partial_k V(0)\, \beta_{(\gamma,\xi)}\, d\nu(\beta) \\
&= \partial_\gamma \partial_k V(0) \int \beta_{(\gamma,\xi)}\, d\nu(\beta),
\end{aligned}$$
where the interchange of integration and differentiation in the second equality follows from Assumption 5(i), the third equality is Lemma A.1, and the final equality holds because $\partial_\gamma \partial_k V(0)$ is a constant that does not depend on $\beta$.

Combining Lemma A.2 and Assumption 6 ensures that there exist good and characteristic indices $(\gamma, \xi)$ such that $\partial_{(\gamma,\xi)} Y_k(0) \neq 0$. To see this, recall that Assumption 6 requires that for each collection of good indices $\gamma$ we can find characteristic indices $\xi$ such that $\int \beta_{(\gamma,\xi)}\, d\nu(\beta) \neq 0$. Given Assumption 5(iv) that $\partial_\gamma \partial_k V(0) \neq 0$, Lemma A.2 shows that we must have $\partial_{(\gamma,\xi)} Y_k(0) \neq 0$.

Lemma A.3.
Fix $j, k \in \{1, \ldots, K\}$ and let $\gamma, \delta \in \{1, \ldots, K\}^M$. Suppose that each good index shows up exactly the same number of times in $(\gamma, k)$ and $(\delta, j)$. Then $\partial_\gamma \partial_k V(0) = \partial_\delta \partial_j V(0)$.

Proof.
This is a slight restatement of Lemma 3.

Suppose now that $(\delta, \eta)$ is defined similarly to $(\gamma, \xi)$. That is, $\delta \in \{1, \ldots, K\}^M$ denotes good indices and $\eta_m \in \{1, \ldots, d_{\delta_m}\}$ indexes a characteristic of the $\delta_m$-th good. Combining the previous two lemmas, we obtain that if each good index shows up exactly the same number of times in $(\gamma, k)$ and $(\delta, j)$, then
$$\frac{\partial_{(\gamma,\xi)} Y_k(0)}{\partial_{(\delta,\eta)} Y_j(0)} = \frac{\int \beta_{(\gamma,\xi)}\, d\nu(\beta)}{\int \beta_{(\delta,\eta)}\, d\nu(\beta)} \qquad (8)$$
whenever the denominator is nonzero. Thus, if the moment in the denominator is identified, the moment in the numerator is identified as well.

Lemma A.4.
Suppose $\gamma, \delta \in \{1, \ldots, K\}^M$ differ in at most one component and $\int \beta_{(\delta,\eta)}\, d\nu(\beta)$ is identified and nonzero. Then for every tuple $\xi$ of characteristic indices, $\int \beta_{(\gamma,\xi)}\, d\nu(\beta)$ is identified.

Proof. Since $\gamma$ and $\delta$ differ in at most one component, we can choose $k$ and $j$ so that each good index shows up exactly the same number of times in $(\gamma, k)$ and $(\delta, j)$: if $\gamma_m \neq \delta_m$, take $k = \delta_m$ and $j = \gamma_m$; otherwise take $k = j$. Lemmas A.2 and A.3 then imply (8), since
$$\frac{\partial_{(\gamma,\xi)} Y_k(0)}{\partial_{(\delta,\eta)} Y_j(0)} = \frac{\partial_\gamma \partial_k V(0) \int \beta_{(\gamma,\xi)}\, d\nu(\beta)}{\partial_\delta \partial_j V(0) \int \beta_{(\delta,\eta)}\, d\nu(\beta)} = \frac{\int \beta_{(\gamma,\xi)}\, d\nu(\beta)}{\int \beta_{(\delta,\eta)}\, d\nu(\beta)}.$$
The term $\int \beta_{(\gamma,\xi)}\, d\nu(\beta)$ is identified because all other parts of (8) are identified.

Note that in Lemma A.4 the $\gamma$ and $\delta$ terms can be the same. This covers the nontrivial $K = 1$ case. Theorem 1 requires that $\int \beta_{1,1}^M\, \nu(d\beta)$ be known. We present a lemma that drops this assumption for the moment; it will be used in subsequent results.

Lemma A.5. If $\int \beta_{1,1}^M\, d\nu(\beta)$ is not known, then we still identify the ratio of any $M$-th order moments
$$\frac{\int \beta_{(\gamma,\xi)}\, d\nu(\beta)}{\int \beta_{(\delta,\eta)}\, d\nu(\beta)},$$
provided the denominator is nonzero.

Proof. Start with good indices $\delta_0 = (1, \ldots, 1)$ of length $M$, and characteristic indices $\eta_0$ such that the corresponding moment of $\beta$ is nonzero. Applying Lemma A.4 to the pair with goods $\delta_1 = (2, 1, \ldots, 1)$ and characteristic indices $\eta_1$, we identify the ratio
$$\frac{\int \beta_{(\delta_1,\eta_1)}\, d\nu(\beta)}{\int \beta_{(\delta_0,\eta_0)}\, d\nu(\beta)}.$$
We can repeat this procedure with $(\delta_1, \eta_1)$ and a tuple $\delta_2$ that differs from $\delta_1$ in one component, with appropriately chosen characteristic indices $\eta_2$, and so forth, to construct a sequence $\delta_0, \delta_1, \ldots$ that reaches all possible tuples of good indices $\gamma \in \{1, \ldots, K\}^M$. At each step, we change the good indices one component at a time and then apply (8). This identifies the ratio of two adjacent moments in this sequence. We avoid dividing by zero because of the relevance condition (Assumption 6), which implies that for each set of goods $\delta$ we can find tuples of characteristics $\eta$ such that $\int \beta_{(\delta,\eta)}\, d\nu(\beta)$ is nonzero. By multiplication we can identify new ratios.
For example, a ratio involving $\delta_0$ and $\delta_2$ is identified via
$$\frac{\int \beta_{(\delta_2,\eta_2)}\, d\nu(\beta)}{\int \beta_{(\delta_0,\eta_0)}\, d\nu(\beta)} = \left(\frac{\int \beta_{(\delta_2,\eta_2)}\, d\nu(\beta)}{\int \beta_{(\delta_1,\eta_1)}\, d\nu(\beta)}\right) \left(\frac{\int \beta_{(\delta_1,\eta_1)}\, d\nu(\beta)}{\int \beta_{(\delta_0,\eta_0)}\, d\nu(\beta)}\right).$$
From these arguments, for each pair of good indices $\gamma$ and $\delta$ we can find some tuples of characteristic indices $\xi$ and $\eta$ such that
$$\frac{\int \beta_{(\gamma,\xi)}\, d\nu(\beta)}{\int \beta_{(\delta,\eta)}\, d\nu(\beta)}$$
is identified, where the numerator and denominator are nonzero. From Lemma A.4, the ratio
$$\frac{\int \beta_{(\delta,\tilde{\eta})}\, d\nu(\beta)}{\int \beta_{(\delta,\eta)}\, d\nu(\beta)}$$
is identified for any vector of characteristic indices $\tilde{\eta}$, where $\eta$ is chosen so that the denominator is nonzero. Thus, we identify the ratio of all moments.

Using Lemma A.5, we conclude that if we fix $\int \beta_{1,1}^M\, d\nu(\beta)$ in advance and it is nonzero, we identify all moments. However, we could fix any other nonzero $M$-th order moment and also obtain identification.

Finally, from Lemma A.2 we have for all $\gamma \in \{1, \ldots, K\}^M$ that
$$\partial_{(\gamma,\xi)} Y_k(0) = \partial_\gamma \partial_k V(0) \int \beta_{(\gamma,\xi)}\, d\nu(\beta).$$
Moreover, from Assumption 6 we can find some $\xi$ such that the right-hand side is nonzero. By dividing, we identify $\partial_\gamma \partial_k V(0)$, completing the proof of Theorem 1.

A.3 Proof of Proposition 1
First, from the envelope theorem (see Lemma A.2 above),
$$\frac{\partial Y_1(0)}{\partial x_{1,1}} = \partial_{1,1} V(0) \int \beta_{1,1}\, \nu(d\beta).$$
The function $V$ is convex and hence $\partial_{1,1} V(0) > 0$, so $\partial Y_1(0) / \partial x_{1,1}$ and $\int \beta_{1,1}\, d\nu(\beta)$ have the same sign. Thus, the sign of the first moment of $\beta_{1,1}$ is identified from the display above, and its magnitude is assumed known in Assumption 7.

We prove the remainder of the result by induction on $M$. Recall that with $M = 1$, first-order moments are identified from Theorem 1 using the assumption that $\int \beta_{1,1}\, \nu(d\beta)$ is known and nonzero.

Now, fix an $M$ such that $1 \leq M \leq \bar{M} - 1$. As the inductive hypothesis, we assume all $M$-th order moments $\int \beta_{(\delta,\eta)}\, d\nu(\beta)$ are identified for all $\delta \in \{1, \ldots, K\}^M$ and $\eta$. We show that all $(M+1)$-th order moments are identified. If $K \geq 2$, for $\delta \in \{2, \ldots, K\}^M$ (i.e. no good index is equal to 1) we can find a collection of characteristic indices $\eta$ such that $\int \beta_{(\delta,\eta)}\, d\nu(\beta) \neq 0$. If instead $K = 1$, we can set $\delta$ as the length-$M$ vector of 1's and let $\eta$ be some collection of characteristic indices with $\eta_m \neq 1$ for every $m$, chosen so that
$$\int \beta_{1,\eta_1} \cdots \beta_{1,\eta_M}\, d\nu(\beta) \neq 0.$$
In either case, $K = 1$ or $K \geq 2$, set $\tilde{\delta} = (\delta, 1)$ and $\tilde{\eta} = (\eta, 1)$. Then we obtain
$$\int \beta_{(\tilde{\delta},\tilde{\eta})}\, d\nu(\beta) = \int \beta_{1,1}\, d\nu(\beta) \int \beta_{(\delta,\eta)}\, d\nu(\beta)$$
because $\beta_{1,1}$ is independent of all other components of $\beta$ under the measure $\nu$, and the tuple $(\delta, \eta)$ does not include the first characteristic of good 1. In particular, we identify
$$\int \beta_{(\tilde{\delta},\tilde{\eta})}\, d\nu(\beta),$$
which is nonzero because it is the product of two nonzero terms. From Lemma A.5, we identify the ratio of every $(M+1)$-th order moment to $\int \beta_{(\tilde{\delta},\tilde{\eta})}\, d\nu(\beta)$. Since $\int \beta_{(\tilde{\delta},\tilde{\eta})}\, d\nu(\beta)$ is known and nonzero, we identify all $(M+1)$-th order moments. To identify the derivatives of $V$, use Lemma A.2 as in the proof of Theorem 1.

A.4 Proof of Proposition 2
Let $x = (x_1, \ldots, x_K)$ satisfy $x_{k,1} \in [\underline{x}_{k,1}, \overline{x}_{k,1}]$ for each $k$, and $x_{k,j} = 0$ for $j > 1$. Let $x_{:,1} = (x_{1,1}, \ldots, x_{K,1})$ be the vector of first characteristics of the goods. From Lemma 2 and integrating over $\beta$, we obtain $Y(x) = \nabla V(x_{:,1})$, where $V$ is convex.

Consider initial characteristic values $x^I$ and final characteristic values $x^F$ such that for all $k \in \{1, \ldots, K\}$ and for all $j > 1$, the equality $x^I_{k,j} = x^F_{k,j} = 0$ holds. Integrating $Y$ along the line segment from $x^I$ to $x^F$, we obtain
$$V(x^F_{:,1}) - V(x^I_{:,1}) = \int_0^1 Y\big(t x^F + (1 - t) x^I\big) \cdot \big(x^F_{:,1} - x^I_{:,1}\big)\, dt,$$
where Riemann integrability follows from Rockafellar [1970], Corollary 24.2.1.

Appendix B Supplemental Results
B.1 Plug-in Estimation
The proof of Theorem 1 is based on multiplying derivative ratios. By directly plugging in an estimator of the $M$-th order derivatives, one can construct an estimator of the $M$-th order moments. Suppose we have an estimator $\partial_{x_{k_1,\ell_1}} \cdots \partial_{x_{k_M,\ell_M}} \hat{Y}_1(0)$ of the associated $M$-th order derivative of the average structural function, where $k_m = 1$ for every $m$ except one index $\tilde{m}$, for which $k_{\tilde{m}} = j$. In addition, suppose we have an estimator $\partial_{x_{1,1}} \cdots \partial_{x_{1,1}} \hat{Y}_j(0)$ of the $M$-th order partial derivative of the average structural function with respect to the $x_{1,1}$ regressor. We construct an estimator
$$\widehat{\int \beta_{k_1,\ell_1} \cdots \beta_{k_M,\ell_M}\, \nu(d\beta)} = \frac{\partial_{x_{k_1,\ell_1}} \cdots \partial_{x_{k_M,\ell_M}} \hat{Y}_1(0)}{\partial_{x_{1,1}} \cdots \partial_{x_{1,1}} \hat{Y}_j(0)},$$
where for simplicity we assume $\int \beta_{1,1}^M\, \nu(d\beta) = 1$. (More generally, we need it to be known a priori and nonzero.) Using Equation (8), we see that
$$\int \beta_{k_1,\ell_1} \cdots \beta_{k_M,\ell_M}\, \nu(d\beta) - \widehat{\int \beta_{k_1,\ell_1} \cdots \beta_{k_M,\ell_M}\, \nu(d\beta)} = \frac{\partial_{x_{k_1,\ell_1}} \cdots \partial_{x_{k_M,\ell_M}} Y_1(0)}{\partial_{x_{1,1}} \cdots \partial_{x_{1,1}} Y_j(0)} - \frac{\partial_{x_{k_1,\ell_1}} \cdots \partial_{x_{k_M,\ell_M}} \hat{Y}_1(0)}{\partial_{x_{1,1}} \cdots \partial_{x_{1,1}} \hat{Y}_j(0)}.$$
Thus, estimation error on the right-hand side translates to estimation error on the left-hand side for the $M$-th order moment of $\beta$. Note that the choice of the $j$-th good is arbitrary here, and so one could also construct an estimator with the right-hand side replaced by an average over ratios with respect to different goods.

This argument can be generalized to additional moments. Here, we use the fact that we are interested in an $M$-th order moment that is only one good index away from being a vector of 1's. For other $M$-th order moments, our constructive formulas show that one must multiply additional derivative ratios; see in particular the proof of Lemma A.5.

B.2 V Known and Relation to Fox et al. [2012]
Theorem 1 identifies the $M$-th order moments of $\beta$ when we fix $\int \beta_{1,1}^M\, \nu(d\beta)$. By fixing the entire distribution of $\beta_{1,1}$, we can identify all moments of $\beta$. An alternative assumption is that $V$ is known. If we impose this assumption, then we can drop Assumption 4, which provides knowledge of each $M$-th order moment of $\beta_{1,1}$, and Assumption 6, which requires that a rich collection of moments be nonzero. The intuition for why we can relax the scale assumption on moments is that here we instead set the scale of $V$.

Assumption B.1. $V$ is known in a neighborhood of $0$, up to an additive constant.

Proposition B.1.
Let Assumptions 1–3, 5, and B.1 hold with the same natural number $M$. Each $M$-th order moment $\int \beta_{k_1,\ell_1} \cdots \beta_{k_M,\ell_M}\, d\nu(\beta)$ is identified.

Proof. Lemma A.2 holds under the assumptions of this proposition, and so for each tuple of good indices $\gamma \in \{1, \ldots, K\}^M$ and characteristic indices $\xi$ we have
$$\partial_{(\gamma,\xi)} Y_k(0) = \partial_\gamma \partial_k V(0) \int \beta_{(\gamma,\xi)}\, d\nu(\beta).$$
Since $\partial_\gamma \partial_k V(0)$ is known under Assumption B.1 and is nonzero by Assumption 5(iv), we identify $\int \beta_{(\gamma,\xi)}\, \nu(d\beta)$ by
$$\frac{\partial_{(\gamma,\xi)} Y_k(0)}{\partial_\gamma \partial_k V(0)} = \int \beta_{(\gamma,\xi)}\, d\nu(\beta).$$

The proof demonstrates that, in fact, we only need $\partial_\gamma \partial_k V(0)$ to be known and nonzero for some $k$ in order to identify a corresponding moment of $\beta$. For further relation to Theorem 1, consider good indices $\gamma = (1, \ldots, 1)$ and characteristic indices $\xi = (1, \ldots, 1)$. Then, as in the proof of Proposition B.1, we obtain
$$\partial_{x_{1,1}}^M Y_k(0) = \partial_1^M \partial_k V(0) \int \beta_{1,1}^M\, d\nu(\beta).$$
This shows that one can either fix the $M$-th moment of $\beta_{1,1}$ or fix $\partial_1^M \partial_k V(0)$ for some $k$, and then the other can be identified. Thus, Assumption 4 can be replaced in Theorem 1 by the assumption that $\partial_1^M \partial_k V(0)$ is known for some $k$. Alternatively, if we assume $\beta_{1,1}$ is independent of the remaining components of $\beta$ as in Section 3.1, then specifying $\partial_{1,1} V(0)$ identifies $\int \beta_{1,1}\, d\nu(\beta)$ by the envelope theorem
$$\frac{\partial Y_1(0)}{\partial x_{1,1}} = \partial_{1,1} V(0) \int \beta_{1,1}\, d\nu(\beta).$$
Thus, independence combined with a single scale assumption on a partial derivative of $V$ can identify all moments of $\beta$ by adapting Proposition 1.

In discrete choice, Fox et al. [2012] present a constructive approach to identifying moments of the distribution of random coefficients. Specializing our analysis to discrete choice shows that their assumption that the distribution of an additive error is known (e.g. logit with known intercepts) implies identification of $V$. To see this, recall that $V$ is defined as an indirect utility function given a disturbance $\mathcal{D}$ and constraint set $\mathcal{B}$. In turn, using Lemma 1 we see that $\mathcal{D}$ and $\mathcal{B}$ are determined by the budget set $B$, the disturbance function $D$, and the measure $\mu$ over $\varepsilon$.
Thus, when the budget set and $\mu$ are known, one can find the $\mathcal{B}$ and $\mathcal{D}$ needed to compute $V$. For example, multinomial logit is described by
$$V(\vec{u}) = \max_{y \in \mathcal{B}} \sum_{k=1}^K y_k u_k + \sum_{k=1}^K \left(\alpha_k y_k - y_k \ln y_k\right),$$
where $\alpha_k$ is a nonrandom intercept for good $k$ and $\mathcal{B}$ is the probability simplex (e.g. Anderson et al. [1992]). The derivatives of $V$ yield the standard logit formula
$$Y_k(\vec{u}) = \frac{e^{\alpha_k + u_k}}{\sum_{j=1}^K e^{\alpha_j + u_j}}.$$
We conclude that Proposition B.1 generalizes a technique of Fox et al. [2012] to settings outside of discrete choice. However, Theorem 1 does not require that $V$ be known. Thus, the results in this paper complement their approach: while Theorem 1 relaxes assumptions on $V$, it instead requires an additional scale assumption (Assumption 4) and requires that a rich collection of moments be nonzero (Assumption 6).

B.3 Homogeneity of Coefficients and Relation to Chernozhukov et al. [2019a]
When $M = 2$, the proof of Theorem 1 establishes the constructive formula
$$\frac{\partial Y_k(0)}{\partial x_{j,\ell}} \Big/ \frac{\partial Y_j(0)}{\partial x_{k,m}} = \frac{\int \beta_{j,\ell}\, d\nu(\beta)}{\int \beta_{k,m}\, d\nu(\beta)}. \qquad (9)$$
A version of (9) has appeared for binary choice in Chernozhukov et al. [2019a], who also discuss identification of the ratios of $M$-th order moments of $\beta$ up to scale. They also mention that one can identify certain moments up to scale in multinomial choice. Here is one interpretation of their discussion, translated to our setup. Start with
$$\partial_{x_{k_1,\ell_1}} \partial_{x_{k_2,\ell_2}} Y_{k_3}(0) = \partial_{k_1} \partial_{k_2} \partial_{k_3} V(0) \int \beta_{k_1,\ell_1} \beta_{k_2,\ell_2}\, d\nu(\beta).$$
Now keep the good indices ($k_1$ and $k_2$) constant, but change the characteristics to get
$$\partial_{x_{k_1,\tilde{\ell}_1}} \partial_{x_{k_2,\tilde{\ell}_2}} Y_{k_3}(0) = \partial_{k_1} \partial_{k_2} \partial_{k_3} V(0) \int \beta_{k_1,\tilde{\ell}_1} \beta_{k_2,\tilde{\ell}_2}\, d\nu(\beta).$$
Since the derivatives of $V$ are taken with respect to the same arguments, we can divide these equations to identify the associated ratios of moments of $\beta$. This technique resembles an implicit function theorem argument for identification. Importantly, this technique only covers ratios of moments in which the good indices ($k_1$ and $k_2$ here) are the same, because it does not use symmetry (cf. Lemma 3). Using symmetry, this paper establishes identification of the ratio of all $M$-th order moments, not only those that have the same good indices. However, if we impose additional assumptions such as $\beta_j = \beta_k$ for all goods, then the choice of good indices does not matter. In this special case, using the equations described previously, one can identify the ratio of any second-order moments of $\beta$. Similar arguments can identify the ratio of any $M$-th order moments of $\beta$.

Fox et al. [2012] also present nonconstructive results under alternative assumptions, maintaining the assumption that $V$ is known.
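Formula (9) can be checked in a small simulated example. The sketch below assumes a random coefficients logit with nonrandom intercepts `alpha` (so slopes and intercepts are trivially independent) and a hypothetical two-point distribution $\nu$ for $\beta$ with equal weights; these primitives are illustrative choices, not taken from the text. It approximates the derivatives of the average structural function at $x = 0$ by central finite differences and compares their ratio with the ratio of first moments of $\beta$. Indices in the code are 0-based.

```python
import math


def logit_shares(u, alpha):
    """Logit shares exp(alpha_k + u_k) / sum_j exp(alpha_j + u_j)."""
    z = [a + uk for a, uk in zip(alpha, u)]
    z_max = max(z)
    e = [math.exp(v - z_max) for v in z]
    total = sum(e)
    return [v / total for v in e]


def asf(x, alpha, support, weights):
    """Average structural function: logit shares averaged over a discrete nu."""
    K = len(alpha)
    avg = [0.0] * K
    for beta, w in zip(support, weights):
        # Desirability index u_k = beta_k' x_k for each good k.
        u = [sum(b * xi for b, xi in zip(beta[k], x[k])) for k in range(K)]
        for k, p in enumerate(logit_shares(u, alpha)):
            avg[k] += w * p
    return avg


def d_asf(k, j, ell, alpha, support, weights, h=1e-5):
    """Central difference of Y_k with respect to x_{j,ell}, evaluated at x = 0."""
    K, d = len(alpha), len(support[0][0])

    def y_k(step):
        x = [[0.0] * d for _ in range(K)]
        x[j][ell] = step
        return asf(x, alpha, support, weights)[k]

    return (y_k(h) - y_k(-h)) / (2 * h)


def mean_beta(k, ell, support, weights):
    """First moment of beta_{k,ell} under the discrete distribution nu."""
    return sum(w * beta[k][ell] for beta, w in zip(support, weights))


# Hypothetical primitives: K = 3 goods, 2 characteristics each, 2 support points.
alpha = [0.5, 0.0, -0.3]
support = [
    [[1.0, 0.2], [0.5, -1.0], [2.0, 0.3]],
    [[3.0, -0.4], [1.5, 0.6], [0.0, 0.9]],
]
weights = [0.5, 0.5]

# Goods j, k and characteristics ell, m in formula (9).
j, ell, k, m = 0, 1, 2, 0
lhs = (d_asf(k, j, ell, alpha, support, weights)
       / d_asf(j, k, m, alpha, support, weights))
rhs = mean_beta(j, ell, support, weights) / mean_beta(k, m, support, weights)
print(f"derivative ratio = {lhs:.8f}, moment ratio = {rhs:.8f}")  # both close to -0.1
```

In the logit case $\partial_j \partial_k V(0) = -Y_j(0) Y_k(0)$ for $j \neq k$, which is nonzero, so both derivatives in (9) can serve as denominators. The identity holds for any discrete $\nu$, so the same check works with simulated draws of $\beta$.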