Identification of Random Coefficient Latent Utility Models∗

Roy Allen
Department of Economics
University of Western Ontario

John Rehbeck
Department of Economics
The Ohio State University

arXiv [econ.EM], Feb [...]

[...] 3, 2020

Abstract

This paper provides nonparametric identification results for random coefficient distributions in perturbed utility models. We cover discrete and continuous choice models. We establish identification using variation in mean quantities, and the results apply when an analyst observes aggregate demands but not whether goods are chosen together. We require exclusion restrictions and independence between random slope coefficients and random intercepts. We do not require regressors to have large supports or parametric assumptions.

∗ Any remaining errors are our own.

Introduction

Latent utility models with linear random coefficients have been extensively used. They have a long history in discrete choice, and have become increasingly popular due to computational advances (see e.g. Train [2009]). They now form the core demand system of most applied work involving demand for differentiated products following Berry et al. [1995]. Progress has been made on identification of these models in discrete choice, but gaps remain, even in semiparametric settings. For example, nonparametric identification of the distribution of random coefficients in the random coefficients nested logit model has not been established without unbounded regressors. More broadly, there is growing interest in models that allow complementarity, but even less is known about identification of random coefficients in these models.

The main contribution of this paper establishes nonparametric identification for the moments of random coefficients in a general class of latent utility models. The framework applies to discrete and continuous choices. As a special case, we establish identification for a bundles model with limited consideration of either alternatives or characteristics (Example 2). Identification only depends on the average structural function [Blundell and Powell, 2003]. Thus the results can be applied when one observes the average demands of individuals without observing whether goods are chosen together. Leveraging the main result, we can identify the distribution of random coefficients when it is characterized by its moments (e.g. normal distributions). Specialized to discrete choice, the main contribution is new since it does not require any regressor to be unbounded.

Two key ingredients let us get traction for identification. First, we assume independence between random slopes and random intercepts. This is a standard assumption in the widely-used random coefficients logit model. We use this assumption to integrate out the random intercepts. This smooths out demand when conditioning on [...] known, we do not.

The fundamental shape restriction we exploit is that, after integrating out random intercepts, integrated mean choices are the derivative of a convex function. This follows from an application of the envelope theorem. Similar tools from convex analysis have also been used for identification of hedonic models [Ekeland et al., 2002, 2004, Heckman et al., 2010, Chernozhukov et al., 2019b], matching [Galichon and Salanié, 2015], dynamic discrete choice [Chiong et al., 2016], discrete choice panel models [Shi et al., 2018], and perturbed models with additively separable heterogeneity [Allen and Rehbeck, 2019a], among others.

By exploiting the envelope theorem, we can treat several models in a common framework. There is little work on identification of optimizing models with linear random coefficients outside of discrete choice. Exceptions include Dunker et al. [2018] for discrete games and Dunker et al. [2017] and Iaria and Wang [2019] for a random coefficients version of the Gentzkow [2007] discrete bundles model. We differ by requiring identification of only the average structural function ("mean demands"), without needing to observe the frequency with which goods are chosen together. Identification with linear random coefficients has also been established in settings without assuming an optimizing model. See for example the simultaneous equations analysis in Masten [2017] and references therein.

Identification of linear random coefficients has been extensively studied in discrete choice. Despite this, nonparametric identification has only been established either [...] Several of these papers additionally assume large support regressors also have the same coefficient across goods. In contrast, Fox et al. [2012] and il Kim [2014] do not assume large support or a homogeneous regressor, but assume the distribution of the random intercept is known (e.g. logit). Chernozhukov et al. [2019a] discuss identification of ratios of certain moments of the distribution of random coefficients without requiring large support, but do not provide conditions under which the full distribution is identified.

The remainder of the paper proceeds as follows. Section 2 provides details on the class of latent utility models we study and examples of behavior that this covers. Section 3 provides the main result, which identifies arbitrary order moments of random coefficients and shows that a single independence and scale assumption can be used to identify all other moments. Section 4 discusses how to recover different welfare objects and perform counterfactuals. Finally, Section 5 discusses relations to some existing papers, shows how the results can be taken to settings with non-linear random coefficients, and discusses some testable properties of the framework.

Footnote: Heckman [2001] attributes the first use to Domenich and McFadden [1975] in economics.

Footnote: See Nevo [2000], p. 524-526.

Footnote: il Kim [2014] has established identification in the special case where intercept location coefficients are 0.

Footnote: Recent work includes Gentzkow [2007], McFadden and Fosgerau [2012], Fosgerau et al. [2019], Allen and Rehbeck [2019a], Ershov et al. [2018], Monardo [2019], Iaria and Wang [2019], and Wang [2020]. This work is an outgrowth of the discrete choice additive random utility model [McFadden, 1981] and differs from classic continuous demand systems (e.g. Deaton and Muellbauer [1980]) by focusing on characteristic variation rather than variation in a budget constraint.

Footnote: In discrete choice, assuming this function is known translates to the distribution of random intercepts being known (e.g. logit). Lewbel and Pendakur [2017] show that one can drop the assumption that this mapping is known in some settings if one imposes an additional additive separability assumption.

Footnote: Matzkin [1994] reviews other identification results using shape restrictions motivated by economic theory. See also work on optimal transport, as in Galichon [2018].

Footnote: Wang [2020] also works with the average structural function but does not identify the distribution of random coefficients.

This paper studies the random coefficients perturbed utility model, in which optimizing choices satisfy

$$Y(X, \beta, \varepsilon) \in \operatorname*{argmax}_{y \in B} \sum_{k=1}^{K} y_k (\beta_k' X_k) + D(y, \varepsilon). \qquad (1)$$

We interpret $Y(\cdot)$ as the quantity vector for $K$ different goods. The vector $X_k = (X_{k,1}, \ldots, X_{k,d_k})$ denotes observable shifters of the desirability of good $k$, and $\beta_k = (\beta_{k,1}, \ldots, \beta_{k,d_k})$ denotes random coefficients on these shifters, which may be good-specific. The index $\beta_k' X_k$ shifts the marginal utility of good $k$. We collect $X = (X_1, \ldots, X_K)$ and $\beta = (\beta_1, \ldots, \beta_K)$. The term $D(y, \varepsilon)$ is a disturbance that depends on unobservables $\varepsilon$ of unrestricted dimension. When $D(y, \varepsilon) = \sum_{k=1}^{K} \varepsilon_k y_k$, $\varepsilon_k$ can be interpreted as a random intercept for the desirability of the $k$-th good. In general, we refer to $D(y, \varepsilon)$ as the random intercept. The set $B \subseteq \mathbb{R}^K$ is a feasibility set. This is introduced purely for exposition, since $D(y, \varepsilon)$ can be $-\infty$, which allows random feasibility sets.

Footnote: Exceptions include Kashaev [2018] and Matzkin [2019], but neither paper studies nonparametric identification for the distribution of linear random coefficients.

The focus of this paper is on identification of moments of the distribution of $\beta$. Our results do not require specification of the budget $B$, the disturbance $D$, or the distribution of $\varepsilon$. For concreteness, we provide some examples.

Example 1 (Discrete Choice). Consider a discrete choice model with latent utility for good $k$ of the form

$$v_k = \beta_k' X_k + \varepsilon_k.$$

When $\beta$ is random, this is a linear random coefficients model as studied in Hausman and Wise [1978], Boyd and Mellman [1980], Cardell and Dunbar [1980], among many others. This fits into the setup of (1) by setting $D(y, \varepsilon) = \sum_{k=1}^{K} y_k \varepsilon_k$, $B = \{ y \in \mathbb{R}^K \mid \sum_{k=1}^{K} y_k = 1,\ y_k \geq 0 \}$ the probability simplex, and letting $Y(X, \beta, \varepsilon) \in \{0, 1\}^K$ be a vector of indicators denoting which good is chosen. In many applications, an "outside good" is set to have a utility of 0. This can be mapped to our setup by replacing the budget with $B = \{ y \in \mathbb{R}^K \mid \sum_{k=1}^{K} y_k \leq 1,\ y_k \geq 0 \}$; this allows $Y(X, \beta, \varepsilon) = (0, \ldots, 0) \in B$, which can be interpreted as the choice of the outside option.

Footnote: This allows the agent to randomize when there are utility ties.

We also cover what is sometimes called the perturbed representation of choice, which can model market shares or individuals who like variety. For example, Anderson et al. [1988] show logit models are related to the maximization problem

$$\max_{y \in \Delta} \sum_{k=1}^{K} y_k (\beta_k' X_k) - \sum_{k=1}^{K} y_k \log(y_k),$$

with $\Delta$ the probability simplex. Hofbauer and Sandholm [2002] show that, by replacing the additive entropy term with a general disturbance, the setup covers all discrete choice additive random utility models once random intercepts are integrated out. Fudenberg et al. [2015] study a model in which the disturbance is additively separable. Fosgerau et al. [2019] and Allen and Rehbeck [2019b] show how to model complementarity with the perturbed utility representation.

Example 2 (Bundles with Limited Consideration). Gentzkow [2007] presents a model of choice of bundles involving online and print news. The model involves multiple goods, and individuals can choose more than one good at the same time. A random coefficients version of the model has been studied in Dunker et al. [2017]. Let $v_{j,k}$ denote the utility associated with quantity $j$ of the first good, and quantity $k$ of the second good. Specify utilities

$$v_{0,0} = 0, \qquad v_{1,0} = \beta_1' X_1 + \varepsilon_{1,0}, \qquad v_{0,1} = \beta_2' X_2 + \varepsilon_{0,1}, \qquad v_{1,1} = v_{1,0} + v_{0,1} + \varepsilon_{1,1}.$$

Here, $\varepsilon_{1,1}$ denotes a utility boost or loss from purchasing both goods relative to the sum of their individual utility. It describes complementarity/substitutability between the goods. For each quantity vector $\vec{y} = (y_1, y_2)$, set the utility as

$$\sum_{k=1}^{2} y_k (\beta_k' X_k) + (y_1 \varepsilon_{1,0} + y_2 \varepsilon_{0,1} + y_1 y_2 \varepsilon_{1,1}),$$

and let the budget be $B = \{0, 1\}^2$. Then the optimizing quantity vector $Y(X, \beta, \varepsilon) \in \{0, 1\}^2$ fits into the setup of (1).

This can be modified to include latent budgets.
One may interpret these as mental "consideration sets" [Eliaz and Spiegler, 2011, Masatlioglu et al., 2012, Manzini and Mariotti, 2014, Aguiar, 2017] or general latent feasibility sets [Manski, 1977, Conlon and Mortimer, 2013, Brady and Rehbeck, 2016]. In addition, we can allow limited consideration of the characteristics of goods. To model these types of limited attention, consider a version of the bundles model given by

$$Y(X, \beta, \varepsilon) \in \operatorname*{argmax}_{y \in \{0,1\}^2} \sum_{k=1}^{2} y_k (\beta_k' X_k) + D(y, \varepsilon),$$

where

$$D(y, \varepsilon) = \begin{cases} y_1 \varepsilon_{1,0} + y_2 \varepsilon_{0,1} + y_1 y_2 \varepsilon_{1,1} & \text{if } y \in B(\varepsilon) \\ -\infty & \text{otherwise.} \end{cases}$$

Here, the set $B(\varepsilon)$ is a latent feasibility set, which could arise because an individual may not consider all goods or the analyst cannot observe when goods are out of stock. Some components of $\beta$ can be zero with positive probability, reflecting that individuals may not notice or care about certain characteristics.

Footnote: See Gabaix [2019] for a survey of "behavioral inattention."

This setup can be generalized to allow more goods, with some goods continuous and some goods discrete. What is key for our analysis is that the index $\beta_k' X_k$ shifts (only) the marginal utility of good $k$.

This paper establishes identification of moments of $\beta$ using the average structural function [Blundell and Powell, 2003]

$$\overline{Y}(x) = \int Y(x, \beta, \varepsilon) \, d\tau(\beta, \varepsilon)$$

for some probability measure $\tau$ that does not depend on covariates $x$. We assume that the measure $\tau$ satisfies a key independence condition.

Assumption 1 (Slope-Intercept Independence). The random variables $\beta$ and $\varepsilon$ are independent under the measure $\tau$, and the average structural function is finite.

While independence between $\beta$ and $\varepsilon$ is restrictive, it is a standard assumption in applications of the random coefficients logit model in discrete choice. It has been exploited for identification in Fox et al. [2012] and Chernozhukov et al. [2019a]. Under Assumption 1, the average structural function can be written as

$$\overline{Y}(x) = \int \int Y(x, \beta, \varepsilon) \, d\mu(\varepsilon) \, d\nu(\beta)$$

for some probability measures $\mu$ and $\nu$. Technically, full independence is not needed as long as we can factor the average structural function in this way.

Footnote: However, independence is not imposed in some papers studying identification. For example, Ichimura and Thompson [1998] or Gautier and Kitamura [2013] do not impose independence of the slope and intercept.

For an example of an average structural function, suppose $(Y, X, \beta, \varepsilon)$ are random variables that satisfy $Y = Y(X, \beta, \varepsilon)$ almost surely. Moreover, assume $X$, $\beta$, and $\varepsilon$ are all independent. In addition to independence, suppose a continuous version of the conditional mean of $Y$ given $X$ exists. Then

$$E[Y \mid X = x] = \overline{Y}(x) = \int \int Y(x, \beta, \varepsilon) \, d\mu(\varepsilon) \, d\nu(\beta)$$

for $x$ in the support of $X$, where $\mu$ is the marginal distribution of $\varepsilon$ and $\nu$ is the marginal distribution of $\beta$.

Footnote: Recall that the support of $X$ is the smallest closed set $S$ such that $P(X \in S) = 1$.

The results in this paper apply to general average structural functions $\overline{Y}(x)$, not only the conditional mean. Thus, while slope-intercept independence is important for our results, independence between $X$ and $(\beta, \varepsilon)$ is not. Therefore, the results in this paper are relevant for settings with endogeneity.

The goal of this paper is not to provide a new method to identify the average structural function, but rather to use the function to identify other features of a utility maximizing model. There is a large literature on identifying structural functions. Blundell and Powell [2003] describe how to use control functions to identify the average structural function $\overline{Y}(x)$. Altonji and Matzkin [2005] identify derivatives of the average structural function using certain conditional independence or symmetry conditions. Berry [1994], Berry et al. [1995], Newey and Powell [2003], Berry and Haile [2014], and Dunker et al. [2017] among others use instrumental variables to identify an average structural function from aggregate data.

Footnote: A key step to apply these methods is injectivity in a market-level observable to a vector of unobservable endogenous vectors, usually denoted $\xi$. See Allen [2019] or Lemma 3 in Allen and Rehbeck [2019a] for injectivity results that cover the present model when the utility index for good $k$ is $\beta_k' x_k + \xi_k$. Related injectivity results have appeared in Galichon and Salanié [2015] and Chiong et al. [2017].
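As a concrete illustration of the average structural function, the sketch below uses a hypothetical single-good binary choice model with a logistic random intercept and a two-point distribution for a scalar slope. None of these choices come from the paper; they are assumptions made only to keep the arithmetic transparent. The code contrasts the average structural function, which integrates over the marginal distribution of the slope while holding x fixed, with a conditional mean computed under a hypothetical dependence between X and the slope.

```python
import math

# Hypothetical two-point distribution nu for a scalar random slope beta.
NU = {0.5: 0.5, 2.0: 0.5}


def smoothed_choice(x, beta):
    """Probability of buying the single inside good once a logistic random
    intercept is integrated out: 1 / (1 + exp(-beta * x))."""
    return 1.0 / (1.0 + math.exp(-beta * x))


def average_structural_function(x):
    """Integrate the smoothed choice over nu while holding x fixed."""
    return sum(prob * smoothed_choice(x, beta) for beta, prob in NU.items())


def conditional_mean(x, beta_given_x):
    """E[Y | X = x] when beta has the conditional distribution beta_given_x."""
    return sum(prob * smoothed_choice(x, beta)
               for beta, prob in beta_given_x.items())


x = 1.0
asf = average_structural_function(x)
# X independent of beta: the conditional mean coincides with the ASF.
exogenous = conditional_mean(x, NU)
# Hypothetical endogeneity: observing X = 1 makes the large slope more likely.
endogenous = conditional_mean(x, {0.5: 0.2, 2.0: 0.8})
print(asf, exogenous, endogenous)
```

Under independence the two objects agree, while under the assumed dependence the conditional mean no longer equals the average structural function, which is why the latter would have to be recovered by other means such as control functions or instruments.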

An important feature of the analysis is that only the average structural function is required to be identified over an appropriate region. Thus, the full distribution of $Y(x, \cdot, \cdot)$ induced by the product measure $\mu \times \nu$ over $(\beta, \varepsilon)$ is not necessary for identification. For common discrete choice models the average structural function and the full distribution of $Y(x, \cdot, \cdot)$ contain the same information, but this is not true in general. This is particularly important when combining this analysis with work allowing endogeneity between $X$ and $(\beta, \varepsilon)$. In particular, there are well-understood methods to identify the average structural function in the presence of endogeneity as mentioned earlier. In contrast, less is known about identification of the entire distribution of $Y(x, \cdot, \cdot)$ in the presence of endogeneity.

Footnote: Imbens and Newey [2009] identify average and quantile structural functions with multidimensional heterogeneity in the outcome equation. Torgovitsky [2015] and D'Haultfœuille and Février [2015] identify the entire structural function with one-dimensional unobservable heterogeneity in the outcome equation. A multidimensional counterpart has been studied in Gunsilius [2019]. These papers all identify features of structural functions in the presence of endogeneity.

In addition, requiring only the average structural function implies that the analysis can be applied to settings outside of discrete choice without observing whether goods are chosen together. Of course if the full distribution of $Y(x, \cdot, \cdot)$ is identified, then these results apply as well. We recall that this paper does not study identification of the distribution of $\varepsilon$ in the original latent utility model (1). However, it is possible to identify the distribution of $\varepsilon$ in some cases. For example, Dunker et al. [2017] show how to identify the distribution of random intercepts in a full-consideration random coefficients bundles model, provided the analyst has aggregate data on the frequency with which goods are chosen together.

We make use of an aggregation result that first integrates out the distribution of $\varepsilon$.

Lemma 1 (Allen and Rehbeck [2019a]). Let $Y(\cdot)$ satisfy (1). For any measure $\mu$ over $\varepsilon$ such that $\int Y(x, \beta, \varepsilon) \, d\mu(\varepsilon)$ and $\int D(Y(x, \beta, \varepsilon), \varepsilon) \, d\mu(\varepsilon)$ exist and are finite, it follows that

$$\int Y(x, \beta, \varepsilon) \, \mu(d\varepsilon) \in \operatorname*{argmax}_{y \in \overline{B}} \sum_{k=1}^{K} y_k (\beta_k' x_k) + \overline{D}(y),$$

for $\overline{B}$ the convex hull of $B$, and

$$\overline{D}(y) = \sup_{\tilde{Y} \in \mathcal{Y} \,:\, \int \tilde{Y}(\varepsilon) \, d\mu(\varepsilon) = y} \int D\big(\tilde{Y}(\varepsilon), \varepsilon\big) \, d\mu(\varepsilon),$$

where $\mathcal{Y}$ is the set of $\varepsilon$-measurable functions that map to $B$. In addition, the (integrated) indirect utility function

$$V(\beta_1' x_1, \ldots, \beta_K' x_K) = \max_{y \in \overline{B}} \sum_{k=1}^{K} y_k (\beta_k' x_k) + \overline{D}(y)$$

satisfies

$$V(\beta_1' x_1, \ldots, \beta_K' x_K) = \int \left( \max_{y \in B} \sum_{k=1}^{K} y_k (\beta_k' x_k) + D(y, \varepsilon) \right) d\mu(\varepsilon).$$

Footnote: The supremum is taken to be $-\infty$ when there is no $\tilde{Y} \in \mathcal{Y}$ such that $\int \tilde{Y}(\varepsilon) \, d\mu(\varepsilon) = y$.

Note that assuming $Y(\cdot)$ satisfies (1) requires the argmax set to be nonempty. This is a behavioral restriction that imposes sufficient structure for the result to go through, and imposes minimal restrictions on $D$. In particular, $D$ can be $-\infty$ for certain combinations of $(y, \varepsilon)$ and need not be continuous. This allows us to treat limited consideration models as in Example 2.

We leverage the aggregation result from Lemma 1 to use calculus-based techniques for identification. To illustrate how aggregation can lead to smoothness, recall that $Y(x, \beta, \varepsilon)$ in discrete choice is a vector of indicators denoting which good is chosen (assuming no ties). Derivatives with respect to $x$ either do not exist at certain points, or are zero and contain little information.

We smooth choices by working with

$$\overline{Y}(x, \beta) := \int Y(x, \beta, \varepsilon) \, d\mu(\varepsilon)$$

with $\mu$ as in Assumption 1. In discrete choice, when $\varepsilon$ is integrated out, $\overline{Y}(x, \beta)$ can be interpreted as the vector of probabilities conditional on only the utility indices. However, this general framework allows us to use the same tools to address discrete and continuous choice. For example, choices could involve a single discrete choice, discrete bundle choice, a prospective matching, continuous quantities of several goods, or time use among other settings.
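The smoothing step can be sketched numerically. The example below assumes a three-good discrete choice model whose random intercepts are independent standard Gumbel draws, so that integrating them out yields the familiar logit probabilities. The utility indices, the Gumbel assumption, and all function names are hypothetical choices for illustration, not part of the paper's framework. The code averages the choice indicators over simulated intercepts and compares the result with the closed-form logit shares.

```python
import math
import random


def softmax(indices):
    """Logit choice probabilities for a vector of utility indices."""
    top = max(indices)
    weights = [math.exp(u - top) for u in indices]
    total = sum(weights)
    return [w / total for w in weights]


def gumbel(rng):
    """One standard Gumbel draw via inverse-CDF sampling."""
    u = max(rng.random(), 1e-300)  # guard against log(0)
    return -math.log(-math.log(u))


def simulated_mean_choice(indices, n_draws=200_000, seed=0):
    """Average the choice indicators over simulated random intercepts."""
    rng = random.Random(seed)
    counts = [0] * len(indices)
    for _ in range(n_draws):
        utilities = [u + gumbel(rng) for u in indices]
        counts[utilities.index(max(utilities))] += 1
    return [c / n_draws for c in counts]


# Hypothetical utility indices beta_k' x_k for K = 3 goods.
indices = [0.3, -0.4, 0.2]
simulated = simulated_mean_choice(indices)
exact = softmax(indices)
max_gap = max(abs(s - e) for s, e in zip(simulated, exact))
print(max_gap)
```

Each simulated choice is an indicator vector with uninformative derivatives, whereas the averaged vector varies smoothly with the indices, which is the property the calculus-based arguments rely on.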

We place some additional high-level sufficient conditions relative to the conclusions of Lemma 1.

Assumption 2.

Assume the following:
(i) $\overline{Y}(x, \beta) = \operatorname*{argmax}_{y \in \overline{B}} \sum_{k=1}^{K} y_k (\beta_k' x_k) + \overline{D}(y)$.
(ii) $\overline{B} \subseteq \mathbb{R}^K$ is a nonempty, closed, and convex set.
(iii) $\overline{D} : \mathbb{R}^K \to \mathbb{R} \cup \{-\infty\}$ is concave, upper semi-continuous, and finite at some $y \in \overline{B}$.

Allen and Rehbeck [2019a] provide lower-level conditions that, when combined with Lemma 1, imply this assumption. Part (i) strengthens the conclusion of Lemma 1 to obtain a unique maximizer. Concavity in part (iii) is milder than it first appears, and delivers no additional restrictions on $\overline{Y}(x, \beta)$ when the other assumptions are maintained. See the discussion in Allen and Rehbeck [2019a].

To further present the foundation of the identification results, we present a version of the envelope theorem.

Lemma 2.

Let Assumption 2 hold. It follows that

$$\overline{Y}_k(x, \beta) = \partial_k V(\beta_1' x_1, \ldots, \beta_K' x_K). \qquad (2)$$

Here, $\overline{Y}_k(x, \beta)$ is the $k$-th component of $\overline{Y}(x, \beta)$ and $\partial_k V(\beta_1' x_1, \ldots, \beta_K' x_K)$ is the derivative with respect to the $k$-th dimension of $V$ evaluated at the point $(\beta_1' x_1, \ldots, \beta_K' x_K)$. We use similar notation for the rest of the paper. Differentiability of $V$ is implied by the fact that $\overline{Y}$ is the unique maximizer. This is the primary implication of Assumption 2 that we use for this paper.

Fox et al. [2012] use a structure similar to (2), showing that when $V$ is known, it is possible to identify moments of the distribution of $\beta$. We differ because we do not require an analyst to specify $V$. Instead, we require certain moments to be nonzero as a relevance condition. Appendix B.2 provides further details and a comparison with their approach. A related structure is considered in Lewbel and Pendakur [2017], who identify the distribution of random coefficients when an analogue of $\partial_k V$ is known in advance or additively separable in arguments. We do not impose this structure.

We also leverage a symmetry property of mixed partial derivatives that results from the optimizing behavior in Assumption 2. For a vector of indices $\gamma = (\gamma_1, \ldots, \gamma_M) \in \{1, \ldots, K\}^M$ and a sufficiently differentiable function $f : \mathbb{R}^K \to \mathbb{R}$, let

$$\partial_\gamma f := \partial_{\gamma_1} \cdots \partial_{\gamma_M} f.$$

Lemma 3.

Suppose $V$ is $M$-times continuously differentiable in a neighborhood of $\vec{u} \in \mathbb{R}^K$. Let $\gamma, \delta \in \{1, \ldots, K\}^M$ be vectors of indices in which each index occurs the same number of times in both $\gamma$ and $\delta$. It follows that

$$\partial_\gamma V(\vec{u}) = \partial_\delta V(\vec{u}).$$

This result states that the order in which we take partial derivatives does not matter. For example, when $M = 2$ we have for $j, k \in \{1, \ldots, K\}$ that $\partial_{j,k} V(\vec{u}) = \partial_{k,j} V(\vec{u})$. The lemma follows by repeated application of the $M = 2$ case.

With the foundations in place, we now turn to the task of identifying moments of random coefficients. We focus on conditions where certain $M$-th order moments of the distribution of $\beta$ are identified. In particular, if the assumptions hold for all $M$, then all moments of the distribution of random coefficients are identified.

We assume regressors are continuous and satisfy an exclusion restriction.

Assumption 3.

All covariates are continuous. In addition, each $x_k$ is a vector of regressors specific to the $k$-th good.

We now provide some intuition for the main result (Theorem 1). We consider identifying second moments of $\beta$ ($M = 2$) when there are two goods ($K = 2$) and each good has a single covariate ($d_k = 1$). We denote the derivative of a function $f$ with respect to the covariates of the $j$-th good, $x_j$, as $\partial_{x_j} f$. Differentiating the envelope theorem (Lemma 2) and evaluating at $x = 0$ gives

$$\partial_{x_j} \overline{Y}_k(0, \beta) = \partial_{j,k} V(0) \, \beta_j.$$

This uses the fact that $x_j$ is continuous and excluded from the utility index of other goods. This can be repeated with other mixed partial derivatives. Importantly, by evaluating derivatives at the point $x = 0$, the terms $\partial_{j,k} V(0)$ do not depend on $\beta$. Thus, when integrating over the values of the random coefficients, the term involving $V$ passes outside of the integral. In particular, integrating over $\beta$ yields the following system of equations

$$\begin{aligned}
\partial_{x_1} \partial_{x_1} \overline{Y}_2(0) &= \partial_{1,1,2} V(0) \int \beta_1^2 \, d\nu(\beta) \\
\partial_{x_1} \partial_{x_2} \overline{Y}_2(0) &= \partial_{1,2,2} V(0) \int \beta_1 \beta_2 \, d\nu(\beta) \\
\partial_{x_2} \partial_{x_1} \overline{Y}_1(0) &= \partial_{2,1,1} V(0) \int \beta_2 \beta_1 \, d\nu(\beta) \\
\partial_{x_2} \partial_{x_2} \overline{Y}_1(0) &= \partial_{2,2,1} V(0) \int \beta_2^2 \, d\nu(\beta)
\end{aligned} \qquad (3)$$

where we have implicitly assumed that differentiation and integration can be interchanged.

Assume that the derivatives of $\overline{Y}$ are identified. At first glance, this is a system of four equations with seven unknowns (clearly the $\beta_1 \beta_2$ and $\beta_2 \beta_1$ moments are equal). However, when $V$ is sufficiently differentiable, partial derivatives of $V$ do not depend on the order of differentiation (Lemma 3), which eliminates two unknowns. Using a scale assumption that $\int \beta_1^2 \, d\nu(\beta)$ is known a priori will eliminate an unknown and gives a system with 4 equations and 4 unknowns. We show that this is enough to identify all second moments of $\beta$.

To constructively see how the moments are identified, note that the first equation together with the scale assumption identifies $\partial_{1,1,2} V(0)$. Using symmetry of derivatives, the first and third equations then identify $\int \beta_1 \beta_2 \, d\nu(\beta)$. Using this, we identify $\partial_{1,2,2} V(0)$ using the second equation. Again using symmetry of derivatives and combining this with the last equation identifies $\int \beta_2^2 \, d\nu(\beta)$. Once all moments are identified, the remaining third order derivatives of $V$ can be identified at 0.

We now provide formal conditions that justify the intuitive argument for any number of goods, covariates, and order of moment $M$.

Footnote: Here we abuse notation and for the function $f$, we let $\partial_s f(0) = \partial_s f(z)|_{z=0}$.

Assumption 4.

For the natural number $M$, $\int \beta_{1,1}^M \, d\nu(\beta)$ is finite, known a priori, and nonzero.

Assumption 4 holds if we set $\beta_{1,1} = 1$, for example, but is considerably more general. For instance, it allows heterogeneity in the sign of $\beta_{1,1}$. In general, if one wants to identify all moments of $\beta$ using the main result, then Assumption 4 must hold for every $M$. This assumption holds when the distribution of $\beta_{1,1}$ is known a priori and the distribution has nonzero moments of all orders. If Assumption 4 is dropped, the results in this paper establish identification of the ratio of any nonzero $M$-th order moments. Thus, Assumption 4 can be appropriately modified by instead holding fixed the value of some other nonzero $M$-th order moment of the form $\int \beta_{k_1,\ell_1} \cdots \beta_{k_M,\ell_M} \, d\nu(\beta)$. We show in Section 3.1 that if $\beta_{1,1}$ is independent of all other components of $\beta$, then identification is possible using a single scale assumption on the first moment.

Recall that with minor abuse of notation we set

$$\overline{Y}(x) = \int \overline{Y}(x, \beta) \, d\nu(\beta).$$

We require the following regularity conditions.

Assumption 5.

For the natural number $M$, the following conditions hold:
(i) For each good $k$, one can interchange integration and differentiation for all $M$-th order partial derivatives at $x = 0$ so that
$$\partial_{x_{k_1,\ell_1}} \cdots \partial_{x_{k_M,\ell_M}} \overline{Y}_k(0) = \int \partial_{x_{k_1,\ell_1}} \cdots \partial_{x_{k_M,\ell_M}} \overline{Y}_k(0, \beta) \, d\nu(\beta)$$
holds.
(ii) Each $M$-th order moment $\int \beta_{k_1,\ell_1} \cdots \beta_{k_M,\ell_M} \, d\nu(\beta)$ exists and is finite.
(iii) $V$ is $(M+1)$-times continuously differentiable in a neighborhood of 0.
(iv) For each $\gamma \in \{1, \ldots, K\}^{M+1}$, $\partial_\gamma V(0) \neq 0$.
(v) $\overline{Y}(x)$ is known in a neighborhood of $x = 0$, or more generally it is known in a neighborhood of $x = 0$ with respect to the weakly positive orthant of $\mathbb{R}^{\sum_{k=1}^{K} d_k}$.

Footnote: This part also requires the equations
$$\partial_{x_1} \partial_{x_1} \overline{Y}_1(0) = \partial_{1,1,1} V(0) \int \beta_1^2 \, d\nu(\beta), \qquad \partial_{x_2} \partial_{x_2} \overline{Y}_2(0) = \partial_{2,2,2} V(0) \int \beta_2^2 \, d\nu(\beta)$$
to identify $\partial_{1,1,1} V(0)$ and $\partial_{2,2,2} V(0)$.

Footnote: More formally, the second condition can be written as follows: for some neighborhood $H$ of $x = 0$ in $\mathbb{R}^{\sum_{k=1}^{K} d_k}$, $\overline{Y}(x)$ is known on $H \cap \mathbb{R}_+^{\sum_{k=1}^{K} d_k}$, where $\mathbb{R}_+ = \mathbb{R} \cap [0, \infty)$.

These regularity conditions parallel assumptions in Fox et al. [2012]. To interpret part (i), note that $\nu$ can be a discrete probability measure over $\beta$ with finite support. For discrete measures, (i) holds whenever $\overline{Y}_k(x, \beta)$ is $M$-times differentiable in $x$ for every $\beta$ in its support. Part (ii) formalizes that the moments we wish to identify exist and are finite.

Parts (iii) and (iv) can be linked to derivatives of the function $\overline{Y}(x)$ via the envelope theorem (Lemma 2). Indeed, differentiating the envelope theorem for the $k$-th good, evaluating the derivative of $\overline{Y}_k$ with respect to $x_{j,\ell}$ at $x = 0$, and taking expectations yields

$$\frac{\partial \overline{Y}_k(0)}{\partial x_{j,\ell}} = \partial_{j,k} V(0) \int \beta_{j,\ell} \, d\nu(\beta). \qquad (4)$$

Thus, when $V$ is $(M+1)$-times continuously differentiable, it follows that $\overline{Y}$ is $M$-times continuously differentiable. Moreover, if one sees empirically that $\frac{\partial \overline{Y}_k(0)}{\partial x_{j,\ell}} \neq 0$ [...] $\partial_{j,k} V(0) \neq 0$ [...] $\gamma$ has Lebesgue measure 0. In general, whether (iv) holds depends on features of the distribution of $\varepsilon$ and choice of $D$, which are example specific. For example, part (iv) rules out pure characteristic discrete choice models as in Berry and Pakes [2007] and Dunker et al. [2017]. These models do not include a random intercept, and so the value function for the pure characteristics model, $V^{PC}$, can be written as

$$V^{PC}(\beta_1' x_1, \ldots, \beta_K' x_K) = \sup_{y \in B} \sum_{k=1}^{K} y_k (\beta_k' x_k)$$

without the additive disturbance $D$, where $B$ is the probability simplex. $V^{PC}$ does not have a non-zero derivative at $x = 0$ for $M \geq 1$. $V^{PC}$ also does not always induce a unique maximizer.

More generally, condition (iv) requires that goods in the demand system are related. For example, if the original $K$-good demand system can be written as $K$ separate 1-good demand systems, then derivatives of the form $\partial_{j,k} V(0)$ will be zero for $j \neq k$. This is because under this separability assumption, the utility index of good $j$ does not alter the demand for the $k$-th good. In general, (iv) cannot be relaxed for the main result to hold without additional assumptions. For example, if we impose that $\beta_j = \beta_k$ (a.s.) for all $j, k \in \{1, \ldots, K\}$, then one can identify ratios of moments under the weaker assumption that $\partial_j^{M+1} V(0) \neq 0$ [...] $j$. See Appendix B.3.

Condition (v) states that $\overline{Y}(x)$ is identified over a small region near $x = 0$. The constructive identification results in Fox et al. [2012] and Chernozhukov et al. [2019a] have also made use of variation around zero. In contrast, most of the literature instead requires identification of $\overline{Y}$ either for all $x$ or for a set over which $x$ is unbounded along some dimensions.

To interpret condition (v), suppose that $X$, $\beta$, and $\varepsilon$ are all independent, and we identify $\overline{Y}$ from a continuous version of the conditional mean of $Y$ given $X$. For this case, condition (v) is implied when the support of $X$ contains an open ball around $x = 0$. The second, more general part of (v) highlights that the results also apply when the average structural function is identified over a weakly positive region. Thus, our results do not rule out prices. We can handle this case because we only need to identify certain derivatives of $\overline{Y}$ at 0. These derivatives of $\overline{Y}$ at 0 are identified in this case by calculating derivatives from "one-sided" limits involving non-negative numbers.

The final assumption used for identification is that a sufficiently rich set of $M$-th order moments of $\beta$ are nonzero.

Assumption 6.

For the natural number $M$ and each tuple of good indices $(k_1, \ldots, k_M) \in \{1, \ldots, K\}^M$, there is a corresponding tuple of characteristic indices $(\ell_1, \ldots, \ell_M) \in \prod_{m=1}^{M} \{1, \ldots, d_{k_m}\}$ such that the $M$-th order moment

$$\int \beta_{k_1,\ell_1} \cdots \beta_{k_M,\ell_M} \, \nu(d\beta)$$

exists and is nonzero.

This is a relevance condition. It is not necessary to know which indices $(\ell_1, \ldots, \ell_M)$ satisfy this condition in advance. A sufficient condition for this is that for every $k$-th good there is a regressor $\ell_k \in \{1, \ldots, d_k\}$ such that either $\beta_{k,\ell_k} \geq 0$ [...] $\beta_{k,\ell_k} \leq 0$ [...] $\beta_{k,1} = 1$ [...] $k$-th good by setting $\ell_1 = \cdots = \ell_M = 1$. This is a common assumption in the literature [Berry and Haile, 2009, Briesch et al., 2010, Dunker et al., 2017]. However, Ichimura and Thompson [1998] and Gautier and Kitamura [2013] establish identification of random coefficients models for binary discrete choice using a more general halfspace condition.

With these assumptions, we can now state the main result of the paper.

Theorem 1.

Let Assumptions 1-6 hold with the same natural number $M$. Each $M$-th order moment of the form

$$\int \beta_{k_1,\ell_1} \cdots \beta_{k_M,\ell_M} \, d\nu(\beta)$$

is identified. In addition, for each $\gamma \in \{1, \ldots, K\}^{M+1}$, $\partial_\gamma V(0)$ is identified.

[...] $\beta$. It can be directly used to establish semiparametric identification of the distribution of $\beta$ for certain parametric families without specifying other objects (e.g. $V$). For example, if $\beta$ is normally distributed then Theorem 1 identifies the distribution when the assumptions hold for $M \in \{1, 2\}$ because normal distributions are characterized by means and covariances. Recall that while we identify non-centered moments, we can use this information to identify centered moments. More generally, for any distribution of $\beta$ that is defined by its moments up to order $\overline{M}$, this result establishes identification of the distribution.

Footnote: See the discussion after Lemma A.2 in Appendix A.2.

Corollary 1.

Suppose Assumptions 1-6 hold for each M ď M (which could be )and the distribution of β is determined by its ﬁrst M moments. It follows that thedistribution of β is identiﬁed. Fox et al. [2012] and il Kim [2014] describe a suﬃcient condition for a distributionto be determined by its moments. Distributions with compact ﬁnite support aredetermined by their moments. Lognormal distributions are an example of distribu-tions that are not determined by integer moments [Heyde, 1963]. That is, there areother nonparametric distributions that can match the same moments. However, theparameters may still be identiﬁed within the lognormal class.
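The lognormal caveat can be checked numerically. The sketch below is an illustration and not part of the original analysis. It uses the classical perturbation associated with Heyde [1963]: multiplying the standard lognormal density by $1 + a \sin(2\pi \log x)$ with $|a| \leq 1$ gives a different density with the same integer moments. The function names, the integration grid, and the choice $a = 0.5$ are assumptions made for this example.

```python
import math

def normal_pdf(y):
    """Standard normal density."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

def log_space_moment(n, a, lo=-12.0, hi=16.0, steps=56000):
    """n-th moment of X = exp(Y), where Y has density
    normal_pdf(y) * (1 + a * sin(2*pi*y)), computed with the trapezoid rule.
    a = 0 gives the standard lognormal; a != 0 gives a perturbed law."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        y = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        perturbation = 1.0 + a * math.sin(2.0 * math.pi * y)
        total += weight * math.exp(n * y) * normal_pdf(y) * perturbation
    return total * h

# Moments of order 0..3 under the lognormal and under the perturbed density.
moments_lognormal = [log_space_moment(n, a=0.0) for n in range(4)]
moments_perturbed = [log_space_moment(n, a=0.5) for n in range(4)]

# The two densities differ pointwise, e.g. at log x = 0.25 their ratio is 1 + a.
density_ratio = 1.0 + 0.5 * math.sin(2.0 * math.pi * 0.25)

for n, (m0, m1) in enumerate(zip(moments_lognormal, moments_perturbed)):
    print(n, m0, m1, math.exp(0.5 * n * n))
```

The two moment lists agree up to numerical error even though the densities differ, which is why matching integer moments alone cannot pin down a lognormal distribution nonparametrically.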

Remark. The proof of Theorem 1 is constructive. While a detailed approach to estimation is beyond the scope of this paper, the constructive results can be used to show how consistent estimation of the ratios of certain derivatives allows one to consistently estimate moments of the distribution of random coefficients. See Appendix B.1 for a brief outline. Estimation error of the $M$-th order moments can be bounded when the analyst has a suitable estimator of the $M$-th order derivatives of the average structural function. For example, Chen and Christensen [2018] provide an approach to estimate derivatives of average structural functions in the presence of endogeneity.

Identifying the distribution of $\beta$ using Corollary 1 requires Assumption 4, which specifies all moments of the form $\int \beta_{1,1}^M \, d\nu(\beta)$. When the distribution of $\beta$ is identified from its moments, one must specify the marginal distribution of $\beta_{1,1}$ in advance to apply Corollary 1. The common assumption $\beta_{1,1} = 1$ is one case in which $\beta_{1,1}$ has a known distribution. This section describes an alternative assumption that ensures identification of moments of $\beta$. In particular, we assume $\beta_{1,1}$ is independent of other components of $\beta$.

With this independence assumption, we show that a single scale assumption on the first moment of $\beta_{1,1}$ allows us to identify a rich collection of moments. This contrasts with Theorem 1, which uses an assumption on the $M$-th order moment of $\beta_{1,1}$ to identify only $M$-th order moments of $\beta$.

Assumption 7. For the measure $\nu$ as defined in Assumption 1, $\beta_{1,1}$ is independent of all other components of $\beta$. In addition, $\left| \int \beta_{1,1} \, d\nu(\beta) \right|$ is finite, known a priori, and nonzero.

Alternatively, one could set the absolute value of some other order moment of $\beta_{1,1}$, but we focus on the first moment since it facilitates interpretation. Independence between $\beta_{1,1}$ and other components is considerably weaker than assuming $\beta_{1,1} = 1$. In particular, it allows $\beta_{1,1}$ to be sometimes negative and sometimes positive. Thus, different individuals can be repelled or attracted to higher values of $x_{1,1}$.

Replacing Assumption 4 with Assumption 7, we obtain the following counterpart of Theorem 1.

Proposition 1.

Let $K \geq 2$ and Assumptions 1-3 and 5-7 hold for each $M \leq \overline{M}$ (which could be $\infty$). It follows that each $M$-th order moment of the form
$$\int \beta_{k_1,\ell_1} \cdots \beta_{k_M,\ell_M} \, \nu(d\beta)$$
is identified. In addition, for each $\gamma \in \{1, \dots, K\}^{M+1}$, $\partial_\gamma V(0)$ is identified. If $K = 1$, the same conclusions hold given the additional assumption that for each $M \leq \overline{M}$, there exists an order $M-1$ moment such that
$$\int \beta_{1,\ell_1} \cdots \beta_{1,\ell_{M-1}} \, d\nu(\beta) \neq 0,$$
where $\ell_m \neq 1$ for every $m \in \{1, \dots, M-1\}$.

Relative to Theorem 1, independence of $\beta_{1,1}$ from the other components allows us to relate the $M$-th and $(M-1)$-th order moments. To see this, consider an $M$-th order moment in which $\beta_{1,1}$ appears exactly once. Using independence, we obtain
$$\int \beta_{1,1} \, \beta_{k_2,\ell_2} \cdots \beta_{k_M,\ell_M} \, d\nu(\beta) = \int \beta_{1,1} \, d\nu(\beta) \int \beta_{k_2,\ell_2} \cdots \beta_{k_M,\ell_M} \, d\nu(\beta).$$
In the proof, we show that $\int \beta_{1,1} \, d\nu(\beta)$ can be identified when $\left| \int \beta_{1,1} \, d\nu(\beta) \right|$ is finite, known a priori, and nonzero. With this knowledge, we can identify the ratio of all $M$-th order moments to all $(M-1)$-th order moments and apply induction to identify all $M \leq \overline{M}$ order moments.

Remark. When $K \geq 2$, the assumptions of Proposition 1 impose that there are multiple relevant characteristics. The additional assumption in the $K = 1$ case plays the same role for characteristics other than $x_{1,1}$.

Remark. Provided $\int \beta_{1,1} \, \nu(d\beta)$ exists and is nonzero, it is a normalization to set $\left| \int \beta_{1,1} \, \nu(d\beta) \right| = 1$. In other words, this imposes no additional restrictions on the model. This is seen by noting that if we divide the original latent utility model by $\left| \int \beta_{1,1} \, \nu(d\beta) \right|$, then the argmax set does not change and none of the assumptions in Proposition 1 are affected by this division.

A natural intuition is that when $\beta_{1,1} > 0$, we can divide by $\beta_{1,1}$ and rewrite the problem with $\beta_{1,1} = 1$. This is true if we only inspect the original latent utility model (Equation 1), but is no longer true when we consider independence or certain other additional assumptions on the integrated model in Assumption 2. Recall that Assumption 1 means $\beta$ and $\varepsilon$ are independent in the definition of the average structural function $Y$. In general, this assumption is not invariant to division by $\beta_{1,1}$. This means that setting $\beta_{1,1} = 1$ is not, in general, a normalization.

Welfare Analysis and Counterfactuals

We now turn to identification of certain welfare and counterfactual objects. Identification is established given identification of certain features of $V$, which is the indirect utility function obtained when random intercepts are integrated out. We first provide three results that identify differences in $V$. Using these results, we discuss welfare analysis and counterfactuals.

The reason we identify $V$ is that we can use the envelope theorem to determine certain average choices
$$Y(x, \beta) = \nabla V(\beta_1' x_1, \dots, \beta_K' x_K).$$
We require identification of the right hand side at values other than 0 to consider counterfactuals at new values of covariates.

We first provide conditions under which identification of partial derivatives of $V$ at 0 allows us to directly extrapolate the function. Specifically, we assume $V$ is a real analytic function. That is, $V$ has derivatives of all orders and agrees with its Taylor series in a neighborhood of every point. Real analytic functions have the important property that local information can be used to reconstruct the function globally by extrapolating. This is similar to common parametric classes of functions. However, the set of real analytic functions is infinite dimensional.

Corollary 2. Let the assumptions of Theorem 1 or the assumptions of Proposition 1 hold with $\overline{M} = \infty$. If $V$ is a real analytic function, then it is identified up to an additive constant.

One way to drop the assumption that $V$ is a real analytic function is to instead assume $\beta_{k,1} = 1$ for each $k$. With this assumption, let $\tilde{x}$ be a value that is zero for every characteristic except the first characteristic of each good. Then the envelope theorem (Lemma 2) specializes to
$$Y_k(\tilde{x}, \beta) = \partial_k V(\tilde{x}_{1,1}, \dots, \tilde{x}_{K,1}).$$
The right hand side does not depend on $\beta$, and so by taking expectations, the average structural function identifies the derivative of $V$ at the point $(\tilde{x}_{1,1}, \dots, \tilde{x}_{K,1})$. By integrating the derivatives we can identify differences in $V$, as we now formalize.

Proposition 2. Let Assumption 2 hold and assume $\beta_{k,1} = 1$ almost surely for each $k \in \{1, \dots, K\}$. Suppose $Y(x)$ is identified for all $x = (x_1, \dots, x_K)$ satisfying $x_{k,1} \in [\underline{x}_{k,1}, \overline{x}_{k,1}]$ for each $k$, and $x_{k,j} = 0$ for $j > 1$. Then differences in $V$ are identified over the region $\times_{k=1}^{K} [\underline{x}_{k,1}, \overline{x}_{k,1}]$. In particular, if $\underline{x}_{k,1} = -\infty$ and $\overline{x}_{k,1} = \infty$ for each $k$, then $V$ is identified up to an additive constant.

The results on welfare and counterfactual analysis require that derivatives of $V$ be identified at certain values $(\beta_1' x_1, \dots, \beta_K' x_K)$. If the support of $\beta$ is compact, then it is not necessary to identify $V$ everywhere, and so it is not necessary to have $\underline{x}_{k,1} = -\infty$ and $\overline{x}_{k,1} = \infty$ to apply Proposition 2.

Finally, we mention a third way to identify differences in $V$. A key distinction is that it requires identification of the distribution of $Y(x, \beta)$ for fixed $x$, rather than identification of the average structural function as in the rest of the paper. We adapt the following lemma.

Lemma 4 (McCann [1995]; statement from Chernozhukov et al. [2019b], Corollary 2). Let $W = f(\eta)$, where $W$ and $\eta$ have the same finite dimension. Suppose $f$ is the gradient of a convex function, $\eta$ has a known distribution that is absolutely continuous with respect to Lebesgue measure, the distribution of $W$ is known, and $\eta$ and $W$ have finite variance. It follows that $f$ is identified.

This result can be applied to our setting by adapting the envelope theorem,
$$Y(x, \beta) = \nabla V(\beta_1' x_1, \dots, \beta_K' x_K).$$
Interpret $W = Y(x, \beta)$, $f = \nabla V$, and $\eta = (\beta_1' x_1, \dots, \beta_K' x_K)$. When $x$ is fixed and the distribution of $\beta$ is identified (from previous arguments), the distribution of $\eta$ is known. The function $V$ is convex, and so the lemma provides conditions under which $\nabla V$ is identified. Importantly, the lemma can be applied at a single $x$, so it is not necessary to have full support of covariates to apply the result. Such an $x$ cannot be arbitrary. For example, when $x = 0$, $\eta$ is identically zero and so its distribution is not absolutely continuous.
The lemma can still be applied for $x$ near but not equal to 0. Moreover, to apply the lemma, the distribution of $\beta$ cannot be degenerate, i.e. there must be truly "random" coefficients. If $\beta$ is almost surely equal to a constant, then the distribution corresponding to $\eta$ is not absolutely continuous and the lemma does not apply.

Importantly, to apply Lemma 4 in our setting, the distribution of $Y(x, \beta)$ must be identified at some fixed $x$. One example in which this lemma can be applied is when $\varepsilon$ in the original latent utility model is not present, so that $Y(x, \beta)$ corresponds to the observable choices given $x$ and $\beta$. Such structure could be appropriate in a continuous choice model in which all unobservable heterogeneity is controlled by the random slopes $\beta$, and in which the choices (rather than e.g. average choices for a group of individuals) are observed.

We now describe how identification of $V$ leads to identification of certain welfare objects. First, recall that $V$ may be interpreted as the indirect utility conditional on the utility index (Lemma 1) where the random intercept $\varepsilon$ under the measure $\mu$ is integrated out. To interpret $V(\cdot)$ as a welfare object, suppose $\beta$ is an individual-specific term that is random across the population but constant across decisions for the same individual. Interpret the random intercept $\varepsilon$ as an idiosyncratic taste shock across decision problems. Then $V(\beta_1' x_1, \dots, \beta_K' x_K)$ is an individual-specific (integrated) indirect utility. The conditions of Corollary 1 identify the distribution of $V(\beta_1' x_1, \dots, \beta_K' x_K)$ under the measure $\nu$ up to an additive constant once the values of covariates are fixed.

Thus, we can identify the distribution of individual-specific indirect utilities. By further integrating out the distribution of $\beta$, we also identify differences in the average indirect utility via Lemma 1:
$$\int V(\beta_1' x_1, \dots, \beta_K' x_K) \, d\nu(\beta) = \int \int \sup_{y \in B} \sum_{k=1}^{K} y_k (\beta_k' x_k) + D(y, \varepsilon) \, d\mu(\varepsilon) \, d\nu(\beta). \tag{5}$$
This result holds regardless of whether the distribution of $\varepsilon$ is identified since $V$ is a welfare-relevant summary measure of the distribution of $\varepsilon$. Indeed, we do not establish identification of the distribution of
$$\sup_{y \in B} \sum_{k=1}^{K} y_k (\beta_k' x_k) + D(y, \varepsilon)$$
according to the product measure $\mu \times \nu$ over $(\beta, \varepsilon)$. In particular, since this paper does not study identification of the distribution of $\varepsilon$, we do not identify the distribution of indirect utilities including random intercepts.

Note that the units of (5) are relative to the scale assumption used to identify the distribution of $\beta$. If we impose the scale assumptions in Theorem 1 to apply Corollary 2, then the distribution of $\beta_{1,1}$ is fixed. Thus, the units of Equation 5 are set by the distribution of the conversion rate between $x_{1,1}$ and utils. In contrast, if we impose the conditions of Proposition 1, then the scale is determined by $\left| \int \beta_{1,1} \, d\nu(\beta) \right|$. For this case, the units of Equation 5 are relative to the average conversion rate between $x_{1,1}$ and utils.

An alternative measure of average indirect utility is
$$\int \frac{1}{|\beta_{1,1}|} V(\beta_1' x_1, \dots, \beta_K' x_K) \, d\nu(\beta) = \int \int \frac{1}{|\beta_{1,1}|} \sup_{y \in B} \sum_{k=1}^{K} y_k (\beta_k' x_k) + D(y, \varepsilon) \, d\mu(\varepsilon) \, d\nu(\beta). \tag{6}$$
This sets the conversion rate of $x_{1,1}$ and utils to $\pm 1$. Importantly, this preserves whether the first characteristic is desirable or undesirable. It also forces the intensity of preference to be constant across individuals. This welfare measure is most interpretable when the regressor has a homogeneous sign. For example, $x_{1,1}$ may be the (negative) price of good 1, in which case the sign of $\beta_{1,1}$ is plausibly the same for all individuals.

Once $V$ is identified, we can also answer certain counterfactual questions involving quantities at new values of covariates. To this end, recall from Lemma 2 that
$$Y_k(x, \beta) = \partial_k V(\beta_1' x_1, \dots, \beta_K' x_K). \tag{7}$$
Here, $Y_k(x, \beta)$ is the demand for good $k$ fixing covariates and the random slopes, but integrating out the distribution of $\varepsilon$. We interpret $\beta$ as an individual-specific parameter that is constant across decision problems, while $\varepsilon$ is an idiosyncratic shock that can vary across decision problems. Thus, $Y_k(x, \beta)$ is the individual-specific average quantity of the $k$-th good. Once $V$ and the distribution of $\beta$ are identified, we can identify the distribution of $Y_k(x, \cdot)$ from Equation 7.

Conceptually, this shows it is possible to start with identification of the average structural function ("mean choices") around $x = 0$ and identify the distribution of $Y_k(x, \beta)$ at all values of the covariates. This also implies that any value $x$ at which $Y$ can be identified directly from data (as opposed to the theoretical analysis just described) provides overidentifying information.

We now provide additional discussion of the main results in the paper.

Remark. Our analysis can be applied to settings in which covariates do not vary around zero. For example, we could recenter via $\tilde{X} = X - E[X]$ so that identification uses variation in the average structural function around the mean, rather than variation around 0. This is noted as well in Fox et al. [2012]. Importantly, the assumptions in this paper are not typically invariant to recentering. Thus, the assumptions must be made given a particular centering at which the average structural function is identified. In addition, the location of recentering defines where the random slopes do not alter preferences since $\beta' x = 0$ when $x = 0$. In words, the choice of centering sets a location of taste homogeneity with regard to the random slopes (but not random intercepts).
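One way to see why the assumptions are not invariant to recentering is the following sketch. It is an illustration rather than a result stated in the paper, and $c_k$ denotes a hypothetical centering point for the covariates of good $k$ (for instance the mean of $X_k$).

```latex
\begin{align*}
\beta_k' x_k
  &= \beta_k' (x_k - c_k) + \beta_k' c_k \\
  &= \beta_k' \tilde{x}_k + \underbrace{\beta_k' c_k}_{\text{extra intercept for good } k},
  \qquad \tilde{x}_k = x_k - c_k .
\end{align*}
```

Written in the recentered covariates $\tilde{x}_k$, the term $\beta_k' c_k$ acts like an additional random intercept for good $k$. Unless $c_k = 0$ or $\beta_k' c_k$ is constant, this term depends on the random slopes, so independence between slopes and intercepts at one centering does not by itself carry over to another.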

Remark. The results may be adapted to certain models in which coefficients are not linear. Suppose instead of the linear index $\beta_k' x_k$, we have $x_k^{\rho_k}$ for a scalar shifter $x_k$ and random exponent $\rho_k$. Applying the envelope theorem yields
$$\partial_{x_j} Y_k(x, \rho) = \partial_{j,k} V(x_1^{\rho_1}, \dots, x_K^{\rho_K}) \, \rho_j x_j^{\rho_j - 1},$$
where $\rho = (\rho_1, \dots, \rho_K)$ collects the random exponents. In particular, the partial derivatives of $V$ can be evaluated at a vector of ones so that $x_j^{\rho_j - 1} = 1$. Integrating over the distribution of $\rho$, evaluating the above equation around covariates equal to one, and using symmetry of mixed partial derivatives, we obtain
$$\frac{\partial_{x_j} Y_k(1, \dots, 1)}{\partial_{x_k} Y_j(1, \dots, 1)} = \frac{E[\rho_j]}{E[\rho_k]}.$$
Ratios of higher order moments can be identified by considering additional derivatives of the average structural function, and by using symmetry of mixed partials of $V$ evaluated at the vector of ones.

Remark. Recall that the envelope theorem yields
$$\partial_{x_j} Y_k(0, \beta) = \partial_{j,k} V(0) \, \beta_j.$$
Thus, second order mixed partial derivatives of $V$ describe how changes in the utility index of good $j$ alter the demand for good $k$. The sign of $\partial_{j,k} V(0)$ describes whether goods are local complements ($\partial_{j,k} V(0) \geq 0$) or substitutes ($\partial_{j,k} V(0) \leq 0$). Theorem 1 provides conditions under which this derivative is identified, and thus we obtain information on complementarity/substitutability with random coefficients. Several papers have studied complementarity in the bundles model (Example 2) when characteristics shift the marginal utility of a good homogeneously (Gentzkow [2007], Fox and Lazzati [2017], Chernozhukov et al. [2015], Allen and Rehbeck [2019d,c]). To our knowledge, the only paper that studies identification of complementarity in the bundles model with heterogeneous tastes for characteristics is Dunker et al. [2017]. Rather than working with only the average structural function ("mean demands"), Dunker et al. [2017] require identification of the frequencies that goods are chosen together.

Remark. We do not require the assumption that coefficients are the same across goods, $\beta_j = \beta_k$. One interpretation of this assumption is that preferences are driven by observable characteristics [Gorman, 1980, Lancaster, 1966], not the label of the good ($j$ vs $k$). In this paper, we assume that the shifters associated with $\beta_k$ vary for good $k$. Thus, setting $\beta_j = \beta_k$ means the shifters that vary for good $j$ are the same as for good $k$. This assumption is inconsistent with some empirical settings, especially outside of discrete choice. For example, it is not satisfied for the bundles model of Gentzkow [2007] where internet speed varies for online news but not print news.

Placing restrictions relating coefficients across different goods allows one to either weaken the conditions used for identification or use alternative techniques. See Appendix B.3 for additional discussion and relation to Chernozhukov et al. [2019a].

Remark. The conditions of Theorem 1 imply testable implications because of theoretical relationships between different moments.
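To see this, we revisit the system of equations (3) used previously to illustrate the identification technique. Dividing the first and third equations, and the second and fourth equations, we obtain
$$\frac{\partial_{x_1} \partial_{x_1} Y_2(0)}{\partial_{x_1} \partial_{x_2} Y_1(0)} = \frac{\int \beta_1^2 \, \nu(d\beta)}{\int \beta_1 \beta_2 \, \nu(d\beta)}, \qquad \frac{\partial_{x_2} \partial_{x_2} Y_1(0)}{\partial_{x_1} \partial_{x_2} Y_2(0)} = \frac{\int \beta_2^2 \, \nu(d\beta)}{\int \beta_1 \beta_2 \, \nu(d\beta)}.$$
Multiplying these equations and using the Cauchy-Schwarz inequality yields the testable restriction
$$\frac{\partial_{x_1} \partial_{x_1} Y_2(0)}{\partial_{x_1} \partial_{x_2} Y_1(0)} \cdot \frac{\partial_{x_2} \partial_{x_2} Y_1(0)}{\partial_{x_1} \partial_{x_2} Y_2(0)} \geq 1.$$
Note that this inequality only concerns the average structural function close to 0.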
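The restriction can be checked numerically in a small example. The sketch below is an illustration under assumptions not made in the paper: two goods with one scalar characteristic each, $V(u) = \log(1 + e^{u_1} + e^{u_2})$ (a logit form chosen only for convenience), and a hypothetical three-point distribution for $(\beta_1, \beta_2)$. It computes second derivatives of the average structural function at 0 by finite differences, compares the derivative ratios with the moment ratios implied by Lemma A.2, and checks that their product is at least one.

```python
import math

# Hypothetical three-point distribution of (beta_1, beta_2): (value, probability).
BETA_POINTS = [((1.0, 0.5), 0.5), ((2.0, 1.5), 0.3), ((0.5, 2.0), 0.2)]

def grad_V(u1, u2):
    """Gradient of V(u) = log(1 + exp(u1) + exp(u2)), i.e. logit shares."""
    denom = 1.0 + math.exp(u1) + math.exp(u2)
    return (math.exp(u1) / denom, math.exp(u2) / denom)

def asf(k, x1, x2):
    """Average structural function for good k: E[d_k V(beta_1*x1, beta_2*x2)]."""
    return sum(p * grad_V(b1 * x1, b2 * x2)[k] for (b1, b2), p in BETA_POINTS)

def d2_same(k, axis, h=1e-3):
    """Central second difference of asf(k, .) at 0 along one covariate."""
    step = (h, 0.0) if axis == 0 else (0.0, h)
    forward = asf(k, step[0], step[1])
    backward = asf(k, -step[0], -step[1])
    return (forward - 2.0 * asf(k, 0.0, 0.0) + backward) / h**2

def d2_mixed(k, h=1e-3):
    """Central difference for the mixed derivative d_x1 d_x2 of asf(k, .) at 0."""
    return (asf(k, h, h) - asf(k, h, -h) - asf(k, -h, h) + asf(k, -h, -h)) / (4.0 * h**2)

def moment(p1, p2):
    """E[beta_1^p1 * beta_2^p2] under the three-point distribution."""
    return sum(p * b1**p1 * b2**p2 for (b1, b2), p in BETA_POINTS)

# Good indices are 0-based here: k=0 is good 1, k=1 is good 2.
ratio_a = d2_same(1, axis=0) / d2_mixed(0)  # d_x1 d_x1 Y_2(0) / d_x1 d_x2 Y_1(0)
ratio_b = d2_same(0, axis=1) / d2_mixed(1)  # d_x2 d_x2 Y_1(0) / d_x1 d_x2 Y_2(0)

print(ratio_a, moment(2, 0) / moment(1, 1))
print(ratio_b, moment(0, 2) / moment(1, 1))
print(ratio_a * ratio_b)
```

In this example the population moments are known, so the comparison is exact up to finite-difference error. With data, the derivatives of the average structural function would have to be estimated, and the inequality becomes a restriction on those estimates.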

Remark. The results in this paper establish identification of certain moments of the distribution of random coefficients. It is natural to wonder whether the distribution can be identified without requiring that it be uniquely determined by its moments. A related question has been studied previously. Specifically, consider the setting
$$Y = A + B' Z,$$
where $Z$ is independent of $(A, B)$ and the distribution of $(Y, Z)$ is identified. Building on Belisle et al. [1997], Masten [2017] shows that, if the support of $Z$ is bounded, then identification of the distribution of either $(A, B)$ or even the marginal distributions of $B_k$ requires that the distribution of $(A, B)$ is determined by its moments. In light of this, it is possible that given only the conditions of Theorem 1 for each $M$, identification of the distribution of $\beta$ requires that it be determined by its moments. Recall that we only assume identification of the average structural function $Y(x)$ in a bounded set around 0.

References

Victor H Aguiar. Random categorization and bounded rationality. Economics Letters, 159:46–52, 2017.

Roy Allen. Injectivity and the law of demand. Available at SSRN 3437946, 2019.

Roy Allen and John Rehbeck. Identification with additively separable heterogeneity. Econometrica, 87(3):1021–1054, 2019a.

Roy Allen and John Rehbeck. Revealed stochastic choice with attributes. Available at SSRN 2818041, 2019b.

Roy Allen and John Rehbeck. Latent complementarity in bundles models. Working Paper, 2019c.

Roy Allen and John Rehbeck. Hicksian complementarity and perturbed utility models. Working Paper, 2019d.

Joseph G Altonji and Rosa L Matzkin. Cross section and panel data estimators for nonseparable models with endogenous regressors. Econometrica, 73(4):1053–1102, 2005.

Simon P Anderson, André De Palma, and J-F Thisse. A representative consumer theory of the logit model. International Economic Review, pages 461–466, 1988.

Simon P Anderson, Andre De Palma, and Jacques François Thisse. Discrete choice theory of product differentiation. Cambridge, MA: MIT press, 1992.

Claude Belisle, Jean-Claude Massé, Thomas Ransford, et al. When is a probability measure determined by infinitely many projections? The Annals of Probability, 25(2):767–786, 1997.

Steven Berry and Ariel Pakes. The pure characteristics demand model. International Economic Review, 48(4):1193–1225, 2007.

Steven Berry, James Levinsohn, and Ariel Pakes. Automobile prices in market equilibrium. Econometrica, pages 841–890, 1995.

Steven T Berry. Estimating discrete-choice models of product differentiation. The RAND Journal of Economics, pages 242–262, 1994.

Steven T Berry and Philip A Haile. Nonparametric identification of multinomial choice demand models with heterogeneous consumers. Technical report, National Bureau of Economic Research, 2009.

Steven T Berry and Philip A Haile. Identification in differentiated products markets using market level data. Econometrica, 82(5):1749–1797, 2014.

Richard Blundell and James L Powell. Endogeneity in nonparametric and semiparametric regression models. Econometric society monographs, 36:312–357, 2003.

J Hayden Boyd and Robert E Mellman. The effect of fuel economy standards on the us automotive market: an hedonic demand analysis. Transportation Research Part A: General, 14(5-6):367–378, 1980.

Richard L Brady and John Rehbeck. Menu-dependent stochastic feasibility. Econometrica, 84(3):1203–1223, 2016.

Richard A Briesch, Pradeep K Chintagunta, and Rosa L Matzkin. Nonparametric discrete choice models with unobserved heterogeneity. Journal of Business & Economic Statistics, 28(2):291–307, 2010.

N Scott Cardell and Frederick C Dunbar. Measuring the societal impacts of automobile downsizing. Transportation Research Part A: General, 14(5-6):423–434, 1980.

Xiaohong Chen and Timothy M Christensen. Optimal sup-norm rates and uniform inference on nonlinear functionals of nonparametric iv regression. Quantitative Economics, 9(1):39–84, 2018.

Victor Chernozhukov, Whitney K Newey, and Andres Santos. Constrained conditional moment restriction models. arXiv preprint arXiv:1509.06311, 2015.

Victor Chernozhukov, Iván Fernández-Val, and Whitney K Newey. Nonseparable multinomial choice models in cross-section and panel data. Journal of Econometrics, 211(1):104–116, 2019a.

Victor Chernozhukov, Alfred Galichon, Marc Henry, and Brendan Pass. Single market nonparametric identification of multi-attribute hedonic equilibrium models. arXiv preprint arXiv:1709.09570, 2019b.

Khai Chiong, Yu-Wei Hsieh, and Matthew Shum. Counterfactual estimation in semiparametric discrete-choice models. Technical report, 2017. URL http://dx.doi.org/10.2139/ssrn.2979446.

Khai Xiang Chiong, Alfred Galichon, and Matt Shum. Duality in dynamic discrete-choice models. Quantitative Economics, 7(1):83–115, 2016.

Christopher T Conlon and Julie Holland Mortimer. Demand estimation under incomplete product availability. American Economic Journal: Microeconomics, 5(4):1–30, 2013.

Angus Deaton and John Muellbauer. An almost ideal demand system. The American Economic Review, 70(3):312–326, 1980.

Xavier D’Haultfœuille and Philippe Février. Identification of nonseparable triangular models with discrete instruments. Econometrica, 83(3):1199–1210, 2015.

T Domenich and D McFadden. Urban travel demand: a behavioural approach. Worth-Holland, Amsterdam, 1975.

Fabian Dunker, Stefan Hoderlein, and Hiroaki Kaido. Nonparametric identification of endogenous and heterogeneous aggregate demand models: complements, bundles and the market level. Working Paper, 2017.

Fabian Dunker, Stefan Hoderlein, Hiroaki Kaido, and Robert Sherman. Nonparametric identification of the distribution of random coefficients in binary response static games of complete information. Journal of Econometrics, 206(1):83–102, 2018.

Ivar Ekeland, James J Heckman, and Lars Nesheim. Identifying hedonic models. American Economic Review, 92(2):304–309, 2002.

Ivar Ekeland, James J Heckman, and Lars Nesheim. Identification and estimation of hedonic models. Journal of political economy, 112(S1):S60–S109, 2004.

Kfir Eliaz and Ran Spiegler. Consideration sets and competitive marketing. The Review of Economic Studies, 78(1):235–262, 2011.

Daniel Ershov, Jean-William Laliberté, and Scott Orr. Mergers in a model with complementarity. Technical report, 2018. Working Paper.

Mogens Fosgerau, Julien Monardo, and André De Palma. The inverse product differentiation logit model. 2019.

Jeremy T Fox. A note on nonparametric identification of distributions of random coefficients in multinomial choice models. Technical report, National Bureau of Economic Research, 2017.

Jeremy T Fox and Amit Gandhi. Nonparametric identification and estimation of random coefficients in multinomial choice models. The RAND Journal of Economics, 47(1):118–139, 2016.

Jeremy T Fox and Natalia Lazzati. A note on identification of discrete choice models for bundles and binary games. Quantitative Economics, 8(3):1021–1036, 2017.

Jeremy T Fox, Kyoo il Kim, Stephen P Ryan, and Patrick Bajari. The random coefficients logit model is identified. Journal of Econometrics, 166(2):204–212, 2012.

Drew Fudenberg, Ryota Iijima, and Tomasz Strzalecki. Stochastic choice and revealed perturbed utility. Econometrica, 83(6):2371–2409, 2015.

Xavier Gabaix. Chapter 4 - behavioral inattention. In B. Douglas Bernheim, Stefano DellaVigna, and David Laibson, editors, Handbook of Behavioral Economics - Foundations and Applications 2, volume 2 of Handbook of Behavioral Economics: Applications and Foundations 1, pages 261–343. North-Holland, 2019. doi: https://doi.org/10.1016/bs.hesbe.2018.11.001.

Alfred Galichon. Optimal transport methods in economics. Princeton University Press, 2018.

Alfred Galichon and Bernard Salanié. Cupid’s invisible hand: Social surplus and identification in matching models. Technical report, 2015. URL http://dx.doi.org/10.2139/ssrn.2979446.

Eric Gautier and Yuichi Kitamura. Nonparametric estimation in random coefficients binary choice models. Econometrica, 81(2):581–607, 2013.

Matthew Gentzkow. Valuing new goods in a model with complementarity: Online newspapers. American Economic Review, 97(3):713–744, 2007.

William M Gorman. A possible procedure for analysing quality differentials in the egg market. The Review of Economic Studies, 47(5):843–856, 1980.

Florian Gunsilius. Nonparametric point-identification of multivariate models with binary instruments. Working Paper, 2019.

Jerry A Hausman and David A Wise. A conditional probit model for qualitative choice: Discrete decisions recognizing interdependence and heterogeneous preferences. Econometrica: Journal of the Econometric Society, pages 403–426, 1978.

James J Heckman. Micro data, heterogeneity, and the evaluation of public policy: Nobel lecture. Journal of political Economy, 109(4):673–748, 2001.

James J Heckman, Rosa L Matzkin, and Lars Nesheim. Nonparametric identification and estimation of nonadditive hedonic models. Econometrica, 78(5):1569–1591, 2010.

Chris C Heyde. On a property of the lognormal distribution. Journal of the Royal Statistical Society: Series B (Methodological), 25(2):392–393, 1963.

Josef Hofbauer and William H Sandholm. On the global convergence of stochastic fictitious play. Econometrica, 70(6):2265–2294, 2002.

Alessandro Iaria and Ao Wang. Identification and estimation of demand for bundles. Technical report, 2019. Working Paper.

Hidehiko Ichimura and T Scott Thompson. Maximum likelihood estimation of a binary choice model with random coefficients of unknown distribution. Journal of Econometrics, 86(2):269–295, 1998.

Kyoo il Kim. Identification of the distribution of random coefficients in static and dynamic discrete choice models. The Korean Economic Review, 30(2):191–216, 2014.

Guido W Imbens and Whitney K Newey. Identification and estimation of triangular simultaneous equations models without additivity. Econometrica, 77(5):1481–1512, 2009.

Nail Kashaev. Identification of semiparametric discrete outcome models with bounded covariates. arXiv preprint arXiv:1811.05555, 2018.

Kelvin J Lancaster. A new approach to consumer theory. Journal of political economy, 74(2):132–157, 1966.

Arthur Lewbel and Krishna Pendakur. Unobserved preference heterogeneity in demand using generalized random coefficients. Journal of Political Economy, 125(4):1100–1148, 2017.

Charles F Manski. The structure of random utility models. Theory and decision, 8(3):229–254, 1977.

Paola Manzini and Marco Mariotti. Stochastic choice and consideration sets. Econometrica, 82(3):1153–1176, 2014.

Yusufcan Masatlioglu, Daisuke Nakajima, and Erkut Y Ozbay. Revealed attention. American Economic Review, 102(5):2183–2205, 2012.

Matthew A Masten. Random coefficients on endogenous variables in simultaneous equations models. The Review of Economic Studies, 85(2):1193–1250, 2017.

Rosa L Matzkin. Restrictions of economic theory in nonparametric methods. Handbook of Econometrics, 4:2523–2558, 1994.

Rosa L Matzkin. Constructive identification in some nonseparable discrete choice models. Journal of Econometrics, 211(1):83–103, 2019.

Robert J McCann. Existence and uniqueness of monotone measure-preserving maps. Duke Mathematical Journal, 80(2):309–324, 1995.

Daniel McFadden. Econometric models of probabilistic choice. In Charles F. Manski and Daniel McFadden, editors, Structural Analysis of Discrete Data with Econometric Applications, pages 198–272. Cambridge, MA: MIT Press, 1981.

Daniel L McFadden and Mogens Fosgerau. A theory of the perturbed consumer with general budgets. Technical report, National Bureau of Economic Research, 2012.

Julien Monardo. The flexible inverse logit (fil) model. Available at SSRN 3388972, 2019.

Aviv Nevo. A practitioner’s guide to estimation of random-coefficients logit models of demand. Journal of economics & management strategy, 9(4):513–548, 2000.

Whitney K Newey and James L Powell. Instrumental variable estimation of nonparametric models. Econometrica, 71(5):1565–1578, 2003.

R Tyrrell Rockafellar. Convex analysis. Number 28. Princeton university press, 1970.

Xiaoxia Shi, Matthew Shum, and Wei Song. Estimating semi-parametric panel multinomial choice models using cyclic monotonicity. Econometrica, 86(2):737–761, 2018.

Alexander Torgovitsky. Identification of nonseparable models using instruments with small support. Econometrica, 83(3):1185–1197, 2015.

Kenneth E Train. Discrete Choice Methods with Simulation. Cambridge University Press, 2009.

Ao Wang. A blp demand model of product-level market shares with complementarity. Technical report, 2020.

Appendix A Proofs of Main Results

A.1 Preliminary Lemmas

Proof of Lemma 1. This follows line by line from the proof of Allen and Rehbeck [2019a], Theorem 1. The statement of that result included an additional Assumption 1, which was not used in the proof as long as the underlying choice is appropriately measurable. Here, we start with $Y$ of the form $Y(X, \beta, \varepsilon)$, which is automatically a measurable function.

Proof of Lemma 2. See Allen and Rehbeck [2019a], Lemma 1. The result may also be directly proven from Rockafellar [1970], Theorems 23.5 and 25.1.

Proof of Lemma 3. The function $V$ is convex. The result then follows from Rockafellar [1970], Theorem 4.5, plus repeated differentiation.

A.2 Proof of Theorem 1

The following lemmas maintain the assumptions of Theorem 1. These assumptionsensure the requisite smoothness assumptions and ensure that the following argumentsdo not divide by zero.In order to simplify presentation, we require some additional notation. Let p γ, ξ q bea tuple with γ P t , . . . , K u M denoting good indices, and let ξ k P t , . . . d γ k u describewhich characteristic corresponds to the γ k -th good. We set B p γ,ξ q Y k p , β q “ B x γ ,ξ ¨ ¨ ¨ B x γM ,ξM Y k p , β q . As shorthand we also write multiplication of the coeﬃcients of β for the characteristicsof p γ, ξ q as β p γ,ξ q “ β γ ,ξ ¨ ¨ ¨ β γ M ,ξ M . Lemma A.1. B p γ,ξ q Y k p , β q “ B γ B k V p q β p γ,ξ q . roof. Lemma 2 establishes Y k p x, β q “ B k V p β x , . . . , β K x K q . Diﬀerentiating with respect to x γ ,ξ and evaluating at x “ B γ ,ξ Y k p , β q “ B γ B k V p q β γ ,ξ . By repeating the diﬀerentiation process and evaluating at x “ j -th regressors x j areexcluded from the desirability indices of the other goods. Lemma A.2. B p γ,ξ q Y k p q “ B γ B k V p q ż β p γ,ξ q dν p β q Proof.

We obtain

$$\begin{aligned}
\partial_{(\gamma,\xi)} Y_k(0) &= \partial_{x_{\gamma_1,\xi_1}} \cdots \partial_{x_{\gamma_M,\xi_M}} Y_k(0) \\
&= \int \partial_{x_{\gamma_1,\xi_1}} \cdots \partial_{x_{\gamma_M,\xi_M}} Y_k(0, \beta)\, d\nu(\beta) \\
&= \int \partial_\gamma \partial_k V(0)\, \beta_{(\gamma,\xi)}\, d\nu(\beta) \\
&= \partial_\gamma \partial_k V(0) \int \beta_{(\gamma,\xi)}\, d\nu(\beta),
\end{aligned}$$

where the interchange of integration and differentiation in the second equality follows from Assumption 5(i), the third equality is Lemma A.1, and the final equality follows since $\partial_\gamma \partial_k V(0)$ is a constant that does not depend on $\beta$.

Combining the result of Lemma A.2 and Assumption 6 ensures that there exists a set of goods and characteristic indices $(\gamma, \xi)$ such that $\partial_{(\gamma,\xi)} Y_k(0) \neq 0$. To see this, recall that Assumption 6 requires that for each collection of good indices $\gamma$ we can find characteristic indices $\xi$ such that $\int \beta_{(\gamma,\xi)}\, d\nu(\beta) \neq 0$. Given Assumption 5(iv) that $\partial_\gamma \partial_k V(0) \neq 0$, Lemma A.2 shows that we must have $\partial_{(\gamma,\xi)} Y_k(0) \neq 0$.

Lemma A.3. Fix $j, k \in \{1, \ldots, K\}$ and let $\gamma, \delta \in \{1, \ldots, K\}^M$. Suppose that each good index shows up exactly the same number of times in $(\gamma, k)$ and $(\delta, j)$. Then $\partial_\gamma \partial_k V(0) = \partial_\delta \partial_j V(0)$.

Proof.

This is a slight restatement of Lemma 3.

Suppose now that $(\delta, \eta)$ is defined similarly to $(\gamma, \xi)$. That is, $\delta \in \{1, \ldots, K\}^M$ denotes good indices and $\eta_j \in \{1, \ldots, d_{\delta_j}\}$ indexes a characteristic of the $\delta_j$-th good.

Combining the previous two lemmas, we obtain that if each good index shows up exactly the same number of times in $(\gamma, k)$ and $(\delta, j)$, then

$$\frac{\partial_{(\gamma,\xi)} Y_k(0)}{\partial_{(\delta,\eta)} Y_j(0)} = \frac{\int \beta_{(\gamma,\xi)}\, d\nu(\beta)}{\int \beta_{(\delta,\eta)}\, d\nu(\beta)} \tag{8}$$

whenever the denominator is nonzero. Thus, if the moment in the denominator is identified, the moment in the numerator is as well.

Lemma A.4.

Suppose $\gamma, \delta \in \{1, \ldots, K\}^M$ differ in at most one component and $\int \beta_{(\delta,\eta)}\, d\nu(\beta)$ is identified and nonzero. Then for every tuple $\xi$ of characteristic indices, $\int \beta_{(\gamma,\xi)}\, d\nu(\beta)$ is identified.

Proof. Lemmas A.2 and A.3 immediately imply (8) since

$$\frac{\partial_{(\gamma,\xi)} Y_k(0)}{\partial_{(\delta,\eta)} Y_j(0)} = \frac{\partial_\gamma \partial_k V(0) \int \beta_{(\gamma,\xi)}\, d\nu(\beta)}{\partial_\delta \partial_j V(0) \int \beta_{(\delta,\eta)}\, d\nu(\beta)} = \frac{\int \beta_{(\gamma,\xi)}\, d\nu(\beta)}{\int \beta_{(\delta,\eta)}\, d\nu(\beta)}.$$

The term $\int \beta_{(\gamma,\xi)}\, d\nu(\beta)$ is identified because all other parts of (8) are identified.

Note that in Lemma A.4 the $\gamma$ and $\delta$ terms can be the same. This covers the non-trivial $K = 1$ case.

Theorem 1 requires that $\int \beta_{1,1}^M\, \nu(d\beta)$ be known. We present a lemma that drops this assumption for the moment. This lemma will be used in subsequent results.

Lemma A.5. If $\int \beta_{1,1}^M\, d\nu(\beta)$ is not known, then we still identify the ratio of any $M$-th order moments

$$\frac{\int \beta_{(\gamma,\xi)}\, d\nu(\beta)}{\int \beta_{(\delta,\eta)}\, d\nu(\beta)},$$

provided the denominator is nonzero.

Proof. Start with good indices $\delta^1 = (1, \ldots, 1)$ of length $M$, and characteristic indices $\eta^1$ such that the corresponding moment of $\beta$ is nonzero. Applying Lemma A.4 for the pair with goods $\delta^2 = (2, 1, \ldots, 1)$ and characteristic indices $\eta^2$, we identify the ratio

$$\frac{\int \beta_{(\delta^2,\eta^2)}\, d\nu(\beta)}{\int \beta_{(\delta^1,\eta^1)}\, d\nu(\beta)}.$$

We can repeat this procedure with $(\delta^2, \eta^2)$ and a tuple $\delta^3$ that differs from $\delta^2$ in one component, with appropriately chosen characteristic indices $\eta^3$, and so forth, to construct a sequence $\delta^1, \delta^2, \ldots$ that reaches all possible tuples of good indices $\gamma \in \{1, \ldots, K\}^M$. At each step, we can change the good index one component at a time and then apply (8). This identifies the ratio of two adjacent moments in this sequence. We avoid dividing by zero because of the relevance condition (Assumption 6), which implies that for each set of goods $\delta$ we can find tuples of characteristics $\eta$ such that $\int \beta_{(\delta,\eta)}\, d\nu(\beta)$ is nonzero. By multiplication we can identify new ratios.
For example, a ratio involving $\delta^3$ and $\delta^1$ is identified via

$$\frac{\int \beta_{(\delta^3,\eta^3)}\, d\nu(\beta)}{\int \beta_{(\delta^1,\eta^1)}\, d\nu(\beta)} = \left( \frac{\int \beta_{(\delta^3,\eta^3)}\, d\nu(\beta)}{\int \beta_{(\delta^2,\eta^2)}\, d\nu(\beta)} \right) \left( \frac{\int \beta_{(\delta^2,\eta^2)}\, d\nu(\beta)}{\int \beta_{(\delta^1,\eta^1)}\, d\nu(\beta)} \right).$$

From these arguments, for each pair of good indices $\gamma$ and $\delta$ we can find some tuples of characteristic indices $\xi$ and $\eta$ such that

$$\frac{\int \beta_{(\gamma,\xi)}\, d\nu(\beta)}{\int \beta_{(\delta,\eta)}\, d\nu(\beta)}$$

is identified, where numerator and denominator are nonzero.

From Lemma A.4, the ratio

$$\frac{\int \beta_{(\delta,\tilde{\eta})}\, d\nu(\beta)}{\int \beta_{(\delta,\eta)}\, d\nu(\beta)}$$

is identified for any vector of characteristic indices $\tilde{\eta}$, where $\eta$ is chosen so that the denominator is nonzero. Thus, we identify the ratio of all moments.

Using Lemma A.5, we conclude that if we fix $\int \beta_{1,1}^M\, d\nu(\beta)$ in advance and it is nonzero, we identify all moments. However, we could fix any other nonzero $M$-th order moment and also obtain identification.

Finally, from Lemma A.2 we have for all $\gamma \in \{1, \ldots, K\}^M$ that

$$\partial_{(\gamma,\xi)} Y_k(0) = \partial_\gamma \partial_k V(0) \int \beta_{(\gamma,\xi)}\, d\nu(\beta).$$

Moreover, from Assumption 6 we can find some $\xi$ such that the right hand side is nonzero. By dividing, we identify $\partial_\gamma \partial_k V(0)$, completing the proof of Theorem 1.

A.3 Proof of Proposition 1

First, from the envelope theorem (see Lemma A.2 above),

$$\frac{\partial Y_1(0)}{\partial x_{1,1}} = \partial_{1,1} V(0) \int \beta_{1,1}\, \nu(d\beta).$$

The function $V$ is convex and hence $\partial_{1,1} V(0) > 0$, so $\frac{\partial Y_1(0)}{\partial x_{1,1}}$ and $\int \beta_{1,1}\, d\nu(\beta)$ have the same sign. Thus, the sign of the first moment of $\beta_{1,1}$ is identified from above and the magnitude is assumed known in Assumption 7.

We prove the remainder of the result by induction on $M$. Recall that with $M = 1$, first order moments are identified from Theorem 1 using the assumption that $\int \beta_{1,1}\, \nu(d\beta)$ is known and nonzero.

Now, fix an $M$ such that $1 \leq M \leq \overline{M} - 1$. As the inductive hypothesis, we assume all $M$-th order moments $\int \beta_{(\delta,\eta)}\, d\nu(\beta)$ are identified for all $\delta \in \{1, \ldots, K\}^M$ and $\eta$. We show that all $(M+1)$-th order moments are identified. When $K \geq 2$, for $\delta \in \{2, \ldots, K\}^M$ (i.e. no good index is equal to 1) we can find a collection of characteristic indices $\eta$ such that $\int \beta_{(\delta,\eta)}\, d\nu(\beta) \neq 0$. If instead $K = 1$, we can set $\delta$ as the length-$M$ vector of 1's and let $\eta$ be some collection of characteristic indices with $\eta_m \neq 1$ such that

$$\int \beta_{1,\eta_1} \cdots \beta_{1,\eta_M}\, d\nu(\beta) \neq 0.$$

In either case ($K = 1$ or $K \geq 2$), set $\tilde{\delta} = (\delta, 1)$ and $\tilde{\eta} = (\eta, 1)$. Then we obtain

$$\int \beta_{(\tilde{\delta},\tilde{\eta})}\, d\nu(\beta) = \int \beta_{1,1}\, d\nu(\beta) \int \beta_{(\delta,\eta)}\, d\nu(\beta)$$

because $\beta_{1,1}$ is independent of all other components of $\beta$ under the measure $\nu$, and the tuple $(\delta, \eta)$ does not include the first characteristic of good 1. In particular, we identify

$$\int \beta_{(\tilde{\delta},\tilde{\eta})}\, d\nu(\beta),$$

which is nonzero because it is the product of two nonzero terms. From Lemma A.5, we identify the ratio of all $(M+1)$-th order moments to $\int \beta_{(\tilde{\delta},\tilde{\eta})}\, d\nu(\beta)$. Since $\int \beta_{(\tilde{\delta},\tilde{\eta})}\, d\nu(\beta)$ is known and nonzero, we identify all $(M+1)$-th order moments. To identify the derivatives of $V$, use Lemma A.2 as in the proof of Theorem 1.

A.4 Proof of Proposition 2

Let $x = (x_1, \ldots, x_K)$ satisfy $x_{k,1} \in [\underline{x}_{k,1}, \overline{x}_{k,1}]$ for each $k$, and $x_{k,j} = 0$ for $j > 1$. Let $x_{:,1} = (x_{1,1}, \ldots, x_{K,1})$ be a vector of the first characteristics for each good. From Lemma 2 and integrating over $\beta$, we obtain

$$Y(x) = \nabla V(x_{:,1}),$$

where $V$ is convex.

Consider initial characteristic values, $x^I$, and final characteristic values, $x^F$, such that for all $k \in \{1, \ldots, K\}$ and for all $j > 1$, the equality $x^I_{k,j} = x^F_{k,j} = 0$ holds. Integrating along the line segment from $x^I$ to $x^F$, we obtain

$$V(x^F_{:,1}) - V(x^I_{:,1}) = \int_0^1 Y\big(t x^F + (1 - t) x^I\big) \cdot (x^F_{:,1} - x^I_{:,1})\, dt,$$

where Riemann integrability follows from Rockafellar [1970], Corollary 24.2.1.

Appendix B Supplemental Results

B.1 Plug-in Estimation

The proof of Theorem 1 is based on multiplying derivative ratios. By directly plugging in an estimator of the $M$-th order derivatives, one can construct an estimator of the $M$-th order moments. Suppose we have an estimator

$$\partial_{x_{k_1,\ell_1}} \cdots \partial_{x_{k_M,\ell_M}} \hat{Y}_1(0)$$

of the associated $M$-th order derivative of the average structural function, where $k_m = 1$ for $m \neq \tilde{m}$ and $k_{\tilde{m}} = j$. In addition, suppose we have an estimator

$$\partial_{x_{1,1}} \cdots \partial_{x_{1,1}} \hat{Y}_j(0)$$

of the $M$-th order partial derivative of the structural function with respect to the $x_{1,1}$ regressor. We construct an estimator

$$\widehat{\int \beta_{k_1,\ell_1} \cdots \beta_{k_M,\ell_M}\, \nu(d\beta)} = \frac{\partial_{x_{k_1,\ell_1}} \cdots \partial_{x_{k_M,\ell_M}} \hat{Y}_1(0)}{\partial_{x_{1,1}} \cdots \partial_{x_{1,1}} \hat{Y}_j(0)},$$

where for simplicity we assume $\int \beta_{1,1}^M\, \nu(d\beta) = 1$. (More generally we need it to be known a priori and nonzero.)

Using Equation (8), we see that

$$\int \beta_{k_1,\ell_1} \cdots \beta_{k_M,\ell_M}\, \nu(d\beta) - \widehat{\int \beta_{k_1,\ell_1} \cdots \beta_{k_M,\ell_M}\, \nu(d\beta)} = \frac{\partial_{x_{k_1,\ell_1}} \cdots \partial_{x_{k_M,\ell_M}} Y_1(0)}{\partial_{x_{1,1}} \cdots \partial_{x_{1,1}} Y_j(0)} - \frac{\partial_{x_{k_1,\ell_1}} \cdots \partial_{x_{k_M,\ell_M}} \hat{Y}_1(0)}{\partial_{x_{1,1}} \cdots \partial_{x_{1,1}} \hat{Y}_j(0)}.$$

Thus, estimation error on the right-hand side translates to estimation error on the left-hand side for the $M$-th order moment of $\beta$. Note that the choice of the $j$-th good is arbitrary here, and so one could also construct an estimator with the right-hand side replaced by an average over ratios with respect to different goods.

This argument can be generalized to additional moments. Here, we use the fact that we are interested in an $M$-th order moment that is only one good index away from being a vector of 1's. For other $M$-th order moments, our constructive formulas show one must multiply additional derivative ratios. See in particular the proof of Lemma A.5.

B.2 V Known and Relation to Fox et al. [2012]

Theorem 1 identifies the $M$-th order moment of $\beta$ when we fix $\int \beta_{1,1}^M\, \nu(d\beta)$. By fixing the entire distribution of $\beta_{1,1}$, we can identify all moments of $\beta$. An alternative assumption is that $V$ is known. If we impose this assumption, then we can drop Assumption 4, which provides knowledge of each $M$-th order moment of $\beta_{1,1}$, and Assumption 6 that a rich collection of moments are nonzero. The intuition for why we can relax the scale assumption on moments is that here we instead set the scale of $V$.

Assumption B.1. $V$ is known in a neighborhood of $0$, up to an additive constant.

Proposition B.1.

Let Assumptions 1-3, 5, and B.1 hold with the same natural number $M$. Each $M$-th order moment

$$\int \beta_{k_1,\ell_1} \cdots \beta_{k_M,\ell_M}\, d\nu(\beta)$$

is identified.

Proof. Lemma A.2 holds under the assumptions of this proposition, and so for each tuple of good indices $\gamma \in \{1, \ldots, K\}^M$ and characteristic indices $\xi$ we have

$$\partial_{(\gamma,\xi)} Y_k(0) = \partial_\gamma \partial_k V(0) \int \beta_{(\gamma,\xi)}\, d\nu(\beta).$$

Since $\partial_\gamma \partial_k V(0) \neq 0$, we identify $\int \beta_{(\gamma,\xi)}\, \nu(d\beta)$ by

$$\frac{\partial_{(\gamma,\xi)} Y_k(0)}{\partial_\gamma \partial_k V(0)} = \int \beta_{(\gamma,\xi)}\, d\nu(\beta).$$

The proof demonstrates that in fact we only need $\partial_\gamma \partial_k V(0)$ to be known and nonzero for some $k$ in order to identify a corresponding moment of $\beta$. For further relation to Theorem 1, consider good indices $\gamma = (1, \ldots, 1)$ and characteristic indices $\xi = (1, \ldots, 1)$. Then as in the proof of Proposition B.1, we obtain

$$\partial_{x_{1,1}}^M Y_k(0) = \partial_1^M \partial_k V(0) \int \beta_{1,1}^M\, d\nu(\beta).$$

This shows that one can either fix the $M$-th moment of $\beta_{1,1}$ or fix $\partial_1^M \partial_k V(0)$ for some $k$, and then the other can be identified. Thus, Assumption 4 can be replaced in Theorem 1 if we instead assume $\partial_1^M \partial_k V(0)$ is known for some $k$. Alternatively, if we assume $\beta_{1,1}$ is independent of the other components of $\beta$ as in Section 3.1, then specifying $\partial_{1,1} V(0)$ identifies $\int \beta_{1,1}\, d\nu(\beta)$ by the envelope theorem

$$\frac{\partial Y_1(0)}{\partial x_{1,1}} = \partial_{1,1} V(0) \int \beta_{1,1}\, d\nu(\beta).$$

Thus, independence combined with a single scale assumption on a partial derivative of $V$ can identify all moments of $\beta$ by adapting Proposition 1.

In discrete choice, Fox et al. [2012] present a constructive approach to identifying moments of the distribution of random coefficients. Specializing our analysis to discrete choice, their assumptions show that when the distribution of an additive error is known (e.g. logit with known intercept), this implies identification of $V$. To see this, recall $V$ is defined as an indirect utility function given a disturbance $\overline{D}$ and constraint set $\overline{B}$. In turn, using Lemma 1 we see $\overline{D}$ and $\overline{B}$ are determined by the budget set $B$, disturbance function $D$, and measure $\mu$ over $\varepsilon$.
Thus, when the budget set and $\mu$ are known, one can find $\overline{B}$ and $\overline{D}$ needed to compute $V$. For example, multinomial logit is described by

$$V(\vec{u}) = \max_{y \in \overline{B}} \sum_{k=1}^K y_k u_k + \sum_{k=1}^K y_k (\alpha_k - \ln y_k),$$

where $\alpha_k$ is a nonrandom intercept for good $k$ and $\overline{B}$ is the probability simplex (e.g. Anderson et al. [1992]). The derivatives of $V$ can be used to yield the standard logit formula

$$Y_k(\vec{u}) = \frac{e^{\alpha_k + u_k}}{\sum_{j=1}^K e^{\alpha_j + u_j}}.$$

We conclude that Proposition B.1 is a generalization of a technique of Fox et al. [2012] to settings outside of discrete choice. However, Theorem 1 does not require that $V$ be known. Thus, the results in this paper complement their approach: while Theorem 1 relaxes assumptions on $V$, it instead requires an additional scale assumption (Assumption 4) and requires that a rich collection of moments are nonzero (Assumption 6).

B.3 Homogeneity of Coefficients and Relation to Chernozhukov et al. [2019a]

When $M = 2$, the proof of Theorem 1 establishes the constructive formula

$$\frac{\partial Y_k(0)}{\partial x_{j,\ell}} \Big/ \frac{\partial Y_j(0)}{\partial x_{k,m}} = \frac{\int \beta_{j,\ell}\, d\nu(\beta)}{\int \beta_{k,m}\, d\nu(\beta)}. \tag{9}$$

A version of (9) has appeared for binary choice in Chernozhukov et al. [2019a], who also discuss identification of the ratios of $M$-th order moments of $\beta$ up to scale. They also mention one can identify certain moments up to scale in multinomial choice. Here is one interpretation of their discussion, translated to our setup. Start with a second-order derivative:

$$\partial_{x_{k_1,\ell_1}} \partial_{x_{k_2,\ell_2}} Y_{k_3}(0) = \partial_{k_1} \partial_{k_2} \partial_{k_3} V(0) \int \beta_{k_1,\ell_1} \beta_{k_2,\ell_2}\, d\nu(\beta).$$

Now keep the good indices ($k_1$ and $k_2$) constant, but change the characteristics to get

$$\partial_{x_{k_1,\tilde{\ell}_1}} \partial_{x_{k_2,\tilde{\ell}_2}} Y_{k_3}(0) = \partial_{k_1} \partial_{k_2} \partial_{k_3} V(0) \int \beta_{k_1,\tilde{\ell}_1} \beta_{k_2,\tilde{\ell}_2}\, d\nu(\beta).$$

Since the derivatives of $V$ are taken with respect to the same arguments, we can divide these equations to identify the associated ratios of moments of $\beta$. This technique resembles an implicit function theorem argument for identification. Importantly, this technique only covers ratios of moments in which the good indices ($k_1$ and $k_2$ here) are the same, because it does not use symmetry (cf. Lemma 3). Using symmetry, this paper establishes identification of the ratio of all $M$-th order moments, not only those that have the same good indices. However, if we impose additional assumptions such as $\beta_j = \beta_k$ for all goods, then the choice of good indices does not matter. In this special case, using the equations described previously one can identify the ratio of any second-order moments of $\beta$. Similar arguments can identify the ratio of any $M$-th order moments of $\beta$.

Footnote: Fox et al. [2012] also present nonconstructive results using alternative assumptions, maintaining the assumption that $V$ is known.
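To make formula (9) concrete, the following sketch checks it numerically in a random coefficients logit model. Everything in it is an assumption made for illustration: the three-point distribution `NU` for $\beta$, the choice of $K = 3$ goods with two characteristics each, zero intercepts, and the helper names. Derivatives of the average structural function at $x = 0$ are approximated by central finite differences, and indices in the code are zero-based.

```python
import math

K, D = 3, 2  # hypothetical: 3 goods, 2 characteristics per good

# Hypothetical discrete distribution nu of beta: (probability, beta) pairs,
# where beta[k][l] is the coefficient on characteristic l of good k.
NU = [
    (0.5, [[1.0, 0.5], [2.0, -1.0], [0.5, 1.5]]),
    (0.3, [[1.5, 1.0], [1.0, 0.5], [1.0, 2.0]]),
    (0.2, [[0.5, 2.0], [3.0, 1.0], [2.0, 0.5]]),
]


def softmax(u):
    """Logit choice probabilities for utility indices u (zero intercepts)."""
    m = max(u)
    w = [math.exp(v - m) for v in u]
    s = sum(w)
    return [v / s for v in w]


def avg_demand(x):
    """Average structural function: Y(x, beta) integrated against nu."""
    total = [0.0] * K
    for prob, beta in NU:
        u = [sum(b * xi for b, xi in zip(beta[k], x[k])) for k in range(K)]
        for k, y in enumerate(softmax(u)):
            total[k] += prob * y
    return total


def dY(k, j, l, h=1e-5):
    """Central finite difference of Y_k with respect to x_{j,l} at x = 0."""
    up = [[0.0] * D for _ in range(K)]
    dn = [[0.0] * D for _ in range(K)]
    up[j][l], dn[j][l] = h, -h
    return (avg_demand(up)[k] - avg_demand(dn)[k]) / (2 * h)


def mean_beta(k, l):
    """First moment of beta_{k,l} under nu."""
    return sum(prob * beta[k][l] for prob, beta in NU)


# Both sides of (9) for goods k = 0, j = 1 and characteristics l = 1, m = 0.
k, j, l, m = 0, 1, 1, 0
lhs = dY(k, j, l) / dY(j, k, m)          # ratio of cross-derivatives of demand
rhs = mean_beta(j, l) / mean_beta(k, m)  # ratio of first moments of beta
print(lhs, rhs)
```

The two printed numbers should agree up to finite-difference error, since the cross-derivatives of $V$ at $0$ cancel by symmetry. Replacing `avg_demand` with an estimated average structural function would give a rough analogue of the plug-in construction in Section B.1.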