Identification of a class of index models: A topological approach
Manuscript submitted to
The Econometrics Journal , pp. 1–13.
Mogens Fosgerau† and Dennis Kristensen‡
†Dept. of Economics, Univ. of Copenhagen, Øster Farimagsgade 5, 1353 København K, Denmark.
E-mail: [email protected] ‡ Department of Economics, University College London, Gower Street, London, WC1E 6BT, UK.
E-mail: [email protected]
Summary
We establish nonparametric identification in a class of so-called index models using a novel approach that relies on general topological results. Our proof strategy requires substantially weaker conditions on the functions and distributions characterizing the model compared to existing strategies; in particular, it does not require any large support conditions on the regressors of our model. We apply the general identification result to additive random utility and competing risk models.
1. INTRODUCTION

We develop a novel nonparametric identification result for the following class of models,

Π(w, x, z) = Λ(a(w, x), z), (1.1)

where

a(w, x) = g(w) + h(x) (1.2)

is a vector of additively separable index functions, while Λ : R^J × R^{d_Z} → R^J, g : R^{d_W} → R^J and h : R^{d_X} → R^J are all vector-valued functions of dimension J ≥
1. The arguments w ∈ R^{d_W} and x ∈ R^{d_X} represent the values of two sets of regressors, W and X, while z ∈ R^{d_Z} corresponds to values of a set of control variables, Z. We take as a high-level assumption that we know (have observed from data) the function Π(w, x, z), for (w, x, z) in the support of (W, X, Z), from which we then wish to identify the unknown functions Λ(a, z) and h(x), while we treat the function g(w) as being known. We refer to this class of models as index models since W and X are restricted to enter the model through g(W) and h(X), respectively. We make three major contributions relative to the existing literature:

First, we do not impose any large support conditions on any of the regressors in our model. Most existing results on identification within this class of models require availability of a set of "special" continuously distributed regressors; identification is then achieved by sending each of these special regressors off to the boundary of their support. Estimators based on such "thin set identification" arguments were analyzed by Khan and Tamer (2010), who showed that they tend to be irregularly behaved with slow convergence rates. In contrast, we achieve identification as long as the random index a(W, X) exhibits sufficient, but potentially bounded, variation. We expect this to translate into better behaved estimators.

Second, we impose weak conditions on the functions of interest and the distributions of the random variables (
W, X, Z). We do not require continuity or differentiability of the functions entering the model in order to show identification, while most existing results as a minimum require these to be differentiable. Similarly, we only require g(W) to have continuous support, while (X, Z) can both be discrete, continuous or a mix of the two as long as their supports satisfy certain conditions. Thus, our results cover models with thresholds and kinks in Λ, g and h, which existing results cannot handle. In the case of discrete choice models, such features may occur if the decision maker optimizes subject to constraints; see, e.g., Cantillo and de Dios Ortúzar (2006). These models have traditionally been formulated in a parametric fashion; our theory demonstrates how they can be identified without parametric constraints. There is a growing literature on nonparametric estimation with unknown thresholds and kinks which we conjecture can be employed in our setting in order to translate our identification result into actual estimators; see, e.g., Chiou et al. (2018).

Third, we show how the presence of the controls Z can help to achieve identification in a nontrivial way: We first show local identification at each value of the control Z. Suitable variation in Z then allows us to piece the locally identified components together across different values of Z to achieve global identification. In comparison, most other papers that allow for control variables show identification at a fixed arbitrary value of Z, in which case variation in Z is unnecessary for identification.

Our proof strategy relies on arguments from general topology that, to our knowledge, are completely new to the literature on nonparametric identification. These should be of general interest since they can be used for identification in other settings. The two key elements of our approach are the notions of relative identification and connected sets. Below, we state our formal definition of the former:

Definition 1.1.
A function h is said to be relatively identified on a given set X if identification of h(x*) at some point x* ∈ X implies that h(x) is also identified at all other x ∈ X.

Next, recall the topological notion of connectedness: A connected set cannot be contained in the union of two non-empty disjoint open sets while having non-empty intersection with both. In particular, it is not possible to split a connected open set into disjoint open subsets.

Our identification strategy then proceeds in three steps, where we initially suppress the presence of Z for simplicity: First, we decompose the support of X into suitable subsets and achieve relative identification on each of these. This is done via two features of our model: For a given x, we are able to identify the relative variation in Λ(a), with a = h(x) + g(w), through the observed variation in Π(w, x) w.r.t. w through the known function g(w). By injectivity of Λ, we are then able to identify the relative value of a, which in turn yields the relative value of h(x) = a − g(w) on suitably chosen subsets of the support of X. Second, we achieve global identification on the union of these subsets by using the second main ingredient of our proof strategy, connectedness: We will require the support of a(W, X) to be connected, which is used to extend relative local identification to global identification. Finally, reintroducing Z, we again rely on the supports of X | Z = z to be suitably connected across different values of z in the support of Z to enlarge the identification region further.

Like us, Berry and Haile (2018) and Evdokimov (2010), among others, rely on connectedness to achieve global identification, but in these papers the restriction is imposed directly on the support of the covariates, thereby implicitly restricting the covariates to be continuous.
In contrast, we impose connectedness on the image of a(W, X) and so allow for both X and W to contain discrete components.

Two leading examples that fall within our general framework are nonparametric additive versions of multiple discrete choice and competing risk models, as shown in the next section. There is a large literature on identification and estimation of semiparametric multinomial choice models (see, e.g., Manski, 1975; Lewbel et al., 2000). In contrast, the literature on nonparametric identification is quite thin, with few results having been developed since the seminal work of Matzkin (1993). In terms of modelling, Matzkin (1993) is probably the most closely related to our setting, but the assumptions made and identification strategy pursued in that paper are very different from ours. Her set of assumptions and ours are not clearly ranked, with some of our assumptions being stronger and others weaker compared to hers. One key feature of her proof strategy is the introduction of assumptions that ensure the multinomial model can be converted into a binary choice problem, followed by a thin-set identification argument. More recently, Allen and Rehbeck (2019) provide conditions under which one can identify how regressors alter the desirability of alternatives using only average demands. Their conditions are weaker than ours, but on the other hand they are only able to identify certain features of the model, not the underlying data-generating structure.

There is also a nascent literature on nonparametric identification of so-called BLP models (Berry et al., 1995) as used in industrial organization; see, for example, Berry and Haile (2018) and Chiappori et al. (2018). The setting of the BLP model is somewhat different, though, since there the choice probabilities are treated as observed variables which depend on unobserved product characteristics that have to be controlled for.
This leads to a different identification problem compared to ours.

Finally, there is also a literature on identification in competing risk models. The two most closely related papers in terms of modelling are Heckman and Honoré (1989) and Lee and Lewbel (2013). Heckman and Honoré (1989) achieve identification by assuming the index (in our notation a(W, X)) has support on (0, ∞)^J and then identify a given component of the index by letting the other components go to zero, so their result falls in the thin-set identification category. Abbring and van den Berg (2003) weaken this assumption substantially for the class of mixed proportional hazard models, a subclass of competing risk models. Lee and Lewbel (2013) provide a high-level assumption for identification of the general model involving a rank condition on an integral operator. Primitive conditions for this to hold are not known. Honoré and Lleras-Muney (2006) derive bounds for the functions of interest when only discrete covariates are available. We complement these studies by showing identification in the general competing risk model under primitive conditions that allow for the presence of discrete covariates, but at the same time impose more structure on the index, cf. eq. (1.2).

In the next section, we give two motivating examples in the form of a random utility model and a competing risk model that both fall within the setting of eq. (1.1). We present our general framework and the assumptions we will work under in Section 3, and provide our identification results in Section 4. Section 5 applies our general result to the two examples and Section 6 concludes.
2. TWO MOTIVATING EXAMPLES

The model (1.1) comprises a range of models that are encountered in economics. We here present two classes of models that fall within our framework. We will return to these two classes of models in Section 5, where we apply our general identification result to each of them.
We here first demonstrate that the class of additive random utility models (ARUM) can be mapped into (1.1). Using existing results in the literature, this in turn implies that our results also apply to a broad class of rational inattention discrete choice models (Fosgerau et al., 2019) and an even wider class of perturbed utility models.
Consider an agent choosing between J + 1 alternatives, each carrying an associated indirect utility of the form

U_j = a_j(W, X) + ε_j, j = 0, 1, ..., J, (2.1.1)

where (W, X) is a set of observed covariates while ε = (ε_0, ε_1, ..., ε_J) is unobserved. This model was initially proposed by McFadden (1973) and has since become one of the workhorses in applied microeconomics; see, e.g., Ben-Akiva and Lerman (1985) and Maddala (1986). As is standard in the literature, we impose the following normalization on the "outside" option j = 0: a_0(w, x) = 0.

Some of the regressors (W, X) may potentially be dependent on ε. To handle this situation, we assume the availability of a set of control variables Z so that (W, X) are independent of ε conditional on Z. In addition to (W, X, Z), the researcher also observes the utility maximizing choice, D = arg max_{j ∈ {0, 1, ..., J}} U_j. Thus, the conditional choice probabilities (CCPs),

Π_j(w, x, z) := P(D = j | (W, X, Z) = (w, x, z)), j = 0, 1, ..., J, (2.3)

are identified in the population. We collect these in the vector-valued function Π(w, x, z) = {Π_j(w, x, z) : j = 1, ..., J} ∈ R^J, where we leave out the CCP of the outside option. It now follows from standard results in the literature that Π(w, x, z) can be written in the form (1.1) with Λ being the gradient of the so-called surplus function; see Section 5 for further details.

Our identification result requires the researcher to group the observed covariates into two sets: The first set, denoted W, contains the "special" regressors that enter the index a through a known function g(W) as specified by the researcher, cf. eq. (1.2). The second set, denoted X, then enters a through h(X), which is left unspecified. The choices of W and g(W) are application specific and should be guided by two considerations: First, g(W) needs to exhibit sufficient continuous variation on R^J since this is a key requirement for our identification result to go through.
Second, since g_j(W) affects the utility of the jth alternative positively by definition, it should be specified accordingly.

As an example of this joint modelling and identification strategy, let us consider the problem of estimating willingness-to-pay for different goods, a common problem in various applied fields of economics (e.g., Fosgerau, 2006; Bontemps and Nauges, 2016). In this setting, choosing g to be g_j(W_j) = − ln W_j, where W_j is the price of alternative j, j = 1, ..., J, transforms a positive price vector into a vector that can in principle attain values in all of R^J. With this choice, h_j(X) + ε_j captures the log willingness to pay for good j, where X contains characteristics of the agent and other characteristics of the different alternatives. Prices generally exhibit continuous variation and so satisfy the first of the two aforementioned requirements. This example assumes the availability of alternative-specific regressors, W_1, ..., W_J. However, our identification result may still be applied if this is not true. In this case, the researcher needs to construct alternative-specific regressors g_1(W), ..., g_J(W) from a set of underlying covariates W.

Our assumption of g(W) being known has antecedents in the literature on identification in discrete choice models. For example, in the context of binary choice (J = 1), Lewbel et al. (2000) also assume the presence of a "special" regressor, in our notation W, that enters the utility of alternative 1 in a known fashion. But that paper furthermore restricts h(x) to be linear, h(x) = βx, and, importantly, identification of β is achieved through variation of g(W) on the boundary of its support. Our identification result does not rely on any such argument.

Our framework also includes so-called rational inattention discrete choice models. Fosgerau et al.
(2019) show that any ARUM satisfying the conditions above is observationally equivalent to a rational inattention discrete choice model in which the prior is held constant. This generalizes the finding of Matějka and McKay (2015), who show that the multinomial logit model has a foundation as a rational inattention model. Thus, our identification result extends without effort to a broad class of rational inattention models.

The class of perturbed utility models (Fosgerau and McFadden, 2012; Fudenberg et al., 2015; Allen and Rehbeck, 2019) is another generalization of the class of ARUM. As shown by Hofbauer and Sandholm (2002), the CCPs of an ARUM can be represented as the solution to a maximization problem where an agent chooses the vector of CCPs to maximize a function that consists of a linear term and a concave term. Here we present an extended version that includes controls affecting the concave term, i.e.

Λ(a, z) = arg max_{q ∈ Δ} {a⊺q + Ω(q | z)}, (2.4)

where a ∈ R^{J+1} is a vector of utility indices, Δ = {q ∈ R^{J+1}_+ : Σ_{j=0}^{J} q_j = 1} is the unit simplex and Ω(·|z) is a concave function for each z ∈ Z. The perturbed utility model includes ARUM as a special case, while allowing an individual to have a strict preference for randomization rather than to choose a vertex of the probability simplex. As noted by Allen and Rehbeck (2019), observing only realizations of lotteries across choice options is sufficient for identification, which requires only the vector of CCPs, Π(w, x, z). We show in Section 5 that the implied CCPs satisfy (1.1).

Consider a competing risk model as in Heckman and Honoré (1989) with J competing causes of failure. A latent failure time T_j > 0 is associated with each cause j ∈ {1, ..., J}. The econometrician observes the duration until the first failure, Y = min_{j ∈ {1,...,J}} T_j, and the associated cause of failure, D = arg min_{j ∈ {1,...,J}} T_j, together with a set of observed covariates (X, W, Z).
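As a minimal illustration of this observation scheme (a hypothetical simulation of our own; the exponential scales and sample size are arbitrary choices, and the index structure is left aside), the observables (Y, D) can be generated from latent failure times as follows:

```python
import numpy as np

rng = np.random.default_rng(0)
J, n = 2, 5  # two competing causes, a handful of illustrative observations

# hypothetical latent failure times T_j > 0, one per cause and observation;
# only the smallest time and the identity of its cause are observed
T = rng.exponential(scale=[1.0, 2.0], size=(n, J))

Y = T.min(axis=1)         # duration until the first failure
D = T.argmin(axis=1) + 1  # associated cause of failure, labelled 1, ..., J
```

The latent times themselves are never observed; identification must proceed from the joint distribution of (Y, D) and the covariates.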
Assume that the jth failure time satisfies

ln T_j = a_j(W, X) − ε_j, (2.5)

for some functions a_j(w, x), j = 1, ..., J. The model may then be termed a multivariate generalized accelerated failure time model (Kalbfleisch and Prentice, 1980; Fosgerau et al.). Define

Π_j(w, x, z) := E[ln Y | W = w, X = x, Z = z] · P(D = j | W = w, X = x, Z = z), (2.6)

for j = 1, ..., J, where Z is used to control for potential dependence between (W, X) and ε. We collect the unobservables in ε = (ε_1, ..., ε_J) and again require them to be conditionally independent of (X, W), in which case, as shown in Section 5, Π defined above again satisfies eq. (1.1).

Typical applications of the above model are in the modelling of (un)employment spells, where an exit from the unemployment register can be the result of finding a full- or part-time job in different sectors or another change of status. Thus, in this setting, j = 1, ..., J indexes the different exits (types of non-unemployment), and (W, X) contain both variables characterizing the types of employment (such as salary in a given type/sector of employment) and individual-specific controls (such as age and marital status). Similar to discrete choice models, we would then need to construct g(W) to capture risk-specific characteristics with continuous variation and then include all other covariates in X. Most empirical applications assume a parametric structure for the index, e.g. a(W, X) = αW + βX. In this setting, requiring g to be known effectively amounts to fixing α ∈ R^{J×d_W}. At the same time, we impose very weak restrictions on the distributional features of the regressors X and how they enter the index a(W, X).

3. GENERAL FRAMEWORK

We now return to the general model given in eqs. (1.1)-(1.2), where g : R^{d_W} → R^J is assumed to be a known function while h : R^{d_X} → R^J and Λ : R^J × R^{d_Z} → R^J are unknown functions.
In the following, let int A denote the interior of a given set A and let supp(Y) denote the support of a given random variable Y. We then take Π(w, x, z) as given and known to us for all (w, x, z) ∈ supp(W, X, Z) ⊆ R^{d_W} × R^{d_X} × R^{d_Z}, where (W, X, Z) denote the random variables that we have observed, cf. the examples in the previous section.

The covariates contained in g(W) play a special role in our approach in that we need sufficient continuous variation in these to achieve identification. First note that dim g(W) = J. Thus, sufficient continuous variation of g(W), which is known to us, permits us to identify the relative variation of Λ(a, z) w.r.t. a. Formally, for any given pair (x, z) ∈ supp(X, Z), define

G(x, z) = int supp(g(W) | X = x, Z = z), X(z) = supp(X | Z = z). (3.7)

We will then throughout implicitly require that some of the open sets G(x, z), (x, z) ∈ supp(X, Z), are non-empty and then achieve identification at the values of x for which this is true. A sufficient condition for a given G(x, z) to be non-empty is that the distribution of W | X = x, Z = z is continuous and that g maps open sets into open sets; however, this is not required, and g(W) may contain discrete components as long as they fall within the support of the continuous component. Our identification result still applies if some values of a discrete component fall outside the continuous support, but it then excludes these values. This also rules out that some components of W are included in Z, since in this case G(x, z) = ∅. At the same time, however, (X, W) can depend on Z; we just need sufficient variation in (X, W) conditional on Z.
Moreover, no continuity restrictions are imposed on the distribution of (X, Z), which may be completely discrete. Finally, we would like to stress that we do not impose any large-support restrictions on g(W), which is in contrast to most existing results in the literature, as discussed in the Introduction. If, for example, G(x, z) = R^J for all x, then our result demonstrates that h(x) is identified on all of supp(X); but this is not necessary, and identification on all of supp(X) can be achieved without such a full support condition.

Next, let

M(x, z) = G(x, z) × {x}, A(x, z) = a(M(x, z)) = G(x, z) + {h(x)},

denote the supports of (W, X) | (X, Z) = (x, z) and a(W, X) | (X, Z) = (x, z), respectively, and

M(z) = ∪_{x ∈ X(z)} M(x, z), A(z) = ∪_{x ∈ X(z)} A(x, z) = a(M(z)), (3.8)

the supports of the same random variables but now only conditioning on Z = z. Finally, for some set Z ⊆ supp(Z) chosen according to certain assumptions stated below, let

A = ∪_{z ∈ Z} A(z), X = ∪_{z ∈ Z} X(z), (3.9)

be the supports of a(W, X) and X conditional on Z ∈ Z, respectively. We will then show identification of h(x) and Λ(a, z) for x ∈ X, a ∈ A and z ∈ Z. Specifically, Z will be constructed according to certain properties of the underlying covariates and the functions of interest. Observe the dependence of M and A on the set Z. To achieve "maximal" identification, we would ideally like to choose Z = supp(Z). However, we potentially have to restrict Z. First, we require a ↦ Λ(a, z) to satisfy the following condition for all z ∈ Z:

Assumption 3.1.
For any z ∈ Z, a ↦ Λ(a, z) is injective on A(z) as defined in (3.8).

By asking for Λ(a, z) to be injective, we can identify the relative variation in a(w, x) through the observed variation in Π(w, x, z). In a given application, Assumption 3.1 may not hold for all z ∈ supp(Z), in which case we need to remove such values from Z. In the worst case scenario, this leaves us with Z being empty and our identification result becomes void. At the other extreme, Z = supp(Z) and we may achieve identification on the whole support.

Due to the structure of a(w, x), it follows from the definition of G(x, z) that A(x, z), and thereby also A(z) and A, are open sets. We add to this by also requiring A(z) to be connected for all z ∈ Z. An open set A is connected if A = O_1 ∪ O_2 implies that O_1 ∩ O_2 ≠ ∅ whenever O_1 and O_2 are nonempty open sets. Thus an open connected set cannot be separated into two non-empty disjoint open sets. We then impose:

Assumption 3.2. A(z) is connected for all z ∈ Z.

Assumption 3.2 allows us to go from local identification at a given point x ∈ X(z) to relative identification on all of X(z), z ∈ Z, via the image of a(x, w). The assumption imposes restrictions on the support of the random variable a(X, W) instead of (
X, W) themselves. This is done in order to impose minimal restrictions on the distribution of X and the smoothness of h. Recall that W is assumed to contain a continuous component. Thus, Assumption 3.2 includes, for example, the case of X being unbounded and discrete, or X being continuous while h(X) is discontinuous everywhere. Assumption 3.2 is not verifiable from data, but the same holds for smoothness conditions that are regularly imposed in existing identification results. If we are willing to entertain certain smoothness conditions, such as the inverse of Λ(a, z) being continuous with respect to a, then the assumption is implied by connectedness of Π(M(z) | z) = Λ(A(z), z), this latter property being verifiable. Similarly, if we restrict X and h to both be continuous, it will be implied by connectedness of M(z).

Once we have achieved relative identification on each X(z), z ∈ Z, global identification is then reached through the following assumption:

Assumption 3.3. If Z_1 ∪ Z_2 = Z with Z_1, Z_2 ≠ ∅, then (∪_{z ∈ Z_1} M(z)) ∩ (∪_{z ∈ Z_2} M(z)) ≠ ∅.

This is used to paste together the relatively identified sets X(z) across z. Again, this assumption does not require X and/or h to be continuous, only that the sets supp(W, X | Z = z), z ∈ Z, overlap. Finally, the following normalization on the function h gives us identification on X(z):

Assumption 3.4.
There exist known z_0 ∈ Z and (w_0, x_0) ∈ M(z_0) so that h(x_0) = 0.

Such a normalization is needed to identify the level of h since, for any given pair (Λ, h), we have Λ(g(w) + h(x), z) = Λ̃(g(w) + h̃(x), z), where Λ̃(a, z) = Λ(a + c, z) and h̃(x) = h(x) − c for some given value of c ∈ R^J.

4. MAIN RESULT

As explained earlier, we shall make use of the notion of relative identification in our proof of identification. As a first step, we show relative identification on any two overlapping images of a; this is achieved through injectivity of Λ(a, z), which allows us to map the overlapping images into overlapping images of Π.

Lemma 4.1.
Suppose that Assumption 3.1 holds and that h(x*) is identified at x* ∈ X(z) for some z ∈ Z. Then the set X*(z) := {x ∈ X(z) | A(x*, z) ∩ A(x, z) ≠ ∅} is identified, and h(x) is identified on X*(z).

Proof.
By definition, A(x*, z) ∩ A(x, z) ≠ ∅ if and only if there exist w* and w so that g(w*) ∈ G(x*, z), g(w) ∈ G(x, z) and a(w*, x*) = a(w, x). Using that Λ(a, z) is injective by Assumption 3.1, the last equality is equivalent to Λ(a(w*, x*), z) = Λ(a(w, x), z), which we recognize as

Π(w*, x*, z) = Π(w, x, z), (4.10)

where Π is known to us. Thus, X*(z) is identified as the set of solutions x to (4.10) as we vary (w*, w). Next, for any given x ∈ X*(z), let w* and w be the corresponding values for which (4.10) holds. Since these are known, the value a(w*, x*) = g(w*) + h(x*) is also known to us. This in turn implies that h(x) = a(w, x) − g(w) = a(w*, x*) − g(w) is identified.

We then use this lemma in conjunction with the connectedness of A(z) to show relative identification on each of the sets X(z):

Lemma 4.2.
Suppose that Assumptions 3.1-3.2 hold. Then, for all z ∈ Z, h(x) is relatively identified on X(z) as defined in eq. (3.7).

Proof.
Let x* ∈ X(z) be given and suppose we know the value of h(x*). Let X*(z) ⊆ X(z) be the set on which h(x) is identified and let A*(z) = ∪_{x ∈ X*(z)} A(x, z) be the corresponding values of a(w, x). By assumption, x* ∈ X*(z) and so the identified set is non-empty. This in turn implies that A*(z) is non-empty and open. Now, seeking a contradiction, suppose that X**(z) := X(z) \ X*(z) ≠ ∅. Then define A**(z) = ∪_{x ∈ X**(z)} A(x, z), which is also open and non-empty. Since A*(z) ∪ A**(z) = A(z), which is connected according to Assumption 3.2, there must exist x ∈ X*(z) and x′ ∈ X**(z) so that A(x, z) ∩ A(x′, z) ≠ ∅. Lemma 4.1 then implies that h(x′) is also identified, which is a contradiction.

Finally, the "connectedness" of ∪_{z ∈ Z} M(z) as stated in Assumption 3.3, together with the normalization in Assumption 3.4, gives us global identification:

Theorem 4.1.
Under Assumptions 3.1-3.4, h(x) is identified on X = ∪_{z ∈ Z} X(z).

Proof.
Let X* be the identified set. By Lemma 4.2, X* = ∪_{z ∈ Z*} X(z) for some Z* ⊆ Z. By Assumption 3.4, z_0 ∈ Z* and so the set is non-empty. Seeking a contradiction, suppose that Z** := Z \ Z* ≠ ∅. By definition, Z* ∪ Z** = Z and so {∪_{z ∈ Z*} M(z)} ∩ {∪_{z ∈ Z**} M(z)} ≠ ∅ by Assumption 3.3. This implies that there exist z* ∈ Z* and z** ∈ Z** so that M(z*) ∩ M(z**) ≠ ∅, which in turn implies that there exists x* ∈ X(z*) ∩ X(z**) for which h(x*) is identified. But then Lemma 4.2 implies that h(x) is identified on all of X(z**), which is a contradiction.

Once we have identified h we can also identify Λ:

Theorem 4.2.
Under Assumptions 3.1-3.4,
Λ(a, z) is identified on {(a, z) | a ∈ A(z), z ∈ Z}.

Proof.
Let z ∈ Z and a ∈ A(z) be given. By definition of A(z), there exists some pair (w, x) ∈ M(z) such that a = a(w, x). Since h(·), and thereby also a(·, ·), is identified, the pair (w, x) is known. But then we also know Π(w, x, z), and so Λ(a, z) = Π(w, x, z) is uniquely identified.

5. APPLICATIONS

This section applies the general result to the two main examples of Section 2, the ARUM and the competing risk model, and compares our identification results for these two models with existing ones found in the literature. In both examples, we impose the following conditional independence restriction on the error term:

Assumption 5.1. (i) ε is conditionally independent of (X, W), F_{ε|(W,X,Z)}(·|·, ·, z) = F_{ε|Z}(·|z) for all z ∈ Z for some Z ⊆ supp(Z); (ii) F_{ε|Z}(·|z) has a conditional density with full support for all z ∈ Z.

We demonstrate in the next two subsections that part (i) implies that Π, as defined in eqs. (2.3) and (2.6), respectively, can be written in the form (1.1)-(1.2) for all z ∈ Z, while part (ii) ensures that the model-specific Λ(a, z) is injective w.r.t. a for all z ∈ Z.

Define the surplus function

G(a_0, ..., a_J, z) := E[max_{j=0,...,J} U_j | a(W, X) = a, Z = z] = E[max_{j=0,...,J} {ε_j + a_j} | Z = z],

for any given (a_0, a_1, ..., a_J) ∈ R^{J+1}, where the second equality uses eq. (2.1.1) and Assumption 5.1(i). The Williams-Daly-Zachary Theorem (McFadden, 1981) then implies that the CCPs, as defined in (2.3), can be written in the form (1.1)-(1.2) with Λ defined as the gradient of the surplus function,

Λ(a, z) := ∂G(a, z)/∂a |_{a_0 = 0}.

We conclude:
Corollary 5.1.
Any ARUM of the form (2.1.1) that satisfies Assumptions 3.1-3.4 and 5.1(i) is identified.
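To make the identification machinery concrete, the following sketch works through the logit special case. All functional forms here, the softmax Λ, the choice g(w) = w, and the particular h, are illustrative assumptions of ours, not part of the formal result. Treating Π as a black box and g as known, the sketch performs the matching step of Lemma 4.1 numerically: solve Π(w, x) = Π(w*, x*) for w, then recover h(x) = g(w*) + h(x*) − g(w).

```python
import numpy as np

J = 2  # number of non-outside alternatives

def h_true(x):
    # hidden index function; the econometrician does not know this
    return np.array([0.5 * x, -0.3 * x ** 2])

def Lambda(a):
    # hidden CCP map; logit case: softmax over {outside, 1, ..., J}
    e = np.exp(np.concatenate(([0.0], a)))
    return (e / e.sum())[1:]

def Pi(w, x):
    # what the econometrician observes, with the known choice g(w) = w
    return Lambda(w + h_true(x))

def match_w(x, w_star, x_star, iters=50, eps=1e-7):
    # matching step of Lemma 4.1: find w with Pi(w, x) = Pi(w*, x*),
    # here via a finite-difference Newton iteration on the CCP residual
    target = Pi(w_star, x_star)
    w = np.zeros(J)
    for _ in range(iters):
        base = Pi(w, x)
        jac = np.empty((J, J))
        for k in range(J):
            dw = np.zeros(J)
            dw[k] = eps
            jac[:, k] = (Pi(w + dw, x) - base) / eps
        w = w - np.linalg.solve(jac, base - target)
    return w

# normalize h(x_star) = 0 (Assumption 3.4 holds here since h_true(0) = 0)
x_star, w_star, x = 0.0, np.array([0.2, -0.1]), 1.3
w_hat = match_w(x, w_star, x_star)
h_hat = w_star - w_hat  # h(x) = g(w*) + h(x*) - g(w) with g(w) = w
```

Here h_hat agrees with the hidden h_true(1.3), even though neither Λ nor h was used directly in the recovery, only Π and g.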
Next, we discuss each of Assumptions 3.1-3.4 in the context of ARUM and how they compare with existing assumptions found in the literature on identification of ARUM.

First, Assumption 3.1, injectivity of Λ(·, z) for each z, is implied by Assumption 5.1(ii), cf. Hofbauer and Sandholm (2002, Thm 2.1). However, Assumption 5.1(ii) is not necessary for injectivity to hold. A simple example is the binomial model, where the probability for alternative 0 is the cumulative distribution of ε. If the distribution includes point masses, then ties can occur, but this does not destroy injectivity. This is true for any tie-breaking rule. More generally, if the subdifferential of the surplus function is strictly cyclically monotone (Rockafellar, 1970), which does not require the existence of a density, then the utility maximizing choice probabilities under any tie-breaking rule are injective (Sørensen and Fosgerau, 2020).

Assumptions 3.2-3.3 impose restrictions on the joint variation of (g(W), X). For Assumption 3.2 to hold, we need to identify J regressors, g(W), that exhibit enough joint continuous variation so that their joint support, conditional on (X, Z), has non-empty interior in R^J. One instance where this can be achieved is if we have observed alternative-specific characteristics. In the case of demand modelling, one such choice would be a (transformation) of the (relative) prices of the different alternatives, while X contains all remaining regressors, possibly including other alternative-specific covariates. In this case, to control for potential endogeneity of prices, we could then include cost shifters in Z. Prices tend to exhibit continuous variation, and Assumption 3.2 would then be likely to hold.
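Returning to the point-mass claim made above for Assumption 3.1, it can be checked directly in the binary case. The sketch below is our own illustration (the mixture weight and the normal continuous part are arbitrary choices): it builds a choice probability from an error distribution with an atom at zero and verifies on a grid that it remains strictly increasing, hence injective, despite the jump.

```python
import math

def F_mixed(t, p=0.3):
    # CDF of the error: weight 1 - p on a standard normal plus an atom
    # of mass p at zero, so ties occur with positive probability
    normal_cdf = 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    return (1.0 - p) * normal_cdf + (p if t >= 0 else 0.0)

def Lambda_binary(a):
    # probability of choosing alternative 1: P(a + eps > 0) = 1 - F(-a),
    # with ties here broken in favour of the outside option; the atom
    # produces a jump at a = 0 but no flat regions
    return 1.0 - F_mixed(-a)

grid = [-3.0 + 0.01 * k for k in range(601)]
vals = [Lambda_binary(a) for a in grid]
strictly_increasing = all(v1 < v2 for v1, v2 in zip(vals, vals[1:]))
```

Since the continuous normal part has full support, the mapping stays strictly monotone everywhere, which is the injectivity needed for Assumption 3.1.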
Assumption 3.3 requires other observed product characteristics and the agent's observed characteristics to exhibit sufficient variation conditional on the controls in Z so that these have overlapping support across different values of Z.

As already mentioned in the introduction, there are few fully nonparametric identification results for ARUM. To our knowledge, the only results comparable to ours are found in Matzkin (1993). Her results also require the presence of alternative-specific regressors but impose stronger conditions on these and other covariates. Moreover, her set-up does not include any control variables. On the other hand, she does not necessarily require that a(W, X) is additive, which we assume throughout. Theorem 1 of Matzkin (1993) does allow for dependence between (W, X) and ε, but in this case she requires the observed component of the utilities to be identical across alternatives and strictly increasing in one of the arguments. In our notation, this requires a_j(W, X), j = 1, ..., J, to all be identical. We do not impose any such constraints. Her Theorem 2 requires full independence between (W, X) and ε but, on the other hand, imposes fewer restrictions on a(W, X) compared to us. But in both cases, she identifies Λ by letting different components of W diverge to +∞, which is an example of the "thin set identification" discussed earlier.

We here demonstrate that the CCPs for the perturbed discrete choice model can again be expressed on the form (1.1)-(1.2) with Λ defined in (2.4) being injective. This is done under the following restrictions: First, in order to rule out zero demands, the norm of the gradient ∇_q Ω(q|z) has to approach infinity as q approaches the boundary of the unit simplex. Second, Ω(q|z) is differentiable. Third, we normalize the outside option so that g_0(w) = h_0(x) = 0. Under these three restrictions, for each value of the control z, the demand solves the first-order condition for an interior solution,
\[
a + \nabla_q \Omega(\Lambda(a, z) \mid z) = \lambda \iota,
\]
where λ is a scalar constant and ι ∈ R^J is a vector consisting of ones. To show that Λ is injective, consider this equation at a^0 and a^1 and assume that Λ(a^0, z) = Λ(a^1, z). Define a matrix M such that M x = x − x_0 ι for all x = (x_0, ..., x_J) ∈ R^{J+1}. Pre-multiplying this matrix onto the first-order condition yields
\[
a^0 + M \nabla_q \Omega(\Lambda(a^0, z) \mid z) = a^1 + M \nabla_q \Omega(\Lambda(a^1, z) \mid z),
\]
which implies that a^0 = a^1, as required. 
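To make the injectivity argument concrete, here is a numerical sketch under the added assumption of an entropic perturbation, Ω(q|z) = Σ_j q_j ln q_j (which the paper does not impose). The first-order condition then delivers logit demand, and a is recovered from q = Λ(a, z) by differencing log demands against the outside option — exactly the role played by the matrix M above:

```python
import numpy as np

def demand(a):
    # With Omega(q) = sum_j q_j * ln(q_j), the interior first-order condition
    # a + grad Omega(q) = lambda * iota solves to q_j proportional to exp(a_j):
    # logit demand over J inside alternatives plus the outside option (a_0 = 0).
    full = np.concatenate(([0.0], a))
    e = np.exp(full - full.max())
    return e / e.sum()

def invert(q):
    # Differencing against alternative 0 (the action of the matrix M)
    # recovers the index: a_j = ln q_j - ln q_0 for j = 1, ..., J.
    return np.log(q[1:]) - np.log(q[0])

a = np.array([0.7, -0.3, 1.5])
q = demand(a)

# Round trip: Lambda(., z) is injective, so a is pinned down by q.
assert np.allclose(invert(q), a)
```

Other strictly convex perturbation functions change the functional form of the demand but not the injectivity conclusion.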
Define
\[
\Lambda(a, z) := G(a, z) \cdot \frac{\partial G(a, z)}{\partial a}, \tag{5.11}
\]
where as before a = (a_1, ..., a_J), while G(a, z) is now defined as the expected log failure time,
\[
G(a, z) := E\big[\ln Y \mid a(W, X) = a,\ Z = z\big] = -E\Big[\max_{j=1,\ldots,J}\{-a_j + \varepsilon_j\}\,\Big|\, Z = z\Big],
\]
where the second equality uses eq. (2.5) and Assumption 5.1(i). The Williams-Daly-Zachary Theorem (McFadden, 1981) then implies that Π, now defined by (2.6), can be written on the form (1.1)-(1.2). Injectivity of Λ(a, z), as given in eq. (5.11), is obtained by recycling the arguments of the previous subsection, except that no normalization of one of the causes of failure is required since the level G(a, z) is included.

Corollary 5.2.
Any competing risk model on the form (2.5) that satisfies Assumptions 3.1-3.4 and 5.1(i) is identified.

Given that the competing risk model and the ARUM share a similar structure, the discussion of the remaining assumptions carries over to the current setting with obvious modifications.

Compared to existing results (Heckman and Honoré, 1989; Lee and Lewbel, 2013), we impose stronger conditions on the index a(W, X) since we require it to be additive and with g(W) known. On the other hand, Heckman and Honoré (1989) require a(W, X) to go to zero as W diverges, and so rely on a "thin set identification" argument, while Lee and Lewbel (2013) rely on a high-level functional rank condition. It is unclear which primitive conditions suffice for this rank condition to hold. Finally, Honoré and Lleras-Muney (2006) restrict themselves to the case of purely discrete regressors and are only able to derive bounds for objects of interest. We achieve point identification as long as there is some continuous variation in W, while X can be completely discrete.

6. CONCLUSION

We have established an identification result for a wide class of index models based on general topological arguments. Three key features of our argument are that smoothness of the model is not required; no large support condition is imposed on the regressors; and control variables may contribute to achieving identification. We leave the development of nonparametric estimators of the identified components for future research.

REFERENCES

Abbring, J. H. and G. J. van den Berg (2003). The identifiability of the mixed proportional hazards competing risks model. Journal of the Royal Statistical Society 65(3), 701–710.
Allen, R. and J. Rehbeck (2019). Identification With Additively Separable Heterogeneity.
Econometrica 87(3), 1021–1054.
Ben-Akiva, M. and S. R. Lerman (1985). Discrete Choice Analysis: Theory and Application to Travel Demand, Volume 6. Cambridge, MA: MIT Press.
Berry, S. T. and P. A. Haile (2018). Identification of Nonparametric Simultaneous Equations Models With a Residual Index Structure. Econometrica 86(1), 289–315.
Berry, S. T., J. Levinsohn, and A. Pakes (1995). Automobile Prices in Market Equilibrium. Econometrica 63(4), 841–890.
Bontemps, C. and C. Nauges (2016). The Impact of Perceptions in Averting-decision Models: An Application of the Special Regressor Method to Drinking Water Choices. American Journal of Agricultural Economics 98(1), 297–313.
Cantillo, V. and J. de Dios Ortúzar (2006). Implications of thresholds in discrete choice modelling. Transport Reviews 26(6), 667–691.
Chiappori, P.-A., I. Komunjer, and D. Kristensen (2018). Nonparametric identification and estimation of discrete choice models.
Chiou, Y. Y., M. Y. Chen, and J.-e. Chen (2018). Nonparametric regression with multiple thresholds: Estimation and inference. Journal of Econometrics 206(2), 472–514.
Evdokimov, K. (2010). Identification and Estimation of a Nonparametric Panel Data Model with Unobserved Heterogeneity.
Fosgerau, M. (2006). Investigating the distribution of the value of travel time savings. Transportation Research Part B: Methodological 40(8), 688–707.
Fosgerau, M., D. McFadden, and M. Bierlaire (2013). Choice probability generating functions. Journal of Choice Modelling 8.
Fosgerau, M. and D. L. McFadden (2012). A theory of the perturbed consumer with general budgets. NBER Working Paper, 1–27.
Fosgerau, M., E. Melo, A. de Palma, and M. Shum (2019). Discrete Choice and Rational Inattention: A General Equivalence Result. SSRN Electronic Journal.
Fudenberg, D., R. Iijima, and T. Strzalecki (2015). Stochastic Choice and Revealed Perturbed Utility. Econometrica 83(6), 2371–2409.
Heckman, J. J. and B. E. Honoré (1989). The identifiability of the competing risks model. Biometrika 76(2), 325–330.
Hofbauer, J. and W. H. Sandholm (2002). On the global convergence of stochastic fictitious play. Econometrica 70(6), 2265–2294.
Honoré, B. E. and A. Lleras-Muney (2006). Bounds in Competing Risks Models and the War on Cancer. Econometrica 74(6), 1675–1698.
Kalbfleisch, J. D. and R. L. Prentice (1980). The Statistical Analysis of Failure Time Data. Wiley Series in Probability and Statistics. Hoboken, New Jersey: Wiley.
Khan, S. and E. Tamer (2010). Irregular Identification, Support Conditions, and Inverse Weight Estimation. Econometrica 78(6), 2021–2042.
Lee, S. and A. Lewbel (2013). Nonparametric identification of accelerated failure time competing risks models. Econometric Theory 29(5), 905–919.
Lewbel, A., W. Shen, and H. M. Zhang (2000). Semiparametric qualitative response model estimation with unknown heteroscedasticity or instrumental variables. Journal of Econometrics 97, 145–177.
Maddala, G. S. (1986). Limited-Dependent and Qualitative Variables in Econometrics. Cambridge: Cambridge University Press.
Manski, C. F. (1975). Maximum score estimation of the stochastic utility model of choice. Journal of Econometrics 3(3), 205–228.
Matějka, F. and A. McKay (2015). Rational Inattention to Discrete Choices: A New Foundation for the Multinomial Logit Model. American Economic Review 105(1), 272–298.
Matzkin, R. L. (1993). Nonparametric identification and estimation of polychotomous choice models. Journal of Econometrics 58(1–2), 137–168.
McFadden, D. (1981). Econometric Models of Probabilistic Choice. In C. Manski and D. McFadden (Eds.), Structural Analysis of Discrete Data with Econometric Applications, pp. 198–272. Cambridge, MA: MIT Press.
McFadden, D. L. (1973). Conditional logit analysis of qualitative choice behavior. In Frontiers in Econometrics, pp. 105–142. New York: Academic Press.
Rockafellar, R. T. (1970). Convex Analysis. Princeton, NJ: Princeton University Press.