Identification of a class of index models: A topological approach
Manuscript submitted to
The Econometrics Journal , pp. 1–13.
Mogens Fosgerau† and Dennis Kristensen‡
†Dept. of Economics, Univ. of Copenhagen, Øster Farimagsgade 5, 1353 København K, Denmark.
E-mail: [email protected] ‡ Department of Economics, University College London, Gower Street, London, WC1E 6BT, UK.
E-mail: [email protected]
Summary
We establish nonparametric identification in a class of so-called index models using a novel approach that relies on general topological results. Our proof strategy requires substantially weaker conditions on the functions and distributions characterizing the model compared to existing strategies; in particular, it does not require any large support conditions on the regressors of our model. We apply the general identification result to additive random utility and competing risk models.
1. INTRODUCTION

We develop a novel nonparametric identification result for the following class of models,

Π(w, x, z) = Λ(a(w, x), z), (1.1)

where

a(w, x) = g(w) + h(x) (1.2)

is a vector of additively separable index functions, while Λ : R^J × R^{d_Z} → R^J, g : R^{d_W} → R^J and h : R^{d_X} → R^J are all vector-valued functions of dimension J ≥
1. The arguments w ∈ R^{d_W} and x ∈ R^{d_X} represent the values of two sets of regressors, W and X, while z ∈ R^{d_Z} corresponds to values of a set of control variables, Z. We take as a high-level assumption that we know (have observed from data) the function Π(w, x, z), for (w, x, z) in the support of (W, X, Z), from which we then wish to identify the unknown functions Λ(a, z) and h(x), while we treat the function g(w) as being known. We refer to this class of models as index models since W and X are restricted to enter the model through g(W) and h(X), respectively. We make three major contributions relative to the existing literature:

First, we do not impose any large support conditions on any of the regressors in our model. Most existing results on identification within this class of models require availability of a set of "special" continuously distributed regressors; identification is then achieved by sending each of these special regressors off to the boundary of their support. Estimators based on such "thin set identification" arguments were analyzed by Khan and Tamer (2010), who showed that they tend to be irregularly behaved with slow convergence rates. In contrast, we achieve identification as long as the random index a(W, X) exhibits sufficient, but potentially bounded, variation. We expect this to translate into better behaved estimators.

Second, we impose weak conditions on the functions of interest and the distributions of the random variables (
W, X, Z). We do not require continuity or differentiability of the functions entering the model in order to show identification, while most existing results as a minimum require these to be differentiable. Similarly, we only require g(W) to have continuous support, while (X, Z) can both be discrete, continuous or a mix of the two as long as their supports satisfy certain conditions. Thus, our results cover models with thresholds and kinks in Λ, g and h, which existing results cannot handle. In the case of discrete choice models, such features may occur if the decision maker optimizes subject to constraints; see, e.g., Cantillo and de Dios Ortúzar (2006). These models have traditionally been formulated in a parametric fashion; our theory demonstrates how they can be identified without parametric constraints. There is a growing literature on nonparametric estimation with unknown thresholds and kinks which we conjecture can be employed in our setting in order to translate our identification result into actual estimators; see, e.g., Chiou et al. (2018).

Third, we show how the presence of the controls Z can help to achieve identification in a nontrivial way: We first show local identification at each value of the control Z. Suitable variation in Z then allows us to piece the locally identified components together across different values of Z to achieve global identification. In comparison, most other papers that allow for control variables show identification at a fixed arbitrary value of Z, in which case variation in Z is unnecessary for identification.

Our proof strategy relies on arguments from general topology that, to our knowledge, are completely new to the literature on nonparametric identification. These should be of general interest since they can be used for identification in other settings. The two key elements of our approach are the notions of relative identification and connected sets. Below, we state our formal definition of the former:

Definition 1.1.
A function h is said to be relatively identified on a given set X if identification of h(x*) at some point x* ∈ X implies that h(x) is also identified at all other x ∈ X.

Next, recall the topological notion of connectedness: A connected set cannot be contained in the union of two non-empty disjoint open sets while having non-empty intersection with both. In particular, it is not possible to split a connected open set into disjoint open subsets.

Our identification strategy then proceeds in three steps, where we initially suppress the presence of Z for simplicity: First, we decompose the support of X into suitable subsets and achieve relative identification on each of these. This is done via two features of our model: For a given x, we are able to identify the relative variation in Λ(a), with a = h(x) + g(w), through the observed variation in Π(w, x) w.r.t. w through the known function g(w). By injectivity of Λ, we are then able to identify the relative value of a, which in turn yields the relative value of h(x) = a − g(w) on suitably chosen subsets of the support of X. Second, we achieve global identification on the union of these subsets by using the second main ingredient of our proof strategy, connectedness: We will require the support of a(W, X) to be connected, which is used to extend relative local identification to global identification. Finally, reintroducing Z, we again rely on the supports of X | Z = z to be suitably connected across different values of z in the support of Z to enlarge the identification region further.

Like us, Berry and Haile (2018) and Evdokimov (2010), among others, rely on connectedness to achieve global identification, but in these papers the restriction is imposed directly on the support of the covariates, thereby implicitly restricting the covariates to be continuous.
In contrast, we impose connectedness on the image of a(W, X) and so allow for both X and W to contain discrete components.

Two leading examples that fall within our general framework are nonparametric additive versions of multiple discrete choice and competing risk models, as shown in the next section. There is a large literature on identification and estimation of semiparametric multinomial choice models (see, e.g., Manski, 1975; Lewbel et al., 2000). In contrast, the literature on nonparametric identification is quite thin, with few results having been developed since the seminal work of Matzkin (1993). In terms of modelling, Matzkin (1993) is probably the most closely related to our setting, but the assumptions made and identification strategy pursued in that paper are very different from ours. Her set of assumptions and ours are not clearly ranked, with some of our assumptions being stronger and others weaker compared to hers. One key feature of her proof strategy is the introduction of assumptions that ensure the multinomial model can be converted into a binary choice problem, followed by a thin-set identification argument. More recently, Allen and Rehbeck (2019) provide conditions under which one can identify how regressors alter the desirability of alternatives using only average demands. Their conditions are weaker than ours, but on the other hand they are only able to identify certain features of the model, not the underlying data-generating structure.

There is also a nascent literature on nonparametric identification of so-called BLP models (Berry et al., 1995) as used in industrial organization; see, for example, Berry and Haile (2018) and Chiappori et al. (2018). The setting of the BLP model is somewhat different, though, since there the choice probabilities are treated as observed variables which depend on unobserved product characteristics that have to be controlled for.
This leads to a different identification problem compared to ours.

Finally, there is also a literature on identification in competing risk models. The two most closely related papers in terms of modelling are Heckman and Honoré (1989) and Lee and Lewbel (2013). Heckman and Honoré (1989) achieve identification by assuming the index (in our notation a(W, X)) has support on (0, ∞)^J and then identify a given component of the index by letting the other components go to zero, so their result falls in the thin-set identification category. Abbring and van den Berg (2003) weaken this assumption substantially for the class of mixed proportional hazard models, a subclass of competing risk models. Lee and Lewbel (2013) provide a high-level assumption for identification of the general model involving a rank condition on an integral operator. Primitive conditions for this to hold are not known. Honoré and Lleras-Muney (2006) derive bounds for the functions of interest when only discrete covariates are available. We complement these studies by showing identification in the general competing risk model under primitive conditions that allow for the presence of discrete covariates, but at the same time impose more structure on the index, cf. eq. (1.2).

In the next section, we give two motivating examples in the form of a random utility model and a competing risk model that both fall within the setting of eq. (1.1). We present our general framework and the assumptions we will work under in Section 3, and provide our identification results in Section 4. Section 5 applies our general result to the two examples and Section 6 concludes.
2. TWO MOTIVATING EXAMPLES

The model (1.1) comprises a range of models that are encountered in economics. We here present two classes of models that fall within our framework. We will return to these two classes of models in Section 5, where we apply our general identification result to each of them.
We here first demonstrate that the class of additive random utility models (ARUM) can be mapped into (1.1). Using existing results in the literature, this in turn implies that our results also apply to a broad class of rational inattention discrete choice models (Fosgerau et al., 2019) and an even wider class of perturbed utility models.
Consider an agent choosing between J + 1 alternatives, each carrying an associated indirect utility of the form

U_j = a_j(W, X) + ε_j, j = 0, 1, ..., J, (2.1.1)

where (W, X) is a set of observed covariates while ε = (ε_0, ε_1, ..., ε_J) is unobserved. This model was initially proposed by McFadden (1973) and has since become one of the workhorses in applied microeconomics; see, e.g., Ben-Akiva and Lerman (1985) and Maddala (1986). As is standard in the literature, we impose the following normalization on the "outside" option j = 0: a_0(w, x) = 0.

Some of the regressors (W, X) may potentially be dependent on ε. To handle this situation, we assume the availability of a set of control variables Z so that (W, X) are independent of ε conditional on Z. In addition to (W, X, Z), the researcher also observes the utility maximizing choice, D = arg max_{j ∈ {0, 1, ..., J}} U_j. Thus, the conditional choice probabilities (CCPs),

Π_j(w, x, z) := P(D = j | (W, X, Z) = (w, x, z)), j = 0, 1, ..., J, (2.3)

are identified in the population. We collect these in the vector-valued function Π(w, x, z) = {Π_j(w, x, z) : j = 1, ..., J} ∈ R^J, where we leave out the CCP of the outside option. It now follows from standard results in the literature that Π(w, x, z) can be written in the form (1.1) with Λ being the gradient of the so-called surplus function; see Section 5 for further details.

Our identification result requires the researcher to group the observed covariates into two sets: The first set, denoted W, contains the "special" regressors that enter the index a through a known function g(W) as specified by the researcher, cf. eq. (1.2). The second set, denoted X, then enters a through h(X), which is left unspecified. The choices of W and g(W) are application specific and should be guided by two considerations: First, g(W) needs to exhibit sufficient continuous variation on R^J since this is a key requirement for our identification result to go through.
Second, since g_j(W) affects the utility of the jth alternative positively by definition, it should be specified accordingly.

As an example of this joint modelling and identification strategy, let us consider the problem of estimating willingness-to-pay for different goods, a common problem in various applied fields of economics (e.g., Fosgerau, 2006; Bontemps and Nauges, 2016). In this setting, choosing g to be g_j(W_j) = − ln W_j, where W_j is the price of alternative j, j = 1, ..., J, transforms a positive price vector into a vector that can in principle attain values in all of R^J. With this choice, h_j(X) + ε_j captures the log willingness to pay for good j, where X contains characteristics of the agent and other characteristics of the different alternatives. Prices generally exhibit continuous variation and so satisfy the first of the two aforementioned requirements. This example assumes the availability of alternative-specific regressors, W_1, ..., W_J. However, our identification result may still be applied if this is not true. In this case, the researcher needs to construct alternative-specific regressors g_1(W), ..., g_J(W) from a set of underlying covariates W.

Our assumption of g(W) being known has antecedents in the literature on identification in discrete choice models. For example, in the context of binary choice (J = 1), Lewbel et al. (2000) also assume the presence of a "special" regressor, in our notation W, that enters the utility of alternative 1 in a known fashion. But that paper furthermore restricts h(x) to be linear, h(x) = βx, and, importantly, identification of β is achieved through variation of g(W) on the boundary of its support. Our identification result does not rely on any such argument.

Our framework also includes so-called rational inattention discrete choice models. Fosgerau et al.
(2019) show that any ARUM satisfying the conditions above is observationally equivalent to a rational inattention discrete choice model in which the prior is held constant. This generalizes the finding of Matějka and McKay (2015), who show that the multinomial logit model has a foundation as a rational inattention model. Thus, our identification result extends without effort to a broad class of rational inattention models.

The class of perturbed utility models (Fosgerau and McFadden, 2012; Fudenberg et al., 2015; Allen and Rehbeck, 2019) is another generalization of the class of ARUM. As shown by Hofbauer and Sandholm (2002), the CCPs of an ARUM can be represented as the solution to a maximization problem where an agent chooses the vector of CCPs to maximize a function that consists of a linear term and a concave term. Here we present an extended version that includes controls affecting the concave term, i.e.

Λ(a, z) = arg max_{q ∈ Δ} {a⊺q + Ω(q | z)}, (2.4)

where a ∈ R^{J+1} is a vector of utility indices, Δ = {q ∈ R^{J+1}_+ : Σ_{j=0}^{J} q_j = 1} is the unit simplex and Ω(·|z) is a concave function for each z ∈ Z. The perturbed utility model includes ARUM as a special case, while allowing an individual to have a strict preference for randomization rather than to choose a vertex of the probability simplex. As noted by Allen and Rehbeck (2019), observing only realizations of lotteries across choice options is sufficient for identification, which requires only the vector of CCPs, Π(w, x, z). We show in Section 5 that the implied CCPs satisfy (1.1).

Consider a competing risk model as in Heckman and Honoré (1989) with J competing causes of failure. A latent failure time T_j > 0 is associated with each cause j ∈ {1, ..., J}. The econometrician observes the duration until the first failure, Y = min_{j ∈ {1,...,J}} T_j, and the associated cause of failure, D = arg min_{j ∈ {1,...,J}} T_j, together with a set of observed covariates (X, W, Z).
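As a minimal illustration of this observation scheme (a hypothetical simulation of our own; the exponential scales and sample size are arbitrary choices, and the index structure is left aside), the observables (Y, D) can be generated from latent failure times as follows:

```python
import numpy as np

rng = np.random.default_rng(0)
J, n = 2, 5  # two competing causes, a handful of illustrative observations

# hypothetical latent failure times T_j > 0, one per cause and observation;
# only the smallest time and the identity of its cause are observed
T = rng.exponential(scale=[1.0, 2.0], size=(n, J))

Y = T.min(axis=1)         # duration until the first failure
D = T.argmin(axis=1) + 1  # associated cause of failure, labelled 1, ..., J
```

The latent times themselves are never observed; identification must proceed from the joint distribution of (Y, D) and the covariates.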
Assume that the jth failure time satisfies

ln T_j = a_j(W, X) − ε_j, (2.5)

for some functions a_j(w, x), j = 1, ..., J. The model may then be termed a multivariate generalized accelerated failure time model (Kalbfleisch and Prentice, 1980; Fosgerau et al.). Define

Π_j(w, x, z) := E[ln Y | W = w, X = x, Z = z] · P(D = j | W = w, X = x, Z = z), (2.6)

for j = 1, ..., J, where Z is used to control for potential dependence between (W, X) and ε. We collect the unobservables in ε = (ε_1, ..., ε_J) and again require them to be conditionally independent of (X, W), in which case, as shown in Section 5, Π defined above again satisfies eq. (1.1).

Typical applications of the above model are in the modelling of (un)employment spells, where an exit from the unemployment register can be the result of finding a full- or part-time job in different sectors or another change of status. Thus, in this setting, j = 1, ..., J indexes the different exits (types of non-unemployment), and (W, X) contain both variables characterizing the types of employment (such as salary in a given type/sector of employment) and individual-specific controls (such as age and marital status). Similar to discrete choice models, we would then need to construct g(W) to capture risk-specific characteristics with continuous variation and then include all other covariates in X. Most empirical applications assume a parametric structure for the index, e.g. a(W, X) = αW + βX. In this setting, requiring g to be known effectively amounts to fixing α ∈ R^{J×d_W}. At the same time, we impose very weak restrictions on the distributional features of the regressors X and how they enter the index a(W, X).

3. GENERAL FRAMEWORK

We now return to the general model given in eqs. (1.1)-(1.2), where g : R^{d_W} → R^J is assumed to be a known function while h : R^{d_X} → R^J and Λ : R^J × R^{d_Z} → R^J are unknown functions.
In the following, let int A denote the interior of a given set A and let supp(Y) denote the support of a given random variable Y. We then take Π(w, x, z) as given and known to us for all (w, x, z) ∈ supp(W, X, Z) ⊆ R^{d_W} × R^{d_X} × R^{d_Z}, where (W, X, Z) denote the random variables that we have observed, cf. the examples in the previous section.

The covariates contained in g(W) play a special role in our approach in that we need sufficient continuous variation in these to achieve identification. First note that dim g(W) = J. Thus, sufficient continuous variation of g(W), which is known to us, permits us to identify the relative variation of Λ(a, z) w.r.t. a. Formally, for any given pair (x, z) ∈ supp(X, Z), define

G(x, z) = int supp(g(W) | X = x, Z = z), X(z) = supp(X | Z = z). (3.7)

We will then throughout implicitly require that some of the open sets G(x, z), (x, z) ∈ supp(X, Z), are non-empty and then achieve identification at the values of x for which this is true. A sufficient condition for a given G(x, z) to be non-empty is that the distribution of W | X = x, Z = z is continuous and that g maps open sets into open sets; however, this is not required, and g(W) may contain discrete components as long as they fall within the support of the continuous component. Our identification result still applies if some values of a discrete component fall outside the continuous support, but it then excludes these values. This also rules out that some components of W are included in Z, since in this case G(x, z) = ∅. At the same time, however, (X, W) can depend on Z; we just need sufficient variation in (X, W) conditional on Z.
Moreover, no continuity restrictions are imposed on the distribution of (X, Z), which may be completely discrete. Finally, we would like to stress that we do not impose any large-support restrictions on g(W), which is in contrast to most existing results in the literature, as discussed in the Introduction. If, for example, G(x, z) = R^J for all x, then our result demonstrates that h(x) is identified on all of supp(X); but this is not necessary, and identification on all of supp(X) can be achieved without such a full support condition.

Next, let

M(x, z) = G(x, z) × {x}, A(x, z) = a(M(x, z)) = G(x, z) + {h(x)},

denote the supports of (W, X) | (X, Z) = (x, z) and a(W, X) | (X, Z) = (x, z), respectively, and

M(z) = ∪_{x ∈ X(z)} M(x, z), A(z) = ∪_{x ∈ X(z)} A(x, z) = a(M(z)), (3.8)

the supports of the same random variables but now only conditioning on Z = z. Finally, for some set Z ⊆ supp(Z) chosen according to certain assumptions stated below, let

A = ∪_{z ∈ Z} A(z), X = ∪_{z ∈ Z} X(z), (3.9)

be the supports of a(W, X) and X conditional on Z ∈ Z, respectively. We will then show identification of h(x) and Λ(a, z) for x ∈ X, a ∈ A and z ∈ Z. Specifically, Z will be constructed according to certain properties of the underlying covariates and the functions of interest. Observe the dependence of M and A on the set Z. To achieve "maximal" identification, we would ideally like to choose Z = supp(Z). However, we potentially have to restrict Z. First, we require a ↦ Λ(a, z) to satisfy the following condition for all z ∈ Z:

Assumption 3.1.
For any z ∈ Z, a ↦ Λ(a, z) is injective on A(z) as defined in (3.8).

By asking for Λ(a, z) to be injective, we can identify the relative variation in a(w, x) through the observed variation in Π(w, x, z). In a given application, Assumption 3.1 may not hold for all z ∈ supp(Z), in which case we need to remove such values from Z. In the worst case scenario, this leaves us with Z being empty and our identification result becomes void. At the other extreme, Z = supp(Z) and we may achieve identification on the whole support.

Due to the structure of a(w, x), it follows from the definition of G(x, z) that A(x, z), and thereby also A(z) and A, are open sets. We add to this by also requiring A(z) to be connected for all z ∈ Z. An open set A is connected if A = O_1 ∪ O_2 implies that O_1 ∩ O_2 ≠ ∅ whenever O_1 and O_2 are nonempty open sets. Thus an open connected set cannot be separated into two non-empty disjoint open sets. We then impose:

Assumption 3.2. A(z) is connected for all z ∈ Z.

Assumption 3.2 allows us to go from local identification at a given point x ∈ X(z) to relative identification on all of X(z), z ∈ Z, via the image of a(x, w). The assumption imposes restrictions on the support of the random variable a(X, W) instead of (
X, W) themselves. This is done in order to impose minimal restrictions on the distribution of X and the smoothness of h. Recall that W is assumed to contain a continuous component. Thus, Assumption 3.2 includes, for example, the case of X being unbounded and discrete, or X being continuous while h(X) is discontinuous everywhere. Assumption 3.2 is not verifiable from data, but the same holds for smoothness conditions that are regularly imposed in existing identification results. If we are willing to entertain certain smoothness conditions, such as the inverse of Λ(a, z) being continuous with respect to a, then the assumption is implied by connectedness of Π(M(z) | z) = Λ(A(z), z), this latter property being verifiable. Similarly, if we restrict X and h to both be continuous, it will be implied by connectedness of M(z).

Once we have achieved relative identification on each X(z), z ∈ Z, global identification is then reached through the following assumption:

Assumption 3.3. If Z_1 ∪ Z_2 = Z with Z_1, Z_2 ≠ ∅, then (∪_{z ∈ Z_1} M(z)) ∩ (∪_{z ∈ Z_2} M(z)) ≠ ∅.

This is used to paste together the relatively identified sets X(z) across z. Again, this assumption does not require X and/or h to be continuous, only that the sets supp(W, X | Z = z), z ∈ Z, overlap. Finally, the following normalization on the function h gives us identification on X(z):

Assumption 3.4.
There exist known z_0 ∈ Z and (w_0, x_0) ∈ M(z_0) so that h(x_0) = 0.

Such a normalization is needed to identify the level of h since, for any given pair (Λ, h), we have Λ(g(w) + h(x), z) = Λ̃(g(w) + h̃(x), z), where Λ̃(a, z) = Λ(a + c, z) and h̃(x) = h(x) − c for some given value of c ∈ R^J.

4. MAIN RESULT

As explained earlier, we shall make use of the notion of relative identification in our proof of identification. As a first step, we show relative identification on any two overlapping images of a; this is achieved through injectivity of Λ(a, z), which allows us to map the overlapping images into overlapping images of Π.

Lemma 4.1.
Suppose that Assumption 3.1 holds and that h(x*) is identified at x* ∈ X(z) for some z ∈ Z. Then the set X*(z) := {x ∈ X(z) | A(x*, z) ∩ A(x, z) ≠ ∅} is identified, and h(x) is identified on X*(z).

Proof.
By definition, A(x*, z) ∩ A(x, z) ≠ ∅ if and only if there exist w* and w so that g(w*) ∈ G(x*, z), g(w) ∈ G(x, z) and a(w*, x*) = a(w, x). Using that Λ(a, z) is injective by Assumption 3.1, the last equality is equivalent to Λ(a(w*, x*), z) = Λ(a(w, x), z), which we recognize as

Π(w*, x*, z) = Π(w, x, z), (4.10)

where Π is known to us. Thus, X*(z) is identified as the set of solutions x to (4.10) as we vary (w*, w). Next, for any given x ∈ X*(z), let w* and w be the corresponding values for which (4.10) holds. Since these are known, the value a(w*, x*) = g(w*) + h(x*) is also known to us. This in turn implies that h(x) = a(w, x) − g(w) = a(w*, x*) − g(w) is identified.

We then use this lemma in conjunction with the connectedness of A(z) to show relative identification on each of the sets X(z):

Lemma 4.2.
Suppose that Assumptions 3.1-3.2 hold. Then, for all z ∈ Z, h(x) is relatively identified on X(z) as defined in eq. (3.7).

Proof.
Let x* ∈ X(z) be given and suppose we know the value of h(x*). Let X*(z) ⊆ X(z) be the set on which h(x) is identified and let A*(z) = ∪_{x ∈ X*(z)} A(x, z) be the corresponding values of a(w, x). By assumption, x* ∈ X*(z) and so the identified set is non-empty. This in turn implies that A*(z) is non-empty and open. Now, seeking a contradiction, suppose that X**(z) := X(z) \ X*(z) ≠ ∅. Then define A**(z) = ∪_{x ∈ X**(z)} A(x, z), which is also open and non-empty. Since A*(z) ∪ A**(z) = A(z), which is connected according to Assumption 3.2, there must exist x ∈ X*(z) and x′ ∈ X**(z) so that A(x, z) ∩ A(x′, z) ≠ ∅. Lemma 4.1 then implies that h(x′) is also identified, which is a contradiction.

Finally, the "connectedness" of ∪_{z ∈ Z} M(z) as stated in Assumption 3.3, together with the normalization in Assumption 3.4, gives us global identification:

Theorem 4.1.
Under Assumptions 3.1-3.4, h(x) is identified on X = ∪_{z ∈ Z} X(z).

Proof.
Let X* be the identified set. By Lemma 4.2, X* = ∪_{z ∈ Z*} X(z) for some Z* ⊆ Z. By Assumption 3.4, z_0 ∈ Z* and so the set is non-empty. Seeking a contradiction, suppose that Z** := Z \ Z* ≠ ∅. By definition, Z* ∪ Z** = Z and so {∪_{z ∈ Z*} M(z)} ∩ {∪_{z ∈ Z**} M(z)} ≠ ∅ by Assumption 3.3. This implies that there exist z* ∈ Z* and z** ∈ Z** so that M(z*) ∩ M(z**) ≠ ∅, which in turn implies that there exists x* ∈ X(z*) ∩ X(z**) for which h(x*) is identified. But then Lemma 4.2 implies that h(x) is identified on all of X(z**), which is a contradiction.

Once we have identified h we can also identify Λ:

Theorem 4.2.
Under Assumptions 3.1-3.4,
Λ(a, z) is identified on {(a, z) | a ∈ A(z), z ∈ Z}.

Proof.
Let z ∈ Z and a ∈ A(z) be given. By definition of A(z), there exists some pair (w, x) ∈ M(z) such that a = a(w, x). Since h(·), and thereby also a(·, ·), is identified, the pair (w, x) is known. But then we also know Π(w, x, z), and so Λ(a, z) = Π(w, x, z) is uniquely identified.

5. APPLICATIONS

This section applies the general result to the two main examples of Section 2, the ARUM and the competing risk model, and compares our identification results for these two models with existing ones found in the literature. In both examples, we impose the following conditional independence restriction on the error term:

Assumption 5.1. (i) ε is conditionally independent of (X, W), F_{ε|(W,X,Z)}(·|·, ·, z) = F_{ε|Z}(·|z) for all z ∈ Z for some Z ⊆ supp(Z); (ii) F_{ε|Z}(·|z) has a conditional density with full support for all z ∈ Z.

We demonstrate in the next two subsections that part (i) implies that Π, as defined in eqs. (2.3) and (2.6), respectively, can be written in the form (1.1)-(1.2) for all z ∈ Z, while part (ii) ensures that the model-specific Λ(a, z) is injective w.r.t. a for all z ∈ Z.

Define the surplus function

G(a_0, ..., a_J, z) := E[max_{j=0,...,J} U_j | a(W, X) = a, Z = z] = E[max_{j=0,...,J} {ε_j + a_j} | Z = z],

for any given (a_0, a_1, ..., a_J) ∈ R^{J+1}, where the second equality uses eq. (2.1.1) and Assumption 5.1(i). The Williams-Daly-Zachary Theorem (McFadden, 1981) then implies that the CCPs, as defined in (2.3), can be written in the form (1.1)-(1.2) with Λ defined as the gradient of the surplus function,

Λ(a, z) := ∂G(a, z)/∂a |_{a_0 = 0}.

We conclude:
Corollary 5.1.
Any ARUM of the form (2.1.1) that satisfies Assumptions 3.1-3.4 and 5.1(i) is identified.
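To make the identification machinery concrete, the following sketch works through the logit special case. All functional forms here, the softmax Λ, the choice g(w) = w, and the particular h, are illustrative assumptions of ours, not part of the formal result. Treating Π as a black box and g as known, the sketch performs the matching step of Lemma 4.1 numerically: solve Π(w, x) = Π(w*, x*) for w, then recover h(x) = g(w*) + h(x*) − g(w).

```python
import numpy as np

J = 2  # number of non-outside alternatives

def h_true(x):
    # hidden index function; the econometrician does not know this
    return np.array([0.5 * x, -0.3 * x ** 2])

def Lambda(a):
    # hidden CCP map; logit case: softmax over {outside, 1, ..., J}
    e = np.exp(np.concatenate(([0.0], a)))
    return (e / e.sum())[1:]

def Pi(w, x):
    # what the econometrician observes, with the known choice g(w) = w
    return Lambda(w + h_true(x))

def match_w(x, w_star, x_star, iters=50, eps=1e-7):
    # matching step of Lemma 4.1: find w with Pi(w, x) = Pi(w*, x*),
    # here via a finite-difference Newton iteration on the CCP residual
    target = Pi(w_star, x_star)
    w = np.zeros(J)
    for _ in range(iters):
        base = Pi(w, x)
        jac = np.empty((J, J))
        for k in range(J):
            dw = np.zeros(J)
            dw[k] = eps
            jac[:, k] = (Pi(w + dw, x) - base) / eps
        w = w - np.linalg.solve(jac, base - target)
    return w

# normalize h(x_star) = 0 (Assumption 3.4 holds here since h_true(0) = 0)
x_star, w_star, x = 0.0, np.array([0.2, -0.1]), 1.3
w_hat = match_w(x, w_star, x_star)
h_hat = w_star - w_hat  # h(x) = g(w*) + h(x*) - g(w) with g(w) = w
```

Here h_hat agrees with the hidden h_true(1.3), even though neither Λ nor h was used directly in the recovery, only Π and g.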
Next, we discuss each of Assumptions 3.1-3.4 in the context of ARUM and how they compare with existing assumptions found in the literature on identification of ARUM.

First, Assumption 3.1, injectivity of Λ(·, z) for each z, is implied by Assumption 5.1(ii), cf. Hofbauer and Sandholm (2002, Thm 2.1). However, Assumption 5.1(ii) is not necessary for injectivity to hold. A simple example is the binomial model, where the probability for alternative 0 is the cumulative distribution of ε. If the distribution includes point masses, then ties can occur, but this does not destroy injectivity. This is true for any tie-breaking rule. More generally, if the subdifferential of the surplus function is strictly cyclically monotone (Rockafellar, 1970), which does not require the existence of a density, then the utility maximizing choice probabilities under any tie-breaking rule are injective (Sørensen and Fosgerau, 2020).

Assumptions 3.2-3.3 impose restrictions on the joint variation of (g(W), X). For Assumption 3.2 to hold, we need to identify J regressors, g(W), that exhibit enough joint continuous variation so that their joint support, conditional on (X, Z), has non-empty interior in R^J. One instance where this can be achieved is if we have observed alternative-specific characteristics. In the case of demand modelling, one such choice would be a (transformation) of the (relative) prices of the different alternatives, while X contains all remaining regressors, possibly including other alternative-specific covariates. In this case, to control for potential endogeneity of prices, we could then include cost shifters in Z. Prices tend to exhibit continuous variation, and Assumption 3.2 would then be likely to hold.
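Returning to the point-mass claim made above for Assumption 3.1, it can be checked directly in the binary case. The sketch below is our own illustration (the mixture weight and the normal continuous part are arbitrary choices): it builds a choice probability from an error distribution with an atom at zero and verifies on a grid that it remains strictly increasing, hence injective, despite the jump.

```python
import math

def F_mixed(t, p=0.3):
    # CDF of the error: weight 1 - p on a standard normal plus an atom
    # of mass p at zero, so ties occur with positive probability
    normal_cdf = 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    return (1.0 - p) * normal_cdf + (p if t >= 0 else 0.0)

def Lambda_binary(a):
    # probability of choosing alternative 1: P(a + eps > 0) = 1 - F(-a),
    # with ties here broken in favour of the outside option; the atom
    # produces a jump at a = 0 but no flat regions
    return 1.0 - F_mixed(-a)

grid = [-3.0 + 0.01 * k for k in range(601)]
vals = [Lambda_binary(a) for a in grid]
strictly_increasing = all(v1 < v2 for v1, v2 in zip(vals, vals[1:]))
```

Since the continuous normal part has full support, the mapping stays strictly monotone everywhere, which is the injectivity needed for Assumption 3.1.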
Assumption 3.3 requires other observed product characteristics and the agent's observed characteristics to exhibit sufficient variation conditional on the controls in Z so that these have overlapping support across different values of Z.

As already mentioned in the introduction, there are few fully nonparametric identification results for ARUM. To our knowledge, the only results comparable to ours are found in Matzkin (1993). Her results also require the presence of alternative-specific regressors but impose stronger conditions on these and other covariates. Moreover, her set-up does not include any control variables. On the other hand, she does not necessarily require that a(W, X) is additive, which we assume throughout. Theorem 1 of Matzkin (1993) does allow for dependence between (W, X) and ε, but in this case she requires the observed component of the utilities to be identical across alternatives and strictly increasing in one of the arguments. In our notation, this requires a_j(W, X), j = 1, ..., J, to all be identical. We do not impose any such constraints. Her Theorem 2 requires full independence between (W, X) and ε but, on the other hand, imposes fewer restrictions on a(W, X) compared to us. But in both cases, she identifies Λ by letting different components of W diverge to +∞, which is an example of the "thin set identification" discussed earlier.

We here demonstrate that the CCPs for the perturbed discrete choice model can again be expressed on the form (1.1)-(1.2) with Λ defined in (2.4) being injective. This is done under the following restrictions: First, in order to rule out zero demands, the norm of the gradient ∇_q Ω(q|z) has to approach infinity as q approaches the boundary of the unit simplex. Second, Ω(q|z) is differentiable. Third, we normalize the outside option so that g_0(w) = h_0(x) = 0. Under these three restrictions, for each value of the control z, the demand solves the first-order condition for an interior solution,
\[
a + \nabla_q \Omega(\Lambda(a, z) \mid z) = \lambda \iota,
\]
where λ is a scalar constant and ι ∈ R^J is a vector consisting of ones. To show that Λ is injective, consider this equation at a^0 and a^1 and assume that Λ(a^0, z) = Λ(a^1, z). Define a matrix M such that M x = x − x_0 ι for all x = (x_0, ..., x_J) ∈ R^{J+1}. Pre-multiplying this matrix onto the first-order condition yields
\[
a^0 + M \nabla_q \Omega(\Lambda(a^0, z) \mid z) = a^1 + M \nabla_q \Omega(\Lambda(a^1, z) \mid z),
\]
which implies that a^0 = a^1, as required. 
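To make the injectivity argument concrete, here is a numerical sketch under the added assumption of an entropic perturbation, Ω(q|z) = Σ_j q_j ln q_j (which the paper does not impose). The first-order condition then delivers logit demand, and a is recovered from q = Λ(a, z) by differencing log demands against the outside option — exactly the role played by the matrix M above:

```python
import numpy as np

def demand(a):
    # With Omega(q) = sum_j q_j * ln(q_j), the interior first-order condition
    # a + grad Omega(q) = lambda * iota solves to q_j proportional to exp(a_j):
    # logit demand over J inside alternatives plus the outside option (a_0 = 0).
    full = np.concatenate(([0.0], a))
    e = np.exp(full - full.max())
    return e / e.sum()

def invert(q):
    # Differencing against alternative 0 (the action of the matrix M)
    # recovers the index: a_j = ln q_j - ln q_0 for j = 1, ..., J.
    return np.log(q[1:]) - np.log(q[0])

a = np.array([0.7, -0.3, 1.5])
q = demand(a)

# Round trip: Lambda(., z) is injective, so a is pinned down by q.
assert np.allclose(invert(q), a)
```

Other strictly convex perturbation functions change the functional form of the demand but not the injectivity conclusion.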
Define
\[
\Lambda(a, z) := G(a, z) \cdot \frac{\partial G(a, z)}{\partial a}, \tag{5.11}
\]
where as before a = (a_1, ..., a_J), while G(a, z) is now defined as the expected log failure time,
\[
G(a, z) := E\big[\ln Y \mid a(W, X) = a,\ Z = z\big] = -E\Big[\max_{j=1,\ldots,J}\{-a_j + \varepsilon_j\}\,\Big|\, Z = z\Big],
\]
where the second equality uses eq. (2.5) and Assumption 5.1(i). The Williams-Daly-Zachary Theorem (McFadden, 1981) then implies that Π, now defined by (2.6), can be written on the form (1.1)-(1.2). Injectivity of Λ(a, z), as given in eq. (5.11), is obtained by recycling the arguments of the previous subsection, except that no normalization of one of the causes of failure is required since the level G(a, z) is included.

Corollary 5.2.
Any competing risk model on the form (2.5) that satisfies Assumptions 3.1-3.4 and 5.1(i) is identified.

Given that the competing risk model and the ARUM share a similar structure, the discussion of the remaining assumptions carries over to the current setting with obvious modifications.

Compared to existing results (Heckman and Honoré, 1989; Lee and Lewbel, 2013), we impose stronger conditions on the index a(W, X) since we require it to be additive and with g(W) known. On the other hand, Heckman and Honoré (1989) require a(W, X) to go to zero as W diverges, and so rely on a "thin set identification" argument, while Lee and Lewbel (2013) rely on a high-level functional rank condition. It is unclear which primitive conditions suffice for this rank condition to hold. Finally, Honoré and Lleras-Muney (2006) restrict themselves to the case of purely discrete regressors and are only able to derive bounds for objects of interest. We achieve point identification as long as there is some continuous variation in W, while X can be completely discrete.

6. CONCLUSION

We have established an identification result for a wide class of index models based on general topological arguments. Three key features of our argument are that smoothness of the model is not required; no large support condition is imposed on the regressors; and control variables may contribute to achieving identification. We leave the development of nonparametric estimators of the identified components for future research.

REFERENCES

Abbring, J. H. and G. J. van den Berg (2003). The identifiability of the mixed proportional hazards competing risks model. Journal of the Royal Statistical Society 65(3), 701–710.
Allen, R. and J. Rehbeck (2019). Identification With Additively Separable Heterogeneity.
Econometrica 87(3), 1021–1054.
Ben-Akiva, M. and S. R. Lerman (1985). Discrete Choice Analysis: Theory and Application to Travel Demand, Volume 6. Cambridge, MA: MIT Press.
Berry, S. T. and P. A. Haile (2018). Identification of Nonparametric Simultaneous Equations Models With a Residual Index Structure. Econometrica 86(1), 289–315.
Berry, S. T., J. Levinsohn, and A. Pakes (1995). Automobile Prices in Market Equilibrium. Econometrica 63(4), 841–890.
Bontemps, C. and C. Nauges (2016). The Impact of Perceptions in Averting-decision Models: An Application of the Special Regressor Method to Drinking Water Choices. American Journal of Agricultural Economics 98(1), 297–313.
Cantillo, V. and J. de Dios Ortúzar (2006). Implications of thresholds in discrete choice modelling. Transport Reviews 26(6), 667–691.
Chiappori, P.-A., I. Komunjer, and D. Kristensen (2018). Nonparametric identification and estimation of discrete choice models.
Chiou, Y. Y., M. Y. Chen, and J.-e. Chen (2018). Nonparametric regression with multiple thresholds: Estimation and inference. Journal of Econometrics 206(2), 472–514.
Evdokimov, K. (2010). Identification and Estimation of a Nonparametric Panel Data Model with Unobserved Heterogeneity.
Fosgerau, M. (2006). Investigating the distribution of the value of travel time savings. Transportation Research Part B: Methodological 40(8), 688–707.
Fosgerau, M., D. McFadden, and M. Bierlaire (2013). Choice probability generating functions. Journal of Choice Modelling 8.
Fosgerau, M. and D. L. McFadden (2012). A theory of the perturbed consumer with general budgets. NBER Working Paper, 1–27.
Fosgerau, M., E. Melo, A. de Palma, and M. Shum (2019). Discrete Choice and Rational Inattention: A General Equivalence Result. SSRN Electronic Journal.
Fudenberg, D., R. Iijima, and T. Strzalecki (2015). Stochastic Choice and Revealed Perturbed Utility. Econometrica 83(6), 2371–2409.
Heckman, J. J. and B. E. Honoré (1989). The identifiability of the competing risks model. Biometrika 76(2), 325–330.
Hofbauer, J. and W. H. Sandholm (2002). On the global convergence of stochastic fictitious play. Econometrica 70(6), 2265–2294.
Honoré, B. E. and A. Lleras-Muney (2006). Bounds in Competing Risks Models and the War on Cancer. Econometrica 74(6), 1675–1698.
Kalbfleisch, J. D. and R. L. Prentice (1980). The Statistical Analysis of Failure Time Data. Wiley Series in Probability and Statistics. Hoboken, New Jersey: Wiley.
Khan, S. and E. Tamer (2010). Irregular Identification, Support Conditions, and Inverse Weight Estimation. Econometrica 78(6), 2021–2042.
Lee, S. and A. Lewbel (2013). Nonparametric identification of accelerated failure time competing risks models. Econometric Theory 29(5), 905–919.
Lewbel, A., W. Shen, and H. M. Zhang (2000). Semiparametric qualitative response model estimation with unknown heteroscedasticity or instrumental variables. Journal of Econometrics 97, 145–177.
Maddala, G. S. (1986). Limited-Dependent and Qualitative Variables in Econometrics. Cambridge: Cambridge University Press.
Manski, C. F. (1975). Maximum score estimation of the stochastic utility model of choice. Journal of Econometrics 3(3), 205–228.
Matějka, F. and A. McKay (2015). Rational Inattention to Discrete Choices: A New Foundation for the Multinomial Logit Model. American Economic Review 105(1), 272–298.
Matzkin, R. L. (1993). Nonparametric identification and estimation of polychotomous choice models. Journal of Econometrics 58(1–2), 137–168.
McFadden, D. (1981). Econometric Models of Probabilistic Choice. In C. Manski and D. McFadden (Eds.), Structural Analysis of Discrete Data with Econometric Applications, pp. 198–272. Cambridge, MA: MIT Press.
McFadden, D. L. (1973). Conditional logit analysis of qualitative choice behavior. In Frontiers in Econometrics, pp. 105–142. New York: Academic Press.
Rockafellar, R. T. (1970). Convex Analysis. Princeton, NJ: Princeton University Press.