Duality in dynamic discrete-choice models
KHAI X. CHIONG§, ALFRED GALICHON†, AND MATT SHUM♣

Abstract.
Using results from convex analysis, we investigate a novel approach to identification and estimation of discrete choice models which we call the “Mass Transport Approach” (MTA). We show that the conditional choice probabilities and the choice-specific payoffs in these models are related in the sense of conjugate duality, and that the identification problem is a mass transport problem. Based on this, we propose a new two-step estimator for these models; interestingly, the first step of our estimator involves solving a linear program which is identical to the classic assignment (two-sided matching) game of Shapley and Shubik (1971). The application of convex-analytic tools to dynamic discrete choice models, and the connection with two-sided matching models, is new in the literature.
Date: First draft: April 2013. This version: May 2015.

The authors thank the Editor, three anonymous referees, as well as Benjamin Connault, Thierry Magnac, Emerson Melo, Bob Miller, Sergio Montero, John Rust, Sorawoot (Tang) Srisuma, and Haiqing Xu for useful comments. We are especially grateful to Guillaume Carlier for providing decisive help with the proof of Theorem 5. We also thank audiences at Michigan, Northwestern, NYU, Pittsburgh, UCSD, the CEMMAP conference on inference in game-theoretic models (June 2013), UCLA econometrics mini-conference (June 2013), the Boston College Econometrics of Demand Conference (December 2013) and the Toulouse conference on “Recent Advances in Set Identification” (December 2013) for helpful comments. Galichon’s research has received funding from the European Research Council under the European Union’s Seventh Framework Programme (FP7/2007-2013) / ERC grant agreement n°

§ Division of the Humanities and Social Sciences, California Institute of Technology; [email protected]
† Department of Economics, Sciences Po; [email protected]
♣ Division of the Humanities and Social Sciences, California Institute of Technology; [email protected].

arXiv [econ.EM], Feb

1. Introduction
Empirical research utilizing dynamic discrete choice models of economic decision-making has flourished in recent decades, with applications in all areas of applied microeconomics including labor economics, industrial organization, public finance, and health economics. The existing literature on the identification and estimation of these models has recognized a close link between the conditional choice probabilities (hereafter, CCP, which can be observed and estimated from the data) and the payoffs (or choice-specific value functions, which are unobservable to the researcher); indeed, most estimation procedures contain an “inversion” step in which the choice-specific value functions are recovered given the estimated choice probabilities.

This paper makes two contributions. First, we explicitly characterize this duality relationship between the choice probabilities and choice-specific payoffs. Specifically, in discrete choice models, the social surplus function (McFadden (1978)) provides us with the mapping from payoffs to the probabilities with which each alternative is chosen at each state (conditional choice probabilities). Recognizing that the social surplus function is convex, we develop the idea that the convex conjugate of the social surplus function gives us the inverse mapping, from choice probabilities to utility indices. More precisely, the subdifferential of the convex conjugate is a correspondence that maps from the observed choice probabilities to an identified set of payoffs. In short, the choice probabilities and utility indices are related in the sense of conjugate duality.
The discovery of this relationship allows us to succinctly characterize the empirical content of discrete choice models, both static and dynamic.

Not only is the convex conjugate of the social surplus function a useful theoretical object; it also provides a new and practical way to “invert” from a given vector of choice probabilities back to the underlying utility indices which generated these probabilities. This is the second contribution of this paper. We show how the conjugate along with its set of subgradients can be efficiently computed by means of linear programming. This linear programming formulation has the structure of an optimal assignment problem (as in Shapley-Shubik’s (1971) classic work). This surprising connection enables us to apply insights developed in the optimal transport literature, e.g. Villani (2003, 2009), to discrete choice models. We call this new methodology the “Mass Transport Approach” to CCP inversion.
This paper focuses on the estimation of dynamic discrete-choice models via two-step estimation procedures in which conditional choice probabilities are estimated in the initial stage; this estimation approach was pioneered in Hotz and Miller (HM, 1993) and Hotz, Miller, Sanders, Smith (1994). Our use of tools and concepts from convex analysis to study identification and estimation in this dynamic discrete choice setting is novel in the literature. Based on our findings, we propose a new two-step estimator for DDC models. A nice feature of our estimator is that it works for practically any assumed distribution of the utility shocks. Thus, our estimator makes it possible to evaluate the robustness of estimation to different distributional assumptions.

Section 2 contains our main results regarding duality between choice probabilities and payoffs in discrete choice models. Based on these results, we propose, in Section 3, a two-step estimation approach for these models. We also emphasize here the surprising connection between dynamic discrete-choice and optimal matching models. In Section 4 we discuss computational details for our estimator, focusing on the use of linear programming to compute (approximately) the convex conjugate function from the dynamic discrete-choice model. Monte Carlo experiments (in Section 5) show that our estimator performs well in practice, and we apply the estimator to Rust’s (1987) bus engine replacement data (Section 6). Section 7 concludes. The Appendix contains proofs and also a brief primer on relevant results from convex analysis. Sections 2.2 and 2.3, as well as Section 4, are not specific to dynamic discrete choice problems; they apply equally to any (static) discrete choice model.

Footnote: Subsequent contributions include Aguirregabiria and Mira (2002, 2007), Magnac and Thesmar (2002), Pesendorfer and Schmidt-Dengler (2008), Bajari et al. (2009), Arcidiacono and Miller (2011), and Norets and Tang (2013).
Footnote: While existing identification results for dynamic discrete choice models allow for quite general specifications of the additive choice-specific utility shocks, many applications of these two-step estimators maintain the restrictive assumption that the utility shocks are distributed i.i.d. type I extreme value, independently of the state variables, leading to choice probabilities which take the multinomial logit form.

Footnote: While they are not the focus of this paper, many applications of dynamic choice models do not utilize HM-type two-step estimation procedures, and they allow for quite flexible distributions of the utility shocks, and also for serial correlation in these shocks (examples include Pakes (1986) and Keane and Wolpin (1997)). This literature typically employs simulated method of moments, or simulated maximum likelihood, for estimation (see Rust (1994, section 3.3)).
2. Basic Model
2.1. The framework.
In this section we review the basic dynamic discrete-choice setup, as encapsulated in Rust’s (1987) seminal paper. The state variable is $x \in \mathcal{X}$, which we assume to take only a finite number of values. Agents choose actions $y \in \mathcal{Y}$ from a finite space $\mathcal{Y} = \{0, 1, \ldots, D\}$. The single-period utility flow which an agent derives from choosing $y$ in a given period is
$$\bar{u}_y(x) + \varepsilon_y,$$
where $\varepsilon_y$ denotes the utility shock pertaining to action $y$, which differs across agents. Across agents and time periods, the vector of utility shocks $\varepsilon \equiv (\varepsilon_y)_{y \in \mathcal{Y}}$ is distributed according to a joint distribution function $Q(\cdot\,; x)$ which can depend on the current value of the state variable $x$. We assume that this distribution $Q$ is known to the researcher.

Throughout, we consider a stationary setting in which the agent’s decision environment remains unchanged across time periods; thus, for any given period, we use primes ($'$) to denote next-period values. Following Rust (1987), and most of the subsequent papers in this literature, we maintain the following conditional independence assumption (which rules out serially persistent forms of unobserved heterogeneity):

Assumption 1 (Conditional Independence). $(x, \varepsilon)$ evolves across time periods as a controlled first-order Markov process, with transition
$$\Pr(x', \varepsilon' \mid y, x, \varepsilon) = \Pr(\varepsilon' \mid x', y, x, \varepsilon) \cdot \Pr(x' \mid y, x, \varepsilon) = \Pr(\varepsilon' \mid x') \cdot \Pr(x' \mid y, x).$$

The discount factor is $\beta$. Agents are dynamic optimizers whose choices each period satisfy
$$y \in \arg\max_{\tilde{y} \in \mathcal{Y}} \left\{ \bar{u}_{\tilde{y}}(x) + \varepsilon_{\tilde{y}} + \beta E\left[ \bar{V}(x', \varepsilon') \mid x, \tilde{y} \right] \right\}, \tag{1}$$
where the value function $\bar{V}$ is recursively defined via Bellman’s equation as
$$\bar{V}(x, \varepsilon) = \max_{\tilde{y} \in \mathcal{Y}} \left\{ \bar{u}_{\tilde{y}}(x) + \varepsilon_{\tilde{y}} + \beta E\left[ \bar{V}(x', \varepsilon') \mid x, \tilde{y} \right] \right\}.$$
Footnote: On serially persistent unobserved heterogeneity, see Norets (2009), Kasahara and Shimotsu (2009), Arcidiacono and Miller (2011), and Hu and Shum (2012).

Footnote: We have used Assumption 1 to eliminate $\varepsilon$ as a conditioning variable in the expectation in Eq. (1).

Footnote: See, e.g., Bertsekas (1987, chap. 5) for an introduction and derivation of this equation.
$V(x)$, the ex-ante value function, is defined as
$$V(x) = E\left[ \bar{V}(x, \varepsilon) \mid x \right].$$
The expectation above is conditional on the current state $x$. In the literature, $V(x)$ is called the ex-ante (or integrated) value function, because it measures the continuation value of the dynamic optimization problem before the agent observes his shocks $\varepsilon$, so that the optimal action is still stochastic from the agent’s point of view.

Next we define the choice-specific value functions as consisting of two terms, the per-period utility flow and the discounted continuation payoff:
$$w_y(x) \equiv \bar{u}_y(x) + \beta E\left[ V(x') \mid x, y \right].$$
In this paper, the utility flows $\{\bar{u}_y(x);\ \forall y \in \mathcal{Y}, \forall x \in \mathcal{X}\}$, and subsequently also the choice-specific value functions $\{w_y(x),\ \forall y, x\}$, will be treated as unknown parameters, and we will study the identification and estimation of these parameters. For this reason, in the initial part of the paper, we will suppress the explicit dependence of $w_y$ on $x$ for convenience.

Given these preliminaries, we derive the duality which is central to this paper.

2.2. The social surplus function and its convex conjugate.
We start by introducing the expected indirect utility of a decision maker facing the $|\mathcal{Y}|$-dimensional vector of choice-specific values $w \equiv \{w_y, y \in \mathcal{Y}\}'$:
$$\mathcal{G}(w; x) = E\left[ \max_{y \in \mathcal{Y}} (w_y + \varepsilon_y) \mid x \right], \tag{2}$$
where the expectation is assumed to be finite and is taken over the distribution of the utility shocks, $Q(\cdot\,; x)$. This function $\mathcal{G}(\cdot\,; x): \mathbb{R}^{|\mathcal{Y}|} \to \mathbb{R}$ is called the “social surplus function” in McFadden’s (1978) random utility framework, and can be interpreted as the expected welfare of a representative agent in the dynamic discrete-choice problem.

Footnote: There is a difference between the definition of $V(x)$ and the last term in Equation (1) above. Here, we are considering the expectation of the value function $\bar{V}(x, \varepsilon)$ taken over the distribution of $\varepsilon \mid x$ (i.e. holding the first argument fixed). In the last term of Eq. (1), however, we are considering the expectation over the joint distribution of $(x', \varepsilon') \mid x$ (i.e. holding neither argument fixed).
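As a rough numerical illustration of Eq. (2), the sketch below estimates the social surplus function by Monte Carlo from a matrix of simulated shocks. The function name `simulate_social_surplus`, the example vector `w`, the sample size, and the two shock distributions are illustrative assumptions and do not come from the paper; the closed-form comparison uses the standard log-sum-exp expression that holds for i.i.d. type I extreme value shocks.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329  # mean of a standard Gumbel draw


def simulate_social_surplus(w, shocks):
    """Monte Carlo estimate of G(w) = E[max_y (w_y + eps_y)].

    w      : array of shape (Y,), choice-specific values
    shocks : array of shape (S, Y), S simulated draws of eps from Q
    """
    w = np.asarray(w, dtype=float)
    return float(np.mean(np.max(w + shocks, axis=1)))


rng = np.random.default_rng(0)
w = np.array([0.5, 0.0, -0.3])          # hypothetical payoffs
n_draws = 200_000

gumbel_shocks = rng.gumbel(size=(n_draws, w.size))        # logit case
normal_shocks = rng.standard_normal((n_draws, w.size))    # a probit-type case

g_gumbel = simulate_social_surplus(w, gumbel_shocks)
g_normal = simulate_social_surplus(w, normal_shocks)

# Known closed form for i.i.d. type I extreme value shocks.
g_closed = float(np.log(np.exp(w).sum()) + EULER_GAMMA)

print(g_gumbel, g_closed, g_normal)
```

With Gumbel draws the simulated value should lie close to the closed form, up to simulation error. The normal draws are included only to show that the same computation goes through for a distribution without a closed-form social surplus function.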
For convenience in what follows, we introduce the notation $Y(w, \varepsilon)$ to denote an agent’s optimal choice given the vector of choice-specific value functions $w$ and the vector of utility shocks $\varepsilon$; that is, $Y(w, \varepsilon) = \arg\max_{y \in \mathcal{Y}} (w_y + \varepsilon_y)$. This notation makes explicit the randomness in the optimal alternative (arising from the utility shocks $\varepsilon$). We get
$$\mathcal{G}(w; x) = E\left[ w_{Y(w,\varepsilon)} + \varepsilon_{Y(w,\varepsilon)} \mid x \right] = \sum_{y \in \mathcal{Y}} \underbrace{\Pr(Y(w, \varepsilon) = y \mid x)}_{\equiv\, p_y(x)} \left( w_y + E[\varepsilon_y \mid Y(w, \varepsilon) = y, x] \right), \tag{3}$$
which gives an alternative expression for the social surplus function as a weighted average, where the weights are the components of the vector of conditional choice probabilities $p(x)$.

Footnote: We use $w$ and $\varepsilon$ (and also $p$ below) to denote vectors, while $w_y$ and $\varepsilon_y$ (and $p_y$) denote the $y$-th component of these vectors.

For the remainder of this section, we suppress the dependence of all quantities on $x$ for convenience. In later sections, we will reintroduce this dependence when it is necessary.

In the case when the social surplus function $\mathcal{G}(w)$ is differentiable (which holds for most discrete-choice model specifications considered in the literature), we obtain the well-known fact that the vector of choice probabilities $p$ compatible with rational choice coincides with the gradient of $\mathcal{G}$ at $w$:

Proposition 1 (The Williams-Daly-Zachary (WDZ) Theorem). $p = \nabla \mathcal{G}(w)$.

This result, which is analogous to Roy’s Identity in discrete choice models, is expounded in McFadden (1978) and Rust (1994, Thm. 3.1). It characterizes the vector of choice probabilities corresponding to optimal behavior in a discrete choice model as the gradient of the social surplus function. For completeness, we include a proof in the Appendix.

Footnote: This includes logit, nested logit, multinomial probit, etc., in which the distribution of the utility shocks is absolutely continuous and $w$ is bounded; cf. Lemma 1 in Shi, Shum and Wong (2014).

The WDZ theorem provides a mapping from the choice-specific value functions (which are unobserved by researchers) to the observed choice probabilities $p$. However, the identification problem is the reverse problem, namely to determine the set of $w$ which would lead to a given vector of choice probabilities. This problem is exactly solved by convex duality and the introduction of the convex conjugate of $\mathcal{G}$, which we denote as $\mathcal{G}^*$:

Definition 1 (Convex Conjugate). We define $\mathcal{G}^*$, the Legendre-Fenchel conjugate function of $\mathcal{G}$ (a convex function), by
$$\mathcal{G}^*(p) = \sup_{w \in \mathbb{R}^{\mathcal{Y}}} \left\{ \sum_{y \in \mathcal{Y}} p_y w_y - \mathcal{G}(w) \right\}. \tag{4}$$

Footnote: Details of convex conjugates are expounded in the Appendix. Convex conjugates are also encountered in classic producer and consumer theory. For instance, when $f$ is the convex cost function of the firm (decreasing returns to scale in production), then the convex conjugate of the cost function, $f^*$, is in fact the firm’s optimal profit function.

Equation (4) above has the property that if $p$ is not a probability vector, that is, if either of the conditions $p_y \geq 0$ or $\sum_{y \in \mathcal{Y}} p_y = 1$ fails, then $\mathcal{G}^*(p) = +\infty$. Because the choice-specific value functions $w$ and the choice probabilities $p$ are, respectively, the arguments of the function $\mathcal{G}$ and its convex conjugate function $\mathcal{G}^*$, we say that $w$ and $p$ are related in the sense of conjugate duality. The theorem below states an implication of this duality, and provides an “inverse” correspondence from the observed choice probabilities back to the unobserved $w$, which is a necessary step for identification and estimation.

Theorem 1. The following pair of equivalent statements captures the empirical content of the DDC model:
(i) $p$ is in the subdifferential of $\mathcal{G}$ at $w$:
$$p \in \partial \mathcal{G}(w), \tag{5}$$
(ii) $w$ is in the subdifferential of $\mathcal{G}^*$ at $p$:
$$w \in \partial \mathcal{G}^*(p). \tag{6}$$

The definition and properties of the subdifferential of a convex function are provided in Appendix A. Part (i) is, of course, connected to the WDZ theorem above; indeed, it is the WDZ theorem when $\mathcal{G}(w)$ is differentiable at $w$. Hence, it encapsulates an optimality requirement that the vector of observed choice probabilities $p$ be derived from optimal discrete-choice decision making for some unknown vector $w$ of choice-specific value functions.

Footnote: $\mathcal{G}$ is differentiable at $w$ if and only if $\partial \mathcal{G}(w)$ is single-valued. In that case, part (i) of Th. 1 reduces to $p = \nabla \mathcal{G}(w)$, which is the WDZ theorem. If, in addition, $\nabla \mathcal{G}$ is one-to-one, then we immediately get $w = (\nabla \mathcal{G})^{-1}(p)$, or $\nabla \mathcal{G}^*(p) = (\nabla \mathcal{G})^{-1}(p)$, which is the case of the classical Legendre transform. However, as we show below, $\nabla \mathcal{G}(w)$ is not typically one-to-one in discrete choice models, so that the statement in part (ii) of Th. 1 is more suitable.

Part (ii) of this theorem, which describes the “inverse” mapping from conditional choice probabilities to choice-specific value functions, does not appear to have been exploited in the literature on dynamic discrete choice. It relates to Galichon and Salanié (2012), who use convex analysis to estimate matching games with transferable utilities. It specifically states that the vector of choice-specific value functions can be identified from the corresponding vector of observed choice probabilities $p$ as the subgradient of the convex conjugate function $\mathcal{G}^*(p)$. Eq. (6) is also constructive, and suggests a procedure for computing the choice-specific value functions corresponding to observed choice probabilities. We will fully elaborate this procedure in subsequent sections.

Footnote: Clearly, Theorem 1 also applies to static random utility discrete-choice models, with the $w(x)$ being interpreted as the utility indices for each of the choices. As such, Eq. (6) relates to results regarding the invertibility of the mapping from utilities to choice probabilities in static discrete choice models (e.g. Berry (1994); Haile, Hortacsu, and Kosenok (2008); Berry, Gandhi, and Haile (2013)). Similar results have also arisen in the literature on stochastic learning in games (Hofbauer and Sandholm (2002); Cominetti, Melo and Sorin (2010)).

Appendix A contains additional derivations related to the subgradient of a convex function. Specifically, it is known (Eq. (25)) that $\mathcal{G}(w) + \mathcal{G}^*(p) = \sum_{y \in \mathcal{Y}} p_y w_y$ if and only if $p \in \partial \mathcal{G}(w)$. Combining this with Eq. (3), we obtain an alternative expression for the convex conjugate function $\mathcal{G}^*$:
$$\mathcal{G}^*(p) = -\sum_{y} p_y E[\varepsilon_y \mid Y(w, \varepsilon) = y], \tag{7}$$
corresponding to the weighted expectations of the utility shocks $\varepsilon_y$ conditional on choosing the option $y$. It is also known that the subdifferential $\partial \mathcal{G}^*(p)$ corresponds to the set of maximizers in the program (4) which defines the conjugate function $\mathcal{G}^*(p)$; that is,
$$w \in \partial \mathcal{G}^*(p) \iff w \in \arg\max_{\tilde{w} \in \mathbb{R}^{\mathcal{Y}}} \left\{ \sum_{y \in \mathcal{Y}} p_y \tilde{w}_y - \mathcal{G}(\tilde{w}) \right\}. \tag{8}$$
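The relations in Proposition 1 and Theorem 1, together with the equality $\mathcal{G}(w) + \mathcal{G}^*(p) = \sum_y p_y w_y$, can be checked numerically in the logit case, where both functions have standard closed forms (those stated in Example 1). The following is a minimal sketch under that distributional assumption; the helper names, the payoff vector `w`, and the random comparison points are hypothetical.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329


def surplus_logit(w):
    """G(w) = log(sum_y exp(w_y)) + gamma (type I extreme value shocks)."""
    w = np.asarray(w, dtype=float)
    m = w.max()
    return float(m + np.log(np.exp(w - m).sum()) + EULER_GAMMA)


def choice_probs_logit(w):
    """Multinomial logit choice probabilities implied by w."""
    w = np.asarray(w, dtype=float)
    e = np.exp(w - w.max())
    return e / e.sum()


def conjugate_logit(p):
    """G*(p) = sum_y p_y log p_y - gamma for p in the interior of the simplex."""
    p = np.asarray(p, dtype=float)
    return float(np.sum(p * np.log(p)) - EULER_GAMMA)


w = np.array([0.3, -1.0, 0.8])           # hypothetical payoffs
p = choice_probs_logit(w)

# WDZ theorem: a central finite-difference gradient of G should match p.
h = 1e-6
grad = np.array([
    (surplus_logit(w + h * e) - surplus_logit(w - h * e)) / (2 * h)
    for e in np.eye(w.size)
])

# Fenchel equality at the pair (w, p): G(w) + G*(p) - p.w = 0.
gap = surplus_logit(w) + conjugate_logit(p) - float(p @ w)

# Fenchel inequality at arbitrary other points: G(v) + G*(p) - p.v >= 0,
# so w attains the maximum in the program defining G*(p).
rng = np.random.default_rng(0)
slacks = np.array([
    surplus_logit(v) + conjugate_logit(p) - float(p @ v)
    for v in rng.normal(size=(100, w.size))
])

print(grad, p, gap, slacks.min())
```

The slack is zero only at vectors that differ from `w` by a common constant, which is the indeterminacy taken up in Section 2.3.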
Later, we will exploit the variational representation (8) of the subdifferential $\partial \mathcal{G}^*(p)$ for computational purposes; cf. Section 4 below.

Example 1 (Logit). Before proceeding, we discuss the logit model, for which the functions and relations above reduce to familiar expressions. When the distribution $Q$ of $\varepsilon$ obeys an extreme value type I distribution, it follows from extreme value theory that $\mathcal{G}$ and $\mathcal{G}^*$ can be obtained in closed form: $\mathcal{G}(w) = \log\left( \sum_{y \in \mathcal{Y}} \exp(w_y) \right) + \gamma$, while $\mathcal{G}^*(p) = \sum_{y \in \mathcal{Y}} p_y \log p_y - \gamma$ if $p$ belongs to the interior of the simplex, and $\mathcal{G}^*(p) = +\infty$ otherwise ($\gamma \approx 0.5772$ is Euler’s constant). Hence in this case, $\mathcal{G}^*$ is, up to sign and an additive constant, the entropy of the distribution $p$ (see Anderson, de Palma, Thisse (1988) and references therein).

The subdifferential of $\mathcal{G}^*$ is characterized as follows: $w \in \partial \mathcal{G}^*(p)$ if and only if $w_y = \log p_y - K$, for some $K \in \mathbb{R}$. Because $\mathcal{G}^*$ reduces to an entropy in the logit case, it can be called a generalized entropy function even in non-logit contexts. ∎

2.3. Identification.
It follows from Theorem 1 that the identification of systematic utilities boils down to the problem of computing the subgradient of a generalized entropy function. However, from examining the social surplus function $\mathcal{G}$, we see that if $w \in \partial \mathcal{G}^*(p)$, then it is also true that $w - K \in \partial \mathcal{G}^*(p)$, where $K \in \mathbb{R}^{|\mathcal{Y}|}$ is a vector taking the value $K$ in all $|\mathcal{Y}|$ components. Indeed, the choice probabilities are only affected by the differences in the levels offered by the various alternatives. In what follows, we shall tackle this indeterminacy problem by isolating a particular $w^0$ among those satisfying $w \in \partial \mathcal{G}^*(p)$, where we choose
$$\mathcal{G}\left( w^0 \right) = 0. \tag{9}$$
We will impose the following assumption on the heterogeneity.

Assumption 2 (Full Support). Assume the distribution $Q$ of the vector of utility shocks $\varepsilon$ is such that the distribution of the vector $(\varepsilon_y - \varepsilon_1)_{y \neq 1}$ has full support.

Footnote: Relatedly, Arcidiacono and Miller (2011, pp. 1839-1841) discuss computational and analytical solutions for the $\mathcal{G}^*$ function in the generalized extreme value setting.
Under this assumption, Theorem 2 below shows that Eq. (9) defines $w^0$ uniquely. Theorem 3 will then show that the knowledge of $w^0$ allows for easy recovery of all vectors $w$ satisfying $p \in \partial \mathcal{G}(w)$.

Theorem 2.
Under Assumption 2, let $p$ be in the interior of the simplex $\Delta^{|\mathcal{Y}|}$ (i.e. $p_y > 0$ for each $y$ and $\sum_y p_y = 1$). Then there exists a unique $w^0 \in \partial \mathcal{G}^*(p)$ such that $\mathcal{G}\left( w^0 \right) = 0$.

The proof of this theorem is in the Appendix. Moreover, even when Assumption 2 is not satisfied, $w^0$ will still be set-identified; Theorem 4 below describes the identified set of $w^0$ corresponding to a given vector of choice probabilities $p$.

Our next result is our main tool for identification; it shows that our choice of $w^0(x)$, as defined in Eq. (9), is without loss of generality; it is not an additional model restriction, but merely a convenient way of representing all $w(x)$ in $\partial \mathcal{G}^*(p(x))$ with respect to a natural and convenient reference point.

Theorem 3.
Maintain Assumption 2, and let $K$ denote any scalar $K \in \mathbb{R}$. The set of conditions
$$w \in \partial \mathcal{G}^*(p) \quad \text{and} \quad \mathcal{G}(w) = K$$
is equivalent to
$$w_y = w^0_y + K, \quad \forall y \in \mathcal{Y}.$$

This theorem shows that any vector within the set $\partial \mathcal{G}^*(p)$ can be characterized as the sum of the vector $w^0$ (uniquely determined, by Theorem 2) and a constant $K \in \mathbb{R}$. As we will see below, this is our invertibility result for dynamic discrete choice problems, as it will imply unique identification of the vector of choice-specific value functions corresponding to any observed vector of conditional choice probabilities.

Footnote: This indeterminacy issue has been resolved in the existing literature on dynamic discrete choice models (e.g. Hotz and Miller (1993), Rust (1994), Magnac and Thesmar (2002)) by focusing on the differences between choice-specific value functions, which is equivalent to setting $w_{y_0}(x)$, the choice-specific value function for a benchmark choice $y_0$, equal to zero. Compared to this, our choice of $w^0(x)$ satisfying $\mathcal{G}(w^0(x)) = 0$ is more convenient in our context, as it leads to a simple expression for the constant $K$ (see Section 2.4).

Footnote: See Berry (1994), Chiappori and Komunjer (2010), Berry, Gandhi, and Haile (2012), among others, for conditions ensuring the invertibility or “univalence” of demand systems stemming from multinomial choice models, under settings more general than the random utility framework considered here.
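For intuition, the normalization in Eq. (9) and the decomposition in Theorem 3 can be written out explicitly in the logit case, where the inversion is available in closed form: $w^0_y = \log p_y - \gamma$. The sketch below is only an illustration under that assumption; the function names, the probability vector, and the shift `K` are hypothetical.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329


def surplus_logit(w):
    """G(w) = log(sum_y exp(w_y)) + gamma for type I extreme value shocks."""
    w = np.asarray(w, dtype=float)
    m = w.max()
    return float(m + np.log(np.exp(w - m).sum()) + EULER_GAMMA)


def softmax(w):
    """Logit choice probabilities implied by w."""
    e = np.exp(np.asarray(w, dtype=float) - np.max(w))
    return e / e.sum()


def invert_ccp_logit(p):
    """Normalized inverse w0 with softmax(w0) = p and G(w0) = 0 (logit only)."""
    return np.log(np.asarray(p, dtype=float)) - EULER_GAMMA


def decompose(w):
    """Split w into (w0, K) with w = w0 + K and G(w0) = 0, as in Theorem 3."""
    K = surplus_logit(w)
    return np.asarray(w, dtype=float) - K, K


p = np.array([0.2, 0.5, 0.3])            # hypothetical choice probabilities
w0 = invert_ccp_logit(p)

# Any other vector rationalizing p is a constant shift of w0.
K = 1.7
w = w0 + K
w0_back, K_back = decompose(w)

print(surplus_logit(w0), softmax(w), K_back)
```

For other shock distributions no closed form of this kind is available in general, which is where the computational approach of Section 4 comes in.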
2.4. Empirical Content of Dynamic Discrete Choice Model.
To summarize the empirical content of the model, we recall the fact that the ex-ante value function $V$ solves the following equation
$$V(x) = \sum_{y \in \mathcal{Y}} p_y(x) \left( \bar{u}_y(x) + E[\varepsilon_y \mid Y(w, \varepsilon) = y, x] + \beta \sum_{x'} p(x' \mid x, y)\, V(x') \right)$$
(derived in Pesendorfer and Schmidt-Dengler (2008), among others), where we write $p(x' \mid x, y) = \Pr(x_{t+1} = x' \mid x_t = x, y_t = y)$. Noting that the choice-specific value function is just
$$w_y(x) = \bar{u}_y(x) + \beta \sum_{x'} p(x' \mid x, y)\, V(x'), \tag{10}$$
and comparing with Eq. (3), we obtain
$$V(x) = \mathcal{G}(w(x); x) \quad \text{and} \quad p(x) \in \partial \mathcal{G}(w(x); x).$$
Hence, by Theorem 3, the true $w(x)$ will differ from $w^0(x)$ by a constant term $V(x)$:
$$w(x) = w^0(x) + V(x),$$
where $w^0(x)$ is defined in Theorem 2. This result is also convenient for identification purposes, as it separates identification of $w$ into two subproblems, the determination of $w^0$ and the determination of $V$. Once $w^0$ and $V$ are known, the utility flows are determined from Eq. (10). This motivates our two-step estimation procedure, which we describe next.

3. Estimation using the Mass Transport Approach (MTA)
Based upon the derivations in the previous section, we present a two-step estimation procedure. In the first step, we use the results from Theorem 3 to recover the vector of choice-specific value functions $w^0(x)$ corresponding to each observed vector of choice probabilities $p(x)$. In the second step, we recover the utility flow functions $\bar{u}_y(x)$ given the $w^0(x)$ obtained from the first step.

3.1. First step.
In the first step, the goal is to recover the vector of choice-specific value functions $w^0(x) \in \partial \mathcal{G}^*(p(x))$ corresponding to the vector of observed choice probabilities $p(x)$ for each value of $x$. In doing this, we use Theorem 1 above and Proposition 2 below, which show how $w^0(x)$ belongs to the subdifferential of the conjugate function $\mathcal{G}^*(p(x))$.
We delay discussing these details until Section 4. There, we will show how this problem of obtaining $w^0(x)$ can be reformulated in terms of a class of mathematical programming problems, the Monge-Kantorovich mass transport problems, which leads to convenient computational procedures. Since this is the central component of our estimation procedure, we have named it the mass transport approach (MTA).

3.2. Second step.
From the first step, we obtained $w^0(x)$ such that $w(x) = w^0(x) + V(x)$. Now in the second step, we use the recursive structure of the dynamic model, along with fixing one of the utility flows, to jointly pin down the values of $w(x)$ and $V(x)$. Finally, once $w(x)$ and $V(x)$ are known, the utility flows can be obtained from $\bar{u}_y(x) = w_y(x) - \beta E[V(x') \mid x, y]$.

In order to nonparametrically identify $\bar{u}_y(x)$, we need to fix some values of the utility flows. Following Bajari, Chernozhukov, Hong, and Nekipelov (2009), we fix the utility flow corresponding to a benchmark choice $y_0$ to be constant at zero:

Assumption 3 (Fix utility flow for benchmark choice). $\forall x$, $\bar{u}_{y_0}(x) = 0$.

With this assumption, we get
$$0 = w^0_{y_0}(x) + V(x) - \beta E\left[ V(x') \mid x, y = y_0 \right]. \tag{11}$$
Let $W$ be the column vector whose general term is $\left( w^0_{y_0}(x) \right)_{x \in \mathcal{X}}$, let $V$ be the column vector whose general term is $(V(x))_{x \in \mathcal{X}}$, and let $\Pi$ be the $|\mathcal{X}| \times |\mathcal{X}|$ matrix whose general term $\Pi_{ij}$ is $\Pr(x_{t+1} = j \mid x_t = i, y = y_0)$. Equation (11), rewritten in matrix notation, is
$$W = \beta \Pi V - V,$$
and for $\beta < 1$, the matrix $I - \beta \Pi$ is strictly diagonally dominant, because the rows of $\Pi$ sum to one. Hence, it is invertible and Equation (11) becomes
$$V = (\beta \Pi - I)^{-1} W. \tag{12}$$

Footnote: In a static discrete-choice setting (i.e. $\beta = 0$), this assumption would be a normalization, and without loss of generality. In a dynamic discrete-choice setting, however, this entails some loss of generality because different values for the utility flows imply different values for the choice-specific value functions, which leads to differences in the optimal choice behavior. Norets and Tang (2013) discuss this issue in greater detail.
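The linear algebra of this second step can be sketched in a few lines. The example below is a simulation-only illustration under assumptions that are not in the text: logit shocks (so that the first-step inversion has the closed form $w^0 = \log p - \gamma$ and stands in for the MTA first step), a small randomly generated transition array `P[y, x, x']`, and benchmark choice `y0 = 0`. It solves a model forward, then applies Eq. (12) and the relation $\bar{u}_y(x) = w^0_y(x) + V(x) - \beta E[V(x') \mid x, y]$ to recover the flows. All function names are hypothetical.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329


def surplus_logit(w):
    """Row-wise G(w) = log(sum_y exp(w_y)) + gamma; w has shape (X, Y)."""
    m = w.max(axis=1)
    return m + np.log(np.exp(w - m[:, None]).sum(axis=1)) + EULER_GAMMA


def continuation(P, V):
    """E[V(x') | x, y] as an (X, Y) array, where P[y, x, x'] = Pr(x' | x, y)."""
    return np.einsum("yij,j->iy", P, V)


def solve_forward(u, P, beta, tol=1e-12, max_iter=10_000):
    """Iterate V = G(u + beta * E[V]) to a fixed point; return (V, w)."""
    V = np.zeros(u.shape[0])
    for _ in range(max_iter):
        V_new = surplus_logit(u + beta * continuation(P, V))
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    return V, u + beta * continuation(P, V)


def second_step(w0, P, beta, y0=0):
    """Recover V from Eq. (12), then the utility flows for every (x, y)."""
    n_states = w0.shape[0]
    W = w0[:, y0]
    V = np.linalg.solve(beta * P[y0] - np.eye(n_states), W)
    u = w0 + V[:, None] - beta * continuation(P, V)
    return u, V


rng = np.random.default_rng(0)
n_states, n_actions, beta = 4, 3, 0.9
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)            # rows are transition probabilities
u_true = rng.normal(size=(n_states, n_actions))
u_true[:, 0] = 0.0                            # Assumption 3 with y0 = 0

V_true, w_true = solve_forward(u_true, P, beta)
p = np.exp(w_true - surplus_logit(w_true)[:, None] + EULER_GAMMA)  # CCPs
w0 = np.log(p) - EULER_GAMMA                  # logit stand-in for the first step
u_hat, V_hat = second_step(w0, P, beta)

print(np.max(np.abs(u_hat - u_true)), np.max(np.abs(V_hat - V_true)))
```

In this noise-free setting the recovered flows should match the ones used to generate the choice probabilities up to numerical error; with estimated choice probabilities and transitions the same two lines of linear algebra apply to the estimates.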
The right-hand side of Eq. (12) is uniquely estimated from the data. After obtaining $V(x)$, $\bar{u}_y(x)$ can be nonparametrically identified by
$$\bar{u}_y(x) = w^0_y(x) + V(x) - \beta E[V(x') \mid x, y], \tag{13}$$
where $w^0(x)$ is as in Theorem 3, and $V$ is given by (12).

As a sanity check, one recovers $\bar{u}_{y_0}(\cdot) = W + V - \beta \Pi V = 0$. Also, when $\beta \to 0$, $\bar{u}_y(x) = w^0_y(x) - w^0_{y_0}(x)$, which is the case in standard static discrete choice.

Moreover, since our approach to identifying the utility flows is nonparametric, the MTA does not leverage any known restrictions on the flow utility (including parametric or shape restrictions) in identifying or estimating the flow utilities.

Eqs. (12) and (13) above, showing how the per-period utility flows can be recovered from the choice-specific value functions via a system of linear equations, echo similar derivations in the existing literature (e.g. Aguirregabiria and Mira (2007), Pesendorfer and Schmidt-Dengler (2008), Arcidiacono and Miller (2011, 2013)). Hence, the innovative aspect of our MTA estimator lies not in the second step, but rather in the first step. In the next section, we delve into computational aspects of this first step.

Existing procedures for estimating DDC models typically rely on a small class of distributions for the utility shocks (primarily those in the extreme-value family, as in Example 1 above) because these distributions yield analytical (or near-analytical) formulas for the choice probabilities and for $\{E[\varepsilon_y \mid Y(w, \varepsilon) = y, x]\}_y$, the vector of conditional expectations of the utility shocks for the optimal choices, which is required in order to recover the utility flows.
Our approach, however, which is based on computing the $\mathcal{G}^*$ function, easily accommodates different choices for $Q_\varepsilon$, the (joint) distribution of the utility shocks conditional on $X$. Therefore, our findings expand the set of dynamic discrete-choice models suitable for applied work far beyond those with extreme-value distributed utility shocks.

Footnote: To ensure that the inverted $w$ satisfies certain shape restrictions, the linkage between $w$ and the CCP will no longer be stipulated by the subdifferential of the convex conjugate function. It is possible that there exists a modification of the convex conjugate function that is equivalent to imposing certain shape restrictions on utilities. This is an interesting avenue for future research.

Footnote: Related papers include Hotz and Miller (1993), Hotz, Miller, Sanders, Smith (1994), Aguirregabiria and Mira (2007), Pesendorfer and Schmidt-Dengler (2008), Arcidiacono and Miller (2011). Norets and Tang (2013) propose another estimation approach for binary dynamic choice models in which the choice probability function is not required to be known.

4. Computational details for the MTA estimator
In Section 4.1, we show that the problem of identification in DDC models can be formulated as a mass transport problem, and also how this may be implemented in practice. In showing how to compute $\mathcal{G}^*$, we exploit the connection, alluded to above, between this function and the assignment game, a model of two-sided matching with transferable utility which has been used to model marriage and housing markets (such as Shapley and Shubik (1971) and Becker (1973)).

4.1. Mass Transport formulation.
Much of our computational strategy will be based on the following proposition, which was derived in Galichon and Salanié (2012, Proposition 2). It characterizes the $\mathcal{G}^*$ function as an optimum of a well-studied mathematical program: the "mass transport" problem; see Villani (2003).

Proposition 2 (Galichon and Salanié). Given Assumption (2), the function $\mathcal{G}^*(p)$ is the value of the mass transport problem in which the distribution $Q$ of vectors of utility shocks $\varepsilon$ is matched optimally to the distribution of actions $y$ given by the multinomial distribution $p$, when the cost associated to a match of $(\varepsilon, y)$ is given by $c(y, \varepsilon) = -\varepsilon_y$, where $\varepsilon_y$ is the utility shock from taking the $y$-th action. That is,

$$\mathcal{G}^*(p) = \sup_{\substack{w,\,z \ \text{s.t.} \\ w_y + z(\varepsilon) \le c(y,\varepsilon)}} \left\{ E_p[w_Y] + E_Q[z(\varepsilon)] \right\}, \tag{14}$$

where the supremum is taken over the pair $(w, z)$, where $w$ is a vector of dimension $|\mathcal{Y}|$ and $z(\cdot)$ is a $Q$-measurable random variable. By Monge-Kantorovich duality, (14) coincides with its dual

$$\mathcal{G}^*(p) = \min_{\substack{Y \sim p \\ \varepsilon \sim Q}} E[c(Y, \varepsilon)], \tag{15}$$

where the minimum is taken over the joint distribution of $(Y, \varepsilon)$ such that the first margin $Y$ has distribution $p$ and the second margin $\varepsilon$ has distribution $Q$. Moreover, $w \in \partial\mathcal{G}^*(p)$ if and only if there exists $z$ such that $(w, z)$ solves (14). Finally, $w^0 \in \partial\mathcal{G}^*(p)$ and $\mathcal{G}(w^0) = 0$ if and only if there exists $z$ such that $(w^0, z)$ solves (14) and $z$ is such that $E_Q[z(\varepsilon)] = 0$.

Footnote: This remark is also relevant for static discrete choice models. In fact, the random-coefficients multinomial demand model of Berry, Levinsohn, and Pakes (1995) does not have a closed-form expression for the choice probabilities, thus necessitating a simulation-based inversion procedure. In ongoing work (Chiong, Galichon, Shum (2013)), we are exploring the estimation of random-coefficients discrete-choice demand models using our approach.

In Eq. (15) above, the minimum is taken across all joint distributions of $(Y, \varepsilon)$ with marginal distributions equal to, respectively, $p$ and $Q$. It follows from the proposition that the main problem of identification of the choice-specific value functions $w$ can be recast as a mass transport problem (Villani (2003)), in which the set of optimizers to Eq. (14) yields vectors of choice-specific value functions $w \in \partial\mathcal{G}^*(p)$.

Moreover, the mass transport problem can be interpreted as an optimal matching problem. Using a marriage market analogy, consider a setting in which a matched couple consisting of a "man" (with characteristics $y \sim p$) and a "woman" (with characteristics $\varepsilon \sim Q$) obtains a joint marital surplus $-c(y, \varepsilon) = \varepsilon_y$. Accordingly, Eq. (15) is an optimal matching problem in which the joint distribution of characteristics $(y, \varepsilon)$ of matched couples is chosen to maximize the aggregate marital surplus.

In the case when $Q$ is a discrete distribution, the mass transport problem in the above proposition reduces to a linear-programming problem which coincides with the assignment game of Shapley and Shubik (1971). This connection suggests a convenient way for efficiently computing the $\mathcal{G}^*$ function (along with its subgradient). Specifically, we will show how the dual problem (Eq. (15)) takes the form of a linear programming problem or assignment game, for which some of the associated Lagrange multipliers correspond to the subgradient $\partial\mathcal{G}^*$, and hence the choice-specific value functions. These computational details are the focus of Section 4.2 below. We include the proof of Proposition 2 in the Appendix for completeness.

4.2. Linear programming computation.
Let $\hat Q$ be a discrete approximation to the distribution $Q$. Specifically, consider an $S$-point approximation to $Q$, where the support is $\mathrm{Supp}(\hat Q) = \{\varepsilon^1, \ldots, \varepsilon^S\}$. Let $\Pr(\hat Q = \varepsilon^s) = q_s$. The best $S$-point approximation is such that the support points are equally weighted, $q_s = 1/S$, i.e. the best $\hat Q$ is a uniform distribution; see Kennan (2006). Therefore, let $\hat Q$ be a uniform distribution whose support can be constructed by drawing $S$ points from the distribution $Q$. Moreover, $\hat Q$ converges to $Q$ uniformly as $S \to \infty$, so that the approximation error from this discretization will vanish when $S$ is large. Under these assumptions, Problem (14)-(15) has a Linear Programming formulation as

$$\max_{\pi \ge 0} \ \sum_{y,s} \pi_{ys}\, \varepsilon^s_y \tag{16}$$

$$\text{s.t.} \quad \sum_{s=1}^{S} \pi_{ys} = p_y, \quad \forall y \in \mathcal{Y} \tag{17}$$

$$\sum_{y \in \mathcal{Y}} \pi_{ys} = q_s, \quad \forall s \in \{1, \ldots, S\}. \tag{18}$$

For this discretized problem, the set of $w \in \partial\mathcal{G}^*(p)$ is the set of vectors $w$ of Lagrange multipliers corresponding to constraints (17). To see how we recover $w^0$, the specific element in $\partial\mathcal{G}^*(p)$ as defined in Theorem 1, we begin with the dual problem

$$\min_{\lambda, z} \ \sum_{y \in \mathcal{Y}} p_y \lambda_y + \sum_{s=1}^{S} q_s z_s \tag{19}$$

$$\text{s.t.} \quad \lambda_y + z_s \ge \varepsilon^s_y.$$

Consider $(\lambda, z)$ a solution to (19). By duality, $\lambda$ and $z$ are, respectively, vectors of Lagrange multipliers associated to constraints (17) and (18). We have $\mathcal{G}^*(p) = \sum_{y \in \mathcal{Y}} p_y \lambda_y + \sum_{s=1}^{S} q_s z_s$, which implies that $\mathcal{G}(\lambda) = -\sum_{s=1}^{S} q_s z_s$. Also, for any two elements $\lambda, w^0 \in \partial\mathcal{G}^*(p)$, we have $\sum_{y \in \mathcal{Y}} p_y \lambda_y - \mathcal{G}(\lambda) = \sum_{y \in \mathcal{Y}} p_y w^0_y - \mathcal{G}(w^0)$. Hence, because $\mathcal{G}(w^0) = 0$, we get

$$w^0_y = \lambda_y - \mathcal{G}(\lambda) = \lambda_y + \sum_{s=1}^{S} q_s z_s. \tag{20}$$

In Theorem 5 below, we establish the consistency of this estimate of $w^0$.

Footnote (uniform convergence of $\hat Q$): Because $\hat Q$ is constructed from i.i.d. draws from $Q$, this uniform convergence follows from the Glivenko-Cantelli Theorem.

Footnote (Lagrange multipliers): Because the two linear programs (16) and (19) are dual to each other, the Lagrange multipliers of interest $\lambda_y$ can be obtained by computing either program. In practice, for the simulations and empirical application below, we computed the primal problem (16).

Footnote (on $\mathcal{G}(\lambda) = -\sum_s q_s z_s$): This uses Eq. (25) in Appendix A, which (in our setup) states that $\mathcal{G}^*(p) + \mathcal{G}(\lambda) = p \cdot \lambda$, for all Lagrange multiplier vectors $\lambda \in \partial\mathcal{G}^*(p)$.

4.3. Discretization of Q and a second type of indeterminacy issue.

Thus far, we have proposed a procedure for computing $\mathcal{G}^*$ (and the choice-specific value functions $w^0$) by discretizing the otherwise continuous distribution $Q$. However, because the support of $\varepsilon$ is discrete, $w^0_y$ will generally not be unique. This is due to the non-uniqueness of the solution to the dual of the LP problem in Eq. (16), and corresponds to Shapley and Shubik's (1971) well-known results on the multiplicity of the core in the finite assignment game. Applied to discrete-choice models, it implies that when the support of the utility shocks is finite, the utilities from the discrete-choice model will only be partially identified. In this section, we discuss this partial identification, or indeterminacy, problem further.

Recall that

$$\mathcal{G}^*(p) = \sup_{w_y + z(\varepsilon) \le c(y,\varepsilon)} \left\{ E_p[w_Y] + E_Q[z(\varepsilon)] \right\} \tag{21}$$

where $c(y, \varepsilon) = -\varepsilon_y$. In Proposition 2, this problem was shown to be the dual formulation of an optimal assignment problem.

We call identified set of payoff vectors, denoted by $\mathcal{I}(p)$, the set of vectors $w$ such that

$$\Pr\left( w_y + \varepsilon_y \ge \max_{y'} \{ w_{y'} + \varepsilon_{y'} \} \right) = p_y \tag{22}$$

and we denote by $\mathcal{I}_0(p)$ the normalized identified set of payoff vectors, that is the set of $w \in \mathcal{I}(p)$ such that $\mathcal{G}(w) = 0$. If $Q$ were to have full support, $\mathcal{I}_0(p)$ would contain only the singleton $\{w^0\}$ as in Theorem 3. Instead, when the distribution $Q$ is discrete, the set $\mathcal{I}_0(p)$ contains a multiplicity of vectors $w$ which satisfy (5). One has:

Theorem 4.
The following holds:

(i) The set $\mathcal{I}(p)$ coincides with the set of $w$ such that there exists $z$ such that $(w, z)$ is a solution to (21). Thus

$$\mathcal{I}(p) = \left\{ w : \exists z, \ \begin{array}{l} w_y + z_\varepsilon \le c(y, \varepsilon) \\ E_p[w_Y] + E_Q[z_\varepsilon] = \mathcal{G}^*(p) \end{array} \right\}.$$

(ii) The set $\mathcal{I}_0(p)$ is determined by the following set of linear inequalities

$$\mathcal{I}_0(p) = \left\{ w : \exists z, \ \begin{array}{l} w_y + z_\varepsilon \le c(y, \varepsilon) \\ E_p[w_Y] = \mathcal{G}^*(p) \\ E_Q[z_\varepsilon] = 0 \end{array} \right\}.$$

Footnote: Note that Theorem 1 requires $\varepsilon$ to have full support.

This result allows us to easily derive bounds on the individual components of $w^0$ from the characterization of the identified set by linear inequalities. Indeed, for each $y \in \mathcal{Y}$, we can obtain upper (resp. lower) bounds on $w_y$ by maximizing (resp. minimizing) $w_y$ subject to the linear inequalities characterizing $\mathcal{I}_0(p)$, which is a linear programming problem. Furthermore, when the dimensionality of discretization, $S$, is high, the core shrinks to the singleton $\{w^0\}$. This is a consequence of our next theorem, which is a consistency result. In our Monte Carlo experiments below, we provide evidence for the magnitude of this indeterminacy problem under different levels of discretization.

4.4. Consistency of MTA estimator.
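The estimator whose consistency is studied in this subsection can be tried out on a toy case. The sketch below is illustrative only and is not the authors' code. It solves the discretized version of problem (14), that is, it maximizes $p \cdot w + \sum_s q_s z_s$ subject to $w_y + z_s \le -\varepsilon^s_y$, using SciPy's `linprog`, and then normalizes by the empirical social surplus so that $\hat{\mathcal{G}}(w^0) = 0$. The function names, the choice of SciPy with the HiGHS solver, and the test case with i.i.d. standard Gumbel shocks (the logit model, for which $w^0_y = \log p_y - \gamma$ in closed form, with $\gamma$ Euler's constant) are all assumptions made for the example.

```python
import numpy as np
from scipy import sparse
from scipy.optimize import linprog

EULER_GAMMA = 0.5772156649015329  # mean of a standard Gumbel shock


def mta_normalized_payoffs(p, eps):
    """Return normalized payoffs w^0 for one state (hypothetical helper).

    p   : (K,) choice probabilities in the interior of the simplex.
    eps : (S, K) shock draws; row s is the support point eps^s of Q-hat.
    """
    p = np.asarray(p, dtype=float)
    S, K = eps.shape
    q = np.full(S, 1.0 / S)  # equal weights on the S support points

    # Decision vector: (w_1, ..., w_K, z_1, ..., z_S).
    # Discretized (14): maximize p.w + q.z  s.t.  w_y + z_s <= -eps[s, y].
    # linprog minimizes, so the objective is negated.
    cost = -np.concatenate([p, q])
    A_w = sparse.kron(sparse.identity(K), np.ones((S, 1)))  # w_y in row y*S + s
    A_z = sparse.kron(np.ones((K, 1)), sparse.identity(S))  # z_s in row y*S + s
    A_ub = sparse.hstack([A_w, A_z], format="csr")
    b_ub = -eps.T.ravel()  # entry y*S + s equals -eps[s, y]

    # (w + c, z - c) is feasible with the same objective for any constant c,
    # so w_1 is pinned to 0 and the level is fixed by the normalization below.
    bounds = [(0.0, 0.0)] + [(None, None)] * (K - 1 + S)
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)

    w = res.x[:K]
    # Empirical social surplus: (1/S) * sum_s max_y (w_y + eps[s, y]).
    G_hat = np.mean(np.max(w + eps, axis=1))
    return w - G_hat  # normalized so that the empirical surplus is zero


def logit_check(p, S, seed=0):
    """Compare the LP estimate with the logit closed form (assumed test case)."""
    rng = np.random.default_rng(seed)
    eps = rng.gumbel(size=(S, len(p)))  # i.i.d. standard Gumbel draws
    w_hat = mta_normalized_payoffs(p, eps)
    w_true = np.log(p) - EULER_GAMMA
    return w_hat, w_true


if __name__ == "__main__":
    p = np.array([0.5, 0.3, 0.2])
    for S in (200, 1000, 3000):
        w_hat, w_true = logit_check(p, S)
        print(S, np.max(np.abs(w_hat - w_true)))
```

Under these assumptions, the printed gap between `w_hat` and `w_true` should tend to shrink as `S` grows, which is the behavior the consistency result below describes.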
Here we show (strong) consistency for our MTA estimator of $w^0$, the normalized choice-specific value functions. In our proof, we accommodate two types of error: (i) approximation error from discretizing the distribution $Q$ of $\varepsilon$, and (ii) sampling error from our finite-sample observations of the choice probabilities. We use $Q^n$ to denote the discretized distributions of $\varepsilon$, and $p^n$ to denote the sample estimates of the choice probabilities. The limiting vector of choice probabilities is denoted $p$. For a given $(Q^n, p^n)$, let $w^n_y$ denote the choice-specific value functions estimated using our MTA approach.

Footnote (bounds of Section 4.3): However, letting $\bar w_y$ (resp. $\underline{w}_y$) denote the upper (resp. lower) bound on $w_y$, we note that typically the vector $(w_y, y \in \mathcal{Y})' \notin \mathcal{I}(p)$.

Footnote (partial identification): Moreover, partial identification in $w$ (due to discretization of the shock distribution $Q(\varepsilon)$) will naturally also imply partial identification in the utility flows $u$. For a given identified vector $w$ (and also given the choice probabilities $p$ and transition matrix $\Pi$ from the data), we can recover the corresponding $u$ using Eqs. (12)-(13).

Footnote (shrinking core): Gretsky, Ostroy, and Zame (1999) also discuss this phenomenon in their paper.

Theorem 5. Assume:
(i) The sequence of vectors $\{p^n_y\}_{y \in \mathcal{Y}}$, viewed as the multinomial distribution of $y$, converges weakly to $p$;
(ii) The discretized distributions of $\varepsilon$ converge weakly to $Q$: $Q^n \xrightarrow{d} Q$;
(iii) The second moments of $Q^n$ are uniformly bounded.
Then the convergence $w^n_y \to w^0_y$ for each $y \in \mathcal{Y}$ holds almost surely.

The proof, which is in the appendix, may be of independent interest as the main argument relies on approximation results from mass transport theory, which we believe to be the first use of such results for proving consistency in an econometrics context.

5. Monte Carlo Evidence
In this section, we illustrate our estimation framework using a dynamic model of resource extraction. To illustrate how our method can tractably handle any general distribution of the unobservables, we use a distribution in which shocks to different choices are correlated. We will begin by describing the setup.

At each time t, let x_t ∈ { , , . . . , } be the state variable denoting the size of the resource pool. There are three choices:

y_t = 0: The pool of resources is extracted fully. x_{t+1} | x_t, y_t = 0 follows a multinomial distribution on { , , , } with parameter π = (π , π , π , π ). The utility flow is ū(y_t = 0, x_t) = 0. √x_t − ε .

y_t = 1: The pool of resources is extracted partially. x_{t+1} | x_t, y_t = 1 follows a multinomial distribution on { max{ , x_t − }, max{ , x_t − }, max{ , x_t − }, max{ , x_t − } } with parameter π. The utility flow is ū(y_t = 1, x_t) = 0. √x_t − ε .

y_t = 2: Agent waits for the pool to grow and does not extract. x_{t+1} | x_t, y_t = 2 follows a multinomial distribution on { x_t, x_t + 1, x_t + 2, x_t + 3 } with parameter π. We normalize the utility flow to be ū(y_t = 2, x_t) = ε .

The joint distribution of the unobserved state variables is given by (ε − ε , ε − ε ) ∼ N(( ), ( . . . )). Other parameters we fix and hold constant for the Monte Carlo study are the discount rate, β = 0. , and π = (0. , . , . , . ).

5.1. Asymptotic performance.
As a preliminary check of our estimation procedure, we show that we are able to recover the utility flows using the actual conditional choice probabilities implied by the underlying model. We discretized the distribution of ε using S = 5000 support points. As is clear from Figure 1, the estimated utility flows (plotted as dots) as a function of states matched the actual utility functions very well.

[Figure 1 plot: horizontal axis, x (state); nonparametric estimates plotted against the true utility flows ū(y = 0, x) and ū(y = 1, x).]

Figure 1. Comparison between the estimated and true utility flows.

5.2. Finite sample performance.
To test the performance of our estimation procedure when there is sampling error in the CCPs, we generate simulated panel data of the following form: $\{y_{it}, x_{it} : i = 1, 2, \ldots, N;\ t = 1, 2, \ldots, T\}$, where $y_{it} \in \{0, 1, 2\}$ is the dynamically optimal choice at $x_{it}$ after the realization of simulated shocks. We vary the number of cross-section observations $N$ and the number of periods $T$, and for each combination of $(N, T)$, we generate 100 independent datasets. For each replication or simulated dataset, the root-mean-square error (RMSE) and $R^2$ are calculated, showing how well the estimated $\bar u_y(x)$ fits the true utility function for each $y$. The averages are reported in Table 1.

Design                RMSE(y=0)   RMSE(y=1)   R^2(y=0)   R^2(y=1)
N = 100,  T = 100     0.5586      0.2435      0.3438     0.7708
N = 100,  T = 500     0.1070      0.1389      0.7212     0.9119
N = 100,  T = 1000    0.0810      0.1090      0.8553     0.9501
N = 200,  T = 100     0.1244      0.1642      0.5773     0.8736
N = 200,  T = 200     0.1177      0.1500      0.7044     0.9040
N = 500,  T = 100     0.0871      0.1162      0.8109     0.9348
N = 500,  T = 500     0.0665      0.0829      0.8899     0.9678
N = 1000, T = 100     0.0718      0.0928      0.8777     0.9647
N = 1000, T = 1000    0.0543      0.0643      0.9322     0.9820

Table 1. Average fit across all replications. Standard deviations are reported in the Appendix.

5.3. Size of the identified set of payoffs.
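The bounds computation of Section 4.3, on which this subsection relies, can be sketched as a small family of linear programs. What follows is an illustration under stated assumptions rather than the authors' implementation: the function name `payoff_bounds`, the use of SciPy's `linprog` with the HiGHS solver, the small numerical `slack`, and the Gumbel test shocks are all choices made for the example. The first program computes $\mathcal{G}^*(p)$ under the normalization $E_Q[z] = 0$; the remaining programs maximize and minimize each $w_y$ over the linear inequalities of Theorem 4(ii).

```python
import numpy as np
from scipy import sparse
from scipy.optimize import linprog


def payoff_bounds(p, eps, slack=1e-8):
    """Lower and upper bounds on each w_y over the normalized identified set.

    p   : (K,) choice probabilities in the interior of the simplex.
    eps : (S, K) support points of the discretized shock distribution.
    """
    p = np.asarray(p, dtype=float)
    S, K = eps.shape
    n = K + S
    q = np.full(S, 1.0 / S)

    # Core constraints w_y + z_s <= -eps[s, y]; variables ordered (w, z).
    A_w = sparse.kron(sparse.identity(K), np.ones((S, 1)))
    A_z = sparse.kron(np.ones((K, 1)), sparse.identity(S))
    A_core = sparse.hstack([A_w, A_z], format="csr")
    b_core = -eps.T.ravel()

    # Normalization E_Q[z] = 0, imposed as an equality constraint.
    A_eq = sparse.csr_matrix(np.concatenate([np.zeros(K), q]).reshape(1, n))
    b_eq = np.zeros(1)
    free = [(None, None)] * n

    def solve(c, A_ub, b_ub):
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                      bounds=free, method="highs")
        if res.status != 0:
            raise RuntimeError(res.message)
        return res

    # Step 1: G*(p) = max p.w subject to the core constraints and E_Q[z] = 0.
    c_value = -np.concatenate([p, np.zeros(S)])
    G_star = -solve(c_value, A_core, b_core).fun

    # Step 2: keep only optimal pairs. Feasibility already gives p.w <= G*(p),
    # so the equality E_p[w] = G*(p) is imposed as p.w >= G*(p) - slack.
    A_opt = sparse.vstack([A_core, sparse.csr_matrix(c_value.reshape(1, n))],
                          format="csr")
    b_opt = np.append(b_core, -(G_star - slack))

    lower = np.empty(K)
    upper = np.empty(K)
    for y in range(K):
        c = np.zeros(n)
        c[y] = 1.0
        lower[y] = solve(c, A_opt, b_opt).fun    # minimize w_y
        upper[y] = -solve(-c, A_opt, b_opt).fun  # maximize w_y
    return lower, upper


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p = np.array([0.5, 0.3, 0.2])
    for S in (100, 400, 1600):
        lower, upper = payoff_bounds(p, rng.gumbel(size=(S, 3)))
        print(S, np.max(upper - lower))
```

The printed quantity is the largest width of the identified interval across choices for a single hypothetical $p$; repeating the exercise over a grid of $p$ would give the kind of summary reported in this subsection.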
As mentioned in Section 4.3, using a discrete approximation to the distribution of the unobserved state variable introduces a partial identification problem: the identified choice-specific value functions might not be unique. Using simulations, we next show that the identified set of choice-specific value functions (which we will simply refer to as "payoffs") shrinks to a singleton as S increases, where S is the number of support points in the discrete approximation of the continuous error distribution. For S ranging from 100 to 1000, we plot in Figure 2 the differences between the largest and smallest choice-specific value function for y = 2 across all values of p ∈ ∆ (using the linear programming procedures described in Section 4.3).

Footnote (Section 5.2, simulated data): In each dataset, we initialized x_i with a random state in X.

Footnote (Section 5.2, fit measures): When calculating RMSE and R^2, we restrict to states where the probability is in the interior of the simplex ∆; otherwise utilities are not identified and the estimates are meaningless.

Figure 2. The identified set of payoffs shrinks to a singleton across ∆.

[Figure 2 boxplot: horizontal axis, number of discretized points S (100 to 1000); vertical axis, upper bound; panel title, "Size of the identified set of payoff for choice y=3".]

For each value of S, we plot the values of the differences $\max_{w \in \partial\mathcal{G}^*(p)} w - \min_{w \in \partial\mathcal{G}^*(p)} w$ across all values of p ∈ ∆. In the boxplot, the central mark is the median, the edges of the box are the 25th and 75th percentiles, the whiskers extend to the most extreme data points not considered outliers, and outliers are plotted individually.

As is evident, even at small S, the identified payoffs are very close to each other in magnitude. At S = 1000, where computation is near-instantaneous, for most of the values in the discretised grid of ∆, the core is a singleton; when it is not, the difference in the estimated payoff is less than 0.01. Similar results hold for the choice-specific value functions for choices y = 0 and y = 1, which are plotted in Figures 5 and 6 in the Appendix. To sum up, it appears that this indeterminacy issue in the payoffs is not a worrisome problem for even very modest values of S.

5.4. Comparison: MTA vs. Simulated Maximum Likelihood.
One common technique used in the literature to estimate dynamic discrete choice models with a non-standard distribution of unobservables is Simulated Maximum Likelihood (SML). Our MTA method has a distinct advantage over SML: while MTA allows the utility flows ū_y(x) for different choices y and states x to be nonparametric, the SML approach typically requires parameterizing these utility flows as a function of a low-dimensional parameter vector. This makes comparison of these two approaches awkward. Nevertheless, here we undertake a comparison of the nonparametric MTA vs. the parametric SML approach. First we compare the performance of the two alternative approaches in terms of computational time. The computations were performed on a Quad Core Intel Xeon 2.93GHz UNIX workstation, and the results are presented in Table 2.

From a computational point of view, the disadvantage of SML is that the dynamic programming problem must be solved (via Bellman function iteration) for each trial parameter vector, whereas the MTA requires solving a large-scale linear programming problem, but only once. Table 2 shows that our MTA procedure significantly outperforms SML in terms of computational speed. This finding, along with the results in Table 1, shows that MTA has the desirable properties of speed and accuracy, and also allows for nonparametric specification of the utility flows ū_y(x).

Table 2. Comparison: MTA vs. Simulated Maximum Likelihood (SML)

S (discretized points)   SML (+)         MTA (++)
                         Avg. seconds    Avg. seconds
2000                     19.8            2.6
3000                     24.5            4.4
4000                     26.5            6.6
5000                     40.9            9.6
6000                     70.5            13.4
7000                     105.0           17.5
8000                     129.4           21.5

(+): In this column we report the time it takes to estimate the parameters θ = (θ , θ , θ , θ ) as a local maximum of a simulated maximum likelihood, where θ corresponds to ū_{y=0}(x) = θ + θ √x, and ū_{y=1}(x) = θ + θ √x.
(++): In this column we report the time it takes to nonparametrically estimate the per-period utility flow.
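To make the source of the cost difference concrete, here is a minimal sketch of the Bellman iteration that a likelihood-based approach repeats for every trial parameter vector. It is not the SML code behind Table 2. It assumes i.i.d. standard Gumbel shocks, so that the expected maximum has a closed form (log-sum-exp plus Euler's constant) and no simulation is needed; with a non-standard shock distribution that step would itself have to be simulated. The function name, the array layout, and the small two-choice model in the usage lines are hypothetical.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329  # mean of a standard Gumbel shock


def solve_bellman_logit(u, P, beta, tol=1e-10, max_iter=10_000):
    """Value function iteration under logit shocks (illustrative assumption).

    u    : (K, X) flow utilities for K choices and X states.
    P    : (K, X, X) transition matrices, P[y, x, x'] = Pr(x' | x, y).
    beta : discount factor in (0, 1).
    Returns the integrated value function V (X,) and the CCPs (K, X).
    """
    K, X = u.shape
    V = np.zeros(X)
    for _ in range(max_iter):
        v = u + beta * (P @ V)  # (K, X) choice-specific value functions
        m = v.max(axis=0)
        # Expected maximum over choices: log-sum-exp plus Euler's constant.
        V_new = m + np.log(np.exp(v - m).sum(axis=0)) + EULER_GAMMA
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    else:
        raise RuntimeError("Bellman iteration did not converge")

    v = u + beta * (P @ V)
    ccp = np.exp(v - v.max(axis=0))
    ccp /= ccp.sum(axis=0)
    return V, ccp


if __name__ == "__main__":
    # Hypothetical model: 3 states, choice 0 drifts upward, choice 1 resets.
    u = np.array([[0.0, -0.5, -1.0], [-2.0, -2.0, -2.0]])
    P = np.array([
        [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]],
        [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    ])
    V, ccp = solve_bellman_logit(u, P, beta=0.9)
    print(V)
    print(ccp)
```

A likelihood routine would call a solver of this kind once per candidate parameter vector, whereas the linear program of Section 4.2 is solved once for the observed choice probabilities.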
Furthermore, as confirmed in our computations, the nonlinear optimization routines typically used to implement SML have trouble finding the global optimum; in contrast, the MTA estimator, by virtue of its being a linear programming problem, always finds the global optimum. Indeed, under the logistic assumption on unobservables and linear-in-parameters utility, one advantage of the Hotz-Miller estimator for DDC models (vs. SML) is that the system of equations defining the estimator has a unique global solution; in their discussion of this, Aguirregabiria and Mira (2010, pg. 48) remark that "extending the range of applicability of ... CCP methods to models which do not impose the CLOGIT [logistic] assumption is a topic for further research." This paper fills the gap: our MTA estimator shares the computational advantages of the CLOGIT setup, but works for non-logistic models. In this sense, the MTA estimator is a generalized CCP estimator.

6. Empirical Application: Revisiting Harold Zurcher
In this section, we apply our estimation procedure to the bus engine replacement dataset first analyzed in Rust (1987). In each period t, Harold Zurcher (bus depot manager) chooses y_t ∈ {0, 1} after observing the mileage x_t ∈ X and the realized shocks ε_t. If y_t = 0, then he chooses not to replace the bus engine, and y_t = 1 means that he chooses to replace the bus engine. The state space is X = { , , . . . }, that is, we divided the mileage space into 30 states, each representing a 12,500 increment in mileage since the last engine replacement. Harold Zurcher manages a fleet of 104 identical buses, and we observe the decisions that he made, as well as the corresponding bus mileage at each time period t. The duration between t + 1 and t is a quarter of a year, and the dataset spans 10 years. Figures 7 and 8 in the Appendix summarize the frequencies and mileage at which replacements take place in the dataset.

Footnote (mileage grid): This grid is coarser compared to Rust's (1987) original analysis of this data, in which he divided the mileage space into increments of 5,000 miles. However, because replacement of engines occurred so infrequently (there were only 61 replacements in the entire ten-year sample period), using such a fine grid size leads to many states that have zero probability of choosing replacement. Our procedure, like all other CCP-based approaches, fails when the vector of conditional choice probabilities lies on the boundary of the simplex.

Firstly, we can directly estimate the probability of choosing to replace and not to replace the engine for each state in X. Also directly obtained from the data are the Markov transition probabilities for the observed state variable x_t ∈ X, estimated as:

P̂r(x_{t+1} = j | x_t = i, y_t = 0) = [...] if j = i; [...] if j = i + 1; 0 otherwise.

P̂r(x_{t+1} = j | x_t = i, y_t = 1) = [...] if j = 0; [...] if j = 1; 0 otherwise.

[Figure 3 plot: horizontal axis, x, mileage since last replacement (per 12,500 miles); vertical axis, û(y = 0, x); one series for each specification of the distribution of ε.]

Figure 3. Estimates of utility flows ū_{y=0}(x), across values of mileage x.

For this analysis, we assumed a normal mixture distribution of the error term, specifically, ε_t − ε_t ∼ N(0, 1) + N(0, . x). We chose this mixture distribution in order to allow the utility shocks to depend on mileage, which accommodates, for instance, operating costs which may be more volatile and unpredictable at different levels of mileage. At the same time, these specifications for the utility shock distribution showcase the flexibility of our procedure in estimating dynamic discrete choice models for any general error distribution. For comparison, we repeat this exercise using an error distribution that is homoskedastic, i.e., its variance does not depend on the state variable x_t. The result appears to be robust to using different distributions of ε_t − ε_t. We set the discount rate β = 0. [...] ū_{y=0}(x), we fixed ū_{y=1}(x) to 0 for all x ∈ X. Hence, our estimates of ū_{y=0}(x) should be interpreted as the magnitude of operating costs relative to replacement costs, with positive values implying that replacement costs exceed operating costs. The estimated utility flows from choosing y = 0 (don't replace) relative to y = 1 (replace engine) are plotted in Figure 3. We only present estimates for mileage within the range x ∈ [9, [...] ū_{y=0}(x) using our procedure. The results are plotted in Figure 4. The evidence suggests that we are able to obtain fairly tight cost estimates for states where there is at least one replacement, i.e. for x ≥ [...] x ≤ 22 (x ≤ [...],000 miles).

Footnote (known shock distribution): In this paper, we restrict attention to the case where the researcher fully knows the distribution of the unobservables Q_ε, so that there are no unknown parameters in these distributions. In principle, the two-step procedure proposed here can be nested inside an additional "outer loop" in which unknown parameters of Q_ε are considered, but identification and estimation in this case must rely on additional model restrictions in addition to those considered in this paper. We are currently exploring such a model in the context of the simpler static discrete choice setting (Chiong, Galichon and Shum (2014, work in progress)).

Footnote (operating costs): Operating costs include maintenance, fuel, insurance costs, plus Zurcher's estimate of the costs of lost ridership and goodwill due to unexpected breakdowns.

Footnote (replacement costs): To be pedantic, this also includes the operating cost at x = 0.

Figure 4. Bootstrapped estimates of utility flows ū_{y=0}(x).

[Figure 4 boxplots: horizontal axis, x, mileage since last replacement (per 12,500 miles), from 0 to 30; vertical axis, û(y = 0, x).]

We plot the values of the bootstrapped resampled estimates of ū_{y=0}(x). In each boxplot, the central mark is the median, the edges of the box are the 25th and 75th percentiles, the whiskers extend to the 5th and 95th percentiles.

7. Conclusion
In this paper, we have shown how results from convex analysis can be fruitfully applied to study identification in dynamic discrete choice models; modulo the use of these tools, a large class of dynamic discrete choice problems with quite general utility shocks becomes no more difficult to compute and estimate than the Logit model encountered in most empirical applications. This has allowed us to provide a natural and holistic framework encompassing the papers of Rust (1987), Hotz and Miller (1993), and Magnac and Thesmar (2002). While the identification results in this paper are comparable to other results in the literature, the approach we take, based on the convexity of the social surplus function $\mathcal{G}$ and the resulting duality between choice probabilities and choice-specific value functions, appears new. Far more than providing a mere reformulation, this approach is powerful, and has significant implications in several dimensions.

First, by drawing the (surprising) connection between the computation of the $\mathcal{G}^*$ function and the computation of optimal matchings in the classical assignment game, we can apply the powerful tools developed to compute optimal matchings to dynamic discrete-choice models. Moreover, by reformulating the problem as an optimal matching problem, all existence and uniqueness results are inherited from the theory of optimal transport. For instance, the uniqueness of a systematic utility rationalizing the consumer's choices follows from the uniqueness of a potential in the Monge-Kantorovich theorem.

We believe the present paper opens a more flexible way to deal with discrete choice models. While identification is exact for a fixed structure of the unobserved heterogeneity, one may wish to parameterize the distribution of the utility shocks and do inference on that parameter. The results and methods developed in this paper may also extend to dynamic discrete games, with the utility shocks reinterpreted as players' private information. However, we leave these directions for future exploration.

Footnote (matching algorithms): While the present paper has used standard Linear Programming algorithms such as the Simplex algorithm, other, more powerful matching algorithms such as the Hungarian algorithm may be efficiently put to use when the dimensionality of the problem grows.

Footnote (dynamic discrete games): See, e.g., Aguirregabiria and Mira (2007) or Pesendorfer and Schmidt-Dengler (2008).

References

[1] V. Aguirregabiria and P. Mira. Swapping the nested fixed point algorithm: A class of estimators for discrete Markov decision models. Econometrica, 70:1519–1543, 2002.
[2] V. Aguirregabiria and P. Mira. Sequential estimation of dynamic discrete games. Econometrica, 75:1–53, 2007.
[3] V. Aguirregabiria and P. Mira. Dynamic discrete choice structural models: a survey. Journal of Econometrics, 156:38–67, 2010.
[4] S. Anderson, A. de Palma, and J.-F. Thisse. A Representative Consumer Theory of the Logit Model. International Economic Review, 29(3):461–466, 1988.
[5] P. Arcidiacono and R. Miller. Conditional Choice Probability Estimation of Dynamic Discrete Choice Models with Unobserved Heterogeneity. Econometrica, 79:1823–1867, 2011.
[6] P. Arcidiacono and R. Miller. Identifying Dynamic Discrete Choice Models off Short Panels. Working paper, 2013.
[7] C. Aliprantis and K. Border. Infinite Dimensional Analysis: A Hitchhiker's Guide. Springer-Verlag, 2006.
[8] P. Bajari, V. Chernozhukov, H. Hong, and D. Nekipelov. Nonparametric and semiparametric analysis of a dynamic game model. Preprint, 2009.
[9] S. Berry, A. Gandhi, and P. Haile. Connected Substitutes and Invertibility of Demand. Econometrica, 81:2087–2111, 2013.
[10] S. Berry. Estimating Discrete-Choice Models of Product Differentiation. RAND Journal of Economics, 25:242–262, 1994.
[11] S. Berry, J. Levinsohn, and A. Pakes. Automobile prices in market equilibrium. Econometrica, 63:841–890, July 1995.
[12] D. Bertsekas. Dynamic Programming: Deterministic and Stochastic Models. Prentice-Hall, 1987.
[13] P. Chiappori and I. Komunjer. On the Nonparametric Identification of Multiple Choice Models. Working paper, 2010.
[14] K. Chiong, A. Galichon, and M. Shum. Simulation and Partial Identification in Random Coefficient Discrete Choice Demand Models. Work in progress, 2014.
[15] R. Cominetti, E. Melo, and S. Sorin. A payoff-based learning procedure and its application to traffic games. Games and Economic Behavior, 70:71–83, 2010.
[16] A. Galichon and B. Salanié. Cupid's invisible hand: Social surplus and identification in matching models. Preprint, 2012.
[17] N. Gretsky, J. Ostroy, and W. Zame. Perfect Competition in the Continuous Assignment Model. Journal of Economic Theory, 85:60–118, 1999.
[18] P. Haile, A. Hortacsu, and G. Kosenok. On the Empirical Content of Quantal Response Models. American Economic Review, 98:180–200, 2008.
[19] J. Hofbauer and W. Sandholm. On the Global Convergence of Stochastic Fictitious Play. Econometrica, 70:2265–2294, 2002.
[20] J. Hotz and R. Miller. Conditional choice probabilities and the estimation of dynamic models. Review of Economic Studies, 60:497–529, 1993.
[21] J. Hotz, R. Miller, S. Sanders, and J. Smith. A Simulation Estimator for Dynamic Models of Discrete Choice. Review of Economic Studies, 61:265–289, 1994.
[22] Y. Hu and M. Shum. Nonparametric Identification of Dynamic Models with Unobserved Heterogeneity. Journal of Econometrics, 171:32–44, 2012.
[23] H. Kasahara and K. Shimotsu. Nonparametric Identification of Finite Mixture Models of Dynamic Discrete Choice. Econometrica, 77:135–175, 2009.
[24] M. Keane and K. Wolpin. The career decisions of young men. Journal of Political Economy, 105:473–522, 1997.
[25] J. Kennan. A Note on Discrete Approximations of Continuous Distributions. Mimeo, University of Wisconsin at Madison, 2006.
[26] T. Magnac and D. Thesmar. Identifying dynamic discrete decision processes. Econometrica, 70:801–816, 2002.
[27] D. McFadden. Modeling the choice of residential location. In A. Karlquist et al., editors, Spatial Interaction Theory and Residential Location. North Holland Pub. Co., 1978.
[28] D. McFadden. Economic Models of Probabilistic Choice. In C. Manski and D. McFadden, editors, Structural Analysis of Discrete Data with Econometric Applications, 1981.
[29] A. Norets. Inference in dynamic discrete choice models with serially correlated unobserved state variables. Econometrica, 77:1665–1682, 2009.
[30] A. Norets and S. Takahashi. On the Surjectivity of the Mapping Between Utilities and Choice Probabilities. Quantitative Economics [...]
[...] Econometrica, 54:1027–1057, 1986.
[33] M. Pesendorfer and P. Schmidt-Dengler. Asymptotic least squares estimators for dynamic games. Review of Economic Studies, 75:901–928, 2008.
[34] R. Tyrrell Rockafellar. Convex Analysis. Princeton University Press, 1970.
[35] J. Rust. Structural Estimation of Markov Decision Processes. In R. Engle and D. McFadden, editors, Handbook of Econometrics, Volume 4. North-Holland, 1994.
[36] J. Rust. Optimal replacement of GMC bus engines: An empirical model of Harold Zurcher. Econometrica, 55:999–1033, 1987.
[37] X. Shi, M. Shum, and W. Song. Estimating Multinomial Models using Cyclic Monotonicity. Caltech Social Science Working Paper 1397, 2014.
[38] L. Shapley and M. Shubik. The assignment game I: The core. International Journal of Game Theory, 1(1):111–130, 1971.
[39] C. Villani. Topics in Optimal Transportation. Graduate Studies in Mathematics, Vol. 58. American Mathematical Society, 2003.
[40] C. Villani. Optimal Transport, Old and New. Springer, 2009.
Background results

Convex Analysis for Discrete-choice Models.

Here, we give a brief review of the main notions and results used in the paper. We keep an informal style and do not give proofs, but we refer to Rockafellar (1970) for an extensive treatment of the subject.

Let $u \in \mathbb{R}^{|\mathcal{Y}|}$ be a vector of utility indices. For utility shocks $\{\varepsilon_y\}_{y \in \mathcal{Y}}$ distributed according to a joint distribution function $Q$, we define the social surplus function as

$$\mathcal{G}(u) = E\left[\max_y \{u_y + \varepsilon_y\}\right], \tag{23}$$

where $u_y$ is the $y$-th component of $u$. If $E(\varepsilon_y)$ exists and is finite, then the function $\mathcal{G}$ is a proper convex function that is continuous everywhere. Moreover, assuming that $Q$ is sufficiently well-behaved (for instance, if it has a density with respect to the Lebesgue measure), $\mathcal{G}$ is differentiable everywhere.

Define the Legendre-Fenchel conjugate, or convex conjugate, of $\mathcal{G}$ as $\mathcal{G}^*(p) = \sup_{u \in \mathbb{R}^{|\mathcal{Y}|}} \{p \cdot u - \mathcal{G}(u)\}$. Clearly, $\mathcal{G}^*$ is a convex function as it is the supremum of affine functions. Note that the inequality

$$\mathcal{G}(u) + \mathcal{G}^*(p) \ge p \cdot u \tag{24}$$

holds in general. The domain of $\mathcal{G}^*$ consists of $p \in \mathbb{R}^{|\mathcal{Y}|}$ for which the supremum is finite. In the case when $\mathcal{G}$ is defined by (23), it follows from Norets and Takahashi (2013) that the domain of $\mathcal{G}^*$ contains the simplex $\Delta^{|\mathcal{Y}|}$, which is the set of $p \in \mathbb{R}^{|\mathcal{Y}|}$ such that $p_y \ge 0$ and $\sum_{y \in \mathcal{Y}} p_y = 1$. This means that our convex conjugate function is always well-defined.

The subgradient $\partial\mathcal{G}(u)$ of $\mathcal{G}$ at $u$ is the set of $p \in \mathbb{R}^{|\mathcal{Y}|}$ such that

$$p \cdot u - \mathcal{G}(u) \ge p \cdot u' - \mathcal{G}(u')$$

holds for all $u' \in \mathbb{R}^{|\mathcal{Y}|}$. Hence $\partial\mathcal{G}$ is a set-valued function or correspondence. $\partial\mathcal{G}(u)$ is a singleton if and only if $\mathcal{G}$ is differentiable at $u$; in this case, $\partial\mathcal{G}(u) = \nabla\mathcal{G}(u)$.

One sees that $p \in \partial\mathcal{G}(u)$ if and only if $p \cdot u - \mathcal{G}(u) = \mathcal{G}^*(p)$, that is, if equality is reached in inequality (24):

$$\mathcal{G}(u) + \mathcal{G}^*(p) = p \cdot u. \tag{25}$$

This equation is itself of interest, and is known in the literature as "Fenchel's equality". By symmetry in (25), one sees that $p \in \partial\mathcal{G}(u)$ if and only if $u \in \partial\mathcal{G}^*(p)$.
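Fenchel's equality (25) and the inequality (24) can be checked numerically in a case where both functions have closed forms. The sketch below assumes i.i.d. standard Gumbel shocks (the logit model), which is a special case chosen for the example and not the general $Q$ of the paper; the function names are hypothetical. Under that assumption $\mathcal{G}(u) = \log\sum_y e^{u_y} + \gamma$, with $\gamma$ Euler's constant, $\nabla\mathcal{G}(u)$ is the vector of logit choice probabilities, and $\mathcal{G}^*(p) = \sum_y p_y \log p_y - \gamma$ on the interior of the simplex.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329  # mean of a standard Gumbel shock


def social_surplus_logit(u):
    """G(u) = E[max_y (u_y + eps_y)] for i.i.d. standard Gumbel eps."""
    m = np.max(u)
    return float(m + np.log(np.sum(np.exp(u - m))) + EULER_GAMMA)


def choice_probabilities_logit(u):
    """Gradient of G at u: the multinomial logit choice probabilities."""
    e = np.exp(u - np.max(u))
    return e / e.sum()


def conjugate_logit(p):
    """G*(p) = sum_y p_y log p_y - gamma on the interior of the simplex."""
    return float(np.sum(p * np.log(p)) - EULER_GAMMA)


u = np.array([0.3, -1.2, 0.8])
p = choice_probabilities_logit(u)

# Fenchel's equality (25): the gap is zero when p is the gradient of G at u.
gap_at_p = social_surplus_logit(u) + conjugate_logit(p) - p @ u

# Inequality (24): for any other p in the simplex the gap is strictly positive
# (in the logit case it equals the Kullback-Leibler divergence of p_other from p).
p_other = np.array([0.2, 0.5, 0.3])
gap_elsewhere = social_surplus_logit(u) + conjugate_logit(p_other) - p_other @ u

print(gap_at_p, gap_elsewhere)
```

In this special case the first printed number is zero up to rounding error and the second is positive, matching (25) and (24) respectively.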
In particular, when both $\mathcal{G}$ and $\mathcal{G}^*$ are differentiable, then $\nabla \mathcal{G}^* = (\nabla \mathcal{G})^{-1}$.

Proofs
Proof of Proposition 1.
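Consider the $y$-th component, corresponding to $\partial \mathcal{G}(w) / \partial w_y$:
$$\frac{\partial \mathcal{G}(w)}{\partial w_y} = \frac{\partial}{\partial w_y} \int \max_{y'} \left[ w_{y'} + \varepsilon_{y'} \right] dQ \tag{26}$$
$$= \int \frac{\partial}{\partial w_y} \max_{y'} \left[ w_{y'} + \varepsilon_{y'} \right] dQ \tag{27}$$
$$= \int \mathbb{1}\left( w_y + \varepsilon_y \geq w_{y'} + \varepsilon_{y'}, \ \forall y' \neq y \right) dQ = p(y). \tag{28}$$
(We have suppressed the dependence on $x$ for convenience.)

Proof of Theorem 1.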
This follows directly from Fenchel's equality (see Rockafellar (1970), Theorem 23.5; see also Appendix 8.1), which states that $p \in \partial \mathcal{G}(w)$ is equivalent to $\mathcal{G}(w) + \mathcal{G}^*(p) = \sum_y p_y w_y$, which is equivalent in turn to $w \in \partial \mathcal{G}^*(p)$.

Proof of Theorem 2.
Because $\varepsilon$ has full support, the choice probabilities $p$ will lie strictly in the interior of the simplex $\Delta^{|\mathcal{Y}|}$. Let $\tilde{w} \in \partial \mathcal{G}^*(p)$, and let $w_y = \tilde{w}_y - \mathcal{G}(\tilde{w})$. One has $\mathcal{G}(w) = 0$, and an immediate calculation shows that $\partial \mathcal{G}(w) = p$.

Let us now show that $w$ is unique. Consider $w$ and $w'$ such that $\mathcal{G}(w) = \mathcal{G}(w') = 0$, and $p \in \partial \mathcal{G}(w)$ and $p \in \partial \mathcal{G}(w')$. Assume $w \neq w'$ to get a contradiction; then there exist two distinct $y_0$ and $y_1$ such that $w_{y_0} - w_{y_1} \neq w'_{y_0} - w'_{y_1}$; without loss of generality one may assume $w_{y_0} - w_{y_1} > w'_{y_0} - w'_{y_1}$. Let $S$ be the set of $\varepsilon$'s such that
$$w_{y_0} - w_{y_1} > \varepsilon_{y_1} - \varepsilon_{y_0} > w'_{y_0} - w'_{y_1},$$
$$w_{y_0} + \varepsilon_{y_0} > \max_{y \neq y_0, y_1} \{ w_y + \varepsilon_y \},$$
$$w'_{y_1} + \varepsilon_{y_1} > \max_{y \neq y_0, y_1} \{ w'_y + \varepsilon_y \}.$$
Because $\varepsilon$ has full support, $S$ has positive probability.

Let $\bar{w} = \frac{w + w'}{2}$. Because $p \in \partial \mathcal{G}(w)$ and $p \in \partial \mathcal{G}(w')$, $\mathcal{G}$ is linear on the segment $[w, w']$, thus $\mathcal{G}(\bar{w}) = 0$, thus
$$0 = E\left[ \bar{w}_{Y(\bar{w},\varepsilon)} + \varepsilon_{Y(\bar{w},\varepsilon)} \right]$$
$$= \frac{1}{2} E\left[ w_{Y(\bar{w},\varepsilon)} + \varepsilon_{Y(\bar{w},\varepsilon)} \right] + \frac{1}{2} E\left[ w'_{Y(\bar{w},\varepsilon)} + \varepsilon_{Y(\bar{w},\varepsilon)} \right]$$
$$\leq \frac{1}{2} E\left[ w_{Y(w,\varepsilon)} + \varepsilon_{Y(w,\varepsilon)} \right] + \frac{1}{2} E\left[ w'_{Y(w',\varepsilon)} + \varepsilon_{Y(w',\varepsilon)} \right]$$
$$= \frac{1}{2} \left( \mathcal{G}(w) + \mathcal{G}(w') \right) = 0.$$
Hence equality holds term by term, and
$$w_{Y(w,\varepsilon)} + \varepsilon_{Y(w,\varepsilon)} = w_{Y(\bar{w},\varepsilon)} + \varepsilon_{Y(\bar{w},\varepsilon)},$$
$$w'_{Y(w',\varepsilon)} + \varepsilon_{Y(w',\varepsilon)} = w'_{Y(\bar{w},\varepsilon)} + \varepsilon_{Y(\bar{w},\varepsilon)}.$$
For $\varepsilon \in S$, this gives $Y(w, \varepsilon) = Y(\bar{w}, \varepsilon) = y_0$ and $Y(w', \varepsilon) = Y(\bar{w}, \varepsilon) = y_1$, which is impossible since $y_0 \neq y_1$, and we get the desired contradiction. Hence $w = w'$, and the uniqueness of $w$ follows.

Proof of Theorem 3.
From $\mathcal{G}(w^0) = 0$ and $\partial \mathcal{G}(w - \mathcal{G}(w)) = \partial \mathcal{G}(w)$, and by the uniqueness result in Theorem 2, it follows that $w^0 = w - \mathcal{G}(w)$.

Proof of Proposition 2.
The proof is in Galichon and Salanié (2012), but we include it here for self-containedness. This connection between the $\mathcal{G}^*$ function and a matching model follows from manipulation of the variational problem in the definition of $\mathcal{G}^*$:
$$\mathcal{G}^*(p) = \sup_{w \in \mathbb{R}^{\mathcal{Y}}} \left\{ \sum_y p_y w_y - E_Q\left[ \max_{y \in \mathcal{Y}} (w_y + \varepsilon_y) \right] \right\} \tag{29}$$
$$= \sup_{w \in \mathbb{R}^{\mathcal{Y}}} \left\{ \sum_y p_y w_y + E_Q\Big[ \underbrace{\min_{y \in \mathcal{Y}} (-w_y - \varepsilon_y)}_{\equiv z(\varepsilon)} \Big] \right\}.$$
Defining $c(y, \varepsilon) \equiv -\varepsilon_y$, one can rewrite the above as
$$\mathcal{G}^*(p) = \sup_{w_y + z(\varepsilon) \leq c(y,\varepsilon)} \left\{ E_p[w_Y] + E_Q[z(\varepsilon)] \right\}. \tag{30}$$
As is well known from the results of Monge-Kantorovich (Villani (2003), Thm. 1.3), this is the dual problem for a mass transport problem. The corresponding primal problem is
$$\mathcal{G}^*(p) = \min_{Y \sim p, \ \varepsilon \sim \hat{Q}} E[c(Y, \varepsilon)],$$
which is equivalent to (16)-(18). Comparing Eqs. (29) and (30), we see that the subdifferential $\partial \mathcal{G}^*(p)$ is identified with those elements $w$ such that $(w, z)$, for some $z$, solves the dual problem (30).

Proof of Theorem 4.
(i) follows from Proposition 2 and the fact that if $w_y + z(\varepsilon) \leq c(y, \varepsilon)$, then $E_p[w_Y] + E_Q[z(\varepsilon)] = \mathcal{G}^*(p)$ if and only if $(w, z)$ is a solution to the dual problem.
(ii) follows from the fact that $-z(\varepsilon) = \sup_y \{w_y - c(y, \varepsilon)\} = \sup_y \{w_y + \varepsilon_y\}$, thus $E_Q[z(\varepsilon)] = 0$ is equivalent to $E_Q\left[\sup_y \{w_y + \varepsilon_y\}\right] = 0$, that is, $\mathcal{G}(w) = 0$.

Proof of Theorem 5.
We shall show that the vector of choice-specific value functions derived from the MTA estimation procedure, denoted $w^n$, converges to the true vector $w^0$. In our procedure, there are two sources of estimation error. The first is the sampling error in the vector of choice probabilities, denoted $p^n$. The second is the simulation error involved in the discretization of the distribution of $\varepsilon$; we let $Q_n$ denote this discretized distribution.

A distinctive aspect of our proof is that it utilizes the theory of mass transport, namely convergence results for sequences of mass transport problems. For $y \in \mathcal{Y}$, let $\iota_y$ denote the $|\mathcal{Y}|$-dimensional row vector with all zeros except a 1 in the $y$-th column. The discretized mass transport problem from which we obtain $w^n$ is
$$\sup_{\gamma \in \mathcal{M}(Q_n, p^n)} \int_{\mathbb{R}^d \times \mathbb{R}^d} (\iota \cdot \varepsilon) \, \gamma(d\varepsilon, d\iota), \tag{31}$$
where $\mathcal{M}(Q_n, p^n)$ denotes the set of joint (discrete) probability measures with marginal distributions $Q_n$ and $p^n$. In the above, $\iota$ denotes a random vector which is equal to $\iota_y$ with probability $p^n_y$, for $y \in \mathcal{Y}$. The dual problem used in the MTA procedure is
$$\inf_{z, w} \int z(\varepsilon) \, dQ_n(\varepsilon) + \sum_y w_y p^n_y \tag{32}$$
$$\text{s.t.} \quad z(\varepsilon) \geq \iota_y \cdot \varepsilon - w_y, \quad \forall y, \ \forall \varepsilon \tag{33}$$
$$\mathcal{G}_n(w) = 0, \tag{34}$$
where $\mathcal{G}_n(w) \equiv E_{Q_n}\left[ \max_y (w_y + \varepsilon_y) \right]$. We let $(z_n, w^n)$ denote solutions to this discretized dual problem (32). Recall (from the discussion in Section 2.3) that the extra constraint (34) in the dual problem just selects among the many dual optimizing arguments $(w^n, z_n)$ corresponding to the optimal primal solution $\gamma_n$, and so does not affect the primal problem.

Next we derive a more manageable representation of this constraint (34). From Fenchel's equality (Eq. (25)), we have $\sum_y p^n_y w^n_y = \mathcal{G}_n(w^n) + \mathcal{G}^*_n(p^n) = \mathcal{G}^*_n(p^n)$ (with $\mathcal{G}^*_n$ defined as the convex conjugate function of $\mathcal{G}_n$).
Moreover, from Proposition 2, we know that $\mathcal{G}^*_n(p^n)$ can be characterized as the optimized dual objective function in (32). Hence, we see that the constraint $\mathcal{G}_n(w^n) = 0$ is equivalent to $\int z_n(\varepsilon) \, dQ_n(\varepsilon) = 0$. We introduce this latter constraint directly and rewrite the dual program as
$$\inf_{z, w} \sum_y w_y p^n_y + \int z(\varepsilon) \, dQ_n(\varepsilon) \tag{35}$$
$$\text{s.t.} \quad z(\varepsilon) \geq \iota_y \cdot \varepsilon - w_y, \quad \forall y, \ \forall \varepsilon \tag{36}$$
$$\int z(\varepsilon) \, dQ_n(\varepsilon) = 0. \tag{37}$$

(Footnote: We note that, as discussed before, the discreteness of $Q_n$ implies that $(z_n, w^n)$ will not be uniquely determined, as the core of the assignment game for a finite market is not a singleton. But this does not affect the proof, as our arguments below hold for any sequence of selections $\{z_n, w^n\}_n$.)
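The role of the normalization constraint (37) can be illustrated numerically. The Python sketch below uses a small, made-up discretization: the draws, the probabilities `p_n` and the trial vector `w` are all hypothetical, and the sketch does not solve the linear program. It builds $z(\varepsilon) = \max_y \{\iota_y \cdot \varepsilon - w_y\}$, the smallest $z$ satisfying (36) for a given $w$, and checks that shifting $w$ by a constant leaves the dual objective (35) unchanged, while the shift can be chosen so that (37) holds.

```python
def z_of(w, eps):
    """Smallest z satisfying (36) at one draw: max_y {eps_y - w_y}."""
    return max(e - v for e, v in zip(eps, w))

def mean_z(w, draws):
    """Integral of z against Q_n, with Q_n uniform on the draws."""
    return sum(z_of(w, eps) for eps in draws) / len(draws)

def dual_objective(w, p, draws):
    """Objective (35): sum_y w_y p_y plus the Q_n-mean of z."""
    return sum(v * q for v, q in zip(w, p)) + mean_z(w, draws)

def normalize(w, draws):
    """Shift w by a constant so that the Q_n-mean of z is zero, as in (37)."""
    c = mean_z(w, draws)
    return [v + c for v in w]

# Hypothetical inputs: four draws of a three-dimensional shock vector.
draws = [(0.3, -1.2, 0.8), (-0.5, 0.4, 0.1), (1.1, 0.0, -0.7), (-0.2, 0.9, 0.6)]
p_n = [0.25, 0.5, 0.25]
w = [0.4, -0.3, 1.0]

w_norm = normalize(w, draws)
mean_z_norm = mean_z(w_norm, draws)
obj_before = dual_objective(w, p_n, draws)
obj_after = dual_objective(w_norm, p_n, draws)
```

The invariance relies only on the probabilities summing to one: adding a constant $c$ to every $w_y$ lowers $z$ by $c$ at every draw, so the two terms of (35) offset each other.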
We will demonstrate consistency by showing that $(z_n, w^n)$ converge a.s. to the dual optimizers in the "limit" dual problem, given by
$$\inf_{z, w} \sum_y w_y p_y \tag{38}$$
$$\text{s.t.} \quad z(\varepsilon) \geq \iota_y \cdot \varepsilon - w_y, \quad \forall y, \ \forall \varepsilon \tag{39}$$
$$\int z(\varepsilon) \, dQ = 0. \tag{40}$$
We denote the optimizers in this limit problem by $(w^0, z^0)$, where, by construction, $w^0$ are the "true" values of the choice-specific value functions. The difference between the discretized and limit dual problems is that $Q_n$ in the former has been replaced by $Q$, the continuous distribution of $\varepsilon$, and the estimated choice probabilities $p^n$ have been replaced by the limit $p$.

We proceed in two steps. First, we argue that the sequence of optimized dual programs (35) converges to the optimized limit dual program (38), a.s. Based upon this, we then argue that the sequence of dual optimizers, $(w^n, z_n)$, necessarily converges to the unique limit optimizers, $(w^0, z^0)$, a.s.

First step.
By the Kantorovich duality theorem, we know that the optimized values for the limit primal and dual programs coincide:
$$\sup_{\gamma \in \Pi(Q, p)} \int_{\mathbb{R}^d \times \mathbb{R}^d} (\iota \cdot \varepsilon) \, \gamma(d\varepsilon, d\iota) = \inf_{z, w} \sum_y w_y p_y + \int z(\varepsilon) \, dQ. \tag{41}$$
Moreover, both the primal and dual problems in the discretized case are finite-dimensional linear programming problems, and by the usual LP duality, the optimal values of the primal and dual problems for the discretized case also coincide:
$$\int_{\mathbb{R}^d \times \mathbb{R}^d} (\iota \cdot \varepsilon) \, \gamma_n(d\varepsilon, d\iota) = \sum_y w^n_y p^n_y + \int z_n(\varepsilon) \, dQ_n.$$
Given Assumption 1, and by Theorem 5.20 in Villani (2009), p. 77, we have that, up to a subsequence extraction, $\gamma_n$ (the optimizing argument of (31)) converges weakly. In addition, by Theorem 5.30 in Villani (2009), the left-hand side of (41) has a unique solution $\gamma$; hence, the whole sequence $\gamma_n$ must converge weakly to $\gamma$. This implies a.s. convergence of the value of the primal problems:
$$\int_{\mathbb{R}^d \times \mathbb{R}^d} (\iota \cdot \varepsilon) \, \gamma_n(d\varepsilon, d\iota) \to \int_{\mathbb{R}^d \times \mathbb{R}^d} (\iota \cdot \varepsilon) \, \gamma(d\varepsilon, d\iota), \quad \text{a.s.},$$
and, by duality, we must also have a.s. convergence of the value of the discretized dual problem to that of the limit problem:
$$\sum_y w^n_y p^n_y + \int z_n(\varepsilon) \, dQ_n \to \sum_y w^0_y p_y + \int z^0(\varepsilon) \, dQ, \quad \text{a.s.} \tag{42}$$

Second step.
Next, we show that the discretized dual minimizers $(z_n, w^n)$ converge a.s. For convenience, in what follows we will suppress the qualifier "a.s." from all the statements below. Let
$$\underline{w}_n = \min_y w^n_y. \tag{43}$$
From examination of the dual problem (35), we see that $z_n$ is the piecewise affine function
$$z_n(\varepsilon) = \max_y \{ \iota_y \cdot \varepsilon - w^n_y \}, \tag{44}$$
thus $z_n$ is $M$-Lipschitz with $M := \max_y |\iota_y| = 1$. Now observe that
$$z_n(\varepsilon) + \underline{w}_n = \max_y \{ \iota_y \cdot \varepsilon - w^n_y + \underline{w}_n \} \leq \max_y \{ \iota_y \cdot \varepsilon \} =: \bar{z}(\varepsilon) \tag{45}$$
and, letting $y'$ be the argument of the minimum in (43),
$$z_n(\varepsilon) + \underline{w}_n \geq \iota_{y'} \cdot \varepsilon - w^n_{y'} + \underline{w}_n = \iota_{y'} \cdot \varepsilon \geq \min_y \{ \iota_y \cdot \varepsilon \} =: \underline{z}(\varepsilon), \tag{46}$$
thus, by a combination of (45) and (46),
$$\underline{z}(\varepsilon) \leq z_n(\varepsilon) + \underline{w}_n \leq \bar{z}(\varepsilon). \tag{47}$$
By $\int z_n(\varepsilon) \, dQ_n(\varepsilon) = 0$ (integrating (47) against $Q_n$), we have that $\underline{w}_n$ is uniformly bounded, so that $z_n$ is uniformly sublinear: for some constant $C$, $|z_n(\varepsilon)| \leq C(1 + |\varepsilon|)$ for every $n$ and every $\varepsilon$. Hence the sequence $z_n$ is uniformly equicontinuous, and converges locally uniformly up to a subsequence extraction by Ascoli's theorem. Let this limit function be denoted $z^*$. By (42) and Theorem 2, we deduce that $z^0$, the optimizer in the limit dual problem, is unique, so that it must coincide with the limit function $z^*$.

By the definition of $(w^n, z_n)$ as optimizing arguments for (35), we have $\sum_y w^n_y p^n_y \leq \sum_y \underline{w}_n p^n_y + \int \bar{z}(\varepsilon) \, dQ_n(\varepsilon)$, or
$$\sum_y \left( w^n_y - \underline{w}_n \right) p^n_y \leq \int \bar{z}(\varepsilon) \, dQ_n(\varepsilon) = E_{Q_n} \bar{z}.$$
The second moment restrictions on $Q_n$ (condition (ii) in the theorem) imply that $E_{Q_n} \bar{z}(\varepsilon)$ exists and converges to $E_Q \bar{z}$. Hence, the nonnegative vectors $\left( w^n_y - \underline{w}_n \right)$ are bounded; accordingly, the vectors $\left( w^n_y \right)$ are themselves bounded. By the Bolzano-Weierstrass theorem, $w^n$ therefore converges, up to a subsequence, to some limit point $w^*$. This implies that $\sum_y w^n_y p^n_y \to \sum_y w^*_y p_y$ by bounded convergence.
By Theorem 2, we know that the limit point $w^*$ must coincide with $w^0$, which is the unique optimizer in the dual limit problem (38). Thus, we have shown that $w^n$ converges to $w^0$, a.s.

(Footnote: Although the support of $\varepsilon$ is not bounded, the locally uniform convergence of $z_n$ and the fact that the second moments of $Q_n$ are uniformly bounded are enough to conclude.)
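The convergence statement can be checked by simulation in a special case. The Python sketch below is an illustration under stated assumptions, not the estimation procedure itself: there are only two choices, the shocks are i.i.d. standard type-I extreme value, the choice probabilities are known exactly (so only the simulation error from $Q_n$ is present), and the two-choice matching problem is solved by sorting rather than by linear programming. It uses the convention of (23), in which choice 1 is selected when $w_1 + \varepsilon_1 > w_0 + \varepsilon_0$ and $w$ is normalized so that $\mathcal{G}_n(w) = 0$. Under these assumptions the normalized true payoffs are $w^0_y = \log p_y - \gamma$, with $\gamma$ Euler's constant. The function names and the midpoint selection are hypothetical choices.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # mean of a standard Gumbel draw

def gumbel(rng):
    """One standard type-I extreme value draw, as -log of an Exp(1) draw."""
    return -math.log(rng.expovariate(1.0))

def estimate_binary(p1, n, seed=0):
    """Normalized payoffs (w_0, w_1) from n simulated shock pairs."""
    rng = random.Random(seed)
    draws = [(gumbel(rng), gumbel(rng)) for _ in range(n)]
    # With two choices, the optimal matching assigns choice 1 to the
    # draws with the largest eps_1 - eps_0, in proportion p1.
    delta = sorted(e1 - e0 for e0, e1 in draws)
    k = round(n * (1.0 - p1))  # number of draws assigned to choice 0
    # Any threshold between the k-th and (k+1)-th order statistics supports
    # this assignment; the midpoint is one arbitrary selection.
    threshold = 0.5 * (delta[k - 1] + delta[k])
    w_tilde = (0.0, -threshold)  # choice 1 iff eps_1 - eps_0 > threshold
    # Normalize so that G_n(w) = 0, with G_n the Q_n-mean of max_y (w_y + eps_y).
    g_n = sum(max(w_tilde[0] + e0, w_tilde[1] + e1) for e0, e1 in draws) / n
    return (w_tilde[0] - g_n, w_tilde[1] - g_n)

p1 = 0.3
w_true = (math.log(1.0 - p1) - EULER_GAMMA, math.log(p1) - EULER_GAMMA)
w_hat = estimate_binary(p1, n=20000)
max_err = max(abs(a - b) for a, b in zip(w_hat, w_true))
```

Rerunning `estimate_binary` with larger `n` gives a rough sense of how quickly the simulation error shrinks; the interval between the two order statistics is the set of thresholds consistent with the discretized problem, in line with the non-uniqueness noted in the proof.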
Additional Figures
Design              RMSE(y = 0)       RMSE(y = 1)       R^2(y = 0)        R^2(y = 1)
N = 100, T = 100    0.5586 (3.7134)   0.2435 (0.1155)   0.3438 (0.7298)   0.7708 (0.2073)
N = 100, T = 500    0.1070 (0.0541)   0.1389 (0.0638)   0.7212 (0.2788)   0.9119 (0.0820)
N = 100, T = 1000   0.0810 (0.0376)   0.1090 (0.0425)   0.8553 (0.1285)   0.9501 (0.0352)
N = 200, T = 100    0.1244 (0.0594)   0.1642 (0.0628)   0.5773 (0.6875)   0.8736 (0.1112)
N = 200, T = 200    0.1177 (0.0736)   0.1500 (0.0816)   0.7044 (0.2813)   0.9040 (0.0842)
N = 500, T = 100    0.0871 (0.0375)   0.1162 (0.0430)   0.8109 (0.2468)   0.9348 (0.0650)
N = 500, T = 500    0.0665 (0.0261)   0.0829 (0.0290)   0.8899 (0.1601)   0.9678 (0.0374)
N = 1000, T = 100   0.0718 (0.0340)   0.0928 (0.0344)   0.8777 (0.1320)   0.9647 (0.0314)
N = 1000, T = 1000  0.0543 (0.0176)   0.0643 (0.0162)   0.9322 (0.0577)   0.9820 (0.0101)

Table 3
[Figure: box plot titled "Size of the identified set of payoff for choice y=1". Horizontal axis: number of discretized points, S; vertical axis: upper bound.]
Figure 5.
For each value of $S$, we plot the values of the differences $\max_{w \in \partial \mathcal{G}^*(p)} w - \min_{w \in \partial \mathcal{G}^*(p)} w$ across all values of $p \in \Delta$. In the box-plot, the central mark is the median, the edges of the box are the 25th and 75th percentiles, the whiskers extend to the most extreme data points not considered outliers, and outliers are plotted individually.
[Figure: box plot titled "Size of the identified set of payoff for choice y=2". Horizontal axis: number of discretized points, S; vertical axis: upper bound.]
Figure 6.
For each value of $S$, we plot the values of the differences $\max_{w \in \partial \mathcal{G}^*(p)} w - \min_{w \in \partial \mathcal{G}^*(p)} w$ across all values of $p \in \Delta$. In the box-plot, the central mark is the median, the edges of the box are the 25th and 75th percentiles, the whiskers extend to the most extreme data points not considered outliers, and outliers are plotted individually.

[Figure: frequency of engine replacement against x, mileage since last replacement (per 12,500 miles).]
Figure 7

[Figure: probability of engine replacement against x, mileage since last replacement (per 12,500 miles).]