Dynamic Optimal Choice When Rewards are Unbounded Below
Qingyin Ma (a) and John Stachurski (b)

(a) ISEM, Capital University of Economics and Business
(b) Research School of Economics, Australian National University

December 2, 2019
Abstract.
We propose a new approach to solving dynamic decision problems with rewards that are unbounded below. The approach involves transforming the Bellman equation in order to convert an unbounded problem into a bounded one. The major advantage is that, when the conditions stated below are satisfied, the transformed problem can be solved by iterating with a contraction mapping. While the method is not universal, we show by example that many common decision problems do satisfy our conditions.
JEL Classifications:
C61, E00
Keywords:
Dynamic programming, optimality

1. Introduction
Reward functions that are unbounded below have long been a stumbling block for recursive solution methods, due to a failure of the standard contraction mapping arguments first developed by Blackwell (1965). At the same time, such specifications are popular in economics and finance, due to their convenience and well-established properties. This issue is more than esoteric, since the Bellman equation for such problems can have multiple solutions that confound the search for optima. Computation of solutions, already challenging when the state space is large, becomes even more so when rewards are unbounded.

Here we propose a new approach to handling problems with values that are unbounded below. Instead of creating a new optimality theory, our approach proceeds by transforming the Bellman equation to convert these unbounded problems into bounded ones. The main advantage of this approach is that, when the conditions stated below are satisfied, the transformed problem can be solved using standard methods based around contraction mappings. The technical contribution of our paper lies in obtaining suitable conditions and providing a proof that the solution to the transformed problem is equal to the solution to the original one. While the method is not universal, we show by example that many well-known decision problems do satisfy our conditions.

[Footnote: We thank Takashi Kamihigashi and Yiannis Vailakis for valuable feedback and suggestions, as well as audience members at the Econometric Society meeting in Auckland in 2018 and the 2nd Conference on Structural Dynamic Models in Copenhagen. Financial support from ARC Discovery Grant DP120100321 is gratefully acknowledged. Email addresses: [email protected], [email protected].]

Our work contributes to a substantial existing literature on dynamic choice with unbounded rewards. The best known approach to such problems is the weighted supremum norm method, originally developed by Wessels (1977) and connected to economic modeling by Boyd (1990). This approach has been successful in treating many maximization problems where rewards are unbounded above. Unfortunately, as noted by many authors, this same approach typically fails when rewards are unbounded below.

This failure was a major motivation behind the development of the local contraction approach to dynamic programming, due to Rincón-Zapatero and Rodríguez-Palmero (2003), Martins-da-Rocha and Vailakis (2010) and, for the stochastic case, Matkowski and Nowak (2011). This local contraction method, which requires contractions on successively larger subsets of the state space, is ingenious and elegant but also relatively technical, which might be the cause of slow uptake on the part of applied economists. A second disadvantage in terms of applications is that the convergence results for value function iteration are not as sharp as with traditional dynamic programming.

Another valuable contribution is Jaśkiewicz and Nowak (2011), which explicitly admits problems with rewards that are unbounded below. In this setting, they show that the value function of a Markov decision process is a solution to the Bellman equation.
We strengthen their results by adding a uniqueness result and proving that value function iteration leads to an optimal policy. Both of these results are significant from an applied and computational perspective. Like Jaśkiewicz and Nowak (2011), we combine our methodology with the weighted supremum norm approach, so that we can handle problems that are both unbounded above and unbounded below.

[Footnote: See, for example, the discussions in Le Van and Vailakis (2005) or Jaśkiewicz and Nowak (2011). Alvarez and Stokey (1998) find some success handling certain problems that are unbounded below using weighted supremum norm methods, although they require a form of homogeneity that fails to hold in the applications we consider. Bäuerle and Jaśkiewicz (2018) extend the weighted supremum norm technique to risk sensitive preferences in a setting where utility is bounded below.]

Many other researchers have used transformations of the Bellman equation, including Rust (1987), Jovanovic (1982), Bertsekas (2017), Ma and Stachurski (2018) and Abbring et al. (2018). These transformations are typically aimed at improving economic intuition, estimation properties or computational efficiency. The present paper is, to the best of our knowledge, the first to consider transformations of the Bellman equation designed to solve dynamic programming problems with unbounded rewards.

The rest of our paper is structured as follows. Section 2 starts the exposition with typical examples. Section 3 presents theory and Section 4 gives additional applications. Most proofs are deferred to the appendix.

2. Example Applications
We first illustrate the methodology for converting unbounded problems to bounded ones in some common settings.

2.1. Application 1: Optimal Savings.
Consider an optimal savings problem where a borrowing constrained agent seeks to solve

\[
  \sup \, \mathbb{E} \sum_{t=0}^{\infty} \beta^t u(c_t)
\]

subject to the constraints

\[
  0 \le c_t \le w_t, \quad w_{t+1} = R(w_t - c_t) + y_{t+1} \quad \text{and} \quad (w_0, y_0) \text{ given}. \tag{1}
\]

Here $\beta \in (0, 1)$ is the discount factor, $c_t$, $w_t$ and $y_t$ are respectively consumption, wealth and non-financial income at time $t$, $R$ is the rate of return on financial income, and $u$ is the CRRA utility function defined by

\[
  u(c) = \frac{c^{1-\gamma}}{1-\gamma} \quad \text{with} \quad \gamma > 0. \tag{2}
\]

We are focusing on the case $\gamma > 1$. The income process $\{y_t\}$ is a Markov process with state space $Y \subset \mathbb{R}_+$ and stochastic kernel $P$ satisfying

\[
  \bar{u} := \inf_{y \in Y} \int u(y') \, P(y, dy') > -\infty. \tag{3}
\]

[Footnote: The timing associated with the wealth constraint in (1) is such that $y_{t+1}$ is excluded from the time $t$ information set, as in, say, Benhabib et al. (2015). One can modify the second constraint in (1) to an alternative timing such as $w_{t+1} = R(w_t - c_t + y_t)$ and the arguments below still go through after suitable modifications. An application along these lines is given in Section 2.3.]

[Footnote: Here $P(y, \cdot)$ can be interpreted as a transition probability. In particular, $P(y, A)$ represents the probability of transitioning from $y$ to the set $A$ in one step. See Section 3.1 for a formal definition.]

Condition (3) holds if, say,

• $\{y_t\}$ is a finite state Markov chain taking positive values (see, e.g., Açıkgöz (2018) and Cao (2018)), or
• $\{y_t\}$ is iid and $\mathbb{E} u(y_t) > -\infty$ (see, e.g., Benhabib et al. (2015)), or
• $\{y_t\}$ is a Markov switching process, say $y_t = \mu_t + \sigma_t \varepsilon_t$, where $\{\varepsilon_t\}$ is iid $N(0, 1)$ and $\{\mu_t\}$ and $\{\sigma_t\}$ are positive and driven by finite state Markov chains (see, e.g., Heathcote et al. (2010) and Kaplan and Violante (2010)).

The Bellman equation of this problem is

\[
  v(w, y) = \sup_{0 \le c \le w} \left\{ u(c) + \beta \int v(R(w - c) + y', y') \, P(y, dy') \right\}, \tag{4}
\]

where $w \in \mathbb{R}_+$ and $y \in Y$. Since $0 \le c_t \le w_t$, it is clear that the value function is unbounded below.
Put differently, if $v$ is a candidate value function, then even if $v$ is bounded, its image

\[
  Tv(w, y) = \sup_{0 \le c \le w} \left\{ u(c) + \beta \int v(R(w - c) + y', y') \, P(y, dy') \right\} \tag{5}
\]

under the Bellman operator is dominated by $u(w)$ plus some finite constant, and hence $Tv(w, y) \to -\infty$ as $w \to 0$ for each $y \in Y$.

Consider, however, the following transformation. Let $s := w - c$ and

\[
  g(y, s) := \beta \int v(Rs + y', y') \, P(y, dy') \tag{6}
\]

so that

\[
  v(w, y) = \sup_{0 \le s \le w} \{ u(w - s) + g(y, s) \}. \tag{7}
\]

We can eliminate the function $v$ from (7) by using the definition of $g$. The first step is to evaluate $v$ in (7) at $(Rs + y', y')$, which gives

\[
  v(Rs + y', y') = \sup_{0 \le s' \le Rs + y'} \{ u(Rs + y' - s') + g(y', s') \}.
\]

Now we take expectations on both sides of the last equality and multiply by $\beta$ to get

\[
  g(y, s) = \beta \int \sup_{0 \le s' \le Rs + y'} \{ u(Rs + y' - s') + g(y', s') \} \, P(y, dy'). \tag{8}
\]

This is a functional equation in $g$. We now introduce a modified Bellman operator $S$ such that any solution $g$ of (8) is a fixed point of $S$:

\[
  Sg(y, s) = \beta \int \sup_{0 \le s' \le Rs + y'} \{ u(Rs + y' - s') + g(y', s') \} \, P(y, dy'). \tag{9}
\]

Let $\mathcal{G}$ be the set of bounded measurable functions on $Y \times \mathbb{R}_+$. We claim that $S$ maps $\mathcal{G}$ into itself and, moreover, is a contraction of modulus $\beta$ with respect to the supremum norm.

To see that this is so, pick any $g \in \mathcal{G}$. Then $Sg$ is bounded above, since $\gamma > 1$ implies $u \le 0$, so that

\[
  Sg(y, s) \le \beta \Big( \sup_{c > 0} u(c) + \|g\| \Big) \le \beta \|g\|,
\]

where $\|\cdot\|$ is the supremum norm. More importantly, $Sg$ is bounded below. Indeed,

\[
  Sg(y, s) \ge \beta \int \sup_{0 \le s' \le Rs + y'} \{ u(Rs + y' - s') - \|g\| \} \, P(y, dy')
  = \beta \int \{ u(Rs + y') - \|g\| \} \, P(y, dy')
  \ge \beta \int u(y') \, P(y, dy') - \beta \|g\|
  \ge \beta \bar{u} - \beta \|g\|.
\]

Finally, $S$ is a contraction mapping, since, for any $g, h \in \mathcal{G}$, we have

\[
  \left| \sup_{0 \le s' \le Rs + y'} \{ u(Rs + y' - s') + g(y', s') \} - \sup_{0 \le s' \le Rs + y'} \{ u(Rs + y' - s') + h(y', s') \} \right|
  \le \sup_{0 \le s' \le Rs + y'} | g(y', s') - h(y', s') |
\]

and hence

\[
  | Sg(y, s) - Sh(y, s) | \le \beta \int \sup_{0 \le s' \le Rs + y'} | g(y', s') - h(y', s') | \, P(y, dy') \le \beta \|g - h\|.
\]

Taking the supremum over all $(y, s) \in Y \times \mathbb{R}_+$ yields $\|Sg - Sh\| \le \beta \|g - h\|$.

We have now shown that $S$ is a contractive self-map on $\mathcal{G}$. Most significant here is that $\mathcal{G}$ is a space of bounded functions. By Banach's contraction mapping theorem, $S$ has a unique fixed point $g^*$ in $\mathcal{G}$. Presumably, we can insert $g^*$ into the right hand side of the "Bellman equation" (8), compute the maximizer at each state and obtain the optimal savings policy. If a version of Bellman's principle of optimality applies to this modified Bellman equation, we also know that policies obtained in this way exactly coincide with optimal policies, so, if all of these conjectures are correct, we have a complete characterization of optimality.

A significant amount of theory must be put in place to make the preceding arguments work. In particular, the conjectures discussed immediately above regarding the validity of Bellman's principle of optimality vis-à-vis the modified Bellman equation are nontrivial, since the transformation in (6) that maps $v$ to $g$ is not bijective. As a result, some careful analysis is required before we can make firm conclusions regarding optimality. This is the task of Section 3.

A final comment on this application is that, for this particular problem, we can also use Euler equation methods, which circumvent some of the issues associated with unbounded rewards (see, e.g., Li and Stachurski (2014)). However, these methods are not applicable in many other settings, due to factors such as the existence of discrete choices. The next two applications illustrate this point.
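To make the fixed-point argument concrete, the iteration $g_{k+1} = S g_k$ from (9) can be sketched numerically. The following is a minimal illustration, not part of the paper's formal analysis: income follows a hypothetical two-state chain, savings live on a finite grid, and all parameter values are made up.

```python
import numpy as np

# Illustrative sketch (not from the paper): iterate the modified Bellman
# operator S in (9) on a grid. All numerical values below are hypothetical.
beta, R, gamma = 0.95, 1.02, 2.0              # discount factor, return, CRRA
y_vals = np.array([0.5, 1.5])                 # two-state income chain
P = np.array([[0.8, 0.2],
              [0.3, 0.7]])                    # stochastic kernel P(y, y')
s_grid = np.linspace(0.0, 10.0, 200)          # grid for savings s

def u(c):
    """CRRA utility, unbounded below as c -> 0 when gamma > 1."""
    return c ** (1 - gamma) / (1 - gamma)

def S(g):
    """Sg(y, s) = beta * sum_{y'} P(y, y') * max_{0 <= s' <= Rs + y'}
    { u(Rs + y' - s') + g(y', s') }; g is an (n_y, n_s) array."""
    Sg = np.empty_like(g)
    for i_y in range(len(y_vals)):
        for i_s, s in enumerate(s_grid):
            total = 0.0
            for i_yp, yp in enumerate(y_vals):
                w = R * s + yp                                 # next-period wealth
                n = np.searchsorted(s_grid, w, side="right")   # feasible s' indices
                c = np.maximum(w - s_grid[:n], 1e-10)          # implied consumption
                total += P[i_y, i_yp] * np.max(u(c) + g[i_yp, :n])
            Sg[i_y, i_s] = beta * total
    return Sg

g = np.zeros((len(y_vals), len(s_grid)))
errors = []
for _ in range(5):
    g_new = S(g)
    errors.append(np.max(np.abs(g_new - g)))
    g = g_new

# Successive sup-norm distances shrink by a factor of at most beta
ratios = [errors[k + 1] / errors[k] for k in range(4)]
```

Because the discretized operator inherits the modulus-$\beta$ contraction property, the successive sup-norm distances recorded in `ratios` should never exceed $\beta$.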
2.2. Application 2: Job Search.
As in McCall (1970), an unemployed worker can either accept the current job offer $w_t = z_t + \xi_t$ and work at that wage forever, or choose an outside option (e.g., irregular work in the informal sector) yielding $c_t = z_t + \zeta_t$ and continue to the next period. Here $z_t$ is a persistent component, while $\xi_t$ and $\zeta_t$ are transient components. We assume that $\{\xi_t\}$ and $\{\zeta_t\}$ are iid and lognormal, and

\[
  \ln z_{t+1} = \rho \ln z_t + \sigma \varepsilon_{t+1}, \quad \{\varepsilon_t\} \stackrel{iid}{\sim} N(0, 1). \tag{10}
\]

The worker's value function satisfies the Bellman equation

\[
  v(w, c, z) = \max \left\{ \frac{u(w)}{1 - \beta}, \; u(c) + \beta \, \mathbb{E}_z v(w', c', z') \right\}. \tag{11}
\]

Let $u$ be increasing, continuous, and unbounded below, with $u(w) \to -\infty$ as $w \to 0$, and let $u$ be bounded above. Moreover, we assume that either

\[
  \inf_{z > 0} \mathbb{E}_z u(w') > -\infty \quad \text{or} \quad \inf_{z > 0} \mathbb{E}_z u(c') > -\infty. \tag{12}
\]

Condition (12) is satisfied if $u$ is CRRA, say, since then $\mathbb{E} u(\xi_t)$ and $\mathbb{E} u(\zeta_t)$ are finite. Note that $v(w, c, z)$ is unbounded below since utility can be arbitrarily close to $-\infty$.

To shift to a bounded problem, we can proceed in a similar vein to our manipulation of the Bellman equation in the optimal savings case. First we set

\[
  g(z) := \beta \, \mathbb{E}_z v(w', c', z'),
\]

so that (11) can be written as

\[
  v(w, c, z) = \max \left\{ \frac{u(w)}{1 - \beta}, \; u(c) + g(z) \right\}.
\]

Next we use the definition of $g$ to eliminate $v$ from this last expression, which leads to the functional equation

\[
  g(z) = \beta \, \mathbb{E}_z \max \left\{ \frac{u(w')}{1 - \beta}, \; u(c') + g(z') \right\}. \tag{13}
\]

The corresponding fixed point operator is

\[
  Sg(z) = \beta \, \mathbb{E}_z \max \left\{ \frac{u(w')}{1 - \beta}, \; u(c') + g(z') \right\}. \tag{14}
\]

If $g$ is bounded above then clearly so is $Sg$. Moreover, if $g$ is bounded below by some constant $M$, then, by Jensen's inequality,

\[
  Sg(z) \ge \beta \max \left\{ \frac{\mathbb{E}_z u(w')}{1 - \beta}, \; \mathbb{E}_z u(c') + M \right\}.
\]
Condition (12) then implies that $Sg$ is also bounded below. An argument similar to the one adopted above for the optimal savings model proves that $S$ is a contraction mapping with respect to the supremum norm on a space of bounded functions (Section 3 gives details). Thus, we can proceed down essentially the same path we used for the optimal savings problem, with the same caveat that the modified Bellman operator $S$ and the original Bellman operator need to have the same connection to optimality, and all computational issues need to be clarified.
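As a purely illustrative check, the fixed point $g^*$ of (14) can be computed by successive approximation once the persistent state is discretized. Everything below — the grid, the kernel $Q$, the CRRA coefficient, and the lognormal parameters — is a hypothetical specification, not taken from the paper:

```python
import numpy as np

# Hypothetical discretization of the persistent state z with kernel Q,
# plus Monte Carlo draws for the lognormal transient components.
beta, gamma = 0.95, 2.0
z_grid = np.array([0.8, 1.0, 1.25])
Q = np.array([[0.70, 0.25, 0.05],
              [0.15, 0.70, 0.15],
              [0.05, 0.25, 0.70]])            # transition matrix for z -> z'

rng = np.random.default_rng(0)
xi = rng.lognormal(mean=0.0, sigma=0.2, size=1000)     # wage shock draws
zeta = rng.lognormal(mean=-0.5, sigma=0.2, size=1000)  # outside-option draws

def u(x):
    """CRRA with gamma > 1: bounded above by 0, unbounded below near 0."""
    return x ** (1 - gamma) / (1 - gamma)

def S(g):
    """Operator (14): Sg(z) = beta * E_z max{ u(w')/(1-beta), u(c') + g(z') },
    where w' = z' + xi' and c' = z' + zeta', approximated by Monte Carlo."""
    Sg = np.empty_like(g)
    for i in range(len(z_grid)):
        ev = 0.0
        for j, zp in enumerate(z_grid):
            stop = u(zp + xi) / (1 - beta)     # value of accepting forever
            cont = u(zp + zeta) + g[j]         # outside option + continuation
            ev += Q[i, j] * np.maximum(stop, cont).mean()
        Sg[i] = beta * ev
    return Sg

g = np.zeros(len(z_grid))
for _ in range(400):                           # modulus-beta contraction
    g = S(g)
g_star = g                                     # approximate fixed point of (14)
residual = np.max(np.abs(S(g_star) - g_star))
```

Since the discretized operator contracts with modulus $\beta$, four hundred iterations drive the residual $\|Sg - g\|$ down by roughly $\beta^{400}$, and the acceptance decision at any $(w, c, z)$ can then be read off by comparing $u(w)/(1-\beta)$ with $u(c) + g^*(z)$.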
2.3. Application 3: Optimal Default.

Consider an infinite horizon optimal savings problem with default, in the spirit of Arellano (2008) and a large related literature. A country with current assets $w_t$ chooses between continuing to participate in international financial markets and default. Output

\[
  y_t = y(z_t, \xi_t)
\]

is a function of a persistent component $\{z_t\}$ and an innovation $\{\xi_t\}$. The persistent component is a Markov process such as the one in (10) and the transient component $\{\xi_t\}$ is iid. To simplify the exposition, we assume that default leads to permanent exclusion from financial markets, with lifetime value

\[
  v_d(y, z) = \mathbb{E} \sum_{t=0}^{\infty} \beta^t u(y_t).
\]

Notice that $v_d$ satisfies the functional equation

\[
  v_d(y, z) = u(y) + \beta \, \mathbb{E}_z v_d(y', z').
\]

The value of continued participation in financial markets is

\[
  v_c(w, y, z) = \sup_{-b \le w' \le R(w + y)} \{ u(w + y - w'/R) + \beta \, \mathbb{E}_z v(w', y', z') \},
\]

where $b > 0$ and $v$ is the value function satisfying

\[
  v(w, y, z) = \max \{ v_d(y, z), \; v_c(w, y, z) \}.
\]

The utility function $u$ has the same properties as in Section 2.2. It is easy to see that $v$ is unbounded below since $u$ can be arbitrarily close to $-\infty$. However, we can convert this into a bounded problem, as the following analysis shows.

[Footnote: Recent examples include Aguiar and Amador (2019) and Aguiar et al. (2019).]
Let $i$ be a discrete choice variable taking values in $\{0, 1\}$, with 0 indicating default and 1 indicating continued participation. We define

\[
  g(z, w', i) :=
  \begin{cases}
    \beta \, \mathbb{E}_z v_d(y', z') & \text{if } i = 0 \\
    \beta \, \mathbb{E}_z v(w', y', z') & \text{if } i = 1
  \end{cases}
\]

so that, for $-b \le w' \le R(w + y)$, we have

\[
  v(w, y, z) = \max \left\{ u(y) + g(z, w', 0), \; \sup_{w'} \{ u(w + y - w'/R) + g(z, w', 1) \} \right\}.
\]

Eliminating the value function $v$ yields

\[
  g(z, w', 0) = \beta \, \mathbb{E}_z \{ u(y') + g(z', w', 0) \}
\]

and

\[
  g(z, w', 1) = \beta \, \mathbb{E}_z \max \left\{ u(y') + g(z', w', 0), \; \sup_{w''} \{ u(w' + y' - w''/R) + g(z', w'', 1) \} \right\},
\]

where $-b \le w'' \le R(w' + y')$. We can then define the fixed point operator $S$ corresponding to these functional equations.

If $g$ is bounded above by some constant $K$, then $Sg \le \sup_c u(c) + K$. More importantly, if $g$ is bounded below by some constant $M$, we obtain

\[
  Sg(z, w', 0) \ge \beta \, \mathbb{E}_z u(y') + \beta M
\]

and

\[
  Sg(z, w', 1) \ge \beta \, \mathbb{E}_z \max \{ u(y') + M, \; u(w' + y' + b/R) + M \}
  = \beta \, \mathbb{E}_z \max \{ u(y'), \; u(w' + y' + b/R) \} + \beta M.
\]

Hence, $Sg$ is bounded below by a finite constant if

\[
  \inf_z \mathbb{E}_z u(y') > -\infty. \tag{15}
\]

For example, (15) holds if $y_t = z_t + \xi_t$ where $\{z_t\}$ is positive and $\mathbb{E} u(\xi_t) > -\infty$. An argument similar to the one in Section 2.1 now proves that $S$ is a contraction with respect to the supremum norm (Section 3 gives details).
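A numerical sketch of this pair of functional equations, illustrative only and with entirely hypothetical grids, kernel, and output rule $y' = z' + \xi'$, is below. One useful structural observation: since $g(z, w', 0)$ does not actually involve $w'$, its equation is linear and can be solved directly, after which only the participation branch needs to be iterated:

```python
import numpy as np

# Purely illustrative sketch of the default model's modified Bellman system.
# All numerical values, grids and the output rule are made-up assumptions.
beta, R, b, gamma = 0.9, 1.05, 1.0, 2.0
z_grid = np.array([0.8, 1.0, 1.25])
Q = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.2, 0.7]])                 # kernel for z -> z'
w_grid = np.linspace(-b, 4.0, 30)               # asset grid, floor at -b
rng = np.random.default_rng(1)
xi = rng.lognormal(mean=0.0, sigma=0.1, size=100)

u = lambda c: c ** (1 - gamma) / (1 - gamma)    # CRRA with gamma > 1

# Default branch: g0(z) = beta * sum_j Q[z, j] (m[j] + g0[j]) is linear,
# where m[j] = E u(y') given z' = z_grid[j].
m = np.array([u(zp + xi).mean() for zp in z_grid])
g0 = np.linalg.solve(np.eye(3) - beta * Q, beta * Q @ m)

def S1(g1):
    """Participation branch: g1(z, w') = beta * E_z max{ u(y') + g0(z'),
    max_{-b <= w'' <= R(w' + y')} u(w' + y' - w''/R) + g1(z', w'') }."""
    Sg1 = np.zeros_like(g1)
    for i in range(len(z_grid)):
        for j, zp in enumerate(z_grid):
            yp = zp + xi                                    # output draws
            vd = u(yp) + g0[j]                              # default tomorrow
            res = w_grid[None, :, None] + yp[:, None, None] - w_grid[None, None, :] / R
            feas = w_grid[None, None, :] <= R * (w_grid[None, :, None] + yp[:, None, None])
            vals = np.where(feas, u(np.maximum(res, 1e-10)) + g1[j], -np.inf)
            vc = vals.max(axis=2)                           # best w'' per (draw, w')
            Sg1[i] += Q[i, j] * np.maximum(vd[:, None], vc).mean(axis=0)
    return beta * Sg1

g1 = np.zeros((len(z_grid), len(w_grid)))
for _ in range(80):                                         # modulus-beta contraction
    g1 = S1(g1)
residual = np.max(np.abs(S1(g1) - g1))
```

With $g_0$ fixed, $S_1$ is again a modulus-$\beta$ contraction in the supremum norm, so the residual shrinks geometrically; the default decision at any state is then recovered by comparing $u(y) + g_0(z)$ with the best continuation value.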
3. General Formulation

The preceding section showed how some unbounded problems can be converted to bounded problems by modifying the Bellman equation. The next step is to confirm the validity of such a modification in terms of the connection between the modified Bellman equation and optimal policies. We do this in a generic dynamic programming setting that contains the applications given above.
3.1. Theory.
For a given set $E$, let $\mathcal{B}(E)$ be the Borel subsets of $E$. For our purposes, a dynamic program consists of

• a nonempty set $X$ called the state space,
• a nonempty set $A$ called the action space,
• a nonempty correspondence $\Gamma$ from $X$ to $A$ called the feasible correspondence, along with the associated set of state-action pairs $D := \{ (x, a) \in X \times A : a \in \Gamma(x) \}$,
• a measurable map $r : D \to \mathbb{R} \cup \{-\infty\}$ called the reward function,
• a constant $\beta \in (0, 1)$ called the discount factor, and
• a stochastic kernel $Q$ governing the evolution of states.

Each period, an agent observes a state $x_t \in X$ and responds with an action $a_t \in \Gamma(x_t) \subset A$. The agent then obtains a reward $r(x_t, a_t)$, moves to the next period with a new state $x_{t+1}$, and repeats the process by choosing $a_{t+1}$ and so on. The state process updates according to $x_{t+1} \sim Q(x_t, a_t, \cdot)$.

[Footnote: Here a stochastic kernel corresponding to our controlled Markov process $\{(x_t, a_t)\}$ is a mapping $Q : D \times \mathcal{B}(X) \to [0, 1]$ such that (i) for each $(x, a) \in D$, $A \mapsto Q(x, a, A)$ is a probability measure on $\mathcal{B}(X)$, and (ii) for each $A \in \mathcal{B}(X)$, $(x, a) \mapsto Q(x, a, A)$ is a measurable function.]

Let $\Sigma$ denote the set of feasible policies, which we assume to be nonempty and define as all measurable maps $\sigma : X \to A$ satisfying $\sigma(x) \in \Gamma(x)$ for all $x \in X$. Given any policy $\sigma \in \Sigma$ and initial state $x = x_0 \in X$, the $\sigma$-value function $v_\sigma$ is defined by

\[
  v_\sigma(x) = \sum_{t=0}^{\infty} \beta^t \, \mathbb{E}_x \, r(x_t, \sigma(x_t)).
\]

We understand $v_\sigma(x)$ as the lifetime value of following policy $\sigma$ now and forever, starting from current state $x$.

The value function associated with this dynamic program is defined at each $x \in X$ by

\[
  v^*(x) = \sup_{\sigma \in \Sigma} v_\sigma(x). \tag{16}
\]

A feasible policy $\sigma^*$ is called optimal if $v_{\sigma^*} = v^*$ on $X$. The objective of the agent is to find an optimal policy that attains the maximum lifetime value.

To handle rewards that are unbounded above as well as below, we introduce a weighting function $\kappa$, which is a measurable function mapping $X$ to $[1, \infty)$. Let $\mathcal{G}$ be the set of measurable functions $g : D \to \mathbb{R}$ such that $g$ is bounded below and

\[
  \|g\|_\kappa := \sup_{(x, a) \in D} \frac{|g(x, a)|}{\kappa(x)} < \infty. \tag{17}
\]

The pair $(\mathcal{G}, \|\cdot\|_\kappa)$ is a Banach space (see, e.g., Bertsekas (2013)). Moreover, at each $x \in X$ and $(x, a) \in D$, we define

\[
  \bar{r}(x) := \sup_{a \in \Gamma(x)} r(x, a) \quad \text{and} \quad \ell(x, a) := \mathbb{E}_{x, a} \, \bar{r}(x'). \tag{18}
\]

Assumption 3.1. There exist constants $d \in \mathbb{R}_+$ and $\alpha \in (0, 1/\beta)$ such that $\bar{r}(x) \le d \kappa(x)$ and $\mathbb{E}_{x, a} \kappa(x') \le \alpha \kappa(x)$ for all $(x, a) \in D$.

Assumption 3.1 relaxes the standard weighted supremum norm assumptions (see, e.g., Wessels (1977) or Bertsekas (2013)), in the sense that the reward function is allowed to be unbounded from below.

Next, we define $S$ on $\mathcal{G}$ as

\[
  Sg(x, a) := \beta \, \mathbb{E}_{x, a} \sup_{a' \in \Gamma(x')} \{ r(x', a') + g(x', a') \}. \tag{19}
\]

Given $g \in \mathcal{G}$, a feasible policy $\sigma$ is called $g$-greedy if

\[
  r(x, \sigma(x)) + g(x, \sigma(x)) = \sup_{a \in \Gamma(x)} \{ r(x, a) + g(x, a) \} \quad \text{for all } x \in X. \tag{20}
\]

Although the reward function is potentially unbounded below, the dynamic program can be solved by the operator $S$, as the following theorem shows.

Theorem 3.1.
If Assumption 3.1 holds and $\ell$ is bounded below, then

(1) $S\mathcal{G} \subset \mathcal{G}$ and $S$ is a contraction mapping on $(\mathcal{G}, \|\cdot\|_\kappa)$.
(2) $S$ admits a unique fixed point $g^*$ in $\mathcal{G}$.
(3) $S^k g$ converges to $g^*$ at rate $O((\alpha\beta)^k)$ under $\|\cdot\|_\kappa$.
(4) If there exists a closed subset $\mathcal{G}_0$ of $\mathcal{G}$ such that $S\mathcal{G}_0 \subset \mathcal{G}_0$ and a $g$-greedy policy exists for each $g \in \mathcal{G}_0$, then, in addition,
  (a) $g^*$ is an element of $\mathcal{G}_0$ and satisfies
  \[
    g^*(x, a) = \beta \, \mathbb{E}_{x, a} v^*(x') \quad \text{and} \quad v^*(x) = \max_{a \in \Gamma(x)} \{ r(x, a) + g^*(x, a) \}.
  \]
  (b) At least one optimal policy exists.
  (c) A feasible policy is optimal if and only if it is $g^*$-greedy.
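The theorem can be illustrated in the simplest possible case: a small finite MDP with $\kappa \equiv 1$ (so that $\alpha = 1$ and the rate in part (3) is $O(\beta^k)$). The sketch below iterates $S$ and then reads off a $g^*$-greedy policy as in part (4); the states, actions, rewards and kernel are all hypothetical, and the code is illustrative rather than part of the paper's analysis:

```python
import numpy as np

# Hypothetical finite MDP: 3 states, 2 actions, kappa = 1 (so alpha = 1).
beta = 0.9
r = np.array([[0.0, -5.0],
              [1.0, -1.0],
              [2.0, 0.5]])                    # r[x, a]
Q = np.zeros((3, 2, 3))
Q[:, 0] = [0.6, 0.3, 0.1]                     # action 0: drift toward low states
Q[:, 1] = [0.1, 0.3, 0.6]                     # action 1: drift toward high states

def S(g):
    """Sg(x, a) = beta * sum_{x'} Q(x, a, x') max_{a'} { r(x', a') + g(x', a') },
    a finite-state instance of the operator in (19)."""
    h = (r + g).max(axis=1)                   # inner maximization over a'
    return beta * np.einsum('xay,y->xa', Q, h)

g = np.zeros((3, 2))
for _ in range(500):                          # converges at rate O(beta^k) here
    g = S(g)

# Part (4): v*(x) = max_a { r(x, a) + g*(x, a) }, and any g*-greedy policy
# (an argmax below) is optimal.
v_star = (r + g).max(axis=1)
sigma_star = (r + g).argmax(axis=1)

# Consistency check: v* also satisfies the ordinary Bellman equation
Tv = (r + beta * np.einsum('xay,y->xa', Q, v_star)).max(axis=1)
```

Note that $g^*$ is a function of state-action pairs rather than states, so the greedy policy falls out of a single `argmax` once the fixed point is in hand, with no further expectation to compute.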
3.2. Sufficient Conditions.

Consider a dynamic programming problem

\[
  \max \; \mathbb{E} \sum_{t=0}^{\infty} \beta^t r(w_t, s_t) \tag{21}
\]

subject to

\[
  0 \le s_t \le w_t, \quad w_{t+1} = f(s_t, \eta_{t+1}), \quad \eta_t = h(z_t, \varepsilon_t) \quad \text{and} \quad (w_0, z_0) \text{ given}. \tag{22}
\]

Here $z$ and $\varepsilon$ correspond respectively to a Markov process $\{z_t\}$ on $Z$ and an iid process $\{\varepsilon_t\}$, $f$ and $h$ are nonnegative continuous functions, and $f$ is increasing in $s$. Furthermore, $Z$ and the range space of $\{\eta_t\}$ are Borel subsets of finite-dimensional Euclidean spaces, and the stochastic kernel $P$ corresponding to $\{z_t\}$ is Feller.

This problem can be placed in our framework by setting

\[
  x := (w, z), \quad a := s, \quad X := \mathbb{R}_+ \times Z, \quad A := \mathbb{R}_+, \quad \Gamma(x) := [0, w]
\]

and $D := \{ (w, z, s) \in \mathbb{R}_+ \times Z \times \mathbb{R}_+ : 0 \le s \le w \}$. Suppose that the reward function $r : D \to \mathbb{R} \cup \{-\infty\}$ is increasing in $w$ and decreasing in $s$, that $r$ is continuous on the interior of $D$ and that, if $r$ is bounded below, it is continuous. Recall $\kappa$ defined in Assumption 3.1. Let

\[
  \ell(z) := \mathbb{E}_z \, r(f(0, \eta'), 0) \quad \text{and} \quad \kappa_e(z, s) := \mathbb{E}_{z, s} \, \kappa(w', z').
\]

Let $\mathcal{G}_0$ be the set of functions $g$ in $\mathcal{G}$ that are increasing in their last argument and continuous. Notice that, in the current setting, $S$ defined on $\mathcal{G}_0$ is given by

\[
  Sg(z, s) = \beta \, \mathbb{E}_{z, s} \max_{s' \in [0, w']} \{ r(w', s') + g(z', s') \}.
\]

Theorem 3.1 is applicable in the current setting, as the following result illustrates.
Proposition 3.2.
If Assumption 3.1 holds for some continuous functions $\kappa$ and $\kappa_e$, and $\ell$ is continuous and bounded below, then $S$ is a contraction mapping on $(\mathcal{G}_0, \|\cdot\|_\kappa)$ and the conclusions of Theorem 3.1 hold.

4. Applications
In this section, we complete the discussion of all applications in Section 2. We also extend the optimality results of Benhabib et al. (2015) by adding a persistent component to labor income and returns.

4.1. Optimal Savings (Continued).

Recall the optimal savings problem of Section 2.1. This problem can be placed into the framework of Section 3.2 by letting

\[
  \eta = z := y, \quad r(w, s) := u(w - s), \quad f(s, \eta') := Rs + \eta' \quad \text{and} \quad h(z, \varepsilon) := z.
\]

To establish the desired properties, it remains to verify the conditions of Proposition 3.2. Since we have shown that $S\mathcal{G} \subset \mathcal{G}$, where $\mathcal{G}$ is the set of bounded measurable functions on $Y \times \mathbb{R}_+$, we can simply set $\kappa \equiv 1$, in which case $\kappa$ and $\kappa_e$ are continuous functions. Moreover, note that

\[
  \ell(y) = \mathbb{E}_y u(y') = \int u(y') \, P(y, dy'),
\]

which is bounded below by (3). As a result, all the conclusions of Theorem 3.1 hold as long as $y \mapsto \int u(y') \, P(y, dy')$ is continuous. In particular, when this further condition holds, $S$ is a contraction mapping on $(\mathcal{G}, \|\cdot\|)$ with unique fixed point $g^*$, and a feasible policy is optimal if and only if it is $g^*$-greedy. Here $\mathcal{G}_0$ is the set of bounded continuous functions on $Y \times \mathbb{R}_+$ that are increasing in their last argument.

[Footnote: In other words, $z \mapsto \int h(z') \, P(z, dz')$ is bounded and continuous whenever $h$ is.]

4.2. Job Search (Continued).
Recall the job search problem of Section 2.2. This problem fits into the framework of Section 3.1 if we let $a$ be a discrete choice variable taking values in $\{0, 1\}$, where 0 denotes the decision to stop and 1 represents the decision to continue,

\[
  x := (w, c, z), \quad X := (0, \infty)^3, \quad A := \{0, 1\}, \quad \Gamma(x) := \{0, 1\}, \quad D := (0, \infty)^3 \times \{0, 1\},
\]

and the reward function $r(x, a)$ be

\[
  r(w, c, a) := \frac{u(w)}{1 - \beta} \;\; \text{if } a = 0 \quad \text{and} \quad r(w, c, a) := u(c) \;\; \text{if } a = 1.
\]

We have shown that $S\mathcal{G} \subset \mathcal{G}$, where $\mathcal{G}$ is the set of bounded measurable functions on $(0, \infty)$. Hence, Assumption 3.1 holds with $\kappa \equiv 1$. Note that in this case, the function $\ell(x, a)$ reduces to

\[
  \ell(z) = \mathbb{E}_z \max \{ u(w')/(1 - \beta), \; u(c') \}.
\]

Then $\ell$ is bounded below by Jensen's inequality and (12). Since in addition the action set is finite, a $g$-greedy policy always exists for all $g \in \mathcal{G}$. Let $\mathcal{G}_0 := \mathcal{G}$. The analysis above implies that all the conclusions of Theorem 3.1 hold.

4.3. Optimal Default (Continued).
Recall the optimal default problem studied in Section 2.3. This setting is a special case of our framework. In particular,

\[
  x := (w, y, z), \quad a := (w', i), \quad X := [-b, \infty) \times Y \times Z \quad \text{and} \quad A := [-b, \infty) \times \{0, 1\},
\]

where $i$ is a discrete choice variable taking values in $\{0, 1\}$, and $Y$ and $Z$ are respectively the range spaces of $\{y_t\}$ and $\{z_t\}$. The reward function $r$ reduces to

\[
  r(w, y, w', i) :=
  \begin{cases}
    u(y) & \text{if } i = 0, \\
    u(w + y - w'/R) & \text{if } i = 1.
  \end{cases}
\]

Since $S\mathcal{G} \subset \mathcal{G}$, where $\mathcal{G}$ is the set of bounded measurable functions on $Z \times [-b, \infty) \times \{0, 1\}$, Assumption 3.1 holds for $\kappa \equiv 1$. Moreover, $\ell$ satisfies

\[
  \ell(z, w') = \mathbb{E}_z \max \{ u(y'), \; u(w' + y' + b/R) \} \ge \mathbb{E}_z u(y'),
\]

which is bounded below by (15). Let $\mathcal{G}_0$ be the set of functions in $\mathcal{G}$ that are increasing in their second-to-last argument and continuous. Through similar steps to the proof of Proposition 3.2, one can show that $S\mathcal{G}_0 \subset \mathcal{G}_0$ and a $g$-greedy policy exists for all $g \in \mathcal{G}_0$. As a result, all the conclusions of Theorem 3.1 are true.

4.4. Optimal Savings with Capital Income Risk.
Consider an optimal savings problem with capital income risk (see, e.g., Benhabib et al. (2015)). The setting is similar to that of Section 2.1, except that the rate of return to wealth is stochastic. In particular, the constraint (1) now becomes

\[
  0 \le s_t \le w_t, \quad w_{t+1} = R_{t+1} s_t + y_{t+1} \quad \text{and} \quad (w_0, z_0) \text{ given},
\]

where $w_t$ is wealth and $s_t$ is the amount of saving, while $\{R_t\}$ and $\{y_t\}$ are respectively the rate of return to wealth and the non-financial income, which satisfy

\[
  R_t = h_R(z_t, \xi_t) \quad \text{and} \quad y_t = h_y(z_t, \zeta_t).
\]

Here $\{z_t\}$ is a finite state Markov chain, and $\{\xi_t\}$ and $\{\zeta_t\}$ are iid innovation processes. The importance of these features for wealth dynamics is highlighted in Fagereng et al. (2016) and Hubmer et al. (2018), among others.

This problem fits into the framework of Section 3.2 by setting

\[
  \eta := (R, y), \quad \varepsilon_t := (\xi_t, \zeta_t), \quad r(w, s) := u(w - s) \quad \text{and} \quad f(s, \eta') := Rs + y'.
\]

In this case,

\[
  \ell(z) = \mathbb{E}_z u(y') \quad \text{and} \quad Sg(z, s) = \beta \, \mathbb{E}_{z, s} \max_{s' \in [0, w']} \{ u(w' - s') + g(z', s') \}.
\]

Consider, for example, the CRRA utility in (2). In this case, Assumption 3.1 holds with $\kappa \equiv 1$, and $\mathcal{G}_0$ reduces to the set of bounded continuous functions on $Z \times \mathbb{R}_+$ that are increasing in their last argument. The conclusions of Theorem 3.1 hold if $z \mapsto \mathbb{E}_z u(y')$ is continuous and bounded below.

5. Appendix
Let $\mathcal{V}$ (resp., $\mathcal{V}_0$) be the set of measurable functions $v : X \to \mathbb{R} \cup \{-\infty\}$ such that $(x, a) \mapsto \beta \, \mathbb{E}_{x, a} v(x')$ is in $\mathcal{G}$ (resp., $\mathcal{G}_0$), and let $\mathcal{H}$ (resp., $\mathcal{H}_0$) be the set of measurable functions $h : D \to \mathbb{R} \cup \{-\infty\}$ such that $h = r + g$ for some $g$ in $\mathcal{G}$ (resp., $\mathcal{G}_0$). Next, we define the operators $W_0$, $W_1$ and $M$ respectively on $\mathcal{V}$, $\mathcal{G}$ and $\mathcal{H}$ as

\[
  W_0 v(x, a) := \beta \, \mathbb{E}_{x, a} v(x'), \quad W_1 g(x, a) := r(x, a) + g(x, a), \quad \text{and} \quad M h(x) := \sup_{a \in \Gamma(x)} h(x, a).
\]

Then $S$ in (19) satisfies $S = W_0 M W_1$ on $\mathcal{G}$.

Proof of Theorem 3.1.
To see that claim (1) holds, we first show that $S\mathcal{G} \subset \mathcal{G}$. Fix $g \in \mathcal{G}$. By the definition of $\mathcal{G}$, there is a lower bound $g_0 \in \mathbb{R}$ such that $g \ge g_0$. Then

\[
  Sg(x, a) \ge \beta \, \mathbb{E}_{x, a} \sup_{a' \in \Gamma(x')} \{ r(x', a') + g_0 \}
  = \beta \left[ \mathbb{E}_{x, a} \sup_{a' \in \Gamma(x')} r(x', a') + g_0 \right]
  = \beta \left[ \mathbb{E}_{x, a} \bar{r}(x') + g_0 \right]
  = \beta \left[ \ell(x, a) + g_0 \right].
\]

Since by assumption $\ell$ is bounded below, so is $Sg$. Moreover, by Assumption 3.1,

\[
  Sg(x, a) \le \beta \, \mathbb{E}_{x, a} \Big\{ \bar{r}(x') + \sup_{a' \in \Gamma(x')} g(x', a') \Big\}
  \le \beta \, \mathbb{E}_{x, a} \{ (d + \|g\|_\kappa) \kappa(x') \}
  \le \alpha\beta (d + \|g\|_\kappa) \kappa(x)
\]

for all $(x, a) \in D$. Hence, $Sg/\kappa$ is bounded above. Since in addition $Sg$ is bounded below and $\kappa \ge 1$, we have $\|Sg\|_\kappa < \infty$. We have now shown that $Sg \in \mathcal{G}$.

Next, we show that $S$ is a contraction mapping on $(\mathcal{G}, \|\cdot\|_\kappa)$. Fix $g_1, g_2 \in \mathcal{G}$. Note that for all $(x, a) \in D$, we have

\[
  | Sg_1(x, a) - Sg_2(x, a) |
  = \left| \beta \, \mathbb{E}_{x, a} \sup_{a' \in \Gamma(x')} \{ r(x', a') + g_1(x', a') \} - \beta \, \mathbb{E}_{x, a} \sup_{a' \in \Gamma(x')} \{ r(x', a') + g_2(x', a') \} \right|
\]
\[
  \le \beta \, \mathbb{E}_{x, a} \left| \sup_{a' \in \Gamma(x')} \{ r(x', a') + g_1(x', a') \} - \sup_{a' \in \Gamma(x')} \{ r(x', a') + g_2(x', a') \} \right|
  \le \beta \, \mathbb{E}_{x, a} \sup_{a' \in \Gamma(x')} | g_1(x', a') - g_2(x', a') |
\]
\[
  \le \beta \|g_1 - g_2\|_\kappa \, \mathbb{E}_{x, a} \kappa(x')
  \le \alpha\beta \|g_1 - g_2\|_\kappa \, \kappa(x),
\]

where the last inequality follows from Assumption 3.1. Then we have $\|Sg_1 - Sg_2\|_\kappa \le \alpha\beta \|g_1 - g_2\|_\kappa$. Since $\alpha\beta < 1$, $S$ is a contraction mapping on $(\mathcal{G}, \|\cdot\|_\kappa)$ and claim (1) is verified.

Claims (2)–(3) follow immediately from claim (1) and the Banach contraction mapping theorem. Regarding claim (4), since $\mathcal{G}_0$ is a closed subset of $\mathcal{G}$ and $S\mathcal{G}_0 \subset \mathcal{G}_0$, $S$ is also a contraction mapping on $(\mathcal{G}_0, \|\cdot\|_\kappa)$ and the unique fixed point $g^*$ of $S$ is indeed in $\mathcal{G}_0$. Based on Proposition 2 of Ma and Stachurski (2018), the Bellman operator $T := M W_1 W_0$ maps elements of $\mathcal{V}$ into itself and has a unique fixed point $\bar{v}$ in $\mathcal{V}$ that satisfies $\bar{v} = M W_1 g^*$ and $g^* = W_0 \bar{v}$.

To verify part (a) of claim (4), it remains to show that $\bar{v} = v^*$. For all $x_0 \in X$ and $\sigma \in \Sigma$, we have

\[
  \bar{v}(x_0) \ge r(x_0, \sigma(x_0)) + \beta \, \mathbb{E}_{x_0, \sigma(x_0)} \bar{v}(x_1)
  \ge r(x_0, \sigma(x_0)) + \beta \, \mathbb{E}_{x_0, \sigma(x_0)} \left\{ r(x_1, \sigma(x_1)) + \beta \, \mathbb{E}_{x_1, \sigma(x_1)} \bar{v}(x_2) \right\}
\]
\[
  \ge \cdots \ge \sum_{t=0}^{T} \beta^t \, \mathbb{E}_{x_0, \sigma(x_0)} \cdots \mathbb{E}_{x_{t-1}, \sigma(x_{t-1})} r(x_t, \sigma(x_t)) + \beta^{T+1} \mathbb{E}_{x_0, \sigma(x_0)} \cdots \mathbb{E}_{x_T, \sigma(x_T)} \bar{v}(x_{T+1})
\]
\[
  = \sum_{t=0}^{T} \beta^t \, \mathbb{E}_{x_0} r(x_t, \sigma(x_t)) + \beta^{T} \mathbb{E}_{x_0, \sigma(x_0)} \cdots \mathbb{E}_{x_{T-1}, \sigma(x_{T-1})} g^*(x_T, \sigma(x_T)). \tag{23}
\]

Notice that, by Assumption 3.1, we have

\[
  \left| \beta^T \mathbb{E}_{x_0, \sigma(x_0)} \cdots \mathbb{E}_{x_{T-1}, \sigma(x_{T-1})} g^*(x_T, \sigma(x_T)) \right|
  \le \beta^T \mathbb{E}_{x_0, \sigma(x_0)} \cdots \mathbb{E}_{x_{T-1}, \sigma(x_{T-1})} | g^*(x_T, \sigma(x_T)) |
\]
\[
  \le \beta^T \mathbb{E}_{x_0, \sigma(x_0)} \cdots \mathbb{E}_{x_{T-1}, \sigma(x_{T-1})} \|g^*\|_\kappa \, \kappa(x_T)
  \le \beta^T \alpha^T \|g^*\|_\kappa \, \kappa(x_0)
  = (\alpha\beta)^T \|g^*\|_\kappa \, \kappa(x_0) \to 0 \quad \text{as } T \to \infty.
\]

Letting $T \to \infty$, (23) then implies that $\bar{v}(x_0) \ge v_\sigma(x_0)$. Since $x_0 \in X$ and $\sigma \in \Sigma$ are arbitrary, we have $\bar{v} \ge v^*$. Moreover, since $g^* = W_0 \bar{v}$ and there exists a $g^*$-greedy policy $\sigma^*$ by assumption, all the inequalities in (23) hold with equality once we let $\sigma = \sigma^*$. In other words, we have $\bar{v} = v_{\sigma^*} \le v^*$. In summary, we have shown that $\bar{v} = v^*$. Hence, $g^* = W_0 v^*$ and $v^* = M W_1 g^*$, and part (a) of claim (4) holds.

Since we have shown that $v^*$ is the unique fixed point of $T$ in $\mathcal{V}$, by Theorem 1 of Ma and Stachurski (2018), the set of optimal policies is nonempty, and a feasible policy is optimal if and only if it is $v^*$-greedy. Since in addition $g^* = W_0 v^*$, parts (b) and (c) of claim (4) hold. ∎

Next, we aim to prove Proposition 3.2. For all $g \in \mathcal{G}_0$ and $(w, z) \in X$, we define

\[
  h_g(w, z) := \max_{0 \le s \le w} \{ r(w, s) + g(z, s) \} \quad \text{and} \quad M_g(w, z) := \{ s \in [0, w] : h_g(w, z) = r(w, s) + g(z, s) \}.
\]

The following result is helpful in applications for verifying $S\mathcal{G}_0 \subset \mathcal{G}_0$.

Lemma 5.1.
For all $g \in G$, $h_g$ and $M_g$ satisfy the following properties: (a) $h_g$ is well defined and increasing in $w$, (b) $h_g$ is continuous on $(0, \infty) \times Z$, (c) $h_g$ is continuous on $X$ if $r$ is bounded below, and (d) $M_g$ is nonempty, compact-valued, and upper hemicontinuous.

Proof. Fix $g \in G$. Since $g$ is bounded below, $h_g(0, z) = r(0, 0) + g(z, 0) \in \mathbb{R} \cup \{-\infty\}$ and $h_g$ is well defined at $w = 0$. Now consider $w > 0$. Let $D^\circ$ be the interior of $D$. By assumption, either

(i) $r$ is continuous on $D^\circ$ and $\lim_{s \to w} r(w, s) = -\infty$ for all $w > 0$, or

(ii) $r$ is continuous and bounded below.

In either scenario, since $g$ is continuous, the maximum in the definition of $h_g$ is attained at some $s \in [0, w]$. Hence, $h_g$ is well defined for all $w > 0$. Regarding monotonicity, let $w_1, w_2 \in \mathbb{R}_+$ with $w_1 < w_2$. By the monotonicity of $r$, we have
$$
h_g(w_1, z) \le \max_{s \in [0, w_1]} \{ r(w_2, s) + g(z, s) \} \le \max_{s \in [0, w_2]} \{ r(w_2, s) + g(z, s) \} = h_g(w_2, z).
$$
Hence, claim (a) holds. Claims (b)–(d) follow from Berge's theorem of the maximum, adjusted to accommodate objective functions that may take the value $-\infty$. $\square$
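To make the mechanics above concrete, the following Python sketch works through a purely hypothetical deterministic special case with $r(w, s) = \log(w - s)$, which falls under scenario (i): the objective diverges to $-\infty$ as $s \to w$, yet the maximum defining $h_g$ is attained. It then iterates the induced operator $(Sg)(s) = \beta \, h_g(f(s))$ on a grid. The transition $f$, the discount factor, the grid, and the initial guess for $g$ are all illustrative choices, not the paper's model or calibration.

```python
import numpy as np

# Hypothetical primitives (illustrative only, not the paper's model):
#   r(w, s) = log(w - s): continuous on the interior of D, -> -inf as s -> w
#   f(s) = sqrt(s) + 0.1: next-period wealth, bounded away from zero
beta = 0.95
s_grid = np.linspace(0.0, 2.0, 200)            # grid of feasible savings levels

def f(s):
    return np.sqrt(s) + 0.1

def h(g, w):
    """h_g(w): maximize r(w, s') + g(s') over grid points s' < w; return max and argmax."""
    feasible = s_grid[s_grid < w]              # keep consumption w - s' > 0
    objective = np.log(w - feasible) + np.interp(feasible, s_grid, g)
    i = int(np.argmax(objective))              # max attained despite the -inf boundary
    return objective[i], feasible[i]

def S(g):
    """Deterministic analog of the transformed operator: (Sg)(s) = beta * h_g(f(s))."""
    return np.array([beta * h(g, f(s))[0] for s in s_grid])

# The maximum in h_g is attained at an interior point for this guess of g
g = np.log(1.0 + s_grid)                       # a continuous, increasing guess
val, s_star = h(g, 1.5)                        # argmax near s' = 0.25 here

# Successive approximation: the sup-norm gap shrinks by roughly beta each step
for _ in range(200):
    g_new = S(g)
    gap = np.abs(g_new - g).max()
    g = g_new
```

In this bounded deterministic special case the successive-approximation gap contracts at rate roughly $\beta$ per iteration, mirroring the contraction property that the transformation is designed to deliver in the general setting.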
Proof of Proposition 3.2. $\ell$ is bounded below since, by the monotonicity of $f$ and $r$,
$$
\ell(x, a) = \mathbb{E}_{z, s} \, r(w', 0) \ge \mathbb{E}_z \, r(f(0, \eta'), 0),
$$
which is bounded below by assumption. Moreover, it is obvious that $\hat{G}$ is a closed subset of $G$. Existence of $g$-greedy policies for $g$ in $\hat{G}$ has been verified by Lemma 5.1. It remains to show that $S \hat{G} \subset \hat{G}$. For fixed $g \in \hat{G}$, Theorem 3.1 implies that $S g \in G$. To see that $S g$ is increasing in its last argument and continuous, note that by Lemma 5.1, $h_g$ is continuous and increasing in $w$. For all $s_1, s_2 \in A$ with $s_1 \le s_2$, the monotonicity of $f$ implies that
$$
S g(z, s_1) = \beta \, \mathbb{E}_{z, s_1} h_g(w', z') = \beta \, \mathbb{E}_z \, h_g(f(s_1, \eta'), z') \le \beta \, \mathbb{E}_z \, h_g(f(s_2, \eta'), z') = \beta \, \mathbb{E}_{z, s_2} h_g(w', z') = S g(z, s_2).
$$
Hence, $S g$ is increasing in its last argument. In addition, the definition of $\hat{G}$ and the monotonicity of $r$ and $f$ imply that
$$
r(f(0, \eta'), 0) - \alpha_1 \le h_g(w', z') \le \alpha_2 \, \kappa(w', z')
$$
for some $\alpha_1, \alpha_2 \in \mathbb{R}_+$. Since $\kappa$ and $\ell$ are continuous and the stochastic kernel $P$ is Feller, Fatou's lemma implies that $S g(z, s) = \beta \, \mathbb{E}_{z, s} h_g(w', z')$ is continuous. $\square$

References

Abbring, J. H., J. R. Campbell, J. Tilly, and N. Yang (2018): "Very Simple Markov-Perfect Industry Dynamics: Theory," Econometrica, 86, 721–735.
Açıkgöz, Ö. T. (2018): "On the Existence and Uniqueness of Stationary Equilibrium in Bewley Economies with Production," Journal of Economic Theory, 173, 18–55.
Aguiar, M. and M. Amador (2019): “A Contraction for Sovereign Debt Models,”
Journal of Economic Theory, 183, 842–875.
Aguiar, M., M. Amador, H. Hopenhayn, and I. Werning (2019): "Take the Short Route: Equilibrium Default and Debt Maturity," Econometrica, 87, 423–462.
Alvarez, F. and N. L. Stokey (1998): "Dynamic programming with homogeneous functions," Journal of Economic Theory, 82, 167–189.
Arellano, C. (2008): "Default risk and income fluctuations in emerging economies," The American Economic Review, 98, 690–712.
Bäuerle, N. and A. Jaśkiewicz (2018): "Stochastic optimal growth model with risk sensitive preferences," Journal of Economic Theory, 173, 181–200.
Benhabib, J., A. Bisin, and S. Zhu (2015): "The wealth distribution in Bewley economies with capital income risk," Journal of Economic Theory, 159, 489–515.
Bertsekas, D. P. (2013): Abstract dynamic programming, Athena Scientific.

——— (2017): Dynamic programming and optimal control, vol. 4, Athena Scientific.
Blackwell, D. (1965): "Discounted dynamic programming," The Annals of Mathematical Statistics, 36, 226–235.
Boyd, J. H. (1990): "Recursive utility and the Ramsey problem," Journal of Economic Theory, 50, 326–345.
Cao, D. (2018): “Recursive Equilibrium in Krusell and Smith (1998),” Tech. rep.,SSRN 2863349.
Fagereng, A., L. Guiso, D. Malacrino, and L. Pistaferri (2016): "Heterogeneity and Persistence in Returns to Wealth," Tech. rep., National Bureau of Economic Research.
Heathcote, J., K. Storesletten, and G. L. Violante (2010): "The Macroeconomic Implications of Rising Wage Inequality in the United States," Journal of Political Economy, 118, 681–722.
Hubmer, J., P. Krusell, and A. A. Smith, Jr. (2018): "A Comprehensive Quantitative Theory of the US Wealth Distribution," Tech. rep., Yale.
Jaśkiewicz, A. and A. S. Nowak (2011): "Discounted dynamic programming with unbounded returns: application to economic models," Journal of Mathematical Analysis and Applications, 378, 450–462.

Jovanovic, B. (1982): "Selection and the evolution of industry," Econometrica, 649–670.
Kaplan, G. and G. L. Violante (2010): "How much consumption insurance beyond self-insurance?" American Economic Journal: Macroeconomics, 2, 53–87.
Le Van, C. and Y. Vailakis (2005): "Recursive utility and optimal growth with bounded or unbounded returns," Journal of Economic Theory, 123, 187–209.
Li, H. and J. Stachurski (2014): "Solving the income fluctuation problem with unbounded rewards," Journal of Economic Dynamics and Control, 45, 353–365.
Ma, Q. and J. Stachurski (2018): "Dynamic Programming Deconstructed," arXiv preprint arXiv:1811.01940.

Martins-da Rocha, V. F. and Y. Vailakis (2010): "Existence and uniqueness of a fixed point for local contractions," Econometrica, 78, 1127–1141.
Matkowski, J. and A. S. Nowak (2011): "On discounted dynamic programming with unbounded returns," Economic Theory, 46, 455–474.
McCall, J. J. (1970): "Economics of information and job search," The Quarterly Journal of Economics, 113–126.
Rincón-Zapatero, J. P. and C. Rodríguez-Palmero (2003): "Existence and uniqueness of solutions to the Bellman equation in the unbounded case," Econometrica, 71, 1519–1555.
Rust, J. (1987): "Optimal replacement of GMC bus engines: An empirical model of Harold Zurcher," Econometrica, 999–1033.
Wessels, J. (1977): "Markov programming by successive approximations with respect to weighted supremum norms,"