When to Quit Gambling, if You Must!∗

Sang Hu†  Jan Obłój‡  Xun Yu Zhou§

February 8, 2021
Abstract
We develop an approach to solve Barberis (2012)'s casino gambling model in which a gambler whose preferences are specified by the cumulative prospect theory (CPT) must decide when to stop gambling by a prescribed deadline. We assume that the gambler can assist their decision using an independent randomization, and explain why it is a reasonable assumption. The problem is inherently time-inconsistent due to the probability weighting in CPT, and we study both precommitted and naïve stopping strategies. We turn the original problem into a computationally tractable mathematical program, based on which we derive an optimal precommitted rule which is randomized and Markovian. The analytical treatment enables us to make several predictions regarding a gambler's behavior, including that with randomization they may enter the casino even when allowed to play only once, that whether they will play longer once they are granted more bets depends on whether they are in a gain or at a loss, and that it is prevalent that a naïve gambler never stops at a loss.

∗ We thank Nick Barberis for a long list of constructive comments on a previous version of the paper that have led to a much improved version.
† School of Data Science, The Chinese University of Hong Kong, Shenzhen, China 518172. Email: [email protected]. This author would like to acknowledge funding from the National Natural Science Foundation of China (Grant No. 11901494).
‡ Mathematical Institute, the Oxford-Man Institute of Quantitative Finance and St John's College, University of Oxford, Oxford, UK. Email: [email protected]. Part of this research was completed whilst this author was visiting CUHK and he is grateful for the support from the host. He also gratefully acknowledges support from ERC Starting Grant RobustFinMath.
§ Department of Industrial Engineering and Operations Research, Columbia University, New York, New York 10027. Email: [email protected]. This author gratefully acknowledges financial support through start-up grants at both University of Oxford and Columbia University, and through the Oxford–Nie Lab for Financial Big Data and the Nie Center for Intelligent Asset Management.

Key words: casino gambling; cumulative prospect theory; optimal stopping; probability weighting; time inconsistency; randomization; finite time horizon; Skorokhod embedding; potential function.

Introduction

Barberis (2012) proposes a casino gambling model in the framework of Tversky and Kahneman (1992)'s cumulative prospect theory (CPT) to study the optimal timing to quit gambling and leave the casino. The author derives two key economic insights: (1) a CPT gambler may be willing to enter the casino even though its bets offer neither positive expected values nor skewness because, by implementing an appropriate stopping strategy, he would be able to build a positively skewed final winning amount that would be favored by the underlying probability weighting in CPT; (2) there is an inherent time-inconsistency due to the dynamically changing strength of probability weighting on a same event: the gambler may deviate completely from his initial stopping strategy as he gambles along, and his eventual stopping behavior depends on whether he is aware of this time-inconsistency and whether he is able to commit to his original plan.

It is, however, not an objective of Barberis (2012) to develop a general approach to solve the casino model he puts forward. Barberis (2012) acknowledges that the nonlinear probability weighting involved in CPT makes it "very difficult" to solve the problem analytically, and "the problem has no known analytical solution for general T" (p. 42), where T is an exogenously given number of bets the gambler can maximally have.
Instead, Barberis (2012) uses an exhaustive search to find a solution; namely, he enumerates all the possible Markovian stopping strategies, calculates the CPT value of each of them, and finds the one that achieves the highest CPT value as the optimal strategy. As one would expect, this approach works only for small T, as the number of admissible Markovian strategies is exponential in T. Barberis (2012) solves the problem with T = 5.

Barberis (2012) discusses three types of gamblers, following the original classification of Strotz (1955): a naïve gambler who is unaware of the time-inconsistency and changes his strategy all the time; a precommitted gambler who is aware of the time-inconsistency and can commit to his initial plan; and a sophisticated gambler who is aware of the time-inconsistency yet unable to commit, and at each time takes the future selves' disobedience into account when devising an optimal strategy. The number of nodes is T(T + 1)/2 in a binomial tree of horizon T, and at each node there is a binary choice of {stop, continue}; hence the total number of strategies is 2^{T(T+1)/2}. We ran exhaustive search on a desktop with Intel Core i5-4590/CPU 3.30GHz/RAM 8.00GB for different T's, keeping the other parameters as in Barberis (2012).

It is thus desirable to develop a systematic approach to solve the casino model, not necessarily in an analytically closed form, but in a computationally efficient way. Not only can we then obtain optimal solutions for arbitrary values of parameters, but we may gain (likely more profound) economic insights from the model by post-optimality analyses such as comparative statics. The main technical hurdle to solving the casino model is probability weighting, as pointed out by Barberis (2012). The two main approaches in the classical optimal stopping theory – dynamic programming (variational inequalities) and the martingale method – both fail under probability weighting: the former because of the time-inconsistency, and the latter because of the absence of a "tower property" with respect to the weighted probability.

He et al.
(2017, 2019a,b) are probably the first series of papers that aim at an analytical treatment of the casino model, albeit in the infinite time horizon. The main idea of these papers consists of two deeply intertwined steps: (1) search for the optimal probability distribution of the final winning/losing amount upon leaving the casino instead of the optimal time to leave; (2) once the optimal distribution is found, recover the optimal time that generates it. Both steps call for a complete characterization of the set of all the admissible distributions, and the second step is the discrete-time version of the eminent Skorokhod embedding theorem, which in the casino setting is solved in He et al. (2019b).

The running times for T = 5, …; the exhaustive search did not complete for T = 8 due to running out of storage, with the running time estimated to be 300 days. Consider, e.g., the simplex method for linear programs or the dynamic programming formulation for optimal control problems. Here by "analytical treatment" we mean an optimization analysis not based on heuristics or on brute force such as an exhaustive search. This idea was first put forth by Xu and Zhou (2012) for a continuous-time optimal stopping model featuring probability weighting. There is considerable difficulty in adapting this idea to the discrete-time setting. Mathematically, randomization convexifies the aforementioned set of admissible distributions. Hence, He et al. (2017, 2019a) use randomization as a technical tool to make the Skorokhod embedding work, but fall short of explaining, economically, why people would randomize and how exactly they do it. The present paper offers discussions on these issues; see Subsection 2.3.

The main thrust to make this idea work is to permit randomization, namely the gambler can flip an independent, possibly biased, coin to assist his decision each step of the way. The probabilities of the head of the coin are endogenous and dynamically changing; thus they are part of the final solution. The randomization of decisions is a key feature when studying agents with CPT preferences, as discussed independently by Henderson et al. (2017). He et al. (2017, 2019a) also allow path-dependent strategies, that is, the stopping decision
is made based on the whole betting history instead of just the current winning/losing amount. They further show that allowing path-dependent or randomized strategies strictly improves the optimal CPT values. Based on these analyses, He et al. (2019a) turn the casino model into an infinite dimensional mathematical program that can be solved fairly efficiently. Most of the gambler's behaviors – those of a precommitter and of a naïve gambler – implied by the solutions reconcile qualitatively with Barberis (2012)'s results; but there are also new findings. For example, it is revealed that, for most empirically relevant CPT parameter estimates, a precommitted gambler lets gains run while stopping losses, but a naïve one almost surely does not stop at any loss level.

As noted, He et al. (2017, 2019a) deal with the infinite horizon gambling model. There are important reasons to study the finite horizon model under CPT preferences, despite the existing results for the infinite horizon counterpart. Conceptually, the finite horizon problem approximates reality much better, as a gambler clearly will not be able to play arbitrarily and indefinitely long. Also, the original work of Barberis (2012) considers T = 5 and hence we need to solve the finite horizon model in order to be able to make a direct comparison. It is worth noting that solutions to the finite horizon case cannot be recovered from those of the infinite horizon case by a simple truncation: if τ is optimal for the latter, then, typically, τ ∧ T will not be optimal for the former.

Methodologically, the finite horizon case is significantly more complex. It is well acknowledged that optimal stopping in a finite horizon is fundamentally more difficult than its infinite horizon counterpart, mainly because the value function of the former has both time and spatial variables while that of the latter has only spatial variables.
In the infinite time horizon setting in which the accumulated winning/losing amount is modelled by a symmetric random walk S, He et al. (2019b) show that for any centered probability measure µ on the set of integers Z, there exists a randomized stopping time τ such that the distribution of S_τ is µ. As discussed previously, this is the key theoretical underpinning for the new approach. Unfortunately, this result is no longer true if the stopping time is constrained by a pre-specified deadline. Indeed, additional conditions are required for measures that can be embedded by uniformly bounded stopping times. One of the contributions of this paper is to identify explicitly these conditions, which in turn enables us to reformulate the original casino model into a mathematical program whose number of constraints is of the order of T and which, hence, can be efficiently solved.

In the terminology of the Skorokhod embedding theorem, we say τ embeds µ in S.

As we can now solve the model for any parameter values, we will then be able to first compare our results with those of Barberis (2012). In particular, we compute for exactly the same case that is solved and discussed in Barberis (2012) with T = 5. The respective stopping strategies for a precommitter are identical except in two time–state instances in which our decisions are to stop with very small probabilities (0.00864 and 0.0368, respectively) whereas Barberis' are just to continue. Qualitatively, both strategies are of the so-called loss-exit type, namely, they continue in gains but stop after having accumulated sufficient amounts of losses. With randomization, our optimal CPT value improves, if slightly, over Barberis'. Likewise, the respective naïve strategies are the same save for one time–state instance in which ours is to stop with a probability of 0.179 while Barberis' is to continue. Our solution, however, enables us to look beyond the relatively short horizon of T = 5.
Indeed, we carry out numerical experiments for different values of T up to T = 20, and discover that the interplay between the utility function, probability weighting and loss aversion dictates various gambling behaviors.

Note that our analytical treatment relies on the introduction of randomization in our model, as randomization convexifies the optimization problem. Barberis (2012) does not allow randomization, and without it our approach would fail. However, our solution provides a well-founded relaxation heuristic for solving a casino model without randomization: we first relax the problem by introducing randomization, and then, for each time–state pair, round the probability of stopping up to 1 (which means stop) or down to 0 (which means continue).

Our approach makes it possible to analyze and understand the impacts of some key attributes of the model, which we believe is the most important contribution of this paper. For example, Barberis (2012) argues that a gambler may be willing to enter a casino because, by implementing a loss-exit strategy, he may be able to generate a positively skewed probability distribution of the final accumulated gain/loss which has a positive CPT preference value. However, he will need to spend time building such a skewed distribution, which requires a sufficiently large T. We show, however, that the same gambler who would have demanded a long horizon before agreeing to enter the casino will enter even if he is allowed to play only once (i.e., T = 1), provided that he can flip a coin. The reason is that, with randomization, the gambler can design a coin right away with the desired skewed distribution, saving all the time otherwise needed to reach that distribution. Another insight is about the value of time: how much is time on your hands worth? Specifically, we examine the question of what a gambler would do should he be allowed to stay one more period than previously agreed.
Would he always take advantage of this extended time horizon and actually play the additional round? It turns out that there is no uniform answer to the question – it depends crucially on whether the gambler is currently in a gain or at a loss.

We also study the behaviors of a naïve gambler with various parameter specifications and a longer time horizon (T = 20). We find that, unless he does not enter the casino, his behavior is consistently of the gain-exit type, i.e., he stops gains but lets losses run, reminiscent of the disposition effect in security trading (Odean, 1998). In particular, he never stops at a loss and gambles "until the bitter end". This gamble-until-the-bitter-end behavior is derived by Ebert and Strack (2015) in a model in which a naïve gambler can construct arbitrarily small random payoffs. Because he prefers "skewness in the small", he never stops gambling. Henderson et al. (2017), employing the approach developed in Xu and Zhou (2012), investigate a stylized continuous-time model and show that a naïve gambler may stop with a positive probability if she is allowed to randomize, which complements and counters the findings in Ebert and Strack (2015). Both Ebert and Strack (2015) and Henderson et al. (2017) rely on the crucial feature of their models that allows the gambler to construct arbitrarily small random payoffs. This feature is absent in our discrete-time model, in which the gambler cannot construct strategies with arbitrarily small random payoffs due to the minimal stake size fixed to be $1. Hence, their results are not applicable to our setting. Our finding therefore suggests that the gamble-until-the-bitter-end phenomenon is probably more prevalent in a naïve gambler's behavior.

The paper proceeds as follows. In Section 2, we formulate a casino gambling model under CPT as an optimal stopping problem and discuss why we allow randomization in our model. In Section 3, we develop the key step in our approach to solving the gambling model: characterizing the set of probability distributions of all possible accumulated winning/losing amounts upon leaving the casino. In Section 4, we present a mathematical program that is equivalent to the casino model, and then report the results of a numerical example which is studied in Barberis (2012). We discuss various implications and predictions of our model in Section 5. Finally, we conclude the paper in Section 6. Proofs are placed in the Appendices.
The Model
In this section we first highlight the key ingredients of Tversky and Kahneman (1992)'s CPT, then formulate the casino gambling model in a finite time horizon as an optimal stopping problem, and finally discuss the reasons why we make randomization available in our model.
In CPT, a utility (or value) function u(·) depends on a reference point k in wealth that divides gains and losses. An agent derives utility from gains and losses, rather than from the absolute amount of wealth itself. The utility function is

u(x) = u_+(x − k) if x ≥ k, and u(x) = −λ u_−(k − x) if x < k,

where u_+(·) and u_−(·) are both concave functions and λ > 1. This renders an overall S-shaped utility function u(·) that is concave (risk-averse) in the gain region x ≥ k and convex (risk-loving) in the loss region x < k. Moreover, λ > 1 captures loss aversion. Tversky and Kahneman (1992) propose the following parametric form of u(·):

u(x) = (x − k)^{α_+} if x ≥ k, and u(x) = −λ(k − x)^{α_−} if x < k,   (1)

where 0 < α_± ≤ 1 and λ > 1; see the left panel of Figure 1 for an illustration of this type of functions.

In CPT there are also probability weighting (or distortion) functions w_+(·) and w_−(·) applied to gains and losses respectively. An inverse S-shaped weighting function is first concave and then convex in the domain of probabilities. Such a weighting function overweights both tails of a probability distribution, reflecting the exaggeration of extremely small probabilities of extremely large gains and losses. Tversky and Kahneman (1992) suggest a parametric form of a weighting function w(·):

w(p) = p^δ / (p^δ + (1 − p)^δ)^{1/δ},   (2)

where 0 < δ ≤ 1; see the right panel of Figure 1 for an illustration. Note that δ = 1 means that no weighting is applied.

Figure 1: The left panel graphs two S-shaped utility functions (1) with α_+ = α_− = 0.5, λ = 1.5 and α_+ = α_− = 0.88, λ = 2.25, respectively. The right panel depicts three inverse S-shaped probability weighting functions (2) with δ = 0.4, δ = 0.61, and δ = 0.69, respectively.

We now reformulate Barberis (2012)'s model of casino gambling in a finite time horizon [0, T], where T ∈ Z_+ := {1, 2, 3, ...} is given. The gambling process proceeds as follows. At time 0, the gambler is offered a fair bet, e.g., one with a roulette wheel: win or lose $1 with equal probability. If the gambler decides not to play the bet, then he will not even enter the casino. If the gambler enters and takes the bet, then the bet outcome is played out at time 1, leading to either a win or a loss of $1.

As in Barberis (2012), we assume in this paper that the gamble is fair. This will not affect the main economic findings and implications of our results. A model of unfair games is more technical, and is left for a future study.

Figure 2:
The gain/loss binomial tree with T = 5. The gambler must leave the casino by time 5, which is represented by the black nodes.

The gambling continues in this fashion up to the deadline T: the bet is offered and played out repeatedly until either the first time the gambler declines the bet, or time T, when the gambler must quit gambling and leave. The accumulated gain/loss process can be represented as a binomial tree; see Figure 2. Therein, each node is marked by a pair (t, x), where t ∈ N := {0, 1, 2, ...} stands for the time and x ∈ Z := {0, ±1, ±2, ...} the amount of cumulative gains or losses. For example, the node (2, −2) signifies a cumulative loss of $2 at time 2. The gambler must leave by time T, but he may quit at some earlier time τ ≤ T.

The gain/loss binomial tree S = (S_t : t ∈ N) is a standard symmetric random walk (SSRW) defined on a filtered probability space (Ω, F, P; (F_t)_{t∈N}). We assume the probability space is rich enough to support an F-measurable random variable ξ that is uniformly distributed on [0, 1] and independent of S.

Suppose the gambler quits gambling at a random time τ ∈ [0, T]. Then, with the reference point being his initial wealth before he enters the casino, the CPT value of his wealth upon leaving is

V(S_τ) := Σ_{n=1}^{T} u_+(n) [w_+(P(S_τ ≥ n)) − w_+(P(S_τ ≥ n + 1))] − λ Σ_{n=1}^{T} u_−(n) [w_−(P(S_τ ≤ −n)) − w_−(P(S_τ ≤ −n − 1))].   (3)

Throughout this paper we assume that both u_+(·) and u_−(·) are concave and both w_+(·) and w_−(·) are inverse S-shaped. The gambler needs to determine the optimal time to quit and leave the casino: such a stopping (exit) strategy τ is made at t = 0 to maximize V(S_τ) among all admissible strategies. Note that, due to probability weighting, the problem is inherently time-inconsistent; so τ is optimal only at t = 0 in the sense of a precommitted strategy; it may no longer be optimal from the vantage point of any later time t > 0. The set of admissible stopping strategies is

T_T := {τ ∈ [0, T] : τ is an (F_t)_{t∈N}-stopping time}.

So a decision whether or not to quit at time t ∈ [0, T] depends on all the information up to t. In particular, path-dependent strategies are admissible. Moreover, F – and hence all F_t – contains the information about ξ, a uniform random variable independent of S. Using ξ, we can define countably many binary random variables which are mutually independent and also independent of S. In consequence, an admissible strategy may involve randomization by tossing a (generally biased) coin. In the next subsection we will outline the rationale behind allowing randomized strategies. The gambler's problem is

max_{τ ∈ T_T} V(S_τ).   (4)

In our model (4), the filtration (F_t)_{t∈N} includes the information based on a uniform random variable that is independent of the underlying random walk. This means that we allow the gambler to assist his decision by flipping an independent, most likely biased, coin at each node.
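The ingredients above are easy to evaluate numerically. The following Python sketch – our own illustration, not code from the paper – computes the CPT value (3) of a given law of S_τ, using the Tversky–Kahneman parametric forms (1)–(2) with reference point k = 0 and the parameter values shown in Figure 1; all function names are ours.

```python
def u_plus(x, alpha=0.88):
    """Utility on gains, u_+(x) = x^alpha (reference point k = 0)."""
    return x ** alpha

def u_minus(x, alpha=0.88):
    """Utility on losses, u_-(x) = x^alpha; the factor -lambda is applied in (3)."""
    return x ** alpha

def w(p, delta):
    """Tversky-Kahneman weighting (2): w(p) = p^d / (p^d + (1-p)^d)^(1/d)."""
    return p ** delta / (p ** delta + (1.0 - p) ** delta) ** (1.0 / delta)

def cpt_value(mu, T, lam=2.25, d_plus=0.61, d_minus=0.69):
    """CPT value (3) of a stopped distribution mu, given as a dict
    mapping integer gain/loss levels in [-T, T] to probabilities."""
    v = 0.0
    for n in range(1, T + 1):
        # decision weight assigned to the gain level n ...
        p_hi = sum(q for s, q in mu.items() if s >= n)
        p_hi1 = sum(q for s, q in mu.items() if s >= n + 1)
        v += u_plus(n) * (w(p_hi, d_plus) - w(p_hi1, d_plus))
        # ... and to the loss level -n
        p_lo = sum(q for s, q in mu.items() if s <= -n)
        p_lo1 = sum(q for s, q in mu.items() if s <= -n - 1)
        v -= lam * u_minus(n) * (w(p_lo, d_minus) - w(p_lo1, d_minus))
    return v
```

Under these parameters a single fair $1 bet, cpt_value({1: 0.5, -1: 0.5}, 1), comes out to about −0.60: loss aversion makes a lone fair bet unattractive, which is why the positive skewness built by a multi-period stopping strategy matters.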
We now discuss the rationale behind making this randomization available in our model.

First of all, randomization is related to the accommodation of path-dependence. It is practically more reasonable, and indeed necessary, to consider path-dependent strategies rather than mere Markovian ones. How the gambler has arrived at a current amount may well matter for his decision. On the other hand, He et al. (2017) show that any path-dependent stopping time can be replicated by a randomized Markovian stopping time in the sense that both attain the same CPT value. As a result, we can consider randomization in lieu of considering the past information.

However, He et al. (2017) also show the converse is not true, namely, a randomized Markovian strategy may not be replicated by a non-randomized, path-dependent strategy, and the optimal CPT value among the former type may be strictly greater than that among the latter type. The authors attribute this to the lack of quasi-convexity of CPT preferences (in contrast to the classical expected utility theory preferences). This property was also exploited by Henderson et al. (2017) to complement and counter the findings in Ebert and Strack (2015).

Note that any random variable, including a Bernoulli random variable, can be generated from a uniform random variable. While the replication result has been obtained for the infinite horizon model, the underlying argument is exactly the same for the finite horizon case. The intuition is that, due to the independent increments of a random walk, considering all the past information can be achieved by randomizing at the current state.

So, what are the other reasons why a gambler may want to randomize, beyond and independent of replacing path-dependence and maximizing CPT preference? Indeed, preference for randomization is observed in daily life and in different cultures, such as last-minute deals offered by flight booking apps, "sushi omakase" (you entrust yourself to a sushi chef to choose the ingredients and presentation of your sushi plate), "fukubukuro" (grab bags filled with unknown contents), and divination sticks drawn at temples. In all these examples, people are
deliberately randomizing when making decisions. Agranov and Ortoleva (2017) report on experiments in which subjects who face identical questions repeated three times in a row often switch between their answers, and a significant portion of them are even willing to pay for a coin flip to choose answers for them. Dwenger et al. (2013) study clearing house data for university admissions in Germany, where applicants submit multiple rankings of the universities they wish to attend. The authors find that a significant fraction of students report contradictory rankings without any rational reasons.

The psychological literature has put forward various theories to explain the preference for randomization, such as responsibility aversion (Leonhardt et al., 2011), decision avoidance (Anderson, 2003), and regret theory (Zeelenberg and Pieters, 2007). In the aforementioned example of divination sticks, those who pray delegate their decisions to a god, a benefit of which is to release themselves from making the decisions on their own and hence relieve themselves from regret should the choice turn out to be bad. In the economics literature, Diecidue et al. (2004) use the general "utility of gambling" to explain randomization. These types of utility either violate some basic axioms underlying classical utility theory such as second-order stochastic dominance and betweenness (Blavatskyy, 2006, Camerer and Ho, 1994) (precisely the same reason why randomization strictly improves the CPT value), or reflect a preference for irrational diversification or hedging (Rubinstein, 2002), or put more weight on fairness than on outcomes (Bolton et al., 2005, Kahneman et al., 1986).

In summary, a gambler may have various independent reasons to randomize while gambling, which may be only partly relevant, or completely irrelevant, to his CPT preference. That is why we introduce an independent binary random variable to capture such a desire for randomization.
Finally, the easy availability of means for flipping a biased coin makes randomization practically feasible for a gambler.

The theory of utility of gambling can also be used to explain why a gambler is willing to play a bet that has an unfavorable average return; see Barberis (2012, p. 38). Here, the utility of gambling is applied to a different phenomenon, namely, the desire to randomize.
As explained earlier, the main thrust of our approach to solving Problem (4) is to change its decision variable from the stopping time τ to the distribution of the stopped state S_τ. A key step is therefore to characterize the admissible set of these distributions. Moreover, once an optimal distribution is obtained, there needs to be a way to recover the stopping time that generates this distribution. These two questions are intertwined and will actually be solved together. This section addresses them.

Denote by P(R) the set of probability measures µ on R and by P_0(R) the subset of P(R) whose elements have finite first moments and are centered: ∫ |x| µ(dx) < ∞ and ∫ x µ(dx) = 0. Denote by P_0(Z) = {µ ∈ P_0(R) : µ(Z) = 1} the subset of P_0(R) supported on the integers.

For µ ∈ P_0(R), define a function

U_µ(x) := ∫_R |x − y| µ(dy), x ∈ R,

which is called the potential of µ. For µ ∈ P_0(Z), U_µ is a linear interpolation of the points {U_µ(k) : k ∈ Z}. The following are evident:

µ({x}) = [U_µ(x + 1) + U_µ(x − 1) − 2U_µ(x)] / 2,
U_µ(x) = x − 2x µ([x, ∞)) + 2 Σ_{y ≥ x} y µ({y}).   (5)

Potential functions uniquely determine probability measures, namely, two measures are identical if and only if their potential functions are identical; see Obłój (2004). Finally, for any stopping time τ, with a slight abuse of notation, we simply write U_{S_τ} for the potential of the distribution of S_τ, when well defined. Note that our definition here is the negative of the usual definition of potential.

Figure 3:
An illustration of how U_t evolves to U_µ, for t = 0, 1, 2, 3, 4, for a symmetric example distribution µ supported on finitely many integers.

We can use a sequence of piecewise linear functions, called evolutional functions, to approach a potential function. Indeed, given µ ∈ P_0(Z), we define recursively the following sequence of functions:

U_0^µ(x) := |x|,
U_t^µ(x) := [U_{t−1}^µ(x − 1) + U_{t−1}^µ(x + 1)] / 2 ∧ U_µ(x), t = 1, 2, ..., x ∈ Z.   (6)

We then extend each U_t^µ to non-integers x ∈ R by linear interpolation. When µ is fixed, we may drop the superscript µ and just write U_t for simplicity. Figure 3 illustrates how U_t evolves to U_µ for an example of µ.

The optimal stopping time we will derive belongs to a special class of randomized, Markovian stopping times called the Root stopping times. The original version of the Root stopping times was developed in Root (1969) to solve the classical Skorokhod embedding problem for a Brownian motion B on an infinite time horizon.

Precisely, an (original) Root stopping time is the first hitting time of B on an explicitly constructed region with a barrier in the time-space R_+ × R. For any centred µ on R with a finite second moment v, such a time τ_R exists which embeds µ in B, i.e., B_{τ_R} ∼ µ and E[τ_R] = v; see Root (1969), Rost (1976) and Obłój (2004).

We now develop the discrete-time, finite-horizon counterpart. Consider a vector b := (..., b(−1), b(0), b(1), ...) where, for any x ∈ Z, b(x) = x + 2k ≥ |x| with some k ∈ Z, and another vector r := (..., r(−1), r(0), r(1), ...), where r(x) ∈ [0, 1]. Given b and r, define the probability distributions of a family of Bernoulli random variables {ξ_{t,x} : t ∈ N, x ∈ Z} as follows:

P(ξ_{t,x} = 0) = 1 − P(ξ_{t,x} = 1) = 0, t < b(x),
P(ξ_{t,x} = 0) = 1 − P(ξ_{t,x} = 1) = r(x), t = b(x),
P(ξ_{t,x} = 0) = 1 − P(ξ_{t,x} = 1) = 1, t > b(x).

Graphically, b is a barrier that defines a time-space stopping region

R_b := {(t, x) : t ∈ N, x ∈ Z, t ≥ b(x)},

and the components of r are the probabilities to stop exactly on the boundary of this stopping region. The randomized Root stopping time is defined as

τ_R(b, r) := inf{t ∈ N : (t, S_t) ∈ R_b and ξ_{t,S_t} = 0}.   (7)

This stopping time is Markovian, because it depends only on the current state of the random walk S.
It is randomized because it depends on the outcomes of the Bernoulli random variables ξ_{t,x}.

Figure 4 illustrates such a stopping time. The grey boundary divides the area into two subareas: the one on the left hand side has white nodes representing "continue", and that on the right hand side consists of black nodes indicating "stop". Stopping at a grey node (t, x) is randomized, with r(x) being the probability of stopping.

The following theorem is one of the main results of the paper; it provides a theoretical foundation for the numerical algorithm we are going to present to solve our casino gambling model. It characterizes the admissible set of stopped distributions under stopping times in T_T, and reveals that the set is the same as that of stopped distributions using only randomized Root stopping times. As a consequence, any admissible stopping strategy is always dominated by a randomized Root stopping time.

Figure 4: An example of the Root stopping time with T = 5. Black nodes mean "stop", white nodes mean "continue", and grey nodes mean "randomize". The boundary b is given as follows: b(4) = 4, b(3) = 3, b(2) = 4, b(1) = 3, b(0) = 2, b(−1) = 3, b(−2) = 2, b(−3) = 3, b(−4) = 4.
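As an illustration of definition (7), the following Python sketch – hypothetical code of ours, not from the paper – simulates the randomized Root stopping time for the barrier b of Figure 4 with T = 5. Figure 4 does not specify r, so the choice r(x) ≡ 0.5 at the grey nodes is an arbitrary assumption; by the optional sampling theorem the stopped state has mean zero regardless.

```python
import random

def root_stop(b, r, T, rng):
    """Run the SSRW S from S_0 = 0 and return S_tau for the randomized Root
    time (7): never stop while t < b(S_t), stop with probability r(S_t)
    when t = b(S_t), stop surely once t > b(S_t), and leave at the deadline T."""
    s = 0
    for t in range(T + 1):
        bt = b.get(s, T + 1)          # states outside b: barrier beyond the horizon
        if t > bt or (t == bt and rng.random() < r.get(s, 1.0)):
            return s                   # inside the stopping region R_b
        if t == T:
            return s                   # must quit gambling at the deadline anyway
        s += 1 if rng.random() < 0.5 else -1

# Barrier of Figure 4 (T = 5); r(x) = 0.5 at every grey node is our own choice.
b = {4: 4, 3: 3, 2: 4, 1: 3, 0: 2, -1: 3, -2: 2, -3: 3, -4: 4}
r = {x: 0.5 for x in b}
rng = random.Random(0)
samples = [root_stop(b, r, 5, rng) for _ in range(20000)]
print(sum(samples) / len(samples))    # close to 0: S_tau is centered
```

Replacing the Monte Carlo by the evolutional functions (6) recovers the law of S_τ exactly; the simulation is only meant to make the barrier-plus-coin mechanics of (7) concrete.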
Theorem 3.1
Let T ≥ 1 and µ ∈ P_0(Z) be such that µ([−T, T]) = 1. Then there exists a stopping time τ ∈ T_T such that S_τ ∼ µ if and only if

U_µ(x) ≤ [U_{T−1}^µ(x + 1) + U_{T−1}^µ(x − 1)] / 2, x = −(T − 2), −(T − 4), ..., T − 4, T − 2.   (8)

Moreover, in this case there exists a randomized Root stopping time τ_R(b, r) ∈ T_T such that S_{τ_R(b,r)} ∼ µ.

Theorem 3.1 hints that we can, instead of endeavoring to find the stopping time τ in Problem (4), try to find the probability distribution µ of the stopped state S_τ. Namely, we change the decision variable from τ to µ in Problem (4). The resulting problem is a (nonlinear) mathematical program (i.e., a constrained optimization problem) with the condition (8) translating into certain constraints.

Moreover, once we solve this problem and find the optimal distribution µ, it follows from Theorem 3.1 that there exists a randomized Root stopping time τ_R(b, r) that achieves the same stopped distribution and, hence, solves (4). Furthermore, based on the proof of Theorem 3.1 (see Appendix A), we can devise an algorithm to find (b, r) and, consequently, τ_R(b, r). We now formulate the mathematical program and provide its solution algorithm.

Given τ ∈ T_T, let µ be the distribution of S_τ. Define two T-dimensional vector variables, x := (x_1, x_2, ..., x_T) and y := (y_1, y_2, ..., y_T), where x_n = µ([n, T]) and y_n = µ([−T, −n]), n = 1, 2, ..., T. Clearly, x and y are the gambler's decumulative gain distribution and cumulative loss distribution, respectively. Then the original objective function (3) is equivalent to, as a function of (x, y),

U(x, y) := Σ_{n=1}^{T} [u_+(n) − u_+(n − 1)] w_+(x_n) − λ Σ_{n=1}^{T} [u_−(n) − u_−(n − 1)] w_−(y_n).   (9)

Naturally, we must have 1 ≥ x_1 ≥ x_2 ≥ ... ≥ x_T ≥
0, 1 ≥ y ≥ y ≥ ... ≥ y T ≥ x + y ≤
1. On the other hand, µ has zero expectation due to optional sampling theorem;so 0 = T (cid:88) n = − T nµ ( { n } ) = T (cid:88) n =1 nµ ( { n } ) − T (cid:88) n =1 nµ ( {− n } ) = T (cid:88) n =1 µ ([ n, T ]) − T (cid:88) n =1 µ ([ − T, − n ])= T (cid:88) n =1 x n − T (cid:88) n =1 y n . In summary, the following constraints are required for the probability distribution of S τ where τ ∈ T T : ≥ x ≥ x ≥ ... ≥ x T ≥ , ≥ y ≥ y ≥ ... ≥ y T ≥ ,x + y ≤ , (cid:80) Tn =1 x n = (cid:80) Tn =1 y n . (10)Moreover, Theorem 3.1 necessitates condition (8), which constitutes a family of inequal-ities on µ ’s potential function and the corresponding evolutional functions, which will later17e translated into constraints on S τ ’s distribution functions. Here, let us illustrate (8)for each of T = 1 , , . . . ,
5. To ease notation, we will suppress the superscript µ on theevolutional functions. For T = 1, the condition is satisfied automatically for any µ ∈ P ( Z )with µ ([ − T, T ]) = 1. For T = 2, (8) amounts to U µ (0) ≤ U (1)+ U ( − = 1. For T = 3, (8)reduces to max { U µ (1) , U µ ( − } ≤ { U µ (0) , } . For T = 4, (8) is equivalent to U µ (2) ≤ U (1)2 , U (1) = min (cid:110) U µ (1) , U µ (0) , (cid:111) ,U µ (0) ≤ U (1)+ U ( − , U ( −
1) = min (cid:110) U µ ( − , U µ (0) , (cid:111) ,U µ ( − ≤ U ( − . For T = 5, (8) specializes to U µ (3) ≤ U (2)2 , U (2) = min (cid:26) U µ (2) , (cid:110) U µ (1) , { Uµ (0) , } (cid:111) (cid:27) ,U µ (1) ≤ U (2)+ U (0)2 , U (0) = min (cid:26) U µ (0) , min (cid:16) U µ (1) , Uµ (0) , (cid:17) +min (cid:16) U µ ( − , Uµ (0) , (cid:17) (cid:27) ,U µ ( − ≤ U ( − U (0)2 , U ( −
2) = min (cid:26) U µ ( − , (cid:16) U µ ( − , Uµ (0) , (cid:17) (cid:27) ,U µ ( − ≤ U ( − . The following lemma, which follows a direct, if somewhat lengthy, computation, ex-presses U µ ( n ) and, consequently, the constraints (8), in terms of x and y . Lemma 4.1
For n ∈ Z ∩ [−T, T],

U^µ(n) = 2 Σ_{j=n+1}^{T} x_j + n for n ≥ 0, and U^µ(n) = 2 Σ_{j=|n|+1}^{T} y_j + |n| for n < 0.

To illustrate, take T = 5. Then U^µ(3) = 2 Σ_{n=4}^{5} x_n + 3, U^µ(−3) = 2 Σ_{n=4}^{5} y_n + 3, U^µ(1) = 2 Σ_{n=2}^{5} x_n + 1, and U^µ(−1) = 2 Σ_{n=2}^{5} y_n + 1.

We are now ready to formulate the mathematical program that is equivalent to the original stopping problem (4). Define

A = [  1   0  ...  0
      −1   1  ...  0
           ...
       0  ... −1   1
       0  ...  0  −1 ]  of size (T + 1) × T,   c = (1, 0, ..., 0)′ of size (T + 1) × 1,
1 = (1, 1, ..., 1)′ of size T × 1,   e_j = (0, ..., 0, 1, 0, ..., 0)′ of size T × 1 (with the 1 in the j-th position),

so that Ax ≤ c encodes 1 ≥ x_1 ≥ x_2 ≥ ... ≥ x_T ≥ 0, along with a set of functions f_{mn}: R^T × R^T → R, m = 1, ..., T, n = 1, ..., 2T + 1, in the following way. For x, y ∈ R^T, let

f_{m,1}(x, y) = f_{m,2T+1}(x, y) ≡ T, m = 1, ..., T,
f_{1,n}(x, y) ≡ |n − (T + 1)|, n = 2, ..., 2T,

and, for m = 2, 3, ..., T:

f_{mn}(x, y) = min( [f_{m−1,n−1}(x, y) + f_{m−1,n+1}(x, y)]/2, 2 Σ_{j=T+2−n}^{T} e_j′y + (T + 1) − n ), n = 2, 3, ..., T,
f_{mn}(x, y) = min( [f_{m−1,n−1}(x, y) + f_{m−1,n+1}(x, y)]/2, 2 Σ_{j=n−T}^{T} e_j′x + n − (T + 1) ), n = T + 1, ..., 2T − 1, 2T.

The mathematical program is then

max_{x,y} U(x, y),
subject to
Ax ≤ c, Ay ≤ c, e_1′x + e_1′y ≤ 1, 1′x − 1′y = 0,
[f_{T,n−1}(x, y) + f_{T,n+1}(x, y)]/2 ≥ 2 Σ_{j=T+2−n}^{T} e_j′y + (T + 1) − n for n = 2k + 1, k = 1, 2, ..., n ≤ T,
[f_{T,n−1}(x, y) + f_{T,n+1}(x, y)]/2 ≥ 2 Σ_{j=n−T}^{T} e_j′x + n − (T + 1) for n = 2T − 2k + 1, k = 1, 2, ..., n ≥ T + 1. (11)

The number of decision variables (x and y) and the number of constraints in (11) are both linear in T; hence the complexity of the problem is manageable. Moreover, there are standard solvers for this type of mathematical program. (A brute-force search over stopping rules, by contrast, becomes infeasible quickly: it already fails for T = 8 due to running out of storage, with the running time estimated to be 300 days.)

Once we solve this problem to get the optimal (x∗, y∗), we then run the following algorithm to find the optimal randomized Root stopping time:

Step 1
Given µ∗ ≡ (x∗, y∗), compute the corresponding potential function: U^{µ∗}(n) = 2 Σ_{j=n+1}^{T} x∗_j + n for n ≥ 0, and U^{µ∗}(n) = 2 Σ_{j=|n|+1}^{T} y∗_j + |n| for n < 0. Then, compute its evolutional functions U^{µ∗}_t by (6), t = 0, 1, ..., T.

Step 2
Compute the boundary b that separates the “continue” region from the “stop” region: b(n) = inf{t ≥ |n|, t ∈ Z: U^{µ∗}_{t+1}(n) = U^{µ∗}(n)}, n ∈ [−T, T] ∩ Z. (The constraints in (11) guarantee that the set involved is non-empty and that b(n) ≤ T for all n ∈ [−T, T] ∩ Z.)

Step 3
Compute the probability r to stop at the boundary:

r(n) = [U^{µ∗}_{b(n)}(n − 1) + U^{µ∗}_{b(n)}(n + 1) − 2U^{µ∗}(n)] / [U^{µ∗}_{b(n)}(n − 1) + U^{µ∗}_{b(n)}(n + 1) − 2U^{µ∗}_{b(n)}(n)], n ∈ [−T, T] ∩ Z.

(In the following numerical experiments, we employ the nonlinear optimization solver ‘fmincon’ from the MATLAB Optimization Toolbox, on a desktop with an Intel Core i5-4590 CPU at 3.30GHz and 8.00GB of RAM. For Barberis (2012)'s parameters α_+ = α_− = 0.95, δ_+ = δ_− = 0.5, λ = 1.5 and T = 5, 6, 7, 8, MATLAB uses 205 seconds, 220 seconds, 280 seconds, and 350 seconds, respectively. Compare with those of the brute force reported in Footnote 3. The running times for T = 10, 20, 30, 40, 50 are 9.35 minutes, 29 minutes, 89 minutes, 3.5 hours, and 7.17 hours, respectively.)

Step 4
Construct τ_R(b, r) according to (7).

We present an example to illustrate the solution procedure, using the same parameters as in Barberis (2012) with T = 5, α_+ = α_− = 0.95, δ_+ = δ_− = 0.5, λ = 1.5. Solving the corresponding mathematical program yields an optimal distribution µ∗ whose gain probabilities x∗_1, ..., x∗_5 are all strictly positive, while in the loss direction y∗_1 > 0 and y∗_2 = y∗_3 = y∗_4 = y∗_5 = 0; that is, all the stopped mass on the loss side sits at −1. The corresponding potential function U^{µ∗} satisfies

U^{µ∗}(0) = 1, U^{µ∗}(n) > n for n = 1, 2, 3, 4, and U^{µ∗}(n) = |n| for n ≥ 5 or n ≤ −1.

Figure 5 illustrates how U^{µ∗} is achieved by the evolutional functions within five steps. We then apply the algorithm previously presented to recover the optimal randomized Root stopping time τ∗ from the optimal distribution µ∗, with S_{τ∗} ∼ µ∗. The strategy, which is optimal at t = 0 (only) and implemented by the precommitted gambler, is drawn in the left panel of Figure 6. Note that black nodes mean “stop”, white ones mean “continue”, and grey ones mean “randomize”; the number above a grey node is the probability to stop. The main feature of this precommitted optimal strategy is to continue in the gain domain and to stop in the loss domain until T = 5, except at time 4 where there are positive probabilities to stop in gains. In particular, randomization takes place at nodes (4,
4) and (4, 2). (These are the only two nodes that are different between Barberis (2012) and the present paper; note that they occur at time T − 1.) More examples with much longer time horizons will be presented in the next section.

Figure 5: U^{µ∗} is achieved within five steps, where the optimal distribution µ∗ is the solution to (11) for T = 5, α_+ = α_− = 0.95, δ_+ = δ_− = 0.5, λ = 1.5.

Figure 6:
The left panel shows the precommitter's strategy and the right panel shows the naiveté's strategy, for T = 5, α_+ = α_− = 0.95, δ_+ = δ_− = 0.5, λ = 1.5. Black nodes mean “stop”, white nodes mean “continue”, and grey nodes mean “randomize”. The number above a grey node stands for the probability to stop. While the precommitter mainly continues in gains and stops in losses, the naiveté's behavior is almost completely reversed.

This example demonstrates the strict improvement of the optimal value attainable by randomized strategies (in this particular instance) over non-randomized, Markovian ones. Moreover, one can achieve this improved optimal value by implementing a Markovian randomization, with the overall strategy qualitatively very similar to Barberis (2012)'s: both are of the loss-exit type.

While the precommitted gambler follows through the optimal strategy originally determined at time 0, a naïve gambler thinks he will do the same but in actuality constantly deviates from previously planned strategies. More precisely, at any time t >
0, a naiveté reconsiders the optimal stopping problem starting from t, devises a precommitted strategy, but carries it out for only one period (because he will re-optimize again at the next time instant). Here, we assume this gambler keeps his initial wealth at time 0 as the reference point. (This is also the assumption made in Barberis (2012) when analyzing a naïve gambler's behavior. It is both natural and plausible that a gambler remembers the initial amount of cash he brought into the casino and always compares wins and losses against that amount.) The naïve gambler's strategy can be computed by deriving all the time-t precommitted strategies, t = 0, 1, ..., T, implementing each of them for just one period, and then “pasting” them together. As a result, his actual quitting strategy can be drastically different from the precommitted one, the one he originally planned before entering the casino; see the right panel of Figure 6. There, the only node calling for randomization is now (2, 0), with a probability of 0.179 to quit. (This is also the only node that makes our naïve strategy different from Barberis (2012)'s, in which the node (2, 0) is white, meaning “continue”; see the right panel of Figure 4 therein.) Comparing the two strategies depicted in Figure 6, we find that the naïve strategy is not only significantly different from the precommitted one, but indeed almost completely opposite in character: the latter mainly continues in the gain domain and stops in the loss domain, while the former is reversed. For a discussion of experimental evidence on the dramatic departure of actual gambling behaviors from planned ones, see, e.g., the study of the disposition effect by Odean (1998).

(The improvement in the optimal CPT value when the horizon is extended from T to T + 1 can be significant with other parameter specifications. We have also revisited the T = 6 example considered in He et al. (2017): there, a randomized strategy found by trial and error attains a smaller value than our optimal one, which stops at the states 0, 1, 3, and 5 with positive probabilities.)
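The computations in this section can be prototyped with off-the-shelf solvers. The following Python/SciPy sketch (our own illustration, not the authors' MATLAB/fmincon code) maximizes the CPT objective (9) over (x, y) under the basic constraints (10), for the piecewise power utility (1) and weighting function (2) with α_± = 0.95, δ_± = 0.5, λ = 1.5; the embedding constraints from (11) are omitted for brevity, so the value obtained is only an upper bound for the casino problem.

```python
import numpy as np
from scipy.optimize import minimize, LinearConstraint

# Sketch of the optimization behind program (11), without the
# Skorokhod-embedding constraints (a relaxation).
T, alpha, delta, lam = 5, 0.95, 0.5, 1.5

u = lambda n: n ** alpha                                              # utility (1)
w = lambda p: p**delta / (p**delta + (1 - p)**delta) ** (1 / delta)   # weighting (2)
du = np.array([u(n) - u(n - 1) for n in range(1, T + 1)])

def neg_U(z):                       # negative of objective (9)
    x, y = z[:T], z[T:]
    return -(du @ w(x) - lam * du @ w(y))

# Ax <= c encodes 1 >= x_1 >= ... >= x_T >= 0 (same matrix for y).
A = np.zeros((T + 1, T))
A[0, 0] = 1.0
for i in range(1, T):
    A[i, i - 1], A[i, i] = -1.0, 1.0
A[T, T - 1] = -1.0
c = np.r_[1.0, np.zeros(T)]

v = np.zeros(2 * T); v[0] = v[T] = 1.0                                # x_1 + y_1 <= 1
cons = [
    LinearConstraint(np.hstack([A, np.zeros_like(A)]), -np.inf, c),
    LinearConstraint(np.hstack([np.zeros_like(A), A]), -np.inf, c),
    LinearConstraint(v, -np.inf, 1.0),
    LinearConstraint(np.r_[np.ones(T), -np.ones(T)], 0.0, 0.0),       # zero mean
]

res = minimize(neg_U, np.full(2 * T, 0.1), method="trust-constr",
               constraints=cons, bounds=[(1e-8, 1.0)] * (2 * T))
x_opt, y_opt = res.x[:T], res.x[T:]

def potential(x, y, n):
    """Potential function of mu from (x, y), per Lemma 4.1."""
    return 2 * x[n:].sum() + n if n >= 0 else 2 * y[-n:].sum() - n
```

Feeding the resulting (x∗, y∗) into `potential` reproduces U^{µ∗}; adding the recursively defined f_{mn} constraints of (11) on top of `cons` (e.g., via `NonlinearConstraint`) would recover the full program.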
One of the main takeaways of Barberis (2012) is that CPT offers an explanation of why a gambler would be willing to enter a casino even if the bets there have neither skewness nor positive expected values. By implementing a loss-exit strategy, namely keep gambling when winning but stop gambling upon accumulating a sufficient loss, he envisions a positively skewed probability distribution of the accumulated gain/loss at the exit time, which is favored by the CPT preference. However, he would need a sufficiently long time period to build such a skewed distribution in order to have a positive CPT value to justify the entry (recall that the CPT value of not playing at all is zero). For the case of a piecewise power utility function (1) and an inverse-S-shaped weighting function (2), Barberis (2012, Proposition 1) provides a sufficient condition for this to happen. Moreover, for the parameter values α_+ = α_− = 0.88, δ_+ = δ_− = 0.65, λ = 2.25, this sufficient condition translates into T ≥ 26; see Barberis (2012, Corollary 1). However, with randomization allowed, the gambler may be willing to enter the casino even if he is allowed to play only once (i.e., T = 1).

Proposition 5.1
Suppose T = 1. If lim_{p→0} [w′_+(p)/w′_−(p)] > λ [u_−(1)/u_+(1)], then the optimal CPT value is strictly positive.

(Barberis (2012, Proposition 1) is stated for a naïve gambler. However, the result holds for a precommitter as well because both gamblers face the same problem at t = 0.)

(These parameter values are close to those given by Tversky and Kahneman (1992), i.e., α_+ = α_− = 0.88, δ_+ = 0.61, δ_− = 0.69, λ = 2.25. If we apply the exact Tversky and Kahneman (1992) parameter values to Barberis (2012, Proposition 1), then the corresponding threshold is T ≥ 20. Such a shorter period is expected because the probability weighting in gains is stronger than that in losses with Tversky and Kahneman (1992)'s parameters; thus it takes less time to build the desired positively skewed distribution with a positive CPT value.)

Figure 7: Optimal CPT values for T = 1, 2, ..., 20 under the parameter values of Tversky and Kahneman (1992), i.e., α_+ = α_− = 0.88, δ_+ = 0.61, δ_− = 0.69, λ = 2.25.

Recall that in our model randomization is available; so the optimal CPT value being strictly positive means that the gambler will enter the casino, possibly tossing a coin to decide whether to actually play the (only) one round of bet. What if T ≥
2? Naturally, as T increases, the optimal CPT values increase. Figure 7 graphs the optimal CPT values for T = 1, 2, ..., 20 with the Tversky and Kahneman (1992) estimates. Therefore, if the gambler will enter the casino for T = 1 with a given set of parameters, so will he for any T ≥ 2. Note that when δ_+ < δ_− (which is the case with Tversky and Kahneman (1992)'s estimates), we have lim_{p→0} [w′_+(p)/w′_−(p)] = +∞. Hence, Proposition 5.1 yields that, as long as the loss-aversion degree λ is finite, a randomized gambling strategy is always preferred to not gambling, even when T = 1.

The intuition of Proposition 5.1 is as follows. The condition lim_{p→0} [w′_+(p)/w′_−(p)] > λ [u_−(1)/u_+(1)] means that the exaggeration of big gains outweighs the exaggeration of big losses and loss aversion combined; so the gambler assigns a positive CPT value to a sufficiently positively skewed distribution. Without randomization, it would take time to build such a distribution. With randomization, however, the gambler can design a coin right away with such a distribution, saving him all the time otherwise needed. In other words, a coin toss can be used to supersede all the time-consuming (and perhaps clever) maneuvers to reach the desired distribution. Note that even though randomization still gives rise to a symmetric distribution of gains and losses, so that loss aversion would seemingly prevent the gambler from entering, the sufficiently unequal levels of probability weighting on gains and losses, as stipulated by the condition lim_{p→0} [w′_+(p)/w′_−(p)] > λ [u_−(1)/u_+(1)], yield the contrary.

On the other hand, the effectiveness of randomization crucially depends on the chosen parameters. If the degree of probability weighting in gains is equal to or less than that in losses, and the level of loss aversion is sufficiently large so that λ [u_−(1)/u_+(1)] > 1, then the above proposition does not apply because lim_{p→0} [w′_+(p)/w′_−(p)] ≤ 1 < λ [u_−(1)/u_+(1)]. For example, for the utility function (1) and probability weighting function (2), let α_+ = α_− = 0.88, δ_+ = δ_− = 0.65, λ = 2.25, the values used in Barberis (2012, Corollary 1). In this case, we find a positive CPT value of randomized strategies only when the time horizon is at least T = 25, slightly shorter than that (T = 26) presented in Barberis (2012, Corollary 1) using a special non-randomized loss-exit strategy. The corresponding optimal precommitted strategy is also of a loss-exit type: stop once losing $1, and continue with possible randomization when winning. If we further let the probability weighting in gains be weaker than that in losses, e.g., δ_+ = 0.69 and δ_− = 0.61, then a positive preference value is found only at a still longer horizon.

There are three components in the risk/loss preferences under CPT: the utility function, the probability weighting, and the loss aversion. They are intertwined and compete with each other in determining the overall preference and dictating the final behavior. In this subsection, we study the roles they play in the case when the utility function is (1) and the weighting function is (2), with parameters α_±, δ_±, and λ. With the others kept unchanged, the effect of each of these parameters is as follows: a smaller α_+ implies a higher degree of risk aversion in gains and a smaller α_− implies a higher degree of risk seeking in losses; a smaller δ_± yields a higher level of probability weighting in gains/losses; and a smaller λ indicates a smaller extent of loss aversion. To understand the overall impact of these parameters on exit decisions, we first fix λ and consider four sets of scenarios: large α_± and small δ_±; small α_± and large δ_±; small α_± and small δ_±; and large α_± and large δ_±. Then we examine the effect of λ. In the following discussions we fix T = 10.

The left panel of Figure 8 draws the optimal precommitted strategy when α_± = 0.95, δ_± = 0.5, λ = 1.
5. These are the same parameters as in the numerical example presented in Subsection 4.2, except that now we have a much longer horizon. Again, black nodes mean “stop”, white nodes “continue”, grey nodes “randomize”, and the number above a grey node is the probability to stop. This strategy mainly continues or tosses a coin in gains until the final time and stops in losses; it is thus a loss-exit one. The intuition is as follows. This is the case where α_± are relatively large (lower risk aversion/risk seeking) and δ_± relatively small (heavier probability weighting). In the gain region, the stronger exaggeration of the small probability of winning a large amount outweighs the weaker risk aversion; hence the gambler is willing to take more risk and stay longer. In the loss region, the stronger exaggeration of the small probability of losing a large amount, together with the loss aversion, outweighs the weaker risk-seeking appetite and prompts the gambler to play safe and quit earlier.

The above argument is reversed, leading to a gain-exit type of strategy, when α_± are relatively small and δ_± relatively large, such as the one depicted in the left panel of Figure 9, where α_± = 0.5, δ_± = 0.95, λ = 1.5. An interesting small variation of this case is when probability weighting is absent, i.e., α_± = 0.5, δ_± = 1, λ = 1.5, in which case the optimal CPT value is positive and the precommitted strategy is still a gain-exit one. Indeed, a positive preference value is found at a much shorter horizon T = 4 under this group of parameters, and the optimal distribution of S_τ is left-skewed (which is favored by the strong risk-seeking preference in losses represented by α_−).

The left panel of Figure 10 shows the precommitted strategy for the parameter values α_± = δ_± = 0.5, λ = 1.5, which is the case of small α_± and small δ_±. This is still a loss-exit strategy, but the main differences from that visualized in the left panel of Figure 8 are that, in the gain region, there are now more black nodes and the numbers above the grey nodes are larger, implying a higher likelihood of stopping even when the gambler has accumulated a gain. The reason is that with a smaller α_+, the exaggeration of the small probability of winning a large amount still outweighs the risk aversion in gains, but to a lesser degree than in the previous case.

The last set of parameters is α_± = δ_± = 0.95, λ = 1.
5, with which the optimal
Figure 8:
The precommitted (left panel) and naïve (right panel) strategies for T = 10, α_± = 0.95, δ_± = 0.5, λ = 1.5. Black nodes are “stop”, white nodes are “continue”, and grey nodes are “randomize”. The numbers above the grey nodes are the probabilities to stop.

Figure 9:
The precommitted (left panel) and naïve (right panel) strategies for T = 10, α_± = 0.5, δ_± = 0.95, λ = 1.5. Black nodes are “stop” and white nodes are “continue”. There is no grey node.

Figure 10:
The precommitted (left panel) and naïve (right panel) strategies for T = 10, α_± = δ_± = 0.5, λ = 1.5. Black nodes are “stop”, white nodes are “continue”, and grey nodes are “randomize”. The numbers above the grey nodes are the probabilities to stop.

Figure 11:
Optimal CPT values for λ from 1 to 3, with T = 10, are shown in the left panel. Optimal CPT values for T = 1, ..., 20, with λ = 1.5, are shown in the right panel. In both panels, (α_±, δ_±) ∈ {(0.95, 0.5), (0.5, 0.95), (0.5, 0.5), (0.95, 0.95)}.

CPT value is zero and the gambler will simply not enter the casino. This is because these parameter values render a risk preference close to both risk-neutral and probability-weighting-free, while a zero-mean bet together with a loss-aversion degree λ > 1 makes gambling unattractive.

The effect of λ is more straightforward, which we now examine. For each group of α_± and δ_± considered above, we obtain the optimal CPT value by varying λ from 1 to 3; see the left panel of Figure 11. Quite naturally, each of the optimal CPT values decreases as λ increases, and three of them hit zero before λ reaches 3. As a result, the gambler will be increasingly reluctant to stay in, or even enter, the casino as his level of loss aversion increases.

The analysis in this subsection shows that the CPT casino model with various constellations of parameter specifications can predict and explain a rich array of gambler behaviors. In particular, whether the strategy is loss-exit or otherwise depends on the interplay between the three intertwining and competing forces represented by α_±, δ_±, and λ.

5.3 One more round?

With a longer time horizon a precommitter is more likely to obtain a positive CPT preference value and hence more likely to enter the casino because, trivially, the optimal CPT value for T is no less than that for T −
1. On the other hand, with a longer time horizon and a loss-exit strategy one can possibly construct a more positively skewed probability distribution of the accumulated gain/loss at the exit time, which, under CPT preferences, is preferred by the precommitter. Hence, the optimal preference value may strictly increase as T increases, as demonstrated in the right panel of Figure 11, where the optimal CPT values for T = 1, 2, ..., 20 under different groups of parameters are plotted.

So, the overall CPT value will be heightened if the gambler is granted an additional round of bet beyond what was previously agreed. But would he always take advantage of this extended time horizon and actually play the additional round? It turns out that the answer can be totally different depending on whether the gambler is in the gain region or in the loss region.

Let the original problem have a horizon T > 0 and let τ ∈ T_T be a given exiting strategy. Assume p_T := P(S_τ = T) > 0, i.e., there is a positive probability of reaching the node (T, T) under τ, namely τ = T and S_τ = T. Now suppose the time horizon is expanded to T + 1, so the gambler is allowed to play one more round. Firstly, we are interested in knowing, given τ = T and S_τ = T, namely the gambler has already played the originally final bet with the maximal possible accumulated win of T, whether the gambler would actually take this opportunity and play one more time to possibly achieve a final accumulated gain of T + 1 or T − 1. The situation is illustrated in the left panel of Figure 12. Recall that randomization is allowed at any time; so let us denote by r_T ∈ [0, 1] the probability to stop given τ = T and S_τ = T, and by τ′ the strategy appending to the original τ, given τ = T and S_τ = T, one more round of play with probability 1 − r_T at time T, finally stopping at time T + 1. Let q_T = 1 − r_T ∈ [0, 1]. The decumulative distribution of S_τ′ differs from that of S_τ only at P(S_τ′ ≥ T + 1) = q_T p_T/2 and P(S_τ′ ≥ T) = (1 − q_T/2) p_T. The problem now is to choose q_T to maximize V(S_τ′) or, equivalently, to maximize

w_+(q_T p_T/2)[u_+(T + 1) − u_+(T)] + w_+((1 − q_T/2) p_T)[u_+(T) − u_+(T − 1)].

(Bear in mind that all the decisions are made at t = 0, as we are considering precommitted strategies. So we are studying this problem from the vantage point of t = 0.)

Figure 12:
If one more bet is allowed given that the precommitted gambler could have played until the end with a sufficiently large accumulated gain (loss), she would randomize (stop), as shown in the left (right) panel.
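The first-order condition derived below can be solved numerically. The following Python sketch (our own illustration under the parametric forms (1)-(2), with the Tversky-Kahneman gain-side estimates and a hypothetical tail probability p_T) finds the optimal continuation probability q∗_T by root bracketing:

```python
import numpy as np
from scipy.optimize import brentq

# Playing the extra round with probability q moves mass q*p_T/2 of the
# stopped distribution to T+1 and leaves decumulative weight
# (1 - q/2)*p_T at T.  alpha_p, delta_p are the Tversky-Kahneman
# gain-side estimates; p_T below is a hypothetical tail probability.
alpha_p, delta_p = 0.88, 0.61
u = lambda x: x ** alpha_p                                            # u_+ from (1)
w = lambda p: p**delta_p / (p**delta_p + (1 - p)**delta_p) ** (1 / delta_p)

def w_prime(p):
    """Two-sided numerical derivative of w_+, with step scaled to p."""
    h = p * 1e-4
    return (w(p + h) - w(p - h)) / (2 * h)

def q_star(T, p_T):
    """Solve the first-order condition (12) for q*_T."""
    rhs = (u(T) - u(T - 1)) / (u(T + 1) - u(T))    # > 1 since u_+ is concave
    g = lambda q: w_prime(q * p_T / 2) / w_prime((1 - q / 2) * p_T) - rhs
    return brentq(g, 1e-6, 1.0)    # LHS falls from +inf at q=0 to 1 at q=1

q5, q10 = q_star(5, 0.05), q_star(10, 0.05)
```

Both roots lie strictly inside (0, 1), i.e., the gambler randomizes, and q_star(10, 0.05) exceeds q_star(5, 0.05), matching the claim that the more gains accumulated, the more likely the gambler continues.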
For T large enough, both q_T p_T/2 and (1 − q_T/2) p_T are small enough to fall into the concave region of the probability weighting function w_+(·). Hence the above is a concave maximization, and the following first-order condition is necessary and sufficient for a maximum q∗_T:

0 = {w′_+(q∗_T p_T/2)[u_+(T + 1) − u_+(T)] − w′_+((1 − q∗_T/2) p_T)[u_+(T) − u_+(T − 1)]} p_T/2,

or equivalently,

w′_+(q∗_T p_T/2) / w′_+((1 − q∗_T/2) p_T) = [u_+(T) − u_+(T − 1)] / [u_+(T + 1) − u_+(T)]. (12)

Assuming u_+(·) is strictly concave (e.g., that given by (1)), the right-hand side of (12) is strictly greater than one. Hence, the equation is satisfied by some q∗_T ∈ (0, 1), but not by q∗_T = 0 (noting w′_+(0) = +∞) or q∗_T = 1. Recall that q∗_T = 0 and q∗_T = 1 correspond to r_T = 1 and r_T = 0, respectively. So, given that the gambler has already played until the end with a sufficiently large accumulated gain (so that p_T is sufficiently small), once he is allowed to play (only) one more time he will not make a black-and-white decision of either “continue” or “stop”; rather, he will always engage in randomization to make his decision. Moreover, as T increases the right-hand side of (12) decreases; hence q∗_T increases, or r_T decreases. In other words, the more gains accumulated, the more likely the gambler will continue.

What is the intuition behind these results? Standing at t = 0, the probability of reaching the maximal gain is very small and hence greatly inflated by the probability weighting, so stopping for sure is suboptimal; yet the opposite extreme (definitely continuing) is not optimal either because of the strict risk aversion. Randomization helps trigger probability weighting in large gains, which in turn offsets the risk aversion. (This also explains why randomization happens at time T − 1 in the earlier examples.)

Next, let us examine whether the gambler would like to take one more step in the loss region if the horizon is expanded. Again, suppose τ is a given exit strategy for the horizon T > 0, and denote p_{−T} := P(S_τ = −T) > 0. Let q_{−T} = 1 − r_{−T} ∈ [0, 1], where r_{−T} is the probability to stop at S_τ = −T, and denote by τ″ the strategy extending the original τ by, given τ = T and S_τ = −T (see the right panel of Figure 12 for an illustration), playing one more round with probability 1 − r_{−T} at time T and stopping at time T + 1, assuming the horizon is now T + 1. Then an analysis similar to the gain case shows that the optimal q∗_{−T} minimizes

w_−(q_{−T} p_{−T}/2)[u_−(T + 1) − u_−(T)] + w_−((1 − q_{−T}/2) p_{−T})[u_−(T) − u_−(T − 1)]. (13)

Different from the gain region, in the loss region the optimality is achieved by minimizing a concave function when p_{−T} is sufficiently small. Hence, the optimal q∗_{−T} is either 0 or 1, corresponding to “stop” or “continue”, respectively. This means that the gambler will not flip a coin this time. To investigate which is better between “stop” and “continue”, we calculate the difference between the objective values (13) at q∗_{−T} = 0 and at q∗_{−T} = 1:

w_−(p_{−T})[u_−(T) − u_−(T − 1)] − w_−(p_{−T}/2)[u_−(T + 1) − u_−(T − 1)]
= [u_−(T) − u_−(T − 1)] w_−(p_{−T}/2) { w_−(p_{−T}) / w_−(p_{−T}/2) − [u_−(T + 1) − u_−(T − 1)] / [u_−(T) − u_−(T − 1)] }.

As T → ∞, we have p_{−T} → 0,

w_−(p_{−T}) / w_−(p_{−T}/2) → 2^{δ_−}, and [u_−(T + 1) − u_−(T − 1)] / [u_−(T) − u_−(T − 1)] = 1 + [u_−(T + 1) − u_−(T)] / [u_−(T) − u_−(T − 1)] → 2,

assuming w_−(·) is given by (2) with 0 < δ_− < 1 and u_−(·) has diminishing marginal (dis)utility, namely, u′_−(x) → 0 as x → ∞ (which holds for (1)). Since 2^{δ_−} < 2, this implies that the value of (13) at q∗_{−T} = 0 is smaller than that at q∗_{−T} = 1 when T is sufficiently large. Consequently, the gambler will choose to stop even if he is offered to play one more round. The intuition is clear: from the perspective of t = 0, the probability of losing sufficiently big is very small, and it is inflated by probability weighting. This inflation outweighs the risk seeking in losses because of the diminishing marginal disutility. As a result, the action of stopping, which generates zero additional CPT value, is the best, because any other action would only add negative CPT value.

We have proved the following result.

Theorem 5.1
Let τ ∈ T_T be a given strategy.

(a) Assume that u_+(·) is strictly concave and P(S_τ = T) > 0. Construct a new strategy τ′ = τ′(r_T) := τ + 1_{τ=T, S_τ=T} ξ_{T,T}, where r_T ∈ [0, 1] and ξ_{T,T} is a Bernoulli random variable that is independent of S = (S_t : t ∈ N) and τ, with P(ξ_{T,T} = 0) = r_T = 1 − P(ξ_{T,T} = 1). Then τ′ ∈ T_{T+1} and, for sufficiently large T, there exists r_T ∈ (0, 1) such that V(S_τ′) > V(S_τ).

(b) Assume that w_−(·) is given by (2) with 0 < δ_− < 1, u′_−(x) → 0 as x → ∞, and P(S_τ = −T) > 0. Construct a new strategy τ″ = τ″(r_{−T}) := τ + 1_{τ=T, S_τ=−T} ξ_{T,−T}, where r_{−T} ∈ [0, 1] and ξ_{T,−T} is a Bernoulli random variable that is independent of S = (S_t : t ∈ N) and τ, with P(ξ_{T,−T} = 0) = r_{−T} = 1 − P(ξ_{T,−T} = 1). Then τ″ ∈ T_{T+1} and, for sufficiently large T, V(S_τ″) < V(S_τ) for all r_{−T} ∈ [0, 1).

For general utility and weighting functions, the above results are valid for sufficiently large T; but for the utility function (1) and probability weighting function (2) with Tversky and Kahneman (1992)'s estimates, T does not need to be excessively large. For example, it follows from the proof of (a) that all we need is to ensure that p_T = P(S_τ = T) falls into the concave domain of w_+(·). For δ_+ = 0.61, this holds once p_T is moderately small, which is already the case for T = 2. Similarly, by the proof of (b), for α_− = 0.88 and δ_− = 0.69, a straightforward calculation yields that when T = 2, p_{−T} falls into the concave domain of w_−(·) and r_{−T} = 1 dominates the other choices.

(We have put the proof of this result here instead of in the appendix, not only because it is relatively elementary, but also because the proof discloses why there are essential differences between the gain and loss regions.)
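As a quick numerical sanity check (our own illustration, not from the paper), the two limits behind the loss-region argument, w_−(p)/w_−(p/2) → 2^{δ_−} and the utility-increment ratio → 2, can be verified directly with the Tversky-Kahneman loss-side estimates:

```python
# Check of the two limits used in the loss-region argument, using the
# Tversky-Kahneman loss-side estimates alpha_- = 0.88, delta_- = 0.69
# for the parametric forms (1)-(2).
alpha_m, delta_m = 0.88, 0.69
w = lambda p: p**delta_m / (p**delta_m + (1 - p)**delta_m) ** (1 / delta_m)
u = lambda x: x ** alpha_m

# w_-(p) / w_-(p/2) -> 2**delta_- as p -> 0:
weight_ratio = w(1e-9) / w(0.5e-9)

# [u_-(T+1) - u_-(T-1)] / [u_-(T) - u_-(T-1)] -> 2 as T -> infinity:
growth_ratio = (u(1001) - u(999)) / (u(1000) - u(999))

# Since 2**delta_- < 2, the bracketed term is eventually negative,
# so stopping (q = 0) dominates playing one more round in the loss region.
bracket = weight_ratio - growth_ratio
```

With these estimates the weight ratio is already close to 2^{0.69} ≈ 1.61 while the utility-increment ratio is close to 2, so the bracket is negative, in line with the theorem's part (b).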
35n the preceding discussions we assume that an original (i.e., before the horizon is ex-tended) strategy has resulted in the maximum possible gain or loss. We now investigate thesituations when the strategy ends up with an intermediate state with a mild accumulatedgain or loss. Specifically, let τ ∈ T T be a given exiting strategy and n < T be a gain state.Assume p n := P ( S τ = n ) > p n +1 := P ( S τ ≥ n + 1) > T, n ) under τ , namely τ = T and S τ = n .Now, with an additional round of play granted, we denote by τ (cid:48) the strategy modifying theoriginal τ by, given τ = T and S τ = n , playing one more round with probability 1 − r n attime T , where r n ∈ [0 , q n = − r n ∈ [0 , ]. An argument similar to the case of n = T yields that the extra CPT value due to the possible additional round of play, as a functionof q n , is w + ( q n p n + ¯ p n +1 )[ u + ( n + 1) − u + ( n )] + w + (cid:0) (1 − q n ) p n + ¯ p n +1 (cid:1) [ u + ( n ) − u + ( n − , (14)whose first-order derivative is (cid:8) w (cid:48) + ( q n p n + ¯ p n +1 )[ u + ( n + 1) − u + ( n )] − w (cid:48) + (cid:0) (1 − q n ) p n + ¯ p n +1 (cid:1) [ u + ( n ) − u + ( n − (cid:9) p n = [ u + ( n + 1) − u + ( n )] w (cid:48) + ((1 − q n ) p n + ¯ p n +1 ) p n (cid:104) w (cid:48) + ( q n p n +¯ p n +1 ) w (cid:48) + ((1 − q n ) p n +¯ p n +1 ) − u + ( n ) − u + ( n − u + ( n +1) − u + ( n ) (cid:105) . (15)The necessary condition for a maximum q ∗ n is thus w (cid:48) + ( q ∗ n p n + ¯ p n +1 ) w (cid:48) + (cid:0) (1 − q ∗ n ) p n + ¯ p n +1 (cid:1) = u + ( n ) − u + ( n − u + ( n + 1) − u + ( n ) . (16)Assume n is sufficiently large so that p n + ¯ p n +1 ≡ P ( S τ ≥ n ) falls into the concaveregion of w + ( · ). Because u + ( n ) − u + ( n − u + ( n +1) − u + ( n ) > u + ( · ), q ∗ n = willnever satisfy (16); hence r n (cid:54) = 0 or the gambler will not continue decisively. 
Moreover, if w (cid:48) + (¯ p n +1 ) w (cid:48) + ( p n +¯ p n +1 ) > u + ( n ) − u + ( n − u + ( n +1) − u + ( n ) , then there is q ∗ n ∈ (0 , ) such that (16) holds, in which case r n ∈ (0 ,
1), indicating that the gambler will randomize. On the other hand, if

    w'_+(p̄_{n+1}) / w'_+(p_n + p̄_{n+1}) ≤ (u_+(n) − u_+(n−1)) / (u_+(n+1) − u_+(n)),

then it follows from (15) that (14) is a non-increasing function of q_n; so its maximal value is achieved at q_n = 0 (and hence r_n = 1). This is in stark contrast to the case when n = T: at some intermediate gain state n, the gambler may indeed choose to stop even if the time horizon is extended. This is exemplified by the black node (9, 1) in the left panel of Figure 10.

For an intermediate loss state −n > −T, a similar analysis yields that randomization with q_n ∈ (0, 1/2) is again dominated. It is possible that q*_{−n} = 1/2 (resp. r*_{−n} = 0), in which case the gambler will continue for sure if the time horizon is extended. This is different from the case of the maximal loss state −n = −T.

While a precommitted gambler follows the optimal strategy determined at time 0, a naïve gambler constantly deviates from it. We have shown in Subsection 4.2 that, under the parameter specification therein, the naïveté's actual behavior changes from the originally planned loss-exit strategy to an eventual gain-exit one. Numerically, the naïveté's strategy can be obtained by computing each time-t precommitted strategy, carrying it out for just one period, and then pasting these one-period strategies together; see Subsection 4.2 for details. We apply this scheme to the first three groups of parameters studied in Subsection 5.2 and draw the naïve strategies in the right panels of Figures 8–10.

The problem in Figure 8 has the same parameter values as that in Figure 6 but a longer horizon. The changes from the left panels to the right ones in the two figures are qualitatively the same, namely, the naïveté eventually turns a loss-exit strategy into a gain-exit one. The same happens in Figure 10. In Figure 9, the two panels are almost identical (both are gain-exit) except for the two lowest nodes at t = 8, 9. This is because the difference in behaviors of the precommitter and the naïveté emanates from time-inconsistency, which in turn stems from probability weighting. In this case, the strength of probability weighting is very low, with δ_± = 0.95, leading to a lower level of time-inconsistency than in the other two cases and hence the high similarity between the precommitted and naïve strategies.

It is very interesting to note that, in all the cases, the naïve gambler's behavior is consistent, irrespective of the underlying parameter specifications: once he enters the casino he always takes gain-exit strategies, reminiscent of the disposition effect in security trading. In particular, he never stops loss and gambles "until the bitter end" (Ebert and Strack, 2015).

(Footnote: In the right panel of Figure 10, all the nodes with state x = 1 are black, which "block" the gambler from accessing the nodes beyond state 1. This is why the nodes above state 1 are also all black.)

(Footnote: We reiterate that the result of Ebert and Strack (2015) depends critically on the assumption that the gambler can construct arbitrarily small random payoffs, which is possibly valid only in a continuous-time model. The finding that "gamble until the bitter end" is also present in the discrete-time casino model suggests that the behavior is probably more prevalent, characterizing broadly a naïveté, be it a gambler or an investor.)

We now provide a theory that explains this phenomenon. Suppose a naïve gambler has accumulated a gain equal to x > 0 at time T − 1, the date just before the terminal one. Then his decision problem regarding whether he should quit at T − 1 is to choose q ∈ [0, 1/2] to maximize

    g(q) := (u_+(x+1) − u_+(x)) w_+(q) − (u_+(x) − u_+(x−1)) (1 − w_+(1−q)),

where, as before, q = (1 − r)/2 and r ∈ [0, 1] is the probability to stop. Suppose w_+ satisfies the so-called subcertainty, i.e., 1 − w_+(1−p) ≥ w_+(p) for p ∈ [0, 1/2]. Then

    g(q) ≤ ((u_+(x+1) − u_+(x)) − (u_+(x) − u_+(x−1))) w_+(q) ≤ 0,

where the second inequality follows from the concavity of u_+, while the equality is achieved when q = 0, corresponding to the decision to "stop". We have established the following result.

Proposition 5.2  Assume that w_+ satisfies subcertainty. Then it is optimal for a naïve gambler to stop in gain at T − 1.

Next, suppose the naïveté has accumulated a loss −x < 0 at time T − 1. His decision problem to continue or stop at T − 1 is to choose q ∈ [0, 1/2] to minimize

    l(q) := (u_−(x+1) − u_−(x)) w_−(q) − (u_−(x) − u_−(x−1)) (1 − w_−(1−q)).

Suppose the probability weighting function w_− is differentiable and w'_−(1−p)/w'_−(p) ≥ 1 for p ∈ [0, 1/2], with the ratio understood in the limit sense at p = 0. A straightforward calculation verifies that this condition is satisfied by the Tversky–Kahneman weighting function. Then

    l'(q) = (u_−(x+1) − u_−(x)) w'_−(q) − (u_−(x) − u_−(x−1)) w'_−(1−q)
          = (u_−(x) − u_−(x−1)) w'_−(q) ( (u_−(x+1) − u_−(x)) / (u_−(x) − u_−(x−1)) − w'_−(1−q)/w'_−(q) )
          ≤ (u_−(x) − u_−(x−1)) w'_−(q) ( (u_−(x+1) − u_−(x)) / (u_−(x) − u_−(x−1)) − 1 ) ≤ 0,

where the last inequality comes from the concavity of u_−. As a result, l(q) is non-increasing in q ∈ [0, 1/2] and the minimum is achieved when q = 1/2, corresponding to the "continue" decision.
Proposition 5.3
Assume that w_− is differentiable and w'_−(1−p)/w'_−(p) ≥ 1 for p ∈ [0, 1/2]. Then it is optimal for a naïve gambler to continue in loss at T − 1.

A corollary of Proposition 5.3 is that the naïveté will definitely continue, as long as he is in a loss, even if there is only one round of play left, let alone when a longer horizon is allowed. As a consequence, he will not stop loss in any case, until the bitter end.
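The two weighting-function conditions invoked in Propositions 5.2 and 5.3 (subcertainty of w_+, and w'_−(1−p) ≥ w'_−(p) on (0, 1/2]) can be checked numerically for the Tversky–Kahneman weighting. The sketch below is an illustration, using the original Tversky–Kahneman estimates δ_+ = 0.61 and δ_− = 0.69, and confirms both conditions on a grid.

```python
# Numerical check of the weighting-function conditions in Propositions 5.2-5.3
# for the Tversky-Kahneman weighting w(p) = p^d / (p^d + (1-p)^d)^(1/d).

def tk_w(p, delta):
    return p ** delta / (p ** delta + (1 - p) ** delta) ** (1 / delta)

def tk_dw(p, delta, h=1e-6):
    # central-difference approximation of w'(p)
    return (tk_w(p + h, delta) - tk_w(p - h, delta)) / (2 * h)

ps = [i / 1000 for i in range(1, 500)]          # p in (0, 1/2)

# Subcertainty of w+ (Proposition 5.2): w+(p) + w+(1-p) <= 1.
subcertainty_ok = all(tk_w(p, 0.61) + tk_w(1 - p, 0.61) <= 1 for p in ps)

# Condition of Proposition 5.3: w-'(1-p) >= w-'(p) on (0, 1/2].
loss_condition_ok = all(tk_dw(1 - p, 0.69) >= tk_dw(p, 0.69) for p in ps)

print(subcertainty_ok, loss_condition_ok)
```

Note that as p → 0 the ratio w'_−(1−p)/w'_−(p) tends to 1/δ_− > 1, so the condition also holds in the limit at p = 0.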
A sophisticated gambler is unable to precommit and realizes that her future selves will deviate from whatever plans she makes now. Her resolution is to compromise and choose consistent planning, in the sense that she optimizes taking the future disobedience as a constraint. Consequently, strategies of sophisticated gamblers can be obtained by backward induction as in dynamic programming.

To start, we note that at T − 1 a sophisticated gambler and a naïve one face the same problem; hence we have the following immediate result.
Proposition 5.4
Propositions 5.2 and 5.3 hold true for a sophisticated gambler.
Next, we derive a sophisticated gambler's stopping strategies for the four cases studied in Subsection 5.2, where T = 10. It turns out that, of the four cases, she will enter the casino only in the case in which δ_± is close to 1: there the level of probability weighting is low, hence so is that of time-inconsistency, leading to similar strategies for all three types of gamblers.

Note that in the case above, the sophisticated gambler takes the gain-exit type of strategy. Indeed, so long as she enters the casino, she essentially stops in gain under some mild conditions. This follows from the following argument: by Proposition 5.4, the sophisticated gambler will stop in gain at T − 1; knowing this, she will also stop in gain at earlier dates. Moreover, at each node a sophisticated gambler solves the same problem as a naïve one, but with constraints from her future selves' decisions. Hence, if the latter finds that stopping immediately is optimal at a current node, so will the former, because the strategy of an immediate stop automatically satisfies the aforementioned constraints. Proposition 5.5
Under any specification of parameters, a sophisticated gambler stops no later than a naïve gambler does.
An implication of this result is that the naïveté is at least as risk-taking as the sophisticated gambler, if not more.
This section explores the connection between the finite horizon and infinite horizon casino models. Define

    T_∞ := {τ ∈ [0, ∞) : τ is an (F_t)_{t∈N}-stopping time},

which is the set of admissible stopping strategies (allowing randomization) in the infinite time horizon. Suppose τ ∈ T_∞ is optimal for the infinite horizon model and achieves a finite CPT value. Then we have

    V(S_{τ∧T}) ≤ sup_{σ∈T_T} V(S_σ) ≤ V(S_τ).

We see immediately that the value of the finite horizon model converges to that of the infinite horizon one as the horizon approaches infinity. The following makes this formal.
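Before the formal statement, here is a quick numerical sanity check of this convergence (an illustration with hypothetical parameters, not a computation from the paper): take τ* to be the first exit time of the walk from (−1, 2), for which S_{τ*} has the gambler's-ruin law, and compare the CPT value of S_{τ*∧T} with that of S_{τ*} as T grows.

```python
# CPT specification with Tversky-Kahneman forms; the parameter values are illustrative.
def w(p, delta=0.65):
    return p ** delta / (p ** delta + (1 - p) ** delta) ** (1 / delta) if p > 0 else 0.0

def cpt_value(dist, alpha=0.88, lam=2.25):
    """V(X) for an integer-valued payoff X given as {outcome: probability}."""
    v = 0.0
    gmax = max((x for x in dist if x > 0), default=0)
    lmax = max((-x for x in dist if x < 0), default=0)
    for n in range(1, gmax + 1):   # gain part: distorted tail probabilities
        v += n ** alpha * (w(sum(q for x, q in dist.items() if x >= n))
                           - w(sum(q for x, q in dist.items() if x >= n + 1)))
    for n in range(1, lmax + 1):   # loss part
        v -= lam * n ** alpha * (w(sum(q for x, q in dist.items() if x <= -n))
                                 - w(sum(q for x, q in dist.items() if x <= -n - 1)))
    return v

def truncated_law(T):
    """Exact law of S_{tau* ^ T}: absorb the walk at -1 and 2, force quitting at T."""
    alive, dist = {0: 1.0}, {}
    for _ in range(T):
        nxt = {}
        for x, p in alive.items():
            for y in (x - 1, x + 1):
                target = dist if y in (-1, 2) else nxt
                target[y] = target.get(y, 0.0) + p / 2
        alive = nxt
    for x, p in alive.items():     # mass still gambling at T must stop
        dist[x] = dist.get(x, 0.0) + p
    return dist

v_inf = cpt_value({2: 1 / 3, -1: 2 / 3})   # gambler's-ruin law of S_{tau*} from 0
errors = [abs(cpt_value(truncated_law(T)) - v_inf) for T in (10, 20, 40)]
print(errors)   # the approximation error shrinks as the horizon grows
```

The unabsorbed mass decays geometrically here, so the finite horizon values approach the infinite horizon value rapidly, in line with the theorem below.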
Theorem 5.2
Assume τ* achieves the optimal value of the gambling model in the infinite time horizon with τ* < ∞ a.s., V(S_{τ*}) = v* < ∞, and S_{τ*} lower-bounded a.s. Then

    lim_{T→∞} sup_{τ∈T_T} V(S_τ) = lim_{T→∞} V(S_{τ*∧T}) = v*.

We stress that this result only reveals the relationship between the two models in terms of the optimal values. It does not offer a solution to the finite horizon problem (which is harder) from a solution to the infinite horizon one (which is comparatively easier), nor does it tell the error in the optimal values when T is given and fixed. That said, the result suggests that the optimal value of the infinite horizon model is an upper bound of that of the finite horizon one, and it is a tight upper bound if T is sufficiently large. Moreover, while the truncation method mentioned earlier does not provide an exact optimal solution to the finite horizon model, it does nevertheless offer a good solution when T is large enough.

In this paper we develop a systematic approach to studying the stopping behaviors of CPT gamblers in a finite time horizon. We hope that this work opens an avenue to a thorough understanding of Barberis (2012)'s model and beyond. Indeed, as Barberis (2012) points out, casino gambling is not an isolated model requiring a unique treatment; rather, it is just one of many examples, including ones in financial markets, that share the common feature of probability weighting.

References
Agranov, M. and Ortoleva, P. (2017). Stochastic choice and preferences for randomization, Journal of Political Economy (1): 40–68.
Anderson, C. J. (2003). The psychology of doing nothing: Forms of decision avoidance result from reason and emotion, Psychological Bulletin: 139–167.
Barberis, N. (2012). A model of casino gambling, Management Science (1): 35–51.
Blavatskyy, P. R. (2006). Violations of betweenness or random errors?, Economics Letters (1): 34–38.
Bolton, G. E., Brandts, J. and Ockenfels, A. (2005). Fair procedures: Evidence from games involving lotteries, Economic Journal: 1054–1076.
Camerer, C. F. and Ho, T.-H. (1994). Violations of the betweenness axiom and nonlinearity in probability, Journal of Risk and Uncertainty: 167–196.
Diecidue, E., Schmidt, U. and Wakker, P. P. (2004). The utility of gambling reconsidered, Journal of Risk and Uncertainty: 241–259.
Dwenger, N., Kübler, D. and Weizsäcker, G. (2013). Flipping a coin: Theory and evidence. Working Paper. URL: http://ssrn.com/abstract=2353282.
Ebert, S. and Strack, P. (2015). Until the bitter end: On prospect theory in a dynamic context, American Economic Review (4): 1618–1633.
He, X. D., Hu, S., Obłój, J. and Zhou, X. Y. (2017). Path-dependent and randomized strategies in Barberis' casino gambling model, Operations Research (1): 97–103.
He, X. D., Hu, S., Obłój, J. and Zhou, X. Y. (2019a). Optimal exit time from casino gambling: Strategies of pre-committed and naive gamblers, SIAM Journal on Control and Optimization (3): 1845–1868.
He, X. D., Hu, S., Obłój, J. and Zhou, X. Y. (2019b). Two explicit Skorokhod embeddings for simple symmetric random walk, Stochastic Processes and their Applications (9): 3431–3435.
Heimer, R., Iliewa, Z., Imas, A. and Weber, M. (2020). Dynamic inconsistency in risky choice: Evidence from the lab and field. Working Paper. URL: https://ssrn.com/abstract=3600583.
Henderson, V., Hobson, D. and Tse, A. (2017). Randomized strategies and prospect theory in a dynamic context, Journal of Economic Theory (3): 287–300.
Kahneman, D., Knetsch, J. L. and Thaler, R. (1986). Fairness as a constraint on profit seeking: Entitlements in the market, American Economic Review: 728–741.
Kahneman, D. and Tversky, A. (1979). Prospect theory: An analysis of decision under risk, Econometrica (2): 263–291.
Leonhardt, J. M., Keller, R. L. and Pechmann, C. (2011). Avoiding the risk of responsibility by seeking uncertainty: Responsibility aversion and preference for indirect agency when choosing for others, Journal of Consumer Psychology: 405–413.
Obłój, J. (2004). The Skorokhod embedding problem and its offspring, Probability Surveys: 321–392.
Odean, T. (1998). Are investors reluctant to realize their losses?, Journal of Finance (5): 1775–1798.
Root, D. H. (1969). The existence of certain stopping times on Brownian motion, The Annals of Mathematical Statistics (2): 715–718.
Rost, H. (1976). Skorokhod stopping times of minimal variance, Séminaire de Probabilités X, Vol. 511 of Lecture Notes in Mathematics, Springer, pp. 194–208.
Rubinstein, A. (2002). Irrational diversification in multiple decision problems, European Economic Review: 1369–1378.
Shiryaev, A. (1978). Optimal Stopping Rules, Springer-Verlag, New York.
Strotz, R. (1955). Myopia and inconsistency in dynamic utility maximization, The Review of Economic Studies: 165–180.
Tversky, A. and Kahneman, D. (1992). Advances in prospect theory: Cumulative representation of uncertainty, Journal of Risk and Uncertainty (4): 297–323.
Xu, Z. Q. and Zhou, X. Y. (2012). Optimal stopping under probability distortion, Annals of Applied Probability (1): 251–282.
Yong, J. and Zhou, X. Y. (1999). Stochastic Controls: Hamiltonian Systems and HJB Equations, Springer, New York.
Zeelenberg, M. and Pieters, R. (2007). A theory of regret regulation 1.0, Journal of Consumer Psychology (1): 3–18.

Appendix

A Proof of Theorem 3.1
We prove this theorem through a series of results. We start by recalling some properties of the potential and its link to the first exit times.
Proposition A.1
Let τ be an (F_t)_{t∈N}-stopping time such that {S_{τ∧t} : t ∈ N} is uniformly integrable. Then

(i) For any t ∈ N, U_{S_{τ∧t}} is a convex function and

    U_{S_τ}(x) ≥ U_{S_{τ∧t}}(x) ≥ |x| ∀x ∈ R, with U_{S_{τ∧t}}(x) = |x| ∀x ∉ (−t, t).

(ii) For any two integers a < b and ρ := inf{u ≥ τ : S_u ∉ (a, b)},

    U_{S_ρ}(x) = U_{S_τ}(x) ∀x ∉ (a, b), and U_{S_ρ} is linear on [a, b].

(iii) Fix t ≥ 1 and let K := {k ∈ Z : k = t − 1 − 2j, j ∈ Z}. Then

    U_{S_{τ∧t}}(x) = U_{S_{τ∧(t−1)}}(x) + P(S_{t−1} = x, τ ≥ t) 1_{x∈K} ∀x ∈ Z.   (17)

In particular, if t is odd, then U_{S_{τ∧t}}(x) = U_{S_{τ∧(t−1)}}(x) for any odd x; and if t is even, then U_{S_{τ∧t}}(x) = U_{S_{τ∧(t−1)}}(x) for any even x.
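Identity (17) can be checked numerically for a concrete stopping time. The sketch below is an illustration not taken from the paper: it lets τ be the first exit time of the walk from (−2, 2), propagates the exact law of S_{τ∧t} in rational arithmetic, and verifies (17) for t ≤ 4.

```python
from fractions import Fraction as F

A, B = -2, 2          # tau = first exit time of the SSRW from (A, B)

def step(alive, stopped):
    """One step of the walk; mass leaving (A, B) is absorbed."""
    nxt, stp = {}, dict(stopped)
    for x, p in alive.items():
        for y in (x - 1, x + 1):
            if A < y < B:
                nxt[y] = nxt.get(y, F(0)) + p / 2
            else:
                stp[y] = stp.get(y, F(0)) + p / 2
    return nxt, stp

def potential(law):
    """U_mu(x) = E|X - x| evaluated on a range of integers."""
    return {x: sum(p * abs(k - x) for k, p in law.items()) for x in range(A - 1, B + 2)}

alive, stopped = {0: F(1)}, {}
laws, alives = [{**stopped, **alive}], [dict(alive)]   # law of S_{tau ^ t}; interior mass
for _ in range(4):
    alive, stopped = step(alive, stopped)
    laws.append({**stopped, **alive})
    alives.append(dict(alive))

# Check (17): U_{S_{tau^t}}(x) - U_{S_{tau^(t-1)}}(x) = P(S_{t-1} = x, tau >= t) 1_{x in K}
ok = True
for t in range(1, 5):
    Ut, Ut1 = potential(laws[t]), potential(laws[t - 1])
    for x in range(A - 1, B + 2):
        in_K = (x - (t - 1)) % 2 == 0          # K: integers with the parity of t - 1
        expected = alives[t - 1].get(x, F(0)) if in_K else F(0)
        ok &= (Ut[x] - Ut1[x] == expected)
print(ok)  # True
```

The exact `Fraction` arithmetic rules out floating-point artifacts, so the identity holds exactly at every site and date checked.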
Proof  The first two properties are standard; see Obłój (2004, Section 2). So we only establish (iii). Note that S_{t−1} is supported on K. We have |S_{τ∧t} − S_{τ∧(t−1)}| ≤ 1 and {S_{τ∧t} ≥ x} = {S_{τ∧(t−1)} ≥ x} ∀x ∉ K. In particular, since S is a martingale, we have U_{S_{τ∧t}}(x) = U_{S_{τ∧(t−1)}}(x) ∀x ∉ K. Now take x ∈ K. Since x + 1, x − 1 ∉ K, using (5), we have

    P(S_{τ∧t} = x) = (U_{S_{τ∧t}}(x+1) + U_{S_{τ∧t}}(x−1))/2 − U_{S_{τ∧t}}(x)
                   = (U_{S_{τ∧(t−1)}}(x+1) + U_{S_{τ∧(t−1)}}(x−1))/2 − U_{S_{τ∧t}}(x)
                   = P(S_{τ∧(t−1)} = x) + U_{S_{τ∧(t−1)}}(x) − U_{S_{τ∧t}}(x).   (18)

Rearranging and observing that P(S_t = x, τ ≥ t) = 0, the thesis follows. □

The following proposition provides some useful properties of U_t.

Proposition A.2
Let μ ∈ P(Z) and U_t = U_t^μ. Then

(i) U_0(x) ≤ U_t(x) ≤ U_μ(x) ∧ U_{S_t}(x) ∀x ∈ Z, t ∈ N.

(ii) U_t(x) = U_{t+1}(x) when t is odd and x is even, or when t is even and x is odd.

(iii) U_t(x) is convex in x ∈ R and non-decreasing in t ∈ N.

Proof  (i) By the construction of U_t we have U_0(x) ≤ U_t(x) ≤ U_μ(x). On the other hand, by (5) and the structure of the SSRW, namely P(S_t = x) = (P(S_{t−1} = x−1) + P(S_{t−1} = x+1))/2, one can show easily that U_{S_t}(x) = (U_{S_{t−1}}(x−1) + U_{S_{t−1}}(x+1))/2 for x ∈ Z. Then, by induction, we have U_t(x) ≤ U_{S_t}(x).

(ii) Again, by construction we have U_1(x) = U_0(x) for all odd x and U_2(x) = U_1(x) for all even x. The conclusions follow immediately from induction.

(iii) Clearly U_0 is convex. Suppose U_t is convex and fix m ∈ Z. If we put Ũ(x) = U_t(x) for x ∈ Z \ {m}, pick any

    Ũ(m) ∈ [U_t(m), (U_t(m−1) + U_t(m+1))/2],

and finally define Ũ by a linear interpolation for x ∈ R, then Ũ is convex. Observe that U_{t+1} is obtained exactly by repeating this procedure for all m ∈ Z and, hence, is also convex. Moreover, it now follows, by its definition, that U_t(x) is non-decreasing in t. □

Proposition A.3  Let T ≥ 1 and μ ∈ P(Z) be such that μ([−T, T]) = 1, and let U_t = U_t^μ be defined in (6). Then the following are equivalent:

(i) U_T(x) = U_μ(x) ∀x ∈ Z.

(ii) There exists a randomized Root stopping time τ_R(b, r) such that τ_R(b, r) ≤ T and U_{S_{τ_R(b,r)∧t}} = U_t ∀t ≤ T; in particular, S_{τ_R(b,r)} ∼ μ.

(iii) There exists τ ∈ T_T such that S_τ ∼ μ.

Furthermore, for any τ ∈ T_T such that S_τ ∼ μ we have U_{S_{τ∧t}}(x) ≤ U_t(x) ∀x ∈ R, t ≤ T.
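The construction underlying this equivalence can be made concrete numerically. The sketch below is an illustration only: in particular, the recursion generating U_t (U_0(x) = |x| and U_{t+1}(x) = min{(U_t(x−1) + U_t(x+1))/2, U_μ(x)}) is our reading of definition (6), which is not reproduced in this excerpt. For a toy zero-mean target law μ, the code builds the barrier (19) and the randomization probabilities (20), and checks that the resulting randomized Root rule embeds μ by time T = 2.

```python
from fractions import Fraction as F

# Toy target law mu on Z with zero mean, embeddable by time T = 2.
mu = {-2: F(1, 8), -1: F(1, 4), 0: F(1, 4), 1: F(1, 4), 2: F(1, 8)}
T = 2
xs = list(range(-T - 1, T + 2))                 # the walk stays in [-T, T]

def U_mu(x):
    """Potential of mu: U_mu(x) = E|X - x|."""
    return sum(p * abs(k - x) for k, p in mu.items())

# Assumed recursion for (6): U_0(x) = |x|, U_{t+1}(x) = min{avg of neighbors, U_mu(x)}.
U = [{x: F(abs(x)) for x in xs}]
for t in range(T):
    level = {x: min((U[t][x - 1] + U[t][x + 1]) / 2, U_mu(x)) for x in range(-T, T + 1)}
    for x in (-T - 1, T + 1):
        level[x] = F(abs(x))
    U.append(level)

def r_of(t, x):
    """P(xi_{t,x} = 0): probability of stopping at (t, x), following (19)-(20)."""
    if t >= T:
        return F(1)                   # horizon reached: stop
    if t < abs(x) or U[t + 1][x] < U_mu(x):
        return F(0)                   # barrier b(x) not yet reached: continue
    num = U[t][x - 1] + U[t][x + 1] - 2 * U_mu(x)
    den = U[t][x - 1] + U[t][x + 1] - 2 * U[t][x]
    return F(1) if den == 0 else num / den   # den = 0 only when U_mu(x) = |x|

# Propagate the walk under the randomized Root rule and collect the stopped law.
dist, alive = {}, {0: F(1)}
for t in range(T + 1):
    nxt = {}
    for x, p in alive.items():
        r = r_of(t, x)
        if r > 0:
            dist[x] = dist.get(x, F(0)) + p * r
        if r < 1:
            for y in (x - 1, x + 1):
                nxt[y] = nxt.get(y, F(0)) + p * (1 - r) / 2
    alive = nxt

print(dist == mu)  # True: the randomized Root rule embeds mu by time T
```

In this example randomization genuinely occurs: at the barrier (t, x) = (1, ±1) the rule stops with probability 1/2, exactly the situation 0 < r(x) < 1 described in the proof.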
Proof of (i) → (ii). To show the existence of a randomized Root stopping time embedding μ we first construct its stopping barrier b. For x ∈ Z, define

    b(x) := inf{t ≥ |x| : U_{t+1}(x) = U_μ(x)}.   (19)

It follows from Proposition A.2 that b(x) = x + 2k for some k ∈ Z. Next define the probabilities of the binary random variables {ξ_{t,x}}, P(ξ_{t,x} = 0) = 1 − P(ξ_{t,x} = 1). For each x ∈ Z,

    P(ξ_{t,x} = 0) = 0 for t < b(x),
    P(ξ_{t,x} = 0) = r(x) := (U_t(x−1) + U_t(x+1) − 2U_μ(x)) / (U_t(x−1) + U_t(x+1) − 2U_t(x)) for t = b(x),
    P(ξ_{t,x} = 0) = 1 for t > b(x).   (20)

Note that r(x) = 1 is only possible if U_μ(x) = |x|, which happens for x outside of the support of μ. For other x we have U_μ(x) > |x| and a randomization, i.e., 0 < r(x) < 1, occurs at (t, x) when t = b(x) and

    (U_t(x−1) + U_t(x+1))/2 > U_μ(x) > U_t(x).

Let τ = τ_R(b, r) be the randomized Root stopping time in (7). By (i), U_T ≥ U_μ and hence b(x) ≤ T − 1 ∀x ∈ Z ∩ (−T, T). It follows that τ ≤ T as required.

To show S_τ ∼ μ, we need only to establish U_{S_τ}(x) = U_μ(x) ∀x ∈ Z. Note U_{S_{τ∧0}}(x) = U_0(x). Suppose we have U_{S_{τ∧t}}(x) = U_t(x) for t ≤ n − 1. It follows from (5) that

    P(S_{τ∧t} = x) = (U_t(x+1) + U_t(x−1))/2 − U_t(x), t ≤ n − 1.

On the other hand, by Proposition A.1, we have

    U_{S_{τ∧n}}(x) = U_{S_{τ∧(n−1)}}(x) + P(S_{n−1} = x, τ ≥ n) 1_{x∈K} ∀x ∈ Z,

where K = {k ∈ Z : k = n − 1 − 2j, j ∈ Z}.

If U_{n−1}(x) = U_μ(x) = U_n(x), then b(x) < n − 1 and P(S_{n−1} = x, τ ≥ n) = 0; hence

    U_{S_{τ∧n}}(x) = U_{S_{τ∧(n−1)}}(x) = U_{n−1}(x) = U_μ(x) = U_n(x).

If U_{n−1}(x) < U_μ(x) = U_n(x), then b(x) = n − 1 and x ∈ K. We have, by definition,

    P(S_{n−1} = x, τ ≥ n) = P(S_{τ∧(n−1)} = x) P(ξ_{n−1,x} = 1)
                          = ((U_{n−1}(x+1) + U_{n−1}(x−1))/2 − U_{n−1}(x)) P(ξ_{n−1,x} = 1).

It then follows that

    U_{S_{τ∧n}}(x) = U_{S_{τ∧(n−1)}}(x) + P(S_{n−1} = x, τ ≥ n) 1_{x∈K}
                   = U_{n−1}(x) P(ξ_{n−1,x} = 0) + ((U_{n−1}(x+1) + U_{n−1}(x−1))/2) P(ξ_{n−1,x} = 1)
                   = U_{n−1}(x) (U_{n−1}(x−1) + U_{n−1}(x+1) − 2U_μ(x)) / (U_{n−1}(x−1) + U_{n−1}(x+1) − 2U_{n−1}(x))
                     + ((U_{n−1}(x+1) + U_{n−1}(x−1))/2) (2U_μ(x) − 2U_{n−1}(x)) / (U_{n−1}(x−1) + U_{n−1}(x+1) − 2U_{n−1}(x))
                   = U_μ(x) = U_n(x).

Finally, if U_n(x) < U_μ(x), then b(x) > n − 1. By definition, we have U_n(x) = (U_{n−1}(x+1) + U_{n−1}(x−1))/2 and P(S_τ = x) = P(S_τ = x, τ ≥ n). Consequently,

    P(S_{n−1} = x, τ ≥ n) = P(S_{τ∧(n−1)} = x) = (U_{n−1}(x+1) + U_{n−1}(x−1))/2 − U_{n−1}(x).

If x ∈ K, then

    U_{S_{τ∧n}}(x) = U_{S_{τ∧(n−1)}}(x) + P(S_{n−1} = x, τ ≥ n) 1_{x∈K}
                   = U_{n−1}(x) + (U_{n−1}(x+1) + U_{n−1}(x−1))/2 − U_{n−1}(x)
                   = (U_{n−1}(x+1) + U_{n−1}(x−1))/2 = U_n(x).

If x ∉ K, then, noting that P(S_τ = x, τ < n) = 0, we have P(S_{τ∧(n−1)} = x) = 0. As a result, (U_{n−1}(x+1) + U_{n−1}(x−1))/2 = U_{n−1}(x) and

    U_{S_{τ∧n}}(x) = U_{S_{τ∧(n−1)}}(x) + P(S_{n−1} = x, τ ≥ n) 1_{x∈K} = U_{n−1}(x)
                   = (U_{n−1}(x+1) + U_{n−1}(x−1))/2 = U_n(x).

In summary, U_{S_{τ∧n}}(x) = U_n(x) ∀n ∈ Z_+. As a result,

    U_{S_τ}(x) = U_{S_{τ∧T}}(x) = U_T(x) = U_μ(x) ∀x ∈ Z,

namely, S_τ ∼ μ.

Proof of (ii) → (iii). This is trivial.

Proofs of (iii) → (i) and of the last assertion of the theorem. We start with the latter, assuming (iii) holds. Let τ ∈ T_T be such that S_τ ∼ μ. Note that U_{S_{τ∧0}} ≡ U_0. Suppose U_{S_{τ∧t}}(x) ≤ U_t(x) ∀x, for some t < T. Let S̃_t = |S_{τ∧t} − x|; then (S̃_t : t ≥ 0) is a submartingale. Hence,

    U_{S_{τ∧0}}(x) ≤ ... ≤ U_{S_{τ∧(t−1)}}(x) ≤ U_{S_{τ∧t}}(x) ≤ ... ≤ U_{S_{τ∧T}}(x) = U_μ(x) ∀x.

By (17), if x ∉ K, then U_{S_{τ∧t}}(x) = U_{S_{τ∧(t−1)}}(x) ≤ U_{t−1}(x) ≤ U_t(x); if x ∈ K and U_t(x) = U_μ(x), then U_{S_{τ∧t}}(x) ≤ U_μ(x) = U_t(x); and if x ∈ K and U_t(x) < U_μ(x), then

    U_{S_{τ∧t}}(x) ≤ (U_{S_{τ∧t}}(x−1) + U_{S_{τ∧t}}(x+1))/2
                   = (U_{S_{τ∧(t−1)}}(x−1) + U_{S_{τ∧(t−1)}}(x+1))/2
                   ≤ (U_{t−1}(x−1) + U_{t−1}(x+1))/2 = U_t(x),

where the first inequality is due to the convexity of U_{S_{τ∧t}}(·), the first equality is due to x − 1, x + 1 ∉ K, the second inequality follows from the induction hypothesis, and the last equality from the definition of U_t. This proves the last assertion of the theorem. Next, taking t = T and noting that τ ≤ T, we have U_μ = U_{S_{τ∧T}} ≤ U_T, which shows (iii) → (i). □

We are now ready to prove Theorem 3.1. The "only if" part follows immediately from Proposition A.3-(i) and the construction of U_T(x). To prove the "if" part, suppose (8) holds. First, we have U_T(x) = U_μ(x) = |x| for |x| ≥ T. For x = −(T−2), −(T−4), ..., T−4, T−2, it follows from (8) that U_T(x) = ((U_{T−1}(x+1) + U_{T−1}(x−1))/2) ∧ U_μ(x) = U_μ(x). Next, by Proposition A.2, U_T(x) = U_{T−1}(x) for all x with x = T + 2j for some j ∈ Z. As a result, for x = −(T−1), −(T−3), ..., T−3, T−1, we have U_μ(x+1) = U_T(x+1) = U_{T−1}(x+1), U_μ(x−1) = U_T(x−1) = U_{T−1}(x−1) and

    U_μ(x) ≤ (U_μ(x+1) + U_μ(x−1))/2 = (U_{T−1}(x+1) + U_{T−1}(x−1))/2,

where the inequality is due to the convexity of U_μ, and it follows that there exists a randomized Root stopping time that embeds μ in the random walk within the finite time T. We conclude that U_T(x) ≥ U_μ(x) ∀x ∈ Z and, hence, Proposition A.3 yields the desired result. □

B Proof of Proposition 5.1
Suppose that at time 0 the gambler takes a randomized strategy with probability r of "stop" and probability 1 − r of "continue", where r ∈ [0, 1]; set q = (1 − r)/2 ∈ [0, 1/2]. Recalling u(x) = u_+(x) for x ≥ 0 and u(x) = −λu_−(−x) for x < 0, the CPT value of this strategy is given by u_+(1)w_+(q) − λu_−(1)w_−(q), whose derivative in q is u_+(1)w'_+(q) − λu_−(1)w'_−(q). It follows from the assumption lim_{p→0}[w'_+(p)/w'_−(p)] > λ[u_−(1)/u_+(1)] that u_+(1)w_+(q) − λu_−(1)w_−(q) is strictly increasing in q ∈ [0, q̃] for some q̃ ∈ (0, 1/2]. Hence, for any q̄ ∈ (0, q̃],

    u_+(1)w_+(q̄) − λu_−(1)w_−(q̄) > 0;

that is, such a randomized strategy yields a strictly positive CPT value. □

C Proof of Theorem 5.2
For any T, we have

    V(S_{τ*∧T}) ≤ sup_{τ∈T_T} V(S_τ) ≤ V(S_{τ*}) = v*.

Since S_{τ*} is lower-bounded a.s., there exists N > 0 such that S_{τ*} > −N a.s. For any ε > 0, take M large enough such that

    Σ_{n=1}^{M} u_+(n) (w_+(P(S_{τ*} ≥ n)) − w_+(P(S_{τ*} ≥ n+1)))
    − λ Σ_{n=1}^{N} u_−(n) (w_−(P(S_{τ*} ≤ −n)) − w_−(P(S_{τ*} ≤ −n−1))) =: ṽ > v* − ε/2.

On the other hand, since τ* is finite a.s., the distribution of S_{τ*∧T} converges to that of S_{τ*}. Then there is a sufficiently large T such that

    V(S_{τ*∧T}) ≥ Σ_{n=1}^{M} u_+(n) (w_+(P(S_{τ*∧T} ≥ n)) − w_+(P(S_{τ*∧T} ≥ n+1)))
    − λ Σ_{n=1}^{N} u_−(n) (w_−(P(S_{τ*∧T} ≤ −n)) − w_−(P(S_{τ*∧T} ≤ −n−1))) > ṽ − ε/2 > v* − ε.

Since ε > 0 is arbitrary, the conclusion follows. □