A Bellman View of Jesse Livermore
arXiv [q-fin.TR], Oct

Nick Polson and Jan Hendrik Witte

Nick Polson, Booth School of Business, University of Chicago.
Jan Hendrik Witte, Mathematical Institute, University of Oxford.

Introduction

Richard Bellman’s Principle of Optimality, formulated in 1957, is the heart of dynamic programming, the mathematical discipline which studies the optimal solution of multi-period decision problems.

In his 1923 book Reminiscences of a Stock Operator, the legendary trader Jesse Livermore gave a detailed account of his trading methods.

In this article, we study some of Livermore’s trading rules, and we show that many of them are directly reflected in Bellman’s Principle of Optimality. Thus, in their striving for optimality, two of the greatest minds of the 20th century can be found to be neatly aligned.

Richard Bellman’s 1957 book on Dynamic Programming introduces his conceptual framework for the solution of multi-stage decision processes. While having multiple different mathematical formulations, the problems studied by Bellman all share the following main characteristics.

• There is a system, characterised at each stage by a set of parameters and state variables.
• At each stage of the process, we have a choice of a number of decisions.
• The effect of a decision is a transformation of the state variables.
• Past history is of no importance in determining future actions.
• The purpose is to maximise a function of the state variables.

In Bellman’s words, a “Policy” is any rule for making decisions which yields an allowable sequence of decisions; and an “Optimal Policy” is a policy which maximises a pre-assigned function of the final state variables. For every problem with the above listed properties, Bellman establishes the following rule.

Bellman’s Principle of Optimality: “An optimal policy has the property that, whatever the initial state and initial decision, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.”
It becomes immediately clear that Bellman’s criteria for optimality are reminiscent of a trading process, and that, therefore, the Principle of Optimality should also be applicable to trading.

The Principle of Optimality suggests that we study the \(Q\)-value matrix describing the value of performing action \(a\) in our current state \(s\), and then acting optimally henceforth. In this framework, let \(Q(s, a)\) denote the set of values available from current state \(s\) through action \(a\); e.g., interpret \(s\) as the agent’s current wealth, and \(a\) as a parametrisation of a long or short position (or any other action) he initiates. The current Optimal Policy and Value Function are given by
\[ a^\star(s) := \arg\max_a Q(s, a) \quad\text{and}\quad V(s) := \max_a Q(s, a) = Q(s, a^\star(s)), \]
respectively. Let \(s'\) denote the next state of the system, and let \(r(a, s, s')\) be the reward of the next state \(s'\) given current state \(s\) and action \(a\).

We would like to put the just introduced definitions into a sequential context. We will consider the path of a single market where trading takes place at discrete times. Time \(t = 0\) corresponds to the current time. The realised current state will be denoted by \(s_t\), the current action by \(a_t\), a future unobserved state by \(S_{t+1}\), and optimal policy and value functions are defined by
\[ a^\star(s_t) = \arg\max_a Q_{t+1}(s_t, a) \quad\text{and}\quad V(s_t) = Q_{t+1}(s_t, a^\star(s_t)), \]
respectively.

The Principle of Optimality now provides the key sequential identity of Dynamic Programming, namely that
\[ Q_{t+1}(s_t, a) = E\big[\, r(a, s_t, S_{t+1}) + V(S_{t+1}) \,\big|\, s_t, a \,\big]. \tag{1} \]
In Bellman’s words: whatever today’s state \(s_t\), and whatever today’s decision \(a\), today’s value \(Q_{t+1}(s_t, a)\) is based on (expected) optimal decision making with regard to the next state \(S_{t+1}\) which results from today’s state \(s_t\) and decision \(a\).

To find today’s optimal action, one has to solve the equilibrium condition (1) for the \(Q\)-matrix, and then read off the optimal action \(a^\star(s_t)\) that attains \(\max_a Q_{t+1}(s_t, a)\). For simplicity, we assume that \(a_t \in A\) takes only a finite set of possible values. (Equations (1) and (2) also allow for inclusion of a discounting factor, which, if required, should be incorporated in \(r(a, s_t, S_{t+1})\).)

However, in reality, there is a caveat: as the real-world probabilities are unknown to him, the trader has to evaluate (1) by taking expectations under his subjective probability distribution \(q(S_{t+1} \mid s_t, a)\), which describes his beliefs about the future path of state variables depending on his current wealth \(s_t\) and his action \(a\).

Thus, instead of (1), the trader will attempt to solve
\[ Q^q_{t+1}(s_t, a) = E_q\big[\, r(a, s_t, S_{t+1}) + V^q(S_{t+1}) \,\big|\, s_t, a \,\big], \tag{2} \]
where \(E_q[\cdot]\) and \(V^q(\cdot)\) denote the expectation and the value function taken with respect to the distribution \(q(S_{t+1} \mid s_t, a)\) of the trader’s beliefs.

Within the Bellman and Livermore optimal framework, we note a number of compelling features, which we summarise in the following remarks.

Remark 1.1.
It is surprising how little effect the distinction between (1) and (2) has on the actual trading process.
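One way to read this remark is through a toy version of the recursions (1) and (2). The sketch below is an illustration under stated assumptions, not the authors' model: the reward is taken to be a fixed ±`move` per period, independent of wealth, and the names `ACTIONS`, `q_values`, and `best_action` are hypothetical. In this setting the optimal action depends only on whether the probability of an up move lies above or below 0.5, so a true probability and a subjective belief on the same side of 0.5 lead to the same action.

```python
# Toy setting (an assumption): each period a long position earns +move or -move,
# a short position earns the opposite, and a neutral position earns nothing.
ACTIONS = {"long": 1.0, "neutral": 0.0, "short": -1.0}


def q_values(prob_up, horizon=1, move=10.0):
    """Backward induction on Q(s, a) = E[r + V(S')] with a state-independent reward."""
    value = 0.0  # terminal value function V
    q = {}
    for _ in range(horizon):
        expected_move = prob_up * move - (1.0 - prob_up) * move
        q = {name: sign * expected_move + value for name, sign in ACTIONS.items()}
        value = max(q.values())  # V(s) = max_a Q(s, a)
    return q


def best_action(prob_up, horizon=1):
    """Read off a*(s) = arg max_a Q(s, a)."""
    q = q_values(prob_up, horizon)
    return max(q, key=q.get)


true_p, subjective_q = 0.55, 0.70  # hypothetical values, both above 0.5
print(best_action(true_p, horizon=3), best_action(subjective_q, horizon=3))
```

The Q-values themselves differ between the two probabilities; only the arg max coincides, which is all the trading decision uses.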
Remark 1.2.
Rules based on deviations between realised market prices and a trader’s expectations have little place in assessing the optimal action \(a^\star\): arguments such as “sell because prices went higher than my expectations” do not enter the picture, which is a version of Livermore’s (1923) maxim that “the market is never wrong”. Put simply, we should not worry about the Bellman path of actions taken so far, or how we got there, but only about acting optimally from here on out.

Remark 1.3.
A large part of the Bellman and Livermore optimal policy insight is that the trader’s subjective beliefs in (2) must be updated conditionally on observed market prices. Because the market has a superior information set, when prices rise, the optimal action is to do nothing – in Livermore’s (1923) words, “one should hope, not fear” – and when prices fall, one should think of selling – “one should fear, not hope”.
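The article does not specify a mechanism for updating the subjective beliefs in (2). As a minimal sketch, assuming a pseudo-count (Beta-Bernoulli style) belief about the probability of an up move, the update on a single observed price move could look as follows. The functions `update_counts` and `belief_up` and the prior counts are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def update_counts(ups, downs, price_move):
    """Add the latest observed price move to the pseudo-counts behind the belief."""
    if price_move > 0:
        return ups + 1, downs
    if price_move < 0:
        return ups, downs + 1
    return ups, downs


def belief_up(ups, downs):
    """Subjective probability of an up move implied by the pseudo-counts."""
    return Fraction(ups, ups + downs)


prior = (6, 5)  # hypothetical prior: mildly bullish, belief 6/11 > 1/2
after_loss = update_counts(*prior, price_move=-10)
after_gain = update_counts(*prior, price_move=+10)

print(belief_up(*prior), belief_up(*after_loss), belief_up(*after_gain))
```

Under this assumed prior, a single down move is enough to remove the edge that motivated the long position, while an up move reinforces it. How strongly one move shifts the belief depends entirely on the assumed prior counts.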
We will look at the importance of Remarks 1.1, 1.2, and 1.3 in more detail in the next section.

2. Trading Principles

We discuss two of Jesse Livermore’s main trading rules, hoping to provide a modest insight into his trading principles.

2.1. Profits take care of themselves, losses never do.
Suppose there are only two possible market positions, Long and Neutral, denoted by \(a_L\) and \(a_N\), respectively. Short will be the reflection of Long.

Suppose a trader initiates a long position \(a_L\) at time \(t\) because, given his current wealth \(s_t\), he observes
\[ a_L = a^\star(s_t) = \arg\max_a Q^q_{t+1}(s_t, a) = \arg\max_a E_q\big[\, r(a, s_t, S_{t+1}) + V^q(S_{t+1}) \,\big|\, s_t, a \,\big]. \tag{3} \]
But, suppose that, at time \(t + 1\), he finds that \(r(a_L, s_t, S_{t+1}) < 0\), and that his new wealth now is \(S_{t+1} < s_t\). Then the trader has to evaluate whether to close his position (i.e., action \(a_N\)), or whether to hold on to his position (i.e., action \(a_L\)).

If we assume that the decision whether to be in or out of the market is independent of the current wealth level \(S_{t+1}\), then
\[ \arg\max_a Q^q_{t+2}(S_{t+1}, a) = \arg\max_a Q^q_{t+2}(s_t, a). \]
We observe that, if acting under an unchanged subjective probability \(q = q(S_{t+1} \mid s_t, a)\), then the trader will proceed with \(a_L\), since
\[ Q^q_{t+2}(S_{t+1}, a_L) > Q^q_{t+2}(S_{t+1}, a_N) \]
as before.

However, if we define
\[ Q^{q, x_{t+1}}_{t+2}(S_{t+1}, a) := E_q\big[\, r(a, s_t, S_{t+2}) + V^q(S_{t+2}) \,\big|\, s_t, a, x_{t+1} \,\big] \]
for \(a \in \{a_L, a_N\}\), then the trader now supplements his rationale (i.e., his subjective probability \(q\)) by the information contained in the most recent price move \(x_{t+1}\).

If we denote the trader’s updated views by \(q \cup \{x_{t+1}\}\), then, by a symmetry argument, we get that
\[ Q^{q \cup \{x_{t+1}\}}_{t+2}(S_{t+1}, a_L) < Q^{q \cup \{x_{t+1}\}}_{t+2}(S_{t+1}, a_N), \]
where, by assumption,
\[ r(a_L, s_t, S_{t+1}) < r(a_N, s_t, S_{t+1}). \]
We observe that, in absence of other external information besides the price move \(x_{t+1}\), and being consistent with his own former rationale, the trader should take a Neutral action and exit his position.

In reality, due to the cost of trading, the trader’s reaction will not be immediate. However, the implication of the Principle of Optimality is that the only thing of concern is selling the losing position.

The reflection of the above argument, in the case where \(r(a_L, s_t, S_{t+1}) > 0\), shows that next period’s optimal action is \(a_L\), the same as in the current period: the winners take care of themselves.

In Livermore’s (1923) words:

“Profits always take care of themselves, but losses never do. The speculator has to insure himself against considerable losses by taking the first small loss. In doing so, he keeps his account in order, so that, at some future time, when he has a constructive idea, he will be in a position to go into another deal, taking the same amount of stock he had when he was wrong.”
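The rule derived above (hold the position after a gain, exit after a loss) can be compared with holding regardless on a single price path. The sketch below is only an illustration under simplifying assumptions: trading costs are ignored, the trader does not re-enter after exiting, and the path of daily rewards is hypothetical. The function names `cut_losses_pnl` and `hold_regardless_pnl` are likewise made up for this example.

```python
def cut_losses_pnl(daily_rewards):
    """Hold the long position while it pays; take the first loss and exit."""
    total = 0.0
    for reward in daily_rewards:
        total += reward
        if reward < 0:
            break  # first losing day: switch from Long to Neutral and stay out
    return total


def hold_regardless_pnl(daily_rewards):
    """Keep the long position over the whole path, whatever happens."""
    return float(sum(daily_rewards))


# Hypothetical daily rewards of a long position: two gains, then three losses.
path = [10, 10, -10, -10, -10]
print(cut_losses_pnl(path), hold_regardless_pnl(path))
```

On this path the first rule keeps most of the early gains, while holding regardless gives them all back and more. A single path proves nothing about expected values; it only makes the mechanics of the rule concrete.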
We shall now look at a brief example, putting the just introduced concepts into practice.

Example 2.1.
Consider a trader Jan, who is investing in Google (GOOG) shares. Suppose Jan’s only counter-party is a broker called Theobald-Fritz, whom we will nickname Hermes. (In classic Greek mythology, Hermes is the patron of travellers, herdsmen, poets, athletes, invention, trade, and thieves. The role played by Hermes here also strongly resembles Benjamin Graham’s 1949 creation of Mr Market, which is intentional.)

Suppose Jan has just purchased GOOG shares worth $1000 from Hermes, i.e., we have \(s_t = \$1000\). Suppose that Jan trades with Hermes daily, and that, every time he trades, he invests exactly $1000 (independently of his net wealth), taking his profit/loss for the trade on the following day. Suppose further that the daily price movements of the GOOG shares are exactly ±$10, and that \(p\) and \(1 - p\) are the real-world probabilities for an up or down move, respectively. Define \(u := \$10\) and \(d := -\$10\), and denote the decision to take a long (short) position by \(a_L\) (by \(a_S\)). As \(p\) is unknown to him, Jan has to make his trading decision based on his personal beliefs \(1 > q,\ 1 - q > 0\). According to Jan’s own estimate, \(q > 0.5\), and
\[ Q^q_{t+1}(s_t, a_L) = E_q\big[\, r(a, s_t, S_{t+1}) + V^q(S_{t+1}) \,\big|\, s_t, a = a_L \,\big] > E_q\big[\, V^q(S_{t+1}) \,\big|\, s_t, a = a_N \,\big] > E_q\big[\, r(a, s_t, S_{t+1}) + V^q(S_{t+1}) \,\big|\, s_t, a = a_S \,\big], \]
and, therefore, Jan is very happy with his newly purchased share of GOOG equity.

Suppose in reality \(p < 0.5\), and that, the following day, Jan checks with Hermes, just to find that \(S_{t+1} = s_t + d = \$990\) and \(r(a_L, s_t, s_t + d) = -\$10 < 0\). Jan now has to make a decision: should he increase his holding of GOOG equity following \(a_L\)?

Jan’s personal estimate \(q > 0.5\) [...] \(r(a_S, s_t, s_t + d) = \$10 > 0\) [...] \(a_L\), the reverse argument would have held, and he would have kept his GOOG equity, enjoying the ride on the optimal Bellman trajectory (and avoiding any disputes with Hermes).

2.2. Don’t average down.
Averaging down is the practice of increasing one’s position after taking a loss, in the hope of reaping the expected profit and recovering all previous losses.

Strictly speaking, averaging down is already prohibited if losing positions are exited, which we covered in Section 2.1; however, the strategy is so popular that it warrants separate consideration.

Using the notation of Section 2.1, suppose again that, based on (3), our trader holds a long position \(a_L\) at time \(t\), and that, at time \(t + 1\), he finds that \(r(a_L, s_t, S_{t+1}) < 0\), and that his new wealth now is \(S_{t+1} < s_t\). If our trader thinks that an increased long position is in order, then, clearly, he must think that
\[ Q^{q'}_{t+2}(S_{t+1}, a_L) > Q^{q'}_{t+2}(S_{t+1}, a_N), \tag{4} \]
where \(q'\) denotes his updated personal probabilities. But, in absence of other external information besides the price move \(x_{t+1}\), we have \(q' = q \cup \{x_{t+1}\}\), and, as already seen in Section 2.1, (3) then implies
\[ Q^{q'}_{t+2}(S_{t+1}, a_L) < Q^{q'}_{t+2}(S_{t+1}, a_N), \tag{5} \]
which means that (4) cannot be true.

The contradiction between (4) and (5) is an interesting one, and slightly exceeds the implications of Section 2.1. As acting based on \(q' = q \cup \{x_{t+1}\}\) leads to (5), we see that averaging down can only ever be justified if the trader believes he has obtained a new set of external information, exceeding what was learned from the latest price move \(x_{t+1}\), and dominating the price – a very rare case indeed: generally, doubling up on a losing position is irrational.

In Livermore’s (1923) words:

“One other point: it is foolhardy to make a second trade, if your first trade shows you a loss. Never average losses. Let that thought be written indelibly upon your mind.”

“I have warned against averaging losses. That is a most common practice. Great numbers of people will buy a stock, let us say at $50, and two or three days later if they can buy at $47 they are seized with the urge to average down. [...] If one is to apply such an unsound principle, he should keep on averaging by buying 200 shares at $44, then 400 at $41, 800 at $38, 1600 at $35, 3200 at $32, 6400 at $29 and so on. How many speculators could stand such pressure? Yet if the policy is sound it should not be abandoned. Of course, abnormal moves such as the one indicated do not happen often. But it is just such abnormal moves which the speculator must guard against to avoid disaster.”

“So, at the risk of repetition, let me urge you to avoid averaging down. [...] Why send good money after bad? Keep that good money for another day. Risk it on something more attractive than an obviously losing deal.”
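The pressure Livermore describes can be made concrete with simple arithmetic. The sketch below adds up only the purchases that are explicitly listed in the quotation (200 shares at $44 through 6400 shares at $29); the earlier purchases fall inside the elided part of the quote and are deliberately left out. The helper name `averaging_down_summary` is hypothetical.

```python
# (shares, price) pairs taken from the quotation; earlier purchases are elided there.
purchases = [(200, 44), (400, 41), (800, 38), (1600, 35), (3200, 32), (6400, 29)]


def averaging_down_summary(purchases):
    """Total shares, cash outlay, average cost, and loss marked at the last price."""
    shares = sum(n for n, _ in purchases)
    outlay = sum(n * price for n, price in purchases)
    last_price = purchases[-1][1]
    return {
        "shares": shares,
        "outlay": outlay,
        "average_cost": outlay / shares,
        "unrealised_loss": outlay - shares * last_price,
    }


summary = averaging_down_summary(purchases)
print(summary)
```

For the listed purchases alone, the trader has committed $399,600 to 12,600 shares at an average cost of about $31.71, and is $34,200 under water at the last purchase price of $29. Each further $3 decline doubles the size of the next purchase, which is the pressure the quotation refers to.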
Remark 2.2.
An immediate corollary to Section 2.2 is that trading on the belief of a ‘true’ value is dangerous. Almost always, the true value is established once, and then convergence is waited for. Adverse movements are interpreted as providing ‘better entry points’, and are believed to ‘strengthen’ the opportunity – clearly, any such reasoning compounds the conflict between (4) and (5) severalfold, and should be strictly avoided.
Remark 2.3.
It is helpful to add that Jesse Livermore’s original writings were independent of any specific market structure, but were presented to hold in generality, for any market. Similarly, Richard Bellman’s Principle of Optimality applies to any multi-period decision making process. Therefore, the Bellman and Livermore optimal policy insight presented in this article applies to any financial transaction which takes place within an exogenously given market place of any form.

Conclusion
There is a saying, attributed to John Kenneth Galbraith, that “faced with the choice between changing one’s mind and proving that there is no need to do so, almost everyone gets busy on the proof.”

In this article, we show that, within the Bellman and Livermore optimal policy insight, every view held should be updated conditionally on the latest available information; an unwillingness to learn, as alluded to by Galbraith, is expected to be detrimental.

On his website, Joe Fahmy gives a summary of Jesse Livermore’s principles in his own words. His list, once again, nicely reflects the parallels between the Bellman and Livermore optimal policy insights, and serves as a nice completion to this paper.

(1) Do not trust your own opinion and back your judgment until the action of the market itself confirms your opinion.
(2) Markets are never wrong – opinions often are.
(3) The real money made in speculating has been in commitments showing in profit right from the start.
(4) As long as a stock is acting right, and the market is right, do not be in a hurry to take profits.
(5) The money lost by speculation alone is small compared with the gigantic sums lost by so-called investors who have let their investments ride.
(6) Never buy a stock because it has had a big decline from its previous high.
(7) Never sell a stock because it seems high-priced.
(8) Never average losses.
(9) Big movements take time to develop.
(10) It is not good to be too curious about all the reasons behind price movements.
Appendix A.

We also observe that a Price Process, denoted by \(p_t(x_t)\), will satisfy a Bellman optimality; for risk neutral traders with information set \(x_t\) and dividends, or rewards, \(r_{t+1}(X_{t+1})\), we have
\[ p_t(x_t) = \max_a E_p\big[\, r_{t+1}(X_{t+1}) + p_{t+1}(X_{t+1}) \,\big], \]
where \(E_p[\cdot]\) denotes expectation with respect to the probability \(p(S_{t+1} \mid s_t, a, x_t)\).

Further Reading

[1] Richard E. Bellman; Dynamic Programming; 1957.
[2] Benjamin Graham; The Intelligent Investor; 1949.
[3] Jesse L. Livermore; Reminiscences of a Stock Operator; 1923.
[4] Jesse L. Livermore; How to Trade in Stocks; 1940.
[5] http://joefahmy.com/2014/06/23/jesse-livermores-trading-rules-written-1940/