Optimal allocation using the Sortino ratio
T. Nassar, S. Ephrem
July 14, 2020
In 1956, while working at AT&T, Kelly published a seminal paper inconspicuously entitled "A new interpretation of information rate" [1]. The paper's avowed goal was to provide a betting strategy for a gambler who receives information (through some channel) related to her bets. For instance, suppose the gambler is betting on the outcomes of a coin toss. Prior to each toss, the gambler receives a communication from a secret benefactor that tells her which way she should bet. If this benefactor is an oracle capable of predicting the future exactly, then clearly, by betting all of her money every time as per the instructions of the benefactor, the gambler can achieve great fortune. The problem becomes considerably more complicated if the benefactor is not an oracle and is right only with a certain probability $p$. In this case, Kelly asks what proportion $\Theta$ of the gambler's money should optimally be bet every time, given the probability $p$.

To this end, we observe that if the benefactor's communication is correct then, at time $t+1$, the gambler's money increases to $(1+\Theta)V_t$, where $V_t$ is the amount of money the gambler had at time $t$. If the benefactor is wrong, the gambler's money decreases to $(1-\Theta)V_t$, where we have assumed the odds are 1:1. Hence, after a time period $T$, the gambler's fortune is given by:

$$V_T = (1+\Theta)^{W}(1-\Theta)^{L}V_0 = (1+\Theta)^{W}(1-\Theta)^{T-W}V_0 \tag{1}$$

where $W$ is the number of winning bets and $L = T - W$ is the number of losing bets. From (1) it is easy to see that if $W \cong T$ (i.e. the benefactor is right most of the time), the value $V_T$ grows without bound the longer the gambler keeps playing the game. Such divergent quantities become impossible to compare or optimise over as $T \to +\infty$, so Kelly considers instead the logarithm of (1):

$$G = \lim_{T\to\infty}\frac{1}{T}\log\left(\frac{V_T}{V_0}\right) = (1-p)\log(1-\Theta) + p\log(1+\Theta) \tag{2}$$

where:

$$p = \lim_{T\to\infty}\frac{W}{T} \tag{3}$$

i.e. $p$ is the probability that the benefactor is right.

The quantity defined in (2) bears a relationship to channel capacity in information theory, but it also has a much simpler financial interpretation: it is the internal rate of return (IRR) on the betting/investment strategy considered. The proportion of the portfolio invested can now be determined in a straightforward way by maximising $G$ over $\Theta$. Setting $\partial_\Theta G = \frac{p}{1+\Theta} - \frac{1-p}{1-\Theta}$ to zero gives:

$$\Theta = 2p - 1 \tag{4}$$

From (4) we note that if $p < 0.5$ we get a negative $\Theta$, which implies that we should not bet at all in that case. That is perfectly intuitive, since $p < 0.5$ means that our benefactor is more often wrong than right and as such is not very trustworthy.

The Kelly criterion as set out in equation (4) and justified in (1)-(3) applies to the case where the decisions available to the gambler are binary and where the odds are 1:1. This need not be the case, and the criterion can be generalised to account for non-binary choices and odds different from 1:1, as discussed in [1]. More relevant to us in this paper is the risk associated with the use of a strategy based on the Kelly criterion. In equation (2) one can prove, using the strong law of large numbers, that $\hat{G}$ converges almost surely to $(1-p)\log(1-\Theta) + p\log(1+\Theta)$ as $T \to \infty$:

$$P(\hat{G} = G) = \delta\big(G - (1-p)\log(1-\Theta) - p\log(1+\Theta)\big)$$

This fact is sometimes erroneously taken to mean that use of the Kelly criterion entails no risk. In addition, as pointed out in [2], Kelly's criterion minimises the time needed to reach a certain wealth level. However, as pointed out in [3], the innocuous-looking limit $T \to \infty$ is an idealisation that does not approximate reality in a satisfactory way.

To show this, let $\hat{\eta}_t \in \{-1, 1\}$ denote whether the decision communicated by the benefactor to the gambler at time $t$ is correct ($\hat{\eta}_t = 1$) or incorrect ($\hat{\eta}_t = -1$).
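As a quick numerical check of equations (2) and (4), the following minimal Python sketch evaluates the growth rate on a grid of allocations and compares the grid maximiser with $2p - 1$. The function names (`growth_rate`, `kelly_fraction`, `grid_argmax`) and the grid resolution are illustrative assumptions, not part of Kelly's paper.

```python
import math


def growth_rate(p: float, theta: float) -> float:
    """Growth rate G of equation (2) for win probability p and bet fraction theta."""
    return (1.0 - p) * math.log(1.0 - theta) + p * math.log(1.0 + theta)


def kelly_fraction(p: float) -> float:
    """Allocation of equation (4); a negative value is read as 'do not bet'."""
    return 2.0 * p - 1.0


def grid_argmax(p: float, steps: int = 10_000) -> float:
    """Brute-force maximiser of G over theta in [0, 1) (illustrative check only)."""
    # theta = 1 is excluded because log(1 - theta) diverges there.
    grid = [i / steps for i in range(steps)]
    return max(grid, key=lambda theta: growth_rate(p, theta))


if __name__ == "__main__":
    for p in (0.4, 0.6, 0.8):
        print(p, kelly_fraction(p), grid_argmax(p))
```

For $p < 0.5$ the grid search settles on $\Theta = 0$, which matches the reading of a negative Kelly fraction as "no bet".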
The gambler's fortune after $T$ steps is then given by:

$$\hat{V}_T = V_0\prod_{t=1}^{T}(1 + \hat{\eta}_t\Theta) \tag{5}$$

so that:

$$\hat{G}_T = \frac{1}{T}\log\left(\frac{\hat{V}_T}{V_0}\right) = \frac{1}{T}\sum_{t=1}^{T}\log(1 + \hat{\eta}_t\Theta) = \frac{1}{2}\log(1-\Theta^2) + \frac{1}{2T}\log\left(\frac{1+\Theta}{1-\Theta}\right)\sum_{t=1}^{T}\hat{\eta}_t \tag{6}$$

Kelly refers to the quantity $\hat{G}_T$ as the rate of growth of the gambler's fortune. It corresponds to what one would call an internal rate of return in terms of trading strategies. The expected value of $\hat{G}_T$ is given by equation (2) and is maximised by equation (4). However, unlike the limit $\hat{G} = \lim_{T\to\infty}\hat{G}_T$, the quantity $\hat{G}_T$ is random at finite $T$, and maximising its expected value targets returns but does not take risk into account. In particular, we can, for instance, calculate the Sharpe ratio. The variance of $\hat{G}_T$ is:

$$\begin{aligned}
E\left[\hat{G}_T^2\right] - E\left[\hat{G}_T\right]^2 &= \frac{1}{4T^2}\log^2\left(\frac{1+\Theta}{1-\Theta}\right)\left(E\left[\sum_{t,s=1}^{T}\hat{\eta}_t\hat{\eta}_s\right] - T^2(2p-1)^2\right) \\
&= \frac{1}{4T^2}\log^2\left(\frac{1+\Theta}{1-\Theta}\right)\left(T + T(T-1)(2p-1)^2 - T^2(2p-1)^2\right) \\
&= \frac{1}{T}\log^2\left(\frac{1+\Theta}{1-\Theta}\right)p(1-p)
\end{aligned} \tag{7}$$

which gives for the Sharpe ratio:

$$S(p,\Theta) = \sqrt{\frac{T}{p(1-p)}}\left(p - \frac{1}{2} + \frac{1}{2}\,\frac{\log(1-\Theta^2)}{\log\left(\frac{1+\Theta}{1-\Theta}\right)}\right) \tag{8}$$

Inserting Kelly's allocation (4) into (8):

$$S(p, 2p-1) = \sqrt{\frac{T}{p(1-p)}}\left(p - \frac{1}{2} + \frac{\log\big(4(1-p)p\big)}{2\log\left(\frac{p}{1-p}\right)}\right) \tag{9}$$

Removing the "scaling factor" $\sqrt{T}$ from (9), we can easily plot the behaviour of the Kelly Sharpe ratio against the probability $p$:

Figure 1: Kelly Sharpe Ratio

Unless the probability $p > 0.975$, the Sharpe ratio remains below 1 which, in itself, is a fairly low Sharpe ratio. If, instead of using the allocation proposed by Kelly, we try to maximise the Sharpe ratio (as given in (8)) directly, we get:

$$\partial_\Theta S(p,\Theta) = 0 \implies \log(1-\Theta^2) + \Theta\log\left(\frac{1+\Theta}{1-\Theta}\right) = 0 \tag{10}$$

In the following, we propose a different method where the aim is to maximise a different risk-adjusted measure, the Sortino ratio, to obtain the optimal allocation/bet. We then test, on Dow Jones data, the strategy that consists of trading the optimal amount every day for a certain period of time. Finally, we provide a summary and a description of future work.

The theory of optimal asset allocation is old and well researched. A general and key tenet in optimal asset allocation is to strike a balance between expected utility maximisation and risk minimisation. Traditionally, risk is measured by the variance of the returns, so that one is trying to optimise:

$$L = E\left[U(\hat{r}_T, \Theta)\right] - \lambda\left(E\left[\hat{r}_T^2\right] - E\left[\hat{r}_T\right]^2\right)$$

where $\hat{r}_T$ is the return of a given investment, $\Theta$ is the allocation vector and $\lambda$ parametrises the risk aversion of individual investors. The first term is the utility and the second is the risk. Different utilities lead to different allocations and in fact Kelly's criterion corresponds basically to the "commonly preferred constant relative risk aversion" as it degenerates to a log-utility function (see [4] for a comprehensive overview).
The Sharpe ratio:

$$S_T = \frac{E[\hat{r}_T] - \mu}{\sqrt{E\left[\hat{r}_T^2\right] - E[\hat{r}_T]^2}}$$

basically corresponds to the case where the utility function is linear, and it penalises risk through the use of the standard deviation. The idea is that if this standard deviation goes to zero, the returns on the proposed investment strategy become certain. This suffers from two problems:

1. Although the return becomes certain as the standard deviation goes to zero, that return may be quite low. Since the ratio diverges as the standard deviation approaches zero, the Sharpe ratio can be very misleading when comparing strategies with very small standard deviations.
2. Standard deviation captures the fluctuations of a random variable around its mean regardless of whether these fluctuations are above or below the mean. In risk terms, the standard deviation penalises both the good returns and the bad! When the distribution of the returns is not symmetrical about the mean (whether that distribution is Gaussian or not doesn't matter), the Sharpe ratio will again be misleading as a performance gauge of different strategies.

The Sortino ratio, proposed in 1980 by Frank Sortino [5], addresses the two problems of the Sharpe ratio (see [6]) as follows. Let's define:

$$\Phi = \frac{E\left[\hat{G}_T\right] - \mu}{\sqrt{E\left[\min\left(\hat{G}_T - \mu, 0\right)^2\right]}} = \frac{(1-p)\log(1-\Theta) + p\log(1+\Theta) - \mu}{\sqrt{E\left[\min\left(\hat{G}_T - \mu, 0\right)^2\right]}} \tag{11}$$

where $\mu$ is a desired rate of return. The numerator will be positive if the expected return on the proposed strategy exceeds the desired rate. The novelty is in the denominator. The function $\min(\hat{G}_T - \mu, 0)$ picks up the cases where the strategy achieves returns below the desired rate of return, so that the denominator $D = \sqrt{E\left[\min\left(\hat{G}_T - \mu, 0\right)^2\right]}$ of the Sortino ratio becomes the standard deviation of the "losses" of the strategy.
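To make definition (11) concrete, here is a minimal Python sketch that evaluates the Sortino ratio of the betting game exactly, by summing over the number of wins $W \sim \mathrm{Binomial}(T, p)$, and then searches a grid for the allocation that maximises it. It assumes that $\mu$ is expressed in the same per-bet units as $\hat{G}_T$. The function names and the brute-force grid search are illustrative assumptions, not the authors' implementation.

```python
import math


def realised_growth(wins: int, T: int, theta: float) -> float:
    """Growth rate G_T after `wins` correct calls out of T bets at fraction theta."""
    return (wins * math.log(1.0 + theta) + (T - wins) * math.log(1.0 - theta)) / T


def sortino_ratio(p: float, theta: float, T: int, mu: float) -> float:
    """Sortino ratio of equation (11), with exact binomial expectations."""
    mean = 0.0
    downside_sq = 0.0
    for wins in range(T + 1):
        prob = math.comb(T, wins) * p**wins * (1.0 - p) ** (T - wins)
        g = realised_growth(wins, T, theta)
        mean += prob * g
        # Only outcomes below the desired rate mu contribute to the denominator.
        downside_sq += prob * min(g - mu, 0.0) ** 2
    if downside_sq == 0.0:
        # No outcome falls below mu; the ratio is unbounded in this sketch.
        return math.inf
    return (mean - mu) / math.sqrt(downside_sq)


def optimal_theta(p: float, T: int, mu: float, steps: int = 200) -> float:
    """Grid maximiser of the Sortino ratio over theta in (0, 1)."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda theta: sortino_ratio(p, theta, T, mu))


if __name__ == "__main__":
    p, T, mu = 0.7, 90, 0.02
    theta_star = optimal_theta(p, T, mu)
    print(theta_star, sortino_ratio(p, theta_star, T, mu))
```

The exact sum is cheap for horizons of this size (91 terms for $T = 90$), so a grid search is enough to explore how the maximiser moves with $p$ and $\mu$.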
Therefore, the Sortino ratio will penalise returns that are inordinately skewed to the downside.

Now we are in a position to calculate the Sortino ratio in the investment/betting setting used by Kelly. Using the function $\min(\hat{G}_T - \mu, 0)$ amounts to summing over configurations that satisfy:

$$\frac{1}{2}\log(1-\Theta^2) + \frac{1}{2T}\log\left(\frac{1+\Theta}{1-\Theta}\right)\sum_{t=1}^{T}\hat{\eta}_t < \mu \implies \hat{x}_T < \min(T, T_{max}) \tag{12}$$

where $\hat{x}_T = \sum_{t=1}^{T}\hat{\eta}_t$ and

$$T_{max} = \min\left\{T,\ \frac{T\left(2\mu - \log(1-\Theta^2)\right)}{\log\left(\frac{1+\Theta}{1-\Theta}\right)}\right\}$$

since we also have $\hat{x}_T \le T$. (The desired return $\mu$ can of course also be set to the "risk free" rate prevalent for the investment horizon being considered, e.g. some Libor or Fed funds rate.) The denominator is then given by:

$$D^2 = \sum_{x=0}^{T_{max}}\left(\frac{1}{2}\log(1-\Theta^2) + \frac{1}{2T}\log\left(\frac{1+\Theta}{1-\Theta}\right)x\right)^2\frac{T!}{x!\,(T-x)!}\,p^x(1-p)^{T-x} \tag{13}$$

Equation (12) also entails that $D$ remains the same for all $\mu > \log(1+\Theta)$. Equation (13), on the other hand, horrible as it looks, has a closed form solution, albeit one that uses certain incomplete beta functions:

$$D^2 = \frac{T!}{\alpha!\,(T-\alpha-1)!}\Big(p^{\alpha-1}(1-p)^{-\alpha+T-2}\big(-a\alpha + a\,p(T-1) + b(p-1)p\big) + c\,B_{1-p}(T-\alpha, \alpha+1)\Big) \tag{14}$$

where $\alpha$, $a$, $b$ and $c$ are given in the appendix. To avoid cumbersome calculations, the derivation of (14) is relegated to the appendix. In the following section we investigate the behaviour of the allocation based on maximisation of the Sortino ratio relative to the different parameters involved.

As the probability of a "win" increases, we expect both the optimal allocation $\Theta$ and the corresponding Sortino ratio to increase. This is indeed the case, as figures 2 and 3 below show (where we have used $T = 90$ and $\mu = 0.02 = 2\%$):

Figure 2: Optimal allocations versus probabilities

The stairlike behaviour is also to be expected, since the denominator of the Sortino ratio is a discontinuous function. The behaviour of the Sortino ratio itself is quite telling:

Figure 3: Optimal Sortino ratios versus probabilities

The previous figure shows that the Sortino ratio doesn't go beyond 2 (which is usually seen as a "good" minimal value) unless $p > 83.55\%$, which is very high.

The behaviour with $\mu$ is fairly complicated. Remembering that $\mu$ is a desired level of returns (for instance, a risk free rate) and that the Sortino ratio punishes return fluctuations that are below the desired level, it seems natural that the Sortino ratio would diminish as $\mu$ increases. This is indeed the case, as shown by the graph below (where we have taken p = 0. and $T = 90$):

Figure 4: Optimal Sortino ratios versus desired returns

Results
To showcase how the previous analysis can be applied in real life, we use the Dow Jones closing prices from 29-01-1985 to 28-08-2019. The period was chosen so that it includes multiple economic and financial shocks as well as bull markets, to illustrate the robustness of the method in different settings. We simulate signals based on the Dow Jones such that:

P[ŝ_t = sign(p_{t+1} − p_t)] = p = 0.

where

$$\mathrm{sign}(x) = \begin{cases} 1 & \text{if } x > 0 \\ -1 & \text{if } x < 0 \\ 0 & \text{if } x = 0 \end{cases}$$

This can be done by setting:

$$\hat{s}_t = \mathrm{sign}(p_{t+1} - p_t)\,\hat{\xi}_t$$

where $\hat{\xi}_t \in \{-1, 1\}$ is drawn with $P[\hat{\xi}_t = 1] = p$, since $P[\hat{s}_t = \mathrm{sign}(p_{t+1} - p_t)] = P[\hat{\xi}_t = 1] = p$.

We repeat each simulation $L$ times, and denote the generated signals by $\hat{s}^{(a)}_t$ where $a = 1, \ldots, L$. The strategy consisted of selling the optimal allocation $\Theta^*(p, \mu, T)$ (where $T = 100$ and $\mu = 3\%$) when $\hat{s}^{(a)}_t = -1$ and buying it when $\hat{s}^{(a)}_t = 1$. For each date in the Dow Jones series, a trade was initiated based on the signal $\hat{s}_t$ and unwound after $T$ days. The return for each such trade was then calculated and pooled into a series of returns obtained from the strategy. This series was then used to calculate the resulting expected return and the corresponding Sortino ratio. In this paper, we have used $L = 20{,}000$. The resulting returns for each day are averaged over the simulations to estimate the distribution (probability density) of annualised returns. This is shown in figure 5 below:

Figure 5: Return probabilities from Dow Jones simulations

The average return is 24.74% and the Sortino ratio is 2.29. The Sharpe ratio, on the other hand, is 0.15. This is understandable since, as figure 5 shows, the distribution is overwhelmingly skewed towards positive returns and as such, the Sortino ratio will be high whereas the Sharpe ratio will be low. Perhaps more interesting than the returns showcased above is the behaviour of this strategy in different market conditions.
To test this, we take the series of simulated returns and plot it against time, highlighting the periods of different financial crises:

Figure 6: Comparative returns relative to market (panel title: Comparative Returns; series: Strategy Returns, Index Returns; horizontal axis: date; vertical axis: return in %)
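The signal construction and trade pooling described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: a synthetic geometric random walk stands in for the Dow Jones closing prices, each trade's return is taken to be the allocation times the signal times the market return over the holding period, and all function names (`synthetic_prices`, `simulate_signals`, `trade_returns`, `sample_sortino`) are hypothetical. The exact return calculation used for the reported figures is not specified in enough detail to reproduce here.

```python
import math
import random


def sign(x: float) -> int:
    """Return 1, -1 or 0 according to the sign of x."""
    return (x > 0) - (x < 0)


def synthetic_prices(n: int, rng: random.Random, start: float = 100.0,
                     daily_vol: float = 0.01) -> list:
    """Hypothetical price path (geometric random walk) standing in for index data."""
    prices = [start]
    for _ in range(n - 1):
        prices.append(prices[-1] * math.exp(rng.gauss(0.0, daily_vol)))
    return prices


def simulate_signals(prices: list, p: float, rng: random.Random) -> list:
    """s_t = sign(p_{t+1} - p_t) * xi_t, where xi_t = 1 with probability p, else -1."""
    signals = []
    for today, tomorrow in zip(prices, prices[1:]):
        xi = 1 if rng.random() < p else -1
        signals.append(sign(tomorrow - today) * xi)
    return signals


def trade_returns(prices: list, signals: list, theta: float, horizon: int) -> list:
    """Assumed return of a trade opened at t and unwound `horizon` days later."""
    returns = []
    for t in range(len(prices) - horizon):
        market = prices[t + horizon] / prices[t] - 1.0
        returns.append(theta * signals[t] * market)
    return returns


def sample_sortino(returns: list, mu: float) -> float:
    """Sample Sortino ratio of a pooled return series against the target mu."""
    mean = sum(returns) / len(returns)
    downside = math.sqrt(sum(min(r - mu, 0.0) ** 2 for r in returns) / len(returns))
    return (mean - mu) / downside if downside > 0.0 else math.inf


if __name__ == "__main__":
    rng = random.Random(0)
    prices = synthetic_prices(2_000, rng)
    signals = simulate_signals(prices, p=0.7, rng=rng)
    pooled = trade_returns(prices, signals, theta=0.4, horizon=100)
    print(len(pooled), sample_sortino(pooled, mu=0.0))
```

Repeating `simulate_signals` with fresh random draws and averaging the pooled returns over the repetitions would mirror the $L$ simulation runs described in the text.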
In this article we have discussed a method that addresses some of the shortcomings of using Kelly's criterion and Sharpe ratios. This was done by using the Sortino ratio as a measure to balance risk and return. The proposed allocation and the resulting trading strategies were then simulated against historical data from the Dow Jones Industrial Average ranging from 1985 to 2019. The data set was chosen to span a long period in order to test how the proposed strategy weathers different economic and financial climates. The results are highly encouraging. Further work is needed that:

1. takes into account transaction costs
2. calculates maximum drawdown and other trading risk measures
3. estimates the potential for scalability in this approach
Appendix
The expression in equation (13) admits the more "elegant" form (14), which we derive here. The cumulative binomial distribution has a well known closed form in terms of the incomplete beta function [7]:

$$\sum_{x=0}^{\alpha}\frac{T!}{x!\,(T-x)!}\,p^x(1-p)^{T-x} = 1 - \frac{T!}{\alpha!\,(T-\alpha-1)!}\int_0^p t^{\alpha}(1-t)^{T-\alpha-1}\,dt = 1 - \frac{T!}{\alpha!\,(T-\alpha-1)!}\,B_p(\alpha+1, T-\alpha) \tag{A.1}$$

What we are after is:

$$\sum_{x=0}^{\alpha}(A + Bx)^2\,\frac{T!}{x!\,(T-x)!}\,p^x(1-p)^{T-x} \tag{A.2}$$

We will now show how to rewrite (A.2) in terms of (A.1). Let $\Omega$ be the differential operator:

$$\Omega = a\,\partial_p^2 + b\,\partial_p + c$$

We would like to find $a$, $b$ and $c$ such that:

$$(A + Bx)^2\,\frac{T!}{x!\,(T-x)!}\,p^x(1-p)^{T-x} = \Omega\left(\frac{T!}{x!\,(T-x)!}\,p^x(1-p)^{T-x}\right) \tag{A.3.1}$$

where:

$$A = \frac{1}{2}\log(1-\Theta^2) \tag{A.3.2}$$

and

$$B = \frac{1}{2T}\log\left(\frac{1+\Theta}{1-\Theta}\right) \tag{A.3.3}$$

By expanding the terms in equation (A.3.1) and applying the differential operator $\Omega$, we get the following solution for the coefficients $a$, $b$ and $c$:

$$a = B^2(p-1)^2p^2 \tag{A.4.1}$$

$$b = -B(p-1)p\,\big(2A + B(2p(T-1) + 1)\big) \tag{A.4.2}$$

$$c = A^2 + 2ABpT + B^2pT\,\big(p(T-1) + 1\big) \tag{A.4.3}$$

If we set:

$$\alpha = T_{max} = \min\left\{T,\ \frac{T\left(2\mu - \log(1-\Theta^2)\right)}{\log\left(\frac{1+\Theta}{1-\Theta}\right)}\right\} \tag{A.5}$$

and we use equations (A.1) and (A.2), we find that:

$$\begin{aligned}
\sum_{x=0}^{\alpha}(A + Bx)^2\,\frac{T!}{x!\,(T-x)!}\,p^x(1-p)^{T-x} &= \Omega\left(\sum_{x=0}^{\alpha}\frac{T!}{x!\,(T-x)!}\,p^x(1-p)^{T-x}\right) \\
&= (T-\alpha)\,\frac{T!}{\alpha!\,(T-\alpha)!}\,\Omega\left(\int_0^{1-p}t^{T-\alpha-1}(1-t)^{\alpha}\,dt\right) \\
&= \frac{T!}{\alpha!\,(T-\alpha-1)!}\Big(p^{\alpha-1}(1-p)^{-\alpha+T-2}\big(-a\alpha + a\,p(T-1) + b(p-1)p\big) + c\,B_{1-p}(T-\alpha, \alpha+1)\Big)
\end{aligned} \tag{A.6}$$

where $B_{1-p}$ is the incomplete beta function given by $B_z(a,b) = \int_0^z t^{a-1}(1-t)^{b-1}\,dt$.

References

[1] J. L. Kelly Jr. A new interpretation of information rate. Bell System Technical Journal, 35(4):917–926, 1956.
[2] L. Breiman. Optimal gambling systems for favorable games. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics, pages 65–78, Berkeley, Calif., 1961. University of California Press.
[3] Paul A. Samuelson. The "fallacy" of maximizing the geometric mean in long sequences of investing or gambling. Proceedings of the National Academy of Sciences of the United States of America, 68(10):2493–2496, 1971.
[4] J. Baz and H. Guo. An Asset Allocation Primer: Connecting Markowitz, Kelly and Risk Parity. PIMCO, 2017.
[5] Frank A. Sortino and Lee N. Price. Performance measurement in a downside risk framework. The Journal of Investing, 3(3):59–64, 1994.
[6] T. N. Rollinger and S. T. Hoffman. Sortino: A Sharper Ratio. Red Rock Capital, 2014.
[7] G.P. Wadsworth.