Machine Learning Portfolio Allocation
Michael Pinelis* and David Ruppert**

* Department of Economics, Cornell University, [email protected]
** Department of Statistics & Data Science and School of Operations Research and Information Engineering, Cornell University, [email protected]
March 2, 2020
Abstract
We find economically and statistically significant gains when using machine learning for portfolio allocation between the market index and risk-free asset. Optimal portfolio rules for time-varying expected returns and volatility are implemented with two Random Forest models. One model is employed in forecasting the sign probabilities of the excess return with payout yields. The second is used to construct an optimized volatility estimate. Reward-risk timing with machine learning provides substantial improvements over the buy-and-hold in utility, risk-adjusted returns, and maximum drawdowns. This paper presents a new theoretical basis and unifying framework for machine learning applied to both return- and volatility-timing.
Keywords: portfolio allocation, machine learning, random forest, market timing, reward-risk timing, volatility estimation, equity return predictability
JEL Classification: G11, G12, C13

1 Introduction
We use machine learning to find the optimal portfolio weights between the market index and the risk-free asset. The timing strategy is generated from the utility maximization principle and gives optimal portfolio weights estimated monthly with two Random Forest models. The market weight is proportional to the reward factor, which is based on the probability of the excess market return being positive, and is inversely proportional to the risk factor, an estimate of prevailing squared volatility. This procedure is simultaneously return- and volatility-timing the market and can be called "reward-risk timing". Our method found that a portfolio allocation strategy employing machine learning to reward-risk time the market gave a significant improvement in investor utility and Sharpe ratios and earned a large alpha of 4%. A novel theoretical framework is introduced to help explain the results. We motivate our analysis from the vantage point of a utility-maximizing investor, who adjusts the portfolio allocation according to the attractiveness of the risk-reward trade-off.

A number of papers have been written on predicting returns and volatilities with machine learning and large numbers of features; see Henrique et al. (2019) for a review. Machine learning methods have been shown to be suitable and advantageous for the difficult task of identifying the regimes in the markets (Gu et al., 2020). Gu et al. find a benefit of using machine learning for market timing with return forecasts of 26% and 18% increases in Sharpe ratios relative to that of the buy-and-hold with neural networks and Random Forest, respectively. Our results document a 40% increase when using Random Forest for both returns and volatilities in combination. Taking advantage of the allowance for nonlinear predictor interactions in machine learning models gives better return forecasts and parameter values in a volatility estimator based on market conditions.
An approach with machine learning that considers both expected return- and volatility-timing leads to a profitable trading strategy, without an extensive set of predictors. (The term "reward-risk timing" is from Kirby and Ostdiek (2012), who propose weighting by the individual prices of risk in a multi-asset portfolio. Our paper focuses on the portfolio with the market index and risk-free asset. Another difference is that Kirby and Ostdiek (2012) use several-year-long rolling-window estimates of the conditional mean and volatility, while we look at a twelve-month-maximum rolling data window for the machine learning strategies.) This paper studies how the machine learning method
of Random Forest can forecast the sign of the excess return with past payout yields (see Boudoukh et al. (2007)). Then a separate Random Forest model is employed to predict the optimal parameters of a volatility estimator. Specifically, we apply the model to estimate the optimal volatility reference window as a function of lagged volatilities. Comparing the performance of a linear model for reward-risk timing, we show that machine learning outperforms by a significant margin.

Expected-return or reward-timing involves adjusting the portfolio allocation according to beliefs about future asset returns. This is akin to benchmark timing, the active management decision to vary the managed portfolio's beta with respect to the benchmark (Grinold and Kahn, 1999). Merton (1981) derived the economic value of return forecasts. Campbell and Thompson (2008) show that many predictive regressions beat the historical average return, once weak restrictions are imposed on the signs of coefficients and return forecast.

Volatility- or risk-timing is a newer idea. While there is a wide array of volatility-based portfolio allocation strategies, this paper derives directly from the utility maximization principle a strategy that naturally depends on both the return and volatility. With this methodology, the portfolio weight in the risky asset is inversely proportional to the recent squared volatility, which turns out to be similar to the assumption in Moreira and Muir (2017). Intuitively, by avoiding high-volatility times the investor avoids risks, but if the risk-return trade-off is strong one also sacrifices expected returns, leaving the volatility timing strategy with no edge. Commonly, the volatility estimator is the realized volatility for the past few months. We propose a dynamic volatility estimator that changes the look-back window length with machine learning.
Varying the length of the volatility reference period in the standard sum-of-squared-returns formula gives a more accurate reflection of market conditions that filters out noise better than static volatility estimators. The results show that the benefits from volatility-timing are enhanced when using this proposed measure for volatility.

Reward-risk timing is the combination of both return- and volatility-timing. Return-timing can be profitable with superior forecasting ability, yet ignoring the risk associated with a high return, for instance, would lead to poor risk-adjusted performance. The incorrect forecasts are not mitigated by their risk. On the other hand, volatility-timing is advantageous if the risk is not compensated fully by the reward, yet there may be cases when in fact the reward overcompensates the risk. Timing the market with both the expected return and volatility addresses the drawbacks of these individual approaches. The role of machine learning is to provide more accurate estimates by taking advantage of complex non-linear relationships between market variables and help make optimal decisions. With this, we provide a unifying framework for return- and volatility-timing as well as machine learning in finance.

An outline of the paper follows. Section 2 reviews the literature. Section 3 describes the portfolio allocation methodology, including the utility-maximization problem and models. Section 4 demonstrates the results of using the machine learning portfolio allocation strategy. Section 5 contains theoretical interpretations of the results, and Section 6 concludes.
2 Literature Review

Abundant work can be found on two strands of market timing, via expected returns and volatilities. Work can also be found on approaches combining the two, yet none to our knowledge integrates machine learning.

There is a long literature on expected-return timing. Kandel and Stambaugh (1996) examine equity return predictability and find that the optimal stock-versus-cash allocation can depend importantly on a predictor variable such as the dividend yield. Goyal and Welch (2008) comprehensively examine the performance of variables that have been suggested by the academic literature to be good predictors of the equity premium and find contradictory results. Johannes et al. (2014), however, find strong evidence that investors can use predictability to improve out-of-sample portfolio performance provided they incorporate time-varying volatility and estimation risk into their optimal portfolio problems.

There has also been a sizable interest in volatility-timing. Moreira and Muir (2017) showed volatility-managed factors outperform their buy-and-hold counterparts, modeling the optimal weight as a constant over the realized volatility for the previous month. Fleming et al. (2007) discussed the economic value of volatility timing, and Moreira and Muir (2019) derived that investors who volatility time earn 2.4% more annually than those who do not. Numerous papers have been written in response. Liu et al. (2019) found that the strategy in Moreira and Muir (2017) is subject to look-ahead bias since they choose the constant based on the full sample and that it is not easy to outperform the market with volatility timing alone. One finding in this paper is that simply replacing the constant with the expanding estimate of the unconditional mean excess return, which stays close to the constant chosen by Moreira, leads to similar performance.

Our main aim is to simultaneously perform expected return- and volatility-timing. Johannes et al.
(2014) find statistically and economically significant out-of-sample portfolio benefits for an investor who uses models of return predictability when forming optimal portfolios, if accounting for estimation risk and allowing for time-varying volatility. We do so, however, not with typical regression-based approaches but with machine learning.

Kirby and Ostdiek (2012) develop volatility- and reward-risk-timing strategies for the portfolio with many assets. Our paper considers the problem for the risk-free asset and the market while applying machine learning.

Gu et al. (2020) showed the benefit from using machine learning for empirical asset pricing, tracing the predictive gains to the allowance of non-linear predictor interactions. Trees and neural nets were the most successful in predicting returns.

An article by Nystrup et al. (2016) proposes an approach to dynamic asset allocation using Hidden Markov Models that is based on detection of change points without fitting a model with a fixed number of regimes to the data, without estimating any parameters, and without assuming a specific distribution of the data. Our approach also does not assume a number of regimes, yet it does not discretize the portfolio weights.

To our knowledge, this is the first paper written on a machine learning approach to reward-risk timing. (Our weight is constrained by a 150% leverage limit, so the alphas are not the same in the main results.)
3 Portfolio Allocation Methodology

We perform two tasks with machine learning that give the weight of the market index in our portfolio. First, we predict if the market excess return next month will be positive, with lagged net payout yields and risk-free rates as the predictor variables. Second, we estimate the prevailing volatility with lagged values for a volatility proxy. The weight of the equity index is proportional to the probability that the next month's return exceeds that of the risk-free asset and inversely proportional to the squared volatility estimate. This gives us a series of out-of-sample portfolio returns and corresponding performance metrics. Finally, the same procedure is performed on a holdout set, data that provides a final estimate of the models' performance after they have been trained and validated, to test against backtest-overfitting (Bailey et al., 2015). Algorithm 1 describes the general portfolio allocation approach.
Algorithm 1: Portfolio Allocation Approach

for each month t = 1 to T do
    1. Update the machine learning models with the data until the most recent returns and predictors at time t − 1.
    2. Forecast the class probabilities of the sign of the excess return and the optimal reference window length for the volatility estimate at time t.
    3. Compute the optimal weight in the stock index for time t and return to step 1.
end

The strategies begin in January 1952. The reason for this is two-fold. First, it is important that the data that trains a machine learning model is large enough. (Holdout sets are never used to make decisions about which algorithms to use or for improving or tuning algorithms. Therefore, the performance on the holdout set is indicative of investment performance if an investor starts trading with the models and strategy today.)
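The loop in Algorithm 1 can be sketched concretely. Everything below — the function name, the simulated data, and the deliberately trivial stand-in "models" (the historical frequency of positive excess returns and last month's realized variance) — is illustrative only and not the paper's implementation, which uses two Random Forests.

```python
import numpy as np

def allocate(excess, daily_by_month, gamma_bar=4.0, cap=1.5):
    """Toy walk-forward version of Algorithm 1: at each month t, refit
    simple stand-in models on data through t-1, then set the weight
    proportional to the reward estimate and inversely proportional to
    the variance estimate, capped at a 150% leverage limit."""
    T = len(excess)
    weights = np.zeros(T)
    for t in range(1, T):
        # Steps 1-2a: stand-in "sign model" = historical share of up months.
        p_up = np.mean(excess[:t] > 0)
        # Step 2b: stand-in "volatility model" = last month's realized variance.
        d = np.asarray(daily_by_month[t - 1])
        var_hat = np.sum((d - d.mean()) ** 2)
        # Crude reward estimate from the sign probability and average magnitude.
        reward = (2.0 * p_up - 1.0) * np.abs(excess[:t]).mean()
        # Step 3: reward over risk, floored at 0 and capped by the leverage limit.
        weights[t] = np.clip(reward / (gamma_bar * var_hat), 0.0, cap)
    return weights

rng = np.random.default_rng(1)
sim_excess = rng.normal(0.005, 0.04, 60)                      # monthly excess returns
sim_days = [rng.normal(0.0002, 0.01, 22) for _ in range(60)]  # daily returns by month
w = allocate(sim_excess, sim_days)
```

The walk-forward structure — only data through t − 1 enters the weight for month t — is the essential feature the real strategy shares with this sketch.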
Most models of portfolio allocation with exact, closed-form solutions assume expected returns or stochastic volatility evolve continuously through time, a constant investment opportunity set, or single-period optimization. Our problem is harder due to the presence of time-varying risk premia and volatility across a discretized time horizon with periodic rebalancing. To find tractable solutions that are applicable to real-life investors, one can first consider the static one-period problem in Merton (1969) and Samuelson (1969), followed by stylized cases with time-varying expected returns and volatility which give our optimal weights.

Consider a power utility investor of terminal wealth $W_{t+\Delta t}$:

$$U(W_{t+\Delta t}) = \frac{W_{t+\Delta t}^{1-\gamma} - 1}{1-\gamma}, \quad (1)$$

where $\gamma > 0$ is the coefficient of relative risk-aversion. For $\gamma = 1$, $U(W_{t+\Delta t}) = \ln W_{t+\Delta t}$.

The investment universe with a risky and riskless asset and a constant mean and variance constrained by a budget is defined by

$$r_t = \mu + \sigma \cdot z_t \quad (2)$$
$$W_t = W_{t-1}\left( w_t \cdot \exp(r_t) + (1 - w_t) \cdot \exp(r_{ft}) \right), \quad (3)$$

where $\mu$ is the expected log return on the risky asset, $\sigma$ is the volatility, $z_t$ is a normal random variable with $E[z_t \mid z_{t-1}] = E[z_t] = 0$, $W_t$ is the investor's wealth at time $t$, $r_{ft}$ is the risk-free asset log return, and $w_t$ is the portfolio weight in the risky asset at time $t$. While the return on the risk-free asset is realized at time $t$, the rate is locked in at time $t-1$. Samuelson (1969) showed the optimal investment fraction in the risky asset to maximize the expected utility of wealth is given by:

$$w^*_t = \frac{\mu - r_{ft}}{\gamma \sigma^2}. \quad (4)$$

It is well known that the investment opportunities are not constant throughout time. Therefore, consider the following model where the market expected return and volatility change according to two non-linear functions of lagged predictor variables and volatilities:

$$r_t = \mu_t + \sigma_t \cdot z_t \quad (5)$$
$$\mu_t = g_t\left( x_{t-1}, \ldots, x_{t-9}, r_{f,t-1}, \ldots, r_{f,t-9} \right) + \epsilon_t \quad (6)$$
$$\log(\sigma_t) = h_t\left( \log(\sigma_{t-1}), \ldots, \log(\sigma_{t-9}) \right) + s_t, \quad (7)$$

where $x_{t-1}, \ldots, x_{t-9}$ are the nine lagged values of the predictor variable, $\sigma_{t-1}, \ldots, \sigma_{t-9}$ are the nine lagged volatilities, and $z_t$, $\epsilon_t$, and $s_t$ are potentially correlated normal random variables with mean zero, $E[z_t \mid z_{t-1}] = E[z_t]$, $E[\epsilon_t \mid \epsilon_{t-1}] = E[\epsilon_t]$, and $E[s_t \mid s_{t-1}] = E[s_t]$. Functions $g_t$ and $h_t$ are unknown and to be estimated. In certain stylized cases, there exist closed-form solutions to multi-period investment problems when variables at the current time are unknown. As Johannes et al. (2004) point out, however, for an analytical solution, expected returns can be unknown only if the current volatility is known, for instance, by the quadratic variation process. Because both future returns and volatility are predicted, to solve the optimal portfolio problem, we follow the existing literature and simplify the allocation problem by considering a single-period problem:

$$J(\mathcal{F}_{t-1}) = \max_{w_t} E[U(W_t) \mid \mathcal{F}_{t-1}] = \max_{w_t} \int U(W_t)\, P(r_t \mid \mathcal{F}_{t-1})\, dr_t, \quad (8)$$

where $P(r_t \mid \mathcal{F}_{t-1})$ is the predictive distribution of future returns and $\mathcal{F}_{t-1} = \{ x_{t-1}, \ldots, x_{t-9}, r_{f,t-1}, \ldots, r_{f,t-9}, \sigma_{t-1}, \ldots, \sigma_{t-9} \}$. This is similar to the approach taken in Kandel and Stambaugh (1996) and Johannes et al. (2004).

The difference between single and multi-period problems is that in the latter, hedging demands arise from changes in variables determining the attractiveness of future investment opportunities. Brandt (1999) showed that hedging demands are typically very small terms in the optimal weight. Additionally, portfolio choice will be myopic if the investor has power utility and returns are IID.

To derive the optimal portfolio weight, let us assume that $U(\cdot)$ is twice differentiable, monotonically increasing, and concave (which is the case for the power utility investor). Then by Eq. 3, the optimal portfolio is given by the first-order condition

$$E[U'(W_t)(R_t - R_{ft}) \mid \mathcal{F}_{t-1}] = 0, \quad (9)$$

where $R_t$ denotes $\exp(r_t) - 1$, $R_{ft}$ is $\exp(r_{ft}) - 1$, and the expectation is taken over the predictive distribution of future returns. By the definition of covariance and Eq. 9,

$$\mathrm{cov}[U'(W_t), R_t - R_{ft} \mid \mathcal{F}_{t-1}] + E[U'(W_t) \mid \mathcal{F}_{t-1}]\, E[R_t - R_{ft} \mid \mathcal{F}_{t-1}] = 0. \quad (10)$$

To separate the effects of risk and return on utility, realize that $R_t$ has a stochastic volatility mixture distribution (Gron et al., 2011). In this case, a generalization of Stein's lemma (see Appendix) allows us to re-write the covariance term as

$$\mathrm{cov}[U'(W_t), R_t - R_{ft} \mid \mathcal{F}_{t-1}] = E^Q[U''(W_t) \mid \mathcal{F}_{t-1}]\, \mathrm{cov}[W_t, R_t \mid \mathcal{F}_{t-1}] = w_t\, E^Q[U''(W_t) \mid \mathcal{F}_{t-1}]\, \mathrm{var}[R_t \mid \mathcal{F}_{t-1}], \quad (11)$$

where $Q$ represents the size-biased volatility-adjusted distribution. Solving for the optimal weight,

$$w^*_t = \frac{E[R_t - R_{ft} \mid \mathcal{F}_{t-1}]}{\bar{\gamma} \cdot \mathrm{var}[R_t \mid \mathcal{F}_{t-1}]}, \quad (12)$$

where $\bar{\gamma} = -E[U'(W_t) \mid \mathcal{F}_{t-1}] / E^Q[U''(W_t) \mid \mathcal{F}_{t-1}]$. This provides a justification for using a conditional mean-variance rule.

As a final case, consider constant-mean returns and time-varying volatility:

$$r_t = \mu + \sigma_t \cdot z_t \quad (13)$$
$$\log(\sigma_t) = h_t\left( \log(\sigma_{t-1}), \ldots, \log(\sigma_{t-9}) \right) + s_t \quad (14)$$

Starting from Eq. 10, using the fact that $E[R_t - R_{ft} \mid \mathcal{F}_{t-1}] = E[R_t - R_{ft}]$, and applying the same logic, the optimal weight is given by

$$w^*_t = \frac{E[R_t - R_{ft}]}{\bar{\gamma} \cdot \mathrm{var}[R_t \mid \mathcal{F}_{t-1}]}. \quad (15)$$

The two functions $g_t(\mathcal{F}_{t-1}) = R_t - R_{ft}$ and $h_t(\mathcal{F}_{t-1}) = \log(\sigma_t)$ give the excess return and variance, respectively, at time $t$ given the information set $\mathcal{F}_{t-1}$ at the previous time. In this paper, we learn $g_t$ and $h_t$ with the machine learning algorithm Random Forest discussed in Section 3.3.

With this portfolio allocation framework in mind, we examine three reward-risk strategies based on the optimal weight.
The first strategy is reward-risk timing with an expanding window estimate of the reward, the numerator in Eq. 15. It assumes time-varying volatility, but the investor has no superior knowledge about the excess market return at time $t$. Specifically, in this base strategy, volatility is computed as the realized volatility for the past month but the risk premia with the full sample until time $t-1$. Therefore, the strategy weights are given by

$$\frac{1}{t-1} \sum_{i=1}^{t-1} (R_i - R_{fi}) \Big/ \left( \bar{\gamma} \cdot \hat{\sigma}^2_{t-1} \right),$$

a simple estimate of the optimal weight.

The second and third, full reward-risk timing strategies employ machine learning and linear models, respectively, to 1) forecast the probabilities of the signs of the excess return for the next month with lagged payout yields (the absolute return) and 2) estimate the best length of the reference window to use in the volatility calculation with lagged volatilities, giving the variance in the denominator of Eq. 12.

Given the excess return class probabilities, the numerator in the optimal weight is adjusted with a more accurate view of the expected reward. A correct return direction prediction 58% of the time, on average, signifies an advantage over using the unconditional mean. By varying the length of the reference window, the volatility estimate can be adjusted and optimized. The length determines which months' realized returns are included in the estimate, and in effect the magnitude of the volatility estimate. Correctly scaling the volatility in terms of the actual future excess return is the result.

The reward-risk timing strategies avoid investing during periods of low market reward and high risk. It is not surprising that the performance of the base reward-risk timing strategy is better relative to the buy-and-hold given that it is an extension of the risk-managed portfolio literature discussed in the next subsection. The full strategy employing machine learning achieves better results than both strategies.
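The base strategy's weight is easy to sketch with made-up numbers; `base_weight` is a hypothetical helper for illustration, not code from the paper.

```python
import numpy as np

def base_weight(excess_history, realized_var_last_month, gamma_bar=4.0):
    """Base reward-risk timing weight: the expanding-window mean excess
    return divided by gamma_bar times last month's realized variance."""
    reward = np.mean(excess_history)  # (1/(t-1)) * sum_i (R_i - R_fi)
    return reward / (gamma_bar * realized_var_last_month)

# Illustration: 0.5% average monthly excess return, 4% monthly volatility,
# risk aversion 4 -> weight of 0.78125 in the market index.
w = base_weight(np.array([0.01, 0.0, 0.005, 0.005]), 0.04 ** 2)
```

Note how both inputs are available at time t − 1, so the weight involves no look-ahead.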
Next, we look more closely at the volatility-timing strategy in the literature and the modification that is made to arrive at the base reward-risk timing strategy.

Moreira and Muir (2017) examine a volatility-managed portfolio constructed by scaling the portfolio weight of the market or factor $w_t$ by the inverse of the past month's realized daily return variance. The strategy is motivated by the observation that changes in volatility are not accompanied by proportional changes in expected returns. The weight is

$$w_t = \frac{c}{\hat{\sigma}^2_{t-1}}, \quad (16)$$

where $c$ is a constant and $\hat{\sigma}^2_{t-1}$ is the realized return variance in month $t-1$. $\hat{\sigma}^2_{t-1}$ is computed from the 22 average daily returns over the month:

$$\hat{\sigma}^2_t(f) = RV^2_t(f) = \sum_{d=1}^{22} \left( f_{t+d/22} - \frac{1}{22} \sum_{d=1}^{22} f_{t+d/22} \right)^2, \quad (17)$$

where $f$ is the daily excess return. The constant $c$ is set in Moreira and Muir (2017) such that the strategy's standard deviation matches that of the buy-and-hold for ease of interpretation. Liu et al. (2019) point out that choosing $c$ based on the unconditional volatility over the entire period is an in-sample approach and is thus subject to look-ahead bias. While this is correct, simply using the historical average excess return forecast instead of the constant gives the same or better performance results. This is not surprising since the historical mean divided by the risk-aversion coefficient $\bar{\gamma} = 4$ produces a numerator that stays consistently close to the exact value of $c$, the constant which makes the standard deviation of the volatility-managed strategy equal to that of the buy-and-hold.

(We optimize the volatility estimate in the sense that the estimate gives a higher expected return when using it to determine portfolio weights.)
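The realized-variance calculation in Eq. 17 and the managed weight in Eq. 16 can be sketched directly; the function names and the sample figures below are illustrative assumptions.

```python
import numpy as np

def realized_variance(daily_excess):
    """Eq. 17: sum of squared deviations of a month's daily excess
    returns from their within-month mean."""
    f = np.asarray(daily_excess)
    return np.sum((f - f.mean()) ** 2)

def vol_managed_weight(c, daily_excess_prev_month):
    """Eq. 16: weight proportional to the inverse of last month's
    realized variance."""
    return c / realized_variance(daily_excess_prev_month)

# 22 alternating daily excess returns of +/-1%: variance = 22 * 0.01^2 = 0.0022.
days = 0.01 * np.array([1, -1] * 11)
w = vol_managed_weight(c=0.005, daily_excess_prev_month=days)
```

Halving daily volatility quarters the realized variance and so quadruples the weight, which is the leverage-up-in-quiet-markets behavior discussed in the text.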
To see the effects on the portfolio weights, consider the two quantities $c/\bar{\sigma}^2$ and $\frac{1}{t-1}\sum_{i=1}^{t-1}(R_i - R_{fi})/(4\bar{\sigma}^2)$, where $\bar{\sigma}^2$ is the average squared realized volatility, from 1952-2010, displayed in Figure 1. The two weights stay close to each other for most of the period. (Because our data has a slightly shorter sample period, the value here does not exactly match that in the papers above.)

Figure 1: Volatility-timing with a constant versus the expanding window estimate of excess return. The constant $c$, which gives the volatility-timing strategy the same ending standard deviation as the buy-and-hold, over the average realized volatility is plotted versus the numerator obtained from using the expanding excess-return mean over a risk-aversion coefficient $\bar{\gamma} = 4$ (in black).

The discussion above provides an intuition for why this modified version of volatility-timing, or base reward-risk timing, achieves similar investment performance for the market portfolio to volatility-timing in Moreira and Muir (2017). The results are discussed in Section
4. To arrive at the full strategy, we first look at the linear and machine learning models in the next sections.
We consider a linear regression model with extensions as a comparison to machine learning. Only the first task of excess return estimation is used in this comparison, with the volatility estimate of this strategy set equal to that of the machine learning model.

Starting with the simple model of monthly excess returns as a function of the lagged payout yields and risk-free rates,

$$g_t\left( x_{t-1}, \ldots, x_{t-9}, r_{f,t-1}, \ldots, r_{f,t-9} \right) = \alpha + \sum_{i=1}^{9} \beta_i x_{t-i} + \sum_{j=1}^{9} \omega_j r_{f,t-j} + \epsilon_t, \quad (18)$$

we find that the residuals are serially correlated. For this reason, we model the residuals as an ARMA process:

$$\epsilon_t = \phi_1 \epsilon_{t-1} + \cdots + \phi_p \epsilon_{t-p} + \theta_1 z_{t-1} + \cdots + \theta_q z_{t-q} + z_t, \quad (19)$$

where $z_t$ is white noise. The number of AR and MA terms, $p$ and $q$, are chosen at each time with AICc (Sugiura, 2007). One alternative to this specification is an ARMAX model that is estimated with maximum likelihood. However, the coefficients are harder to interpret. Regression with ARMA errors can capture the residual autocorrelation, if it is present, while allowing rapid changes in the dependent variable. This modification slightly improves predictive performance.

Random Forest is an ensemble machine learning algorithm developed by Breiman (2001). The prediction by a Random Forest model is the majority vote across all the individual decision tree learners (Hastie et al., 2017). The default tree bagging procedure draws $B$ different bootstrap samples of the training data and fits a separate classification tree to the $b$-th sample. The forecast is the average of the trees' individual forecasts. Trees for a bootstrap sample are usually deep and overfit, meaning each has low bias but high variance. Averaging over the $B$ predictions reduces the variance and stabilizes the trees' forecast performance. Algorithm 2 gives the procedure used to construct a Random Forest with the implementation by Liaw and Wiener (2002).

Algorithm 2: Random Forest
Result: The ensemble of trees $\{T_b\}_{b=1}^{B}$

for b = 1 to B do
    1. Draw a bootstrap sample $Z^*$ of size $n$ from the training data.
    2. Grow a random-forest tree $T_b$ on the bootstrapped data, by recursively repeating the following steps for each terminal node of the tree, until the minimum node size $s_{min}$ is reached:
        (a) Select $m$ variables at random from the $p$ variables.
        (b) Pick the best variable/split-point among the $m$.
        (c) Split the node into two child nodes.
end

To make a prediction at a new point $\vec{x}$, let $\hat{C}_b(\vec{x}) \in \{-1, +1\}$ be the class prediction of the $b$-th random-forest tree. Then the class prediction of the Random Forest model is

$$\hat{C}(\vec{x}) = \mathrm{sign}\left( \sum_{b=1}^{B} \hat{C}_b(\vec{x}) \right), \quad (20)$$

the majority vote. For a binary model, the class probabilities are $|B_+|/B$ and $|B_-|/B$, the proportion of votes for each class, with $B_+$ and $B_-$ denoting the sets of trees predicting positive and negative classes, respectively.

Random forests give an improvement over bagging with a variation designed to reduce the correlation among trees grown from different bootstrap samples. If most of the bootstrap samples are similar, the trees trained on these sample sets will be highly correlated. Then the average estimators of similar decision trees can be more robust but do not perform much better than a single decision tree. If, for example, last month's dividend yield is the dominant predictor of the return direction out of the variables, then most of the bagged trees will have low-depth splits on the most recent yield, resulting in a large correlation among their predictions. Trees are de-correlated with a method known as "random subspace" or "attribute bagging," which considers only a random subset of $m$ predictors out of $p$ for splitting at each potential branch.
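The vote aggregation in Eq. 20 can be spelled out in a few lines; `rf_vote` is a hypothetical helper for illustration (in practice B is chosen so that ties, which would make the sign zero, do not arise).

```python
import numpy as np

def rf_vote(tree_predictions):
    """Aggregate individual tree class predictions in {-1, +1}:
    returns the majority-vote class (Eq. 20) together with the vote
    proportions used as class-probability proxies."""
    votes = np.asarray(tree_predictions)
    majority = int(np.sign(votes.sum()))
    p_pos = float(np.mean(votes == 1))
    return majority, p_pos, 1.0 - p_pos

# Three of five trees vote +1: prediction +1 with probability proxy 0.6.
pred, p_up, p_down = rf_vote([1, 1, -1, 1, -1])
```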
In the example, attribute bagging will ensure early branches for some trees will split on predictors other than the most recent dividend yield. Since each tree is grown with different sets of predictors, the average correlation among trees further decreases and the variance reduction relative to standard bagging is larger (Gu et al., 2020). (Because this makes Random Forest a non-deterministic algorithm, we average the results for multiple different seeds.) The number of variables randomly sampled as candidates at each split, $m$, the number of bootstrap samples, $B$, and the minimum fraction of observations in the terminal nodes, $s_{min}$, are the tuning parameters optimized with validation. A detailed algorithm for classification trees can be found in the Appendix.

The parameters $m$ and $s_{min}$ are tuned with the sample from 1952 to 2010. To test against parameter over-fitting, the final values are kept on the holdout time period from 2011 to 2017.

Forecasting returns is explored extensively in Gu et al. (2020). For better model intelligibility, we consider a related task, predicting whether excess returns will be positive or negative, which is a binary classification problem. For optimal portfolio construction, the weight should increase when the investor expects a positive excess return, holding all else constant.

To classify each month, we borrow from the standard literature and use a variation of lagged dividend yields as the predictors. The importance of the dividend yield in the allocation is robust to the "data-mining" consideration (Kandel and Stambaugh, 1996), and it has been shown to explain equity return predictability in Johannes et al. (2004), for example. In traditional theory, the dividend yield can explain equity prices since prices are the discounted future cash flows. Boudoukh et al. (2007) research a measure of net payout yield incorporating both share repurchases and issuances which can have a stronger association.
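A scikit-learn sketch of this sign-classification setup is below. The paper's implementation is the R package of Liaw and Wiener (2002), so this is only an analogous setup on simulated data: the series, lag construction, and hyperparameter values are assumptions (`max_features` plays the role of m and `min_samples_leaf` of s_min), and the risk-free-rate features are omitted for brevity.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def lagged_matrix(x, lags=9):
    """Row for month t holds [x_{t-1}, ..., x_{t-lags}]."""
    x = np.asarray(x)
    return np.array([x[t - lags:t][::-1] for t in range(lags, len(x))])

rng = np.random.default_rng(0)
payout_yield = rng.normal(0.03, 0.01, 200)   # stand-in payout-yield series
excess = rng.normal(0.005, 0.04, 200)        # stand-in excess returns

X = lagged_matrix(payout_yield)              # features: nine lagged yields
y = np.where(excess[9:] > 0, 1, -1)          # target: sign of the excess return

model = RandomForestClassifier(n_estimators=500, max_features=3,
                               min_samples_leaf=5, random_state=0)
model.fit(X[:-1], y[:-1])                    # train through month t-1
proba = model.predict_proba(X[-1:])          # tree-vote proportions per class
```

With synthetic noise the forecast itself is meaningless; the point is purely the mechanics of turning a lagged predictor series into a walk-forward classification problem.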
The Random Forest estimate of the probability that the sign of the excess return, $Y_t$, will be positive is

$$P(R_t - R_{ft} > 0 \mid Y_t = y_t) = g^{RF}_t\left( x_{t-1}, \ldots, x_{t-9}, r_{f,t-1}, \ldots, r_{f,t-9} \right) = \frac{|B_{y_t}|}{B}, \quad (21)$$

where $g^{RF}_t$ is the Random Forest model fit at time $t$, $y_t$ is $+$ or $-$, $B_{y_t}$ is the set of decision trees that predict $y_t$, $x_{t-1}, \ldots, x_{t-9}$ are the nine last values of the payout yield, $\hat{C}_b$ is an individual tree's decision, and $\vec{x}$ is the feature vector consisting of the payout yields and risk-free rates. (While the class vote proportions are not exactly the class probabilities, we use them as proxies.) Being correct more than half of the time would be sufficient for the investor to benefit, provided that excess returns are symmetric. Positive excess returns, however, can be more frequent and smaller in magnitude than negative excess returns. Therefore, we will see that the quality of the model is not only summarized by the classification accuracy, but also by the ability to anticipate large positive and negative excess returns.

There is information in the payout yield up to three quarters ago and the presence of interaction effects between payout yields at different months, which Random Forest can detect. In traditional literature, a higher past month's payout yield is indicative of a higher chance of a positive excess return (Fama and French, 1988). Yet the yield in the month before that still has information about the overall trend in the market. We trace the predictive gains of our approach to these effects.

For the base reward-risk timing strategy, the expected excess return $E[R - R_f]$ in $w^*$
is kept as the mean of the expanding window of excess returns until time $t-1$, $\overline{R - R_f} = \frac{1}{t-1}\sum_{i=1}^{t-1}(R_i - R_{fi})$. If an investor knows with some probability $P$ and some level of confidence $\delta_t$ that the model will correctly forecast the sign of the excess return, the investor can adjust the expectation to

$$E[R_t - R_{ft} \mid Y_t = y_t] = \delta_t \cdot \left( \overline{R - R_f}^{\,+} \cdot P(R_t - R_{ft} > 0 \mid Y_t = y_t) \cdot \pi^+ + \overline{R - R_f}^{\,-} \cdot P(R_t - R_{ft} \le 0 \mid Y_t = y_t) \cdot \pi^- \right) + (1 - \delta_t) \cdot \overline{R - R_f}, \quad (22)$$

where $\pi^+$ and $\pi^-$ are the proportions of excess returns that were historically positive or negative, multiplied by two, $\overline{R - R_f}^{\,+}$ and $\overline{R - R_f}^{\,-}$ are the means conditional on a positive or negative excess return, and $P(R_t - R_{ft} > 0) + P(R_t - R_{ft} \le 0) = 1$. Here, $\delta_t$ is the test accuracy rate of the Random Forest model. (The accuracy rate fluctuates slightly at each iteration and is therefore updated with the expanding window of predictions until the current time.) With this approach, the fitted model is able to predict the correct direction of the return approximately 58% of the time. In other words, the numerator of the weight becomes the sum of the conditional expectations weighted by class prediction probabilities and the expectation without any knowledge of the future. The sum of conditional expectations and the unconditional expectation is itself weighted by the confidence in the machine learning model. The numerator is equal to the unconditional expectation when the probabilities of positive and negative excess returns are equal. Using a weighted average of the historical mean and Random Forest prediction reduces the frequency of large shifts in the portfolio yet allows for the share in the equity index to grow when the model is highly confident that the excess return will be high. For the full machine learning reward-risk timing portfolio, the expectation of the excess return is given by Eq. 22.

Volatility has a central role in optimal portfolio selection, derivatives pricing, and risk management. These applications motivate an extensive literature on volatility modeling. We use machine learning to estimate the optimal reference window length, $N^*_t$, of the simple volatility estimator defined as the standard deviation of the past $N$ daily log returns. The motivation behind this approach is the varying choice of $N$ in the risk-managed factors literature:

$$\sigma_t(N) = \sqrt{ \frac{22}{N} \sum_{d=1}^{N} \left( f_{t+1-d/N} - \frac{1}{N} \sum_{d=1}^{N} f_{t+1-d/N} \right)^2 }. \quad (23)$$
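The expectation adjustment in Eq. 22 can be checked numerically. All figures below are invented for illustration, chosen so that the unconditional mean equals the frequency-weighted average of the class-conditional means.

```python
def adjusted_excess_return(p_up, delta, mean_pos, mean_neg, frac_pos,
                           uncond_mean):
    """Eq. 22: class-conditional means weighted by the model's class
    probabilities and pi = 2 * historical class frequency, shrunk toward
    the unconditional mean by the model-confidence delta."""
    pi_pos, pi_neg = 2.0 * frac_pos, 2.0 * (1.0 - frac_pos)
    conditional = (mean_pos * p_up * pi_pos
                   + mean_neg * (1.0 - p_up) * pi_neg)
    return delta * conditional + (1.0 - delta) * uncond_mean

# 60% of months were up, with conditional means +3% and -3.5%, so the
# unconditional mean is 0.6 * 0.03 - 0.4 * 0.035 = 0.004.
args = dict(delta=0.58, mean_pos=0.03, mean_neg=-0.035,
            frac_pos=0.6, uncond_mean=0.004)

neutral = adjusted_excess_return(p_up=0.5, **args)   # collapses to 0.004
bullish = adjusted_excess_return(p_up=0.7, **args)   # raised above 0.004
```

As the text notes, when the two class probabilities are equal the numerator collapses to the unconditional expectation; a confident positive signal raises it.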
(23)The number of returns to use in the volatility calculation, N , is the output of theRandom Forest model trained on an expanding window of data until time t − and isrestricted to values that include no partial month so the problem is multi-class classification.Another restriction on the values of N is 1 month or multiples of 3 months until 12, i.e. N ∈ { , , , , } to limit the frequency of changes. Since the optimal weight of the marketindex at time t is inversely proportional to the squared volatility, the optimal number ofreturns to include in the estimate, N ∗ t , is defined as the value which makes the volatilityestimate the maximum or the minimum under the previous constraints depending on thesign of the excess return: If r t > r ft , N ∗ t := arg min N σ t ( N ) else N ∗ t := arg max N σ t ( N ) . The volatility σ t with N ∗ t is the return-maximizing volatility σ ∗ t in our portfolio allocationframework. The future excess return, and thus N ∗ t , is unknown at time t − . We can,however, estimate the relationship between past values of N ∗ t and some predictor variables Barroso and Santa-Clara (2014) use a 6-month estimate of realized volatility to construct their risk-managed momentum strategy. On the other hand, Moreira and Muir (2017) use a single month for a numberof factors including momentum, indicating the choice for N could be optimized. We delegate the decision tomachine learning and find an advantage to automating the task based on prevailing market conditions.
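As an illustration (our own sketch, not the paper's code), the labeling of the return-maximizing window $N^*_t$ can be written as follows. The trading-day counts assume the paper's convention of 22 trading days per month; the function and variable names are ours.

```python
import numpy as np

# Candidate window lengths: 1, 3, 6, 9, and 12 months of daily returns,
# mapped to trading days assuming 22 trading days per month (the paper's
# convention; the exact day counts are an assumption of this sketch).
WINDOWS = {1: 22, 3: 66, 6: 132, 9: 198, 12: 264}

def sigma_t(daily_returns: np.ndarray, n: int) -> float:
    """Standard deviation of the last n daily log returns (Eq. 23)."""
    window = daily_returns[-n:]
    return float(np.sqrt(np.mean((window - window.mean()) ** 2)))

def optimal_window(daily_returns: np.ndarray, excess_return_sign: int) -> int:
    """Label N*_t in months: the window minimizing sigma_t(N) when the
    realized excess return is positive, maximizing it when negative."""
    sigmas = {m: sigma_t(daily_returns, n) for m, n in WINDOWS.items()}
    if excess_return_sign > 0:
        return min(sigmas, key=sigmas.get)   # argmin over N
    return max(sigmas, key=sigmas.get)       # argmax over N
```

These labels form the multi-class target on which the risk Random Forest is trained.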
We can, however, estimate the relationship between past values of $N^*_t$ and predictor variables known at time $t-1$. The predictor variables are lagged realized volatilities for the past nine months, acting as proxies,

$$\hat\sigma^2_t = \sum_{d=1}^{22}\left(f_{t+1-d/22} - \frac{1}{22}\sum_{d'=1}^{22} f_{t+1-d'/22}\right)^2, \quad \text{and} \quad \hat N_t = h^{RF}_t(\hat\sigma^2_{t-1}, \ldots, \hat\sigma^2_{t-9}). \qquad (24)$$

The reference window length is a function of the lagged volatilities. Thus, the investor's estimate of the squared return-maximizing volatility $\sigma^{*2}_t$, given the estimated optimal reference window length $\hat N_t$, becomes

$$E[\sigma^{*2}_t \mid \hat N_t] = \bar\sigma^2_{t-1}, \qquad (25)$$

where $\bar\sigma^2_{t-1} = \frac{22}{\hat N_t}\sum_{d=1}^{\hat N_t}\left(f_{t-d/\hat N_t} - \frac{1}{\hat N_t}\sum_{d'=1}^{\hat N_t} f_{t-d'/\hat N_t}\right)^2$. If $\hat N_t = 22$, the average number of trading days in a month, the squared volatility estimate is simply equal to the last month's realized squared volatility. The advantage of this measure over the last month's volatility alone is that it contains information about the future excess return. The majority of the time, $N_t$ takes either the value of 1 month or of 12 months, and changes in the window length are usually persistent.

The test, or out-of-sample, accuracy for the Random Forest is defined as

$$\Delta = \frac{1}{t}\sum_{i=1}^{t} \mathbb{1}\{h^{RF}_i(\cdot) = N^*_i\}. \qquad (26)$$

The accuracy of this Random Forest model is on average 40.2%; classification correctness is a harsh metric for multi-class models. Because there are five classes, the 40.2% attained by the model should be measured against the majority class proportion, 35.6%, and represents a substantial improvement. This accuracy is sufficient for a benefit in performance; see Section 4.

4 Empirical Results
This paper uses monthly data from Kenneth French's website on the market return (Mkt) and the risk-free asset return (Rf). Daily returns are retrieved to compute the realized volatilities.

We use data on the payout yield from Michael Roberts's website, which are derived from all firms continuously listed on the NYSE, AMEX, or NASDAQ exchanges. The payout yield here is a more inclusive measure of total payouts than standard dividend yields and is obtained via the 'net payout' of Boudoukh et al. (2007): it includes share issuances and repurchases in addition to the traditional cash dividend yields. In recent years, share repurchases have played a more important role in total payouts to shareholders. For example, Boudoukh, Richardson, and Whitelaw (2006) report a significantly higher forecast $R^2$ when using various measures of the payout yield (i.e., including repurchases) than the dividend yield. For the payout yields after 2010, CRSP monthly data at the firm level and the same aggregation procedure to form the yields are used.

To assess the predictive performance of the linear and machine learning models, we measure their directional accuracy and the excess return mean-squared error. Table 1 contains the out-of-sample percentage of the time that the machine learning and linear models forecast the sign of the excess return correctly.

Table 1: Out-of-Sample Forecasting Accuracy
In this table are the out-of-sample classification accuracies for the initial period from 1952 to 2010, the holdout period from 2011 to 2017, and the full sample period for the various strategies. The machine learning accuracy is based on the expected excess return in Eq. 22.

$R^2$ or mean-squared forecast error (MSFE) is often used to measure statistical accuracy. Our machine learning model, however, only predicts the direction of the excess return and assigns a probability. Our objective is to maximize investor utility and risk-adjusted returns by anticipating large positive or negative market returns, not to predict their precise magnitudes. The forecasts, given by a sum of the historical negative and positive returns weighted by probabilities, stay close to the long-run mean. For this reason, it is useful to take a longer period for assessing model advantages over the mean.

The out-of-sample annual $R^2$ is calculated as

$$R^2_{os} = 1 - \frac{\sum_{t \in \mathcal{T}} (f^A_t - \hat f^A_t)^2}{\sum_{t \in \mathcal{T}} (f^A_t)^2}, \qquad (27)$$

where $\mathcal{T}$ denotes the set of points not used for model training and $f^A$ are annual market excess returns. The forecasts, $\hat f^A$, are formed by averaging the monthly forecasts. The annual $R^2_{os}$ is 20.4% for the Random Forest model. Gu et al. (2020) attain an annual, stock-level out-of-sample $R^2$ of 15.7% with neural networks on an optimized set of predictors. With these forecasting characteristics in mind, we next discuss the risk-adjusted performance of the strategies and models.

This section discusses the out-of-sample investment performance of machine-learning-calibrated reward-risk timing and makes the relevant comparisons. We invest $1 in 1952 as an investor with a coefficient of relative risk aversion $\bar\gamma = 4$ and plot the cumulative returns to each strategy on a log scale in Figures 2 and 3, without short-selling and with 100% and 50% leverage constraints, respectively.
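The out-of-sample $R^2$ statistic of Eq. 27 reduces to a one-line function; the following is our own minimal sketch (names are ours):

```python
import numpy as np

def r2_os(actual: np.ndarray, forecast: np.ndarray) -> float:
    """Out-of-sample R^2 (Eq. 27): one minus the forecast's squared error
    relative to the squared error of a zero excess-return forecast,
    computed over the annual excess returns not used for training."""
    return 1.0 - np.sum((actual - forecast) ** 2) / np.sum(actual ** 2)
```

Note that the benchmark in the denominator is a zero forecast of the excess return, so a positive $R^2_{os}$ means the model beats that naive benchmark out of sample.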
For the rest of the paper, we impose the more realistic portfolio constraint preventing the investor from taking more than 50% leverage, as in Campbell and Thompson (2008): that is, confining the portfolio weight on the market index to lie between 0% and 150%.

Figure 2: Cumulative returns of reward-risk timing versus the market index (200% leverage limit).
This figure plots the cumulative returns of the base reward-risk timing strategy in blue and machine learning reward-risk timing in black against the market index in green from 1952 to 2010. The vertical axis is in log scale.
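Putting the pieces together, the reward estimate of Eq. 22 and the clipped weight rule used by the strategies can be sketched as follows. This is our own illustration, not the paper's code; the class probability would come from the reward Random Forest, and the variance from the optimized volatility estimate.

```python
import numpy as np

def expected_excess_return(p_up: float, delta: float,
                           mean_up: float, mean_down: float,
                           pi_up: float, pi_down: float,
                           uncond_mean: float) -> float:
    """Reward estimate of Eq. 22: conditional means weighted by the class
    probability p_up and the pi terms (twice the historical sign
    proportions), blended with the unconditional mean via the model
    confidence delta (the test accuracy rate)."""
    conditional = mean_up * p_up * pi_up + mean_down * (1.0 - p_up) * pi_down
    return delta * conditional + (1.0 - delta) * uncond_mean

def market_weight(reward: float, variance: float,
                  gamma: float = 4.0, cap: float = 1.5) -> float:
    """Clipped optimal weight w = E[R - Rf] / (gamma * sigma^2),
    confined to [0, cap]; cap = 1.5 under the 50% leverage limit."""
    return float(np.clip(reward / (gamma * variance), 0.0, cap))
```

When the predicted probabilities of a positive and negative excess return are equal, the probability-weighted term collapses to the unconditional mean, consistent with the discussion of Eq. 22.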
The investments that reward-risk time realize relatively steady gains. The final wealth accumulates to around $1,300 and $600 at the end of the sample for the machine learning and base (expanding-sample-mean reward estimate and previous-month realized-volatility risk estimate) strategies, respectively, versus about $400 for the buy-and-hold. At the start of the period, the machine learning model has seen three hundred observations as part of the training data, and the investment performance improves as the training set grows and the classification accuracy becomes more stable. (The figures and tables in this section all use $\bar\gamma = 4$ except for Table 3; the results do not change significantly for other values.)

Figure 3: Cumulative returns of reward-risk timing versus the market index (150% leverage limit). This figure plots the cumulative returns of the base reward-risk timing strategy in blue and machine learning reward-risk timing in black against the market index in green from 1952 to 2010. The vertical axis is in log scale.

The 'break-away' moment from the base reward-risk timing strategy is around 1970. Because the Random Forest model's parameters are determined within this period, it is necessary to also look at the cumulative returns for the holdout period from 2011 to 2017 in Figure 4. An investor who starts with $1 in 2011 and reward-risk times with machine learning again outperforms the market and the other strategies. Therefore, the results cannot be easily explained by the particular choice of machine learning model parameters.

Figure 5 plots the drawdowns of the two strategies relative to the market, which helps us understand when our strategies lose money relative to the buy-and-hold. The base reward-risk strategy takes relatively more risk when volatility is low (e.g., the 1970s) and thus, not surprisingly, its largest losses are concentrated in these times.
The machine learning analog has a pattern of losses similar to reward-risk timing with no predictive model, yet it diminishes the severity of many losses, and to a high degree for some of the most extreme negative returns.

Figure 4: Cumulative returns of reward-risk timing versus the market index (150% leverage limit). This figure plots the cumulative returns of the base reward-risk timing strategy in blue and machine learning reward-risk timing in black against the market index in green from 2011 to 2017, the unseen sample period. The vertical axis is in log scale.

For the sharp market losses starting in 1962, the first major drawdown, the return-direction machine learning model's response is delayed due to the very sudden drop. Yet for the next two major drawdowns, in 1969 and 1973, our machine learning models are able to recognize the incoming negative returns because the drops are more staggered, greatly cutting the losses felt by investors. This is seen even more clearly in the Dot-com bubble, where using machine learning allows investors to almost completely avoid losses during this time. In the last recession, of 2007-2008, due to the extremely sharp onset, our return-direction machine learning model reduces risk exposure slightly too late, yet the information in the volatility estimate still correctly steers market exposure down. Reward-risk timing never has a drawdown greater than 40% of the portfolio value and greatly mitigates three of the four largest losses during severe recessions.

Before proceeding with the numerical results, we define the various strategy weights and give descriptions:

Figure 5: Drawdowns of reward-risk timing versus the market index.
This figure plots the drawdowns of the base reward-risk timing strategy in blue and machine learning reward-risk timing in black against the market index in green from 1952 to 2010.

• $w_1 = \max(\min(E^{RF}[R - R_f \mid \mathcal{F}_{t-1}] / (\bar\gamma \cdot E^{RF}[\sigma^{*2} \mid \mathcal{F}_{t-1}]), 1.5), 0)$. This uses Random Forest for both the reward and risk estimates, with a leverage limit of 50%.
• $w_2 = \max(\min(E^{LM}[R - R_f \mid \mathcal{F}_{t-1}] / (\bar\gamma \cdot E^{RF}[\sigma^{*2} \mid \mathcal{F}_{t-1}]), 1.5), 0)$. This uses the linear model for the reward estimate and Random Forest for the risk estimate, with a leverage limit of 50%.
• $w_3 = \max(\min(\overline{R - R_f} / (\bar\gamma \sigma^2_{t-1}), 1.5), 0)$. This uses the expanding-window estimate as the reward and the previous month's realized volatility as the risk (discussed in Section 3.1), with a leverage limit of 50%.
• $w_4$ is the same as $w_1$ but with the 1.5 limit decreased to 1 (no leverage).
• $w_5$ is $w_2$ after the same change as in $w_4$.
• $w_6$ is $w_3$ after the same change as in $w_4$.

The risk-adjusted returns from machine learning portfolio allocation are substantially higher than those of reward-risk timing with no model and of the buy-and-hold. Table 2 displays the Sharpe ratios for each portfolio allocation strategy over different time periods. The sample from 2011 to 2017 is a holdout set, meaning we run the portfolio allocation process on it with the same parameters and seeds as for the previous sample, after they are finalized.

Table 2: Sharpe Ratios
In this table are the out-of-sample annual returns, standard deviations, and Sharpe ratios for the initial period from 1952 to 2010, the holdout period from 2011 to 2017, and the full sample period for the various strategies. Mkt denotes the buy-and-hold.
Sample Strategy Annual Return (%) Standard Deviation (%) Sharpe Ratio
[Table entries for Mkt, $w_1$, $w_2$, and $w_3$ over each sample period are not recoverable from the source.]

All the active strategies outperform the buy-and-hold on a risk-adjusted basis in each out-of-sample period. Reward-risk timing with Random Forest gives the highest Sharpe ratio, 0.60 from 1952 to 2010, a 40% increase over the buy-and-hold. An investor who reward-risk times with machine learning gains about 2 percentage points of return per year relative to passively investing, without increasing risk.

To quantify the economic relevance of our results and facilitate comparison, we consider the perspective of the power-utility investor. The certainty-equivalent (CE) yield for the machine learning strategy is 8.49%, versus 6.59% for base reward-risk timing. The average monthly utility is also 40.2% greater for machine learning reward-risk timing than for the buy-and-hold.

Machine learning reward-risk timing generates large gains relative to solely focusing on the risk component. Campbell and Thompson (2008) estimate that the utility gain of timing expected returns is 35% of lifetime utility.

Next, we run a series of time-series regressions of the strategies on each other and on the market index,

$$f^a_{t+1} = \alpha + \beta f^b_{t+1} + \epsilon_{t+1}, \qquad (28)$$

where the $f_{t+1}$ are monthly excess returns. A positive intercept implies that strategy $a$ increases Sharpe ratios relative to strategy $b$. When this test is applied to systematic factors (e.g., the market portfolio) that summarize pricing information for a wide cross-section of assets and strategies, a positive alpha implies that our portfolio-allocation strategy expands the mean-variance frontier.

Table 3: Strategy Alphas
In this table, we run time-series regressions of each strategy on the market and on one another, $f^a_{t+1} = \alpha + \beta f^b_{t+1} + \epsilon_{t+1}$. The superscripts denote the three variations of the strategies: RF for Random Forest, LM for the linear model, and Base for no model. The data are monthly, and the sample period is 1952 to 2010. Standard errors are in parentheses and are adjusted for heteroskedasticity (White, 1980). The alphas and their standard errors are annualized in percent per year by multiplying the monthly values by 12.

Univariate Regressions

$f^a$          $f^b$           Beta ($\beta$)   Alpha ($\alpha$)   $R^2$   $N_{obs}$
Mkt$^{RF}$     Mkt             0.66 (0.05)      4.06 (1.36)        0.50    709
Mkt$^{RF}$     Mkt$^{Base}$    …                3.20 (…)           …       …
Mkt$^{RF}$     Mkt$^{LM}$      …                4.02 (…)           …       …
Mkt$^{LM}$     Mkt             0.50 (0.05)      3.17 (1.31)        0.37    709
Mkt$^{Base}$   Mkt             0.97 (0.04)      1.15 (1.08)        0.78    709

[Entries marked … are not recoverable from the source.]

Table 3 reports results from running regressions of the machine learning reward-risk timing strategy on the market index and on the other strategies. The intercepts (Jensen's alphas; Jensen, 1968) are positive and statistically significant in all cases except for the base. The machine learning strategy has an annualized alpha of 4.06% and a beta of only 0.66. Over the base and linear-model reward-risk timing strategies, the machine learning strategy earns annualized alphas of 3.20% and 4.02%, respectively. For comparison, the alphas earned from using the linear model and the unconditional mean to forecast the excess return are markedly smaller, at 3.17% and 1.15%, respectively.

The next finding is that our strategies survive transaction costs, as shown in Table 4. Specifically, we evaluate our portfolio allocation strategy for the reward-risk timing portfolios when accounting for empirically realistic transaction costs, as in Moreira and Muir (2017). Strategies that capture reward-risk timing but reduce trading activity include capping the strategy's leverage at 1, compared with the case of a weight limit of 1.5. These leverage limits reduce trading and hence total transaction costs. We report the average absolute change in monthly weights, the expected return, and the Jensen's alpha of each strategy before transaction costs. The next columns contain the alphas under various transaction-cost assumptions. Finally, the last column derives the implied trading costs, in basis points, such that the alphas are zero in each case.

The results indicate that machine learning reward-risk timing survives transaction costs, even in high-volatility episodes when such fees rise. Overall, the annualized alpha of the reward-risk timing portfolio allocation strategy decreases slightly but remains very large. Reward-risk timing with machine learning does not require extreme leverage or drastic portfolio rebalancing to be profitable.

The empirical results overall indicate a significant advantage to using machine learning for portfolio allocation. With only standard predictor variables, reward-risk timing with machine learning models offers economically substantial improvements in risk-adjusted returns (a 40% increase in the Sharpe ratio). Statistically significant positive alphas of 4% are found as a result of the superior forecasting ability of machine learning. Finally, realistic trading costs are applied to gain further insight into real-life applicability, showing that the alphas remain large. With this evidence in mind, it is also valuable to look from a theoretical perspective at why the strategy outperforms.

Table 4: Transaction Costs of Machine Learning Portfolio Allocation

In this table, we evaluate our reward-risk timing strategies for the market when including transaction costs. Lower leverage limits reduce trading activity: specifically, we consider restricting risk exposure to lie between 0 and 1 (i.e., no leverage) or between 0 and 1.5. The alphas are reported under these assumptions. Following Moreira and Muir (2017), the 1bp cost comes from Fleming et al. (2003), the 10bps is from Frazzini, Israel, and Moskowitz (2015) when trading approximately 1% of daily volume, and the next column adds an additional 4bps to cover transaction costs increasing in high-volatility episodes. The last column backs out the implied trading costs, in basis points, needed to drive the alphas to zero in each case.

[Table entries (Weight, $|\Delta w|$, $E[R]$, $\alpha$, and $\alpha$ after trading costs for $w_1$ through $w_6$ under $\bar\gamma = 4$ and $\bar\gamma = 6$) are not recoverable from the source.]

Theoretical Framework

In this section, we provide a theoretical framework to interpret some of our findings. We first derive the alpha for base reward-risk timing; then we do the same for machine learning reward-risk timing.
We show that the alpha of base portfolio allocation is proportional to the covariance between the conditional variance of the risky asset and the asset's price of risk. A new result is that our alphas for portfolio allocation with machine learning are a function of the models' performance.

We work in continuous time. Consider the total portfolio value process $R_t$ with expected return $r_t$ and conditional volatility $\sigma_t$, so that $dR_t = r_t \, dt + \sigma_t \, dz_t$. Construct the reward-risk timing version of this return with $w_t = \frac{\overline{r - r_f}}{\bar\gamma \sigma^2_t}$ from Eq. 15,

$$dR'_t = dR_t \cdot w_t + r_{ft} \, dt \cdot (1 - w_t) = (dR_t - r_{ft} \, dt) \cdot \frac{\overline{r - r_f}}{\bar\gamma \sigma^2_t} + r_{ft} \, dt, \qquad (29)$$

where $r_{ft}$ is the instantaneous risk-free rate and $\overline{r - r_f} = \frac{1}{t}\sum_{i=1}^{t}(r_i - r_{fi})$ is the expanding sample mean. The $\alpha$ of a time-series regression of the market-timing portfolio excess return $dR'_t - r_{ft} \, dt$ on the market portfolio excess return $dR_t - r_{ft} \, dt$ is given by

$$\alpha = E[dR'_t - r_{ft} \, dt]/dt - \beta E[dR_t - r_{ft} \, dt]/dt. \qquad (30)$$

Using $E[dR'_t - r_{ft} \, dt]/dt = \frac{\overline{r - r_f}}{\bar\gamma} \cdot E\!\left[\frac{r_t - r_{ft}}{\sigma^2_t}\right]$, $\beta = \frac{\overline{r - r_f}}{\bar\gamma E[\sigma^2_t]}$ (obtained by minimizing the sum of squared deviations), and $E[dR_t - r_{ft} \, dt]/dt = E[r_t - r_{ft}]$, and simplifying, we obtain a relationship between the alpha and the covariance between the variance and the price of risk:

$$\alpha = E\!\left[\frac{r_t - r_{ft}}{\sigma^2_t}\right] \cdot \frac{\overline{r - r_f}}{\bar\gamma} - E[r_t - r_{ft}] \cdot \frac{\overline{r - r_f}}{\bar\gamma E[\sigma^2_t]} = -\frac{\overline{r - r_f}}{\bar\gamma E[\sigma^2_t]} \cdot \mathrm{cov}\!\left[\sigma^2_t, \frac{r_t - r_{ft}}{\sigma^2_t}\right]. \qquad (31)$$

Thus, the $\alpha$ is positive when the price of risk moves opposite to the variance. This is essentially the result that Moreira and Muir (2017) recover; the difference here is that the alpha is amplified by the unconditional mean excess return at time $t$ rather than by a constant.

Now, we examine the machine learning reward-risk timing alpha generation process. For tractability, we make the assumption that $\frac{E[r_t - r_{ft} \mid \mathcal{F}_{t-1}]}{E[\sigma^2_t \mid \mathcal{F}_{t-1}]} = E\!\left[\frac{r_t - r_{ft}}{\sigma^2_t} \,\Big|\, \mathcal{F}_{t-1}\right]$.
While we do not estimate the price of risk directly, Ait-Sahalia and Brandt (2001) explore directly estimating the optimal weight and find performance similar to estimating the weight factors separately. The process is then

$$dR''_t = (dR_t - r_{ft} \, dt) \cdot \frac{1}{\bar\gamma} E\!\left[\frac{r_t - r_{ft}}{\sigma^2_t} \,\Big|\, \mathcal{F}_{t-1}\right] + r_{ft} \, dt, \qquad (32)$$

where $E\!\left[\frac{r_t - r_{ft}}{\sigma^2_t} \,\big|\, \mathcal{F}_{t-1}\right]$ is the estimate of the market price of risk that the models give with the information set $\mathcal{F}_{t-1}$. The $\alpha$ of a time-series regression of the machine learning market-timing portfolio excess return $dR''_t - r_{ft} \, dt$ on the market portfolio excess return $dR_t - r_{ft} \, dt$ is again

$$\alpha = E[dR''_t - r_{ft} \, dt]/dt - \beta E[dR_t - r_{ft} \, dt]/dt. \qquad (33)$$

Using $E[dR''_t - r_{ft} \, dt]/dt = E\!\left[\frac{r_t - r_{ft}}{\sigma^2_t}\right] \cdot E[r_t - r_{ft}]/\bar\gamma$, which follows by iterated expectations and independence of the excess return and the machine learning price-of-risk estimate, together with $\beta = \frac{1}{\bar\gamma} E\!\left[\frac{r_t - r_{ft}}{\sigma^2_t} \,\big|\, \mathcal{F}_{t-1}\right]$ and $E[dR_t - r_{ft} \, dt]/dt = E[r_t - r_{ft}]$, we obtain a relationship between the alpha and the price-of-risk expectations:

$$\alpha = \frac{E[r_t - r_{ft}]}{\bar\gamma} \cdot \left( E\!\left[\frac{r_t - r_{ft}}{\sigma^2_t}\right] - E\!\left[\frac{r_t - r_{ft}}{\sigma^2_t} \,\Big|\, \mathcal{F}_{t-1}\right] \right). \qquad (34)$$

In this case, $\alpha$ is positive when, given a positive excess return, the machine learning expectation of the market price of risk conditional on the previous period's information set is cheaper than the unconditional expectation of the price. If the excess return is negative, then $\alpha$ is positive if the machine learning estimate is less than the unconditional one, avoiding a high allocation. The result does not depend on the sign of the return.
Not surprisingly, the alpha is positive if the accuracy of the models used to estimate the market price of risk is good enough to distinguish between positive and negative risk premia based on the known information set. The above results provide a mapping between machine learning reward-risk timing alphas and the dynamics of the price of risk for an individual asset.

Conclusion
Machine learning portfolio allocation offers large risk-adjusted returns and is feasible to implement in real time. We perform both return- and volatility-timing, or reward-risk timing, with and without machine learning, showcasing the relative advantage machine learning can provide. Furthermore, our strategy's performance is informative about the alpha generation process for actively managed portfolios.

At the same time, there are possibilities for improvement. Other machine learning methods, such as deep neural networks, may allow trading some interpretability for performance gains. Using predictors beyond lagged payout yields and risk-free rates may also be beneficial. Additionally, applying this strategy to daily or weekly data may have the benefit of catching sharp drops in the market. Since one of our goals here was to show that machine learning has an advantage in finance and portfolio allocation outside the context of big data, the results with standard variables are promising.
References

[1] Ait-Sahalia, Y., and M. Brandt, 2001. Variable Selection for Portfolio Choice. The Journal of Finance.
[2] Barroso, P., and P. Santa-Clara, 2015. Momentum Has Its Moments. Journal of Financial Economics.
[3] Boudoukh, J., R. Michaely, M. Richardson, and M. R. Roberts, 2007. On the Importance of Measuring Payout Yield: Implications for Empirical Asset Pricing. The Journal of Finance.
[4] Breiman, L., J. Friedman, R. Olshen, and C. Stone, 1984. Classification and Regression Trees. Wadsworth.
[5] Campbell, J. Y., and S. B. Thompson, 2008. Predicting Excess Stock Returns Out of Sample: Can Anything Beat the Historical Average? The Review of Financial Studies.
[6] Fleming, J., C. Kirby, and B. Ostdiek, 2003. The Economic Value of Volatility Timing Using "Realized" Volatility. Journal of Financial Economics.
[7] Gron, A., B. N. Jørgensen, and N. G. Polson, 2011. Optimal Portfolio Choice and Stochastic Volatility. Applied Stochastic Models in Business and Industry.
[8] Gu, S., B. Kelly, and D. Xiu, 2020. Empirical Asset Pricing via Machine Learning. The Review of Financial Studies.
[9] Henrique, B. M., V. A. Sobreiro, and H. Kimura, 2019. Literature Review: Machine Learning Techniques Applied to Financial Market Prediction. Expert Systems with Applications.
[10] Jensen, M. C., 1968. The Performance of Mutual Funds in the Period 1945-1964. The Journal of Finance.
[11] Moreira, A., and T. Muir, 2017. Volatility-Managed Portfolios. The Journal of Finance.
[12] Murphy, K. P., 2012. Machine Learning: A Probabilistic Perspective. MIT Press.
[13] White, H., 1980. A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity. Econometrica.

[Entries for several additional works cited in the text could not be recovered from the source.]

Appendix
A Stein’s lemma for stochastic volatility
Let $X$ be a random variable with a stochastic volatility, so that $X \mid \sigma$ is distributed $N(\mu, V\sigma)$ and $\sigma$ has density $p(\sigma)$ that is non-negative only for $\sigma \ge 0$. Let $g(X)$ be a differentiable function of $X$ such that $E[|g(X)|] < \infty$, and suppose that $0 < E[\sigma] < \infty$. If $(X, Y \mid \sigma)$ are bivariate normal random variables, then

$$\mathrm{cov}[g(X), Y] = E_Q[g'(X)] \, \mathrm{cov}[X, Y], \qquad (A.35)$$

where $E_Q$ is the expectation taken under the measure induced by size-biasing, $q(\sigma) = \sigma p(\sigma)/E[\sigma]$. For a proof, see Gron et al. (2011).
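As a sanity check (our own illustration, not part of the paper), the identity can be verified by Monte Carlo for a simple two-point volatility distribution, with conditional variance $V\sigma$ and conditional correlation $\rho$; the parameter values are arbitrary choices for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2_000_000
V, rho = 1.0, 0.5

# Two-point volatility state: sigma in {1, 2} with equal probability.
sigma = rng.choice([1.0, 2.0], size=n)

# (X, Y | sigma) bivariate normal with variance V*sigma and correlation rho.
z1, z2 = rng.standard_normal(n), rng.standard_normal(n)
x = np.sqrt(V * sigma) * z1
y = np.sqrt(V * sigma) * (rho * z1 + np.sqrt(1 - rho**2) * z2)

g = lambda v: v**3           # a differentiable test function g(X) = X^3
g_prime = lambda v: 3 * v**2

lhs = np.mean(g(x) * y) - np.mean(g(x)) * np.mean(y)     # cov[g(X), Y]
# E_Q[g'(X)] under the size-biased measure q(sigma) = sigma p(sigma)/E[sigma]
eq_gprime = np.mean(sigma * g_prime(x)) / np.mean(sigma)
cov_xy = np.mean(x * y) - np.mean(x) * np.mean(y)
rhs = eq_gprime * cov_xy

print(lhs, rhs)   # the two sides agree up to Monte Carlo error
```

For this choice of $g$, both sides have the closed-form value $3\rho V^2 E[\sigma^2] = 3.75$, so the simulated values should agree with it to well within Monte Carlo error.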
The base reward-risk time-series regression is given by

$$\frac{dR'_t}{dt} - r_{ft} = \alpha + \beta\left(\frac{dR_t}{dt} - r_{ft}\right) + \epsilon_t, \qquad (B.1)$$

with $dR'_t$ given by Eq. 29. Next, define $f_t = \frac{dR_t}{dt} - r_{ft}$ and $f'_t$ as the left-hand side to get

$$f'_t = \alpha + \beta f_t + \epsilon_t. \qquad (B.2)$$

To derive $\beta$, minimize the sum of squared residuals,

$$\min_{\alpha, \beta} E[(f'_t - (\alpha + \beta f_t))^2]. \qquad (B.3)$$

Solving the standard first-order conditions gives

$$\beta = \frac{\mathrm{cov}[f'_t, f_t]}{\mathrm{var}[f_t]}. \qquad (B.4)$$

For the base reward-risk timing,

$$\beta = \frac{\mathrm{cov}\!\left[(r_t - r_{ft}) \cdot \overline{r - r_f}/(\bar\gamma \sigma^2_t),\; r_t - r_{ft}\right]}{\mathrm{var}[r_t - r_{ft}]} = \frac{\overline{r - r_f}}{\bar\gamma E[\sigma^2_t]}. \qquad (B.5)$$

For the machine learning reward-risk timing,

$$\beta = \frac{\mathrm{cov}\!\left[(r_t - r_{ft}) \cdot E[(r_t - r_{ft})/\sigma^2_t \mid \mathcal{F}_{t-1}]/\bar\gamma,\; r_t - r_{ft}\right]}{\mathrm{var}[r_t - r_{ft}]} = \frac{1}{\bar\gamma} E\!\left[\frac{r_t - r_{ft}}{\sigma^2_t} \,\Big|\, \mathcal{F}_{t-1}\right]. \qquad (B.6)$$

C Decision tree algorithms
Algorithm C1 details how to build a classification tree in a Random Forest and is a greedy algorithm (Breiman et al., 1984). We refer to the recursive version in Murphy (2012).

Algorithm C1: Classification Tree

Initialize the stump node, $N^{(0)}$. $N_k(d)$ is the $k$-th node at depth $d$, $S$ denotes the data, and $C$ is the set of unique labels.

function fitTree($N_k(d)$, $S$, $d$):
1. The prediction of node $N_k(d)$ is the majority vote of its observations, $\mathrm{sign}(\sum_{i \in N_k(d)} y_i)$.
2. Define the cost function as the Gini index, $\mathrm{cost}(\{x_i, y_i\}) = \sum_{c=1}^{|C|} \hat\pi_c (1 - \hat\pi_c)$, where $\hat\pi_c$ is the frequency with which an entry in the leaf belongs to class $c$.
3. Select the optimal split:
$(j^*, t^*) = \arg\min_{j \in \{1, \ldots, m\}} \min_{t \in \mathcal{T}_j} \left( \mathrm{cost}(\{x_i, y_i : x_{ij} \le t\}) + \mathrm{cost}(\{x_i, y_i : x_{ij} > t\}) \right)$,
with $S_{left} = \{x_i, y_i : x_{ij^*} \le t^*\}$ and $S_{right} = \{x_i, y_i : x_{ij^*} > t^*\}$.
4. if notWorthSplitting($d$, cost, $S_{left}$, $S_{right}$) then
return $N_k(d)$
else update the nodes:
$N(d+1)$ = fitTree($N_k(d)$, $S_{left}$, $d+1$)
$N(d+1)$ = fitTree($N_k(d)$, $S_{right}$, $d+1$)
return $N_k(d)$
end

Result: the classification tree model $f(\vec{x}) = \sum_{m=1}^{d} w_m \mathbb{1}\{\vec{x} \in S_m\}$, where $w_m = \mathrm{sign}(\sum_{i \in S_m} y_i)$.

The function notWorthSplitting($d$, cost, $S_{left}$, $S_{right}$) contains stopping heuristics to prevent overfitting. In our case, its value is true if the fraction of examples in either $S_{left}$ or $S_{right}$ is less than $s_{\min}$, the minimum fraction of observations in a node for a split, determined by the user's parameter optimization.

For the reward Random Forest model, which estimates the excess return direction and probability, the values we set for $s_{\min}$, the number of trees, and the number of variables to select from at each split ($m$) are 0.02, 500, and 4, respectively. For the risk Random Forest model, which gives the volatility window, the values we set for the number of trees and the number of variables to select from at each split ($m$) are 300 and 4, respectively. Rather than $s_{\min}$,