Portfolio Construction Using Stratified Models
Jonathan Tuck, Shane Barratt, Stephen Boyd
February 10, 2021
Abstract
In this paper we develop models of asset return mean and covariance that depend on some observable market conditions, and use these to construct a trading policy that depends on these conditions and the current portfolio holdings. After discretizing the market conditions, we fit Laplacian regularized stratified models for the return mean and covariance. These models have a different mean and covariance for each market condition, but are regularized so that nearby market conditions have similar models. This technique allows us to fit models for market conditions that have not occurred in the training data, by borrowing strength from nearby market conditions for which we do have data. These models are combined with a Markowitz-inspired optimization method to yield a trading policy that is based on market conditions. We illustrate our method on a small universe of 18 ETFs, using three well known and publicly available market variables to construct 1000 market conditions, and show that it performs well out of sample. The method, however, is general, and scales to much larger problems, which presumably would use proprietary data sources and forecasts along with publicly available data.
Trading policy.
We consider the problem of constructing a trading policy that depends on some observable market conditions, as well as the current portfolio holdings. We denote the asset daily returns as $y_t \in \mathbf{R}^n$, for $t = 1, \ldots, T$. The observable market conditions are denoted as $z_t$. We assume these are discrete or categorical, so we have $z_t \in \{1, \ldots, K\}$. We denote the portfolio asset weights as $w_t \in \mathbf{R}^n$, with $\mathbf{1}^T w_t = 1$, where $\mathbf{1}$ is the vector with all entries one. The trading policy has the form
\[
\mathcal{T} : \{1, \ldots, K\} \times \mathbf{R}^n \to \mathbf{R}^n,
\]
where $w_t = \mathcal{T}(z_t, w_{t-1})$, i.e., it maps the current market condition and previous portfolio weights to the current portfolio weights. In this paper we refer to $z_t$ as the market conditions, since in our example it is derived from market conditions, but in fact it could be anything known before the portfolio weights are chosen, including proprietary forecasts or other data. Our policy $\mathcal{T}$ is a simple Markowitz-inspired policy, based on a Laplacian regularized stratified model of the asset return mean and covariance; see, e.g., [Mar52, GK99, BBD+].
Laplacian regularized stratified model.
We model the asset returns, conditioned on market conditions, as Gaussian,
\[
y \mid z \sim \mathcal{N}(\mu_z, \Sigma_z),
\]
with $\mu_z \in \mathbf{R}^n$ and $\Sigma_z \in \mathbf{S}^n_{++}$ (the set of symmetric positive definite $n \times n$ matrices), $z = 1, \ldots, K$. This is a stratified model, with stratification feature $z$. We fit this stratified model, i.e., determine the means $\mu_1, \ldots, \mu_K$ and covariances $\Sigma_1, \ldots, \Sigma_K$, by minimizing the negative log-likelihood of historical training data, plus a regularization term that encourages nearby market conditions to have similar means and covariances. This technique allows us to fit models for market conditions which have not occurred in the training data, by borrowing strength from nearby market conditions for which we do have data. Laplacian regularized stratified models are discussed in, e.g., [DWW14, SS16, THB19, TBB21, TB20, TB21]. One advantage of Laplacian regularized stratified models is that they are interpretable. They are also auditable: we can easily check if the results are reasonable.
This paper.
In this paper we present a single example of developing a trading policy as described above. Our example is small, with a universe of 18 ETFs, and we use market conditions that are publicly available and well known. Given the small universe and our use of widely available market conditions, we cannot expect much in terms of performance, but we will see that the trading algorithm performs well out of sample. Our example is meant only as a simple illustration of the ideas; the techniques we describe can easily scale to a universe of thousands of assets, and use proprietary forecasts in the market conditions.
Outline.
We start by reviewing Laplacian regularized models in §2. The sections that follow describe the data records and dataset, the stratified return and risk models, and the details of the trading policy $\mathcal{T}$. We also mention a few extensions and variations of the methods.

A number of studies show that the underlying covariances of equities change during different market conditions, such as when the market performs historically well or poorly (a "bull" or "bear" market, respectively), or when there is historically high or low volatility [EHV94, LS01, AB03, AB04, Bor12]. Modeling the dynamics of underlying statistical properties of assets is an area of ongoing research. Many model these statistical properties as occurring in hard regimes, and utilize methods such as hidden Markov models [RTA98, HTF09, NML18] or greedy Gaussian segmentation [HNB19] to model the transitions and breakpoints between the regimes. This paper also assumes a hard regime model of our statistical parameters; in contrast to those methods, however, our chief assumption is, informally speaking, that similar regimes have similar statistical parameters.

Asset allocation based on changing market conditions is a sensible method for active portfolio management [AB02, AT11, NHML15, Pet15]. A popular method is to utilize convex optimization control policies to dynamically allocate assets in a portfolio, where the time-varying statistical properties are modeled as a hidden Markov model [NBLM19].
In this section we review Laplacian regularized stratified models, focusing on the specific models we will use; for more detail see [TBB21, TB20]. We are given data records of the form $(z, y) \in \{1, \ldots, K\} \times \mathbf{R}^n$, where $z$ is the feature over which we stratify, and $y$ is the outcome. We let $\theta \in \Theta$ denote the parameter values in our model. The stratified model consists of a choice of parameter $\theta_z$ for each value of $z$. In this paper, we construct two stratified models. One is for return, where $\theta_z \in \Theta = \mathbf{R}^n$ is an estimate or forecast of return, and the other is for return covariance, where $\theta_z \in \Theta = \mathbf{S}^n_{++}$ is the inverse covariance or precision matrix, and $\mathbf{S}^n_{++}$ denotes the set of symmetric positive definite $n \times n$ matrices. (We use the precision matrix since it is the natural parameter in the exponential family representation of a Gaussian, and renders the fitting problems convex.)

To choose the parameters $\theta_1, \ldots, \theta_K$, we minimize
\[
\sum_{k=1}^K \left( \ell_k(\theta_k) + r(\theta_k) \right) + \mathcal{L}(\theta_1, \ldots, \theta_K). \tag{1}
\]
Here $\ell_k$ is the loss function, which depends on the training data $y_i$ for $z_i = k$, typically a negative log-likelihood under our model for the data. The function $r$ is the local regularizer, chosen to improve out of sample performance of the model.

The last term in (1) is the Laplacian regularization, which encourages neighboring values of $z$, under some weighted graph, to have similar parameters. It is characterized by $W \in \mathbf{S}^K$, a symmetric weight matrix with zero diagonal entries and nonnegative off-diagonal entries. The Laplacian regularization has the form
\[
\mathcal{L}(\theta_1, \ldots, \theta_K) = \frac{1}{2} \sum_{i,j=1}^K W_{ij} \|\theta_i - \theta_j\|^2,
\]
where the norm is the Euclidean or $\ell_2$ norm when $\theta_z$ is a vector, and the Frobenius norm when $\theta_z$ is a matrix. We think of $W$ as defining a weighted graph, with edges associated with positive entries of $W$, with edge weight $W_{ij}$.
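As a small illustration of the Laplacian regularization term, the following sketch evaluates it for vector parameters on a three-node chain graph. The function name `laplacian_regularization`, the weights, and the parameter values are hypothetical and chosen only for this example.

```python
def laplacian_regularization(thetas, W):
    """Return (1/2) * sum_{i,j} W[i][j] * ||theta_i - theta_j||_2^2.

    thetas: list of K parameter vectors (lists of floats).
    W: K x K symmetric weight matrix with zero diagonal (list of lists).
    """
    K = len(thetas)
    total = 0.0
    for i in range(K):
        for j in range(K):
            if W[i][j] != 0.0:
                diff_sq = sum((a - b) ** 2 for a, b in zip(thetas[i], thetas[j]))
                total += W[i][j] * diff_sq
    return 0.5 * total


# Hypothetical chain graph on K = 3 values of z, with edge weight 2.0 on each edge.
W = [[0.0, 2.0, 0.0],
     [2.0, 0.0, 2.0],
     [0.0, 2.0, 0.0]]
thetas = [[0.0, 0.0], [1.0, 0.0], [1.0, 2.0]]
value = laplacian_regularization(thetas, W)
```

Because $W$ is symmetric, each edge appears twice in the double sum, so the factor of one half leaves exactly one term $W_{ij}\|\theta_i - \theta_j\|^2$ per edge.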
The larger $W_{ij}$ is, the more encouragement we give for $\theta_i$ and $\theta_j$ to be close.

When the loss and regularizer are convex, the problem (1) is convex, and so in principle is tractable [BV04]. The distributed method introduced in [TBB21], which exploits the properties that the first two terms in the objective are separable across $k$, while the last term is separable across the entries of the parameters, can easily solve very large instances of the problem.

A Laplacian regularized stratified model typically includes several hyper-parameters, for example ones that scale the local regularization, or scale some of the entries in $W$. We adjust these hyper-parameters by choosing some values, fitting the Laplacian regularized stratified model for each choice of the hyper-parameters, and evaluating the true loss function on a (held-out) validation set. (The true loss function is often but not always the same as the loss function used in the fitting objective (1).) We choose hyper-parameters that give the least, or nearly least, true loss on the validation data, biasing our choice toward larger values, i.e., more regularization.

We make a few observations about Laplacian regularized stratified models. First, they are interpretable, and we can check them for reasonableness by examining the values $\theta_z$, and how they vary with $z$. At the very least, we can examine the largest and smallest values of each entry (or some function) of $\theta_z$ over $z \in \{1, \ldots, K\}$.

Second, we note that a Laplacian regularized stratified model can be created even when we have no training data for some, or even many, values of $z$. The parameter values for those values of $z$ are obtained by borrowing strength from their neighbors for which we do have data. In fact, the parameter values for values of $z$ for which we have no data are weighted averages of their neighbors.
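The borrowing of strength can be seen in a toy problem. The sketch below is not the distributed method of [TBB21]; it assumes a scalar parameter per node, a squared-error loss, no local regularization, and a simple coordinate-wise (Gauss-Seidel) update, all chosen only for illustration. The function name `fit_stratified_mean` and the data are hypothetical.

```python
def fit_stratified_mean(data, W, iters=500):
    """Minimize sum_k sum_{y in data[k]} (theta_k - y)^2
    + (1/2) sum_{i,j} W[i][j] (theta_i - theta_j)^2 by coordinate updates.

    data: dict mapping node index -> list of observations (may omit nodes).
    W: K x K symmetric weight matrix with zero diagonal.
    """
    K = len(W)
    counts = [len(data.get(k, [])) for k in range(K)]
    sums = [sum(data.get(k, [])) for k in range(K)]
    theta = [0.0] * K
    for _ in range(iters):
        for k in range(K):
            nbr_weight = sum(W[k])
            nbr_sum = sum(W[k][j] * theta[j] for j in range(K))
            # Exact minimizer over theta_k with the other coordinates fixed.
            theta[k] = (sums[k] + nbr_sum) / (counts[k] + nbr_weight)
    return theta


# Hypothetical chain of K = 5 conditions; only the two end nodes have data.
K = 5
W = [[0.0] * K for _ in range(K)]
for k in range(K - 1):
    W[k][k + 1] = W[k + 1][k] = 1.0
data = {0: [1.0, 1.2], 4: [3.0]}
theta = fit_stratified_mean(data, W)
```

In this toy problem the three conditions with no data end up as the average of their two neighbors, so the fitted values interpolate between the two conditions that do have data.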
This implies a number of interesting properties, such as a maximum principle: Any such value lies between the minimum and maximum values of the parameter over those values of $z$ for which we have data.

Our example considers $n = 18$ ETFs as the universe of assets,

AGG, DBC, GLD, IBB, ITA, PBJ, TLT, VNQ, VTI, XLB, XLE, XLF, XLI, XLK, XLP, XLU, XLV, XLY.

Each data record has the form $(y, z)$, where $y \in \mathbf{R}^{18}$ is the daily active return of each asset with respect to VTI, from market close on the previous day until market close on that day, and $z$ represents the market condition known at the previous day's market close, described later in §4. (The daily active return of each asset with respect to VTI is the daily return of that asset minus the daily return of VTI.) Henceforth, when we refer to return or risk we mean active return or active risk, with respect to our benchmark VTI. The benchmark VTI has zero active return and risk.

Our dataset spans March 2006 to December 2019, for a total of 3462 data points. We first partition it into two subsets. The first, using data from 2006–2014, is used to fit the return and risk models as well as to choose the hyper-parameters in the return and risk models and the trading policy. The second subset, with data in 2015–2019, is used to test the trading policy. We then randomly partition the first subset into two parts: a training set consisting of 80% of the data records, and a validation set consisting of 20% of the data records. Thus we have three datasets: a training dataset with 1780 data points in the date range 2006–2014, a validation set with 445 data points also in the date range 2006–2014, and a test dataset with 1237 data points in the date range 2015–2019. We use 9 years of data to fit our models and choose hyper-parameters, and 5 years of later data to test the trading policy. The return data in the training and validation datasets were winsorized (clipped) at their 1st and 99th percentiles. The return data in the test dataset was not.

            Volatility  Inflation  Mortgage
Volatility   1          -0.13      -0.14
Inflation    -           1          0.21
Mortgage     -           -          1

Table 1: Correlation of the market indicators over the training and validation period, 2006–2014.
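The winsorization applied to the training and validation returns can be sketched as follows. The text does not say how the percentiles are computed; the linear interpolation used here, the function names, and the sample returns are assumptions made for this example.

```python
def percentile(sorted_vals, q):
    """Linearly interpolated percentile (q in [0, 100]) of an ascending list.

    The interpolation rule is an assumption; other conventions exist.
    """
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac


def winsorize(returns, lower_q=1.0, upper_q=99.0):
    """Clip each return to the [lower_q, upper_q] percentile range of the sample."""
    ordered = sorted(returns)
    lo = percentile(ordered, lower_q)
    hi = percentile(ordered, upper_q)
    return [min(max(r, lo), hi) for r in returns]


# Hypothetical sample: 101 daily returns evenly spaced from -5% to +5%.
returns = [i / 1000.0 for i in range(-50, 51)]
clipped = winsorize(returns)
```

Only the extreme tails are modified; values inside the percentile range pass through unchanged.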
Each data record also includes the market condition $z$ known on the previous day's market close. To construct the market condition $z$, we start with three (real-valued) market indicators.
Market implied volatility.
The volatility of the market is a commonly used economic indicator, with extreme values associated with market turbulence [FSS87, Sch89, AIL99, CCR20]. Here, volatility is measured by the 15-day moving average of the CBOE volatility index (VIX) on the S&P 500 [Exc20], lagged by an additional day.
Inflation rate.
The inflation rate measures the percentage change of purchasing power in the economy [WS94, BLS96, BLS01, BC03, Hun03, Mah17]. The inflation rate is published by the United States Bureau of Labor Statistics [oLS20] as the percent change of the consumer price index (CPI), which measures changes in the price level of a representative basket of consumer goods and services, and is updated monthly.
Mortgage rate.
This metric is the interest rate charged by a mortgage lender on 30-year mortgages, and the change of this rate is an economic indicator correlated with economic spending [Cav16, SMS17]. The 30-year U.S. mortgage rate is published by the Federal Home Loan Mortgage Corporation, a public government-sponsored enterprise, and is generally updated weekly [FRED20].

These three economic indicators are not particularly correlated over the training and validation period, as can be seen in table 1.
Discretization.
Each of these market indicators is binned into deciles, labeled $1, \ldots, 10$, so the number of market conditions is $K = 10 \times 10 \times 10 = 1000$. We can think of $z$ as a 3-tuple of deciles, in $\{1, \ldots, 10\}^3$, or encoded as a single value $z \in \{1, \ldots, 1000\}$.

Figure 1: Stratification feature values over time (decile versus date, for volatility, inflation, and mortgage rate). The vertical line at 2015 separates the training and validation period (2006–2014) from the test period (2015–2019).
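The text does not spell out how a 3-tuple of deciles is mapped to a single index. One possible convention, an assumption made here for illustration, treats volatility as the most significant digit; the function names `encode` and `decode` are hypothetical.

```python
def encode(vol, inf, mort):
    """Map deciles (vol, inf, mort) in {1,...,10}^3 to an index z in {1,...,1000}.

    The digit ordering (volatility most significant) is an assumed convention.
    """
    for d in (vol, inf, mort):
        if not 1 <= d <= 10:
            raise ValueError("deciles must lie in 1..10")
    return (vol - 1) * 100 + (inf - 1) * 10 + (mort - 1) + 1


def decode(z):
    """Inverse of encode: recover the (vol, inf, mort) deciles from z."""
    if not 1 <= z <= 1000:
        raise ValueError("z must lie in 1..1000")
    idx = z - 1
    return idx // 100 + 1, (idx // 10) % 10 + 1, idx % 10 + 1


z_example = encode(9, 3, 7)
deciles_example = decode(z_example)
```

Any bijection between the two sets would serve equally well, since the regularization graph is defined on the deciles rather than on the integer index.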
The market conditions over the entire dataset are shown in figure 1, with the vertical line at 2015 indicating the boundary between the training and validation period (2006–2014) and the test period (2015–2019). The average value of $\|z_{t+1} - z_t\|$ (interpreting them as vectors in $\{1, \ldots, 10\}^3$) is around 0.26, meaning that on each day, the market conditions change by around 0.26 deciles on average.
Data scarcity.
The market conditions can take on $K = 1000$ possible values. In the training/validation datasets, only 224 of 1000 market conditions appear, so there are 776 market conditions for which there are no data points. The most populated market condition, which corresponds to market conditions (9, […] no training data. This scarcity of data means that the Laplacian regularization is critical in constructing models of the return and risk that depend on the market conditions.

In the test dataset, only 133 of the economic conditions appear. The average number of data points per market condition in the test dataset is 1.26. Only nine economic conditions appear in both the training/validation and test datasets. In the test data, there are only 397 days (about 32% of the 1237 test data days) in which the market conditions for that day were observed in the training/validation datasets.
Regularization graph.
Laplacian regularization requires a weighted graph that tells us which market conditions are 'close'. Our graph is the Cartesian product of three chain graphs, which link each decile of each indicator to its successor (and predecessor). This graph on the 1000 values of $z$ has 2700 edges. Each edge connects two adjacent deciles of one of our three economic indicators. We assign three different positive weights to the edges, depending on which indicator they link. We denote these as
\[
\gamma_{\mathrm{vol}}, \quad \gamma_{\mathrm{inf}}, \quad \gamma_{\mathrm{mort}}. \tag{2}
\]
These are hyper-parameters in our Laplacian regularization. Each of the nonzero entries in the weight matrix $W$ is one of these values. For example, the edge between $(3, i, 4)$ and $(3, i+1, 4)$, which connects two values of $z$ that differ by one decile in inflation, has weight $\gamma_{\mathrm{inf}}$.

In this section we describe the stratified return model. The model consists of a mean return vector $\mu_z \in \mathbf{R}^{18}$ for each of $K = 1000$ different market conditions, for a total of $Kn = 18000$ parameters.

The loss in (1) is a Huber penalty,
\[
\ell_k(\mu_k) = \sum_{t : z_t = k} \mathbf{1}^T H(\mu_k - y_t),
\]
where $H$ is the Huber penalty (applied entrywise above),
\[
H(z) = \begin{cases} z^2, & |z| \le M \\ 2M|z| - M^2, & |z| > M, \end{cases}
\]
where $M > 0$ is a parameter, which we take as $M = 0.01$. (This corresponds to the 79th percentile of absolute return on the training dataset.) We use quadratic or $\ell_2$ squared local regularization in (1),
\[
r(\mu_k) = \gamma_{\mathrm{ret,loc}} \|\mu_k\|_2^2,
\]
where the positive regularization weight $\gamma_{\mathrm{ret,loc}}$ is another hyper-parameter.

The Laplacian regularization contains the three hyper-parameters (2), so overall our stratified return model has four hyper-parameters.

To choose the hyper-parameters for the stratified return model, we start with a coarse grid search, evaluating all combinations of three values of $\gamma_{\mathrm{ret,loc}}$ and five values each of $\gamma_{\mathrm{vol}}$, $\gamma_{\mathrm{inf}}$, and $\gamma_{\mathrm{mort}}$, a total of 375 combinations, and selecting the hyper-parameter combination that yielded the largest correlation between the return estimates and the returns over the validation set. (Thus, our true loss is negative correlation of forecast and realized returns.) Starting from the best combination found in the coarse search, we then carry out a second search over a finer grid, again with three values of $\gamma_{\mathrm{ret,loc}}$ and five values each of $\gamma_{\mathrm{vol}}$, $\gamma_{\mathrm{inf}}$, and $\gamma_{\mathrm{mort}}$, a total of 375 combinations. The final hyper-parameter values are
\[
(\gamma_{\mathrm{ret,loc}}, \gamma_{\mathrm{vol}}, \gamma_{\mathrm{inf}}, \gamma_{\mathrm{mort}}) = (\ldots). \tag{3}
\]
These can be roughly interpreted as follows. The large value for $\gamma_{\mathrm{mort}}$ tells us that our return model should not vary much with mortgage rate, and the smaller values for $\gamma_{\mathrm{vol}}$ and $\gamma_{\mathrm{inf}}$ tell us that our return model can vary more with volatility and inflation.

Table 2 shows the correlation coefficient of the return estimates to the true returns over the training and validation sets, for the stratified return model and the common return model, i.e., the empirical mean over the training set. The stratified return model estimates have a larger correlation with the realized returns in both the training set and the validation set. The common return model even has a slightly negative correlation with the true returns on the validation dataset.

Table 2: Correlations to the true returns over the training set and the held-out validation set for the return models.

Table 3 summarizes some of the statistics of our stratified return model over the 1000 market conditions, along with the common model value. We can see that each forecast varies considerably across the market conditions. Note that the common model values are the averages over the training data; the median, minimum, and maximum are over the 1000 market conditions.
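The regularization graph whose three edge weights are tuned in the hyper-parameter search can be enumerated directly. The sketch below builds the edge list of the product of three chain graphs; the function name `build_edges` and the numeric weights are hypothetical placeholders, not the values used in the paper.

```python
from itertools import product


def build_edges(gamma_vol, gamma_inf, gamma_mort, deciles=10):
    """List the edges of the Cartesian product of three chain graphs.

    Each edge is (node, neighbor, weight), where nodes are 3-tuples of
    deciles (volatility, inflation, mortgage) and the weight depends on
    which indicator differs by one decile.
    """
    weights = (gamma_vol, gamma_inf, gamma_mort)
    edges = []
    for node in product(range(1, deciles + 1), repeat=3):
        for axis in range(3):
            if node[axis] < deciles:
                neighbor = list(node)
                neighbor[axis] += 1
                edges.append((node, tuple(neighbor), weights[axis]))
    return edges


# Placeholder weights, used only to tell the three edge types apart.
edges = build_edges(1.0, 2.0, 3.0)
num_edges = len(edges)
```

Each indicator contributes $10 \times 10 \times 9 = 900$ edges, which accounts for the 2700 edges stated in the text.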
Asset   Common   Median   Min      Max
AGG     -0.021   -0.071   -0.128   0.073
DBC     -0.056   -0.060   -0.158   0.106
GLD     -0.012   -0.012   -0.119   0.153
IBB      0.033    0.031   -0.098   0.139
ITA      0.022    0.031   -0.077   0.075
PBJ      0.006    0.005   -0.039   0.112
TLT     -0.000   -0.063   -0.173   0.110
VNQ      0.016    0.009   -0.301   0.071
XLB      0.001    0.010   -0.065   0.078
XLE     -0.005    0.014   -0.122   0.127
XLF     -0.019   -0.040   -0.398   0.055
XLI      0.007    0.010   -0.056   0.062
XLK      0.005    0.004   -0.059   0.090
XLP      0.005   -0.004   -0.041   0.070
XLU     -0.008   -0.018   -0.074   0.083
XLV      0.010    0.009   -0.033   0.065
XLY      0.013    0.004   -0.059   0.066

Table 3: Return predictions, in percent daily return. The first column gives the common return model; the second, third, and fourth columns give median, minimum, and maximum return predictions over the 1000 market conditions for the Laplacian regularized stratified model.

Stratified risk model
In this section we describe the stratified risk model, i.e., a return covariance that depends on $z$. For determining the risk model, we can safely ignore the (small) mean return, and assume that $y_t$ has zero mean. The model consists of $K = 1000$ inverse covariance matrices $\Sigma_k^{-1} = \theta_k \in \mathbf{S}^{18}_{++}$, indexed by the market conditions. Our stratified risk model has $Kn(n+1)/2 = 171000$ parameters. The loss in (1) is the negative log-likelihood (scaled, with constant terms ignored),
\[
\ell_k(\theta_k) = \mathbf{Tr}(S_k \Sigma_k^{-1}) - \log\det(\Sigma_k^{-1}),
\]
where $S_k = \frac{1}{n_k} \sum_{t : z_t = k} y_t y_t^T$ is the empirical covariance matrix of the data $y$ for which $z = k$, and $n_k$ is the number of data samples with $z = k$. (When $n_k = 0$, we take $S_k = 0$.) We found that local regularization did not improve the model performance, so we take local regularization $r = 0$. All together our stratified risk model has the three Laplacian hyper-parameters (2).

We start with a coarse grid search over all 125 combinations of five values each of $\gamma_{\mathrm{vol}}$, $\gamma_{\mathrm{inf}}$, and $\gamma_{\mathrm{mort}}$, selecting the hyper-parameter combination with the smallest negative log-likelihood (our true loss) on the validation set. Starting from the best combination found in the coarse search, we then search over a finer grid of five values of each hyper-parameter. For the stratified risk model, the final hyper-parameter values chosen are
\[
(\gamma_{\mathrm{vol}}, \gamma_{\mathrm{inf}}, \gamma_{\mathrm{mort}}) = (\ldots).
\]
It is interesting to compare these to the hyper-parameter values chosen for the stratified return model, given in (3). Since the losses for return and risk models are different, we can scale the hyper-parameters in the return and risk models to compare them. We can see that they are not the same, but not too different, either; both choose $\gamma_{\mathrm{inf}}$ larger than $\gamma_{\mathrm{vol}}$, and $\gamma_{\mathrm{mort}}$ quite a bit larger than $\gamma_{\mathrm{vol}}$.

Model                   Train loss   Validation loss
Stratified risk model   -10.64       -4.27
Common risk model         2.44        3.26

Table 4: Average negative log-likelihood (scaled, with constant terms ignored) over the training and validation sets for the stratified and common risk models.
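The risk model loss can be evaluated by hand for a small case. The sketch below uses $2 \times 2$ matrices and the standard library only; the helper names and the matrix `S` are hypothetical. It also checks the well-known fact that, without regularization, the loss is minimized at $\theta = S^{-1}$.

```python
import math


def det2(A):
    """Determinant of a 2 x 2 matrix given as nested lists."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]


def inv2(A):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    d = det2(A)
    return [[A[1][1] / d, -A[0][1] / d], [-A[1][0] / d, A[0][0] / d]]


def gaussian_loss(S, theta):
    """Tr(S theta) - log det(theta), with theta the precision matrix."""
    trace = sum(S[i][j] * theta[j][i] for i in range(2) for j in range(2))
    return trace - math.log(det2(theta))


# Hypothetical empirical covariance for one market condition.
S = [[4.0, 1.0], [1.0, 2.0]]
theta_star = inv2(S)                      # unregularized minimizer
loss_at_minimizer = gaussian_loss(S, theta_star)
loss_at_identity = gaussian_loss(S, [[1.0, 0.0], [0.0, 1.0]])
```

At the minimizer the trace term equals the dimension, so the loss reduces to $2 + \log\det S$ in this example.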
Asset   Common   Median   Min     Max
AGG     1.313    0.864    0.537   4.236
DBC     1.289    0.998    0.725   3.950
GLD     1.665    1.194    0.866   5.613
IBB     0.914    0.796    0.634   1.920
ITA     0.619    0.549    0.474   1.421
PBJ     0.648    0.502    0.414   1.502
TLT     1.816    1.263    0.734   6.050
VNQ     1.328    0.769    0.643   3.730
XLB     0.772    0.623    0.491   2.148
XLE     1.024    0.793    0.628   3.117
XLF     1.190    0.602    0.378   3.479
XLI     0.499    0.432    0.360   1.008
XLK     0.515    0.453    0.380   1.241
XLP     0.760    0.569    0.424   1.682
XLU     0.883    0.724    0.614   2.155
XLV     0.703    0.499    0.417   1.560
XLY     0.536    0.433    0.350   1.386

Table 5: Forecasts of volatility, expressed in percent daily return. The first column gives the common model; the second, third, and fourth columns give median, minimum, and maximum volatility predictions over the 1000 market conditions for the Laplacian regularized stratified model.
Table 4 shows the average negative log-likelihood (scaled, with constant terms ignored) over the training and held-out validation sets, for both the stratified risk model and the common risk model, i.e., the empirical covariance. We can see that the stratified risk model has substantially better loss on the training and validation sets.

Table 5 summarizes some of the statistics of our stratified risk model asset volatilities, i.e., $((\Sigma_z)_{ii})^{1/2}$, expressed as daily percentages, over the 1000 market conditions, along with the common model asset volatilities. We can see that the predictions vary considerably across the market conditions, with a few varying by a factor of almost ten. Table 6 summarizes the same statistics for the correlation of each asset with AGG, an aggregate bond market ETF. Here we see dramatic variation; for example, the correlation between XLI (an industrials ETF) and AGG varies from -85% to +80% over the market conditions.

Asset   Common   Median   Min      Max
AGG      1        1        1       1
DBC      0.490    0.414   -0.285   0.959
GLD      0.683    0.522   -0.131   0.979
IBB      0.238    0.066   -0.669   0.888
ITA      0.021   -0.059   -0.896   0.842
PBJ      0.569    0.356   -0.058   0.918
TLT      0.934    0.888    0.749   0.995
VNQ     -0.345    0.007   -0.908   0.796
XLB     -0.213   -0.216   -0.802   0.826
XLE     -0.203   -0.166   -0.832   0.854
XLF     -0.520   -0.267   -0.946   0.105
XLI     -0.108   -0.117   -0.848   0.801
XLK      0.158    0.091   -0.749   0.864
XLP      0.717    0.561    0.228   0.938
XLU      0.555    0.427    0.010   0.945
XLV      0.600    0.390   -0.275   0.917
XLY     -0.059   -0.035   -0.833   0.534

Table 6: Forecasts of correlations with the aggregate bond index AGG. The first column gives the common model; the second, third, and fourth columns give median, minimum, and maximum correlation predictions over the 1000 market conditions for the Laplacian regularized stratified model.

Trading policy and backtest
In this section we give the details of how we use our stratified return and risk models to construct the trading policy $\mathcal{T}$.

At the beginning of each day $t$, we use the previous day's market conditions $z_t$ to allocate our current portfolio according to the weights $w_t$, computed as the solution of the Markowitz-inspired problem [BBD+]
\[
\begin{array}{ll}
\mbox{maximize} & \mu_{z_t}^T w - \gamma_{\mathrm{sc}} \kappa^T (w)_- - \gamma_{\mathrm{tc}} \tau_t^T |w - w_{t-1}| \\
\mbox{subject to} & w^T \Sigma_{z_t} w \le \sigma^2, \quad \mathbf{1}^T w = 1, \\
& \|w\|_1 \le L_{\max}, \quad w_{\min} \le w \le w_{\max},
\end{array} \tag{4}
\]
with optimization variable $w \in \mathbf{R}^{18}$, where $(w)_- = \max\{0, -w\}$ (elementwise), and the absolute value is elementwise. We describe each term and constraint below.

• Return forecast. The first term in the objective, $\mu_{z_t}^T w$, is the expected return under our forecast mean, which depends on the current market conditions.

• Shorting cost. The second term $\gamma_{\mathrm{sc}} \kappa^T (w)_-$ is a shorting cost, with $\kappa \in \mathbf{R}^{18}$ the vector of shorting cost rates. (For simplicity we take the shorting cost rates as constant.) The positive hyper-parameter $\gamma_{\mathrm{sc}}$ scales the shorting cost term, and is used to control our shorting aversion.

• Transaction cost. The third term $\gamma_{\mathrm{tc}} \tau_t^T |w - w_{t-1}|$ is a transaction cost, with $\tau_t \in \mathbf{R}^{18}$ the vector of transaction cost rates used on day $t$. We take $\tau_t$ as one-half the average bid-ask spread of each asset for the previous 15 trading days (excluding the current day). We summarize the bid-ask spreads of each asset over the training and holdout periods in table 7. The positive hyper-parameter $\gamma_{\mathrm{tc}}$ scales the transaction cost term, and is used to control the turnover.

• Risk limit. The constraint $w^T \Sigma_{z_t} w \le \sigma^2$ limits the (daily) risk (under our risk model, which depends on market conditions) to $\sigma$. The corresponding annualized risk is $\sigma$ scaled by the square root of the number of trading days per year.

• Leverage limit. The constraint $\|w\|_1 \le L_{\max}$ limits the portfolio leverage, or equivalently, it limits the total short position $\mathbf{1}^T (w)_-$ to no more than $(L_{\max} - 1)/2$.

• Position limits. The constraint $w_{\min} \le w \le w_{\max}$ (interpreted elementwise) limits the individual weights.
Parameters.
Some of the constants in the trading policy (4) we simply fix to reasonable values. We fix the shorting cost rate vector to $\kappa = 0.0005 \cdot \mathbf{1}$, i.e., 5 basis points for each asset. We choose the daily risk limit $\sigma$ to correspond to an annualized risk of around 7.1%.

Asset   Training/validation period   Holdout period
AGG     0.000298                     0.000051
DBC     0.000653                     0.000324
GLD     0.000112                     0.000048
IBB     0.000418                     0.000181
ITA     0.000562                     0.000175
PBJ     0.000966                     0.000637
TLT     0.000157                     0.000048
VNQ     0.000394                     0.000066
VTI     0.000204                     0.000048
XLB     0.000310                     0.000098
XLE     0.000181                     0.000077
XLF     0.000359                     0.000200
XLI     0.000295                     0.000079
XLK     0.000324                     0.000093
XLP     0.000298                     0.000095
XLU     0.000276                     0.000099
XLV     0.000271                     0.000070
XLY     0.000334                     0.000059

Table 7: One-half the mean bid-ask spread of each asset, over the training and validation periods and the holdout period.
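The equivalence between the leverage limit and a limit on the total short position follows from $\|w\|_1 = \mathbf{1}^T w + 2 \cdot \mathbf{1}^T (w)_-$ together with $\mathbf{1}^T w = 1$. It can be checked numerically with a short sketch; the function names and the weight vector below are hypothetical.

```python
def leverage(w):
    """Portfolio leverage, i.e. the l1 norm of the weight vector."""
    return sum(abs(x) for x in w)


def short_position(w):
    """Total short position 1^T (w)_-, with (w)_- = max(0, -w) elementwise."""
    return sum(max(0.0, -x) for x in w)


def max_short_from_leverage(l_max):
    """Largest total short position allowed when 1^T w = 1 and ||w||_1 <= l_max."""
    return (l_max - 1.0) / 2.0


# Hypothetical weights that sum to one, with two short positions of 0.25 each.
w = [0.75, 0.5, 0.25, -0.25, -0.25]
lev = leverage(w)
short = short_position(w)
```

For this vector the leverage is 2 and the total short position is one half of the portfolio value.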
We take $L_{\max} = 2$, which means the total short position cannot exceed one half of the portfolio value. (With a leverage of 2 and $\mathbf{1}^T w = 1$, the portfolio is at most 150% long and 50% short, commonly referred to as a 150/50 portfolio.) We fix the position limits $w_{\min} = -0.25 \cdot \mathbf{1}$ and $w_{\max}$, a constant vector, meaning we cannot short any asset by more than 0.25 times the portfolio value, and we cannot hold more than a fixed fraction of the portfolio value in any asset.
Hyper-parameters.
Our trading policy has two hyper-parameters, $\gamma_{\mathrm{sc}}$ and $\gamma_{\mathrm{tc}}$, which control our aversion to shorting and trading, respectively.

Backtests are carried out starting from a portfolio of all VTI and a starting portfolio value of 1. On day $t$, after computing $w_t$ as the solution to (4), we compute the value of our portfolio $v_t$ by
\[
r_{t,\mathrm{net}} = r_t^T w_t - \kappa^T (w_t)_- - (\tau_t^{\mathrm{sim}})^T |w_t - w_{t-1}|, \qquad v_t = v_{t-1} (1 + r_{t,\mathrm{net}}).
\]
Here $r_t \in \mathbf{R}^{18}$ is the vector of asset returns on day $t$, $r_t^T w_t$ is the gross return of the portfolio for day $t$, $\tau_t^{\mathrm{sim}}$ is one-half the realized bid-ask spread on day $t$, and $r_{t,\mathrm{net}}$ is the net return of the portfolio for day $t$ including shorting and transaction costs. In particular, our backtests take shorting and transaction costs into account. Note also that in the backtests, we use the actual realized bid-ask spread on that day (which is not known at the beginning of the day) to determine the true transaction cost, whereas in the policy, we use the trailing 15 day average (which is known at the beginning of the day).

Our backtest is a bit simplified. Our simulation assumes dividend reinvestment. We account for the shorting and transaction costs by adjusting the portfolio return, which is equivalent to splitting these costs across the whole portfolio; a more careful treatment might include a small cash account. For portfolios of very high value, we would add an additional nonlinear transaction cost term, for example proportional to the 3/2 power of $|w_t - w_{t-1}|$ [BBD+].

To choose values of the two hyper-parameters in the trading policy, we carry out multiple backtest simulations over the training set. We evaluate these backtest simulations by their realized return (net, including costs) over the validation set.

We perform a grid search, testing all 625 pairs of 25 logarithmically spaced values of each hyper-parameter. We choose a final value of $\gamma_{\mathrm{sc}}$ between 2.5 and 3 and a final value of $\gamma_{\mathrm{tc}}$ between 2 and 3, shown on figure 2 as a star.

These values are themselves interesting. Roughly speaking, we should plan our trades as if the shorting cost were more than 2.5 times the actual cost, and the transaction cost were more than double the true transaction cost. The blue and purple region at the bottom of the heat map indicates poor validation performance when the transaction cost parameter is too low, i.e., the policy trades too much.

Table 8 gives the annualized return and risk for the policies over the train and validation periods.

Table 8: Annualized return and risk for the stratified model policy over the train and validation periods.

Common model trading policy.
We will compare our stratified model trading policy to a common model trading policy, which uses the constant return and risk models, along with the same Markowitz policy (4). In this case, none of the parameters in the optimization problem change with market conditions, and the only parameter that changes from day to day is $w_{t-1}$, the previous day's asset weights, which enters into the transaction cost.

Figure 2: Heatmap of the annualized return on the validation set as a function of the two hyper-parameters $\gamma_{\mathrm{sc}}$ and $\gamma_{\mathrm{tc}}$. The star shows the hyper-parameter combination used in our trading policy.

Table 9: Annualized active return, active risk, active Sharpe ratios, and maximum drawdown of the active portfolio value for the three policies over the test period (2015–2019).
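The backtest accounting described earlier (net return equals gross return minus shorting and transaction costs, compounded into the portfolio value) can be sketched as follows. The function names and all numeric inputs are hypothetical; the two-asset example is for illustration only.

```python
def net_return(r, w, w_prev, kappa, tau_sim):
    """Net daily return: gross return minus shorting and transaction costs."""
    gross = sum(ri * wi for ri, wi in zip(r, w))
    shorting = sum(k * max(0.0, -wi) for k, wi in zip(kappa, w))
    trading = sum(t * abs(wi - wp) for t, wi, wp in zip(tau_sim, w, w_prev))
    return gross - shorting - trading


def backtest(returns, weights, w_start, kappa, tau_sims, v_start=1.0):
    """Compound daily net returns into a portfolio value path.

    returns, weights, tau_sims: one entry per day, each a per-asset list.
    w_start: holdings before the first trade.
    """
    values = [v_start]
    w_prev = w_start
    for r, w, tau in zip(returns, weights, tau_sims):
        values.append(values[-1] * (1.0 + net_return(r, w, w_prev, kappa, tau)))
        w_prev = w
    return values


# Hypothetical one-day, two-asset example.
values = backtest(
    returns=[[0.01, -0.02]],
    weights=[[1.5, -0.5]],
    w_start=[1.0, 0.0],
    kappa=[0.0005, 0.0005],
    tau_sims=[[0.0001, 0.0002]],
)
```

In this example the gross return is 0.025, the shorting cost is 0.00025, and the transaction cost is 0.00015, so the portfolio value after one day is 1.0246.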
We also perform a grid search for this trading policy, over the same 625 pairs of the hyper-parameters. For the common model trading policy, we choose a final value of $\gamma_{\mathrm{sc}}$ below 1 and a final value of $\gamma_{\mathrm{tc}}$ between 1 and 2.

We re-fit our stratified risk and return models, utilizing all of the data in the training and validation sets, using the hyper-parameters selected for the stratified return and risk models. Figure 3 plots the economic conditions over the test period, together with the active portfolio value (i.e., value above the benchmark VTI) for our stratified model and common model. Buying and holding the benchmark VTI gives zero active return, and a constant active portfolio value of 1. The superior performance of the stratified model policy, e.g., higher return and lower volatility, is evident in this plot.

Table 9 shows the annualized active return, annualized active risk, annualized active Sharpe ratio (return divided by risk), and maximum drawdown of the active portfolio value for the policies over the test period. We remind the reader that we are fully accounting for the shorting and transaction cost, so the turnover of the policy is accounted for in these backtest metrics.

The results are impressive when viewed in the following light. First, we are using a very small universe of only 18 ETFs. Second, our trading policy uses only three widely available market conditions, and indeed, only their deciles. Third, the policy was entirely developed using data prior to 2015, with no adjustments made for the next five years. (In actual use, one would likely re-train the model periodically, perhaps every quarter or year.)
Comparison of stratified and constant policies.
In figure 4, we plot the asset weights of the stratified model policy (top) and of the common model policy (bottom), over the test period. (The variations in the common model policy holdings come from a combination of a daily rebalancing of the assets and the transaction cost model.) The top plot shows that the weights in the stratified policy change considerably with market conditions. The only assets that are shorted to a significant degree are AGG, GLD and TLT, and only during times of market turbulence. The common model policy is mainly concentrated in just seven assets, AGG (bonds), GLD (gold), IBB (biotech), XLE (energy), XLF (financials), XLY (consumer discretionary), and VTI (which is effectively cash when considering active returns and risks), and never shorts any assets, i.e., is long only. Moreover, the common model policy shorts AGG (to the position limit of -0.25) and TLT and XLF (by a much lesser degree).
Figure 3: Plot of economic conditions (top) and cumulative portfolio value for the stratified model and the common model (bottom) over the test period. The horizontal blue line is the cumulative portfolio value for buying and holding the benchmark VTI. (Top panel: economic conditions decile, with series Volatility, Inflation, and Mortgage. Bottom panel: portfolio value versus date, with series Stratified model policy, VTI, and Common model policy.)
Figure 4: Asset weights of the stratified model policy (top) and of the common model policy (bottom), over the test period. The first time period asset weights, which are all VTI, are not shown. (Top panel: stratified model holdings versus date. Bottom panel: common model holdings versus date.)
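The backtest metrics reported for the two policies (annualized active return, annualized active risk, active Sharpe ratio, and maximum drawdown of the active portfolio value) can be computed from a series of daily active returns roughly as follows. This is a minimal sketch, not the authors' code: the function name, the compounding convention for the active portfolio value, and the choice of 250 trading days per year are all assumptions made for illustration.

```python
import numpy as np


def active_backtest_metrics(active_returns, periods_per_year=250):
    """Summary metrics for a series of daily active returns.

    Assumptions (not from the paper): 250 trading days per year, and an
    active portfolio value that starts at 1 and compounds daily.
    """
    r = np.asarray(active_returns, dtype=float)
    ann_return = periods_per_year * r.mean()
    ann_risk = np.sqrt(periods_per_year) * r.std()
    sharpe = ann_return / ann_risk  # return divided by risk
    # Active portfolio value, including the initial value of 1.
    value = np.concatenate([[1.0], np.cumprod(1.0 + r)])
    running_peak = np.maximum.accumulate(value)
    max_drawdown = np.max(1.0 - value / running_peak)
    return {
        "return": ann_return,
        "risk": ann_risk,
        "sharpe": sharpe,
        "max_drawdown": max_drawdown,
    }


metrics = active_backtest_metrics([0.01, -0.02, 0.01])
```

Under these conventions, buying and holding the benchmark corresponds to active returns that are all zero, so the active portfolio value stays at 1; the Sharpe ratio is undefined in that degenerate case because the active risk is zero.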
Factor analysis.
We fit a linear regression model of the active returns of the two policies over the test set to four of the Fama-French factors [FF92, FF93, Fre21]:
• MKTRF, the value-weighted return of United States equities, minus the risk free rate,
• SMB, the return on a portfolio of small size stocks minus a portfolio of big size stocks,
• HML, the return on a portfolio of value stocks minus a portfolio of growth stocks, and
• UMD, the return on a portfolio of high momentum stocks minus a portfolio of low or negative momentum stocks.
We also include an intercept term, commonly referred to as alpha. Table 10 gives the results of these fits. Relative to the common model policy, the stratified model policy active returns are much less positively correlated to the market, shorter the size factor, longer the value factor, and shorter the momentum factor. Its active alpha is around 2.43% annualized. While not very impressive on its own, this alpha seems good considering it was accomplished with just 18 ETFs, and using only three widely available quantities in the policy.

Factor    Stratified model policy    Common model policy
MKTRF     -0.033887                   0.154132
SMB        0.165571                   0.231877
HML       -0.233028                  -0.457454
UMD       -0.127748                  -0.108726
Alpha      0.000097                  -0.000228

Table 10: The top four rows give the regression model coefficients of the active portfolio returns on the Fama-French factors; the fifth row gives the intercept or alpha value.
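A regression of this kind can be sketched with ordinary least squares. The snippet below is illustrative only: it uses synthetic factor and return data, the function name is hypothetical, and the annualization factor of 250 trading days is an assumption (it is consistent with a daily alpha of 0.000097 corresponding to roughly 2.43% per year).

```python
import numpy as np


def factor_regression(active_returns, factors):
    """Least-squares fit of active returns on factor returns plus an intercept.

    Returns (betas, alpha), where alpha is the per-period intercept.
    """
    y = np.asarray(active_returns, dtype=float)
    F = np.asarray(factors, dtype=float)
    # Append a column of ones so the last coefficient is the intercept.
    X = np.column_stack([F, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[:-1], coef[-1]


# Synthetic stand-ins for the MKTRF, SMB, HML, and UMD daily series.
rng = np.random.default_rng(0)
T = 1000
factors = rng.normal(scale=0.01, size=(T, 4))
true_beta = np.array([-0.03, 0.17, -0.23, -0.13])
true_alpha = 1e-4
active = factors @ true_beta + true_alpha + rng.normal(scale=1e-3, size=T)

betas, alpha = factor_regression(active, factors)
annualized_alpha = 250 * alpha  # assumes 250 trading days per year
```

With real data, the factor columns would be replaced by the daily factor returns over the test period, aligned by date with the policy's active returns.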
We have presented a simple (but realistic) example only to illustrate the ideas, which can easily be applied in more complex settings, with a far larger universe, a more complex trading policy, and using proprietary forecasts of returns and quantities used to judge market conditions. We describe some extensions and variations on our method below.
Multi-period optimization.
For simplicity we use a policy that is based on solving a single-period Markowitz problem. The entire method immediately extends to policies based on multi-period optimization. For example, we would fit separate stratified models of return and risk for the next 1 day, 5 day, 20 day, and 60 day periods (roughly daily, weekly, monthly, quarterly), all based on the same current market conditions. These data are fed into a multi-period optimizer as described in [BBD +
Joint modeling of return and risk.
In this paper we created separate Laplacian regularized stratified models for return and risk. The advantage of this approach is that we can judge each model separately (and with different true objectives), and use different hyper-parameter values. It is also possible to fit the return mean and covariance jointly, in one stratified model, using the natural parameters in the exponential family for a Gaussian, $\Sigma^{-1}$ and $\Sigma^{-1}\mu$. The resulting log-likelihood is jointly concave, and a Laplacian regularized model can be directly fit.
Low-dimensional economic factors.
When just a handful (such as in our example, three) base quantities are used to construct the stratified market conditions, we can bin and grid the values as we do in this paper. This simple stratification of market conditions preserves interpretability. If we wish to include more raw data in our stratification of market conditions, simple binning and enumeration is not practical. Instead we can use several techniques to handle such situations. The simplest is to perform dimensionality-reduction on the (higher-dimensional) economic conditions, such as principal component analysis [Pea01] or low-rank forecasting [BDB20], and appropriately bin these low-dimensional economic conditions. These economic conditions may then be related on a graph with edge weights decided by an appropriate method, such as nearest neighbor weights.
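The reduce-then-bin idea can be sketched as follows. This is a hedged illustration on synthetic data, not the authors' pipeline: the function names are hypothetical, principal component analysis is done with a plain SVD, and the graph simply joins adjacent bins on the grid with unit weight, which is one simple choice among the edge-weighting schemes mentioned above.

```python
import numpy as np
from itertools import product


def pca_scores(X, k):
    """Project centered data onto its top-k principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


def quantile_bins(scores, n_bins=10):
    """Assign each column of scores to a quantile bin in 0..n_bins-1."""
    bins = np.empty(scores.shape, dtype=int)
    interior = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    for j in range(scores.shape[1]):
        edges = np.quantile(scores[:, j], interior)
        bins[:, j] = np.searchsorted(edges, scores[:, j], side="right")
    return bins


def grid_edges(n_bins, k):
    """Edges joining bins that differ by one step along a single axis."""
    edges = []
    for node in product(range(n_bins), repeat=k):
        for axis in range(k):
            if node[axis] + 1 < n_bins:
                neighbor = node[:axis] + (node[axis] + 1,) + node[axis + 1:]
                edges.append((node, neighbor))
    return edges


rng = np.random.default_rng(0)
raw_conditions = rng.normal(size=(500, 12))  # synthetic stand-in for raw data
bins = quantile_bins(pca_scores(raw_conditions, k=2))
edges = grid_edges(n_bins=10, k=2)
```

Each row of `bins` identifies one discretized market condition, and `edges` lists the pairs of conditions that a Laplacian regularizer would encourage to have similar models.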
Structured covariance estimation.
It is quite common to model the covariance matrix of returns as having structure, e.g., as the sum of a diagonal matrix plus a low rank matrix [RSV12, FLL16]. This structure can be added by a combination of introducing new variables to the model and encoding constraints in the local regularization. In many cases, this structure constraint turns the stratified risk model fitting problem into a non-convex problem, which may be solved approximately.
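To make the diagonal-plus-low-rank structure concrete, the sketch below builds such a covariance matrix and compares its parameter count with that of an unstructured symmetric matrix. The names `F` and `d`, the rank k = 3, and the random values are assumptions chosen only for illustration; the snippet shows the parameterization, not a fitting method.

```python
import numpy as np


def low_rank_plus_diagonal(F, d):
    """Return Sigma = F F^T + diag(d) for an n x k matrix F and positive d."""
    F = np.asarray(F, dtype=float)
    d = np.asarray(d, dtype=float)
    return F @ F.T + np.diag(d)


n, k = 18, 3  # 18 assets as in the example; the rank k = 3 is arbitrary
rng = np.random.default_rng(0)
F = rng.normal(size=(n, k))
d = rng.uniform(0.1, 1.0, size=n)
Sigma = low_rank_plus_diagonal(F, d)

n_params_structured = n * k + n   # entries of F plus the diagonal
n_params_full = n * (n + 1) // 2  # free entries of a symmetric matrix
```

With these sizes the structured model has 72 parameters, against 171 for a full symmetric matrix. The covariance depends on `F` through the product `F @ F.T`, which gives some intuition for why fitting with this structure is generally no longer a convex problem.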
Multi-linear interpolation.
In the approach presented above, the economic conditions are categorical, i.e., take on one of K = 1000 possible values at each time t, based on the deciles of three quantities. A simple extension is to use multi-linear interpolation [WZ88, Dav97] to determine the return and risk to use in the Markowitz optimizer. Thus we would use the actual quantile of the three market quantities, and not just their deciles. In the case of risk, we would apply the interpolation to the precision matrix $\Sigma_t^{-1}$, the natural parameter in the exponential family description of a Gaussian.
End-to-end hyper-parameter optimization.
In the example presented in this paper there are a total of nine hyper-parameters to select. We keep things simple by separately optimizing the hyper-parameters for the stratified return model, the stratified risk model, and the trading policy. This approach allows each step to be checked independently. It is also possible to simultaneously optimize all of the hyper-parameters with respect to a single backtest, using, for example, CVXPYlayers [AAB+19, ABBS20] to differentiate through the trading policy.
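As an illustration of the multi-linear interpolation extension described earlier in this section, the following sketch interpolates values stored on a regular grid, such as per-condition precision matrices. It is a minimal example under stated assumptions: the function name is hypothetical, grid coordinates are taken to be continuous positions between 0 and the grid size minus one along each axis (how quantiles map to these coordinates is left open), and the grid here is filled with synthetic positive definite matrices.

```python
import numpy as np
from itertools import product


def multilinear_interpolate(grid_values, point):
    """Multi-linear interpolation of values stored on a regular grid.

    grid_values has shape (K_1, ..., K_d, ...) with every K_i >= 2;
    point holds d continuous coordinates, with 0 <= point[i] <= K_i - 1.
    """
    point = np.asarray(point, dtype=float)
    d = len(point)
    shape = np.array(grid_values.shape[:d])
    # Lower corner of the grid cell containing the point.
    lower = np.clip(np.floor(point).astype(int), 0, shape - 2)
    frac = point - lower
    result = 0.0
    for corner in product((0, 1), repeat=d):
        corner = np.array(corner)
        # Weight is a product of one-dimensional linear weights.
        weight = np.prod(np.where(corner == 1, frac, 1.0 - frac))
        result = result + weight * grid_values[tuple(lower + corner)]
    return result


rng = np.random.default_rng(0)
n = 4
A = rng.normal(size=(3, 3, 3, n, n))
# Synthetic precision matrices, positive definite at every grid node.
precisions = A @ np.swapaxes(A, -1, -2) + np.eye(n)
theta = multilinear_interpolate(precisions, [0.3, 1.5, 1.9])
```

Because the interpolation weights are nonnegative and sum to one, the interpolated precision matrix is a convex combination of positive definite matrices and is therefore itself positive definite.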
Stratified ensembling.
The methods described in this paper can be used to combine or ensemble a collection of different return forecasts or signals, whose performance varies with market (or other) conditions. We start with a collection of return predictions, and combine these (ensemble them) using weights that are a function of the market conditions. We develop a stratified selection of the combining weights.
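The combining step itself is simple once the weights are available. The sketch below assumes a hypothetical weight table with one row per market condition and one column per forecast; how those weights would be fit (for example, with Laplacian regularization across conditions) is not shown, and the numbers are made up.

```python
import numpy as np


def stratified_ensemble(forecasts, weights, z):
    """Combine m return forecasts using weights chosen by market condition z.

    forecasts: (m, n) array, one row per forecast of the n asset returns.
    weights:   (K, m) array, one row of combining weights per condition.
    """
    forecasts = np.asarray(forecasts, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return weights[z] @ forecasts


# Two market conditions, two forecasts, two assets (illustrative values).
weights = np.array([[1.0, 0.0],
                    [0.5, 0.5]])
forecasts = np.array([[0.01, 0.02],
                      [0.03, 0.00]])
combined = stratified_ensemble(forecasts, weights, z=1)
```

In condition 0 this table relies entirely on the first forecast, while in condition 1 it averages the two.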
Conclusions
We argue that stratified models are interesting and useful in portfolio construction and finance. They can contain a large number of parameters, but unlike, say, neural networks, they are fully interpretable and auditable. They allow arbitrary variation across market conditions, with Laplacian regularization there to help us come up with reasonable models even for market conditions for which we have no training data. The maximum principle mentioned on page 4 tells us that a Laplacian regularized stratified model will never do anything crazy when it encounters values of z that never appeared in the training data. Instead it will use a weighted sum of other values for which we do have training data. These weights are not just any weights, but ones carefully chosen by validation.
The small but realistic example we have presented is only meant to illustrate the ideas. The very same ideas and method can be applied in far more complex and sophisticated settings, with a larger universe of assets, a more complex trading policy, and incorporating proprietary data and forecasts.
Acknowledgements
The authors gratefully acknowledge discussions with and suggestions from Ronald Kahn, Raffaele Savi, and Andrew Ang. Jonathan Tuck is supported by the Stanford Graduate Fellowship in Science and Engineering.
References
[AAB+19] A. Agrawal, B. Amos, S. Barratt, S. Boyd, S. Diamond, and Z. Kolter. Differentiable convex optimization layers. In Advances in Neural Information Processing Systems, 2019.
[AB02] A. Ang and G. Bekaert. International asset allocation with regime shifts. The Review of Financial Studies, 15(4):1137–1187, July 2002.
[AB03] A. Ang and G. Bekaert. How do regimes affect asset allocation? Technical Report 10080, National Bureau of Economic Research, November 2003.
[AB04] A. Ang and G. Bekaert. How regimes affect asset allocation. Financial Analysts Journal, 60(2):86–99, 2004.
[ABBS20] A. Agrawal, S. Barratt, S. Boyd, and B. Stellato. Learning convex optimization control policies. In Proceedings of the 2nd Conference on Learning for Dynamics and Control, volume 120 of Proceedings of Machine Learning Research, pages 361–373, 10–11 Jun 2020.
[AIL99] R. Aggarwal, C. Inclan, and R. Leal. Volatility in emerging stock markets. The Journal of Financial and Quantitative Analysis, 34(1):33–55, 1999.
[AT11] A. Ang and A. Timmermann. Regime changes and financial markets. Technical Report 17182, National Bureau of Economic Research, June 2011.
[BBD+17] S. Boyd, E. Busseti, S. Diamond, R. Kahn, K. Koh, P. Nystrup, and J. Speth. Multi-period trading via convex optimization. Foundations and Trends in Optimization, 3(1):1–76, 2017.
[BC03] J. Boyd and B. Champ. Inflation and financial market performance: what have we learned in the last ten years? Technical Report 0317, Federal Reserve Bank of Cleveland, 2003.
[BDB20] S. Barratt, Y. Dong, and S. Boyd. Low rank forecasting. Manuscript, 2020.
[BLS96] J. Boyd, R. Levine, and B. Smith. Inflation and financial market performance. Technical report, Federal Reserve Bank of Minneapolis, October 1996.
[BLS01] J. Boyd, R. Levine, and B. Smith. The impact of inflation on financial sector performance. Journal of Monetary Economics, 47(2):221–248, 2001.
[Bor12] L. Borland. Statistical signatures in times of panic: markets as a self-organizing system. Quantitative Finance, 12(9):1367–1379, October 2012.
[BV04] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[Cav16] G. La Cava. Housing prices, mortgage interest rates and the rising share of capital income in the United States. BIS Working Papers 572, Bank for International Settlements, July 2016.
[CCR20] D. Chun, H. Cho, and D. Ryu. Economic indicators and stock market volatility in an emerging economy. Economic Systems, 44(2):100788, 2020.
[Dav97] S. Davies. Multidimensional triangulation and interpolation for reinforcement learning. In M. C. Mozer, M. Jordan, and T. Petsche, editors, Advances in Neural Information Processing Systems 9, pages 1005–1011. MIT Press, 1997.
[DWW14] P. Danaher, P. Wang, and D. Witten. The joint graphical lasso for inverse covariance estimation across multiple classes. Journal of the Royal Statistical Society, 76(2):373–397, 2014.
[EHV94] C. Erb, C. Harvey, and T. Viskanta. Forecasting international equity correlations. Financial Analysts Journal, 50(6):32–45, 1994.
[Exc20] Chicago Board Options Exchange. CBOE volatility index, 2020.
[FF92] E. Fama and K. French. The cross-section of expected stock returns. Journal of Finance, 47(2):427–465, 1992.
[FF93] E. Fama and K. French. Common risk factors in the returns on stocks and bonds. Journal of Financial Economics, 33(1):3–56, 1993.
[FLL16] J. Fan, Y. Liao, and H. Liu. An overview of the estimation of large covariance and precision matrices. The Econometrics Journal, 19(1):C1–C32, 2016.
[Fre21] K. French. Description of Fama/French factors. https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html, 2021.
[FRED20] Federal Reserve Bank of St. Louis Federal Reserve Economic Data. 30-year fixed rate mortgage average in the United States (MORTGAGE30US). https://fred.stlouisfed.org/series/MORTGAGE30US, 2020.
[FSS87] K. French, W. Schwert, and R. Stambaugh. Expected stock returns and volatility. Journal of Financial Economics, 19(1):3, 1987.
[GK99] R. Grinold and R. Kahn. Active Portfolio Management: A Quantitative Approach for Producing Superior Returns and Controlling Risk. McGraw Hill New York, NY, 1999.
[HNB19] D. Hallac, P. Nystrup, and S. Boyd. Greedy Gaussian segmentation of multivariate time series. Advances in Data Analysis and Classification, 13(3):727–751, 2019.
[HTF09] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer series in statistics. Springer, 2009.
[Hun03] F.-S. Hung. Inflation, financial development, and economic growth. International Review of Economics & Finance, 12(1):45–67, 2003.
[LS01] F. Longin and B. Solnik. Correlation structure of international equity markets during extremely volatile periods. Journal of Finance, 56(2):649–676, April 2001.
[Mah17] H. Mahyar. The Effect Of Inflation On Financial Development Indicators In Iran (2000-2015). Studies in Business and Economics, 12(2):53–62, August 2017.
[Mar52] H. Markowitz. Portfolio selection. The Journal of Finance, 7(1):77–91, 1952.
[NBLM19] P. Nystrup, S. Boyd, E. Lindström, and H. Madsen. Multi-period portfolio selection with drawdown control. Annals of Operations Research, 282(1):245–271, November 2019.
[NHML15] P. Nystrup, B. Hansen, H. Madsen, and E. Lindström. Regime-based versus static asset allocation: Letting the data speak. The Journal of Portfolio Management, 42(1):103–109, 2015.
[NML18] P. Nystrup, H. Madsen, and E. Lindström. Dynamic portfolio optimization across hidden market regimes. Quantitative Finance, 18(1):83–95, 2018.
[oLS20] United States Bureau of Labor Statistics. Consumer price index, 2020.
[Pea01] K. Pearson. On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 2(11):559–572, 1901.
[Pet15] G. Petre. A case for dynamic asset allocation for long term investors. Procedia Economics and Finance, 29:41–55, 2015.
[RSV12] E. Richard, P.-A. Savalle, and N. Vayatis. Estimation of simultaneously sparse and low rank matrices. In Proceedings of the 29th International Conference on Machine Learning, ICML'12, pages 51–58, Madison, WI, USA, 2012. Omnipress.
[RTA98] T. Ryden, T. Terasvirta, and S. Asbrink. Stylized facts of daily return series and the hidden Markov model. Journal of Applied Econometrics, 13(3):217–244, 1998.
[Sch89] W. Schwert. Why does stock market volatility change over time? The Journal of Finance, 44(5):1115–1153, 1989.
[SMS17] G. Sutton, D. Mihaljek, and A. Subelytė. Interest rates and house prices in the United States and around the world. BIS Working Papers 665, Bank for International Settlements, October 2017.
[SS16] T. Saegusa and A. Shojaie. Joint estimation of precision matrices in heterogeneous populations. Electronic Journal of Statistics, 10(1):1341–1392, 2016.
[TB20] J. Tuck and S. Boyd. Fitting Laplacian regularized stratified Gaussian models. arXiv, 2020. Manuscript.
[TB21] J. Tuck and S. Boyd. Eigen-stratified models. Optimization and Engineering, 2021.
[TBB21] J. Tuck, S. Barratt, and S. Boyd. A distributed method for fitting Laplacian regularized stratified models. Journal of Machine Learning Research, 2021. To appear.
[THB19] J. Tuck, D. Hallac, and S. Boyd. Distributed majorization-minimization for Laplacian regularized problems. IEEE/CAA Journal of Automatica Sinica, 2019.
[WS94] M. Wynne and F. Sigalla. The consumer price index. Economic and Financial Policy Review, 2:1–22, Feb 1994.
[WZ88] A. Weiser and S. Zarantonello. A note on piecewise linear and multilinear table interpolation in many dimensions.