Optimal Asset Allocation For Outperforming A Stochastic Benchmark Target
Chendi Ni, Yuying Li, and Peter Forsyth
Cheriton School of Computer Science, University of Waterloo, Waterloo, N2L 3G1, Canada
[email protected]

Ray Carroll
Neuberger Berman Breton Hill, Toronto, M4W 1A8, Canada
June 30, 2020
Abstract
We propose a data-driven Neural Network (NN) optimization framework to determine the optimal multi-period dynamic asset allocation strategy for outperforming a general stochastic target. We formulate the problem as an optimal stochastic control with an asymmetric, distribution shaping, objective function. The proposed framework is illustrated with the asset allocation problem in the accumulation phase of a defined contribution pension plan, with the goal of achieving a higher terminal wealth than a stochastic benchmark. We demonstrate that the data-driven approach is capable of learning an adaptive asset allocation strategy directly from historical market returns, without assuming any parametric model of the financial market dynamics. Following the optimal adaptive strategy, investors can make allocation decisions simply depending on the current state of the portfolio. The optimal adaptive strategy outperforms the benchmark constant proportion strategy, achieving a higher terminal wealth with a 90% probability, a 46% higher median terminal wealth, and a significantly more right-skewed terminal wealth distribution. We further demonstrate the robustness of the optimal adaptive strategy by testing the performance of the strategy on bootstrap resampled market data, which has different distributions compared to the training data.
We propose a data-driven framework to compute optimal multi-period dynamic strategies for outperforming a general stochastic benchmark target, which is an important portfolio management problem with immediate practical applications. There is a large extant literature on techniques for constructing portfolios which outperform a stochastic benchmark, e.g., (Browne, 1999, 2000; Tepla, 2001; Basak et al., 2006; Davis and Lleo, 2008; Lim and Wong, 2010; Oderda, 2015; Alekseev and Sokolov, 2016; Samo and Vervuurt, 2016; Al-Aradi and Jaimungal, 2018).

Typically, outperforming a multi-period investment benchmark is formulated as an optimal stochastic control problem under an assumed model for trading asset price dynamics, e.g., (Oderda, 2015; Al-Aradi and Jaimungal, 2018). In Oderda (2015), under the assumption that stocks follow a geometric Brownian motion and there are no investing constraints (i.e. infinite leverage, trading continues if insolvent, and shorting is allowed), the authors find that a portfolio which outperforms (under certain criteria) the benchmark market capitalization index can be constructed by a combination of (i) the benchmark portfolio and (ii) rule-based portfolios such as equal weight and minimum variance portfolios. The determination of the optimal weights for these portfolios is independent of estimates of the expected returns of individual stocks. Hence this outperformance portfolio is robust to uncertainty in the expected return parameters.
A natural conjecture is that determining asset allocation strategies that outperform a benchmark may be robust in general.

In Al-Aradi and Jaimungal (2018), optimal stochastic control techniques are also used in this context. Based on several assumptions, Al-Aradi and Jaimungal (2018) formulate the control problem as a Hamilton-Jacobi-Bellman (HJB) Partial Differential Equation (PDE), and are able to obtain a closed-form solution. However, of necessity, this approach requires (i) the assumption of a parametric model for the Stochastic Differential Equations (SDEs) governing the asset price processes and (ii) no constraints on the portfolio (i.e. infinite leverage is allowed). It is possible, in some cases, to solve the HJB PDE numerically, and thus include more realistic constraints.

There are two main challenges in the aforementioned methods for the stochastic optimal control outperforming benchmark problem. Firstly, unless the benchmark is specifically restricted, it can add additional stochastic state variables to the optimal control problem (Al-Aradi and Jaimungal, 2018). This makes solving the PDE-formulated control problem numerically challenging, due to the curse of dimensionality. Hence, this technique is limited to a small number of stochastic factors (i.e. fewer than four). Secondly, a parametric model of the asset returns needs to be postulated, which adds challenges as the parameters can be difficult to estimate accurately.

To overcome the aforementioned challenges, in this work we use market asset return data directly to solve a scenario-based stochastic optimal control formulation, corresponding to the original stochastic control problem. This avoids the need to make model assumptions and parameter estimations. In addition, we solve the stochastic optimal control problem directly, without invoking dynamic programming to transform it into a PDE problem (thus avoiding the curse of dimensionality).
The optimal control is represented as a neural network (NN) which is learned through training. The features for the NN can include any state variable that influences the optimal strategy, including the state variables associated with a stochastic target. We design a specific objective function to create a desirable terminal wealth distribution. This is done by measuring the relative performance of the strategy against an elevated final wealth of the stochastic target strategy to penalize extreme losses and limit unlikely extreme gains.

We formulate a general optimal control problem for the multi-period asset allocation portfolio which outperforms a benchmark as an optimal stochastic control problem. We propose a benchmark target-based objective function which measures the difference between the terminal wealth of the adaptive strategy and a path-dependent elevated target (which is the terminal wealth of the constant proportion strategy multiplied by a pre-defined growth factor). The objective function is designed as a double-sided penalty function to force the terminal wealth of the adaptive strategy to be close to the elevated target. The NN model takes three features as inputs: the current wealth of the adaptive portfolio, the current wealth of the constant proportion portfolio, and the time remaining. In the case that the underlying assets follow simple stochastic processes, it can be shown that the control is only a function of these variables.

Instead of formulating the problem as an HJB equation derived from dynamic programming, we solve the single original optimal control problem directly, as in Li and Forsyth (2019). We define an objective function in terms of the terminal wealth, and then solve for the control directly, using a data-driven approach. The proposed data-driven approach does not require an estimation of the parameters of an assumed parametric model for traded assets. We represent the control using a shallow neural network (NN).
We remark that shallow learning is found to outperform deep learning for asset pricing in Gu et al. (2018). We also note that good results are obtained in Hejazi and Jackson (2016) with an NN containing only one hidden layer (shallow learning), in which the shallow neural network learns a good choice of distance function for efficiently and accurately interpolating the Greeks for the input portfolio of Variable Annuity contracts.

To illustrate the proposed framework, we consider a practically relevant and important problem: optimal multi-period asset allocation during the accumulation phase of a DC pension plan. A defined contribution (DC) plan is a retirement plan in which the employer, employee, or both make contributions regularly, with no guarantee on the accumulated amount in the plan at the retirement date. In contrast, another type of retirement plan is the defined benefit (DB) plan, which promises to pay a set income when the employee retires. There has been a paradigm shift from DB plans to DC plans in the United States, Canada, the United Kingdom, and Australia, as both the public and private sectors are no longer willing to take on the risks of DB plans.

In a DC plan, the employee (investor) is often presented with a list of eligible stock and bond funds, and then needs to specify how the DC account is to be allocated to each fund. Typically the employee has the opportunity to change the asset allocation at least yearly. Normally, the DC plan is tax-advantaged, so that there are no tax consequences triggered on rebalancing. A typical DC plan accumulation phase would occur over 30 years, assuming a 30-year employment period. The choice of the asset allocation strategy is crucial to the terminal wealth in the DC fund.

A popular asset allocation strategy for retirement plans is the constant proportion strategy, in which the employee invests fixed proportions of the wealth into several assets. This idea can be traced back to Graham (2003).
Among the constant proportion strategies, a very popular one is the 50/50 strategy, in which 50% of the wealth is allocated to stocks and 50% of the wealth is allocated to bonds. It is conventional wisdom that a 50/50 portfolio is an appropriate tradeoff between risk and reward for those saving for retirement. Although there has been a popular shift to a 60/40 portfolio (60% in stocks) in recent years, for illustration, we will focus on the 50/50 portfolio in this article. This would be a typical average allocation to equities over the full accumulation phase of a lifecycle fund. Note that, in Forsyth and Vetzal (2019), it is shown that the final wealth distributions of a constant weight allocation, and any glide path strategy having the same average allocation as the constant weight strategy, are essentially the same. Hence there is little to be gained by using a (deterministic) glide path compared to a constant weight strategy. Using the proposed framework to determine the optimal multi-period dynamic asset allocation strategy for outperforming a stochastic target, we address a natural and interesting question of whether it is possible to develop a dynamic allocation strategy that outperforms the constant proportion strategy.

It is common practice in the financial industry to train and test strategy performance by splitting the historical market data path into two segments, one for training and the other for testing. We take a different approach. We aim to determine an investment strategy that would perform well statistically on a large set of data paths created through bootstrap resampling, rather than on a single historical data path. To achieve this, we generate additional data paths from the historical market data path by block bootstrap resampling of the historical data (see, e.g., Politis and Romano (1994); Politis and White (2004); Patton et al.
(2009)). Once we have a large set of price paths from bootstrap resampling, we split them into a training data set and a testing data set.

To demonstrate the robustness of our approach, we test the optimal adaptive strategy on market data with different distributions from the training data. We first test the optimal adaptive strategy, learned from bootstrap resampled data with a given expected blocksize, on bootstrap resampled data with different expected blocksizes (thus different distributions, as noted by Politis and Romano (1994)). We then test the adaptive strategy learned from synthetic data generated from a parametric jump-diffusion stochastic process (estimated from the same single historical path) on bootstrap resampled data. Finally, we test the strategy learned on bootstrap resampling data from a segment of the historical market data path on bootstrap resampling data generated from another non-overlapping segment of the historical data path.

To the best of our knowledge, the closest work related to the research in this paper is Samo and Vervuurt (2016), in which the authors also use a data-driven machine learning approach for constructing a dynamic strategy which outperforms a benchmark. Samo and Vervuurt (2016) approximate the control by a Gaussian process and solve for the optimal hyperparameters using Bayesian inference. However, they do not assess the distributional properties of the investment strategy, but rather evaluate the performance on a single historical path. In addition, they only validate the performance of the strategy for a relatively short period, from 1992 to 2014.
In contrast to our focus in this work, they consider the case of daily rebalancing with a large number of stocks, which would not be typical of a defined contribution pension plan.

In this paper, we consider investment portfolios which are combinations of a stock index and a bond index. (A lifecycle fund is based on the intuitive concept of allocating a high equity weight during the early employment years, and then moving to bonds as retirement nears. However, as shown in Graf (2017), this strategy does not outperform a constant weight strategy.) Our main contributions are as follows.

• We propose a data-driven solution to a general optimal dynamic asset allocation for outperforming a stochastic benchmark, which is formulated as a stochastic control problem. The data-driven learning bypasses the need for a robust estimation of parameters of an assumed parametric model. In addition, closed-form solutions are only available assuming simple parametric price processes and no portfolio constraints (i.e. infinite leverage is allowed) (see, for example, Oderda (2015); Al-Aradi and Jaimungal (2018)). Existing solution techniques, which require dynamic programming, are computationally infeasible due to the high dimensionality. (Since we consider benchmark and optimal portfolios as having two assets each, this would result in a four-dimensional PDE problem, assuming discrete rebalancing.) In this work, we formulate the controls as the outputs of a neural network function and avoid the curse of dimensionality of a PDE approach. We use a gradient-based optimization method to solve for the controls. This approach naturally extends the method in (Li and Forsyth, 2019) to the problem of outperforming a stochastic benchmark. Our philosophy here is similar to that in Samo and Vervuurt (2016), although our method of implementation and scope are significantly different.

• Unlike the commonly used one-sided quadratic shortfall objective function, we propose an asymmetric distribution shaping objective function for the optimal asset allocation problem. The proposed objective function aims to produce an optimal dynamic and adaptive strategy which can yield significantly higher median terminal wealth than the stochastic benchmark, with only a small probability (and magnitude) of underperformance.

• Recognizing financial data scarcity, we use block bootstrap resampling to generate both training data and testing data. We observe that block bootstrap resampled data sets generated using different expected blocksizes lead to performance testing against different distributions. We mathematically establish upper bounds on the probability of a training path being equivalent to a testing path to justify the soundness of the proposed stationary block bootstrap method even when the same expected blocksizes are used to generate the training and testing data sets.

• We apply the proposed data-driven framework to the allocation of the DC pension plan. In this context, the constant proportion strategy is a popular asset allocation strategy because of its simplicity in execution and its capability of diversifying market risks effectively. However, constant proportion strategies are not able to adapt to different market scenarios because of the predefined fixed allocations. It is a popular active research problem within the financial industry to devise schemes that consistently outperform the constant proportion strategy.

• Our work has significant empirical importance and implications. The optimal adaptive asset allocation strategy learned from the data-driven framework has a more favorable terminal wealth distribution than the constant proportion strategy, with a higher expected terminal wealth and significantly less downside risk.
In addition, the optimal adaptive strategy has consistently higher expected wealth compared to the constant proportion strategy over the entire investment period. Finally, the optimal adaptive strategy is robust in the sense that it performs well on bootstrapped market data with different distributions.
Let the initial time be t = 0 and consider a set T of rebalancing times

    T ≡ {t_0 = 0 < t_1 < . . . < t_{N−1} < t_N = T}.  (2.1)

The fraction of total wealth allocated to each asset is adjusted at times t_n, n = 0, . . . , N−1, with the investment horizon t_N = T. Consider an investment problem in M risky and riskless assets. Assume that, at time t, a fund holds wealth of amount W_m(t) in asset m, m = 1, . . . , M. The total value of the portfolio at t is then

    W(t) = Σ_{m=1}^{M} W_m(t).  (2.2)

For any given time t and arbitrary function f(t), define f(t^+) = lim_{ε→0^+} f(t + ε) and f(t^−) = lim_{ε→0^+} f(t − ε). Assume that W(0^−) = 0, i.e., the initial value of the portfolio before any cash injection is zero, and let q(t_n) represent an a priori specified cash injection schedule.

We denote the allocation at stage n by an allocation vector p_n, n = 0, . . . , N−1. Given the allocation control vectors p_0, p_1, . . . , p_{N−1}, the statistical properties of the terminal wealth of the adaptive portfolio W(T) can be determined. Similarly, given a benchmark allocation vector p̃_n, the final wealth of the benchmark portfolio W_b(T) can also be determined. The time evolution of W(t) and W_b(t) is given by

    for n = 0, 1, . . . , N−1
        W(t_n^+) = W(t_n^−) + q(t_n)
        W_b(t_n^+) = W_b(t_n^−) + q(t_n)
        W(t_{n+1}^−) = p_n^T R(t_n) W(t_n^+)
        W_b(t_{n+1}^−) = p̃_n^T R(t_n) W_b(t_n^+)
    end

where R(t_n) is the vector of gross asset returns over (t_n^+, t_{n+1}^−).

Our first goal is to minimize some measure of underperformance against the benchmark. A natural choice is to quadratically penalize the underperformance of the terminal wealth of the adaptive strategy compared to a benchmark of the terminal wealth of the constant proportion strategy, as in Li and Forsyth (2019). Note, however, that in our case the benchmark is stochastic. This leads to the following optimization problem (E[·] is the expectation operator):

    min_{p_0, p_1, . . . , p_{N−1}} E[ (min(W(T) − W_b(T), 0))^2 ].  (2.3)

Unfortunately, the optimal solution to (2.3) is trivially the benchmark strategy p_n = p̃_n, ∀n, which indicates that formulation (2.3) does not sufficiently capture the properties of the desired solution. We propose to generate a more ambitious strategy by using an elevated target e^{sT} · W_b(T) in the objective function, i.e.,

    min_{p_0, p_1, . . . , p_{N−1}} E[ (min(W(T) − e^{sT} · W_b(T), 0))^2 ],  (2.4)

where s is the yearly pre-determined target outperformance spread.
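As a concrete illustration (our sketch, not code from the paper; all function and variable names are ours), the wealth recursion above can be simulated forward along a single return path as follows:

```python
import numpy as np

def simulate_wealth(p, p_bench, returns, q):
    """Forward-simulate adaptive and benchmark wealth: inject cash q(t_n),
    then grow wealth by the portfolio-weighted gross return over
    (t_n^+, t_{n+1}^-).

    p, p_bench : (N, M) arrays of allocation fractions p_n and the benchmark's
    returns    : (N, M) array of gross asset returns R(t_n)
    q          : length-N array of cash injections q(t_n)
    Returns (W(T), W_b(T)) for this single return path.
    """
    W = Wb = 0.0                              # W(0^-) = W_b(0^-) = 0
    for n in range(len(q)):
        W, Wb = W + q[n], Wb + q[n]           # cash injection at t_n
        W = np.dot(p[n], returns[n]) * W      # W(t_{n+1}^-) = p_n^T R(t_n) W(t_n^+)
        Wb = np.dot(p_bench[n], returns[n]) * Wb
    return W, Wb
```

For instance, with unit cash injections and gross returns identically equal to one, both portfolios simply accumulate the contributions, which is a useful sanity check of the recursion.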
Consequently, in an ideal case, the adaptive strategy will have a terminal wealth of e^{sT} · W_b(T), which indicates that the adaptive strategy achieves an annual outperformance spread of return s compared to the benchmark strategy.

We note, however, that if the outperformance spread s is large, (2.4) will tend to generate a strategy that concentrates on the asset with the highest rate of return, which can potentially lead to an unacceptable probability of underperformance. We can see that outperforming a target is a complex distribution shaping problem with multiple criteria, which is difficult to formulate and to compute. Recognizing that low probability underperformance scenarios often come with low probability high outperformance scenarios, we choose an asymmetric objective function which controls the loss-side tail by penalizing the underperformance quadratically, while at the same time penalizing the outperformance linearly.

Our asymmetric distribution shaping benchmark outperforming formulation becomes

    min_{p_0, p_1, . . . , p_{N−1}} E[ (min(W(T) − e^{sT} · W_b(T), 0))^2 + max(W(T) − e^{sT} · W_b(T), 0) ].  (2.5)

Figure 2.1 illustrates this asymmetric distribution shaping objective function.
Figure 2.1: Asymmetric distribution shaping objective function with elevated target e^{sT} · W_b(T).

We remark that distribution shaping objectives can be problem dependent, and we choose the objective function (2.5) for the pension investment problem. Furthermore, our proposed framework does not depend on any specific form of the objective function.

If we postulate parametric stochastic processes for the prices of the traded assets, mathematically, the controls p_0, . . . , p_{N−1} can be determined using dynamic programming. This will result in a nonlinear HJB PDE (see (Al-Aradi and Jaimungal, 2018) for example). In the absence of any closed-form solution, computing a solution of this problem numerically would be costly, particularly when the problem has a high dimension. Consider the simplest allocation problem, for which the portfolio consists of a stock index and a bond index. In the case of discrete rebalancing, the state variables would be the dollar amounts in the bond and stock indices, for both the adaptive and target portfolios (Dang and Forsyth, 2014). Consequently, even for this comparatively simple case, this would result in a four-dimensional HJB PDE.

Assume that samples of asset returns are available. These samples can come directly from market observations or from simulations of postulated parametric models. Instead of solving for p_0, . . . , p_{N−1} using dynamic programming, we propose a data-driven approach as follows. We represent the optimal control as a function of several features F(t), i.e., at t_n, n = 0, 1, . . . , N−1,

    p_n = p(F(t_n)).

Example 1 (Two Asset Problem with Benchmark W_{50/50}). In our numerical examples, we will focus on portfolios consisting of two assets: a stock index and a bond index. The benchmark portfolio in this case will be a constant proportion strategy, with 50% in stocks and 50% in bonds. We will denote the wealth of the benchmark strategy in this case as W_{50/50}(t).
For this example, for the stochastic target pension allocation problem, we use three features for F(t): (i) W(t_n), the wealth of the adaptive portfolio at t_n, (ii) W_{50/50}(t_n), the wealth of the constant proportion portfolio at t_n, and (iii) T − t_n, the time remaining in the investment period. In the case that simple stochastic processes are assumed, it can be shown (in the absence of transaction costs) that the controls are only a function of these features (Dang and Forsyth, 2014).

We remark that our feature set F(t) for Example 1 is different from the features in Samo and Vervuurt (2016), which explicitly use security prices. Instead, at time t our feature set consists of the accumulated wealth at t from the allocation strategy and the benchmark strategy, which depend on the returns of the traded assets from prior periods. Traded asset prices are not directly used as features for the neural network model. This is essentially because, at each rebalancing time, we search for the optimal adaptive strategy amongst all strategies with the current level of wealth. In addition, since we evaluate the performance of a trading strategy based on the terminal wealth W(T) only, the trading decision at time t depends on the current accumulated wealth and the return distribution of future trading periods. Unless the asset price has predictability in its future return, including the prices as features is redundant in this context and will likely lead to overfitting of the model.

We use a 2-layer neural network as the functional form for the optimal control. As a result, the goal of the optimization problem is to find the optimal parameters of the neural network.

Figure 2.2: A 2-layer NN representing the control functions (input layer: features F; hidden layer: nodes h; output layer: allocations p).

Assume that h ∈ R^l is the output of the hidden layer. Let the matrix z ∈ R^{d×l} contain the weights from the input features F(t_n) ∈ R^d to the hidden nodes h.
We use the sigmoid activation function

    σ(u) = 1 / (1 + e^{−u}),

and have

    h_j(F(t_n)) = σ( Σ_{i=1}^{d} F_i(t_n) z_{ij} ),  j = 1, . . . , l.

At the output layer, we use the logistic sigmoid (softmax) function as the activation function. Let the matrix x ∈ R^{l×M} contain the weights for the output layer. For the m-th asset, the asset allocation to this asset is given by

    (p(F(t_n)))_m = exp( Σ_{k=1}^{l} x_{km} h_k(F(t_n)) ) / Σ_{i=1}^{M} exp( Σ_{k=1}^{l} x_{ki} h_k(F(t_n)) ),  1 ≤ m ≤ M.
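As a concrete sketch (ours, with hypothetical weight values), the forward pass of this 2-layer network, a sigmoid hidden layer followed by a softmax output over the M assets, can be written as:

```python
import numpy as np

def sigmoid(u):
    # sigma(u) = 1 / (1 + e^{-u})
    return 1.0 / (1.0 + np.exp(-u))

def allocation(F, z, x):
    """Control network forward pass.

    F : length-d feature vector F(t_n)
    z : (d, l) input-to-hidden weight matrix
    x : (l, M) hidden-to-output weight matrix
    Returns the length-M allocation vector p(F(t_n)).
    """
    h = sigmoid(F @ z)        # h_j = sigma(sum_i F_i(t_n) z_ij)
    u = h @ x                 # pre-activations sum_k x_km h_k
    e = np.exp(u - u.max())   # softmax, shifted for numerical stability
    return e / e.sum()
```

By construction, the output components are nonnegative and sum to one, so the no-shorting and no-leverage constraints hold automatically.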
Note that with the logistic sigmoid activation function, the following constraints are automatically satisfied:

    0 ≤ (p(F(t_n)))_m ≤ 1,  1^T p(F(t_n)) = 1.

This enforces the constraints of no shorting and no leverage. In addition, insolvency cannot occur. The dynamics of the terminal wealth of the adaptive portfolio then become

    for n = 0, 1, . . . , N−1
        W(t_n^+) = W(t_n^−) + q(t_n)
        W(t_{n+1}^−) = p(F(t_n))^T R(t_n) W(t_n^+)
    end.  (2.6)

We approximate the expectation in equation (2.5) by a finite number of wealth samples of W(T), computed from return samples of R(t_n) obtained by bootstrapping the historical data. Let W^ℓ(T), W_b^ℓ(T) be the final wealth samples for the adaptive and benchmark strategies, obtained using equation (2.6), along the ℓ-th return sample path R(t_n)^ℓ, n = 0, 1, . . . , N−1. Define

    g(x) ≡ (min(x, 0))^2 + max(x, 0).  (2.7)

The expectation in equation (2.5) is approximated by

    E[ g(W(T) − e^{sT} · W_b(T)) ] ≃ (1/L) Σ_{ℓ=1}^{L} g( W^ℓ(T) − e^{sT} · W_b^ℓ(T) ).  (2.8)

Since the approximate function on the right hand side of (2.8) is a nonconvex, continuous but only piecewise differentiable function of the NN weights, solving the optimization problem is challenging. We recognize, however, that E[ g(W(T) − e^{sT} · W_b(T)) ] is a continuously differentiable function of the NN weights, assuming that the return distribution is continuous. This motivates us to use the smoothing technique from Alexander et al. (2006). In equation (2.8), we replace g(x) by the smoothed approximation ḡ(x) to obtain a continuously differentiable approximation,

    ḡ(x) = { x,                        if x > ε,
             x^2/(4ε) + x/2 + ε/4,     if −ε ≤ x ≤ ε,
             (x + ε)^2,                if x < −ε,  }  (2.9)

where ε is a predetermined small number. Since we are essentially optimizing the parameters x and z, we write the final problem as

    min_{x,z} (1/L) Σ_{ℓ=1}^{L} ḡ( W^ℓ(T) − e^{sT} · W_b^ℓ(T) ).  (2.10)
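The objective pieces above can be sketched as follows (our illustration; the function names and the sample arrays in the usage are placeholders, not the paper's code):

```python
import numpy as np

def g(x):
    # asymmetric objective (2.7): quadratic penalty on losses, linear on gains
    return np.minimum(x, 0.0) ** 2 + np.maximum(x, 0.0)

def g_bar(x, eps):
    # smoothed approximation (2.9): continuously differentiable at +/- eps
    x = np.asarray(x, dtype=float)
    mid = x ** 2 / (4 * eps) + x / 2 + eps / 4
    return np.where(x > eps, x, np.where(x < -eps, (x + eps) ** 2, mid))

def sampled_objective(WT, WbT, s, T, eps=1e-3):
    # Monte Carlo objective (2.10) over L paths of terminal wealth samples
    return np.mean(g_bar(WT - np.exp(s * T) * WbT, eps))
```

One can check that the quadratic middle piece of `g_bar` meets the outer pieces in both value and slope at x = ±ε, which is what makes the sampled objective smooth enough for a gradient-based optimizer.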
(2.10)

As in Li and Forsyth (2019), we use the trust region optimization method of Coleman and Li (1996) to solve the resulting optimization problem. More specifically, the optimization method requires the evaluation of the objective function, its derivative with respect to the weight parameters x and z, and the Hessian matrix. The gradients can be explicitly evaluated via the chain rule, and the Hessian matrix can be numerically computed via finite differences of the gradients. The detailed gradient computation can be found in Li and Forsyth (2019).

Success in data-driven learning critically depends on the efficient use of data. Standard machine learning measures success based on testing the model performance on unseen data which are assumed to have the same distribution as the training data. In other words, test results are typically computed based on test samples from the same distributions as the training samples. For training of the optimization problem (2.10), we only have access to a single path of historical returns. This lack of data presents a unique challenge in data-driven financial model learning.

For financial model learning and testing, it is common practice to train and test strategy performance by splitting the historical market data path into two segments, one for training and the other for testing. A critical problem in this approach is insufficient data for robust learning and testing. This is especially problematic in the context of pension planning due to the long-term investment horizon.

Li and Forsyth (2019) use block bootstrap resampling to generate training and testing data in data-driven financial decision learning. Standard block bootstrap resampling is done by dividing the historical market sequential data into blocks with fixed blocksizes and randomly choosing blocks to construct the bootstrap resampled data series.
To reduce the impact of a fixed blocksize and to mitigate the edge effects at each block end, the stationary block bootstrap (Patton et al., 2009; Politis and White, 2004) can be used. A single bootstrap resampled path is constructed as follows.

• First, randomly select a block of the historical market data time series. The blocksize is randomly sampled from a shifted geometric distribution with an expected blocksize b̂. The optimal choice for b̂ is determined using the algorithm described in (Patton et al., 2009).

• Repeat the previous step and concatenate the new block after the existing data series until the new resampled path has reached the desired length.

• If the selected block exceeds the range of the historical data, wrap around the historical data as in the circular bootstrap method (Politis and White, 2004; Patton et al., 2009).

Algorithm 1 presents pseudocode for the stationary block bootstrap.

In Li and Forsyth (2019), the training dataset is generated using stationary block resampling with one expected blocksize and the testing dataset is generated with a different expected blocksize. As Politis and Romano (1994) point out, changing the expected blocksize for block bootstrap resampling essentially changes the distribution of the bootstrap resampled data paths. Consequently, such training and testing assessments actually perform out-of-distribution tests.

Intuitively, using block bootstrap resampling for time-series financial market data seems natural. We have trained a model considering all permutations of the financial market data with respect to different and random concatenations of time horizons. In addition, testing has been performed on a different distribution of the financial market random horizon concatenations, since the testing data uses a different expected blocksize from that of the training data.
Indeed, evaluating testing performance in this fashion seems to uphold a more stringent standard in comparison to standard machine learning, which evaluates testing performance assuming (unseen) testing samples are from the same distribution as the training data.

Still, one may have concerns that, when the training data and testing data are block bootstrap resampled from the same underlying historical market data sequence, one path may appear in both the training and testing datasets, so that the learning algorithm may benefit from such an unfair edge. To address such concerns, we establish a theoretical bound on the probability of training and testing sample sequences being exactly the same.

Algorithm 1: Pseudocode for stationary block bootstrap

    /* initialization */
    bootstrap_samples = [ ];
    /* loop until the total number of required samples is reached */
    while True do
        /* choose a random starting index in [1,...,N]; N is the index of the last historical sample */
        index = UniformRandom(1, N);
        /* the actual blocksize follows a shifted geometric distribution with expected value exp_block_size */
        blocksize = GeometricRandom(exp_block_size);
        for (i = 0; i < blocksize; i = i + 1) {
            /* if the chosen block exceeds the range of the historical data array, do a circular bootstrap */
            if index + i > N then
                bootstrap_samples.append(historical_data[index + i - N]);
            else
                bootstrap_samples.append(historical_data[index + i]);
            end
            if bootstrap_samples.len() == number_required then
                return bootstrap_samples;
            end
        }
    end
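For concreteness, a NumPy implementation of the stationary block bootstrap of Algorithm 1 might look as follows (our sketch; all names are ours):

```python
import numpy as np

def stationary_block_bootstrap(data, n_required, exp_block_size, rng=None):
    """Generate one resampled path via the stationary block bootstrap.

    Block starting indices are uniform over the historical series; block
    lengths are drawn from a shifted geometric distribution with mean
    exp_block_size, and blocks wrap circularly past the end of the data.
    """
    rng = np.random.default_rng() if rng is None else rng
    data = np.asarray(data)
    N = len(data)
    out = []
    while True:
        start = rng.integers(N)                       # random starting index
        length = rng.geometric(1.0 / exp_block_size)  # shifted geometric blocksize
        for i in range(length):
            out.append(data[(start + i) % N])         # circular wrap-around
            if len(out) == n_required:
                return np.asarray(out)
```

Note that the modular index implements the circular wrap for blocks of any length, whereas the pseudocode's `index + i - N` form assumes a block never wraps around more than once.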
Consider generating a sequence of N data points using fixed block resampling from a sequence of N_tot distinct observations. Let path P_1 be bootstrap resampled with a fixed blocksize of b_1 and path P_2 be bootstrap resampled with a fixed blocksize of b_2. Then the probability of P_1 and P_2 being identical is

    (1/N_tot)^{lcm(N/b_1, N/b_2)},

where lcm(a, b) is the least common multiple of the integers a and b.

The proof of Theorem 1 is in Appendix A.1. To put this into perspective, assume a fixed blocksize for the training paths of 6 months, and a fixed blocksize for the testing paths of 24 months (2 years). Consider a 30-year investment horizon of monthly return paths randomly generated from historical monthly data over 90 years, i.e. N = 30 × 12 = 360 and N_tot = 90 × 12 = 1080. Then the probability of a training path being identical to a testing path is (1/1080)^{lcm(60, 15)} = (1/1080)^{60} < 10^{-182}. Assume that we use a total of 100,000 training paths in the training data and 10,000 testing paths in the testing data. By the union bound, the probability of the existence of a pair of identical training and testing paths is bounded by 100,000 × 10,000 × 10^{-182} = 10^{-173}.

Next, we consider the stationary block bootstrap case, in which the blocksizes are randomly generated from a shifted geometric distribution. We are able to establish the following theorem about the probability of two paths generated with the stationary block bootstrap being identical.
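The arithmetic in this example is easy to verify; the following sketch simply evaluates the Theorem 1 expression (our own check, not part of the original analysis):

```python
import math

N, N_tot = 30 * 12, 90 * 12          # path length, historical data length (months)
b1, b2 = 6, 24                        # fixed blocksizes for training / testing paths

# Theorem 1: Pr(identical) = (1 / N_tot) ** lcm(N / b1, N / b2)
exponent = math.lcm(N // b1, N // b2)            # lcm(60, 15) = 60
p_identical = (1.0 / N_tot) ** exponent          # approximately 1e-182

# Union bound over all (training, testing) path pairs
p_any_pair = 100_000 * 10_000 * p_identical
print(exponent, p_identical < 1e-180, p_any_pair < 1e-170)  # 60 True True
```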
Theorem 2.
Consider generating a sequence of N data points using stationary block resampling from a sequence of N_tot distinct observations. Let P_1 and P_2 be two paths generated by stationary block bootstrap resampling from this observation sequence with expected blocksizes b̂_1 and b̂_2 respectively, both of length N. The probability of P_1 and P_2 being identical is

    (1/N_tot) [ (1 − 1/b̂_1)(1 − 1/b̂_2) + (b̂_1 + b̂_2 − 1)/(b̂_1 b̂_2 N_tot) ]^{N−1}.

The proof of Theorem 2 is also in Appendix A.1. Consider the following example. If the training paths are bootstrap resampled with an expected blocksize of 6 months (0.5 years) and the testing paths with an expected blocksize of 24 months (2 years), for N = 30 × 12 = 360 (30-year horizon) and N_tot = 90 × 12 = 1080, then the probability of a training path being identical to a testing path is 8.7 × 10^{-39}.

If the training data set consists of a total of 100,000 training paths and the testing data set consists of 10,000 testing paths, then by the union bound, the probability of the existence of a pair of identical training and testing paths is bounded by 100,000 × 10,000 × 8.7 × 10^{-39} < 10^{-28}.

Therefore, even when the training set and testing set are generated from the same data sequence, the probability of observing the same path in both the training and testing datasets is near zero. This suggests that using block bootstrap resampling to generate training and testing data sets is a robust method for enhancing data for the learning framework.
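As a sanity check on this example, the Theorem 2 expression can be evaluated directly (a sketch, using the formula as stated above):

```python
N, N_tot = 360, 1080
b1_hat, b2_hat = 6.0, 24.0            # expected blocksizes in months

# Per-step match probability: P(both blocks continue)
# + P(at least one block restarts) / N_tot
p_step = (1 - 1 / b1_hat) * (1 - 1 / b2_hat) \
    + (b1_hat + b2_hat - 1) / (b1_hat * b2_hat * N_tot)
p_identical = (1 / N_tot) * p_step ** (N - 1)    # approximately 8.7e-39

p_any_pair = 100_000 * 10_000 * p_identical      # union bound
print(f"{p_identical:.2e}", p_any_pair < 1e-28)
```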
Remark 1.
Under the stationary block bootstrap, a path is likely to contain large actual blocksizes even if the expected blocksize is relatively small, which can result in a higher probability of observing two identical paths than under the fixed block bootstrap. For example, a path generated with a small expected blocksize still has a nonzero probability of consisting of only a single long block, which increases the probability of that path being identical to another path, compared with the fixed blocksize case of Theorem 1.

We evaluate and report the performance of the proposed data-driven approach for outperforming a stochastic target in the context of a 30-year DC pension plan. In our numerical tests, we focus on portfolios with only two assets: a stock index and a bond index, as described in Example 1. The benchmark portfolio is a constant weight strategy, which is rebalanced to 50% bonds and 50% stocks annually. We denote the wealth of the benchmark strategy at time t by W^{50/50}(t).

Our main objective here is to consider the core allocation problem between a risky and a defensive asset. To that end, we use monthly historical data from the Center for Research in Security Prices (CRSP) from January 1, 1926 to December 31, 2015. Specifically, we use the CRSP 3-month Treasury bill (T-bill) index and the CRSP cap-weighted total return index. The latter index includes all distributions for all domestic stocks trading on major U.S. exchanges. Since both indexes are in nominal terms, we adjust them for inflation using the U.S. CPI index, also supplied by CRSP. We use real indexes since investors saving for retirement should be focused on real (not nominal) wealth goals. Note that in (Li and Forsyth, 2019), in the context of a fixed (non-stochastic) target based objective function, we have also tested the use of the CRSP equal-weighted index (for the risky asset) and the 10-year Treasury index (for the defensive asset). The control strategies are qualitatively similar for either choice of risky and defensive asset.
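As an aside, the shifted geometric blocksize distribution used in the block bootstrap, Pr(b = k) = (1 − v)^{k−1} v (see Table 4.1), has mean b̂ = 1/v; a quick numerical check (our own illustration):

```python
# Pr(b = k) = (1 - v)**(k - 1) * v, k = 1, 2, ...  has mean 1/v.
# Check numerically for v corresponding to an expected blocksize of 24 months.
v = 1.0 / 24.0
mean = sum(k * (1 - v) ** (k - 1) * v for k in range(1, 10_000))
print(round(mean, 6))  # approximately 24.0 (up to truncation of the tail)
```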
For simplicity, here we will focus on the CRSP cap-weighted index and 3-month T-bill case. For illustration, we consider a two-asset allocation in which the wealth of the portfolio is allocated to the two indexes. We subsequently refer to the two assets simply as the stock and the bond. (More specifically, the results presented here were calculated based on data from Historical Indexes, © CRSP.)

For the stock index and bond index, Table 4.1 shows the optimal expected blocksize for each index estimated from the historical data. These estimates guide the choice of expected blocksizes when using the resampling method in the proposed data-driven NN framework.

Index                            b̂ (months)
Real 3-month T-bill index        50.1
Real CRSP cap-weighted index     1.8

Table 4.1: Optimal expected blocksize b̂ = 1/v when the blocksize follows a geometric distribution Pr(b = k) = (1 − v)^{k−1} v. The algorithm in Patton et al. (2009) is used to determine b̂.

The parameters used in training and testing the proposed data-driven approach are as follows:

• L: a total of L = 100,000 bootstrap paths are used for training;

• L_test: a total of L_test = 10,000 paths are bootstrap resampled with a different expected blocksize than the training data, for testing the strategy performance;

• W(0): initial wealth is W(0) = 0;

• T: the entire investment period is T = 30 years;

• N: the entire period is divided into N = 30 periods, with rebalancing at the beginning of each period, i.e., annual rebalancing;

• q: the annual cash injection is q = 10;

• s: the annual target outperformance rate is s = 1%, used for calculating the elevated target e^{sT} W^{50/50}(T), where W^{50/50}(T) is the terminal wealth of the constant proportion portfolio;

• the feature variables of the NN control:
  – T − t: time remaining in the investment period,
  – W(t): wealth of the adaptive portfolio at time t,
  – W^{50/50}(t): wealth of the constant proportion portfolio at time t.

We now evaluate the performance of the optimal adaptive strategy trained on bootstrap resampled data. First, we show the performance of the optimal adaptive strategy trained on bootstrap resampled data with expected blocksize b̂ = 0.5 years, and tested on bootstrap resampled data with expected blocksize b̂ = 2 years, which is the average optimal blocksize. When discussing robustness in Section 5.1, we show that the strategy performance using alternative training-testing expected blocksize pairs is qualitatively similar.

Training Results on Bootstrap Data: Expected Blocksize b̂ = 0.5 years
Strategy                        E(W_T)  std(W_T)  median(W_T)  Pr(W_T < median(W_T^CP))  Pr(W_T < median(W_T^NN))
constant proportion (p = 0.5)   678     276       624          0.50                      0.84
adaptive                        963     474       913          0.27                      0.50

Testing Results on Bootstrap Data: Expected Blocksize b̂ = 2 years
Strategy                        E(W_T)  std(W_T)  median(W_T)  Pr(W_T < median(W_T^CP))  Pr(W_T < median(W_T^NN))
constant proportion (p = 0.5)   679     267       629          0.50                      0.84
adaptive                        962     449       921          0.26                      0.50
Table 4.2: Terminal wealth statistics of the optimal adaptive strategy, trained on bootstrap resampled data with expected blocksize b̂ = 0.5 years and tested with expected blocksize b̂ = 2 years.

Table 4.2 reports performance statistics and the probability of the terminal wealth being less than the median terminal wealth of each of the two strategies. From Table 4.2, we observe that:

• The median and mean terminal wealth of the optimal adaptive strategy are significantly higher than those of the constant proportion strategy.

• The optimal adaptive strategy has only a 26% probability of achieving a lower terminal wealth than the median terminal wealth of the constant proportion strategy (median(W_T^CP)), while the constant proportion strategy has an 84% probability of achieving a lower terminal wealth than the median terminal wealth of the NN adaptive strategy (median(W_T^NN)).

It is also worth noting that the standard deviation of the terminal wealth of the optimal adaptive strategy is higher than that of the constant proportion strategy. In the context of dynamic trading, a higher standard deviation does not imply that the performance of the strategy is poor. In fact, we can observe from Figure 4.1a that the distribution of the terminal wealth of the optimal adaptive strategy is significantly more right-skewed. A higher standard deviation of terminal wealth is desirable in the right-skewed situation (van Staden et al., 2019). This illustrates why standard deviation and the Sharpe ratio are poor measures of risk for inherently non-linear strategies (Lhabitant, 2000). In fact, the optimal adaptive dynamic strategy has properties in common with option-based strategies. We also plot the CDFs for the optimal adaptive strategy and the constant proportion strategy in Figure 4.1b.

(a) Histogram of terminal wealth for the adaptive strategy and the constant proportion strategy;
(b) CDF of terminal wealth for the adaptive strategy and the constant proportion strategy; (c) CDF of the terminal wealth difference between the adaptive strategy and the constant proportion strategy.
Figure 4.1: Histogram of terminal wealth W(T) (adaptive) and W^{50/50}(T) (constant proportion), and CDF of the wealth difference W(T) − W^{50/50}(T), based on the testing data (bootstrap data with b̂ = 2 years).

We should point out that the terminal wealth distribution of the optimal adaptive strategy has a slightly worse left tail than the constant proportion strategy. The 90% VaR of terminal wealth is 340 for the optimal adaptive strategy and 394 for the constant proportion strategy. These tail events occur when the bootstrapped path corresponds to consistently bearish market periods, in which stocks underperform bonds for a long period of time. We remark that the investor can include risk measures such as VaR and CVaR in the objective function if reducing tail risk is a higher priority in the investment plan. This, of course, will produce a lower probability of outperformance.

Figure 4.1c shows the cumulative distribution function (CDF) of the wealth difference W(T) − W^{50/50}(T), giving a more direct comparison between the optimal adaptive strategy and the constant proportion strategy along the same paths. From Figure 4.1c we can see that the probability of the optimal adaptive strategy underperforming the constant proportion strategy is less than 10%. When underperformance occurs, its magnitude is small compared to the magnitude of outperformance.

We have analyzed and compared the overall performance based on the terminal wealth of each strategy. Next, we provide more detailed comparisons of various characteristics of the strategies. Since the objective function for the optimal control (2.5) is defined in terms of the terminal wealth, we examine how the optimal adaptive strategy performs over the entire period of investment.
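Statistics of the kind reported in Table 4.2 can be computed from two arrays of simulated terminal wealths; a minimal sketch with synthetic stand-in data (the lognormal parameters are illustrative, not calibrated):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in terminal wealths for the two strategies (synthetic, illustrative only)
w_cp = rng.lognormal(mean=6.4, sigma=0.4, size=100_000)   # "constant proportion"
w_nn = rng.lognormal(mean=6.8, sigma=0.5, size=100_000)   # "adaptive"

def summary(w, w_other):
    """Mean, std, median, and Pr(W_T < median of the other strategy)."""
    return {
        "mean": float(np.mean(w)),
        "std": float(np.std(w)),
        "median": float(np.median(w)),
        "prob_below_other_median": float(np.mean(w < np.median(w_other))),
    }

print(summary(w_cp, w_nn))
print(summary(w_nn, w_cp))
```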
(a) Percentiles of the wealth difference W(t) − W^{50/50}(t) over time; (b) percentiles of the relative wealth difference (W(t) − W^{50/50}(t))/W^{50/50}(t) over time.

Figure 4.2: Wealth difference and relative wealth difference over time: W(t) denotes the wealth of the optimal adaptive strategy and W^{50/50}(t) denotes the wealth of the benchmark.

Figure 4.2 graphs the average and various percentiles of the wealth difference W(t) − W^{50/50}(t) over the investment time horizon. From Figure 4.2, we observe that:

• With high probability, the optimal adaptive strategy achieves higher wealth than the constant proportion strategy over time.

• The outperformance of the optimal adaptive strategy in terms of the relative wealth difference is not as significant as the wealth difference in dollar values.

These observations indicate that larger outperformance of the optimal adaptive strategy often occurs when the constant proportion strategy itself performs well. Nevertheless, the outperformance of the optimal adaptive strategy remains substantial. (Note that we measure quantiles of the terminal wealth, not losses. Hence a larger value of VaR is more desirable, i.e. indicates less risk.)
We further examine the characteristics of the optimal adaptive strategy. Figure 4.3a shows various percentiles of the stock allocation of the optimal adaptive strategy over time. We observe that:

• In general, the stock allocation (fraction of wealth invested in stocks) decreases when approaching the end of the investment horizon.

• The stock allocation almost always stays above the benchmark allocation of 50%.
(a) Percentiles of the fraction invested in stocks over time for the adaptive strategy; (b) heatmap of the fraction invested in stocks for the adaptive strategy, as a function of time t and the wealth difference W(t) − W^{50/50}(t).

Figure 4.3: Fraction invested in stocks over time for the optimal adaptive strategy: percentiles and the heatmap.

With a red-blue color scheme, Figure 4.3b shows the heatmap of the stock allocation with respect to time t and the wealth difference W(t) − W^{50/50}(t). Darker shades of red indicate a larger allocation to stocks, and darker shades of blue indicate a larger allocation to bonds.

From Figure 4.3b, we observe that when W(t) − W^{50/50}(t) is positive and large (the optimal adaptive strategy is outperforming), the allocation of wealth to stocks becomes small. The intuitive explanation is that the optimal adaptive strategy tends to decrease the wealth allocation to stocks once it has established an advantage over the benchmark constant proportion strategy. This also explains why the stock allocation almost always stays above 50%: in most cases where the optimal adaptive strategy has established an advantage over the constant proportion strategy (as we have seen in Figure 4.2), decreasing the stock allocation to 50%, i.e., maintaining the same allocation as the 50/50 constant proportion strategy, locks in the outperformance. On the other hand, when W(t) − W^{50/50}(t) < 0, the strategy increases the allocation to stocks in an attempt to catch up with the benchmark.

As a special out-of-sample test, we consider the actual historical path from 1985 to 2015 to backtest the performance of the optimal adaptive strategy. We note that the historical path is not a path in the training data set.

From Figure 4.4, we see that the optimal adaptive portfolio always maintains a higher wealth than the constant proportion strategy over the entire investment period.
While optimizing the performance of the adaptive strategy on a specific path is not the goal of our study, it is still quite interesting to see that, historically, the optimal adaptive strategy does better than the constant proportion strategy.

Note that the adaptive strategy does show a large drawdown in 2002 and 2008. However, our objective function is posed in terms of outperformance of the terminal wealth. We see that the adaptive strategy outperforms, in the sense that its wealth is always above the benchmark wealth, even in 2002 and 2008. It is, of course, possible to add penalties on drawdowns to the objective function. However, this would result in less favorable terminal wealth statistics.

The solid line without markers in Figure 4.4 illustrates the time evolution of the stock allocation on the historical path. When the adaptive strategy performs poorly, such as in 2002 and 2008, the strategy allocates more wealth to stocks. When the adaptive strategy performs well, the strategy decreases the allocation to stocks and invests more in bonds.
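For concreteness, the wealth path of an annually rebalanced constant proportion benchmark with cash injections q can be simulated as below (a sketch under our own illustrative return assumptions; the paper's benchmark uses bootstrapped historical returns):

```python
import numpy as np

def constant_proportion_wealth(stock_returns, bond_returns, p_stock=0.5, q=10.0):
    """Wealth path of an annually rebalanced constant proportion strategy.

    stock_returns, bond_returns: arrays of annual gross returns (e.g. 1.07).
    q: cash injected at the start of each year.  W(0) = 0, as in the paper.
    """
    w = 0.0
    path = [w]
    for rs, rb in zip(stock_returns, bond_returns):
        w += q                                            # annual contribution
        w = p_stock * w * rs + (1 - p_stock) * w * rb     # rebalance, then grow
        path.append(w)
    return np.array(path)

rng = np.random.default_rng(1)
stock = rng.lognormal(0.06, 0.15, size=30)   # illustrative annual gross returns
bond = np.full(30, 1.01)
w_5050 = constant_proportion_wealth(stock, bond)
print(len(w_5050), round(float(w_5050[-1]), 1))
```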
Figure 4.4: Backtest of strategy performance over the historical period from 1985-2015 (single path)
While the stock allocation of the optimal adaptive strategy varies over time, its average over time is about 80%. A natural question is how the optimal adaptive strategy compares with the 80/20 constant proportion strategy, which invests 80% of the wealth in stocks and 20% in bonds.

Here we compare the optimal adaptive strategy with the 80/20 constant proportion strategy. Recall that in Section 4.3, the optimal adaptive strategy is trained on bootstrap resampled data with an expected blocksize of 0.5 years. Figure 4.5 shows the CDFs of W^{NN}(T) − W^{50/50}(T) and W^{80/20}(T) − W^{50/50}(T), i.e., the wealth differences of the optimal adaptive strategy and of the 80/20 strategy from the 50/50 strategy, respectively.
(a) CDF of the terminal wealth difference W(T) − W^{50/50}(T), where W(T) is either W^{NN}(T) or W^{80/20}(T); (b) CDF of the terminal wealth difference, enlarged for underperformance.
Figure 4.5: CDF of the wealth difference of both strategies (optimal adaptive and 80/20 constant proportion) relative to the 50/50 strategy.

We observe that the optimal adaptive strategy controls tail risk better than the 80/20 strategy. Specifically, the probability of the optimal adaptive strategy underperforming the 50/50 strategy is lower than that of the 80/20 strategy. When underperformance against the 50/50 strategy occurs, the magnitude of underperformance for the optimal adaptive strategy is smaller than for the 80/20 strategy, as shown in Figure 4.5.

It is worth noting that the 80/20 strategy has more upside than the optimal adaptive strategy. However, we remind the reader that less upside is a natural result of our choice of the double-sided penalty objective function. As reflected in the asymmetric objective function, our goal is not to achieve extremely large outperformance over the 50/50 strategy, but to reach the elevated target with high probability and to control the downside risk. The optimal adaptive strategy achieves those goals better than the 80/20 strategy. To demonstrate this, we plot the CDFs of the outperformance of both strategies over the elevated target e^{sT} · W^{50/50}(T) in Figure 4.6.

We also observe that the optimal adaptive strategy has a smaller probability of underperforming the elevated target (37.3%) than the 80/20 strategy (46.8%). This means the optimal adaptive strategy is more likely to reach the elevated target and thus achieve the pre-determined annual outperformance spread. Moreover, we observe from the enlarged CDF plot in Figure 4.6b that the optimal adaptive strategy consistently controls underperformance better than the 80/20 strategy, in the sense that it underperforms by less than the 80/20 strategy when the elevated target is not met.
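The elevated-target comparison reduces to a few lines; in this sketch the wealth arrays are synthetic stand-ins (only the factor e^{sT} comes from the paper's parameters s = 1%, T = 30):

```python
import math
import numpy as np

s, T = 0.01, 30
target_factor = math.exp(s * T)      # e^{sT}: a 1%/year spread compounded over 30 years
print(round(target_factor, 4))       # 1.3499

rng = np.random.default_rng(2)
w_5050 = rng.lognormal(6.4, 0.4, size=100_000)          # stand-in benchmark wealth
w_nn = w_5050 * rng.lognormal(0.25, 0.3, size=100_000)  # stand-in adaptive wealth

# Probability of underperforming the elevated target e^{sT} * W^{50/50}(T)
p_under = float(np.mean(w_nn < target_factor * w_5050))
print(p_under)
```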
To further evaluate the robustness of the optimal adaptive strategy, we assess the optimal control models from the following three perspectives:

• We test the strategy learned from bootstrap data with a given expected blocksize on bootstrap data with multiple different expected blocksizes.
(a) CDF of the terminal wealth difference W(T) − e^{sT} · W^{50/50}(T), where W(T) is either W^{NN}(T) or W^{80/20}(T); (b) CDF of the terminal wealth difference over the elevated target, enlarged for underperformance.
Figure 4.6: CDF of the wealth difference of both strategies (optimal adaptive and 80/20 constant proportion) over the elevated target e^{sT} · W^{50/50}(T).

• We train the model on a dataset simulated from a synthetic parametric model and test it on the bootstrap resampled dataset.

• We train the strategy on bootstrap data from one segment of the historical data and test the strategy on bootstrap data from another segment of the historical data.

We generate the bootstrap resampled data for training the optimal control model by sampling directly from the specified historical data sequence.
We test the adaptive strategy learned on bootstrap resampled data with a given expected blocksize on bootstrap resampled data with different expected blocksizes. For illustration, here we only show the testing results of the strategy learned on bootstrap resampled data with an expected blocksize of 0.5 years. We observe that:

• The mean and median terminal wealth of the adaptive strategy remain similar across different blocksizes.

• The adaptive strategy has a more favorable terminal wealth distribution, as it is more likely to achieve a terminal wealth higher than the median terminal wealth of the constant proportion strategy.

Table 5.1 demonstrates that the outperformance of the adaptive strategy over the benchmark strategy is robust across different expected blocksizes. We include more testing results from strategies trained with other expected blocksizes in the Appendix.

Training Results on Bootstrap Data with Expected Blocksize b̂ = 0.5 years: Market Cap Weighted
Strategy                        E(W_T)  std(W_T)  median(W_T)  Pr(W_T < median(W_T^CP))  Pr(W_T < median(W_T^NN))
constant proportion (p = 0.5)   678     276       624          0.50                      0.86
adaptive                        963     474       913          0.26                      0.50

Testing Results on Bootstrap Data: Market Cap Weighted
Expected Blocksize b̂ = 1 year
constant proportion (p = 0.5)   674     273       624          0.50                      0.84
NN adaptive                     955     466       909          0.27                      0.50
Expected Blocksize b̂ = 2 years
constant proportion (p = 0.5)   676     263       631          0.50                      0.84
NN adaptive                     958     445       917          0.26                      0.50
Expected Blocksize b̂ = 5 years
constant proportion (p = 0.5)   669     244       626          0.50                      0.85
NN adaptive                     953     409       915          0.24                      0.50
Expected Blocksize b̂ = 8 years
constant proportion (p = 0.5)   669     233       632          0.50                      0.87
NN adaptive                     960     393       928          0.23                      0.50
Expected Blocksize b̂ = 10 years
constant proportion (p = 0.5)   667     223       635          0.50                      0.88
NN adaptive                     961     383       928          0.22                      0.50
Table 5.1: Terminal wealth statistics of the adaptive strategy trained on bootstrap resampled data with expected blocksize b̂ = 0.5 years and tested on bootstrap resampled data with various expected blocksizes.

In this section, we generate synthetic data from a parametric model calibrated to historical data. We then test the strategy on bootstrap resampled data. Clearly, the synthetic data from the parametric model will have a different distribution than the resampled data.
The synthetic data is generated from a jump-diffusion stochastic process. Let S(t) and B(t) respectively denote the wealth invested in stocks and bonds at time t, t ∈ [0, T]. Specifically, we assume that S(t) represents the amount invested in a broad stock market index (the CRSP cap-weighted index), while B(t) is the amount invested in short-term default-free government bonds (in our case, the 3-month T-bill). Recall that t⁻ = t − ε, ε → 0⁺, i.e. t⁻ is the instant of time before t, and let ξ be a random number representing a jump multiplier. When a jump occurs, S(t) = ξ S(t⁻). Allowing discontinuous jumps lets us explore the effects of severe market crashes on the stock holding, and of non-normal returns.

We assume that ξ follows a double exponential distribution (Kou, 2002; Kou and Wang, 2004). If a jump occurs, p_up is the probability of an upward jump, while 1 − p_up is the probability of a downward jump. The density function for y = log ξ is

    f(y) = p_up η_1 e^{−η_1 y} 1_{y ≥ 0} + (1 − p_up) η_2 e^{η_2 y} 1_{y < 0}.   (5.1)

For future reference, note that

    E[y] = E[log ξ] = p_up/η_1 − (1 − p_up)/η_2,
    E[ξ] = p_up η_1/(η_1 − 1) + (1 − p_up) η_2/(η_2 + 1).   (5.2)

S(t) evolves according to

    dS(t)/S(t⁻) = (μ − λ E[ξ − 1]) dt + σ dZ + d( Σ_{i=1}^{π_t} (ξ_i − 1) ),   (5.3)

where μ is the (uncompensated) drift rate, σ is the volatility, dZ is the increment of a Wiener process, π_t is a Poisson process with positive intensity parameter λ, and the ξ_i are i.i.d. positive random variables with density (5.1). Moreover, ξ_i, π_t, and dZ are assumed to be mutually independent.

We assume that the dynamics of the amount B(t) invested in the risk-free asset are

    dB(t) = r B(t) dt,   (5.4)

where r is the (constant) risk-free rate. This is obviously a simplification of the real bond market.
We remind the reader that, ultimately, our NN method is entirely data-driven and is based on bootstrapped stock and bond indexes.

Based on (5.3) and (5.4), we use the methods in (Dang and Forsyth, 2016) to calibrate the process parameters. We use a threshold technique (Cont et al., 2011) to identify the jump frequency and distribution, and the methods in (Dang and Forsyth, 2016) to determine the remaining parameters. Annualized estimated parameters for the cap-weighted stock index are provided in Table 5.2.

Real CRSP Cap-Weighted Stock Index and 3-month T-bill Index
μ        σ        λ        p_up     η_1      η_2      r
.08889   .14771   .32222   0.27586  4.4273   5.2613   0.00827

Table 5.2: Estimated annualized parameters for the double exponential jump diffusion model. Cap-weighted index, deflated by the CPI. Sample period 1926:1 to 2015:12.

We then generate the synthetic data from the parametric model with the calibrated parameters through Monte Carlo simulation.
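A Monte Carlo simulation of (5.3) with the Table 5.2 parameters might look as follows (our own log-space Euler discretization, not the authors' code):

```python
import numpy as np

# Table 5.2 parameters (annualized, real CRSP cap-weighted index)
mu, sigma, lam = 0.08889, 0.14771, 0.32222
p_up, eta1, eta2 = 0.27586, 4.4273, 5.2613

def sample_log_jumps(n, rng):
    """Draw y = log(xi) from the double exponential density (5.1)."""
    up = rng.random(n) < p_up
    return np.where(up, rng.exponential(1 / eta1, n), -rng.exponential(1 / eta2, n))

def simulate_index(T=30.0, steps=360, n_paths=10_000, rng=None):
    """Simulate S(T)/S(0) under the jump-diffusion (5.3), in log space."""
    rng = rng or np.random.default_rng(0)
    dt = T / steps
    # Compensator: E[xi] from (5.2), so that the drift of S is mu
    e_xi = p_up * eta1 / (eta1 - 1) + (1 - p_up) * eta2 / (eta2 + 1)
    drift = (mu - lam * (e_xi - 1) - 0.5 * sigma**2) * dt
    log_s = np.zeros(n_paths)
    for _ in range(steps):
        log_s += drift + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)
        n_jumps = rng.poisson(lam * dt, n_paths)
        # For the (rare) multi-jump steps, sum the log jump sizes
        for k in range(1, int(n_jumps.max()) + 1):
            idx = n_jumps >= k
            log_s[idx] += sample_log_jumps(int(idx.sum()), rng)
    return np.exp(log_s)

paths = simulate_index()
print(round(float(np.mean(paths)), 2))  # sample mean of S(T)/S(0); E[S(T)/S(0)] = e^{mu T}
```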
We test the performance of the strategy trained on synthetic data on bootstrap data with expected blocksize b̂ = 2 years. The testing performance with other expected blocksizes is very similar, so we only show results for b̂ = 2 years.

Training Results on Synthetic Data: Market Cap Weighted
Strategy                        E(W_T)  std(W_T)  median(W_T)  Pr(W_T < median(W_T^CP))  Pr(W_T < median(W_T^NN))
constant proportion (p = 0.5)   714     383       630          0.50                      0.82
adaptive                        1019    651       930          0.29                      0.50

Testing Results on Bootstrap Data with Expected Blocksize b̂ = 2 years
constant proportion (p = 0.5)   679     267       630          0.50                      0.84
adaptive                        944     431       912          0.26                      0.50
Table 5.3: Terminal wealth statistics of the adaptive strategy trained on synthetic data and tested on bootstrap resampled data with expected blocksize b̂ = 2 years.

Table 5.3 shows that the adaptive strategy learned from synthetic data performs well on the test set of bootstrap resampled data. The adaptive strategy has significantly higher median and mean terminal wealth than the constant proportion strategy in both training and testing.

We do notice that, in the testing results, the adaptive strategy has slightly lower mean and median terminal wealth, as well as a lower standard deviation, than in the training results. This is hardly surprising since the training and test data have different distributions. However, overall, the strategy appears to be quite robust. Further distribution comparisons can be found in Appendix A.3.

In this section, we compare the following two cases.

Case 1:
We train the adaptive strategy on bootstrap resampled data from the entire historical path from 1926 to 2015. We test the strategy on bootstrap resampled data from the last 30 years of the historical path, 1986-2015. There is an overlap between the underlying historical paths for training and testing (1986-2015). We show that this overlap does not introduce an advantage in terms of strategy performance by comparing it with the non-overlap case.
Case 2:
We train the adaptive strategy on bootstrap resampled data from the first 60 years of the historical path, from 1926 to 1985. We test the strategy on the same bootstrap resampled data generated from the last 30 years of the historical path (1986-2015) as in Case 1.
Figure 5.1: Case 1 - the training window (1926-2015) and the testing window (1986-2015) overlap.
Figure 5.2: Case 2 - the training window (1926-1985) and the testing window (1986-2015) do not overlap.

Recall that the investment horizon considered so far is T = 30 years. In order to obtain more meaningful block bootstrap resampling results for the non-overlap window, we reduce the investment horizon to T = 15 years for both cases in this section.

We first compare the CDFs of the terminal wealth for the two cases. From Figure 5.3 we can observe that Case 1 (the overlap case) performs very similarly to Case 2 (the non-overlap case), which supports our argument that forward-looking bias is not a concern in our approach.
Figure 5.3: CDF of the wealth difference W(T) − W^{50/50}(T) for the two cases.

Interestingly, the non-overlap case (Case 2) actually has less tail risk. In Figure 5.5, we compare the actual strategies, i.e., the stock allocations of the two cases. This time we can observe some differences between Case 1 and Case 2.

In this article, we propose a data-driven framework for computing the optimal asset allocation for outperforming a stochastic benchmark target based on market asset return observations. The scenario-based dynamic asset allocation problem is solved directly, assuming a neural network representation of the optimal control, without using dynamic programming. This leads to a method that avoids the curse of dimensionality, which is a critical issue in dynamic allocation for outperforming a stochastic benchmark.
Figure 5.4: Percentiles (5th percentile, median, mean, 95th percentile) of the wealth difference W(t) − W^{50/50}(t) over time for the two cases.
Figure 5.5: Stock allocation over time for the two cases (5th percentile, median, mean, 95th percentile).

In addition, we design an asymmetric, distribution-shaping objective function which is capable of producing an optimal strategy that yields a significantly larger median terminal wealth than the target, with only a small probability (and magnitude) of underperformance. We emphasize that our methodology can encompass a wide class of objective functions, which can be tailored to the risk preferences of individual investors.

We use block bootstrap resampling to augment historical financial market data. The training data is generated by block bootstrap resampling of market asset returns. This leads to a data-driven approach for determining the optimal dynamic asset allocation, avoiding the need for a parametric asset price model as well as model parameter estimation. We further provide mathematical justification for using block bootstrap resampling to generate both training and testing datasets.

The proposed method is illustrated with the DC pension allocation problem, which is a practically relevant and important problem in its own right. We evaluate and analyze the performance of the optimal NN adaptive strategy based on the CRSP 3-month Treasury bill (T-bill) index for the risk-free asset and the CRSP cap-weighted total return index for the risky asset, from 1926 to 2015.

We illustrate the robustness of our approach from three different perspectives.

• We show that the adaptive strategy trained on bootstrap resampled data with a given expected blocksize performs consistently well on bootstrap resampled data with different expected blocksizes (and thus different distributions).

• We show that the adaptive strategy learned on synthetic data performs well on bootstrap resampled data, despite the fact that the methodologies for generating the two datasets are quite different.
• We compare the performance of our strategy with the strategy learned in a non-overlap setting, where the underlying market data for the training dataset and the testing dataset have no overlap. We show that the non-overlap case has comparable performance, which supports our argument that forward-looking bias should not be a concern in our approach.

Basing our optimal control on a shallow neural network representation using only a small number of financially relevant feature variables results in a strategy that is financially intuitive and implementable.
This work was supported by a Collaborative Research and Development (CRD) grant from the Natural Sciences and Engineering Research Council of Canada (NSERC) and by Neuberger Berman.
The authors have no conflicts of interest to report.
A Appendix
A.1 Proofs of Theorems 1 and 2
We now mathematically establish Theorems 1 and 2. For a path P, we use the following notation:

b̂ = expected blocksize in the stationary block bootstrap,
N = total number of datapoints in the path,
N_tot = total number of datapoints to bootstrap from,
P[i] = the i-th data point in path P.    (A.1)

We also make the following definitions.

Definition 1.
Assume that a path P of length N, which contains blocks [B_1, …, B_k], is resampled from the original data path of length N_tot. The decision index list [I_1, …, I_k] of the path P is defined as the list of starting indices of every block in the resampled path, with I_1 = 1 and I_i = 1 + Σ_{j=1}^{i−1} |B_j|, i = 2, …, k, where |B_j| denotes the number of points in the block B_j. If I_k is the starting index of the last block in the path then, for index completeness, we define I_{k+1} ≡ N + 1.

Remark 2 (Decision Index List Example). Given a decision index list [I_1, …, I_k] associated with a path P, the data point of the path which starts at decision index I_i is P[I_i].

Definition 2.
For any two paths P_1 and P_2, the combined decision index list of P_1 and P_2 is the merged index list (with only a single copy of each index) of the decision index lists of P_1 and P_2. The merged list [I_1, …, I_p] retains the order properties of the original lists, i.e. I_{i+1} > I_i and I_{p+1} = N + 1.

Definition 3.
For any two paths P_1 and P_2, we define N_cdi(P_1, P_2) as the length of the combined decision index list of P_1 and P_2.

Lemma 1.
Consider either fixed block resampling or stationary resampling from a sequence of N_tot distinct observations. Two paths P_1 and P_2 with [I_1, I_2, …, I_{N_cdi}] as the combined decision index list are identical if and only if P_1[I_j] = P_2[I_j] at every I_j, j = 1, …, N_cdi.

Proof. First, P_1 = P_2 clearly implies that P_1[I_j] = P_2[I_j] at every I_j, j = 1, …, N_cdi. Conversely, assume that P_1[I_j] = P_2[I_j], j = 1, …, N_cdi. For any j = 1, …, N_cdi, the entire segment P_1[I_j], …, P_1[I_{j+1} − 1] comes from a single resampled subblock of the original data. Similarly, the entire segment P_2[I_j], …, P_2[I_{j+1} − 1] comes from a single resampled subblock of the original data. Since P_1[I_j] = P_2[I_j] and the observations are distinct, the two subblocks start at the same point of the original data, so P_1[I_j], …, P_1[I_{j+1} − 1] and P_2[I_j], …, P_2[I_{j+1} − 1] are identical. Thus, the entire paths P_1 and P_2 are identical.

THEOREM 1.
Consider fixed block resampling of sequences of N points from a sequence of N_tot distinct observations. Let path P_1 be a bootstrap resampled path with a fixed blocksize of b_1 and path P_2 be a bootstrap resampled path with a fixed blocksize of b_2. Then the probability of P_1 and P_2 being identical is (1/N_tot)^{lcm(N/b_1, N/b_2)}, where lcm(a, b) is the least common multiple of the integers a and b.

Proof.
Let I denote the combined decision index list of P_1 and P_2, with N_cdi the total number of combined decision points and I_j denoting the j-th index within I. From Lemma 1, the two paths are identical if and only if P_1[I_j] = P_2[I_j] at every I_j, j = 1, …, N_cdi. For any j = 1, …, N_cdi, since each starting point of either P_1 or P_2 is chosen independently with equal probability,

P(P_1[I_j] = P_2[I_j]) = 1/N_tot.

In addition,

P(P_1[I_j] = P_2[I_j], j = 1, …, N_cdi(P_1, P_2)) = ∏_{j=1}^{N_cdi(P_1, P_2)} P(P_1[I_j] = P_2[I_j]) = (1/N_tot)^{N_cdi(P_1, P_2)}.

Since N_cdi(P_1, P_2) = lcm(N/b_1, N/b_2), the probability of P_1 and P_2 being identical is (1/N_tot)^{lcm(N/b_1, N/b_2)}.

Next, we consider stationary block bootstrap resampling, in which the blocksizes are randomly generated from a shifted geometric distribution.

Properties 1 (Properties of a Geometric Distribution). Suppose the integer m ≥ 1 is drawn from a shifted geometric distribution with E[m] = 1/p; then

P[m = k] = (1 − p)^{k−1} p,    P[m ≥ k] = (1 − p)^{k−1}.    (A.2)

We rewrite equation (A.2) in a form amenable to manipulation. Let

(1 − p) = e^{−λ},    (A.3)

so that equation (A.2) becomes

P[m = k] = e^{−λk}(e^{λ} − 1),    P[m ≥ k] = e^{−λ(k−1)},    λ = −log[1 − p].    (A.4)

Denote the expected blocksize by b̂; then in our case p = 1/b̂, and consequently

λ = −log[1 − 1/b̂].    (A.5)

Lemma 2.
Let [I_1, …, I_k] be the decision index list of a block resampled path of length N with expected blocksize b̂. Then the probability of the decision index list [I_1, …, I_k] occurring is e^{−λ(N−1)}(e^{λ} − 1)^{k−1}, with λ = −log[1 − 1/b̂].

Proof. By definition, I_{j+1} > I_j for any j = 1, …, k − 1, and I_{k+1} = N + 1. The probability of path P having [I_1, …, I_k] as its decision index list is equal to the probability of path P having a first block of blocksize I_2 − I_1, …, and a k-th block of blocksize I_{k+1} − I_k. Denote the blocks of path P by B_1, …, B_k. According to Properties 1,

P(blocksize(B_j) = I_{j+1} − I_j) = e^{−λ(I_{j+1} − I_j)}(e^{λ} − 1),  if j < k,
P(blocksize(B_j) ≥ I_{k+1} − I_k) = e^{−λ(I_{k+1} − I_k − 1)},  if j = k,

where the last block is only required to reach the end of the path, hence the use of P[m ≥ ·]. The probability of path P having [I_1, …, I_k] as its decision index list is therefore

∏_{j=1}^{k} P(blocksize(B_j)) = e^{−λ(I_{k+1} − I_1 − 1)}(e^{λ} − 1)^{k−1} = e^{−λ(N−1)}(e^{λ} − 1)^{k−1}.

Thus, the probability of a path P with expected blocksize b̂ having a given decision index list is uniquely determined by the expected blocksize b̂, the path length N, and the length k of the decision index list.

Lemma 3.
Suppose two paths P_1 and P_2 of length N are generated by stationary block bootstrap resampling with expected blocksizes b̂_1 and b̂_2 respectively. Then

P(N_cdi(P_1, P_2) = k) = C(N−1, k−1) e^{−(λ_1+λ_2)(N−1)} (e^{λ_1+λ_2} − 1)^{k−1},
λ_1 = −log[1 − 1/b̂_1];  λ_2 = −log[1 − 1/b̂_2],    (A.6)

where C(n, r) denotes the binomial coefficient "n choose r".

Proof.
Let f(b̂, n) denote the occurrence probability of a stationary block resampled path of length N with expected blocksize b̂ and a decision index list of length n; by Lemma 2, f(b̂, n) = e^{−λ(N−1)}(e^{λ} − 1)^{n−1}.

Suppose [I_1, …, I_k] is a combined index list of two paths P_1 and P_2. Let v be the number of overlapped indices (indices appearing in both decision index lists) and i be the number of non-overlapped indices belonging to P_1, corresponding to [I_1, …, I_k].

Enumerating the possible values of v and i, the probability of a combined decision index list [I_1, …, I_k] occurring equals

Σ_{v=1}^{k} [ C(k−1, v−1) Σ_{i=0}^{k−v} C(k−v, i) f(b̂_1, v+i) f(b̂_2, k−i) ].    (A.7)

Note that

Σ_{v=1}^{k} [ C(k−1, v−1) Σ_{i=0}^{k−v} C(k−v, i) f(b̂_1, v+i) f(b̂_2, k−i) ]
 = Σ_{v=1}^{k} [ C(k−1, v−1) Σ_{i=0}^{k−v} C(k−v, i) e^{−λ_1(N−1)}(e^{λ_1} − 1)^{v+i−1} e^{−λ_2(N−1)}(e^{λ_2} − 1)^{k−i−1} ]
 = e^{−(λ_1+λ_2)(N−1)} Σ_{v=1}^{k} [ C(k−1, v−1)(e^{λ_1+λ_2} − e^{λ_1} − e^{λ_2} + 1)^{v−1} Σ_{i=0}^{k−v} C(k−v, i)(e^{λ_1} − 1)^{i}(e^{λ_2} − 1)^{k−v−i} ]
 = e^{−(λ_1+λ_2)(N−1)} Σ_{v=1}^{k} [ C(k−1, v−1)(e^{λ_1+λ_2} − e^{λ_1} − e^{λ_2} + 1)^{v−1}(e^{λ_1} + e^{λ_2} − 2)^{k−v} ]
 = e^{−(λ_1+λ_2)(N−1)}(e^{λ_1+λ_2} − 1)^{k−1}.

Since there are C(N−1, k−1) combinations of the decision index list of length k, we conclude

P(N_cdi(P_1, P_2) = k) = C(N−1, k−1) e^{−(λ_1+λ_2)(N−1)}(e^{λ_1+λ_2} − 1)^{k−1}.

Using Lemma 1 and Lemma 3, we can now establish the probability that two paths generated by stationary block bootstrap resampling are identical.
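Before stating the theorem, the closed form (A.6) and the summation step used in the proof below can be sanity-checked numerically. The sketch uses illustrative parameters (N = 8, b̂_1 = 2, b̂_2 = 3, N_tot = 50) and helper names of our own: it confirms that (A.6) sums to one over k, matches a Monte Carlo estimate of the distribution of N_cdi, and that summing P(N_cdi = k)·(1/N_tot)^k over k reproduces the closed form of Theorem 2 below.

```python
import numpy as np
from math import comb

N, b1, b2, N_tot = 8, 2.0, 3.0, 50
lam1 = -np.log(1 - 1 / b1)                  # lambda_1 = -log(1 - 1/b_hat_1)
lam2 = -np.log(1 - 1 / b2)                  # lambda_2 = -log(1 - 1/b_hat_2)

def p_ncdi(k):
    """Closed-form P(N_cdi = k) from equation (A.6)."""
    return comb(N - 1, k - 1) * np.exp(-(lam1 + lam2) * (N - 1)) \
        * np.expm1(lam1 + lam2) ** (k - 1)

# (A.6) is a probability distribution over k = 1, ..., N:
# the binomial theorem collapses the sum to 1.
assert abs(sum(p_ncdi(k) for k in range(1, N + 1)) - 1.0) < 1e-12

def decision_index_set(b_hat, rng):
    """Block starting indices of one stationary-bootstrap path (Definition 1)."""
    indices, pos = [], 1
    while pos <= N:
        indices.append(pos)
        pos += rng.geometric(1 / b_hat)     # shifted geometric blocksize
    return set(indices)

# Monte Carlo check: merge the index lists of two independent paths
# (Definitions 2 and 3) and compare the distribution of N_cdi with (A.6).
rng = np.random.default_rng(3)
trials = 60_000
counts = np.zeros(N + 1)
for _ in range(trials):
    counts[len(decision_index_set(b1, rng) | decision_index_set(b2, rng))] += 1
for k in range(1, N + 1):
    assert abs(counts[k] / trials - p_ncdi(k)) < 0.01

# Summing P(N_cdi = k) * (1/N_tot)^k reproduces the closed form of Theorem 2.
p_sum = sum(p_ncdi(k) * N_tot ** (-k) for k in range(1, N + 1))
p_closed = (1 / N_tot) * ((1 - 1 / b1) * (1 - 1 / b2)
                          + (1 / b1 + 1 / b2 - 1 / (b1 * b2)) / N_tot) ** (N - 1)
assert np.isclose(p_sum, p_closed)
```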
THEOREM 2.
Let P_1 and P_2 be two paths of length N generated by stationary block bootstrap resampling from a sequence of N_tot distinct observations, with expected blocksizes b̂_1 and b̂_2 respectively. The probability of P_1 and P_2 being identical is

(1/N_tot) [ (1 − 1/b̂_1)(1 − 1/b̂_2) + (1/b̂_1 + 1/b̂_2 − 1/(b̂_1 b̂_2)) / N_tot ]^{N−1}.

Proof.
Using Lemma 1, P_1 = P_2 if and only if the observations from P_1 and P_2 are equal at each index in the combined decision index list. Thus

P(P_1 = P_2 | N_cdi(P_1, P_2) = k) = (1/N_tot)^k.

Additionally, following Lemma 3, we have

P(P_1 = P_2) = Σ_{k=1}^{N} P(N_cdi(P_1, P_2) = k) · P(P_1 = P_2 | N_cdi(P_1, P_2) = k)
 = Σ_{k=1}^{N} C(N−1, k−1) e^{−(λ_1+λ_2)(N−1)} (e^{λ_1+λ_2} − 1)^{k−1} (1/N_tot)^k
 = (e^{−(λ_1+λ_2)(N−1)} / N_tot) Σ_{k=1}^{N} C(N−1, k−1) ((e^{λ_1+λ_2} − 1)/N_tot)^{k−1}
 = (e^{−(λ_1+λ_2)(N−1)} / N_tot) (1 + (e^{λ_1+λ_2} − 1)/N_tot)^{N−1}
 = (1/N_tot) (e^{−(λ_1+λ_2)} + (1 − e^{−(λ_1+λ_2)})/N_tot)^{N−1}
 = (1/N_tot) [ (1 − 1/b̂_1)(1 − 1/b̂_2) + (1/b̂_1 + 1/b̂_2 − 1/(b̂_1 b̂_2))/N_tot ]^{N−1}.

A.2 Additional Robustness Testing Results
As mentioned in Section 4.3, we only showed terminal wealth statistics for the strategy trained on bootstrap resampled data with expected blocksize b̂ = 0.5 years; here we report the corresponding test results for the strategies trained with the other expected blocksizes.

Test Results: Market Cap Weighted

| Test blocksize | Strategy | E[W_T] | std[W_T] | median[W_T] | Pr(W_T < median(W_T^CP)) | Pr(W_T < median(W_T^NN)) |
|---|---|---|---|---|---|---|
| b̂ = 0.5 years | constant proportion (p = 0.5) | 678 | 286 | 623.07 | 0.50 | 0.81 |
| b̂ = 0.5 years | NN adaptive | 949 | 478 | 874.84 | 0.27 | 0.50 |
| b̂ = 1 year | constant proportion (p = 0.5) | 674 | 273 | 623.99 | 0.50 | 0.81 |
| b̂ = 1 year | NN adaptive | 942 | 459 | 878.60 | 0.27 | 0.50 |
| b̂ = 2 years | constant proportion (p = 0.5) | 676 | 263 | 631.06 | 0.50 | 0.81 |
| b̂ = 2 years | NN adaptive | 945 | 438 | 882.74 | 0.26 | 0.50 |
| b̂ = 5 years | constant proportion (p = 0.5) | 669 | 244 | 626.11 | 0.50 | 0.83 |
| b̂ = 5 years | NN adaptive | 940 | 404 | 881.87 | 0.23 | 0.50 |
| b̂ = 8 years | constant proportion (p = 0.5) | 669 | 233 | 632.24 | 0.50 | 0.84 |
| b̂ = 8 years | NN adaptive | 945 | 388 | 892.84 | 0.22 | 0.50 |
| b̂ = 10 years | constant proportion (p = 0.5) | 667 | 223 | 635.29 | 0.50 | 0.85 |
| b̂ = 10 years | NN adaptive | 942 | 373 | 895.88 | 0.22 | 0.50 |
Table A.1: Trained on bootstrap resampled data with b̂ = 1 year

Test Results: Market Cap Weighted

| Test blocksize | Strategy | E[W_T] | std[W_T] | median[W_T] | Pr(W_T < median(W_T^CP)) | Pr(W_T < median(W_T^NN)) |
|---|---|---|---|---|---|---|
| b̂ = 0.5 years | constant proportion (p = 0.5) | 678 | 286 | 623.07 | 0.50 | 0.83 |
| b̂ = 0.5 years | NN adaptive | 962 | 491 | 903.07 | 0.27 | 0.50 |
| b̂ = 1 year | constant proportion (p = 0.5) | 674 | 273 | 623.99 | 0.50 | 0.83 |
| b̂ = 1 year | NN adaptive | 954 | 470 | 905.02 | 0.27 | 0.50 |
| b̂ = 2 years | constant proportion (p = 0.5) | 676 | 263 | 631.06 | 0.50 | 0.84 |
| b̂ = 2 years | NN adaptive | 958 | 446 | 912.31 | 0.26 | 0.50 |
| b̂ = 5 years | constant proportion (p = 0.5) | 669 | 244 | 626.11 | 0.50 | 0.85 |
| b̂ = 5 years | NN adaptive | 954 | 409 | 914.34 | 0.23 | 0.50 |
| b̂ = 8 years | constant proportion (p = 0.5) | 669 | 233 | 632.24 | 0.50 | 0.87 |
| b̂ = 8 years | NN adaptive | 961 | 392 | 928.89 | 0.22 | 0.50 |
| b̂ = 10 years | constant proportion (p = 0.5) | 667 | 223 | 635.29 | 0.50 | 0.88 |
| b̂ = 10 years | NN adaptive | 961 | 380 | 930.15 | 0.21 | 0.50 |
Table A.2: Trained on bootstrap resampled data with b̂ = 2 years

Test Results: Market Cap Weighted

| Test blocksize | Strategy | E[W_T] | std[W_T] | median[W_T] | Pr(W_T < median(W_T^CP)) | Pr(W_T < median(W_T^NN)) |
|---|---|---|---|---|---|---|
| b̂ = 0.5 years | constant proportion (p = 0.5) | 678 | 286 | 623.07 | 0.50 | 0.86 |
| b̂ = 0.5 years | NN adaptive | 995 | 495 | 963.03 | 0.26 | 0.50 |
| b̂ = 1 year | constant proportion (p = 0.5) | 674 | 273 | 623.99 | 0.50 | 0.87 |
| b̂ = 1 year | NN adaptive | 988 | 478 | 963.28 | 0.25 | 0.50 |
| b̂ = 2 years | constant proportion (p = 0.5) | 676 | 263 | 631.06 | 0.50 | 0.88 |
| b̂ = 2 years | NN adaptive | 994 | 458 | 973.65 | 0.25 | 0.50 |
| b̂ = 5 years | constant proportion (p = 0.5) | 669 | 244 | 626.11 | 0.50 | 0.89 |
| b̂ = 5 years | NN adaptive | 997 | 427 | 976.51 | 0.22 | 0.50 |
| b̂ = 8 years | constant proportion (p = 0.5) | 669 | 233 | 632.24 | 0.50 | 0.90 |
| b̂ = 8 years | NN adaptive | 1011 | 415 | 993.88 | 0.21 | 0.50 |
| b̂ = 10 years | constant proportion (p = 0.5) | 667 | 223 | 635.29 | 0.50 | 0.92 |
| b̂ = 10 years | NN adaptive | 1015 | 409 | 996.57 | 0.20 | 0.50 |
Table A.3: Trained on bootstrap resampled data with b̂ = 5 years

Test Results: Market Cap Weighted

| Test blocksize | Strategy | E[W_T] | std[W_T] | median[W_T] | Pr(W_T < median(W_T^CP)) | Pr(W_T < median(W_T^NN)) |
|---|---|---|---|---|---|---|
| b̂ = 0.5 years | constant proportion (p = 0.5) | 678 | 286 | 623.07 | 0.50 | 0.86 |
| b̂ = 0.5 years | NN adaptive | 980 | 480 | 945.12 | 0.25 | 0.50 |
| b̂ = 1 year | constant proportion (p = 0.5) | 674 | 273 | 623.99 | 0.50 | 0.86 |
| b̂ = 1 year | NN adaptive | 973 | 464 | 947.99 | 0.25 | 0.50 |
| b̂ = 2 years | constant proportion (p = 0.5) | 676 | 263 | 631.06 | 0.50 | 0.87 |
| b̂ = 2 years | NN adaptive | 979 | 443 | 957.32 | 0.25 | 0.50 |
| b̂ = 5 years | constant proportion (p = 0.5) | 669 | 244 | 626.11 | 0.50 | 0.88 |
| b̂ = 5 years | NN adaptive | 981 | 412 | 959.86 | 0.21 | 0.50 |
| b̂ = 8 years | constant proportion (p = 0.5) | 669 | 233 | 632.24 | 0.50 | 0.90 |
| b̂ = 8 years | NN adaptive | 994 | 399 | 976.44 | 0.21 | 0.50 |
| b̂ = 10 years | constant proportion (p = 0.5) | 667 | 223 | 635.29 | 0.50 | 0.91 |
| b̂ = 10 years | NN adaptive | 996 | 390 | 980.07 | 0.20 | 0.50 |
Table A.4: Trained on bootstrap resampled data with b̂ = 8 years

Test Results: Market Cap Weighted

| Test blocksize | Strategy | E[W_T] | std[W_T] | median[W_T] | Pr(W_T < median(W_T^CP)) | Pr(W_T < median(W_T^NN)) |
|---|---|---|---|---|---|---|
| b̂ = 0.5 years | constant proportion (p = 0.5) | 678 | 286 | 623.07 | 0.50 | 0.84 |
| b̂ = 0.5 years | NN adaptive | 963 | 468 | 920.86 | 0.25 | 0.50 |
| b̂ = 1 year | constant proportion (p = 0.5) | 674 | 273 | 623.99 | 0.50 | 0.84 |
| b̂ = 1 year | NN adaptive | 957 | 451 | 923.63 | 0.25 | 0.50 |
| b̂ = 2 years | constant proportion (p = 0.5) | 676 | 263 | 631.06 | 0.50 | 0.85 |
| b̂ = 2 years | NN adaptive | 962 | 431 | 932.13 | 0.25 | 0.50 |
| b̂ = 5 years | constant proportion (p = 0.5) | 669 | 244 | 626.11 | 0.50 | 0.87 |
| b̂ = 5 years | NN adaptive | 962 | 399 | 937.08 | 0.22 | 0.50 |
| b̂ = 8 years | constant proportion (p = 0.5) | 669 | 233 | 632.24 | 0.50 | 0.88 |
| b̂ = 8 years | NN adaptive | 973 | 384 | 951.40 | 0.21 | 0.50 |
| b̂ = 10 years | constant proportion (p = 0.5) | 667 | 223 | 635.29 | 0.50 | 0.90 |
| b̂ = 10 years | NN adaptive | 973 | 373 | 954.63 | 0.20 | 0.50 |
Table A.5: Trained on bootstrap resampled data with b̂ = 10 years

A.3 Robustness: Distribution Comparison Based on Test Results From the Synthetic Model
We observe from Figure A.1 that the terminal wealth distributions of the adaptive strategy are consistently right-skewed and have similar shapes in training and testing, which indicates that the NN strategy outperforms the constant proportion strategy similarly in both training and testing.

Figure A.1: Histogram of terminal wealth. Model trained on synthetic data and tested on bootstrap resampled data with expected blocksize of 2 years. (a) Training on synthetic data; (b) testing on bootstrap resampled data with b̂ = 0.5 years.

We also show the plot of the CDF of the terminal wealth difference W(T) − W_CP(T), where W(T) is the terminal wealth of the NN adaptive strategy and W_CP(T) that of the constant proportion strategy, to give a more direct comparison between the two strategies on the same paths.

From Figure A.2 we can see that the probability of the adaptive strategy underperforming the constant proportion strategy is less than 10% for both training and testing. When underperformance occurs, the scale of the underperformance is small compared to the scale of the potential outperformance. Therefore, we conclude that the adaptive strategy controls tail risks consistently in both training and testing, despite the fact that the training dataset is synthetically generated and the testing dataset is bootstrap resampled.

Figure A.2: CDF of terminal wealth difference W(T) − W_CP(T). (a) Training on synthetic data; (b) testing on bootstrap data with b̂ = 2 years.

References
Al-Aradi, A. and S. Jaimungal (2018). Outperformance and tracking: dynamic asset allocation for active and passive portfolio management. Applied Mathematical Finance 25(3), 268–294.

Alekseev, A. G. and M. V. Sokolov (2016). Benchmark-based evaluation of portfolio performance: a characterization. Annals of Finance 12, 409–440.

Alexander, S., T. F. Coleman, and Y. Li (2006). Minimizing CVaR and VaR for a portfolio of derivatives. Journal of Banking & Finance 30(2), 583–605.

Basak, S., A. Shapiro, and L. Tepla (2006). Risk management with benchmarking. Management Science 52(4), 542–557.

Browne, S. (1999). Beating a moving target: optimal portfolio strategies for outperforming a stochastic benchmark. Finance and Stochastics 3, 275–294.

Browne, S. (2000). Risk-constrained dynamic active portfolio management. Management Science 46(9), 1188–1199.

Coleman, T. F. and Y. Li (1996). An interior, trust region approach for nonlinear minimization subject to bounds. SIAM Journal on Optimization 6, 418–445.

Cont, R. and C. Mancini (2011). Nonparametric tests for pathwise properties of semimartingales. Bernoulli 17(2), 781–813.

Dang, D.-M. and P. A. Forsyth (2014). Continuous time mean-variance optimal portfolio allocation under jump diffusion: a numerical impulse control approach. Numerical Methods for Partial Differential Equations 30, 664–698.

Dang, D.-M. and P. A. Forsyth (2016). Better than pre-commitment mean-variance portfolio allocation strategies: A semi-self-financing Hamilton–Jacobi–Bellman equation approach. European Journal of Operational Research 250(3), 827–841.

Davis, M. and S. Lleo (2008). Risk-sensitive benchmarked asset management. Quantitative Finance 8(4), 415–426.

Forsyth, P. and K. Vetzal (2019). Optimal asset allocation for retirement savings: Deterministic vs. time consistent adaptive strategies. Applied Mathematical Finance 26(1), 1–37.

Graf, S. (2017). Life-cycle funds: Much ado about nothing? European Journal of Finance 23, 974–998.

Graham, B. (2003). The Intelligent Investor. New York: HarperCollins. Revised edition, foreword by J. Zweig.

Gu, S., R. Kelly, and D. Xu (2018). Empirical asset pricing via machine learning. SSRN:3159577.

Hejazi, S. and K. R. Jackson (2016). A neural network approach to efficient valuation of large portfolios of variable annuities. Insurance: Mathematics and Economics 70, 169–181.

Kou, S. G. (2002). A jump-diffusion model for option pricing. Management Science 48(8), 1086–1101.

Kou, S. G. and H. Wang (2004). Option pricing under a double exponential jump diffusion model. Management Science 50(9), 1178–1192.

Lhabitant, F.-S. (2000). Derivatives in portfolio management: Why beating the market is easy. EDHEC Working Paper.

Li, Y. and P. A. Forsyth (2019). A data-driven neural network approach to optimal asset allocation for target based defined contribution pension plans. Insurance: Mathematics and Economics 86, 189–204.

Lim, A. and B. Wong (2010). A benchmark approach to optimal asset allocation for insurers and pension funds. Insurance: Mathematics and Economics 46(2), 317–327.

Oderda, G. (2015). Stochastic portfolio theory optimization and the origin of rule based investing. Quantitative Finance 15(8), 1259–1266.

Patton, A., D. N. Politis, and H. White (2009). Correction to "Automatic block-length selection for the dependent bootstrap" by D. Politis and H. White. Econometric Reviews 28(4), 372–375.

Politis, D. N. and J. P. Romano (1994). The stationary bootstrap. Journal of the American Statistical Association 89(428), 1303–1313.

Politis, D. N. and H. White (2004). Automatic block-length selection for the dependent bootstrap. Econometric Reviews 23(1), 53–70.

Samo, Y.-L. K. and A. Vervuurt (2016). Stochastic portfolio theory: a machine learning perspective. ArXiv:1605.02654.

Tepla, L. (2001). Optimal investment with minimum performance constraints.