[PDF] Numerical Solution of Dynamic Portfolio Optimization with Transaction Costs

Abstract

We apply numerical dynamic programming techniques to solve discrete-time multi-asset dynamic portfolio optimization problems with proportional transaction costs and shorting/borrowing constraints. Examples include problems with multiple assets, and many trading periods in a finite horizon problem. We also solve dynamic stochastic problems, with a portfolio including one risk-free asset, an option, and its underlying risky asset, under the existence of transaction costs and constraints. These examples show that it is now tractable to solve such problems.

Full PDF

aa r X i v : . [ q -f i n . P M ] M a r Numerical Solution of Dynamic Portfolio Optimizationwith Transaction Costs

Yongyang Cai Kenneth L. Judd Rong Xu ∗ March 5, 2020

Abstract

We apply numerical dynamic programming techniques to solve discrete-time multi-asset dynamic portfolio optimization problems with proportional transaction costs andshorting/borrowing constraints. Examples include problems with multiple assets, andmany trading periods in a ﬁnite horizon problem. We also solve dynamic stochasticproblems, with a portfolio including one risk-free asset, an option, and its underlyingrisky asset, under the existence of transaction costs and constraints. These examplesshow that it is now tractable to solve such problems.

Keywords : Numerical dynamic programming, dynamic portfolio optimization, transac-tion cost, no-trade region, option hedging, Epstein-Zin preferences

JEL Classiﬁcation:

C61, C63, G11 ∗ Cai (corresponding author), [email protected], The Ohio State University; Judd, [email protected],Hoover Institution & NBER; Xu, [email protected]. We thank Gerd Infanger, Sunil Kumar, WalterMurray, Michael Saunders, Karl Schmedders, Benjamin Van Roy, and participants in the SITE SummerWorkshop 2014 at Stanford University for their helpful comments. Cai acknowledges support from the HooverInstitution at Stanford University. This research is part of the Blue Waters sustained-petascale computingproject, which is supported by the National Science Foundation (awards OCI-0725070 and ACI-1238993) andthe State of Illinois. Blue Waters is a joint eﬀort of the University of Illinois at Urbana-Champaign and itsNational Center for Supercomputing Applications. This research was also supported in part by NIH throughresources provided by the Computation Institute and the Biological Sciences Division of the University ofChicago and Argonne National Laboratory, under grant 1S10OD018495-01. We give special thanks to theHTCondor team of the University of Wisconsin-Madison for their support. Introduction

Dynamic management of portfolios is a critical part of any investment strategy by individualsand ﬁrms. Multi-stage portfolio optimization problems assume that there are k risky assetsand/or a risk-less asset (“bank account” paying a ﬁxed interest rate) traded during a timehorizon [0 , T ] , and portfolio adjustments are made at N ﬁxed times in the interval [0 , T ] .Trades are made to maximize the investor’s expected utility over terminal wealth ( T is theterminal time) and/or consumption during [0 , T ] .Standard theory assumes that there is no cost to rebalancing a portfolio, but transactioncosts are not negligible in real markets. Not only are there transaction fees such as brokerageexpenses, but also the presence of a bid-ask spread creates a transaction cost for a trader.In the modern U.S. stock market, the bid-ask spread is becoming smaller and smaller, butvaries across stock prices. If stock prices are high, then the spread could be less than 0.01%,but if stock prices are low, then the spread could be more than 0.1%. Moreover, if riskyassets for trading are not stocks, then they may have high transaction costs. For example,if an investor wants to exchange two currencies such as USD and CNY, then transactioncosts could be more than 1%. Furthermore, if some risky assets such as housing are not veryliquid, then transaction costs could be even higher.In standard ﬁnance theory, the pricing of an option is dependent on the assumptions thatits underlying risky asset can be traded at any continuous time without transaction costs andthere are no constraints in shorting or borrowing, so that the option can be replicated by theunderlying risky asset and one safe asset. However, while the transaction costs may be smallin one-time trading, the frequency of rebalancing is theoretically high, so the total transactioncost will be high. Moreover, investors face constraints in shorting risky assets or borrowingcash: they have limits on trading amounts, they have to pay higher costs for shorting riskyassets, and they have to pay higher interest rates for borrowing cash than what they receivefrom saving the same amount of cash in a bank account. Therefore, any examination ofreal-world dynamic portfolio management needs to consider these frictions/constraints.2n this paper, we ﬁrst examine multi-stage portfolio optimization problems with propor-tional transaction costs but no options. If the major transaction cost is the bid-ask spread,then a proportional transaction cost is the correct case to study. For simplicity, we assumethat neither shorting risky assets nor borrowing cash is allowed . In fact, if we assume thatnon-positive wealth has negative inﬁnite utility (i.e., power utility) and the logarithm of re-turns of risky assets have inﬁnite support in both the low and high end, then the no-shortingand no-borrowing constraints will hold automatically in discrete-time analysis. We also solvethese problems with stochastic parameters.However, in many cases, investors would like to reduce or control the risk and amount ofloss in investment strategies. One way is to add put options into the set of portfolio assets asa hedging strategy. A put option is a ﬁnancial contract that gives its holder the right to sella certain amount of an underlying asset at a speciﬁed price (strike price) within a speciﬁedtime (time to maturity, or expiration time). Investors may also want to include call optionsin their portfolio. A call option is a ﬁnancial contract that gives its holder the right to buya certain amount of an underlying asset at a speciﬁed price (strike price) within a speciﬁedtime (expiration time). In this paper, we always assume that the options are of Europeantype, that is, the options can only be exercised at the expiration time (but they can still betraded before the expiration time).Multi-stage portfolio optimization problems with transaction costs have been studied inmany papers. The problem with one risky asset has been well studied; see, e.g., Zabel (1973),Constantinides (1976, 1986), Gennotte and Jung (1994), and Boyle and Lin (1997). The keyinsight is that transaction costs create a “no-trade region” (NTR); that is, no trading is doneif the current portfolio is inside the no-trade region, otherwise the investor trades to somepoint on the boundary of the no-trade region. Kamin (1975) considers the case with onlytwo risky assets. Constantinides (1979) and Abrams and Karmarkar (1980) establish someproperties of the NTR for multiple assets, but only present numerical examples with one safeand one risky asset. Brown and Smith (2011) evaluate some heuristic strategies and their3ounds based on simulation, but their method cannot give the optimal portfolios.In the continuous-time version, there are many papers about the portfolio optimizationproblem with transaction costs with one or two risky assets; see, e.g., Davis and Norman(1990), Duﬃe and Sun (1990), Akian et al. (1996), Oksendal and Sulem (2002), Janecek andShreve (2004), Liu (2004), Goodman and Ostrov (2010), and Baccarin and Marazzina (2014,2016). Muthuraman and Kumar (2006, 2008) give numerical examples for at most threerisky assets. Muthuraman and Zha (2008) provide a computational scheme that combinessimulation with the boundary update procedure, and present some computational resultswith k ≥ . However, the presence of simulation implies that the boundary of NTR canonly be roughly approximated. More importantly, it is challenging for these continuous-timemethods to solve problems with constraints and stochastic parameters.To the best of our knowledge, when the number of correlated risky assets is bigger thanthree and the number of periods is larger than ﬁve, our numerical dynamic programming(DP) method is the ﬁrst to explicitly give good numerical solutions with transaction costsand general utility functions. Moreover, our DP method can be extended with the recentadvances in sparse grid interpolation (see e.g., Judd et al. (2014) and Brumm and Scheidegger(2017)), to solve even larger dynamic portfolio problems.Moreover, this is the ﬁrst paper to solve the dynamic portfolio optimization problemswith an option and its underlying asset in the portfolio when transaction costs and borrow-ing/shorting constraints are present. For simplicity, when there is an option in the portfolio,we assume that the portfolio has one riskless asset paying interest, one option, and its un-derlying risky asset. It is simple to extend our problem to cases with multiple risky assetsand/or multiple put/call options with diﬀerent expiration times and/or strike prices, or someother derivatives. We assume that the prices of options are given by some brokerage agencies,which use the pricing formulas in standard ﬁnance theory without consideration of transac-tion costs and constraints. However, investors have to pay the transaction costs and face theconstraints in shorting or borrowing. We also study the impact of Epstein–Zin preferences4Epstein and Zin, 1989) on the optimal allocations.The paper is organized as follows. Section 2 outlines the portfolio models without optionsand presents their DP formulation. Section 3 shows the numerical results for exampleswithout options. Section 4 reviews the pricing formulas for options in standard ﬁnancetheory. Section 5 introduces the DP model of the portfolio optimization problems withoptions, and also extends it to a long-horizon portfolio optimization problem with Epstein-Zin preferences in which the trading horizon is longer than the expiration time of options.Section 6 shows the numerical results of examples with options. Section 7 concludes. Assume that there are k risky assets and one risk-less asset available for investment. Theinvestor can reallocate the portfolio at N periods in [0 , T ] : t < t < · · · < t N − < t N = T , which will incur transaction costs. For simplicity, we assume that t j are equally separatedwith a length of time ∆ t = T /N . Let R = ( R , . . . , R k ) ⊤ be the random one-period grossreturn vector of the risky assets, and R f = exp( r ∆ t ) be the one-period return of the risk-freeasset, where r is the annual interest rate. In real-life models, the risk-less gross return R f and the multivariate distribution of risky asset returns are stochastic and serially-correlated.Assume that they are dependent on some stochastic parameters such as the interest rate r ,the drift µ , and the volatility σ of the risky returns. Let all these parameters be denoted as avector θ t at time t . They could be discrete Markov chains with a given transition probabilitymatrix from the previous stage to the current stage, or continuously distributed, conditionalon their previous-stage values. For simplicity, in this paper we assume that θ t is a discreteMarkov chain, and then the transition law of θ t can be denoted by θ t +∆ t = H t ( θ t , ǫ t ) , (1)where ǫ t are identically and independently distributed random variables.5he portfolio fraction for asset i at time t (we will always use t to mean a time t j which is at the beginning of period j right before reallocation) is denoted x t,i , and let x t = ( x t, , . . . , x t,k ) ⊤ . Let W t be the total wealth at time t right before reallocation (itwill be changed after reallocation due to the existence of transaction costs in reallocation),which is the sum of the values of assets. The diﬀerence between the total wealth andthe wealth invested in risky assets is invested in the risk-free asset. Let δ t,i W t denote theamount of dollars for buying or selling part of the i -th risky asset at time t , expressed asa fraction of wealth, where δ t,i > means buying, and δ t,i < means selling. We assumethat f ( δ t,i W t ) = τ | δ t,i W t | with a constant τ > is the transaction cost function for buyingor selling part of the i -th risky asset using δ t,i W t dollars. Thus, right after reallocation,the total wealth becomes W t − P i f ( δ t,i W t ) . The absolute operator creates a kink whichshould be avoided in optimization problems. Therefore, for computational purposes, we let δ t,i = δ + t,i − δ − t,i with δ + t,i , δ − t,i ≥ ; we know | δ t,i | ≤ δ + t,i + δ − t,i but the optimal solution will makeeither δ + t,i or δ − t,i to be zero as the investor will not buy and sell the same risky asset at thesame time. That is, in the optimal solution, δ + t = ( δ + t, , . . . , δ + t,k ) ⊤ is the vector of fractions ofwealth W t for buying risky assets, and δ − t = ( δ − t, , . . . , δ − t,k ) ⊤ is the vector of fractions of wealth W t for selling risky assets, so | δ t | = δ + t + δ − t , where the absolute operator is component-wise.Let e denote the column vector of all elements equal to 1. The law of transition of thetotal wealth is W t +∆ t = e ⊤ X t +∆ t + R f (1 − e ⊤ x t − y t ) W t , (2)where X t +∆ t = ( X t +∆ t, , . . . , X t +∆ t,k ) ⊤ is the vector of the amount of dollars invested in therisky assets at time t + ∆ t , i.e., X t +∆ t,i = R i ( x t,i + δ + t,i − δ − t,i ) W t , (3)and y t = e ⊤ ( δ + t − δ − t + τ ( δ + t + δ − t )) (4)6s the fraction of wealth in buying or selling risky assets with δ t = δ + t − δ − t and associatedtransaction costs. The next-period portfolio fraction for asset i becomes x t +∆ t,i = X t +∆ t,i /W t +∆ t (5)for i = 1 , ..., k .Since we do not allow shorting risky assets, we should have the constraint x t + δ + t − δ − t ≥ , (6)which means that every element of the vector ( x t + δ + t − δ − t ) is nonnegative. The no-borrowingconstraint requires − e ⊤ x t ≥ y t . (7)The investor’s objective is to maximize expected utility at the terminal time T . Thus,the multi-stage portfolio optimization problem can be expressed as V ( W , x , θ ) = max δ + t ,δ − t ≥ E { U ( W T ) } , (8)subject to the constraints (1)-(7), where E {·} is the expectation operator, U is the utilityfunction, and δ + t and δ − t are element-wise nonnegative.We may allow assets to ﬁnance consumption during the investment period. Let the totalconsumption during period t be c t W t ∆ t , where c t ∆ t is the fraction of wealth W t , and c t W t isthe annual consumption rate in the period. Then the transition law of total wealth becomes W t +∆ t = e ⊤ X t +∆ t + R f (1 − e ⊤ x t − y t ) W t − c t W t ∆ t. (9)Let β = exp( − ρ ∆ t ) be the one-period discount factor and a terminal value function V T isgiven. The dynamic portfolio optimization problem is to ﬁnd optimal transaction ( δ + t , δ − t ) c t such that we have the maximal expected total utility, i.e., V ( W , x , θ ) = max δ + t ,δ − t ,c t ≥ E ( N − X j =0 β j U ( c t j W t j )∆ t + β N V T ( W T , x T , θ T ) ) , (10)subject to the constraints (1), (3)-(6), (9), and the following new no-borrowing constraint − e ⊤ x t − c t ∆ t ≥ y t . (11)In the objective of the maximization problem (10), we choose U ( c t j W t j )∆ t instead of U ( c t j W t j ∆ t ) because (i) when ∆ t is very small, as for CRRA (constant relative risk aversion) utility func-tions we use later, U ( c t j W t j ∆ t ) will make no sense as the marginal utility goes to −∞ when ∆ t → , and (ii) N − P j =0 β j U ( c t j W t j )∆ t is an approximation of the integrand: Z T e − ρt U ( c t W t ) dt which is standard in continuous time models, whereas N − P j =0 β j U ( c t j W t j ∆ t ) cannot approximatethe integrand.For dynamic portfolio problems with transaction costs, the continuous state variables arethe total wealth W t and allocation fractions x t invested in the risky assets. Here W t and x t are the values right before reallocation at time t . Thus, the Bellman equation (Bellman,1957) of the no-consumption model (8) is V t ( W t , x t , θ t ) = max δ + t ,δ − t ≥ E t { V t +∆ t ( W t +∆ t , x t +∆ t , θ t +∆ t ) } , (12)subject to the constraints (2)-(7), where E t {·} is the expectation operator conditional on theinformation at time t . The terminal value function is V T ( W, x , θ ) = U ( W ) for some givenutility function U . 8he Bellman equation of the with-consumption model (10) is V t ( W t , x t , θ t ) = max δ + t ,δ − t ,c t ≥ U ( c t W t )∆ t + β E t { V t +∆ t ( W t +∆ t , x t +∆ t , θ t +∆ t ) } , (13)subject to the constraints (9), (3)-(6), and (11). For the model (13), we assume that all riskyassets have to be converted into the risk-less asset at the terminal time so wealth at time T becomes (1 − τ e ⊤ x ) W T (where W T is the wealth right before conversion). We also assumethat afterwards the investor will save this wealth in a bank account and always consume onlythe interest from saving, i.e., r (1 − τ e ⊤ x ) W T for all t ≥ T , so the terminal value function is V T ( W, x , θ ) = ∞ X j = N β j − N U ( r (1 − τ e ⊤ x ) W )∆ t = U ( r (1 − τ e ⊤ x ) W )∆ t − β , (14)assuming that the investor lives forever. In economics and ﬁnance, we usually assume a CRRA utility function, i.e., U ( W ) = W − γ / (1 − γ ) for some constant γ > and γ = 1 , or U ( W ) = log( W ) for γ = 1 . Thus, for U ( W ) = W − γ / (1 − γ ) , if we assume that V t +∆ t ( W t +∆ t , x t +∆ t , θ t +∆ t ) = W − γt +∆ t · G t +∆ t ( x t +∆ t , θ t +∆ t ) ,then the no-consumption model (12) can be transformed as G t ( x t , θ t ) = max δ + t ,δ − t ≥ E t (cid:8) Π − γt +∆ t · G t +∆ t ( x t +∆ t , θ t +∆ t ) (cid:9) , (15)subject to s t +∆ t,i ≡ R i ( x t,i + δ + t,i − δ − t,i ) , (16) Π t +∆ t ≡ e ⊤ s t +∆ t + R f (1 − e ⊤ x t − y t ) , (17) x t +∆ t,i ≡ s t +∆ t,i / Π t +∆ t , (18)9nd the constraints (4) and (6)-(7). Moreover, W t +∆ t = Π t +∆ t W t , and V t ( W t , x t , θ t ) = W − γt · G t ( x t , θ t ) by induction, for any time t = t , t , . . . , t N , while G T ( x , θ ) ≡ / (1 − γ ) .Similarly, for u ( W ) = log( W ) , we can also separate W t and ( x t , θ t ) in the value function V t ( W t , x t , θ t ) .Since W t and ( x t , θ t ) are separable using the CRRA utility function, we could just do abackward recursion on the functions G t ( x , θ ) instead of V t ( W, x , θ ) , so the optimal portfoliorules are independent of wealth W . This transformation makes computation much easier,because it not only saves one state variable W t but also it avoids the exponentially expandingdomains of W t over time t .We know that there is a “no-trade region” (NTR), Ω t , for any t = t , t , . . . , t N − . When x t ∈ Ω t , the investor will not trade at all, and when x t / ∈ Ω t , the investor will trade to somepoint on the boundary of Ω t . That is, Ω t is deﬁned as Ω t = { x t : ( δ + t ) ∗ = ( δ − t ) ∗ = 0 } , where ( δ + t ) ∗ ≥ are fractions of wealth for buying risky assets, and ( δ − t ) ∗ ≥ are fractions ofwealth for selling risky assets, for a given ( x t , θ t ) . With a CRRA utility function, the NTR Ω t is independent of W t because of the separability of W t and ( x t , θ t ) .Abrams and Karmarkar (1980) show that the NTR is a connected set and that it is acone when the utility function is assumed to be positively homogeneous (a function U ( x ) is positively homogeneous if there exists a positive value function ψ ( x ) such that U ( ax ) = ψ ( a ) U ( x ) for any a > ). Moreover, in the case of proportional transaction costs and concaveutility functions, the NTR can take on many forms ranging from a simple half-line to a non-convex set. For further discussion, see Kamin (1975), Constantinides (1976, 1979, 1986),Davis and Norman (1990), and Muthuraman and Kumar (2006), among other papers.10 .2 Portfolio with Transaction Costs and Consumption We may allow assets to ﬁnance consumption during the investment period. As we discussedin Section 2.1, if the utility function is U ( c ) = c − γ / (1 − γ ) with γ > and γ = 1 , and theterminal value function is V T ( W, x , θ ) = W − γ · G T ( x , θ ) for some given G T ( x , θ ) , then wehave V t ( W t , x t , θ t ) = W − γt · G t ( x t , θ t ) , and G t ( x t , θ t ) = max c t ,δ + t ,δ − t ≥ U ( c t )∆ t + β E t (cid:8) Π − γt +∆ t · G t +∆ t ( x t +∆ t , θ t +∆ t ) (cid:9) , (19)subject to Π t +∆ t ≡ e ⊤ s t +∆ t + R f (1 − e ⊤ x t − y t − c t ∆ t ) , − e ⊤ x t − c t ∆ t ≥ y t , and the constraints (4), (1), (6), (16), and (18). From the terminal value function (14), wehave G T ( x , θ ) := ( r (1 − τ e ⊤ x )) − γ ∆ t (1 − γ )(1 − β ) . (20)Similarly, we can also have the separability of W and ( x , θ ) when U ( c ) = log( c ) and V T ( W, x , θ ) = log( W ) + G T ( x , θ ) . Here, the no-trade region Ω t is deﬁned as Ω t = { x t / (1 − c ∗ t ∆ t ) : ( δ + t ) ∗ = ( δ − t ) ∗ = 0 } , where c ∗ t ∆ t is the optimal fraction of wealth for consumption, ( δ + t ) ∗ ≥ are optimal fractionsof wealth for buying risky assets, and ( δ − t ) ∗ ≥ are optimal fractions of wealth for sellingrisky assets, for a given ( x t , θ t ) . Since we do not allow shorting risky assets or borrowing cash, the domain of x t is a simplex: { x ∈ R k + : P ki =1 x i ≤ } . However, for computational convenience, we choose the hypercube11 , k as the approximation domain of x t . That is, we assume that at every period rightbefore transactions, negative cash can exist, but after transactions cash must be nonnegativeso some risky assets must be sold to cover the negative cash.In our numerical DP method, we choose degree- d complete Chebyshev polynomials toapproximate value functions G t ( x , θ ) (we approximate V t ( W t , x t , θ t ) for problems withoutCRRA utility; but in this paper all examples use CRRA utility), and we always use ( d + 1) k tensor grids of Chebyshev nodes (i.e., each state dimension has d + 1 Chebyshev nodes)as approximation nodes on the hypercube approximation domain [0 , k for constructingChebyshev coeﬃcients using the Chebyshev regression algorithm (see Appendix A.1). Thedegree d or the number of quadrature nodes is chosen such that a higher degree or a highernumber of quadrature nodes has little eﬀect on the numerical solution. In our numericalexamples, we use degree-100 complete Chebyshev polynomials for no-consumption problems(15), or degree-60 for with-consumption problems (19). The high degree polynomials arerequired because the value functions have kinks on the borders of the NTRs, while theutility of consumption and the discount factor in the with-consumption model (19) make thekinks have a less serious impact on the numerical solutions, thus requiring a lower-degreeapproximation than in the no-consumption model (15).We apply the multi-dimensional product Gauss-Hermite quadrature rule for approximat-ing the expectation in (15) or (19). If the time step size ∆ t is one month (i.e., ∆ t = 1 / years) or smaller, then we ﬁnd that in all of our numerical examples, using 3 Gauss-Hermitequadrature nodes for each risky return already achieves an accurate estimate for the expec-tation in (15) or (19), and a higher number of Gauss-Hermite quadrature nodes (e.g., 5, 7,or 9) has almost no change to the NTRs. This happens because with a small time step, thevariance of a one-period risky return is small. Moreover, when τ = 0 , the value functionsare independent of x t , so a small τ will make the value functions not have high curvaturesexcept at the border of the NTRs. Thus, a small number of quadrature nodes can achievehigh accuracy. When the time step is one year, we ﬁnd it requires 5 quadrature nodes in12ach dimension as the variance of a one-year risky return is not small.The maximization problem in (15) or (19) is solved with the NPSOL optimization package(Gill et al., 1994) for each approximation node. For each approximation node, we solve themaximization problem in (15) or (19). The maximization problems are independent of eachother among the approximation nodes, and there are many approximation nodes in ourportfolio problems. Thus we use the parallel DP algorithm (Cai et al., 2015), which wasdeveloped under the HTCondor system of the University of Wisconsin-Madison at ﬁrst andwas adapted to supercomputers later. For small problems, we use a Mac Pro with a 3.5 GHz6-Core Intel Xeon E5. For large problems, we use the Blue Waters supercomputer.Appendix A.1 provides more discussion on the numerical DP method, including numericalapproximation and numerical integration. More details of numerical DP can also be foundin Cai (2010, 2019), Judd (1998), and Rust (2008). If we can trade the assets at any continuous time without transaction costs and the timehorizon is inﬁnite, and if we assume the power utility with γ denoting the relative riskaversion, the theoretically optimal portfolio is the Merton point (Merton 1969, 1971): (ΛΣΛ) − ( µ − r ) /γ, (21)where Λ is a diagonal matrix with the diagonal elements Λ ii = σ i and Σ is the correlationmatrix of the risky asset returns. If the problem includes consumption, then the optimalconsumption can be given by a function of the Merton point (see Samuelson (1969) andCai (2010)). Thus, we can use this to check the accuracy of our numerical DP algorithm insolving the multi-asset portfolio problems with zero transaction costs, by checking if the NTRconverges to the Merton point when the transaction cost τ → , the horizon is large enough,and the time step size is small enough for approximating the inﬁnite-horizon continuous-time13roblem without transaction costs. This is alternative to the standard checks using a higherdegree approximation (and a higher number of approximation nodes) or a higher number ofquadrature nodes discussed in Section 2.3. In this section, we give several numerical examples for solving multi-stage portfolio op-timization problems with proportional transaction costs and the power utility function u ( W ) = W − γ / (1 − γ ) . In these examples, the one-period stochastic return vector R isalways assumed to be log-normal with log( R ) ∼ N (( µ − σ t, ( ΛΣΛ )∆ t ) in R k , where µ = ( µ , · · · , µ k ) ⊤ is the drift, σ = ( σ , · · · , σ k ) ⊤ is the volatility, Σ is thecorrelation matrix of the log-returns, Λ = diag ( σ , . . . , σ k ) , and σ is the elementwise squareof σ . We can express the returns in terms of the Cholesky factorization Σ = LL ⊤ , where L = ( L i,j ) k × k is a lower triangular matrix, implying log( R i ) = ( µ i − σ i t + σ i √ ∆ t i X j =1 L i,j z j , where z i are independent standard normal random variables, for i = 1 , . . . , k . Therefore, forthe optimization problems (15) and (19), we apply a product Gauss-Hermite quadrature for R to estimate the conditional expectation of Π − γt +∆ t · G t +∆ t ( x t +∆ t , θ t +∆ t ) . Here, r , µ , and σ (and even Σ ) could be stochastic. That is, r , µ , and σ (and even Σ ) could be elements of θ t , and R f and the distribution of R could be dependent on θ t in period t .Subsection 3.1 uses a three-asset problem without consumption to illustrate the basicproperties of a NTR and to show that our numerical DP method can achieve high accuracyin ﬁnding the NTR even with daily transactions. Subsection 3.2 solves several problems with14onsumption to show that our numerical DP method can solve problems with three to ﬁveassets at a good accuracy, and solve problems with stochastic parameters and constraints. We ﬁrst solve a three-asset portfolio example of the model (15), the multi-stage portfoliooptimization problem without consumption. In this example, the assets available for tradinginclude one risk-free asset with an interest rate r and k = 2 uncorrelated risky assets withlog-normal returns. We assume that the utility function at the terminal time is U ( W ) = W − γ / (1 − γ ) , with γ = 3 . We let r = 0 . , µ = (0 . , . ⊤ , and σ = (0 . , . ⊤ , so theMerton point is (1 / , / ; that is, the optimal portfolio in the inﬁnite-horizon continuous-time problem without transaction costs is to invest one third of wealth in each risky asset,and the remaining one third of wealth in the risk-free asset.We let T = 3 years and choose daily time periods, i.e., ∆ t = 1 / years (so the number ofperiods is N = 1095 ). Figure 1 shows the NTRs at the initial time under various transactioncosts ranging from τ = 0 . to . , where horizontal and vertical axes represent thefraction of wealth invested in risky assets 1 and 2 respectively (all ﬁgures in this sectionuse fractions of wealth invested in risky assets as their axes). The left panel of Figure 1 isfor τ = 0 . . The circle point located inside the NTR is the Merton point. The NTR isnearly a square and symmetric w.r.t. the 45 degree line, which is expected because the tworisky assets are i.i.d.. The left panel of Figure 1 also uses the arrows to show the optimaltransaction directions. For example, if the initial portfolio before re-allocation is (0 , , i.e.,all wealth is invested in the risk-less asset, then the optimal re-allocation is to buy each riskyasset with 30.5% of wealth (the left-bottom corner of the NTR). Thus, we see that even withthe small transaction cost τ = 0 . , the NTR is nontrivial: the width of the NTR is 0.026,i.e., 2.6% of wealth.The right panel of Figure 1 shows that the NTRs converge to the Merton point when τ → : when τ = 0 . , the NTR is almost identical to the Merton point. A higher15 .28 0.3 0.32 0.34 0.36 0.38 0.4 Risky asset 1 R i sky a ss e t Initial NTR with =0.01%

Border of NTRMerton pointTrading Direction

Risky asset 1 R i sky a ss e t Impact of Transaction Cost on NTR

Merton point =0.00001% =0.0001% =0.001% =0.01% =0.1%

Figure 1: NTR at the initial time with various transaction costs.transaction cost leads to a larger NTR that contains the NTR with smaller transaction costs.Moreover, as we increase τ , the NTRs grow more slowly in size than the growth rate of τ ,with the width of the NTRs expanding at a rate of about τ / . For example, the widthof the NTR with τ = 0 . is 0.061, nearly . times the width with τ = 0 . , which isclose to (0 . / . / ≈ . . This is consistent with the theoretical asymptotic result ofGoodman and Ostrov (2010), who found that the width of the NTR is nearly proportionalto τ / for small τ for inﬁnite-horizon and continuous-time problems. All these properties ofthe NTRs verify partially that our numerical solutions are accurate.The results in Figure 1 are provided by our numerical DP method, in which we choosedegree-100 complete Chebyshev polynomial approximations, use tensor Chebyshevnodes as the approximation nodes to compute Chebyshev coeﬃcients, and implement thetensor product Gauss-Hermite quadrature rule with tensor quadrature nodes. We runit on a Mac Pro with a 3.5 GHz 6-Core Intel Xeon E5 using the parallel DP method, andit takes about 1.5 wall clock hours due to the high degree polynomial approximations. Tofurther verify that we solve the portfolio problem accurately, we employ degree-150 complete16hebyshev polynomial approximations, tensor Chebyshev nodes, and tensor productGauss-Hermite quadrature rule with tensor quadrature nodes. We ﬁnd that there is littlechange in the solution: e.g., the L diﬀerence between the degree-100 and the degree-150solutions is . × − , and the L ∞ diﬀerence is . × − . A smaller degree polynomialapproximation can be faster but will lead to less accurate solutions. For example, if we usedegree-80 complete Chebyshev polynomial approximations and tensor Chebyshev nodesfor the case with τ = 0 . , then it takes only 44 minutes, but it also provides a nontrivialchange in the solutions: the L diﬀerence between the degree-100 and degree-89 solutionsis 0.0014, and the L ∞ diﬀerence is 0.0033. Note that the kinks on the border of the NTRin the value and policy functions make it slow to improve the accuracy of the solution byincreasing the degree of polynomial approximation.Figure 2 shows the NTRs at the initial time with various time step sizes ranging fromdaily to monthly (the left panel) and horizons from one week to three years (the right panel),with τ = 0 . . The left panel of Figure 1 shows that the NTR with 3-year horizon expandsif the time step size decreases. The right panel of Figure 2 shows that the NTR of a dailytrader shrinks as the horizon increases, and the NTR with T = 0 . years is almost identicalto the NTR with T = 3 years. This implies that the trading strategy for inﬁnite-horizoncontinuous-time problems can be approximated by the initial time solutions of a six-monthhorizon problem with daily time steps. This also explains why the NTRs in the right panelof Figure 1 converge to the Merton point when τ → . In this subsection, we solve the with-consumption model (19). The assets available fortrading include one risk-free asset with an interest rate r and multiple risky assets with log-normal annual returns. We assume that the utility function is U ( c ) = c − γ / (1 − γ ) . Table1 list values of parameters of the examples in this subsection, where I represents the × identity matrix. 17 .28 0.3 0.32 0.34 0.36 0.38 0.4 Risky asset 1 R i sky a ss e t Impact of Time Step Size on NTR with 3-Year Horizon

Merton pointmonthlyweeklydaily

Risky asset 1 R i sky a ss e t Impact of Horizon on NTR of a Daily Trader

Merton pointthree yearssix monthsone monthone week

Figure 2: Initial NTR with various time step sizes and horizons.Example 2 Example 3 Example 4 Example 5 k γ τ

1% 0.1% 0.1% 0.1% β exp( − . t ) exp( − . t ) exp( − . t ) exp( − . t ) r µ ⊤ (0 . , . stochastic (0 . , . , .

07) (0 . , . , . , . σ ⊤ ( √ . , √ .

17) (0 . , .

2) (0 . , . , .

2) (0 . , . , . , . (cid:20) . . (cid:21) (cid:20) (cid:21)  . . . . . .

16 1  I Table 1: Parameters for Examples of Portfolio Problems with Consumption.18 .2.1 Example 2: a Three-Asset Problem with Consumption

For the problem with two risky assets and one risk-free asset, the parameter values are listedin Table 1, and they are either the same as or discrete-time analogs to those in Discussion 1 inMuthuraman and Kumar (2006), including τ = 1% , in order to show that our numerical DPmethod can also solve the continuous-time inﬁnite-horizon portfolio problem with consump-tion in Muthuraman and Kumar (2006). We use weekly time steps (i.e., ∆ t = 1 / years),and in the numerical DP method for this example, we use degree- complete Chebyshevpolynomials to approximate value functions, use tensor Chebyshev nodes as the ap-proximation nodes to compute Chebyshev coeﬃcients, and implement the multi-dimensionalproduct Gauss-Hermite quadrature rule with tensor quadrature nodes. For the case with T = 3 years (i.e., N = 156 periods), it takes only two minutes on a Mac Pro with a 3.5GHz 6-Core Intel Xeon E5. A larger degree (e.g., degree-100) polynomial approximation(with a higher number of Chebyshev nodes) or more quadrature nodes (e.g., 5 nodes in eachdimension) has little impact on the solution: e.g., the L diﬀerence between the degree-60and degree-100 solutions is . × − , and the L ∞ diﬀerence is 0.0013. In comparison withthe no-consumption examples in Section 3.1, the with-consumption problems require a muchlower degree of polynomial approximation, because the discount factor and utility of con-sumption in the objective of the with-consumption model (19) alleviate the impact of kinksin the value functions on the solutions.The left panel of Figure 3 displays the NTRs at the initial time with various time stepsizes ranging from daily to monthly for T = 3 years (i.e., N = 156 periods). The circle pointlocated inside the NTRs is the Merton point (0 . , . . We see that the NTR with weeklytime steps is close to the NTR with daily time steps, and is also close to the solution given inMuthuraman and Kumar (2006) for the inﬁnite-horizon portfolio optimization problems inwhich assets can be traded at any continuous time. Thus, we have shown that our numerical19P can solve continuous-time inﬁnite-horizon portfolio problem with consumption. More-over, we can use weekly time steps to approximate the continuous-time with-consumptionmodel.The right panel of Figure 3 shows the initial-time NTRs with various trading horizonsfrom one month to ten years, for a ﬁxed time step size of one week. We see that thetrading horizon changes the NTR: a shorter horizon has a larger NTR. We see that theNTR of the three-year horizon is almost identical to the NTR of the ten-year horizon.Moreover, when the horizon is small, the no-shorting constraint becomes binding for someportfolios before rebalancing. For example, when the horizon has only one month, if theportfolio before re-allocation is (0,0), then the optimal trading strategy is to keep the currentportfolio unchanged (i.e., the no-shorting constraint is binding). These occasionally bindingconstraints make it challenging for a partial diﬀerential equation solution method such as theone used in Muthuraman and Kumar (2006), but our numerical DP method can handle them.Clearly, the short-horizon NTRs are impacted by the terminal condition, i.e., the terminalvalue function. With our terminal value function (20) and the assumption to close all riskyassets, NTRs are moving towards the origin if horizon becomes shorter, as an investor is lesswilling to keep risky assets so as to avoid the transaction costs incurred by selling all riskyassets at the terminal time. Thus, the Merton point is no longer in the center of the NTRsfor short-horizon problems and is no longer a good approximate solution, as its distance tothe optimal portfolio may be more than 0.2 for the one-month-horizon problem (as shownin the right panel of Figure 3). Example 2 has shown that numerical DP can solve a continuous-time inﬁnite-horizon DPproblem, which is often transformed to a Hamilton-Jacobi-Bellman partial diﬀerential equa-tion (PDE) for being solved numerically. However, if there are stochastic and discrete param-20

Risky asset 1 R i sky a ss e t Impact of Time Step Size on NTR with 3-Year Horizon

Merton pointmonthlyweeklydaily

Risky asset 1 R i sky a ss e t Impact of Horizon on NTR of a Weekly Trader

Merton pointone monthsix monthsone yearthree yearsten years

Figure 3: Impact of time step size and horizon on the initial-time NTR for 2 correlated riskyassets and 1 risk-free asset with consumption.eters, then it is often challenging to solve PDEs with such jump processes. In this example,we apply numerical DP to solve a three-asset with-consumption model (19), assuming thatthe drifts of risky returns, µ t = ( µ t, , µ t, ) ⊤ , are stochastic. In Appendix A.2, we also solvedynamic portfolio problems with stochastic interest rates r or stochastic volatility σ . Weassume that µ t, and µ t, are discrete Markov chains and independent of each other. Each µ t,i has two possible values: µ = 0 . and µ = 0 . , and its transition probability matrixis  .

75 0 . .

25 0 .  for each i = 1 , , where its ( j , j ) element represents the transition probability from µ t,i = µ j to µ t +∆ t,i = µ j , for j , j = 1 , . The other parameter values are listed in Table 1.We use weekly time periods (i.e., ∆ t = 1 / years) in T = 3 years (so the number ofperiods is N = 156 ). The assets available for trading include one risk-free asset with aninterest rate r and k = 2 uncorrelated risky assets with log-normal returns. In the numerical21 .25 0.3 0.35 0.4 0.45 0.5 Risky asset 1 R i sky a ss e t Initial NTR

Border of NTR at =(0.06,0.06)Merton point at =(0.06,0.06)Border of NTR at =(0.06,0.08)Merton point at =(0.06,0.08)Border of NTR at =(0.08,0.06)Merton point at =(0.08,0.06)Border of NTR at =(0.08,0.08)Merton point at =(0.08,0.08)

Figure 4: NTRs with a stochastic µ .DP method, we choose degree- complete Chebyshev polynomials to approximate valuefunctions, use tensor Chebyshev nodes as the approximation nodes, and implement themulti-dimensional product Gauss-Hermite quadrature rule with tensor quadrature nodes.Figure 4 displays the NTRs for four possible discrete states of ( µ t, , µ t, ) at the initialtime. The previous examples without stochastic parameters have only one unique NTR ateach trading time, but in this example each discrete state of stochastic parameters has its owncorresponding NTR. These NTRs are close to being square as the two risky assets are i.i.d..The top-right square is the NTR for the state ( µ t, , µ t, ) = (0 . , . , the bottom-left squareis the NTR for the state ( µ t, , µ t, ) = (0 . , . , and the top-left and the bottom-rightsquares are the NTRs for the states ( µ t, , µ t, ) = (0 . , . and ( µ t, , µ t, ) = (0 . , . respectively. The diamond, the mark, the plus, and the circle inside the squares are thecorresponding Merton points if we assume that ( µ t, , µ t, ) are ﬁxed at their initial values.We see that a smaller µ t,i implies a NTR and a Merton point closer to the origin of thecoordinate system. This is consistent with the formula of the Merton point, (21). We seethat these Merton points are no longer in the centers of the NTRs, like what the previous22xamples without stochastic parameters have shown (except the short-horizon problems);thus using Merton points as portfolio solution is no longer good as the diﬀerence betweenthe optimal solution and the Merton points may be more than 0.1, as shown in Figure 4.Appendix A.2 shows more examples with stochastic interest rate or volatility, and theirNTRs have similar patterns. It is often challenging for a PDE solution method to solve problems with three or morecontinuous state dimensions. In this example, we employ the numerical DP method tosolve the with-consumption model (19) with three continuous state dimensions, in which wehave one risk-free asset and three correlation risky assets available for monthly trading (i.e., ∆ t = 1 / years) in T = 3 years (so there are N = 36 periods). The parameter valuesare listed in Table 1. In the numerical DP method for this example, we choose degree- complete Chebyshev polynomials to approximate value functions, use tensor Chebyshevnodes as the approximation nodes to compute Chebyshev coeﬃcients, and implement themulti-dimensional product Gauss-Hermite quadrature rule with tensor quadrature nodes.We run the parallel DP method on the Blue Waters supercomputer, and it takes nearly 2.6wall clock hours using 640 cores. A higher degree (e.g., degree-80) polynomial approximation(with a higher number of Chebyshev nodes) or more quadrature nodes (e.g., 5 nodes in eachdimension) have little change to the solution: e.g., the L diﬀerence between the solutionfrom the degree-60 polynomial approximation and the solution from the degree-80 polynomialapproximation is . × − , and the L ∞ diﬀerence is 0.0017. Figure 5 displays the NTRat the initial time. The circle inside the NTR is the Merton point: (0 . , . , . .For convenience, we plot the faces of the NTR as ﬂat, but in fact there is small curvature onthe faces, and the exact NTR might have curvy faces too as Figure 3 has shown. The NTRis tilted as the risky assets are correlated. 23 .140.210.150.16 0.20.17 0.140.19 R i sky a ss e t Initial NTR

Risky asset 2

Risky asset 1

Figure 5: Initial NTR for 3 correlated risky assets and 1 risk-free asset with consumption.

Examples 2 and 4 have shown that computational time expands exponentially with thenumber of risky assets as we use tensor grids for approximation nodes and quadrature nodes.In particular, due to the existence of kinks of the value functions, we have to use high-degreepolynomial approximations. In Example 4, we have used the degree-80 solution as the “true”solution to compare with lower-degree solutions. If the diﬀerence is small, we consider thesolutions from the lower-degree approximations to be accurate.However, for problems with more risky assets, we cannot use a solution from a very highdegree polynomial approximation to estimate our solution’s accuracy, due to computationalresource limitations. Instead, we roughly estimate our solution’s accuracy from policy ap-proximation errors. That is, we use the solutions of optimal decisions over all approximationnodes in the state space to construct a complete Chebyshev polynomial to approximate apolicy function for each decision variable, and then we compute the distances between opti-mal decisions over 1,000 random nodes in the state space and the values of the approximatedpolicy functions over the same set of random nodes. These distances are called policy ap-24eg. Policy Approx. Errors Diﬀ. from deg-80 Sol. Number Time L Error L ∞ Error L meas. L ∞ meas. of Cores8 0.00190 0.0203 0.0195 0.0452 6 18 seconds20 0.00040 0.0089 0.0059 0.0152 6 12 minutes30 0.00016 0.0062 0.0027 0.0050 256 7 minutes40 0.00013 0.0042 0.0027 0.0046 288 37 minutes60 0.00009 0.0027 0.0005 0.0017 640 2.6 hoursTable 2: Error Analysis and Running Times.proximation errors. This method has been implemented in Cai et al. (2017) and Cai andLontzek (2019).Table 2 shows the policy approximation errors and the diﬀerence from the degree-80solution in their L and L ∞ norms. We see that both the policy approximation errors andthe diﬀerence from the degree-80 solution are declining over the degrees. A lower-degreeapproximation is computationally faster but its error is also larger. For example, usingdegree-8 approximation takes only 18 seconds using 6 cores, but its policy approximationerror in L ∞ is 0.02, and its solution’s diﬀerence from the degree-80 solution is 0.045 in the L ∞ measurement. If we want to control the L ∞ error at around 0.005, then the degree-30solution can suﬃce. Our last example without options solves the with-consumption model (19), where the assetsavailable for trading include one risk-free asset and four risky assets with independent log-normal annual returns. We assume T = 3 years and ∆ t = 1 / years. The parameter valuesare listed in Table 1.Figure 6 displays the NTR at the initial time, and we see that the NTR is close to being ahyper-rectangle as the four risky assets are uncorrelated. The top-right rectangle (with solidlines) and the bottom-left rectangle (with dashed lines) are the cross-sections of the NTR forrisky assets 1 and 2, and risky assets 3 and 4, respectively. In the ﬁgure, the circle and themark are, respectively, the projections of the Merton point, (0 . , . , . , . ,25 .12 0.13 0.14 0.15 0.16 0.17 0.18 0.19 0.2 0.21 Risky asset 1 or 3 R i sky a ss e t o r Initial NTR

Border of NTR on Risky Assets 1 and 2Projection of the Merton point on Risky Assets 1 and 2Border of NTR on Risky Assets 3 and 4Projection of the Merton point on Risky Assets 3 and 4

Figure 6: Initial NTR for 4 independent risky assets and 1 risk-free asset.on risky assets 1 and 2, and risky assets 3 and 4.We choose degree-30 complete Chebyshev polynomials to approximate value functions,use tensor Chebyshev nodes to compute Chebyshev coeﬃcients, and implement themulti-dimensional product Gauss-Hermite quadrature rule with tensor quadrature nodes.Thus, for each period, there are = 923 , optimization problems in the maximizationstep. Moreover, the evaluation of the objective function of each optimization problem istime-consuming: the product quadrature rule means there are = 81 evaluations (one eval-uation per quadrature node) of a degree-30 complete Chebyshev polynomial which contains 

30 + 44  = 46 , terms of Chebyshev basis polynomials. This is a computationallyintensive problem. It takes about 9.8 hours to get the solution using 3,840 cores of theBlue Waters supercomputer. If we use degree-60 complete Chebyshev polynomials with tensor Chebyshev nodes, then we can improve our solution’s accuracy, but it would require26oo much computing resources: around × 

60 + 44  × 

30 + 44  ≈ times more as a rough estimate, which has exceeded our available resources in the BlueWaters supercomputer. Thus, we cannot use a solution from a very high degree polynomialapproximation to estimate our solution’s accuracy, so instead we roughly estimate it usingpolicy approximation errors, which are . × − in L and . in L ∞ . These errors areclose to the policy approximation errors in Example 4 with the degree-30 approximation.Thus, from Table 2, we estimate that the error of our solution in Figure 6 in comparisonwith the true solution would be around 0.003 in L and 0.005 in L ∞ .We also run an example with one risk-free asset and six uncorrelated risky assets us-ing degree-8 complete Chebyshev polynomials to approximate value functions on the six-dimensional hypercube, and get the NTR as a six-dimensional hyper-rectangle. Its policyapproximation errors are also close to those of Example 4 with the degree-8 approximation,so its error would be several percent of wealth, such that it might not be better than a solu-tion using its Merton point for a long-horizon problem. However, for such a high-dimensionalproblem, our numerical DP method can provide a NTR which might still be better than aMerton point, when the problem has a short horizon or stochastic parameters, in which thedistance between the Merton point and the true NTR could be relatively large, as shown inthe right panel of Figure 3 or Figure 4. 27 Pricing Formula for Options

We ﬁrst give the pricing formula for a put option. Let the expiration time of the put optionbe T , and let its strike price be K . Then the payoﬀ of the put option at the expiration time T is max( K − S T , , where S T is the price of the underlying risky asset at time T . We assumethat the price S t of the underlying asset follows a geometric Brownian motion process withconstant drift µ and volatility σ , and the risk-free interest rate is r . Black and Scholes (1973)give an explicit risk-neutral pricing formula for the European-type put option at time t < T .However, some other kinds of options cannot have an explicit formula. For example, anAmerican-type put option has to use some numerical methods for its pricing. One generalnumerical method for pricing is the binomial lattice method (see e.g., Luenberger 1997),which can be used for pricing both American-type and European-type options.In the binomial lattice model, we set h as a sub-period length and n as the number ofsub-periods such that nh = ∆ t , where ∆ t is the length of one period in the multi-periodportfolio optimization model (i.e., investors can rebalance their portfolio every time ∆ t ). Ifthe price is known as S at the beginning of a sub-period, the price at the beginning of thenext sub-period is Su or Sd , with u > and < d < . The probability of the upwardmovement is p , and the probability of the downward movement is − p . To match thelog-normal assumption of the risky asset return, we follow the binomial lattice method toassume that  p = + µ − σ / σ √ h,u = exp( σ √ h ) ,d = exp( − σ √ h ) , (22)such that the expected growth rate of log( S ) in the binomial model converges to ( µ − σ / ,and the variance of the rate converges to σ , as h goes to zero. The risk-free return of onesub-period is exp( rh ) , and the risk-free return of one period is R f = (exp( rh )) n = exp( r ∆ t ) .Thus, the value of one sub-period put option over the underlying risky asset governed by the28inomial lattice is given as P = exp( − rh )( qP u + (1 − q ) P d ) , where P u and P d are the values at the upward branch and the downward branch at the endof this sub-period respectively, and q is the risk-neutral probability with q = (exp( rh ) − d ) / ( u − d ) . Since the payoﬀ of the put option at the expiration time T is P T = max( K − S T , , we cancompute the price of the put option at any binomial node by applying the above backwarditerative relation. From the above binomial model, we know that the risky asset return ofone period, R , has the following probability distribution: Pr( R = u j d n − j ) = n ! j !( n − j )! p j (1 − p ) n − j (23)for j = 0 , , . . . , n, where p, u , and d are deﬁned in (22).We also use the same binomial lattice method to price a call option. The computationalprocess is the same as for the put option, except we need to set the payoﬀ of the call optionat the expiration time T as P T = max( S T − K, for a pre-speciﬁed strike price K . Thebinomial lattice method will also be used to price a butterﬂy option, a combination of acall option and a put option with the same strike price. That is, the payoﬀ of the butterﬂyoption at the expiration time is P T = | S T − K | . We solve dynamic portfolio problems with a portfolio including a risk-less asset, an option,and its underlying risky asset. We assume that the option has an expiration time T and a29trike price K , and it is available for trading at times t = t , t , t , . . . , t N with t j = j ∆ t for j = 0 , , ..., N and ∆ t = T /N , with a price process P t , while its underlying risky asset’s priceprocess is S t . Since the option price P t is dependent on S t and K , we denote it as P t ( S t , K ) explicitly. Right before reallocation time t , assume that W t is the wealth, X t is the amountof money invested in the risky asset, and Y t is the amount of money invested in the option.Since the price of the option is dependent on the risky asset price, we should add S t as astate variable. That is, there are four state variables: W t , X t , Y t , S t . Denote A t = S t /K .Since the payoﬀ of a put option is P T = max( K − S T ,

0) = K · max(1 − A T , , we have P t ( S t , K ) = KP t ( A t , by replication. For simplicity, we denote P t ( A t ) = P t ( A t , , so P t +∆ t ( S t +∆ t , K ) P t ( S t , K ) = P t +∆ t ( A t +∆ t ) P t ( A t ) . Similarly, for a call option or a butterﬂy option, the separation of K and A t still hold. Thus,we can use A t as the state variable instead of S t .Assume that ∆ X t is the amount of dollars for buying or selling the risky asset at time t ,and ∆ Y t is the amount of dollars for buying or selling the option. ∆ X t > or ∆ Y t > meansbuying the risky asset or the option, while ∆ X t < and ∆ Y t < means selling the riskyasset and the option. We assume that there are proportional transaction costs in buying orselling the risky asset and the option, with the proportions τ and τ respectively. Thus, thetransition laws of the four state variables in one period with time length ∆ t are A t +∆ t = A t R, (24) Y t +∆ t = ( Y t + ∆ Y t ) P t +∆ t ( A t +∆ t ) P t ( A t ) , (25) X t +∆ t = R ( X t + ∆ X t ) , (26) W t +∆ t = R f ( W t − X t − Y t − Z t ) + X t +∆ t + Y t +∆ t , (27)30here R is the random return of the risky asset in one period with length ∆ t , R f = exp( r ∆ t ) is the one-period risk-free return with r the interest rate, and Z t = ∆ X t + ∆ Y t + τ | ∆ X t | + τ | ∆ Y t | . (28)We also assume that there are constraints in shorting risky assets/options or borrowingcash. For simplicity, we assume that neither shorting nor borrowing is allowed. That is, wehave the following constraints: X t + ∆ X t ≥ , (29) Y t + ∆ Y t ≥ , (30) X t + Y t + Z t ≤ W t . (31)Thus, the DP model is V t ( W t , X t , Y t , A t ) = max ∆ X t , ∆ Y t E t { V t +∆ t ( W t +∆ t , X t +∆ t , Y t +∆ t , A t +∆ t ) } , (32)subject to the constraints (24)-(31), for t = t , t , t , . . . , t N − . Here, E t is the expecta-tion operator conditional on the information at time t , and the terminal value function is V T ( W, X, Y, S ) = U ( W ) for some given utility function U . In order to use a smooth optimization solver for solving the DP model, we want to cancel outthe absolute operator in the constraint (28). This can be done by letting ∆ X t = W t ( δ + tx − δ − tx ) and ∆ Y t = W t ( δ + ty − δ − ty ) , with δ + tx , δ − ty ≥ such that | ∆ X t | and | ∆ Y t | can be substituted by W t ( δ + tx + δ − tx ) and W t ( δ + ty + δ − ty ) respectively in the optimization problems.If the utility function is CRRA with relative risk aversion coeﬃcient γ > , then we let x t = X t /W t and y t = Y t /W t . By using W t , x t , y t , and A t as state variables, we can separate31 t and ( x t , y t , A t ) in the value function V t ( W t , x t , y t , A t ) . When U ( W ) = W − γ / (1 − γ ) with γ > and γ = 1 , we have V t ( W t , x t , y t , A t ) = W − γt · g t ( x t , y t , A t ) , (33)where g t ( x t , y t , A t ) = max δ + tx ,δ − tx ,δ + ty ,δ − ty ≥ E t (cid:8) Π − γt +∆ t · g t +∆ t ( x t +∆ t , y t +∆ t , A t +∆ t ) (cid:9) , (34)subject to A t +∆ t = A t R, (35) ξ t +∆ t = R ( x t + δ + tx − δ − tx ) , (36) η t +∆ t = ( y t + δ + ty − δ − ty ) P t +∆ t ( A t +∆ t ) P t ( A t ) , (37) z t = δ + tx − δ − tx + δ + ty − δ − ty + τ ( δ + tx + δ − tx ) + τ ( δ + ty + δ − ty ) , (38) Π t +∆ t = R f (1 − x t − y t − z t ) + ξ t +∆ t + η t +∆ t , (39) x t +∆ t = ξ t +∆ t / Π t +∆ t , (40) y t +∆ t = η t +∆ t / Π t +∆ t , (41)and the following no-shorting and no-borrowing constraints x t + δ + tx − δ − tx ≥ , (42) y t + δ + ty − δ − ty ≥ , (43) x t + y t + z t ≤ . (44)The terminal function is g T ( x, y, A ) = 1 / (1 − γ ) . Similarly, when U ( W ) = log( W ) , we canalso separate W t and ( x t , y t , A t ) in the value function V t ( W t , x t , y t , A t ) . The approximationdomain of ( x t , y t ) can be set as [0 , as we do not allow shorting or borrowing.Similarly to what we computed in the portfolio problems without options, we will also32ompute a no-trade region, Ω t , for the problems with options, in which Ω t is deﬁned as Ω t = { ( x t , y t ) : ( δ + tx ) ∗ = ( δ − tx ) ∗ = ( δ + ty ) ∗ = ( δ − ty ) ∗ = 0 } , where ( δ + tx ) ∗ , ( δ + ty ) ∗ ≥ are optimal fractions of wealth for buying the risky asset and theoption, and ( δ − tx ) ∗ , ( δ − ty ) ∗ ≥ are optimal fractions of wealth for selling the risky asset andthe option, for a given ( x t , y t , A t ) . From the separability of W t and ( x t , y t , A t ) , we see thatthe optimal portfolio rules are independent of wealth W t . Thus the no-trade regions Ω t arealso independent of W t , for CRRA utility functions. In some problems, assets are used to ﬁnance consumption during the period t . Like what wediscussed in Section 2.1, if the utility function for consumption is U ( C ) = C − γ / (1 − γ ) andthe terminal value function is V T ( W, x, y, A ) = W − γ · g T ( x, y, A ) for some given g T ( x, y, A ) ,then the Bellman equation (34) changes to g t ( x t , y t , A t ) = max c t ,δ + tx ,δ − tx ,δ + ty ,δ − ty ≥ U ( c t )∆ t + β E t (cid:8) Π − γt +∆ t · g t +∆ t ( x t +∆ t , y t +∆ t , A t +∆ t ) (cid:9) , (45)subject to the constraints (35)-(38), (40)-(43), and Π t +∆ t = R f (1 − x t − y t − z t − c t ∆ t ) + ξ t +∆ t + η t +∆ t ,x t + y t + z t + c t ∆ t ≤ , where β = exp( − ρ ∆ t ) is the one-period discount factor. The terminal value function is g T ( x T , y T , A T ) = U ( r (1 − τ x T ))∆ t − β = ( r (1 − τ x T )) − γ ∆ t (1 − β )(1 − γ ) (46)33y assuming that the agent sells all of the underlying risky asset (incurring proportionaltransaction cost τ ) and exercises all the options (without transaction costs) at the terminaltime, and then consumes only the interest of the terminal wealth forever.In the model (45), γ is the risk aversion parameter, but it is also the inverse of theintertemporal elasticity of substitution (IES). Epstein and Zin (1989) introduce a recursiveutility function to separate risk aversion and IES. If we use Epstein–Zin preferences and U ( c ) = c − γ / (1 − γ ) with γ the inverse of IES, then the Bellman equation (45) becomes g t ( x t , y t , A t ) = max c t ,δ + tx ,δ − tx ,δ + ty ,δ − ty ≥ U ( c t )∆ t + β (cid:20) E t (cid:26)(cid:0) Π − γt +∆ t · g t +∆ t ( x t +∆ t , y t +∆ t , A t +∆ t ) (cid:1) − ψ − γ (cid:27)(cid:21) − γ − ψ , (47)where ψ is the risk aversion parameter. See Cai et al. (2017) and Cai and Lontzek (2019) formore discussion on Epstein–Zin preferences and its use in dynamic stochastic optimizationproblems (e.g., the Bellman equation (47) formula holds only when < γ < ; when γ > ,it needs some adjustment).In reality, the time to maturity of options is typically not more than one year. Thus,if we want to solve a multi-year portfolio optimization problem with options (where eachperiod is equal to or less than 1 year), we need to liquidate the option at its expirationtime and buy some newly issued options at some intermediate stages. For simplicity, weassume that all options have the same maturity T while the whole trading horizon is mT (that is, we have m rounds of option issuing), and at the new option issuing times t ∈ I ≡{ , T, T, . . . , ( m − T } , only the at-the-money options are available (i.e., the strike price K for an option issued at time t ∈ I is S t , so the strike prices will be diﬀerent for thesediﬀerent at-the-money options issued at diﬀerent times). Thus, when t

6∈ I or t = 0 , the DPmodel is the same as (32).When t ∈ I and t > , we use t + to denote the time right after liquidating the expiredoptions and before rebalancing with newly issued options. We assume that there is notransaction cost for liquidating the expired options. Thus, we have Y t + = 0 and W t + = W t W t − X t − Y t to W t − X t ). Moreover, since we assumethat only an at-the-money option is available for trading no matter what the risky assetprice is at time t ∈ I , we have A t + = 1 . That is, we have V t ( W t , X t , Y t , A t ) = V t + ( W t + , X t + , ,

1) = V t + ( W t , X t , ,

1) = W − γt · g t ( x t , , , for t ∈ I and t > , because we assume that there is no transaction cost for liquidating theexpired options. In all examples in this section, we assume that the available assets for trading are one risk-free asset, one risky asset, and one option underlying the risky asset. The risk-free assethas an annual interest rate, r = 0 . . The risky asset has a return R drawn from thebinomial distribution (23) with µ = 0 . , σ = 0 . , and a proportional transaction cost ratio τ . The option has an expiration time T and a strike price K speciﬁed at its issue time and aproportional transaction cost ratio τ . The terminal value function is U ( W ) = W − γ / (1 − γ ) with the default γ = 3 for the problems without consumption. The default length of oneperiod is ∆ t = 1 / years, i.e., one week. We apply the binomial lattice model (22) withthe default length of one sub-period h = 1 / (so the default number of sub-periods is n = ∆ t/h = 10 ).In the following examples, our approximation method in numerical DP uses degree-100 complete Chebyshev polynomials with Chebyshev nodes in two continuous states: ( x t , y t ) , where x t and y t are the fractions of wealth invested in the risky asset and the optionat the time right before the reallocation time t respectively. Similarly to our exampleswithout options, we use the square [0 , as the approximation domain for ( x t , y t ) , and wechoose a degree-100 polynomial approximation as a higher degree approximation (with ahigher number of approximation nodes) has little change in the solution. Here we do not35eed the Gauss-Hermite quadrature rule as we have already discretized the price of the riskyasset and keep it as a discrete state variable. Our ﬁrst example with options chooses the at-the-money put option (i.e., the strike price K = S or equivalently A = 1 ) with the expiration time T = 0 . years. The trading horizonis also T , so it has 26 re-allocation times with the weekly time steps, and 261 sub-periodsover the whole horizon (so the largest number of values of the discrete state A t is 261,which happens at the terminal time). Figure 7 shows the NTRs and the optimal transactionpaths in arrows at the initial time for τ = τ = 0 . (the left panel) and τ = 0 . and τ = 0 . (the right panel), where the horizontal and vertical axes represent the fractionof wealth invested in the underlying risky asset and the option respectively (the axes of allother ﬁgures in this section have the same meaning). The circle on the horizontal axis isthe Merton point, (0 . , , if we assume that the underlying risky asset can be traded atany time without transaction costs so that the option can be replicated completely by therisk-free asset and the underlying risky asset (then there is zero fraction of wealth investedin the option at the Merton point). Underlying risky asset O p t i on =0.001, =0.001 Border of NTRMerton pointTrading Direction

Underlying risky asset O p t i on =0.001, =0.002 Border of NTRMerton pointTrading Direction

Figure 7: Initial NTRs for 1 underlying risky asset, 1 put option, and 1 risk-free asset.36ince the payoﬀ of the put option at the expiration time is max( K − S T , , we know thatthe put option is negatively and highly correlated with the underlying risky asset. The highcorrelation of the underlying risky asset and the option implies that the objective functionof the optimization problem is ﬂat so the NTR should be a strip that is close to a line, andthe strip should be tilted toward one axis. The negative correlation implies that the stripshould be tilted with a positive slope. All these properties can be observed in Figure 7, whilethe right panel’s strip is slightly wider than the left panel’s, due to the larger transactioncost of the option.The arrows in Figure 7 show the paths from the initial allocations on the border of theaxis box to the optimal allocations. In the left panel of Figure 7 (with τ = τ = 0 . ),we see that if the initial allocation of wealth is 47% or less, then the optimal decision is tosell all put options and buy some risky assets until the point (0.47,0). This happens becausethe put options are for hedging risks, so if the fraction of risky assets is small, the wholeportfolio’s risk is small and it becomes ineﬃcient to hold the put options. Moreover, if theinitial allocation of wealth is 80% or more in the underlying risky asset and 0% in the putoption, then the optimal decision is to sell some of the risky asset until its fraction becomes75% and to buy some put options until its fraction becomes 2.4%, in order to hedge risks.Otherwise, holding too many risky assets may incur too large a loss of wealth if the returnof the risky asset turns out to be small. Furthermore, if the initial allocation of wealthis between 60% and 75% in the underlying risky asset and 0% in the put option, then tohedge risks the optimal decision is to keep the same amount of the risky asset but to buyput options until the lower envelope of the NTR is reached. However, in the right panel ofFigure 7 (with τ = 0 . and τ = 0 . ), if the initial allocation of wealth is 60% or morein the underlying risky asset and 0% in the put option, then the optimal decision is to sellsome of the risky asset until its fraction becomes 58% and to buy some put options until itsfraction becomes 0.5%. That is, the higher transaction cost for options makes an investorless willing to buy options, and hold less of the risky asset as the need for risk hedging from37ut options is also less.For each initial risky asset allocation fraction from 0 to 1, while the initial option allo-cation fraction is 0, we can compute the optimal trading strategy for the dynamic portfolioproblem, and then we can compute its certainty equivalent, which is the certain amountgiving the same utility as the expected utility of uncertain future wealth. Figure 8 showsthe functions of certainty equivalents over the initial risky asset allocation fraction for ﬁvecases. Its horizontal axis is the NTR of the underlying risky asset with τ = 0 . whenthere is no option available for trading. The solid line represents the case where there is nooption available for trading, and the other four lines (circled, dotted, dash-dotted, dashed)represent the cases with diﬀerent transaction cost ratios, τ , for the available option. Theﬁgure shows that the put option contributes to the certainty equivalents. If the allocationfraction in the underlying risky asset is higher, then the diﬀerence is higher, as an investorrequires more put options to hedge risks. Moreover, if τ is smaller, then the correspondingcertainty equivalent with the put option is higher, as the option with smaller transactioncosts becomes more attractive to investors. NTR of the risky asset when option is unavailable C e r t a i n t y E qu i v a l en t o f N e t R e t u r n % Certainty Equivalent with Put Option

Option unavailableOption available, =0.002Option available, =0.001Option available, =0.0005Option available, =0.0001 Figure 8: Certainty equivalents for put options with various transaction costs.From Figure 8, we see that when the initial risky asset allocation fraction is 52.8% ofthe wealth (the upper bound of the NTR without options), the diﬀerence of the certainty38quivalent between the case with the put option with τ = 0 . and the case without optionsis less than 0.001%. That is, the social value of one put option is less than 0.00001 dollarsper dollar of investment if initially 52.8% of total wealth is invested in its underlying riskyasset and the remained amount of the wealth is invested in the risk-free asset. Our second example with options uses an at-the-money call option (i.e., the strike price K = S or equivalently A = 1 ) in the portfolio. Other parameter values are the same as theprevious example. The left panel of Figure 9 shows the NTR and the optimal transactionpaths in arrows. We see that the NTR is a narrow strip with a negative slope. This is causedby the positive high correlation between the call option and its underlying risky asset, as thepayoﬀ of the call at the expiration time is max( S T − K, . Moreover, if the initial allocationfraction in the risky asset is bigger than 0.53, then the optimal decision is to sell all optionsand to sell some of the risky asset until the boundary of the NTR (i.e., the point (0 . , ) isreached. Furthermore, if the initial portfolio holds the risky asset with 40% or less of wealthand zero options, then the optimal decision is to buy the call option until the lower envelopeof the NTR is reached.The right panel of Figure 9 shows the functions of the certainty equivalents of the expectedutility of future wealth over the initial allocation fraction of wealth in the underlying riskyasset for ﬁve cases. Similarly to the cases with the put option, Figure 9 shows that the calloption contributes to the certainty equivalents, and that the call option is more useful whenthe initial allocation in the risky asset is smaller. Moreover, if the transaction cost ratio forthe call option is smaller, then the corresponding certainty equivalent with the call optionis higher, which is consistent with the intuition that investors like to invest more in assetswith lower transaction costs. 39 Underlying risky asset O p t i on =0.001, =0.001 Border of NTRMerton pointTrading Direction

NTR of the risky asset when option is unavailable C e r t a i n t y E qu i v a l en t o f N e t R e t u r n % Certainty Equivalent with Call Option

Option unavailableOption available, =0.002Option available, =0.001Option available, =0.0005Option available, =0.0001 Figure 9: Initial NTR and certainty equivalents for call options.

Our third example with options uses an at-the-money butterﬂy option in the portfolio. Otherparameter values are the same as the previous example. The left panel of Figure 10 showsthe NTR and the optimal transaction paths in arrows. We see that the NTR is not anarrow strip like what the previous two examples showed. Its shape is reasonable becausethe butterﬂy option is equivalent to a combination of an at-the-money put option and anat-the-money call option, and its payoﬀ at the expiration time is | S T − K | . Moreover, theoptimal decision is to not buy any butterﬂy option in the case with τ = τ = 0 . . Thisis also reﬂected in the right panel of Figure 10, which shows that the butterﬂy option hasalmost no contribution to the certainty equivalents. This happens because the price of thebutterﬂy is the sum of the prices of the call and the put, so it becomes less proﬁtable forinvestment. 40 .4 0.45 0.5 0.55 0.6 Underlying risky asset O p t i on =0.001, =0.001 Border of NTRMerton pointTrading Direction

NTR of the risky asset when option is unavailable C e r t a i n t y E qu i v a l en t o f N e t R e t u r n % Certainty Equivalent with Butterfly Option

Option unavailableOption available, =0.002Option available, =0.001Option available, =0.0005Option available, =0.0001 Figure 10: Initial NTR and certainty equivalents for butterﬂy options.

We solve the long-horizon model in Section 5.2, which also includes consumption ﬁnancedby assets, and we use Epstein–Zin preferences. The trading horizon is three years and τ = τ = 0 . . The expiration time of each newly-issued at-the-money put option is T = 0 . years. The portfolio is rebalanced every week (i.e., ∆ t = 1 / years) so it has156 periods. Every six months there is a newly-issued at-the-money put option available fortrading. That is, at times t = 0 . , 1, 1.5, 2, 2.5 years, we have a new at-the-money put optionavailable for buying, while its older put option has to be exercised, so the largest number ofdiscrete values of A t is 261, which happens at the maturity times of the option. Following Caiet al. (2017) and Cai and Lontzek (2019), we choose the utility discount rate to be ρ = 0 . (i.e., the one-period utility discount factor is β = exp( − ρ ∆ t ) ), γ = 2 or 2/3 (i.e., IES is0.5 or 1.5), and the degree of risk aversion (RA) ψ = 2 or 5. Our numerical DP methodemploys degree-100 complete Chebyshev polynomials to approximate values functions forevery discrete state. We use 1,600 cores of the Blue Waters supercomputer and it takesabout 1.3 hours for each case of IES and RA.41igure 11 shows the NTRs for the four cases of IES and RA. We see that a more riskaverse investor chooses to hold smaller amounts of risky assets and options, just like whattheir Merton points show, so RA does have signiﬁcant impact on the NTR. Meanwhile, IEShas little impact on the NTR: when RA=5, the diﬀerence between the NTRs when IES=0.5and IES=1.5 is almost invisible in Figure 11. Underlying risky asset O p t i on Border of NTR for IES=1.5, RA=2Border of NTR for IES=0.5, RA=2Border of NTR for IES=1.5, RA=5Border of NTR for IES=0.5, RA=5Merton point for RA=2Merton point for RA=5

Figure 11: NTRs for problems with consumption and Epstein-Zin preferences.However, IES impacts the optimal consumption rate: when RA=5, the optimal consump-tion rate is around 0.012 when IES=0.5, increasing to around 0.018 when IES=1.5, whilethe impact of RA on the optimal consumption is small. That is, a larger IES implies a largerconsumption at the initial time, when the interest rate r is lower than the utility discountrate ρ . This can be explained intuitively by a simple dynamic programming problem: aninvestor maximizes P ∞ j =0 β j U ( c t j )∆ t subject to W t +∆ t = R f W t − c t ∆ t for any t = t , t , t , ... ,that is, the only available asset for investment is the risk-free asset. It has the Euler equation: U ′ ( c t ) = βR f U ′ ( c t +∆ t ) . If U ( c ) = c − γ / (1 − γ ) , so its annual consumption growth rate at theperiod ( t, t + ∆ t ) is g t ≡ ln( c t +∆ t /c t ) / ∆ t = ( r − ρ ) /γ , consistent with the famous Ramseygrowth theory. That is, if r < ρ , then consumption growth is negative, and a smaller γ (i.e.,a larger IES) implies a larger magnitude of negative growth of consumption, so then initialconsumption is larger. With Epstein-Zin preferences and risky assets including options, if42A is very large, then almost all money would be invested in the risk-free asset; this im-plies that a larger IES leads to a larger consumption at the initial time. We also verify thisintuition numerically by solving the same long-horizon examples except with r increased to . . In the cases with r = 0 . > ρ = 0 . , we ﬁnd that a larger IES leads to smallerconsumption at the initial time numerically, as a higher interest rate implies that futurewealth could be much larger if initial consumption is smaller. This ﬁnding is also similar towhat Cai et al. (2017) and Cai and Lontzek (2019) ﬁnd for the impact of IES and RA onsocial cost of carbon.We also run examples with ten years as the trading horizon while other parameters(including r = 0 . ) are unchanged. We ﬁnd that the NTRs of the ten-year-horizon problemsare almost identical to the NTRs of the three-year-horizon problems at the initial time. This paper has shown that numerical DP can solve multi-stage portfolio optimization prob-lems with multiple assets and transaction costs in an eﬃcient and accurate manner. Weillustrate trading strategies by describing the no-trade regions for various choices of asset re-turns and transaction costs. The numerical DP algorithms may be computationally intensivefor large portfolio optimization problems, but modern hardware and parallel DP algorithmsmake it possible to solve such big problems. This paper has also shown that when thereare transaction costs and constraints in shorting or borrowing, there exists a thin no-traderegion for the portfolio of an option and its underlying risky asset. Future research couldincorporate sparse grid methods to solve even larger dynamic portfolio problems.

References [1] Abrams, R.A., and U.S. Karmarkar (1980). Optimal multiperiod investment-consumption policies.

Econometrica , 48(2):333–353.432] Akian, M., J.L. Menaldi, and A. Sulem (1996). On an investment-consumption modelwith transaction costs.

SIAM Journal of Control and Optimization , 34(1):329–364.[3] Baccarin, S., and D. Marazzina (2014). Optimal Impulse Control of a Portfolio with aFixed Transaction Cost.

Central European Journal of Operations Research , vol. 22-2:355–372.[4] Baccarin, S., and D. Marazzina (2016). Passive Portfolio Management over a FiniteHorizon with a Target Liquidation Value under Transaction Costs and Solvency Con-straints.

IMA Journal of Management Mathematics , Vol. 27-4: 471–504.[5] Bellman, R. (1957).

Dynamic Programming . Princeton University Press.[6] Black, F., and M. Scholes (1973). The pricing of options and corporate liabilities.

Journalof Political Economy , 81(3):637–654.[7] Boyle, P.P., and X. Lin (1997). Optimal portfolio selection with transaction costs.

NorthAmerican Actuarial Journal,

Management Science , 57(10):1752–1770.[9] Brumm, J., and S. Scheidegger (2017). Using Adaptive Sparse Grids to Solve High-Dimensional Dynamic Models.

Econometrica , 85(5): 1575–1612.[10] Cai, Y. (2010).

Dynamic Programming and Its Application in Economics and Finance .PhD thesis, Stanford University.[11] Cai, Y. (2019). Computational methods in environmental and resource economics.

An-nual Review of Resource Economics

11, 59–82.[12] Cai, Y., K.L. Judd, and T.S. Lontzek (2017). The social cost of carbon with economicand climate risks. Hoover economic working paper 18113.4413] Cai, Y., K.L. Judd, G. Thain, and S. Wright (2015). Solving dynamic programmingproblems on a computational grid.

Computational Economics , 45(2), 261–284.[14] Cai, Y., and T.S. Lontzek (2019). The social cost of carbon with economic and climaterisks.

Journal of Political Economy , 127(6):2684–2734.[15] Constantinides, G.M. (1976). Optimal portfolio revision with proportional transactioncosts: extension to HARA utility functions and exogenous deterministic income.

Man-agement Science , 22(8):921–923.[16] Constantinides, G.M. (1979). Multiperiod consumption and investment behavior withconvex transaction costs.

Management Science , 25:1127–1137.[17] Constantinides, G.M. (1986). Capital market equilibrium with transaction costs.

Journalof Political Economy , 94(4):842–862.[18] Davis, M.H.A., and A.R. Norman (1990). Portfolio selection with transaction costs.

Mathematics of Operations Research , 15(4):676–713.[19] Duﬃe, D., and T.S. Sun (1990). Transaction costs and portfolio choice in a discrete-continuous-time setting.

Journal of Economic Dynamics and Control, 14:35–

Econometrica , 57(4),937–969.[21] Gennotte, G., and A. Jung (1994). Investment strategies under transaction costs: theﬁnite horizon case.

Management Science , 40(3):385–404.[22] Gill, P., W. Murray, M.A. Saunders and M.H. Wright (1994). User’s Guide for NPSOL5.0: a Fortran Package for Nonlinear Programming. Technical report, SOL, StanfordUniversity. 4523] Goodman, J., and D.N. Ostrov (2010). Balancing small transaction costs with lossof optimal allocation in dynamic stock trading strategies.

SIAM Journal of AppliedMathematics , 70(6):1977–1998.[24] Janecek, K., and S.E. Shreve (2004). Asymptotic analysis for optimal investment andconsumption with transaction costs.

Finance Stochast. , 8(2):181–206.[25] Judd, K.L. (1998).

Numerical Methods in Economics . The MIT Press.[26] Judd, K.L., L. Maliar, S. Maliar, and R. Valero (2014). Smolyak method for solving dy-namic economic models: Lagrange interpolation, anisotropic grid and adaptive domain.

Journal of Economic Dynamic and Control

Management Science , 21(11):1263–1271.[28] Liu, H. (2004). Optimal consumption and investment with transaction costs and multi-ple risky assets.

Journal of Finance , 59:289–338.[29] Luenberger, D. (1997).

Investment Science . Oxford University Press.[30] Malin, B., D. Krueger and F. Kubler (2011). Solving the Multi-Country Real BusinessCycle Model using a Smolyak-Collocation Method.

Journal of Economic Dynamics andControl , 35, 229--239.[31] Merton, R. (1969). Lifetime portfolio selection under uncertainty: the continuous timecase.

The Review of Economics and Statistics , 51(3):247–257.[32] Merton, R. (1971). Optimum consumption and portfolio rules in a continuous timemodel.

Journal of Economic Theory , 3:373–413.[33] Muthuraman, K., and S. Kumar (2006). Multidimensional portfolio optimization withproportional transaction costs.

Mathematical Finance , 16(2):301–335.4634] Muthuraman, K., and S. Kumar (2008). Solving free-boundary problems with applica-tions in ﬁnance.

Foundations and Trends in Stochastic Systems , 1(4):259–341.[35] Muthuraman, K., and H. Zha (2008). Simulation-based portfolio optimization for largeportfolios with transaction costs.

Mathematical Finance , 18(1):115–134.[36] Oksendal, B., and A. Sulem (2002). Optimal consumption and portfolio with both ﬁxedand proportional costs.

SIAM Journal on Control and Optimization , vol. 40: 1765–1790.[37] Rust, J. (2008). Dynamic Programming. In:

New Palgrave Dictionary of Economics ,ed. by Steven N. Durlauf and Lawrence E. Blume. Palgrave Macmillan, second edition.[38] Samuelson, P. (1969). Lifetime portfolio selection by dynamic stochastic programming.

The Review of Economics and Statistics , 51(3):239–246.[39] Zabel, E. (1973). Consumer choice, portfolio decisions, and transaction costs.

Econo-metrica , 41(2):321–335. 47 nline Appendices

A.1 Numerical DP Algorithms

If the state and control variables in a DP problem are continuous, then the value functionmust be approximated in some computationally tractable manner. It is common to approx-imate value functions with a ﬁnitely parameterized collection of functions; that is, we usesome functional form ˆ V ( x ; b ) , where b is a vector of parameters, and approximate a valuefunction, V ( x ) , with ˆ V ( x ; b ) for some parameter value b . For example, ˆ V could be a lin-ear combination of polynomials where b would be the weights on polynomials. After thefunctional form is ﬁxed, we focus on ﬁnding the vector of parameters, b , such that ˆ V ( x ; b ) approximately satisﬁes the Bellman equation.Numerical solutions to a DP problem are based on the Bellman equation: V t ( x , θ ) = max a ∈D ( x ,θ,t ) U t ( x , a ) + β E t (cid:8) V t +∆ t ( x + , θ + ) (cid:9) , (A.1)s.t. x + = G t ( x , θ, a , ω ) ,θ + = H t ( θ, ǫ ) , where x is the vector of continuous states in R k (in our portfolio examples, k is the numberof risky assets including options), θ is the vector of discrete states in a ﬁnite set Θ , V t ( x , θ ) is called the value function at time t ≤ T (the terminal value function V T ( x , θ ) is given), a is the action variable vector in its feasible set D ( x , θ, t ) , x + is the next-stage continuousstate vector with its transition function G t , θ + is the next-stage discrete state vector withits transition function H t , ω and ǫ are two vectors of random variables, and U t ( x , a ) is theutility function, β is the discount factor, and E t {·} is the expectation operator conditionalon information at time t .The following is the algorithm of the numerical DP with value function iteration for ﬁniteA.1orizon problems. Algorithm Numerical Dynamic Programming with Value Function Iteration for FiniteHorizon Problems

Initialization.

Choose the approximation nodes, X = { x i : 1 ≤ i ≤ M } ⊂ R k , for every t < T , and choose a functional form for ˆ V ( x , θ ; b ) . Let ˆ V ( x , θ ; b T ) ≡ V T ( x , θ ) . Thenfor t = t N − , t N − , . . . , t , iterate through steps 1 and 2. Step

1. Maximization Step. Compute v i,j = max a ∈D ( x i ,θ j ,t ) U t ( x i , a ) + β E n ˆ V ( x + , θ + ; b t +∆ t ) o s.t. x + = G t ( x i , θ j , a , ω ) ,θ + = H t ( θ j , ǫ ) , for each θ j ∈ Θ , x i ∈ X , ≤ i ≤ M. Step

2. Fitting Step. Using an appropriate approximation method, compute the b t suchthat ˆ V ( x , θ j ; b t ) approximates ( x i , v i,j ) data for each θ j ∈ Θ . Algorithm 1 shows that there are three main components in value function iteration for DPproblems: optimization, approximation, and integration.

A.1.1 Optimization

We apply some fast Newton-type solvers, such as the NPSOL optimization package, to solvethe maximization problems in the numerical DP algorithm (Algorithm 1). The maximizationproblem (15) is formulated in terms of k control variables ( δ + t and δ − t ), and k boundconstraints ( δ + t , δ − t ≥ ), ( k + 1) inequality constraints for no-shorting and no-borrowing,and other unknowns are expressed in terms of the controls, where k is the number of riskyassets. The problem (19) has one more control at each time, c t .A.2arallelization allows researchers to solve huge problems and is the foundation of modernscientiﬁc computation. Our work shows that parallelization can also be used eﬀectivelyin solving the dynamic portfolio optimization problems using the numerical DP method.The key fact is that at each maximization step, there are many independent optimizationproblems, one for each ( x i , θ j ) . In our portfolio problems there are often thousands or millionsof such independent problems. See Cai et al. (2015) for a more detailed discussion of parallelDP methods. A.1.2 Numerical Approximation

An approximation scheme consists of two parts: basis functions and approximation nodes.Approximation nodes can be chosen as uniformly spaced nodes, Chebyshev nodes, or someother speciﬁed nodes. From the viewpoint of basis functions, approximation methods canbe classiﬁed as either spectral methods or ﬁnite element methods. A spectral method usesglobally nonzero basis functions φ j ( x ) such that ˆ V ( x ; b ) = P j b j φ j ( x ) . Examples of spectralmethods include ordinary polynomial approximation, Chebyshev polynomial approximation,and shape-preserving Chebyshev polynomial approximation (Cai and Judd, 2013), and Her-mite approximation (Cai and Judd, 2015). In contrast, a ﬁnite element method uses locallynonzero basis functions φ j ( x ) , which are nonzero over sub-domains of the approximationdomain. Examples of ﬁnite element methods include piecewise linear interpolation, cubicsplines, B-splines, and shape-preserving rational splines (Cai and Judd, 2012). See Cai (2010,2019) and Judd (1998) for more details.We prefer Chebyshev polynomials when the value function is smooth. Chebyshev basispolynomials on [ − , are deﬁned as T j ( x ) = cos( j cos − ( x )) , while general Chebyshev basispolynomials on [ x min , x max ] are deﬁned as T j ((2 x − x min − x max ) / ( x max − x min )) for j =0 , , , . . . . For Chebyshev approximation, we know Chebyshev nodes are the most eﬃcientapproximation nodes, and it is often enough to let the number of Chebyshev nodes, m , beequal to the number of unknown Chebyshev coeﬃcients for one-dimensional problems; that is,A.3 = d + 1 , where d is the degree of Chebyshev polynomials. For one-dimensional problemsusing m approximation nodes in a state space [ x min , x max ] , the corresponding Chebyshevnodes are x i = ( z i + 1)( x max − x min ) / x min with z i = − cos ((2 i − π/ (2 m )) for i = 1 , ..., m . If we have Lagrange data { ( x i , v i ) : i = 1 , . . . , m } with v i = V ( x i ) , then V can be approximated by a degree- d Chebyshevpolynomial ˆ V ( x ; b ) = d X j =0 b j T j (cid:18) x − x min − x max x max − x min (cid:19) , (A.2)where b j can be computed with the Chebyshev regression algorithm, that is, b j = 2 m m X i =1 v i T j ( z i ) , j = 1 , . . . , d, (A.3)and b = P mi =1 v i /m .In a k -dimensional approximation problem, let the domain of the value function be { x = ( x , . . . , x k ) : x min ,j ≤ x j ≤ x max ,j , j = 1 , . . . k } , for some real numbers x min ,j and x max ,j , with x max ,j > x min ,j for j = 1 , . . . , k . Let x min =( x min , , . . . , x min ,k ) and x max = ( x max , , . . . , x max ,k ) . Then we denote [ x min , x max ] as the hyper-rectangle domain. Let α = ( α , . . . , α k ) be a vector of nonnegative integers. Let T α ( z ) denotethe product Q ≤ j ≤ k T α j ( z j ) for z = ( z , . . . , z k ) ∈ [ − , k . Let Z ( x ) = (cid:18) x − x min , − x max , x max , − x min , , . . . , x d − x min ,k − x max ,k x max ,k − x min ,k (cid:19) for any x = ( x , . . . , x k ) ∈ [ x min , x max ] . A.4sing these notations, the degree- d complete Chebyshev approximation for V ( x ) is ˆ V d ( x ; b ) = X ≤| α |≤ d b α T α ( Z ( x )) , (A.4)where | α | ≡ P kj =1 α j for the nonnegative integer vector α = ( α , . . . , α k ) . So the numberof terms with ≤ | α | ≤ d is (cid:0) d + kk (cid:1) for the degree- d complete Chebyshev approximation in R k . Note that the complete Chebyshev approximation does not suﬀer from the so-called“curse of dimensionality”. For example, a degree-2 complete Chebyshev approximation in a100-dimensional state space has only 5,151 terms.For k -dimensional problems in the state space [ x min , x max ] , if we use m grids in eachdimension, then there are m k approximation nodes with the tensor grid, and the values ofChebyshev nodes in dimension j are x i,j = ( z i + 1)( x max ,j − x min ,j ) / x min ,j . Usually we can just choose m = d + 1 , as a higher number of nodes has little improvementon accuracy in approximation.Using the Chebyshev nodes as approximation nodes x i = ( x i, , ..., x i,k ) and their corre-sponding values V ( x i ) = v i , we can compute Chebyshev coeﬃcients directly using a multi-dimensional Chebyshev regression algorithm. That is, b α = 2 k − n m k m k X i =1 v i T α ( Z ( x i )) , (A.5)for any nonnegative integer vector α = ( α , . . . , α k ) with ≤ α j ≤ d , where n is the numberof zero elements in α . A.5 .1.3 Numerical Integration In the objective function of the Bellman equations, we often need to compute the conditionalexpectation of V ( x + | x , a ) . When the random variable is continuous, we have to usenumerical integration to compute the expectation.One naive way is to apply Monte Carlo or pseudo Monte Carlo methods to computethe expectation. By the Central Limit Theorem, the numerical error of the integrationcomputed by (pseudo) Monte Carlo methods has a distribution that is close to normal, andso no bound for this numerical error exists. Moreover, the optimization problem often needshundreds or thousands of evaluations of the objective function. This implies that once oneevaluation of the objective function has a big numerical error, the previous iterations to solvethe optimization problem may make no sense, and the iterations may never converge to theoptimal solution. Thus it is not practical to apply (pseudo) Monte Carlo methods to theoptimization problem generally, unless the stopping criterion of the optimization problem isset very loosely.Therefore, it will be better to have a numerical integration method with a boundednumerical error. Here we present a common numerical integration method to use when therandom variable is normal.In the expectation operator of the objective function of the Bellman equation, if therandom variable has a normal distribution, then it will be good to apply the Gauss-Hermitequadrature formula to compute the numerical integration. That is, if we want to compute E { f ( Y ) } where Y has a distribution N ( µ, σ ) , then E { f ( Y ) } = (2 πσ ) − / Z ∞−∞ f ( y ) e − ( y − µ ) / (2 σ ) dy = (2 πσ ) − / Z ∞−∞ f ( √ σ x + µ ) e − x √ σdx. = π − m X i =1 ω i f ( √ σx i + µ ) , A.6here ω i and x i are the Gauss-Hermite quadrature weights and nodes over ( −∞ , ∞ ) , and m is the number of quadrature nodes.If Y is log normal, i.e., log( Y ) has a distribution N ( µ, σ ) , then we can assume that Y = e X where X ∼ N ( µ, σ ) , thus E { f ( Y ) } = E { f ( e X ) } . = π − m X i =1 ω i f (cid:16) e √ σx i + µ (cid:17) . If we want to compute a multidimensional integration, we could apply the product rule.For example, suppose that we want to compute E { f ( X ) } , where X is a random vector withmultivariate normal distribution N ( µ, Σ) over ( −∞ , + ∞ ) k , where µ is the mean columnvector and Σ is the covariance matrix, then we could do the Cholesky factorization ﬁrst, i.e.,ﬁnd a lower triangular matrix L such that Σ = LL ⊤ . This is feasible as Σ must be a positivesemi-deﬁnite matrix, from the covariance property. Thus, E { f ( X ) } = (cid:0) (2 π ) k det(Σ) (cid:1) − / Z R k f ( y ) e − ( y − µ ) ⊤ Σ − ( y − µ ) / d y = (cid:0) (2 π ) d det( L ) (cid:1) − / Z R k f (cid:16) √ Lx + µ (cid:17) e − x ⊤ x k/ det( L ) d x . = π − k m X i =1 · · · m X i k =1 ω i · · · ω i k f √ L , x i + µ , √ L , x i + L , x i ) + µ , · · · , √ k X j =1 L k,j x i j ) + µ k ! , where ω i and x i are the Gauss-Hermite quadrature weights and nodes over ( −∞ , ∞ ) , L i,j isthe ( i, j ) -element of L , and det( · ) means the matrix determinant operator. A.2 Problems with Stochastic Parameters

In this section, we solve the three-asset with-consumption model (19) with either a stochasticinterest rate or volatility. We use weekly time periods (i.e., ∆ t = 1 / years) in T = 3 yearsA.7so the number of periods is N = 156 ), and the discount rate is 0.05. In the examples,the assets available for trading include one risk-free asset with an interest rate r and k = 2 uncorrelated risky assets with log-normal returns. We assume that the utility function is U ( C ) = C − γ / (1 − γ ) with γ = 3 . The terminal value function is V T ( W, x , θ ) = ( r (1 − τ e ⊤ x ) W ) − γ ∆ t (1 − γ )(1 − β ) = W − γ · G T ( x , θ ) where G T ( x , θ ) := ( r (1 − τ e ⊤ x )) − γ ∆ t (1 − γ )(1 − β ) . The proportional transaction cost ratio is τ = 0 . for buying or selling risky assets. Thedefault parameter values are r = 0 . , µ = (0 . , . ⊤ , and σ = (0 . , . ⊤ , if they are notstochastic. In the numerical DP method, we choose degree- complete Chebyshev polyno-mials to approximate value functions, use tensor Chebyshev nodes as the approximationnodes, and implement the multi-dimensional product Gauss-Hermite quadrature rule with tensor quadrature nodes. A.2.1 Stochastic Interest Rate

We solve a case with a stochastic interest rate r t . The interest rate r t is assumed to be aMarkov chain. It has three possible values: r = 0 . , r = 0 . , r = 0 . . That is, r t isthe discrete state θ t in the model (19). Its transition probability matrix is  . . . . .

20 0 . .  , where its ( i, j ) element represents the transition probability from r t = r i to r t +∆ t = r j , for i, j = 1 , , .Figure A.1 shows the NTRs at the initial time with diﬀerent initial interest rates. In theA.8gure, the mark, the plus, and the circle represent the Merton points at three discrete statevalues: r = 0 . , . , and . , respectively, if we assume that the interest rate is ﬁxed at itsinitial value. The top-right square, the middle square, and the left-bottom square representsthe NTRs at the three discrete state values: r = 0 . , . , and . , respectively. We seethat a higher interest rate implies a NTR and a Merton point closer to the origin of thecoordinate system. This is consistent with the formula of the Merton point, (21). Moreover,the Merton point at r = 0 . or . stays away from the center of its corresponding square,and the “outside edges” of its NTR are closer to the point than the “inside edges”. That is,the circle is located in the left-bottom corner of its corresponding NTR, and the mark islocated in the top-right corner of its corresponding NTR. If the discrete state at time t is r = 0 . , then a portfolio before rebalancing that is located near the center of all the threesquares (i.e., a point in the northeast direction of the circle in the bottom-left square) tendsto have less trade than another portfolio that is the same distance from the circle but furtheraway from the center (i.e., a point in the southwest direction of the circle in the bottom-leftsquare). This conﬁrms the intuition that if the portfolio is in a position closer to the centerof all NTRs, there is little or no incentive to trade, because the expected direction of nextperiod’s trade is close to zero. A.2.2 Stochastic Volatility

Now we examine the case with stochastic volatility of risky returns, σ t = ( σ t, , σ t, ) ⊤ . Weassume that σ t, and σ t, are discrete Markov chains and independent of each other. Each σ t,i has two possible values: . and 0.24, and its transition probability matrix is  .

75 0 . .

25 0 .  for each i = 1 , .Figure A.2 displays the NTRs for four possible discrete states of ( σ t, , σ t, ) at the ini-A.9 .15 0.2 0.25 0.3 0.35 0.4 Risky asset 1 R i sky a ss e t Initial NTR

Border of NTR at r=0.03Merton point at r=0.03Border of NTR at r=0.04Merton point at r=0.04Border of NTR at r=0.05Merton point at r=0.05

Figure A.1: NTRs with a stochastic interest rate.tial time. The top-right polygon is the NTR for the state ( σ t, , σ t, ) = (0 . , . , thebottom-left square is the NTR for the state ( σ t, , σ t, ) = (0 . , . , and the top-left andthe bottom-right squares are respectively the NTRs for the states ( σ t, , σ t, ) = (0 . , . and ( σ t, , σ t, ) = (0 . , . . The mark, the diamond, the circle, and the plus are thecorresponding Merton points if we assume that ( σ t, , σ t, ) are ﬁxed at their initial values.We see that a higher σ t,i implies a NTR and a Merton point closer to the origin of thecoordinate system, consistent with the formula of the Merton point, (21). Moreover, when ( σ t, , σ t, ) = (0 . , . , the no-shorting and/or no-borrowing constraints become bindingfor some portfolios before rebalancing. For example, if the portfolio before re-allocation is(0.5,0.5), the top-right vertex of the top-right polygon, then the optimal trading strategy isto keep the current portfolio unchanged (i.e., the no-borrowing constraint is binding).A.10 .2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 0.6 0.65 0.7 Risky asset 1 R i sky a ss e t Initial NTR

Border of NTR at =(0.16,0.16)Merton point at =(0.16,0.16)Border of NTR at =(0.16,0.24)Merton point at =(0.16,0.24)Border of NTR at =(0.24,0.16)Merton point at =(0.24,0.16)Border of NTR at =(0.24,0.24)Merton point at =(0.24,0.24)

Figure A.2: NTRs with stochastic σ . References [1] Cai, Y. (2010).

Dynamic Programming and Its Application in Economics and Finance .PhD thesis, Stanford University.[2] Cai, Y. (2019). Computational methods in environmental and resource economics.

An-nual Review of Resource Economics

11, forthcoming. doi: 10.1146/annurev-resource-100518-093841.[3] Cai, Y., and K.L. Judd (2012). Dynamic programming with shape-preserving rationalspline Hermite interpolation.

Economics Letters , Vol. 117, No. 1, 161–164.[4] Cai, Y., and K.L. Judd (2013). Shape-preserving dynamic programming.

MathematicalMethods of Operations Research , Vol. 77, No. 3, 407–421.[5] Cai, Y., and K.L. Judd (2015). Dynamic programming with Hermite approximation.

Mathematical Methods of Operations Research , 81, 245–267.[6] Judd, K.L. (1998).