Deep learning for efficient frontier calculation in finance∗

Xavier Warin†

January 11, 2021
Abstract
We propose deep neural network algorithms to calculate the efficient frontier in some Mean-Variance and Mean-CVaR portfolio optimization problems. We show that we are able to deal with such problems when both the dimension of the state and the dimension of the control are high. Adding some additional constraints, we compare different formulations and show that a new projected feedforward network is able to deal with some global constraints on the weights of the portfolio while outperforming classical penalization methods. All the formulations developed are compared with one another: depending on the problem and its dimension, some formulations may be preferred.
Key words:
Deep neural networks, finance, Mean-Variance, Mean-CVaR, efficient frontier
Portfolio selection, with the aim of achieving a given expected return at an accepted level of risk, has long been a studied subject. The first problem studied was the one-period Mean-Variance problem in [Mar52], [Mar59]. An analytic solution was first proposed in [Mer72] for a positive covariance matrix when short selling is allowed. It is only in [LN00] that the multi-period case was solved, by a reformulation of the problem as a linear quadratic (LQ) one. The solution for the continuous case when short selling is allowed in a complete Black-Scholes market was proposed in [ZL00], still based on the LQ reformulation. The case without short selling was solved two years later in [LZL02]. Notice that, in this case, a solution is provided only if borrowing is allowed, so that the investment in the bond is not constrained. Extensions with random coefficients are proposed in [Lim04], and with uncertainty on the correlations in [CW14], [IP19]. Very recently, some results were obtained in [JMP20] supposing that the volatility is rough [GJR18] and follows some affine or quadratic Volterra model: the Mean-Variance problem is solved in some cases as the explicit solution of some Riccati backward stochastic differential equations.
All these theoretical results are interesting but of limited use to practitioners, as borrowing is generally not used and other operational constraints are added: investors first tend to limit the rebalancing of the portfolio (only achieved at some discrete dates) to limit transaction costs, by imposing constraints on the variation of the investment weights in the assets composing the portfolio. Second, they generally impose some strategic views on the weights, which are only allowed to stay within certain limits. In this case, numerical methods are necessary. Using conventional PDE methods, [WF10] have solved many constrained problems in the case of a single risky asset following a Black-Scholes dynamic.
(∗ This work is supported by FiME, Laboratoire de Finance des Marchés de l'Energie. † EDF R&D & FiME, xavier.warin at edf.fr. arXiv [q-fin.PM].)

The same methodology, in the case of an asset with jumps, has been used in [DF14]. This kind of approach can be used with at most two or three assets, and the resolution of a realistic portfolio selection problem is out of reach. In order to tackle the multi-dimensional problem with constraints, [CO16] have proposed two algorithms based on the LQ formulation: the first, based on pure forward simulations, is sub-optimal with constraints, while the second, using a backward recursion, is based on regressions, so that only rather low-dimensional cases can be solved.
The use of the variance to evaluate the risk has been questioned by both practitioners and researchers, as it penalizes gains as well as losses. Numerous downside risk measures penalizing losses or low gains have been proposed in the literature. Among them, Lower Partial Moments (LPM), proposed more than forty years ago in [Fis77], rely on two parameters: the $\gamma$ parameter, named the "Benchmark" parameter, is set by the investor, and the second one, $q$, represents the risk attitude of the investor. This risk model embeds a lot of classical models: for example, the case $q=0$ corresponds to the safety rule of [Roy52], the case $q=1$ corresponds to the expected regret of [DR99], and $q=2$ corresponds to the semi-deviation below the Benchmark parameter, or to the semi-variance if the Benchmark parameter is set to the expected wealth.
Another related classical risk measure is the CVaR introduced in [RU+00], corresponding to the expected loss beyond the VaR measure. As shown in [RU+00], the CVaR calculation can be parametrized as the minimum over a parameter $\alpha$ of a function value linked to an LPM risk measure with parameter $q=1$.
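As a quick numerical illustration of this Rockafellar–Uryasev parametrization (an illustrative sketch, not code from the paper), one can check on a sample of losses that $\min_\alpha\, \alpha + \mathbb{E}[(L-\alpha)^+]/(1-\lambda)$ coincides with the average of the worst $(1-\lambda)$ fraction of losses:

```python
import numpy as np

def cvar_tail_average(losses, lam):
    """CVaR as the average of the worst (1 - lam) fraction of losses."""
    losses = np.sort(np.asarray(losses, dtype=float))
    k = int(round((1 - lam) * len(losses)))  # size of the tail
    return losses[-k:].mean()

def cvar_rockafellar_uryasev(losses, lam):
    """CVaR as min over alpha of alpha + E[(L - alpha)^+] / (1 - lam).
    On a finite sample the minimum is attained at a sample quantile,
    so searching over the sample values is enough."""
    losses = np.asarray(losses, dtype=float)
    values = [a + np.maximum(losses - a, 0.0).mean() / (1 - lam) for a in losses]
    return min(values)

losses = np.arange(1.0, 101.0)                # losses 1, 2, ..., 100
print(cvar_tail_average(losses, 0.9))         # mean of the 10 worst losses: 95.5
print(cvar_rockafellar_uryasev(losses, 0.9))  # same value: 95.5
```

Both estimators agree on this sample, which is exactly the equivalence used by the numerical methods discussed below.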
Using this formulation, [Gao+17] studied the continuous Mean-CVaR case, giving semi-analytical solutions to the Mean-CVaR problem when the wealth is bounded. Indeed, the boundedness of the wealth, or of the control as studied in [MY17], is necessary, as the general case is not well posed for downside risk measures, as shown in [JYZ05]: using this kind of risk measure, and without constraints, investors tend to gamble more and more if the market is in bad shape, and investments in the most risky assets tend toward infinity. Therefore, the Mean-CVaR problem has to be solved with constraints or in discrete time, and numerical methods are necessary to optimize the portfolio. A classical approach consists in using the auxiliary formulation proposed by [RU+00] together with a gradient descent method on $\alpha$, as proposed in [MY17]. However, this procedure is very time consuming and can only be carried out in low dimension.
In order to optimize a portfolio with a general Mean-Risk measure, neural networks appear to be an interesting choice. Neural networks are known to be able to approximate functions in high dimension. Very recently, neural networks have been used in risk management, first in [Bue+19]. Some cases with constraints on the hedging products and some downside risk measures have been studied in [FMW19]. In both cases, the reported results are promising. Some cases of Asset Liability Management have been reported very recently in [KT20].
In this article, we show that neural networks are able to calculate very realistic efficient frontiers in the Mean-Variance case and the Mean-CVaR case. The main findings developed are the following ones:

• Neural networks are able to solve some continuous Mean-Variance problems accurately in low and high dimension for the control: using different formulations, we first solve the continuous Mean-Variance problem without constraints in the Black-Scholes model with a direct formulation and the LQ formulation.
We introduce for each formulation two methods to approximate the efficient frontier: the first one approximates the frontier point by point, while the second permits evaluating the global frontier in a single calculation. In all cases, the frontier is correctly approximated in dimensions 4 and 20 by comparison with the analytical solution. Then we use the global and the point by point local methods and adapt the network to solve the problem when no short selling and no borrowing are allowed. Although we do not have any analytical solution, all the frontiers calculated are very similar, indicating that they are correctly evaluated.
• Neural networks are able to deal with some operational constraints: in order to calculate the frontier when some global and local constraints are added to the resolution, we propose different formulations based on different penalization methods. We show that the global constraints are hard to satisfy by penalization, and we introduce a new projected feedforward network which is able to deal with the fact that the weights in the portfolio are all positive, that their sum is equal to one, and that each weight lies between some given bounds. Numerical results still indicate that the frontier is correctly calculated.

• Neural networks are able to deal with the Mean-Variance problem with a state in high dimension: the Black-Scholes case is interesting but special, as the state of the problem only involves the global wealth: as the number of assets increases, only the dimension of the control increases. In a dedicated section, we use some Heston model and show that the frontiers are still correctly calculated, with or without constraints. As the state of the problem depends on the wealth and the variances of the assets, we show that we are able to solve a problem that is high dimensional in both the state and the control.

• Neural networks can solve some Mean-CVaR problems accurately: we solve some more difficult Mean-CVaR problems and show that in some cases, solutions are sometimes trapped in local minima that can be far from the optimum. In this case, multiple calculations can be carried out to recover the correct frontier for some global formulations, while the point by point formulation always gives oscillations.

We also show that, depending on the problem, one formulation may be preferred:

• For the Mean-Variance problem, a point by point approximation of the frontier is the best choice for the Black-Scholes model, while the global formulation is the best choice for the Heston model.
• Whereas for the Mean-CVaR problem, the global approach with a randomization of the risk coefficients is the only way to always obtain good results in the high dimensional case (or what seems to be good results, because no reference is available).

The article is organized as follows. In the first section, we briefly recall what a neural network is; then, in the second section, we solve the continuous Mean-Variance problem. Some constraints are added in the third section. The fourth and fifth sections focus respectively on the use of the Heston model in the Mean-Variance setting and on the Mean-CVaR setting with a Black-Scholes model.
In the whole sequel, we denote by $(\Omega, \mathcal{F}, \mathbb{P}, \mathcal{F}_t)$ a filtered probability space. For each set $A$ of $\mathbb{R}^d$, we denote by $L^2_{\mathcal{F}_t}(0,T,A)$ the set of the $\mathcal{F}_t$-adapted square integrable processes with values in $A$. For an element $x$ of $\mathbb{R}^d$, $x^+ = \max(x,0)$ is applied component by component. At last, we suppose that the risk free rate is 0, which is equivalent to discounting all asset values with a rate $r$, so that we consider all risky assets with an adapted trend equal to the surplus of trend with respect to the non-risky asset.

Deep neural networks are designed to approximate unknown or large classes of functions. They rely on the composition of simple functions, and appear to provide an efficient way to handle high-dimensional approximation problems, the "optimal" parameters being found by stochastic gradient descent methods. We here use a basic type of network dubbed feedforward networks. We fix the input dimension $d_0$, which will represent the dimension of the state variable $x$, the output dimension $d_1$, and a number $L+1 \in \mathbb{N} \setminus \{1,2\}$ of layers with $m_\ell$, $\ell = 0,\dots,L$, the number of neurons (units or nodes) on each layer: the first layer is the input layer with $m_0 = d_0$, the last layer is the output layer with $m_L = d_1$, and the $L-1$ hidden layers have $m_\ell = m$, $\ell = 1,\dots,L-1$, neurons. A feedforward neural network is a function from $\mathbb{R}^{d_0}$ to $\mathbb{R}^{d_1}$ defined as the composition

$x \in \mathbb{R}^{d_0} \longmapsto A_L \circ \varrho \circ A_{L-1} \circ \dots \circ \varrho \circ A_1(x) \in \mathbb{R}^{d_1}$.   (1)

Here $A_\ell$, $\ell = 1,\dots,L$, are affine transformations: $A_1$ maps from $\mathbb{R}^{d_0}$ to $\mathbb{R}^m$, $A_2,\dots,A_{L-1}$ map from $\mathbb{R}^m$ to $\mathbb{R}^m$, and $A_L$ maps from $\mathbb{R}^m$ to $\mathbb{R}^{d_1}$, represented by

$A_\ell(x) = W_\ell x + \beta_\ell$,   (2)

for a matrix $W_\ell$ called weight and a vector $\beta_\ell$ called bias term; $\varrho : \mathbb{R} \to \mathbb{R}$ is a nonlinear function, called activation function, applied component-wise on the outputs of $A_\ell$, i.e., $\varrho(x_1,\dots,x_m) = (\varrho(x_1),\dots,\varrho(x_m))$. Standard examples of activation functions are the sigmoid, the ReLU, the ELU and tanh. Generally the number of layers is kept low (between 2 and 4) in order to avoid the problem of vanishing gradients, and the number of neurons depends on $d_0$ but is generally kept between 10 and 100.
All these matrices $W_\ell$ and vectors $\beta_\ell$, $\ell = 1,\dots,L$, are the parameters of the neural network, and can be identified with an element $\theta \in \mathbb{R}^{\kappa_{L,m}}$, where $\kappa_{L,m} = \sum_{\ell=0}^{L-1} m_{\ell+1}(1+m_\ell) = m(1+d_0) + m(1+m)(L-2) + d_1(1+m)$ is the number of parameters, where we fix $d_0$, $d_1$, $L$ and $m$. The fundamental result of Hornik et al. [HSW89] justifies the use of neural networks as function approximators by proving that the set of all feedforward networks, letting $m$ vary, is dense in $L^2(\nu)$ for any finite measure $\nu$ on $\mathbb{R}^{d_0}$, whenever $\varrho$ is continuous and non-constant.
In the whole article, we use three hidden layers and a number of neurons equal to $10+d$, so that the dimension of the parameter space of the neural network is $\hat\kappa = \kappa_{4,10+d}$. Gradient descent is implemented in Tensorflow [Aba+15] using the ADAM optimizer [KB14]. The activation function used is the tanh function.

In this section, we suppose that the assets follow a Black-Scholes model:

$\frac{dS_t}{S_t} = \mu\, dt + \sigma\, dW_t$   (3)

with $S_t$ with values in $\mathbb{R}^d$ with components $S_{t,j}$, $j = 1,\dots,d$, $\mu$ with values in $\mathbb{R}^d$, $\sigma \in \{\mathrm{diag}(v),\, v \in \mathbb{R}^d_{>0}\}$ the set of diagonal matrices with strictly positive values, and $W_t = (\hat W^i_t)_{i=1,d}$ where the $\hat W^i$ are $\mathcal{F}_t$-adapted Brownian motions correlated with a correlation matrix $\rho$.
In the continuous setting, we denote by $\xi = (\xi_t)_{t \ge 0}$ the investment strategy with values in $\mathbb{R}^d$ until maturity $T$, with component $i$ corresponding to the fraction of wealth invested in asset $i$. We suppose $\xi_t$ is in $L^2_{\mathcal{F}_t}(0,T,\mathbb{R}^d)$. The portfolio value at date $T$ verifies:

$X^\xi_T = X_0 + \int_0^T \xi_t X^\xi_t \cdot \frac{dS_t}{S_t} = X_0 + \int_0^T X^\xi_t \xi_t \cdot (\mu\, dt + \sigma\, dW_t)$   (4)

The Mean-Variance problem consists in finding strategies $\xi$, adapted to the available information, that minimize:

$(J_1(\xi), J_2(\xi)) = (-\mathbb{E}[X^\xi_T],\ \mathbb{E}[(X^\xi_T - \mathbb{E}[X^\xi_T])^2])$.   (5)

An admissible strategy $\xi^*$ is said to be efficient if there is no other strategy $\psi$ such that $J_1(\psi) \le J_1(\xi^*)$, $J_2(\psi) \le J_2(\xi^*)$ and at least one of the two previous inequalities is strict. Then $(J_1(\xi^*), J_2(\xi^*))$ is an efficient point, and the set of all efficient points defines the efficient frontier.
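The efficiency notion above is plain Pareto dominance on the pairs $(J_1, J_2)$. As a small illustrative sketch (not from the paper), one can filter a finite set of candidate criterion points for efficiency:

```python
def dominates(p, q):
    """p dominates q if p is no worse on both criteria and strictly better on one."""
    return p[0] <= q[0] and p[1] <= q[1] and (p[0] < q[0] or p[1] < q[1])

def efficient_points(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# (J1, J2) = (-expected wealth, variance) for four hypothetical strategies
points = [(-1.0, 1.0), (-2.0, 2.0), (-1.5, 1.2), (-0.5, 3.0)]
print(efficient_points(points))  # (-0.5, 3.0) is dominated by (-1.0, 1.0)
```

Only non-dominated points survive; the efficient frontier is the image of this set as the strategy varies.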
By convexity (see [ZL00]), this Pareto frontier can be calculated by minimizing the function defined as the weighted average of the two criteria:

$J_1(\xi) + \beta J_2(\xi)$   (6)

where the parameter $\beta > 0$. The optimal amount invested, $\alpha_t = \xi^*_t X^{\xi^*}_t$, where $\xi^*$ minimizes (6) and $X^{\xi^*}_t$ is the associated optimal portfolio, is given by:

$\alpha(X^{\xi^*}_t) = -(\sigma\rho\sigma)^{-1}\mu \left[ X^{\xi^*}_t - X_0 - \frac{e^{RT}}{2\beta} \right], \quad 0 \le t \le T$   (7)

where $R = \mu \cdot ((\sigma\rho\sigma)^{-1}\mu)$.
Problem (6) does not admit any dynamic programming principle, due to the average term in the variance definition, so that no PDE or regression method can be used directly to solve it. The derivation of the analytic solution is based on an LQ auxiliary equivalent formulation, as shown in [ZL00]. Indeed, the solution of problem (6) is the solution of

$\xi^* = \operatorname{argmin}_{\xi \in L^2_{\mathcal{F}_t}(0,T,\mathbb{R}^d)} \mathbb{E}[(X^\xi_T - \gamma)^2]$   (8)

where

$\gamma = \frac{1}{2\beta} + \mathbb{E}[X^{\xi^*}_T]$.   (9)

It is then possible to estimate the efficient frontier by solving problem (8) while letting $\gamma$ vary. This kind of formulation is generally used by conventional methods such as regressions [CO16] and PDEs [WF10], [DF14], as the dynamic programming principle can then be used.

Equation (4) is first discretized on a grid of dates $t_i$, $0 \le i < N$, such that $t_0 = 0$, $0 < t_i < T$ for $0 < i < N$, and we set $t_N = T$. We denote by $\phi_i$ a position (as a fraction of the wealth invested in each asset) at date $t_i$, and $\phi = (\phi_i)_{i=0,\dots,N-1}$. The portfolio value is then given by

$X^\phi_T = X_0 + \sum_{i=0}^{N-1} \phi_i X^\phi_{t_i} \cdot \frac{S_{t_{i+1}} - S_{t_i}}{S_{t_i}}$   (10)

We first present the methodology used to solve (6) for a given $\beta$ (or (8) for a given $\gamma$). The state of the system only depends on $t$ and the wealth $x$; we therefore classically introduce a single network with parameters $\theta \in \mathbb{R}^{\hat\kappa}$ taking $t$ and the wealth $x$ as input (so in dimension 2) and with output $\hat\phi^\theta(t,x)$ in dimension $d$, where $\hat\phi^\theta_j(t_i,\cdot)$ is an approximation of $\phi_{ij}$. No activation function is used on the final output, so that the network output potentially covers $\mathbb{R}^d$.
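The feedforward architecture used here (three hidden layers of $10+d$ neurons, tanh activations, no activation on the output) can be sketched in plain NumPy; the parameter count matches the formula for $\kappa_{L,m}$ above. This is an illustrative sketch, not the paper's TensorFlow code:

```python
import numpy as np

def init_mlp(d0, d1, m, L, rng):
    """Initialize an MLP with L affine layers, i.e. L-1 hidden layers of width m."""
    sizes = [d0] + [m] * (L - 1) + [d1]
    return [(rng.standard_normal((sizes[l + 1], sizes[l])) / np.sqrt(sizes[l]),
             np.zeros(sizes[l + 1])) for l in range(L)]

def mlp(params, x):
    """x -> A_L o tanh o A_{L-1} o ... o tanh o A_1(x); no output activation."""
    for W, b in params[:-1]:
        x = np.tanh(x @ W.T + b)
    W, b = params[-1]
    return x @ W.T + b

d = 4                                        # number of assets
params = init_mlp(d0=2, d1=d, m=10 + d, L=4, rng=np.random.default_rng(0))
out = mlp(params, np.ones((5, 2)))           # batch of 5 inputs (t, x)
n_params = sum(W.size + b.size for W, b in params)
# kappa_{L,m} = m(1+d0) + m(1+m)(L-2) + d1(1+m) = 42 + 420 + 60 = 522 here
print(out.shape, n_params)
```

With $d_0 = 2$, $d_1 = 4$, $m = 14$ and $L = 4$, the closed-form count and the actual number of trainable scalars agree.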
Then we solve

$\theta^* = \operatorname{argmin}_{\theta \in \mathbb{R}^{\hat\kappa}} -\mathbb{E}[X^{\hat\phi^\theta}_T] + \beta\, \mathbb{E}[(X^{\hat\phi^\theta}_T - \mathbb{E}[X^{\hat\phi^\theta}_T])^2]$   (11)

for problem (6), or

$\theta^* = \operatorname{argmin}_{\theta \in \mathbb{R}^{\hat\kappa}} \mathbb{E}[(X^{\hat\phi^\theta}_T - \gamma)^2]$   (12)

for problem (8).
Instead of evaluating the frontier by solving (11) for each chosen $\beta$ (or (12) for each $\gamma$), a second resolution method aims at evaluating the global frontier in one calculation. In this case, taking for example problem (6), we introduce a network with input $(t,x,\beta)$ in dimension 3 and still with an output in dimension $d$. The associated strategy is denoted $\hat\phi^{\theta;\beta}$ to emphasize the dependence on $\beta$, and we solve:

$\theta^* = \operatorname{argmin}_{\theta \in \mathbb{R}^{\hat\kappa}} \mathbb{E}\Big[ -\mathbb{E}[X^{\hat\phi^{\theta;\hat\beta}}_T \,/\, \hat\beta] + \hat\beta\, \mathbb{E}[(X^{\hat\phi^{\theta;\hat\beta}}_T - \mathbb{E}[X^{\hat\phi^{\theta;\hat\beta}}_T \,/\, \hat\beta])^2 \,/\, \hat\beta] \Big]$   (13)

where $\hat\beta$ is a random variable whose density $p$ can be taken:

• either with discrete values, so $p(x) = \frac{1}{K}\sum_{j=1}^K \delta_{\beta_j}(x)$, where $(\beta_i)_{i=1,K}$ is a set of values where we want to approximate the frontier (generally $\beta_1 = 0$); in this case it is equivalent to minimize

$\theta^* = \operatorname{argmin}_{\theta} \sum_{k=1}^K -\mathbb{E}[X^{\hat\phi^{\theta;\beta_k}}_T] + \beta_k\, \mathbb{E}[(X^{\hat\phi^{\theta;\beta_k}}_T - \mathbb{E}[X^{\hat\phi^{\theta;\beta_k}}_T])^2]$   (14)

• or such that $p(x)$ is, for example, a uniform law on $[\underline\beta, \bar\beta]$ representing where we want to approximate the frontier.

All results in the following section are obtained using batches of size 300; the learning rate decreases linearly with the gradient iterations from 1e−…/d to 1e−…/(10 d). The number of iterations is set to 15000. After training using 40 points ($K = 40$ in the global estimation in equation (14), or 40 points to approximate the frontier point by point), each point of the frontier is plotted by calculating the mean and the variance with 1e… simulations.

We take $\mu = (0.…, 0.…, 0.…, 0.…)^T$, the diagonal of $\sigma$ given by $(0.…, 0.…, 0.…, 0.…)$, and $\rho$ a $4 \times 4$ correlation matrix (its surviving entries include $0.26$, $-0.43$, $0.003$, $0.233$ and $-0.33$). We suppose that $T = 1$ year and that the rebalancing is achieved twice a week ($N = 104$) to approach a continuous rebalancing. We plot the frontier obtained with $\beta$ values between 0.… and 0.7. The analytic solution is obtained by applying the continuous optimal control (7) at each rebalancing date. In the sequel, in the figures, "Point by point" stands for an approximation using (11), "point by point auxiliary" for an approximation using (12), "global" for an approximation using (14), and "global random" for an approximation using (13) with a uniform law for $\hat\beta$. At last, "global auxiliary" and "global auxiliary random" stand respectively for a global resolution with deterministic and random $\gamma$ values for the auxiliary problem (8).

Figure 1: Efficient frontier in dimension 4 (point by point; global; global random).

Results in figure 1 show that the frontiers obtained all lie on the analytic frontier, but the global estimation of the direct problem (6) tends to fail to reproduce the whole curve: portfolios with very high returns are not found. Nevertheless, the results are very good.

We take $\mu_i = 0.…\,i - …$ and the diagonal matrix $\sigma$ such that $\sigma_{i,i} = 0.…\,i - …$ for $i = 1,\dots,20$. The correlation is picked randomly. Results in figure 2 show that the point by point approximation, especially for the direct approach, is less effective for very small $\beta$. The global approach can slightly outperform the analytic solution, as the continuous formula is applied to a time discrete problem.

Figure 2: Efficient frontier in dimension 20 (point by point; global; global random).

In this section, we focus on the discrete case, still trying to solve (6) or (8).
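To make the discrete objective concrete, the sketch below (an illustrative stand-in using made-up constant weights and independent assets, not the paper's trained network) simulates the discretized wealth (10) under Black-Scholes dynamics and evaluates the Mean-Variance criterion $-\mathbb{E}[X_T] + \beta\,\mathrm{Var}(X_T)$ by Monte Carlo:

```python
import numpy as np

def terminal_wealth(phi, mu, sigma, T, N, n_paths, x0=1.0, seed=0):
    """Discretized wealth (10): X_{i+1} = X_i + phi . X_i (S_{i+1}-S_i)/S_i,
    with independent log-normal asset returns (identity correlation, for simplicity)."""
    rng = np.random.default_rng(seed)
    dt = T / N
    x = np.full(n_paths, x0)
    for _ in range(N):
        dw = rng.standard_normal((n_paths, len(mu))) * np.sqrt(dt)
        rel = np.expm1((mu - 0.5 * sigma**2) * dt + sigma * dw)  # (S_{i+1}-S_i)/S_i
        x = x + x * (rel @ phi)                                  # constant weights phi
    return x

def mv_objective(x_t, beta):
    """Monte Carlo version of -E[X_T] + beta * Var(X_T)."""
    return -x_t.mean() + beta * x_t.var()

mu, sigma = np.array([0.05, 0.1]), np.array([0.1, 0.3])
x_t = terminal_wealth(np.array([0.5, 0.5]), mu, sigma, T=1.0, N=104, n_paths=20000)
print(mv_objective(x_t, beta=1.0))
# sanity check: with zero weights the wealth stays at x0 and the objective is -x0
x_flat = terminal_wealth(np.array([0.0, 0.0]), mu, sigma, T=1.0, N=104, n_paths=100)
print(mv_objective(x_flat, beta=1.0))  # exactly -1.0
```

In the actual method, the constant `phi` is replaced by the network output $\hat\phi^\theta(t_i, X_{t_i})$ and the loss is differentiated through the simulation.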
Remark 4.1
We always suppose that the allocation weights belong to a convex set and then (8) and (6) remain equivalent.
In the whole section, we impose that there is no short selling, no borrowing, and that the whole wealth is invested. Then, in the two sections below, the $\phi_i$ in equation (10) satisfy:

$0 \le \phi_{ij} \le 1, \quad \forall i = 0,\dots,N-1,\ j = 1,\dots,d,$
$\sum_{j=1}^d \phi_{ij} = 1, \quad \forall i = 0,\dots,N-1.$   (15)

We set $Y_i = \frac{S_{t_{i+1}} - S_{t_i}}{S_{t_i}}$, so that:

$X^\phi_T = X_0 \prod_{i=0}^{N-1} (1 + \phi_i \cdot Y_i)$   (16)

In all the cases in this section, we take $T = 10$ years and we suppose that rebalancing is achieved once a month. In dimensions 4 and 20, trends and volatilities are the same as in the continuous case, but correlations are increased; for example, in dimension 4, $\rho$ is a $4 \times 4$ correlation matrix whose surviving entries include $0.894$, $0.805$, $-0.571$, $0.59$ and $-0.772$.

In this section, we suppose that only the constraints (15) are imposed. We use a similar network as in the previous section with parameters $\theta$, except that we use a sigmoid activation function at the output, giving a network $\hat\kappa^\theta(t,X)$ for the point by point evaluation with values in $[0,1]^d$. The investment weights are then given by

$\hat\phi^\theta(t,X) = \frac{\hat\kappa^\theta(t,X)}{\sum_{i=1}^d \hat\kappa^\theta_i(t,X)}$   (17)

Then equation (11) or (12) can be solved. Similarly, for a global formulation, the network for the direct problem (6) takes $(t,x,\beta)$ as input, and the weights are defined as

$\hat\phi^{\theta;\beta}(t,X,\beta) = \frac{\hat\kappa^\theta(t,X,\beta)}{\sum_{i=1}^d \hat\kappa^\theta_i(t,X,\beta)}$.   (18)

Then it is possible to solve (13), or to minimize the objective function corresponding to the auxiliary problem.
In the sequel of the article, we will denote by static optimization ("Static" in the figures) the optimal constant mix strategy, such that the weights are kept constant during the whole period. The optimization constrained with constant weights can then also be used to give an efficient frontier within this class of strategies. When "Static" is not specified, a plot is carried out with a dynamic optimization.

Remark 4.2
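A minimal sketch of the renormalization (17): a sigmoid output in $[0,1]^d$ is divided by the sum of its components, so the resulting weights automatically satisfy the constraints (15). This is illustrative, not the paper's code:

```python
import numpy as np

def simplex_weights(z):
    """Map raw network outputs z to portfolio weights: sigmoid, then renormalize,
    so the weights are all positive and sum to one."""
    kappa = 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))  # sigmoid, in (0, 1)
    return kappa / kappa.sum()

w = simplex_weights([0.0, 1.0, 2.0, -1.0])   # hypothetical raw outputs
print(w.sum())        # 1.0 up to rounding
print((w > 0).all())  # True: no short selling
```

Any raw output vector is mapped onto the simplex, which is why no penalization is needed for the constraints (15) in this formulation.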
For a static optimization, no neural network is used and the problem is reduced to an optimization in dimension $d$.

In this section, we take an initial learning rate equal to 0.…e−…/d, linearly decreasing with the iterations to 0.…e−…/(10 d). The number of gradient iterations is set to 15000 and the batch size to 100.

Figure 3: Efficient frontier with static optimization (dimension 4; dimension 20).

In figure 3, we plot the efficient frontier calculated by static optimization for the different methods. In dimension 4, all methods seem to recover the whole frontier, while in dimension 20, very high returns are hard to catch with most of the methods: only the global auxiliary method can catch the very high returns.

Figure 4: Efficient frontier with dynamic optimization (dimension 4; dimension 20).

As seen in figure 4, dealing with the dynamic case, the global method with deterministic $\beta$ (respectively $\gamma$) coefficients and the point by point methods give the same results, except for very small $\beta$ values corresponding to very high $\gamma$ values.
The randomized version of the global approach does not give good results here, as shown in figure 5: in dimension 4, the direct global approach with randomization only gives a part of the curve, while the auxiliary random version gives another part of the optimal curve. At last, in dimension 20, the randomized auxiliary version is sub-optimal.

Figure 5: Efficient frontier with dynamic optimization for the global random approach: methods with stochastic risk coefficients compared to the reference calculated with the global auxiliary method (dimension 4; dimension 20).

Investors often impose other constraints on the portfolio:

• Some are local constraints: the variations of the weights are limited from one step to another, in order to face liquidity constraints and to reduce transaction costs:

$|\phi_j(t_{i+1}, X_{t_{i+1}}) - \phi_j(t_i, X_{t_i})| \le \eta_j, \quad \text{for } j = 1,\dots,d \text{ and } i = 0,\dots,N-2$   (19)

• Some are global constraints: weights are only allowed to stay in some convex compact set:

$\underline\phi_j \le \phi_j(t_i, X_{t_i}) \le \bar\phi_j, \quad \text{for } j = 1,\dots,d.$   (20)

We first compare some formulations to solve the problem with the previous constraints. We test them on the point by point estimation of the frontier in dimension 4 in the next subsection. Then we use the best model to achieve an exhaustive comparison of the point by point and global approaches.

• The first model consists in taking the same representation for the weights as in equation (17). Constraints (19), (20) are then imposed by penalization of the objective function, and the parameters $\theta$ minimize:

$J_1(\hat\phi^\theta) + \beta J_2(\hat\phi^\theta) + \frac{1}{\epsilon} \sum_{j=1}^d \sum_{i=0}^{N-2} \mathbb{E}\Big[ \big(|\hat\phi^\theta_j(t_{i+1}, X^{\hat\phi^\theta}_{t_{i+1}}) - \hat\phi^\theta_j(t_i, X^{\hat\phi^\theta}_{t_i})| - \eta_j\big)^+ \Big] + \frac{1}{\epsilon} \sum_{j=1}^d \sum_{i=0}^{N-1} \mathbb{E}\Big( \big(\hat\phi^\theta_j(t_i, X^{\hat\phi^\theta}_{t_i}) - \bar\phi_j\big)^+ + \big(\underline\phi_j - \hat\phi^\theta_j(t_i, X^{\hat\phi^\theta}_{t_i})\big)^+ \Big)$   (21)

where $\epsilon$ is a small penalization parameter.

• The second model consists in introducing the variation of the weights between two dates, similarly as in [FMW19]. We introduce $\xi^\theta(t,X)$, a neural network taking the time and the portfolio value as input, with a tanh activation function at the output, so with values in $[-1,1]^d$. The initial portfolio weight is represented by a vector $\tilde\theta$ in $\mathbb{R}^d$, and the weight in the portfolio at a given date is given by:

$\hat\phi^{\theta,\tilde\theta}(t_i, X^{\hat\phi^{\theta,\tilde\theta}}_{t_i}) = \tilde\theta + \eta \sum_{l=1}^i \xi^\theta(t_l, X^{\hat\phi^{\theta,\tilde\theta}}_{t_l})$   (22)

We note $\bar\theta = (\theta, \tilde\theta)$. The local constraints are taken into account in this formulation.
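The two penalty terms of (21) can be sketched as follows (an illustrative NumPy version with hypothetical weight paths, not the paper's implementation):

```python
import numpy as np

def local_penalty(weights, eta):
    """Sum over dates and assets of (|phi_{i+1,j} - phi_{i,j}| - eta_j)^+, cf. (19)."""
    excess = np.abs(np.diff(weights, axis=0)) - eta
    return np.maximum(excess, 0.0).sum()

def bound_penalty(weights, lower, upper):
    """Sum over dates and assets of (phi_ij - upper_j)^+ + (lower_j - phi_ij)^+, cf. (20)."""
    return (np.maximum(weights - upper, 0.0) + np.maximum(lower - weights, 0.0)).sum()

# two rebalancing dates, two assets; bounds [0.2, 0.8], max variation 0.3 per asset
weights = np.array([[0.5, 0.5],
                    [0.9, 0.1]])
print(local_penalty(weights, eta=0.3))               # (0.4-0.3)^+ + (0.4-0.3)^+ ~ 0.2
print(bound_penalty(weights, lower=0.2, upper=0.8))  # (0.9-0.8)^+ + (0.2-0.1)^+ ~ 0.2
```

Scaled by $1/\epsilon$, these terms are added to the Mean-Variance loss, so a small $\epsilon$ pushes the optimizer toward feasible weight paths.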
It remains to impose the global bounds on the weights and the constraint on the summation of the weights. This leads to the minimization in $\bar\theta$ of

$J_1(\hat\phi^{\bar\theta}) + \beta J_2(\hat\phi^{\bar\theta}) + \frac{1}{\epsilon} \sum_{i=0}^{N-1} \mathbb{E}\Big| \sum_{j=1}^d \hat\phi^{\bar\theta}_j(t_i, X^{\hat\phi^{\bar\theta}}_{t_i}) - 1 \Big| + \frac{1}{\epsilon} \sum_{j=1}^d \sum_{i=0}^{N-1} \mathbb{E}\Big( \big(\hat\phi^{\bar\theta}_j(t_i, X^{\hat\phi^{\bar\theta}}_{t_i}) - \bar\phi_j\big)^+ + \big(\underline\phi_j - \hat\phi^{\bar\theta}_j(t_i, X^{\hat\phi^{\bar\theta}}_{t_i})\big)^+ \Big)$   (23)

• The third model consists in imposing the global constraints directly at the output of the network (22) by introducing:

$\hat\phi^{\theta,\tilde\theta}(t_i, X^{\hat\phi^{\theta,\tilde\theta}}_{t_i}) = \Big(\big(\tilde\theta + \eta \sum_{l=1}^i \xi^\theta(t_l, X^{\hat\phi^{\theta,\tilde\theta}}_{t_l})\big) \wedge \bar\phi\Big) \vee \underline\phi$   (24)

The local and global constraints on the weights are taken into account, and it remains to impose that the sum of the weights is equal to one, giving the following expression to minimize in $\bar\theta$:

$J_1(\hat\phi^{\bar\theta}) + \beta J_2(\hat\phi^{\bar\theta}) + \frac{1}{\epsilon} \sum_{i=0}^{N-1} \mathbb{E}\Big| \sum_{j=1}^d \hat\phi^{\bar\theta}_j(t_i, X^{\hat\phi^{\bar\theta}}_{t_i}) - 1 \Big|$   (25)

• The fourth and last model consists in using a feedforward network giving as output $\kappa^\theta(t,X)$ in $[0,1]^d$, by using a sigmoid activation function at the output of the network. A rescaling is achieved to get weights in $[\underline\phi, \bar\phi]$ by

$\hat\phi^\theta(t,X) = \underline\phi + \kappa^\theta(t,X)(\bar\phi - \underline\phi)$.   (26)

It remains to get an output in the hyperplane

$1 = \sum_{i=1}^d \hat\phi^\theta_i(t,X)$,   (27)

which can be achieved with the following projection algorithm applied to the network:

Algorithm 1
Projection algorithm applied to the output of the network.
Input: $\hat\phi^\theta(t,X)$ with values in $[\underline\phi, \bar\phi]$
for $i = 1,\dots,d$ do
  $\hat\phi^\theta_i(t,X) = \big[\big(\hat\phi^\theta_i(t,X) + (1 - \sum_{j=1}^d \hat\phi^\theta_j(t,X))\big) \wedge \bar\phi_i\big] \vee \underline\phi_i$
end for
Return: $\hat\phi^\theta(t,X)$ satisfying (27)

The projections are carried out successively in the different directions. In order to avoid having a preferential direction, it is also possible to randomize the loop of the previous algorithm by performing a random permutation of the visited dimensions.
The objective function to minimize in $\theta$ is

$J_1(\hat\phi^\theta) + \beta J_2(\hat\phi^\theta) + \frac{1}{\epsilon} \sum_{j=1}^d \sum_{i=0}^{N-2} \mathbb{E}\Big[ \big(|\hat\phi^\theta_j(t_{i+1}, X^{\hat\phi^\theta}_{t_{i+1}}) - \hat\phi^\theta_j(t_i, X^{\hat\phi^\theta}_{t_i})| - \eta_j\big)^+ \Big]$   (28)

Remark 4.3

In the previous formulations, we have supposed that the initial weights in the portfolio had to be optimized. It is often a datum of the problem, and then only the weights after the initial date have to be optimized.
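Algorithm 1 can be sketched directly in NumPy (an illustrative version; the bounds and starting weights are made up):

```python
import numpy as np

def project_weights(phi, lower, upper):
    """Algorithm 1: successively shift each weight by the current deficit
    1 - sum(phi), clipped to [lower_i, upper_i], until the weights sum to one."""
    phi = np.asarray(phi, dtype=float).copy()
    for i in range(len(phi)):
        phi[i] = np.clip(phi[i] + (1.0 - phi.sum()), lower[i], upper[i])
    return phi

lower, upper = np.zeros(4), np.full(4, 0.3)
w = project_weights([0.2, 0.2, 0.2, 0.2], lower, upper)
print(w)        # the deficit 0.2 is absorbed by the first coordinates, up to the 0.3 cap
print(w.sum())  # 1.0 up to rounding
```

Because the deficit is recomputed after each coordinate update, a weight that hits its cap passes the remaining deficit on to the next directions, which is what the optional random permutation of the loop re-balances.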
We test the four previous models on the four-dimensional test case described at the beginning of the section. We impose the additional constraints: the weights cannot vary by more than 0.… between two rebalancing dates, and must stay between 0.… and 0.6. Besides, in this case, we impose that the initial weights are the same for each asset. We take the following resolution parameters: $\epsilon = 1\mathrm{e}{-4}$, and the initial learning rate is set to 1e−…, decreasing to 1e−….
We plot $\mathbb{E}(X_T) - \beta\, \mathbb{E}\big((X_T - \mathbb{E}(X_T))^2\big)$ obtained for the four models for different values of $\beta$. Models 1 and 4 give the best results. Besides, models 2 and 3 have difficulties satisfying the constraints: in table 1, we give the opposite of the objective function calculated; a difference between the results in the table below and the results in table 1 indicates a violation of the constraints.
We keep model 1 and model 4 to test them further in different dimensions. We suppose that the weights are between $0.…\,p_{Init}$ and $2\,p_{Init}$, where $p_{Init} = \frac{1}{d}$ is the initial equal weight used in the portfolio. The local constraints are kept as in the previous subsection, with variations below 0.05 in absolute value between two rebalancing dates. The optimization parameters are:

• Model 1: initial learning rate equal to 0.…e−…/d, linearly decreasing to 0.…e−…/(10 d) with the gradient iterations,
• Model 4: initial learning rate equal to 1e−…, decreasing to 1e−….

We keep the constraints defined in section 4.2.2. Model 4 can be extended easily to estimate the frontier globally: the network $\kappa^\theta$ with parameters $\theta$ is now a function of $(t,X,\beta)$, still with values in $[0,1]^d$. The weight approximation is carried out by:

$\hat\phi^{\theta;\beta}(t,X,\beta) = \underline\phi + \kappa^\theta(t,X,\beta)(\bar\phi - \underline\phi)$.   (29)

The projection algorithm (27) is used, and the objective function (13) is replaced by:

$\theta^* = \operatorname{argmin}_{\theta}\; \mathbb{E}\Big[ -\mathbb{E}[X^{\hat\phi^{\theta;\hat\beta}}_T \,/\, \hat\beta] + \hat\beta\, \mathbb{E}[(X^{\hat\phi^{\theta;\hat\beta}}_T - \mathbb{E}[X^{\hat\phi^{\theta;\hat\beta}}_T \,/\, \hat\beta])^2 \,/\, \hat\beta] \Big] + \frac{1}{\epsilon} \sum_{j=1}^d \sum_{i=0}^{N-2} \mathbb{E}\Big[ \big(|\hat\phi^\theta_j(t_{i+1}, X^{\hat\phi^{\theta;\hat\beta}}_{t_{i+1}}, \hat\beta) - \hat\phi^\theta_j(t_i, X^{\hat\phi^{\theta;\hat\beta}}_{t_i}, \hat\beta)| - \eta_j\big)^+ \Big]$   (30)

Of course, a similar objective function can be written for the auxiliary problem.
We only present the results obtained using the deterministic (for $\beta$ and $\gamma$) global method, as the randomized approach tends to represent only a part of the curve, or is sub-optimal, as seen in the case without constraints. As before, in dimension 4, 40 points are used to estimate the efficient frontier, while only 16 are used in dimension 20.

Figure 7: Point by point versus global efficient frontier estimation (dimension 4; dimension 20).

In figure 7, we plot the efficient frontier obtained by point by point and global estimation. The point by point calculation and the direct global calculations (deterministic and randomized versions) give the same curves in dimensions 4 and 20. The global auxiliary approach gives a sub-optimal curve in dimensions 4 and 20 for its deterministic version, while the randomized version gives solutions with a flat variance in both cases, so it is not reported.

Remark 4.4
In the whole section, we have added local constraints on the strategies. It is also possible to remove the local constraints and take the transaction costs directly into account in the dynamics of the assets. Supposing that half of the bid-ask spread for asset $j$ is given by $p_j$, and still supposing that we deal with the time discrete optimization (10), the final value of the assets is given, with the convention $\phi_{-1} = 0$, by

$$X^\phi_T = X_0 + \sum_{i=0}^{N-1} \phi_i X^\phi_{t_i} \cdot \frac{S_{t_{i+1}} - S_{t_i}}{S_{t_i}} - \sum_{i=0}^{N-1} \sum_{j=1}^{d} \Big| \frac{\phi_{i,j} X^\phi_{t_i}}{S_{t_i,j}} - \frac{\phi_{i-1,j} X^\phi_{t_{i-1}}}{S_{t_{i-1},j}} \Big|\, p_j \quad (31)$$

Now the control depends not only on the wealth but also on $S_t$ and on the weights at the previous date, and these have to be included in the state given as input to the network.

We now suppose that the assets follow a Heston model [Hes93]:

$$dS_t = \mu S_t\, dt + \sqrt{V_t}\, S_t\, dW_t, \qquad dV_t = \kappa(\bar V - V_t)\, dt + \bar\sigma \sqrt{V_t}\, d\tilde W_t \quad (32)$$

where $S_t$ and $V_t$ take values in $\mathbb{R}^d$ and $(W_t, \tilde W_t)$ is a vector of $2d$ correlated Brownian motions with correlation matrix $\rho$. As there is no analytic solution to equation (32), we rely on a Milstein scheme on the volatility [KJ06] to ensure positivity of the volatility under the Feller condition $2\kappa \bar V \ge \bar\sigma^2$.

We are only interested in the discrete problem with no short selling and no borrowing: weights are all in $[0,1]$ and the sum of the weights is equal to one. This problem is interesting as the state now involves not only the wealth $X_t$ but also the variance $V_t$, so that not only the control but also the global state has a dimension increasing with the number of assets in the portfolio. We first deal with the case without additional constraints and then with the case with local and global constraints. For all optimizations, we take a decreasing learning rate as before.

Using the previous algorithm, equations (17) and (18) are now replaced by

$$\hat\phi^\theta(t, X, V_1, \dots, V_d) = \frac{\hat\kappa^\theta(t, X, V_1, \dots, V_d)}{\sum_{i=1}^{d} \hat\kappa^\theta_i(t, X, V_1, \dots, V_d)} \quad (33)$$

for the point by point approximation and by

$$\hat\phi^{\theta;\beta}(t, X, V_1, \dots, V_d, \beta) = \frac{\hat\kappa^\theta(t, X, V_1, \dots, V_d, \beta)}{\sum_{i=1}^{d} \hat\kappa^\theta_i(t, X, V_1, \dots, V_d, \beta)} \quad (34)$$

in the global method, where $V_j$ is the variance of asset $j$.

In the whole section we still take $T = 10$ years and a rebalancing every month. In dimension 4, the drift vector $\mu$, the long-term variance $\bar V$ (taken equal to $V_0$), the diagonal volatility-of-variance matrix $\bar\sigma$ and the full $8 \times 8$ correlation matrix $\rho$ associated to $(W_{t,1}, \dots, W_{t,4}, \tilde W_{t,1}, \dots, \tilde W_{t,4})$ are fixed to the test-case values; the $\kappa$ values are all equal to $0.5$ and the initial asset values are all equal to one. In dimension 10 a similar test case is created with quite high correlations. All curves are plotted using 30 points.

Figure 8: Results in dimension 4 for the Heston model without additional constraints (left: point by point; right: point by point versus global).

In figure 8, we first plot in dimension 4 the efficient frontier obtained by direct optimization and using the auxiliary equations. Both approaches give the same curve, well above the static curve, which surprisingly does not seem to be very well estimated (by a point by point approach) as the convexity of the curve is not fully respected. In figure 8, we also compare global estimations to point by point estimations: using the direct approach (deterministic and randomized versions), we get the same curve as the one given by the point by point estimation. Using the auxiliary version with the global approach, we get results similar to the Black Scholes case: the deterministic version gives a sub-optimal curve, while the randomized version is not reported as it gives bad results.

In figure 9, we give the same results in dimension 10. We get very similar results, except that the static point by point optimization curve is more realistic.

Figure 9: Results in dimension 10 for the Heston model without additional constraints (left: point by point; right: point by point versus global).
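As an illustration, the Heston dynamics (32) can be simulated with a log-Euler step on the asset and a Milstein step on the variance, which is the scheme of [KJ06] used here to preserve positivity under the Feller condition. The following is a minimal one-dimensional sketch (the function name and the sample parameter values are ours, not taken from the paper):

```python
import numpy as np

def simulate_heston(s0, v0, mu, kappa, v_bar, sigma_bar, rho, T, n_steps, n_paths, seed=0):
    """Simulate one Heston asset: log-Euler on S, Milstein on V (cf. eq. (32))."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    s = np.full(n_paths, s0, dtype=float)
    v = np.full(n_paths, v0, dtype=float)
    for _ in range(n_steps):
        z1 = rng.standard_normal(n_paths)
        # correlate the variance noise with the asset noise
        z2 = rho * z1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(n_paths)
        vp = np.maximum(v, 0.0)
        # exact log-Euler step for S given the frozen variance vp
        s *= np.exp((mu - 0.5 * vp) * dt + np.sqrt(vp * dt) * z1)
        # Milstein step on the variance: the (z2^2 - 1) correction term
        v = (v + kappa * (v_bar - vp) * dt + sigma_bar * np.sqrt(vp * dt) * z2
             + 0.25 * sigma_bar**2 * dt * (z2**2 - 1.0))
        v = np.maximum(v, 0.0)  # truncate any residual negativity
    return s, v
```

In the multi-asset case used in the experiments, `z1` and `z2` become $2d$-dimensional draws correlated through the full matrix $\rho$ (e.g. via its Cholesky factor).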
We keep the constraints used in the Black Scholes model in section 4.2.2. Equation (26) is modified to take into account the fact that the state includes the variances of the different assets for the point by point optimization, and equation (29) is modified in the same way for the global optimization.

Figure 10: Results in dimension 4 for the Heston model with additional constraints (left: point by point; right: point by point versus global).

Figure 10 seems to indicate that we are able to recover the frontier with a good accuracy both by the point by point and the global method.

Figure 11: Results in dimension 8 for the Heston model with additional constraints (left: point by point; right: point by point versus global).

In higher dimension (for example in figure 11 in dimension 8), the convergence is harder to achieve and small oscillations appear for the point by point approach. In this case, the global auxiliary results are not reported as the curve obtained is not satisfactory. Globally, the global version of the direct approach appears to be the most effective.
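Constraints on the weights can also be enforced exactly by composing the last network layer with a Euclidean projection onto the constraint set, in the spirit of the projected feedforward network mentioned in the abstract. Below is a generic sketch of projecting a vector onto box constraints combined with the budget (sum-to-one) constraint, by bisection on the Lagrange multiplier; the function name and the bisection tolerance are our choices, and the set is assumed non-empty:

```python
import numpy as np

def project_box_simplex(x, lo, hi, tol=1e-10):
    """Euclidean projection of x onto {w : lo <= w <= hi, sum(w) = 1}.

    The projection has the form w = clip(x - tau, lo, hi) for a scalar tau;
    sum(clip(x - tau, lo, hi)) is non-increasing in tau, so tau is found by bisection.
    """
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    assert lo.sum() <= 1.0 <= hi.sum(), "constraint set must be non-empty"
    # bracket: at tau_lo the clip saturates at hi (sum >= 1), at tau_hi at lo (sum <= 1)
    tau_lo, tau_hi = (x - hi).min(), (x - lo).max()
    while tau_hi - tau_lo > tol:
        mid = 0.5 * (tau_lo + tau_hi)
        if np.clip(x - mid, lo, hi).sum() > 1.0:
            tau_lo = mid  # sum too large: increase tau
        else:
            tau_hi = mid
    return np.clip(x - 0.5 * (tau_lo + tau_hi), lo, hi)
```

Such a projection layer keeps the weights feasible at every gradient step, which is the alternative to penalizing constraint violations in the loss.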
In the whole section we suppose that the assets follow a Black Scholes model given by equation (3). As the Mean-Variance criterion penalizes both gains and losses, practitioners prefer to use a downside risk measure only penalizing the losses (or small gains). In this part we focus on the Mean-CVaR criterion.

We first recall the VaR definition: if $X$ is a distribution of gains with a continuous cumulative distribution $F_X(x)$ and $\alpha \in\, ]0,1[$,

$$VaR(X, \alpha) = \min\{ y \in \mathbb{R} : P(-X \le y) \ge \alpha \} \quad (35)$$

where $P(-X \le x)$ is the probability that the random variable $-X$ is below $x$, so that $P(-X \le x) = F_{-X}(x)$. The CVaR is then the average loss conditionally on the losses being above the VaR:

$$CVaR(X, \alpha) = E[\, -X \mid -X \ge VaR(X, \alpha)\,] \quad (36)$$

The VaR criterion is not convex but the CVaR is convex [RU+00]. The Mean-CVaR problem consists in finding strategies adapted to the available information and minimizing

$$(J_1(\phi), J_2(\phi)) = \big(-E[X^\phi_T],\; CVaR(X^\phi_T - X_0, \alpha)\big) \quad (37)$$

where $X^\phi_T$ is the value of the portfolio managed with strategy $\phi$ and given by (10). As for the Mean-Variance case, an admissible strategy $\phi^*$ is efficient if there is no other admissible strategy $\psi$ such that

$$J_1(\psi) \le J_1(\phi^*), \quad J_2(\psi) \le J_2(\phi^*)$$

with at least one of the two inequalities being strict.
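Definitions (35)-(36) translate directly into empirical estimators on a sample of terminal gains, which is how the CVaR in the loss is assessed on each batch. A small sketch (the function name and the use of NumPy's default interpolating quantile are our choices):

```python
import numpy as np

def var_cvar(gains, alpha):
    """Empirical VaR and CVaR at level alpha from a sample of gains X.

    VaR is the alpha-quantile of the losses -X (cf. (35)); CVaR is the
    average of the losses at or above the VaR (cf. (36)).
    """
    losses = -np.asarray(gains, dtype=float)
    var = np.quantile(losses, alpha)       # smallest y with P(-X <= y) >= alpha
    cvar = losses[losses >= var].mean()    # mean loss beyond the VaR
    return var, cvar
```

With $\alpha$ close to one, only a fraction $1-\alpha$ of the batch contributes to the tail average, which is why a large batch size is needed for a correct assessment of the CVaR.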
The set of efficient points $(J_1(\phi^*), J_2(\phi^*))$ defines the efficient frontier. As recalled in the introduction, the continuous optimization problem is not well posed and we will only focus on the time discrete optimization problem.

It is also possible to recast the problem as minimizing the CVaR under the constraint that the expected value of the gains is above a threshold $M$:

$$\min_{\phi \in L_{\mathcal{F}_t}(0,T,\mathbb{R}^d)} CVaR(X^\phi_T - X_0, \alpha) \quad (38)$$
$$\text{with } E[X^\phi_T] \ge M, \quad (39)$$

so that, using the [RU+00] representation,

$$\min_{y,\, \phi \in L_{\mathcal{F}_t}(0,T,\mathbb{R}^d)} E\Big[ y + \frac{1}{1-\alpha} \big(-X^\phi_T + X_0 - y\big)^+ \Big] \quad (40)$$
$$\text{with } E[X^\phi_T] \ge M, \quad (41)$$

and when optimality is reached, $y$ corresponds to the VaR of the portfolio. Conventional numerical methods generally prefer this formulation: if $y$ is fixed, the problem can be solved by dynamic programming, and by combining a gradient descent on $y$ with this optimization at fixed $y$, it is possible to optimize the portfolio. The interest of this formulation is not obvious with a neural network approach as it adds a fictitious dimension to the problem, and it turns out that using this formulation does not give good results, so we do not report them in the article.

In the sequel we use the same formulation as in the Mean-Variance case: the efficient frontier can be calculated by minimizing (6) using the definition (37) for a fixed $\beta$, letting $\beta$ vary to describe the frontier. As in the Mean-Variance case, it is possible to use a neural network to optimize the strategies by minimizing equation (11) for a point by point approximation of the frontier or by minimizing the single problem (13) to learn the global curve.

In all the tests, the parameters of the model are the same as in section 4. As before, $T = 10$ years and rebalancing is achieved every month. As for the convergence of the stochastic gradient, the learning rate starts at a higher value and decreases down to $1\mathrm{e}{-5}$. The number of stochastic gradient iterations is fixed at 15000. The batch size is chosen equal to 2000 to have a correct assessment of the CVaR. While calculating the curve point by point or globally with a deterministic $\beta$, the values $\beta_i$, $i = 0, \dots, K-1$, are used, where $K$ depends on the case. When a randomization of the parameter $\beta$ is used, $\hat\beta = \beta_{K-1} U$ where $U$ is a uniform random variable on $[0,1]$.

Similarly to the Mean-Variance case, we can impose some constraints on the portfolio: using the weights formulation (17) while optimizing (6), or the weights formulation (18) while optimizing (13), it is possible to impose that all weights are between 0 and 1 with sum equal to one. We train the network and plot the resulting efficient frontier using 40 points, first in dimension 4 in figure 12. The global approaches with deterministic and stochastic $\beta$ give the same curve. The point by point curve appears to be oscillating and may not be very accurate.

Figure 12: Efficient frontiers for the Mean-CVaR criterion with deterministic and stochastic $\beta$ in dimension 4, for two values of $\alpha$.

In dimension 20, results with the point by point approximation are not reported as they give oscillations. Only the global solutions with deterministic and stochastic $\beta$ are given in figure 13.

Figure 13: Efficient frontiers for the Mean-CVaR criterion with deterministic and stochastic $\beta$ in dimension 20, for two values of $\alpha$.

For the Mean-CVaR criterion, it seems that the global approximation gives the best results.

Using equation (26) with algorithm 1 in equation (28), or equation (29) for the weights while optimizing (30), it is possible to add the constraints given by equations (19) and (20). The parameters for the additional constraints are the same as in section 4.2.2. Results are reported in dimension 4 and only in dimension 8, as the computing cost grows significantly with the dimension of the problem. Curves are reconstructed with 20 points. In figure 14, we plot the curves obtained with a static optimization, and the dynamic optimization with the global approaches and the point by point approach. The global approaches seem to give slightly better results.

Figure 14: Results in dimension 4 for the Mean-CVaR criterion with additional constraints, for two values of $\alpha$.

Results in dimension 8 are given for the different $\beta$ coefficients as shown in figure 15.

Figure 15: Results in dimension 8 for the Mean-CVaR criterion with additional constraints, for different values of $\alpha$.

With $\alpha = 0.95$, some global resolutions can lead to a sub-optimal curve as shown in figure 16. The optimizer is trapped in a local minimum away from the solution and the point by point estimations oscillate between the two curves.

Figure 16: Different Mean-CVaR runs with the deterministic global method compared to the point by point estimation with $\alpha = 0.95$.

References

[Aba+15] Martin Abadi et al.
TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. Software available from tensorflow.org. 2015.

[Bue+19] Hans Buehler et al. "Deep hedging". In: Quantitative Finance 19.8 (2019), pp. 1271-1291.

[CO16] Fei Cong and Cornelis W. Oosterlee. "Multi-period mean-variance portfolio optimization based on Monte-Carlo simulation". In: Journal of Economic Dynamics and Control 64 (2016), pp. 23-38.

[CW14] Mei Choi Chiu and Hoi Ying Wong. "Mean-variance portfolio selection with correlation risk". In: Journal of Computational and Applied Mathematics 263 (2014), pp. 432-444.

[DF14] Duy-Minh Dang and Peter A. Forsyth. "Continuous time mean-variance optimal portfolio allocation under jump diffusion: A numerical impulse control approach". In: Numerical Methods for Partial Differential Equations 30.2 (2014), pp. 664-698.

[...] In: Annals of Operations Research.

[...] In: The American Economic Review.

[FMW19] Sébastien Fécamp, Joseph Mikael, and Xavier Warin. "Risk management with machine-learning-based algorithms". In: arXiv preprint arXiv:1902.05287 (2019).

[Gao+17] Jianjun Gao et al. "Dynamic mean-LPM and mean-CVaR portfolio optimization in continuous-time". In: SIAM Journal on Control and Optimization (2017).

[GJR18] Jim Gatheral, Thibault Jaisson, and Mathieu Rosenbaum. "Volatility is rough". In: Quantitative Finance 18.6 (2018), pp. 933-949.

[Hes93] Steven L. Heston. "A closed-form solution for options with stochastic volatility with applications to bond and currency options". In: The Review of Financial Studies 6.2 (1993), pp. 327-343.

[HSW89] Kurt Hornik, Maxwell Stinchcombe, and Halbert White. "Multilayer feedforward networks are universal approximators". In: Neural Networks 2.5 (1989), pp. 359-366.

[IP19] Amine Ismail and Huyên Pham. "Robust Markowitz mean-variance portfolio selection under ambiguous covariance matrix". In: Mathematical Finance 29.1 (2019), pp. 174-207.

[JMP20] Eduardo Abi Jaber, Enzo Miller, and Huyên Pham. "Markowitz portfolio selection for multivariate affine and quadratic Volterra models". Available at SSRN 3636794 (2020).

[JYZ05] Hanqing Jin, Jia-An Yan, and Xun Yu Zhou. "Continuous-time mean-risk portfolio selection". In: Annales de l'Institut Henri Poincaré (B) Probability and Statistics 41.3 (2005), pp. 559-580.

[KB14] Diederik P. Kingma and Jimmy Ba. "Adam: A method for stochastic optimization". In: arXiv preprint arXiv:1412.6980 (2014).

[KJ06] Christian Kahl and Peter Jäckel. "Fast strong approximation Monte Carlo schemes for stochastic volatility models". In: Quantitative Finance 6.6 (2006), pp. 513-536.

[...] In: arXiv preprint arXiv:2009.05034 (2020).

[Lim04] Andrew E. B. Lim. "Quadratic hedging and mean-variance portfolio selection with random parameters in an incomplete market". In: Mathematics of Operations Research 29.1 (2004), pp. 132-161.

[LN00] Duan Li and Wan-Lung Ng. "Optimal dynamic portfolio selection: Multiperiod mean-variance formulation". In: Mathematical Finance 10.3 (2000), pp. 387-406.

[LZL02] Xun Li, Xun Yu Zhou, and Andrew E. B. Lim. "Dynamic mean-variance portfolio selection with no-shorting constraints". In: SIAM Journal on Control and Optimization 40.5 (2002), pp. 1540-1555.

[Mar52] Harry Markowitz. "Portfolio selection". In: The Journal of Finance 7.1 (1952), pp. 77-91.

[Mar59] Harry Markowitz. Portfolio Selection: Efficient Diversification of Investments. Wiley, New York, 1959.

[Mer72] Robert C. Merton. "An analytic derivation of the efficient portfolio frontier". In: Journal of Financial and Quantitative Analysis 7.4 (1972), pp. 1851-1872.

[MY17] Christopher W. Miller and Insoon Yang. "Optimal control of conditional value-at-risk in continuous time". In: SIAM Journal on Control and Optimization (2017).

[Roy52] A. D. Roy. "Safety first and the holding of assets". In: Econometrica: Journal of the Econometric Society (1952), pp. 431-449.

[RU+00] R. Tyrrell Rockafellar, Stanislav Uryasev, et al. "Optimization of conditional value-at-risk". In: Journal of Risk 2 (2000), pp. 21-41.

[WF10] Jian Wang and Peter A. Forsyth. "Numerical solution of the Hamilton-Jacobi-Bellman formulation for continuous time mean variance asset allocation". In: Journal of Economic Dynamics and Control 34.2 (2010), pp. 207-230.