A deep learning algorithm for optimal investment strategies
Daeyung Gim∗ and Hyungbin Park†
Department of Mathematical Sciences and Research Institute of Mathematics, Seoul National University, 1, Gwanak-ro, Gwanak-gu, Seoul, Republic of Korea
February 1, 2021
Abstract
This paper treats the Merton problem of how to invest in safe assets and risky assets to maximize an investor's utility, given investment opportunities modeled by a $d$-dimensional state process. The problem is represented by a partial differential equation with an optimizing term: the Hamilton–Jacobi–Bellman equation. The main purpose of this paper is to solve the partial differential equations derived from the Hamilton–Jacobi–Bellman equations with a deep learning algorithm: the Deep Galerkin method, first suggested by Sirignano and Spiliopoulos (2018). We then apply the algorithm to obtain the solution of the PDE under specific model settings and compare it with the one from the finite difference method.

∗ [email protected]
† [email protected], [email protected]

1 Introduction

Consider the following expected utility maximization problem:
\[
\max_{(\pi_u)_{u \ge t}} \frac{1}{p}\, \mathbb{E}\!\left[ (X_T^\pi)^p \,\middle|\, X_t = x,\ Y_t = y \right],
\]
where $\pi$ is a portfolio, $X^\pi$ a wealth process and $Y$ a state variable, with the utility function $U(x) := (1/p)\, x^p$. This kind of problem was first suggested by Merton (1969) and is among the most fundamental and pioneering in economics. The Merton problem has played a key role in an investor's wealth allocation across several assets under various market circumstances. Since then there have been many studies of the Merton problem under various conditions. Benth et al. (2003) studied the Merton problem in the Black–Scholes setting using an OU-type stochastic volatility model. Kühn and Stroh (2010) studied portfolio optimization for the Merton problem in a limit-order market from the viewpoint of a shadow price. Research on optimal investment based on inside information and drift parameter uncertainty was conducted by Danilova et al. (2010). Nutz (2010) studied utility maximization in a semimartingale market setting with the opportunity process. Hansen (2013) suggested optimal investment strategies with investors' partial and private information. Pedersen and Peskir (2017) applied the Lagrange multiplier to solve a nonlinear mean-variance optimal portfolio selection problem. There was also research on optimal portfolio strategies using over-reaction and under-reaction by Callegaro et al. (2017). Liang and Ma (2020) studied a robust Merton problem using constant relative/absolute risk aversion utility functions under time-dependent sets of confidence.

In this paper we follow the overall market setting in Guasoni and Robertson (2015) and derive the so-called Hamilton–Jacobi–Bellman equation in the time variable $t$, the variable $x$ representing the wealth process, and the variable $y = (y_1, \dots, y_d)$ from the $d$-dimensional state variable. We can optimize the portfolio by finding a solution to the HJB equation. Using some properties including homotheticity and concavity, we eliminate the optimizing term to change the HJB equation into a nonlinear partial differential equation.

Under this circumstance we face the problem of solving nonlinear PDEs. Because in general most PDEs do not have analytic solutions, several well-known numerical tools exist. These classical approaches can be found in Achdou and Pironneau (2005) and Burden et al. (2010).

At the same time there have been some studies on solving PDEs with a deep neural network. Lee and Kang (1990) and Lagaris et al. (2000) suggested neural network algorithms on a fixed mesh. Malek and Beidokhti (2006) also suggested a numerical hybrid DNN optimizing method. However, in the case of higher-dimensional PDEs, these grid-based methods become computationally inefficient: a curse of dimensionality.

Recently there have been several attempts to remove the curse of dimensionality using machine learning techniques. Han et al. (2018) and Weinan et al. (2019) suggested a deep backward stochastic differential equation method based on the Feynman–Kac formula.

The deep learning algorithm mainly used in this paper is the Deep Galerkin method suggested by Sirignano and Spiliopoulos (2018).
It is computationally efficient since there is no need to construct any mesh or grid. We define a loss functional to minimize the $L^2$-norm of the desired differential operator and the other conditions from the PDE. To make the loss as small as we want, we sample random points from the domain and optimize by means of stochastic gradient descent. After deriving surfaces, we also apply the finite difference method (FDM) in order to compare the surfaces from both algorithms: DGM and FDM. For further research on the Deep Galerkin method, see Al-Aradi et al. (2018) and Al-Aradi et al. (2019).

This paper is organized as follows. In section 2 we describe the general setting of this paper and derive the partial differential equation with an optimizing term: the HJB equation. The Deep Galerkin method algorithm and the neural network approximation theorem from Sirignano and Spiliopoulos (2018) are presented in section 3, with some parts of the code for each step of the DGM algorithm. A numerical test of the algorithm is presented in section 4. Specifically, we model a 2-dimensional state process by the OU process and the CIR process, and the return process by the Heston model. We then use the calibrated parameters from Crisóstomo (2014) and Mehrdoust and Fallah (2020). We display the solution surface at each fixed time on a pre-determined domain of the state variable. We finally analyze the surfaces from the Deep Galerkin method and those from the finite difference method. Conclusions can be found in section 5, and proofs of the neural network approximation theorem are in appendix A.

2 The Merton problem

In the case that an economic agent acts on a time interval $[0, T]$, the problem is that he or she has to decide how to invest in several risky assets or safe assets as time goes by, starting with some initial wealth. This problem was first suggested by Merton in the 1960s: the Merton problem, known as a utility maximization problem. The aim of the agent is to establish a portfolio strategy that maximizes utility under some conditions.
In this section we describe the general setting of this paper and derive the HJB equation. We finally reach a nonlinear PDE by using some properties; the above problem is then equivalent to finding a solution of this equation.

We first describe the market with the following framework. Assume that the market has $n+1$ assets $S^{(0)}, S^{(1)}, \dots, S^{(n)}$, where $S^{(0)}$ is safe and $S^{(1)}, \dots, S^{(n)}$ are risky. One can make investment decisions based on a $d$-dimensional state variable $Y = (Y^{(1)}, \dots, Y^{(d)})$ satisfying
\[
dY_t = b(Y_t)\,dt + a(Y_t)\,dW_t, \tag{2.1}
\]
where $W = (W^{(1)}, \dots, W^{(d)})$ denotes a standard Brownian motion.

Let $r$ be the interest rate, $\mu$ the excess returns, and $\sigma$ the volatility matrix. We also assume that the prices of the assets satisfy
\[
dS^{(0)}_t = r S^{(0)}_t\,dt, \tag{2.2}
\]
\[
\frac{dS^{(i)}_t}{S^{(i)}_t} = r\,dt + dR^{(i)}_t, \quad 1 \le i \le n, \tag{2.3}
\]
where $R = (R^{(1)}, \dots, R^{(n)})$ denotes the cumulative excess return satisfying
\[
dR^{(i)}_t = \mu_i(Y_t)\,dt + \sum_{j=1}^{n} \sigma_{ij}(Y_t)\,dZ^{(j)}_t, \quad 1 \le i \le n. \tag{2.4}
\]
Here $\rho = (\rho_{ij}) = d\langle Z, W\rangle_t / dt$ denotes the cross correlations between the $n$-dimensional Brownian motion $Z$ and $W$, $\Sigma = \sigma\sigma^T = d\langle R, R\rangle_t / dt$ is the matrix of quadratic covariation of returns, and $\Upsilon = \sigma\rho a^T = d\langle R, Y\rangle_t / dt$ denotes the correlation between the return and the state process.

In the market, an investor buys the risky assets with a portfolio $\pi = (\pi^{(1)}_t, \dots, \pi^{(n)}_t)_{t \ge 0}$. The wealth process $X^\pi = (X^\pi_t)_{t \ge 0}$ corresponding to the portfolio satisfies
\[
\frac{dX^\pi_t}{X^\pi_t} = r\,dt + \pi^T_t\,dR_t, \quad X^\pi_0 \ge 0. \tag{2.5}
\]
Observe first that the portfolio process $(\pi_t)_{t \ge 0}$ is $\mathcal{F}_t$-measurable, where the filtration $\mathbb{F} = (\mathcal{F}_t)_{t \ge 0}$ is generated by the return $R$ and the state variable $Y$. This is clear from the investor's point of view: he or she has all information about the state and asset returns from time $t = 0$ up to the current time.
Note also that the portfolio process is integrable with respect to the return process $R$. Following the Merton problem, we assume the investor's utility function is
\[
U(x) = \frac{1}{p}\, x^p, \quad 0 < p < 1.
\]
For fixed wealth $x$ and state $y = (y_1, \dots, y_d)$ satisfying (2.1) and (2.5), our aim is to maximize the conditional expectation of terminal wealth utility given the wealth and state at time $t$, that is,
\[
\max_{(\pi_u)_{u \ge t}} \frac{1}{p}\, \mathbb{E}\left[ (X^\pi_T)^p \mid X_t = x,\ Y_t = y \right].
\]
Now we convert the problem of utility maximization into that of solving a PDE, namely the Hamilton–Jacobi–Bellman equation. Some definitions are needed before approaching the HJB equation.

Definition 2.1. A portfolio process $\pi = (\pi_t)_{t \ge 0}$ is called an admissible portfolio if

• for every $t \in [0, T]$ and $(x, y) \in D \subset \mathbb{R} \times \mathbb{R}^d$, $\pi(t, x, y) \in U$, where $U$ is a fixed subset;

• for any given initial points $(t, x)$ and $y = (y_1, \dots, y_d)$, the following SDE has a unique solution:
\[
dX^\pi_s = r X^\pi_s\,ds + X^\pi_s\, \pi^T_s\,dR_s, \quad X^\pi_t = x; \tag{2.6}
\]

• for any given initial point $(t, y) = (t, y_1, \dots, y_d)$, the following SDE has a unique solution:
\[
dY_s = b(Y_s)\,ds + a(Y_s)\,dW_s, \quad Y_t = y. \tag{2.7}
\]

From now on we assume the portfolio $\pi$ is admissible.

Definition 2.2.
Let $U$ be an investor's utility function.

• For each $\pi$, we define the expected value function $V^\pi$ as
\[
V^\pi(t, x, y) = \mathbb{E}\left[ U(X^\pi_T) \mid X_t = x,\ Y_t = y \right],
\]
given (2.6) and (2.7).

• We define the optimal value function $V$ as
\[
V(t, x, y) = \sup_\pi V^\pi(t, x, y).
\]

The following theorem justifies the conversion from finding an optimal portfolio to solving a PDE with an optimizing term. A heuristic derivation of the HJB equation by limiting procedures in dynamic programming is given in chapter 19 of Björk (2009).
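The expected value function $V^\pi$ of Definition 2.2 can be sanity-checked by Monte Carlo in a degenerate special case. The following sketch assumes a single risky asset with *constant* excess return and volatility (so there is no state variable $Y$) and a constant-proportion portfolio; all parameter values are illustrative placeholders, not the paper's calibrated model. In this toy market the terminal wealth is log-normal, so $V^\pi$ has a closed form against which the simulation can be compared.

```python
import numpy as np

# Toy check of V^pi(t, x) = E[U(X_T^pi) | X_t = x] with U(x) = x^p / p.
# Assumed toy market (not the paper's d-dimensional model): constant excess
# return mu, constant volatility sigma, constant portfolio fraction pi.
rng = np.random.default_rng(0)

r, mu, sigma = 0.01, 0.05, 0.2   # placeholder market parameters
p = 0.5                          # utility preference parameter, 0 < p < 1
T, t, x = 1.0, 0.0, 1.0
pi = 0.8                         # constant fraction of wealth in the risky asset

tau = T - t
# wealth dynamics dX/X = (r + pi*mu) dt + pi*sigma dW  => log-normal X_T
drift = (r + pi * mu - 0.5 * pi**2 * sigma**2) * tau
Z = rng.standard_normal(1_000_000)
X_T = x * np.exp(drift + pi * sigma * np.sqrt(tau) * Z)

V_mc = np.mean(X_T**p) / p

# closed form: E[X_T^p] = x^p * exp(p*drift + 0.5 * p^2 * pi^2 * sigma^2 * tau)
V_exact = x**p / p * np.exp(p * drift + 0.5 * p**2 * pi**2 * sigma**2 * tau)

print(V_mc, V_exact)
```

The two printed values agree to a few decimal places, which is the behaviour Definition 2.2 encodes: for a fixed admissible $\pi$, $V^\pi$ is just a conditional expectation of terminal utility.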
Theorem 2.1.
Assume the following.

• The market has a safe asset $S^{(0)}$ whose dynamics are given by (2.2).

• The market has $n$ risky assets satisfying (2.3), with the return process $R$ following the diffusion (2.4).

• There exists an optimal portfolio $\hat\pi = (\hat\pi^{(1)}_t, \dots, \hat\pi^{(n)}_t)_{t \ge 0}$.

• The optimal value function $V$ is regular, that is, $V \in C^{1,2,2}$ with respect to $(t, x, y)$, $y = (y_1, \dots, y_d)$.

Then the following hold:

1. $V$ satisfies the Hamilton–Jacobi–Bellman equation
\[
V_t + b^T (\nabla_y V) + \frac{1}{2}\operatorname{tr}\!\left[ a^T (\nabla^2_y V)\, a \right] + r x V_x + \sup_\pi \left[ \pi^T \big( \mu V_x + \Upsilon(\nabla_y V_x) \big) x + \frac{1}{2} x^2 V_{xx}\, \pi^T \Sigma \pi \right] = 0, \quad (t, x, y) \in [0, T] \times D,
\]
\[
V(T, x, y) = U(x), \quad (x, y) \in D.
\]
2. The optimizing term in the above equation is attained at $\pi = \hat\pi$:
\[
\sup_\pi \left[ \pi^T \big( \mu V_x + \Upsilon(\nabla_y V_x) \big) x + \frac{1}{2} x^2 V_{xx}\, \pi^T \Sigma \pi \right]
= \hat\pi^T \big( \mu V_x + \Upsilon(\nabla_y V_x) \big) x + \frac{1}{2} x^2 V_{xx}\, \hat\pi^T \Sigma \hat\pi.
\]

If we define the optimal value function as
\[
V(t, x, y_1, \dots, y_d) = \sup_{(\pi_u)_{u \ge t}} \mathbb{E}\!\left[ \frac{1}{p} (X^\pi_T)^p \,\middle|\, X^\pi_t = x,\ Y^{(1)}_t = y_1, \dots, Y^{(d)}_t = y_d \right], \tag{2.8}
\]
then by Theorem 2.1 together with the Itô formula, one can derive the Hamilton–Jacobi–Bellman equation from (2.8):
\[
V_t + b^T (\nabla_y V) + \frac{1}{2}\operatorname{tr}\!\left[ a^T (\nabla^2_y V)\, a \right] + r x V_x + \sup_\pi \left[ \pi^T \big( \mu V_x + \Upsilon(\nabla_y V_x) \big) x + \frac{1}{2} x^2 V_{xx}\, \pi^T \Sigma \pi \right] = 0, \tag{2.9}
\]
where the terminal condition of (2.9) is $V(T, x, y) = (1/p)\, x^p$. Here $\nabla_y V = (V_{y_1}, \dots, V_{y_d})$ and $\nabla^2_y V = (V_{y_i y_j})_{1 \le i, j \le d}$ stand for the gradient and the Hessian of $V$ with respect to $y = (y_1, \dots, y_d)$, respectively. Because of the concavity of $V$ in $x$ and the identity
\[
\sup_\pi \left( \pi^T b + \frac{1}{2} \pi^T A \pi \right) = -\frac{1}{2} b^T A^{-1} b
\]
for a negative definite matrix $A$, (2.9) becomes
\[
V_t + b^T (\nabla_y V) + \frac{1}{2}\operatorname{tr}\!\left[ a^T (\nabla^2_y V)\, a \right] + r x V_x - \frac{\big( \mu V_x + \Upsilon(\nabla_y V_x) \big)^T \Sigma^{-1} \big( \mu V_x + \Upsilon(\nabla_y V_x) \big)}{2 V_{xx}} = 0, \tag{2.10}
\]
with the corresponding optimal portfolio
\[
\pi = \pi(t, x, y_1, \dots, y_d) = -\frac{1}{x V_{xx}}\, \Sigma^{-1} \big( \mu V_x + \Upsilon(\nabla_y V_x) \big).
\]

Since the utility function is homothetic, we define the reduced value function $u$ by
\[
V(t, x, y_1, \dots, y_d) = \frac{1}{p}\, x^p\, u(t, y_1, \dots, y_d). \tag{2.11}
\]
If we substitute (2.11) into (2.10) and divide each term by $x^p$, (2.10) becomes
\[
u_t + \big( b^T - q \mu^T \Sigma^{-1} \Upsilon \big) \nabla_y u + \frac{1}{2}\operatorname{tr}\!\left[ a^T (\nabla^2_y u)\, a \right] + \left( p r - \frac{q}{2}\, \mu^T \Sigma^{-1} \mu \right) u - \frac{q}{2u}\, (\nabla_y u)^T \Upsilon^T \Sigma^{-1} \Upsilon (\nabla_y u) = 0, \tag{2.12}
\]
where the terminal condition of (2.12) is $u(T, y_1, \dots, y_d) = 1$. In (2.12) we set $q = p/(p-1)$ for simplicity. The reduced optimal portfolio is
\[
\pi(t, y_1, \dots, y_d) = \frac{1}{1-p} \left( \Sigma^{-1}\mu + \Sigma^{-1}\Upsilon(\nabla_y u)\, \frac{1}{u} \right). \tag{2.13}
\]

3 The Deep Galerkin method

Now we investigate how to solve PDEs such as (2.12). Since only a few PDEs have analytic solutions, there are well-known numerical tools, including the Monte Carlo method exemplified by the Feynman–Kac theorem and the finite difference method. However, one of the main difficulties is the curse of dimensionality. In particular, in grid-based numerical methods the number of mesh points grows explosively as the dimension increases, so Sirignano and Spiliopoulos (2018) suggested a DNN-based algorithm for approximating solutions of PDEs, the Deep Galerkin method (DGM), for which there is no need to construct a mesh.

With a parametrized deep neural network, say $f$, a loss functional $f \mapsto J(f)$ is defined to minimize the $L^2$-norm of the desired differential operator and terminal condition. To make the loss as small as we want, the network samples random points from the pre-determined domain and is optimized by means of stochastic gradient descent. In this section we first introduce the DGM algorithm. We then state the approximation theorem that justifies this new algorithm.

Let $u = u(t, y)$ be an unknown function which satisfies the PDE
\[
\partial_t u(t, y) + \mathcal{L} u(t, y) = 0, \quad (t, y) \in [0, T] \times D,
\]
\[
u(T, y) = u_T(y), \quad y \in D, \tag{3.1}
\]
where $D \subset \mathbb{R}^d$. Our aim is to express the solution of (3.1) as a neural network function $f = f(t, y; \theta)$ in place of $u$.
Here $\theta = (\theta^{(1)}, \dots, \theta^{(K)})$ denotes a vector of network parameters. Define a loss functional $J := J_1 + J_2$ with
\[
J_1(f) := \big\| \partial_t f(t, y; \theta) + \mathcal{L} f(t, y; \theta) \big\|^2_{[0,T] \times D,\, \nu_1},
\]
\[
J_2(f) := \big\| f(T, y; \theta) - u_T(y) \big\|^2_{D,\, \nu_2}.
\]
Note that both terms are expressed in terms of the $L^2$-norm, that is, $\|h(y)\|^2_{\mathcal{Y}, \nu} = \int_{\mathcal{Y}} |h(y)|^2\, \nu(y)\,dy$. The functionals $J_1$ and $J_2$ measure how well the approximation performs with respect to the PDE differential operator and the terminal condition, respectively. The aim is to find a parameter $\hat\theta$ minimizing $J(f)$, that is,
\[
\hat\theta = \arg\min_\theta J(f(t, y; \theta)).
\]
As the error $J(f)$ becomes smaller, the approximated function $f$ gets closer to the solution $u$. Hence $f(t, y; \hat\theta)$ might be the best approximation of $u(t, y)$.

The DGM algorithm is as follows:

1. Set initial values of $\theta_0 = (\theta^{(1)}_0, \dots, \theta^{(K)}_0)$ and determine the learning rate $\beta_n$.

2. Sample random points $(t_n, y_n)$ from $[0, T] \times D$ according to the probability density $\nu_1$. Likewise, pick random points $w_n$ from $D$ with density $\nu_2$.

3. Calculate the squared error at the randomly sampled points $s_n = \{(t_n, y_n), w_n\}$:
\[
L(\theta_n, s_n) = \big( (\partial_t + \mathcal{L}) f(t_n, y_n; \theta_n) \big)^2 + \big( f(T, w_n; \theta_n) - u_T(w_n) \big)^2.
\]
4. Take a stochastic gradient descent step at $s_n$:
\[
\theta_{n+1} = \theta_n - \beta_n \nabla_\theta L(\theta_n, s_n).
\]
5. Repeat until $\|\theta_{n+1} - \theta_n\|$ is small enough.

The following is part of the code for each step of the DGM algorithm:

```python
# Step 1: initialize variables and set an exponentially decaying learning rate
oper_init = tf.global_variables_initializer()
lrn_rate = tf.train.exponential_decay(init_lrn_rate, glob_step, dec_step,
                                      dec_rate, staircase=True)
optimizer = tf.train.AdamOptimizer(lrn_rate).minimize(loss_tnsr)

# Step 2: sample interior points uniformly from [0, T] x [y1_low, y1_high] x [y2_low, y2_high]
t_int  = np.random.uniform(low=0,      high=T,       size=[nSim_int, 1])
y1_int = np.random.uniform(low=y1_low, high=y1_high, size=[nSim_int, 1])
y2_int = np.random.uniform(low=y2_low, high=y2_high, size=[nSim_int, 1])

# sample terminal-time points (t = T)
t_ter  = T * np.ones((nSim_ter, 1))
y1_ter = np.random.uniform(low=y1_low, high=y1_high, size=[nSim_ter, 1])
y2_ter = np.random.uniform(low=y2_low, high=y2_high, size=[nSim_ter, 1])

# Step 3: loss = PDE-residual term + terminal-condition term
J1 = tf.reduce_mean(tf.square(diff_u))
J2 = tf.reduce_mean(tf.square(fitted_ter - target_ter))
J = J1 + J2

# Step 4: several SGD (Adam) steps per sampled batch
for k in range(steps_per_sample):
    loss, J1, J2, _ = sess.run(
        [loss_tnsr, J1_tnsr, J2_tnsr, optimizer],
        feed_dict={t_int_tnsr: t_int, y1_int_tnsr: y1_int, y2_int_tnsr: y2_int,
                   t_ter_tnsr: t_ter, y1_ter_tnsr: y1_ter, y2_ter_tnsr: y2_ter})
```
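The sampled-loss construction in steps 2–3 can also be illustrated in a fully self-contained way on a toy problem. The sketch below is *not* the paper's HJB-derived PDE (2.12): it takes $\mathcal{L} = \partial^2_y$, so the PDE of form (3.1) is $u_t + u_{yy} = 0$ on $[0, T] \times D$ with terminal data $u(T, y) = e^T \sin(y)$, whose exact solution is $u(t, y) = e^t \sin(y)$. For candidate functions with analytically known derivatives, the Monte Carlo estimate of $J = J_1 + J_2$ vanishes at the true solution and is visibly positive for a wrong candidate.

```python
import numpy as np

# Monte Carlo estimate of the DGM loss J = J1 + J2 on a toy PDE:
#   u_t + u_yy = 0 on [0,T] x [-pi,pi],  u(T,y) = e^T sin(y),
# exact solution u(t,y) = e^t sin(y).  Candidates carry explicit derivatives.
rng = np.random.default_rng(0)
T, y_low, y_high, n = 1.0, -np.pi, np.pi, 100_000

# step 2: sample interior points uniformly, and terminal points at t = T
t = rng.uniform(0.0, T, n)
y = rng.uniform(y_low, y_high, n)
w = rng.uniform(y_low, y_high, n)

def mc_loss(f, f_t, f_yy):
    """Sampled J1 (PDE residual) + J2 (terminal condition) for a candidate."""
    J1 = np.mean((f_t(t, y) + f_yy(t, y)) ** 2)
    J2 = np.mean((f(T, w) - np.exp(T) * np.sin(w)) ** 2)
    return J1 + J2

# true solution u(t, y) = e^t sin(y): residual and terminal error are zero
loss_true = mc_loss(lambda t, y: np.exp(t) * np.sin(y),
                    lambda t, y: np.exp(t) * np.sin(y),
                    lambda t, y: -np.exp(t) * np.sin(y))

# a wrong candidate f(t, y) = sin(y), which ignores the time dependence
loss_wrong = mc_loss(lambda t, y: np.sin(y),
                     lambda t, y: np.zeros_like(y),
                     lambda t, y: -np.sin(y))

print(loss_true, loss_wrong)   # loss_true is 0; loss_wrong is clearly positive
```

In the actual algorithm the derivatives of $f$ come from automatic differentiation of the network rather than closed forms, but the loss being minimized has exactly this sampled structure.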
The following neural network approximation theorem is stated in Sirignano and Spiliopoulos (2018). In other words, there exists a sequence of neural network functions that converges to the solution of a quasilinear parabolic PDE.
Theorem 3.1.
Define $\mathcal{C}^n$ as a collection of DNN functions with $n$ hidden neurons in a single hidden layer. Let $u = u(t, y)$ be an unknown solution of (3.1). Under certain conditions in Sirignano and Spiliopoulos (2018), there exists a neural network function $f^n$ with $n$ hidden neurons such that the following hold:

1. $J(f^n) \to 0$ as $n \to \infty$;

2. $f^n \to u$ strongly in $L^\rho([0, T] \times D)$ as $n \to \infty$, for every $\rho < 2$.

Part of the proof, in the formulation of this paper, is given in appendix A. Further details, including the precise conditions and proofs, are in section 7 and appendix A of Sirignano and Spiliopoulos (2018).
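The single-hidden-layer class behind Theorem 3.1 can be exercised on a toy one-dimensional fitting problem. The sketch below is an illustrative universal-approximation experiment under our own assumptions (it fits a function of $y$ only, with plain gradient descent on a sampled squared error), not the paper's network or training setup: a network $f(y) = \sum_i \beta_i \tanh(\alpha_i y + c_i)$ with $n$ hidden neurons is trained to approximate $g(y) = \sin(y)$, and the fit error decreases markedly.

```python
import numpy as np

# Toy single-hidden-layer approximation: fit g(y) = sin(y) on [-pi, pi]
# with n tanh neurons by gradient descent on the sampled squared error.
rng = np.random.default_rng(0)
n = 50                                    # hidden neurons
y = rng.uniform(-np.pi, np.pi, (512, 1))  # sampled inputs
g = np.sin(y)                             # target values

# network f(y) = sum_i beta_i * tanh(alpha_i * y + c_i)
alpha = rng.normal(size=(1, n))
c = rng.normal(size=(1, n))
beta = np.zeros((n, 1))

def forward(y):
    h = np.tanh(y @ alpha + c)            # (batch, n) hidden activations
    return h, h @ beta                    # hidden layer and network output

_, out0 = forward(y)
loss0 = np.mean((out0 - g) ** 2)

lr = 0.01
for _ in range(3000):
    h, out = forward(y)
    err = out - g                         # (batch, 1)
    # gradients of the mean squared error w.r.t. beta, alpha, c
    grad_beta = h.T @ err * (2 / len(y))
    dz = (err @ beta.T) * (1 - h ** 2) * (2 / len(y))
    grad_alpha = y.T @ dz
    grad_c = dz.sum(axis=0, keepdims=True)
    beta -= lr * grad_beta
    alpha -= lr * grad_alpha
    c -= lr * grad_c

_, out1 = forward(y)
loss1 = np.mean((out1 - g) ** 2)
print(loss0, loss1)                       # loss1 is much smaller than loss0
```

This is, of course, only the approximation half of the story; the content of Theorem 3.1 is that the same shrinking of the loss forces convergence of $f^n$ to the PDE solution itself.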
4 Numerical tests

The key purpose of this section is to solve (2.12) with the Deep Galerkin method and compare the numerical solution with the one derived by the well-known finite difference method.
4.1 Model setting

We first fix the specific settings of the market model. For our experiment we assume that the trading decision is driven by a 2-dimensional state variable $Y = (Y^{(1)}, Y^{(2)})$. Let $Y^{(1)}$ be an Ornstein–Uhlenbeck (OU) process and $Y^{(2)}$ a Cox–Ingersoll–Ross (CIR) process. This state variable is expressed in the following matrix form:
\[
\begin{pmatrix} dY^{(1)}_t \\ dY^{(2)}_t \end{pmatrix}
= \begin{pmatrix} \theta^{(1)}\big(k^{(1)} - Y^{(1)}_t\big) \\ \theta^{(2)}\big(k^{(2)} - Y^{(2)}_t\big) \end{pmatrix} dt
+ \begin{pmatrix} 1 & 0 \\ 0 & \sqrt{Y^{(2)}_t} \end{pmatrix}
\begin{pmatrix} a^{(1,1)} & a^{(1,2)} \\ a^{(2,1)} & a^{(2,2)} \end{pmatrix}
\begin{pmatrix} dW^{(1)}_t \\ dW^{(2)}_t \end{pmatrix}.
\]
We also assume that there is one risky asset $S^{(1)}$ in the market, that is,
\[
dS^{(1)}_t = r S^{(1)}_t\,dt + S^{(1)}_t\,dR_t,
\]
where the cumulative excess return $R$ follows the diffusion
\[
dR_t = Y^{(1)}_t\,dt + \sigma \sqrt{Y^{(2)}_t}\,dZ_t, \quad \sigma \in \mathbb{R},
\]
which is known as the Heston model. In this case the correlation between $Z$ and $W$ is of the form $\rho = (\rho_1, \rho_2)$ satisfying
\[
d\langle Z, W^{(i)} \rangle_t = \rho_i\,dt, \quad 1 \le i \le 2.
\]

4.2 Calibration

For the next step we need to set the values of the parameters. Let $P$ be the vector of parameters to be determined:
\[
P = \left\{ \theta^{(1)}, \theta^{(2)}, k^{(1)}, k^{(2)}, a^{(1,1)}, a^{(1,2)}, a^{(2,1)}, a^{(2,2)}, \sigma, \rho_1, \rho_2 \right\}.
\]
We briefly introduce the calibration process using nonlinear least squares optimization on market data. For more detail, see Crisóstomo (2014) and Mehrdoust and Fallah (2020).

Define the Percentage Mean Squared Error (PMSE) between the market price $C_{\text{market}}$ and the model price $C_{\text{model}}$ of a European call option derived from the double Heston model in Mehrdoust and Fallah (2020) and Lemaire et al. (2020):
\[
\text{PMSE} := \sum_{j=1}^{n} w_j \left( \frac{C_{\text{market}}(S_0, K_j, T_j, r) - C_{\text{model}}(S_0, K_j, T_j, r, P)}{C_{\text{market}}(S_0, K_j, T_j, r)} \right)^2,
\]
where the weights $w_j$ satisfy
\[
w_j = \frac{1}{\sqrt{\left| C^{(j)}_{\text{ask}} - C^{(j)}_{\text{bid}} \right|}}.
\]
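The weighted least-squares structure of the PMSE can be sketched in code. The double-Heston pricer $C_{\text{model}}$ is beyond this snippet, so a *hypothetical* two-parameter pricing function stands in for it; the quotes, spreads and parameters below are synthetic and purely illustrative of the fitting mechanics, not of the actual calibration.

```python
import numpy as np
from scipy.optimize import least_squares

# Stand-in for C_model: a made-up two-parameter pricing function (NOT the
# double-Heston pricer), used only to illustrate the weighted PMSE fit.
def c_model(K, T, params):
    a, b = params   # hypothetical model parameters
    return np.maximum(100.0 - K, 0.0) + a * np.sqrt(T) * np.exp(-b * (K / 100.0 - 1.0) ** 2)

K = np.array([80.0, 90.0, 100.0, 110.0, 120.0])   # synthetic strikes
T = np.array([0.25, 0.5, 1.0, 1.0, 2.0])          # synthetic maturities
true = (8.0, 3.0)
c_market = c_model(K, T, true)                    # synthetic "market" quotes
spread = np.full_like(K, 0.5)                     # synthetic bid-ask spreads
w = 1.0 / np.sqrt(np.abs(spread))                 # weights as in the PMSE

def residuals(params):
    # square root of each weighted PMSE term, so that sum(res^2) = PMSE
    return np.sqrt(w) * (c_market - c_model(K, T, params)) / c_market

fit = least_squares(residuals, x0=(1.0, 1.0))
print(fit.x)   # recovers approximately (8.0, 3.0)
```

Since the quotes were generated from the toy model itself, the optimizer recovers the generating parameters; with real quotes and the actual double-Heston pricer the same machinery yields the calibrated vector reported below.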
The optimal parameter vector $P^\star$ is determined by the following nonlinear least squares problem:
\[
P^\star = \arg\inf_P \text{PMSE}.
\]
Table 1 shows the optimal parameters for the observed market data from the S&P 500 index at the close of the market in September 2010.
Table 1: Calibrated parameters
$\theta^{(1)} = 0.$  $k^{(1)} = 0.$  $a^{(1,1)} = -0.$  $a^{(1,2)} = 0.$  $\rho_1 = -0.$  $\sigma = 0.$
$\theta^{(2)} = 0.$  $k^{(2)} = 0.$  $a^{(2,1)} = -0.$  $a^{(2,2)} = 0.$  $\rho_2 = -0.$

4.3 Implementation

Now let us solve (2.12) by the DGM algorithm under the above setting. For the numerical test, we set the interest rate $r = 1\%$, the maturity time $T = 1$ and two values of the power utility preference parameter $p$, the larger of which is $p = 0.5$. We sampled 1000 time–space points $(t, y_1, y_2)$ in the interior of the domain $[0, T] \times [-10, 10] \times [0, 10]$ and 100 space points at the terminal time $T$. We set 100 resampling steps for new time–space domain points; before each resampling, the stochastic gradient descent step is repeated 10 times. We use 50 hidden neurons in a single hidden layer, and the learning rate decays exponentially from its starting value.

Figure 1: Surfaces of the solution by the Deep Galerkin method at four fixed times between $t = 0$ and $t = T$ (the smaller value of $p$).

After solving (2.12) by the DGM algorithm, investors can choose their states $(y_1, y_2) \in [-10, 10] \times [0, 10]$ for fixed $t \in [0, T]$. The optimal portfolio can be constructed using (2.13) as
\[
\pi_{\mathrm{DGM}}(t, y_1, y_2) = \frac{1}{1-p}\left( \Sigma^{-1}\mu + \Sigma^{-1}\Upsilon(\nabla_y u_{\mathrm{DGM}})\,\frac{1}{u_{\mathrm{DGM}}} \right).
\]

Figure 2: Surfaces of the solution by the Deep Galerkin method at four fixed times between $t = 0$ and $t = T$ ($p = 0.5$).

To sum up, one can obtain the value of $u$ and the portfolio value $\pi$ at every time and state. The investor can buy or sell the risky asset $S^{(1)}$ based on the value of the portfolio to maximize utility from terminal wealth.

Figure 1 shows surfaces of the solution $u_{\mathrm{DGM}}$ of (2.12) obtained by the DGM algorithm at different times for the smaller value of the power utility preference parameter $p$, on a restricted plot range of the state variables for convenience. Figure 2 shows surfaces of the solution of (2.12) for $p = 0.5$, again on a restricted plot range. For both values of $p$, we can easily notice that the surface tends to the plane $u = 1$ as time goes to the terminal time $T$: the terminal condition of (2.12). Note, however, that Figure 1 is more regular than Figure 2, in the sense that the value of the $L^2$-loss for the smaller $p$ is smaller than for $p = 0.5$. Hence we may infer that the value of the market preference parameter $p$ plays a significant role in the Deep Galerkin method algorithm.

4.4 Comparing with the Finite Difference Method

Now we solve (2.12) using the finite difference method (FDM). The domain is divided into 40 equal grid intervals in each direction:
\[
0 = t_0 < t_1 < \cdots < t_{40} = T, \qquad -10 = y_{1,0} < y_{1,1} < \cdots < y_{1,40} = 10, \qquad 0 = y_{2,0} < y_{2,1} < \cdots < y_{2,40} = 10.
\]
First of all, we discretize the solution $u$ as
\[
u^n_{i,j} := u(t_n, y_{1,i}, y_{2,j}), \qquad 0 \le i, j, n \le 40.
\]
With this notation, we can discretize equation (2.12) using the following difference formulas:
\[
u_t = \frac{u^{n+1}_{i,j} - u^n_{i,j}}{\Delta t}, \qquad
u_{y_1} = \frac{u^n_{i+1,j} - u^n_{i-1,j}}{2\,\Delta y_1}, \qquad
u_{y_2} = \frac{u^n_{i,j+1} - u^n_{i,j-1}}{2\,\Delta y_2}.
\]
Note that we used the forward difference for $u_t$ in order to obtain the values of $(u^n_{i,j})$ from the values of $(u^{n+1}_{i,j})$, for $n = 39, \ldots, 0$. The second derivatives of $u$ are given by the central differences
\[
u_{y_1 y_1} = \frac{u^n_{i+1,j} - 2u^n_{i,j} + u^n_{i-1,j}}{(\Delta y_1)^2}, \qquad
u_{y_2 y_2} = \frac{u^n_{i,j+1} - 2u^n_{i,j} + u^n_{i,j-1}}{(\Delta y_2)^2},
\]
\[
u_{y_1 y_2} = \frac{u^n_{i+1,j+1} - u^n_{i+1,j-1} - u^n_{i-1,j+1} + u^n_{i-1,j-1}}{4\,\Delta y_1\,\Delta y_2}.
\]
Then the PDE (2.12) becomes a nonlinear equation with 1521 ($= 39 \times 39$) unknowns $(u^n_{i,j})_{1 \le i,j \le 39}$ for each $n = 0, \ldots, 39$. The equation is of the form
\[
\frac{u^{n+1}_{i,j} - u^n_{i,j}}{\Delta t}
+ C_1 \frac{u^n_{i+1,j} - u^n_{i-1,j}}{2\,\Delta y_1}
+ C_2 \frac{u^n_{i,j+1} - u^n_{i,j-1}}{2\,\Delta y_2}
+ C_3 \frac{u^n_{i+1,j} - 2u^n_{i,j} + u^n_{i-1,j}}{(\Delta y_1)^2}
+ C_4 \frac{u^n_{i+1,j+1} - u^n_{i+1,j-1} - u^n_{i-1,j+1} + u^n_{i-1,j-1}}{4\,\Delta y_1\,\Delta y_2}
+ C_5 \frac{u^n_{i,j+1} - 2u^n_{i,j} + u^n_{i,j-1}}{(\Delta y_2)^2}
+ C_6\, u^n_{i,j}
- \frac{q}{2 u^n_{i,j}} \left[
C_7 \left( \frac{u^n_{i+1,j} - u^n_{i-1,j}}{2\,\Delta y_1} \right)^2
+ C_8\, \frac{u^n_{i+1,j} - u^n_{i-1,j}}{2\,\Delta y_1} \cdot \frac{u^n_{i,j+1} - u^n_{i,j-1}}{2\,\Delta y_2}
+ C_9 \left( \frac{u^n_{i,j+1} - u^n_{i,j-1}}{2\,\Delta y_2} \right)^2
\right] = 0, \tag{4.1}
\]
where $C_1, \ldots, C_9$ are constants. Note that the terminal condition $u(T, y) = u(T, y_1, y_2) = 1$ becomes $u^{40}_{i,j} = 1$ for all $0 \le i, j \le 40$.

Since (2.12) has no boundary condition, we used the boundary data from the DGM algorithm. Figure 3 shows surfaces of the solution of (4.1) using the finite difference method at different times for the smaller value of $p$.

Figure 3: Surfaces of the solution using the finite difference method at four fixed times between $t = 0$ and $t = T$.

For both values of $p$, the absolute errors between the solution from the Deep Galerkin method and the one from the finite difference method are displayed in Figure 5. Notice that the error between these algorithms grows slightly as the time $t$ goes to zero. This may be due to the time-reversed finite difference procedure, from $t = T$ to $t = 0$: the stability provided by the terminal condition is gradually weakened as time goes to zero.

Figure 5: Absolute errors between the Deep Galerkin method and the finite difference method at four fixed times.

From a different point of view, combining Figure 5 with Figures 1 and 3, we conclude that the solution is well estimated by the deep neural network. It usually takes about 5 minutes to train the network. On the other hand, it takes less than 30 seconds to find the surface of the solution by the FDM. One might deduce that the traditional algorithm is more time-efficient. However, this is not always true. Figure 4 shows surfaces derived from the finite difference method with $p = 0.5$, on the same domain as Figure 2. In Figure 4, the solution takes extremely large or small values. This singularity may have arisen because the system of equations (4.1) is nonlinear: the inversion required by Newton's method at each step $n = 39, \ldots, 0$ can become ill-conditioned.

Figure 4: Surfaces of the solution by the finite difference method at four fixed times between $t = 0$ and $t = T$ ($p = 0.5$).

5 Conclusion

In this paper we first modeled the market with a safe asset and some risky assets whose dynamics satisfy a diffusion process with returns. We then derived the HJB equation to maximize the expectation of an investor's utility, given investment opportunities modeled by a $d$-dimensional state process. Using some properties including homotheticity and concavity, we finally derived a nonlinear partial differential equation and approximated its solution with a deep learning algorithm.

For comparison with the Deep Galerkin method, we applied the finite difference method to find an approximated solution. Even for a utility parameter well below one, $p = 0.5$, there were several singular points in the solution surfaces approximated by the finite difference method. Hence, unlike the Deep Galerkin method, this mesh-based algorithm showed defects such as singularities caused by the nonlinearity of the discretized partial differential equation. We conclude that the DGM algorithm is relatively stable and has fewer difficulties in approximating the solution of such PDEs.

Furthermore, all the above procedures in section 4 were performed only with a 2-dimensional state process. If the dimension $d$ of the state process increases, there would be millions of grid points, and it would be more computationally efficient to apply the DGM algorithm than the FDM algorithm. Finally, with the approximated solution from the relatively stable DGM algorithm, the investor can decide how to allocate wealth across several risky assets by the optimal portfolio formula.

There are also further topics to be researched: the stability and regularity of the solution as the model or the dimension of the state variable $Y$, the values of the calibrated parameters, the market preference parameter $p$ and the sampling domain are changed. Also, in the optimal portfolio formula, the stability of the gradient term needs to be considered. Meanwhile, Sirignano and Spiliopoulos (2018) proved the convergence of the DGM algorithm only for a class of quasilinear parabolic PDEs. Although Sirignano and Spiliopoulos (2018) noted that the algorithm can be applied to other types of PDEs, further research is needed on the stability for hyperbolic, elliptic and fully nonlinear PDEs.

A Proof of Theorem 3.1
Here we justify Theorem 3.1 by proving the following two theorems in special cases. The main ideas of the proofs are from Sirignano and Spiliopoulos (2018) and Hornik (1991), based on universal approximation arguments. Note that the formulations in this section are not the same as those in the above papers. For completeness, we display almost all computations in the following proofs. The first theorem shows the convergence of $J(f)$: there exists a deep neural network $f$ such that the loss functional $J(f)$ becomes arbitrarily small. The second shows the convergence of the DNN functions to the solution of the PDE.

A.1 Convergence of the loss functional

Assume $D \subset \mathbb{R}^d$ is bounded with a smooth boundary $\partial D$. Denote $D_T = [0, T) \times D$. Consider the following form of quasilinear parabolic PDE:
\[
\mathcal{G}[u](t, y) := \partial_t u(t, y) - \operatorname{div}\big( \alpha(t, y, u, \nabla u) \big) + \gamma(t, y, u, \nabla u) = 0, \quad (t, y) \in D_T,
\]
\[
u(T, y) = u_T(y), \quad y \in D. \tag{A.1}
\]
The differential operator $\mathcal{G}$ can then be expanded as
\[
\mathcal{G}[u](t, y) = \partial_t u(t, y) - \sum_{i,j=1}^{d} \frac{\partial \alpha_i}{\partial u_{y_j}} \frac{\partial u_{y_j}}{\partial y_i} - \sum_{i=1}^{d} \frac{\partial \alpha_i}{\partial u}\, \partial_{y_i} u - \sum_{i=1}^{d} \frac{\partial \alpha_i}{\partial y_i} + \gamma(t, y, u, \nabla u)
=: \partial_t u(t, y) - \sum_{i,j=1}^{d} \frac{\partial \alpha_i}{\partial u_{y_j}} \frac{\partial u_{y_j}}{\partial y_i} + \hat\gamma(t, y, u, \nabla u).
\]

Theorem A.1.
Let $\mathcal{C}^n(\psi)$ be a collection of DNN functions with $n$ hidden neurons in a single hidden layer:
\[
\mathcal{C}^n(\psi) = \left\{ \zeta : \mathbb{R}^{1+d} \to \mathbb{R} \;:\; \zeta(t, y) = \sum_{i=1}^{n} \beta_i\, \psi\!\left( \alpha_{0i}\, t + \sum_{j=1}^{d} \alpha_{ji}\, y_j + c_i \right) \right\},
\]
where $\psi$ is an activation function and $\theta = (\beta_1, \dots, \beta_n, \alpha_{01}, \dots, \alpha_{dn}, c_1, \dots, c_n) \in \mathbb{R}^{2n + n(1+d)}$ is the vector of neural network parameters. Assume the following:

• $\psi$ is in $C^2$, bounded and non-constant.

• $[0, T] \times \bar D$ is compact.

• $\operatorname{supp} \nu_1 \subset D_T$ and $\operatorname{supp} \nu_2 \subset D$.

• The PDE (A.1) has a unique solution, which belongs to both $C(\bar D_T)$ and $C^{1+\eta/2,\, 2+\eta}(D_T)$ for some $0 < \eta < 1$, and
\[
\sup_{D_T} \left( |\nabla_y u(t, y)|^2 + \big| \nabla^2_y u(t, y) \big|^2 \right) < \infty.
\]

• $\hat\gamma(t, y, u, p)$ and $\frac{\partial \alpha_i(t, y, u, p)}{\partial p_j}$ for $1 \le i, j \le d$ are locally Lipschitz continuous, where the Lipschitz constant has polynomial growth in $u$ and $p$.

• $\frac{\partial \alpha_i(t, y, u, p)}{\partial p_j}$ is bounded, for $1 \le i, j \le d$.

Then there is a constant
\[
K = K\!\left( \sup_{D_T} |u|,\; \sup_{D_T} |\nabla_y u|,\; \sup_{D_T} |\nabla^2_y u| \right) > 0
\]
such that for every $\epsilon > 0$ there is a DNN function $f$ in $\mathcal{C}(\psi) = \bigcup_{n=1}^{\infty} \mathcal{C}^n(\psi)$ satisfying $J(f) \le K\epsilon^2$.

Proof.
By Theorem 3 in Hornik (1991), for every $\epsilon > 0$ and $u \in C^{1,2}([0, T] \times \mathbb{R}^d)$, there is a DNN function $f = f(t, y; \theta)$ in $\mathcal{C}(\psi)$ such that
\[
\sup_{D_T} |\partial_t u - \partial_t f| + \max_{0 \le j \le 2} \sup_{\bar D_T} \big| \partial^{(j)}_y u - \partial^{(j)}_y f \big| < \epsilon. \tag{A.2}
\]
Also we may assume that for some $C > 0$ and nonnegative constants $c_1, c_2, c_3$ and $c_4$,
\[
|\hat\gamma(t, y, u, p) - \hat\gamma(t, y, v, q)| \le C \left( |u|^{c_1} + |v|^{c_2} + |p|^{c_3} + |q|^{c_4} + 1 \right) \big( |u - v| + |p - q| \big),
\]
by the local Lipschitz continuity of $\hat\gamma(t, y, u, p)$ in $u$ and $p$. We abbreviate $u(t, y)$ and $f(t, y; \theta)$ as $u$ and $f$ for convenience. From the Hölder inequality with conjugate exponents $r_1$ and $r_2$,
\[
\int_{D_T} |\hat\gamma(t, y, f, \nabla_y f) - \hat\gamma(t, y, u, \nabla_y u)|^2\, d\nu_1
\le C \int_{D_T} \big( |f|^{c_1} + |u|^{c_2} + |\nabla_y f|^{c_3} + |\nabla_y u|^{c_4} + 1 \big)^2 \big( |f - u| + |\nabla_y f - \nabla_y u| \big)^2\, d\nu_1
\]
\[
\le C \left( \int_{D_T} \big( |f|^{c_1} + |u|^{c_2} + |\nabla_y f|^{c_3} + |\nabla_y u|^{c_4} + 1 \big)^{2r_1} d\nu_1 \right)^{1/r_1}
\left( \int_{D_T} \big( |f - u| + |\nabla_y f - \nabla_y u| \big)^{2r_2} d\nu_1 \right)^{1/r_2}
\]
\[
\le C \left( \int_{D_T} \big( |f - u|^{c_1} + |\nabla_y f - \nabla_y u|^{c_3} + |u|^{c_1 \vee c_2} + |\nabla_y u|^{c_3 \vee c_4} + 1 \big)^{2r_1} d\nu_1 \right)^{1/r_1}
\left( \int_{D_T} \big( |f - u| + |\nabla_y f - \nabla_y u| \big)^{2r_2} d\nu_1 \right)^{1/r_2}
\]
\[
\le C \left( \epsilon^{c_1} + \epsilon^{c_3} + \sup_{D_T} |u|^{c_1 \vee c_2} + \sup_{D_T} |\nabla_y u|^{c_3 \vee c_4} + 1 \right)^2 \epsilon^2.
\]
Each constant $C$ in the above inequalities may differ from line to line; the last inequality holds because of (A.2).

Also we may assume
\[
\left| \frac{\partial \alpha_i(t, y, u, p)}{\partial p_j} - \frac{\partial \alpha_i(t, y, v, q)}{\partial q_j} \right|
\le C \left( |u|^{c_1} + |v|^{c_2} + |p|^{c_3} + |q|^{c_4} + 1 \right) \big( |u - v| + |p - q| \big),
\]
by the local Lipschitz continuity of $\partial \alpha_i(t, y, u, p)/\partial p_j$ in $u$ and $p$. For convenience, we denote
\[
\xi(t, y, h, \nabla h, \nabla^2 h) = \sum_{i,j=1}^{d} \frac{\partial \alpha_i(t, y, h, \nabla h)}{\partial h_{y_j}}\, \partial_{y_i y_j} h(t, y).
\]
In the spirit of the above procedure, we use the Hölder inequality with conjugate exponents $p_1$ and $q_1$:
\[
\int_{D_T} \big| \xi(t, y, u, \nabla_y u, \nabla^2_y u) - \xi(t, y, f, \nabla_y f, \nabla^2_y f) \big|^2\, d\nu_1
\]
\[
\le C \int_{D_T} \bigg| \sum_{i,j=1}^{d} \left( \frac{\partial \alpha_i(t, y, f, \nabla f)}{\partial f_{y_j}} - \frac{\partial \alpha_i(t, y, u, \nabla u)}{\partial u_{y_j}} \right) \partial_{y_i y_j} u(t, y) \bigg|^2 d\nu_1
+ C \int_{D_T} \bigg| \sum_{i,j=1}^{d} \frac{\partial \alpha_i(t, y, f, \nabla f)}{\partial f_{y_j}} \big( \partial_{y_i y_j} f(t, y; \theta) - \partial_{y_i y_j} u(t, y) \big) \bigg|^2 d\nu_1
\]
\[
\le C \sum_{i,j=1}^{d} \left( \int_{D_T} \big| \partial_{y_i y_j} u(t, y) \big|^{2p_1} d\nu_1 \right)^{1/p_1}
\left( \int_{D_T} \bigg| \frac{\partial \alpha_i(t, y, f, \nabla f)}{\partial f_{y_j}} - \frac{\partial \alpha_i(t, y, u, \nabla u)}{\partial u_{y_j}} \bigg|^{2q_1} d\nu_1 \right)^{1/q_1}
\]
\[
\quad + C \sum_{i,j=1}^{d} \left( \int_{D_T} \bigg| \frac{\partial \alpha_i(t, y, f, \nabla f)}{\partial f_{y_j}} \bigg|^{2p_1} d\nu_1 \right)^{1/p_1}
\left( \int_{D_T} \big| \partial_{y_i y_j} f(t, y; \theta) - \partial_{y_i y_j} u(t, y) \big|^{2q_1} d\nu_1 \right)^{1/q_1}.
\]
Applying the local Lipschitz bound and a further Hölder inequality with exponents $r_1$ and $r_2$ to the first sum, and the boundedness of $\partial \alpha_i / \partial p_j$ together with (A.2) to the second, both sums are bounded by $C\epsilon^2$.

To sum up, we finally obtain the following inequality:
\[
J(f) = \|\mathcal{G}[f]\|^2_{D_T, \nu_1} + \|f(T, y; \theta) - u_T(y)\|^2_{D, \nu_2}
= \|\mathcal{G}[f] - \mathcal{G}[u]\|^2_{D_T, \nu_1} + \|f(T, y; \theta) - u_T(y)\|^2_{D, \nu_2}
\]
\[
\le C \int_{D_T} \left( |\partial_t u - \partial_t f|^2 + \big| \xi(t, y, u, \nabla u, \nabla^2 u) - \xi(t, y, f, \nabla f, \nabla^2 f) \big|^2 + |\hat\gamma(t, y, f, \nabla_y f) - \hat\gamma(t, y, u, \nabla_y u)|^2 \right) d\nu_1
+ \int_{D} |f(T, y; \theta) - u_T(y)|^2\, d\nu_2 \le K\epsilon^2
\]
for some constant $K > 0$. ∎

A.2 Convergence of the DNN function to the solution of PDEs
As in Section A.1, consider the quasilinear parabolic PDE (A.1) and the loss functional
\[
J(f) = \|\mathcal{G}[f]\|_{D_T,\nu} + \|f(T,y;\theta) - u_T(y)\|_{D,\nu}.
\]
By Theorem A.1, there is a sequence of neural networks $f^n$ such that $J(f^n)$ tends to $0$. Each $f^n$ satisfies
\[
\begin{aligned}
\mathcal{G}[f^n](t,y) &= h^n(t,y), && (t,y) \in D_T, \\
f^n(T,y) &= u_T^n(y), && y \in D,
\end{aligned}
\tag{A.3}
\]
and
\[
\|h^n\|_{D_T,\nu} + \|u_T^n - u_T\|_{D,\nu} \to 0 \quad \text{as } n \to \infty.
\]

Theorem A.2.
Assume the following:

• $\|\alpha(t,y,u,p)\| \le \mu(\|p\| + \kappa(t,y))$ for all $(t,y) \in D_T$, with $\mu > 0$ and $\kappa \in L^2(D_T)$ being positive.

• $\alpha$ is continuously differentiable in $(y,u,p)$.

• Both $\alpha$ and $\gamma$ are Lipschitz continuous, uniformly on compact sets of the form
\[
\{(t,y,u,p) : t \in [0,T],\ y \in \bar{D},\ 0 \le |u| \le C,\ 0 \le \|p\| \le C\}.
\]

• $\langle p, \alpha(t,y,u,p)\rangle \ge \nu\|p\|^2$ for some $\nu > 0$.

• $\langle p_1 - p_2, \alpha(t,y,u,p_1) - \alpha(t,y,u,p_2)\rangle > 0$ for every $p_1, p_2 \in \mathbb{R}^d$ with $p_1 \neq p_2$.

• $|\gamma(t,y,u,p)| \le \|p\|\,\lambda(t,y)$ for all $(t,y) \in D_T$, with $\lambda \in L^{d+2}(D_T)$ being positive.

• $u_T(y) \in C^{0,\xi}(\bar{D})$ for some $\xi > 0$. Note that
\[
\|u(y)\|_{C^{0,\beta}(\bar{D})} = \sup_{y \in \bar{D}} |u(y)| + \sup_{y_1, y_2 \in \bar{D},\, y_1 \neq y_2} \frac{|u(y_1) - u(y_2)|}{|y_1 - y_2|^{\beta - [\beta]}}.
\]
$u_T$ and $u_T'$ are bounded in $\bar{D}$.

• $D \subset \mathbb{R}^d$ is bounded and open with boundary $\partial D \in C^2$.

• $(f^n)_{n \in \mathbb{N}} \subset C^{1,2}(\bar{D}_T)$ and $(f^n)_{n \in \mathbb{N}} \subset L^2(D_T)$.

Then

1. the PDE (A.1) has a unique bounded solution $u \in C^{\delta, \delta/2}(\bar{D}_T) \cap W^{(1,2),2}(D^\star_T) \cap L^2\bigl(0,T; W^{1,2}(D)\bigr)$ for some $\delta > 0$, for any interior subdomain $D^\star_T \subset D_T$;

2. $f^n \to u$ strongly in $L^\rho(D_T)$ for every $\rho < 2$.

Note that in the case of quasilinear parabolic PDEs with boundary conditions, we should also consider the limiting process in the weak formulation of the PDEs and use Vitali's theorem. For more detail, see Appendix A in Sirignano and Spiliopoulos (2018). See also Boccardo et al. (2009), Magliocca (2018), Di Nardo et al. (2011) and Debnath (2011).
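Theorem A.2 is what justifies training against $J$: once $J(f^n) \to 0$, the networks converge to the true solution. In practice, $J(f)$ is estimated by Monte Carlo sampling from $\nu$. The following sketch is purely illustrative (the toy operator $\mathcal{G}[f] = \partial_t f + \partial_{yy} f$, the domain, and the candidate functions are assumptions of this example, not the paper's model): it evaluates such an estimate with squared-$L^2$ norms and finite differences, showing that the exact solution drives the loss to numerical zero while a wrong candidate does not.

```python
import math
import random

T = 1.0
D = (0.0, math.pi)  # illustrative spatial domain D = (0, pi)

def u_exact(t, y):
    """Solves f_t + f_yy = 0 with terminal condition f(T, y) = sin(y)."""
    return math.exp(-(T - t)) * math.sin(y)

def u_wrong(t, y):
    """Matches the terminal condition but violates the PDE in the interior."""
    return math.sin(y)

def G(f, t, y, h=1e-3):
    """Central finite-difference estimate of the toy operator G[f] = f_t + f_yy."""
    f_t = (f(t + h, y) - f(t - h, y)) / (2 * h)
    f_yy = (f(t, y + h) - 2 * f(t, y) + f(t, y - h)) / h ** 2
    return f_t + f_yy

def J(f, n_samples=2000, seed=0):
    """Monte Carlo estimate of the DGM loss: interior residual + terminal error."""
    rng = random.Random(seed)
    interior = 0.0
    for _ in range(n_samples):
        t, y = rng.uniform(0.0, T), rng.uniform(*D)
        interior += G(f, t, y) ** 2
    terminal = 0.0
    for _ in range(n_samples):
        y = rng.uniform(*D)
        terminal += (f(T, y) - math.sin(y)) ** 2
    return interior / n_samples + terminal / n_samples
```

In the actual DGM algorithm the finite differences are replaced by automatic differentiation, and $f$ is a neural network whose parameters are tuned by stochastic gradient descent on this estimate.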
Proof.
Existence, regularity and uniqueness for (A.1) follow from Theorem 2.1 in Porzio (1999) and Theorems 6.3 to 6.5 of Chapter V.6 in Ladyzhenskaia et al. (1968). Boundedness holds by Theorem 2.1 in Porzio (1999); see also Chapter V.2 of Ladyzhenskaia et al. (1968).

Let $f^n$ be the solution of (A.3). By Lemma 4.1 of Porzio (1999), $\{f^n\}_{n\in\mathbb{N}}$ is uniformly bounded in both $L^\infty(0,T;L^2(D))$ and $L^2\bigl(0,T;W^{1,2}(D)\bigr)$. Then we can pick a subsequence of the sequence of neural networks $\{f^n\}_{n\in\mathbb{N}}$, still denoted by $\{f^n\}_{n\in\mathbb{N}}$ for convenience, satisfying

• $f^n \xrightarrow{\ w^*\ } u$ in $L^\infty(0,T;L^2(D))$,

• $f^n \to u$ weakly in $L^2\bigl(0,T;W^{1,2}(D)\bigr)$,

• $f^n(\cdot,t) \to v(\cdot,t)$ weakly in $L^2(D)$, for every fixed $t$ in $[0,T)$,

for some functions $u, v$. Since the norm of $f$ in the Banach space $L^2\bigl(0,T;W^{1,2}(D)\bigr)$ is defined as
\[
\|f\|_{L^2(0,T;W^{1,2}(D))} = \biggl( \int_0^T \|f\|^2_{W^{1,2}(D)} \, dt \biggr)^{1/2},
\quad \text{where} \quad
\|f\|_{W^{1,2}(D)} = \sum_{|\alpha| \le 1} \|D^\alpha f\|_{L^2(D)} = \|f\|_{L^2(D)} + \|Df\|_{L^2(D)},
\]
$\{\nabla_y f^n\}_{n\in\mathbb{N}}$ is uniformly bounded in $L^2\bigl(0,T;L^2(D)\bigr)$.

Let $q = 1 + \frac{d}{d+4} \in (1,2)$ and let $r_1, r_2 > 1$ be conjugate exponents. Then
\[
\int_{D_T} |\gamma(t,y,f^n,\nabla_y f^n)|^q \, dt\,dy
\le \int_{D_T} |\lambda(t,y)|^q |\nabla_y f^n(t,y)|^q \, dt\,dy
\le \biggl( \int_{D_T} |\lambda(t,y)|^{r_1 q} \, dt\,dy \biggr)^{1/r_1} \biggl( \int_{D_T} |\nabla_y f^n(t,y)|^{r_2 q} \, dt\,dy \biggr)^{1/r_2}.
\]
Choose $r_2 = 2/q$. Then we get $r_1 = \frac{2}{2-q}$ and hence $r_1 q = d+2$. Since $\lambda \in L^{d+2}(D_T)$ and $\{\nabla_y f^n\}_{n\in\mathbb{N}}$ is uniformly bounded,
\[
\int_{D_T} |\gamma(t,y,f^n,\nabla_y f^n)|^q \, dt\,dy \le C
\]
for some $C > 0$. The growth assumption on $\alpha$ and the above argument imply that $\{\partial_t f^n\}_{n\in\mathbb{N}}$ is uniformly bounded in $L^q(D_T)$ and $L^2\bigl(0,T;W^{-1,2}(D)\bigr)$. Let $\delta_1, \delta_2$ be conjugate exponents satisfying $\delta_1 > \max\{2, d\}$.
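The exponent bookkeeping above can be checked mechanically. A small sketch in exact rational arithmetic (the function name is illustrative), verifying that $q = 1 + d/(d+4)$ together with $r_2 = 2/q$ and its Hölder conjugate $r_1$ indeed give $r_2 q = 2$ and $r_1 q = d + 2$:

```python
from fractions import Fraction

def holder_exponents(d):
    """Exponents used in the bound on the gradient term, for spatial dimension d."""
    q = 1 + Fraction(d, d + 4)   # q = (2d + 4) / (d + 4), strictly between 1 and 2
    r2 = 2 / q                   # chosen so that r2 * q = 2
    r1 = r2 / (r2 - 1)           # Hoelder conjugate: 1/r1 + 1/r2 = 1
    return q, r1, r2

for d in (1, 2, 3, 10, 100):
    q, r1, r2 = holder_exponents(d)
    assert r2 * q == 2                                    # matches the L^2 bound on grad f^n
    assert Fraction(1, 1) / r1 + Fraction(1, 1) / r2 == 1
    assert r1 * q == d + 2                                # matches lambda in L^{d+2}(D_T)
```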
By the Gagliardo–Nirenberg–Sobolev inequality and the Rellich–Kondrachov compactness theorem (for further details, see Chapter 5 in Evans (2002)), the following embeddings hold:
\[
W^{-1,2}(D) \subset W^{-1,\delta_2}(D), \quad L^q(D) \subset W^{-1,\delta_2}(D), \quad L^2(D) \subset W^{-1,\delta_2}(D),
\]
and hence $\{\partial_t f^n\}_{n\in\mathbb{N}}$ is uniformly bounded in $L^1\bigl(0,T;W^{-1,\delta_2}(D)\bigr)$.

By Corollary 4 in Simon (1986) and the embedding
\[
W^{1,2}(D) \subset\subset L^2(D) \subset W^{-1,\delta_2}(D),
\]
$\{f^n\}_{n\in\mathbb{N}}$ is relatively compact in $L^2(D_T)$; in other words, $f^n \to u$ strongly in $L^2(D_T)$ as $n \to \infty$. Thus
\[
f^n \to u \quad \text{almost everywhere in } D_T \text{ up to subsequences.} \tag{A.4}
\]
Note that from Theorem 3.3 of Boccardo et al. (1997), we get
\[
\nabla f^n \to \nabla u \quad \text{almost everywhere in } D_T. \tag{A.5}
\]
Hence $f^n \to u$ strongly in $L^\rho\bigl(0,T;W^{1,\rho}(D)\bigr)$, and so in $L^\rho(D_T)$, for every $\rho < 2$, by (A.4) and (A.5).
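The mode of convergence in the conclusion can be illustrated numerically. The sketch below (all functions are illustrative assumptions, unrelated to the paper's networks) estimates the $L^\rho(D_T)$ distance by Monte Carlo for an explicit sequence $f^n = u + \cos(ny)/n$ converging to a reference $u$, with $\rho = 3/2 < 2$:

```python
import math
import random

T = 1.0
D = (0.0, math.pi)  # illustrative domain

def u(t, y):
    """Reference limit function (illustrative)."""
    return math.exp(-(T - t)) * math.sin(y)

def f_n(n):
    """Approximating sequence f^n = u + cos(n y) / n, converging to u."""
    return lambda t, y: u(t, y) + math.cos(n * y) / n

def lrho_dist(f, g, rho=1.5, n_samples=4000, seed=0):
    """Monte Carlo estimate of the L^rho(D_T) distance between f and g."""
    rng = random.Random(seed)
    volume = T * (D[1] - D[0])
    total = sum(
        abs(f(t, y) - g(t, y)) ** rho
        for t, y in ((rng.uniform(0.0, T), rng.uniform(*D)) for _ in range(n_samples))
    )
    return (volume * total / n_samples) ** (1.0 / rho)

# the L^rho error shrinks as n grows, mirroring f^n -> u strongly in L^rho(D_T)
errors = [lrho_dist(f_n(n), u) for n in (1, 10, 100)]
```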
Acknowledgement.
Hyungbin Park was supported by the Research Resettlement Fund for the new faculty of Seoul National University. Hyungbin Park was also supported by the National Research Foundation of Korea (NRF) grants funded by the Ministry of Science and ICT (No. 2018R1C1B5085491 and No. 2017R1A5A1015626) and the Ministry of Education (No. 2019R1A6A1A10073437) through the Basic Science Research Program.
References
Achdou, Y. and Pironneau, O. (2005). Computational Methods for Option Pricing. SIAM.

Al-Aradi, A., Correia, A., Naiff, D., Jardim, G., and Saporito, Y. (2018). Solving nonlinear and high-dimensional partial differential equations via deep learning. arXiv preprint arXiv:1811.08782.

Al-Aradi, A., Correia, A., Naiff, D. d. F., Jardim, G., and Saporito, Y. (2019). Applications of the deep Galerkin method to solving partial integro-differential and Hamilton-Jacobi-Bellman equations. arXiv preprint arXiv:1912.01455.

Benth, F. E., Karlsen, K. H., and Reikvam, K. (2003). Merton's portfolio optimization problem in a Black and Scholes market with non-Gaussian stochastic volatility of Ornstein-Uhlenbeck type. Mathematical Finance, 13(2):215–244.

Björk, T. (2009). Arbitrage Theory in Continuous Time. Oxford University Press.

Boccardo, L., Dall'Aglio, A., Gallouët, T., and Orsina, L. (1997). Nonlinear parabolic equations with measure data. Journal of Functional Analysis, 147(1):237–258.

Boccardo, L., Porzio, M. M., and Primo, A. (2009). Summability and existence results for nonlinear parabolic equations. Nonlinear Analysis: Theory, Methods & Applications, 71(3-4):978–990.

Burden, R., Faires, J. D., and Reynolds, A. (2010). Numerical Analysis. Brooks/Cole, Boston, Mass., USA.

Callegaro, G., Gaïgi, M., Scotti, S., and Sgarra, C. (2017). Optimal investment in markets with over and under-reaction to information. Mathematics and Financial Economics, 11(3):299–322.

Crisóstomo, R. (2014). An analysis of the Heston stochastic volatility model: Implementation and calibration using Matlab.

Danilova, A., Monoyios, M., and Ng, A. (2010). Optimal investment with inside information and parameter uncertainty. Mathematics and Financial Economics, 3(1):13–38.

Debnath, L. (2011). Nonlinear Partial Differential Equations for Scientists and Engineers. Springer Science & Business Media.

Di Nardo, R., Feo, F., and Guibé, O. (2011). Existence result for nonlinear parabolic equations with lower order terms. Anal. Appl. (Singap.), 9(2):161–186.

Evans, L. C. (2002). Partial Differential Equations. Graduate Studies in Mathematics, 19. AMS.

Guasoni, P. and Robertson, S. (2015). Static fund separation of long-term investments. Mathematical Finance, 25(4):789–826.

Han, J., Jentzen, A., and Weinan, E. (2018). Solving high-dimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences, 115(34):8505–8510.

Hansen, S. L. (2013). Optimal consumption and investment strategies with partial and private information in a multi-asset setting. Mathematics and Financial Economics, 7(3):305–340.

Hornik, K. (1991). Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2):251–257.

Kühn, C. and Stroh, M. (2010). Optimal portfolios of a small investor in a limit order market: a shadow price approach. Mathematics and Financial Economics, 3(2):45–72.

Ladyzhenskaia, O. A., Solonnikov, V. A., and Ural'tseva, N. N. (1968). Linear and Quasi-linear Equations of Parabolic Type, volume 23. American Mathematical Society.

Lagaris, I. E., Likas, A. C., and Papageorgiou, D. G. (2000). Neural-network methods for boundary value problems with irregular boundaries. IEEE Transactions on Neural Networks, 11(5):1041–1049.

Lee, H. and Kang, I. S. (1990). Neural algorithm for solving differential equations. Journal of Computational Physics, 91(1):110–131.

Lemaire, V., Montes, T., et al. (2020). Stationary Heston model: Calibration and pricing of exotics using product recursive quantization. arXiv preprint arXiv:2001.03101.

Liang, Z. and Ma, M. (2020). Robust consumption-investment problem under CRRA and CARA utilities with time-varying confidence sets. Mathematical Finance, 30(3):1035–1072.

Magliocca, M. (2018). Existence results for a Cauchy–Dirichlet parabolic problem with a repulsive gradient term. Nonlinear Analysis, 166:102–143.

Malek, A. and Beidokhti, R. S. (2006). Numerical solution for high order differential equations using a hybrid neural network—optimization method. Applied Mathematics and Computation, 183(1):260–271.

Mehrdoust, F. and Fallah, S. (2020). On the calibration of fractional two-factor stochastic volatility model with non-Lipschitz diffusions. Communications in Statistics - Simulation and Computation, pages 1–20.

Merton, R. C. (1969). Lifetime portfolio selection under uncertainty: The continuous-time case. The Review of Economics and Statistics, pages 247–257.

Nutz, M. (2010). The opportunity process for optimal consumption and investment with power utility. Mathematics and Financial Economics, 3(3-4):139–159.

Pedersen, J. L. and Peskir, G. (2017). Optimal mean-variance portfolio selection. Mathematics and Financial Economics, 11(2):137–160.

Porzio, M. M. (1999). Existence of solutions for some "noncoercive" parabolic equations. Discrete & Continuous Dynamical Systems - A, 5(3):553.

Remani, C. (2013). Numerical methods for solving systems of nonlinear equations. Lakehead University, Thunder Bay, Ontario, Canada.

Simon, J. (1986). Compact sets in the space $L^p(0,T;B)$. Annali di Matematica Pura ed Applicata, 146(1):65–96.

Sirignano, J. and Spiliopoulos, K. (2018). DGM: A deep learning algorithm for solving partial differential equations. Journal of Computational Physics, 375:1339–1364.

Weinan, E., Hutzenthaler, M., Jentzen, A., and Kruse, T. (2019). On multilevel Picard numerical approximations for high-dimensional nonlinear parabolic partial differential equations and high-dimensional nonlinear backward stochastic differential equations. Journal of Scientific Computing, 79(3):1534–1571.