Bellman type strategy for the continuous time mean-variance model
Shuzhen Yang

Abstract: To investigate a time-consistent optimal strategy for the continuous time mean-variance model, we develop a new method to establish the Bellman principle. Based on this new method, we obtain a time-consistent dynamic optimal strategy that differs from the pre-committed and game-theoretic strategies. A comparison with the existing results on the continuous time mean-variance model shows that our method has several advantages. The explicit solutions of the dynamic optimal strategy and optimal wealth are given. Once the dynamic optimal strategy is given at the initial time, it does not change over the remaining investment time interval.

Keywords: Mean-variance; Bellman principle; Stochastic control.
Journal of Economic Literature classification Numbers : G11, D81, C61.
MSC2010 subject classification: 91B28; 93E20; 49N10.

Shandong University-Zhong Tai Securities Institute for Financial Studies, Shandong University, PR China ([email protected]). This work was supported by the National Natural Science Foundation of China (Grant No. 11701330) and the Young Scholars Program of Shandong University.

1 Introduction
In the portfolio selection problem, we want to minimize the risk of the wealth within a given expected return. To solve this problem, Markowitz (1952, 1959) proposed a mean-variance model in a single-period case. Then, Merton (1972) solved this single-period problem analytically under mild assumptions. In the single-period mean-variance model, the mean and variance of wealth represent its expected return and risk, respectively. Following this original single-period framework for the portfolio selection problem, many authors began to consider related problems and the multi-period mean-variance model. Different from the single-period framework, in the multi-period mean-variance model the investor needs to optimize the multi-period objectives, not only the next-period objective.

Furthermore, discrete and continuous time mean-variance portfolio selection models have been proposed for the multi-period framework. Richardson (1989) investigated a mean-variance model for one risky stock and a bond with a constant risk-free rate in a continuous-time setting, in which the author focused on minimizing the variance of the wealth at the terminal time under a constraint on its mean value. Bajeux-Besnainou and Portait (1998) considered portfolio strategies that are mean-variance efficient when continuous rebalancing is allowed between the initial time and the terminal time. Li and Ng (2000) employed the results of stochastic optimal control theory to solve a discrete-time multi-period mean-variance problem by embedding the original problem into a multi-objective optimization framework. Following the same idea as Li and Ng (2000), Zhou and Li (2000) investigated an optimal strategy and efficient frontier for the continuous-time mean-variance problem. In contrast, Dybvig (1988) proposed a cost-efficient approach to the optimal portfolio selection in a straightforward manner.
Based on the cost-efficient approach, Bernard and Vanduffel (2014) considered the problem of a mean-variance optimal portfolio in the presence of a benchmark. The optimal strategy and efficient frontier in the continuous-time mean-variance problem derived by the cost-efficient approach are consistent with the results of Zhou and Li (2000). Further extensions of the mean-variance problem in continuous time include those with bankruptcy prohibition, transaction costs, and random parameters in complete and incomplete markets (Bielecki et al., 2005; Dai et al., 2010; Lim and Zhou, 2002; Lim, 2004; Xia, 2005).

Based on a general mean-field framework, Andersson and Djehiche (2011) considered the optimal control problem of a stochastic differential equation of mean-field type, also called a McKean-Vlasov type equation. Applying the related stochastic maximum principle to the mean-variance portfolio selection problem, Andersson and Djehiche (2011) obtained an optimal strategy which coincides with that in Zhou and Li (2000). Li (2012) investigated an integral form stochastic maximum principle for general mean-field optimal control systems; as an application, a mean-field type linear quadratic stochastic control problem is solved. Buckdahn et al. (2011) established a general stochastic maximum principle for stochastic differential equations of mean-field type. In addition, Fischer and Livieri (2016) studied the continuous time mean-variance portfolio optimization problem and obtained the related pre-committed strategy using the mean-field approach. Pham and Wei (2017) considered the optimal control of general stochastic McKean-Vlasov equations and established the dynamic programming principle for the value function in the Wasserstein space of probability measures.
In addition, the linear-quadratic stochastic McKean-Vlasov control problem and an interbank systemic risk model with common noise were investigated in Pham and Wei (2017); see also Pham and Wei (2018). Recently, Ismail and Pham (2019) considered a robust continuous-time mean-variance portfolio selection problem where the model uncertainty affects the covariance matrix of multiple risky assets. Furthermore, Ismail and Pham (2019) obtained the explicit solution for the optimal robust portfolio strategies in the case of uncertain volatilities, which coincides with that in Zhou and Li (2000) and Fischer and Livieri (2016).

The optimal strategy in the aforementioned multi-period mean-variance framework is a pre-committed strategy, which rests on the premise that the investor follows the strategy given at the initial time. However, if the optimal strategy is not time-consistent, the investor may not obey this strategy over the remaining investment time interval. Here, time-consistency means that the investor obtains the same strategy at any time during the investment time interval. Thus, developing a dynamic time-consistent strategy for the mean-variance model in the continuous time framework is significant. From a game point of view, by directly defining a local maximum principle, a game-theoretic approach has been investigated to address the mean-variance model in the multi-period case. Furthermore, by introducing an adjustment term in the objective, Basak and Chabakauri (2010) adopted a dynamic method to study the mean-variance model. Hu et al. (2012) formulated a general time-inconsistent stochastic linear-quadratic control problem and defined an equilibrium instead of an optimal control; see also Yong (2012). In addition, a pre-committed strategy for the mean-variance model is given in Hu et al. (2012). Huang et al. (2007) considered large population stochastic dynamic games and the Nash certainty equivalence based control laws. Bensoussan et al.
(2016) studied linear-quadratic mean field games via the adjoint equation approach; see also Bensoussan et al. (2013). Björk et al. (2014) studied the mean-variance problem with state dependent risk aversion. Björk et al. (2017) established a general framework to study time-inconsistent stochastic control in the continuous time framework. In particular, Dai et al. (2019) proposed a dynamic mean-variance analysis for log returns within the game-theoretic approach.

Different from the aforementioned continuous time mean-variance framework, in this study we develop a new method to study the multi-period mean-variance model via the dynamic programming principle. Let X^π_{t,x}(·) denote the wealth of the investor in the investment time interval [t,T] with initial time t and state x, where π(·) is the related strategy. The objective of the investor is to minimize the variance of the wealth Var[X^π_{t,x}(T)] within a given mean level constraint on E[X^π_{t,x}(T)]. The difficulty of this problem is that the objective Var[X^π_{t,x}(T)] does not satisfy the iterated-expectation property. Therefore, we cannot directly use the dynamic programming principle in the theory of stochastic optimal control to solve this continuous time mean-variance problem. Noting that Var[X^π_{t,x}(T)] = E[(X^π_{t,x}(T) − E[X^π_{t,x}(T)])²], the term E[X^π_{t,x}(T)] in the formula of Var[X^π_{t,x}(T)] is the main gap when we investigate a Bellman principle for the variance Var[X^π_{t,x}(T)]. To bridge this gap, we use a deterministic process to represent the mean process E[X^π_{t,x}(·)], which is motivated by Example 1. Therefore, we introduce a new deterministic process Y^π_{t,y}(·), which satisfies Y^π_{t,y}(·) = E[X^π_{t,y}(·)].
The objective becomes E[(X^π_{t,x}(T) − Y^π_{t,y}(T))²] within a given mean constraint on E[X^π_{t,x}(T)]. In this study, we want to consider the following objective cost functional:

˜J(t,x,y,µ;π(·)) = µ E[(X^π_{t,x}(T) − Y^π_{t,y}(T))²] − Y^π_{t,y}(T). (1.1)

Note that the definition of the cost functional (1.1) allows us to separate the process Y^π_{t,y}(·) from the wealth's variance. Then, a value function V_µ(t,x,y) is defined by optimizing the objective cost functional (1.1). We prove that the value function V_µ(t,x,y) satisfies a Bellman principle, and a related Hamilton-Jacobi-Bellman equation is derived. Through a series of analyses, we obtain the explicit solution for the value function V_µ(t,x,y) and the related optimal strategy for x ≠ y. To solve the original problem, we extend the explicit solution of the value function V_µ(t,x,y) to the case x = y. Furthermore, we find a time-consistent dynamic optimal strategy that differs from the existing strategies, and we compare our dynamic optimal strategy with the pre-committed and game-theoretic strategies. For notational simplicity, we use "game-theoretic strategy" to denote the optimal strategy that is developed by the game-theoretic approach.

The remainder of this paper is organized as follows. In Section 2, we use a simple example to motivate our Bellman principle. In Section 3, we formulate the continuous time mean-variance model, investigate an optimal strategy, and establish a dynamic time-consistent relationship between the mean and variance of the investor's wealth. In Section 4, following the main results of Section 3, we compare the mean and variance of the investor's wealth and the dynamic optimal strategy of our method with those of the pre-committed and game-theoretic strategies. In Section 5, we consider a general setting for the mean-variance model. Finally, we conclude the paper in Section 6.
2 Motivation

In this section, we show the motivation of our Bellman principle for the classical continuous time mean-variance model using the following example.

Example 1.
Let us consider a simple stochastic process:

X_{t,x}(s) = x + b(s − t) + σ[W(s) − W(t)], t ≤ s ≤ T,

where b, σ are constants, T > 0, and W(·) is a standard Brownian motion. We consider the following value function:

V(t,x) = E[X_{t,x}(T)].

Employing the Bellman principle for V(·), one obtains

V(t,x) = E[V(s, X_{t,x}(s))], t ≤ s ≤ T.

Thus, V(t,x) satisfies the following partial differential equation (PDE):

∂_t V(t,x) + b ∂_x V(t,x) + (σ²/2) ∂_{xx} V(t,x) = 0, V(T,x) = x, 0 ≤ t < T. (2.1)

Based on PDE (2.1), we can find a unique classical solution,

V(t,x) = x + b(T − t),

from which we can see that the second-order term (σ²/2) ∂_{xx} V(t,x) = 0. Therefore, equation (2.1) becomes

∂_t V(t,x) + b ∂_x V(t,x) = 0, V(T,x) = x, 0 ≤ t < T. (2.2)

These results motivate us to consider the expectation process of X_{t,x}(·),

Y_{t,x}(s) = E[X_{t,x}(s)] = x + b(s − t), t ≤ s ≤ T,

and note that V(t,x) = Y_{t,x}(T) = x + b(T − t) satisfies equation (2.2).

In the following, we consider the value function given by a nonlinear function of E[·],

V(t,x) = Φ(E[X_{t,x}(T)]),

where Φ(x) has a continuous first-order derivative in x ∈ R. Notice that we cannot use the Bellman principle for the nonlinear functional Φ(E[X_{t,x}(T)]), because the iterated-expectation property does not hold for Φ(E[X_{t,x}(T)]). Instead, we can study the value function defined through the process Y_{t,x}(·) = E[X_{t,x}(·)],

V(t,x) = V(s, Y_{t,x}(s)) = Φ(Y_{t,x}(T)), t ≤ s ≤ T,

and V(t,x) satisfies the following equation:

∂_t V(t,x) + b ∂_x V(t,x) = 0, V(T,x) = Φ(x), 0 ≤ t < T. (2.3)
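The computations in Example 1 can be checked by simulation. The following sketch (with illustrative values of b, σ, and the horizon, not taken from the paper) verifies V(t,x) = x + b(T − t) by Monte Carlo and illustrates why a nonlinear Φ of the mean breaks the iterated-expectation property:

```python
import numpy as np

# Monte Carlo check of Example 1 (parameters are illustrative, not from the paper):
# X_{t,x}(T) = x + b(T - t) + sigma (W(T) - W(t)), hence E[X_{t,x}(T)] = x + b(T - t).
rng = np.random.default_rng(0)
t, T, x, b, sigma = 0.0, 1.0, 1.0, 0.5, 0.3

X_T = x + b * (T - t) + sigma * np.sqrt(T - t) * rng.standard_normal(1_000_000)

V_exact = x + b * (T - t)        # closed-form solution of PDE (2.2)
V_mc = X_T.mean()                # Monte Carlo estimate of E[X_{t,x}(T)]
assert abs(V_mc - V_exact) < 1e-2

# For a nonlinear Phi, Phi(E[X(T)]) and E[Phi(X(T))] differ, which is why the
# Bellman principle is applied through the mean process Y rather than directly.
Phi = np.square
gap = Phi(X_T).mean() - Phi(X_T.mean())   # approximately Var[X_{t,x}(T)] = sigma^2 (T - t)
assert abs(gap - sigma**2 * (T - t)) < 5e-3
```

For Φ(x) = x² the gap between E[Φ(X(T))] and Φ(E[X(T)]) is exactly the variance, which is the term handled through the auxiliary mean process in the sequel.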
Remark 2.1. Example 1 indicates that when we consider a value function given by a nonlinear function of E[X_{t,x}(·)], we can introduce a process that denotes the expectation of the state process X_{t,x}(·). Based on these observations, we can establish the Bellman principle for the value function through the mean process E[X_{t,x}(·)]. In the following, we use this idea to study the mean-variance portfolio problem in the continuous time framework.

3 The mean-variance model

Given a complete filtered probability space (Ω, F, P; {F(s)}_{s≥t}), let W(·) be a d-dimensional standard Brownian motion defined on it with W(t) = 0, where {F(s)}_{s≥t} is the P-augmentation of the natural filtration generated by W(·). In the financial market, one risk-free bond asset and n risky stock assets are traded, where the bond satisfies the following equation:

dS_0(s)/S_0(s) = r(s)ds, S_0(t) = s_0, t < s ≤ T,

and the i'th (1 ≤ i ≤ n) stock asset is described by

dS_i(s)/S_i(s) = b_i(s)ds + Σ_{j=1}^d σ_{ij}(s)dW_j(s), S_i(t) = s_i, t < s ≤ T,

where r(·) ∈ R is the risk-free return rate of the bond, b(·) = (b_1(·), ..., b_n(·)) ∈ R^n is the expected return rate of the risky assets, and σ(·) = (σ_1(·), ..., σ_n(·))^⊤ ∈ R^{n×d} is the corresponding volatility matrix. Given initial capital x > 0, we set γ(·) = (γ_1(·), ..., γ_n(·)) ∈ R^n, where γ_i(·) = b_i(·) − r(·), 1 ≤ i ≤ n. The investor's wealth X^π_{t,x}(·) satisfies

dX^π_{t,x}(s) = [r(s)X^π_{t,x}(s) + γ(s)π(s)^⊤]ds + π(s)σ(s)dW(s), X^π_{t,x}(t) = x, t < s ≤ T, (3.1)

where π(·) = (π_1(·), ..., π_n(·)) ∈ R^n is the capital invested in the risky assets S(·) = (S_1(·), ..., S_n(·)) ∈ R^n and π_0(·) is the capital invested in the bond. Thus, we have X^π_{t,x}(·) = Σ_{i=0}^n π_i(·).

In this study, we consider the following mean-variance model:

J(t,x;π(·)) = Var[X^π_{t,x}(T)] = E[(X^π_{t,x}(T) − E[X^π_{t,x}(T)])²], (3.2)

with the following constraint on the mean,

E[X^π_{t,x}(T)] = L.
(3.3)

The set of admissible strategies π(·) is defined as:

A_t^T = { π(·) : π(·) ∈ L²_F[t,T;R^n] },

where L²_F[t,T;R^n] is the set of all square integrable {F(s)}_{s≥t}-adapted R^n-valued processes. If there exists a strategy π*(·) ∈ A_t^T that yields the minimum value of the cost functional (3.2), then we say that the mean-variance model (3.2) is solved. We suppose the following assumptions hold to obtain the optimal strategy for the proposed model (3.2):

H1: r(·), b(·) and σ(·) are bounded deterministic continuous functions.

H2: r(·) > 0, γ(·) > 0, and σ(·)σ(·)^⊤ ≥ δI, where δ > 0 and I is the identity matrix of S^n, the set of n×n symmetric matrices.

3.1 The dynamic optimal strategy

In this section, we solve the mean-variance model via a dynamic programming principle method. In detail, we represent the term E[X^π_{t,x}(·)] by a deterministic process Y^π_{t,x}(·), which differs from the stochastic term X^π_{t,x}(·). Therefore, we can establish the related Bellman principle. First, we introduce the following cost functional:

J(t,x,µ;π(·)) = µ Var[X^π_{t,x}(T)] − E[X^π_{t,x}(T)], (3.4)

where µ > 0 is the risk aversion coefficient and can be determined by the mean constraint L in (3.3). Notice that

Var[X^π_{t,x}(T)] = E[(X^π_{t,x}(T) − E[X^π_{t,x}(T)])²].

However, we cannot obtain the Bellman principle for the term (E[X^π_{t,x}(T)])², because (E[·])² is a nonlinear function of E[·]. Remark 2.1 suggests that we consider the dynamic programming principle for the variables (s, X^π_{t,x}(s), E[X^π_{t,x}(s)]), t ≤ s ≤ T, x ∈ R. To separate the expectation term from the variance, we introduce the following auxiliary process Y^π_{t,y}(·), where Y^π_{t,y}(·) satisfies

dY^π_{t,y}(s) = [r(s)Y^π_{t,y}(s) + γ(s)E[π(s)]^⊤]ds, Y^π_{t,y}(t) = y, t < s ≤ T.
(3.5)

Comparing equations (3.1) and (3.5), we can see that

Y^π_{t,y}(s) = E[X^π_{t,y}(s)], t ≤ s ≤ T.

Now, we introduce a useful version of the cost functional (3.4):

˜J(t,x,y,µ;π(·)) = µ E[(X^π_{t,x}(T) − E[X^π_{t,y}(T)])²] − E[X^π_{t,y}(T)] = µ E[(X^π_{t,x}(T) − Y^π_{t,y}(T))²] − Y^π_{t,y}(T).

Obviously, we have

˜J(t,x,x,µ;π(·)) = J(t,x,µ;π(·)).

Therefore, we consider the following value function:

V_µ(t,x,y) = inf_{π(·)∈A_t^T} ˜J(t,x,y,µ;π(·)). (3.6)

Remark 3.1.
In the definition of the cost functional ˜J(t,x,y,µ;π(·)), we consider a stochastic process X^π_{t,x}(·) and a deterministic process Y^π_{t,y}(·) under the same strategy π(·) ∈ A_t^T, where Y^π_{t,y}(·) = E[X^π_{t,y}(·)]. This relationship is useful throughout this study. In the following, we derive the Bellman principle for the value function V_µ(t,x,y). Similar to the proof of Lemma 3.2, Chapter 4 in Yong and Zhou (1999), we obtain the following useful result.
Lemma 3.1.
Let Assumptions H1 and H2 hold. For any given 0 ≤ t ≤ s ≤ T, x, y ∈ R, and X^π_{t,x}(s) = ξ, X^π_{t,y}(s) = η ∈ L²(Ω), we have

˜J(s, ξ, E[η], µ; π(·)) = E[ µ(X^π_{t,x}(T) − Y^π_{t,y}(T))² − Y^π_{t,y}(T) | F(s) ]. (3.7)

Based on Lemma 3.1, we have the following Bellman principle for the value function V_µ(t,x,y). The proofs of Theorems 3.1 and 3.2 are given in Appendix A.

Theorem 3.1.
Let Assumptions H1 and H2 hold. For any given 0 ≤ t ≤ s ≤ T and x, y ∈ R, we have

V_µ(t,x,y) = inf_{π(·)∈A_t^s} E[V_µ(s, X^π_{t,x}(s), Y^π_{t,y}(s))]. (3.8)

Theorem 3.2.
Let Assumptions H1 and H2 hold. For any given 0 ≤ t ≤ T and x, y ∈ R with x ≠ y,

V_µ(t,x,y) = µ(x − y)² e^{2∫_t^T r(h)dh} − y e^{∫_t^T r(h)dh} − ∫_t^T (β(h)/(4µ)) dh (3.9)

is the classical solution of the following partial differential equation (PDE),

∂_t V_µ(t,x,y) = − inf_{π∈R^n} { ∂_x V_µ(t,x,y)[r(t)x + γ(t)π^⊤] + ∂_y V_µ(t,x,y)[r(t)y + γ(t)π^⊤] + (1/2) ∂_{xx} V_µ(t,x,y) πσ(t)σ(t)^⊤π^⊤ },

V_µ(T,x,y) = µ(x − y)² − y, (3.10)

where β(t) = γ(t)[σ(t)σ(t)^⊤]^{−1}γ(t)^⊤, and the related optimal strategy is

π*(t,x,y) = (1/(2µ)) γ(t)[σ(t)σ(t)^⊤]^{−1} e^{−∫_t^T r(h)dh}, (t,x,y) ∈ [0,T] × R × R.
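The pointwise minimization inside (3.10) can be sanity-checked numerically. In the sketch below (constant coefficients and two risky assets, all values illustrative assumptions), the candidate π* of Theorem 3.2 attains the value −β/(4µ) and beats random perturbations:

```python
import numpy as np

# Pointwise check of the minimization in HJB equation (3.10) at a fixed time.
# With V as in (3.9): d_x V + d_y V = -e^R and d_xx V = 2 mu e^{2R}, R = int_t^T r.
rng = np.random.default_rng(2)
mu, R = 2.0, 0.05
gamma = np.array([0.05, 0.03])                   # excess returns (assumed)
sig = np.array([[0.2, 0.0], [0.05, 0.25]])       # volatility matrix (assumed)
S = sig @ sig.T                                  # sigma sigma^T
beta = gamma @ np.linalg.solve(S, gamma)         # beta = gamma [sigma sigma^T]^{-1} gamma^T

def hamiltonian_part(pi):
    # the pi-dependent part: (d_xV + d_yV) gamma pi^T + (1/2) d_xxV pi S pi^T
    return -np.exp(R) * gamma @ pi + mu * np.exp(2 * R) * pi @ S @ pi

pi_star = np.linalg.solve(S, gamma) * np.exp(-R) / (2 * mu)   # strategy of Theorem 3.2
h_star = hamiltonian_part(pi_star)
assert abs(h_star + beta / (4 * mu)) < 1e-12                  # minimum equals -beta/(4 mu)
for _ in range(100):                                          # pi* beats random perturbations
    assert h_star <= hamiltonian_part(pi_star + 0.1 * rng.standard_normal(2))
```

The minimum value −β/(4µ) is exactly the integrand in the last term of (3.9), which is how the constant term of the value function arises.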
Remark 3.2. In Theorem 3.2, we obtain the explicit solution for V_µ(t,x,y) and the related optimal strategy π*(t,x,y) for x ≠ y. The question is how to obtain the optimal strategy π*(t,x,y) when x = y. Note that V_µ(t,x,y) and π*(t,x,y) are derived for x ≠ y and are continuous in (x,y) ∈ R². Thus, we extend the explicit solutions of V_µ(t,x,y) and π*(t,x,y) to the case x = y. In the following, we set

V_µ(t,x,x) = lim_{y→x} V_µ(t,x,y), π*(t,x,x) = lim_{y→x} π*(t,x,y).

Thus, the original problem J(t,x,µ;π(·)) is solved based on the Bellman-type time-consistent optimal strategy π*(t,x,x), x ∈ R. In other words, we can obtain a time-consistent optimal strategy for the original problem by extending the explicit solution of the Hamilton system (3.10).

Remark 3.3.
From Theorem 3.2, for a given (t,x,y) ∈ [0,T] × R × R, we can obtain the optimal strategy π*(t,x,y), which implies that the optimal strategy at (s, X^{π*}_{t,x}(s), Y^{π*}_{t,y}(s)) is

π*(s, X^{π*}_{t,x}(s), Y^{π*}_{t,y}(s)) = (1/(2µ)) γ(s)[σ(s)σ(s)^⊤]^{−1} e^{−∫_s^T r(h)dh},

which is independent of the initial state (x,y). Thus, we omit the variables (x,y) in π*(·), and the optimal strategy

π*(s) = (1/(2µ)) γ(s)[σ(s)σ(s)^⊤]^{−1} e^{−∫_s^T r(h)dh}, t ≤ s ≤ T,

does not change its value at any time s > max(t_1, t_2) for different initial times t_1, t_2 ≥ 0. Thus, we can see that π*(·) is a time-consistent dynamic optimal strategy. Notice that we have not yet shown how to determine the value of the risk aversion parameter µ. We need to use the mean level L in the constrained condition (3.3) to solve for µ; see Remark 3.5.

Based on Remark 3.2, letting y → x and combining the definition (3.6) of V_µ(t,x,y) with its explicit formulation (3.9), we set

V_µ(t,x,x) = lim_{y→x} V_µ(t,x,y) = inf_{π(·)∈A_t^T} { µ Var[X^π_{t,x}(T)] − E[X^π_{t,x}(T)] } = −x e^{∫_t^T r(h)dh} − ∫_t^T (β(h)/(4µ)) dh. (3.11)

Note that in the first term −x e^{∫_t^T r(h)dh} of the value function V_µ(t,x,x), the initial wealth x > 0, so V_µ(t,x,x) is decreasing in x ∈ (0,+∞), which indicates that a larger initial wealth brings a smaller objective cost. In general, we can assume a constant risk-free rate r > 0, which shows that the first term of the value function V_µ(t,x,x) is decreasing in the length of the investment time interval T − t. In the second term −∫_t^T (β(h)/(4µ)) dh, we have β(s) = γ(s)[σ(s)σ(s)^⊤]^{−1}γ(s)^⊤, s ∈ [t,T]. To clarify the effect of the second term, we consider a simple Black-Scholes setting, where r, b, σ are independent of time s ∈ [t,T] and b_i > r > 0, σ_{ij} = 0 for i ≠ j, σ_{ii} = σ_i > 0, 1 ≤ i, j ≤ n. Thus, we can obtain

−∫_t^T (β(h)/(4µ)) dh = ((t − T)/(4µ)) Σ_{i=1}^n ((b_i − r)/σ_i)²,

where µ is the risk aversion parameter of the investor and (b_i − r)/σ_i is the Sharpe ratio of the i'th risky asset. This formulation shows that the cost functional V_µ(t,x,x) is increasing in the risk aversion parameter µ and decreasing in the Sharpe ratios of the risky assets, which is consistent with the trade-off that higher return comes with higher risk. Note that (b_i − r)/σ_i > 0, 1 ≤ i ≤ n; therefore, the cost functional V_µ(t,x,x) is decreasing in the number of risky assets n, indicating that risk diversification reduces the cost. In addition, the second term is decreasing in the length of the investment time interval T − t, the same as the first term.

Note that the optimal strategy is given as follows:

π*(s) = (1/(2µ)) γ(s)[σ(s)σ(s)^⊤]^{−1} e^{−∫_s^T r(h)dh}, t ≤ s ≤ T.

In this Black-Scholes setting, we have

π*(s) = (e^{(s−T)r}/(2µ)) ( (b_1 − r)/σ_1², (b_2 − r)/σ_2², ..., (b_n − r)/σ_n² ), t ≤ s ≤ T. (3.12)

Thus, the investor invests an amount (e^{(s−T)r}/(2µ)) (b_i − r)/σ_i² into the i'th risky asset and an amount x − (e^{(s−T)r}/(2µ)) Σ_{i=1}^n (b_i − r)/σ_i² into the risk-free asset.
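The allocation (3.12) can be sketched numerically; the parameter values below (r, b_i, σ_i, µ, and the wealth x) are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Illustrative allocation under the Black-Scholes strategy (3.12).
r, mu, T = 0.03, 2.0, 1.0
b = np.array([0.08, 0.10])        # expected returns (assumed)
sigma = np.array([0.20, 0.30])    # volatilities (assumed)
x = 1.0                           # current wealth (assumed)

def risky_amounts(s):
    # e^{(s-T)r}/(2 mu) * (b_i - r)/sigma_i^2: amount in each risky asset at time s
    return np.exp((s - T) * r) / (2 * mu) * (b - r) / sigma**2

amount_0 = risky_amounts(0.0)
bond_0 = x - amount_0.sum()       # the remainder goes into the risk-free asset
assert np.all(amount_0 > 0)       # long positions since b_i > r
# the risky amounts grow as s approaches the terminal time T
assert np.all(risky_amounts(0.5) > amount_0)
```

The second assertion reflects the monotonicity discussed next: each risky amount increases as the remaining horizon T − s shrinks.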
From the formulation (3.12) of the optimal strategy π*(·), we can see that π*(·) is decreasing in the length of the remaining investment time interval T − s and decreasing in the risk aversion parameter µ, which shows that a risk averse investor with a large value of µ invests less money in the risky assets. In addition, each element (e^{(s−T)r}/(2µ)) (b_i − r)/σ_i², 1 ≤ i ≤ n, of π*(·) is decreasing in T − s, which indicates that the investor increases the amount invested in the risky assets as the holding time grows.

Remark 3.4.
We need to point out that the optimal strategy π*(·) is independent of the wealth state. This finding coincides with the results in Basak and Chabakauri (2010), in which the authors obtained an optimal strategy based on the game-theoretic approach. In fact, we may expect that an optimal strategy can depend on the wealth x. However, we can address this point by changing the value of the risk aversion parameter µ: we can determine the value of µ using the initial time t and wealth state x, and keep this risk aversion µ until the terminal time T. In addition, based on the formulation of the optimal cost functional

V_µ(t,x,x) = −x e^{∫_t^T r(h)dh} − ∫_t^T (β(h)/(4µ)) dh,

we can take a large value of the risk aversion µ and a large value of the initial wealth x to balance the cost functional V_µ(t,x,x). See also Björk et al. (2014, 2017) and Dai et al. (2019).

3.2 Efficient frontier

In this section, we derive the dynamic efficient frontier for E[X^{π*}_{t,x}(s)] and Var[X^{π*}_{t,x}(s)], t ≤ s ≤ T. Plugging the optimal strategy

π*(s) = (1/(2µ)) γ(s)[σ(s)σ(s)^⊤]^{−1} e^{−∫_s^T r(h)dh}, t ≤ s ≤ T,

into the wealth equation (3.1), we can obtain that E[X^{π*}_{t,x}(·)] and E[(X^{π*}_{t,x}(·))²] satisfy the following linear ordinary differential equations:

dE[X^{π*}_{t,x}(s)] = [ r(s)E[X^{π*}_{t,x}(s)] + (1/(2µ)) e^{−∫_s^T r(h)dh} β(s) ] ds, E[X^{π*}_{t,x}(t)] = x, t < s ≤ T, (3.13)

and

dE[(X^{π*}_{t,x}(s))²] = [ 2r(s)E[(X^{π*}_{t,x}(s))²] + (1/µ) E[X^{π*}_{t,x}(s)] e^{−∫_s^T r(h)dh} β(s) + (1/(4µ²)) e^{−2∫_s^T r(h)dh} β(s) ] ds, E[(X^{π*}_{t,x}(t))²] = x², t < s ≤ T.
(3.14)

By equation (3.13), we have

d(E[X^{π*}_{t,x}(s)])² = [ 2r(s)(E[X^{π*}_{t,x}(s)])² + (1/µ) E[X^{π*}_{t,x}(s)] e^{−∫_s^T r(h)dh} β(s) ] ds, (E[X^{π*}_{t,x}(t)])² = x², t < s ≤ T. (3.15)

Note that Var[X^{π*}_{t,x}(s)] = E[(X^{π*}_{t,x}(s))²] − (E[X^{π*}_{t,x}(s)])², t ≤ s ≤ T. Combining equations (3.14) and (3.15), it follows that

dVar[X^{π*}_{t,x}(s)] = [ 2r(s)Var[X^{π*}_{t,x}(s)] + (1/(4µ²)) e^{−2∫_s^T r(h)dh} β(s) ] ds, Var[X^{π*}_{t,x}(t)] = 0, t < s ≤ T. (3.16)

From equations (3.13) and (3.16), for t ≤ s ≤ T, we can obtain E[X^{π*}_{t,x}(s)] and Var[X^{π*}_{t,x}(s)] as follows:

E[X^{π*}_{t,x}(s)] = x e^{∫_t^s r(h)dh} + e^{−∫_s^T r(h)dh} ∫_t^s (β(h)/(2µ)) dh,

Var[X^{π*}_{t,x}(s)] = e^{−2∫_s^T r(h)dh} ∫_t^s (β(h)/(4µ²)) dh. (3.17)
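As a quick numerical check of (3.13), (3.16), and (3.17), the sketch below (constant assumed coefficients r and β, chosen for illustration) integrates the two ODEs by forward Euler and compares with the closed forms at s = T:

```python
import numpy as np

# Forward-Euler integration of the mean ODE (3.13) and the variance ODE (3.16)
# with constant assumed coefficients, compared with the closed forms (3.17).
r, beta, mu, x, t, T = 0.03, 0.06, 2.0, 1.0, 0.0, 1.0
n = 20_000
dt = (T - t) / n

mean, var, s = x, 0.0, t
for _ in range(n):
    disc = np.exp(-r * (T - s))                                # e^{-int_s^T r(h) dh}
    mean += (r * mean + beta * disc / (2 * mu)) * dt           # ODE (3.13)
    var += (2 * r * var + beta * disc**2 / (4 * mu**2)) * dt   # ODE (3.16)
    s += dt

mean_exact = x * np.exp(r * (T - t)) + beta * (T - t) / (2 * mu)   # (3.17) at s = T
var_exact = beta * (T - t) / (4 * mu**2)
assert abs(mean - mean_exact) < 1e-3 and abs(var - var_exact) < 1e-4
```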
Remark 3.5. Notice that we introduced the risk aversion coefficient µ in the cost functional (3.4). By equation (3.17), we can solve for µ from the constrained condition (3.3) as follows:

µ = ∫_t^T β(h)dh / ( 2( L − x e^{∫_t^T r(h)dh} ) ).

From equation (3.17), for t ≤ s ≤ T, the relationship between E[X^{π*}_{t,x}(s)] and Var[X^{π*}_{t,x}(s)] is given as follows:

Theorem 3.3.
Let Assumptions H1 and H2 hold. We have

Var[X^{π*}_{t,x}(s)] = ( E[X^{π*}_{t,x}(s)] − x e^{∫_t^s r(h)dh} )² / ∫_t^s β(h)dh, t ≤ s ≤ T, (3.18)

where β(h) = γ(h)[σ(h)σ(h)^⊤]^{−1}γ(h)^⊤, h ∈ [t,T].
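Remark 3.5 and identity (3.18) can be illustrated together; the sketch below uses constant assumed coefficients (all values illustrative), solves for µ from the mean constraint, and checks (3.18) along a grid of times s via the closed forms (3.17):

```python
import numpy as np

# Illustration of Remark 3.5 and identity (3.18) with constant assumed
# coefficients r and beta (values chosen for illustration only).
r, beta, x, t, T = 0.03, 0.06, 1.0, 0.0, 1.0
growth_T = x * np.exp(r * (T - t))
L = 1.10                                      # target mean level, L > x e^{int r}
mu = beta * (T - t) / (2 * (L - growth_T))    # mu from Remark 3.5

# plugging mu back into (3.17) at s = T recovers the mean constraint (3.3)
mean_T = growth_T + beta * (T - t) / (2 * mu)
assert abs(mean_T - L) < 1e-12

# identity (3.18) holds at every intermediate time s, using (3.17)
for s in np.linspace(0.1, T, 10):
    disc = np.exp(-r * (T - s))
    mean_s = x * np.exp(r * (s - t)) + disc * beta * (s - t) / (2 * mu)
    var_s = disc**2 * beta * (s - t) / (4 * mu**2)
    assert abs(var_s - (mean_s - x * np.exp(r * (s - t)))**2 / (beta * (s - t))) < 1e-12
```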
Remark 3.6. Based on equality (3.17), one obtains

∂_s E[X^{π*}_{t,x}(s)] = x r(s) e^{∫_t^s r(h)dh} + r(s) e^{−∫_s^T r(h)dh} ∫_t^s (β(h)/(2µ)) dh + (β(s)/(2µ)) e^{−∫_s^T r(h)dh} > 0

and

∂_s Var[X^{π*}_{t,x}(s)] = 2r(s) e^{−2∫_s^T r(h)dh} ∫_t^s (β(h)/(4µ²)) dh + (β(s)/(4µ²)) e^{−2∫_s^T r(h)dh} > 0.

Thus, E[X^{π*}_{t,x}(s)] and Var[X^{π*}_{t,x}(s)] are increasing in s ∈ [t,T]. Noting that E[X^{π*}_{t,x}(s)] ≥ x e^{∫_t^s r(h)dh}, s ∈ [t,T], Var[X^{π*}_{t,x}(s)] is increasing with E[X^{π*}_{t,x}(s)]. Furthermore, from formulation (3.18), we can see that the relationship between E[X^{π*}_{t,x}(s)] and Var[X^{π*}_{t,x}(s)] holds uniformly for s ∈ [t,T]. This formulation is useful for the investor to check the relation between the variance and the mean value at each time s ∈ [t,T].

4 Comparison with existing strategies

In this section, we compare our dynamic optimal strategy with the existing results: the pre-committed and game-theoretic strategies. We focus on the properties of the mean value, variance, optimal strategy, and efficient frontier.

4.1 Comparison with the pre-committed strategy

To solve the classical mean-variance model in the multi-period case, Li and Ng (2000) considered the discrete-time multi-period mean-variance problem within a multi-objective optimization framework by embedding the original problem into a stochastic linear-quadratic optimal control problem. Based on the same idea as Li and Ng (2000), Zhou and Li (2000) formulated the continuous-time mean-variance problem as a stochastic LQ optimal control problem. In contrast, Dybvig (1988) proposed a cost-efficient approach to solve the optimal portfolio selection in a straightforward manner. Bernard and Vanduffel (2014) studied the problem of mean-variance optimal portfolio in the presence of a benchmark by the cost-efficient approach.
See also Andersson and Djehiche (2011), Fischer and Livieri (2016), and Ismail and Pham (2019) for pre-committed strategies.

Using the same notation as in this study, we review the main results of Zhou and Li (2000). For the given initial time t and state x, the optimal pre-committed strategy π̄*(·) is given as follows:

π̄*(s) = γ(s)[σ(s)σ(s)^⊤]^{−1} [ λ e^{−∫_s^T r(h)dh} − X^{π̄*}_{t,x}(s) ], t ≤ s ≤ T, (4.1)

where

λ = (1/(2µ)) e^{∫_t^T β(h)dh} + x e^{∫_t^T r(h)dh}.

The related efficient frontier is given as follows:

Var[X^{π̄*}_{t,x}(T)] = ( E[X^{π̄*}_{t,x}(T)] − x e^{∫_t^T r(h)dh} )² / ( e^{∫_t^T β(h)dh} − 1 ), (4.2)

where

E[X^{π̄*}_{t,x}(s)] = x e^{∫_t^s [r(h) − β(h)]dh} + λ e^{−∫_s^T r(h)dh} [ 1 − e^{−∫_t^s β(h)dh} ], t ≤ s ≤ T,

and

E[X^{π̄*}_{t,x}(T)] = (1/(2µ)) ( e^{∫_t^T β(h)dh} − 1 ) + x e^{∫_t^T r(h)dh}.

In our model, by equality (3.17) in Subsection 3.2, we have

E[X^{π*}_{t,x}(s)] = x e^{∫_t^s r(h)dh} + e^{−∫_s^T r(h)dh} ∫_t^s (β(h)/(2µ)) dh,

with the dynamic optimal strategy

π*(s) = (1/(2µ)) γ(s)[σ(s)σ(s)^⊤]^{−1} e^{−∫_s^T r(h)dh}, t ≤ s ≤ T.

By formula (4.1), the value of the optimal pre-committed strategy π̄*(·) at the initial time t is given as follows:

π̄*(t) = (1/(2µ)) γ(t)[σ(t)σ(t)^⊤]^{−1} e^{∫_t^T [β(h) − r(h)]dh}.

We have that π*(t) < π̄*(t), where π*(t) < π̄*(t) means that each element of π*(t) is smaller than that of π̄*(t). This is because the optimal pre-committed strategy only cares about the mean and variance at the terminal time T, not over the entire investment time interval [t,T]. Thus, the optimal pre-committed strategy changes along with the initial time t. In contrast, our dynamic optimal strategy π*(·) minimizes the objective cost functional along the investment time interval [t,T]. Thus, when we provide the dynamic optimal strategy π*(·) at the initial time t, it does not change at the following times s ∈ [t,T].
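The elementwise inequality π*(t) < π̄*(t) can be checked numerically; the sketch below uses constant assumed coefficients for two risky assets (all values illustrative):

```python
import numpy as np

# Comparison of the two strategies at the initial time t: pre-committed
# pi_pre versus dynamic pi_dyn, under constant assumed coefficients.
mu, t, T = 2.0, 0.0, 1.0
r_int = 0.03 * (T - t)                            # int_t^T r(h) dh
gamma = np.array([0.05, 0.03])                    # excess returns (assumed)
sig = np.array([[0.2, 0.0], [0.05, 0.25]])        # volatility matrix (assumed)
S = sig @ sig.T
base = np.linalg.solve(S, gamma) / (2 * mu)       # gamma [sigma sigma^T]^{-1} / (2 mu)
beta_int = (gamma @ np.linalg.solve(S, gamma)) * (T - t)   # int_t^T beta(h) dh

pi_dyn = base * np.exp(-r_int)                    # dynamic strategy at t
pi_pre = base * np.exp(beta_int - r_int)          # pre-committed strategy (4.1) at t
assert np.all(pi_dyn < pi_pre)                    # since e^{int beta} > 1
```

The gap is driven entirely by the factor e^{∫β} > 1, i.e., by how much the pre-committed investor leans on terminal-time objectives alone.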
Furthermore, we have the following properties of the mean and variance under the optimal pre-committed strategy π̄*(·) and the dynamic optimal strategy π*(·). The proof of Proposition 4.1 is given in Appendix A.

Proposition 4.1.
For a given mean level L > x e^{∫_t^T r(h)dh} in the constrained condition (3.3), we have

Var[X^{π*}_{t,x}(T)] > Var[X^{π̄*}_{t,x}(T)]. (4.3)

For a given risk aversion parameter µ > 0, one obtains

Var[X^{π*}_{t,x}(T)] < Var[X^{π̄*}_{t,x}(T)], E[X^{π*}_{t,x}(T)] < E[X^{π̄*}_{t,x}(T)]. (4.4)
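Proposition 4.1 can be illustrated through the two efficient frontiers (3.18) and (4.2); the following sketch uses constant assumed coefficients (all values illustrative):

```python
import numpy as np

# Numerical illustration of Proposition 4.1 via the two efficient frontiers.
r, beta, x, mu, t, T = 0.03, 0.06, 1.0, 2.0, 0.0, 1.0
growth = x * np.exp(r * (T - t))
B = beta * (T - t)                                # int_t^T beta(h) dh

# (i) same mean level L: the dynamic variance exceeds the pre-committed variance
L = 1.10
var_dyn = (L - growth)**2 / B                     # frontier (3.18) at s = T
var_pre = (L - growth)**2 / (np.exp(B) - 1)       # Zhou-Li frontier (4.2)
assert var_dyn > var_pre                          # since B < e^B - 1 for B > 0

# (ii) same risk aversion mu: the dynamic strategy has smaller mean and variance
mean_dyn, var_dyn2 = growth + B / (2 * mu), B / (4 * mu**2)
mean_pre, var_pre2 = growth + (np.exp(B) - 1) / (2 * mu), (np.exp(B) - 1) / (4 * mu**2)
assert mean_dyn < mean_pre and var_dyn2 < var_pre2
```

Both inequalities reduce to the elementary fact B < e^B − 1 for B > 0, which is the analytic content of the proposition.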
Remark 4.1. For a given mean constrained value L > x e^{∫_t^T r(h)dh}, with E[X^{π*}_{t,x}(T)] = E[X^{π̄*}_{t,x}(T)] = L, comparing the purposes of the dynamic optimal strategy π*(·) and the optimal pre-committed strategy π̄*(·), we can see that the variance of the wealth X^{π*}_{t,x}(T) under the dynamic optimal strategy π*(·) is larger than the variance of the wealth X^{π̄*}_{t,x}(T) under the optimal pre-committed strategy π̄*(·). For a given risk aversion parameter µ > 0, the investor obtains a smaller mean value and variance at the terminal time T under the dynamic optimal strategy π*(·) than under the optimal pre-committed strategy π̄*(·). Furthermore, for the given terminal time T, from the formulas of E[X^{π*}_{t,x}(T)] and E[X^{π̄*}_{t,x}(T)], we can see that a larger value of the risk aversion parameter µ corresponds to a smaller value of the mean level L in the constrained condition (3.3).

4.2 Comparison with the game-theoretic strategy

Different from the pre-committed strategies, by considering an adjustment term, Basak and Chabakauri (2010) adopted a dynamic method to study the mean-variance model within a game-theoretic interpretation. In addition, based on the game-theoretic approach, Björk et al. (2014, 2017) studied the mean-variance problem with state dependent risk aversion.

Now, we introduce the results of Subsection 3.1 of Basak and Chabakauri (2010). We assume that there is one bond with a constant risk-free rate r and one risky asset. The risky asset satisfies the constant elasticity of variance (CEV) model:

dS(s)/S(s) = b ds + σ S^α(s) dW(s), t ≤ s ≤ T,

where r, b, σ, α are constants, b > r > 0, σ > 0. The optimal strategy π̄*(·) in Basak and Chabakauri (2010) is given as follows:

π̄*(s) = ( (b − r)/(2µσ²S^{2α}(s)) ) e^{−r(T−s)} − (1/(2µ)) ( (b − r)/(σS^{α}(s)) )² ( (e^{−2αr(T−s)} − 1)/r ) e^{−r(T−s)}, t ≤ s ≤ T. (4.5)

In a similar manner to Theorem 3.2, applying the results of Theorem 3.2 to the CEV model (see also Theorem 5.1), the dynamic optimal strategy is given as follows:

π*(s) = ( (b − r)/(2µσ²S^{2α}(s)) ) e^{−r(T−s)}, t ≤ s ≤ T. (4.6)

Remark 4.2.
Note that if $\alpha = 0$, the second term of the optimal strategy $\hat\pi^*(\cdot)$ is equal to $0$; thus, $\hat\pi^*(s) = \pi^*(s)$, $t \le s \le T$. This result demonstrates that the methodology developed in this study is a useful tool for establishing a dynamic optimal strategy for the classical mean-variance model. However, our method is different from that of Basak and Chabakauri (2010). Note that, for $t \le s \le T$, when $S(s) > 0$, we obtain $\hat\pi^*(s) > \pi^*(s)$ for $\alpha > 0$, and $\hat\pi^*(s) < \pi^*(s)$ for $\alpha < 0$. Compared with our dynamic optimal strategy $\pi^*(\cdot)$, the optimal strategy $\hat\pi^*(\cdot)$ suggests that the investor increases the amount invested in the risky asset when the volatility of the risky asset becomes large, and reduces it when the volatility becomes small. In contrast, our dynamic optimal strategy $\pi^*(\cdot)$ suggests that the investor increases the amount invested in the risky asset when the volatility becomes small, and reduces it when the volatility becomes large. These results indicate that our dynamic optimal strategy $\pi^*(\cdot)$ is better than the optimal strategy $\hat\pi^*(\cdot)$ derived from the game-theoretic approach.

In this section, we consider the following general setting for the bond and the risky assets. In the financial market, there are one risk-free bond asset and $n$ risky stock assets that are traded, and the bond satisfies the following equation:
\[
\frac{\mathrm{d}P_0(s)}{P_0(s)} = r(s,P_0(s))\,\mathrm{d}s, \quad P_0(t) = p_0, \ t < s \le T,
\]
and the $i$'th ($1 \le i \le n$) stock asset is described by
\[
\frac{\mathrm{d}P_i(s)}{P_i(s)} = b_i(s,P_i(s))\,\mathrm{d}s + \sum_{j=1}^{d} \sigma_{ij}(s,P_i(s))\,\mathrm{d}W_j(s), \quad P_i(t) = p_i, \ t < s \le T,
\]
where $\sigma(\cdot) = (\sigma_1(\cdot), \cdots, \sigma_n(\cdot))^\top \in \mathbb{R}^{n\times d}$ is the corresponding volatility matrix. Given initial capital $x > 0$, we set $\gamma(\cdot) = (\gamma_1(\cdot), \cdots, \gamma_n(\cdot)) \in \mathbb{R}^n$, where $\gamma_i(\cdot) = b_i(\cdot) - r(\cdot)$, $1 \le i \le n$.
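Before introducing the wealth equation, a minimal simulation sketch of this market may help fix ideas. Everything below (the coefficient functions, the dimensions $n = d = 2$, and all numerical values) is an illustrative assumption, not the paper's specification.

```python
import numpy as np

# Minimal sketch of the market above: one bond and n = 2 stocks driven by
# d = 2 Brownian motions (Euler-Maruyama discretization).
rng = np.random.default_rng(0)
T, n_steps = 1.0, 250
dt = T / n_steps

P0 = 1.0                                 # bond price
P = np.array([1.0, 1.2])                 # stock prices
r = lambda s, p0: 0.03                   # risk-free rate r(s, P0(s))
b = lambda s, p: np.array([0.08, 0.06])  # drifts b_i(s, P_i(s))
sigma = lambda s, p: np.array([[0.20, 0.05],
                               [0.00, 0.25]])  # volatility matrix in R^{n x d}

for k in range(n_steps):
    s = k * dt
    dW = rng.normal(0.0, np.sqrt(dt), 2)
    P0 += P0 * r(s, P0) * dt
    P = P + P * (b(s, P) * dt + sigma(s, P) @ dW)

gamma = b(T, P) - r(T, P0)               # excess returns gamma_i = b_i - r
Sigma = sigma(T, P) @ sigma(T, P).T
beta = gamma @ np.linalg.solve(Sigma, gamma)  # beta = gamma (sigma sigma^T)^{-1} gamma^T
assert beta > 0                          # guaranteed under Assumption H2
```

The quantity `beta` is the squared market price of risk that appears in the value function of Theorem 5.1; its positivity is exactly what Assumption H2 delivers.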
The investor's wealth $X^\pi_{t,x}(\cdot)$ satisfies
\[
\begin{cases}
\mathrm{d}X^\pi_{t,x}(s) = \big[r(s,P_0(s))X^\pi_{t,x}(s) + \gamma(s,P_0(s),P(s))\pi(s)^\top\big]\mathrm{d}s + \pi(s)\sigma(s,P(s))\,\mathrm{d}W(s), \\
X^\pi_{t,x}(t) = x, \quad t < s \le T,
\end{cases} \tag{5.1}
\]
where $\pi(\cdot) = (\pi_1(\cdot), \cdots, \pi_n(\cdot)) \in \mathbb{R}^n$ is the capital invested in the risky assets, $P(\cdot) = (P_1(\cdot), \cdots, P_n(\cdot)) \in \mathbb{R}^n$, and $\pi_0(\cdot)$ is the capital invested in the bond. We assume the following new Assumptions H1 and H2 for the above general setting.

H1: For $(t,z) \in [0,T]\times\mathbb{R}$, $r(t,z)z$, $b(t,z)z$ and $\sigma(t,z)z$ are deterministic continuous functions and satisfy Lipschitz conditions in $z$.

H2: $r(\cdot), \gamma(\cdot) > 0$ and $\sigma(\cdot)\sigma(\cdot)^\top > \delta I$, where $\delta > 0$ and $I$ is the identity matrix of $\mathbb{S}^n$, the set of symmetric $n\times n$ matrices.

Notice that Assumption H1 is used to guarantee the existence and uniqueness of $P_0(\cdot)$ and $P(\cdot)$. Meanwhile, we employ Assumption H2 to obtain the optimal strategy. The main result of this section is given as follows, and the proof is given in Appendix A.

Theorem 5.1.
Let Assumptions H1 and H2 hold. For any given $0 \le t \le T$ and $x, y \in \mathbb{R}$ with $x \neq y$,
\[
V_\mu(t,x,y) = \mu(x-y)^2 e^{2\int_t^T r(h,P_0(h))\mathrm{d}h} - y\,e^{\int_t^T r(h,P_0(h))\mathrm{d}h} - \int_t^T \frac{E[\beta(h)]}{4\mu}\mathrm{d}h \tag{5.2}
\]
is the classical solution of the following partial differential equation,
\[
\begin{aligned}
\partial_t V_\mu(t,x,y) = -\inf_{\pi\in\mathbb{R}^n}\Big\{ & \partial_x V_\mu(t,x,y)\big[r(t,P_0(t))x + \gamma(t,P_0(t),P(t))\pi^\top\big] \\
& + \partial_y V_\mu(t,x,y)\big[r(t,P_0(t))y + \gamma(t,P_0(t),P(t))\pi^\top\big] \\
& + \tfrac{1}{2}\partial_{xx} V_\mu(t,x,y)\,\pi\sigma(t,P(t))\sigma(t,P(t))^\top\pi^\top \Big\}, \\
V_\mu(T,x,y) = \mu(x-y)^2 - y, &
\end{aligned} \tag{5.3}
\]
where $\beta(t) = \gamma(t,P_0(t),P(t))[\sigma(t,P(t))\sigma(t,P(t))^\top]^{-1}\gamma(t,P_0(t),P(t))^\top$, and the related optimal strategy is
\[
\pi^*(t,x,y) = \frac{1}{2\mu}\,\gamma(t,P_0(t),P(t))[\sigma(t,P(t))\sigma(t,P(t))^\top]^{-1} e^{-\int_t^T r(h,P_0(h))\mathrm{d}h}.
\]

Remark 5.1.
Based on Remark 3.3 and Theorem 5.1, we can obtain the time-consistent dynamic optimal strategy
\[
\pi^*(s) = \frac{1}{2\mu}\,\gamma(s,P_0(s),P(s))[\sigma(s,P(s))\sigma(s,P(s))^\top]^{-1} e^{-\int_s^T r(h,P_0(h))\mathrm{d}h}, \quad t \le s \le T,
\]
which is independent of the state $(x,y)$, and the optimal value of the cost functional is given as follows:
\[
V_\mu(t,x,x) = \lim_{y\to x} V_\mu(t,x,y) = -x\,e^{\int_t^T r(h,P_0(h))\mathrm{d}h} - \int_t^T \frac{E[\beta(h)]}{4\mu}\mathrm{d}h,
\]
where the expectation $E[\cdot]$ is based on the information at time $t$. In general, we can consider the following objective value function:
\[
V_\mu(t,x,y) = \inf_{\pi(\cdot)\in\mathcal{A}_t^T} E\big[\Phi\big(X^\pi_{t,x}(T), E[X^\pi_{t,y}(T)]\big)\big],
\]
where $\Phi(x,y)$, $x,y\in\mathbb{R}$, is a nonlinear function of $(x,y)$. We can obtain a Hamilton-Jacobi-Bellman equation for the value function $V_\mu(t,x,y)$ with boundary condition $V_\mu(T,x,y) = \Phi(x,y)$.

To obtain a time-consistent dynamic optimal strategy for the classical continuous time mean-variance model, we take the view that the mean process $E[X^\pi_{t,x}(\cdot)]$ should be recognized as a deterministic process that differs from the wealth process $X^\pi_{t,x}(\cdot)$. Then, we consider the following objective cost functional:
\[
\tilde{J}(t,x,y,\mu;\pi(\cdot)) = \mu E\big[\big(X^\pi_{t,x}(T) - Y^\pi_{t,y}(T)\big)^2\big] - Y^\pi_{t,y}(T). \tag{6.1}
\]
From the cost functional (6.1), we can distinguish the wealth process $X^\pi_{t,x}(\cdot)$ and the mean process $Y^\pi_{t,y}(\cdot) = E[X^\pi_{t,y}(\cdot)]$ from the variance of the wealth. Based on this setting, we can derive a Hamilton-Jacobi-Bellman equation for the ternary value function $V_\mu(t,x,y)$. Our main results are given as follows:

• A new method is proposed to deal with the objective cost functional when it contains a nonlinear part of the mean process $E[X^\pi_{t,x}(\cdot)]$. This new method helps us to separate the nonlinear part of the mean process from the original objective cost functional.
• For the general setting, we obtain the explicit formula for the value function $V_\mu(t,x,y)$. The time-consistent dynamic optimal strategy is found and differs from the existing results.

• Furthermore, the time-consistent relation between the mean and variance of this mean-variance model is established.
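These closed-form expressions can be sanity-checked by simulation. The sketch below compares the terminal mean and variance of the wealth under the dynamic optimal strategy with the closed forms in the constant-coefficient, one-bond-one-stock case; all numerical values are illustrative assumptions.

```python
import numpy as np

# Monte Carlo check of the closed-form terminal mean and variance under the
# dynamic optimal strategy, constant coefficients (illustrative assumptions).
x0, r, b, sigma, mu, T = 1.0, 0.03, 0.08, 0.2, 1.0, 1.0
gamma = b - r
beta = gamma**2 / sigma**2           # beta = gamma (sigma sigma^T)^{-1} gamma^T

rng = np.random.default_rng(0)
n_paths, n_steps = 100_000, 200
dt = T / n_steps
X = np.full(n_paths, x0)
for k in range(n_steps):
    s = k * dt
    pi = gamma / (2 * mu * sigma**2) * np.exp(-r * (T - s))  # dynamic optimal strategy
    dW = rng.normal(0.0, np.sqrt(dt), n_paths)
    X = X + (r * X + gamma * pi) * dt + pi * sigma * dW      # Euler step of the wealth SDE

mean_cf = x0 * np.exp(r * T) + beta * T / (2 * mu)   # closed-form mean
var_cf = beta * T / (4 * mu**2)                      # closed-form variance
assert abs(X.mean() - mean_cf) < 5e-3
assert abs(X.var() - var_cf) < 1.5e-3
```

The assertions pass with loose Monte Carlo tolerances; tightening them requires more paths and smaller time steps.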
A The main proofs
Proof of Theorem 3.1.
Using the same techniques as in the proof of Theorem 3.3, Chapter 4 in Yong and Zhou (1999), we can prove these results. For the reader's convenience, we show the main steps of the proof. In the following, for any given $0 \le t \le s \le T$ and $x,y\in\mathbb{R}$, we set
\[
\tilde{V}_\mu(t,x,y) = \inf_{\pi(\cdot)\in\mathcal{A}_t^s} E\big[V_\mu(s, X^\pi_{t,x}(s), Y^\pi_{t,y}(s))\big].
\]
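Although the proof proceeds analytically, the identity $V_\mu(t,x,y) = \tilde{V}_\mu(t,x,y)$ can be sanity-checked by simulation in the constant-coefficient case, evaluating the right-hand side at the optimal strategy (where the infimum is attained). The sketch below uses exact Gaussian sampling of the optimally controlled wealth; all numerical values are illustrative assumptions.

```python
import numpy as np

# Numerical sanity check of V(t,x,y) = inf E[V(s, X(s), Y(s))] with constant
# coefficients, evaluated at the dynamic optimal strategy.
x0, y0, r, b, sigma, mu, T, s = 1.0, 0.9, 0.03, 0.08, 0.2, 1.0, 1.0, 0.4
gamma = b - r
beta = gamma**2 / sigma**2

def V(t, x, y):
    """Explicit value function of Theorem 3.2 with constant coefficients."""
    return (mu * (x - y)**2 * np.exp(2 * r * (T - t))
            - y * np.exp(r * (T - t)) - beta * (T - t) / (4 * mu))

# Under the (deterministic) optimal strategy the controlled wealth at time s
# is Gaussian, so it can be sampled exactly rather than by time-stepping:
rng = np.random.default_rng(1)
n = 200_000
disc = np.exp(-r * (T - s))
Xs = (x0 * np.exp(r * s) + beta / (2 * mu) * disc * s
      + np.sqrt(beta) / (2 * mu) * disc * np.sqrt(s) * rng.standard_normal(n))
Ys = y0 * np.exp(r * s) + beta / (2 * mu) * disc * s   # Y(s) = E[X^{t,y}(s)]

lhs = V(0.0, x0, y0)          # value at the initial time
rhs = np.mean(V(s, Xs, Ys))   # Bellman right-hand side at the optimum
assert abs(lhs - rhs) < 2e-3
```

The two sides agree up to Monte Carlo error, consistent with the Bellman principle being proved here.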
By the definition of the value function $V_\mu(t,x,y)$, for any given $\varepsilon > 0$, there exists a strategy $\tilde\pi(\cdot)$ (in the sense of the weak formulation; see Yong and Zhou (1999)) such that
\[
\begin{aligned}
V_\mu(t,x,y) + \varepsilon &\ge E\big[\mu\big(X^{\tilde\pi}_{t,x}(T) - Y^{\tilde\pi}_{t,y}(T)\big)^2 - Y^{\tilde\pi}_{t,y}(T)\big] \\
&= E\Big[E\big[\mu\big(X^{\tilde\pi}_{t,x}(T) - Y^{\tilde\pi}_{t,y}(T)\big)^2 - Y^{\tilde\pi}_{t,y}(T) \,\big|\, \mathcal{F}_s\big]\Big] \\
&= E\Big[E\big[\mu\big(X^{\tilde\pi}_{s,X^{\tilde\pi}_{t,x}(s)}(T) - Y^{\tilde\pi}_{s,Y^{\tilde\pi}_{t,y}(s)}(T)\big)^2 - Y^{\tilde\pi}_{s,Y^{\tilde\pi}_{t,y}(s)}(T) \,\big|\, \mathcal{F}_s\big]\Big] \\
&= E\big[\tilde{J}(s, X^{\tilde\pi}_{t,x}(s), Y^{\tilde\pi}_{t,y}(s), \mu; \tilde\pi(\cdot))\big] \\
&\ge E\big[V_\mu(s, X^{\tilde\pi}_{t,x}(s), Y^{\tilde\pi}_{t,y}(s))\big] \ge \tilde{V}_\mu(t,x,y). 
\end{aligned} \tag{A.1}
\]
The third equality of (A.1) is derived by Lemma 3.1. Conversely, for the given $\varepsilon > 0$, we prove $V_\mu(t,x,y) \le \tilde{V}_\mu(t,x,y) + \varepsilon$ in the following step. Based on Assumptions H1 and H2, there exists $\delta > 0$ such that whenever $|x - \bar{x}| + |y - \bar{y}| < \delta$, we have
\[
\big|\tilde{J}(t,x,y,\mu;\pi(\cdot)) - \tilde{J}(t,\bar{x},\bar{y},\mu;\pi(\cdot))\big| + \big|V_\mu(t,x,y) - V_\mu(t,\bar{x},\bar{y})\big| < \varepsilon.
\]
This inequality helps us find a strategy
\[
\hat\pi(h) = \begin{cases} \pi(h), & t \le h \le s, \\ \tilde\pi(h), & s < h \le T, \end{cases}
\]
where $\pi(\cdot) \in \mathcal{A}_t^s$ is any given strategy, such that
\[
\tilde{J}(s, X^\pi_{t,x}(s), Y^\pi_{t,y}(s), \mu; \tilde\pi(\cdot)) < V_\mu(s, X^\pi_{t,x}(s), Y^\pi_{t,y}(s)) + \varepsilon.
\]
Thus, for the strategy $\hat\pi(\cdot)$, we have
\[
\begin{aligned}
V_\mu(t,x,y) &\le E\big[\mu\big(X^{\hat\pi}_{t,x}(T) - Y^{\hat\pi}_{t,y}(T)\big)^2 - Y^{\hat\pi}_{t,y}(T)\big] \\
&= E\Big[E\big[\mu\big(X^{\hat\pi}_{t,x}(T) - Y^{\hat\pi}_{t,y}(T)\big)^2 - Y^{\hat\pi}_{t,y}(T) \,\big|\, \mathcal{F}_s\big]\Big] \\
&= E\Big[E\big[\mu\big(X^{\hat\pi}_{s,X^\pi_{t,x}(s)}(T) - Y^{\hat\pi}_{s,Y^\pi_{t,y}(s)}(T)\big)^2 - Y^{\hat\pi}_{s,Y^\pi_{t,y}(s)}(T) \,\big|\, \mathcal{F}_s\big]\Big] \\
&= E\big[\tilde{J}(s, X^\pi_{t,x}(s), Y^\pi_{t,y}(s), \mu; \hat\pi(\cdot))\big] \le E\big[V_\mu(s, X^\pi_{t,x}(s), Y^\pi_{t,y}(s))\big] + \varepsilon. 
\end{aligned} \tag{A.2}
\]
Since $\pi(\cdot) \in \mathcal{A}_t^s$ is an arbitrary strategy, we have
\[
V_\mu(t,x,y) \le \tilde{V}_\mu(t,x,y) + \varepsilon. \tag{A.3}
\]
Now, we combine (A.1) and (A.3) to obtain equation (3.8). This completes the proof. □

Proof of Theorem 3.2.
Note that, when $V_\mu(t,x,y) \in C^{1,2,2}([0,T]\times\mathbb{R}\times\mathbb{R})$, we have
\[
\begin{aligned}
0 &= \inf_{\pi(\cdot)\in\mathcal{A}_t^s} E\big[V_\mu(s, X^\pi_{t,x}(s), Y^\pi_{t,y}(s)) - V_\mu(t,x,y)\big] \\
&= \inf_{\pi(\cdot)\in\mathcal{A}_t^s} E\Big[\partial_t V_\mu(t,x,y)(s-t) + \partial_x V_\mu(t,x,y)(X^\pi_{t,x}(s) - x) + \tfrac{1}{2}\partial_{xx} V_\mu(t,x,y)(X^\pi_{t,x}(s) - x)^2 \\
&\qquad\qquad + \partial_y V_\mu(t,x,y)(Y^\pi_{t,y}(s) - y)\Big] + o(s-t) \\
&= \inf_{\pi(\cdot)\in\mathcal{A}_t^s} E\Big[\partial_t V_\mu(t,x,y)(s-t) + \partial_x V_\mu(t,x,y)(X^\pi_{t,x}(s) - x) + \tfrac{1}{2}\partial_{xx} V_\mu(t,x,y)(X^\pi_{t,x}(s) - x)^2 \\
&\qquad\qquad + \partial_y V_\mu(t,x,y)(X^\pi_{t,y}(s) - y)\Big] + o(s-t),
\end{aligned}
\]
where the last equality is derived from the equation $Y^\pi_{t,y}(s) = E[X^\pi_{t,y}(s)]$. Here $\partial_t V_\mu(\cdot,\cdot,\cdot)$ denotes the partial derivative in time, $\partial_x V_\mu(\cdot,\cdot,\cdot)$ and $\partial_y V_\mu(\cdot,\cdot,\cdot)$ denote the partial derivatives with respect to the first and second state variables of the value function $V_\mu(\cdot,\cdot,\cdot)$, respectively, and $\partial_{xx} V_\mu(\cdot,\cdot,\cdot)$ denotes the second-order partial derivative with respect to the first state $x$. Dividing both sides of this equation by $s-t$ and letting $s \to t$, one obtains
\[
\begin{aligned}
\partial_t V_\mu(t,x,y) = -\inf_{\pi\in\mathbb{R}^n}\Big\{ & \partial_x V_\mu(t,x,y)[r(t)x + \gamma(t)\pi^\top] + \partial_y V_\mu(t,x,y)[r(t)y + \gamma(t)\pi^\top] \\
& + \tfrac{1}{2}\partial_{xx} V_\mu(t,x,y)\,\pi\sigma(t)\sigma(t)^\top\pi^\top \Big\}, \\
V_\mu(T,x,y) = \mu(x-y)^2 - y, & \quad 0 \le t \le T.
\end{aligned} \tag{A.4}
\]
In the first step, we assume $\partial_{xx} V_\mu(t,x,y) > 0$; thus, the optimal strategy at time $t$ satisfies
\[
\pi^*(t,x,y) = -\gamma(t)[\sigma(t)\sigma(t)^\top]^{-1}\,\frac{\partial_x V_\mu(t,x,y) + \partial_y V_\mu(t,x,y)}{\partial_{xx} V_\mu(t,x,y)},
\]
which deduces that
\[
\partial_t V_\mu(t,x,y) + \partial_x V_\mu(t,x,y)\, r(t)x + \partial_y V_\mu(t,x,y)\, r(t)y = \frac{\beta(t)\big[\partial_x V_\mu(t,x,y) + \partial_y V_\mu(t,x,y)\big]^2}{2\,\partial_{xx} V_\mu(t,x,y)}, \tag{A.5}
\]
where $\beta(t) = \gamma(t)[\sigma(t)\sigma(t)^\top]^{-1}\gamma(t)^\top$. In the second step, we assume the solution to equation (A.5) is given as follows:
\[
V_\mu(t,x,y) = A(t)(x-y)^2 + B(t)y + C(t), \tag{A.6}
\]
where $A(\cdot), B(\cdot), C(\cdot)$ are continuously differentiable functions on $[0,T]$ with $A(T) = \mu$, $B(T) = -1$, $C(T) = 0$.
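Before carrying out the substitution by hand, one can verify symbolically that the explicit functions $A, B, C$ given in (A.9) satisfy the relation (A.5) under the ansatz (A.6), together with the terminal condition. The sketch below assumes constant coefficients $r$ and $\beta$, an assumption made purely for this check.

```python
import sympy as sp

# Symbolic check: the ansatz V = A(t)(x-y)^2 + B(t)y + C(t), with A, B, C as
# in (A.9), solves the relation (A.5).  Constant r and beta are assumed.
t, T, x, y, r, beta, mu = sp.symbols('t T x y r beta mu', positive=True)
A = mu * sp.exp(2 * r * (T - t))
B = -sp.exp(r * (T - t))
C = -beta * (T - t) / (4 * mu)
V = A * (x - y)**2 + B * y + C

lhs = sp.diff(V, t) + sp.diff(V, x) * r * x + sp.diff(V, y) * r * y
rhs = beta * (sp.diff(V, x) + sp.diff(V, y))**2 / (2 * sp.diff(V, x, 2))
assert sp.simplify(lhs - rhs) == 0                             # relation (A.5)
assert sp.simplify(V.subs(t, T) - (mu * (x - y)**2 - y)) == 0  # terminal condition
```

Both assertions reduce to zero symbolically, which is the same computation done by hand in the remainder of the proof.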
We plug the representation (A.6) of $V_\mu(t,x,y)$ into equation (A.5):
\[
A'(t)(x-y)^2 + B'(t)y + C'(t) + 2A(t)r(t)(x-y)x + 2A(t)r(t)(y-x)y + B(t)r(t)y = \frac{\beta(t)\big[2A(t)(x-y) + 2A(t)(y-x) + B(t)\big]^2}{4A(t)}, \tag{A.7}
\]
then,
\[
[A'(t) + 2A(t)r(t)](x-y)^2 + [B'(t) + B(t)r(t)]y + C'(t) = \frac{\beta(t)B(t)^2}{4A(t)}.
\]
Thus, we obtain the equations for $A(\cdot), B(\cdot), C(\cdot)$:
\[
\begin{aligned}
& A'(t) + 2A(t)r(t) = 0, \quad A(T) = \mu, \quad 0 \le t \le T; \\
& B'(t) + B(t)r(t) = 0, \quad B(T) = -1, \quad 0 \le t \le T; \\
& C'(t) = \frac{\beta(t)B(t)^2}{4A(t)}, \quad C(T) = 0, \quad 0 \le t \le T.
\end{aligned} \tag{A.8}
\]
The solution to equation (A.8) is given as follows:
\[
\begin{aligned}
& A(t) = \mu\, e^{2\int_t^T r(h)\mathrm{d}h}, \quad 0 \le t \le T; \\
& B(t) = -e^{\int_t^T r(h)\mathrm{d}h}, \quad 0 \le t \le T; \\
& C(t) = -\int_t^T \frac{\beta(h)}{4\mu}\mathrm{d}h, \quad 0 \le t \le T.
\end{aligned} \tag{A.9}
\]
Therefore, we have, for $x \neq y$,
\[
V_\mu(t,x,y) = \mu(x-y)^2 e^{2\int_t^T r(h)\mathrm{d}h} - y\,e^{\int_t^T r(h)\mathrm{d}h} - \int_t^T \frac{\beta(h)}{4\mu}\mathrm{d}h. \tag{A.10}
\]
Notice that the risk aversion parameter $\mu > 0$; thus, $\partial_{xx} V_\mu(t,x,y) = 2A(t) > 0$, and the optimal strategy is
\[
\pi^*(t,x,y) = \frac{1}{2\mu}\,\gamma(t)[\sigma(t)\sigma(t)^\top]^{-1} e^{-\int_t^T r(h)\mathrm{d}h}, \quad 0 \le t \le T. \tag{A.11}
\]
Now, one can check that the formula (A.10) defines $V_\mu(t,x,y) \in C^{1,2,2}([0,T]\times\mathbb{R}\times\mathbb{R})$, which is a classical solution to (A.4). Employing the uniqueness result of Theorem 6.1, Chapter 4 in Yong and Zhou (1999), we have that $V_\mu(t,x,y)$ in equation (A.10) is the unique classical solution of PDE (A.4). This completes the proof. □

Proof of Proposition 4.1.
For a given mean level $L > x e^{\int_t^T r(h)\mathrm{d}h}$ in the constrained condition (3.3), the optimal strategies $\pi^*(\cdot)$ and $\bar\pi^*(\cdot)$ satisfy $E[X^{\pi^*}_{t,x}(T)] = E[X^{\bar\pi^*}_{t,x}(T)] = L$. By formulas (3.18) and (4.2), we have
\[
\mathrm{Var}[X^{\pi^*}_{t,x}(T)] = \frac{\big(L - x e^{\int_t^T r(h)\mathrm{d}h}\big)^2}{\int_t^T \beta(h)\mathrm{d}h}, \qquad
\mathrm{Var}[X^{\bar\pi^*}_{t,x}(T)] = \frac{\big(L - x e^{\int_t^T r(h)\mathrm{d}h}\big)^2}{e^{\int_t^T \beta(h)\mathrm{d}h} - 1}.
\]
By Assumption H2, we have $\beta(s) > 0$, $t \le s \le T$, and
\[
\int_t^T \beta(h)\mathrm{d}h < e^{\int_t^T \beta(h)\mathrm{d}h} - 1.
\]
Therefore, one obtains $\mathrm{Var}[X^{\pi^*}_{t,x}(T)] > \mathrm{Var}[X^{\bar\pi^*}_{t,x}(T)]$.

For a given risk aversion parameter $\mu > 0$, we have
\[
E[X^{\pi^*}_{t,x}(T)] = x e^{\int_t^T r(h)\mathrm{d}h} + \int_t^T \frac{\beta(h)}{2\mu}\mathrm{d}h, \qquad
E[X^{\bar\pi^*}_{t,x}(T)] = x e^{\int_t^T r(h)\mathrm{d}h} + \frac{1}{2\mu}\Big(e^{\int_t^T \beta(h)\mathrm{d}h} - 1\Big).
\]
From $\beta(s) > 0$, $t \le s \le T$, it follows that
\[
\int_t^T \frac{\beta(h)}{2\mu}\mathrm{d}h < \frac{1}{2\mu}\Big(e^{\int_t^T \beta(h)\mathrm{d}h} - 1\Big),
\]
which implies that $x e^{\int_t^T r(h)\mathrm{d}h} < E[X^{\pi^*}_{t,x}(T)] < E[X^{\bar\pi^*}_{t,x}(T)]$. Again, by formulas (3.18) and (4.2), we have
\[
\mathrm{Var}[X^{\pi^*}_{t,x}(T)] = \frac{\int_t^T \beta(h)\mathrm{d}h}{4\mu^2} < \frac{e^{\int_t^T \beta(h)\mathrm{d}h} - 1}{4\mu^2} = \mathrm{Var}[X^{\bar\pi^*}_{t,x}(T)].
\]
Therefore,
\[
\mathrm{Var}[X^{\pi^*}_{t,x}(T)] < \mathrm{Var}[X^{\bar\pi^*}_{t,x}(T)], \qquad E[X^{\pi^*}_{t,x}(T)] < E[X^{\bar\pi^*}_{t,x}(T)]. \tag{A.12}
\]
This completes the proof. □

Proof of Theorem 5.1.
The proof of this theorem is similar to that of Theorem 3.2. For the reader's convenience, we show the details. Let $0 \le t \le s \le T$ and $x,y \in \mathbb{R}$ be given. Using the technique in the proof of Theorem 3.1, we can obtain that
\[
V_\mu(t,x,y) = \inf_{\pi(\cdot)\in\mathcal{A}_t^s} E\big[V_\mu(s, X^\pi_{t,x}(s), Y^\pi_{t,y}(s))\big]. \tag{A.13}
\]
In the following, we assume $V_\mu(t,x,y) \in C^{1,2,2}([0,T]\times\mathbb{R}\times\mathbb{R})$. Applying Itô's formula to $V_\mu(s, X^\pi_{t,x}(s), Y^\pi_{t,y}(s))$ and using equation (A.13), it follows that
\[
\begin{aligned}
0 = \inf_{\pi(\cdot)\in\mathcal{A}_t^s} E\Big[ & \partial_t V_\mu(t,x,y)(s-t) + \partial_x V_\mu(t,x,y)(X^\pi_{t,x}(s)-x) + \tfrac{1}{2}\partial_{xx} V_\mu(t,x,y)(X^\pi_{t,x}(s)-x)^2 \\
& + \partial_y V_\mu(t,x,y)(X^\pi_{t,y}(s)-y)\Big] + o(s-t). 
\end{aligned} \tag{A.14}
\]
Dividing both sides of equation (A.14) by $s-t$ and letting $s \to t$, we have
\[
\begin{aligned}
\partial_t V_\mu(t,x,y) = -\inf_{\pi\in\mathbb{R}^n}\Big\{ & \partial_x V_\mu(t,x,y)\big[r(t,P_0(t))x + \gamma(t,P_0(t),P(t))\pi^\top\big] \\
& + \partial_y V_\mu(t,x,y)\big[r(t,P_0(t))y + \gamma(t,P_0(t),P(t))\pi^\top\big] \\
& + \tfrac{1}{2}\partial_{xx} V_\mu(t,x,y)\,\pi\sigma(t,P(t))\sigma(t,P(t))^\top\pi^\top \Big\}, \\
V_\mu(T,x,y) = \mu(x-y)^2 - y, & \quad 0 \le t \le T.
\end{aligned} \tag{A.15}
\]
In addition, we assume $\partial_{xx} V_\mu(t,x,y) > 0$; thus, the optimal strategy at time $t$ satisfies
\[
\pi^*(t,x,y) = -\gamma(t,P_0(t),P(t))[\sigma(t,P(t))\sigma(t,P(t))^\top]^{-1}\,\frac{\partial_x V_\mu(t,x,y) + \partial_y V_\mu(t,x,y)}{\partial_{xx} V_\mu(t,x,y)},
\]
and
\[
\partial_t V_\mu(t,x,y) + \partial_x V_\mu(t,x,y)\, r(t,P_0(t))x + \partial_y V_\mu(t,x,y)\, r(t,P_0(t))y = \frac{\beta(t)\big[\partial_x V_\mu(t,x,y) + \partial_y V_\mu(t,x,y)\big]^2}{2\,\partial_{xx} V_\mu(t,x,y)}, \tag{A.16}
\]
where $\beta(t) = \gamma(t,P_0(t),P(t))[\sigma(t,P(t))\sigma(t,P(t))^\top]^{-1}\gamma(t,P_0(t),P(t))^\top$. In the following, we assume the solution to equation (A.16) is given as follows:
\[
V_\mu(t,x,y) = A(t)(x-y)^2 + B(t)y + C(t), \tag{A.17}
\]
where $A(\cdot), B(\cdot), C(\cdot)$ are continuously differentiable functions on $[0,T]$ with $A(T)=\mu$, $B(T)=-1$, $C(T)=0$. We plug the representation (A.17) of $V_\mu(t,x,y)$ into equation (A.16). Then, we can obtain the equations for $A(\cdot), B(\cdot), C(\cdot)$:
\[
\begin{aligned}
& A'(t) + 2A(t)r(t,P_0(t)) = 0, \quad A(T) = \mu, \quad 0 \le t \le T; \\
& B'(t) + B(t)r(t,P_0(t)) = 0, \quad B(T) = -1, \quad 0 \le t \le T; \\
& C'(t) = \frac{\beta(t)B(t)^2}{4A(t)}, \quad C(T) = 0, \quad 0 \le t \le T.
\end{aligned} \tag{A.18}
\]
Notice that, for $s > t$, $\beta(s)$ is a random variable. To find an adapted solution for $C(\cdot)$, we take the expectation $E[\cdot]$ on both sides of the third equation of (A.18), where the expectation $E[\cdot]$ is based on the information at time $t$. The solution to equation (A.18) is given as follows:
\[
\begin{aligned}
& A(t) = \mu\, e^{2\int_t^T r(h,P_0(h))\mathrm{d}h}, \quad 0 \le t \le T; \\
& B(t) = -e^{\int_t^T r(h,P_0(h))\mathrm{d}h}, \quad 0 \le t \le T; \\
& C(t) = -\int_t^T \frac{E[\beta(h)]}{4\mu}\mathrm{d}h, \quad 0 \le t \le T.
\end{aligned} \tag{A.19}
\]
Therefore, we have
\[
V_\mu(t,x,y) = \mu(x-y)^2 e^{2\int_t^T r(h,P_0(h))\mathrm{d}h} - y\,e^{\int_t^T r(h,P_0(h))\mathrm{d}h} - \int_t^T \frac{E[\beta(h)]}{4\mu}\mathrm{d}h. \tag{A.20}
\]
Notice that the risk aversion parameter $\mu > 0$; thus, $\partial_{xx} V_\mu(t,x,y) > 0$. The optimal strategy is given as follows:
\[
\pi^*(t,x,y) = \frac{1}{2\mu}\,\gamma(t,P_0(t),P(t))[\sigma(t,P(t))\sigma(t,P(t))^\top]^{-1} e^{-\int_t^T r(h,P_0(h))\mathrm{d}h}, \quad 0 \le t \le T. \tag{A.21}
\]
The rest of the proof is the same as that of Theorem 3.2 and is omitted. This completes the proof. □

References

D. Andersson and B. Djehiche. A maximum principle for SDEs of mean-field type.
Appl. Math. Optim., 63:341–356, 2011.

I. Bajeux-Besnainou and R. Portait. Dynamic asset allocation in a mean-variance framework. Management Science, 11:79–95, 1998.

S. Basak and G. Chabakauri. Dynamic mean-variance asset allocation. Review of Financial Studies, 23:2970–3016, 2010.

A. Bensoussan, K. Sung, and S. C. P. Yam. Linear-quadratic time-inconsistent mean field games. Dyn. Games Appl., 3:537–552, 2013.

A. Bensoussan, K. Sung, S. C. P. Yam, and S. P. Yung. Linear-quadratic mean field games. Journal of Optimization Theory and Applications, 169:496–529, 2016.

C. Bernard and S. Vanduffel. Mean-variance optimal portfolios in the presence of a benchmark with applications to fraud detection. European Journal of Operational Research, 234:469–480, 2014.

T. R. Bielecki, H. Q. Jin, S. Pliska, and X. Y. Zhou. Continuous time mean-variance portfolio selection with bankruptcy prohibition. Mathematical Finance, 15:213–244, 2005.

T. Björk, A. Murgoci, and X. Y. Zhou. Mean-variance portfolio optimization with state-dependent risk aversion. Mathematical Finance, 24:1–24, 2014.

T. Björk, M. Khapko, and A. Murgoci. On time-inconsistent stochastic control in continuous time. Finance and Stochastics, 21:331–360, 2017.

R. Buckdahn, B. Djehiche, and J. Li. A general stochastic maximum principle for SDEs of mean-field type. Appl. Math. Optim., 64:197–216, 2011.

M. Dai, Z. Q. Xu, and X. Y. Zhou. Continuous-time Markowitz model with transaction costs. SIAM Journal on Financial Mathematics, 1:96–125, 2010.

M. Dai, H. Jin, S. Kou, and Y. Xu. A dynamic mean-variance analysis for log returns. Accepted by Management Science, https://ssrn.com/abstract=.

P. H. Dybvig. Inefficient dynamic portfolio strategies or how to throw away a million dollars in the stock market. The Review of Financial Studies, 1:67–88, 1988.

M. Fischer and G. Livieri. Continuous time mean-variance portfolio optimization through the mean field approach. ESAIM: Probability and Statistics, 20:30–44, 2016.

Y. Hu, H. Jin, and X. Y. Zhou. Time-inconsistent stochastic linear-quadratic control. SIAM Journal on Control and Optimization, 50:1548–1572, 2012.

M. Huang, P. E. Caines, and R. P. Malhamé. The Nash certainty equivalence principle and McKean-Vlasov systems: An invariance principle and entry adaptation. Proceedings of the 46th IEEE Conference on Decision and Control, pages 121–126, 2007.

A. Ismail and H. Pham. Robust Markowitz mean-variance portfolio selection under ambiguous covariance matrix. Mathematical Finance, 29:174–207, 2019.

D. Li and W. L. Ng. Optimal dynamic portfolio selection: Multi-period mean-variance formulation. Mathematical Finance, 10:387–406, 2000.

J. Li. Stochastic maximum principle in the mean-field controls. Automatica, 48:366–373, 2012.

A. E. B. Lim. Quadratic hedging and mean-variance portfolio selection with random parameters in an incomplete market. Mathematics of Operations Research, 29:132–161, 2004.

A. E. B. Lim and X. Y. Zhou. Quadratic hedging and mean-variance portfolio selection with random parameters in a complete market. Mathematics of Operations Research, 1:101–120, 2002.

H. Markowitz. Portfolio selection. Journal of Finance, 7:77–91, 1952.

H. Markowitz. Portfolio Selection: Efficient Diversification of Investments. John Wiley & Sons, New York, 1959.

R. C. Merton. An analytic derivation of the efficient frontier. J. Finance Quant. Anal., 7:1851–1872, 1972.

H. Pham and X. Wei. Dynamic programming for optimal control of stochastic McKean-Vlasov dynamics. SIAM Journal on Control and Optimization, 55:1069–1101, 2017.

H. Pham and X. Wei. Bellman equation and viscosity solutions for mean-field stochastic control problem. ESAIM: Control, Optimisation and Calculus of Variations, 24:437–461, 2018.

H. R. Richardson. A minimum variance result in continuous trading portfolio optimization. Management Science, 9:1045–1055, 1989.

J. M. Xia. Mean-variance portfolio choice: Quadratic partial hedging. Mathematical Finance, 15:533–538, 2005.

J. Yong. Time-inconsistent optimal control problems and the equilibrium HJB equation. Mathematical Control and Related Fields, 2:271–329, 2012.

J. Yong and X. Y. Zhou. Stochastic Control: Hamiltonian Systems and HJB Equations. Springer, New York, 1999.

X. Y. Zhou and D. Li. Continuous-time mean-variance portfolio selection: A stochastic LQ framework. Applied Mathematics and Optimization, 42:19–33, 2000.