Sparse High-Order Portfolios via Proximal DCA and SCA
Jinxin Wang†, Zengde Deng‡, Taoli Zheng†, and Anthony Man-Cho So†
† Department of Systems Engineering and Engineering Management, CUHK, Hong Kong SAR
‡ Cainiao Network, Alibaba Group, Hangzhou, China
ABSTRACT
In this paper, we aim to solve the cardinality-constrained high-order portfolio optimization problem, i.e., the mean-variance-skewness-kurtosis model with a cardinality constraint (MVSKC). Optimizing the MVSKC model is difficult for two reasons: the objective function is non-convex, and the cardinality constraint is combinatorial in nature, which makes it both non-convex and discontinuous. Since the cardinality constraint has the difference-of-convex (DC) property, we transform it into a penalty term and then propose three algorithms, namely the proximal difference-of-convex algorithm (pDCA), pDCA with extrapolation (pDCAe), and the successive convex approximation (SCA), to handle the resulting penalized MVSK (PMVSK) formulation. Moreover, theoretical convergence results are established for each algorithm. Numerical experiments on real datasets demonstrate the superiority of the proposed methods in obtaining high-utility, sparse solutions, as well as their efficiency in terms of running time.
Index Terms — High-order portfolios, cardinality constraint, difference of convex, successive convex approximation
1. INTRODUCTION
Asset allocation is a fundamental and challenging task for investors. One significant advance on this issue in modern portfolio theory is the Markowitz mean-variance (MV) portfolio framework [1]. In the MV framework, investors aim to maximize their expected profit (i.e., the mean return rate, or first moment) and minimize the corresponding risk (i.e., the variance of the portfolio return, or second moment). Due to transaction costs, budget constraints, or even mental costs, investors may pick only a small number of assets out of all candidates, resulting in sparse portfolio optimization [2]. Several optimization methods have been developed for the MV framework with a cardinality constraint [3-5].

The MV framework is based on the assumption that returns follow a Gaussian distribution or that investors have a quadratic utility [6]. However, in real financial markets, the Gaussian assumption is seldom satisfied because asset returns are usually asymmetric and heavy-tailed [7, 8]. To overcome these drawbacks, several studies suggest taking higher-order moments into consideration, since the third and fourth moments (i.e., skewness and kurtosis) capture asymmetry and heavy-tailedness well [9-11]. Consequently, they propose the MVSK framework, which maximizes the mean and skewness (odd moments) while minimizing the variance and kurtosis (even moments). Assuming that investors have constant relative risk aversion (CRRA) [12] preferences, the MVSK model is the fourth-order Taylor expansion of the expected utility. Indeed, the MVSK framework is more accurate than the MV framework, at the expense of non-convexity arising from the high-order moments, which gives rise to optimization challenges.

Recently, with developments in the optimization field, several algorithms have been proposed to solve MVSK-related problems. A difference-of-convex algorithm (DCA) is developed in [13] for the MVSK framework.
A further improvement based on the difference-of-convex-sums-of-squares (DC-SOS) decomposition is customized in [14]. Quite recently, to approximate the original non-convex objective function more tightly, two algorithms based on classical MM and SCA were proposed [15]. However, to the best of our knowledge, due to its great difficulty, the MVSKC framework has not received much attention yet. One exception is [16], which considers a bi-objective optimization based on the trade-off between expected utility and cardinality. It is shown that there are gains in terms of out-of-sample certainty equivalent and Sharpe ratio for certain cardinality levels. However, they directly apply a derivative-free solver based on direct multi-search [17], which may be cumbersome and extremely sensitive to the initial point.

In this paper, based on certain available structures of the MVSK problem, we integrate the cardinality constraint into the objective function via a penalty technique, leading to the PMVSK problem. Further, we present several approaches to solve the PMVSK problem, including pDCA, pDCAe, and SCA. Convergence of the proposed algorithms to a stationary point is established. In addition to the theoretical guarantees, our methods obtain sparse solutions with better utility. In terms of computational efficiency, pDCAe and SCA empirically require less time, which makes them favorable for large-scale problems.

2. PROBLEM FORMULATION

Suppose the returns of $N$ assets are $\tilde{r} \in \mathbb{R}^N$ and $w \in \mathbb{R}^N$ is the vector of portfolio weights. The expected return of this portfolio, i.e., the first moment, is $\phi_1(w) = \mathbb{E}(w^\top \tilde{r}) = w^\top \mu$, where $\mu = \mathbb{E}(\tilde{r})$ is the mean return vector. We denote by $r = \tilde{r} - \mu$ the centered returns. The $q$-th central moment of the portfolio return is $\mathbb{E}[(w^\top r)^q]$, from which we have the following:

- The second moment, a.k.a. variance, of the portfolio return is $\phi_2(w) = \mathbb{E}[w^\top r r^\top w] = w^\top \Sigma w$, where $\Sigma = \mathbb{E}[r r^\top]$ is the covariance matrix.
- The third moment, a.k.a. skewness, of the portfolio return is $\phi_3(w) = \mathbb{E}[(w^\top r)^3] = w^\top \Phi (w \otimes w)$, where $\otimes$ is the Kronecker product and $\Phi = \mathbb{E}(r (r^\top \otimes r^\top)) \in \mathbb{R}^{N \times N^2}$ is the co-skewness matrix.
- The fourth moment, a.k.a. kurtosis, of the portfolio return is $\phi_4(w) = \mathbb{E}[(w^\top r)^4] = w^\top \Psi (w \otimes w \otimes w)$, where $\Psi = \mathbb{E}[r (r^\top \otimes r^\top \otimes r^\top)] \in \mathbb{R}^{N \times N^3}$ is the co-kurtosis matrix.

Based on the above definitions, the MVSKC problem [16] is given as follows:
$$\min_{w} \; f(w) = -\lambda_1 \phi_1(w) + \lambda_2 \phi_2(w) - \lambda_3 \phi_3(w) + \lambda_4 \phi_4(w) \quad \text{s.t.} \quad \mathbf{1}^\top w = 1, \; \|w\|_0 \le k, \; -\alpha\mathbf{1} \le w \le \alpha\mathbf{1}, \tag{1}$$
where $\lambda_1, \lambda_2, \lambda_3, \lambda_4 > 0$ are parameters balancing the four moments of the portfolio return, $\|w\|_0$ is the number of non-zero elements of $w$, $k < N$ is an integer controlling the number of assets to be selected, and $\alpha > 0$ bounds each element of $w$ to avoid overconcentration in a few assets.

It is easy to verify that $f$ is non-convex and twice continuously differentiable. The gradient and Hessian of each term in $f$ are given in the following lemma.

Lemma 1. [15, Lemma 1] The gradients and Hessians of the mean, variance, skewness, and kurtosis are as follows:
$$\nabla \phi_1(w) = \mu, \quad \nabla \phi_2(w) = 2\Sigma w, \quad \nabla \phi_3(w) = 3\Phi(w \otimes w), \quad \nabla \phi_4(w) = 4\Psi(w \otimes w \otimes w),$$
$$\nabla^2 \phi_1(w) = 0, \quad \nabla^2 \phi_2(w) = 2\Sigma, \quad \nabla^2 \phi_3(w) = 6\Phi(I \otimes w), \quad \nabla^2 \phi_4(w) = 12\Psi(I \otimes w \otimes w),$$
where $I \in \mathbb{R}^{N \times N}$ is the identity matrix.

In addition, we can decompose the objective function $f$ into the convex part $f_{\mathrm{cvx}}(w) = -\lambda_1 \phi_1(w) + \lambda_2 \phi_2(w)$ and the non-convex part $f_{\mathrm{ncvx}}(w) = -\lambda_3 \phi_3(w) + \lambda_4 \phi_4(w)$. One important observation is that there exists an upper bound on the spectral radius of $\nabla^2 f_{\mathrm{ncvx}}(w)$, i.e., $\rho(\nabla^2 f_{\mathrm{ncvx}}(w))$, denoted $\tau_{\mathrm{dc}}$ (see Lemma 2).
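The Kronecker-product identities above are easy to sanity-check with simulated data. The following is a minimal numpy sketch (variable names and data are illustrative, not from the paper): the sample co-moment matrices are built from centered return samples, and each $\phi_q(w)$ computed through them must match the direct sample central moment of the portfolio return.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 5, 2000
R = rng.standard_normal((T, N))          # T samples of N asset returns
mu = R.mean(axis=0)
r = R - mu                               # centered returns

# Sample co-moment matrices: Sigma (N x N), Phi (N x N^2), Psi (N x N^3)
Sigma = r.T @ r / T
Phi = sum(np.outer(x, np.kron(x, x)) for x in r) / T
Psi = sum(np.outer(x, np.kron(x, np.kron(x, x))) for x in r) / T

w = np.full(N, 1.0 / N)                  # equally weighted portfolio
p = r @ w                                # centered portfolio returns

# phi_q via the Kronecker identities vs. direct sample central moments
assert np.isclose(w @ Sigma @ w, np.mean(p**2))                        # variance
assert np.isclose(w @ Phi @ np.kron(w, w), np.mean(p**3))              # skewness
assert np.isclose(w @ Psi @ np.kron(w, np.kron(w, w)), np.mean(p**4))  # kurtosis
```

The check works because $(r^\top \otimes r^\top)(w \otimes w) = (r^\top w)^2$, so the quadratic-in-Kronecker form collapses to powers of the scalar portfolio return.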
Hence, we can rewrite $f(w)$ as
$$f(w) = f_{\mathrm{cvx}}(w) + \frac{\tau_{\mathrm{dc}}}{2} w^\top w - \Big(\frac{\tau_{\mathrm{dc}}}{2} w^\top w - f_{\mathrm{ncvx}}(w)\Big),$$
which is a DC function.

Lemma 2.
Under the constraints in (1), we obtain
$$\rho(\nabla^2 f_{\mathrm{ncvx}}(w)) \le 6\alpha\lambda_3 \max_{1 \le i \le N} \sum_{j=1}^{N^2} |\Phi_{ij}| + 12\alpha^2\lambda_4 \max_{1 \le i \le N} \sum_{j=1}^{N^3} |\Psi_{ij}|. \tag{2}$$

When it comes to the cardinality constraint, we are inspired by its DC property [18]:
$$\|w\|_0 \le k \iff \|w\|_1 - \|w\|_{[k]} = 0,$$
where $\|w\|_{[k]}$ is the largest-$k$ norm, defined as the sum of the $k$ largest absolute values of the elements of $w$. We transform (1) into the PMVSK problem by taking the cardinality constraint as a penalty term:
$$\min_{w} \; f_p(w) = \frac{\tau_{\mathrm{dc}}}{2} w^\top w - \Big(\frac{\tau_{\mathrm{dc}}}{2} w^\top w - f_{\mathrm{ncvx}}(w)\Big) + f_{\mathrm{cvx}}(w) + \rho\big(\|w\|_1 - \|w\|_{[k]}\big) \quad \text{s.t.} \quad \mathbf{1}^\top w = 1, \; -\alpha\mathbf{1} \le w \le \alpha\mathbf{1}, \tag{3}$$
where $\rho > 0$ is the penalty coefficient.
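A bound of the form (2) can be spot-checked numerically. The sketch below builds sample co-moment matrices and verifies that the spectral radius of $\nabla^2 f_{\mathrm{ncvx}}$ at random box-feasible points stays below $\tau_{\mathrm{dc}}$; the coefficients 6 and 12 come from the Hessians in Lemma 1, and all names, data, and parameter values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 4, 500
r = rng.standard_normal((T, N))
r -= r.mean(axis=0)                      # centered returns

# Sample co-skewness (N x N^2) and co-kurtosis (N x N^3) matrices
Phi = sum(np.outer(x, np.kron(x, x)) for x in r) / T
Psi = sum(np.outer(x, np.kron(x, np.kron(x, x))) for x in r) / T

lam3, lam4, alpha = 1.0, 1.0, 0.5
I = np.eye(N)

# tau_dc from the Lemma-2-style bound: row sums of |Phi| and |Psi|
tau_dc = (6 * alpha * lam3 * np.abs(Phi).sum(axis=1).max()
          + 12 * alpha**2 * lam4 * np.abs(Psi).sum(axis=1).max())

# Spot-check: spectral radius of the non-convex part's Hessian <= tau_dc
ok = True
for _ in range(200):
    w = rng.uniform(-alpha, alpha, N)
    H3 = 6 * Phi @ np.kron(I, w.reshape(-1, 1))               # Hessian of phi_3
    H4 = 12 * Psi @ np.kron(I, np.kron(w, w).reshape(-1, 1))  # Hessian of phi_4
    H = -lam3 * H3 + lam4 * H4
    ok &= np.abs(np.linalg.eigvalsh((H + H.T) / 2)).max() <= tau_dc + 1e-9
print(bool(ok))  # True
```

The bound holds because each entry of $\Phi(I \otimes w)$ is at most $\alpha$ times a partial row sum of $|\Phi|$, so the induced $\infty$-norm of the (symmetric) Hessian is controlled by the maximum absolute row sum.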
3. ALGORITHM DESIGN
In this section, we propose three algorithms, namely the proximal difference-of-convex algorithm (pDCA), the proximal difference-of-convex algorithm with extrapolation (pDCAe), and the successive convex approximation algorithm (SCA), to solve the PMVSK problem (3).

3.1. pDCA

The main idea of pDCA is to successively construct a global upper bound of the objective function in (3) by linearizing the concave part. In the $j$-th iteration, we solve the following subproblem:
$$\min_{w} \; f_{\mathrm{cvx}}(w) + \frac{\tau_{\mathrm{dc}}}{2} w^\top w + \rho \|w\|_1 - \big(\tau_{\mathrm{dc}} w^j - \nabla f_{\mathrm{ncvx}}(w^j) + \rho s^j\big)^\top w \quad \text{s.t.} \quad \mathbf{1}^\top w = 1, \; -\alpha\mathbf{1} \le w \le \alpha\mathbf{1}, \tag{4}$$
where $s^j$ is a subgradient of $\|w\|_{[k]}$ at $w^j$. It can be computed efficiently in the following two steps:
1) Sort the elements of $|w^j|$ in decreasing order, i.e., $|w^j_{(1)}| \ge |w^j_{(2)}| \ge \cdots \ge |w^j_{(N)}|$.
2) Set $s^j_{(i)} = \mathrm{sign}(w^j_{(i)})$ for $i = 1, \dots, k$, and $s^j_{(i)} = 0$ otherwise.

Furthermore, by introducing a new variable $u \in \mathbb{R}^N$, problem (4) can be cast as a convex quadratic programming (QP) problem,
$$\min_{w, u} \; f_{\mathrm{cvx}}(w) + \frac{\tau_{\mathrm{dc}}}{2} w^\top w + \rho \mathbf{1}^\top u - \big(\tau_{\mathrm{dc}} w^j - \nabla f_{\mathrm{ncvx}}(w^j) + \rho s^j\big)^\top w \quad \text{s.t.} \quad \mathbf{1}^\top w = 1, \; -\alpha\mathbf{1} \le w \le \alpha\mathbf{1}, \; -u \le w \le u, \tag{5}$$
which can be solved efficiently by quadprog in MATLAB. In the rest of the paper, we always use this technique to cast the $\ell_1$ norm into linear inequality constraints. The complete procedure of pDCA is summarized in Algorithm 1.

Algorithm 1 pDCA for PMVSK (3).
Input: iteration counter $j = 0$, error tolerance $\epsilon > 0$, and initial point $w^0$.
Output: solution $w^*$.
for $j = 0, 1, \cdots$ do
    Solve the sub-problem (5) via a QP solver to get $w^{j+1}$.
    if $\|w^{j+1} - w^j\| / \|w^j\| < \epsilon$ and $|f_p(w^{j+1}) - f_p(w^j)| / |f_p(w^{j+1})| < \epsilon$ then
        Set $w^* = w^{j+1}$ and break.
    end if
end for

3.2. pDCA with extrapolation

Despite the common use of pDCA in many applications, it can be slow in practice [19]. To accelerate pDCA without increasing the computational cost too much, we adopt the extrapolation technique of [19]. In the $j$-th iteration, we solve the following sub-problem:
$$\min_{w} \; f_{\mathrm{cvx}}(w) + \frac{\tau_{\mathrm{dc}}}{2} w^\top w + \rho \|w\|_1 - \big(\tau_{\mathrm{dc}} y^j - \nabla f_{\mathrm{ncvx}}(y^j) + \rho \hat{s}^j\big)^\top w \quad \text{s.t.} \quad \mathbf{1}^\top w = 1, \; -\alpha\mathbf{1} \le w \le \alpha\mathbf{1}, \tag{6}$$
where $\hat{s}^j$ is a subgradient of $\|w\|_{[k]}$ at $y^j$. The only difference between (4) and (6) is that the latter linearizes the concave part at $y^j$, an extrapolation between $w^{j-1}$ and $w^j$. We summarize pDCAe in Algorithm 2.

Algorithm 2 pDCAe for PMVSK (3).
Input: iteration counter $j = 0$, error tolerance $\epsilon > 0$, initial point $w^{-1} = w^0$, and $\theta_{-1} = \theta_0 = 1$.
Output: solution $w^*$.
for $j = 0, 1, \cdots$ do
    Compute $\beta_j = (\theta_{j-1} - 1)/\theta_j$ and $\theta_{j+1} = \big(1 + \sqrt{1 + 4\theta_j^2}\big)/2$.
    Set $y^j = w^j + \beta_j (w^j - w^{j-1})$.
    Solve the sub-problem (6) via a QP solver to get $w^{j+1}$.
    if $\|w^{j+1} - w^j\| / \|w^j\| < \epsilon$ and $|f_p(w^{j+1}) - f_p(w^j)| / |f_p(w^{j+1})| < \epsilon$ then
        Set $w^* = w^{j+1}$ and break.
    end if
end for

3.3. SCA

In this subsection, instead of utilizing the DC decomposition as in pDCA and pDCAe, we construct a strongly convex function that is not necessarily a global upper bound of $f_{\mathrm{ncvx}}(w)$, leading to a tighter approximation. This is a kind of successive convex approximation strategy, and we refer to it as SCA for brevity. Specifically, we take
$$\tilde{f}_{\mathrm{ncvx}}(w, w^j) = f_{\mathrm{ncvx}}(w^j) + \nabla f_{\mathrm{ncvx}}(w^j)^\top (w - w^j) + \frac{1}{2} (w - w^j)^\top H^j_{\mathrm{ncvx}} (w - w^j) + \frac{\tau_w}{2} \|w - w^j\|^2,$$
where $H^j_{\mathrm{ncvx}} \in \mathbb{S}^N_+$ is an approximation of $\nabla^2 f_{\mathrm{ncvx}}(w^j)$.

Lemma 3. [20, Lemma 5] The nearest symmetric positive semidefinite matrix in the Frobenius norm to a real symmetric matrix $X$ is $U \mathrm{Diag}(d_+) U^\top$, where $U \mathrm{Diag}(d) U^\top$ is the eigenvalue decomposition of $X$.

In addition to constructing a strongly convex local upper bound for $f_{\mathrm{ncvx}}(w)$, we also linearize the concave part $-\rho \|w\|_{[k]}$. In the $j$-th iteration, we solve the following convex QP problem to get $\hat{w}^{j+1}$:
$$\min_{w} \; f_{\mathrm{cvx}}(w) + \rho \|w\|_1 + \tilde{f}_{\mathrm{ncvx}}(w, w^j) - \rho (s^j)^\top w \quad \text{s.t.} \quad \mathbf{1}^\top w = 1, \; -\alpha\mathbf{1} \le w \le \alpha\mathbf{1}. \tag{7}$$
It is noteworthy that since our problem (3) is non-convex and non-smooth, the traditional choice of stepsize [21] does not work here. Hence, we perform a line search to guarantee the convergence of SCA. The details are as follows: given scalars $0 < c < 1$ and $0 < \beta < 1$, the stepsize is set to $\gamma_j = \beta^{m_j}$, where $m_j$ is the smallest nonnegative integer $m$ satisfying
$$f\big(w^j + \beta^m (\hat{w}^{j+1} - w^j)\big) - \beta^m \rho (\hat{w}^{j+1} - w^j)^\top s^j + \beta^m \rho \big(\|\hat{w}^{j+1}\|_1 - \|w^j\|_1\big) \le f(w^j) + c\beta^m (\hat{w}^{j+1} - w^j)^\top \big(\nabla f(w^j) - \rho s^j\big) + c\rho\beta^m \big(\|\hat{w}^{j+1}\|_1 - \|w^j\|_1\big). \tag{8}$$

Algorithm 3 SCA for PMVSK (3).
Input: iteration counter $j = 0$, error tolerance $\epsilon > 0$, and initial point $w^0$.
Output: solution $w^*$.
for $j = 0, 1, \cdots$ do
    Get $\hat{w}^{j+1}$ by solving the convex QP problem (7).
    Perform the backtracking line search (8) to obtain the stepsize $\gamma_j$.
    Update $w^{j+1} = w^j + \gamma_j (\hat{w}^{j+1} - w^j)$.
    if $|(\hat{w}^{j+1} - w^j)^\top (\nabla f(w^j) - \rho s^j) + \rho (\|\hat{w}^{j+1}\|_1 - \|w^j\|_1)| < \epsilon$ then
        Set $w^* = w^{j+1}$ and break.
    end if
end for
It is guaranteed that the stepsize $\gamma_j$ obtained from (8) is non-zero and that $f_p(w^{j+1}) < f_p(w^j)$ [22, Proposition 2]. The complete SCA algorithm is given in Algorithm 3.

4. THEORETICAL ANALYSIS

In this section, we provide convergence guarantees for our proposed algorithms in the following theorems.
Theorem 1. [19, Theorem 4.1, Proposition 4.1] Let $\{w^j\}$ be the sequence generated by Algorithm 1 or Algorithm 2. Then the following statements hold:
- $\lim_{j \to \infty} \|w^{j+1} - w^j\| = 0$.
- Any limit point of $\{w^j\}$ is a stationary point of (3).
- $f_p^* = \lim_{j \to \infty} f_p(w^j)$ exists.
- $f_p \equiv f_p^*$ on $\Omega$, where $\Omega$ is the set of limit points of $\{w^j\}$.

Theorem 2. [22, Theorem 3] Let $\{w^j\}$ be the sequence generated by Algorithm 3. Then any limit point of $\{w^j\}$ is a stationary point of (3).
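Two small building blocks recur in Algorithms 1-3: the two-step subgradient of the largest-$k$ norm and the eigenvalue clipping of Lemma 3. Both can be sketched in a few lines of numpy (function names are ours, not the authors'):

```python
import numpy as np

def largest_k_subgradient(w, k):
    """Subgradient of the largest-k norm ||.||_[k] at w:
    sign(w_i) on the k entries of largest magnitude, 0 elsewhere."""
    s = np.zeros_like(w)
    idx = np.argsort(-np.abs(w))[:k]     # step 1: indices of the k largest |w_i|
    s[idx] = np.sign(w[idx])             # step 2: keep their signs
    return s

def nearest_psd(X):
    """Lemma 3: nearest symmetric PSD matrix to symmetric X in the
    Frobenius norm, via eigendecomposition with clipped eigenvalues."""
    d, U = np.linalg.eigh(X)             # X = U diag(d) U^T
    return (U * np.maximum(d, 0.0)) @ U.T

s = largest_k_subgradient(np.array([0.5, -0.3, 0.05, -0.15]), k=2)
print(s)  # [ 1. -1.  0.  0.]

X = np.array([[1.0, 2.0],
              [2.0, 1.0]])               # eigenvalues 3 and -1
P = nearest_psd(X)
print(np.linalg.eigvalsh(P).min() >= -1e-12)  # True
```

The `nearest_psd` projection is what produces $H^j_{\mathrm{ncvx}} \in \mathbb{S}^N_+$ from the (possibly indefinite) Hessian $\nabla^2 f_{\mathrm{ncvx}}(w^j)$ in the SCA surrogate.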
5. EXPERIMENTS
Next, we evaluate the performance of our proposed algorithms. The data is generated according to the following steps:
1) Randomly select $N$ ($N = 50$) stocks from the S&P 500 Index components.¹
2) Randomly choose $N$ continuous trading days.
3) Compute the sample moments using the selected data.

¹ https://cran.r-project.org/web/packages/portfolioBacktest/vignettes

Our experiments are performed in MATLAB on a PC with an Intel i5-6200U CPU at 2.3 GHz and 12 GB of memory. We set $\alpha = 0.$ and $\epsilon = 10^{-}$ for pDCA, pDCAe, and SCA. In addition, we set the model parameters in $f$ following [10]: $\lambda_1 = 1$, $\lambda_2 = \xi/2$, $\lambda_3 = \xi(\xi+1)/6$, and $\lambda_4 = \xi(\xi+1)(\xi+2)/24$, where $\xi$ is the risk-aversion parameter and we test $\xi = 5$ and $\xi = 10$. We set $\rho = 3 \times 10^{-}$, $\times 10^{-}$ for $\xi = 10$, $5$, respectively.

Figure 1 shows the evolution of $f_p(w^j)$ with respect to the iteration number and the CPU time with $\xi = 10$, and Figure 2 gives the corresponding results for $\xi = 5$. Notably, pDCAe and SCA are much faster than pDCA, and all three obtain sparse solutions.

Fig. 1. $f_p$-value versus (a) iteration and (b) CPU time with $\xi = 10$ (pDCA, pDCAe, SCA).

Fig. 2. $f_p$-value versus (a) iteration and (b) CPU time with $\xi = 5$ (pDCA, pDCAe, SCA).

Next, we compare against several baselines. One commonly used strategy, which we denote RMVSKC, is to first relax the cardinality constraint to an $\ell_1$-norm constraint [15] (it is worth emphasizing that this kind of relaxation cannot produce a sparse solution) and then project the resulting solution onto the original constraint set of MVSKC (1). The projection step amounts to solving problem (9). Here, we introduce an integer variable $u$ to rewrite the cardinality constraint, leading to $-\alpha u \le w \le \alpha u$ and $\mathbf{1}^\top u \le k$. However, the resulting mixed-integer quadratic program is challenging even for Gurobi, taking hours to solve. Then, we turn to the genetic algorithm (GA).
Utilizing similar techniques that transform the cardinality constraint into a mixed-integer program, we can also solve the MVSKC problem (1) via GA.
$$\min_{w} \; \|w - \tilde{w}^*\| \quad \text{s.t.} \quad \mathbf{1}^\top w = 1, \; \|w\|_0 \le k, \; -\alpha\mathbf{1} \le w \le \alpha\mathbf{1}. \tag{9}$$

From Table 1, we can see that our proposed pDCA, pDCAe, and SCA indeed have a clear advantage in obtaining better utility. In addition, pDCAe and SCA achieve this in much less time.

Table 1. $f$-value and time usage of different methods with $\xi = 10$ and $\xi = 5$.

Methods | ξ = 10: f value | CPU time (s) | ξ = 5: f value | CPU time (s)
pDCA    | -1.5e-3         | 21.0         | -1.8e-3        | 19.6
pDCAe   | -1.5e-3         | 8.9          | -1.7e-3        | 2.7
SCA     | -1.5e-3         | 0.4          | -2.0e-3        | 0.4
RMVSKC  | +3.7e-4         | 9.8          | +2.0e-5        | 10.4
MVSKC   | +8.4e-4         | 95.0         | +3.8e-4        | 204.5
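For intuition about the projection step (9), a cheap greedy stand-in can be sketched as follows. Note that this is our illustration, not the paper's method: the exact projection is the mixed-integer QP discussed above, while the heuristic below merely keeps the $k$ largest-magnitude weights and restores the budget constraint.

```python
import numpy as np

def project_cardinality_heuristic(w_tilde, k, alpha):
    """Greedy approximation to projection (9): keep the k largest-magnitude
    entries of w_tilde, clip them to [-alpha, alpha], then shift the support
    uniformly to restore 1^T w = 1. In corner cases the shift can push an
    entry slightly outside the box, so this is only a heuristic."""
    w = np.zeros_like(w_tilde)
    idx = np.argsort(-np.abs(w_tilde))[:k]
    w[idx] = np.clip(w_tilde[idx], -alpha, alpha)
    w[idx] += (1.0 - w.sum()) / k        # restore the budget constraint
    return w

w_tilde = np.array([0.45, 0.30, 0.15, 0.05, 0.05])   # dense relaxed solution
w = project_cardinality_heuristic(w_tilde, k=2, alpha=0.6)
print(np.count_nonzero(w), round(w.sum(), 10))  # 2 1.0
```

The result is feasible for the cardinality and budget constraints of (1), but, unlike the exact mixed-integer projection, it need not be the closest such point to $\tilde{w}^*$.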
6. CONCLUSION
In this paper, we have considered high-order portfolio optimization with a cardinality constraint. We have proposed to recast the cardinality constraint into a penalty term and have developed three algorithms: pDCA, pDCAe, and SCA. Theoretical convergence results have been established for each. Extensive experiments show that our proposed algorithms attain better utility than the baselines. In addition, pDCAe and SCA enjoy remarkably fast convergence, making them applicable to large-scale scenarios.

7. REFERENCES

[1] Harry M Markowitz, “Portfolio selection,”
Journal of Finance, vol. 7, no. 1, pp. 77–91, 1952.

[2] Daniel Bienstock, “Computational study of a family of mixed-integer quadratic programming problems,” Mathematical Programming, vol. 74, no. 2, pp. 121–140, 1996.

[3] Dimitris Bertsimas and Ryan Cory-Wright, “A scalable algorithm for sparse portfolio selection,” arXiv preprint arXiv:1811.00138, 2018.

[4] Jize Zhang, Tim Leung, and Aleksandr Aravkin, “A relaxed optimization approach for cardinality-constrained portfolios,” in European Control Conference (ECC). IEEE, 2019, pp. 2885–2892.

[5] Antonio Frangioni, Fabio Furini, and Claudio Gentile, “Improving the approximated projected perspective reformulation by dual information,” Operations Research Letters, vol. 45, no. 5, pp. 519–524, 2017.

[6] Petter N Kolm, Reha Tütüncü, and Frank J Fabozzi, “60 years of portfolio optimization: Practical challenges and current trends,” European Journal of Operational Research, vol. 234, no. 2, pp. 356–371, 2014.

[7] Christopher Adcock, Martin Eling, and Nicola Loperfido, “Skewed distributions in finance and actuarial science: a review,” The European Journal of Finance, vol. 21, no. 13-14, pp. 1253–1281, 2015.

[8] Andrew L Turner and Eric J Weigel, “Daily stock market volatility: 1928–1989,” Management Science, vol. 38, no. 11, pp. 1586–1609, 1992.

[9] Campbell R Harvey, John C Liechty, Merrill W Liechty, and Peter Müller, “Portfolio selection with higher moments,” Quantitative Finance, vol. 10, no. 5, pp. 469–485, 2010.

[10] Kris Boudt, Wanbo Lu, and Benedict Peeters, “Higher order comoments of multifactor models and asset allocation,” Finance Research Letters, vol. 13, pp. 225–233, 2015.

[11] Saranya Kshatriya and P Krishna Prasanna, “Genetic algorithm-based portfolio optimization with higher moments in global stock markets,” Journal of Risk, vol. 20, no. 4, 2018.

[12] Yacine Aït-Sahalia and Michael W Brandt, “Variable selection for portfolio choice,” The Journal of Finance, vol. 56, no. 4, pp. 1297–1351, 2001.

[13] Tao Pham Dinh and Yi-Shuai Niu, “An efficient DC programming approach for portfolio decision with higher moments,” Computational Optimization and Applications, vol. 50, no. 3, pp. 525–554, 2011.

[14] Yi-Shuai Niu and Ya-Juan Wang, “Higher-order moment portfolio optimization via the difference-of-convex programming and sums-of-squares,” arXiv preprint arXiv:1906.01509, 2019.

[15] Rui Zhou and Daniel P Palomar, “Solving high-order portfolios via successive convex approximation algorithms,” arXiv preprint arXiv:2008.00863, 2020.

[16] Rui Pedro Brito, Hélder Sebastião, and Pedro Godinho, “Portfolio management with higher moments: the cardinality impact,” International Transactions in Operational Research, vol. 26, no. 6, pp. 2531–2560, 2019.

[17] Ana Luísa Custódio, JF Aguilar Madeira, A Ismael F Vaz, and Luís Nunes Vicente, “Direct multisearch for multiobjective optimization,” SIAM Journal on Optimization, vol. 21, no. 3, pp. 1109–1140, 2011.

[18] Jun-ya Gotoh, Akiko Takeda, and Katsuya Tono, “DC formulations and algorithms for sparse optimization problems,” Mathematical Programming, vol. 169, no. 1, pp. 141–176, 2018.

[19] Bo Wen, Xiaojun Chen, and Ting Kei Pong, “A proximal difference-of-convex algorithm with extrapolation,” Computational Optimization and Applications, vol. 69, no. 2, pp. 297–324, 2018.

[20] Nicholas J Higham, “Computing a nearest symmetric positive semidefinite matrix,” Linear Algebra and Its Applications, vol. 103, pp. 103–118, 1988.

[21] Ying Sun, Gesualdo Scutari, and Daniel Palomar, “Distributed nonconvex multiagent optimization over time-varying networks,” IEEE, 2016, pp. 788–794.

[22] Yang Yang, Marius Pesavento, Symeon Chatzinotas, and Björn Ottersten, “Successive convex approximation algorithms for sparse signal estimation with nonconvex regularizations,”