Solving High-Order Portfolios via Successive Convex Approximation Algorithms
Rui Zhou and Daniel P. Palomar, Fellow, IEEE
Abstract—The first moment and second central moment of the portfolio return, a.k.a. mean and variance, have been widely employed to assess the expected profit and risk of a portfolio, and investors pursue higher mean and lower variance when designing portfolios. These two moments can well describe the distribution of the portfolio return when it follows a Gaussian distribution. However, the real-world distribution of asset returns is usually asymmetric and heavy-tailed, which is far from Gaussian. The asymmetry and heavy-tailedness are characterized by the third and fourth central moments, i.e., skewness and kurtosis, respectively: higher skewness and lower kurtosis are preferred to reduce the probability of extreme losses. However, incorporating high-order moments in the portfolio design is very difficult due to their non-convexity and their rapidly increasing computational cost with the dimension. In this paper, we propose a highly efficient and provably convergent algorithm framework based on the successive convex approximation (SCA) algorithm to solve high-order portfolios. The efficiency of the proposed algorithm framework is demonstrated by numerical experiments.
Index Terms—High-order portfolios, skewness, kurtosis, efficient algorithm, successive convex approximation.
I. INTRODUCTION
Modern portfolio theory has developed rapidly since Harry Markowitz's seminal paper in 1952, which proposed the mean-variance framework to pursue the trade-off between maximizing a portfolio's profit and minimizing its risk [1]. The profit and risk of a portfolio are measured by the mean and variance, i.e., the first moment and second central moment, of the portfolio return. The mean-variance framework assumes that investors prefer a quadratic utility or that the returns of assets follow a Gaussian distribution [2].

However, the mean-variance framework is not widely used in real-market investment. One of the main reasons is that asset returns in real markets are seldom Gaussian distributed: they are usually asymmetric and more likely to contain outliers or exhibit heavier tails, making the portfolio return also asymmetric and heavy-tailed [3], [4]. Meanwhile, most investors would be willing to accept a lower expected profit and higher volatility in exchange for a more positively skewed and less heavy-tailed portfolio return [5], [6], [7]. This aspiration is beyond the characterization power of the mean-variance framework. Apart from that, investors might have different tastes in utility functions, whose shapes can sometimes be significantly different from the quadratic one.
This work was supported by the Hong Kong RGC 16208917 research grant. The authors are with the Hong Kong University of Science and Technology (HKUST), Clear Water Bay, Kowloon, Hong Kong (e-mail: [email protected]; [email protected]).
To make up for the drawbacks of the mean-variance framework, we need to take high-order moments of the portfolio return into consideration. The asymmetry and heavy-tailedness of the portfolio return are well captured by its third and fourth central moments, i.e., skewness and kurtosis. A higher skewness usually means that the portfolio return admits a more positively skewed shape, while a lower kurtosis usually corresponds to a thinner tail. We can extend the mean-variance framework by directly incorporating the high-order moments to obtain the mean-variance-skewness-kurtosis (MVSK) framework, where we try to strike a balance between maximizing the mean and skewness (odd moments) while minimizing the variance and kurtosis (even moments) [8], [9], [10]. Besides, such an extension can be seen as approximating a general expected utility function with its Taylor series expansion truncated at the fourth order [11]. There also exist other high-order portfolios within the MVSK framework; for example, the MVSK tilting portfolios [12] are obtained by "tilting" a given portfolio toward the MVSK efficient frontier.

Although the MVSK framework has many advantages, solving such high-order portfolio optimization problems is quite challenging. First, the third and fourth central moments are both nonconvex functions, making the problems NP-hard in general [13]. These problems are traditionally solved by metaheuristic optimization tools, e.g., differential evolution [14] and genetic algorithms [10], but these essentially perform a time-consuming random search [15], [16]. A method based on the Difference of Convex (DC) algorithm was proposed to solve the MVSK portfolio problem to a stationary point [8], but it converges slowly and is only applicable to small-sized problems. Second, the complexity of computing the value or the gradients of the high-order moments grows rapidly with the problem dimension, so the classical gradient descent method with backtracking line search also becomes inapplicable when the problem dimension grows large. Therefore, it is meaningful and necessary to design efficient algorithms for solving high-order portfolios.

To this end, the major goal of this paper is to develop an efficient algorithm framework based on successive convex approximation (SCA) to solve high-order portfolios. The SCA algorithm solves the original intractable problem by constructing and solving a sequence of strongly convex approximating problems [17], [18], [19]. In this paper, we propose an easy approach to construct the approximations of the nonconvex functions. This allows us to construct a sequence of convex problems compatible with existing efficient solvers, obtaining solutions to the original high-order portfolio optimization problems. The convergence of the proposed algorithm framework to a stationary point is established. In addition, owing to their low computational complexity, the algorithms are amenable to high-dimensional applications. Extensive numerical experiments are performed to corroborate our claims.

The paper is organized as follows. We first give preliminary knowledge on the high-order moments of the portfolio return in Section II and then pose the problem formulations in Section III. The SCA algorithm and its special cases are introduced in Section IV. In Sections V and VI, we derive our algorithms based on the SCA framework to solve the high-order portfolios.
The complexity and convergence analysis of the proposed algorithms are discussed in Section VII. In Section VIII, we present some other formulations of high-order portfolio problems and indicate the applicability of our proposed algorithm framework. The numerical experiments are given in Section IX. Finally, the conclusion is summarized in Section X.

II. PRELIMINARIES: THE MOMENTS OF PORTFOLIO RETURN
Denote by $\mathbf{r}\in\mathbb{R}^N$ the returns of $N$ assets and by $\mathbf{w}\in\mathbb{R}^N$ the portfolio weights. The return of this portfolio is $\mathbf{w}^T\mathbf{r}$, with expected value, i.e., first moment,

$$\phi_1(\mathbf{w}) = \mathbb{E}\left[\mathbf{w}^T\mathbf{r}\right] = \mathbf{w}^T\boldsymbol{\mu}, \quad (1)$$

where $\boldsymbol{\mu} = \mathbb{E}(\mathbf{r})$ is the mean vector of the assets' returns. Denoting by $\tilde{\mathbf{r}} = \mathbf{r}-\boldsymbol{\mu}$ the centered returns, the $q$-th central moment of the portfolio return is $\mathbb{E}\left[(\mathbf{w}^T\mathbf{r}-\mathbf{w}^T\boldsymbol{\mu})^q\right] = \mathbb{E}\left[(\mathbf{w}^T\tilde{\mathbf{r}})^q\right]$, which gives us the following:

• The second central moment, a.k.a. variance, of the portfolio return is
$$\phi_2(\mathbf{w}) = \mathbb{E}\left[(\mathbf{w}^T\tilde{\mathbf{r}})^2\right] = \mathbb{E}\left[\mathbf{w}^T\tilde{\mathbf{r}}\tilde{\mathbf{r}}^T\mathbf{w}\right] = \mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w}, \quad (2)$$
where $\boldsymbol{\Sigma} = \mathbb{E}\left[\tilde{\mathbf{r}}\tilde{\mathbf{r}}^T\right]$ is the covariance matrix.

• The third central moment, a.k.a. skewness, of the portfolio return is
$$\phi_3(\mathbf{w}) = \mathbb{E}\left[(\mathbf{w}^T\tilde{\mathbf{r}})^3\right] = \mathbb{E}\left[\mathbf{w}^T\tilde{\mathbf{r}}\left(\tilde{\mathbf{r}}^T\otimes\tilde{\mathbf{r}}^T\right)(\mathbf{w}\otimes\mathbf{w})\right] = \mathbf{w}^T\boldsymbol{\Phi}(\mathbf{w}\otimes\mathbf{w}), \quad (3)$$
where $\boldsymbol{\Phi} = \mathbb{E}\left[\tilde{\mathbf{r}}\left(\tilde{\mathbf{r}}^T\otimes\tilde{\mathbf{r}}^T\right)\right]$ is the co-skewness matrix of dimension $N\times N^2$.

• The fourth central moment, a.k.a. kurtosis, of the portfolio return is
$$\phi_4(\mathbf{w}) = \mathbb{E}\left[(\mathbf{w}^T\tilde{\mathbf{r}})^4\right] = \mathbb{E}\left[\mathbf{w}^T\tilde{\mathbf{r}}\left(\tilde{\mathbf{r}}^T\otimes\tilde{\mathbf{r}}^T\otimes\tilde{\mathbf{r}}^T\right)(\mathbf{w}\otimes\mathbf{w}\otimes\mathbf{w})\right] = \mathbf{w}^T\boldsymbol{\Psi}(\mathbf{w}\otimes\mathbf{w}\otimes\mathbf{w}), \quad (4)$$
where $\boldsymbol{\Psi} = \mathbb{E}\left[\tilde{\mathbf{r}}\left(\tilde{\mathbf{r}}^T\otimes\tilde{\mathbf{r}}^T\otimes\tilde{\mathbf{r}}^T\right)\right]$ is the co-kurtosis matrix of dimension $N\times N^3$.

[Figure 1: The implication of skewness (probability density of the portfolio return for skewness = 1, 0 (Gaussian), and −1).]

The gradients of $\phi_1(\mathbf{w})$ and $\phi_2(\mathbf{w})$ w.r.t. $\mathbf{w}$ are $\boldsymbol{\mu}$ and $2\boldsymbol{\Sigma}\mathbf{w}$, while their Hessians are $\mathbf{0}$ and $2\boldsymbol{\Sigma}$, respectively. The gradients and Hessians of $\phi_3(\mathbf{w})$ and $\phi_4(\mathbf{w})$ are more complicated to derive, so we give next some useful results.
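In practice, the sample versions of (1)-(4) can be evaluated directly from return data without forming $\boldsymbol{\Phi}$ or $\boldsymbol{\Psi}$ explicitly, using the identity $\phi_q(\mathbf{w}) = \mathbb{E}[(\mathbf{w}^T\tilde{\mathbf{r}})^q]$. The following minimal R sketch (hypothetical helper name; X is a T × N matrix of observed returns) illustrates this.

```r
# Minimal sketch (R): sample moments (1)-(4) of a portfolio from a T x N
# return matrix X, exploiting phi_q(w) = E[(w' rtilde)^q]; names hypothetical.
portfolio_moments <- function(X, w) {
  mu <- colMeans(X)
  x  <- as.vector(scale(X, center = mu, scale = FALSE) %*% w)  # centered w'r
  c(phi1 = sum(w * mu),   # mean
    phi2 = mean(x^2),     # variance
    phi3 = mean(x^3),     # third central moment
    phi4 = mean(x^4))     # fourth central moment
}
```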
Lemma 1. The gradient and Hessian of the skewness and kurtosis are given by:

$$\begin{aligned} \nabla\phi_3(\mathbf{w}) &= 3\boldsymbol{\Phi}(\mathbf{w}\otimes\mathbf{w}), & \nabla\phi_4(\mathbf{w}) &= 4\boldsymbol{\Psi}(\mathbf{w}\otimes\mathbf{w}\otimes\mathbf{w}), \\ \nabla^2\phi_3(\mathbf{w}) &= 6\boldsymbol{\Phi}(\mathbf{I}\otimes\mathbf{w}), & \nabla^2\phi_4(\mathbf{w}) &= 12\boldsymbol{\Psi}(\mathbf{I}\otimes\mathbf{w}\otimes\mathbf{w}). \end{aligned} \quad (5)$$

Proof:
See Appendix A.
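Lemma 1 maps directly to two matrix-vector products; a minimal R sketch follows, assuming the dense $N\times N^2$ co-skewness matrix Phi and $N\times N^3$ co-kurtosis matrix Psi are available.

```r
# Gradients of the third and fourth central moments per Lemma 1
grad_phi3 <- function(Phi, w) 3 * as.vector(Phi %*% kronecker(w, w))
grad_phi4 <- function(Psi, w) 4 * as.vector(Psi %*% kronecker(kronecker(w, w), w))
```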
Corollary 2.
The gradient and Hessian of the skewness and kurtosis admit the following relations:

$$\nabla\phi_3(\mathbf{w}) = \frac{1}{2}\nabla^2\phi_3(\mathbf{w})\,\mathbf{w}, \quad (6)$$

$$\nabla\phi_4(\mathbf{w}) = \frac{1}{3}\nabla^2\phi_4(\mathbf{w})\,\mathbf{w}. \quad (7)$$

Proof:
Using Lemma 1, we have $3\phi_3(\mathbf{w}) = \mathbf{w}^T\nabla\phi_3(\mathbf{w})$. Taking the derivative of both sides w.r.t. $\mathbf{w}$, we get $3\nabla\phi_3(\mathbf{w}) = \nabla\phi_3(\mathbf{w}) + \nabla^2\phi_3(\mathbf{w})\mathbf{w}$, which yields equation (6). Equation (7) can be derived similarly.

Note that $\left[\nabla^2\phi_3(\mathbf{w})\right]_{ij} = 6\sum_{k=1}^N \Phi^{(k)}_{ij}w_k$ and $\left[\nabla^2\phi_4(\mathbf{w})\right]_{ij} = 12\sum_{k,l=1}^N \Psi^{(k,l)}_{ij}w_k w_l$ can be easily obtained from Lemma 1, where $\Phi^{(k)}_{ij} = \mathbb{E}[\tilde{r}_i\tilde{r}_j\tilde{r}_k]$ and $\Psi^{(k,l)}_{ij} = \mathbb{E}[\tilde{r}_i\tilde{r}_j\tilde{r}_k\tilde{r}_l]$ are the corresponding elements of the matrices $\boldsymbol{\Phi}$ and $\boldsymbol{\Psi}$.

A high expected value and a low variance of the portfolio return are naturally chased by investors to increase profit and decrease risk. Besides, in the non-Gaussian case, a high skewness and a low kurtosis are also desirable, as they reduce the probability of extreme losses. As shown in Figure 1, a positively skewed portfolio return is significantly less likely to suffer extreme losses than a negatively skewed one. Likewise, Figure 2 shows that a lower kurtosis corresponds to a thinner tail, which alleviates the occurrence of extreme returns. In general, investors have a preference for odd moments and a dislike for even moments.

[Figure 2: The implication of kurtosis (probability density of the portfolio return for kurtosis = 10, 3 (Gaussian), and 2).]

III. PROBLEM FORMULATION
A. MVSK Portfolio
The classical Markowitz mean-variance (MV) portfolio [1] is obtained by solving the following problem:

$$\begin{aligned}\underset{\mathbf{w}}{\text{minimize}} \quad & -\mathbf{w}^T\boldsymbol{\mu} + \lambda\,\mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w} \\ \text{subject to} \quad & \mathbf{w}\in\mathcal{W},\end{aligned} \quad (8)$$

where $\lambda\ge 0$ is a parameter striking a balance between the expected return $\mathbf{w}^T\boldsymbol{\mu}$ and the portfolio risk (defined by the variance $\mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w}$), and $\mathcal{W}$ is the feasible set of portfolio weights, which we set as

$$\mathcal{W} = \left\{\mathbf{w} \mid \mathbf{1}^T\mathbf{w} = 1, \; \|\mathbf{w}\|_1 \le L\right\}, \quad (9)$$

where $L\ge 1$ is the leverage constraint of the portfolio [20]. Specifically, when $L = 1$, $\mathcal{W}$ reduces to the no-shorting constraint $\{\mathbf{w} \mid \mathbf{1}^T\mathbf{w} = 1, \mathbf{w}\ge\mathbf{0}\}$. The expected return and variance are actually the first moment and the second central moment of the portfolio return. However, real-world asset returns are usually asymmetric and prone to extreme values, which is beyond the characterization of the first two moments. It is therefore reasonable to consider the third and fourth central moments in the portfolio design. A natural way to incorporate the two higher-order moments is revising the objective of problem (8) to obtain the mean-variance-skewness-kurtosis (MVSK) portfolio design problem [8], [9], [10]:

$$\begin{aligned}\underset{\mathbf{w}}{\text{minimize}} \quad & f(\mathbf{w}) = -\lambda_1\phi_1(\mathbf{w}) + \lambda_2\phi_2(\mathbf{w}) - \lambda_3\phi_3(\mathbf{w}) + \lambda_4\phi_4(\mathbf{w}) \\ \text{subject to} \quad & \mathbf{w}\in\mathcal{W},\end{aligned} \quad (10)$$

where $\lambda_1,\lambda_2,\lambda_3,\lambda_4\ge 0$ are the parameters combining the four moments of the portfolio return.
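For reference, the objective $f(\mathbf{w})$ of (10) can be evaluated directly from the precomputed moment matrices; a hypothetical helper is sketched below (lam collects $\lambda_1,\dots,\lambda_4$).

```r
# MVSK objective of problem (10); lam = c(lam1, lam2, lam3, lam4)
mvsk_objective <- function(w, mu, Sigma, Phi, Psi, lam) {
  ww <- kronecker(w, w)
  -lam[1] * sum(w * mu) +
   lam[2] * drop(t(w) %*% Sigma %*% w) -
   lam[3] * drop(t(w) %*% Phi %*% ww) +
   lam[4] * drop(t(w) %*% Psi %*% kronecker(ww, w))
}
```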
Directly solving problem (10) leads us to the MVSK efficient frontier, where we cannot improve any moment without impairing another. However, investors might want to modify an existing portfolio $\mathbf{w}_0$ toward an MVSK efficient portfolio. This can be done by tilting the portfolio in a direction that increases its first and third central moments and decreases its second and fourth central moments [12], i.e.,

$$\begin{aligned}\underset{\mathbf{w},\delta}{\text{maximize}} \quad & \delta \\ \text{subject to} \quad & \phi_1(\mathbf{w}) \ge \phi_1(\mathbf{w}_0) + d_1\delta, \\ & \phi_2(\mathbf{w}) \le \phi_2(\mathbf{w}_0) - d_2\delta, \\ & \phi_3(\mathbf{w}) \ge \phi_3(\mathbf{w}_0) + d_3\delta, \\ & \phi_4(\mathbf{w}) \le \phi_4(\mathbf{w}_0) - d_4\delta, \\ & (\mathbf{w}-\mathbf{w}_0)^T\boldsymbol{\Sigma}(\mathbf{w}-\mathbf{w}_0) \le \kappa^2, \\ & \mathbf{w}\in\mathcal{W}, \; \delta\ge 0,\end{aligned} \quad (11)$$

where $\mathbf{d} = [d_1, d_2, d_3, d_4] \ge \mathbf{0}$ is the tilting direction, $\phi_i(\mathbf{w}_0)$, $i = 1,2,3,4$, are the moments of the starting point $\mathbf{w}_0$ for tilting, and $\kappa$ determines the maximum tracking error volatility of $\mathbf{w}$ with respect to the reference portfolio $\mathbf{w}_0$.

C. Difficulty of Solving High-Order Portfolios
The MVSK portfolio optimization problem (10) and the MVSK tilting portfolio optimization problem (11) are very difficult to solve for two reasons:
1) Non-convexity: the third and fourth central moments, i.e., $\phi_3(\mathbf{w})$ and $\phi_4(\mathbf{w})$, are non-convex in $\mathbf{w}$, making both problem (10) and problem (11) non-convex.
2) Computational complexity: $\boldsymbol{\Psi}$ is of dimension $N\times N^3$, which means the memory complexity is $O(N^4)$ and the computational complexity of a single evaluation of the fourth moment is $O(N^4)$. Lemma 1 shows that the computational complexity of computing the gradient of the fourth central moment is also $O(N^4)$. Hence the general gradient descent method with backtracking line search becomes inappropriate for the high-order portfolio problem.

Due to the non-convexity, classical convex optimization methods are not applicable, while general gradient methods are also ruled out by the expensive gradient computations. It is necessary to design a specific algorithm to efficiently solve high-order portfolios. Such an algorithm should converge fast and avoid evaluating the gradients or values of the high-order moments frequently. This paper proposes a very efficient algorithm framework to solve the high-order portfolio optimization problems based on the SCA algorithm. But before that, some background on the SCA algorithm is due in the next section.

IV. THE SUCCESSIVE CONVEX APPROXIMATION ALGORITHM
The successive convex approximation (SCA) algorithm is a general framework especially designed for solving non-convex optimization problems. Instead of solving the original intractable optimization problem, it resorts to successively solving a sequence of strongly convex approximating problems. The convergence of the SCA algorithm can be guaranteed under mild assumptions.
Specifically, consider a nonconvex constrained optimization problem

$$\begin{aligned}\underset{\mathbf{x}}{\text{minimize}} \quad & f(\mathbf{x}) \\ \text{subject to} \quad & g_i(\mathbf{x})\le 0, \; i = 1,\dots,m, \\ & \mathbf{x}\in\mathcal{K},\end{aligned} \quad (12)$$

where $f(\mathbf{x})$ and $g_i(\mathbf{x})$ are nonconvex functions and $\mathcal{K}$ is a convex set. In order to solve problem (12), which is directly intractable, we may turn to successively solving a sequence of strongly convex approximating problems. Denote by $\mathbf{x}^k$ the iterate at the $k$-th iteration; the SCA algorithm constructs a strongly convex approximating problem for (12) as [19]:

$$\begin{aligned}\underset{\mathbf{x}}{\text{minimize}} \quad & \tilde{f}(\mathbf{x};\mathbf{x}^k) \\ \text{subject to} \quad & \tilde{g}_i(\mathbf{x};\mathbf{x}^k) \le \eta(\mathbf{x}^k), \; i = 1,\dots,m, \\ & \|\mathbf{x}-\mathbf{x}^k\|_\infty \le \beta, \; \mathbf{x}\in\mathcal{K},\end{aligned} \quad (13)$$

where $\tilde{f}(\mathbf{x};\mathbf{x}^k)$ and $\tilde{g}_i(\mathbf{x};\mathbf{x}^k)$ are the approximating functions for $f(\mathbf{x})$ and $g_i(\mathbf{x})$ at $\mathbf{x}^k$, the quantity $\eta(\mathbf{x}^k)$ in the surrogate constraints serves to suitably enlarge the feasible set of the subproblem to ensure it is always nonempty, and $\beta$ is a user-chosen positive constant. The term $\eta(\mathbf{x}^k)$ is defined as

$$\eta(\mathbf{x}^k) \triangleq (1-\theta)\max_i\left[g_i(\mathbf{x}^k)\right]_+ + \theta\min_{\mathbf{x}}\left\{\max_i\left[\tilde{g}_i(\mathbf{x};\mathbf{x}^k)\right]_+ \,\middle|\, \mathbf{x}\in\mathcal{K}\right\}, \quad (14)$$

with $\theta\in(0,1]$ and $[\cdot]_+ = \max\{\cdot, 0\}$. The general SCA algorithm generates the sequence $\{\mathbf{x}^k\}$ as

$$\begin{cases}\hat{\mathbf{x}}^{k+1} \leftarrow \text{solve problem (13)}, \\ \mathbf{x}^{k+1} = \mathbf{x}^k + \gamma^k\left(\hat{\mathbf{x}}^{k+1}-\mathbf{x}^k\right),\end{cases} \quad (15)$$

where, at each iteration, the first stage generates the descent direction $\hat{\mathbf{x}}^{k+1}-\mathbf{x}^k$ and the second stage updates the variable along this direction with a step size $\gamma^k$ satisfying

$$\lim_{k\to\infty}\gamma^k = 0 \quad \text{and} \quad \sum_{k=0}^\infty\gamma^k = \infty. \quad (16)$$

The generated sequence $\{\mathbf{x}^k\}$ is proven to converge to a generalized stationary point of the original problem (12) under the following mild assumptions [19].
Assumption 1. Let $\mathcal{O}_\beta$ and $\mathcal{O}_\mathcal{K}$ be open neighborhoods of $\{\mathbf{x} \mid \|\mathbf{x}-\mathbf{x}^k\|_\infty\le\beta\}$ and $\mathcal{K}$ such that:

On the original problem (12):
A1) $\mathcal{K}$ is a nonempty, closed, and convex set;
A2) $f(\mathbf{x})$ and $g_i(\mathbf{x})$ are continuously differentiable with locally Lipschitz gradients on an open set containing $\mathcal{K}$.

On the surrogate objective $\tilde{f}$:
B1) $\tilde{f}(\mathbf{x};\mathbf{y})$ is strongly convex on $\mathcal{O}_\beta$ for every $\mathbf{y}\in\mathcal{K}$, with modulus of strong convexity $c > 0$ independent of $\mathbf{y}$;
B2) $\tilde{f}(\mathbf{x};\mathbf{y})$ is continuous on $\mathcal{O}_\beta\times\mathcal{O}_\mathcal{K}$;
B3) $\nabla\tilde{f}(\mathbf{x};\mathbf{y})$ is continuous on $\mathcal{O}_\beta\times\mathcal{O}_\mathcal{K}$;
B4) $\nabla\tilde{f}(\mathbf{y};\mathbf{y}) = \nabla f(\mathbf{y})$ for every $\mathbf{y}\in\mathcal{K}$.

On the surrogate constraints $\tilde{g}_i$:
C1) $\tilde{g}_i(\mathbf{x};\mathbf{y})$ is convex on $\mathcal{O}_\beta$ for every $\mathbf{y}\in\mathcal{K}$;
C2) $\tilde{g}_i(\mathbf{x};\mathbf{y})$ is continuous on $\mathbb{R}^N\times\mathcal{O}_\mathcal{K}$;
C3) $\tilde{g}_i(\mathbf{y};\mathbf{y}) = g_i(\mathbf{y})$ for every $\mathbf{y}\in\mathcal{K}$;
C4) $\nabla\tilde{g}_i(\mathbf{x};\mathbf{y})$ is continuous on $\mathcal{O}_\beta\times\mathcal{O}_\mathcal{K}$;
C5) $\nabla\tilde{g}_i(\mathbf{y};\mathbf{y}) = \nabla g_i(\mathbf{y})$ for every $\mathbf{y}\in\mathcal{K}$;

where $\nabla\tilde{f}(\mathbf{u};\mathbf{y})$ and $\nabla\tilde{g}_i(\mathbf{u};\mathbf{y})$ denote the partial gradients of $\tilde{f}$ and $\tilde{g}_i$ with respect to the first argument, evaluated at $\mathbf{u}$.

We can simplify the surrogate problem (13) accordingly when the following conditions are additionally satisfied:
1) if $\mathcal{K}$ is bounded, then the constraint $\|\mathbf{x}-\mathbf{x}^k\|_\infty\le\beta$ can be ignored;
2) if $\nabla f(\mathbf{x})$ is Lipschitz continuous on $\mathcal{K}$ and $\tilde{g}_i(\mathbf{x};\mathbf{x}^k)\ge g_i(\mathbf{x})$ is satisfied for every $\mathbf{x}\in\mathcal{K}$, then the constraint $\|\mathbf{x}-\mathbf{x}^k\|_\infty\le\beta$ can be ignored and $\eta(\mathbf{x}^k)\equiv 0$ [21];
3) if $\nabla f(\mathbf{x})$ is Lipschitz continuous on $\mathcal{K}$ and $\tilde{g}_i(\mathbf{x};\mathbf{x}^k) = g_i(\mathbf{x})$ is satisfied for every $\mathbf{x}\in\mathcal{K}$, then the algorithm reduces to the vanilla SCA algorithm; the constraint $\|\mathbf{x}-\mathbf{x}^k\|_\infty\le\beta$ can be ignored and $\eta(\mathbf{x}^k)\equiv 0$ [22];
4) if $\mathcal{K}$ is bounded and $\tilde{f}(\mathbf{x};\mathbf{x}^k)\ge f(\mathbf{x})$ and $\tilde{g}_i(\mathbf{x};\mathbf{x}^k) = g_i(\mathbf{x})$ are satisfied for every $\mathbf{x}\in\mathcal{K}$, then the algorithm reduces to the classical majorization-minimization (MM) method with convex majorizing functions; the constraint $\|\mathbf{x}-\mathbf{x}^k\|_\infty\le\beta$ can be ignored, $\eta(\mathbf{x}^k)\equiv 0$, and $\gamma^k$ can simply be fixed to $1$ [17], [23].
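Putting (13)-(16) together, the outer SCA loop itself is only a few lines. Below is a minimal sketch in R, where solve_surrogate is a hypothetical placeholder for solving the approximating problem (13) at the current iterate and gamma is a precomputed step-size sequence satisfying (16).

```r
# A minimal sketch of the generic SCA iteration (15)
sca_iterate <- function(x0, solve_surrogate, gamma, max_iter = 100L) {
  x <- x0
  for (k in seq_len(max_iter)) {
    x_hat <- solve_surrogate(x)        # stage 1: descent direction x_hat - x
    x <- x + gamma[k] * (x_hat - x)    # stage 2: step along the direction
  }
  x
}
```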
V. SOLVING THE MVSK PORTFOLIO PROBLEM VIA SCA

In this section, we discuss how to solve problem (10) via the SCA algorithm. We first investigate the Difference of Convex (DC) programming approach for solving problem (10) [8], which is actually a special case of the MM algorithm. Inspired by this, we then propose another MM-based algorithm that constructs a sequence of tighter upper bound functions, so fewer iterations can be expected. However, we further recognize that the MM algorithm might still be too conservative, as it requires constructing a global upper bound for the objective function. Therefore, we further propose a general SCA-based algorithm for solving problem (10), where a strongly convex approximating function is constructed for the objective.
A. Preliminary Approach: DC Algorithm
A DC-based method was proposed in [8] to solve problem (10) by recognizing that $\nabla^2 f(\mathbf{w})$ has a bounded spectral radius over the bounded feasible set $\mathcal{W}$.

Lemma 3. [8] Given $\mathbf{w}\ge\mathbf{0}$, $\mathbf{1}^T\mathbf{w} = 1$, we have

$$\rho\left(\nabla^2 f(\mathbf{w})\right) \le 2\lambda_2\|\boldsymbol{\Sigma}\|_\infty + 6\lambda_3\max_{1\le i\le N}\sum_{j,k=1}^N\left|\Phi^{(k)}_{ij}\right| + 12\lambda_4\max_{1\le i\le N}\sum_{j,k,l=1}^N\left|\Psi^{(k,l)}_{ij}\right|, \quad (17)$$

where $\rho(\mathbf{X})$ is the spectral radius of $\mathbf{X}$.
Algorithm 1 DC method for problem (10).
  Initialize $\mathbf{w}^0\in\mathcal{W}$ and compute $\tau_{\text{DC}} \ge \rho(\nabla^2 f(\mathbf{w}))$ as in Lemma 3.
  for k = 0, 1, 2, ... do
    Calculate $\nabla f(\mathbf{w}^k)$.
    Solve problem (19) to obtain $\hat{\mathbf{w}}^{k+1}$.
    $\mathbf{w}^{k+1} = \hat{\mathbf{w}}^{k+1}$.
    Terminate loop if converged.
  end for

The bound on $\rho(\nabla^2 f(\mathbf{w}))$ provided in Lemma 3 can be easily extended to the constraints in (9) (where, instead of no shorting $\mathbf{w}\ge\mathbf{0}$, we allow some leverage $L$ with $\|\mathbf{w}\|_1\le L$):

$$\rho\left(\nabla^2 f(\mathbf{w})\right) \le 2\lambda_2\|\boldsymbol{\Sigma}\|_\infty + 6\lambda_3 L\max_{1\le i\le N}\sum_{j,k=1}^N\left|\Phi^{(k)}_{ij}\right| + 12\lambda_4 L^2\max_{1\le i\le N}\sum_{j,k,l=1}^N\left|\Psi^{(k,l)}_{ij}\right|.$$

Then we can represent $f(\mathbf{w})$ as

$$f(\mathbf{w}) = \frac{\tau_{\text{DC}}}{2}\mathbf{w}^T\mathbf{w} - \left(\frac{\tau_{\text{DC}}}{2}\mathbf{w}^T\mathbf{w} - f(\mathbf{w})\right), \quad (18)$$

where both $\frac{\tau_{\text{DC}}}{2}\mathbf{w}^T\mathbf{w}$ and $\frac{\tau_{\text{DC}}}{2}\mathbf{w}^T\mathbf{w} - f(\mathbf{w})$ are convex in $\mathbf{w}$ if $\tau_{\text{DC}} \ge \rho(\nabla^2 f(\mathbf{w}))$. The classical concave-convex procedure (CCCP) can then be employed by iteratively linearizing the second (concave) term, i.e.,

$$\begin{aligned}\underset{\mathbf{w}}{\text{minimize}} \quad & \frac{\tau_{\text{DC}}}{2}\mathbf{w}^T\mathbf{w} - \mathbf{w}^T\left(\tau_{\text{DC}}\mathbf{w}^k - \nabla f(\mathbf{w}^k)\right) \\ \text{subject to} \quad & \mathbf{w}\in\mathcal{W},\end{aligned} \quad (19)$$

where $\nabla f(\mathbf{w}^k) = -\lambda_1\nabla\phi_1(\mathbf{w}^k) + \lambda_2\nabla\phi_2(\mathbf{w}^k) - \lambda_3\nabla\phi_3(\mathbf{w}^k) + \lambda_4\nabla\phi_4(\mathbf{w}^k)$. This is already a convex problem and can be easily solved. Furthermore, we can rewrite it as a convex quadratic programming (QP) problem by introducing a variable $\mathbf{u}\in\mathbb{R}^N$:

$$\begin{aligned}\underset{\mathbf{w},\mathbf{u}}{\text{minimize}} \quad & \frac{\tau_{\text{DC}}}{2}\mathbf{w}^T\mathbf{w} - \mathbf{w}^T\left(\tau_{\text{DC}}\mathbf{w}^k - \nabla f(\mathbf{w}^k)\right) \\ \text{subject to} \quad & \mathbf{1}^T\mathbf{w} = 1, \; -\mathbf{u}\le\mathbf{w}\le\mathbf{u}, \; \mathbf{1}^T\mathbf{u}\le L,\end{aligned} \quad (20)$$

which can be solved very efficiently with a QP solver. In the rest of the paper, we will always use this trick to transform the $\ell_1$-norm constraint into linear inequality constraints. The complete DC algorithm for solving problem (10) is given in Algorithm 1.
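As an illustration, in the no-shorting case ($L = 1$) one CCCP iteration (19) reduces to a single call to the quadprog solver. This is only a sketch under those assumptions, not the packaged implementation.

```r
# One DC/CCCP step (19) for the no-shorting case (L = 1):
# minimize (tau/2) w'w - w'(tau w_k - grad_k)  s.t.  1'w = 1, w >= 0.
library(quadprog)
dc_step <- function(w_k, grad_k, tau_dc) {
  N <- length(w_k)
  solve.QP(Dmat = tau_dc * diag(N),           # quadratic term tau * I
           dvec = tau_dc * w_k - grad_k,      # linear term tau w^k - grad f(w^k)
           Amat = cbind(rep(1, N), diag(N)),  # columns: 1'w = 1 and w_i >= 0
           bvec = c(1, rep(0, N)),
           meq  = 1)$solution                 # first constraint is an equality
}
```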
The DC algorithm is a special case of the more general MM algorithm, which works by solving a sequence of global upper bound problems of the original problem [24], [17]. Inspired by the DC approach discussed above, we propose a tighter upper bound function for $f(\mathbf{w})$. Note that the objective in the surrogate problem (19) can be rewritten as

$$\frac{\tau_{\text{DC}}}{2}\mathbf{w}^T\mathbf{w} - \tau_{\text{DC}}(\mathbf{w}^k)^T\mathbf{w} + \nabla f(\mathbf{w}^k)^T\mathbf{w} + \text{const.} = f(\mathbf{w}^k) + \nabla f(\mathbf{w}^k)^T(\mathbf{w}-\mathbf{w}^k) + \frac{\tau_{\text{DC}}}{2}\|\mathbf{w}-\mathbf{w}^k\|_2^2, \quad (21)$$
Algorithm 2 MM method for problem (10).
  Initialize $\mathbf{w}^0\in\mathcal{W}$ and compute $\tau_{\text{MM}} \ge \rho(\nabla^2 f_{\text{ncvx}}(\mathbf{w}))$ as in Lemma 4.
  for k = 0, 1, 2, ... do
    Calculate $\nabla f_{\text{ncvx}}(\mathbf{w}^k)$.
    Solve problem (24) to obtain $\hat{\mathbf{w}}^{k+1}$.
    $\mathbf{w}^{k+1} = \hat{\mathbf{w}}^{k+1}$.
    Terminate loop if converged.
  end for

which is actually a global upper bound function of $f(\mathbf{w})$ at $\mathbf{w}^k$ [17]. However, writing $f(\mathbf{w}) = f_{\text{cvx}}(\mathbf{w}) + f_{\text{ncvx}}(\mathbf{w})$ with $f_{\text{cvx}}(\mathbf{w}) = -\lambda_1\phi_1(\mathbf{w}) + \lambda_2\phi_2(\mathbf{w})$ and $f_{\text{ncvx}}(\mathbf{w}) = -\lambda_3\phi_3(\mathbf{w}) + \lambda_4\phi_4(\mathbf{w})$, we observe that $f_{\text{cvx}}(\mathbf{w})$ is already a convex function. We therefore only need to construct an upper bound function for $f_{\text{ncvx}}(\mathbf{w})$. Inspired by Lemma 3, we propose a smaller bound for $\rho(\nabla^2 f_{\text{ncvx}}(\mathbf{w}))$ as follows.
Under the constraints in (9), we have

$$\rho\left(\nabla^2 f_{\text{ncvx}}(\mathbf{w})\right) \le 6\lambda_3 L\max_{1\le i\le N}\sum_{j=1}^N\max_{1\le k\le N}\left|\Phi^{(k)}_{ij}\right| + 12\lambda_4 L^2\max_{1\le i\le N}\sum_{j=1}^N\max_{1\le k,l\le N}\left|\Psi^{(k,l)}_{ij}\right|. \quad (22)$$

Proof:
See Appendix B.

Then, compared with the upper bound function actually used in the DC method, we can construct a much tighter upper bound function $\check{f}_{\text{ncvx}}$ for $f_{\text{ncvx}}(\mathbf{w})$ at $\mathbf{w}^k$ as [17]:

$$\check{f}_{\text{ncvx}}(\mathbf{w},\mathbf{w}^k) = f_{\text{ncvx}}(\mathbf{w}^k) + \nabla f_{\text{ncvx}}(\mathbf{w}^k)^T(\mathbf{w}-\mathbf{w}^k) + \frac{\tau_{\text{MM}}}{2}\|\mathbf{w}-\mathbf{w}^k\|_2^2, \quad (23)$$

where $\nabla f_{\text{ncvx}}(\mathbf{w}^k) = -\lambda_3\nabla\phi_3(\mathbf{w}^k) + \lambda_4\nabla\phi_4(\mathbf{w}^k)$ and $\tau_{\text{MM}} \ge \rho(\nabla^2 f_{\text{ncvx}}(\mathbf{w}))$ can be calculated via Lemma 4. A tighter global upper bound function can then be constructed for $f(\mathbf{w})$ as $\check{f}(\mathbf{w},\mathbf{w}^k) = f_{\text{cvx}}(\mathbf{w}) + \check{f}_{\text{ncvx}}(\mathbf{w},\mathbf{w}^k)$. At each iteration of the MM algorithm, we need to solve the following surrogate problem:

$$\begin{aligned}\underset{\mathbf{w}}{\text{minimize}} \quad & \frac{1}{2}\mathbf{w}^T\check{\mathbf{Q}}^k\mathbf{w} + \mathbf{w}^T\check{\mathbf{q}}^k \\ \text{subject to} \quad & \mathbf{w}\in\mathcal{W},\end{aligned} \quad (24)$$

where $\check{\mathbf{Q}}^k = 2\lambda_2\boldsymbol{\Sigma} + \tau_{\text{MM}}\mathbf{I}$ and $\check{\mathbf{q}}^k = -\lambda_1\boldsymbol{\mu} + \nabla f_{\text{ncvx}}(\mathbf{w}^k) - \tau_{\text{MM}}\mathbf{w}^k$. It is a strongly convex QP problem and can be solved very efficiently by a QP solver. The complete MM algorithm for solving problem (10) is given in Algorithm 2. Compared with the original DC algorithm, the MM algorithm does not introduce any additional computation, while we can expect faster convergence.
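The bound (22) only requires scanning the entries of $\boldsymbol{\Phi}$ and $\boldsymbol{\Psi}$; the sketch below is a hypothetical helper, assuming the dense $N\times N^2$ and $N\times N^3$ Kronecker layouts used throughout the paper.

```r
# tau_MM bound of Lemma 4 from dense Phi (N x N^2) and Psi (N x N^3)
tau_mm_bound <- function(Phi, Psi, lam3, lam4, L) {
  N <- nrow(Phi)
  s3 <- max(sapply(1:N, function(i) {
    M <- matrix(Phi[i, ], N, N)      # M[j, k] = Phi^(k)_{ij}
    sum(apply(abs(M), 1, max))       # sum_j max_k |Phi^(k)_ij|
  }))
  s4 <- max(sapply(1:N, function(i) {
    A <- matrix(Psi[i, ], N, N^2)    # A[j, (k,l)] = Psi^(k,l)_{ij}
    sum(apply(abs(A), 1, max))       # sum_j max_{k,l} |Psi^(k,l)_ij|
  }))
  6 * lam3 * L * s3 + 12 * lam4 * L^2 * s4
}
```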
C. Q-MVSK Algorithm

The MM-type methods require constructing a global upper bound approximation, which is sometimes criticized as too conservative to capture the global landscape of the objective
Suppose k (0 , , k ! and P k k =+ , and let w k be the sequence generated by Algorithm5 or 4. Then w k is a generalized stationary point of theproblem (27).Proof: The only difference between Algorithm 4 andAlgorithm 5 is that Algorithm 5 constructs the quadraticapproximation for the nonconvex constraints while the Al-gorithm 4 simply linearize all the constraints. However, itdoes not affect the convergence checking as they are bothconvex approximation for the constraints. Besides, it is easyto check that all the conditions in Lemma ?? are satisfied Figure 3: Illustration of approximating functions.function [18]. Therefore, in this section, we propose the Q-MVSK algorithm to solve the problem (10) via a stronglyconvex approximation (need not be a global upper bound) forthe objective. More specifically, we still leave the convex part f cvx ( w ) untouched but construct a second-order approxima-tion for f ncvx ( w ) as ˜ f ncvx (cid:0) w , w k (cid:1) = f ncvx (cid:0) w k (cid:1) + (cid:79) f ncvx (cid:0) w k (cid:1) T (cid:0) w − w k (cid:1) + 12 (cid:0) w − w k (cid:1) T H k ncvx (cid:0) w − w k (cid:1) + τ w (cid:107) w − w k (cid:107) , (25)where H k ncvx is an approximation of (cid:79) f ncvx (cid:0) w k (cid:1) with (cid:79) f ncvx (cid:0) w k (cid:1) = − λ (cid:79) φ (cid:0) w k (cid:1) + λ (cid:79) φ (cid:0) w k (cid:1) from Lemma1, and τ w ≥ is to preserve the strong convexity of ˜ f ncvx (cid:0) w , w k (cid:1) . Note that τ w can be set to when λ > . H k ncvx is a positive semidefinite matrix close to (cid:79) f ncvx (cid:0) w k (cid:1) obtained as follows. Lemma 5. [25] The nearest symmetric positive semidefinitematrix in the Frobenius norm to a real symmetric real matrix X is U Diag ( d + ) U T , where U Diag ( d ) U T is the eigenvaluedecomposition of X . Then we have an approximating function for f ( w ) as ˜ f (cid:0) w , w k (cid:1) = f cvx ( w ) + ˜ f ncvx (cid:0) w , w k (cid:1) . In Figure 3, the threeapproximating functions are illustrated by being restricted toa line on W . We can see that ˜ f (cid:0) w , w k (cid:1) can best describethe global behaviour of f ( w ) . At each iteration of the MMalgorithm, we need solve the following surrogate problem: minimize w w T ˜ Q k w + w T ˜ q k subject to w ∈ W , (26)where ˜ Q k = λ Σ + H k ncvx + τ w I and ˜ q k = − λ µ + (cid:79) f ncvx (cid:0) w k (cid:1) − H k ncvx w k − τ w w k . It is a strongly convex QPproblem and can be very efficiently solved by a QP solver. Thecomplete Q-MVSK algorithm for solving the problem (10) isgiven in Algorithm 3. Algorithm 3
Algorithm 3 Q-MVSK algorithm for problem (10).
  Initialize $\mathbf{w}^0\in\mathcal{W}$ and pick a sequence $\{\gamma^k\}$.
  for k = 0, 1, 2, ... do
    Calculate $\nabla f_{\text{ncvx}}(\mathbf{w}^k)$ and $\mathbf{H}^k_{\text{ncvx}}$.
    Solve problem (26) to obtain $\hat{\mathbf{w}}^{k+1}$.
    $\mathbf{w}^{k+1} = \mathbf{w}^k + \gamma^k(\hat{\mathbf{w}}^{k+1}-\mathbf{w}^k)$.
    Terminate loop if converged.
  end for

VI. SOLVING THE MVSK TILTING PORTFOLIO PROBLEM VIA SCA

In this section, we discuss how to solve the MVSK tilting problem (11), which we rewrite as

$$\begin{aligned}\underset{\mathbf{w},\delta}{\text{minimize}} \quad & -\delta \\ \text{subject to} \quad & g_i(\mathbf{w},\delta)\le 0, \; i = 1,\dots,5, \\ & \mathbf{w}\in\mathcal{W}, \; \delta\ge 0,\end{aligned} \quad (27)$$

where

$$\begin{aligned} g_1(\mathbf{w},\delta) &= \phi_1(\mathbf{w}_0) - \phi_1(\mathbf{w}) + d_1\delta, \\ g_2(\mathbf{w},\delta) &= \phi_2(\mathbf{w}) - \phi_2(\mathbf{w}_0) + d_2\delta, \\ g_3(\mathbf{w},\delta) &= \phi_3(\mathbf{w}_0) - \phi_3(\mathbf{w}) + d_3\delta, \\ g_4(\mathbf{w},\delta) &= \phi_4(\mathbf{w}) - \phi_4(\mathbf{w}_0) + d_4\delta, \\ g_5(\mathbf{w},\delta) &= (\mathbf{w}-\mathbf{w}_0)^T\boldsymbol{\Sigma}(\mathbf{w}-\mathbf{w}_0) - \kappa^2. \end{aligned} \quad (28)$$

Note that $g_i(\mathbf{w},\delta)$, $i = 1, 2, 5$, are all convex functions, while $g_i(\mathbf{w},\delta)$, $i = 3, 4$, are both nonconvex. We next explore several options to deal with problem (27), which contains nonconvex constraints.

The classical way of solving such a constrained problem is the interior-point (a.k.a. barrier) method (IPM), which adds the indicator functions of the inequality constraints to the objective and approximates them with logarithmic barrier functions [26]. The IPM can be employed to transform problem (11) into

$$\begin{aligned}\underset{\mathbf{w},\delta}{\text{minimize}} \quad & -t\delta - \sum_{i=1}^5\log\left(-g_i(\mathbf{w},\delta)\right) \\ \text{subject to} \quad & \mathbf{w}\in\mathcal{W}, \; \delta\ge 0,\end{aligned} \quad (29)$$

where $t > 0$ is a parameter that sets the accuracy of the barrier approximation. We could then solve problem (29) via a general gradient descent method or the SCA algorithm. However, due to the implicit constraints $g_i(\mathbf{w},\delta)\le 0$, a line search is compulsory at each iteration to guarantee a feasible update of $(\mathbf{w},\delta)$. As discussed before, the computational complexity of a single evaluation of $g_4(\mathbf{w},\delta)$ is $O(N^4)$, so the line search is too computationally expensive to be practical for this problem.

Another way to solve problem (27) is to construct a global upper bound approximation for all the nonconvex constraints and solve a sequence of inner convex approximating problems. Using the upper bound construction procedure in Section V-B, we can easily construct an inner convex approximating problem for problem (27) at $\mathbf{w}^k$ as:

$$\begin{aligned}\underset{\mathbf{w},\delta}{\text{minimize}} \quad & -\delta + \frac{\tau_\delta}{2}(\delta-\delta^k)^2 + \frac{\tau_{\mathbf{w}}}{2}\|\mathbf{w}-\mathbf{w}^k\|_2^2 \\ \text{subject to} \quad & g_i(\mathbf{w},\delta)\le 0, \; i = 1, 2, 5, \\ & \check{g}_j(\mathbf{w},\delta;\mathbf{w}^k,\delta^k)\le 0, \; j = 3, 4, \\ & \mathbf{w}\in\mathcal{W}, \; \delta\ge 0,\end{aligned} \quad (30)$$

where $\check{g}_j(\mathbf{w},\delta;\mathbf{w}^k,\delta^k)$ is the global upper bound of $g_j(\mathbf{w},\delta)$ at $(\mathbf{w}^k,\delta^k)$, constructed as in Section V-B. Problem (30) is a convex quadratically constrained quadratic programming (QCQP) problem and can be solved via several solvers. However, as Figure 3 and the numerical experiments in Section IX-A suggest, such an upper bound is very loose and the convergence is slow. Instead, we propose constructing convex approximations (although not upper bounds) for the nonconvex constraints in the following.

A. Preliminary Approach: L-MVSKT Algorithm
The most classical choice, as mentioned in [19], is to approximate the objective function by a quadratic function while linearizing all constraints. We therefore propose the L-MVSKT algorithm by linearizing all the nonlinear constraints in problem (11), i.e., the surrogate problem is

$$\begin{aligned}\underset{\mathbf{w},\delta}{\text{minimize}} \quad & -\delta + \frac{\tau_\delta}{2}(\delta-\delta^k)^2 + \frac{\tau_{\mathbf{w}}}{2}\|\mathbf{w}-\mathbf{w}^k\|_2^2 \\ \text{subject to} \quad & g_1(\mathbf{w},\delta)\le 0, \\ & \bar{g}_j(\mathbf{w},\delta;\mathbf{w}^k,\delta^k)\le\eta(\mathbf{w}^k,\delta^k), \; j = 2, 3, 4, 5, \\ & \mathbf{w}\in\mathcal{W}, \; \delta\ge 0,\end{aligned} \quad (31)$$

where $\bar{g}_j(\mathbf{w},\delta;\mathbf{w}^k,\delta^k)$ is the linear approximation of $g_j(\mathbf{w},\delta)$ at $(\mathbf{w}^k,\delta^k)$:

$$\bar{g}_j(\mathbf{w},\delta;\mathbf{w}^k,\delta^k) = g_j(\mathbf{w}^k,\delta^k) + \nabla_{\mathbf{w}}g_j(\mathbf{w}^k,\delta^k)^T(\mathbf{w}-\mathbf{w}^k) + \nabla_\delta g_j(\mathbf{w}^k,\delta^k)(\delta-\delta^k), \; j = 2, 3, 4, 5. \quad (32)$$

Besides, $\eta(\mathbf{w}^k,\delta^k)$ here can be computed as

$$\eta(\mathbf{w}^k,\delta^k) \triangleq (1-\theta)\max_{j=2,3,4,5}\left[g_j(\mathbf{w}^k,\delta^k)\right]_+ + \theta\min_{\mathbf{w},\delta}\left\{\max_{j=2,3,4,5}\left[\bar{g}_j(\mathbf{w},\delta;\mathbf{w}^k,\delta^k)\right]_+ \,\middle|\, (\mathbf{w},\delta)\in\bar{\mathcal{W}}\right\}, \quad (33)$$

where $\bar{\mathcal{W}}$ is a convex set defined as

$$\bar{\mathcal{W}} = \left\{(\mathbf{w},\delta) \mid \mathbf{w}\in\mathcal{W}, \; g_1(\mathbf{w},\delta)\le 0, \; \delta\ge 0\right\}. \quad (34)$$

The second term in equation (33) is obtained as the optimal $t$ of the following problem:

$$\begin{aligned}\underset{\mathbf{w},\delta,t}{\text{minimize}} \quad & t \\ \text{subject to} \quad & \bar{g}_j(\mathbf{w},\delta;\mathbf{w}^k,\delta^k)\le t, \; j = 2, 3, 4, 5, \\ & (\mathbf{w},\delta)\in\bar{\mathcal{W}}, \; t\ge 0.\end{aligned} \quad (35)$$
Algorithm 4 L-MVSKT algorithm for problem (11).
  Initialize $\mathbf{w}^0\in\mathcal{W}$ and pick $\tau_\delta$, $\tau_{\mathbf{w}}$ and a sequence $\{\gamma^k\}$.
  for k = 0, 1, 2, ... do
    Calculate $\nabla\phi_3(\mathbf{w}^k)$ and $\nabla\phi_4(\mathbf{w}^k)$.
    Solve problem (35) and compute $\eta(\mathbf{w}^k,\delta^k)$ as in (33).
    Solve problem (31) to obtain $\hat{\mathbf{w}}^{k+1}$.
    $\mathbf{w}^{k+1} = \mathbf{w}^k + \gamma^k(\hat{\mathbf{w}}^{k+1}-\mathbf{w}^k)$.
    Terminate loop if converged.
  end for

Problem (31) is a convex QP problem and problem (35) is a linear programming (LP) problem; they can be solved very efficiently by a QP solver and an LP solver, respectively. The complete L-MVSKT algorithm is given in Algorithm 4.
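Each $\bar{g}_j$ in (32) is just a first-order Taylor expansion; a hypothetical R sketch, assuming the value and gradients at $(\mathbf{w}^k,\delta^k)$ have been precomputed:

```r
# Linear approximation (32) of g_j at (w_k, delta_k); g_k is the value,
# grad_w_k and grad_d_k the gradients w.r.t. w and delta at that point.
g_bar <- function(w, delta, w_k, delta_k, g_k, grad_w_k, grad_d_k) {
  g_k + sum(grad_w_k * (w - w_k)) + grad_d_k * (delta - delta_k)
}
```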
B. Q-MVSKT Algorithm
In the above section, we proposed the L-MVSKT algorithm for solving the MVSK tilting problem (11). However, it linearizes even the tractable convex quadratic constraints, and a simple linearization is rarely a good approximation of the nonconvex constraints. In Section V-C, we proposed a quadratic approximation for the third and fourth central moments, which shows great advantages in the numerical experiments of Section IX-A. Therefore, similarly to Section V-C, we can construct quadratic approximations for the nonconvex constraints in problem (11) while leaving the already convex constraints untouched, i.e.,

$$\begin{aligned}\underset{\mathbf{w},\delta}{\text{minimize}} \quad & -\delta + \frac{\tau_\delta}{2}(\delta-\delta^k)^2 + \frac{\tau_{\mathbf{w}}}{2}\|\mathbf{w}-\mathbf{w}^k\|_2^2 \\ \text{subject to} \quad & g_i(\mathbf{w},\delta)\le 0, \; i = 1, 2, 5, \\ & \tilde{g}_j(\mathbf{w},\delta;\mathbf{w}^k,\delta^k)\le\eta(\mathbf{w}^k,\delta^k), \; j = 3, 4, \\ & \mathbf{w}\in\mathcal{W}, \; \delta\ge 0,\end{aligned} \quad (36)$$

where $\tilde{g}_j(\mathbf{w},\delta;\mathbf{w}^k,\delta^k)$ is the quadratic approximating function of $g_j(\mathbf{w},\delta)$ at $(\mathbf{w}^k,\delta^k)$:

$$\begin{aligned} \tilde{g}_3(\mathbf{w},\delta;\mathbf{w}^k,\delta^k) &= \phi_3(\mathbf{w}_0) - \phi_3(\mathbf{w}^k) + d_3\delta - \nabla\phi_3(\mathbf{w}^k)^T(\mathbf{w}-\mathbf{w}^k) + \frac{1}{2}(\mathbf{w}-\mathbf{w}^k)^T\mathbf{H}^k_\Phi(\mathbf{w}-\mathbf{w}^k), \\ \tilde{g}_4(\mathbf{w},\delta;\mathbf{w}^k,\delta^k) &= \phi_4(\mathbf{w}^k) - \phi_4(\mathbf{w}_0) + d_4\delta + \nabla\phi_4(\mathbf{w}^k)^T(\mathbf{w}-\mathbf{w}^k) + \frac{1}{2}(\mathbf{w}-\mathbf{w}^k)^T\mathbf{H}^k_\Psi(\mathbf{w}-\mathbf{w}^k), \end{aligned} \quad (37)$$

with $\mathbf{H}^k_\Phi$ and $\mathbf{H}^k_\Psi$ being the PSD approximating matrices (cf. Lemma 5) for $-\nabla^2\phi_3(\mathbf{w}^k)$ and $\nabla^2\phi_4(\mathbf{w}^k)$, respectively. The term $\eta(\mathbf{w}^k,\delta^k)$ can be computed from

$$\eta(\mathbf{w}^k,\delta^k) \triangleq (1-\theta)\max_{j=3,4}\left[g_j(\mathbf{w}^k,\delta^k)\right]_+ + \theta\min_{\mathbf{w},\delta}\left\{\max_{j=3,4}\left[\tilde{g}_j(\mathbf{w},\delta;\mathbf{w}^k,\delta^k)\right]_+ \,\middle|\, (\mathbf{w},\delta)\in\tilde{\mathcal{W}}\right\}, \quad (38)$$
Algorithm 5 Q-MVSKT algorithm for problem (11).
  Initialize $\mathbf{w}^0\in\mathcal{W}$ and pick $\tau_\delta$, $\tau_{\mathbf{w}}$ and a sequence $\{\gamma^k\}$.
  for k = 0, 1, 2, ... do
    Calculate $\nabla\phi_3(\mathbf{w}^k)$, $\nabla\phi_4(\mathbf{w}^k)$, $\mathbf{H}^k_\Phi$, and $\mathbf{H}^k_\Psi$.
    Solve problem (40) and compute $\eta(\mathbf{w}^k,\delta^k)$ as in (38).
    Solve problem (36) to obtain $\hat{\mathbf{w}}^{k+1}$.
    $\mathbf{w}^{k+1} = \mathbf{w}^k + \gamma^k(\hat{\mathbf{w}}^{k+1}-\mathbf{w}^k)$.
    Terminate loop if converged.
  end for

where $\tilde{\mathcal{W}}$ is a convex set defined as

$$\tilde{\mathcal{W}} = \left\{(\mathbf{w},\delta) \mid \mathbf{w}\in\mathcal{W}, \; \delta\ge 0, \; g_i(\mathbf{w},\delta)\le 0, \; i = 1, 2, 5\right\}. \quad (39)$$

The second term in equation (38) is obtained as the optimal $t$ of the following problem:

$$\begin{aligned}\underset{\mathbf{w},\delta,t}{\text{minimize}} \quad & t \\ \text{subject to} \quad & \tilde{g}_3(\mathbf{w},\delta;\mathbf{w}^k,\delta^k)\le t, \\ & \tilde{g}_4(\mathbf{w},\delta;\mathbf{w}^k,\delta^k)\le t, \\ & (\mathbf{w},\delta)\in\tilde{\mathcal{W}}, \; t\ge 0.\end{aligned} \quad (40)$$

Problems (36) and (40) are both convex QCQP problems and can be efficiently solved by the corresponding solvers. We call this method the Q-MVSKT algorithm and give its complete description in Algorithm 5.

VII. COMPLEXITY AND CONVERGENCE ANALYSIS
A. Complexity Analysis
First of all, it should be noted that the memory complexity of solving the high-order portfolio optimization problem is $O(N^4)$, as the co-kurtosis matrix $\boldsymbol{\Psi}$ is of dimension $N\times N^3$. For example, when $N = 200$, storing a complete $\boldsymbol{\Psi}$ in double precision takes almost 12 GB of memory. Thus it is impractical to solve a very large-scale high-order portfolio optimization problem due to this memory restriction. All the algorithms investigated or proposed in this paper are iterative methods; we therefore discuss the computational complexity of constructing the surrogate problems at each iteration, while the computational complexity of solving them depends on the specific solvers.
1) On Solving The MVSK Portfolio Problem (10):
For Algorithms 1 and 2, the per-iteration computational cost of constructing the surrogate problems comes mainly from computing the gradients, which is $O(N^4)$. For Algorithm 3, it comes mainly from computing the gradient $\nabla f_{\text{ncvx}}(\mathbf{w}^k)$ and the Hessian $\nabla^2 f_{\text{ncvx}}(\mathbf{w}^k)$, which in principle cost $O(N^4)$ and $O(N^5)$, respectively. However, we can simplify the computation by first computing $\nabla^2 f_{\text{ncvx}}(\mathbf{w}^k) = -\lambda_3\nabla^2\phi_3(\mathbf{w}^k) + \lambda_4\nabla^2\phi_4(\mathbf{w}^k)$ block-wise as

$$\nabla^2\phi_3(\mathbf{w}) = 6\boldsymbol{\Phi}(\mathbf{I}\otimes\mathbf{w}) = 6\left[\boldsymbol{\Phi}^{(1)}\mathbf{w} \;\cdots\; \boldsymbol{\Phi}^{(N)}\mathbf{w}\right], \quad (41)$$

$$\nabla^2\phi_4(\mathbf{w}) = 12\boldsymbol{\Psi}(\mathbf{I}\otimes\mathbf{w}\otimes\mathbf{w}) = 12\left[\boldsymbol{\Psi}^{(1)}(\mathbf{w}\otimes\mathbf{w}) \;\cdots\; \boldsymbol{\Psi}^{(N)}(\mathbf{w}\otimes\mathbf{w})\right], \quad (42)$$

where $\boldsymbol{\Phi}^{(i)}$ is the $i$-th block matrix of dimension $N\times N$ in $\boldsymbol{\Phi}$ and $\boldsymbol{\Psi}^{(i)}$ is the $i$-th block matrix of dimension $N\times N^2$ in $\boldsymbol{\Psi}$. The computational complexity of computing $\nabla^2 f_{\text{ncvx}}(\mathbf{w}^k)$ is thereby reduced to $O(N^4)$. With the usage of Corollary 2, $\nabla f_{\text{ncvx}}(\mathbf{w}^k)$ can then be easily computed as

$$\nabla f_{\text{ncvx}}(\mathbf{w}^k) = -\frac{\lambda_3}{2}\nabla^2\phi_3(\mathbf{w}^k)\mathbf{w}^k + \frac{\lambda_4}{3}\nabla^2\phi_4(\mathbf{w}^k)\mathbf{w}^k. \quad (43)$$

The overall computational complexity of obtaining $\nabla f_{\text{ncvx}}(\mathbf{w}^k)$ and $\nabla^2 f_{\text{ncvx}}(\mathbf{w}^k)$ is thus still $O(N^4)$. Therefore, the per-iteration computational cost of constructing the surrogate problems in Algorithms 1, 2, and 3 is $O(N^4)$.
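The block-wise products in (41)-(42) avoid ever materializing $\mathbf{I}\otimes\mathbf{w}\otimes\mathbf{w}$; a sketch under the dense storage assumption follows (helper names hypothetical).

```r
# Hessians (41)-(42) computed block by block; Phi is N x N^2, Psi is N x N^3
hessian_phi3 <- function(Phi, w) {
  N <- length(w)
  6 * sapply(1:N, function(i) Phi[, ((i - 1) * N + 1):(i * N)] %*% w)
}
hessian_phi4 <- function(Psi, w) {
  N <- length(w); ww <- kronecker(w, w)
  12 * sapply(1:N, function(i) Psi[, ((i - 1) * N^2 + 1):(i * N^2)] %*% ww)
}
# Gradients then follow from Corollary 2 at negligible extra cost,
# e.g. grad_phi3 = 0.5 * hessian_phi3(Phi, w) %*% w
```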
2) On Solving The MVSK Tilting Portfolio Problem (11):
The per-iteration computational cost of constructing the surrogate problems in Algorithm 4 comes mainly from computing the gradients, while that in Algorithm 5 comes from computing both the gradients and the Hessians. Similarly to the analysis above, the latter can be simplified so that both algorithms admit an $O(N^4)$ per-iteration complexity for constructing the surrogate problems.

B. Convergence Analysis
The convergence properties of the proposed algorithms are given in the following.
Proposition 6.
Every limit point of the solution sequence $\{\mathbf{w}^k\}$ generated by Algorithm 2 is a stationary point of problem (10).

Proof: Note that: 1) $\check{f}(\mathbf{w},\mathbf{w}^k)$ is continuous in both $\mathbf{w}$ and $\mathbf{w}^k$; 2) $\check{f}(\mathbf{w},\mathbf{w}^k)$ is a global upper bound function of $f(\mathbf{w})$ and is tangent to it at $\mathbf{w}^k$. Thus, [23, Assumption 1] is satisfied, and the proof of Proposition 6 follows directly from [23, Theorem 1].

Proposition 7.
Suppose $\gamma^k\in(0,1]$, $\gamma^k\to 0$, and $\sum_k\gamma^k = +\infty$, and let $\{\mathbf{w}^k\}$ be the sequence generated by Algorithm 3. Then either Algorithm 3 converges in a finite number of iterations to a stationary point of (10), or every limit point of $\{\mathbf{w}^k\}$ (at least one such point exists) is a stationary point of (10).

Proof: Note that the surrogate problem in Algorithm 3 only approximates the objective of problem (10) with a quadratic function but leaves the constraints untouched, and that: 1) $\mathcal{W}$ is a compact and convex set; 2) $f(\mathbf{w})$ is continuously differentiable and coercive on $\mathcal{W}$; 3) $\nabla f(\mathbf{w})$ is Lipschitz continuous on $\mathcal{W}$ (provided by Lemma 4); 4) $\tilde{f}(\mathbf{w},\mathbf{w}^k)$ is strongly convex in $\mathbf{w}$; 5) $\nabla_{\mathbf{w}}\tilde{f}(\mathbf{w}^k,\mathbf{w}^k) = \nabla f(\mathbf{w}^k)$; 6) $\tilde{f}(\mathbf{w},\mathbf{w}^k)$ is continuous in both $\mathbf{w}$ and $\mathbf{w}^k$. Thus, [22, Assumptions A1-A4] are satisfied, and the proof of Proposition 7 follows directly from [22, Theorem 3].

Proposition 8.
Suppose $\gamma^k\in(0,1]$, $\gamma^k\to 0$, and $\sum_k\gamma^k = +\infty$, and let $\{\mathbf{w}^k\}$ be the sequence generated by Algorithm 4 or Algorithm 5. Then every limit point of $\{\mathbf{w}^k\}$ is a generalized stationary point of problem (27).

Proof: The only difference between Algorithm 4 and Algorithm 5 is that Algorithm 5 constructs quadratic approximations of the nonconvex constraints, while Algorithm 4 simply linearizes all the constraints. This does not affect the convergence analysis, as both are convex approximations of the constraints. Besides, it is easy to check that all the conditions in Assumption 1 are satisfied in both algorithms. The proof of Proposition 8 then follows directly from [19].

VIII. SOLVING OTHER HIGH-ORDER PORTFOLIO PROBLEMS
The algorithm framework proposed in this paper can be easily employed to solve other high-order portfolio problems.
A. MVSK Tilting Portfolio with General Deterioration Measures
As in [12], the MVSK tilting portfolio problem with a general distance constraint to the reference portfolio is given as follows:

$$\begin{aligned}\underset{\mathbf{w},\delta}{\text{minimize}} \quad & -\delta \\ \text{subject to} \quad & g_{\text{ref}}(\mathbf{w})\le\kappa, \\ & g_i(\mathbf{w},\delta)\le 0, \; i = 1,\dots,4, \\ & \mathbf{w}\in\mathcal{W}, \; \delta\ge 0,\end{aligned} \quad (44)$$

where $g_{\text{ref}}(\mathbf{w})$ is a measure of how distant the current portfolio is from the reference one and $\kappa$ determines the maximum distance. For example, $g_{\text{ref}}(\mathbf{w})$ may be chosen as the risk concentration [27]:

$$g_{\text{ref}}(\mathbf{w}) = \sum_{i=1}^N\left(\frac{w_i(\boldsymbol{\Sigma}\mathbf{w})_i}{\mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w}} - \frac{1}{N}\right)^2. \quad (45)$$

The regularized MVSK tilting portfolio problem is obtained by turning the general distance constraint of problem (44) into a regularization term in the objective:

$$\begin{aligned}\underset{\mathbf{w},\delta}{\text{minimize}} \quad & -\delta + \lambda g_{\text{ref}}(\mathbf{w}) \\ \text{subject to} \quad & g_i(\mathbf{w},\delta)\le 0, \; i = 1,\dots,4, \\ & \mathbf{w}\in\mathcal{W}, \; \delta\ge 0.\end{aligned} \quad (46)$$

Obviously, problems (44) and (46) are both solvable via the algorithm framework proposed in Section VI. The only difference is that here we also need to construct a convex approximating function for $g_{\text{ref}}(\mathbf{w})$ if it is nonconvex. The procedure is straightforward and hence omitted.
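For instance, the risk concentration (45) takes only a few lines to evaluate; the following sketch is a hypothetical helper matching the formula.

```r
# Risk concentration (45): deviation of normalized risk contributions
# w_i (Sigma w)_i / (w' Sigma w) from the uniform profile 1/N
risk_concentration <- function(w, Sigma) {
  rc <- w * as.vector(Sigma %*% w)       # risk contributions; sum(rc) = w'Sigma w
  sum((rc / sum(rc) - 1 / length(w))^2)
}
```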
B. General Minkowski Distance MVSK Portfolio

The general Minkowski distance MVSK portfolio [28] admits the formulation

$$\begin{aligned}\underset{\mathbf{w},\mathbf{d}}{\text{minimize}} \quad & z(\mathbf{d}) = \left(\sum_{k=1}^4\left|\frac{d_k}{z_k}\right|^p\right)^{1/p} \\ \text{subject to} \quad & y_i(\mathbf{w},\mathbf{d})\le 0, \; i = 1,\dots,4, \\ & \mathbf{w}\in\mathcal{W}, \; \mathbf{d}\ge\mathbf{0},\end{aligned} \quad (47)$$

where $z_k$ is the aspired level for the $k$-th moment and

$$\begin{aligned} y_1(\mathbf{w},\mathbf{d}) &= -\phi_1(\mathbf{w}) - d_1 + z_1, \\ y_2(\mathbf{w},\mathbf{d}) &= \phi_2(\mathbf{w}) - d_2 - z_2, \\ y_3(\mathbf{w},\mathbf{d}) &= -\phi_3(\mathbf{w}) - d_3 + z_3, \\ y_4(\mathbf{w},\mathbf{d}) &= \phi_4(\mathbf{w}) - d_4 - z_4. \end{aligned} \quad (48)$$

It is easy to write a sequence of convex approximating surrogate problems as

$$\begin{aligned}\underset{\mathbf{w},\mathbf{d}}{\text{minimize}} \quad & \nabla z(\mathbf{d}^k)^T(\mathbf{d}-\mathbf{d}^k) + \frac{\tau_{\mathbf{d}}}{2}\|\mathbf{d}-\mathbf{d}^k\|_2^2 + \frac{\tau_{\mathbf{w}}}{2}\|\mathbf{w}-\mathbf{w}^k\|_2^2 \\ \text{subject to} \quad & \tilde{y}_i(\mathbf{w},\mathbf{d};\mathbf{w}^k,\mathbf{d}^k)\le\eta(\mathbf{w}^k,\mathbf{d}^k), \; i = 1,\dots,4, \\ & \mathbf{w}\in\mathcal{W}, \; \mathbf{d}\ge\mathbf{0},\end{aligned}$$

where $\tilde{y}_i(\mathbf{w},\mathbf{d};\mathbf{w}^k,\mathbf{d}^k)$ is the convex approximation of $y_i(\mathbf{w},\mathbf{d})$ at $(\mathbf{w}^k,\mathbf{d}^k)$, which can be easily constructed following procedures similar to those in Section VI.

C. Polynomial Goal Programming MVSK Portfolio
The polynomial goal programming (PGP) model for the high-order portfolio [29], [30] is a variation of the general Minkowski distance MVSK portfolio that takes the investor's relative preferences into consideration. It is formulated as

$$\begin{aligned}\underset{\mathbf{w},\mathbf{d}}{\text{minimize}} \quad & z(\mathbf{d}) = \left|\frac{d_1}{z_1}\right|^{\lambda_1} + \left|\frac{d_2}{z_2}\right|^{\lambda_2} + \left|\frac{d_3}{z_3}\right|^{\lambda_3} + \left|\frac{d_4}{z_4}\right|^{\lambda_4} \\ \text{subject to} \quad & y_i(\mathbf{w},\mathbf{d})\le 0, \; i = 1,\dots,4, \\ & \mathbf{w}\in\mathcal{W}, \; \mathbf{d}\ge\mathbf{0}.\end{aligned} \quad (49)$$

This problem can still be easily handled via the same procedure used for the general Minkowski distance MVSK portfolio.

IX. NUMERICAL EXPERIMENTS
In this section, we perform numerical experiments on our proposed algorithms. The data is generated according to the following steps:
1) randomly select N stocks from a dataset of 500 stocks, each of them listed in the S&P 500 Index components;
2) randomly pick a window of continuous trading days from 2004-01-01 to 2018-12-31;
3) compute the four sample moments of the selected N stocks over the picked trading period.

The starting point is selected as $\mathbf{w}^0 = \frac{1}{N}\mathbf{1}$ for all methods. Without loss of generality, we simply set $L = 1$, fix $\theta\in(0,1]$, and choose the diminishing step size sequence as

$$\gamma^0 = 1, \quad \gamma^k = \gamma^{k-1}\left(1 - 10^{-2}\gamma^{k-1}\right). \quad (50)$$

The inner solvers for QP, LP, and QCQP problems are quadprog [31], lpSolveAPI [32], and ECOS [33], [34], respectively. The algorithm is regarded as converged when either of the following conditions is satisfied for a small tolerance $\epsilon$:

$$\left|x^{k+1}_i - x^k_i\right| \le \epsilon\left(\left|x^{k+1}_i\right| + \left|x^k_i\right|\right) \;\forall i, \quad \text{or} \quad \left|f(\mathbf{x}^{k+1}) - f(\mathbf{x}^k)\right| \le \epsilon\left(\left|f(\mathbf{x}^{k+1})\right| + \left|f(\mathbf{x}^k)\right|\right). \quad (51)$$

(We have released an R package highOrderPortfolios implementing our proposed algorithms at https://github.com/dppalomar/highOrderPortfolios.)
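For concreteness, the stopping rule (51) can be implemented as below; this is a hypothetical sketch in which tol plays the role of the tolerance $\epsilon$ (here assumed to be a small value such as 1e-6).

```r
# Stopping rule (51): relative change in iterates OR in objective values
has_converged <- function(x_new, x_old, f_new, f_old, tol = 1e-6) {
  all(abs(x_new - x_old) <= tol * (abs(x_new) + abs(x_old))) ||
    abs(f_new - f_old) <= tol * (abs(f_new) + abs(f_old))
}
```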
A. On the MVSK Portfolio Problem (10)

We first set $N = 100$ and solve problem (10) using the DC-based Algorithm 1, our proposed MM-based Algorithm 2, and the proposed Q-MVSK Algorithm 3. The weights of the four moments are decided according to the fourth-order expansion of the Constant Relative Risk Aversion (CRRA) utility function:

$$\lambda_1 = 1, \quad \lambda_2 = \frac{\xi}{2}, \quad \lambda_3 = \frac{\xi(\xi+1)}{6}, \quad \lambda_4 = \frac{\xi(\xi+1)(\xi+2)}{24}, \quad (52)$$

where $\xi\ge 0$ is the risk aversion parameter [9], fixed in our experiments. For comparison, we also solve the problem using the general optimization tool nloptr [35] with gradients passed. In Figure 4, we compare the convergence of these algorithms. Significantly, the Q-MVSK algorithm converges to the best result in very few iterations and is much more efficient than the solver nloptr. The DC-based and MM-based algorithms are both slower than the general solver nloptr, which implies that they use very loose upper bounds. The MM-based algorithm, though much faster than the DC-based algorithm, is far from comparable with the Q-MVSK algorithm.

In Figure 5, we compare the time consumption of the proposed Q-MVSK algorithm and nloptr while changing the problem dimension $N$. The DC-based and MM-based algorithms are not included, as they are too slow to be compared with the proposed Q-MVSK algorithm and nloptr. For a fair comparison, we force nloptr to run until it reaches the objective value obtained by the Q-MVSK algorithm. The results are averaged over independent realizations of randomly generated data. We can see that our proposed Q-MVSK algorithm is consistently more than one order of magnitude faster than nloptr.
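For reproducibility, the CRRA weights (52) can be generated by a tiny helper (hypothetical name); xi is the risk aversion parameter.

```r
# Moment weights (52) from the CRRA risk aversion parameter xi
crra_weights <- function(xi) {
  c(lam1 = 1,
    lam2 = xi / 2,
    lam3 = xi * (xi + 1) / 6,
    lam4 = xi * (xi + 1) * (xi + 2) / 24)
}
```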
B. On the MVSK Tilting Portfolio Problem (11)

Similarly to the above, we first set $N = 100$ and solve problem (11) via the proposed Algorithms 4 and 5, respectively. The reference portfolio is simply chosen as the equally weighted portfolio, i.e.,

$$\mathbf{w}_0 = \frac{1}{N}\mathbf{1}. \quad (53)$$

The tilting direction $\mathbf{d}$ is decided as $d_i = |\phi_i(\mathbf{w}_0)|$, and we choose $\kappa$ in (11) as $\kappa = c\times\sqrt{\phi_2(\mathbf{w}_0)}$ with $c\ge 0$. The general solver nloptr is also included for comparison. (We directly use the implementation from the authors of [12], available at https://github.com/cdries/mvskPortfolios.) We find that, although final convergence is guaranteed, the fast convergence of the proposed L-MVSKT algorithm relies heavily on a proper choice of $\tau_{\mathbf{w}}$ and $\tau_\delta$, while that of our proposed Q-MVSKT is much more robust. For example, in Figure 6, we set $\kappa$ to a small fraction of $\sqrt{\phi_2(\mathbf{w}_0)}$ and show the convergence of the proposed algorithms. The Q-MVSKT algorithm converges in a few iterations simply with $\tau_{\mathbf{w}} = \tau_\delta = 10^{-6}$. The L-MVSKT algorithm can also converge with comparable speed, but only after properly tuning $\tau_{\mathbf{w}}$ and $\tau_\delta$.
[Figure 4: The convergence of algorithms (objective value and optimality gap) on solving the MVSK problem (10) with N = 100.]
[Figure 5: Time usage of algorithms on solving problem (10).]
[Figure 6: Convergence of the proposed algorithms on solving problem (11) with N = 100 (L-MVSKT shown for several choices of τ_w and τ_δ; Q-MVSKT and nloptr for reference).]
[Figure 7: Comparison of the results with different κ (κ = c × √φ₂(w₀)).]

The sensitivity of the L-MVSKT algorithm arises because it approximates all constraints by linear functions, so the solutions of the approximating problems easily violate the original constraints. The Q-MVSKT algorithm, in contrast, preserves the convex constraints and approximates the nonconvex constraints by convex quadratic functions, which turns out to work very well. Besides, we notice that solving the QCQP problem is significantly slower than solving a QP problem of the same size, possibly because we are using the R interface to a more general second-order cone programming (SOCP) solver, i.e., ECOS [34]. In Figure 7, we show the final results of these algorithms when changing the maximum tracking error volatility constraint. It is clear that all algorithms give the same results, which are nondecreasing as κ increases.

In Figure 8, we compare the time consumption of the proposed Q-MVSKT algorithm, the L-MVSKT algorithm, and nloptr while changing the problem dimension N. The results are averaged over independent realizations of randomly generated data. The proposed Q-MVSKT algorithm consistently outperforms the L-MVSKT algorithm and is about one order of magnitude faster than nloptr.
[Figure 8: Time usage of algorithms on solving problem (11).]

X. CONCLUSION
In this paper, we have considered the high-order moments ofthe portfolio return for high-order portfolio optimization. Wehave proposed an efficient algorithm framework for solvingthe high-order portfolio optimization problems based on thesuccessive convex approximation framework. In particular,we have proposed efficient algorithms for solving the mean-variance-skewness-kurtosis portfolio optimization problem andthe mean-variance-skewness-kurtosis tilting portfolio opti-mization problem. Theoretically, all the proposed algorithmsenjoy global convergence to a stationary point. Extensivenumerical experiments show that our proposed framework ismuch more efficient than the existing method and the generalsolver. A
APPENDIX
A. Proof for Lemma 1
According to the Leibniz integral rule [36], we have
$$\nabla\phi_3(\mathbf{w}) = \frac{\partial\,\mathbb{E}\left[\mathbf{w}^T\tilde{\mathbf{r}}\tilde{\mathbf{r}}^T\mathbf{w}\,\tilde{\mathbf{r}}^T\mathbf{w}\right]}{\partial\mathbf{w}} = \mathbb{E}\left[\frac{\partial\left(\mathbf{w}^T\tilde{\mathbf{r}}\,\mathbf{w}^T\tilde{\mathbf{r}}\,\tilde{\mathbf{r}}^T\mathbf{w}\right)}{\partial\mathbf{w}}\right] = 3\,\mathbb{E}\left[\tilde{\mathbf{r}}\left(\tilde{\mathbf{r}}^T\otimes\tilde{\mathbf{r}}^T\right)(\mathbf{w}\otimes\mathbf{w})\right] = 3\,\mathbb{E}\left[\tilde{\mathbf{r}}\left(\tilde{\mathbf{r}}^T\otimes\tilde{\mathbf{r}}^T\right)\right](\mathbf{w}\otimes\mathbf{w}) = 3\boldsymbol{\Phi}(\mathbf{w}\otimes\mathbf{w}), \qquad (54)$$
$$\nabla\phi_4(\mathbf{w}) = \frac{\partial\,\mathbb{E}\left[\mathbf{w}^T\tilde{\mathbf{r}}\tilde{\mathbf{r}}^T\mathbf{w}\,\tilde{\mathbf{r}}^T\mathbf{w}\,\tilde{\mathbf{r}}^T\mathbf{w}\right]}{\partial\mathbf{w}} = \mathbb{E}\left[\frac{\partial\left(\mathbf{w}^T\tilde{\mathbf{r}}\,\tilde{\mathbf{r}}^T\mathbf{w}\,\tilde{\mathbf{r}}^T\mathbf{w}\,\tilde{\mathbf{r}}^T\mathbf{w}\right)}{\partial\mathbf{w}}\right] = 4\,\mathbb{E}\left[\tilde{\mathbf{r}}\left(\tilde{\mathbf{r}}^T\otimes\tilde{\mathbf{r}}^T\otimes\tilde{\mathbf{r}}^T\right)(\mathbf{w}\otimes\mathbf{w}\otimes\mathbf{w})\right] = 4\,\mathbb{E}\left[\tilde{\mathbf{r}}\left(\tilde{\mathbf{r}}^T\otimes\tilde{\mathbf{r}}^T\otimes\tilde{\mathbf{r}}^T\right)\right](\mathbf{w}\otimes\mathbf{w}\otimes\mathbf{w}) = 4\boldsymbol{\Psi}(\mathbf{w}\otimes\mathbf{w}\otimes\mathbf{w}), \qquad (55)$$
$$\nabla^2\phi_3(\mathbf{w}) = \frac{\partial^2\,\mathbb{E}\left[\mathbf{w}^T\tilde{\mathbf{r}}\tilde{\mathbf{r}}^T\mathbf{w}\,\tilde{\mathbf{r}}^T\mathbf{w}\right]}{\partial\mathbf{w}\,\partial\mathbf{w}^T} = \mathbb{E}\left[\frac{\partial^2\left(\mathbf{w}^T\tilde{\mathbf{r}}\,\mathbf{w}^T\tilde{\mathbf{r}}\,\tilde{\mathbf{r}}^T\mathbf{w}\right)}{\partial\mathbf{w}\,\partial\mathbf{w}^T}\right] = 6\,\mathbb{E}\left[\tilde{\mathbf{r}}\left(\tilde{\mathbf{r}}^T\otimes\tilde{\mathbf{r}}^T\right)(\mathbf{I}\otimes\mathbf{w})\right] = 6\boldsymbol{\Phi}(\mathbf{I}\otimes\mathbf{w}), \qquad (56)$$
$$\nabla^2\phi_4(\mathbf{w}) = \frac{\partial^2\,\mathbb{E}\left[\mathbf{w}^T\tilde{\mathbf{r}}\tilde{\mathbf{r}}^T\mathbf{w}\,\tilde{\mathbf{r}}^T\mathbf{w}\,\tilde{\mathbf{r}}^T\mathbf{w}\right]}{\partial\mathbf{w}\,\partial\mathbf{w}^T} = \mathbb{E}\left[\frac{\partial^2\left(\mathbf{w}^T\tilde{\mathbf{r}}\,\tilde{\mathbf{r}}^T\mathbf{w}\,\tilde{\mathbf{r}}^T\mathbf{w}\,\tilde{\mathbf{r}}^T\mathbf{w}\right)}{\partial\mathbf{w}\,\partial\mathbf{w}^T}\right] = 12\,\mathbb{E}\left[\tilde{\mathbf{r}}\left(\tilde{\mathbf{r}}^T\otimes\tilde{\mathbf{r}}^T\otimes\tilde{\mathbf{r}}^T\right)(\mathbf{I}\otimes\mathbf{w}\otimes\mathbf{w})\right] = 12\boldsymbol{\Psi}(\mathbf{I}\otimes\mathbf{w}\otimes\mathbf{w}). \qquad (57)$$
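As a quick numerical sanity check of (54) (a sketch using sample moments in place of expectations; all variable names here are ours, not from the paper's code), the closed form $3\boldsymbol{\Phi}(\mathbf{w}\otimes\mathbf{w})$ can be compared against a finite-difference gradient of the sample third moment:

```r
# Sanity check of (54) on sample moments: grad phi3(w) = 3 * Phi %*% (w (x) w),
# where Phi = E[ r~ (r~' (x) r~') ] is the N x N^2 co-skewness matrix.
set.seed(1)
T_obs <- 2000; N <- 4
R  <- matrix(rnorm(T_obs * N), T_obs, N)
Rc <- sweep(R, 2, colMeans(R))               # centered returns r~

Phi <- matrix(0, N, N^2)
for (t in 1:T_obs) {
  r   <- Rc[t, ]
  Phi <- Phi + r %*% t(kronecker(r, r))      # accumulate r (r' (x) r')
}
Phi <- Phi / T_obs

w <- rep(1 / N, N)
grad_closed <- 3 * Phi %*% kronecker(w, w)   # closed form from (54)

phi3    <- function(w) mean((Rc %*% w)^3)    # sample third central moment
eps     <- 1e-6
grad_fd <- sapply(1:N, function(i) {
  e <- rep(0, N); e[i] <- eps
  (phi3(w + e) - phi3(w - e)) / (2 * eps)
})
max(abs(grad_closed - grad_fd))              # agreement up to O(eps^2)
```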
B. Proof for Lemma 4
According to the Gershgorin circle theorem [37], we have
$$\rho\left(\nabla^2 f_{\mathrm{ncvx}}(\mathbf{w})\right) \leq \left\|\nabla^2 f_{\mathrm{ncvx}}(\mathbf{w})\right\|_\infty \leq \lambda_3\left\|\nabla^2\phi_3(\mathbf{w})\right\|_\infty + \lambda_4\left\|\nabla^2\phi_4(\mathbf{w})\right\|_\infty. \qquad (58)$$
Under the constraints in (9), which imply $\|\mathbf{w}\|_1 \leq L$, we can get
$$\left\|\nabla^2\phi_3(\mathbf{w})\right\|_\infty = 6\max_{1\leq i\leq N}\sum_{j=1}^N\Big|\sum_{k=1}^N \Phi_{ij}^{(k)} w_k\Big| \leq 6\max_{1\leq i\leq N}\sum_{j=1}^N\sum_{k=1}^N \big|\Phi_{ij}^{(k)}\big|\,|w_k| \leq 6L\max_{1\leq i\leq N}\sum_{j=1}^N\max_{1\leq k\leq N}\big|\Phi_{ij}^{(k)}\big|, \qquad (59)$$
$$\left\|\nabla^2\phi_4(\mathbf{w})\right\|_\infty = 12\max_{1\leq i\leq N}\sum_{j=1}^N\Big|\sum_{k=1}^N w_k\sum_{l=1}^N \Psi_{ij}^{(k,l)} w_l\Big| \leq 12\max_{1\leq i\leq N}\sum_{j=1}^N\sum_{k=1}^N |w_k|\sum_{l=1}^N \big|\Psi_{ij}^{(k,l)}\big|\,|w_l| \leq 12L^2\max_{1\leq i\leq N}\sum_{j=1}^N\max_{1\leq k,l\leq N}\big|\Psi_{ij}^{(k,l)}\big|. \qquad (60)$$
Therefore, we have
$$\rho\left(\nabla^2 f_{\mathrm{ncvx}}(\mathbf{w})\right) \leq 6\lambda_3 L\max_{1\leq i\leq N}\sum_{j=1}^N\max_{1\leq k\leq N}\big|\Phi_{ij}^{(k)}\big| + 12\lambda_4 L^2\max_{1\leq i\leq N}\sum_{j=1}^N\max_{1\leq k,l\leq N}\big|\Psi_{ij}^{(k,l)}\big|. \qquad (61)$$
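The first inequality in (58) is the standard infinity-norm bound on the spectral radius of a symmetric matrix; a minimal numerical illustration (our own test matrix, not from the paper):

```r
# Check of the bound used in (58): for a symmetric matrix H,
# rho(H) <= ||H||_inf = max_i sum_j |H_ij| (Gershgorin circle theorem [37]).
set.seed(3)
N <- 6
A <- matrix(rnorm(N * N), N, N)
H <- A + t(A)                                # symmetric test matrix
rho   <- max(abs(eigen(H, symmetric = TRUE)$values))
bound <- max(rowSums(abs(H)))                # infinity norm of H
stopifnot(rho <= bound + 1e-12)
c(spectral_radius = rho, gershgorin_bound = bound)
```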
REFERENCES

[1] H. Markowitz, "Portfolio selection," Journal of Finance, vol. 7, no. 1, pp. 77-91, 1952.
[2] P. N. Kolm, R. Tütüncü, and F. J. Fabozzi, "60 years of portfolio optimization: Practical challenges and current trends," European Journal of Operational Research, vol. 234, no. 2, pp. 356-371, 2014.
[3] C. Adcock, M. Eling, and N. Loperfido, "Skewed distributions in finance and actuarial science: a review," The European Journal of Finance, vol. 21, no. 13-14, pp. 1253-1281, 2015.
[4] S. I. Resnick, Heavy-Tail Phenomena: Probabilistic and Statistical Modeling. Springer Science & Business Media, 2007.
[5] C. R. Harvey and A. Siddique, "Conditional skewness in asset pricing tests," The Journal of Finance, vol. 55, no. 3, pp. 1263-1295, 2000.
[6] N. J. Jobst and S. A. Zenios, "The tail that wags the dog: Integrating credit risk in asset portfolios," Journal of Risk Finance, pp. 31-43, 2001.
[7] A. Ang, J. Chen, and Y. Xing, "Downside risk," The Review of Financial Studies, vol. 19, no. 4, pp. 1191-1239, 2006.
[8] T. P. Dinh and Y.-S. Niu, "An efficient DC programming approach for portfolio decision with higher moments," Computational Optimization and Applications, vol. 50, no. 3, pp. 525-554, 2011.
[9] K. Boudt, W. Lu, and B. Peeters, "Higher order comoments of multifactor models and asset allocation," Finance Research Letters, vol. 13, pp. 225-233, 2015.
[10] S. Kshatriya and P. K. Prasanna, "Genetic algorithm-based portfolio optimization with higher moments in global stock markets," Journal of Risk, vol. 20, no. 4, 2018.
[11] W. H. Jean, "The extension of portfolio analysis to three or more parameters," Journal of Financial and Quantitative Analysis, vol. 6, no. 1, pp. 505-515, 1971.
[12] K. Boudt, D. Cornilly, F. V. Holle, and J. Willems, "Algorithmic portfolio tilting to harvest higher moment gains," Heliyon, vol. 6, no. 3, p. e03516, 2020.
[13] K. G. Murty and S. N. Kabadi, "Some NP-complete problems in quadratic and nonlinear programming," Tech. Rep., 1985.
[14] D. Maringer and P. Parpas, "Global optimization of higher order moments in portfolio selection," Journal of Global Optimization, vol. 43, no. 2-3, pp. 219-230, 2009.
[15] C. Blum and A. Roli, "Metaheuristics in combinatorial optimization: Overview and conceptual comparison," ACM Computing Surveys (CSUR), vol. 35, no. 3, pp. 268-308, 2003.
[16] A. Savine, Modern Computational Finance: AAD and Parallel Simulations. John Wiley & Sons, 2018.
[17] Y. Sun, P. Babu, and D. P. Palomar, "Majorization-minimization algorithms in signal processing, communications, and machine learning," IEEE Transactions on Signal Processing, vol. 65, no. 3, pp. 794-816, 2016.
[18] G. Scutari and Y. Sun, "Parallel and distributed successive convex approximation methods for big-data optimization," in Multi-agent Optimization: Cetraro, Italy 2014, F. Facchinei and J.-S. Pang, Eds. Springer, 2018, ch. 3, pp. 141-308.
[19] F. Facchinei, V. Kungurtsev, L. Lampariello, and G. Scutari, "Ghost penalties in nonconvex constrained optimization: Diminishing stepsizes and iteration complexity," Mathematics of Operations Research, to appear, 2020.
[20] Z. Zhao, R. Zhou, and D. P. Palomar, "Optimal mean-reverting portfolio with leverage constraint for statistical arbitrage in finance," IEEE Transactions on Signal Processing, vol. 67, no. 7, pp. 1681-1695, 2019.
[21] G. Scutari, F. Facchinei, and L. Lampariello, "Parallel and distributed methods for constrained nonconvex optimization—Part I: Theory," IEEE Transactions on Signal Processing, vol. 65, no. 8, pp. 1929-1944, 2016.
[22] G. Scutari, F. Facchinei, P. Song, D. P. Palomar, and J.-S. Pang, "Decomposition by partial linearization: Parallel optimization of multi-agent systems," IEEE Transactions on Signal Processing, vol. 62, no. 3, pp. 641-656, 2013.
[23] M. Razaviyayn, M. Hong, and Z.-Q. Luo, "A unified convergence analysis of block successive minimization methods for nonsmooth optimization," SIAM Journal on Optimization, vol. 23, no. 2, pp. 1126-1153, 2013.
[24] D. R. Hunter and K. Lange, "A tutorial on MM algorithms," The American Statistician, vol. 58, no. 1, pp. 30-37, 2004.
[25] N. J. Higham, "Computing a nearest symmetric positive semidefinite matrix," Linear Algebra and its Applications, vol. 103, pp. 103-118, 1988.
[26] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
[27] Y. Feng and D. P. Palomar, "SCRIP: Successive convex optimization methods for risk parity portfolio design," IEEE Transactions on Signal Processing, vol. 63, no. 19, pp. 5285-5300, 2015.
[28] P. Nijkamp and J. Spronk, "Interactive multiple goal programming: an evaluation and some results," in Multiple Criteria Decision Making Theory and Application. Springer, 1980, pp. 278-293.
[29] K. K. Lai, L. Yu, and S. Wang, "Mean-variance-skewness-kurtosis-based portfolio optimization," in First International Multi-Symposiums on Computer and Computational Sciences (IMSCCS'06), vol. 2. IEEE, 2006, pp. 292-297.
[30] M. Aksaraylı and O. Pala, "A polynomial goal programming model for portfolio optimization based on entropy and higher moments," Expert Systems with Applications, vol. 94, pp. 185-192, 2018.
[31] B. A. Turlach and A. Weingessel, quadprog: Functions to Solve Quadratic Programming Problems, 2019, R package version 1.5-7. [Online]. Available: https://CRAN.R-project.org/package=quadprog
[32] K. Konis and F. Schwendinger, lpSolveAPI: R Interface to 'lp_solve' Version 5.5.2.0, 2020, R package version 5.5.2.0-17.6. [Online]. Available: https://CRAN.R-project.org/package=lpSolveAPI
[33] A. Domahidi, E. Chu, and S. Boyd, "ECOS: An SOCP solver for embedded systems," in 2013 European Control Conference (ECC). IEEE, 2013, pp. 3071-3076.
[34] A. Fu and B. Narasimhan, ECOSolveR: Embedded Conic Solver in R, 2019, R package version 0.5.3. [Online]. Available: https://CRAN.R-project.org/package=ECOSolveR
[35] J. Ypma and S. G. Johnson, Introduction to nloptr: an R interface to NLopt, 2020, R package version 1.2.2.1. [Online]. Available: https://CRAN.R-project.org/package=nloptr
[36] M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. US Government Printing Office, 1948, vol. 55.
[37] R. S. Varga, Geršgorin and His Circles. Springer, 2004.