Bias-variance trade-off in portfolio optimization under Expected Shortfall with ℓ2 regularization

Gábor Papp, Fabio Caccioli, and Imre Kondor

Abstract
The optimization of a large random portfolio under the Expected Shortfall risk measure with an ℓ2 regularizer is carried out by analytical calculation. The regularizer reins in the large sample fluctuations and the concomitant divergent estimation error, and eliminates the phase transition where this error would otherwise blow up. In the data-dominated region, where the number N of different assets in the portfolio is much less than the length T of the available time series, the regularizer plays a negligible role even if its strength η is large, while in the opposite limit, where the sample size is comparable to, or even smaller than, the number of assets, the optimum is almost entirely determined by the regularizer. We construct the contour map of the estimation error on the N/T vs. η plane and find that, for a given value of the estimation error, the gain in N/T due to the regularizer can reach a factor of about 4 for a sufficiently strong regularizer.
1 Introduction

The current international market risk regulation measures risk in terms of Expected Shortfall (ES) [1]. In order to decrease their capital requirements, financial institutions have to optimize the ES of their trading book.

Optimizing a large portfolio may be difficult under any risk measure, but becomes particularly hard in the case of Value at Risk (VaR) and Expected Shortfall (ES), which discard a large part of the data, keeping only those at or beyond a high quantile. Without some kind of regularization, this leads to a phase transition at a critical value r_c of the ratio r = N/T, where N is the dimension of the portfolio (the number of different assets or risk factors) and T is the sample size (the length of the available time series). This critical ratio depends on the confidence level α that determines the VaR threshold above which the losses are to be averaged to obtain the ES. Beyond r_c it is impossible to carry out the optimization, and upon approaching this critical value from below the estimation error increases without bound.

The estimation error problem of portfolio selection has been the subject of a large number of works; [2–12] are but a small selection from this vast literature. The critical behavior and the locus of the phase boundary separating the region where the optimization is feasible from the one where it is not have also been studied in a series of papers [13–17].

In the present note we discuss the effect of adding an ℓ2 regularizer to the ES risk measure. As noted in [18] and [19], the optimization problem so obtained is equivalent to one of the variants of support vector regression (ν-SVR) [20], therefore its study is of interest also for machine learning, independently of the portfolio optimization context. Concerning its specific financial application, ℓ2 regularization may have two different interpretations. First, ℓ2 has the tendency to pull the solution towards the diagonal, where all the weights are the same. As such, ℓ2 represents a diversification pressure [21, 22] that may be useful in a situation where, e.g., the market is dominated by a small number of blue chips. Alternatively, the portfolio manager may wish to take into account the market impact of the future liquidation of the portfolio already at its construction. As shown in [17], market impact considerations lead one to regularized portfolio optimization, with ℓ2 corresponding to linear impact.

In this paper we carry out the optimization of ℓ2-regularized ES analytically in the special case of i.i.d. Gaussian distributed returns, in the limit where both the dimension and the sample size are large, but their ratio r = N/T is fixed. The calculation is performed by the method of replicas borrowed from the statistical physics of disordered systems [23]. The present work extends a previous paper [24] by incorporating the regularizer. By preventing the phase transition from taking place, the regularizer fundamentally alters the overall picture (in this respect, the role of the regularizer is analogous to that of an external field coupled to the order parameter in a phase transition). As the technical details of the replica calculation have been laid out in [13] and, in a somewhat different form, in [17], we do not repeat them here.
Instead, we just recall the setup of the problem and focus on the most important result: the relative estimation error (closely related to the out-of-sample estimator of ES) as a function of r = N/T and the strength η of the regularizer.

Our results exhibit a clear distinction between the region in the space of parameters where data dominate and the regularizer plays a minor role, and the one where the data are insufficient and the regularizer stabilizes the estimate at the price of essentially suppressing the data. Thereby, our results provide a clean and explicit example of what statisticians call the bias-variance trade-off that lies at the heart of the regularization procedure. We find that the transition between the data-dominated regime and the bias-dominated one is rather sharp, and it is only around this transition that an actual trade-off takes place. Following the curves of fixed estimation error on the r–η plane, we can see that r increases with increasing η by a factor of approximately 4. Beyond this point the contour lines turn back, and we go over onto a branch of the contour line where the optimization is determined by the regularizer rather than the data.

The plan of the rest of the paper is as follows. In Sec. 2 we present the formalism, in Sec. 3 we display our results, and in Sec. 4 we draw our conclusions.

2 The formalism

The simple portfolios we consider here are linear combinations of N risk factors, with returns x_i and weights w_i, i = 1, 2, ..., N:

X = \sum_{i=1}^{N} w_i x_i.   (1)

The weights will be normalized such that their sum is N, instead of the customary 1:

\sum_{i=1}^{N} w_i = N.   (2)

The motivation for choosing this normalization is that we wish to have weights of order unity, rather than 1/N, in the limit N → ∞. Apart from the budget constraint, the weights will not be subject to any other condition. In particular, they can take any real value; that is, we are allowing unlimited short positions. We do not impose the usual constraint on the expected return on the portfolio either, so we are looking for the global minimum risk portfolio. This setup is motivated by simplicity, but we note that tracking a benchmark requires exactly this kind of optimization to be performed.

The probability for the loss ℓ({w_i}, {x_i}) = −X to be smaller than a threshold ℓ is

P(\{w_i\}, \ell) = \int \prod_{i=1}^{N} dx_i\, p(\{x_i\})\, \theta(\ell - \ell(\{w_i\}, \{x_i\})),

where p({x_i}) is the probability density of the returns, and θ(x) is the Heaviside function: θ(x) = 1 for x > 0, and zero otherwise.
The Value at Risk (VaR) at confidence level α is then defined as

\mathrm{VaR}_\alpha(\{w_i\}) = \min\{\ell : P(\{w_i\}, \ell) \ge \alpha\}.   (3)

Expected Shortfall is the average loss beyond the VaR quantile:

\mathrm{ES}(\{w_i\}) = \frac{1}{1-\alpha} \int \prod_i dx_i\, p(\{x_i\})\, \ell(\{w_i\},\{x_i\})\, \theta\big(\ell(\{w_i\},\{x_i\}) - \mathrm{VaR}_\alpha(\{w_i\})\big).   (4)

Portfolio optimization seeks to find the optimal weights that make the above ES minimal subject to the budget constraint (2). Instead of dealing directly with ES, Rockafellar and Uryasev [25] proposed to minimize the related function

F_\alpha(\{w_i\}, \epsilon) = \epsilon + \frac{1}{1-\alpha} \int \prod_i dx_i\, p(\{x_i\})\, [\ell(\{w_i\},\{x_i\}) - \epsilon]^+   (5)

over the variable ε and the weights w_i:

\mathrm{ES}(\{w_i\}) = \min_\epsilon F_\alpha(\{w_i\}, \epsilon),   (6)

where [x]^+ = (x + |x|)/2. For a given sample {x_it} of size T, the minimization of the sample average of (5) amounts to the linear program of minimizing the cost function

E(\epsilon, \{u_t\}) = (1-\alpha) T \epsilon + \sum_{t=1}^{T} u_t   (7)

under the constraints

u_t \ge 0 \quad \forall t, \qquad u_t + \epsilon + \sum_{i=1}^{N} x_{it} w_i \ge 0 \quad \forall t,   (8)

and

\sum_i w_i = N.
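As a concrete illustration, the linear program (7)-(8) can be handed directly to an off-the-shelf convex solver. The following minimal sketch assumes the cvxpy package and uses hypothetical i.i.d. standard normal returns; all parameter values are illustrative and not taken from the paper:

```python
# Sample-based ES minimization, eqs. (7)-(8): a minimal sketch on
# synthetic i.i.d. Gaussian returns (illustrative parameters only).
import numpy as np
import cvxpy as cp

N, T, alpha = 10, 200, 0.975            # r = N/T = 0.05, deep in the feasible region
rng = np.random.default_rng(0)
x = rng.standard_normal((T, N))          # returns x_it, shape (T, N)

w = cp.Variable(N)                       # portfolio weights (short selling allowed)
u = cp.Variable(T, nonneg=True)          # auxiliary variables u_t >= 0
eps = cp.Variable()                      # the variable epsilon of eq. (5)

cost = (1 - alpha) * T * eps + cp.sum(u)          # eq. (7)
constraints = [x @ w + eps + u >= 0,              # eq. (8)
               cp.sum(w) == N]                    # budget constraint, eq. (2)
prob = cp.Problem(cp.Minimize(cost), constraints)
prob.solve()

# dividing the optimal cost by (1 - alpha) T recovers the in-sample ES
print("in-sample ES:", prob.value / ((1 - alpha) * T))
print("in-sample VaR (optimal eps):", eps.value)
```

Adding the term η Σ_i w_i² to the cost turns the same program into the regularized problem (9)-(11) introduced next.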
It is convenient to introduce the regularizer at this stage, by adding the ℓ2 norm of the weight vector to the cost function [17]:

\min_{\vec w, \vec u, \epsilon} \left[ (1-\alpha) T \epsilon + \sum_{t=1}^{T} u_t + \eta \sum_i w_i^2 \right],   (9)

s.t.

\vec w \cdot \vec x_t + \epsilon + u_t \ge 0, \quad u_t \ge 0 \quad \forall t,   (10)

\sum_i w_i = N,   (11)

where ε and \vec u are auxiliary variables, and the coefficient η sets the strength of the regularization. As the constraint on the expected return has been omitted, we are seeking the global optimum of the portfolio here.

If the returns x_it are i.i.d. Gaussian variables and N, T → ∞ with r = N/T fixed, the method of replicas allows one to reduce the above optimization task in N + T + 1 variables to the optimization of a cost function depending on only six variables, the so-called order parameters [13, 17]:

F(\lambda, \epsilon, q_0, \Delta, \hat q_0, \hat\Delta) = \lambda + \frac{1-\alpha}{r}\,\epsilon - \Delta \hat q_0 - \hat\Delta q_0 + \langle \min_w [V(w,z)] \rangle_z + \frac{\Delta}{2r\sqrt{\pi}} \int_{-\infty}^{\infty} ds\, e^{-s^2}\, g\!\left(\frac{\epsilon + s\sqrt{2q_0}}{\Delta}\right),   (12)

where

V(w,z) = \hat\Delta w^2 - \lambda w - z w \sqrt{-2\hat q_0} + \eta w^2,   (13)

\langle \cdot \rangle_z is an average over the standard normal variable z, and

g(x) = \begin{cases} 0, & x \ge 0, \\ x^2, & -1 \le x \le 0, \\ -2x - 1, & x < -1. \end{cases}   (14)

The stationarity conditions of F with respect to the six order parameters read:

1 = \langle w^* \rangle_z,   (15)

(1-\alpha) + \frac{1}{2\sqrt{\pi}} \int_{-\infty}^{\infty} ds\, e^{-s^2}\, g'\!\left(\frac{\epsilon + s\sqrt{2q_0}}{\Delta}\right) = 0,   (16)

\hat\Delta - \frac{1}{2r\sqrt{2\pi q_0}} \int_{-\infty}^{\infty} ds\, e^{-s^2}\, s\, g'\!\left(\frac{\epsilon + s\sqrt{2q_0}}{\Delta}\right) = 0,   (17)

-\hat q_0 - \frac{2 q_0 \hat\Delta}{\Delta} + \frac{1}{2r\sqrt{\pi}} \int_{-\infty}^{\infty} ds\, e^{-s^2}\, g\!\left(\frac{\epsilon + s\sqrt{2q_0}}{\Delta}\right) + \frac{1-\alpha}{r}\,\frac{\epsilon}{\Delta} = 0,   (18)

\Delta = \frac{1}{\sqrt{-2\hat q_0}} \langle w^* z \rangle_z,   (19)

q_0 = \langle w^{*2} \rangle_z,   (20)

where the variable w^* is the value of the weight that minimizes the "potential" V in (13). (The prime means derivative with respect to the argument.) Since V is quadratic in w, one has explicitly w^* = (\lambda + z\sqrt{-2\hat q_0}) / (2(\hat\Delta + \eta)), which is where the regularizer enters the saddle-point structure.

Three of the order parameters are easily eliminated, and the integrals can be reduced to the error function and its integrals by repeated integration by parts, as in [24]. Finally, one ends up with three equations to be solved:

r\,(1 - 2\eta\Delta) = \Phi\!\left(\frac{\Delta + \epsilon}{\sqrt{q_0}}\right) - \Phi\!\left(\frac{\epsilon}{\sqrt{q_0}}\right),   (21)

\alpha = \frac{\sqrt{q_0}}{\Delta} \left\{ \Psi\!\left(\frac{\Delta + \epsilon}{\sqrt{q_0}}\right) - \Psi\!\left(\frac{\epsilon}{\sqrt{q_0}}\right) \right\},   (22)

\frac{q_0 + 1}{2\Delta^2} + \frac{\alpha}{r}\,\frac{\epsilon}{\Delta} + \frac{1}{2r} - \frac{2\eta q_0}{\Delta} = \frac{q_0}{r\Delta^2} \left\{ W\!\left(\frac{\Delta + \epsilon}{\sqrt{q_0}}\right) - W\!\left(\frac{\epsilon}{\sqrt{q_0}}\right) \right\},   (23)

where

\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} dt\, e^{-t^2/2},   (24)

\Psi(x) = x\,\Phi(x) + \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2},   (25)

W(x) = \frac{x^2 + 1}{2}\,\Phi(x) + \frac{x}{2\sqrt{2\pi}}\, e^{-x^2/2}.   (26)

These functions are closely related to each other: Φ(x) is the derivative of Ψ(x), and Ψ(x) is the derivative of W(x).
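Equations (21)-(23) are transcendental and are solved numerically for the figures that follow. The sketch below, assuming SciPy and the equations exactly as reconstructed above, solves for (q₀, Δ, ε) at given (r, α, η) with a generic root finder; the initial guess is illustrative and may need tuning, especially close to the phase boundary. (The financial meaning of the quantity √q₀ returned here is given in eq. (27) below.)

```python
# Numerical solution of the saddle-point equations (21)-(23),
# with Phi, Psi, W as in eqs. (24)-(26); Phi is the standard normal CDF.
import numpy as np
from scipy.optimize import fsolve
from scipy.stats import norm

def Psi(x):                               # eq. (25); Psi' = Phi
    return x * norm.cdf(x) + norm.pdf(x)

def W(x):                                 # eq. (26); W' = Psi
    return 0.5 * (x**2 + 1) * norm.cdf(x) + 0.5 * x * norm.pdf(x)

def saddle_eqs(p, r, alpha, eta):
    q0, Delta, eps = p
    a = (Delta + eps) / np.sqrt(q0)
    b = eps / np.sqrt(q0)
    e21 = r * (1 - 2 * eta * Delta) - (norm.cdf(a) - norm.cdf(b))
    e22 = alpha - np.sqrt(q0) / Delta * (Psi(a) - Psi(b))
    e23 = ((q0 + 1) / (2 * Delta**2) + alpha * eps / (r * Delta) + 1 / (2 * r)
           - 2 * eta * q0 / Delta - q0 / (r * Delta**2) * (W(a) - W(b)))
    return [e21, e22, e23]

def sqrt_q0(r, alpha, eta, guess=(1.5, 0.5, 2.0)):
    """sqrt(q0), the out-of-sample error measure of eq. (27) below."""
    q0, Delta, eps = fsolve(saddle_eqs, guess, args=(r, alpha, eta))
    return np.sqrt(q0)

print(sqrt_q0(r=0.1, alpha=0.975, eta=0.0))   # unregularized
print(sqrt_q0(r=0.1, alpha=0.975, eta=0.1))   # regularized
```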
As explained in [24], each of the three remaining order parameters in the above set of equations, q_0, Δ, and ε, has a direct financial meaning: Δ is related to the in-sample estimator of ES (and also to the second derivative of the cost function F with respect to the Lagrange multiplier λ associated with the budget constraint), and ε is the in-sample VaR of the portfolio optimized under the ES risk measure. Our present concern is the order parameter q_0, which is a measure of the out-of-sample estimator of ES. As shown in [24], if ES_out is the out-of-sample estimate of ES based on samples of size T, and ES^(0) is its true value (the one that would obtain for N finite and T → ∞), then

\frac{\mathrm{ES}_{\rm out}}{\mathrm{ES}^{(0)}} = \sqrt{q_0},   (27)

that is, \sqrt{q_0} - 1 is the relative estimation error, while the ES itself is related to the minimal cost by

\mathrm{ES} = \frac{r F}{1-\alpha}.   (28)

The fundamental cause of the divergence of the estimation error in the original, non-regularized problem is that ES as a risk measure is not bounded from below. In finite samples it can happen that one of the assets, or a combination of assets, dominates the others (i.e., produces a larger return than the others in the given sample), thereby leading to an apparent arbitrage: one can achieve an arbitrarily large gain (an arbitrarily large negative ES) by going very long in the dominant asset and correspondingly short in the others [13, 26]. It is evident that this apparent arbitrage is a mere statistical fluctuation, but along a special curve in the r–α plane this divergence occurs with probability one [13]. As a result, the estimation error diverges along the phase boundary shown in Fig. 1. Note that in high-dimensional statistics, where regularization is routinely applied, the loss is always bounded both from above and below. The setting in the present paper is, therefore, very different from the customary setup, which explains the unusual results.

The purpose of regularization is to penalize the large fluctuations of the weight vector, thereby eliminating this phase transition. Since ES is a piecewise linear function of the weights, the quadratic regularizer will overcome excessive fluctuations, no matter how small the coefficient η is. Deep inside the region of stability (shown in pale yellow in Fig. 1), a weak regularizer (small η) will modify the behavior of various quantities very little.
Figure 1: The phase boundary of unregularized ES for i.i.d. normal underlying returns. In the region below the phase boundary the optimization of ES is feasible and the estimation error is finite. Approaching the phase boundary from below, the estimation error diverges, and above the line optimization is no longer feasible.

In contrast, close to the phase boundary, and especially in the vicinity of the α = 1, r = 0.5 singularity, the effect of even a small η is very strong, and beyond the yellow region, where originally there was no solution, the regularizer will dominate the scene. In the region where the solution is stable even without the regularizer, r = N/T is small, which means we have an abundance of data. We call this region the data-dominated region. In the presence of the regularizer we find finite solutions also far beyond the phase boundary, but there the regularizer dominates the data, so we can call this domain the bias-dominated region.
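This mechanism is easy to exhibit on synthetic data. The sketch below (again assuming cvxpy, with purely illustrative parameters) solves the sample problem (9)-(11) at a few aspect ratios, with and without regularization; as r grows towards the critical region, the unregularized long-short positions blow up, or the problem may even be reported as unbounded for the given sample, while a small η keeps the weights finite:

```python
# Apparent arbitrage in finite samples: unregularized vs. l2-regularized
# ES optimization on synthetic i.i.d. Gaussian returns (illustrative only).
import numpy as np
import cvxpy as cp

def es_weights(x, alpha, eta):
    T, N = x.shape
    w = cp.Variable(N)
    u = cp.Variable(T, nonneg=True)
    eps = cp.Variable()
    cost = (1 - alpha) * T * eps + cp.sum(u) + eta * cp.sum_squares(w)
    prob = cp.Problem(cp.Minimize(cost),
                      [x @ w + eps + u >= 0, cp.sum(w) == N])
    prob.solve()
    return w.value                        # None if the problem was unbounded

rng = np.random.default_rng(1)
N, alpha = 50, 0.975
for T in (1000, 200, 110):                # r = 0.05, 0.25, 0.45
    x = rng.standard_normal((T, N))
    for eta in (0.0, 0.01):
        w = es_weights(x, alpha, eta)
        size = float("nan") if w is None else np.abs(w).max()
        print(f"r = {N / T:.2f}, eta = {eta}: max |w_i| = {size:.1f}")
```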
3 Results

The solution of the stationarity conditions can be obtained with the help of a computer. In the following, we present the numerical solutions for the relative estimation error. The results will be displayed by constructing the contour map of this quantity, which allows us to make a direct comparison between our present results and those in [24]; a sketch of how such a contour can be traced numerically is given below.
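For instance, a single contour of fixed √q₀ can be traced by root-finding in r at each α. The fragment below reuses the sqrt_q0 solver from the sketch in the previous section (assumed to be in scope); the bracket endpoints and the target level are illustrative and may need adjustment near the phase boundary:

```python
# Tracing a contour of fixed sqrt(q0): for each alpha, find the r at which
# the relative out-of-sample error reaches a prescribed level.
from scipy.optimize import brentq

def contour_r(alpha, eta, target, r_lo=1e-3, r_hi=0.2):
    return brentq(lambda r: sqrt_q0(r, alpha, eta) - target, r_lo, r_hi)

for alpha in (0.9, 0.95, 0.975):
    r_star = contour_r(alpha, eta=0.0, target=1.1)   # 10% relative error
    print(f"alpha = {alpha}: r on the 10% contour = {r_star:.4f}")
```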
Figure 2: Contour lines of fixed √q₀ in the absence of regularization. These curves are also the contour lines of the relative error of the out-of-sample estimate of ES.

In Fig. 2 we recall the contour map of the relative estimation error of ES without regularization. As can be seen, without regularization the constant-q₀ curves all lie inside the feasible region. For larger and larger values of q₀ the corresponding curves run closer and closer to the phase boundary, along which the estimation error diverges. Note that the phase boundary becomes flat, with all its derivatives vanishing, at the upper right corner of the feasible region: there is an essential singularity at the point α = 1, r = 0.5. The constant-q₀ contour lines crowd together near this point, and the value of r = N/T corresponding to a confidence limit α in the vicinity of 1 (such as the regulatory value α = 0.975) and an acceptable estimation error is so small as to require an unrealistically large sample size T if N is not small. For example, at the regulatory value of α = 0.975, with N = 100 assets one would need a time series of more than 7200 data points [24].

Let us see how regularization reorganizes the set of constant-q₀ curves. Figs. 3 and 4 display these curves for two different values of the coefficient η of the regularizer. (Notice the logarithmic scale on the vertical axes in these figures.)
Figure 3: Contour plot for fixed values of √q₀ on the α–r plane, at a fixed value of η.

Figure 4: Contour plot for fixed values of √q₀ on the α–r plane, at a five times larger value of η.

The curves of constant q₀ now have two branches. For a given q₀ the lower branch lies mostly or partly in the previously feasible region; the upper branch lies outside, above it. Along the lower branch the value of the ratio r is small, which means we have very large samples with respect to the dimension: this is the data-dominated regime. We can also see that when the data dominate, the dependence on the regularizer is weak: the set of curves inside the yellow region is quite similar in the two figures, even though the regularizer has been increased 5-fold from Fig. 3 to Fig. 4. Following the curve corresponding to a given value of q₀, say the black one, we see that at the beginning it increases with α, in the vicinity of α = 1 it starts to decline, then it sharply turns around and shoots up steeply, leaving the feasible region and increasing with decreasing α. Along this upper branch the ratio r is not small any more. We do not have large samples here; in fact, the situation is just the opposite: the dimension N becomes larger than the size T of the samples. Clearly, in this regime the regularizer dominates and the data play only a minor role: this is the bias-dominated regime.

It is interesting to note the sudden turn-over of the constant-q₀ curves in the vicinity of α = 1. Such a sharp feature would be extremely hard to discover if we wanted to solve the original optimization problem by numerical simulations: the simulation would jump over to the upper branch before we could observe the sharp dip, and the identification of the boundary between the data-dominated and the bias-dominated regimes would be hard. This is even more so for real-life data, which are inevitably noisy.

An important point in regularization is the correct choice of the parameter η.
Figure 5: The overall behavior of the contour lines of fixed estimation error (fixed q₀) on the r = N/T vs. η plane, for a given value of the confidence limit α = 0.975 and for three different values (1%, 5% and 10%) of the relative estimation error. The data-dominated and bias-dominated regions correspond to the two branches of these curves: in the range of small r's the curves depend on the strength of the regularizer very weakly, while for r's in the vicinity of the phase boundary, and even more for large r's high in the originally unfeasible region, the fixed estimation error curves display a strong dependence on η.

When data come from real observations, and the size of the sample (or the number of samples) is limited by time and/or cost considerations, the standard procedure is cross-validation [27], i.e., using a part of the data to infer the value of η and checking the correctness of this choice on the other part. In the present analytical approach we have the luxury of infinitely many samples to average over, so we can obtain the value of the coefficient of regularization by demanding a given relative error (that is, a given q₀) for a given confidence limit α and a given aspect ratio r = N/T. Fig. 5 displays the curves of given estimation error on the r–η plane for the specific value of α = 0.975 demanded by the new market risk regulation [1], and relative errors of 1%, 5% and 10%, respectively. It shows the change-over between the data-dominated and the bias-dominated regimes very clearly. For a given value of r the corresponding value of η can be read off from the curves. If r is small (i.e., the sample is large with respect to the dimension), the curves with the prescribed values of relative error are almost horizontal. This means that when we have sufficient data the value of the regularizer is more or less immaterial: within reasonable limits we can choose any coefficient for the regularizer, it will not change the precision of the estimate, because in this situation the data will determine the optimum. Conversely, when the data are insufficient (r is not small, or is even beyond the feasible region), the value of η necessary to enforce a given relative error strongly depends on r. In this region, however, we need a smaller and smaller η to find the same relative error, because here the data almost do not matter and even a small regularizer will establish the optimum. The transition between these two regimes takes place around the points where the curves turn back. This happens still inside the feasible region, and the width of this range is rather small: from the r value corresponding to η = 0 to the one where the curves turn around, the increase of r always remains within a factor of about 4.

Let us take a closer look at that part of the previous figure where the curves turn around and r starts to increase. Fig. 6a shows this region in higher resolution. For a given, small, value of the estimation error (such as 1% or 2%), r grows by a factor of about 4 by the time we reach the elbow of the curves (at rather large η values). This means that for a given sample size T the regularization allows us to consider a four times larger portfolio without increasing the estimation error. Conversely, for a given value of N the regularizer allows the use of four times shorter time series without compromising the quality of the estimate. Of course, the growth of r could be followed beyond the elbow, up to higher values along the constant-q₀ curves, but it must be clear that these sections of the curves correspond to a situation where the estimate is mostly or entirely determined by the regularizer. This is also shown by the fact that the curves of given estimation error lean strongly back towards the vertical axis: where the dimension is high and the data few, even a weak regularizer can stabilize the estimate, but it will then have nothing to do with the information coming from the time series.

Figure 6: (a) The previous figure in higher resolution (left). (b) The same as the left figure, but with the r(η) curves normalized by their initial values corresponding to η = 0 (right). It can be seen that the gain in r is about a factor of 4.

A gain of a factor 4 in the allowed region in r could be regarded as very satisfactory, were it not for the fact that the initial (η = 0) value of r along the small estimation error curves is so small that it remains small even after a multiplication by 4. If we inspect another curve, corresponding to a larger estimation error (say, 5%), we can see that it turns back for a much smaller η, but the relative increase of r up to the elbow is still about a factor of 4. We can also see that beyond this point the curves very quickly reach the region where the regularizer dominates.
Figure 6b displays the same curves as in Fig. 6a, but this time normalized by their values at η = 0, so that they show the gain in r due to the regularizer.

4 Conclusions

We have considered the problem of optimizing Expected Shortfall in the presence of an ℓ2 regularizer. The regularizer takes care of the large sample fluctuations and eliminates the phase transition that would be present in the problem without regularization. Deep inside the feasible region, where we have a large amount of data relative to the dimension, the size of the sample needed for a given level of relative estimation error is basically constant, largely independent of the regularizer. In the opposite case, for sample sizes comparable to or small relative to the dimension, the regularizer dominates the optimization and suppresses the data. The transition between the data-dominated regime and the regularizer-dominated one is rather narrow. It is in this transition region that we can meaningfully speak about a trade-off between fluctuation and bias; otherwise one or the other dominates the estimation. The identification of this transitional zone is easy within the present scheme, where we could carry out the optimization analytically: the transitional zone is the small region where the curves in Fig. 5 sharply turn back, but still remain inside the originally feasible region. In real life, where the size of the samples can rarely be changed at will and where all kinds of external noise (other than that coming from the sample fluctuations) may be present, the distinction between the region where the data dominate and the one where the bias reigns may be much less clear, and one may not be sure where the transition takes place between them. Below this transition there is not much point in using regularization, because the data themselves are sufficient to provide a stable and reliable estimate. Above the transition zone it is almost meaningless to talk about the observed data, because they are crowded out by the bias. The identification of the relatively narrow transition zone between these two extremes and the gain of a factor 4 below the transition are the main results of this paper.

It is important to realize, however, that the cause of this narrow transition region is the same as that of the strong fluctuations, namely the unbounded loss function. Expected Shortfall is not the only risk measure to have this deficiency: all the downside risk measures have it, including Value at Risk. The preference for downside risk measures is explained by the fact that investors (and regulators) are not afraid of big gains, only of big losses. Perhaps they should be. Refusing to acknowledge the risk in improbably large gains is a Ponzi-scheme mentality. Downside risk measures embody this mentality. As a part of regulation, however, they acquire an air of undeserved respectability, at which point the associated technical issues become components of systemic risk.

Acknowledgments

We are obliged to R. Kondor, M. Marsili and S. Still for valuable discussions. FC acknowledges the support of the Economic and Social Research Council (ESRC) in funding the Systemic Risk Centre (ES/K002309/1). IK is grateful for the hospitality extended to him at the Computer Science Department of University College London, where this paper was written up.
References

[1] Basel Committee on Banking Supervision. Minimum capital requirements for market risk. Bank for International Settlements, 2016.
[2] P. Jorion. Bayes-Stein estimation for portfolio analysis. Journal of Financial and Quantitative Analysis, 21:279-292, 1986.
[3] P. A. Frost and J. E. Savarino. An empirical Bayes approach to efficient portfolio selection. Journal of Financial and Quantitative Analysis, 21:293-305, 1986.
[4] O. Ledoit and M. Wolf. Improved estimation of the covariance matrix of stock returns with an application to portfolio selection. Journal of Empirical Finance, 10(5):603-621, 2003.
[5] O. Ledoit and M. Wolf. A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 88:365-411, 2004.
[6] O. Ledoit and M. Wolf. Honey, I shrunk the sample covariance matrix. Journal of Portfolio Management, 31:110, 2004.
[7] V. Golosnoy and Y. Okhrin. Multivariate shrinkage for optimal portfolio weights. The European Journal of Finance, 13:441-458, 2007.
[8] J. Brodie, I. Daubechies, C. De Mol, D. Giannone, and I. Loris. Sparse and stable Markowitz portfolios. Proceedings of the National Academy of Sciences, 106(30):12267-12272, 2009.
[9] L. Laloux, P. Cizeau, J.-P. Bouchaud, and M. Potters. Noise dressing of financial correlation matrices. Physical Review Letters, 83:1467-1470, 1999.
[10] L. Laloux, P. Cizeau, J.-P. Bouchaud, and M. Potters. Random matrix theory and financial correlations. International Journal of Theoretical and Applied Finance, 3:391, 2000.
[11] V. Plerou, P. Gopikrishnan, B. Rosenow, L. A. N. Amaral, and H. E. Stanley. Universal and non-universal properties of cross-correlations in financial time series. Physical Review Letters, 83:1471, 1999.
[12] V. Plerou, P. Gopikrishnan, B. Rosenow, L. A. N. Amaral, T. Guhr, and H. E. Stanley. A random matrix approach to cross-correlations in financial time series. Physical Review E, 65:066136, 2000.
[13] S. Ciliberti, I. Kondor, and M. Mézard. On the feasibility of portfolio optimization under expected shortfall. Quantitative Finance, 7:389-396, 2007.
[14] S. Ciliberti and M. Mézard. Risk minimization through portfolio replication. The European Physical Journal B, 57:175-180, 2007.
[15] I. Varga-Haszonits and I. Kondor. The instability of downside risk measures. Journal of Statistical Mechanics: Theory and Experiment, P12007, 2008.
[16] F. Caccioli, S. Still, M. Marsili, and I. Kondor. Optimal liquidation strategies regularize portfolio selection. The European Journal of Finance, 19(6):554-571, 2013.
[17] F. Caccioli, I. Kondor, M. Marsili, and S. Still. Liquidity risk and instabilities in portfolio optimization. International Journal of Theoretical and Applied Finance, 19(05):1650035, 2016.
[18] A. Takeda and M. Sugiyama. ν-support vector machine as conditional value-at-risk minimization. In Proceedings of the 25th International Conference on Machine Learning, pages 1056-1063, 2008.