Cross-validated covariance estimators for high-dimensional minimum-variance portfolios
Financial Markets and Portfolio Management manuscript No. (will be inserted by the editor)
Sven Husmann · Antoniya Shivarova ∗ · Rick Steinert
Received: date / Accepted: date
Abstract
The global minimum-variance portfolio is a typical choice for investors because of its simplicity and broad applicability. Although it requires only one input, namely the covariance matrix of asset returns, estimating the optimal solution remains a challenge. In the presence of high-dimensionality in the data, the sample covariance estimator becomes ill-conditioned and leads to suboptimal portfolios out-of-sample. To address this issue, we review recently proposed efficient estimation methods for the covariance matrix and extend the literature by suggesting a multi-fold cross-validation technique for selecting the necessary tuning parameters within each method. Conducting an extensive empirical analysis with three datasets based on the Russell 3000, we show that choosing the specific tuning parameters with the proposed cross-validation improves the out-of-sample performance of the global minimum-variance portfolio. In addition, we identify estimators that are strongly influenced by the choice of the tuning parameter and detect a clear relationship between the selection criterion within the cross-validation and the evaluated performance measure.
Keywords
Covariance Estimation · Portfolio Optimization · High-dimensionality · Cross-validation
JEL classification
G10 · G11 · C13 · C80

∗ Corresponding author. Email: [email protected]

Sven Husmann · Antoniya Shivarova · Rick Steinert
Europa-Universität Viadrina
Große Scharrnstraße 59
15230 Frankfurt (Oder), Germany
1 Introduction

Based on the simple but essential idea of diversification and an optimal risk-return profile of an investment strategy, the mean-variance model of Markowitz (1952) still represents the groundwork for portfolio optimization. In its original design, Markowitz portfolio theory assumes perfect knowledge about the expected value and variance of returns. For practical implementations, however, these parameters have to be estimated from historical data. The misspecifications due to estimation error can lead to strong deviations from optimality and therefore to an inferior out-of-sample performance (Jobson and Korkie 1981; Frost and Savarino 1986; Michaud 1989; Broadie 1993). This major drawback has been tackled from different perspectives in the financial literature. Some focus on estimation errors in the portfolio weights directly (see, e.g., Brodie et al. 2009; DeMiguel et al. 2009a), whereas others work on the inputs by improving expected returns and the covariance matrix.

In particular, portfolio weights are extremely sensitive to changes in expected returns (Best and Grauer 1991a,b), which in turn are more difficult to estimate than the covariances of returns (Merton 1980). It is therefore not surprising that a considerable part of recent academic research focuses on the global minimum-variance portfolio (GMV), as this does not depend on expected returns. However, even if investors decide to use the global minimum-variance portfolio, the estimation errors associated with the covariances can still lead to significant estimation errors in the portfolio weights, especially in a high-dimensional scenario.

We cover several approaches that have been shown to overcome these estimation issues and perform well in terms of out-of-sample variance. For instance, we discuss the linear shrinkage estimators of Ledoit and Wolf (2004a,b), designed to offer an optimal bias-variance trade-off between the sample covariance matrix and a structured target matrix.
Furthermore, we adopt the recent nonlinear shrinkage technique of Ledoit and Wolf (2017a), which is proven to be optimal under a variety of financially relevant loss functions (Ledoit and Wolf 2017a, 2018a). Moreover, we outline and implement the elaborate principal orthogonal complement thresholding (POET) estimator of Fan et al. (2013). In addition, we follow the findings of the recent empirical studies by Goto and Xu (2015) and Torri et al. (2019) and include the graphical least absolute shrinkage and selection operator (GLASSO).

More importantly, the selected covariance estimation methods share one thing in common: a regularization of the sample covariance is performed to optimize its out-of-sample performance. For example, linear shrinkage methods need an optimal shrinkage intensity to balance the included variance and bias, whereas the performance of the GLASSO depends on the level of sparsity, induced by a penalty parameter. The procedure for optimally identifying those tuning parameters often includes the choice of a specific loss function to be minimized. As often advocated, a loss function or measure of fit in the model estimation is best aligned with the evaluation framework (Christoffersen and Jacobs 2004; Ledoit and Wolf 2017a; Engle et al. 2019).

(DeMiguel et al. (2009b) additionally show that the mean-variance portfolio is outperformed out-of-sample by the minimum-variance portfolio not only in terms of risk, but also with respect to the return-risk ratio.)
To exploit those effects in more detail, we apply a nonparametric cross-validation (CV) technique with different selection criteria to determine the optimal parameters necessary for the calculation of all the considered covariance estimators.

Since we focus on enhancing the risk profile of the GMV portfolios, we choose two relevant risk-related measures for our cross-validated estimation methodology and the corresponding out-of-sample performance evaluation, namely the mean squared forecasting error (MSFE), as in Zakamulin (2015), and the out-of-sample portfolio variance. We show empirically that in most cases there exists a strong positive relation between the selection criterion within the CV and the respective out-of-sample performance measure. For instance, when the overall goal is to reduce the out-of-sample risk, then using CV with the portfolio variance as a measure of fit leads to lower risk than the original method. Similar results are documented by Liu (2014), although he only considers the most straightforward linear shrinkage as in Ledoit and Wolf (2003, 2004a,b). Here, we examine more recent and efficient estimation methods and identify those that can actually profit from CV. In detail, estimators that depend strongly on the choice of a specific tuning parameter within their derivation are more prone to be positively influenced by replacing the original solution with a cross-validated one.

Our contributions to the current literature on the subject of covariance and precision matrix estimation within the portfolio optimization framework can be summarized as follows. First, we show that recent advances in methods for high-dimensional covariance estimation lead to strong improvements in the risk profile of the GMV. In this context, we emphasize the distinct and often significant outperformance of the nonlinear shrinkage estimators. In line with the main discussion, we show that a model's outperformance with respect to out-of-sample portfolio variance does coincide with an identical objective within the CV.
Although the elaborate nonlinear shrinkage methods are not strongly influenced by applying the CV procedure, all the other cross-validated estimators perform better than their original counterparts. This advantage becomes even greater as the high-dimensionality of the data increases. Considering the MSFE, the results are straightforward for all estimation methods. If an investor aims to minimize this measure, the respective validation within the CV ought to be performed. Nonetheless, we analyze the inefficiency of the MSFE for high-dimensional asset returns' data, in particular because of a distorted calculation of the realized covariance matrix.

The rest of the paper is organized as follows. In Section 2, we review the considered covariance estimation methods and their properties. Section 3 outlines the suggested CV methodology with respect to its main characteristics: the procedure, the parameter set, and the selection criteria. We describe the empirical study in Section 4 with a strong focus on the chosen dataset, methodology and performance measures. In Section 5, we discuss the performance of classical and constrained GMV portfolios and analyze in detail the influence of cross-validated estimation among all considered datasets and methods. Section 6 summarizes the results and concludes.

2 Covariance Estimation Methods

2.1 Sample Covariance Matrix

The sample covariance matrix of asset returns is given by

\widehat{\Sigma}_S = \frac{1}{T-1} (R - \mathbf{1}\widehat{\mu}')' (R - \mathbf{1}\widehat{\mu}'),   (1)

where R ∈ R^{T×n} is the matrix of past asset returns with T observations and n stocks, \widehat{\mu} ∈ R^n is the vector of expected returns, here estimated with the sample mean, and \mathbf{1} is a T-dimensional vector of ones. As shown by Merton (1980), the sample covariance matrix is an asymptotically unbiased and consistent estimator of the true covariance matrix Σ, provided the number of observations T is large relative to the number of assets n, that is, provided the concentration ratio q = n/T is small. For a large number of assets, a concentration ratio of such magnitude is practically infeasible due to limited data availability and illiquidity issues. With a high ratio of the number of assets to the sample size, also called high-dimensionality, the sample covariance and its inverse exhibit a higher amount of estimation error, mainly due to the over- and underestimation of the respective eigenvalues. Moreover, for q >
1, the sample covariance becomes singular and the inverse cannot be calculated.

The sample estimator's instability and possible singularity in the case of high-dimensionality are a problem within the optimization of global minimum-variance portfolios, where the covariance matrix and, specifically, its inverse capture the dependency between asset returns and allow for the effect of diversification as a way of reducing risk. It is then straightforward that the accuracy of optimally estimated portfolio weights is directly related to the estimator's precision. As a solution, several alternative estimators have been proposed in the literature.

2.2 Linear Shrinkage

To produce more stable estimators of the covariance matrix, a linear shrinking procedure can be applied to the sample estimator toward a more structured target matrix \widehat{\Sigma}_T,

\widehat{\Sigma}_{LS} = s \widehat{\Sigma}_T + (1 - s) \widehat{\Sigma}_S,

where the constant s ∈ [0,
1] controls the shrinkage intensity, which is set higher the more ill-conditioned the sample estimator is and vice versa. In contrast to the unbiased but unstable sample covariance, a structured target matrix has little estimation error but tends to be biased. As a compromise, the convex combination of both uses the bias-variance trade-off by accepting more bias in-sample in exchange for less variance out-of-sample. This idea is central to the shrinkage methodology of Stein (1956) and James and Stein (1961). The respective linear shrinkage estimator is calculated as

\widehat{\Sigma}_{LW} = \widehat{s}\, \bar{\sigma} I_n + (1 - \widehat{s})\, \widehat{\Sigma}_S,   (2)

where \bar{\sigma} = \frac{1}{n} \sum_{j=1}^{n} \widehat{\sigma}_{jj} is the average of all individual sample variances and \widehat{s} is an optimal shrinkage intensity parameter. However, in the context of financial time series, it is beneficial to consider target matrices with reference to the correlation structure of asset returns.

Ledoit and Wolf (2004a) consider identical pairwise correlations between all n assets. The target matrix is therefore derived under the constant correlation matrix model of Elton and Gruber (1973), so that \widehat{\Sigma}_T = \widehat{\Sigma}_{CC}. While the variances are kept at their original sample values, the off-diagonal entries of the target matrix are estimated by assuming a constant average sample correlation \bar{\rho}. This results in \widehat{\Sigma}_{CC,ij} = \sqrt{\widehat{\sigma}_{ii}\,\widehat{\sigma}_{jj}}\; \bar{\rho}. The corresponding estimator is defined as

\widehat{\Sigma}_{LW_{CC}} = \widehat{s}\, \widehat{\Sigma}_{CC} + (1 - \widehat{s})\, \widehat{\Sigma}_S.   (3)

The level of the shrinkage \widehat{s} in Equations (2) and (3) can be obtained analytically.
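Equations (2) and (3) can be sketched in a few lines of numpy. The function below is our own illustration, not code from the paper: it builds either the scaled-identity target of Equation (2) or the constant-correlation target of Equation (3) and returns the convex combination for a given intensity s.

```python
import numpy as np

def linear_shrinkage(returns, s, target="identity"):
    """Convex combination s * target + (1 - s) * sample covariance."""
    T, n = returns.shape
    Rc = returns - returns.mean(axis=0)
    S = Rc.T @ Rc / (T - 1)                          # sample covariance, Eq. (1)
    if target == "identity":
        # Eq. (2): average sample variance times the identity matrix
        F = np.mean(np.diag(S)) * np.eye(n)
    else:
        # Eq. (3): constant-correlation target of Ledoit and Wolf (2004a)
        sd = np.sqrt(np.diag(S))
        corr = S / np.outer(sd, sd)
        rho_bar = (corr.sum() - n) / (n * (n - 1))   # average off-diagonal correlation
        F = rho_bar * np.outer(sd, sd)
        np.fill_diagonal(F, np.diag(S))              # keep the sample variances
    return s * F + (1 - s) * S
```

Setting s = 0 recovers the sample covariance and s = 1 the pure target; the cross-validation described later simply searches over a grid of s values.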
In particular, as shown by Ledoit and Wolf (2004a,b), asymptotically consistent estimators for the optimal linear shrinkage intensity are derived under the quadratic loss function

L(\widehat{\Sigma}, \Sigma) = ||\widehat{\Sigma} - \Sigma||_F^2,   (4)

known as the Frobenius loss, where the covariance estimator \widehat{\Sigma} is substituted with Equation (2) or (3). The finite-sample solution is found at the minimum of the expected value of the Frobenius loss, namely the mean squared error (MSE),

\widehat{s} = \arg\min_s \; E\!\left[ ||\widehat{\Sigma} - \Sigma||_F^2 \right].   (5)

The methodology behind this derivation can be applied to other shrinkage targets in a convex combination setting after an individually performed analysis and mathematical adaptation. Our cross-validation methodology, however, can be implemented for any linear shrinkage without further modifications, since we do not rely on the theoretically derived shrinkage intensity; instead, we search for an optimal value using CV.

2.3 Nonlinear Shrinkage

Without any assumption about the true covariance matrix, the positive-definite rotationally equivariant nonlinear shrinkage is based on the spectral decomposition of the sample covariance matrix and defined as

\widehat{\Sigma}_{LW_{NL}} = V \widehat{\Lambda}_{NL} V',   (6)

where V = [v_1, ..., v_n] is the orthogonal matrix with the sample eigenvectors v_i as columns and \widehat{\Lambda}_{NL} is the diagonal matrix of the sample eigenvalues \lambda_i, shrunk by a nonlinear shrinkage function \widehat{\varphi}. To find the optimal \widehat{\varphi}^*, Ledoit and Wolf (2012) originally minimize the MSE in finite samples. (This class of estimators was first introduced by Stein (1986).) Without going into further details, we examine the practical implementation of the nonlinear shrinkage.
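The rotationally equivariant form of Equation (6) is easy to sketch: keep the sample eigenvectors and transform only the eigenvalues. The shrinkage function below is a deliberately naive stand-in that pulls every eigenvalue halfway toward the grand mean; it is NOT the kernel-based function of Ledoit and Wolf (2017a), only an illustration of the interface.

```python
import numpy as np

def rotation_equivariant(sample_cov, shrink_fn):
    """Eq. (6) skeleton: Sigma_NL = V diag(shrink_fn(lambda)) V'."""
    lam, V = np.linalg.eigh(sample_cov)   # eigenvalues ascending, V orthogonal
    return V @ np.diag(shrink_fn(lam)) @ V.T

def toy_shrink(lam):
    """Illustrative stand-in: pull every eigenvalue halfway to the grand mean."""
    return 0.5 * lam + 0.5 * lam.mean()
```

Any estimator of this class differs only in the choice of `shrink_fn`; the linear shrinkage toward the identity in Equation (2) is the special case of an affine eigenvalue map.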
The optimal solution is achieved using a nonparametric variable-bandwidth kernel estimation of the limiting spectral density of the sample eigenvalues and its Hilbert transform. The speed at which the bandwidth vanishes in the number of assets n can be set to −1/2 or −1/5; Ledoit and Wolf (2017a) choose the average of the two, −0.35. Within the suggested CV technique, we aim to verify whether this exact choice of the kernel bandwidth's speed is crucial for the estimator's efficiency and whether the out-of-sample performance can be improved by an in-sample validation.

(Most recently, Ledoit and Wolf (2018b) reach an analytical solution by replacing the complex-valued Stieltjes transform (Ledoit and Wolf 2017a) of the limiting distribution of the sample eigenvalues by the Hilbert transform, which acts as a local attraction force. As a result, each sample eigenvalue is shrunk toward its closest and most numerous neighbors.)

2.4 Approximate Factor Model

The previously outlined methods for improved high-dimensional covariance estimation do not assume any structural knowledge about the covariance matrix and regularize only the sample eigenvalues \lambda_i. An underlying structure could be established by regularizing the sample eigenvectors v_i, for example if the covariance matrix itself is assumed to be sparse (see, e.g., Bickel and Levina 2008; Cai and Liu 2011). Unfortunately, this is not appropriate for financial time series because of the presence of common factors (Fan et al. 2013). However, if there is only conditional sparsity, the covariance matrix of investment returns can be estimated using factor models given by

\widehat{\Sigma}_{FM} = B \widehat{\Sigma}_F B' + \widehat{\Sigma}_u,

where \widehat{\Sigma}_F is the sample covariance matrix of the common factors and \widehat{\Sigma}_u is the residual covariance matrix. (Following this definition and assuming K common factors with K < n, a covariance matrix estimator based on factor models only needs to estimate K(K+1)/2 + nK + n parameters instead of the n(n+1)/2 free entries of the full covariance matrix.) One disadvantage of such exact factor models is the strong assumption of no correlation in the error terms across assets; that is, the error covariance matrix \widehat{\Sigma}_u is assumed to contain only the sample variances of the residuals. Therefore, possible cross-sectional correlations are neglected after separating the common factors (Fan et al. 2013). Instead, approximate factor models allow for off-diagonal values within the error covariance matrix.

The POET estimator is one of the most recent and efficient estimators from this branch of research. Using the close connection between factor models and principal component analysis, Fan et al. (2013) infer the necessary factor loadings by running a singular value decomposition on the sample covariance matrix as

\widehat{\Sigma}_S = \sum_{i=1}^{K} \lambda_i v_i v_i' + \sum_{i=K+1}^{n} \lambda_i v_i v_i'.

The covariance formed by the first K principal components contains most of the information about the implied structure. The rest is assumed to be an approximately sparse matrix, estimated by applying an adaptive thresholding procedure (Cai and Liu 2011) with a threshold parameter c. As a result, the POET estimator becomes

\widehat{\Sigma}_{POET} = \sum_{i=1}^{K} \lambda_i v_i v_i' + \widehat{\Sigma}^{c}_{u,K}.   (7)

As argued by Fan et al. (2013), for high-dimensional asset returns with a sufficiently large n → ∞, the number of factors K can be inferred from the data, for example with

\widehat{K} = \arg\min_{0 \le k \le k_{\max}} \; \log\!\left( \frac{1}{nT} \left|\left| R - \frac{1}{T} F_k F_k' R \right|\right|_F^2 \right) + k\, g(T, n),   (8)

where k_{\max} is the predefined maximum number of factors, R is the matrix of asset returns with sample covariance matrix \widehat{\Sigma}_S, F_k is a T × k matrix whose columns are the eigenvectors corresponding to the k largest eigenvalues of the T × T matrix RR', and g(T, n) is a penalty function of the type introduced by Bai and Ng (2002).
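A minimal sketch of Equation (7) follows. To keep it short, it applies a constant soft threshold c to the off-diagonal entries of the residual covariance, whereas Fan et al. (2013) use an adaptive, correlation-based threshold; the function name and simplification are ours.

```python
import numpy as np

def poet(sample_cov, K, c):
    """POET sketch: top-K principal components plus a thresholded residual."""
    lam, V = np.linalg.eigh(sample_cov)
    order = np.argsort(lam)[::-1]                 # sort eigenvalues descending
    lam, V = lam[order], V[:, order]
    low_rank = (V[:, :K] * lam[:K]) @ V[:, :K].T  # sum of top-K lambda_i v_i v_i'
    resid = sample_cov - low_rank
    # soft-threshold the off-diagonal residual entries, keep the variances
    off = np.sign(resid) * np.maximum(np.abs(resid) - c, 0.0)
    np.fill_diagonal(off, np.diag(resid))
    return low_rank + off
```

With c = 0 the estimator reproduces the sample covariance exactly; as c grows, the residual part shrinks toward a diagonal matrix, which is the conditional-sparsity idea in the text.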
In this study we further examine whether the proposed CV approach can select optimal values for K by considering the out-of-sample performance measure of interest as a selection criterion.

(For the operational use of POET, the threshold value c needs to be determined so that the positive-definiteness of \widehat{\Sigma}^{c}_{u,K} is assured in finite samples. The choice of c can therefore occur from a set for which the respective minimal eigenvalue of the errors' covariance matrix after thresholding is positive. The minimal constant c that guarantees positive-definiteness is then chosen. For more details, see Fan et al. (2013).)

2.5 Graphical Model

A proper estimation of the covariance matrix of returns is crucial in a portfolio optimization context, since its inverse Θ = Σ^{-1} is the direct input parameter necessary for exploiting diversification effects upon optimization. Instead of imposing a factor structure on the covariance matrix with a sparse error covariance as in POET, sparsity in the precision matrix can be a valid approach for reducing estimation errors, especially in the case of conditional independence among asset pairs (Fan et al. 2016). In detail, the entry Θ_{ij} = 0 if and only if asset returns r_i and r_j are independent, conditional on the other assets in the investment universe. Since graphical models are used to describe both the conditional and unconditional dependence structures of a set of variables, the estimation of Θ is closely related to graphs under a Gaussian model. The identification of zeros in the inverse can be performed with the Gaussian graphical model, since within the Markowitz portfolio optimization framework asset returns are assumed to follow a multivariate normal distribution. One of the most commonly used methods for inducing sparsity on the precision matrix is by penalizing the maximum likelihood. For i.i.d.
R with R ∼ N(0, Σ), the Gaussian log-likelihood function is given by

L(Θ) = \log|Θ| - \mathrm{tr}\!\left( \widehat{\Sigma}_S Θ \right),   (9)

where |·| denotes the determinant and tr(·) the trace of a matrix. Maximizing Equation (9) alone yields the known maximum-likelihood estimator for the precision matrix \widehat{Θ}_S, which suffers from high estimation error in the case of high-dimensionality. To reduce such errors, the maximum log-likelihood function can be penalized by adding a lasso penalty (Tibshirani 1996) on the precision matrix entries as

L(Θ) = \log|Θ| - \mathrm{tr}\!\left( \widehat{\Sigma}_S Θ \right) - \rho\, ||Θ^{-}||_1,   (10)

where ||Θ^{-}||_1 is the L_1-norm (the sum of the absolute values) of the matrix Θ^{-}, an n × n matrix with the off-diagonal elements equal to the corresponding elements of the precision matrix Θ and the diagonal elements equal to zero. Furthermore, ρ is a penalty parameter that controls the sparsity level, with higher ρ values leading to a larger number of off-diagonal zero elements within the resulting estimator.

The penalized likelihood framework for a sparse graphical model estimation was first proposed by Yuan and Lin (2007), who solve Equation (10) with an interior-point method. Banerjee et al. (2008) show that the problem is convex and solve it for Σ with a box-constrained quadratic program. To date, the fastest available solution for the sparse graphical model in Equation (10) is reached with the GLASSO algorithm, developed by Friedman et al. (2008) and later improved by Witten et al. (2011). They demonstrate that the above formulation is equivalent to coupled lasso problems and solve them using a coordinate descent procedure.

In addition to a well-performing algorithm, the value of ρ is necessary for calculating the optimal GLASSO estimator.
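The penalized objective of Equation (10) is straightforward to evaluate for a candidate precision matrix; the short helper below (our own naming) makes the role of the off-diagonal penalty explicit.

```python
import numpy as np

def penalized_loglik(theta, S, rho):
    """Objective of Eq. (10): log|Theta| - tr(S Theta) - rho * ||Theta^-||_1,
    where Theta^- zeroes out the diagonal so variances are not penalized."""
    sign, logdet = np.linalg.slogdet(theta)       # numerically stable log-determinant
    off = theta - np.diag(np.diag(theta))         # the matrix Theta^- from the text
    return logdet - np.trace(S @ theta) - rho * np.abs(off).sum()
```

Because the penalty touches only `off`, increasing ρ can never reward larger off-diagonal entries, which is exactly why higher ρ values drive more of them to zero in the maximizer.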
For this purpose, Yuan and Lin (2007) suggest using the Bayesian information criterion (BIC), defined for each ρ as

\mathrm{BIC}(\rho) = -\log\left| \widehat{Θ}_\rho \right| + \mathrm{tr}\!\left( \widehat{\Sigma}_S \widehat{Θ}_\rho \right) + \frac{\log(T)}{2} \sum_{i \ne j} \mathbf{1}\{\widehat{Θ}_{\rho,ij} \ne 0\},   (11)

where the indicator function \mathbf{1}\{\widehat{Θ}_{\rho,ij} \ne 0\} counts the number of nonzero off-diagonal elements in the estimated precision matrix. The value of ρ corresponding to the lowest BIC is chosen as the optimal lasso penalty parameter. The choice of the BIC as a selection criterion for ρ is further justified by the relation between the penalized problem in Equation (10) and model selection criteria (Goto and Xu 2015). Although Yuan and Lin (2007) argue that a CV procedure for an optimal lasso penalty can yield better out-of-sample results, the existing financial applications estimate ρ only once in-sample. By contrast, we additionally perform a multi-fold CV with risk-related selection criteria. The exact methodology is described in the next section.

(The idea of identifying zeros in the inverse covariance matrix was first proposed by Dempster (1972) with the so-called covariance selection model. Excluding the diagonal from the penalty in Equation (10) ensures that the asset returns' sample variances remain unpenalized.)
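The in-sample BIC selection can be sketched as below. This assumes scikit-learn is available: its `GraphicalLasso` solves Equation (10) (the penalty is called `alpha` there), and the BIC follows the reconstructed Equation (11); the function name `glasso_bic` is ours.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

def glasso_bic(X, rho_grid):
    """Fit GLASSO over a grid of penalties and pick rho by the BIC of Eq. (11)."""
    T = X.shape[0]
    S = np.cov(X, rowvar=False)
    best_bic, best_rho, best_theta = np.inf, None, None
    for rho in rho_grid:
        theta = GraphicalLasso(alpha=rho, max_iter=200).fit(X).precision_
        # nonzero off-diagonal entries (both triangles, as in the double sum)
        nonzero = np.count_nonzero(theta) - np.count_nonzero(np.diag(theta))
        sign, logdet = np.linalg.slogdet(theta)
        bic = -logdet + np.trace(S @ theta) + 0.5 * np.log(T) * nonzero
        if bic < best_bic:
            best_bic, best_rho, best_theta = bic, rho, theta
    return best_rho, best_theta
```

The cross-validated alternative discussed in the paper keeps the same grid but replaces the BIC with an out-of-sample measure of fit.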
3 Cross-Validation Methodology

Each of the outlined covariance estimators includes an exogenous or data-dependent parameter. The linear shrinkage estimators in Equations (2) and (3) are calculated with an optimal shrinkage intensity \widehat{s}. For the more general nonlinear shrinkage, Ledoit and Wolf (2017a) set the kernel bandwidth's speed at −0.35 as the average of two recognized approaches. The approximate factor model, the POET estimator by Fan et al. (2013), deals with an unknown number of factors K, which is identified by minimizing popular information criteria. Finally, the GLASSO estimator proposed by Friedman et al. (2008) needs an optimal choice of the penalty parameter ρ, often estimated by minimizing the BIC in-sample. (Goto and Xu (2015) induce sparsity to enhance robustness and lower the estimation error within portfolio hedging strategies, Brownlees et al. (2018) develop a procedure called "realized network" by applying GLASSO as a regularization procedure for realized covariance estimators, and Torri et al. (2019) analyze the out-of-sample performance of a minimum-variance portfolio estimated with GLASSO.)

To clarify our analysis, we refer to these estimation methods as "original". In addition, we adopt a nonparametric technique, a multi-fold CV, to identify the necessary parameter for each estimation method. Instead of relying on pre-specified assumptions and deriving corresponding solutions individually, we perform a grid search over a domain of values and find the best possible parameter for two exemplary out-of-sample selection criteria.

3.1 Parameter Set

To employ a cross-validated choice, we first need to specify a domain of possible values for the necessary parameters that should be selected within the CV procedure. For this purpose we create a sequence (or grid) of arbitrary parameters δ ∈ ∆ for each covariance model. Depending on the chosen length of the sequence, the CV can be computationally time-consuming. Since the choice of this sequence is crucial for the out-of-sample efficiency of the methodology, the domain of possible parameters has to be individually evaluated for each estimation method by considering the trade-off between desired precision and computing time.
Subsection 4.2 outlines the examined sequences for the considered covariance estimation methods.

3.2 Cross-Validation Procedure

The CV is a model validation technique designed to assess how an estimated model would perform on an unknown dataset. To evaluate the model accuracy, the available dataset is repeatedly split into a training and a testing subset in a rolling-window fashion (see, e.g., Hjort 1996; Arlot and Celisse 2010). For instance, in the case of an m-fold CV, a dataset with τ observations is split into m equal parts. The first rolling window then uses as a training dataset the first fold, consisting of the first ν < τ observations ordered by time. Upon this, the consecutive υ observations are used as a test dataset to validate the performed estimation. This is done iteratively m times by shifting the training window by υ observations and therefore maintaining the chronological order within the data.

In our setting, for each of the pre-defined parameters we successively use the training data to calculate a covariance matrix estimator \widehat{\Sigma}_{t,δ} for a test dataset t and a specific parameter δ. During the following validation stage, we must set selection criteria, also referred to as measures of fit, to identify which parameter performs best. In this study, we investigate two common objectives within the field of portfolio risk minimization.

As often argued, the squared forecasting error (SFE) or, as defined in Section 2, the Frobenius loss is minimized to find a covariance estimator with the least forecasting error (see, e.g., Zakamulin 2015). Specifically, we first calculate a realized covariance matrix for the test dataset with

\Sigma_t = \frac{1}{\upsilon} (R_t - \mathbf{1}\widehat{\mu}_t')' (R_t - \mathbf{1}\widehat{\mu}_t'),

where R_t ∈ R^{υ×n} are the asset returns from the test dataset and \widehat{\mu}_t is the vector of average returns for the testing period consisting of υ observations.
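The rolling m-fold scheme with the SFE-based selection can be sketched as follows. Here `estimator(train, delta)` stands for any of the parameterized covariance methods above, and the fold sizes ν and υ are plain arguments; names and structure are our illustration, not the paper's code.

```python
import numpy as np

def realized_cov(R_test):
    """Realized covariance of a test fold, scaled by the fold length upsilon."""
    Rc = R_test - R_test.mean(axis=0)
    return Rc.T @ Rc / R_test.shape[0]

def cv1_select(returns, grid, estimator, nu, upsilon, m):
    """Choose the delta whose estimator minimizes the average squared
    Frobenius distance to the realized covariance over m rolling folds."""
    scores = np.zeros(len(grid))
    for fold in range(m):
        start = fold * upsilon                      # shift window by upsilon
        train = returns[start:start + nu]
        test = returns[start + nu:start + nu + upsilon]
        real = realized_cov(test)
        for g, delta in enumerate(grid):
            scores[g] += np.sum((estimator(train, delta) - real) ** 2)
    return grid[int(np.argmin(scores))]
```

Replacing the squared-Frobenius accumulator with the variance of out-of-sample portfolio returns yields the second selection criterion described below in the text.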
Then, we find the corresponding SFE as

||\widehat{\Sigma}_{t,δ} - \Sigma_t||_F^2.

This procedure is repeated m times, so that we end up with m SFE values for each δ. From the parameter set we then choose the δ for which the average (over all m iterations) SFE is minimized. In our empirical study, CV with the SFE as a measure of fit is referred to as CV1. (For clarity in the notation, we do not differentiate between covariance estimators; the procedure is applied to all methods equally.)

Instead of the SFE, within a portfolio optimization framework one is generally more interested in whether a covariance estimator leads to lower out-of-sample risk of the optimal portfolio (see, e.g., Liu 2014; Ledoit and Wolf 2017a;
Engle et al. 2019). To incorporate and later investigate this concept, as our second scenario (CV2), we minimize the out-of-sample portfolio variance. In detail, with the covariance matrix \widehat{\Sigma}_{t,δ}, previously estimated with the training data, we calculate the optimal weights \widehat{w}_{t,δ} for a portfolio of our choice (e.g., the GMV). This then allows us to calculate the respective portfolio returns throughout the testing period with υ observations as

r^{p}_{t,δ} = \widehat{w}_{t,δ}' R_t.

This procedure is repeated m times, so that we end up with m portfolio return vectors for each δ. From the parameter set, we then choose the δ for which the empirical variance (over all m iterations) of those portfolio out-of-sample returns is minimized.

By applying different measures of fit within the CV, we explicitly address the importance of aligned selection criteria for the out-of-sample performance of each covariance estimation method. Moreover, we aim to verify whether the calibration of covariance parameters with a multi-fold CV yields better results out-of-sample than the original models.

4 Empirical Study

To exploit the above considerations, we perform an extensive empirical study of the suggested covariance estimation methods within a high-dimensional portfolio optimization context. For this purpose, we create fully invested as well as 130-30 portfolios and evaluate their out-of-sample performance for a range of commonly used measures. We additionally compare the original covariance parameters with their calibrated equivalents. The exact empirical construct is elaborated on in the following subsections.

4.1 Model Setup

For the empirical study, we focus on the GMV portfolio. The optimal weights for an investment period t are determined by minimizing the portfolio variance as

\widehat{w}_t = \arg\min_w \; w' \widehat{\Sigma}_t w \quad \text{s.t.} \quad \mathbf{1}_n' w = 1,   (12)

where \mathbf{1}_n is an n-dimensional vector of ones and \widehat{\Sigma}_t is an arbitrary covariance matrix estimator for the investment period t.
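The GMV problem in Equation (12) has the standard closed-form solution w = Σ^{-1}1 / (1'Σ^{-1}1), which also gives the per-fold CV2 criterion directly; the two helpers below (our own names) sketch both.

```python
import numpy as np

def gmv_weights(cov):
    """Closed-form GMV weights: w = Sigma^{-1} 1 / (1' Sigma^{-1} 1)."""
    w = np.linalg.solve(cov, np.ones(cov.shape[0]))  # solve instead of inverting
    return w / w.sum()                               # normalize to 1' w = 1

def cv2_score(cov_est, R_test):
    """CV2 criterion for one fold: variance of the out-of-sample GMV returns
    generated on the test data by weights fitted on the training data."""
    return (R_test @ gmv_weights(cov_est)).var(ddof=1)
```

Using `np.linalg.solve` avoids forming the explicit inverse, which matters precisely in the ill-conditioned high-dimensional settings the paper studies.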
This formulation has the analytical solution

\widehat{w}_t = \frac{\widehat{\Sigma}_t^{-1} \mathbf{1}_n}{\mathbf{1}_n' \widehat{\Sigma}_t^{-1} \mathbf{1}_n}.

We use an in-sample period of τ = 24 months (or roughly 504 days) and an out-of-sample period resulting in T − τ = 240 months (or 5029 days) of out-of-sample portfolio returns. Similarly to the original studies on the reviewed covariance estimation methods (Fan et al. 2013; Ledoit and Wolf 2017a), we employ a monthly rebalancing strategy, since this is more cost-efficient and common in practice. Within each rolling-window step, the covariance matrix of asset returns for the investment month t is estimated at the end of month t − 1. The POET estimator is calculated using the R-package POET provided by Fan et al. (2013). Finally, the GLASSO estimator is calculated with the algorithm provided by Friedman et al. (2008) within the R-package glasso, with no penalty on the diagonal elements and an in-sample selection of the lasso penalty using the BIC.

In addition to the models in Section 2, we calculate the cross-validated estimators as in Section 3 by implementing an m-fold CV. To determine the selection criteria for the respective CV methods, we choose m = 12 and therefore divide the in-sample observations into a training sample of 12 months (or 252 days) and a testing sample of one month (or 21 days). With this construction, we replicate the proposed monthly rebalancing strategy inside the performed CV. As introduced in Subsection 3.1, we additionally need to define a set of parameters for each covariance estimation method.

4.2 Parameter Sequences

Since both linear shrinkage methods in Equation (2) (LW) and Equation (3) (LW_CC) represent a weighted average between the sample and a target covariance matrix, we define a parameter set ∆ of G shrinkage intensities, such that ∆_LW = ∆_CC = (δ_1, δ_2, ..., δ_G) ∈ [0, 1]. For the nonlinear shrinkage (LW_NL) as well as for the single-factor nonlinear shrinkage (LW_NLSF), we set the kernel bandwidth's speed to lie between −0.5 and −0.2, i.e., ∆_NL = (δ_1, δ_2, ..., δ_G) ∈ [−0.5, −0.2]. For the GLASSO, we define a grid of penalty parameters
For the GLASSO estimator, we define a set of lasso penalties $\rho$, derived from the in-sample data. Specifically, we use a logarithmic sequence of $G$ = 50 penalty values between $e$ and $u$ as our $\rho$-generating function, where $u$ is the maximal absolute value of the sample covariance matrix estimated with the training dataset and $e$ is a small fixed fraction of $u$.

After calculating all the possible combinations of original and cross-validated estimators within the validation subset, we choose an optimal parameter for each covariance estimation method, as outlined in Subsection 3.2, and use all the in-sample data to estimate the covariance matrix for the next investment month. Since the reviewed estimation methods and our methodology do not model time-dependency in the covariance matrix, the estimate is held fixed over the investment month. We use $\hat{\Sigma}_t$ to find the optimal weights $\hat{w}_t$, as in Equation (12) and its analytical solution. With these weights, we calculate the out-of-sample portfolio returns for each model in month $t$. This procedure is repeated until the end of our investment horizon.

4.2 Benchmark Strategies

First, we include the equally-weighted portfolio, hereafter also referred to as the Naive portfolio. This strategy implies an identity covariance matrix and hence does not include any estimation risk (DeMiguel et al. 2009b). In addition to the Naive strategy, which is a standard benchmark when comparing induced transaction costs and turnover rates, we estimate the weights with the sample covariance matrix estimator, which serves as a benchmark for the out-of-sample risk. All these portfolios are evaluated with the performance measures presented in the following subsection.

4.3 Performance Measures

To evaluate the out-of-sample performance of each covariance matrix estimation method, we report different performance measures for the estimator's efficiency, the risk profile, and the allocation properties of the corresponding GMV and GMV-130-30 portfolios.
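The logarithmic penalty grid for GLASSO described in the model setup can be generated as follows. This is a hedged sketch: only $u$ (the maximal absolute entry of the training-sample covariance) and $G$ = 50 are taken from the text, while `frac`, the ratio fixing the lower end $e$ relative to $u$, is a hypothetical choice standing in for the paper's exact generating function.

```python
import numpy as np

def glasso_penalty_grid(train_returns, G=50, frac=0.01):
    """Logarithmically spaced lasso penalties between e = frac * u and u,
    where u is the largest absolute entry of the training-sample covariance.
    The value of `frac` is illustrative, not the one used in the paper."""
    S = np.cov(train_returns, rowvar=False)
    u = np.abs(S).max()
    e = frac * u
    return np.logspace(np.log10(e), np.log10(u), G)

rng = np.random.default_rng(1)
grid = glasso_penalty_grid(rng.normal(size=(252, 10)))
```

A logarithmic grid concentrates candidate penalties near zero, where the GLASSO solution is most sensitive to $\rho$.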
First, we calculate the MSFE as

$$\text{MSFE} = \frac{1}{T-\tau} \sum_{t=\tau+1}^{T} \sum_{i=1}^{n} \sum_{j=1}^{i} \left( \Sigma_{t,ij} - \hat{\Sigma}_{t,ij} \right)^2, \qquad (13)$$

where $\hat{\Sigma}_{t,ij}$ is the covariance matrix estimator and $\Sigma_{t,ij}$ is the realized covariance for month $t$. The MSFE is frequently used to measure the forecasting power of an estimation method. To avoid double counting of forecasting errors, each covariance pair enters the calculation only once, via the restriction $j \le i$.

Considering the nature of minimum-variance portfolios as risk-reduction strategies, we are especially interested in the out-of-sample standard deviation (SD) as a performance indicator. We calculate the SD of the 5029 out-of-sample portfolio returns and multiply it by $\sqrt{252}$ to annualize it. For a more detailed analysis of the out-of-sample risk of the constructed portfolios, and therefore, implicitly, of the covariance estimation methods, we perform the two-sided Parzen-kernel HAC test for differences in variances, as described by Ledoit and Wolf (2008) and Ledoit and Wolf (2011), and report the corresponding significance levels. Since we utilize daily returns, a sufficient number of observations is available and a bootstrap technique is not essential. For the sake of completeness, we have also performed a block bootstrap as in Ledoit and Wolf (2011); the corresponding significance values are comparable to those from the HAC test and are therefore not reported. Since the MSFE is closely related to the SFE optimality criterion within the CV1 method, we expect the respectively optimized covariance estimators to exhibit a lower MSFE than their original versions. Moreover, an estimation with the CV2 approach, based on minimizing the portfolio variance, is expected to result in a lower out-of-sample SD.

In practice, investors additionally need to address the problem of high transaction costs; hence, they prefer a more stable allocation for an optimal portfolio strategy. Therefore, as a proxy for the occurring transaction costs, we analyze the average monthly turnover, defined as

$$\text{Turnover} = \frac{1}{T-\tau-1} \sum_{t=\tau+1}^{T-1} \left\| \hat{w}_{t+1} - \hat{w}^{+}_{t} \right\|_1, \qquad (14)$$

where $\|\cdot\|_1$ denotes the $\ell_1$-norm and $\hat{w}^{+}_{t}$ denotes the portfolio weights at the end of the investment month $t$, scaled back to one. The turnover rate is thus the averaged sum of the absolute values of the monthly rebalancing trades across all $n$ assets and over all $T-\tau-1$ investment dates. The next section reports the detailed out-of-sample performance analysis and empirical results.

Considering the trend of the optimal linear shrinkage intensities for LW and LW CC, we observe that the original approaches of Ledoit and Wolf (2004a) and Ledoit and Wolf (2004b) are less reactive to changes in asset returns than our CV methodologies. The strong fluctuation in the selected shrinkage intensity for CV1, CV21, and CV22 results from the properties and functionality of the CV itself and implies fast adaptation to potentially changing market conditions and selection criteria. The other datasets produce similar results; for reference, see Figure 2 in Appendix A.

[Figure 1: time series of the optimally selected shrinkage intensity (LW, LW CC), bandwidth's speed (LW NL, LW NLSF), number of factors (POET), and lasso penalty (GLASSO), each for the Original, CV1, CV21, and CV22 methods.]

Fig. 1: Optimally selected parameters with original, CV1, CV21, and CV22 covariance estimation methods for the 500RUA dataset.

5.2 GMV Portfolio

Table 1 presents the central results of our empirical analysis of the GMV portfolio, reporting the three performance measures MSFE, SD, and average monthly turnover rate (TO), with SD and TO in percent. The rows indicate the portfolio strategies based on the covariance estimation. While the original estimators are denoted only by the respective name of the estimation method, the endings CV1 and CV21 represent the cross-validated approaches, as explained in the previous sections.

[Table 1: Performance of GMV portfolios across different estimators and datasets. The table reports the annualized out-of-sample SD and average monthly turnover (TO) (in percent) of the GMV portfolios as well as the monthly MSFE of the respective covariance estimators across all the considered datasets with 100, 250, and 500 stocks, respectively. Since the Naive portfolio strategy does not require a covariance estimator per definition, no values are reported for the MSFE. The lowest MSFE, SD, and TO for each estimation method are set in bold; the best results in terms of the MSFE and SD for each dataset are underlined, as is the lowest TO excluding the Naive portfolio.]

The compact representation of the results across datasets allows us to observe that, in the case of the enhanced covariance estimators, the annualized SD declines as more assets are included in the GMV portfolio. This is easily explained by the known power of diversification: the desirable effect of including more stocks in a portfolio. As expected, all the efficient covariance estimation methods perform better than the sample estimator in terms of out-of-sample risk for all the datasets, with more significant deviations for higher concentration ratios.

More importantly, we can detect the positive effect of an appropriate choice of selection criterion for determining the necessary covariance parameters. For all the datasets, minimizing the portfolio variance with the CV21 approach indeed leads to a lower out-of-sample SD for the linear shrinkage methods LW and LW CC and the GLASSO estimator. For the POET estimator, the CV21 method does not lead to consistent outperformance in terms of out-of-sample variance. Still, for the largest dataset 500RUA, POET-CV21 strongly outperforms its original counterpart.

For the CV1 approach, the investigation of the MSFE is mandatory. The values reported in Table 1 indicate the distinct effect of the CV1 approach on the minimization of the out-of-sample MSFE. For all the estimation methods and datasets, the MSFE is the lowest for the CV1 version of each estimator. Even robust estimators such as LW NL and LW NLSF exhibit higher forecasting power, measured by the MSFE, when the corresponding parameters are estimated with the CV1 approach. Nevertheless, it is noteworthy that the MSFE measure does not seem to proxy for the out-of-sample portfolio risk level.
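The MSFE values discussed here follow Equation (13); a minimal sketch with hypothetical inputs, counting each unique covariance pair only once via the triangular restriction:

```python
import numpy as np

def msfe(realized, estimated):
    """Mean squared forecast error across months: squared deviations between
    realized and estimated covariances, each unique pair (j <= i) counted once."""
    total = 0.0
    for Sigma, Sigma_hat in zip(realized, estimated):
        D = np.tril(Sigma - Sigma_hat)   # lower triangle incl. diagonal
        total += np.sum(D ** 2)
    return total / len(realized)

# two hypothetical "months" with 3 assets each
real = [np.eye(3), np.eye(3)]
est = [np.eye(3) * 1.1, np.eye(3)]
err = msfe(real, est)                    # (3 * 0.1**2 + 0) / 2 ≈ 0.015
```

In the empirical study, `real` would hold the monthly realized covariance matrices and `est` the corresponding forecasts of a given estimator.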
Within the financial literature, including Zakamulin (2015), the MSFE is studied in reference to datasets with low concentration ratios $q$. However, in a high-dimensional setting, a lower MSFE does not coincide with a lower out-of-sample SD for any of the datasets or estimation methods. Under the CV1 method, the SFE is computed as an estimator's squared distance to the monthly realized covariance matrix, calculated from daily returns (roughly 21 days) for $n$ assets. The implied concentration ratios, ranging from $100/21 \approx 4.76$ for the 100RUA dataset to $500/21 \approx 23.81$ for the 500RUA dataset, lead to ill-conditioned realized covariance matrices and a noisy SFE calculation. As a possible solution, recent financial studies have focused on improving the estimation of large realized covariance matrices (see, e.g., Hautsch et al. 2012; Callot et al. 2017; Bollerslev et al. 2018). Therefore, we focus our further analysis on the CV21 approach.

[Table 2: Differences in SD p.a. of GMV with the 500RUA dataset across different estimators. The table shows the differences in the annualized out-of-sample SD of the GMV portfolios across the main covariance estimation methods and their CV-based counterparts. It is constructed symmetrically; on the elements above the diagonal, significant pairwise outperformance in terms of variance is denoted by asterisks (*** for the 0.01 level, ** for the 0.05 level, and * for the 0.1 level), and for each model the overall relative outperformance in percent is reported.]

Table 2 should be read column-wise; that is, the difference in SD between the LW and the sample estimator is listed in the second column of the first row. For completeness, we construct the table symmetrically; still, we focus our attention on the elements above the diagonal only.

At first glance, we can distinguish the positive effect of the CV21 procedure on the out-of-sample risk of the linear shrinkage and GLASSO estimators. While the original estimation methods LW and LW CC are the worst performers for this asset universe, we observe an astonishing improvement when the linear shrinkage intensity is optimized for the out-of-sample portfolio variance with CV21. Both LW-CV21 and LW CC-CV21 result in a significantly lower out-of-sample SD than their original counterparts.

Another insight emerges from the comparison of LW CC-CV21 with LW NL. Although specifically designed to overcome the high-dimensionality problem, both the original and the CV21-based nonlinear shrinkage methods lead to higher out-of-sample risk than the cross-validated linear shrinkage estimator. This effect is observable for the 100RUA dataset as well (see, for reference, Table 5). As the difference is not statistically significant in any of the cases, we can only draw the qualitative conclusion that a methodologically easy-to-understand and simple-to-implement method can perform as well as a complex state-of-the-art estimator when the optimal tuning parameter (here, the shrinkage intensity) is identified with CV.

Table 1 additionally reports the average monthly turnover rate as a proxy for the transaction costs induced by monthly rebalancing. The Naive portfolio, being long-only and equally weighted by construction, naturally has the lowest turnover (approximately 0.06 on average across all the datasets). As expected, the GMV portfolios estimated with the sample covariance matrix are characterized by extreme exposures for all the datasets. On the other hand, an estimation with GLASSO has the most pronounced positive effect on the allocation characteristics of the GMV portfolio. In particular, the GLASSO-CV1 estimation methodology results in GMV portfolios with the lowest turnover for all the datasets. Interestingly, the estimator LW-CV21 leads to the second-lowest turnover rates across all the estimation methods for the datasets 100RUA and 250RUA. It seems that when the concentration ratio is tolerable, the linear shrinkage methodology, as a convex combination of the sample covariance and an identity matrix, produces satisfactory results. The underlying model in LW is equivalent to the introduction of a ridge-type penalty in the estimation (Warton 2008), which has been proven to induce stability. When the sample covariance matrix becomes ill-conditioned or even singular, the cross-validated choice of the linear shrinkage intensity reduces the turnover rate by approximately 68%. While LW shrinks the sample covariance matrix toward the identity matrix, GLASSO shrinks the precision matrix toward the identity matrix.

[Table 3: Differences in SD p.a. of GMV-130-30 with the 500RUA dataset across different estimators, constructed in the same way as Table 2.]
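The turnover figures compared in this subsection follow Equation (14); below is a minimal sketch of a single rebalancing date. The weight vectors are hypothetical, and the end-of-month weights are first scaled back to sum to one, as in the definition of $\hat{w}^{+}_{t}$.

```python
import numpy as np

def rebalancing_trade(w_next, w_end):
    """l1 distance between next month's target weights and the drifted
    end-of-month weights rescaled to sum to one (one summand of Eq. 14)."""
    w_plus = w_end / w_end.sum()
    return np.abs(w_next - w_plus).sum()

w_end = np.array([0.55, 0.30, 0.25])   # weights after one month of price drift
w_next = np.array([0.40, 0.35, 0.25])  # next month's optimized target weights
trade = rebalancing_trade(w_next, w_end)
```

The reported average monthly turnover is the mean of this quantity over all $T - \tau - 1$ rebalancing dates.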
Since the Naive portfolio corresponds to a GMV portfolio estimated with an identity covariance and hence precision matrix, one may suggest that both estimation methods result in an implicit shrinkage of the sample GMV portfolio weights toward an equally weighted portfolio, as in Tu and Zhou (2011), and therefore perform well in terms of turnover.

5.3 GMV-130-30

The table is structured similarly to Table 1, with the columns representing the investment universes (100RUA, 250RUA, and 500RUA) and the performance measures, while the rows indicate the portfolio strategies based on the considered covariance estimation methods. Since the examined constraint does not play any role in the CV1-based estimation of the covariance matrix, we do not report the MSFE values.

Moreover, Table 3 presents the differences in annualized SDs and the respective pairwise significance levels across all the main covariance estimation methods and their CV22-based counterparts for the high-dimensional case of the 500RUA dataset. In addition, Appendix C compares the other datasets. The first notable consequence of the gross exposure constraint is the improvement in portfolio performance for the case of the sample covariance matrix. Both for 100RUA and 250RUA, the sample estimator is significantly outperformed only by the LW NLSF, POET, and GLASSO estimators and their CV22 versions.

Finally, we examine the average monthly turnover rates, reported in Table 4. A similar reduction in turnover takes place in the case of the LW CC estimator as well.

[Table 4: Performance of GMV-130-30 portfolios across different estimators and datasets, reporting the annualized out-of-sample SD and average monthly turnover (in percent) for the 100RUA, 250RUA, and 500RUA universes.]
6 Conclusion

In this study, we review some of the most recent and efficient estimation methods for high-dimensional minimum-variance portfolios. We extend the current research by proposing a CV methodology to determine the corresponding tuning parameters, such as the linear shrinkage intensity and the sparsity penalty term.

In a detailed empirical analysis with three high-dimensional datasets, we identify the characteristics of our approach. First, we establish that the selection criterion within the CV should correspond to the performance measure of interest. We show that the lowest overall out-of-sample portfolio risk is indeed generated when we select the optimal tuning parameters by minimizing the portfolio variance with the proposed CV.
We additionally demonstrate that a CV methodology is beneficial to estimators whose performance depends strongly on the embedded tuning parameters, as is the case with the linear shrinkage, POET, and GLASSO estimation methods. Even complex and highly efficient estimators can be surpassed by simpler approaches if the corresponding tuning parameters are calibrated efficiently. One of the reasons for this observation is the rapid adaptation of the CV toward ever-changing market situations and asset returns.

Furthermore, in this paper we investigate only high-dimensional covariance estimation methods that assume homoscedasticity in the returns. Since we observe a time-varying parameter selection with the CV approach and a resulting improvement in the out-of-sample performance, we argue that the combination of cross-validated parameter selection and time-dependent high-dimensional covariance estimators, as recently proposed by Halbleib and Voev (2016) and Engle et al. (2019), is an important topic for future research.
References
S. Arlot and A. Celisse. A survey of cross-validation procedures for model selection. Statistics Surveys, 4:40–79, 2010. doi: 10.1214/09-SS054.
J. Bai and S. Ng. Determining the number of factors in approximate factor models. Econometrica, 70(1):191–221, 2002. doi: 10.1111/1468-0262.00273.
O. Banerjee, L. El Ghaoui, and A. d'Aspremont. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. Journal of Machine Learning Research, 9(3):485–516, 2008.
M. J. Best and R. R. Grauer. On the sensitivity of mean-variance-efficient portfolios to changes in asset means: some analytical and computational results. The Review of Financial Studies, 4(2):315–342, 1991a.
M. J. Best and R. R. Grauer. Sensitivity analysis for mean-variance portfolio problems. Management Science, 37(8):980–989, 1991b.
P. J. Bickel and E. Levina. Covariance regularization by thresholding. The Annals of Statistics, 36(6):2577–2604, 2008. doi: 10.1214/08-AOS600.
T. Bollerslev, A. J. Patton, and R. Quaedvlieg. Modeling and forecasting (un)reliable realized covariances for more reliable financial decisions. Journal of Econometrics, 207(1):71–91, 2018. doi: 10.1016/j.jeconom.2018.05.004.
M. Broadie. Computing efficient frontiers using estimated parameters. Annals of Operations Research, 45(1):21–58, 1993. doi: 10.1007/BF02282040.
J. Brodie, I. Daubechies, C. De Mol, D. Giannone, and I. Loris. Sparse and stable Markowitz portfolios. Proceedings of the National Academy of Sciences, 106(30):12267–12272, 2009. doi: 10.1073/pnas.0904287106.
C. Brownlees, E. Nualart, and Y. Sun. Realized networks. Journal of Applied Econometrics, 33(7):986–1006, 2018. doi: 10.1002/jae.2642.
T. Cai and W. Liu. Adaptive thresholding for sparse covariance matrix estimation. Journal of the American Statistical Association, 106(494):672–684, 2011. doi: 10.1198/jasa.2011.tm10560.
L. A. F. Callot, A. B. Kock, and M. C. Medeiros. Modeling and forecasting large realized covariance matrices and portfolio choice. Journal of Applied Econometrics, 32(1):140–158, 2017. doi: 10.1002/jae.2512.
P. Christoffersen and K. Jacobs. The importance of the loss function in option valuation. Journal of Financial Economics, 72:291–318, 2004.
V. DeMiguel, L. Garlappi, F. J. Nogales, and R. Uppal. A generalized approach to portfolio optimization: Improving performance by constraining portfolio norms. Management Science, 55(5):798–812, 2009a. doi: 10.1287/mnsc.1080.0986.
V. DeMiguel, L. Garlappi, and R. Uppal. Optimal versus naive diversification: How inefficient is the 1/N portfolio strategy? Review of Financial Studies, 22(5):1915–1953, 2009b. doi: 10.1093/rfs/hhm075.
A. P. Dempster. Covariance selection. Biometrics, 28:157–175, 1972.
E. J. Elton and M. J. Gruber. Estimating the dependence structure of share prices: implications for portfolio selection. Journal of Finance, 28(5):1203–1232, 1973.
R. F. Engle, O. Ledoit, and M. Wolf. Large dynamic covariance matrices. Journal of Business & Economic Statistics, 37(2):363–375, 2019. doi: 10.1080/07350015.2017.1345683.
J. Fan, Y. Liao, and M. Mincheva. Large covariance estimation by thresholding principal orthogonal complements. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75(4):603–680, 2013. doi: 10.1111/rssb.12016.
J. Fan, Y. Liao, and H. Liu. An overview of the estimation of large covariance and precision matrices. The Econometrics Journal, 19(1):C1–C32, 2016. doi: 10.1111/ectj.12061.
J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432–441, 2008. doi: 10.1093/biostatistics/kxm045.
P. A. Frost and J. E. Savarino. An empirical Bayes approach to efficient portfolio selection. The Journal of Financial and Quantitative Analysis, 21(3):293–305, 1986. doi: 10.2307/2331043.
S. Goto and Y. Xu. Improving mean variance optimization through sparse hedging restrictions. Journal of Financial and Quantitative Analysis, 50(6):1415–1441, 2015.
R. Halbleib and V. Voev. Forecasting covariance matrices: A mixed approach. Journal of Financial Econometrics, 14(2):383–417, 2016. doi: 10.1093/jjfinec/nbu031.
N. Hautsch, L. M. Kyj, and R. C. A. Oomen. A blocking and regularization approach to high-dimensional realized covariance estimation. Journal of Applied Econometrics, 27(4):625–645, 2012. doi: 10.1002/jae.1218.
N. L. Hjort. Pattern Recognition and Neural Networks. Cambridge University Press, 1996.
W. James and C. Stein. Estimation with quadratic loss. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics, pages 361–379. University of California Press, Berkeley, 1961.
J. D. Jobson and B. M. Korkie. Performance hypothesis testing with the Sharpe and Treynor measures. The Journal of Finance, 36(4):889–908, 1981.
O. Ledoit and M. Wolf. Improved estimation of the covariance matrix of stock returns with an application to portfolio selection. Journal of Empirical Finance, 10(5):603–621, 2003. doi: 10.1016/S0927-5398(03)00007-0.
O. Ledoit and M. Wolf. Honey, I shrunk the sample covariance matrix. The Journal of Portfolio Management, 30(4):110–119, 2004a.
O. Ledoit and M. Wolf. A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 88(2):365–411, 2004b. doi: 10.1016/S0047-259X(03)00096-4.
O. Ledoit and M. Wolf. Robust performance hypothesis testing with the Sharpe ratio. Journal of Empirical Finance, 15:850–859, 2008.
O. Ledoit and M. Wolf. Robust performance hypothesis testing with the variance. Wilmott, 2011(55):86–89, 2011. doi: 10.1002/wilm.10036.
O. Ledoit and M. Wolf. Nonlinear shrinkage estimation of large-dimensional covariance matrices. The Annals of Statistics, 40(2):1024–1060, 2012. doi: 10.1214/12-AOS989.
O. Ledoit and M. Wolf. Nonlinear shrinkage of the covariance matrix for portfolio selection: Markowitz meets Goldilocks. The Review of Financial Studies, 30(12):4349–4388, 2017a. doi: 10.1093/rfs/hhx052.
O. Ledoit and M. Wolf. Direct nonlinear shrinkage estimation of large-dimensional covariance matrices. University of Zurich, Department of Economics, Working Paper, 2017b.
O. Ledoit and M. Wolf. Optimal estimation of a large-dimensional covariance matrix under Stein's loss. Bernoulli, 24(4B):3791–3832, 2018a. doi: 10.3150/17-BEJ979.
O. Ledoit and M. Wolf. Analytical nonlinear shrinkage of large-dimensional covariance matrices. University of Zurich, Department of Economics, Working Paper, 264, 2018b.
X. Liu. Portfolio selection via shrinkage by cross validation. Journal of Finance and Accounting, 2(4):74–81, 2014.
H. M. Markowitz. Portfolio selection. The Journal of Finance, 7(1):77–91, 1952.
R. C. Merton. On estimating the expected return on the market: An exploratory investigation. Journal of Financial Economics, 8:323–361, 1980. doi: 10.3386/w0444.
R. O. Michaud. The Markowitz optimization enigma: Is 'optimized' optimal? Financial Analysts Journal, 45(1):31–42, 1989.
B. W. Silverman. Density Estimation for Statistics and Data Analysis, volume 26. CRC Press, 1986.
C. Stein. Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. Technical report, Stanford University, 1956.
C. Stein. Lectures on the theory of estimation of many parameters. Journal of Soviet Mathematics, 34(1):1373–1403, 1986. doi: 10.1007/BF01085007.
R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, 58(1):267–288, 1996.
G. Torri, R. Giacometti, and S. Paterlini. Sparse precision matrices for minimum variance portfolios. Computational Management Science, 16(3):375–400, 2019. doi: 10.1007/s10287-019-00344-6.
J. Tu and G. Zhou. Markowitz meets Talmud: A combination of sophisticated and naive diversification strategies. Journal of Financial Economics, 99(1):204–215, 2011. doi: 10.1016/j.jfineco.2010.08.013.
D. I. Warton. Penalized normal likelihood and ridge regularization of correlation and covariance matrices. Journal of the American Statistical Association, 103(481):340–349, 2008. doi: 10.1198/016214508000000021.
D. M. Witten, J. H. Friedman, and N. Simon. New insights and faster computations for the graphical lasso. Journal of Computational and Graphical Statistics, 20(4):892–900, 2011. doi: 10.1198/jcgs.2011.11051a.
M. Yuan and Y. Lin. Model selection and estimation in the Gaussian graphical model. Biometrika, 94(1):19–35, 2007. doi: 10.1093/biomet/asm018.
V. Zakamulin. A test of covariance-matrix forecasting methods. The Journal of Portfolio Management, 41(3):97–108, 2015. doi: 10.3905/jpm.2015.41.3.097.
A Covariance Parameters

[Figure 2, panels (a) 100RUA and (b) 250RUA: time series of the optimally selected shrinkage intensity (LW, LW CC), bandwidth's speed (LW NL, LW NLSF), number of factors (POET), and lasso penalty (GLASSO) for the Original, CV1, CV21, and CV22 methods.]

Fig. 2: Optimally selected parameters with original, CV1, CV21, and CV22 covariance estimation methods for the 100RUA and 250RUA datasets.
B GMV . . . . . . LW LW CC LW NL LW NLSF
POET GLASSO
CV21 CV1 Original . . . . . . LW LW CC LW NL LW NLSF
POET GLASSO
CV21 CV1 Original
Fig. 3: Relative differences in the annualized SD of GMV portfolios with the 100RUA and 250RUA datasets across the efficient covariance estimation methods.

Table: Differences in SD p.a. of GMV with the RUA dataset across different estimators.

[Symmetric color-coded table over Sample, LW, LW-CV, LW CC, LW CC-CV, LW NL, LW NL-CV, LW NLSF, LW NLSF-CV, POET, POET-CV, GLASSO, GLASSO-CV, with a final "better than % of models" row; the numeric entries were lost in extraction.]

This table shows the differences in the annualized out-of-sample SD of the GMV portfolios with the RUA dataset across the main covariance estimation methods and their CV-based counterparts. The table is constructed in a symmetrical way with an applied color scheme from red (higher SD than the other model) to green (lower SD than the other model). In addition, on the elements above the diagonal, significant pairwise outperformance in terms of variance is denoted by asterisks: *** denotes significance at the . level; ** denotes significance at the . level; and * denotes significance at the . level. Finally, for each model, we report the percentage of the other models that exhibit higher variance as a qualitative measure.

Table: Differences in SD p.a. of GMV with the RUA dataset across different estimators.

[Second symmetric color-coded table with the same row and column labels and a final "better than % of models" row; the numeric entries were lost in extraction.]
This table shows the differences in the annualized out-of-sample SD of the GMV portfolios with the RUA dataset across the main covariance estimation methods and their CV-based counterparts. The table is constructed in a symmetrical way with an applied color scheme from red (higher SD than the other model) to green (lower SD than the other model). In addition, on the elements above the diagonal, significant pairwise outperformance in terms of variance is denoted by asterisks: *** denotes significance at the . level; ** denotes significance at the . level; and * denotes significance at the . level. Finally, for each model, we report the percentage of the other models that exhibit higher variance as a qualitative measure.

[Figure residue, panel (c), GMV 130/30: grouped bars per estimator (LW, LW CC, LW NL, LW NLSF, POET, GLASSO); series: CV2, CV1, Original.]
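The comparison tables described above can be reproduced mechanically once the out-of-sample return series of each GMV variant are available. The sketch below is illustrative only: the function and variable names are ours, not the paper's, and the pairwise significance tests behind the asterisks are omitted. It computes the matrix of annualized SD differences and the "better than % of models" row.

```python
import numpy as np

def annualized_sd(returns, periods_per_year=252):
    """Annualized standard deviation of a (daily) return series."""
    return np.std(returns, ddof=1) * np.sqrt(periods_per_year)

def sd_difference_table(oos_returns, periods_per_year=252):
    """Pairwise differences in annualized out-of-sample SD.

    oos_returns: dict mapping estimator name -> 1-D array of
    out-of-sample GMV portfolio returns. Entry (i, j) of the result
    is SD_i - SD_j, so negative values mean row model i is less risky
    than column model j.
    """
    names = list(oos_returns)
    sds = np.array([annualized_sd(oos_returns[n], periods_per_year)
                    for n in names])
    diff = sds[:, None] - sds[None, :]  # antisymmetric, zero diagonal
    # share of the *other* models exhibiting strictly higher SD
    better = np.array([(sds > s).sum() / (len(sds) - 1) for s in sds])
    return names, diff, better
```

Feeding in the out-of-sample return series of, e.g., Sample, LW, and LW-CV would yield the upper-left block of such a table; the color scheme and significance stars would then be layered on top of `diff`.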