Cross-validated covariance estimators for high-dimensional minimum-variance portfolios
Financial Markets and Portfolio Management manuscript No. (will be inserted by the editor)
Sven Husmann · Antoniya Shivarova ∗ · Rick Steinert
Received: date / Accepted: date
Abstract
The global minimum-variance portfolio is a typical choice for investors because of its simplicity and broad applicability. Although it requires only one input, namely the covariance matrix of asset returns, estimating the optimal solution remains a challenge. In the presence of high-dimensionality in the data, the sample covariance estimator becomes ill-conditioned and leads to suboptimal portfolios out-of-sample. To address this issue, we review recently proposed efficient estimation methods for the covariance matrix and extend the literature by suggesting a multi-fold cross-validation technique for selecting the necessary tuning parameters within each method. Conducting an extensive empirical analysis with three datasets based on the Russell 3000, we show that choosing the specific tuning parameters with the proposed cross-validation improves the out-of-sample performance of the global minimum-variance portfolio. In addition, we identify estimators that are strongly influenced by the choice of the tuning parameter and detect a clear relationship between the selection criterion within the cross-validation and the evaluated performance measure.
Keywords
Covariance Estimation · Portfolio Optimization · High-dimensionality · Cross-validation
JEL classification
G10 · G11 · C13 · C80

∗ Corresponding author. Email: [email protected]

Sven Husmann · Antoniya Shivarova · Rick Steinert
Europa-Universität Viadrina
Große Scharrnstraße 59
15230 Frankfurt (Oder), Germany
1 Introduction

Based on the simple but essential idea of diversification and an optimal risk-return profile of an investment strategy, the mean-variance model of Markowitz (1952) still represents the groundwork for portfolio optimization. In its original design, Markowitz portfolio theory assumes perfect knowledge about the expected value and variance of returns. For practical implementations, however, these parameters have to be estimated from historical data. The misspecifications due to estimation error can lead to strong deviations from optimality and therefore to an inferior out-of-sample performance (Jobson and Korkie 1981; Frost and Savarino 1986; Michaud 1989; Broadie 1993). This major drawback has been tackled from different perspectives in the financial literature. Some focus on estimation errors in the portfolio weights directly (see, e.g., Brodie et al. 2009; DeMiguel et al. 2009a), whereas others work on the inputs by improving expected returns and the covariance matrix.

In particular, portfolio weights are extremely sensitive to changes in expected returns (Best and Grauer 1991a,b), which in turn are more difficult to estimate than the covariances of returns (Merton 1980). It is therefore not surprising that a considerable part of recent academic research focuses on the global minimum-variance portfolio (GMV), as this does not depend on expected returns. However, even if investors decide to use the global minimum-variance portfolio, the estimation errors associated with the covariances can still lead to significant estimation errors in the portfolio weights, especially in a high-dimensional scenario.

We cover several approaches that have been shown to overcome these estimation issues and perform well in terms of out-of-sample variance. For instance, we discuss the linear shrinkage estimators of Ledoit and Wolf (2004a,b), designed to offer an optimal bias-variance trade-off between the sample covariance matrix and a structured target matrix.
Furthermore, we adopt the recent nonlinear shrinkage technique of Ledoit and Wolf (2017a), which is proven to be optimal under a variety of financially relevant loss functions (Ledoit and Wolf 2017a, 2018a). Moreover, we outline and implement the elaborate principal orthogonal complement thresholding (POET) estimator of Fan et al. (2013). In addition, we follow the findings of the recent empirical studies by Goto and Xu (2015) and Torri et al. (2019) and include the graphical least absolute shrinkage and selection operator (GLASSO).

More importantly, the selected covariance estimation methods share one thing in common: a regularization of the sample covariance is performed to optimize its out-of-sample performance. For example, linear shrinkage methods need an optimal shrinkage intensity to balance the included variance and bias, whereas the performance of the GLASSO depends on the level of sparsity, induced by a penalty parameter. The procedure for optimally identifying those tuning parameters often includes the choice of a specific loss function to be minimized. As often advocated, a loss function or measure of fit in the model estimation is best aligned with the evaluation framework (Christoffersen and Jacobs 2004; Ledoit and Wolf 2017a; Engle et al. 2019).

(DeMiguel et al. (2009b) additionally show that the mean-variance portfolio is outperformed out-of-sample by the minimum-variance portfolio not only in terms of risk, but also with respect to the return-risk ratio.)
To exploit those effects in more detail, we apply a nonparametric cross-validation (CV) technique with different selection criteria to determine the optimal parameters necessary for the calculation of all the considered covariance estimators.

Since we focus on enhancing the risk profile of the GMV portfolios, we choose two relevant risk-related measures for our cross-validated estimation methodology and the corresponding out-of-sample performance evaluation, namely the mean squared forecasting error (MSFE), as in Zakamulin (2015), and the out-of-sample portfolio variance. We show empirically that in most cases there exists a strong positive relation between the selection criterion within the CV and the respective out-of-sample performance measure. For instance, when the overall goal is to reduce the out-of-sample risk, then using CV with the portfolio variance as a measure of fit leads to lower risk than the original method. Similar results are documented by Liu (2014), although he only considers the most straightforward linear shrinkage as in Ledoit and Wolf (2003, 2004a,b). Here, we examine more recent and efficient estimation methods and identify those that can actually profit from CV. In detail, estimators that depend strongly on the choice of a specific tuning parameter within their derivation are more prone to be positively influenced by replacing the original solution with a cross-validated one.

Our contributions to the current literature on the subject of covariance and precision matrix estimation within the portfolio optimization framework can be summarized as follows. First, we show that recent advances in methods for high-dimensional covariance estimation lead to strong improvements in the risk profile of the GMV. In this context, we emphasize the distinct and often significant outperformance of the nonlinear shrinkage estimators. In line with the main discussion, we show that a model's outperformance with respect to out-of-sample portfolio variance does coincide with an identical objective within the CV.
Although the elaborate nonlinear shrinkage methods are not strongly influenced by applying the CV procedure, all the other cross-validated estimators perform better than their original counterparts. This advantage becomes even greater as the high-dimensionality of the data increases. Considering the MSFE, the results are straightforward for all estimation methods. If an investor aims to minimize this measure, the respective validation within the CV ought to be performed. Nonetheless, we analyze the inefficiency of the MSFE for high-dimensional asset returns' data, in particular because of a distorted calculation of the realized covariance matrix.

The rest of the paper is organized as follows. In Section 2, we review the considered covariance estimation methods and their properties. Section 3 outlines the suggested CV methodology with respect to its main characteristics: the procedure, the parameter set, and the selection criteria. We describe the empirical study in Section 4 with a strong focus on the chosen dataset, methodology and performance measures. In Section 5, we discuss the performance of classical and constrained GMV portfolios and analyze in detail the influence of cross-validated estimation among all considered datasets and methods. Section 6 summarizes the results and concludes.

2 Covariance Estimation Methods

2.1 Sample Covariance Matrix

The sample covariance matrix of asset returns is given by

\widehat{\Sigma}_S = \frac{1}{T-1} (R - \mathbf{1}\widehat{\mu}')' (R - \mathbf{1}\widehat{\mu}'),   (1)

where R ∈ R^{T×n} is the matrix of past asset returns with T observations and n stocks, \widehat{\mu} ∈ R^n is the vector of expected returns, here estimated with the sample mean, and \mathbf{1} is a T-dimensional vector of ones. As shown by Merton (1980), the sample covariance matrix is an asymptotically unbiased and consistent estimator of the true covariance matrix Σ, provided the number of observations T is large relative to the number of assets n, that is, provided the concentration ratio q = n/T is small. For a large number of assets, a concentration ratio of such magnitude is practically infeasible due to limited data availability and illiquidity issues. With a high ratio of the number of assets to the sample size, also called high-dimensionality, the sample covariance and its inverse exhibit a higher amount of estimation error, mainly due to the over- and underestimation of the respective eigenvalues. Moreover, for q >
1, the sample covariance becomes singular and the inverse cannot be calculated.

The sample estimator's instability and possible singularity in the case of high-dimensionality are a problem within the optimization of global minimum-variance portfolios, where the covariance matrix and, specifically, its inverse capture the dependency between asset returns and allow for the effect of diversification as a way of reducing risk. It is then straightforward that the accuracy of optimally estimated portfolio weights is directly related to the estimator's precision. As a solution, several alternative estimators have been proposed in the literature.

2.2 Linear Shrinkage

To produce more stable estimators of the covariance matrix, a linear shrinking procedure can be applied to the sample estimator toward a more structured target matrix \widehat{\Sigma}_T,

\widehat{\Sigma}_{LS} = s \widehat{\Sigma}_T + (1 - s) \widehat{\Sigma}_S,

where the constant s ∈ [0,
1] controls the shrinkage intensity, which is set higher the more ill-conditioned the sample estimator is and vice versa. In contrast to the unbiased but unstable sample covariance, a structured target matrix has little estimation error but tends to be biased. As a compromise, the convex combination of both uses the bias-variance trade-off by accepting more bias in-sample in exchange for less variance out-of-sample. This idea is central to the shrinkage methodology of Stein (1956) and James and Stein (1961). The respective linear shrinkage estimator is calculated as

\widehat{\Sigma}_{LW} = \widehat{s}\, \bar{\sigma} I_n + (1 - \widehat{s})\, \widehat{\Sigma}_S,   (2)

where \bar{\sigma} = \frac{1}{n} \sum_{j=1}^{n} \widehat{\sigma}_{jj} is the average of all individual sample variances and \widehat{s} is an optimal shrinkage intensity parameter. However, in the context of financial time series, it is beneficial to consider target matrices with reference to the correlation structure of asset returns.

Ledoit and Wolf (2004a) consider identical pairwise correlations between all n assets. The target matrix is therefore derived under the constant correlation matrix model of Elton and Gruber (1973), so that \widehat{\Sigma}_T = \widehat{\Sigma}_{CC}. While the variances are kept at their original sample values, the off-diagonal entries of the target matrix are estimated by assuming a constant average sample correlation \bar{\rho}. This results in \widehat{\Sigma}_{CC,ij} = \sqrt{\widehat{\sigma}_{ii}\,\widehat{\sigma}_{jj}}\; \bar{\rho}. The corresponding estimator is defined as

\widehat{\Sigma}_{LW_{CC}} = \widehat{s}\, \widehat{\Sigma}_{CC} + (1 - \widehat{s})\, \widehat{\Sigma}_S.   (3)

The level of the shrinkage \widehat{s} in Equations (2) and (3) can be obtained analytically.
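Equations (2) and (3) can be sketched in a few lines of numpy. The function below is our own illustration, not code from the paper: it builds either the scaled-identity target of Equation (2) or the constant-correlation target of Equation (3) and returns the convex combination for a given intensity s.

```python
import numpy as np

def linear_shrinkage(returns, s, target="identity"):
    """Convex combination s * target + (1 - s) * sample covariance."""
    T, n = returns.shape
    Rc = returns - returns.mean(axis=0)
    S = Rc.T @ Rc / (T - 1)                          # sample covariance, Eq. (1)
    if target == "identity":
        # Eq. (2): average sample variance times the identity matrix
        F = np.mean(np.diag(S)) * np.eye(n)
    else:
        # Eq. (3): constant-correlation target of Ledoit and Wolf (2004a)
        sd = np.sqrt(np.diag(S))
        corr = S / np.outer(sd, sd)
        rho_bar = (corr.sum() - n) / (n * (n - 1))   # average off-diagonal correlation
        F = rho_bar * np.outer(sd, sd)
        np.fill_diagonal(F, np.diag(S))              # keep the sample variances
    return s * F + (1 - s) * S
```

Setting s = 0 recovers the sample covariance and s = 1 the pure target; the cross-validation described later simply searches over a grid of s values.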
In particular, as shown by Ledoit and Wolf (2004a,b), asymptotically consistent estimators for the optimal linear shrinkage intensity are derived under the quadratic loss function

L(\widehat{\Sigma}, \Sigma) = ||\widehat{\Sigma} - \Sigma||_F^2,   (4)

known as the Frobenius loss, where the covariance estimator \widehat{\Sigma} is substituted with Equation (2) or (3). The finite-sample solution is found at the minimum of the expected value of the Frobenius loss, namely the mean squared error (MSE),

\widehat{s} = \arg\min_s \; E\!\left[ ||\widehat{\Sigma} - \Sigma||_F^2 \right].   (5)

The methodology behind this derivation can be applied to other shrinkage targets in a convex combination setting after an individually performed analysis and mathematical adaptation. Our cross-validation methodology, however, can be implemented for any linear shrinkage without further modifications, since we do not rely on the theoretically derived shrinkage intensity; instead, we search for an optimal value using CV.

2.3 Nonlinear Shrinkage

Without any assumption about the true covariance matrix, the positive-definite rotationally equivariant nonlinear shrinkage is based on the spectral decomposition of the sample covariance matrix and defined as

\widehat{\Sigma}_{LW_{NL}} = V \widehat{\Lambda}_{NL} V',   (6)

where V = [v_1, ..., v_n] is the orthogonal matrix with the sample eigenvectors v_i as columns and \widehat{\Lambda}_{NL} is the diagonal matrix of the sample eigenvalues \lambda_i, shrunk by a nonlinear shrinkage function \widehat{\varphi}. To find the optimal \widehat{\varphi}^*, Ledoit and Wolf (2012) originally minimize the MSE in finite samples. (This class of estimators was first introduced by Stein (1986).) Without going into further details, we examine the practical implementation of the nonlinear shrinkage.
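The rotationally equivariant form of Equation (6) is easy to sketch: keep the sample eigenvectors and transform only the eigenvalues. The shrinkage function below is a deliberately naive stand-in that pulls every eigenvalue halfway toward the grand mean; it is NOT the kernel-based function of Ledoit and Wolf (2017a), only an illustration of the interface.

```python
import numpy as np

def rotation_equivariant(sample_cov, shrink_fn):
    """Eq. (6) skeleton: Sigma_NL = V diag(shrink_fn(lambda)) V'."""
    lam, V = np.linalg.eigh(sample_cov)   # eigenvalues ascending, V orthogonal
    return V @ np.diag(shrink_fn(lam)) @ V.T

def toy_shrink(lam):
    """Illustrative stand-in: pull every eigenvalue halfway to the grand mean."""
    return 0.5 * lam + 0.5 * lam.mean()
```

Any estimator of this class differs only in the choice of `shrink_fn`; the linear shrinkage toward the identity in Equation (2) is the special case of an affine eigenvalue map.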
The optimal solution is achieved using a nonparametric variable-bandwidth kernel estimation of the limiting spectral density of the sample eigenvalues and its Hilbert transform. The speed at which the bandwidth vanishes in the number of assets n can be set to −1/2 or −1/5; Ledoit and Wolf (2017a) choose the average of the two, −0.35. Within the suggested CV technique, we aim to verify whether this exact choice of the kernel bandwidth's speed is crucial for the estimator's efficiency and whether the out-of-sample performance can be improved by an in-sample validation.

(Most recently, Ledoit and Wolf (2018b) reach an analytical solution by replacing the complex-valued Stieltjes transform (Ledoit and Wolf 2017a) of the limiting distribution of the sample eigenvalues by the Hilbert transform, which acts as a local attraction force. As a result, each sample eigenvalue is shrunk toward its closest and most numerous neighbors.)

2.4 Approximate Factor Model

The previously outlined methods for improved high-dimensional covariance estimation do not assume any structural knowledge about the covariance matrix and regularize only the sample eigenvalues \lambda_i. An underlying structure could be established by regularizing the sample eigenvectors v_i, for example if the covariance matrix itself is assumed to be sparse (see, e.g., Bickel and Levina 2008; Cai and Liu 2011). Unfortunately, this is not appropriate for financial time series because of the presence of common factors (Fan et al. 2013). However, if there is only conditional sparsity, the covariance matrix of investment returns can be estimated using factor models given by

\widehat{\Sigma}_{FM} = B \widehat{\Sigma}_F B' + \widehat{\Sigma}_u,

where \widehat{\Sigma}_F is the sample covariance matrix of the common factors and \widehat{\Sigma}_u is the residual covariance matrix. (Following this definition and assuming K common factors with K < n, a covariance matrix estimator based on factor models only needs to estimate K(K+1)/2 + nK + n parameters instead of the n(n+1)/2 free entries of the full covariance matrix.) One disadvantage of such exact factor models is the strong assumption of no correlation in the error terms across assets; that is, the error covariance matrix \widehat{\Sigma}_u is assumed to contain only the sample variances of the residuals. Therefore, possible cross-sectional correlations are neglected after separating the common factors (Fan et al. 2013). Instead, approximate factor models allow for off-diagonal values within the error covariance matrix.

The POET estimator is one of the most recent and efficient estimators from this branch of research. Using the close connection between factor models and principal component analysis, Fan et al. (2013) infer the necessary factor loadings by running a singular value decomposition on the sample covariance matrix as

\widehat{\Sigma}_S = \sum_{i=1}^{K} \lambda_i v_i v_i' + \sum_{i=K+1}^{n} \lambda_i v_i v_i'.

The covariance formed by the first K principal components contains most of the information about the implied structure. The rest is assumed to be an approximately sparse matrix, estimated by applying an adaptive thresholding procedure (Cai and Liu 2011) with a threshold parameter c. As a result, the POET estimator becomes

\widehat{\Sigma}_{POET} = \sum_{i=1}^{K} \lambda_i v_i v_i' + \widehat{\Sigma}^{c}_{u,K}.   (7)

As argued by Fan et al. (2013), for high-dimensional asset returns with a sufficiently large n → ∞, the number of factors K can be inferred from the data, for example with

\widehat{K} = \arg\min_{0 \le k \le k_{\max}} \; \log\!\left( \frac{1}{nT} \left|\left| R - \frac{1}{T} F_k F_k' R \right|\right|_F^2 \right) + k\, g(T, n),   (8)

where k_{\max} is the predefined maximum number of factors, R is the matrix of asset returns with sample covariance matrix \widehat{\Sigma}_S, F_k is a T × k matrix whose columns are the eigenvectors corresponding to the k largest eigenvalues of the T × T matrix RR', and g(T, n) is a penalty function of the type introduced by Bai and Ng (2002).
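A minimal sketch of Equation (7) follows. To keep it short, it applies a constant soft threshold c to the off-diagonal entries of the residual covariance, whereas Fan et al. (2013) use an adaptive, correlation-based threshold; the function name and simplification are ours.

```python
import numpy as np

def poet(sample_cov, K, c):
    """POET sketch: top-K principal components plus a thresholded residual."""
    lam, V = np.linalg.eigh(sample_cov)
    order = np.argsort(lam)[::-1]                 # sort eigenvalues descending
    lam, V = lam[order], V[:, order]
    low_rank = (V[:, :K] * lam[:K]) @ V[:, :K].T  # sum of top-K lambda_i v_i v_i'
    resid = sample_cov - low_rank
    # soft-threshold the off-diagonal residual entries, keep the variances
    off = np.sign(resid) * np.maximum(np.abs(resid) - c, 0.0)
    np.fill_diagonal(off, np.diag(resid))
    return low_rank + off
```

With c = 0 the estimator reproduces the sample covariance exactly; as c grows, the residual part shrinks toward a diagonal matrix, which is the conditional-sparsity idea in the text.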
In this study we further examine whether the proposed CV approach can select optimal values for K by considering the out-of-sample performance measure of interest as a selection criterion.

(For the operational use of POET, the threshold value c needs to be determined so that the positive-definiteness of \widehat{\Sigma}^{c}_{u,K} is assured in finite samples. The choice of c can therefore occur from a set for which the respective minimal eigenvalue of the errors' covariance matrix after thresholding is positive. The minimal constant c that guarantees positive-definiteness is then chosen. For more details, see Fan et al. (2013).)

2.5 Graphical Model

A proper estimation of the covariance matrix of returns is crucial in a portfolio optimization context, since its inverse Θ = Σ^{-1} is the direct input parameter necessary for exploiting diversification effects upon optimization. Instead of imposing a factor structure on the covariance matrix with a sparse error covariance as in POET, sparsity in the precision matrix can be a valid approach for reducing estimation errors, especially in the case of conditional independence among asset pairs (Fan et al. 2016). In detail, the entry Θ_{ij} = 0 if and only if asset returns r_i and r_j are independent, conditional on the other assets in the investment universe. Since graphical models are used to describe both the conditional and unconditional dependence structures of a set of variables, the estimation of Θ is closely related to graphs under a Gaussian model. The identification of zeros in the inverse can be performed with the Gaussian graphical model, since within the Markowitz portfolio optimization framework asset returns are assumed to follow a multivariate normal distribution. One of the most commonly used methods for inducing sparsity on the precision matrix is by penalizing the maximum likelihood. For i.i.d.
R with R ∼ N(0, Σ), the Gaussian log-likelihood function is given by

L(Θ) = \log|Θ| - \mathrm{tr}\!\left( \widehat{\Sigma}_S Θ \right),   (9)

where |·| denotes the determinant and tr(·) the trace of a matrix. Maximizing Equation (9) alone yields the known maximum-likelihood estimator for the precision matrix \widehat{Θ}_S, which suffers from high estimation error in the case of high-dimensionality. To reduce such errors, the maximum log-likelihood function can be penalized by adding a lasso penalty (Tibshirani 1996) on the precision matrix entries as

L(Θ) = \log|Θ| - \mathrm{tr}\!\left( \widehat{\Sigma}_S Θ \right) - \rho\, ||Θ^{-}||_1,   (10)

where ||Θ^{-}||_1 is the L_1-norm (the sum of the absolute values) of the matrix Θ^{-}, an n × n matrix with the off-diagonal elements equal to the corresponding elements of the precision matrix Θ and the diagonal elements equal to zero. Furthermore, ρ is a penalty parameter that controls the sparsity level, with higher ρ values leading to a larger number of off-diagonal zero elements within the resulting estimator.

The penalized likelihood framework for a sparse graphical model estimation was first proposed by Yuan and Lin (2007), who solve Equation (10) with an interior-point method. Banerjee et al. (2008) show that the problem is convex and solve it for Σ with a box-constrained quadratic program. To date, the fastest available solution for the sparse graphical model in Equation (10) is reached with the GLASSO algorithm, developed by Friedman et al. (2008) and later improved by Witten et al. (2011). They demonstrate that the above formulation is equivalent to coupled lasso problems and solve them using a coordinate descent procedure.

In addition to a well-performing algorithm, the value of ρ is necessary for calculating the optimal GLASSO estimator.
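The penalized objective of Equation (10) is straightforward to evaluate for a candidate precision matrix; the short helper below (our own naming) makes the role of the off-diagonal penalty explicit.

```python
import numpy as np

def penalized_loglik(theta, S, rho):
    """Objective of Eq. (10): log|Theta| - tr(S Theta) - rho * ||Theta^-||_1,
    where Theta^- zeroes out the diagonal so variances are not penalized."""
    sign, logdet = np.linalg.slogdet(theta)       # numerically stable log-determinant
    off = theta - np.diag(np.diag(theta))         # the matrix Theta^- from the text
    return logdet - np.trace(S @ theta) - rho * np.abs(off).sum()
```

Because the penalty touches only `off`, increasing ρ can never reward larger off-diagonal entries, which is exactly why higher ρ values drive more of them to zero in the maximizer.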
For this purpose, Yuan and Lin (2007) suggest using the Bayesian information criterion (BIC), defined for each ρ as

\mathrm{BIC}(\rho) = -\log\left| \widehat{Θ}_\rho \right| + \mathrm{tr}\!\left( \widehat{\Sigma}_S \widehat{Θ}_\rho \right) + \frac{\log(T)}{2} \sum_{i \ne j} \mathbf{1}\{\widehat{Θ}_{\rho,ij} \ne 0\},   (11)

where the indicator function \mathbf{1}\{\widehat{Θ}_{\rho,ij} \ne 0\} counts the number of nonzero off-diagonal elements in the estimated precision matrix. The value of ρ corresponding to the lowest BIC is chosen as the optimal lasso penalty parameter. The choice of the BIC as a selection criterion for ρ is further justified by the relation between the penalized problem in Equation (10) and model selection criteria (Goto and Xu 2015). Although Yuan and Lin (2007) argue that a CV procedure for an optimal lasso penalty can yield better out-of-sample results, the existing financial applications estimate ρ only once in-sample. By contrast, we additionally perform a multi-fold CV with risk-related selection criteria. The exact methodology is described in the next section.

(The idea of identifying zeros in the inverse covariance matrix was first proposed by Dempster (1972) with the so-called covariance selection model. Excluding the diagonal from the penalty in Equation (10) ensures that the asset returns' sample variances remain unpenalized.)
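The in-sample BIC selection can be sketched as below. This assumes scikit-learn is available: its `GraphicalLasso` solves Equation (10) (the penalty is called `alpha` there), and the BIC follows the reconstructed Equation (11); the function name `glasso_bic` is ours.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

def glasso_bic(X, rho_grid):
    """Fit GLASSO over a grid of penalties and pick rho by the BIC of Eq. (11)."""
    T = X.shape[0]
    S = np.cov(X, rowvar=False)
    best_bic, best_rho, best_theta = np.inf, None, None
    for rho in rho_grid:
        theta = GraphicalLasso(alpha=rho, max_iter=200).fit(X).precision_
        # nonzero off-diagonal entries (both triangles, as in the double sum)
        nonzero = np.count_nonzero(theta) - np.count_nonzero(np.diag(theta))
        sign, logdet = np.linalg.slogdet(theta)
        bic = -logdet + np.trace(S @ theta) + 0.5 * np.log(T) * nonzero
        if bic < best_bic:
            best_bic, best_rho, best_theta = bic, rho, theta
    return best_rho, best_theta
```

The cross-validated alternative discussed in the paper keeps the same grid but replaces the BIC with an out-of-sample measure of fit.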
3 Cross-Validation Methodology

Each of the outlined covariance estimators includes an exogenous or data-dependent parameter. The linear shrinkage estimators in Equations (2) and (3) are calculated with an optimal shrinkage intensity \widehat{s}. For the more general nonlinear shrinkage, Ledoit and Wolf (2017a) set the kernel bandwidth's speed at −0.35 as the average of two recognized approaches. The approximate factor model, the POET estimator by Fan et al. (2013), deals with an unknown number of factors K, which is identified by minimizing popular information criteria. Finally, the GLASSO estimator proposed by Friedman et al. (2008) needs an optimal choice of the penalty parameter ρ, often estimated by minimizing the BIC in-sample. (Goto and Xu (2015) induce sparsity to enhance robustness and lower the estimation error within portfolio hedging strategies, Brownlees et al. (2018) develop a procedure called "realized network" by applying GLASSO as a regularization procedure for realized covariance estimators, and Torri et al. (2019) analyze the out-of-sample performance of a minimum-variance portfolio estimated with GLASSO.)

To clarify our analysis, we refer to these estimation methods as "original". In addition, we adopt a nonparametric technique, a multi-fold CV, to identify the necessary parameter for each estimation method. Instead of relying on pre-specified assumptions and deriving corresponding solutions individually, we perform a grid search over a domain of values and find the best possible parameter for two exemplary out-of-sample selection criteria.

3.1 Parameter Set

To employ a cross-validated choice, we first need to specify a domain of possible values for the necessary parameters that should be selected within the CV procedure. For this purpose we create a sequence (or grid) of arbitrary parameters δ ∈ ∆ for each covariance model. Depending on the chosen length of the sequence, the CV can be computationally time-consuming. Since the choice of this sequence is crucial for the out-of-sample efficiency of the methodology, the domain of possible parameters has to be individually evaluated for each estimation method by considering the trade-off between desired precision and computing time.
Subsection 4.2 outlines the examined sequences for the considered covariance estimation methods.

3.2 Cross-Validation Procedure

The CV is a model validation technique designed to assess how an estimated model would perform on an unknown dataset. To evaluate the model accuracy, the available dataset is repeatedly split into a training and a testing subset in a rolling-window fashion (see, e.g., Hjort 1996; Arlot and Celisse 2010). For instance, in the case of an m-fold CV, a dataset with τ observations is split into m equal parts. The first rolling window then uses as a training dataset the first fold, consisting of the first ν < τ observations ordered by time. Upon this, the consecutive υ observations are used as a test dataset to validate the performed estimation. This is done iteratively m times by shifting the training window by υ observations and therefore maintaining the chronological order within the data.

In our setting, for each of the pre-defined parameters we successively use the training data to calculate a covariance matrix estimator \widehat{\Sigma}_{t,δ} for a test dataset t and a specific parameter δ. During the following validation stage, we must set selection criteria, also referred to as measures of fit, to identify which parameter performs best. In this study, we investigate two common objectives within the field of portfolio risk minimization.

As often argued, the squared forecasting error (SFE) or, as defined in Section 2, the Frobenius loss is minimized to find a covariance estimator with the least forecasting error (see, e.g., Zakamulin 2015). Specifically, we first calculate a realized covariance matrix for the test dataset with

\Sigma_t = \frac{1}{\upsilon} (R_t - \mathbf{1}\widehat{\mu}_t')' (R_t - \mathbf{1}\widehat{\mu}_t'),

where R_t ∈ R^{υ×n} are the asset returns from the test dataset and \widehat{\mu}_t is the vector of average returns for the testing period consisting of υ observations.
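The rolling m-fold scheme with the SFE-based selection can be sketched as follows. Here `estimator(train, delta)` stands for any of the parameterized covariance methods above, and the fold sizes ν and υ are plain arguments; names and structure are our illustration, not the paper's code.

```python
import numpy as np

def realized_cov(R_test):
    """Realized covariance of a test fold, scaled by the fold length upsilon."""
    Rc = R_test - R_test.mean(axis=0)
    return Rc.T @ Rc / R_test.shape[0]

def cv1_select(returns, grid, estimator, nu, upsilon, m):
    """Choose the delta whose estimator minimizes the average squared
    Frobenius distance to the realized covariance over m rolling folds."""
    scores = np.zeros(len(grid))
    for fold in range(m):
        start = fold * upsilon                      # shift window by upsilon
        train = returns[start:start + nu]
        test = returns[start + nu:start + nu + upsilon]
        real = realized_cov(test)
        for g, delta in enumerate(grid):
            scores[g] += np.sum((estimator(train, delta) - real) ** 2)
    return grid[int(np.argmin(scores))]
```

Replacing the squared-Frobenius accumulator with the variance of out-of-sample portfolio returns yields the second selection criterion described below in the text.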
Then, we find the corresponding SFE as

||\widehat{\Sigma}_{t,δ} - \Sigma_t||_F^2.

This procedure is repeated m times, so that we end up with m SFE values for each δ. From the parameter set we then choose the δ for which the average (over all m iterations) SFE is minimized. In our empirical study, CV with the SFE as a measure of fit is referred to as CV1. (For clarity in the notation, we do not differentiate between covariance estimators; the procedure is applied to all methods equally.)

Instead of the SFE, within a portfolio optimization framework one is generally more interested in whether a covariance estimator leads to lower out-of-sample risk of the optimal portfolio (see, e.g., Liu 2014; Ledoit and Wolf 2017a;
Engle et al. 2019). To incorporate and later investigate this concept, as our second scenario (CV2), we minimize the out-of-sample portfolio variance. In detail, with the covariance matrix \widehat{\Sigma}_{t,δ}, previously estimated with the training data, we calculate the optimal weights \widehat{w}_{t,δ} for a portfolio of our choice (e.g., the GMV). This then allows us to calculate the respective portfolio returns throughout the testing period with υ observations as

r^{p}_{t,δ} = \widehat{w}_{t,δ}' R_t.

This procedure is repeated m times, so that we end up with m portfolio return vectors for each δ. From the parameter set, we then choose the δ for which the empirical variance (over all m iterations) of those portfolio out-of-sample returns is minimized.

By applying different measures of fit within the CV, we explicitly address the importance of aligned selection criteria for the out-of-sample performance of each covariance estimation method. Moreover, we aim to verify whether the calibration of covariance parameters with a multi-fold CV yields better results out-of-sample than the original models.

4 Empirical Study

To exploit the above considerations, we perform an extensive empirical study of the suggested covariance estimation methods within a high-dimensional portfolio optimization context. For this purpose, we create fully invested as well as 130-30 portfolios and evaluate their out-of-sample performance for a range of commonly used measures. We additionally compare the original covariance parameters with their calibrated equivalents. The exact empirical construct is elaborated on in the following subsections.

4.1 Model Setup

For the empirical study, we focus on the GMV portfolio. The optimal weights for an investment period t are determined by minimizing the portfolio variance as

\widehat{w}_t = \arg\min_w \; w' \widehat{\Sigma}_t w \quad \text{s.t.} \quad \mathbf{1}_n' w = 1,   (12)

where \mathbf{1}_n is an n-dimensional vector of ones and \widehat{\Sigma}_t is an arbitrary covariance matrix estimator for the investment period t.
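The GMV problem in Equation (12) has the standard closed-form solution w = Σ^{-1}1 / (1'Σ^{-1}1), which also gives the per-fold CV2 criterion directly; the two helpers below (our own names) sketch both.

```python
import numpy as np

def gmv_weights(cov):
    """Closed-form GMV weights: w = Sigma^{-1} 1 / (1' Sigma^{-1} 1)."""
    w = np.linalg.solve(cov, np.ones(cov.shape[0]))  # solve instead of inverting
    return w / w.sum()                               # normalize to 1' w = 1

def cv2_score(cov_est, R_test):
    """CV2 criterion for one fold: variance of the out-of-sample GMV returns
    generated on the test data by weights fitted on the training data."""
    return (R_test @ gmv_weights(cov_est)).var(ddof=1)
```

Using `np.linalg.solve` avoids forming the explicit inverse, which matters precisely in the ill-conditioned high-dimensional settings the paper studies.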
This formulation has the analytical solution

\widehat{w}_t = \frac{\widehat{\Sigma}_t^{-1} \mathbf{1}_n}{\mathbf{1}_n' \widehat{\Sigma}_t^{-1} \mathbf{1}_n}.

We use an in-sample period of τ = 24 months (or roughly 504 days) and an out-of-sample period resulting in T − τ = 240 months (or 5029 days) of out-of-sample portfolio returns. Similarly to the original studies on the reviewed covariance estimation methods (Fan et al. 2013; Ledoit and Wolf 2017a), we employ a monthly rebalancing strategy, since this is more cost-efficient and common in practice. Within each rolling-window step, the covariance matrix of asset returns for the investment month t is estimated at the end of month t − 1. The POET estimator is calculated using the R-package POET provided by Fan et al. (2013). Finally, the GLASSO estimator is calculated with the algorithm provided by Friedman et al. (2008) within the R-package glasso, with no penalty on the diagonal elements and an in-sample selection of the lasso penalty using the BIC.

In addition to the models in Section 2, we calculate the cross-validated estimators as in Section 3 by implementing an m-fold CV. To determine the selection criteria for the respective CV methods, we choose m = 12 and therefore divide the in-sample observations into a training sample of 12 months (or 252 days) and a testing sample of one month (or 21 days). With this construction, we replicate the proposed monthly rebalancing strategy inside the performed CV. As introduced in Subsection 3.1, we additionally need to define a set of parameters for each covariance estimation method.

4.2 Parameter Sequences

Since both linear shrinkage methods in Equation (2) (LW) and Equation (3) (LW_CC) represent a weighted average between the sample and a target covariance matrix, we define a parameter set ∆ of G shrinkage intensities, such that ∆_LW = ∆_CC = (δ_1, δ_2, ..., δ_G) ∈ [0, 1]. For the nonlinear shrinkage (LW_NL) as well as for the single-factor nonlinear shrinkage (LW_NLSF), we set the kernel bandwidth's speed to lie between −0.5 and −0.2, i.e., ∆_NL = (δ_1, δ_2, ..., δ_G) ∈ [−0.5, −0.2]. For the GLASSO, we define a grid of penalty parameters
For the GLASSO estimator, we define a set of lasso penalties $\rho$, derived from the in-sample data. Specifically, we use a logarithmic sequence of $G$ = 50 penalty values between $e$ and $u$ as our $\rho$-generating function, where $u$ is the maximal absolute value of the sample covariance matrix estimated with the training dataset and $e$ is a small fixed fraction of $u$.

After calculating all the possible combinations of original and cross-validated estimators within the validation subset, we choose an optimal parameter for each covariance estimation method, as outlined in Subsection 3.2, and use all the in-sample data to estimate the covariance matrix for the next investment month. Since the reviewed estimation methods and our methodology do not model time-dependency in the covariance matrix, the estimate is held fixed over the investment month. We use $\hat{\Sigma}_t$ to find the optimal weights $\hat{w}_t$, as in Equation (12) and its analytical solution. With these weights, we calculate the out-of-sample portfolio returns for each model in month $t$. This procedure is repeated until the end of our investment horizon.

4.2 Benchmark Strategies

First, we include the equally-weighted portfolio, hereafter also referred to as the Naive portfolio. This strategy implies an identity covariance matrix and hence does not include any estimation risk (DeMiguel et al. 2009b). In addition to the Naive strategy, which is a standard benchmark when comparing induced transaction costs and turnover rates, we estimate the weights with the sample covariance matrix estimator, which serves as a benchmark for the out-of-sample risk. All these portfolios are evaluated with the performance measures presented in the following subsection.

4.3 Performance Measures

To evaluate the out-of-sample performance of each covariance matrix estimation method, we report different performance measures for the estimator's efficiency, the risk profile, and the allocation properties of the corresponding GMV and GMV-130-30 portfolios.
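The logarithmic penalty grid for GLASSO described in the model setup can be generated as follows. This is a hedged sketch: only $u$ (the maximal absolute entry of the training-sample covariance) and $G$ = 50 are taken from the text, while `frac`, the ratio fixing the lower end $e$ relative to $u$, is a hypothetical choice standing in for the paper's exact generating function.

```python
import numpy as np

def glasso_penalty_grid(train_returns, G=50, frac=0.01):
    """Logarithmically spaced lasso penalties between e = frac * u and u,
    where u is the largest absolute entry of the training-sample covariance.
    The value of `frac` is illustrative, not the one used in the paper."""
    S = np.cov(train_returns, rowvar=False)
    u = np.abs(S).max()
    e = frac * u
    return np.logspace(np.log10(e), np.log10(u), G)

rng = np.random.default_rng(1)
grid = glasso_penalty_grid(rng.normal(size=(252, 10)))
```

A logarithmic grid concentrates candidate penalties near zero, where the GLASSO solution is most sensitive to $\rho$.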
First, we calculate the MSFE as

$$\text{MSFE} = \frac{1}{T-\tau} \sum_{t=\tau+1}^{T} \sum_{i=1}^{n} \sum_{j=1}^{i} \left( \Sigma_{t,ij} - \hat{\Sigma}_{t,ij} \right)^2, \qquad (13)$$

where $\hat{\Sigma}_{t,ij}$ is the covariance matrix estimator and $\Sigma_{t,ij}$ is the realized covariance for month $t$. The MSFE is frequently used to measure the forecasting power of an estimation method. To avoid double counting of forecasting errors, each covariance pair enters the calculation only once, via the restriction $j \le i$.

Considering the nature of minimum-variance portfolios as risk-reduction strategies, we are especially interested in the out-of-sample standard deviation (SD) as a performance indicator. We calculate the SD of the 5029 out-of-sample portfolio returns and multiply it by $\sqrt{252}$ to annualize it. For a more detailed analysis of the out-of-sample risk of the constructed portfolios, and therefore, implicitly, of the covariance estimation methods, we perform the two-sided Parzen-kernel HAC test for differences in variances, as described by Ledoit and Wolf (2008) and Ledoit and Wolf (2011), and report the corresponding significance levels. Since we utilize daily returns, a sufficient number of observations is available and a bootstrap technique is not essential. For the sake of completeness, we have also performed a block bootstrap as in Ledoit and Wolf (2011); the corresponding significance values are comparable to those from the HAC test and are therefore not reported. Since the MSFE is closely related to the SFE optimality criterion within the CV1 method, we expect the respectively optimized covariance estimators to exhibit a lower MSFE than their original versions. Moreover, an estimation with the CV2 approach, based on minimizing the portfolio variance, is expected to result in a lower out-of-sample SD.

In practice, investors additionally need to address the problem of high transaction costs; hence, they prefer a more stable allocation for an optimal portfolio strategy. Therefore, as a proxy for the occurring transaction costs, we analyze the average monthly turnover, defined as

$$\text{Turnover} = \frac{1}{T-\tau-1} \sum_{t=\tau+1}^{T-1} \left\| \hat{w}_{t+1} - \hat{w}^{+}_{t} \right\|_1, \qquad (14)$$

where $\|\cdot\|_1$ denotes the $\ell_1$-norm and $\hat{w}^{+}_{t}$ denotes the portfolio weights at the end of the investment month $t$, scaled back to one. The turnover rate is thus the averaged sum of the absolute values of the monthly rebalancing trades across all $n$ assets and over all $T-\tau-1$ investment dates. The next section reports the detailed out-of-sample performance analysis and empirical results.

Considering the trend of the optimal linear shrinkage intensities for LW and LW CC, we observe that the original approaches of Ledoit and Wolf (2004a) and Ledoit and Wolf (2004b) are less reactive to changes in asset returns than our CV methodologies. The strong fluctuation in the selected shrinkage intensity for CV1, CV21, and CV22 results from the properties and functionality of the CV itself and implies fast adaptation to potentially changing market conditions and selection criteria. The other datasets produce similar results; for reference, see Figure 2 in Appendix A.

[Figure 1: time series of the optimally selected shrinkage intensity (LW, LW CC), bandwidth's speed (LW NL, LW NLSF), number of factors (POET), and lasso penalty (GLASSO), each for the Original, CV1, CV21, and CV22 methods.]

Fig. 1: Optimally selected parameters with original, CV1, CV21, and CV22 covariance estimation methods for the 500RUA dataset.

5.2 GMV Portfolio

Table 1 presents the central results of our empirical analysis of the GMV portfolio, reporting the three performance measures MSFE, SD, and average monthly turnover rate (TO), with SD and TO in percent. The rows indicate the portfolio strategies based on the covariance estimation. While the original estimators are denoted only by the respective name of the estimation method, the endings CV1 and CV21 represent the cross-validated approaches, as explained in the previous sections.

[Table 1: Performance of GMV portfolios across different estimators and datasets. The table reports the annualized out-of-sample SD and average monthly turnover (TO) (in percent) of the GMV portfolios as well as the monthly MSFE of the respective covariance estimators across all the considered datasets with 100, 250, and 500 stocks, respectively. Since the Naive portfolio strategy does not require a covariance estimator per definition, no values are reported for the MSFE. The lowest MSFE, SD, and TO for each estimation method are set in bold; the best results in terms of the MSFE and SD for each dataset are underlined, as is the lowest TO excluding the Naive portfolio.]

The compact representation of the results across datasets allows us to observe that, in the case of the enhanced covariance estimators, the annualized SD declines as more assets are included in the GMV portfolio. This is easily explained by the known power of diversification: the desirable effect of including more stocks in a portfolio. As expected, all the efficient covariance estimation methods perform better than the sample estimator in terms of out-of-sample risk for all the datasets, with more significant deviations for higher concentration ratios.

More importantly, we can detect the positive effect of an appropriate choice of selection criterion for determining the necessary covariance parameters. For all the datasets, minimizing the portfolio variance with the CV21 approach indeed leads to a lower out-of-sample SD for the linear shrinkage methods LW and LW CC and the GLASSO estimator. For the POET estimator, the CV21 method does not lead to consistent outperformance in terms of out-of-sample variance. Still, for the largest dataset 500RUA, POET-CV21 strongly outperforms its original counterpart.

For the CV1 approach, the investigation of the MSFE is mandatory. The values reported in Table 1 indicate the distinct effect of the CV1 approach on the minimization of the out-of-sample MSFE. For all the estimation methods and datasets, the MSFE is the lowest for the CV1 version of each estimator. Even robust estimators such as LW NL and LW NLSF exhibit higher forecasting power, measured by the MSFE, when the corresponding parameters are estimated with the CV1 approach. Nevertheless, it is noteworthy that the MSFE measure does not seem to proxy for the out-of-sample portfolio risk level.
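The MSFE values discussed here follow Equation (13); a minimal sketch with hypothetical inputs, counting each unique covariance pair only once via the triangular restriction:

```python
import numpy as np

def msfe(realized, estimated):
    """Mean squared forecast error across months: squared deviations between
    realized and estimated covariances, each unique pair (j <= i) counted once."""
    total = 0.0
    for Sigma, Sigma_hat in zip(realized, estimated):
        D = np.tril(Sigma - Sigma_hat)   # lower triangle incl. diagonal
        total += np.sum(D ** 2)
    return total / len(realized)

# two hypothetical "months" with 3 assets each
real = [np.eye(3), np.eye(3)]
est = [np.eye(3) * 1.1, np.eye(3)]
err = msfe(real, est)                    # (3 * 0.1**2 + 0) / 2 ≈ 0.015
```

In the empirical study, `real` would hold the monthly realized covariance matrices and `est` the corresponding forecasts of a given estimator.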
Within the financial literature, including Zakamulin (2015), the MSFE is studied in reference to datasets with low concentration ratios $q$. However, in a high-dimensional setting, a lower MSFE does not coincide with a lower out-of-sample SD for any of the datasets or estimation methods. Under the CV1 method, the SFE is computed as an estimator's squared distance to the monthly realized covariance matrix, calculated from daily returns (roughly 21 days) for $n$ assets. The implied concentration ratios, ranging from $100/21 \approx 4.76$ for the 100RUA dataset to $500/21 \approx 23.81$ for the 500RUA dataset, lead to ill-conditioned realized covariance matrices and a noisy SFE calculation. As a possible solution, recent financial studies have focused on improving the estimation of large realized covariance matrices (see, e.g., Hautsch et al. 2012; Callot et al. 2017; Bollerslev et al. 2018). Therefore, we focus our further analysis on the CV21 approach.

[Table 2: Differences in SD p.a. of GMV with the 500RUA dataset across different estimators. The table shows the differences in the annualized out-of-sample SD of the GMV portfolios across the main covariance estimation methods and their CV-based counterparts. It is constructed symmetrically; on the elements above the diagonal, significant pairwise outperformance in terms of variance is denoted by asterisks (*** for the 0.01 level, ** for the 0.05 level, and * for the 0.1 level), and for each model the overall relative outperformance in percent is reported.]

Table 2 should be read column-wise; that is, the difference in SD between the LW and the sample estimator is listed in the second column of the first row. For completeness, we construct the table symmetrically; still, we focus our attention on the elements above the diagonal only.

At first glance, we can distinguish the positive effect of the CV21 procedure on the out-of-sample risk of the linear shrinkage and GLASSO estimators. While the original estimation methods LW and LW CC are the worst performers for this asset universe, we observe an astonishing improvement when the linear shrinkage intensity is optimized for the out-of-sample portfolio variance with CV21. Both LW-CV21 and LW CC-CV21 result in a significantly lower out-of-sample SD than their original counterparts.

Another insight emerges from the comparison of LW CC-CV21 with LW NL. Although specifically designed to overcome the high-dimensionality problem, both the original and the CV21-based nonlinear shrinkage methods lead to higher out-of-sample risk than the cross-validated linear shrinkage estimator. This effect is observable for the 100RUA dataset as well (see, for reference, Table 5). As the difference is not statistically significant in any of the cases, we can only draw the qualitative conclusion that a methodologically easy-to-understand and simple-to-implement method can perform as well as a complex state-of-the-art estimator when the optimal tuning parameter (here, the shrinkage intensity) is identified with CV.

Table 1 additionally reports the average monthly turnover rate as a proxy for the transaction costs induced by monthly rebalancing. The Naive portfolio, being long-only and equally weighted by construction, naturally has the lowest turnover (approximately 0.06 on average across all the datasets). As expected, the GMV portfolios estimated with the sample covariance matrix are characterized by extreme exposures for all the datasets. On the other hand, an estimation with GLASSO has the most pronounced positive effect on the allocation characteristics of the GMV portfolio. In particular, the GLASSO-CV1 estimation methodology results in GMV portfolios with the lowest turnover for all the datasets. Interestingly, the estimator LW-CV21 leads to the second-lowest turnover rates across all the estimation methods for the datasets 100RUA and 250RUA. It seems that when the concentration ratio is tolerable, the linear shrinkage methodology, as a convex combination of the sample covariance and an identity matrix, produces satisfactory results. The underlying model in LW is equivalent to the introduction of a ridge-type penalty in the estimation (Warton 2008), which has been proven to induce stability. When the sample covariance matrix becomes ill-conditioned or even singular, the cross-validated choice of the linear shrinkage intensity reduces the turnover rate by approximately 68%. While LW shrinks the sample covariance matrix toward the identity matrix, GLASSO shrinks the precision matrix toward the identity matrix.

[Table 3: Differences in SD p.a. of GMV-130-30 with the 500RUA dataset across different estimators, constructed in the same way as Table 2.]
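The turnover figures compared in this subsection follow Equation (14); below is a minimal sketch of a single rebalancing date. The weight vectors are hypothetical, and the end-of-month weights are first scaled back to sum to one, as in the definition of $\hat{w}^{+}_{t}$.

```python
import numpy as np

def rebalancing_trade(w_next, w_end):
    """l1 distance between next month's target weights and the drifted
    end-of-month weights rescaled to sum to one (one summand of Eq. 14)."""
    w_plus = w_end / w_end.sum()
    return np.abs(w_next - w_plus).sum()

w_end = np.array([0.55, 0.30, 0.25])   # weights after one month of price drift
w_next = np.array([0.40, 0.35, 0.25])  # next month's optimized target weights
trade = rebalancing_trade(w_next, w_end)
```

The reported average monthly turnover is the mean of this quantity over all $T - \tau - 1$ rebalancing dates.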
Since the Naive portfolio corresponds to a GMV portfolio estimated with an identity covariance and hence precision matrix, one may suggest that both estimation methods result in an implicit shrinkage of the sample GMV portfolio weights toward an equally weighted portfolio, as in Tu and Zhou (2011), and therefore perform well in terms of turnover.

5.3 GMV-130-30

The table is structured similarly to Table 1, with the columns representing the investment universes (100RUA, 250RUA, and 500RUA) and the performance measures, while the rows indicate the portfolio strategies based on the considered covariance estimation methods. Since the examined constraint does not play any role in the CV1-based estimation of the covariance matrix, we do not report the MSFE values.

Moreover, Table 3 presents the differences in annualized SDs and the respective pairwise significance levels across all the main covariance estimation methods and their CV22-based counterparts for the high-dimensional case of the 500RUA dataset. In addition, Appendix C compares the other datasets. The first notable consequence of the gross exposure constraint is the improvement in portfolio performance for the case of the sample covariance matrix. Both for 100RUA and 250RUA, the sample estimator is significantly outperformed only by the LW NLSF, POET, and GLASSO estimators and their CV22 versions.

Finally, we examine the average monthly turnover rates, reported in Table 4. A similar reduction in turnover takes place in the case of the LW CC estimator as well.

[Table 4: Performance of GMV-130-30 portfolios across different estimators and datasets, reporting the annualized out-of-sample SD and average monthly turnover (in percent) for the 100RUA, 250RUA, and 500RUA universes.]
6 Conclusion

In this study, we review some of the most recent and efficient estimation methods for high-dimensional minimum-variance portfolios. We extend the current research by proposing a CV methodology to determine the corresponding tuning parameters, such as the linear shrinkage intensity and the sparsity penalty term.

In a detailed empirical analysis with three high-dimensional datasets, we identify the characteristics of our approach. First, we establish that the selection criterion within the CV should correspond to the performance measure of interest. We show that the lowest overall out-of-sample portfolio risk is indeed generated when we select the optimal tuning parameters by minimizing the portfolio variance with the proposed CV.
We additionally demonstrate that a CV methodology is beneficial to estimators whose performance depends strongly on the embedded tuning parameters, as is the case with the linear shrinkage, POET, and GLASSO estimation methods. Even complex and highly efficient estimators can be surpassed by simpler approaches if the corresponding tuning parameters are calibrated efficiently. One of the reasons for this observation is the rapid adaptation of the CV toward ever-changing market situations and asset returns.

Furthermore, in this paper we investigate only high-dimensional covariance estimation methods that assume homoscedasticity in the returns. Since we observe a time-varying parameter selection with the CV approach and a resulting improvement in the out-of-sample performance, we argue that the combination of cross-validated parameter selection and time-dependent high-dimensional covariance estimators, as recently proposed by Halbleib and Voev (2016) and Engle et al. (2019), is an important topic for future research.
References
S. Arlot and A. Celisse. A survey of cross-validation procedures for model selection. Statistics Surveys, 4:40–79, 2010. doi: 10.1214/09-SS054.
J. Bai and S. Ng. Determining the number of factors in approximate factor models. Econometrica, 70(1):191–221, 2002. doi: 10.1111/1468-0262.00273.
O. Banerjee, L. El Ghaoui, and A. d'Aspremont. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. Journal of Machine Learning Research, 9(3):485–516, 2008.
M. J. Best and R. R. Grauer. On the sensitivity of mean-variance-efficient portfolios to changes in asset means: some analytical and computational results. The Review of Financial Studies, 4(2):315–342, 1991a.
M. J. Best and R. R. Grauer. Sensitivity analysis for mean-variance portfolio problems. Management Science, 37(8):980–989, 1991b.
P. J. Bickel and E. Levina. Covariance regularization by thresholding. The Annals of Statistics, 36(6):2577–2604, 2008. doi: 10.1214/08-AOS600.
T. Bollerslev, A. J. Patton, and R. Quaedvlieg. Modeling and forecasting (un)reliable realized covariances for more reliable financial decisions. Journal of Econometrics, 207(1):71–91, 2018. doi: 10.1016/j.jeconom.2018.05.004.
M. Broadie. Computing efficient frontiers using estimated parameters. Annals of Operations Research, 45(1):21–58, 1993. doi: 10.1007/BF02282040.
J. Brodie, I. Daubechies, C. De Mol, D. Giannone, and I. Loris. Sparse and stable Markowitz portfolios. Proceedings of the National Academy of Sciences, 106(30):12267–12272, 2009. doi: 10.1073/pnas.0904287106.
C. Brownlees, E. Nualart, and Y. Sun. Realized networks. Journal of Applied Econometrics, 33(7):986–1006, 2018. doi: 10.1002/jae.2642.
T. Cai and W. Liu. Adaptive thresholding for sparse covariance matrix estimation. Journal of the American Statistical Association, 106(494):672–684, 2011. doi: 10.1198/jasa.2011.tm10560.
L. A. F. Callot, A. B. Kock, and M. C. Medeiros. Modeling and forecasting large realized covariance matrices and portfolio choice. Journal of Applied Econometrics, 32(1):140–158, 2017. doi: 10.1002/jae.2512.
P. Christoffersen and K. Jacobs. The importance of the loss function in option valuation. Journal of Financial Economics, 72:291–318, 2004.
V. DeMiguel, L. Garlappi, F. J. Nogales, and R. Uppal. A generalized approach to portfolio optimization: Improving performance by constraining portfolio norms. Management Science, 55(5):798–812, 2009a. doi: 10.1287/mnsc.1080.0986.
V. DeMiguel, L. Garlappi, and R. Uppal. Optimal versus naive diversification: How inefficient is the 1/N portfolio strategy? Review of Financial Studies, 22(5):1915–1953, 2009b. doi: 10.1093/rfs/hhm075.
A. P. Dempster. Covariance selection. Biometrics, 28:157–175, 1972.
E. J. Elton and M. J. Gruber. Estimating the dependence structure of share prices: implications for portfolio selection. Journal of Finance, 28(5):1203–1232, 1973.
R. F. Engle, O. Ledoit, and M. Wolf. Large dynamic covariance matrices. Journal of Business & Economic Statistics, 37(2):363–375, 2019. doi: 10.1080/07350015.2017.1345683.
J. Fan, Y. Liao, and M. Mincheva. Large covariance estimation by thresholding principal orthogonal complements. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75(4):603–680, 2013. doi: 10.1111/rssb.12016.
J. Fan, Y. Liao, and H. Liu. An overview of the estimation of large covariance and precision matrices. The Econometrics Journal, 19(1):C1–C32, 2016. doi: 10.1111/ectj.12061.
J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432–441, 2008. doi: 10.1093/biostatistics/kxm045.
P. A. Frost and J. E. Savarino. An empirical Bayes approach to efficient portfolio selection. The Journal of Financial and Quantitative Analysis, 21(3):293–305, 1986. doi: 10.2307/2331043.
S. Goto and Y. Xu. Improving mean variance optimization through sparse hedging restrictions. Journal of Financial and Quantitative Analysis, 50(6):1415–1441, 2015.
R. Halbleib and V. Voev. Forecasting covariance matrices: A mixed approach. Journal of Financial Econometrics, 14(2):383–417, 2016. doi: 10.1093/jjfinec/nbu031.
N. Hautsch, L. M. Kyj, and R. C. A. Oomen. A blocking and regularization approach to high-dimensional realized covariance estimation. Journal of Applied Econometrics, 27(4):625–645, 2012. doi: 10.1002/jae.1218.
N. L. Hjort. Pattern Recognition and Neural Networks. Cambridge University Press, 1996.
W. James and C. Stein. Estimation with quadratic loss. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics, pages 361–379. University of California Press, Berkeley, 1961.
J. D. Jobson and B. M. Korkie. Performance hypothesis testing with the Sharpe and Treynor measures. The Journal of Finance, 36(4):889–908, 1981.
O. Ledoit and M. Wolf. Improved estimation of the covariance matrix of stock returns with an application to portfolio selection. Journal of Empirical Finance, 10(5):603–621, 2003. doi: 10.1016/S0927-5398(03)00007-0.
O. Ledoit and M. Wolf. Honey, I shrunk the sample covariance matrix. The Journal of Portfolio Management, 30(4):110–119, 2004a.
O. Ledoit and M. Wolf. A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 88(2):365–411, 2004b. doi: 10.1016/S0047-259X(03)00096-4.
O. Ledoit and M. Wolf. Robust performance hypothesis testing with the Sharpe ratio. Journal of Empirical Finance, 15:850–859, 2008.
O. Ledoit and M. Wolf. Robust performance hypothesis testing with the variance. Wilmott, 2011(55):86–89, 2011. doi: 10.1002/wilm.10036.
O. Ledoit and M. Wolf. Nonlinear shrinkage estimation of large-dimensional covariance matrices. The Annals of Statistics, 40(2):1024–1060, 2012. doi: 10.1214/12-AOS989.
O. Ledoit and M. Wolf. Nonlinear shrinkage of the covariance matrix for portfolio selection: Markowitz meets Goldilocks. The Review of Financial Studies, 30(12):4349–4388, 2017a. doi: 10.1093/rfs/hhx052.
O. Ledoit and M. Wolf. Direct nonlinear shrinkage estimation of large-dimensional covariance matrices. University of Zurich, Department of Economics, Working Paper, 2017b.
O. Ledoit and M. Wolf. Optimal estimation of a large-dimensional covariance matrix under Stein's loss. Bernoulli, 24(4B):3791–3832, 2018a. doi: 10.3150/17-BEJ979.
O. Ledoit and M. Wolf. Analytical nonlinear shrinkage of large-dimensional covariance matrices. University of Zurich, Department of Economics, Working Paper, 264, 2018b.
X. Liu. Portfolio selection via shrinkage by cross validation. Journal of Finance and Accounting, 2(4):74–81, 2014.
H. M. Markowitz. Portfolio selection. The Journal of Finance, 7(1):77–91, 1952.
R. C. Merton. On estimating the expected return on the market: An exploratory investigation. Journal of Financial Economics, 8:323–361, 1980. doi: 10.3386/w0444.
R. O. Michaud. The Markowitz optimization enigma: Is 'optimized' optimal? Financial Analysts Journal, 45(1):31–42, 1989.
B. W. Silverman. Density Estimation for Statistics and Data Analysis, volume 26. CRC Press, 1986.
C. Stein. Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. Technical report, Stanford University, 1956.
C. Stein. Lectures on the theory of estimation of many parameters. Journal of Soviet Mathematics, 34(1):1373–1403, 1986. doi: 10.1007/BF01085007.
R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, 58(1):267–288, 1996.
G. Torri, R. Giacometti, and S. Paterlini. Sparse precision matrices for minimum variance portfolios. Computational Management Science, 16(3):375–400, 2019. doi: 10.1007/s10287-019-00344-6.
J. Tu and G. Zhou. Markowitz meets Talmud: A combination of sophisticated and naive diversification strategies. Journal of Financial Economics, 99(1):204–215, 2011. doi: 10.1016/j.jfineco.2010.08.013.
D. I. Warton. Penalized normal likelihood and ridge regularization of correlation and covariance matrices. Journal of the American Statistical Association, 103(481):340–349, 2008. doi: 10.1198/016214508000000021.
D. M. Witten, J. H. Friedman, and N. Simon. New insights and faster computations for the graphical lasso. Journal of Computational and Graphical Statistics, 20(4):892–900, 2011. doi: 10.1198/jcgs.2011.11051a.
M. Yuan and Y. Lin. Model selection and estimation in the Gaussian graphical model. Biometrika, 94(1):19–35, 2007. doi: 10.1093/biomet/asm018.
V. Zakamulin. A test of covariance-matrix forecasting methods. The Journal of Portfolio Management, 41(3):97–108, 2015. doi: 10.3905/jpm.2015.41.3.097.
A Covariance Parameters

[Figure 2, panels (a) 100RUA and (b) 250RUA: time series of the optimally selected shrinkage intensity (LW, LW CC), bandwidth's speed (LW NL, LW NLSF), number of factors (POET), and lasso penalty (GLASSO) for the Original, CV1, CV21, and CV22 methods.]

Fig. 2: Optimally selected parameters with original, CV1, CV21, and CV22 covariance estimation methods for the 100RUA and 250RUA datasets.
B GMV . . . . . . LW LW CC LW NL LW NLSF
POET GLASSO
CV21 CV1 Original . . . . . . LW LW CC LW NL LW NLSF
POET GLASSO
CV21 CV1 Original
Fig. 3: Relative differences in the annualized SD of GMV portfolios with the 100RUA and 250RUA datasets across the efficient covariance estimation methods.

Table: Differences in SD p.a. of GMV with the RUA dataset across different estimators.

[Symmetric color-coded table over Sample, LW, LW-CV, LW CC, LW CC-CV, LW NL, LW NL-CV, LW NLSF, LW NLSF-CV, POET, POET-CV, GLASSO, GLASSO-CV, with a final "better than % of models" row; the numeric entries were lost in extraction.]

This table shows the differences in the annualized out-of-sample SD of the GMV portfolios with the RUA dataset across the main covariance estimation methods and their CV-based counterparts. The table is constructed in a symmetrical way with an applied color scheme from red (higher SD than the other model) to green (lower SD than the other model). In addition, on the elements above the diagonal, significant pairwise outperformance in terms of variance is denoted by asterisks: *** denotes significance at the . level; ** denotes significance at the . level; and * denotes significance at the . level. Finally, for each model, we report the percentage of the other models that exhibit higher variance as a qualitative measure.

Table: Differences in SD p.a. of GMV with the RUA dataset across different estimators.

[Second symmetric color-coded table with the same row and column labels and a final "better than % of models" row; the numeric entries were lost in extraction.]
This table shows the differences in the annualized out-of-sample SD of the GMV portfolios with the RUA dataset across the main covariance estimation methods and their CV-based counterparts. The table is constructed in a symmetrical way with an applied color scheme from red (higher SD than the other model) to green (lower SD than the other model). In addition, on the elements above the diagonal, significant pairwise outperformance in terms of variance is denoted by asterisks: *** denotes significance at the . level; ** denotes significance at the . level; and * denotes significance at the . level. Finally, for each model, we report the percentage of the other models that exhibit higher variance as a qualitative measure.

[Figure residue, panel (c), GMV 130/30: grouped bars per estimator (LW, LW CC, LW NL, LW NLSF, POET, GLASSO); series: CV2, CV1, Original.]
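The comparison tables described above can be reproduced mechanically once the out-of-sample return series of each GMV variant are available. The sketch below is illustrative only: the function and variable names are ours, not the paper's, and the pairwise significance tests behind the asterisks are omitted. It computes the matrix of annualized SD differences and the "better than % of models" row.

```python
import numpy as np

def annualized_sd(returns, periods_per_year=252):
    """Annualized standard deviation of a (daily) return series."""
    return np.std(returns, ddof=1) * np.sqrt(periods_per_year)

def sd_difference_table(oos_returns, periods_per_year=252):
    """Pairwise differences in annualized out-of-sample SD.

    oos_returns: dict mapping estimator name -> 1-D array of
    out-of-sample GMV portfolio returns. Entry (i, j) of the result
    is SD_i - SD_j, so negative values mean row model i is less risky
    than column model j.
    """
    names = list(oos_returns)
    sds = np.array([annualized_sd(oos_returns[n], periods_per_year)
                    for n in names])
    diff = sds[:, None] - sds[None, :]  # antisymmetric, zero diagonal
    # share of the *other* models exhibiting strictly higher SD
    better = np.array([(sds > s).sum() / (len(sds) - 1) for s in sds])
    return names, diff, better
```

Feeding in the out-of-sample return series of, e.g., Sample, LW, and LW-CV would yield the upper-left block of such a table; the color scheme and significance stars would then be layered on top of `diff`.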