Reactive Global Minimum Variance Portfolios with k-BAHC covariance cleaning

Abstract

We introduce a $k$-fold boosted version of our Bootstrapped Average Hierarchical Clustering cleaning procedure for correlation and covariance matrices. We then apply this method to global minimum variance portfolios for various values of $k$ and compare their performance with other state-of-the-art methods. Generally, we find that our method yields better Sharpe ratios after transaction costs than competing filtering methods, despite requiring a larger turnover.

ARTICLE PREPRINT

Christian Bongiorno (a) and Damien Challet (a)

(a) Université Paris-Saclay, CentraleSupélec, Laboratoire de Mathématiques et Informatique pour les Systèmes Complexes, 91190, Gif-sur-Yvette, France

ARTICLE HISTORY

Compiled May 19, 2020

KEYWORDS: covariance matrix cleaning; portfolio optimization; global minimum variance portfolios; realized risk

1. Introduction

Portfolio optimization works best when the asset covariance matrix is optimally cleaned. The necessity to filter covariance matrices was recognized a long time ago in this context (Michaud 1989). Cleaning may be optimal in two respects. First, estimation noise has to be filtered out when the number of data points $t$ is comparable to the number of assets $n$ (the so-called curse of dimensionality). This is often the case, as the non-stationary nature of the dependence between asset price returns dictates taking $t$ as small as possible (Bongiorno and Challet 2020b). Many filtering methods have been proposed, either for covariance or correlation matrices themselves (see Bun, Bouchaud, and Potters (2017) for a review) or for the objectives of portfolio optimization methods (Markowitz 1959; Black and Litterman 1990; Duffie and Pan 1997; Hull and White 1998; Krokhmal, Palmquist, and Uryasev 2002; Roncalli 2013; Meucci, Santangelo, and Deguest 2015). Secondly, a good filtering method should also be able to retain the most stable structure of dependence matrices. As a consequence, which filtering method is optimal may depend on asset classes and market conditions. For these reasons, using a flexible yet robust method brings a more consistent performance.

A third ingredient to improve portfolio optimization is to account for stochastic volatility itself, i.e., to use both an asset-level volatility model and covariance matrix cleaning, as in Engle, Ledoit, and Wolf (2019). Since this work is devoted to the influence of covariance cleaning itself, we will not use this ingredient.

Here we shall focus on covariance cleaning; we refer the reader to the extensive review of Bun, Bouchaud, and Potters (2017).
There are two main ways to clean covariance matrices: either to filter the eigenvalues of the corresponding correlation matrix, or to make assumptions on the structure of correlation matrices, i.e., to use an ansatz.

Eigenvalue filtering rests on the spectral decomposition of the covariance or correlation matrix into a sum of outer products of eigenvectors weighted by their eigenvalues. Remarkable recent progress led to the proof that, provided that $n < t$ and that the system is stationary, the Rotationally Invariant Estimator (RIE) (Bun et al. 2016) converges to the oracle estimator (which knows the realized correlation matrix) at fixed ratio $q = n/t$ in the large system limit $n, t \to \infty$. In practice, computing RIE is far from trivial for finite $n$, i.e., for sparse eigenvalue densities; several numerical methods address this problem, such as QuEST (Ledoit, Wolf et al. 2012), Inverse Wishart regularisation (Bun, Bouchaud, and Potters 2017), or the cross-validated approach (CV hereafter) (Bartz 2016). Note that these methods only modify the eigenvalues and keep the empirical eigenvectors intact.

The structure-based approach requires a well-chosen ansatz that is suitable for the system under study. For example, linear shrinkage uses a target covariance (or correlation) matrix, and also has an eigenvalue filtering interpretation (Potters, Bouchaud, and Laloux 2005). Factor models belong to the structure-based approach. A particular case is hierarchical factor models, which have been shown to yield remarkably good Global Minimum Variance (GMV hereafter) portfolios (Tumminello, Lillo, and Mantegna 2007a; Pantaleo et al. 2011; Tumminello, Lillo, and Mantegna 2007b).

A problem of the hierarchical ansatz is its sensitivity to the bootstrapping of the original data, which does not yield many statistically validated clusters in correlation matrices of equity returns (Bongiorno, Miccichè, and Mantegna 2019).
Very recently, we have leveraged this sensitivity to build a more flexible estimator, which consists in averaging filtered hierarchical clustering correlation or covariance matrices of bootstrapped data (BAHC) (Bongiorno and Challet 2020a). BAHC not only allows an imperfect hierarchical structure, i.e., a moderate overlap among clusters, but also a probabilistic superposition of quite distinct hierarchical structures. When applied to GMV portfolios, BAHC yields similar or better realized risk than the optimal eigenvalue filtering methods, but for a much smaller $t$ than its competitors, which gives portfolios that are much more reactive to changing market conditions. It can be further improved, as shown below.

This paper proposes to extend BAHC to account for the structure of the correlation matrix that is not described by BAHC, i.e., the residuals. The rationale is that the latter may also contain a structure that persists in the out-of-sample window, and hence should not be erased by the filtering method. The idea is to filter the residuals recursively and to average the filtered matrices of bootstrapped data.

The order of recursion, denoted by $k$, is a parameter of the method, which we propose to call $k$-BAHC. This new method is equivalent to BAHC when $k = 1$ by convention. The higher $k$, the finer the details kept by $k$-BAHC, which, as shown below, improves the out-of-sample GMV portfolios up to a point. When $k$ tends to infinity, the filtered correlation matrix converges to the unfiltered correlation matrix averaged over many bootstrap copies. This matrix is almost surely positive definite in the high-dimensional regime $t < n$, despite the fact that the empirical unfiltered correlation matrix is not positive definite (Bongiorno 2020).

As shown below, the optimal average $k$ depends on the size of the in-sample window for a data set of US equities.
It is generally an increasing function of the in-sample window length $t_{in}$: for small $t_{in}$, most of the variations of the empirical correlation matrices are due to estimation noise, which is best filtered by a small $k$; as $t_{in}$ increases, the relative importance of estimation noise decreases and thus a higher $k$ should be preferred.

2. Methods

Let us start with some notation for standard quantities: let $R$ be an $n \times t$ matrix of price returns. Its $n \times n$ covariance matrix, denoted by $\Sigma$, has elements $\sigma_{ij}$, where

$$\sigma_{ij} = \frac{1}{t} \sum_{h=1}^{t} (r_{ih} - \bar{r}_i)(r_{jh} - \bar{r}_j) \tag{1}$$

and where $\bar{r}_i = \sum_{h=1}^{t} r_{ih}/t$ is the sample mean of vector $r_i$. The Pearson correlation matrix $C$ has elements

$$c_{ij} = \frac{\sigma_{ij}}{\sqrt{\sigma_{ii}\sigma_{jj}}} \tag{2}$$

As $k$-BAHC is an extension of BAHC, itself a bootstrapped version of the strictly hierarchical filtering method of Tumminello, Lillo, and Mantegna (2007a), let us start with hierarchical clustering.
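As an illustration of Eqs. (1) and (2), the following sketch computes the sample covariance (with the $1/t$ normalization used above) and the Pearson correlation matrix with NumPy. The function names and the synthetic Gaussian returns are assumptions made for this example only.

```python
import numpy as np

def sample_covariance(R):
    """Eq. (1): covariance of the rows of R (shape n x t), normalized by t."""
    R = np.asarray(R, dtype=float)
    X = R - R.mean(axis=1, keepdims=True)   # subtract each asset's sample mean
    return X @ X.T / R.shape[1]

def pearson_correlation(sigma):
    """Eq. (2): c_ij = sigma_ij / sqrt(sigma_ii * sigma_jj)."""
    sd = np.sqrt(np.diag(sigma))
    return sigma / np.outer(sd, sd)

# Hypothetical data: n = 5 assets, t = 250 observations
rng = np.random.default_rng(0)
R = rng.standard_normal((5, 250))
Sigma = sample_covariance(R)
C = pearson_correlation(Sigma)
```

Note that the correlation matrix does not depend on whether the covariance is normalized by $t$ or $t - 1$, since the factor cancels in Eq. (2).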

2.1. Hierarchical Clustering

Hierarchical clustering agglomerates groups of objects recursively according to a distance matrix, taken here as $D = 1 - C$ with elements $d_{ij}$; $D$ respects all the axioms of a proper distance. Accordingly, the distance between clusters $p$ and $q$, denoted by $\rho_{pq}$, is defined as the average distance between their elements,

$$\rho_{pq} = \frac{\sum_{i \in \mathcal{C}_p} \sum_{j \in \mathcal{C}_q} d_{ij}}{n_p n_q}, \tag{3}$$

where $\mathcal{C}_p$ and $\mathcal{C}_q$ denote the sets of $n_p$ and $n_q$ elements of clusters $p$ and $q$, respectively.

Hierarchical agglomeration works as follows: one starts by giving each element its own cluster. Then, the two clusters $(p, q)$ with the smallest distance $\rho_{pq}$ are merged into a new cluster $s$, which contains the elements $\mathcal{C}_s = \mathcal{C}_p \cup \mathcal{C}_q$. This step is repeated until all nodes form a single cluster. This defines a tree, called a dendrogram, which uniquely identifies the genealogy of cluster merges, denoted by $\mathcal{G}$.

2.2. Hierarchical Clustering Average Linkage Filtering (HCAL)

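Defining a merging tree is not enough to clean correlation matrices. Tumminello, Lillo, and Mantegna (2007a) propose to average all the elements of the sub-correlation matrix defined from the indices $F_{pq} = \{(i, j) : i \in \mathcal{C}_p, j \in \mathcal{C}_q\}$, i.e., to replace $c_{ij}$ by

$$c^{<}_{ij} = 1 - \rho_{pq} \quad \text{for all } (i, j) \in F_{pq}, \tag{4}$$

which, since $D = 1 - C$, is the average of the correlations in that block.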
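A minimal sketch of this filter, assuming a naive agglomeration written directly from the description above: at each step, merge the two clusters with the largest average correlation (equivalently, the smallest average distance $\rho_{pq}$) and overwrite the corresponding block with that average. The function name `hcal_filter` and the toy matrix are hypothetical, ties are broken by iteration order, and this is not the authors' implementation.

```python
import numpy as np

def hcal_filter(M):
    """Average-linkage hierarchical filter of a symmetric matrix M.

    Off-diagonal entries are replaced by the average of M over the block
    F_pq in which the two indices are first merged; the diagonal is kept.
    """
    M = np.asarray(M, dtype=float)
    n = M.shape[0]
    F = np.diag(np.diag(M))                    # start from the diagonal only
    clusters = {i: [i] for i in range(n)}      # every element is its own cluster
    while len(clusters) > 1:
        labels = list(clusters)
        best, pair = -np.inf, None
        for a, p in enumerate(labels):
            for q in labels[a + 1:]:
                # average similarity between clusters p and q (= 1 - rho_pq for M = C)
                avg = M[np.ix_(clusters[p], clusters[q])].mean()
                if avg > best:
                    best, pair = avg, (p, q)
        p, q = pair
        F[np.ix_(clusters[p], clusters[q])] = best   # fill block F_pq
        F[np.ix_(clusters[q], clusters[p])] = best   # and its transpose
        clusters[p] += clusters.pop(q)               # merge q into p
    return F

# Toy correlation matrix with two visible groups, {0, 1} and {2, 3}
C = np.array([[1.0, 0.8, 0.2, 0.1],
              [0.8, 1.0, 0.3, 0.2],
              [0.2, 0.3, 1.0, 0.6],
              [0.1, 0.2, 0.6, 1.0]])
C_filtered = hcal_filter(C)
```

In this toy case the within-group correlations 0.8 and 0.6 are left unchanged, and the four cross-group correlations are replaced by their average, 0.2. The triple loop is only meant for small matrices; a production version would update cluster distances incrementally.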

The method we propose rests on two ingredients: a recursive HCAL filtering of the residuals of filtered correlation matrices, and bootstrapping the return matrix (time-wise) in order to make the method more flexible, as in the BAHC method.

$k$-HCAL Filtering

Let us define the filtered matrix of order $k = 0$ as $C^{<(0)} = 0$. The residue matrix of order $k$ is then

$$E^{(k)} = C - C^{<(k)}. \tag{5}$$

When $k = 0$, $E^{(0)} = C$. For any value of $k \in \mathbb{N}^{+}$, we can apply the filtering procedure of Sec. 2.2 to the residue matrix $E^{(k)}$ to obtain a filtered residue matrix $E^{<(k)}$. Then the $(k+1)$-HCAL filtered matrix is obtained with

$$C^{<(k+1)} = C^{<(k)} + E^{<(k)}. \tag{6}$$

For example, $k = 1$ corresponds to the HCAL-filtered matrix. The recursive application of Eqs. (5) and (6) allows us to compute the filtered matrix at any order $k$. It is worth noticing that, by iterating Eqs. (5) and (6),

$$\lim_{k \to \infty} C^{<(k)} = C \tag{7}$$

since the residues become smaller and smaller. It is important to point out that $C^{<(k)}$ is not in general a positive semi-definite matrix for $k > 1$, and in most cases, some small negative eigenvalues have been observed in our numerical experiments. According to Eq. (7), these eigenvalues become non-negative when $k$ approaches infinity. For any order $k > 1$, we set the possibly negative eigenvalues to 0.
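The recursion of Eqs. (5) and (6) can be sketched as follows. Several choices here are assumptions rather than statements of the original method: the residue matrices are filtered with the same naive average-linkage routine as the correlation matrix (largest average entry merged first, diagonal kept), negative eigenvalues are clipped once at the end, and the diagonal is not renormalized after clipping. The names `hcal_filter` and `k_hcal` are hypothetical.

```python
import numpy as np

def hcal_filter(M):
    """Naive average-linkage filter: block averages along the merge tree."""
    M = np.asarray(M, dtype=float)
    F = np.diag(np.diag(M))
    clusters = {i: [i] for i in range(M.shape[0])}
    while len(clusters) > 1:
        labels = list(clusters)
        best, pair = -np.inf, None
        for a, p in enumerate(labels):
            for q in labels[a + 1:]:
                avg = M[np.ix_(clusters[p], clusters[q])].mean()
                if avg > best:
                    best, pair = avg, (p, q)
        p, q = pair
        F[np.ix_(clusters[p], clusters[q])] = best
        F[np.ix_(clusters[q], clusters[p])] = best
        clusters[p] += clusters.pop(q)
    return F

def k_hcal(C, k, clip=True):
    """k-HCAL filtered matrix C^{<(k)} obtained by iterating Eqs. (5) and (6)."""
    C = np.asarray(C, dtype=float)
    C_k = np.zeros_like(C)                    # C^{<(0)} = 0
    for _ in range(k):
        E = C - C_k                           # Eq. (5): residue of order k
        C_k = C_k + hcal_filter(E)            # Eq. (6): add the filtered residue
    if clip and k > 1:
        lam, V = np.linalg.eigh(C_k)          # set possibly negative eigenvalues to 0
        C_k = (V * np.clip(lam, 0.0, None)) @ V.T
    return C_k

C = np.array([[1.0, 0.8, 0.2, 0.1],
              [0.8, 1.0, 0.3, 0.2],
              [0.2, 0.3, 1.0, 0.6],
              [0.1, 0.2, 0.6, 1.0]])
residual_norms = [np.linalg.norm(C - k_hcal(C, k, clip=False)) for k in (1, 2, 3)]
```

Because each filtering step replaces blocks by their averages, the part of the residue that is removed is orthogonal to what remains, so the Frobenius norm of the residue cannot increase from one order to the next, in line with Eq. (7).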

In the spirit of the BAHC method (Bongiorno and Challet 2020a), our recipe prescribes to create a set of $m$ bootstrap copies of the data matrix $R$, denoted by $\{R^{(1)}, R^{(2)}, \cdots, R^{(m)}\}$. A single bootstrap copy of the data matrix $R^{(b)} \in \mathbb{R}^{n \times t}$ is defined entry-wise as $r^{(b)}_{ij} = r_{i s^{(b)}_j}$, where $s^{(b)}$ is a vector of dimension $t$ obtained by random sampling with replacement of the elements of the vector $\{1, 2, \cdots, t\}$. The vectors $s^{(b)}$, $b = 1, \cdots, m$, are sampled independently.

We compute the Pearson correlation matrix $C^{(b)}$ of each bootstrap copy $R^{(b)}$ of the data matrix, from which we derive the $k$-HCAL-filtered matrix $C^{(b)<(k)}$. Finally, the filtered Pearson correlation matrix $C^{k\text{-BAHC}}$ is defined as the average over the $m$ filtered bootstrap copies, i.e.,

$$C^{k\text{-BAHC}} = \sum_{b=1}^{m} \frac{C^{(b)<(k)}}{m} \tag{8}$$

While each $C^{(b)<(k)}$ is only a positive semi-definite matrix, the average of these filtered matrices rapidly becomes positive definite, as shown in Bongiorno (2020). This convergence is fast: it is guaranteed almost surely if $m \geq n$, but in most cases it is reached for $m$ much smaller than $n$.

Finally, the $k$-BAHC filtered covariance is obtained from the sample univariate variances according to

$$\sigma^{k\text{-BAHC}}_{ij} = c^{k\text{-BAHC}}_{ij} \sqrt{\sigma_{ii}\sigma_{jj}} \tag{9}$$

The main advantage of $k$-BAHC over $k$-HCAL is that it does not force $C^{k\text{-BAHC}}$ to be embedded in a purely recursive hierarchical structure.
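The bootstrap average of Eq. (8) and the rescaling of Eq. (9) can be sketched as below. To keep the example short, the filter is passed in as a function; the identity used here is only a stand-in for the $k$-HCAL filter (it corresponds to the $k \to \infty$ limit, where no filtering is left), and the function names, sizes and random data are assumptions.

```python
import numpy as np

def bootstrap_average(R, filt, m, rng):
    """Eq. (8): average of filtered correlation matrices over m bootstrap copies."""
    R = np.asarray(R, dtype=float)
    n, t = R.shape
    acc = np.zeros((n, n))
    for _ in range(m):
        s = rng.integers(0, t, size=t)        # time indices sampled with replacement
        acc += filt(np.corrcoef(R[:, s]))     # correlation of the copy, then filter
    return acc / m

def to_covariance(C, variances):
    """Eq. (9): rescale a correlation matrix with the sample variances."""
    sd = np.sqrt(np.asarray(variances, dtype=float))
    return C * np.outer(sd, sd)

# Hypothetical high-dimensional setting: more assets than observations (t < n)
rng = np.random.default_rng(1)
n, t, m = 30, 20, 60
R = rng.standard_normal((n, t))

identity = lambda C: C                        # stand-in for the k-HCAL filter
C_avg = bootstrap_average(R, identity, m, rng)
Sigma_avg = to_covariance(C_avg, R.var(axis=1))

min_eig_sample = np.linalg.eigvalsh(np.corrcoef(R)).min()
min_eig_avg = np.linalg.eigvalsh(C_avg).min()
```

With $t < n$ the sample correlation matrix is singular, whereas the bootstrap average is expected to be positive definite, which is the regularizing effect described above; replacing `identity` by a $k$-HCAL routine would give the $k$-BAHC estimator.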

3. Results

3.1. Data

We consider the daily close-to-close returns of US equities from 1999-01-04 to 2020-03-31, adjusted for dividends, splits, and other corporate events. More precisely, the data set consists of 1295 assets taken from the union of all the components of the Russell 1000 from 2010-06 to 2020-03. The number of stocks with data varies over time: it ranges from 497 on 1999-02-18 to 1172 on 2018-01-17.

3.2. Spectral Properties

One of the reasons why the original BAHC filtering achieves a similar or better realized variance than its competitors that focus on filtering the eigenvalues of the correlation matrix only is that the resulting eigenvectors have a larger overlap with the out-of-sample eigenvectors than the unfiltered empirical eigenvectors, while still filtering eigenvalues nearly as well as the optimal methods (Bongiorno and Challet 2020a). This sub-section investigates how the eigenvector components change as $k$ is increased. It turns out that the localization of eigenvectors is crucial in understanding the role of $k$.

To understand why localization matters to portfolio optimization, it is worth recalling that Global Minimum Variance portfolios correspond to the optimal weights

$$w^{*} = \frac{\Sigma^{-1} \mathbb{1}}{\mathbb{1}' \Sigma^{-1} \mathbb{1}}, \tag{10}$$

that is, a sum by rows (or columns) of the inverted covariance matrix, then normalized to one. The inverted covariance matrix can be expressed in terms of the spectral decomposition of $\Sigma$ as

$$\Sigma^{-1} = \sum_{i=1}^{n} \frac{1}{\lambda_i} v_i v_i', \tag{11}$$

where $\lambda_i$ and $v_i$ are respectively the $i$-th eigenvalue of $\Sigma$ and its associated eigenvector. This means that the composition of the eigenvectors related to the highest eigenvalues is largely irrelevant and that the portfolio allocation is dominated by the eigenvectors of the smallest eigenvalues. Let us assume that the eigenvalues are ordered, i.e., that $\lambda_1 > \lambda_2 > \cdots > \lambda_n$. If the smallest eigenvalue is much smaller than all the other ones, i.e., $\lambda_n \ll \lambda_{n-1}$, the largest part of the investment will be in the stocks $j$ such that $|v_{nj}| \gg 0$. Therefore, the localization of the non-zero elements of the eigenvectors is crucial to understand the portfolio allocation.

The statistical characteristics of the localization of eigenvectors are typically described in terms of the Inverse Participation Ratio (IPR), defined as

$$\mathrm{IPR}_i = \frac{1}{\sum_{j=1}^{n} v_{ij}^4} \tag{12}$$

where the index $i$ refers to the $i$-th eigenvalue. The smaller the value of $\mathrm{IPR}_i$, the more localized its associated eigenvector, the most localized case corresponding to $\mathrm{IPR} = 1$.

Figure 1(a) reports the cumulative distribution function of the IPR of the eigenvectors for different recursion orders $k$. The dependence on $k$ is obvious: 1-BAHC has the most localized eigenvectors, and the larger the value of $k$, the less localized the components of the eigenvectors. In the limit $k \to \infty$, one recovers the empirical, unfiltered covariance matrix. In addition, the IPRs of the $k = \infty$ and sample matrices are hardly different from the random matrix null expectation obtained by shuffling price returns asset by asset in the data matrix.

Figure 1(b) gives more details about the IPRs associated with the eigenvalues. It makes it obvious that IPRs are different for small eigenvalues, while no clear pattern emerges for the outliers $\lambda_i \gtrsim$ […]. Since the lowest eigenvalues are the ones that mainly affect GMV portfolio optimization, a filtering procedure that modifies the IPR of such eigenvalues will produce a substantial difference in the portfolio allocation.

Figure 1. (a) IPR cumulative distribution (axes: IPR, CDF). (b) IPRs and eigenvalues scatter plot (axes: $\lambda_i$, $\mathrm{IPR}_i$). Left plot: cumulative distribution of the Inverse Participation Ratio (IPR) computed on $k$-BAHC for different values of $k$ ($k = 1, 3, 8, 24, 70, \infty$), together with the sample covariance IPR and the IPR of shuffled returns (null). Right plot: scatter plot of $\mathrm{IPR}_i$ versus eigenvalue $\lambda_i$. Both plots use data from the period [2016-04-12, 2020-03-31], which contains 588 assets.

3.3. Global Minimum Variance portfolios
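As a concrete reference for the GMV portfolios studied in this section, the sketch below computes the weights of Eq. (10) and the IPR of Eq. (12) for two small hand-made covariance matrices. The function names and the matrices are assumptions chosen so that the results can be checked by hand.

```python
import numpy as np

def gmv_weights(sigma):
    """Eq. (10): w* = Sigma^{-1} 1 / (1' Sigma^{-1} 1)."""
    ones = np.ones(sigma.shape[0])
    x = np.linalg.solve(sigma, ones)          # Sigma^{-1} 1 without an explicit inverse
    return x / x.sum()

def ipr(sigma):
    """Eq. (12): eigenvalues (ascending) and 1 / sum_j v_ij^4 per eigenvector."""
    lam, V = np.linalg.eigh(sigma)            # unit-norm eigenvectors in the columns of V
    return lam, 1.0 / np.sum(V ** 4, axis=0)

# Uncorrelated assets: every eigenvector sits on a single asset (IPR = 1)
sigma_diag = np.diag([1.0, 2.0, 4.0])
w_diag = gmv_weights(sigma_diag)              # proportional to 1 / variance
_, ipr_diag = ipr(sigma_diag)

# Equicorrelated assets: the top eigenvector is spread over all 3 assets (IPR = 3)
sigma_flat = 0.5 * np.eye(3) + 0.5 * np.ones((3, 3))
lam_flat, ipr_flat = ipr(sigma_flat)
```

For the diagonal matrix the weights are $(4/7, 2/7, 1/7)$, i.e., inversely proportional to the variances, which illustrates how the direction with the smallest eigenvalue dominates the allocation.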

This part explores how the realized risk of GMV portfolios depends on the recursion order $k$ and compares it with the performance obtained from the sample covariance and the Cross-Validated (CV) eigenvalue shrinkage (Bartz 2016), which is a strong contender for the best realized risk (Bongiorno and Challet 2020a). Two types of tests are carried out: because our data covers many different market regimes and a variable number of assets, we first ask what the average realized risk of each covariance cleaning method is for a random collection of assets in a randomly chosen period of fixed length. This allows us to assess the performance of each cleaning scheme in a fair way and to control the effect of the calibration window length. In the second part, we compare the performance of these optimal portfolios with all available stocks at any given time.

The experiments of this part are carried out in the following way: for each calibration window length $\Delta t_{in} \in [20, \ldots]$, we draw a random time $t$ between 2000-01-03 and 2020-03-30 that defines a calibration window $[t - \Delta t_{in}, t[$ and a test window $[t, t + \Delta t_{out}[$ with $\Delta t_{out} = 21$ days. We then sample $n = 100$ stocks among the assets available within the calibration and test windows. Finally, we compute the GMV portfolios with and without short positions using $k$-BAHC, the state-of-the-art Cross-Validated (CV) eigenvalue shrinkage (Bartz 2016) and the unfiltered empirical covariance matrix.

Figure 2 shows the realized risk of GMV portfolios obtained with the chosen filtering schemes and with the empirical (sample) covariance matrix. The $k$-BAHC estimators outperform both the CV and the sample covariance estimators for $\Delta t_{in} < 300$ in the long-short case and for every $\Delta t_{in}$ in the long-only case (Figures 2(a) and 2(c)). The highest performance of CV is obtained for $\Delta t_{in} \approx$ […] $k$-BAHC with $\Delta t_{in} \approx$ […] $k$ (and for this data set) depends on $\Delta t_{in}$. In the high-dimensional regime ($q >$ […] $\Delta t_{in} < 90$, the best results are obtained for $k = 1$; however, when $\Delta t_{in}$ increases, the performance of $k = 1$ becomes even worse than that of the sample covariance. From this analysis it is clear that the larger the calibration window size, the larger the approximation order $k$ must be. Figures 2(b) and 2(d) show the average optimal $k^{*}$ that minimizes the realized risk as a function of $\Delta t_{in}$ for the long-short and long-only cases. They confirm that a longer calibration window requires a higher approximation order in both cases; however, whereas for the long-short case the increase seems linear in $\Delta t_{in}$, the dependence for the long-only case is sub-linear (and much noisier). It is worth remarking that the fits of the right plots of Figure 2 are obtained with $k \leq 20$: larger values of $k$ might further improve the performance for larger $\Delta t_{in}$; however, they would require a comparatively greater computational effort.

Figure 2. (a) Long-short realized risk. (b) Long-short optimal $k$. (c) Long-only realized risk. (d) Long-only optimal $k$. Left plots: annualized realized risk for different covariance estimators computed over calibration windows of length $\Delta t_{in}$; each point is the median of 10,000 simulations; testing period of 21 days. Legend numbers refer to the approximation order used in $k$-BAHC. Right plots: average optimal approximation order $k$ for different calibration windows; the error bars represent the standard deviation obtained by bootstrap re-sampling of the test-period performance; the continuous black line comes from a linear regression (legends: $k^{*} = 0.012\,\Delta t_{in} + 0.911$ for long-short, $k^{*} = 2.452 \log(\Delta t_{in}) + 7.506$ for long-only). Upper plots correspond to the long-short portfolio, while the lower plots impose a long-only constraint.

In this section we performed a set of portfolio optimizations with monthly computations of new portfolio weights (and rebalancing) over the full time period [2000-01-02, 2020-03-31] for all the considered covariance estimators and for equally weighted portfolios (EQ hereafter), and for different in-sample window lengths. The backtests include transaction costs set to 2 bps. A slight complication comes from the variable number of available assets. Thus, at any time, $q = n/t$ varies and generally increases with time at fixed calibration window length. In any case, it is worth keeping in mind that $n$ is relatively large, i.e., between 497 and 1172.

In particular, at each rebalancing time step we considered all the available stocks listed in both the in-sample and out-of-sample periods. The present work investigates in detail short calibration windows: we chose a sequence of $\Delta t_{in} \in [21, \ldots]$ […] 12 months. In addition, for the sake of completeness we also included longer calibration windows $\Delta t_{in} = 300, \ldots$ […]-BAHC, the globally best value for the data set that we used. As expected, GMV portfolios reduce the realized risk with respect to EQ portfolios, and cleaning the covariance matrix is clearly beneficial as well.

Let us start with realized risk, the focus of this paper. The realized risk of EQ portfolios is much larger than that resulting from the other methods, which is hardly surprising as the latter account for the covariance matrix (Tables 2 and 3); $k$-BAHC achieves the smallest realized risk, and the best value for $k$ increases as $\Delta t_{in}$ increases.

Although GMV does not guarantee a positive return, we also report the Sharpe ratios of the various filtering methods. Because computing Sharpe ratios with moments is not efficient for heavy-tailed variables, we use the efficient and unbiased moment-free SR estimator introduced in Challet (2017) and implemented in Challet (2020). Sharpe ratios paint a picture similar to realized risk (see Tables 4 and 5): $k$-BAHC outperforms all the other methods for medium to large values of $k$ for almost every $\Delta t_{in}$, both in the long-only and long-short cases, especially when the calibration window is smaller than a year, which corresponds to $q = n/t \in [2, \ldots]$ […]25 for $k = 7, \ldots, 18$ in the remarkably short calibration window $\Delta t_{in} = 105$ days (about 5 months), and is significantly higher than the best performance of CV (SR $= 1.18$), obtained for a much longer calibration window $\Delta t_{in} = 400$. This shows that reactive portfolio optimization is invaluable. In the long-only case, the improvement of $k$-BAHC is smaller: the best SR (1.13) is that of 18-BAHC, whereas the highest SR of CV is 1.[…] […] $\langle \mathrm{rank} \rangle$ to every method, defined as the average dense rank over the years. The results for the long-short and long-only cases are summarized in Table 1. It is worth noting that different medium to large values of $k$ of $k$-BAHC outperform all the other methods, and in particular that the optimal performances are achieved with a calibration window length shorter than for CV by a factor of about four.

Figure 3. (a) Long-short. (b) Long-only. Cumulative performances (log cumulative performance versus time in years, 2000 to 2020) obtained with 21 days between weight updates and rebalancing for different methods (11-BAHC, CV, Sample, EQ). Both plots were obtained with a calibration window of 105 days (5 months). The upper plot refers to long-short portfolios, the lower one to long-only portfolios.

Table 1. Average rank of Sharpe ratios computed year by year, denoted by ⟨rank⟩, of the various methods for different in-sample window sizes Δt_in. The first table refers to long-short portfolios, and the second to long-only portfolios.

Long-short
rank   ⟨rank⟩   Method     Δt_in
1      …        …-BAHC     105
2      39.48    30-BAHC    84
3      39.76    18-BAHC    63
4      40.10    18-BAHC    105
5      40.14    11-BAHC    84
6      40.57    30-BAHC    63
7      41.00    18-BAHC    84
8      41.38    11-BAHC    63
9      42.14    7-BAHC     84
10     42.57    30-BAHC    105
···
31     47.62    CV         400
143    65.24    Sample     1500
192    81.57    EQ         -

Long-only
rank   ⟨rank⟩   Method     Δt_in
1      …        …-BAHC     63
2      38.43    30-BAHC    63
3      39.62    7-BAHC     63
4      40.90    18-BAHC    63
5      42.29    30-BAHC    84
6      42.67    11-BAHC    126
7      42.86    4-BAHC     63
8      43.57    4-BAHC     126
9      43.95    3-BAHC     63
10     44.19    11-BAHC    105
···
20     46.76    CV         126
29     48.52    Sample     400
168    73.86    EQ         -

We checked that the portfolios obtained with $k$-BAHC are more concentrated than the other ones, which is consistent with the fact that the IPR of the relevant eigenvectors is smaller. The concentration of a portfolio can be measured with

$$n_{\mathrm{eff}} = \frac{1}{\sum_{i=1}^{n} w_i^2} \tag{13}$$

as proposed in Bouchaud and Potters (2003); however, as noticed in Pantaleo et al. (2011), this quantity does not have a clear interpretation when short selling is allowed. To overcome this issue, Pantaleo et al. (2011) introduced the $n_{90}$ metric, which measures the smallest number of stocks that account for at least 90% of the invested capital. Accordingly, we used $n_{90}$ for the long-short case and $n_{\mathrm{eff}}$ for the long-only one. Looking at Tables 6 and 7, the number of stocks selected by $k$-BAHC is systematically smaller, for every $k$ and calibration window, for both long-only and long-short portfolios.

That said, $k$-BAHC has two drawbacks. First, the gross leverage is generally larger than for CV in the long-short case (see Table 8). However, if we compare the values of gross leverage corresponding to the largest SR for CV and $k$-BAHC for $\Delta t_{in}$ within one year, they differ only by 0.52 (2.47 for CV and 3.00 for $k$-BAHC). On the other hand, without constraining the calibration window, the highest SR for CV is achieved for $\Delta t_{in} = 400$ and the gross leverage reaches 3.31, which is larger than for other methods.

The other drawback of $k$-BAHC is that it requires a larger turnover for long-short portfolios. A natural turnover metric, denoted by $\gamma$, was defined in Reigneron et al. (2020) as

$$\gamma = \frac{1}{\tau} \sum_{h=0}^{\tau - 1} \sum_{i=1}^{n} \left| w_i(t + h \Delta t_{out}) - w_i(t + (h+1) \Delta t_{out}) \right|, \tag{14}$$

where $\tau$ is the number of rebalancing operations and $t$ is the initial time. $\gamma$ measures the average change in the portfolio allocation between two consecutive portfolio allocations. Table 9 shows that $k$-BAHC has a $\gamma$ typically twice as large as CV, except for $k = 1$ at large $\Delta t_{in}$ for long-short portfolios. For the long-only case (Table 10), CV still outperforms $k$-BAHC in that respect, although not by much. All performance measures take into account the rebalancing costs. Note that the larger turnover comes from the fact that portfolios are more concentrated, i.e., select fewer assets. It is therefore more likely that the set of selected assets changes at every weight update.

4. Discussion

Combining recursive hierarchical clustering average linkage and bootstrapping of the data matrix yields a globally better way to filter asset price covariance matrices. We have shown that this method filters the eigenvectors associated with small eigenvalues of the covariance matrix by making them more concentrated, which in turn yields portfolios with fewer assets. Because $k$-BAHC captures more of the persistent structure of covariance matrices with shorter calibration windows, it leads to better realized variance of Global Minimum Variance portfolios than even the best method that optimally filters the eigenvalues of the correlation matrix. Finally, it is able to achieve its best performance for significantly smaller calibration window lengths, which makes $k$-BAHC portfolios more reactive to changing market conditions.

The main drawback is that it requires a larger turnover. This is due, in part, to the fact that the resulting portfolios are more concentrated, so that the assets in which capital is invested change more rapidly than with less specific methods. Whether this reflects a genuine change of market structure or a by-product of the specific assumptions of $k$-BAHC is an interesting open question.

Future work will investigate how $k$-BAHC may improve other kinds of portfolio optimization schemes and other financial applications of covariance matrices.
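For concreteness, the two diagnostics behind the trade-off discussed here, portfolio concentration (Eq. (13) and the 90% capital count) and turnover (Eq. (14)), can be sketched as follows. The function names and the toy weights are hypothetical, and computing $n_{90}$ from absolute weights relative to gross exposure is an assumption about how the long-short case is handled.

```python
import numpy as np

def n_eff(w):
    """Eq. (13): effective number of assets, 1 / sum_i w_i^2."""
    w = np.asarray(w, dtype=float)
    return 1.0 / np.sum(w ** 2)

def n_90(w, level=0.9):
    """Smallest number of stocks holding at least `level` of the invested capital.

    Assumption: capital is measured with absolute weights (gross exposure).
    """
    a = np.sort(np.abs(np.asarray(w, dtype=float)))[::-1]   # largest positions first
    cum = np.cumsum(a)
    return int(np.searchsorted(cum, level * cum[-1]) + 1)

def turnover(W):
    """Eq. (14): mean absolute weight change between consecutive allocations.

    W has shape (tau + 1, n): one row of weights per rebalancing date.
    """
    W = np.asarray(W, dtype=float)
    return np.abs(np.diff(W, axis=0)).sum(axis=1).mean()

w_equal = np.full(4, 0.25)
w_conc = np.array([0.5, 0.45, 0.05])
W_path = np.array([[0.5, 0.5],
                   [0.6, 0.4],
                   [0.6, 0.4]])
gamma = turnover(W_path)
```

Here the equally weighted portfolio has $n_{\mathrm{eff}} = 4$, the concentrated one needs only two stocks to reach 90% of the capital, and the weight path has a turnover of 0.1 (one change of 0.2 followed by no change).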

Acknowledgement(s)

This work was performed using HPC resources from the “Mésocentre” computing center of CentraleSupélec and École Normale Supérieure Paris-Saclay, supported by CNRS and Région Île-de-France (http://mesocentre.centralesupelec.fr/).

Funding

This publication stems from a partnership between CentraleSup´elec and BNP Paribas.

References

Bartz, Daniel. 2016. “Cross-validation based Nonlinear Shrinkage.” arXiv preprint arXiv:1611.00798.

Black, Fischer, and Robert Litterman. 1990. “Asset allocation: combining investor views with market equilibrium.” Goldman Sachs Fixed Income Research […].

Bongiorno, Christian. 2020. […] arXiv preprint arXiv:2004.03165.

Bongiorno, Christian, and Damien Challet. 2020a. “Covariance matrix filtering with bootstrapped hierarchies.” arXiv preprint arXiv:2003.05807.

Bongiorno, Christian, and Damien Challet. 2020b. “Nonparametric sign prediction of high-dimensional correlation matrix coefficients.” arXiv preprint arXiv:2001.11214.

Bongiorno, Christian, Salvatore Miccichè, and Rosario N Mantegna. 2019. “Nested partitions from hierarchical clustering statistical validation.” arXiv preprint arXiv:1906.06908.

Bouchaud, Jean-Philippe, and Marc Potters. 2003. Theory of financial risk and derivative pricing: from statistical physics to risk management. Cambridge University Press.

Bun, Joël, Romain Allez, Jean-Philippe Bouchaud, and Marc Potters. 2016. “Rotational invariant estimator for general noisy matrices.” IEEE Transactions on Information Theory 62 (12): 7475–7490.

Bun, Joël, Jean-Philippe Bouchaud, and Marc Potters. 2017. “Cleaning large correlation matrices: tools from random matrix theory.” Physics Reports […].

Challet, Damien. 2017. […] Applied Mathematical Finance 24 (1): 1–22.

Challet, Damien. 2020. sharpeRratio: Moment-Free Estimation of Sharpe Ratios. R package version 1.4.1, https://CRAN.R-project.org/package=sharpeRratio.

Duffie, Darrell, and Jun Pan. 1997. “An overview of value at risk.” Journal of Derivatives […].

Engle, Ledoit, and Wolf. 2019. […] Journal of Business & Economic Statistics 37 (2): 363–375. https://doi.org/10.1080/07350015.2017.1345683.

Hull, John, and Alan White. 1998. “Value at risk when daily changes in market variables are not normally distributed.” Journal of Derivatives 5: 9–19.

Krokhmal, Pavlo, Jonas Palmquist, and Stanislav Uryasev. 2002. “Portfolio optimization with conditional value-at-risk objective and constraints.” Journal of Risk 4: 43–68.

Ledoit, Olivier, Michael Wolf, et al. 2012. “Nonlinear shrinkage estimation of large-dimensional covariance matrices.” The Annals of Statistics 40 (2): 1024–1060.

Markowitz, Harry. 1959. Portfolio selection: Efficient diversification of investments. Vol. 16. John Wiley New York.

Meucci, Attilio, Alberto Santangelo, and Romain Deguest. 2015. “Risk budgeting and diversification based on optimized uncorrelated factors.” Available at SSRN 2276632.

Michaud, Richard O. 1989. “The Markowitz optimization enigma: Is ‘optimized’ optimal?” Financial Analysts Journal 45 (1): 31–42.

Pantaleo, Ester, Michele Tumminello, Fabrizio Lillo, and Rosario N Mantegna. 2011. “When do improved covariance matrix estimators enhance portfolio optimization? An empirical comparative study of nine estimators.” Quantitative Finance 11 (7): 1067–1080.

Potters, Marc, Jean-Philippe Bouchaud, and Laurent Laloux. 2005. “Financial Applications of Random Matrix Theory: Old Laces and New Pieces.” Acta Physica Polonica B 36: 2767.

Reigneron, Pierre-Alain, Vincent Nguyen, Stefano Ciliberti, Philip Seager, and Jean-Philippe Bouchaud. 2020. “Agnostic Allocation Portfolios: A Sweet Spot in the Risk-Based Jungle?” The Journal of Portfolio Management.

Roncalli, Thierry. 2013. Introduction to risk parity and budgeting. CRC Press.

Tumminello, Michele, Fabrizio Lillo, and Rosario N Mantegna. 2007a. “Hierarchically nested factor model from multivariate data.” EPL (Europhysics Letters) 78 (3): 30006.

Tumminello, Michele, Fabrizio Lillo, and Rosario N Mantegna. 2007b. “Kullback-Leibler distance as a measure of the information filtered from multivariate data.” Physical Review E 76 (3): 031123.

[Table 2. Realized annualized risk of the long-short portfolios; rebalancing every […] days. Bold entries are the optimal values for each Δt_in. Columns: Δt_in, EQ, Sample, CV, and k-BAHC for several values of k.]

[Table 3. Realized annualized risk of the long-only portfolios; rebalancing every […] days. Bold entries are the optimal values for each Δt_in. Columns: Δt_in, EQ, Sample, CV, and k-BAHC for several values of k.]

[Table 4. Sharpe ratio of the long-short portfolios; rebalancing every […] days. Bold entries are the optimal values for each Δt_in. Columns: Δt_in, EQ, Sample, CV, and k-BAHC for several values of k.]

[Table 5. Sharpe ratio of the long-only portfolios; rebalancing every […] days. Bold entries are the optimal values for each Δt_in. Columns: Δt_in, EQ, Sample, CV, and k-BAHC for several values of k.]

[Table 6. n_90 of the long-short portfolios; rebalancing every […] days. Bold entries are the optimal values for each Δt_in. Columns: Δt_in, Sample, CV, and k-BAHC for several values of k.]

[Table 7. Average number of effective assets n_eff of the long-only portfolios; rebalancing every […] days. Bold entries are the optimal values for each Δt_in. Columns: Δt_in, Sample, CV, and k-BAHC for several values of k.]

[Table 8. Gross leverage of the long-short portfolios; rebalancing every […] days. Bold entries are the optimal values for each Δt_in. Columns: Δt_in, Sample, CV, and k-BAHC for several values of k.]

71 300498 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . a b l e . γ o f t h e l o n g - s h o r t p o r t f o li o s ; r e b a l a n c i n g e v e r y d a y s . B o l d e n t r i e s a r e t h e o p t i m a l v a l u e s f o r e a c h ∆ t i n . C o v a r i a n ce m a t r i x e s t i m a t o r s ∆ t i n S a m p l e C V − B AH C − B AH C − B AH C − B AH C − B AH C − B AH C − B AH C − B AH C . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

55 300916 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . a b l e . γ o f t h e l o n g - o n l y p o r t f o li o s ; r e b a l a n c i n g e v e r y d a y s . B o l d e n t r i e s a r e t h e o p t i m a l v a l u e s f o r e a c h ∆ t i n . C o v a r i a n ce m a t r i x e s t i m a t o r s ∆ t i n S a m p l e C V − B AH C − B AH C − B AH C − B AH C − B AH C − B AH C − B AH C − B AH C . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

37 3000 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11