Betas, Benchmarks and Beating the Market
Zura Kakushadze §† and Willie Yu ♯

§ Quantigic® Solutions LLC, 1127 High Ridge Road
† Free University of Tbilisi, Business School & School of Physics, 240 David Agmashenebeli Alley, Tbilisi, 0159, Georgia
♯ Centre for Computational Biology, Duke-NUS Medical School, 8 College Road, Singapore 169857

(May 30, 2018)
Abstract
We give an explicit formulaic algorithm and source code for building long-only benchmark portfolios and then using these benchmarks in long-only market outperformance strategies. The benchmarks (or the corresponding betas) do not involve any principal components, nor do they require iterations. Instead, we use a multifactor risk model (which utilizes multilevel industry classification or clustering) specifically tailored to long-only benchmark portfolios to compute their weights, which are explicitly positive in our construction.

Zura Kakushadze, Ph.D., is the President and CEO of Quantigic® Solutions LLC, and a Full Professor at Free University of Tbilisi. Email: [email protected]

Willie Yu, Ph.D., is a Research Fellow at Duke-NUS Medical School. Email: [email protected]

DISCLAIMER: This address is used by the corresponding author for no purpose other than to indicate his professional affiliation as is customary in publications. In particular, the contents of this paper are not intended as an investment, legal, tax or any other such advice, and in no way represent views of Quantigic® Solutions LLC, the website or any of their other affiliates.
1 Introduction and Summary
Diversified long-only portfolios consisting of many stocks are invariably exposed to broad market movements. So, is there an "optimal" way of constructing such long-only portfolios? Here we can principally distinguish two rather different cases. In the first case, we have no detailed expectations about individual stock returns. I.e., we are trying to construct a long-only portfolio oblivious to any trading signals (or alphas). We can think of such a portfolio as a benchmark. One possibility is to use off-the-shelf (market cap weighted) broad market portfolios such as S&P 500 or Russell 3000. Another simple approach is to use a minimum-variance portfolio, whose weights w_i (i = 1, ..., N labels N stocks in our universe), up to an overall normalization, are given by

  w_i ∝ Σ_{j=1}^N C^{-1}_{ij}   (1)

There are several issues with this. First, if C_ij is a sample covariance matrix based on a time-series of historical stock returns, in many cases it is singular as there are not enough observations in the time-series. Second, even if it is nonsingular, the off-diagonal elements of C_ij (more precisely, the pair-wise correlations) are highly unstable out-of-sample, and thus so are the benchmark weights w_i. Third, the betas β_i [Sharpe, 1963] of the individual stocks w.r.t. this portfolio are all equal to 1 (up to an overall normalization factor), so the correlations of the stocks with the benchmark end up being inversely proportional to the historical standard deviations σ_i of their returns, which is not what we expect for a broad market benchmark. Fourth, generally, some weights (1), even if computable, are negative unless we construct a minimum-variance portfolio subject to lower bounds on the weights, thereby either excluding stocks that would otherwise have negative weights (thereby, among other things, diminishing diversification and distorting the remainder of the weights), or assigning some ad hoc minimum weights to such stocks (and still distorting the other weights).
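As a quick numerical illustration (a minimal sketch with simulated returns, not the paper's Appendix A code), the minimum-variance weights (1) amount to a single linear solve; with generic sample correlations some of the resulting weights typically come out negative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated daily returns for N stocks over T days (T > N, so the sample
# covariance matrix is generically invertible). Purely illustrative data.
N, T = 10, 252
R = 0.01 * rng.standard_normal((N, T))

C = np.cov(R)                       # sample covariance matrix C_ij
w = np.linalg.solve(C, np.ones(N))  # w_i ~ sum_j C^{-1}_ij, eq. (1)
w /= w.sum()                        # fix the overall normalization

# Sanity check: for the minimum-variance portfolio, C w is proportional to
# the unit vector, i.e., all stocks have the same beta w.r.t. this portfolio
# (the third issue mentioned above).
x = C @ w
assert np.allclose(x, x[0])
```

The final assertion verifies the third issue named in the text: with weights (1), all stocks have identical betas w.r.t. the portfolio.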
Some of these issues can be attempted to be circumvented by using (up to an overall normalization factor) the first principal component of C_ij as the weights (and, consequently, the betas); see, e.g., [Avellaneda and Lee, 2010], [Connor and Korajczyk, 1993], [Geweke and Zhou, 1996], [Trzcinka, 1986]. However, the issues with negative weights and out-of-sample instability (the latter perhaps to a lesser degree) still persist. (It has been appreciated for decades that betas are highly unstable [Fabozzi and Francis, 1978]. Also, because of the off-diagonal elements in C_ij, the stocks that are excluded are not exactly the same as those with w_i ≤ 0.)

In the second case, assume that we can forecast non-random expected returns E_i for individual stocks. We can try to construct a long-only portfolio that outperforms a benchmark portfolio. Mean-variance optimization [Markowitz, 1952] would give

  w_i ∝ Σ_{j=1}^N C^{-1}_{ij} E_j   (2)

This suffers from the same issues with C_ij as above. Instead, here we proceed in two steps: i) first we construct a long-only benchmark portfolio (without any reference to E_i); and ii) then build a long-short dollar-neutral portfolio on top of it based on E_i such that the combined portfolio is still long-only and well-diversified. The dollar-neutral portfolio can be built using standard optimization techniques by employing a well-built, stable multifactor risk model instead of the sample covariance matrix C_ij. We emphasize that this construction is not simply a "trend-following" or "market-timing" strategy. Instead, this is a systematic way of constructing a long-only portfolio as a combination of a long-only "passive" benchmark and an "actively managed" dollar-neutral portfolio (e.g., based on statistical arbitrage), and the latter is expected to produce positive returns on its own.
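The unconstrained mean-variance weights (2) can be sketched the same way (simulated inputs, hypothetical expected returns and normalization; not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 10, 252
R = 0.01 * rng.standard_normal((N, T))   # simulated stock returns
C = np.cov(R)                            # sample covariance matrix C_ij

E = 0.001 * rng.standard_normal(N)       # hypothetical expected returns E_i
x = np.linalg.solve(C, E)                # sum_j C^{-1}_ij E_j, eq. (2)
w = x / np.abs(x).sum()                  # normalize gross exposure to 1

# First-order condition of mean-variance optimization: C w is proportional
# to E. Generically this portfolio is long-short, not long-only.
assert np.allclose(C @ x, E)
```

The assertion checks the defining property C w ∝ E; the signs of w are generically mixed, which is why a direct application of (2) does not yield a long-only portfolio.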
However, in the context of long-only portfolios, this dollar-neutral strategy is more constrained, as the net positions must be all long. So, this is a way of generating excess returns above that of the "passive" benchmark, the price being the risk associated with the dollar-neutral strategy (no free lunch).

In this paper we discuss a systematic approach to the above "program". First, we give an explicit formulaic algorithm for constructing a benchmark portfolio given a set of betas, which can be chosen to have various desirable properties. The resultant benchmark weights are positive by construction. This is achieved by building a multifactor risk model Γ_ij and using it instead of the sample covariance matrix C_ij specifically for constructing a long-only benchmark. This Γ_ij is carefully built based on a binary multilevel industry classification (or some other clustering scheme). It does not involve any principal components, and there are no iterations (due to, e.g., lower bounds) required to obtain w_i. In fact, they are given by a simple but nontrivial (and arguably elegant) formula, which is one of the main results of this paper. We give a detailed construction of Γ_ij and the benchmark weights w_i in Section 3 after reviewing some generalities pertaining to betas in Section 2. In a nutshell, the weights w_i are expressed via a product of simple algebraic quantities built from specific variances at each level in the industry classification (clustering scheme). These specific variances carry nontrivial information about the underlying stock returns. The source code for computing Γ_ij and w_i is given in Appendix A.

In Section 4 we then discuss the outperformance strategy based on overlaying an "actively managed" dollar-neutral strategy on top of a "passive" benchmark. Given the expected returns E_i, the dollar-neutral portfolio can be constructed using standard optimization techniques with bounds.
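Schematically, the combined portfolio is the benchmark plus a dollar-neutral overlay scaled so that every net position stays long. A toy sketch (random stand-in weights, not the optimized overlay of Section 4):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 10

# Stand-in long-only benchmark weights (positive, summing to 1).
w_bench = rng.random(N) + 0.1
w_bench /= w_bench.sum()

# Stand-in dollar-neutral overlay: demean so it sums to 0, then cap its
# size so that no combined position goes negative (stays long-only).
d = rng.standard_normal(N)
d -= d.mean()
d *= 0.5 * w_bench.min() / np.abs(d).max()

w = w_bench + d   # combined long-only portfolio
assert abs(d.sum()) < 1e-12
assert (w > 0).all() and abs(w.sum() - 1.0) < 1e-12
```

In the paper the overlay comes from optimization with bounds and a risk model Γ′_ij; the sketch only illustrates the dollar-neutrality and long-only constraints interacting.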
However, the nontrivial part is that the multifactor risk model Γ′_ij used in this optimization is different from Γ_ij used in constructing the benchmark. Further, this strategy is similar to the S&P 500 outperformance strategy, where a long position in the S&P 500 stocks (with dynamic weights) is offset by a short S&P futures position. Section 5 briefly concludes.

(This "passive" benchmark can be thought of as an index. See, e.g., [Lo, 2016] for an overview. The source code given in Appendix A hereof is not written to be "fancy" or optimized for speed or in any other way. Its sole purpose is to illustrate the algorithms described in the main text in a simple-to-understand fashion. Some important legalese is relegated to Appendix B. Here: i) we have no futures, and ii) the benchmark is not S&P 500 based (albeit, it can be).)

2 Betas
Consider a universe of N stocks labeled by i = 1, ..., N. Let R_is be a time-series of stock returns (e.g., daily, weekly, monthly, etc., close-to-close returns). Here the index s = 1, ..., T labels trading days on which these returns are computed (s = 1 labels the most recent date). Let F_s be another time-series of returns, which can but need not correspond to a portfolio of stocks, such as an index. A priori it can be any time-series of returns (e.g., on a commodity, bond, currency, etc.). For our purposes, we can think of F_s as the returns of a benchmark portfolio (see below). Stock betas β_i can be thought of as measuring how closely each stock "follows" the movement of the benchmark. Betas can be formally defined via a serial regression with the intercept (and unit weights) of the returns R_is over F_s:

  R_is = α_i I_s + β_i F_s + ε_is   (3)

Here: α_i is the regression coefficient of the intercept I_s ≡ 1; β_i is the coefficient of F_s; and ε_is are the regression residuals. The regression coefficients are given by

  α_i = R̄_i − β_i F̄   (4)
  β_i = Σ_{s=1}^T R̃_is F̃_s / Σ_{s=1}^T F̃_s²   (5)

where for a time-series A_s we define

  Ā = (1/T) Σ_{s=1}^T A_s   (6)
  Ã_s = A_s − Ā   (7)

So, the betas are given by

  β_i = Cov(R_is, F_s) / Cov(F_s, F_s)   (8)

where Cov(·,·) denotes a serial covariance. If F_s are the returns of a portfolio of the stocks in our universe with weights w_i, i.e., if

  F_s = Σ_{i=1}^N w_i R_is   (9)

then we have (here C_ij is the sample covariance matrix of the stock returns R_is)

  β_i = (1/σ_F²) Σ_{j=1}^N C_ij w_j   (10)
  σ_F² = Σ_{i,j=1}^N C_ij w_i w_j   (11)
  C_ij = Cov(R_is, R_js)   (12)

(Here R_is can be raw returns or excess returns w.r.t. a risk-free rate, depending on the context.)

3 Benchmarks
Suppose we now turn the tables and attempt to construct the benchmark portfolio – i.e., determine the weights w_i – given the betas β_i. This can be done by formally solving (10) via (in defining C̃_ij we assume that all β_i ≠ 0):

  w_i = σ_F² Σ_{j=1}^N C^{-1}_{ij} β_j = (σ_F²/β_i) Σ_{j=1}^N C̃^{-1}_{ij}   (13)
  σ_F² = [Σ_{i,j=1}^N C^{-1}_{ij} β_i β_j]^{-1} = [Σ_{i,j=1}^N C̃^{-1}_{ij}]^{-1}   (14)
  C̃_ij = C_ij / (β_i β_j)   (15)

where C^{-1}_{ij} is the matrix inverse to C_ij. However, in practice there are some issues with this formal solution. First, if T < N + 1, the sample covariance matrix C_ij is singular (i.e., not invertible). Second, unless T ≫ N, which is seldom (if ever) the case in practice, the off-diagonal elements (in particular, the correlations – the diagonal elements are relatively stable) are highly unstable out-of-sample, rendering C_ij essentially useless (unpredictive out-of-sample). Third, even if all β_i are positive (e.g., β_i ≡ 1) and C_ij is invertible, the weights given by (13) are not necessarily all positive due to nontrivial correlations between stocks. In many applications the benchmark is a long-only portfolio, which requires the weights w_i to be nonnegative.

For a moment, let us ignore the aforesaid issues with the sample covariance matrix C_ij and assume that it somehow – magically – is invertible and stable out-of-sample. Then there is a simple "solution" to constructing a benchmark portfolio. Thus, it is tempting to take β_i = γ V^(1)_i, where γ > 0 and V^(1)_i is the first principal component of C_ij:

  Σ_{j=1}^N C_ij V^(a)_j = λ^(a) V^(a)_i,  a = 1, ..., N   (16)
  Σ_{i=1}^N V^(a)_i V^(b)_i = δ_ab   (17)
  Σ_{a=1}^N V^(a)_i V^(a)_j = δ_ij   (18)

(Strictly speaking, invertibility is not required here, but this is not critical.) However, as we discuss in a moment, this "solution" is intrinsically flawed.
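To make the formal solution (13)–(15) concrete, here is a small numerical sketch (simulated C_ij and hypothetical target betas); it also shows the third issue: even with all β_i > 0, some weights can come out negative:

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 10, 252
R = 0.01 * rng.standard_normal((N, T))   # simulated stock returns
C = np.cov(R)                            # sample covariance matrix C_ij

beta = 0.5 + rng.random(N)               # hypothetical target betas (all > 0)
x = np.linalg.solve(C, beta)             # sum_j C^{-1}_ij beta_j
var_F = 1.0 / (beta @ x)                 # sigma_F^2, eq. (14)
w = var_F * x                            # benchmark weights, eq. (13)

# Consistency check: the betas of the stocks w.r.t. the portfolio with
# these weights recover the target betas, cf. eq. (10).
beta_check = (C @ w) / (w @ C @ w)
assert np.allclose(beta_check, beta)
print("any negative weights?", (w < 0).any())
```

The check also verifies Σ_i w_i β_i = 1, the normalization implied by (13) and (14).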
Above, V^(a)_i denotes the a-th principal component of C_ij corresponding to the eigenvalue λ^(a), where the eigenvalues are organized in the descending order: λ^(1) > λ^(2) > ... > λ^(N). Then the portfolio weights are given by

  w_i = V^(1)_i / γ   (19)

More generally, we can take

  β_i = γ_i U^(1)_i   (20)
  w_i = U^(1)_i / γ_i   (21)
  Σ_{j=1}^N D_ij U^(a)_j = λ̃^(a) U^(a)_i,  a = 1, ..., N   (22)
  D_ij = C_ij / (γ_i γ_j)   (23)

Here γ_i > 0 is an arbitrary N-vector, and U^(a)_i are principal components of the matrix D_ij.

Let us start with (19). One immediate issue with this "solution" is that this portfolio is highly skewed, to wit, it is over-weighted with highly volatile stocks. This is because: i) the stock volatilities σ_i = √C_ii have a skewed (roughly log-normal) distribution with a long tail for higher values of σ_i; and ii) the elements of the first principal component roughly scale as V^(1)_i ∝ σ_i. To see this, consider a simple model where all stocks have uniform pair-wise correlations equal to ρ:

  C_ij = σ_i σ_j [(1 − ρ) δ_ij + ρ ν_i ν_j]   (24)

where ν_i ≡ 1 is the unit N-vector. Straightforward algebra leads to the following eigenvalue equation:

  ρ Σ_{i=1}^N σ_i² / (λ − (1 − ρ) σ_i²) = 1   (25)

whose N solutions are the eigenvalues λ = λ^(a), a = 1, ..., N. The eigenvectors are given by:

  V^(a)_i = η^(a) σ_i / (λ^(a) − (1 − ρ) σ_i²)   (26)

where η^(a) are overall normalization factors (fixed via (17)). We can solve (25) iteratively via

  λ = ρ Σ_{i=1}^N σ_i² / (1 − (1 − ρ) σ_i² / λ)   (27)

(In theory some eigenvalues can be degenerate, but in practice positive eigenvalues are not (and here we are assuming there are no null eigenvalues). This is not critical here as we are interested in the first principal component, whose eigenvalue can be safely assumed to be non-degenerate. I.e., the sample correlation matrix Ψ_ij = C_ij/(σ_i σ_j) is replaced by a 1-factor model (see below), where ρ can be taken as the average pair-wise correlation ρ̄ = Σ_{i,j=1; i≠j}^N Ψ_ij / (N(N − 1)). The iteration (27) for the largest eigenvalue converges quickly assuming λ ≫ (1 − ρ) σ_i² for all i.)
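The eigenvalue equation (25) and the iteration (27) are easy to check numerically. A sketch with hypothetical log-normal volatilities (the fixed point of the iteration matches the top eigenvalue computed directly):

```python
import numpy as np

rng = np.random.default_rng(4)
N, rho = 100, 0.3
sigma = np.exp(0.5 * rng.standard_normal(N) - 4)  # skewed (log-normal) vols

# Uniform pair-wise correlation model, eq. (24)
C = np.outer(sigma, sigma) * ((1.0 - rho) * np.eye(N) + rho)

# Iterate eq. (27) for the largest eigenvalue, starting from the zeroth
# approximation lam^(1) ~ rho * sum_i sigma_i^2 (eq. (28) below).
lam = rho * (sigma**2).sum()
for _ in range(100):
    lam = rho * (sigma**2 / (1.0 - (1.0 - rho) * sigma**2 / lam)).sum()

lam_top = np.linalg.eigvalsh(C)[-1]  # largest eigenvalue, computed directly
assert abs(lam - lam_top) < 1e-8 * lam_top
```

Here N, ρ and the volatility distribution are arbitrary illustrative choices; the assertion confirms the iteration converges to the largest root of (25).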
In the zeroth approximation we have

  λ^(1) ≈ ρ Σ_{i=1}^N σ_i²   (28)
  V^(1)_i ≈ σ_i / √(Σ_{i=1}^N σ_i²)   (29)

It is now evident that the portfolio given by the weights (19) is indeed skewed and overloaded with high volatility stocks. This conclusion persists even if we do not assume uniform correlations as in (24). Thus, straightforward algebra will convince the reader that this is the case for, e.g., the following model:

  C_ij = σ_i σ_j Ψ̃_ij   (30)
  Ψ̃_ij = ξ̃_i δ_ij + λ̃^(1) U^(1)_i U^(1)_j   (31)
  ξ̃_i = 1 − λ̃^(1) [U^(1)_i]²   (32)

where U^(1)_i is the first principal component of the sample correlation matrix Ψ_ij = C_ij/(σ_i σ_j), and λ̃^(1) is the corresponding eigenvalue. Note that 0 < ξ̃_i < 1. Now we no longer have uniform correlations, and (in the zeroth approximation) we have V^(1)_i ∝ σ_i U^(1)_i, so V^(1)_i are skewed for large σ_i (as U^(1)_i are independent of σ_i and are not skewed; in the zeroth approximation U^(1)_i ≈ 1/√N). So the problem persists.

A simple "fix" is to take skewed betas to begin with, e.g., if we take γ_i = σ_i in (20), then we have w_i = U^(1)_i/σ_i, where U^(1)_i is the first principal component of the sample correlation matrix Ψ_ij. Now w_i are skewed but in the opposite direction – the weights for volatile stocks are suppressed, which is a welcome feature. This would seem to solve the problem of constructing an acceptable benchmark. However, there are other issues we have been postponing dealing with. First, generally, some elements of U^(1)_i are negative.

(The actual λ^(1) is even higher and the actual V^(1)_i are even more skewed for large σ_i, which further aids the point we make here. At any rate, in practice the zeroth approximation is pretty good. E.g., for the dataset discussed in Subsection 2.6 of [Kakushadze and Yu, 2018], which consists of 21-day historical volatilities based on daily close-to-close returns for 3810 U.S. stocks, the cross-sectional distribution of the ratio σ_i²/Σ_{j=1}^N σ_j² is highly skewed, with the mean well above the median. Therefore, the zeroth approximation (28) is justified assuming ρ is not too small, which is the case considering that the 21-day historical average pair-wise correlation for the aforesaid dataset is ρ̄ = 0.10. So, if we set ρ = ρ̄, then for κ_i = (1 − ρ̄) σ_i²/(ρ̄ Σ_{j=1}^N σ_j²) we have that max(κ_i) is sizable, while for most stocks κ_i ≪ 1. Further, a relatively small number of stocks for which κ_i is not small have even larger contributions to λ^(1) and V^(1)_i compared with the aforesaid zeroth approximation, which further exacerbates the skewness of V^(1)_i (and thus w_i) for the large σ_i stocks (see below). In (30) the sample correlation matrix Ψ_ij is replaced by a 1-factor statistical risk model Ψ̃_ij based on the first principal component U^(1)_i of Ψ_ij. See [Kakushadze and Yu, 2017] for details. E.g., in the dataset mentioned in fn. 15, out of the 3810 elements, 575 (or over 15%) of U^(1)_i and 668 elements of V^(1)_i are negative. This is a sizable number.)

Since for long-only benchmarks we wish to have all w_i >
0, this poses an issue. One way to circumvent this is not to use (20) (with γ_i = σ_i), but instead to proceed as follows. First, instead of using the sample covariance matrix C_ij, we model it via the 1-factor statistical risk model (30) (see fn. 16). The inverse of C_ij is given by

  C^{-1}_{ij} = (1/(σ_i σ_j)) Ψ̃^{-1}_{ij}   (33)
  Ψ̃^{-1}_{ij} = (1/ξ̃_i) δ_ij − q (U^(1)_i/ξ̃_i)(U^(1)_j/ξ̃_j)   (34)
  q = [1/λ̃^(1) + Σ_{k=1}^N [U^(1)_k]²/ξ̃_k]^{-1}   (35)

Let H_+ = {i | U^(1)_i ≥ 0} and H_− = {i | U^(1)_i < 0}. (In practice, generically, no element of U^(1)_i is expected to be exactly 0.) Then we can set (here γ is an overall normalization constant, while κ is a parameter to be fixed)

  β_i = γ σ_i U^(1)_i,  i ∈ H_+   (36)
  β_i = κ γ σ_i U^(1)_i,  i ∈ H_−   (37)
  w_i = (q̃ U^(1)_i/(γ σ_i ξ̃_i)) [1/λ̃^(1) + (1 − κ) G_−],  i ∈ H_+   (38)
  w_i = (q̃ U^(1)_i/(γ σ_i ξ̃_i)) [κ/λ̃^(1) − (1 − κ) G_+],  i ∈ H_−   (39)
  q̃ = [(G_+ + κ G_−)/λ̃^(1) + (1 − κ) G_+ G_−]^{-1}   (40)
  G_+ = Σ_{k∈H_+} [U^(1)_k]²/ξ̃_k   (41)
  G_− = Σ_{k∈H_−} [U^(1)_k]²/ξ̃_k   (42)

It then follows that for i ∈ H_− we have: w_i < 0 for κ_* < κ ≤ 1; w_i = 0 for κ = κ_*; and w_i > 0 for κ < κ_*. Here

  κ_* = G_+ / (1/λ̃^(1) + G_+)   (43)

Now: i) N_+ = |H_+| is sizably larger than N_− = |H_−| (so G_+ ∼ 1); and ii) 1/λ̃^(1) ≪ G_+, so the first term in the denominator of κ_* in (43) is subleading. As a result, κ_* ≈ 1 − 1/(G_+ λ̃^(1)) is very close to 1. Also, G_+ ≫ G_−. Naïvely, it might appear reasonable to assume that G_+/G_− ∼ N_+/N_−. However, typically, G_+/G_− is sizably larger than N_+/N_−. The additional skew is due to the fact that the absolute value of the average negative pair-wise correlation is sizably lower than the average positive pair-wise correlation, which implies that the average [U^(1)_i]² for i ∈ H_− is sizably lower than the average [U^(1)_i]² for i ∈ H_+, which in turn implies that on average ξ̃_i are closer to 1 for i ∈ H_− than for i ∈ H_+.

So, let κ = 1 − ζ/(G_+ λ̃^(1)). For ζ = 1 we have κ ≈ κ_* (up to subleading corrections). For ζ > 1 we have w_i > 0 for i ∈ H_−. However, if we take ζ ≫ 1, then on average the weights w_i for i ∈ H_− will be much higher than for i ∈ H_+. This implies that ζ ∼ 1. Then we have (up to subleading corrections)

  w_i ≈ U^(1)_i / (γ σ_i ξ̃_i G_+),  i ∈ H_+   (44)
  w_i ≈ −U^(1)_i [ζ − 1] / (γ σ_i ξ̃_i G_+),  i ∈ H_−   (45)

So, we can determine the value of ζ = ζ_* such that the mean (or median) of w_i is the same in H_+ and H_−. This way stocks from these two sets will on average be weighted more or less equally in the benchmark, with all w_i > 0 (notwithstanding that β_i < 0 for i ∈ H_−). Here one may argue that the solution with ζ = 2, where w_i ≈ |U^(1)_i|/(γ σ_i ξ̃_i G_+), is more natural (notwithstanding that on average the weights w_i, i ∈ H_−, are somewhat smaller than the weights w_i, i ∈ H_+). However, there is no magic bullet here for fixing ζ. In fact, the parametrization using a single parameter, ζ, is just one possibility out of myriad others. Now, as to ζ, it can be chosen to be pretty much anywhere between (somewhat above) 1 and (somewhat above) ζ_* defined above.

So, problem solved? Well, not quite. The fact that tiny (of order 1/(G_+ λ̃^(1)) on a relative basis) changes in β_i produce of order 1 changes in the benchmark weights w_i should sound an alarm. The culprit here is that, as already mentioned, in the zeroth approximation we have U^(1)_i ≈ 1/√N. This is the so-called "market mode" – the first principal component of Ψ_ij corresponds to the overall movement of the broad market (see, e.g., [Bouchaud and Potters, 2011]). The deviations from U^(1)_i ≈ 1/√N correspond to different stocks having different β_i/σ_i. However, in the simple 1-factor model (30), very different looking benchmarks have almost the same betas. To see this, consider the weights w_i = ω_i/σ_i, where 0 < ω_i ∼
1. We then have

  β_i = (σ_i/σ_F²) [ξ̃_i ω_i + λ̃^(1) U^(1)_i Σ_{j=1}^N U^(1)_j ω_j]   (46)

Considering that ξ̃_i ∼ 1 and λ̃^(1) ≫ 1, the first term due to the specific risk is subleading, and we have

  β_i = (σ_i/σ_F²) λ̃^(1) U^(1)_i Σ_{j=1}^N U^(1)_j ω_j [1 + O(1/λ̃^(1))]   (47)

So, up to small O(1/λ̃^(1)) corrections, β_i are simply proportional to σ_i U^(1)_i, irrespective of the individual values of ω_i. Therefore, all that gymnastics for constructing w_i > 0 above buys us little: the betas essentially cannot distinguish such benchmarks.

(Thus, for the dataset mentioned in fn. 15, the median correlation in H_− is −0.136, sizably lower in absolute value than the median (positive) correlation in H_+; the median (mean) [U^(1)_i]² in H_+ is sizably higher than in H_−; the median ξ̃_i in H_− is 0.985, closer to 1 than in H_+; G_+ ≫ G_−; and λ̃^(1) ≈ 674. For the same dataset, the value of ζ_* computed based on the median is 6.988 (the mean-based value is similar), consistently with the properties of H_+ and H_− mentioned above.)

Instead of taking β_i ∝ σ_i U^(1)_i, we can try to include higher principal components via a K-factor statistical risk model:

  C_ij = σ_i σ_j Ψ̃_ij   (48)
  Ψ̃_ij = ξ̃_i δ_ij + Σ_{a=1}^K λ̃^(a) U^(a)_i U^(a)_j   (49)
  ξ̃_i = 1 − Σ_{a=1}^K λ̃^(a) [U^(a)_i]²   (50)

Here K > 1. However, it becomes increasingly difficult to control the signs of the weights w_i and the betas β_i constructed using such models. This is not a fruitful direction to pursue.

While we can and will consider multifactor models beyond statistical risk models, we are not done with 1-factor models quite yet. Let us consider a general 1-factor model:

  C_ij = σ_i σ_j Ψ̃_ij   (51)
  Ψ̃_ij = ξ̃_i δ_ij + Ω_i Ω_j   (52)
  ξ̃_i = 1 − Ω_i²   (53)

A priori we must only require that Ω_i² < 1; Ω_i are arbitrary otherwise. If we take Ω_i = β_i/σ_i, straightforward algebra gives

  w_i = η β_i / ξ_i   (54)
  ξ_i = σ_i² ξ̃_i   (55)
  η^{-1} = Σ_{i=1}^N β_i² / ξ_i   (56)

Here ξ_i is the specific variance in the covariance matrix C_ij (as opposed to ξ̃_i, which is the same quantity in the correlation matrix Ψ̃_ij). So, up to an overall normalization factor η, w_i are proportional to β_i/ξ_i, which is the same result as what we would have obtained had we assumed – albeit this may appear strange at first – that C_ij = ξ_i δ_ij. Put differently, it is as though we ignore the factor risk altogether and only account for specific (idiosyncratic) risk. However, there is actually nothing strange about this result. Importantly, if all β_i >
0, then automatically all w_i > 0.

The weights (13) are nothing but the solution to maximizing the expected Sharpe ratio [Sharpe, 1966], [Sharpe, 1994] S of the benchmark portfolio if we treat β_i as the expected returns E_i for our stocks:

  E_i = γ β_i   (57)

where γ is an immaterial (for our purposes here) overall normalization factor. The expected Sharpe ratio is given by

  S = (1/σ_F) Σ_{i=1}^N E_i w_i   (58)
  σ_F² = Σ_{i,j=1}^N C_ij w_i w_j   (59)

Maximizing S w.r.t. w_i, we get (13) up to an overall normalization factor. The latter can be fixed by requiring that

  Σ_{i=1}^N w_i β_i = 1   (60)

which is a consequence of (13) and (14). Now we can understand the issue we encountered above with the "market mode". Maximizing the Sharpe ratio hedges against the broad market going bust (i.e., all or most stocks selling off at the same time). While this is a natural thing to do when constructing dollar-neutral portfolios, it makes no sense to do this when constructing long-only portfolios. Indeed, long-only portfolios are exposed to market risk by definition. Hedging against specific risk does make sense, which amounts to dropping the factor risk from C_ij (in a 1-factor model) and simply taking C_ij = ξ_i δ_ij, i.e., treating it as though we only have specific risk. So, the lesson here is that we must eliminate the "market mode".

(Here we have no transaction costs, bounds or constraints, so maximizing the Sharpe ratio is equivalent to mean-variance optimization [Markowitz, 1952]. See, e.g., [Kakushadze, 2015a]. The "market mode" is hedged not precisely, but approximately. However, when N is large, for the 1-factor model (e.g., based on the first principal component U^(1)_i as in (30)), the "mishedge" is suppressed by 1/N [Kakushadze and Yu, 2017].)

3.4 Multifactor Models

Next, let us discuss a general multifactor model covariance matrix Γ_ij:

  Γ_ij = ξ_i δ_ij + Σ_{A,B=1}^K Ω_iA φ_AB Ω_jB   (61)

Here: ξ_i is the specific (a.k.a. idiosyncratic) risk; Ω_iA, A = 1, . . .
, K, is the factor loadings matrix; and φ_AB is the factor covariance matrix. For our purposes here it will not be important to know how Γ_ij is constructed. What matters here is that: the number of risk factors K ≪ N (K can still be in hundreds); all ξ_i > 0; φ_AB is positive-definite (then so is Γ_ij); and Γ_ii = C_ii (so the sample variances are matched). Using the inverse of Γ_ij

  Γ^{-1}_{ij} = (1/ξ_i) δ_ij − Σ_{A,B=1}^K (Ω_iA/ξ_i) Q^{-1}_AB (Ω_jB/ξ_j)   (62)
  Q_AB = φ^{-1}_AB + Σ_{i=1}^N Ω_iA Ω_iB / ξ_i   (63)

in lieu of C^{-1}_{ij} in (13) and (14), we have

  w_i = (σ_F²/ξ_i) [β_i − Υ_i]   (64)
  Υ_i = Σ_{A,B=1}^K Ω_iA Q^{-1}_AB Λ_B   (65)
  Λ_A = Σ_{j=1}^N β_j Ω_jA / ξ_j   (66)
  σ_F² = [Θ − Σ_{A,B=1}^K Λ_A Q^{-1}_AB Λ_B]^{-1}   (67)
  Θ = Σ_{j=1}^N β_j² / ξ_j   (68)

The weights w_i can be rewritten as follows. Let

  Ω̃_iA = (1/ξ_i) [Ω_iA − β_i Λ_A / Θ]   (69)
  Υ̃_i = Σ_{A,B=1}^K Ω̃_iA Q^{-1}_AB Λ_B   (70)

Then we can rewrite w_i as:

  w_i = β_i/(Θ ξ_i) − σ_F² Υ̃_i   (71)

Note that (thus, we have (60))

  Σ_{i=1}^N β_i Υ̃_i = 0   (72)

(For a general discussion, see, e.g., [Grinold and Kahn, 2000]. For an explicit open-source implementation of a general multifactor risk model for equities, see [Kakushadze and Yu, 2016a].)

Therefore, assuming all β_i >
0, the first term in (71) is always positive; however, Υ̃_i can be negative (in which case w_i is positive) or positive, in which case w_i can be negative. A deceptively "simple" way to ensure that all w_i > 0 is to construct Ω_iA such that they are "orthogonal" to β_i:

  Σ_{j=1}^N β_j Ω_jA / ξ_j ≡ 0,  A = 1, ..., K   (73)

i.e., Λ_A ≡ 0, and consequently Υ̃_i ≡ 0. However, this is not practicable as a priori (i.e., before constructing the full risk model) ξ_i are unknown and their dependence on Ω_iA is highly nonlinear (see [Kakushadze and Yu, 2016a] for details). On the other hand, it is also impracticable to derive conditions ensuring that all w_i > 0 (or w_i ≥ 0) for a general multifactor model. Nonetheless, (73) has an important interpretation: it is nothing but the requirement that the risk factors be "orthogonal" to the "market mode". In fact, we can turn this around and ask: what happens if we include the "market mode"? We already know the answer in the case of a 1-factor model: if we take the corresponding factor loading as β_i, then the factor risk does not affect the weights w_i. But for a multifactor model things are trickier. So, let us assume that Ω_i1 = β_i. Then we can always rotate the remaining Ω_iA, A > 1, such that Λ_A ≡ 0 for A > 1. (Note that this rotation affects φ_AB.) We then have Ω̃_i1 ≡ 0, while Ω̃_iA = Ω_iA/ξ_i for A > 1. Therefore, we have

  Υ̃_i = Θ Σ_{A=2}^K Ω̃_iA Q^{-1}_{A1}   (74)
  Q_A1 = φ^{-1}_A1,  A > 1   (75)

If φ_A1 ≡ 0 for A > 1, then for these values of A, we also have φ^{-1}_A1 ≡ 0, so Q_A1 ≡ 0 and Q^{-1}_A1 ≡ 0, which would imply that Υ̃_i ≡ 0. It is therefore the nonzero correlations (i.e., mixing) between the "market mode" and the other risk factors that make Υ̃_i ≠ 0, which can lead to some negative w_i. Barring setting φ_A1 ≡ 0 for A > 1, there is no general guarantee that all w_i > 0.

3.5 Binary "Cluster" Factors

Thus, in practice, the factor loadings are not arbitrary but relatively constrained. The columns of the factor loadings matrix Ω_iA typically are based on: i) industry classification (or some other clustering); ii) style factors (e.g., size, value, liquidity, volatility, etc.); and/or iii) principal components. We already discussed principal components and we will return to them a bit later. We will also come back to style factors. Here we focus on "cluster" based factors, which can be based on a fundamental industry classification or a statistical one [Kakushadze and Yu, 2016b]. First, let σ_i² = C_ii be the total sample variances computed based on the historical time-series data. We have (see above) Γ_ii = σ_i². The volatilities σ_i have a skewed (roughly log-normal) cross-sectional distribution with a long tail for higher values of σ_i. This is why in practice, instead of directly modeling C_ij via a factor model, it makes a lot more sense to model the sample correlation matrix Ψ_ij = C_ij/(σ_i σ_j), from which the skewness present in σ_i has been nicely factored out. Indeed, the diagonal elements Ψ_ii ≡
1, and the off-diagonal ones (i.e., pair-wise correlations) satisfy |Ψ_ij| < 1 (i ≠ j). What this means in terms of the factor model for Γ_ij is that

  Γ_ij = σ_i σ_j Γ̂_ij   (76)
  Γ̂_ij = ξ̂_i δ_ij + Σ_{A,B=1}^K Ω̂_iA φ_AB Ω̂_jB   (77)
  ξ̂_i = ξ_i / σ_i²   (78)
  Ω̂_iA = Ω_iA / σ_i   (79)
  ξ̂_i + Σ_{A,B=1}^K Ω̂_iA φ_AB Ω̂_iB ≡ 1   (80)

and Ω̂_iA is wholly devoid of any skewness in σ_i. This simplifies things a lot. Now, let us consider a model where the factor loadings Ω̂_iA are based on a binary industry classification:

  Ω̂_iA = Ω_i δ_{G(i),A}   (81)
  G: {1, ..., N} → {1, ..., K}   (82)

Here: the N-vector Ω_i a priori is arbitrary; G maps stocks labeled by i (i = 1, ..., N) to "clusters" labeled by A (A = 1, ..., K); each stock belongs to one and only one cluster; J(A) = {i | G(i) = A} is the set of stocks that belong to the cluster labeled by A; and N(A) = |J(A)| is the number of stocks in said cluster. The clusters can be, e.g., sectors, industries or sub-industries in a binary industry classification. (Such as GICS (Global Industry Classification Standard), BICS (Bloomberg Industry Classification System), SIC (Standard Industrial Classification), etc. In principle, we can also consider quasi-binary classifications where some stocks (conglomerates, whose number typically is relatively small) belong to more than one cluster. We will not do so, nor will it be critical for our purposes. For details, see [Kakushadze, 2015c], [Kakushadze and Yu, 2016a].) We then have

  ξ̂_i = 1 − Ω_i² φ_{G(i),G(i)}   (83)

which implies that (by definition, φ_AB is the factor covariance matrix, so φ_AA > 0)

  Ω_i² < 1/φ_{G(i),G(i)}   (84)

For Ω_i, we basically have three choices. We can take Ω_i = 1/√N(A), i ∈ J(A), i.e., uniform within-cluster loadings. Another choice is to take Ω_i = [U(A)]_i, i ∈ J(A), where the N(A)-vector [U(A)]_i is the first principal component of the N(A) × N(A) matrix [Ψ(A)]_ij = Ψ_ij, i, j ∈ J(A). Finally, we can take Ω_i to be a style factor. (These are, respectively, the binary risk model construction [Kakushadze and Yu, 2016a]; the heterotic construction [Kakushadze, 2015c], [Kakushadze and Yu, 2016a]; and the heterotic CAPM construction [Kakushadze and Yu, 2016a]. The "style" factor here can be related to β_i itself (see below).)

Once Ω_i is specified, the factor covariance matrix φ_AB and the specific risks ξ̂_i can be computed. So, following our discussion in the 1-factor case, let us take Ω_i = β_i/σ_i. Then straightforward algebra gives

  w_i = σ_F² β_i γ_{G(i)} / ξ_i   (85)
  ξ_i = σ_i² ξ̂_i   (86)
  γ_A = 1 − Σ_{B=1}^K Q^{-1}_AB Λ_B   (87)
  Λ_A = Σ_{j∈J(A)} β_j² / ξ_j   (88)
  Q_AB = φ^{-1}_AB + Λ_A δ_AB   (89)
  σ_F^{-2} = Σ_{A=1}^K Λ_A γ_A   (90)

Here ξ_i is the specific variance in the factor model covariance matrix Γ_ij (as opposed to ξ̂_i, which is the same quantity in the correlation matrix Γ̂_ij). So, the important lesson from (85) is that the weights w_i within each cluster are computed the same way as in the 1-factor model (i.e., by ignoring the factor risk). The normalization factors γ_{G(i)} are uniform within each cluster and only vary from cluster to cluster. They result from optimizing cluster returns across the K clusters (see below). In this regard, they are not guaranteed to be positive. (More precisely, depending on the historical lookback, φ_AB may be computable as a sample covariance matrix of factor returns or itself may have to be modeled via a factor model covariance matrix as the sample factor covariance matrix may be singular or unstable out-of-sample. However, the diagonal elements φ_AA are always the same as sample variances of the (appropriately defined and normalized) factor returns. See [Kakushadze, 2015c], [Kakushadze and Yu, 2016a].) Thus, if we assume that there is no
0. In the presence of mixing we canhave some negative γ A . However, then all w i in the corresponding cluster(s) wouldbe negative as well (assuming all β i > w i . This can be seen by modeling φ AB via a1-factor model. The story is the same as above, with clusters in the place of stocks. φ AB To illustrate the discussion above, for our purposes here it suffices to consider a1-factor model for the factor covariance matrix: φ AB = ζ A δ AB + χ A χ B (91)Straightforward algebra then yields Q − AB = 1 ν A δ AB + 1 κ χ A ν A ζ A χ B ν B ζ B (92) ν A = 1 ζ A + Λ A (93) κ = 1 + K X A =1 χ A Λ A ζ A Λ A (94)So, we have γ A = κ − ζ A Λ A − K X B =1; B = A (cid:20) χ A χ B − (cid:21) χ B Λ B ζ B Λ B ! (95)For generic χ A (i.e., | χ A /χ B − | ∼ B = A ), some γ A can be negative. Consider A = A ∗ , χ A ∗ = max( χ A ). If K ≫
1, to avoid negative γ A ∗ , we would have to assume χ B Λ B / (1 + ζ B Λ B ) ≪ B = A ∗ . If, for such B , ζ B Λ B ∼ >
1, then we have χ B /ζ B ≪ ζ B Λ B ≪ χ B Λ B ≪
1, and ζ B Ω i ≪ | χ B | Ω i ≪ i ∈ J ( B ), so, nonsensically, thewithin-cluster pair-wise stock correlations b Γ ij = Ω i Ω j φ BB ≪ i = j , i, j ∈ J ( B ). Further, if all ζ A Λ A ≪
1, both within-cluster and inter-cluster pair-wise stock correlations b Γ ij = Ω i Ω j φ G ( i ) ,G ( j ) ≪ i = j , unless all | χ A | ≫ ζ A , i.e., unless, nonsensically, all clusters arealmost 100% (anti-)correlated. That is, in the (unrealistic) ζ A Λ A ≪ ij is almost diagonal. .5.2 Cluster Weights So, how can (should) we compute w i such that they are nonnegative? We can getinsight by looking at (95). Let us, ad hoc, take all χ A to be uniform: χ A ≡ χ . Thenwe have γ A = κ − ζ A Λ A (96)and all γ A >
0. So, up to an immaterial overall normalization factor, this is thesame result as what we get if we assume that φ AB is diagonal, in which case we have γ A = 1 / (1 + φ AA Λ A ). However, here φ AB is not diagonal and in (96) we have thespecific variance ζ A in the denominator, not the total variances φ AA . This is becausewe have effectively dropped the “market mode” (i.e., the factor risk) in φ AB . Whatremains is the specific risk. So, the question is, what is the interpretation of (96)?Let us look at each cluster independently from other clusters. We can constructthe benchmark for the universe of stocks corresponding to each cluster using the1-factor model approach. These weights are given by (see (54))[ w ( A )] i = η A β i ξ i , i ∈ J ( A ) (97) η − A = X j ∈ J ( A ) β j ξ j = Λ A (98)Now we can construct cluster returns, i.e., the returns of the K benchmark portfolioscorresponding to the K clusters, via R A = X i ∈ J ( A ) [ w ( A )] i R i (99)Since (up to an overall normalization factor) E i = β i are the expected returns forthe stocks, the expected returns E A for the clusters are E A ≡
1. If we constructa “global” benchmark portfolio made of all the clusters, the corresponding weightswith which we combine the clusters are (normalized such that P KA =1 w A = 1) w A = µ E A ζ A = µ ζ A (100) µ − = K X A =1 E A ζ A = K X A =1 ζ A (101)The stock weights in this “global” benchmark portfolio are given by w i = w A [ w ( A )] i = µζ A Λ A β i ξ i , i ∈ J ( A ) (102)This is precisely (85) with γ A of the form (96) and σ F = µ κ (see (90)) in the limitwhere ζ A Λ A ≫
1. So, the question is, what is the meaning of the extra 1 in (96)?16o understand this, let us consider the opposite limit, where ζ A Λ A ≪
1. In thislimit pair-wise stock correlations are small (see fn. 30). This means that the totalrisk approximately equals the specific risk and the factor model covariance matrixis approximately diagonal. So, in this limit the effect of the clusters is negligibleand we should recover our result for the 1-factor model (54). And this is preciselywhat happens in the ζ A Λ A ≪ γ A ≈ /κ and is independent of A , so (85) correctly reduces to (54). For the intermediate values ζ A Λ A ∼
1, (90)smoothly interpolates between the two limits. This is what the full optimizationgives, which balances the stock-specific risk and the factor risk. The only “looseend” is that in arriving at (90) we assumed uniform χ A ≡ χ , which results in γ A that na¨ıvely might appear as independent of χ . However, Λ A depends on b ξ i = 1 − Ω i φ AA = 1 − Ω i ( ζ A + χ ), i ∈ J ( A ). Furthermore, ζ A depends on χ (which is the factor loading in the 1-factor model (91) for φ AB ) via a computationinvolving the time-series of factor returns (see [Kakushadze, 2015c], [Kakushadzeand Yu, 2016a]). So, the aforesaid “loose end” is this: why are χ A ≡ χ uniform?This is because the cluster expected returns are uniform: E A ≡ β A are uniform: β A ≡ b . So, following our discussion in Subsection 3.2, the factor loading Ω A inthe 1-factor model for the correlation matrix ψ AB = φ AB /σ A σ B (where σ A = φ AA )is given by Ω A = β A /σ A , while the corresponding factor loading in the covariance matrix χ A = σ A Ω A is simply χ A = β A ≡ b . I.e., χ A are uniform, χ A ≡ χ , and χ is identified with b . Note that while our discussion here na¨ıvely may appear a bit“cavalier” w.r.t. the normalizations of β A and χ A , it is not. This is because, in a1-factor model, the factor loading χ A subsumes the 1 × ϕ . So, the factor model (91) actually reads: φ AB = ζ A δ AB + χ ′ A ϕ χ ′ B (103)Here χ ′ A is the raw (unnormalized) factor loading and χ A = √ ϕ χ ′ A . So, we canidentify χ ′ A with β A ≡ b , which can be normalized arbitrarily, and this normalizationis then subsumed in χ A via ϕ , which is computed based on the time-series of factorreturns and depends on χ ′ A . The end result is that our χ A are uniform: χ A ≡ χ . 
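The closed form (96) with uniform $\chi_A \equiv \chi$ is easy to verify numerically against the general expressions (87)-(89). The following Python/numpy sketch (illustrative only; all inputs are random toy values) builds the 1-factor cluster covariance (91) and checks that $\gamma_A = \kappa^{-1}/(1+\zeta_A\,\Lambda_A)$ with $\kappa$ given by (94), and that all $\gamma_A$ are positive:

```python
import numpy as np

rng = np.random.default_rng(2)
K = 5
zeta = rng.uniform(0.2, 1.0, K)      # cluster specific variances zeta_A
chi = 0.6                            # uniform loadings chi_A = chi
Lam = rng.uniform(1.0, 10.0, K)      # Lambda_A = sum_{j in J(A)} beta_j^2 / xi_j, eq. (88)
phi = np.diag(zeta) + chi**2 * np.ones((K, K))   # eq. (91) with uniform chi_A

# gamma_A directly from (87)-(89):
Q = np.linalg.inv(phi) + np.diag(Lam)            # eq. (89)
gamma_direct = 1.0 - np.linalg.inv(Q) @ Lam      # eq. (87)

# gamma_A from the closed form (96): gamma_A = kappa^{-1} / (1 + zeta_A Lambda_A)
kappa = 1.0 + chi**2 * np.sum(Lam / (1.0 + zeta * Lam))   # eq. (94)
gamma_closed = 1.0 / (kappa * (1.0 + zeta * Lam))

assert np.allclose(gamma_direct, gamma_closed)
assert np.all(gamma_closed > 0)      # all cluster normalization factors are positive
```

With uniform loadings the bracket in (95) collapses to 1, which is exactly what the numerical check confirms.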
(Fn. 30, referenced above: both within- and inter-cluster pair-wise stock correlations are small unless all $|\chi_A| \gg \sqrt{\zeta_A}$. Also, the foregoing statements hold up to immaterial overall normalizations, and $\kappa$ explicitly depends on $\chi$.)

Above we discussed a 1-factor model for the cluster factor covariance matrix $\phi_{AB}$. However, we can generalize our result to a multifactor model for $\phi_{AB}$, where the $K$ clusters labeled by $A$ can be grouped into further $F$ clusters (typically, $F \ll K$), which we will label by $a$, $a = 1,\dots,F$. This naturally arises in binary fundamental industry classifications. (This structure also arises in statistical industry classifications [Kakushadze and Yu, 2016b].) E.g., in BICS (see above) at the most granular level we have sub-industries, which are grouped into industries, which themselves are grouped into sectors. So, e.g., $A$ labels sub-industries and $a$ labels industries, or $A$ labels sub-industries and $a$ labels sectors, etc. So, the idea here is that we have an $F$-factor model for $\phi_{AB}$:

$$\phi_{AB} = \zeta_A\,\delta_{AB} + \sum_{a,b=1}^F \chi_{Aa}\,\varphi_{ab}\,\chi_{Bb} \tag{104}$$
$$\chi_{Aa} = \chi_A\,\delta_{S(A),a} \tag{105}$$
$$S: \{1,\dots,K\} \mapsto \{1,\dots,F\} \tag{106}$$

Here: $\chi_{Aa}$ is a $K\times F$ factor loadings matrix; $\varphi_{ab}$ is an $F\times F$ factor covariance matrix; the $K$-vector $\chi_A$ is a priori arbitrary; $S$ maps the $K$ "sub-clusters" labeled by $A$ to the $F$ "clusters" labeled by $a$, $a = 1,\dots,F$; each sub-cluster belongs to one and only one cluster; and $J'(a) = \{A\,|\,S(A) = a\}$ is the set of sub-clusters that belong to the cluster labeled by $a$. Straightforward algebra gives:

$$Q^{-1}_{AB} = \frac{1}{\nu_A}\,\delta_{AB} + \frac{\chi_A}{\nu_A\,\zeta_A}\,\frac{\chi_B}{\nu_B\,\zeta_B}\,\kappa^{-1}_{S(A),S(B)} \tag{107}$$
$$\nu_A = \frac{1}{\zeta_A} + \Lambda_A \tag{108}$$
$$\kappa_{ab} = \varphi^{-1}_{ab} + \delta_{ab}\sum_{A\in J'(a)} \frac{\chi_A^2\,\Lambda_A}{1+\zeta_A\,\Lambda_A} \tag{109}$$

So, we have

$$\gamma_A = \frac{1}{1+\zeta_A\,\Lambda_A}\left[1 - \chi_A \sum_{B=1}^K \frac{\chi_B\,\Lambda_B}{1+\zeta_B\,\Lambda_B}\,\kappa^{-1}_{S(A),S(B)}\right] \tag{110}$$

Following our logic above, we must take uniform $\chi_A \equiv \chi$. However, unlike in the 1-factor case, where $\kappa$ was a number, here we instead have a matrix $\kappa_{ab}$, which depends on the details of $\varphi_{ab}$.
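The closed form (107)-(110) can likewise be checked numerically. The sketch below (Python/numpy, illustrative toy values only) builds the $F$-factor model (104)-(105) for a small $K$ and $F$, computes $\gamma_A$ both directly via (87)-(89) and via the matrix-$\kappa$ expression (109)-(110), and confirms they agree:

```python
import numpy as np

rng = np.random.default_rng(3)
K, F = 6, 2
S = np.array([0, 0, 0, 1, 1, 1])       # map S: sub-clusters -> clusters, eq. (106)
zeta = rng.uniform(0.2, 1.0, K)
chi = rng.uniform(0.3, 0.8, K)         # a priori arbitrary chi_A
Lam = rng.uniform(1.0, 8.0, K)
varphi = np.array([[0.5, 0.1], [0.1, 0.4]])       # F x F factor covariance varphi_ab
X = np.zeros((K, F)); X[np.arange(K), S] = chi    # loadings chi_{Aa}, eq. (105)
phi = np.diag(zeta) + X @ varphi @ X.T            # eq. (104)

# gamma_A directly from (87)-(89):
Q = np.linalg.inv(phi) + np.diag(Lam)
gamma_direct = 1.0 - np.linalg.inv(Q) @ Lam

# gamma_A from the closed form (108)-(110):
t = chi**2 * Lam / (1.0 + zeta * Lam)
kappa = np.linalg.inv(varphi) + np.diag([t[S == a].sum() for a in range(F)])  # eq. (109)
kinv = np.linalg.inv(kappa)
s = chi * Lam / (1.0 + zeta * Lam)
gamma_closed = (1.0 - chi * (kinv[S][:, S] @ s)) / (1.0 + zeta * Lam)         # eq. (110)

assert np.allclose(gamma_direct, gamma_closed)
```

Here `kinv[S][:, S]` is the $K\times K$ array of elements $\kappa^{-1}_{S(A),S(B)}$, i.e., the inverse of the matrix $\kappa_{ab}$ lifted back to sub-cluster indices.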
Consider a 1-factor model:

$$\varphi_{ab} = \rho_a\,\delta_{ab} + \omega_a\,\omega_b \tag{111}$$

For the same reason as why $\chi_A \equiv \chi$ are uniform, we must take uniform $\omega_a \equiv \omega$. (This is the original nested "Russian-doll" embedding of [Kakushadze, 2015b] used in [Kakushadze, 2015c] and [Kakushadze and Yu, 2016a]. Thus, we have $K$ sub-clusters grouped into $F$ clusters. Similarly to (97), we can construct the benchmark for the universe of sub-clusters corresponding to each cluster (recall that $E_A \equiv 1$): $[w(a)]_A = \eta_a/\zeta_A$, $A\in J'(a)$, where $\eta_a^{-1} = \sum_{A\in J'(a)} \zeta_A^{-1}$. We can compute the $F$ cluster returns $R_a$ using the $K$ sub-cluster returns: $R_a = \sum_{A\in J'(a)} [w(a)]_A\,R_A$. Then the cluster expected returns $E_a \equiv 1$. Hence uniform $\omega_a$. Again, the foregoing holds up to immaterial overall normalizations.) For uniform $\omega_a$, straightforward algebra yields the following simple result:

$$\gamma_A = \frac{1}{1+\zeta_A\,\Lambda_A}\;\frac{1}{1+\rho_{S(A)}\,\lambda_{S(A)}}\;\frac{1}{1+\tau} \tag{112}$$
$$\lambda_a = \chi^2 \sum_{A\in J'(a)} \frac{\Lambda_A}{1+\zeta_A\,\Lambda_A} \tag{113}$$
$$\tau = \omega^2 \sum_{a=1}^F \frac{\lambda_a}{1+\rho_a\,\lambda_a} \tag{114}$$

Note that the factorization in (112) occurs precisely because $\chi_A$ and $\omega_a$ are uniform.

Above we considered a 2-level clustering scheme. It is now evident how to generalize it to any $P$-level clustering scheme. We have the following sequence: Stocks (Level-0) $\to$ Level-1 Clusters $\to$ Level-2 Clusters $\to\dots\to$ Level-$P$ Clusters $\to$ "Market" (Level-$(P+1)$). Here "Market" means the entire universe of $N$ stocks and can be thought of as the final single cluster in the above sequence. Thus, BICS is a 3-level industry classification ($P = 3$), where Level-1 Clusters = BICS Sub-industries, Level-2 Clusters = BICS Industries, and Level-3 Clusters = BICS Sectors. (Note that GICS (see above) has $P = 4$ levels. Also, the last leg in the above sequence, "Level-$P$ Clusters $\to$ "Market" (Level-$(P+1)$)", can be treated as optional and omitted, if so desired (see below).) We have the following nested "Russian-doll" risk model construction (here $\ell = 0,1,\dots,P$):

$$\Gamma^{(\ell)}_{A(\ell),B(\ell)} = \big[\zeta^{(\ell)}_{A(\ell)}\big]\,\delta_{A(\ell),B(\ell)} + \sum_{A(\ell+1),B(\ell+1)=1}^{K(\ell+1)} \Omega^{(\ell)}_{A(\ell),A(\ell+1)}\,\Gamma^{(\ell+1)}_{A(\ell+1),B(\ell+1)}\,\Omega^{(\ell)}_{B(\ell),B(\ell+1)} \tag{115}$$

Here: $A(0), B(0) = 1,\dots,N$ label stocks (i.e., they are the indices $i, j = 1,\dots,N$ in the notations above); $\Gamma^{(0)}_{ij} = \Gamma_{ij}$ is the factor model covariance matrix for stocks (and $[\zeta^{(0)}_i] = \xi_i$ are the corresponding specific variances); $A(\ell), B(\ell) = 1,\dots,K(\ell)$, $\ell = 1,\dots,P$, label the Level-$\ell$ Clusters; $\Gamma^{(\ell)}_{A(\ell),B(\ell)}$, $\ell = 1,\dots,P$, are the factor covariance matrices corresponding to the Level-$\ell$ Clusters; at Level-$(P+1)$ we have $A(P+1) = B(P+1) = 1$ (i.e., these indices take only one value corresponding to the "Market", so we have $K(P+1) = 1$), and we can either have $[\zeta^{(P+1)}_1] = \Gamma^{(P+1)}_{11} > 0$ (so that $\Gamma^{(P)}_{A(P),B(P)}$ is a 1-factor model), or we can set $[\zeta^{(P+1)}_1] = \Gamma^{(P+1)}_{11} = 0$ (so that $\Gamma^{(P)}_{A(P),B(P)}$ is diagonal, i.e., a "0-factor model"). (Let us emphasize that, unlike for stocks, uniform factor loadings are reasonable for clusters (e.g., industries, sectors, etc.). This is because clusters are diversified stock portfolios with much less skewed volatilities than stocks. Therefore, in this regard, very small clusters should be avoided.) Finally, for the factor loadings $\Omega^{(\ell)}_{A(\ell),A(\ell+1)}$ we have (here, as above, $\beta_i$ are the stock betas):

$$\Omega^{(0)}_{i,A(1)} = \beta_i\,\delta_{G^{(0)}(i),A(1)} \tag{116}$$
$$\Omega^{(\ell)}_{A(\ell),A(\ell+1)} = \chi^{(\ell)}\,\delta_{G^{(\ell)}(A(\ell)),A(\ell+1)},\quad \ell = 1,\dots,P \tag{117}$$
$$G^{(\ell)}: \{1,\dots,K(\ell)\} \mapsto \{1,\dots,K(\ell+1)\},\quad \ell = 0,1,\dots,P \tag{118}$$

Here: $G^{(\ell)}$ is a map from the Level-$\ell$ Clusters to the Level-$(\ell+1)$ Clusters, $\ell = 0,1,\dots,P$; "Level-0 Clusters" = stocks; $K(0) = N$; and at Level-$(P+1)$ we have the "Market" (a single "cluster"). The benchmark portfolio weights are then given by:

$$w_i = \sigma_F^2\,\frac{\beta_i}{\xi_i}\,\gamma_{G^{(0)}(i)} \tag{119}$$
$$\gamma_{A(1)} = \prod_{\ell=1}^{P+1} \left(1 + \big[\zeta^{(\ell)}_{F^{(\ell)}(A(1))}\big]\,\Lambda^{(\ell)}_{F^{(\ell)}(A(1))}\right)^{-1} \tag{120}$$
$$F^{(1)}(A(1)) = A(1) \tag{121}$$
$$F^{(\ell+1)}(A(1)) = G^{(\ell)}(F^{(\ell)}(A(1))) = G^{(\ell)}(G^{(\ell-1)}(\dots G^{(1)}(A(1))\dots)) \tag{122}$$
$$\Lambda^{(1)}_{A(1)} = \sum_{j\in J^{(0)}(A(1))} \frac{\beta_j^2}{\xi_j} \tag{123}$$
$$\Lambda^{(\ell+1)}_{A(\ell+1)} = \big[\chi^{(\ell)}\big]^2 \sum_{A(\ell)\in J^{(\ell)}(A(\ell+1))} \Lambda^{(\ell)}_{A(\ell)}\left(1 + \big[\zeta^{(\ell)}_{A(\ell)}\big]\,\Lambda^{(\ell)}_{A(\ell)}\right)^{-1} \tag{124}$$
$$\sigma_F^{-2} = \sum_{A(1)=1}^{K(1)} \Lambda^{(1)}_{A(1)}\,\gamma_{A(1)} \tag{125}$$

Here: $\ell = 1,\dots,P$ in (124); the sets $J^{(\ell)}(A(\ell+1)) = \{A(\ell)\,|\,G^{(\ell)}(A(\ell)) = A(\ell+1)\}$; and $F^{(P+1)}(A(1)) \equiv 1$. The benchmark weights (119) comprise one of our main results.
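The weights (119)-(125) are, up to the normalization (125), just $\Gamma^{-1}\beta$ for the nested covariance matrix (115), evaluated in closed form without any matrix inversion. The following Python/numpy sketch (illustrative only; toy inputs, with the simplest nontrivial depth $P = 1$) builds $\Gamma_{ij}$ explicitly via (115)-(117), computes the weights via the recursion (119)-(125), and checks that they match the direct linear-algebra answer and are all positive:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 12, 3
G = np.arange(N) % K               # stock -> Level-1 cluster map G^(0), each cluster non-empty
beta = rng.uniform(0.5, 1.5, N)    # stock betas (Level-0 loadings), eq. (116)
xi = rng.uniform(0.05, 0.2, N)     # stock specific variances xi_i
zeta1 = rng.uniform(0.5, 1.0, K)   # Level-1 specific variances zeta^(1)_A
chi1 = 0.7                         # uniform Level-1 loading chi^(1), eq. (117)
gamma2 = 0.9                       # "Market" variance Gamma^(2)_{11} = zeta^(2)_1 > 0

# Nested covariance, eq. (115): Level-1 factor covariance, then the stock covariance
Gamma1 = np.diag(zeta1) + chi1**2 * gamma2 * np.ones((K, K))
B = np.zeros((N, K)); B[np.arange(N), G] = beta
Gamma0 = np.diag(xi) + B @ Gamma1 @ B.T

# Benchmark weights via the explicit recursion (119)-(125):
Lam1 = np.array([np.sum(beta[G == A]**2 / xi[G == A]) for A in range(K)])  # eq. (123)
Lam2 = chi1**2 * np.sum(Lam1 / (1 + zeta1 * Lam1))                         # eq. (124)
gamma_A = 1.0 / ((1 + zeta1 * Lam1) * (1 + gamma2 * Lam2))                 # eq. (120)
sigF2 = 1.0 / np.sum(Lam1 * gamma_A)                                       # eq. (125)
w = sigF2 * beta / xi * gamma_A[G]                                         # eq. (119)

# Cross-check: w equals Gamma^{-1} beta, normalized such that beta . w = 1
w_direct = np.linalg.solve(Gamma0, beta)
w_direct /= beta @ w_direct
assert np.allclose(w, w_direct)
assert np.all(w > 0)               # explicitly positive benchmark weights
```

With all $\beta_i > 0$ and positive specific variances at every level, the factors in (120) are manifestly positive, which is what makes the long-only property automatic here.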
Assuming all $\beta_i > 0$ and all $\chi^{(\ell)} > 0$, we have manifestly positive weights $w_i$, as the matrix $\Gamma_{ij}$ has all positive elements. Thus, pursuant to the Perron-Frobenius theorem [Perron, 1907], [Frobenius, 1912], all $V^{(1)}_i > 0$ (more precisely, the signs of all $V^{(1)}_i$ can always be flipped simultaneously), where $V^{(1)}_i$ is the first principal component of $\Gamma_{ij}$. However, (119) is not based on principal components of some random positive covariance (or correlation) matrix $\Gamma_{ij}$. Instead, here we construct a non-random, meaningful $\Gamma_{ij}$ for "arbitrary" $\beta_i > 0$. (Further, note that the factor in the product (120) corresponding to $\ell = P+1$ is actually independent of $A(1)$, and it equals 1 if $\zeta^{(P+1)}_1 = 0$, i.e., if $\Gamma^{(P)}_{A(P),B(P)}$ is a "0-factor model".) However, $\beta_i$ must be skewed similarly to $\sigma_i$, where $\sigma_i^2 = \Gamma_{ii} = C_{ii}$ are the sample variances for stocks. I.e., $\widehat\beta_i = \beta_i/\sigma_i$ is the quantity that is expected not to be skewed. Otherwise, within the same Level-1 Cluster labeled by $A(1)$, stocks with large $\sigma_i$, $i\in J^{(0)}(A(1))$, would have small correlations with other stocks. So, at Level-0 we have the following factor model $\widehat\Gamma_{ij}$ for the correlation matrix:

$$\Gamma_{ij} = \sigma_i\,\sigma_j\,\widehat\Gamma_{ij} \tag{126}$$
$$\widehat\Gamma_{ij} = \widehat\xi_i\,\delta_{ij} + \widehat\beta_i\,\Gamma^{(1)}_{G^{(0)}(i),G^{(0)}(j)}\,\widehat\beta_j \tag{127}$$
$$\widehat\xi_i = \xi_i/\sigma_i^2 \tag{128}$$
$$\widehat\xi_i + \widehat\beta_i^2\,\Gamma^{(1)}_{G^{(0)}(i),G^{(0)}(i)} \equiv 1 \tag{129}$$

To compute arbitrary $\widehat\beta_i$ we would have to use the method of Section 4 of [Kakushadze and Yu, 2016a], whereby we have some trial values $\widehat\beta'_i$, and the actual values $\widehat\beta_i$ are related to $\widehat\beta'_i$ via a highly nonlinear combination of $\widehat\beta'_i$ and the sample correlation matrix $\Psi_{ij}$. It is then impracticable to detangle $\widehat\beta'_i$ from the desired values $\widehat\beta_i$. To avoid complications with such nonlinearities, we can use the heterotic construction of [Kakushadze, 2015c], [Kakushadze and Yu, 2016a], where $\widehat\beta_i$, $i\in J^{(0)}(A(1))$, are given by the first principal component of the square block $\Psi_{ij}$, $i,j\in J^{(0)}(A(1))$. However, these principal components are not guaranteed to be all positive.
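The Perron-Frobenius statement above is straightforward to illustrate numerically: for a symmetric matrix with strictly positive entries, the eigenvector of the largest eigenvalue has entries all of one sign. This small Python/numpy sketch (toy random matrix, illustrative only) confirms it; note that for a sample correlation block with some negative entries this guarantee disappears, which is exactly the issue with the heterotic principal-component loadings noted above:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 8
# A symmetric matrix with all positive entries (a stylized correlation block):
W = rng.uniform(0.1, 0.9, (n, n))
R = (W + W.T) / 2.0
np.fill_diagonal(R, 1.0)

val, vec = np.linalg.eigh(R)
v1 = vec[:, -1]                 # eigenvector of the largest (Perron) eigenvalue
v1 = v1 * np.sign(v1.sum())     # overall sign is conventional; flip it to positive

assert np.all(v1 > 0)           # Perron-Frobenius: strictly positive entries
```

The overall sign flip is the "signs can always be flipped simultaneously" caveat in the text; it is the only freedom, since no mixed-sign first principal component can occur for a positive matrix.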
This can be overcome by deforming each block such that all correlations therein are positive. This is doable but somewhat convoluted, and there is no unique way of doing it. At the end we would have just a single choice of $\beta_i$ (subject to variability due to the choice of the deformation, that is). Another possibility is to take $\widehat\beta_i \equiv 1$. (Note that, equivalently, we can take $\widehat\beta_i = b_{G^{(0)}(i)}$, where $b_{A(1)} > 0$: the rescaling $\widehat\beta_i \to \widehat\beta_i\,b_{G^{(0)}(i)}$ is simply absorbed by the corresponding rescaling of the factor covariance matrix $\Gamma^{(1)}_{A(1),B(1)} \to \Gamma^{(1)}_{A(1),B(1)}/b_{A(1)}\,b_{B(1)}$, so the factor model is unaffected. Also, this method would not work for the Level-$\ell$ Clusters, $\ell \geq 1$.) Happily, precisely because for each block we are dealing with a 1-factor model, we can use another, simple method to satisfy (129). (To be clear, the method of Section 4 of [Kakushadze and Yu, 2016a] is perfectly adequate if we wish to construct a factor model, e.g., for optimization purposes, and need not worry about the precise values of the factor loadings, i.e., the fact that the trial and actual loadings are not the same. However, the problem at hand is different: it is to construct a benchmark portfolio for given betas, and the actual factor loadings must coincide with these betas by construction.) Consider a symmetric $M\times M$ matrix $X_{\alpha\beta}$, $\alpha,\beta = 1,\dots,M$. For our purposes here it will suffice to assume that $X_{\alpha\beta}$ is semi-positive-definite (as here we are interested in cases where $X_{\alpha\beta}$ is a covariance or correlation matrix). Suppose we wish to model it via a 1-factor model:

$$Y_{\alpha\beta} = a_\alpha\,\delta_{\alpha\beta} + b_\alpha\,\vartheta\,b_\beta \tag{130}$$

subject to reproducing the diagonal elements:

$$Y_{\alpha\alpha} = X_{\alpha\alpha} = \varsigma_\alpha^2 \tag{131}$$
$$a_\alpha + \vartheta\,b_\alpha^2 = \varsigma_\alpha^2 \tag{132}$$

So, we need to fit the unknown $\vartheta$ given the values of $b_\alpha$ and $X_{\alpha\beta}$. We can do this as follows.
First, let us define $z_{min}$ and $z_{max}$ such that for all values of $\alpha$ we have

$$z_{min}\,\varsigma_\alpha^2 \leq a_\alpha \leq z_{max}\,\varsigma_\alpha^2 \tag{133}$$

I.e., $z_{min}$ and $z_{max}$ define the minimum and maximum allowed values of the fraction of the total variance $\varsigma_\alpha^2$ attributable to the specific risk $a_\alpha$. (E.g., we can set $z_{min} = 0.2$ and $z_{max} = 0.9$.) Then we must have

$$\vartheta_{min} \leq \vartheta \leq \vartheta_{max} \tag{134}$$
$$\vartheta_{min} = \big(1 - z_{max}\big)/\min(\widehat b_\alpha^2) \tag{135}$$
$$\vartheta_{max} = \big(1 - z_{min}\big)/\max(\widehat b_\alpha^2) \tag{136}$$
$$\widehat b_\alpha = b_\alpha/\varsigma_\alpha \tag{137}$$

So, given $\varsigma_\alpha$, the values of $b_\alpha$ cannot be arbitrary but must be such that $\vartheta_{min} \leq \vartheta_{max}$ (see Appendix A for how $\vartheta_{min} > \vartheta_{max}$ cases are dealt with). Next, we can find the value of $\vartheta$ which provides the least-squares fit of the off-diagonal elements of

$$\widehat Y_{\alpha\beta} = Y_{\alpha\beta}/\varsigma_\alpha\,\varsigma_\beta \tag{138}$$

into those of $\widehat X_{\alpha\beta} = X_{\alpha\beta}/\varsigma_\alpha\,\varsigma_\beta$ (note that the diagonal elements $\widehat Y_{\alpha\alpha} \equiv 1$ by construction):

$$\sum_{\alpha,\beta=1;\,\alpha\neq\beta}^M \left[\widehat X_{\alpha\beta} - \widehat b_\alpha\,\vartheta\,\widehat b_\beta\right]^2 \to \min \tag{139}$$

subject to (134). So, we have

$$\vartheta = \min(\max(\vartheta_*,\vartheta_{min}),\vartheta_{max}) \tag{140}$$
$$\vartheta_* = \frac{\sum_{\alpha,\beta=1;\,\alpha\neq\beta}^M \widehat b_\alpha\,\widehat X_{\alpha\beta}\,\widehat b_\beta}{\sum_{\alpha,\beta=1;\,\alpha\neq\beta}^M \widehat b_\alpha^2\,\widehat b_\beta^2} \tag{141}$$

Note that in our context here $\widehat X_{\alpha\beta}$ is a sample correlation matrix, so $|\widehat X_{\alpha\beta}| \leq 1$; in fact, $|\widehat X_{\alpha\beta}| < 1$ for $\alpha \neq \beta$. Assuming the $\widehat b_\alpha$ are tightly distributed, we can expect $\vartheta_*$ to be somewhere between $\vartheta_{min}$ and $\vartheta_{max}$ (as opposed to saturating these bounds).

3.7.2 Application to "Russian-doll" Embedding

Given $X_{\alpha\beta}$, $b_\alpha$, $z_{min}$ and $z_{max}$, let $\theta(X_{\alpha\beta}, b_\alpha, z_{min}, z_{max})$ denote the value of $\vartheta$ given by (140). Then we have the following procedure for computing the specific risks and factor covariance matrices in the nested "Russian-doll" embedding described above (below we suppress the $z_{min}$, $z_{max}$ arguments; in fact, they can be $\ell$-dependent, if so desired):

$$X^{(0)}_{ij} = C_{ij} \tag{142}$$
$$b^{(0)}_i = \beta_i \tag{143}$$
$$b^{(\ell)}_{A(\ell)} \equiv \chi^{(\ell)},\quad \ell = 1,\dots,P \tag{144}$$
$$\Gamma^{(\ell+1)}_{A(\ell+1),A(\ell+1)} = \theta\big(X^{(\ell)}_{A(\ell),B(\ell)},\,b^{(\ell)}_{A(\ell)}\big),\quad A(\ell),B(\ell)\in J^{(\ell)}(A(\ell+1)) \tag{145}$$
$$X^{(\ell+1)}_{A(\ell+1),B(\ell+1)} = \widetilde X^{(\ell+1)}_{A(\ell+1),B(\ell+1)}\,u^{(\ell+1)}_{A(\ell+1)}\,u^{(\ell+1)}_{B(\ell+1)} \tag{146}$$
$$u^{(\ell+1)}_{A(\ell+1)} = \sqrt{\frac{\Gamma^{(\ell+1)}_{A(\ell+1),A(\ell+1)}}{\widetilde X^{(\ell+1)}_{A(\ell+1),A(\ell+1)}}} \tag{147}$$
$$\widetilde X^{(\ell+1)}_{A(\ell+1),B(\ell+1)} = \sum_{A(\ell)\in J^{(\ell)}(A(\ell+1))}\;\sum_{B(\ell)\in J^{(\ell)}(B(\ell+1))} X^{(\ell)}_{A(\ell),B(\ell)}\,b^{(\ell)}_{A(\ell)}\,b^{(\ell)}_{B(\ell)} \tag{148}$$
$$\big[\zeta^{(\ell)}_{A(\ell)}\big] = X^{(\ell)}_{A(\ell),A(\ell)} - \big[b^{(\ell)}_{A(\ell)}\big]^2\,\Gamma^{(\ell+1)}_{G^{(\ell)}(A(\ell)),G^{(\ell)}(A(\ell))} \tag{149}$$

Here, as above, $C_{ij}$ is the sample covariance matrix of the stock returns. Also, note that the choice of $\chi^{(\ell)}$, $\ell = 1,\dots,P$, is immaterial. This procedure together with (115) completely defines the risk model. All specific risks and factor covariance matrices are positive-definite by construction. And so are the benchmark weights (119) for a range of values of $\beta_i$, so long as $\widehat\beta_i = \beta_i/\sigma_i$ are not skewed (see above).

Here we can ask two questions. Why does the above construction make sense? And are there viable alternatives? Thus, above we construct the risk model in a rather specific way, grouping stocks into clusters and essentially building a 1-factor model within each cluster with the factor loadings given by the betas.
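The fitting function $\theta(\cdot)$ of (130)-(141) is simple enough to sketch directly. The Python/numpy version below is an illustration only (the paper's reference implementation is the R code in the appendix); the default bounds $z_{min} = 0.2$, $z_{max} = 0.9$ are the example values from the text. On a synthetic matrix that is exactly of the 1-factor form (130), the fit recovers $\vartheta$ exactly:

```python
import numpy as np

def theta(X, b, z_min=0.2, z_max=0.9):
    """Fit Y = diag(a) + theta * b b^T to X per eqs. (130)-(141): match the
    diagonal exactly, least-squares-fit the off-diagonal, and keep the
    specific-variance fraction a_alpha / X_aa within [z_min, z_max]."""
    s2 = np.diag(X)                        # total variances varsigma_alpha^2, eq. (131)
    bh = b / np.sqrt(s2)                   # b-hat_alpha, eq. (137)
    th_min = (1.0 - z_max) / np.min(bh**2) # eq. (135)
    th_max = (1.0 - z_min) / np.max(bh**2) # eq. (136)
    assert th_min <= th_max                # otherwise handled as in Appendix A
    Xh = X / np.sqrt(np.outer(s2, s2))     # correlation-normalized X-hat
    off = ~np.eye(len(b), dtype=bool)      # off-diagonal mask
    num = (np.outer(bh, bh) * Xh)[off].sum()
    den = np.outer(bh**2, bh**2)[off].sum()
    return float(np.clip(num / den, th_min, th_max))   # eqs. (140)-(141)

# Exact recovery on a synthetic 1-factor matrix:
b = np.array([0.9, 1.0, 1.1, 1.0])
a = np.array([0.5, 0.6, 0.4, 0.5])
X = np.diag(a) + 0.45 * np.outer(b, b)
assert abs(theta(X, b) - 0.45) < 1e-12
```

In the "Russian-doll" procedure (142)-(149), this function is what is called at every level via (145), once per cluster block.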
Then we group these clusters into further clusters and repeat the procedure until we end up with a stable and positive-definite factor covariance matrix. (Thus, if we, e.g., take $z_{min} = 0.2$ and $z_{max} = 0.9$, then $\vartheta_{min} = 0.1/\min(\widehat\beta_i^2)$ and $\vartheta_{max} = 0.8/\max(\widehat\beta_i^2)$, so the allowed range of betas is $\max(\widehat\beta_i)/\min(\widehat\beta_i) \leq \sqrt{0.8/0.1} \approx 2.83$. Note that for the Level-$\ell$ Clusters, $\ell = 1,\dots,P$, generally it is reasonable to expect the factor covariances and specific risks to be of order 1 and nontrivial solutions for the fitted values of $\vartheta$ (i.e., $\Gamma^{(\ell+1)}_{A(\ell+1),A(\ell+1)}$ computed via (145)) to exist. Appendix A deals with occasions when this does not hold. This is essentially the heterotic CAPM construction of [Kakushadze and Yu, 2016a], which is similar to the heterotic construction of [Kakushadze, 2015c], except that in the latter the factor loadings are based on principal components, which are not necessarily all positive (see above).) The question we can ask is, can we take a more general risk model construction instead? Basically, there are two separate issues here. The first issue is independent of the fact that we are dealing with a long-only portfolio and pertains to the fact that: i) higher-than-first principal components are intrinsically unstable out-of-sample; ii) standard style factors (see above) are poor proxies for modeling pair-wise correlations, so, contrary to a common practice in commercial risk model offerings, using them as factor loadings is highly suboptimal (see [Kakushadze and Yu, 2016a] for details; also, their number is limited and fails to compete with ubiquitous industry (cluster) factors); and iii) well-constructed fundamental industry classifications are rather stable out-of-sample, as companies rarely jump industries (let alone sectors). (Statistical industry classifications are not as stable as fundamental industry classifications, but still sizably outperform models based on principal components [Kakushadze and Yu, 2016b].) This is what justifies the above construction, except for the choice of the factor loadings, which in this case are simply the betas. And this latter part of our construction is dictated by the fact that this choice is the only one that effectively removes the "market mode". Other choices of factor loadings generically would lead to undesirable negative weights $w_i$. In this regard, if we take a generic multifactor risk model for $\Gamma_{ij}$, which includes, e.g., style factors and/or principal components, some weights generically will be negative for any given choice of the betas. Let us emphasize that we can always work backwards, pick some positive weights $w_i$, and compute the corresponding betas $\beta_i$. However, it is unclear what any such portfolio represents.
In contrast, in the above construction the meaning of the resultant benchmark portfolio is clear. Thus, assuming that the second term in the parentheses in (120) is dominant at each level $\ell$ (typically, this is expected to be a good approximation for large clusters), we have

$$w_i \approx \eta\,\frac{\beta_i}{\xi_i\,\Lambda^{(1)}_{G^{(0)}(i)}}\,\prod_{\ell=1}^P \left(\big[\zeta^{(\ell)}_{F^{(\ell)}(G^{(0)}(i))}\big]\,\widetilde\Lambda^{(\ell+1)}_{F^{(\ell+1)}(G^{(0)}(i))}\right)^{-1} \tag{150}$$
$$\Lambda^{(1)}_{A(1)} = \sum_{j\in J^{(0)}(A(1))} \frac{\beta_j^2}{\xi_j} \tag{151}$$
$$\widetilde\Lambda^{(\ell+1)}_{A(\ell+1)} \approx \sum_{A(\ell)\in J^{(\ell)}(A(\ell+1))} \big[\zeta^{(\ell)}_{A(\ell)}\big]^{-1},\quad \ell = 1,\dots,P \tag{152}$$
$$\eta^{-1} \approx \sum_{A(1)=1}^{K(1)} \prod_{\ell=1}^P \left(\big[\zeta^{(\ell)}_{F^{(\ell)}(A(1))}\big]\,\widetilde\Lambda^{(\ell+1)}_{F^{(\ell+1)}(A(1))}\right)^{-1} \tag{153}$$

The interpretation of these weights (similarly to the example we discussed above) is clear: we suppress the weights by a product of specific variances at each level, with proper normalizations (such that at each level cluster betas are 1 up to immaterial overall normalization factors). Note that instead of specific variances we could use total variances, which would "overcount" the "market mode" risk, whereas specific variances do not. Alternatively, we can construct a long-only portfolio via explicit optimization (where the expected returns $E_i$ are related to the betas via (57)) subject to (60) and lower bounds $w_i \geq w^{min}_i$, where $w^{min}_i \geq 0$. A priori, here we can use a generic multifactor model covariance matrix $\Gamma_{ij}$ instead of the sample covariance matrix $C_{ij}$. However, for a generic $\Gamma_{ij}$ a large fraction of the $w_i$ (in many cases, around 50%) can turn out to saturate the lower bounds $w^{min}_i$. Such portfolios generically can be skewed and far from "optimality".

Now that we have a method for constructing benchmark portfolios, we can ask a different question. Suppose we have an "alpha model", which forecasts expected returns $E_i$. Can we construct a long-only portfolio based on these returns? If all expected returns are positive, then we can treat them as betas, i.e., as in (57), set $\beta_i = E_i/\gamma$ (where $\gamma$ is an immaterial normalization factor) and compute the weights $w_i$ via (119). There are two independent issues with this approach. First, even if all $E_i \geq 0$, the distribution of $E_i/\sigma_i$ may be too wide, so using the method (140) might be problematic. Second, in practice, most alpha models will not have all nonnegative $E_i$; in fact, many $E_i$ (in many cases, around 50%) may be negative.

So, how can we deal with this? Instead of constructing a long-only portfolio based on $E_i$ from scratch, we can follow a different, 2-step approach. Since we are building a long-only portfolio, we are exposed to market risk no matter what we do. So, we might as well identify a benchmark portfolio whose market exposure we are willing to live with. (This benchmark portfolio can, but need not, be constructed as in Section 3 (see below).) Then we can try to construct a long-only portfolio that, based on out-of-sample backtests, can reasonably be expected (albeit, as with any forward-looking statements, not guaranteed) to outperform this benchmark portfolio. One way to construct this portfolio is to combine the benchmark portfolio with a dollar-neutral portfolio (such that the resultant portfolio is still long-only), where the dollar-neutral portfolio has a positive expected return and a low correlation with the benchmark portfolio. So, for the weights $w_i$ of our long-only portfolio, we have

$$w_i = w^*_i + w'_i \tag{154}$$
$$\sum_{i=1}^N w'_i = 0 \tag{155}$$

(Note that at the level of stocks (i.e., Level-0 in the above nomenclature), we can always find $\beta_i$ such that (140) exists. Thus, we can take $\beta_i = \sigma_i$, so $\widehat\beta_i \equiv 1$. Then $\vartheta_* = [N(N-1)]^{-1}\sum_{i,j=1;\,i\neq j}^N \Psi_{ij}$, where $\Psi_{ij}$ is the sample correlation matrix. That is, $\vartheta_*$ in this case is the average pair-wise correlation, which typically is positive and well within reasonably set bounds $\vartheta_{min}$ and $\vartheta_{max}$ for a given cluster (e.g., sub-industry, industry, sector) in a well-constructed industry classification.)

Here $w'_i$ are the weights of the dollar-neutral portfolio, such that

$$w^{min}_i \leq w'_i \leq w^{max}_i \tag{156}$$
$$w^{min}_i \geq -w^*_i \tag{157}$$

and $w^*_i > 0$ are the benchmark weights. E.g., we can require that the combined weights $w_i$ do not deviate from the benchmark portfolio by more than some percentage: $w^{min}_i = -z\,w^*_i$, $w^{max}_i = z\,w^*_i$, where, say, $0 < z < 1$. Other customizations/variations are possible. The weights $w'_i$ can be fixed in a variety of standard ways, e.g., via Sharpe ratio (or mean-variance, see above) optimization, which is what we will assume for the sake of definiteness (albeit this is not critical here). Then, ignoring for a moment the bounds (156) and the dollar-neutrality constraint (155), we have

$$w'_i = \gamma' \sum_{j=1}^N \Gamma'^{-1}_{ij}\,E_j \tag{158}$$

Here $\gamma'$ is a normalization coefficient (to be determined), and $\Gamma'^{-1}_{ij}$ is the inverse of $\Gamma'_{ij}$, which is an $N\times N$ (typically, multifactor) model covariance matrix. Note that $\Gamma'_{ij}$ need not be the same as $\Gamma^*_{ij}$, which denotes the multifactor model covariance matrix used in constructing the benchmark weights $w^*_i$. This is because $w'_i$ is a long-short (dollar-neutral) portfolio, so we do not have the same kinds of restrictions on $\Gamma'_{ij}$ as on $\Gamma^*_{ij}$. In fact, generally, we expect that a $\Gamma'_{ij}$ built using, say, the heterotic construction [Kakushadze, 2015c], [Kakushadze and Yu, 2016a] (which utilizes first principal components of the blocks of the sample correlation matrix corresponding to clusters) would work better than $\Gamma^*_{ij}$. Then, given some $\Gamma'_{ij}$, what should $\gamma'$ be?

Considering that $\Gamma'_{ij}$ is the multifactor model covariance matrix we use for modeling the risk of a generic portfolio, we can compute the expected Sharpe ratio of the combined portfolio as follows:

$$S = \frac{\sum_{i=1}^N E_i\,w_i}{\sqrt{\sum_{i,j=1}^N \Gamma'_{ij}\,w_i\,w_j}} \tag{159}$$

Further, the expected correlation $\rho$ between the portfolios $w^*_i$ and $w'_i$ is given by

$$\rho = \frac{1}{\sigma^*\,\sigma'} \sum_{i,j=1}^N \Gamma'_{ij}\,w^*_i\,w'_j = \frac{\gamma'\,E^*}{\sigma^*\,\sigma'} = \frac{E^*}{\sigma^*\,e'} \tag{160}$$

Note that a generic $\Gamma'^{-1}_{ij}$ includes the "market mode", so unless the returns $E_i$ are contrivedly fine-tuned, the weights $w'_i$, while not exactly dollar-neutral, generically are not expected to be highly skewed toward long or short positions. So, ignoring the dollar-neutrality condition is not detrimental. Without dollar-neutrality, assuming $\sum_{i=1}^N w^*_i = 1$, we no longer have $\sum_{i=1}^N w_i = 1$.
However, this can be cured by simply rescaling $w_i$ (albeit this may move $w_i$ away from "optimality"). Ignoring the bounds poses a bigger issue. However, we will incorporate both the dollar-neutrality and the bounds in a moment; ignoring them for now serves the purpose of developing an intuitive understanding.

Here $E^*$ is the expected return of the benchmark portfolio, and $\sigma^*$ and $\sigma'$ are the expected volatilities of the $w^*_i$ and $w'_i$ portfolios:

$$E^* = \sum_{i=1}^N E_i\,w^*_i \tag{161}$$
$$(\sigma^*)^2 = \sum_{i,j=1}^N \Gamma'_{ij}\,w^*_i\,w^*_j \tag{162}$$
$$(\sigma')^2 = \sum_{i,j=1}^N \Gamma'_{ij}\,w'_i\,w'_j = (\gamma')^2\,(e')^2 \tag{163}$$
$$(e')^2 = \sum_{i,j=1}^N \Gamma'^{-1}_{ij}\,E_i\,E_j \tag{164}$$

So, for the Sharpe ratio $S$ as a function of $\gamma'$ we have:

$$S(\gamma') = \frac{E^* + \gamma'\,(e')^2}{\sigma(\gamma')} = \frac{\partial\sigma(\gamma')}{\partial\gamma'},\qquad \sigma(\gamma') = \sqrt{(\sigma^*)^2 + 2\,\gamma'\,E^* + (\gamma')^2\,(e')^2} \tag{165}$$

The Sharpe ratio is maximized when $\gamma' \to \infty$. (Thus, we have $S(0) = E^*/\sigma^* = \rho\,e'$, and $S(\gamma'\to\infty) = e' > S(0)$. Also, $\partial S(\gamma')/\partial\gamma' = (\sigma^*)^2\,(e')^2\,[1 - \rho^2]/\sigma^3(\gamma') > 0$.) However, in this limit we do not have a long-only portfolio. Instead, we have a long-short portfolio, which, in fact, would be dollar-neutral had we incorporated the dollar-neutrality constraint. In actuality, we must impose the bounds (156). Then, in the limit $\gamma' \to 0$, we have the long-only portfolio $w^*_i$. As we increase $\gamma'$, more and more bounds get saturated. The bounds distort the $w'_i$ portfolio away from "optimality". Above some value $\gamma'_{opt}$, the Sharpe ratio starts to fall off: $S(\gamma') < S(\gamma'_{opt})$ for $\gamma' > \gamma'_{opt}$. We can fix $\gamma'_{opt}$ via, e.g., the golden-section search [Kiefer, 1953] and use it as the "optimal" value of $\gamma'$.

In practice, in the presence of the bounds (156) it is easier to implement mean-variance optimization than maximizing the Sharpe ratio. (The two are not equivalent once bounds, costs, etc., are included, except in one special case of establishing trades with linear costs [Kakushadze, 2015a].) In the mean-variance optimization, we maximize the following objective function w.r.t. $w'_i$ (for a fixed value of $\gamma'$):

$$g(w'_i, \gamma') = \sum_{i=1}^N E_i\,w'_i - \gamma' \sum_{i,j=1}^N \Gamma'_{ij}\,w'_i\,w'_j \tag{166}$$

subject to the bounds (156) and the dollar-neutrality constraint (155). This optimization can be performed in a standard way (see, e.g., [Kakushadze, 2015a]). Note that the expected correlation $\rho$ generally is nonzero. So, the dollar-neutral portfolio $w'_i$ has a nonzero expected beta w.r.t. the benchmark portfolio. We may wish to make $\rho$ vanish. This can be achieved via the following linear homogeneous constraint on $w'_i$:

$$\sum_{i=1}^N q_i\,w'_i = 0 \tag{167}$$
$$q_i = \sum_{j=1}^N \Gamma'_{ij}\,w^*_j \tag{168}$$

Alternatively, we may wish to make the $w'_i$ portfolio simply orthogonal to the $w^*_i$ portfolio, which is achieved via the following constraint:

$$\sum_{i=1}^N w^*_i\,w'_i = 0 \tag{169}$$

More generally, we can have $p$ constraints

$$\sum_{i=1}^N Q_{ia}\,w'_i = 0,\quad a = 1,\dots,p \tag{170}$$

Here $Q_{ia}$ is an $N\times p$ matrix with linearly independent columns, one of which is the unit $N$-vector corresponding to the dollar-neutrality constraint. We can also include neutrality w.r.t. sectors, industries, etc., or some style factors, if so desired. Etc.
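Ignoring the bounds (156), the optimum (158) subject to the homogeneous constraints (170) has a standard closed form: project the unconstrained solution onto the constraint surface. The Python/numpy sketch below is illustrative only (random toy inputs; incorporating the bounds (156) additionally requires a numerical bounded optimizer, as discussed in the text). It imposes dollar neutrality (155) and the zero-expected-correlation constraint (167)-(168) and verifies both:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 20
E = rng.normal(0, 1, N)                           # alpha-model expected returns E_i
A = rng.normal(0, 0.3, (N, 3))
Gp = A @ A.T + np.diag(rng.uniform(0.5, 1.0, N))  # model covariance Gamma'_ij
w_star = rng.uniform(0, 1, N); w_star /= w_star.sum()  # a toy benchmark w*_i

# Constraint matrix Q_ia, eq. (170): the unit vector (dollar neutrality, (155))
# and q_i = (Gamma' w*)_i (zero expected correlation, (167)-(168)):
Q = np.column_stack([np.ones(N), Gp @ w_star])

# Constrained optimum: w' = gamma' * Gamma'^{-1} (E - Q lambda), with lambda
# chosen so that Q^T w' = 0 (the standard projected mean-variance solution):
Gi_E = np.linalg.solve(Gp, E)
Gi_Q = np.linalg.solve(Gp, Q)
lam = np.linalg.solve(Q.T @ Gi_Q, Q.T @ Gi_E)
gp = 0.05                                         # normalization coefficient gamma'
w_prime = gp * (Gi_E - Gi_Q @ lam)

assert abs(w_prime.sum()) < 1e-10                 # dollar-neutral, eq. (155)
assert abs(w_star @ Gp @ w_prime) < 1e-10         # rho = 0, eq. (167)
```

With $p = 0$ constraints, `w_prime` reduces to (158) exactly; the bounds (156) and the golden-section search over $\gamma'$ would sit on top of this core computation.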
We can achieve an approximately null expected correlation $\rho$ in a different way. For generic "raw" expected returns $E_i$ we have nonzero $E^*$ and thus nonzero $\rho$. However, given an $N$-vector $E_i$, we can construct $E'_i$ orthogonal to $w^*_i$, e.g., by regressing $E_i$ (with unit weights and no intercept) over $w^*_i$ and taking the residuals (i.e., $E'_i = \epsilon_i$):

$$\epsilon_i = E_i - w^*_i\,\frac{\sum_{j=1}^N E_j\,w^*_j}{\sum_{j=1}^N (w^*_j)^2} \tag{171}$$

More generally, we can use a weighted regression with weights $v_i$ (and no intercept):

$$\epsilon_i = v_i\left(E_i - w^*_i\,\frac{\sum_{j=1}^N v_j\,E_j\,w^*_j}{\sum_{j=1}^N v_j\,(w^*_j)^2}\right) \tag{172}$$

If in (166) we substitute $\epsilon_i$ instead of $E_i$, then we approximately achieve (167). (If it were not for the distortion caused by the bounds (156), (167) would be precisely satisfied. Then the question is, what should the weights $v_i$ be? Basing them on the volatilities $\sigma_i$ (or the corresponding specific risks) would make little sense, as i) we are already optimizing the $w'_i$ portfolio and ii) both $|E_i|$ and $w^*_i$ scale linearly with $\sigma_i$. So, we can simply take unit weights $v_i \equiv 1$, or base them on quantities that are independent of $\sigma_i$ or have milder dependence thereon (e.g., $\ln(\sigma_i)$).)

Concluding Remarks
Let us briefly conclude with some remarks. First, in the market outperformance strategy we discuss in Section 4, the benchmark w∗_i a priori can be any long-only portfolio (including S&P 500, Russell 3000, etc.), and not just one built using the method of Section 3. However, market cap weighted portfolios and the benchmark portfolios of Section 3 are expected to have sizable correlations. Intuitively this may appear to be evident as these are all long-only portfolios. However, this goes beyond such "zeroth-approximation" intuition. Thus, the weights w∗_i in the benchmarks of Section 3 scale as ∝ 1/σ_i, and −ln(σ_i) and ln(M_i) (where M_i is the market cap) are highly correlated. The "devil" then is in the details of the construction of Section 3, which aims to capture the substructure in the stock returns corresponding to the multilevel industry classification (clustering). I.e., there is more information in w∗_i than in M_i.

In our construction above, the β_i, albeit not completely arbitrary, are not fixed. Recall from Section 3 that, up to an overall normalization factor, β̂_i = β_i/σ_i are of order 1, with a tight distribution whose standard deviation is also of order 1. So, what should these betas be? A simple answer is that there is no magic bullet here. We can simply pick them, backtest them out-of-sample and compare the results with those for other values. (This kind of "sampling" can get computationally taxing quickly.) Or we can take the "observed" values β̂^obs_i based on some broad market index, calculate their median value β̂_median = median(β̂^obs_i), and then cap and floor the "outliers" by β̂_max = β̂_median + κ_max MAD(β̂^obs_i) and β̂_min = β̂_median − κ_min MAD(β̂^obs_i), where κ_max ∼ κ_min ∼ 1 (e.g., we can simply set κ_max = κ_min; here MAD = mean absolute deviation). Fixing β̂_i this way is a "roundabout" as it uses another (cap-weighted) benchmark. But then again, there are no "first principles" that can fix β̂_i uniquely.

Let us note that, while we can take β̂_i ≡ 1, we should not set β_i ≡ 1. Indeed, as mentioned above,

$$\beta_i = \sigma_i\,\rho_i/\sigma_F \qquad (173)$$

where, as before, σ_F is the benchmark portfolio volatility, and ρ_i is the sample correlation between the stock labeled by i and the benchmark. Setting β_i ≡ 1 would therefore amount to ρ_i ∝ 1/σ_i. Similarly, in the factor model context of Section 3, we would have that, within the same cluster (sector, industry, etc.), stocks with high volatilities are almost uncorrelated with stocks with low volatilities (and with each other). And this is not what we observe empirically (see, e.g., [Kakushadze, 2015c], [Kakushadze and Yu, 2016a]). We must have β_i ∝ σ_i.

To tie up the final "loose end", the weights w′_i of the dollar-neutral portfolio in Section 4 can be computed using the bounded optimization code in Appendix C of [Kakushadze, 2015c], to wit, the function bopt.calc.opt(), whose arguments are: ret is the N-vector of expected returns E_i; load is the matrix of constraints Q_ia; inv.cov is the inverse matrix Γ′⁻¹_ij; upper and lower are the bounds w^max_i and w^min_i.

R Source Code for Benchmark Weights
In this appendix we give the R (R Package for Statistical Computing) source code for computing the benchmark weights w_i based on the algorithm of Section 3. This code is essentially self-explanatory and straightforward, as it simply follows the formulas therein. The function qrm.benchmark(ret, ind, beta, mkt.fac = T, z.min = 0.1, z.max = 0.9) returns the weights w_i normalized as in (60). The input is as follows: i) ret is an N × T matrix of returns R_is (e.g., daily close-to-close returns), where N is the number of tickers, T is the number of observations in the time-series (e.g., the number of trading days), and the ordering of the dates is immaterial; ii) ind is a list of length P, whose elements are populated by the binary matrices (with rows corresponding to tickers, so dim(ind[[·]])[1] is N) corresponding to the levels in the input binary industry classification in the order of decreasing granularity (for BICS, ind[[1]] is the N × K^(1) matrix δ_{G(i),A} (sub-industries), ind[[2]] is the N × K^(2) matrix δ_{G′(i),A} (industries), and ind[[3]] is the N × K^(3) matrix δ_{G′′(i),A} (sectors), where G = G^(0) maps tickers to sub-industries, G′ = G^(0) G^(1) maps tickers to industries, and G′′ = G^(0) G^(1) G^(2) maps tickers to sectors); iii) beta is the N-vector β_i; iv) mkt.fac controls the final step: for TRUE we have a single industry factor ("Market"), while for FALSE the industry factors correspond to the least granular level in the industry classification (sectors for BICS); and v) z.min and z.max are z_min and z_max defined in Subsection 3.7.1.

There are two small tweaks in the source code beyond what is in Section 3. First, if Level-1 (in the nomenclature of Section 3) is very granular, for a given universe of stocks we can have one or more Level-1 clusters each containing only one stock (e.g., single-stock sub-industries can and do arise in BICS). Typically, for long-horizon long-only portfolios, using such granularity can be overkill. However, just in case, the source code deals with these situations in the internal function calc.theta(). Second, on occasion it can happen that θ_min > θ_max (these quantities are defined in (135) and (136) in Subsection 3.7.1). This can happen when there are outliers with too low or too high variances. Now, here we could try to do all kinds of contrived and convoluted things. But there is a simple way of dealing with this situation. Note that by definition both θ_min and θ_max are positive. Violating θ_max, i.e., having θ > θ_max, can – unacceptably – lead to negative specific variances (see Subsection 3.7.1). On the other hand, violating θ_min, i.e., having θ < θ_min, is not detrimental so long as θ > 0. Indeed, in this case we just have small factor risk. So, θ_min > θ_max can be simply dealt with by setting ϑ as in (140), as opposed to

$$\vartheta = \max(\min(\vartheta_*, \vartheta_{max}),\,\vartheta_{min}) \qquad (174)$$

which is equivalent to (140) when θ_min ≤ θ_max, but not when θ_min > θ_max. So, the source code uses (140); see the line t <- min(max(t, t.min), t.max) in the function calc.theta(). Note that if we set β̂_i ≡
1, or more generally have max(β̂_i)/min(β̂_i) ≤ √[(1 − z²_min)/(1 − z²_max)], then we are guaranteed to have θ_min ≤ θ_max at Level-1. However, θ_min > θ_max can still arise at less granular levels due to outliers in factor variances.

qrm.benchmark <- function (ret, ind, beta, mkt.fac = T,
   z.min = 0.1, z.max = 0.9)
{
   calc.load <- function(load, load1)
   {
      x <- colSums(load1)
      load <- (t(load1) %*% load) / x
      return(load)
   }

   calc.theta <- function(x, b, z.min, z.max)
   {
      if(length(x) == 1)
         return((1 - z.max^2) * x / b^2)
      s <- sqrt(diag(x))
      x <- t(x / s) / s
      b <- b / s
      t.min <- (1 - z.max^2) / min(b^2)
      t.max <- (1 - z.min^2) / max(b^2)
      x <- t(x * b) * b
      x <- sum(x) - sum(diag(x))
      b <- sum(b^2)^2 - sum(b^4)
      t <- x / b
      t <- min(max(t, t.min), t.max)
      return(t)
   }

   ind[[length(ind) + 1]] <- matrix(1, nrow(ind[[1]]), 1)
   x <- cov(t(ret))
   y <- list()
   v <- list()
   w <- b <- beta
   for(lvl in 1:length(ind))
   {
      if(lvl > 1)
      {
         flm <- calc.load(ind[[lvl]], ind[[lvl - 1]])
         b <- rep(1, nrow(flm))
      }
      else
         flm <- ind[[lvl]]

      G <- rep(0, k <- ncol(flm))
      y1 <- rep(0, nrow(flm))
      v1 <- rep(0, k)
      for(a in 1:k)
      {
         take <- flm[, a] == 1
         if(lvl == length(ind) & !mkt.fac)
            G[a] <- 0
         else
            G[a] <- calc.theta(x[take, take], b[take],
               z.min = z.min, z.max = z.max)
         y1[take] <- diag(x)[take] - b[take]^2 * G[a]
         if(lvl == 1)
            v1[a] <- sum(b[take]^2 / y1[take])
         else
            v1[a] <- sum(v[[lvl - 1]][take] /
               (1 + y1[take] * v[[lvl - 1]][take]))
      }
      y[[lvl]] <- y1
      v[[lvl]] <- v1
      x1 <- t(flm) %*% x %*% flm
      u <- sqrt(G / diag(x1))
      x <- t(x1 * u) * u
   }
   w <- w / y[[1]]
   for(lvl in 1:(length(ind) - 1))
   {
      for(a in 1:ncol(ind[[lvl]]))
      {
         take <- ind[[lvl]][, a] == 1
         w[take] <- w[take] / (1 + y[[lvl + 1]][a] * v[[lvl]][a])
      }
   }
   w <- w / sum(w * beta)
   return(w)
}

DISCLAIMERS
Wherever the context so requires, the masculine gender includes the feminine and/or neuter, and the singular form includes the plural and vice versa. The author of this paper ("Author") and his affiliates including without limitation Quantigic® Solutions LLC ("Author's Affiliates" or "his Affiliates") make no implied or express warranties or any other representations whatsoever, including without limitation implied warranties of merchantability and fitness for a particular purpose, in connection with or with regard to the content of this paper including without limitation any code or algorithms contained herein ("Content").

The reader may use the Content solely at his/her/its own risk and the reader shall have no claims whatsoever against the Author or his Affiliates and the Author and his Affiliates shall have no liability whatsoever to the reader or any third party whatsoever for any loss, expense, opportunity cost, damages or any other adverse effects whatsoever relating to or arising from the use of the Content by the reader including without any limitation whatsoever: any direct, indirect, incidental, special, consequential or any other damages incurred by the reader, however caused and under any theory of liability; any loss of profit (whether incurred directly or indirectly), any loss of goodwill or reputation, any loss of data suffered, cost of procurement of substitute goods or services, or any other tangible or intangible loss; any reliance placed by the reader on the completeness, accuracy or existence of the Content or any other effect of using the Content; and any and all other adversities or negative effects the reader might encounter in using the Content irrespective of whether the Author or his Affiliates is or are or should have been aware of such adversities or negative effects.

The R code included in Appendix A hereof is part of the copyrighted R code of Quantigic® Solutions LLC and is provided herein with the express permission of Quantigic® Solutions LLC. The copyright owner retains all rights, title and interest in and to its copyrighted source code included in Appendix A hereof and any and all copyrights therefor.
References
Avellaneda, M. and Lee, J.H. (2010) Statistical arbitrage in the U.S. equity market. Quantitative Finance.

The Oxford Handbook of Random Matrix Theory. Oxford, UK: Oxford University Press.

Connor, G. and Korajczyk, R.A. (1993) A Test for the Number of Factors in an Approximate Factor Model. Journal of Finance.

Journal of Financial and Quantitative Analysis.

Sitzungsberichte der Königlich Preussischen Akademie der Wissenschaften zu Berlin, pp. 456-477.

Geweke, J. and Zhou, G. (1996) Measuring the Pricing Error of the Arbitrage Pricing Theory. Review of Financial Studies.

Active Portfolio Management. New York, NY: McGraw-Hill.

Kakushadze, Z. (2015a) Mean-Reversion and Optimization. Journal of Asset Management. Available online: https://ssrn.com/abstract=2478345

Kakushadze, Z. (2015b) Russian-Doll Risk Models. Journal of Asset Management. Available online: https://ssrn.com/abstract=2538123

Kakushadze, Z. (2015c) Heterotic Risk Models. Wilmott Magazine. Available online: https://ssrn.com/abstract=2600798

Kakushadze, Z. and Yu, W. (2016a) Multifactor Risk Models and Heterotic CAPM. Journal of Investment Strategies. Available online: https://ssrn.com/abstract=2722093

Kakushadze, Z. and Yu, W. (2016b) Statistical Industry Classification. Journal of Risk & Control. Available online: https://ssrn.com/abstract=2802753

Kakushadze, Z. and Yu, W. (2017) Statistical Risk Models. Journal of Investment Strategies. Available online: https://ssrn.com/abstract=2732453

Kakushadze, Z. and Yu, W. (2018) Notes on Fano Ratio and Portfolio Optimization. Journal of Risk & Control. Available online: https://ssrn.com/abstract=3050140

Kiefer, J. (1953) Sequential minimax search for a maximum. Proceedings of the American Mathematical Society.

Journal of Portfolio Management.

Journal of Finance.

Mathematische Annalen.

Proceedings – EUSIPCO 2007, 15th European Signal Processing Conference. Poznań, Poland (September 3-7), pp. 606-610.

Sharpe, W.F. (1963) A simplified model of portfolio analysis. Management Science.

Journal of Business.

Journal of Portfolio Management.