Fat Tailed Factors

Jan Rosenzweig ∗

Abstract

Standard, PCA-based factor analysis suffers from a number of well-known problems due to the random nature of pairwise correlations of asset returns. We analyse an alternative based on ICA, where factors are identified based on their non-Gaussianity instead of their variance.

Generalization of portfolio construction to the ICA framework leads to two semi-optimal portfolio construction methods: a fat-tailed portfolio, which maximises return per unit of non-Gaussianity, and the hybrid portfolio, which asymptotically reduces variance and non-Gaussianity in parallel.

For fat-tailed portfolios, the portfolio weights scale like performance to the power of 1/3, as opposed to the linear scaling of Kelly portfolios; such portfolio construction significantly reduces portfolio concentration and the winner-takes-all problem inherent in Kelly portfolios.

For hybrid portfolios, the variance is diversified at the same rate as in Kelly PCA-based portfolios, but excess kurtosis is diversified much faster than in Kelly, at the rate of $n^{-2}$ compared to Kelly portfolios' $n^{-1}$ for increasing number of components $n$.

Key words: optimal portfolios, ICA, PCA, fat-tailed risk

Key messages:
• tail-risk-based portfolios can address a number of known problems of covariance-based portfolios
• when portfolios are optimized for tail risk, weights of components scale sub-linearly with their performance, reducing portfolio concentration and the winner-takes-all problem
• portfolios optimized for tail risk manage variance as efficiently as covariance-based portfolios.

Word count: 2,969
Number of figures: 4
Number of tables: 0

∗ Pine Tree Funds, 107 Cheapside, London EC2V 6DN, [email protected]; King's College London, Strand, London WC2R 2LS, [email protected]

arXiv [q-fin.PM], Dec

Introduction
Analysis of portfolio returns is usually performed with reference to the relationship between the portfolio's returns vector and its covariance matrix, a practice dating back to Markowitz's Modern Portfolio Theory in the 1950s [13, 14].

This, in turn, is significantly aided by the use of Principal Component Analysis (PCA) [8]. Fast algorithms for extraction of principal components are routinely used to partially invert covariance matrices and construct approximate optimal portfolios.

PCA itself has an equally long history in finance, dating back to the work of Tobin [21], Lintner [9] and Sharpe [19] in the 1950s and 1960s in the context of stock returns. It is also routinely used to simplify the dynamics of multi-asset complexes such as yield curves [10] and volatility surfaces [2].

Covariance-based methods, however, suffer from a number of well-known problems. The main problem is that correlations of asset returns are not particularly robust, and they may even arise randomly [1]. This is in part due to the relatively small number of data points used to calculate correlations, leading to wide confidence intervals with the possibility of significant errors. There have been a number of approaches aimed at alleviating that problem (see [1, 20] and references therein).

We take a somewhat different approach, based on Independent Component Analysis (ICA) [23]. In signal processing, ICA is a computational method for separating a multivariate signal into additive subcomponents. This is done by assuming that the subcomponents are non-Gaussian signals and that they are statistically independent from each other. ICA is a special case of blind source separation. A common example application is the "cocktail party problem" of listening in on one person's speech in a noisy room [7].

ICA is a robust method widely used in engineering signal processing applications ranging from acoustic signal processing [5], visual processing [15] and telecommunications [17], to electroencephalography [22] and many others.

Practically, ICA extracts factors based on their non-Gaussianity, rather than on their variance as PCA does. Traditionally, the chosen measure of non-Gaussianity was excess kurtosis. Other measures of non-Gaussianity have also been popular, such as various empirical approximations to negentropy [6].

ICA has been sporadically used in finance, starting with the pioneering work of Back [3] in the 1990s, and continuing to this date [16, 18, 12, 11, 4].

This paper is set out as follows. Section 2 outlines the main differences between PCA and ICA, and sets out the notation. Section 3 is dedicated to Fat-Tailed Portfolios, a generalization of Kelly portfolios to the non-Gaussianity framework. Section 4 analyses Hybrid Portfolios, a class of kurtosis- and variance-minimizing portfolios. Section 5 looks at a practical analysis of S&P500 constituents over a 12 year period, and Section 6 is the Discussion.
Let $S^{(1)}_t, \ldots, S^{(N)}_t$ denote prices of $N$ assets at time $t$, forming a vector of asset prices $\mathbf{S}_t$. Let $\mathbf{w}^{(i)}$ be portfolio weight $N$-vectors, and $\Pi^{(i)}$ be the resulting portfolios,

$$\Pi^{(i)}_t = \mathbf{w}^{(i)} \cdot \mathbf{S}_t \qquad (1)$$

with portfolio variances $(\sigma^{(i)})^2$ and kurtoses $\kappa^{(i)}$.

Both Principal Component Decomposition (PCA) and Independent Component Decomposition (ICA) are decompositions of the form (1), and they can both be obtained iteratively.

For the $i$th component of the PCA, the unit weight vector $\mathbf{w}^{(i)}$ is selected so as to maximise the variance $(\sigma^{(i)})^2$, the residues are projected to the hyperplane orthogonal to the resulting component $\Pi^{(i)}$, and the iteration then proceeds to the component $i+1$ for the residues.

For the $i$th component of the ICA, the process is analogous, except that the weight vector $\mathbf{w}^{(i)}$ is selected so as to maximise the kurtosis $\kappa^{(i)}$ instead of the variance $(\sigma^{(i)})^2$. Also, by convention, the weight vectors are normalised differently in PCA and ICA. While PCA weight vectors are normalised to the same portfolio weight (typically unity), ICA weights are normalised to the same variance (typically the variance of the first component, whose weight vector is normalised to unity).

In further text, we denote the $i$th principal component as $PC^{(i)}$, and the $i$th independent component as $IC^{(i)}$. The means, volatilities and kurtoses of the principal and independent components are denoted $\mu^{(i)}_{PCA}$, $\mu^{(i)}_{ICA}$, $\sigma^{(i)}_{PCA}$, $\sigma^{(i)}_{ICA}$, $\kappa^{(i)}_{PCA}$ and $\kappa^{(i)}_{ICA}$, respectively.
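The difference between the two selection rules can be illustrated on a toy example. The following is a minimal sketch, not the procedure used in this paper: it assumes two hypothetical assets (one Gaussian with high variance, one Laplace-distributed with low variance), replaces fast extraction algorithms with a brute-force grid search over unit weight vectors, and covers only the first component of each decomposition. All function names and data are illustrative assumptions.

```python
import math
import random


def variance_and_excess_kurtosis(xs):
    """Sample variance and excess kurtosis of a list of returns."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m2, m4 / m2 ** 2 - 3.0


def project(returns, theta):
    """Portfolio returns w . r for the unit weight vector w = (cos theta, sin theta)."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * r1 + s * r2 for r1, r2 in returns]


def best_angle(returns, objective, steps=90):
    """Grid search over [0, pi) for the angle whose projection maximises objective."""
    angles = [math.pi * k / steps for k in range(steps)]
    return max(angles, key=lambda th: objective(project(returns, th)))


rng = random.Random(0)
# Hypothetical data: asset 1 is Gaussian (variance 9, no excess kurtosis),
# asset 2 is Laplace-distributed (variance 2, excess kurtosis 3).
returns = [
    (rng.gauss(0.0, 3.0), rng.expovariate(1.0) * rng.choice((-1.0, 1.0)))
    for _ in range(5000)
]

# PCA-style step: pick the unit weight vector with maximal variance.
theta_pca = best_angle(returns, lambda xs: variance_and_excess_kurtosis(xs)[0])
# ICA-style step: pick the unit weight vector with maximal excess kurtosis.
theta_ica = best_angle(returns, lambda xs: variance_and_excess_kurtosis(xs)[1])

for name, theta in (("PCA", theta_pca), ("ICA", theta_ica)):
    var, kurt = variance_and_excess_kurtosis(project(returns, theta))
    print(f"{name}: angle={math.degrees(theta):6.1f} deg  variance={var:.2f}  excess kurtosis={kurt:.2f}")
```

On this synthetic sample the variance-maximising direction should align with the high-variance Gaussian asset, and the kurtosis-maximising direction with the fat-tailed asset, which is the qualitative distinction between the first PC and the first IC.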
Fat-Tailed Portfolios

We denote by

$$\mathbf{m} = E(\mathbf{S}), \qquad \mathbf{V} = E(\mathbf{S} \otimes \mathbf{S}) - \mathbf{m} \otimes \mathbf{m},$$

$$\mathbf{K} = E\big((\mathbf{S}-\mathbf{m}) \otimes (\mathbf{S}-\mathbf{m}) \otimes (\mathbf{S}-\mathbf{m}) \otimes (\mathbf{S}-\mathbf{m})\big) - 3\,\mathbf{V} \otimes \mathbf{V}$$

the return, covariance and excess cokurtosis of the joint distribution of the asset process $\mathbf{S}$.

The Kelly criterion for portfolio construction is easily obtained by looking for the weights that maximize the portfolio return, while penalising for portfolio variance:

$$\mathbf{w} = \arg\max_{\mathbf{w}} \left( \mathbf{w} \cdot \mathbf{m} - \lambda\, \mathbf{w} \cdot \mathbf{V} \cdot \mathbf{w} \right) \qquad (2)$$

for some risk aversion parameter $\lambda$.

Differentiating the right hand side of (2) with respect to $\mathbf{w}$ and setting the result to zero yields the familiar Kelly criterion

$$\mathbf{w} \propto \mathbf{V}^{-1} \cdot \mathbf{m} \qquad (3)$$

where the constant of proportionality depends on the investor's risk aversion. Equation (3) is easily solved using PCA; noting that each eigenvector of $\mathbf{V}$ corresponds to the weights vector of a principal component, and that the corresponding eigenvalue corresponds to its variance, we can write the optimal portfolio weights in terms of PCs as

$$w^{(i)}_{PCA} \propto \frac{\mu^{(i)}_{PCA}}{\left(\sigma^{(i)}_{PCA}\right)^2} \qquad (4)$$

The optimal portfolio is given as

$$\Pi \propto \sum_i w^{(i)}_{PCA}\, PC^{(i)} = \sum_i \frac{\mu^{(i)}_{PCA}}{\left(\sigma^{(i)}_{PCA}\right)^2}\, PC^{(i)} \qquad (5)$$

where the sum is typically truncated to a small number of principal components.

The optimization (2) is easily generalised if we penalise for kurtosis instead of variance; the equivalent of (2) is

$$\mathbf{w} = \arg\max_{\mathbf{w}} \left( \mathbf{w} \cdot \mathbf{m} - \nu\, \mathbf{w} \cdot \mathbf{w} \cdot \mathbf{K} \cdot \mathbf{w} \cdot \mathbf{w} \right), \qquad (6)$$

where $\nu$ is a different risk aversion parameter, the aversion to kurtosis risk. This leads to portfolio weights

$$w^{(i)}_{ICA} \propto \left( \frac{\mu^{(i)}_{ICA}}{\kappa^{(i)}_{ICA}} \right)^{1/3} \qquad (7)$$

and the optimal portfolio is

$$\Pi \propto \sum_i w^{(i)}_{ICA}\, IC^{(i)} = \sum_i \left( \frac{\mu^{(i)}_{ICA}}{\kappa^{(i)}_{ICA}} \right)^{1/3} IC^{(i)}, \qquad (8)$$

which can again practically be truncated to a small number of independent components. We call the portfolio (8) the Fat-Tailed Portfolio.

The Fat-tailed portfolio (8) has several interesting properties; in particular, the portfolio weight no longer scales linearly with the performance of a component, as it does in a Kelly portfolio, but as its cube root. Practically, this leads to more diversified portfolios with a less pronounced winner-takes-all profile than that dictated by Kelly.

In particular, let us imagine two components with the same variances and kurtoses, where the first has twice the return of the second; Kelly dictates that the first component would get twice the leverage of the second, while the fat-tailed portfolio would only allocate it $2^{1/3} \approx 1.26$ times (about 126% of) the leverage of the second component.

It is, however, important to note that the fat-tailed portfolio (8) only maximizes the return per unit of kurtosis; it does not specifically target low variance, and its variance is not guaranteed to be low. Even the independence of the ICs does not necessarily lead to variance decay through diversification. Diversification would require summation of components of equal variance; the ICs themselves have equal variances by construction, but in the portfolio they are weighted by the cube root of the ratio of their mean to kurtosis. Diversification is only likely if this ratio is relatively stable across the ICs. On the other hand, the cube root scaling serves to flatten the portfolio weights, so the requirement that they are stable is easier to achieve in practice, as will be shown in Sections 4 and 5.

In the absence of diversification, the fat-tailed portfolio only moves the risk from the fourth moment to the second; it pushes the risks from the tails of the distribution towards its centre, but it does not eliminate them from the centre.
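The two weighting rules (4) and (7) can be compared numerically. The sketch below is illustrative only: the component means, variances and kurtoses are hypothetical inputs chosen to mirror the two-component example above, and the helper names are assumptions rather than part of any library.

```python
import math


def kelly_weights(mu, var):
    """Unnormalised Kelly weights, w_i proportional to mu_i / sigma_i^2 as in (4)."""
    return [m / v for m, v in zip(mu, var)]


def fat_tailed_weights(mu, kappa):
    """Unnormalised fat-tailed weights, w_i proportional to (mu_i / kappa_i)^(1/3) as in (7)."""
    def signed_cube_root(x):
        # Real cube root that also handles components with negative mean return.
        return math.copysign(abs(x) ** (1.0 / 3.0), x)
    return [signed_cube_root(m / k) for m, k in zip(mu, kappa)]


def shares(weights):
    """Rescale weights so that their absolute values sum to one."""
    total = sum(abs(w) for w in weights)
    return [w / total for w in weights]


# Hypothetical inputs: equal variances and kurtoses, first component has twice the return.
mu = [0.10, 0.05]
var = [0.04, 0.04]
kappa = [3.0, 3.0]

kelly = kelly_weights(mu, var)
fat = fat_tailed_weights(mu, kappa)

print("Kelly leverage ratio:     ", kelly[0] / kelly[1])
print("Fat-tailed leverage ratio:", fat[0] / fat[1])
print("Kelly shares:     ", [round(s, 3) for s in shares(kelly)])
print("Fat-tailed shares:", [round(s, 3) for s in shares(fat)])
```

With these inputs the Kelly leverage ratio is 2 and the fat-tailed ratio is $2^{1/3}$, so the largest share of the fat-tailed portfolio is smaller than the largest share of the Kelly portfolio.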
Hybrid Portfolios

We now turn to the construction of portfolios which manage both variance and kurtosis.

In analogy to (2) and (6), one could naively look for portfolio weights by solving the combined optimization problem

$$\mathbf{w} = \arg\max_{\mathbf{w}} \left( \mathbf{w} \cdot \mathbf{m} - \lambda\, \mathbf{w} \cdot \mathbf{V} \cdot \mathbf{w} - \nu\, \mathbf{w} \cdot \mathbf{w} \cdot \mathbf{K} \cdot \mathbf{w} \cdot \mathbf{w} \right), \qquad (9)$$

where we now have two risk aversion parameters: aversion to variance, $\lambda$, and aversion to kurtosis, $\nu$.

This is, however, not ideal. The Kelly portfolio (5) and the Fat-tailed portfolio (8) have a certain measure of universality; the shape of the portfolio is fixed, and the investor risk aversion only affects the leverage. With (9), this would no longer be the case. The shape of the portfolio would now depend on the ratio of variance aversion to kurtosis aversion, and it would be different from investor to investor.

We are, on the other hand, interested in finding universal portfolio shapes that control both variance and kurtosis, while being independent of the investor risk preferences. In other words, the investor risk preferences should only affect the leverage, but not the shape of the portfolio.

The solution comes from the Central Limit Theorem, which loosely states that the sum of independent random variables, de-meaned and normalized to unit volatility, tends to a normal distribution. In particular, its excess kurtosis vanishes due to the Central Limit Theorem, and its variance decays through diversification.

We therefore look at portfolios of ICs that satisfy the conditions of the Central Limit Theorem. For this, we have the following result:

Theorem (Central Limit Theorem for ICA). Consider two portfolios,

$$\Pi_{PC} = \frac{1}{n} \sum_{i=1}^{n} \frac{PC^{(i)}}{\sigma^{(i)}_{PCA}}, \qquad \Pi_{IC} = \frac{1}{n} \sum_{i=1}^{n} \frac{IC^{(i)}}{\sigma^{(i)}_{ICA}}.$$

Then, as $n \to \infty$, the variance of the returns of both portfolios is $O\left(\frac{1}{n}\right)$; the excess kurtosis of the returns of $\Pi_{PC}$ is $o\left(\frac{1}{n}\right)$, and the excess kurtosis of the returns of $\Pi_{IC}$ is $o\left(\frac{1}{n^2}\right)$.

The proof of the CLT for ICA is given in the Appendix.
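The construction of the equal-weight portfolios in the theorem can be sketched on synthetic data. This is a minimal illustration under stated assumptions, not a reproduction of the paper's analysis: the components are hypothetical independent Laplace draws standing in for component returns, the helper names are invented for this sketch, and the code does not attempt to verify the asymptotic orders in the theorem.

```python
import math
import random


def variance_and_excess_kurtosis(xs):
    """Sample variance and excess kurtosis of a list of returns."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m2, m4 / m2 ** 2 - 3.0


def standardise(xs):
    """De-mean a return series and scale it to unit sample volatility."""
    n = len(xs)
    mean = sum(xs) / n
    vol = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return [(x - mean) / vol for x in xs]


def equal_weight_portfolio(components, n):
    """Returns of Pi = (1/n) * sum of the first n standardised components."""
    return [sum(column) / n for column in zip(*components[:n])]


rng = random.Random(1)
T, N = 10000, 10
# Synthetic stand-ins for component returns: independent fat-tailed (Laplace) draws.
components = [
    standardise([rng.expovariate(1.0) * rng.choice((-1.0, 1.0)) for _ in range(T)])
    for _ in range(N)
]

results = {}
for n in range(1, N + 1):
    var, kurt = variance_and_excess_kurtosis(equal_weight_portfolio(components, n))
    results[n] = (var, kurt)
    print(f"n={n:2d}  variance={var:.4f}  n*variance={n * var:.3f}  excess kurtosis={kurt:.3f}")
```

Because the synthetic components are only approximately uncorrelated in-sample, `n * variance` is close to, but not exactly, one; components that are exactly orthogonal in-sample would give exactly $1/n$.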
Note that there is formally no need to normalise the components of $\Pi_{IC}$ by $\sigma^{(i)}_{ICA}$ in constructing the IC portfolio, as we do in the statement of the CLT, since volatilities of ICs are already all equal by construction. We are only normalising them so that we would be able to compare them directly to PCA portfolios in the numerical example in the next Section.

This form of the CLT tells us that portfolios of ICs can be as good as portfolios of PCs in suppressing the variance, but they have the added bonus of suppressing the non-Gaussianity of the returns much faster.

S&P500 stocks
We looked at S&P500 stocks over a period of 12 years, from 1st January 2007 until 31st December 2018. To counteract the effects of stocks drifting in and out of the index over such a long time frame, we have divided the time frame into four buckets, each lasting three calendar years: from 1st January 2007 until 31st December 2009, from 1st January 2010 until 31st December 2012, from 1st January 2013 until 31st December 2015, and from 1st January 2016 until 31st December 2018.

The basket for each bucket was selected as consisting of the index constituents on the last business day prior to the start of the bucket, and these stocks were followed until the end of the bucket. Any stock that was de-listed before the end of a bucket in which it appeared was deemed to have returned 0% from its last trading day until the end of the bucket. There were no adjustments for stocks entering or leaving the index over the duration of any of the buckets.

We have performed PCA and ICA on each of the buckets, extracting 10 principal components and 10 independent components for each bucket. Decompositions used the Python package scikit-learn 0.23.2, and in particular the classes sklearn.decomposition.PCA for PCA and sklearn.decomposition.FastICA for ICA.

Resulting PCs and ICs are plotted in Figure 1. Qualitatively, the PCs appear to be significantly more dispersed than the ICs; PC1 is immediately visually identifiable as an outlier on each PCA graph, while IC1 is not an obvious outlier in any of the buckets.

Additionally, PCs vary on the order of ±[…] in all buckets, and 10[…] in the 2010-2012 bucket, while the ICs vary on the order of 1 in every bucket. This is not an issue of scaling; the first ten PCs have higher variance than the first ten ICs, but the first ten ICs have higher kurtosis than the first ten PCs.
This counters any argument that the PCs and ICs are differently scaled; it is impossible to scale them so that their variances and kurtoses are of the same order of magnitude simultaneously.

The correlation matrices between the PCs and the ICs in each bucket are shown in Figure 2. As expected, the PCs are all orthogonal to each other, as are the ICs. But the PCs are generally not orthogonal to the ICs, and the correlation matrix has a block structure, with the PC-IC block generally non-zero.

We have constructed the portfolios $\Pi_{PC}$ and $\Pi_{IC}$ from the Central Limit Theorem using an increasing number of components $n$, starting from a single component and ending with all $n = 10$ components. The variances and kurtoses of the resulting portfolios are shown in Figure 3. Each component was scaled to unit variance, so the variance of each portfolio is exactly $1/n$, to machine precision. The kurtosis illustrates the difference between the portfolios; the kurtosis of $\Pi_{IC}$ decays faster than the variance with increasing $n$, while the kurtosis of $\Pi_{PC}$ generally does not. This is consistent with the respective $n^{-1}$ and $n^{-2}$ scaling for the kurtoses of $\Pi_{PC}$ and $\Pi_{IC}$ as predicted by the Central Limit Theorem.

Finally, we have constructed the optimal Kelly portfolio from the ten PCs, and the optimal Fat-tailed portfolio from the ten ICs. The resulting portfolios are shown in Figure 4. Both Kelly and Fat-tailed portfolios capture the same factors and the same performance over each bucket. The correlation between the Kelly portfolio and the Fat-tailed portfolio is greater than 90% for each bucket, and it is as high as 97.[…] The Kelly portfolio has lower volatility and higher Sharpe ratio, while the Fat-tailed portfolio always has lower kurtosis and higher Fat-tailed ratio $(\mu/\kappa)^{1/3}$. This is based on portfolio construction with perfect hindsight. In reality, the differences between the Sharpe ratios of the Kelly and Fat-tailed portfolios are minor, always below 10%.
In real-world portfolio construction without perfect hindsight, these differences would be unobservable.

The differences between their kurtoses and Fat-tailed ratios vary more widely, ranging from a 10% difference in 2013-2015 (0.49 vs 0.54) to a 60% difference in 2010-2012 (0.766 vs 1.226). Some of these differences would arguably survive into practical portfolio construction, where they would be felt.

In short, the differences between their performances per unit variance are minimal and insignificant in practice. On the other hand, using the same risk aversion number for aversion to volatility and aversion to kurtosis, the Kelly portfolios consistently underperform their respective Fat-tailed portfolios. The only exception to this is the years 2008 and 2009. This is qualitatively attributable to the winner-takes-all linear portfolio weights coupled with the wider gap between the PCs.

Discussion

Non-Gaussianity-based factors such as those described in this paper are an interesting alternative to standard PCA-based factors. In engineering applications ranging from acoustic signal processing [5], visual processing [15] and telecommunications [17] to electroencephalography [22] and many others, ICA is generally considered to be a considerably more powerful tool than PCA, and it is a de facto standard in a wide range of applications.

This has historically not been the case in finance, however, where PCA-based methods have dominated since Markowitz's modern portfolio theory [13, 14], and only a handful of studies have even seriously looked at ICA [3, 16, 18, 12, 11, 4].

While PCA and ICA are conceptually similar, their stated purpose is different. The purpose of PCA is to isolate the strongest signals (those with highest variance), while the purpose of ICA is to isolate the noisiest signals (those with highest deviation from Gaussianity).

This leads to a different distribution of factors, as illustrated in our S&P500 example.
The first IC is nowhere near as dominant as the first PC, which directly leads to the slower, 1/[…] powerful alternative to the existing array of methods, and they address a number of known concerns with covariance-based methods.

Appendix: Proof of the Central Limit Theorem for ICA
The CLT for $\Pi_{PC}$ is the standard CLT for any sum of de-meaned, normalized orthogonal random variables. We denote

$$Y_i = \frac{dPC^{(i)} - \mu^{(i)}_{PCA}}{\sigma^{(i)}_{PCA}} \qquad (10)$$

where $dPC^{(i)}$ denotes the returns process of $PC^{(i)}$, so that all $Y_i$ have zero mean and unit variance, and their normalised sums as

$$Z_n = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} Y_i \qquad (11)$$

Given that the $Y_i$ are all orthogonal, the characteristic function of $Z_n$ is

$$\varphi_{Z_n}(t) = \varphi_{\frac{1}{\sqrt{n}} \sum_{i=1}^{n} Y_i}(t) \qquad (12)$$

$$= \varphi_{Y_1}\left(\frac{t}{\sqrt{n}}\right) \varphi_{Y_2}\left(\frac{t}{\sqrt{n}}\right) \cdots \varphi_{Y_n}\left(\frac{t}{\sqrt{n}}\right) + o\left(\frac{t^2}{n}\right) \qquad (13)$$

$$= \left[ \varphi_{Y}\left(\frac{t}{\sqrt{n}}\right) \right]^n + o\left(\frac{t^2}{n}\right) \qquad (14)$$

as $t^2/n \to 0$. The term $o(t^2/n)$ arises due to the fact that we cannot guarantee that joint moments beyond order 2 are zero. The construction of PCs only guarantees their orthogonality; it guarantees neither their Gaussianity nor their independence at orders higher than 2.

The second order Taylor expansion of $\varphi_Y$ around zero gives

$$\varphi_{Y}\left(\frac{t}{\sqrt{n}}\right) = 1 - \frac{t^2}{2n} + o\left(\frac{t^2}{n}\right), \qquad (15)$$

so

$$\varphi_{Z_n}(t) = \left( 1 - \frac{t^2}{2n} + o\left(\frac{t^2}{n}\right) \right)^n + o\left(\frac{t^2}{n}\right). \qquad (16)$$

Expanding the known terms in (16) to fourth order, we get

$$\varphi_{Z_n}(t) = 1 - n \frac{t^2}{2n} + \frac{n(n-1)}{2} \frac{t^4}{4n^2} + o\left(\frac{t^2}{n}\right) \qquad (17)$$

where the term $o(t^2/n)$ may contain additional contributions at the order $t^4$, so the best we can say about the error at $t^4$ is that it is $o(1/n)$.

In particular, the second and fourth derivatives of $\varphi_{Z_n}$ at zero are

$$\varphi''_{Z_n}(0) = -1 \qquad (18)$$

$$\varphi^{iv}_{Z_n}(0) = \frac{3n(n-1)}{n^2} + o\left(\frac{1}{n}\right), \qquad (19)$$

hence the excess kurtosis of $Z_n$ is $o(1/n)$. Given that the returns of $\Pi_{PC}$ differ from $Z_n/\sqrt{n}$ by a deterministic mean process, the excess kurtosis of the returns of $\Pi_{PC}$ is also $o(1/n)$.

Moving to $\Pi_{IC}$, the difference is that the components are now, by construction, independent to the fourth order, instead of to second order for $\Pi_{PC}$. Therefore, denoting

$$y_i = \frac{dIC^{(i)} - \mu^{(i)}_{ICA}}{\sigma^{(i)}_{ICA}}, \qquad (20)$$

where $dIC^{(i)}$ denotes the returns process of $IC^{(i)}$, and

$$z_n = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} y_i, \qquad (21)$$

equations (15) and (16) read

$$\varphi_{y}\left(\frac{t}{\sqrt{n}}\right) = 1 - \frac{t^2}{2n} + o\left(\frac{t^4}{n^2}\right), \qquad (22)$$

$$\varphi_{z_n}(t) = \left( 1 - \frac{t^2}{2n} + o\left(\frac{t^4}{n^2}\right) \right)^n + o\left(\frac{t^4}{n^2}\right), \qquad (23)$$

with the error at $t^4$ now of the order $o(1/n^2)$.

The second and fourth derivatives become

$$\varphi''_{z_n}(0) = -1 \qquad (24)$$

$$\varphi^{iv}_{z_n}(0) = \frac{3n(n-1)}{n^2} + o\left(\frac{1}{n^2}\right), \qquad (25)$$

and the excess kurtoses of $z_n$ and the returns of $\Pi_{IC}$ are $o(1/n^2)$.

Acknowledgments
The author reports no conflicts of interest. The author alone is responsible for the content and writing of the paper.
References

[1] Avellaneda, M. (2019) Hierarchical PCA and Applications to Portfolio Management, https://ssrn.com/abstract=3467712 or http://dx.doi.org/10.2139/ssrn.3467712

[2] Avellaneda, M., Healy, B., Papanicolaou, A. & Papanicolaou, G. (2020) PCA for Implied Volatility Surfaces, https://arxiv.org/abs/2002.00085

[3] Back, A. & Weigend, A. (1997) A First Application of Independent Component Analysis to Extracting Structure from Stock Returns, International Journal of Neural Systems, Vol. 8, No. 5

[4] Chowdhury, U.N., Chakravarty, S.K. & Hossain, M.T. (2018) Short-Term Financial Time Series Forecasting Integrating Principal Component Analysis and Independent Component Analysis with Support Vector Regression, Journal of Computer and Communications, 6, 51-67. https://doi.org/10.4236/jcc.2018.63004

[5] Haykin, S. & Kan, K. (2007) Coherent ICA: Implications for Auditory Signal Processing, 1-5. doi:10.1109/ASPAA.2007.4393059.

[6] Hyvärinen, A. (1999) Fast and Robust Fixed-Point Algorithms for Independent Component Analysis, IEEE Trans. on Neural Networks, 10(3):626-634

[7] Hyvärinen, A. (2013) Independent component analysis: recent advances, Philosophical Transactions: Mathematical, Physical and Engineering Sciences. 371

[8] Jolliffe, I.T. (2002) Principal Component Analysis, 2nd edition, Springer, New York.

[9] Lintner, J. (1965) The Valuation of Risk Assets and the Selection of Risky Investments in Stock Portfolios and Capital Budgets, The Review of Economics and Statistics. 47 (1): 13–39. doi:10.2307/1924119. JSTOR 1924119.

[10] Litterman, R. & Scheinkman, J. (1991) Common factors affecting bond returns, The Journal of Fixed Income.

[11] Liu, H. & Wang, J. (2011) Integrating Independent Component Analysis and Principal Component Analysis with Neural Network to Predict Chinese Stock Market, Mathematical Problems in Engineering, vol. 2011, Article ID 382659. https://doi.org/10.1155/2011/382659

[12] Lu, C.J., Lee, T. & Chiu, C.C. (2009) Financial time series forecasting using independent component analysis and support vector regression, Decision Support Systems, Volume 47, Issue 2

[13] Markowitz, H.M. (1952) Portfolio Selection, The Journal of Finance. 7 (1): 77–91. doi:10.2307/2975974. JSTOR 2975974.

[14] Markowitz, H.M. (1956) The Optimization of a Quadratic Function Subject to Linear Constraints, Naval Research Logistics Quarterly. 3 (1–2): 111–133. doi:10.1002/nav.3800030110.

[15] Martín-Clemente, R. & Hornillo-Mellado, S. (2006) Image processing using ICA: a new perspective, IEEE MELECON 2006, May 16-19, Benalmádena (Málaga), Spain.

[16] Oja, E., Kiviluoto, K. & Malaroiu, S. (2000) Independent component analysis for financial time series, Proceedings of the IEEE 2000 Adaptive Systems for Signal Processing, Communications, and Control Symposium (Cat. No.00EX373), Lake Louise, Alberta, Canada, pp. 111-116, doi:10.1109/ASSPCC.2000.882456.

[17] Parmar, S.D. & Unhelkar, B. (2009) Separation performance of ICA algorithms in communication systems, International Multimedia, Signal Processing and Communication Technologies, Aligarh, pp. 142-145, doi:10.1109/MSPCT.2009.5164195.

[18] Pike, E.R. & Klepfish, E.G. (2003) The Analysis of Financial Time Series Data by Independent Component Analysis, In: Takayasu H. (eds) The Application of Econophysics. Springer, Tokyo. https://doi.org/10.1007/978-4-431-53947-6_24

[19] Sharpe, W.F. (1964) Capital asset prices: A theory of market equilibrium under conditions of risk, Journal of Finance. 19 (3): 425–442. doi:10.2307/2977928. hdl:10.1111/j.1540-6261.1964.tb02865.x. JSTOR 2977928.

[20] Shkolnik, A.D., Goldberg, L. & Bohn, J.R. (2016) Identifying broad and narrow financial risk factors with convex optimization, https://ssrn.com/abstract=2800237 or http://dx.doi.org/10.2139/ssrn.2800237

[21] Tobin, J. (1958) Liquidity preference as behavior towards risk, The Review of Economic Studies. 25 (2): 65–86. doi:10.2307/2296205. JSTOR 2296205.

[22] Ungureanu, M., Bigan, C., Strungaru, R. & Lazarescu, V. (2004) Independent Component Analysis Applied in Biomedical Signal Processing, Meas Sci Rev, Volume 4, Section 2.

[23] Hyvärinen, A., Karhunen, J. & Oja, E. (2001) Independent component analysis, John Wiley & Sons, DOI:10.1002/0471221317

… $n$, PC portfolios vs IC portfolios.

Figure 4: Kelly portfolio vs Fat-tailed portfolio for each bucket; Fat-tailed Ratio refers to the quantity $(\mu/\kappa)^{1/3}$.