A Multi-factor Adaptive Statistical Arbitrage Model
Wenbin Zhang, Zhen Dai, Bindu Pan, and Milan Djabirov. Tepper School of Business, Carnegie Mellon University, 55 Broad St, New York, NY 10005, USA
Abstract
This paper examines the implementation of a statistical arbitrage trading strategy based on co-integration relationships, where we discover candidate portfolios using multiple factors rather than price data alone. The portfolio selection methodologies include K-means clustering, graphical lasso, and a combination of the two. Our results show that clustering yields better candidate portfolios on average than naively applying graphical lasso over the entire equity pool, and a hybrid approach combining graphical lasso and clustering yields better results still. We also examine the effect of an adaptive approach during the trading period, re-computing potential portfolios once to account for changes in relationships over time; however, the adaptive approach did not produce better results than trading without re-learning. Our results pass the test for the presence of statistical arbitrage at a statistically significant level. Additionally, we validated our findings on a separate dataset with its own formation and trading periods.
Introduction
Papers published in the past that explore co-integration and pairs trading identify portfolios of "similar" stocks by finding those whose prices historically moved in tandem. We felt that, in the co-integration case, this process can be improved upon by seeking "similar" stocks through measures other than price alone, because the stock prices of characteristically similar firms tend to move together. The intuition is that if we can identify portfolios that are alike over multiple dimensions, then their linear combinations (over price) should be more likely to revert to being co-integrated after any temporary divergence. Injecting more information into the selection process by adding extra dimensions, in order to identify stronger relationships in future price movements, seemed worth exploring. As a companion to graphical lasso, another machine learning technique, clustering, was a natural choice. After reviewing the published literature on co-integration, pairs trading, and other statistical arbitrage methodologies, we did not find any other work attempting this concept. The three major components of developing a statistical arbitrage strategy are determining the right assets to trade, simulating trading through back-testing, and verifying the existence of statistical arbitrage. Below is an outline of our study in these elements. The first component, the selection process, accounts for the bulk of our efforts. Factor selection: we used principal component analysis (PCA) to identify a set of independent factors; we used both the raw factors themselves and the linear combinations of these factors computed from the PCA loadings. Clustering: we used K-means clustering. (Corresponding author email address: [email protected].)
Combining clustering and graphical lasso: we propose two distinct approaches, "Clustering-Glasso" and "Glasso-Clustering". For the second component, we followed a standard statistical arbitrage trading procedure. We tested for a co-integration relationship in each identified portfolio. We checked whether the portfolio generated a positive profit over the formation period; if so, we continued to trade it. We also attempted to rebalance the strategy during the trading phase to account for clusters and co-integration relationships changing over time. Finally, we used the Jarrow-Teo-Tse-Warachka (JTTW) test on the trading results and cross-validated our strategy.
Data Collection and Normalization
Our raw data was largely sourced from Bloomberg. We selected 19 different dimensions based on fundamental, statistical and momentum associated factors. This dataset covered all US stocks in the S&P 500 for the period starting from the first trading day of 2004 through the final trading day of 2011. The dimensions for our initial consideration are:
Volatility (60 day), Shares Outstanding, Sales Growth, RSI (Relative Strength Index), Price to Book Ratio, Price to Sales Ratio, Price to EBITDA Ratio, P/E Ratio, Normalized ROE, Market Cap, Free Cash Flow Growth, Cash Flow Growth, Dividend (per share), Bloomberg Estimates Analyst Rating, Total Number of Sell Recommendations, Total Number of Buy Recommendations, Price (Close), Ask, Bid
We cleaned the initial raw dataset by removing all non-trading days and missing values. There were 109 stocks with no missing values across all 19 dimensions for the entire period; our implementation is based on this universe of stocks. We note that it would have been more appropriate to choose the S&P 500 constituents as of 2004 and enhance our methodology to deal with missing fundamental data in separate formation periods; unfortunately, we did not manage to procure this data. This has the potential of introducing survivorship bias; a separate section on data selection and potential bias revisits this issue later in the paper. Next, we normalized all dimensions before applying any additional filtering. The numbers of buy and sell recommendations were merged into a single factor, (buy - sell)/(buy + sell). We also took the logarithm of market cap and the number of shares outstanding; this step is motivated by Axtell, who shows that US firm sizes follow a Zipf-law-like distribution when plotted on a log-log scale (rank vs. frequency). The factors were then normalized by subtracting the mean and dividing by the sample standard deviation. Our dataset is divided into two parts. Regular Experiment Phase: from January 2004 to December 2007, where the first two years are the formation period and the next two years are the trading period. Cross Validation Phase: from January 2008 to December 2011, split the same way.
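The normalization steps described above can be sketched as follows. This is a minimal illustration on synthetic data; the variable names and random inputs are hypothetical stand-ins for the actual Bloomberg fields.

```python
import numpy as np

# Synthetic stand-ins for the raw factor columns (109 stocks in our universe).
rng = np.random.default_rng(42)
n_stocks = 109
buys = rng.integers(1, 20, n_stocks).astype(float)
sells = rng.integers(1, 20, n_stocks).astype(float)
market_cap = rng.lognormal(mean=23.0, sigma=1.5, size=n_stocks)

# Merge buy/sell recommendation counts into a single factor.
rec_factor = (buys - sells) / (buys + sells)

# Log-transform heavy-tailed size variables (Zipf-like firm size distribution).
log_mcap = np.log(market_cap)

# Z-score each factor: subtract the mean, divide by the sample std (ddof=1).
def zscore(x):
    return (x - x.mean()) / x.std(ddof=1)

factors = np.column_stack([zscore(rec_factor), zscore(log_mcap)])
```

The same z-scoring would be applied to every remaining dimension before clustering or PCA.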
PCA Analysis
In order to select the most impactful factors, we applied PCA to the normalized data. The graph below shows the resulting analysis. From the output of the loadings, we determined that the 7 most significant components contribute 95.5% of the total variance. Given this, we used two different approaches to factor selection.
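The component-count choice can be sketched with a numpy-only example that selects the smallest number of principal components whose cumulative explained variance reaches a threshold. The data here is synthetic; the 95.5% figure above comes from the actual dataset.

```python
import numpy as np

def n_components_for(X, threshold=0.95):
    """Smallest number of principal components explaining >= threshold of variance."""
    Xc = X - X.mean(axis=0)                       # center each factor column
    _, s, _ = np.linalg.svd(Xc, full_matrices=False)
    var_ratio = s**2 / np.sum(s**2)               # explained-variance ratios
    return int(np.searchsorted(np.cumsum(var_ratio), threshold) + 1)

rng = np.random.default_rng(0)
# Synthetic stand-in for the 109-stock x 19-factor normalized matrix.
X = rng.standard_normal((109, 19))
k = n_components_for(X, 0.95)
```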
Choosing Most Significant Raw Factors
Based on the independent principal components generated by PCA, we can readily observe the dimensions that are largely responsible for the variance of our data. In this case, we did not directly use the linear combinations. The 7 most significant factors are:
P/E Ratio, Price to Sales Ratio, Cash Flow Growth, Price, Price to EBITDA Ratio, ROE, Volatility
Choosing Principal Components Generated by PCA
We also directly chose the 7 most significant principal components for our analysis. We ran clustering algorithms based on both selection approaches in the results to follow.
K-means Clustering
There are a number of commonly used clustering algorithms; we felt that, for our purpose, the most intuitive choice was K-means clustering. In order to produce a reasonable size for each cluster during the formation period, we chose K = 30, which generates cluster sizes of about 2-4 stocks on average.
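A minimal sketch of this clustering step, assuming scikit-learn and a synthetic stand-in for the normalized factor matrix:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Stand-in for 109 stocks described by the 7 selected factors.
factors = rng.standard_normal((109, 7))

# K = 30 clusters over the formation-period factor matrix.
km = KMeans(n_clusters=30, n_init=10, random_state=0).fit(factors)
clusters = [np.flatnonzero(km.labels_ == k) for k in range(30)]
avg_size = sum(len(c) for c in clusters) / len(clusters)  # 109/30, about 3.6
```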
Candidate Portfolio Generation
To keep the portfolio sizes comparable for each selection methodology, we enforced a policy of 2 - 4 stocks per portfolio. In this study, we applied two simple approaches (clustering and graphical lasso) and two hybrid approaches (Clustering-Glasso and Glasso-Clustering) to generate candidate trading portfolios.
K-means Clustering. If a cluster contains only one stock, ignore it. If a cluster contains 2, 3 or 4 stocks, take the entire cluster as a candidate portfolio. If a cluster contains 5 or more stocks, split it into sub-groups of 2 or 3 stocks and treat each sub-group as a candidate portfolio. For our initial formation period, this method generated 35 candidate trading portfolios averaging 2.89 stocks per portfolio with the 7 selected raw factors, and 37 candidate trading portfolios averaging 2.73 stocks per portfolio with the top 7 principal components.
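The splitting rule above can be sketched as follows. The exact order in which a large cluster is partitioned into sub-groups of 2 or 3 is an implementation choice; the greedy scheme here is one possibility.

```python
def portfolios_from_clusters(clusters):
    """Turn K-means clusters into candidate portfolios of 2-4 stocks.

    Singleton clusters are dropped; clusters of 2-4 stocks are kept whole;
    clusters of 5 or more are split into sub-groups of 2 or 3 stocks.
    """
    portfolios = []
    for members in clusters:
        members = list(members)
        if len(members) < 2:
            continue                    # ignore singleton clusters
        if len(members) <= 4:
            portfolios.append(members)
            continue
        while members:
            remaining = len(members)
            # Take 2 when taking 3 would strand a single leftover stock.
            take = 2 if remaining % 3 == 1 else min(3, remaining)
            portfolios.append(members[:take])
            members = members[take:]
    return portfolios
```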
Graphical Lasso (Glasso) If there is only one non-zero entry in a given row of the inverse correlation matrix, ignore. If there are 2, 3 or 4 non-zero entries in a given row of the inverse correlation matrix, take the corresponding stocks as a candidate portfolio. If there are 5 or more non-zero entries in a given row of the inverse correlation matrix, take the corresponding 4 stocks with the largest absolute values as a candidate portfolio. For our initial formation period, this method generated 55 candidate trading portfolios with an average of 3.82 stocks per portfolio.
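The row-based selection rule can be sketched as follows, assuming the sparse inverse correlation (precision) matrix has already been estimated, e.g. with scikit-learn's GraphicalLasso. Treating the diagonal as one of the non-zero entries (so a row with only its diagonal entry is an isolated stock and is ignored) is an interpretation on our part.

```python
import numpy as np

def row_portfolios(precision, tickers, tol=1e-8):
    """One candidate portfolio per row of the estimated inverse correlation matrix."""
    portfolios = []
    for row in np.asarray(precision, dtype=float):
        nz = [j for j in range(len(row)) if abs(row[j]) > tol]
        if len(nz) < 2:
            continue                    # a single non-zero entry: ignore the row
        if len(nz) > 4:
            # 5 or more entries: keep the 4 stocks with the largest absolute values
            nz = sorted(nz, key=lambda j: abs(row[j]), reverse=True)[:4]
        portfolios.append(sorted(tickers[j] for j in nz))
    return portfolios
```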
K-means Clustering - Graphical Lasso (Clustering-Glasso) Run K-means with K = 3 to create 3 large clusters. Run graphical lasso on the entire set. If there is only one non-zero entry in a given row of the inverse correlation matrix, ignore. If there are 2, 3 or 4 non-zero entries in a given row of the inverse correlation matrix, check to make sure that they belong to the same cluster. If not, ignore.
If there are 5 or more non-zero entries in a given row of the inverse correlation matrix, take the corresponding 4 stocks with the largest absolute values. For our initial formation period, this method generated 49 candidate trading portfolios averaging 3.61 stocks per portfolio with the 7 selected raw factors, and 50 candidate trading portfolios averaging 3.7 stocks per portfolio with the top 7 principal components. Running K-means clustering first generates at most 109 candidate portfolios, since we determine 0 or 1 portfolios per row of the inverse correlation matrix.
Graphical Lasso - K-means Clustering (Glasso-Clustering). Run graphical lasso on the entire set. Run K-means with K = 3 to create 3 large clusters. Filter the inverse correlation matrix based on cluster membership, i.e. make 3 separate passes through the inverse correlation matrix; when searching under one cluster, members of the other clusters have their entries in the inverse correlation matrix set to 0. For each pass, if there is only one non-zero entry in a given row of the inverse correlation matrix, ignore it. If there are 2, 3 or 4 non-zero entries, take the corresponding stocks as a candidate portfolio. If there are 5 or more non-zero entries, take the corresponding 4 stocks with the largest absolute values as a candidate portfolio. For our initial formation period, this method generated 132 candidate trading portfolios averaging 3.53 stocks per portfolio with the 7 selected raw factors. In this setup, each row of the inverse correlation matrix can produce up to 3 candidate portfolios, and as expected, the number of candidate trading portfolios increased significantly with this second hybrid search approach. We suspected that this approach produced too many candidate portfolios; in fact, we had a significant amount of room to carry out additional selection and still retain a number of portfolios comparable to the other selection methods. To that end, we ranked each of the 132 portfolios by the sum of the absolute values of the non-zero entries in the inverse correlation matrix. From this graph, we can see that 50 is an appropriate cut-off point for choosing portfolios. In order to have a fair comparison, we chose 55 portfolios, the number detected by using the graphical lasso method alone, for our simulation in the next step.
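The ranking step, by the sum of the absolute values of the non-zero entries in each row, can be sketched as below. Excluding the diagonal from the sum is an assumption.

```python
import numpy as np

def top_ranked_rows(precision, top_n):
    """Rank rows of the inverse correlation matrix by the sum of absolute
    off-diagonal entries, and return the indices of the strongest top_n rows."""
    P = np.abs(np.asarray(precision, dtype=float))
    scores = P.sum(axis=1) - np.diag(P)   # drop the diagonal term from each row sum
    return list(np.argsort(-scores)[:top_n])
```

Applied to the 132 candidate rows, keeping the top 55 scores reproduces the cut-off used for the fair comparison.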
Portfolio Simulation
We applied the standard Johansen test for a co-integration relationship to the candidate portfolios determined by each selection method. Portfolios that passed the test were experimentally traded over the formation period from January 2004 through December 2005. Those that produced a net positive profit in the formation period went on to be traded in the trading period from January 2006 through December 2007. We normalized the long and short legs of our open trades such that the sum of their absolute values was $2. The table below shows the simulation results for portfolios based solely on the clustering or graphical lasso method.
(Values listed as Clustering Based on Sig. Raw Factors / Clustering Based on Principal Components / Graphical Lasso.)
Portfolios identified: 35 / 37 / 55
Portfolios that passed the Johansen test (ratio to all identified portfolios): 4 (11.4%) / 6 (16.2%) / 17 (30.9%)
Portfolios with a net positive profit during the formation period (ratio to those that passed the Johansen test): 3 (75%) / 5 (83.3%) / 11 (64.7%)
Portfolios with a net positive profit during the trading period (ratio to those profitable in the formation period): 3 (100%) / 3 (60.0%) / 5 (45.5%)
Trades that produced a positive profit during the trading period (ratio to all trades opened): n/a / 26 (83.9%) / 51 (83.6%)
Average net profit per trade: 0.019 / 0.031 / 0.012
We observed that the clustering algorithm identified fewer candidate portfolios, and a smaller percentage of them passed the Johansen test. However, a greater percentage of them yielded a net positive profit in the trading period, and the average net profit per trade and per portfolio was also significantly higher than that of the graphical lasso method. Overall, clustering and graphical lasso yielded comparable performance in generating candidate trading portfolios for a co-integration-based statistical arbitrage strategy: clustering found fewer portfolios, but they were more profitable on average. We believe the difference in the results comes from the fact that the clustering algorithm mainly captures cross-sectional relationships between stocks, while graphical lasso concerns itself only with historical price time series. Similarly, we ran the same test for the two hybrid approaches with the two variable selection methods, most significant raw factors and principal components. In general, they all yielded a higher profit per portfolio and a higher total net profit compared to the individual clustering or graphical lasso methods.
Clustering based on Sig. Raw Factors (Sizes of three clusters: 32, 37, 40)
Portfolios identified: Clustering-Glasso 49; Glasso-Clustering 55.
Clustering based on Principal Components (Sizes of three clusters: 32, 35, 42)
Portfolios identified: Clustering-Glasso 50; Glasso-Clustering 55.
We also wanted to make sure that the additional filtering in our Glasso-Clustering method accurately sifted out the less profitable candidates. The table below shows the simulation results from trading the top-ranked 30/50/60/90/100 portfolios versus all 132 portfolios for the raw-factor clustering case. Indeed, we saw that the lowest-ranked 22 portfolios did not add any value to the strategy.
Statistical Arbitrage Testing
We took two approaches to generating the P&L time series from our results for testing the existence of statistical arbitrage. In the first, we applied a daily mark-to-market approach to generate the gains and losses on our positions. In the second, we took the realized profit or loss on each trade and distributed the amount evenly, with discounting, over the period of the holding. In both approaches, we fitted the JTTW model with an AR(1) noise term to each series. The risk-free rate used was the daily 3-month Treasury bill rate from 2004 to 2011. From our experimental results, the realized P&L approach appeared more informative, because open trades did not evenly cover the entire trading period, producing a flat mark-to-market P&L series during certain stretches. We used a 0.05 significance level for all tests. Among the singular portfolio selection methods, only the principal-components-based clustering method passed our statistical arbitrage test; the graphical lasso method and the pure raw-factor-based clustering method did not. However, for both the raw-factor-based and the principal-components-based clustering methods, both hybrid models (Clustering-Glasso and Glasso-Clustering) yielded very low p-values (<0.05), signaling that we should reject the null hypothesis that a statistical arbitrage does not exist. Therefore, our hybrid models produced statistical arbitrage strategies in all cases.
P-values for the singular portfolio selection methods. Clustering (Based on Sig. Raw Factors): 0.785 (failed); Clustering (Based on Principal Components): 0.01 (success); Graphical Lasso: 0.234 (failed).
P-values for the hybrid methods. Clustering based on raw factors: Clustering-Glasso 0.041 (success), Glasso-Clustering 0.0 (success). Clustering based on principal components: Clustering-Glasso 0.0 (success), Glasso-Clustering 0.0 (success). All the hybrid models passed the statistical arbitrage test at the 0.05 significance level.
Adaptive Trading
We tested rebalancing our portfolio once during the trading period by closing all trades at the end of 2006, re-running the two hybrid portfolio selection methods on 2006 data and trading the newly found candidates in 2007.
Clustering based on Sig. Raw Factors (Sizes of three clusters: 32, 37, 40 in the first half, and 28, 58, 23 in the second half)
Portfolios identified (first/second half): Clustering-Glasso 49/41; Glasso-Clustering 55/55.
Clustering based on Principal Components (Sizes of three clusters: 35, 32, 42 in the first half, and 26, 29, 54 in the second half)
Portfolios identified (first/second half): Clustering-Glasso 50/39; Glasso-Clustering 55/55.
Cross Validation
Cross validation was performed on the second half of our cleaned data. The formation period was set from 2008 through 2009 and the trading period lasted from 2010 through 2011.
(Values listed as Clustering Based on Sig. Raw Factors / Clustering Based on Principal Components / Graphical Lasso.)
Portfolios identified: 34 / 35 / 90
P-value of the statistical arbitrage test (Realized P&L): 0.0 / 0.0 / 0.0 (all passed)
We saw that the results for clustering and graphical lasso alone are reasonably in line with what we saw in our initial testing; in fact, clustering by itself outperforms graphical lasso considerably. The two tables below show the hybrid models with raw factors and principal components. The results consistently show that the hybrid models outperform the standalone clustering and graphical lasso models.
Clustering based on Sig. Raw Factors (Sizes of three clusters: 23, 29, 57)
Portfolios identified: Clustering-Glasso 83; Glasso-Clustering 90.
P-value of the statistical arbitrage test (Realized P&L): 0.0 and 0.0; both passed.
Clustering based on Principal Components (Sizes of three clusters: 22, 41, 44)
Portfolios identified: Clustering-Glasso 82; Glasso-Clustering 90.
P-value of the statistical arbitrage test (Realized P&L): 0.0 and 0.0; both passed.
The raw-factor-based hybrid models performed slightly worse than in the testing period. However, they still generated candidate portfolios that are more profitable than those detected using the graphical lasso method alone. In particular, all hybrid models generated much higher total net profits than either the clustering or the graphical lasso model alone.
We also tested adaptive trading over the cross validation period. The results are shown below.
Clustering based on Sig. Raw Factors (Sizes of three clusters: 23, 29, 57 in the first half, and 22, 32, 55 in the second half)
Portfolios identified (first/second half): Clustering-Glasso 83/86; Glasso-Clustering 90/90.
Clustering based on Principal Components (Sizes of three clusters: 24, 41, 44 in the first half, and 22, 29, 58 in the second half)
Portfolios identified (first/second half): Clustering-Glasso 82/86; Glasso-Clustering 90/90.
Survivorship Bias
One issue that needs special attention when analyzing our results is data selection and survivorship bias. We wanted to select a wide universe of stocks with readily available statistics on the 19 factors used as input to our candidate portfolio selection strategy. A natural candidate was the S&P 500 index, a widely recognized benchmark. Unfortunately, obtaining historical compositions of the S&P 500 proved difficult: while Standard and Poor's freely publishes the current index composition, retrieving compositions by date is part of a paid subscription service. Choosing the universe of stocks to be today's S&P 500 and holding it fixed when testing back in time already implies survivorship bias. However, while we cannot currently prove it, our belief is that year-over-year index composition changes are small enough that the general validity of our results still holds. To give an idea of how the S&P 500 changes over time, we mined the following data from various online news sources.
S&P 500 Index Composition Changes:
TSS replaces SNV WPO replaces TIN RRC replaces TRB GME replaces DJ AMT replaces AT MTW replaces TEK POM replaces HCR TIE replaces BOL JEC replaces AV
SCG replaces MER FLIR replaces NCC OI replaces WB MFE replaces BRL EQT replaces RIG RSG replaces AW DNB replaces LIZ LIFE replaces ABI SRCL replaces BUD CEPH replaces GGP
PLN replaces SGP FSLR replaces WYE ARG replaces CBE CFN replaces MTW FMC replaces CTX RHT replaces CIT PWR replaces IR WDC replaces EQ PCS added TEL deleted
DISCA replaces PBG BRK.B replaces BNI KMX replaces XTO QEP replaces STR ACE replaces MIL TYC replaces SII IR replaces PTV CVC replaces KG FFIV replaces NYT NFLX replaces ODP NFX replaces EK
WPX replaces CPWR TEL replaces CEPH MOS replaces NSM MPC replaces MRO AMB replaces PLD ANR replaces MEE CMG replaces NOVL BLK replaces GENZ JOY replaces AYE
There is an average of 10 or so ticker changes (out of 500) per year, or around 2% ticker turnover, which over a few years is probably not enough to materially change the results. While ideally we would want the time-specific composition of the S&P 500 and would account for missing data during trading, we still believe our results hold, especially when used in a comparative setting (glasso versus clustering versus the hybrid approaches) where all strategies face the same data.
Conclusions
Based on our study, we feel there is certainly merit in refining the portfolio selection process when developing a co-integration trading strategy. While a standalone graphical lasso approach detected a large number of candidate portfolios in our universe of stocks, their average profitability was relatively low. In contrast, a clustering-only approach found fewer, but more profitable, candidate portfolios. A hybrid approach benefited from the strengths of both by generating a reasonable number of profitable portfolios. We were not able to achieve a similar level of results in our implementation of rebalancing portfolios during the trading period, but we feel there is room for improvement on this front.
Future Work
As mentioned earlier, given ticker histories, we could have gathered more data, for example the equities in the S&P 500 as of 2004 rather than 2012. This would have fully eliminated any potential for look-ahead and survivorship bias. Our system can easily account for stocks that stop trading during the trading period, but our data selection actually ensures their existence, so such provisions would never trigger. Gathering the data for the missing tickers turned out to be quite difficult. While we were not able to directly compare the universe of S&P 500 stocks in 2004 against that of 2012, we do not believe the universe was markedly different, based on the data in more recent years. Moreover, given that few stocks were selected relative to the size of the universe, we do not believe there is a strong presence of survivorship bias in our study, and we believe our hybrid models would still consistently beat the clustering or graphical lasso models alone; nevertheless, we will need to re-verify this once we obtain the "unbiased" dataset in future research. In terms of the stock selection process, we would also like to experiment with other machine learning techniques such as hierarchical clustering or the K-nearest neighbor classifier. Among partition-based clustering algorithms, we could also try fuzzy C-means clustering. Regarding the adaptive trading phase of our study, we would like to see the results of not forcibly closing trades at the end of 2006 and instead only updating our pool of candidate portfolios for future holdings. Additionally, we could tune the parameters in each step of our study more carefully, and we could apply a systematic, adaptive stop-loss approach in highly risky environments. Indeed, during the cross-validation phase we saw a few trades close with large losses; from the distribution of profits, a lower bail-out threshold, e.g. 0.2, may have been more appropriate. When we made this adjustment, we saw a marked improvement in the average profit of each traded portfolio.
References
Robert Jarrow, Melvyn Teo, Yiu Kuen Tse, and Mitch Warachka. "Statistical Arbitrage and Market Efficiency: Enhanced Theory, Robust Tests and Further Applications." February 2005.
Marcilio C. P. de Souto, Daniel A. S. Araujo, Ivan G. Costa, Rodrigo G. F. Soares, Teresa B. Ludermir, and Alexander Schliep. "Comparative Study on Normalization Procedures for Cluster Analysis of Gene Expression Datasets." IJCNN 2008.
Chris Ding and Xiaofeng He. "Principal Component Analysis and Effective K-Means Clustering." SDM '04, 2004.
Glenn Fung. "A Comprehensive Overview of Basic Clustering Algorithms." June 2001.
Yan-Xia Lin, Michael McCrae, and Chandra Gulati. "Loss protection in pairs trading through minimum profit bounds: A cointegration approach." Journal of Applied Mathematics and Decision Sciences, Volume 2006 (2006).
Robert L. Axtell. "Zipf Distribution of U.S. Firm Sizes." Science, 2001.