A Robust Transferable Deep Learning Framework for Cross-sectional Investment Strategy

Abstract

Stock return predictability is an important research theme as it reflects our economic and social organization, and significant efforts have been made to explain the dynamism therein. Statistics with strong explanatory power, called "factors", have been proposed to summarize the essence of predictive stock returns. Although machine learning methods are increasingly popular in stock return prediction, inference of stock returns remains highly elusive, and most investors still rely, at least in part, on their intuition when making decisions. The challenge here is to make an investment strategy that is consistent over a reasonably long period, with minimal human decision-making in the entire process. To this end, we propose a new stock return prediction framework that we call Ranked Information Coefficient Neural Network (RIC-NN). RIC-NN is a deep learning approach and includes the following three novel ideas: (1) a nonlinear multi-factor approach, (2) a stopping criterion based on the ranked information coefficient (rank IC), and (3) deep transfer learning among multiple regions. Experimental comparison with the stocks in the Morgan Stanley Capital International (MSCI) indices shows that RIC-NN outperforms not only off-the-shelf machine learning methods but also the average return of major equity investment funds in the last fourteen years.

A Preprint

Kei Nakagawa

Innovation Lab, Nomura Asset Management Co., 1-11-1 Nihonbashi, Chuo-ku, Tokyo, 103-8260, Japan [email protected]

Masaya Abe

Innovation Lab, Nomura Asset Management Co., 1-11-1 Nihonbashi, Chuo-ku, Tokyo, 103-8260, Japan [email protected]

Junpei Komiyama

Leonard N. Stern School of Business, New York University, 44 West 4th Street, New York, NY 10012 [email protected]

October 4, 2019

arXiv preprint [q-fin.ST]

Stock return predictability has been an important research theme as it reflects our economic and social organization. Although the dynamic nature of our economic activity makes it harder to predict the future returns of stocks, significant efforts have been made to explain the dynamism therein. Statistics with strong explanatory power, called "factors" [1], have been proposed to summarize the essence of predictive stock returns, and a large portion of investors develop their portfolio strategies based on these factors. For example, the Book-value to Price ratio (the net assets of a company divided by the market value of the corresponding stock) is one of the well-known factors, and this factor combined with a simple sorting portfolio yields a positive return [1]. Due to their predictive power and robustness, investment decisions by professional investors depend heavily on the factors.

Machine learning is an increasingly popular tool for predicting unknown target variables; the last decades saw many attempts to apply machine learning algorithms to support smart decision-making in different financial segments [3, 4, 2]. Still, the highly elusive nature of stock returns makes it hard to draw consistent inferences: most investors rely, at least in part, on their intuition when making decisions.

[Figure 1: Our approach: RIC-NN. Panels: (1) Multi-factor DL Approach (input: stock-by-factor matrix; output: score per stock). (2) Weight Initialization and Stopping (loss and rank IC against training epoch, with initialization at $v_i$ and stopping at $v_f$, from time step t-1 to time step t). (3) Deep Transfer Learning (transfer from the source domain to the target domain; North America and Asia Pacific).]

The challenge in this paper is to make an investment strategy that is consistent over a fairly long period, with the smallest human intervention in the entire process. We propose a novel approach, called Rank Information Coefficient Neural Net (RIC-NN), for developing an investment strategy. Most quantitative investment strategies require a ranking over the stock returns, and we build this ranking by using a deep learning (DL) approach. In particular, the largest advantage of machine learning lies in its capability to learn the nonlinear relationship between the factors and the stock returns [5, 6], and the DL approach has recently been reported to outperform more traditional approaches in many domains, such as natural language processing [7], media recognition [8], and time-series forecasting.

Due to the dynamic nature of our economic activity, naive use of an off-the-shelf machine learning tool easily overfits the existing data and thus fails to predict future stock returns. For example, [9] applied deep learning to stock market prediction and reported that the advantage of a deep learning model over a linear autoregressive model mostly disappeared in the test set. We show that the proposed RIC-NN consistently outperforms other methods based on off-the-shelf machine learning algorithms. Our framework involves three novel ideas (Figure 1). (1) We propose a deep learning multi-factor approach that enables cross-sectional prediction. (2) The approach involves a novel training method for neural networks based on the rank IC. Our framework is very practical: we conducted a comprehensive evaluation of our approach based on the stocks in the Morgan Stanley Capital International (MSCI) indices. Our evaluation demonstrated that a neural network with a standard training method performs poorly, whereas our RIC-NN alleviates overfitting and outperforms linear models and ensemble-based models. Moreover, the average return of RIC-NN over fourteen years surpasses that of major equity investment funds. (3) We furthermore considered information aggregation among several different markets in the MSCI indices: namely, transfer learning between the North America (NA) region and the Asia Pacific (AP) region. The experimental results imply that one can utilize the NA data to predict the future returns of the AP market, but not vice versa. The results are consistent with the asymmetric causal structure between the two markets [10, 11].

There are two major strategies in stock trading: one based on time-series analysis [4] and one based on cross-sectional analysis [12].

The methods of the former strategy analyze past stock prices as time-series data [3] and are applied to practical trading strategies that focus on a particular stock. Indeed, financial time-series forecasting can be considered one of the significant challenges in the time-series and machine learning literature [13]. The study of financial time series originally started from linear models, such as the autoregressive (AR) model, in which the parameters are uniquely determined [14]. The introduction of machine learning techniques to the literature made it possible to capture nonlinear relationships among relevant factors without prior knowledge about the input data distribution [3]. Still, the application of nonlinear methods to time-series data is highly non-trivial [9].

The methods of the latter strategy, which include the work in this paper, perform a regression analysis using cross-sectional data of corporate attributes. Such a strategy aims to build a portfolio as a subset of a large bucket of stocks and is applied to practical quantitative investment strategies [15]. One of the most significant interests in cross-sectional analysis lies in finding "factors" that have strong predictive power for the expected return of a cross-sectional trading strategy: the Fama-French three-factor model [1, 16] is one of the seminal works in this field. They argued that the cross-sectional structure of stock prices can be explained by three factors: the beta (market portfolio), the size (market capitalization), and the value (Book-value to Price ratio; BPR). This argument inspired many subsequent research papers that propose more sophisticated versions of factors. [17] surveyed the history of the proposed factors and argued that the number of reported factors has increased rapidly in the last two decades: each year from 1980 to 1991 saw one new factor, whereas each year from 1991 to 2003 saw five new factors. In the period from 2003 to 2012, the number of factors proposed each year rose sharply to around 18. As a result, over 300 factors had been discovered by 2012.

While each factor is positively correlated with the return of the investment strategy, the effectiveness of the factors varies significantly over time and among different markets: Figures 2 and 3 show the cumulative returns of the long-short portfolio strategy based on each single factor in Appendix A, from which one can see that the return of each single factor varies largely over time. These results motivate a multi-factor approach, where more than one factor is taken into consideration with the aim of better returns [5, 6].

In particular, machine learning approaches, which can capture the nonlinear relationship among multiple factors, have recently been applied to cross-sectional analysis. [18] applied the LASSO [19] to the U.S. stock market, [20] applied an auto-encoder-based nonlinear model to a U.S. biotechnology market, and [21, 22, 23] applied deep learning to the Japanese stock market. However, these results are not universal: their experiments are performed only in a single market. Note also that the neural nets of [21, 22, 23] adopted epoch-based stopping, which we show in the Experiments Section to be sensitive to the number of epochs.

This section describes RIC-NN, a deep learning based investment strategy.

We consider a medium-term investment cycle, where an investment is made on a monthly basis. Namely, let $t = 1, \dots, T$ be the time step, where each step corresponds to the end of a month between December 1994 and December 2018. We use the term "stock universe" (or simply universe) $U_t$ to represent all the stocks of interest at time step $t$: in the case of the North America stock market, the number of stocks in each $U_t$ is about 700. Note that $U_t$ gradually changes over the time steps to reflect economic activities among different sectors. At each time step, let $i \in U_t$ be an index denoting each stock in the universe. Let $R_{i,t} \in \mathbb{R}$ be the (unit) return of stock $i$ between time steps $t-1$ and $t$. Let $x_{i,t}$ be the real-valued vector of factors associated with stock $i$ at $t$.

In this paper, we consider investment strategies that are widely used in the finance literature [1, 24]: namely, (i) the long portfolio strategy and (ii) the long-short portfolio strategy. We consider an equally-weighted (EW) portfolio, which is simple yet sometimes outperforms more sophisticated alternatives [25].

(i) The long portfolio strategy considered here buys the top quintile (i.e., one-fifth) of the stocks with equal weight, aiming to outperform the average return of all the stocks. Namely, let $L_t \subset U_t$ with $|L_t| = \frac{1}{5}|U_t|$ be the long portfolio. The return from the portfolio is defined as the average return of $L_t$:

$$R^{L}_t = \frac{1}{|L_t|} \sum_{i \in L_t} R_{i,t}.$$

(ii) The long-short portfolio strategy not only buys the top quintile of the stocks but also sells the bottom quintile of the stocks. Namely, let $S_t \subset U_t$ with $|S_t| = \frac{1}{5}|U_t|$ be the short portfolio, and let $R^{S}_t = \frac{1}{|S_t|} \sum_{i \in S_t} R_{i,t}$. The average return in this strategy is defined as $R^{LS}_t = R^{L}_t - R^{S}_t$.
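The long and long-short returns defined above can be illustrated with a minimal sketch. The function name `quintile_portfolio_returns`, the toy data, and the choice to floor the quintile size when the universe is not divisible by five are assumptions made for illustration, not part of the paper.

```python
import numpy as np

def quintile_portfolio_returns(scores, realized_returns):
    """Return (R_L, R_S, R_LS) for one time step.

    scores: predicted scores used only for ranking (higher = more attractive).
    realized_returns: realized returns R_{i,t} of the same stocks.
    """
    scores = np.asarray(scores, dtype=float)
    realized_returns = np.asarray(realized_returns, dtype=float)
    n_quintile = len(scores) // 5              # |L_t| = |S_t| = |U_t| / 5, floored (assumption)
    if n_quintile == 0:
        raise ValueError("need at least five stocks to form a quintile")
    order = np.argsort(scores)                 # ascending: least attractive first
    short_idx = order[:n_quintile]             # bottom quintile
    long_idx = order[-n_quintile:]             # top quintile
    r_long = realized_returns[long_idx].mean()     # equally weighted long return
    r_short = realized_returns[short_idx].mean()   # equally weighted short return
    return r_long, r_short, r_long - r_short

# Toy universe of ten stocks (hypothetical numbers).
scores = np.arange(10)
rets = np.array([-0.05, -0.03, 0.00, 0.01, 0.01, 0.02, 0.02, 0.03, 0.04, 0.06])
r_long, r_short, r_ls = quintile_portfolio_returns(scores, rets)
```

Only the ordering of `scores` matters here, which is why the method can be trained on ranks rather than on raw returns.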
While the long-short portfolio cannot take advantage of stock market growth, it is robust against a large market crisis (e.g., the financial crisis during 2007-2008) because of its neutral position.

Essentially, both of the strategies above require a ranking over the expected returns of the stocks in the universe, since we invest in the most promising stocks. Namely, let $o_t \in \mathbb{N}^{|U_t|}$ be the ground-truth ranking, whose element $o_{i,t} \in \{1, 2, \dots, |U_t|\}$ denotes the corresponding place for each $i \in U_t$. At each round $t$, we build the estimated ranking $\hat{o}_t$. We choose $L_t$ and $S_t$ to be the top and bottom quintiles on the basis of $\hat{o}_t$, respectively. The ranked information coefficient (rank IC), also referred to as Spearman's correlation coefficient, between two rankings $o_t, \hat{o}_t$ is defined as

$$\mathrm{rank\,IC}(o_t, \hat{o}_t) = 1 - \frac{6 \sum_{i \in U_t} (o_{i,t} - \hat{o}_{i,t})^2}{|U_t| \, (|U_t|^2 - 1)},$$

which takes values in $[-1, 1]$ and is widely used in the field of finance [15]. The larger the rank IC, the better a portfolio strategy based on the ranking.

[Figure 2: Cumulative portfolio returns in MSCI North America based on each single factor. Factors are listed in Appendix A.]

[Figure 3: Cumulative portfolio returns in MSCI Pacific based on each single factor.]

We consider a rolling-horizon setting: namely, at each time step $t$, we estimate the ranking of the next time step, $\hat{o}_{t+1}$. The following sections introduce RIC-NN, our DL-based method to build $\hat{o}_{t+1}$.

The normalized rank of stock $i$ at time step $t$ is denoted as $r_{i,t} \in \mathbb{R}$: namely, we rank the stocks in accordance with their returns $\{R_{i,t}\}$ and normalize the ranks so that $r_{i,t} \in [0, 1]$ (i.e., $r_{i,t}$ for the stock with the largest return at each $t$ is 1, whereas $r_{i,t}$ for the stock with the median return is 0.5).

At each time step $t$, we build an estimator $\hat{r}_{i,t}$ of $r_{i,t}$ by using the following augmented feature vector $v_{i,t}$. Given that many of the factors are updated on a quarterly basis (i.e., every three time steps), we define

$$v_{i,t} = \left( x_{i,t},\, x_{i,t-3},\, \dots,\, x_{i,t-12},\; x_{i,t} /_R\, x_{i,t-3},\, \dots,\, x_{i,t} /_R\, x_{i,t-12} \right)$$

using the past five time steps, where $x /_R\, y$ over two vectors $x$ and $y$ denotes an element-wise differentiation operator, each element of which is a constant multiple of $(x - y) / (|x| + |y|)$, a form popularly used in finance [26].

We adopt a seven-layer feed-forward neural network with the rectified linear unit (ReLU) activation function [27] to learn the relationship between $v_{i,t}$ and $r_{i,t+1}$. The hidden layer sizes are set to (150, ...), and the dropout rates for the layers are set to (50%, ...).
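As a concrete illustration of the rank IC and the normalized rank used in this section, the following sketch implements both with NumPy. The helper names are hypothetical. The code assumes that there are no ties and that the stock with the smallest return is mapped to 0, which the text does not state explicitly.

```python
import numpy as np

def to_ranks(values):
    """Rank 1 = smallest value, rank n = largest value (ties are not handled)."""
    values = np.asarray(values, dtype=float)
    ranks = np.empty(len(values), dtype=float)
    ranks[np.argsort(values)] = np.arange(1, len(values) + 1)
    return ranks

def rank_ic(true_values, predicted_values):
    """Spearman's formula: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    d = to_ranks(true_values) - to_ranks(predicted_values)
    n = len(d)
    return 1.0 - 6.0 * np.sum(d ** 2) / (n * (n ** 2 - 1))

def normalized_rank(returns):
    """Map returns to [0, 1]: largest return -> 1, median -> 0.5, smallest -> 0 (assumption)."""
    ranks = to_ranks(returns)
    return (ranks - 1.0) / (len(ranks) - 1.0)

returns = np.array([0.02, -0.01, 0.05, 0.00, 0.03])   # hypothetical realized returns
ic_perfect = rank_ic(returns, returns)                # identical ordering
ic_reversed = rank_ic(returns, -returns)              # exactly reversed ordering
r = normalized_rank(returns)
```

Because the rank IC depends only on orderings, any monotone rescaling of the predicted scores leaves it unchanged.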
We adopt the standard mean squared error (MSE) as the loss function and train our deep learning model by using the data of the latest time steps from the past 10 years. Namely,

$$\mathrm{MSE}_t = \frac{1}{K} \sum_{t'=t-N}^{t-1} \sum_{i \in U_{t'}} \left( r_{i,t'+1} - f(v_{i,t'}; \theta) \right)^2, \tag{1}$$

where $N = 120$ (i.e., ten years) is the size of the sliding window, $K = \sum_{t'=t-N}^{t-1} |U_{t'}|$ is the number of all training examples, and $f(\cdot\,; \theta)$ is our neural net with weight parameter $\theta$. We adopt the Adam [28] optimizer with mini-batch training and batch normalization [29].

A fundamental challenge in cross-sectional analysis lies in its dynamism: standard machine learning methods focus on generalization performance under the i.i.d. assumption, where the training dataset and the test dataset are drawn from the same (unknown) underlying distribution. However, a straightforward application of deep neural networks leads to overfitting to the current time window, which compromises the performance as a predictor of the next time step.

To avoid overfitting, we initialize and terminate the training by the following criterion ((2) in Figure 1). We define the initialization rank IC $v_i \in [0, 1]$ and the stopping rank IC $v_f \in [0, 1]$, and conduct the training as follows. Let $\theta_{t,v}$ be the weights of the RIC-NN at time step $t$ at the point during training when the average rank IC in the training window reaches $v$. We (i) use $\theta_{t-1,v_i}$ as the initial parameters to train the model at time step $t$ and (ii) adopt $\theta_{t,v_f}$ as the final model parameter $\theta_t$. We estimate $\hat{o}_{t+1}$ by ranking the stocks in accordance with $f(v_{i,t}; \theta_t)$, which, combined with the long or the long-short portfolio, defines our RIC-NN.

The values of $v_i$ and $v_f$ are optimized by the performance in the three years from 2005 to 2008. Essentially, these values "moderately overfit" the model to the data: they exceed the rank IC of a fairly good portfolio and are thus large enough to exploit the current data, while the training stops before overfitting to the dataset in the current window.
The experiment section shows that these values consistently perform well in multiple markets of very different natures.
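The initialization and stopping rule based on the rank IC can be sketched as follows. This is a simplified illustration, not the authors' implementation: a linear model trained by plain gradient descent stands in for the neural network, the rank IC is computed on the whole training set rather than averaged over the window, and the thresholds `V_I` and `V_F`, the learning rate, and the synthetic data are hypothetical placeholders.

```python
import numpy as np

def rank_ic(y_true, y_pred):
    """Spearman rank correlation via the closed-form formula (no tie handling)."""
    def ranks(a):
        r = np.empty(len(a))
        r[np.argsort(a)] = np.arange(1, len(a) + 1)
        return r
    d = ranks(np.asarray(y_true)) - ranks(np.asarray(y_pred))
    n = len(d)
    return 1.0 - 6.0 * np.sum(d ** 2) / (n * (n ** 2 - 1))

def train_with_rank_ic_stopping(X, y, theta_init, v_i, v_f, lr=0.01, max_epochs=5000):
    """Gradient descent on the MSE of a linear model, terminated by the rank IC.

    Returns (theta_at_v_i, theta_final, epochs_run). theta_at_v_i is the snapshot
    taken when the training rank IC first reaches v_i (None if it never does);
    in a rolling-horizon setting it would initialize the next time step.
    """
    theta = theta_init.copy()
    theta_at_v_i = None
    for epoch in range(1, max_epochs + 1):
        grad = 2.0 * X.T @ (X @ theta - y) / len(y)   # gradient of the MSE
        theta -= lr * grad
        ic = rank_ic(y, X @ theta)
        if theta_at_v_i is None and ic >= v_i:
            theta_at_v_i = theta.copy()               # snapshot for the next step
        if ic >= v_f:
            break                                     # stop before fitting the window further
    return theta_at_v_i, theta, epoch

# Synthetic data with a deliberately poor starting point (all values hypothetical).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = np.ones(5)
y = X @ w_true + rng.normal(scale=3.0, size=200)
V_I, V_F, MAX_EPOCHS = 0.15, 0.20, 5000
theta_i, theta_f, epochs = train_with_rank_ic_stopping(
    X, y, theta_init=-w_true, v_i=V_I, v_f=V_F, max_epochs=MAX_EPOCHS)
final_ic = rank_ic(y, X @ theta_f)
```

Stopping on a ranking criterion rather than on a fixed epoch count ties the amount of fitting to how well the model orders the stocks, which is the quantity the portfolio construction actually uses.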

In evaluating an investment strategy, we use the following measures, which are widely used in the field of finance [30]. These measures evaluate not only the actual return of the portfolio but also the magnitude of the risk taken: a sample from a highly fluctuating series can have a large variance, and thus a return normalized by risk yields a more reliable evaluation.

Regarding the long portfolio strategy, the annualized return is the excess return (Alpha) against the average return of all stocks in the universe, the risk (tracking error; TE) is calculated as the standard deviation of Alpha, and risk/return is Alpha/TE (information ratio; IR):

$$\mathrm{Alpha} = \prod_{t=1}^{T} (1 + \alpha_t)^{12/T} - 1 \tag{2}$$

$$\mathrm{TE} = \sqrt{\frac{12}{T-1} \sum_{t=1}^{T} (\alpha_t - \mu_\alpha)^2} \tag{3}$$

$$\mathrm{IR} = \mathrm{Alpha} / \mathrm{TE} \tag{4}$$

Here, $\alpha_t = R^{L}_t - \frac{1}{|U_t|} \sum_{i \in U_t} R_{i,t}$ and $\mu_\alpha = (1/T) \sum_{t=1}^{T} \alpha_t$.

Likewise, we evaluate the long-short portfolio strategy by its annualized return (AR), its risk as the standard deviation of the return (RISK), and its risk/return (R/R) as the return divided by the risk, as for the long portfolio strategy:

$$\mathrm{AR} = \prod_{t=1}^{T} (1 + R^{LS}_t)^{12/T} - 1 \tag{5}$$

$$\mathrm{RISK} = \sqrt{\frac{12}{T-1} \sum_{t=1}^{T} (R^{LS}_t - \mu_{LS})^2} \tag{6}$$

$$\mathrm{R/R} = \mathrm{AR} / \mathrm{RISK} \tag{7}$$

Here, $\mu_{LS} = (1/T) \sum_{t=1}^{T} R^{LS}_t$ is the average return of the long-short portfolio.

In summary, the return of the long (resp. long-short) portfolio is evaluated by Alpha (resp. AR), whereas the risk of the long (resp. long-short) portfolio is evaluated by TE (resp. RISK). We use the risk-normalized return (i.e., IR for the long and R/R for the long-short), which gives a more reliable measure than the return itself.

We also evaluate the maximum drawdown (MaxDD), which is yet another widely used risk measure [31, 32], for both the long portfolio strategy and the long-short portfolio strategy. MaxDD is defined as the largest drop from an extremum:

$$\mathrm{MaxDD} = \min_{k \in [1,T]} \left( 0,\; \frac{W^{\mathrm{Port}}_k}{\max_{j \in [1,k]} W^{\mathrm{Port}}_j} - 1 \right) \tag{8}$$

$$W^{\mathrm{Port}}_k = \prod_{i=1}^{k} (1 + R^{\mathrm{Port}}_i), \tag{9}$$

where $R^{\mathrm{Port}}_i = R^{L}_i$ (resp. $R^{\mathrm{Port}}_i = R^{LS}_i$) for the long (resp. long-short) strategy. These performance measures are calculated monthly during the prediction period from January 2005 to December 2018 ($T = 168$).

We prepare a stock dataset corresponding to the Morgan Stanley Capital International (MSCI) North America and MSCI Pacific Indices. These MSCI indices comprise the large and mid-cap segments of the North America (NA) and Asia Pacific (AP) markets, respectively, and are widely used as benchmarks by institutional investors investing in each stock market [33]. We use the popular factors listed in Appendix A. In calculating these factors, we use the following data sources: Compustat, WorldScope, Thomson Reuters, the Institutional Brokers' Estimate System (I/B/E/S), and EXSHARE. Combining these sources, we calculate the factors on a monthly basis. As for stock returns, local returns with dividends are acquired.

Table 1: Experimental results of the long portfolio and the long-short portfolio in MSCI North America. Bold characters indicate the best ones among each category. The evaluation measures are the ones discussed in the "Performance Measures" Section: Alpha (resp. AR) measures return, TE (resp. RISK) and MaxDD measure risk, and IR (resp. R/R) is a risk-normalized return measure in the long (resp. long-short) portfolio.

Long (columns: Linear: LASSO; Nonlinear: RF, NN, RIC-NN, RIC-NN (TF from AP))
Alpha 0.62% 0.79% 0.82% -14.37% -20.57%

Long-Short (columns: Linear: LASSO; Nonlinear: RF, NN, RIC-NN, RIC-NN (TF from AP))
AR 2.24% 1.71% 2.10% -21.26% -39.35%

By using these sources, we build a dataset comprising 1,194 stocks on average (NA: 702, AP: 492) and 288 time steps from December 1994 to November 2018. This dataset covers a reasonably long period, so that we can evaluate whether an investment strategy is consistently good. The following sections show the performance of the proposed RIC-NN strategy compared with several baselines.
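The performance measures of Eqs. (2)-(9) can be computed from a series of monthly returns along the following lines. This is a sketch under the assumption of monthly data and 12-month annualization; the function names and the toy return series are illustrative only. Passing the excess returns $\alpha_t$ gives Alpha, TE, and IR, while passing $R^{LS}_t$ gives AR, RISK, and R/R.

```python
import numpy as np

def annualized_return(monthly):
    """Eq. (2)/(5): compound the monthly returns and rescale to a 12-month horizon."""
    monthly = np.asarray(monthly, dtype=float)
    return np.prod(1.0 + monthly) ** (12.0 / len(monthly)) - 1.0

def annualized_risk(monthly):
    """Eq. (3)/(6): sample standard deviation (T - 1 denominator) scaled by sqrt(12)."""
    monthly = np.asarray(monthly, dtype=float)
    return np.sqrt(12.0) * np.std(monthly, ddof=1)

def risk_normalized_return(monthly):
    """Eq. (4)/(7): annualized return divided by annualized risk."""
    return annualized_return(monthly) / annualized_risk(monthly)

def max_drawdown(monthly):
    """Eq. (8)-(9): largest relative drop of cumulative wealth from its running peak."""
    wealth = np.cumprod(1.0 + np.asarray(monthly, dtype=float))
    running_peak = np.maximum.accumulate(wealth)
    return min(0.0, float(np.min(wealth / running_peak - 1.0)))

# Toy series (hypothetical numbers).
flat_year = [0.01] * 12                 # 1% every month for a year
crash = [0.10, -0.50, 0.20]             # wealth path: 1.10, 0.55, 0.66
ar_flat = annualized_return(flat_year)
risk_flat = annualized_risk(flat_year)
mdd_crash = max_drawdown(crash)
```

In the toy crash series the wealth peaks at 1.10 and falls to 0.55, so the maximum drawdown is -50% even though the series partly recovers afterwards.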

We compare the performance of RIC-NN with major off-the-shelf machine learning algorithms: namely, the LASSO regression (LASSO) model [19], random forest (RF), and a standard neural network (NN). LASSO and RF are implemented with scikit-learn [34], and NN is implemented with TensorFlow [35]. These methods are used to learn the relation between $v_{i,t}$ and $r_{i,t+1}$. Regarding the hyperparameters, the regularization strength ("alpha") of LASSO is set to 0.001, which is the largest value that yields a meaningful ranking. We use the default hyperparameters of RF. Several different hyperparameters are tested, and their results are shown in Appendix B. NN adopts the same framework as our RIC-NN, except that NN stops the training at a fixed epoch, chosen separately for MSCI North America and MSCI Pacific so that the rank IC reaches 0.20 during the training of the first time step. We used random numbers as initial weights for the first time step.

Table 1 compares the algorithms in the MSCI NA dataset. RIC-NN outperforms all of LASSO, RF, and NN in both the risk and the return measures, regardless of whether the portfolio strategy is the long or the long-short. A notable finding is that RF and NN have smaller returns compared with LASSO. Our hypothesis is that the highly non-stationary nature of the stocks has led to the overfitting of these nonlinear models. Table 2 shows the results in the MSCI Pacific dataset. Although LASSO yields a larger return than RIC-NN by taking a larger risk, RIC-NN outperforms the other methods in terms of the risk-normalized return, which is the primary measure of an investment strategy. We discuss the results of the transfer learning ("TF from AP/NA") in a later section.

We have also conducted the same experiment with Ridge Regression (RR): the performance of RR is not very different from that of LASSO.

Tables 3 and 4 show the results of NN with different numbers of training epochs. The best-performing stopping epoch differs between the NA market and the AP market. One can also find that the performance of NN is very sensitive to the choice of the stopping epoch. On the other hand, RIC-NN, which consistently stops at the same rank IC $v_f$, outperforms most of the (epoch-based) NNs. This implies that the rank IC is a consistent measure of the fitness of stock prediction models.

Table 2: Experimental results of the long portfolio and the long-short portfolio in MSCI Pacific. Bold characters indicate the best ones among each category.

Long          Linear     Nonlinear
              LASSO      RF         NN         RIC-NN     RIC-NN (TF from NA)
Alpha         5.35%      3.79%      4.34%      5.25%
TE            5.17%      5.75%      4.18%      4.20%
IR            1.04       0.66       1.04       1.25
MaxDD         -11.53%    -11.43%    -9.37%     -7.51%     -3.37%

Long-Short    Linear     Nonlinear
              LASSO      RF         NN         RIC-NN     RIC-NN (TF from NA)
AR            10.27%     7.78%      8.52%      9.81%
RISK          9.23%      9.65%      7.78%      7.83%
R/R           1.11       0.81       1.10       1.25
MaxDD         -18.07%    -18.66%    -19.74%    -11.06%    -8.89%

Table 3: Comparison between RIC-NN and NN with different numbers of training epochs in MSCI North America.

Long (columns: RIC-NN; NN at epoch 40, 50, 56, 60, 80)
Alpha 1.23% 0.18% -13.48% -17.41% -20.98% -15.94%

Long-Short (columns: RIC-NN; NN at epoch 40, 50, 56, 60, 80)
AR
R/R -21.26% -40.09% -26.20% -34.49% -31.62% -23.47%

Table 4: Comparison between RIC-NN and NN with different numbers of training epochs in MSCI Pacific.

Long (columns: RIC-NN; NN at epoch 40, 46, 50, 60, 80)
Alpha
IR -7.16% -7.45% -7.52%

Long-Short (columns: RIC-NN; NN at epoch 40, 46, 50, 60, 80)
AR
R/R 1.25 1.16 1.10 1.11 -11.06% -13.07% -19.74% -12.17% -14.70% -13.44%

The value of Alpha (Eq. (2)) indicates the advantage of the long strategy over the average return in the universe, which enables us to infer the possible advantage we can obtain by using machine learning algorithms. Comparing the Alpha in Tables 1 and 2, machine learning algorithms have a smaller advantage in the NA market than they do in the AP market.

A portfolio strategy with a higher return essentially exploits the gap between the market value of the stocks and the true valuation of the companies: the more efficient a market is, the more difficult it is to obtain a higher return. In other words, the result suggests that the NA market is more efficient than the AP market.

To exploit the interdependency between the markets, we further apply transfer learning to our RIC-NN. Namely, we use the weights of the first four layers trained in the source region as the initial weights for the target region.

Table 1 shows that the transfer from AP to NA is not very successful, whereas Table 2 shows that the transfer from NA to AP is quite successful. In other words, NA as a source domain is quite informative for enhancing the performance in AP, but not vice versa. These results are consistent with the observation that market movements propagate from the NA stock market to the AP stock market [10, 11]. The experiment here shows the capability of RIC-NN to exploit a highly non-trivial causal structure among multiple markets by using deep neural networks.
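The weight transfer described above, in which the first four layers of a network trained in the source region initialize the target network, can be sketched as follows. The layer sizes, the random weights standing in for a trained source network, and the helper names are hypothetical; the paper's TensorFlow implementation is not shown in the text.

```python
import numpy as np

def init_layers(layer_sizes, rng):
    """Random (weight, bias) pairs for a feed-forward net with the given layer sizes."""
    return [
        (rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]

def transfer_first_layers(source_layers, target_layers, n_transfer=4):
    """Use copies of the first n_transfer source layers as the target's initial weights."""
    transferred = [(w.copy(), b.copy()) for w, b in source_layers[:n_transfer]]
    return transferred + list(target_layers[n_transfer:])

rng = np.random.default_rng(0)
sizes = [8, 6, 6, 4, 4, 2, 1]        # hypothetical sizes giving six weight layers
source = init_layers(sizes, rng)     # stands in for a network trained on the source region
target = init_layers(sizes, rng)     # fresh initialization for the target region
target_init = transfer_first_layers(source, target)
```

After this initialization, the target network would be trained on the target region's data as usual; copying rather than sharing the arrays keeps later updates to the target from modifying the source weights.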

This section compares the performance of RIC-NN with major funds whose investments involve decision-making by human experts. We select the top 5 funds in terms of total assets (US dollar), excluding index funds, according to the following criteria and calculate the average total return series of these funds, including the trust fees. Namely, we select these funds by querying the Bloomberg fund screening search with the following conditions:

• Fund Asset Class Focus: Equity
• Fund Geographical Focus: North America Region (resp. Asian Pacific Region)
• Fund Type: Open-End-Funds
• Currency: US dollar
• Market Cap Focus (Holdings Based): Large-cap, Mid-cap
• Inception Date: before 12/31/2004

In both the NA and AP regions, the correlation coefficient between the performance of the averaged funds above and the MSCI index is high, which implies that these funds are based on the long strategy. For comparison, we add the benchmark returns, calculated as the average return of the MSCI North America (resp. MSCI Pacific) constituents, to the long portfolio strategy performance and convert the result to US dollars.

Table 5 shows the performance of RIC-NN and the aforementioned stock investing funds from January 2005 to June 2018. The corresponding time-series data is shown in the supplementary material (see Appendix C). Unlike the performance of the machine learning models, the performance of the funds includes the transaction cost. As a conservative baseline, Table 5 also shows the performance of RIC-NN where the transaction cost for updating the entire portfolio is deducted (i.e., an overestimated transaction cost: we deduct the cost of rebalancing all the stocks in the portfolio every month, with an estimated transaction cost of 0.05% one way in North America and 0.1% one way in Asia Pacific). RIC-NN still outperforms the average performance of the funds.

Table 5: The upper panel: performance of RIC-NN in MSCI North America and averaged performance of five investment funds in the NA stock market. The lower panel: performance of RIC-NN (TF from NA) in MSCI Pacific and averaged performance of five investment funds in the AP stock market.

North America    RIC-NN         RIC-NN (after cost deduction)         Funds
AR               9.09%          7.79%                                 5.90%
RISK             17.78%         17.78%                                14.91%
R/R              0.51           0.44                                  0.40

Asia Pacific     RIC-NN (TF)    RIC-NN (TF) (after cost deduction)    Funds
AR               12.08%         9.44%                                 7.88%
RISK             17.23%         17.23%                                17.58%
R/R              0.70           0.55                                  0.45

In this paper, we have proposed a new stock price prediction framework called RIC-NN by introducing three novel ideas: (1) a nonlinear multi-factor approach, (2) a stopping criterion based on the rank IC, and (3) deep transfer learning. RIC-NN is conceptually simple yet universal: the identical NN architecture and rank IC stopping value yielded a consistently good return over a long timescale and in two markets of very different structures. Experimental comparison showed that RIC-NN outperforms off-the-shelf machine learning methods and the average performance of investment funds in the last decades.

Promising directions for future work include the following.

More sophisticated portfolio strategies:

In this study, we use a simple equally-weighted (EW) portfolio that focuses on maximizing the predictive power of stock returns. On the other hand, portfolio theory [36] states that explicit consideration of the risk in portfolio selection is important. In this direction, combining our method with more sophisticated portfolio strategies, such as the Subset Resampling Portfolio [32] or the Ensemble Growth Optimal Portfolio [37], will be an interesting topic for future work.

Stateful models:

This paper considered rolling-horizon learning of a neural network, whereas there are several other approaches for portfolio selection. In particular, recurrent neural networks and their variants are stateful neural networks that can capture the time evolution of the stock universe. Note that our RIC-NN model uses quite a large time window (i.e., ten years) for the training, which implies that long-range interaction is important in multi-factor machine learning models. While we presume that a straightforward application of a recurrent neural network overfits to the data up to the current time horizon, several approaches to capturing long-range interactions, such as memory networks and attention mechanisms, can be applied to predict cross-sectional investments.

References

[1] Eugene F Fama and Kenneth R French. The cross-section of expected stock returns. J. of Finance, 47(2):427–465, 1992.
[2] Keywan Christian Rasekhschaffe and Robert C. Jones. Machine learning for stock selection. Financial Analysts Journal, 75(3):70–88, 2019.
[3] George S Atsalakis and Kimon P Valavanis. Surveying stock market forecasting techniques–Part II: Soft computing methods. Expert Systems with Applications, 36(3):5932–5941, 2009.
[4] Rodolfo C Cavalcante, Rodrigo C Brasileiro, Victor LF Souza, Jarley P Nobrega, and Adriano LI Oliveira. Computational intelligence and financial markets: A survey and future directions. Expert Systems with Applications, 55:194–211, 2016.
[5] Asriel E Levin. Stock selection via nonlinear multi-factor models. In NIPS, pages 966–972, 1996.
[6] Alan Fan and Marimuthu Palaniswami. Stock selection using support vector machines. In IJCNN, volume 3, pages 1793–1798. IEEE, 2001.
[7] Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. Sequence to sequence learning with neural networks. In NIPS, pages 3104–3112, 2014.
[8] Dan C. Ciresan, Ueli Meier, Jonathan Masci, and Jürgen Schmidhuber. Multi-column deep neural network for traffic sign classification. Neural Networks, 32:333–338, 2012.
[9] Eunsuk Chong, Chulwoo Han, and Frank C Park. Deep learning networks for stock market analysis and prediction: Methodology, data representations, and case studies. Expert Systems with Applications, 83:187–205, 2017.
[10] Yin-Wong Cheung and Lilian K Ng. A causality-in-variance test and its application to financial market prices. Journal of Econometrics, 72(1-2):33–48, 1996.
[11] Aymen Ben Rejeb and Mongi Arfaoui. Financial market interdependencies: A quantile regression analysis of volatility spillover. Research in International Business and Finance, 36:140–157, 2016.
[12] Avanidhar Subrahmanyam. The cross-section of expected stock returns: What have we learnt from the past twenty-five years of research? European Financial Management, 16(1):27–42, 2010.
[13] Francis EH Tay and Lijuan Cao. Application of support vector machines in financial time series forecasting. Omega, 29(4):309–317, 2001.
[14] James Douglas Hamilton. Time series analysis, volume 2. Princeton University Press, Princeton, NJ, 1994.
[15] Richard C Grinold and Ronald N Kahn. Active portfolio management. 2000.
[16] Eugene F Fama and Kenneth R French. Common risk factors in the returns on stocks and bonds. J. of Financial Economics, 33(1):3–56, 1993.
[17] Campbell R Harvey, Yan Liu, and Heqing Zhu. ... and the cross-section of expected returns. Rev. of Financial Studies, 29(1):5–68, 2016.
[18] Alex Chinco, Adam D Clark-Joseph, and Mao Ye. Sparse signals in the cross-section of returns. J. of Finance, 74(1):449–492, 2019.
[19] Robert Tibshirani. Regression shrinkage and selection via the lasso. JRSS: Series B, 58(1):267–288, 1996.
[20] J. B. Heaton, Nicholas G. Polson, and J. H. Witte. Deep portfolio theory. CoRR, abs/1605.07230, 2016.
[21] Masaya Abe and Hideki Nakayama. Deep learning for forecasting stock returns in the cross-section. In PAKDD, pages 273–284. Springer, 2018.
[22] Kei Nakagawa, Takumi Uchida, and Tomohisa Aoshima. Deep factor model. In ECML PKDD 2018 Workshops, pages 37–50. Springer, 2018.
[23] Seisuke Sugitomo and Shotaro Minami. Fundamental factor models using machine learning. J. of Mathematical Finance, 8:111–118, 2018.
[24] R David McLean and Jeffrey Pontiff. Does academic research destroy stock return predictability? J. of Finance, 71(1):5–32, 2016.
[25] Victor Demiguel, Lorenzo Garlappi, and Raman Uppal. Optimal versus naive diversification: How inefficient is the 1/N portfolio strategy? Rev. of Financial Studies, 22, 05 2009.
[26] Barr Rosenberg and Walt McKibben. The prediction of systematic and specific risk in common stocks. J. of Financial and Quantitative Analysis, 8(2):317–333, 1973.
[27] Richard H. R. Hahnloser, Rahul Sarpeshkar, Misha A. Mahowald, Rodney J. Douglas, and H. Sebastian Seung. Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit. Nature, 405:947–951, 2000.
[28] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[29] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.
[30] Michael W. Brandt. Portfolio choice problems. Handbook of Financial Econometrics, Vol 1, 1, 12 2010.
[31] Malik Magdon-Ismail and Amir F Atiya. Maximum drawdown. Risk Magazine, 17(10):99–102, 2014.
[32] Weiwei Shen and Jun Wang. Portfolio selection via subset resampling. In AAAI, pages 1517–1523, 2017.
[33] Hung-Ling Chen, Cheng-Yi Shiu, Hui-Shan Wei, et al. Price effect and investor awareness: Evidence from MSCI standard index reconstitutions. J. of Empirical Finance, 50(C):93–112, 2019.
[34] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. Scikit-learn: Machine learning in Python. JMLR, 12(Oct):2825–2830, 2011.
[35] Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. TensorFlow: A system for large-scale machine learning. In OSDI, volume 16, pages 265–283, 2016.
[36] Harry Markowitz. Portfolio selection. J. of Finance, 7(1):77–91, 1952.
[37] Weiwei Shen, Bin Wang, Jian Pu, and Jun Wang. The Kelly growth optimal portfolio with ensemble learning. In AAAI, pages 1134–1141, 2019.


A Appendix A: List of Factors Used in This Paper

Table 6 lists the 20 factors used in this paper. The financial data are acquired from Compustat, WorldScope, and Reuters Fundamentals (in order of priority). Note that the Compustat data source, which is mainly used to build factors for North America, involves a delay of at most three months, and the other sources, which are mainly used for Asia Pacific, involve a delay of four months. These data sources are used to calculate factors No. 1 to No. 14. The earnings per share (EPS) revisions, which indicate the future value of a company, are obtained from Thomson Reuters Estimates and I/B/E/S Estimates (in order of priority).

We can classify these factors into three types: technical, fundamental, and both. Technical factors are calculated from historical stock prices, whereas fundamental factors are calculated from the qualitative and quantitative information of a company [4].

Table 6: List of the factors. F (resp. T) in the "Type" column indicates that the corresponding factor is derived from the fundamental (resp. technical) properties of the stock, and B indicates that the factor is derived from both fundamental and technical properties.

| No | Factor | Description | Type |
|---|---|---|---|
| 1 | Book-value to Price Ratio | Net Asset/Market Value | B |
| 2 | Earnings to Price Ratio | Net Profit/Market Value | B |
| 3 | Dividend Yield | Dividend/Market Value | B |
| 4 | Sales to Price Ratio | Sales/Market Value | B |
| 5 | Cash Flow to Price Ratio | Operating cash flow/Market Value | B |
| 6 | Return on Equity | Net Profit/Net Asset | F |
| 7 | Return on Asset | Net Operating Profit/Total Asset | F |
| 8 | Return on Invested Capital | Net Operating Profit After Taxes/(Liabilities with interest + Net Asset) | F |
| 9 | Accruals | -(Changes in Current Assets and Liability-Depreciation)/Total Asset | F |
| 10 | Total Asset Growth Rate | Change Rate of Total Assets from the previous period | F |
| 11 | Current Ratio | Current Asset/Current Liability | F |
| 12 | Equity Ratio | Net Asset/Total Asset | F |
| 13 | Total Asset Turnover Rate | Sales/Total Asset | F |
| 14 | Capital Expenditure Growth Rate | Change Rate of Capital Expenditure from the previous period | F |
| 15 | EPS Revision (1 month) | 1 month Earnings Per Share (EPS) Revision | B |
| 16 | EPS Revision (3 month) | 3 month Earnings Per Share (EPS) Revision | B |
| 17 | Momentum (1 month) | Stock Returns in the last month | T |
| 18 | Momentum (12-1 month) | Stock Returns in the past 12 months except for last month | T |
| 19 | Volatility | Standard Deviation of Stock Returns in the past 60 months | T |
| 20 | Skewness | Skewness of Stock Returns in the past 60 months | T |
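To make the technical factors (No. 17 to No. 20) concrete, the sketch below computes them from a list of monthly simple returns. It is only an illustration under stated assumptions: the function name `technical_factors` is hypothetical, momentum is obtained by compounding monthly returns, and volatility and skewness use population (biased) estimators. The paper does not specify these details.

```python
import math
import statistics


def technical_factors(monthly_returns):
    """Compute factors No. 17-20 from monthly simple returns.

    The list is ordered oldest to newest and must hold at least 60 values.
    Compounding for momentum and the population (biased) estimators for
    volatility and skewness are assumptions of this sketch.
    """
    if len(monthly_returns) < 60:
        raise ValueError("need at least 60 monthly returns")
    window = monthly_returns[-60:]
    mean = statistics.fmean(window)
    std = statistics.pstdev(window)
    skew = 0.0
    if std > 0:
        # Third central moment divided by the cubed standard deviation.
        skew = sum((r - mean) ** 3 for r in window) / (len(window) * std ** 3)
    return {
        # No. 17: return in the last month.
        "momentum_1m": monthly_returns[-1],
        # No. 18: compounded return over the past 12 months, skipping the last.
        "momentum_12_1m": math.prod(1.0 + r for r in monthly_returns[-12:-1]) - 1.0,
        # No. 19 and No. 20: dispersion and asymmetry over 60 months.
        "volatility_60m": std,
        "skewness_60m": skew,
    }


# Hypothetical history: 59 months of +1% followed by one month of +5%.
example_factors = technical_factors([0.01] * 59 + [0.05])
```

The slice `monthly_returns[-12:-1]` covers eleven months, which matches the "12-1 month" convention of excluding the most recent month from the momentum window.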


B Appendix B: Additional Experiments

B.1 Different Hyperparameters in Off-the-shelf Models

Tables 9 and 10 show the results of RF with different depths, whereas Tables 7 and 8 show the results of LASSO with different magnitudes of the regularizer. Overall, a comparison with Tables 1 and 2 shows that (i) RF falls below its competitors, and (ii) LASSO, which outperforms RF, still falls below RIC-NN. We have also confirmed that a LASSO with a regularizer stronger than . suppresses most of the features, which yields meaningless results.

Table 7: Results of LASSO with different magnitudes of regularizer in MSCI North America.

| Portfolio | Metric | Regularizer 0.001 | Regularizer 0.0001 | Regularizer 0.00001 |
|---|---|---|---|---|
| Long | Alpha | 0.62% | 0.53% | 1.14% |
| Long | TE | 5.40% | 4.60% | 4.21% |
| Long | IR | 0.11 | 0.11 | 0.27 |
| Long | MaxDD | -21.84% | -17.85% | -16.02% |
| Long-Short | AR | 2.24% | 0.76% | 2.00% |
| Long-Short | RISK | 10.90% | 9.56% | 8.81% |
| Long-Short | R/R | 0.21 | 0.08 | 0.23 |
| Long-Short | MaxDD | -34.73% | -35.10% | -34.52% |

Table 8: Results of LASSO with different magnitudes of regularizer in MSCI Pacific.

| Portfolio | Metric | Regularizer 0.001 | Regularizer 0.0001 | Regularizer 0.00001 |
|---|---|---|---|---|
| Long | Alpha | 5.35% | 5.23% | 4.90% |
| Long | TE | 5.17% | 4.58% | 4.20% |
| Long | IR | 1.04 | 1.14 | 1.17 |
| Long | MaxDD | -11.53% | -8.43% | -6.81% |
| Long-Short | AR | 10.27% | 10.42% | 8.99% |
| Long-Short | RISK | 9.23% | 8.35% | 8.08% |
| Long-Short | R/R | 1.11 | 1.25 | 1.11 |
| Long-Short | MaxDD | -18.07% | -13.30% | -13.34% |

Table 9: Results of RF with different depths in MSCI North America.

| Portfolio | Metric | Depth 3 | Depth 5 | Depth 7 |
|---|---|---|---|---|
| Long | Alpha | 0.77% | 0.85% | 0.92% |
| Long | TE | 4.80% | 4.61% | 4.33% |
| Long | IR | 0.16 | 0.18 | 0.21 |
| Long | MaxDD | -21.76% | -23.21% | -21.16% |
| Long-Short | AR | 1.62% | 2.44% | 2.38% |
| Long-Short | RISK | 11.10% | 10.47% | 9.86% |
| Long-Short | R/R | 0.15 | 0.23 | 0.24 |
| Long-Short | MaxDD | -40.78% | -34.90% | -34.10% |

Table 10: Results of RF with different depths in MSCI Pacific.

| Portfolio | Metric | Depth 3 | Depth 5 | Depth 7 |
|---|---|---|---|---|
| Long | Alpha | 2.48% | 3.53% | 4.24% |
| Long | TE | 5.28% | 4.96% | 4.94% |
| Long | IR | 0.47 | 0.71 | 0.86 |
| Long | MaxDD | -13.04% | -9.21% | -7.36% |
| Long-Short | AR | 6.48% | 7.83% | 8.74% |
| Long-Short | RISK | 8.63% | 8.61% | 8.87% |
| Long-Short | R/R | 0.75 | 0.91 | 0.99 |
| Long-Short | MaxDD | -20.09% | -15.13% | -13.56% |
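The ratio rows in Tables 7 to 10 appear consistent with the usual definitions IR = Alpha / TE and R/R = AR / RISK, and MaxDD is conventionally the largest peak-to-trough decline of compounded wealth. The sketch below illustrates these calculations under those assumed definitions; the helper names `ratio` and `max_drawdown` and the sample return series are hypothetical.

```python
def ratio(numerator_pct, denominator_pct):
    """Return-to-risk style ratio, e.g. IR = Alpha / TE or R/R = AR / RISK.

    Both inputs are in the same unit (percent here), so the ratio is unitless.
    """
    return numerator_pct / denominator_pct


def max_drawdown(returns):
    """Largest peak-to-trough decline of compounded wealth (a value <= 0)."""
    wealth, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        # Drawdown relative to the running peak of the wealth curve.
        worst = min(worst, wealth / peak - 1.0)
    return worst


# Values from Table 7, regularizer 0.00001 (MSCI North America).
example_ir = round(ratio(1.14, 4.21), 2)
example_rr = round(ratio(2.00, 8.81), 2)
# Hypothetical return path: +10%, -20%, +5%.
example_mdd = max_drawdown([0.10, -0.20, 0.05])
```

Recomputing the ratios from the rounded percentages can differ from the reported values in the last digit, since the published ratios were presumably computed from unrounded inputs.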