Deeply Equal-Weighted Subset Portfolios

Sang Il Lee ∗
DeepAllocation Technologies
June 26, 2020

Abstract

The high sensitivity of optimized portfolios to estimation errors has prevented their practical application. To mitigate this sensitivity, we propose a new portfolio model called a Deeply Equal-Weighted Subset Portfolio (DEWSP). A DEWSP is a subset of the top-N ranked assets in an asset universe, the members of which are selected based on the returns predicted by deep learning algorithms and are equally weighted. Herein, we evaluate the performance of DEWSPs of different sizes N in comparison with the performance of other types of portfolios, such as optimized portfolios and historically equal-weighted subset portfolios (HEWSPs), which are subsets of the top-N ranked assets based on historical mean returns. We found the following advantages of DEWSPs. First, DEWSPs provide an improvement rate of 0.24% to 5.15% in terms of the monthly Sharpe ratio compared to the benchmark HEWSPs. Second, DEWSPs are built using a purely data-driven approach rather than relying on the efforts of experts. Third, DEWSPs can target risk and return relative to the baseline of the equal-weight portfolio (EWP) of an asset universe by adjusting the size N. Finally, the DEWSP allocation mechanism is transparent and intuitive. These advantages make DEWSPs competitive in practice.

∗ Electronic address: [email protected]

Despite the significant success of deep learning, its application to stock trading remains extremely challenging owing to the volatile movements of stock prices, which make it difficult to define the input values and to understand how to apply the output values. Machine learning models are built on a training set and tested on a disjoint test set to demonstrate their generalization capability, and are commonly applied in areas such as image processing, image recognition, speech recognition, and Internet searches. However, this approach is limited when applied to the financial field owing to the time-evolving properties of financial markets, for example, structural breaks at occasional time points [1, 2], volatility clustering [3], and time-varying mean returns [4]. Furthermore, the time ordering of financial data prevents the use of cross-validation as a reliable estimate of the ensemble generalization error. As a result, the performance of financial time-series models tends to be extremely sensitive to pre-specified periods, showing high in-sample (IS) predictive power but poor out-of-sample (OOS) predictive power [5]. This hampers the practical use of portfolio optimization techniques because optimization is prone to the 'garbage in, garbage out' phenomenon, in which biases occur in portfolio selection unless the predictions are suitably adjusted for estimation error. To mitigate this problem, we built a new model called the Deeply Equal-Weighted Subset Portfolio (DEWSP), which combines deep learning techniques with an equal-weight strategy.

Portfolio theory

The mean-variance portfolio (MVP) theory, pioneered by Markowitz (1952) [6], has long been recognized as the cornerstone of modern portfolio theory (MPT). It provides a mathematical framework for determining a set of portfolios with a maximized expected return per unit of risk; in addition, the return and risk of this set are drawn as a line, called the efficient frontier, on a risk-return plane. However, despite the theoretical advances in portfolio models, including the MVP and its extensions, their practical use remains problematic owing to the difficulty in estimating reliable expected returns, which critically affect the performance of the portfolio [7]. For example, MVPs are not necessarily well diversified [8], portfolio optimizers are often "error maximizers" [9], and a mean-variance optimization can produce extreme or non-intuitive weights for some of the assets in the portfolio [10, 11]. Many studies have attempted to apply improved estimation procedures and mitigate the estimation error problem. These include Bayesian methods [12, 13], shrinkage methods [14, 15, 16], a factor structure imposed on the returns [17], and the combination of a tangency portfolio, a risk-free rate, and a global minimum variance portfolio [18].

Equal-weight portfolio (EWP)

There is a growing body of evidence showing that the use of simple rules of thumb is more successful than optimization. The most well-known example is the EWP, also called 1/N naive diversification, which is free of parameter uncertainty and has the following properties: it never shorts any asset, it avoids concentration, and upon rebalancing it sells high and buys low, thus exploiting a possible mean-reversion effect [19]. The strength of the EWP is well established empirically [7, 12, 20, 21]. The study by DeMiguel et al. [20] is particularly convincing because the authors evaluated 14 models on 7 empirical datasets and found that none of the 14 models consistently outperforms the 1/N EWP. Tu and Zhou (2011) [22] showed that combining an EWP with more sophisticated models [6, 12, 17, 18] is a way to improve performance. The importance of the EW approach lies in its simplicity and widespread use. In addition, Benartzi and Thaler (2001) [23] demonstrated that EW diversification is ingrained in human behavior by finding that a considerable fraction of participants distribute their contributions equally across the available investment opportunities. This implies that investors tend to use intuition to choose securities and do not necessarily rely on sophisticated formal techniques. Investors can execute an EW strategy over a large universe at extremely low transaction costs using equally weighted exchange-traded funds (ETFs), for example, Direxion NASDAQ-100 Equal Weighted Index Shares, First Trust Dow 30 Equal Weight ETF, and Goldman Sachs Equal Weight U.S. Large Cap Equity ETF.

DEWSP

DEWSPs are constructed by incorporating deep learning techniques into an EW strategy. The building procedure consists of three steps: First, the 1-month-ahead return of each asset is forecasted using deep learning algorithms. Second, the assets are ranked in descending order based on the forecasts. Finally, subset portfolios are constructed from the top-N ranked assets, which are equally weighted. Our contributions are as follows:

• DEWSPs are fully data-driven, based on hyperparameter optimization. The entire process is automatic and requires no views from human experts in building the models, which contributes to reduced portfolio management costs. We also use only price and volume data, which can be obtained publicly from various Web sites. Thus, DEWSPs are easily reproducible.

• DEWSPs show an increase in their risk and return from the baseline of the EWP as the number of assets decreases. This means that it is possible to control the aggressiveness of DEWSPs in terms of their risk-return tradeoff. This mitigates the difficulties in understanding black-box portfolio optimization and in tailoring the risk and return of ranked portfolios based on financial factors (e.g., size, value, and leverage).
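The three construction steps described above (forecast, rank, equally weight the top N) can be illustrated with a minimal sketch. It assumes the 1-month-ahead forecasts are already available as a mapping from ticker to predicted return; the function name `build_dewsp`, the tie-breaking rule, and the forecast values are hypothetical and are not taken from the paper.

```python
def build_dewsp(predicted_returns, n):
    """Return equal weights for the top-n assets ranked by predicted return.

    predicted_returns: dict mapping ticker -> forecasted 1-month-ahead return
                       (assumed to come from an already trained model).
    n: subset size N, with 1 <= n <= universe size.
    """
    if not 1 <= n <= len(predicted_returns):
        raise ValueError("n must be between 1 and the universe size")
    # Step 2: rank assets in descending order of forecast.
    # Ties are broken alphabetically by ticker (an assumption of this sketch).
    ranked = sorted(predicted_returns, key=lambda t: (-predicted_returns[t], t))
    # Step 3: keep the top-n assets and give each a weight of 1/n.
    return {ticker: 1.0 / n for ticker in ranked[:n]}


# Hypothetical forecasts for four assets of the universe.
forecasts = {"AAPL": 0.021, "XOM": -0.004, "KO": 0.008, "JPM": 0.015}
weights = build_dewsp(forecasts, n=2)
print(weights)  # {'AAPL': 0.5, 'JPM': 0.5}
```

Setting `n` to the universe size reproduces the plain EWP, which is why the EWP can serve as the baseline for subsets of every size.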

Related papers

This study covers stock prediction using deep learning methods and ranked portfolios. Deep learning models are on the rise, showing impressive results in modeling the complex behavior of financial data. Examples include stock prediction based on long short-term memory (LSTM) networks [24], deep portfolios based on deep autoencoders [25], threshold-based portfolios using recurrent neural networks [26], deep factor models using deep feed-forward networks [27], a time-varying multi-factor model using LSTM networks [28], and an enhanced standard factor model using deep learning [29].

Ranked portfolios are widely used with varying degrees of complexity, and their basic premise is the same: ranking stocks based on factors such as their value, momentum, quality, size, low risk, or a combination of these factors, and then selecting a particular proportion of the top-ranked stocks to add to the portfolio. These include portfolios ranked in terms of size and book-to-market [30], portfolios ranked on value and momentum factors [31], portfolios ranked on time-series momentum [32], and portfolios ranked on binary classification using returns predicted through deep learning [24].

The remainder of this paper is organized as follows: In Section 2, we describe the data and the preprocessing methods applied. In Section 3, we describe the experimental setting and implementation. In Section 4, we provide the experimental results and compare different portfolio models. Finally, some concluding remarks are offered in Section 5.

Small portfolios are considered for an easier analysis, and are important for several practical reasons [33]: First, it is difficult for small investors to acquire and continuously monitor a large portfolio. Second, large investors need to identify a threshold where the cost exceeds the benefit of risk reduction from diversification. Third, large portfolios amplify the estimation errors during the optimization process. To select a small but well-diversified universe, we refer to the most commonly applied classification system, i.e., the Global Industry Classification Standard (GICS). The asset universe consists of 22 diversified stocks in the Standard and Poor's 500 index (S&P 500) that belong to 11 different GICS sectors:

• Energy: ExxonMobil (XOM) and Chevron (CVX)
• Utilities: Duke Energy (DUK) and Consolidated Edison (ED)
• Materials: Sherwin-Williams (SHW) and DuPont (DD)
• Industrials: Boeing (BA) and Union Pacific (UNP)
• Consumer Discretionary: Amazon (AMZN) and McDonald's (MCD)
• Consumer Staples: Coca-Cola (KO) and Procter & Gamble (PG)
• Healthcare: UnitedHealth Group (UNH) and Johnson & Johnson (JNJ)
• Financials: Berkshire Hathaway (BRK-B) and JPMorgan Chase (JPM)
• Information Technology: Apple (AAPL) and Microsoft (MSFT)
• Communication Services: Facebook (FB) and Alphabet (GOOG)
• Real Estate: American Tower (AMT) and Simon Property Group (SPG)

We use data from Yahoo Finance from January 1997 to October 2019, which is the common period of data availability. The monthly stock dataset contains five attributes: open price, high price, low price, adjusted close price, and volume (OHLCV). The last daily OHLCV record of each month is used as the raw monthly observation. For each experiment, we split the data into an in-sample (70%) period and an out-of-sample (30%) period. The in-sample data are divided again into a training dataset (50%) for developing the prediction models and a validation set (50%) for evaluating their predictive ability.
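The chronological 70/30 and 50/50 splits described above can be sketched as follows. This is only an illustration: the rounding rule (truncation toward zero) and the function name `chronological_split` are assumptions, and the paper may lose some early months to indicator look-back periods, which this sketch ignores.

```python
def chronological_split(observations, in_sample_frac=0.7, train_frac=0.5):
    """Split time-ordered observations without shuffling.

    The first `in_sample_frac` of the data is the in-sample period, which is
    divided again into training and validation parts; the remainder is the
    out-of-sample period. Truncating the boundaries is an assumption.
    """
    n_in = int(len(observations) * in_sample_frac)
    n_train = int(n_in * train_frac)
    train = observations[:n_train]
    validation = observations[n_train:n_in]
    out_of_sample = observations[n_in:]
    return train, validation, out_of_sample


# January 1997 to October 2019 spans 274 month-end observations.
months = list(range(274))
train, validation, oos = chronological_split(months)
print(len(train), len(validation), len(oos))  # 95 96 83
```

Keeping the three blocks in time order means that no observation used for training or validation lies after any observation of the out-of-sample period.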

Technical indicators

A technical analysis is a method for forecasting price movements using past prices and volume, and includes a variety of forecasting techniques such as chart analysis, cycle analysis, and computerized technical trading systems.

Technical analysis has a long history of widespread use by participants in speculative markets [34, 35, 36, 37, 38, 39], and there is a large body of academic evidence demonstrating its usefulness, including theoretical support [40] and empirical evidence [41, 42], as well as its role in out-of-sample equity premium predictability [43, 44, 45]. We used a full set of 14 technical indicators based on 3 types of popular technical strategies, i.e., the moving average crossover, momentum, and volume rules:

• The time-series momentum indicator, MOM($m$), generates a buy signal when the current price is at or above the price $m$ months earlier. Its validity is supported by the observation that the "trend" effect persists for approximately 1 year and then partially reverses over a longer timeframe. Here, $\mathrm{MOM}_t(m)$ at time $t$ is defined as follows:

$$\mathrm{MOM}_t(m) = \begin{cases} 1, & \text{if } P_t \ge P_{t-m} \\ -1, & \text{otherwise,} \end{cases} \tag{1}$$

where $P_t$ is the index value at time $t$, and $m$ is the look-back period. We use $m$ = 1, 3, 6, 9, and 12, which are respectively labeled as $\mathrm{MOM}_t$(1M), $\mathrm{MOM}_t$(3M), $\mathrm{MOM}_t$(6M), $\mathrm{MOM}_t$(9M), and $\mathrm{MOM}_t$(12M).

• The moving average indicator, MA($s, l$), provides a signal for an upward or downward trend. A buy signal is generated when the short-term moving average crosses above the long-term moving average because this represents the beginning of an upward trend. A sell signal is generated when the short-term moving average falls below the long-term moving average because this represents the beginning of a downward trend. Let us define a simple moving average of the index as follows:

$$\mathrm{MA}^{P}_{j,t} = \frac{1}{j} \sum_{i=0}^{j-1} P_{t-i} \quad \text{for } j = s \text{ or } l, \tag{2}$$

where $s$ and $l$ are the look-back periods for the short and long moving averages. The moving average indicator $\mathrm{MA}_t(s, l)$ is then defined as follows:

$$\mathrm{MA}_t(s, l) = \begin{cases} 1, & \text{if } \mathrm{MA}^{P}_{s,t} \ge \mathrm{MA}^{P}_{l,t} \\ -1, & \text{otherwise.} \end{cases} \tag{3}$$

The six moving average indicators are constructed for $s$ = 1, 2, and 3 and for $l$ = 9 and 12, which are symbolized as MA(1M-9M), MA(1M-12M), MA(2M-9M), MA(2M-12M), MA(3M-9M), and MA(3M-12M).

• The volume indicator, VOL($s, l$), indicates a strong market trend if both the recent stock market volume and the stock price increase. Let us define the on-balance volume (OBV) as follows:

$$\mathrm{OBV}_t = \sum_{k=1}^{t} \mathrm{VOL}_k D_k, \tag{4}$$

where $\mathrm{VOL}_k$ is a measure of the trading volume (i.e., number of shares traded) during period $k$, and $D_k$ is a binary variable:

$$D_k = \begin{cases} 1, & \text{if } P_k \ge P_{k-1} \\ -1, & \text{otherwise.} \end{cases} \tag{5}$$

The value of $\mathrm{OBV}_t$ conceptually measures both positive and negative volume based on the belief that changes in volume can predict a stock movement. The volume-based indicator is then defined by comparing the moving averages of OBV over an $s$-period and an $l$-period:

$$\mathrm{VOL}_t(s, l) = \begin{cases} 1, & \text{if } \mathrm{MA}^{\mathrm{OBV}}_{s,t} \ge \mathrm{MA}^{\mathrm{OBV}}_{l,t} \\ -1, & \text{otherwise.} \end{cases} \tag{6}$$

Here, $\mathrm{MA}^{\mathrm{OBV}}_{j,t} = (1/j) \sum_{i=0}^{j-1} \mathrm{OBV}_{t-i}$ is the moving average of $\mathrm{OBV}_t$ for $j = s$ or $l$. The six volume indicators are constructed for $s$ = 1, 2, and 3 and for $l$ = 9 and 12, which are symbolized as follows: VOL(1M-9M), VOL(1M-12M), VOL(2M-9M), VOL(2M-12M), VOL(3M-9M), and VOL(3M-12M).

Frameworks

For a comparative analysis, we also built three different types of portfolios, which are distinct in terms of their optimization or estimation process. All portfolios are built on the following assumptions: (1) all stocks are infinitely divisible; (2) there are no restrictions on the buying or selling of any selected portfolio; (3) there is no friction (e.g., transaction costs, taxation, commissions, or liquidity); and (4) it is possible to buy and sell stocks at the closing prices at any time $t$. We adopt a periodic rebalancing strategy in which the investor adjusts the portfolio weights at the close price on the last business day of every month. In what follows, $N$ denotes the size of a subset portfolio and $\mathcal{N}$ denotes the size of the whole asset universe.

List of portfolios considered:

• DEWSP: This is a subset portfolio that consists of the top-$N$ ranked assets among all $\mathcal{N}$ assets based on their expected returns.

• EW whole portfolio (EWWP): This is a traditional EWP of all $\mathcal{N}$ assets, and can be viewed as a special case of the DEWSP when $N = \mathcal{N}$. Because there are no parameter estimations, it serves as the baseline for an evaluation of the risk and return of the DEWSPs of different sizes.

• Historically EW subset portfolios (HEWSPs): Like DEWSPs, HEWSPs are top-ranked subset portfolios, although their expected returns are estimated as a historical average over the training and validation periods (HEWSP-TV) or as a historical average over the validation period (HEWSP-V). A comparison with the DEWSPs reveals the effect of the return prediction of the DEWSPs.

• Randomly EW subset portfolios (REWSPs): These are subset portfolios consisting of $N$ assets selected randomly, without the use of a ranking method. A comparison of the REWSPs with the DEWSPs and HEWSPs reveals the effect of the estimated return prediction.

• Maximum Sharpe ratio portfolios (MSRPs): These are complete portfolios that are optimized to achieve the highest Sharpe ratio, and are mathematically defined as follows:

$$\max_{w_t} \frac{w_t^{T} \mu_t}{\sqrt{w_t^{T} \Sigma_t w_t}} \quad \text{s.t. } w_t^{T} \mathbf{1}_{\mathcal{N}} = 1 \text{ and } w_{i,t} \ge 0, \; \forall i, \tag{7}$$

where $\mu_t$ is a vector of $\mathcal{N}$ predicted returns, $w_t = (w_{1,t}, \ldots, w_{\mathcal{N},t})^{T}$ is a vector of portfolio weights, $\Sigma_t$ is a covariance matrix of the asset returns, $\mathbf{1}_{\mathcal{N}} = (1, \ldots, 1)^{T}$ is an $\mathcal{N}$-dimensional vector of ones, and $w_t^{T} \mu_t$ and $w_t^{T} \Sigma_t w_t$ are the portfolio return and variance, respectively. Because $\mu_t$ and $\Sigma_t$ are unknown in practice, we replace them with $\widehat{\mu}_t$ from the deep learning algorithms and $\widehat{\Sigma}_t$ from the in-sample dataset. A comparison with the DEWSPs reveals the effect of the estimation error on a portfolio optimization.

• Minimum variance portfolios (MVPs): These are complete portfolios optimized for the lowest volatility, and solve the following constrained minimization problem:

$$\min_{w_t} w_t^{T} \Sigma_t w_t \quad \text{s.t. } w_t^{T} \mathbf{1}_{\mathcal{N}} = 1 \text{ and } w_{i,t} \ge 0, \; \forall i. \tag{8}$$

A comparison with the DEWSPs reveals the effectiveness of optimization under the condition of no estimation errors in the expected returns.

A multilayer feedforward neural network (FFNN) was used in this study. We used the Tree-structured Parzen Estimator (TPE) approach [46] for automated hyperparameter tuning, and Table 1 presents the list of hyperparameters and their values. Each optimization run was initialized with randomly selected points, after which it proceeded sequentially for a total of 50 function evaluations. During one evaluation run, the FFNN was trained on the in-sample training data. The mean squared error (MSE) was calculated on the validation set for each function evaluation, and early stopping was applied when there was no improvement in validation performance for 10 consecutive epochs.
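The early-stopping rule mentioned above can be illustrated with a short sketch. It assumes that "improvement" means a strictly lower validation MSE than the best value seen so far and that the patience counter restarts at every improvement; the paper does not spell out these details, and the function name and the loss values below are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=10):
    """Scan per-epoch validation losses and apply a patience rule.

    Returns (stop_epoch, best_epoch) with 0-based epoch indices.
    stop_epoch is None when the rule never triggers.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # Strict improvement: remember it and restart the patience window.
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            # `patience` consecutive epochs have passed without improvement.
            return epoch, best_epoch
    return None, best_epoch


# Hypothetical validation MSE: improves for three epochs, then stalls.
losses = [0.9, 0.7, 0.6] + [0.65] * 12
print(early_stopping_epoch(losses))  # (12, 2)
```

In this example the best validation loss occurs at epoch 2, and training halts at epoch 12 after ten consecutive epochs without improvement.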

We used two popular regularization methods, i.e., dropout and batch normalization (BN). Dropout [47] is a simple way to prevent co-adaptation among the hidden nodes of a deep feed-forward neural network by dropping out randomly selected hidden nodes. In recent years, BN [48] has replaced dropout in modern neural network architectures. BN uses the distribution of the summed input to a specific neuron over a mini-batch of training cases to compute a mean and variance, which are then used to normalize the summed input to that neuron for each training case. Dropout and BN layers were employed for all hidden layers.
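To make the two regularizers concrete, the following sketch applies them to the inputs of a single neuron in plain Python. It is an illustration only: the learnable scale `gamma` and shift `beta`, the stabilizing constant `eps`, and the "inverted" rescaling used in `dropout` are common conventions assumed here, not details confirmed by the paper.

```python
import math
import random


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one neuron's summed inputs over a mini-batch, then scale and shift."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]


def dropout(activations, rate, rng):
    """Inverted dropout: zero each unit with probability `rate`, rescale the survivors."""
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# Summed inputs of one neuron for a mini-batch of four training cases.
normalized = batch_norm([0.5, 1.5, 2.5, 3.5])
# The normalized values have (approximately) zero mean and unit variance.
print(sum(normalized) / len(normalized))

# Randomly drop half of a hidden layer's activations during training.
dropped = dropout([1.0] * 8, rate=0.5, rng=random.Random(0))
print(dropped)
```

At prediction time dropout is switched off, which corresponds to calling `dropout` with `rate=0.0` in this sketch.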

Average percent change (APC)

The APC measures the average rate of change in the DEWSP return and volatility as the size $N$ decreases from the baseline of the whole universe, $N = \mathcal{N}$, to $N = 1$, and is defined as follows:

$$\mathrm{APC}_x = \frac{1}{\mathcal{N} - 1} \sum_{N=1}^{\mathcal{N}-1} \frac{x_N - x_{N+1}}{x_{N+1}}, \tag{9}$$

where $x$ is $r_t$ or $\sigma_t$.

Average Sharpe ratio improvement rate (ASRIR)

The ASRIR measures the relative improvement of the DEWSPs as compared to the HEWSP benchmark in terms of the Sharpe ratio (SR), and is defined as follows:

$$\mathrm{ASRIR} = \frac{1}{\mathcal{N}} \sum_{N=1}^{\mathcal{N}} \frac{x_N^{\mathrm{DEWSP}} - x_N^{\mathrm{HEWSP\text{-}TV/T}}}{x_N^{\mathrm{HEWSP\text{-}TV/T}}}, \tag{10}$$

where $x$ is the SR of the DEWSPs and HEWSPs of the same size $N$.

Table 1: List of hyperparameters and range of each hyperparameter.

Hyperparameter             Considered values/functions
Number of Hidden Layers    {2, 3}
Number of Hidden Units     {2, 4, 8, 16}
Standard deviation         { }
Dropout                    { }
Batch Size                 {28, 64, 128}
Optimizer                  {RMSProp, ADAM, SGD (no momentum)}
Activation Function        Hidden layer: {tanh, ReLU, sigmoid}; Output layer: Linear
Learning Rate              { }
Number of Epochs           { }

Number of hidden layers: number of hidden layers of the neural network. Number of hidden units: number of units in the hidden layers of the neural network. Standard deviation: standard deviation of a random normal initializer. Dropout: dropout rates. Batch size: number of samples per batch. Activation: sigmoid function $\sigma(z) = 1/(1 + e^{-z})$, hyperbolic tangent function $\tanh(z) = (e^{z} - e^{-z})/(e^{z} + e^{-z})$, and rectified linear unit (ReLU) function $\mathrm{ReLU}(z) = \max(0, z)$. Learning rate: learning rate of the back-propagation algorithm. Number of epochs: number of iterations over all training data. Optimizer: stochastic gradient descent (SGD) [49], RMSProp [50], and ADAM [49].
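As a worked illustration of the two evaluation metrics, APC in Eq. (9) and ASRIR in Eq. (10), the sketch below computes both from lists indexed by portfolio size. The function names and the numerical inputs are hypothetical and only demonstrate the arithmetic; they are not results from the paper.

```python
def apc(x):
    """Average percent change, Eq. (9).

    x[k] holds the metric (return or volatility) of the subset portfolio of
    size N = k + 1, so the last entry corresponds to the whole universe.
    """
    n_univ = len(x)
    changes = [(x[i] - x[i + 1]) / x[i + 1] for i in range(n_univ - 1)]
    return sum(changes) / (n_univ - 1)


def asrir(sr_dewsp, sr_hewsp):
    """Average Sharpe ratio improvement rate, Eq. (10), over sizes N = 1..universe."""
    n_univ = len(sr_dewsp)
    return sum((d - h) / h for d, h in zip(sr_dewsp, sr_hewsp)) / n_univ


# Hypothetical monthly returns for sizes N = 1, 2, 3 in a 3-asset universe.
returns_by_size = [0.04, 0.03, 0.02]
print(apc(returns_by_size))  # about 0.4167: return rises as N shrinks

# Hypothetical Sharpe ratios of DEWSPs and HEWSPs for the same sizes.
print(asrir([0.6, 0.5, 0.4], [0.5, 0.5, 0.4]))  # about 0.0667
```

A positive APC means that the metric grows on average as assets are removed, and a positive ASRIR means that the DEWSPs have a higher Sharpe ratio than the HEWSPs on average across sizes.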

We examined the portfolio performance over both the IS and OOS periods for three different universes: all 22 stocks (Exp. I), the 11 stocks listed first in each sector (Exp. II), and the other 11 stocks (Exp. III). The following observations were made based on the empirical simulation results.

• The left side of Figure 1 shows the realized risk and return points of the portfolios on the risk-return plane. Each color represents a different type of portfolio, and different points with the same color represent different sizes. A comparison of the DEWSPs and HEWSPs with the (seemingly) random pattern of the REWSPs indicates that a prediction-based ranking of assets can be used to construct portfolios whose return and volatility increase as $N$ decreases. In Table 2, $\mathrm{APC}_r$ and $\mathrm{APC}_\sigma$ quantify this increase for Exp. I, II, and III, and $\mathrm{APC}_r / \mathrm{APC}_\sigma$ shows the degree of trade-off between the return and risk.

• We also found ASRIRs of 21.15, 27.04, and 13.09% for Exp. I, II, and III, respectively, indicating the superiority of the DEWSPs during the IS period.

• The MVP, as expected, achieves the least volatility of 0.99, and the MSRP achieves the highest Sharpe ratio of 0.65 ($\mu$ = 0.026 and $\sigma$ = 0.040), which outperform those of the […]

[Figure 1 panels: x-axis, Risk (realized volatility); y-axis, Realized return; legend: DEWP, HEWP-T, HEWP-TV, RandP, MSRP, MVP; maximum-SR lines y = 0.51x (left panel) and y = 0.455x (right panel).]

Figure 1: Realized risk vs. return of six different types of portfolios for the in-sample (left) and out-of-sample (right) experiments. The dotted lines specify the maximum SR estimate.

Table 2: Performance evaluation results of DEWSPs over the in-sample period. Columns: Metrics (%), Exp. I, Exp. II, Exp. III. Rows: $\mathrm{APC}_r$, $\mathrm{APC}_\sigma$, $\mathrm{APC}_r / \mathrm{APC}_\sigma$.

• The DEWSPs are built using the 1-month-ahead returns predicted by the trained model, and the HEWSPs are built using the average historical return over the in-sample period.

• The computation results are summarized on the right side of Figure 1 and in Table 3. As with the IS experiment, the return and volatility of the DEWSPs and HEWSPs still show an increasing pattern, with positive APC values. This allows us to tailor the portfolio's return and risk for investment purposes.

Table 3: Performance evaluation results of DEWSPs over the out-of-sample period. Columns: Metrics (%), Exp. I, Exp. II, Exp. III. Rows: $\mathrm{APC}_r$, $\mathrm{APC}_\sigma$, $\mathrm{APC}_r / \mathrm{APC}_\sigma$.

• The ASRIRs, which range from 0.24 to 5.15%, indicate that the DEWSPs outperform the historical models in terms of the monthly SR. The values are small compared to the in-sample ASRIRs, but they are promising in two respects: first, we can beat the HEWSP benchmark, and second, we can tailor the return and volatility of the portfolios relative to the baseline of the EWWP.

• Although the MVP without a parameter estimation still gives the least volatility at 0. […] 44% ($\mu$ = 0.014 and $\sigma$ = 0. […]

Despite the significant success of machine learning in numerous fields, stock prediction is still severely limited owing to the seasonal, non-stationary, and unpredictable nature of stock prices. Consequently, portfolio models are inevitably exposed to the risk of estimation errors, which hinders their performance.

To cope with such risk, we have proposed a new DEWSP model that incorporates deep-learning-based predictions into the framework of the EW strategy. We empirically demonstrated that DEWSPs can be used to target the levels of portfolio return and risk relative to the baseline of the EWWP by adjusting the number of assets, and that the mechanism is clear in terms of the risk-return trade-off. We also showed that DEWSPs are superior to HEWSPs in terms of the SR and that the mean-variance optimization amplifies the estimation error dramatically, which results in a substantially worse Sharpe ratio. To summarize, DEWSPs are attractive from an implementation perspective owing to the use of public stock data, a transparent mechanism based on a risk-return trade-off, automatic hyperparameter optimization, the existence of a baseline (the EWP) and a benchmark (the HEWSP), the capability of building portfolios from small numbers of assets (with expandability to large numbers of assets), and a simple incorporation of deep learning algorithms into the portfolio scheme.

References

[1] Sydney C Ludvigson, Sai Ma, and Serena Ng. Uncertainty and business cycles: exogenous impulse or endogenous response? Technical report, National Bureau of Economic Research, 2015.
[2] James H Stock and Mark W Watson. Modeling inflation after the crisis. Technical report, National Bureau of Economic Research, 2010.
[3] Rama Cont. Volatility clustering in financial markets: empirical facts and agent-based models. In Long memory in economics, pages 289–309. Springer, 2007.
[4] Robert F Engle, David M Lilien, and Russell P Robins. Estimating time varying risk premia in the term structure: The ARCH-M model. Econometrica: Journal of the Econometric Society, pages 391–407, 1987.
[5] John Y Campbell and Samuel B Thompson. Predicting excess stock returns out of sample: Can anything beat the historical average? The Review of Financial Studies, 21(4):1509–1531, 2007.
[6] Harry Markowitz. Portfolio selection. Journal of Finance, 7(1):77–91, 1952.
[7] J David Jobson and Bob Korkie. Estimation for Markowitz efficient portfolios. Journal of the American Statistical Association, 75(371):544–554, 1980.
[8] Richard C Green and Burton Hollifield. When will mean-variance efficient portfolios be well diversified? The Journal of Finance, 47(5):1785–1809, 1992.
[9] Richard O Michaud and Robert O Michaud. Efficient asset management: a practical guide to stock portfolio optimization and asset allocation. Oxford University Press, 2008.
[10] Fischer Black and Robert Litterman. Global asset allocation with equities, bonds, and currencies. Fixed Income Research, 2(15-28):1–44, 1991.
[11] Fischer Black and Robert Litterman. Global portfolio optimization. Financial Analysts Journal, 1992.
[12] Philippe Jorion. Bayes-Stein estimation for portfolio analysis. Journal of Financial and Quantitative Analysis, 21(3):279–292, 1986.
[13] Ľuboš Pástor. Portfolio selection and asset pricing models. The Journal of Finance, 55(1):179–223, 2000.
[14] Olivier Ledoit and Michael Wolf. A well conditioned estimator for large dimensional covariance matrices. 2000.
[15] Olivier Ledoit and Michael Wolf. Improved estimation of the covariance matrix of stock returns with an application to portfolio selection. Journal of Empirical Finance, 10(5):603–621, 2003.
[16] Zhenyu Wang. A shrinkage approach to model uncertainty and asset allocation. Review of Financial Studies, pages 673–705, 2005.
[17] A Craig MacKinlay and Lubos Pastor. Asset pricing models: Implications for expected returns and portfolio selection. Technical report, National Bureau of Economic Research, 1999.
[18] Raymond Kan and Guofu Zhou. Optimal portfolio choice with parameter uncertainty. Journal of Financial and Quantitative Analysis, 42(3):621–656, 2007.
[19] Mark Kritzman, Sébastien Page, and David Turkington. In defense of optimization: the fallacy of 1/N. Financial Analysts Journal, 66(2):31–39, 2010.
[20] Victor DeMiguel, Lorenzo Garlappi, and Raman Uppal. Optimal versus naive diversification: How inefficient is the 1/N portfolio strategy? The Review of Financial Studies, 22(5):1915–1953, 2009.
[21] Ran Duchin and Haim Levy. Markowitz versus the Talmudic portfolio diversification strategies. The Journal of Portfolio Management, 35(2):71–74, 2009.
[22] Jun Tu and Guofu Zhou. Markowitz meets Talmud: A combination of sophisticated and naive diversification strategies. Journal of Financial Economics, 99(1):204–215, 2011.
[23] Shlomo Benartzi and Richard H Thaler. Naive diversification strategies in defined contribution saving plans. American Economic Review, 91(1):79–98, 2001.
[24] Thomas Fischer and Christopher Krauss. Deep learning with long short-term memory networks for financial market predictions. European Journal of Operational Research, 270(2):654–669, 2018.
[25] JB Heaton, NG Polson, and Jan Hendrik Witte. Deep learning for finance: deep portfolios. Applied Stochastic Models in Business and Industry, 33(1):3–12, 2017.
[26] Sang Il Lee and Seong Joon Yoo. Threshold-based portfolio: the role of the threshold and its applications. The Journal of Supercomputing, pages 1–18, 2018.
[27] Kei Nakagawa, Takumi Uchida, and Tomohisa Aoshima. Deep factor model. In ECML PKDD 2018 Workshops, pages 37–50. Springer, 2018.
[28] Kei Nakagawa, Tomoki Ito, Masaya Abe, and Kiyoshi Izumi. Deep recurrent factor model: Interpretable non-linear and time-varying multi-factor model. arXiv preprint arXiv:1901.11493, 2019.
[29] John Alberg and Zachary C Lipton. Improving factor-based quantitative investing by forecasting company fundamentals. arXiv preprint arXiv:1711.04837, 2017.
[30] Eugene F Fama and Kenneth R French. Size and book-to-market factors in earnings and returns. The Journal of Finance, 50(1):131–155, 1995.
[31] Clifford S Asness, Tobias J Moskowitz, and Lasse Heje Pedersen. Value and momentum everywhere. The Journal of Finance, 68(3):929–985, 2013.
[32] Tobias J Moskowitz, Yao Hua Ooi, and Lasse Heje Pedersen. Time series momentum. Journal of Financial Economics, 104(2):228–250, 2012.
[33] Dietmar G Maringer. Portfolio management with heuristic optimization, volume 8. Springer Science & Business Media, 2006.
[34] Seymour Smidt. Amateur Speculators: A Survey of Trading Styles, Information Sources and Patterns of Entry Into and Exit from Commodity-futures Markets by Non-professional Speculators. Graduate School of Business and Public Administration, Cornell University, 1965.
[35] Randall S Billingsley and Don M Chance. Benefits and limitations of diversification among commodity trading advisors. Journal of Portfolio Management, 23(1):65, 1996.
[36] William Fung and David A Hsieh. The information content of performance track records: investment style and survivorship bias in the historical returns of commodity trading advisors. Journal of Portfolio Management, 24(1):30–41, 1997.
[37] Lukas Menkhoff. Examining the use of technical currency analysis. International Journal of Finance & Economics, 2(4):307–318, 1997.
[38] Yin-Wong Cheung and Menzie David Chinn. Currency traders and exchange rate dynamics: a survey of the US market. Journal of International Money and Finance, 20(4):439–471, 2001.
[39] Thomas Gehring and Lukas Menkhoff. Technical analysis in foreign exchange-the workhorse gains further ground. Technical report, Diskussionspapiere der Wirtschaftswissenschaftlichen Fakultät, Universität . . . , 2003.
[40] David P Brown and Robert H Jennings. On technical analysis. The Review of Financial Studies, 2(4):527–551, 1989.
[41] Andrew W Lo, Harry Mamaysky, and Jiang Wang. Foundations of technical analysis: Computational algorithms, statistical inference, and empirical implementation. The Journal of Finance, 55(4):1705–1765, 2000.
[42] Lawrence Blume, David Easley, and Maureen O'Hara. Market statistics and technical analysis: The role of volume. The Journal of Finance, 49(1):153–181, 1994.
[43] Fabian Baetje and Lukas Menkhoff. Equity premium prediction: Are economic and technical indicators unstable? International Journal of Forecasting, 32(4):1193–1207, 2016.
[44] David E Rapach, Jack K Strauss, and Guofu Zhou. Out-of-sample equity premium prediction: Combination forecasts and links to the real economy. The Review of Financial Studies, 23(2):821–862, 2010.
[45] Christopher J Neely, David E Rapach, Jun Tu, and Guofu Zhou. Forecasting the equity risk premium: the role of technical indicators. Management Science, 60(7):1772–1791, 2014.
[46] James S Bergstra, Rémi Bardenet, Yoshua Bengio, and Balázs Kégl. Algorithms for hyper-parameter optimization. In Advances in Neural Information Processing Systems, pages 2546–2554, 2011.
[47] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958, 2014.
[48] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Francis R. Bach and David M. Blei, editors, ICML, volume 37 of JMLR Workshop and Conference Proceedings, pages 448–456. JMLR.org, 2015.
[49] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[50] Tijmen Tieleman and Geoffrey Hinton. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude.