Trader-Company Method: A Metaheuristic for Interpretable Stock Price Prediction
Katsuya Ito
Preferred Networks, [email protected]
Kentaro Minami
Preferred Networks, [email protected]
Kentaro Imajo
Preferred Networks, [email protected]
Kei Nakagawa
Nomura Asset Management Co., [email protected]
ABSTRACT
Investors try to predict returns of financial assets to make successful investments. Many quantitative analysts have used machine learning-based methods to find unknown profitable market rules from large amounts of market data. However, several challenges in financial markets hinder the practical application of machine learning-based models. First, in financial markets, there is no single model that can consistently make accurate predictions, because traders in markets quickly adapt to newly available information. Instead, there are a number of ephemeral and partially correct models called "alpha factors". Second, since financial markets are highly uncertain, ensuring the interpretability of prediction models is quite important for making reliable trading strategies. To overcome these challenges, we propose the Trader-Company method, a novel evolutionary model that mimics the roles of a financial institute and the traders belonging to it. Our method predicts future stock returns by aggregating suggestions from multiple weak learners called Traders. A Trader holds a collection of simple mathematical formulae, each of which represents a candidate alpha factor and would be interpretable for real-world investors. The aggregation algorithm, called a Company, maintains multiple Traders. By randomly generating new Traders and retraining them, Companies can efficiently find financially meaningful formulae whilst avoiding overfitting to a transient state of the market. We show the effectiveness of our method by conducting experiments on real market data.
KEYWORDS
Finance, Metaheuristics, Stock Price Prediction
ACM Reference Format:
Katsuya Ito, Kentaro Minami, Kentaro Imajo, and Kei Nakagawa. 2021. Trader-Company Method: A Metaheuristic for Interpretable Stock Price Prediction. In Proc. of the 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2021), Online, May 3–7, 2021, IFAAMAS, 9 pages.
INTRODUCTION

Developing quantitative trading strategies is a universal task in the financial industry [11]. Many quantitative models have been proposed to predict the behavior of financial markets [20, 32, 41]. For example, the Fama–French three-factor and five-factor models
[7, 15, 16] have been standard asset pricing models for many years. Technical indicators such as Moving Average Convergence Divergence (MACD) and Relative Strength Index (RSI) are also prediction methods that have been used by many traders [1, 28]. Although many quantitative analysts are struggling to derive new rules from newly available big data, there has been no gold-standard practical method that can fully leverage these data [41]. We believe that the following two challenges are hindering the development of quantitative models today.
Our first challenge is to tackle the non-stationary and noisy nature of the financial market, which is known as the efficiency of markets. The widely acknowledged Efficient Market Hypothesis [14] states that asset prices reflect all available information, correcting undervalued or overvalued prices into fair values. In an efficient market, investors cannot outperform the overall market, because asset prices quickly follow other traders' strategic and adversarial activities [10]. In fact, many empirical studies have reported that real-world markets are nearly efficient [9, 32, 41]. Due to this, future stock returns are hardly predictable in most markets, and no single explanatory model can consistently make an accurate prediction. On the other hand, there still is a common belief that stock returns can be predictable in a sufficiently short time period, which suggests the existence of investments or trading strategies that beat the overall market at least temporarily. In particular, potential sources of profitability would come from some simple mathematical formulae, called alpha factors [13, 26]. Typical alpha factors used in production are given as combinations of a few elementary functions and arithmetic operations. For example, log(yesterday's close price / yesterday's open price) represents the classical momentum strategy [24]. It has been reported that there is a variety of mathematical formulae with reasonably low mutual correlations [26], each of which can be a good trading signal and is actually usable in real-life trading. Although the efficacy of a single formula is slight and ephemeral, combining multiple formulae in a sophisticated way can lead to a more robust trading signal. We hypothesize that we can overcome the instability and the uncertainty of markets by maintaining multiple "weak models" given as simple mathematical formulae. This is in the same spirit as the ensemble methods (see e.g., [21, 39]),
Figure 1: Illustration of our method. (1) Traders predict the return of assets using simple formulas. Companies manage Traders and combine Traders' predictions into one value. The Company algorithm consists of four functions: (2) prediction by aggregation, (3) education of bad Traders, (4) dismissal of bad Traders, and (5) recruitment of new Traders.

but paying more attention to the specific structure of real-world alpha factors may improve the performance of the resulting trading strategy.
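As a concrete illustration, the classical momentum alpha factor mentioned above can be written as a one-line function. This is a minimal sketch; the function name and the buy/sell decision rule are illustrative, not part of the paper's method.

```python
import math

def momentum_alpha(open_price, close_price):
    """Classical momentum signal: log(yesterday's close / yesterday's open)."""
    return math.log(close_price / open_price)

# A positive signal suggests upward momentum (buy); negative suggests sell.
signal = momentum_alpha(open_price=100.0, close_price=101.0)
trade = "buy" if signal > 0 else "sell"
```

A price that closed above its open yields a positive log ratio, so the rule above buys on positive momentum.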
The second challenge is to gain the interpretability of models. As mentioned above, it is hard to achieve consistently high performance in the financial markets. Even if we had the best possible trading strategies at hand, their predictive accuracies would be quite limited and unsustainable. To gain intuition, suppose that we forecast the rise or fall of a single stock. Then, the accuracy is typically no higher than 51%, which is approximately the chance rate. As such, investors should worry about their trading strategies having a large uncertainty in the returns and the risks. In such a highly uncertain environment, the interpretability of models is of utmost importance. Warren Buffett said, "Risk comes from not knowing what you are doing" [18]. As his word implies, investors may desire the model to be explainable, so that they understand what they are doing. In fact, historically, investors and researchers have preferred linear factor models to explain asset prices (e.g., [15, 16]), which are often believed to be interpretable. On the other hand, for machine learning-based strategies, the lack of interpretability can be an obstacle to practical use, without which investors cannot understand their own investments nor ensure accountability to customers. We believe that using the combination of simple formulae mentioned in the previous subsection can open up a possibility of interpretable machine learning-based trading strategies.
To address the aforementioned challenges, we propose the Trader-Company method, a new metaheuristics-based method for stock price prediction. Our method is inspired by the role of financial institutions in real-world stock markets. Figure 1 depicts the entire framework of our method. Our method consists of two main ingredients, Traders and Companies. A single Trader predicts the returns based on simple mathematical formulae, which are postulated to be good candidates for interpretable alpha factors. Brought together by a Company, Traders act as weak learners that provide partial information helping the Company's eventual prediction. The Company also updates the collection of Traders by generating (i.e., hiring) new candidates of good Traders, as well as by deleting (i.e., dismissing) poorly performing Traders. This framework allows us to effectively search over the space of mathematical formulae having categorical parameters. We demonstrate the effectiveness of our method by experiments on real market data. We show that our method outperforms several standard baseline methods in realistic settings. Moreover, we show that our method can find formulae that are profitable by themselves and simple enough to be interpreted by real-world investors.
PROBLEM FORMULATION

In this section, we provide several mathematical definitions of financial concepts and formulate our problem setting. Table 1 summarizes the notation used in this paper.
Our problem is to forecast future returns of stocks based on their historical observations. To be precise, let 𝑋_𝑖[𝑡] be the price of stock 𝑖 at time 𝑡, where 1 ≤ 𝑖 ≤ 𝑆 is the index of given stocks and 0 ≤ 𝑡 ≤ 𝑇 is the time index. Throughout this paper, we consider the logarithmic returns of stock prices as input features of models. That is, we denote the one-period-ahead return of stock 𝑖 by

𝑟_𝑖[𝑡] := log(𝑋_𝑖[𝑡] / 𝑋_𝑖[𝑡−1]) ≈ (𝑋_𝑖[𝑡] − 𝑋_𝑖[𝑡−1]) / 𝑋_𝑖[𝑡−1].  (1)

We denote returns over multiple periods, and over multiple periods and multiple stocks, by

𝒓_𝑖[𝑢:𝑣] = (𝑟_𝑖[𝑢], ..., 𝑟_𝑖[𝑣]),  𝒓_{𝑖:𝑗}[𝑢:𝑣] = (𝒓_𝑖[𝑢:𝑣], ..., 𝒓_𝑗[𝑢:𝑣]).  (2)

Our main problem is formulated as follows.

Problem 1 (one-period-ahead prediction). The predictor sequentially observes the returns 𝑟_𝑖[𝑡] (1 ≤ 𝑖 ≤ 𝑆) at every time 0 ≤ 𝑡 ≤ 𝑇. For each time 𝑡, the predictor predicts the one-period-ahead return 𝑟_𝑖[𝑡+1] based on the past returns 𝒓_{1:𝑆}[0:𝑡]. That is, the predictor's output can be written as

r̂_𝑖[𝑡+1] = 𝑓_𝑡(𝒓_{1:𝑆}[0:𝑡])  (3)

for some function 𝑓_𝑡 that does not depend on the values of 𝒓_{1:𝑆}[𝑡+1:𝑇].

Table 1: Notation.

Notation | Meaning | Def.
𝑋_𝑖[𝑡] | stock price of stock 𝑖 at time 𝑡, where 1 ≤ 𝑖 ≤ 𝑆, 0 ≤ 𝑡 ≤ 𝑇 | §
𝑟_𝑖[𝑡] | logarithmic return of stock 𝑖 at time 𝑡 | (1)
𝒓_𝑖[𝑢:𝑣] | (𝑟_𝑖[𝑢], ..., 𝑟_𝑖[𝑣]) | (2)
𝒓_{𝑖:𝑗}[𝑢:𝑣] | (𝒓_𝑖[𝑢:𝑣], ..., 𝒓_𝑗[𝑢:𝑣]) | (2)
r̂_𝑖[𝑡], b̂_𝑖[𝑡] | predicted value of 𝑟_𝑖[𝑡] and its sign | (3)(4)
𝑀, 𝑃_𝑗, 𝑄_𝑗, 𝐷_𝑗, 𝐹_𝑗, 𝐴_𝑗, 𝑂_𝑗 | hyper-parameters of Traders | (5)

To evaluate the goodness of the prediction, we use the cumulative return defined as follows. Given a predictor's output r̂_𝑖[𝑡+1], we define the "canonical" trading strategy as

b̂_𝑖[𝑡+1] = sign(r̂_𝑖[𝑡+1]).  (4)

Here, the value of b̂_𝑖[𝑡] represents the trade of stock 𝑖 at time 𝑡; that is, b̂_𝑖[𝑡] = ±1 corresponds to buying or selling a unit amount of the stock. We then define the cumulative return against the realized returns 𝑟_𝑖[𝑡+1] as

𝐶_𝑖[𝑡] = Σ_{𝑢=0}^{𝑡} b̂_𝑖[𝑢+1] 𝑟_𝑖[𝑢+1] = Σ_{𝑢=0}^{𝑡} sign(r̂_𝑖[𝑢+1]) 𝑟_𝑖[𝑢+1].
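The two quantities above, the log returns of Eq. (1) and the cumulative return of the canonical sign strategy, can be computed in a few lines. This is a minimal sketch; the function names are illustrative.

```python
import math

def log_returns(prices):
    """One-period log returns r[t] = log(X[t] / X[t-1]), as in Eq. (1)."""
    return [math.log(prices[t] / prices[t - 1]) for t in range(1, len(prices))]

def cumulative_return(predicted, realized):
    """Running cumulative return C[t] of the canonical strategy
    b[t] = sign(r_hat[t]): C[t] = sum over u of sign(r_hat[u]) * r[u]."""
    def sign(x):
        return 1.0 if x > 0 else (-1.0 if x < 0 else 0.0)
    out, c = [], 0.0
    for r_hat, r in zip(predicted, realized):
        c += sign(r_hat) * r
        out.append(c)
    return out

prices = [100.0, 101.0, 100.5, 102.0]
r = log_returns(prices)            # three one-period log returns
perfect = cumulative_return(r, r)  # a perfect sign predictor earns |r| each period
```

With a perfect sign prediction, each summand equals |𝑟_𝑖[𝑢+1]|, which is the best any unit-trade strategy can do, matching the remark below.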
If the predictor could perfectly predict the sign of the one-period-ahead returns, the above canonical strategy would yield the maximum cumulative return among all possible strategies that can only trade unit amounts of stocks. As such, we consider 𝐶_𝑖[𝑡] as an evaluation measure of the prediction.

THE TRADER-COMPANY METHOD

In this section, we present the
Trader-Company method, a new metaheuristics-based prediction algorithm for stock prices. Figure 1 outlines our proposed method. Our method consists of two main components, Traders and Companies, which are inspired by the roles of human traders and financial institutes, respectively. A Trader predicts the returns using a simple model expressing realistic trading strategies. A Company combines suggestions from multiple Traders into a single prediction. To train the parameters of the proposed system, we employ an evolutionary algorithm that mimics the role of financial institutes as employers of traders. During training, a Company generates promising new candidates of Traders and deletes poorly performing ones. Below, we provide more detailed definitions and training algorithms for Traders and Companies.
First, we introduce the Traders, which are the minimal components in our proposed framework. As mentioned in the introduction, trading strategies used in real-life trading are made of simple formulae involving a small number of arithmetic operations on the return values [26]. We postulate that there are a number of unexplored profitable market rules that can be represented by simple formulae, which leads us to the following definition of a parametrized family of formulae.
Definition 1. A Trader is a predictor of one-period-ahead returns defined as follows. Let 𝑀 be the number of terms in the prediction formula. For each 1 ≤ 𝑗 ≤ 𝑀, we define 𝑃_𝑗, 𝑄_𝑗 as the indices of the stocks to use, 𝐷_𝑗, 𝐹_𝑗 as the delay parameters, 𝑂_𝑗 as the binary operator, 𝐴_𝑗 as the activation function, and 𝑤_𝑗 as the weight of the 𝑗-th term. Then, the Trader predicts the return value 𝑟_𝑖[𝑡+1] at time 𝑡 by

𝑓_Θ(𝒓_{1:𝑆}[0:𝑡]) = Σ_{𝑗=1}^{𝑀} 𝑤_𝑗 𝐴_𝑗(𝑂_𝑗(𝑟_{𝑃_𝑗}[𝑡−𝐷_𝑗], 𝑟_{𝑄_𝑗}[𝑡−𝐹_𝑗])),  (5)

where Θ := {𝑀, {𝑃_𝑗, 𝑄_𝑗, 𝐷_𝑗, 𝐹_𝑗, 𝑂_𝑗, 𝐴_𝑗, 𝑤_𝑗}_{𝑗=1}^{𝑀}} denotes the parameters of the Trader.

For the activation functions 𝐴_𝑗, we use standard activation functions used in deep learning, such as the identity function, the hyperbolic tangent, the hyperbolic sine, and the Rectified Linear Unit (ReLU). For the binary operators 𝑂_𝑗, we use several arithmetic binary operators (e.g., 𝑥 + 𝑦, 𝑥 − 𝑦, and 𝑥 × 𝑦), the coordinate projection (𝑥, 𝑦) ↦ 𝑥, the max/min functions, and the comparison function (𝑥 > 𝑦) = sign(𝑥 − 𝑦).

Our definition of the Trader has several advantages. First, the formula (5) is ready to be interpreted, in the sense that it has a similar form to typical human-generated trading strategies [26]. Second, the Trader model has sufficient expressive power. The Trader has various binary operators as fundamental units, which allows it to represent binary operations commonly used in practical trading strategies. Besides, the model also encompasses linear models, since we can choose the projection operator (𝑥, 𝑦) ↦ 𝑥 as 𝑂_𝑗.

Ideally, we want to optimize the Traders by maximizing the cumulative returns:

Θ* ∈ argmax_Θ Σ_𝑢 sign(𝑓_Θ(𝒓_{1:𝑆}[0:𝑢])) · 𝑟_𝑖[𝑢+1] =: argmax_Θ 𝑅(𝑓_Θ, 𝒓_{1:𝑆}[0:𝑡], 𝑟_𝑖[𝑡+1]).  (6)

However, it is difficult to apply common optimization methods, since the objective on the right-hand side is neither differentiable nor continuous with respect to the parameter Θ.
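Evaluating the Trader formula (5) is straightforward once the parameters are encoded. The sketch below is illustrative only (the table-based encoding, names, and the toy data are assumptions, not the authors' code), but it is a faithful instance of Eq. (5).

```python
import math

# Candidate activation functions A_j and binary operators O_j (a subset).
ACTIVATIONS = {
    "id": lambda x: x,
    "tanh": math.tanh,
    "relu": lambda x: max(0.0, x),
}
OPERATORS = {
    "add": lambda x, y: x + y,
    "sub": lambda x, y: x - y,
    "mul": lambda x, y: x * y,
    "left": lambda x, y: x,                       # coordinate projection (x, y) -> x
    "max": max,
    "min": min,
    "gt": lambda x, y: 1.0 if x > y else -1.0,    # comparison (x > y)
}

def trader_predict(terms, r):
    """Evaluate f_Theta = sum_j w_j * A_j(O_j(r[P_j][t-D_j], r[Q_j][t-F_j])).

    r[i][u] is the return of stock i at time u; the last time index is t.
    Each term is a tuple (w, A, O, P, Q, D, F), mirroring Definition 1.
    """
    t = len(r[0]) - 1
    return sum(
        w * ACTIVATIONS[A](OPERATORS[O](r[P][t - D], r[Q][t - F]))
        for (w, A, O, P, Q, D, F) in terms
    )

# A two-term Trader over two stocks:
r = [[0.01, -0.02, 0.015], [0.00, 0.01, -0.005]]
terms = [
    (0.5, "tanh", "sub", 0, 1, 0, 1),   # 0.5 * tanh(r_0[t] - r_1[t-1])
    (-0.3, "id", "max", 0, 1, 1, 0),    # -0.3 * max(r_0[t-1], r_1[t])
]
pred = trader_predict(terms, r)
```

Choosing the `"left"` projection for every term recovers a plain linear model, matching the expressiveness remark above.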
Therefore, we introduce a novel evolutionary algorithm driven by Company models, which we describe below.

As mentioned in the introduction, the behaviour of the financial market is highly unstable and uncertain, and thereby any single explanatory model is merely partially correct and transient. To overcome this issue and robustify the prediction, we develop a method to combine the predictions of multiple Traders. This is in the same spirit as the general and long-standing framework of ensemble methods (e.g., [21] or Chapter 4 of [39]), but introducing an "inductive bias" that takes into account the dynamics of real-world financial markets can improve the performance of the combined prediction. In particular, given the fact that stock prices are determined as a result of diverse investments made by institutional traders, it is reasonable to consider a model imitating the environments in which institutional traders are involved (i.e., financial institutions).
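The combination step can be pictured as follows: each Trader maps the shared return history to a scalar suggestion, and the Company applies an aggregation function to the suggestions. This is a minimal sketch under assumed names and toy Traders; the paper's Companies also allow trainable aggregators such as linear regression.

```python
import statistics

def company_predict(traders, history, aggregate):
    """A Company's prediction: collect each Trader's suggestion on the
    shared return history, then combine them with `aggregate`."""
    suggestions = [trader(history) for trader in traders]
    return aggregate(suggestions)

# Toy Traders: each is just a callable on the return history.
traders = [
    lambda h: h[-1],                  # follow the latest return
    lambda h: -h[-2],                 # bet against the previous return
    lambda h: 0.5 * (h[-1] + h[-2]),  # smooth the last two returns
]
history = [0.01, -0.02, 0.015]
mean_pred = company_predict(traders, history, statistics.mean)
```

Swapping `statistics.mean` for a fitted regressor over the suggestion vector gives the trainable-aggregator variant mentioned below.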
Algorithm 1: Prediction algorithm of Company

Input: 𝒓_{1:𝑆}[0:𝑡]: stock returns before 𝑡; Traders {Θ_𝑛}_{𝑛=1}^{𝑁}
Output: r̂_𝑖[𝑡+1]: predicted return of stock 𝑖 at 𝑡+1
function CompanyPrediction
    for 𝑛 = 1, ..., 𝑁 do
        𝑃_𝑛 ⇐ 𝑓_{Θ_𝑛}(𝒓_{1:𝑆}[0:𝑡])    ▷ Prediction by Trader (5)
    end for
    return Aggregate(𝑃_1, ..., 𝑃_𝑁)    ▷ Aggregation
end function

Algorithm 2: Educate algorithm of Company

Input: 𝒓_{1:𝑆}[0:𝑡]: stock returns before 𝑡; Traders; 𝑁: the number of Traders; 𝑄: ratio
Output: Traders
function CompanyEducate
    𝑅_𝑛 ⇐ 𝑅(𝑓_{Θ_𝑛}, 𝒓_{1:𝑆}[0:𝑡], 𝑟_𝑖[𝑡+1])    ▷ Trader's return (6)
    𝑅* ⇐ bottom 𝑄-percentile of {𝑅_𝑛}
    for 𝑛 ∈ {𝑚 | 𝑅_𝑚 ≤ 𝑅*} do    ▷ for all bad Traders
        update 𝑤_𝑗 in (5) by the least-squares method
    end for
    return Traders
end function

Algorithm 3: Prune-and-Generate algorithm of Company

Input: 𝒓_{1:𝑆}[0:𝑡]: stock returns before 𝑡; 𝐹: the number of iterations; 𝑁: the number of Predictors; 𝑄: ratio
Output: 𝑁′ Predictors
Θ_𝑛 ∼ uniform distribution
for 𝑘 = 1, ..., 𝐹 do
    𝑅_𝑛 ⇐ 𝑅(𝑓_{Θ_𝑛}, 𝒓_{1:𝑆}[0:𝑡], 𝑟_𝑖[𝑡+1])    ▷ Trader's return (6)
    𝑅* ⇐ bottom 𝑄-percentile of {𝑅_𝑛}
    {Θ_𝑗}_𝑗 ⇐ {Θ_𝑛 | 𝑅_𝑛 ≥ 𝑅*}    ▷ Pruning
    {Θ_𝑗}_{𝑗=1}^{𝑁′} ∼ Gaussian mixture fitted to {Θ_𝑗}_𝑗 (*)    ▷ Generation
end for
return 𝑁′ Predictors with {Θ_𝑗}_{𝑗=1}^{𝑁′}
(*) If the parameter is an integer, we round it off.

In our framework, a Company maintains 𝑁 Traders that act as weak learners or feature extractors, and aggregates them. Given 𝑁 Traders specified by parameters Θ_1, ..., Θ_𝑁 and the past observations of stock returns 𝒓_{1:𝑆}[0:𝑡], a Company predicts the future returns by

r̂_𝑖[𝑡+1] = Aggregate(𝑓_{Θ_1}, ..., 𝑓_{Θ_𝑁}).

For clarity, this procedure is presented in Algorithm 1. Here, Aggregate can be an arbitrary aggregation function and is allowed to have extra parameters. For example, we can use the simple averaging (1/𝑁) Σ_{𝑛=1}^{𝑁} 𝑓_{Θ_𝑛}(𝒓_{1:𝑆}[0:𝑡]), linear regression, or general trainable prediction models (e.g., neural networks and the Random Forest) that take the Traders' suggestions as input features.

In order to achieve low training errors whilst avoiding overfitting, the Company should maintain the average quality as well as the diversity of the Traders' suggestions. To this end, we introduce the Educate algorithm (Algorithm 2) and the Prune-and-Generate algorithm (Algorithm 3), which update the weights and the formulae of Traders, respectively.

• Educating Traders: Recall that a single Trader (5) is a linear combination of 𝑀 mathematical formulae. A single Trader can perform poorly in terms of cumulative returns. However, if the Trader has good candidate formulae (i.e., alpha factors), slightly updating the weights {𝑤_𝑗} while keeping the formulae can significantly improve the performance. Algorithm 2 corrects the weights {𝑤_𝑗} of the Traders achieving relatively low cumulative returns. Here, we update the weights by the least-squares method, which can be solved analytically.

• Pruning poorly performing Traders and generating new candidates of good Traders: If a Trader holds "bad" candidate formulae, keeping that Trader makes no improvement in the prediction performance, while it can increase risk exposures. In that case, it may be beneficial to simply remove that Trader and replace it with a new promising candidate. Algorithm 3 implements this idea. First, we evaluate the cumulative returns of the current set of Traders, and remove the Traders having relatively low returns. Then, we generate new Traders by randomly fluctuating the existing Traders with good performances. To this end, we fit some probability distribution to the current set of parameters, and draw new parameters from it. While the parameters specifying the formulae contain discrete variables, such as indices of stocks and choices of arithmetic operations, we empirically found that fitting a Gaussian mixture distribution (i.e., a continuous multi-modal distribution) to the discrete indices and discretizing the generated samples achieves reasonably good performance. See Section 4 for detailed experimental results.

Using the above algorithms together, we can effectively search the complicated parameter space of Traders.
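The Prune-and-Generate step can be sketched as follows. This is a simplified stand-in, not the authors' implementation: rather than fitting a general Gaussian mixture, it samples new parameter vectors from a mixture with one Gaussian component per surviving Trader (a kernel-density-style simplification); integer-valued parameters would be rounded after sampling, as in Algorithm 3.

```python
import random

def prune_and_generate(params, scores, q=0.5, sigma=0.1, n_new=None, rng=None):
    """Keep Traders scoring above the bottom-q quantile, then draw new
    parameter vectors from a Gaussian mixture centered at the survivors."""
    rng = rng or random.Random(0)
    n_new = n_new or len(params)
    cutoff = sorted(scores)[int(q * (len(scores) - 1))]   # bottom-q threshold
    survivors = [p for p, s in zip(params, scores) if s >= cutoff]
    new_params = []
    for _ in range(n_new):
        center = rng.choice(survivors)                    # pick a mixture component
        new_params.append([x + rng.gauss(0.0, sigma) for x in center])
    return survivors, new_params

scores = [0.1, -0.5, 0.3, 0.0]   # cumulative returns R(f_Theta, .) of 4 Traders
params = [[1.0, 2.0], [9.0, 9.0], [1.1, 2.1], [0.9, 1.9]]
kept, generated = prune_and_generate(params, scores, q=0.5)
```

In the actual method, the scores are the cumulative returns of (6) and each parameter vector encodes {𝑃_𝑗, 𝑄_𝑗, 𝐷_𝑗, 𝐹_𝑗, 𝑂_𝑗, 𝐴_𝑗, 𝑤_𝑗}; the toy vectors here are placeholders.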
The Educate algorithm (Algorithm 2) is intended to be applied before pruning (Algorithm 3), to prevent potentially useful alpha factors from being pruned. In practice, given past observations of returns 𝒓_{1:𝑆}[0:𝑡], we can train the model by the following workflow.

(1) Educate a fixed proportion of poorly performing Traders by Algorithm 2.
(2) Replace a fixed proportion of poorly performing Traders with random new Traders by Algorithm 3.
(3) If the aggregation function Aggregate has trainable parameters, update them using the data 𝒓_{1:𝑆}[0:𝑡] and any optimization algorithm.
(4) Predict future returns by Algorithm 1.

We comment on some intuitions about the advantages of our method, although there is no theoretical guarantee in practical settings. Algorithm 3 increases the diversity of Traders by injecting random fluctuations into existing good Traders. From a generalization perspective, injecting randomness may help to avoid overfitting to the current transient state of the market. From an optimization perspective, we can view our algorithm as a variant of evolutionary algorithms such as the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) [19]. While the original CMA-ES generates new particles from a Gaussian distribution, we see that using a multi-modal distribution is practically important. See Section 4 for an empirical verification.

EXPERIMENTS

We conducted experiments to evaluate the performance of our proposed method on real market data. We compare our method with several benchmarks, including simple linear models and state-of-the-art deep learning methods, and confirm the superiority of our method. We also demonstrate that our method can find profitable formulae (alpha factors) that are simple enough to interpret.
Throughout the experiments, we used two real market datasets: (i) US stock prices listed on the Standard & Poor's 500 (S&P 500) Stock Index and (ii) UK stock prices from the London Stock Exchange (LSE). The S&P 500 and the LSE have among the highest trading volumes and market capitalizations in the world. From a reproducibility viewpoint, we used data distributed online for free from Dukascopy's Online data feed. For the S&P 500 data, we used daily data for all the 500 stocks listed on the S&P 500 Stock Index in the period from May 19, 2000 to May 19, 2020. For the LSE data, we used hourly data for all the 77 stock prices available on Dukascopy in the period from September 07, 2016 to September 07, 2019.

In our experiments, we employed two practical constraints, time windows and execution lags. In practice, we cannot use all the past observations of stock prices due to the time and space complexity. Also, we cannot trade stocks immediately after the observation of returns, because we need some time to make an inference by the model and execute the trade. Thus, it is reasonable to introduce a time window 𝑤 and an execution lag 𝑙 with 𝑤 > 𝑙 > 0, so that we train models using the observations 𝒓_{1:𝑆}[𝑡−𝑙−𝑤 : 𝑡−𝑙] and predict the returns at time 𝑡+1. Throughout the experiments, we used 𝑤 = 10 and a fixed 𝑙.

To evaluate the performances of prediction algorithms, we adopted the following metrics; for each metric, a higher value is better. Let 𝑟_𝑖[𝑡] be the return of stock 𝑖 (𝑖 ∈ {1, ..., 𝑆}) at time 𝑡, and let r̂_𝑖[𝑡] be its prediction obtained from an arbitrary method. Recall that b̂_𝑖[𝑡] = sign(r̂_𝑖[𝑡]).

• Accuracy (ACC): the accuracy rate of the prediction of the rise and drop of stock prices, ACC_𝑖 = P[sign(𝑟_𝑖[𝑡]) = b̂_𝑖[𝑡]].
• Annualized Return (AR): Given predictions of the returns of stock 𝑖, the cumulative return is given as 𝐶_𝑖[𝑡] := Σ_{𝑢=0}^{𝑡} b̂_𝑖[𝑢+1] 𝑟_𝑖[𝑢+1]. We define the Annualized Return (AR) averaged over all stocks as AR := 100 × (𝑇_Y/𝑇) · (1/𝑆) Σ_{𝑖=1}^{𝑆} 𝐶_𝑖[𝑇], where 𝑇_Y is the average number of periods contained in one year.
• Sharpe Ratio (SR): The Sharpe ratio [36], or the Return/Risk ratio (R/R), is the return value adjusted by its standard deviation. That is, letting 𝜇_𝑖 := (1/𝑇) 𝐶_𝑖[𝑇] = (1/𝑇) Σ_{𝑢=0}^{𝑇} b̂_𝑖[𝑢+1] 𝑟_𝑖[𝑢+1] and 𝜎_𝑖² := (1/𝑇) Σ_{𝑢=0}^{𝑇} (b̂_𝑖[𝑢+1] 𝑟_𝑖[𝑢+1] − 𝜇_𝑖)², we define SR_𝑖 := 𝜇_𝑖/𝜎_𝑖. We then report the average SR = (1/𝑆) Σ_{𝑖=1}^{𝑆} SR_𝑖.
• Calmar Ratio (CR): We also use the Calmar ratio [43], another definition of adjusted returns. Define the Maximum DrawDown (MDD) as MDD_𝑖 := max_{0≤𝑡≤𝑇} max_{𝑡<𝑠≤𝑇} (1 − 𝐶_𝑖[𝑠]/𝐶_𝑖[𝑡]). The Calmar ratio is defined as CR_𝑖 := AR_𝑖/MDD_𝑖. In our experiments, we report CR = (1/𝑆) Σ_{𝑖=1}^{𝑆} CR_𝑖. Note that, while both SR and CR are returns adjusted by risk measures, CR is more sensitive to drawdown events that occur less frequently (e.g., financial crises).

We compared our method with the following baseline methods:
• Market: a uniform Buy-And-Hold strategy.
• Vector Autoregression (VAR): a commonly-used linear model for financial time series forecasting [37].
• Random Forest (RF): a commonly-used ensemble method [5].
• Multi-Head Attention (MHA): a deep learning algorithm for time series prediction [40].
• Long- and Short-Term Networks (LSTNet): a deep-learning-based algorithm which combines Convolutional and Recurrent Neural Networks [27].
• State-Frequency Memory Recurrent Neural Networks (SFM): a deep-learning-based stock price prediction algorithm that incorporates the concept of the Fourier Transform into Long Short-Term Memory [44].
• Symbolic Regression by Genetic Programming (GP): a prediction algorithm using genetic programming [33].

To verify the effects of the individual technical components in our proposed method (TC), we also compared the following "ablation" models.

• Changing the Trader structure: TC linear only uses the linear activation function 𝑥 ↦ 𝑥, so the eventual prediction of a Company becomes just a linear combination of several binary operations. TC unary only uses the unary operator 𝑂(𝑥, 𝑦) = 𝑥.
• Changing the optimization algorithm: TC w/o educate does not execute the Educate algorithm (Algorithm 2), so Traders can be discarded even if they have promising formulae. TC w/o prune does not execute the pruning step in Algorithm 3, so a Company keeps poorly performing Traders. TC unimodal uses Gaussian distributions instead of Gaussian mixtures in the generation step. TC MSE uses the mean squared loss as the score for educating and pruning, so it does not use the cumulative returns at all.

Table 2 lists the hyper-parameters used in the baseline algorithms.

(Footnote: There was an unintended data leak in the implementation published by the author. Therefore, we fixed it in our experiment for fairness.)

Table 2: Hyper-parameters in Experiments
Parameter | Value | Definition
𝑀 | { , ..., } | (5)
𝐷_𝑗, 𝐹_𝑗 | { , ..., } | (5)
𝐴_𝑗(𝑥) | {𝑥, tanh(𝑥), exp(𝑥), sign(𝑥), ReLU(𝑥)} | (5)
𝑂_𝑗(𝑥, 𝑦) | {𝑥+𝑦, 𝑥−𝑦, 𝑥𝑦, 𝑥, 𝑦, max(𝑥,𝑦), min(𝑥,𝑦), 𝑥>𝑦, 𝑥<𝑦, Corr(𝑥,𝑦)} | (5)
𝑁 | 100 | Algorithm 1
Aggregate | Linear regression | Algorithm 1
𝑄 | | Algorithms 2, 3
Figure 2: Cumulative returns on US market L o g C u m u l a t i v e R e t u r n TCRFLSTNetMHAGPVARSFMMarket
Figure 3: Cumulative returns on UK market
First, we trained the models using the first half of the datasets, and then evaluated the performances of the models with frozen parameters using the latter half. For the US market, we used the data before May 2018 for training and the rest for testing. For the UK market, we used the first one and a half years for training and the rest for testing.

Table 3 and Table 4 show the comparisons between our proposed method (TC) and the other baseline methods on the US and UK markets, respectively. All methods are evaluated using three evaluation metrics (AR, SR, and CR). For methods depending on random initializations, we ran the evaluations for each method 100 times
Table 3: Performance comparison on US markets. (Columns: ACC(%), AR(%), SR, CR. Rows: Market, VAR, MHA, LSTNet, SFM, GP, RF, TC linear, TC unary, TC w/o educate, TC w/o prune, TC unimodal, TC MSE, and TC (Proposed), reported as mean ± standard deviation. TC (Proposed) attains the best ACC of 55.68%; the remaining numeric entries are illegible in the source.)

Table 4: Performance comparison on UK markets
(Columns: ACC(%), AR(%), R/R, CR. Rows: Market, VAR, MHA, LSTNet, SFM, GP, RF, and TC, reported as mean ± standard deviation. TC attains an ACC of 50.928%; the remaining numeric entries are illegible in the source.)

with different random seeds and provide the means and the standard deviations. Also, Figure 2 and Figure 3 show the cumulative returns on the US and UK markets, respectively.

Overall, our method outperformed the other baselines in the presented evaluation metrics. Some interesting observations are as follows.

Importance of Traders: Comparing several baseline methods and ablation models, we found that the structure of Traders is of crucial importance. First, in the definition of Traders (5), we restrict the formulae to those represented by the binary operators 𝑂_𝑗. This means that Traders rule out formulae that leverage interactions between three or more terms, which can reduce the complexity of the entire model without losing expressive power. This is corroborated by the following observations. Among the baseline methods, a simple linear method (VAR) achieved a relatively good performance. VAR estimates its coefficients by the ordinary least-squares method, which means that the prediction of the return of each individual stock can be a "dense" linear combination of past observations. On the other hand, TC linear, which improved SR significantly upon VAR, finds its solution among linear combinations of features made of at most two observations. We can also see that using only unary operations (TC unary) greatly deteriorates the performance. Second, comparing TC and TC linear, we see that introducing non-linear activation functions also improves the performance.

Table 5: Prediction formulae extracted from the best Traders. These expressions are for predicting the returns at 𝑡 + 1, e.g., the return LLOY of Lloyds. (The formulae are only partially legible in the source; the recoverable parts include terms such as −0.80 sign(WPP + SKY), −0.85 sign(min(AV, SHP)), 0.859 sign(max(WEIR, WTB)), and comparison terms such as (SHP > AHT) and (PFC > DGE), applied to lagged returns; the exact lags and remaining coefficients are illegible.)

Third, TC also outperformed another off-the-shelf ensemble method (RF). RF combines non-linear predictors given by decision trees, i.e., indicator functions of rectangles (see e.g., Chapter 9 of [21]). RF requires many decision trees to approximate binary operations such as max(𝑥, 𝑦) or 𝑥 > 𝑦. Hence, when these operations are actually important for constructing alpha factors, RF can increase the redundancy and the model complexity, which leads to poor performance, especially in SR and CR. In fact, these operations frequently appear in good Traders (Table 5).

Importance of combining multiple formulae: Among the baseline methods, GP outputs a single mathematical formula by using genetic programming [33]. However, this did not work well in our experiments, in which we adopted reasonably long test periods. This may reflect the fact that any single formula is ephemeral due to the (near) efficiency of the markets. On the other hand, our method, which maintains multiple formulae, performed well over the test period.
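Formulae of the kind listed in Table 5 can be evaluated directly. The sketch below scores a hypothetical two-term formula on synthetic returns; the tickers "A" and "B", the coefficients, and the lags are illustrative placeholders, not values taken from the trained model.

```python
def sign(x):
    return 1.0 if x > 0 else (-1.0 if x < 0 else 0.0)

def example_alpha(r, t):
    """A hypothetical Table-5-style formula with two interpretable terms:
    the sign of a lagged sum, plus a pairwise comparison."""
    momentum_term = -0.80 * sign(r["A"][t - 1] + r["B"][t])
    comparison_term = 0.85 * (1.0 if r["A"][t - 2] > r["B"][t - 1] else -1.0)
    return momentum_term + comparison_term

# Synthetic return histories for two tickers:
r = {"A": [0.01, -0.02, 0.015, 0.005], "B": [0.00, 0.01, -0.005, 0.02]}
pred = example_alpha(r, t=3)   # sign(pred) gives the trade direction
```

Each term can be read off directly (which stocks, which lags, which operator), which is the interpretability property the section argues for.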
Importance of optimization heuristics: We found that each individual optimization technique presented in Section 3.2 significantly improves the performance. First, the pruning step seems quite important (cf. TC w/o prune). Regarding the scores for pruning, using the MSE instead of the cumulative returns deteriorates the performance (cf. TC MSE). Second, we can see that introducing the education step also improves the overall performance (cf. TC w/o educate); otherwise, Companies may discard Traders that have possibly good formulae. Lastly, using a multimodal distribution in the generation step (Algorithm 3) is quite important. If we instead use a unimodal distribution (cf. TC unimodal), the performance substantially deteriorates. A possible reason is that a unimodal distribution concentrates around the means of discrete indices, which does not make sense.
Comparison to other non-linear predictors: We also compared our method to other complex non-linear models. MHA, LSTNet and SFM are prediction methods based on deep learning. Among these, SFM performed relatively well in the US market, but none of them achieved good performance in the UK market. Our method consistently outperformed these methods.
In the previous experiment, we adopted a simple train/test split. However, in practice, it is not reasonable to use a single model for a long time, and we might update the model more frequently to follow structural changes of the markets. Here, using the US data, we also evaluated our method in a sequential prediction setting. We sequentially updated the models every year, where we used all the past observations for training. Figure 4 shows
Figure 4: Cumulative returns on US market in the online prediction setting.

Figure 5: Cumulative returns achieved by individual Traders extracted from Companies. “TC” is the overall performance of our method.

Figure 6: The number of times each parameter was used. (Upper) indices 𝑃𝑗, 𝑄𝑗. (Lower) operators 𝑂𝑗.

So far, we have evaluated the overall performance of the proposed method. Here, we investigate the performance of a single Trader or a formula extracted from the trained model, and discuss the interpretation of the obtained formulae. First, we investigate the performance of a single
Trader. Recall that a Trader (5) is given as a linear combination of 𝑀 mathematical formulae. Thus, the expressive power of a single Trader increases with 𝑀. Here, using the UK data, we trained the Companies by restricting 𝑀 to be a fixed value in { , , , } (note that, in our method, 𝑀 for each Trader is trainable by default). Then, we extracted the best performing Trader from each trained Company and evaluated its performance over the test period. Figure 5 shows the result. We can see that our method can find Traders that achieve positive returns by themselves, while the total return of the overall market is negative during the test period. If we increase the number of terms 𝑀, the cumulative return of a single Trader at the end of the test period decreases (albeit slightly), which suggests that increasing the expressive power of individual Traders is prone to overfitted formulae and does not necessarily yield a single profitable formula. Meanwhile, we should note that the overall performance of the Company (TC in Figure 5) is much better than that of a single Trader.

Next, we consider the interpretability issue. Table 5 lists actual formulae extracted from the trained Traders. For the meanings of the ticker symbols used in the formulae, see the website of the London Stock Exchange. For example, LLOY𝑡+ indicates the return of the corresponding stock over the indicated period. The extracted formulae frequently use binary operations such as min(𝑥, 𝑦), max(𝑥, 𝑦), and pairwise comparisons (𝑥 > 𝑦). These operators are difficult to approximate by decision trees, which can only represent rectangular regions; this may be the reason for the superiority of our method over the Random Forest.

Figure 6 shows which ticker symbols and binary operators are used by the Traders, and how many times, to predict the return of AAPL (Apple Inc.). From Figure 6, we can interpret which stocks the Company is paying attention to and which binary operators the Traders are using.

RELATED WORK

There are two established approaches to financial time series modeling.
Firstly, many statistical time series models that aim to describe the generative processes of financial time series have been developed. For example, the autoregressive integrated moving average (ARIMA) model and vector autoregression (VAR) are often used in financial time series prediction [4, 29]. Secondly, another direction is to use factor models, which aim to explain asset prices using not only the structure of each individual time series but also cross-sectional information and other kinds of financial information. To name a few, the Fama–French three-factor model [15], the Carhart four-factor model [7], and the Fama–French five-factor model [16] are among the most important ones. More than 300 identified factors had been reported as of 2012 [20].

Recent financial econometrics is characterized by the combination of financial modeling and machine learning methods, especially deep learning. Although there are many research directions in this flourishing field, they include directions that (i) apply sophisticated time series models to financial problems and (ii) incorporate data of different modalities into the prediction. Sezer et al. conducted an extensive survey of these applications [35]. For the first direction, Zhang et al. [44] proposed to use frequency information to forecast stock prices. Lai et al. [27] proposed a method to extract long-term and short-term patterns by combining CNN and RNN. For the second direction, the use of news, social media, and networks among companies is also active [8, 12, 23, 42].

However, it is folklore among experts that, under the transient and uncertain environments of financial markets, complex models (including neural networks) do not work as well as expected, and traditional, simpler models are preferred. In particular, Makridakis et al. [30] pointed out the superiority of “traditional” statistical models over machine learning models in financial time series.
This motivates us to leverage simple units of models such as formulaic alphas [13, 26] and to deal with the uncertainty by bootstrapping them, instead of using “black-box” deep learning-based methods.

Combining multiple predictions is a long-standing approach in data science [21, 39]. Generally speaking, model selection and model ensembling are both important ideas (e.g., Chapter 8 of [21]). However, in some situations, selecting only the single model with the (temporal) best track record can lead to suboptimal performance, which has been confirmed both empirically (e.g., Section 7.2 of [39]) and theoretically (e.g., [25]). Therefore, in many situations, ensemble-type methods may be the first candidate to try.

In ensemble methods, techniques such as pruning and random generation of experts have been shown to be effective in various situations. For example, it is widely known that eliminating poorly performing experts (e.g., [17] and Section 7.3 of [39]) or partial structures of experts (e.g., [6]) can improve the overall performance. The idea of randomly generating experts has been used in the Random Forest and the Random Fourier Features method [34]. We would like to stress that, as we demonstrated in Section 4, the designs of the pruning and generation schemes are crucially important in financial time series prediction.
Over the years, much research has been done on the application of metaheuristics to finance [3]. Soler-Dominguez et al. [38] conducted an extensive survey of these applications. While portfolio optimization, index tracking, and enhanced indexation are active application areas of metaheuristics, Genetic Programming (GP) is commonly applied to stock price prediction: an index prediction method, a combination with self-organizing maps (SOM), a multi-gene variant, and a hybrid GP have been proposed [2, 22, 31].
CONCLUSION

We proposed a new prediction method for financial time series. Our method consists of two main ingredients, the Traders and the Company. The Traders aim to predict the future returns of stocks by simple mathematical formulae, which are naturally interpretable as “alpha factors” in the finance literature. The Company aggregates the predictions of Traders to overcome the highly uncertain environments of financial markets. The Company also provides a novel training algorithm inspired by real-world financial institutes, which allows us to search over the complicated parameter space of Traders and find promising mathematical formulae efficiently. We demonstrated the efficacy of our method through experiments on US and UK market data. In particular, our method outperformed several common baseline methods in both markets, and an ablation study showed that each individual technique in our proposed method improves the overall performance. We focused on forecasting stock prices throughout this paper; an interesting future direction is to investigate the applicability of our method to other types of assets.
ACKNOWLEDGMENTS
We thank the anonymous reviewers for their constructive suggestions and comments. We also thank Masaya Abe, Shuhei Noma, Prabhat Nagarajan and Takuya Shimada for helpful discussions.
REFERENCES
[1] Steven Achelis. 2000. Technical Analysis from A to Z, 2nd Edition. McGraw-Hill, New York.
[2] Sara Elsir M. Ahmed, Alaa F. Sheta, and Hossam Faris. 2015. Evolving Stock Market Prediction Models Using Multigene Symbolic Regression Genetic Programming. Artificial Intelligence and Machine Learning (AIML) 15, 1 (June 2015), 11–20.
[3] Franklin Allen and Risto Karjalainen. 1999. Using genetic algorithms to find technical trading rules. Journal of Financial Economics 51, 2 (Feb. 1999), 245–271.
[4] George E. P. Box, Gwilym M. Jenkins, Gregory C. Reinsel, and Greta M. Ljung. 2015. Time Series Analysis: Forecasting and Control (Wiley Series in Probability and Statistics). Wiley, New York, NY.
[5] Leo Breiman. 2001. Random Forests. Machine Learning 45, 1 (2001), 5–32.
[6] L. Breiman, J. Friedman, R. Olshen, and C. Stone. 1984. Classification and Regression Trees. Wadsworth.
[7] Mark M. Carhart. 1997. On Persistence in Mutual Fund Performance. The Journal of Finance 52, 1 (March 1997), 57–82.
[8] Yingmei Chen, Zhongyu Wei, and Xuanjing Huang. 2018. Incorporating Corporation Relationship via Graph Convolutional Neural Networks for Stock Price Prediction. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management (CIKM '18) (Torino, Italy). ACM Press, New York, NY, USA, 1655–1658.
[9] R. Cont. 2001. Empirical properties of asset returns: stylized facts and statistical issues. Quantitative Finance 1, 2 (Feb. 2001), 223–236.
[10] Marcos López de Prado. 2015. The Future of Empirical Finance. The Journal of Portfolio Management 41, 4 (2015), 140–144.
[11] Marcos López de Prado. 2018. Advances in Financial Machine Learning. Wiley, New York, NY.
[12] Xiao Ding, Yue Zhang, Ting Liu, and Junwen Duan. 2015. Deep Learning for Event-Driven Stock Prediction. In Proceedings of the 24th International Conference on Artificial Intelligence (IJCAI '15). AAAI Press, Buenos Aires, Argentina, 2327–2333.
[13] Igor Tulchinsky et al. 2015. Finding Alphas: A Quantitative Approach to Building Trading Strategies. Wiley.
[14] Eugene F. Fama. 1970. Efficient Capital Markets: A Review of Theory and Empirical Work. The Journal of Finance 25, 2 (May 1970), 383.
[15] Eugene F. Fama and Kenneth R. French. 1992. The Cross-Section of Expected Stock Returns. The Journal of Finance 47, 2 (June 1992), 427–465.
[16] Eugene F. Fama and Kenneth R. French. 2015. A five-factor asset pricing model. Journal of Financial Economics
[17] Annals of Statistics 19, 1 (1991), 1–67.
[18] Robert Hagstrom. 1997. The Warren Buffett Way: Investment Strategies of the World's Greatest Investor. J. Wiley, New York.
[19] Nikolaus Hansen and Andreas Ostermeier. 1996. Adapting arbitrary normal mutation distributions in evolution strategies: the covariance matrix adaptation. In Proceedings of IEEE International Conference on Evolutionary Computation. IEEE, Nagoya, Japan, 312–317.
[20] Campbell R. Harvey, Yan Liu, and Heqing Zhu. 2015. ... and the Cross-Section of Expected Returns. Review of Financial Studies 29, 1 (Oct. 2015), 5–68.
[21] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. 2009. The Elements of Statistical Learning (2nd ed.). Springer-Verlag.
[22] Chih-Ming Hsu. 2011. A hybrid procedure for stock price prediction by integrating self-organizing map and genetic programming. Expert Syst. Appl. 38 (May 2011), 14026–14036.
[23] Ziniu Hu, Weiqing Liu, Jiang Bian, Xuanzhe Liu, and Tie-Yan Liu. 2018. Listening to Chaotic Whispers. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining (WSDM '18). Association for Computing Machinery, New York, NY, USA, 261–269.
[24] Narasimhan Jegadeesh and Sheridan Titman. 1993. Returns to Buying Winners and Selling Losers: Implications for Stock Market Efficiency. The Journal of Finance 48, 1 (1993), 65–91.
[25] A. Juditsky, P. Rigollet, and A. B. Tsybakov. 2008. Learning by mirror averaging. Annals of Statistics 36, 5 (2008), 2183–2206.
[26] Zura Kakushadze and Juan Andrés Serur. 2018. 151 Trading Strategies. Springer International Publishing, Cham, Switzerland.
[27] Guokun Lai, Wei-Cheng Chang, Yiming Yang, and Hanxiao Liu. 2018. Modeling Long- and Short-Term Temporal Patterns with Deep Neural Networks. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (SIGIR '18) (Ann Arbor, MI, USA). Association for Computing Machinery, New York, NY, USA, 95–104.
[28] Zhige Li, Derek Yang, Li Zhao, Jiang Bian, Tao Qin, and Tie-Yan Liu. 2019. Individualized Indicator for All: Stock-wise Technical Indicator Optimization with Stock Embedding. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD '19). Association for Computing Machinery, New York, NY, USA, 894–902.
[29] Helmut Lütkepohl. 2005. New Introduction to Multiple Time Series Analysis. Springer Berlin Heidelberg, Berlin. https://doi.org/10.1007/978-3-540-27752-1
[30] Spyros Makridakis, Evangelos Spiliotis, and Vassilios Assimakopoulos. 2018. Statistical and Machine Learning forecasting methods: Concerns and ways forward. PLOS ONE 13, 3 (March 2018), 1–26.
[31] Viktor Manahov, Robert Hudson, and Hafiz Hoque. 2015. Return predictability and the 'wisdom of crowds': Genetic Programming trading algorithms, the Marginal Trader Hypothesis and the Hayek Hypothesis. Journal of International Financial Markets, Institutions and Money 37 (July 2015), 85–98.
[32] R. David McLean and Jeffrey E. Pontiff. 2016. Does Academic Research Destroy Stock Return Predictability? The Journal of Finance 71, 1 (Jan. 2016), 5–32.
[33] Riccardo Poli, William B. Langdon, and Nicholas Freitag McPhee. 2008. A Field Guide to Genetic Programming. Lulu Enterprises, UK Ltd, Egham, United Kingdom.
[34] Ali Rahimi and Benjamin Recht. 2008. Random Features for Large-Scale Kernel Machines. In Advances in Neural Information Processing Systems 20. 1177–1184.
[35] Omer Berat Sezer, Mehmet Ugur Gudelek, and Ahmet Murat Ozbayoglu. 2019. Financial Time Series Forecasting with Deep Learning: A Systematic Literature Review: 2005–2019. arXiv:1911.13288 [cs.LG]
[36] William F. Sharpe. 1964. Capital asset prices: A theory of market equilibrium under conditions of risk. The Journal of Finance 19, 3 (1964), 425–442.
[37] Christopher A. Sims. 1980. Macroeconomics and Reality. Econometrica 48, 1 (Jan. 1980), 1.
[38] Amparo Soler-Dominguez, Angel A. Juan, and Renatas Kizys. 2017. A Survey on Financial Applications of Metaheuristics. Comput. Surveys 50, 1 (April 2017), 1–23.
[39] Allan Timmermann. 2006. Forecast Combinations. In Handbook of Economic Forecasting. Elsevier, Amsterdam, Netherlands, 135–196.
[40] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). Curran Associates, Inc., Red Hook, NY, USA, 5998–6008.
[41] Thomas Wiecki, Andrew Campbell, Justin Lent, and Jessica Stauth. 2016. All That Glitters Is Not Gold: Comparing Backtest and Out-of-Sample Performance on a Large Cohort of Trading Algorithms. The Journal of Investing 25, 3 (Aug. 2016), 69–80.
[42] Yumo Xu and Shay B. Cohen. 2018. Stock Movement Prediction from Tweets and Historical Prices. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Melbourne, Australia, 1970–1979.
[43] Terry W. Young. 1991. Calmar ratio: A smoother tool. Futures 20, 1 (1991), 40.
[44] Liheng Zhang, Charu Aggarwal, and Guo-Jun Qi. 2017. Stock Price Prediction via Discovering Multi-Frequency Trading Patterns. In