Capturing dynamics of post-earnings-announcement drift using genetic algorithm-optimised supervised learning
Zhengxin Joseph Ye and Björn W. Schuller
GLAM, Department of Computing
Imperial College London
London, UK
Abstract
Post-Earnings-Announcement Drift (PEAD) is a stock market phenomenon in which a stock's cumulative abnormal return tends to drift in the direction of an earnings surprise in the near term following an earnings announcement. Although it is one of the most studied stock market anomalies, the existing literature often explains the phenomenon with a small number of factors using simpler regression methods. In this paper, we take a machine learning based approach instead and aim to capture the PEAD dynamics using data from a large group of stocks and a wide range of both fundamental and technical factors. Our model is built around Extreme Gradient Boosting (XGBoost) and uses a long list of engineered input features based on quarterly financial announcement data from 1,106 companies in the Russell 1000 index between 1997 and 2018. We perform numerous experiments on PEAD prediction and analysis and make the following contributions to the literature. First, we show how Post-Earnings-Announcement Drift can be analysed using machine learning methods and demonstrate such methods' prowess in producing credible forecasts of the drift direction; to our knowledge, it is the first time PEAD dynamics are studied using XGBoost. We show that the drift direction is in fact driven by different factors for stocks from different industrial sectors and in different quarters, and that XGBoost is effective in capturing the changing drivers. Second, we show that an XGBoost model well optimised by a Genetic Algorithm can help allocate out-of-sample stocks to form portfolios with higher positive returns to long and portfolios with lower negative returns to short, a finding that could be adopted in the development of market neutral strategies. Third, we show how theoretical event-driven stock strategies have to grapple with ever changing market prices in reality, reducing their effectiveness. We present a tactic to remedy the difficulty of buying into a moving market when dealing with PEAD signals.
I. Introduction
The stock market is characterised by nonlinearities, discontinuities, and multi-polynomial components because it continuously interacts with many factors such as individual companies' news, political events, macroeconomic conditions, and general supply and demand [1]. The non-stationary nature of the stock market is supported by a widely accepted, but still hotly contested, economic theory: the Efficient Market Hypothesis, which states that asset prices fully reflect all available information and that the market only moves by reacting to new information. Such a theory implies that the stock market behaves like a martingale and that knowledge of all past prices is not informative regarding the expectation of future prices.

Ball and Brown [2] were the first to note that after earnings are announced, estimated cumulative abnormal returns continue to drift up for firms that are perceived to have reported good financial results for the preceding quarter and drift down for firms whose results have turned out worse than the market had expected. The discovery of Post Earnings Announcement Drift (PEAD), which is a violation of the semi-strong Efficient Market Hypothesis, seems to suggest that while stock markets are generally efficient, there may be information leakages around the announcement dates, coupled with post-earnings drift, resulting in price movement anomalies. It also seems to suggest that past stock price information, or other past economic or financial information, can potentially be used to predict price movement following a significant economic event such as an earnings announcement.

Research on Post Earnings Announcement Drift proliferated in the late 1980s and 1990s. Fama and French [3] show that average stock returns co-vary with three factors, namely, the market risk factor, the book-to-market factor, and the size factor. Bhushan suggests that the existence of sophisticated and unsophisticated investors, transaction costs, and economies of scale in managing money can explain the market's delayed response to earnings [4].
We notice that nearly all previous research pooled companies with negative and positive earnings surprises when measuring the effect of earnings surprises on abnormal returns, and regressed the absolute value of the earnings surprise as well as other factors against the absolute value of the abnormal return [5]. However, we have found that stock markets do not react symmetrically to negative and positive earnings surprises, and that many more factors are in play driving the near term risk adjusted returns of a stock following an earnings release.

Rather than trying to analyse the link between PEAD and economic and accounting factors, as commonly seen in the literature, by using machine learning models we leap straight to the more important goal of predicting the direction of PEAD. In this process we have overcome a number of constraints commonly seen in previous research: we include a much wider range of factors, covering both fundamental and technical/momentum factors, and we achieve a higher level of generality without having to pre-group companies by the value of their earnings surprises or other attributes prior to the analysis or prediction (subsample analysis), which is common in the literature [6]. Additionally, we have chosen 1,106 stocks that are or once existed as components of the Russell 1000 index (which tracks approximately the 1,000 largest public companies in the US) during the chosen time period between 1997 and 2018. Our selection includes companies that either went bankrupt or dropped out of the Russell 1000, significantly reducing survivorship bias in our training data. This test population is larger than in most earlier studies of a similar nature.
For example, Beyaz and colleagues chose only 140 stocks from the S&P500 when they attempted to forecast stock prices both six months and a year out based on fundamental and technical analysis [7], and Bradbury used a sample of only 172 firms to research the relationships among voluntary semi-annual earnings disclosures, earnings volatility, unexpected earnings, and firm size [8]. Our results should therefore generalise better to the universe of stocks on the US markets.

Recognising the highly nonlinear nature of stock price movements, we have chosen to run our experiments using XGBoost, a state-of-the-art supervised model. We divide the data into in-sample and out-of-sample periods of varying lengths and use the in-sample data set to optimise a model's hyperparameters before training it. Our earlier experiments showed that grid search, as a traditional way of finding an optimal parameter set, is inexhaustive and can be very slow. Instead, we have chosen the highly adaptable Genetic Algorithm (GA) to optimise our models [9]. We recognise that hyperparameter optimisation is a delicate step and that searching with a limited set of parameters will result in a non-optimal model which will not be able to fit the essential structure of the training data set. To avoid this potential problem, we have chosen a broad value range and a small granular step for each of the hyperparameters. A 5-fold cross validation (CV) is employed within each GA iteration during the optimisation.

Our machine learning based approach is in direct contrast to most earlier financial research in the literature, as typified by [10], which sought to devise different portfolios a priori by different factor characteristics and tried to analyse and make sense of the link between portfolio returns and the corresponding economic factors that segregated the portfolios.
Instead, our model automatically learns the intrinsic link between the input feature space and stock price returns with no a priori assumptions. We find that stocks belonging to different industrial sectors can have their PEAD movements driven by different primary factors and that such factors can also change from quarter to quarter. Despite such differences and changes in the driving factors, a GA optimised XGBoost model is able to pick up the underlying signals embedded in our engineered features and forecast the 30 day PEAD direction with reasonable accuracy. We also study the possibility of grouping stocks into portfolios according to their predicted levels of Cumulative Abnormal Return (CAR). We have found that ranking the out-of-sample stocks by their predicted returns helps form portfolios which consistently offer higher positive returns and lower negative returns, a result that could potentially form the basis of market neutral long-short trading strategies. Finally, we look at the challenges of applying predictive models in live markets due to ever changing market prices and the asymmetrical level of information access enjoyed by certain market participants. We share a tactic that can turn a model's forecasts into actionable signals.

II. Related Work
Since the discovery of Post Earnings Announcement Drift as a stock market anomaly by Ball and Brown [2], who documented the return predictability for up to two months after the annual earnings announcements, extensive research has been carried out in the literature, though with varying results. For example, Foster, Olsen, and Shevlin [11] found that systematic post-announcement drifts in security returns are only observed for a subset of earnings expectations models when testing drifts in the [+1, +60] trading day period. In recent years, the literature has become less limited to the specific study of PEAD and has instead put more focus on the direct prediction of stock price movement using stocks' fundamental and/or technical information, again with varying rates of success. Malkiel studied the impact of price/earnings (P/E) ratios and dividend yields on stock prices using the Campbell-Shiller model. He conceded that his work demonstrated that exploitable arbitrage did not exist for investors to earn excess risk-adjusted returns, and he could not find a market timing strategy capable of producing returns exceeding a buy-and-hold on a broad market index [12]. Olson and Mossman, on the other hand, not only showed that an artificial neural network (ANN) outperforms traditional regression based methods when forecasting 12-month returns by examining 61 financial ratios for 2,352 Canadian stocks but, more importantly, showed that by using fundamental metrics sourced from earnings reports they were able to achieve excess risk-adjusted returns [13].

Other authors went beyond metrics from earnings reports and attempted stock forecasts using both fundamental and technical analysis. Sheta et al. explored the use of ANNs, Support Vector Machines (SVMs), and Multiple Linear Regression for the prediction of the S&P500 market index. They selected 27 technical indicators as well as macroeconomic indicators and reported that the SVM contributed better predictions than the other models tested [14]. Hafezi et al.
considered both fundamental and technical analyses in a novel model called the Bat-neural Network Multi-agent System when forecasting stock returns. The resulting mean absolute percentage error showed that the new model performed better than a typical Neural Network coupled with a GA [15]. Alternative data are becoming popular, too. Solberg and Karlsen investigated the possibility of predicting the direction of stock prices using transcripts of earnings conference calls. By analysing 29,330 different earnings call transcripts between 2014 and 2017 using four different machine learning algorithms, they managed to achieve a classification error rate of 43.8 % using logistic regression and beat the S&P500 benchmark using both logistic regression and gradient boosting. Their results showed that earnings calls contain predictive power for the next day's stock price direction post earnings release [16].

Researchers have also studied how machine learning can directly benefit financial trading. Through a series of applications involving hundreds of predictors and stocks, Huck looked at how to apply some of the state-of-the-art machine learning techniques to manage a long-short portfolio. In that process he also explored a series of practical questions with regard to the predictor data and was able to show that the techniques he examined generated useful trading signals for portfolios with short holding periods [17]. Sant'Anna and Caldeira applied Lasso regression to index tracking and long-short investing strategies. They used stocks from three benchmarks, the S&P100, the Russell 1000, and the Ibovespa Index from Brazil, from 2010 to 2017 to assess the quality of Lasso-based tracking portfolios.
By using co-integration as a benchmark method to solve the same problems, they demonstrated that the Lasso regression based approach was able to form portfolios that produced similar returns to those obtained with co-integration, but incurred significantly lower transaction costs [18].

As a model that has only recently burst onto the scene, XGBoost has seen limited study in financial applications. Chatzis et al. [19] evaluate the possibility of a market crash over 1-day and 20-day horizons across the global markets by forecasting 1-day and 20-day stock market returns and checking whether they will have dropped below a low quantile of the historical distribution of stock market returns. Using a vast set of data from global stock, bond, and FX markets, the paper explores a large set of supervised learning models including Logistic Regression, Decision Trees, Random Forests, Support Vector Machines, Deep Neural Networks, and XGBoost. The paper draws conclusions by declaring the superiority of certain models, including XGBoost, over others by examining the forecast results on stock returns through a list of statistical measurement metrics. Li and Zhang [20] use XGBoost to dynamically predict the values of a set of seven factors that contribute to a stock selection process. Dynamically generated factors are then used to select a portfolio of different stocks whose return is measured over a multi-year period. Portfolios of dynamically selected stocks are shown to perform better than benchmark portfolios.
III. Model Features Generation
We have chosen to use 1,106 US companies in the Russell 1000 index in total. The data time frame is between the first financial quarter of 1997 (1997 Q1) and the fourth financial quarter of 2018 (2018 Q4). The model output is the 30 day Cumulative Abnormal Return post earnings release of each individual stock, and the input space consists of the following set of unadjusted data which we have sourced from Bloomberg:

• Financial statements data
• Earnings Surprise data
• Momentum indicator data
• Short interest data

In total, we have sourced 97,901 quarterly financial statements from our chosen companies over the test time frame. The final population of valid data points used for training and testing, whose input features include both financial statement metrics and other economic metrics, stands close to 50,000, depending on the test cases. There are a number of reasons for the reduced population: (a) there are no Earnings data, Short interest data or other input feature data on Bloomberg for a good number of historical financial quarters within the test time frame; (b) we have discarded companies in certain historical quarters when the earnings reports suffered badly from missing data; (c) we have been very careful with whether an earnings report was released before the market opened, after the market closed or during trading hours, as such a difference is significant and we would need to alter the forecast starting point accordingly. Bloomberg is missing the release time of day for some financial quarters in earlier years, and we have discarded those quarters.

A. Financial Statements data
As shown in Table 1, twenty-nine metrics from earnings reports have been chosen to create training data. Based on the reported values of these metrics, we have engineered new features as the quarterly change and the yearly change of each of these financial metrics.
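As an illustration, the two engineered change features might be computed along the following lines. This is only a sketch: the function name is ours, and we show changes as simple differences, while a ratio-based change is an equally plausible reading of the text.

```python
def change_features(metric_by_quarter):
    """Given one metric's quarterly values (oldest first, at least five
    quarters), return its quarterly change and yearly change for the
    latest quarter."""
    q_change = metric_by_quarter[-1] - metric_by_quarter[-2]  # vs. previous quarter
    y_change = metric_by_quarter[-1] - metric_by_quarter[-5]  # vs. same quarter last year
    return q_change, y_change

# Hypothetical eight quarters of Return On Assets (%):
roa = [5.0, 5.2, 5.1, 5.3, 5.6, 5.4, 5.5, 5.9]
print(change_features(roa))
```

The same pair of features would be generated for each of the twenty-nine metrics and stacked into the input space.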
B. Earnings Surprise data
Earnings Surprise represents how much a company's actual reported Earnings Per Share (EPS) is more (or less) than the average of a selected group of stock analysts' estimates of the same quarter's EPS. We do not calculate Earnings Surprise as a %change between the reported EPS and the market estimated EPS because (a) the %change is very volatile when the EPS level is close to zero and a small change can lead to a misleadingly large %change, and (b) we would like to avoid the change-of-signs problem.

Cash                                 Operating Margin
Cash from Operating Activities       Price to Book Ratios
Cost of Revenue                      Price to Cashflow Ratios
Current Ratio                        Price to Sales Ratios
Dividend Payout Ratio                Quick Ratio
Dividend Yield                       Return On Assets
Free Cash Flow                       Return On Common Equity
Gross Profit                         Revenue
Income from Continued Operations     Short Term Debt
Inventory Turnover                   Total Asset
Net Debt to EBIT                     Total Asset
Net Income                           Total Debt to Total Assets
Operating Expenses                   Total Debt to Total Equity
Operating Income                     Total Inventory
                                     Total Liabilities

Table 1: Earnings report metrics chosen as input features

We have subsequently engineered the following three features related to Earnings Surprise:

• Current quarter's Earnings Surprise (reported EPS minus market estimated EPS);
• Difference between the current quarter's Earnings Surprise and that of the previous quarter;
• Difference between the current quarter's Earnings Surprise and the average Earnings Surprise of the preceding three quarters.
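The three surprise features could be derived from aligned series of reported and estimated EPS roughly as follows (a sketch; the function name and example values are hypothetical):

```python
def surprise_features(reported_eps, estimated_eps):
    """Return the three engineered Earnings Surprise features for the
    latest quarter, given at least four quarters of history (oldest first)."""
    # Surprise is kept as a level (reported minus estimated), not a %change,
    # to avoid instability near zero EPS and the change-of-signs problem.
    surprises = [r - e for r, e in zip(reported_eps, estimated_eps)]
    current = surprises[-1]
    vs_previous_quarter = current - surprises[-2]
    vs_avg_prior_three = current - sum(surprises[-4:-1]) / 3
    return current, vs_previous_quarter, vs_avg_prior_three

# Hypothetical EPS history over four quarters:
reported  = [1.00, 1.10, 0.95, 1.20]
estimated = [0.98, 1.05, 1.00, 1.10]
print(surprise_features(reported, estimated))
```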
C. Momentum Indicators
We have chosen the following technical/momentum indicator values, calculated on the same day an individual company's quarterly earnings data was released:

• 9-day Relative Strength Index (RSI)
• 30-day Relative Strength Index
• 5-day Moving Average / 50-day Moving Average
• 5-day Moving Average / 200-day Moving Average
• 50-day Moving Average / 200-day Moving Average

All of these indicators measure, in one way or another, how a stock's recent short term movements compare to its historical movements further back in time. The inclusion of momentum indicators is motivated by the intention to let the prediction of future stock movements take a stock's recent movement trend into account, as information leakage does happen prior to financial reporting. We have engineered the three ratios of short term moving averages to near or long term moving averages as proxies for golden cross indicators.
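A minimal sketch of how such indicator values can be computed from a daily close series. Note the paper does not specify the RSI smoothing; we show the simple-average variant, while Wilder's exponential smoothing is an equally common choice.

```python
def rsi(closes, period=9):
    """Relative Strength Index over the last `period` bars
    (simple-average variant)."""
    gains, losses = [], []
    for prev, cur in zip(closes[-period - 1:-1], closes[-period:]):
        gains.append(max(cur - prev, 0.0))
        losses.append(max(prev - cur, 0.0))
    avg_gain, avg_loss = sum(gains) / period, sum(losses) / period
    if avg_loss == 0:
        return 100.0          # no down days in the window
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)

def ma_ratio(closes, short, long):
    """Ratio of a short moving average to a longer one, a proxy for
    golden-cross style indicators (>1 suggests recent strength)."""
    return (sum(closes[-short:]) / short) / (sum(closes[-long:]) / long)

# Toy monotonically rising close series:
closes = [float(i) for i in range(1, 61)]
print(rsi(closes), round(ma_ratio(closes, 5, 50), 4))  # → 100.0 1.6338
```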
D. Short Interest data
The short interest ratio is released for most companies twice a month and is calculated by dividing the number of shares sold short in a stock by the stock's average daily trading volume. The short interest ratio is a good gauge of how heavily shorted a stock is relative to its trading volume. The most recent short interest ratio for each company prior to its earnings release is sourced as an input feature to the model for that company.
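Selecting the most recent release before each earnings date is an as-of lookup, which could be sketched as follows (the dates, values, and ISO-string date representation are illustrative assumptions):

```python
import bisect

def latest_short_interest(si_dates, si_ratios, earnings_date):
    """Return the most recent short interest ratio released before
    `earnings_date`. Dates are ISO strings, so they sort lexically;
    `si_dates` must be sorted ascending."""
    i = bisect.bisect_left(si_dates, earnings_date)
    return si_ratios[i - 1] if i > 0 else None

# Hypothetical twice-monthly releases of shares-short / avg daily volume:
dates  = ["2018-01-15", "2018-01-31", "2018-02-15"]
ratios = [2.1, 2.4, 1.9]
print(latest_short_interest(dates, ratios, "2018-02-01"))  # → 2.4
```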
IV. Data Pre-processing
With 1,106 companies in total involved over 21 years, there is a lot of data representing input features for each company in each quarter. In order for the data to be understood by the model, we put them into a matrix-like data structure $A \in M_{m \times n}(\mathbb{R})$, where each of the $m$ rows represents an $n$-dimensional training data point, indexed by the pairing of a company name and a historical quarter, and each column holds data of the same feature from all the data points.

Before we put the data of all the companies and all the quarters into a matrix, we pre-process each company's data to deal with outliers and to standardise the data of every company. Firstly, we employ Winsorisation [21] to reduce the number of outliers present in the input features. This is carried out on the feature data of each individual company. Secondly, we standardise a selected group of features of each company. Every company's standardised features are then stacked back into the full training data set. The pre-processing process is illustrated in Figure 1.

Figure 1: Steps of Data Pre-processing

V. Models and Methods
A. Extreme Gradient Boosting
Extreme Gradient Boosting (XGBoost) is a scalable machine learning system for tree boosting invented by Tianqi Chen [22], which has gained much prominence in recent years. It distinguishes itself from other existing tree boosting methods [23] [24] by having cache-aware and sparsity-aware learning. The former gives the system twice the speed of an otherwise identical but non-cache-aware greedy tree splitting algorithm, and the latter gives a 50 times speed-up over a naive implementation handling the Allstate-10k dataset [22]. More importantly, XGBoost has achieved algorithmic optimisations by introducing a regularised learning objective within a tree structure, which helps achieve smart tree splitting and branch pruning.

For a data set in matrix form $A \in M_{m \times n}(\mathbb{R})$ with $m$ data points and $n$ features, a tree ensemble model uses $K$ base learner functions to predict the output:

$$\hat{y}_i = \phi(x_i) = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in \mathcal{F}, \qquad (1)$$

where $\mathcal{F}$ is the space of regression trees. Each hypothesis $f_k$ corresponds to an independent tree structure $q$ with leaf scores $\omega$. XGBoost utilises regression trees, each of which contains a score on each of its leaves. These scores help form the decision rules in the trees to classify each set of inputs into leaves and calculate the final predicted output by summing up the scores in the related leaves. Unlike other standard gradient boosting models such as AdaBoost and GBM, which do not intrinsically perform regularisation, XGBoost minimises a regularised loss function in order to learn the set of functions:

$$\mathcal{L}(\phi) = \sum_i \ell(\hat{y}_i, y_i) + \sum_k \Omega(f_k). \qquad (2)$$

Here, $\ell$ is a differentiable convex loss function for the model output, and the regularisation term is defined as (though not limited to) $\Omega(f) = \gamma T + \frac{1}{2}\lambda \lVert \omega \rVert^2$, which reduces the chance of overfitting.
As in a typical gradient tree boosting model, a new base learner regression tree $f_t$ which most reduces the loss function in Equation (2) is greedily and iteratively added. Letting $\hat{y}_i^{(t)}$ be the model output for the $i$-th instance at the $t$-th iteration, the loss function can be re-written as

$$\mathcal{L}^{(t)} = \sum_i \ell\big(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\big) + \Omega(f_t). \qquad (3)$$

By taking the Taylor expansion of this loss function up to the second order and removing the constant terms resulting from the expansion, the loss function can be simplified to

$$\tilde{\mathcal{L}}^{(t)} = \sum_{j=1}^{T} \Big[ G_j \omega_j + \tfrac{1}{2}(H_j + \lambda)\,\omega_j^2 \Big] + \gamma T, \qquad (4)$$

where

$$G_j = \sum_{i \in I_j} g_i, \quad H_j = \sum_{i \in I_j} h_i, \quad I_j = \{\, i \mid q(x_i) = j \,\},$$
$$g_i = \partial_{\hat{y}^{(t-1)}} \ell\big(y_i, \hat{y}^{(t-1)}\big), \quad h_i = \partial^2_{\hat{y}^{(t-1)}} \ell\big(y_i, \hat{y}^{(t-1)}\big).$$

Here, $T$ is the number of leaves in the tree. With each $\omega_j$ independent of the others, Tianqi [22] has proven that the best $\omega_j$ for a given tree structure $q(x)$ is

$$\omega_j^{*} = -\frac{G_j}{H_j + \lambda}, \qquad (5)$$

which in turn brings the objective function to its final form:

$$\tilde{\mathcal{L}}^{*}(q) = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T. \qquad (6)$$

Ideally, the model would enumerate all possible tree structures with a quality score and pick the best one to be added iteratively. In reality, this is intractable and optimisation has to be executed one tree level at a time. This is made possible by the final form of the loss function, as the model uses it as a scoring function to decide on the optimal leaf splitting point. Assume that $I_L$ and $I_R$ are the instance sets of the left and right nodes after the split. Letting $I = I_L \cup I_R$, the scoring function for leaf splitting is

$$\mathcal{L}_{split} = \frac{1}{2}\left[ \frac{\big(\sum_{i \in I_L} g_i\big)^2}{\sum_{i \in I_L} h_i + \lambda} + \frac{\big(\sum_{i \in I_R} g_i\big)^2}{\sum_{i \in I_R} h_i + \lambda} - \frac{\big(\sum_{i \in I} g_i\big)^2}{\sum_{i \in I} h_i + \lambda} \right] - \gamma. \qquad (7)$$
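Equations (5) and (7) are simple enough to check numerically; a sketch follows (the function names are ours, and the gradient statistics are made up):

```python
def leaf_weight(G, H, lam):
    """Optimal leaf score from Eq. (5): w* = -G / (H + lambda)."""
    return -G / (H + lam)

def split_gain(G_L, H_L, G_R, H_R, lam, gamma):
    """Split score from Eq. (7): structure score of the two children minus
    that of the unsplit node, less the complexity penalty gamma."""
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(G_L, H_L) + score(G_R, H_R)
                  - score(G_L + G_R, H_L + H_R)) - gamma

# With squared-error loss, g_i = prediction - target and h_i = 1,
# so H is simply the instance count of a node. Made-up statistics:
print(leaf_weight(-4.0, 2.0, 1.0))
print(split_gain(-4.0, 2.0, 3.0, 2.0, lam=1.0, gamma=0.0))
```

A larger `gamma` makes the gain negative for weak splits, which is exactly how the regularised objective prunes branches.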
These scores are then used by a method called the exact greedy algorithm to enumerate all the possible splits for continuous features, allowing each level of a tree to be optimised and the overall loss function to be minimised in the process. When deployed on a distributed platform, XGBoost employs approximate algorithms instead to alleviate the huge memory consumption demanded by the exact greedy algorithm, although this is not needed in our experiments, which run on a single machine.

B. Hyperparameter Optimisation
Model optimisation is one of the two most important steps (the other being data cleansing) in ensuring that the model output meaningfully captures the underlying dynamics of the dependent variable. In search of optimal hyperparameter sets, we initially experimented with the more straightforward approach of grid search but found it less effective in its performance and inexhaustive in its search results. GA, as an adaptable and easily extensible heuristic optimisation method, was chosen instead to carry out this task. Table 2 gives the list of model hyperparameters we have put through the GA. Before we start the optimisation process, we first split the population of data into training data and test data. The selection of the out-of-sample test data varies and depends on the nature of a test, which will be explained in subsequent sections. It is the training data that we use to optimise the model. We use 5-fold cross validation to calculate the fitness value of a particular set of hyperparameters examined by the GA. To do that, we split the training data into five equal groups, use four groups to train the model, and calculate the fitness value using the last group (validation). This process is repeated five times, iterating over each of the five groups, and the final fitness value is the average fitness of the five iterations.

To optimise the model, each hyperparameter is randomly initialised according to its own valid range of values. This initialisation is repeated 40 times so that we have 40 sets of randomly initialised hyperparameters to start the GA process with. Each set is called a population, and each hyperparameter within a set is called a chromosome. All of the 40 populations are considered to be part of the current generation. The GA process carries out a 5-fold cross-validation on a model using each of the 40 populations of parameters and, when finished, keeps the 20 populations that have produced the smallest fitness values in the cross validation step.
These 20 sets, or populations, of hyperparameters are considered to have performed better in forecasting post-announcement drifts with the current model than the 20 discarded ones. The 20 better populations are then used to cross-breed 20 new populations, and in this process mutation is allowed to happen to the cross-bred populations, i. e., chromosomes in the 20 newly created populations are allowed to randomly change value following a predefined level of probability. At the end of this process, we have produced a new and potentially better set of 40 populations of hyperparameters, which we call the new generation. The new generation is then fed through a second iteration of the GA process, until eventually the minimum fitness value produced by the cross-validation step no longer changes within tolerance; at this point we have arrived at the optimal set of hyperparameters, the one which produces the smallest fitness value when used in the current model. Figure 2 shows how the GA and Cross Validation work together to produce, for each model, the set of hyperparameters resulting in the highest prediction accuracy (smallest fitness value) on the validation set.
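The GA loop described above might be sketched as follows. Note that in standard GA terminology each hyperparameter set would be called an individual rather than a population; the value ranges, mutation rate, and toy fitness function below are all our own assumptions, with the mean 5-fold CV loss of an XGBoost model standing in for `fitness` in the real pipeline, and a fixed generation count standing in for the paper's convergence-within-tolerance stopping rule.

```python
import random

# Hypothetical search space mirroring Table 2; the ranges are assumptions.
SPACE = {
    "max_depth":        list(range(2, 11)),
    "learning_rate":    [round(0.01 * i, 2) for i in range(1, 31)],
    "gamma":            [round(0.1 * i, 1) for i in range(0, 21)],
    "subsample":        [round(0.05 * i, 2) for i in range(10, 21)],
    "min_child_weight": list(range(1, 11)),
    "colsample_bytree": [round(0.05 * i, 2) for i in range(10, 21)],
}

def random_individual(rng):
    # One candidate hyperparameter set (a "population" in the paper's terms).
    return {k: rng.choice(v) for k, v in SPACE.items()}

def crossover(a, b, rng):
    # Uniform crossover: each chromosome comes from either parent.
    return {k: (a if rng.random() < 0.5 else b)[k] for k in SPACE}

def mutate(ind, rng, p=0.1):
    # With probability p, re-draw a chromosome from its valid range.
    return {k: (rng.choice(SPACE[k]) if rng.random() < p else v)
            for k, v in ind.items()}

def ga_optimise(fitness, generations=10, pop_size=40, seed=0):
    """Minimise `fitness` over the search space."""
    rng = random.Random(seed)
    generation = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        generation.sort(key=fitness)
        parents = generation[: pop_size // 2]   # keep the better half (20 of 40)
        children = [mutate(crossover(rng.choice(parents),
                                     rng.choice(parents), rng), rng)
                    for _ in range(pop_size - len(parents))]
        generation = parents + children
    return min(generation, key=fitness)

# Toy fitness standing in for CV loss: pretend depth 4 and rate 0.1 are ideal.
best = ga_optimise(lambda h: abs(h["max_depth"] - 4) + abs(h["learning_rate"] - 0.1))
print(best["max_depth"], best["learning_rate"])
```

Because the better half is carried over unchanged, the best fitness found can never worsen between generations, which is what lets the stopping rule watch for a stagnating minimum.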
VI. Results
All of our experiments centre around the 30 day post-earnings Cumulative Abnormal Return (CAR) as a measure of risk adjusted stock price return. An abnormal return is the difference between the actual return of a security and its expected rate of return:

$$AR_{it} = r_{it} - E(r_{it}), \qquad (8)$$

where $AR_{it}$ is the one-day abnormal return for company $i$ on day $t$, $r_{it}$ is the actual one-day stock return, and $E(r_{it})$ is the expected one-day return of stock $i$. As explored by Kim [10], there is a variety of ways of evaluating the expected return, including quantitative models such as the one-factor CAPM model and the Fama French three-factor model [3]. In our experiments, we choose the S&P500 index return to represent the broader market's return and use it to proxy a stock's expected return. Consequently, our model output for stock $i$, which is the cumulative abnormal return from $T_1$ to $T_2$, is defined as

$$CAR_i(T_1, T_2) = \sum_{t=T_1}^{T_2} AR_{it} = \sum_{t=T_1}^{T_2} \big( r_{it} - r_{S\&P,t} \big). \qquad (9)$$

Hyperparameters: Gamma, Max depth, Sub sample, Learning Rate, Minimum child weight, Column sample by tree
Table 2: XGBoost hyperparameters optimised by GA + CV

Figure 2: Hyperparameter Optimisation using GA + CV

A. Single Stock Forecast
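As a concrete illustration of Equations (8) and (9), the CAR target could be computed as follows (a sketch with made-up daily returns over a 5-day window; the experiments use a 30 day window):

```python
def cumulative_abnormal_return(stock_returns, benchmark_returns):
    """Eq. (9): sum of daily stock returns in excess of the S&P500 return
    over the same window (returns expressed as decimal fractions)."""
    assert len(stock_returns) == len(benchmark_returns)
    return sum(r - b for r, b in zip(stock_returns, benchmark_returns))

# Made-up window in which the stock beats the index by 1% a day:
stock = [0.02, 0.01, -0.01, 0.03, 0.00]
spx   = [0.01, 0.00, -0.02, 0.02, -0.01]
print(round(cumulative_abnormal_return(stock, spx), 4))  # → 0.05
```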
In this experiment, we have chosen stocks that filed earnings with the SEC in the four quarters of each financial year from 2014 to 2018 as our out-of-sample test population. That means we first run a forecast on the movement direction of all the stocks that reported earnings in 2014, using all the data prior to 2014 as training data. Once done, we move on to repeat the same exercise for stocks that reported in 2015, and so on. It should be noted that a company that filed in each of the four quarters of a financial year is treated as four independent data points, since the only data consumed by the XGBoost + GA model to predict the PEAD direction of a stock in any quarter are the near term momentum signals and the financial statement data of that stock in that particular quarter.

Separately, the same test as described above is also repeated on stocks belonging to a particular industrial sector. Bloomberg categorises US companies into nine sectors:
Industrial, Basic Materials, Consumer Cyclical, Consumer Non-Cyclical, Financial, Technology, Communications, Energy, and Utilities. Our chosen companies and their data are divided up into these groups by industrial sector so that we can run the same tests per industrial group.

Each such test, whether on all of the stocks or on stocks belonging to a particular industrial sector, is run 100 times. In each run, the same set of training data is used to train the model, whose performance is verified using the same set of out-of-sample test data. This generates 100 sets of results for each test, from which statistics are calculated.

The results of predicting the 30 day PEAD direction in each year from 2014 to 2018 and over the industrial sectors are presented in Figure 3. Our results clearly suggest that our model has strong prediction power and is able to pick up the patterns in the input data space when there is any driver in it. It is particularly interesting to see that the model performs much better with stocks from some industrial sectors than with others. Given that the same set of input features is used across the board, there is clear evidence that our data is more impactful for some sectors than others. There are probably two reasons that can explain this observation. First, there are other data, not included in our feature space, that do affect stock movements following an earnings release. Second, stocks from different sectors are subject to different drivers, i. e., investment personnel and computer trading algorithms look for signals in different financial metrics.

Figure 3: Accuracy rate of predicting 30 day PEAD movement for different sectors
Even if the same driving features are exam-ined, the implicit feature weighting must be different for differentsectors.The first reason is true, as there are impactful data that haveyet to be included in our research, such as management’s guid-ance, recent revisions of analysts’ price forecast, other text in-formation carried in financial reports, and meeting minutes withanalysts, etc. It is entirely possible that certain stocks, or stocksfrom certain sectors are more susceptible to those data and theabsence of such data reduces the model’s prediction accuracy onthose stocks.We have taken a closer look at the second possible cause andwe find indeed stocks from different sectors are driven by differ-ent factors. With the 100 groups of tests we have carried out onindividual sectors in each financial year from 2014 to 2018, wehave counted the appearance of the three factors that appear mostoften as the top five driving forces. The results are recorded inthe Appendix section of this paper and present some very inter-esting findings. First, most of the time, it is the three EPS relatedmetrics that feature heavily on the top five spots of most influen-tial factors. This finding is consistent with market practice andEarning Per Share surprise/disappointment is indeed one of themost important factors that investors examine. We need to pointout that two of these features are engineered by us, which repre-sent how the current quarter’s reported EPS compares to thoseat the preceding quarters. The fact that these two factors alsodominate shows that investors look for more complex movementsin financial metrics. Second, over the years, we consistently seeimportant albeit less strong features appearing on the top five listfor some of the sectors. For instance, the quarterly change in Re-turn On Assets, Price-to-Sales Ratio, and Dividend Payout Ratioare consistently making up the top five spots driving PEAD ofstocks from the
Industrial, Financial, and Basic Materials sectors respectively. We have carried out separate forecasting tests using the three EPS features only, and did not obtain good results, which shows that the model cannot be driven purely by a handful of key features; other less strong but still impactful features must not be ignored. Third, in the years when our model produces better prediction results for a particular sector, we frequently see features that are more consistently dominant. This is reflected in the higher occurrence counts observed for the dominant features, as can be seen in the results for the
Industrial, Consumer Cyclical, and Consumer Non-Cyclical sectors. The opposite is also true. With
Energy and Utilities being the most difficult sectors to predict, the model returns an inconsistent set of top drivers across the five yearly tests, among which the occurrence counts are also comparatively lower. Without strong and consistent drivers among our feature data for such sectors, the prediction result is unsurprisingly poorer.

The results of our experiments in this section help us conclude that Post-Earnings-Announcement Drift is not merely a market anomaly, but a characteristic of the markets whose direction can be materially predicted. The strength of the signal may vary in time and from sector to sector, but machine learning models, especially an XGBoost well optimised by GA, are able to pick up on it. However, the fact that the model performs well with stocks from certain sectors but not others suggests that there may be limitations in our input feature space or in the way the special features have been engineered. The input space may not have captured enough driving factors for certain sectors. We acknowledge that, when a company makes an earnings announcement, information comes out in many different forms, such as financial metric numbers, textual information embedded in the documents filed with the SEC, and earnings calls with a selected group of equity analysts, let alone information leakage prior to the announcement or even insider trading. Trying to capture and take advantage of more forms of drivers of Post-Earnings-Announcement Drift is a future research topic.
B. PEAD Analysis on Portfolio of Stocks
We believe it is meaningful to evaluate the dynamics of post-earnings drifts in the context of portfolios. In the subsequent series of tests, we use the model to forecast the actual level of 30-day post-earnings cumulative abnormal returns instead of only the movement direction. We rank out-of-sample stocks according to each stock's predicted return from high to low, group stocks together into small portfolios, and examine the actual returns of the portfolios.

B.1. Stocks that reported earnings in the same financial quarter
We’ve mainly examined stocks that reported earnings in the samefinancial quarter in the years between 2014 and 2018. In eachtest we use all the data prior to the test quarter for training andcarry out stock return prediction on stocks in the test quarter. Inall the tests once the stocks being examined have been rankedaccording to their model-predicted post earnings returns, we areconsistently observing that portfolios which include stocks fromtop quantiles of the ranked list are producing higher positivereturns, whereas portfolios which include stocks from bottomquantiles are producing lower negative returns. Theoretically, along-short market neutral strategy [25] could be formed throughlonging the top quantile portfolios and shorting the bottom ones.We’ve selected the Q3 2018 and Q4 2018 earning seasonsto demonstrate the results . One of the reasons for choosingthese two quarters is that the US stock market went through twopolar opposite phases of development in these two quarters withthe S&P500 shedding 20 % in the last quarter of 2018 (aroundthe time most Q3 2018 earnings were reported in the US) amidfear of Fed rate rises and US-China trade war escalation amongother things but gaining a major rebound in the first quarter of2019 (when most US companies reported Q4 2018 earnings).Our intention is to evaluate if our model can successfully capturethose very different PEAD dynamics given very different macroconditions and different company specific accounts.Each point on figures 4 and 5 corresponds to the actual re-turn of a portfolio consisting of 100 stocks when we move downthe list of out-of-sample stocks which have now been ranked bytheir predicted 30-day risk-adjusted returns following earningsreleases. For instance, the first point is the actual return of aportfolio consisting of the 1st to the 100th stocks and the secondpoint is the actual return of a portfolio including the 2nd to the101st stocks, etc. 
We consistently produce similar figures with a downward slope when we repeatedly run the same tests. Our model has captured an unseen collective trend of movement by groups of stocks as triggered by their earnings releases and other relevant economic factors.

B.2. Stocks that reported earnings on the same date
If we were to construct market neutral portfolios, practically speaking it would only make sense if we could execute the buying and short-selling of model-chosen stocks within a short time frame, such as within a day or ideally less. Here, we run the same portfolio test on stocks which filed earnings with the SEC on the same date. Two dates in 2018 with busy earnings release activity were chosen for demonstration. Figures 6 and 7 show the returns of portfolios created using stocks that reported earnings on each date. The stocks have been ranked by their model-predicted 30-day post-earnings CAR before being grouped into portfolios. Once again, the combination of the XGBoost-GA model and our engineered input features produces the kind of results that can be used to rank stocks and construct portfolios which would yield higher positive returns or lower negative returns.

Table 3 gives statistics on how the top quantile portfolios and bottom quantile portfolios perform compared against the average return of all the out-of-sample stocks. In some cases, returns from portfolios consisting of top/bottom quantile stocks are considerably higher/lower than the out-of-sample stock population's average. Such patterns of portfolio returns could theoretically have made them good candidates for a long-short strategy capitalising on earnings release events. This, however, is made difficult in reality, as we will discuss when we turn to the timeliness of the signals.
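The long-short construction referred to above (long the top quantile of the ranked list, short the bottom quantile) can be illustrated with a minimal sketch; the quantile width and the return lists are hypothetical, and this is not a full strategy implementation:

```python
def long_short_return(predicted, actual, quantile=0.2):
    """Spread earned by longing the top quantile of stocks ranked by
    predicted return and shorting the bottom quantile."""
    n = len(predicted)
    k = max(1, int(n * quantile))
    order = sorted(range(n), key=lambda i: predicted[i], reverse=True)
    long_leg = sum(actual[i] for i in order[:k]) / k    # buy the top quantile
    short_leg = sum(actual[i] for i in order[-k:]) / k  # sell the bottom quantile
    return long_leg - short_leg
```

Because the strategy is long one leg and short the other, the broad market's direction largely cancels out; the spread is positive whenever the ranking genuinely separates future winners from losers.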
C. Trading on Earnings Event Signals
In the aforementioned experiments we have chosen the last publicly available tradable stock price before an earnings release as the starting point of a 30-day forecasting period. This is an intuitive choice and commonly seen in the literature. For instance, Erlien [26] uses the end point of her training window as the beginning of a calculation window for cumulative abnormal returns. Similarly, when examining how numerous factors drive the revision of analysts' consensus forecasts of a company's EPS, Ahmed and Safdar [27] collect the final consensus available prior to the earnings announcement to start the forecast period.

However, in reality a company's stock price moves on receipt of the first trickle of news. Information is never symmetrical, and some parties always possess greater material knowledge than others. They can and will act on such material information, driving the stock price away from the last tradable price before the wider market gains access to the same level of information. Also, the incorporation of earnings information into the latest price is hugely accelerated by the presence of algorithmic trading systems, as verified by Frino et al., who studied a unique dataset obtained from the Australian Securities Exchange [28]. Correct forecasts of stock movements upon financial events are not practically useful unless they can be acted upon.

With this in mind, we have attempted to forecast cumulative abnormal returns from 1 day after the announcement to 30 days after, i.e., CAR from t0 + ∆t to t0 + T, with ∆t = 1 day. Our results show that this forecast is inferior, with accuracy of around 50 % and sometimes less, and cannot be relied upon.

Figure 4: 2018 Q4 test result. Actual returns of moving portfolios consisting of 100 stocks. All stocks have been pre-ranked by the model-predicted stock returns from high to low.

Figure 5: 2018 Q3 test result. Actual returns of moving portfolios consisting of 100 stocks. All stocks have been pre-ranked by the model-predicted stock returns from high to low.
This is not at all surprising because, as per the efficient market hypothesis, any granular earnings information embedded in the financial statements and management's guidance, coupled with the market's own interpretations, will have been mostly consumed by the markets and reflected in the latest stock prices not long after the announcement. A similar observation was made by Allen and Karjalainen [29], who found that introducing a one-day delay to trading signals removes most of the forecasting ability when they used GAs to find technical trading rules.

Since we are not able to accurately forecast the direction of CAR from t0 + ∆t to t0 + T (with ∆t being non-negligible) using newly released financial statement data and a stock's momentum signals prior to the announcement, we have devised a tactic to infer the delayed-starting direction. In this case we wait until 1 day after the earnings release. It is important to point out that since we intend to trade on a stock's movement from t0 + ∆t to t0 + T, our standpoint is now 1 day after the earnings announcement, and we are already in possession of the knowledge of how a stock has moved from t0 to t0 + ∆t. Standing near the close of the market 1 day after the announcement, we follow these steps to infer a stock's movement from t0 + ∆t to t0 + T:

1. Stock Exclusion: Exclude all the stocks whose actual movement from t0 to t0 + ∆t is within the interval of [-0.05 %, 0.05 %] (obtained through empirical analysis), so as to eliminate stocks with a weak immediate response to earnings announcements;

2. Re-run the forecast of the PEAD direction from t0 to t0 + T, also including in the input space a stock's known movement direction from t0 to t0 + ∆t. The overall accuracy has increased to around 70 % due to this new input to the model;

3. Filtering: Filter out stocks whose real movement from t0 to t0 + ∆t is in the opposite direction to the predicted movement from t0 to t0 + T, i.e., stocks which have gone up (down) in the first day despite being forecast to move down (up) over 30 days;

4.
For all the remaining stocks, deduce the stock's movement direction from t0 + ∆t to t0 + T to be the same as the predicted movement direction from t0 to t0 + T.

We test this tactic using data from 2016, 2017, and 2018, respectively, with data from the preceding years used for training. As noted in an earlier section, a company that filed in each of the four quarters of a year is counted as four independent data points. Table 4 gives the results of applying this tactic to the three years. After stock exclusion and filtering, the number of eligible stocks comes down to the low hundreds. With the remaining stocks we consistently achieve close to 60 % accuracy in inferring the stock direction from t0 + ∆t to t0 + T. The important point is that, since this tactic is meant to be exercised by a trader at or near the close of market one day after an earnings announcement, this is a signal that can genuinely be acted upon. We also expect the overall accuracy of inferring the stock direction from t0 + ∆t to t0 + T to increase once we are able to further improve the model's general PEAD prediction accuracy. We believe this is possible, as there are other sources of impactful information that have yet to be included in the feature space, such as management's guidance, equity analysts' price revisions, other textual data carried in financial reports, and minutes of meetings with analysts. This is another potential future research direction.

Figure 6: 02 Aug 2018. Actual returns of moving portfolios consisting of 50 stocks from all sectors that reported earnings on this date. All stocks have been pre-ranked by the model-predicted stock returns from high to low.

Figure 7: 25 Oct 2018. Actual returns of moving portfolios consisting of 50 stocks from all sectors that reported earnings on this date. All stocks have been pre-ranked by the model-predicted stock returns from high to low.

VII. Conclusion
Table 3: Actual returns of portfolios consisting of top and bottom quantile stocks. Stocks have been ranked by their predicted returns.

Out-of-sample Time Frame | Industries | Forecast Holding Period | Top Quantile Portfolio Return | Average Actual Return | Bottom Quantile Portfolio Return
Q4 2018 | All | 30 days | 3.90 % | -0.29 % | -3.76 %
Q3 2018 | All | 30 days | 4.09 % | 0.36 % | -4.78 %
02-Aug-18 | All | 30 days | 2.75 % | 1.16 % | 0.54 %
25-Oct-18 | All | 30 days | 3.83 % | 1.06 % | -1.64 %

Table 4: Accuracy of inferring stock direction from t0 + ∆t to t0 + T using the model-predicted direction from t0 to t0 + T and the known stock movement from t0 to t0 + ∆t.

Year tested | No. of out-of-samples before filtering | No. of out-of-samples after filtering | t0 to t0 + T accuracy by model | Inferred accuracy t0 + ∆t to t0 + T for remaining stocks
2017 | 3715 | 151 | 70.80 % |
2018 | 3728 | 261 | 69.90 % |

Post-earnings announcement drift is a well known and well studied stock market anomaly in which a stock's risk-adjusted price can continue in the direction of an earnings surprise in the near to mid term following an earnings release. Past research, however, was often limited to simpler regression based methods for explaining this phenomenon, and was often confined to a limited set of explanatory factors. Even less research has been carried out on how to potentially take advantage of this known anomaly and produce actionable forecasts of stock price movements following such a significant economic event for companies. Attempting to fill this gap in the literature, our experiment includes a much bigger set of carefully selected input factors of various types, with some specifically engineered, sources the data over a longer historical time frame, and attempts to forecast the direction of Cumulative Abnormal Returns (CAR) with a machine learning approach. We have adopted the state-of-the-art XGBoost and put it through a rigorous optimisation process. We not only looked at specific forecast success rates, but also examined whether there is a collective trend of movement enjoyed by a group of stocks following their individual earnings releases.

First, our results show that when properly configured using a Genetic Algorithm, XGBoost produces meaningful prediction accuracy on the direction of PEAD. We demonstrated that our selected input features were genuinely driving PEAD, with a classification success rate going up to 63 % depending on the test scenario. In a further breakdown, we observed that stocks from different industrial sectors and at different times can have their PEADs driven by different primary factors. The strengths of the driving factors are well understood by our model, with stocks from certain sectors producing excellent/poor forecast results when the underlying factor dominance is more/less pronounced. Second, guided by the model's forecast outputs, we found that it is possible to build portfolios which consistently offer higher positive returns and lower negative returns, an observation that could potentially form the basis of a market neutral long-short trading strategy. Third, we studied the challenges of applying earnings event signals in real trading. Market participants with an information advantage can drive the price away before the rest of the market has an opportunity to act on the signals. Instead of trying to buy in as soon as event data comes out, we have devised a tactic that creates opportunities to delay-buy into the market at a later time, using the same model predictions as well as public knowledge of market movements immediately following the release of earnings data. Lastly, future efforts will need to also investigate recent methods of deep learning, which, in our preliminary experiments, were inferior to the considered approach.
However, their partial or combined usage, such as for representation learning or data augmentation, appears promising.

References

[1] M. Göçken, M. Özçalıcı, A. Boru, and A. T. Dosdoğru, "Integrating metaheuristics and artificial neural networks for improved stock price prediction," Expert Systems With Applications, vol. 44, pp. 320–331, 2016.
[2] R. Ball and P. Brown, "An empirical evaluation of accounting income numbers," Journal of Accounting Research, vol. 6, pp. 159–178, 1968.
[3] E. F. Fama and K. R. French, "Common risk factors in the returns on stocks and bonds," Journal of Financial Economics, vol. 33, pp. 3–56, 1993.
[4] R. Bhushan, "An informational efficiency perspective on the post-earnings announcement drift," Journal of Accounting and Economics, vol. 18, pp. 45–65, 1994.
[5] L. Qiu, "Earnings announcement and abnormal return of S&P 500 companies," 2014.
[6] H. K. Baker, Y. Ni, S. Saadi, and H. Zhu, "Competitive earnings news and post-earnings announcement drift," International Review of Financial Analysis, vol. 63, pp. 331–343, 2016.
[7] E. Beyaz, F. Tekiner, X.-J. Zeng, and J. Keane, "Comparing technical and fundamental indicators in stock price forecasting," Proceedings of the IEEE 4th International Conference on Data Science and Systems, pp. 1607–1613, June 2018.
[8] M. E. Bradbury, "Voluntary semiannual earnings disclosures, earnings volatility, unexpected earnings, and firm size," Journal of Accounting Research, vol. 30, pp. 137–145, Spring 1992.
[9] S. Deng, K. Yoshiyama, T. Mitsubuchi, and A. Sakurai, "Hybrid method of multiple kernel learning and genetic algorithm for forecasting short-term foreign exchange rates," Computational Economics, vol. 45, no. 1, pp. 49–89, 2015.
[10] D. Kim and M. Kim, "A multifactor explanation of post-earnings announcement drift," The Journal of Financial and Quantitative Analysis, vol. 38, pp. 383–398, 2003.
[11] G. Foster, C. Olsen, and T. Shevlin, "Earnings releases, anomalies, and the behavior of security returns," The Accounting Review, vol. 59, no. 4, pp. 574–603, 1984.
[12] B. G. Malkiel, "Models of stock market predictability," Journal of Financial Research, vol. 27, no. 4, pp. 449–459, 2004.
[13] D. Olson and C. Mossman, "Neural network forecasts of Canadian stock returns using accounting ratios," International Journal of Forecasting, vol. 19, no. 3, pp. 453–465, 2003.
[14] A. Sheta, S. Ahmed, and H. Faris, "A comparison between regression, artificial neural networks and support vector machines for predicting stock market index," International Journal of Advanced Research in Artificial Intelligence, vol. 4, pp. 55–63, July 2015.
[15] R. Hafezi, J. Shahrabi, and E. Hadavandi, "A bat-neural network multi-agent system (BNNMAS) for stock price prediction: case study of DAX stock price," Applied Soft Computing, vol. 29, no. C, pp. 196–210, 2015.
[16] L. E. Solberg and J. Karlsen, "The predictive power of earnings conference calls: predicting stock price movement with earnings call transcripts," Master's thesis, Norwegian School of Economics, 2018.
[17] N. Huck, "Large data sets and machine learning: Applications to statistical arbitrage," European Journal of Operational Research, vol. 278, no. 1, pp. 330–342, 2019.
[18] L. R. Sant'Anna, J. F. Caldeira, and T. P. Filomena, "Lasso-based index tracking and statistical arbitrage long-short strategies," North American Journal of Economics and Finance, vol. 51, 2020.
[19] S. P. Chatzis, V. Siakoulis, A. Petropoulos, E. Stavroulakis, and N. Vlachogiannakis, "Forecasting stock market crisis events using deep and statistical machine learning techniques," Expert Systems with Applications, vol. 112, pp. 353–371, 2018.
[20] L. Jidong and Z. Ran, "Dynamic weighting multi factor stock selection strategy based on XGBoost machine learning algorithm," Proceedings of the IEEE International Conference of Safety Produce Informatization (IICSPI), pp. 868–872, 2018.
[21] B. Duan and W. P. Dunlap, "The robustness of trimming and winsorization when the population distribution is skewed," ProQuest Dissertations and Theses, 1998.
[22] T. Chen and C. Guestrin, "XGBoost: A scalable tree boosting system," Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794, 2016.
[23] S. Tyree, K. Weinberger, K. Agrawal, and J. Paykin, "Parallel boosted regression trees for web search ranking," Proceedings of the 20th International Conference on World Wide Web, pp. 387–396, 2011.
[24] J. Ye, J.-H. Chow, J. Chen, and Z. Zheng, "Stochastic gradient boosted distributed decision trees," Proceedings of the 18th ACM Conference on Information and Knowledge Management, pp. 2061–2064, 2009.
[25] L. E. Solberg and J. Karlsen, "Pairs trading using machine learning: An empirical study," Master's thesis, Erasmus University Rotterdam, 2017.
[26] M. Erlien, "Earnings announcements and stock returns: a study of efficiency in the Norwegian capital market," Master's thesis, University of Stavanger, Norway, 2011.
[27] A. S. Ahmed and I. Safdar, "Dissecting stock price momentum using financial statement analysis," Accounting & Finance, vol. 58, no. S1, pp. 3–43, 2018.
[28] A. Frino, T. Prodromou, G. H. Wang, P. J. Westerholm, and H. Zheng, "An empirical analysis of algorithmic trading around earnings announcements," Pacific-Basin Finance Journal, vol. 45, pp. 34–51, 2017.
[29] F. Allen and R. Karjalainen, "Using genetic algorithms to find technical trading rules," Journal of Financial Economics, vol. 51, no. 2, pp. 245–271, 1999.
III. Appendix
We list the most significant driving factors for stocks from each of the seven industrial sectors, as indicated by our XGBoost + GA model. The results are created after running 100 tests on each group of stocks and counting the occurrences of features. The three features that most frequently appear among the top five driving factors are given along with their occurrence counts. This information is provided for all the years and industrial sectors that have been tested. In all the tables provided in the Appendix, F1 to F5 represent the top five most impactful features. Features whose names start with the name of a financial variable and end with "Q_Change" or "Y_Change" represent the quarterly or yearly change of that variable. URLs to the source code and source data will be provided upon acceptance of the paper.
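The occurrence counting behind the tables below can be sketched as follows (illustrative only, not the paper's code; `runs` stands for the 100 per-run feature-importance rankings, and the feature names in the example are placeholders):

```python
from collections import Counter

def top5_occurrences(runs, top_k=5, report=3):
    """`runs` is a list of feature rankings, most important feature first.
    For each of the top_k spots, count which features land there across runs
    and report the `report` most frequent features with their counts."""
    counters = [Counter() for _ in range(top_k)]
    for ranking in runs:
        for spot, feature in enumerate(ranking[:top_k]):
            counters[spot][feature] += 1
    return {f"F{spot + 1}": counter.most_common(report)
            for spot, counter in enumerate(counters)}

# Placeholder rankings from three hypothetical runs.
result = top5_occurrences([["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]])
```

Each table row below is the output of this kind of tally for one spot F1 to F5, listing the three features that occupied that spot most often over the 100 runs.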
F1: EPS_Earnings_Surprise_Backward_Ave_Diff 45; EPS_EarningsSurprise 2; EPS_EarningsSurprise 2
F2: EPS_EarningsSurprise 31; EPS_Earnings_Surprise_Backward_Diff 12; EPS_Earnings_Surprise_Backward_Ave_Diff 4
F3: Total_Liabilities_Q_Change 19; EPS_EarningsSurprise 12; EPS_Earnings_Surprise_Backward_Diff 8
F4: Total_Liabilities_Q_Change 19; Return_On_Common_Equity 13; EPS_Earnings_Surprise_Backward_Diff 8
F5: Operating_Income_Y_Change 13; Return_On_Common_Equity 12; Total_Liabilities_Q_Change 8

F1: EPS_Earnings_Surprise_Backward_Ave_Diff 45; EPS_EarningsSurprise 4; EPS_Earnings_Surprise_Backward_Diff 1
F2: EPS_EarningsSurprise 43; EPS_Earnings_Surprise_Backward_Diff 4; EPS_Earnings_Surprise_Backward_Ave_Diff 3
F3: EPS_Earnings_Surprise_Backward_Diff 30; Total_Liabilities_Q_Change 10; Operating_Income_Y_Change 4
F4: Total_Liabilities_Q_Change 19; Return_On_Common_Equity 13; Operating_Income_Y_Change 10
F5: Operating_Income_Y_Change 14; Total_Liabilities_Q_Change 13; Return_On_Common_Equity 12

F1: EPS_Earnings_Surprise_Backward_Ave_Diff 36; EPS_EarningsSurprise 13; EPS_Earnings_Surprise_Backward_Diff 1
F2: EPS_EarningsSurprise 32; EPS_Earnings_Surprise_Backward_Ave_Diff 14; EPS_Earnings_Surprise_Backward_Diff 3
F3: EPS_Earnings_Surprise_Backward_Diff 24; Total_Liabilities_Q_Change 14; EPS_EarningsSurprise 3
F4: Total_Liabilities_Q_Change 23; Return_On_Common_Equity 8; EPS_Earnings_Surprise_Backward_Diff 7
F5: Operating_Income_Y_Change 17; Return_On_Common_Equity 10; Gross_Profit_Y_Change 6

F1: EPS_Earnings_Surprise_Backward_Ave_Diff 27; EPS_EarningsSurprise 18; EPS_Earnings_Surprise_Backward_Diff 5
F2: EPS_EarningsSurprise 29; EPS_Earnings_Surprise_Backward_Ave_Diff 9; EPS_Earnings_Surprise_Backward_Ave_Diff 9
F3: Total_Liabilities_Q_Change 28; EPS_Earnings_Surprise_Backward_Diff 10; EPS_Earnings_Surprise_Backward_Ave_Diff 7
F4: Total_Liabilities_Q_Change 17; EPS_Earnings_Surprise_Backward_Diff 16; Return_On_Common_Equity 3
F5: Operating_Income_Y_Change 25; EPS_Earnings_Surprise_Backward_Diff 6; Return_On_Common_Equity 4

F1: EPS_Earnings_Surprise_Backward_Ave_Diff 44; EPS_EarningsSurprise 4; EPS_Earnings_Surprise_Backward_Diff 2
F2: EPS_EarningsSurprise 36; EPS_Earnings_Surprise_Backward_Diff 6; EPS_Earnings_Surprise_Backward_Ave_Diff 5
F3: EPS_Earnings_Surprise_Backward_Diff 26; Total_Liabilities_Q_Change 14; EPS_EarningsSurprise 5
F4: Total_Liabilities_Q_Change 31; EPS_Earnings_Surprise_Backward_Diff 6; Operating_Income_Y_Change 5
F5: Operating_Income_Y_Change 20; Return_On_Common_Equity 16; Cost_Of_Revenue_Q_Change 2

Table 5: Top five driving factors for all stocks in each financial reporting year from 2014 to 2018.
2018 | Highest Occurrence Count | Second Highest Occurrence Count | Third Highest Occurrence Count
F1: EPS_Earnings_Surprise_Backward_Ave_Diff 94; EPS_Earnings_Surprise_Backward_Diff 6; EPS_EarningsSurprise 0
F2: EPS_Earnings_Surprise_Backward_Diff 67; EPS_EarningsSurprise 20; EPS_Earnings_Surprise_Backward_Ave_Diff 6
F3: EPS_EarningsSurprise 64; EPS_Earnings_Surprise_Backward_Diff 19; Free_Cash_Flow_Q_Change 7
F4: Free_Cash_Flow_Q_Change 55; Return_On_Assets_Q_Change 9; Return_On_Assets_Q_Change 9
F5: Return_On_Assets_Q_Change 20; Return_On_Assets_Y_Change 12; Free_Cash_Flow_Q_Change 11

F1: EPS_Earnings_Surprise_Backward_Ave_Diff 52; EPS_Earnings_Surprise_Backward_Diff 39; EPS_EarningsSurprise 9
F2: EPS_Earnings_Surprise_Backward_Diff 52; EPS_Earnings_Surprise_Backward_Ave_Diff 34; EPS_EarningsSurprise 12
F3: EPS_EarningsSurprise 66; EPS_Earnings_Surprise_Backward_Ave_Diff 9; Return_On_Assets_Q_Change 8
F4: Return_On_Assets_Q_Change 36; PC_Ratios_Y_Change 16; Current_Ratio_Q_Change 11
F5: Return_On_Assets_Q_Change 24; RSI-30D 11; Free_Cash_Flow_Q_Change 10

F1: EPS_Earnings_Surprise_Backward_Ave_Diff 64; EPS_Earnings_Surprise_Backward_Diff 24; EPS_EarningsSurprise 10
F2: EPS_Earnings_Surprise_Backward_Diff 51; EPS_Earnings_Surprise_Backward_Ave_Diff 22; EPS_EarningsSurprise 18
F3: EPS_EarningsSurprise 46; Return_On_Assets_Q_Change 18; EPS_Earnings_Surprise_Backward_Diff 17
F4: Return_On_Assets_Q_Change 27; EPS_EarningsSurprise 14; EPS_EarningsSurprise 14
F5: Return_On_Assets_Q_Change 23; Free_Cash_Flow_Q_Change 20; Current_Ratio_Q_Change 17

F1: EPS_Earnings_Surprise_Backward_Ave_Diff 72; EPS_Earnings_Surprise_Backward_Diff 15; EPS_EarningsSurprise 12
F2: EPS_Earnings_Surprise_Backward_Diff 39; EPS_EarningsSurprise 36; EPS_Earnings_Surprise_Backward_Ave_Diff 16
F3: EPS_EarningsSurprise 46; EPS_Earnings_Surprise_Backward_Diff 31; EPS_Earnings_Surprise_Backward_Ave_Diff 8
F4: Free_Cash_Flow_Q_Change 27; Return_On_Assets_Y_Change 14; Return_On_Assets_Q_Change 11
F5: Return_On_Assets_Y_Change 19; PC_Ratios_Y_Change 15; Return_On_Assets_Q_Change 14

F1: EPS_Earnings_Surprise_Backward_Ave_Diff 65; EPS_Earnings_Surprise_Backward_Diff 19; EPS_EarningsSurprise 14
F2: EPS_Earnings_Surprise_Backward_Diff 54; EPS_EarningsSurprise 19; EPS_Earnings_Surprise_Backward_Ave_Diff 15
F3: EPS_EarningsSurprise 53; EPS_Earnings_Surprise_Backward_Diff 21; EPS_Earnings_Surprise_Backward_Ave_Diff 10
F4: Free_Cash_Flow_Q_Change 18; Current_Ratio_Q_Change 15; PC_Ratios_Y_Change 14
F5: Free_Cash_Flow_Q_Change 24; Return_On_Assets_Q_Change 14; RSI-30D 9

Table 6: Top five driving factors for the Industrial stocks in each financial reporting year from 2014 to 2018.
F1: EPS_Earnings_Surprise_Backward_Ave_Diff 36; Total_Liabilities_Q_Change 11; PC_Ratios 6
F2: EPS_Earnings_Surprise_Backward_Ave_Diff 17; Dividend_Payout_Ratio 12; Total_Liabilities_Q_Change 9
F3: EPS_Earnings_Surprise_Backward_Ave_Diff 10; DMA_50D/200D 9; Dividend_Payout_Ratio 7
F4: EPS_EarningsSurprise 9; EPS_EarningsSurprise 9; PC_Ratios 7
F5: EPS_EarningsSurprise 8; PC_Ratios 6; PC_Ratios 6

F1: Dividend_Payout_Ratio 31; DMA_50D/200D 9; EPS_EarningsSurprise 3
F2: Dividend_Payout_Ratio 14; EPS_Earnings_Surprise_Backward_Ave_Diff 8; DMA_50D/200D 7
F3: DMA_50D/200D 11; Dividend_Payout_Ratio 10; EPS_Earnings_Surprise_Backward_Ave_Diff 9
F4: EPS_Earnings_Surprise_Backward_Ave_Diff 10; DMA_50D/200D 6; EPS_EarningsSurprise 3
F5: PE_Ratios_Q_Change 8; EPS_Earnings_Surprise_Backward_Ave_Diff 6; EPS_EarningsSurprise 5

F1: EPS_EarningsSurprise 15; EPS_EarningsSurprise 15; EPS_Earnings_Surprise_Backward_Ave_Diff 13
F2: EPS_Earnings_Surprise_Backward_Diff 13; Cost_Of_Revenue_Q_Change 10; Cost_Of_Revenue_Q_Change 10
F3: Inventory_Turnover 8; Cost_Of_Revenue_Q_Change 7; Total_Liabilities_Q_Change 6
F4: EPS_Earnings_Surprise_Backward_Diff 7; EPS_Earnings_Surprise_Backward_Diff 7; EPS_Earnings_Surprise_Backward_Ave_Diff 6
F5: EPS_Earnings_Surprise_Backward_Diff 10; EPS_Earnings_Surprise_Backward_Ave_Diff 7; Cost_Of_Revenue_Q_Change 5

F1: EPS_Earnings_Surprise_Backward_Ave_Diff 41; EPS_Earnings_Surprise_Backward_Diff 6; Dividend_Payout_Ratio 4
F2: EPS_Earnings_Surprise_Backward_Diff 16; EPS_Earnings_Surprise_Backward_Ave_Diff 13; Dividend_Payout_Ratio 7
F3: EPS_Earnings_Surprise_Backward_Diff 10; Dividend_Payout_Ratio 7; Dividend_Payout_Ratio 7
F4: EPS_Earnings_Surprise_Backward_Diff 14; Dividend_Payout_Ratio 9; EPS_Earnings_Surprise_Backward_Ave_Diff 7
F5: Dividend_Payout_Ratio 11; Total_Liabilities_Q_Change 7; EPS_Earnings_Surprise_Backward_Diff 6

F1: Dividend_Payout_Ratio 13; EPS_Earnings_Surprise_Backward_Diff 12; EPS_EarningsSurprise 10
F2: Dividend_Payout_Ratio 11; EPS_EarningsSurprise 9; Total_Liabilities_Q_Change 7
F3: Dividend_Payout_Ratio 15; EPS_EarningsSurprise 8; EPS_EarningsSurprise 8
F4: EPS_Earnings_Surprise_Backward_Diff 14; PC_Ratios_Y_Change 6; PC_Ratios_Y_Change 6
F5: Dividend_Payout_Ratio 8; EPS_Earnings_Surprise_Backward_Diff 6; EPS_EarningsSurprise 5

Table 7: Top five driving factors for the Basic Materials stocks in each financial reporting year from 2014 to 2018.
2018 | Highest Occurrence Count | Second Highest Occurrence Count | Third Highest Occurrence Count
Year Rank: Highest occurrence (count), Second highest occurrence (count), Third highest occurrence (count)
2014 F1: EPS_EarningsSurprise (65), EPS_Earnings_Surprise_Backward_Ave_Diff (21), EPS_Earnings_Surprise_Backward_Diff (7)
2014 F2: EPS_Earnings_Surprise_Backward_Ave_Diff (41), EPS_EarningsSurprise (20), EPS_Earnings_Surprise_Backward_Diff (15)
2014 F3: Return_On_Common_Equity (29), Return_On_Common_Equity (29), EPS_Earnings_Surprise_Backward_Ave_Diff (16)
2014 F4: Return_On_Common_Equity (31), EPS_Earnings_Surprise_Backward_Diff (26), Net_Income_Y_Change (13)
2014 F5: Net_Income_Y_Change (35), Return_On_Common_Equity (14), Total_Liabilities_Q_Change (7)
2015 F1: EPS_Earnings_Surprise_Backward_Ave_Diff (70), EPS_EarningsSurprise (19), EPS_Earnings_Surprise_Backward_Diff (7)
2015 F2: EPS_EarningsSurprise (36), EPS_Earnings_Surprise_Backward_Ave_Diff (22), EPS_Earnings_Surprise_Backward_Diff (17)
2015 F3: Return_On_Common_Equity (30), EPS_Earnings_Surprise_Backward_Diff (21), EPS_EarningsSurprise (20)
2015 F4: Return_On_Common_Equity (32), EPS_Earnings_Surprise_Backward_Diff (11), Total_Liabilities_Q_Change (6)
2015 F5: Total_Liabilities_Q_Change (11), Net_Income_Y_Change (10), PE_Ratios (9)
2016 F1: EPS_EarningsSurprise (47), EPS_Earnings_Surprise_Backward_Diff (25), EPS_Earnings_Surprise_Backward_Ave_Diff (21)
2016 F2: EPS_Earnings_Surprise_Backward_Diff (35), EPS_EarningsSurprise (33), EPS_Earnings_Surprise_Backward_Ave_Diff (11)
2016 F3: EPS_Earnings_Surprise_Backward_Diff (25), EPS_Earnings_Surprise_Backward_Ave_Diff (21), Return_On_Common_Equity (16)
2016 F4: Return_On_Common_Equity (40), Net_Income_Y_Change (11), EPS_Earnings_Surprise_Backward_Ave_Diff (8)
2016 F5: Net_Income_Y_Change (19), Return_On_Common_Equity (12), Total_Liabilities_Q_Change (11)
2017 F1: EPS_EarningsSurprise (61), EPS_Earnings_Surprise_Backward_Ave_Diff (16), Return_On_Common_Equity (12)
2017 F2: Return_On_Common_Equity (38), EPS_EarningsSurprise (20), EPS_Earnings_Surprise_Backward_Ave_Diff (17)
2017 F3: EPS_Earnings_Surprise_Backward_Diff (29), Return_On_Common_Equity (21), EPS_Earnings_Surprise_Backward_Ave_Diff (16)
2017 F4: Net_Income_Y_Change (27), Return_On_Common_Equity (19), EPS_Earnings_Surprise_Backward_Ave_Diff (16)
2017 F5: Net_Income_Y_Change (32), PE_Ratios (12), EPS_Earnings_Surprise_Backward_Ave_Diff (8)
2018 F1: EPS_Earnings_Surprise_Backward_Diff (42), EPS_EarningsSurprise (28), EPS_EarningsSurprise (28)
2018 F2: EPS_EarningsSurprise (41), EPS_Earnings_Surprise_Backward_Diff (27), EPS_Earnings_Surprise_Backward_Ave_Diff (22)
2018 F3: Return_On_Common_Equity (22), EPS_Earnings_Surprise_Backward_Diff (21), EPS_EarningsSurprise (18)
2018 F4: Return_On_Common_Equity (26), Net_Income_Y_Change (25), EPS_Earnings_Surprise_Backward_Ave_Diff (8)
2018 F5: Net_Income_Y_Change (29), Return_On_Common_Equity (24), EPS_Earnings_Surprise_Backward_Ave_Diff (10)
Table 8: Top five driving factors for the Consumer Cyclical stocks in each financial reporting year from 2014 to 2018.
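The per-rank occurrence counts reported in these tables can be tallied from feature-importance rankings collected over many model runs. The sketch below shows one plausible counting scheme, not the authors' exact procedure; the function name, the toy importance values, and the choice of three runs are all illustrative:

```python
from collections import Counter

def top_factor_occurrences(runs, n_ranks=5, top_k=3):
    """For each importance rank (F1..Fn), count how often each feature
    occupies that rank across runs, and keep the top_k most frequent."""
    rank_counters = [Counter() for _ in range(n_ranks)]
    for importances in runs:  # one dict of feature -> importance per run
        ranked = sorted(importances, key=importances.get, reverse=True)
        for pos, feature in enumerate(ranked[:n_ranks]):
            rank_counters[pos][feature] += 1
    return {f"F{i + 1}": counter.most_common(top_k)
            for i, counter in enumerate(rank_counters)}

# Toy importances from three hypothetical runs (values are made up):
runs = [
    {"EPS_EarningsSurprise": 0.9, "PE_Ratios": 0.5, "Cash_Q_Change": 0.1},
    {"EPS_EarningsSurprise": 0.8, "Cash_Q_Change": 0.4, "PE_Ratios": 0.2},
    {"PE_Ratios": 0.7, "EPS_EarningsSurprise": 0.6, "Cash_Q_Change": 0.3},
]
print(top_factor_occurrences(runs, n_ranks=3))
# EPS_EarningsSurprise tops rank F1 in two of the three runs.
```

With real models, the per-run importance dicts would come from the trained booster (e.g. XGBoost's `get_score()`), but the tallying logic is independent of the model library.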
Year Rank: Highest occurrence (count), Second highest occurrence (count), Third highest occurrence (count)
2014 F1: EPS_Earnings_Surprise_Backward_Diff (41), EPS_Earnings_Surprise_Backward_Ave_Diff (28), EPS_EarningsSurprise (20)
2014 F2: EPS_Earnings_Surprise_Backward_Ave_Diff (27), EPS_EarningsSurprise (26), EPS_Earnings_Surprise_Backward_Diff (23)
2014 F3: EPS_EarningsSurprise (32), EPS_Earnings_Surprise_Backward_Ave_Diff (15), EPS_Earnings_Surprise_Backward_Diff (11)
2014 F4: Return_On_Common_Equity (19), EPS_EarningsSurprise (10), Operating_Margin_Y_Change (8)
2014 F5: Return_On_Common_Equity (12), Operating_Margin_Y_Change (7), Operating_Margin_Y_Change (7)
2015 F1: EPS_Earnings_Surprise_Backward_Ave_Diff (51), EPS_Earnings_Surprise_Backward_Diff (34), EPS_EarningsSurprise (10)
2015 F2: EPS_Earnings_Surprise_Backward_Diff (35), EPS_Earnings_Surprise_Backward_Ave_Diff (34), EPS_EarningsSurprise (16)
2015 F3: EPS_EarningsSurprise (53), EPS_Earnings_Surprise_Backward_Diff (17), EPS_Earnings_Surprise_Backward_Ave_Diff (8)
2015 F4: Operating_Income_Y_Change (22), EPS_EarningsSurprise (9), Return_On_Common_Equity (6)
2015 F5: Operating_Income_Y_Change (14), Return_On_Common_Equity (13), Cash_Q_Change (11)
2016 F1: EPS_Earnings_Surprise_Backward_Ave_Diff (52), EPS_Earnings_Surprise_Backward_Diff (35), EPS_EarningsSurprise (12)
2016 F2: EPS_Earnings_Surprise_Backward_Diff (45), EPS_Earnings_Surprise_Backward_Ave_Diff (34), EPS_EarningsSurprise (13)
2016 F3: EPS_EarningsSurprise (46), EPS_Earnings_Surprise_Backward_Ave_Diff (10), EPS_Earnings_Surprise_Backward_Diff (8)
2016 F4: Return_On_Common_Equity (21), EPS_EarningsSurprise (7), EPS_EarningsSurprise (7)
2016 F5: Return_On_Common_Equity (15), Gross_Profit_Y_Change (13), PS_Ratios_Y_Change (8)
2017 F1: EPS_EarningsSurprise (67), EPS_Earnings_Surprise_Backward_Ave_Diff (23), Dividend_Payout_Ratio_Q_Change (2)
2017 F2: EPS_Earnings_Surprise_Backward_Ave_Diff (37), EPS_EarningsSurprise (22), EPS_Earnings_Surprise_Backward_Diff (13)
2017 F3: EPS_Earnings_Surprise_Backward_Diff (47), EPS_Earnings_Surprise_Backward_Ave_Diff (10), EPS_EarningsSurprise (5)
2017 F4: Return_On_Common_Equity (12), Total_Liabilities_Q_Change (9), Dividend_Yield_Q_Change (7)
2017 F5: EPS_Earnings_Surprise_Backward_Ave_Diff (8), EPS_Earnings_Surprise_Backward_Ave_Diff (8), EPS_Earnings_Surprise_Backward_Ave_Diff (8)
2018 F1: EPS_Earnings_Surprise_Backward_Ave_Diff (50), EPS_Earnings_Surprise_Backward_Diff (37), EPS_EarningsSurprise (13)
2018 F2: EPS_Earnings_Surprise_Backward_Ave_Diff (35), EPS_Earnings_Surprise_Backward_Diff (33), EPS_EarningsSurprise (26)
2018 F3: EPS_EarningsSurprise (49), EPS_Earnings_Surprise_Backward_Diff (23), EPS_Earnings_Surprise_Backward_Ave_Diff (9)
2018 F4: Return_On_Common_Equity (16), Cash_Q_Change (11), EPS_EarningsSurprise (5)
2018 F5: Return_On_Common_Equity (16), Cash_Q_Change (6), EPS_EarningsSurprise (5)
Table 9: Top five driving factors for the Consumer Non-Cyclical stocks in each financial reporting year from 2014 to 2018.
Year Rank: Highest occurrence (count), Second highest occurrence (count), Third highest occurrence (count)
2014 F1: EPS_Earnings_Surprise_Backward_Ave_Diff (95), EPS_Earnings_Surprise_Backward_Diff (5), EPS_EarningsSurprise (0)
2014 F2: EPS_Earnings_Surprise_Backward_Diff (62), EPS_EarningsSurprise (11), EPS_Earnings_Surprise_Backward_Ave_Diff (5)
2014 F3: EPS_EarningsSurprise (28), EPS_EarningsSurprise (28), EPS_Earnings_Surprise_Backward_Diff (20)
2014 F4: PS_Ratios (29), EPS_EarningsSurprise (22), Operating_Margin_Y_Change (10)
2014 F5: PS_Ratios (19), PS_Ratios_Y_Change (14), EPS_EarningsSurprise (11)
2015 F1: EPS_Earnings_Surprise_Backward_Diff (52), EPS_Earnings_Surprise_Backward_Ave_Diff (46), EPS_EarningsSurprise (1)
2015 F2: EPS_Earnings_Surprise_Backward_Ave_Diff (47), EPS_Earnings_Surprise_Backward_Diff (34), EPS_EarningsSurprise (8)
2015 F3: PS_Ratios (31), EPS_EarningsSurprise (27), Income_from_Continued_Operations_Q_Change (10)
2015 F4: PS_Ratios (37), PS_Ratios_Y_Change (13), EPS_EarningsSurprise (9)
2015 F5: EPS_EarningsSurprise (13), PS_Ratios (11), PS_Ratios_Y_Change (8)
2016 F1: EPS_Earnings_Surprise_Backward_Diff (50), EPS_Earnings_Surprise_Backward_Ave_Diff (47), EPS_EarningsSurprise (3)
2016 F2: EPS_Earnings_Surprise_Backward_Diff (43), EPS_Earnings_Surprise_Backward_Ave_Diff (35), EPS_EarningsSurprise (9)
2016 F3: PS_Ratios (33), EPS_EarningsSurprise (27), Return_On_Common_Equity (4)
2016 F4: PS_Ratios (38), EPS_EarningsSurprise (18), PS_Ratios_Y_Change (8)
2016 F5: EPS_EarningsSurprise (15), PS_Ratios (11), Return_On_Common_Equity (10)
2017 F1: EPS_Earnings_Surprise_Backward_Ave_Diff (92), EPS_Earnings_Surprise_Backward_Diff (6), Inventory_Turnover (2)
2017 F2: EPS_Earnings_Surprise_Backward_Diff (78), EPS_Earnings_Surprise_Backward_Ave_Diff (8), EPS_EarningsSurprise (5)
2017 F3: EPS_EarningsSurprise (49), PS_Ratios_Y_Change (19), PS_Ratios (13)
2017 F4: PS_Ratios (31), PS_Ratios_Y_Change (24), EPS_EarningsSurprise (19)
2017 F5: PS_Ratios_Y_Change (28), PS_Ratios (21), EPS_EarningsSurprise (12)
2018 F1: EPS_Earnings_Surprise_Backward_Ave_Diff (86), EPS_Earnings_Surprise_Backward_Diff (13), EPS_EarningsSurprise (1)
2018 F2: EPS_Earnings_Surprise_Backward_Diff (71), EPS_Earnings_Surprise_Backward_Ave_Diff (12), PS_Ratios (4)
2018 F3: PS_Ratios (52), EPS_EarningsSurprise (12), EPS_Earnings_Surprise_Backward_Diff (8)
2018 F4: PS_Ratios (26), PS_Ratios_Y_Change (18), EPS_EarningsSurprise (15)
2018 F5: PS_Ratios_Y_Change (12), PS_Ratios (7), Short_Term_Debt_Y_Change (6)
Table 10: Top five driving factors for the Financial stocks in each financial reporting year from 2014 to 2018.
Year Rank: Highest occurrence (count), Second highest occurrence (count), Third highest occurrence (count)
2014 F1: EPS_Earnings_Surprise_Backward_Ave_Diff (57), EPS_Earnings_Surprise_Backward_Diff (14), EPS_EarningsSurprise (13)
2014 F2: EPS_EarningsSurprise (26), EPS_Earnings_Surprise_Backward_Ave_Diff (18), Return_On_Assets_Q_Change (11)
2014 F3: Return_On_Assets_Q_Change (24), EPS_Earnings_Surprise_Backward_Diff (16), EPS_EarningsSurprise (15)
2014 F4: Return_On_Assets_Q_Change (18), EPS_Earnings_Surprise_Backward_Diff (11), EPS_EarningsSurprise (10)
2014 F5: Return_On_Assets_Q_Change (12), EPS_EarningsSurprise (11), Short_Term_Debt_Q_Change (8)
2015 F1: EPS_Earnings_Surprise_Backward_Ave_Diff (53), EPS_Earnings_Surprise_Backward_Diff (15), EPS_EarningsSurprise (11)
2015 F2: EPS_EarningsSurprise (37), EPS_Earnings_Surprise_Backward_Ave_Diff (17), Dividend_Payout_Ratio_Y_Change (7)
2015 F3: Return_On_Assets_Q_Change (13), EPS_EarningsSurprise (11), Dividend_Payout_Ratio_Y_Change (10)
2015 F4: Return_On_Assets_Q_Change (18), EPS_EarningsSurprise (12), Operating_Income_Y_Change (11)
2015 F5: Dividend_Payout_Ratio_Y_Change (9), Return_On_Assets_Q_Change (7), DMA_5D/200D (6)
2016 F1: EPS_EarningsSurprise (31), EPS_Earnings_Surprise_Backward_Ave_Diff (21), Operating_Income_Y_Change (9)
2016 F2: EPS_EarningsSurprise (22), EPS_Earnings_Surprise_Backward_Ave_Diff (14), EPS_Earnings_Surprise_Backward_Ave_Diff (14)
2016 F3: EPS_EarningsSurprise (18), EPS_Earnings_Surprise_Backward_Diff (11), EPS_Earnings_Surprise_Backward_Ave_Diff (9)
2016 F4: Operating_Income_Y_Change (11), Short_Term_Debt_Q_Change (10), Short_Term_Debt_Q_Change (10)
2016 F5: Operating_Income_Y_Change (15), EPS_Earnings_Surprise_Backward_Diff (8), Return_On_Assets_Q_Change (7)
2017 F1: EPS_Earnings_Surprise_Backward_Ave_Diff (61), EPS_EarningsSurprise (11), EPS_EarningsSurprise (11)
2017 F2: EPS_EarningsSurprise (18), EPS_Earnings_Surprise_Backward_Diff (16), EPS_Earnings_Surprise_Backward_Ave_Diff (15)
2017 F3: EPS_EarningsSurprise (19), EPS_Earnings_Surprise_Backward_Diff (11), Return_On_Assets_Q_Change (9)
2017 F4: EPS_EarningsSurprise (12), EPS_EarningsSurprise (12), Short_Term_Debt_Q_Change (11)
2017 F5: EPS_EarningsSurprise (9), EPS_EarningsSurprise (9), EPS_Earnings_Surprise_Backward_Diff (8)
2018 F1: EPS_Earnings_Surprise_Backward_Ave_Diff (55), EPS_Earnings_Surprise_Backward_Diff (14), EPS_EarningsSurprise (6)
2018 F2: EPS_Earnings_Surprise_Backward_Diff (18), EPS_EarningsSurprise (17), EPS_Earnings_Surprise_Backward_Ave_Diff (12)
2018 F3: EPS_EarningsSurprise (20), EPS_Earnings_Surprise_Backward_Diff (10), Short_Term_Debt_Q_Change (9)
2018 F4: EPS_Earnings_Surprise_Backward_Diff (11), EPS_EarningsSurprise (8), Return_On_Assets_Q_Change (7)
2018 F5: Operating_Income_Y_Change (9), EPS_EarningsSurprise (8), Short_Term_Debt_Q_Change (7)
Table 11: Top five driving factors for the Technology stocks in each financial reporting year from 2014 to 2018.
Year Rank: Highest occurrence (count), Second highest occurrence (count), Third highest occurrence (count)
2014 F1: Total_Liabilities_Q_Change (27), Net_Income_Y_Change (18), PE_Ratios (13)
2014 F2: Total_Liabilities_Q_Change (26), Net_Income_Y_Change (18), Operating_Income_Y_Change (8)
2014 F3: Operating_Income_Y_Change (15), Total_Liabilities_Q_Change (13), PE_Ratios (11)
2014 F4: EPS_Earnings_Surprise_Backward_Ave_Diff (11), PC_Ratios (6), PC_Ratios (6)
2014 F5: Net_Income_Y_Change (13), Operating_Income_Y_Change (9), PE_Ratios (7)
2015 F1: PE_Ratios (37), PB_Ratios_Y_Change (8), PB_Ratios_Y_Change (8)
2015 F2: Net_Income_Y_Change (17), PE_Ratios (14), PB_Ratios_Y_Change (8)
2015 F3: PB_Ratios_Y_Change (12), PB_Ratios_Y_Change (12), Net_Income_Y_Change (10)
2015 F4: Net_Income_Y_Change (17), Cost_Of_Revenue_Y_Change (9), PB_Ratios_Y_Change (8)
2015 F5: Total_Liabilities_Q_Change (9), EPS_Earnings_Surprise_Backward_Ave_Diff (8), Cost_Of_Revenue_Y_Change (7)
2016 F1: Total_Liabilities_Q_Change (21), PS_Ratios_Y_Change (12), EPS_Earnings_Surprise_Backward_Ave_Diff (6)
2016 F2: Total_Liabilities_Q_Change (11), PS_Ratios_Y_Change (10), PE_Ratios (7)
2016 F3: Total_Liabilities_Q_Change (12), EPS_Earnings_Surprise_Backward_Ave_Diff (10), PS_Ratios_Y_Change (8)
2016 F4: Total_Liabilities_Q_Change (8), PS_Ratios_Y_Change (7), PE_Ratios (6)
2016 F5: Total_Liabilities_Q_Change (13), EPS_Earnings_Surprise_Backward_Ave_Diff (8), PE_Ratios (7)
2017 F1: Total_Liabilities_Q_Change (18), PE_Ratios (12), Operating_Income_Y_Change (8)
2017 F2: Total_Liabilities_Q_Change (14), EPS_Earnings_Surprise_Backward_Ave_Diff (11), PE_Ratios (10)
2017 F3: Operating_Income_Y_Change (10), PE_Ratios (8), EPS_Earnings_Surprise_Backward_Ave_Diff (7)
2017 F4: Total_Liabilities_Q_Change (11), EPS_Earnings_Surprise_Backward_Ave_Diff (8), PE_Ratios (7)
2017 F5: Total_Liabilities_Q_Change (14), PE_Ratios (5), DMA_50D/200D (4)
2018 F1: Total_Liabilities_Q_Change (41), PE_Ratios (8), PS_Ratios_Y_Change (7)
2018 F2: Total_Liabilities_Q_Change (20), PS_Ratios_Y_Change (16), PE_Ratios (7)
2018 F3: Total_Liabilities_Q_Change (12), PS_Ratios_Y_Change (10), PE_Ratios (8)
2018 F4: Operating_Income_Y_Change (7), Operating_Income_Y_Change (7), PE_Ratios (6)
2018 F5: PE_Ratios (12), Cost_Of_Revenue_Y_Change (6), Cost_Of_Revenue_Y_Change (6)
Table 12: Top five driving factors for the Communications stocks in each financial reporting year from 2014 to 2018.
Year Rank: Highest occurrence (count), Second highest occurrence (count), Third highest occurrence (count)
2014 F1: EPS_EarningsSurprise (33), DMA_50D/200D (15), DMA_5D/200D (5)
2014 F2: PS_Ratios (15), EPS_EarningsSurprise (13), DMA_50D/200D (8)
2014 F3: EPS_EarningsSurprise (12), DMA_50D/200D (8), EPS_Earnings_Surprise_Backward_Ave_Diff (5)
2014 F4: DMA_50D/200D (9), PC_Ratios_Q_Change (8), PC_Ratios_Q_Change (8)
2014 F5: EPS_EarningsSurprise (6), EPS_EarningsSurprise (6), DMA_5D/200D (5)
2015 F1: EPS_EarningsSurprise (34), DMA_5D/200D (8), DMA_5D/200D (8)
2015 F2: EPS_EarningsSurprise (12), EPS_EarningsSurprise (12), DMA_50D/200D (9)
2015 F3: Current_Ratio (12), Return_On_Assets_Y_Change (10), EPS_EarningsSurprise (9)
2015 F4: Return_On_Assets_Y_Change (7), EPS_EarningsSurprise (6), EPS_EarningsSurprise (6)
2015 F5: DMA_50D/200D (9), Current_Ratio (8), Return_On_Assets_Y_Change (6)
2016 F1: DMA_50D/200D (24), DMA_5D/200D (13), EPS_Earnings_Surprise_Backward_Ave_Diff (12)
2016 F2: DMA_50D/200D (15), PS_Ratios (14), EPS_Earnings_Surprise_Backward_Ave_Diff (13)
2016 F3: DMA_50D/200D (13), DMA_50D/200D (13), DMA_5D/200D (11)
2016 F4: EPS_Earnings_Surprise_Backward_Ave_Diff (18), DMA_50D/200D (12), EPS_EarningsSurprise (8)
2016 F5: DMA_50D/200D (11), PS_Ratios (9), EPS_Earnings_Surprise_Backward_Ave_Diff (8)
2017 F1: EPS_Earnings_Surprise_Backward_Ave_Diff (69), Gross_Profit_Q_Change (5), DMA_50D/200D (4)
2017 F2: DMA_5D/200D (16), EPS_Earnings_Surprise_Backward_Ave_Diff (11), EPS_EarningsSurprise (9)
2017 F3: PS_Ratios (14), EPS_Earnings_Surprise_Backward_Ave_Diff (11), EPS_EarningsSurprise (9)
2017 F4: DMA_5D/200D (9), PS_Ratios (7), EPS_EarningsSurprise (6)
2017 F5: DMA_5D/200D (12), PS_Ratios (11), Gross_Profit_Q_Change (6)
2018 F1: PS_Ratios (20), DMA_5D/200D (15), EPS_EarningsSurprise (12)
2018 F2: PS_Ratios (17), EPS_EarningsSurprise (14), DMA_50D/200D (11)
2018 F3: PS_Ratios (18), EPS_Earnings_Surprise_Backward_Ave_Diff (15), EPS_EarningsSurprise (9)
2018 F4: DMA_50D/200D (11), Return_On_Assets_Y_Change (10), EPS_EarningsSurprise (9)
2018 F5: DMA_50D/200D (11), Cash_Q_Change (10), PS_Ratios (8)
Table 13: Top five driving factors for the Energy stocks in each financial reporting year from 2014 to 2018.
Year Rank: Highest occurrence (count), Second highest occurrence (count), Third highest occurrence (count)
2014 F1: Short_Term_Debt_Q_Change (38), Return_On_Assets_Y_Change (12), Return_On_Assets (3)
2014 F2: Return_On_Assets_Y_Change (19), Short_Term_Debt_Q_Change (18), DMA_50D/200D (6)
2014 F3: Return_On_Assets_Y_Change (10), Net_Debt_to_EBIT_Y_Change (6), Net_Debt_to_EBIT_Y_Change (6)
2014 F4: Short_Term_Debt_Y_Change (8), Return_On_Assets_Y_Change (7), Return_On_Assets_Y_Change (7)
2014 F5: Short_Term_Debt_Y_Change (7), Return_On_Assets_Y_Change (6), Short_Term_Debt_Q_Change (5)
2015 F1: Return_On_Assets_Y_Change (36), Dividend_Yield_Y_Change (28), Short_Term_Debt_Q_Change (8)
2015 F2: Dividend_Yield_Y_Change (24), Return_On_Assets_Y_Change (18), Short_Term_Debt_Q_Change (12)
2015 F3: Short_Term_Debt_Q_Change (20), Return_On_Assets_Y_Change (11), Dividend_Yield_Y_Change (8)
2015 F4: Short_Term_Debt_Q_Change (11), Dividend_Yield_Y_Change (7), Free_Cash_Flow_Q_Change (6)
2015 F5: Free_Cash_Flow_Q_Change (11), Short_Term_Debt_Q_Change (9), PC_Ratios_Y_Change (6)
2016 F1: Short_Term_Debt_Q_Change (20), Operating_Margin_Y_Change (18), Return_On_Assets_Y_Change (14)
2016 F2: Return_On_Assets_Y_Change (18), Operating_Margin_Y_Change (10), Short_Term_Debt_Q_Change (7)
2016 F3: Return_On_Assets_Y_Change (13), Operating_Margin_Y_Change (8), DMA_5D/50D (7)
2016 F4: Operating_Margin_Y_Change (10), Operating_Margin_Y_Change (10), PC_Ratios_Y_Change (6)
2016 F5: Short_Term_Debt_Q_Change (6), Short_Term_Debt_Q_Change (6), PC_Ratios (5)
2017 F1: Short_Term_Debt_Q_Change (21), Return_On_Assets_Y_Change (13), DMA_5D/50D (12)
2017 F2: Short_Term_Debt_Q_Change (19), Return_On_Assets_Y_Change (12), DMA_5D/50D (5)
2017 F3: Return_On_Assets_Y_Change (13), DMA_5D/50D (9), Short_Term_Debt_Q_Change (8)
2017 F4: DMA_5D/50D (12), Short_Term_Debt_Q_Change (7), Short_Term_Debt_Q_Change (7)
2017 F5: PC_Ratios (6), PC_Ratios (6), PC_Ratios (6)
2018 F1: PC_Ratios (12), Short_Term_Debt_Q_Change (10), Free_Cash_Flow_Q_Change (8)
2018 F2: Short_Term_Debt_Q_Change (7), Free_Cash_Flow_Q_Change (5), Free_Cash_Flow_Q_Change (5)
2018 F3: Free_Cash_Flow_Q_Change (5), Free_Cash_Flow_Q_Change (5), Net_Debt_to_EBIT_Y_Change (4)
2018 F4: Short_Term_Debt_Q_Change (6), Short_Term_Debt_Q_Change (6), Cost_Of_Revenue_Q_Change (5)
2018 F5: Short_Term_Debt_Q_Change (5), PC_Ratios (4), PC_Ratios (4)
Table 14: Top five driving factors for the