Earnings Prediction with Deep Learning
Lars Elend, Sebastian A. Tideman, Kerstin Lopatta, Oliver Kramer
Computational Intelligence Group, Department of Computer Science, Carl von Ossietzky University of Oldenburg, 26111 Oldenburg, Germany
Abstract.
In the financial sector, a reliable forecast of the future financial performance of a company is of great importance for investors' investment decisions. In this paper, we compare long short-term memory (LSTM) networks to temporal convolutional networks (TCNs) in the prediction of future earnings per share (EPS). The experimental analysis is based on quarterly financial reporting data and daily stock market returns. For a broad sample of US firms, we find that LSTMs outperform the naive persistent model with up to 30.0% more accurate predictions, while TCNs achieve an improvement of 30.8%. Both types of networks are at least as accurate as analysts and exceed them by up to 12.2% (LSTM) and 13.2% (TCN).
Keywords:
Finance · Earnings Prediction · EPS Forecasts · Long Short-Term Memory · Temporal Convolutional Network.
1 Introduction

Investors rely first and foremost on earnings predictions when making investment decisions, e.g., to buy, hold, or sell a firm's shares. Besides using their own projections, they heavily rely on earnings forecasts provided by financial analysts. Consequently, forecasting earnings is one of the main tasks of financial analysts working at major financial institutions, e.g., broker firms. Analysts invest significant resources to provide accurate forecasts. However, forecasting is a difficult undertaking, as numerous factors influence the prediction performance. In this paper, we predict publicly listed US firms' quarterly earnings per share with state-of-the-art techniques from the field of deep neural networks based on companies' time series data.

We structure the remainder of this paper as follows. In Section 2, we present related work on the prediction of financial data. The base time series model and quality measures are introduced in Section 3. We describe the data preprocessing process in Section 4. The objective of our work is to compare LSTM networks with TCNs, which are introduced in Section 5. Section 6 presents the experimental analysis, and Section 7 draws conclusions.

2 Related Work
Analyst forecasts are often used to benchmark the accuracy of earnings predictions obtained from models. However, due to recent regulation of financial analysts' working conditions, e.g., limiting private access to management, a drop in analyst coverage has been observed [1]. Automated earnings prediction models supported by artificial intelligence may fill this gap. Empirical evidence on whether artificial intelligence can provide meaningful forecasts is missing.

Some evidence exists that fraud, e.g., the illegal manipulation of earnings, can be predicted using machine learning [4]. In their study, Bao et al. (2020) find that ensemble learning with raw accounting numbers has predictive power for future fraud cases. Their approach outperforms logistic regression models based on financial ratios commonly used by prior research [6] as well as a support-vector-machine model [5], in which a financial kernel maps raw accounting numbers into a set of financial ratios. Yet, the prediction of restatements is comparatively less challenging, as it is a binary decision (future restatement vs. no future restatement). In contrast, predicting future earnings is more challenging, as all discrete values are theoretically possible and information from multiple sources, e.g., financial statements and stock market data, has to be considered. To our knowledge, no study has yet predicted future earnings using artificial intelligence. Closest to this study is the work of Ball and Ghysels (2018) [3]. They use a mixed data sampling regression method (but no neural networks) to predict future earnings and find that their predictions beat analysts' predictions in certain cases, e.g., when the firm size is smaller and analysts' forecast dispersion is high.

3 Time Series Model
The goal in data-driven prediction based on time series is to find a function $\varphi$ that yields a future value $y$ based on the data of the past $\beta$ time steps $x = (q_{t-\beta+1}, \dots, q_t)$ (Fig. 1). In this paper, the time span between two time steps is 3 months. A non-perfect predictor $\hat{\varphi}(x) = \hat{y}$ can be evaluated using the mean squared error (MSE) with respect to the real value $y$.

Fig. 1: Illustration of the time series model for the prediction of the earnings of a company with quarterly reports $q_t$ at time step $t$. We seek a mapping $\varphi$ from a pattern $x$ of past earnings data to the label $y$, the predicted earnings for the future time step $t + \tau$ (in the illustration, $\tau = 1$ and $\beta = 3$). The window size $\beta$ describes the time span of considered past earnings.

To evaluate our model, we compare it with the persistent model and the analyst forecast. The persistent model is a simple baseline that uses the current value as the prediction for the next time step. For each model, the MSE is calculated; therefore, larger deviations are punished more than smaller ones. Since the difficulty of forecasting the given data varies greatly over time and between different companies, the error value in itself is not meaningful. Therefore, we use a relative comparison between the different models, namely the skill score (SS) [11]:

$$SS_{MSE} = 1 - \frac{MSE(m)}{MSE(base)}, \qquad (1)$$

where $MSE(m)$ is the MSE of our own model $m$ (LSTM, TCN) and $MSE(base)$ is the MSE of the comparison model: the persistent model $pa$ or the analyst forecast $a$. The model under consideration is better (worse) than the reference model if the skill score is greater (less) than 0 [11].
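As a concrete illustration, the following minimal sketch (our own example, not the authors' code; the function and variable names are hypothetical) builds sliding-window patterns from a quarterly series and computes the MSE-based skill score of a model against the persistent baseline:

```python
import numpy as np

def make_windows(q, beta, tau=1):
    """Build patterns x = (q_{t-beta+1}, ..., q_t) and labels y = q_{t+tau}."""
    X, y = [], []
    for t in range(beta - 1, len(q) - tau):
        X.append(q[t - beta + 1 : t + 1])
        y.append(q[t + tau])
    return np.array(X), np.array(y)

def skill_score(y_true, y_model, y_base):
    """SS_MSE = 1 - MSE(m) / MSE(base); > 0 means the model beats the baseline."""
    mse_model = np.mean((y_true - y_model) ** 2)
    mse_base = np.mean((y_true - y_base) ** 2)
    return 1.0 - mse_model / mse_base

# Toy example: a quarterly EPS series with window size beta = 3.
eps = np.array([1.0, 1.2, 1.1, 1.3, 1.4, 1.6, 1.5])
X, y = make_windows(eps, beta=3)
y_persistent = X[:, -1]          # persistent model: last observed value
y_model = y_persistent * 1.02    # stand-in for a learned predictor
print(skill_score(y, y_model, y_persistent))
```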
4 Data Preprocessing

As input data, we use accounting data (e.g., total assets and cost of goods sold) from Compustat Quarterly as well as daily stock market price and return data from CRSP (Daily Shares). At first, both datasets, Compustat Quarterly and Daily Shares, are reduced to the most important parameters per time step and firm. The different value ranges of the individual parameters $x$ are "normalized" by scaling with the total assets $atq$:

$$x' = \frac{x}{\max\{1, atq\}}, \qquad (2)$$

and studentized:

$$z'_i = \frac{z_i - \bar{z}}{\sqrt{\frac{1}{n}\sum_{i=1}^{n} (z_i - \bar{z})^2}}, \qquad (3)$$

where $\bar{z}$ is the mean of the $z_i$. Outliers of eps, which are partially erroneous, are removed by using the first (last) percentile as minimum (maximum). We create company samples of a given window size (number of quarters). Smaller data gaps are filled using linear interpolation, while samples with larger gaps are rejected. The quarterly data are merged with the corresponding daily stock data from Daily Shares, which are also studentized. For the comparison with the persistent model, only data points are used for which analyst forecasts exist.
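A minimal sketch of this preprocessing pipeline, assuming a pandas DataFrame with hypothetical column names (eps, atq, etc.; not the authors' released code):

```python
import numpy as np
import pandas as pd

def preprocess(df, value_cols, eps_col="eps", atq_col="atq"):
    """Scale by total assets, clip EPS outliers, studentize, and fill small gaps."""
    out = df.copy()
    # Eq. (2): scale each value column by total assets (guard against small atq).
    for col in value_cols:
        out[col] = out[col] / np.maximum(1.0, out[atq_col])
    # Remove partially erroneous EPS outliers via 1st/99th percentile clipping.
    lo, hi = out[eps_col].quantile([0.01, 0.99])
    out[eps_col] = out[eps_col].clip(lo, hi)
    # Eq. (3): studentize each column (zero mean, unit standard deviation).
    for col in value_cols + [eps_col]:
        out[col] = (out[col] - out[col].mean()) / out[col].std(ddof=0)
    # Fill small gaps by linear interpolation; samples with larger gaps
    # would be rejected upstream.
    out = out.interpolate(method="linear", limit=2)
    return out
```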
Compustat Quarterly: (cusip, fpedats, ffi5, ffi10, ffi12, ffi48, financialfirm, EPS_Mean_Analyst), rdq, epsfiq, atq, revtq, nopiq, xoprq, apq, gdwlq, rectq, xrdq, cogsq, rcpq, ceqq, niq, oiadpq, oibdpq, dpq, ppentq, piq, txtq

Daily Shares: (cusip, date), ret, prc, vol, shrout, vwretd

5 Neural Networks

An LSTM network [8] belongs to the family of recurrent neural networks. It employs backward connections, which allow information to be saved over time. LSTM cells internally consist of three gates: a forget, an input, and an output gate, see Fig. 2. An LSTM cell employs internal states $h$ and $s$ propagated through time. Input $x_t$ is concatenated with $h_{t-1}$ and fed to the forget, input, and output gates. The forget gate determines which information should be forgotten, the input gate specifies to which extent the new input data is taken into account, and the output gate specifies the information to output based on the internal state. With these functional components, an LSTM is well suited for time series data. LSTM networks have successfully been applied to numerous domains, e.g., wind power prediction [12] and speech recognition [7].
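For reference, the standard LSTM gate equations (a textbook formulation following [8], not quoted verbatim from this paper, using the cell state $s$ and hidden state $h$ named above):

$$\begin{aligned}
f_t &= \sigma(W_f [h_{t-1}, x_t] + b_f) && \text{(forget gate)}\\
i_t &= \sigma(W_i [h_{t-1}, x_t] + b_i) && \text{(input gate)}\\
\tilde{s}_t &= \tanh(W_s [h_{t-1}, x_t] + b_s) && \text{(candidate state)}\\
s_t &= f_t \odot s_{t-1} + i_t \odot \tilde{s}_t && \text{(cell state update)}\\
o_t &= \sigma(W_o [h_{t-1}, x_t] + b_o) && \text{(output gate)}\\
h_t &= o_t \odot \tanh(s_t) && \text{(hidden state)}
\end{aligned}$$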
A TCN [2] is a special kind of convolutional neural network [9]. While convolutional neural networks are primarily used for classification tasks on image, text, or speech data, TCNs can be applied to time series data. TCNs extend their counterparts by causal convolutions and dilated convolutions. The TCN has a one-dimensional time series input. Causal convolutions only use the current and past information for each filter. The dilation defines the distance between the used input data elements of each filter. An example of both concepts is visualized in Fig. 3 with a dilated causal convolution with kernel size $k = 2$ and dilations 1, 2, 4. In our experiments, we increase the dilation $d$ exponentially, i.e., $d_i = 2^i$, and select an appropriate number of layers to cover the given time span. TCNs also find numerous applications, e.g., in satellite image time series classification [10].
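A minimal sketch of such a stack of dilated causal convolutions in Keras (our illustration of the concept, not the authors' implementation; the layer count and filter sizes are hypothetical):

```python
import tensorflow as tf

def dilated_causal_stack(seq_len, n_features, n_layers=3, filters=32, k=2):
    """Stack of causal Conv1D layers with dilations 1, 2, 4, ... (d_i = 2**i).

    With kernel size k, the receptive field grows to 1 + (k - 1) * (2**n_layers - 1),
    so a few layers suffice to cover the whole input window.
    """
    inputs = tf.keras.Input(shape=(seq_len, n_features))
    x = inputs
    for i in range(n_layers):
        x = tf.keras.layers.Conv1D(
            filters=filters,
            kernel_size=k,
            dilation_rate=2 ** i,   # exponentially increasing dilation
            padding="causal",       # each filter sees only current and past inputs
            activation="relu",
        )(x)
    return tf.keras.Model(inputs, x)

model = dilated_causal_stack(seq_len=20, n_features=19)
model.summary()
```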
Fig. 2: Illustration of an LSTM cell with forget, input, and output gates. Yellow boxes represent ANN layers, orange circles represent element-wise operations.

Fig. 3: Dilated causal convolution with dilations $d = 1, 2, 4$.
6 Experimental Analysis

For our experiments, we employed two datasets: A for the choice of a proper architecture and parameters, and B for the final experiment with the selected best architecture. The training set of A includes all samples whose predicted EPS values lie in the period from 2012 to the end of 2016. The last 10% of the training set is used for validation only. The test set covers the half year following the training period, so it is independent and contains no unfair knowledge. For dataset B, the period is extended by half a year, so that its test data have not been seen before.

Each model is trained with a batch size of 1024 for 1000 epochs and a dropout rate of 0.3 for each intermediate layer and the recurrent edges of an LSTM layer. Dense layers apply tanh as activation function, except for the last layer, which uses a linear one. The window size of Compustat Quarterly and Daily Shares is set to 20, i.e., the last 20 quarters of earnings reports and the last 20 daily stock market returns form a pattern. The model is optimized using Adam with MSE as loss. Each epoch's best model w.r.t. the validation error is used for testing. Each experiment is repeated five times. Statistics include mean and standard deviation.

Furthermore, we have experimentally selected the best architectures as representatives for LSTM and TCN (Fig. 4). Compustat Quarterly and Daily Shares are used as input (green). The dimensions are given in parentheses. Since the shares data is put into a dense layer (D), the time input is flattened to a vector of length 220. After a few layers, the two inputs are joined by a merge layer. For the TCN, 32 filters and a kernel size of 3 were used. The last dense layer with only one neuron outputs the predicted EPS value.

Fig. 4: Visualization of the selected LSTM and TCN architectures. (a) LSTM: quarters (20, 19) → LSTM (20, 76) → LSTM (38); shares (220) → D (220) → D (440) → D (660); merge → D (19) → D (8) → D (1). (b) TCN: quarters (20, 19) → TCN (f=32, k=3) → D (38); shares (220) → D (220) → D (440) → D (660); merge → D (19) → D (8) → D (1).
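Assuming a Keras-style setup, a sketch of the two-branch LSTM variant based on the dimensions read off Fig. 4 (our reconstruction, not the authors' released code) could look as follows:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Branch 1: 20 quarters x 19 accounting features through two LSTM layers.
quarters_in = tf.keras.Input(shape=(20, 19), name="quarters")
x1 = layers.LSTM(76, return_sequences=True, dropout=0.3, recurrent_dropout=0.3)(quarters_in)
x1 = layers.LSTM(38, dropout=0.3, recurrent_dropout=0.3)(x1)

# Branch 2: flattened daily shares data (length 220) through dense layers.
shares_in = tf.keras.Input(shape=(220,), name="shares")
x2 = shares_in
for units in (220, 440, 660):
    x2 = layers.Dense(units, activation="tanh")(x2)
    x2 = layers.Dropout(0.3)(x2)

# Merge both branches; a single linear output neuron regresses the EPS value.
merged = layers.Concatenate()([x1, x2])
out = merged
for units in (19, 8):
    out = layers.Dense(units, activation="tanh")(out)
    out = layers.Dropout(0.3)(out)
out = layers.Dense(1, activation="linear", name="eps")(out)

model = tf.keras.Model([quarters_in, shares_in], out)
model.compile(optimizer="adam", loss="mse")
```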
As financial and non-financial companies show significantly different behavior in many regards, we analyze the predictions in independent experiments. Table 1 compares the prediction performance for three different sets of companies: all companies (all), non-financial companies only (nofin), and financial companies only (onlyfin). The data sets without financial firms usually give the best results; the worst results are achieved when only financial companies are taken into account. We test the best model on an independent dataset B. Table 2 shows the results of the best configurations of Table 1. The results for the non-financial companies are similar to those observed before, with an MSE that is 12–13% better than the analysts' predictions. The predictions for all companies are slightly better, but worse than on dataset A.
Table 1: Selected architectures and parameters for three groups of companies: financial (onlyfin), non-financial (nofin), and all, with skill scores $SS_{MSE}(m, pa)$ against the persistent model and $SS_{MSE}(m, a)$ against the analyst forecast for LSTM and TCN (mean ± standard deviation).

Table 2: Results on dataset B of the optimal architectures and parameters, grouped by financial sector affiliation ($SS_{MSE}(m, pa)$ and $SS_{MSE}(m, a)$, mean ± standard deviation).

These results suggest that LSTM networks and TCNs are indeed able to provide meaningful earnings predictions. Even after accounting for the variation across the repetitions (e.g., standard errors based on three repetitions), the range of significance (e.g., mean estimate plus/minus standard error) is well above zero in all cases. This is remarkable, as we only used widely available public data on companies, such as balance sheet information and stock market price and return data. Hence, we can conclude that our networks outperform both the persistent model and the mean forecast of financial analysts on a subsample of non-financial firms (e.g., manufacturing firms).
7 Conclusions

Our experimental analysis has shown that LSTM networks and TCNs are powerful models in the application of earnings prediction. We base our prediction models on quarterly accounting data, such as cost of goods sold and total assets, as well as stock market price and return data. Using these widely available time series data, the persistent model was significantly outperformed. The LSTMs performed slightly better in our analysis using the same set of variables. In the future, we will extend the experimental analysis to further data sets and integrate further domain knowledge to improve the financial predictions. Our findings are relevant to both broker firms and investors. Broker firms may want to consider developing LSTM networks and TCNs to supplement their analysts' forecasts. Investors could build their own forecast models using artificial intelligence, particularly when no forecasts from financial analysts are available, an issue that became more urgent recently due to the drop in analyst coverage induced by regulation.
References
1. Anantharaman, D., Zhang, Y.: Cover Me: Managers' Responses to Changes in Analyst Coverage in the Post-Regulation FD Period. The Accounting Review (6), 1851–1885 (Nov 2011). https://doi.org/10.2308/accr-10126
2. Bai, S., Kolter, J.Z., Koltun, V.: An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling. CoRR abs/1803.01271 (2018)
3. Ball, R.T., Ghysels, E.: Automated Earnings Forecasts: Beat Analysts or Combine and Conquer? Management Science (10), 4936–4952 (Oct 2017). https://doi.org/10.1287/mnsc.2017.2864
4. Bao, Y., Ke, B., Li, B., Yu, Y.J., Zhang, J.: Detecting Accounting Fraud in Publicly Traded U.S. Firms Using a Machine Learning Approach. Journal of Accounting Research (1), 199–235 (2020). https://doi.org/10.1111/1475-679X.12292
5. Cecchini, M., Aytug, H., Koehler, G.J., Pathak, P.: Detecting Management Fraud in Public Companies. Management Science (7), 1146–1160 (May 2010). https://doi.org/10.1287/mnsc.1100.1174
6. Dechow, P.M., Ge, W., Larson, C.R., Sloan, R.G.: Predicting Material Accounting Misstatements. Contemporary Accounting Research (1), 17–82 (2011). https://doi.org/10.1111/j.1911-3846.2010.01041.x
7. Graves, A., Jaitly, N.: Towards end-to-end speech recognition with recurrent neural networks. In: Proceedings of the 31st International Conference on Machine Learning - Volume 32, pp. II-1764–II-1772. ICML'14, JMLR.org, Beijing, China (2014)
8. Hochreiter, S., Schmidhuber, J.: Long Short-Term Memory. Neural Computation (8), 1735–1780 (Nov 1997). https://doi.org/10.1162/neco.1997.9.8.1735
9. LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., Jackel, L.D.: Backpropagation Applied to Handwritten Zip Code Recognition. Neural Computation (4), 541–551 (Dec 1989). https://doi.org/10.1162/neco.1989.1.4.541
10. Pelletier, C., Webb, G.I., Petitjean, F.: Temporal Convolutional Neural Network for the Classification of Satellite Image Time Series. Remote Sensing (5), 523 (Jan 2019). https://doi.org/10.3390/rs11050523
11. Roebber, P.J.: The Regime Dependence of Degree Day Forecast Technique, Skill, and Value. Weather and Forecasting 13