Application of Deep Neural Networks to Assess Corporate Credit Rating
Parisa Golbayani, Dan Wang, and Ionuţ Florescu*

Financial Engineering, School of Business, Stevens Institute of Technology, 1 Castle Point Terrace, Hoboken, NJ 07030, USA
Hanlon Financial Laboratories, Stevens Institute of Technology, 1 Castle Point Terrace, Hoboken, NJ 07030, USA

*Corresponding author: [email protected]. Other authors' emails: [email protected], [email protected]
Abstract
Recent literature implements machine learning techniques to assess corporate credit rating based on financial statement reports. In this work, we analyze the performance of four neural network architectures (MLP, CNN, CNN2D, LSTM) in predicting corporate credit rating as issued by Standard and Poor's. We analyze companies from the energy, financial and healthcare sectors in the US. The goal of the analysis is to improve the application of machine learning algorithms to credit assessment. To this end, we focus on three questions. First, we investigate whether the algorithms perform better when using a selected subset of features, or whether it is better to allow the algorithms to select features themselves. Second, is the temporal aspect inherent in financial data important for the results obtained by a machine learning algorithm? Third, is there a particular neural network architecture that consistently outperforms others with respect to input features, sectors and holdout set? We create several case studies to answer these questions and analyze the results using ANOVA and a multiple comparison testing procedure.
Keywords:
Convolutional neural network, long short-term memory, perceptron, credit rating
JEL:
C45, C52, C55
1 Introduction

Credit rating is an indication of the level of risk involved in investing in a company. It represents the likelihood that the company pays its financial obligations on time. Standard & Poor's uses a two-fold analysis (qualitative and quantitative) to assign a credit score to a company.

2 Data

Real-world data are often noisy and incomplete. Therefore, the first step of any prediction problem, and of credit risk assessment in particular, is to clean the data so that we maintain as much meaningful information as possible. The data in this study was obtained from Bloomberg and Compustat. It consists of 332 financial variables for three sectors in the US: financial, energy and healthcare. We gathered all financial variables available in Bloomberg and Compustat for 30 companies in the energy sector from 2010 to 2016, and for 66 and 59 companies in the financial and healthcare sectors, respectively, from 2000 to 2016.
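As a rough illustration of how such a quarterly panel might be organized before feeding it to the networks, the sketch below builds a feature matrix and label vector from a hypothetical CSV export. The file name, the column names (ticker, fiscal_quarter, sp_rating) and the rating-scale mapping are assumptions for illustration, not the authors' actual pipeline.

```python
import pandas as pd

# Hypothetical export of quarterly statements: one row per company-quarter,
# 332 numeric financial variables plus an S&P rating label.
panel = pd.read_csv("quarterly_statements.csv")          # assumed file name
panel = panel.sort_values(["ticker", "fiscal_quarter"])  # assumed column names

# Map S&P letter ratings to integer classes for the networks' output layer.
rating_order = ["AAA", "AA", "A", "BBB", "BB", "B", "CCC"]  # assumed rating granularity
panel["rating_class"] = panel["sp_rating"].map({r: i for i, r in enumerate(rating_order)})

feature_cols = [c for c in panel.columns
                if c not in ("ticker", "fiscal_quarter", "sp_rating", "rating_class")]

X = panel[feature_cols].fillna(0.0).to_numpy()   # missing values replaced by zero, as in the paper
y = panel["rating_class"].to_numpy()
print(X.shape, y.shape)
```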
3 Methodology

In this section, we provide a broad description and specific details of the network architectures used in this study. For more details please refer to the citations in each section.

3.1 Multilayer Perceptron (MLP)

This is the most classical neural network architecture, also known as a feed-forward artificial neural network. It consists of an input layer, an output layer and several hidden layers. In an MLP, all the nodes are fully connected to the nodes in adjacent layers. The MLP is the most prevalent network architecture for credit rating problems Ahn and Kim (2011); Huang et al. (2004); Kumar and Haynes (2003); Kumar and Bhattacharya (2006).

Large-scale neural networks typically overfit the training data sets. Several parameters of the MLP, such as the number of hidden layers, the number of hidden units, the learning rate and the dropout ratio, are crucial for tuning the performance of such networks. In this study, we start from the values of these parameters (also known as hyperparameters) published in Huang et al. (2004). Then, we use GridSearch to find the best values of these parameters for our specific datasets.
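A minimal sketch of this kind of grid search, using scikit-learn's GridSearchCV around an MLP classifier (the paper does not specify the implementation it used). The stand-in data and the parameter grid shown here are assumptions for illustration and do not reproduce the authors' exact search space.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Stand-in data with the same shape as the paper's input (332 financial variables,
# a handful of rating classes); replace with the real quarterly panel in practice.
X_train, y_train = make_classification(n_samples=1000, n_features=332,
                                        n_informative=50, n_classes=7, random_state=0)

# Candidate hyperparameter values; the grid actually searched in the paper is not
# listed, so these ranges are illustrative assumptions.
param_grid = {
    "hidden_layer_sizes": [(41,), (41, 41), (41, 41, 41), (82, 82, 82)],
    "learning_rate_init": [0.01, 0.05, 0.1],
}

search = GridSearchCV(MLPClassifier(max_iter=500, early_stopping=True),
                      param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```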
3.2 Convolutional Neural Network (CNN)

CNNs are a derivative of traditional multilayer perceptron (MLP) neural networks. They are optimized for advanced computer vision tasks such as two-dimensional pattern recognition and image classification problems Driss et al. (2017); LeCun et al. (1995).

The architecture of a CNN consists of three key components: the convolution layers, the pooling layers and the fully connected layers. CNNs assume there is local connectivity between the input vector data points. They capture this connectivity by using filters with a specific size and stride, panning around the entire image. Each filter moves along the input data k points at a time (a stride of k) and extracts certain features by applying a convolution operation to the input tensor. When the stride value increases, the dimension of the resulting output decreases. In the case of images, the input data is in matrix format. The margins (edges) of an image are a very different part of the input than a rectangle in the middle of the image; to detect the edges, the image is generally padded with 0's. When dealing with large-dimensional input data (e.g., high-resolution images), the CNN introduces pooling layers between subsequent convolution layers. The only purpose of the pooling layer is to reduce the spatial dimension of the input. The depth of the original data set is unchanged, as the pooling is done independently on each dimension. The most common type of pooling is max pooling, which computes the maximum value of the data on each consecutive set of vectors in a given dimension. The combination of convolution and pooling layers captures the temporal dynamics of time series Tsantekidis et al. (2017). After the last convolution/pooling layer, a set of fully connected layers is used in the network. The ground truth target y_i is one of the distinct classes we have for corporate credit scores, while ỹ_i is the predicted credit score for a given observation. The error in prediction is obtained by applying the cross-entropy loss function on the output layer. The CNN is trained using a classic backpropagation algorithm. The backpropagation algorithm Werbos et al. (1990) is based on minimizing the categorical cross-entropy loss function.

Compared to a standard feed-forward neural network, a CNN has an easier training process due to its smaller number of parameters and connections. It is also capable of selecting useful financial features during the training process as a result of the convolution operation Di Persio and Honchar (2016).

In this work we implement two types of CNN: one-dimensional (CNN) and two-dimensional (CNN2D). In both architectures, we use two convolution layers and two fully connected layers. The first convolution layer includes 64 filters and the second convolution layer consists of 32 filters, each with size 3, that move along the input data with stride = 1. The last two fully connected layers contain 128 neurons each. In the CNN, the input data contains only one quarter (the current quarter), so the filter moves in only one direction. In the CNN2D, the 4 most recent consecutive quarters are fed to the network at a time, so the filter moves in two directions.
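A minimal Keras sketch of the one-dimensional CNN configuration described above (two convolution layers with 64 and 32 filters of size 3 and stride 1, followed by two fully connected layers of 128 neurons). The 332-variable single-quarter input follows this paper's setup, while the number of rating classes, the optimizer and the training settings are illustrative assumptions.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_features, n_classes = 332, 7  # one quarter of financial variables; assumed 7 rating classes

model = keras.Sequential([
    layers.Input(shape=(n_features, 1)),          # a single quarter, treated as a 1D signal
    layers.Conv1D(64, kernel_size=3, strides=1, activation="relu"),
    layers.Conv1D(32, kernel_size=3, strides=1, activation="relu"),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(128, activation="relu"),
    layers.Dense(n_classes, activation="softmax"),
])
# Categorical cross-entropy loss, minimized by backpropagation as described in the text.
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# Dummy data with the right shapes; replace with the real quarterly panel.
X = np.random.rand(256, n_features, 1)
y = np.random.randint(0, n_classes, size=256)
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
```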
3.3 Long Short-Term Memory (LSTM)

Corporate credit scores do not change very much from quarter to quarter. Therefore, studying changes in credit rating is much more important than treating each quarter as an independent observation. However, credit rating does not deteriorate immediately; signs of distress typically appear in the quarters before the rating changes. This led us to include in the input variables information from the previous quarters, and naturally to the Long Short-Term Memory (LSTM) neural network architecture Hochreiter and Schmidhuber (1997).

LSTM is a special type of recurrent neural network (RNN). An RNN is a type of neural network containing loops that allow information to persist in time. One of the important advantages of RNNs is the ability to connect previous temporal observations as input for the problem. In finance this is typically crucial, which explains why LSTMs are popular in this area. For a more detailed description of the LSTM architecture please see Heaton et al. (2016); Dixon et al. (2017). For different applications of LSTM in finance see Akita et al. (2016); Bao et al. (2017); Chen et al. (2015).

In this study, we implement an LSTM with three layers. The first layer contains LSTM cells with 32 units. The second and third layers are densely connected neural network layers, each with 128 units. We consider 4 consecutive quarters as the input to the LSTM.
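A companion sketch of the three-layer LSTM described above (one LSTM layer with 32 units followed by two dense layers of 128 units), fed with windows of 4 consecutive quarters of the 332 financial variables. As before, the number of rating classes and the training settings are assumptions for illustration.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_quarters, n_features, n_classes = 4, 332, 7  # 4 consecutive quarters; assumed 7 rating classes

model = keras.Sequential([
    layers.Input(shape=(n_quarters, n_features)),
    layers.LSTM(32),                       # first layer: LSTM cells with 32 units
    layers.Dense(128, activation="relu"),  # second layer: densely connected, 128 units
    layers.Dense(128, activation="relu"),  # third layer: densely connected, 128 units
    layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# Dummy 4-quarter windows with the right shape; replace with windows built from the real panel.
X = np.random.rand(256, n_quarters, n_features)
y = np.random.randint(0, n_classes, size=256)
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
```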
4 Results

We are addressing three different questions in this section.

• First, is the performance better when using a selected subset of input features versus all input variables? (Section 4.1)

• Second, is the performance different depending on the choice of training/test datasets? (Section 4.2)

• Third, is there a particular neural network architecture that consistently outperforms others with respect to input features, sectors and holdout set? (Section 4.3)
4.1 Comparing the performance of algorithms when using a selected set of features versus all input features
Rating agencies use specific financial ratios to assess corporate credit ratings Wallis et al. (2019). These features are considered to be good indicators of a company's financial situation. In this section, we choose features that are important for a company's profitability and revenue Wallis et al. (2019); Kumar (2001). Table 1 provides the list of these features according to the literature cited.
Case 1:
In this section, we use these selected features as input to the feed-forward neural network to predict corporate credit ratings of three sectors (financial, energy and healthcare) in the US.

Table 1: Financial ratios as common informative features

R1  Debt/EBITDA
R2  FFO/Total Debt
R3  EBITDA/Interest
R4  FFO/Interest
R5  CFO/Debt
R6  FFO/Net Profit
R7  NWC/Revenue
R8  Current Assets/Current Liabilities
R9  (FFO+Cash)/Current Liabilities
R10 EBITDA/Revenues
R11 Cash/Total Debt
R12 Total Debt/Tangible Net Worth
R13 Total Debt/Revenue
R14 Debt/Capital
R15 Cash/Assets
R16 Total Fixed Capital/Total Fixed Assets
R17 Equity/Assets
R18 NWC/Total Assets
R19 Retained Earnings/Total Assets
R20 EBITDA/Total Assets

For Case 1 we only use the standard MLP neural network architecture (Section 3.1). Similar results are obtained for the other network architectures. As in Huang et al. (2004) we start with 41 hidden units, and use GridSearch to find the best number of hidden units along with the optimal number of hidden layers for our datasets. For future reference, the best results correspond to a 3-hidden-layer architecture with a learning rate of 0.05. To address overfitting we apply cross validation, dropout and early stopping.

Table 2: Cross validation, dropout and early stopping results on the energy sector

Number of hidden units    Accuracy on test set    Accuracy on train set
41                        0.776                   0.819
82                        0.808                   0.806
164                       0.753                   0.838

Table 2 shows the optimal accuracy on the test and training sets for the energy sector.
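To make the overfitting controls concrete, here is a hedged Keras sketch combining a validation split, dropout layers and an early-stopping callback for the MLP. The dropout rate, patience and number of classes are illustrative assumptions rather than the tuned values behind Table 2; only the 41 hidden units, the 20 selected ratios and the 0.05 learning rate come from the text.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_features, n_classes = 20, 7  # the 20 selected ratios of Table 1; assumed 7 rating classes

model = keras.Sequential([
    layers.Input(shape=(n_features,)),
    layers.Dense(41, activation="relu"),   # starting point of 41 hidden units, as in Huang et al. (2004)
    layers.Dropout(0.2),                   # dropout rate is an assumption
    layers.Dense(41, activation="relu"),
    layers.Dropout(0.2),
    layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.05),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])

early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                            restore_best_weights=True)

# Dummy data standing in for the ratio features; replace with the real Case 1 inputs.
X = np.random.rand(500, n_features)
y = np.random.randint(0, n_classes, size=500)
model.fit(X, y, validation_split=0.2, epochs=100, batch_size=32,
          callbacks=[early_stop], verbose=0)
```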
Case 2:
In this scenario we use all historical financial variables available on COMPUSTAT, indiscriminately, as the input data set to the neural networks. We also use the MLP architecture to have a direct comparison with Case 1. The data contains 332 financial variables and we replace all missing points with zero. Handling missing values is important, as they can significantly affect the results of classification. To address this issue, we also implement a CNN architecture. Recall from Section 3.2 that the convolutional layer in the CNN architecture is capable of discovering important features automatically. Specifically, in our case the CNN adapts the weights of features to classify the respective corporate credit ratings.

Table 3: Mean and standard deviation of accuracy based on random selection using 332 financial variables

Sector       Model   Test Mean   Test Std   Train Mean   Train Std
Energy       MLP     0.872       0.010      0.974        0.009
Energy       CNN     0.885       0.022      0.984        0.034
Financial    MLP     0.747       0.022      0.837        0.022
Financial    CNN     0.868       0.012      0.981        0.007
Healthcare   MLP     0.776       0.014      0.869        0.014
Healthcare   CNN     0.866       0.0324     0.981        0.026

Table 3 compares the performance of MLP and CNN on the test and training sets for the energy, financial and healthcare sectors. As we can clearly see, CNN provides better results. We believe this indicates that handling missing data is important and that the MLP may probably be improved by better data processing.

Table 4 showcases the main comparison of this section. We present the results obtained applying MLP on the test data for the energy and healthcare sectors. Case 1 develops a model based only on the selected features, while the Case 2 MLP model contains all the variables. From the results we see that, even using an inefficient way of handling missing data, Case 2 outperforms the results for Case 1.

Table 4: Comparison of mean and standard deviation of accuracy on the test set for the energy and healthcare sectors

Sector       Case 1 mean   Case 1 std dev   Case 2 mean   Case 2 std dev
Energy       0.808         0.032            0.872         0.010
Healthcare   0.752         0.025            0.776         0.014

All experiments in this section are based on a random selection of the train/test data set. The energy sector contains 30 companies from 2010-2016. The healthcare and financial sectors include 59 and 66 companies, respectively, from 2000-2016. In order to have a fair assessment, we split the dataset for the energy sector into an 85% training and 15% test set. For the financial and healthcare sectors, we split the dataset into a 94% training and 6% test set. We train and test each model 15 times to determine the stability of each model. Note that the Case 2 MLP replaces all missing points with 0 and uses no convolutional layer.
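A small sketch of the repeated random train/test evaluation used in these experiments (85%/15% for energy, 94%/6% for the other sectors, 15 repetitions), with missing values already filled by zero upstream. The model constructor and the random seeds are placeholder assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

def repeated_random_accuracy(X, y, test_size=0.15, n_runs=15):
    """Train/test the model on n_runs random splits and report mean and std accuracy."""
    scores = []
    for seed in range(n_runs):
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=test_size, random_state=seed)
        model = MLPClassifier(hidden_layer_sizes=(41, 41, 41), max_iter=500)  # placeholder model
        model.fit(X_tr, y_tr)
        scores.append(model.score(X_te, y_te))
    return float(np.mean(scores)), float(np.std(scores))

# X, y: feature matrix (missing values replaced by zero) and rating classes built earlier.
# mean_acc, std_acc = repeated_random_accuracy(X, y, test_size=0.15)  # energy: 85/15 split
```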
4.2 Comparing random selection of data with yearly selection of data

When forecasting corporate credit ratings, the majority of the existing literature uses a random selection of data points to form the training and testing subsets. However, in finance, most datasets have a temporal aspect to them. If one uses random selection to separate train/test data, it is possible that the model is trained using data from the same quarter as the test data. We believe the temporal aspect is important, as quarterly credit rating is done around the same time. One would think that the prevailing political and economic conditions will have an effect on the ratings. For example, consider 2008, the most turbulent year in our data set. Realistically, one would like the training set to not contain any quarter from that year, because the financial statements from different companies in the same sector during that time are similar. To test the effect of random allocation of data between training and test datasets we create two test cases.
Case 3:
We run four different neural network architectures, MLP, CNN, CNN2D and LSTM, for all sectors. We randomly split the dataset into a training set and a holdout set as in the previous section. As we explained in Section 2, we split the dataset for the energy sector into an 85% training and 15% test set. For the financial and healthcare sectors, we split the dataset into a 94% training and 6% test set. The goal is to compare the results of this case with the ones obtained for Case 4 with the same ratio of train/test set.
Case 4:
We form the test set by selecting one year, and the training data as all the years minus the one selected. This allocation avoids having data from the same period in both the test and training sets. We again apply all four models, MLP, CNN, CNN2D and LSTM, to all sectors. Table 5 provides the results of this analysis obtained from the test data. For completeness we show the results for all years. For example, the 2014 row entry means that the training set contains data for all years but 2014 and the test data is 2014. Due to space constraints we decided not to include the results for the training data. However, we should mention that the accuracy numbers obtained for the training data are much larger than those in Table 5 obtained for the test data. This is generally indicative of overfitting. Intuitively, this is as expected, since there are no contemporaneous points in the training and test data sets and thus the performance on the test data is much worse.

Table 5: Accuracy based on Case 4

Energy       CNN     CNN2D   LSTM    MLP
2010         0.788   0.835   0.825   0.769
2011         0.889   0.917   0.870   0.898
2012         0.830   0.973   0.929   0.893
2013         0.893   0.902   0.902   0.839
2014         0.848   0.830   0.839   0.813
2015         0.716   0.843   0.913   0.690
2016         0.543   0.875   0.938   0.491

Financial    CNN     CNN2D   LSTM    MLP
2000         0.565   0.770   0.745   0.503
2001         0.706   0.822   0.789   0.567
2002         0.801   0.828   0.806   0.581
2003         0.817   0.869   0.649   0.749
2004         0.776   0.890   0.810   0.711
2005         0.807   0.894   0.787   0.715
2006         0.699   0.815   0.731   0.611
2007         0.662   0.676   0.740   0.557
2008         0.505   0.631   0.653   0.486
2009         0.576   0.768   0.817   0.629
2010         0.808   0.885   0.836   0.734
2011         0.811   0.829   0.899   0.759
2012         0.860   0.856   0.908   0.790
2013         0.875   0.862   0.806   0.888
2014         0.803   0.803   0.849   0.777
2015         0.867   0.846   0.817   0.863
2016         0.723   0.935   0.919   0.752

Healthcare   CNN     CNN2D   LSTM    MLP
2000         0.525   0.803   0.752   0.516
2001         0.611   0.765   0.788   0.634
2002         0.690   0.638   0.696   0.620
2003         0.718   0.754   0.810   0.620
2004         0.676   0.759   0.841   0.572
2005         0.707   0.804   0.811   0.660
2006         0.772   0.771   0.785   0.655
2007         0.722   0.711   0.792   0.656
2008         0.704   0.803   0.770   0.724
2009         0.816   0.829   0.868   0.770
2010         0.732   0.680   0.830   0.686
2011         0.775   0.811   0.855   0.794
2012         0.806   0.839   0.845   0.838
2013         0.788   0.785   0.767   0.764
2014         0.747   0.858   0.876   0.729
2015         0.806   0.804   0.881   0.876
2016         0.696   0.882   0.863   0.702

To formally compare the results of Cases 3 and 4, Table 6 presents the mean and standard deviation of the percent accuracy for the test dataset. The Case 4 column presents average performance across years, with the standard deviation in parentheses. For Case 3 we used 15 different random allocations, while Case 4 uses each year as one observation.

Table 6: Mean and standard deviation of percent accuracy for Cases 3 and 4

Sector       Model   Case 3 Mean (Std)   Case 4 Mean (Std)
Energy       MLP     0.8359 (0.0200)     0.7704 (0.1427)
Energy       CNN     0.8775 (0.0243)     0.7868 (0.1237)
Energy       LSTM    0.9543 (0.0154)     0.888 (0.0438)
Energy       CNN2D   0.9403 (0.0243)     0.8822 (0.0522)
Financial    MLP     0.7321 (0.018)      0.6866 (0.1206)
Financial    CNN     0.8654 (0.0145)     0.7447 (0.1117)
Financial    LSTM    0.8984 (0.0255)     0.7978 (0.0775)
Financial    CNN2D   0.9013 (0.0275)     0.8222 (0.0776)
Healthcare   MLP     0.8181 (0.0242)     0.695 (0.0941)
Healthcare   CNN     0.8972 (0.0123)     0.723 (0.0746)
Healthcare   LSTM    0.8862 (0.0245)     0.8135 (0.0509)
Healthcare   CNN2D   0.8444 (0.0288)     0.7821 (0.0623)
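A brief sketch of the yearly (Case 4) allocation, assuming the panel carries a fiscal-year column (an assumed name, not necessarily the authors' schema). It only illustrates the leave-one-year-out split, not the exact evaluation code used in the paper.

```python
def yearly_holdout_splits(panel, year_col="fiscal_year"):
    """Yield (year, train, test) triples where the test set is one calendar year held out."""
    for year in sorted(panel[year_col].unique()):
        test = panel[panel[year_col] == year]      # e.g. all 2014 quarters
        train = panel[panel[year_col] != year]     # every other year
        yield year, train, test

# for year, train, test in yearly_holdout_splits(panel):
#     ...fit a model on `train`, evaluate accuracy on `test`...
```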
There are a few observations we can draw from the results in Table 6. It appears that a random split of the data into test/training sets (Case 3) generally produces better results than the more realistic temporal allocation (Case 4). To confirm this we perform a one-sided test with the following hypotheses:

H0: There is no difference in the results obtained for yearly versus random allocation.
Ha: The results for random allocation are better than those for yearly allocation.

We present the p-values of the test in Table 7. From these numbers we see that, with the exception of MLP for the energy and financial sectors and CNN for the energy sector, every result is significantly larger when using a random allocation of data. The numbers point out the need to have a proper temporal allocation when testing machine learning algorithms developed for credit rating. A random allocation of data will significantly increase the percent accuracy.

Table 7: P-values for a one-sided t-test comparing results obtained for Case 3 and Case 4

           Energy    Financial   Healthcare
MLP        0.1359    0.0714      2.84E-05
CNN        0.0506    0.0002      1.69E-08
LSTM       0.0033    0.0000      1.17E-05
CNN2D      0.0126    0.0004      5.78E-04
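A hedged sketch of the one-sided comparison above using scipy. The paper does not state whether a pooled or Welch two-sample t-test was used, so the equal_var choice and the sample values below are assumptions for illustration.

```python
from scipy import stats

# case3_acc: accuracies from the 15 random-allocation runs (illustrative stand-in values);
# case4_acc: one accuracy per held-out year (here the energy LSTM column of Table 5).
case3_acc = [0.95, 0.96, 0.94, 0.95, 0.97, 0.93, 0.96, 0.95, 0.94, 0.96, 0.95, 0.97, 0.94, 0.96, 0.95]
case4_acc = [0.825, 0.870, 0.929, 0.902, 0.839, 0.913, 0.938]

# One-sided alternative: random allocation (Case 3) yields higher accuracy than yearly allocation.
t_stat, p_value = stats.ttest_ind(case3_acc, case4_acc, equal_var=False, alternative="greater")
print(t_stat, p_value)
```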
A second observation from the numbers presented in Table 5 is that LSTM and CNN2D seem to outperform CNN and MLP over all three sectors. We expected this difference in results, as both CNN2D and LSTM use as input the current quarter as well as the past 3 quarters, while MLP and CNN use as input only the current quarter. The next section is dedicated to discovering the best network architecture.

4.3 Comparing the performance of neural networks

In this section we introduce a formal statistical test that compares the performance of the network architectures considered. To our surprise, none of the machine learning articles perform this type of testing procedure. Most use a simple t-test similar to the testing procedure in the previous section. This test is suitable for a single pairwise comparison, but when we perform multiple comparisons the p-value of the tests has to be suitably modified. Here we use a multiple comparison procedure (see for example Hsu (1996)).

Specifically, we consider a two-way ANOVA model where the response variable is the performance of the model. The first factor is the sector, with 3 levels: energy, financial and healthcare. The second factor is the network architecture, with 4 levels: MLP, CNN, CNN2D and LSTM. Table 8 presents the ANOVA table for this analysis. We can clearly see that the interaction term is highly significant. This implies there is no universally best network architecture and in fact each sector needs to be studied separately.

Table 8: Two-way ANOVA

                         Sum of squares   df    F          PR(>F)
Sector                   0.06382          2     69.37      7.42E-24
Network Architecture     19.1571          4     10410.81   3.76E-240
Sector:Network           0.18564          8     50.443     7.59E-45
Residual                 0.9661           210

We investigate each sector by performing one-way ANOVAs for each sector. For each sector the factor network architecture is highly significant. We perform a multiple comparison for pairwise contrasts using a Tukey procedure. The results are presented in Tables 9 and 10. We skip the detailed numerical analysis and just present the result of the analysis. In the tables, Rank 1 has the best performance (percent accuracy on the test data), Rank 2 is the second best performance and so on. The circled values are not statistically different.

Table 9: Multiple comparison results for Case 3. Lower rank is better. Circled values are not significantly different.

Rank         1       2       3       4
Energy       LSTM    CNN2D   CNN     MLP
Financial    CNN2D   LSTM    CNN     MLP
Healthcare   CNN     LSTM    CNN2D   MLP

Table 10: Multiple comparison results for Case 4. Lower rank is better. Circled values are not significantly different.

Rank         1       2       3       4
Energy       LSTM    CNN2D   CNN     MLP
Financial    CNN2D   LSTM    CNN     MLP
Healthcare   LSTM    CNN2D   CNN     MLP

As expected from the two-way ANOVA results, for each sector a different architecture performs best. However, we can see that for both cases, as well as all sectors, the LSTM architecture is always in the top performing group. Based on these results it would appear that if one were to have a single choice of network architecture, the best choice is the Long Short-Term Memory model.
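A hedged statsmodels sketch of this two-way ANOVA and the follow-up Tukey comparison. The accuracy table is represented as a long-format DataFrame with placeholder values, and the exact model formula used by the authors is not given, so this is only one reasonable formulation.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Long-format results: one row per (sector, architecture, run) with its test accuracy.
# The values below are placeholders; in the paper each run is a random allocation or a held-out year.
results = pd.DataFrame({
    "sector":   ["energy"] * 8 + ["financial"] * 8 + ["healthcare"] * 8,
    "network":  ["mlp", "cnn", "lstm", "cnn2d"] * 6,
    "accuracy": [0.83, 0.88, 0.95, 0.94, 0.84, 0.87, 0.96, 0.93,
                 0.73, 0.86, 0.90, 0.90, 0.74, 0.87, 0.89, 0.91,
                 0.82, 0.90, 0.89, 0.84, 0.81, 0.89, 0.88, 0.85],
})

# Two-way ANOVA with interaction: accuracy ~ sector * network architecture.
model = ols("accuracy ~ C(sector) * C(network)", data=results).fit()
print(sm.stats.anova_lm(model, typ=2))

# Tukey multiple comparison of architectures within one sector.
energy = results[results["sector"] == "energy"]
print(pairwise_tukeyhsd(energy["accuracy"], energy["network"]))
```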
5 Conclusion

In this work, we critically compared applications of artificial neural networks to credit risk assessment. We implemented four popular neural network architectures (MLP, CNN, CNN2D, LSTM) and used them to answer several questions relevant to finance. The results obtained point to several recommendations when applying machine learning algorithms to finance.

It is generally believed that using only a subset of relevant features produces better results for machine learning algorithms than using all the variables in the financial reports. However, in this work we found that using all the financial variables as input, and allowing the neural networks to perform the feature selection during the training process, works better.

Most papers applying machine learning algorithms to finance split the data randomly into a training and a test set. We show that using the temporal dimension, which is always present in financial data, is the appropriate way to test algorithms. Specifically, we form the test set by holding out one year, and the remaining years form the training set. Using a temporal allocation is of course how the rating is done in reality, since all quarterly reports generally arrive at around the same time. We show that a random allocation will produce performance results significantly higher than a proper temporal allocation, so it is very clear that it distorts the measured performance of the algorithms. We believe the reason why random allocation has better performance is that some observations from the same quarter will be in both the training and the test data, and the neural network architecture picks this up.

Finally, we investigated whether there exists a neural network architecture which consistently outperforms the others with respect to input features, sectors and holdout set. We considered a two-way ANOVA model where the response variable is the performance result. To our knowledge this is the first proper statistical analysis of such results. We conclude that, from a credit rating perspective, a temporal network architecture such as the LSTM seems to perform best.

These recommendations are useful when assessing the creditworthiness of a company not previously rated. The first time a company is rated, the rating is based primarily on historical quarterly statements. This may be the case, for example, when rating Twitter and LinkedIn in 2014. When the rating is updated every quarter, the assessors need to look at the change in rating, and thus detecting a change in rating is perhaps a very relevant problem. We plan to investigate the performance of machine learning algorithms to detect and predict changes in credit rating in future work.
Acknowledgements
The authors report no conflicts of interest. The authors alone are responsible for the content and writing of the paper. The authors are grateful to UBS for the research grant to the Hanlon lab, which provided partial support for this research.

References
Ahn, H. and K.-J. Kim (2011). Corporate credit rating using multiclass classification models with order information. World Academy of Science, Engineering and Technology, International Journal of Social, Behavioral, Educational, Economic, Business and Industrial Engineering 5(12), 1783–1788.

Akita, R., A. Yoshihara, T. Matsubara, and K. Uehara (2016). Deep learning for stock prediction using numerical and textual information. In 2016 IEEE/ACIS 15th International Conference on Computer and Information Science (ICIS), pp. 1–6. IEEE.

Angelini, E., G. di Tollo, and A. Roli (2008). A neural network approach for credit risk evaluation. The Quarterly Review of Economics and Finance 48(4), 733–755.

Bao, W., J. Yue, and Y. Rao (2017). A deep learning framework for financial time series using stacked autoencoders and long-short term memory. PLoS ONE 12(7), e0180944.

Breiman, L. (2001). Random forests. Machine Learning 45(1), 5–32.

Chavan, S., H. Doshi, D. Godbole, P. Parge, and D. Gore. 1D convolutional neural network for stock market prediction using TensorFlow.js.

Chen, K., Y. Zhou, and F. Dai (2015). A LSTM-based method for stock returns prediction: A case study of China stock market. In 2015 IEEE International Conference on Big Data, pp. 2823–2824. IEEE.

Dereli, N. and M. Saraçlar (2019). Convolutional neural networks for financial text regression. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, pp. 331–337.

Di Persio, L. and O. Honchar (2016). Artificial neural networks architectures for stock price prediction: Comparisons and applications. International Journal of Circuits, Systems and Signal Processing 10, 403–413.

Dixon, M., D. Klabjan, and J. H. Bang (2017). Classification-based financial markets prediction using deep neural networks. Algorithmic Finance (Preprint), 1–11.

Driss, S. B., M. Soua, R. Kachouri, and M. Akil (2017). A comparison study between MLP and convolutional neural network models for character recognition. In Real-Time Image and Video Processing 2017, Volume 10223, pp. 1022306. International Society for Optics and Photonics.

Du, Y. (2018). Enterprise credit rating based on genetic neural network. In MATEC Web of Conferences, Volume 227, pp. 02011. EDP Sciences.

Fu, K., D. Cheng, Y. Tu, and L. Zhang (2016). Credit card fraud detection using convolutional neural networks. In International Conference on Neural Information Processing, pp. 483–490. Springer.

Gao, Q. (2016). Stock market forecasting using recurrent neural network. Ph.D. thesis, University of Missouri–Columbia.

Graves, A., S. Fernández, F. Gomez, and J. Schmidhuber (2006). Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In Proceedings of the 23rd International Conference on Machine Learning, pp. 369–376. ACM.

Graves, A., M. Liwicki, H. Bunke, J. Schmidhuber, and S. Fernández (2008). Unconstrained on-line handwriting recognition with recurrent neural networks. In Advances in Neural Information Processing Systems, pp. 577–584.

Graves, A. and J. Schmidhuber (2005). Framewise phoneme classification with bidirectional LSTM networks. In Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, Volume 4, pp. 2047–2052. IEEE.

Hájek, P. (2011). Municipal credit rating modelling by neural networks. Decision Support Systems 51(1), 108–118.

Heaton, J., N. G. Polson, and J. H. Witte (2016). Deep learning in finance. arXiv preprint arXiv:1602.06561.

Hinton, G. E., N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov (2012). Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580.

Hochreiter, S. and J. Schmidhuber (1997). Long short-term memory. Neural Computation 9(8), 1735–1780.

Hsieh, T.-J., H.-F. Hsiao, and W.-C. Yeh (2011). Forecasting stock markets using wavelet transforms and recurrent neural networks: An integrated system based on artificial bee colony algorithm. Applied Soft Computing 11(2), 2510–2525.

Hsu, J. (1996). Multiple Comparisons: Theory and Methods. Chapman and Hall/CRC.

Huang, Z., H. Chen, C.-J. Hsu, W.-H. Chen, and S. Wu (2004). Credit rating analysis with support vector machines and neural networks: A market comparative study. Decision Support Systems 37(4), 543–558.

Krizhevsky, A., I. Sutskever, and G. E. Hinton (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1097–1105.

Kumar, K. (2001). Detection and Prediction of Financial Distress. Barmarick.

Kumar, K. and S. Bhattacharya (2006). Artificial neural network vs linear discriminant analysis in credit ratings forecast: A comparative study of prediction performances. Review of Accounting and Finance 5(3), 216–227.

Kumar, K. and J. D. Haynes (2003). Forecasting credit ratings using an ANN and statistical techniques. International Journal of Business Studies 11(1).

LeCun, Y., Y. Bengio, et al. (1995). Convolutional networks for images, speech, and time series. The Handbook of Brain Theory and Neural Networks 3361(10), 1995.

Liu, Y., Q. Zeng, J. Ordieres Meré, and H. Yang (2019). Anticipating stock market of the renowned companies: A knowledge graph approach. Complexity 2019.

Liwicki, M., A. Graves, S. Fernández, H. Bunke, and J. Schmidhuber (2007). A novel approach to on-line handwriting recognition based on bidirectional long short-term memory networks. In Proceedings of the 9th International Conference on Document Analysis and Recognition, ICDAR 2007.

Minami, S. (2018). Predicting equity price with corporate action events using LSTM-RNN. Journal of Mathematical Finance 8(1), 58–63.

Miyamoto, M. and M. Ando. Predicting credit risk for Japanese SMEs with a neural network model.

Nazari, M. and M. Alidadi (2013). Measuring credit risk of bank customers using artificial neural network. Journal of Management Research 5(2), 17.

Ntakaris, A., G. Mirone, J. Kanniainen, M. Gabbouj, and A. Iosifidis (2019). Feature engineering for mid-price prediction forecasting with deep learning. arXiv preprint arXiv:1904.05384.

Prechelt, L. (1998). Early stopping - but when? In Neural Networks: Tricks of the Trade, pp. 55–69. Springer.

Rajaa, S. and J. K. Sahoo (2019). Convolutional feature extraction and neural arithmetic logic units for stock prediction. arXiv preprint arXiv:1905.07581.

Rather, A. M., A. Agarwal, and V. Sastry (2015). Recurrent neural network and a hybrid model for prediction of stock returns. Expert Systems with Applications 42(6), 3234–3241.

Roondiwala, M., H. Patel, and S. Varma (2017). Predicting stock prices using LSTM. International Journal of Science and Research (IJSR) 6(4), 1754–1756.

Rout, A. K., P. Dash, R. Dash, and R. Bisoi. Forecasting financial time series using a low complexity recurrent neural network and evolutionary learning approach. Journal of King Saud University - Computer and Information Sciences.

Smith, K. A. and J. N. Gupta (2003). Neural Networks in Business: Techniques and Applications. IGI Global.

Takeuchi, L. and Y.-Y. A. Lee (2013). Applying deep learning to enhance momentum trading strategies in stocks. Technical Report, Stanford University.

Tsantekidis, A., N. Passalis, A. Tefas, J. Kanniainen, M. Gabbouj, and A. Iosifidis (2017). Forecasting stock prices from the limit order book using convolutional neural networks. In 2017 IEEE 19th Conference on Business Informatics (CBI), Volume 1, pp. 7–12. IEEE.

Wallis, M., K. Kumar, and A. Gepp (2019). Credit rating forecasting using machine learning techniques. In Managerial Perspectives on Intelligent Big Data Analytics, pp. 180–198. IGI Global.

Werbos, P. J. et al. (1990). Backpropagation through time: What it does and how to do it. Proceedings of the IEEE 78(10), 1550–1560.

Xiong, R., E. P. Nichols, and Y. Shen (2015). Deep learning stock volatility with Google domestic trends. arXiv preprint arXiv:1512.04916.

Yu, L., S. Wang, and K. K. Lai (2008). Credit risk assessment with a multi-stage neural network ensemble learning approach. Expert Systems with Applications 34(2), 1434–1444.

Zhuge, Q., L. Xu, and G. Zhang (2017). LSTM neural network with emotional analysis for prediction of stock price.