Firms Default Prediction with Machine Learning
arXiv [q-fin.RM], Feb
Tesi Aliaj, Aris Anagnostopoulos ⋆, and Stefano Piersanti
Sapienza University of Rome, Italy
Bank of Italy
Abstract.
Academics and practitioners have studied over the years models for predicting firm bankruptcy, using statistical and machine-learning approaches. An earlier sign that a company has financial difficulties and may eventually go bankrupt is going into default, which, loosely speaking, means that the company has been having difficulties in repaying its loans to the banking system. A firm’s default status is not technically a failure, but it is very relevant for bank lending policies and often anticipates the failure of the company. Our study uses, for the first time to our knowledge, a very large database of granular credit data from the Italian Central Credit Register of the Bank of Italy, which contains information on all Italian companies’ past behavior towards the entire Italian banking system, to predict their default using machine-learning techniques. Furthermore, we combine these data with other information regarding companies’ public balance-sheet data. We find that ensemble techniques and random forests provide the best results, corroborating the findings of Barboza et al. (Expert Syst. Appl., 2017).
1 Introduction

Bankruptcy prediction of a company is, not surprisingly, a topic that has attracted a lot of research in the past decades by multiple disciplines [2–5, 7–10, 12–14, 18, 19, 21, 22]. Probably the main importance of such research is in bank lending. Banks need to predict the possibility of default of a potential counterparty before they extend a loan. An effective predictive system can lead to sounder and more profitable lending decisions, leading to significant savings for the banks and the companies and, most importantly, to a stable financial banking system. A stable and effective banking system is crucial for financial stability and economic recovery, as highlighted by the recent global financial crisis and European debt crisis. According to Fabio Panetta, general director of the Bank of Italy, referring to Italian loans, “The growth of the new deteriorated bank loans and the slowness of the judicial recovery procedures have determined a rapid increase in the stock of these assets, which in 2015 reached a peak of 200 billion, equal to 11 percent of total loans” (Fabio Panetta, Chamber of Deputies, Rome, May 10, 2018).

⋆ Partially supported by ERC Advanced Grant 788893 AMDROMA “Algorithmic and Mechanism Design Research in Online Markets.”

Of course, despite the plethora of studies, predicting the failure of a company is a hard task, as demonstrated by the enormous increase in large corporate failures in the last decades.

Most related research has focused on bankruptcy prediction, which takes place when the company officially has the status of being unable to pay its debts (see Section 3). However, companies often signal their financial problems towards the banking system much earlier by going into default. Informally speaking, a company enters into a default state if it has failed to meet its requirement to repay its loans to the banks and it is very probable that it will not be able to meet its financial commitments in the future (again, see Section 3).
Entering into a default state is a strong signal of a company’s failure: typically banks do not finance a company in such a state, and default is correlated with future bankruptcy.

In this paper we use historic data for predicting whether a company will enter into default. We base our analysis on two sets of data. First, we use historic information from all the loans obtained by almost all the companies based in Italy (totaling around 800K companies). This includes information on the companies’ credit dynamics in the past years, as well as past information on relations with banks and on values of protections associated with loans. Second, we combine these data with the balance sheets of 300K of these companies (the rest of them are not obliged to produce balance sheets). We apply multiple machine-learning techniques, showing that the future default status can be predicted with reasonable accuracy. Note that the size and the information content of our dataset significantly exceed those of past work, allowing us to obtain a very accurate picture of the possibility to predict default over various economic sectors.

Contributions. To summarize the contributions of our paper:
1. We analyze a very large dataset (800K companies) with highly granular data on the performance of each company over a period of 10 years. To our knowledge, this is the most extensive dataset used in the literature.
2. We use these data to predict whether a company will default in the next year.
3. We combine our data with data available from company balance sheets, showing that we can further improve the accuracy of predictions.

Roadmap. In Section 2 we present some related work. In Section 3 we provide definitions and we describe the problem that we solve. In Section 4 we describe our datasets and the techniques that we use, and in Section 5 we present our results. We conclude in Section 6.
2 Related Work

There has been an enormous amount of work on bankruptcy prediction. Here we present some of the most influential studies.

Initially, scholars focused on making a linear distinction between healthy companies and the ones that will eventually default. Among the most influential pioneers in this field we can distinguish Altman [2] and Ohlson [19], both of whom made a traditional probabilistic econometric analysis. Altman essentially defined a score, the Z discriminant score, which depends on several financial ratios (working capital/total assets, retained earnings/total assets, etc.) to assess the financial condition of a company. Ohlson, on the other hand, uses a logistic regression (LR), or logit, model that estimates the probability of failure of a company. Some papers criticize these methods as unable to classify companies as viable or nonviable [5]. However, both approaches are used, in the majority of the literature, as a benchmark to evaluate more sophisticated methods.

Since these early works there has been a large number of works based on machine-learning techniques [15, 17, 20]. The most successful have been based on decision trees [11, 14, 16, 23] and neural networks [3, 6, 10, 18, 22]. Typically, all these works use different datasets and different sets of features, depending on the dataset. Barboza et al. [4] compare such techniques with support vector machines and ensemble methods, showing that ensemble methods and random forests perform the best.

These works mostly try to predict the bankruptcy of a company. Our goal is to predict default (see Section 3). Furthermore, most of these papers use balance-sheet data (which are public). Our dataset contains granular information on the past loan-repayment behavior of a very large set of companies. To our knowledge, this is the most extensive dataset used in the literature.

There are many technical terms used to characterize debtors who are in financial problems: illiquidity, insolvency, default, bankruptcy, and so on. Most of the past research on prediction of failures addresses the concept of firm bankruptcy, which is the legal status, recorded in the public registers, of a company that is unable to pay its debts. A firm is in default towards a bank if it is unable to meet its legal obligations towards paying a loan. There are specific quantitative criteria that a bank may use to give a default status to a company.
3.1 Adjusted Default Status
The recent financial crisis has led to a revision and harmonization at international level of the concept of loan default. In general, default is the failure to pay interest or principal on a loan or security when due. In this paper we consider the classification of adjusted default status, which is a classification that the Italian central bank (Bank of Italy) gives to a company that has a problematic debt situation towards the entire banking system. It represents a supervisory concept, whose aim is to extend the default credit status to all the loans of a borrower towards the entire financial system (banks, financial institutions, etc.). The term refers to the concept of customer default in the Basel II international accord. According to this definition, a borrower is defined in default if its credit exposure has become significantly negative. In detail, to assess the status of adjusted default, Bank of Italy considers three types of negative exposures. They are the following, in decreasing order of severity: (1) a bad (non-performing) loan is the most negative classification; (2) an unlikely to pay (UTP) loan is a loan for which the bank has a high probability of losing money; (3) a loan is past due if it is not returned after a significant period past the deadline.

Bank of Italy classifies a company as being in adjusted default, or adjusted non-performing loan, if it has a total amount of loans belonging to the aforementioned three categories exceeding certain pre-established proportionality thresholds [1]. Therefore, a firm’s adjusted default classification derives from quantitative criteria and takes into account the company’s debt exposure to the entire banking system.

If a company enters into an adjusted-default status, then it is typically unable to obtain new loans. Furthermore, such companies are multiple times more likely to go bankrupt in the future. For instance, out of the 13K companies that were classified in a status of adjusted default in December of 2015, 2160 (16. … 4% of the companies that were not in adjusted default status became bankrupt.

In this paper we attempt to predict whether a company will obtain an adjusted default status, although for brevity we may call it just default.
In this section we describe the data on which we based our analysis and the machine-learning techniques that we used.
Our analysis is based on two datasets. The first and most important in our work is composed of information on loans and the credit of a large sample of Italian companies. The second reports balance-sheet data of a large subsample of medium-large Italian companies.
Credit data. The first dataset is a very large and highly granular dataset of credit information about Italian companies, drawn from the Italian Central Credit Register (CCR). The CCR is an information system on the debt of the customers of the banks and financial companies supervised by the Bank of Italy. Bank of Italy collects information on customers’ borrowings from the intermediaries and notifies them of the risk position of each customer vis-à-vis the banking system. By means of the CCR, the Bank of Italy provides intermediaries with a service intended to improve the quality of the lending of the credit system and ultimately to enhance its stability. The intermediaries report to the Bank of Italy on a monthly basis the total amount of credit due from their customers: information about loans of 30,000 euro or more and non-performing loans of any amount. The Italian CCR has three main goals: (1) to improve the process of assessing customer creditworthiness, (2) to raise the quality of credit granted by intermediaries, and (3) to strengthen the financial stability of the credit system.

The crucial feature of this database is the high granularity of credit information. It contains information for about 800K companies for each quarter of the period 2009–2014. The main features are shown in Table 1.

ID    Description                                          ID    Description
L1    Granted amount of loans                              B1    Revenues
L2    Used amount of loans                                 B2    ROE
L3    Bank’s classification of firm                        B3    ROA
L4    Average amount of loan used                          B5    Total turnover
L5    Overdraft                                            B6    Total assets
L6    Margins                                              B7    Financial charges/operating margin
L7    Past due (loans not returned after the deadline)     B8    EBITDA
L8    Amount of problematic loans
L9    Amount of non-performing loans
L10   Amount of loans protected by a collateral
L11   Value of the protection
L12   Amount of forborne credit

Table 1. Main attributes for the loan (L) and the balance-sheet (B) datasets.
The balance-sheet dataset. Our second dataset consists of the balance-sheet data of about 300K Italian firms. They are generally medium and large companies and they form a subset of the 800K companies with loan data. The main features include those that regard the profitability of a company, such as return on equity (ROE) and return on assets (ROA); see Table 1 for a more extended list. Typically, balance-sheet data are public and have been used extensively for bankruptcy prediction (e.g., see Barboza et al. [4] and references therein).

As we explain in Section 2, the first approaches for assessing the likelihood of companies to fail were based on some fixed scores; see the work by Altman [2]. Current approaches are based on more advanced machine-learning techniques. In this paper we follow the literature [4] by considering a set of diverse machine-learning approaches for predicting loan defaults.

In the first test we used five well-known machine-learning approaches. We provide a brief description of each of them, as provided by Wikipedia.
Decision Tree (DT): one of the most popular tools in decision analysis and also in machine learning. A decision tree is a flowchart-like structure in which each internal node represents a “test” on an attribute, each branch represents the outcome of the test, and each leaf node represents a class label (decision taken after computing all attributes). The paths from root to leaf represent classification rules.
Random Forest (RF): Random forests are an ensemble learning method for classification, regression and other tasks that operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes predicted by the individual trees. Random decision forests correct for decision trees’ habit of overfitting to their training set.
Bagging (BAG): Bootstrap aggregating, also called bagging, is a machine-learning ensemble meta-algorithm designed to improve the stability and accuracy of machine-learning algorithms used in statistical classification and regression. It also reduces variance and helps to avoid overfitting. Although it is usually applied to decision-tree methods, it can be used with any type of method. Bagging was proposed by Leo Breiman in 1994 to improve classification by combining classifications of randomly generated training sets.
AdaBoost (ADA): AdaBoost, short for Adaptive Boosting, is a machine-learning meta-algorithm formulated by Yoav Freund and Robert Schapire, who won the 2003 Gödel Prize for their work. It can be used in conjunction with many other types of learning algorithms to improve performance. The output of the other learning algorithms (“weak learners”) is combined into a weighted sum that represents the final output of the boosted classifier. AdaBoost (with decision trees as the weak learners) is often referred to as the best out-of-the-box classifier.
Gradient boosting (GB): Gradient boosting is a machine-learning technique for regression and classification problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. It builds the model in a stage-wise fashion like other boosting methods do, and it generalizes them by allowing optimization of an arbitrary differentiable loss function. That is, gradient-boosting algorithms optimize a cost function over function space by iteratively choosing a function (weak hypothesis) that points in the negative gradient direction.

In addition to these standard techniques, we also combined the various classifiers in the following way. After learning two versions of each classifier, one with the default parameters of the Python scikit-learn implementation and one with optimal parameters, we apply all of them (10 in total), and if at least 3 classifiers predict that a firm will default, then the combined classifier predicts default. The number 3 was chosen after experimentation. We call this combination approach COMB.

The main goal of our study is to evaluate the extent to which we can predict whether a company will enter a default state using data from past years. In particular, our goal is to predict whether a company that is not in default by December 2014 will enter into default during one of the four quarters of 2015. To do the prediction, we initially used data from the period 2006–2014; however, we noticed that using data from earlier than five quarters before 2015 did not help. Therefore, for all the experimental results that we report here, we use the data from the last quarter of 2013 and the four quarters of 2014.
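The voting rule behind COMB can be illustrated with a minimal sketch. It assumes that each of the 10 trained classifiers has already produced a 0/1 prediction per firm (1 meaning default); the function name `comb_predict` and the toy votes are hypothetical and are not taken from the authors' code.

```python
from typing import List, Sequence


def comb_predict(votes_per_classifier: Sequence[Sequence[int]],
                 min_votes: int = 3) -> List[int]:
    """Combine binary predictions (1 = default, 0 = no default).

    A firm is predicted to default when at least `min_votes`
    classifiers predict default for it.
    """
    if not votes_per_classifier:
        return []
    n_firms = len(votes_per_classifier[0])
    if any(len(votes) != n_firms for votes in votes_per_classifier):
        raise ValueError("all classifiers must predict for the same firms")
    combined = []
    # zip(*...) regroups the votes firm by firm.
    for firm_votes in zip(*votes_per_classifier):
        combined.append(1 if sum(firm_votes) >= min_votes else 0)
    return combined


# Toy example (made-up votes): 10 classifiers, 4 firms.
toy_votes = [
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
print(comb_predict(toy_votes))  # [1, 0, 0, 0]
```

A low threshold such as 3 out of 10 favors recall over precision, which suits a problem where defaults are rare; the threshold itself is a parameter that the paper reports choosing by experimentation.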
We use a variety of evaluation measures to assess the effectiveness of our classifiers, which we briefly define. As usual in a binary classification context, we use the standard concepts of true positive (TP), false positive (FP), true negative (TN), and false negative (FN):

                    Predicted Default    Predicted Not Default
Default             TP                   FN
Did not default     FP                   TN

For instance, FN is the number of firms that defaulted during 2015 but that the classifier predicted would not default. We now define the measures that we use:

– Precision: $\mathrm{Pr} = \frac{TP}{TP + FP}$
– Recall: $\mathrm{Re} = \frac{TP}{TP + FN}$
– F1-score: $\mathrm{F1} = \frac{2 \cdot \mathrm{Pr} \cdot \mathrm{Re}}{\mathrm{Pr} + \mathrm{Re}}$
– Type-I Error: $\text{Type-I} = \frac{FN}{TP + FN}$
– Type-II Error: $\text{Type-II} = \frac{FP}{TN + FP}$
– Balanced Accuracy: $\mathrm{BACC}_{\mathrm{F1}} = \frac{2 \cdot TP \cdot TN}{TP + TN}$

5.2 Datasets
In Section 4.1 we describe the datasets that we use. We perform two families of experiments. In the first one, we use only the loan data (as typically performed by Bank of Italy) to assess the probability of default. Then, we also combine this information with balance-sheet data. Because only a subset of 300K (out of the 800K) companies have balance-sheet data available, to be able to compare the results, we report here the findings over this subset for both families of experiments.

The classification problem that we deal with is very imbalanced: around 4.3% of the firms were in a default state in 2015. Class imbalance is a common problem that makes classification harder. Therefore, as performed in prior work [4], we consider two cases. First, we use the entire dataset; second, we also create a balanced version by selecting all the firms that defaulted (approximately 13K) and an equal number of random firms that did not default, creating in this way a balanced dataset of approximately 26K firms.

We evaluated the techniques presented in Section 4.2. To assess their effectiveness, we compare them with three basic approaches. The first one is a simple multinomial Naïve Bayes (MNB) classifier. The second is a logistic regression (LOG) classifier. Finally, we created the following simple test. We first measured the correlation of each feature with the target variable (refer to Table 1). We found that the most significant ones (i.e., the ones that are most correlated with the target variable) are L3 (a bank’s classification of the firm) and L7 (amount of loans not repaid after the deadline) for the loan dataset, and B2 (ROE) and B3 (ROA) for the balance-sheet dataset.

Then, for the loan dataset, we built the simple classifier that outputs default if at least one of L3 or L7 is nonzero, and not default otherwise. We call this baseline NAIVE.

We gather the classification approaches that we use in Table 2.
Method ID    Description
NAIVE        Naive classifier based on features correlation with target
MNB          Multinomial Bayesian classifier
LOG          Logistic Regression
GB           Gradient Boosting
RF           Random Forest
DT           Decision Tree
BAG          Bagging
ADA          AdaBoost
COMB         Combined method based on multiple classifiers

Table 2. Baselines and classification algorithms.
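The NAIVE baseline and the correlation screening that motivates it can be sketched as follows. This is only an illustration: the use of Pearson correlation, the function names, and the toy feature values are assumptions, since the paper does not specify which correlation measure was used.

```python
import math
from typing import List, Mapping, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_features(features: Mapping[str, Sequence[float]],
                  target: Sequence[int]) -> List[str]:
    """Feature names sorted by decreasing absolute correlation with the target."""
    return sorted(features,
                  key=lambda name: abs(pearson(features[name], target)),
                  reverse=True)


def naive_predict(l3: float, l7: float) -> int:
    """NAIVE rule: predict default (1) if L3 or L7 is nonzero, else 0."""
    return 1 if (l3 != 0 or l7 != 0) else 0


# Toy data (made up): three loan features for six firms, and the default label.
toy_features = {
    "L1": [10, 12, 11, 9, 10, 12],
    "L3": [0, 0, 1, 2, 0, 1],
    "L7": [0, 0, 0, 5, 0, 3],
}
toy_target = [0, 0, 1, 1, 0, 1]

ranking = rank_features(toy_features, toy_target)
predictions = [naive_predict(l3, l7)
               for l3, l7 in zip(toy_features["L3"], toy_features["L7"])]
print(ranking)
print(predictions)
```

On real data such a rule would be expected to flag many firms that never default, because any nonzero past-due amount triggers it.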
We are now ready to predict whether companies will enter into an adjusted default state, as we explain in Section 3.1.

First we present the results for the original, imbalanced dataset. In Table 3 we present the results when we use only the loan dataset, whereas in Table 4 we present the results when we also use the balance-sheet data. The first finding is that the evaluation scores are rather low. This is in accordance with all prior work, indicating the difficulty of the problem. We observe that the machine-learning approaches are better than the baselines and that the various algorithms trade off differently over the various evaluation measures. Random forests perform particularly well (in accordance with the findings of Barboza et al. [4]), and our combined approach (COMB) is able to trade off between precision and recall and give an overall good classification. Comparing Table 3 with Table 4, we see that the additional information provided by the balance-sheet data helps to improve the classification.
Method    Pr    Re    F1    Type-I    Type-II    BACC
NAIVE
MNB
LOG
GB
RF
DT
BAG
ADA
COMB
[numeric entries not preserved in the source text]

Table 3. Imbalanced training set; loan data. Higher values are better, except for Type-I and Type-II error.
Method    Pr    Re    F1    Type-I    Type-II    BACC
NAIVE
MNB
LOG
GB
RF
DT
BAG
ADA
COMB
[numeric entries not preserved in the source text]

Table 4. Imbalanced training set; loan and balance-sheet data. Higher values are better, except for Type-I and Type-II error.
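The balanced variant of the dataset described in Section 5.2 (all defaulted firms plus an equal number of randomly chosen non-defaulted firms) can be sketched as random undersampling of the majority class. The function name, the fixed seed, and the toy labels are illustrative assumptions.

```python
import random
from typing import List, Sequence


def balanced_sample(labels: Sequence[int], seed: int = 0) -> List[int]:
    """Return indices of all defaulted firms (label 1) plus an equal
    number of randomly chosen non-defaulted firms (label 0)."""
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    if len(negatives) < len(positives):
        raise ValueError("not enough non-defaulted firms to balance the sample")
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sampled_negatives = rng.sample(negatives, len(positives))
    return sorted(positives + sampled_negatives)


# Toy labels (made up): 4 defaulted firms out of 100.
toy_labels = [1] * 4 + [0] * 96
indices = balanced_sample(toy_labels)
print(len(indices), sum(toy_labels[i] for i in indices))  # 8 4
```

Scores obtained on such a balanced sample are not directly comparable with scores on the full population, where defaults are a small minority.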
In Tables 5 and 6 we present the results for the balanced dataset. There are some interesting findings here as well. First, as expected, the classification accuracy improves (similarly to [4]). Second, notice that the NAIVE classifier performs well (as expected, because feature L3 takes into account several factors of the company’s behavior); however, the type-II error is high. Overall, the COMB approach remains the best performer.
Method    Pr    Re    F1    Type-I    Type-II    BACC
NAIVE
MNB
LOG
GB
RF
DT
BAG
ADA
COMB
[numeric entries not preserved in the source text]

Table 5. Balanced training set; loan data. Higher values are better, except for Type-I and Type-II error.
Method    Pr    Re    F1    Type-I    Type-II    BACC
NAIVE
MNB
LOG
GB
RF
DT
BAG
ADA
COMB
[numeric entries not preserved in the source text]

Table 6. Balanced training set; loan and balance-sheet data. Higher values are better, except for Type-I and Type-II error.
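The column measures reported in the result tables can be computed from the four confusion-matrix counts. The sketch below follows the definitions of precision, recall, F1, Type-I and Type-II error given in the evaluation-measures section; for balanced accuracy it uses the common textbook definition, the mean of the true-positive and true-negative rates, which is an assumption and may differ from the BACC variant used in the paper. The counts in the example are made up.

```python
from typing import Dict


def evaluation_measures(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute evaluation measures from confusion-matrix counts."""

    def ratio(num: float, den: float) -> float:
        # Guard against empty classes; 0.0 is a convention, not a definition.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    f1 = ratio(2 * precision * recall, precision + recall)
    type_1 = ratio(fn, tp + fn)   # share of defaulted firms that were missed
    type_2 = ratio(fp, tn + fp)   # share of healthy firms flagged as default
    tnr = ratio(tn, tn + fp)
    # Assumption: standard balanced accuracy, (TPR + TNR) / 2.
    balanced_accuracy = (recall + tnr) / 2
    return {
        "Pr": precision,
        "Re": recall,
        "F1": f1,
        "Type-I": type_1,
        "Type-II": type_2,
        "BACC": balanced_accuracy,
    }


# Toy counts (made up): 50 defaulted firms among 1000.
measures = evaluation_measures(tp=30, fp=20, tn=930, fn=20)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

With these definitions, Type-I error equals one minus recall, so the two columns carry the same information from opposite directions.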
We now see an application of our classifier to an applied problem faced by the Bank of Italy. We compare the best-performing classifier (COMB) with a method commonly used to estimate the probability of one-year default by companies at aggregate level.

Consider a segmentation of all the companies (e.g., according to economic sector, geographical area, etc.). Often there is the need to estimate the probability of default (PD) of a loan in a given segment. A very simple approach, which is actually used in practice, is to take the ratio of the companies in the segment that went into default at year T + 1 over all the companies that were not in default in year T. We use this method as a baseline.

We now consider a second approach based on our classifier, which we call COMB. We estimate the PD by considering the number of companies in the segment that are expected (using the COMB classifier) to go into default at year T + 1 compared to the total loans existing for the segment at time T.

We use two different segmentations: a coarse one, in which the segments are defined by the economic sector (e.g., mineral extraction, manufacturing), and a finer one, which is defined by the combination of the economic sector and the geographic area, as defined by a value similar to the company’s zip code.

In Table 7 we compare the two approaches for estimating the PD. As expected, in both segmentations the classifier-based approach is the winner, with the improvement being larger for the finer segmentation. In many cases the two approaches give the same result, typically because in these cases there are no companies that fail (PD equals 0).

                          Coarse segmentation      Fine segmentation
                          Baseline    COMB         Baseline    COMB
Mean error
Var error
Superiority percentage
[numeric entries not preserved in the source text]

Table 7. Comparison of the standard approach to estimate PD with the classifier-based one. “Mean error” is the average error between the predicted PD value and the real one. “Var error” is the variance of the error. “Superiority percentage” is the percentage of segments in which the predictor is better than the other; in the remaining ones we have the same performance.
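A segment-level PD estimate and its error can be sketched as follows. The sketch weights every firm equally and measures the error as an absolute difference; both choices are assumptions (the text above mentions total loans in the segment, so an exposure-weighted variant is also plausible). The function names, segment labels, and flags are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def pd_by_segment(segments: Sequence[Hashable],
                  default_flags: Sequence[int]) -> Dict[Hashable, float]:
    """Share of firms flagged as default (1) within each segment.

    `default_flags` can hold observed defaults (baseline computed on a
    past year) or predicted defaults (classifier-based estimate).
    """
    totals: Dict[Hashable, int] = defaultdict(int)
    defaults: Dict[Hashable, int] = defaultdict(int)
    for segment, flag in zip(segments, default_flags):
        totals[segment] += 1
        defaults[segment] += flag
    return {segment: defaults[segment] / totals[segment] for segment in totals}


def mean_abs_error(estimated: Dict[Hashable, float],
                   realized: Dict[Hashable, float]) -> float:
    """Average absolute difference between estimated and realized PD."""
    return sum(abs(estimated[s] - realized[s]) for s in realized) / len(realized)


# Toy example (made up): six firms in two economic sectors.
toy_segments = ["manufacturing"] * 4 + ["mining"] * 2
toy_predicted = [1, 0, 0, 0, 0, 0]   # e.g. output of a classifier
toy_realized = [1, 1, 0, 0, 0, 0]    # defaults actually observed at T + 1

pd_estimated = pd_by_segment(toy_segments, toy_predicted)
pd_realized = pd_by_segment(toy_segments, toy_realized)
print(pd_estimated, pd_realized, mean_abs_error(pd_estimated, pd_realized))
```

Segments with no defaulting firms yield a PD of 0 under any count-based estimator, which is consistent with the observation that the two approaches often coincide.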
6 Conclusions

Business-failure prediction is a very important topic of study for economic analysis and the regular functioning of the financial system. Moreover, the importance of this issue has greatly increased following the recent financial crisis. There have been many recent studies that have tried to predict the failure of companies using various machine-learning techniques.

In our study, we used for the first time credit information from the Italian Central Credit Register to predict the banking default of Italian companies, using machine-learning techniques. We analyzed a very large dataset containing information about almost all the loans of all the Italian companies. Our first finding is that, as in the case of bankruptcy prediction, machine-learning approaches are able to significantly outperform simpler statistical approaches. Moreover, combining classifiers of different types can lead to even better results. Finally, using information on past loan data is crucial, but the additional use of balance-sheet data can improve classification even further.

We show that the combined use of loan data with balance-sheet data leads to improved performance for predicting default. We conjecture that using loan data in the prediction of bankruptcy (where, typically, only balance-sheet data are used) can further improve the performance.

Nevertheless, prediction remains an extremely hard problem. Yet even a slight improvement in the performance can lead to savings of multiple hundreds of euros for the banking system. Thus our goal is to improve classification even further by combining our approaches with further techniques, such as neural-network-based ones. Some preliminary results in which we use only neural networks are encouraging, even though they are worse than the results we report here.
References
1. Methods and sources: Methodological notes (2018), available at (accessed on 6/6/2019)
2. Altman, E.: Predicting financial distress of companies: Revisiting the Z-score and ZETA. Handbook of Research Methods and Applications in Empirical Finance (09 2000)
3. Atiya, A.: Bankruptcy prediction for credit risk using neural networks: A survey and new results. Neural Networks, IEEE Transactions on, 929–935 (08 2001). https://doi.org/10.1109/72.935101
4. Barboza, F., Kimura, H., Altman, E.: Machine learning models and bankruptcy prediction. Expert Syst. Appl. (C), 405–417 (Oct 2017). https://doi.org/10.1016/j.eswa.2017.04.006
5. Begley, J., Ming, J., Watts, S.: Bankruptcy classification errors in the 1980s: An empirical analysis of Altman’s and Ohlson’s models. Review of Accounting Studies, 267–284 (12 1996)
6. Boritz, J., Kennedy, D., Albuquerque, A.d.M.e.: Predicting corporate failure using a neural network approach. Intelligent Systems in Accounting, Finance and Management (2), 95–111 (1995), https://onlinelibrary.wiley.com/doi/abs/10.1002/j.1099-1174.1995.tb00083.x
7. Chen, M.Y.: Bankruptcy prediction in firms with statistical and intelligent techniques and a comparison of evolutionary computation approaches. Computers & Mathematics with Applications (12), 4514–4524 (2011). https://doi.org/10.1016/j.camwa.2011.10.030
8. Cho, S., Hong, H., Ha, B.C.: A hybrid approach based on the combination of variable selection using decision trees and case-based reasoning using the Mahalanobis distance: For bankruptcy prediction. Expert Systems with Applications (4), 3482–3488 (2010)
9. Erdogan, B.: Prediction of bankruptcy using support vector machines: An application to bank bankruptcy. Journal of Statistical Computation and Simulation, 1–13 (01 2012)
10. Fernández, E., Olmeda, I.: Bankruptcy prediction with artificial neural networks. In: Proc. of the 2018 2nd International Conference on Inventive Systems and Control (ICISC 2018). vol. 930, pp. 1142–1146 (06 1995)
11. Gepp, A., Kumar, K.: Predicting financial distress: A comparison of survival analysis and decision tree techniques. Procedia Computer Science, 396–404 (12 2015)
12. Kumar, P.R., Ravi, V.: Bankruptcy prediction in banks and firms via statistical and intelligent techniques – a review. European Journal of Operational Research (1), 1–28 (2007). https://doi.org/10.1016/j.ejor.2006.08.043
13. Lee, S., Choi, W.S.: A multi-industry bankruptcy prediction model using back-propagation neural network and multivariate discriminant analysis. Expert Systems with Applications (8), 2941–2946 (2013). https://doi.org/10.1016/j.eswa.2012.12.009
14. Lee, W.C.: Genetic programming decision tree for bankruptcy prediction. In: 9th Joint International Conference on Information Sciences (JCIS-06). Atlantis Press (2006/10)
15. Lin, W.Y., Hu, Y.H., Tsai, C.F.: Machine learning in financial crisis prediction: A survey. IEEE Transactions on Systems, Man, and Cybernetics - TSMC, 421–436 (07 2012). https://doi.org/10.1109/TSMCC.2011.2170420
16. Martinelli, E., de Carvalho, A., Rezende, S., Matias, A.: Rules extractions from banks’ bankrupt data using connectionist and symbolic learning algorithms. Proc. Computational Finance Conf (1 1999)
17. Nanni, L., Lumini, A.: An experimental comparison of ensemble of classifiers for bankruptcy prediction and credit scoring. Expert Syst. Appl. (2), 3028–3033 (Mar 2009). https://doi.org/10.1016/j.eswa.2008.01.018
18. Odom, M., Sharda, R.: A neural network model for bankruptcy prediction. In: Proc. of the 1990 IJCNN International Joint Conference on Neural Networks. vol. 2, pp. 163–168 (07 1990)
19. Ohlson, J.A.: Financial ratios and the probabilistic prediction of bankruptcy. Journal of Accounting Research (1980)
20. Sarojini Devi, S., Radhika, Y.: A survey on machine learning and statistical techniques in bankruptcy prediction. International Journal of Machine Learning and Computing, 133–139 (04 2018). https://doi.org/10.18178/ijmlc.2018.8.2.676
21. Wang, G., Ma, J., Yang, S.: An improved boosting based on feature selection for corporate bankruptcy prediction. Expert Systems with Applications (5), 2353–2361 (2014)
22. Wang, N.: Bankruptcy prediction using machine learning. Journal of Mathematical Finance, 908–918 (01 2017). https://doi.org/10.4236/jmf.2017.74049
23. Zhou, L., Wang, H.: Loan default prediction on large imbalanced data using random forests. TELKOMNIKA Indonesian Journal of Electrical Engineering 10