Software Testing | 2021

CSSG: A cost‐sensitive stacked generalization approach for software defect prediction

 
 

Abstract


Classifying software artifacts as defect‐prone (DP) or non‐defect‐prone (NDP) during the testing phase helps minimize software business costs; this is a classification task in the software defect prediction (SDP) field. Machine learning methods are helpful for the task, although they face the challenge of imbalanced data distribution. This challenge leads to serious misclassification of artifacts, which degrades the predictor's performance. Previously developed stacking ensemble methods do not consider cost when handling the class imbalance problem (CIP) in the training dataset in the SDP field. To bridge this research gap, in the cost‐sensitive stacked generalization (CSSG) approach we combine the stacking ensemble learning method with cost‐sensitive learning (CSL), since the purpose of CSL is to reduce misclassification costs. In CSSG, logistic regression (LR) and extremely randomized trees classifiers, in both cost‐sensitive and cost‐insensitive variants, are used as the final classifier of the stacking scheme. To evaluate the performance of CSSG, we use six performance measures. Several experiments are carried out to compare CSSG with some cost‐sensitive ensemble methods on 15 benchmark datasets with different imbalance levels. The results indicate that CSSG can be a more effective solution to the CIP than the other compared methods.
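
As a rough illustration of what "reducing misclassification costs" can mean on imbalanced data, the following minimal Python sketch compares a default 0.5 decision threshold with a cost-minimizing threshold. It is not the CSSG method itself and does not implement stacking; the cost values (a missed defect costing five times a false alarm), the toy labels, and the predicted probabilities are all hypothetical assumptions chosen for the example, and the function names are illustrative only.

```python
# Minimal sketch of cost-sensitive decision making (NOT the paper's CSSG method).
# Assumptions: label 1 = defect-prone (DP), label 0 = non-defect-prone (NDP);
# the costs below are hypothetical; correct predictions cost nothing.

def total_cost(y_true, y_pred, cost_fn=5.0, cost_fp=1.0):
    """Sum the misclassification costs over all artifacts."""
    cost = 0.0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 0:    # missed defect (false negative)
            cost += cost_fn
        elif truth == 0 and pred == 1:  # false alarm (false positive)
            cost += cost_fp
    return cost

def cost_sensitive_threshold(cost_fn=5.0, cost_fp=1.0):
    """Predict DP when p * cost_fn >= (1 - p) * cost_fp,
    i.e. when p >= cost_fp / (cost_fp + cost_fn)."""
    return cost_fp / (cost_fp + cost_fn)

def predict(probs, threshold):
    """Turn predicted DP probabilities into 0/1 labels."""
    return [1 if p >= threshold else 0 for p in probs]

# Toy imbalanced data: 8 NDP artifacts, 2 DP artifacts (made-up values).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
probs = [0.05, 0.10, 0.10, 0.15, 0.20, 0.20, 0.30, 0.40, 0.35, 0.45]

default_pred = predict(probs, 0.5)
cs_pred = predict(probs, cost_sensitive_threshold())

print("default threshold cost:", total_cost(y_true, default_pred))
print("cost-sensitive threshold cost:", total_cost(y_true, cs_pred))
```

On this toy data the default threshold misses both DP artifacts, while the lower cost-derived threshold catches them at the price of a few false alarms. In a stacking setting, the same idea would be applied to the final classifier's outputs or built into its training, which is one plausible reading of how a cost-sensitive final classifier differs from a cost-insensitive one.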

Volume 31
DOI 10.1002/stvr.1761
Language English
Journal Software Testing
