Journal of Information and Optimization Sciences | 2019

ELM and KELM based software defect prediction using feature selection techniques

 
 

Abstract


Context: Software defect prediction (SDP) models help deliver a dependable and genuine product to clients. However, irrelevant features in the datasets degrade the performance of these models. Feature selection techniques address this problem.

Objectives: (1) To determine the performance of feature selection based classification models in the context of software defect prediction, and (2) to determine whether removing insignificant features makes a significant difference in the performance of SDP models.

Method: SDP models are built using two classifiers, extreme learning machine (ELM) and kernel based extreme learning machine (KELM), combined with five wrapper based and seven filter based feature selection techniques. Experiments are performed on seven datasets from the PROMISE repository. Testing accuracy is used to compare the performance of the feature selection based ELM and KELM defect classification models.

Results: (1) ELM based classifiers achieved higher testing accuracy with wrapper based feature selection methods, while KELM classifiers performed better with filter based methods. (2) Even after eliminating over 85 percent of the attributes from the original software project data, classification performance was comparable before and after removal of the insignificant features in most cases, and it improved in a few experiments.

Conclusion: For feature selection based defect classification, ELM based models perform better with wrapper based methods and KELM based models perform better with filter based methods. Overall, a dimensionally reduced feature space does not significantly affect the prediction performance of SDP models. This suggests that the feature subsets remaining after the insignificant software metrics are removed are the ones most relevant to the output class.
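To make the pipeline concrete, the following is a minimal sketch of filter based feature selection followed by a basic ELM classifier. It is not the authors' implementation. The data are synthetic rather than PROMISE datasets, the correlation ranking is a stand-in for the filter techniques used in the study (which are not named in the abstract), KELM is not shown, and all names (`select_top_k_by_correlation`, `SimpleELM`, `run_demo`) are hypothetical.

```python
import numpy as np


def select_top_k_by_correlation(X, y, k):
    """Filter-style selection: keep the k features whose absolute
    Pearson correlation with the label is largest (assumed stand-in filter)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    scores = np.abs(Xc.T @ yc) / denom
    return np.sort(np.argsort(scores)[::-1][:k])


class SimpleELM:
    """Basic ELM: random, fixed hidden layer; output weights solved
    in closed form with the Moore-Penrose pseudoinverse."""

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activation of the random projection.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ y  # least-squares output weights
        return self

    def predict(self, X):
        # Threshold the real-valued output at 0.5 for a 0/1 label.
        return (self._hidden(X) @ self.beta >= 0.5).astype(int)


def run_demo(seed=42):
    rng = np.random.default_rng(seed)
    # Synthetic "metrics": 20 features, only features 0 and 1 drive the label.
    X = rng.normal(size=(400, 20))
    y = (X[:, 0] + X[:, 1] > 0).astype(float)
    X_train, X_test = X[:300], X[300:]
    y_train, y_test = y[:300], y[300:]

    # Keep 3 of 20 features (85 percent removed), chosen on training data only.
    selected = select_top_k_by_correlation(X_train, y_train, k=3)
    model = SimpleELM(n_hidden=50, seed=0).fit(X_train[:, selected], y_train)
    accuracy = float(np.mean(model.predict(X_test[:, selected]) == y_test))
    return selected, accuracy


selected, accuracy = run_demo()
print("selected feature indices:", selected)
print("testing accuracy:", accuracy)
```

A wrapper based method would instead score candidate feature subsets by training and evaluating the classifier itself, which is more expensive than the one-pass ranking shown here.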

Volume: 40
Pages: 1025-1045
DOI: 10.1080/02522667.2019.1637999
Language: English
Journal: Journal of Information and Optimization Sciences
