Applied Intelligence | 2021
Simulated annealing based undersampling (SAUS): a hybrid multi-objective optimization method to tackle class imbalance
Abstract
Learning from imbalanced datasets is a challenging problem in machine learning research because traditional classifiers tend to be biased towards the majority class, which results in a low prediction rate for the minority class. The inherent assumptions of an equal class distribution and of accuracy-driven evaluation have been identified as the reasons behind this degraded performance. Furthermore, false negatives carry a higher penalty than false positives. A simple, logical way to mitigate this issue is to construct a balanced training set from the imbalanced one. However, many different balanced training sets can be formed from a given imbalanced set, and an optimal one must be selected among them. This selection problem is computationally intractable, and searches for a solution are prone to getting trapped in local optima. To address these issues, a Simulated Annealing-based Under Sampling (SAUS) method is proposed. SAUS builds on simulated annealing, a popular metaheuristic search algorithm, and implements a novel cost function in terms of the Balanced Error Rate (BER). This cost function strikes a balance between the Sensitivity and Specificity measures when evaluating the candidate solution at each iteration of the subsampling process, and the resulting search is able to escape local optima. The experimental results show that SAUS improves the average Sensitivity on the test set from 0.68 to 0.86, demonstrating its efficacy in tackling class imbalance. Area Under the ROC Curve (AUC) results also show that SAUS outperforms several popular undersampling methods and performs on par with state-of-the-art solutions to the class imbalance problem.
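The abstract does not give SAUS's implementation details, so the following is only a minimal sketch of the general idea under stated assumptions. It uses the standard definition BER = 1 - (Sensitivity + Specificity) / 2 with the minority class as the positive class. The state is a set of majority-class indices the same size as the minority class (a 1:1 ratio is assumed), a neighbour is produced by swapping one selected majority sample for an unselected one, and moves are accepted with the Metropolis rule under geometric cooling. The nearest-centroid classifier, the held-out validation set used to score each candidate, and the function names and parameters (`sa_undersample`, `subset_cost`, `t0`, `alpha`, `iters`) are all hypothetical choices, not the paper's.

```python
import math
import random


def balanced_error_rate(y_true, y_pred, positive=1):
    """BER = 1 - (Sensitivity + Specificity) / 2, treating `positive` as the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return 1.0 - (sensitivity + specificity) / 2.0


def nearest_centroid_predict(train_x, train_y, test_x):
    """Hypothetical stand-in classifier: label each test point with its closest class centroid."""
    centroids = {}
    for c in set(train_y):
        members = [x for x, y in zip(train_x, train_y) if y == c]
        centroids[c] = [sum(col) / len(members) for col in zip(*members)]

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return [min(centroids, key=lambda c: sq_dist(x, centroids[c])) for x in test_x]


def subset_cost(selected, majority_x, minority_x, val_x, val_y):
    """Assumed cost: BER on a validation set of a classifier trained on the candidate subset."""
    train_x = [majority_x[i] for i in selected] + list(minority_x)
    train_y = [0] * len(selected) + [1] * len(minority_x)
    return balanced_error_rate(val_y, nearest_centroid_predict(train_x, train_y, val_x))


def sa_undersample(majority_x, minority_x, val_x, val_y,
                   t0=1.0, alpha=0.95, iters=300, seed=0):
    """Simulated annealing over 1:1 balanced subsets of the majority class (hypothetical sketch)."""
    rng = random.Random(seed)
    k = len(minority_x)  # keep as many majority samples as there are minority samples
    selected = rng.sample(range(len(majority_x)), k)
    cost = subset_cost(selected, majority_x, minority_x, val_x, val_y)
    best, best_cost = list(selected), cost
    temp = t0
    for _ in range(iters):
        chosen = set(selected)
        unselected = [i for i in range(len(majority_x)) if i not in chosen]
        if not unselected:
            break  # the whole majority class is already selected
        # Neighbour move: swap one selected majority sample for an unselected one.
        candidate = list(selected)
        candidate[rng.randrange(k)] = rng.choice(unselected)
        cand_cost = subset_cost(candidate, majority_x, minority_x, val_x, val_y)
        delta = cand_cost - cost
        # Metropolis rule: always accept improvements, accept worse moves with prob exp(-delta/T).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            selected, cost = candidate, cand_cost
            if cost < best_cost:
                best, best_cost = list(selected), cost
        temp *= alpha  # geometric cooling schedule
    return best, best_cost


if __name__ == "__main__":
    gen = random.Random(42)

    def blob(center, n):
        # Synthetic 2-D Gaussian cluster, for illustration only.
        return [[gen.gauss(center, 1.0), gen.gauss(center, 1.0)] for _ in range(n)]

    majority, minority = blob(0.0, 40), blob(2.0, 8)
    val_x = blob(0.0, 20) + blob(2.0, 5)
    val_y = [0] * 20 + [1] * 5
    subset, ber = sa_undersample(majority, minority, val_x, val_y)
    print(len(subset), "majority samples kept; validation BER =", round(ber, 3))
```

Accepting a worse subset with probability exp(-delta/T) is what lets the search leave a local optimum early on, while the shrinking temperature makes it increasingly greedy. Because BER averages the error rates of both classes, a subset that simply favours the majority class is not rewarded, which matches the balance between Sensitivity and Specificity described above.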