IEEE Access | 2021

An Improved MAHAKIL Oversampling Method for Imbalanced Dataset Classification

Abstract

When classifying imbalanced datasets, traditional oversampling techniques do not take the distribution characteristics of the data into account, which easily leads to low classification accuracy on positive samples. To improve the recognition rate of positive samples, this article proposes an algorithm that combines K-means clustering with MAHAKIL oversampling. First, the K-means clustering algorithm divides the positive samples into K clusters. The samples in each cluster are then divided into two initial populations according to the magnitude of their Mahalanobis distance. Finally, the crossover operator of the genetic algorithm is applied repeatedly to synthesize new samples until the dataset is balanced. Experimental results show that the K-means MAHAKIL oversampling algorithm achieves better classification performance when combined with different classification algorithms, which verifies its effectiveness.
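The three steps named in the abstract (cluster, split by Mahalanobis distance, crossover) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names (`kmeans`, `mahalanobis`, `mahakil_cluster`, `kmeans_mahakil`) are hypothetical, and several details the abstract does not specify are assumed here: the crossover is a midpoint average of paired parents, each child is paired with both of its parents in the next generation, and the number of synthetic samples per cluster is proportional to cluster size.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # copy of k rows
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster is empty
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def mahalanobis(X):
    """Mahalanobis distance of each row of X to the mean of X."""
    diff = X - X.mean(axis=0)
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    inv = np.linalg.pinv(cov)  # pseudo-inverse tolerates singular covariance
    sq = np.einsum("ij,jk,ik->i", diff, inv, diff)
    return np.sqrt(np.maximum(sq, 0.0))


def mahakil_cluster(X, n_new):
    """Synthesize n_new samples from one cluster (assumed midpoint crossover)."""
    if n_new <= 0 or len(X) < 2:
        return np.empty((0, X.shape[1]))
    order = np.argsort(-mahalanobis(X))  # largest distance first
    half = len(X) // 2
    pop_a, pop_b = X[order[:half]], X[order[half:2 * half]]  # two populations
    pairs, synthetic, total = [(pop_a, pop_b)], [], 0
    while total < n_new:
        next_pairs = []
        for left, right in pairs:
            child = (left + right) / 2.0  # crossover: average of the parents
            synthetic.append(child)
            total += len(child)
            next_pairs += [(left, child), (child, right)]
        pairs = next_pairs
    return np.vstack(synthetic)[:n_new]


def kmeans_mahakil(X_min, n_new, k=2, seed=0):
    """Oversample minority samples X_min with n_new synthetic rows."""
    X_min = np.asarray(X_min, dtype=float)
    labels = kmeans(X_min, k, seed=seed)
    usable = [j for j in range(k) if np.sum(labels == j) >= 2]
    if not usable:
        raise ValueError("no cluster has at least two samples")
    sizes = np.array([np.sum(labels == j) for j in usable])
    # Assumption: quota per cluster is proportional to cluster size.
    quotas = np.floor(n_new * sizes / sizes.sum()).astype(int)
    quotas[0] += n_new - quotas.sum()  # rounding remainder goes to first cluster
    parts = [mahakil_cluster(X_min[labels == j], q) for j, q in zip(usable, quotas)]
    return np.vstack(parts)
```

Calling `kmeans_mahakil(X_min, n_new)` with `n_new` set to the gap between the majority and minority class counts would balance the dataset. Because every synthetic row is an average of existing rows from the same cluster, the new samples stay inside that cluster's range rather than bridging separate regions of the minority class.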

Volume 9

Pages 16030-16040

DOI 10.1109/ACCESS.2020.3047741

Language English

Journal IEEE Access
