IEEE Transactions on Neural Networks and Learning Systems | 2019

Biased Random Forest For Dealing With the Class Imbalance Problem

 
 
 

Abstract


The class imbalance issue has been a persistent problem in machine learning that hinders the accurate predictive analysis of data in many real-world applications. The class imbalance problem exists when the number of instances present in a class (or classes) is significantly fewer than the number of instances belonging to another class (or classes). Sufficiently recognizing the minority class during classification is a problem as most algorithms employed to learn from data input are biased toward the majority class. The underlying issue is made more complex with the presence of data difficult factors embedded in such data input. This paper presents a novel and effective ensemble-based method for dealing with the class imbalance problem. This paper is motivated by the idea of moving the oversampling from the data level to the algorithm level, instead of increasing the minority instances in the data sets, the algorithms in this paper aims to “oversample the classification ensemble” by increasing the number of classifiers that represent the minority class in the ensemble, i.e., random forest. The proposed biased random forest algorithm employs the nearest neighbor algorithm to identify the critical areas in a given data set. The standard random forest is then fed with more random trees generated based on the critical areas. The results show that the proposed algorithm is very effective in dealing with the class imbalance problem.

Volume 30
Pages 2163-2172
DOI 10.1109/TNNLS.2018.2878400
Language English
Journal IEEE Transactions on Neural Networks and Learning Systems

Full Text