2019 4th International Conference on Information Technology, Information Systems and Electrical Engineering (ICITISEE) | 2019

The Undersampling Effects on RANDSHUFF Oversampling Algorithms

 

Abstract


Randshuff (Random Shuffle Oversampling Techniques for Qualitative Data) is an oversampling algorithm suited to nominal attributes. It uses the IVDM (Interpolated Value Difference Metric) distance calculation and crossover with a random shuffle technique. Although Randshuff addresses problems in the minority data, it ignores problems in the majority data. Difficulties arise when the majority data contain distribution complexity problems such as small disjuncts, overlap, and noise. There are two kinds of undersampling concepts: informed undersampling and simple random undersampling. Tomek links, Edited Nearest Neighbors (ENN), and Near Miss are state-of-the-art informed undersampling methods, whereas Random Undersampling (RUS) is a simple random undersampling method. Both undersampling concepts therefore need to be evaluated in combination with Randshuff. The experiments were conducted on five public datasets. The results show that RUS, as simple random undersampling, and Near Miss, as informed undersampling, improve the recall, F-measure, and G-mean performance of the Randshuff algorithm.
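To make the terms in the abstract concrete, the following is a minimal sketch of simple random undersampling and of the three reported metrics (recall, F-measure, G-mean). It is not the authors' implementation: the function names `random_undersample` and `recall_fmeasure_gmean` are hypothetical, the choice to shrink every class down to the size of the smallest class is an assumption (the abstract does not state the target ratio or how undersampling is ordered relative to Randshuff), and G-mean is taken in its common form as the square root of sensitivity times specificity.

```python
import math
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Randomly drop samples so every class matches the smallest class.

    Assumption: full balancing to the minority size; the abstract does
    not specify the ratio used in the experiments.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    keep = []
    for label in counts:
        indices = [i for i, lab in enumerate(y) if lab == label]
        # Sample without replacement; the minority class is kept whole.
        keep.extend(rng.sample(indices, target))
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


def recall_fmeasure_gmean(y_true, y_pred, positive):
    """Return (recall, F-measure, G-mean) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0

    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    # G-mean as sqrt(sensitivity * specificity), a common definition.
    g_mean = math.sqrt(recall * specificity)
    return recall, f_measure, g_mean
```

Informed methods such as Tomek links, ENN, and Near Miss differ from this sketch in that they choose which majority samples to remove based on neighborhood distances rather than at random.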

Pages 265-270
DOI 10.1109/ICITISEE48480.2019.9003930
Language English
Journal 2019 4th International Conference on Information Technology, Information Systems and Electrical Engineering (ICITISEE)
