Archive | 2019

Improving classification performance for an imbalanced educational dataset example using SMOTE

 
 
 

Abstract


With advances in technology, large amounts of data are generated in digital environments, and education is one of the most data-intensive areas. By analyzing educational data sets, students' situations can be predicted in advance. In this way, students can be assisted by anticipating situations such as drop-out due to failure, and educational institutions can take measures to reduce student drop-out. Thus, financial losses for both students and educational institutions can be prevented. In this study, data from students of five separate associate degree programs enrolled in the Amasya University Distance Education Center in 2016-2017 were used. These are the associate degree programs in child development, medical documentation and secretarial, electricity, mechatronics, and internet and network technologies. Whether the students would graduate at the end of the fourth semester was predicted from their first- and second-semester course grades. The data were analyzed with the k-nearest neighbors (K-NN) and KStar algorithms. Some of the data obtained from the distance education center were imbalanced because of the low number of students. In educational data mining, researchers often overlook the class distribution of a dataset, yet imbalanced data can seriously affect classification success. The synthetic minority oversampling technique (SMOTE) was applied to these imbalanced data, and its effect on classification success was examined. First, the raw data were analyzed with the K-NN and KStar classifiers. The results for the five programs are presented comparatively in tables. The results show that the SMOTE oversampling method increases classification success. In areas where imbalanced data may exist, such as educational data mining, higher classification accuracy can be achieved with the help of different oversampling methods.
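The core idea of SMOTE described in the abstract can be illustrated with a minimal sketch. It is written in plain Python for self-containment and is not the authors' implementation: the grade values, the function names (`smote`, `knn_predict`, `nearest_neighbors`) and the choice of `k=3` are all assumptions made for illustration. The sketch creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours, then classifies a new student with a simple K-NN majority vote. KStar is not shown.

```python
import math
import random


def nearest_neighbors(point, candidates, k):
    """Return the k candidates closest to `point` (Euclidean distance)."""
    return sorted(candidates, key=lambda c: math.dist(point, c))[:k]


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [m for m in minority if m is not base]
        neighbor = rng.choice(nearest_neighbors(base, others, k))
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


def knn_predict(train_x, train_y, point, k=3):
    """Predict a label by majority vote among the k nearest training samples."""
    ranked = sorted(zip(train_x, train_y), key=lambda xy: math.dist(point, xy[0]))
    votes = [label for _, label in ranked[:k]]
    return max(set(votes), key=votes.count)


# Hypothetical (semester I, semester II) grade averages; NOT the study's data.
graduate = [(70, 75), (80, 78), (65, 72), (90, 85),
            (75, 80), (85, 88), (72, 70), (78, 82)]
not_graduate = [(40, 35), (45, 50), (38, 42)]

# Oversample the minority class until both classes have the same size.
new_samples = smote(not_graduate, n_new=len(graduate) - len(not_graduate))

balanced_x = graduate + not_graduate + new_samples
balanced_y = (["graduate"] * len(graduate)
              + ["not_graduate"] * (len(not_graduate) + len(new_samples)))

print(len(new_samples), "synthetic minority samples added")
print(knn_predict(balanced_x, balanced_y, (42, 40)))
```

Because every synthetic point lies on a segment between two real minority samples, the oversampled class stays inside the region the minority already occupies. In practice a maintained library implementation would normally be preferred over a hand-written one like this.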

Pages 485-489
DOI 10.31590/ejosat.638608
Language English
