Turkish J. Electr. Eng. Comput. Sci. | 2021

Development of majority vote ensemble feature selection algorithm augmented with rank allocation to enhance Turkish text categorization

 
 
 

Abstract


The growing number of digital text documents from sources such as customer reviews, news, and social media has made text categorization crucial for managing this enormous amount of data. The high-dimensional nature of these texts requires a preliminary feature selection step that reduces the feature space and can potentially increase prediction accuracy. In this study, an ensemble feature selection method, namely majority vote rank allocation, was developed for Turkish text categorization. The method uses a majority voting ensemble strategy in combination with a rank allocation approach to combine weak filters such as information gain, symmetric uncertainty, relief, and correlation-based feature selection. Thus, the proposed method measures the quality of each feature among all features using the majority votes of the filters and rank allocation. The feature selection efficacy of the method was tested on two datasets, one from the literature and one newly collected. The effect of the selected features on classification performance was evaluated with the naive Bayes, support vector machine, J48, and random forest algorithms. It was empirically observed that the developed method improved the prediction accuracies of the classifiers compared to the individual filters listed above. The statistical significance of the experimental results was also validated with a two-way analysis of variance test.
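
The abstract does not spell out the voting or ranking rules, so the following Python sketch is only one plausible reading of how majority voting and rank allocation could be combined, not the authors' algorithm. It assumes that each filter has already produced one score per feature, that a filter "votes" for its `top_k` best features, that a feature is kept when more than half of the filters vote for it, and that kept features are ordered by the sum of their per-filter ranks. The function name, the `top_k` parameter, and the example scores are all hypothetical.

```python
from typing import Dict, List


def majority_vote_rank_allocation(
    filter_scores: Dict[str, List[float]], top_k: int
) -> List[int]:
    """Return indices of features kept by a strict majority of filters,
    ordered by summed rank (lower is better).

    Assumption: higher score means a better feature for every filter.
    """
    n_features = len(next(iter(filter_scores.values())))
    votes = [0] * n_features     # how many filters put the feature in their top_k
    rank_sum = [0] * n_features  # sum of the ranks each filter assigned

    for scores in filter_scores.values():
        # Order feature indices from best to worst score for this filter.
        order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
        for rank, j in enumerate(order, start=1):
            rank_sum[j] += rank
            if rank <= top_k:
                votes[j] += 1

    # Keep features voted for by more than half of the filters.
    majority = len(filter_scores) / 2
    selected = [j for j in range(n_features) if votes[j] > majority]

    # Rank allocation (assumed form): order by summed rank, then by index.
    return sorted(selected, key=lambda j: (rank_sum[j], j))


# Hypothetical scores for four features from three filters.
example_scores = {
    "information_gain": [0.9, 0.1, 0.5, 0.3],
    "symmetric_uncertainty": [0.8, 0.2, 0.1, 0.6],
    "relief": [0.4, 0.3, 0.7, 0.1],
}
selected = majority_vote_rank_allocation(example_scores, top_k=2)
print(selected)  # [0, 2]
```

In this toy example, feature 0 is in the top two of all three filters and feature 2 is in the top two of two filters, so both pass the majority vote; feature 0 comes first because its summed rank is lower.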

Volume 29
Pages 514-530
DOI 10.3906/elk-1911-116
Language English
Journal Turkish J. Electr. Eng. Comput. Sci.
