Spatial Information Research | 2021

Improved email classification through enhanced data preprocessing approach


Abstract


Email has become one of the most widely used forms of communication. The resulting exponential increase in the number of emails received places an immense burden on existing approaches to email classification. Applying classifiers directly to raw data may degrade their performance, so the data must be prepared before machine learning classifiers are trained on it. This paper proposes an enhanced data preprocessing approach for multi-category email classification. The proposed model first removes the signature of the email. Special characters and unwanted words are then removed using preprocessing methods such as stop-word removal, enhanced stop-word removal, and stemming. The proposed model is evaluated using several classifiers: Multinomial Naïve Bayes, Linear Support Vector Classifier, Logistic Regression, and Random Forest. The results show that the proposed data preprocessing approach to email classification outperforms the existing approach.
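The preprocessing steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the signature markers, the tiny stop-word list, the suffix-stripping stemmer, and the function names (strip_signature, naive_stem, preprocess) are all assumptions made for illustration, and the paper's "enhanced stop-word removal" is not reproduced because the abstract does not define it.

```python
import re

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and",
              "for", "in", "on", "please", "we", "you", "i"}

# Assumed signature markers: the conventional "--" delimiter line
# or a common sign-off at the start of a line.
SIGNATURE_START = re.compile(
    r"^(--\s*$|(best regards|kind regards|regards|sincerely)\b)",
    re.IGNORECASE,
)


def strip_signature(body: str) -> str:
    """Keep only the lines that precede the first signature marker."""
    kept = []
    for line in body.splitlines():
        if SIGNATURE_START.match(line.strip()):
            break
        kept.append(line)
    return "\n".join(kept)


def naive_stem(token: str) -> str:
    """Crude suffix stripping; a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(body: str) -> list[str]:
    """Signature removal, special-character removal, stop-word removal, stemming."""
    text = strip_signature(body).lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # drop digits, punctuation, symbols
    return [naive_stem(t) for t in text.split() if t not in STOP_WORDS]


email = (
    "Please review the attached invoices!\n"
    "Meeting is scheduled for Monday.\n"
    "-- \n"
    "John Doe\n"
    "ACME Corp"
)
tokens = preprocess(email)
print(tokens)
```

The resulting token list would then be vectorized and passed to a classifier such as those listed in the abstract; that stage is omitted here to keep the sketch dependency-free.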

Volume 29
Pages 247-255
DOI 10.1007/s41324-020-00378-y
Language English
Journal Spatial Information Research
