2021 IEEE 41st International Conference on Distributed Computing Systems Workshops (ICDCSW) | 2021

VPN-nonVPN Traffic Classification Using Deep Reinforced Naive Bayes and Fuzzy K-means Clustering

 

Abstract


This paper addresses an attack method in which threat actors break into secure networks through virtual private networks (VPNs) to launch tunneling-based encrypted attacks, obfuscated advanced persistent threats, and malware. It proposes Naive Bayes (NB) augmented with deep reinforcement learning (DRL) and fuzzy k-means clustering to classify VPN and non-VPN traffic. The method is validated on the publicly available UNB-CIC VPN non-VPN dataset and achieves accuracy comparable to other state-of-the-art machine learning algorithms for traffic characterization. The approach is analyzed on two tasks, characterization of traffic classes (e.g., FTP and P2P) and application identification (e.g., Netflix and Amazon Prime), in terms of traffic-class detection efficiency and accuracy and the ability to distinguish VPN from non-VPN network traffic. The experimental results show that the ‘NB+DRL’ approach, in which DRL reinforces the NB classification model through iterative policy evaluation and improvement, achieves a recall of 0.96 in traffic categorization and 0.95 in application identification. Adding fuzzy k-means clustering (‘NB+DRL+fuzzy k-means clustering’) improves on this further and reduces the computational cost of encrypted network traffic classification; the k-means class partitions converge to a local minimum based on dissimilarity measures over the given packet features. Test-set F1 scores of 0.973 and 0.965 are achieved for traffic characterization and application identification, respectively.
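The abstract does not spell out the clustering update rules, so the following is only a minimal sketch of the standard fuzzy k-means (fuzzy c-means) iteration, not the authors' implementation. It assumes squared Euclidean distance as the dissimilarity measure and a fuzzifier of 2; the function name `fuzzy_kmeans`, its parameters, and the synthetic two-dimensional "packet features" are all hypothetical and chosen for illustration. It shows how soft memberships and cluster centers are updated alternately until the memberships stop changing, which is a local minimum of the clustering objective.

```python
import numpy as np


def fuzzy_kmeans(X, k, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Standard fuzzy c-means with squared Euclidean dissimilarity.

    Returns (centers, U), where U[i, j] is the soft membership of
    sample i in cluster j and every row of U sums to 1.
    """
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalized per sample.
    U = rng.random((X.shape[0], k))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Centers are membership-weighted means of the samples.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Squared distance from every sample to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # guard against division by zero
        # Membership update: u_ij proportional to d_ij^(-2/(m-1)).
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


# Synthetic stand-in for two groups of flows (assumption, not real data).
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(0.0, 0.3, (50, 2)),
    rng.normal(5.0, 0.3, (50, 2)),
])
centers, U = fuzzy_kmeans(X, k=2)
labels = U.argmax(axis=1)  # hard labels derived from soft memberships
```

Because each sample carries a membership in every cluster rather than a single hard label, ambiguous flows near a class boundary keep partial weight in both partitions. How the paper combines these memberships with the NB+DRL classifier is not described in the abstract and is not modeled here.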

Pages 1-6
DOI 10.1109/icdcsw53096.2021.00008
Language English
Journal 2021 IEEE 41st International Conference on Distributed Computing Systems Workshops (ICDCSW)
