Perform. Evaluation | 2019
Clustering and evolutionary approach for longitudinal web traffic analysis
Abstract In recent years, data-driven approaches have attracted the interest of the research community. Considering network monitoring, unsupervised machine learning solutions such as clustering are particularly appealing to let the network analysts observe patterns, and track the evolution of traffic over time. In this paper, we present a novel unsupervised methodology to automatically process and analyze batches of HTTP traffic, looking just at the URL structure. First, we describe IDBSCAN, Iterative-DBSCAN. We design it to obtain well-shaped clusters, and to simplify the choice of parameters — often a cumbersome step for the network analyst. Second, we show LENTA, Longitudinal Exploration for Network Traffic Analysis, which allows to automatically observe the evolution over time of traffic, naturally highlighting trends and pinpointing anomalies. We first evaluate IDBSCAN and LENTA on synthetic data to compare their performance against well-known algorithms. Then we apply them on a real case, facing the analysis of hundred thousands of URLs collected from a live network. Results show both the goodness of clusters produced by IDBSCAN and LENTA ability to highlight changes in traffic, facilitating the analyst job.