IEEE Access | 2021

Adaptive Landmark-Based Spectral Clustering for Big Datasets


Abstract


Clustering has emerged as an effective tool for processing and assessing the vast data generated by modern applications; its primary aim is to partition data into clusters such that the items in each cluster belong to the same category. However, the clustering of big data faces various challenges, such as volume, velocity, and variety. Different algorithms have been proposed to enhance clustering performance. The landmark-based spectral clustering (LSC) technique has been proven to be efficient in clustering big datasets. In this study, an algorithm called adaptive landmark-based spectral clustering (ALSC) is proposed for clustering big datasets. The proposed algorithm combines the LSC technique with the adaptive competitive learning neural network (ACLNN) algorithm, which can be used efficiently to determine the number of clusters. The ACLNN algorithm can also be used with small datasets. Thus, in our implementation, the original big dataset is split into N small sub-datasets, which are processed in parallel by N copies of the ACLNN algorithm. To evaluate the performance of the proposed algorithm, two distinct datasets are used, namely Fashion-MNIST and United States Postal Service (USPS). The experiments show that the proposed ALSC algorithm achieves high clustering accuracy while also identifying the number of clusters. The results reveal that the proposed algorithm outperforms state-of-the-art models in terms of normalized mutual information (NMI) and adjusted Rand index (ARI).
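
For readers unfamiliar with the two evaluation measures named in the abstract, the following is a minimal sketch of how normalized mutual information and the adjusted Rand index can be computed from a ground-truth labeling and a predicted clustering. It is not the authors' evaluation code. The function names are illustrative, and the choice to normalize mutual information by the geometric mean of the two entropies is an assumption, since the abstract does not state which normalization was used.

```python
from collections import Counter
from math import comb, log, sqrt


def _check(true_labels, pred_labels):
    # Both labelings must describe the same items, and at least one pair must exist.
    if len(true_labels) != len(pred_labels) or len(true_labels) < 2:
        raise ValueError("labelings must have equal length of at least 2")


def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand index computed from the contingency table of two labelings."""
    _check(true_labels, pred_labels)
    n = len(true_labels)
    cells = Counter(zip(true_labels, pred_labels))  # contingency table entries
    rows = Counter(true_labels)                     # true class sizes
    cols = Counter(pred_labels)                     # predicted cluster sizes
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / comb(n, 2)     # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case, e.g. both labelings put every item in one cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def _entropy(counts, n):
    return -sum((c / n) * log(c / n) for c in counts)


def normalized_mutual_information(true_labels, pred_labels):
    """Mutual information divided by sqrt(H(true) * H(pred)).

    Assumption: geometric-mean normalization; other normalizations exist.
    """
    _check(true_labels, pred_labels)
    n = len(true_labels)
    cells = Counter(zip(true_labels, pred_labels))
    rows = Counter(true_labels)
    cols = Counter(pred_labels)
    mi = sum(
        (c / n) * log(c * n / (rows[a] * cols[b]))
        for (a, b), c in cells.items()
    )
    h_true = _entropy(rows.values(), n)
    h_pred = _entropy(cols.values(), n)
    if h_true == 0 or h_pred == 0:
        # A labeling with a single cluster carries no information.
        return 1.0 if h_true == h_pred else 0.0
    return mi / sqrt(h_true * h_pred)


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the paper.
    truth = [0, 0, 1, 1, 2, 2]
    predicted = [1, 1, 0, 0, 2, 2]  # same partition, cluster ids permuted
    print(adjusted_rand_index(truth, predicted))
    print(normalized_mutual_information(truth, predicted))
```

Both measures depend only on how items are grouped, not on the cluster identifiers themselves, which is why they suit clustering evaluation where predicted cluster ids are arbitrary.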

Volume 9
Pages 88291-88300
DOI 10.1109/ACCESS.2021.3088295
Language English
Journal IEEE Access
