Archive | 2019

Clustering High-Dimensional Data: A Reduction-Level Fusion of PCA and Random Projection


Abstract


Principal Component Analysis (PCA) is a widely used statistical tool for representing data in a lower-dimensional embedding. K-means is a prototype (centroid)-based clustering technique used in unsupervised learning tasks. Random Projection (RP) is another widely used dimensionality-reduction technique; it uses a random projection matrix to map the data into a lower-dimensional feature space. Here, we demonstrate the effectiveness of combining these methods for efficiently clustering both low- and high-dimensional data. Our proposed algorithms combine PCA with RP to project the data into a feature space and then perform K-means clustering in that reduced space. We compare the proposed algorithms' performance with that of standard K-means and PCA-K-means on 12 benchmark datasets, 4 low-dimensional and 8 high-dimensional. On these datasets, the proposed algorithms outperform the other methods.
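The abstract does not say in which order PCA and RP are combined, so the NumPy sketch below assumes one possible arrangement: a Gaussian random projection first (cheap on high-dimensional inputs), PCA on the projected data, then K-means in the resulting space. The function names (`random_project`, `pca_reduce`, `kmeans`, `rp_pca_kmeans`), the chosen dimensions, and the farthest-first centroid seeding are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def random_project(X, n_components, rng):
    """Gaussian random projection with entries drawn from N(0, 1/n_components)."""
    R = rng.normal(0.0, 1.0 / np.sqrt(n_components), size=(X.shape[1], n_components))
    return X @ R


def pca_reduce(X, n_components):
    """Project X onto its top principal components via SVD of the centered data."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T


def kmeans(X, k, rng, n_iter=100):
    """Lloyd's K-means; seeding by farthest-first traversal is an assumption."""
    # Seed: one random point, then repeatedly the point farthest from all chosen centroids.
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[d2.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: cluster means (keep the old centroid if a cluster empties).
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels


def rp_pca_kmeans(X, k, rp_dim, pca_dim, seed=0):
    """Hypothetical fusion: RP (d -> rp_dim), then PCA (rp_dim -> pca_dim), then K-means."""
    rng = np.random.default_rng(seed)
    Z = random_project(X, rp_dim, rng)
    Z = pca_reduce(Z, pca_dim)
    return kmeans(Z, k, rng)


# Usage: two well-separated Gaussian blobs in 500 dimensions (synthetic data).
data_rng = np.random.default_rng(1)
X = np.vstack([
    data_rng.normal(0.0, 1.0, size=(50, 500)),
    data_rng.normal(8.0, 1.0, size=(50, 500)),
])
labels = rp_pca_kmeans(X, k=2, rp_dim=50, pca_dim=2)
print(labels)
```

Swapping the two reduction calls gives the PCA-then-RP ordering, and dropping the `random_project` call reduces the pipeline to the PCA-K-means baseline mentioned in the abstract.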

Pages 479-487
DOI 10.1007/978-981-13-1280-9_44
Language English
