2019 IEEE 4th International Conference on Advanced Robotics and Mechatronics (ICARM) | 2019

Unsupervised classification of high-dimension and low-sample data with variational autoencoder based dimensionality reduction

 
 

Abstract


In data mining research and development, one of the defining challenges is performing classification or clustering on data with relatively few samples and many dimensions, known as the high-dimensional limited-sample-size (HDLSS) problem. Because the sample size is limited, there is not enough training data to train classification models. In addition, the ‘curse of dimensionality’ often restricts the effectiveness of many methods for solving the HDLSS problem. Classification models trained on limited-sample datasets tend to overfit and cannot achieve satisfactory results. Thus, unsupervised methods are a better choice for such problems. Given the emergence of deep learning, its many applications, and its promising outcomes, an extensive analysis of deep learning techniques on HDLSS datasets is required. This paper aims to evaluate the performance of variational autoencoder (VAE) based dimensionality reduction and unsupervised classification on HDLSS datasets. The performance of the VAE is compared with two existing techniques, namely principal component analysis (PCA) and non-negative matrix factorization (NMF), on fourteen datasets in terms of three evaluation metrics: purity, Rand index, and normalized mutual information (NMI). The experimental results show the superiority of the VAE over the traditional methods on HDLSS datasets.
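The three evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the toy labels, and the choice of arithmetic-mean normalization for NMI are assumptions, since the abstract does not state which definitions the authors used.

```python
from collections import Counter
from itertools import combinations
from math import log


def purity(labels_true, labels_pred):
    """Fraction of samples that fall in the majority class of their cluster."""
    clusters = {}
    for t, p in zip(labels_true, labels_pred):
        clusters.setdefault(p, Counter())[t] += 1
    return sum(max(c.values()) for c in clusters.values()) / len(labels_true)


def rand_index(labels_true, labels_pred):
    """Fraction of sample pairs on which the two labelings agree."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def nmi(labels_true, labels_pred):
    """Mutual information divided by the mean of the two entropies (assumed normalization)."""
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    mi = sum(
        (c / n) * log(c * n / (count_true[t] * count_pred[p]))
        for (t, p), c in joint.items()
    )
    mean_entropy = (entropy(labels_true) + entropy(labels_pred)) / 2
    return mi / mean_entropy if mean_entropy > 0 else 1.0


# Hypothetical toy labels: six samples, two true classes, two predicted clusters.
truth = [0, 0, 0, 1, 1, 1]
pred = [1, 1, 0, 0, 0, 0]
print(purity(truth, pred), rand_index(truth, pred), nmi(truth, pred))
```

All three scores are invariant to how cluster ids are numbered, which is why they suit unsupervised settings where predicted cluster labels carry no inherent meaning.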

Pages 498-503
DOI 10.1109/ICARM.2019.8834333
Language English
Journal 2019 IEEE 4th International Conference on Advanced Robotics and Mechatronics (ICARM)
