IEEE Transactions on Knowledge and Data Engineering | 2019

Feature Selection with Unsupervised Consensus Guidance

 
 
 

Abstract


Most unsupervised feature selection methods use pseudo labels generated by clustering to guide feature selection; however, noisy and irrelevant features degrade the cluster structure, so the resulting pseudo labels supervise feature selection poorly. In light of this, we propose the Consensus Guided Unsupervised Feature Selection (CGUFS) framework, which introduces consensus clustering to generate pseudo labels for feature selection. Specifically, multiple diverse basic partitions are generated from the data, and consensus clustering is employed to provide a high-quality and robust partition that guides feature selection in a one-step framework. In addition, complex constraints such as non-negativity are removed thanks to the crisp indicators of consensus clustering. Based on the CGUFS framework, two formulations are put forward, using the utility function and the co-association matrix, respectively, and we propose (weighted) K-means-like optimization for solving them efficiently, with theoretical support. Moreover, we extend the CGUFS framework to handle multi-view feature selection. Extensive experiments on several single-view and multi-view data sets from different domains demonstrate that our methods outperform recent state-of-the-art work in terms of effectiveness and efficiency. Several important impact factors and model parameters within CGUFS are thoroughly discussed for practical use.
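
To make the co-association idea mentioned in the abstract concrete, the following is a minimal sketch, not the CGUFS implementation itself. It assumes the common definition of a co-association matrix: entry (i, j) is the fraction of basic partitions in which samples i and j fall in the same cluster. The function name `co_association` and the three toy partitions are hypothetical and chosen only for illustration.

```python
import numpy as np

def co_association(partitions):
    """Fraction of basic partitions in which each pair of samples shares a cluster."""
    partitions = np.asarray(partitions)          # shape (r, n): r partitions of n samples
    r, n = partitions.shape
    S = np.zeros((n, n))
    for labels in partitions:
        # Boolean n x n matrix: True where two samples carry the same label.
        S += labels[:, None] == labels[None, :]
    return S / r

# Three hypothetical basic partitions of five samples (label ids are arbitrary).
basic_partitions = [
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
]
S = co_association(basic_partitions)
```

Here samples 0 and 1 are grouped together in all three partitions, so `S[0, 1]` is 1, while samples 0 and 2 co-occur in only one of three, so `S[0, 2]` is 1/3. A consensus partition would then be derived from a matrix like `S`; how CGUFS couples that step with feature selection is described in the paper itself and is not reproduced here.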

Volume 31
Pages 2319-2331
DOI 10.1109/TKDE.2018.2875712
Language English
Journal IEEE Transactions on Knowledge and Data Engineering
