Inf. Process. Manag. | 2021

A novel regularized asymmetric non-negative matrix factorization for text clustering

 
 

Abstract


Non-negative matrix factorization (NMF) is a dimension reduction method that extracts semantic features from high-dimensional data. Most existing optimization methods for NMF attend only to how each feature vector of the factorized matrices should be modeled, and ignore the relationships among feature vectors. Exploiting such relationships among documents’ feature vectors yields a better factorization for text clustering. This paper proposes a novel regularized asymmetric non-negative matrix factorization (RANMF) for text clustering. The proposed method imposes regularization constraints on pairwise feature vectors by applying penalties based on distance measures. We design a new cost function based on the Kullback–Leibler divergence and develop an optimization scheme that minimizes it using novel multiplicative updating rules. The proposed method places documents from the same cluster close together in the new representation space. Hence, the acquired parts-based representation has cluster labeling consistent with the original space and greater discriminating ability. The complexity analysis shows that applying the regularizers does not increase the time cost of RANMF compared with the original NMF. In the experiments, RANMF converges quickly, terminating in fewer than ten iterations. The complete proof of convergence and the experimental results on benchmark data sets demonstrate that the proposed multiplicative updating rules converge fast and achieve superior results compared to other algorithms.
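For orientation, the baseline that the abstract builds on can be sketched as follows. This is a minimal illustration of the standard (unregularized) NMF with the generalized Kullback–Leibler cost and the classical Lee–Seung multiplicative updates. It is not the RANMF update rule: the abstract does not give the pairwise distance-based penalty terms, so none are included here. The function names `kl_divergence` and `nmf_kl`, the `eps` smoothing constant, and the convention that columns of `V` are documents are all assumptions made for this example.

```python
import numpy as np


def kl_divergence(V, WH, eps=1e-10):
    """Generalized KL divergence D(V || WH), the unregularized NMF cost."""
    return float(np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH))


def nmf_kl(V, rank, n_iter=200, seed=0, eps=1e-10):
    """Plain KL-NMF with Lee-Seung multiplicative updates (no regularizers).

    V is a non-negative (terms x documents) matrix, approximated as W @ H with
    W of shape (terms x rank) and H of shape (rank x documents).
    """
    rng = np.random.default_rng(seed)
    n_terms, n_docs = V.shape
    W = rng.random((n_terms, rank)) + eps
    H = rng.random((rank, n_docs)) + eps
    history = []
    for _ in range(n_iter):
        # Update H: scale each entry by a ratio of non-negative quantities,
        # so H stays non-negative without any explicit projection step.
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        # Update W using the refreshed reconstruction.
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
        history.append(kl_divergence(V, W @ H))
    return W, H, history


if __name__ == "__main__":
    # Toy non-negative matrix standing in for a term-document matrix.
    V = np.random.default_rng(1).random((20, 12))
    W, H, history = nmf_kl(V, rank=3, n_iter=50)
    # One common convention (an assumption here): assign each document
    # to the component with the largest coefficient in its column of H.
    labels = H.argmax(axis=0)
    print(labels.shape, history[0] >= history[-1])
```

A regularized variant in the spirit of the abstract would add penalty terms on pairs of document feature vectors to this cost and modify the two update ratios accordingly; the exact form of those terms is given in the paper, not here.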

Volume 58
Pages 102694
DOI 10.1016/J.IPM.2021.102694
Language English
Journal Inf. Process. Manag.
