IEEE Access | 2021

A Streamlined scRNA-Seq Data Analysis Framework Based on Improved Sparse Subspace Clustering


Abstract


One advantage of single-cell RNA sequencing (scRNA-seq) is its ability to reveal cell heterogeneity through cell clustering. However, clustering scRNA-seq data is challenging because of high transcript amplification noise, sparsity, and outlier cell populations. In this study, we propose a novel sparse subspace clustering method, Structured Sparse Subspace Clustering and Completion, for scRNA-seq analysis. The method assumes that related cells lie in the same subspace, so that the relationships among cells can be described within a subspace rather than between cell pairs. The proposed optimization model is solved by the Linearized Alternating Direction Method of Multipliers, in which data completion and spectral clustering are coupled through mutual constraints. Notably, a random walk is used within the iterative optimization procedure to make the coefficient matrix more diagonal, and its effect is significant. Our model is compared with 5 state-of-the-art clustering methods on 6 public single-cell datasets and a simulated dataset, with cell numbers ranging from 56 to over 3000. Our model outperforms the other clustering methods in clustering accuracy, as evaluated by Adjusted Rand Index, Normalized Mutual Information, Homogeneity, and Completeness, particularly when compared with the other improved sparse subspace clustering methods.
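The four evaluation measures named in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' evaluation code. The function names are hypothetical, and the arithmetic-mean normalization used for Normalized Mutual Information is an assumption, since the abstract does not state which variant was used.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index computed from the contingency table of two labelings."""
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0
    # Pairs of cells that share a label in both, in the reference, and in the prediction.
    same_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    same_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    same_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = same_true * same_pred / total_pairs
    max_index = 0.5 * (same_true + same_pred)
    if max_index == expected:
        return 1.0
    return (same_both - expected) / (max_index - expected)


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(labels_a, labels_b):
    """Mutual information (natural log) between two labelings."""
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    return sum(
        (c / n) * log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )


def homogeneity(labels_true, labels_pred):
    """1 when every predicted cluster contains cells of a single true type."""
    h_true = entropy(labels_true)
    return 1.0 if h_true == 0 else mutual_information(labels_true, labels_pred) / h_true


def completeness(labels_true, labels_pred):
    """1 when all cells of a true type fall in the same predicted cluster."""
    h_pred = entropy(labels_pred)
    return 1.0 if h_pred == 0 else mutual_information(labels_true, labels_pred) / h_pred


def normalized_mutual_information(labels_true, labels_pred):
    """NMI with arithmetic-mean normalization (an assumed variant)."""
    denom = entropy(labels_true) + entropy(labels_pred)
    return 1.0 if denom == 0 else 2.0 * mutual_information(labels_true, labels_pred) / denom


if __name__ == "__main__":
    # Toy labels, not data from the paper: one true cell type is split in two.
    cell_types = [0, 0, 1, 1]
    clusters = [0, 0, 1, 2]
    print(adjusted_rand_index(cell_types, clusters))
    print(normalized_mutual_information(cell_types, clusters))
    print(homogeneity(cell_types, clusters), completeness(cell_types, clusters))
```

In the toy example, splitting one true cell type into two clusters leaves homogeneity at its maximum but lowers completeness, which is why the two measures are reported together.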

Volume 9
Pages 9719-9727
DOI 10.1109/ACCESS.2021.3049807
Language English
Journal IEEE Access
