bioRxiv | 2021

Improved estimation of cell type-specific gene expression through deconvolution of bulk tissues with matrix completion


Abstract


Cell type-specific gene expression (CSE) offers biological insights into both physiological and pathological processes that bulk tissue gene expression cannot provide. Although fluorescence-activated cell sorting (FACS) and single-cell RNA sequencing (scRNA-seq) are two widely used techniques for measuring gene expression in a cell type-specific manner, their cost and labor requirements make them impractical for routine use on large patient cohorts. Here, we present ENIGMA, an algorithm that deconvolutes bulk RNA-seq data into cell type-specific expression matrices and cell type fraction matrices without the need for physical sorting or sequencing of single cells. ENIGMA uses a cell type signature matrix generated from either FACS RNA-seq or scRNA-seq as a reference, and applies a matrix completion technique to achieve fast and accurate deconvolution. We demonstrated that ENIGMA outperforms previously published algorithms (TCA, bMIND and CIBERSORTx) while requiring much less running time on both simulated and realistic datasets. To demonstrate its value in biological discovery, we applied ENIGMA to bulk RNA-seq data from arthritis patients and revealed a pseudo-differentiation trajectory that could reflect the monocyte-to-macrophage transition. We also applied ENIGMA to bulk RNA-seq data of pancreatic islet tissue from type 2 diabetes (T2D) patients and discovered a beta cell-specific gene co-expression module related to senescence and apoptosis that possibly contributes to the pathogenesis of T2D. Together, ENIGMA provides a new framework for improving CSE estimation by integrating FACS RNA-seq and scRNA-seq with tissue bulk RNA-seq data, and will extend our understanding of cell heterogeneity at the population level with no need for experimental tissue disaggregation.
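
As an illustration of the reference-based setting the abstract describes, the following minimal sketch simulates bulk expression under the standard linear mixing assumption (bulk expression as a fraction-weighted sum of cell type signatures) and recovers cell type fractions from a signature matrix. This is not ENIGMA's implementation: the least-squares estimator, the simulated data, and all names (`signature`, `fractions`, `estimate_fractions`) are assumptions chosen for illustration, and the matrix completion step that yields sample-level CSE is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_types, n_samples = 200, 4, 10

# Hypothetical reference signature matrix (genes x cell types),
# standing in for one derived from FACS RNA-seq or scRNA-seq.
signature = rng.gamma(shape=2.0, scale=50.0, size=(n_genes, n_types))

# Simulated ground-truth cell type fractions (samples x cell types);
# each row sums to 1.
fractions = rng.dirichlet(np.ones(n_types), size=n_samples)

# Assumed linear mixing model: noise-free bulk expression (genes x samples).
bulk = signature @ fractions.T


def estimate_fractions(bulk, signature):
    """Estimate fractions by least squares, then clip and renormalize.

    A simple stand-in estimator, not the method used by ENIGMA.
    """
    coef, *_ = np.linalg.lstsq(signature, bulk, rcond=None)  # types x samples
    coef = np.clip(coef, 0.0, None)                          # no negative fractions
    return (coef / coef.sum(axis=0, keepdims=True)).T        # samples x types


estimated = estimate_fractions(bulk, signature)
max_abs_error = float(np.abs(estimated - fractions).max())
print(f"max absolute error in fractions: {max_abs_error:.2e}")
```

Because the simulated bulk data are noise-free and every sample shares the same signature, the fractions are recovered almost exactly here. Real bulk data violate both conditions, which is the gap that sample-level CSE estimation is meant to address.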

DOI 10.1101/2021.06.30.450493
Language English
Journal bioRxiv

Full Text