Using single-cell entropy to describe the dynamics of reprogramming and differentiation of induced pluripotent stem cells
Yusong Ye, Zhuoqin Yang, Jinzhi Lei∗
April 20, 2020
School of Mathematical Sciences, Beihang University, Beijing 100191, China
School of Mathematical Sciences, Tiangong University, Tianjin 30387, China
Abstract
Induced pluripotent stem cells (iPSCs) provide a great model for studying the reprogramming and differentiation of stem cells. Single-cell RNA sequencing (scRNA-seq) enables us to investigate the reprogramming process at the single-cell level. Here, we introduce single-cell entropy (scEntropy) as a macroscopic variable to quantify the cellular transcriptome from scRNA-seq data during reprogramming and differentiation of iPSCs. scEntropy measures the relative order parameter of genomic transcription at the single-cell level during cell fate changes; it increases during differentiation and decreases upon reprogramming. Moreover, based on the scEntropy dynamics, we construct a phenomenological stochastic differential equation model and the corresponding Fokker-Planck equation for cell state transitions during iPSC differentiation, which provide insights for inferring cell fate changes and stem cell differentiation. This study is the first to introduce the concept of scEntropy to the biological process of iPSCs, and suggests that scEntropy provides a suitable quantity to describe cell fate transitions in the differentiation and reprogramming of stem cells.

Keywords: single-cell RNA sequencing, single-cell entropy, induced pluripotent stem cell, stochastic dynamics, differentiation, reprogramming
∗ Corresponding author, Email: [email protected]

Induced pluripotent stem cells (iPSCs) are derived from skin or blood cells that have been reprogrammed back into an embryonic-like pluripotent state that enables the development of other types of differentiated cells. Reprogramming of iPSCs is often induced by introducing genes important for maintaining the essential properties of embryonic stem cells (ESCs), and genomic transcription changes during reprogramming and further differentiation. However, this process is rather stochastic, and the molecular processes of cell fate changes remain unclear[10]. Recently, single-cell RNA sequencing (scRNA-seq) methods have allowed for the investigation of the cellular transcriptome at the level of individual cells[11, 1, 3]. The technique of scRNA-seq enables us to better study the dynamics of reprogramming and differentiation of iPSCs, which can provide insightful information on the process of cell reprogramming[12, 7, 2]. Based on the microscopic state of gene transcription provided by scRNA-seq, we can often determine the marker genes that guide the reprogramming process. Alternatively, a well-defined macroscopic variable for the state of a cell is important for understanding the dynamic process.

Recently, a novel concept of single-cell entropy (scEntropy) was proposed to measure the order of the cellular transcriptome profile from scRNA-seq data[5]. The scEntropy of a cell is defined as the information entropy of the difference in transcription between the cell and a predefined reference cell, and provides a straightforward and parameter-free macroscopic variable that can be used to quantify the process of early embryo development[5]. Here, we investigate whether scEntropy can be used to describe the reprogramming and differentiation processes of iPSCs. We introduce the concept of scEntropy to quantify the states of individual cells along the reprogramming and differentiation processes, which reveals state changes during these biological processes. scEntropy can serve as a pseudo-time of the process, according to which we identify genes whose expression is correlated with scEntropy and which can hence be potential marker genes of cell fate changes. We also construct phenomenological stochastic differential equations for the plasticity process of stem cells.
The scEntropy was proposed to measure the order of cellular transcription from scRNA-seq data with respect to a reference level; larger entropy means lower order in the transcriptions[5]. Consider an $N \times M$ gene expression matrix with $N$ cells and $M$ genes, and let $r$ be the gene expression vector of the reference cell. Let $x_i$ ($i = 1, \cdots, N$) be the gene expression vector of the $i$th cell. Calculating the scEntropy of $x_i$ with reference to $r$, $S(x_i|r)$, includes two steps[5]:

(1) calculate the difference between $x_i$ and $r$,
$$y_i = x_i - r = (y_{i1}, y_{i2}, \cdots, y_{iM});$$

(2) the entropy $S(x_i|r)$ is given by the information entropy of the signal sequence $y_i$, i.e.,
$$S(x_i|r) = -\int p_i(y) \ln p_i(y)\,\mathrm{d}y,$$
where $p_i(y)$ is the distribution density of the components $y_{ij}$ in $y_i$.

From the definition, the reference cell $r$ is a variable in defining the entropy $S(x_i|r)$; it represents the baseline transcriptome with the minimum entropy of zero.

To study the dynamics of reprogramming and differentiation of iPSCs, we can take the state of the iPSC as the reference cell in defining the scEntropy. The resulting scEntropy gives the relative information entropy of each cell with respect to the iPSC state.

To illustrate the application of scEntropy, we apply it to investigate the differentiation of iPSCs to cardiomyocytes. Time-series scRNA-seq data were obtained from 16 time points (the cells were sequenced every 24 hours for 16 days) in 19 human cell lines; in total, 297 RNA samples were sequenced (GSE122380)[9]. To calculate the scEntropy, we take the average gene expression of cells at D0 (the undifferentiated state) as the reference cell, so that scEntropy gives the relative transcription order with respect to the state of pluripotent stem cells.
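
The two-step calculation above can be sketched in a few lines. This is a minimal illustration and not the implementation of [5]: the density $p_i(y)$ is estimated here with a plain histogram, and the function names, the bin count `bins=50`, and the synthetic Gaussian "cells" are all assumptions made for the example.

```python
import numpy as np


def sc_entropy(x, r, bins=50):
    """Estimate S(x|r) = -integral of p(y) ln p(y) dy for y = x - r.

    Assumption: p(y) is approximated by a histogram density with `bins` bins.
    """
    # Step (1): difference between the cell and the reference cell.
    y = np.asarray(x, dtype=float) - np.asarray(r, dtype=float)
    # Step (2): plug the histogram density into the entropy integral.
    density, edges = np.histogram(y, bins=bins, density=True)
    widths = np.diff(edges)
    occupied = density > 0  # empty bins contribute nothing to the integral
    return float(-np.sum(density[occupied] * np.log(density[occupied]) * widths[occupied]))


def sc_entropy_all(expr, r, bins=50):
    """Apply sc_entropy to every row (cell) of an N x M expression matrix."""
    return np.array([sc_entropy(row, r, bins=bins) for row in expr])


# Synthetic data (hypothetical): two "cells" whose differences from the
# reference are Gaussian with standard deviations 1 and 3.
rng = np.random.default_rng(0)
n_genes = 20000
reference = np.zeros(n_genes)
cells = np.vstack([rng.normal(0.0, 1.0, n_genes), rng.normal(0.0, 3.0, n_genes)])
entropies = sc_entropy_all(cells, reference)
print(entropies)
```

In this toy setting the cell whose expression deviates more broadly from the reference receives the larger entropy, which matches the reading of scEntropy as a measure of lost transcriptional order relative to the reference.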
Figure 1: scEntropy dynamics during the differentiation of iPSCs to cardiomyocytes. A. scEntropies of 297 human iPSC cells sequenced every 24 hours for 16 days during the differentiation process. Black and red dots mark the two subgroups of cells, respectively. See the text for details. B. Dynamics of the average scEntropy of cells sequenced on each day. C. Variance of the scEntropies of cells sequenced on each day.

The scEntropies of the sequenced cells on each day are shown in Fig. 1A. From Fig. 1A, there is a clear tendency of increasing scEntropy along the differentiation process from D0 to D15. We also note the cell heterogeneity at D0. There are two groups of cells: group A cells show low scEntropy and are more stem-like (black dots in Fig. 1A), and group B cells show higher scEntropy with less pluripotency (red dots in Fig. 1A). Experimentally, the reprogramming procedure for iPSCs is not 100% successful: the somatic cells are not reprogrammed synchronously, and some cells may fail to be induced into pluripotent stem cells[4, 6]; these correspond to the group B cells. Fig. 1A shows that group A cells exhibit increasing scEntropy during differentiation, and the two groups of cells merge 8 days after the induction of differentiation.

We further calculate the average and variance of the scEntropies of cells on each day (Fig. 1B-C). The average scEntropy clearly increases during differentiation, and the variance rapidly decreases from day 1 to day 2, along with the loss of heterogeneity. The cell-to-cell variance of scEntropy remains low at the later stages of differentiation, which may suggest homogeneous dynamics of cell differentiation in the later stages (Fig. 1C).
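
The per-day summary statistics behind panels B and C can be sketched as follows. The function name and the toy numbers are hypothetical, and the use of the unbiased sample variance (`ddof=1`) is an assumption, since the text does not state which variance estimator was used.

```python
import numpy as np


def daily_mean_and_variance(days, entropies):
    """Group scEntropy values by sampling day; return days, means, variances."""
    days = np.asarray(days)
    entropies = np.asarray(entropies, dtype=float)
    unique_days = np.unique(days)
    means = np.array([entropies[days == d].mean() for d in unique_days])
    # Assumption: unbiased sample variance (ddof=1) within each day.
    variances = np.array([entropies[days == d].var(ddof=1) for d in unique_days])
    return unique_days, means, variances


# Toy values (hypothetical): a heterogeneous day 0 and a homogeneous day 1.
toy_days = [0, 0, 0, 1, 1, 1]
toy_entropies = [4.0, 5.0, 6.0, 5.5, 5.5, 5.5]
day_labels, day_means, day_vars = daily_mean_and_variance(toy_days, toy_entropies)
print(day_labels, day_means, day_vars)
```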
Figure 2: scEntropy of cells during the process of reprogramming from MEF to iPSC. A. scEntropies of cells at 9 time points during induced reprogramming. Black and red dots show the two subgroups of cells, respectively. Refer to Fig. 1 and the text for details. B. Average scEntropy of cells sequenced on each day. C. Variance of the scEntropies of cells sequenced on each day.

Next, we analyze the scEntropy of mouse cells during the reprogramming process from MEF to iPSC (GSE103221)[4]. In total, 912 mouse cells were sequenced at 9 time points; the scEntropies of the cells are calculated with the reference cell taken as the average gene expression vector of all iPSCs. The scEntropy is nearly unchanged in the 8 days after induction, and clearly decreases upon further reprogramming into iPSCs (Fig. 2A-B). We also note that a small fraction of cells (red dots in Fig. 2A) retain high scEntropy at the iPSC stage; these may represent cells that failed to be reprogrammed. The cell-to-cell variance of the scEntropies of cells sequenced on the same day remains low along the reprogramming process (Fig. 2C). These results show changes in the order of cellular transcription during the reprogramming process.

During cell reprogramming and differentiation, the expression of genes associated with cell pluripotency changes dynamically in accordance with the cell type. The above applications show that scEntropy can quantify these processes, and that iPSCs have lower scEntropy than differentiated cells. We can consider scEntropy as an intrinsic state variable of the pluripotency of iPSCs, and hence changes in scEntropy can serve as a pseudo-time related to cell reprogramming/differentiation. Hereafter, we analyze the differentiation of iPSCs to cardiomyocytes to illustrate how scEntropy can help us understand the biological processes of cell type changes.

From Fig. 1, scEntropy increases with the differentiation process, and it measures the order of cellular transcription, which is an intrinsic state of a cell. Hence, we consider the scEntropy as a pseudo-time of each cell, which represents the change of the intrinsic state of the cell along with differentiation.

With scEntropy as a pseudo-time, it is straightforward to study how the expression of each gene varies over the differentiation process. Based on the scRNA-seq data of the differentiation of iPSCs to cardiomyocytes[9] (Fig. 1), we calculate the Pearson correlation coefficient between the expression of each gene and the scEntropy. Some genes are highly correlated (positively or negatively) with the scEntropy (Fig. 3A); these genes are potential marker genes of the differentiation process, since their expression follows the same tendency as the transcriptional order. We further analyze the top 10 positively correlated genes and the top 10 negatively correlated genes (Fig. 3B). These genes include the pluripotency gene ZSCAN10, the DNA methyltransferase DNMT3B, and the telomerase reverse transcriptase (TERT), which are closely related to the cell division process. From Fig. 3B, the two subgroups of cells in Fig. 1A (black and red, respectively) show different dependence of gene expression on the scEntropy. In group A cells (black dots), expression increases (positively correlated genes) or decreases (negatively correlated genes) with the scEntropy. In group B cells (red dots), however, the expression of these genes is independent of changes in the scEntropy. In these cells, the expression of the above genes shows only a weak correlation with the scEntropy and differs from that of the other differentiated cells.
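
The correlation analysis described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline: the function names, the choice to report NaN for genes with constant expression, and the three toy genes (`gene_up`, `gene_down`, `gene_flat`) are hypothetical.

```python
import numpy as np


def pearson_with_entropy(expr, entropy):
    """Pearson correlation of each gene (column of expr) with scEntropy.

    expr: (N cells, M genes); entropy: (N,). Genes with constant expression
    have zero variance and yield NaN.
    """
    expr = np.asarray(expr, dtype=float)
    entropy = np.asarray(entropy, dtype=float)
    expr_centered = expr - expr.mean(axis=0)
    entropy_centered = entropy - entropy.mean()
    denom = np.sqrt((expr_centered ** 2).sum(axis=0) * (entropy_centered ** 2).sum())
    with np.errstate(invalid="ignore", divide="ignore"):
        return (expr_centered.T @ entropy_centered) / denom


def top_correlated(pcc, gene_names, k=10):
    """Names of the k most positively and k most negatively correlated genes."""
    order = np.argsort(np.nan_to_num(pcc, nan=0.0))  # ascending; NaN treated as 0
    negative = [gene_names[i] for i in order[:k]]
    positive = [gene_names[i] for i in order[::-1][:k]]
    return positive, negative


# Toy example (hypothetical values): four cells, three genes.
toy_entropy = np.array([1.0, 2.0, 3.0, 4.0])
toy_expr = np.array([
    [2.0, 4.0, 1.0],
    [4.0, 3.0, 1.0],
    [6.0, 2.0, 1.0],
    [8.0, 1.0, 1.0],
])
toy_genes = ["gene_up", "gene_down", "gene_flat"]
pcc = pearson_with_entropy(toy_expr, toy_entropy)
top_pos, top_neg = top_correlated(pcc, toy_genes, k=1)
print(pcc, top_pos, top_neg)
```

Computing the coefficients separately for the two subgroups of cells (by passing only the rows of one subgroup) would be one way to reproduce the contrast between group A and group B described in the text.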
For example, in group B cells the expression level of the gene CHRM2 remains nearly unchanged as the scEntropy increases. These results reveal the existence of two types of cells at D0: the iPSCs that can differentiate to cardiomyocytes, and the un-reprogrammed cells that remain phenotypically unchanged during the induction of differentiation.

Figure 3: Pearson correlation between gene expression and scEntropy. A. Pearson correlation coefficient (PCC) of all 16319 genes. The genes are ordered according to the PCC value. B. Expression of the top 10 positively correlated and top 10 negatively correlated genes versus the scEntropy. Black and red dots represent the two subpopulations of cells as in Fig. 1A.

To further analyze the correlation between scEntropy and gene expression, we identify 25 genes through GO enrichment that are associated with functions related to cell pluripotency, differentiation, DNA maintenance, etc. The expression of these genes along the differentiation process is shown as a heat map in Fig. 4A. The dynamics of their expression with increasing scEntropy are shown in Fig. 4B. Some genes clearly decrease with scEntropy, such as the pluripotency genes ZSCAN10 and TET1. The expression of TET1 decreases along the differentiation. TET1 usually up-regulates transcription by RNA polymerase and promotes DNA demethylation during differentiation. ZSCAN10 is another interesting gene that is negatively correlated with the scEntropy and decreases during differentiation. ZSCAN10 is known to function in somatic stem cell population maintenance and transcription factor activity. The genes PSCA, KIK8, and CD34, which are not relevant to reprogramming, show no correlation with the scEntropy and remain nearly unchanged during differentiation. CD44 and SALL4 provide another scenario. These genes are enriched in the pathways of DNA binding and stem cell population maintenance. CD44 clearly increases when the scEntropy is large, and SALL4 decreases as the scEntropy increases. These results show that when we consider scEntropy as a pseudo-time of the differentiation process, the correlation between gene expression and scEntropy can provide information on how gene expression changes with the intrinsic state of cells.

Figure 4: Marker genes during the differentiation process. A. Average gene expression of 25 marker genes along the differentiation process. B. Expression of the 25 genes versus scEntropy.

To further investigate the dynamic process of cell fate changes, we analyze the distribution of the scEntropies of cells on each day during the differentiation of iPSCs to cardiomyocytes (Fig. 5A). The distribution has two peaks at D0, corresponding to pluripotent cells with low entropy and non-reprogrammed cells with high entropy, respectively. After the induction of differentiation, the entropy of the pluripotent cells increases with the differentiation process, and the number of non-reprogrammed cells decreases. Finally, there is only one peak in the distribution from D8.
The increase in the scEntropy of pluripotent cells during differentiation suggests a decrease of transcriptional order in the differentiation of iPSCs to cardiomyocytes.

From the above analysis, the distribution of scEntropy evolves to a unimodal distribution within 16 days, which suggests that the cell scEntropy converges to a stationary state with a stable scEntropy. Hence, we can describe the differentiation process of a single cell through the dynamics of its scEntropy, formulated as a stochastic process through a stochastic differential equation. Let $X(t)$ represent the scEntropy of a cell at time $t$, and $x^*$ the average scEntropy at the differentiated state. We introduce the following Ornstein-Uhlenbeck process to model the phenomenological dynamics of the scEntropy:
$$\mathrm{d}X(t) = -k\,(X - x^*)\,\mathrm{d}t + B\,\mathrm{d}W_t, \tag{1}$$
where $k$ is a parameter describing the dissipation velocity, $B$ is the fluctuation parameter, and $W_t$ is the Wiener process. Given the parameters and the initial condition, a sample solution of (1) gives a possible trajectory of the scEntropy of a single cell during differentiation. Fig. 5B shows trajectories obtained from (1). Currently, we can only sequence a cell once, and it is impossible to track the evolution of the transcriptome of a single cell. Here, equation (1) provides a conceptual description of the transcriptional states of a cell during a process of cell fate decision.

Figure 5: Transition of cell fates during the differentiation of iPSCs to cardiomyocytes. A. The scEntropy distribution (histogram) during the differentiation of iPSCs to cardiomyocytes. Red lines show the theoretical distribution given by (4) from the solution of the Fokker-Planck equation (2). B. Simulated scEntropy dynamics based on the stochastic differential equation (1). Dots are scEntropies from the scRNA-seq data (refer to Fig. 1). Here, the parameters are $k = 0.15$ day$^{-1}$, $B = 0.[\ldots]$, $x^* = 5.6$, and the initial conditions are taken at either low or high scEntropy levels.

From the stochastic differential equation, it is straightforward to obtain the associated Fokker-Planck equation
$$\frac{\partial f(x,t)}{\partial t} = -\frac{\partial}{\partial x}\bigl(-k\,(x - x^*)\,f(x,t)\bigr) + \frac{B^2}{2}\,\frac{\partial^2}{\partial x^2} f(x,t). \tag{2}$$
Here, $f(x,t) = P\{X(t) = x\}$ denotes the probability density for a cell to have scEntropy $x$ at time $t$. In particular, given the initial state $X(0) = x_0$, the transition probability $P\{X(t) = x \mid X(0) = x_0\}$ can be obtained by solving equation (2) with the initial condition $f(x, 0) = \delta(x - x_0)$. The solution is a Gaussian distribution with mean $\varphi(t; x_0) = x^* + (x_0 - x^*)\,e^{-kt}$ and variance $\sigma^2(t) = \frac{B^2}{2k}\,(1 - e^{-2kt})$:
$$P\{X(t) = x \mid X(0) = x_0\} = \frac{1}{\sqrt{2\pi\sigma^2(t)}}\, e^{-\frac{(x - \varphi(t; x_0))^2}{2\sigma^2(t)}}. \tag{3}$$
We fit (3) to the daily distributions shown by the histograms in Fig. 5A. In fitting the data, we take the initial values $x_0 = 4.[\ldots]$ and $x_0 = 5.[\ldots]$; the distributions are fitted with (here $t = 0$ corresponds to day $-[\ldots]$)
$$f(x,t) = \frac{1}{\sqrt{2\pi\sigma^2(t)}} \left( 0.[\ldots]\, e^{-\frac{(x - \varphi(t;\,4.[\ldots]))^2}{2\sigma^2(t)}} + 0.[\ldots]\, e^{-\frac{(x - \varphi(t;\,5.[\ldots]))^2}{2\sigma^2(t)}} \right). \tag{4}$$
The obtained parameters are $k = 0.[\ldots]$ day$^{-1}$ and $B = 0.04$, and the theoretical distributions are shown by red lines in Fig. 5A.
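
The relation between the stochastic differential equation (1) and the Gaussian transition density (3) can be checked numerically. The sketch below integrates (1) with the Euler-Maruyama scheme (a choice made here; the text does not say which scheme produced Fig. 5B) and compares the simulated endpoint statistics with the analytic mean $\varphi(t; x_0)$ and variance $\sigma^2(t)$. The values $k = 0.15$ day$^{-1}$ and $x^* = 5.6$ follow the caption of Fig. 5 and $B = 0.04$ follows the fitted value quoted in the text; the initial values 4.5 and 5.5 and the equal mixture weights are hypothetical placeholders, not the fitted values.

```python
import numpy as np


def ou_mean(t, x0, k, x_star):
    """Mean of X(t) given X(0) = x0 for dX = -k (X - x*) dt + B dW."""
    return x_star + (x0 - x_star) * np.exp(-k * t)


def ou_variance(t, k, B):
    """Variance of X(t) given a deterministic initial value."""
    return B ** 2 / (2.0 * k) * (1.0 - np.exp(-2.0 * k * t))


def simulate_ou(x0, k, B, x_star, t_end, dt, n_paths, rng):
    """Euler-Maruyama integration of Eq. (1); returns X(t_end) for each path."""
    n_steps = int(round(t_end / dt))
    x = np.full(n_paths, float(x0))
    for _ in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt), size=n_paths)  # Wiener increments
        x = x - k * (x - x_star) * dt + B * dw
    return x


def mixture_density(x, t, x0_values, weights, k, B, x_star):
    """Weighted sum of Gaussian transition densities, valid for t > 0."""
    x = np.asarray(x, dtype=float)
    var = ou_variance(t, k, B)
    total = np.zeros_like(x)
    for x0, w in zip(x0_values, weights):
        mu = ou_mean(t, x0, k, x_star)
        total += w * np.exp(-(x - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)
    return total


k, B, x_star = 0.15, 0.04, 5.6  # k, x* from the Fig. 5 caption; B from the fit
x0_low, x0_high = 4.5, 5.5      # hypothetical initial scEntropy values
t_end = 16.0                    # days

rng = np.random.default_rng(1)
endpoints = simulate_ou(x0_low, k, B, x_star, t_end, dt=0.01, n_paths=5000, rng=rng)
sim_mean, sim_var = endpoints.mean(), endpoints.var()
exact_mean, exact_var = ou_mean(t_end, x0_low, k, x_star), ou_variance(t_end, k, B)
print(sim_mean, exact_mean, sim_var, exact_var)

# The two-component density should integrate to the sum of its weights.
grid = np.linspace(4.0, 7.0, 3001)
density = mixture_density(grid, t_end, [x0_low, x0_high], [0.5, 0.5], k, B, x_star)
area = density.sum() * (grid[1] - grid[0])
```

Because the drift in (1) is linear, the exact Gaussian update could replace Euler-Maruyama and remove the step-size error; the explicit scheme is used here only because it mirrors the form of the equation directly.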
The novel concept of scEntropy has been defined to measure the macroscopic transcription order of individual cells based on scRNA-seq[5]. The concept of scEntropy is valuable in describing the process of early embryo development, as well as in the classification of normal and malignant cells in different types of cancers[5]. The current study is the first to introduce scEntropy as a macroscopic quantity of the transcription state of cells to describe the process of iPSC reprogramming and differentiation. scEntropy can serve as a pseudo-time of the differentiation/reprogramming process, decreasing as cell pluripotency increases. The proposed scEntropy is an intrinsic quantity of the cell transcriptional state, and hence genes whose expression is correlated with the scEntropy can be essential for the process of cell fate decision. We can also consider the dynamics of the scEntropy of an individual cell during cell fate transition as a stochastic process, which can be modeled through Langevin equations. The conceptual Langevin equation model provides insights into the dynamics of cell state changes during the differentiation/reprogramming of iPSCs. Based on the stochastic dynamics of the scEntropy, the epigenetic landscape of cell differentiation/reprogramming can be described by the related Fokker-Planck equation. Our results show the cell-to-cell variability of scEntropy during the differentiation process. Similarly, the heterogeneity of gene expression measured by the Shannon entropy also increases in the first few hours of the differentiation process of chicken erythroid progenitors[8] and in the differentiation from pluripotent stem cells to the neuronal state[10]. Both cell-based and gene-based entropies increase during the differentiation process, which suggests the features of variability and stochasticity in stem cell differentiation.

The scEntropy was proposed to measure the intrinsic order of the cellular transcriptome with respect to a predefined reference cell. The definition of scEntropy includes no external parameters, and hence can provide natural information about a cell. Quantifying the changes of the intrinsic cellular transcriptome during differentiation/reprogramming is essential for our understanding of the biological process of cell fate decision. iPSCs provide a controllable system to study cell fate changes at the individual cell level. Single-cell sequencing techniques enable us to examine the microscopic states of individual cells. However, there are two major limitations when we analyze single-cell sequencing data: each cell can only be sequenced once, so we are not able to track the dynamics of a cell; and the sequencing data are usually very high dimensional, so it is usually difficult to find low-dimensional features that characterize the cell. Hence, for a better understanding of the cell type transition process, it is important to quantify the cell types through the intrinsic state of a cell. Our study shows that scEntropy can be one of the potential variables for the intrinsic state of the cellular transcriptome, and can be used to describe the single-cell dynamics of the epigenetic landscape through stochastic dynamical models. The current approach can be extended to explore more detailed dynamics of various biological processes of cell type transitions, such as early embryo development, cancer cell plasticity, and cell differentiation.
This work is supported by the National Natural Science Foundation of China(11831015, and 11872084).
References

[1] Davide Cacchiarelli, Cole Trapnell, Michael J Ziller, Magali Soumillon, Marcella Cesana, Rahul Karnik, Julie Donaghey, Zachary D Smith, Sutheera Ratanasirintrawoot, Xiaolan Zhang, et al. Integrative analyses of human reprogramming reveal dynamic nature of induced pluripotency. Cell, 162(2):412–424, 2015.

[2] GTEx Consortium et al. Genetic effects on gene expression across human tissues. Nature, 550(7675):204, 2017.

[3] Charles Gawad, Winston Koh, and Stephen R Quake. Single-cell genome sequencing: current state of the science. Nature Reviews Genetics, 17(3):175, 2016.

[4] Lin Guo, Lihui Lin, Xiaoshan Wang, Mingwei Gao, Shangtao Cao, Yuanbang Mai, Fang Wu, Junqi Kuang, He Liu, Jiaqi Yang, et al. Resolving cell fate decisions during somatic cell reprogramming by single-cell RNA-Seq. Molecular Cell, 73(4):815–829, 2019.

[5] Jingxin Liu, You Song, and Jinzhi Lei. Single-cell entropy to quantify the cellular transcriptome from single-cell RNA-seq data. Biophys Rev Lett, doi:10.1142/S179304802500010, 2020.

[6] Ben D MacArthur and Ihor R Lemischka. Statistical mechanics of pluripotency. Cell, 154(3):484–489, July 2013.

[7] Dan L Nicolae, Eric Gamazon, Wei Zhang, Shiwei Duan, M Eileen Dolan, and Nancy J Cox. Trait-associated SNPs are more likely to be eQTLs: annotation to enhance discovery from GWAS. PLoS Genetics, 6(4):e1000888, 2010.

[8] Angélique Richard, Loïs Boullu, Ulysse Herbach, Arnaud Bonnafoux, Valérie Morin, Elodie Vallin, Anissa Guillemin, Nan Papili Gao, Rudiyanto Gunawan, Jérémie Cosette, Ophélie Arnaud, Jean-Jacques Kupiec, Thibault Espinasse, Sandrine Gonin-Giraud, and Olivier Gandrillon. Single-Cell-Based Analysis Highlights a Surge in Cell-to-Cell Molecular Variability Preceding Irreversible Commitment in a Differentiation Process. PLoS Biology, 14(12):e1002585, December 2016.

[9] BJ Strober, Reem Elorbany, K Rhodes, Nirmal Krishnan, Karl Tayeb, Alexis Battle, and Yoav Gilad. Dynamic genetic regulation of gene expression during cellular differentiation. Science, 364(6447):1287–1290, 2019.

[10] Patrick S Stumpf, Rosanna C G Smith, Michael Lenz, Andreas Schuppert, Franz-Josef Müller, Ann Babtie, Thalia E Chan, Michael P H Stumpf, Colin P Please, Sam D Howison, Fumio Arai, and Ben D MacArthur. Stem Cell Differentiation as a Non-Markov Stochastic Process. Cell Systems, 5(3):268–282.e7, September 2017.

[11] Kazutoshi Takahashi, Koji Tanabe, Mari Ohnuki, Megumi Narita, Tomoko Ichisaka, Kiichiro Tomoda, and Shinya Yamanaka. Induction of pluripotent stem cells from adult human fibroblasts by defined factors. Cell, 131(5):861–872, 2007.

[12] Zhihong Zhu, Futao Zhang, Han Hu, Andrew Bakshi, Matthew R Robinson, Joseph E Powell, Grant W Montgomery, Michael E Goddard, Naomi R Wray, Peter M Visscher, et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets.