A Bayesian statistical analysis of stochastic phenotypic plasticity model of cancer cells
aa r X i v : . [ q - b i o . CB ] D ec A Bayesian statistical analysis of stochasticphenotypic plasticity model of cancer cells
Da Zhou , Shanjun Mao , Jing Cheng , Kaiyi Chen , Xiaofang Cao , Jie Hu , ∗
1. School of Mathematical Sciences, Xiamen University, Xiamen 361005, P.R. China( ∗ Corresponding Author, [email protected]))2. Department of Statistics, The Chinese University of Hong Kong, Shatin, N.T., HongKong, PR China3. School of Statistics, Huaqiao University, Xiamen 361005, P.R. China
Abstract
The phenotypic plasticity of cancer cells has received special attention in recent years.Even though related models have been widely studied in terms of mathematical properties,a thorough statistical analysis on parameter estimation and model selection is still verylacking. In this study, we present a Bayesian approach on the relative frequencies ofcancer stem cells (CSCs). Both Gibbs sampling and Metropolis-Hastings (MH) algorithmare used to perform point and interval estimations of cell-state transition rates betweenCSCs and non-CSCs. Extensive simulations demonstrate the validity of our model andalgorithm. By applying this method to a published data on SW620 colon cancer cellline, the model selection favors the phenotypic plasticity model, relative to conventionalhierarchical model of cancer cells. Moreover, it is found that the initial state of CSCs aftercell sorting significantly influences the occurrence of phenotypic plasticity.
The hypothesis of cancer stem cell theory [1, 2] postulates a hierarchical organizationof cancer cells. A small number of tumorigenic cancer cells, also termed cancer stem1ells (CSCs), reside at the apex of this cellular hierarchy [3]. CSCs are capable of self-renewal and generating more differentiated cancer cells with lower tumorigenic potential.However, growing researches have extended the CSC model by proposing a phenotypicplasticity paradigm in which reversible transitions could happen between CSCs and non-CSCs [4]. That is, not only can CSCs give rise to non-CSCs, but a fraction of non-CSCscan reacquire CSC-like characteristics. This de-differentiation from non-CSCs to CSCshas been reported in quite a few types of cancer, such as breast cancer [5, 6, 7], melanoma[8], colon cancer [9], and glioblastoma multiforme [10].Very recently special attention has been paid to reasonable mathematical models forquantifying the process of phenotypic plasticity. In particular, it was found that thephenotypic plasticity plays an important role in the stability of the quantitative models[6, 11, 12, 13, 14, 15, 16]. That is, the phenotypic plasticity greatly contributes to stabi-lizing the phenotypic mixture of cancer cells, thereby effectively maintaining the hetero-geneity of cancer cell populations. Some other researches laid emphasis on the role of thephenotypic plasticity in transient dynamics. It was shown that an interesting overshootphenomenon of CSCs observed in experiment can be well explained by de-differentiationfrom non-CSCs to CSCs [17, 18]. Besides, Leder et al studied mathematical modelsof pdgf-driven glioblastoma and revealed that the effectiveness of radiotherapy is quitesensitive to the capability of de-differentiation from differentiated sensitive cells to stem-like resistant cells [19]; Jilkine and Gutenkunst studied the effect of de-differentiationon time to mutation acquisition in cancers [20]; Chen et al studied transition model be-tween endocrine therapy responsive and resistant states in breast cancer by LandscapeTheory [21]; Dhawan et al showed with mathematical modeling that exposure to hypoxiaenhanced the plasticity and heterogeneity of cancer cell populations [22]; Tonekaboi etal investigated how cellular plasticity behaves differently in small and large cancer cellpopulations [23]. A recent review by Jolly et al [24] focused on quantitative models ofEpithelial-mesenchymal plasticity in cancer.Even though the phenotypic plasticity has been extensively studied in terms of math-ematical properties, the statistical analysis on parameter estimation and model selectionis still very lacking. Actually, one of the crucial tasks in the research of phenotypicplasticity is to estimate the transition rates between different cell types. As a pioneer-ing work, Gupta et al [6] established a discrete-time Markov state transition model andestimated the transition probabilities between different cell states by fitting the model totheir FACS (Fluorescence-activated cell sorting) data on SUM159 and SUM149 breastcancer cell lines. Besides, continuous-time ordinary differential equations (ODEs) mod-els were also developed [13, 14], based on which de-differentiation rates were estimatedby fitting to SW620 colon cancer cell line. However, the above mentioned works canonly provide point estimations to the interested parameters, but not interval estimations.Comparatively, interval estimation is much more informative and frequently-used than2oint estimation in practice. For doing interval estimation, statistical modeling ratherthan deterministic modeling should be applied. Moreover, note that it is still question-able if de-differentiation is a crucial improvement to the cellular hierarchy of cancer cellsor just a minor extension to it, model selection can be used to validate the paradigm ofphenotypic plasticity in terms of statistical significance. Therefore, a thorough statisti-cal analysis is of great value for further quantifying the biological process of phenotypicplasticity and exploring its biological significance.In this research, a statistical framework is presented to analyze a two-phenotypicmodel of cancer cells. In this model, each cancer cell is either CSC phenotypic stateor non-CSC phenotypic state. Both types of cells can divide symmetrically or asymmetri-cally with certain probabilities. A Bayesian approach [25] is developed to fit this model toexperimental data on relative frequencies of CSCs. MCMC methods (such as Gibbs sam-pling [26] and MH algorithm [27, 28]) are used to perform statistical inference with Mul-tivariate Potential Scale Reduction Factor (MPSRF) [29, 30] checking the convergenceof MCMC chains. Our simulation results demonstrate the precision and accuracy of ouralgorithm by both point estimation and interval estimation. By applying our approach to apublished data on SW620 colon cancer cell line [9], we also perform model selection viadeviance information criterion (DIC, [31]). Our result shows that the phenotypic plasticitymodel with de-differentiation has superior quality relative to the hierarchical model with-out de-differentiation. Furthermore, an interesting frequency-dependent phenomenon ispresented, i.e. the estimated values of the model parameters depend on the initial relativefrequencies of different cell states. This suggests that the process of phenotypic plasticitycould be relevant to the heterogeneity level of cancer cell populations.The paper is organized as follows. The model assumptions and Bayesian frameworkare presented in Section 2. Main results including simulations and real data analysis areshown in Section 3. Conclusions are presented in Section 4.
In this section we describe the model assumptions. Note that the salient feature of the phe-notypic plasticity model is the reversibility between CSCs and non-CSCs, i.e., not onlycan CSCs differentiate into non-CSCs, but non-CSCs are also capable of de-differentiatinginto CSCs. Consider a population of cancer cells comprising two phenotypes: CSCrepresents cancer stem cell phenotypic state, non-CSC represents non-stem cancer cellphenotypic state. Even though this two-phenotypic assumption simplifies the biologicalcomplexity of highly diverse phenotypes in cancer, the two-phenotypic setting has beenproved as an effective and reasonable simplification for highlighting the minimal process3f phenotypic plasticity [11, 12, 13, 19]. Similar bidirectional transition cascade modelswere also studied in bacterial community [32, 33].We now present the cellular process of the two-phenotypic model. From probabilisticpoint of view, this model can be seen as a discrete-time two-type branching process [34].Each cell lives for a fixed time (suppose one unit of time). At the moment of death it givesbirth to two daughter cells. More specifically, for each CSC, it gives birth to two identicalCSC daughter cells with probability α (symmetric division), otherwise (with probability − α ) it gives birth to one CSC daughter cell and one non-CSC daughter cell (asymmetricdivision). For each non-CSC, it divides symmetrically into two non-CSC daughter cellswith probability − β , whereas it divides asymmetrically into one non-CSC daughtercell and one CSC daughter cell with probability β (de-differentiation). The model willreduce to conventional hierarchical model if letting β = 0 , i.e. de-differentiation is notallowed to happen. Hence the model selection with respect to β provides an efficient wayto evaluate the significance of phenotypic plasticity.The statistical inference of branching processes has been studied for a long time [35].The usage of statistical methods strongly depends on the data types available. Normally,the observation of the whole genealogy tree generated from underlying process is quitedifficult to obtain (except in very limited experiments [36]). Instead, only can the numbersor relative frequencies of distinct cell types be recorded at given moment, and compar-atively it is easier to collect data on relative frequencies than absolute numbers of givencell types [37]. Thus developing statistical approaches for proportion data has a widerrange of application. In this work our proposed method is used for the time-series data onrelative frequencies of CSC phenotypic state.Let x A ( t ) be the frequency of CSC state at time t , µ t be the expectation of x A ( t ) , i.e. µ t = E ( x A ( t )) , and σ t be the variance of x A ( t ) , i.e. σ t = Var ( x A ( t )) . Then we canobtain two important recurrence formulas as follows (see A for more details): µ t +1 = 1 + α − β µ t + β , (1) σ t +1 = ( 1 + α − β σ t + 1 N × t +2 { [ α (1 − α ) − β (1 − β )] µ t + β (1 − β ) } , (2)where N is the initial number of the whole population. In next section, we will putforward a Bayesian statistical framework based on the above two formulas. The data we deal with looks like { ( m , v ) , ( m , v ) , · · · , ( m t , v t ) , · · · ( m T , v T ) } , where m t and v t are sample mean and sample variance of n realizations of x A ( t ) respectively.4he Markov property of the model implies that the distribution of data at time t onlydepends on data at time t − . Thus, we can write the joint likelihood of data as follows: L ( α, β | m , v , m , v , · · · , m T , v T ) = f ( m , v ) × f ( m , v | m , v , α, β ) ×· · ·× f ( m T , v T | m T − , v T − , α, β ) . Note that f ( m , v ) is irrelevant to the parameters α and β , the joint likelihood can beexpressed as L ( α, β | m , v , m , v , · · · , m T , v T ) ∝ f ( m , v | m , v , α, β ) ×· · ·× f ( m T , v T | m T − , v T − , α, β ) . According to the asymptotic normality of x A ( t ) provided N ≫ (see Theorem 1 in[37]), the asymptotic distributions of m t and v t can be given as: m t ∼ N ( µ t , σ t n ) ,v t ∼ σ t n − χ ( n − . Since m t , v t are unbiased estimators of µ t , σ t respectively, we can substitute µ t and σ t with the corresponding observed sample mean m t and sample variance v t in the recur-rence formulas (1) and (2). Given the priors of α, β as Beta (1 , , i.e. uniform distribu-tion, we can obtain the following posterior distribution: p ( α, β | m , v , m , v · · · , m T , v T ) = L ( α, β | m , v , m , v , · · · , m T , v T ) × p ( α ) × p ( β ) ∝ T − Y t =0 "(cid:18) α − β (cid:19) v t + 1 N × t +2 { [ α (1 − α ) − β (1 − β )] m t + β (1 − β ) } − n v n − t +1 × exp ( − T − X t =0 n ( m t +1 − α − β m t − β ) ( α − β ) v t + N × t +2 { [ α (1 − α ) − β (1 − β )] m t + β (1 − β ) } +( n − v t +1 ( α − β ) v t + N × t +2 { [ α (1 − α ) − β (1 − β )] m t + β (1 − β ) } ) Note that N ≫ , the above posterior distribution can be revised as the following sim-plified expression: p ( α, β | m , v , m , v · · · , m T , v T ) ∝ T − Y t =0 "(cid:18) α − β (cid:19) v t − n v n − t +1 × exp ( − T − X t =0 n ( m t +1 − α − β m t − β ) ( α − β ) v t + ( n − v t +1 ( α − β ) v t )
5e will show in Section 3.2 that N does has little impact on the simulation results, andthen validate the above simplified posterior distribution.Based on the proposed posterior distribution, α and β are sampled iteratively by Gibbssampling and MH algorithm. We draw several sample chains independently and applyMPSRF [29, 30] to check the convergence of MCMC chains. If the MCMC chains con-verge, we obtain the point estimation and interval estimation as mean value and intervalbetween . and . quantiles of converged posterior samples respectively. In this section we perform some simulations to validate our algorithm and also apply ourmethod to a published data set of SW620 colon cancer cell line [9].
In the first simulation, we generate synthesized data sets to test the performance of ouralgorithm. In particular, we set N = 10 and generate parameters α and β from their pri-ors respectively, drawing mean value m from U nif (0 . , . and standard deviance v from U nif (0 . , . . We then synthesize { m , v } , · · · , { m T , v T } sequentially basedon the conditional distribution function in Section 2.2. In all we generate 100 groupsof parameters and each group consists of 20 simulations. Table 1 demonstrates the esti-mation results including Mean Square Error (MSE) of point estimation, averaged lengthof confidence intervals (AL) and mean proportion of interval estimation coveringthe true value of parameters (CR). It is easy to find that our algorithm is accurate for the α β MSE AL CR MSE AL CR0.0004 0.0414 0.924 0.0003 0.0321 0.917Table 1: Estimation results of Simulation I.synthesized data sets, with small MSE, high coverage rate and narrow confidence interval.
In the second simulation, we generate data sets by following the cellular processes of thetwo-phenotypic model described in Section 2.1 to validate our method. We set N = 10 and randomly generate the true values of parameters α and β from their priors. Giveneach value of ( α, β ) and initial state x A (0) sampled from a Normal distribution N ( µ ′ , σ ′ ) µ ′ ∼ U nif (0 . , . and σ ′ ∼ U nif (0 . , . , we generate n = 5 synthesizedrealizations of x A ( t ) and then calculate the sample mean m t and sample variance v t asalgorithm input. In all we generate 100 groups of parameters ( α, β ) with each consistingof 20 simulations. Table 2 demonstrates the MSE, AL and CR of two methods with andwithout N as known true value. From the table, we can see that our algorithm wellSet α β MSE AL CR MSE AL CRWith N N N is unknown in ouralgorithm. An example of estimation results can be found in B. Our real data application is based on a published data of SW620 colon cancer cell line(see Figure 4A in [9]). There are four groups of data in this experiment. In each group,CSC proportions were measured via FACS (Fluorescence-activated cell sorting), and bothsample mean m t and sample variance v t were recorded successively. The four groups dif-fer in the initial states of relative frequencies: (A) . CSCs and . non-CSCs;(B) 70% CSCs and 30% non-CSCs; (C) . CSCs and . non-CSCs; (D) . CSCs and . non-CSCs. We assume four sets of parameters Θ = { α , β } , Θ = { α , β } , Θ = { α , β } , Θ = { α , β } corresponding to four groups of data and per-form model selection by calculating Deviance information criterion (DIC, [31]). As ageneralization of Akaike information criterion (AIC) and Bayesian information criterion(BIC), DIC is also an estimator of the relative quality of statistical models for given data,and it is particularly useful in Bayesian statistical settings, representing the trade-off be-tween the error of fitting and the complexity of the model. The smaller the DIC is, themore favorable the model is.Our primary concern is whether the hypothesis of reversible phenotypic plasticity issuperior to the hypothesis of cellular hierarchy given the colon cancer data. Note thatthe phenotypic plasticity model will reduce to the hierarchical model by setting β i =0 ( i = 1 , , , ). We would like to compare the DIC values between the full modeland the hierarchical model without de-differentiation. Our result shows that the DICvalue of the full model is − . , which is smaller than that of the hierarchical model7ithout de-differentiation ( − . ). Model selection thus suggests that de-differentiationsignificantly improve the quality of the model for the given data.Furthermore, we are interested in whether some of the four groups of data share thecommon values of the parameters, from which we can see the dependency of the param-eters on the initial states. The DIC values of different models are shown in Table 3 (seeC for the estimated values of the parameters). From the table, we can find that models Θ = Θ = Θ = Θ and Θ = Θ obtain the smallest DIC values, which means thesetwo models are favorable. Due to the limited data length, these two models cannot bedistinguished since the difference between their DIC values is only about 0.15, withoutstatistical significance. It is interesting that the model Θ = Θ = Θ = Θ is selected asthe favorable model. That is, none of the four groups share the common parameters, allthe parameters are sensitive to the initial states of relative frequencies. Note that differentinitial relative frequencies correspond to different states of phenotypic heterogeneity atthe beginning of cell sorting, our result suggests an interesting heterogeneity-dependencyof the phenotypic plasticity model. Model DIC Θ = Θ = Θ = Θ -103.23 Θ = Θ -103.38 Θ = Θ Θ = Θ -90.00 Θ = Θ Θ = Θ -82.37 Θ = Θ Θ = Θ , Θ = Θ Θ = Θ , Θ = Θ Θ = Θ , Θ = Θ Θ = Θ = Θ Θ = Θ = Θ Θ = Θ = Θ -83.58 Θ = Θ = Θ Θ = Θ = Θ = Θ Conclusions
We have presented a Bayesian statistical analysis on a stochastic phenotypic plasticitymodel of cancer cells. Both simulation studies and real data analysis have shown thepower of our method. Compared to the deterministic models in previous studies [14, 13],the stochastic model here equipping with statistical inference makes quantitative model-ing in a more sophisticated way. On one hand, parameter estimation is extended frompoint-level into interval-level. On the other hand, model selection provides a system-atic means to compare different models, and then helps to evaluate different biologicalhypotheses.It should be noted that, the two-phenotypic model is a very simplified model. Byfocusing on our attention to the reversibility between CSCs and non-CSCs, many biolog-ically complex mechanisms are not included in this model. For example, the model isdiscrete-time, ignoring the complicated time distribution of cell cycle [38, 39, 40]. Devel-oping feasible Bayesian framework for more complicated models could be of great valuein future researches.
Acknowledgements
This work is supported by the National Natural Science Foundation of China (Grant Nos.11601453 and 11401499), the Natural Science Foundation of Fujian Province of China(Grant Nos. 2017J05013 and 2015J05016), the Fundamental Research Funds for theCentral Universities in China (Grant No. 20720160004).
A Derivation of Eqs. (1) and (2)
Here we present the mathematical details of how to obtain Eqs. (1) and (2), i.e. therecurrence formulas of the expectation and variance of x A ( t ) .Let n A ( t ) and n B ( t ) are the numbers of CSC and non-CSC states, x A ( t ) and x B ( t ) arethe frequencies of CSC and non-CSC states, N ( t ) is the population size of the whole pop-ulation. It is easy to know that x A ( t ) = 1 − x B ( t ) , x A ( t ) = n A ( t ) /N ( t ) = n A ( t ) / ( n A ( t )+ n B ( t )) , and x B ( t ) = n B ( t ) /N ( t ) . Suppose ξ i = (cid:26) , with probability α , with probability − αη i = (cid:26) , with probability β , with probability − β ,9hen we have n A ( t + 1) = n A ( t ) + n A ( t ) X i =1 ξ i + n B ( t ) X i =1 η i . By taking conditional expectation on both sides we have E ( n A ( t + 1) | n A ( t )) = n A ( t ) + n A ( t ) × α + n B ( t ) × β. Then for the conditional expectation of x A ( t ) we have E ( x A ( t + 1) | x A ( t )) = E ( n A ( t + 1) N ( t + 1) | x A ( t ))= E ( n A ( t + 1)2 N ( t ) | x A ( t ))= 12 1 N ( t ) ( n A ( t ) + n A ( t ) × α + n B ( t ) × β )= 12 ( x A ( t ) + x A ( t ) × α + x B ( t ) × β )= 12 (1 + α ) x A ( t ) + 12 (1 − x A ( t )) β = 1 + α − β x A ( t ) + β . With the law of total expectation we obtain Eq. (1) as follows µ t +1 = E ( x A ( t )) = E ( E ( x A ( t + 1) | x A ( t ))) = 1 + α − β µ t + β . For the conditional variance of x A ( t ) , V ar ( x A ( t + 1) | x A ( t )) = V ar ( n A ( t + 1) N ( t + 1) | x A ( t ))= V ar ( n A ( t ) + P n A ( t ) i =1 ξ i + P n B ( t ) i =1 η i N ( t + 1) | x A ( t ))= V ar ( P n A ( t ) i =1 ξ i + P n B ( t ) i =1 η i N × t +1 | x A ( t ))= 1 N × t +2 ( n A ( t ) α (1 − α ) + n B ( t ) β (1 − β ))= 1 N × t +2 ( n A ( t ) N × t α (1 − α ) + n B ( t ) N × t β (1 − β ))= 1 N × t +2 ( x A ( t ) α (1 − α ) + x B ( t ) β (1 − β )) σ t +1 = V ar ( E ( x A ( t + 1) | x A ( t ))) + E ( V ar ( x A ( t + 1) | x A ( t )))= ( 1 + α − β σ t + 1 N × t +2 { [ α (1 − α ) − β (1 − β )] µ t + β (1 − β ) } . B An example of simulation results
We select one simulation from each of the 20 parameters settings in Simulations I & II, andshow an example of estimation results in Figure 1.
C Point estimation of real data
Here we show point estimation results of real data for different models.Model α β Θ Θ Θ Θ Θ = Θ Θ = Θ Θ = Θ Θ = Θ Θ = Θ Θ = Θ Θ = Θ = Θ Θ = Θ = Θ Θ = Θ = Θ Θ = Θ = Θ Θ = Θ = Θ = Θ Θ i represents the estimation of pa-rameters of i -th group in real data and Θ i = · · · = Θ i k represents estimation of commonparameters within these k groups. 11
20 40 60 80 100 . . . . . . α Index α true valueestimated value2.5% quantile97.5% quantile (a) Simulation I: α . . . . . . β Index β truevalueestimated value2.5% quantile97.5% quantile (b) Simulation I: β . . . . . . α Index α true valueestimated value2.5% quantile97.5% quantile (c) Simulation II with N : α . . . . . . β Index β truevalueestimated value2.5% quantile97.5% quantile (d) Simulation II with N : β . . . . . . α Index α true valueestimated value2.5% quantile97.5% quantile (e) Simulation II without N : α . . . . . . β Index β truevalueestimated value2.5% quantile97.5% quantile (f) Simulation II without N : β Figure 1: An example of estimation results in Simulation I and Simulation II. Black solidlines represent true values of parameters and red lines represent point estimation. Andtwo dashed lines demonstrate interval estimations. For Simulation II, both the estimationresults with and without N are shown 12 eferenceseferences