A stochastic and dynamical view of pluripotency in mouse embryonic stem cells
Yen Ting Lin, Peter G. Hufton, Esther J. Lee, Davit A. Potoyan
AA stochastic and dynamical view of pluripotency in mouse embryonic stem cells
Yen Ting Lin,
1, 2
Peter G. Hufton, Esther J. Lee, and Davit A. Potoyan T-6 and Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, NM 87545, USA School of Physics and Astronomy, The University of Manchester, Manchester, M13 9PL, UK Department of Bioengineering, Rice University, Houston, TX 77005, USA Department of Chemistry, Iowa State University, Ames, IA 50011, USA (Dated: November 12, 2018)Pluripotent embryonic stem cells are of paramount importance for biomedical research thanks totheir innate ability for self-renewal and differentiation into all major cell lines. The fateful decisionto exit or remain in the pluripotent state is regulated by complex genetic regulatory network.Latest advances in transcriptomics have made it possible to infer basic topologies of pluripotencygoverning networks. The inferred network topologies, however, only encode boolean informationwhile remaining silent about the roles of dynamics and molecular noise in gene expression. Thesefeatures are widely considered essential for functional decision making. Herein we developed aframework for extending the boolean level networks into models accounting for individual geneticswitches and promoter architecture which allows mechanistic interrogation of the roles of molecularnoise, external signaling, and network topology. We demonstrate the pluripotent state of the networkto be a broad attractor which is robust to variations of gene expression. Dynamics of exiting thepluripotent state, on the other hand, is significantly influenced by the molecular noise originatingfrom genetic switching events which makes cells more responsive to extracellular signals. Lastly weshow that steady state probability landscape can be significantly remodeled by global gene switchingrates alone which can be taken as a proxy for how global epigenetic modifications exert control overstability of pluripotent states.
I. INTRODUCTION
Embryonic stem cells derived from mammalian blastocyst are pluripotent: they show an indefinite capacity forself-renewal and the ability to differentiate into every cell type constituting an adult organism [1–3]. The developmentof healthy tissues hinges on the ability of these pluripotent stem cells to make critical decisions determining whenand into which kind of cells to differentiate in response to their environment. Fates of embryonic cells are thereforedecided through sophisticated biological computations orchestrated by a vast regulatory network consisting of genetic,epigenetic and signaling layers [3].The dynamics of genetic expressions take place in molecular environments and are subject to intrinsic noise due tothe discrete nature of the molecular copy numbers and to extrinsic noise from extracellular environment [4]. Thus,while at the level of the organism development is often predictable with a well-defined order of events, at the level ofsingle cells fate determination is fundamentally stochastic [5, 6]. An outstanding question involves understanding towhat extent the inherent molecular stochasticity in embryonic cells is suppressed or exploited for functional purposes.Studies probing transcription in single cells have uncovered high variabilities of gene expression in embryonic stem cellpopulations [7–9]. Several hypotheses about the functional roles of stochasticity in embryonic stem cells have beenput forward. Observations of dynamic and heterogeneous expression patterns of core transcription factors such asNanog, Oct4 and Sox2 have promoted the view that stochastic excursions of these factors are governing the stabilityof pluripotent states [10, 11]. A different hypothesis claims transcriptional noise to be advantageous by facilitatingthe exploration of the space of a gene network such that, at any instant, a subpopulation of cells is optimally primedto be responsive to differentiation signals [12]. The heterogeneity of populations of pluripotent cells has also raisedsome concerns that pluripotency is ill-defined on a single-cell level [12] and instead should be viewed as a statisticalproperty of the macroscopic state emerging at the level of an ensemble of cells. A comprehensive physical picture ofpluripotency at the single-cell level therefore still remains unclear.In this regard, the roles of modeling and computational approaches are seen as especially important for bridging thegap between our understanding of molecular dynamics of regulatory networks and phenotypic outcomes. The rapidaccumulation of data of gene expression through single-cell experiments has stimulated the techniques of statisticalinference and computational systems biology for mapping embryonic stem cell networks [14–16]. In vitro studiesof mouse embryonic stem cells (mESC) in different culture conditions, in particular, have become an ideal modelsystem for exploring mechanistic issues surrounding pluripotency and lineage commitment in mammals [16]. In a tourde force study of mESC by Dunn et al. [13], regulatory relationships between transcription factors were uncoveredthrough analysis of pairwise correlations in gene expression. Using known experimental data as constraints, a networktopology with a minimal set of genes was derived with a high degree of predictive power with respect to perturbationsof the network. a r X i v : . [ q - b i o . M N ] O c t LIF CH PD Stat3
Tcf3
MEKERK
Klf4
Gbx2
Esrrb
Klf2 Oct4 Sox2Sall4
Tfcp2l1
Nanog
External input
Regulated by both types of TF(Tfcp2l1, Esrrb, Oct4, Nanog) ⍺ m ⍺ m ⍺ m ⍺ m ⍺ max ˜ ˜ ˜ ˜ ˜ ⍺ max ⍺ max ˜ ˜ no bound TFone bound repressortwo bound repressorsno bound TFone bound repressorone bound activatorTwo bound activatorsTwo bound repressors one bound activator andone bound repressor ⍺ max ˜ no bound TFone bound activatortwo bound activatorsRegulated by only activators(Stat3, Gbx2, Klf2, Klf4, Sall4, Sox2)Regulated by only repressors(Tcf3, MEKERK) Con fi guration Production rate FIG. 1. The left panel shows a schematic diagram of the network topology, reproduced from Dunn et al. [13]. Each nodecorresponds to a given gene and their placement from left to right is chosen to indicate a general trend of downstreamnessfrom the inputs. In our mechanistic model, each gene produces a unique transcription factor at a given rate, depending on thebinding state of its promoter sites. These transcription factors go on to bind and activate (black arrow) or repress (red bar)other genes. The three nodes on left correspond to extra-cellular signals, which are either absent or present. The right panelshows our assumed molecular logic of transcriptional regulation. Shown are the outcomes of combinatorial binding of activatorsand repressors to promoter sites. Depending on the configuration of the promoter site, transcription factors are produced withrates 0, ˜ α m , and ˜ α max , modeling the effects of cooperative binding. Network topologies, however, remain silent about the roles of molecular noise and dynamics in stem cell differ-entiation governed by stochastic biochemical reactions. Furthermore, in order to validate that the inferred networkreflects true microscopic reality of cell and is not a result of overfitting, one has to ultimately test the results usingmass-action-based kinetics which integrate relevant molecular factors. The key challenge lies in finding the adequateresolution for the network which is able to be predictive and does not pose insurmountable computational burden.In the present work, we outline a framework for extending boolean resolution networks into stochastic and dy-namical models—with microscopic resolution of promoter architecture and individual gene switching events. Thecomputational scheme utilizes static boolean information about the network topology and uses novel analytical modelreduction to increase the computational efficiency, allowing for extensive searches in the space of microscopic reac-tion rates. This framework is successfully applied to the network topology inferred by Dunn et al. [13] in order tobuild a mass action based stochastic dynamic model which is capable of describing both the discrete states of all thegenes and the populations of transcription factors. Starting from minimal assumptions about the rates of variousreactions, we find a few distinct gene swithcing regimes where a remarkable agreement with the experimental geneexpression profiles is achieved for all combinations of experimentally controlled external signals. We show that geneexpression levels in complex regulatory networks are not a unique function of gene switching rates which cautionsagainst over-interpreting boolean level networks and suggests strategies of inference which utilize higher momentsin distribution of transcription factors. Using single cell experiments which have probed expression of pluripotencyfactors [7, 8] we are able to argue that gene switching in pluripotent states happens primarily on the intermediatescale relative to the reactions of creation and degradation (dilution). This regime better agrees with the diverse setof experiments available and provides explanation for the multimodality in distributions of transcription factors andburst like expression dynamics for some genes.In the second half of the paper, armed with a predictive and physically motivated model of pluripotency network,we explore the dynamics of lineage commitment driven by withdrawal of various well documented signals (LIF, 2i)for maintaining the na¨ıve state of pluripotency. We find a number of non-trivial consequences of molecular noise andgene switching dynamics. In particular we show that intermediate gene switching regime generates higher sensitivityfor the network when responding to external signals.
II. FRAMEWORK FOR DERIVING MICROSCOPIC RESOLUTION NETWORKS FROMEXPERIMENTALLY INFERRED NETWORK TOPOLOGIESA. Molecular logic of transcriptional regulation in ESC
Here we outline the basic steps of constructing the microscopic genetic regulatory networks from the Booleantopology of genes inferred from experiments [13]. The network topology (Fig. 1) contains static information about thetypes of interactions between pairs of genes. The interactions are classified as being either repressing or activating.To study the dynamics of complex genetic networks, one has to extend the Boolean level description to account forthe molecular logic of gene regulation. This molecular logic specifies the precise relation between the binding oftranscription factors to genomic sites and the regulatory outcome in terms of gene activation, repression, etc. In thecase where the same sites can be bound to different transcription factors, the combinatorial nature of regulation cangive rise to ambiguity in molecular logic. Such ambiguities can be resolved either by directly inferring regulatory logicor by simulating different combinatorial possibilities until sufficient agreement with experiments is reached. Oncemolecular logic is defined the topologic level description can be extended into a set of biochemical reactions.The reactions in the present model include (i) the binding/unbinding of transcription factors to promoter sitesof genes and (ii) the creation/degradation of populations of transcription factors. We adopt the network of mESCsinferred by Dunn et al. [13]. There are N = 12 genes in the network and we assign each gene G i to a discrete variablewhich describes the configuration of its promoter sites. The values of G i determine if the transcriptional activity isin the activated, repressed or neutral state. The n promoter sites of a particular gene allow binding of at most n transcription factors. Since a large majority of eukaryotic transcription factors in this network bind as dimers [17] weset n = 2 for all genes [18]. The binding reactions areFirst TF binding: G i + P j × ˜ k on N Pj −−−−−−− (cid:42)(cid:41) −−−−−−− ˜ k off G i P j , Second TF binding: G i P j + P k ˜ k on N Pk −−−−− (cid:42)(cid:41) −−−−− ˜ k off G i P jk . (1)Notice that we have adopted the uncooperative binding mechanism which assumes the transcription factors bind toany of the unbound promoter sites independently with a constant rate ˜ k on . Other cooperative binding mechanismscan be easily built into this framework. Each gene codes for its own transcription factor P i where indices i = 1 , ..., N label both genes and the corresponding transcription factors. These promoters can act as repressors or activators ofother genes: this information is contained in the experimentally-inferred topology. The molecular logic describing howthe combinatorial binding of activators and repressors regulate each gene, however, is an assumption in the model.When the rate of transcription is non-zero, genes produce copies of their respective transcription factors. We usea one step model of transcription [19] whereby transcription factors are produced via a unimolecular step. Thisapproximation coarse grains several sequential intermediate reactions, such as mRNA production, into one effectivestep [20, 21]. This approximation has become popular in models of protein production [22–24]. All of the transcriptionfactors are assumed to have finite lifetimes set by the rate of degradation ensuring the existence of stable steady stateswith finite number of molecules. The reactions areTF production: G i P jk ˜ α jk −−→ G i P jk + P i , TF degredation: P i ˜ γ −→ ∅ . (2)We note that one can adopt a view with higher resolution of the network depending on the available experimental dataand the type of question posed. For instance one may consider explicitly including the steps of cell cycle regulation,different epigenetic states, binding of non-coding RNAs etc. For simplicity and illustrative purpose, we consider themost simplified dynamical model which describes only the promoter configurations and the population dynamics ofthe transcription factors—this model aims to use the optimal resolution for capturing trends in gene expression whileremaining feasible for efficient stochastic simulations capturing the variability in an ensemble of cells. After convertingthe network topology into a higher resolution network of biochemical reactions, our next goal is to exhaustively samplea vast space of parameter space to identify the optimal parameter regime with which the model best reproduces allof the experimental constraints. FIG. 2. PDMP stochastic simulations identify three genetic switching regimes consistent with experimental data. Whenswitching is slow, intermediate, or fast we find certain parameters which closely match the experimental results obtained byDunn et al. [13]. The consistency of our model with experimental results is measured using a Hamming distance—a measurewhere one counts the number of discrepancies between the binary expression of each TF for both cases. (A) Shown are theidentified regions in parameter regimes that minimize Hamming distance. There are three free parameters: the binding rate k on ,unbinding rate k off , and basal transcription rate α m . For slow switching, the parameters are k on = 3 . k off = 0 . α m = 0 . k on = 16, k off = 1 . α m = 0 .
01; for fast switching, k on = 102, k off = 10, α m = 0 . B. Multi-scale simulation of complex genetic networks
Biochemical reactions in gene networks are of fundamentally probabilistic nature. The most rigorous way tosimulate a network of reactions is by numerically solving the master equation which accounts for all possible statesof the network down to the level of single molecules [4, 19]. However in high dimensions, i.e., when the number ofspecies is large, this approach is not computationally efficient. Instead, kinetic Monte Carlo algorithms are the moststraightforward way to generate sample paths of the stochastic process. Fully individual-based models, however, stillsuffer from a steep scaling of computational time with the number of components. This fact renders them inefficientfor simulating large gene networks, especially when it comes to scanning or exploring the parameter space.A wide range of approximate schemes have been employed to simulate large scale gene regulatory networks. Mostconventional approximations so far have been the size-expansion methods which are known for being problematic whenthe molecular noise induced by genetic switching becomes non-negligible. On the other hand, for embryonic stem cellnetworks it is essential to account for the stochastic nature of genetic switches which give rise to multiple attractorstates corresponding to various phenotypes. A network with N genes can theoretically have ∼ N attractors. Even ifpopulations of all the other species are present in large quantities, the stochastic fluctuations caused by the geneticswitches (due to stochastic binding-unbinding events of the transcriptional factors) between ON and OFF states cannotbe ignored, unless the switching is operating in the extremely fast limit compared to any other reactions, known as the“adiabatic regime” [25–27]. In the other cases—the non-adiabatic regime—gene switching can completely dominatethe dynamics in the network [25, 28]. Eukaryotic gene regulatory networks are often found in the intermediate regimewhere gene switching events are dynamically interwoven with the rest of the reactions in the network and cannot beignored [5, 29]. Single-cell studies of mESC in particular have shown bursty behavior in gene expression with suddenjumps in levels of proteins resulting in multi-modal distributions of core transcription factors [7].Many of the early computational models of embryonic stem cells have focused on small fragments of pluripotencynetwork, typically involving bistable switches [10, 30]. These early pioneering studies have yielded many insightson stochastic decision making in regards to fate determination and self-renewal. A few studied have looked atlarger portions of the regulatory network [31–34] but at the cost of making near-adiabatic switching approximationsthereby ignoring explicit stochastic treatment of genetic switching dynamics. Lastly, with very few exceptions, thekinetic parameters in those models were not throughly explored or data-informed and had to be picked relying onphysical intuition alone. All of these limitations have been completely overcome in the present work. A series ofrecent studies [22–25, 35, 36] on gene expression dynamics have developed a novel computational framework utilizingpiecewise-deterministic Markov process (PDMP) [37] as approximations to the fully individual-based model. Thisapproach treats genetic switching events exactly, while assuming noise due to the finite nature of populations oftranscription factors to be small in comparison. As will soon be seen, the assumptions underlying the PDMP approachturn out to be sound as we go on to obtain a nearly perfect quantitative agreement with full blown kinetic MonteCarlo schemes even for the case of the complex networks of ESC operating in the intermediate gene switching regime.The PDMP simulations carried out in the present study show nearly O (10 ) fold faster generating stochastictrajectories compared to conventional individual-based kinetic Monte Carlo techniques (the algorithm for generatingPDMP sample paths is provided in the Supplementary Information). This rigorous and rapid sampling of geneswitching events has not only allowed us to investigate the stochastic dynamics of the regulatory network at a longertimescale compared to conventional kinetic Monte Carlo methods, but also enabled us to explore a vast parameter spaceefficiently. We have used the obtained information to derive microscopic resolution models of ESC. The mathematicaldetails of the model described in greater detail can be found in the SI. III. RESULTSA. In the pluripotent state the mean levels of gene expression are a hardwired feature of networkarchitecture whereas heterogeneity is shaped significantly by stochastic dynamics of genetic switches.
In this section we first identify certain parameters which provide results which are consistent experiment. Followingthis the model becomes predictive about dynamical features which are not contained in the topology.Having reduced the individual-based model to a computationally tractable PDMP, we are at a point where we cantune the model parameters to match experimental data. The experimental data used for constraining the rate coeffi-cients are the binarized gene expression levels of transcription factors under well-defined culture conditions consistingof different combinations of leukemia inhibitory factor (LIF), glycogen synthase kinase 3 (CH), and mitogen-activatedprotein kinase (PD). Identical culture conditions were used by Dunn et al. [13] to infer the original Boolean topologyare LIF, 2i (same as PD+CH), LIF+2i, LIF+PD, LIF+CH. We employ the Hamming distance (the number of discrep-ancies between the simulated and experimental profiles) as a cost-function for optimization. The Hamming distanceis minimized through multiple rounds of simulations where we vary the three free parameters: the rates of promoterbinding k on and dissociation k off , and the basal transcription rate α m . Through this procedure we find that, in certainparameter regimes, our assumed molecular logic closely reproduces the gene expression profiles from experimentsfor different combinations of external signals. The dimeric binding—modeled in our framework as two binding sites( n = 2)—of TF to promoter sites in particular leads to a more sensitive switching and emerges as an important featureof the network which we show to be advantageous for reproducing experimental data [38]. We find localized parametersets in three distinct regimes of gene switching: slow, intermediate and fast; all of which successfully reproduce theexperimental gene expression patterns (Fig. 2) corresponding to pluripotent and lineage committed cells. The factthat the rate parameters of the network occupy finite regions and cover different regimes suggests that distinct geneexpression profiles can tolerate fluctuations in reaction rates. Such rate fluctuations, reflecting the effect of extrinsicnoise, are inevitable in dynamic cellular environments of embryonic cells which experience frequent epigenetic andextracellular perturbations [16, 39, 40]. In a way changes in gene switching rates can be seen as a proxy for how globalepigenetic changes govern the rates of transcription factor binding to target genomic regions.From the methodology point of view the absence of a unique regime of rates implies the following: inferringnetworks using only mean levels of gene expression (as is done for Boolean networks) may lead to the loss of valuableinformation contained in higher moments of distribution. Thus new approaches of inference need to be developed inorder to account for broad distributions of transcription factors. For this reason, we look beyond comparisons of meanexpression levels and turn to comparing stationary distributions of transcription factors observed in experiments withthe computationally generated distributions in three chosen parameter regimes identified in Fig. 2. The pluripotentstate of mESCs has been the subject of intense investigations by nucleic acid-based single-cell techniques such as RNA-seq, sm-FISH, qPCR and there is now extensive data on the steady state distributions of RNA and transcriptionfactors maintained under pluripotency-favoring culture conditions [7, 8, 41]. These experiments have revealed aheterogeneous nature of gene expression with many core pluripotency factors, such as Nanog, having long tailed orbimodal distributions. Additionally, the single-cell stochastic trajectories of TFs have shown sharp, bursty transitions FIG. 3. Gene expression profiles of pluripotency factors predicted by PDMP simulations. Each column corresponds to differentexternal inputs, and each row corresponds to regimes of slow, intermediate and fast gene switching. More than 10 samplepaths were used for generating each condition. implying infrequent genetic switching events [7]. These experimental observations are more consistent with theintermediate regime of genetic switching as seen from Fig. 3, as in the case of slow switching nearly all of thetranscription factors express bimodality and in the case of fast switching the expressions of all the factors are narrowand unimodal.Although the PDMP approach accurately captures the effects of genetic switching, it assumes demographic noisearising from finite populations to be negligible. To test the validity of this assumption and assess the contribution ofdifferent sources of noise in establishing the steady state distribution of the pluripotent state, we carry out individual-based simulations for the intermediate switching regime of the network, Fig 4. In the individual-based model allreactions are treated stochastically thereby accounting for all of the sources of noise in the system. The resulting geneexpression profiles follow closely those obtained by PDMP simulations, showing that the noise arising from stochasticswitching events of promoter configuration accounts for the significant part of overall variability in the network.Trajectories of individual transcription factors show that indeed most of the variance in the the molecular distributionsare generated by genetic switching events which appear as abrupt stochastic jumps. Consistent with experiments,we also find that under LIF gene expression is more heterogeneous than under 2i, but also that the overall levelsof expression are higher [7, 8]. In all three regimes of genetic switching supporting pluripotency (LIF+2i,LIF+PD,LIF+CH, LIF and 2i), core transcriptions factors such as Nanog, Oct4, Sox2 are highly expressed. The same factorsare also repressed in conditions favoring differentiation (CH, PD and none) irrespective of gene switching regime. Thisshows that the pluripotent and differentiated states, as determined by the pattern of gene expression, are hardwiredin the architecture of the genetic network independent of genetic switching rates. Nevertheless, as we will show in thenext section, the routes and dynamics of lineage commitment from pluripotent states strongly depend on the level ofmolecular noise generated in different gene switching regimes. B. Dynamics of lineage commitment is driven by gene switching induced bifurcations of the underlyinglandscape of gene expression.
We next ask how the steady state gene expression patterns displayed by the gene network respond to extracellularperturbations in the form of initiation or termination of pluripotency signals. Both dual inhibitor 2i (PD+CH) and
FIG. 4. Gene expression profiles of pluripotency factors predicted by individual-based simulations. The intermediate switchingregime is chosen to be presented as it is the regime which best captures the experimentally measured distributions [7, 8].(A) Shown are 100 representative trajectories and full distributions of select few transcription factors under three differentconditions generated by individual based simulations. (B) Gene expression profile showing the near quantitative agreementwith results of PDMP simulations shown on Fig. 3 (C) Comparison of individual based model with experimental data fromDunn et al [13]
Leukemia factor LIF based signaling have been shown to provide a stable environment for maintaining pluripotencyof stem cells in vitro [3, 13]. Conversely, withdrawal of either LIF or 2i leads to irreversible lineage commitmentafter a 24-hour period. Despite a similar ability to guard pluripotent cells against lineage commitment, these factorsdeploy different regulatory mechanisms reflected in distinct distributions of pluripotency factors. As a result, stem celldifferentiation by withdrawal of different signals proceeds via different routes. To gain a mechanistic understandingof how the interplay of signaling, molecular noise and network architecture give rise to the steady state expressionprofiles, we study the dynamics of transitioning between the pluripotent and lineage committed states induced byrapid initiation and withdrawal of signaling conditions (LIF, CH, PD).The temporal evolution of distributions of the TFs exiting (LIF/2i withdrawal) and entering (LIF/2i immersion)pluripotent states reveals rich dynamical signatures of these transitions (Fig 5). To reveal the role of stochasticity inthese transitions, we compare the intermediate regime—which is dominated by genetic switching—to the fast regime—in which transitions are largely governed by the network topology and the fluctuation of promoter configuration isalmost-completely ignored. The irreversible nature of transitions manifests clearly in different routes exiting andentering the pluripotent state (transitions to and from the None state in Fig 5).In the intermediate regime of genetic switching rates, expression noise greatly facilitates transitions out of a pluripo-tent state by making the network more responsive to changes in environmental signaling. In contrast, in the fastswitching regime, upon withdrawal of pluripotency signals the downstream regulation happens on a much slowertime-scale with some factors remaining virtually unresponsive to changes of signaling. This signaling enhancementin the intermediate regime reveals the importance of molecular noise in making pluripotent states more sensitive to
FIG. 5. The dynamical behavior of the distributions of transcription factor densities for the intermediate and fast switchingregimes. At time t = 0, the external inputs are changed. The plots show the evolution in probability density for a largeensemble (10 ) of PDMP sample paths. environmental conditions. There are qualitatively different patterns of re-entrance into pluripotent states upon LIF vs2i addition, with LIF being much more efficient at reversing pluripotency compared to the 2i. The different potentialof signaling culture for pluripotency reversal has been established in experiments [3] which have shown that in thelater stages of commitment only the LIF is able to reverse lineage-primed cells back to their naive pluripotent states.The exit and re-entrance from pluripotency upon withdrawal/addition of LIF shows complex signatures of hysteresisand bifurcations. This suggests that there can be multiple pathways of entering or exiting pluripotency. The tran-sition times for all signaling-induced changes of the steady states of the network are visualized on a kinetic diagram(Fig 6). The kinetic diagram shows an underlying structure to these transitions where conversion among pluripotentstates takes place with lower “activation barriers” compared to transitions accompanying loss of pluripotency. In theintermediate switching regime there is a clear time-scale separation between transitions which keep cells in pluripotentstates and transitions out of pluripotent states. One may argue that such a time-scale separation between signalsinducing differentiation and pluripotency allows embryonic cells to execute developmental decisions more faithfully.In the fast switching regime this clear time-scale separation is partially lost where only the loss of all three signals isseparated from the rest of the transitions.Detailed analysis of individual distributions and trajectories of transcription factors can be very informative dueto their high information content. It is, however, not immediately clear how changes in the expression of individualgenes contribute to global changes corresponding to different phenotypic transitions. To reveal such global changes,we project stochastic trajectories of all transcription factors onto the first two eigenvectors obtained by principalcomponent analysis of the reference pluripotent steady state (LIF+2i). Most of the variance of transcription factorsis well captured by the few principal components. The high-dimensional steady state of the cellular network can thusbe conveniently projected onto a 2-dimensional subspace, allowing us to visualize the attractor states of the networkas probability landscapes π ( P C , P C ) which are often masked by heterogeneous distributions.Comparing these probability landscapes with different gene switching regimes reveals the distinct roles played bygene switching-induced molecular noise and the deterministic network topology in guiding the transition out of thepluripotent states (Fig 7). The intermediate gene switching regime, once again, appears to be the more viable regimeunderlying pluripotent states since the probability landscape shows up as a broad attractor with interconnected states.Going towards the limit of slow switching results in the fragmentation of the landscape into states separated by highbarriers. This gene switching-induced remodeling of attractors shows the potential for regulation via global epigeneticchanges which are purported to act via silencing or activating entire sets of genes at once. Thus one may view geneswitching rates as a proxy for genome wide acetylation/methylation patterns which can dramatically alter the accessof transcription factors to key target genomic sites. The sequential removal of pluripotency inducing signals reveals FastswitchingIntermediateswitching
Intermediate switching: Stochastic regime Fast switching: Near-deterministic regime
FIG. 6. Calculations of transition times between the stationary distributions of different external conditions. A larger, darkerarrow indicates that a given transition takes a longer time to converge to its stationary state. This time-scale is measured bysimulating a large ensemble (10 ) of PDMP sample paths to provide a simulated probability density, and finding the Jensen–Shannon divergence [42, 43] between the instantaneous distribution of each TF and its final stationary distribution. The timefor the each divergence to fall below a threshold (:= 0 .
3) is recorded, and we choose the largest of these as a quantification ofthe transition time. a consistent change in the size of the attractor towards occupying smaller regions on the landscape. This argues forthe physical state of network corresponding to pluripotent states to be the one with maximal variance of regulatorytranscription factors where lineage commitment is accompanied by their gradual constraining and repression. Asimilar idea which views pluripotency as a macrostate emerging from an ensemble of cells which try to maximizesthe information entropy with respect to regulatory transcription factors has been postulated before [12, 44]. Theanalysis of steady state stochastic dynamics of pluripotency network in this work appears in agreement with hisview. Furthermore, we are able to reveal the microscopic origin of this entropic paradigm. By analyzing pairwisecorrelation among different transcription factors we find that signals like LIF/2i create greater independence betweenthe expression of core transcription factors leading them to explore larger range of values. Hence, removal of thesesignals leads to more constrained and inter-dependent patterns of gene expression for the same transcription factorswhich greatly diminishes overall variance. (Fig S1).
IV. DISCUSSION
Recent studies of ESC have increasingly emphasized systems level perspective on pluripotency and fate determina-tion as outcomes of complex biological computations orchestrated by non-random networks of genes [45, 46]. Rapidgrowth of gene expression data collected from controllable in vitro experiments on mESCs has made it possible to makereliable inferences of gene regulatory networks which govern the state of pluripotency [13]. These inferences show ahighly interconnected nature of regulatory networks where signaling molecules, genomic binding, epigenetics and bio-chemical feedback act in a concerted manner in balancing pluripotency and decision making. Single-cell experimentshave also revealed significant dynamic heterogeneity of gene expression in the population of cells [7, 8], suggestingthe important roles played by molecular noise and non-equilibrium processes. Schematic network topology modelsare no longer sufficient for rationalizing the level of detailed quantitative information obtained in single-cell studies.In the present work we have developed a multi-scale computational scheme for converting experimentally-inferredboolean topologies into quantitative and predictive models of networks with microscopic resolution of gene expressiondynamics. The employed computational model is based on previously proposed hybrid stochastic approaches [22–25]in which the switching dynamics of individual genes are considered exactly while the rest of the biochemical reactionsare approximated as deterministic. This hybrid stochastic approach is approximately a thousand fold faster than con-ventional kinetic Monte Carlo methods. This allows us to simulate large scale gene regulatory networks of ESC underdifferent culture conditions and gene switching regimes. To infer the parameters in the model from the experimentaldata, we use hybrid simulations to exhaustively sample the space of rates and identify the parameter regime in whichthe model prediction best matches experimental data. The approximation using the hybrid scheme is validated bycarrying out fully stochastic simulations of the network for the identified parameter sets. This agreement also shows0 F as t s w i t c h i ng I n t e r m e d i a t e s w i t c h i ng S l o w s w i t c h i ng LIF+2i LIF 2i None ⇡ ( P C , P C ) FIG. 7. Mapping the cellular attractors of genetic network under different switching and signaling conditions by projectingPDMP simulated gene expression onto first two principal components. The reference state for principal components was chosento be the LIF+2i/intermediate switching. that the switching events of genes—due to stochastic TFs binding to the promoter sites—to be a significant source ofvariance in the ESC networks. Furthermore we see the changes in gene switching rates as a proxy for global epigeneticmodifications that can alter the rates of access of transcription factors to sites buried under chromatin structures.Thus the significant remodeling of steady state landscape that we see by varying the global gene switching rates givesus a glimpse of potentially powerful leverage that epigenetic changes of chromatin have over stability of pluripotentstates.We find that the intermediate regime, in which gene switching rate is comparable to the other reaction rates inthe network, is most consistent with single-cell measurements [7–9]. In this regime transcription factors show burstydynamics which lead to heterogeneous distributions with some showing long tailed and bimodal features. We findsignaling by LIF and 2i to be a major driving force maintaining the stability of pluripotent states. Withdrawal ofeither LIF/2i initiates lineage commitment via a robust pattern of reduced expression of Nanog/Oct4/Sox2 triad.To characterize the dynamics of lineage commitment, we compute transition times from pluripotent to differentiatedsteady states. We find that higher levels of molecular noise generated by slower gene switching make network moreresponsive to changes in signaling condition. Next, by carrying out principal component analysis on ensembles ofgene expression profiles, we find a much simpler description of pluripotency and lineage commitment in terms ofeffective probability landscapes. As the signals safeguarding pluripotency are removed, these landscapes reveal agradual narrowing of the steady state attractor explored by the network. Thus, we see a hierarchical organizationof differentiation landscapes where pluripotent states pose the largest attractor which is maintained through theextracellular signals and the molecular noise of gene switching.At last we believe the computational framework developed here should also be useful in organize and interpretingexperimental data of other complex gene regulatory networks within a coherent and unified computational modelsthereby making it easier to conceive and vet physical hypothesis in a systematic way. Given the rapid rise of informationfrom high throughput single-cell nucleic acid based techniques (RNA-seq, RNA-FISH, qPCR, etc), we expect suchmicroscopic resolution models to play important roles for bridging the systems-level behavior of genetic networks withthe underlying molecular-level processes of binding and regulation at the genomic sites. [1] M. Evans, Nat Rev Mol Cell Biol , 680 (2011).[2] C. E. Murry and G. Keller, Cell , 661 (2008). [3] G. Martello and A. Smith, Ann Rev Cell Dev Biol , 647 (2014).[4] N. G. Van Kampen, Stochastic processes in physics and chemistry , Vol. 1 (Elsevier, 1992).[5] O. Symmons and A. Raj, Mol cell , 788 (2016).[6] G. Bal´azsi, A. van Oudenaarden, and J. J. Collins, Cell , 910 (2011).[7] Z. S. Singer, J. Yong, J. Tischler, J. A. Hackett, A. Altinok, M. A. Surani, L. Cai, and M. B. Elowitz, Molecular cell ,319 (2014).[8] R. M. Kumar, P. Cahan, A. K. Shalek, R. Satija, A. J. DaleyKeyser, H. Li, J. Zhang, K. Pardee, D. Gennert, J. J.Trombetta, et al. , Nature , 56 (2014).[9] M. A. Canham, A. A. Sharov, M. S. Ko, and J. M. Brickman, PLoS Biol , e1000379 (2010).[10] T. Kalmar, C. Lim, P. Hayward, S. Mu˜noz-Descalzo, J. Nichols, J. Garcia-Ojalvo, and A. M. Arias, PLoS Biol , e1000149(2009).[11] S. Masui, Y. Nakatake, Y. Toyooka, D. Shimosato, R. Yagi, K. Takahashi, H. Okochi, A. Okuda, R. Matoba, A. A. Sharov, et al. , Nature cell biology , 625 (2007).[12] B. D. MacArthur and I. R. Lemischka, Cell , 484 (2013).[13] S.-J. Dunn, G. Martello, B. Yordanov, S. Emmott, and A. Smith, Science , 1156 (2014).[14] J. Feigelman, S. Ganscha, S. Hastreiter, M. Schwarzfischer, A. Filipczyk, T. Schroeder, F. J. Theis, C. Marr, andM. Claassen, Cell Sys , 480 (2016).[15] H. Xu, Y.-S. Ang, A. Sevilla, I. R. Lemischka, and A. Ma’ayan, PLoS Comput Biol , e1003777 (2014).[16] S. Semrau and A. van Oudenaarden, Ann Rev Cell Dev Biol , 317 (2015).[17] L. Ferraris, A. P. Stewart, J. Kang, A. M. DeSimone, M. Gemberling, D. Tantin, and W. G. Fairbrother, Genome Res , 1055 (2011).[18] We have also tested n = 3 , , n = 2.[19] P. C. Bressloff, Stochastic processes in cell biology , Vol. 41 (Springer, 2014).[20] T. B. Kepler and T. C. Elston, Biophysical journal , 3116 (2001).[21] J. Hornos, D. Schultz, G. Innocentini, J. Wang, A. Walczak, J. Onuchic, and P. Wolynes, Phys Rev E , 051907 (2005).[22] Y. T. Lin and T. Galla, Journal of The Royal Society Interface , 20150772 (2016).[23] Y. T. Lin and C. R. Doering, Physical Review E , 022409 (2016).[24] P. G. Hufton, Y. T. Lin, T. Galla, and A. J. McKane, Physical Review E , 052119 (2016).[25] D. A. Potoyan and P. G. Wolynes, The Journal of chemical physics , 195101 (2015).[26] A. M. Walczak, J. N. Onuchic, and P. G. Wolynes, Proceedings of the National Academy of Sciences of the United Statesof America , 18926 (2005).[27] M. Sasai and P. G. Wolynes, Proceedings of the National Academy of Sciences , 2374 (2003).[28] H. Feng, B. Han, and J. Wang, The Journal of Physical Chemistry B , 1254 (2010).[29] T. L. Lenstra, J. Rodriguez, H. Chen, and D. R. Larson, Ann Rev Biophys (2016).[30] V. Chickarmane, C. Troein, U. A. Nuber, H. M. Sauro, and C. Peterson, PLoS Comput Biol , e123 (2006).[31] J. Wang, K. Zhang, L. Xu, and E. Wang, Proceedings of the National Academy of Sciences , 8257 (2011).[32] C. Li and J. Wang, Journal of The Royal Society Interface , 20130787 (2013).[33] B. Zhang and P. G. Wolynes, Proceedings of the National Academy of Sciences , 10185 (2014).[34] H. Feng and J. Wang, Scientific reports , 550 (2012).[35] S. Zeiser, U. Franz, O. Wittich, and V. Liebscher, IET systems biology , 113 (2008).[36] S. Zeiser, U. Franz, and V. Liebscher, Journal of Mathematical Biology , 207 (2010).[37] M. H. Davis, J. Roy. Statist. Soc. Ser. B , 353 (1984).[38] Our analysis showing n ≥ n = 1.[39] M.-E. Torres-Padilla and I. Chambers, Development , 2173 (2014).[40] H. Ochiai, T. Sugawara, T. Sakuma, and T. Yamamoto, Scientific reports , 7125 (2014).[41] A. M. Klein, L. Mazutis, I. Akartuna, N. Tallapragada, A. Veres, V. Li, L. Peshkin, D. A. Weitz, and M. W. Kirschner,Cell , 1187 (2015).[42] J. Lin, IEEE Transactions on Information theory , 145 (1991).[43] D. M. Endres and J. E. Schindelin, IEEE Transactions on Information theory , 1858 (2003).[44] S. J. Ridden, H. H. Chang, K. C. Zygalakis, and B. D. MacArthur, Physical review letters , 208103 (2015).[45] B. D. MacArthur, A. Ma’ayan, and I. R. Lemischka, Nature Reviews Molecular Cell Biology10