Stochastic modeling of auto-regulatory genetic feedback loops: a review and comparative study
SStochastic modeling of auto-regulatory genetic feedback loops:a review and comparative study
J. Holehouse, Z. Cao and R. GrimaSchool of Biological Sciences, University of Edinburgh, UKOctober 22, 2019
Abstract
Auto-regulatory feedback loops are one of the most common network motifs. A wide variety ofstochastic models have been constructed to understand how the fluctuations in protein numbersin these loops are influenced by the kinetic parameters of the main biochemical steps. Thesemodels differ according to (i) which sub-cellular processes are explicitly modelled; (ii) the modellingmethodology employed (discrete, continuous or hybrid); (iii) whether they can be analyticallysolved for the steady-state distribution of protein numbers. We discuss the assumptions andproperties of the main models in the literature, summarize our current understanding of therelationship between them and highlight some of the insights gained through modelling.
Gene regulatory networks (GRNs) provide an abstraction of the complex biochemical interactionsbehind transcription and translation, the central dogma of molecular biology. Feedback has beenidentified as an important motif in such networks, defined through the regulation of an upstream process by one downstream of it. Auto-regulation is the most basic kind of feedback loop – a proteinexpressed from a gene activates or suppresses its own transcription. These lead to positive or negativefeedback, respectively. It has been estimated that 40% of all transcription factors in
E. coli self-regulate [1] with most of them participating in auto-repression [2]. Many biological systems utilize acombination of positive and negative feedback loops, such as the circadian and segmentation clocks[3, 4].Therefore, the computational and experimental study of the behaviour of auto-regulatory feedbackloops is an important field of study. Measurement of the distribution of protein numbers in living cellsusing fluorescence microscopy [5] is now a routine process. Mathematical models represent a usefultool to understand what sort of interactions in feedback loops lead to observed protein distributions,potentially leading to insight into how noise (large fluctuations in gene products with low copy numbers[6]) is managed at the subcellular level [7, 8]. These models have also been used to understand howauto-activation influences the sensitivity to input signals and the speed of induction [9] and to gaininsight into the sources of noise in auto-regulatory networks [10, 11]. Various inference approacheshave also been devised to estimate the rate constants characterizing feedback loops from populationsnapshot data [12, 13, 14, 15].The conventional stochastic description of gene regulatory networks is given by the chemical masterequation (CME), a time-evolution equation for the probability of observing a certain number of geneproducts at a given time [16]. This Markovian description is discrete in the sense that it takes intoaccount that molecule numbers change by integer amounts when a reaction occurs. In Section 2 wedescribe the most common coarse-grained CME models for auto-regulatory feedback loops, elucidate1 a r X i v : . [ q - b i o . S C ] O c t he relationship between them and identify the regions of parameter space where their analyticalpredictions for the distribution of protein numbers are accurate compared to a fine-grained model.In Section 3 we compare and discuss continuous approximations of the CME using Fokker-Planckequations and partial integro-differential equations and briefly review other continuous approaches. InSection 4 we outline the main biological insights obtained using stochastic models. Finally, we concludein Section 5 where we identify open problems. From a biologist perspective, a minimal model of auto-regulation should describe the main biochemicalprocesses describing the flow of information from gene to mRNA to protein and back to the gene. Hencethe model should describe transcription and translation (the two steps at the heart of the central dogmaof molecular biology), mRNA and protein degradation, and interactions of proteins with genes. Forsimplicity we consider the case where there is a single gene copy and all processes are modelled asfirst-order reactions except the protein-gene interactions which naturally follow second-order kinetics.We refer to this model as the full model since it will be our ground truth, i.e. the finest scale modelthat we shall consider here. The reactions describing this model are G ρ u −→ G + M, M d m −−→ ∅ , M k −→ M + P, P + G σ b −→ G ∗ , G ∗ σ u −−→ G + P, G ∗ ρ b −→ G ∗ + M, P d p −→ ∅ . While this model is intuitive, it has notbeen studied extensively because the mathematical description of its stochastic dynamics, as providedby the chemical master equation, is not easy to solve analytically. In fact even in the absence of thefeedback loop, i.e. no protein-gene interactions, its master equation has still not been solved exactly[17]. Hence historically, simplified versions of the full model have received much more attention in theliterature. These are the models by Hornos et al. [18] in 2005, Grima et al. [19] in 2012 and Kumar etal. [20] in 2014. Henceforth we shall refer to these as the Hornos, Grima and Kumar models. Thereexist other discrete models e.g. [21] which in certain limits reduce to the aforementioned three.The three models share a few common properties: (i) They only describe protein fluctuations, i.e.there is no explicit mRNA description. (ii) The models are discrete in the sense that protein numberschange by discrete integer amounts when reactions occur. (iii) The chemical master equation for eachmodel admits an exact solution in steady-state conditions. These exact solutions have been obtainedusing the method of generating functions but in other studies using similar models, the solution wasobtained using the Poisson representation [10, 23, 24, 25]. There are however important differencesbetween the models particularly how they describe protein production and protein-gene interactionsthat are not often spelled out but can be discerned from the form of the CME. The Hornos modelassumes protein molecules are produced one at a time and neglects changes in the protein moleculenumbers when a protein binds or unbinds from the gene. The Hornos model (excluding bound-proteindegradation) is explicitly given by the set of reactions: G ρ u B −−−→ G + P, P d p −→ ∅ , P + G σ b −→ P + G ∗ , G ∗ σ u −−→ G, G ∗ ρ b B −−→ G ∗ + P . Note that the effective rate of protein production in state G is ρ u B where B = k/d m (the mean number of proteins produced by an mRNA molecule during its lifetime). The reason forthis is that if we define (cid:104) n m (cid:105) as the mean mRNA number in the full model then the effective meanproduction rate is ρ u (cid:104) n m (cid:105) ≈ ρ u k/d m = ρ u B when mRNA equilibrates rapidly ( d m (cid:29) d p , a commonassumption as we discuss later). The same analysis follows for the effective production rate in state G ∗ . The Grima model also assumes protein molecules are produced one at a time but takes intoaccount protein fluctuations from the binding-unbinding process, i.e. when a protein binds a gene, theprotein numbers are decreased by one and conversely they are increased by one when unbinding occurs.The Grima model (excluding bound-protein degradation) is explicitly given by the set of reactions: G ρ u B −−−→ G + P, P d p −→ ∅ , P + G σ b −→ G ∗ , G ∗ σ u −−→ G + P, G ∗ ρ b B −−→ G ∗ + P . The Kumar model is similarto the Hornos model except that proteins are produced in a burst (a phenomenon called translationalbursting) where the burst size is a random number. Specifically, the Kumar model is given by the setof reactions: G ρ u −→ G + rP, P d p −→ ∅ , P + G σ b −→ P + G ∗ , G ∗ σ u −−→ G, G ∗ ρ b −→ G ∗ + rP , where is r is2 RNA protein ρ u
Burst size B Burst size B ρ u
Models of an auto-regulatory feedback loop. The full model has an explicit description of both mRNAand protein but its CME has no known exact steady-state analytical solution. The Grima, Hornos and Kumar modelsrepresent approximations of the full model wherein only the protein is explicitly described and the CME can be solvedexactly in steady-state. The Grima and Hornos models assume proteins are produced one at a time while Kumar assumesbursty production with mean burst size B (denoted by dashed circles). The Hornos and Kumar models neglect proteinnumber fluctuations due to protein-gene binding and unbinding, whereas the Grima model takes them into account. TheModified Kumar model is same as the Kumar model but takes into account fluctuations due to binding-unbinding; itsCME has no known exact steady-state solution. The LMA is a discrete approximation of the full model given by theexact solution of the CME of a bursty protein production process with promoter switching and no feedback. The rateof switching to state G ∗ is not σ b but σ ∗ b which is a function of all the parameters in the full model (see SupplementaryNote 7 in [22]). Note that under the assumption of rapid mRNA equilibration, the mean rate of protein production isthe same in all models and hence the models are indistinguishable when fluctuations are ignored. a positive integer drawn from the geometric distribution with mean B . Note that the mean rate ofprotein production in each of the two promoter states is the same as in the full, Grima and Hornosmodels. The differences between the models are illustrated in Fig. 1.The relationship of these models to each other and to the full model is still not completely under-stood. In Fig. 2 we summarize our current understanding of the regions of parameter space where themodels’ analytical prediction of the steady-state protein number distribution agrees with stochasticsimulations of the full model (using the stochastic simulation algorithm, SSA [26]) for the case of posi-tive feedback (panels I-IV) and negative feedback (panels V-VIII). We enforce fast promoter switchingconditions by choosing the rate constants of protein-gene binding ( σ b ) and unbinding ( σ u ) to be largecompared to the other rate constants. For the full model we also choose protein degradation rates d p = 1 to be significantly smaller than mRNA degradation rates d m = 10 . Both of these conditions arecommon to many genes in both prokaryotic and eukaryotic cells [17, 27]. For each type of feedback,there are 4 plots for combinations of small and large values of L and B where L = σ u /σ b is the ratioof unbinding to binding rates (inversely proportional to the feedback strength) and B = k/d m is the3ean burst size. For completeness, we also show the deterministic rate equation prediction for theprotein number in the full model (vertical orange lines).The following considerations allow us to deduce which models are accurate in which part of param-eter space. Models that assume protein fluctuations in the binding-unbinding process are negligible areincorrect when feedback is strong ( L is small), as recently shown in [28] – we shall call this Property1. Models that assume proteins are produced one molecule at a time are only correct when B is small– we shall call this Property 2. The reason for the latter property is as follows. It is well known thatwhen mRNA decays much faster than protein then the production of proteins in the full model withoutfeedback occurs in bursts of random size described by a geometric distribution with a mean of B andthe Fano factor of the protein distribution is B [17]. In models that assume proteins are producedone at a time, if there was no feedback then the Fano factor of the protein distribution would be (since the distribution would be Poisson). Hence these models (with or without feedback) can onlyprovide a good approximation to the full model when B is small. Armed with Properties 1 and 2, wecan now explain Fig. 2. For strong feedback (small L ), independent of burst size, by property 1 boththe Hornos and Kumar models fail to accurately match the full model. For low feedback (large L ),the Kumar model is accurate for all burst sizes whereas the Hornos model is only accurate for smallburst sizes (by Property 2). The Grima model is only accurate for low burst sizes (by Property 2)independent of feedback strength. It is noteworthy that by modifying the Kumar model so that ittakes into account bursting (illustrated in Fig. 1 as Modified Kumar) then the steady-state proteindistributions obtained from the SSA of this model are practically indistinguishable from the SSA ofthe full model. It is currently unknown if this model admits an exact analytical steady-state solution. Figure 2:
Comparison of stochastic simulations of the full model (denoted as SSA) with the steady-state analyticaldistributions predicted by the reduced models of Grima (G), Hornos (H), Kumar (K) and the LMA (illustrated in Fig.1). Panels I-IV show positive feedback ( ρ u < ρ b ) and Panels V-VIII show negative feedback ( ρ u > ρ b ). The rate equationprediction for mean protein number in the full model is denoted as RE. Note that L = σ u /σ b is inversely proportionalto the feedback strength and B = k/d m is the mean protein burst size. The LMA provides the most accurate discreteapproximation of the full model (8 out of 8 regions of parameter space), followed by Grima/Kumar (4 out of 8 regions)and Hornos (2 out of 8 regions). See text for discussion. The parameters for small L (I, II, V, VI) are σ u = 10 , σ b = 10 and for large L (III, IV, VII, VIII), they are σ u = 10 , σ b = 10 . The parameter for small B (I, III, V, VII) is B = 10 − and for large B (II, IV, VI, VIII), it is B = 10 . The decay rates are fixed to d m = 10 , d p = 1 . The rest of the parametersare: (I) ρ u = 10 − , ρ b = 10 ; (II) ρ u = 10 − , ρ b = 2 . ; (III) ρ u = 10 , ρ b = 10 ; (IV) ρ u = 10 − , ρ b = 2 . ; (V) ρ u = 10 , ρ b = 10 ; (VI) ρ u = 10 , ρ b = 1 ; (VII) ρ u = 10 , ρ b = 10 ; (VIII) ρ u = 2 . , ρ b = 10 − . Another common discrete approximation of the full model is the master equation for an effectivebirth-death process for protein where the propensity of the production reaction is a Hill functionof the instantaneous number of proteins whereas the propensity for protein decay is the same asfor the usual first-order decay process. Specifically the propensity for the production reaction reads4 Lρ u + ρ b n ) / ( L + n ) (the symbols as defined in Fig. 1 and above). Hill type propensities of thistype or similar are in common use in the literature [9, 29, 30]. The reduced master equation for thiseffective birth-death process can be solved exactly in steady-state and it is often thought to be avalid approximation of the full model in the limit of fast promoter switching. It is worth noting thatHill type propensities for protein production are not rigorously derived but rather written by analogyto the effective rate of protein production obtained from the deterministic rate equations under fastequilibrium conditions. Hence the master equation’s validity under the same conditions is doubtful[31]. Indeed, it has recently been shown that in the fast switching limit, the steady-state solution ofthis master equation is precisely the same as that of the Hornos model and hence is not an accurateapproximation of the full model if L is small [28] (the solution of this model and that of Hornos areindistinguishable for the parameters in Fig. 2).Lastly we consider a recent novel discrete approximation of a class of gene regulatory networks calledthe Linear Mapping Approximation (LMA [22]). In the LMA, in the limit of fast mRNA equilibration,the CME of the full model is approximated by the CME describing bursty protein production andeffective promoter switching with no feedback which has an exact steady-state solution. The LMAprovides a computational recipe by which the effective rates of promoter switching can be obtained asfunctions of the parameters in the full model. The LMA together turns out to be the best discreteanalytical approximation of the full model, being accurate in all 8 regions of parameters space in Fig.2. This is followed by Kumar and Grima (both accurate in 4 out of 8 regions) and Hornos (accuratein 2 out of 8 regions). The modified Kumar model does as well as the LMA but it has no knownanalytical solution. It is to be emphasized that the LMA gives only accurate results when the meannumber of proteins conditional on the gene being in state G is much greater than 1, a condition metin all cases considered in Fig. 2.In summary, what appear to be minor and subtle differences in the construction of discrete models ofauto-regulation, actually lead to considerable differences in the prediction of the steady-state proteindistribution. For both positive and negative feedback, the models all agree in only one region ofparameter space where the mean burst size and feedback strength are both small (small B and large L ). The differences between the Grima, Hornos and Kumar models and the full model originate fromthe fact that these models where not derived rigorously from the full model but rather they werewritten down intuitively. On the other hand, the LMA does so well because it is derived from thefull model. In this section we have considered models of the simplest type of auto-regulatory loop.Discrete models of a loop with more complex mechanisms such as cooperative protein binding to thegene and oscillatory transcription rates, e.g. due to circadian rhythms, have also been solved recently[22]. Besides discrete models there are also continuous models of auto-regulatory loops where it is assumedthat molecule number fluctuations correspond to hops on the real axis rather than on an integeraxis. The simplicity of the distributions provided by the continuous models often make the resultseasier to interpret than those obtained from exactly solvable discrete models which are in terms ofhypergeometric functions.These models are typically described by either the WKB approximation [30, 32, 33], the Linear-Noise Approximation (LNA) (a type of a Fokker-Planck equation) [34, 35] or Partial-Integro DifferentialEquations (PIDEs) [36, 37, 38, 39]. These are several variations of these three approaches in theliterature, too numerous to here enumerate. For the purpose of the present discussion we will compareand discuss two LNA variants and a PIDE approach: (i) by LNA we specifically mean the Fokker-Planck equation obtained by applying the LNA to the CME of the modified Kumar model in Fig 1.(ii) by cLNA we mean the conditional LNA derived in [40] applied to the CME of the modified Kumarmodel. (iii) by PIDE we specifically mean the model presented by Bokes and Singh for the case thatthere are no decoy binding sites [38]; its solution is given by Eq. 26 in the SI of the aforementioned5aper (this result has been previously reported [36, 41]). These three have the following differentproperties: (i) LNA gives a Gaussian distribution, the cLNA gives a sum of two Gaussians, eachassociated with one of the promoter states while the PIDE gives a unimodal non-Gaussian distributionfor the protein numbers (note that generally PIDE can give rise to bimodal distributions but not in thiscase because there is no cooperativity). (ii) All implicitly take into account mRNA through the proteinburst size distribution. This is a geometric distribution for the LNA and cLNA models whereas it isan exponential distribution for the PIDE model. (iii) The LNA and PIDE are continuum models butthe cLNA is a hybrid model because protein fluctuations are assumed continuous but gene fluctuationsare assumed discrete.
Promoter activity Protein numberProbability μ = log σ −
Promoter activity Protein numberProbability -4 -1 2
I II III I II III
A BC D E
High basal transcription rate Low basal transcription rate B = 1 B = 5 B = 15 P r o b a b ili t y SSAPIDELNA
Protein number
Figure 3: (A,B) Plots comparing the accuracy of continuous approximations (LNA, cLNA, PIDE) versus the SSA ofthe full model under fast, intermediate and slow rates of promoter switching for high and low basal transcription rates.The parameter µ is large for slow switching and small for fast switching. LNA and PIDE approximations are accurate forfast switching in high basal transcription rate conditions while the cLNA is more accurate for slow switching conditions.All approximations break for low basal transcription. (C,D,E) show plots of LNA vs PIDE vs SSA of full model for fastswitching as the mean burst size B increases at constant transcription rates ρ u B and ρ b B . The LNA performs best atlow B while the PIDE performs best at large B . Note that the x-axis in (C,D,E) is logarithmic. The parameters are asfollows. (A) (I) µ = − , ρ u = 7 , ρ b = 25 , B = 2 , σ u = 10 and σ b = 10 . (II) µ = 1 , ρ u = 7 , ρ b = 25 , B = 2 , σ u = 10 − and σ b = 10 − . (III) µ = 2 , ρ u = 7 , ρ b = 25 , B = 2 , σ u = 10 − , σ b = 10 − . (B) (I) µ = − , ρ u = 10 − , ρ b = 5 , B = 2 , σ u = 10 and σ b = 10 . (II) µ = − , ρ u = 10 − , ρ b = 5 , B = 2 , σ u = 10 and σ b = 10 (III) µ = 2 , ρ u = 10 − , ρ b = 5 , B = 2 , σ u = 10 − and σ b = 1 . (C,D,E) ρ u B = 30 , ρ b B = 20 , σ u = 10 and σ b = 10 . To summarise our current understanding of the regions of their validity, in Fig. 3A,B we comparethe three approximations under fast, intermediate and slow rates of promoter switching for bothhigh and low basal transcription rates ( ρ u ) versus SSA simulations of the full model. For high basaltranscription rates, the LNA and PIDE provide accurate approximations of the full model for fastpromoter switching (Fig. 3A, I) and breakdown for intermediate (Fig. 3A, II) and slow promoter6witching (Fig. 3A, III) since they cannot capture the bimodality of the full model distribution. Inthe latter regime, the cLNA performs well instead. These results make sense in the light that thePIDE solution integrates out promoter switching by using a Hill-type function and hence presumesfast promoter switching while the cLNA derivation assumes that promoter switching is much slowerthan transcription, translation and decay. For intermediate switching, the cLNA misses the preciselocation of the modes; this is a limitation imposed by trying to fit a Gaussian to each mode whichcan be corrected for by using the conditional system-size expansion [42]. In contrast, for low basaltranscription rate, all three approximations fail independent of the switching rate (Fig. 3B, I-III).This is because continuous approximations are only valid for large enough protein numbers and thelow basal transcription rate induces a mode of the protein distribution at zero. In Fig. 3C, D, Ewe compare the LNA and PIDE approximations as the mean burst size B is increased at constanttranscription rate for fast switching conditions. The LNA does best for low burst sizes because thefull model distribution is almost Gaussian; the PIDE approximation is inaccurate here because forsmall mean burst sizes, the exponential distribution of burst sizes is not a good approximation of thegeometric distribution. The reverse is true when the mean burst size becomes large: the full modeldistribution is highly non-Gaussian and cannot be captured by the LNA but is well captured by thePIDE, also because the exponential is a good approximation of the geometric distribution in this case.In summary, various continuum models provide reasonably accurate analytical steady-state dis-tributions for protein numbers in terms of simple functions, provided there is not a mode of thedistribution at zero and provided one is in the limit of either fast or slow switching. In the literature,there are also analytically solvable continuum models with multiple gene copies [39] and a few resultsare also known for promoter switching that is neither extremely slow nor exceedingly rapid [32]. There are at least two main insights obtained from the analysis of stochastic models of auto-regulation:(i) Cooperativity is not necessary for protein distributions to be bimodal. Slow or intermediate pro-moter switching can in some cases lead to bimodality, in the absence of cooperativity. (ii) There is nota simple relationship between noise reduction or amplification and the type of feedback loop (negativeor positive). We next discuss each of these in detail.
Insight (i). An important property of auto-regulatory feedback loops is their ability to generateprotein distributions with more than one mode. Each of these modes can be associated with a subpopulation of cells of a particular phenotype and hence their quantification is important for under-standing cellular decision-making. These modes can arise in at least two ways: (a) If the deterministicrate equations are bistable (typically arising from cooperativity) and provided leakage remains belowa critical threshold [37]; (b) If the deterministic rate equations are not bistable and there is substantialnoise due to the switching of a molecule between a discrete number of conformational states. This lastcase is also called noise-induced bistability [43]. Hence bimodality in protein distributions does notrequire cooperativity [44]. If promoter switching is slow enough then the system will alternate betweentwo steady-state protein distributions, each associated with a promoter state; hence if these distribu-tions are well separated then the full distribution will appear to be bimodal and we can then say that“noise in the gene states created the bistability”. This has also been shown to lead to birhythmical ex-pression in genetic oscillators and to hysteresis in phenotypic induction [40]; furthermore it leads to anenhancement of the sensitivity of the circuit’s response to input signals [9]. The LNA cannot captureeither type of bimodality because it is valid for those systems whose deterministic rate equations aremonostable [16], provided the average number of molecule numbers is large enough. Hybrid methodssuch as the cLNA and others [45, 46, 47] can capture bimodality of type (b) because they model genestates discretely. Methods based on PIDE can capture bimodality of type (a) but not (b) since theydo not model discrete gene states explicitly [37]. The same is true for protein distributions obtainedusing the Fokker-Planck equation stemming from the Kramers-Moyal expansion [48]. Discrete modelscan capture both types of bimodality [22]. 7 nsight (ii). Early work reported that negative feedback reduces protein fluctuations [49] whereaspositive feedback has the opposite effect [50]. For negative feedback there is an optimal feedbackstrength at which the protein fluctuations are minimal [8]. However later work showed that modelswith different assumptions can yield contradictory conclusions about how feedback affects noise [51].More recently it has been claimed that the effect of feedback on noise can be more easily understoodvia a noise decomposition [10]. Noise can be decomposed as the sum of three types: promoter noise,birth-death noise, and correlation noise induced by feedback. In the case of slow switching, wherethe promoter noise is dominant, positive feedback reduces the total noise, whereas negative feedbackamplifies it. In the case of fast switching, where the correlation noise is dominant, positive feedbackamplifies the total noise, whereas negative feedback reduces it. Further work in this direction canbe found here [11, 52]. Hence it appears that the general intuition that negative feedback reducesnoise while positive feedback increases it, is not correct. It follows that the ubiquity of negative self-regulating transcription factors in prokaryotic cells cannot be explained by an evolutionary pressureto select for mechanisms that enable control of protein noise [53].
In summary, our literature review and comparative analysis shows that subtle differences betweenmodels of auto-regulation can have a significant impact on the predicted distribution of protein num-bers, e.g. different number of modes and different predictions for how positive and negative feedbackinfluence the size of protein number fluctuations. The current generation of reduced models have beenconstructed intuitively (not derived rigorously from an underlying fine-scale model) and hence theexisting differences between them.We conclude by briefly pointing out a few of the open problems in the field: (i) The exact solutionof the discrete modified Kumar model. The motivation for its study stems from the fact that stochasticsimulations show that (unlike the Grima, Hornos and Kumar models in the literature) it is in excellentagreement with the full model for all feedback strengths and protein burst sizes, provided mRNAdecays much quicker than protein. (ii) The derivation of exact time-dependent solutions of the CMEfor the Grima, Kumar and modified Kumar models (it is presently known for the Hornos model [54]).The derivation of approximate time-dependent propagators has received recent attention [55]. Explicittime-dependent solutions would enable a detailed study of how perturbations to an auto-regulatorycircuit, e.g. inhibition of certain reactions, affects the dynamics – this would aid the interpretationof experiments of this type. (iii) The development of new continuous approximation methods thatcan accurately predict the steady-state distribution of protein number of the full model without theassumption of fast or slow promoter switching, i.e. valid for any intermediate switching. (iv) Thederivation of reduced models of feedback loops starting from fine-grained stochastic models incorpo-rating biological processes which are known to affect gene expression such as partitioning of proteinsdue to cell division, gene replication, gene dosage compensation and nascent mRNA maturation (suchmodels have been studied using simulations for systems with no feedback [56]). Likely such reductionis possible by the careful application of timescale separation methods or possibly by mapping to aneffective CME with stochastic rates of transcription, translation and feedback [57].In our opinion, the last of these open problems is probably the hardest and the most pressing sinceit is imperative that the reduced models studied are biologically and physically realistic before furthermathematical analysis is undertaken.
Acknowledgments
This work was supported by a BBSRC EASTBIO PhD studentship, BBSRC grant BB/M025551/1and the UK Research Councils’ Synthetic Biology for Growth programme of the BBSRC, EPSRC andMRC (BB/M018040/1). 8 eferences [1] Nitzan Rosenfeld, Michael B Elowitz, and Uri Alon. Negative autoregulation speeds the responsetimes of transcription networks.
Journal of molecular biology , 323(5):785–793, 2002.[2] Shai S Shen-Orr, Ron Milo, Shmoolik Mangan, and Uri Alon. Network motifs in the transcriptionalregulation network of escherichia coli.
Nature genetics , 31(1):64, 2002.[3] Joseph S Takahashi. Transcriptional architecture of the mammalian circadian clock.
NatureReviews Genetics , 18(3):164, 2017.[4] Guy Wiedermann, Robert Alexander Bone, Joana Clara Silva, Mia Bjorklund, Philip J Mur-ray, and J Kim Dale. A balance of positive and negative regulators determines the pace of thesegmentation clock.
Elife , 4:e05842, 2015.[5] Johan Elf and Irmeli Barkefors. Single-molecule kinetics in living cells.
Annual Review of Bio-chemistry , 2019.[6] Michael B Elowitz, Arnold J Levine, Eric D Siggia, and Peter S Swain. Stochastic gene expressionin a single cell.
Science , 297(5584):1183–1186, 2002.[7] Sara Hooshangi and Ron Weiss. The effect of negative feedback on noise propagation in transcrip-tional gene networks.
Chaos: An Interdisciplinary Journal of Nonlinear Science , 16(2):026108,2006.[8] Abhyudai Singh and Joao P Hespanha. Optimal feedback strength for noise suppression in au-toregulatory gene networks.
Biophysical journal , 96(10):4013–4023, 2009.[9] Rutger Hermsen, David W Erickson, and Terence Hwa. Speed, sensitivity, and bistability inauto-activating signaling circuits.
PLoS computational biology , 7(11):e1002265, 2011.[10] Peijiang Liu, Zhanjiang Yuan, Haohua Wang, and Tianshou Zhou. Decomposition and tunabilityof expression noise in the presence of coupled feedbacks.
Chaos: An Interdisciplinary Journal ofNonlinear Science , 26(4):043108, 2016.[11] Chen Jia, Peng Xie, Min Chen, and Michael Q Zhang. Stochastic fluctuations can reveal the feed-back signs of gene regulatory networks at the single-molecule level.
Scientific reports , 7(1):16037,2017.[12] Peter Milner, Colin S Gillespie, and Darren J Wilkinson. Moment closure based parameter infer-ence of stochastic kinetic models.
Statistics and Computing , 23(2):287–295, 2013.[13] Vassilios Stathopoulos and Mark A Girolami. Markov chain monte carlo inference for markov jumpprocesses via the linear noise approximation.
Philosophical Transactions of the Royal Society A:Mathematical, Physical and Engineering Sciences , 371(1984):20110541, 2013.[14] Zhixing Cao and Ramon Grima. Accuracy of parameter estimation for auto-regulatory transcrip-tional feedback loops from noisy data.
Journal of the Royal Society Interface , 16(153):20180967,2019.[15] Kaan Öcal, Ramon Grima, and Guido Sanguinetti. Parameter estimation for biochemical reactionnetworks using wasserstein distances. arXiv preprint arXiv:1907.07986 , 2019.[16] David Schnoerr, Guido Sanguinetti, and Ramon Grima. Approximation and inference methodsfor stochastic biochemical kinetics—a tutorial review.
Journal of Physics A: Mathematical andTheoretical , 50(9):093001, 2017. 917] Vahid Shahrezaei and Peter S Swain. Analytical distributions for stochastic gene expression.
Proceedings of the National Academy of Sciences , 105(45):17256–17261, 2008.[18] José EM Hornos, Daniel Schultz, Guilherme CP Innocentini, JAMW Wang, Aleksandra M Wal-czak, JN Onuchic, and PG Wolynes. Self-regulating gene: an exact solution.
Physical Review E ,72(5):051907, 2005.[19] Ramon Grima, Deena R Schmidt, and Timothy J Newman. Steady-state fluctuations of a geneticfeedback loop: An exact solution.
The Journal of chemical physics , 137(3):035104, 2012.[20] Niraj Kumar, Thierry Platini, and Rahul V Kulkarni. Exact distributions for stochastic geneexpression models with bursting and feedback.
Physical review letters , 113(26):268105, 2014.[21] Yves Vandecan and Ralf Blossey. Self-regulatory gene: an exact solution for the gene gate model.
Physical Review E , 87(4):042705, 2013.[22] Zhixing Cao and Ramon Grima. Linear mapping approximation of gene regulatory networks withstochastic dynamics.
Nature communications , 9(1):3305, 2018.[23] Crispin Gardiner.
Stochastic methods , volume 4. Springer Berlin, 2009.[24] István Sugár and István Simon. Self-regulating genes. exact steady state solution by using poissonrepresentation.
Open Physics , 12(9):615–627, 2014.[25] Srividya Iyer-Biswas and Ciriyam Jayaprakash. Mixed poisson distributions in exact solutions ofstochastic autoregulation models.
Physical Review E , 90(5):052712, 2014.[26] Daniel T Gillespie. Stochastic simulation of chemical kinetics.
Annu. Rev. Phys. Chem. , 58:35–55,2007.[27] Leonardo A Sepúlveda, Heng Xu, Jing Zhang, Mengyu Wang, and Ido Golding. Measurementof gene regulation in individual cells reveals rapid switching between promoter states.
Science ,351(6278):1218–1222, 2016.[28] James Holehouse and Ramon Grima. Revisiting the reduction of stochastic models of geneticfeedback loops with fast promoter switching.
Biophysical Journal , 117(7):1311, 2019.[29] Tomás Aquino, Elsa Abranches, and Ana Nunes. Stochastic single-gene autoregulation.
PhysicalReview E , 85(6):061913, 2012.[30] Michael Assaf, Elijah Roberts, and Zaida Luthey-Schulten. Determining the stability of geneticswitches: explicitly accounting for mrna noise.
Physical review letters , 106(24):248102, 2011.[31] Jae Kyoung Kim, Krešimir Josić, and Matthew R Bennett. The relationship between stochasticand deterministic quasi-steady state approximations.
BMC systems biology , 9(1):87, 2015.[32] Hao Ge, Hong Qian, and X Sunney Xie. Stochastic phenotype transition of a single cell in anintermediate region of gene state switching.
Physical review letters , 114(7):078101, 2015.[33] Jay Newby. Bistable switching asymptotics for the self regulating gene.
Journal of Physics A:Mathematical and Theoretical , 48(18):185001, 2015.[34] Nicolaas Godfried Van Kampen.
Stochastic processes in physics and chemistry , volume 1. Elsevier,1992.[35] Philipp Thomas, Arthur V Straube, and Ramon Grima. The slow-scale linear noise approximation:an accurate, reduced stochastic description of biochemical networks under timescale separationconditions.
BMC systems biology , 6(1):39, 2012.1036] Nir Friedman, Long Cai, and X Sunney Xie. Linking stochastic dynamics to population dis-tribution: an analytical framework of gene expression.
Physical review letters , 97(16):168302,2006.[37] Manuel Pájaro, Antonio A Alonso, and Carlos Vázquez. Shaping protein distributions in stochasticself-regulated gene expression networks.
Physical Review E , 92(3):032712, 2015.[38] Pavol Bokes and Abhyudai Singh. Protein copy number distributions for a self-regulating gene inthe presence of decoy binding sites.
PloS one , 10(3):e0120555, 2015.[39] Jakub Jędrak and Anna Ochab-Marcinek. Influence of gene copy number on self-regulated geneexpression.
Journal of theoretical biology , 408:222–236, 2016.[40] Philipp Thomas, Nikola Popović, and Ramon Grima. Phenotypic switching in gene regulatorynetworks.
Proceedings of the National Academy of Sciences , 111(19):6994–6999, 2014.[41] Michael C Mackey, Marta Tyran-Kaminska, and Romain Yvinec. Dynamic behavior of stochasticgene expression models in the presence of bursting.
SIAM Journal on Applied Mathematics ,73(5):1830–1852, 2013.[42] Alexander Andreychenko, Luca Bortolussi, Ramon Grima, Philipp Thomas, and Verena Wolf.Distribution approximations for the chemical master equation: comparison of the method ofmoments and the system size expansion. In
Modeling Cellular Systems , pages 39–66. Springer,2017.[43] Hong Qian, Pei-Zhe Shi, and Jianhua Xing. Stochastic bifurcation, slow fluctuations, and bista-bility as an origin of biochemical complexity.
Physical Chemistry Chemical Physics , 11(24):4861–4870, 2009.[44] Anna Ochab-Marcinek and Marcin Tabaka. Bimodal gene expression in noncooperative regulatorysystems.
Proceedings of the National Academy of Sciences , 107(51):22096–22101, 2010.[45] Pavel Kurasov, Alexander Lück, Delio Mugnolo, and Verena Wolf. Stochastic hybrid models ofgene regulatory networks–a pde approach.
Mathematical biosciences , 305:170–177, 2018.[46] Rajesh Karmakar and Indrani Bose. Positive feedback, stochasticity and genetic competence.
Physical Biology , 4(1):29, 2007.[47] Yen Ting Lin and Charles R Doering. Gene expression dynamics with stochastic bursts: Con-struction and exact results for a coarse-grained model.
Physical Review E , 93(2):022409, 2016.[48] Andrew Duncan, Shuohao Liao, Tomáš Vejchodsk`y, Radek Erban, and Ramon Grima. Noise-induced multistability in chemical systems: Discrete versus continuum modeling.
Physical ReviewE , 91(4):042111, 2015.[49] Michael L Simpson, Chris D Cox, and Gary S Sayler. Frequency domain analysis of noise inautoregulated gene circuits.
Proceedings of the National Academy of Sciences , 100(8):4551–4556,2003.[50] Jeff Hasty, Joel Pradines, Milos Dolnik, and James J Collins. Noise-based switches and amplifiersfor gene expression.
Proceedings of the National Academy of Sciences , 97(5):2075–2080, 2000.[51] Tatiana T Marquez-Lago and Jörg Stelling. Counter-intuitive stochastic behavior of simple genecircuits with negative feedback.
Biophysical journal , 98(9):1742–1750, 2010.[52] Chen Jia, Le Yi Wang, George G Yin, and Michael Q Zhang. Macroscopic limits, analyticaldistributions, and noise structure for stochastic gene expression with coupled feedback loops. arXiv preprint arXiv:1909.00042 , 2019. 1153] Dov J Stekel and Dafyd J Jenkins. Strong negative self regulation of prokaryotic transcriptionfactors increases the intrinsic noise of protein expression.
BMC systems biology , 2(1):6, 2008.[54] Alexandre Ferreira Ramos, GCP Innocentini, and José Eduardo Martinho Hornos. Exact time-dependent solutions for a self-regulating gene.
Physical Review E , 83(6):062902, 2011.[55] Frits Veerman, Carsten Marr, and Nikola Popović. Time-dependent propagators for stochasticmodels of gene expression: an analytical method.
Journal of mathematical biology , 77(2):261–312,2018.[56] Samuel O Skinner, Heng Xu, Sonal Nagarkar-Jaiswal, Pablo R Freire, Thomas P Zwaka, and IdoGolding. Single-cell analysis of transcription kinetics across the cell cycle.
Elife , 5:e12175, 2016.[57] Seong Jun Park, Sanggeun Song, Gil-Suk Yang, Philip M Kim, Sangwoon Yoon, Ji-Hyun Kim,and Jaeyoung Sung. The chemical fluctuation theorem governing gene expression.