[PDF] Small protein number effects in stochastic models of autoregulated bursty gene expression

Abstract

A stochastic model of autoregulated bursty gene expression by Kumar et al. [Phys. Rev. Lett. 113, 268105 (2014)] has been exactly solved in steady-state conditions under the implicit assumption that protein numbers are sufficiently large such that fluctuations in protein numbers due to reversible protein-promoter binding can be ignored. Here we derive an alternative model that takes into account these fluctuations and hence can be used to study low protein number effects. The exact steady-state protein number distributions is derived as a sum of Gaussian hypergeometric functions. We use the theory to study how promoter switching rates and the type of feedback influence the size of protein noise and noise-induced bistability. Furthermore we show that our model predictions for the protein number distribution are significantly different from those of Kumar et al. when the protein mean is small, gene switching is fast, and protein binding is faster than unbinding.

Full PDF

SSmall protein number effects in stochastic models ofautoregulated bursty gene expression

Chen Jia , Ramon Grima , ∗ Division of Applied and Computational Mathematics, Beijing Computational Science Research Center, Beijing 100193, China School of Biological Sciences, University of Edinburgh, U.K. ∗ Correspondence: [email protected]

Abstract

A stochastic model of autoregulated bursty gene expression by Kumar et al. [Phys. Rev. Lett.113, 268105 (2014)] has been exactly solved in steady-state conditions under the implicit assumptionthat protein numbers are sufficiently large such that fluctuations in protein numbers due to reversibleprotein-promoter binding can be ignored. Here we derive an alternative model that takes into accountthese fluctuations and hence can be used to study low protein number effects. The exact steady-stateprotein number distributions is derived as a sum of Gaussian hypergeometric functions. We use thetheory to study how promoter switching rates and the type of feedback influence the size of proteinnoise and noise-induced bistability. Furthermore we show that our model predictions for the proteinnumber distribution are significantly different from those of Kumar et al. when the protein mean issmall, gene switching is fast, and protein binding is faster than unbinding.

One of the most common gene network motifs is an autoregulatory feedback loop whereby proteinexpressed from a gene activates or represses its own expression [1]. It has been estimated that 40% ofall transcription factors in

Escherichia coli self-regulate [2] with most of them participating in negativeautoregulation [1]. Feedback leads to a regulation of the magnitude of intrinsic noise [3–6], and also tochanges in the response time and relaxation time of transcription networks [2, 7].The predictions by stochastic models of autoregulation have also been shown to lead to considerablydifferent dynamics than those by deterministic models. For example, while deterministic models ofnon-cooperative autoregulation predict monostable gene expression for all parameter values, stochasticmodels of the same set of reactions show switching between two distinct gene expression levels andthus lead to bistability [8, 9]. It has also been shown that in the presence of noise, feedback loops canlead to sustained oscillations in protein numbers in regions of the parameter space where deterministicmodels predict damped oscillations [10–12]. These results suggest that stochastic models are necessaryto understand the intracellular dynamics of biological systems utilizing a combination of positive andnegative feedback loops and in which at least one molecular component is present in low copy numbers,e.g. circadian clocks [13, 14].The stochastic properties of an autoregulatory gene circuit have been explored mostly by stochasticsimulations and to a lesser extent by analytical solutions of various discrete, continuous, and hybridgene expression models (see [15] for a recent review). Discrete models are those in which gene, mRNA,and protein numbers change by discrete integer amounts when reactions occur; in continuous models,fluctuations correspond to hops on the real axis rather than on the integer axis [16–19]; in hybrid models,some fluctuations are modeled discretely (such as those of genes) while other types of fluctuations (such a r X i v : . [ q - b i o . S C ] F e b s those of mRNAs and proteins) are modeled in a continuous sense [20–23]. Here we focus on discretemodels since these are the most realistic among the three types (continuous and hybrid models have theadvantage of possessing simpler distribution solutions that can be easier to interpret).In the literature, there are two exact solutions for the steady-state protein number distribution ofstochastic autoregulatory models [9, 24]. Both models assume that a single gene can exist in one of twostates (with two different protein production rates) and that reversible binding of a protein molecule toa gene leads to switching from one gene state to the other. The differences between the two models areas follows. The model solved by Grima et al. [9] (henceforth referred to as the Grima model; see Fig.1(a) for an illustration) assumes that (i) protein molecules are produced one at a time and (ii) when aprotein molecule binds to a gene, the protein copy number decreases by one whereas it increases by onewhen unbinding occurs. In contrast, the model solved by Kumar et al. [24] (henceforth referred to as theKumar model; see Fig. 1(b) for an illustration) assumes that (i) proteins are produced in random burstswhose size is a random number sampled from a geometric distribution and (ii) when a protein moleculebinds to a gene or unbinds from it, there is no change in the protein copy number. The advantage of theKumar model is its modeling of bursty protein expression which is in agreement with experimental data[25]; its disadvantage (unlike the Grima model) is the implicit neglect of fluctuations due to reversibleprotein-promoter binding which necessarily implies that it cannot precisely capture low protein numbereffects. Taking into account the latter is important because the copy number of DNA-binding proteinscan be in the single digits [26].In this paper, we remedy the shortcomings of the Grima and Kumar models, by deriving and solvinga discrete stochastic model of autoregulatory feedback loops that takes into account both translationalbursting and protein number fluctuations during the binding-unbinding process. The paper is organizedas follows. In Section 2, starting from a stochastic model of autoregulation describing both mRNAand protein dynamics (henceforth referred to as the full model; see Fig. 1(c) for an illustration), weuse multiscale decimation theory to derive a reduced stochastic model of autoregulation for the proteindynamics in the limit of fast mRNA degradation. The chemical master equation (CME) of this reducedmodel is similar to that of the Kumar model except that the propensities include protein fluctuationsduring the binding-unbinding process — we henceforth refer to this as the modified Kumar modelwhich is illustrated in Fig. 1(d). In Section 3, we provide an exact analytical solution for the steady-state protein distribution of the modified Kumar model in terms of Gaussian hypergeometric functions,show its simplification in the cases of fast and slow gene switching, discuss the influence of feedbackon protein noise and bistability, and verify the theory using stochastic simulations. In Section 4, weshow that the modified Kumar model reduces to the Grima model under certain conditions. In Section5, we compare the modified Kumar model with the Kumar model, showing agreement between the twounder slow gene switching and disagreement under fast gene switching and strong protein-promoterinteractions. We also investigate how the relative sensitivity of protein noise to model parameters differsbetween the two models. We conclude in Section 6. Based on the central dogma of molecular biology, the gene expression kinetics for an autoregulatorygene network in an individual cell has a standard three-stage representation involving gene switching,transcription, and translation (Fig. 1(c)) [27]. Due to autoregulation, the promoter could either be free r bound to a protein molecule to form a promoter-protein complex. Let G and G ∗ denote the unboundand bound states of the gene, respectively, let M be the corresponding mRNA, and let P denote thecorresponding protein. Then the effective reactions describing the autoregulatory gene circuit are givenby: G + P σ b −−→ G ∗ , G ∗ σ u −−→ G + P,G ρ u −−→ G + M, G ∗ ρ b −−→ G ∗ + M,M u −−→ M + P, M v −−→ ∅ , P d −−→ ∅ . Here σ b is the binding rate of protein to the promoter which characterizes the strength of feedback; σ u is the unbinding rate of protein from the promoter; ρ u and ρ b are the transcription rates when the geneis unbound and bound to protein, respectively; u is the translation rate; v and d are the degradation ratesof mRNA and protein, respectively. The reaction scheme describes a positive feedback loop if ρ b > ρ u and describes a negative feedback loop if ρ b < ρ u . We shall refer to this fine-grained model as the fullmodel. mRNA protein full model mean burst size B Grima model modified Kumar model

GG* u dv dd

GG G*G* ρ u ρ b ρ u ρ b ρ u ρ b σ b σ u σ b σ u σ b σ u c da ++ Kumar model ρ u ρ b GG* dσ b σ u b mean burst size B protein proteinprotein Fig. 1.

Discrete models of an autoregulatory gene network. (a) Grima model [9]. This model describes proteindynamics only. It neglects translational bursting but takes into account fluctuations in protein number due tobinding to and unbinding from the gene. (b) Kumar model [24]. This model describes protein dynamics only. Ittakes into account translational bursting but neglects protein fluctuations during the binding-unbinding process, i.e.when a protein molecule binds to the promoter, there is no change in the protein number. There are two dottedboxes in this figure; the left one displays the reaction G + P σ b −−→ G ∗ + P and the right one displays the reaction G ∗ σ u −−→ G . (c) Fine-grained full model. This model has both mRNA and protein descriptions. It (implicitly) takesinto account translational bursting (via the mRNA description) and also fluctuations in protein number during thebinding-unbinding process. (d) Modified Kumar model. This model, which is derived from the full model in thelimit of fast mRNA degradation, takes into account both translational bursting and protein fluctuations during thebinding-unbinding process. The microstate of the gene of interest can be represented by an ordered triple ( i, m, n ) : the genestate i with i = 0 , corresponding to the unbound and bound states, respectively, the mRNA copynumber m , and the protein copy number n . Let p i,m,n denote the probability of having m copies ofmRNA and n copies of protein when the gene is in state i . Then the stochastic gene expression kinetics an be described by the Markov jump process illustrated in Fig. 2(a). The evolution of the Markovianmodel is governed by the CME  ˙ p ,m,n = ρ u p ,m − ,n + ( m + 1) vp ,m +1 ,n + mup ,m,n − + ( n + 1) dp ,m,n +1 + σ u p ,m,n − − ( ρ u + mv + mu + nd + nσ b ) p ,m,n , ˙ p ,m,n = ρ b p ,m − ,n + ( m + 1) vp ,m +1 ,n + mup ,m,n − + ( n + 1) dp ,m,n +1 + ( n + 1) σ b p ,m,n +1 − ( ρ b + mv + mu + nd + σ u ) p ,m,n . Experimentally, it is commonly observed that mRNA decays much faster than protein. For example,mRNA lifetimes in prokaryotes are usually of the order of a few minutes, whereas protein lifetimesare generally on the order of tens of minutes to many hours [28]. Due to timescale separation of theunderlying biochemical reaction kinetics, the full model can be reduced to a simpler one. The modelreduction technique described below is similar to but slightly different from the one described in [29],where the author considered a different full model which neglects fluctuations in protein number duringthe binding-unbinding process.Specifically, let λ = v/d denote the ratio of the degradation rates of mRNA and protein. Herewe make the classical assumption that λ (cid:29) and u/v is strictly positive and bounded [27]. In addi-tion, let q ( i,m.n ) , ( i (cid:48) ,m (cid:48) .n (cid:48) ) denote the transition rate of the Markovian model from microstate ( i, m, n ) tomicrostate ( i (cid:48) , m (cid:48) , n (cid:48) ) and let q ( i,m,n ) = (cid:88) ( i (cid:48) ,m (cid:48) ,n (cid:48) ) (cid:54) =( i,m,n ) q ( i,m,n ) , ( i (cid:48) ,m (cid:48) ,n (cid:48) ) denote the rate at which the system leaves microstate ( i, m, n ) , which is defined as the sum of transitionrates from ( i, m, n ) to other microstates. Since λ (cid:29) , we say that ( i, m, n ) is a fast state if lim λ →∞ q ( i,m,n ) = ∞ , and we say that ( i, m, n ) is a slow state if lim λ →∞ q ( i,m,n ) < ∞ . If ( i, m, n ) is a fast state, then the time that the system stays in this state will be very short. Note that thelimit of λ → ∞ is taken at constant u/v and d . It is easy to check that the leaving rates of all microstatesare given by q (0 ,m,n ) = ρ u + mv + mu + nd + nσ b = mλd (1 + u/v ) + nd + nσ b + ρ u ,q (1 ,m,n ) = ρ b + mv + mu + nd + σ u = mλd (1 + u/v ) + nd + σ u + ρ b , which imply that lim λ →∞ q ( i,m,n )  = ∞ , if m ≥ ,< ∞ , if m = 0 . Therefore, all microstates ( i, m, n ) for m ≥ are fast states and all microstates ( i, , n ) are slow states.By using a classical simplification method of multiscale Markov jump processes called decimation [29–33], the full Markovian model can be simplified to a reduced one by removal of all fast states. Forsimplicity, microstate ( i, , n ) will be denoted by ( i, n ) in the reduced model. ( , m,n )( , m,n + )( , m - ,n ) ( , m + ,n )( , m,n - ) ( , m,n )( , m,n + )( , m - ,n ) ( , m + ,n )( , m,n - ) mumu ( n +1) dndρ b ρ b mv ( m +1) vnd ( n +1) d mumu b (1,0)(0,0) (1,1)(0,1) (1,2)(0,2) (1, n )(0, n ) ...... ...... ρ b pq ρ b pq ρ b pqρ b p n qd d nd (1, n -1)(0, n -1) ρ u ρ u mv ( m +1) v ( n +1) σ b σ u nσ b σ u ρ u p n qρ u pq ρ u pq ρ u pqd d ndσ b σ u σ b σ u nσ b σ u ( , , n ) v ( , , n )( , , n + )( , , n ’) u ( , , n ’- )( , , n ’) uρ b slow stateslow statefast states d fast statesslow states ( i , ,n ) ( i ’, , n ’) q ( i , , n ) , ( i ’, , n ’) c multiscale decimation Fig. 2.

Full and reduced Markovian models of autoregulatory gene networks. (a) Transition diagram of thefull Markovian model, where the microstate of the gene is described by an ordered triple ( i, m, n ) . (b) Transitiondiagram of the reduced Markovian model when mRNA decays much faster than protein, where the microstate ofthe gene is described by an ordered pair ( i, n ) . (c) Schematic diagram of the decimation method of multiscalemodel simplification. The effective transition rate from microstate ( i, , n ) to microstate ( i (cid:48) , , n (cid:48) ) is the sum ofthe direct transition rate and the contribution of indirect transitions via all fast transition paths. (d) A typical fasttransition path of the full model. The remaining question is to determine the transition diagram and effective transition rates of thereduced Markovian model. This process is described as follows. Suppose that the full model jumps frommicrostate ( i, m, n ) to another microstate at a particular time. When λ (cid:29) , the transition probabilityfrom ( i, m, n ) to another microstate ( i (cid:48) , m (cid:48) , n (cid:48) ) is given by w ( i,m,n ) , ( i (cid:48) ,m (cid:48) ,n (cid:48) ) = lim λ →∞ q ( i,m,n ) , ( i (cid:48) ,m (cid:48) ,n (cid:48) ) q ( i,m,n ) . Let ( i , m , n ) , · · · , ( i k , m k , n k ) be a sequence of microstates. We say that c : ( i, m, n ) → ( i , m , n ) → · · · → ( i k , m k , n k ) → ( i (cid:48) , m (cid:48) , n (cid:48) ) is a fast transition path from ( i, m, n ) to ( i (cid:48) , m (cid:48) , n (cid:48) ) if the intermediate microstates ( i , m , n ) , · · · , ( i k , m k , n k ) are all fast states. Moreover, the probability weight w c of the fast transition path c is defined as w c = q ( i,m,n ) , ( i ,m ,n ) w ( i ,m ,n ) , ( i ,m ,n ) · · · w ( i k ,m k ,n k ) , ( i (cid:48) ,m (cid:48) ,n (cid:48) ) . According to decimation theory [29–33], the effective transition rate ˜ q ( i,n ) , ( i (cid:48) ,n (cid:48) ) from microstate ( i, n ) to microstate ( i (cid:48) , n (cid:48) ) is given by ˜ q ( i,n ) , ( i (cid:48) ,n (cid:48) ) = q ( i, ,n ) , ( i (cid:48) , ,n (cid:48) ) + (cid:88) c w c , where c ranges over all fast transition paths from ( i, , n ) to ( i (cid:48) , , n (cid:48) ) . This formula indicates that theeffective transition rate from ( i, n ) to ( i (cid:48) , n (cid:48) ) is the superposition of two parts: the direct transition rateand the contribution of indirect transitions via all fast transition paths, as depicted in Fig. 2(c). ince the intermediate microstates of a fast transition path c are all fast states, in order for the pathto have a positive probability weight, all the intermediate transitions along this path must satisfy lim λ →∞ q ( i ,m ,n ) , ( i ,m ,n ) = · · · = lim λ →∞ q ( i k ,m k ,n k ) , ( i (cid:48) ,m (cid:48) ,n (cid:48) ) = ∞ . (1)By using this criterion, it is easy to see that the full model has only one type of fast transition paths withpositive probability weight, which is given by (see the red arrows in Fig. 2(d)) ( i, , n ) → ( i, , n ) → ( i, , n + 1) → · · · → ( i, , n (cid:48) ) → ( i, , n (cid:48) ) , n (cid:48) > n. (2)This is because any fast transition path from ( i, , n ) to ( i, , n (cid:48) ) with positive probability weight cannotpass through microstate ( i, m, k ) for some m ≥ and k ≥ . Otherwise, there must be an intermediatetransition along the path from ( i, m − , k ) to ( i, m, k ) with transition rate being ρ u or ρ b , which doesnot diverge as λ → ∞ . This contradicts the criterion (1). Moreover, since the intermediate transitionrates along the path (2) are all given by u or v , and u, v → ∞ as λ → ∞ , it follows that the path (2)satisfies the criterion (1).To proceed, we define two constants p and q as p = uu + v , q = vu + v . Since λ (cid:29) , the transition probabilities along the path (2) are given by w ( i, ,n ) , ( i, ,n +1) = lim λ →∞ uq ( i, ,n ) = p, w ( i, ,n ) , ( i, ,n ) = lim λ →∞ vq ( i, ,n ) = q. Therefore, the probability weight of this path is ρ u p n (cid:48) − n q if i = 0 and is ρ b p n (cid:48) − n q if i = 1 . Sincethere is no direct transition, the effective transition rate from ( i, n ) to ( i, n (cid:48) ) with n (cid:48) > n is exactly theindirect transition rate via the fast transition path (2) ˜ q ( i,n ) , ( i,n (cid:48) ) =  ρ u p n (cid:48) − n q, i = 0 ,ρ b p n (cid:48) − n q, i = 1 . This formula indicates that the reduced model may produce large jumps of protein number within a veryshort period, which correspond to random translational bursts. Each random burst corresponds to a fasttransition path of the full Markovian model. The above computations can be understood intuitively asfollows. Since v (cid:29) d and u/v is finite, the process of protein synthesis followed by mRNA degradationis essentially instantaneous. Once a transcript is synthesized, it can either produce a protein moleculewith probability p = u/ ( u + v ) or be degraded with probability q = 1 − p = v/ ( u + v ) . Thus, theprobability that a transcript produces k copies of protein before it is finally degraded will be p k q , whichfollows a geometric distribution. Note that since the protein burst size is geometrically distributed, itsexpected value is given by B = ∞ (cid:88) k =0 kp k q = pq = uv . So far, we have obtained the transition diagram and all effective transition rates of the reducedmodel, as depicted in Fig. 2(b). Let p i,n denote the probability of having n copies of protein when the ene is in state i . Then the evolution of the reduced model is governed by the coupled set of masterequations  ˙ p ,n = n − (cid:88) k =0 ρ u p n − k qp ,k + ( n + 1) dp ,n +1 + σ u p ,n − − ( ρ u p + nd + nσ b ) p ,n , ˙ p ,n = n − (cid:88) k =0 ρ b p n − k qp ,k + ( n + 1) dp ,n +1 + ( n + 1) σ b p ,n +1 − ( ρ b p + nd + σ u ) p ,n . (3)This is exactly the CME of the following chemical reaction system (see Fig. 1(d) for an illustration): G + P σ b −−→ G ∗ , G ∗ σ u −−→ G + P,G ρ u p k q −−−−→ G + kP, G ∗ ρ b p k q −−−→ G ∗ + kP, k ≥ , (4) P d −−→ ∅ . This reaction scheme takes into account both translational bursting resulting from short-lived mRNAand protein fluctuations during the binding-unbinding process, i.e. when a protein molecule binds tothe promoter, the protein number in a single cell is decreased by one and conversely it is increasedby one when unbinding occurs. In fact, the reduced model modifies a model of autoregulatory genenetworks proposed by Kumar et al. [24]. In the Kumar model, the authors also take into accounttranslational bursting but neglect protein fluctuations during the binding-unbinding process, i.e. whena protein molecule binds to or unbinds from the promoter, the protein number remains the same. Inthe following, we shall refer to the reduced model as the modified Kumar model. This model was firstintroduced (using intuitive arguments) and studied using approximation methods in [34]. A method toefficiently infer the parameters of this model from noisy experimental data by means of MCMC hasbeen described in [35].

Here we derive the exact analytic solution for the steady-state protein number distribution of themodified Kumar model. For convenience, we normalize all model parameters by the protein degradationrate as r = ρ u d , s = ρ b d , µ = σ b d , b = σ u d . At steady-state, all probabilities are time-independent and thus Eq. (3) can be rewritten as  n − (cid:88) k =0 rp n − k qp ,k + ( n + 1) p ,n +1 + bp ,n − − ( rp + n + nµ ) p ,n = 0 , n − (cid:88) k =0 sp n − k qp ,k + ( n + 1) p ,n +1 + ( n + 1) µp ,n +1 − ( sp + n + b ) p ,n = 0 . (5)To proceed, we define a pair of generating functions f ( z ) = ∞ (cid:88) n =0 p ,n z n , g ( z ) = ∞ (cid:88) n =0 p ,n z n . et p n = p ,n + p ,n denote the probability of having n copies of protein and let F ( z ) = f ( z ) + g ( z ) denote its generating function. Then Eq. (5) can be converted into the following system of ordinarydifferential equations (ODEs): [ sp (1 − z ) + b (1 − pz )] f ( z ) − (1 − z )(1 − pz ) f (cid:48) ( z ) − µ (1 − pz ) g (cid:48) ( z ) = 0 , (6) rp (1 − z ) g ( z ) − [(1 − z ) − µz ](1 − pz ) g (cid:48) ( z ) − bz (1 − pz ) f ( z ) = 0 . (7)Solving these equations leads to (see Appendix A for details) f ( z ) = rµpK ( q + µ ) β (1 − pz ) − s F ( α + 1 , α + 1; β + 1; w ( z − z )) ,F ( z ) = K (1 − pz ) − s (cid:20) F ( α , α ; β ; w ( z − z ))+ A (1 − az ) F ( α + 1 , α + 1; β + 1; w ( z − z )) (cid:21) , (8)where F ( α , α ; β ; z ) is the Gaussian hypergeometric function, α + α = b + r µ − s − , α α = s + b ( r − s ) − r µ , β = b µ + rµp (1 + µ )( q + µ ) , (9) A = ( s − r + rµ ) p ( q + µ ) β , a = s − r + sµs − r + rµ , w = (1 + µ ) pq + µ , z = 11 + µ , and K = q s [ F ( α , α ; β ; w (1 − z )) + A (1 − a ) F ( α + 1 , α + 1; β + 1; w (1 − z ))] − is a normalization constant that can be determined by solving F (1) = 1 . Recall that the steady-statedistribution of protein number can be obtained by taking the derivatives of the generating function F ( z ) at zero: p n = 1 n ! d n dz n (cid:12)(cid:12)(cid:12) z =0 F ( z ) . Taking the derivatives of the generating function given by Eq. (8) yields p n = K n (cid:88) k =0 ( α ) k ( α ) k ( s ) n − k ( β ) k (1) k (1) n − k F ( α + k, α + k ; β + k ; − wz ) w k p n − k + KA n (cid:88) k =0 ( α + 1) k ( α + 1) k ( s ) n − k ( β + 1) k (1) k (1) n − k F ( α + 1 + k, α + 1 + k ; β + 1 + k ; − wz ) w k p n − k − KAa n − (cid:88) k =0 ( α + 1) k ( α + 1) k ( s ) n − − k ( β + 1) k (1) k (1) n − − k F ( α + 1 + k, α + 1 + k ; β + 1 + k ; − wz ) w k p n − − k . (10) We next focus on two trivial special cases. In the case of σ u (cid:29) σ b , ρ u ρ b , d , the gene is mostly inthe unbound state and the parameters in Eq. (9) reduce to α = β = b µ , α = r − s, A = 0 . Since α = β , we have (see Eq. 15.4.6 in [36]) F ( α , α ; β ; w ( z − z )) = [1 − w ( z − z )] − α = (cid:18) µq + µ (cid:19) s − r (1 − pz ) s − r , nd thus the generating function reduces to F ( z ) = K (cid:18) µq + µ (cid:19) s − r (1 − pz ) − r . Therefore, the protein number distribution is negative binomial and given by [27, 37] p n = ( r ) n n ! p n q r , (11)where ( x ) n = x ( x + 1) · · · ( x + n −

1) = Γ( x + n ) / Γ( x ) is the Pochhammer symbol. Similarly, in thecase of σ b (cid:29) σ u , ρ u ρ b , d , the gene is mostly in the bound state and the protein number distribution isnegative binomial and given by p n = ( s ) n n ! p n q s . (12)Our analytic results can also be used to derive explicit expressions for several other quantities ofinterest. For example, the steady-state probability that the gene is in the bound state can be recoveredfrom the generating function f ( z ) at z = 1 , p G ∗ = rµpq − s K ( q + µ ) β F ( α + 1 , α + 1; β + 1; w (1 − z )) . (13)In addition, solving Eqs. (6) and (7) simultaneously for f (cid:48) ( z ) and g (cid:48) ( z ) , we obtain zf (cid:48) ( z ) + g (cid:48) ( z ) = spz − pz f ( z ) + rp − pz g ( z ) . Substituting z = 1 in this equation yields F (cid:48) (1) = sBf (1) + rBg (1) . where B = p/q is the mean protein burst size. Since (cid:104) n (cid:105) = F (cid:48) (1) , the steady-state mean of proteinnumber is given by (cid:104) n (cid:105) = sBp G ∗ + rBp G , (14)where p G is the steady-state probability of the gene being in the unbound state. The first term on theright-hand side is the mean protein number sB when the gene is in the bound state multiplied by thecorresponding probability p G ∗ and similarly the second term is the mean protein number rB when thegene is in the unbound state multiplied by the corresponding probability p G . We stress here that theprotein mean of the stochastic model given by Eqs. (13) and (14) is not generally the same as thatobtained by solving the corresponding deterministic rate equations in steady-state conditions. Similarly,the steady-state second moment of protein number is given by (see Appendix B for details) (cid:104) n (cid:105) = sB [(1 + s ) B + 1] p G ∗ + rB [(1 + r ) B + 1] p G + (1 − sB + rB ) (cid:20) σ u σ b p G ∗ − rBp G (cid:21) . (15)This is the sum of three terms: the first term is the second moment of the negative binomial distributionassociated with the bound gene state, i.e. the second moment of Eq. (12), multiplied by the correspond-ing probability; the second term is the second moment of the negative binomial distribution associatedwith the unbound gene state, i.e. the second moment of Eq. (11), multiplied by the correspondingprobability; the last term can hence be interpreted as that arising from the difference between the exactprobability distribution and a mixture of two negative binomials (this term becomes negligible in thelimit of slow gene switching as we show in Section 3.3). .2 Regime of fast gene switching We next focus on two nontrivial special cases. Consider the limiting case when the gene switchesrapidly between the unbound and bound states, i.e. σ u , σ b (cid:29) ρ u , ρ b , d . In this case, the modified Kumarmodel depicted in Fig. 1(d) can be further simplified using another classical simplification method ofmultiscale Markov jump processes called averaging [33, 38]. Since σ u and σ b are large, for any n ≥ ,the two microstates (0 , n ) and (1 , n − are in rapid equilibrium and thus can be aggregated into agroup that is labeled by group n , as shown in Fig. 3(a). In addition, group 0 is composed of the singlemicrostate (0 , . In this way, the modified Kumar model can be further simplified to the Markovianmodel illustrated in Fig. 3(b), whose state space is given by { group , group , · · · , group n, · · · } . Here we emphasize that the group index n cannot be interpreted as the protein number. This is becausewhen n ≥ , group n is composed of two microstates with different protein numbers. a (1,0)(0,0) (1,1)(0,1) (1,2)(0,2) (1, n )(0, n ) ...... ...... ρ b pq ρ b pq ρ b pqρ b p n qd d nd (1, n -1)(0, n -1) ρ u p n qρ u pq ρ u pq ρ u pqd d ndσ b σ u σ b σ u nσ b σ u multiscale averaging (large σ u and σ b ) group group n group n ... ... c pq c pq c n -1 pqc p n q n -1 d d nd n b c (1,0)(0,0) (1,1)(0,1) (1,2)(0,2) (1, n )(0, n ) ...... ...... ρ b pq ρ b pq ρ b pqρ b p n qd d nd (1, n -1)(0, n -1) ρ u p n qρ u pq ρ u pq ρ u pqd d ndσ b σ u σ b σ u nσ b σ u multiscale averaging (small σ u and σ b ) unbound group bound group rBσ b σ u d Fig. 3.

Multiscale simplification of the modified Kumar model using averaging method under fast and slowgene switching. (a) Transition diagram of the modified Kumar model. When the gene switches rapidly between theunbound and bound states, for each n ≥ , the two microstates (0 , n ) and (1 , n − can be combined into a groupthat is labelled by group n . (b) Transition diagram of the group dynamics in the limit of fast gene switching. Sincegroup n is composed of two microstates with different protein numbers, the group index n cannot be interpretedas the protein number. (c) Transition diagram of the modified Kumar model. When the gene switches slowlybetween the unbound and bound states, all unbound microstates (0 , n ) , as well as all bound microstates (1 , n ) ,can be combined into a group. (d) Transition diagram of the two-state group dynamics in the limit of slow geneswitching. The remaining question to address is how to calculate the effective transition rates between twogroups. In the fast switching limit, the two microstates (0 , n ) and (1 , n − will reach a quasi-steady-state with quasi-steady-state distribution p qss (0 ,n ) = σ u σ u + nσ b , p qss (1 ,n − = nσ b σ u + nσ b . For convenience, we define the effective transcription rate as c n = σ u ρ u + nσ b ρ b σ u + nσ b (16)and the effective protein degradation rate as d n = d (cid:20) − σ b σ u + nσ b (cid:21) . (17) t is important to here emphasize that fast switching leads to effective transcription and degradationrates whereas it is customary to write reduced master equations in this parameter regime which onlyhave effective transcription rates; this explains the discrepancies between conventional and exact masterequation reduction reported in [39].According to averaging theory [33, 38], the effective transition rate from group n to group n + k isgiven by p qss (0 ,n ) ˜ q (0 ,n ) , (0 ,n + k ) + p qss (1 ,n − ˜ q (1 ,n − , (1 ,n − k ) = c n p k q and the effective transition rate from group n to group n − is given by p qss (0 ,n ) ˜ q (0 ,n ) , (0 ,n − + p qss (1 ,n − ˜ q (1 ,n − , (1 ,n − = nd n . So far, we have obtained all effective transition rates for the group dynamics (Fig. 3(b)).Let p groupn denote the probability of being in group n . Then the evolution of the group dynamics isgoverned by the master equation ˙ p groupn = n − (cid:88) k =1 c k p n − k qp groupk + ( n + 1) d n +1 p groupn +1 − ( c n p + nd n ) p groupn . To solve this equation, we notice that it is recursive with respect to the group index n . It involves twovariables when n = 1 , three variables when n = 2 , and so on. At steady-state, solving this masterequation by induction leads to p groupn = K p n n ! c d · c + d d · · · c n − + ( n − d n − d n , where K is a normalization constant. Given that there are n copies of protein in an individual cell, thegene can exist in either microstate (0 , n ) or microstate (1 , n ) . Since microstate (0 , n ) is contained ingroup n and microstate (1 , n ) is contained in group n +1 , the steady-state distribution of protein numberis given by p n = p groupn p qss (0 ,n ) + p groupn +1 p qss (1 ,n ) = K ( p/d ) n n ! c ( c + d ) · · · ( c n − + ( n − d n − ) (cid:20) σ b p ( c n + nd n ) σ u d (cid:21) . (18)In fact, this steady-state protein distribution is exactly the same as that obtained from the generatingfunction given by Eq. (8) in the fast switching limit. To see this, we note that when σ u , σ b (cid:29) ρ u , ρ b , d ,the generating function reduces to F ( z ) = K (1 − pz ) − s (cid:20) F ( α , α ; β ; pz ) + rpβ (cid:16) − sr z (cid:17) F ( α + 1 , α + 1; β + 1; pz ) (cid:21) , where α + α = bµ − s − , α α = s + b ( r − s ) µ , β = bµ . To proceed, we recall the following Euler-Pfaff transformation for hypergeometric functions: F ( α , α ; β ; z ) = (1 − z ) β − α − α F ( β − α , β − α ; β ; z ) . Using this transformation, the generating function can be simplified as F ( z ) = K (cid:20) (1 − pz ) F ( β − α , β − α ; β ; pz ) + rpβ (cid:16) − sr z (cid:17) F ( β − α , β − α ; β + 1; pz ) (cid:21) . herefore, the steady-state protein distribution can be recovered taking the derivatives of the generatingfunction at zero: p n = rK p n ( β − α ) n ( β − α ) n n !( β + 1) n (cid:20) pβ + β + n ( β − α + n − β − α + n − (cid:21) . (19)Straightforward computations show that ( β − α + n − β − α + n − β + n = c n + nd n d . (20)This equality, together with the fact that r = c /d , shows that r ( β − α ) n ( β − α ) n ( β + 1) n = c d · c + d d · · · c n + nd n d . (21)Inserting Eqs. (20) and (21) into Eq. (19) again yields Eq. (18). We next consider the limiting case when the gene switches slowly between the unbound and boundstates, i.e. σ u , σ b (cid:28) ρ u , ρ b , d . In this case, the modified Kumar model can also be simplified using theaveraging method [33, 38]. Since σ u and σ b are small, all unbound microstates (0 , n ) , as well as allbound microstates (1 , n ) , are in rapid equilibrium and thus can be aggregated into a group, as shownin Fig. 3(c). In this way, the modified Kumar model can be further simplified to the Markovian modelillustrated in Fig. 3(d), which has only two states.In the slow switching limit, it follows from Eqs. (11) and (12) that all unbound microstates (0 , n ) will reach a quasi-steady-state with quasi-steady-state distribution p qss (0 ,n ) = ( r ) n n ! p n q r , and all bound microstates (1 , n ) will reach a quasi-steady-state with quasi-steady-state distribution p qss (1 ,n ) = ( s ) n n ! p n q s . According to averaging theory [33, 38], the effective transition rate from the unbound group to thebound group is given by ∞ (cid:88) n =1 p qss (0 ,n ) ˜ q (0 ,n ) , (1 ,n − = rBσ b and the effective transition rate from the bound group to the unbound group is given by ∞ (cid:88) n =0 p qss (1 ,n ) ˜ q (1 ,n ) , (0 ,n +1) = σ u . So far, we have obtained all effective transition rates for the two-state group dynamics (Fig. 3(d)).Let p G and p G ∗ denote the steady-state probabilities of being in the unbound and bound groups,respectively. Clearly, we have p G = σ u σ u + rBσ b , p G ∗ = rBσ b σ u + rBσ b . herefore, the steady-state distribution of protein number is given by p n = p G p qss (0 ,n ) + p G ∗ p qss (1 ,n ) = σ u σ u + rBσ b ( r ) n n ! p n q r + rBσ b σ u + rBσ b ( s ) n n ! p n q s , which is a mixed negative binomial distribution. Since a negative binomial distribution is unimodal, themixture of two negative binomials can yield a bimodal steady-state protein distribution. It is not hard toshow that this steady-state protein distribution is exactly the same as that obtained from the generatingfunction given by Eq. (8) in the slow switching limit. The details are omitted here. Note that in the slowswitching limit, we have σ u p G ∗ = rBσ b p G , implying that the third term in Eq. (15) vanishes leavingonly the terms associated with the conditional negative binomials of each gene state. In single-cell experiments, the size of fluctuations around the protein mean is often measured bythe squared coefficient of variation η = σ / (cid:104) n (cid:105) , where (cid:104) n (cid:105) is the mean and σ = (cid:104) n (cid:105) − (cid:104) n (cid:105) is thevariance. From Eqs. (14) and (15), the steady-state variance of protein number is given by σ = (1 + B ) (cid:104) n (cid:105) + ( s − r ) B p G ∗ p G + (1 − sB + rB )( Lp G ∗ − rBp G ) . where L = σ u /σ b . Therefore, the size of protein number fluctuations can be decomposed into threeterms as η = 1 + B (cid:104) n (cid:105) + η + + η − , where η + = ( s − r ) B p G ∗ p G ( sBp G ∗ + rBp G ) , η − = (1 − sB + rB )( Lp G ∗ − rBp G )( sBp G ∗ + rBp G ) . (22)Here (1 + B ) / (cid:104) n (cid:105) is the noise due to unregulated bursty gene expression which has been previouslyderived in the literature [40]. In the presence of autoregulation, two additional terms η + and η − emerge.In Fig. 4 we show how η + , η − , their sum η + + η − , and the total noise η vary as a function of the geneswitching rates σ u and σ b for positive (upper panels) and negative (lower panels) feedback loops.Clearly, η + is always positive. Since ≤ p G ∗ , p G ≤ and p G ∗ + p G = 1 , the term η + has thefollowing lower and upper bounds: ≤ η + ≤ ( s − r ) sr . When L (cid:29) or L (cid:28) , we have p G ≈ or p G ∗ ≈ . In this case, the lower bound is achieved and η + vanishes. Moreover, the upper bound is achieved when p G ∗ = ss + r . In this case, the ratio L of σ u and σ b is neither too small nor too large. In other words, in order to obtaina large η + , log σ b − log σ u must be controlled within a “narrow” belt (Fig. 4(a)).We next focus on the term η − . In the slow switching limit, we have Lp G ∗ = rBp G and thus η − vanishes (Fig. 4(b)). In the fast switching limit, however, the pair of reversible reactions G + P (cid:10) G ∗ are in rapid equilibrium and thus the following approximation is appropriate: (cid:104) n (cid:105) p G ≈ Lp G ∗ . a log σ u l og σ b l og σ b log σ u log σ u log σ u positivenegativeη + η - η + +η - η (total noise) -5 -4 -3 -2 -1-5-4-3-2-10123 -0.35-0.3-0.25-0.2-0.15-0.1-0.05 -5 -4 -3 -2 -1 0 1 2 3 4 -5-4-3-2-10123 -0.0500.050.10.150.20.250.3 -5 -4 -3 -2 -1 0 1 2 3 4 -5-4-3-2-10123 -5 -4 -3 -2 -1 0 1 2 3 4-5-4-3-2-10123 -5 -4 -3 -2 -1 0 1 2 3 4 -5-4-3-2-10123 -5 -4 -3 -2 -1 0 1 2 3 4-5-4-3-2-10123 -0.25-0.2-0.15-0.1-0.050 -5 -4 -3 -2 -1 0 1 2 3 4-5-4-3-2-10123 -5 -4 -3 -2 -1 0 1 2 3 4 b c Fig. 4.

Dependence of steady-state protein noise on gene switching rates . Upper panels are for positivefeedback and lower panels for negative feedback. In (a),(b),(c) and (d) we show the dependence of η + , η − , η + + η − and η , respectively, on the gene switching rates, σ u and σ b . In the positive feedback case, the modelparameters are ρ u = 5 , ρ b = 15 , d = 1 , p = 0 . while in the negative feedback case, the model parameters are ρ u = 15 , ρ b = 5 , d = 1 , p = 0 . . This approximation, together with Eq. (14), shows that Lp G ∗ − rBp G ≈ ( (cid:104) n (cid:105) − rB ) p G = ( s − r ) Bp G ∗ p G . (23)In addition, we note that since | s − r | B , i.e. the absolute difference between the protein mean in the twogenetic states, is usually larger than 1 in living cells, then the signs of − sB + rB and ( r − s ) B aretypically the same. Therefore, in most biologically relevant cases, η − is the product of two terms withdifferent signs and is thus negative in the regime of fast gene switching (Fig. 4(b)).In previous works, it has been shown that for an unregulated gene, the protein noise only containsthe Poisson noise and mRNA noise [40]. Therefore, the sum of η + and η − characterizes the contributionof an autoregulatory feedback loop to protein fluctuations. In the slow switching limit, η − vanishes andthus the overall feedback contribution is η + . In the fast switching limit, η − is strictly negative and thusthe overall feedback contribution η + + η − is less than that in the slow switching limit. From Eq. (23),in the fast switching limit, η − counteracts most of η + and the remaining term is given by η + + η − ≈ ( s − r ) Bp G ∗ p G ( sBp G ∗ + rBp G ) , (24)which is proportional to s − r . Consequently, the overall feedback contribution is positive (negative) forpositive (negative) feedback loops. This clearly shows that in the regime of fast gene switching, positivefeedback enhances protein noise and negative feedback diminishes it (Fig. 4(c)). This is consistent withrecent theoretical results obtained in [6]. In the fast switching limit, comparing Eqs. (22) and (24), weobtain η + ≈ ( s − r ) B ( η + + η − ) . Since | s − r | B is the absolute difference between the protein mean in the two genetic states, the overallfeedback contribution in the fast switching limit is much smaller than that in the slow switching limitwhen the protein mean is relatively large (Fig. 4(c)).The total noise η is then the superposition of the unregulated contribution (1 + B ) / (cid:104) n (cid:105) and theregulated contribution η + + η − . When L (cid:29) , the gene is mostly in the unbound state and thus the rotein mean has a negative binomial distribution with mean rB and variance rB ( B + 1) . In this case,the total noise is given by η = rB ( B + 1)( rB ) = B + 1 rB . Similarly, when L (cid:28) , the gene is mostly in the bound state and thus the protein mean has a negativebinomial distribution with mean sB and variance sB ( B + 1) . In this case, the total noise is given by η = sB ( B + 1)( sB ) = B + 1 sB . This explains why the upper-left and lower-right corners in Fig. 4(d) have different colors. Due to largefeedback contribution, the total noise is also large in the regime of slow gene switching, as can be seenfrom the lower-left corner in Fig. 4(d). In the regime of fast gene switching, feedback regulation givesrise to a weaker enhancement or suppression of the total noise, depending on the sign of the feedbackloop.

To validate our analytic solution given by Eq. (10), we compare it with the numerical solutionobtained using the stochastic simulation algorithm (SSA) for both positive and negative feedback loopsin the regimes of slow (Fig. 5(a),(c)) and fast (Fig. 6(a),(c)) gene switching. Clearly, the analyticsolution coincides perfectly with the SSA. In the regime of slow gene switching, both positive andnegative feedback loops can lead to bistable gene expression. In the positive (negative) feedback case,the low expression peak becomes higher as the feedback strength σ b decreases (increases) and the highexpression peak becomes higher as σ b increases (decreases) (Fig. 5(a),(c)). -3 -2 -1-4-3-2-10 0 1 2-3 -2 -1-4-3-2-10 a b p r obab ili t y c d p r obab ili t y l og σ b l og σ b log σ u protein numberprotein numberprotein number Fig. 5.

Steady-state protein number distribution in the regime of slow gene switching . (a) Comparison of theanalytic solution given by Eq. (10) (solid blue curve) with the the numerical solution obtained using the SSA(red circles) for positive feedback loops. The model parameters are chosen as ρ u = 10 , ρ b = 50 , σ u = 0 . , d =1 , p = 0 . . The feedback strength is chosen as σ b = 2 × − (left), × − (middle), and . × − (right).(b) Strength of bistability κ defined by Eq. (25) versus the gene switching rates σ u and σ b for positive feedbackloops. The model parameters in (b) are chosen to be the same as in (a). (c) Comparison of the analytic solution(solid blue curve) with the SSA (red circles) for negative feedback loops. The model parameters are chosen as ρ u = 50 , ρ b = 10 , σ u = 0 . , d = 1 , p = 0 . . The feedback strength is chosen as σ b = 7 × − (left), × − (middle), and × − (right). (d) Strength of bistability versus the gene switching rates for negative feedbackloops. The model parameters in (d) are chosen to be the same as in (c). Note that the region designated as bimodalis that satisying the criterion κ > . o gain a deeper insight into bimodal gene expression, we define the strength of bistability as κ = h low − h valley h high , (25)where h low and h high are the heights of the low and high expression peaks, respectively, and h valley is theheight of the valley between them. Obviously, κ is a quantity between 0 and 1 for bimodal distributionsand is set to be 0 for unimodal distributions. In general, to display strong bistability, the followingtwo conditions are necessary: (i) the two peaks should have similar heights and (ii) there should bea deep valley between the two peaks. The former ensures that the time periods spent in the low andhigh expression states are comparable while the latter guarantees that the two expression levels aredistinguishable. Clearly, κ is large if the two conditions are both satisfied and κ is small if any oneof the two conditions is violated. Therefore, κ captures both features of bistability and serves as aneffective indicator that characterizes its strength.Using this definition, we investigate how the strength of bistability κ varies with the gene switchingrates σ u and σ b for positive (Fig. 5(b)) and negative (Fig. 5(d)) feedback loops when the low expressionmode is away from zero. It can be seen that both positive and negative feedback loops can only producebistability in the regime of slow gene switching. When all model parameters are fixed (except a possibleinterchange between ρ u and ρ b ), a positive feedback loop requires a larger feedback strength to achievebistability than a negative feedback loop.The situation is totally different when the low expression mode lies at zero – see Fig. 6. Interest-ingly, we find that in this case, a positive feedback loop can produce strong bistability under fast geneswitching. This is nontrivial because when gene switching is fast, the effective transcription rate c n isa Hill-like function with Hill coefficient being equal to 1. In this case, there is no cooperativity in theprotein-promoter binding process and the conventional deterministic theory predicts that bistability cannever occur (see Appendix C for a proof of this result). However, our stochastic model predicts that apositive feedback loop is capable of bistable behaviour when gene switching is fast even in the absenceof cooperative binding. Note that this is not an artefact of the assumptions used to derive the modifiedKumar model since simulations verify that it is also a property of the full model. On the other hand,according to our simulations, a negative feedback loop fails to achieve bistable behaviour in the regimeof fast gene switching whether the low expression mode is away from zero (Fig. 5(d)) or lies at zero(Fig. 6(d)). This is consistent with the deterministic prediction. Consider the limiting case when ρ u , ρ b → ∞ and p → , while keeping ρ u p = ¯ ρ u and ρ b p = ¯ ρ b asconstants. Since the mean protein burst size is B = p/ (1 − p ) and the effective protein production ratesin the two gene states are ρ u B and ρ b B , the limits considered above can be interpreted as the limit ofnegligible mean protein burst size, i.e. B → , taken at constant effective protein production rates. Inthis case since q = 1 − p , we have ρ u pq → ¯ ρ u , ρ b pq → ¯ ρ b ,ρ u p n q → ρ b p n q → , n ≥ , -3 -2 -1 3210-5-4-3-2-1012 -3 -2 -1 3210 c d p r obab ili t y ba p r obab ili t y log σ u protein numberprotein numberprotein number l og σ b l og σ b unimodalbimodalbimodal unimodal Fig. 6.

Steady-state protein number distribution in the regime of fast gene switching . (a) Comparison of theanalytic solution given by Eq. (10) (solid blue curve) with the SSA (red circles) for positive feedback loops. Themodel parameters are chosen as ρ u = 0 . , ρ b = 50 , σ u = 1000 , d = 1 , p = 0 . . The feedback strength is chosenas σ b = 10 . (left), . (middle), and . (right). (b) Strength of bistability versus the gene switching ratesfor positive feedback loops. The model parameters in (b) are chosen to be the same as in (a). (c) Comparison of theanalytic solution (solid blue curve) with the SSA (red circles) for negative feedback loops. The model parametersare chosen as ρ u = 50 , ρ b = 0 . , σ u = 1000 , d = 1 , p = 0 . . The feedback strength is chosen as σ b = 0 . (left), (middle), and (right). (d) Strength of bistability versus the gene switching rates for negative feedback loops.The model parameters in (d) are chosen to be the same as in (c). and thus the master equation of the modified Kumar model given by Eq. (3) reduces to the masterequation (cid:40) ˙ p ,n = ¯ ρ u p ,n − + ( n + 1) dp ,n +1 + σ u p ,n − − ( ¯ ρ u + nd + nσ b ) p ,n , ˙ p ,n = ¯ ρ b p ,n − + ( n + 1) dp ,n +1 + ( n + 1) σ b p ,n +1 − ( ¯ ρ b + nd + σ u ) p ,n . This is exactly the CME of the following chemical reaction system (see Fig. 1(a) for an illustration): G + P σ b −−→ G ∗ , G ∗ σ u −−→ G + P,G ¯ ρ u −−→ G + P, G ∗ ¯ ρ b −−→ G ∗ + P,P d −−→ ∅ . This reaction scheme coincides with the classical model of autoregulatory non-bursty gene networksproposed by Grima et al. [9]. The Grima model takes into account changes in protein number duringthe binding-unbinding process but neglects translational bursting, i.e. it assumes that protein moleculesare produced one at a time. Here we have derived the Grima model rigorously as the non-bursty limitof the modified Kumar model which itself is the fast mRNA decaying limit of the full model illustratedin Fig. 1(c).To derive the analytic distribution for the Grima model, we normalize all model parameters by theprotein degradation rate as ¯ r = ¯ ρ u d , ¯ s = ¯ ρ b d , µ = σ b d , b = σ u d . Recall that when α → ∞ and z → , while keeping α z as a constant, the Gaussian hypergeometricfunction has the following limit F ( α , α ; β ; z ) → F ( α ; β ; α z ) , here F ( α ; β ; z ) is the confluent hypergeometric function. Taking ρ u , ρ b → ∞ and p → in Eq. (8)and applying the above formula, we obtain f ( z ) = ¯ rµK (1 + µ ) β e ¯ sz F ( α + 1; β + 1; w ( z − z )) ,F ( z ) = Ke ¯ sz (cid:20) F ( α ; β ; w ( z − z )) + A (1 − az ) F ( α + 1; β + 1; w ( z − z )) (cid:21) , where α = b (¯ r − ¯ s )¯ r − ¯ s − ¯ sµ − , β = b µ + ¯ rµ (1 + µ ) ,A = ¯ s − ¯ r + ¯ rµ (1 + µ ) β , a = ¯ s − ¯ r + ¯ sµ ¯ s − ¯ r + ¯ rµ , w = ¯ r µ − ¯ s, z = 11 + µ , and K = e − ¯ s [ F ( α ; β ; w (1 − z )) + A (1 − a ) F ( α + 1; β + 1; w (1 − z ))] − is a normalization constant that can be determined by solving F (1) = 1 . This is fully consistent withthe results obtained in [9]. The steady-state protein distribution for the Grima model can be obtained bytaking the derivatives of the generating function F ( z ) at zero: p n = K n (cid:88) k =0 ( α ) k ( β ) k (1) k (1) n − k F ( α + k ; β + k ; − wz ) w k s n − k + KA n (cid:88) k =0 ( α + 1) k ( β + 1) k (1) k (1) n − k F ( α + 1 + k ; β + 1 + k ; − wz ) w k s n − k − KAa n − (cid:88) k =0 ( α + 1) k ( β + 1) k (1) k (1) n − − k F ( α + 1 + k ; β + 1 + k ; − wz ) w k s n − − k . The modified Kumar model solved in the present paper can also be compared with the classicalmodel of an autoregulatory bursty gene circuit proposed by Kumar et al. [24]: G + P σ b −−→ G ∗ + P, G ∗ σ u −−→ G,G ρ u p k q −−−−→ G + kP, G ∗ ρ b p k q −−−→ G ∗ + kP, k ≥ ,P d −−→ ∅ . The Kumar model takes into account translational bursting but neglects protein fluctuations during thebinding-unbinding process, i.e. when a protein molecule binds to the promoter, the protein numberremains the same. The dynamics of this reaction scheme can be described by the Markovian modelillustrated in Fig. 7(a).It has been shown in [24] that the generating function of the Kumar model is given by F ( z ) = K (1 − pz ) − s F ( α , α ; β ; w ( z − z )) , (1,0)(0,0) (1,1)(0,1) (1,2)(0,2) (1, n )(0, n ) ...... ...... ρ b pq ρ b pq ρ b pqρ b p n qd d nd (1, n -1)(0, n -1) ρ u p n qρ u pq ρ u pq ρ u pqd d ndσ u multiscale averaging (large σ u and σ b ) group group n group n ... ... c pq c pq c n -1 pqc p n q n -1 d d nd b σ b σ u σ b σ u nσ b σ u c multiscale averaging (small σ u and σ b ) unbound group bound group rBσ b σ u d (1,0)(0,0) (1,1)(0,1) (1,2)(0,2) (1, n )(0, n ) ...... ...... ρ b pq ρ b pq ρ b pqρ b p n qd d nd (1, n -1)(0, n -1) ρ u p n qρ u pq ρ u pq ρ u pqd d ndσ u σ b σ u σ b σ u nσ b σ u Fig. 7.

Multiscale simplification of the Kumar model using averaging method. (a) Transition diagram of theKumar model. When the gene switches rapidly between the unbound and bound states, the two microstates (0 , n ) and (1 , n ) can be combined into a group that is labelled by group n . (b) Transition diagram of the group dynamicsin the limit of fast gene switching. Since group n is composed of two microstates with the same protein number,the group index n can be interpreted as the protein number. (c) Transition diagram of the Kumar model. When thegene switches slowly between the unbound and bound states, all unbound microstates (0 , n ) , as well as all boundmicrostates (1 , n ) , can be combined into a group. (d) Transition diagram of the two-state group dynamics in thelimit of slow gene switching. where K is a normalization constant and α + α = b + r µ − s, α α = b ( r − s )1 + µ , β = b µ + rµp (1 + µ )( q + µ ) , (26) w = (1 + µ ) pq + µ , z = 11 + µ . We can see that the generating function of the Kumar model has only one hypergeometric term but, aswe proved earlier, the generating function of the modified Kumar model is the superposition of twohypergeometric terms. Comparing Eqs. (9) and (26), we find that the parameters β , w , and z for thetwo models are exactly the same, while the parameters α and α for the two models are different.We next focus on the limiting case when the gene switches slowly between the unbound and boundstates, i.e. σ u , σ b (cid:28) ρ u , ρ b , d . Using the averaging method, the Kumar model can be simplified to thesame two-state Markovian dynamics (compare Fig. 7(c),(d) with Fig. 3(c),(d)). Thus, the Kumar modelleads to the same steady-state protein distribution as the modified Kumar model in the slow switchinglimit: p n = σ u σ u + rBσ b ( r ) n n ! p n q r + rBσ b σ u + rBσ b ( s ) n n ! p n q s . In other words, the Kumar and modified Kumar models share the same steady-state behaviour whengene switching is slow. Fig. 8(a)-(c) illustrate the steady-state protein distributions for the two modelsin different regimes of gene switching. It can be seen that the two models agree with each other verywell when gene switching is slow, while they disagree, as expected, when gene switching rates aremoderate or large.

There is still another case when the Kumar model agrees with the modified Kumar model. In thecase of σ u (cid:29) σ b , d , we have b (cid:29) µ and thus all the five parameters α , α , β, w, z for the twomodels are the same. Moreover, since β (cid:29) , we have A ≈ in Eqs. (8)-(9) and thus the generating p r obab ili t y protein number b protein number p r obab ili t y c protein number p r obab ili t y r e l a t i v e s en s i t i v i t y r e l a t i v e s en s i t i v i t y r e l a t i v e s en s i t i v i t y -3-2-10123 ρ u ρ b σ u σ b d B-1-0.500.511.5 ρ u ρ b σ u σ b d B -3-2-101234 ρ u ρ b σ u σ b d B10 15 20 2500.020.040.060.080.10.12 50 00.020.040.060.080.10.12 10 15 20 2550 00.050.10.150.20.25 10 15 20 2550 Fig. 8.

Comparison between the steady-state behaviour for the Kumar and modified Kumar models underdifferent gene switching rates . (a)-(c) Steady-state protein distributions (up) and relative sensitivities of proteinnoise over various model parameters (down) for the modified Kumar (blue) and Kumar (red) models. (a) Regimeof slow gene switching. (b) Regime of moderate gene switching . (c) Regime of fast gene switching. Themodel parameters are chosen as ρ u = 0 . , ρ b = 8 , d = 1 , p = 0 . . The gene switching rates are chosen as σ u = 10 − , σ b = 8 × − in (a), σ u = 1 , σ b = 80 in (b), and σ u = 10 , σ b = 8 × in (c). functions for the two models also coincide with each other. Hence in this case, the two models lead to thesame steady-state protein distribution, which can be obtained by taking the derivative of the generatingfunction at z = 0 : p n = n (cid:88) k =0 ( α ) k ( α ) k ( s ) n − k ( β ) k (1) k (1) n − k F ( α + k, α + k ; β + k ; − wz ) F ( α , α ; β ; w (1 − z )) w k p n − k q s . (27)We next focus on the limiting case when the gene switches rapidly between the unbound and boundstates. Since σ u , σ b (cid:29) ρ u , ρ b , d , the two microstates (0 , n ) and (1 , n ) of the Kumar model are inrapid equilibrium and thus can be aggregated into a group, as shown in Fig. 7(a). Similarly, using theaveraging method, the Kumar model can be simplified to the Markovian model illustrated in Fig. 7(b),where c n is the effective transcription rate defined in Eq. (16).There are two differences between the Kumar and modified Kumar models in the fast switchinglimit. For the modified Kumar model, both an effective transcription rate c n and an effective proteindegradation rate d n < d should be introduced (Fig. 3(b)). For the Kumar model, however, only aneffective transcription rate c n should be introduced. In addition, unlike the modified Kumar model,the group index n in the Kumar model can be interpreted as the protein number since each group iscomposed of two microstates with the same protein number. The steady-state protein distribution forthe Kumar model in the fast switching limit is given by [6, 18] p n = K ( p/d ) n n ! c ( c + d ) · · · ( c n − + ( n − d ) , (28)where K is a normalization constant.In the fast switching limit, we have seen that the two models lead to the same steady-state proteindistribution given by Eq. (27) when σ b (cid:28) σ u . However, the two models may deviate significantly from ach other when σ b (cid:29) σ u or when σ b and σ u are comparable. In particular, when σ b (cid:29) σ u , it followsfrom Eq. (12) that the steady-state protein distribution for the modified Kumar model is the negativebinomial distribution p n = ( s ) n n ! p n q s . Moreover, note that the effective transcription rate c n has the following approximation when σ b (cid:29) σ u : c = ρ u , c n = ρ b , n ≥ . Inserting these equations into Eq. (28), we find that the steady-state protein distribution for the Kumarmodel is given by a zero-inflated or zero-deflated negative binomial distribution p n = γδ ( n ) + (1 − γ ) ( s ) n n ! p n q s , where δ ( n ) is Kronecker’s delta which takes the value of 1 when n = 0 and takes the value of 0otherwise, and γ = s − rs + r ( q − s − (29)is a constant. We make a crucial observation that γ > for a positive feedback loop and γ < for anegative feedback loop. Therefore, when σ b (cid:29) σ u , the Kumar model leads to a zero-inflated negativebinomial protein distribution in the positive feedback case and leads to a zero-deflated negative binomialprotein distribution in the negative feedback case.To validate our theoretical analysis, we compare the steady-state protein distributions obtained us-ing the SSA for the two models in the regime of fast gene switching (Fig. 9). Clearly, the two modelsagree with each other perfectly when σ b (cid:28) σ u , but they fail as predicted when σ b (cid:29) σ u . In the lattercase, there is an apparent zero-inflation or zero-deflation in protein number, depending on the type offeedback loop. In the positive (negative) feedback case, the Kumar model has a much higher (lower)probability of having zero protein copy than the modified Kumar model, and the probability of havingnonzero protein copies for the Kumar model is lower (higher).From Eq. (14), we can see that there are two ways of increasing the steady-state protein mean:increasing the transcription rates ρ u and ρ b or increasing the mean burst size B = p/q . In either case,the term q − s in Eq. (29) becomes larger and thus the constant γ becomes closer to zero. When theprotein mean is very large, γ is almost zero and thus the zero-inflation/deflation effect becomes almostinvisible. In other words, the Kumar and modified Kumar models share similar steady-state behaviourwhen protein mean is large. Here we investigate the relative sensitivities of protein noise to various model parameters for theKumar and modified Kumar models. Recall that the relative sensitivity of protein noise with respect toa parameter s is defined as Λ( s ) = ∂ log η/∂ log s , which means that 1% change in s leads to Λ( s ) %change in protein noise [41].Since the two models coincide with each other in the regime of slow gene switching, the relativesensitivities of protein noise for the two models should be the same. This prediction is validated by thenumerical simulations in Fig. 8(a)-(c), where the relative sensitivities of protein noise with respect toall model parameters for the two models are compared in different regimes of gene switching. It is clear rotein number c d p r obab ili t y a p r obab ili t y protein number b positive feedback negative feedback σ b << σ u σ b >> σ u p r obab ili t y p r obab ili t y u ρ b σ u σ b d B-2-1.5-1-0.500.511.522.5 ρ u ρ b σ u σ b d B -1-0.8-0.6-0.4-0.200.20.40.60.81 ρ u ρ b σ u σ b d B r e l a t i v e s en s i t i v i t y r e l a t i v e s en s i t i v i t y r e l a t i v e s en s i t i v i t y r e l a t i v e s en s i t i v i t y -0.8-0.6-0.4-0.200.20.40.60.81 ρ u ρ b σ u σ b d B00.020.040.060.080.10.12 10 15 20 25 30 35 4050 00.050.10.150.20.25 10 15 2050 Fig. 9.

Comparison of the steady-state behaviour for the Kumar and modified Kumar models under fastgene switching . (a),(b) Steady-state protein distributions (left) and relative sensitivities of protein noise to all modelparameters (right) for the modified Kumar (blue) and Kumar (red) models when σ b (cid:28) σ u . (a) Positive feedbackloops. (b) Negative feedback loops. The model parameters are chosen as σ u = 10 , σ b = 10 , d = 1 , p = 0 . . Thetranscription rates in the two gene states are chosen as ρ u = 0 . , ρ b = 20 in (a) and ρ u = 2 . , ρ b = 0 . in (b).(c),(d) Steady-state protein distributions (left) and relative sensitivities of protein noise to all model parameters(right) for the modified Kumar (blue) and Kumar (red) models when σ b (cid:29) σ u . (c) Positive feedback loops.(d) Negative feedback loops. The model parameters are chosen as σ u = 10 , σ b = 10 , d = 1 , p = 0 . . Thetranscription rates in the two gene states are chosen as ρ u = 0 . , ρ b = 3 in (c) and ρ u = 15 , ρ b = 0 . in (d). the two models lead to the same relative sensitivities when gene switching is slow, while they fail toagree when gene switching rates are moderate or large.In the regime of fast gene switching, we also compare the relative sensitivities of protein noise forthe two models in positive and negative feedback cases. The relative sensitivities for the two modelsagree with each other when protein unbinding is much faster than protein binding (Fig. 9(a),(b)), whilethey disagree when protein binding is much faster than protein unbinding (Fig. 9(c),(d)). In the positivefeedback case, the Kumar model has larger relativity sensitivities than the modified Kumar model. Inthe negative feedback case, however, the modified Kumar model has larger relativity sensitivities tomost of the model parameters compared to the Kumar model. An interesting observation is that therelative sensitivity with respect to a parameter can have a different sign in one model compared to theother model. One such example is the relative sensitivity with respect to B in the negative feedbackcase with σ b (cid:29) σ u ; in this case, an increased burst size enlarges protein noise for the Kumar model butdiminishes protein noise for the modified Kumar model (Fig. 9(d)). In this paper, starting from a stochastic model of an autoregulatory genetic circuit with both mRNAand protein descriptions (the full model), we use the multiscale decimation method to obtain a reducedmodel with only the protein description, which we refer to as the modified Kumar model. This modeltakes into account both translational bursting resulting from short-lived mRNA and protein fluctuationsstemming from the binding/unbinding reactions with the promoter. Hence it generalizes two classicalmodels of an autoregulatory feedback loop proposed in previous papers: the Kumar model [24], whichtakes into account translational bursting but neglects protein fluctuations during the binding-unbindingprocess, as well as the Grima model [9], which takes into account protein fluctuations during the binding- nbinding process but neglects translational bursting.We have solved the CME of the modified Kumar model to obtain an exact expression for the steady-state protein number distribution in terms of Gaussian hypergeometric functions. Using the multiscaleaveraging method, we also obtain the analytic protein distributions in the limits of fast and slow geneswitching. In the regime of slow gene switching, the steady-state protein distribution is a mixture oftwo negative binomials and thus can lead to bistable gene expression for both positive and negativefeedback loops. We show that a positive feedback loop requires a larger feedback strength to achievebistability than a negative feedback loop. Interestingly, we discover that a positive feedback loop canproduce noise-induced bistability in the regime of fast gene switching even in the absence of cooperativebinding, while a negative feedback loop cannot.Moreover, we examine steady-state fluctuations in protein abundance in detail. Based on the exactsolutions of steady-state protein mean and protein variance, we decompose protein noise, measuredby the squared coefficient of variation, into three terms. The first term characterizes fluctuations dueto unregulated bursty gene expression which has been reported in previous work [27, 37, 40]. Theother two terms emerge due to the presence of autoregulation, one is positive and one is negative inmost biologically relevant situations, which collectively characterize the overall feedback contributionto protein noise. When gene switching is slow, the positive term is large and the negative terms vanishes.In this situation, the overall feedback contribution is large, which results in large protein noise in theregime of slow gene switching. When gene switching is fast, however, the negative term counteractsmost of the positive term and thus the overall feedback contribution is much weaker, with its signdepending on the type of feedback loop.We further study the relationship between the modified Kumar model and the other two classicalstochastic models of autoregulation. In the limit of small mean burst size, we have proved that themodified Kumar model reduces to the Grima model. In addition, we show that the modified Kumarmodel shares the same steady-state behaviour as the Kumar model under slow gene switching. Inthe regime of fast gene switching, however, the two models agree with each other when the bindingrate of the protein to the promoter is much smaller than the unbinding rate, but deviate significantlyfrom each other when the binding rate is much larger than the unbinding rate. In the latter case, theKumar model yields an apparent zero-inflation (zero-deflation) in protein number for positive (negative)feedback loops. We have also shown that the relative sensitivities of protein noise with respect to allmodel parameters, as predicted by the modified Kumar and Kumar models, are typically considerablydifferent in magnitude and in some cases also in sign. These differences as well as the artificial zero-inflation/deflation effect for the Kumar model become increasingly significant as the protein meandecreases, and hence our modified Kumar model provides a more accurate description of fluctuationsin the low protein number regime.In this paper we have obtained the steady-state solution of the modified Kumar model. Potentialextensions that are currently under investigation include the derivation of approximate results for thetime-dependent protein number distribution for both constant and time-varying transcription rates. Acknowledgements

C. J. acknowleges support from startup funds provided by the Beijing Computational Science Re-search Center. R. G. acknowledges support from the Leverhulme Trust (RPG-2018-423). ppendixA Exact solution of the generating function equations In the main text, we have shown that the generating functions of the modified Kumar model satisfythe following system of ODEs: (cid:40) [ sp (1 − z ) + b (1 − pz )] f − (1 − z )(1 − pz ) f (cid:48) − µ (1 − pz ) g (cid:48) = 0 ,rp (1 − z ) g − [(1 − z ) − µz ](1 − pz ) g (cid:48) − bz (1 − pz ) f = 0 . (30)Adding the above two equations gives (1 − pz ) f (cid:48) − [ b (1 − pz ) + sp ] f + (1 + µ )(1 − pz ) g (cid:48) − rpg = 0 . (31)Taking the derivative of this equation yields (1 − pz ) f (cid:48)(cid:48) − [ b (1 − pz ) + (1 + s ) p ] f (cid:48) + bpf + (1 + µ )(1 − pz ) g (cid:48)(cid:48) − (1 + r + µ ) pg (cid:48) = 0 . (32)To proceed, we set f ( z ) = (1 − pz ) − s h ( z ) . It is easy to verify that f (cid:48) = (1 − pz ) − s h (cid:48) + sp (1 − pz ) − s − h, (33) f (cid:48)(cid:48) = (1 − pz ) − s h (cid:48)(cid:48) + 2 sp (1 − pz ) − s − h (cid:48) + s ( s + 1) p (1 − pz ) − s − h. Combining Eqs. (31) and (33) shows that µg (cid:48) = − (1 − z )(1 − pz ) − s h (cid:48) + b (1 − pz ) − s h. (34)Taking the derivative of this equation yields µg (cid:48)(cid:48) = − (1 − z )(1 − pz ) − s h (cid:48)(cid:48) − sp (1 − z )(1 − pz ) − s − h (cid:48) + (1 − pz ) − s h (cid:48) + b (1 − pz ) − γ h (cid:48) + bsp (1 − pz ) − s − h. (35)Inserting Eqs. (33), (34), and (35) into Eq. (32), we find that h satisfies the second-order ODE a ( z ) h (cid:48)(cid:48) + b ( z ) h (cid:48) − c ( z ) = 0 , where a ( z ) = (1 + µ )( z − z )(1 − pz ) ,b ( z ) = [1 + p + b + µ + rp − sp ] − [( b + r ) + (2 − s )(1 + µ )] pz,c ( z ) = b (1 + r − s ) p. Note that we have defined z = 11 + µ . Since a ( z ) is a quadratic function of z , b ( z ) is a linear function of z , and c ( z ) is a constant function, theabove second-order ODE is a hypergeometric differential equation whose solution is given by h ( z ) = ˜ K F ( α + 1 , α + 1; β + 1; w ( z − z )) , here ˜ K is a constant and α + α = b + r µ − s − , α α = s − r µ + b ( r − s )1 + µ ,β = b µ + rµp (1 + µ )( q + µ ) , w = (1 + µ ) pq + µ . Therefore, we obtain f ( z ) = ˜ K (1 − pz ) − s F ( α + 1 , α + 1; β + 1; w ( z − z )) . (36)Moreover, it follows from Eq. (31) that rpg = (1 − pz ) f (cid:48) − [ b (1 − pz ) + sp ] f + (1 + µ )(1 − pz ) g (cid:48) . This equation, together with the first equation of Eq. (30), implies that rµpg = (1 + µ )( z − z )(1 − pz ) f (cid:48) − s (1 + µ ) p ( z − z ) f + b (1 − pz ) f. This shows that rµpF = rµpf + rµpg = (1 + µ )( z − z )(1 − pz ) f (cid:48) − s (1 + µ ) p ( z − z ) f + b (1 − pz ) f + rµpf. (37)It follows from Eq. (36) that f (cid:48) ( z ) = ˜ K (cid:20) ( α + 1)( α + 1) β + 1 w (1 − pz ) − s H + sp (1 − px ) − s − H (cid:21) , (38)where H = F ( α + 1 , α + 1; β + 1; w ( z − z )) , H = F ( α + 2 , α + 2; β + 2; w ( z − z )) . Inserting Eq. (36) and Eq. (38) into Eq. (37) yields rµp ˜ K F = (1 + µ ) w ( α + 1)( α + 1) β + 1 ( z − z )(1 − pz ) − s H + b (1 − px ) − s H + rµp (1 − px ) − s H . (39)Recall the following important property of Gaussian hypergeometric functions (see Eq. 15.5.19 in [36]) z (1 − z )( α + 1)( α + 1) F ( α + 2 , α + 2; β + 2; w ( z − z ))+ [ β − ( α + α + 1) z ]( β + 1) F ( α + 1 , α + 1; β + 1; w ( z − z )) − β ( β + 1) F ( α , α ; β ; w ( z − z )) = 0 . This equality implies that w (1 + wz ) ( α + 1)( α + 1) β + 1 ( z − z )(1 − pz ) H = βH − [ β − ( α + α + 1) w ( z − z )] H , (40)where H = F ( α , α ; β ; w ( z − z )) . nserting Eq. (40) into Eq. (39) yields rµp ˜ K (1 − pz ) s F = ( q + µ ) βH + ( C − Dz ) H , where C = b + rµp − ( q + µ )[ β + ( α + α + 1) wz ] ,D = bp − ( q + µ )( α + α + 1) w. Straightforward calculations show that C and D can be simplified as C = ( s − r ) + rµ ( q + µ ) β , D = ( s − r ) + sµ ( q + µ ) β . Therefore, we finally obtain F ( z ) = ˜ Krµp (1 − pz ) − s (cid:20) ( q + µ ) β F ( α , α ; β ; w ( z − z ))+ ( C − Dz ) F ( α + 1 , α + 1; β + 1; w ( z − z )) (cid:21) . B Derivation of the expression for the second moment of protein noise

Here we shall derive the analytic expression for the steady-state second moment of protein numberfluctuations given by Eq. (15). Substituting z = 1 in Eq. (6), we obtain µg (cid:48) (1) = bf (1) . (41)Taking the derivative of Eq. (6) yields ( s + b ) pf − [(1 + s ) p (1 − z ) + (1 + b )(1 − pz )] f (cid:48) − µpg (cid:48) + (1 − z )(1 − pz ) f (cid:48)(cid:48) + µ (1 − pz ) g (cid:48)(cid:48) = 0 . (42)Substituting z = 1 in this equation gives µg (cid:48)(cid:48) (1) = − ( s + b ) Bf (1) + (1 + b ) f (cid:48) (1) + µBg (cid:48) (1) . (43)Moreover, taking the derivative of Eq. (7) yields b (1 − pz ) f + rpg + bz (1 − pz ) f (cid:48) − [(1 + r ) p (1 − z ) + (1 + µ )(1 − pz ) − µpz ] g (cid:48) + [(1 − z ) − µz ](1 − pz ) g (cid:48)(cid:48) = 0 . (44)Substituting z = 1 in this equation gives µg (cid:48)(cid:48) (1) = b (1 − B ) f (1) + rBg (1) + bf (cid:48) (1) − [(1 + µ ) − µB ] g (cid:48) (1) . (45)Combining Eqs. (43) and (45), we obtain f (cid:48) (1) + (1 + µ ) g (cid:48) (1) = ( sB + b ) f (1) + rBg (1) . Inserting Eq. (41) into this equation yields µf (cid:48) (1) = ( sBµ − b ) f (1) + rBµg (1) . (46) n the other hand, taking the derivative of Eq. (42) yields s + b ) pf (cid:48) − [(2 + s ) p (1 − z ) + (2 + b )(1 − pz )] f (cid:48)(cid:48) − µpg (cid:48)(cid:48) + (1 − z )(1 − pz ) f (cid:48)(cid:48)(cid:48) + µ (1 − pz ) g (cid:48)(cid:48)(cid:48) = 0 . Substituting z = 1 in this equation gives µg (cid:48)(cid:48)(cid:48) (1) = − s + b ) Bf (cid:48) (1) + (2 + b ) f (cid:48)(cid:48) (1) + 2 µBg (cid:48)(cid:48) (1) . (47)Moreover, taking the derivative of Eq. (44) yields − bpf + 2 b (1 − pz ) f (cid:48) + 2(1 + r + µ ) pg (cid:48) + bz (1 − pz ) f (cid:48)(cid:48) − [(2 + r ) p (1 − z ) + 2(1 + µ )(1 − pz ) − µpz ] g (cid:48)(cid:48) + [(1 − z ) − µz ](1 − pz ) g (cid:48)(cid:48)(cid:48) = 0 . Substituting z = 1 in this equation gives µg (cid:48)(cid:48)(cid:48) (1) = − bBf (1) + 2 b (1 − B ) f (cid:48) (1) + 2(1 + r + µ ) Bg (cid:48) (1) + bf (cid:48)(cid:48) (1) − µ ) − µB ] g (cid:48)(cid:48) (1) . (48)Combining Eqs. (47) and (48), we obtain f (cid:48)(cid:48) (1) + (1 + µ ) g (cid:48)(cid:48) (1) = − bBf (1) + [(1 + s ) B + b ] f (cid:48) (1) + (1 + r + µ ) Bg (cid:48) (1) . (49)Inserting Eq. (45) into this equation yields F (cid:48)(cid:48) (1) = − bf (1) − rBg (1) + (1 + s ) Bf (cid:48) (1) + [(1 + r ) B + 1 + µ ] g (cid:48) (1) . Moreover, inserting Eqs. (41) and (46) into this equation, we obtain µF (cid:48)(cid:48) (1) = s (1 + s ) B µf (1) + r (1 + r ) B µg (1) + (1 − sB + rB )[ bf (1) − rBµg (1)] . It follows from Eq. (14) that F (cid:48) (1) = (cid:104) n (cid:105) = sBf (1) + rBg (1) . Therefore, we have F (cid:48)(cid:48) (1) + F (cid:48) (1) = sB [(1 + s ) B + 1] f (1) + rB [(1 + s ) B + 1] g (1) + (1 − sB + rB ) (cid:20) bµ f (1) − rBg (1) (cid:21) . Since (cid:104) n (cid:105) = F (cid:48)(cid:48) (1) + F (cid:48) (1) , we finally obtain Eq. (15) in the main text. C Proof that the deterministic theory only predicts monostability

In the main text, we have shown that a noisy autoregulatory gene network can perform bistability.Here we show that the traditional deterministic theory fails to predict bistability in any parameter region.According to the chemical reaction scheme of the modified Kumar model (4) and the law of massaction, the deterministic theory of the modified Kumar model is given by the following set of coupledODEs: (cid:40) ˙ g = σ b x (1 − g ) − σ u g, ˙ x = − σ b x (1 − g ) + σ u g + ρ u B (1 − g ) + ρ b Bg − dx, here g is the mean number of genes in the bound state and x is the mean protein number. In steady-stateconditions, the gene number equilibrates to g = σ b xσ u + σ b x . Substituting this equation into the time-evolution equation for the protein numbers, we obtain ˙ x = c ( x ) B − dx, where c ( x ) = ρ u σ u + ρ b σ b xσ u + σ b x is the effective transcription rate defined in Eq. (16). Fig. 10 shows graphs of the functions y = c ( x ) B and y = dx , whose intersections give the fixed points of the deterministic model. Clearly, there is onlyone intersection in the negative feedback case, which means that the deterministic model has only onefixed point, which is an attractor (Fig. 10(a)). In the positive feedback case, however, there are one ortwo intersections, depending on whether ρ u vanishes or not. If ρ u (cid:54) = 0 , the deterministic model hasonly one fixed point that is an attractor (Fig. 10(b)). If ρ u = 0 , the deterministic model has two fixedpoints: one is an attractor which is away from zero and the other is an repeller which lies exactly at zero(Fig. 10(c)). In all cases, there is only one attractor, which implies that the deterministic theory of themodified Kumar model does not allow bistability for any choice of model parameters. b c stable stableunstable y = dxy = dx y = c ( x ) By = c ( x ) B a stable y = dxy = c ( x ) B Fig. 10.

Fixed points of the mean-field approximation of the modified Kumar model under fast promoterswitching . (a)-(c) The graphs of the functions y = c ( x ) B which describes protein synthesis (blue) and y = dx which describes protein degradation (red). The intersections of the two functions give the fixed points of thedeterministic model. (a) Negative feedback loops. (b) Positive feedback loops with ρ b (cid:54) = 0 . (c) Positive feedbackloops with ρ b = 0 . In (a)-(c), the model parameters are chosen as σ u = σ b = d = 1 , p = 0 . . The transcriptionrates in the two gene states are chosen as ρ u = 10 , ρ b = 1 in (a), ρ u = 1 , ρ b = 6 in (b), and ρ u = 0 , ρ b = 6 in (c). References [1] Shen-Orr, S. S., Milo, R., Mangan, S. & Alon, U. Network motifs in the transcriptional regulation network ofEscherichia coli.

Nature genetics , 64 (2002).[2] Rosenfeld, N., Elowitz, M. B. & Alon, U. Negative autoregulation speeds the response times of transcriptionnetworks. J. Mol. Biol. , 785–793 (2002).[3] Simpson, M. L., Cox, C. D. & Sayler, G. S. Frequency domain analysis of noise in autoregulated gene circuits.

Proceedings of the National Academy of Sciences , 4551–4556 (2003).[4] Hasty, J., Pradines, J., Dolnik, M. & Collins, J. J. Noise-based switches and amplifiers for gene expression.

Proceedings of the National Academy of Sciences , 2075–2080 (2000).

5] Liu, P., Yuan, Z., Wang, H. & Zhou, T. Decomposition and tunability of expression noise in the presence ofcoupled feedbacks.

Chaos: An Interdisciplinary Journal of Nonlinear Science , 043108 (2016).[6] Jia, C., Xie, P., Chen, M. & Zhang, M. Q. Stochastic fluctuations can reveal the feedback signs of generegulatory networks at the single-molecule level. Sci. Rep. , 16037 (2017).[7] Jia, C., Qian, H., Chen, M. & Zhang, M. Q. Relaxation rates of gene expression kinetics reveal the feedbacksigns of autoregulatory gene networks. J. Chem. Phys. , 095102 (2018).[8] To, T.-L. & Maheshri, N. Noise can induce bimodality in positive transcriptional feedback loops withoutbistability.

Science , 1142–1145 (2010).[9] Grima, R., Schmidt, D. & Newman, T. Steady-state fluctuations of a genetic feedback loop: An exact solution.

J. Chem. Phys. , 035104 (2012).[10] Ko, C. H. et al.

Emergence of noise-induced oscillations in the central circadian pacemaker.

PLoS biology ,e1000513 (2010).[11] Thomas, P., Straube, A. V. & Grima, R. The slow-scale linear noise approximation: an accurate, reducedstochastic description of biochemical networks under timescale separation conditions. BMC systems biology , 39 (2012).[12] Thomas, P., Straube, A. V., Timmer, J., Fleck, C. & Grima, R. Signatures of nonlinearity in single cellnoise-induced oscillations. Journal of theoretical biology , 222–234 (2013).[13] Forger, D. B. & Peskin, C. S. Stochastic simulation of the mammalian circadian clock.

Proceedings of theNational Academy of Sciences , 321–324 (2005).[14] Guerriero, M. L. et al.

Stochastic properties of the plant circadian clock.

Journal of The Royal SocietyInterface , 744–756 (2011).[15] Holehouse, J., Cao, Z. & Grima, R. Stochastic modeling of auto-regulatory genetic feedback loops: a reviewand comparative study. arXiv preprint arXiv:1910.08937 (2019).[16] Friedman, N., Cai, L. & Xie, X. S. Linking stochastic dynamics to population distribution: an analyticalframework of gene expression. Phys. Rev. Lett. , 168302 (2006).[17] Thomas, P., Matuschek, H. & Grima, R. How reliable is the linear noise approximation of gene regulatorynetworks? BMC genomics , S5 (2013).[18] Mackey, M. C., Tyran-Kaminska, M. & Yvinec, R. Dynamic behavior of stochastic gene expression modelsin the presence of bursting. SIAM J. Appl. Math. , 1830–1852 (2013).[19] Bokes, P. & Singh, A. Protein copy number distributions for a self-regulating gene in the presence of decoybinding sites. PloS one , e0120555 (2015).[20] Thomas, P., Popovi´c, N. & Grima, R. Phenotypic switching in gene regulatory networks. Proc. Natl. Acad.Sci. USA , 6994–6999 (2014).[21] Ge, H., Qian, H. & Xie, X. S. Stochastic phenotype transition of a single cell in an intermediate region ofgene state switching.

Physical review letters , 078101 (2015).[22] Jia, C., Zhang, M. Q. & Hong, Q. Emergent L´evy behavior in single-cell stochastic gene expression.

Phys.Rev. E , 040402(R) (2017).[23] Jia, Chen, G. G. Y. & Zhang, M. Q. Single-cell stochastic gene expression kinetics with coupled positive-plus-negative feedback. Phys. Rev. E , 052406 (2019).[24] Kumar, N., Platini, T. & Kulkarni, R. V. Exact distributions for stochastic gene expression models withbursting and feedback.

Phys. Rev. Lett. , 268105 (2014).[25] Yu, J., Xiao, J., Ren, X., Lao, K. & Xie, X. S. Probing gene expression in live cells, one protein molecule at atime.

Science , 1600–1603 (2006).[26] Taniguchi, Y. et al.

Quantifying E. coli proteome and transcriptome with single-molecule sensitivity in singlecells.

Science , 533–538 (2010).[27] Shahrezaei, V. & Swain, P. S. Analytical distributions for stochastic gene expression.

Proc. Natl. Acad. Sci.USA , 17256–17261 (2008).[28] Bernstein, J. A., Khodursky, A. B., Lin, P.-H., Lin-Chao, S. & Cohen, S. N. Global analysis of mRNA decay nd abundance in Escherichia coli at single-gene resolution using two-color fluorescent DNA microarrays. Proc. Natl. Acad. Sci. USA , 9697–9702 (2002).[29] Jia, C. Simplification of Markov chains with infinite state space and the mathematical theory of random geneexpression bursts. Phys. Rev. E , 032402 (2017).[30] Pigolotti, S. & Vulpiani, A. Coarse graining of master equations with fast and slow states. J. Chem. Phys. , 154114 (2008).[31] Jia, C. Reduction of Markov chains with two-time-scale state transitions.

Stochastics , 73–105 (2016).[32] Jia, C. Simplification of irreversible Markov chains by removal of states with fast leaving rates. J. Theor. Biol. , 129–137 (2016).[33] Bo, S. & Celani, A. Multiple-scale stochastic processes: decimation, averaging and beyond.

Phys. Rep. ,1–59 (2016).[34] Cao, Z. & Grima, R. Linear mapping approximation of gene regulatory networks with stochastic dynamics.

Nat. Commun. , 3305 (2018).[35] Cao, Z. & Grima, R. Accuracy of parameter estimation for auto-regulatory transcriptional feedback loopsfrom noisy data. Journal of The Royal Society Interface , 20180967 (2019).[36] Olver, F. W. J. et al. NIST Digital Library of Mathematical Functions. http://dlmf.nist.gov/, Release 1.0.17 of2017-12-22 (2017).[37] Paulsson, J. & Ehrenberg, M. Random signal fluctuations can reduce random fluctuations in regulatedcomponents of chemical regulatory networks.

Phys. Rev. Lett. , 5447 (2000).[38] Jia, C. Model simplification and loss of irreversibility. Phys. Rev. E , 052149 (2016).[39] Holehouse, J. & Grima, R. Revisiting the reduction of stochastic models of genetic feedback loops with fastpromoter switching. Biophys. J. , 1311 (2019).[40] Ozbudak, E. M., Thattai, M., Kurtser, I., Grossman, A. D. & Van Oudenaarden, A. Regulation of noise in theexpression of a single gene.

Nat. Genet. , 69 (2002).[41] Ingalls, B. Sensitivity analysis: from model parameters to system behaviour. Essays in biochemistry ,177–194 (2008).,177–194 (2008).