aa r X i v : . [ m a t h . P R ] D ec On the bursting of gene products
Romain Yvinec ∗ and Alexandre F. Ramos † Institut Camille Jordan UMR 5208 Universit´e Claude Bernard Lyon1 43 Bd du 11 novembre 1918 69622 Villeurbanne cedex France Escola de Artes, Ciˆencias e Humanidades,Universidade S˜ao Paulo, Av. Arlindo B´ettio,1000, CEP. 03828-000, S˜ao Paulo, SP, Brazil ‡ Abstract
In this article we demonstrate that the so-called bursting production of molecular species duringgene expression may be an artifact caused by low time resolution in experimental data collectionand not an actual burst in production. We reach this conclusion through an analysis of a two-stageand binary model for gene expression, and demonstrate that in the limit when mRNA degradationis much faster than protein degradation they are equivalent. The negative binomial distributionis shown to be a limiting case of the binary model for fast “on to off” state transitions and highvalues of the ratio between protein synthesis and degradation rates. The gene products populationincreases by unity but multiple times in a time interval orders of magnitude smaller than proteinhalf-life or the precision of the experimental apparatus employed in its detection. This rare-and-fastone-by-one protein synthesis has been interpreted as bursting. usual mechanism for the synthesis ofgene products [5, 6]. As we show here, the inference of bursting molecular production maybe an artifact due to a lack of sufficiently fine temporal resolution in experimental data.Also, from a modeling perspective the inference of bursting may be flawed due to a relianceon the shape of stationary probability distributions rather than an analysis of the underlyingdynamical processes giving rise to them.The experimental observation of these jumps in molecular numbers has motivated sev-eral models for the prediction and fitting of observed data, e.g. by employing a Langevinapproach (continuous) or the master equation (discrete). In the continuous case, stationarygamma distributions for molecular concentrations are predicted along with discontinuoustrajectories for the corresponding stochastic process [7, 8]. In the discrete framework, burstsappear in models for gene expression with two stochastic variables (so called two-stage mod-els with mRNA and protein), in the limit where the mRNA degradation rate is much largerthat the degradation rate for protein (a common experimental finding). In these cases, themodel predicted probability distributions are well described by a negative binomial proba-bility distribution [9] and simulations exhibit temporal bursting in protein numbers [6].In this article we use a discrete modeling framework to show that the bursting limit ac-tually corresponds to a particular regime of a model of a switching gene between “on” and“off” states with a one step variation in the stochastic variable corresponding to proteinnumbers [10–12]. The model we develop here is an approximation to a model which is inte-grable in both the stationary [12, 13] and time-dependent regimes [14, 15] with symmetries2nderlying the existence of analytical solution, and whose biological implications have beenexplored elsewhere [16, 17].The steady state solution for the binary model, in the limit of rapid transitions from the“on” to the “off” state, approaches a negative binomial distribution that also describes thebursting model. Simulation of the binary model shows apparent bursting, but examined ona finer time scale reveals the protein numbers actually increase in a unitary fashion. Thisclearly suggests that the experimental detection of bursting in gene product numbers maybe due to lack of temporal resolution in the data. Our results give clear guidelines for theconditions on the transcription and translation processes for artifactual bursts to appear. Assuch, the modeling establishes necessary conditions on the experimental temporal resolutionnecessary to establish the existence of true bursting.Gene expression is a cascade of first transcription to produce mRNA followed by trans-lation of that mRNA to protein. Thus it makes sense to take the system state variablesto be the numbers of mRNA and protein molecules in a given cell. mRNA molecules areproduced at a rate that is dependent on the interaction between the RNA polymerase andthe promoter site. The number of mRNA molecules in the cell and their interactions withribosomes controls the protein synthesis. In this framework, self regulation is introduced inthe model by considering the transcription rate to be dependent on the number of proteinmolecules produced. It is worth noting that specific regulation of the state of the promotersite (active or repressed) is not taken into account on this model.This picture for non-regulated genes has been treated in the literature previously [18].Let m and n denote, respectively, the number of mRNA and protein molecules. Theprobability of finding the system in a state ( m, n ), m, n ≥
0, at time t is denoted by P m,n ( t ),while the synthesis rates for mRNA and protein are denoted by µ M , µ M and ν P , and thecorresponding degradation rates are ρ M and ρ P . Then the evolution of the probability isgoverned by a master equation for two coupled birth-death processes: dP m,n dt = ( µ M + µ M n )( P m − ,n − P m,n )+ ν P m ( P m,n − − P m,n )+ ρ M [( m + 1) P m +1 ,n − P m,n ]+ ρ P [( n + 1) P m,n +1 − nP m,n ] , (1)where we have assumed that the transcription rate is a function of the protein number ( µ M n ),3ndicating positive self regulation, with the requirement that µ M = 0 . We have assumed alinear dependence between the protein translation rate ( ν P ) and the number of availablemRNA molecules in the cytoplasm.We have been unable to construct an analytic solutionto the complete system of Eq. (1). However, exact quadrature is achieved in the limitingcase when the mRNA degradation rate is much greater than the protein degradation rate( ρ M /ρ P ≫
1) [9] so the mRNA lifetime is quite short relative to the protein lifetime.That suggests scaling the model parameters by the protein lifetime ( ∼ ρ − P ), which resultsin the dimensionless quantities µ = µ M ρ P , µ = µ M ρ P , γ = ρ M ρ P , ν = ν P ρ P . (2)The approximate Eq. (1) becomes dP ,n dτ = [( n + 1) P ,n +1 − nP ,n ] − ( µ + µ n ) P ,n + γP ,n , (3) dP ,n dτ = ν ( P ,n − − P ,n ) + [( n + 1) P ,n +1 − nP ,n ]+ ( µ + µ n ) P ,n − γP ,n , (4)where we have introduced the dimensionless time τ = ρ P t scale and the approximations P m,n ∼ , m ≥ µ + µ n ) P ,n / ( γP ,n ) ∼
1, for all n . Since our simplifying assumptionimplies that the mRNA lifetime is short relative to that of the protein, we would expect thatmRNA probabilities will be peaked around zero for µ , µ of the same order as ρ M . Thisoffers some justification for assuming that Eqs. (3) and (4) are valid for describing geneexpression (Supplementary information).Eqs. (3) and (4) have the same form as the master equation for a binary gene with thestate (1 , n ) (or (0 , n )) as the active (or repressed) state of protein synthesis with rate ν (orzero). The “on-off” switching rate is given in terms of the mRNA degradation rate γ andthe “off-on” transition depends on the mRNA synthesis rates, µ (for external regulation)or µ + µ n (self regulation). To write the solutions of the model presented at the Eqs. (3)and (4), we define constants ( a, b, θ ) as folows: a = µ µ , b = µ + γ µ , θ = 11 + µ , (5)where external regulation is recovered by setting θ = 1. For simplicity we consider only thesteady state solutions for Eqs. (3) and (4), P ,n , P ,n , and the probabilities for finding n P n = P ,n + P ,n , namely: P ,n = b − aCb ν n n ! ( a ) n (1 + b ) n M( a + n, b + n, − νθ ) , (6) P ,n = aCb ν n n ! (1 + a ) n (1 + b ) n M(1 + a + n, b + n, − νθ ) , (7) P n = ν n Cn ! ( a ) n ( b ) n M( a + n, b + n, − νθ ) , (8)where M ( a, b, z ) denotes the Kummer M function [19] and the normalization constant C = M( a, b, ν (1 − θ )) , assures conservation of probability P ∞ n =0 ( P ,n + P ,n ) = 1. Note that for external regulation, C = M( a, b,
0) = 1.The denominator in Eq. (5), 1+ µ , should be interpreted as the total protein removal ratefrom the cytoplasm by degradation plus transcription stimulus. The constant a characterizesthe rate of spontaneous (basal) mRNA synthesis relative to the protein removal rate andstates a relation with the probability for finding one mRNA ( p ), defined as P ∞ n =0 P ,n ,namely p = aCb M( a + 1 , b + 1 , ν (1 − θ )) , (9)and, for the external regulating gene, θ = 1, it implies p = ab . Phenomenologically, b is a compound relation between the rate for a cycle of mRNAsynthesis-degradation and the protein removal rate. Its role in determining the statistics ofprotein numbers is seen from the average number of protein molecules, h n i = p ν , and thevariance relative to the average (Fano factor), σ / h n i = ( h n i − h n i ) / h n i , given by σ h n i = 1 + ν a + 1 b + 1 M( a + 2 , b + 2 , ν (1 − θ ))M( a + 1 , b + 1 , ν (1 − θ )) − ν ab M( a + 1 , b + 1 , ν (1 − θ ))M( a, b, ν (1 − θ )) . (10)For the case where θ = 1, we have σ h n i = 1 + νb − a/b /b . (11)In the limit b → + ∞ (fast switching) and the parameters ( a, θ, ν ) are finite, Eq. (10) isequal to one and the distribution of protein is Poissonian.5o get some intuition into this system, consider the steady state (or equilibrium) of theEqs. (3) and (4), when probabilities are fixed with time, but the variables ( m, n ) change intime with the probabilities given by Eqs. (6), (7) and (8). b gives the average time for thesystem to complete one switching cycle – e.g. from off to on and back to off again. Thenthe probabilities p and 1 − p are the fractions of the total switching time that the systemspends in the active and inactive states respectively.The transition from the dynamic to the stationary regime, as noted previously [13, 15],has an approach to equilibrium characterized by two of the time scales of the model, ρ − P and ( ρ P + µ M ) − b − . The former is the typical lifetime of the protein, whereas the secondone is related to the switching. For rapid protein degradation, compared to the switching,the steady state is achieved after the equilibrium of on-off transitions that are slow and canresult in super Poisson stationary distributions (eventually, bi-modality occurs with eachpeak related to one state of the system). On the other hand, when protein degradationdominates, and there is fast switching, the distributions are uni-modal. In that case, thegene regulatory mechanism ( e.g. if binary or constitutive) is indistinguishable by simpleprotein counting.This reasoning suggests that bursting would occur for systems with a large value of ν and p ∼
0. Biologically, this would mean that the mRNA number is mostly zero duringan entire switching cycle. For a p fraction of that cycle, there will be one mRNA that israpidly translated (at rate ν ) and thus several unitary increments in n take place duringa very short time. This will appear to be a single near-instantaneous increase in proteinnumber by more than one. A mechanism for a rapid increase in n from one mRNA is thebinding of several ribosomes to the mRNA.Mathematically, the negative binomial distribution is assumed to describe a randomvariable characterizing a bursting process. We can show (Supplementary information) thatthe negative binomial distribution is a particular case of the probabilities of the Eq. (8) atthe limit of b, ν → ∞ with their ratio δ = νb , (12)kept finite, namely: P n → ( a ) n n ! (cid:18) δ δθ (cid:19) n (cid:18) δ ( θ − δθ (cid:19) a , (13)where ( a ) n = a ( a + 1) . . . ( a + n − , ( a ) = 1. For the self-regulating case, an approximate6egative binomial distribution occurs for θ ∼
1, which implies weak induction of mRNA syn-thesis by proteins, ( µ << θ = 1, and the probabilitiesat equation above become: P n → ( a ) n n ! (cid:18) δ δ (cid:19) n (cid:18)
11 + δθ (cid:19) a , (14)that is the negative binomial distribution [9].We illustrate our results in FIG. 1 where the left hand column is for the external regulatedgene and the right hand column is for the self regulating case. FIG. 1.A and 1.D arethe steady state probability distributions as obtained from the expression of the binary(Eq. 8) and negative binomial (Eq. 14) models. For the parameter values we choose,inspection shows high agreement with the externally regulated while a slight differenceappears for the self-regulating gene. The corresponding trajectories are obtained from thebinary model, with the protein numbers shown in FIG. 1.B and 1.E. The apparent burstsappear explicitly at protein half-life time scale. However, an expansion of the time scalereveals that the protein synthesis is occurring one by one. Finally, the corresponding mRNAnumber dynamics is shown in FIG. 1.C and 1.F. As expected, it is switching between veryshort time intervals with one mRNA and long intervals with no mRNA.Experimentally, the temporal resolution necessary for avoiding anomalous bursting de-tection should be of the order of the average time for translation of one protein, ∼ /ν P . Inwhat follows, we shall consider the system approached in Ref. [20] to show an example of ameasurement of apparent bursting. In their work, the authors monitored the expression ofthe β -gal protein under the control of the lac promoter. They have detected the occurrenceof burstings in protein numbers and measured the average bursting size to be of ∼ δ , of the negativebinomial probability distribution at the Eqs. (13) and (14). Therefore, the protein synthesisrate, ν P at the Eq. (1), can be written as a function of δ as follows: ν P = δ ρ P µ M + ρ M ρ P + µ M , (15)7hat is deduced by combining the Eqs. (12), (5), and (2).To proceed calculating ν P , we set µ M = 0 – and assume external regulation – since the lac promoter interacts with the Lac repressor protein that is not encoded in the lac operon genes.Then, the expression for calculating the protein synthesis rate at the Eq. (12) is reducedto ν p = δ ( µ M + ρ M ) . The values of the remining unknown constants, mRNA synthesis anddegradation, are determined in terms of experimental measures.The mRNA degradation rate of the β -gal mRNA – ρ M – is taken to be ∼ . − basedon data reported in Ref. [21].We use the data provided in Ref. [20] to estimate the mRNA synthesis rate ( µ M ) at ∼ − min − . This number is achieved dividing the average frequence of bursting of 0.16per cell cycle by the average period of a cell cycle, 145 min (both data from Ref. [20]). Thebursts of proteins occur whenever one mRNA arrives at the cytoplasm. Therefore, we canconvert the average bursting frequence into the average mRNA synthesis rate.Based on the values of δ, µ M , µ M , and ρ M above we compute the protein synthesis rateas ν P ∼ − . Thus, the time scale for the synthesis of one protein is ∼ /ν P ∼ . ∼ ν = ν P /ρ P that is a large number, >>
1. Then, whenever an mRNA appears in the cytoplasm, a plethora of proteins is fastlysynthesized while their degradation is very slow. In a plot of the protein number versus time,8hat condition appears as a fast increment in protein population, followed by a plateau, ifthe experiment stands shorter than the protein half-life time.In that case, a description of protein numbers inside the cell in terms of the negativebinomial distribution is appropriate. However, it must be emphasized that the fitting ofthe measured histogram by a known probability distribution does not imply the occurrenceof the stochastic process from where the distribution is derived. The predictive power ofa model based on such approach might be lowered. In that sense, our discussion is anincrement in the capability of interpreting experimental data. That is done establishing thetime resolution necessary for experimental demonstration of bursting.We establish two possible sceneries for the occurrence of real burst of proteins. It isassumed experimental time resolution of the order of ∼ /ν P and the measurement of agreater than one instantaneous increment in protein population. The first scenery requiresthe existence of only one mRNA in the cytoplasm. A possible mechanism to underlie thiseffect is: multiple polipeptide chains present in cytoplasm that were translated individuallyand start their functional activity simultaneously. Note that that implies a delay betweenthe translation process and the protein folding. In our second bursting scenery, there isabundant fast degradating mRNA’s in the cytoplasm that can be translated synchronouslyin an one-by-one fashion. Hence, multiple proteins could be synthesized simultaneously ina time scale of the order of ∼ /ν P .Our results also suggest a picture of gene expression where the bursting (or burst-like)dynamics corresponds to one among other possible behaviors. Different regimes of geneexpression are possible depending on the specific relations among the effective rates of thereactions participating of a gene network. For example, the model at Eq. (1) has a precisebiological interpretation. Its approximation in terms of the binary model, Eqs. (3) and (4),shows the utility of the “on” and “off” model for the analysis of gene products synthesis. Interms of probability distributions one expect that, besides the negative binomial, the geneproducts should also satisfy the probabilities based on Eqs. (6), (7) and (8).These probabilities satisfy the Eq. (1) when the probability to find more than one mRNAin the cytoplasm is neglegible. In this sense, one might provide further approximations tothe solution of the Eq. (1) for neglegible probability of detecting three, four mRNA’s in thecytoplasm, respectively, in terms of ternary, quaternary models. Different insights in theworkings of gene expression could also be provided by this kind of approach.9n this manuscript we are proposing a theoretical framework, based on the binary modelto gene expression, that generalizes the negative binomial distribution for the description ofthe stochasticity in gene products number. The burst like behavior occurs in a well definedregime, when the ratio between synthesis and degradation rates is of the order of 10 and thesynthesis of gene products very rare. As we predict from our model, measurements aimingto detect the one-by-one increments in gene products number must have temporal resolutionof the order of the synthesis rate (1 /ν P ), e.g. in conditions reported in Ref. [20] that wouldimply a time resolution of ∼ −
60 s. ∗ [email protected] † [email protected] ‡ This research was supported by the Natural Sciences and Research Council of Canada and themobility fellowship CMIRA EXPLORA DOC 2010 of the r´egion Rhˆone-Alpes (France). Thework was carried out while both authors were visiting the Centre for Applied Mathematicsin Bioscience and Medicine (CAMBAM). AFR and RY thanks CAMBAM members for warmhospitality and fruitful intelectual environment and specially M.C Mackey for carefully readingthe manuscript. AFR thanks JEM Hornos for introduction into the field and partial supportfrom Programa de Incentivo a Jovens Docentes of USP. RY thanks M. Santill´an Zer´on foruseful comments.[1] Mads Kaern, Timothy C. Elston, William J. Blake, and James J. Collins. Stochasticity ingene expression: from theories to phenotypes.
Nat Rev Genet , 6:451–464, 2005.[2] Jonathan M. Raser and Erin K. O’Shea. Noise in gene expression: Origins, consequences, andcontrol.
Science , 309:2010–2013, 2005.[3] M. Delbr¨uck. Statistical fluctuations in autocatalytic reactions.
J. Chem. Phys. , 8:120–124,1940.[4] Hartmut Kuthan. Self-organisation and orderly processes by individual protein complexes inthe bacterial cell.
Progress in Biophysics and Molecular Biology , 75:1–17, 2001.[5] X. Sunney Xie, Paul J. Choi, Gene-Wei Li, Nam Ki Lee, and Giuseppe Lia. Single-moleculeapproach to molecular biology in living bacterial cells.
Annu. Rev. Biophys. , 37:417–444, 2008.[6] Arjun Raj and Alexander van Oudenaarden. Single-molecule approaches to stochastic gene Time [ ]
Time [ ]
Negative binomial P r ob a b iliti e s
0 20 40 60 0 20 40 60
Binary D Self regulating gene A External regulating geneProbability distributions
10 5 0 5.76 5.766 n ( t )
20 10 0 0 4 8 12 0 4 8 12
B E
Protein numbers m ( t )
10 12 8 4 0 0 4 8 12 FC mRNA numbers FIG. 1.
Steady state probability distributions and trajectories.
The parameter values forthe externally regulated gene are µ = 1 , γ = 99 and for the self regulating gene µ = 100 / , µ =1 / , γ = 100. The parameters ( a, b, θ, ν ), for the binary models, are then (1 , , , , , . , a = 1 and δ = ν/b = 10. In FIG. 1. B, E, C, and F,the trajectories for the binary model are shown in protein half-life time scale. An expansion ofthe time scale in the region bounded by the dashed lines in FIG. 1. B and E is shown in theinsets where the time scales are magnified by 10 , the same as the ratio between protein synthesisand degradation rates. For the external and self regulating genes, the probabilities for findingone mRNA are, respectively, 0.01 and 0.011, the mean protein number h n i are, 10 and 11.08, andfinally the Fano factor’s values are 10.8 and 11.08. The Fano factor value of the negative binomialdistribution is 11, considering the same set of parameters, and is closely related to the one for thebinary probabilities. xpression. Annu. Rev. Biophys. , 38:255–270, 2009.[7] Nir Friedman, Long Cai, and X. Sunney Xie. Linking stochastic dynamics to populationdistribution: An analytical framework of gene expression.
Phys. Rev. Lett. , 97:168302–, 2006.[8] Michael C. Mackey, Marta Tyran-Kaminska, and Romain Yvinec. Molecular distributions ingene regulatory dynamics.
Journal of Theoretical Biology , 274:84–96, 2011.[9] Vahid Shahrezaei and Peter S Swain. Analytical distributions for stochastic gene expression.
Proc Natl Acad Sci U S A , 105:17256–17261, 2008.[10] J. Peccoud and B. Ycart. Markovian modeling of gene-product synthesis.
Theoretical Popu-lation Biology , 48:222–234, 1995.[11] M. Sasai and P. G. Wolynes. Stochastic gene expression as a many-body problem.
Proc NatlAcad Sci U S A , 100:2374–9, 2003.[12] J. E. M. Hornos, D. Schultz, G. C. P. Innocentini, J. Wang, A. M. Walczak, J. N. Onuchic,and P. G. Wolynes. Self-regulating gene: An exact solution.
Phys. Rev. E , 72(5):051907–,November 2005.[13] G. C. P. Innocentini and J. E. M. Hornos. Modelling stochastic gene expression under repres-sion.
J Math Biol , 55:413–431, 2007.[14] Srividya Iyer-Biswas, F. Hayot, and C. Jayaprakash. Stochasticity of gene products fromtranscriptional pulsing.
Phys. Rev. E , 79:031911–, 2009.[15] J. E. M. Hornos A. F. Ramos, G. C. P. Innocentini. Exact time-dependent solutions for aself-regulating gene.
Physical Review E , 83:062902, 2011.[16] Alexandre F. Ramos and Jos´e; E. M. Hornos. Symmetry and stochastic gene regulation.
Phys.Rev. Lett. , 99:108103, 2007.[17] A. F. Ramos, G. C. P. Innocentini, F. M. Forger, and J. E. M.Hornos. Symmetry in biology:from genetic code to stochastic gene regulation.
IET Systems Biology , 4:311–329, 2010.[18] M. Thattai and A. van Oudenaarden. Intrinsic noise in gene regulatory networks.
Proceedingsof the National Academy of Sciences USA , 98:8614–8619, 2001.[19] M. Abramowitz and I. A. Stegun.
Handbook of mathematical functions . U.S. GPO, Washing-ton, 1972.[20] Long Cai, Nir Friedman, and X. Sunney Xie. Stochastic protein expression in individual cellsat the single molecule level.
Nature , 440(7082):358–362, March 2006.[21] R C Dickson, R M Sheetz, and L R Lacy. Genetic regulation: yeast mutants constitutive for eta-galactosidase activity have an increased level of beta-galactosidase messenger ribonucleicacid. Molecular and Cellular Biology , 1(11):1048–1056, 1981., 1(11):1048–1056, 1981.