A stochastic model for gene transcription on Drosophila melanogaster embryos
Guilherme N. Prata, José Eduardo M. Hornos, Alexandre F. Ramos
aa r X i v : . [ q - b i o . S C ] N ov APS/123-QED
A stochastic model for gene transcription on
Drosophilamelanogaster embryos
Guilherme N. Prata , ∗ Jos´e Eduardo M. Hornos , † and Alexandre F. Ramos ‡ Escola de Artes, Ciˆencias e Humanidades,Universidade de S˜ao Paulo, Avenida Arlindo B´ettio, 1000,Ermelino Matarazzo, S˜ao Paulo, SP, Brazil, CEP: 03828-000 Instituto de F´ısica de S˜ao Carlos,Universidade de S˜ao Paulo, Av. Trabalhador S˜ao-Carlense,400, S˜ao Carlos, SP, Brazil, CEP: 13566-590 Departamento de Radiologia – Faculdade de Medicina, Universidade de S˜ao Paulo and N´ucleo de Estudos Interdisciplinares em Sistemas Complexos, Universidade de S˜ao Paulo. (Dated: October 9, 2018) bstract We examine immunostaining experimental data for the formation of the strip 2 of even-skipped transcripts on
D. melanogaster embryos. An estimate of the factor converting immunofluorescenceintensity units into molecular numbers is given. The analysis of the eve dynamics at the region ofthe stripe 2 suggests that the promoter site of the gene has two distinct regimes: an earlier phasewhen it is predominantly activated until a critical time when it becomes mainly repressed. Thatsuggests proposing a stochastic binary model for gene transcription on
D. melanogaster embryos.Our model has two random variables: the transcripts number and the state of the source of mRNAsgiven as active or repressed. We are able to reproduce available experimental data for the averagenumber of transcripts. An analysis of the random fluctuations on the number of eve and theirconsequences on the spatial precision of the stripe 2 is presented. We show that the position of theanterior/posterior borders fluctuate around their average position by ∼
1% of the embryo lengthwhich is similar to what is found experimentally. The fitting of data by such a simple modelsuggests that it can be useful to understand the functions of randomness during developmentalprocesses.
DOI:PACS numbers: ∗ [email protected] † Deceased ‡ [email protected] . INTRODUCTION Gene regulation plays a key role on the formation of strikingly precise spatio-temporalpatterns of expression of the genetic information stored on the DNA of the metazoans. Al-though the intracellular environment has a plethora of molecular species, the biochemicalreactions are occurring with reactants present in low copy numbers. That leads to randomfluctuations, as predicted by Delbr¨uck [1], and it is intriguing that biological processes, suchas development, are so reliable. One astonishing example is the
D. melanogaster segmen-tation process and the formation of gene products stripes, for example, the even-skipped mRNA’s (here on just eve ) stripe 2 [2]. The application of experimental techniques wassuccessful on the identification of the reactants regulating the stripe 2 formation. However,a theoretical approach has played a key role on the determination of the interactions amongregulatory proteins and specific DNA sites controlling eve ’s gene expression [3, 4]. Similarly,it is fair to expect that the use of the appropriate theoretical machinery shall play a usefulrole on the investigation of the noise effects or noise suppression on developmental processes.Despite fluorescence techniques have been applied to the detection of the noise in prokary-otic [5, 6] or eukaryotic [7] cells, the stochasticity in metazoans stands as a deeper challengeto experimentalists and to theoreticians. The stochastic modeling of biochemical reactionshave been done using the Langevin technique [8], exact stochastic simulations [9] or masterequations [10–12]. The Langevin technique is useful for the analysis of average and variance,but may fail when bi-modal distributions appear. Simulations based on the Gillespie’s algo-rithm may generate the complete distributions with the requirement of strong computationalpower and a full characterization of the analyzed chemical reactions [13, 14]. However, theintracellular biochemical mechanisms of metazoans are still under investigation. Hence, acoarse-grained approach based on simple analytically solvable models would be welcome.Here, master equations for small sets of effective chemical reactions may be proposed andhave their exact solutions obtained. Although this phenomenological approach would nothelp on the search for the complete set of chemical reactions taking place inside the cell, itmay be powerful on effectively establishing the outcomes of those reactions. Furthermore,exactly solvable models might be used as building blocks for the description of more complexphenomena, such as noise transmission on gene networks.In this manuscript we propose a spatial stochastic binary model for gene transcription on3 ruit fly embryos. We apply our model for the dynamics of the formation of the stripe 2 of eve and compare our predictions with available experimental data [3, 15]. Since immunostainingis used for detection of mRNA’s we propose a strategy to convert immunofluorescence intoabsolute numbers. The analysis of data around the peak of the stripe 2 suggests that thedynamics of the number of transcripts is due to an mRNA source operating in two states.Therefore, we construct a spatial stochastic binary model for the dynamics of the stateof the mRNA source and for the mRNA numbers. The transcription dynamics occurs intwo phases: the first has the mRNA’s source predominantly active whereas it is mostlyrepressed through the second phase. We are able to reproduce the experimental data for thedynamics of the average number of mRNA’s [3]. Furthermore, we calculate the variationon the position of the anterior and posterior borders of the stripe 2 due to the fluctuationson the amount of eve and it turns out that it is similar to the one observed experimentally[15]. Those results suggest that our model can be a useful tool for investigation of noise andnoise propagation on fruit flies gene networks.
II. IMMUNOFLUORESCENCE DATA ANALYSIS
We start this section by presenting our estimation on the factor to convert the immunoflu-orescence intensities into mRNA numbers. The converted data is used to present mRNAprofiles. The dynamics of the eve around the peak of expression of the stripe 2 is analyzed.The observed dynamics corresponds to a promoter site operating in the on and off states.
A. Transcription data analysis
The experimental data analyzed in this manuscript has been presented in a recent pa-per [3]. Immunofluorescence intensities generated by the lacZ transcription regulated bythe enhancer controlling the eve gene of a
D. melanogaster embryo are provided. Theimmunofluorescence intensities refer to a strip along the AP axis of the embryo of widthcorresponding to 10% of the dorsal-ventral embryo axis length. The intensity of immunoflu-orescence is indicated for a domain with length corresponding to one percent of the egglength along the AP axis. The expression of the genes involved in the
Drosophila embryosegmentation process can be considered uniform along the dorsal-ventral axis. Hence, the4 N u m b e r o f m o l ec u l e s C13 T1 T2 N u m b e r o f m o l ec u l e s T3 T4 T5
40 50 60 700102030 AP N u m b e r o f m o l ec u l e s T6
40 50 60 70AP T7
40 50 60 70AP T8 FIG. S1. The mRNA profiles of the eve ’s stripe 2 is presented for C13 and for the eight timeclasses of C14, ending by the onset of gastrulation. The red lines indicate the experimental datawhile average mRNA numbers predicted by the stochastic binary model for gene transcription infruit-flies is given by the solid green circles. The standard deviation on the mRNA numbers ateach position along the AP axis is represented by the vertical black bars. segmentation processes can be approximated as one dimensional along the AP axis. Finally,the investigation of the formation of eve ’s stripe 2 can be performed analyzing experimentaldata available for the positions from 35.5% of the egg length (EL) to 76.5% EL. The ex-perimental data are grouped into nine different time classes, namely, C13 and T1, . . . , T8.Each temporal class lasts 6.5 minutes (or 390 seconds) on average [3, 15, 16].In Ref. [3] the data for mRNA concentrations is available in immunofluorescence unities.To compare the data with our model predictions, we have converted immunofluorescenceunities to absolute numbers. To the best of our knowledge, there is no information on theliterature about the typical amount of eve mRNA molecules around a nucleus during
D.melanogaster embryos development. Furthermore, we could not find conversion factors totransform immunofluorescence units into molecule numbers. Hence, we have estimated the5umber of eve mRNA’s considering information about the transcription of eve , ftz , and Ubx genes, all of them involved in the segmentation process of the
D. melanogaster embryoduring the blastoderm phase [15–28].The Ref. [3] shows the maximum fluorescence corresponding to the eve molecules atposition 40.5% EL at T5 instant. At this instant and position, the eve mRNA number isassumed to be ∼ µm [28] and the volume of a nucleus during the cycle 14 as ∼ µm [29].A linear extrapolation was carried out for the remaining fluorescence intensities along theembryo. For our estimate, we assume what follows: ( i ) eve and ftz mRNA concentrationsare of same order of magnitude[16, 24–26]; ( ii ) abundances of ftz and Ubx mRNAs arecomparable[27]; ( iii ) the concentration of
Ubx mRNA molecules is 50 molecules/250 µm atcycle 11[28]; ( iv ) there is a linear relation between measured fluorescence intensity (withoutbackground signal) and the number of mRNA molecules. We adopt 50 molecules/250 µm as reference value for the density of eve mRNA molecules and the diameter of a nucleus ofthe Drosophila embryo during cycle 14 to be 6.5 µm [29].The Figure S1 shows a comparison between experimental data and theoretical predictionsof the temporal evolution of the average number of eve from time class C13 to T8 alongthe AP axis. The experimental data is shown in red while the green lines are showing themodel’s predictions for the average mRNA numbers. The vertical bars are indicating thestandard deviation on the mRNA numbers as predicted by our model. Each graph exhibitsthe amount of eve transcripts at a given time class along the AP axis of the embryo. Thehorizontal axis of the plot gives the position of a nucleus as % of the EL while the numberof mRNA molecules is shown at the vertical axis. B. A two phases transcription dynamics
The Figure S2 shows the dynamics of the average number of eve molecules at time instantsfrom C13 to T8 at the position 41.5% EL. The graph shows red solid circles indicating theexperimental data and a green line indicating the average number of mRNA molecules aspredicted by our model. The horizontal axis represents the time and the vertical axis showsthe mRNA number.Inspection is enough to verify that the amount of mRNA increases from C13 to T5. At T56
13 T1 T2 T3 T4 T5 T6 T7 T80102030 Time N u m b e r o f m R NA m o l ec u l e s % AP FIG. S2. The two phases character of the dynamics of the eve ’s gene transcription at the position41.5 % of EL is shown here. The experimental data is given by the solid red circles while the greenline indicates the average mRNA number as predicted by our model. the amount of eve mRNA’s reaches a maximum and starts decreasing towards zero until T8.That dynamics suggests that the eve gene is transcribed into two distinct phases: during thefirst phase the promoter site of the gene is predominantly in active state while it remainsmostly repressed during the second phase. A similar time evolution pattern is observedalong the AP axis of the embryo and can be considered as a signature of the formation ofthe eve stripes. Therefore, we extend our conclusions for the position 41.5% EL to the wholeAP axis. We notice, however, that the time extent of both the increase and decrease phasesvary along the AP axis.
III. A COARSE-GRAINED APPROACH FOR STOCHASTIC TRANSCRIPTION
In this section we introduce a spatial stochastic binary model for the gene transcriptionon
D. melanogaster embryos. We apply the model for the expression of the eve gene byconsidering the formation of the stripe 2. The dynamics of the expression of the eve gene isa two phases process. The spatial profile of the critical times of the switch from the first phaseto the second is presented. The fluctuations on the mRNA numbers and their implicationson the variation of the position of the borders of the stripe 2 of eve are investigated.7 . A spatial stochastic binary model for transcription in
Drosophila
For modeling purposes, we start considering a finite one-dimensional lattice with eachnode representing a cellular nucleus. The length of the lattice corresponds to the egg length(EL) at the AP direction. Each nucleus is labeled by a number i which corresponds to itsposition at the AP axis in percentage of EL, namely, i = 0.5, 1.5, . . . , 99.5. We assume theexistence of a single copy of the gene per nucleus and we are interested on the number ofmRNA’s ( n i ) present at the i -th nucleus. From here on we shall interpret the states of themRNA source effectively as the promoter state, as its state regulates the transcription rate.We compare model predictions and experimental data assuming direct correspondence ofspatial coordinates. In a given nucleus, the transcription may occur at the active ( on ) andrepressed ( off ) gene states, respectively, with rates k and χk (0 ≤ χ < i -th nucleus is indicated by the switchingrates f i (or h i ).We describe the dynamics of the system in terms of the probability, α n i ,i ( t ) or β n i ,i ( t ), offinding n i mRNA’s at the i -th nucleus and the promoter site being, respectively, active orrepressed. The dynamics of these probabilities is controlled by the master equations, namely dα n i ,i ( t ) dt = k [ α n i − ,i ( t ) − α n i ,i ( t )] + ρ [( n i + 1) α n i +1 ,i ( t ) − n i α n i ,i ( t )] − h i α n i ,i ( t ) + f i β n i ,i ( t )(1) dβ n i ,i ( t ) dt = kχ [ β n i − ,i ( t ) − β n i ,i ( t )] + ρ [( n i + 1) β n i +1 ,i ( t ) − n i β n i ,i ( t )] + h i α n i ,i ( t ) − f i β n i ,i ( t ) , (2)where the mRNA degradation rate is indicated by ρ . Notice that there is no interactionbetween different nuclei of the lattice. That is because we have neglected diffusion or trans-port effects here. Hence, one may consider each site as an independent unit and probabilityconservation holds for an individual site, such that ∞ X n i =0 [ α n i ,i ( t ) + β n i ,i ( t )] = 1 , (3)at each site of the lattice. The introduction of the above property turns possible the useof exact solutions already available on the literature [30]. Furthermore, we shall also usethe symmetry properties underlying analyticity both as a mathematical tool and to give abiological interpretation for the model and its parameters [31].8 . Activity level, expected value and standard deviation In this manuscript we shall focus on the probability of finding the gene promoter siteof the i -th nucleus at the active (repressed) state, indicated by A i ( t ) ( B i ( t )), the mRNAaverage number, h n i i ( t ), and, the standard deviation on n i and its consequence on positionalprecision of the borders of the stripe 2. The probabilities A i ( t ) and B i ( t ) are defined as A i ( t ) = ∞ X n i =0 α n i ,i ( t ) and B i ( t ) = ∞ X n i =0 β n i ,i ( t ) . (4)whereas the mRNA average number and the transcripts number fluctuations are, respec-tively, given by h n i i ( t ) = ∞ X n i =0 n i [ α n i ,i ( t ) + β n i ,i ( t )] (5)and σ i ( t ) = " ∞ X n i =0 n i [ α n i ,i ( t ) + β n i ,i ( t )] − [ h n i i ( t )] . (6)Explicit expressions for A i ( t ), B i ( t ) and h n i i ( t ) can be written in terms of the steadystate probabilities to find the promoter site at the active and repressed states, respectively, a i = f i h i + f i and b i = h i h i + f i . (7)The system approaches the steady state exponentially with a decay rate dependent onthe constant ǫ i = ǫ = h i + f i ρ . (8)Note that we assumed this constant to have a fixed value along the AP-axis. This occursbecause the quantity ǫ i is an invariant of the model, as it was shown previously [31]. Then,the probabilities are written as A i ( t ) = a i + ( A i (0) − a i ) e − ǫρt , (9) B i ( t ) = b i + ( B i (0) − b i ) e − ǫρt . (10)where A i (0) and B i (0) are the initial conditions.To show the time-dependent protein average numbers we define the constant N = k/ρ interms of which the average protein number at the steady state ( µ i ) is given, namely, µ i = N (1 − χ ) a i + N χ. (11)9nd we get: h n i i ( t ) = µ i + U i e − ρt + V i e − ǫρt , (12)where V i = N (1 − χ )( A i (0) − a i )1 − ǫ , and U i = h n i i (0) − µ i − V i . h n i i (0) is the initial condition on the average mRNA numbers. Moreover, the fluctuation σ ( t ) is given by σ i ( t ) h n i i ( t ) = 1 + ∆ i ( t ) (13)where ∆ i ( t ) is a position- and time-dependent term, namely∆ i ( t ) = ξ i + η i e − ρt + λ i e − ǫρt + ψ i e − (1+ ǫ ) ρt + φ i e − ǫρt h n i i ( t ) . (14)The coefficients of the exponentials are written on the appendix.The parameters k , χ and ρ are assumed to be invariant along the AP-axis. We consider themRNA synthesis rates to be mainly determined by the DNA sequence of the promoter regionof the gene. For simplicity, we assume that the mRNA degradation has small interference ofenvironmental conditions. That effective approach implies treating the RNA Polymerasesor enzymes catalyzing mRNA degradation as uniformly distributed along the embryo.The regulatory effect of the surrounding conditions around the i -th nucleus is introducedin terms of the gene switching rates, h i and f i . If the concentrations of the regulatoryproteins in a nucleus gene transcription the value of f i will be greater than h i , and vice-versa. We may assume h i and f i to be dependent on space because such rates represent theeffect of transcription factors (TF’s). Furthermore, in literature we have not found any traceof auto regulation in the eve mRNA stripe 2 formation during cycle 14A. One could alsoconsider these rates to be time-dependent and, depending on the form of that dependence,the Eqs.(9) and (12)) might not hold as solutions for the model. C. A model for the eve ’s stripe 2 formation
The Fig. S1 shows the comparison between the experimental data for estimated mRNAnumbers (red lines) and the model predictions for the mRNA average numbers (green lines).The mRNA numbers are indicated at the vertical axis and the horizontal axis gives thepercentage of the EL at AP direction. The red lines of the graphs are showing the mRNAaverage numbers at 9 different time instants for which experimental data is available.10he graphs at Figure 1 were obtained considering that the transcription dynamics consistsof two phases. Each phase of the gene dynamics is characterized by a pair of switchingrates, f i and h i . The time-dependence of the switching rates can be represented in terms ofHeaviside[44] functions, Θ, namely: f i ( t ) = f i Θ( τ i − t ) + e f i Θ( t − τ i ) (15) h i ( t ) = h i Θ( τ i − t ) + e h i Θ( t − τ i ) , where h i , f i , e h i and e f i are the switching constants at the i -th AP position (Figures S1 andS2 on Appendix). The parameter τ i , denominated critical time , is the time instant of thechange of switching rates. This quantity may be space-dependent since it is associated withenvironmental regulatory factors which, in turn, are not distributed homogeneously alongthe AP axis. The parameters k , χ and ρ will be the same during the two time phases.We consider that ǫ = h i + f i ρ = e h i + e f i ρ is also invariant during time. Then, there will be onlyone free switching rate to be optimized. The assumptions above have a clear biochemicalinterpretation. Furthermore, they turn possible the exact calculation of the probabilityfor the gene to be active and the mRNA average number, its standard deviation and theassociated probability distributions.The dynamics presented at Fig. S1 was obtained with the following equations for thedynamics of the probability for the gene to be active A i ( t ) = a i + ( A i (0) − a i )e − ǫρt , ≤ t ≤ τ i e a i + ( A i ( T i ) − e a i )e − ǫρ ( t − τ i ) , t > τ i (16)where the steady-state probabilities for the gene to be active are given as a i = f i h i + f i | {z } First phase , and e a i = e f i e h i + e f i | {z } Second phase . (17)The average number of mRNA’s is given by h n i i ( t ) = µ i + U i e − ρt + V i e − ǫρt , ≤ t ≤ τ i e µ i + e U i e − ρ ( t − τ i ) + e V i e − ǫρ ( t − τ i ) , t > τ i , (18)with coefficients set as: µ i = N (1 − χ ) a i + N χ, e µ i = N (1 − χ ) e a i + N χ, i = N (1 − χ )( A i (0) − a i )1 − ǫ , e V i = N (1 − χ )( A i ( τ i ) − e a i )1 − ǫ ,U i = h n i i (0) − µ i − V i , e U i = h n i i ( τ i ) − e µ i − e V i , where A i (0) and h n i i (0) are the initial conditions of the dynamics and A i ( τ i ) and h n i i ( τ i )are the initial conditions for the second phase. D. Initial conditions
The initial conditions of the average mRNA numbers for the first phase are built byfitting of experimental data at the time class C13 with the function: h n i i (0) = c + c e − c ( i − . (19)where c = 1 . c = 12 . c = 0 . h n i i (0) given by the Eq. (19) was obtained in terms of h n i =41 . i (0). We consider the experimental value in 41.5% AP position at C13 as ini-tial condition for the number of mRNA, that is, h n i =41 . i (0) = 10 . A i =41 . (0) at 41.5% AP position, we are far from any estimate because of scarcityof information. Nevertheless, since our approach is based on a Markov process, the tendencyis that, during the time evolution, the system stops feeling the effect of the initial conditionand, because of this, the absence of an estimate for A i =41 . (0) or even for h n i =41 . i (0) doesnot represent a problem.An alternative to estimate A i (0) is to consider that the system is described approxi-mately by an arbitrary steady state of the model. In that case the initial conditions for theprobabilities for the gene to be active is obtained from h n i i (0). That results into A i (0) = h n i i (0) − N χN (1 − χ ) . (20)Notice that the space dependence of A i (0) is inherited from h n i i (0).12 . Parameters estimation The parameters of the model have been obtained by optimization. The values of k , χ and ρ were obtained by fitting the dynamics at the position 41.5% of the EL. Then, we setthose values along the whole AP axis because they are indicating characteristics that areintrinsic to the promoter site and the mRNA molecules.The parameter χ quantifies the proportion between the two mRNA production modes.It may be estimated as 0 simply because the phenomenon is being described in terms of twowell characterized activation levels: one with high mRNA production rate and another withlow production rate (which, in first approximation, may be considered null).The estimates for the probabilities a i =41 . and e a i =41 . may also be conceived with thehelp of experimental data and the all-or-nothing character of our approach. At 41.5% APposition the production of eve mRNA is considerably greater than at other AP positions.Furthermore, since the eve mRNA stripe 2 formation occurs in wild-type embryos, thetendency of biological system is always to produce eve mRNA during the first temporalclasses. This means that, in first approximation, we may estimate a i =41 . ∼
1. On the otherhand, the biological system starts ceasing the production of eve mRNA after τ i . Hence, wemay estimate e a i =41 . = 0 as the probability for the gene being active at the end of the secondphase.By estimating χ = 0, a i =41 . = 1, e a i =41 . = 0 and h n i =41 . i (0) = 10 . k = 0 . − , ǫ = 1 . − , ρ = 0 . − , τ i =41 . = 2053 . A i =41 . (0) = 0 . h n i =41 . i (0) and the estimates for χ , a i =41 . , e a i =41 . into Eqs. (16) and (18), the average number h n i =41 . i ( t ) may be obtained and compared tothe experimental data, as Figure S2 shows.The remaining parameters have been computed by analysis of the experimental dataalong the space and it results: a i = exp (cid:2) − . i − (cid:3) (21) e a i = 0 (22) τ i = τ . exp (cid:2) − . i − (cid:3) (23)where τ . = 2053 .
09 sec. 13
10 20 30 40 50 60 36 38 40 42 44 46 T i m e ( un i t s o f m i n ) APActivity time interval along the AP axis of the embryo
FIG. S3. A comparison between the Gaussian shape for critical time as obtained with the Eq. (23)and experimental data on transcriptional activity around the peak of expression of eve ’s stripe 2,using experimental data presented in Ref. [40]. The experimental data are shown as the solid redcircles and the green lines are indicating the model’s results.
The Gaussian profile of a i is based on fact of that, at sufficient long time, the spatialdistribution of the number of transcripts will be very similar to the space-dependence of theasymptotic activation probability (see Eqs.(9) and (21)). The Gaussian profile of τ i is relatedto the sharpening of the stripe 2 as follows: The TF’s that repress the transcription of the eve gene are getting produced on both sides of the stripe 2; Their concentrations achievesufficiently large values at the critical time instants given by the Eq. (23); As a consequence,the probability for the gene being repressed increases in accordance with the increase on theamount of repressive TF’s. Indeed, the reduction of the transcriptional activity of the eve ’senhancer controlling the stripe 2 formation has a Gaussian profile, as it is shown on Fig. S3.The amplitudes of the functions a i and τ i , and the estimate of e a i , are obtained by op-timization. The centers of both Gaussian functions a i and τ i are at 41% of EL becausethe expression levels reached at both 40.5% of EL and 41.5% of EL are very close to oneanother so that we assume that the center of the stripe is at the midpoint between theseAP positions. 14 - x » % AP x ' - x ' » % AP x C z H ± L % AP x C ' z H ± L % AP Μ i + Σ i Μ i Μ i - Σ i x x C x x x ' x C ' x ' 45.505101520253035 AP N u m b e r o f m R NA m o l ec u l e s FIG. S4. The implications of the fluctuations on the mRNA numbers on the position of the anteriorand posterior borders of eve ’s stripe 2 domain at time class T5. We consider three particularcurves, one (green color) for the average number of mRNA synthesized by the eve ’s gene and theremaining two (red color) given by the average plus or minus the standard deviation on the numberof transcripts.
IV. THE FLUCTUATIONS OF MRNA NUMBER
The Eq. (13) gives the general form of the noise obtained with the stochastic binarymodel. Since we have two phases in our model, the coefficients for the exponentials on thedefinition of ∆ i ( t ) at each time phase are defined as follows. For 0 ≤ t ≤ τ i , the terms ξ i ,15 i , λ i , ψ i and φ i become, respectively, ξ i , η i , λ i , ψ i and φ i , given by ξ i = N (1 − χ ) a i (1 − a i )1 + ǫη i = (cid:2) h n i i − µ i (cid:3) + [ h n i i − µ i ] + N (1 − χ ) ǫ − A i (0)(1 − A i (0)) − a i (1 − a i )]+ (cid:26) N (1 + χ ) ǫ − N (1 − a i + χa i ) − h n i i ] − (cid:27) V i − (2 µ i + 1) U i − U i λ i = 2 V i − ǫ [ N (1 + χ − ǫ ) + µ i ( ǫ −
2) + a i ǫ (1 − χ )] ψ i = 2 N (1 − χ )1 − ǫ [ A i (0)(1 − A i (0)) − a i (1 − a i )] + 2 (cid:2) µ i − N (1 − a i + χa i ) + V i (cid:3) V i φ i = − V i . For τ i < t , the terms ξ i , η i , λ i , ψ i and φ i become, respectively, e ξ i , e η i , e λ i , e ψ i and e φ i , whichcorrespond to the asymptotic probability e a i and “initial conditions” A i ( τ i ) and h n i i ( τ i )instead of a i , A i (0) and h n i i (0). Moreover, these terms are evaluated at t − τ i instead of t . That is analogous to the procedure proposed for the evaluation of the mRNA averagevalues.The Eq. (13) shows the prediction of the model for the dynamics of the ratio of thestandard deviation for the mRNA mean number, also called the Fano factor. The Fanofactor is useful to determine how different from a Poissonian a probability distribution is: aPoisson distribution has a Fano factor equals to one. A super Fano (or sub Fano) probabilitydistribution has Fano factor greater (or smaller) than one. Since the quantity ∆ i ( t ) obeys∆ i ( t ) ≥ i and t , it causes the Fano factor to be equals to or greater than one.Therefore, the enhancer controlling formation of the eve’s stripe 2 along the AP axis duringthe cycles 13 and 14 leads the eve gene to behave as a Fano or super Fano source of mRNA’s.The standard deviation on the mRNA numbers during cycles 13 and 14 along the APaxis is given by the vertical bars at the plots of the Figure S1. Accordingly to the Eq.(13), the standard deviation is given by σ i ( t ) = p h n i i ( t ) p i ( t ). At the initial timeinstants the noise values are greater at positions where h n i i ( t ) assumes its maximal values.When the stripe 2 starts to achieve its shape the noise has three different characteristics: itis almost zero on the regions at the right of the stripe; it has intermediary values aroundthe peak of expression; the maximal values of the noise are exhibited at positions on theborders of the stripe. During the two final time classes the noise amplitude is greater at theregions where h n i i ( t ) is higher. The difference on the dynamic behavior of the noise is due16o the values of the asymptotic values, ∆ i ( τ i ) and ∆ i ( ∞ ), and the differences ∆ i ( τ i ) − ∆(0)and ∆ i ( ∞ ) − ∆ i ( τ i ). That is because the system approaches the equilibrium with the samespeed at all AP positions and phases of the dynamics.The Fig. S4 shows a plot for the average mRNA numbers and their fluctuations on theregion from 35.5% EL to 46% EL at the time class T5. The green line indicates the valuesof h n i i while the values of h n i i ± σ i are indicated by the red lines. The dashed blue lineindicates the position where the mRNA is maximal, i = 41. There are two rectangles withtop horizontal blue lines and grey colored area around the positions x C and x ′ C , with a baselength going from x to x and x ′ to x ′ , respectively. The height of the two rectangles ishalf of the maximum number of mRNA’s. The spatial fluctuation on the position of the eve ’s stripe 2 can then be inferred from the fluctuations on the mRNA numbers, σ i . The tworectangles are indicating the spatial variation on the formation of the anterior and posteriorborders of the stripe 2. Based on our results, the position of the anterior border of the stripe2 is at x C ± .
8% while the posterior border of the stripe 2 is at x ′ C ± . V. DISCUSSIONA. Parameters of the model and fitting
An important characteristic of our approach is the small number of parameters. Thefitting shown at Fig. S1 has required 14 parameters for 42 × h i and f i ,respectively). Therefore, the fitting presented at Fig. S1 is an actual test for the dynamicsresulting from our model.The Fig. (S2) shows not only that proposed approach is able to describe experimentalwith good agreement but also that it is possible to find a set of parameters (for k , χ , ρ , ǫ , a i , e a i , τ i and A i (0)) coherent with the model and literature. Thanks to the model’s goodperformance in describing the experimental data at 41.5% AP position we have extended itto describe the numbers of eve dynamics to other AP positions.17 . Estimates of the mRNA numbers and parameters optimization The analysis of the Fig. S1 may lead us to question about the correctness of the estimatesof the eve mRNA numbers. Indeed, we expect that a precise and direct detection of the eve transcripts would require an adjustment of our estimate. However, it is fair to assumethe preservation of the qualitative aspects of the mRNA number dynamics. In that case,the refinement of the model’s parameters would be required for fitting data but biologicalinterpretation of the model would remain useful to understand the role of the fluctuationsduring
Drosophila embryo development.In the literature, we have found that size of the eve coding sequence is 1477 bp [32] andthe rate at which the RNA-Polymerase II transcribes a sequence is 1400 bp/min = 23.33bp/sec [33–35]. This allows to estimate the time of production of one eve mRNA in 63.3seconds, which corresponds to the rate k = 1 / (63 . . · − sec − . This estimateis also consistent with other values [36].The estimate of the degradation rate ρ of the eve mRNA is based on the half-live of the ftz mRNA. This is adopted due to scarcity of data about eve mRNA. In literature [37–39],the half-live of ftz mRNA is between 6 and 7 minutes, which corresponds to a degradationrate of order of 10 − sec − . This value establishes the order of magnitude for the degradationrate ρ of the eve mRNA. C. The two phases dynamics
The graph at Fig. S2 suggests that the transcription of the eve gene around the position41.5% EL is a two-phases process. As one may verify at the Eq.(18) the two phases havedifferent attractors, the first phase has µ i as a fixed point while e µ i is the fixed point of thesecond phase. The average number of mRNA’s has direct dependence on the asymptoticprobabilities, a i and b i , as it is shown at the Eq. (11). Furthermore, µ also depends on k , χ ,and ρ which are fixed constants. Therefore, the occurrence of two attractors is due to theexistence of two asymptotic probabilities, each of them associated with one temporal phaseof the dynamics. The asymptotic probabilities are functions of the switching parameters,as shown at the Eq. (17). The regulatory configuration of the promoter site, despite itsrandom switching, is characterized by a pair f i and h i that leads it to be mostly active until18 i . After it, the promoter site shall be mostly repressed and mRNA degradation will be theprevailing reaction.The introduction of the Heaviside time-dependence on the switching rates and the spacedependence is a generalization to previous versions of this model. That permits introductionof diffusion or other space dependent processes. The use of Heaviside functions for theswitching rates turns possible to decompose the temporal domain in an arbitrary number ofslices and the solution is an analogous to a Riemann sum. Then, the exact solution presentedhere is a generalization of existing solutions to this model.The two-phase behavior of the eve ’s promoter site is caused by the interaction of multipleTF’s with the enhancer controlling the stripe 2 formation. The interaction of the TF’salong the AP axis is orchestrated in such a way that the eve gene promoter site operationis effectively binary. That behavior of the eve promoter supports the proposition of themodel of Eq. (1) as a tool to understand the role of random fluctuations during Drosophila development. It provides the simplest approach to regulated gene transcription, is fullysolvable, and the symmetries underlying its analyticity have been characterized. Here, theexistence of symmetries has a practical implication as it allows us to take ǫ as an invariantof the model on both space and time. That assumption leads to a reduction of the freeparameters of the model and contributes to avoid the over-fitting. D. Phenomenology of the model
The invariance of ǫ can be interpreted biochemically as the characteristic time of thepromoter’s site capability for the transition between the active and repressed states in re-lation to the characteristic time of the mRNA degradation. For high (low) values of ǫ , atransition cycle will be completed fastly (slowly) when compared to mRNA degradation.Here, f i and h i relates to the time fraction of a switching cycle that the promoter site willtake to switch to the active and repressed states, respectively. For f i > h i , the promoter willbe predominantly at the active state while the repressed state is most likely for the oppositecondition.The invariance of ǫ also helps on the understanding of the enhancer-TF’s interaction. For f i >> h i , the promoter site is highly active, and the majority of the enhancer-TF’s inter-actions contributes to stimulate eve ’s transcription. Hence, one may suppose the enhancer19nducing an active state of the promoter during the first phase of the stripe 2 formation.The initial phase has higher concentrations of Bcd and Hb that contribute for such a higheractivation. As the gap genes are expressed, repressive TF’s (Gt, Kr and Kni) start bindingto the enhancer and the balance between f i and h i changes towards h i >> f i . Then, thepromoter site is expected to be mostly repressed.The critical time can be interpreted as a time instant when there is a saturation ofrepressive TF’s bound to the enhancer and the promoter is most likely to be repressed.There is a high number of possible configurations for the stripe 2 enhancer. Hence, weexpect a smooth transition from the initial configuration to the asymptotic values. Thisis represented by the exponential time decays of average mRNA number and the promoterstate.The Fig. S3 shows a comparison between the Gaussian profile as proposed for the criticaltimes and the observed data for the transcriptional activity time interval of the eve geneas detected experimentally at some positions around the peak of expression [40]. It isremarkable that the Gaussian curve is in good agreement with the analyzed data. Thetranscriptional activity is directly related to the binding of repressive TF’s to the enhancercontrolling the formation of the stripe 2 of the eve gene. The increase on the amountof repressive TF’s reduces the transcriptional activity by their binding to the enhancer.Furthermore, the spatial form of the activity reduction follows a Gaussian spatial patterncentered around the peak of expression, as it happens with the critical times that are givenby the Eq. (23). E. The intrinsic noise on mRNA numbers due to promoter dynamics
Our model can also be used as a tool to understand the characteristics of the noise onthe number of mRNA’s when it is produced by a two levels system. The form of the noisepresented at the Eq. (13) is similar to the form of the translational noise as introduced ina paper considering a two stages model to stochastic gene expression [11]. The noise on theprotein numbers at the steady state regime was written as σ / h n i = 1 + B , where B is aconstant denoted as the bursting size [41]. One may show that our two levels model alsohas a bursting limit, with burst size ∆( t ). The bursting occurs at the limit of many veryfast mRNA synthesis during the very brief time interval of duration of the “on” state of20he gene. Even though B and ∆( t ) are called, respectively, translational and transcriptionalburst sizes, they are actually average values. That is, either the burst amplitude and itstime duration vary randomly at each burst and, hence, the burst size has a different valueat each bursting event.The above discussion on noise is important to understand recent results of a singlemolecule detection experiment that was performed to evaluate the dynamics of the for-mation of the stripe 2 of the fruit fly embryos [40]. The amount of eve’s mRNA moleculessynthesized per transcriptional event was also probed. It was shown that the eve moleculesare synthesized in a burst-like regime with burst size and duration varying randomly at eachtranscriptional event. Based on that observation, the authors have concluded that a twolevel model to gene expression would not be capable of accounting this phenomena and,therefore, the transcription of the eve gene should occur at multiple levels of synthesis rate.The reasoning underlying that conclusion is that each level of gene transcription has a singleburst size corresponding to it. That conclusion would be well supported by the assumptionof deterministic binding of the RNAP to the DNA. Then, for each burst size there will bea given amount of RNAP bound to the DNA. However, the number of reacting moleculesinside the cell is small and it leads the intracellular phenomena to be stochastic [1]. Hence,the observation of the multiples burst sizes is not necessarily due to multiple levels of thetranscription of the eve gene but a manifestation of the inherent stochasticity of biologicalphenomena. Particularly, the randomness of the eve ’s stripe 2 formation. Furthermore,since the burst size in our two levels model is both random and time-dependent, as afore-mentioned, we consider that it is very early to discard it as a good model to evaluate noisein fruit fly developmental processes. F. Variation on the stripe 2 borders position
The analysis of the noise on the mRNA numbers has turned possible for us to investigateits implications on the positional precision of the control of: The location of the anteriorand posterior borders of the stripe 2; The fluctuations on the domain width. Our resultsindicate that the noise on the position of the borders of the domain is of 0.8% of the ELwhich is in good agreement with experimental results [15]. In that reference, the noise onthe position of the eve domain is investigated for time classes until T5 and it is shown to be21f the order of 1% of the EL. We have also calculated the variation on the domain size andit goes from 3.8% EL to 7% EL, and its average width is 5.4% EL.In this manuscript, the noise on the position of the border of the domain of the stripe 2have been evaluated in terms of a model for the fluctuations on the number of eve molecules.Despite it may appear as a limitation of our approach it has a useful consequence, as itsuggests the possible inferior values of the positional error of the segmentation process. Inour approach, the noise on the position of the peak of expression of the eve stripe 2 isnot considered and it would be interesting to incorporate it here. Similarly, one may alsoinvestigate the role of the fluctuations on the numbers of the TF’s regulating the expressionof the stripe 2. Furthermore, the usual error of the experimental data is not included in ourmodel. Therefore, it is acceptable that our model’s predictions for the positional noise onthe stripe 2 borders is smaller than that observed experimentally ( ∼ .
8% EL).There is an interesting implication for the good agreement between the predicted spatialvariation of the borders position due to the intrinsic noise arising from the promoter andthe existing empirical data [15]. The intrinsic noise due to the promoter’s dynamics mightaccount for all of the observed spatial variability, and other extrinsic noise sources are beingdamped or filtered by mechanisms that are to be determined [42, 43]. One must notice,however, that our predictions for the noise are related with our extrapolation of mRNAnumbers that are not from the eve gene. It is natural to expect that the mean number ofmRNA’s may differ from gene to gene. Naturally, that would cause a change on the valuesof the parameters of the model and the noise predictions. Nevertheless, the theoreticalmachinery does not assume anything about the precise numbers of even-skipped moleculesand so the model will still be valid whenever such measurements become available.
ACKNOWLEDGMENTS
The authors are thankful to Prof. John Reinitz for invaluable discussions. We thankthe two anonymous reviewers for suggestions that increased the quality of our manuscript.GNP was supported by PNPD-CAPES under the Complex Systems Modeling Grad School(EACH USP). The authors are thankful to FAPESP and CAPES for financial support andto Institute of Physics of S˜ao Carlos (IFSC USP).22 ppendix A
Digitalization
The experimental data were digitalized by using the program
DigiteIt 1.5 . Optimization
All the fits obtained by optimization ran via least-squares method by using the WolframMathematica 7.0 program.
Ansatz of a two-phase transcriptional dynamics
Heaviside function . For those unfamiliar with the Heaviside function, we present at theFigure S5 the time dependence of the switching parameters f i ( t ) and h i ( t ) as presented atthe Eq.(10) of our manuscript. Initially, the pair ( f i ( t ) , h i ( t )) dominates the transcriptionaldynamics until a critical instant T i . From that time instant the dynamics is guided by thepair of rates ( e f i ( t ) , e h i ( t )). Parameters optimization . For optimization of the switching parameters purposes onemay work with asymptotic probabilities for the gene to be active and also represent theseprobabilities in terms of a Heaviside function. We start showing the asymptotic probabilitiesfor the gene to be active at each time phase as a i = f i ǫρ e a i = e f i ǫρ . (A1)Note that we are using the constants ǫ and ρ instead of h i + f i or e h i + e f i . That turns theoptimization process simpler.The optimization process is greatly simplified by the introduction of the Eq.(A1), whichcan be written as: q i ( t ) = a i , ≤ t ≤ T i e a i , t > T i (A2)23 T i f Ž f Time f i H t L T i h Ž h Time h i H t L FIG. S5. Heaviside-like time-dependence in the rates h i and f i . Each phase of the dynamicsof the mRNA expression at 41.5% AP position is associated with an evolution toward seeminglyasymptotic level. This, in turn, is associated with a pair of transition rates: the regulatory effectis represented by h i and f i in the first phase and by e h i and e f i in the second phase. Since thefirst and second phases are respectively characterized by the formation and disappearance of thestripe 2, this means that whereas the gene performs the transition off → on faster than on → off during the first phase, the inverse occurs during the second phase. Mathematically, this meansthat f i > e f i and e h i < h i . and is shown at the Figure S6. The simplification is due to the fact that q i ( t ) only assumesreal values between 0 and 1 while the switching rates may assume an non-negative real value. Coefficients for the Eq. (14) ξ i = N (1 − χ ) a i (1 − a i )1 + ǫη i = (cid:2) h n i i − µ i (cid:3) + [ h n i i − µ i ] + N (1 − χ ) ǫ − A i (0)(1 − A i (0)) − a i (1 − a i )]+ (cid:26) N (1 + χ ) ǫ − N (1 − a i + χa i ) − h n i i ] − (cid:27) V i − (2 µ i + 1) U i − U i λ i = 2 V i − ǫ [ N (1 + χ − ǫ ) + µ i ( ǫ −
2) + a i ǫ (1 − χ )] ψ i = 2 N (1 − χ )1 − ǫ [ A i (0)(1 − A i (0)) − a i (1 − a i )] + 2 [ µ i − N (1 − a i + χa i ) + V i ] V i φ i = − V i . T i ã q i H t L FIG. S6. Heaviside-like time-dependence in the rates q i . Heaviside-like time-dependence in rates h and f also makes the asymptotic probabilities a i and e a i vary suddenly over time. It is convenientto work with the q i , given by q i = ( ǫρ ) − f i , because it has units of probability, that is, 0 ≤ q i ≤ , 120 (1940).[2] S. Small, A. Blair, and M. Levine, The European Molecular Biology Organization Journal , 4047 (1992).[3] H. Janssens, S. Hou, J. Jaeger, A. Kim, E. Myasnikova, D. Sharp, and J. Reinitz, NatureGenetics , 1159 (2006).[4] A. Kim, C. Martinez, J. Ionides, A. F. Ramos, M. Z. Ludwig, N. Ogawa, D. H. Sharp, andJ. Reinitz, PLoS Genetics , e1003243 (2013).[5] M. B. Elowitz and S. Leibler, Nature , 335 (2000).[6] L. Cai, N. Friedman, and X. S. Xie, Nature , 358 (2006).[7] W. J. Blake, M. Krn, C. R. Cantor, and J. J. Collins, Nature , 633 (2003).[8] E. Aurell, S. Brown, J. Johanson, and K. Sneppen, Physical Review E , 051914 (2002).[9] D. T. Gillespie, Journal of Computational Physics , 403 (1976).[10] T. B. Kepler and T. C. Elston, Biophysical Journal , 3116 (2001).
11] M. Thattai and A. V. Oudernaaden, Proceedings of the National Academy of Sciences ,8614 (2001).[12] M. Sasai and P. G. Wolynes, Proceedings of the National Academy of Sciences , 2374(2003).[13] A. Arkin, J. Ross, and H. H. McAdams, Genetics , 1633 (1998).[14] Y. F. Wu, E. Myasnikova, and J. Reinitz, BMC Systems Biology , 1 (2007).[15] S. Surkova, D. Kosman, K. Kozlov, Manu, E. Myasnikova, A. A. Samsonova, A. Spirov, C. E.Vanario-Alonso, M. Samsonova, and J. Reinitz, Developmental Biology , 844 (2008).[16] S. Surkova, E. Myasnikova, H. Janssens, K. N. Kozlov, A. A. Samsonova, J. Reinitz, andM. Samsonova, Fly , 58 (2008).[17] Z. Liang and M. D. Biggin, Development , 4471 (1998).[18] J. M¨uller and M. Bienz, The European Molecular Biology Organization Journal , 3653(1992).[19] G. Tremml and M. Bienz, The European Molecular Biology Organization Journal , 2687(1992).[20] A. Griffiths, S. Wessler, R. Lewontin, and S. Carroll, “Introduction to genetic analysis,”(W.H. Freeman & Company; 9th ed., New York, 2008).[21] S. B. Carroll, S. DiNardo, P. H. O’Farrell, R. A. H. White, and M. P. Scott, Genes &Development , 350 (1988).[22] A. Martinez-Arias and P. A. Lawrence, Nature , 639 (1985).[23] A. P. Mahowald and P. A. Hardy, Annual Reviews of Genetics , 149 (1985).[24] E. Poustelnikova, A. Pisarev, M. Blagov, M. Samsonova, and J. Reinitz, Nucleic Acids Re-search , 2212 (2004).[25] A. Pisarev, E. Poustelnikova, M. Samsonova, and J. Reinitz, Nucleic Acids Research (2008).[26] E. Myasnikova, A. Samsonova, D. Kosman, and J. Reinitz, Development Genes and Evolution , 320 (2005).[27] M. Roark, P. A. Mahoney, M. L. Graham, and J. A. Lengyel, Developmental Biology ,476 (1985).[28] A. Par´e, D. Lemons, D. Kosman, W. Beaver, Y. Freund, and W. McGinnis, Current Biology , 2037 (2009).[29] S. Bergmann, Z. Tamari, E. Schejter, B. Shilo, and N. Barkaiemail, Cell , 15 (2008).
30] G. C. P. Innocentini and J. E. M. Hornos, Journal of Mathematical Biology , 413 (2007).[31] A. Ramos, G. Innocentini, F. Forger, and J. Hornos, IET Systems Biology , 311 (2010).[32] S. E. S. Pierre, L. Ponting, R. Stefancsik, P. McQuilton, and F. Consortium,Nucleic Acids Research , D780 (1992).[33] A. W. Shermoen and P. H. O. OFarrell, Cell , 303 (1991).[34] C. S. Thummel, Science , 39 (1992).[35] A. M. Femino, F. S. Fay, K. Fogarty, and R. H. Singer, Science , 585 (1998).[36] H. Bolouri and E. H. Davidson, Proceedings of the National Academy of Sciences , 9371(2003).[37] H. Edgar and E. H. Davidson, Cell , 747 (1986).[38] P. Surdej, A. Riedl, and M. Jacobs-Lorena, Annual Review of Genetics , 263 (1994).[39] A. Riedl and M. Jacobs-Lorena, IET Systems Biology , 3047 (1996).[40] J. P. Bothma, H. G. Garcia, E. Esposito, G. Schlissel, T. Gregor, M. Levine, Proceedingsof the National Academy of Sciences , 10598 – 10603 (2014).[41] V. Shahrezaei and P. S. Swain, Proceedings of the National Academy of Sciences , 17256– 17261 (2008).[42] T. Gregor, D. W. Tank, E. F. Wieschaus and W. Bialek. Cell , , 153 – 164, 2007.[43] A. F. Ramos, J. E. M. Hornos, J. Reinitz Phys. Rev. E , , 020701(R), 2015.[44] The Heaviside function, Θ, assumes two values, 0 or 1 if its argument is, respectively, negativeor positive., 020701(R), 2015.[44] The Heaviside function, Θ, assumes two values, 0 or 1 if its argument is, respectively, negativeor positive.