[PDF] Diverse spatial expression patterns emerge from common transcription bursting kinetics

Abstract

In early development, regulation of transcription results in precisely positioned and highly reproducible expression patterns that specify cellular identities. How transcription, a fundamentally noisy molecular process, is regulated to achieve reliable embryonic patterning remains unclear. In particular, it is unknown how gene-specific regulation mechanisms affect kinetic rates of transcription, and whether there are common, global features that govern these rates across a genetic network. Here, we measure nascent transcriptional activity in the gap gene network of early Drosophila embryos and characterize the variability in absolute activity levels across expression boundaries. We demonstrate that boundary formation follows a common transcriptional principle: a single control parameter determines the distribution of transcriptional activity, regardless of gene identity, boundary position, or enhancer-promoter architecture. By employing a minimalist model of transcription, we infer kinetic rates of transcriptional bursting for these patterning genes; we find that the key regulatory parameter is the fraction of time a gene is in an actively transcribing state, while the rate of Pol II loading appears globally conserved. These results point to a universal simplicity underlying the apparently complex transcriptional processes responsible for early embryonic patterning and indicate a path to general rules in transcriptional regulation.

Full PDF

DDiverse spatial expression patterns emerge fromcommon transcription bursting kinetics

Benjamin Zoller a ∗ , Shawn C. Little b ∗ , Thomas Gregor a † a Joseph Henry Laboratories of Physics and the Lewis Sigler Institute forIntegrative Genomics, Princeton University, Princeton, NJ 08544, USA b Department of Cell and Developmental Biology, University ofPennsylvania Perelman School of Medicine, Philadelphia, PA 19143, USA (Dated: December 25, 2017)In early development, regulation of transcription results in precisely positioned and highly repro-ducible expression patterns that specify cellular identities. How transcription, a fundamentally noisymolecular process, is regulated to achieve reliable embryonic patterning remains unclear. In par-ticular, it is unknown how gene-speciﬁc regulation mechanisms aﬀect kinetic rates of transcription,and whether there are common, global features that govern these rates across a genetic network.Here, we measure nascent transcriptional activity in the gap gene network of early

Drosophila em-bryos and characterize the variability in absolute activity levels across expression boundaries. Wedemonstrate that boundary formation follows a common transcriptional principle: a single controlparameter determines the distribution of transcriptional activity, regardless of gene identity, bound-ary position, or enhancer-promoter architecture. By employing a minimalist model of transcription,we infer kinetic rates of transcriptional bursting for these patterning genes; we ﬁnd that the keyregulatory parameter is the fraction of time a gene is in an actively transcribing state, while the rateof Pol II loading appears globally conserved. These results point to a universal simplicity under-lying the apparently complex transcriptional processes responsible for early embryonic patterningand indicate a path to general rules in transcriptional regulation.

I. INTRODUCTION

A central question in gene regulation concerns how dis-crete molecular interactions generate the continuum of ex-pression levels observed at the transcriptome-wide scale [1].A large set of molecular activities are required to elicit RNAtranscription, including transcription factor DNA bind-ing, chromatin modiﬁcations, and long-range enhancer-promoter interactions [2, 3]. However, in most cases it isunclear which, if any, of these interactions predominantlyregulate the RNA synthesis rate and the associated vari-ability for any given gene [4]. In general, for genes whosetranscription rates depend on levels of external inputs, wedo not know which regulatory steps are preferably tunedto achieve functional levels of mRNA expression. Overall,it is unknown whether constraints exist that might selectcommon mechanisms for modulating transcriptional activ-ity across genes, space and time.Addressing these questions requires measuring the ki-netic rates of transcription, in absolute units. Several stud-ies using single molecule counting approaches have docu-mented the inherently stochastic nature of transcription[5–8]. In organisms ranging from bacteria to vertebrates,genes exhibit transcriptional bursts characterized by in-termittent intervals of mRNA production followed by pro-tracted quiescent periods [9–11]. This inherent stochastic-ity in gene activation results in greater cell-to-cell variabil-ity than expected from models of constitutive expression(Poisson noise) [12]. Transcriptional bursting and the vari-ability associated with it is well-described and formalizedby a simple telegraph or two-state model of transcription ∗ These authors contributed equally. † Correspondence: [email protected] in which a locus alternates at random between active andinactive states [13]. Despite the prevalence of transcrip-tional bursting, in the majority of cases it is unknown whatprocesses determine its kinetics [14, 15]. Nor is it widelyunderstood how kinetic rates are controlled by external in-put signals (Molina et al, 2013; Senecal et al, 2014). Bymeans of quantitative modeling and properly calibrated ex-periments, it is possible to acquire an understanding of themechanisms underlying transcription regulation based ontheir signature in the measured transcriptional variabilityor noise [16–21].

Drosophila embryos provide an ideal model to explorethe relationship between intrinsic regulatory inputs andtranscriptional output [22]. Early embryos express manygenes in graded patterns of transcription levels in responseto modulatory inputs [23]. Spatial domains, where gene ex-pression levels transition from highly active to nearly silent,are functionally the most critical for the developing embryo,as they eventually determine speciﬁcation of cell identities[24, 25]. Among the earliest expressed genes in

Drosophila development are the gap genes, which encode transcriptionfactors responsible for anterior-posterior (AP) patterning[26]. Each gap gene is expressed in its own unique pat-tern, and their expression boundaries arise at distinct andprecise positions along the AP axis [27]. Gene expressionlevels near these boundaries are spatially graded across sev-eral cell diameters, and the intermediate levels of these gapgenes confer patterning information necessary for embry-onic segmentation [24, 25, 28, 29]. Thus the precise controlof expression levels across these boundaries is essential forproperly patterned cell fate speciﬁcation.The regulation of the gap genes appears highly complex.Many activating and repressing factors determine expres-sion boundaries through complex layers of homo- and het-erotypic protein interactions at multiple promoters and en-hancers [30–35]. Given this complexity, an intuitive expec- a r X i v : . [ q - b i o . S C ] D ec tation is that expression rates emerge from carefully tunedtranscription factor concentrations and binding aﬃnities.The expression pattern of each gap gene results from theinterplay between multiple enhancers responsible for thespatio-temporal regulation of transcription [35, 36]. Eachenhancer possesses distinct arrangements of binding sitesthat enable cooperation, as well as competition betweenmultiple activators and repressors [37]. The collective ac-tivity of cross-regulating transcriptional modulators gen-erates rates of gene expression that vary with position inthe embryo, thereby forming expression boundaries [38–40].Thus a straightforward prediction is that the underlyingbursting kinetics will diﬀer between boundaries. However,no previous work has investigated how kinetic rates com-pare between genes or across expression boundaries.To address these questions, we developed a singlemolecule ﬂuorescent in situ hybridization (smFISH) basedapproach that allows highly accurate counting of nascentRNAs molecules in individual nuclei. Measurements ofthe major Drosophila gap genes provide access to absolutemean transcription levels and their variability across loci.Using a simple telegraph model to interpret these measure-ments provides insights into the underlying transcriptionprocesses and reveals a common basic principle that uni-ﬁes the transcriptional output of all genes at all positionsin the embryo. Finally, based on this model we estimateits kinetic parameters from the data and determine whichand how these change as a function of position. Surpris-ingly, despite the diversity of cis-regulatory architectureand trans-acting factors regulating these genes, we estab-lished that transcriptional bursting is tightly constrainedacross all expression boundaries. Our ﬁndings suggest anunderlying emerging simplicity in the transcriptional reg-ulation of the apparently complex process of embryo seg-mentation.

II. RESULTSPrecise measurements of transcriptional activity

During early ﬂy development, the formation of gene ex-pression boundaries arises from spatially varying transcrip-tion factor concentrations. Early embryos thus providea natural, experimentally accessible context in which toask how input factors shape transcription dynamics. Herewe performed single molecule ﬂuorescent in situ hybridiza-tion (smFISH) [8], namely we used ﬂuorescent oligonu-cleotide probes to label single mRNA molecules in ﬁxedembryos and estimate the intensity of transcription sites(co-localized nascent transcripts) and individual cytoplas-mic mRNAs. This method enables us to measure instan-taneous transcriptional activity of gene loci per nucleus interms of “cytoplasmic unit” intensity (C.U.), by normaliz-ing the total ﬂuorescent intensity of each locus to the equiv-alent number of processed cytoplasmic mRNA molecules(Fig. 1A, B). Our enhanced method increases sensitiv-ity by 3- to 4-fold (see Methods), thereby enabling precisecounting of nascent transcripts, and hence measurement oftranscriptional activity across expression boundaries.We measured the transcriptional activity per nucleus for all the four trunk gap genes hunchback ( hb ), Kr¨uppel ( Kr ), knirps ( kni ), and giant ( gt ) along the embryo AP axis.These genes are expressed early during development inbroad spatial domains, thus permitting transcription mea-surements from thousands of synchronized nuclei across rel-atively small numbers of embryos; all factors that favor lowmeasurement error (Fig. 1C and 1D, N >

10 embryos percombination of genes and genotype). Expression level anal-ysis during the mid- to late portion of interphase 13 ensuresthat suﬃcient time has elapsed to allow these genes to at-tain steady-state levels of transcribing RNA polymerase II(Pol II; Supplement, Fig. S1); and DNA replication occursin early interphase for these loci [41], such that observationsduring later times eliminate ambiguity arising from variablenumbers of transcriptionally active loci. Since loci on re-cently duplicated chromatids are often closely apposed inspace, we measure total transcription per nucleus (as previ-ously described in Little et al, 2013) and use these data toinfer the statistical properties of individual loci. As a con-trol, we generated data from embryos heterozygous for a hb deﬁciency, and observed half the wild-type level of expres-sion per nucleus (Fig. 1C). Importantly, we also observe acorresponding decrease in variance to half that observed inwild-type (Fig. 1D), supporting previous ﬁndings that allloci behave independently [8]. These results demonstratethe suitability of using total transcriptional activity pernucleus to infer the behavior of individual loci.Since biological variance greatly constraints models ofthe regulatory processes underlying transcription, we needto determine how the total observed gene expression vari-ability decomposes into measurement error, embryo-to-embryo diﬀerences, and intrinsic ﬂuctuations in individualnuclei. We assessed the performance of our measurementswith a two-color smFISH approach, labeling each mRNAin alternating colors along the length of the mRNA strand.This approach allowed us to perform an independent nor-malization in each channel, and thus to characterize thevarious sources of measurement error, such as noise stem-ming from imaging, normalization, and embryo alignmentalong the anterior-posterior axis (Supplement, Fig. S2).For all genes and at all positions, the measurement vari-ability represents less than 10% of the total variance onaverage (Fig. 1E), indicating that biological variability islargely dominant in our measurements [27]. Importantly,this variability arises almost entirely from diﬀerences be-tween nuclei, rather than diﬀerences between embryos (Fig.1F). Notably, the low embryo-to-embryo variability in themaximally expressed regions (16 ±

4% CV, Fig. 1G) empha-sizes that the mean expression levels across embryos are re-producible in absolute units (Fig. 1C). Thus the measuredexpression noise mainly stems from zygotic transcription,and is intrinsic to the molecular processes underlying tran-scription rather than from an extrinsic source of variability,such as maternal age or live history, or other processes oc-curring during oogenesis. Our low measurement error com-bined with largely dominant intrinsic variability facilitatethe analysis of the noise-mean relationship and permit theinference of expression kinetics from several hundred nucleiat each position along the AP axis (Fig. 1B), as detailedbelow. M ean a c t i v i t y µ [ C . U .] hb wthb defKrkni latekni earlygt femalegt male x/L V a r i an c e a c t i v i t y σ [ C . U .] x/L T r an sc r i p t i ona l a c t i v i t y [ C . U .] Mean activity

Single nuclei

10 20 30 40

Mean activity in max region [C.U.] E m b r y o v a r i ab ili t y σ e m b / µ % hb w t hb de f K r k n i l a t e k n i ea r l y g t f e m a l e g t m a l e σ * / σ A BCDE F

Embryo AP position

Transcription sitesNuclei G (cid:173)(cid:172) (cid:174) σ σ σ σ = σ + + + (cid:173)(cid:172) (cid:174) σ σ σ σ σ σ FIG. 1:

Absolute quantiﬁcation of gap gene transcriptional activity.

A) Transcriptional activity of individual nucleimeasured by single molecule mRNA-FISH in nuclear cycle 13 of the blastoderm embryo. The activity of individual nuclei resultsfrom the summed intensity of each locus (transcription sites, green bright spots) normalized by the average intensity of a singlefully elongated cytoplasmic mRNA (cytoplasmic unit, C.U.). B) Transcriptional activity proﬁle for the gap gene hb as a functionof AP position in % egg length for 18 embryos. Each dot represents the total mean intensity of nascent transcription activity inC.U. for individual nuclei. Vertical dashed lines deﬁne AP bins covering 2.5% of egg length; crosses display mean activity in eachbin . C) Mean transcriptional activity as a function of AP position for all measured trunk gap genes in C.U.. D) Total varianceof transcriptional activity as a function of AP position for all measured trunk gap genes in [C.U.] . E) Decomposition of the totalvariance σ into measurement error and biological variability for all genes. Estimates of imaging error (red) alignment error (blue),embryo-to-embryo variability (magenta) are decoupled from the total variance. The remaining variance corresponds to biologicalvariability and is termed intrinsic nucleus-to-nucleus variability in the text (green). F) Global decomposition of total variance forthe entire data set. Nucleus-to-nucleus variability largely dominates in the blastoderm embryo. G) Fractional embryo-to-embryovariability (CV) as a function of mean activity (solid black line: mean ratio; dashed lines: 66% conﬁdence intervals) reaches 16 ± Single parameter distribution of transcriptionalactivity across all expression boundaries

The expression patterns of the four gap genes are eachdetermined by multiple enhancer elements positioned atvarying distances from their promoters [35, 36]. In addi-tion, each enhancer contains a variable number of bind-ing motifs for multiple patterning input factors with cross-regulatory interactions [42, 43]. These features, as well asevidence from genetic manipulations [44–46], indicate thatmany molecular processes are required to regulate the tran-scription rates that generate observed mRNA expressionlevels with their stereotypical modulation as a function ofposition in the embryo (Fig. 1C). Given the diversity ofinput factors and molecular control elements involved inthe transcription process, it would appear likely that dif-ferent genes exhibit vastly diﬀerent and uniquely deﬁnedtranscription kinetics. In order to make progress in our un-derstanding of these complex multi-factorial relationships,we capitalize on the fact that the kinetics of the processesunderlying transcription determine not only these mean ex-pression levels but also the observed variability in our data(Fig. 1D). Thus we can use measurements of the noise-mean relationship to characterize the expression kineticsfor individual genes.To characterize the diﬀerent noise-mean relationshipsin our system, we examined the dependence of variabil-ity on mean transcription levels (Fig. 2A). In agreementwith previous measurements [8], genes span a similar dy-namic range of expression levels across boundaries, span-ning nearly zero to a maximum value of 34 ± ) ap-proximately 12 times larger than Poisson noise for a meantranscriptional activity below 10 C.U. (Fig. 2A). However,the noise-mean relationship follows an unexpectedly simi-lar overall trend for all genes (Fig. 2A). First, unlike manyother systems (bacteria, yeast, mammalian culture cells),there is no clearly identiﬁable noise ﬂoor at high expres-sion [7, 21, 47]. The absence of such an extrinsic noiseﬂoor is likely a key feature of early embryo developmentsince nuclei are highly synchronized (cell-cycle wise) andshare the same environment (syncytial blastoderm). Sec-ond, the apparent collapse of each gene on a unique curveis unexpected and atypical given the diﬀerent promoter-enhancer architectures [48, 49]. Even more strikingly, whenwe converted the units from activity in C.U. to Pol IInumber counts g (i.e. by taking into account individualgene length, copy number, and probe conﬁguration, seeSupplement), the 2 nd , 3 rd and 4 th cumulants for all genesare almost uniquely determined by a single parameter, themean activity (Fig. 2B-D). Thus transcriptional activity ischaracterized across the entire expression range and for allgenes by a unique common single-parameter distributionexclusively speciﬁed by the mean. Such a universal fea-ture suggests that despite the well-documented diversity ofcis-regulatory elements and trans-acting factors, a commonconserved set of processes is regulated to determine tran-scription kinetics across nearly all expression boundaries in the early embryo. Two-state model identiﬁes unique control parameterunderlying universality

These shared features suggest that a common generalmodel can describe the regulated kinetics of all genes. Apopular minimalist model that accounts for intrinsic super-Poissonian variability is the two-state model, where lociswitch stochastically between transcriptionally inactive andactive states, with transcription initiation only occurringin the latter (Fig. 2E) [13]. The two-state model has beenwidely used to describe the distribution of mature mRNAand protein counts [5, 6, 50]. Such a simple mechanisticmodel enables estimation of switching rates between pro-moter states ( k on and k oﬀ ) as well as the eﬀective initiationrate k ini [10, 51, 52]. Our measurements of nascent tran-script counts represent instantaneous transcriptional activ-ity of Pol II molecules engaged in transcription, and thusprovide a more direct measurement of instantaneous tran-scription activity compared to counts of mature mRNAs orproteins. For these reasons the two-state model presentsa straightforward and parameter-sparse means to describehow discrete randomly occurring events generate a contin-uum of expression rates. It allows us to predict the depen-dence of variability on mean activity for diﬀerent scenariosof parameter modulation, and to determine which of thekinetic rate parameters ( k on , k oﬀ and k ini ) is modulated toform gene expression boundaries.Given that the ﬁrst four cumulants of our data are almostuniquely determined by a single parameter, we sought toonly explore single parameter modulation consistent withthe data. In addition, we assumed the Pol II elongationrate, k elo , to be constant and identical for all genes [53, 54].When we solve the master equation for such a model (seeSupplement), a comparison of the predicted noise activity(Fig. 2F) with our data (Fig. 2A) under each scenario un-equivocally eliminates a modulation of k ini . Indeed, solelyvarying k ini would lead to saturation of noise at high ac-tivity, which is not observed. Instead, our measurementsare consistent with modulation of the fractional mean pro-moter occupancy (cid:104) n (cid:105) , deﬁned as (cid:104) n (cid:105) = k on / ( k on + k oﬀ ).(Note that here occupancy refers to the active or “ON”state and (cid:104) n (cid:105) is thus bound between zero and one.) Thisvalue represents the fraction of time spent in the activestate and is equivalent to the probability of ﬁnding a locusin the active state. Modulation of the mean production rateis thus determined by (cid:104) n (cid:105) rather than the rate at which PolII molecules enter into productive elongation. Tuning onlythe mean occupancy (cid:104) n (cid:105) thus uniquely describes the forma-tion of all expression boundaries regardless of their positionin the embryo.In principle, within the two-state model, either or both ofthe rates k on and k oﬀ may be tuned to modulate the meanoccupancy. Modulation of k oﬀ alone can be ruled out, sincesuch a scenario would not capture the noise properly below10 C.U. (see Fig. 2F). To distinguish between the otherscenarios, we calculated the higher cumulants predicted bythe two-state model (Fig. 2F-I). Although modulation of k on alone may explain the noise and variance (Fig. 2F-G), -1 Mean activity µ [C.U.] -2 -1 N o i s e a c t i v i t y σ / µ P o i ss on no i s e µ hb wthb defKrkni latekni earlygt femalegt male Mean activity /g -0.3-0.2-0.100.10.20.3 F ou r t h c u m u l an t a c t i v i t y κ / g Mean activity /g -0.2-0.100.10.20.3 T h i r d c u m u l an t a c t i v i t y κ / g Mean activity /g V a r i an c e a c t i v i t y σ g2 / g Mean occupancy V a r i an c e a c t i v i t y σ g2 / g Mean occupancy -0.1-0.0500.050.1 F ou r t h c u m u l an t a c t i v i t y κ / g Mean occupancy -0.0500.050.10.15 T h i r d c u m u l an t a c t i v i t y κ / g -1 Mean activity µ [cu] -2 -1 k ini modulation modulation (k on ) modulation (k off ) modulation ( τ n cst > τ e ) modulation ( τ n cst < τ e ) N o i s e a c t i v i t y σ / µ A BEF G CDHI k ini n=0 n=1 k on k off inactive activePromoter state n nascent mRNA k elo Pol2loading Pol2 residency time τ e = L/k elo g g+1 → gene length LPol2 number gMean activity = g = k ini τ e Mean occupancy = = k on τ n k on k on + k off nucleusMeasured activityN g = 2 to 4 transcription sitesMean activity µ = C N g FIG. 2:

A two-state model recapitulates single-parameter modulation for all genes.

A) Noise-mean relationship forgap genes, with noise deﬁned as the fraction between variance and the squared mean activity (CV ). Dashed line stands forthe Poisson background, corresponding to the lowest attainable noise. Solid lines were obtained by ﬁtting 2 nd order polynomialsfor each gene. The collapse of the trend to Poisson noise at high expression suggests an upper limit µ in attainable expressionlevels (vertical dashed line). B-D) Normalized 2 nd , 3 rd and 4 th cumulant as a function of normalized Pol II counts for a singlegene copy. The activity intensities in C.U. were converted into Pol II counts g by taking into account actual ﬂuorescent probelocations and gene lengths. Assuming independence, the mean and the cumulant were divided by the gene copy number N g = 2 , nd , 3 rd and 4 th orderpolynomial, constrained to match the Poisson level at maximum Pol II counts g0. E) Two-state model for statistical properties ofmeasured transcriptional activity: promoter switches stochastically between an inactive and active state leading to intermittent PolII initiation events. The mean activity in Pol II counts is (cid:104) g (cid:105) = k ini τ e (cid:104) n (cid:105) , with initiation rate k ini , elongation time τ e = L/k elo , andpromoter mean occupancy (cid:104) n (cid:105) = k on / ( k on + k oﬀ ) ∈ [0 , g = k ini τ e . The measured meanactivity in C.U. is µ = CN g (cid:104) g (cid:105) , where N g is the gene copy number and C ∈ [0 ,

1] a conversion factor that depends on the probelocations on transcripts. F) Noise-mean relationship and G-I) normalized 2 nd , 3 rd and 4 th cumulants predicted by the two-statemodel under diﬀerent single parameter mean activity modulation schemes: Pol II initiation rate k ini (gray), OFF-rate k oﬀ (green),ON-rate k on (cyan), and promoter occupancy (cid:104) n (cid:105) at constant switching correlation time τ n = ( k on + k oﬀ ) − (red for τ n > τ e , andblue for τ n < τ e ). Notably, (cid:104) n (cid:105) modulation at slow switching ( τ n > τ e ) achieves numerical values that closely match the trends ofour data. Error bars stand for the 66% conﬁdence intervals. it does not capture the 3 rd and 4 th cumulants well (Fig.2H-I). Alternatively, the model can be parameterized by (cid:104) n (cid:105) and the correlation time τ n = ( k on + k oﬀ ) − i.e., thecharacteristic time-scale for changes in promoter activity.Thus, both switching rates k on and k oﬀ are fully determinedby (cid:104) n (cid:105) and τ n : k on = (cid:104) n (cid:105) τ n and k oﬀ = 1 − (cid:104) n (cid:105) τ n For this new set of parameters, we observed a good matchto the data when τ n is maintained ﬁxed and (cid:104) n (cid:105) varies alone(Fig. 2G-I), consistent with a single parameter modula-tion. In addition, the modulation of (cid:104) n (cid:105) in a slow switchingregime ( τ n ≥ τ e the elongation time) captures the cumu-lants more closely than in a fast regime ( τ n < τ e ). Thusthe two-state model predicts that in this system simulta-neously increasing active and decreasing inactive periodscontrols gene activity. Modulation of (cid:104) n (cid:105) at constant τ n for all boundaries and at all positions represents a com-mon, universal regulatory mechanism independent of geneidentity. Kinetic transcription parameters in absolute units

Further insight into the molecular mechanisms underly-ing transcription necessitates knowledge about the absolutescales of the relevant kinetic parameters. Hence, to go be-yond the above arguments based on summary statistics,we inferred kinetic rates for each gene and at each positionfrom the full distribution of transcriptional activity whiletaking into account measurement noise. We thus utilizeddual-color smFISH, tagging the 5’ and 3’ regions of thetranscripts with diﬀerently colored probe sets that providetwo complementary readouts (in C.U.) of the transcriptionsite activity. The measured 5’ and 3’ activities are corre-lated via a ﬁnite Pol II elongation time (Supplement, Fig.S4). Therefore the possible set of kinetic parameters thatcould generate the observed activities is constrained (Meth-ods, Figs. S4, S5 and S6). It allows us to deduce the kineticrate parameters ( k ini , k on , and k oﬀ ) of the two-state modelfrom the joint distribution of 5’ and 3’ activities for eachAP position (Fig. S5). Previous measurements of Pol IIelongation rate k elo = 1 . k ini is constant at about 7.2 ± k ini are disfavored as mech-anisms for controlling overall mRNA synthesis rates. Be-cause these are found to be entirely determined by (cid:104) n (cid:105) forall genes, and spanning a similar dynamic range for allboundaries (Fig. 3C), we advocate that promoter occu-pancy represents the key control parameter describing theformation of expression boundaries.Surprisingly, both k on and k oﬀ change as a function ofmean occupancy (cid:104) n (cid:105) and are tightly constrained for allgenes and across all boundaries (Fig. 3D, E). This sug-gests that some combination of the switching rates is con-served. Indeed, as predicted by the two-state model above,the correlation time of the switching process is also con-stant ( τ n = 3 . ± . k on and k oﬀ are modulated simultaneously:an increase in overall expression rate is achieved both by in-creasing the time spent active (lowering k oﬀ ) and by jointlydecreasing the time spent inactive (raising k on ). Mechanis-tically this result is surprising because it implies that k on and k oﬀ are coordinated such that the promoter switchingcorrelation time is constant and that all boundaries emergefrom quantitatively identical modulation of switching rates.These ﬁndings are unbiased by our methodology as con-ﬁrmed using synthetic data (Fig. S6). In addition, eventwo-fold changes in elongation rate leave our conclusionsunaﬀected, aside from a rescaling of the kinetic parameters(Supplement, Fig. S8).As noted above, boundaries arise through the combinedactivities of multiple transcription factor inputs, with eachboundary generated by a unique combination of inputs.Current models of boundary formation imply that expres-sion rates of target genes emerge from careful tuning ofinput factor concentrations and DNA binding aﬃnities[26, 40, 55]. The complexity and diversity of these in-puts therefore leads to an intuitive expectation that ki-netic switching rates must also diﬀer between genes. Thisexpectation seems all the more reasonable given the factthat many combinations of k on and k oﬀ generate the same (cid:104) n (cid:105) . A constant correlation time implies within the con-text of the two-state model that all genes at all positionsreach steady-state simultaneously. Such global synchronic-ity across all loci is maintained for all genes at all expressionrates. We propose that such synchronicity is importantfor ensuring precise and reproducible patterning outcomes,and that this requirement constrains the range of attain-able or otherwise desirable values of k on and k oﬀ . Thus,the apparently complex process of regulating gene expres-sion rates is explained by a conceptually simple strategy ofuniversal modulation. III. DISCUSSION

A multitude of molecular processes inﬂuence rates ofgene expression. However, it is not clear which, if any,interactions might be more likely than others to determineexpression rates, either globally across all genes or for sin-gle genes in response to modulatory inputs. Nor have most

Mean occupancy -3 -2 -1 k on [ m i n - ] Mean occupancy -2 -1 k o ff [ m i n - ] Mean occupancy -1 τ n = / ( k on + k o ff ) [ m i n ] k i n i [ m i n - ] x/L < n >= k on / ( k on + k o ff ) hb wthb defKrkni latekni earlygt femalegt male ABD E FC

Mean occupancy M ean sy n t he s i s r a t e k i n i < n > [ m i n - ] FIG. 3:

Transcription rates are tightly constrained across genes.

A,B) Inferred Pol II initiation rate k ini (A), and promotermean occupancy (cid:104) n (cid:105) (B) based on the two-state model for all trunk gap genes across AP position. C) Modulation of transcriptmean synthesis rate k ini (cid:104) n (cid:105) is fully determined by the mean occupancy (cid:104) n (cid:105) . D,E) Inferred on-rate k on (D) and oﬀ-rate k oﬀ (E) as afunction of the mean occupancy (cid:104) n (cid:105) for all gap genes. Solid black line stands for the global trend. F) Switching correlation time τ n as a function of mean occupancy (cid:104) n (cid:105) for all gap genes. Error bars stand for the 10 to 90 th percentiles of the posterior distribution. previous studies documented similarities across endogenousgenes. We have developed a single molecule method formeasuring modulated kinetic parameters of endogenousgenes. Surprisingly, all expression boundaries arise fromequivalent switching kinetics, regardless of the diﬀerencesin upstream regulatory elements. Thus, a simple, commonstrategy for transcriptional modulation emerges from theapparently complex combination of regulatory interactionsspeciﬁc to each gene. This suggests a shared regulatorybasis for transcriptional modulation, the nature of which iscurrently unknown.These observations raise the question of whether thecommon transcriptional kinetics carry a functional advan-tage [56]. The precise positioning of cell fates in early em-bryos requires minimizing variability between nuclei, whichis achieved by a combination of long mRNA and proteinlifetimes permitting accumulation, and spatial averagingthrough the syncytial cytoplasm, which all serve to mini-mize variability in patterned expression [8]. At the level oftranscription, noise minimization would be best achievedvia modulation of constitutive activity. Simply changingthe Pol II initiation rate k ini at a constitutive promoter (al-ways ‘ON’) would generate the theoretical minimal (Pois-son) noise at all levels of activity [49]. The fact that con- stitutive activity is never observed suggests that some con-straint prohibits this system from maintaining these genesin a continuously active state, and/or it is not mechanisti-cally straightforward to alter k ini . Instead, our observationof constant switching correlation time at all genes and ex-pression levels suggests that this value plays an importantrole in facilitating robust patterning. The constant cor-relation time we measure here implies that each gene atall positions attains steady-state in synchrony, and in atimely manner as its value ( τ n = 3 . ± . ∼

15 min in cycle 13).This suggests that the relative production rates are main-tained across a boundary during early development. In ad-dition, the short switching time compared to the combinedduration of early nuclear cleavage cycles ensures eﬀectivetemporal averaging of the switching noise by accumulationof stable transcripts. We therefore propose that both ex-pression timing and noise minimization jointly constrainswitching kinetics.Our ﬁndings suggest that all regulatory inputs interfaceupon a universal set of processes to determine the kinet-ics of the promoter states. However, these results do notaddress the mechanistic origins of the common switchingrates. Unique combinations of transcription factors deter-mine all the boundaries we have examined. It is possi-ble that protein-DNA aﬃnities have been selected to con-fer the rates we observe. Alternatively, other events suchas promoter-enhancer interactions, chromatin modiﬁcationdynamics, or pausing/release of Pol II, predominantly de-termine switching rates. Indeed, it is not clear how tran-sient transcription factor interactions, usually on the orderof a few seconds, could generate bursts on the order ofminutes [57–61]. Recent evidence suggests that Mediatorand TBP binding play a key role in determining burstingkinetics [15]. Thus switching kinetics may not be directlydetermined by transcription factor binding, but by com-mon transcriptional rate-limiting steps related to recruit-ment and stability of general factors. Interestingly, more“mechanistic” extensions of the two-state model could inprinciple reproduce the modulation of mean activity (cid:104) n (cid:105) atconstant correlation time with a single varying kinetic pa-rameter resulting from transcription factor titration acrossboundaries. A possible extension is a 3-state model, de-scribing a two-step reversible activation, which is consis-tent with enhancer-promoter interaction and establishmentof the transcription machinery [20]. Alternatively a modelwith an additional noise term such as input noise stem-ming from diﬀusion of transcription factors [16, 62] couldexplain the apparent dual modulation of the switching ratesobserved under the 2-state model. It would then only bewhen these more detailed models are eﬀectively reducedto the 2-state that the surprising dependence of the ratesemerges. Distinguishing these models from the simple 2-state model will require live imaging of the endogenous gapgenes.The universal transcriptional parameters of the gapgenes highlight a form of complexity reduction: despitethe variety of upstream regulatory elements, all expressionboundaries result from similar switching kinetics. As dis-cussed above, whether this simplicity results from an un-derlying molecular simplicity has yet to be determined. Re-gardless of the mechanistic means by which these similarrates are achieved, this convergence strongly suggests thepresence of global constraints that either limit the rangeof permitted bursting rates and/or minimize transcriptionvariability in the context of the rapidly developing earlyembryo. Such convergence might indicate the possibilityof a path to general rules in transcription regulation. It isnow possible to inquire about the breadth of these gener-alities and whether they apply to the same gene expressedin diﬀerent cell types, or to the transcriptome as a whole,or even across organisms. The methods we utilize here areapplicable in a variety of systems and permit the discoveryof the molecular mechanism(s) conferring universal tran-scription kinetics. MethodsFly strains

Oregon-R (Ore-R) embryos were used as wild-type. Em-bryos heterozygous for a deﬁciency spanning hb were col-lected from crosses of heterozygous adults of the strainw ; Df(3R)BSC477/TM6C. Heterozygotes of the hb de- ﬁciency, as well as wild-type male and female embryosstained for gt , were distinguished from siblings by visualinspection of nascent transcription sites. FISH and imaging

We modiﬁed our smFISH protocol [8] to minimize back-ground and maximize signal. Embryos were crosslinkedin 1xPBS containing 16% paraformaldehyde for 2 minutesbefore devitellinization as described in [63]. Embryos werewashed four times in methanol, 5 minutes per wash, withgentle rocking at room temperature, followed by an ex-tended 30-60 minute wash in methanol. Fixed embryoswere then used immediately for smFISH without interven-ing storage. Embryos washed three times in 1X PBS, 5minutes per wash, at room temperature with rocking. Em-bryos were then washed 3 times in smFISH wash buﬀer [8],10 minutes per wash, at room temperature. During thistime, probes diluted in hybridization buﬀer [8] were pre-heated to 37C. Hybridization was performed for 1.5 hr at37C with vigorous mixing every 15 minutes. During hy-bridization, smFISH wash buﬀer was preheated to 37C.Embryos were washed four times with large excess vol-umes of wash buﬀer for 3-5 minutes per wash, rinsed twicebrieﬂy in PBS, stained with DAPI, and mounted in VEC-TASHIELD (Vector Laboratories; H-1000). Imaging wasperformed within 48 hr to ensure high quality signal. DNAoligonucleotides complementary to the open reading framesof genes of interest were conjugated to Atto 565 (Sigma-Aldrich; 72464) and Atto 633 (Sigma-Aldrich; 01464) forall two color measurements.Imaging was performed by laser-scanning confocal mi-croscopy on a Leica SP5 inverted microscope. We useda 63x HCX PL APO CS 1.4 NA oil immersion objectivewith pixels of 76 ×

76 nm and z spacing of 340 µ m. Wetypically obtained stacks representing 8 µ m in total axialthickness starting at the embryo surface. The microscopewas equipped “HyD Hybrid Detector” avalanche photodi-odes (APDs) which we utilized in photon counting mode.This is in contrast to our prior approach [8] in which stan-dard photomultiplier tubes (PMTs) were used to collecttwo separate smFISH image stacks at two diﬀerent laserintensities: a low power stack for measuring transcriptionintensities, and a high power stack to distinguish single mR-NAs. The use of low-noise photon-counting APDs in placeof standard photomultipliers provided suﬃcient dynamicrange to capture high signal transcription sites and to sep-arate relatively dim cytoplasmic single mRNAs from back-ground ﬂuorescence with a single laser power. This alsoabrogated the need to calibrate the high- and low-powerstacks for comparison. The removal of the calibration stepprovided an additional reduction in measurement error. Image analysis and calibration in absolute units

The raw data are processed according to the image anal-ysis pipeline previously developed and described in detailsin [8, 64]. Brieﬂy, raw images are ﬁltered using a Diﬀerence-of-Gaussians (DoG) ﬁlter to detect spot objects. A masterthreshold is applied to separate candidate spots from back-ground. True point-like sources of ﬂuorescence are iden-tiﬁed, as they appeared on multiple consecutive z-slices( >

3) at the same location. All candidate particles arethen labeled as transcription sites, cytoplasmic transcriptsor noise based on global thresholds. The threshold sepa-rating cytoplasmic transcripts from noise is deﬁned as thebottom of the valley between the two peaks on the parti-cle intensity distribution. The threshold for transcriptionsites depends both on intensity and position, as transcrip-tion sites cluster in z and are enclosed in nuclei (segmentedfrom DAPI staining). Intensity of transcription sites is ob-tained by integrating the signal over a ﬁx cylinder volume(V V s = π × . × . µ m , determined from the PSF).We calibrated the integrated intensity of transcriptionsites F s by ﬁrst characterizing the relationship between theﬂuorescence signal and the density of cytoplasmic tran-scripts. We deﬁned summation volumes in the embryo( V (cid:39) . × . × . µ m ) avoiding region of high tissuedeformation and excluding transcription sites location. Foreach summation volume we counted the number of detectedcytoplasmic transcripts and integrated the ﬂuorescence in-tensity. At low count density, the ﬂuorescence per summa-tion volume F scales linearly with density D [8]. Fitting asimple linear relationship F = αD + β , where β correspondsto background, enables estimation of a scaling factor α tocalibrate transcription sites in “cytoplasmic units” (C.U.)for each embryo. Namely, the intensity in C.U. is givenby f = ( F s − bV s ) /α where b is the background intensityper pixel in each nucleus. We estimated the measurementerror by imaging a control gene ( hb ) in two independentchannels using an alternating probe conﬁguration. Afternormalizing the intensities in C.U., both channels corre-late with slope close to one. The measurement error wasestimated from the residuals of the orthogonal regression(Fig. S1A, Supplement). In the maximally expressed re-gions, we measure transcriptional activity with an error of5% and relate it to absolute units with an uncertainty un-der 3.5% (the largest deviation of the slope 0.968 ± Two-state model of promoter activity

The transcriptional activity of a single gene copy wasmodeled as a telegraph process (ON-OFF promoter switch)with transcript initiation occurring as a Poisson processduring the ‘ON’ periods [13]. Within the two-state model,the distribution of nascent transcripts on a gene resultsfrom random Pol II initiation in the active state coupledwith elongation and termination [52, 65, 66]. For simplicity,we combined elongation and termination as an eﬀectiveprocess that was modeled as a deterministic progression.In addition, we assumed that all the kinetic rates of themodel are constant in time and identical across embryos.The key parameters of the model are the initiation rate k ini , the promoter switching rates k on and k oﬀ , and theelongation time τ e . At stead-state, the mean number oftranscripts (cid:104) g (cid:105) and the variance σ g for a single gene copyare given by (cid:104) g (cid:105) = g (cid:104) n (cid:105) and σ g = g (cid:104) n (cid:105) + g (cid:104) n (cid:105) (1 − (cid:104) n (cid:105) )Φ( τ e /τ n )where g = k ini τ e is the mean number of transcripts in aconstitutive regime (always ‘ON’), (cid:104) n (cid:105) = k on / ( k on + k oﬀ )the mean promoter activity, τ n = 1 / ( k on + k oﬀ ) the switch-ing correlation time, and Φ ∈ [0 ,

1] a noise averaging func-tion (Supplement). Higher cumulants can be readily cal-culated from the master equations (Supplement). Assum-ing independent gene copies, the mean and the cumulantssimply scales with the number of gene copies (either 2or 4 copies per nucleus). The mean activity µ in cyto-plasmic units is related to (cid:104) g (cid:105) by a conversion factor C that takes into account the exact FISH probes location,i.e. µ = CN g (cid:104) g (cid:105) with N g the gene copy number (Supple-ment). Similarly, cumulants of the data are rescaled byconversion factors that ensure proper normalization of thePoisson background (Supplement). Inferring transcription kinetics of endogenous genesfrom two color smFISH

We performed dual-color smFISH tagging the 5’ and 3’region of the transcripts with diﬀerent probe sets (Fig.S4A). After normalization in cytoplasmic units, both chan-nels oﬀer a consistent readout of the mean and the vari-ability (Fig. S4B-C). Assuming constant elongation rate,the probe locations and the gene length predicts the mea-sured 3’ to 5’ mean activity ratio (Fig. S4D), albeit withsmall deviations likely stemming from termination (Fig.S4E-F). The two channels enable estimation of the totalnascent transcripts (5’ channel) and the fractional occu-pancy of transcripts along the 5’ and 3’ portions of thegene at each locus (Fig. S5A-B). The 5’ and 3’ activitiesare temporally correlated through the elongation processand thus provide additional information about transcrip-tion than single channel smFISH (Fig. S5C). Combiningmeasurements from multiple embryos (Fig. S5B-C), we se-lect nuclei at similar positions (bins of 2.5% egg length) togenerate the joint distribution of 5’ and 3’ activity acrossAP position bins (Fig. S3D). We modeled the joint distri-bution of 5’ and 3’ activity based on the two-state modeland the exact probe location assuming steady-state (Sup-plement). The modeled distribution enables calculatingthe likelihood of normalized ﬂuorescence intensities of tran-scription sites given a set of kinetic parameters. Using aBayesian approach, we infer the kinetic rate parameters( k ini , k on , and k oﬀ ) of the two-state model from the jointdistribution at each position (Fig. S5D). The inferenceframework explicitly takes into account measurement noiseand the presence of multiple loci (Supplement). The corre-sponding best-ﬁtting distributions predicted by the modelmatch the data closely (Fig. S5E), and outliers are mainlyexplained by measurement and binning noise. To validatethis approach, we tested the inference on simulated datausing a broad range of parameter values and in presence ofmeasurement noise. This demonstrated that our method0is capable of discerning diﬀerences as small as 20% in sim-ulated parameters across an arbitrarily large range of pa-rameter values (Fig. S6). Aknowledgement

We thank W. Bialek, P. Fran¸cois, J. Kinney, M. Levo, M.Mani, F. Naef, T. Sokolowski, G. Tkacik & E. Wieschaus for insightful discussions and valuable comments regard-ing the manuscript. B. Zoller was supported by the SwissNational Science Foundation early Postdoc.Mobility fellow-ship. This study was funded by grants from the NationalInstitutes of Health (U01 EB021239, R01 GM097275). [1] Timoth´ee Lionnet and Robert H Singer. Transcription goesdigital.

EMBO reports , 13(4):313–321, April 2012.[2] Ty C Voss and Gordon L Hager. Dynamic regulationof transcriptional states by chromatin and transcriptionfactors.

Nature Publishing Group , 15(2):69–81, December2013.[3] Michael Levine, Claudia Cattoglio, and Robert Tjian.Looping back to leap forward: transcription enters a newera.

Cell , 157(1):13–25, March 2014.[4] Diana A Stavreva, Lyuba Varticovski, and Gordon L Hager.Complex dynamics of transcription regulation.

Biochimicaet biophysica acta , 1819(7):657–666, July 2012.[5] Arjun Raj, Charles S Peskin, Daniel Tranchina, Diana YVargas, and Sanjay Tyagi. Stochastic mRNA synthesis inmammalian cells.

PLoS Biology , 4(10):e309, October 2006.[6] Daniel Zenklusen, Daniel R Larson, and Robert H Singer.Single-RNA counting reveals alternative modes of gene ex-pression in yeast.

Nature Structural & Molecular Biology ,15(12):1263–1271, December 2008.[7] Yuichi Taniguchi, Paul J Choi, Gene-Wei Li, Huiyi Chen,Mohan Babu, Jeremy Hearn, Andrew Emili, and X Sun-ney Xie. Quantifying E. coli proteome and transcriptomewith single-molecule sensitivity in single cells.

Science ,329(5991):533–538, July 2010.[8] Shawn C Little, Mikhail Tikhonov, and Thomas Gregor.Precise developmental gene expression arises from globallystochastic transcriptional activity.

Cell , 154(4):789–800,August 2013.[9] Ido Golding, Johan Paulsson, Scott M Zawilski, and Ed-ward C Cox. Real-time kinetics of gene activity in individ-ual bacteria.

Cell , 123(6):1025–1036, December 2005.[10] David M Suter, Nacho Molina, David Gatﬁeld, Kim Schnei-der, Ueli Schibler, and Felix Naef. Mammalian genes aretranscribed with widely diﬀerent bursting kinetics.

Science ,332(6028):472–474, April 2011.[11] Jacques P Bothma, Hernan G Garcia, Emilia Esposito,Gavin Schlissel, Thomas Gregor, and Michael Levine. Dy-namic regulation of eve stripe 2 expression reveals tran-scriptional bursts in living Drosophila embryos.

Proceedingsof the National Academy of Sciences , 111(29):10598–10603,July 2014.[12] William J Blake, Mads Kærn, Charles R Cantor, and J JCollins. Noise in eukaryotic gene expression.

Nature ,249(2002):247–249, 2003.[13] Jean Peccoud and Bernard Ycart. Markovian modelingof gene-product synthesis.

Theoretical Population Biology ,48(2):222–234, 1995.[14] Daniel Hebenstreit. Are gene loops the cause of transcrip-tional noise?

Trends in Genetics , 29(6):333–338, June2013.[15] Katjana Tantale, Alja Kozulic-Pirher, Annick Lesne, Jean-Marc Victor, Marie-C eacute cile Robert, Serena Capozi,Racha Chouaib, Volker B auml cker, Julio Mateos- Langerak, Xavier Darzacq, Christophe Zimmer, FlorianMueller, Eugenia Basyuk, and Edouard Bertrand. A single-molecule view of transcription reveals convoys of RNA poly-merases and multi-scale bursting.

Nature Communications ,7:1–14, July 2016.[16] Gaˇsper Tkaˇcik, Thomas Gregor, and William Bialek. Therole of input noise in transcriptional regulation.

PLoS ONE ,3(7):e2774—-11, jul 2008.[17] Sandeep Choubey, Alvaro Sanchez, and Jane Kondev. Ef-fect of Promoter Architecture on Transcription Initiation.

Methods , 7(1):1–8, mar 2011.[18] Brian Munsky, Gregor Neuert, and Alexander van Oude-naarden. Using gene expression noise to understand generegulation.

Science , 336(6078):183–187, April 2012.[19] Daniel L Jones, Robert C Brewster, and Rob Phillips. Pro-moter architecture dictates cell-to-cell variability in geneexpression.

Science , 346(6216):1533–1536, dec 2014.[20] Georg Rieckh and Gaˇsper Tkaˇcik. Noise and informationtransmission in promoters with multiple internal states.

Biophysical Journal , 106(5):1194–1204, mar 2014.[21] Benjamin Zoller, Damien Nicolas, Nacho Molina, and FelixNaef. Structure of silent transcription intervals and noisecharacteristics of mammalian genes.

Molecular Systems Bi-ology , 11(7):823–823, July 2015.[22] Thomas Gregor, Hernan G Garcia, and Shawn C Little.The embryo as a laboratory: quantifying transcription inDrosophila.

Trends in Genetics , 30(8):364–375, August2014.[23] Gary Struhl, Paul Johnston, and Peter A. Lawrence. Con-trol of Drosophila body pattern by the hunchback mor-phogen gradient.

Cell , 69(2):237–249, 1992.[24] JP Gergen, D Coulter, and EF Wieschaus. Segmental pat-tern and blastoderm cell identities.

Gametogenesis and theearly embryo , pages 195–220, 1986.[25] Thomas B. Kornberg and Tetsuya Tabata. Segmentation ofthe Drosophila embryo.

Current Opinion in Genetics andDevelopment , 3(4):585–593, 1993.[26] Johannes Jaeger. The gap gene network.

Cellular andmolecular life sciences : CMLS , 68(2):243–274, January2011.[27] Julien O Dubuis, Reba Samanta, and Thomas Gregor.Accurate measurements of dynamics and reproducibilityin small genetic networks.

Molecular Systems Biology ,9(1):639–639, 2013.[28] Danyang Yu and Stephen Small. Precise Registration ofGene Expression Boundaries by a Repressive Morphogenin Drosophila.

Current Biology , 18(12):868–876, 2008.[29] Julien O Dubuis, Gaˇsper Tkaˇcik, Eric F Wieschaus,Thomas Gregor, and William Bialek. Positional informa-tion, in bits.

Proceedings of the National Academy of Sci-ences , 110(41):16301–16308, oct 2013.[30] Wolfgang Driever, Gudrun Thoma, and ChristianeN¨usslein-Volhard. Determination of spatial domains of zygotic gene expression in the Drosophila embryo by theaﬃnity of binding sites for the bicoid morphogen. Nature ,340(6232):363–367, 1989.[31] Rachel Kraut and Michael Levine. Spatial regulation of thegap gene Giant during Drosophila development.

Develop-ment , 111(2):601–609, feb 1991.[32] Marcia Simpson-Brose, Jessica Treisman, and Claude De-splan. Synergy between the hunchback and bicoid mor-phogens is required for anterior patterning in Drosophila.

Cell , 78(5):855–865, 1994.[33] F Sauer and H Jackle. Heterodimeric Drosophila gapgene protein complexes acting as transcriptional repressors.

EMBO J , 14(19):4773–4780, 1995.[34] Danielle Lebrecht, Marisa Foehr, Eric Smith, Francisco J PLopes, Carlos E Vanario-Alonso, John Reinitz, David SBurz, and Steven D Hanes. Bicoid cooperative DNA bind-ing is critical for embryonic patterning in Drosophila.

Pro-ceedings of the National Academy of Sciences of the UnitedStates of America , 102(37):13176–81, 2005.[35] Michael W Perry, Alistair N Boettiger, and Michael Levine.Multiple enhancers ensure precision of gap gene-expressionpatterns in the Drosophila embryo.

Proceedings of the Na-tional Academy of Sciences , 108(33):13570–13575, August2011.[36] Evgeny Z Kvon, Tomas Kazmar, Gerald Stampfel, J OmarY´a˜nez-Cuna, Michaela Pagani, Katharina Schernhuber,Barry J Dickson, and Alexander Stark. Genome-scalefunctional characterization of Drosophila developmental en-hancers in vivo.

Nature , 512(7512):91–95, July 2014.[37] Eran Segal, Tali Raveh-Sadka, Mark Schroeder, Ulrich Un-nerstall, and Ulrike Gaul. Predicting expression patternsfrom regulatory sequence in Drosophila segmentation.

Na-ture , 451(7178):535–540, January 2008.[38] Johannes Jaeger, Svetlana Surkova, Maxim Blagov, HildeJanssens, David Kosman, Konstantin N Kozlov, Manu,Ekaterina Myasnikova, Carlos E Vanario-Alonso, MariaSamsonova, David H Sharp, and John Reinitz. Dynamiccontrol of positional information in the early Drosophilaembryo.

Nature , 430(6997):368–371, July 2004.[39] Manu, Svetlana Surkova, Alexander V Spirov, Vitaly VGursky, Hilde Janssens, Ah-Ram Kim, Ovidiu Radulescu,Carlos E Vanario-Alonso, David H Sharp, Maria Sam-sonova, and John Reinitz. Canalization of gene expressionin the Drosophila blastoderm by gap gene cross regulation.

PLoS Biology , 7(3):e1000049, March 2009.[40] James Briscoe and Stephen Small. Morphogen rules: designprinciples of gradient-mediated embryo patterning.

Devel-opment , 142(23):3996–4009, December 2015.[41] Kai Yuan, Antony W Shermoen, and Patrick H O’Farrell.Illuminating DNA replication during Drosophila develop-ment using TALE-lights.

Current Biology , 24(4):R144–R145, February 2014.[42] Mark D Schroeder, Michael Pearce, John Fak, HongQingFan, Ulrich Unnerstall, Eldon Emberly, Nikolaus Rajewsky,Eric D Siggia, and Ulrike Gaul. Transcriptional control inthe segmentation gene network of Drosophila.

PLoS Biol-ogy , 2(9):E271, September 2004.[43] Amanda Ochoa-Espinosa, Gozde Yucel, Leah Kaplan,Adam Par´e, Noel Pura, Adam Oberstein, Dmitri Papat-senko, and Stephen Small. The role of binding site clus-ter strength in Bicoid-dependent patterning in Drosophila.

Proceedings of the National Academy of Sciences of theUnited States of America , 102(14):4960–4965, April 2005.[44] M Hoch, C Schr¨oder, E Seifert, and H J¨ackle. cis-actingcontrol elements for Kr¨uppel expression in the Drosophilaembryo.

The EMBO Journal , 9(8):2587–2595, August 1990.[45] Y Jacob, S Sather, J R Martin, and R Ollo. Analysis of Kruppel Control Elements Reveals That Localized Ex-pression Results From the Interaction of Multiple Subele-ments.

Proceedings of the National Academy of Sciences ofthe United States of America , 88(13):5912–5916, July 1991.[46] M J Pankratz, M Busch, M Hoch, E Seifert, andH J¨ackle. Spatial control of the gap gene knirps in theDrosophila embryo by posterior morphogen system.

Sci-ence , 255(5047):986–989, February 1992.[47] Leeat Keren, David Van Dijk, Shira Weingarten-Gabbay,Dan Davidi, Ghil Jona, Adina Weinberger, Ron Milo, andEran Segal. Noise in gene expression is coupled to growthrate.

Genome Research , 25(12):1893–1902, 2015.[48] Gil Hornung, Raz Bar-Ziv, Dalia Rosin, NobuhikoTokuriki, Dan S. Tawﬁk, Moshe Oren, and Naama Barkai.Noise-mean relationship in mutated promoters.

GenomeResearch , 22(12):2409–2417, dec 2012.[49] Alvaro Sanchez, Sandeep Choubey, and Jane Kondev. Reg-ulation of Noise in Gene Expression.

Annual review of bio-physics , 42(March):1–23, 2013.[50] Arren Bar-Even, Johan Paulsson, Narendra Maheshri, MiriCarmi, Erin O’Shea, Yitzhak Pilpel, and Naama Barkai.Noise in protein expression scales with natural proteinabundance.

Nature Genetics , 38(6):636–643, May 2006.[51] Daniel R Larson, Christoph Fritzsch, Liang Sun, XiuhauMeng, David S Lawrence, and Robert H Singer. Directobservation of frequency modulated transcription in singlecells using light activation. eLife , 2:e00750, 2013.[52] Adrien Senecal, Brian Munsky, Florence Proux, NathalieLy, Floriane E Braye, Christophe Zimmer, Florian Mueller,and Xavier Darzacq. Transcription Factors Modulate c-FosTranscriptional Bursts.

CellReports , 8(1):75–83, July 2014.[53] Hernan G Garcia, Mikhail Tikhonov, Albert Lin, andThomas Gregor. Quantitative imaging of transcription inliving Drosophila embryos links polymerase activity to pat-terning.

Current biology : CB , 23(21):2140–2145, Novem-ber 2013.[54] Takashi Fukaya, Bomyi Lim, and Michael Levine. RapidRates of Pol II Elongation in the Drosophila Embryo.

Cur-rent Biology , 27(9):1387–1391, 2017.[55] Siqi Wu, Antony Joseph, Ann S. Hammonds, Susan E.Celniker, Bin Yu, and Erwin Frise. Stability-driven non-negative matrix factorization to interpret spatial gene ex-pression and build local gene networks.

Proceedings of theNational Academy of Sciences , 113(16):4290–4295, 2016.[56] Avigdor Eldar and Michael B Elowitz. Functional rolesfor noise in genetic circuits.

Nature , 467(7312):167–173,September 2010.[57] J. Elf, G.-W. Li, and X. S. Xie. Probing TranscriptionFactor Dynamics at the Single-Molecule Level in a LivingCell.

Science , 316(5828):1191–1194, 2007.[58] T. S. Karpova, M. J. Kim, C. Spriet, K. Nalley, T. J. Stase-vich, Z. Kherrouche, L. Heliot, and J. G. McNally. Concur-rent Fast and Slow Cycling of a Transcriptional Activatorat an Endogenous Promoter.

Science , 319(5862):466–469,2008.[59] J Christof M Gebhardt, David M Suter, Rahul Roy,Ziqing W Zhao, Alec R Chapman, Srinjan Basu, Tom Ma-niatis, and X Sunney Xie. Single-molecule imaging of tran-scription factor binding to DNA in live mammalian cells.

Nat Methods , 10(5):421–426, may 2013.[60] Tatsuya Morisaki, Waltraud G M¨uller, Nicole Golob, Da-vide Mazza, and James G McNally. Single-molecule anal-ysis of transcription factor binding at transcription sites inlive cells.

Nat Commun , 5:4456, 2014.[61] Ignacio Izeddin, Vincent R´ecamier, Lana Bosanac,Ibrahim I. Ciss´e, Lydia Boudarene, Claire Dugast-Darzacq,Florence Proux, Olivier B´enichou, Rapha¨el Voituriez, Olivier Bensaude, Maxime Dahan, and Xavier Darzacq.Single-molecule tracking in live cells reveals distinct target-search strategies of transcription factors in the nucleus. eLife , 2014(3):e02230, jun 2014.[62] Kazunari Kaizu, Wiet De Ronde, Joris Paijmans, KoichiTakahashi, Filipe Tostevin, and Pieter Rein Ten Wolde.The berg-purcell limit revisited.

Biophysical Journal ,106(4):976–985, feb 2014.[63] E. Lecuyer, A. S. Necakov, L. Caceres, and H. M.Krause. High-Resolution Fluorescent In Situ Hybridizationof Drosophila Embryos and Tissues.

Cold Spring HarborProtocols , 2008(6):pdb.prot5019–pdb.prot5019, 2008.[64] Shawn C Little, Gaˇsper Tkaˇcik, Thomas B Kneeland,Eric F Wieschaus, and Thomas Gregor. The Formationof the Bicoid Morphogen Gradient Requires Protein Move-ment from Anteriorly Localized mRNA.

PLoS Biology ,9(3):e1000596–17, March 2011.[65] Sandeep Choubey, Jan´e Kondev, and Alvaro Sanchez. De-ciphering Transcriptional Dynamics In Vivo by CountingNascent RNA Molecules.

PLoS Computational Biology ,11(11):e1004345–21, November 2015.[66] Heng Xu, Samuel O Skinner, Anna Marie Sokac, and IdoGolding. Stochastic Kinetics of Nascent RNA.

PhysicalReview Letters , 117(12):128101–6, September 2016.[67] Alvaro Sanchez and Jan´e Kondev. Transcriptional controlof noise in gene expression.

Proceedings of the NationalAcademy of Sciences , 105(13):5081–5086, April 2008.[68] Ioannis Lestas, Johan Paulsson, Nicholas E Ross, andGlenn Vinnicombe. Noise in Gene Regulatory Networks.

Automatic Control, IEEE Transactions on , 53:189–200,2008.[69] Aleksandra M. Walczak, Andrew Mugler, and Chris H.WIggins. Analytic methods for modeling stochastic reg-ulatory networks.

Methods in molecular biology (Clifton,N.J.) , 880(Chapter 13):273–322, 2012.[70] Brian Munsky and Mustafa Khammash. The ﬁnite stateprojection algorithm for the solution of the chemical masterequation.

The Journal of Chemical Physics , 124(4):044104,January 2006.[71] Robert B Sidje. Expokit: a software package for computingmatrix exponentials.

ACM Transactions on MathematicalSoftware (TOMS) , 24(1):130–156, 1998.[72] Daniel T Gillespie. Exact stochastic simulation of cou-pled chemical reactions.

The journal of physical chemistry ,81(25):2340–2361, 1977.

Supplement: Diverse spatial expression patterns emerge fromcommon transcription bursting kinetics

IV. SUPPLEMENTARY FIGURES

Cyto density [ µ m -3 ] (max region) M ean a c t i v i t y [ c u ] ( m a x r eg i on ) early late Stage ranking DAPI C y t o den s i t y [ µ m - ] ( m a x r eg i on ) hb: R =0.61Kr: R =0.25kni tot: R =0.76 hb w t hb de f K r k n i t o t k n i l a t e k n i ea r l y g t f e m a l e an t g t f e m a l e po s t g t m a l e an t g t m a l e po s t -0.8-0.6-0.4-0.200.20.40.60.81 C o rr e l a t i on ( m a x r eg i on ) hb w t hb de f K r k n i t o t k n i l a t e k n i ea r l y g t f e m a l e an t g t f e m a l e po s t g t m a l e an t g t m a l e po s t -0.4-0.200.20.40.60.81 ∆ µ / µ ( m a x r eg i on ) AC DB

FIG. S1:

Embryos staging. Error bars are given by the 66% conﬁdence intervals. (A) Cytoplasmic mRNA density as a function of embryostage (nc13 interphase) estimated from DAPI staining. (B) Mean activity in the maximally expressed regions as a function of the cytoplasmicmRNA density. Each data point corresponds to a single embryo. (C) Pearson correlation coeﬃcient of the mean activity and the cytoplasmicmRNA density calculated over the population of embryos in the maximal expressed regions. (D) Normalized diﬀerential activity ∆ µ explainedby timing between early and late embryo as estimated from cytoplasmic mRNA density. x/L σ m ea2 / σ % σ mea2 = σ img2 + σ ali2 x/L σ µ / σ % -6 -4 -2 Alignment noise σ ali2 / µ -3 -2 -1 E m b r y o v a r i ab ili t y σ µ / µ R = 0.69 Mean activity µ [cu] Green channel [cu] R ed c hanne l [ c u ] a = 0.97 σ b = 4.59e-02b = 9.32e-03b = 9.23e-04 B Cytoplasmic mRNAalternating colors ’ ’ ACD

35 40 45343638404244 E I m ag i ng no i s e / µ % σ i m g hb wthb defKrkni latekni earlygt femalegt male FIG. S2:

Measurement errors and embryos variability. Error bars in C-F are given by the 66% conﬁdence intervals. (A) Imaging noisemodel calibrated from dual-color FISH using alternating probe conﬁguration. The data (blue circles) corresponds to the activity of singlenuclei measured from 15 hb embryos. In absence of measurement noise and without normalization imprecision, both channels should perfectlycorrelate with slope 1. We characterized the spread along the ﬁtted line (solid line) assuming error on both channel. The dash lines standfor the 1 σ envelope. (B) Imaging noise (CV) as a function of the mean activity for both channels. (C) Fraction of the total variance σ corresponding to variability of the mean activity across embryos σ µ as a function of the AP position. The solid and black dashed lines representthe global mean fraction and the 66% conﬁdence interval. (D) Embryo variability as a function alignment noise. Each data point correspondsto a single AP bin (2.5% egg length). The solid line (slope 1) highlights the correlation between the two quantities at the boundaries while thedash line corresponds to the embryo variability in the maximally expressed regions. (E) Fraction of the total variance σ corresponding to themeasurement noise as a function of the AP position. Measurement noise σ is deﬁned as the combination of imaging σ and alignementnoise σ . The solid and black dashed lines represent the global mean fraction and the 66% conﬁdence interval. Mean activity g/g -0.3-0.2-0.100.10.20.3 F ou r t h c u m u l an t a c t i v i t y κ / g Mean activity g/g -0.15-0.1-0.0500.050.10.150.20.250.3 T h i r d c u m u l an t a c t i v i t y κ / g Mean activity g/g V a r i an c e a c t i v i t y σ g2 / g datahb fitKr fitkni fitgt fitglobal fit A BC

FIG. S3:

Normalized cumulants as a function of normalized mean activity. (A-C) Each data point corresponds to a single AP bin and theerror bars are the 66% conﬁdence intervals. The dash line stands for the Poisson limit. The solid line are 2 nd (A), 3 rd (B) and 4 th (C) orderpolynomial ﬁts respectively. The ﬁt were performed for each genes independently (color lines) and the black line corresponds to the global ﬁt(Fig 2B). hb Kr kni gt1.522.533.544.5 E s t i m a t ed gene l eng t h [ k b ] Kr kni gt202530354045

Lag [ s ] hb alt hb Kr kni gt0.30.40.50.60.70.80.911.11.2 A c t i v i t y r a t i o ( m ean ' / m ean ' ) annotated lengthPol2-ChIP lengthFISH Gene length L [kb] A c t i v i t y r a t i o ( m ean ' / m ean ' ) L = 3.635, r = 0.570

Probe position [kb] C u m u l a t i v e c on t r i bu t i on

5' channel3' channel -1 Noise 5' activity ( σ / µ ) -1 N o i s e ' a c t i v i t y ( σ / µ ) R = 0.96slope: 1.10 Mean 5' activity [cu] M ean ' a c t i v i t y [ c u ] hb slope: 0.57 (0.56,0.57)Kr slope: 0.63 (0.62,0.63)kni slope: 0.56 (0.55,0.56)gt slope: 0.62 (0.62,0.63) AC BDE F

Cytoplasmic mRNAdecorated with 2 colors ’ ’ ’ ’ FIG. S4:

Dual-color FISH signal properties link to probe conﬁguration. (A) Schematic of the dual color mRNA-FISH technique. Twoindependent probe sets hybridized to diﬀerent ﬂuorophore are designed to target the 5’ (green) and 3’ region (red) of a transcript of interest.Both channels are physically correlated and provide control over lingering transcripts by quantifying deviation from the expected green-redratio.(B) Mean 3’ versus 5’ activity. Each data point correspond to a single AP bin. The slope of the diﬀerent genes depend on the exact probeconﬁguration. (C) 3’ versus 5’ noise (CV). (D) Mean activity ratio 3’ over 5’ as a function of gene length. The probe conﬁguration of hb wasused as an example. Assuming elongation to occur at constant speed and instantaneous release of transcripts, the ratio is fully determined bythe probes’ location and the gene length (transcribed region). The activity ratio (blue line) as function of distance results from the ratio ofthe integrals of the cumulative probe contribution (inset). (E) Activity ratio for each gene. The circles stand for the measured ratio with errorbars (2 standard errors and 2 standard deviations respectively) obtained from the propagation of the normalization errors in both channels forall embryos. The crosses correspond to the predicted ratio based on the annotated gene length. The squares are derived from Pol2 occupancydata (Pol2-ChIP). For Kr , kni and gt , Pol2 signal is found a few hundreds bp away from the annotated length suggesting extra processingrelated to termination. (F) Eﬀective gene length for each gene as determined from the activity ratio. Symbols and error bars similar than inpanel E. Assuming an elongation speed of 1.5kb/min, the diﬀerence between the eﬀective and annotated gene length can be translated in time(inset). The lag or extra residence time of transcripts at the loci is at most 35 seconds.

28 31 33 36 39 41 44 46 49 51 x/L % N u l c e i a c t i v i t y [ c u ] Mean 5' activityMean 3' activity

5' activity [cu] ' a c t i v i t y [ c u ] AP b i n ( x / L % ) r = 0.57Mean in AP bin x/L = 51.5%

5' activity [cu] x/L = 48.9%

5' activity [cu] x/L = 46.3%

5' activity [cu] x/L = 43.7%

5' activity [cu] x/L = 41.2%

5' activity [cu] ' a c t i v i t y [ c u ] x/L = 51.5%

5' activity [cu] x/L = 48.9%

5' activity [cu] x/L = 46.3%

5' activity [cu] x/L = 43.7%

5' activity [cu] x/L = 41.2%

5' activity [cu] ' a c t i v i t y [ c u ] A BDE C

5' activity [cu] ' a c t i v i t y [ c u ] FIG. S5:

Dual color FISH enables inference of transcription dynamics. (A) The combination of both 5’ (green) and 3’ (red) readouts constrainsthe possible conﬁgurations of nascent transcript locations and numbers at a given transcription site. This approach oﬀers deeper insight intotranscription dynamics compared to conventional single channel mRNA-FISH since both channels are physically correlated. (B) Transcriptionalactivity proﬁle for a typical gap gene ( hb ) as a function of AP position for both 5’ and 3’ channels. A single dot corresponds to the totalintensity of nascent transcripts in cytoplasmic unit. As in Figure 1B, 18 embryos have been aligned and overlaid. The vertical dash linesdeﬁne AP bins covering 2.5% of egg length; crosses stand for the mean activity across embryo in each bin with error bars corresponding to66% conﬁdence intervals. (C) Dual color measurement space represented as 3’ against 5’ activity. The solid black line delineates the borderof possible measurements given the probe sets conﬁguration, gene length ( hb hb empirical distributions of 3’ versus 5’ activity and thecolor code represents diﬀerent AP bin; the black crosses correspond to the mean of the distributions for each AP bin and lie on the dash line.For all AP bins, the measurements are enclosed by the envelope of maximal Pol2 density (black line). (D) Measured distribution of 3’ versus5’ activity across AP position for hb . The distribution were constructed based on the 2.5% AP bin deﬁned in Figure 3B. The dash blackline represents to the expected ratio of 3’ versus 5’ activity (r=0.57); the black circle corresponds to the mean of the distribution and lies onthe dash line. These distributions are used as input in our inference framework enabling precise inference of the underlying transcriptionalkinetics. (E) Best ﬁtting distribution of 3’ versus 5’ activity as determined from the empiric distribution in Figure 3D. Of note, the displayeddistributions are devoid of measurement noise and represent the theoretical output of the 2-state model given the probe sets conﬁguration andthe eﬀective elongation time. Thus, the likelihood of data is essentially the convolution the activity distribution calculated from the 2-statemodel with the noise measurement distribution that is maximized to determined the kinetic rates k ini , k on and k oﬀ . Input activity / g -2 -1 C o rr e l a t i on t i m e τ n [ m i n ] Input activity / g -3 -2 -1 k on [ m i n - ] Input activity / g k i n i [ m i n - ] Input activity / g -3 -2 -1 k o ff [ m i n - ] Input activity / g M ean o cc upan cy < n > Input activity / g N o i s e σ / µ

02 g

AC BD / g -3 -2 -1 σ / µ

02 g

E F

FIG. S6:

Validation of the inference framework for dual-color FISH. We simulated synthetic 3’ and 5’ nuclei activity data based on four genecopies modeled by a 2-state model, using the probe conﬁguration for hb with measurement noise. We tested four diﬀerent modulation of themean input activity µ in the data: 1) initiation rate k ini alone (cyan), 2) on-rate k on alone (green), 3) oﬀ-rate k on alone (blue) and modulationof the mean occupancy (cid:104) n (cid:105) at constant switching correlation time τ n (red). (A-F) The colored crosses stand for the inferred parameters as afunction of the input activity. Error bars correspond to the 10 and 90 th percentiles of the posterior distribution. The dash lines represents theinput (true) parameters used to simulate the data. -1 -0.5 0 0.5 1 Modeled κ [cu ] × -3-2-10123 D a t a κ [ c u ] × R = 0.587a = 1.26 -500 0 500 1000 Modeled κ [cu ] -1500-1000-500050010001500200025003000 D a t a κ [ c u ] R = 0.797a = 1.14 Modeled σ [cu ] D a t a σ [ c u ] R = 0.994a = 1.07 Modeled µ [cu] D a t a µ [ c u ] R = 1a = 0.982 AC BD

FIG. S7:

Four ﬁrst cumulants of data (unormalized, in cytoplasmic units) as a function of the ones predicted by the two state-model withbest ﬁtting parameters for multiple gene copies ( N g = 2 or 4). Each data point corresponds to a single AP-bin and the color code stand forthe diﬀerent genes. τ n [min] (k elo = 1.5 kb/min) τ n [ m i n ] ( k e l o = . k b / m i n ) R = 1.00a = 0.60 0 0.2 0.4 0.6 0.8 1 (k elo = 1.5 kb/min) < n > ( k e l o = . k b / m i n ) R = 1.00a = 1.000 2 4 6 8 10 12 k ini [min -1 ] (k elo = 1.5 kb/min) k i n i [ m i n - ] ( k e l o = . k b / m i n ) R = 1.00a = 1.67 A BC

FIG. S8:

Comparison of the inferred transcriptional parameters under diﬀerent elongation speed k elo . Both k ini and τ n are rescaled while (cid:104) n (cid:105) remains the same. V. QUANTIFICATION AND MEASUREMENT ERRORA. Embryo staging

In order to assess the timing of the diﬀerent embryos, we ﬁrst manually ranked the diﬀerent embryos based on timingestimation from DAPI staining. We estimated the interphase stage relying on morphological features of the nuclei (shapeand textures) in the DAPI channel. We then veriﬁed whether accumulation of cytoplasmic mRNAs correlates with ourmanual ranking (Fig. S1A) . Both approaches lead to similar results and provide a decent proxy for timing. By comparingthe average activity of the diﬀerent embryos in the maximally expressed regions with the cytoplasmic density, we assessedthe eﬀect of timing on the mean activity (Fig. S1B). We estimated the Pearson correlation coeﬃcient ρ for the diﬀerentgenes and regions ( gt anterior and posterior regions). Overall, timing explain up to 50% of the embryo variability in themaximally expressed regions (Fig. S1C), with the exception of kni that is highly correlated ρ ∼ .

8. We thus separatedthe kni embryos in two categories, early and late embryos. We also calculated the diﬀerence ∆ µ in mean activity explainedby timing between late and early embryos (Fig. S1D), on average the diﬀerence is of the order of 5 cu. Although timingaﬀects the activity, the eﬀect is small enough to still warrant the assumption of steady-state. B. Imaging noise model

We quantiﬁed measurements noise due to imaging and calibration using a two-color smFISH approach, labeling eachmRNA in alternating colors along the length of the mRNA. We included 15 h b embryos in the analysis, which correspondsto approximately 4’000 nuclei activity measurements. We then normalized the activity (ﬂuorescence signal) of the nuclei incytoplasmic units independently in each channel. In absence of noise and provided accurate normalization, both channelswould perfectly correlate with slope one. By plotting one channel against the other (Fig. S2A), we assessed the slope andcharacterized the spread of the data along the expected line.We build a simple eﬀective model to describe measurement noise: P ( S (5) , S (3) | G (5) , G (3) ) = N (cid:16) S (5) | µ = G (5) , σ ( G (5) ) (cid:17) · N (cid:16) S (3) | µ = G (3) , σ ( G (3) ) (cid:17) (S1)where S stands for the ﬂuorescent signal in cytoplasmic units and G the total nascent transcripts (in C.U.) in absenceof noise. We assumed that the measurement errors were normally distributed and independent in both channels, whichwas motivated by the absence of correlation in the background. We further assumed that the variance would depend onactivities, consistent with the increasing spread observed in the data. In order to estimate the variance speciﬁc to eachchannel, we ﬁtted a straight line y = ax + h assuming error on both x ≡ S (5) and y ≡ S (3) . We expanded the variance asa function of the scalar projection along the line v : σ ( v ) = σ b + b v + b v + ... (S2) v = x + ay √ a (S3)Assuming the same error along x and y , we then maximized the following likelihood to estimate the parameters θ = { a, h, σ b , b , b , ... } : P ( { x i , y i }| θ ) = N d (cid:89) i =1 (cid:112) (2 πσ ( v i )) exp (cid:18) − ( y i − ax i − h ) a ) σ ( v i ) (cid:19) (S4)Using the Akaike information criterion, we selected the best model which was parametrized by ( a, σ b , b , b ) with h = 0(Fig. S2A). The variances in the noise measurement model (Eq. S1) are then given by: σ ( G ) = σ ( v = (cid:112) G + ( aG ) ) (S5) σ ( G ) = σ ( v = (cid:112) G + ( G/a ) ) (S6)where σ ( v ) = σ b + b v + b v . The resulting imaging noise is shown in Figure S2B. C. Alignment and embryo variability

The Anterior-Posterior axis (AP) was determined based on a midsagittal elliptic mask of the embryo in the DAPIchannel [8]. Position is obtained by registration of high- and low-magniﬁcation DAPI images of the surface. We thenﬁtted constrained splines to approximate the mean activity as a function of the AP position. We used diﬀerent features0of the mean proﬁles such as maxima and inﬂection points to reﬁne the alignment between the diﬀerent embryos. Overall,this realignment procedure enables us to estimate an alignment error of the order of 2% egg length.After alignment, we deﬁned spatial bins along the AP-axis with a width of 2.5% of egg length. Such a width was agood compromise to balance the sampling and binning error. We next sought to decompose the measured total activityvariance σ into diﬀerent components related to imaging, alignement, embryo and nuclei variability (Fig. 2E-F). We ﬁrstestimated the variability of the mean across embryos σ µ in each bin (Fig. S2C); we split the total variance σ in each binaccording to the law of total variance: σ = 1 N e N e (cid:88) i =1 σ i (cid:124) (cid:123)(cid:122) (cid:125) (cid:104) σ i (cid:105) + 1 N e N e (cid:88) i =1 ( µ i − µ ) (cid:124) (cid:123)(cid:122) (cid:125) σ µ (S7)where N e is the total number of embryos and µ the global mean.Next we aimed to determine what fraction of σ µ is explained by residual misalignment. Assuming that all the variabilityin the mean at boundaries results from spatial mis-alignment of the diﬀerent embryos, one can ﬁnd an upper bound onthe residual alignment error σ x : σ µ ≥ σ = (cid:18) ∂µ∂x (cid:19) σ x (S8)where µ is the global mean proﬁle as function of AP position x . For each gene, we estimated the residual alignment error σ x required to explain as much embryo variability as possible (Fig. S2D solid line). Overall we found σ x of the order of1% egg length. The total embryo variability in the maximally expressed regions cannot be explained by misalignment as ∂µ∂x ≈ σ µ = σ + σ where σ is the residual embryo to embryo variability.Finally, we assessed what fraction of the total variance σ corresponds to combined measurement noise σ = σ + σ where σ was estimated in subsection V B. Total measurement noise σ remains below 25% of the total variance for allgenes and all position (Fig. S2E), and on average reaches 9 . ± . σ = σ + σ where σ is the nuclei variability and was deﬁned as: σ = σ − σ − σ − σ (S9)Overall, the nuclei variability σ largely dominates in our data and represents 84% of the total variance on average (Fig1E-F). VI. TELEGRAPH MODEL OF TRANSCRIPTIONAL ACTIVITYA. Temporal evolution of nascent transcripts

We modeled the transcriptional activity of a single gene copy as a telegraph process (on-oﬀ promoter switch) withtranscript initiation occurring as a Poisson process during the ‘on’ periods. Elongation of a single transcript was assumedto be deterministic, with ﬁxed speed k elo kb/min. Thus, the elongation time is τ e = L/k elo where L is the length of thegene in kb. The key parameters in the model are: the mRNA initiation rate k ini , the mRNA elongation time τ e , theon-rate k on and the oﬀ-rate k oﬀ .The master equation that governs the temporal evolution of nascent transcripts at individual transcription sites is givenby ddt P t ( g, n ) = k ini δ n, P t ( g − , n ) − k ini δ n, P t ( g, n )+ k n P t ( g, n − − k n +1 P t ( g, n ) (S10)with g the number of nascent transcripts (or alternatively the number of Pol II) on the gene and n the promoter state.Here we used the convention that n = 1 and n = 0 correspond to the ‘on’ state and ‘oﬀ’ state respectively. In addition,the following periodic conditions n = − ≡ n = 2 ≡ t − τ e , t ] contributes to the signal at time t , i.e. the elongation time τ e determined the “memory” of the system.This is correct as long as the release events are instantaneous and termination is fast compared to elongation. Thus, thedynamics of nascent transcripts accumulation on the gene for t ≤ τ e is obtained by solving the master equation with zeroinitial transcript on the gene P t ( g ) = δ g, and an arbitrary initial distribution of promoter state.1 B. Summary statistics

We can derive the temporal evolutions of the central moments from the master equation (Eq. S10) [67, 68]. The meansof nascent transcripts g and promoter states n satisfy the following equation:  ddt (cid:104) g ( t ) (cid:105) = k ini (cid:104) n ( t ) (cid:105) ddt (cid:104) n ( t ) (cid:105) = k on − ( k on + k oﬀ ) (cid:104) n ( t ) (cid:105) (S11)At steady-state ( ddt (cid:104) n (cid:105) = 0), the mean occupancy of the promoter is simply given by (cid:104) n (cid:105) = k on / ( k on + k oﬀ ). Similarly,the covariances satisfy the following set of equations:  ddt σ g ( t ) = 2 k ini σ gn ( t ) + k ini (cid:104) n ( t ) (cid:105) ddt σ gn ( t ) = k ini σ n ( t ) − ( k on + k oﬀ ) σ gn ( t ) ddt σ n ( t ) = − k on + k oﬀ ) σ n ( t ) + k on (1 − (cid:104) n ( t ) (cid:105) ) + k oﬀ (cid:104) n ( t ) (cid:105) (S12)Assuming zero initial transcripts and promoter at steady-state, one can solve both the mean and variance for g . Thus,the initial conditions are given by (cid:104) g ( t ) (cid:105) = 0, (cid:104) n ( t ) (cid:105) = k on / ( k on + k oﬀ ), σ g ( t ) = 0, σ gn ( t ) = 0 and σ n ( t ) = (cid:104) n ( t ) (cid:105) (1 − (cid:104) n ( t ) (cid:105) ). Solving these equations for the elongation time t = τ e leads to: (cid:104) g ( τ e ) (cid:105) = g (cid:104) n (cid:105) (S13) σ g ( τ e ) = g (cid:104) n (cid:105) + g (cid:104) n (cid:105) (1 − (cid:104) n (cid:105) )Φ( τ e /τ n ) (S14)where g = k ini τ e is the maximal mean nascent transcript number and Φ ∈ [0 ,

1] a noise ﬁltering function that takes intoaccount the ﬂuctuation correlation times. Here, the relevant time scales are the elongation time τ e and the promoterswitching correlation time τ n = ( k on + k oﬀ ) − . The variance σ g results from the sum of two contributions; the poissonvariance g (cid:104) n (cid:105) stemming from the stochastic initiation of transcript and the propagation of switching noise: (cid:18) d (cid:104) g (cid:105) d (cid:104) n (cid:105) (cid:19) σ n Φ( τ e /τ n ) = g (cid:104) n (cid:105) (1 − (cid:104) n (cid:105) ) (cid:124) (cid:123)(cid:122) (cid:125) binomial variance Φ( τ e /τ n ) (S15)For deterministic elongation, we ﬁnd that the noise ﬁltering function is given by:Φ( x ) = 2 exp ( − x ) + x − x (S16)In the limit of fast and slow promoter switching respectively, the noise ﬁltering function reduces to τ e (cid:29) τ n lim x →∞ Φ( x ) = 0 (S17) τ e (cid:28) τ n lim x → Φ( x ) = 1 (S18)Thus, the noise is minimal in the fast switching regime τ e (cid:29) τ n and reaches the Poisson limit σ g = g (cid:104) n (cid:105) . While in theslow switching regime τ e (cid:28) τ n , none of the switching noise is ﬁltered and the variance is described by a second orderpolynomial of the mean occupancy (cid:104) n (cid:105) , i.e. σ g = g (cid:104) n (cid:105) + g (cid:104) n (cid:105) (1 − (cid:104) n (cid:105) ). Of note, for exponentially distributed life-timeof transcripts, such as cytoplasmic mRNA subject to degradation, the results above remain valid except that the noiseaveraging function becomes Φ( x ) = (1 + x ) − with τ e the average life-time of the transcripts.Following a similar approach as in the previous paragraph, higher order moments and cumulants are analyticallycalculated from the master equations (S10). The cumulants up to order 3 are equal to the central moments while higherorder cumulants can be expressed as linear combination of central moments. The 4 th cumulant is given by κ = µ − µ ,where µ is the 4 th central moment and µ the variance. Assuming promoter at steady-state, we solved the equations for3 rd and 4 th moments of g and derive the following analytical expressions for 3 rd and 4 th cumulants, κ and κ : κ = g (cid:104) n (cid:105) + 3 g (cid:104) n (cid:105) (1 − (cid:104) n (cid:105) )Φ ( τ e /τ n )+ g (cid:104) n (cid:105) (1 − (cid:104) n (cid:105) )(1 − (cid:104) n (cid:105) )Φ ( τ e /τ n ) (S19) κ = g (cid:104) n (cid:105) + 7 g (cid:104) n (cid:105) (1 − (cid:104) n (cid:105) )Φ ( τ e /τ n )+ 6 g (cid:104) n (cid:105) (1 − (cid:104) n (cid:105) )(1 − (cid:104) n (cid:105) )Φ ( τ e /τ n )+ g (cid:104) n (cid:105) (1 − (cid:104) n (cid:105) )(Φ ( τ e /τ n ) − (cid:104) n (cid:105) (1 − (cid:104) n (cid:105) )Φ ( τ e /τ n )) (S20)2where Φ ,Φ ,Φ and Φ are noise ﬁltering functions that vanish in the fast switching regime ( τ e (cid:29) τ n ) and tend to onein the slow switching regime ( τ e (cid:28) τ n ):Φ ( x ) = 2 exp ( − x ) + x − x Φ ( x ) = 6 x exp ( − x ) + 2 exp ( − x ) + x − x Φ ( x ) = 12 x exp ( − x ) + 4 x exp ( − x ) + 6 exp ( − x ) + 2 x − x Φ ( x ) = 2 exp ( − x ) + 4 x exp ( − x ) + 20 x exp ( − x ) + 28 exp ( − x ) + 10 x − x (S21)The above expressions for the cumulants were tested numerically and are exact. The cumulants are polynomials of themean promoter activity (cid:104) n (cid:105) , which follows from the propagation of the binomial cumulants from the switching process.Since the cumulants are extensive, the cumulants for N g independent gene copies are obtained by multiplying by N g theexpression for a single gene copy (Eq. S19 and S20). VII. TRANSCRIPTIONAL PARAMETERS MODULATION FROM FISHA. Noise mean-relationship in the data

In the manuscript, we investigated the noise-mean relationship of the transcriptional activity (Fig. 2A). From equations(S14), we ﬁnd that the noise (squared coeﬃcient of variation CV ) is given by: σ g (cid:104) g (cid:105) = 1 g (cid:104) n (cid:105) + 1 − (cid:104) n (cid:105)(cid:104) n (cid:105) Φ( τ e /τ n ) (S22)In practice, we measure transcriptional activity in cytoplasmic units (intensity in equivalent number of fully elongatedtranscripts) and not in Pol II counts g directly. The measured mean activity µ in cytoplasmic units is proportional to (cid:104) g (cid:105) , i.e. µ = C N g (cid:104) g (cid:105) where C ∈ [0 ,

1] is a correction factor accounting for the FISH probe locations on the gene and N g the number of gene copies (for most genes N g = 4, except for gt male and hb deﬁcient that only have 2 copies).Assuming independence of transcription sites, the measured variance σ follows a similar relationship, i.e. σ = C N g σ g with C ∈ [0 , C and C are estimated in the next subsection. The measured noise is then givenby: σ µ = C C (cid:20) µ + 1 C N g µ − µµ Φ( τ e /τ n ) (cid:21) (S23)where µ = C N g g is the maximal possible expression level in cytoplasmic units. In practice, C /C ≈ C ≈ . k ini alone would lead to a noise ﬂoor at high expression,since the ﬁrst term corresponding to Poisson noise is inversely proportional to k ini and thus would tend to zero forlarge k ini , whereas the second term does not depend on k ini . This behavior is inconsistent with the data and can beruled out (Fig. 2A). Despite the fact that the C factors and τ e are slightly diﬀerent for each gene, we found that theabove noise-mean relationship (Eq. S23) captures the overall trend in the data well (Fig. 2A). Interestingly, the derivednoise-mean relationship ﬁts the data well when Φ is assumed constant (Fig. 2A black line, 2 ﬁtting parameters: µ andΦ /N g ), suggesting that the switching correlation time τ n does not change much across the range of expression. Althoughboth gt male and hb deﬁcient follow a similar trend, they deviate from the black line. This is mainly explained by thereduction in gene copies ( N g = 2 copies instead of 4) that alters the switching noise term (second term) by a factor twoand approximately reduces µ by half. B. Probe conﬁguration and correction factors

As we have seen above, the moments of the transcriptional activity in cytoplasmic units are related to the moments innumber of nascent transcripts or Pol II counts on the gene by correction factors C . Knowing the exact location of theﬂuorescent probe binding regions along the gene, one can easily calculate the contribution of a single nascent transcriptto the signal in C.U. as a function its length l : s ( l ) = 1 N N (cid:88) i =1 H ( l − l i ) = 1 N b ( l ) (S24)3where H is the unit step function, l i the end position of the i th probe binding region and N the total number of probes.Here, we made the assumption that each ﬂuorescent probe contributes equally to the signal. The number of probes boundto a transcript of length l is given by b ( l ) and will be denoted b i for l ∈ ( l i , l i +1 ] with l N +1 = L the length of a fullyelongated transcript. The total ﬂuorescent signal s in cytoplasmic units for g transcripts is given by s = 1 N N (cid:88) i =1 b i g i (S25)where g = (cid:80) Ni =1 g i , with g i the number of transcripts whose length l belongs to the length interval ( l i , l i +1 ]. Assumingthat g i follows a Poisson distribution with parameter λ i = k ini τ i where τ i = ( l i +1 − l i ) /k elo , the mean ﬂuorescent signal (cid:104) s (cid:105) is then given by (cid:104) s (cid:105) = 1 N N (cid:88) i =1 b i (cid:104) g i (cid:105) = 1 N N (cid:88) i =1 b i k ini τ i = (cid:32) N N (cid:88) i =1 b i ˜ τ i (cid:33)(cid:124) (cid:123)(cid:122) (cid:125) C k ini τ e = C (cid:104) g (cid:105) (S26)where ˜ τ i = τ i /τ e = ( l i +1 − l i ) /L and C the correction factor that relates the mean number of transcripts (cid:104) g (cid:105) to the meanﬂuorescent signal (cid:104) s (cid:105) in cytoplasmic units. This relation remains valid for the 2-state model with (cid:104) g (cid:105) = k ini τ e (cid:104) n (cid:105) (Eq.S13).As for the mean, one can calculate the correction factors for the higher moments and cumulants assuming a Poissonbackground. The second moment is given by (cid:10) s (cid:11) = 1 N (cid:42)(cid:88) ij b i b j g i g j (cid:43) = 1 N (cid:88) i (cid:54) = j b i b j (cid:104) g i (cid:105) (cid:104) g j (cid:105) + (cid:88) i b i (cid:10) g i (cid:11) (S27)= 1 N (cid:88) i (cid:54) = j b i b j k τ i τ j + (cid:88) i b i ( k τ i + k ini τ i )  (S28)= 1 N (cid:88) ij b i b j k τ i τ j + (cid:88) i b i k ini τ i  (S29)where (cid:104) g i g j (cid:105) = (cid:104) g i (cid:105) (cid:104) g j (cid:105) since initiation events are assumed independent. This only holds for the Poisson background andis no longer exact for the 2-state model as the switching process would introduce correlations. Nevertheless, the correctionfactors for the higher moments and cumulants calculated below remain a good approximation under the 2-state model,provided most probes are located in the 5’ region. The variance of the signal is ﬁnally given by (cid:10) ( δs ) (cid:11) = (cid:10) s (cid:11) − (cid:104) s (cid:105) = 1 N N (cid:88) i =1 b i k ini τ i = (cid:32) N N (cid:88) i =1 b i ˜ τ i (cid:33)(cid:124) (cid:123)(cid:122) (cid:125) C (cid:104) g (cid:105) = C (cid:104) g (cid:105) (S30)The calculation above can be generalized to the 3 rd and 4 th cumulants. We found the following correction factor for thePoisson background: C k = 1 N k N (cid:88) i =1 b ki ˜ τ i for k = 1 , ..., C k for each gene and two diﬀerent conﬁgurations of probe locations (5’ or 3’ region) are given inTable I. C. Cumulants for a single gene copy

In the manuscript, we calculated the 2 nd , 3 rd and 4 th cumulants from the data (Fig. 2B-D) and compared them withthe ones predicted by the 2-state model under diﬀerent kinetic modulation schemes (Fig. 2G-I). For independent randomvariables, the cumulants have the property to be extensive, which is convenient as the measured transcriptional activitiesresult from the sum of 2 to 4 gene copies.4 Gene k C k

5’ location C k

3’ location hb Kr kni gt k th cumulants from cytoplasmic units to Pol II counts or number of transcriptsfor each gene and two diﬀerent conﬁgurations of probe locations. We ﬁrst converted the k th cumulants ˜ κ k computed from the data in cytoplasmic units to Pol II counts (or number ofnascent transcripts) for a single gene copy with a normalized gene length: κ k = 1 C k N g (cid:18) (cid:104) L (cid:105) L (cid:19) k ˜ κ k (S32)where κ k is the k th cumulant in Pol II counts for a single gene copy, L the gene length, N g the number of gene copies (4for most genes, except gt male and hb deﬁcient that only have 2 copies) and C k a correction factor for the k th cumulantto ensure proper normalization of the Poisson background (Eq. S31 and Table I). The annotated gene length L variesbetween 1.8 to 3.6 kb for the “gap” genes. In the following we used an eﬀective gene length that is slightly larger andtakes into account the possible lingering of fully elongated transcripts at the loci (Table II). This eﬀective gene length canbe estimated from the dual color FISH data (cf. Section VIII A).We then ﬁtted a second order polynomial of the mean activity (cid:104) g (cid:105) to the variance σ g (Fig. 2B) in order to estimate themaximal activity g , which was deﬁned as the crossing point between the Poisson background (Fig. 2B dash line) andthe ﬁtted variance (solid line). Similarly, we ﬁtted 3 rd and 4 th order polynomial of the mean activity to the cumulants κ and κ (Fig. 2C-D), constrained to reach the Poisson limit at g . Of note, the cumulants of the Poisson distribution areall equal to the mean. As can be seen in Figure 2B-D, the polynomial ﬁts (solid lines) capture the main trend observedin the data, suggesting a simple relationship between the cumulants and the mean. It follows that the underlying activitydistribution is essentially a universal single parameter distribution whose parameter is the mean activity. To test theextent of the universality, we repeated the analysis above of each gap gene individually (Fig. S3A-C). The individual ﬁts(colored solid lines) remain relatively close to each other. Although the ﬁts for hb slightly deviate from the other genes,the global shape of the cumulants is conserved.We then tested whether the 2-state model could qualitatively explain the universal trend observed in the cumulants.We investigated various single transcriptional parameter modulations, assuming a ﬁxed elongation time τ e = (cid:104) L (cid:105) /k elo (Fig. 2F-I). Consistent with the initial observation based on the noise-mean relationship (Fig. 2A), modulation of themean occupancy (cid:104) n (cid:105) with constant switching correlation time τ n leads to good qualitative agreement with the data (Fig.2F-I), particularly when τ n ≥ τ e (red line). Notably, modulation of k on (cyan line) or k oﬀ (green line) alone do not capturethe shape of the cumulants very well. VIII. TRANSCRIPTIONAL PARAMETERS FROM DUAL-COLOR FISHA. Dual-color FISH and eﬀective gene length

In the manuscript, we aimed to learn the kinetic parameters of the 2-state from the FISH data for each gene alongthe AP-axis. In order to constrain the possible set of kinetic parameters that could generate the observed activity levels,we performed dual-color smFISH tagging the 5’ and 3’ region of the transcripts with diﬀerent probe sets (Fig. S4A).5The mean (Fig. S4B) and the noise (CV, Fig S4C) of the 5’ and 3’ channel strongly correlates as expected from theexperimental design. Thus, the two channels oﬀer a consistent readout of the variability.For each gene, given the 5’ and 3’ FISH probe conﬁgurations, we calculated the expected ratio of 3’ over 5’ signal r = C (3)1 /C (5)1 according to Eq. S26 (Fig. S4D). The predicted ratio is overall smaller than the measured one (Fig. S4E).It suggests that nascent transcripts might be retained at transcription sites for a short duration. We then calculatedfor each gene, the eﬀective length that would be consistent with the measured ratio (Fig. S4F, Table II). Assuming anelongation rate k elo = 1 . Gene Annotated length [bp] Pol II ChIP [bp] Eﬀective length [bp] hb Kr kni gt B. Distribution of nascent transcripts

To learn transcriptional parameters, we designed an inference framework based on the 2-state model. Two key distri-butions in this framework are the steady-state distribution of nascent transcripts (or Pol II number) on the gene and thepropagator that describes the temporal evolution of an arbitrary distribution of nascent transcripts. Both distributionscan be derived from the master equation (S10). Although the master equation can be solved using generating func-tions [66, 69], we followed another route that can be easily extended to multi-state system and remains computationallytractable. The master equation can be written in terms of a Lagrangian operator A containing the propensity functionsof the diﬀerent reactions: ddt P t ( g, n ) = AP t ( g, n ) (S33) a. Single gene copy After appropriate truncation on the transcript number (setting an upper bound for the maximumnumber of nascent transcripts) [70], the A operator can be written in terms of a sum of tensor products of diﬀerent matrices: A = I G ⊗ N + K G ⊗ R (S34)with I G standing for the identity matrix of size G + 1 where G is the maximum number of transcripts after truncation.The matrix N encodes the rates of the possible transitions for the 2-state promoter and is given as follows: N = (cid:32) − k on k oﬀ k on − k oﬀ (cid:33) (S35)while K G describes the initiation of transcripts K G =  − k ini . . . . . . k ini − k ini . . . . . . ...0 k ini . . . . . . ...... . . . . . . − k ini . . . k ini − k ini  (S36)and R G indicates in which promoter state initiation occurs, R G ( n, n (cid:48) ) = 1 if n = n (cid:48) = 1 and zero otherwise. Thepropagator of the resulting ﬁnite system (Eq. S33) can be expressed as a matrix exponential of the A operator: P t ( g, n | g (cid:48) , n (cid:48) ) = exp ( At ) (S37)6The distribution of nascent transcripts P ( g ) for a gene of length L is calculated using the propagator above with t = τ e ≡ L/k elo and the initial conditions. With initially zero nascent transcript, P ( g ) is then given by: P ( g ) = (cid:88) n,g (cid:48) ,n (cid:48) P τ e ( g, n | g (cid:48) , n (cid:48) ) δ g (cid:48) , P ( n (cid:48) ) (S38)where P ( n ) speciﬁes the initial distribution of promoter state. The distribution P ( g ) can be computed eﬃciently bydirectly estimating the action of the initial vector on the matrix exponential [71]. Assuming the promoter at steady-state, P ( n ) is then given by: P ( n ) = (cid:26) (cid:104) n (cid:105) for n = 11 − (cid:104) n (cid:105) for n = 0 (S39)with (cid:104) n (cid:105) = k on / ( k on + k oﬀ ). Off=0 On=1 k off k on k ini on k off off k on k ini ini A

2x Off=0 On=1 k off k on k ini B on k off off on off on off k on k ini ini ini ini <=><=> FIG. SN1: Equivalent representations of the superposition of two (A) and four (B) independent and identical telegraph processes. b. Multiple independent gene copies

Provided each gene copy is independent and undistinguishable, the combinationof two and four gene copies can be represented by a three- and ﬁve-state promoter model (Fig. SN1). The corresponding N and R matrices are given by: N =  − k on k oﬀ k on − ( k oﬀ + k on ) 2 k oﬀ k on − k oﬀ  (S40) R =   (S41) N =  − k on k oﬀ k on − ( k oﬀ + 3 k on ) 2 k oﬀ k on − k oﬀ + k on ) 3 k oﬀ

00 0 2 k on − (3 k oﬀ + k on ) 4 k oﬀ k on − k oﬀ  (S42) R =   (S43)7The distribution of nascent transcripts is calculated according to Eq. (S38), with the propagator P t ( g, n | g (cid:48) n (cid:48) ) computedfrom the updated A operator (Eq. S34 & S37). The steady-state distribution of the N -gene copy system is given by: P ( n ) = (cid:32) Nn (cid:33) (cid:104) n (cid:105) n (1 − (cid:104) n (cid:105) ) N − n with n ∈ { , , ..., N } (S44)where (cid:104) n (cid:105) = k on / ( k on + k oﬀ ) is the steady-state occupancy of a single promoter. C. Joint distribution of 5’ and 3’ activity

In the following subsection, we lay out the approach used to calculate the joint distribution of 5’ and 3’ activity for anarbitrary conﬁguration of 5’ and 3’ FISH probes. Analytic solutions for steady-state distributions with idealistic singlecolor probe conﬁguration exist [66], but solutions for arbitrary probe conﬁgurations and multi-color FISH are cumbersome.Here, the computational approach is general enough and can be applied to a large class of transcription model, at or outof steady-state (transient relaxation), provided the elongation process is assumed deterministic.The measured 5’ and 3’ transcriptional activities result from partially elongated nascent transcripts. Each ﬂuorescentprobe is assumed to be instantaneously bound and to contribute equally to the total ﬂuorescence. Thus, the ﬂuorescentsignal of each nascent transcript is proportional to the number of probe binding regions that have been transcribed. Inorder to calculate the joint distribution, one needs to proceed backward in time. Starting from the 3’ end up to the 5’ endof the gene, we accumulate the contribution of nascent transcripts that could have been initiated in the interval separatingtwo successive probe regions. Since we assumed elongation to occur at constant speed, the distance interval between twosuccessive probe regions can be converted into a time interval. Doing so for each distance interval leads to the followingtemporal hierarchy (Fig. SN2). For instance, only transcripts initiated during the time interval [ − τ e , − τ e + t (3)1 ] fullycontribute to the 3’ (red) signal measured at time t = 0. The probability to initiate g nascent transcripts during this timeinterval is given by the propagator P t (3)1 ( g, n | g (cid:48) , n (cid:48) ) (Eq. S37), where n is the promoter state.

5’ 3’ t t t t t k(3) t t t t t k(5) τ e ...... FIG. SN2: Temporal hierarchy of dual color probe conﬁguration.

For any model of promoter activity that only consider the stochastic initiation of transcripts (as a Poisson process) anddeterministic elongation with instantaneous release, the propagator will satisfy the following equality: P t ( g, n | g (cid:48) , n (cid:48) ) = P t ( g − g (cid:48) , n | , n (cid:48) ) (S45)Thus, one only needs to calculate P t ( g, n | , n (cid:48) ) ≡ P t ( g, n | n (cid:48) ), which can be computed much faster than the matrixexponential (Eq. S37) [71]. It then follows that the Chapman-Kolmogorov equation for the time propagation reduces toa discrete convolution: P t + t ( g , n | n ) = (cid:88) n g (cid:88) g =0 P t ( g − g , n | n ) P t ( g , n | n ) (S46)This property is used extensively in the following calculation of the joint distribution.The computation of the joint distribution is performed according to a dynamic programming approach that can inprinciple be applied to an arbitrary number of color probes. We ﬁrst calculate recursively the 3’ contribution (red probes)to the signal P (3) ( ˜ G k , G k , n k ), where ˜ G k stands for the total signal in probe space, G k the total number of nascenttranscripts, n k the promoter state and k the total number of probes covering the 3’ region. We then calculate the 5’contribution in a similar fashion, P (5) ( ˜ G k | n ). Lastly, we combine both components to generate the ﬁnal joint distribution P ( ˜ G (5) , ˜ G (3) ) in probe space.8 a. Step 1: calculate the 3’ contribution The initial distribution is given by: P (3) ( ˜ G , G , n ) ≡ P (3) (( k − g + kg (cid:124) (cid:123)(cid:122) (cid:125) ˜ G , g + g (cid:124) (cid:123)(cid:122) (cid:125) G , n )= (cid:88) n ,n P t (3)2 ( g , n | n ) P t (3)1 ( g , n | n ) P ( n ) P ( n ) (S47)where P ( n ) and P ( n ) are the initial distributions of promoter state at time t = − τ e and t + t (3)1 respectively. Assumingpromoters at steady-state, both distributions are then given by Eq. (S44) for a multi-gene system. We then perform thefollowing recursion scheme for i = { , ..., k } : P (3) ( ˜ G i , G i , n i ) = (cid:88) n i − g max (cid:88) g i =0 P t (3) i ( g i , n i | n i − ) P (3) ( ˜ G i − ( k − i + 1) g i (cid:124) (cid:123)(cid:122) (cid:125) ˜ G i − , G i − g i (cid:124) (cid:123)(cid:122) (cid:125) G i − , n i − ) (S48)where g max = min ( (cid:98) ˜ G i / ( k − i + 1) (cid:99) , G i ). b. Step 2: calculate the 5’ contribution The initial distribution is given by: P (5) ( ˜ G , n | n ) ≡ P (5) ( kg , n | n ) = P t (5)1 ( g , n | n ) (S49)We then perform the following recursion scheme for i = { , ..., k } : P (5) ( ˜ G i , n i | n ) = (cid:88) n i − g max (cid:88) g i =0 P t (5) i ( g i , n i | n i − ) P (5) ( ˜ G i − ( k − i + 1) g i (cid:124) (cid:123)(cid:122) (cid:125) ˜ G i − , n i − | n ) (S50)where g max = (cid:98) ˜ G i / ( k − i + 1) (cid:99) . Lastly, we sum out n k : P (5) ( ˜ G k | n ) = (cid:88) n k P (5) ( ˜ G k , n k | n ) (S51) c. Step 3: combine 3’ and 5’ contributions The ﬁnal joint distribution of 5’ and 3’ activity in probe space is thengiven by: P ( ˜ G (5) , ˜ G (3) ) = (cid:88) n G max (cid:88) G =0 P (5) ( ˜ G (5) − kG | n ) P (3) ( ˜ G (3) , G, n ) (S52)where G max = (cid:98) ˜ G (5) /k (cid:99) . P (3) and P (5) are the joint distributions computed at step 1 and 2. Since the actual signalresolution is of the order of 1 cytoplasmic unit (a fully tagged transcript with k ﬂuorescent probes), the joint distributioncan be coarse-grained by aggregating the states ˜ G by a block of size k corresponding to a single cytoplasmic unit. Thecoarse-grained distribution will be denoted P ( G (5) , G (3) ) in the following. D. Likelihood and inference

The likelihood of the data D = { S (5) , S (3) } given the parameters θ = ( k on , k oﬀ , k ini , k elo ) can be expressed in terms ofthe measurement noise model P ( S (5) , S (3) | G (5) , G (3) ) (Eq. S1) and the joint distribution P ( G (5) , G (3) | θ ): P ( D | θ ) = N D (cid:89) i =1 (cid:88) G (5) ,G (3) P ( S (5) i , S (3) i | G (5) , G (3) ) P ( G (5) , G (3) | θ ) (S53)where N D is the total amount of data. Following a Bayesian approach, the parameters are estimated from the posteriordistribution: P ( θ | D ) = P ( D | θ ) P ( θ ) P ( D ) ≡ (cid:82) P ( D | θ ) P ( θ )d θ (S54)The posterior distribution P ( θ | D ) was sampled using a Markov chain Monte Carlo (MCMC) algorithm and the parameterswere estimated as the mean of the posterior distribution. We used log-uniform priors as uninformative priors for the rateparameters ( k on , k oﬀ , k ini ) and set the elongation rate k elo to the experimentally measured value of 1.5kb/min [53]. Atsteady-state, a known value of k elo is required to set the temporal scale of the other transcriptional parameters. This can beseen by inspecting of the expressions for the various moments of the nascent transcript distribution (Eq. S13, S14, S19 andS20). Since all moments can be parametrized by the three independent parameters g = k ini /k elo , (cid:104) n (cid:105) = k on / ( k on + k oﬀ )and the ratio τ e /τ n = ( k on + k oﬀ ) /k elo , it follows that the model is not identiﬁable when the temporal scale is not set.9 E. Performance

In order to validate the calculation of the joint distribution and asses the performance of our inference, we ﬁrst testedthe method on synthetic data. Using the Gillespie algorithm [72], we generated simulated nuclei activity data based on4 independent gene copies modeled by the telegraph model. We used the probe conﬁguration and gene length of hb andassumed a typical elongation rate of 1.5kb/min [53]. Measurement noise was included in the simulated data according tothe characterization performed previously on real data (cf. Section V B).We investigated diﬀerent parameter regimes and modulation schemes of the mean activity µ ≡ (cid:104) g (cid:105) , to test whether theinput parameters used to generate the data could be inferred properly (Fig. S6A-D). Namely, we tested: 1) modulationof the initiation rate k ini alone with τ n = 2 min and (cid:104) n (cid:105) = 0 .

35 (cyan dash line), 2) modulation of the on-rate k on alonewith k ini = 7 min − and k oﬀ = 0 .

25 min − (green dash line), 3) modulation of the oﬀ-rate k oﬀ alone with k ini = 7 min − and k on = 0 .

25 min − (blue dash line), 4) modulation of the mean occupancy (cid:104) n (cid:105) alone with k ini = 7 and τ n = 2 min (reddash line).For each scenario, we generated 8 batches of data covering the range of normalized activity µ/µ . Each batch was madeof 10 independently sampled datasets of 500 nuclei activity measurements. We performed the inference on each datasetindividually and reported the mixture of posterior distribution over the 10 datasets to take into account the error dueto the randomness of the sampled data. We conclude that the inference framework performs well, since all the inferredquantities cover the true values within error bars. Overall, the inference allows us to distinguish the diﬀerent testedmodulation strategies without ambiguities. F. Eﬀect of elongation rate on inference

As we discussed in the previous subsection VIII D, the elongation rate k elo sets the temporal scale of the transcriptionalparameters, thus a diﬀerent elongation rate would lead to diﬀerent values of the parameters. In the manuscript, we useda value of k elo = 1 . k elo rescales the transcriptional parameters in a very predictable way. No matter theelongation rate, the three quantities g , (cid:104) n (cid:105) and τ e /τ n should be perfectly identiﬁable. It follows that the new parameters(denoted by the ∗ superscript) have to satisfy the following equations: k ini k ∗ elo k elo = k ∗ ini (S55) (cid:104) n (cid:105) = (cid:104) n (cid:105) ∗ (S56) τ n k elo k ∗ elo = τ ∗ n (S57)Inferring the transcriptional parameters from the data with k elo = 2 . k elo = 1 . k ini and τ n are rescaled by a factor 2 . / . ≈ .

67 and1 . / . . (cid:104) n (cid:105)(cid:105)