Reconciling Kinetic and Equilibrium Models of Bacterial Transcription
Muir Morrison, Manuel Razo-Mejia, and Rob Phillips
1 Department of Physics, California Institute of Technology, Pasadena, CA 91125, USA
2 Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
* Correspondence: [email protected]
Abstract
The study of transcription remains one of the centerpieces of modern biology with implications in settings from development to metabolism to evolution to disease. Precision measurements using a host of different techniques including fluorescence and sequencing readouts have raised the bar for what it means to quantitatively understand transcriptional regulation. In particular our understanding of the simplest genetic circuit is sufficiently refined both experimentally and theoretically that it has become possible to carefully discriminate between different conceptual pictures of how this regulatory system works. This regulatory motif, originally posited by Jacob and Monod in the 1960s, consists of a single transcriptional repressor binding to a promoter site and inhibiting transcription. In this paper, we show how seven distinct models of this so-called simple-repression motif, based both on equilibrium and kinetic thinking, can be used to derive the predicted levels of gene expression and shed light on the often surprising past success of the equilibrium models. These different models are then invoked to confront a variety of different data on mean, variance and full gene expression distributions, illustrating the extent to which such models can and cannot be distinguished, and suggesting a two-state model with a distribution of burst sizes as the most potent of the seven for describing the simple-repression motif.
1 Introduction

Gene expression presides over much of the most important dynamism of living organisms. The level of expression of batteries of different genes is altered as a result of spatiotemporal cues that integrate chemical, mechanical and other types of signals. The original repressor-operator model conceived by Jacob and Monod in the context of bacterial metabolism has now been transformed into the much broader subject of gene regulatory networks in living organisms of all kinds [1]–[3]. One of the remaining outstanding challenges to have emerged in the genomic era is our continued inability to predict the regulatory consequences of different regulatory architectures, i.e., the arrangement and affinity of binding sites for transcription factors and RNA polymerases on the DNA. This challenge stems first and foremost from our ignorance about what those architectures even are, with more than 60% of the genes even in an ostensibly well understood organism such as
E. coli having no regulatory insights at all [4]–[7]. But even once we have established the identity of key transcription factors and their binding sites of a given promoter architecture, there remains the predictive challenge of understanding its input-output properties, an objective that can be met by a myriad of approaches using the tools of statistical physics [8]–[25]. One route to such predictive understanding is to focus on the simplest regulatory architecture and to push the theory-experiment dialogue as far and as hard as it can be pushed [26], [27]. If we demonstrate that we can pass that test by successfully predicting both the means and variance in gene expression at the mRNA level, then that provides a more solid foundation upon which to launch into more complex problems, for instance, some of the previously unknown architectures uncovered in [5] and [28].

To that end, in this paper we examine a wide variety of distinct models for the simple repression regulatory architecture. This genetic architecture consists of a DNA promoter regulated by a transcriptional repressor that binds to a single binding site as developed in pioneering early work on the quantitative dissection of transcription [29], [30]. All of the proposed models coarse-grain away some of the important microscopic features of this architecture that have been elucidated by generations of geneticists, molecular biologists and biochemists. One goal in exploring such coarse-grainings is to build towards the future models of regulatory response that will be able to serve the powerful predictive role needed to take synthetic biology from a brilliant exercise in enlightened empiricism to a rational design framework as in any other branch of engineering. More precisely, we want phenomenology in the sense of coarse-graining away atomistic detail, but still retaining biophysical meaning.
For example, we are not satisfied with the strictly phenomenological approach offered by the commonly used Hill functions. As argued in [31], Hill functions are ubiquitous precisely because they coarse-grain away all biophysical details into inscrutable parameters. Studies like [32] have demonstrated that Hill functions are clearly insufficient since each new situation requires a completely new set of parameters. Such work requires a quantitative theory of how biophysical changes at the molecular level propagate to input-output functions at the genetic circuit level. In particular a key question is: at this level of coarse-graining, what microscopic details do we need to explicitly model, and how do we figure that out? For example, do we need to worry about all or even any of the steps that individual RNA polymerases go through each time they make a transcript? Turning the question around, can we see any imprint of those processes in the available data? If the answer is no, then those processes are irrelevant for our purposes. Both forward modeling and inverse (statistical inferential) modeling are necessary to tackle such questions.

Figure 1(A) shows the qualitative picture of simple repression that is implicit in the repressor-operator model. An operator, the binding site on the DNA for a repressor protein, may be found occupied by a repressor, in which case transcription is blocked from occurring. Alternatively, that binding site may be found unoccupied, in which case RNA polymerase (RNAP) may bind and transcription can proceed. The key assumption we make in this simplest incarnation of the repressor-operator model is that binding of repressor and RNAP in the promoter region of interest is exclusive, meaning that one or the other may bind, but never may both be simultaneously bound. It is often imagined that when the repressor is bound to its operator, RNAP is sterically blocked from binding to its promoter sequence.
Current evidence suggests this is sometimes, but not always, the case, and it remains an interesting open question precisely how a repressor bound far upstream is able to repress transcription [4]. Suggestions include "action-at-a-distance" mediated by kinks in the DNA, formed when the repressor is bound, that prevent RNAP binding. Nevertheless, our modeling in this work is sufficiently coarse-grained that we simply assume exclusive binding and leave explicit accounting of these details out of the problem.

The logic of the remainder of the paper is as follows. In Section 2, we show how both thermodynamic models and kinetic models based upon the chemical master equation all culminate in the same underlying functional form for the fold-change in the average level of gene expression as shown in Figure 1(D). Section 3 goes beyond an analysis of the mean gene expression by asking how the same models presented in Figure 1(C) can be used to explore noise in gene expression. To make contact with experiment, all of these models must make a commitment to some numerical values for the key parameters found in each such model. Therefore in Section 4 we explore the use of Bayesian inference to establish these parameters and to rigorously answer the question of how to discriminate between the different models.

2 Mean Gene Expression

As noted in the previous section, there are two broad classes of models in play for computing the input-output functions of regulatory architectures as shown in Figure 1. In both classes of model, the promoter is imagined to exist in a discrete set of states of occupancy, with each such state of occupancy accorded its own rate of transcription – including no transcription for many of these states.
The models are probabilistic with each state assigned some probability and the overall rate of transcription given by

$$\text{average rate of transcription} = \sum_i r_i p_i, \tag{1}$$

where $i$ labels the distinct states, $p_i$ is the probability of the $i$th state, and $r_i$ is the rate of transcription of that state. Ultimately, the different models differ along several key axes: what states to consider and how to compute the probabilities of those states.

Figure 1. An overview of the simple repression motif at the level of means. (A) Schematic of the qualitative biological picture of the simple repression genetic architecture. (B) and (C) A variety of possible mathematicized cartoons of simple repression, along with the effective parameter ρ which subsumes all regulatory details of the architecture that do not directly involve the repressor. (B) Simple repression models from an equilibrium perspective. (C) Equivalent models cast in chemical kinetics language. (D) The "master curve" to which all cartoons in (B) and (C) collapse (fold-change plotted against ∆F_R − log(ρ), in units of k_BT).

The first class of models that are the focus of the present section on predicting the mean level of gene expression, sometimes known as thermodynamic models, invoke the tools of equilibrium statistical mechanics to compute the probabilities [8]–[17]. As seen in Figure 1(B), even within the class of thermodynamic models, we can make different commitments about the underlying microscopic states of the promoter. Indeed, the list of options considered here does not at all exhaust the suite of different microscopic states we can assign to the promoter.

The second class of models that allow us to access the mean gene expression use chemical master equations to compute the probabilities of the different microscopic states [18]–[25]. As seen in Figure 1(C), we consider a host of different nonequilibrium models, each of which will have its own result for both the mean (this section) and noise (next section) in gene expression.
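The bookkeeping of Eq. 1 can be made concrete with a few lines of Python. The sketch below is ours, not from the paper; the three states and all numerical values are hypothetical, chosen purely to illustrate the probability-weighted sum.

```python
# Sketch of Eq. 1: the mean transcription rate is a probability-weighted
# sum over promoter states. All numbers below are hypothetical.

def average_rate(rates, probs):
    """Return sum_i r_i * p_i over promoter states (Eq. 1)."""
    assert abs(sum(probs) - 1.0) < 1e-9, "state probabilities must sum to 1"
    return sum(r * p for r, p in zip(rates, probs))

# Hypothetical three-state promoter: repressor bound (silent), empty
# (silent), RNAP bound (transcribing at rate 1 transcript per unit time).
rates = [0.0, 0.0, 1.0]
probs = [0.5, 0.3, 0.2]
print(average_rate(rates, probs))  # 0.2
```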
As a first stop on our search for the "right" model of simple repression, let us consider what we can learn from theory and experimental measurements on the average level of gene expression in a population of cells. One experimental strategy that has been particularly useful (if incomplete since it misses out on gene expression dynamics) is to measure the fold-change in mean expression. The fold-change is defined as

$$\text{fold-change} = \frac{\langle \text{gene expression with repressor present} \rangle}{\langle \text{gene expression with repressor absent} \rangle} = \frac{\langle m(R > 0) \rangle}{\langle m(R = 0) \rangle}, \tag{2}$$

where angle brackets ⟨·⟩ denote the average over a population of cells and mean mRNA ⟨m⟩ is viewed as a function of repressor copy number R. What this means is that the fold-change in gene expression is a relative measurement of the effect of the transcriptional repressor (R > 0) on the gene expression level compared to an unregulated promoter (R = 0). The second equality in Eq. 2 follows from assuming that the translation efficiency, i.e., the number of proteins translated per mRNA, is the same in both conditions. In other words, we assume that mean protein level is proportional to mean mRNA level, and that the proportionality constant is the same in both conditions and therefore cancels out in the ratio. This is reasonable since the cells in the two conditions are identical except for the presence of the transcription factor, and the model assumes that the transcription factor has no direct effect on translation.

Fold-change has proven a very convenient observable in past work [32]–[35]. Part of its utility in dissecting transcriptional regulation is its ratiometric nature, which removes many secondary effects that are present when making an absolute gene expression measurement. Also, by measuring otherwise identical cells with and without a transcription factor present, any biological noise common to both conditions can be made to cancel away.

Figure 1 depicts a smorgasbord of mathematicized cartoons for simple repression using both thermodynamic and kinetic models that have appeared in previous literature. For each cartoon, we calculate the fold-change in mean gene expression as predicted by that model, deferring some algebraic details to Appendix S1. What we will find is that all cartoons collapse to a single master curve, shown in Figure 1(D), which contains just two parameters. We label the parameters ∆F_R, an effective free energy parametrizing the repressor-DNA interaction, and ρ, which subsumes all details of transcription in the absence of repressors. We will offer some intuition for why this master curve exists and discuss why, at the level of the mean expression, we are unable to discriminate "right" from "wrong" cartoons given only measurements of fold-changes in expression.
2.1.1 Two-State Equilibrium Model

In this simplest model, depicted as (1) in Figure 1(B), the promoter is idealized as existing in one of two states, either repressor bound or repressor unbound. The rate of transcription is assumed to be proportional to the fraction of time spent in the repressor unbound state. From the relative statistical weights listed in Figure 1, the probability $p_U$ of being in the unbound state is

$$p_U = \left(1 + \frac{R}{N_{NS}} e^{-\beta\Delta\varepsilon_R}\right)^{-1}. \tag{3}$$

The mean rate of transcription is then given by $r p_U$, as assumed by Eq. 1. The mean number of mRNA is set by the balance of average mRNA transcription and degradation rates, so it follows that the mean mRNA level is given by

$$\langle m \rangle = \frac{r}{\gamma}\left(1 + \frac{R}{N_{NS}} e^{-\beta\Delta\varepsilon_R}\right)^{-1}, \tag{4}$$

where $r$ is the transcription rate from the repressor unbound state, $\gamma$ is the mRNA degradation rate, $R$ is the repressor copy number, $N_{NS}$ is the number of nonspecific binding sites in the genome where repressors spend most of their time when not bound to the operator, $\beta \equiv 1/k_BT$, and $\Delta\varepsilon_R$ is the binding energy of a repressor to its operator site. The derivation of this result is deferred to Appendix S1.

The fold-change is found as the ratio of mean mRNA with and without repressor as introduced in Eq. 2. Invoking that definition results in

$$\text{fold-change} = \left(1 + \frac{R}{N_{NS}} e^{-\beta\Delta\varepsilon_R}\right)^{-1}, \tag{5}$$

which matches the form of the master curve in Figure 1(D) with ρ = 1 and $\Delta F_R = \beta\Delta\varepsilon_R - \log(R/N_{NS})$. In fact it was noted in [35] that this two-state model can be viewed as the coarse-graining of any equilibrium promoter model in which no transcriptionally active states have transcription factor bound, or put differently, when there is no overlap between transcription factor bound states and transcriptionally active states. We will see this explicitly in the three-state equilibrium model below, but perhaps surprising is that an analogous result carries over even to the nonequilibrium models we consider later.
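As a quick check on the algebra, Eqs. 3-5 and the master curve can be evaluated numerically. The sketch below is our own illustration, not the paper's code. The parameter values are assumptions for demonstration only: $N_{NS} = 4.6 \times 10^6$ nonspecific sites (roughly the length of the E. coli genome) and a repressor binding energy of $-13.9\,k_BT$, with all energies measured in units of $k_BT$ so that $\beta\Delta\varepsilon_R$ becomes a pure number.

```python
# A minimal numerical sketch of Eqs. 3-5 and the master curve of Figure 1(D).
# N_NS and eps_R below are illustrative assumptions, not fitted values.
import math

def fold_change_two_state(R, eps_R, N_NS=4.6e6):
    """Eq. 5: fold-change = (1 + (R/N_NS) exp(-eps_R))^-1,
    with eps_R standing in for beta*Delta-epsilon_R (units of k_BT)."""
    return 1.0 / (1.0 + (R / N_NS) * math.exp(-eps_R))

def delta_F_R(R, eps_R, N_NS=4.6e6):
    """Effective free energy Delta F_R = beta*Delta-eps_R - log(R/N_NS)."""
    return eps_R - math.log(R / N_NS)

def master_curve(dF, log_rho=0.0):
    """Figure 1(D): fold-change = (1 + exp(-dF - log_rho))^-1."""
    return 1.0 / (1.0 + math.exp(-dF - log_rho))

# The two parametrizations agree (rho = 1 for this model):
for R in (10, 100, 1000):
    fc = fold_change_two_state(R, -13.9)
    assert abs(fc - master_curve(delta_F_R(R, -13.9))) < 1e-12
    print(R, fc)
```

The loop makes the collapse explicit: evaluating Eq. 5 directly or evaluating the master curve at $\Delta F_R$ with $\log\rho = 0$ gives the same number for every repressor copy number.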
2.1.2 Three-State Equilibrium Model

Compared to the previous model, here we fine-grain the repressor unbound state into two separate states: empty, and RNAP bound, as shown in (2) in Figure 1(B). This picture was used in [33] as we use it here, and in [32] and [35] it was generalized to incorporate small-molecule inducers that bind the repressor. The effect of this generalization is, roughly speaking, simply to rescale R from the total number of repressors to a smaller effective number of available repressors which are unbound by inducers. We point out that the same generalization can be incorporated quite easily into any of our models in Figure 1 by simply rescaling the repressor copy number R in the equilibrium models, or equivalently $k_R^+$ in the nonequilibrium models.

The mean mRNA copy number, as derived in Appendix S1 from a similar enumeration of states and weights as the previous model, is

$$\langle m \rangle = \frac{r}{\gamma}\,\frac{\frac{P}{N_{NS}} e^{-\beta\Delta\varepsilon_P}}{1 + \frac{R}{N_{NS}} e^{-\beta\Delta\varepsilon_R} + \frac{P}{N_{NS}} e^{-\beta\Delta\varepsilon_P}}, \tag{6}$$

where the new variables are $\Delta\varepsilon_P$, the difference in RNAP binding energy to its specific site (the promoter) relative to an average nonspecific background site, and the RNAP copy number, P. The fold-change again follows immediately as

$$\text{fold-change} = \frac{\frac{P}{N_{NS}} e^{-\beta\Delta\varepsilon_P}}{1 + \frac{R}{N_{NS}} e^{-\beta\Delta\varepsilon_R} + \frac{P}{N_{NS}} e^{-\beta\Delta\varepsilon_P}} \cdot \frac{1 + \frac{P}{N_{NS}} e^{-\beta\Delta\varepsilon_P}}{\frac{P}{N_{NS}} e^{-\beta\Delta\varepsilon_P}} \tag{7}$$

$$= \left(1 + \frac{\frac{R}{N_{NS}} e^{-\beta\Delta\varepsilon_R}}{1 + \frac{P}{N_{NS}} e^{-\beta\Delta\varepsilon_P}}\right)^{-1} \tag{8}$$

$$= \left(1 + \exp(-\Delta F_R - \log\rho)\right)^{-1}, \tag{9}$$

with $\Delta F_R = \beta\Delta\varepsilon_R - \log(R/N_{NS})$ and $\rho = 1 + \frac{P}{N_{NS}} e^{-\beta\Delta\varepsilon_P}$, as shown in Figure 1(B). Thus far, we see that the two thermodynamic models, despite making different coarse-graining commitments, result in the same functional form for the fold-change in mean gene expression. We now explore how kinetic models fare when faced with computing the same observable.

2.1.3 Nonequilibrium Model One

For our first kinetic model, we imitate the states considered in the Two-State Equilibrium Model and consider the simplest possible picture with only two states, repressor bound and unbound.
This is exactly the model used for the main results of [36]. In this picture, repressor association and dissociation rates from its operator site, $k_R^+$ and $k_R^-$, respectively, govern transitions between the two states. When the system is in the unbound state, transcription initiates at rate r, which represents a coarse-graining of all the downstream processes into a single effective rate. mRNA is degraded at rate γ as already exploited in the previous models.

Let $p_R(m, t)$ denote the joint probability of finding the system in the repressor bound state R with m mRNA molecules present at time t. Similarly define $p_U(m, t)$ for the repressor unbound state U. This model is governed by coupled master equations giving the time evolution of $p_R(m, t)$ and $p_U(m, t)$ [22], [24], [27], which we can write as

$$\frac{d}{dt} p_R(m,t) = -\overbrace{k_R^- p_R(m,t)}^{R \to U} + \overbrace{k_R^+ p_U(m,t)}^{U \to R} + \overbrace{(m+1)\gamma\, p_R(m+1,t)}^{m+1 \to m} - \overbrace{m\gamma\, p_R(m,t)}^{m \to m-1}$$

$$\frac{d}{dt} p_U(m,t) = \overbrace{k_R^- p_R(m,t)}^{R \to U} - \overbrace{k_R^+ p_U(m,t)}^{U \to R} + \overbrace{r\, p_U(m-1,t)}^{m-1 \to m} - \overbrace{r\, p_U(m,t)}^{m \to m+1} + \overbrace{(m+1)\gamma\, p_U(m+1,t)}^{m+1 \to m} - \overbrace{m\gamma\, p_U(m,t)}^{m \to m-1}, \tag{10}$$

where each term on the right corresponds to a transition between two states of the promoter as indicated by the overbrace label. In each equation, the first two terms describe transitions between promoter states due to repressors unbinding and binding, respectively. The final two terms describe degradation of mRNA, decreasing the copy number by one, and the terms with coefficient r describe transcription initiation, increasing the mRNA copy number by one.
We direct the reader to Appendix S1.1 for a careful treatment showing how the form of this master equation follows from the corresponding cartoon in Figure 1.

We can greatly simplify the notation, which will be especially useful for the more complicated models yet to come, by re-expressing the master equation in vector form [37]. The promoter states are collected into a vector and the rate constants are collected into matrices as

$$\vec{p}(m) = \begin{pmatrix} p_R(m) \\ p_U(m) \end{pmatrix}, \quad K = \begin{pmatrix} -k_R^- & k_R^+ \\ k_R^- & -k_R^+ \end{pmatrix}, \quad R = \begin{pmatrix} 0 & 0 \\ 0 & r \end{pmatrix}, \tag{11}$$

so that the master equation may be condensed as

$$\frac{d}{dt}\vec{p}(m,t) = (K - R - \gamma m I)\,\vec{p}(m,t) + R\,\vec{p}(m-1,t) + \gamma(m+1) I\,\vec{p}(m+1,t), \tag{12}$$

where I is the identity matrix. Taking steady state by setting time derivatives to zero, the mean mRNA can be found to be

$$\langle m \rangle = \frac{r}{\gamma}\left(1 + \frac{k_R^+}{k_R^-}\right)^{-1}, \tag{13}$$

with the algebra details again deferred to Appendix S1. Recall $k_R^+$ is proportional to the repressor copy number, so in computing fold-change, absence of repressor corresponds to $k_R^+ \to$
0. Therefore the fold-change in this model is simply

$$\text{fold-change} = \left(1 + \frac{k_R^+}{k_R^-}\right)^{-1}, \tag{14}$$

again matching the master curve of Figure 1(D) with ρ = 1.

2.1.4 Nonequilibrium Model Two - RNAP Bound and Unbound States

Our second kinetic model depicted in Figure 1(C) mirrors the second equilibrium model of Figure 1(B) by fine-graining the repressor unbound state of nonequilibrium model 1, resolving it into an empty promoter state and an RNAP-bound state. Note in this picture, in contrast with model 4 below, transcription initiation is accompanied by a promoter state change, in keeping with the interpretation as RNAP-bound and empty states: if an RNAP successfully escapes the promoter and proceeds to elongation of a transcript, clearly it is no longer bound at the promoter. Therefore another RNAP must bind before another transcript can be initiated.

The master equation governing this model is analogous to Eqs. 11-12 for model 1 above. The main subtlety arises since transcription initiation accompanies a promoter state change. This can be understood by analogy to K. The off-diagonal and diagonal elements of K correspond to transitions arriving at or departing from, respectively, the promoter state of interest. If transcription initiation is accompanied by promoter state changes, we must have separate matrices for arriving and departing transcription events since the arriving and departing transitions have different initial copy numbers of mRNA, unlike for K where they are the same (see Appendix S1). The master equation for this model is

$$\frac{d}{dt}\vec{p}(m,t) = (K - R_D - \gamma m I)\,\vec{p}(m,t) + R_A\,\vec{p}(m-1,t) + \gamma(m+1) I\,\vec{p}(m+1,t), \tag{15}$$

with the state vector and promoter transition matrix defined as

$$\vec{p}(m) = \begin{pmatrix} p_R(m) \\ p_E(m) \\ p_P(m) \end{pmatrix}, \quad K = \begin{pmatrix} -k_R^- & k_R^+ & 0 \\ k_R^- & -k_R^+ - k_P^+ & k_P^- \\ 0 & k_P^+ & -k_P^- \end{pmatrix}, \tag{16}$$

and the initiation matrices given by

$$R_A = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & r \\ 0 & 0 & 0 \end{pmatrix}, \quad R_D = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & r \end{pmatrix}. \tag{17}$$
The elements of $\vec{p}(m)$ encode the probabilities of having m mRNA present along with the promoter having repressor bound (R), being empty (E), or having RNAP bound (P), respectively. $R_A$ describes probability flux arriving at the state $\vec{p}(m)$ from a state with one fewer mRNA, namely $\vec{p}(m-1)$, while $R_D$ describes probability flux departing from the state $\vec{p}(m)$ for a state with one more mRNA, namely $\vec{p}(m+1)$. K is closely analogous to model 1.

Mean mRNA at steady state is found analogously to model 1, with the result

$$\langle m \rangle = \frac{r}{\gamma}\,\frac{k_R^- k_P^+}{k_R^- k_P^+ + k_R^- (k_P^- + r) + k_R^+ (k_P^- + r)}, \tag{18}$$

and with details again deferred to Appendix S1. Fold-change is again found from the ratio prescribed by Eq. 2, from which we have

$$\text{fold-change} = \frac{k_R^- k_P^+}{k_R^- k_P^+ + k_R^- (k_P^- + r) + k_R^+ (k_P^- + r)} \cdot \frac{k_P^+ + k_P^- + r}{k_P^+} \tag{19}$$

$$= \left(1 + \frac{k_R^+}{k_R^-}\,\frac{k_P^- + r}{k_P^+ + k_P^- + r}\right)^{-1} \tag{20}$$

$$= \left(1 + \frac{k_R^+}{k_R^-}\left(1 + \frac{k_P^+}{k_P^- + r}\right)^{-1}\right)^{-1}, \tag{21}$$

which follows the master curve of Figure 1(D) with $\rho = 1 + k_P^+/(k_P^- + r)$ as claimed.

2.1.5 Nonequilibrium Model Three - Multistep Transcription Initiation and Escape

One might reasonably complain that the first two "nonequilibrium" models we have considered are straw men. Their steady states necessarily satisfy detailed balance, which is equivalent to thermodynamic equilibrium. Why is this the case? At steady state there is by definition no net probability flux in or out of each promoter state, but since the promoter states form a linear chain, there is only one way in or out of the repressor bound and RNAP bound states, implying each edge must actually have a net zero probability flux, which is the definition of detailed balance (usually phrased as equality of forward and reverse transition fluxes).

Now we consider model 3 in Figure 1(C), which allows the possibility of true nonequilibrium steady-state fluxes through the promoter states.
We point out that this model was considered previously in [38], where a comparison was made with model 1 as used in [36]. The authors of [38] argued that the additional complexity is essential to properly account for the noise in the mRNA distribution. We will weigh in on both models later when we consider observables beyond fold-change.

The master equation governing this model is identical in form to model 2 above, namely

$$\frac{d}{dt}\vec{p}(m,t) = (K - R_D - \gamma m I)\,\vec{p}(m,t) + R_A\,\vec{p}(m-1,t) + \gamma(m+1) I\,\vec{p}(m+1,t), \tag{22}$$

but with a higher-dimensional state space and different matrices. The state vector and promoter transition matrix are now

$$\vec{p}(m) = \begin{pmatrix} p_R(m) \\ p_E(m) \\ p_C(m) \\ p_O(m) \end{pmatrix}, \quad K = \begin{pmatrix} -k_R^- & k_R^+ & 0 & 0 \\ k_R^- & -k_R^+ - k_P^+ & k_P^- & 0 \\ 0 & k_P^+ & -k_P^- - k_O & 0 \\ 0 & 0 & k_O & 0 \end{pmatrix}, \tag{23}$$

with the four promoter states, in order, being repressor bound (R), empty (E), RNAP closed complex (C), and RNAP open complex (O). Besides increasing dimension by one, the only new feature in K is the rate $k_O$, representing the rate of open complex formation from the closed complex, which we assume for simplicity to be irreversible in keeping with some [38] but not all [39] past literature. The initiation matrices are given by

$$R_A = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & r \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}, \quad R_D = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & r \end{pmatrix}, \tag{24}$$

again closely analogous to nonequilibrium model 2.

The expression for mean mRNA is substantially more complicated now, as worked out in Appendix S1, where we find

$$\langle m \rangle = \frac{r}{\gamma}\,\frac{k_R^- k_P^+ k_O}{k_R^- \left[k_P^+ (k_O + r) + r(k_P^- + k_O)\right] + k_R^+ r (k_P^- + k_O)}, \tag{25}$$

which can be simplified to

$$\langle m \rangle = \frac{r}{\gamma}\,\frac{\frac{k_P^+ k_O}{r(k_O + k_P^-)}}{1 + \frac{k_P^+ (k_O + r)}{r(k_O + k_P^-)} + \frac{k_R^+}{k_R^-}}. \tag{26}$$

The strategy is to isolate the terms involving the repressor, so that now the fold-change is seen to be simply

$$\text{fold-change} = \frac{\frac{k_P^+ k_O}{r(k_O + k_P^-)}}{1 + \frac{k_P^+ (k_O + r)}{r(k_O + k_P^-)} + \frac{k_R^+}{k_R^-}} \cdot \frac{1 + \frac{k_P^+ (k_O + r)}{r(k_O + k_P^-)}}{\frac{k_P^+ k_O}{r(k_O + k_P^-)}} \tag{27}$$

$$= \left(1 + \frac{k_R^+}{k_R^-}\left(1 + \frac{k_P^+ (k_O + r)}{r(k_O + k_P^-)}\right)^{-1}\right)^{-1}, \tag{28}$$

surprisingly reducing to the master curve of Figure 1(D) once again, with

$$\rho = 1 + \frac{k_P^+ (k_O + r)}{r(k_O + k_P^-)}.$$

This example hints that an arbitrarily fine-grained model of downstream transcription steps may still be collapsed to the form of the master curve for the means given in Figure 1(D), so long as the repressor binding is exclusive with transcriptionally active states. We offer this as a conjecture, and we suspect that a careful argument using the King-Altman diagram method [40], [41] might furnish a "proof." Our focus here is not on full generality but rather to survey an assortment of plausible models for simple repression that have been proposed in the literature.
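Results like Eq. 25 can be sanity-checked numerically: truncate the master equation at a maximum copy number, solve the resulting linear system for the steady state, and compare the numerical mean to the analytic formula. The Python sketch below is our own check, not the paper's code; all rate values are arbitrary illustrations, quoted in units of the mRNA degradation rate (γ = 1).

```python
# Numerical sanity check of model 3 (Eqs. 22-25): build the matrices,
# solve the truncated chemical master equation at steady state, and
# compare the mean mRNA to the analytic result. Rates are illustrative.
import numpy as np

kRp, kRm = 2.0, 1.0           # repressor binding/unbinding, k_R^+ and k_R^-
kPp, kPm = 3.0, 2.0           # RNAP closed-complex binding/unbinding
kO, r, gamma = 1.5, 4.0, 1.0  # open-complex formation, initiation, degradation

# Promoter states ordered (R, E, C, O); columns are source states.
K = np.array([[-kRm,  kRp,         0.0,       0.0],
              [ kRm, -kRp - kPp,   kPm,       0.0],
              [ 0.0,  kPp,        -kPm - kO,  0.0],
              [ 0.0,  0.0,         kO,        0.0]])
RA = np.zeros((4, 4)); RA[1, 3] = r   # initiation: O -> E while m -> m + 1
RD = np.zeros((4, 4)); RD[3, 3] = r

M = 60                        # copy-number truncation; must be >> <m>
n = 4 * (M + 1)
A = np.zeros((n, n))
for m in range(M + 1):
    rows = slice(4 * m, 4 * m + 4)
    A[rows, rows] += K - RD - gamma * m * np.eye(4)
    if m > 0:
        A[rows, 4 * (m - 1): 4 * m] += RA                  # from m - 1
    if m < M:
        A[rows, 4 * (m + 1): 4 * (m + 2)] += gamma * (m + 1) * np.eye(4)

# Replace one redundant steady-state equation with normalization sum(p) = 1.
A[-1, :] = 1.0
b = np.zeros(n); b[-1] = 1.0
p = np.linalg.solve(A, b)

mean_m = sum(m * p[4 * m: 4 * m + 4].sum() for m in range(M + 1))

# Analytic mean from Eq. 25 for comparison.
num = kRm * kPp * kO
den = kRm * (kPp * (kO + r) + r * (kPm + kO)) + kRp * r * (kPm + kO)
print(mean_m, (r / gamma) * num / den)
```

For these (assumed) rates the two printed numbers agree to many digits, since the truncation error is negligible when M far exceeds the mean copy number.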
2.1.6 Nonequilibrium Model Four

Model 4 in Figure 1(C) is at the core of the theory in [42]. At a glance the cartoon for this model may appear very similar to model 2, and mathematically it is, but the interpretation is rather different. In model 2, we interpreted the third state literally as an RNAP-bound promoter and modeled initiation of a transcript as triggering a promoter state change, making the assumption that an RNAP can only make one transcript at a time. In contrast, in the present model the promoter state does not change when a transcript is initiated. So we no longer interpret these states as literally RNAP bound and unbound but instead as coarse-grained "active" and "inactive" states, the details of which we leave unspecified for now. We will comment more on this model below when we discuss Fano factors of models.

Mathematically this model is very similar to models 1 and 2. Like model 1, the matrix R describing transcription initiation is diagonal, namely

$$R = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & r \end{pmatrix}. \tag{29}$$

The master equation takes verbatim the same form as it did for model 1, Eq. 12. Meanwhile the promoter transition matrix K is the same as Eq. 16 from model 2, although we relabel the rate constants from $k_P^\pm$ to $k^\pm$ to reiterate that these are not simply RNAP binding and unbinding rates.

Carrying out the algebra, the mean mRNA can be found to be

$$\langle m \rangle = \frac{r}{\gamma}\,\frac{k_R^- k^+}{k_R^- k^+ + k_R^- k^- + k_R^+ k^-}, \tag{30}$$

and the fold-change readily follows,

$$\text{fold-change} = \frac{k_R^- k^+}{k_R^- k^+ + k_R^- k^- + k_R^+ k^-} \cdot \frac{k_R^- k^+ + k_R^- k^-}{k_R^- k^+} \tag{31}$$

$$= \left(1 + \frac{k_R^+}{k_R^-}\left(1 + \frac{k^+}{k^-}\right)^{-1}\right)^{-1}, \tag{32}$$

from which we see $\rho = 1 + k^+/k^-$ as shown in Figure 1(C).

2.1.7 Nonequilibrium Model Five

The final model we consider, shown in Figure 1(C), is an intuitive analog to model 1, with just two states, repressor bound or unbound, and transition rates between them of $k_R^+$ and $k_R^-$. In model 1, when in the unbound state, single mRNA transcripts are produced as a Poisson process with some characteristic rate r.
The current model by contrast produces, at some Poisson rate $k_i$, bursts of mRNA transcripts. The burst sizes are assumed to be geometrically distributed with a mean burst size b, which we will motivate in Section 3 when we derive this model as a certain limiting case of model 4.

From this intuitive picture and by analogy to model 1, then, it should be plausible that the mean mRNA level is

$$\langle m \rangle = \frac{k_i b}{\gamma}\left(1 + \frac{k_R^+}{k_R^-}\right)^{-1}, \tag{33}$$

which will turn out to be correct from a careful calculation. For now, we simply note that just like model 1, the fold-change becomes

$$\text{fold-change} = \left(1 + \frac{k_R^+}{k_R^-}\right)^{-1}, \tag{34}$$

with ρ = 1, also like model 1. We will also see later how this model emerges as a natural limit of model 4.

The key outcome of our analysis of the models in Figure 1 is the existence of a master curve shown in Figure 1(D) to which the fold-change predictions of all the models collapse. This master curve is parametrized by only two effective parameters: ∆F_R, which characterizes the number of repressors and their binding strength to the DNA, and ρ, which characterizes all other features of the promoter architecture. The key assumption underpinning this result is that no transcription occurs when a repressor is bound to its operator. Note, however, that we are agnostic about the molecular mechanism which achieves this; steric effects are one plausible mechanism, but, for instance, "action-at-a-distance" mediated by kinked DNA due to repressors bound tens or hundreds of nucleotides upstream of a promoter is plausible as well.

Why does the master curve of Figure 1(D) exist at all? This brings to mind the deep questions posed in, e.g., [31] and [43], suggesting we consider multiple plausible models of a system and search for their common patterns to tease out which broad features are and are not important.
In our case, the key feature seems to be the exclusive nature of repressor and RNAP binding, which allows the parameter describing the repressor, ∆F_R, to cleanly separate from all other details of the promoter architecture, which are encapsulated in ρ. Arbitrary nonequilibrium behavior can occur on the rest of the promoter state space, but it may all be swept up in the effective parameter ρ, to which the repressor makes no contribution. We point the interested reader to [44] and [45] for an interesting analysis of similar problems using a graph-theoretic language.

As suggested in [35], we believe this master curve should generalize to architectures with multiple repressor binding sites, as long as the exclusivity of transcription factor binding and transcription initiation is maintained. The interpretation of ∆F_R is then of an effective free energy of all repressor bound states. In an equilibrium picture this is simply given by the log of the sum of Boltzmann weights of all repressor bound states, which looks like the log of a partition function of a subsystem. In a nonequilibrium picture, while we can still mathematically gather terms and give the resulting collection the label ∆F_R, it is unclear if the physical interpretation as an effective free energy makes sense. The problem is that free energies cannot be assigned unambiguously to states out of equilibrium because the free energy change along a generic path traversing the state space is path dependent, unlike at equilibrium. A consequence of this is that, out of equilibrium, ∆F_R is no longer a simple sum of Boltzmann weights. Instead it resembles a restricted sum of King-Altman diagrams [40], [41]. Following the work of Hill [46], it may yet be possible to interpret this expression as an effective free energy, but this remains unclear to us.
We leave this as an open problem for future work. If we relax the requirement of exclusive repressor-RNAP binding, one could imagine models in which repressor and RNAP doubly-bound states are allowed, where the repressor's effect is to reduce the transcription rate rather than setting it to zero. Our results do not strictly apply to such a model, although we note that if the repressor's reduction of the transcription rate is substantial, such a model might still be well-approximated by one of the models in Figure 1.

One may worry that our "one curve to rule them all" is a mathematical tautology. In fact we agree with this criticism if ∆F_R is "just a fitting parameter" and cannot be meaningfully interpreted as a real, physical free energy. An analogy to Hill functions is appropriate here. One of their great strengths and weaknesses, depending on the use they are put to, is that their parameters coarse-grain many details and are generally not interpretable in terms of microscopic models, for deep reasons discussed at length in [31]. By contrast, our master curve claims to have the best of both worlds: a coarse-graining of all details besides the repressor into a single effective parameter ρ, while simultaneously retaining an interpretation of ∆F_R as a physically meaningful and interpretable free energy. Our task, then, is to prove or disprove this claim.

How do we test this and probe the theory with fold-change measurements? There is a fundamental limitation in that the master curve is essentially a one-parameter function of ∆F_R + log ρ. Worse, there are many a priori plausible microscopic mechanisms that could contribute to the value of ρ, such as RNAP binding and escape kinetics [38], [39], and/or supercoiling accumulation and release [47], [48], and/or RNAP clusters analogous to those observed in eukaryotes [49], [50] and recently also observed in bacteria [51].
Even if ∆F_R is measured to high precision, inferring the potential microscopic contributions to ρ, buried inside a log no less, from fold-change measurements seems beyond reach. As a statistical inference problem it is entirely nonidentifiable, in the language of [52], Section 4.3.

If we cannot simply infer values of ρ from measurements of fold-change, can we perturb some of the parameters that make up ρ and measure the change? Unfortunately we suspect this is off-limits experimentally: most of the potential contributors to ρ are global processes that affect many or all genes. For instance, changing RNAP association rates by changing RNAP copy numbers, or changing supercoiling kinetics by changing topoisomerase copy numbers, would massively perturb the entire cell's physiology and confound any determination of ρ.

One might instead imagine a bottom-up modeling approach, where we mathematicize a model of what we hypothesize the important steps are and are not, use in vitro data for the steps deemed important, and predict what ρ should be. But again, because of the one-parameter nature of the master curve, many different models will likely make indistinguishable predictions, and without any way to experimentally perturb in vivo, there is no clear way to test whether the modeling assumptions are correct.

In light of this, we prefer the view that parameters and rates are not directly comparable between cartoons in Figure 1. Rather, parameters in the simpler cartoons represent coarse-grained combinations of parameters in the finer-grained models. For instance, by equating ρ between any two models, one can derive various possible correspondences between the two models' parameters. Note that these correspondences are clearly not unique, since many possible associations could be made.
It is then a choice as to what microscopic interpretations the model-builder prefers for the parameters in a particular cartoon, and as to which coarse-grainings lend intuition and which seem nonsensical. Indeed, since it remains an open question what microscopic features dominate ρ (as suggested above, perhaps RNAP binding and escape kinetics [38], [39], or supercoiling accumulation and release [47], [48], or something more exotic like RNAP clusters [49]-[51]), we are hesitant to put too much weight on any one microscopic interpretation of the model parameters that make up ρ.

One possible tuning knob to probe ρ that would not globally perturb the cell's physiology is to manipulate RNAP binding sites. Work such as [53] has shown that models of sequence-dependent RNAP affinity can be inferred from data, and the authors of [54] showed that the model of [53] has predictive power by using the model to design binding sites of a desired affinity. But for our purposes, this begs the question: the authors of [53] assumed a particular model (essentially our 3-state equilibrium model but without the repressor), so it is unclear how or if such methods can be turned around to compare different models of promoter function.

Another possible route to dissect transcription details without a global perturbation would be to use a phage polymerase with phage-specific promoters. While such results would carry some caveats, e.g., whether the repression of the phage polymerase is a good analog to the repression of the native RNAP, it could nevertheless be worthy of consideration.

We have already pointed out that the master curve of Figure 1 is essentially a one-parameter model, the one parameter being ∆F_R + log ρ. By now the reader may be alarmed as to how we can even determine ∆F_R and ρ independently of each other, never mind probing the internal structure of ρ itself.
A hint is provided by the weak promoter approximation, invoked repeatedly in prior studies [14], [32], [33] of simple repression using the 3-state equilibrium model in Figure 1(B). In that picture, the weak promoter approximation means (P/N_{NS}) \exp(-\beta\Delta\varepsilon_P) \ll
1, meaning therefore ρ ≈
1. This approximation can be well justified on the basis of the number of RNAP and σ factors per cell and the strength of binding of RNAP to DNA at weak promoters. This is suggestive, but how can we be sure that ρ is not, for instance, actually 10 and that ∆F_R hides a compensatory factor? A resolution is offered by an independent inference of ρ in the absence of repressors. This was done in [42] by fitting nonequilibrium model 4 in Figure 1(C), with zero repressor (looking ahead, this is equivalent to model 4 in Figure 2(A)), to single-cell mRNA counts data from [34]. This provided a determination of k^+ and k^-, from which their ratio is estimated to be no more than a few 10^{-2} and possibly smaller still.

The realization that ρ ≈ 1, independent of which model in Figure 1 one prefers, goes a long way towards explaining the surprising success of equilibrium models of simple repression. Even though our 2- and 3-state models get so many details of transcription wrong, it does not matter, because fold-change is a cleverly designed ratio. Since ρ subsumes all details except the repressor, and log ρ ≈
0, fitting these simple models to fold-change measurements can still give a surprisingly good estimate of repressor binding energies. So the ratiometric construction of fold-change fulfills its intended purpose of canceling out all features of the promoter architecture except the repressor itself. Nevertheless it is perhaps surprising how effectively it does so: a priori, one might not have expected ρ to be quite so close to 1.

We would also like to highlight the relevance of [55] here. Landman et al. reanalyzed and compared in vivo and in vitro data on the LacI repressor's binding affinity to its various operator sequences. (The in vivo data was from, essentially, fitting our master curve to expression measurements.) They find broad agreement between the in vitro and in vivo values. This reinforces the suspicion that the equilibrium ∆ε_R repressor binding energies do in fact represent real physical free energies. Again, a priori this did not have to be the case, even knowing that ρ ≈ 1.

If ∆F_R can be measured to sufficient precision, then deviations from ρ = 1 become a testable matter of experiment. In practice, it is probably unrealistic to measure repressor rates k_R^+ or k_R^-, or fold-changes in expression levels (and hence ∆ε_R), precisely enough to detect the expected tiny deviations from ρ = 1. We can estimate the requisite precision in ∆F_R to resolve a given ∆ρ by noting, since ρ ≈
1, that log(1 + ∆ρ) ≈ ∆ρ, so ∆(∆F_R) ≈ ∆ρ. Suppose we are lucky and ∆ρ is ∼ 0.
1, on the high end of our range estimated above. A determination of ∆ε_R/k_BT with an uncertainty of barely 0.1 was achieved in the excellent measurements of [32], so this requires a very difficult determination of ∆F_R for a very crude determination of ρ, which suggests, to put it lightly, that this is not a promising path to pursue experimentally. It is doubtful that inference of repressor kinetic rates would be any easier.

Moving forward, we have weak evidence supporting the interpretation of ∆F_R as a physically real free energy [55] and other work casting doubt [56]. How might we resolve the confusion? If there is no discriminatory power to test the theory and distinguish the various models with measurements of fold-changes in means, how do we probe the theory? Clearly to discriminate between the nonequilibrium models in Figure 1, we need to go beyond means to ask questions about kinetics, noise, and even full distributions of mRNA copy numbers over a population of cells. If the "one-curve-to-rule-them-all" is more than a mathematical tautology, then the free energy of repressor binding inferred from fold-change measurements should agree with repressor binding and unbinding rates. In other words, the equilibrium and nonequilibrium definitions of ∆F_R must agree, meaning

\Delta F_R = \beta\Delta\varepsilon_R - \log(R/N_{NS}) = -\log\left(k_R^+/k_R^-\right), (35)

must hold, where β∆ε_R is inferred from the master curve fit to fold-change measurements, and k_R^+ and k_R^- are inferred in some orthogonal manner. Single-molecule measurements such as from [56] have directly observed these rates, and in the remainder of this work we explore a complementary approach: inferring repressor binding and unbinding rates k_R^+ and k_R^- from single-molecule measurements of mRNA population distributions.

Beyond Means in Gene Expression
In this section, our objective is to explore the same models considered in the previous section, but now with reference to the question of how well they describe the distribution of gene expression levels, with special reference to the variance in these distributions. To that end, we repeat the same pattern as in the previous section by examining the models one by one. In particular we will focus on the Fano factor, defined as the variance/mean. This metric serves as a powerful diagnostic against the null model of a Poisson steady-state mRNA distribution, which has a Fano factor of exactly one.
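As a minimal illustration of the Fano factor as a diagnostic, the following sketch computes it for a truncated Poisson pmf; the helper name `fano`, the rate, and the truncation are our own arbitrary choices.

```python
import math

def fano(pmf):
    """Fano factor (variance/mean) of a distribution given as a
    dict mapping copy number -> probability."""
    mean = sum(m * p for m, p in pmf.items())
    var = sum((m - mean) ** 2 * p for m, p in pmf.items())
    return var / mean

# Null model: a Poisson distribution has Fano factor exactly 1,
# regardless of its rate parameter.
lam = 5.0
poisson = {m: lam**m * math.exp(-lam) / math.factorial(m) for m in range(100)}
print(fano(poisson))  # ~ 1.0
```

Deviations of measured single-cell distributions from this value of one are what the models below must account for.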
Before we can tackle simple repression, we need an adequate phenomenological model of constitutive expression. The literature abounds with options from which we can choose, and we show several potential kinetic models for constitutive promoters in Figure 2(A). Let us consider the suitability of each model for our purposes in turn.
The simplest model of constitutive expression that we can imagine is shown as model 1 in Figure 2(A) and assumes that transcripts are produced as a Poisson process from a single promoter state. This is the picture from Jones et al. [36] that was used to interpret a systematic study of gene expression noise over a series of promoters designed to have different strengths. This model insists that the "true" steady-state mRNA distribution is Poisson, implying the Fano factor ν must be 1. In [36], the authors carefully attribute measured deviations from ν = 1 to intensity variability in fluorescence measurements, gene copy number variation, and copy number fluctuations of the transcription machinery, e.g., RNAP itself. In this picture, the master equation makes no appearance, and all the corrections to Poisson behavior are derived as additive corrections to the Fano factor. For disproving the "universal noise curve" from So et al. [59], this picture was excellent. It is appealing in its simplicity, with only two parameters, the initiation rate r and degradation rate γ. Since γ is independently known from other experiments, and the mean mRNA copy number is r/γ, r is easily inferred from data. In other words, the model is not excessively complex for the data at hand. But for many interesting questions, for instance in the recent work [42], knowledge of means and variances alone is insufficient, and a description of the full distribution of molecular counts is necessary. For this we need a (slightly) more complex model than model 1 that would allow us to incorporate the non-Poissonian features of constitutive promoters directly into a master equation formulation.

Our second model of constitutive transcription posits an architecture in which the promoter is either empty or bound by RNAP [27], [58].
Here, as shown in model 2 of Figure 2(A), transcription initiation results in a state transition from the bound to the unbound state, reflecting the microscopic reality that an RNAP that has begun to elongate a transcript is no longer available at the start site to begin another. As shown in Appendix S1, the Fano factor in this model is given by

\nu = 1 - \frac{r k_P^+}{\left(k_P^+ + k_P^- + r\right)\left(\gamma + k_P^+ + k_P^- + r\right)}. (36)

The problem with this picture is that the Fano factor is always <
1. To make contact with the experimental reality of ν > 1, we clearly need to look to other models.
Figure 2. Comparison of different models for noise in the constitutive promoter. (A) The left column depicts various plausible models for the dynamics of constitutive promoters. In model (1), transcripts are produced in a Poisson process [36], [57]. Model (2) features explicit modeling of RNAP binding/unbinding kinetics [58]. Model (3) is a more detailed generalization of model (2), treating transcription initiation as a multi-step process proceeding through closed and open complexes [38]. Model (4) is somewhat analogous to (2), except with the precise nature of the active and inactive states left ambiguous [19], [23], [42]. Finally, model (5) can be viewed as a certain limit of model (4) in which transcripts are produced in bursts, and initiation of bursts is a Poisson process. The right column shows the Fano factor ν (variance/mean) for each model. Note especially the crucial diagnostic: (2) and (3) have ν strictly below 1, while only for (4) and (5) can ν exceed 1. Models with Fano factors ≤ 1 cannot capture the single-cell mRNA counts data in (B), obtained by single-molecule mRNA Fluorescence in situ Hybridization. The colorbar indicates the predicted binding affinity of RNAP to the promoter sequence as determined in [54]. Numbers serve for cross comparison with data presented in Figure 3.

Noise in the Three-State Promoter, Multistep Transcription Initiation and Escape

How might we remedy the deficits of model 2? It is known [39] that once RNAP initially binds the promoter region, a multitude of distinct steps occur sequentially before RNAP finally escapes into the elongation phase. Perhaps adding some of this mechanistic detail as shown in model 3 of Figure 2(A) might rescue the previous model. The next simplest refinement of that model could consider open complex formation and promoter escape as separate steps rather than as a single effective step. In other words, we construct model 3 by adding a single extra state to model 2, and we will label the two RNAP-bound states as the closed and open complexes, despite the true biochemical details certainly being more complex.
For example, earlier work extended this model by adding an additional repressor bound state and did not explicitly consider the limit with no repressor that we analyze here [38]. Again, our goal here is not a complete accounting of all the relevant biochemical detail; this is an exploratory search for the important features that a model needs to include to square with the known experimental reality of constitutive expression.

Unfortunately, as hinted at in earlier work [38], this model too has Fano factor ν <
1. We again leave the algebraic details for Appendix S1 and merely state the result that

\nu = 1 - \frac{r k_P^+ k_O}{Z}\,\frac{k_P^+ + k_P^- + k_O + r + \gamma}{Z + \gamma\left(k_P^+ + k_P^- + k_O + r\right) + \gamma^2}, (37)

where we defined Z = r(k_O + k_P^-) + k_P^+(k_O + r) for notational tidiness. This is necessarily less than 1 for arbitrary rate constants.

In fact, we suspect any model in which transcription proceeds through a multistep cycle must necessarily have ν <
1. The intuitive argument compares the waiting time distribution to traverse the cycle with the waiting time for a Poisson promoter (model 1) with the same mean time. The latter is simply an exponential distribution. The former is a convolution of multiple exponentials, and intuitively the waiting time distribution for a multistep process should be more peaked, with a smaller fractional width, than a single exponential with the same mean. A less disperse waiting time distribution means transcription initiations are more uniformly distributed in time relative to a Poisson process. Hence the distribution of mRNA over a population of cells should be less variable compared to Poisson, giving ν <
1. (In Appendix S1 we present a more precise version of the intuitive arguments in this paragraph.) Regardless of the merits of this model in describing the noise properties of constitutive transcription initiation, it ultimately fails the simplest quantitative feature of the data, namely that the Fano factor exceeds 1.

Inspired by [42], we next revisit an analog of model 2 in Figure 2(A), but as with the analogous models considered in Section 2, the interpretation of the two states is changed. Rather than explicitly viewing them as RNAP bound and unbound, we view them as "active" and "inactive," which are able and unable to initiate transcripts, respectively. We are noncommittal as to the microscopic details of these states.

One interpretation [47], [48] for the active and inactive states is that they represent the promoter's supercoiling state: transitions to the inactive state are caused by accumulation of positive supercoiling, which inhibits transcription, and transitions back to "active" are caused by gyrase or other topoisomerases relieving the supercoiling. This is an interesting possibility because it would mean the timescale for promoter state transitions is driven by topoisomerase kinetics, not by RNAP kinetics. From in vitro measurements, the former are suggested to be of order minutes [47]. Contrast this with model 2, where the state transitions are assumed to be governed by RNAP, which, assuming a copy number per cell of order 10^3, has a diffusion-limited association rate k_on of order 10^2 s^{-1} to a target promoter. Combined with known K_d's of order µM, this gives an RNAP dissociation rate k_off of order 10^2 s^{-1}.
As we will show below, however, there are some lingering puzzles with interpreting this supercoiling hypothesis, so we leave it as a speculation and refrain from assigning definite physical meaning to the two states in this model.

Intuitively one might expect that, since transcripts are produced as a Poisson process only when the promoter is in one of the two states in this model, transcription initiations should now be "bunched" in time, in contrast to the "anti-bunching" of models 2 and 3 above. One might further guess that this bunching would lead to super-Poissonian noise in the mRNA distribution over a population of cells. Indeed, as shown in Appendix S1, a calculation of the Fano factor produces

\nu = 1 + \frac{r k^-}{\left(k^+ + k^- + \gamma\right)\left(k^+ + k^-\right)}, (38)

which is strictly greater than 1, verifying the above intuition. Note we have dropped the P label on the promoter switching rates to emphasize that these very likely do not represent kinetics of RNAP itself. This calculation can also be sidestepped by noting that the model is mathematically equivalent to the simple repression model from [36], with states and rates relabeled and reinterpreted.

How does this model compare to model 1 above? In model 1, all non-Poisson features of the mRNA distribution were handled as extrinsic corrections. By contrast, here the 3-parameter model is used to fit the full mRNA distribution as measured in mRNA FISH experiments. In essence, all variability in the mRNA distribution is regarded as "intrinsic," arising either from stochastic initiation or from switching between the two coarse-grained promoter states.
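Eq. 38 can be checked numerically by solving a truncated master equation for this two-state model and computing the Fano factor directly; the rates and truncation below are illustrative assumptions of ours.

```python
import numpy as np

# Sketch: check nu = 1 + r*km / ((kp + km + gamma)(kp + km))  (Eq. 38)
# against a truncated master equation for the two-state model.
# kp: inactive -> active, km: active -> inactive; values are illustrative.
kp, km, r, gamma = 0.2, 1.0, 5.0, 1.0
M = 120  # mRNA copy-number truncation

n = 2 * (M + 1)  # state index s*(M+1)+m, with s = 0 active, s = 1 inactive
Q = np.zeros((n, n))
for s in (0, 1):
    for m in range(M + 1):
        i = s * (M + 1) + m
        if m > 0:
            Q[i, i - 1] += m * gamma        # degradation
        if s == 0:
            Q[i, (M + 1) + m] += km         # active -> inactive
            if m < M:
                Q[i, i + 1] += r            # transcription initiation
        else:
            Q[i, m] += kp                   # inactive -> active
np.fill_diagonal(Q, -Q.sum(axis=1))

# Steady state: pi Q = 0 with normalization.
A = np.vstack([Q.T, np.ones(n)])
pi = np.linalg.lstsq(A, np.r_[np.zeros(n), 1.0], rcond=None)[0]
p_m = pi[: M + 1] + pi[M + 1 :]             # marginal over promoter state
m_vals = np.arange(M + 1)
mean = np.dot(m_vals, p_m)
var = np.dot((m_vals - mean) ** 2, p_m)
nu_numeric = var / mean
nu_formula = 1.0 + r * km / ((kp + km + gamma) * (kp + km))
print(nu_numeric, nu_formula)   # both > 1 and in agreement
```

For these rates both values come out near 2.9, comfortably super-Poissonian, consistent with the "bunching" intuition above.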
The advantage of this approach is that it fits neatly into the master equation picture, and the parameters thus inferred can be used as input for more complicated models with regulation by transcription factors.

While this seems promising, there is a major drawback for our purposes which was already uncovered by the authors of [42]: the statistical inference problem is nonidentifiable, in the sense described in Section 4.3 of [52]. What this means is that it is impossible to infer the parameters r and k^- from the single-cell mRNA counts data of [36] (as shown in Fig. S2 of [42]). Rather, only the ratio r/k^- could be inferred. In that work, the problem was worked around with an informative prior on the ratio k^-/k^+. That approach is unlikely to work here since, recall, our entire goal in modeling constitutive expression is to use it as the basis for a yet more complicated model once we add repression. But adding more complexity to a model that is already poorly identified is a fool's errand, so we will explore one more potential model.

The final model we consider is inspired by the failure mode of model 4. The key observation above was that, as found in [42], only two parameters, k^+ and the ratio r/k^-, could be directly inferred from the single-cell mRNA data from [36]. So let us take this seriously and imagine a model where these are the only two model parameters. What would this model look like?

To develop some intuition, consider model 4 in the limit k^+ ≪ k^- ≲ r, which is roughly satisfied by the parameters inferred in [42]. In this limit, the system spends the majority of its time in the inactive state, occasionally becoming active and making a burst of transcripts. This should call to mind the well-known phenomenon of transcriptional bursting, as reported in, e.g., [47], [48], [60]. Let us make this correspondence more precise. The mean dwell time in the active state is 1/k^-. While in this state, transcripts are produced at a rate r per unit time.
So on average, r/k^- transcripts are produced before the system switches to the inactive state. Once in the inactive state, the system dwells there for an average time 1/k^+ before returning to the active state and repeating the process. r/k^- resembles an average burst size, and 1/k^+ resembles the time interval between burst events. More precisely, 1/k^+ is the mean time between the end of one burst and the start of the next, whereas 1/k^+ + 1/k^- would be the mean interval between the start of two successive burst events; but in the limit k^+ ≪ k^-, 1/k^+ + 1/k^- ≈ 1/k^+. Note that this limit ensures that the waiting time between bursts is approximately exponentially distributed, with mean set by the only timescale left in the problem, 1/k^+.

Let us now verify this intuition with a precise derivation to check that r/k^- is in fact the mean burst size and to obtain the full burst size distribution. Consider first a constant, known dwell time T in the active state. Transcripts are produced at a rate r per unit time, so the number of transcripts n produced during T
is Poisson distributed with mean rT, i.e.,

P(n \mid T) = \frac{(rT)^n}{n!} \exp(-rT). (39)

Since the dwell time T is unobservable, we actually want P(n), the burst size distribution with no conditioning on T. Basic rules of probability theory tell us we can write P(n) in terms of P(n | T) as

P(n) = \int_0^\infty P(n \mid T)\, P(T)\, dT. (40)

But we know the dwell time distribution P(T), which is exponentially distributed according to

P(T) = k^- \exp(-k^- T), (41)

so P(n) can be written as

P(n) = \frac{k^- r^n}{n!} \int_0^\infty T^n \exp\left[-(r + k^-) T\right] dT. (42)

A standard integral table shows \int_0^\infty x^n e^{-ax}\, dx = n!/a^{n+1}, so

P(n) = \frac{k^- r^n}{(k^- + r)^{n+1}} = \frac{k^-}{k^- + r} \left(\frac{r}{k^- + r}\right)^n = \frac{k^-}{k^- + r} \left(1 - \frac{k^-}{k^- + r}\right)^n, (43)

which is exactly the geometric distribution with standard parameter θ ≡ k^-/(k^- + r) and domain n ∈ {0, 1, 2, ...}. The mean of the geometric distribution, with this convention, is

\langle n \rangle = \frac{1 - \theta}{\theta} = \left(1 - \frac{k^-}{k^- + r}\right) \frac{k^- + r}{k^-} = \frac{r}{k^-}, (44)

exactly as we guessed intuitively above.

So in taking the limit r, k^- → ∞ with r/k^- ≡ b held fixed, we obtain a model which effectively has only a single promoter state, which initiates bursts at rate k^+ (transitions to the active state, in the model 4 picture). The master equation for mRNA copy number m as derived in Appendix S1 takes the form

\frac{d}{dt} p(m, t) = (m+1)\gamma\, p(m+1, t) - m\gamma\, p(m, t) + \sum_{m'=0}^{m-1} k_i\, p(m', t)\, \mathrm{Geom}(m - m'; b) - \sum_{m'=m+1}^{\infty} k_i\, p(m, t)\, \mathrm{Geom}(m' - m; b), (45)

where we use k_i to denote the burst initiation rate, and Geom(n; b) is the geometric distribution with mean b, i.e., \mathrm{Geom}(n; b) = \frac{1}{1+b}\left(\frac{b}{1+b}\right)^n (with domain over nonnegative integers as above). The first two terms are the usual mRNA degradation terms.
The third term enumerates all ways the system can produce a burst of transcripts and arrive at copy number m, given that it had copy number m' before the burst. The fourth term allows the system to start with copy number m, then produce a burst and end with copy number m'. In fact this last sum has trivial m' dependence and simply enforces normalization of the geometric distribution. Carrying it out (and absorbing the size-zero bursts, which leave the state unchanged, into the gain term), we have

\frac{d}{dt} p(m, t) = (m+1)\gamma\, p(m+1, t) - m\gamma\, p(m, t) + \sum_{m'=0}^{m} k_i\, p(m', t)\, \mathrm{Geom}(m - m'; b) - k_i\, p(m, t). (46)

We direct readers again to Appendix S2 for further details. This improves on model 4 in that the parameters are now easily inferred, as we will see later, and have clean interpretations. The non-Poissonian features are attributed to the empirically well-established phenomenological picture of bursty transcription.

The big approximation in going from model 4 to 5 is that a burst is produced instantaneously rather than over a finite time. If the true burst duration is not short compared to transcription factor kinetic timescales, this could be a problem, in that the mean burst size in the presence and absence of repressors could change, rendering parameter inferences from the constitutive case inappropriate. Let us make some simple estimates of this.

Consider the time delay between the first and final RNAPs in a burst initiating transcription (not the time to complete transcripts, which potentially could be much longer). If this timescale is short compared to the typical search timescale of transcription factors, then all is well. The estimates from deHaseth et al. [39] put RNAP's diffusion-limited on rate around a few × 10^{-2} nM^{-1} s^{-1} and polymerase loading as high as 1 s^{-1}. Then for reasonable burst sizes of ≲
10, it is reasonable to guess that bursts might finish initiating on a timescale of tens of seconds or less (with another 30-60 sec to finish elongation, but that does not matter here). A transcription factor with typical copy number of order 10 (or less) would have a diffusion-limited association rate of order (10 sec)^{-1} [56]. Higher copy number TFs tend to have many binding sites over the genome, which should serve to pull them out of circulation and keep their effective association rates from rising too large. Therefore, there is perhaps a timescale separation possible between transcription factor association rates and burst durations, but this assumption could very well break down, so we will have to keep it in mind when we infer repressor rates from the Jones et al. single-cell mRNA counts data later [36].

In reflecting on these 5 models, the reader may feel that exploring a multitude of potential models, just to return to a very minimal phenomenological model of bursty transcription, is highly pedantic. But the purpose of the exercise was to examine a host of models from the literature and understand why they are insufficient, one way or another, for our purposes. Along the way we have learned that the detailed kinetics of RNAP binding and initiating transcription are probably irrelevant for setting the population distribution of mRNA. The timescales are simply too fast, and as we will see later in Figures 3 and 4, the noise seems to be governed by slower timescales. Perhaps in hindsight this is not surprising: intuitively, the degradation rate γ sets the fundamental timescale for mRNA dynamics, and any other processes that substantially modulate the mRNA distribution should not differ from γ by orders of magnitude.
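Returning briefly to the burst-size derivation of Eqs. 39-44, the marginalization can be spot-checked by simulation: draw exponential dwell times, draw Poisson counts conditional on each, and compare to the geometric prediction. The sketch below uses arbitrary illustrative rates and a fixed random seed.

```python
import numpy as np

# Sketch: a Poisson(r*T) count with T ~ Exponential(mean 1/k_minus)
# should be geometric with mean r/k_minus (Eqs. 39-44).
# r and k_minus are illustrative values, not fitted rates.
rng = np.random.default_rng(0)
r, k_minus = 10.0, 5.0
T = rng.exponential(1.0 / k_minus, size=200_000)  # active-state dwell times
n = rng.poisson(r * T)                            # transcripts per burst

theta = k_minus / (k_minus + r)   # geometric parameter (Eq. 43)
print(n.mean(), r / k_minus)      # sample vs. predicted mean burst size
print((n == 0).mean(), theta)     # empirical vs. predicted P(n = 0)
```

With these rates the predicted mean burst size is r/k^- = 2 and P(n = 0) = θ = 1/3; the simulated values should agree to within sampling error.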
In this section of the paper, we continue our program of providing one complete description of the entire broad sweep of studies that have been made in the context of the repressor-operator model, dating all the way back to the original work of Jacob and Monod and including the visionary quantitative work of Müller-Hill and collaborators [30] and up to more recent studies [33]. In addition, the aim is to reconcile the equilibrium and non-equilibrium perspectives that have been brought to bear on this problem. From Section 2, this reconciliation depends on a key quantitative question as codified by Eq. 35: does the free energy of repressor binding, as described in the equilibrium models and indirectly inferred from gene expression measurements, agree with the corresponding values of repressor binding and unbinding rates in the non-equilibrium picture, measured or inferred more directly? In this section we tackle the statistical inference problem of inferring these repressor rates from single-cell mRNA counts data. But before we can turn to the full case of simple repression, we must choose an appropriate model of the constitutive promoter and infer the parameter values in that model. This is the problem we address first.
From consideration of Fano factors in the previous section, we suspect that model 5 in Figure 2(A), a one-state bursty model of constitutive promoters, achieves the right balance of complexity and simplicity, by allowing both a Fano factor ν >
1, but also by remedying, by design, the problems of parameter degeneracy that model 4 in Figure 2 suffered [42]. Does this stand up to closer scrutiny, namely, comparison to full mRNA distributions rather than simply their moments? We will test this thoroughly on single-cell mRNA counts for different unregulated promoters from Jones et al. [36].

It will be instructive, however, to first consider the Poisson promoter, model 1 in Figure 2. As we alluded to earlier, since the Poisson distribution has a Fano factor ν strictly equal to 1, and all of the observed data in Figure 2(B) has Fano factor ν >
1, we might already suspect that this model is incapable of fitting the data. We will verify that this is in fact the case. Using the same argument we can immediately rule out models 2 and 3 from Figure 2(A), since these models have Fano factors ν ≤ 1.

For the Poisson promoter, the master equation of interest is Eq. 10 with repressor set to zero, i.e.,

\frac{d}{dt} p_U(m, t) = r\, p_U(m-1, t) - r\, p_U(m, t) + (m+1)\gamma\, p_U(m+1, t) - m\gamma\, p_U(m, t), (47)

whose steady-state solution is given by a Poisson distribution with parameter λ ≡ r/γ [57]. The goal of our inference problem is then to find the probability distribution for the parameter value λ given the experimental data. By Bayes' theorem this can be written as

p(\lambda \mid D) = \frac{p(D \mid \lambda)\, p(\lambda)}{p(D)}, (48)

where D = {m_1, m_2, ..., m_N} are the single-cell mRNA experimental counts. As is standard, we will neglect the denominator p(D) on the right hand side, since it is independent of λ and serves only as a normalization factor.

The steady-state solution of the master equation defines the likelihood term for a single cell, p(m | λ). What this means is that for a given choice of parameter λ, under model 1 of Figure 2(A), we expect to observe m mRNAs in a single cell with probability

p(m \mid \lambda) = \frac{\lambda^m e^{-\lambda}}{m!}. (49)

Assuming each cell's mRNA count in our dataset is independent of the others, the likelihood of the full inference problem p(D | λ) is simply a product of the single-cell likelihoods given by Eq. 49 above, so

p(D \mid \lambda) = \prod_{k=1}^{N} \frac{\lambda^{m_k} e^{-\lambda}}{m_k!}. (50)

To proceed we need to specify a prior distribution p(λ). In this case we are extremely data-rich, as the dataset from Jones et al. [36] has of order 1000-3000 single-cell measurements for each promoter, so our choice of prior matters little here, as long as it is sufficiently broad. For details on the prior selection we refer the reader to Appendix S3. For our purpose here it suffices to specify that we use as prior a Gamma distribution.
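The resulting conjugate update (a Gamma(α, β) prior with a Poisson likelihood yields a Gamma(α + Σ_k m_k, β + N) posterior) is easy to sketch numerically. In the sketch below, synthetic Poisson counts stand in for the real single-cell data, and the hyperparameter values match the broad prior adopted in the text.

```python
import numpy as np

# Sketch of Gamma-Poisson conjugate updating: with prior
# lambda ~ Gamma(alpha, beta) (shape/rate convention) and N iid Poisson
# counts, the posterior is Gamma(alpha + sum(counts), beta + N).
# The counts are synthetic stand-ins for the single-cell mRNA data.
rng = np.random.default_rng(1)
counts = rng.poisson(18.7, size=2648)   # mimic 2648 lacUV5-like cells

alpha, beta = 1.0, 1.0 / 50.0           # broad Gamma prior
alpha_post = alpha + counts.sum()
beta_post = beta + len(counts)
post_mean = alpha_post / beta_post      # ~ sample mean of the counts
post_sd = np.sqrt(alpha_post) / beta_post
print(post_mean, post_sd)               # mean ~ 18.7, sd ~ 0.08
```

The posterior standard deviation of roughly 0.08 with these numbers illustrates the sub-percent precision discussed below; the data overwhelm the prior entirely.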
This particular choice of prior introduces two new parameters, α and β, which parametrize the Gamma distribution itself and which we use to encode the range of λ values we view as reasonable. Recall that λ is the mean steady-state mRNA count per cell, which a priori could plausibly be anywhere from 0 to a few hundred. Choosing α = 1 and β = 1/50 achieves this, since the Gamma distribution is strictly positive with mean α/β and standard deviation √α/β.

As detailed in Appendix S3, this particular choice of prior is known as the conjugate prior for a Poisson likelihood. Conjugate priors have the convenient properties that a closed form exists for the posterior distribution p(λ | D) - unusual in Bayesian inference problems - and that the closed-form posterior takes the same form as the prior. For our case of a Poisson likelihood with its Gamma conjugate prior, the posterior distribution is also a Gamma distribution [52]. Specifically, the two parameters α′ and β′ of this posterior distribution take the form α′ = α + N m̄ and β′ = β + N, where we defined the sample mean m̄ = (1/N) ∑_{k=1}^{N} m_k for notational convenience, and N is the number of cells in our dataset. Furthermore, given that N is O(10^3), so that α′ = α + N m̄ ≫ 1, this posterior is very well approximated by a Gaussian with mean m̄ and variance m̄/N, to marginal error. As an example with real numbers, for the lacUV5 promoter Jones et al. [36] measured 2648 cells with an average mRNA count per cell of m̄ ≈ 18.7. For this case our posterior distribution p(λ | D) would be approximately a Gaussian with mean µ = 18.7 and standard deviation σ ≈ 0.08. This suggests we have inferred our model's one parameter to a precision of order 1%.

We remind the reader that we began this section claiming that the Poisson model was "wrong" since it could not reproduce features of the data such as a Fano factor > 1. The fact that we obtain such a narrow posterior distribution for our parameter p(λ | D) does not equate to the model being adequate to describe the data. What this means is that, given the data D, only values in a narrow range are remotely plausible for the parameter λ; a narrow posterior distribution does not necessarily mean the model accurately depicts reality. As we will see later in Figure 3, after exploring the bursty promoter model, the correspondence between the Poisson model and the experimental data is indeed quite poor.

Let us now consider the problem of parameter inference for model five from Figure 1(C). As derived in Appendix S2, the steady-state mRNA distribution in this model is a negative binomial distribution, given by

p(m) = [Γ(m + k_i) / (Γ(m + 1) Γ(k_i))] (1/(1 + b))^{k_i} (b/(1 + b))^m,   (51)

where b is the mean burst size and k_i is the burst rate in units of the mRNA degradation rate γ. As sketched earlier, to think of the negative binomial distribution in terms of an intuitive "story," in the precise meaning of [61], we imagine the arrival of bursts as a Poisson process with rate k_i, where each burst has a geometrically-distributed size with mean size b.

As for the Poisson promoter model, this expression for the steady-state mRNA distribution is exactly the likelihood we want to use when stating Bayes' theorem. Again denoting the single-cell mRNA count data as D = {m_1, m_2, . . . , m_N}, here Bayes' theorem takes the form

p(k_i, b | D) ∝ p(D | k_i, b) p(k_i, b).   (52)

We already have our likelihood - the product of N negative binomials as in Eq. 51 - so we only need to choose priors on k_i and b. For the datasets from [36] that we are analyzing, as for the Poisson promoter model above, we are still data-rich, so the prior's influence remains weak, though not quite as weak as before since the dimensionality of our model has increased from one parameter to two. Details of the arguments behind our prior selection are left for Appendix S3. We state here that the natural scale on which to explore these parameters is logarithmic, as is commonly the case for parameters for which our prior knowledge, based on domain expertise, spans several orders of magnitude. We therefore chose log-normal distributions for both k_i and b. Details on the mean and variance of these distributions can be found in Appendix S3.

We carried out Markov chain Monte Carlo (MCMC) sampling on the posterior of this model, starting with the constitutive lacUV5 dataset from [36]. The resulting MCMC samples are shown in Figure 3(A). In contrast to the active/inactive constitutive model considered in [42] (nonequilibrium model 4 in Figure 2(A)), this model is well-identified, with both parameters determined to a fractional uncertainty of 5-10%.
The strong correlation reflects the fact that their product sets the mean of the mRNA distribution, which is tightly constrained by the data, while there is weak "sloppiness" [62] along the set of values with a similar product.

Having found the model's posterior to be well-identified, as with the Poisson promoter, the next step is to compare both models with experimental data. To do this for the case of the bursty promoter, for each of the parameter samples shown in Figure 3(A) we generated negative binomial-distributed mRNA counts. As MCMC samples parameter space in proportion to the posterior distribution, this set of random samples spans the range of possible values that we would expect given the correspondence between our theoretical model and the experimental data. A similar procedure can be applied to the Poisson promoter. To compare so many samples with the actual observed data, we can use empirical cumulative distribution functions (ECDFs) of the distribution quantiles. This representation is shown in Figure 3(B). In this example, the median for each possible mRNA count for the Poisson distribution is shown as a dark green line, while the lighter green band contains 95% of the randomly generated samples. This way of representing the fit of the model to the data gives us a sense of the range of data we might consider plausible, under the assumption that the model is true. For this case, as we expected given our premise of the Poisson promoter being wrong, it is quite obvious that the observed data, plotted in black, is not consistent with the Poisson promoter model. An equivalent plot for the bursty promoter model is shown in blue. Again the darker tone shows the median, while the lighter color encompasses 95% of the randomly generated samples.
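The posterior predictive construction just described can be sketched in a few lines. Here the parameter samples are drawn from a hypothetical Gaussian stand-in for the true MCMC posterior (the central values are illustrative, not the inferred ones):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical posterior samples for (k_i/gamma, b); in the actual analysis
# these would be the MCMC samples of Figure 3(A).
k_i = rng.normal(5.0, 0.25, size=500).clip(min=0.1)
b = rng.normal(3.5, 0.2, size=500).clip(min=0.1)

# One posterior predictive dataset per parameter sample; numpy's
# negative_binomial(n, p) with n = k_i and p = 1/(1+b) realizes Eq. 51.
n_cells = 1000
reps = np.stack([rng.negative_binomial(k, 1.0 / (1.0 + bb), size=n_cells)
                 for k, bb in zip(k_i, b)])

# ECDF of each replicate dataset, evaluated on a common grid of mRNA counts
m_grid = np.arange(60)
ecdfs = (reps[:, :, None] <= m_grid).mean(axis=1)

# Median and 95% predictive band, the analogue of Figure 3(B)
lo, med, hi = np.percentile(ecdfs, [2.5, 50.0, 97.5], axis=0)
```

Note that numpy's negative binomial has mean n(1−p)/p, which with p = 1/(1+b) equals k_i b, matching the mean of Eq. 51.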
Unlike the Poisson promoter model, the experimental ECDF closely tracks the posterior predictive ECDF, indicating that this model is actually able to generate the observed data and increasing our confidence that this model is sufficient to parametrize the physical reality of the system.

The commonly used promoter sequence lacUV5 is our primary target here, since it forms the core of all the simple repression constructs of [36] that we consider in Section 4.2. Nevertheless, we thought it wise to apply our bursty promoter model to the other 17 unregulated promoters available in the single-cell mRNA count dataset from [36] as a test that the model is capturing the essential phenomenology. If the model fit well to all the different promoters, this would increase our confidence that it would serve well as a foundation for inferring repressor kinetics later in Section 4.2. Conversely, were the model to fail on more than a couple of the other promoters, it would give us pause.

Figure 3(C) shows the results, plotting the posterior distributions from individually MCMC sampling all 18 constitutive promoter datasets from [36]. To aid visualization, rather than plotting samples for each promoter's posterior as in Figure 3(A), for each posterior we find and plot the curve that surrounds the 95% highest probability density region. What this means is that each contour encloses approximately 95% of the samples, and thus 95% of the probability mass, of its posterior distribution. Theory-experiment comparisons, shown in Figure S5 in Appendix S3, display a similar level of agreement between data and predictive samples as for the bursty model with lacUV5 in Figure 3(B).

One interesting feature from Figure 3(C) is that the burst rate varies far more widely than the burst size, which is confined to a narrow range (and, with the exception of promoter 6, spans just 3- to 5-fold). This suggests that k_i, not b, is the key dynamic variable that promoter sequence tunes.
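The claim that k_i rescales the mean while b alone sets the noise can be read off directly from the negative binomial of Eq. 51. A quick check using scipy's parameterization (n = k_i, p = 1/(1 + b)), with illustrative parameter values:

```python
from scipy.stats import nbinom

def nb_moments(k_i, b):
    """Mean and Fano factor of the negative binomial of Eq. 51, with burst
    rate k_i (in units of gamma) and mean burst size b."""
    dist = nbinom(k_i, 1.0 / (1.0 + b))
    mean = dist.mean()
    return mean, dist.var() / mean

# Sweeping the burst rate at fixed burst size rescales the mean = k_i * b
# but leaves the Fano factor pinned at 1 + b, consistent with promoter
# sequence tuning k_i rather than b.
for k_i in (0.5, 5.0, 50.0):
    mean, fano = nb_moments(k_i, b=3.0)
```

For example, k_i = 5 and b = 3 give mean 15 and Fano factor 4, and changing k_i moves only the mean.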
It is interesting to connect these inferences on k_i and b to the work of [54], where these same 18 promoters were considered through the lens of the three-state equilibrium model (model 2 in Figure 1(B)) and binding energies ∆ε_P were predicted from an energy matrix model derived from [53]. As previously discussed, the thermodynamic models of gene regulation can only make statements about the mean gene expression. This implies that we can draw the connection between both frameworks by equating the mean mRNA ⟨m⟩. This results in

⟨m⟩ = k_i b / γ = (r/γ) [(P/N_NS) exp(−β∆ε_P)] / [1 + (P/N_NS) exp(−β∆ε_P)].   (53)

Figure 3. Constitutive promoter posterior inference and model comparison. (A) The joint posterior density of model 5, the bursty promoter with negative binomially-distributed steady state, is plotted with MCMC samples. 1D marginal probability densities are plotted as flanking histograms. The model was fit on lacUV5 data from [36]. (B) The empirical cumulative distribution function (ECDF) of the observed population distribution of mRNA transcripts under the control of a constitutive lacUV5 promoter is shown in black. The median posterior predictive ECDFs for models (1), Poisson, and (5), negative binomial, are plotted in dark green and dark blue, respectively. Lighter green and blue regions enclose 95% of all posterior predictive samples from their respective models. Model (1) is in obvious contradiction with the data while model (5) is not.
Single-cell mRNA count data are again from [36]. (C) Joint posterior distributions for burst rate k_i and mean burst size b for 18 unregulated promoters from [36]. Each contour indicates the 95% highest posterior probability density region for a particular promoter. Note that the vertical axis is shared with (D). (D) Plots of the burst rate k_i vs. the binding energy for each promoter as predicted in [54]. The dotted line shows the predicted slope according to Eq. 55, described in the text. Each individual promoter is labeled with a unique number in both (C) and (D) for cross comparison and for comparison with Figure 2(B).
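A quick numerical check of Eq. 53 against the weak promoter approximation invoked in the following text; all parameter values here (r/γ, P, N_NS, and the energy range) are illustrative assumptions, not fits:

```python
import numpy as np

# Illustrative thermodynamic-model values (assumptions, not fits): r/gamma
# sets the expression scale, N_NS is roughly the E. coli genome length in bp,
# and P is a typical RNAP copy number.
r_over_gamma = 20.0
P, N_NS = 1.0e3, 4.6e6
beta_deps = np.linspace(-6.0, -3.0, 7)   # promoter binding energies (k_BT)

w = (P / N_NS) * np.exp(-beta_deps)      # RNAP Boltzmann weight
m_full = r_over_gamma * w / (1.0 + w)    # full mean expression, Eq. 53
m_weak = r_over_gamma * w                # weak promoter limit
rel_err = (m_weak - m_full) / m_full     # algebraically equals w itself
```

The relative error of the linearized form is exactly the Boltzmann weight w, so the approximation is controlled whenever (P/N_NS) exp(−β∆ε_P) ≪ 1.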
By taking the weak promoter approximation for the equilibrium model ((P/N_NS) exp(−β∆ε_P) ≪ 1), this results in

⟨m⟩ = k_i b / γ = (r/γ) (P/N_NS) exp(−β∆ε_P),   (54)

valid for all the binding energies considered here.

Given this result, how are the two coarse-grainings related? A quick estimate can shed some light. Consider for instance the lacUV5 promoter, which we see from Figure 3(A) has k_i/γ ∼ b ∼ a few, from Figure 3(B) has ⟨m⟩ ∼ 20, and from [54] has β∆ε_P ∼ −6.5. Further, we generally assume P/N_NS ∼ 10^{−3}, since N_NS ≈ 4.6 × 10^6 and P ∼ 10^3. After some guess-and-check with these values, one finds the only association that makes dimensional sense and produces the correct order of magnitude for the known parameters is to take

k_i/γ = (P/N_NS) exp(−β∆ε_P)   (55)

and

b = r/γ.   (56)

Figure 3(D) shows that this linear scaling between ln k_i and −β∆ε_P is approximately true for all 18 constitutive promoters considered. The plotted line is simply Eq. 55 and assumes a fixed value of P for all promoters.

What mechanism might underlie transcriptional bursting in the first place? One proposal, put forward by Chong et al. [47], is that the buildup and gyrase-mediated release of positive supercoiling drives bursting, acting over ∼
100 kb domains of DNA. This suggests that all genes on a given domain should burst in synchrony, and that the difference between highly and lowly expressed genes is the size of transcriptional bursts, not the time between bursts. But here, all burst sizes we infer in Figure 3(C) are comparable, while burst rates vary wildly. It is not immediately clear how to square this circle. Furthermore, Figure 7E in [47] reports values of the quantity they label β/α and we label k_+/k_− in model 4 from Figure 2. In contrast to the findings of [42], Chong et al. do not find k_+/k_− ≪ 1. Is the galK chromosomal locus used for the reporter constructs in [42] and [36] merely an outlier, or is there a deeper puzzle here waiting to be resolved? Without more apples-to-apples data we can only speculate, and we leave it as an intriguing open question for the field.

Despite such puzzles, our goal here is not to unravel the mysterious origins of burstiness in transcription. Our remaining task in this work is a determination of the physical reality of equilibrium binding energies in Figure 1, as codified by the equilibrium-nonequilibrium equivalence of Eq. 35. For our phenomenological needs here, model 5 in Figure 2 is more than adequate: the posterior distributions in Figure 3(C) are cleanly identifiable and the predictive checks in Figure S5 indicate no discrepancies between the model and the single-molecule mRNA count data of [36]. Of the models we have considered it is unique in satisfying both these requirements. So we will happily use it as a foundation to build upon in the next section when we add regulation.

4.2 Transcription factor kinetics can be inferred from single-cell mRNA distribution measurements

Now that we have a satisfactory model in hand for constitutive promoters, we would like to return to the main thread: can we reconcile the equilibrium and nonequilibrium models by putting to the test Eq.
35, the correspondence between indirectly inferred equilibrium binding energies and nonequilibrium kinetic rates? To make this comparison, is it possible to infer repressor binding and unbinding rates from mRNA distributions over a population of cells as measured by single-molecule fluorescence in situ
hybridization in [36]? If so, how do these inferred rates compare to direct single-molecule measurements such as from [56], and to binding energies such as from [33] and [32], which were inferred under the assumptions of the equilibrium models in Figure 1(B)? And can this comparison shed light on the unreasonable effectiveness of the equilibrium models, for instance, in their application in [35], [63]?

As we found in Section 3, for our purposes the "right" model of a constitutive promoter is the bursty picture, model 5 in Figure 2(A). Therefore our starting point here is the analogous model with repressor added, model 5 in Figure 1(C). For a given repressor binding site and copy number, this model has four rate parameters to be inferred: the repressor binding and unbinding rates k^+_R and k^−_R, the initiation rate of bursts, k_i, and the mean burst size b (we nondimensionalize all of these by the mRNA degradation rate γ).

Before showing the mathematical formulation of our statistical inference model, we would like to sketch its intuitive structure. The dataset from [36] we consider consists of single-cell mRNA count data for nine different conditions, spanning several combinations of three unique repressor binding sites and four unique repressor copy numbers. We assume that the values of k_i and b are known, since we have already cleanly inferred them from constitutive promoter data, and further we assume that these values are the same across datasets with different repressor binding sites and copy numbers. In other words, we assume that regulation by the transcription factor affects neither the mean burst size nor the burst initiation rate; regulation occurs only because the promoter is taken away from the transcriptionally active state while bound by repressor. We assume that there is one unbinding rate parameter for each repressor binding site, and likewise one binding rate for each unique repressor copy number.
This makes our model seven-dimensional, or nine if one counts k_i and b as well. Note that we use only a subset of the datasets from Jones et al. [36], as discussed further in Appendix S3.

Formally now, denote the set of seven repressor rates to be inferred as

k⃗ = {k^−_Oid, k^−_O1, k^−_O2, k^+_0.5, k^+_1, k^+_2, k^+_10},   (57)

where subscripts on dissociation rates k^− indicate the different repressor binding sites, and subscripts on association rates k^+ indicate the concentration (in ng/mL) of the small molecule that controlled the expression of the LacI repressor (see Appendix S3). This is because for this particular dataset the repressor copy numbers were not measured directly, but it is safe to assume that a given concentration of the inducer resulted in a specific mean repressor copy number [63]. Also note that the authors of [36] report estimates of LacI copy number per cell rather than direct measurements. However, these estimates were made assuming the validity of the equilibrium models in Figure 1, and since testing these models is our present goal, it would be circular logic if we were to make the same assumption. Therefore we will make no assumptions about the LacI copy number for a given inducer concentration.

Having stated the problem, Bayes' theorem reads

p(k⃗, k_i, b | D) ∝ p(D | k⃗, k_i, b) p(k⃗, k_i, b),   (58)

where D is again the set of all N observed single-cell mRNA counts across the various conditions.
We assume that individual single-cell measurements are independent, so that the likelihood factorizes as

p(D | k⃗, k_i, b) = ∏_{j=1}^{N} p(m_j | k⃗, k_i, b) = ∏_{j=1}^{N} p(m_j | k^+_j, k^−_j, k_i, b).   (59)

Figure 4. Simple repression parameter inference and comparison. (A) Contours which enclose 50% and 95% of the posterior probability mass are shown for each of several 2D slices of the 9D posterior distribution. The model assumes one unbinding rate for each operator (Oid, O1, O2) and one binding rate for each aTc induction concentration (corresponding to an unknown mean repressor copy number). (B, upper) Ratios of our inferred unbinding rates are compared with operator binding energy differences measured by Garcia and Phillips [33] (triangles) and Razo-Mejia et al. [32] (circles). Blue glyphs compare O2-O1, while orange compare O1-Oid. Points with perfect agreement would lie on the dotted line. (B, lower) Unbinding rates for O1 (cyan) and Oid (red) inferred in this work are compared with single-molecule measurements from Hammar et al. [56]. We plot the comparison assuming illustrative mRNA lifetimes of γ^{−1} = 3 min (triangles) or γ^{−1} = 5 min (squares). The dotted line is as in the upper panel. (C) Theory-experiment comparisons are shown for each of the datasets used in the inference of the model in (A). Observed single-molecule mRNA count data from [36] are plotted as black lines. The median of the randomly generated samples for each condition is plotted as a dark colored line. Lighter colored bands enclose 95% of all samples for a given operator/repressor copy number pair. The unregulated promoter, lacUV5, is shown with each as a reference.
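The bookkeeping behind Eqs. 57-59, with shared rates indexed by operator and inducer concentration, can be sketched as follows. The per-cell pmf here is a hypothetical stand-in (the negative binomial limit with burst rate scaled by the fold-change, as discussed in the text), not the full regulated distribution, and all rate values are invented for illustration:

```python
import numpy as np
from scipy.stats import nbinom

# Hypothetical rate values (nondimensionalized by gamma), illustrative only.
k_off = {"Oid": 0.2, "O1": 0.6, "O2": 3.0}          # one per operator
k_on = {0.5: 0.8, 1.0: 2.0, 2.0: 4.0, 10.0: 9.0}    # one per aTc conc. (ng/mL)

def log_likelihood(data_by_condition, k_i, b):
    """Factorized log likelihood in the spirit of Eq. 59: each condition
    (operator, aTc concentration) points into the shared rate vector of
    Eq. 57. Stand-in per-cell pmf: negative binomial with burst rate
    scaled by the fold-change (1 + k+/k-)^-1."""
    total = 0.0
    for (op, aTc), counts in data_by_condition.items():
        k_eff = k_i / (1.0 + k_on[aTc] / k_off[op])
        total += nbinom.logpmf(counts, k_eff, 1.0 / (1.0 + b)).sum()
    return total

# Toy data: a few cells each for two of the nine conditions
data = {("O1", 1.0): np.array([0, 2, 5]), ("Oid", 10.0): np.array([0, 0, 1])}
ll = log_likelihood(data, k_i=5.0, b=3.5)
```

The point of the sketch is the sharing pattern: every condition reuses one of three unbinding rates and one of four binding rates, which is what makes the seven rates jointly identifiable.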
Here k^±_j denote the appropriate binding and unbinding rates out of k⃗ for the j-th measured cell. The probability p(m | k^+_j, k^−_j, k_i, b) appearing in the last expression is exactly Eq. S184, the steady-state distribution for our bursty model with repression derived in Section S2, which for completeness we reproduce here as

p(m | k^+_R, k^−_R, k_i, b) = [Γ(α + m) Γ(β + m) Γ(k^+_R + k^−_R)] / [Γ(α) Γ(β) Γ(k^+_R + k^−_R + m)] (b^m / m!) × F(α + m, β + m, k^+_R + k^−_R + m; −b),   (60)

where F is the confluent hypergeometric function of the second kind and α and β, defined for notational convenience, are

α = (1/2) [k_i + k^−_R + k^+_R + √((k_i + k^−_R + k^+_R)^2 − 4 k_i k^−_R)],
β = (1/2) [k_i + k^−_R + k^+_R − √((k_i + k^−_R + k^+_R)^2 − 4 k_i k^−_R)].   (61)

This likelihood is rather inscrutable. We did not find any of the known analytical approximations for F terribly useful in gaining intuition, so we instead resorted to numerics. One insight we found was that for very strong or very weak repression, the distribution in Eq. 60 is well approximated by a negative binomial with burst size b and burst rate k_i equal to their constitutive lacUV5 values, except with k_i multiplied by the fold-change (1 + k^+_R/k^−_R)^{−1}. In other words, once again only the ratio k^+_R/k^−_R was detectable. But for intermediate repression, the distribution was visibly broadened, with Fano factor greater than 1 + b, the value for the corresponding constitutive case. This indicates that the repressor rates had left an imprint on the distribution, and, perhaps intuitively, this intermediate regime occurs for values of k^±_R comparable to the burst rate k_i. Put another way, if the repressor rates are much faster or much slower than k_i, then there is a timescale separation and effectively only one timescale remains, k_i (1 + k^+_R/k^−_R)^{−1}.
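The statement that the mean is governed by k_i (1 + k^+_R/k^−_R)^{−1} can be checked by stochastic simulation; a minimal Gillespie sketch of the regulated bursty promoter, with illustrative (not inferred) rates:

```python
import numpy as np

def mean_mrna_ssa(k_i, b, k_on, k_off, gamma=1.0, t_end=5000.0, seed=0):
    """Gillespie simulation: geometric bursts (mean size b) fire at rate k_i
    only while the repressor is unbound; mRNA degrades at rate gamma; the
    promoter toggles between repressor-bound and free at rates k_on / k_off.
    Returns the time-averaged mRNA count."""
    rng = np.random.default_rng(seed)
    t, m, bound, acc = 0.0, 0, False, 0.0
    while t < t_end:
        rates = (0.0 if bound else k_i,      # burst initiation
                 gamma * m,                  # mRNA degradation
                 k_off if bound else k_on)   # repressor unbinding / binding
        total = rates[0] + rates[1] + rates[2]
        dt = rng.exponential(1.0 / total)
        acc += m * min(dt, t_end - t)
        t += dt
        u = rng.uniform(0.0, total)
        if u < rates[0]:
            # 0-based geometric burst size with mean b
            m += rng.geometric(1.0 / (1.0 + b)) - 1
        elif u < rates[0] + rates[1]:
            m -= 1
        else:
            bound = not bound
    return acc / t_end

sim = mean_mrna_ssa(k_i=5.0, b=3.0, k_on=1.0, k_off=1.0)
pred = 5.0 * 3.0 / (1.0 + 1.0 / 1.0)   # k_i * b * (1 + k+/k-)^-1 = 7.5
```

Because promoter switching is independent of the mRNA count, the time-averaged mean approaches k_i b (1 + k_on/k_off)^{−1} exactly, whether switching is fast or slow; it is the higher moments that distinguish the regimes.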
Only when all three rates in the problem are comparable does the mRNA distribution retain detectable information about them.

Next we specify priors. As for the constitutive model, weakly informative log-normal priors are a natural choice for all our rates. We found that if the priors were too weak, our MCMC sampler would often become stuck in regions of parameter space with very low probability density, unable to move. We struck a balance in choosing our prior widths between helping the sampler run while simultaneously verifying that the marginal posteriors for each parameter were not artificially constrained or distorted by the presence of the prior. All details of our prior distributions are listed in Appendix S3.

We ran MCMC sampling on the full nine-dimensional posterior specified by this model. To attempt to visualize this object, in Figure 4(A) we plot several two-dimensional slices as contour plots, analogous to Figure 3(C). Each of these nine slices corresponds to the (k^+_R, k^−_R) pair of rates for one of the conditions from the dataset used to fit the model, and gives a sense of the uncertainty and correlations in the posterior. We note that the 95% uncertainties of all the rates span a fairly narrow range, with the exception of k^+_{0.5}, the association rate for the lowest repressor copy number, which is somewhat larger.

Our primary goal in this work is to reconcile the kinetic and equilibrium pictures of simple repression. Towards this end we would like to compare the repressor kinetic rates we have inferred with the repressor binding energies inferred through multiple methods in [33] and [32]. If the agreement is close, then it suggests that the equilibrium models are not wrong and that the repressor binding energies they contain correspond to physically real free energies, not mere fit parameters.

Figure 4(B) shows both comparisons, with the top panel comparing to equilibrium binding energies and the bottom panel comparing to single-molecule measurements.
First consider the top panel and its comparison between repressor kinetic rates and binding energies. As described in Section 2, if the equilibrium binding energies from [33] and [32] indeed are the physically real binding energies we believe them to be, then they should be related to the repressor kinetic rates via Eq. 35, which we restate here:

∆F_R = β∆ε_R − log(R/N_NS) = −log(k^+_R/k^−_R).   (62)

Assuming mass action kinetics implies that k^+_R is proportional to the repressor copy number R; more precisely, it can be thought of as the repressor copy number times some intrinsic per-molecule association rate. But since R is not directly known for our data from [36], we cannot use this equation directly. Instead we can consider two different repressor binding sites and compute the difference in binding energy between them, since this difference depends only on the unbinding rates and not on the binding rates. This can be seen by evaluating Eq. 62 for two different repressor binding sites, labeled (1) and (2), but with the same repressor copy number R, and taking the difference to find

∆F^(1)_R − ∆F^(2)_R = β∆ε_1 − β∆ε_2 = −log(k^+_R/k^−_1) + log(k^+_R/k^−_2),   (63)

or simply

β∆ε_1 − β∆ε_2 = log(k^−_1/k^−_2).   (64)

The left- and right-hand sides of this equation are exactly the horizontal and vertical axes of the top panel of Figure 4(B). Since we inferred rates for three repressor binding sites (O1, O2, and Oid), there are only two independent differences that can be constructed, and we arbitrarily chose to plot O2-O1 and O1-Oid in Figure 4(B). Numerically, we compute values of k^−_O1/k^−_Oid and k^−_O2/k^−_O1 directly from our full posterior samples, which conveniently provides uncertainties as well, as detailed in Appendix S3. We then compare these log ratios of rates to the binding energy differences ∆ε_O1 − ∆ε_Oid and ∆ε_O2 − ∆ε_O1 as computed from the values from both [33] and [32]. Three of the four values lie close to the diagonal representing perfect agreement, with discrepancies comparable to the few tenths of k_BT of variability between the independent determinations of the same quantities in [33] and [32]. The only outlier involves the Oid measurements from [32], and as the authors of [32] note, this is a difficult measurement of low fluorescence signal against high background, since Oid represses so strongly. We are therefore inclined to regard the failure of this point to fall near the diagonal as a testament to the difficulty of the measurement and not as a failure of our theory.

On the whole, then, we regard this as striking confirmation of the validity of the equilibrium models. Their lynchpin parameter is a phenomenological free energy of repressor binding that has previously only been inferred indirectly. Our result shows that the microscopic interpretation of this free energy, as the log of a ratio of transition rates, does indeed hold true to within the inherent uncertainties that remain in the entire theory-experiment dialogue.

In the previous section we established the equivalence between the equilibrium models' binding energies and the repressor kinetics we infer from mRNA population distributions. But one might worry that the repressor rates we infer from mRNA distributions are themselves merely fitting parameters and that they do not actually correspond to the binding and unbinding rates of the repressor in vivo. To verify that this is not the case, we next compare our kinetic rates with a different measurement of the same rates using a radically different method: single-molecule measurements as performed in Hammar et al. [56]. This comparison is plotted in the lower panel of Figure 4(B).

Since we do not have access to repressor copy number for either the single-cell mRNA data from [36] or the single-molecule data from [56], we cannot make an apples-to-apples comparison of association rates k^+_R. Further, while Hammar et al.
directly measure the dissociation rates k^−_R, our inference procedure returns k^−_R/γ, i.e., the repressor dissociation rate nondimensionalized by the mRNA degradation rate γ. So to make the comparison, we must make an assumption for the value of γ, since it was not directly measured. For most mRNAs in E. coli, quoted values for the typical mRNA lifetime γ^{−1} range from about 2.5 min [64] to 8 min. We chose γ^{−1} = 3 min and γ^{−1} = 5 min as representative values and plot a comparison of k^−_O1 and k^−_Oid from our inference with the corresponding values reported in [56] for both these choices of γ.

The degree of quantitative agreement in the lower panel of Figure 4(B) clearly depends on the precise choice of γ. Nevertheless we find this comparison very satisfying: two wildly different approaches to a measurement of the same quantity yield broadly compatible results. We emphasize the agreement between our rates and the rates reported in [56] for any reasonable γ: values differ by at most a factor of 2 and possibly agree to within our uncertainties of 10-20%. From this we feel confident asserting that the parameters we have inferred from Jones et al.'s single-cell mRNA count data do in fact correspond to repressor binding and unbinding rates, and therefore our conclusions on the agreement of these rates with binding energies from [33] and [32] are valid.

In Figure 3(B) we saw that the simple Poisson model of a constitutive promoter, despite having a well-behaved posterior, was clearly insufficient to describe the data. It behooves us to carry out a similar check for our model of simple repression, codified by Eq.
60 for the steady-state mRNA copy number distribution. As derived in Sections 2 and 3, we have compelling theoretical reasons to believe it is a good model, but if it nevertheless turned out to be badly contradicted by the data, we should like to know.

The details are deferred to Appendix S3, and here we only attempt to summarize the intuitive ideas, as detailed at greater length by Jaynes [65] as well as Gelman and coauthors [52], [66]. From our samples of the posterior distribution, plotted in Figure 4(A), we generate many replicate datasets using a random number generator. In Figure 4(C), we plot empirical cumulative distribution functions of the middle 95% quantiles of these replicate data with the actual experimental data from Jones et al. [36] overlaid, covering all ten experimental conditions spanning repressor binding sites and copy numbers (as well as the constitutive baseline UV5).

The purpose of Figure 4(C) is simply a graphical, qualitative assessment of the model: do the experimental data systematically disagree with the simulated data, which would suggest that our model is missing important features? A further question is not just whether there is a detectable difference between simulated and experimental data, but whether this difference is likely to materially affect the conclusions we draw from the posterior in Figure 4(A). More rigorous and quantitative statistical tests are possible [52], but their quantitativeness does not necessarily make them more useful. As stated in [66], we often find this graphical comparison more enlightening because it better engages our intuition for the model, not merely telling us whether the model is wrong but suggesting how the model may be incomplete.

Our broad-brush takeaway from Figure 4(C) is one of overall good agreement. There are some oddities, in particular the long tails in the data for Oid, 1 ng/mL, and O2, 0.5 ng/mL. The latter is especially odd since it extends beyond the tail of the unregulated UV5 distribution.
This is a relatively small number of cells, however, so whether this is a peculiarity of the experimental data, a statistical fluke of small numbers, or a real biological effect is unclear. It is conceivable that there are some very slow timescale switching dynamics that could cause this bimodality, although it is unclear why they would only appear for specific repressor copy numbers. There is also a small offset between experiment and simulation for O2 at the higher repressor copy numbers, especially at 2 and 10 ng/mL. From the estimate of repressor copy numbers in [36], it is possible that the repressor copy numbers here are becoming large enough to partially invalidate our assumption of a separation of timescales between burst duration and repressor association rate. Another possibility is that the very large number of zero mRNA counts for Oid, 2 ng/mL is skewing its partner datasets through the shared association rate. None of these fairly minor quibbles cause us to seriously doubt the overall correctness of our model, which further validates its use to compare the equilibrium models' binding energies to the nonequilibrium models' repressor kinetics, as we originally set out to do.

Discussion and future work
The study of gene expression is one of the dominant themes of modern biology, made all the more urgent by the dizzying pace at which genomes are being sequenced. But there is a troubling Achilles heel buried in all of that genomic data: our inability to find and interpret regulatory sequence. In many cases this is not possible even qualitatively, let alone at the level of a quantitative, predictive dissection of the regulatory parts of genomes. Other recent work has tackled the challenge of finding and annotating the regulatory part of genomes [5], [28]. Once we have determined the architecture of the regulatory part of the genome, we are then faced with the next class of questions, which are sharpened by formulating them in mathematical terms, namely: what are the input-output properties of these regulatory circuits, and what knobs control them?

The present work has tackled that question in the context of the first regulatory architecture hypothesized in the molecular biology era, namely, the repressor-operator model of Jacob and Monod [1]. Regulation in that architecture is the result of a competition between a repressor, which inhibits transcription, and RNA polymerase, which undertakes it. Through the labors of generations of geneticists, molecular biologists and biochemists, an overwhelming amount of information and insight has been garnered into this simple regulatory motif, licensing it as what one might call the "hydrogen atom" of regulatory biology. It is from that perspective that the present paper explores the extent to which some of the different models that have been articulated to describe that motif allow us to understand the average level of gene expression found in a population of cells, the intrinsic cell-to-cell variability, and the full gene expression distribution found in such a population, as would be reported in a single-molecule mRNA fluorescence in situ
hybridization experiment, for example.

Our key insights can be summarized as follows. First, as shown in Figure 1, the mean expression in the simple repression architecture is captured by a master curve in which the action of the repressor and the details of the RNAP interaction with the promoter appear separately and additively in an effective free energy. Interestingly, as has been shown elsewhere in the context of the Monod-Wyman-Changeux model, these kinds of coarse-graining results are an exact mathematical consequence and do not constitute hopeful approximations or biological naivete [32], [35]. To further dissect the relative merits of the different models, we must appeal to higher moments of the gene expression probability distribution. To that end, our second set of insights focuses on gene expression noise, where it is seen that a treatment of the constitutive promoter already reveals that some models have Fano factors (variance/mean) less than one, at odds with any and all experimental data that we are aware of [36], [59]. This theoretical result allows us to directly discard a subset of the models (models 1-3 in Figure 2(A)), since they cannot be reconciled with experimental observations. The two remaining models (models 4 and 5 in Figure 2) appear to contain enough microscopic realism to be able to reproduce the data. A previous exploration of model 4 demonstrated the "sloppy" [62] nature of the model, in which data on single-cell mRNA counts alone cannot constrain the values of all parameters simultaneously [42]. Here we demonstrate that the proposed one-state bursty promoter model (model 5 in Figure 2(A)) emerges as a limit of the commonly used two-state promoter model [19], [23], [36], [57], [59]. We put to the test the idea that this level of coarse-graining is rich enough to reproduce previous experimental observations.
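This Fano factor dichotomy is easy to verify numerically. The sketch below (in Python, with illustrative parameter values of our own choosing, not fitted quantities) samples the steady-state mRNA distributions of a Poisson promoter, whose Fano factor is exactly one, and of the one-state bursty promoter, whose geometrically distributed bursts of mean size b yield a negative binomial distribution with Fano factor 1 + b:

```python
import numpy as np

rng = np.random.default_rng(0)

def fano(samples):
    """Fano factor: variance over mean."""
    return samples.var() / samples.mean()

# Poisson promoter (model 1): steady-state mRNA ~ Poisson(r/gamma).
poisson_mrna = rng.poisson(lam=10.0, size=200_000)

# One-state bursty promoter (model 5): steady-state mRNA is negative
# binomial, with burst rate k_i/gamma and mean burst size b (both values
# below are illustrative).
burst_rate, b = 5.0, 3.0
bursty_mrna = rng.negative_binomial(n=burst_rate, p=1.0 / (1.0 + b),
                                    size=200_000)

print(f"Poisson Fano factor: {fano(poisson_mrna):.2f}")  # close to 1
print(f"bursty  Fano factor: {fano(bursty_mrna):.2f}")   # close to 1 + b = 4
```

Any measured Fano factor above one therefore immediately rules out the Poisson-like models, which is precisely the argument used here to discard models 1-3.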
In particular, we perform Bayesian inference to determine the two parameters describing the full steady-state mRNA distribution, finding that the model provides a quantitative description of a plethora of promoter sequences with different mean levels of expression and noise.

With the results for the constitutive promoter in hand, we then fix the parameters associated with this class of promoters and use them as input for evaluating the noise in gene expression for the simple repression motif itself. This allows us to provide a single overarching analysis of both the constitutive and simple repression architectures using one simple model and a corresponding set of self-consistent parameters, demonstrating not only a predictive framework, but also reconciling the equilibrium and non-equilibrium views of the same simple repression constructs. More specifically, we obtained values for the transcription factor association and dissociation rates by performing Bayesian inference on the full mRNA distribution for data obtained from simple-repression promoters with varying numbers of transcription factors per cell and varying affinity of those transcription factors for the binding site. The free energy value obtained from these kinetic rates, computed as the log ratio of the rates, agrees with previous inferences performed only from mean gene expression measurements, which assumed an equilibrium rather than a kinetic framework [32], [33].

It is interesting to speculate what microscopic details are being coarse-grained by our burst rate and burst size in Figure 2, model 5. Chromosomal locus is one possible influence we have not addressed in this work, as all the single-molecule mRNA data from [36] that we considered came from a construct integrated at the galK locus.
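As a minimal illustration of the kind of parameter inference described above, the sketch below fits the two parameters of the bursty promoter's steady-state negative binomial distribution to synthetic mRNA counts. We use maximum likelihood as a lightweight stand-in for the full Bayesian MCMC treatment in the supplemental sections, and the names alpha (burst rate in units of the mRNA degradation rate) and b (mean burst size) are our own shorthand:

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(1)

# Synthetic constitutive-promoter data drawn from a negative binomial with
# "true" burst rate alpha and mean burst size b (illustrative values).
alpha_true, b_true = 4.0, 2.5
counts = rng.negative_binomial(alpha_true, 1.0 / (1.0 + b_true), size=5_000)

def neg_log_like(params, data):
    """Negative log-likelihood of the negative binomial mRNA distribution."""
    alpha, b = params
    if alpha <= 0 or b <= 0:
        return np.inf
    return -stats.nbinom.logpmf(data, alpha, 1.0 / (1.0 + b)).sum()

res = optimize.minimize(neg_log_like, x0=[1.0, 1.0], args=(counts,),
                        method="Nelder-Mead")
alpha_hat, b_hat = res.x
print(f"burst rate ~ {alpha_hat:.2f} (true {alpha_true})")
print(f"burst size ~ {b_hat:.2f} (true {b_true})")
```

With real single-molecule FISH counts in place of the synthetic draws, this same likelihood is the one that enters Bayes' theorem.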
The results of [47] indicate that transcription-induced supercoiling contributes substantially to driving transcriptional bursting, and furthermore, their Figure 7 suggests that the parameters describing the rate, duration, and size of bursts vary substantially for transcription from different genomic loci. Although the authors of [67] do not address noise, they note enormous differences in mean expression levels when an identical construct is integrated at different genomic loci. The authors of [68] attribute noise and burstiness in their single-molecule mRNA data to the influence of different sigma factors, which is a reasonable conclusion from their data. Could the difference also be due to the different chromosomal locations of the two operons? Which features of different loci are and are not important? Could our preferred coarse-grained model capture the variability across different loci? If so, and we were to repeat the parameter inference as done in this work, is there a simple theoretical model we could build to understand the resulting parameters?

In summary, this work took up the challenge of exploring the extent to which a single specific mechanistic model of the simple-repression regulatory architecture suffices to explain the broad sweep of experimental data for this system. Pioneering early experimental efforts from the Müller-Hill lab established the simple-repression motif as an arena for the quantitative dissection of regulatory response in bacteria, with similarly beautiful work emerging in examples such as the ara and gal operons [29], [30], [69]–[73]. In light of a new generation of precision measurements on these systems, the definition of what it means to understand them can now be formulated as a rigorous quantitative question.
In particular, we believe understanding of the simple repression motif has advanced sufficiently that the design of new versions of the architecture is now possible, based upon predictions about how repressor copy number and DNA binding site strength control expression. In our view, the next step in the progression is to perform similarly rigorous analyses of the fundamental "basis set" of regulatory architectures. Natural follow-ups to this work are explorations of motifs such as simple activation, regulated by a single activator binding site, and the repressor-activator architecture, mediated by the binding of both a single activator and a single repressor, and beyond. With the individual input-output functions in hand, similar quantitative dissections, including rigorous analysis of their tuning parameters, can be undertaken for the "basis set" of full gene-regulatory networks such as switches, feed-forward architectures and oscillators, building upon the recent impressive bonanza of efforts from systems biologists and synthetic biologists [74], [75].
All data and custom scripts were collected and stored using Git version control. Code for Bayesian inference and figure generation is available on the GitHub repository (https://github.com/RPGroup-PBoC/bursty_transcription).

Acknowledgments
We thank Rob Brewster for providing the raw single-molecule mRNA FISH data. We thank Justin Bois for his key support with the Bayesian inference section. We would also like to thank Griffin Chure for invaluable feedback on the manuscript. This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE1745301. This work was also supported by La Fondation Pierre-Gilles de Gennes, the Rosen Center at Caltech, and NIH grant 1R35 GM118043 (MIRA). M.R.M. was supported by the Caldwell CEMI fellowship.

References

[1] F. Jacob and J. Monod, "Genetic regulatory mechanisms in the synthesis of proteins,"
Journal of Molecular Biology, vol. 3, no. 3, pp. 318–356, Jun. 1961.
[2] R. J. Britten and E. H. Davidson, "Gene regulation for higher cells: A theory," Science, vol. 165, no. 891, pp. 349–57, 1969.
[3] S. B.-T. de-Leon and E. H. Davidson, "Gene regulation: Gene control network in development," Annual Review of Biophysics and Biomolecular Structure, vol. 36, p. 191, 2007.
[4] M. Rydenfelt, H. G. Garcia, R. S. Cox III, and R. Phillips, "The influence of promoter architectures and regulatory motifs on gene expression in Escherichia coli," PLoS One, vol. 9, no. 12, e114347, 2014.
[5] N. M. Belliveau, S. L. Barnes, W. T. Ireland, D. L. Jones, M. J. Sweredoski, A. Moradian, S. Hess, J. B. Kinney, and R. Phillips, "Systematic approach for dissecting the molecular mechanisms of transcriptional regulation in bacteria," Proceedings of the National Academy of Sciences of the United States of America, vol. 115, no. 21, E4796–E4805, 2018.
[6] S. Ghatak, Z. A. King, A. Sastry, and B. O. Palsson, "The y-ome defines the 35% of Escherichia coli genes that lack experimental evidence of function," Nucleic Acids Research, vol. 47, no. 5, pp. 2446–2454, 2019.
[7] A. Santos-Zavaleta, H. Salgado, S. Gama-Castro, et al., "RegulonDB v 10.5: Tackling challenges to unify classic and high throughput knowledge of gene regulation in E. coli K-12," Nucleic Acids Research, vol. 47, pp. 212–220, 2019.
[8] G. K. Ackers, A. D. Johnson, and M. A. Shea, "Quantitative model for gene regulation by lambda phage repressor," Proceedings of the National Academy of Sciences of the United States of America, vol. 79, no. 4, pp. 1129–33, 1982.
[9] M. A. Shea and G. K. Ackers, "The OR control system of bacteriophage lambda. A physical-chemical model for gene regulation," Journal of Molecular Biology, vol. 181, no. 2, pp. 211–30, 1985.
[10] N. E. Buchler, U. Gerland, and T. Hwa, "On schemes of combinatorial transcription logic," Proceedings of the National Academy of Sciences of the United States of America, vol. 100, no. 9, pp. 5136–5141, Apr. 2003.
[11] J. M. Vilar and S. Leibler, "DNA looping and physical constraints on transcription regulation," Journal of Molecular Biology, vol. 331, no. 5, pp. 981–9, 2003.
[12] J. M. Vilar, C. C. Guet, and S. Leibler, "Modeling network dynamics: The lac operon, a case study," Journal of Cell Biology, vol. 161, no. 3, pp. 471–6, 2003.
[13] L. Bintu, N. E. Buchler, H. G. Garcia, U. Gerland, T. Hwa, J. Kondev, and R. Phillips, "Transcriptional regulation by the numbers: Models," Current Opinion in Genetics & Development, vol. 15, no. 2, pp. 116–24, 2005.
[14] L. Bintu, N. E. Buchler, H. G. Garcia, U. Gerland, T. Hwa, J. Kondev, T. Kuhlman, and R. Phillips, "Transcriptional regulation by the numbers: Applications," Current Opinion in Genetics & Development, vol. 15, no. 2, pp. 125–135, Apr. 2005.
[15] J. Gertz, E. D. Siggia, and B. A. Cohen, "Analysis of combinatorial cis-regulation in synthetic and genomic promoters," Nature, vol. 457, no. 7226, pp. 215–8, 2009.
[16] M. S. Sherman and B. A. Cohen, "Thermodynamic state ensemble models of cis-regulation," PLoS Computational Biology, vol. 8, no. 3, e1002407, 2012.
[17] J. M. Vilar and L. Saiz, "Reliable prediction of complex phenotypes from a modular design in free energy space: An extensive exploration of the lac operon," ACS Synthetic Biology, vol. 2, no. 10, pp. 576–86, 2013.
[18] M. S. Ko, "A stochastic model for gene induction," Journal of Theoretical Biology, vol. 153, no. 2, pp. 181–94, 1991.
[19] J. Peccoud and B. Ycart, "Markovian modeling of gene product synthesis," Theoretical Population Biology, vol. 48, pp. 222–234, 1995.
[20] M. T. J. Record, W. Reznikoff, M. Craig, K. McQuade, and P. Schlax, "Escherichia coli RNA polymerase (sigma70) promoters and the kinetics of the steps of transcription initiation," in Escherichia coli and Salmonella: Cellular and Molecular Biology, F. C. Neidhardt, R. C. III, J. L. Ingraham, et al., Eds., Washington, DC: ASM Press, 1996, pp. 792–821.
[21] T. B. Kepler and T. C. Elston, "Stochasticity in transcriptional regulation: Origins, consequences, and mathematical representations," Biophysical Journal, vol. 81, pp. 3116–36, 2001.
[22] A. Sanchez and J. Kondev, "Transcriptional control of noise in gene expression," Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 13, pp. 5081–6, 2008.
[23] V. Shahrezaei and P. S. Swain, "Analytical distributions for stochastic gene expression," Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 45, pp. 17256–17261, Nov. 2008.
[24] A. Sanchez, H. G. Garcia, D. Jones, R. Phillips, and J. Kondev, "Effect of promoter architecture on the cell-to-cell variability in gene expression," PLoS Computational Biology, vol. 7, no. 3, e1001100, 2011.
[25] D. Michel, "How transcription factors can adjust the gene expression floodgates," Progress in Biophysics and Molecular Biology, vol. 102, no. 1, pp. 16–37, 2010.
[26] H. G. Garcia, J. Kondev, N. Orme, J. A. Theriot, and R. Phillips, "Thermodynamics of biological processes," in Methods in Enzymology, vol. 492, Elsevier, 2011, pp. 27–59.
[27] R. Phillips, N. M. Belliveau, G. Chure, H. G. Garcia, M. Razo-Mejia, and C. Scholes, "Figure 1 theory meets Figure 2 experiments in the study of gene expression," Annual Review of Biophysics, vol. 48, no. 1, pp. 121–163, May 2019.
[28] W. T. Ireland, S. M. Beeler, E. Flores-Bautista, N. M. Belliveau, M. J. Sweredoski, A. Moradian, J. B. Kinney, and R. Phillips, "Deciphering the regulatory genome of Escherichia coli, one hundred promoters at a time," bioRxiv, Jan. 2020.
[29] S. Oehler, M. Amouyal, P. Kolkhof, B. von Wilcken-Bergmann, and B. Müller-Hill, "Quality and position of the three lac operators of E. coli define efficiency of repression," The EMBO Journal, vol. 13, no. 14, pp. 3348–3355, Jul. 1994.
[30] S. Oehler, E. R. Eismann, H. Krämer, and B. Müller-Hill, "The three operators of the lac operon cooperate in repression," The EMBO Journal, vol. 9, no. 4, pp. 973–979, Apr. 1990.
[31] S. A. Frank, "Input-output relations in biological systems: Measurement, information and the Hill equation," Biology Direct, vol. 8, no. 1, p. 31, Dec. 2013.
[32] M. Razo-Mejia, S. L. Barnes, N. M. Belliveau, G. Chure, T. Einav, M. Lewis, and R. Phillips, "Tuning transcriptional regulation through signaling: A predictive theory of allosteric induction," Cell Systems, vol. 6, no. 4, pp. 456–469.e10, Apr. 2018.
[33] H. G. Garcia and R. Phillips, "Quantitative dissection of the simple repression input-output function," Proceedings of the National Academy of Sciences of the United States of America, vol. 108, no. 29, pp. 12173–12178, Jul. 2011.
[34] R. C. Brewster, F. M. Weinert, H. G. Garcia, D. Song, M. Rydenfelt, and R. Phillips, "The transcription factor titration effect dictates level of gene expression," Cell, vol. 156, no. 6, pp. 1312–1323, Mar. 2014.
[35] G. Chure, M. Razo-Mejia, N. M. Belliveau, T. Einav, Z. A. Kaczmarek, S. L. Barnes, M. Lewis, and R. Phillips, "Predictive shifts in free energy couple mutations to their phenotypic consequences," Proceedings of the National Academy of Sciences of the United States of America, vol. 116, no. 37, pp. 18275–18284, Sep. 2019.
[36] D. L. Jones, R. C. Brewster, and R. Phillips, "Promoter architecture dictates cell-to-cell variability in gene expression," Science, vol. 346, no. 6216, pp. 1533–1536, Dec. 2014.
[37] R. Phillips, J. Kondev, J. Theriot, and H. G. Garcia, Physical Biology of the Cell, 2nd ed. New York: Garland Science, 2013.
[38] N. Mitarai, S. Semsey, and K. Sneppen, "Dynamic competition between transcription initiation and repression: Role of nonequilibrium steps in cell-to-cell heterogeneity," Physical Review E, vol. 92, no. 2, p. 022710, Aug. 2015.
[39] P. L. deHaseth, M. L. Zupancic, and M. T. Record, "RNA polymerase-promoter interactions: The comings and goings of RNA polymerase," Journal of Bacteriology, vol. 180, no. 12, pp. 3019–3025, Jun. 1998.
[40] E. L. King and C. Altman, "A schematic method of deriving the rate laws for enzyme-catalyzed reactions," The Journal of Physical Chemistry, vol. 60, no. 10, pp. 1375–1378, Oct. 1956.
[41] T. L. Hill, "Studies in irreversible thermodynamics IV. Diagrammatic representation of steady state fluxes for unimolecular systems," Journal of Theoretical Biology, vol. 10, no. 3, pp. 442–459, Apr. 1966.
[42] M. Razo-Mejia, S. Marzen, G. Chure, R. Taubman, M. Morrison, and R. Phillips, "First-principles prediction of the information processing capacity of a simple genetic circuit," arXiv, May 2020.
[43] S. A. Frank, "Generative models versus underlying symmetries to explain biological pattern," Journal of Evolutionary Biology, vol. 27, no. 6, pp. 1172–1178, Jun. 2014.
[44] J. Gunawardena, "A linear framework for time-scale separation in nonlinear biochemical systems," PLoS ONE, vol. 7, no. 5, e36321, May 2012.
[45] T. Ahsendorf, F. Wong, R. Eils, and J. Gunawardena, "A framework for modelling gene regulation which accommodates non-equilibrium mechanisms," BMC Biology, vol. 12, no. 1, p. 102, Dec. 2014.
[46] T. L. Hill, Free Energy Transduction and Biochemical Cycle Kinetics. New York, NY: Springer New York, 1989.
[47] S. Chong, C. Chen, H. Ge, and X. S. Xie, "Mechanism of transcriptional bursting in bacteria," Cell, vol. 158, no. 2, pp. 314–326, Jul. 2014.
[48] S. A. Sevier, D. A. Kessler, and H. Levine, "Mechanical bounds to transcriptional noise," Proceedings of the National Academy of Sciences of the United States of America, vol. 113, no. 49, pp. 13983–13988, Dec. 2016.
[49] I. I. Cisse, I. Izeddin, S. Z. Causse, et al., "Real-time dynamics of RNA polymerase II clustering in live human cells," Science, vol. 341, no. 6146, pp. 664–667, Aug. 2013.
[50] W.-K. Cho, N. Jayanth, B. P. English, et al., "RNA polymerase II cluster dynamics predict mRNA output in living cells," eLife, vol. 5, e13617, May 2016.
[51] A.-M. Ladouceur, B. Parmar, S. Biedzinski, et al., "Clusters of bacterial RNA polymerase are biomolecular condensates that assemble through liquid-liquid phase separation," bioRxiv, Mar. 2020.
[52] A. Gelman, J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B. Rubin, Bayesian Data Analysis, 3rd ed., ser. Chapman & Hall/CRC Texts in Statistical Science. Boca Raton: CRC Press, 2013.
[53] J. B. Kinney, A. Murugan, C. G. Callan, and E. C. Cox, "Using deep sequencing to characterize the biophysical mechanism of a transcriptional regulatory sequence," Proceedings of the National Academy of Sciences of the United States of America, vol. 107, no. 20, pp. 9158–9163, May 2010.
[54] R. C. Brewster, D. L. Jones, and R. Phillips, "Tuning promoter strength through RNA polymerase binding site design in Escherichia coli," PLoS Computational Biology, vol. 8, no. 12, e1002811, Dec. 2012.
[55] J. Landman, R. N. Georgiev, M. Rydenfelt, and W. K. Kegel, "In vivo and in vitro consistency of thermodynamic models for transcription regulation," Physical Review Research, vol. 1, no. 3, p. 033094, Nov. 2019.
[56] P. Hammar, M. Walldén, D. Fange, F. Persson, Ö. Baltekin, G. Ullman, P. Leroy, and J. Elf, "Direct measurement of transcription factor dissociation excludes a simple operator occupancy model for gene regulation," Nature Genetics, vol. 46, no. 4, pp. 405–408, Apr. 2014.
[57] A. Sanchez, S. Choubey, and J. Kondev, "Stochastic models of transcription: From single molecules to single cells," Methods, vol. 62, no. 1, pp. 13–25, Jul. 2013.
[58] R. Phillips, "Napoleon is in equilibrium," Annual Review of Condensed Matter Physics, vol. 6, no. 1, pp. 85–111, Mar. 2015.
[59] L.-h. So, A. Ghosh, C. Zong, L. A. Sepúlveda, R. Segev, and I. Golding, "General properties of transcriptional time series in Escherichia coli," Nature Genetics, vol. 43, no. 6, pp. 554–560, Jun. 2011.
[60] I. Golding, J. Paulsson, S. M. Zawilski, and E. C. Cox, "Real-time kinetics of gene activity in individual bacteria," Cell, vol. 123, no. 6, pp. 1025–1036, Dec. 2005.
[61] J. K. Blitzstein and J. Hwang, Introduction to Probability, ser. Texts in Statistical Science. Boca Raton: CRC Press/Taylor & Francis Group, 2015.
[62] M. K. Transtrum, B. B. Machta, K. S. Brown, B. C. Daniels, C. R. Myers, and J. P. Sethna, "Perspective: Sloppiness and emergent theories in physics, biology, and beyond," The Journal of Chemical Physics, vol. 143, no. 1, p. 010901, Jul. 2015.
[63] G. Chure, Z. A. Kaczmarek, and R. Phillips, "Physiological adaptability and parametric versatility in a simple genetic circuit," bioRxiv, doi: 10.1101/2019.12.19.878462, Dec. 2019.
[64] H. Chen, K. Shiroguchi, H. Ge, and X. S. Xie, "Genome-wide study of mRNA degradation and transcript elongation in Escherichia coli," Molecular Systems Biology, vol. 11, no. 1, p. 781, Jan. 2015.
[65] E. T. Jaynes and G. L. Bretthorst, Probability Theory: The Logic of Science. Cambridge, UK; New York, NY: Cambridge University Press, 2003.
[66] A. Gelman and C. R. Shalizi, "Philosophy and the practice of Bayesian statistics," British Journal of Mathematical and Statistical Psychology, vol. 66, no. 1, pp. 8–38, Feb. 2013.
[67] J. A. Englaender, J. A. Jones, B. F. Cress, T. E. Kuhlman, R. J. Linhardt, and M. A. G. Koffas, "Effect of genomic integration location on heterologous protein expression and metabolic engineering in E. coli," ACS Synthetic Biology, vol. 6, no. 4, pp. 710–720, Apr. 2017.
[68] C. Engl, G. Jovanovic, R. D. Brackston, I. Kotta-Loizou, and M. Buck, "The route to transcription initiation determines the mode of transcriptional bursting in E. coli," Nature Communications, vol. 11, no. 1, p. 2422, Dec. 2020.
[69] T. M. Dunn, S. Hahn, S. Ogden, and R. F. Schleif, "An operator at -280 base pairs that is required for repression of araBAD operon promoter: Addition of DNA helical turns between the operator and promoter cyclically hinders repression," Proceedings of the National Academy of Sciences of the United States of America, vol. 81, no. 16, pp. 5017–20, 1984.
[70] M. J. Weickert and S. Adhya, "The galactose regulon of Escherichia coli," Molecular Microbiology, vol. 10, no. 2, pp. 245–51, 1993.
[71] R. Schleif, "Regulation of the L-arabinose operon of Escherichia coli," Trends in Genetics, vol. 16, no. 12, pp. 559–65, 2000.
[72] S. Semsey, M. Geanacopoulos, D. E. A. Lewis, and S. Adhya, "Operator-bound GalR dimers close DNA loops by direct interaction: Tetramerization and inducer binding," The EMBO Journal, vol. 21, no. 16, pp. 4349–4356, Aug. 2002.
[73] L. Swint-Kruse and K. S. Matthews, "Allostery in the LacI/GalR family: Variations on a theme," Current Opinion in Microbiology, vol. 12, no. 2, pp. 129–137, Apr. 2009.
[74] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon, "Network motifs: Simple building blocks of complex networks," Science, vol. 298, no. 5594, pp. 824–7, 2002.
[75] U. Alon, An Introduction to Systems Biology: Design Principles of Biological Circuits, ser. Chapman & Hall/CRC Mathematical and Computational Biology Series. Boca Raton, FL: Chapman & Hall/CRC, 2007.

Supplemental Information for: Reconciling Kinetic and Equilibrium Models of Bacterial Transcription
Contents
S1 Derivations for non-bursty promoter models
  S1.1 Derivation of chemical master equation
  S1.2 Matrix form of the multi-state chemical master equation
  S1.3 General forms for mean mRNA and Fano factor
    S1.3.1 Promoter state probabilities
    S1.3.2 First moments ⟨m⃗⟩ and ⟨m⟩
    S1.3.3 Second moment ⟨m²⟩ and Fano factor ν
    S1.3.4 Summary of general results
  S1.4 Nonequilibrium Model One - Poisson Promoter
    S1.4.1 Mean mRNA
    S1.4.2 Fano factor
  S1.5 Nonequilibrium Model Two - RNAP Bound and Unbound States
    S1.5.1 Mean mRNA
    S1.5.2 Fano factor
  S1.6 Nonequilibrium Model Three - Multistep Transcription Initiation and Escape
    S1.6.1 Mean mRNA
    S1.6.2 Fano factor
    S1.6.3 Generalizing ν < 1
S2 Bursty promoter models - generating function solutions and numerics
  S2.1 Constitutive promoter with bursts
    S2.1.1 From master equation to generating function
    S2.1.2 Steady-state
    S2.1.3 Recovering the steady-state probability distribution
  S2.2 Adding repression
    S2.2.1 Deriving the generating function for mRNA distribution
  S2.3 Numerical considerations and recursion formulas
    S2.3.1 Generalities
    S2.3.2 Particulars
S3 Bayesian inference
  S3.1 The problem of parameter inference
    S3.1.1 Bayes' theorem
    S3.1.2 The likelihood function
    S3.1.3 Prior selection
    S3.1.4 Expectations and marginalizations
    S3.1.5 Markov Chain Monte Carlo
  S3.2 Bayesian inference on constitutive promoters
    S3.2.1 Model 1 - Poisson promoter
    S3.2.2 Model 5 - Bursty promoter
  S3.3 Bayesian inference on the simple-repression architecture
S1 Derivations for non-bursty promoter models

In this section we detail the calculation of mean mRNA levels, fold-changes in expression, and Fano factors for nonequilibrium promoter models 1 through 4 in Figure 1. These are the results that were quoted but not derived in Sections 2 and 3 of the main text. In each of these four models, the natural mathematicization of the corresponding cartoon is a chemical master equation such as Eq. 12 for model 1. Before jumping into the general computation of the mean mRNA level and the Fano factor, we will work through the derivation of an example master equation, focusing on model 1 from Figure 1(C). The general steps are applicable to all other chemical master equations in this work.
S1.1 Derivation of chemical master equation
Figure S1. Two-state promoter chemical master equation. (A) Schematic of the two-state promoter simple repression model. Rates $k_R^{+}$ and $k_R^{-}$ are the association and dissociation rates of the transcriptional repressor, respectively, $r$ is the transcription initiation rate, and $\gamma$ is the mRNA degradation rate. (B) Schematic depiction of the mRNA count state transitions. The model in (A) only allows jumps in mRNA count of size 1, and mRNA production can only occur when the promoter is in the transcriptionally active state.

The chemical master equation describes the continuous time evolution of a continuous or discrete probability distribution function. In our specific case we want to describe the time evolution of the discrete mRNA distribution. That is, we want to compute the probability of having $m$ mRNA molecules at time $t + \Delta t$, where $\Delta t$ is a time interval small enough that at most one of the possible reactions takes place during the interval. For the example that we work out here in detail, we chose the two-state stochastic simple repression model schematized in Figure S1(A). To derive the master equation we will focus on the representation shown in Figure S1(B), where the transitions between different mRNA counts and promoter states are depicted explicitly. Given that the DNA promoter can exist in one of two states (a transcriptionally active state and a repressor-bound state), we will keep track not only of the mRNA count, but also of the state the promoter is in. For this we track two probability distributions: the probability of having $m$ mRNAs at time $t$ when the promoter is in the transcriptionally active state $A$, $p_A(m, t)$, and the equivalent probability when the promoter is in the repressor-bound state $R$, $p_R(m, t)$.

Since mRNA production can only take place in the transcriptionally active state, we will focus on this function for our derivation.
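Before deriving the master equation term by term, it is worth noting that the jump dynamics sketched in Figure S1(B) can be simulated directly. The following Gillespie sketch (with arbitrary illustrative rates, not values fitted to data) implements the production, degradation, and promoter-switching transitions, and checks the simulated mean against the known steady-state value ⟨m⟩ = (r/γ) k_R⁻/(k_R⁻ + k_R⁺):

```python
import numpy as np

rng = np.random.default_rng(2)

def gillespie_two_state(k_plus, k_minus, r, gamma, t_end):
    """Gillespie simulation of the two-state simple repression model.
    k_plus / k_minus are the repressor binding/unbinding rates (k_R^+, k_R^-),
    r the transcription initiation rate, gamma the mRNA degradation rate."""
    t, m, active = 0.0, 0, True
    while True:
        # Propensities for: transcription, degradation, promoter switching.
        a_tx = r if active else 0.0
        a_deg = gamma * m
        a_switch = k_plus if active else k_minus
        a_tot = a_tx + a_deg + a_switch
        dt = rng.exponential(1.0 / a_tot)
        if t + dt > t_end:
            return m  # state at t_end (waiting times are memoryless)
        t += dt
        u = rng.uniform(0.0, a_tot)
        if u < a_tx:
            m += 1               # produce an mRNA: m -> m + 1
        elif u < a_tx + a_deg:
            m -= 1               # degrade an mRNA: m -> m - 1
        else:
            active = not active  # promoter switches between A and R

# Illustrative rates in units of gamma.
k_plus, k_minus, r, gamma = 1.0, 2.0, 10.0, 1.0
samples = np.array([gillespie_two_state(k_plus, k_minus, r, gamma, t_end=20.0)
                    for _ in range(1000)])
# Steady-state mean should approach (r/gamma) * k_minus/(k_minus + k_plus) = 20/3.
print(samples.mean())
```

The master equation derived next describes exactly the probability flow that this simulation samples one trajectory at a time.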
The repressor-bound state will have an equivalent equation without the terms involving production of mRNAs. We begin by listing the possible state transitions that can occur for a particular mRNA count with the promoter in the active state. For state changes in a small time window $\Delta t$ that "jump into" state $m$ in the transcriptionally active state we have:

• Produce an mRNA, jumping from $m - 1$ to $m$.
• Degrade an mRNA, jumping from $m + 1$ to $m$.
• Transition from the repressor-bound state $R$ with $m$ mRNAs to the active state $A$ with $m$ mRNAs.

Likewise, for state transitions that "jump out" of state $m$ in the transcriptionally active state we have:

• Produce an mRNA, jumping from $m$ to $m + 1$.
• Degrade an mRNA, jumping from $m$ to $m - 1$.
• Transition from the active state $A$ with $m$ mRNAs to the repressor-bound state $R$ with $m$ mRNAs.

mRNA production does not depend on the current number of mRNAs, so these state transitions occur with probability $r \Delta t$. The same is true for the promoter state transitions; each occurs with probability $k_R^{\pm} \Delta t$. The mRNA degradation events, by contrast, depend on the current number of mRNA molecules, since the more molecules of mRNA there are, the more will decay during a given time interval. Each molecule has a constant probability $\gamma \Delta t$ of being degraded, so the total probability for an mRNA degradation event to occur is obtained by multiplying this probability by the current number of mRNAs.

To see these terms in action, let us compute the probability of having $m$ mRNA at time $t + \Delta t$ in the transcriptionally active state.
This takes the form

$$
\begin{aligned}
p_A(m, t + \Delta t) = p_A(m, t)
&+ \overbrace{(r \Delta t)\, p_A(m-1, t)}^{m-1 \,\to\, m}
- \overbrace{(r \Delta t)\, p_A(m, t)}^{m \,\to\, m+1} \\
&+ \overbrace{(m+1)(\gamma \Delta t)\, p_A(m+1, t)}^{m+1 \,\to\, m}
- \overbrace{m (\gamma \Delta t)\, p_A(m, t)}^{m \,\to\, m-1} \\
&+ \overbrace{(k_R^{-} \Delta t)\, p_R(m, t)}^{R \,\to\, A}
- \overbrace{(k_R^{+} \Delta t)\, p_A(m, t)}^{A \,\to\, R},
\end{aligned}
\tag{S1}
$$

where the overbraces indicate the corresponding state transitions. Notice that the second-to-last term on the right-hand side is multiplied by $p_R(m, t)$, since the transition from state $R$ to state $A$ depends on the probability of being in state $R$ to begin with. It is through this term that the dynamics of the two probability distribution functions, $p_R(m, t)$ and $p_A(m, t)$, are coupled. An equivalent equation can be written for the probability of having $m$ mRNA at time $t + \Delta t$ while in the repressor-bound state; the only differences are that the mRNA production terms are removed and the signs of the promoter state transition terms are inverted. This is

$$
\begin{aligned}
p_R(m, t + \Delta t) = p_R(m, t)
&+ \overbrace{(m+1)(\gamma \Delta t)\, p_R(m+1, t)}^{m+1 \,\to\, m}
- \overbrace{m (\gamma \Delta t)\, p_R(m, t)}^{m \,\to\, m-1} \\
&- \overbrace{(k_R^{-} \Delta t)\, p_R(m, t)}^{R \,\to\, A}
+ \overbrace{(k_R^{+} \Delta t)\, p_A(m, t)}^{A \,\to\, R}.
\end{aligned}
\tag{S2}
$$

All that remains are simple algebraic steps to simplify the equations. Let us focus on the transcriptionally active state $A$. First we subtract the term $p_A(m, t)$ from both sides, and then we divide both sides of the equation by $\Delta t$. This results in

$$
\frac{p_A(m, t + \Delta t) - p_A(m, t)}{\Delta t}
= r p_A(m-1, t) - r p_A(m, t)
+ (m+1) \gamma p_A(m+1, t) - m \gamma p_A(m, t)
+ k_R^{-} p_R(m, t) - k_R^{+} p_A(m, t).
\tag{S3}
$$

Upon taking the limit $\Delta t \to 0$, we obtain

$$
\frac{d p_A(m, t)}{dt}
= r p_A(m-1, t) - r p_A(m, t)
+ (m+1) \gamma p_A(m+1, t) - m \gamma p_A(m, t)
+ k_R^{-} p_R(m, t) - k_R^{+} p_A(m, t).
\tag{S4}
$$

Equivalent manipulations for the repressor-bound state give an ODE of the form

$$
\frac{d p_R(m, t)}{dt}
= (m+1) \gamma p_R(m+1, t) - m \gamma p_R(m, t)
- k_R^{-} p_R(m, t) + k_R^{+} p_A(m, t).
\tag{S5}
$$

In the next section we will write these coupled ODEs in a more compact form using matrix notation.

S1.2 Matrix form of the multi-state chemical master equation
Having derived an example chemical master equation, we now focus on writing a general matrix form for the kinetic models 1-4 shown in Figure 1(C) in the main text. In each of these four models, the natural mathematicization of the cartoon is a chemical master equation such as Eq. 12 for model 1, which for completeness we reproduce here as
\begin{equation}
\begin{split}
\frac{d}{dt} p_R(m,t) &=
- \overbrace{k_R^- p_R(m,t)}^{R\to U}
+ \overbrace{k_R^+ p_U(m,t)}^{U\to R}
+ \overbrace{(m+1)\gamma\, p_R(m+1,t)}^{m+1\to m}
- \overbrace{\gamma m\, p_R(m,t)}^{m\to m-1} \\
\frac{d}{dt} p_U(m,t) &=
\overbrace{k_R^- p_R(m,t)}^{R\to U}
- \overbrace{k_R^+ p_U(m,t)}^{U\to R}
+ \overbrace{r p_U(m-1,t)}^{m-1\to m}
- \overbrace{r p_U(m,t)}^{m\to m+1}
+ \overbrace{(m+1)\gamma\, p_U(m+1,t)}^{m+1\to m}
- \overbrace{\gamma m\, p_U(m,t)}^{m\to m-1}.
\end{split}
\tag{S6}
\end{equation}
Here $p_R(m,t)$ and $p_U(m,t)$ are the probabilities of finding the system with $m$ mRNA molecules at time $t$ in the repressor bound or unbound states, respectively. $r$ is the probability per unit time that a transcript will be initiated when the repressor is unbound, and $\gamma$ is the probability per unit time for a given mRNA to be degraded. $k_R^-$ is the probability per unit time that a bound repressor will unbind, while $k_R^+$ is the probability per unit time that an unbound operator will become bound by a repressor. Assuming mass action kinetics, $k_R^+$ is proportional to the repressor copy number $R$.

Next consider the cartoon for nonequilibrium model 2 in Figure 1(C). Now we must track probabilities $p_R$, $p_E$, and $p_P$ for the repressor bound, empty, and polymerase bound states, respectively. By analogy to Eq. S6, the master equation for model 2 can be written
\begin{equation}
\begin{split}
\frac{d}{dt} p_R(m,t) &=
- \overbrace{k_R^- p_R(m,t)}^{R\to U}
+ \overbrace{k_R^+ p_E(m,t)}^{U\to R}
+ \overbrace{(m+1)\gamma\, p_R(m+1,t)}^{m+1\to m}
- \overbrace{\gamma m\, p_R(m,t)}^{m\to m-1} \\
\frac{d}{dt} p_E(m,t) &=
\overbrace{k_R^- p_R(m,t)}^{R\to U}
- \overbrace{k_R^+ p_E(m,t)}^{U\to R}
+ \overbrace{(m+1)\gamma\, p_E(m+1,t)}^{m+1\to m}
- \overbrace{\gamma m\, p_E(m,t)}^{m\to m-1} \\
&\quad
+ \overbrace{k_P^- p_P(m,t)}^{A\to U}
- \overbrace{k_P^+ p_E(m,t)}^{U\to A}
+ \overbrace{r p_P(m-1,t)}^{m-1\to m,\; A\to U} \\
\frac{d}{dt} p_P(m,t) &=
- \overbrace{k_P^- p_P(m,t)}^{A\to U}
+ \overbrace{k_P^+ p_E(m,t)}^{U\to A}
+ \overbrace{(m+1)\gamma\, p_P(m+1,t)}^{m+1\to m}
- \overbrace{\gamma m\, p_P(m,t)}^{m\to m-1}
- \overbrace{r p_P(m,t)}^{m\to m+1,\; A\to U}.
\end{split}
\tag{S7}
\end{equation}
$k_P^+$ and $k_P^-$ are defined in close analogy to $k_R^+$ and $k_R^-$, except for RNAP binding and unbinding instead of repressor. Similarly $p_P(m,t)$ is defined for the active (RNAP-bound) state exactly as $p_R(m,t)$ and $p_E(m,t)$ are for the repressor bound and unbound states, respectively. The new subtlety Eq. S7 introduces compared to Eq. S6 is that when an mRNA is produced, the promoter state also changes. This is encoded by the terms involving $r$, the last term in each of the equations for $p_E$ and $p_P$: the former accounts for arrivals in the unbound state and the latter accounts for departures from the RNAP-bound state.

To condense and clarify the unwieldy notation of Eq. S7, it can be rewritten in matrix form.
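As a sanity check on Eq. S7 before condensing it, the model can be simulated stochastically. The sketch below is our own illustration (not code from the paper): a minimal Gillespie-style simulation of model 2, in which transcription occurs only from the RNAP-bound state and returns the promoter to the empty state. The function name and all rate values are illustrative assumptions.

```python
import random

# Minimal Gillespie (stochastic simulation algorithm) sketch of model 2:
# promoter states R (repressor bound), E (empty), P (RNAP bound);
# transcription at rate r occurs only from P and resets the promoter to E;
# every mRNA decays with first-order rate gamma.
def gillespie_model2(kRp, kRm, kPp, kPm, r, gamma, t_end, seed=0):
    rng = random.Random(seed)
    state, m, t = "E", 0, 0.0
    t_avg = 0.0  # accumulator for the time-averaged mRNA copy number
    while t < t_end:
        rates = {"deg": gamma * m}
        if state == "R":
            rates["unbind_R"] = kRm
        elif state == "E":
            rates["bind_R"] = kRp
            rates["bind_P"] = kPp
        else:  # state == "P"
            rates["unbind_P"] = kPm
            rates["init"] = r  # transcription, promoter returns to E
        total = sum(rates.values())
        dt = min(rng.expovariate(total), t_end - t)
        t_avg += m * dt
        t += dt
        if t >= t_end:
            break
        # choose which reaction fired, proportional to its rate
        x = rng.uniform(0, total)
        for name, k in rates.items():
            x -= k
            if x <= 0:
                break
        if name == "deg":
            m -= 1
        elif name == "bind_R":
            state = "R"
        elif name in ("unbind_R", "unbind_P"):
            state = "E"
        elif name == "bind_P":
            state = "P"
        else:  # "init"
            m += 1
            state = "E"
    return t_avg / t_end
```

With, e.g., $k_R^+ = k_R^- = k_P^- = r = 1$, $k_P^+ = 2$, and $\gamma = 0.1$ (all in arbitrary inverse time units), the long-time average converges to the steady-state mean derived below in Eq. S68, $(r/\gamma)\,k_P^+ k_R^- / [k_P^+ k_R^- + (k_P^- + r)(k_R^+ + k_R^-)] \approx 3.3$.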
We define the column vector $\vec{p}(m,t)$ as
\begin{equation}
\vec{p}(m,t) =
\begin{pmatrix} p_R(m,t) \\ p_E(m,t) \\ p_P(m,t) \end{pmatrix}
\tag{S8}
\end{equation}
to gather, for a given $m$, the probabilities of finding the system in the three possible promoter states. Then all the transition rates may be condensed into matrices which multiply this vector. The first matrix is
\begin{equation}
\mathbf{K} =
\begin{pmatrix}
-k_R^- & k_R^+ & 0 \\
k_R^- & -k_R^+ - k_P^+ & k_P^- \\
0 & k_P^+ & -k_P^-
\end{pmatrix},
\tag{S9}
\end{equation}
which tracks all promoter state changes in Eq. S7 that are not accompanied by a change in the mRNA copy number. The two terms accounting for transcription, the only transition that increases mRNA copy number, must be handled by two separate matrices given by
\begin{equation}
\mathbf{R}_A =
\begin{pmatrix}
0 & 0 & 0 \\
0 & 0 & r \\
0 & 0 & 0
\end{pmatrix},
\qquad
\mathbf{R}_D =
\begin{pmatrix}
0 & 0 & 0 \\
0 & 0 & 0 \\
0 & 0 & r
\end{pmatrix}.
\tag{S10}
\end{equation}
$\mathbf{R}_A$ accounts for transitions arriving in a given state while $\mathbf{R}_D$ tracks departing transitions. With these definitions, we can condense Eq. S7 into the single equation
\begin{equation}
\frac{d}{dt}\vec{p}(m,t) =
(\mathbf{K} - \mathbf{R}_D - \gamma m \mathbf{I})\,\vec{p}(m,t)
+ \mathbf{R}_A\, \vec{p}(m-1,t)
+ \gamma(m+1)\mathbf{I}\,\vec{p}(m+1,t),
\tag{S11}
\end{equation}
which is just Eq. 15 in the main text. Straightforward albeit tedious algebra verifies that Eqs. S7 and S11 are in fact equivalent.

Although we derived Eq. S11 for the particular case of nonequilibrium model 2 in Figure 1, the chemical master equations for all of the nonequilibrium models in Figure 1 except model 5 can be cast in this form. (We treat model 5 separately in Appendix S2.) Model 3 introduces no new subtleties beyond model 2, and Eq. S11 applies equally well to it, simply with matrices of dimension four instead of three. Models 1 and 4 are both handled by Eq. 12 in the main text, which is just Eq. S11 in the special case $\mathbf{R}_D = \mathbf{R}_A \equiv \mathbf{R}$, since in these two models transcription initiation events do not change the promoter state.

Recalling that our goal in this section is to derive expressions for the mean mRNA and Fano factor for nonequilibrium models 1 through 4 in Figure 1, and since all four of these models are described by Eq. S11, we can save substantial effort by deriving general formulas for the mean mRNA and Fano factor from Eq. S11 once and for all. Then for each model we can simply plug in the appropriate matrices for $\mathbf{K}$, $\mathbf{R}_D$, and $\mathbf{R}_A$ and carry out the remaining algebra.

For our purposes it will suffice to derive the first and second moments of the mRNA distribution from this master equation, similar to the treatment in [24], but we refer the interested reader to [42] for an analogous treatment demonstrating an analytical solution for arbitrary moments.

S1.3 General forms for mean mRNA and Fano factor
Our task now is to derive expressions for the first two moments of the steady-state mRNA distribution from Eq. S11. Our treatment is analogous to that given in Refs. [24] and [42]. We first obtain the steady-state limit of Eq. S11, in which the time derivative vanishes, giving
\begin{equation}
0 = (\mathbf{K} - \mathbf{R}_D - \gamma m \mathbf{I})\,\vec{p}(m)
+ \mathbf{R}_A\, \vec{p}(m-1)
+ \gamma(m+1)\mathbf{I}\,\vec{p}(m+1).
\tag{S12}
\end{equation}
From this, we want to compute
\begin{equation}
\langle m \rangle = \sum_S \sum_{m=0}^\infty m\, p_S(m)
\tag{S13}
\end{equation}
and
\begin{equation}
\langle m^2 \rangle = \sum_S \sum_{m=0}^\infty m^2\, p_S(m),
\tag{S14}
\end{equation}
which define the average values of $m$ and $m^2$ at steady state, where the averaging is over all possible mRNA copy numbers and promoter states $S$. For example, for model 1 in Figure 1(C), the sum on $S$ would cover the repressor bound and unbound states ($R$ and $U$, respectively); for model 2, the sum would cover the repressor bound, polymerase bound, and empty states ($R$, $P$, and $E$); and so on for the other models.

Along the way it will be convenient to define the conditional moments
\begin{equation}
\langle \vec{m} \rangle = \sum_{m=0}^\infty m\, \vec{p}(m)
\tag{S15}
\end{equation}
and
\begin{equation}
\langle \vec{m}^2 \rangle = \sum_{m=0}^\infty m^2\, \vec{p}(m).
\tag{S16}
\end{equation}
These objects are vectors of the same size as $\vec{p}(m)$, and each component can be thought of as the expected value of the mRNA copy number, or copy number squared, conditional on the promoter being in a certain state. For example, for model 1 in Figure 1, which has repressor bound and unbound states labeled $R$ and $U$, $\langle \vec{m} \rangle$ would be
\begin{equation}
\langle \vec{m} \rangle =
\begin{pmatrix}
\sum_{m=0}^\infty m\, p_R(m) \\
\sum_{m=0}^\infty m\, p_U(m)
\end{pmatrix}.
\tag{S17}
\end{equation}
Analogously to $\langle \vec{m} \rangle$ and $\langle \vec{m}^2 \rangle$, it is convenient to define the vector
\begin{equation}
\langle \vec{m}^0 \rangle = \sum_{m=0}^\infty \vec{p}(m),
\tag{S18}
\end{equation}
whose elements are simply the probabilities of finding the system in each of the possible promoter states. It will also be convenient to denote by $\vec{1}^\dagger$ a row vector of the same length as $\vec{p}$ whose elements are all 1, such that, for instance, $\vec{1}^\dagger \cdot \langle \vec{m}^0 \rangle = 1$, $\vec{1}^\dagger \cdot \langle \vec{m} \rangle = \langle m \rangle$, etc.
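These definitions can be made concrete numerically: truncating the mRNA copy number at some large $M$ turns the steady-state master equation into a finite linear system, from which $p_S(m)$ and all the moments above follow directly. The sketch below is an illustration we add here (not code from the paper), applied to model 1 with assumed rates; the truncation level and function name are our own choices.

```python
import numpy as np

# Numerical illustration of the moment definitions (Eqs. S13-S18) for
# model 1: build the steady-state master equation on a truncated mRNA
# range 0..M and read the moments off the joint distribution p_S(m).
def truncated_steady_state(kRp, kRm, r, gamma, M):
    n = 2 * (M + 1)  # joint states ordered (R,0),(U,0),(R,1),(U,1),...
    idx = lambda s, m: 2 * m + s  # s=0: repressor bound, s=1: unbound
    A = np.zeros((n, n))
    for m in range(M + 1):
        # promoter switching at fixed mRNA count
        A[idx(1, m), idx(0, m)] += kRm   # R -> U
        A[idx(0, m), idx(0, m)] -= kRm
        A[idx(0, m), idx(1, m)] += kRp   # U -> R
        A[idx(1, m), idx(1, m)] -= kRp
        # transcription (only from U), truncated at m = M
        if m < M:
            A[idx(1, m + 1), idx(1, m)] += r
            A[idx(1, m), idx(1, m)] -= r
        # first-order degradation from either promoter state
        if m > 0:
            for s in (0, 1):
                A[idx(s, m - 1), idx(s, m)] += gamma * m
                A[idx(s, m), idx(s, m)] -= gamma * m
    # steady state: A p = 0 with sum(p) = 1; swap one row for normalization
    A[-1, :] = 1.0
    b = np.zeros(n)
    b[-1] = 1.0
    p = np.linalg.solve(A, b)
    ms = np.arange(M + 1)
    pR, pU = p[0::2], p[1::2]            # p_R(m), p_U(m)
    mean = np.sum(ms * (pR + pU))        # Eq. S13
    second = np.sum(ms**2 * (pR + pU))   # Eq. S14
    return mean, second, pR.sum(), pU.sum()
```

For $k_R^+ = k_R^- = \gamma = 1$ and $r = 5$ (illustrative values), the computed mean matches the analytic result derived below for model 1, $\langle m \rangle = (r/\gamma)\, k_R^-/(k_R^- + k_R^+) = 2.5$, and the promoter state probabilities sum to 1.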
S1.3.1 Promoter state probabilities $\langle \vec{m}^0 \rangle$

To begin, we will find the promoter state probabilities $\langle \vec{m}^0 \rangle$ from Eq. S12 by summing over all mRNA copy numbers $m$, producing
\begin{equation}
0 = \sum_{m=0}^\infty \left[
(\mathbf{K} - \mathbf{R}_D - \gamma m \mathbf{I})\,\vec{p}(m)
+ \mathbf{R}_A\, \vec{p}(m-1)
+ \gamma(m+1)\mathbf{I}\,\vec{p}(m+1) \right].
\tag{S19}
\end{equation}
Using the definitions of $\langle \vec{m}^0 \rangle$ and $\langle \vec{m} \rangle$, and noting that the matrices $\mathbf{K}$, $\mathbf{R}_D$, and $\mathbf{R}_A$ are all independent of $m$ and can be moved outside the sum, this simplifies to
\begin{equation}
0 = (\mathbf{K} - \mathbf{R}_D)\langle \vec{m}^0 \rangle
- \gamma \langle \vec{m} \rangle
+ \mathbf{R}_A \sum_{m=0}^\infty \vec{p}(m-1)
+ \gamma \sum_{m=0}^\infty (m+1)\,\vec{p}(m+1).
\tag{S20}
\end{equation}
The last two terms can be handled by reindexing the summations, transforming them to match the definitions of $\langle \vec{m}^0 \rangle$ and $\langle \vec{m} \rangle$. For the first, noting $\vec{p}(-1) = 0$ since negative numbers of mRNA are nonsensical, we have
\begin{equation}
\sum_{m=0}^\infty \vec{p}(m-1)
= \sum_{m=-1}^\infty \vec{p}(m)
= \sum_{m=0}^\infty \vec{p}(m)
= \langle \vec{m}^0 \rangle.
\tag{S21}
\end{equation}
Similarly for the second,
\begin{equation}
\sum_{m=0}^\infty (m+1)\,\vec{p}(m+1)
= \sum_{m=1}^\infty m\,\vec{p}(m)
= \sum_{m=0}^\infty m\,\vec{p}(m)
= \langle \vec{m} \rangle,
\tag{S22}
\end{equation}
which holds since in extending the lower limit from $m=1$ to $m=0$, the extra term we added to the sum is zero. Substituting these back into Eq. S20, we have
\begin{equation}
0 = (\mathbf{K} - \mathbf{R}_D)\langle \vec{m}^0 \rangle
- \gamma \langle \vec{m} \rangle
+ \mathbf{R}_A \langle \vec{m}^0 \rangle
+ \gamma \langle \vec{m} \rangle,
\tag{S23}
\end{equation}
or simply
\begin{equation}
0 = (\mathbf{K} - \mathbf{R}_D + \mathbf{R}_A)\langle \vec{m}^0 \rangle.
\tag{S24}
\end{equation}
So given matrices $\mathbf{K}$, $\mathbf{R}_D$, and $\mathbf{R}_A$ describing a promoter, finding $\langle \vec{m}^0 \rangle$ simply amounts to solving this set of linear equations, subject to the normalization constraint $\vec{1}^\dagger \cdot \langle \vec{m}^0 \rangle = 1$. It will turn out to be the case that, if the matrix $\mathbf{K} - \mathbf{R}_D + \mathbf{R}_A$ is $n$ dimensional, it will always have only $n-1$ linearly independent rows; the normalization condition supplies the $n$-th linearly independent equation, ensuring a unique solution. So when using this equation to solve for $\langle \vec{m}^0 \rangle$, we may always drop one row of the matrix equation at our convenience and supplement the system with the normalization condition instead. The reader may find it illuminating to skip ahead and see Eq. S24 in use with concrete examples, e.g., Eq. S52 and what follows it, before continuing on through the general formulas for moments.

S1.3.2 First moments $\langle \vec{m} \rangle$ and $\langle m \rangle$

By analogy to the above procedure for finding $\langle \vec{m}^0 \rangle$, we may find $\langle \vec{m} \rangle$ by first multiplying Eq. S12 by $m$ and then summing over $m$. Carrying out this procedure we have
\begin{equation}
0 = \sum_{m=0}^\infty m \left[
(\mathbf{K} - \mathbf{R}_D - \gamma m \mathbf{I})\,\vec{p}(m)
+ \mathbf{R}_A\, \vec{p}(m-1)
+ \gamma(m+1)\mathbf{I}\,\vec{p}(m+1) \right],
\tag{S25}
\end{equation}
and now identifying $\langle \vec{m} \rangle$ and $\langle \vec{m}^2 \rangle$ gives
\begin{equation}
0 = (\mathbf{K} - \mathbf{R}_D)\langle \vec{m} \rangle
- \gamma \langle \vec{m}^2 \rangle
+ \mathbf{R}_A \sum_{m=0}^\infty m\,\vec{p}(m-1)
+ \gamma \sum_{m=0}^\infty m(m+1)\,\vec{p}(m+1).
\tag{S26}
\end{equation}
The summations in the last two terms can be reindexed just as we did for $\langle \vec{m}^0 \rangle$, freely adding or removing terms from the sum which are zero. For the first term we find
\begin{equation}
\sum_{m=0}^\infty m\,\vec{p}(m-1)
= \sum_{m=-1}^\infty (m+1)\,\vec{p}(m)
= \sum_{m=0}^\infty (m+1)\,\vec{p}(m)
= \langle \vec{m} \rangle + \langle \vec{m}^0 \rangle,
\tag{S27}
\end{equation}
and similarly for the second,
\begin{equation}
\sum_{m=0}^\infty m(m+1)\,\vec{p}(m+1)
= \sum_{m=1}^\infty (m-1)m\,\vec{p}(m)
= \sum_{m=0}^\infty (m-1)m\,\vec{p}(m)
= \langle \vec{m}^2 \rangle - \langle \vec{m} \rangle.
\tag{S28}
\end{equation}
Substituting back into Eq. S26 then produces
\begin{equation}
0 = (\mathbf{K} - \mathbf{R}_D)\langle \vec{m} \rangle
- \gamma \langle \vec{m}^2 \rangle
+ \mathbf{R}_A \left( \langle \vec{m} \rangle + \langle \vec{m}^0 \rangle \right)
+ \gamma \left( \langle \vec{m}^2 \rangle - \langle \vec{m} \rangle \right),
\tag{S29}
\end{equation}
or after simplifying,
\begin{equation}
0 = (\mathbf{K} - \mathbf{R}_D + \mathbf{R}_A - \gamma \mathbf{I})\langle \vec{m} \rangle
+ \mathbf{R}_A \langle \vec{m}^0 \rangle.
\tag{S30}
\end{equation}
So like $\langle \vec{m}^0 \rangle$, $\langle \vec{m} \rangle$ is also found by simply solving a set of linear equations after first solving for $\langle \vec{m}^0 \rangle$ from Eq. S24.

Next we can find the mean mRNA copy number $\langle m \rangle$ from $\langle \vec{m} \rangle$ according to
\begin{equation}
\langle m \rangle = \vec{1}^\dagger \cdot \langle \vec{m} \rangle,
\tag{S31}
\end{equation}
where $\vec{1}^\dagger$ is a row vector whose elements are all 1. Eq. S31 holds since the $i$-th element of the column vector $\langle \vec{m} \rangle$ is the mean mRNA value conditional on the system occupying the $i$-th promoter state, so the dot product with $\vec{1}^\dagger$ amounts to simply summing the elements of $\langle \vec{m} \rangle$. Rather than solving Eq. S30 for $\langle \vec{m} \rangle$ and then computing $\vec{1}^\dagger \cdot \langle \vec{m} \rangle$, we may take a shortcut by multiplying Eq. S30 by $\vec{1}^\dagger$ first. The key observation that makes this useful is that
\begin{equation}
\vec{1}^\dagger \cdot (\mathbf{K} - \mathbf{R}_D + \mathbf{R}_A) = 0.
\tag{S32}
\end{equation}
Intuitively, this equality holds because each column of this matrix represents transitions to and from a given promoter state.
In any given column, the diagonal encodes all departing transitions and the off-diagonals encode all arriving transitions. Conservation of probability means that each column sums to zero, and summing columns is exactly the operation that multiplying by $\vec{1}^\dagger$ carries out.

Proceeding then to multiply Eq. S30 by $\vec{1}^\dagger$ produces
\begin{equation}
0 = -\gamma\, \vec{1}^\dagger \cdot \langle \vec{m} \rangle
+ \vec{1}^\dagger \cdot \mathbf{R}_A \langle \vec{m}^0 \rangle,
\tag{S33}
\end{equation}
or simply
\begin{equation}
\langle m \rangle = \frac{1}{\gamma}\, \vec{1}^\dagger \cdot \mathbf{R}_A \langle \vec{m}^0 \rangle.
\tag{S34}
\end{equation}
We note that in equilibrium models of transcription such as in Figure 1, it is usually assumed that the mean mRNA level is given by the ratio of initiation rate $r$ to degradation rate $\gamma$, multiplied by the probability of finding the system in the RNAP-bound state. Reassuringly, we have recovered exactly this result from the master equation picture: the product $\vec{1}^\dagger \cdot \mathbf{R}_A \langle \vec{m}^0 \rangle$ picks out the probability of the active promoter state from $\langle \vec{m}^0 \rangle$ and multiplies it by the initiation rate $r$.

S1.3.3 Second moment $\langle m^2 \rangle$ and Fano factor $\nu$

Continuing the pattern of the zeroth and first moments, we now find $\langle \vec{m}^2 \rangle$ by multiplying Eq. S12 by $m^2$ and then summing over $m$, which explicitly is
\begin{equation}
0 = \sum_{m=0}^\infty m^2 \left[
(\mathbf{K} - \mathbf{R}_D - \gamma m \mathbf{I})\,\vec{p}(m)
+ \mathbf{R}_A\, \vec{p}(m-1)
+ \gamma(m+1)\mathbf{I}\,\vec{p}(m+1) \right].
\tag{S35}
\end{equation}
Identifying the moments $\langle \vec{m}^2 \rangle$ and $\langle \vec{m}^3 \rangle$ in the first term simplifies this to
\begin{equation}
0 = (\mathbf{K} - \mathbf{R}_D)\langle \vec{m}^2 \rangle
- \gamma \langle \vec{m}^3 \rangle
+ \mathbf{R}_A \sum_{m=0}^\infty m^2\,\vec{p}(m-1)
+ \gamma \sum_{m=0}^\infty m^2(m+1)\,\vec{p}(m+1).
\tag{S36}
\end{equation}
Reindexing the sums of the last two terms proceeds just as it did for the zeroth and first moments. Explicitly, we have
\begin{equation}
\sum_{m=0}^\infty m^2\,\vec{p}(m-1)
= \sum_{m=-1}^\infty (m+1)^2\,\vec{p}(m)
= \sum_{m=0}^\infty (m+1)^2\,\vec{p}(m)
= \langle \vec{m}^2 \rangle + 2\langle \vec{m} \rangle + \langle \vec{m}^0 \rangle
\tag{S37}
\end{equation}
for the first sum and
\begin{equation}
\sum_{m=0}^\infty m^2(m+1)\,\vec{p}(m+1)
= \sum_{m=1}^\infty (m-1)^2 m\,\vec{p}(m)
= \sum_{m=0}^\infty (m-1)^2 m\,\vec{p}(m)
= \langle \vec{m}^3 \rangle - 2\langle \vec{m}^2 \rangle + \langle \vec{m} \rangle
\tag{S38}
\end{equation}
for the second. Substituting the results of the sums back into Eq. S36 gives
\begin{equation}
0 = (\mathbf{K} - \mathbf{R}_D)\langle \vec{m}^2 \rangle
- \gamma \langle \vec{m}^3 \rangle
+ \mathbf{R}_A \left( \langle \vec{m}^2 \rangle + 2\langle \vec{m} \rangle + \langle \vec{m}^0 \rangle \right)
+ \gamma \left( \langle \vec{m}^3 \rangle - 2\langle \vec{m}^2 \rangle + \langle \vec{m} \rangle \right),
\tag{S39}
\end{equation}
and after grouping like powers of $m$ we have
\begin{equation}
0 = (\mathbf{K} - \mathbf{R}_D + \mathbf{R}_A - 2\gamma \mathbf{I})\langle \vec{m}^2 \rangle
+ (2\mathbf{R}_A + \gamma \mathbf{I})\langle \vec{m} \rangle
+ \mathbf{R}_A \langle \vec{m}^0 \rangle.
\tag{S40}
\end{equation}
As we found when computing $\langle m \rangle$ from $\langle \vec{m} \rangle$, we can spare ourselves some algebra by multiplying Eq. S40 by $\vec{1}^\dagger$, which then reduces to
\begin{equation}
0 = -2\gamma \langle m^2 \rangle
+ \vec{1}^\dagger \cdot (2\mathbf{R}_A + \gamma \mathbf{I})\langle \vec{m} \rangle
+ \vec{1}^\dagger \cdot \mathbf{R}_A \langle \vec{m}^0 \rangle,
\tag{S41}
\end{equation}
and noting from Eq. S34 that $\vec{1}^\dagger \cdot \mathbf{R}_A \langle \vec{m}^0 \rangle = \gamma \langle m \rangle$, we have the tidy result
\begin{equation}
\langle m^2 \rangle = \langle m \rangle + \frac{1}{\gamma}\, \vec{1}^\dagger \cdot \mathbf{R}_A \langle \vec{m} \rangle.
\tag{S42}
\end{equation}
Finally we have all the preliminary results needed to write a general expression for the Fano factor $\nu$.
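The recipe just derived, solve Eq. S24 for $\langle \vec{m}^0 \rangle$, then Eq. S30 for $\langle \vec{m} \rangle$, then evaluate Eqs. S34 and S42, is mechanical enough to express in a few lines of code. The sketch below is our own illustration (not code from the paper); the matrix conventions follow Eqs. S9-S10, and the example rates in the test are assumptions.

```python
import numpy as np

# Direct implementation of the general moment equations: Eq. S24 for the
# promoter state probabilities <m^0>, Eq. S30 for the conditional first
# moments <m_vec>, then Eqs. S34 and S42 for <m> and <m^2>, from which
# the Fano factor (variance over mean) follows.
def moments(K, RA, RD, gamma):
    n = K.shape[0]
    ones = np.ones(n)
    A = K - RD + RA
    # Eq. S24 with one redundant row replaced by the normalization 1.p0 = 1
    M0 = A.copy()
    M0[-1, :] = 1.0
    b0 = np.zeros(n)
    b0[-1] = 1.0
    p0 = np.linalg.solve(M0, b0)              # <m^0>
    # Eq. S30: (K - RD + RA - gamma I) <m_vec> = -RA <m^0>
    m1 = np.linalg.solve(A - gamma * np.eye(n), -RA @ p0)
    mean = ones @ RA @ p0 / gamma             # Eq. S34
    second = mean + ones @ RA @ m1 / gamma    # Eq. S42
    fano = (second - mean**2) / mean
    return mean, fano
```

For model 1 with illustrative rates $k_R^+ = k_R^- = \gamma = 1$ and $r = 5$ (so $\mathbf{K} = \begin{psmallmatrix} -1 & 1 \\ 1 & -1 \end{psmallmatrix}$ and $\mathbf{R}_A = \mathbf{R}_D = \begin{psmallmatrix} 0 & 0 \\ 0 & 5 \end{psmallmatrix}$), this returns $\langle m \rangle = 2.5$ and $\nu = 11/6$, in agreement with the closed-form results worked out in the following subsections.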
The Fano factor is defined as the ratio of variance to mean, which can be written as
\begin{equation}
\nu = \frac{\langle m^2 \rangle - \langle m \rangle^2}{\langle m \rangle}
= \frac{\langle m \rangle + \frac{1}{\gamma}\,\vec{1}^\dagger \cdot \mathbf{R}_A \langle \vec{m} \rangle - \langle m \rangle^2}{\langle m \rangle}
\tag{S43}
\end{equation}
and simplified to
\begin{equation}
\nu = 1 - \langle m \rangle + \frac{\vec{1}^\dagger \cdot \mathbf{R}_A \langle \vec{m} \rangle}{\gamma \langle m \rangle}.
\tag{S44}
\end{equation}
Note a subtle notational trap here: $\langle m \rangle = \frac{1}{\gamma}\,\vec{1}^\dagger \cdot \mathbf{R}_A \langle \vec{m}^0 \rangle$, rather than the by-eye similar but wrong expression $\langle m \rangle \neq \frac{1}{\gamma}\,\vec{1}^\dagger \cdot \mathbf{R}_A \langle \vec{m} \rangle$, so the last term in Eq. S44 is in general quite nontrivial. For a generic promoter, Eq. S44 may be greater than, less than, or equal to one, as asserted in Section 3. We have not found the general form Eq. S44 terribly intuitive and instead defer discussion to specific examples.

S1.3.4 Summary of general results
For ease of reference, we collect and reprint here the key results derived in this section that are used in the main text and subsequent subsections. The mean mRNA copy number and Fano factor are given by Eqs. S34 and S44, which are
\begin{equation}
\langle m \rangle = \frac{1}{\gamma}\, \vec{1}^\dagger \cdot \mathbf{R}_A \langle \vec{m}^0 \rangle
\tag{S45}
\end{equation}
and
\begin{equation}
\nu = 1 - \langle m \rangle + \frac{\vec{1}^\dagger \cdot \mathbf{R}_A \langle \vec{m} \rangle}{\gamma \langle m \rangle},
\tag{S46}
\end{equation}
respectively. To compute these two quantities, we need the expressions for $\langle \vec{m}^0 \rangle$ and $\langle \vec{m} \rangle$ given by solving Eqs. S24 and S30, respectively, which are
\begin{equation}
(\mathbf{K} - \mathbf{R}_D + \mathbf{R}_A)\langle \vec{m}^0 \rangle = 0
\tag{S47}
\end{equation}
and
\begin{equation}
(\mathbf{K} - \mathbf{R}_D + \mathbf{R}_A - \gamma \mathbf{I})\langle \vec{m} \rangle = -\mathbf{R}_A \langle \vec{m}^0 \rangle.
\tag{S48}
\end{equation}
Some comments are in order before we consider particular models. First, note that to obtain $\langle m \rangle$ and $\nu$, we need not bother solving for all components of the vectors $\langle \vec{m}^0 \rangle$ and $\langle \vec{m} \rangle$, but only the components which are multiplied by nonzero elements of $\mathbf{R}_A$. The only component of $\langle \vec{m}^0 \rangle$ that ever survives is the transcriptionally active state, and for the models we consider here, there is only ever one such state. This will save us some amount of algebra below.

Also note that we are computing Fano factors to verify the results of Section 3, concerning the constitutive promoter models in Figure 2 which are analogs of the simple repression models in Figure 1. We can translate the matrices from the simple repression models to the constitutive case by simply substituting zero for all occurrences of repressor rates and removing the row and column corresponding to the repressor bound state. The results for $\langle m \rangle$ computed in the repressed case can be easily translated to the constitutive case, rather than recalculating from scratch, by taking the limit $k_R^+ \to 0$, since this amounts to sending the repressor copy number to zero.

Finally, we point out that it would be possible to compute $\langle \vec{m}^0 \rangle$ more simply using the diagram methods of King and Altman [40] (also independently discovered by Hill [41]). But to our knowledge this method cannot be applied to compute $\langle \vec{m} \rangle$ or $\nu$, so since we would need to resort to solving the matrix equations anyway for $\langle \vec{m} \rangle$, we choose not to introduce the extra conceptual burden of the diagram methods simply for computing $\langle \vec{m}^0 \rangle$.

S1.4 Nonequilibrium Model One - Poisson Promoter
S1.4.1 Mean mRNA
For nonequilibrium model 1 in Figure 1, we have already shown the full master equation in Eq. 10 and Eq. S6, but for completeness we reprint it again as
\begin{equation}
\begin{split}
\frac{d}{dt} p_R(m,t) &=
- \overbrace{k_R^- p_R(m,t)}^{R\to U}
+ \overbrace{k_R^+ p_U(m,t)}^{U\to R}
+ \overbrace{(m+1)\gamma\, p_R(m+1,t)}^{m+1\to m}
- \overbrace{\gamma m\, p_R(m,t)}^{m\to m-1} \\
\frac{d}{dt} p_U(m,t) &=
\overbrace{k_R^- p_R(m,t)}^{R\to U}
- \overbrace{k_R^+ p_U(m,t)}^{U\to R}
+ \overbrace{r p_U(m-1,t)}^{m-1\to m}
- \overbrace{r p_U(m,t)}^{m\to m+1}
+ \overbrace{(m+1)\gamma\, p_U(m+1,t)}^{m+1\to m}
- \overbrace{\gamma m\, p_U(m,t)}^{m\to m-1}.
\end{split}
\tag{S49}
\end{equation}
This is a direct transcription of the states and rates in Figure 1. It may be converted to the matrix form of the master equation shown in Eq. S11 with matrices
\begin{equation}
\vec{p}(m) = \begin{pmatrix} p_R(m) \\ p_U(m) \end{pmatrix},
\qquad
\mathbf{K} = \begin{pmatrix} -k_R^- & k_R^+ \\ k_R^- & -k_R^+ \end{pmatrix},
\qquad
\mathbf{R} = \begin{pmatrix} 0 & 0 \\ 0 & r \end{pmatrix},
\tag{S50}
\end{equation}
where $\mathbf{R}_A$ and $\mathbf{R}_D$ are equal, so we drop the subscript and denote both simply by $\mathbf{R}$. Since our interest is only in steady state, we dropped the time dependence as well.

First we need $\langle \vec{m}^0 \rangle$. Label its components as $p_R$ and $p_U$, the probabilities of finding the system in either promoter state, and note that only $p_U$ survives multiplication by $\mathbf{R}$, since
\begin{equation}
\mathbf{R}\, \langle \vec{m}^0 \rangle
= \begin{pmatrix} 0 & 0 \\ 0 & r \end{pmatrix}
\begin{pmatrix} p_R \\ p_U \end{pmatrix}
= \begin{pmatrix} 0 \\ r\,p_U \end{pmatrix},
\tag{S51}
\end{equation}
so we need not bother finding $p_R$. Then we have
\begin{equation}
(\mathbf{K} - \mathbf{R}_D + \mathbf{R}_A)\langle \vec{m}^0 \rangle
= \begin{pmatrix} -k_R^- & k_R^+ \\ k_R^- & -k_R^+ \end{pmatrix}
\begin{pmatrix} p_R \\ p_U \end{pmatrix} = 0.
\tag{S52}
\end{equation}
As mentioned earlier in Section S1.3.1, the two rows are linearly dependent, so taking only the first row and using normalization to set $p_R = 1 - p_U$ gives
\begin{equation}
-k_R^- (1 - p_U) + k_R^+ p_U = 0,
\tag{S53}
\end{equation}
which is easily solved to find
\begin{equation}
p_U = \frac{k_R^-}{k_R^- + k_R^+}.
\tag{S54}
\end{equation}
Substituting this into Eq. S51, and the result of that into Eq. S45, we have
\begin{equation}
\langle m \rangle = \frac{r}{\gamma}\, \frac{k_R^-}{k_R^- + k_R^+},
\tag{S55}
\end{equation}
as asserted in Eq. 13 of the main text.

S1.4.2 Fano factor
To verify that the Fano factor for model 1 in Figure 2(A) is in fact 1 as claimed in the main text, note that in this limit $p_U = 1$ and $\langle m \rangle = r/\gamma$. All elements of $\mathbf{K}$ are zero, and $\mathbf{R}_A - \mathbf{R}_D = 0$, so Eq. S48 reduces to
\begin{equation}
-\gamma \langle \vec{m} \rangle = -r,
\tag{S56}
\end{equation}
or, in other words, since there is only one promoter state, $\langle \vec{m} \rangle = \langle m \rangle$. Then it follows that
\begin{equation}
\nu = 1 - \frac{r}{\gamma} + \frac{r \langle m \rangle}{\gamma \langle m \rangle} = 1,
\tag{S57}
\end{equation}
as claimed.

S1.5 Nonequilibrium Model Two - RNAP Bound and Unbound States
S1.5.1 Mean mRNA
As shown earlier, the full master equation for model 2 in Figure 1 is
\begin{equation}
\begin{split}
\frac{d}{dt} p_R(m,t) &=
- \overbrace{k_R^- p_R(m,t)}^{R\to U}
+ \overbrace{k_R^+ p_E(m,t)}^{U\to R}
+ \overbrace{(m+1)\gamma\, p_R(m+1,t)}^{m+1\to m}
- \overbrace{\gamma m\, p_R(m,t)}^{m\to m-1} \\
\frac{d}{dt} p_E(m,t) &=
\overbrace{k_R^- p_R(m,t)}^{R\to U}
- \overbrace{k_R^+ p_E(m,t)}^{U\to R}
+ \overbrace{(m+1)\gamma\, p_E(m+1,t)}^{m+1\to m}
- \overbrace{\gamma m\, p_E(m,t)}^{m\to m-1} \\
&\quad
+ \overbrace{k_P^- p_P(m,t)}^{A\to U}
- \overbrace{k_P^+ p_E(m,t)}^{U\to A}
+ \overbrace{r p_P(m-1,t)}^{m-1\to m,\; A\to U} \\
\frac{d}{dt} p_P(m,t) &=
- \overbrace{k_P^- p_P(m,t)}^{A\to U}
+ \overbrace{k_P^+ p_E(m,t)}^{U\to A}
+ \overbrace{(m+1)\gamma\, p_P(m+1,t)}^{m+1\to m}
- \overbrace{\gamma m\, p_P(m,t)}^{m\to m-1}
- \overbrace{r p_P(m,t)}^{m\to m+1,\; A\to U},
\end{split}
\tag{S58}
\end{equation}
which can be condensed to the matrix form of Eq. S11 with matrices given by
\begin{equation}
\mathbf{K} =
\begin{pmatrix}
-k_R^- & k_R^+ & 0 \\
k_R^- & -k_R^+ - k_P^+ & k_P^- \\
0 & k_P^+ & -k_P^-
\end{pmatrix},
\qquad
\mathbf{R}_A =
\begin{pmatrix}
0 & 0 & 0 \\
0 & 0 & r \\
0 & 0 & 0
\end{pmatrix},
\qquad
\mathbf{R}_D =
\begin{pmatrix}
0 & 0 & 0 \\
0 & 0 & 0 \\
0 & 0 & r
\end{pmatrix}.
\tag{S59}
\end{equation}
As for model 1, we must first find $\mathbf{R}_A \langle \vec{m}^0 \rangle$. Denote the components of $\langle \vec{m}^0 \rangle$ as $p_R$, $p_E$, and $p_P$, the probabilities of being found in the repressor bound, empty, or RNAP-bound states, respectively. Only $p_P$ is necessary to find, since
\begin{equation}
\mathbf{R}_A \langle \vec{m}^0 \rangle = \begin{pmatrix} 0 \\ r\, p_P \\ 0 \end{pmatrix}.
\tag{S60}
\end{equation}
Then Eq. S47 for $\langle \vec{m}^0 \rangle$ reads
\begin{equation}
\begin{pmatrix}
-k_R^- & k_R^+ & 0 \\
k_R^- & -k_R^+ - k_P^+ & k_P^- + r \\
0 & k_P^+ & -k_P^- - r
\end{pmatrix}
\begin{pmatrix} p_R \\ p_E \\ p_P \end{pmatrix} = 0.
\tag{S61}
\end{equation}
Discarding the middle row as redundant and incorporating the normalization condition leads to a set of three linearly independent equations, namely
\begin{align}
-k_R^- p_R + k_R^+ p_E &= 0, \tag{S62} \\
k_P^+ p_E + (-k_P^- - r)\, p_P &= 0, \tag{S63} \\
p_R + p_E + p_P &= 1. \tag{S64}
\end{align}
Using $p_R = 1 - p_E - p_P$ to eliminate $p_R$ in the first and solving the resulting equation for $p_E$ gives $p_E = (1 - p_P)\, k_R^-/(k_R^- + k_R^+)$. Substituting this for $p_E$ in the second equation gives an equation in $p_P$ alone, which is
\begin{equation}
k_P^+ k_R^- (1 - p_P) - (k_P^- + r)(k_R^+ + k_R^-)\, p_P = 0,
\tag{S65}
\end{equation}
and solving for $p_P$ gives
\begin{equation}
p_P = \frac{k_P^+ k_R^-}{k_P^+ k_R^- + (k_P^- + r)(k_R^+ + k_R^-)}.
\tag{S66}
\end{equation}
Substituting this into Eq. S60 and summing the components produces
\begin{equation}
\vec{1}^\dagger \cdot \mathbf{R}_A \langle \vec{m}^0 \rangle
= r\, \frac{k_P^+ k_R^-}{k_P^+ k_R^- + (k_P^- + r)(k_R^+ + k_R^-)},
\tag{S67}
\end{equation}
from which $\langle m \rangle$ follows readily,
\begin{equation}
\langle m \rangle = \frac{r}{\gamma}\, \frac{k_P^+ k_R^-}{k_P^+ k_R^- + (k_P^- + r)(k_R^+ + k_R^-)},
\tag{S68}
\end{equation}
as claimed in Eq. 18 in the main text.

S1.5.2 Fano factor
To compute the Fano factor, we first remove the repressor bound state from the matrices describing the model, which reduce to
\begin{equation}
\mathbf{K} = \begin{pmatrix} -k_P^+ & k_P^- \\ k_P^+ & -k_P^- \end{pmatrix},
\qquad
\mathbf{R}_A = \begin{pmatrix} 0 & r \\ 0 & 0 \end{pmatrix},
\qquad
\mathbf{R}_D = \begin{pmatrix} 0 & 0 \\ 0 & r \end{pmatrix}.
\tag{S69}
\end{equation}
Similarly we remove the repressor bound state from $\mathbf{R}_A \langle \vec{m}^0 \rangle$ and take the $k_R^+ \to 0$ limit, giving
\begin{equation}
\mathbf{R}_A \langle \vec{m}^0 \rangle
= r\, \frac{k_P^+}{k_P^+ + k_P^- + r} \begin{pmatrix} 1 \\ 0 \end{pmatrix}.
\tag{S70}
\end{equation}
Then we must compute $\langle \vec{m} \rangle$ from Eq. S48, which with these matrices reads
\begin{equation}
(\mathbf{K} - \mathbf{R}_D + \mathbf{R}_A - \gamma \mathbf{I})\langle \vec{m} \rangle
= \begin{pmatrix} -k_P^+ - \gamma & k_P^- + r \\ k_P^+ & -k_P^- - r - \gamma \end{pmatrix}
\begin{pmatrix} m_E \\ m_P \end{pmatrix}
= -r\, \frac{k_P^+}{k_P^+ + k_P^- + r} \begin{pmatrix} 1 \\ 0 \end{pmatrix},
\tag{S71}
\end{equation}
where we labeled the components of $\langle \vec{m} \rangle$ as $m_E$ and $m_P$, since they are the mean mRNA counts conditional upon the system residing in the empty or polymerase bound states, respectively. Unlike for $\langle \vec{m}^0 \rangle$, the rows of this matrix are linearly independent, so we simply solve the matrix equation as is. We can immediately eliminate $m_E$ since $m_E = m_P\, (k_P^- + r + \gamma)/k_P^+$ from the second row, and substituting into the first row gives an equation for $m_P$ alone, which is
\begin{equation}
\left[ -(k_P^+ + \gamma)(k_P^- + r + \gamma) + k_P^+ (k_P^- + r) \right] m_P
= -\frac{r\, (k_P^+)^2}{k_P^+ + k_P^- + r}.
\tag{S72}
\end{equation}
Expanding the products cancels several terms, and solving for $m_P$ gives
\begin{equation}
m_P = \frac{r\, (k_P^+)^2}{\gamma\, (k_P^+ + k_P^- + r)(k_P^+ + k_P^- + r + \gamma)}.
\tag{S73}
\end{equation}
Note then that $\vec{1}^\dagger \cdot \mathbf{R}_A \langle \vec{m} \rangle = r\, m_P$. We also need the constitutive limit of $\langle m \rangle$ from Eq. S68, again found by taking $k_R^+ \to 0$, which is
\begin{equation}
\langle m \rangle = \frac{r}{\gamma}\, \frac{k_P^+}{k_P^+ + k_P^- + r},
\tag{S74}
\end{equation}
and substituting this along with $\vec{1}^\dagger \cdot \mathbf{R}_A \langle \vec{m} \rangle = r\, m_P$ into Eq. S46 for the Fano factor $\nu$, we find
\begin{equation}
\nu = 1 - \frac{r}{\gamma}\, \frac{k_P^+}{k_P^+ + k_P^- + r}
+ \frac{r}{\gamma}\, \frac{r\, (k_P^+)^2}{\gamma\, (k_P^+ + k_P^- + r)(k_P^+ + k_P^- + r + \gamma)}
\left( \frac{r}{\gamma}\, \frac{k_P^+}{k_P^+ + k_P^- + r} \right)^{-1}.
\tag{S75}
\end{equation}
This simplifies to
\begin{equation}
\nu = 1 - \frac{r}{\gamma}
\left( \frac{k_P^+}{k_P^+ + k_P^- + r} - \frac{k_P^+}{k_P^+ + k_P^- + r + \gamma} \right),
\tag{S76}
\end{equation}
which further simplifies to
\begin{equation}
\nu = 1 - \frac{r\, k_P^+}{(k_P^+ + k_P^- + r)(k_P^+ + k_P^- + r + \gamma)},
\tag{S77}
\end{equation}
exactly Eq. 36 in the main text.

S1.6 Nonequilibrium Model Three - Multistep Transcription Initiation and Escape
S1.6.1 Mean mRNA
In close analogy to model 2 above, nonequilibrium model 3 from Figure 1(C) can be described by our generic master equation Eq. S11 with promoter transition matrix given by
\begin{equation}
\mathbf{K} =
\begin{pmatrix}
-k_R^- & k_R^+ & 0 & 0 \\
k_R^- & -k_R^+ - k_P^+ & k_P^- & 0 \\
0 & k_P^+ & -k_P^- - k_O & 0 \\
0 & 0 & k_O & 0
\end{pmatrix}
\tag{S78}
\end{equation}
and transcription matrices given by
\begin{equation}
\mathbf{R}_A =
\begin{pmatrix}
0 & 0 & 0 & 0 \\
0 & 0 & 0 & r \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix},
\qquad
\mathbf{R}_D =
\begin{pmatrix}
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & r
\end{pmatrix}.
\tag{S79}
\end{equation}
$\langle \vec{m}^0 \rangle$ is again given by Eq. S47, which in this case takes the form
\begin{equation}
(\mathbf{K} - \mathbf{R}_D + \mathbf{R}_A)\langle \vec{m}^0 \rangle =
\begin{pmatrix}
-k_R^- & k_R^+ & 0 & 0 \\
k_R^- & -k_R^+ - k_P^+ & k_P^- & r \\
0 & k_P^+ & -k_P^- - k_O & 0 \\
0 & 0 & k_O & -r
\end{pmatrix}
\begin{pmatrix} p_R \\ p_E \\ p_C \\ p_O \end{pmatrix} = 0,
\tag{S80}
\end{equation}
where the four components of $\langle \vec{m}^0 \rangle$ correspond to the four promoter states: repressor bound, empty, RNAP-bound closed complex, and RNAP-bound open complex. As explained in Section S1.3.1, we are free to discard one linearly dependent row from this matrix and replace it with the normalization condition $p_R + p_E + p_C + p_O = 1$. Using normalization to eliminate $p_R$ from the first row gives
\begin{equation}
p_E = (1 - p_C - p_O)\, \frac{k_R^-}{k_R^- + k_R^+}.
\tag{S81}
\end{equation}
If we substitute this in the third row, then the last two rows constitute two equations in $p_C$ and $p_O$, given by
\begin{align}
k_P^+ k_R^- (1 - p_C - p_O) - (k_P^- + k_O)(k_R^+ + k_R^-)\, p_C &= 0, \tag{S82} \\
k_O\, p_C - r\, p_O &= 0. \tag{S83}
\end{align}
Solving for $p_C = p_O\, r/k_O$ in the second and substituting into the first gives us our desired single equation in the single variable $p_O$, which is
\begin{equation}
k_P^+ k_R^-
- k_P^+ k_R^- \left( 1 + \frac{r}{k_O} \right) p_O
- (k_P^- + k_O)(k_R^+ + k_R^-)\, \frac{r}{k_O}\, p_O = 0,
\tag{S84}
\end{equation}
and solving for $p_O$ we find
\begin{equation}
p_O = \frac{k_P^+ k_R^- k_O}{k_P^+ k_R^- k_O + r k_P^+ k_R^- + r (k_P^- + k_O)(k_R^+ + k_R^-)}.
\tag{S85}
\end{equation}
Once again $p_O$, the probability of the transcriptionally active state, is the only component of $\langle \vec{m}^0 \rangle$ that survives multiplication by $\mathbf{R}_A$, with $\vec{1}^\dagger \cdot \mathbf{R}_A \langle \vec{m}^0 \rangle = r\, p_O$. So
\begin{equation}
\langle m \rangle = \frac{1}{\gamma}\, \vec{1}^\dagger \cdot \mathbf{R}_A \langle \vec{m}^0 \rangle
= \frac{r}{\gamma}\, \frac{k_P^+ k_R^- k_O}{k_P^+ k_R^- k_O + r k_P^+ k_R^- + r (k_P^- + k_O)(k_R^+ + k_R^-)},
\tag{S86}
\end{equation}
which equals Eq. 25 in the main text.

S1.6.2 Fano factor
To compute the Fano factor of the analogous constitutive promoter, we first excise the repressor states and rates from the problem. More precisely, we construct the matrix $(K - R_D + R_A - \gamma I)$ and substitute it into Eq. S48, which is now
$$(K - R_D + R_A - \gamma I)\langle\vec{m}\rangle = \begin{pmatrix} -k_P^+ - \gamma & k_P^- & r \\ k_P^+ & -k_P^- - k_O - \gamma & 0 \\ 0 & k_O & -r - \gamma \end{pmatrix}\begin{pmatrix} m_E \\ m_C \\ m_O \end{pmatrix} = \begin{pmatrix} -r p_O \\ 0 \\ 0 \end{pmatrix}, \tag{S87}$$
where we labeled the unbound, closed complex, and open complex components of $\langle\vec{m}\rangle$ as $m_E$, $m_C$, and $m_O$, respectively. $p_O$ is given by the limit of Eq. S85 as $k_R^+ \to 0$, which is
$$p_O = \frac{k_P^+ k_O}{k_P^+ (k_O + r) + r (k_P^- + k_O)} \equiv \frac{k_P^+ k_O}{Z}, \tag{S88}$$
where we define $Z$ for upcoming convenience, as this sum of terms will appear repeatedly. We can use the second equation to eliminate $m_E$, finding $m_E = m_C (k_P^- + k_O + \gamma)/k_P^+$, and the third to eliminate $m_C$, which is simply $m_C = m_O (r + \gamma)/k_O$. Substituting these both into the first equation gives a single equation for the variable of interest, $m_O$,
$$-(k_P^+ + \gamma)(k_P^- + k_O + \gamma)(r + \gamma)\, m_O + k_P^- k_P^+ (r + \gamma)\, m_O + r k_P^+ k_O\, m_O = -r k_P^+ k_O\, p_O, \tag{S89}$$
which is solved for $m_O$ to give
$$m_O = p_O\, \frac{r k_P^+ k_O}{(k_P^+ + \gamma)(k_P^- + k_O + \gamma)(r + \gamma) - r k_P^+ k_O - k_P^- k_P^+ (r + \gamma)}. \tag{S90}$$
Expanding the denominator and canceling terms leads to
$$m_O = p_O\, \frac{r}{\gamma}\, \frac{k_P^+ k_O}{Z + \gamma (k_P^+ + k_P^- + k_O + r) + \gamma^2}. \tag{S91}$$
Now $\vec{1}^\dagger \cdot R_A \langle\vec{m}\rangle = r m_O$, and $\langle m \rangle = r p_O/\gamma$, so if we substitute these two quantities into Eq. S46, we will readily obtain the Fano factor as
$$\nu = 1 - \langle m \rangle + \frac{\vec{1}^\dagger \cdot R_A \langle\vec{m}\rangle}{\gamma \langle m \rangle} = 1 - \frac{r}{\gamma}\, p_O + \frac{m_O}{p_O}. \tag{S92}$$
Substituting, we see that
$$\nu = 1 - \frac{r}{\gamma}\,\frac{k_P^+ k_O}{Z} + \frac{r}{\gamma}\,\frac{k_P^+ k_O}{Z + \gamma (k_P^+ + k_P^- + k_O + r) + \gamma^2}, \tag{S93}$$
and after simplifying, we obtain
$$\nu = 1 - \frac{r k_P^+ k_O}{Z}\, \frac{k_P^+ + k_P^- + k_O + r + \gamma}{Z + \gamma (k_P^+ + k_P^- + k_O + r) + \gamma^2}, \tag{S94}$$
as stated in Eq. 37 in the main text.

S1.6.3 Generalizing $\nu < 1$ to more fine-grained models

In the main text we argued that the convolution of multiple exponential distributions should be a distribution with a smaller fractional width than the corresponding exponential distribution with the same mean. This can be made more precise with a result from [76], who showed that the convolution of multiple gamma distributions (of which the exponential distribution is a special case) is, to a very good approximation, also gamma distributed. Using their Eq. 2 for the distribution of the convolution, with shape parameters set to 1 to give exponential distributions, the total waiting time distribution has a ratio of variance to squared mean
$$\frac{\sigma^2}{\mu^2} = \frac{\sum_i k_i^{-2}}{\left(\sum_i k_i^{-1}\right)^2},$$
where the $k_i$ are the rates of the individual steps. Clearly this is less than 1, and therefore the total waiting time distribution is narrower than the corresponding exponential.
We also claimed in the main text that for a process whose waiting time distribution is narrower than exponential, i.e., has $\sigma^2/\mu^2 < 1$, the distribution of counts should be less variable than a Poisson distribution, leading to a Fano factor $\nu < 1$. This we argue by analogy to photon statistics, where it is known that “antibunched” arrivals, in other words arrivals more uniformly distributed in time relative to uncorrelated arrivals, generally give rise to sub-Poissonian noise [77], [78]. Although loopholes to this result are known to exist, to our knowledge they appear to arise from uniquely quantum effects, so we do not expect they apply to our problem. Nevertheless we refrain from elevating this equivalence of kinetic cycles with sub-Poissonian noise to a “theorem.”
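The narrowing of the summed waiting time is easy to confirm by simulation. The sketch below uses a hypothetical three-step cycle with arbitrarily chosen rates, and compares a Monte Carlo estimate of $\sigma^2/\mu^2$ for the total waiting time against the closed form $\sum_i k_i^{-2}/(\sum_i k_i^{-1})^2$; both sit well below the value of 1 attained by a single exponential.

```python
import random

random.seed(0)
rates = [1.0, 2.0, 4.0]  # hypothetical rates of the sequential steps

# Closed form for the squared fractional width of the summed waiting time
exact = sum(1.0 / k**2 for k in rates) / sum(1.0 / k for k in rates) ** 2

# Monte Carlo: total waiting time is a sum of independent exponentials
n = 200_000
samples = [sum(random.expovariate(k) for k in rates) for _ in range(n)]
mu = sum(samples) / n
var = sum((t - mu) ** 2 for t in samples) / n
ratio = var / mu**2

print(round(exact, 3))            # 0.429, already < 1
print(abs(ratio - exact) < 0.03)  # Monte Carlo estimate agrees
```

For a single exponential step the same ratio is exactly 1, so any multi-step cycle of this kind is strictly narrower.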
S1.7 Nonequilibrium Model Four - “Active” and “Inactive” States
S1.7.1 Mean mRNA
The mathematical specification of this model is almost identical to model 2. The matrix $K$ is identical, as is $R_D$. The only difference is that now $R_A = R_D$, i.e., both are diagonal, in contrast to model 2. The steady-state equation for the promoter occupancies $\vec{p}$ is
$$\begin{pmatrix} -k_R^- & k_R^+ & 0 \\ k_R^- & -k_R^+ - k^+ & k^- \\ 0 & k^+ & -k^- \end{pmatrix}\begin{pmatrix} p_R \\ p_I \\ p_A \end{pmatrix} = 0. \tag{S95}$$
In fact we need not do this calculation explicitly and can instead recycle the calculation of mean mRNA $\langle m \rangle$ from model 2. The matrices are identical except for the relabeling $k^- \longleftrightarrow (k_P^- + r)$, and a careful look through the derivation of $\langle m \rangle$ for model 2 shows that the parameters $k_P^-$ and $r$ only ever appear as the sum $k_P^- + r$. So taking $\langle m \rangle$ from model 2, Eq. S68, and relabeling $(k_P^- + r) \to k^-$ gives us our answer for model four, simply
$$\langle m \rangle = \frac{r}{\gamma}\,\frac{k^+ k_R^-}{k^+ k_R^- + k^- (k_R^+ + k_R^-)}. \tag{S96}$$

S1.7.2 Fano factor
Likewise, for computing the Fano factor of this model we may take a shortcut. Consider the constitutive model four from Figure 2, for which we want to compute the Fano factor, and compare it to nonequilibrium model one of simple repression in Figure 1. Mathematically these are exactly the same model, just with rates labeled differently and the meaning of the promoter states interpreted differently. Furthermore, nonequilibrium model 1 from Figure 1 was the model considered by Jones et al. [36], who derived the Fano factor for that model to be
$$\nu = 1 + \frac{r k_R^+}{(k_R^+ + k_R^-)(k_R^+ + k_R^- + \gamma)}. \tag{S97}$$
So recognizing that the relabelings $k_R^+ \to k^-$ and $k_R^- \to k^+$ will translate this result to our model four from Figure 2, we can immediately write down our Fano factor as
$$\nu = 1 + \frac{r k^-}{(k^- + k^+)(k^- + k^+ + \gamma)}, \tag{S98}$$
as quoted in Eq. 38 and in Figure 2.

S2 Bursty promoter models - generating function solutions and numerics
S2.1 Constitutive promoter with bursts
S2.1.1 From master equation to generating function
The objective of this section is to write down the steady-state mRNA distribution for model 5 in Figure 2. Our claim is that this model is rich enough that it can capture the expression pattern of bacterial constitutive promoters. Figure S2 shows two different schematic representations of the model. Figure S2(A) shows the promoter cartoon model with burst initiation rate $k_i$, mRNA degradation rate $\gamma$, and mean burst size $b$. For our derivation of the chemical master equation we will focus more on Figure S2(B). This representation is intended to highlight that bursty gene expression allows transitions from an mRNA count $m'$ to a count $m$ even when $m - m' > 1$.

There are two possible paths to jump from an mRNA count $m' \neq m$ to a state $m$ in a small time window $\Delta t$:
1. By degradation of a single mRNA, jumping from $m + 1$ to $m$.
2. By producing $m - m'$ mRNA in a single burst, for $m' \in \{0, 1, \ldots, m-1\}$.

Figure S2. Bursty transcription for unregulated promoter. (A) Schematic of the one-state bursty transcription model. Rate $k_i$ is the bursty initiation rate, $\gamma$ is the mRNA degradation rate, and $b$ is the mean burst size. (B) Schematic depiction of the mRNA count state transitions. The model in (A) allows for transitions of size $m - m' \geq 1$, where the state jumps from having $m'$ mRNA to having $m$ mRNA in a single burst of gene expression.

For the “exit” paths from $m$ into $m' \neq m$ during a small time window $\Delta t$ we also have two possibilities:
1. By degradation of a single mRNA, jumping from $m$ to $m - 1$.
2. By producing $m' - m$ mRNA in a single burst, for $m' - m \in \{1, 2, \ldots\}$.

This implies that the probability of having $m$ mRNA at time $t + \Delta t$ can be written as
$$p(m, t + \Delta t) = p(m, t) + \gamma \Delta t\, (m+1)\, p(m+1, t) - \gamma \Delta t\, m\, p(m, t) + k_i \Delta t \sum_{m'=0}^{m-1} G_{m-m'}\, p(m', t) - k_i \Delta t \sum_{m'=m+1}^{\infty} G_{m'-m}\, p(m, t), \tag{S99}$$
where the four rate terms correspond, respectively, to degradation into $m$ from $m+1$, degradation out of $m$, bursts into $m$ from $m' \in \{0, 1, \ldots, m-1\}$, and bursts out of $m$ into $m' \in \{m+1, m+2, \ldots\}$. Here $G_{m'-m}$ denotes the probability of a burst of size $m' - m$, i.e., of the number of mRNAs jumping from $m$ to $m' > m$ due to a single transcription burst. We suggestively use the letter $G$ as we will assume that these burst sizes are geometrically distributed with parameter $\theta$. This is written as
$$G_k = \theta (1 - \theta)^k \quad \text{for } k \in \{0, 1, 2, \ldots\}. \tag{S100}$$
In Section 3 of the main text we derive this functional form for the burst size distribution. An intuitive way to think about it is that for transcription initiation events that take place instantaneously there are two competing possibilities: producing another mRNA with probability $(1 - \theta)$, or ending the burst with probability $\theta$. What this implies is that for a geometrically distributed burst size we have a mean burst size $b$ of the form
$$b \equiv \langle m' - m \rangle = \sum_{k=0}^{\infty} k\, \theta (1 - \theta)^k = \frac{1 - \theta}{\theta}. \tag{S101}$$
To clean up Equation S99 we can send the first term on the right hand side to the left, and divide both sides by $\Delta t$.
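As a quick aside, the geometric burst-size distribution of Eq. S100 and its mean, Eq. S101, can be checked with a few lines of simulation; the inverse-CDF sampler below is a standard trick, not something from the text, and the value of $\theta$ is arbitrary.

```python
import math
import random

random.seed(1)
theta = 0.25                   # burst-termination probability (arbitrary)
b_exact = (1 - theta) / theta  # Eq. S101: mean burst size, here 3.0

def geometric_burst():
    """Inverse-CDF sample of G_k = theta * (1 - theta)^k, k = 0, 1, 2, ..."""
    u = 1.0 - random.random()  # u in (0, 1], avoids log(0)
    return int(math.log(u) / math.log(1 - theta))

n = 200_000
b_est = sum(geometric_burst() for _ in range(n)) / n

# Partial sum of the series in Eq. S101 as an independent check
b_series = sum(k * theta * (1 - theta) ** k for k in range(1000))

print(b_exact)                         # 3.0
print(abs(b_series - b_exact) < 1e-9)  # series matches (1 - theta)/theta
print(abs(b_est - b_exact) < 0.05)     # simulated bursts match too
```

The same sampler is all one needs to simulate the full bursty master equation by Gillespie-type methods, though that is not required for the analytic route taken here.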
Upon taking the limit where $\Delta t \to 0$, this becomes
$$\frac{d}{dt} p(m, t) = (m+1)\gamma\, p(m+1, t) - m\gamma\, p(m, t) + k_i \sum_{m'=0}^{m-1} G_{m-m'}\, p(m', t) - k_i \sum_{m'=m+1}^{\infty} G_{m'-m}\, p(m, t). \tag{S102}$$
Furthermore, given that the timescale for this equation is set by the mRNA degradation rate $\gamma$, we can divide both sides by this rate, obtaining
$$\frac{d}{d\tau} p(m, \tau) = (m+1)\, p(m+1, \tau) - m\, p(m, \tau) + \lambda \sum_{m'=0}^{m-1} G_{m-m'}\, p(m', \tau) - \lambda \sum_{m'=m+1}^{\infty} G_{m'-m}\, p(m, \tau), \tag{S103}$$
where we defined $\tau \equiv t \times \gamma$ and $\lambda \equiv k_i/\gamma$. The last term in Eq. S103 sums over all burst sizes except for a burst of size zero. We can re-index the sum to include this term, obtaining
$$\lambda \sum_{m'=m+1}^{\infty} G_{m'-m}\, p(m, \tau) = \lambda\, p(m, \tau) \left[ \sum_{m'=m}^{\infty} G_{m'-m} - G_0 \right], \tag{S104}$$
where the sum has been re-indexed to include burst size zero and $G_0$ subtracts the extra added term. Given the normalization constraint of the geometric distribution, adding the probability of all possible burst sizes (including size zero, since we re-indexed the sum) allows us to write
$$\sum_{m'=m}^{\infty} G_{m'-m} - G_0 = 1 - G_0. \tag{S105}$$
Substituting this into Eq. S103 results in
$$\frac{d}{d\tau} p(m, \tau) = (m+1)\, p(m+1, \tau) - m\, p(m, \tau) + \lambda \sum_{m'=0}^{m-1} G_{m-m'}\, p(m', \tau) - \lambda\, p(m, \tau)\,[1 - G_0]. \tag{S106}$$
To finally arrive at a more compact version of the equation, notice that the third term in Eq. S106 includes bursts from size $m - m' = 1$ up to size $m - m' = m$. We can include the term $p(m, \tau)\, G_0$ in the sum, which allows bursts of size $m - m' = 0$. This results in our final form for the chemical master equation
$$\frac{d}{d\tau} p(m, \tau) = (m+1)\, p(m+1, \tau) - m\, p(m, \tau) - \lambda\, p(m, \tau) + \lambda \sum_{m'=0}^{m} G_{m-m'}\, p(m', \tau). \tag{S107}$$
In order to solve Eq.
S107 we will use the generating function method [79]. The probability generating function is defined as
$$F(z, t) = \sum_{m=0}^{\infty} z^m\, p(m, t), \tag{S108}$$
where $z$ is a dummy variable that will later help us obtain the moments of the distribution. Let us now multiply both sides of Eq. S107 by $z^m$ and sum over all $m$,
$$\sum_m z^m \frac{d}{d\tau} p(m, \tau) = \sum_m z^m \left[ -m\, p(m, \tau) + (m+1)\, p(m+1, \tau) + \lambda \sum_{m'=0}^{m} G_{m-m'}\, p(m', \tau) - \lambda\, p(m, \tau) \right], \tag{S109}$$
where we use $\sum_m \equiv \sum_{m=0}^{\infty}$. We can distribute the sum and use the definition of $F(z, t)$ to obtain
$$\frac{dF(z, \tau)}{d\tau} = -\sum_m z^m m\, p(m, \tau) + \sum_m z^m (m+1)\, p(m+1, \tau) + \lambda \sum_m z^m \sum_{m'=0}^{m} G_{m-m'}\, p(m', \tau) - \lambda F(z, \tau). \tag{S110}$$
We can make use of properties of the generating function to write everything in terms of $F(z, \tau)$: the first term on the right hand side of Eq. S110 can be rewritten as
$$\sum_m z^m \cdot m \cdot p(m, \tau) = \sum_m z \frac{\partial z^m}{\partial z}\, p(m, \tau) \tag{S111}$$
$$= \sum_m z \frac{\partial}{\partial z}\bigl( z^m p(m, \tau) \bigr) \tag{S112}$$
$$= z \frac{\partial}{\partial z}\Bigl( \sum_m z^m p(m, \tau) \Bigr) \tag{S113}$$
$$= z \frac{\partial F(z, \tau)}{\partial z}. \tag{S114}$$
For the second term on the right hand side of Eq. S110 we define $k \equiv m + 1$. This allows us to write
$$\sum_{m=0}^{\infty} z^m \cdot (m+1) \cdot p(m+1, \tau) = \sum_{k=1}^{\infty} z^{k-1} \cdot k \cdot p(k, \tau) \tag{S115}$$
$$= z^{-1} \sum_{k=1}^{\infty} z^k \cdot k \cdot p(k, \tau) \tag{S116}$$
$$= z^{-1} \sum_{k=0}^{\infty} z^k \cdot k \cdot p(k, \tau) \tag{S117}$$
$$= z^{-1} \left( z \frac{\partial F(z)}{\partial z} \right) \tag{S118}$$
$$= \frac{\partial F(z)}{\partial z}. \tag{S119}$$

Figure S3. Reindexing double sum. Schematic for reindexing the sum $\sum_{m=0}^{\infty} \sum_{m'=0}^{m}$. Blue circles depict the 2D grid of nonnegative integers restricted to the lower triangular part of the $(m, m')$ plane. The trick is that this double sum runs over all $(m, m')$ pairs with $m' \leq m$. Summing over $m$ first instead of $m'$ requires determining the boundary: the upper boundary of the $m'$-first double sum becomes the lower boundary of the $m$-first double sum.

The third term in Eq. S110 is the most trouble. The trick is to reverse the default order of the sums as
$$\sum_{m=0}^{\infty} \sum_{m'=0}^{m} = \sum_{m'=0}^{\infty} \sum_{m=m'}^{\infty}. \tag{S120}$$
To see the logic of this reindexing we point the reader to Figure S3. The key is to notice that the double sum $\sum_{m=0}^{\infty} \sum_{m'=0}^{m}$ is adding all possible pairs $(m, m')$ in the lower triangle, so we can add the terms vertically as the original sum indexing suggests, i.e.
$$\sum_{m=0}^{\infty} \sum_{m'=0}^{m} x_{(m, m')} = x_{(0,0)} + x_{(1,0)} + x_{(1,1)} + x_{(2,0)} + x_{(2,1)} + x_{(2,2)} + \ldots, \tag{S121}$$
where the variable $x$ is just a placeholder to indicate the order in which the sum is taking place. But we can also add the terms horizontally as
$$\sum_{m'=0}^{\infty} \sum_{m=m'}^{\infty} x_{(m, m')} = x_{(0,0)} + x_{(1,0)} + x_{(2,0)} + \ldots + x_{(1,1)} + x_{(2,1)} + \ldots, \tag{S122}$$
which still adds all of the lower triangle terms. Applying this reindexing results in
$$\lambda \sum_m z^m \sum_{m'=0}^{m} G_{m-m'}\, p(m', \tau) = \lambda \sum_{m'=0}^{\infty} \sum_{m=m'}^{\infty} z^m\, \theta (1-\theta)^{m-m'}\, p(m', \tau), \tag{S123}$$
where we also substituted the definition of the geometric distribution $G_k = \theta(1-\theta)^k$.
Redistributing the sums we can write
$$\lambda \sum_{m'=0}^{\infty} \sum_{m=m'}^{\infty} z^m\, \theta (1-\theta)^{m-m'}\, p(m', \tau) = \lambda \theta \sum_{m'=0}^{\infty} (1-\theta)^{-m'}\, p(m', \tau) \sum_{m=m'}^{\infty} [z(1-\theta)]^m. \tag{S124}$$
The next step requires us to look slightly ahead into what we expect to obtain. We are working on deriving an equation for the generating function $F(z, \tau)$ that, when solved, will allow us to compute what we care about, i.e. the probability function $p(m, \tau)$. Upon finding the function $F(z, \tau)$, we will recover this probability distribution by evaluating derivatives of $F(z, \tau)$ at $z = 0$, whereas we can evaluate derivatives of $F(z, \tau)$ at $z = 1$ to instead recover the moments of the distribution. The point here is that when the dust settles we will evaluate $z$ to be less than or equal to one. Furthermore, we know that the parameter of the geometric distribution $\theta$ must be strictly between zero and one. With these two facts we can safely state that $|z(1-\theta)| < 1$. Defining $n \equiv m - m'$ we rewrite the last sum in Eq. S124 as
$$\sum_{m=m'}^{\infty} [z(1-\theta)]^m = \sum_{n=0}^{\infty} [z(1-\theta)]^{n+m'} \tag{S125}$$
$$= [z(1-\theta)]^{m'} \sum_{n=0}^{\infty} [z(1-\theta)]^n \tag{S126}$$
$$= [z(1-\theta)]^{m'} \left( \frac{1}{1 - z(1-\theta)} \right), \tag{S127}$$
where we used the geometric series since, as stated before, $|z(1-\theta)| < 1$. Putting these results together, the PDE for the generating function is
$$\frac{\partial F}{\partial \tau} = \frac{\partial F}{\partial z} - z \frac{\partial F}{\partial z} - \lambda F + \frac{\lambda \theta F}{1 - z(1-\theta)}. \tag{S128}$$
Changing variables to $\xi = 1 - \theta$ and simplifying gives
$$\frac{\partial F}{\partial \tau} + (z - 1) \frac{\partial F}{\partial z} = (z - 1) \frac{\xi}{1 - z\xi}\, \lambda F. \tag{S129}$$

S2.1.2 Steady-state
To get at the mRNA distribution at steady state we must first solve Eq. S129 with the time derivative set to zero. At steady state, the PDE reduces to the ODE
$$\frac{dF}{dz} = \frac{\xi}{1 - z\xi}\, \lambda F, \tag{S130}$$
which we can integrate as
$$\int \frac{dF}{F} = \int \frac{\lambda \xi\, dz}{1 - \xi z}. \tag{S131}$$
The initial conditions for generating functions can be subtle and confusing. The key fact follows from the definition $F(z, t) = \sum_m z^m p(m, t)$. Clearly normalization of the distribution requires that $F(z = 1, t) = \sum_m p(m, t) = 1$. A subtlety is that sometimes the generating function may be undefined at $z = 1$, in which case the limit as $z$ approaches 1 from below suffices to define the normalization condition. We also warn the reader that, while it is frequently convenient to change variables from $z$ to a different independent variable, one must carefully track how the normalization condition transforms.
Continuing on, we evaluate the integrals (producing a constant $c$), which gives
$$\ln F = -\lambda \ln(1 - \xi z) + c \tag{S132}$$
$$F = c\, (1 - \xi z)^{-\lambda}. \tag{S133}$$
Only one choice for $c$ can satisfy the normalization condition, producing
$$F(z) = \left( \frac{1 - \xi}{1 - \xi z} \right)^{\lambda} = \left( \frac{\theta}{1 - z(1 - \theta)} \right)^{\lambda}. \tag{S134}$$

S2.1.3 Recovering the steady-state probability distribution
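Before carrying out the extraction analytically, note that the Taylor coefficients of Eq. S134, which are exactly the probabilities $p(m)$, can also be pulled out numerically by evaluating $F$ around the unit circle and applying a discrete Fourier transform (Cauchy's integral formula in disguise). A sketch with arbitrary parameter values:

```python
import cmath
import math

lam, b = 2.0, 3.0  # arbitrary burst rate and mean burst size
theta = 1.0 / (1.0 + b)
xi = 1.0 - theta

def F(z):
    # Eq. S134; the principal branch is safe since Re(1 - xi*z) > 0 on |z| <= 1
    return cmath.exp(lam * cmath.log(theta / (1.0 - xi * z)))

N = 512  # DFT size; aliasing error scales like xi^N, utterly negligible here
grid = [F(cmath.exp(2j * math.pi * j / N)) for j in range(N)]

def p(m):
    """m-th Taylor coefficient of F via a discrete Fourier transform."""
    return (sum(grid[j] * cmath.exp(-2j * math.pi * j * m / N)
                for j in range(N)) / N).real

probs = [p(m) for m in range(60)]
print(abs(sum(probs) - 1.0) < 1e-5)  # coefficients are normalized, F(1) = 1
print(abs(sum(m * q for m, q in enumerate(probs)) - lam * b) < 1e-3)
```

Both checks pass, consistent with $F(1) = 1$ and with the mean $F'(1) = \lambda b$ expected for a promoter firing bursts of mean size $b$ at rate $\lambda$.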
To obtain the steady-state mRNA distribution $p(m)$ we are aiming for, we need to extract it from the generating function
$$F(z) = \sum_m z^m\, p(m). \tag{S135}$$
Taking a derivative with respect to $z$ results in
$$\frac{dF(z)}{dz} = \sum_m m\, z^{m-1}\, p(m). \tag{S136}$$
Setting $z = 0$ leaves only the $m = 1$ term in the sum,
$$\left. \frac{dF(z)}{dz} \right|_{z=0} = \bigl( 0 \cdot 0^{-1} \cdot p(0) + 1 \cdot 0^0 \cdot p(1) + 2 \cdot 0^1 \cdot p(2) + \cdots \bigr) = p(1), \tag{S137}$$
where we take $0^0 = 1$, as justified by the limit $\lim_{x \to 0^+} x^x = 1$. A second derivative of the generating function results in
$$\frac{d^2 F(z)}{dz^2} = \sum_{m=0}^{\infty} m(m-1)\, z^{m-2}\, p(m). \tag{S138}$$
Again evaluating at $z = 0$ gives
$$\left. \frac{d^2 F(z)}{dz^2} \right|_{z=0} = 2\, p(2). \tag{S139}$$
In general, any $p(m)$ is obtained from the generating function as
$$p(m) = \frac{1}{m!} \left. \frac{d^m F(z)}{dz^m} \right|_{z=0}. \tag{S140}$$
Let us now look at the general form of the derivative for our generating function in Eq. S134. For $p(0)$ we simply evaluate $F(z = 0)$ directly, obtaining
$$p(0) = F(z = 0) = \theta^{\lambda}. \tag{S141}$$
The first derivative results in
$$\frac{dF(z)}{dz} = \theta^{\lambda} \frac{d}{dz} (1 - z(1-\theta))^{-\lambda} = \theta^{\lambda} \bigl[ -\lambda (1 - z(1-\theta))^{-\lambda-1} \cdot (-(1-\theta)) \bigr] = \theta^{\lambda} \bigl[ \lambda (1 - z(1-\theta))^{-\lambda-1} (1-\theta) \bigr]. \tag{S142}$$
Evaluating this at $z = 0$, as required to get $p(1)$, gives
$$\left. \frac{dF(z)}{dz} \right|_{z=0} = \theta^{\lambda} \lambda (1-\theta). \tag{S143}$$
For the second derivative we find
$$\frac{d^2 F(z)}{dz^2} = \theta^{\lambda} \bigl[ \lambda (\lambda+1) (1 - z(1-\theta))^{-\lambda-2} (1-\theta)^2 \bigr]. \tag{S144}$$
Again evaluating at $z = 0$ gives
$$\left. \frac{d^2 F(z)}{dz^2} \right|_{z=0} = \theta^{\lambda} \lambda (\lambda+1) (1-\theta)^2. \tag{S145}$$
Let us take one more derivative to see the pattern. The third derivative of the generating function gives
$$\frac{d^3 F(z)}{dz^3} = \theta^{\lambda} \bigl[ \lambda (\lambda+1)(\lambda+2) (1 - z(1-\theta))^{-\lambda-3} (1-\theta)^3 \bigr], \tag{S146}$$
which again we evaluate at $z = 0$,
$$\left. \frac{d^3 F(z)}{dz^3} \right|_{z=0} = \theta^{\lambda} \bigl[ \lambda (\lambda+1)(\lambda+2) (1-\theta)^3 \bigr]. \tag{S147}$$
If $\lambda$ were an integer we could write this as
$$\left. \frac{d^3 F(z)}{dz^3} \right|_{z=0} = \frac{(\lambda+2)!}{(\lambda-1)!}\, \theta^{\lambda} (1-\theta)^3. \tag{S148}$$
Since $\lambda$ might not be an integer, we can write this using gamma functions as
$$\left. \frac{d^3 F(z)}{dz^3} \right|_{z=0} = \frac{\Gamma(\lambda+3)}{\Gamma(\lambda)}\, \theta^{\lambda} (1-\theta)^3. \tag{S149}$$
Generalizing the pattern, we then have that the $m$-th derivative takes the form
$$\left. \frac{d^m F(z)}{dz^m} \right|_{z=0} = \frac{\Gamma(\lambda+m)}{\Gamma(\lambda)}\, \theta^{\lambda} (1-\theta)^m. \tag{S150}$$
With this result we can use Eq. S140 to obtain the desired steady-state probability distribution function
$$p(m) = \frac{\Gamma(m+\lambda)}{\Gamma(m+1)\,\Gamma(\lambda)}\, \theta^{\lambda} (1-\theta)^m. \tag{S151}$$
Note that the ratio of gamma functions is often expressed as a binomial coefficient, but since $\lambda$ may be non-integer, this would be ill-defined. Re-expressing this exclusively in our variables of interest, the burst rate $\lambda$ and the mean burst size $b$, we have
$$p(m) = \frac{\Gamma(m+\lambda)}{\Gamma(m+1)\,\Gamma(\lambda)} \left( \frac{1}{1+b} \right)^{\lambda} \left( \frac{b}{1+b} \right)^m. \tag{S152}$$

S2.2 Adding repression
S2.2.1 Deriving the generating function for mRNA distribution
Let us move from a one-state promoter to a two-state promoter, where one state has repressor bound and the other produces transcriptional bursts as above. A schematic of this model is shown as model 5 in Figure 1(C). Although now we have an equation for each promoter state, otherwise the master equation reads similarly
Figure S4. Schematic of three-stage promoter from [23].
Adapted from Shahrezaei & Swain [23]. In their paper they derive a closed-form solution for the protein distribution. Our two-state bursty promoter at the mRNA level can be mapped onto their solution with some relabeling.

to the one-state case, except with additional terms corresponding to transitions between promoter states, namely
$$\frac{d}{dt} p_R(m, t) = k_R^+\, p_A(m, t) - k_R^-\, p_R(m, t) + (m+1)\gamma\, p_R(m+1, t) - m\gamma\, p_R(m, t) \tag{S153}$$
$$\frac{d}{dt} p_A(m, t) = -k_R^+\, p_A(m, t) + k_R^-\, p_R(m, t) + (m+1)\gamma\, p_A(m+1, t) - m\gamma\, p_A(m, t) - k_i\, p_A(m, t) + k_i \sum_{m'=0}^{m} \theta (1-\theta)^{m-m'}\, p_A(m', t), \tag{S154}$$
where $p_R(m, t)$ is the probability of the system having $m$ mRNA copies and having repressor bound to the promoter at time $t$, and $p_A$ is the analogous probability to find the promoter without repressor bound. $k_R^+$ and $k_R^-$ are, respectively, the rates at which repressors bind to and unbind from the promoter, and $\gamma$ is the mRNA degradation rate. $k_i$ is the rate at which bursts initiate, and as before, the geometric distribution of burst sizes has mean $b = (1-\theta)/\theta$.
Interestingly, it turns out that this problem maps exactly onto the three-stage promoter model considered by Shahrezaei and Swain in [23], with relabelings. Their approximate solution for protein distributions amounts to the same approximation we make here in regarding the duration of mRNA synthesis bursts as instantaneous, so their solution for protein distributions also solves our problem of mRNA distributions. Let us examine the analogy more closely. They consider a two-state promoter, as we do here, but they model mRNA as being produced one at a time and degraded, with rates $v_0$ and $d_0$. Then they model translation as occurring with rate $v_1$, and protein degradation with rate $d_1$, as shown in Figure S4. Now consider the limit where $v_1, d_0 \to \infty$ with their ratio $v_1/d_0$ held constant.
$v_1/d_0$ resembles the average burst size of translation from a single mRNA: these are the rates of two Poisson processes that compete over a transcript, which matches the story of geometrically distributed burst sizes. In other words, in our bursty promoter model we can think of the parameter $\theta$ as determining one competing process that ends the burst and $(1-\theta)$ as a process that continues the burst. So after taking this limit, on timescales slow compared to $v_1$ and $d_0$, it appears that transcription events fire at rate $v_0$ and produce a geometrically distributed burst of translation of mean size $v_1/d_0$, which intuitively matches the story we have told above for mRNA, with variables relabeled.
To verify this intuitively conjectured mapping between our problem and the solution in [23], we continue with a careful solution for the mRNA distribution using probability generating functions, following the ideas sketched in [23]. It is natural to nondimensionalize rates in the problem by $\gamma$; equivalently, this amounts to measuring time in units of $\gamma^{-1}$. We are also only interested in steady state, so we set the time derivatives to zero, giving
$$0 = k_R^+\, p_A(m) - k_R^-\, p_R(m) + (m+1)\, p_R(m+1) - m\, p_R(m) \tag{S155}$$
$$0 = -k_R^+\, p_A(m) + k_R^-\, p_R(m) + (m+1)\, p_A(m+1) - m\, p_A(m) - k_i\, p_A(m) + k_i \sum_{m'=0}^{m} \theta (1-\theta)^{m-m'}\, p_A(m'), \tag{S156}$$
where for convenience we kept the same notation for all rates, but these are now expressed in units of the mean mRNA lifetime $\gamma^{-1}$.
The probability generating function is defined as before in the constitutive case, except now we must introduce a generating function for each promoter state,
$$f_A(z) = \sum_{m=0}^{\infty} z^m\, p_A(m), \qquad f_R(z) = \sum_{m=0}^{\infty} z^m\, p_R(m). \tag{S157}$$
Our real objective is the generating function $f(z)$ that generates the mRNA distribution $p(m)$, independent of what state the promoter is in.
But since $p(m) = p_A(m) + p_R(m)$, it follows too that $f(z) = f_A(z) + f_R(z)$. As before, we multiply both equations by $z^m$ and sum over all $m$. Each individual term transforms exactly as did an analogous term in the constitutive case, so the coupled ODEs for the generating functions read
$$0 = k_R^+ f_A(z) - k_R^- f_R(z) + \frac{\partial}{\partial z} f_R(z) - z \frac{\partial}{\partial z} f_R(z) \tag{S158}$$
$$0 = -k_R^+ f_A(z) + k_R^- f_R(z) + \frac{\partial}{\partial z} f_A(z) - z \frac{\partial}{\partial z} f_A(z) - k_i f_A(z) + \frac{k_i \theta}{1 - z(1-\theta)} f_A(z), \tag{S159}$$
and after changing variables $\xi = 1 - \theta$ as before and rearranging, we have
$$0 = k_R^+ f_A(z) - k_R^- f_R(z) + (1 - z) \frac{\partial}{\partial z} f_R(z) \tag{S160}$$
$$0 = -k_R^+ f_A(z) + k_R^- f_R(z) + (1 - z) \frac{\partial}{\partial z} f_A(z) + k_i (z - 1) \frac{\xi}{1 - z\xi} f_A(z). \tag{S161}$$
We can transform this problem from two coupled first-order ODEs to a single second-order ODE by solving for $f_A$ in the first and plugging into the second, giving
$$0 = (1 - z) \frac{\partial f_R}{\partial z} + \frac{1 - z}{k_R^+} \left( k_R^- \frac{\partial f_R}{\partial z} + \frac{\partial f_R}{\partial z} + (z - 1) \frac{\partial^2 f_R}{\partial z^2} \right) + \frac{k_i}{k_R^+} (z - 1) \frac{\xi}{1 - z\xi} \left( k_R^- f_R + (z - 1) \frac{\partial f_R}{\partial z} \right), \tag{S162}$$
where, to reduce notational clutter, we have dropped the explicit $z$ dependence of $f_A$ and $f_R$. Simplifying, we have
$$0 = \frac{\partial^2 f_R}{\partial z^2} - \left( \frac{k_i \xi}{1 - z\xi} + \frac{1 + k_R^- + k_R^+}{1 - z} \right) \frac{\partial f_R}{\partial z} + \frac{k_i k_R^- \xi}{(1 - z\xi)(1 - z)} f_R. \tag{S163}$$
This can be recognized as the hypergeometric differential equation, with singularities at $z = 1$, $z = \xi^{-1}$, and $z = \infty$. The latter can be verified by a change of variables from $z$ to $x = 1/z$, being careful with the chain rule, and noting that $z = \infty$ is a singular point if and only if $x = 1/z = 0$ is a singular point.
The standard form of the hypergeometric differential equation has its singularities at 0, 1, and $\infty$, so to take advantage of the standard form solutions to this ODE, we first need to transform variables to put it into a standard form. However, this is subtle. While any such transformation should work in principle, the solutions are expressed most simply in the neighborhood of $z = 0$, but the normalization condition that we need to enforce corresponds to $z = 1$. The easiest path, therefore, is to find a change of variables that maps 1 to 0, $\infty$ to $\infty$, and $\xi^{-1}$ to 1. This is most intuitively done in two steps.
First map the $z = 1$ singularity to 0 by the change of variables $v = z - 1$, giving
$$0 = \frac{\partial^2 f_R}{\partial v^2} + \left( \frac{k_i \xi}{(1+v)\xi - 1} + \frac{1 + k_R^- + k_R^+}{v} \right) \frac{\partial f_R}{\partial v} + \frac{k_i k_R^- \xi}{\bigl((1+v)\xi - 1\bigr)\, v} f_R. \tag{S164}$$
Now two singularities are at $v = 0$ and $v = \infty$. The third is determined by $(1+v)\xi - 1 = 0$, i.e., $v = \xi^{-1} - 1$. We want another variable change that maps this third singularity to 1 (without moving 0 or infinity). Changing variables again to $w = v/(\xi^{-1} - 1) = \frac{\xi}{1-\xi} v$ fits the bill. In other words, the combined change of variables
$$w = \frac{\xi}{1 - \xi}(z - 1) \tag{S165}$$
maps $z = \{1, \xi^{-1}, \infty\}$ to $w = \{0, 1, \infty\}$ as desired. Plugging in, being mindful of the chain rule and noting that $(1+v)\xi - 1 = (1 - \xi)(w - 1)$, gives
$$0 = \left( \frac{\xi}{1-\xi} \right)^2 \frac{\partial^2 f_R}{\partial w^2} + \left( \frac{\xi k_i}{(1-\xi)(w-1)} + \frac{\xi (1 + k_R^- + k_R^+)}{(1-\xi)\, w} \right) \frac{\xi}{1-\xi} \frac{\partial f_R}{\partial w} + \frac{k_i k_R^- \xi^2}{(1-\xi)^2\, w (w-1)} f_R. \tag{S166}$$
This is close to the standard form of the hypergeometric differential equation, and some cancellation and rearrangement gives
$$0 = w(w-1) \frac{\partial^2 f_R}{\partial w^2} + \bigl( k_i w + (1 + k_R^- + k_R^+)(w - 1) \bigr) \frac{\partial f_R}{\partial w} + k_i k_R^- f_R, \tag{S167}$$
and a little more algebra produces
$$0 = w(1-w) \frac{\partial^2 f_R}{\partial w^2} + \bigl( 1 + k_R^- + k_R^+ - (1 + k_i + k_R^- + k_R^+)\, w \bigr) \frac{\partial f_R}{\partial w} - k_i k_R^- f_R, \tag{S168}$$
which is the standard form. From this we can read off the solution in terms of hypergeometric functions ${}_2F_1$ from any standard source, e.g. [80], and identify the conventional parameters in terms of our model parameters. We want the general solution in the neighborhood of $w = 0$ ($z = 1$), which for a homogeneous linear second-order ODE must be a sum of two linearly independent solutions. More precisely,
$$f_R(w) = C^{(1)}\, {}_2F_1(\alpha, \beta, \delta; w) + C^{(2)}\, w^{1-\delta}\, {}_2F_1(1 + \alpha - \delta, 1 + \beta - \delta, 2 - \delta; w) \tag{S169}$$
with parameters determined by
$$\alpha \beta = k_i k_R^-, \qquad \alpha + \beta = k_i + k_R^- + k_R^+, \qquad \delta = 1 + k_R^- + k_R^+, \tag{S170}$$
and constants $C^{(1)}$ and $C^{(2)}$ to be set by boundary conditions. Solving for $\alpha$ and $\beta$, we find
$$\alpha = \frac{1}{2} \left( k_i + k_R^- + k_R^+ + \sqrt{(k_i + k_R^- + k_R^+)^2 - 4 k_i k_R^-} \right)$$
$$\beta = \frac{1}{2} \left( k_i + k_R^- + k_R^+ - \sqrt{(k_i + k_R^- + k_R^+)^2 - 4 k_i k_R^-} \right)$$
$$\delta = 1 + k_R^- + k_R^+. \tag{S171}$$
Note that $\alpha$ and $\beta$ are interchangeable in the definition of ${}_2F_1$ and differ only in the sign preceding the radical. Since the normalization condition requires that $f_R$ be finite at $w = 0$, we can immediately set $C^{(2)} = 0$ to discard the second solution. This is because all the rate constants are strictly positive, so $\delta > 1$ and $w^{1-\delta}$ blows up as $w \to 0$. Now that we have $f_R$, we would like to find the generating function for the mRNA distribution, $f(z) = f_A(z) + f_R(z)$. We can recover $f_A$ from our solution for $f_R$, namely
$$f_A(z) = \frac{1}{k_R^+} \left( k_R^- f_R(z) + (z - 1) \frac{\partial f_R}{\partial z} \right) \tag{S172}$$
or
$$f_A(w) = \frac{1}{k_R^+} \left( k_R^- f_R(w) + w \frac{\partial f_R}{\partial w} \right), \tag{S173}$$
where in the second line we transformed our original relation between $f_R$ and $f_A$ to our new, more convenient, variable $w$. Plugging our solution $f_R(w) = C^{(1)}\, {}_2F_1(\alpha, \beta, \delta; w)$ into $f_A$, we will require the differentiation rule for ${}_2F_1$, which tells us
$$\frac{\partial f_R}{\partial w} = C^{(1)} \frac{\alpha \beta}{\delta}\, {}_2F_1(\alpha + 1, \beta + 1, \delta + 1; w), \tag{S174}$$
from which it follows that
$$f_A(w) = \frac{C^{(1)}}{k_R^+} \left( k_R^-\, {}_2F_1(\alpha, \beta, \delta; w) + w \frac{\alpha \beta}{\delta}\, {}_2F_1(\alpha + 1, \beta + 1, \delta + 1; w) \right) \tag{S175}$$
and therefore
$$f(w) = C^{(1)} \left( 1 + \frac{k_R^-}{k_R^+} \right) {}_2F_1(\alpha, \beta, \delta; w) + \frac{C^{(1)}}{k_R^+}\, w \frac{\alpha \beta}{\delta}\, {}_2F_1(\alpha + 1, \beta + 1, \delta + 1; w). \tag{S176}$$
To proceed, we need one of the (many) useful identities known for hypergeometric functions, in particular
$$w \frac{\alpha \beta}{\delta}\, {}_2F_1(\alpha + 1, \beta + 1, \delta + 1; w) = (\delta - 1) \bigl( {}_2F_1(\alpha, \beta, \delta - 1; w) - {}_2F_1(\alpha, \beta, \delta; w) \bigr). \tag{S177}$$
Substituting this for the second term in $f(w)$, we find
$$f(w) = \frac{C^{(1)}}{k_R^+} \Bigl[ \bigl( k_R^+ + k_R^- \bigr)\, {}_2F_1(\alpha, \beta, \delta; w) + (\delta - 1) \bigl( {}_2F_1(\alpha, \beta, \delta - 1; w) - {}_2F_1(\alpha, \beta, \delta; w) \bigr) \Bigr], \tag{S178}$$
and since $\delta - 1 = k_R^+ + k_R^-$, the first and third terms cancel, leaving only
$$f(w) = C^{(1)}\, \frac{k_R^+ + k_R^-}{k_R^+}\, {}_2F_1(\alpha, \beta, \delta - 1; w). \tag{S179}$$
Now we enforce normalization, demanding $f(w = 0) = f(z = 1) = 1$. Since ${}_2F_1(\alpha, \beta, \delta - 1; 0) = 1$, we must have $C^{(1)} = k_R^+ / (k_R^+ + k_R^-)$ and consequently
$$f(w) = {}_2F_1(\alpha, \beta, k_R^+ + k_R^-; w). \tag{S180}$$
Recalling that the mean burst size $b = (1 - \theta)/\theta = \xi/(1 - \xi)$, so that $w = \frac{\xi}{1 - \xi}(z - 1) = b(z - 1)$, we can change back to $z$ to find the tidy result
$$f(z) = {}_2F_1\bigl(\alpha, \beta, k_R^+ + k_R^-; b(z - 1)\bigr), \tag{S181}$$
with $\alpha$ and $\beta$ given above by
$$\alpha = \frac{1}{2} \left( k_i + k_R^- + k_R^+ + \sqrt{(k_i + k_R^- + k_R^+)^2 - 4 k_i k_R^-} \right)$$
$$\beta = \frac{1}{2} \left( k_i + k_R^- + k_R^+ - \sqrt{(k_i + k_R^- + k_R^+)^2 - 4 k_i k_R^-} \right). \tag{S182}$$
Finally we are in sight of the original goal. We can generate the steady-state probability distribution of interest by differentiating the generating function,
$$p(m) = \frac{1}{m!} \left. \frac{\partial^m}{\partial z^m} f(z) \right|_{z=0}, \tag{S183}$$
which follows easily from its definition. Some contemplation reveals that repeated application of the derivative rule used above will produce products of the form $\alpha(\alpha+1)(\alpha+2)\cdots(\alpha+m-1)$ in the expression for $p(m)$, and similarly for $\beta$ and $\delta$. These resemble ratios of factorials, but since $\alpha$, $\beta$, and $\delta$ are not necessarily integer, we should express the ratios using gamma functions instead. More precisely, one finds
$$p(m) = \frac{\Gamma(\alpha + m)\, \Gamma(\beta + m)\, \Gamma(k_R^+ + k_R^-)}{\Gamma(\alpha)\, \Gamma(\beta)\, \Gamma(k_R^+ + k_R^- + m)}\, \frac{b^m}{m!}\, {}_2F_1(\alpha + m, \beta + m, k_R^+ + k_R^- + m; -b), \tag{S184}$$
which is finally the probability distribution we sought to derive.

S2.3.1 Generalities
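To make the evaluation task concrete: for moderate parameter values, Eq. S184 can be computed in ordinary floating point by combining log-gamma factors with the Pfaff transformation ${}_2F_1(a, b, c; -x) = (1+x)^{-a}\, {}_2F_1(a, c-b, c; x/(1+x))$, which converts the alternating series at negative argument into one with positive terms only. The sketch below uses arbitrary rate values (in units of $\gamma$); it is not the inference code used in this work, just a check that the resulting $p(m)$ is normalized and has the mean $b\,\alpha\beta/(k_R^+ + k_R^-)$ implied by differentiating Eq. S181 once at $z = 1$.

```python
import math

def hyp2f1_series(a, b, c, z, tol=1e-15):
    """Direct Taylor series of 2F1; adequate here because after the Pfaff
    transformation all terms are positive and 0 < z < 1."""
    term, total, l = 1.0, 1.0, 0
    while term > tol * total:
        term *= (a + l) * (b + l) / ((c + l) * (l + 1.0)) * z
        total += term
        l += 1
    return total

# Arbitrary rates in units of the mRNA degradation rate gamma
kRp, kRm, ki, b = 1.0, 2.0, 3.0, 2.0
disc = math.sqrt((ki + kRm + kRp) ** 2 - 4 * ki * kRm)
alpha = 0.5 * (ki + kRm + kRp + disc)   # Eq. S182
beta = 0.5 * (ki + kRm + kRp - disc)
c0 = kRp + kRm

def log_p(m):
    """log of Eq. S184, via lgamma (to avoid overflow) and the Pfaff
    transformation 2F1(a,b,c;-b) = (1+b)^-a 2F1(a, c-b, c; b/(1+b))."""
    lg = math.lgamma
    logpref = (lg(alpha + m) + lg(beta + m) + lg(c0)
               - lg(alpha) - lg(beta) - lg(c0 + m)
               + m * math.log(b) - lg(m + 1))
    h = hyp2f1_series(alpha + m, c0 - beta, c0 + m, b / (1.0 + b))
    return logpref - (alpha + m) * math.log(1.0 + b) + math.log(h)

probs = [math.exp(log_p(m)) for m in range(200)]
mean = sum(m * q for m, q in enumerate(probs))
print(abs(sum(probs) - 1.0) < 1e-8)          # normalization, f(z = 1) = 1
print(abs(mean - b * ki * kRm / c0) < 1e-6)  # <m> = b*alpha*beta/c0, here 4
```

This works well for moderate rates; the difficulty described next arises because realistic fits push $m$, and hence the arguments of ${}_2F_1$, far larger than in this toy example.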
We would like to carry out Bayesian parameter inference on FISH data from [36], using Eq. (S184) as our likelihood. This requires accurate (and preferably fast) numerical evaluation of the hypergeometric function F, which is a notoriously hard problem [81], [82], and our particular needs here present an especial challenge, as we show below.

The hypergeometric function is defined by its Taylor series as

F(a, b, c; z) = Σ_{l=0}^∞ [Γ(a + l) Γ(b + l) Γ(c)] / [Γ(a) Γ(b) Γ(c + l)] · z^l / l!   (S185)

for |z| < 1, and by analytic continuation elsewhere. If z ≲ 1/2 and α and β are not too large (absolute value below 20 or 30), then the series converges quickly and an accurate numerical representation is easily computed by truncating the series after a reasonable number of terms. Unfortunately, we need to evaluate F over mRNA copy numbers fully out to the tail of the distribution, which can easily reach 50, possibly 100. From Eq. (S184), this means evaluating F repeatedly for values of a, b, and c spanning the full range from O(1) to O(10^2), even if α, β, and δ in Eq. (S184) are small, with the situation even worse if they are not small. A naive numerical evaluation of the series definition will be prone to overflow, and if any of a, b, c < 0, then some successive terms in the series have alternating signs, which can lead to catastrophic cancellations.

One solution is to evaluate F using arbitrary precision arithmetic instead of floating point arithmetic, e.g., using the mpmath library in Python. This is accurate but incredibly slow computationally. To quantify how slow, we found that evaluating the likelihood defined by Eq. (S184) ∼50 times (for a typical dataset of interest from [36], with m values spanning 0 to ∼50) using arbitrary precision arithmetic is 100-1000 fold slower than evaluating a negative binomial likelihood for the corresponding constitutive promoter dataset.

To claw back ≳30 fold of that slowdown, we can exploit one of the many catalogued symmetries involving F. The solution involves recursion relations originally explored by Gauss and studied extensively in [81], [82]. They are sometimes known as contiguous relations and relate the values of any set of three hypergeometric functions whose arguments differ by integers. To rephrase this symbolically, consider a set of hypergeometric functions indexed by an integer n,

f_n = F(a + ε_i n, b + ε_j n, c + ε_k n; z),   (S186)

for a fixed choice of ε_i, ε_j, ε_k ∈ {0, ±1} (at least one of ε_i, ε_j, ε_k must be nonzero, else the set of f_n would contain only a single element). Then there exist known recurrence relations of the form

A_n f_{n−1} + B_n f_n + C_n f_{n+1} = 0,   (S187)

where A_n, B_n, and C_n are some functions of a, b, c, and z. In other words, for fixed ε_i, ε_j, ε_k, a, b, and c, if we can merely evaluate F twice, say for n′ and n′ −
1, then we can easily and rapidly generate values for arbitrary n.

This provides a convenient solution for our problem: we need repeated evaluations of F(a + m, b + m, c + m; z) for fixed a, b, and c and many integer values of m. The idea is that we can use arbitrary precision arithmetic to evaluate F for just two particular values of m and then generate F for the other 50-100 values of m using the recurrence relation. In fact there are even more sophisticated ways of utilizing the recurrence relations that might have netted another factor of 2 speed-up, and possibly as much as a factor of 10, but the method described here had already reduced the computation time to an acceptable O(1 min), so these more sophisticated approaches did not seem worth the time to pursue.

However, there are two further wrinkles. The first is that a naive application of the recurrence relation is numerically unstable. Roughly, this is because three-term recurrence relations, like second order ODEs, admit two linearly independent solutions. In a certain eigenbasis, one of these solutions dominates the other as n → ∞, and as n → −∞ the dominance is reversed. If we fail to work in this eigenbasis, our solution of the recurrence relation will be a mixture of these solutions and rapidly accumulate numerical error. For our purposes, it suffices to know that the authors of [82] derived the numerically stable solutions (so-called minimal solutions) for several possible choices of ε_i, ε_j, ε_k. Running the recurrence in the proper direction using a minimal solution is numerically robust and can be done entirely in floating point arithmetic, so that we only need to evaluate F with arbitrary precision arithmetic to generate the seed values for the recursion.

The second wrinkle is a corollary to the first. The minimal solutions are only minimal for certain ranges of the argument z, and not all of the 26 possible recurrence relations have minimal solutions for all z.
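The basic recurrence machinery is easy to check numerically. The following is a minimal sketch using the Gauss contiguous relation that increments only the first argument of F (Abramowitz & Stegun Eq. 15.2.10) together with scipy's double-precision hyp2f1; the parameter values are arbitrary illustrations, not values from our model:

```python
from scipy.special import hyp2f1

def next_f(a, b, c, z, f_prev, f_curr):
    """Given F(a-1,b;c;z) and F(a,b;c;z), return F(a+1,b;c;z) via the
    Gauss contiguous relation (Abramowitz & Stegun Eq. 15.2.10):
    (c-a) F(a-1) + (2a - c + (b-a) z) F(a) + a (z-1) F(a+1) = 0."""
    return -((c - a) * f_prev + (2 * a - c + (b - a) * z) * f_curr) / (a * (z - 1))

# arbitrary illustrative parameters (z < 0, as in Eq. S184 where z = -b)
a, b, c, z = 1.3, 0.7, 2.1, -0.5
f_prev, f_curr = hyp2f1(a - 1, b, c, z), hyp2f1(a, b, c, z)
for m in range(5):
    f_prev, f_curr = f_curr, next_f(a + m, b, c, z, f_prev, f_curr)
    # each recursed value agrees with a direct (and slower) evaluation
    assert abs(f_curr - hyp2f1(a + m + 1, b, c, z)) < 1e-8
```

Over a handful of steps in double precision the drift is negligible; for long recursions one must run in the minimal-solution direction, for the stability reasons just described.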
This second wrinkle can be solved by using one of the many transformation formulae for F to convert to a different recurrence relation that has a minimal solution over the required domain of z, although this can require some trial and error to find the right transformation, the right recurrence relation, and the right minimal solution.

S2.3.2 Particulars
Let us now demonstrate these generalities for our problem of interest. In order to evaluate the probability distribution of our model, Eq. (S184), we need to evaluate hypergeometric functions of the form F(α + m, β + m, δ + m; −b) for values of m ranging from 0 to O(100). The authors of [82] did not derive a recursion relation for precisely this case. We could follow their methods and do so ourselves, but it is much easier to convert to a case that they did consider. The strategy is to look through the minimal solutions tabulated in [82] and search for a transformation we could apply to F(α + m, β + m, δ + m; −b) that would place the m's (the variable being incremented by the recursion) in the same arguments of F as the minimal solution. After some "guess and check," we found that the transformation

F(α + m, β + m, δ + m; −b) = (1 + b)^{−α−m} F(α + m, δ − β, δ + m; b/(1 + b))   (S188)

produces an F on the right hand side that closely resembles the minimal solutions y_{1,m} and y_{2,m} in Eq. 4.3 in [82]. Explicitly, these solutions are

y_{1,m} ∝ F(−α′ + δ′ − m, −β′ + δ′, −α′ − β′ + δ′ − m; 1 − z)   (S189)
y_{2,m} ∝ F(α′ + m, β′, α′ + β′ − δ′ + m; 1 − z),   (S190)

where we have omitted prefactors which are unimportant for now. Which of these two we should use depends on what values z takes on. Equating 1 − z = b/(1 + b) gives z = 1/(1 + b), and since b is strictly positive, z is bounded between 0 and 1. From Eq. 4.5 in [82], y_{2,m} is the minimal solution for real z satisfying 0 < z < 1. Note, however, that y_{2,m} is a solution of a recurrence that increments different arguments of F than does y_{1,m}: it increments the first only, rather than the first and third. This recurrence relation can be looked up, e.g., Eq. 15.2.10 in [80], which is

(δ′ − (α′ + m)) f_{m−1} + (2(α′ + m) − δ′ + (β′ − α′) z) f_m + (α′ + m)(z − 1) f_{m+1} = 0.   (S191)

Now we must solve for the parameters appearing in the recurrence relation in terms of our parameters, namely by setting

α′ = α
β′ = δ − β
α′ + β′ − δ′ = δ − 1
1 − z = b/(1 + b)   (S192)

and solving to find

α′ = α
β′ = δ − β
δ′ = 1 + α − β
z = 1/(1 + b).   (S193)

Finally we have everything we need. The minimal solution

y_{2,m} = [Γ(1 + α′ − δ′ + m) / Γ(1 + α′ + β′ − δ′ + m)] F(α′ + m, β′, α′ + β′ − δ′ + m; 1 − z),   (S194)

where we have now included the necessary prefactors, is a numerically stable solution of the recurrence relation Eq. (S191) if the recursion is run from large m to small m.

Let us finally outline the complete procedure as an algorithm to be implemented:

1. Compute the value of F for the two largest m values of interest using arbitrary precision arithmetic.
2. Compute the prefactors to construct y_{2,max(m)} and y_{2,max(m)−1}.
3. Recursively compute y_{2,m} for all m less than max(m) down to m = 0.
4. Cancel off the prefactors of the resulting values of y_{2,m} for all m to produce F for all desired m values.

With F computed, the only remaining numerical danger in computing p(m) in Eq. (S184) is overflow of the gamma functions. This is easily solved by taking the log of the entire expression and using standard routines to compute the log of the gamma functions, then exponentiating the entire expression at the end if p(m) is needed rather than log p(m).

S3 Bayesian inference
S3.1 The problem of parameter inference
One could argue that the whole goal of formulating theoretical models about nature is to sharpen our understanding from qualitative statements to precise quantitative assertions about the relevant features of the natural phenomena in question [83]. It is in these models that we intend to distill the essential parts of the object of study. Writing down such models leads to a proliferation of mathematical variables that parametrize them. By assigning numerical values to these parameters we can compute concrete predictions that can be contrasted with experimental data. For these predictions to match the data, the parameter values have to be carefully chosen from the whole parameter space. But how do we assess how well different regions of parameter space allow our model to reproduce the experimental observations? The language of probability, and more specifically of Bayesian statistics, is – we think – the natural language to tackle this question.
S3.1.1 Bayes’ theorem
Bayes' theorem is a simple mathematical statement that can apply to any logical conjecture. For two particular events A and B that potentially depend on each other, Bayes' theorem gives us a recipe for how to update our beliefs about one, let us say B, given some state of knowledge, or lack thereof, about A. In its most classic form Bayes' theorem is written as

P(B | A) = P(A | B) P(B) / P(A),   (S195)

where the vertical line | is read as "given that." So P(B | A) is read as the probability of B given that A took place. A and B can be any logical assertion. In particular, the problem of Bayesian inference focuses on the question of finding the probability distribution of a particular parameter value given the data.

For a given model with a set of parameters θ⃗ = (θ_1, θ_2, ..., θ_n), the so-called posterior distribution P(θ⃗ | D), where D is the experimental data, quantifies the plausibility of a set of parameter values given our observation of some particular dataset. In other words, through the application of Bayes' formula we update our beliefs about the possible values that parameters can take upon learning the outcome of a particular experiment. We specify the word "update" as we come to every inference problem with prior information about the plausibility of particular regions of parameter space even before performing any experiment. Even when we claim as researchers that we are totally ignorant about the values that the parameters in our models can take, we always come to a problem with domain expertise that can be exploited. If this were not the case, it is likely that the formulation of our model would not capture the phenomena we claim to want to understand. This prior information is captured in the prior probability P(θ⃗). The relationship between parameter values and the data is encoded in the likelihood function P(D | θ⃗).
Our theoretical model, whether deterministic or probabilistic, is encoded in this term, which can be intuitively understood as the probability of having observed the particular experimental data at hand given that our model is parametrized with the concrete values θ⃗. Implicitly we are also conditioning on the fact that our theoretical model is "true," i.e. the model itself, if evaluated or simulated in the computer, is capable of generating datasets equivalent to the one we observed in an experiment. In this way Bayesian inference consists of applying Bayes' formula as

P(θ⃗ | D) ∝ P(D | θ⃗) P(θ⃗).   (S196)

Notice that rather than writing the full form of Bayes' theorem, we limit ourselves to the terms that depend on our quantity of interest – that is, the parameter values θ⃗ themselves – as the denominator P(D) only serves as a normalization constant.

We also emphasize that the dichotomy we have presented between prior and likelihood is more subtle. Although it is often stated that our prior knowledge is entirely encapsulated by the obviously named prior probability P(θ⃗), this is usually too simplistic. The form(s) we choose for our likelihood function P(D | θ⃗) also draw heavily on our prior domain expertise, and the assumptions, implicit and explicit, that these choices encode are at least as important as, and often inseparable from, the prior probability, as persuasively argued in [84].

S3.1.2 The likelihood function
As we alluded to in the previous section, it is through the likelihood function P(D | θ⃗) that we encode the connection between our parameter values and the experimental observables. Broadly speaking there are two classes of models that we might need to encode into our likelihood function:

• Deterministic models: models for which a concrete selection of parameter values gives a single output. Said differently, models with a one-to-one mapping between inputs and outputs.

• Probabilistic models: as the name suggests, models that, rather than having a one-to-one input-output mapping, describe the full probability distribution of possible outputs.

In this paper we focus on inference done with probabilistic models. After all, the chemical master equations we wrote down describe the time evolution of the mRNA probability distribution. So all our likelihoods P(D | θ⃗) will be given by the steady-state solution of the corresponding chemical master equation. This is rather convenient, as we do not have to worry about adding a statistical model on top of our model to describe deviations from the predictions. Instead our models themselves predict the cell-to-cell variation in mRNA count.

S3.1.3 Prior selection

The different models explored in this work embraced different levels of coarse-graining that resulted in a diverse number of parameters for different models. For each of these model configurations Bayes' theorem demands that we represent our preconceptions about the possible parameter values in the form of the prior P(θ⃗). Throughout this work, for models with more than one parameter we assign independent priors to each, i.e.

P(θ⃗) = Π_{i=1}^n P(θ_i).   (S197)

Although it is not uncommon practice to use non-informative, or maximally uninformative, priors, we are of the mindset that this is a disservice to the philosophical and practical implications of Bayes' theorem.
It sounds almost contradictory to claim that we can represent our thinking about a natural phenomenon in the form of a mathematical model – in the context of Bayesian inference this means choosing a form for the likelihoods, and even making this choice presupposes prior understanding or assumptions about the relevant features of the system under study – but that we have absolutely no idea what the parameter values could or could not be. We therefore make use of our own expertise, many times in the form of order-of-magnitude estimates, to write down weakly-informative prior distributions for our parameters.

For our particular case, all of the datasets from [36] used in this paper have O(10^3) data points. This implies that our particular choice of priors will not significantly affect our inference as long as they are broad enough. A way to see why this is the case is to simply look at Bayes' theorem. For N data points and n ≪ N parameters, Bayes' theorem reads as

P(θ⃗ | D) ∝ Π_{k=1}^N P(d_k | θ⃗) Π_{i=1}^n P(θ_i),   (S198)

where d_k represents the k-th datum. This means that if our priors span a wide range of parameter space, the posterior distribution will be dominated by the likelihood function.

S3.1.4 Expectations and marginalizations
For models with more than one or two parameters, it is generally difficult to visualize or reason about the full joint posterior distribution P(θ⃗ | D) directly. One of the great powers of Bayesian analysis is marginalization, allowing us to reduce the dimensionality to only the parameters of immediate interest by averaging over the other dimensions. Formally, for a three dimensional model with parameters θ_1, θ_2, and θ_3, we can for instance marginalize away θ_3 to produce a 2D posterior as

P(θ_1, θ_2 | D) ∝ ∫ dθ_3 P(θ_1, θ_2, θ_3 | D),   (S199)

or we can marginalize away θ_2 and θ_3 to produce the 1D marginal posterior of θ_1 alone, which would be

P(θ_1 | D) ∝ ∫ dθ_2 ∫ dθ_3 P(θ_1, θ_2, θ_3 | D).   (S200)

Conceptually, this is what we did in generating the 2D slices of the full 9D model in Figure 4(A). In practice, this marginalization is even easier with Markov Chain Monte Carlo samples in hand. Since each point is simply a list of parameter values, we simply ignore the parameters which we want to marginalize away [52].

S3.1.5 Markov Chain Monte Carlo
The theory and practice of Bayesian inference with Markov Chain Monte Carlo (MCMC) is a rich subject with fascinating and deep analogies to statistical mechanics, even drawing on classical Hamiltonian mechanics and general relativity in its modern incarnations. We refer the interested reader to [52] and [85] for excellent introductions. Here we merely give a brief summary of the MCMC computations carried out in this work.

We used the Python package emcee for most of the MCMC sampling in this work. For the constitutive promoter inference, we also ran sampling with the excellent Stan modeling language as a check. We did not use Stan for the inference of the simple repression model because implementing the gradients of the hypergeometric function F appearing in Eq. S184, the probability distribution for our bursty model with repression, would have been an immensely challenging task. emcee was more than adequate for our purposes, and we were perhaps lucky that the 9-D posterior for the model of simple repression with bursty promoter was quite well behaved and did not require the extra power of the Hamiltonian Monte Carlo algorithm provided by Stan [86]. Source code for all statistical inference will be made available at https://github.com/RPGroup-PBoC/bursty_transcription.

S3.2 Bayesian inference on constitutive promoters
Having introduced the ideas behind Bayesian inference, we are ready to apply the theoretical machinery to our non-equilibrium models. In particular, in this section we will focus on model 1 and model 5 in Figure 2(A). Model 1, the Poisson promoter, will help us build practical intuition into the implementation of the Bayesian inference pipeline, even though we noted in Section 3 of the main text that this model cannot be reconciled with experimental data on observables such as the Fano factor. In other words, we acknowledge that this model is "wrong," but we still see value in going through the analysis since the simple nature of the model translates into a neat statistical analysis.
S3.2.1 Model 1 - Poisson promoter
As specified in the main text, the mRNA steady-state distribution for model 1 in Figure 2(A) is Poisson with parameter λ. Throughout this Appendix we will appeal to the convenient notation for probability distributions of the form

m ∼ Poisson(λ),   (S201)

where the symbol "∼" can be read as is distributed according to. So the previous equation can be read as: the mRNA copy number m is distributed according to a Poisson distribution with parameter λ. Our objective then is to compute the posterior probability distribution P(λ | D), where, as in the main text, D = {m_1, m_2, ..., m_N} are the data consisting of single-cell mRNA counts. Since we can assume that each cell's mRNA count is independent of the others, our likelihood function P(D | λ) consists of the product of N Poisson distributions.

To proceed with the inference problem we need to specify a prior. In this case we are extremely data-rich, as the dataset from Jones et al. [36] has of order 1000-3000 single-cell measurements for each promoter, so our choice of prior matters little here, as long as it is sufficiently broad. A convenient choice for our problem is to use a conjugate prior. A conjugate prior is a special prior that causes the posterior to have the same functional form as the prior, simply with updated model parameters. This makes calculations analytically tractable and also offers a nice interpretation of the inference procedure as updating our knowledge about the model parameters. This makes conjugate priors very useful when they exist. The caveat is that conjugate priors only exist for a very limited number of likelihoods, mostly with only one or two model parameters, so in almost all other Bayesian inference problems, we must tackle the posterior numerically. But, for the problem at hand, a conjugate prior does in fact exist.
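Before turning to the conjugate prior, note that the likelihood just described is straightforward to evaluate numerically as a sum of Poisson log-pmfs; a minimal sketch (the counts here are hypothetical, not data from [36]):

```python
import numpy as np
from scipy.stats import poisson

def log_likelihood(lam, counts):
    # log P(D | lam) = sum_k log Poisson(m_k | lam)
    return poisson.logpmf(counts, lam).sum()

counts = np.array([15, 22, 18, 19, 17])  # hypothetical single-cell mRNA counts

# sanity check: the Poisson log-likelihood is maximized at the sample mean
lams = np.linspace(1, 40, 400)
best = lams[np.argmax([log_likelihood(lam, counts) for lam in lams])]
assert abs(best - counts.mean()) < 0.1
```

This is also exactly the function an MCMC sampler would call thousands of times, which is why the per-evaluation cost of the hypergeometric likelihood of Eq. (S184) matters so much.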
For a Poisson likelihood of independent and identically distributed data, the conjugate prior is a gamma distribution, as can be looked up in, e.g., [52], Section 2.6. Putting a gamma prior on λ introduces two new parameters α and β which parametrize the gamma distribution itself, and which we use to encode the range of λ values we view as reasonable. Recall λ is the mean steady-state mRNA count per cell, which a priori could plausibly be anywhere from 0 to a few hundred. α = 1 and β = 1/50 achieve this, since the gamma distribution is strictly positive, with mean α/β = 50 and standard deviation √α/β = 50. To be explicit, then, our prior is

λ ∼ Gamma(α, β).   (S202)

As an aside, note that if we did not know that our prior was a conjugate prior, we could still write down our posterior distribution from its definition as

p(λ | D, α, β) ∝ p(D | λ) p(λ | α, β) ∝ ( Π_{k=1}^N λ^{m_k} e^{−λ} / m_k! ) · (β/Γ(α)) (βλ)^{α−1} e^{−βλ}.   (S203)

Without foreknowledge that this in fact reduces to a gamma distribution, this expression might appear rather inscrutable. When conjugate priors are unavailable for the likelihood of interest – which is almost always the case for models with more than one parameter – this is the generic situation, and we must tackle the posterior numerically. But here, conjugacy tells us the posterior is simply another gamma distribution,

λ ∼ Gamma(α + m̄N, β + N),   (S204)

where we defined the sample mean m̄ = (1/N) Σ_k m_k for notational convenience. A glance at the FISH data from [36] reveals that N is O(10^3) and ⟨m⟩ ≳ 0.1, so m̄N ≳ 10^2. Therefore, as we suspected, our prior parameters are completely overwhelmed by the data. The prior behaves, in a sense, like β extra "data points" with a mean value of (α − 1)/β [52], which gives us some intuition for how much data is needed to overwhelm the prior in this case: enough data N such that β ≪ N and α/β ≪ m̄. In fact, m̄N and N are so large that we can, to an excellent approximation, ignore the α and β dependence and approximate the gamma distribution as a Gaussian with mean m̄ and standard deviation √(m̄/N), giving

λ ∼ Gamma(α + m̄N, β + N) ≈ Normal(m̄, √(m̄/N)).   (S205)

As an example with real numbers, for the lacUV5 promoter, Jones et al. [36] measured 2648 cells with an average mRNA count per cell of m̄ ≈ 18.
7. In this case then, our posterior is

λ ∼ Normal(18.7, 0.08),   (S206)

which suggests we have inferred our model's one parameter to a precision of order 1%.

This is not wrong, but it is not the full story. The model's posterior distribution is tightly constrained, but is it a good generative model? In other words, if we use the model to generate synthetic data in the computer, does it generate data that look similar to our actual data, and is it therefore plausible that the model captures the important features of the data generating process? This intuitive notion can be codified with posterior predictive checks, or PPCs, and we will see that this simple Poisson model fails badly.

The intuitive idea of posterior predictive checks is simple:

1. Make a random draw of the model parameter λ from the posterior distribution.
2. Plug that draw into the likelihood and generate a synthetic dataset {m_k} conditioned on λ.
3. Repeat many times.

More formally, the posterior predictive distribution can be thought of as the distribution of future yet-to-be-observed data, conditioned on the data we have already observed. Clearly if those data appear quite different, the model has a problem. Put another way, if we suppose the generative model is true, i.e. we claim that our model explains the process through which our observed experimental data was generated, then the synthetic datasets we generate should resemble the actual observed data. If this is not the case, it suggests the model is missing important features. All the data we consider in this work are 1D (distributions of mRNA counts over a population), so empirical cumulative distribution functions (ECDFs) are an excellent visual means of comparing synthetic and observed datasets.
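The three steps above can be sketched in a few lines; the posterior mean and standard deviation below are illustrative stand-ins rather than fitted values:

```python
import numpy as np

rng = np.random.default_rng(0)
post_mean, post_sd, n_cells = 18.7, 0.08, 2648  # illustrative posterior, dataset size

# steps 1-3: draw lambda from the posterior, then a synthetic dataset from the likelihood
synthetic = [rng.poisson(rng.normal(post_mean, post_sd), size=n_cells)
             for _ in range(1000)]

# the synthetic datasets concentrate tightly around the posterior mean...
means = np.array([d.mean() for d in synthetic])
assert abs(means.mean() - post_mean) < 0.2
# ...and their Fano factors are pinned near 1, which is exactly the feature a
# PPC would flag against overdispersed real data
fanos = np.array([d.var() / d.mean() for d in synthetic])
assert abs(fanos.mean() - 1.0) < 0.1
```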
In general, for higher dimensional datasets, much of the challenge is in merely designing good visualizations that can actually show whether synthetic and observed data are similar or not.

For our example Poisson promoter model, then, we merely draw many random numbers, say 1000, from the Gaussian posterior in Eq. S206. For each one of those draws, we generate a dataset from the likelihood, i.e., we draw 2648 (the number of observed cells in the actual dataset) Poisson-distributed numbers for each of the 1000 posterior draws, for a total of 2,648,000 samples from the posterior predictive distribution. To compare so many samples with the actual observed data, one excellent visualization for 1D data is ECDFs of the quantiles, as shown for our Poisson model in Figure 3(B) in the main text.

S3.2.2 Model 5 - Bursty promoter
Let us now consider the problem of parameter inference from FISH data for model five from Figure 1(C). As derived in Appendix S2, the steady-state mRNA distribution in this model is a negative binomial distribution, given by

p(m) = [Γ(m + k_i) / (Γ(m + 1) Γ(k_i))] (1/(1 + b))^{k_i} (b/(1 + b))^m,   (S207)

where b is the mean burst size and k_i is the burst rate nondimensionalized by the mRNA degradation rate γ. As sketched earlier, we can intuitively think about this distribution through a simple story: the promoter undergoes geometrically-distributed bursts of mRNA, where the arrival of bursts is a Poisson process with rate k_i and the mean size of a burst is b.

As for the Poisson promoter model, this expression for the steady-state mRNA distribution is exactly the likelihood we want to use in Bayes' theorem. Again denoting the single-cell mRNA count data as D = {m_1, m_2, ..., m_N}, here Bayes' theorem takes the form

p(k_i, b | D) ∝ p(D | k_i, b) p(k_i, b),   (S208)

where the likelihood p(D | k_i, b) is given by the product of N negative binomials as in Eq. S207. We only need to choose priors on k_i and b. For the datasets from [36] that we are analyzing, as for the Poisson promoter model above, we are still data-rich, so the prior's influence remains weak, though not quite as weak since the dimensionality of our model has increased from one to two.

We follow the guidance of [52], Section 2.9 in opting for weakly-informative priors on k_i and b (conjugate priors do not exist for this problem), and we find "street-fighting estimates" [87] to be an ideal way of constructing such priors. The idea of weakly informative priors is to allow all remotely plausible values of model parameters while excluding the completely absurd or unphysical.

Consider k_i. Some of the strongest known bacterial promoters control rRNA genes and initiate transcripts no faster than ∼1/sec. It would be exceedingly strange if any of the constitutive promoters from [36] were stronger than that, so we can take that as an upper bound. For a lower bound, if transcripts are produced too rarely, there would be nothing to see with FISH.
The datasets for each strain contain of order 10^3 cells, and if ⟨m⟩ = k_i b/γ ≲ 10^{−2}, then the total number of expected mRNA detections would be single-digits or less and we would have essentially no data on which to carry out inference. So assuming b is not too different from 1 (justified next), and an mRNA lifetime γ^{−1} of a few minutes, we arrive at soft bounds on k_i/γ of perhaps 10^{−2} and 3 × 10^1.

Next consider the mean burst size b. This parametrization of the geometric distribution allows bursts of size zero (which could represent aborted transcripts and initiations), but it would be quite strange for the mean burst size b to be below ∼10^{−1}, for which nearly all bursts would be of size zero or one. For an upper bound, if transcripts are initiating at a rate somewhat slower than rRNA promoters, then it would probably take a time comparable to the lifetime of an mRNA to produce a burst larger than 10-20 transcripts, which would invalidate the model's approximation that bursts are instantaneous compared to the other timescales in the problem. So we will take soft bounds of 10^{−1} and 10 for b.

Note that the natural scale for these "street-fighting estimates" was a log scale. It is commonly the case that our prior sense of reasonable and unreasonable parameters is set on a log scale. A natural way to enforce these soft bounds is therefore to use a lognormal prior distribution, with the soft bounds set roughly two standard deviations from the mean:

ln k_i ∼ Normal(−0.5, 2),
ln b ∼ Normal(0.5, 1),
m ∼ NBinom(k_i, b).   (S209)

Section 4 in the main text details the results of applying this inference to the single-cell mRNA counts data. There we show the posterior distribution for the two parameters for different promoters. Figure S5 shows the so-called posterior predictive checks (see main text for explanation) for all 18 unregulated promoters shown in the main text.

S3.3 Bayesian inference on the simple-repression architecture
As detailed in Section 4 of the main text, the inference on the unregulated promoter served as a stepping stone towards our ultimate goal of inferring repressor rates from the steady-state mRNA distributions of simple-repression architectures. For this we expand the one-state bursty promoter model to a two-state promoter as schematized in Figure 1(C) as model 5. This model adds two new parameters: the repressor binding rate k^+, solely a function of the repressor concentration, and the repressor dissociation rate k^−, solely a function of the repressor-DNA binding affinity.

The structure of the experiments in [36] for regulated promoters tuned these two parameters independently. In their work the production of the LacI repressor was under the control of an inducible promoter regulated by the TetR repressor, as schematized in Figure S6. When TetR binds to the small molecule anhydrotetracycline (aTc), it shifts to an inactive conformation unable to bind the DNA, which translates into an increase in gene expression. In other words, the higher the concentration of aTc added to the media, the fewer TetR repressors that can control the expression of the lacI gene, and so the higher the concentration of LacI repressors in the cell. So by tuning the amount of aTc in the media in which the experimental strains were grown, the authors effectively tune k^+ in our simple theoretical model. On the other hand, to tune k^− the authors swap three different binding sites for the LacI repressor, each with a different, previously characterized repressor-DNA binding affinity [33].

What this means is that we have access to data with different combinations of k^− and k^+. We could naively try to fit the kinetic parameters individually for each of the datasets, but there is no reason to believe that the binding site identity for the LacI repressor somehow affects its expression level, which is controlled from a completely different location in the genome, nor vice versa.
In other words, what makes the most sense is to fit all datasets together to obtain a single value for each of the association and dissociation rates. As described in Section 4 of the main text, this means we have a seven dimensional parameter space, with four possible association rates k^+ given the four available aTc concentrations, and three possible dissociation rates k^− given the three different binding sites available in the dataset.

Formally now, denote the set of seven repressor rates to be inferred as

k⃗ = {k^−_Oid, k^−_O1, k^−_O2, k^+_0.5, k^+_1, k^+_2, k^+_10}.   (S210)

Note that since the repressor copy numbers are not known directly, as explained before, we label the association rates by the concentration of aTc. Bayes' theorem reads simply

p(k⃗, k_i, b | D) ∝ p(D | k⃗, k_i, b) p(k⃗, k_i, b),   (S211)

Figure S5. Theory-data comparison of inference on unregulated promoters.
Comparison of the inference (red shaded area) vs. the experimental measurements (black lines) for 18 different unregulated promoters with different mean mRNA expression levels from Ref. [36]. Upper panels show the empirical cumulative distribution function (ECDF), while the lower panels show the differences with respect to the median of the posterior samples. White numbers are the same as in Figure 1 for cross comparison. The predicted binding energies β∆ε_p were obtained from the energy matrix model in Ref. [54].
Figure S6. aTc-controlled expression of the LacI repressor.
Schematic of the circuit used in [36] to control the expression of the LacI repressor. The lacI gene is under the control of the TetR repressor. Since TetR is inactivated upon binding anhydrotetracycline (aTc), the more aTc added to the media where cells are growing, the fewer TetR repressors are available to control the expression of the lacI gene, resulting in more LacI repressors per cell. LacI in turn controls the expression of the mRNA on which single-molecule mRNA FISH was performed for gene expression quantification.

In Eq. S211, D is the set of all N observed single-cell mRNA counts across the various conditions. We assume that individual single-cell measurements are independent, so that the likelihood factorizes as

$$p(D \mid \vec{k}, k_i, b) = \prod_{j=1}^{N} p(m_j \mid \vec{k}, k_i, b) = \prod_{j=1}^{N} p(m_j \mid k_j^+, k_j^-, k_i, b), \tag{S212}$$

where k_j^± represent the appropriate binding and unbinding rates for the j-th measured cell. Our likelihood function, previously derived in Appendix S2, is given by the rather complicated result in Eq. S184, which for completeness we reproduce here as

$$p(m \mid k_R^+, k_R^-, k_i, b) = \frac{\Gamma(\alpha + m)\,\Gamma(\beta + m)\,\Gamma(k_R^+ + k_R^-)}{\Gamma(\alpha)\,\Gamma(\beta)\,\Gamma(k_R^+ + k_R^- + m)}\, \frac{b^m}{m!}\, {}_2F_1(\alpha + m, \beta + m, k_R^+ + k_R^- + m; -b), \tag{S213}$$

where α and β, defined for notational convenience, are

$$\alpha = \frac{1}{2}\left(k_i + k_R^- + k_R^+ + \sqrt{\left(k_i + k_R^- + k_R^+\right)^2 - 4 k_i k_R^-}\right), \qquad \beta = \frac{1}{2}\left(k_i + k_R^- + k_R^+ - \sqrt{\left(k_i + k_R^- + k_R^+\right)^2 - 4 k_i k_R^-}\right). \tag{S214}$$

Next we specify priors. As for the constitutive model, weakly informative lognormal priors are a natural choice for all our rates. We found that if the priors were too weak, our MCMC sampler would often become stuck in regions of parameter space with very low probability density, unable to move.
We struck a balance in choosing our prior widths: wide enough that the marginal posteriors for each parameter were not artificially constrained or distorted by the prior, which we verified, but narrow enough to help the sampler run. The only exception is the highly informative priors we placed on k_i and b, since we have strong knowledge of them from our inference on constitutive promoters above.

With priors and likelihood specified we may write down our complete generative model as

$$\begin{aligned} \log k_i &\sim \text{Normal}(\mu_{k_i}, \sigma_{k_i}) \\ \log b &\sim \text{Normal}(\mu_b, \sigma_b) \\ \log k^+_c &\sim \text{Normal}(\mu^+_c, \sigma^+_c), \quad c \in \{0.5, 1, 2, 10\}\ \text{ng/mL aTc} \\ \log k^-_O &\sim \text{Normal}(\mu^-_O, \sigma^-_O), \quad O \in \{\text{Oid}, \text{O1}, \text{O2}\} \\ m &\sim \text{Likelihood}(k_R^+, k_R^-, k_i, b), \end{aligned} \tag{S215}$$

where the likelihood is specified by Eq. S213, and the hyperparameters of the lognormal priors are chosen as described above: weakly informative for the seven repressor rates and highly informative for k_i and b. We ran MCMC sampling on the full nine-dimensional posterior specified by this generative model.

We found that fitting a single operator/aTc concentration at a time, with a single binding and unbinding rate, did not yield a stable inference for most of the possible operator/aTc combinations. In other words, a single dataset could not independently resolve the binding and unbinding rates, only their ratio, as set by the mean fold-change in Figure 1 in the main text. Only by assuming a single unique binding rate for each repressor copy number and a single unique unbinding rate for each binding site, as done in Figure 4(A), was it possible to independently resolve the rates and not merely their ratios.

We also note that we found it necessary to exclude the very weakly and very strongly repressed datasets from Jones et al. [36]. In both cases there was, in a sense, not enough information in the distributions for our inference algorithm to extract, and their inclusion simply caused problems for the MCMC sampler without yielding any new insight. For the strongly repressed data (Oid, 10 ng/mL aTc), with >
95% of cells showing zero mRNA, there was quite literally very little data from which to infer rates. And the weakly repressed data, all with the repressor binding site O3, had an unbinding rate so fast that the sampler essentially sampled from the prior: the likelihood had negligible influence, meaning the data were not informing the sampler in any meaningful way, so no inference was possible.