First-principles prediction of the information processing capacity of a simple genetic circuit
Manuel Razo-Mejia, Sarah Marzen, Griffin Chure, Rachel Taubman, Muir Morrison, Rob Phillips
Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA; Department of Physics, W. M. Keck Science Department; Department of Physics, California Institute of Technology, Pasadena, CA 91125, USA
*Correspondence: [email protected]
Abstract
Given the stochastic nature of gene expression, genetically identical cells exposed to the same environmental inputs will produce different outputs. This heterogeneity has been hypothesized to have consequences for how cells are able to survive in changing environments. Recent work has explored the use of information theory as a framework to understand the accuracy with which cells can ascertain the state of their surroundings. Yet the predictive power of these approaches is limited and has not been rigorously tested using precision measurements. To that end, we generate a minimal model for a simple genetic circuit in which all parameter values for the model come from independently published data sets. We then predict the information processing capacity of the genetic circuit for a suite of biophysical parameters such as protein copy number and protein-DNA affinity. We compare these parameter-free predictions with an experimental determination of protein expression distributions and the resulting information processing capacity of E. coli cells. We find that our minimal model captures the scaling of the cell-to-cell variability in the data and the inferred information processing capacity of our simple genetic circuit up to a systematic deviation.
As living organisms thrive in a given environment, they are faced with constant changes in their surroundings. From abiotic conditions such as temperature fluctuations or changes in osmotic pressure, to biological interactions such as cell-to-cell communication in a tissue or in a bacterial biofilm, living organisms of all types sense and respond to external signals. Fig. 1(A) shows a schematic of this process for a bacterial cell sensing a concentration of an extracellular chemical. At the molecular level where signal transduction unfolds mechanistically, there are physical constraints on the accuracy and precision of these responses given by intrinsic stochastic fluctuations [1]. This means that two genetically identical cells exposed to the same stimulus will not have identical responses [2].

One implication of this noise in biological systems is that cells do not have an infinite resolution to distinguish signals and, as a consequence, there is a one-to-many mapping between inputs and outputs. Furthermore, given the limited number of possible outputs, there are overlapping responses between different inputs. This scenario can be mapped to a Bayesian inference problem where cells try to infer the state of the environment from their phenotypic response, as schematized in Fig. 1(B). The question then becomes this: how can one analyze this probabilistic, rather than deterministic, relationship between inputs and outputs? The abstract answer to this question was worked out in 1948 by Claude Shannon who, in his seminal work, founded the field of information theory [3]. Shannon developed a general framework for how to analyze information transmission through noisy communication channels. In his work, Shannon showed that the only quantity that satisfies three reasonable axioms for a measure of uncertainty was of the same functional form as the thermodynamic entropy, thereby christening his metric the information entropy [4].
He also gave a definition, based on this information entropy, for the relationship between inputs and outputs known as the mutual information. The mutual information $I$ between input $c$ and output $p$, given by

$$I = \sum_c P(c) \sum_p P(p \mid c) \log_2 \frac{P(p \mid c)}{P(p)}, \tag{1}$$

quantifies how much we learn about the state of the input $c$ given that we get to observe the output $p$. In other words, the mutual information can be thought of as a generalized correlation coefficient that quantifies the degree to which the uncertainty about a random event decreases given the knowledge of the average outcome of another random event [5].

It is natural to conceive of scenarios in which living organisms that can better resolve signals might have an evolutionary benefit, making it more likely that their offspring will have a fitness advantage [6]. In recent years there has been a growing interest in understanding the theoretical limits on cellular information processing [7, 8], and in quantifying how close evolution has pushed cellular signaling pathways to these theoretical limits [9–11]. While these studies have treated the signaling pathway as a "black box," explicitly ignoring all the molecular interactions taking place in them, other studies have explored the role that molecular players and regulatory architectures have on these information processing tasks [12–18]. Despite the great advances in our understanding of the information processing capabilities of molecular mechanisms, the field still lacks a rigorous experimental test of these detailed models with precision measurements on a simple system in which physical parameters can be perturbed.
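As a concrete illustration of Eq. 1, the mutual information of a discrete channel can be computed directly from an input distribution and a conditional response matrix. This is a minimal sketch; the function name and array layout are ours, not the paper's:

```python
import numpy as np

def mutual_information(p_c, p_p_given_c):
    """Mutual information I(c; p) in bits, per Eq. 1, from a discrete
    input distribution P(c) and a conditional matrix whose rows are P(p|c)."""
    p_c = np.asarray(p_c, dtype=float)
    cond = np.asarray(p_p_given_c, dtype=float)
    p_p = p_c @ cond                     # marginal output distribution P(p)
    with np.errstate(divide="ignore", invalid="ignore"):
        # terms with P(p|c) = 0 contribute nothing to the sum
        terms = np.where(cond > 0, cond * np.log2(cond / p_p), 0.0)
    return float(p_c @ terms.sum(axis=1))
```

For a noiseless binary channel with a uniform input this returns 1 bit; completely overlapping responses return 0, reflecting the one-to-many mapping discussed above.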
In this work we approach this task with a system that is both theoretically and experimentally tractable, in which molecular parameters can be varied in a controlled manner.

Over the last decade the dialogue between theory and experiments in gene regulation has led to predictive power of models not only over the mean level of gene expression, but also over the noise as a function of relevant parameters such as regulatory protein copy numbers, affinity of these proteins to the DNA promoter, as well as the extracellular concentrations of inducer molecules [19–22]. These models based on equilibrium and non-equilibrium statistical physics have reached a predictive accuracy level such that, for simple cases, it is now possible to design input-output functions [23, 24]. This opens the opportunity to exploit these predictive models to tackle the question of how much information genetic circuits can process. This question lies at the heart of understanding the precision of the cellular response to environmental signals. Fig. 1(C) schematizes a scenario in which two bacterial strains respond with different levels of precision to three possible environmental states, i.e., inducer concentrations. The overlap between the three different responses is what precisely determines the resolution with which cells can distinguish different inputs.
This is analogous to how the point spread function limits the ability to resolve two light-emitting point sources.

In this work we follow the same philosophy of theory-experiment dialogue used to determine model parameters to predict from first principles the effect that biophysical parameters such as transcription factor copy number and protein-DNA affinity have on the information processing capacity of a simple genetic circuit. Specifically, to predict the mutual information between an extracellular chemical signal (input $c$) and the corresponding cellular response in the form of protein expression (output $p$), we must compute the input-output function $P(p \mid c)$. To do so, we use a master-equation-based model to construct the protein copy number distribution as a function of an extracellular inducer concentration for different combinations of transcription factor copy numbers and binding sites. Having these input-output distributions allows us to compute the mutual information $I$ between inputs and outputs for any arbitrary input distribution $P(c)$. We opt to compute the channel capacity, i.e., the maximum information that can be processed by this gene regulatory architecture, defined as Eq. 1 maximized over all possible input distributions $P(c)$. By doing so we examine the physical limits of what cells can do in terms of information processing by harboring these genetic circuits. Nevertheless, given the generality of the input-output function $P(p \mid c)$ we derive, the model presented here can be used to compute the mutual information for any arbitrary input distribution $P(c)$.
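The channel capacity, the maximum of Eq. 1 over input distributions $P(c)$, has no closed form in general, but for a discrete channel it can be estimated with the classic Blahut-Arimoto algorithm. The paper does not specify its numerical implementation, so the following is only a sketch of the standard iteration:

```python
import numpy as np

def channel_capacity(p_p_given_c, tol=1e-9, max_iter=10_000):
    """Blahut-Arimoto estimate of the capacity (in bits) of a discrete
    channel. p_p_given_c has one row per input c, each row being the
    response distribution P(p|c). Returns the capacity and the
    capacity-achieving input distribution P*(c)."""
    Q = np.asarray(p_p_given_c, dtype=float)
    p_c = np.full(Q.shape[0], 1.0 / Q.shape[0])   # start from a uniform input
    lower = 0.0
    for _ in range(max_iter):
        p_p = p_c @ Q                             # output marginal P(p)
        # relative entropy D( P(p|c) || P(p) ) for each input, in bits
        with np.errstate(divide="ignore", invalid="ignore"):
            d = np.where(Q > 0, Q * np.log2(Q / p_p), 0.0).sum(axis=1)
        w = p_c * np.exp2(d)                      # multiplicative update
        lower, upper = np.log2(w.sum()), d.max()  # Blahut-Arimoto bounds
        p_c = w / w.sum()
        if upper - lower < tol:
            break
    return lower, p_c
```

The lower and upper bounds bracket the true capacity at every iteration, so the stopping criterion gives a certified accuracy.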
All parameters used for our model were inferred from a series of studies that span several experimental techniques [20, 25–27], allowing us to make parameter-free predictions of this information processing capacity [28].

These predictions are then contrasted with experimental data, where the channel capacity is inferred from single-cell fluorescence distributions taken at different concentrations of inducer for cells with previously characterized biophysical parameters [20, 27]. We find that our parameter-free predictions quantitatively track the experimental data up to a systematic deviation. The lack of numerical agreement between our model and the experimental data poses new challenges towards having a foundational, first-principles understanding of the physics of cellular decision-making.

The remainder of the paper is organized as follows. In Section 1.1 we define the minimal theoretical model and parameter inference for a simple repression genetic circuit. Section 1.2 discusses how all parameters for the minimal model are determined from published datasets that explore different aspects of the simple repression motif. Section 1.3 computes the moments of the mRNA and protein distributions from this minimal model. In Section 1.4 we explore the consequences of variability in gene copy number during the cell cycle. In this section we compare experimental and theoretical quantities related to the moments of the distribution, specifically the predictions for the fold-change in gene expression (mean expression relative to an unregulated promoter) and the gene expression noise (standard deviation over mean). Section 1.5 follows with reconstruction of the full mRNA and protein distribution from the moments using the maximum entropy principle. Finally, Section 1.6 uses the distributions from Section 1.5 to compute the maximum amount of information that the genetic circuit can process.
Here we again contrast our zero-parameter fit predictions with experimental inferences of the channel capacity.

Figure 1. Cellular signaling systems sense the environment with different degrees of precision. (A) Schematic representation of a cell as a noisy communication channel. From an environmental input (inducer molecule concentration) to a phenotypic output (protein expression level), cellular signaling systems can be modeled as noisy communication channels. (B) We treat cellular response to an external stimulus as a Bayesian inference of the state of the environment. As the phenotype (protein level) serves as the internal representation of the environmental state (inducer concentration), the probability of a cell being in a specific environment given this internal representation $P(c \mid p)$ is a function of the probability of the response given that environmental state $P(p \mid c)$. (C) The precision of the inference of the environmental state depends on how well cells can resolve different inputs. For three different levels of input (left panel) the green strain responds more precisely than the purple strain since the output distributions overlap less (middle panel). This allows the green strain to make a more precise inference of the environmental state given a phenotypic response (right panel).

As a tractable circuit for which we have control over the parameters both theoretically and experimentally, we chose the so-called simple repression motif, a common regulatory scheme among prokaryotes [29]. This circuit consists of a single promoter with an RNA-polymerase (RNAP) binding site and a single binding site for a transcriptional repressor [20]. The regulation due to the repressor occurs via exclusion of the RNAP from its binding site when the repressor is bound, decreasing the likelihood of having a transcription event. As with many important macromolecules, we consider the repressor to be allosteric, meaning that it can exist in two conformations, one in which the repressor is able to bind to the specific binding site (active state) and one in which it cannot bind the specific binding site (inactive state). The environmental signaling occurs via passive import of an extracellular inducer that binds the repressor, shifting the equilibrium between the two conformations of the repressor [27]. In previous work we have extensively characterized the mean response of this circuit under different conditions using equilibrium based models [28]. Here we build upon these models to characterize the full distribution of gene expression with parameters such as repressor copy number and its affinity for the DNA being systematically varied.

As the copy number of molecular species is a discrete quantity, chemical master equations have emerged as a useful tool to model their inherent probability distribution [30]. In Fig.
2(A) we show the minimal model and the necessary set of parameters needed to compute the full distribution of mRNA and its protein gene product. Specifically, we assume a three-state model where the promoter can be found in 1) a transcriptionally active state ($A$ state), 2) a transcriptionally inactive state without the repressor bound ($I$ state) and 3) a transcriptionally inactive state with the repressor bound ($R$ state). We do not assume that the transition between the active state $A$ and the inactive state $I$ occurs due to RNAP binding to the promoter, as the transcription initiation kinetics involve several more steps than simple binding [31]. We coarse-grain all these steps into effective "on" and "off" states for the promoter, consistent with experiments demonstrating the bursty nature of gene expression in E. coli [19]. These three states generate a system of coupled differential equations for each of the three state distributions $P_A(m, p; t)$, $P_I(m, p; t)$ and $P_R(m, p; t)$, where $m$ and $p$ are the mRNA and protein count per cell, respectively, and $t$ is time. Given the rates depicted in Fig. 2(A) we define the system of ODEs for a specific $m$ and $p$.
For the transcriptionally active state, we have

$$\begin{aligned}
\frac{dP_A(m,p)}{dt} = &\overbrace{-k^{(p)}_{\text{off}} P_A(m,p)}^{A \to I} + \overbrace{k^{(p)}_{\text{on}} P_I(m,p)}^{I \to A} \\
&+ \overbrace{r_m P_A(m-1,p)}^{m-1 \to m} - \overbrace{r_m P_A(m,p)}^{m \to m+1} + \overbrace{\gamma_m (m+1) P_A(m+1,p)}^{m+1 \to m} - \overbrace{\gamma_m m P_A(m,p)}^{m \to m-1} \\
&+ \overbrace{r_p m P_A(m,p-1)}^{p-1 \to p} - \overbrace{r_p m P_A(m,p)}^{p \to p+1} + \overbrace{\gamma_p (p+1) P_A(m,p+1)}^{p+1 \to p} - \overbrace{\gamma_p p P_A(m,p)}^{p \to p-1},
\end{aligned} \tag{2}$$

where the state transitions for each term are labeled by overbraces. For the transcriptionally inactive state $I$, we have

$$\begin{aligned}
\frac{dP_I(m,p)}{dt} = &\overbrace{k^{(p)}_{\text{off}} P_A(m,p)}^{A \to I} - \overbrace{k^{(p)}_{\text{on}} P_I(m,p)}^{I \to A} + \overbrace{k^{(r)}_{\text{off}} P_R(m,p)}^{R \to I} - \overbrace{k^{(r)}_{\text{on}} P_I(m,p)}^{I \to R} \\
&+ \overbrace{\gamma_m (m+1) P_I(m+1,p)}^{m+1 \to m} - \overbrace{\gamma_m m P_I(m,p)}^{m \to m-1} \\
&+ \overbrace{r_p m P_I(m,p-1)}^{p-1 \to p} - \overbrace{r_p m P_I(m,p)}^{p \to p+1} + \overbrace{\gamma_p (p+1) P_I(m,p+1)}^{p+1 \to p} - \overbrace{\gamma_p p P_I(m,p)}^{p \to p-1}.
\end{aligned} \tag{3}$$

And finally, for the repressor bound state $R$,

$$\begin{aligned}
\frac{dP_R(m,p)}{dt} = &\overbrace{-k^{(r)}_{\text{off}} P_R(m,p)}^{R \to I} + \overbrace{k^{(r)}_{\text{on}} P_I(m,p)}^{I \to R} + \overbrace{\gamma_m (m+1) P_R(m+1,p)}^{m+1 \to m} - \overbrace{\gamma_m m P_R(m,p)}^{m \to m-1} \\
&+ \overbrace{r_p m P_R(m,p-1)}^{p-1 \to p} - \overbrace{r_p m P_R(m,p)}^{p \to p+1} + \overbrace{\gamma_p (p+1) P_R(m,p+1)}^{p+1 \to p} - \overbrace{\gamma_p p P_R(m,p)}^{p \to p-1}.
\end{aligned} \tag{4}$$

As we will discuss later in Section 1.4, the protein degradation term $\gamma_p$ is set to zero, since active protein degradation is slow compared to the cell cycle of exponentially growing bacteria; instead we explicitly implement binomial partitioning of the proteins into daughter cells upon division [32].

It is convenient to rewrite these equations in a compact matrix notation [30]. For this we define the vector $\mathbf{P}(m,p)$ as

$$\mathbf{P}(m,p) = (P_A(m,p), P_I(m,p), P_R(m,p))^T, \tag{5}$$

where $T$ is the transpose. By defining the matrices $\mathbf{K}$ to contain the promoter state transitions, $\mathbf{R}_m$ and $\mathbf{\Gamma}_m$ to contain the mRNA production and degradation terms, respectively, and $\mathbf{R}_p$ and $\mathbf{\Gamma}_p$ to contain the protein production and degradation terms, respectively, the system of ODEs can then be written as (see Appendix S1 for the full definition of these matrices)

$$\begin{aligned}
\frac{d\mathbf{P}(m,p)}{dt} = &\left( \mathbf{K} - \mathbf{R}_m - m\mathbf{\Gamma}_m - m\mathbf{R}_p - p\mathbf{\Gamma}_p \right) \mathbf{P}(m,p) \\
&+ \mathbf{R}_m \mathbf{P}(m-1,p) + (m+1)\mathbf{\Gamma}_m \mathbf{P}(m+1,p) \\
&+ m\mathbf{R}_p \mathbf{P}(m,p-1) + (p+1)\mathbf{\Gamma}_p \mathbf{P}(m,p+1).
\end{aligned} \tag{6}$$

Having defined the gene expression dynamics, we now proceed to determine all rate parameters in Eq. 6.

A decade of research in our group has characterized the simple repression motif with an ever expanding array of predictions and corresponding experiments to uncover the physics of this genetic circuit [28]. In doing so we have come to understand the mean response of a single promoter in the presence of varying levels of repressor copy numbers and repressor-DNA affinities [20], the effect that competing binding sites and multiple promoter copies impose [26], and, in recent work assisted by the Monod-Wyman-Changeux (MWC) model, we expanded the scope to the allosteric nature of the repressor [27]. All of these studies have exploited the simplicity and predictive power of equilibrium approximations to these non-equilibrium systems [33]. We have also used a similar kinetic model to that depicted in Fig. 2(A) to study the noise in mRNA copy number [25]. As a test case of the depth of our theoretical understanding of this simple transcriptional regulation system, we combine all of the studies mentioned above to inform the parameter values of the model presented in Fig. 2(A). Fig. 2(B) schematizes the data sets and experimental techniques used to measure gene expression along with the parameters that can be inferred from them.

Appendix S2 expands on the details of how the inference was performed for each of the parameters. Briefly, the promoter activation and inactivation rates $k^{(p)}_{\text{on}}$ and $k^{(p)}_{\text{off}}$, as well as the transcription rate $r_m$, were obtained in units of the mRNA degradation rate $\gamma_m$ by fitting a two-state promoter model (no state $R$ from Fig. 2(A)) [34] to mRNA FISH data of an unregulated promoter (no repressor present in the cell) [25]. The repressor on rate is assumed to be of the form $k^{(r)}_{\text{on}} = k_o [R]$, where $k_o$ is a diffusion-limited on rate and $[R]$ is the concentration of active repressor in the cell [25].
This concentration of active repressor is in turn determined by the repressor copy number in the cell and the fraction of these repressors that are in the active state, i.e., able to bind DNA. Existing estimates of the transition rates between conformations of allosteric molecules set them at the microsecond scale [35]. By considering this to be representative for our repressor of interest, the separation of time scales between the rapid conformational changes of the repressor and the slower downstream processes such as open-complex formation allows us to model the probability of the repressor being in the active state as an equilibrium MWC process. The parameters of the MWC model $K_A$, $K_I$ and $\Delta\varepsilon_{AI}$ were previously characterized from video-microscopy and flow-cytometry data [27]. For the repressor off rate, $k^{(r)}_{\text{off}}$, we take advantage of the fact that the mean mRNA copy number as derived from the model in Fig. 2(A), cast in the language of rates, is of the same functional form as the equilibrium model cast in the language of binding energies [36]. Therefore the value of the repressor-DNA binding energy $\Delta\varepsilon_r$ constrains the value of the repressor off rate $k^{(r)}_{\text{off}}$. These constraints on the rates allow us to make self-consistent predictions under both the equilibrium and the kinetic framework. Having all parameters in hand, we can now proceed to solve the gene expression dynamics.
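As an illustration, the equilibrium MWC probability that the repressor is in its active (DNA-binding) state can be written as a short function. This is a sketch: the functional form is the standard MWC expression with two inducer binding sites, and the default parameter values quoted here ($K_A = 139$ µM, $K_I = 0.53$ µM, $\Delta\varepsilon_{AI} = 4.5\,k_BT$) are those reported for LacI in [27]; the function name is ours:

```python
import numpy as np

def p_active(c, K_A=139.0, K_I=0.53, delta_e_AI=4.5, n=2):
    """Probability that an allosteric repressor is in its active
    (DNA-binding) state at inducer concentration c (same units as
    K_A and K_I, here uM), modeled as an equilibrium MWC process
    with n inducer binding sites. delta_e_AI is the active/inactive
    state energy difference in units of k_B T."""
    active = (1.0 + c / K_A) ** n
    inactive = np.exp(-delta_e_AI) * (1.0 + c / K_I) ** n
    return active / (active + inactive)

# The repressor binding rate then scales as k_on_r = k_o * [R] * p_active(c),
# with [R] the repressor concentration and k_o a diffusion-limited on rate.
```

With these parameters the repressor is almost always active without inducer, and inducer binding shifts the equilibrium toward the inactive state, relieving repression.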
Figure 2. Minimal kinetic model of transcriptional regulation for a simple repression architecture. (A) Three-state promoter stochastic model of transcriptional regulation by a repressor. The regulation by the repressor occurs via exclusion of the transcription initiation machinery, not allowing the promoter to transition to the transcriptionally active state. All parameters highlighted with colored boxes were determined from published datasets based on the same genetic circuit. Parameters in dashed boxes were taken directly from values reported in the literature or adjusted to satisfy known biological restrictions. (B) Data sets used to infer the parameter values. From left to right: Garcia & Phillips [20] is used to determine $k^{(r)}_{\text{off}}$ and $k^{(r)}_{\text{on}}$; Brewster et al. [26] is used to determine $\Delta\varepsilon_{AI}$ and $k^{(r)}_{\text{on}}$; Razo-Mejia et al. [27] is used to determine $K_A$, $K_I$, and $k^{(r)}_{\text{on}}$; and Jones et al. [25] is used to determine $r_m$, $k^{(p)}_{\text{on}}$, and $k^{(p)}_{\text{off}}$.

Finding analytical solutions to chemical master equations is often fraught with difficulty. An alternative approach is to approximate the distribution. One such scheme of approximation, the maximum entropy principle, makes use of the moments of the distribution to approximate the full distribution.
In this section we will demonstrate an iterative algorithm to compute the mRNA and protein distribution moments. The kinetic model for the simple repression motif depicted in Fig. 2(A) consists of an infinite system of ODEs for each possible pair of mRNA and protein copy number, $(m, p)$. To compute any moment of the distribution, we define a vector

$$\langle m^x p^y \rangle \equiv \left( \langle m^x p^y \rangle_A, \langle m^x p^y \rangle_I, \langle m^x p^y \rangle_R \right)^T, \tag{7}$$

where $\langle m^x p^y \rangle_S$ is the expected value of $m^x p^y$ in state $S \in \{A, I, R\}$ for $x, y \in \mathbb{N}$. In other words, just as we defined the vector $\mathbf{P}(m,p)$, here we define a vector to collect the expected value of each of the promoter states. By definition, any of these moments $\langle m^x p^y \rangle_S$ can be computed as

$$\langle m^x p^y \rangle_S \equiv \sum_{m=0}^{\infty} \sum_{p=0}^{\infty} m^x p^y P_S(m,p). \tag{8}$$

Summing over all possible values for $m$ and $p$ in Eq. 6 results in an ODE for any moment of the distribution of the form (see Appendix S3 for the full derivation)

$$\begin{aligned}
\frac{d\langle m^x p^y \rangle}{dt} = &\mathbf{K} \langle m^x p^y \rangle + \mathbf{R}_m \langle p^y [(m+1)^x - m^x] \rangle + \mathbf{\Gamma}_m \langle m p^y [(m-1)^x - m^x] \rangle \\
&+ \mathbf{R}_p \left\langle m^{x+1} [(p+1)^y - p^y] \right\rangle + \mathbf{\Gamma}_p \langle m^x p [(p-1)^y - p^y] \rangle.
\end{aligned} \tag{9}$$

Given that all transitions in our stochastic model are first-order reactions, Eq. 9 has no moment-closure problem [14]. This means that the dynamical equation for a given moment only depends on lower moments (see Appendix S3 for the full proof). This feature of our model implies, for example, that the second moment of the protein distribution $\langle p^2 \rangle$ depends only on the first two moments of the mRNA distribution $\langle m \rangle$ and $\langle m^2 \rangle$, the first protein moment $\langle p \rangle$, and the cross-correlation term $\langle mp \rangle$.
We can therefore define $\boldsymbol{\mu}^{(x,y)}$ to be a vector containing all moments up to $\langle m^x p^y \rangle$ for all promoter states,

$$\boldsymbol{\mu}^{(x,y)} = \left[ \langle m^0 p^0 \rangle, \langle m^1 p^0 \rangle, \ldots, \langle m^x p^y \rangle \right]^T. \tag{10}$$

Explicitly, for the three-state promoter model depicted in Fig. 2(A) this vector takes the form

$$\boldsymbol{\mu}^{(x,y)} = \left[ \langle m^0 p^0 \rangle_A, \langle m^0 p^0 \rangle_I, \langle m^0 p^0 \rangle_R, \ldots, \langle m^x p^y \rangle_A, \langle m^x p^y \rangle_I, \langle m^x p^y \rangle_R \right]^T. \tag{11}$$

Given this definition we can compute the general moment dynamics as

$$\frac{d\boldsymbol{\mu}^{(x,y)}}{dt} = \mathbf{A} \boldsymbol{\mu}^{(x,y)}, \tag{12}$$

where $\mathbf{A}$ is a square matrix that contains all the numerical coefficients that relate each of the moments. We can then use Eq. 9 to build matrix $\mathbf{A}$ by iteratively substituting values for the exponents $x$ and $y$ up to a specified value. In the next section, we will use Eq. 12 to numerically integrate the dynamical equations for our moments of interest as cells progress through the cell cycle. We will then use the value of the moments of the distribution to approximate the full gene expression distribution. This method is computationally more efficient than trying to numerically integrate the infinite set of equations describing the full probability distribution $P(m,p)$, or using a stochastic algorithm to sample from the distribution.

1.4 Accounting for cell-cycle dependent variability in gene dosage

As cells progress through the cell cycle, the genome has to be replicated to guarantee that each daughter cell receives a copy of the genetic material. As replication of the genome can take longer than the total cell cycle, cells spend part of the cell cycle with multiple copies of each gene, depending on the cellular growth rate and the relative position of the gene with respect to the replication origin [37].
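To build intuition for how this gene-copy-number variability propagates to expression, the mean dynamics can be integrated numerically with the mRNA production rate doubling partway through the cell cycle and the means halving at division. This is an illustrative sketch with made-up rate values, not the parameters inferred in this work:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative rates (1/min); not the inferred parameters of the paper
r_m, gamma_m = 15.0, 0.5     # mRNA production / degradation
r_p = 0.1                    # translation rate per mRNA
t_rep, t_div = 20.0, 60.0    # gene replication time and division time

def mean_dynamics(t, y):
    """d<m>/dt and d<p>/dt: the mRNA production rate doubles once the
    gene is replicated; protein is only diluted by division (gamma_p = 0)."""
    m, p = y
    rate = r_m if (t % t_div) < t_rep else 2.0 * r_m
    return [rate - gamma_m * m, r_p * m]

y = [r_m / gamma_m, 0.0]     # start at the single-copy mRNA steady state
for cycle in range(8):
    span = (cycle * t_div, (cycle + 1) * t_div)
    p_start = y[1]
    sol = solve_ivp(mean_dynamics, span, y, max_step=0.5)
    m_end, p_end = sol.y[0, -1], sol.y[1, -1]
    y = [m_end / 2.0, p_end / 2.0]   # binomial partitioning halves the means

# The fast-degrading mRNA relaxes to each dosage steady state within the
# cycle (2 r_m / gamma_m at the end), while the slow protein never reaches
# one; instead it settles into a periodic pattern with p_end = 2 * p_start.
```

The contrast between the two species previews the argument made below: mRNA can be treated as instantaneously at steady state for each dosage, while protein must be integrated through the cycle.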
Genes closer to the replication origin spend a larger fraction of the cell cycle with multiple copies compared to genes closer to the replication termination site [37]. Fig. 3(A) depicts a schematic of this process where the replication origin (oriC) and the relevant locus for our experimental measurements (galK) are highlighted.

Since this change in gene copy number has been shown to have an effect on cell-to-cell variability in gene expression [25, 38], we now extend our minimal model to account for these changes in gene copy number during the cell cycle. We reason that the only difference between the single-copy state and the two-copy state of the promoter is a doubling of the mRNA production rate $r_m$. In particular, the promoter activation and inactivation rates $k^{(p)}_{\text{on}}$ and $k^{(p)}_{\text{off}}$ and the mRNA production rate $r_m$ inferred in Section 1.1 assume that cells spend a fraction $f$ of the cell cycle with one copy of the promoter (mRNA production rate $r_m$) and a fraction $(1 - f)$ of the cell cycle with two copies of the promoter (mRNA production rate $2r_m$). This inference was performed considering that at each cell state the mRNA level immediately reaches the steady state value for the corresponding mRNA production rate. This assumption is justified since the timescale to reach this steady state depends only on the degradation rate $\gamma_m$, which for the mRNA is much shorter than the length of the cell cycle ($\approx$ 60 min for our experimental conditions) [39]. Appendix S2 shows that a model accounting for this gene copy number variability is able to capture data from single-molecule mRNA counts of an unregulated (constitutively expressed) promoter.

Given that the protein degradation rate $\gamma_p$ in our model is set by the cell division time, we do not expect the protein count to reach the corresponding steady state value for each stage in the cell cycle. In other words, cells do not spend long enough with two copies of the promoter for the protein level to reach the steady state value corresponding to a transcription rate of $2r_m$. We therefore use the dynamical equations developed in Section 1.3 to numerically integrate the time trajectory of the moments of the distribution with the corresponding parameters for each phase of the cell cycle. Fig. 3(B) shows an example corresponding to the mean mRNA level (upper panel) and the mean protein level (lower panel) for the case of the unregulated promoter. Given that we inferred the promoter rate parameters considering that mRNA reaches steady state in each stage, we see that the numerical integration of the equations is consistent with the assumption of having the mRNA reach a stable value in each stage (see Fig. 3(B) upper panel). On the other hand, the mean protein level does not reach a steady state at either of the cellular stages. Nevertheless, it is notable that after several cell cycles the trajectory from cycle to cycle follows a repetitive pattern (see Fig. 3(B) lower panel). Previously we have experimentally observed this repetitive pattern by tracking the expression level over time with video microscopy, as observed in Fig. 18 of [28].

To test the effects of including this gene copy number variability in our model we now compare the predictions of the model with experimental data. As detailed in the Methods section, we obtained single-cell fluorescence values of different E. coli strains carrying a YFP gene under the control of the LacI repressor, with each strain exposed to twelve different inducer concentrations. Since the cells in these measurements are unsynchronized, the mean protein copy number is time dependent, $\langle p \rangle \equiv \langle p(t) \rangle$. This means that computing the mean of a population of unsynchronized cells is equivalent to averaging this time-dependent mean protein copy number over the span of the cell cycle. Mathematically this is expressed as

$$\langle p \rangle_c = \int_{t_o}^{t_d} \langle p(t) \rangle P(t)\, dt, \tag{13}$$

where $\langle p(t) \rangle$ represents the first moment of the protein distribution as computed from Eq. 9, $\langle p \rangle_c$ represents the average protein copy number over a cell cycle, $t_o$ represents the start of the cell cycle, $t_d$ represents the time of cell division, and $P(t)$ represents the probability of any cell being at time $t \in [t_o, t_d]$ of its cell cycle. We do not consider cells to be uniformly distributed along the cell cycle since it is known that cell age is exponentially distributed, with more younger than older cells at any point in time [40] (see Appendix S10 for further details). All computations hereafter are therefore done by applying an average like that in Eq. 13 over the span of a cell cycle. We remind the reader that these time averages are done under a fixed environmental state. It is the trajectory of cells over cell cycles under a constant environment that we need to account for. It is through this averaging over the span of a cell cycle that we turn a periodic process like the one shown in Fig. 3(B) into a stationary process that we can compare with experimental data and, as we will see later, use to reconstruct the steady state gene expression distribution.

Fig. 3(C) compares zero-parameter fit predictions (lines) with experimentally determined quantities (points). The upper row shows the non-dimensional quantity known as the fold-change in gene expression [20].
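Before turning to the fold-change, the average in Eq. 13 can be sketched numerically. Here we take the age distribution of an exponentially growing population to be $P(a) = (2 \ln 2 / t_d)\, 2^{-a/t_d}$ for $a \in [0, t_d]$, the standard result behind the statement that younger cells outnumber older ones [40]; the function name and discretization are ours:

```python
import numpy as np

def cell_cycle_average(t, p_mean, t_div):
    """Average a cell-cycle trajectory <p(t)> over the age distribution
    P(a) = (2 ln2 / t_div) * 2^(-a/t_div) of an exponentially growing
    population. t runs from 0 (birth) to t_div (division)."""
    age_dist = (2.0 * np.log(2.0) / t_div) * 2.0 ** (-t / t_div)
    w = p_mean * age_dist
    # trapezoidal integration; normalizing by the discretized weight
    # cancels discretization error in the age distribution itself
    num = np.sum((w[1:] + w[:-1]) * np.diff(t))
    den = np.sum((age_dist[1:] + age_dist[:-1]) * np.diff(t))
    return num / den
```

Because young cells dominate the population, a monotonically growing trajectory averages to less than its midpoint value; for $\langle p(t) \rangle = t$ the average sits near $0.44\, t_d$ rather than $0.5\, t_d$.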
This fold-change is defined as the mean gene expression level relative to that of an unregulated promoter. For protein this is

fold-change = ⟨p(R > 0)⟩_c / ⟨p(R = 0)⟩_c,    (14)

where ⟨p(R > 0)⟩_c represents the mean protein count over the entire cell cycle for cells with non-zero repressor copy number R, and ⟨p(R = 0)⟩_c represents the equivalent for a strain with no repressors present. The experimental points were determined from the YFP fluorescence intensities of cells with varying repressor copy number and of a ∆lacI strain with no repressor gene present (see Methods for further details). The fold-change in gene expression has previously served as a metric to test the validity of equilibrium-based models [36]. We note that the curves shown in the upper panel of Fig. 3(C) are consistent with the predictions from equilibrium models [27], despite being generated from a clearly non-equilibrium process as shown in Fig. 3(B). The kinetic model from Fig. 2(A) goes beyond the equilibrium picture to generate predictions for moments of the distribution other than the mean mRNA or mean protein count. To test this extended predictive power, the lower row of Fig. 3(C) shows the noise in gene expression, defined as the standard deviation over the mean protein count, accounting for the changes in gene dosage during the cell cycle. Although our model systematically underestimates the noise in gene expression, the zero-parameter fits capture the scaling of this noise. Possible origins of this systematic discrepancy could be the intrinsic cell-to-cell variability of rate parameters, given the variability in the molecular components of the central dogma machinery [25], or noise generated by irreversible non-equilibrium reactions not explicitly taken into account in our minimal model [41]. The large errors for the highly repressed strains (lower left panel in Fig.
3(C)) are a result of having a small number (the mean fluorescence level) in the denominator when computing the noise. Although the model is still highly informative about the physical nature of how cells regulate their gene expression, the lack of exact numerical agreement between theory and data opens an opportunity to gain new insights into the biophysical origin of cell-to-cell variability. In Appendix S9 we explore empirical ways to account for this systematic deviation. We direct the reader to Appendix S5, where equivalent predictions are made ignoring the changes in gene dosage due to the replication of the genome.

Having numerically computed the moments of the mRNA and protein distributions as cells progress through the cell cycle, we now proceed to make an approximate reconstruction of the full distributions given this limited information. As hinted in Section 1.3, the maximum entropy principle, first proposed by E. T. Jaynes in 1957, approximates the entire distribution by maximizing the Shannon entropy subject to constraints given by the values of the moments of the distribution [42]. This procedure leads to a probability distribution of the form (see Appendix S6 for the full derivation)

P(m, p) = (1/Z) exp( − Σ_{(x,y)} λ_{(x,y)} m^x p^y ),    (15)

Figure 3. Accounting for gene copy number variability during the cell cycle. (A) Schematic of a replicating bacterial genome. As cells progress through the cell cycle the genome is replicated, duplicating gene copies for a fraction of the cell cycle before the cell divides. oriC indicates the replication origin, and galK indicates the locus at which the YFP reporter construct was integrated. (B) Mean (solid line) ± standard deviation (shaded region) for the mRNA (upper panel) and protein (lower panel) dynamics.
Cells spend a fraction of the cell cycle with a single copy of the promoter (light brown) and the rest of the cell cycle with two copies (light yellow). Black arrows indicate the time of cell division. (C) Zero-parameter fit predictions (lines) and experimental data (circles) for the gene expression fold-change (upper row) and noise (lower row) for repressor binding sites with different affinities (different columns) and different repressor copy numbers per cell (different lines on each panel). Error bars in the data represent the 95% confidence interval on the quantities, as computed from 10,000 bootstrap estimates generated from >
500 single-cell fluorescence measurements. In the theory curves, dotted lines indicate a linear scale (to include zero), while solid lines indicate a logarithmic scale. For visual clarity, data points in the noise panels with exceptionally large values, coming from highly repressed strains, are plotted on a separate panel.

Here λ_{(x,y)} is the Lagrange multiplier associated with the constraint set by the moment ⟨m^x p^y⟩, and Z is a normalization constant. The more moments ⟨m^x p^y⟩ are included as constraints, the more accurate the approximation resulting from Eq. 15 becomes. The computational challenge then becomes an optimization routine in which one must find the values of the Lagrange multipliers λ_{(x,y)} that are consistent with the constraints set by the moment values ⟨m^x p^y⟩. This is computationally more efficient than sampling directly out of the master equation with a stochastic algorithm (see Appendix S7 for a comparison between maximum entropy estimates and the Gillespie algorithm). Appendix S6 details our implementation of a robust algorithm to find the values of the Lagrange multipliers.

Fig. 4(A) shows example protein distributions reconstructed using the first six moments of the protein distribution for a suite of different biophysical parameters and environmental inducer concentrations. As the repressor-DNA binding affinity (columns in Fig. 4(A)) and repressor copy number (rows in Fig. 4(A)) are varied, the responses to different signals, i.e. inducer concentrations, overlap to varying degrees. For example, the upper right frame, with a weak binding site (∆ε_r = − . k_BT) and a low repressor copy number (22 repressors per cell), shows virtually identical distributions regardless of the input inducer concentration. This means that cells with this set of parameters cannot resolve any difference in the concentration of the signal.
As the number of repressors is increased, the degree of overlap between distributions decreases, allowing cells to better resolve the value of the signal input. At the opposite extreme, the lower left panel shows a strong binding site (∆ε_r = − . k_BT) and a high repressor copy number (1740 repressors per cell). This parameter combination also shows overlap between distributions, since the high degree of repression centers all distributions towards lower copy numbers, again giving the cells little ability to resolve the inputs. In Fig. 4(B) and Appendix S6 we compare these predicted cumulative distributions with the experimental single-cell fluorescence distributions. Given the systematic deviation of our predictions for the protein copy number noise highlighted in Fig. 3(C), the theoretical distributions (dashed lines) underestimate the width of the experimental data. We again direct the reader to Appendix S9 for an exploration of empirical changes to the moments that improve the agreement of the predictions. In the following section we formalize the notion of how well cells can resolve different inputs from an information-theoretic perspective via the channel capacity.

We now turn our focus to the channel capacity, a metric by which we can quantify the degree to which cells can measure the environmental state (in this context, the inducer concentration). The channel capacity is defined as the mutual information I between input and output (Eq. 1), maximized over all possible input distributions P(c). When used as a metric of how reliably a signaling system can infer the state of the external signal, the channel capacity, measured in bits, is commonly interpreted as the logarithm of the number of states that the signaling system can properly resolve. For example, a signaling system with a channel capacity of C bits is interpreted as being able to resolve 2^C states, though channel capacities with fractional values are allowed.
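To make this definition concrete, the short sketch below estimates the mutual information, in bits, between a discrete input c and output p directly from a joint distribution P(c, p). The toy two-input, two-output matrices are purely illustrative and are not taken from the paper's data.

```python
import numpy as np

def mutual_information(joint):
    """Mutual information I(c; p) in bits from a joint distribution P(c, p)."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()                # normalize to a probability distribution
    p_c = joint.sum(axis=1, keepdims=True)     # marginal over outputs -> P(c)
    p_p = joint.sum(axis=0, keepdims=True)     # marginal over inputs  -> P(p)
    mask = joint > 0                           # zero-probability terms contribute nothing
    return float(np.sum(joint[mask] * np.log2(joint[mask] / (p_c @ p_p)[mask])))

# A noiseless binary channel: each input maps to a distinct output -> 1 bit (2^1 states).
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))

# Fully overlapping outputs: the output says nothing about the input -> 0 bits.
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))
```

Partially overlapping conditionals give intermediate, fractional values, which is why fractional channel capacities arise naturally.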
We therefore prefer the Bayesian interpretation: the mutual information quantifies the improvement in the inference of the input when considering the output, compared to using only the prior distribution of the input for prediction [14, 43]. Under this interpretation, a channel capacity of a fractional bit still quantifies an improvement in the ability of the signaling system to infer the value of the extracellular signal compared to having no sensing system at all.

Computing the channel capacity implies optimizing over an infinite space of possible input distributions P(c). For special cases in which the noise is small compared to the dynamic range, approximate analytical expressions have been derived [17]. But given the high cell-to-cell variability that our model predicts, the conditions of this so-called small noise approximation are not satisfied. We therefore appeal to a numerical solution known as the Blahut-Arimoto algorithm [44] (see Appendix S8 for further details). Fig. 5(A) shows zero-parameter fit predictions of the channel capacity as a function of the number of repressors for different repressor-DNA affinities (solid lines). These predictions are contrasted with experimental determinations of the channel capacity, as inferred from single-cell fluorescence intensity distributions taken over 12 different concentrations of inducer. Briefly, from single-cell fluorescence measurements we can approximate the input-output distribution P(p | c). Once these conditional distributions are fixed, the task of finding the input distribution at channel capacity becomes a computational optimization routine that can be undertaken with conjugate-gradient or similar algorithms. For the particular case of the channel capacity of a system with a discrete number of inputs and outputs, the Blahut-Arimoto algorithm guarantees convergence towards the optimal input distribution (see Appendix S8 for further details). Fig.
5(B) shows example input-output functions for different values of the channel capacity. This illustrates that having access to no information (zero channel capacity) is a consequence of having overlapping input-output functions (lower panel). Conversely, the more separated the input-output distributions are (upper panel), the higher the channel capacity can be.

Figure 4. Maximum entropy protein distributions for varying physical parameters. (A) Predicted protein distributions under different inducer (IPTG) concentrations for different combinations of repressor-DNA affinities (columns) and repressor copy numbers (rows). The first six moments of the protein distribution used to constrain the maximum entropy approximation were computed by integrating Eq. 9 as cells progressed through the cell cycle, as described in Section 1.4. (B) Theory-experiment comparison of predicted fold-change empirical cumulative distribution functions (ECDF). Each panel shows two example concentrations of inducer (colored curves) with their corresponding theoretical predictions (dashed lines). Distributions were normalized to the mean expression value of the unregulated strain in order to compare theoretical predictions in discrete protein counts with experimental fluorescence measurements in arbitrary units.

All theoretical predictions in Fig. 5(A) lie systematically above the experimental data. Although our theoretical predictions in Fig. 5(A) do not numerically match the experimental inference of the channel capacity, the model does capture interesting qualitative features of the data that are worth highlighting. At one extreme, for cells with no transcription factors, there is no information processing potential, as this simple genetic circuit would be constitutively expressed regardless of the environmental state. As cells increase the transcription factor copy number, the channel capacity increases until it reaches a maximum before falling back down at high repressor copy number, since the promoter would then be permanently repressed. The steepness of the increase in channel capacity, as well as the height of the maximum, is highly dependent on the repressor-DNA affinity. For strong binding sites (blue curve in Fig. 5(A)) there is a rapid increase in the channel capacity, but the maximum value reached is smaller than for a weaker binding site (orange curve in Fig. 5(A)). In Appendix S9 we show, using the small noise approximation [9, 17], that if the systematic deviation of our predictions for the cell-to-cell variability were explained by a multiplicative constant, i.e. if all noise predictions could be corrected by multiplying them by a single constant, we would expect the channel capacity to be off by a constant additive factor. This factor of ≈ 0.43 bits can recover the agreement between the model and the experimental data.
Figure 5. Comparison of theoretical and experimental channel capacity. (A) Channel capacity as inferred using the Blahut-Arimoto algorithm [44] for varying numbers of repressors and repressor-DNA affinities. All inferences were performed using 12 IPTG concentrations, as detailed in the Methods. Curves represent zero-parameter fit predictions made with the maximum entropy distributions shown in Fig. 4. Points represent inferences made from single-cell fluorescence distributions (see Appendix S8 for further details). Theoretical curves were smoothed with a Gaussian kernel to remove numerical precision errors. (B) Example input-output functions in opposite limits of channel capacity. The lower panel illustrates that zero channel capacity results when all distributions overlap. The upper panel illustrates that as the channel capacity increases, so does the separation between distributions. Arrows point to the corresponding channel capacity computed from the predicted distributions.
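As an illustration of the numerical approach described above, the following is a minimal sketch of the Blahut-Arimoto iteration for a discrete channel, written for a generic conditional distribution P(p | c). The function and array names, tolerances, and the toy binary-symmetric-channel example are our own choices, not the paper's code.

```python
import numpy as np

def blahut_arimoto(P_out_given_in, tol=1e-9, max_iter=10_000):
    """Channel capacity (in bits) of a discrete channel given P(p|c).

    P_out_given_in[i, j] = probability of output j given input i.
    Returns (capacity, optimal input distribution).
    """
    Q = np.asarray(P_out_given_in, dtype=float)
    n_in = Q.shape[0]
    r = np.full(n_in, 1.0 / n_in)                # start from a uniform input distribution
    for _ in range(max_iter):
        q = r @ Q                                # current output marginal P(p)
        # D[i] = exp of the KL divergence (in nats) between row i and the output marginal
        with np.errstate(divide="ignore", invalid="ignore"):
            logratio = np.where(Q > 0, np.log(Q / q), 0.0)
        D = np.exp(np.sum(Q * logratio, axis=1))
        r_new = r * D                            # Blahut-Arimoto multiplicative update
        r_new /= r_new.sum()
        if np.max(np.abs(r_new - r)) < tol:
            r = r_new
            break
        r = r_new
    capacity = np.log2(np.sum(r * D))            # capacity at convergence, converted to bits
    return capacity, r

# Binary symmetric channel with 10% crossover: C = 1 - H(0.1) ≈ 0.531 bits.
C, r_opt = blahut_arimoto([[0.9, 0.1], [0.1, 0.9]])
print(round(C, 3))
```

For the experimental inference, the analogous input would be the set of discrete IPTG concentrations and the conditional distributions would be the measured P(p | c); the update above monotonically approaches the optimal input distribution for any such discrete channel.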
Building on Shannon's formulation of information theory, there have been significant efforts to use this theoretical framework to understand the information processing capabilities of biological systems, and the evolutionary consequences for organisms harboring signal transduction systems [1, 6, 9, 45–47]. Recently, with the mechanistic dissection of molecular signaling pathways, significant progress has been made on the question of the physical limits of cellular detection and on the role that features such as feedback loops play in this task [7, 14, 16, 48, 49]. But the field still lacks a rigorous experimental test of these ideas with precision measurements on a system that is tractable both experimentally and theoretically.

In this paper we take advantage of recent progress in the quantitative modeling of input-output functions of genetic circuits to build a minimal model of the simple repression motif [28]. By combining a series of studies on this circuit spanning diverse experimental methods for measuring gene expression under a myriad of different conditions, we possess complete a priori parametric knowledge, allowing us to generate parameter-free predictions for processes related to information processing. Some of the model parameters for our kinetic formulation of the input-output function are informed by inferences made from equilibrium models. We use the fact that if both kinetic and thermodynamic languages describe the same system, their predictions must be self-consistent. In other words, if the equilibrium model can only make statements about the mean mRNA and mean protein copy number because of the way these models are constructed, those predictions must be equivalent to what the kinetic model has to say about these same quantities. This condition therefore constrains the values that the kinetic rates in the model can take.
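As a hedged illustration of such a self-consistency constraint (the rate symbols $k_r^+$, $k_r^-$ and the non-specific background $N_{NS}$ are generic two-state notation assumed here, not necessarily the paper's exact parameterization): for a promoter that switches between a repressor-bound state and a transcriptionally active state, matching the kinetic steady-state fold-change to its thermodynamic counterpart gives

$$
\text{fold-change} = \frac{k_r^-}{k_r^- + k_r^+}
= \left(1 + \frac{R}{N_{NS}}\, e^{-\beta\Delta\varepsilon_r}\right)^{-1}
\quad\Longrightarrow\quad
\frac{k_r^+}{k_r^-} = \frac{R}{N_{NS}}\, e^{-\beta\Delta\varepsilon_r},
$$

so the ratio of the repressor binding and unbinding rates is pinned by the equilibrium binding energy and the repressor copy number, leaving only the overall timescale to be fixed by kinetic measurements.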
To test whether the equilibrium picture can reproduce the predictions made by the kinetic model, we compare the experimental and theoretical fold-change in protein copy number for a suite of biophysical parameters and environmental conditions (Fig. 3(C), upper row). The agreement between theory and experiment demonstrates that these two frameworks can indeed make consistent predictions.

The kinetic treatment of the system brings increased predictive power compared to the equilibrium picture. Under the kinetic formulation, the predictions are not limited to the mean but extend to all of the moments of the mRNA and protein distributions. We first test these novel predictions by comparing the noise in protein copy number (standard deviation over mean) with experimental data. Our minimal model predicts the noise up to a systematic deviation; the physical or biological origins of this discrepancy remain an open question. In that way, the work presented here exposes the status quo of our understanding of gene regulation in bacteria, posing new questions to be answered with future refinements of the model. We then extend our analysis to infer entire protein distributions at different input signal concentrations by using the maximum entropy principle. In other words, we compute moments of the protein distribution, and then use these moments to build an approximation to the full distribution. These predicted distributions are then compared with experimental single-cell distributions, as shown in Fig. 4(B) and Appendix S6. Here again, although our minimal model systematically underestimates the width of the distributions, it informs how changes in parameters such as protein copy number or protein-DNA binding affinity will affect the full probabilistic input-output function of the genetic circuit, up to a multiplicative constant.
We then use our model to predict the information processing capacity. By maximizing the mutual information between the input signal concentration and the output protein distribution over all possible input distributions, we predict the channel capacity of the system over a suite of biophysical parameters such as the repressor protein copy number and the repressor-DNA binding affinity. Although there is no reason to assume that the simplified synthetic circuit we used as an experimental model operates optimally for the given distribution of inputs, the relevance of the channel capacity comes from its interpretation as a metric of the physical limit on how precise an inference cells can make about the state of the environment. Our model, despite the systematic deviations, makes non-trivial predictions, such as the existence of an optimal repressor copy number for a given repressor-DNA binding energy, predicting the channel capacity up to an additive constant (see Fig. 5). The origin of this optimal combination of repressor copy number and binding energy differs from previous publications, in which an extra term associated with the cost of producing protein was included in the model [16]. Here, the optimal parameter combination is a direct consequence of the fact that the LacI repressor cannot be fully deactivated [27]. This implies that as the number of repressors increases, a significant number of them are still able to bind to the promoter even at saturating concentrations of inducer. This causes all of the input-output functions to be shifted towards low expression levels, regardless of the inducer concentration, decreasing the amount of information that the circuit is able to process.

We consider it important to highlight the limitations of the work presented here.
The previously discussed systematic deviation in the noise and skewness of the predicted distributions (see Appendix S5), and therefore in the predicted distributions and channel capacity, remains an unresolved question that deserves to be addressed in further iterations of our minimal model. Also, as first reported in [27], our model fails to capture the steepness of the fold-change induction curve for the weakest repressor binding site (see Fig. 3(B)). Furthermore, the minimal model in Fig. 2(A), despite being widely used, is an oversimplification of the physical picture of how the transcriptional machinery works. The coarse-graining of all of the kinetic steps involved in transcription initiation into two effective promoter states, active and inactive, ignores potential kinetic regulatory mechanisms of intermediate states [50]. Moreover, it has been argued that, although the mRNA count distribution does not follow a Poisson distribution, this effect could be caused by unknown factors not at the level of transcriptional regulation [51].

The findings of this work open the opportunity to rigorously test intriguing ideas that connect Shannon's metric of how accurately a signaling system can infer the state of the environment with Darwinian fitness [6]. Beautiful work along these lines has been done in the context of the developmental program of the early
Drosophila embryo [9, 11]. These studies demonstrated that the input-output function of the pair-rule genes works at channel capacity, suggesting that selection has acted on these signaling pathways, pushing them to operate at the limit of what the physics of these systems allows. Our system differs from the early embryo in the sense that we have a tunable circuit with variable amounts of information processing capability. Furthermore, compared with the fly embryo, in which the organism tunes both the input and output distributions over evolutionary time, we have experimental control of the distribution of inputs that the cells are exposed to. Consequently, instead of seeing only the final result of the evolutionary process, we would be able to set different environmental challenges and track the evolution of the population over time. These experiments could shed light on the suggestive hypothesis of information bits as a trait on which natural selection acts. We see this exciting direction as part of the overall effort in quantitative biology of predicting evolution [52].
E. coli strains
All strains used in this study were originally made for [27]. We chose a subset of three repressor copy numbers that span two orders of magnitude. We refer the reader to [27] for details on the construction of these strains. Briefly, the strains have a construct consisting of the lacUV5 promoter and one of three possible binding sites for the lac repressor (O1, O2, and O3) controlling the expression of a YFP reporter gene. This construct is integrated into the genome at the galK locus. The number of repressors per cell is varied by changing the ribosomal binding site controlling the translation of the lac repressor gene. The repressor constructs were integrated at the ybcN locus. Finally, all strains used in this work constitutively express an mCherry reporter from a low copy number plasmid. This serves as a volume marker that facilitates the segmentation of cells when processing microscopy images.
For all experiments, cultures were initiated from a 50% glycerol frozen stock kept at -80 °C. Three strains - autofluorescence (auto), ∆lacI (∆), and a strain with a known binding site and repressor copy number (R) - were inoculated into individual tubes with 2 mL of Lysogeny Broth (LB Miller Powder, BD Medical) with 20 µg/mL of chloramphenicol and 30 µg/mL of kanamycin. These cultures were grown overnight at 37 °C with rapid agitation to reach saturation. The saturated cultures were diluted 1:1000 into 500 µL of M9 minimal media (M9 5X Salts, Sigma-Aldrich M6030; 2 mM magnesium sulfate, Mallinckrodt Chemicals 6066-04; 100 mM calcium chloride, Fisher Chemicals C79-500) supplemented with 0.5% (w/v) glucose in a 2 mL 96-deep-well plate. The R strain was diluted into 12 different wells with minimal media, each with a different IPTG concentration (0 µM, 0.1 µM, 5 µM, 10 µM, 25 µM, 50 µM, 75 µM, 100 µM, 250 µM, 500 µM, 1000 µM, 5000 µM), while the auto and ∆ strains were diluted into two wells each (0 µM and 5000 µM). Each of the IPTG concentrations came from a single preparation stock kept in 100-fold concentrated aliquots. The 96-well plate was then incubated at 37 °C with rapid agitation for 8 hours before imaging.
The microscopy pipeline used for this work followed exactly the steps of [27]. Briefly, twelve 2% agarose (Life Technologies UltraPure Agarose, Cat. No. 16500100) gels were made with M9 media (or PBS buffer) containing the corresponding IPTG concentration (see growth conditions) and placed between two glass coverslips to solidify after microwaving. After the 8 hour incubation in minimal media, 1 µL of a 1:10 dilution of the cultures into fresh media or PBS buffer was placed onto small squares (roughly 10 mm ×
10 mm) of the different agarose gels. A total of 16 agarose squares - 12 concentrations of IPTG for the R strain, 2 concentrations for the ∆ strain, and 2 for the auto strain - were mounted onto a single glass-bottom dish (Ted Pella Wilco Dish, Cat. No. 14027-20) that was sealed with parafilm.

All imaging was done on an inverted fluorescence microscope (Nikon Ti-Eclipse) with a custom-built laser illumination system. The YFP fluorescence (quantitative reporter) was imaged with a CrystaLaser 514 nm excitation laser coupled with a laser-optimized emission filter (Semrock Cat. No. LF514-C-000). All strains, including the auto strain, included a constitutively expressed mCherry protein to aid segmentation. Therefore, for each image three channels (YFP, mCherry, ...) were acquired. On average, 30 images with roughly 20 cells per condition were taken. In each imaging session, 25 images of a fluorescent slide and 25 images of the camera background noise were taken in order to flatten the illumination. The image processing pipeline for this work is exactly the same as in [27].

All data and custom scripts were collected and stored using Git version control. Code for raw data processing, theoretical analysis, and figure generation is available on the GitHub repository (https://github.com/RPGroup-PBoC/chann_cap). The code can also be accessed via the paper website. Raw microscopy data are stored on the CaltechDATA data repository and can be accessed via DOI https://doi.org/10.22002/d1.1184. Bootstrap estimates of the experimental channel capacity are also available on the CaltechDATA data repository via https://doi.org/10.22002/D1.1185.

We would like to thank Nathan Belliveau, Michael Betancourt, William Bialek, Justin Bois, Emanuel Flores, Hernan Garcia, Alejandro Granados, Porfirio Quintero, Catherine Triandafillou, and Ned Wingreen for useful advice and discussion. We would especially like to thank Alvaro Sanchez, Gasper Tkacik, and Jane Kondev for critical observations on the manuscript.
We thank Rob Brewster for providing the raw mRNA FISH data for inferences, and David Drabold for advice on the maximum entropy inferences. We are grateful to Heun Jin Lee for his key support with the quantitative microscopy. This work was supported by La Fondation Pierre-Gilles de Gennes, the Rosen Center at Caltech, and NIH 1R35 GM118043 (MIRA). M.R.M. was supported by the Caldwell CEMI fellowship.
References

I. Nemenman, "Information theory and adaptation", in Quantitative biology: from molecular to cellular systems, Vol. 30322 (Taylor and Francis, 2010), pp. 1–12.
A. Eldar and M. B. Elowitz, "Functional roles for noise in genetic circuits", Nature, 167–173 (2010).
C. E. Shannon, "A Mathematical Theory of Communication", Bell System Technical Journal, 379–423 (1948).
D. J. MacKay, Information theory, inference and learning algorithms (Cambridge University Press, 2003).
J. B. Kinney, A. Murugan, C. G. Callan, and E. C. Cox, "Using deep sequencing to characterize the biophysical mechanism of a transcriptional regulatory sequence", PNAS, 9158–9163 (2010).
S. F. Taylor, N. Tishby, and W. Bialek, "Information and fitness", ArXiv (2007).
W. Bialek and S. Setayeshgar, "Physical limits to biochemical signaling", PNAS, 10040–10045 (2005).
T. Gregor, D. W. Tank, E. F. Wieschaus, and W. Bialek, "Probing the Limits to Positional Information", Cell, 153–164 (2007).
G. Tkačik, C. G. Callan, and W. Bialek, "Information flow and optimization in transcriptional regulation", PNAS, 12265–12270 (2008).
J. O. Dubuis, G. Tkačik, E. F. Wieschaus, T. Gregor, and W. Bialek, "Positional information, in bits", PNAS, 16301–16308 (2013).
M. D. Petkova, G. Tkačik, W. Bialek, E. F. Wieschaus, and T. Gregor, "Optimal Decoding of Cellular Identities in a Genetic Network", Cell, 844–855.e15 (2019).
G. Rieckh and G. Tkačik, "Noise and Information Transmission in Promoters with Multiple Internal States", Biophysical Journal, 1194–1204 (2014).
E. Ziv, I. Nemenman, and C. H. Wiggins, "Optimal Signal Processing in Small Stochastic Biochemical Networks", PLoS ONE, edited by G. Stolovitzky, e1077 (2007).
M. Voliotis, R. M. Perrett, C. McWilliams, C. A. McArdle, and C. G. Bowsher, "Information transfer by leaky, heterogeneous, protein kinase signaling systems", PNAS, E326–E333 (2014).
F. Tostevin and P. R. ten Wolde, "Mutual Information between Input and Output Trajectories of Biochemical Networks", Physical Review Letters, 218101 (2009).
G. Tkačik and A. M. Walczak, "Information transmission in genetic regulatory networks: a review", Journal of Physics: Condensed Matter, 153102 (2011).
G. Tkačik, C. G. Callan, and W. Bialek, "Information capacity of genetic regulatory elements", Physical Review E, 011910 (2008).
O. P. Tabbaa and C. Jayaprakash, "Mutual information and the fidelity of response of gene regulatory models", Physical Biology, 046004 (2014).
I. Golding, J. Paulsson, S. M. Zawilski, and E. C. Cox, "Real-Time Kinetics of Gene Activity in Individual Bacteria", Cell, 1025–1036 (2005).
H. G. Garcia and R. Phillips, "Quantitative dissection of the simple repression input-output function", PNAS, 12173–12178 (2011).
J. M. G. Vilar and L. Saiz, "Reliable Prediction of Complex Phenotypes from a Modular Design in Free Energy Space: An Extensive Exploration of the lac Operon", ACS Synthetic Biology, 576–586 (2013).
H. Xu, L. A. Sepúlveda, L. Figard, A. M. Sokac, and I. Golding, "Combining protein and mRNA quantification to decipher transcriptional regulation", Nature Methods, 739–742 (2015).
R. C. Brewster, D. L. Jones, and R. Phillips, "Tuning Promoter Strength through RNA Polymerase Binding Site Design in Escherichia coli", PLoS Computational Biology, edited by E. van Nimwegen, e1002811 (2012).
S. L. Barnes, N. M. Belliveau, W. T. Ireland, J. B. Kinney, and R. Phillips, "Mapping DNA sequence to transcription factor binding energy in vivo", PLoS Computational Biology, edited by G. D. Stormo, e1006226 (2019).
D. L. Jones, R. C. Brewster, and R. Phillips, "Promoter architecture dictates cell-to-cell variability in gene expression", Science, 1533–1536 (2014).
R. C. Brewster, F. M. Weinert, H. G. Garcia, D. Song, M. Rydenfelt, and R. Phillips, "The Transcription Factor Titration Effect Dictates Level of Gene Expression", Cell, 1312–1323 (2014).
M. Razo-Mejia, S. L. Barnes, N. M. Belliveau, G. Chure, T. Einav, M. Lewis, and R. Phillips, "Tuning Transcriptional Regulation through Signaling: A Predictive Theory of Allosteric Induction", Cell Systems, 456–469.e10 (2018).
R. Phillips, N. M. Belliveau, G. Chure, H. G. Garcia, M. Razo-Mejia, and C. Scholes, "Figure 1 Theory Meets Figure 2 Experiments in the Study of Gene Expression", Annual Review of Biophysics, 121–163 (2019).
M. Rydenfelt, H. G. Garcia, R. S. Cox, and R. Phillips, "The Influence of Promoter Architectures and Regulatory Motifs on Gene Expression in Escherichia coli", PLoS ONE, edited by J. Garcia-Ojalvo, e114347 (2014).
A. Sanchez, S. Choubey, and J. Kondev, "Stochastic models of transcription: From single molecules to single cells", Methods, 13–25 (2013).
D. F. Browning and S. J. W. Busby, "The regulation of bacterial transcription initiation", Nature Reviews Microbiology, 57–65 (2004).
M. R. Maurizi, "Proteases and protein degradation in Escherichia coli", Experientia, 178–201 (1992).
N. E. Buchler, U. Gerland, and T. Hwa, "On schemes of combinatorial transcription logic", PNAS, 5136–5141 (2003).
J. Peccoud and B. Ycart, "Markovian Modeling of Gene-Product Synthesis", Theoretical Population Biology, 222–234 (1995).
Q. Cui and M. Karplus, "Allostery and cooperativity revisited", Protein Science, 1295–1307 (2008).
R. Phillips, "Napoleon Is in Equilibrium", Annual Review of Condensed Matter Physics, 85–111 (2015).
H. Bremer and P. P. Dennis, "Modulation of Chemical Composition and Other Parameters of the Cell by Growth Rate".
J. R. Peterson, J. A. Cole, J. Fei, T. Ha, and Z. A. Luthey-Schulten, "Effects of DNA replication on mRNA noise", PNAS, 15886–15891 (2015).
H. Dong and C. Kurland, "Ribosome Mutants with Altered Accuracy Translate with Reduced Processivity", Journal of Molecular Biology, 551–561 (1995).
E. O. Powell, "Growth Rate and Generation Time of Bacteria, with Special Reference to Continuous Culture", Journal of General Microbiology, 492–511 (1956).
R. Grah, B. Zoller, and G. Tkačik, "Normative models of enhancer function", bioRxiv (2020).
E. T. Jaynes, "Information Theory and Statistical Mechanics", Physical Review, 620–630 (1957).
C. G. Bowsher and P. S. Swain, "Environmental sensing, information transfer, and cellular decision-making", Current Opinion in Biotechnology, 149–155 (2014).
R. Blahut, "Computation of channel capacity and rate-distortion functions", IEEE Transactions on Information Theory, 460–473 (1972).
C. Bergstrom and M. Lachmann, "Shannon information and biological fitness", in Information Theory Workshop, IEEE, 2004 (2004), pp. 50–54.
D. Polani, "Information: Currency of life?", HFSP Journal, 307–316 (2009).
O. Rivoire and S. Leibler, "The Value of Information for Populations in Varying Environments", Journal of Statistical Physics, 1124–1166 (2011).
E. Libby, T. J. Perkins, and P. S. Swain, "Noisy information processing through transcriptional regulation", PNAS, 7151–7156 (2007).
A. Rhee, R. Cheong, and A. Levchenko, "The application of information theory to biochemical signaling systems", Physical Biology, 045011 (2012).
C. Scholes, A. H. DePace, and Á. Sánchez, "Combinatorial Gene Regulation through Kinetic Control of the Transcription Cycle", Cell Systems, 97–108.e9 (2017).
S. Choubey, J. Kondev, and A. Sanchez, "Distribution of Initiation Times Reveals Mechanisms of Transcriptional Regulation in Single Cells", Biophysical Journal, 2072–2082 (2018).
M. Lässig, V. Mustonen, and A. M. Walczak, "Predicting evolution", Nature Ecology & Evolution, 0077 (2017).

Supplemental Information for: First-principles prediction of the information processing capacity of a simple genetic circuit
Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
Department of Physics, W. M. Keck Science Department
Department of Physics, California Institute of Technology, Pasadena, CA 91125, USA
Correspondence: [email protected]
Contents

S1   Three-state promoter model for simple repression
S2   Parameter inference
     S2.1  Unregulated promoter rates
     S2.2  Accounting for variability in the number of promoters
     S2.3  Repressor rates from three-state regulated promoter
S3   Computing moments from the master equation
     S3.1  Computing moments of a distribution
     S3.2  Moment closure of the simple-repression distribution
     S3.3  Computing single promoter steady-state moments
S4   Accounting for the variability in gene copy number during the cell cycle
     S4.1  Numerical integration of moment equations
     S4.2  Exponentially distributed ages
     S4.3  Reproducing the equilibrium picture
     S4.4  Comparison between single- and multi-promoter kinetic model
     S4.5  Comparison with experimental data
S5   Accounting for the variability in gene copy number during the cell cycle
     S5.1  Numerical integration of moment equations
     S5.2  Exponentially distributed ages
     S5.3  Reproducing the equilibrium picture
     S5.4  Comparison between single- and multi-promoter kinetic model
     S5.5  Comparison with experimental data
S6   Maximum entropy approximation of distributions
     S6.1  The MaxEnt principle
     S6.2  The Bretthorst rescaling algorithm
     S6.3  Predicting distributions for simple repression constructs
     S6.4  Comparison with experimental data
S7   Gillespie simulation of master equation
     S7.1  mRNA distribution with Gillespie simulations
     S7.2  Protein distribution with Gillespie simulations
S8   Computational determination of the channel capacity
     S8.1  Blahut's algorithm
     S8.2  Channel capacity from arbitrary units of fluorescence
     S8.3  Assumptions involved in the computation of the channel capacity
S9   Empirical fits to noise predictions
     S9.1  Multiplicative factor for the noise
     S9.2  Additive factor for the noise
     S9.3  Correction factor for channel capacity with multiplicative factor
S10  Derivation of the cell age distribution
S1 Three-state promoter model for simple repression
In order to tackle the question of how much information the simple repression motif can process, we require the joint probability distribution of mRNA and protein, $P(m, p; t)$. To obtain this distribution we use the chemical master equation formalism as described in Section 1.1. Specifically, we assume a three-state model in which the promoter can be found (1) in a transcriptionally active state ($A$ state), (2) in a transcriptionally inactive state without the repressor bound ($I$ state), and (3) with the repressor bound ($R$ state) (see Fig. 2(A)). These three states generate a system of coupled differential equations for the three state distributions $P_A(m,p)$, $P_I(m,p)$ and $P_R(m,p)$. Given the rates shown in Fig. 2(A), let us define the system of ODEs. For the transcriptionally active state we have

$$
\begin{aligned}
\frac{dP_A(m,p)}{dt} ={}& \overbrace{-k^{(p)}_{\text{off}} P_A(m,p)}^{A \rightarrow I} + \overbrace{k^{(p)}_{\text{on}} P_I(m,p)}^{I \rightarrow A} \\
&+ \overbrace{r_m P_A(m-1,p)}^{m-1 \rightarrow m} - \overbrace{r_m P_A(m,p)}^{m \rightarrow m+1}
+ \overbrace{\gamma_m (m+1) P_A(m+1,p)}^{m+1 \rightarrow m} - \overbrace{\gamma_m m P_A(m,p)}^{m \rightarrow m-1} \\
&+ \overbrace{r_p m P_A(m,p-1)}^{p-1 \rightarrow p} - \overbrace{r_p m P_A(m,p)}^{p \rightarrow p+1}
+ \overbrace{\gamma_p (p+1) P_A(m,p+1)}^{p+1 \rightarrow p} - \overbrace{\gamma_p p P_A(m,p)}^{p \rightarrow p-1}.
\end{aligned}
\tag{S1}
$$

For the inactive promoter state $I$ we have

$$
\begin{aligned}
\frac{dP_I(m,p)}{dt} ={}& \overbrace{k^{(p)}_{\text{off}} P_A(m,p)}^{A \rightarrow I} - \overbrace{k^{(p)}_{\text{on}} P_I(m,p)}^{I \rightarrow A}
+ \overbrace{k^{(r)}_{\text{off}} P_R(m,p)}^{R \rightarrow I} - \overbrace{k^{(r)}_{\text{on}} P_I(m,p)}^{I \rightarrow R} \\
&+ \overbrace{\gamma_m (m+1) P_I(m+1,p)}^{m+1 \rightarrow m} - \overbrace{\gamma_m m P_I(m,p)}^{m \rightarrow m-1} \\
&+ \overbrace{r_p m P_I(m,p-1)}^{p-1 \rightarrow p} - \overbrace{r_p m P_I(m,p)}^{p \rightarrow p+1}
+ \overbrace{\gamma_p (p+1) P_I(m,p+1)}^{p+1 \rightarrow p} - \overbrace{\gamma_p p P_I(m,p)}^{p \rightarrow p-1}.
\end{aligned}
\tag{S2}
$$

And finally, for the repressor-bound state $R$ we have

$$
\begin{aligned}
\frac{dP_R(m,p)}{dt} ={}& \overbrace{-k^{(r)}_{\text{off}} P_R(m,p)}^{R \rightarrow I} + \overbrace{k^{(r)}_{\text{on}} P_I(m,p)}^{I \rightarrow R} \\
&+ \overbrace{\gamma_m (m+1) P_R(m+1,p)}^{m+1 \rightarrow m} - \overbrace{\gamma_m m P_R(m,p)}^{m \rightarrow m-1} \\
&+ \overbrace{r_p m P_R(m,p-1)}^{p-1 \rightarrow p} - \overbrace{r_p m P_R(m,p)}^{p \rightarrow p+1}
+ \overbrace{\gamma_p (p+1) P_R(m,p+1)}^{p+1 \rightarrow p} - \overbrace{\gamma_p p P_R(m,p)}^{p \rightarrow p-1}.
\end{aligned}
\tag{S3}
$$

For an unregulated promoter, i.e., a promoter in a cell that has no repressors present and therefore constitutively expresses the gene, we use a two-state model in which the state $R$ is not allowed. All terms in the system of ODEs containing $k^{(r)}_{\text{on}}$ or $k^{(r)}_{\text{off}}$ are then set to zero.

As detailed in Section 1.1, it is convenient to express this system using matrix notation [30]. For this we define $\mathbf{P}(m,p) = \left(P_A(m,p),\, P_I(m,p),\, P_R(m,p)\right)^T$.
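Before recasting this system in matrix notation, Eqs. S1–S3 can be checked numerically. The sketch below integrates the mRNA-only marginal of the three-state system (dropping the protein dimension) on a truncated state space with forward Euler; all rates are illustrative placeholders rather than the inferred values. Probability should remain normalized, and the steady-state mean should match $\langle m \rangle = (r_m/\gamma_m)\,p_A$, with $p_A$ the steady-state probability of the active state.

```python
# Forward-Euler integration of the mRNA-only three-state (A/I/R) master
# equation on a truncated state space. All rates are illustrative
# placeholders (in units of the mRNA degradation rate), not inferred values.
kon_p, koff_p = 1.0, 2.0   # promoter switching I <-> A
kon_r, koff_r = 1.0, 3.0   # repressor binding I <-> R
rm, gm = 10.0, 1.0         # mRNA production (A state only) and degradation
M, dt, T = 40, 5e-3, 20.0  # mRNA truncation, time step, total time

P = [[0.0] * (M + 1) for _ in range(3)]  # rows: A, I, R
P[1][0] = 1.0                            # start in state (I, m = 0)

def deriv(P):
    dP = [[0.0] * (M + 1) for _ in range(3)]
    for m in range(M + 1):
        A, I, R = P[0][m], P[1][m], P[2][m]
        # promoter state transitions (Eqs. S1-S3, first lines)
        dP[0][m] += -koff_p * A + kon_p * I
        dP[1][m] += koff_p * A - (kon_p + kon_r) * I + koff_r * R
        dP[2][m] += kon_r * I - koff_r * R
        for s in range(3):  # degradation m -> m-1 at rate gm * m
            dP[s][m] -= gm * m * P[s][m]
            if m < M:
                dP[s][m] += gm * (m + 1) * P[s][m + 1]
        if m < M:           # production m -> m+1, active state only
            dP[0][m] -= rm * P[0][m]
            dP[0][m + 1] += rm * P[0][m]
    return dP

for _ in range(int(T / dt)):
    d = deriv(P)
    for s in range(3):
        for m in range(M + 1):
            P[s][m] += dt * d[s][m]

total = sum(sum(row) for row in P)
mean = sum(m * (P[0][m] + P[1][m] + P[2][m]) for m in range(M + 1))
# steady-state active-state probability from the promoter chain A <-> I <-> R
pA = kon_p * koff_r / (koff_p * koff_r + kon_p * koff_r + koff_p * kon_r)
print(total, mean, rm / gm * pA)  # total ~ 1, mean ~ 30/11 ~ 2.73
```

The Euler iteration's fixed point coincides with the steady state of the truncated master equation, so the computed mean approaches the analytical value as the truncation grows.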
Then the system of ODEs can be expressed as

$$
\frac{d\mathbf{P}(m,p)}{dt} = \mathbf{K}\mathbf{P}(m,p)
- \mathbf{R}_m\mathbf{P}(m,p) + \mathbf{R}_m\mathbf{P}(m-1,p)
- m\boldsymbol{\Gamma}_m\mathbf{P}(m,p) + (m+1)\boldsymbol{\Gamma}_m\mathbf{P}(m+1,p)
- m\mathbf{R}_p\mathbf{P}(m,p) + m\mathbf{R}_p\mathbf{P}(m,p-1)
- p\boldsymbol{\Gamma}_p\mathbf{P}(m,p) + (p+1)\boldsymbol{\Gamma}_p\mathbf{P}(m,p+1),
\tag{S4}
$$

where we defined the matrix representing the promoter state transitions, $\mathbf{K}$,

$$
\mathbf{K} \equiv \begin{pmatrix}
-k^{(p)}_{\text{off}} & k^{(p)}_{\text{on}} & 0 \\
k^{(p)}_{\text{off}} & -k^{(p)}_{\text{on}} - k^{(r)}_{\text{on}} & k^{(r)}_{\text{off}} \\
0 & k^{(r)}_{\text{on}} & -k^{(r)}_{\text{off}}
\end{pmatrix},
\tag{S5}
$$

and the mRNA production, $\mathbf{R}_m$, and degradation, $\boldsymbol{\Gamma}_m$, matrices as

$$
\mathbf{R}_m \equiv \begin{pmatrix}
r_m & 0 & 0 \\
0 & 0 & 0 \\
0 & 0 & 0
\end{pmatrix},
\tag{S6}
$$

and

$$
\boldsymbol{\Gamma}_m \equiv \begin{pmatrix}
\gamma_m & 0 & 0 \\
0 & \gamma_m & 0 \\
0 & 0 & \gamma_m
\end{pmatrix}.
\tag{S7}
$$

For the protein we also define production, $\mathbf{R}_p$, and degradation, $\boldsymbol{\Gamma}_p$, matrices as

$$
\mathbf{R}_p \equiv \begin{pmatrix}
r_p & 0 & 0 \\
0 & r_p & 0 \\
0 & 0 & r_p
\end{pmatrix},
\tag{S8}
$$

and

$$
\boldsymbol{\Gamma}_p \equiv \begin{pmatrix}
\gamma_p & 0 & 0 \\
0 & \gamma_p & 0 \\
0 & 0 & \gamma_p
\end{pmatrix}.
\tag{S9}
$$

The corresponding equation for the unregulated two-state promoter takes the exact same form, with the matrices defined following the same scheme without the third row and third column, and with $k^{(r)}_{\text{on}}$ and $k^{(r)}_{\text{off}}$ set to zero.

A closed-form solution for this master equation might not even exist; the approximate solution of chemical master equations of this kind is an active area of research. As we will see in Appendix S2, the two-state promoter master equation has been solved analytically for the mRNA [34] and protein [53] distributions. For our purposes, in Appendix S6 we will detail how to use the maximum entropy principle to approximate the full distribution for the two- and three-state promoter.

S2 Parameter inference

(Note: the Python code used for the calculations presented in this section can be found in the following link as an annotated Jupyter notebook.)

With the objective of generating falsifiable predictions with meaningful parameters, we infer the kinetic rates of this three-state promoter model using different data sets generated in our lab over the last decade, concerning different aspects of the regulation of the simple repression motif. For example, for the unregulated promoter transition rates $k^{(p)}_{\text{on}}$ and $k^{(p)}_{\text{off}}$ and the mRNA production rate $r_m$, we use single-molecule mRNA FISH counts from an unregulated promoter [25]. Once these parameters are fixed, we use their values to constrain the repressor rates $k^{(r)}_{\text{on}}$ and $k^{(r)}_{\text{off}}$. These repressor rates are obtained using information from mean gene expression measurements from bulk LacZ colorimetric assays [20]. We also expand our model to include the allosteric nature of the repressor protein, taking advantage of video microscopy measurements done in the context of multiple promoter copies [26] and flow-cytometry measurements of the mean response of the system to different levels of induction [27].
In the remainder of this section we detail the steps taken to infer the parameter values. At each step, the values of the parameters inferred in previous steps constrain the values of the parameters not yet determined, building in this way a self-consistent model informed by work that spans several experimental techniques.

S2.1 Unregulated promoter rates
We begin our parameter inference problem with the promoter on and off rates $k^{(p)}_{\text{on}}$ and $k^{(p)}_{\text{off}}$, as well as the mRNA production rate $r_m$. In this case there are only two states available to the promoter: the inactive state $I$ and the transcriptionally active state $A$. The third ODE, for $P_R(m,p)$, is therefore removed from the system. The steady-state mRNA distribution for this particular two-state promoter model was solved analytically by Peccoud and Ycart [34]. This distribution, $P(m) \equiv P_I(m) + P_A(m)$, is of the form

$$
P(m \mid k^{(p)}_{\text{on}}, k^{(p)}_{\text{off}}, r_m, \gamma_m) =
\frac{\Gamma\!\left(\frac{k^{(p)}_{\text{on}}}{\gamma_m} + m\right)}
     {\Gamma(m+1)\,\Gamma\!\left(\frac{k^{(p)}_{\text{off}} + k^{(p)}_{\text{on}}}{\gamma_m} + m\right)}
\frac{\Gamma\!\left(\frac{k^{(p)}_{\text{off}} + k^{(p)}_{\text{on}}}{\gamma_m}\right)}
     {\Gamma\!\left(\frac{k^{(p)}_{\text{on}}}{\gamma_m}\right)}
\left(\frac{r_m}{\gamma_m}\right)^{m}
{}_1F_1\!\left(\frac{k^{(p)}_{\text{on}}}{\gamma_m} + m,\;
               \frac{k^{(p)}_{\text{off}} + k^{(p)}_{\text{on}}}{\gamma_m} + m,\;
               -\frac{r_m}{\gamma_m}\right),
\tag{S10}
$$

where $\Gamma(\cdot)$ is the gamma function and ${}_1F_1$ is the confluent hypergeometric function of the first kind. This rather complicated expression will aid us in finding parameter values for the rates. The inferred rates $k^{(p)}_{\text{on}}$, $k^{(p)}_{\text{off}}$ and $r_m$ will be expressed in units of the mRNA degradation rate $\gamma_m$. This is because the model in Eq. S10 is homogeneous in time: dividing all rates by a constant is equivalent to multiplying the characteristic time scale by the same constant. As we will discuss in the next section, Eq. S10 is degenerate in the parameter values. What this means is that a change in one of the parameters, specifically $r_m$, can be compensated by a change in another parameter, specifically $k^{(p)}_{\text{off}}$, to obtain the exact same distribution. To work around this intrinsic limitation of the model, we will include in our inference prior information from what we know from equilibrium-based models.

S2.1.1 Bayesian parameter inference of RNAP rates
In order to make progress at inferring the unregulated promoter state transition rates, we make use of the single-molecule mRNA FISH data from Jones et al. [25]. Fig. S1 shows the distribution of mRNA per cell for the lacUV5 promoter used for our inference. This promoter, being very strong, has a mean copy number of $\langle m \rangle \approx 18$ mRNA/cell.

Figure S1. lacUV5 mRNA per cell distribution. Data from [25] of the unregulated lacUV5 promoter as inferred from single-molecule mRNA FISH.
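Eq. S10, which serves as the likelihood in the inference below, can be evaluated stably in log space. The following sketch (with illustrative rates in units of $\gamma_m$, not the inferred values) uses the Kummer transform ${}_1F_1(a; b; -z) = e^{-z}\,{}_1F_1(b-a; b; z)$ to avoid catastrophic cancellation, and checks normalization and the known mean $\langle m \rangle = r_m\, k^{(p)}_{\text{on}}/(k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}})$:

```python
from math import lgamma, log, exp

def hyp1f1_pos(a, b, z, tol=1e-15, kmax=1000):
    """Kummer series for 1F1(a; b; z) with z >= 0 (all terms positive)."""
    term, total = 1.0, 1.0
    for k in range(kmax):
        term *= (a + k) * z / ((b + k) * (k + 1))
        total += term
        if term < tol * total:
            break
    return total

def p_mrna(m, kon, koff, rm):
    """Peccoud-Ycart steady-state mRNA distribution (Eq. S10), rates in
    units of gamma_m, via 1F1(a; b; -z) = exp(-z) 1F1(b - a; b; z)."""
    logpref = (lgamma(kon + m) - lgamma(m + 1) - lgamma(kon + koff + m)
               + lgamma(kon + koff) - lgamma(kon) + m * log(rm) - rm)
    return exp(logpref) * hyp1f1_pos(koff, kon + koff + m, rm)

# illustrative (not inferred) rates; check normalization and the mean
kon, koff, rm = 2.0, 5.0, 20.0
p = [p_mrna(m, kon, koff, rm) for m in range(200)]
print(sum(p))                                # ~ 1.0
print(sum(m * q for m, q in enumerate(p)))   # ~ rm * kon/(kon + koff) = 40/7
```

The analytical mean used in the check is the same expression that appears below when matching the kinetic and thermodynamic pictures.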
Having this data in hand, we now turn to Bayesian parameter inference. Writing Bayes' theorem we have

$$
P(k^{(p)}_{\text{on}}, k^{(p)}_{\text{off}}, r_m \mid D) =
\frac{P(D \mid k^{(p)}_{\text{on}}, k^{(p)}_{\text{off}}, r_m)\,
      P(k^{(p)}_{\text{on}}, k^{(p)}_{\text{off}}, r_m)}{P(D)},
\tag{S11}
$$

where $D$ represents the data. In this case the data consist of single-cell mRNA counts $D = \{m_1, m_2, \ldots, m_N\}$, where $N$ is the number of cells. We assume that each cell's measurement is independent of the others, such that we can rewrite Eq. S11 as

$$
P(k^{(p)}_{\text{on}}, k^{(p)}_{\text{off}}, r_m \mid \{m_i\}) \propto
\left[\prod_{i=1}^{N} P(m_i \mid k^{(p)}_{\text{on}}, k^{(p)}_{\text{off}}, r_m)\right]
P(k^{(p)}_{\text{on}}, k^{(p)}_{\text{off}}, r_m),
\tag{S12}
$$

where we ignore the normalization constant $P(D)$. The likelihood term $P(m_i \mid k^{(p)}_{\text{on}}, k^{(p)}_{\text{off}}, r_m)$ is exactly given by Eq. S10 with $\gamma_m = 1$. Given that we have this functional form for the distribution, we can use Markov chain Monte Carlo (MCMC) sampling to explore the 3D parameter space in order to fit Eq. S10 to the mRNA FISH data.

S2.1.2 Constraining the rates given prior thermodynamic knowledge
One of the strengths of the Bayesian approach is that we can include all the prior knowledge on the parameters when performing an inference [4]. Basic features, such as the fact that the rates must be strictly positive, constrain the values these parameters can take. For the specific rates analyzed in this section we know more than this simple non-negativity constraint. The expression of an unregulated promoter has been studied from a thermodynamic perspective [23]. Given the underlying assumptions of these equilibrium models, in which the probability of finding the RNAP bound to the promoter is proportional to the transcription rate [54], they can only make statements about the mean expression level. Nevertheless, if both the thermodynamic and the kinetic model describe the same process, the predictions for the mean gene expression level must agree. That means we can use what we know about the mean gene expression, and how it relates to parameters such as molecule copy numbers and binding affinities, to constrain the values that the rates in question can take.

In the case of this two-state promoter it can be shown that the mean number of mRNA is given by [30] (see Appendix S3 for the moment computation)

$$
\langle m \rangle = \frac{r_m}{\gamma_m}
\frac{k^{(p)}_{\text{on}}}{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}}.
\tag{S13}
$$

Another way of expressing this is as $\langle m \rangle = \frac{r_m}{\gamma_m} \times p^{(p)}_{\text{active}}$, where $p^{(p)}_{\text{active}}$ is the probability of the promoter being in the transcriptionally active state. The thermodynamic picture has an equivalent result, where the mean number of mRNA is given by [23, 54]

$$
\langle m \rangle = \frac{r_m}{\gamma_m}
\frac{\frac{P}{N_{NS}} e^{-\beta\Delta\varepsilon_p}}
     {1 + \frac{P}{N_{NS}} e^{-\beta\Delta\varepsilon_p}},
\tag{S14}
$$

where $P$ is the number of RNAP per cell, $N_{NS}$ is the number of non-specific binding sites, $\Delta\varepsilon_p$ is the RNAP binding energy in $k_BT$ units, and $\beta \equiv (k_BT)^{-1}$. Using Eq. S13 and Eq. S14 we can easily see that if these frameworks are to be equivalent, then it must be true that

$$
\frac{k^{(p)}_{\text{on}}}{k^{(p)}_{\text{off}}} = \frac{P}{N_{NS}} e^{-\beta\Delta\varepsilon_p},
\tag{S15}
$$

or equivalently

$$
\ln\left(\frac{k^{(p)}_{\text{on}}}{k^{(p)}_{\text{off}}}\right) =
-\beta\Delta\varepsilon_p + \ln P - \ln N_{NS}.
\tag{S16}
$$

To put numerical values into these variables we can use information from the literature: the RNAP copy number is of order $P \sim 10^3$ per cell, $N_{NS} = 4.6 \times 10^6$ bp [54], and $-\beta\Delta\varepsilon_p \approx 5{-}7$. With these values we define a Gaussian prior for the log ratio of the rates,

$$
P\!\left(\ln\frac{k^{(p)}_{\text{on}}}{k^{(p)}_{\text{off}}}\right) \propto
\exp\left[-\frac{\left(\ln\frac{k^{(p)}_{\text{on}}}{k^{(p)}_{\text{off}}}
- \left(-\beta\Delta\varepsilon_p + \ln P - \ln N_{NS}\right)\right)^2}{2\sigma^2}\right],
\tag{S17}
$$

where $\sigma^2$ is the variance that accounts for the uncertainty in these parameters. We include this prior as part of the prior term $P(k^{(p)}_{\text{on}}, k^{(p)}_{\text{off}}, r_m)$ of Eq. S12. We then use MCMC to sample out of the posterior distribution given by Eq. S12. Fig. S2 shows the MCMC samples of the posterior distribution. For the $k^{(p)}_{\text{on}}$ parameter there is a single symmetric peak, while $k^{(p)}_{\text{off}}$ and $r_m$ have rather long tails towards large values. In fact, the 2D projection of $k^{(p)}_{\text{off}}$ vs. $r_m$ shows that the model is sloppy, meaning that the two parameters are highly correlated. This feature is a common problem for many non-linear systems used in biophysics and systems biology [56]. What this implies is that we can change the value of $k^{(p)}_{\text{off}}$ and then compensate with a change in $r_m$ in order to maintain the shape of the mRNA distribution. It is therefore impossible, from the data and the model alone, to narrow down a single value for the parameters. Nevertheless, since we included the prior information on the rates given by the analogous form of the equilibrium and non-equilibrium expressions for the mean mRNA level, we obtain more constrained values for the RNAP rates and the transcription rate, which we take as the peak of this long-tailed distribution.
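To put numbers on this prior, the center of the Gaussian in Eq. S17 can be evaluated at the ends of the literature ranges quoted above. The endpoint pairs used below ($P = 1000$ with $-\beta\Delta\varepsilon_p = 5$, and $P = 3000$ with $-\beta\Delta\varepsilon_p = 7$) are illustrative assumptions spanning those ranges:

```python
import math

# Center of the prior in Eq. S17: -beta*d_eps_p + ln(P) - ln(N_NS).
# The P and -beta*d_eps_p endpoints are illustrative assumptions.
N_NS = 4.6e6  # non-specific binding sites (E. coli genome size, bp)
for P, minus_beta_eps in [(1000, 5), (3000, 7)]:
    center = minus_beta_eps + math.log(P) - math.log(N_NS)
    print(f"P={P}, -beta*d_eps={minus_beta_eps}: "
          f"ln(kon/koff) ~ {center:.2f}, kon/koff ~ {math.exp(center):.3f}")
```

Both endpoints give $k^{(p)}_{\text{on}}/k^{(p)}_{\text{off}}$ well below one, i.e., the prior favors a promoter that spends most of its time in the inactive state.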
Figure S2. MCMC posterior distribution. Sampling out of Eq. S12, the plot shows 2D and 1D projections of the 3D parameter space. The parameter values (in units of the mRNA degradation rate $\gamma_m$) are $k^{(p)}_{\text{on}} \approx 4$, $k^{(p)}_{\text{off}} \approx 18$ and $r_m \approx 103$, the modes of their respective distributions; the superscripts and subscripts reported in the figure represent the upper and lower bounds of the 95th percentile of the parameter value distributions.

The inferred values of $k^{(p)}_{\text{on}}$, $k^{(p)}_{\text{off}}$ and $r_m$ are given in units of the mRNA degradation rate $\gamma_m$. Given the asymmetry of the parameter distributions, we report the upper and lower bounds of the 95th percentile of the posterior distributions. Assuming a mean mRNA lifetime on the order of a few minutes fixes $\gamma_m$ in absolute units; using this value, the inferred rates can be converted to $\text{s}^{-1}$.

Fig. S3 compares the experimental data from Fig. S1 with the resulting distribution obtained by substituting the most likely parameter values into Eq. S10. As we can see, this two-state model fits the data adequately.

S2.2 Accounting for variability in the number of promoters
As discussed in ref. [25] and further expanded upon in [38], an important source of cell-to-cell variability in gene expression in bacteria is the fact that, depending on the growth rate and the position relative to the chromosome replication origin, cells can have multiple copies of any given gene. Genes closer to the replication origin have on average a higher gene copy number compared to genes at the opposite end. For the locus at which our reporter construct is located (galK) and the doubling time of the mRNA FISH experiments, cells spend part of the cell cycle with two copies of the gene, which we assume are transcribed at a total rate $2r_m$ compared to the rate $r_m$ for a single promoter copy. The probability of observing a certain mRNA copy number $m$ is therefore given by

$$
P(m) = P(m \mid \text{one promoter}) \cdot P(\text{one promoter})
     + P(m \mid \text{two promoters}) \cdot P(\text{two promoters}).
\tag{S18}
$$

Both terms $P(m \mid \text{promoter copy})$ are given by Eq. S10, with the only difference being the transcription rate ($r_m$ vs. $2r_m$).

Figure S3. Experimental vs. theoretical distribution of mRNA per cell using parameters from Bayesian inference. The dotted line shows the result of using Eq. S10 along with the parameters inferred for the rates. Blue bars are the same data as Fig. S1, obtained from [25].

It is important to acknowledge that Eq. S18 assumes that once the gene is replicated, the time scale on which the mRNA count relaxes to the new steady state is much shorter than the time that cells spend in this two-promoter-copy state. This approximation should be valid for a short-lived mRNA molecule, but the assumption is not applicable for proteins, whose degradation rate is comparable to the cell cycle length, as explored in Section 1.4.

In order to repeat the Bayesian inference including this variability in gene copy number, we must split the mRNA count data into two sets: cells with a single copy of the promoter and cells with two copies of the promoter. For the single-molecule mRNA FISH data there is no labeling of the locus, making it impossible to determine the number of promoter copies for any given cell. We therefore follow Jones et al. [25] in using the cell area as a proxy for the stage of the cell cycle. In their approach, they sorted cells by area, considering cells below the 33rd percentile to have a single promoter copy and the rest to have two copies. This approach ignores the fact that cells are not uniformly distributed along the cell cycle. As first derived in [40], populations of cells in log phase are exponentially distributed along the cell cycle. This distribution is of the form

$$
P(a) = (\ln 2)\, 2^{1-a},
\tag{S19}
$$

where $a \in [0, 1]$ is the stage of the cell cycle, with $a = 0$ being the start of the cycle and $a = 1$ being cell division (see Appendix S10 for a derivation of Eq. S19). Fig. S4 shows the separation of the two groups based on area, where Eq. S19 was used to weight the distribution along the cell cycle.

A subtle but important consequence of Eq. S19 is that computing any quantity for a single cell is not equivalent to computing the same quantity for a population of cells. For example, let us assume that we want to compute the mean mRNA copy number $\langle m \rangle$. For a single cell this would be of the form

$$
\langle m \rangle_{\text{cell}} = \langle m \rangle_1 \cdot f + \langle m \rangle_2 \cdot (1 - f),
\tag{S20}
$$

where $\langle m \rangle_i$ is the mean mRNA copy number with $i$ promoter copies in the cell, and $f$ is the fraction of the cell cycle that cells spend with a single copy of the promoter. For a single cell, the probability of having a single promoter copy is equivalent to this fraction $f$. But Eq. S19 tells us that if we sample unsynchronized cells, we are not sampling uniformly across the cell cycle. Therefore for a population of cells the mean mRNA is given by

$$
\langle m \rangle_{\text{population}} = \langle m \rangle_1 \cdot \phi + \langle m \rangle_2 \cdot (1 - \phi),
\tag{S21}
$$

where the probability $\phi$ of sampling a cell with one promoter is given by

$$
\phi = \int_0^f P(a)\, da.
\tag{S22}
$$
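The integral in Eq. S22, with the age distribution of Eq. S19, has a simple closed form, $\phi = 2 - 2^{1-f}$. Note that $\phi > f$, since young (single-promoter) cells are over-represented in an asynchronous population. A minimal check, using $f = 1/3$ to mirror the area-percentile threshold described above:

```python
import math

def phi_closed(f):
    """phi = integral_0^f ln(2) 2^(1-a) da = 2 - 2^(1-f) (from Eq. S19)."""
    return 2.0 - 2.0 ** (1.0 - f)

def phi_numeric(f, n=10000):
    """Midpoint-rule check of the same integral."""
    da = f / n
    return sum(math.log(2.0) * 2.0 ** (1.0 - (i + 0.5) * da) * da
               for i in range(n))

f = 1.0 / 3.0
print(phi_closed(f), phi_numeric(f))  # both ~ 0.413 > f ~ 0.333
```

The closed form makes it easy to propagate the threshold choice directly into the population average of Eq. S21.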
Figure S4. Separation of cells based on cell size. Using the area as a proxy for position in the cell cycle, cells can be sorted into two groups: small cells (with one promoter copy) and large cells (with two promoter copies). The vertical black line delimits the threshold that divides both groups, as weighted by Eq. S19.

In Eq. S22, $P(a)$ is given by Eq. S147. What this equation computes is the probability of sampling a cell during a stage of the cell cycle $a < f$, in which the reporter gene has not yet been replicated. Fig. S5 shows the distribution of both groups. As expected, larger cells have a higher mRNA copy number on average.

Figure S5. mRNA distribution for small and large cells. (A) Histogram and (B) cumulative distribution function of the small and large cells as determined in Fig. S4. The triangles above the histograms in (A) indicate the mean mRNA copy number for each group.
We modify Eq. S12 to account for the two separate groups of cells. Let $N_s$ be the number of cells in the small size group and $N_l$ the number of cells in the large size group. Then the posterior distribution for the parameters is of the form

$$
P(k^{(p)}_{\text{on}}, k^{(p)}_{\text{off}}, r_m \mid \{m_i\}) \propto
\left[\prod_{i=1}^{N_s} P(m_i \mid k^{(p)}_{\text{on}}, k^{(p)}_{\text{off}}, r_m)\right]
\left[\prod_{j=1}^{N_l} P(m_j \mid k^{(p)}_{\text{on}}, k^{(p)}_{\text{off}}, 2r_m)\right]
P(k^{(p)}_{\text{on}}, k^{(p)}_{\text{off}}, r_m),
\tag{S23}
$$

where we split the product between small and large cells; note that the likelihood for large cells uses the transcription rate $2r_m$.

For the two-promoter model, the prior shown in Eq. S17 requires a small modification. Eq. S21 gives the mean mRNA copy number of a population of asynchronous cells growing at steady state. Given that we assume the only difference between having one vs. two promoter copies is the change in transcription rate from $r_m$ to $2r_m$, we can write Eq. S21 as

$$
\langle m \rangle = \phi \cdot \frac{r_m}{\gamma_m}
\frac{k^{(p)}_{\text{on}}}{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}}
+ (1 - \phi) \cdot \frac{2r_m}{\gamma_m}
\frac{k^{(p)}_{\text{on}}}{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}}.
\tag{S24}
$$

This can be simplified to

$$
\langle m \rangle = (2 - \phi)\, \frac{r_m}{\gamma_m}
\frac{k^{(p)}_{\text{on}}}{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}}.
\tag{S25}
$$

Equating Eq. S25 and Eq. S14, to again require self-consistent predictions of the mean mRNA from the equilibrium and kinetic models, gives

$$
(2 - \phi)\, \frac{k^{(p)}_{\text{on}}}{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}} =
\frac{\frac{P}{N_{NS}} e^{-\beta\Delta\varepsilon_p}}
     {1 + \frac{P}{N_{NS}} e^{-\beta\Delta\varepsilon_p}}.
\tag{S26}
$$

Solving for $k^{(p)}_{\text{on}}/k^{(p)}_{\text{off}}$ results in

$$
\frac{k^{(p)}_{\text{on}}}{k^{(p)}_{\text{off}}} = \frac{\rho}{(1 + \rho)(2 - \phi) - \rho},
\tag{S27}
$$

where we define $\rho \equiv \frac{P}{N_{NS}} e^{-\beta\Delta\varepsilon_p}$. To simplify things further, we notice that for the specified values of $P \sim 10^3$ RNAP per cell, $N_{NS} = 4.6 \times 10^6$ bp, and $-\beta\Delta\varepsilon_p = 5{-}7$, we can safely assume that $\rho \ll 1$. This simplifying assumption has been previously called the weak promoter approximation [20]. Given this, we can simplify Eq. S27 as

$$
\frac{k^{(p)}_{\text{on}}}{k^{(p)}_{\text{off}}} = \frac{1}{2 - \phi}\, \frac{P}{N_{NS}} e^{-\beta\Delta\varepsilon_p}.
\tag{S28}
$$

Taking the log of both sides gives

$$
\ln\left(\frac{k^{(p)}_{\text{on}}}{k^{(p)}_{\text{off}}}\right) =
-\ln(2 - \phi) + \ln P - \ln N_{NS} - \beta\Delta\varepsilon_p.
\tag{S29}
$$

With this we can set, as before, a Gaussian prior to constrain the ratio of the RNAP rates,

$$
P\!\left(\ln\frac{k^{(p)}_{\text{on}}}{k^{(p)}_{\text{off}}}\right) \propto
\exp\left[-\frac{\left(\ln\frac{k^{(p)}_{\text{on}}}{k^{(p)}_{\text{off}}}
- \left(-\ln(2 - \phi) - \beta\Delta\varepsilon_p + \ln P - \ln N_{NS}\right)\right)^2}{2\sigma^2}\right].
\tag{S30}
$$

Fig. S6 shows the result of sampling out of Eq. S23. Again we see that the model is highly sloppy, with large credible regions obtained for $k^{(p)}_{\text{off}}$ and $r_m$. Nevertheless, the use of the prior information again allows us to obtain parameter values consistent with the equilibrium picture.

Figure S6. MCMC posterior distribution for a multi-promoter model.
Sampling out of Eq. S23, the plot shows 2D and 1D projections of the 3D parameter space. The parameter values (in units of the mRNA degradation rate $\gamma_m$) are $k^{(p)}_{\text{on}} \approx 6$, $k^{(p)}_{\text{off}} \approx 132$ and $r_m \approx 257$, the modes of their respective distributions; the superscripts and subscripts reported in the figure represent the upper and lower bounds of the 95th percentile of the parameter value distributions. The sampling was bounded to values below an upper cutoff.

Using again the mRNA degradation rate $\gamma_m$ to set the time scale, these rates can be converted to absolute units of $\text{s}^{-1}$. Fig. S7 shows the result of applying Eq. S18 using these parameter values. Specifically, Fig. S7(A) shows the global distribution, including cells with one and two promoters, and Fig. S7(B) splits the distributions within the two populations. Given that the model adequately describes both populations, independently and pooled together, we confirm that using the cell area as a proxy for the stage of the cell cycle, and the doubling of the transcription rate once cells have two promoters, are reasonable approximations.

It is hard to make comparisons with literature-reported values because these kinetic rates are effective parameters hiding much of the complexity of transcription initiation [31]. The non-identifiability of the parameters also restricts an explicit comparison of the actual numerical values of the inferred rates. Nevertheless, from the model we can see that the mean burst size for each transcription event is given by $r_m / k^{(p)}_{\text{off}}$. From our inferred values ($\approx 257/132$) we obtain a mean burst size of $\approx 1.9$ mRNA per burst, similar to values previously reported for E. coli [57].
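Before moving on, the algebra leading from Eq. S26 to Eq. S27, and its weak-promoter limit in Eq. S28, can be verified numerically with arbitrary values of $\rho$ and $\phi$:

```python
# Verify Eq. S27 against Eq. S26, and its weak-promoter limit (Eq. S28).
rho, phi = 0.05, 0.4                       # arbitrary test values
x = rho / ((1 + rho) * (2 - phi) - rho)    # Eq. S27: x = kon/koff
lhs = (2 - phi) * x / (1 + x)              # left side of Eq. S26
rhs = rho / (1 + rho)                      # right side of Eq. S26
x_weak = rho / (2 - phi)                   # Eq. S28 approximation
print(lhs - rhs)     # ~ 0
print(x, x_weak)     # close, since rho << 1
```

The exact and approximate ratios agree to within a few percent at $\rho = 0.05$, and the agreement improves as $\rho$ shrinks.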
S2.3 Repressor rates from three-state regulated promoter.
Having determined the unregulated promoter transition rates, we now proceed to determine the repressor rates $k^{(r)}_{\text{on}}$ and $k^{(r)}_{\text{off}}$. The values of these rates are constrained, again, by the correspondence between our kinetic picture and what we know from equilibrium models [36]. For this analysis we again exploit the feature that, at the level of the mean, both the kinetic language and the thermodynamic language should make equivalent predictions. Over the last decade there has been a great effort to develop equilibrium models for gene expression regulation [33, 54, 58]. In particular, our group has extensively characterized the simple repression motif using this formalism [20, 26, 27].

Figure S7. Experimental vs. theoretical distribution of mRNA per cell using parameters for the multi-promoter model. (A) The solid line shows the result of using Eq. S18 with the parameters inferred by sampling Eq. S23. Blue bars are the same data as Fig. S1, from [25]. (B) Split distributions of small cells (light blue bars) and large cells (dark blue bars), with the corresponding theoretical predictions with transcription rate $r_m$ (light blue line) and transcription rate $2r_m$ (dark blue line).

The dialogue between theory and experiments has led to simplified expressions that capture the phenomenology of the gene expression response as a function of natural variables, such as molecule counts and the affinities between molecular players. A particularly useful quantity for the simple repression motif, used by Garcia & Phillips [20], is the fold-change in gene expression, defined as

$$
\text{fold-change} = \frac{\langle \text{gene expression}(R \neq 0) \rangle}
                          {\langle \text{gene expression}(R = 0) \rangle},
\tag{S31}
$$

where $R$ is the number of repressors per cell and $\langle \cdot \rangle$ denotes the population average. The fold-change is simply the mean expression level in the presence of the repressor relative to the mean expression level in the absence of regulation. In the language of statistical mechanics this quantity takes the form

$$
\text{fold-change} = \left(1 + \frac{R}{N_{NS}} e^{-\beta\Delta\varepsilon_r}\right)^{-1},
\tag{S32}
$$

where $\Delta\varepsilon_r$ is the repressor-DNA binding energy and, as before, $N_{NS}$ is the number of non-specific binding sites where the repressor can bind [20].

To compute the fold-change in the chemical master equation language, we compute the first moment of the steady-state mRNA distribution $\langle m \rangle$ for both the three-state promoter ($R \neq 0$) and the two-state promoter ($R = 0$) (see Appendix S3 for the moment derivation).
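The equilibrium fold-change of Eq. S32 is straightforward to evaluate; the sketch below uses the operator binding energies listed in Table S1 and an arbitrary repressor copy number for illustration:

```python
import math

def fold_change(R, d_eps_r, N_NS=4.6e6):
    """Equilibrium fold-change, Eq. S32; d_eps_r in units of kBT."""
    return 1.0 / (1.0 + (R / N_NS) * math.exp(-d_eps_r))

# binding energies from Table S1; R = 260 repressors is an arbitrary example
for op, d_eps in [("O1", -15.3), ("O2", -13.9), ("O3", -9.7)]:
    print(op, fold_change(260, d_eps))
```

Stronger (more negative) binding energies give a lower fold-change, and fold-change approaches 1 as $R \to 0$.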
The unregulated (two-state) promoter mean mRNA copy number is given by Eq. S25. For the regulated (three-state) promoter we have an equivalent expression of the form
\[
\langle m(R \neq 0)\rangle = (2 - \phi)\, \frac{r_m}{\gamma_m}\,
\frac{k^{(r)}_{\text{off}} k^{(p)}_{\text{on}}}
{k^{(p)}_{\text{off}} k^{(r)}_{\text{off}} + k^{(p)}_{\text{off}} k^{(r)}_{\text{on}} + k^{(r)}_{\text{off}} k^{(p)}_{\text{on}}}. \tag{S33}
\]
Computing the fold-change then gives
\[
\text{fold-change} = \frac{\langle m(R \neq 0)\rangle}{\langle m(R = 0)\rangle}
= \frac{k^{(r)}_{\text{off}} \left(k^{(p)}_{\text{off}} + k^{(p)}_{\text{on}}\right)}
{k^{(p)}_{\text{off}} k^{(r)}_{\text{on}} + k^{(r)}_{\text{off}} \left(k^{(p)}_{\text{off}} + k^{(p)}_{\text{on}}\right)}, \tag{S34}
\]
where the factor $(2 - \phi)$ due to the multiple promoter copies, the transcription rate $r_m$, and the mRNA degradation rate $\gamma_m$ all cancel out.

Table S1. Binding sites and corresponding parameters.

Operator | $\Delta\varepsilon_r$ ($k_B T$) | $k^{(r)}_{\text{off}}$ (s$^{-1}$)
---------|-------------------------------|----------------------------------
O1       | $-15.3$                       | 0.002
O2       | $-13.9$                       | 0.008
O3       | $-9.7$                        | 0.55

Given that the number of repressors per cell $R$ is an experimental variable that we can control, we assume that the rate at which the promoter transitions from the transcriptionally inactive state to the repressor-bound state, $k^{(r)}_{\text{on}}$, is given by the concentration of repressors $[R]$ times a diffusion-limited on rate $k_o$. For the diffusion-limited constant $k_o$ we use the value used by Jones et al. [25]. With this in hand we can rewrite Eq. S34 as
\[
\text{fold-change} = \left(1 + \frac{k_o [R]}{k^{(r)}_{\text{off}}}
\frac{k^{(p)}_{\text{off}}}{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}}\right)^{-1}. \tag{S35}
\]
We note that both Eq. S32 and Eq. S35 have the same functional form. Therefore, if both languages are to predict the same mean gene expression level, it must be true that
\[
\frac{k_o [R]}{k^{(r)}_{\text{off}}}
\frac{k^{(p)}_{\text{off}}}{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}}
= \frac{R}{N_{NS}} e^{-\beta\Delta\varepsilon_r}. \tag{S36}
\]
Solving for $k^{(r)}_{\text{off}}$ gives
\[
k^{(r)}_{\text{off}} = \frac{k_o [R]\, N_{NS}\, e^{\beta\Delta\varepsilon_r}}{R}
\frac{k^{(p)}_{\text{off}}}{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}}. \tag{S37}
\]
Since the reported value of $k_o$ is given in units of nM$^{-1}$ s$^{-1}$, in order for the units to cancel properly the repressor concentration has to be given in nM rather than as an absolute count.
We therefore consider the repressor concentration
\[
[R] = \frac{R}{V_{\text{cell}} \cdot N_A}, \tag{S38}
\]
where $R$ is the absolute repressor copy number per cell, $V_{\text{cell}}$ is the cell volume, and $N_A$ is Avogadro's number. The E. coli cell volume is 2.1 fL [59], and Avogadro's number is $6.022 \times 10^{23}$ mol$^{-1}$. If we further include the conversion factor to turn M into nM we find that
\[
[R] = \frac{R}{2.1 \times 10^{-15}\ \text{L} \cdot 6.022 \times 10^{23}\ \text{mol}^{-1}}
\cdot \frac{1 \times 10^{9}\ \text{nmol}}{1\ \text{mol}}
\approx 0.8 \times R. \tag{S39}
\]
Using this we can simplify Eq. S37 as
\[
k^{(r)}_{\text{off}} \approx 0.8 \cdot k_o \cdot N_{NS}\, e^{\beta\Delta\varepsilon_r}
\cdot \frac{k^{(p)}_{\text{off}}}{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}}. \tag{S40}
\]
What Eq. S40 shows is the direct relationship that must be satisfied if the equilibrium model is to be consistent with the non-equilibrium kinetic picture. Table S1 summarizes the values obtained for the three operator sequences used throughout this work. To compute these numbers the number of non-specific binding sites $N_{NS}$ was taken to be $4.6 \times 10^6$ bp, i.e. the size of the E. coli K12 genome.
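As a sanity check, Eq. S40 can be evaluated directly. The snippet below is a minimal sketch: the nM-per-molecule conversion factor and $N_{NS}$ come from the text above, while the diffusion-limited on rate `k_o` and the promoter rates `kp_on`, `kp_off` are illustrative placeholders, not the values inferred in this work. Only the dependence on $\Delta\varepsilon_r$ is meaningful here; note that the ratio of off rates between two operators is just the exponential of their binding-energy difference.

```python
import math

# Sketch of Eq. S40. Rate values below are placeholders for illustration,
# NOT the values used in the paper; only the structure of the formula and
# the unit conversion of Eq. S39 come from the text.
V_CELL = 2.1e-15        # cell volume in L [59]
N_AVOGADRO = 6.022e23   # molecules per mol
# Eq. S39: one molecule per cell corresponds to ~0.8 nM
NM_PER_MOLECULE = 1.0 / (V_CELL * N_AVOGADRO) * 1e9

N_NS = 4.6e6            # non-specific binding sites (E. coli K12 genome, bp)

def k_r_off(delta_eps_r, k_o, kp_on, kp_off):
    """Repressor off rate (1/s) from Eq. S40; energy in units of kBT."""
    return (NM_PER_MOLECULE * k_o * N_NS * math.exp(delta_eps_r)
            * kp_off / (kp_on + kp_off))

# Hypothetical diffusion-limited on rate and promoter switching rates:
k_o = 2.7e-3            # nM^-1 s^-1 (placeholder)
kp_on, kp_off = 5.5, 28.9  # s^-1 (placeholders)

for operator, eps in [("O1", -15.3), ("O2", -13.9), ("O3", -9.7)]:
    print(operator, k_r_off(eps, k_o, kp_on, kp_off))
```

Because the promoter-rate prefactor is the same for all operators, the predicted ratio $k^{(r)}_{\text{off}}(\text{O2})/k^{(r)}_{\text{off}}(\text{O1}) = e^{1.4} \approx 4$ matches the ratio $0.008/0.002$ in Table S1 regardless of the placeholder values.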
In-vivo measurements of the Lac repressor off rate have been performed with single-molecule resolution [60]. The authors report a mean residence time of $\approx 5$ minutes on the operator, which corresponds to $k^{(r)}_{\text{off}} \approx 0.003$ s$^{-1}$, very similar to the value we inferred from our model. In the same reference the authors determined that on average the repressor takes $\approx 30$ minutes to find and bind the operator. These numbers allow us to estimate the probability that the operator is not occupied by the repressor, $P_{\text{not}R}$. This is computed as
\[
P_{\text{not}R} = \frac{\tau_{\text{not}R}}{\tau_{\text{not}R} + \tau_R}, \tag{S41}
\]
where $\tau_{\text{not}R}$ is the average time that the operator is not occupied by the repressor and $\tau_R$ is the average time that the repressor spends bound to the operator. Substituting the numbers from [60] gives $P_{\text{not}R} \approx 0.86$. Within our framework the equivalent quantity is obtained from the zeroth moment $\langle m^0 p^0 \rangle$ for each of the three promoter states, since this moment is equivalent to the probability of being in each of the promoter states. Upon substitution of our inferred rate parameters we can compute
\[
P_{\text{not}R} = 1 - P_R, \tag{S42}
\]
where $P_R$ is the probability of the promoter being bound by the repressor. The value we obtain is within a factor of two of the one reported in [60].

S3 Computing moments from the master equation
In this section we will compute the moment equations for the distribution $P(m, p)$. Without loss of generality we will focus here on the three-state regulated promoter. The computation of the moments for the two-state promoter follows exactly the same procedure, changing only the definition of the matrices in the master equation.

S3.1 Computing moments of a distribution

(Note: The Python code used for the calculations presented in this section can be found in the following link as an annotated Jupyter notebook)

To compute any moment of our chemical master equation (Eq. 6) let us define a vector
\[
\langle \mathbf{m}^x \mathbf{p}^y \rangle \equiv
\left(\langle m^x p^y\rangle_A,\; \langle m^x p^y\rangle_I,\; \langle m^x p^y\rangle_R\right)^T, \tag{S43}
\]
where $\langle m^x p^y\rangle_S$ is the expected value of $m^x p^y$ in state $S \in \{A, I, R\}$ with $x, y \in \mathbb{N}$. In other words, just as we defined the vector $\mathbf{P}(m, p)$, here we define a vector to collect the expected value over each of the promoter states. By definition, these moments $\langle m^x p^y\rangle_S$ are computed as
\[
\langle m^x p^y \rangle_S \equiv \sum_{m=0}^{\infty} \sum_{p=0}^{\infty} m^x p^y P_S(m, p). \tag{S44}
\]
To simplify the notation, let $\sum_x \equiv \sum_{x=0}^{\infty}$. Since we are working with a system of three ODEs, one for each state, let us define the following operation:
\[
\langle \mathbf{m}^x \mathbf{p}^y \rangle = \sum_m \sum_p m^x p^y \mathbf{P}(m, p) \equiv
\begin{pmatrix}
\sum_m \sum_p m^x p^y P_A(m, p)\\[2pt]
\sum_m \sum_p m^x p^y P_I(m, p)\\[2pt]
\sum_m \sum_p m^x p^y P_R(m, p)
\end{pmatrix}. \tag{S45}
\]
With this in hand we can apply this sum over $m$ and $p$ to Eq. 6. For the left-hand side we have
\[
\sum_m \sum_p m^x p^y \frac{d\mathbf{P}(m, p)}{dt}
= \frac{d}{dt}\left[\sum_m \sum_p m^x p^y \mathbf{P}(m, p)\right], \tag{S46}
\]
where we made use of the linearity of the derivative to switch the order of the sum and the derivative. Notice that the right-hand side of Eq. S46 contains the definition of a moment from Eq. S44.
That means that we can rewrite it as
\[
\frac{d}{dt}\left[\sum_m \sum_p m^x p^y \mathbf{P}(m, p)\right]
= \frac{d\langle \mathbf{m}^x \mathbf{p}^y\rangle}{dt}. \tag{S47}
\]
Distributing the sum on the right-hand side of Eq. 6 gives
\[
\begin{aligned}
\frac{d\langle \mathbf{m}^x \mathbf{p}^y\rangle}{dt} ={}&
\mathbf{K} \sum_m \sum_p m^x p^y \mathbf{P}(m, p)\\
&- \mathbf{R}_m \sum_m \sum_p m^x p^y \mathbf{P}(m, p)
 + \mathbf{R}_m \sum_m \sum_p m^x p^y \mathbf{P}(m-1, p)\\
&- \mathbf{\Gamma}_m \sum_m \sum_p m\, m^x p^y \mathbf{P}(m, p)
 + \mathbf{\Gamma}_m \sum_m \sum_p (m+1)\, m^x p^y \mathbf{P}(m+1, p)\\
&- \mathbf{R}_p \sum_m \sum_p m\, m^x p^y \mathbf{P}(m, p)
 + \mathbf{R}_p \sum_m \sum_p m\, m^x p^y \mathbf{P}(m, p-1)\\
&- \mathbf{\Gamma}_p \sum_m \sum_p p\, m^x p^y \mathbf{P}(m, p)
 + \mathbf{\Gamma}_p \sum_m \sum_p (p+1)\, m^x p^y \mathbf{P}(m, p+1).
\end{aligned} \tag{S48}
\]
Let's look at each term on the right-hand side individually. For the terms in Eq. S48 involving $\mathbf{P}(m, p)$ we can again use Eq. S44 to rewrite them in a more compact form. This means that we can rewrite the state transition term as
\[
\mathbf{K} \sum_m \sum_p m^x p^y \mathbf{P}(m, p) = \mathbf{K} \langle \mathbf{m}^x \mathbf{p}^y\rangle. \tag{S49}
\]
The mRNA production term involving $\mathbf{P}(m, p)$ can be rewritten as
\[
\mathbf{R}_m \sum_m \sum_p m^x p^y \mathbf{P}(m, p) = \mathbf{R}_m \langle \mathbf{m}^x \mathbf{p}^y\rangle. \tag{S50}
\]
In the same way the mRNA degradation term gives
\[
\mathbf{\Gamma}_m \sum_m \sum_p m\, m^x p^y \mathbf{P}(m, p) = \mathbf{\Gamma}_m \langle \mathbf{m}^{x+1} \mathbf{p}^y\rangle. \tag{S51}
\]
For the protein production and degradation terms involving $\mathbf{P}(m, p)$ we have
\[
\mathbf{R}_p \sum_m \sum_p m\, m^x p^y \mathbf{P}(m, p) = \mathbf{R}_p \langle \mathbf{m}^{x+1} \mathbf{p}^y\rangle, \tag{S52}
\]
and
\[
\mathbf{\Gamma}_p \sum_m \sum_p p\, m^x p^y \mathbf{P}(m, p) = \mathbf{\Gamma}_p \langle \mathbf{m}^x \mathbf{p}^{y+1}\rangle, \tag{S53}
\]
respectively.

For the terms in Eq. S48 involving $\mathbf{P}(m \pm 1, p)$ or $\mathbf{P}(m, p \pm 1)$ we can reindex the sums to work around this mismatch. To be more specific, let's again look at each term case by case. For the mRNA production term involving $\mathbf{P}(m-1, p)$ we define $m' \equiv m - 1$. Using this we write
\[
\mathbf{R}_m \sum_m \sum_p m^x p^y \mathbf{P}(m-1, p)
= \mathbf{R}_m \sum_{m'=-1}^{\infty} \sum_p (m'+1)^x p^y \mathbf{P}(m', p). \tag{S54}
\]
Since having negative numbers of mRNA or protein doesn't make physical sense, we have $\mathbf{P}(-1, p) = 0$. Therefore we can rewrite the sum starting from 0 rather than from $-1$, obtaining
\[
\mathbf{R}_m \sum_{m'=-1}^{\infty} \sum_p (m'+1)^x p^y \mathbf{P}(m', p)
= \mathbf{R}_m \sum_{m'=0}^{\infty} \sum_p (m'+1)^x p^y \mathbf{P}(m', p). \tag{S55}
\]
Recall that our distribution $\mathbf{P}(m, p)$ takes $m$ and $p$ as numerical inputs and returns a probability associated with such a molecule count. Nevertheless, $m$ and $p$ themselves are dimensionless quantities that serve as indices of how many molecules are in the cell. The distribution is the same whether the variable is called $m$ or $m'$: for a specific number, say $m = 5$ or $m' = 5$, $\mathbf{P}(5, p)$ returns the same result. The variable name is therefore arbitrary, and the right-hand side of Eq. S55 can be written as
\[
\mathbf{R}_m \sum_{m'=0}^{\infty} \sum_p (m'+1)^x p^y \mathbf{P}(m', p)
= \mathbf{R}_m \langle (\mathbf{m}+1)^x \mathbf{p}^y\rangle, \tag{S56}
\]
since the left-hand side corresponds to the definition of a moment.

For the mRNA degradation term involving $\mathbf{P}(m+1, p)$ we follow a similar procedure, defining $m' = m + 1$ to obtain
\[
\mathbf{\Gamma}_m \sum_m \sum_p (m+1)\, m^x p^y \mathbf{P}(m+1, p)
= \mathbf{\Gamma}_m \sum_{m'=1}^{\infty} \sum_p m' (m'-1)^x p^y \mathbf{P}(m', p). \tag{S57}
\]
In this case, since the summand is multiplied by $m'$, starting the sum over $m'$ from 0 rather than from 1 does not affect the result, as the $m' = 0$ term contributes nothing to the total. This is nevertheless useful since our definition of a moment in Eq. S44 requires the sum to start at zero. This means that we can rewrite this term as
\[
\mathbf{\Gamma}_m \sum_{m'=1}^{\infty} \sum_p m' (m'-1)^x p^y \mathbf{P}(m', p)
= \mathbf{\Gamma}_m \sum_{m'=0}^{\infty} \sum_p m' (m'-1)^x p^y \mathbf{P}(m', p). \tag{S58}
\]
Here again we can change the arbitrary label $m'$ back to $m$, obtaining
\[
\mathbf{\Gamma}_m \sum_{m'=0}^{\infty} \sum_p m' (m'-1)^x p^y \mathbf{P}(m', p)
= \mathbf{\Gamma}_m \langle \mathbf{m}(\mathbf{m}-1)^x \mathbf{p}^y\rangle. \tag{S59}
\]
The protein production term involving $\mathbf{P}(m, p-1)$ can be reindexed by defining $p' \equiv p - 1$. This gives
\[
\mathbf{R}_p \sum_m \sum_p m\, m^x p^y \mathbf{P}(m, p-1)
= \mathbf{R}_p \sum_m \sum_{p'=-1}^{\infty} m^{x+1} (p'+1)^y \mathbf{P}(m, p'). \tag{S60}
\]
We again use the fact that negative molecule copy numbers have probability zero to begin the sum from 0 rather than $-1$, and the arbitrary nature of the label $p'$, to write
\[
\mathbf{R}_p \sum_m \sum_{p'=0}^{\infty} m^{x+1} (p'+1)^y \mathbf{P}(m, p')
= \mathbf{R}_p \langle \mathbf{m}^{x+1} (\mathbf{p}+1)^y\rangle. \tag{S61}
\]
Finally, we take care of the protein degradation term involving $\mathbf{P}(m, p+1)$. As before, we define $p' = p + 1$ and substitute to obtain
\[
\mathbf{\Gamma}_p \sum_m \sum_p (p+1)\, m^x p^y \mathbf{P}(m, p+1)
= \mathbf{\Gamma}_p \sum_m \sum_{p'=1}^{\infty} p'\, m^x (p'-1)^y \mathbf{P}(m, p'). \tag{S62}
\]
Just as with the mRNA degradation term, the factor $p'$ inside the sum allows us to start the sum over $p'$ from 0 rather than 1. We can therefore write
\[
\mathbf{\Gamma}_p \sum_m \sum_{p'=0}^{\infty} p'\, m^x (p'-1)^y \mathbf{P}(m, p')
= \mathbf{\Gamma}_p \langle \mathbf{m}^x \mathbf{p}(\mathbf{p}-1)^y\rangle. \tag{S63}
\]
Putting all these terms together we can write the general moment ODE. This is of the form
\[
\begin{aligned}
\frac{d\langle \mathbf{m}^x \mathbf{p}^y\rangle}{dt} ={}&
\mathbf{K} \langle \mathbf{m}^x \mathbf{p}^y\rangle
&&\text{(promoter state transition)}\\
&- \mathbf{R}_m \langle \mathbf{m}^x \mathbf{p}^y\rangle
 + \mathbf{R}_m \langle (\mathbf{m}+1)^x \mathbf{p}^y\rangle
&&\text{(mRNA production)}\\
&- \mathbf{\Gamma}_m \langle \mathbf{m}^{x+1} \mathbf{p}^y\rangle
 + \mathbf{\Gamma}_m \langle \mathbf{m}(\mathbf{m}-1)^x \mathbf{p}^y\rangle
&&\text{(mRNA degradation)}\\
&- \mathbf{R}_p \langle \mathbf{m}^{x+1} \mathbf{p}^y\rangle
 + \mathbf{R}_p \langle \mathbf{m}^{x+1} (\mathbf{p}+1)^y\rangle
&&\text{(protein production)}\\
&- \mathbf{\Gamma}_p \langle \mathbf{m}^x \mathbf{p}^{y+1}\rangle
 + \mathbf{\Gamma}_p \langle \mathbf{m}^x \mathbf{p}(\mathbf{p}-1)^y\rangle
&&\text{(protein degradation)}.
\end{aligned} \tag{S64}
\]

S3.2 Moment closure of the simple-repression distribution
A very interesting and useful feature of Eq. S64 is that for a given value of $x$ and $y$ the moment $\langle m^x p^y\rangle$ is only a function of lower moments. Specifically, $\langle m^x p^y\rangle$ is a function of moments $\langle m^{x'} p^{y'}\rangle$ that satisfy two conditions:
\[
1)\ y' \leq y, \qquad 2)\ x' + y' \leq x + y. \tag{S65}
\]
To prove this we rewrite Eq. S64 as
\[
\begin{aligned}
\frac{d\langle \mathbf{m}^x \mathbf{p}^y\rangle}{dt} ={}&
\mathbf{K} \langle \mathbf{m}^x \mathbf{p}^y\rangle
+ \mathbf{R}_m \left\langle \mathbf{p}^y\left[(\mathbf{m}+1)^x - \mathbf{m}^x\right]\right\rangle
+ \mathbf{\Gamma}_m \left\langle \mathbf{m}\mathbf{p}^y\left[(\mathbf{m}-1)^x - \mathbf{m}^x\right]\right\rangle\\
&+ \mathbf{R}_p \left\langle \mathbf{m}^{x+1}\left[(\mathbf{p}+1)^y - \mathbf{p}^y\right]\right\rangle
+ \mathbf{\Gamma}_p \left\langle \mathbf{m}^x \mathbf{p}\left[(\mathbf{p}-1)^y - \mathbf{p}^y\right]\right\rangle,
\end{aligned} \tag{S66}
\]
where the factorization is valid given the linearity of expected values. Now the objective is to find the highest moment for each term once the relevant binomial, such as $(m-1)^x$, is expanded. Take, for example, the simple case in which we want to find the second moment of the mRNA distribution. We set $x = 2$ and $y = 0$; Eq. S66 then becomes
\[
\frac{d\langle \mathbf{m}^2 \mathbf{p}^0\rangle}{dt} =
\mathbf{K} \langle \mathbf{m}^2 \mathbf{p}^0\rangle
+ \mathbf{R}_m \left\langle \mathbf{p}^0\left[(\mathbf{m}+1)^2 - \mathbf{m}^2\right]\right\rangle
+ \mathbf{\Gamma}_m \left\langle \mathbf{m}\mathbf{p}^0\left[(\mathbf{m}-1)^2 - \mathbf{m}^2\right]\right\rangle
+ \mathbf{R}_p \left\langle \mathbf{m}^3\left[(\mathbf{p}+1)^0 - \mathbf{p}^0\right]\right\rangle
+ \mathbf{\Gamma}_p \left\langle \mathbf{m}^2 \mathbf{p}\left[(\mathbf{p}-1)^0 - \mathbf{p}^0\right]\right\rangle. \tag{S67}
\]
Simplifying this equation gives
\[
\frac{d\langle \mathbf{m}^2\rangle}{dt} =
\mathbf{K} \langle \mathbf{m}^2\rangle
+ \mathbf{R}_m \langle 2\mathbf{m} + 1\rangle
+ \mathbf{\Gamma}_m \langle \mathbf{m} - 2\mathbf{m}^2\rangle. \tag{S68}
\]
Eq. S68 satisfies both of our conditions. Since we set $y$ to be zero, none of the terms depend on any moment involving the protein number, so $y' \leq y$ is satisfied. The highest moment in Eq. S68 also satisfies $x' + y' \leq x + y$, since the second moment of mRNA does not depend on any moment higher than $\langle m^2\rangle$. To demonstrate that this is true for any $x$ and $y$ we now rewrite Eq. S66, making use of the binomial expansion
\[
(z \pm 1)^n = \sum_{k=0}^{n} \binom{n}{k} (\pm 1)^k z^{n-k}. \tag{S69}
\]
Just as before let's look at each term individually. For the mRNA production term we have
\[
\mathbf{R}_m \left\langle \mathbf{p}^y\left[(\mathbf{m}+1)^x - \mathbf{m}^x\right]\right\rangle
= \mathbf{R}_m \left\langle \mathbf{p}^y\left[\sum_{k=0}^{x} \binom{x}{k} \mathbf{m}^{x-k} - \mathbf{m}^x\right]\right\rangle. \tag{S70}
\]
When $k = 0$, the term inside the sum on the right-hand side cancels with the other $\mathbf{m}^x$, so we can simplify to
\[
\mathbf{R}_m \left\langle \mathbf{p}^y\left[(\mathbf{m}+1)^x - \mathbf{m}^x\right]\right\rangle
= \mathbf{R}_m \left\langle \mathbf{p}^y \sum_{k=1}^{x} \binom{x}{k} \mathbf{m}^{x-k}\right\rangle. \tag{S71}
\]
Once the sum is expanded we can see that the highest moment in this sum is $\langle m^{x-1} p^y\rangle$, which satisfies both of the conditions in Eq. S65.

For the mRNA degradation term we similarly have
\[
\mathbf{\Gamma}_m \left\langle \mathbf{m}\mathbf{p}^y\left[(\mathbf{m}-1)^x - \mathbf{m}^x\right]\right\rangle
= \mathbf{\Gamma}_m \left\langle \mathbf{m}\mathbf{p}^y\left[\sum_{k=0}^{x} \binom{x}{k} (-1)^k \mathbf{m}^{x-k} - \mathbf{m}^x\right]\right\rangle. \tag{S72}
\]
Simplifying terms we obtain
\[
\mathbf{\Gamma}_m \left\langle \mathbf{m}\mathbf{p}^y\left[\sum_{k=0}^{x} \binom{x}{k} (-1)^k \mathbf{m}^{x-k} - \mathbf{m}^x\right]\right\rangle
= \mathbf{\Gamma}_m \left\langle \mathbf{p}^y \sum_{k=1}^{x} \binom{x}{k} (-1)^k \mathbf{m}^{x+1-k}\right\rangle. \tag{S73}
\]
The largest moment in this case is $\langle m^x p^y\rangle$, which again satisfies the conditions in Eq. S65.

The protein production term gives
\[
\mathbf{R}_p \left\langle \mathbf{m}^{x+1}\left[(\mathbf{p}+1)^y - \mathbf{p}^y\right]\right\rangle
= \mathbf{R}_p \left\langle \mathbf{m}^{x+1}\left[\sum_{k=0}^{y} \binom{y}{k} \mathbf{p}^{y-k} - \mathbf{p}^y\right]\right\rangle. \tag{S74}
\]
Upon simplification we obtain
\[
\mathbf{R}_p \left\langle \mathbf{m}^{x+1}\left[\sum_{k=0}^{y} \binom{y}{k} \mathbf{p}^{y-k} - \mathbf{p}^y\right]\right\rangle
= \mathbf{R}_p \left\langle \mathbf{m}^{x+1} \sum_{k=1}^{y} \binom{y}{k} \mathbf{p}^{y-k}\right\rangle. \tag{S75}
\]
Here the largest moment is $\langle m^{x+1} p^{y-1}\rangle$, which again satisfies both of our conditions. For the last term, for protein degradation, we have
\[
\mathbf{\Gamma}_p \left\langle \mathbf{m}^x \mathbf{p}\left[(\mathbf{p}-1)^y - \mathbf{p}^y\right]\right\rangle
= \mathbf{\Gamma}_p \left\langle \mathbf{m}^x \sum_{k=1}^{y} \binom{y}{k} (-1)^k \mathbf{p}^{y+1-k}\right\rangle. \tag{S76}
\]
The largest moment involved in this term is $\langle m^x p^y\rangle$. With this, we have shown that the four terms involved in our general moment equation depend only on lower moments that satisfy Eq. S65.

As a reminder, what we showed in this section is that the kinetic model introduced in Fig. 2(A) has no moment-closure problem. In other words, moments of the joint mRNA and protein distribution can be computed from knowledge of lower moments alone. This allows us to cleanly integrate the dynamics of the distribution moments as cells progress through the cell cycle.

S3.3 Computing single promoter steady-state moments

(Note: The Python code used for the calculations presented in this section can be found in the following link as an annotated Jupyter notebook)

As discussed in Section 1.4, one of the main factors contributing to cell-to-cell variability in gene expression is the change in gene copy number during the cell cycle as cells replicate their genome before cell division. Our minimal model accounts for this variability by considering the time trajectory of the distribution moments as given by Eq. S66. These predictions will be contrasted with the predictions of a kinetic model that does not account for changes in gene copy number during the cell cycle in Appendix S5.

If we do not account for the change in gene copy number during the cell cycle or for the partitioning of proteins at division, the dynamics of the moments of the distribution described in this section will reach a steady state. In order to compute the steady-state moments of the kinetic model with a single gene across the cell cycle, we use the moment-closure property of our master equation. By setting Eq. S66 to zero for a given $x$ and $y$, we can solve the resulting linear system and obtain a solution for $\langle m^x p^y\rangle$ at steady state as a function of moments $\langle m^{x'} p^{y'}\rangle$ that satisfy Eq. S65.
Then, by solving for the zeroth moment $\langle m^0 p^0\rangle$ subject to the constraint that the probabilities of the promoter states must add up to one, we can substitute back all of the solutions in terms of moments $\langle m^{x'} p^{y'}\rangle$ with solutions in terms of the rates shown in Fig. 2. In other words, through an iterative process we can obtain the value of any moment of the distribution. We start by solving for the zeroth moment. Since all higher moments depend on lower moments, we can use the solution of the zeroth moment to compute the first mRNA moment. This solution is then used for higher moments in a hierarchical, iterative process.

S4 Accounting for the variability in gene copy number during the cell cycle

(Note: The Python code used for the calculations presented in this section can be found in the following link as an annotated Jupyter notebook)

When growing in rich media, bacteria can double every $\approx 20$ minutes. With two replication forks, each traveling at $\approx 1000$ bp/s along the $\approx 4.6 \times 10^6$ bp genome of E. coli [61], a cell would need $\approx 40$ minutes to replicate its genome. The apparent paradox of growth rates faster than one division per 40 minutes is resolved by the fact that cells have multiple replisomes, i.e. molecular machines that replicate the genome, running in parallel. Depending on the growth rate, cells can have up to 8 copies of the genome being replicated simultaneously [37].

This observation implies that the gene copy number varies during the cell cycle. This variation depends on the growth rate and on the relative position of the gene with respect to the replication origin: genes close to the replication origin spend more time at multiple copies compared to genes closer to the replication termination site. This change in gene dosage has a direct effect on the cell-to-cell variability in gene expression [25, 38].
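The replication-time arithmetic above can be made explicit. The fork speed used here is a typical order-of-magnitude value and an assumption of this sketch:

```python
# Back-of-the-envelope check of the genome replication time.
# A fork speed of ~1000 bp/s is a typical textbook value (assumed here).
GENOME_BP = 4.6e6      # E. coli K12 genome size (bp)
FORK_SPEED = 1000.0    # bp/s per replication fork (assumption)
N_FORKS = 2            # bidirectional replication from a single origin

replication_time_min = GENOME_BP / (N_FORKS * FORK_SPEED) / 60.0
print(round(replication_time_min, 1))   # → 38.3, i.e. ≈ 40 minutes
```

Any doubling time shorter than this forces overlapping rounds of replication, which is exactly the multiple-replisome picture described above.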
S4.1 Numerical integration of moment equations

(Note: The Python code used for the calculations presented in this section can be found in the following link as an annotated Jupyter notebook)

For our specific locus (galK) and a doubling time of $\approx 60$ min for our experimental conditions, cells have on average 1.66 copies of the reporter gene during the cell cycle [25]. What this means is that cells spend $\approx 40\%$ of the cell cycle with one copy of the gene and $\approx 60\%$ with two copies. To account for this variability in gene copy number across the cell cycle we numerically integrate the moment equations derived in Appendix S3 for a time $t \in [0, t_s]$ with an mRNA production rate $r_m$, where $t_s$ is the time point at which the replication fork reaches our specific locus. For the remaining time $t \in [t_s, t_d]$ before cell division, which the cell spends with two promoters, we assume that the only parameter that changes is the mRNA production rate, from $r_m$ to $2 r_m$. This simplifying assumption ignores potential changes in the protein translation rate $r_p$, or changes in the repressor copy number that would be reflected in changes in the repressor on rate $k^{(r)}_{\text{on}}$.

S4.1.1 Computing distribution moments after cell division

(Note: The Python code used for the calculations presented in this section can be found in the following link as an annotated Jupyter notebook)

We have already derived a general form for the dynamics of the moments of the distribution, i.e. we wrote differential equations for the moments $d\langle m^x p^y\rangle/dt$. Given that we know all parameters for our model, we can simply integrate these equations numerically to compute how the moments of the distribution evolve as cells progress through their cell cycle. Once the cell reaches the time $t_d$ at which it divides, the mRNA and proteins that we are interested in undergo binomial partitioning between the two daughter cells. In other words, each molecule effectively flips a coin to decide which daughter cell it goes to.
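This coin-flip partitioning is straightforward to simulate directly. The sketch below uses arbitrary illustrative pre-division copy numbers (not the model's values) and checks two properties derived analytically in this appendix: first moments halve on average, and the mixed moment $\langle mp\rangle$ drops by a factor of about 4.

```python
import random
from statistics import fmean

random.seed(42)

def partition(n):
    """Molecules inherited by one daughter cell: each of the n molecules
    goes to that daughter with probability 1/2, i.e. a Bin(n, 1/2) draw."""
    return sum(random.getrandbits(1) for _ in range(n))

# Arbitrary, illustrative pre-division mRNA and protein counts per cell.
n_cells = 20_000
m_before = [random.randint(0, 30) for _ in range(n_cells)]
p_before = [random.randint(0, 200) for _ in range(n_cells)]

m_after = [partition(m) for m in m_before]
p_after = [partition(p) for p in p_before]

ratio_p = fmean(p_after) / fmean(p_before)     # ≈ 1/2
ratio_mp = (fmean(m * p for m, p in zip(m_after, p_after))
            / fmean(m * p for m, p in zip(m_before, p_before)))  # ≈ 1/4
print(ratio_p, ratio_mp)
```

Because the mRNA and protein partitions are independent coin flips, the simulated ratios converge to the 1/2 and 1/4 factors derived below for $\langle p\rangle_{t_o}$ and $\langle mp\rangle_{t_o}$.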
The question then becomes: given a value of the moment $\langle m^x p^y\rangle_{t_d}$ just before cell division, what is the value of this moment, $\langle m^x p^y\rangle_{t_o}$, after cell division takes place?

The probability distribution of mRNA and protein after cell division, $P_{t_o}(m, p)$, must satisfy
\[
P_{t_o}(m, p) = \sum_{m'=m}^{\infty} \sum_{p'=p}^{\infty} P(m, p \mid m', p')\, P_{t_d}(m', p'), \tag{S77}
\]
where we sum over all the possibilities of having $m'$ mRNA and $p'$ proteins before cell division. Note that the sums start at $m$ and $p$; for a daughter cell to have these copy numbers, the mother cell must have had at least those copy numbers, since we do not assume any production at the instant of cell division. Since we assume that the partitioning of mRNA is independent of the partitioning of protein, the conditional probability $P(m, p \mid m', p')$ is simply given by a product of two binomial distributions, one for the mRNA and one for the protein, i.e.
\[
P(m, p \mid m', p') = \binom{m'}{m} \left(\frac{1}{2}\right)^{m'} \cdot \binom{p'}{p} \left(\frac{1}{2}\right)^{p'}. \tag{S78}
\]
Because of this product of binomial probabilities we are allowed to extend the sums in Eq. S77 to start at $m' = 0$ and $p' = 0$,
\[
P_{t_o}(m, p) = \sum_{m'=0}^{\infty} \sum_{p'=0}^{\infty} P(m, p \mid m', p')\, P_{t_d}(m', p'), \tag{S79}
\]
since the product of the binomial distributions in Eq. S78 is zero for all $m' < m$ and/or $p' < p$. From now on in this section we will write $\sum_x \equiv \sum_{x=0}^{\infty}$ to simplify notation.

We can then compute the distribution moments after cell division, $\langle m^x p^y\rangle_{t_o}$, as
\[
\langle m^x p^y\rangle_{t_o} = \sum_m \sum_p m^x p^y P_{t_o}(m, p), \tag{S80}
\]
for all $x, y \in \mathbb{N}$. Substituting Eq. S79 results in
\[
\langle m^x p^y\rangle_{t_o} = \sum_m \sum_p m^x p^y \sum_{m'} \sum_{p'} P(m, p \mid m', p')\, P_{t_d}(m', p'). \tag{S81}
\]
We can rearrange the sums as
\[
\langle m^x p^y\rangle_{t_o} = \sum_{m'} \sum_{p'} P_{t_d}(m', p') \sum_m \sum_p m^x p^y P(m, p \mid m', p'). \tag{S82}
\]
The fact that Eq. S78 is the product of two independent events allows us to rewrite the joint conditional probability as
\[
P(m, p \mid m', p') = P(m \mid m') \cdot P(p \mid p'). \tag{S83}
\]
With this we can write the moment $\langle m^x p^y\rangle_{t_o}$ as
\[
\langle m^x p^y\rangle_{t_o} = \sum_{m'} \sum_{p'} P_{t_d}(m', p') \sum_m m^x P(m \mid m') \sum_p p^y P(p \mid p'). \tag{S84}
\]
Notice that the sums over $m$ and over $p$ are conditional expected values, i.e.
\[
\sum_z z^x P(z \mid z') \equiv \langle z^x \mid z'\rangle, \quad \text{for } z \in \{m, p\}. \tag{S85}
\]
These conditional expected values are the moments of a binomial random variable $z \sim \text{Bin}(z', 1/2)$. Substituting them gives
\[
\langle m^x p^y\rangle_{t_o} = \sum_{m'} \sum_{p'} \langle m^x \mid m'\rangle \langle p^y \mid p'\rangle P_{t_d}(m', p'). \tag{S86}
\]
To see how this general formula for the moments after cell division works, let's compute the mean protein per cell after division, $\langle p\rangle_{t_o}$, i.e. setting $x = 0$ and $y = 1$.
This results in
\[
\langle p\rangle_{t_o} = \sum_{m'} \sum_{p'} \langle m^0 \mid m'\rangle \langle p \mid p'\rangle P_{t_d}(m', p'). \tag{S87}
\]
The zeroth moment $\langle m^0 \mid m'\rangle$ by definition must be one, since
\[
\langle m^0 \mid m'\rangle = \sum_m m^0 P(m \mid m') = \sum_m P(m \mid m') = 1, \tag{S88}
\]
because the probability distribution must be normalized. This leaves us with
\[
\langle p\rangle_{t_o} = \sum_{m'} \sum_{p'} P_{t_d}(m', p') \langle p \mid p'\rangle. \tag{S89}
\]
Taking the sum over $m'$ simply computes the marginal probability distribution, $\sum_{m'} P_{t_d}(m', p') = P_{t_d}(p')$, so we have
\[
\langle p\rangle_{t_o} = \sum_{p'} \langle p \mid p'\rangle P_{t_d}(p'). \tag{S90}
\]
For the particular case of the first moment of the binomial distribution with parameters $p'$ and $1/2$ we have
\[
\langle p \mid p'\rangle = \frac{p'}{2}. \tag{S91}
\]
Therefore the moment after division is equal to
\[
\langle p\rangle_{t_o} = \sum_{p'} \frac{p'}{2} P_{t_d}(p') = \frac{1}{2} \sum_{p'} p' P_{t_d}(p'). \tag{S92}
\]
Notice that this is just 1/2 of the expected value of $p'$ averaged over the distribution prior to cell division, i.e.
\[
\langle p\rangle_{t_o} = \frac{1}{2}\langle p'\rangle_{t_d}, \tag{S93}
\]
where $\langle\cdot\rangle_{t_d}$ highlights that this is a moment of the distribution prior to cell division. This result makes perfect sense: the mean protein copy number right after the cell divides is half of the mean protein copy number just before division, exactly as we would expect.
So, in principle, to know the first moment of either the mRNA distribution $\langle m\rangle_{t_o}$ or the protein distribution $\langle p\rangle_{t_o}$ right after cell division, it suffices to multiply the corresponding moments before division, $\langle m\rangle_{t_d}$ or $\langle p\rangle_{t_d}$, by 1/2. Let's now explore how this generalizes to any other moment $\langle m^x p^y\rangle_{t_o}$.

S4.1.2 Computing the moments of a binomial distribution
The result of the last section depended on knowing the functional form of the first moment of the binomial distribution. For higher moments we need a systematic way to compute such moments. Luckily, we can do so using the so-called moment generating function (MGF). The MGF of a random variable $X$ is defined as
\[
M_X(t) = \langle e^{tX}\rangle, \tag{S94}
\]
where $t$ is a dummy variable. Once we know the MGF we can obtain any moment of the distribution by computing
\[
\langle X^n\rangle = \left.\frac{d^n}{dt^n} M_X(t)\right|_{t=0}, \tag{S95}
\]
i.e. taking the $n$-th derivative of the MGF returns the $n$-th moment of the distribution. For the particular case of the binomial distribution $X \sim \text{Bin}(N, q)$ it can be shown that the MGF takes the form
\[
M_X(t) = \left[(1 - q) + q e^t\right]^N. \tag{S96}
\]
As an example let's compute the first moment of this binomially distributed variable. The first derivative of the MGF is
\[
\frac{dM_X(t)}{dt} = N\left[(1 - q) + q e^t\right]^{N-1} q e^t. \tag{S97}
\]
Following Eq. S95 we set $t = 0$ to obtain the first moment
\[
\left.\frac{dM_X(t)}{dt}\right|_{t=0} = Nq, \tag{S98}
\]
which is exactly the expected value of a binomially distributed random variable.

So, according to Eq. S86, to compute any moment $\langle m^x p^y\rangle$ after cell division we can take the $x$-th and $y$-th derivatives of the binomial MGF to obtain $\langle m^x \mid m'\rangle$ and $\langle p^y \mid p'\rangle$, respectively, and then take the expected value of the result. Let's follow in detail the specific case of the moment $\langle mp\rangle$. When computing the moment after cell division,
\[
\langle mp\rangle_{t_o} = \sum_{m'} \sum_{p'} \langle m \mid m'\rangle \langle p \mid p'\rangle P_{t_d}(m', p'), \tag{S99}
\]
the product $\langle m \mid m'\rangle \langle p \mid p'\rangle$ is
\[
\langle m \mid m'\rangle \langle p \mid p'\rangle = \frac{m'}{2} \cdot \frac{p'}{2}, \tag{S100}
\]
where we used the result in Eq. S98, substituting $m$ and $p$ for $X$ and setting $q = 1/2$. Substituting this result into the moment gives
\[
\langle mp\rangle_{t_o} = \sum_{m'} \sum_{p'} \frac{m' p'}{4} P_{t_d}(m', p') = \frac{1}{4}\langle m' p'\rangle_{t_d}. \tag{S101}
\]
Therefore, to compute the moment $\langle mp\rangle_{t_o}$ after cell division we simply divide the corresponding moment before cell division by 4. Not all moments after cell division depend only on the equivalent moment before cell division, however.
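Before moving on, the MGF results above can be checked by brute force, summing $k^n P(X = k)$ directly over the binomial pmf. The parameter values below are arbitrary; the check is exact up to floating-point error:

```python
from math import comb, isclose

def binom_moment(n, q, order):
    """<X^order> for X ~ Bin(n, q), by summing k^order * P(X = k)."""
    return sum(k**order * comb(n, k) * q**k * (1 - q)**(n - k)
               for k in range(n + 1))

# First moment matches the MGF result of Eq. S98: <X> = N q.
n, q = 50, 0.5
assert isclose(binom_moment(n, q, 1), n * q)

# With q = 1/2, <m|m'><p|p'> = (m'/2)(p'/2): the mixed moment after
# division is the pre-division value divided by 4, as in Eq. S101.
m_prime, p_prime = 12, 80
assert isclose(binom_moment(m_prime, 0.5, 1) * binom_moment(p_prime, 0.5, 1),
               m_prime * p_prime / 4)
print("binomial moment checks passed")
```

The same brute-force summation works for any higher conditional moment needed below, at the cost of a sum over the support.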
For exampleif we compute the third moment of the protein distribution (cid:10) p (cid:11) t o , we find (cid:10) p (cid:11) t o = (cid:10) p (cid:11) t d (cid:10) p (cid:11) t d . (S102)So for this particular case the third moment of the protein distribution depends on the third moment and thesecond moment before the cell division. In general all moments after cell division (cid:104) m x p y (cid:105) t o linearly depend onmoments before cell division. Furthermore, there is “moment closure” for this specific case in the sense that allmoments after cell division depend on lower moments before cell division. To generalize these results to all themoments computed in this work let us then define a vector to collect all moments before the cell division upthe (cid:104) m x p y (cid:105) t d moment, i.e. (cid:104) m x p y (cid:105) t d = (cid:16)(cid:10) m p (cid:11) t d , (cid:10) m (cid:11) t d , . . . , (cid:104) m x p y (cid:105) t d (cid:17) . (S103)Then any moment after cell division (cid:68) m x (cid:48) p y (cid:48) (cid:69) t o for x (cid:48) ≤ x and y (cid:48) ≤ y can be computed as (cid:68) m x (cid:48) p y (cid:48) (cid:69) t o = z x (cid:48) y (cid:48) · (cid:104) m x p y (cid:105) t d , where we define the vector z x (cid:48) y (cid:48) as the vector containing all the coefficients that we obtain with the product ofthe two binomial distributions. For example for the case of the third protein moment (cid:10) p (cid:11) t o the vector z x (cid:48) y (cid:48) would have zeros for all entries except for the corresponding entry for (cid:10) p (cid:11) t d and for (cid:10) p (cid:11) t d , where it wouldhave 3 / / (cid:104) m x p y (cid:105) t o let us define an equivalentvector (cid:104) m x p y (cid:105) t o = (cid:16)(cid:10) m p (cid:11) t o , (cid:10) m (cid:11) t o , . . . , (cid:104) m x p y (cid:105) t o (cid:17) . 
(S104)

Then we need to build a square matrix \mathbf{Z} such that each row of the matrix contains the corresponding vector \mathbf{z}_{x'y'} for each of the moments. Having this matrix, we simply compute the moments after cell division as

\langle \mathbf{m^x p^y} \rangle_{t_o} = \mathbf{Z} \cdot \langle \mathbf{m^x p^y} \rangle_{t_d}.   (S105)

In other words, matrix \mathbf{Z} contains all the coefficients by which we must multiply the moments before cell division in order to obtain the moments after cell division. Matrix \mathbf{Z} was generated automatically using Python's symbolic math library sympy [62].

Fig. S8 (adapted from Fig. 3(B)) shows how the first moments of both mRNA and protein change over several cell cycles. The mRNA quickly relaxes to the steady state corresponding to the parameters for both a single and two promoter copies. This is expected, since the parameters for mRNA production were determined under this assumption in the first place (see Appendix S1). We note that there is no apparent delay in reaching the steady-state mean mRNA count after the cell divides. This is because the mean mRNA count in the two-promoter-copy state is exactly twice the expected mRNA count in the single-promoter state (see Appendix S1). Therefore, once the mean mRNA count is halved after cell division, it is already at the steady-state value for the single-promoter case. On the other hand, given that the relaxation time to steady state is determined by the degradation rate, the mean protein count does not reach its corresponding steady-state value in either promoter copy number state. Interestingly, once a couple of cell cycles have passed, the first moment follows a repetitive trajectory over cell cycles. We have observed this experimentally by tracking cells as they grow under the microscope.
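As a sanity check on the moment bookkeeping at cell division derived above, the binomial MGF moments can be reproduced symbolically. This is a minimal sketch using sympy (the library the text notes was used to generate Z); it only re-derives coefficients already quoted in this section.

```python
# Verify the Bin(n, 1/2) moments used for the cell-division step via the MGF.
import sympy as sp

t, n = sp.symbols('t n', positive=True)
q = sp.Rational(1, 2)  # each molecule segregates to a daughter with prob. 1/2
M = ((1 - q) + q * sp.exp(t))**n  # MGF of X ~ Bin(n, 1/2), Eq. S96

def moment(k):
    """k-th moment of X: k-th derivative of the MGF evaluated at t = 0."""
    return sp.expand(sp.diff(M, t, k).subs(t, 0))

# First moment: <X> = n q = n / 2 (Eq. S98)
assert sp.simplify(moment(1) - n / 2) == 0

# Third moment: <X^3> = n**3/8 + 3*n**2/8, yielding the 1/8 and 3/8
# coefficients multiplying <p^3>_td and <p^2>_td after division (Eq. S102)
third = sp.Poly(moment(3), n)
assert third.coeff_monomial(n**3) == sp.Rational(1, 8)
assert third.coeff_monomial(n**2) == sp.Rational(3, 8)
assert third.coeff_monomial(n) == 0
```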
Comparing cells at the beginning of the cell cycle with the daughter cells that appear after cell division shows that, on average, all cells have the same amount of protein at the beginning of the cell cycle (see Fig. 18 of [28]), suggesting that this dynamical steady state takes place in vivo.

Figure S8. First and second moment dynamics over the cell cycle.
Mean ± standard deviation mRNA copy number (upper panel) and mean ± standard deviation protein copy number (lower panel) as the cell cycle progresses. The dark shaded region delimits the fraction of the cell cycle that cells spend with a single copy of the promoter. The light shaded region delimits the fraction of the cell cycle that cells spend with two copies of the promoter. For a 100 min doubling time at the galK locus, cells spend 60% of the time with one copy of the promoter and the rest with two copies.

In principle, when measuring gene expression levels experimentally from an asynchronous culture, cells are sampled from any time point across their individual cell cycles. This means that the moments determined experimentally correspond to an average over the cell cycle. In the following section we discuss how to account for the fact that cells are not uniformly distributed across the cell cycle in order to compute these averages.
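To make the action of the matrix Z concrete, here is a minimal numeric sketch for the truncated moment vector (⟨m⟩, ⟨p⟩, ⟨mp⟩, ⟨p²⟩, ⟨p³⟩). The coefficient rows follow from the Bin(n, 1/2) conditional moments derived above; the pre-division moment values are hypothetical placeholders rather than output of the actual model.

```python
import numpy as np

# Rows give each after-division moment as a linear combination of the
# before-division moments; columns ordered as (<m>, <p>, <mp>, <p^2>, <p^3>).
Z = np.array([
    [1/2, 0,   0,   0,   0  ],  # <m>_to   = (1/2) <m>_td
    [0,   1/2, 0,   0,   0  ],  # <p>_to   = (1/2) <p>_td
    [0,   0,   1/4, 0,   0  ],  # <mp>_to  = (1/4) <mp>_td
    [0,   1/4, 0,   1/4, 0  ],  # <p^2>_to = (1/4) <p>_td + (1/4) <p^2>_td
    [0,   0,   0,   3/8, 1/8],  # <p^3>_to = (3/8) <p^2>_td + (1/8) <p^3>_td
])

moments_td = np.array([10.0, 500.0, 5.2e3, 2.6e5, 1.4e8])  # hypothetical values
moments_to = Z @ moments_td

assert moments_to[0] == 5.0     # mean mRNA halves at division
assert moments_to[1] == 250.0   # mean protein halves at division
assert moments_to[2] == 1300.0  # <mp> is divided by 4
```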
S4.2 Exponentially distributed ages
As mentioned in Appendix S2, cells in exponential growth have exponentially distributed ages across the cell cycle, with more young cells than old ones. Specifically, the probability of a cell being at any time point of the cell cycle is given by [40]

P(a) = (\ln 2) \cdot 2^{1-a},   (S106)

where a \in [0,
1] is the stage of the cell cycle, with a = 0 being the start of the cycle and a = 1 being the cell division. In Appendix S10 we reproduce this derivation. It is a surprising result, but it can be understood intuitively as follows: if the culture is growing exponentially, the number of cells increases at all times. This means, for example, that if in a time interval ∆t a number N of "old" cells divided, they produced 2N "young" cells, so at any point there are always more young cells than old ones.

Our numerical integration of the moment equations gave us the time evolution of the moments as cells progress through the cell cycle. Since experimentally we sample asynchronous cells that follow Eq. S106, each time point along the moment dynamics must be weighted by the probability of having sampled a cell at that specific point of its cell cycle. Without loss of generality let us focus on the first mRNA moment \langle m(t) \rangle (the same applies to all other moments). As mentioned before, in order to calculate the first moment across the entire cell cycle we must weight each time point by the corresponding probability that a cell is found at that point of its cell cycle. This translates to computing the integral

\langle m \rangle_c = \int_{\text{beginning of cell cycle}}^{\text{end of cell cycle}} \langle m(t) \rangle P(t) \, dt,   (S107)

where \langle m \rangle_c is the mean mRNA copy number averaged over the entire cell cycle trajectory, and P(t) is the probability of a cell being at time t of its cell cycle. If we express time in units of the cell cycle length, we can use Eq. S106 and compute instead

\langle m \rangle_c = \int_0^1 \langle m(a) \rangle P(a) \, da,   (S108)

where P(a) is given by Eq. S106. What Eq. S108 implies is that, in order to compute the first moment (or any moment of the distribution), we must weight each point in the moment dynamics by the corresponding probability of a cell being at that point along its cell cycle.
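As a numerical illustration of this weighting, the sketch below (standard library only) checks that Eq. S106 integrates to one over the cell cycle and applies it to a hypothetical trajectory ⟨m(a)⟩ = 10 + 5a, a stand-in rather than the model's actual output, using a composite Simpson's rule.

```python
import math

def age_density(a):
    """Eq. S106: P(a) = ln(2) * 2**(1 - a) for cell-cycle stage a in [0, 1]."""
    return math.log(2) * 2 ** (1 - a)

def simpson(y, h):
    """Composite Simpson's rule over an odd number of evenly spaced samples."""
    return h / 3 * (y[0] + y[-1] + 4 * sum(y[1:-1:2]) + 2 * sum(y[2:-1:2]))

n = 200                            # number of (even) subintervals
a = [i / n for i in range(n + 1)]  # cell-cycle stages
h = 1 / n

# The exponential-age density is normalized over the cell cycle
assert abs(simpson([age_density(x) for x in a], h) - 1) < 1e-9

# Cycle-averaged first moment of the hypothetical trajectory <m(a)> = 10 + 5a.
# Young cells dominate, so the average falls below the midpoint value 12.5.
m_avg = simpson([(10 + 5 * x) * age_density(x) for x in a], h)
assert m_avg < 12.5
```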
That is why, when computing a moment, we take the time trajectory over a single cell cycle, such as the ones shown in Fig. S8, and compute the average using Eq. S106 to weight each time point. We perform this integral numerically for all moments using Simpson's rule.

S4.3 Reproducing the equilibrium picture
Given the large variability of the first moments depicted in Fig. S8, it is worth asking why a simplistic equilibrium picture has proven so successful in predicting the mean expression level under diverse conditions [20, 24, 26, 27]. In this section we compare the simple repression thermodynamic model with this dynamical picture of the cell cycle. But before diving into this comparison, it is worth recapping the assumptions that go into the equilibrium model.
S4.3.1 Steady state under the thermodynamic model
Given the construction of the thermodynamic model of gene regulation, in which the probability of the promoter microstates, rather than the probability of mRNA or protein counts, is accounted for, we are only allowed to describe the dynamics of the first moment within this theoretical framework [36]. Again, let us focus only on the mRNA first moment \langle m \rangle; the same principles apply if we consider the protein first moment. We can write a dynamical system of the form

\frac{d \langle m \rangle}{dt} = r_m \cdot p_{\text{bound}} - \gamma_m \langle m \rangle,   (S109)

where, as before, r_m and \gamma_m are the mRNA production and degradation rates, respectively, and p_bound is the probability of finding the RNAP bound to the promoter [54]. This dynamical system is predicted to have a single stable fixed point that we can find by computing the steady state. Solving for the mean mRNA copy number at steady state \langle m \rangle_{ss} gives

\langle m \rangle_{ss} = \frac{r_m}{\gamma_m} p_{\text{bound}}.   (S110)

Since we assume that the only effect the repressor has on the regulation of the promoter is the exclusion of the RNAP from binding to the promoter, only p_bound depends on the repressor copy number R. Therefore, when computing the fold-change in gene expression we are left with

\text{fold-change} = \frac{\langle m(R \neq 0) \rangle_{ss}}{\langle m(R = 0) \rangle_{ss}} = \frac{p_{\text{bound}}(R \neq 0)}{p_{\text{bound}}(R = 0)}.   (S111)

As derived in [20], this can be written in the language of equilibrium statistical mechanics as

\text{fold-change} = \left( 1 + \frac{R}{N_{NS}} e^{-\beta \Delta\varepsilon_r} \right)^{-1},   (S112)

where \beta \equiv (k_B T)^{-1}, \Delta\varepsilon_r is the repressor-DNA binding energy, and N_{NS} is the number of non-specific binding sites where the repressor can bind.

To arrive at Eq. S112 we ignore the physiological changes that occur during the cell cycle, one of the most important being the variability in gene copy number that we are exploring in this section.
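The steady-state fold-change of Eq. S112 is a one-liner. The sketch below uses the repressor copy numbers and operator binding energies that appear elsewhere in this section purely as illustrative inputs; N_NS is taken as the E. coli genome length in base pairs.

```python
import numpy as np

N_NS = 4.6e6  # non-specific binding sites, ~E. coli genome length in bp

def fold_change(R, delta_eps_r):
    """Eq. S112; delta_eps_r is the repressor-DNA binding energy in k_B T."""
    return 1.0 / (1.0 + (R / N_NS) * np.exp(-delta_eps_r))

# No repressors -> no regulation
assert fold_change(0, -13.9) == 1.0

# More repressors, or tighter binding, gives stronger repression
assert fold_change(260, -13.9) < fold_change(22, -13.9)
assert fold_change(260, -15.3) < fold_change(260, -9.7)
```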
It is therefore worth asking whether the dynamical picture exemplified in Fig. S8 can be reconciled with the predictions made by Eq. S112, both at the mRNA and protein level.

Fig. S9 compares the predictions of both theoretical frameworks for varying repressor copy numbers and repressor-DNA affinities. The solid lines are directly computed from Eq. S112. The hollow triangles and solid circles represent the fold-change in mRNA and protein, respectively, as computed from the moment dynamics. To compute the fold-change from the kinetic picture, we first numerically integrate the moment dynamics for both the two- and three-state promoters (see Fig. S8 for the unregulated case) and then average the time series, accounting for the probability of cells being sampled at each stage of the cell cycle as defined in Eq. S108. The small systematic deviations between the two models come partly from the simplifying assumption that the repressor copy number, and therefore the repressor on rate k^{(r)}_on, remains constant during the cell cycle. In principle, the gene producing the repressor protein is itself also subject to the same duplication during the cell cycle, therefore changing the mean repressor copy number in both stages.

Figure S9. Comparison of the equilibrium and kinetic repressor titration predictions.
The equilibrium model (solid lines) and the kinetic model with variation over the cell cycle (solid circles and white triangles) predictions are compared for varying repressor copy numbers and operator binding energies. The equilibrium model is directly computed from Eq. S112, while the kinetic model is computed by numerically integrating the moment equations over several cell cycles, and then averaging over the extent of the cell cycle as defined in Eq. S108.
For completeness, Fig. S10 compares the kinetic and equilibrium models for the extended model of [27], in which the inducer concentration enters the equation. The solid line is directly computed from Eq. 5 of [27]. The hollow triangles and solid points follow the same procedure as in Fig. S9, where the only effect the inducer is assumed to have on the kinetics is an effective change in the number of active repressors, thereby affecting k^{(r)}_on.

Figure S10. Comparison of the equilibrium and kinetic inducer titration predictions.
The equilibrium model (solid lines) and the kinetic model with variation over the cell cycle (solid circles and white triangles) predictions are compared for varying repressor copy numbers and inducer concentrations. The equilibrium model is directly computed from Eq. 5 of reference [27] at a fixed repressor-DNA binding energy \Delta\varepsilon_r, while the kinetic model is computed by numerically integrating the moment dynamics over several cell cycles, and then averaging over the extent of a single cell cycle as defined in Eq. S108.

S4.4 Comparison between single- and multi-promoter kinetic model
After these calculations, it is worth asking whether the inclusion of this change in gene dosage is drastically different from the simpler picture of a kinetic model that ignores the gene copy number variability during the cell cycle. To this end, we systematically computed the average moments for varying repressor copy numbers and repressor-DNA affinities. We then compare these results with the moments obtained from a single-promoter model and its corresponding parameters. The derivation of the steady-state moments of the distribution for the single-promoter model is detailed in Appendix S3.

Fig. S9 and Fig. S10 both suggest that, since the dynamic multi-promoter model can reproduce the results of the equilibrium model at the first-moment level, it must also be able to reproduce the results of the single-promoter model at this level (see Appendix S2). The interesting comparison comes with higher moments. A useful metric for gene expression variability is the noise in gene expression [53]. This quantity, defined as the standard deviation divided by the mean, is a dimensionless measure of the variability relative to the mean of a distribution. As we will show below, this quantity differs from the also commonly used Fano factor (variance/mean) in that, for expression levels determined experimentally in arbitrary fluorescence units, the noise is a dimensionless quantity while the Fano factor is not.

Fig. S11 shows the comparison of the predicted protein noise between the single- (dashed lines) and multi-promoter (solid lines) models for different operators and repressor copy numbers. A striking difference between the two is that the single-promoter model predicts that, as the inducer concentration increases, the standard deviation grows much more slowly than the mean, giving a very small noise.
In comparison, the multi-promoter model has a much higher floor for the lowest value of the noise, reflecting the expected result that the variability in gene copy number across the cell cycle should increase the cell-to-cell variability in gene expression [25, 38].
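The dimensional argument above can be demonstrated with simulated counts. This is a sketch with hypothetical numbers: an arbitrary calibration factor α between protein count and fluorescence cancels in the noise but rescales the Fano factor.

```python
import numpy as np

rng = np.random.default_rng(0)
protein_counts = rng.poisson(lam=400, size=5000).astype(float)  # hypothetical

def noise(x):
    return x.std() / x.mean()

def fano(x):
    return x.var() / x.mean()

alpha = 150.0                       # hypothetical a.u. per protein
intensity = alpha * protein_counts  # what the microscope would report

# noise is invariant to the unknown calibration; the Fano factor is not
assert np.isclose(noise(intensity), noise(protein_counts))
assert np.isclose(fano(intensity), alpha * fano(protein_counts))
```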
S4.5 Comparison with experimental data
Having shown that the kinetic model presented in this section can not only reproduce the results from the equilibrium picture at the mean level (see Fig. S9 and Fig. S10) but also make predictions for the cell-to-cell
Figure S11. Comparison of the predicted protein noise between a single- and a multi-promoter kinetic model.
Comparison of the noise (standard deviation/mean) between a kinetic model that considers a single promoter at all times (dashed lines) and the multi-promoter model developed in this section (solid lines) for different repressor operators: (A) O1, \Delta\varepsilon_r = -15.3 k_BT; (B) O2, \Delta\varepsilon_r = -13.9 k_BT; (C) O3, \Delta\varepsilon_r = -9.7 k_BT.

variability as quantified by the noise (see Fig. S11), we can assess whether or not this model is able to predict experimental measurements of the noise. For this we take the single-cell intensity measurements (see Methods) and compute the noise at the protein level.

As mentioned before, this metric differs from the Fano factor since, for arbitrary fluorescence units, the noise is a dimensionless quantity. To see why, consider that the noise is defined as

\text{noise} \equiv \frac{\sqrt{\langle p^2 \rangle - \langle p \rangle^2}}{\langle p \rangle}.   (S113)

We assume that the intensity level of a cell I is linearly proportional to the absolute protein count, i.e.

I = \alpha p,   (S114)

where \alpha is the proportionality constant between arbitrary units and absolute protein number p. Substituting this definition into Eq. S113 gives

\text{noise} = \frac{\sqrt{\langle (\alpha p)^2 \rangle - \langle \alpha p \rangle^2}}{\langle \alpha p \rangle}.   (S115)

Since \alpha is a constant, it can be taken out of the average operator \langle \cdot \rangle, obtaining

\text{noise} = \frac{\sqrt{\alpha^2 \left( \langle p^2 \rangle - \langle p \rangle^2 \right)}}{\alpha \langle p \rangle} = \frac{\sqrt{\langle p^2 \rangle - \langle p \rangle^2}}{\langle p \rangle}.   (S116)

Notice that in Eq. S114 the linear proportionality between intensity and protein count has no intercept. This ignores the autofluorescence that cells without a reporter would generate. To account for this, in practice we compute

\text{noise} = \frac{\sqrt{\left\langle (I - \langle I_{\text{auto}} \rangle)^2 \right\rangle - \left\langle I - \langle I_{\text{auto}} \rangle \right\rangle^2}}{\left\langle I - \langle I_{\text{auto}} \rangle \right\rangle}.
(S117)

where I is the intensity of the strain of interest and \langle I_{\text{auto}} \rangle is the mean autofluorescence intensity, obtained from a strain that does not carry the fluorescent reporter gene.

Fig. S12 shows the comparison between theoretical predictions and experimental measurements for the unregulated promoter. The reason we split the data by operator, despite the fact that these unregulated promoters should in principle have identical expression profiles, is precisely to make sure that this is the case, since we have found in the past that sequences downstream of the RNAP binding site can affect the expression level of constitutively expressed genes. We can see that both models, the single-promoter (gray dotted line) and the multi-promoter (black dashed line), underestimate the experimental noise to different degrees. The single-promoter model does a worse job of predicting the experimental data since it does not account for the differences in gene dosage during the cell cycle. Still, accounting for this variability takes us to within a factor of two of the experimentally determined noise for these unregulated strains.

Figure S12. Protein noise of the unregulated promoter.
Comparison of the experimental noise for different operators with the theoretical predictions for the single-promoter (gray dotted line) and the multi-promoter model (black dashed line). Each datum represents a single date measurement of the corresponding ∆lacI strain with ≥ 300 cells.

To further test the model's predictive power, we compare the predictions for the three-state regulated promoter. Fig. S13 shows the theoretical predictions of the single- and multi-promoter models for varying repressor copy numbers and repressor-DNA binding affinities as a function of the inducer concentration. We can see again that our zero-parameter fits systematically underestimate the noise for all strains and all inducer concentrations. We highlight that the y-axis is shown in log scale to emphasize this deviation; but, as we will show in the next section, our predictions still fall within a factor of two of the experimental data.

S4.5.1 Systematic deviation of the noise in gene expression
Fig. S12 and Fig. S13 highlight that our model underestimates the cell-to-cell variability as measured by the noise. To further explore this systematic deviation, Fig. S14 shows the theoretical vs. experimental noise in both linear and log scale. As we can see, the data lie systematically above the identity line. The data are colored by their corresponding experimental fold-change values. The data with the largest deviations from the identity line also correspond to the data with the largest error bars and the smallest fold-changes. This is because measurements with very small fold-changes correspond to intensities very close to the autofluorescence background; given the ratio std/mean, minimal changes are therefore amplified when computing the noise. In Appendix S9 we explore empirical ways to improve the agreement between our minimal model and the experimental data, to guide future efforts to improve the minimal model.
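The background-corrected noise estimator of Eq. S117 reduces to a few lines. The sketch below uses hypothetical Gaussian intensity samples for a reporter strain and an autofluorescence-only strain; the estimator itself is insensitive to the arbitrary intensity units.

```python
import numpy as np

def corrected_noise(I, I_auto):
    """Eq. S117: noise of intensities after subtracting mean autofluorescence."""
    corrected = I - np.mean(I_auto)
    return np.std(corrected) / np.mean(corrected)

rng = np.random.default_rng(0)
I = rng.normal(1200.0, 180.0, size=3000)     # reporter strain (a.u.), hypothetical
I_auto = rng.normal(200.0, 30.0, size=3000)  # autofluorescence strain, hypothetical

noise_estimate = corrected_noise(I, I_auto)
assert 0.1 < noise_estimate < 0.25           # roughly 180 / (1200 - 200)

# Rescaling the arbitrary units leaves the estimate unchanged
assert np.isclose(corrected_noise(2 * I, 2 * I_auto), noise_estimate)
```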
Figure S13. Protein noise of the regulated promoter.
Comparison of the experimental noise for different operators ((A) O1, \Delta\varepsilon_r = -15.3 k_BT; (B) O2, \Delta\varepsilon_r = -13.9 k_BT; (C) O3, \Delta\varepsilon_r = -9.7 k_BT) with the theoretical predictions for the single-promoter (dashed lines) and the multi-promoter model (solid lines). Points represent the experimental noise as computed from single-cell fluorescence measurements of different E. coli strains under 12 different inducer concentrations. The dotted line indicates the part of the plot shown in linear rather than logarithmic scale. Each datum represents a single date measurement of the corresponding strain and IPTG concentration with ≥
300 cells. The points correspond to the median, and the error bars correspond to the 95% confidence interval as determined by 10,000 bootstrap samples. White-filled dots are plotted at a different scale for better visualization.

Figure S14. Systematic comparison of theoretical vs. experimental noise in gene expression.
Theoretical vs. experimental noise in both linear (left) and log (right) scale. The dashed line shows the identity line with slope 1 and intercept zero. All data are colored by the corresponding value of the experimental fold-change in gene expression, as indicated by the color bar. Each datum represents a single date measurement of the corresponding strain and IPTG concentration with ≥
300 cells. The points correspond to the median, and the error bars correspond to the 95% confidence interval as determined by 10,000 bootstrap samples.
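The bootstrap summary described in the captions above (median with a 95% confidence interval from 10,000 resamples) can be sketched as follows; the noise samples here are hypothetical draws, not the measurements.

```python
import numpy as np

def bootstrap_median_ci(samples, n_boot=10_000, ci=95, seed=0):
    """Median with a percentile bootstrap confidence interval."""
    rng = np.random.default_rng(seed)
    medians = np.array([
        np.median(rng.choice(samples, size=samples.size, replace=True))
        for _ in range(n_boot)
    ])
    low, high = np.percentile(medians, [(100 - ci) / 2, 100 - (100 - ci) / 2])
    return np.median(samples), (low, high)

# Hypothetical noise measurements for one strain / IPTG concentration
noise_samples = np.random.default_rng(1).normal(0.5, 0.1, size=300)

point, (low, high) = bootstrap_median_ci(noise_samples)
assert low <= point <= high  # the median lies inside its own 95% CI
```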
S5 Accounting for the variability in gene copy number during thecell cycle (Note: The Python code used for the calculations presented in this section can be found in the followinglink as an annotated Jupyter notebook)When growing in rich media, bacteria can double every ≈
20 minutes. With two replication forks each48raveling at ≈ ≈ E. coli [61], a cell would need ≈
40 minutesto replicate its genome. The apparent paradox of growth rates faster than one division per 40 minutes is solvedby the fact that cells have multiple replisomes, i.e. molecular machines that replicate the genome running inparallel. Cells can have up to 8 copies of the genome being replicated simultaneously depending on the growthrate [37].This observation implies that during the cell cycle gene copy number varies. This variation depends onthe growth rate and the relative position of the gene with respect to the replication origin, having genes closeto the replication origin spending more time with multiple copies compare to genes closer to the replicationtermination site. This change in gene dosage has a direct effect on the cell-to-cell variability in gene expression[25, 38].
S5.1 Numerical integration of moment equations (Note: The Python code used for the calculations presented in this section can be found in the followinglink as an annotated Jupyter notebook)For our specific locus ( galK ) and a doubling time of ≈
60 min for our experimental conditions, cells have onaverage 1.66 copies of the reporter gene during the cell cycle [25]. What this means is that cells spend 60% ofthe time having one copy of the gene and 40% of the time with two copies. To account for this variability ingene copy number across the cell cycle we numerically integrate the moment equations derived in Appendix S3for a time t = [0 , t s ] with an mRNA production rate r m , where t s is the time point at which the replicationfork reaches our specific locus. For the remaining time before the cell division t = [ t s , t d ] that the cell spendswith two promoters, we assume that the only parameter that changes is the mRNA production rate from r m to 2 r m . This simplifying assumption ignores potential changes in protein translation rate r p or changes in therepressor copy number that would be reflected in changes on the repressor on rate k ( r )on . S5.1.1 Computing distribution moments after cell division (Note: The Python code used for the calculations presented in this section can be found in the followinglink as an annotated Jupyter notebook)We have already solved a general form for the dynamics of the moments of the distribution, i.e. we wrotedifferential equations for the moments d (cid:104) m x p y (cid:105) dt . Given that we know all parameters for our model we can simplyintegrate these equations numerically to compute how the moments of the distribution evolve as cells progressthrough their cell cycle. Once the cell reaches a time t d when is going to divide the mRNA and proteins that weare interested in undergo a binomial partitioning between the two daughter cells. In other words, each moleculeflips a coin and decides whether to go to either daughter. 
The question then becomes given that we have avalue for the moment (cid:104) m x p y (cid:105) t d at a time before the cell division, what would the value of this moment be afterthe cell division takes place (cid:104) m x p y (cid:105) t o ?The probability distribution of mRNA and protein after the cell division P t o ( m, p ) must satisfy P t o ( m, p ) = ∞ (cid:88) m (cid:48) = m ∞ (cid:88) p (cid:48) = p P ( m, p | m (cid:48) , p (cid:48) ) P t d ( m (cid:48) , p (cid:48) ) , (S118)where we are summing over all the possibilities of having m (cid:48) mRNA and p (cid:48) proteins before cell division. Notethat the sums start at m and p ; this is because for a cell to have these copy numbers before cell division it isa requirement that the mother cell had at least such copy number since we are not assuming that there is anyproduction at the instantaneous cell division time. Since we assume that the partition of mRNA is independentfrom the partition of protein, the conditional probability P ( m, p | m (cid:48) , p (cid:48) ) is simply given by a product of twobinomial distributions, one for the mRNA and one for the protein, i.e. P ( m, p | m (cid:48) , p (cid:48) ) = (cid:18) m (cid:48) m (cid:19) (cid:18) (cid:19) m (cid:48) · (cid:18) p (cid:48) p (cid:19) (cid:18) (cid:19) p (cid:48) . (S119)49ecause of these product of binomial probabilities are allowed to extend the sum from Eq. S118 to start at m (cid:48) = 0 and p (cid:48) = 0 as P t o ( m, p ) = ∞ (cid:88) m (cid:48) =0 ∞ (cid:88) p (cid:48) =0 P ( m, p | m (cid:48) , p (cid:48) ) P t d ( m (cid:48) , p (cid:48) ) , (S120)since the product of the binomial distributions in Eq. S119 is zero for all m (cid:48) < m and/or p (cid:48) <
0. So from nowon in this section we will assume that a sum of the form (cid:80) x ≡ (cid:80) ∞ x =0 to simplify notation.We can then compute the distribution moments after the cell division (cid:104) m x p y (cid:105) t o as (cid:104) m x p y (cid:105) t o = (cid:88) m (cid:88) p m x p y P t o ( m, p ) , (S121)for all x, y ∈ N . Substituting Eq. S118 results in (cid:104) m x p y (cid:105) t o = (cid:88) m (cid:88) p m x p y (cid:88) m (cid:48) (cid:88) p (cid:48) P ( m, p | m (cid:48) , p (cid:48) ) P t d ( m (cid:48) , p (cid:48) ) . (S122)We can rearrange the sums to be (cid:104) m x p y (cid:105) t o = (cid:88) m (cid:48) (cid:88) p (cid:48) P t d ( m (cid:48) , p (cid:48) ) (cid:88) m (cid:88) p m x p y P ( m, p | m (cid:48) , p (cid:48) ) . (S123)The fact that Eq. S119 is the product of two independent events allows us to rewrite the joint probability P ( m, p | m (cid:48) , p (cid:48) ) as P ( m, p | m (cid:48) , p (cid:48) ) = P ( m | m (cid:48) ) · P ( p | p (cid:48) ) . (S124)With this we can then write the moment (cid:104) m x p y (cid:105) t o as (cid:104) m x p y (cid:105) t o = (cid:88) m (cid:48) (cid:88) p (cid:48) P t d ( m (cid:48) , p (cid:48) ) (cid:88) m m x P ( m | m (cid:48) ) (cid:88) p p y P ( p | p (cid:48) ) . (S125)Notice that both terms summing over m and over p are the conditional expected values, i.e. (cid:88) z z x P ( z | z (cid:48) ) ≡ (cid:104) z x | z (cid:48) (cid:105) , for z ∈ { m, p } . (S126)These conditional expected values are the expected values of a binomial random variable z ∼ Bin( z (cid:48) , / (cid:104) m x p y (cid:105) t o = (cid:88) m (cid:48) (cid:88) p (cid:48) (cid:104) m x | m (cid:48) (cid:105) (cid:104) p y | p (cid:48) (cid:105) P t d ( m (cid:48) , p (cid:48) ) . (S127)To see how this general formula for the moments after the cell division works let’s compute the mean proteinper cell after the cell division (cid:104) p (cid:105) t o . That is setting x = 0, and y = 1. 
This results in (cid:104) p (cid:105) t o = (cid:88) m (cid:48) (cid:88) p (cid:48) (cid:10) m | m (cid:48) (cid:11) (cid:104) p | p (cid:48) (cid:105) P t d ( m (cid:48) , p (cid:48) ) . (S128)The zeroth moment (cid:10) m | m (cid:48) (cid:11) by definition must be one since we have (cid:10) m | m (cid:48) (cid:11) = (cid:88) m m P ( m | m (cid:48) ) = (cid:88) m P ( m | m (cid:48) ) = 1 , (S129)50ince the probability distribution must be normalized. This leaves us then with (cid:104) p (cid:105) t o = (cid:88) m (cid:48) (cid:88) p (cid:48) P t d ( m (cid:48) , p (cid:48) ) (cid:104) p | p (cid:48) (cid:105) . (S130)If we take the sum over m (cid:48) we simply compute the marginal probability distribution (cid:80) m (cid:48) P t d ( m (cid:48) , p (cid:48) ) = P t d ( p (cid:48) ),then we have (cid:104) p (cid:105) t o = (cid:88) p (cid:48) (cid:104) p | p (cid:48) (cid:105) P t d ( p (cid:48) ) . (S131)For the particular case of the first moment of the binomial distribution with parameters p (cid:48) and 1 / (cid:104) p | p (cid:48) (cid:105) = p (cid:48) . (S132)Therefore the moment after division is equal to (cid:104) p (cid:105) t o = (cid:88) p (cid:48) p (cid:48) P t d ( p (cid:48) ) = 12 (cid:88) p (cid:48) p (cid:48) P t d ( p (cid:48) ) . (S133)Notice that this is just 1/2 of the expected value of p (cid:48) averaging over the distribution prior to cell division, i.e. (cid:104) p (cid:105) t o = (cid:104) p (cid:48) (cid:105) t d , (S134)where (cid:104)·(cid:105) t d highlights that is the moment of the distribution prior to the cell division. This result makes perfectsense. What this is saying is that the mean protein copy number right after the cell divides is half of the meanprotein copy number just before the cell division. That is exactly we would expect. 
So in principle to know thefirst moment of either the mRNA distribution (cid:104) m (cid:105) t o or the protein distribution (cid:104) m (cid:105) t o right after cell divisionit suffices to multiply the moments before the cell division (cid:104) m (cid:105) t d or (cid:104) p (cid:105) t d by 1/2. Let’s now explore how thisgeneralizes to any other moment (cid:104) m x p y (cid:105) t o . S5.1.2 Computing the moments of a binomial distribution
The result from last section was dependent on us knowing the functional form of the first moment of thebinomial distribution. For higher moments we need some systematic way to compute such moments. Luckilyfor us we can do so by using the so-called moment generating function (MGF). The MGF of a random variable X is defined as M X ( t ) = (cid:10) e tX (cid:11) , (S135)where t is a dummy variable. Once we know the MGF we can obtain any moment of the distribution by simplycomputing (cid:104) X n (cid:105) = d n dt n M X ( t ) (cid:12)(cid:12)(cid:12)(cid:12) t =0 , (S136)i.e. taking the n -th derivative of the MGF returns the n -th moment of the distribution. For the particular caseof the binomial distribution X ∼ Bin(
N, q ) it can be shown that the MGF is of the form M X ( t ) = (cid:2) (1 − q ) + qe t (cid:3) N . (S137)As an example let’s compute the first moment of this binomially distributed variable. For this, the first derivativeof the MGF results in dM X ( t ) dt = N [(1 − q ) + qe t ] N − qe t . (S138)We just need to follow Eq. S136 and set t = 0 to obtain the first moment dM X ( t ) dt (cid:12)(cid:12)(cid:12)(cid:12) t =0 = N q, (S139)51hich is exactly the expected value of a binomially distributed random variable.So according to Eq. S127 to compute any moment (cid:104) m x p y (cid:105) after cell division we can just take the x -thderivative and the y -th derivative of the binomial MGF to obtain (cid:104) m x | m (cid:48) (cid:105) and (cid:104) p y | p (cid:48) (cid:105) , respectively, and takethe expected value of the result. Let’s follow on detail the specific case for the moment (cid:104) mp (cid:105) . When computingthe moment after cell division (cid:104) mp (cid:105) t o which is of the form (cid:104) mp (cid:105) t o = (cid:88) m (cid:48) (cid:88) p (cid:48) (cid:104) m | m (cid:48) (cid:105) (cid:104) p | p (cid:48) (cid:105) P t d ( m (cid:48) , p (cid:48) ) , (S140)the product (cid:104) m | m (cid:48) (cid:105) (cid:104) p | p (cid:48) (cid:105) is then (cid:104) m | m (cid:48) (cid:105) (cid:104) p | p (cid:48) (cid:105) = m (cid:48) · p (cid:48) , (S141)where we used the result in Eq. S139, substituting m and p for X , respectively, and q for 1/2. Substituting thisresult into the moment gives (cid:104) mp (cid:105) t o = (cid:88) m (cid:48) (cid:88) p (cid:48) m (cid:48) p (cid:48) P t d ( m (cid:48) , p (cid:48) ) = (cid:104) m (cid:48) p (cid:48) (cid:105) t d . (S142)Therefore to compute the moment after cell division (cid:104) mp (cid:105) t o we simply have to divide by 4 the correspondingequivalent moment before the cell division.Not all moments after cell division depend only on the equivalent moment before cell division. 
For example, if we compute the third moment of the protein distribution $\langle p^3 \rangle_{t_o}$, we find

$$\langle p^3 \rangle_{t_o} = \frac{1}{8} \langle p^3 \rangle_{t_d} + \frac{3}{8} \langle p^2 \rangle_{t_d}. \qquad \text{(S143)}$$

So for this particular case the third moment of the protein distribution depends on both the third moment and the second moment before the cell division. In general all moments after cell division $\langle m^x p^y \rangle_{t_o}$ depend linearly on moments before cell division. Furthermore, there is "moment closure" for this specific case in the sense that all moments after cell division depend only on equal or lower moments before cell division. To generalize these results to all the moments computed in this work, let us define a vector collecting all moments before the cell division up to the $\langle m^x p^y \rangle_{t_d}$ moment, i.e.

$$\langle \mathbf{m^x p^y} \rangle_{t_d} = \left( \langle m^0 p^0 \rangle_{t_d}, \langle m^1 p^0 \rangle_{t_d}, \ldots, \langle m^x p^y \rangle_{t_d} \right). \qquad \text{(S144)}$$

Then any moment after cell division $\langle m^{x'} p^{y'} \rangle_{t_o}$ for $x' \leq x$ and $y' \leq y$ can be computed as

$$\langle m^{x'} p^{y'} \rangle_{t_o} = \mathbf{z}_{x'y'} \cdot \langle \mathbf{m^x p^y} \rangle_{t_d},$$

where we define $\mathbf{z}_{x'y'}$ as the vector containing all the coefficients that we obtain from the product of the two binomial distributions. For example, for the case of the third protein moment $\langle p^3 \rangle_{t_o}$, the vector $\mathbf{z}_{x'y'}$ would have zeros for all entries except for the entries corresponding to $\langle p^3 \rangle_{t_d}$ and $\langle p^2 \rangle_{t_d}$, where it would have $1/8$ and $3/8$, respectively. To compute all the moments after cell division $\langle \mathbf{m^x p^y} \rangle_{t_o}$, let us define an equivalent vector

$$\langle \mathbf{m^x p^y} \rangle_{t_o} = \left( \langle m^0 p^0 \rangle_{t_o}, \langle m^1 p^0 \rangle_{t_o}, \ldots, \langle m^x p^y \rangle_{t_o} \right). \qquad \text{(S145)}$$

Then we need to build a square matrix $\mathbf{Z}$ such that each row of the matrix contains the corresponding vector $\mathbf{z}_{x'y'}$ for each of the moments. Having this matrix we simply compute the moments after the cell division as

$$\langle \mathbf{m^x p^y} \rangle_{t_o} = \mathbf{Z} \cdot \langle \mathbf{m^x p^y} \rangle_{t_d}. \qquad \text{(S146)}$$

In other words, matrix $\mathbf{Z}$ contains all the coefficients that we need to multiply by the moments before the cell division in order to obtain the moments after cell division. Matrix $\mathbf{Z}$ was generated automatically using Python's symbolic math library sympy [62].

Fig. S15 (adapted from Fig. 3(B)) shows how the first moment of both mRNA and protein changes over several cell cycles. The mRNA quickly relaxes to the steady state corresponding to the parameters for both a single and two promoter copies. This is expected since the parameters for the mRNA production were determined under this assumption in the first place (See Appendix S1). We note that there is no apparent delay in reaching the steady-state mean mRNA count after the cell divides. This is because the mean mRNA count for the two-promoter-copies state is exactly twice the expected mRNA count for the single-promoter state (See Appendix S1). Therefore once the mean mRNA count is halved after the cell division, it is already at the steady-state value for the single-promoter case. On the other hand, given that the relaxation time to steady state is determined by the degradation rate, the mean protein count does not reach its corresponding steady-state value for either promoter copy number state. Interestingly, once a couple of cell cycles have passed, the first moment follows a repetitive trajectory over cell cycles. We have observed this experimentally by tracking cells as they grow under the microscope.
Comparing cells at the beginning of the cell cycle with the daughter cells that appear after cell division showed that on average all cells have the same amount of protein at the beginning of the cell cycle (See Fig. 18 of [28]), suggesting that this dynamical steady state takes place in vivo.

Figure S15. First and second moment dynamics over the cell cycle. Mean ± standard deviation mRNA (upper panel) and mean ± standard deviation protein copy number (lower panel) as the cell cycle progresses. The dark shaded region delimits the fraction of the cell cycle that cells spend with a single copy of the promoter. The light shaded region delimits the fraction of the cell cycle that cells spend with two copies of the promoter. For a 100 min doubling time at the galK locus, cells spend 60% of the time with one copy of the promoter and the rest with two copies.

In principle, when measuring gene expression levels experimentally from an asynchronous culture, cells are sampled from any time point across their individual cell cycles. This means that the moments determined experimentally correspond to an average over the cell cycle. In the following section we discuss how to account for the fact that cells are not uniformly distributed across the cell cycle in order to compute these averages.
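The conditional moments that populate $\mathbf{Z}$ can be generated with the same symbolic approach described above. The sketch below (a minimal illustration using sympy, which the text states was used to build $\mathbf{Z}$) differentiates the binomial MGF of Eq. S137 and recovers, for $q = 1/2$, the first and third conditional moments:

```python
import sympy as sp

# Binomial MGF, Eq. S137: M_X(t) = [(1 - q) + q e^t]^N
N, t, q = sp.symbols('N t q', positive=True)
mgf = ((1 - q) + q * sp.exp(t))**N

def binomial_moment(n):
    """n-th moment via Eq. S136: take the n-th derivative of the MGF,
    evaluate at t = 0, and set q = 1/2 (symmetric partitioning at division)."""
    deriv = sp.diff(mgf, t, n).subs(t, 0)
    return sp.expand(deriv.subs(q, sp.Rational(1, 2)))

print(binomial_moment(1))  # -> N/2, Eq. S139 with q = 1/2
print(binomial_moment(3))  # -> N**3/8 + 3*N**2/8, the 1/8 and 3/8 coefficients of Eq. S143
```

Replacing $N$ with the pre-division count $p'$ (or $m'$) reproduces exactly the entries of $\mathbf{z}_{x'y'}$ discussed above.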
S5.2 Exponentially distributed ages
As mentioned in Appendix S2, cells in exponential growth have exponentially distributed ages across the cell cycle, with more young cells than old ones. Specifically, the probability of a cell being at any point in the cell cycle is given by [40]

$$P(a) = (\ln 2) \cdot 2^{1-a}, \qquad \text{(S147)}$$

where $a \in [0, 1]$ is the stage of the cell cycle, with $a = 0$ being the start of the cycle and $a = 1$ being the cell division. In Appendix S10 we reproduce this derivation. It is a surprising result, but it can be understood intuitively as follows: if the culture is growing exponentially, the number of cells is increasing at all times. That means, for example, that if in a time interval $\Delta t$ a number $N$ of "old" cells divided, these produced $2N$ "young" cells. So at any point there are always more young cells than old ones.

Our numerical integration of the moment equations gave us the time evolution of the moments as cells progress through the cell cycle. Since experimentally we sample asynchronous cells that follow Eq. S147, each time point along the moment dynamics must be weighted by the probability of having sampled a cell at that specific point of the cell cycle. Without loss of generality, let's focus on the first mRNA moment $\langle m(t) \rangle$ (the same can be applied to all other moments). As mentioned before, in order to calculate the first moment across the entire cell cycle we must weigh each time point by the corresponding probability that a cell is found at that point of its cell cycle. This translates to computing the integral

$$\langle m \rangle_c = \int_{\text{beginning cell cycle}}^{\text{end cell cycle}} \langle m(t) \rangle P(t) \, dt, \qquad \text{(S148)}$$

where $\langle m \rangle_c$ is the mean mRNA copy number averaged over the entire cell cycle trajectory, and $P(t)$ is the probability of a cell being at time $t$ of its cell cycle. If we set the time in units of the cell cycle length, we can use Eq. S147 and compute instead

$$\langle m \rangle_c = \int_0^1 \langle m(a) \rangle P(a) \, da, \qquad \text{(S149)}$$

where $P(a)$ is given by Eq. S147. What Eq. S149 implies is that in order to compute the first moment (or any moment of the distribution) we must weigh each point in the moment dynamics by the corresponding probability of a cell being at that point along its cell cycle.
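The averaging of Eq. S149 can be sketched numerically as follows. The moment trajectory below is a hypothetical step function, not the output of the actual moment integration, and scipy's Simpson-rule routine stands in for the integrator used in the paper:

```python
import numpy as np
from scipy.integrate import simpson

def cycle_average(moment_traj, a):
    """Average a moment trajectory over the cell cycle, Eq. S149,
    weighting each point by the exponential age distribution, Eq. S147."""
    prob = np.log(2) * 2**(1 - a)   # P(a) = (ln 2) 2^(1-a), a in [0, 1]
    return simpson(moment_traj * prob, x=a)

a = np.linspace(0, 1, 501)
# hypothetical mRNA first-moment trajectory over one cell cycle:
# a jump when the promoter duplicates at a = 0.6
m_traj = 15.0 + 5.0 * (a > 0.6)
print(cycle_average(m_traj, a))
```

Because $P(a)$ integrates to one over $[0, 1]$, a constant trajectory averages to itself, which makes for a quick sanity check of any implementation.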
That is why when computing a moment we take the time trajectory over a single cell cycle, such as the ones shown in Fig. S15, and compute the average using Eq. S147 to weigh each time point. We perform this integral numerically for all moments using Simpson's rule.

S5.3 Reproducing the equilibrium picture
Given the large variability of the first moments depicted in Fig. S15, it is worth asking why a simplistic equilibrium picture has proven so successful in predicting the mean expression level under diverse conditions [20, 24, 26, 27]. In this section we compare the simple repression thermodynamic model with this dynamical picture of the cell cycle. But before diving into this comparison, it is worth recapping the assumptions that go into the equilibrium model.
S5.3.1 Steady state under the thermodynamic model
Given the construction of the thermodynamic model of gene regulation, in which the probability of the promoter microstates rather than the probability of mRNA or protein counts is accounted for, we are only allowed to describe the dynamics of the first moment within this theoretical framework [36]. Again let's focus only on the mRNA first moment $\langle m \rangle$; the same principles apply if we consider the protein first moment. We can write a dynamical system of the form

$$\frac{d \langle m \rangle}{dt} = r_m \cdot p_{\text{bound}} - \gamma_m \langle m \rangle, \qquad \text{(S150)}$$

where as before $r_m$ and $\gamma_m$ are the mRNA production and degradation rates respectively, and $p_{\text{bound}}$ is the probability of finding the RNAP bound to the promoter [54]. This dynamical system is predicted to have a single stable fixed point that we can find by computing the steady state. When we solve for the mean mRNA copy number at steady state $\langle m \rangle_{ss}$ we find

$$\langle m \rangle_{ss} = \frac{r_m}{\gamma_m} p_{\text{bound}}. \qquad \text{(S151)}$$

Since we assume that the only effect the repressor has on the regulation of the promoter is excluding the RNAP from binding to the promoter, only $p_{\text{bound}}$ depends on the repressor copy number $R$. Therefore when computing the fold-change in gene expression we are left with

$$\text{fold-change} = \frac{\langle m(R \neq 0) \rangle_{ss}}{\langle m(R = 0) \rangle_{ss}} = \frac{p_{\text{bound}}(R \neq 0)}{p_{\text{bound}}(R = 0)}. \qquad \text{(S152)}$$

As derived in [20], this can be written in the language of equilibrium statistical mechanics as

$$\text{fold-change} = \left( 1 + \frac{R}{N_{NS}} e^{-\beta \Delta\varepsilon_r} \right)^{-1}, \qquad \text{(S153)}$$

where $\beta \equiv (k_B T)^{-1}$, $\Delta\varepsilon_r$ is the repressor-DNA binding energy, and $N_{NS}$ is the number of non-specific binding sites where the repressor can bind. To arrive at Eq. S153 we ignore the physiological changes that occur during the cell cycle, one of the most important being the variability in gene copy number that we are exploring in this section.
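Eq. S153 is simple enough to evaluate directly. In the sketch below the repressor counts and the value $N_{NS} = 4.6 \times 10^6$ (the E. coli genome length commonly used as the number of non-specific sites in this family of models) are illustrative assumptions:

```python
import numpy as np

def fold_change(R, delta_eps_r, n_ns=4.6e6):
    """Thermodynamic fold-change, Eq. S153.
    R: repressors per cell.
    delta_eps_r: repressor-DNA binding energy in units of k_B T,
    so beta * delta_eps_r reduces to delta_eps_r itself."""
    return 1.0 / (1.0 + (R / n_ns) * np.exp(-delta_eps_r))

# illustrative: the O1 operator (delta_eps_r = -15.3 k_BT) over a range of repressor counts
for R in [0, 22, 260, 1740]:
    print(R, fold_change(R, -15.3))
```

With no repressors the fold-change is exactly one, and it decreases monotonically with $R$, as the solid lines of Fig. S16 show.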
It is therefore worth asking whether the dynamical picture exemplified in Fig. S15 can be reconciled with the predictions made by Eq. S153, both at the mRNA and the protein level.

Fig. S16 compares the predictions of both theoretical frameworks for varying repressor copy numbers and repressor-DNA affinities. The solid lines are computed directly from Eq. S153. The hollow triangles and the solid circles represent the fold-change in mRNA and protein, respectively, as computed from the moment dynamics. To compute the fold-change from the kinetic picture we first numerically integrate the moment dynamics for both the two- and the three-state promoter (See Fig. S15 for the unregulated case) and then average the time series accounting for the probability of cells being sampled at each stage of the cell cycle as defined in Eq. S149. The small systematic deviations between both models come partly from the simplifying assumption that the repressor copy number, and therefore the repressor on rate $k_{\text{on}}^{(r)}$, remains constant during the cell cycle. In principle the gene producing the repressor protein is itself also subject to the same duplication during the cell cycle, therefore changing the mean repressor copy number for both stages.

Figure S16. Comparison of the equilibrium and kinetic repressor titration predictions. The equilibrium model (solid lines) and the kinetic model with variation over the cell cycle (solid circles and white triangles) predictions are compared for varying repressor copy numbers and operator binding energies. The equilibrium model is computed directly from Eq. S153, while the kinetic model is computed by numerically integrating the moment equations over several cell cycles, and then averaging over the extent of the cell cycle as defined in Eq. S149.
For completeness, Fig. S17 compares the kinetic and equilibrium models for the extended model of [27], in which the inducer concentration enters into the equation. The solid line is computed directly from Eq. 5 of [27]. The hollow triangles and solid points follow the same procedure as for Fig. S16, where the only effect that the inducer is assumed to have on the kinetics is an effective change in the number of active repressors, therefore affecting $k_{\text{on}}^{(r)}$.

Figure S17. Comparison of the equilibrium and kinetic inducer titration predictions. The equilibrium model (solid lines) and the kinetic model with variation over the cell cycle (solid circles and white triangles) predictions are compared for varying repressor copy numbers and inducer concentrations. The equilibrium model is computed directly from Eq. 5 of reference [27] with the corresponding repressor-DNA binding energy $\Delta\varepsilon_r$, while the kinetic model is computed by numerically integrating the moment dynamics over several cell cycles, and then averaging over the extent of a single cell cycle as defined in Eq. S149.

S5.4 Comparison between single- and multi-promoter kinetic model
After these calculations it is worth asking whether the inclusion of this change in gene dosage is drastically different from the simpler picture of a kinetic model that ignores the gene copy number variability during the cell cycle. To this end we systematically computed the average moments for varying repressor copy numbers and repressor-DNA affinities. We then compared these results with the moments obtained from a single-promoter model and its corresponding parameters. The derivation of the steady-state moments of the distribution for the single-promoter model is detailed in Appendix S3.

Fig. S16 and Fig. S17 both suggest that since the dynamic multi-promoter model can reproduce the results of the equilibrium model at the first-moment level, it must also be able to reproduce the results of the single-promoter model at this level (See Appendix S2). The interesting comparison comes with higher moments. A useful metric to consider for gene expression variability is the noise in gene expression [53]. This quantity, defined as the standard deviation divided by the mean, is a dimensionless measure of the variability relative to the mean of a distribution. As we will show below, this quantity differs from the other commonly used metric, the Fano factor (variance / mean), in the sense that for expression levels determined experimentally in arbitrary fluorescence units, the noise is a dimensionless quantity while the Fano factor is not.

Fig. S18 shows the comparison of the predicted protein noise between the single- (dashed lines) and the multi-promoter model (solid lines) for different operators and repressor copy numbers. A striking difference between both is that the single-promoter model predicts that as the inducer concentration increases, the standard deviation grows much more slowly than the mean, giving a very small noise.
In comparison, the multi-promoter model has a much higher floor for the lowest value of the noise, reflecting the expected result that the variability in gene copy number across the cell cycle should increase the cell-to-cell variability in gene expression [25, 38].
S5.5 Comparison with experimental data
Having shown that the kinetic model presented in this section can not only reproduce the results from the equilibrium picture at the mean level (See Fig. S16 and Fig. S17), but also make predictions for the cell-to-cell variability as quantified by the noise (See Fig. S18), we can assess whether or not this model is able to predict experimental measurements of the noise. For this we take the single-cell intensity measurements (See Methods) to compute the noise at the protein level.

Figure S18. Comparison of the predicted protein noise between a single- and a multi-promoter kinetic model. Comparison of the noise (standard deviation/mean) between a kinetic model that considers a single promoter at all times (dashed line) and the multi-promoter model developed in this section (solid line) for different repressor operators and repressor copy numbers (22, 260, and 1740 repressors per cell). (A) Operator O1, $\Delta\varepsilon_r = -15.3\; k_BT$, (B) O2, $\Delta\varepsilon_r = -13.9\; k_BT$, (C) O3, $\Delta\varepsilon_r = -9.7\; k_BT$.

As mentioned before, this metric differs from the Fano factor since for arbitrary fluorescence units the noise is a dimensionless quantity. To see why, consider that the noise is defined as

$$\text{noise} \equiv \frac{\sqrt{\langle p^2 \rangle - \langle p \rangle^2}}{\langle p \rangle}. \qquad \text{(S154)}$$

We assume that the intensity level of a cell $I$ is linearly proportional to the absolute protein count, i.e.

$$I = \alpha p, \qquad \text{(S155)}$$

where $\alpha$ is the proportionality constant between arbitrary units and the absolute protein number $p$. Substituting this definition into Eq. S154 gives

$$\text{noise} = \frac{\sqrt{\left\langle (I/\alpha)^2 \right\rangle - \left\langle I/\alpha \right\rangle^2}}{\left\langle I/\alpha \right\rangle}. \qquad \text{(S156)}$$

Since $\alpha$ is a constant it can be taken out of the average operator $\langle \cdot \rangle$, obtaining

$$\text{noise} = \frac{\sqrt{\frac{1}{\alpha^2} \left( \langle I^2 \rangle - \langle I \rangle^2 \right)}}{\frac{1}{\alpha} \langle I \rangle} = \frac{\sqrt{\langle I^2 \rangle - \langle I \rangle^2}}{\langle I \rangle}. \qquad \text{(S157)}$$

Notice that in Eq. S155 the linear proportionality between intensity and protein count has no intercept. This ignores the autofluorescence that cells without the reporter would generate. To account for this, in practice we compute

$$\text{noise} = \frac{\sqrt{\left\langle \left( I - \langle I_{\text{auto}} \rangle \right)^2 \right\rangle - \left\langle I - \langle I_{\text{auto}} \rangle \right\rangle^2}}{\left\langle I - \langle I_{\text{auto}} \rangle \right\rangle}, \qquad \text{(S158)}$$

where $I$ is the intensity of the strain of interest and $\langle I_{\text{auto}} \rangle$ is the mean autofluorescence intensity, obtained from a strain that does not carry the fluorescent reporter gene.

Fig. S19 shows the comparison between theoretical predictions and experimental measurements for the unregulated promoter. Although these unregulated promoters should in principle have identical expression profiles, we split the data by operator precisely to make sure that this is the case; we have found in the past that sequences downstream of the RNAP binding site can affect the expression level of constitutively expressed genes. We can see that both models, the single-promoter (gray dotted line) and the multi-promoter (black dashed line), underestimate the experimental noise to different degrees. The single-promoter model does a worse job at predicting the experimental data since it doesn't account for the differences in gene dosage during the cell cycle. Still, accounting for this variability takes us to within a factor of two of the experimentally determined noise for these unregulated strains.

Figure S19. Protein noise of the unregulated promoter. Comparison of the experimental noise for different operators with the theoretical predictions for the single-promoter (gray dotted line) and the multi-promoter model (black dashed line). Each datum represents a single-day measurement of the corresponding $\Delta lacI$ strain with $\geq 300$ cells.

To further test the predictive power of the model, we compare the predictions for the three-state regulated promoter. Fig. S20 shows the theoretical predictions of the single- and multi-promoter models for varying repressor copy numbers and repressor-DNA binding affinities as a function of the inducer concentration. We can see again that our zero-parameter predictions systematically underestimate the noise for all strains and all inducer concentrations. We highlight that the $y$-axis is shown in log scale to emphasize this deviation; but, as we will show in the next section, our predictions still fall within a factor of two of the experimental data.

S5.5.1 Systematic deviation of the noise in gene expression
Fig. S19 and Fig. S20 highlight that our model underestimates the cell-to-cell variability as measured by the noise. To further explore this systematic deviation, Fig. S21 shows the theoretical vs. experimental noise in both linear and log scale. As we can see, the data lie systematically above the identity line. The data are colored by their corresponding experimental fold-change values. The data with the largest deviations from the identity line also correspond to the data with the largest error bars and the smallest fold-change. This is because measurements with very small fold-changes correspond to intensities very close to the autofluorescence background; minimal changes are therefore amplified when computing the noise as the ratio std/mean. In Appendix S9 we will explore empirical ways to improve the agreement between our minimal model and the experimental data, to guide future efforts to improve the minimal model.

Figure S20. Protein noise of the regulated promoter.
Comparison of the experimental noise for different operators ((A) O1, $\Delta\varepsilon_r = -15.3\; k_BT$, (B) O2, $\Delta\varepsilon_r = -13.9\; k_BT$, (C) O3, $\Delta\varepsilon_r = -9.7\; k_BT$) with the theoretical predictions for the single-promoter (dashed lines) and the multi-promoter model (solid lines). Points represent the experimental noise as computed from single-cell fluorescence measurements of different E. coli strains under 12 different inducer concentrations. The dotted line indicates a plot in linear rather than logarithmic scale. Each datum represents a single-day measurement of the corresponding strain and IPTG concentration with $\geq 300$ cells. The points correspond to the median, and the error bars correspond to the 95% confidence interval as determined by 10,000 bootstrap samples. White-filled dots are plotted at a different scale for better visualization.

Figure S21. Systematic comparison of theoretical vs experimental noise in gene expression.
Theoretical vs. experimental noise in both linear (left) and log (right) scale. The dashed line shows the identity line with slope 1 and intercept zero. All data are colored by the corresponding value of the experimental fold-change in gene expression as indicated by the color bar. Each datum represents a single-day measurement of the corresponding strain and IPTG concentration with $\geq 300$ cells. The points correspond to the median, and the error bars correspond to the 95% confidence interval as determined by 10,000 bootstrap samples.
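For concreteness, the background-subtracted noise estimate of Eq. S158 can be computed from single-cell intensities as in the following sketch (the intensity arrays here are synthetic placeholders, not the paper's measurements):

```python
import numpy as np

def noise(intensities, mean_auto):
    """Noise (std/mean) of background-subtracted intensities, Eq. S158."""
    corrected = intensities - mean_auto
    return np.std(corrected) / np.mean(corrected)

rng = np.random.default_rng(0)
i_auto = rng.normal(100.0, 5.0, size=1000)    # autofluorescence strain (synthetic)
i_cells = rng.normal(500.0, 80.0, size=1000)  # reporter strain (synthetic)
print(noise(i_cells, i_auto.mean()))  # ~ 80 / 400 = 0.2 for these synthetic parameters
```

Note that any multiplicative calibration constant $\alpha$ cancels in this ratio, which is exactly why the noise, unlike the Fano factor, can be compared across experiments in arbitrary fluorescence units.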
S6 Maximum entropy approximation of distributions

(Note: The Python code used for the calculations presented in this section can be found in the following link as an annotated Jupyter notebook)

On the one hand, the solution of chemical master equations like the one in Section 1.1 represents a hard mathematical challenge. As presented in Appendix S2, Peccoud and Ycart derived a closed-form solution for the two-state promoter at the mRNA level [34]. In an impressive display of mathematical skill, Shahrezaei and Swain were able to derive an approximate solution for the one- (not considered in this work) and two-state promoter master equation at the protein level [53]. Nevertheless, neither of these solutions gives immediate insight into the distributions, as they involve complicated terms such as confluent hypergeometric functions.

On the other hand, there has been a great deal of work on methods that approximate the solution of these discrete-state Markovian models [63-67]. In particular, for master equations like the one that concerns us here, whose moments can be easily computed, the moment expansion method provides a simple way to approximate the full joint distribution of mRNA and protein [67]. In this section we explain the principles behind this method and show the implementation for our particular case study.
S6.1 The MaxEnt principle
The principle of maximum entropy (MaxEnt), first proposed by E. T. Jaynes in 1957, tackles the question of what is the least biased inference one can make about a particular probability distribution given limited information [42]. In particular, Jaynes used this principle to show the correspondence between statistical mechanics and information theory, demonstrating, for example, that the Boltzmann distribution is the probability distribution that maximizes Shannon's entropy subject to the constraint that the average energy of the system is fixed.

To illustrate the principle, let us focus on a univariate distribution $P_X(x)$. The $n$-th moment of the distribution for a discrete set of possible values of $x$ is given by

$$\langle x^n \rangle \equiv \sum_x x^n P_X(x). \qquad \text{(S159)}$$

Now assume that we have knowledge of the first $m$ moments $\langle \mathbf{x} \rangle_m = \left( \langle x \rangle, \langle x^2 \rangle, \ldots, \langle x^m \rangle \right)$. The question is then how we can use this information to build an estimator $P_H(x \mid \langle \mathbf{x} \rangle_m)$ of the distribution such that

$$\lim_{m \to \infty} P_H(x \mid \langle \mathbf{x} \rangle_m) \to P_X(x), \qquad \text{(S160)}$$

i.e. the more moments we add to our approximation, the more the estimator distribution converges to the real distribution.

The MaxEnt principle tells us that our best guess for this estimator is to build it by maximizing the Shannon entropy, constrained by the information we have about these $m$ moments. The maximization of Shannon's entropy guarantees that we are the least committed possible to information that we do not possess. The Shannon entropy for a univariate discrete distribution is given by [3]

$$H(x) \equiv - \sum_x P_X(x) \log P_X(x). \qquad \text{(S161)}$$

For an optimization problem subject to constraints we make use of the method of Lagrange multipliers. For this we define the constraint equation $\mathcal{L}(x)$ as

$$\mathcal{L}(x) \equiv H(x) - \sum_{i=0}^{m} \lambda_i \left( \sum_x x^i P_X(x) - \langle x^i \rangle \right), \qquad \text{(S162)}$$

where $\lambda_i$ is the Lagrange multiplier associated with the $i$-th moment. The inclusion of the zeroth moment is an additional constraint that guarantees the normalization of the resulting distribution. Since $P_X(x)$ takes a finite set of discrete values, when taking the derivative of the constraint equation with respect to $P_X(x)$ we choose a particular value of $X = x$; therefore from the sum over all possible $x$ values only a single term survives. With this in mind we take the derivative of the constraint equation, obtaining

$$\frac{d\mathcal{L}}{dP_X(x)} = -\log P_X(x) - 1 - \sum_{i=0}^{m} \lambda_i x^i. \qquad \text{(S163)}$$

Equating this derivative to zero and solving for the distribution (which we now call $P_H(x)$, our MaxEnt estimator) gives

$$P_H(x) = \exp\left( -1 - \sum_{i=0}^{m} \lambda_i x^i \right) = \frac{1}{Z} \exp\left( - \sum_{i=1}^{m} \lambda_i x^i \right), \qquad \text{(S164)}$$

where $Z$ is the normalization constant that can be obtained by substituting this solution into the normalization constraint. This results in

$$Z \equiv \exp\left(1 + \lambda_0\right) = \sum_x \exp\left( - \sum_{i=1}^{m} \lambda_i x^i \right). \qquad \text{(S165)}$$

Eq. S164 is the general form of the MaxEnt distribution for a univariate distribution. The computational challenge then consists in finding numerical values for the Lagrange multipliers $\{\lambda_i\}$ such that $P_H(x)$ satisfies our constraints. In other words, the Lagrange multipliers weight the contribution of each term in the exponent such that when computing any of the moments we recover the value of our constraint. Mathematically this means that $P_H(x)$ must satisfy

$$\sum_x x^n P_H(x) = \sum_x \frac{x^n}{Z} \exp\left( - \sum_{i=1}^{m} \lambda_i x^i \right) = \langle x^n \rangle. \qquad \text{(S166)}$$

As an example of how to apply the MaxEnt principle, let us use the classic problem of a six-face die. Suppose we are only told that after a large number of die rolls the mean value of the face is $\langle x \rangle = 4.5$. The MaxEnt distribution consistent with this single constraint takes the form

$$P_H(x) = \frac{1}{Z} \exp\left( -\lambda x \right). \qquad \text{(S167)}$$

Using any numerical minimization package we can easily find the value of the Lagrange multiplier $\lambda$ that satisfies our constraint. Fig. S22 shows two examples of distributions that satisfy the constraint. Panel (A) shows a distribution consistent with the 4.5 average in which only the faces 4 and 5 are allowed, both being equally likely. Nevertheless, in the information we were given about the nature of the die it was never stated that some of the faces were forbidden; in that sense the distribution is committing to information about the process that we do not possess. Panel (B), by contrast, shows the MaxEnt distribution that satisfies this constraint. Since this distribution maximizes Shannon's entropy, it is guaranteed to be the least biased distribution given the available information.

S6.1.1 The mRNA and protein joint distribution
The MaxEnt principle can easily be extended to multivariate distributions. For our particular case we are interested in the mRNA and protein joint distribution $P(m, p)$. The definition of a moment $\langle m^x p^y \rangle$ is a natural extension of Eq. S159 of the form

$$\langle m^x p^y \rangle = \sum_m \sum_p m^x p^y P(m, p). \qquad \text{(S168)}$$

As a consequence the MaxEnt joint distribution $P_H(m, p)$ is of the form

$$P_H(m, p) = \frac{1}{Z} \exp\left( - \sum_{(x, y)} \lambda_{(x,y)} m^x p^y \right), \qquad \text{(S169)}$$

where $\lambda_{(x,y)}$ is the Lagrange multiplier associated with the moment $\langle m^x p^y \rangle$, and again $Z$ is the normalization constant given by

$$Z = \sum_m \sum_p \exp\left( - \sum_{(x, y)} \lambda_{(x,y)} m^x p^y \right). \qquad \text{(S170)}$$

Figure S22. Maximum entropy distribution of a six-face die. (A) A biased distribution consistent with the constraint $\langle x \rangle = 4.5$. (B) The MaxEnt distribution, also consistent with the constraint.
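The die example reduces to a one-dimensional root find for the single Lagrange multiplier. The following sketch assumes scipy's brentq root finder; any root-finding routine would do:

```python
import numpy as np
from scipy.optimize import brentq

faces = np.arange(1, 7)

def maxent_die(lam):
    """MaxEnt distribution P(x) proportional to exp(-lam * x), Eq. S167,
    normalized over the six faces."""
    w = np.exp(-lam * faces)
    return w / w.sum()

# find lam such that the mean face value equals the constraint <x> = 4.5
lam = brentq(lambda l: maxent_die(l) @ faces - 4.5, -5.0, 5.0)
p = maxent_die(lam)
print(lam, p)  # lam < 0: larger faces are upweighted, but every face keeps nonzero probability
```

The resulting distribution assigns nonzero probability to all six faces, in contrast to the biased distribution of panel (A) of Fig. S22.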
Note that the sum in the exponent is taken over all available ( x, y ) pairs that define the moment constraintsfor the distribution.
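Given a set of multipliers, evaluating Eqs. S169-S170 on a truncated sample space is direct. In the sketch below the multiplier values attached to the constraints $\langle m \rangle$, $\langle p \rangle$, and $\langle mp \rangle$ are arbitrary illustrative numbers, not fitted values:

```python
import numpy as np

def maxent_joint(lambdas, m_max=50, p_max=200):
    """Evaluate the joint MaxEnt distribution P_H(m, p), Eqs. S169-S170,
    on a truncated grid. lambdas maps exponent pairs (x, y) to multipliers."""
    m = np.arange(m_max + 1)[:, None]   # column: mRNA counts
    p = np.arange(p_max + 1)[None, :]   # row: protein counts
    expo = sum(lam * m**x * p**y for (x, y), lam in lambdas.items())
    weights = np.exp(-expo)
    return weights / weights.sum()      # division by Z, Eq. S170

# illustrative multipliers for <m>, <p>, and <mp> (hypothetical values)
P = maxent_joint({(1, 0): 0.1, (0, 1): 0.02, (1, 1): 1e-4})
print(P.shape, P.sum())
```

The truncation of the sample space is unavoidable in practice, so the grid bounds should be chosen large enough that the neglected tail probability is negligible.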
S6.2 The Bretthorst rescaling algorithm
The determination of the Lagrange multipliers suffers from a numerical underflow and overflow problem due to the difference in magnitude between the constraints. This becomes a problem when higher moments are taken into account, as the resulting numerical values of the Lagrange multipliers end up being separated by several orders of magnitude. These different scales are problematic for routines such as Newton-Raphson or other minimization algorithms that can be used to find the Lagrange multipliers.

To get around this problem we implemented a variation of an algorithm due to G. Larry Bretthorst, E. T. Jaynes' last student. With a very simple argument we can show that linearly rescaling the constraints, the Lagrange multipliers, and the "rules" for how to compute each of the moments, i.e. each of the individual products that go into the moment calculation, should converge to the same MaxEnt distribution. To see this, let's consider again a univariate distribution $P_X(x)$ that we are trying to reconstruct given the first two moments $\langle x \rangle$ and $\langle x^2 \rangle$. The MaxEnt distribution can be written as

$$P_H(x) = \frac{1}{Z} \exp\left( -\lambda_1 x - \lambda_2 x^2 \right) = \frac{1}{Z} \exp\left( -\lambda_1 x \right) \exp\left( -\lambda_2 x^2 \right). \qquad \text{(S171)}$$

We can always rescale the terms in any way and obtain the same result. Let's say that for some reason we want to rescale the quadratic terms by a factor $a$. We can define a new Lagrange multiplier $\lambda_2' \equiv \lambda_2 / a$ that compensates for the rescaling of the terms, obtaining

$$P_H(x) = \frac{1}{Z} \exp\left( -\lambda_1 x \right) \exp\left( -\lambda_2' a x^2 \right). \qquad \text{(S172)}$$

Computationally it might be more efficient to find the numerical value of $\lambda_2'$ rather than $\lambda_2$, for example because it is of the same order of magnitude as $\lambda_1$. We can then always multiply $\lambda_2'$ by $a$ to obtain back the multiplier for our quadratic constraint.
What this means is that we can always rescale the MaxEnt problem to make it numerically more stable, and then rescale back to obtain the values of the original Lagrange multipliers. The key to the Bretthorst algorithm lies in the selection of the rescaling factor that makes the numerical inference most efficient.

Bretthorst's algorithm goes even further, transforming the constraints and the variables to make the constraints orthogonal, which makes the computation much more effective. We now explain the implementation of the algorithm for our joint distribution of interest $P(m, p)$.

S6.2.1 Algorithm implementation
Let the M × N matrix A contain all the factors used to compute the moments that serve as constraints, where each entry is of the form

A_ij = m_i^{x_j} · p_i^{y_j}. (S173)

In other words, recall that to obtain any moment ⟨m^x p^y⟩ we compute

⟨m^x p^y⟩ = Σ_m Σ_p m^x p^y P(m, p). (S174)

If we have M possible (m, p) pairs in our truncated sample space (because we cannot include the sample space up to infinity) {(m, p)₁, (m, p)₂, …, (m, p)_M}, and we have N exponent pairs (x, y) corresponding to the N moments used to constrain the maximum entropy distribution {(x, y)₁, (x, y)₂, …, (x, y)_N}, then matrix A contains all the possible M by N terms of the form described in Eq. S173. Let also v be a vector of length N containing all the constraints, with each entry of the form

v_j = ⟨m^{x_j} p^{y_j}⟩, (S175)

i.e. the information that we have about the distribution. This means that the constraint equation L to be used for this problem takes the form

L = −Σ_i P_i ln P_i + λ₀(1 − Σ_i P_i) + Σ_{j>0} λ_j (v_j − Σ_i A_ij P_i), (S176)

where λ₀ is the Lagrange multiplier associated with the normalization constraint, and λ_j is the Lagrange multiplier associated with the jth constraint. This constraint equation is equivalent to Eq. S162, but now all the details of how to compute the moments are specified in matrix A.

With this notation in hand we now proceed to rescale the problem. The first step consists of rescaling the terms used to compute the entries of matrix A. As mentioned before, this is the key feature of the Bretthorst algorithm: the particular choice of rescaling factor empirically promotes rescaled Lagrange multipliers of the same order of magnitude. The rescaling takes the form

A'_ij = A_ij / G_j, (S177)

where G_j serves to rescale the moments, providing numerical stability to the inference problem.
Bretthorst proposes an empirical rescaling that satisfies

G_j = √(Σ_i A_ij²), (S178)

or in terms of our particular problem,

G_j = √(Σ_m Σ_p (m^{x_j} p^{y_j})²). (S179)

What this indicates is that each term m_i^{x_j} p_i^{y_j} is normalized by the square root of the sum of all terms of the same form squared.

Since we rescale the factors involved in computing the constraints, the constraints must also be rescaled, simply as

v'_j = ⟨m^{x_j} p^{y_j}⟩' = ⟨m^{x_j} p^{y_j}⟩ / G_j. (S180)

The Lagrange multipliers must compensate this rescaling, since at the end of the day the probability must add up to the same value. Therefore we rescale the λ_j terms as

λ'_j = λ_j G_j, (S181)

such that any λ_j A_ij = λ'_j A'_ij. If this empirical value for the rescaling factor makes the rescaled Lagrange multipliers λ'_j be of the same order of magnitude, this by itself would already improve the algorithm convergence.

Bretthorst proposes another linear transformation to make the optimization routine even more efficient. For this we generate orthogonal constraints that make Newton-Raphson and similar algorithms converge faster. The transformation is as follows:

A''_ik = Σ_j e_jk A'_ij, (S182)

for the entries of matrix A,

v''_k = Σ_j e_jk v'_j, (S183)

for the entries of the constraint vector v, and finally

λ''_k = Σ_j e_jk λ'_j, (S184)

for the Lagrange multipliers. Here e_jk is the jth component of the kth eigenvector of the matrix E with entries

E_kj = Σ_i A'_ik A'_ij. (S185)

This transformation guarantees that the matrix A'' has the property

Σ_i A''_ij A''_ik = β_j δ_jk, (S186)

where β_j is the jth eigenvalue of the matrix E and δ_jk is the Kronecker delta function.
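As a concrete check of these transformations, the sketch below builds a toy moment matrix (the truncated sample space and moment set are illustrative assumptions, not the values used in the paper), applies the rescaling and the eigenvector transformation, and verifies that the transformed columns come out orthogonal with norms given by the eigenvalues β_j:

```python
import numpy as np

# Toy version of the Bretthorst transformations: truncated (m, p) sample
# space and moment set chosen only for illustration.
mm, pp = np.meshgrid(np.arange(21), np.arange(51), indexing="ij")
m, p = mm.ravel().astype(float), pp.ravel().astype(float)
exponents = [(1, 0), (2, 0), (0, 1), (0, 2), (1, 1)]

# Eq. S173: A_ij = m_i**x_j * p_i**y_j
A = np.column_stack([m**x_ * p**y_ for x_, y_ in exponents])

# Eqs. S177-S179: rescale each column by G_j = sqrt(sum_i A_ij**2)
G = np.sqrt(np.sum(A**2, axis=0))
A_prime = A / G

# Eqs. S182, S185: E = A'^T A'; transform with its eigenvectors
E = A_prime.T @ A_prime
beta, e = np.linalg.eigh(E)     # eigenvalues beta_j, eigenvectors e[:, k]
A_pp = A_prime @ e

# Eq. S186: the transformed columns are orthogonal, with norms beta_j
gram = A_pp.T @ A_pp
```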
What this means is that, as desired, the constraints are orthogonal to each other, improving the algorithm convergence speed.

S6.3 Predicting distributions for simple repression constructs
Having explained the theoretical background along with the practical difficulties and the workaround strategy proposed by Bretthorst, we implemented the inference using the moments obtained from averaging over the variability along the cell cycle (see Appendix S5). Fig. S23 and Fig. S24 present these inferences for mRNA and protein levels, respectively, for different values of the repressor-DNA binding energy and repressor copy numbers per cell. From these plots we can readily appreciate that although the mean of each distribution changes with the induction level, there is substantial overlap between distributions. As a consequence, at the single-cell level cells cannot perfectly resolve different inputs.
S6.4 Comparison with experimental data
Now that we have reconstructed an approximation of the probability distribution P(m, p), we can compare it with our experimental measurements. But as detailed in Appendix S5.5, the single-cell microscopy measurements are given in arbitrary units of fluorescence, so we cannot directly compare our predicted protein distributions with these values. To get around this issue we use the fact that the fold-change in gene expression, which we defined as the ratio of the gene expression level in the presence of the repressor to the expression level of a knockout strain, is a non-dimensional quantity. We therefore normalize all of our single-cell measurements by the mean fluorescence value of the ∆lacI strain, with the proper background fluorescence subtracted as explained in Appendix S5.5 for the noise measurements. In the case of the theoretical predictions of the protein distribution we also normalize each protein value by the predicted mean protein level ⟨p⟩, obtaining non-dimensional scales that can be directly compared. Fig. S25 shows the experimental (color curves) and theoretical (dark dashed line) cumulative distribution functions for the three ∆lacI strains. As in Fig. S19, we do not expect differences between the operators, but we explicitly plot them separately to make sure that this is the case. As we would expect, given the limitations of the model in accurately predicting the noise and skewness of the distribution, the model does not accurately predict the data: it predicts a narrower distribution than what we measured with single-cell microscopy.

The same narrower prediction applies to the regulated promoters. Fig. S26 shows the theory-experiment comparison of the cumulative distribution functions for different repressor binding sites (different figures), repressor copy numbers (rows), and inducer concentrations (columns). In general the predictions are systematically narrower than the actual experimental data.

Figure S23. Maximum entropy mRNA distributions for simple repression constructs. mRNA distributions for different biophysical parameters. From left to right the repressor-DNA affinity decreases as defined by the three lacI operators O1 (−15.3 k_BT), O2 (−13.9 k_BT), and O3 (−9.7 k_BT). From top to bottom the mean repressor copy number per cell increases. The curves on each plot represent different IPTG concentrations. Each distribution was fitted using the first three moments of the mRNA distribution.

Figure S24. Maximum entropy protein distributions for simple repression constructs. Protein distributions for different biophysical parameters. From left to right the repressor-DNA affinity decreases as defined by the three lacI operators O1 (−15.3 k_BT), O2 (−13.9 k_BT), and O3 (−9.7 k_BT). From top to bottom the mean repressor copy number per cell increases. The curves on each plot represent different IPTG concentrations. Each distribution was fitted using the first six moments of the protein distribution.

S7 Gillespie simulation of master equation

(Note: The Python code used for the calculations presented in this section can be found in the following link as an annotated Jupyter notebook)

So far we have generated a way to compute an approximate form of the joint distribution of protein and mRNA, P(m, p), as a function of the moments of the distribution ⟨m^x p^y⟩. This is a non-conventional form in which to work with the resulting distribution of the master equation. A more conventional approach to working with master equations whose closed-form solutions are not known or not computable is to use stochastic simulations, commonly known as Gillespie simulations. To benchmark the performance of our approach based on distribution moments and maximum entropy, we implemented the Gillespie algorithm. Our implementation, as detailed in the corresponding Jupyter notebook, makes use of just-in-time compilation as implemented with the Python package numba.

S7.1 mRNA distribution with Gillespie simulations
To confirm that the implementation of the Gillespie simulation was correct, we performed the simulation at the mRNA level, for which the closed-form solution of the steady-state distribution is known, as detailed in Appendix S2. Fig. S27 shows example trajectories of mRNA counts. Each of these trajectories was computed over several cell cycles, where cell division was implemented by generating a binomially distributed random variable that depended on the last mRNA count before the division event.

To check the implementation of our stochastic algorithm we generated several of these stochastic trajectories in order to reconstruct the mRNA steady-state distribution. These reconstructed distributions for a single and a double copy of the promoter can be compared with Eq. S10, the steady-state distribution for the two-state promoter. Fig. S28 shows the excellent agreement between the stochastic simulation and the analytical result, confirming that our implementation of the Gillespie simulation is correct.

Figure S25. Experiment vs. theory comparison for ∆lacI strain. Example fold-change empirical cumulative distribution functions (ECDF) for strains with no repressors and different operators. The color curves represent single-cell microscopy measurements while the dashed black lines represent the theoretical distributions as reconstructed by the maximum entropy principle. The theoretical distributions were fitted using the first six moments of the protein distribution.
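The production implementation lives in the accompanying numba-accelerated notebook; the following is a minimal pure-NumPy sketch of the same scheme, with illustrative (not fitted) rates for the two-state promoter and binomial partitioning of mRNA at division:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative (not fitted) rates in 1/min: promoter switching kp_off/kp_on,
# transcription rm (active state only), and mRNA degradation gm.
kp_off, kp_on, rm, gm = 0.5, 0.1, 5.0, 0.1


def gillespie_mrna(t_end, state, m):
    """Run the two-state promoter mRNA Gillespie simulation for a time t_end."""
    t = 0.0
    while True:
        # Propensities: switch off, switch on, transcribe, degrade one mRNA.
        props = np.array([kp_off * state, kp_on * (1 - state), rm * state, gm * m])
        total = props.sum()
        t += rng.exponential(1.0 / total)
        if t > t_end:
            return state, m
        rxn = rng.choice(4, p=props / total)
        if rxn == 0:
            state = 0
        elif rxn == 1:
            state = 1
        elif rxn == 2:
            m += 1
        else:
            m -= 1


def lineage(n_cycles, t_cycle=60.0):
    """Follow one cell over several cycles with binomial partitioning at division."""
    state, m = 1, 0
    for _ in range(n_cycles):
        state, m = gillespie_mrna(t_cycle, state, m)
        m = rng.binomial(m, 0.5)  # each mRNA kept with probability 1/2
    return m


# mRNA counts right after division, sampled across independent lineages
samples = np.array([lineage(6) for _ in range(200)])
```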
S7.2 Protein distribution with Gillespie simulations
Having confirmed that our implementation of the Gillespie algorithm, including the binomial partitioning of molecules, reproduces analytical results, we extended the implementation to include protein counts. Fig. S29 shows representative trajectories for both mRNA and protein counts over several cell cycles. Especially for the protein, we can see that it takes several cell cycles for counts to converge to the dynamical steady state observed with the deterministic moment equations. Once this steady state is reached, the ensemble of trajectories between cell cycles looks very similar.

From these trajectories we can compute the protein steady-state distribution, taking into account the cell-age distribution as detailed in Appendix S6. Fig. S30 shows the comparison between this distribution and the one generated using the maximum entropy algorithm. Despite the noticeable differences between the distributions, the Gillespie simulation and the maximum entropy results are indistinguishable in terms of the mean, variance, and skewness of the distribution. We remind the reader that the maximum entropy result is an approximation of the distribution that gets better the more moments we add. We therefore claim that the approximation works sufficiently well for our purpose. The enormous advantage of the maximum entropy approach comes from the computation time. For the number of distributions that were needed for our calculations, the Gillespie algorithm proved to be a very inefficient method given the large sample space. Our maximum entropy approach reduces the computation time by several orders of magnitude, allowing us to extensively explore different parameters of the regulatory model.
S8 Computational determination of the channel capacity

(Note: The Python code used for the calculations presented in this section can be found in the following link as an annotated Jupyter notebook)

In this section we detail the computation of the channel capacity of the simple genetic circuit shown in Fig. 5. As detailed in Section 1.6, the channel capacity is defined as the mutual information between input c and output p maximized over all possible input distributions P(c) [3]. In principle there is an infinite number of input distributions, so the task of finding P̂(c), the input distribution at channel capacity, requires an algorithmic approach that guarantees convergence to this distribution. Tkačik, Callan, and Bialek developed a clever analytical approximation to find the P̂(c) distribution [17]. The validity of their so-called small noise approximation requires the standard deviation of the output distribution P(p | c) to be much smaller than the domain of the distribution. For our particular case this condition is not satisfied, given the spread of the inferred protein distributions shown in Fig. 4.

Fortunately there exists a numerical algorithm to approximate P̂(c) for discrete distributions. In 1972 Richard Blahut and Suguru Arimoto independently came up with an algorithm mathematically shown to converge to P̂(c) [44]. To compute both the theoretical and the experimental channel capacities shown in Fig. 5, we implemented Blahut's algorithm. In the following section we detail the definitions needed for the algorithm. We then detail how to compute the experimental channel capacity when the bins of the distribution are not clear, given the intrinsically arbitrary nature of microscopy fluorescence measurements.

S8.1 Blahut's algorithm
Following [44], we implemented the algorithm to compute the channel capacity. We define p_c to be an array containing the probability of each of the input inducer concentrations (twelve concentrations, see Methods). Each entry j of the array is then of the form

p_c^(j) = P(c = c_j), (S187)

with j ∈ {1, 2, …, 12}. The objective of the algorithm is to find the entries p_c^(j) that maximize the mutual information between inputs and outputs. We also define Q to be a |p_c| by |p_{p|c}| matrix, where |·| specifies the length of the array, and p_{p|c} is an array containing the probability distribution of an output given a specific value of the input. In other words, the matrix Q collects all of the individual output distribution arrays p_{p|c} into a single object. Each entry of the matrix Q is then of the form

Q^(i,j) = P(p = p_i | c = c_j). (S188)

For the case of the theoretical predictions of the channel capacity (solid lines in Fig. 5), the entries of matrix Q are given by the inferred maximum entropy distributions shown in Fig. 4. In the next section we will discuss how to define this matrix for the case of the single-cell fluorescence measurements. Having defined these matrices, we proceed to implement the algorithm shown in Figure 1 of [44].
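A compact generic version of the Blahut-Arimoto update (a sketch in the spirit of the algorithm in Figure 1 of [44], not our production code) can be written as below, with a sanity check on a binary symmetric channel whose capacity is known analytically:

```python
import numpy as np


def blahut_arimoto(Q, tol=1e-12, max_iter=100_000):
    """Channel capacity (bits) of a discrete channel Q[i, j] = P(output_i | input_j)."""
    n_out, n_in = Q.shape
    p = np.full(n_in, 1.0 / n_in)  # start from a uniform input distribution
    for _ in range(max_iter):
        q = Q @ p  # output marginal induced by the current input distribution
        with np.errstate(divide="ignore", invalid="ignore"):
            log_ratio = np.where(Q > 0, np.log2(Q / q[:, None]), 0.0)
        # D_j = 2**D(Q(.|j) || q): per-input divergence from the output marginal
        D = 2.0 ** np.sum(Q * log_ratio, axis=0)
        p_new = p * D / (p @ D)
        converged = np.max(np.abs(p_new - p)) < tol
        p = p_new
        if converged:
            break
    # Mutual information (bits) at the optimized input distribution
    q = Q @ p
    with np.errstate(divide="ignore", invalid="ignore"):
        log_ratio = np.where(Q > 0, np.log2(Q / q[:, None]), 0.0)
    return np.sum(p * np.sum(Q * log_ratio, axis=0)), p


# Sanity check: binary symmetric channel with flip probability 0.1, whose
# capacity is 1 - H2(0.1) ≈ 0.531 bits, attained at a uniform input.
Q_bsc = np.array([[0.9, 0.1], [0.1, 0.9]])
C, p_opt = blahut_arimoto(Q_bsc)
```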
A difficulty when computing the channel capacity between inputs and outputs from experimental data is that ideally we would like to compute

C(g; c) ≡ sup_{P(c)} I(g; c), (S189)

where g is the gene expression level and c is the inducer concentration. But in reality we are computing

C(f(g); c) ≡ sup_{P(c)} I(f(g); c), (S190)

where f(g) is a function of gene expression that has to do with our mapping from the YFP copy number to some arbitrary fluorescence value as computed from the images taken with the microscope. The data processing inequality, as derived by Shannon himself, tells us that for a Markov chain of the form c → g → f(g) it must be true that [3]

I(g; c) ≥ I(f(g); c), (S191)

meaning that information can only be lost when mapping from the real relationship between gene expression and inducer concentration to a fluorescence value.

On top of that, given the limited number of samples that we have access to when computing the channel capacity, there is a bias in our estimate due to this undersampling. The definition of accurate unbiased estimators of the mutual information is still an area of active research. For our purposes we will use the method described in [68]. The basic idea of the method is to write the mutual information as a series expansion in terms of inverse powers of the sample size, i.e.

I_biased = I_∞ + a₁/N + a₂/N² + ···, (S192)

where I_biased is the biased estimate of the mutual information as computed from experimental data, I_∞ is the quantity we would like to estimate (the unbiased mutual information we would obtain with access to an infinite number of experimental samples), and the coefficients a_i depend on the underlying distribution of the signal and the response. This is an empirical choice to be tested.
Intuitively this choice satisfies the limit that, as the number of samples from the distribution grows, the empirical estimate of the mutual information I_biased should get closer to the actual value I_∞. In principle, for a good number of data points the higher-order terms become negligible, so we can write the mutual information as

I_biased ≈ I_∞ + a₁/N + O(N⁻²). (S193)

This means that if this particular arbitrary choice of functional form is a good approximation, when computing the mutual information for a varying number of samples (by taking subsamples of the experimental data) we expect to find a linear relationship as a function of the inverse of the number of data points. From this linear relationship, the intercept is a bias-corrected estimate of the mutual information. We can therefore bootstrap the data by taking different sample sizes, use the Blahut-Arimoto algorithm we implemented earlier to estimate the biased channel capacity, and then fit a line and extrapolate to 1/N = 0, which corresponds to our unbiased estimate of the channel capacity.

Let's go through each of the steps to illustrate the method. Fig. S31 shows a typical data set for a strain with an O2 binding site (∆ε_r = −13.9 k_BT) and R = 260 repressors per cell. Each of the distributions in arbitrary units is binned into a specified number of bins to build matrix Q.

Given a specific number of bins used to construct Q, we subsample a fraction of the data and compute the channel capacity for such a matrix using the Blahut-Arimoto algorithm. Fig. S32 shows an example where 50% of the data on each distribution from Fig. S31 was sampled and binned into 100 equal bins. The counts in each of these bins are then normalized and used to build the matrix Q that is fed to the Blahut-Arimoto algorithm. We can see the spread of the channel capacity estimates across these 200 bootstrap samples. To extrapolate to I_∞ we again follow the methodology suggested in [68].
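The extrapolation step can be sketched as follows; the biased estimates here are synthetic, generated with assumed values of I_∞ and a₁ rather than real bootstrap output:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic biased estimates obeying I_biased ≈ I_inf + a1/N, with assumed
# values I_inf = 1.5 bits and a1 = 30 (illustration only, not real data).
N = np.array([300.0, 500.0, 1000.0, 2000.0, 5000.0])
I_biased = 1.5 + 30.0 / N + rng.normal(0.0, 0.005, size=N.size)

# Fit a line in 1/N; the intercept extrapolates to the unbiased estimate I_inf.
slope, intercept = np.polyfit(1.0 / N, I_biased, 1)
```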
We perform the data subsampling and computation of the channel capacity for a varying number of bins. As a control we perform the same procedure with shuffled data, where the structure that connects the fluorescence distribution to the inducer concentration input is lost. The expectation is that this control should give a channel capacity of zero if the data are not "over-binned." Once the number of bins is too high, we would expect some structure to emerge in the data that would cause the Blahut-Arimoto algorithm to return non-zero channel capacity estimates.

Fig. S34 shows the unbiased channel capacity estimates obtained for the data shown in Fig. S31. For the blue curve we can distinguish three phases:

1. A rapid increment from 0 bits to about 1.5 bits as the number of bins increases.
2. A flat region between ≈ 50 and 1000 bins.
3. A second rapid increment for a large number of bins.

The randomized data presents only two phases:

1. A flat region where, as expected, no information is being processed, since the structure of the data was lost when the data were shuffled.
2. A region of fast growth of the channel capacity, as the over-binning generates separated peaks in the distribution, making it look like there is structure in the data.

We take the flat region of the experimental data (≈ 100 bins) to be our best unbiased estimate of the channel capacity from this experimental dataset.
S8.3 Assumptions involved in the computation of the channel capacity
An interesting suggestion by Professor Gašper Tkačik was to dissect the different physical assumptions that went into the construction of the input-output function P(p | c), and their relevance when comparing the theoretical channel capacities with the experimental inferences. In what follows we describe the relevance of four important aspects that all affect the predictions of the information processing capacity.

(i) Cell cycle variability. We think that the inclusion of the gene copy number variability during the cell cycle and the non-Poissonian protein degradation is a key component of our estimation of the input-output functions and, as a consequence, of the channel capacity. This variability in gene copy number is an additional source of noise that systematically decreases the ability of the system to resolve different inputs. Ignoring the effects that gene copy number variability and protein partitioning have on the information processing capacity leads to an overestimate of the channel capacity, as shown in Fig. S35. Only when these noise sources are included in our inferences do we capture the experimental channel capacities with no further fit parameters.

(ii) Non-Gaussian noise distributions.
For the construction of the probability distributions used in the main text (Fig. 4) we utilized the first 6 moments of the protein distribution. The maximum entropy formalism tells us that the more constraints we include in the inference, the closer the maximum entropy distribution will be to the real distribution. But a priori there is no way of knowing how many moments should be included in order to capture the essence of the distribution. In principle two moments could suffice to describe the entire distribution, as happens with the Gaussian distribution. To compare the effect of including more or fewer constraints on the maximum entropy inference, we constructed maximum entropy distributions using an increasing number of moments from 2 to 6. We then computed the Kullback-Leibler divergence D_KL of the form

D_KL(P(p | c) || P_i(p | c)) = Σ_p P(p | c) log [P(p | c) / P_i(p | c)], (S194)

where P_i(p | c) is the maximum entropy distribution constructed with the first i moments, i ∈ {2, 3, 4, 5}. Since the Kullback-Leibler divergence D_KL(P || Q) can be interpreted as the amount of information lost by assuming the incorrect distribution Q when the correct distribution is P, we used this metric as a measure of how much information we would have lost by using fewer constraints compared to the six moments used in the main text.

Fig. S36 shows this comparison for different operators and repressor copy numbers. We can see that using fewer moments as constraints gives essentially the same result. This is because most of the values of the Kullback-Leibler divergence are significantly smaller than 0.1 bits, while the entropy of these distributions is in general > 10 bits, so we would lose less than 1% of the information contained in these distributions by utilizing only two moments as constraints. Therefore the use of non-Gaussian noise is not an important feature of our inferences.

(iii) Multi-state promoter.
This particular point is something that we are still exploring from a theoretical perspective. We have shown that in order to capture the single-molecule mRNA FISH data a single-state promoter would not suffice: such a model predicts a Poisson distribution at steady state, and the data definitely show super-Poissonian noise. Given the bursty nature of gene expression, we opted to use a two-state promoter where the states reflect effective transcriptionally "active" and "inactive" states. We are currently exploring alternative formulations of this model to turn it into a single-state model with a geometrically distributed burst size.

(iv) Optimal vs. log-flat distributions.
The relevance of having used the Blahut-Arimoto algorithm to predict the maximum mutual information between inputs and outputs was to understand the best-case scenario. We show the comparison between theoretical and experimental input-output functions P(p | c) in Fig. S26. Given the good agreement between these distributions, we could compute the mutual information I(c; p) for any arbitrary input distribution P(c) and obtain good agreement with the corresponding experimental mutual information.

The reason we opted to specifically report the mutual information at channel capacity was to put the results in context. By reporting the upper bound on the performance of these genetic circuits, we can start to dissect how different molecular parameters, such as repressor-DNA binding affinity or repressor copy number, affect the ability of this genetic circuit to extract information from the environmental state.

S9 Empirical fits to noise predictions

(Note: The Python code used for the calculations presented in this section can be found in the following link as an annotated Jupyter notebook)

In Fig. 3(C) in the main text we show that our minimal model has a systematic deviation in its gene expression noise predictions compared to the experimental data. This systematic deviation will need to be addressed in an improved version of the minimal model presented in this work. To gain insight into its origins, in this appendix we explore empirical modifications of the model to improve the agreement between theory and experiment.
S9.1 Multiplicative factor for the noise
The first option we will explore is to modify our noise predictions by a constant multiplicative factor. This means that we assume the relationship between our minimal model predictions and the experimental data for the noise in gene expression is of the form

noise_exp = α · noise_theory, (S195)

where α is a dimensionless constant to be fit from the data. The data, especially in Fig. S19, suggest that our predictions are within a factor of ≈ 2 of the experimental data. To further check that intuition, we performed a weighted linear regression between the experimental and theoretical noise measurements. The weight for each datum was taken to be proportional to the bootstrap error in the noise estimate, so that poorly determined noise values weigh less in the regression. The result of this regression with no intercept shows exactly that a factor of two systematically improves the agreement between theoretical and experimental predictions. Fig. S37 shows the improved agreement when the theoretical predictions for the noise are multiplied by this factor of ≈ 2. It is clear that overall a simple multiplicative factor improves the predictive power of the model.
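The weighted regression with no intercept has a closed-form solution; the sketch below uses synthetic stand-in values for the noise measurements and weights, not the paper's data:

```python
import numpy as np

# Synthetic stand-in data: experimental noise exactly twice the theory.
noise_theory = np.array([0.2, 0.5, 0.8, 1.1, 1.6])
noise_exp = 2.0 * noise_theory
weights = np.array([1.0, 0.5, 2.0, 1.0, 0.2])  # e.g. inverse bootstrap errors

# Weighted least squares through the origin: minimizing sum w (y - alpha x)^2
# gives alpha = sum(w x y) / sum(w x^2).
alpha = np.sum(weights * noise_theory * noise_exp) / np.sum(weights * noise_theory**2)
```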
S9.2 Additive factor for the noise
As an alternative way to empirically improve the predictions of our model, we will now test the idea of an additive constant. What this means is that our minimal model underestimates the noise in gene expression as

noise_exp = β + noise_theory, (S196)

where β is an additive constant to be determined from the data. As with the multiplicative constant, we performed a regression comparing experimental and theoretical gene expression noise values to determine this empirical additive constant, using the width of the 95% bootstrap confidence interval as a weight for the linear regression. Fig. S39 shows the resulting theoretical vs. experimental noise with β ≈ 0.2. We can see a great improvement in the agreement between theory and experiment with this additive constant.

For completeness, Fig. S40 shows the noise in gene expression as a function of the inducer concentration including this additive factor of β ≈ 0.2. If anything, the additive factor seems to improve the agreement between theory and data even more than the multiplicative factor.
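The corresponding weighted fit for the additive constant reduces to a weighted mean of the residuals; again with synthetic stand-in values rather than the paper's data:

```python
import numpy as np

# Synthetic stand-in data: experimental noise offset from theory by 0.2.
noise_theory = np.array([0.2, 0.5, 0.8, 1.1, 1.6])
noise_exp = noise_theory + 0.2
weights = np.array([1.0, 0.5, 2.0, 1.0, 0.2])  # e.g. inverse bootstrap errors

# Minimizing sum w (y - x - beta)^2 gives the weighted mean residual.
beta = np.sum(weights * (noise_exp - noise_theory)) / np.sum(weights)
```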
S9.3 Correction factor for channel capacity with multiplicative factor
As seen in Appendix S5, a constant multiplicative factor can reduce the discrepancy between the model predictions and the data with respect to the noise (standard deviation / mean) in protein copy number. Finding the equivalent correction for the channel capacity requires gaining insight from the so-called small noise approximation [17]. The small noise approximation assumes that the input-output function can be modeled as a Gaussian distribution in which the standard deviation is small. Using these assumptions one can derive a closed form for the channel capacity. Although our data and model predictions do not satisfy the requirements of the small noise approximation, we can gain some intuition for how the channel capacity would scale given a systematic deviation of the predicted cell-to-cell variability from the data.

Using the small noise approximation one can derive the form of the input distribution at channel capacity, P*(c). To do this we use the fact that there is a deterministic relationship between the input inducer concentration c and the mean output protein value ⟨p⟩; therefore we can work with P(⟨p⟩) rather than P(c), since the deterministic relation allows us to write

P(c) dc = P(⟨p⟩) d⟨p⟩. (S197)

Optimizing over all possible distributions P(⟨p⟩) using calculus of variations results in a distribution of the form

P*(⟨p⟩) = (1/Z) · 1/σ_p(⟨p⟩), (S198)

where σ_p(⟨p⟩) is the standard deviation of the protein distribution as a function of the mean protein expression, and Z is a normalization constant defined as

Z ≡ ∫_{⟨p(c=0)⟩}^{⟨p(c→∞)⟩} [1/σ_p(⟨p⟩)] d⟨p⟩. (S199)

Under these assumptions the small noise approximation tells us that the channel capacity is of the form [17]

I = log₂ (Z / √(2πe)). (S200)

From the theory-experiment comparison in Appendix S5 we know that the standard deviation predicted by our model is systematically off by a factor of two compared to the experimental data, i.e.

σ_p^exp = 2 σ_p^theory. (S201)

This then implies that the normalization constant Z between theory and experiment must follow a relationship of the form

Z^exp = (1/2) Z^theory. (S202)

With this relationship the small noise approximation predicts that the difference between the experimental and theoretical channel capacities should be of the form

I_exp = log₂ (Z^exp / √(2πe)) = log₂ (Z^theory / √(2πe)) − log₂ 2. (S203)

Therefore, under the small noise approximation we would expect our predictions for the channel capacity to be off by a constant of 1 bit (log₂ 2) of information. Again, the conditions for the small noise approximation do not apply to our data given the intrinsic level of cell-to-cell variability in the system; nevertheless, what this analysis tells us is that we expect an additive constant to explain the discrepancy between our model predictions and the experimental channel capacity. To test this hypothesis we performed a "linear regression" between the model predictions and the experimental channel capacity with a fixed slope of 1. The intercept of this regression, −0.56 bits, indicates the systematic deviation we expect should explain the difference between our model and the data. Fig. S41 shows the comparison between the original predictions shown in Fig. 5(A) and the resulting predictions with this shift. Other than the data with zero channel capacity, this shift is able to correct the systematic deviation for all data. We therefore conclude that our model ends up underestimating the experimentally determined channel capacity by a constant amount of 0.43 bits.
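The one-bit shift follows directly from the small-noise capacity formula I = log₂(Z/√(2πe)); a quick numerical check with a placeholder value of Z:

```python
import numpy as np


def capacity_sna(Z):
    """Small-noise-approximation capacity I = log2(Z / sqrt(2*pi*e)) in bits."""
    return np.log2(Z / np.sqrt(2.0 * np.pi * np.e))


# Halving Z (the experimental case) lowers the capacity by exactly one bit,
# regardless of the placeholder value chosen for Z.
Z_theory = 100.0
shift = capacity_sna(Z_theory) - capacity_sna(Z_theory / 2.0)
```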
S10 Derivation of the cell age distribution
E. O. Powell first derived in 1956 the distribution of cell ages for a cell population growing steadily in exponential phase [40]. This distribution is of the form

P(a) = \ln(2) \cdot 2^{1 - a},    (S204)

where a \in [0, 1] is the fraction of the cell cycle, 0 being the moment right after the mother cell divides, and 1 being the end of the cell cycle just before cell division. In this section we reproduce and expand the details of each step of the derivation.

For an exponentially growing bacterial culture, the cells satisfy the growth law

\frac{dn}{dt} = \mu n,    (S205)

where n is the number of cells and \mu is the growth rate in units of time^{-1}. We begin by defining P(a) to be the probability density function of a cell having age a. At time zero of a culture in exponential growth, i.e. the time when we start considering the growth, not the initial condition of the culture, there are N P(a) da cells with ages in the range [a, a + da]. In other words, for N \gg 1 and da \ll a,

N P(a \leq x \leq a + da) \approx N P(a)\, da.    (S206)

We now define

F(\tau) = \int_\tau^\infty f(\xi)\, d\xi    (S207)

as the fraction of cells whose division time is greater than \tau. This is because in principle not all cells divide exactly after \tau minutes; rather, there is a distribution function f(\tau) for the division time after birth. Empirically it has been observed that a generalized Gamma distribution fits experimental data on cell division times well, but we will return to this specific point later on.

From the definition of F(\tau) we can see that if a cell reaches an age a, the probability of surviving to an age a + t without dividing is given by

P(\text{age} = (a + t) \mid \text{age} = a) = F(a + t \mid a) = \frac{F(a + t)}{F(a)}.    (S208)

This result comes simply from the definition of conditional probability. Since F(a) is the probability of surviving a or more minutes without dividing, by the definition of conditional probability we have that

F(a + t \mid a) = \frac{F(a, a + t)}{F(a)},    (S209)

where F(a, a + t) is the joint probability of surviving a minutes and a + t minutes.
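Before walking through the derivation, the end result quoted in Eq. S204 can be sanity-checked by direct simulation. The sketch below is a toy model under stated assumptions: division times are drawn from a hypothetical Gamma distribution with a small coefficient of variation (Powell's result assumes steady exponential growth; the 10% spread only serves to desynchronize the population), and the population is repeatedly subsampled to a fixed size to keep the computation bounded. The observed fraction of cells in the first half of the cell cycle is compared with \int_0^{1/2} \ln(2)\, 2^{1-a}\, da = 2 - \sqrt{2} \approx 0.59:

```python
import math
import random

random.seed(4)
TAU_D = 30.0   # mean division time in minutes (arbitrary choice)
N_KEEP = 1000  # population cap; random subsampling preserves the age distribution

def division_time():
    # hypothetical division-time distribution: Gamma with mean TAU_D and CV = 0.1
    cv = 0.1
    k = 1.0 / cv**2
    return random.gammavariate(k, TAU_D / k)

# each cell is a pair (current age, assigned division time)
cells = [(random.uniform(0.0, TAU_D), division_time()) for _ in range(N_KEEP)]
DT = 0.5
for _ in range(int(20 * TAU_D / DT)):  # run for roughly 20 generations
    grown = []
    for age, tdiv in cells:
        age += DT
        if age >= tdiv:
            # division produces two newborn daughters, each with a fresh clock
            grown.append((0.0, division_time()))
            grown.append((0.0, division_time()))
        else:
            grown.append((age, tdiv))
    cells = random.sample(grown, N_KEEP) if len(grown) > N_KEEP else grown

# fraction of cells younger than half a cycle vs. Powell's prediction (Eq. S204)
young = sum(age < 0.5 * TAU_D for age, _ in cells) / len(cells)
predicted = 2 - math.sqrt(2)  # integral of ln(2) 2^(1 - a) over a in [0, 1/2]
print(young, predicted)       # the two agree to within sampling noise
```

Note that the simulated fraction exceeds 1/2: because every division produces two newborns, an exponentially growing population is always biased toward young cells, which is exactly what the monotonically decreasing form of Eq. S204 encodes. We now continue with the conditional-probability argument.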
But the probability of surviving a + t minutes or more implies that the cell already survived a minutes; therefore the information is redundant and we have

F(a, a + t) = F(a + t).    (S210)

This explains Eq. S208. From this equation we can find that, out of the N P(a) da cells with age a, only a fraction

[N P(a)\, da]\, F(a + t \mid a) = N P(a) \frac{F(a + t)}{F(a)}\, da    (S211)

will survive without dividing until time a + t. During that time interval t the culture has passed from N cells to N e^{\mu t} cells, given the assumption that the population grows exponentially. The survivors N P(a) F(a + t \mid a)\, da then represent a fraction of the total number of cells

\frac{[N P(a)\, da]\, F(a + t \mid a)}{N e^{\mu t}} = P(a) \frac{F(a + t)}{F(a)} e^{-\mu t}\, da,    (S212)

and their ages lie in the range [a + t, a + t + da]. Since we assume that the culture is in steady state, it follows that the fraction of cells that transitioned from age a to age a + t must be P(a + t)\, da. Therefore we have a difference equation - the discrete analog of a differential equation - of the form

P(a + t)\, da = P(a) \frac{F(a + t)}{F(a)} e^{-\mu t}\, da.    (S213)

What this equation shows is a relationship connecting the probability of having a lifetime of a + t with the probability of having a shorter lifetime a and the growth of the population. If we take t to be very small, specifically t \ll \mu^{-1}, we can Taylor expand the following terms around a:

F(a + t) \approx F(a) + \frac{dF}{da} t,    (S214)

P(a + t) \approx P(a) + \frac{dP}{da} t,    (S215)

and

e^{-\mu t} \approx 1 - \mu t.    (S216)

Substituting these expansions into Eq. S213 gives

P(a) + \frac{dP}{da} t = P(a) \left( \frac{F(a) + \frac{dF}{da} t}{F(a)} \right) (1 - \mu t).    (S217)

This can be rewritten as

\frac{1}{P(a)} \frac{dP}{da} = \frac{1}{F(a)} \frac{dF}{da} - \mu - \frac{\mu t}{F(a)} \frac{dF}{da}.    (S218)

Since we assumed t \ll \mu^{-1}, we approximate the last term as negligible. We can then simplify this result to

\frac{1}{P(a)} \frac{dP}{da} = \frac{1}{F(a)} \frac{dF}{da} - \mu.    (S219)

Integrating both sides of the equation with respect to a gives

\ln P(a) = \ln F(a) - \mu a + C,    (S220)

where C is the integration constant. Exponentiating both sides gives

P(a) = C' F(a) e^{-\mu a},    (S221)

where C' \equiv e^C. To obtain the value of the unknown constant we recall that F(0) = 1, since the probability of having a lifetime equal to or longer than zero must add up to one; therefore we have that P(0) = C'. This gives

P(a) = P(0) e^{-\mu a} F(a).    (S222)

Substituting the definition of F(a) gives

P(a) = P(0) e^{-\mu a} \int_a^\infty f(\xi)\, d\xi.    (S223)

The last step of the derivation involves writing P(0) and the growth rate \mu in terms of the cell-cycle length distribution f(\tau).

The growth rate of the population cell number (not the growth of cell mass) is defined as the number of cell doublings per unit time divided by the number of cells. This is easier to see if we write Eq. S205 as a finite difference

\frac{N(t + \Delta t) - N(t)}{\Delta t} = \mu N(t).    (S224)

If the time \Delta t is the interval it takes to go from N to 2N cells, we have

\frac{2N - N}{\Delta t} = \mu N.    (S225)

Solving for \mu gives

\mu = \frac{\overbrace{(2N - N)}^{\text{doublings}} / \Delta t}{\underbrace{N}_{\text{number of cells}}}.    (S226)

We defined F(a) to be the probability of a cell reaching an age a or greater. For a cell to reach an age a + da we can then write

F(a + da) = \int_{a + da}^\infty f(\xi)\, d\xi = \int_a^\infty f(\xi)\, d\xi - \int_a^{a + da} f(\xi)\, d\xi.    (S227)

We can approximate the second term on the right-hand side as

\int_a^{a + da} f(\xi)\, d\xi \approx f(a)\, da    (S228)

for da \ll a, obtaining

F(a + da) \approx F(a) - f(a)\, da.    (S229)

What this means is that, of the original fraction of cells F(a) with age a or greater, a fraction f(a)\, da / F(a) will not reach age (a + da) because they will divide.
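The functional form obtained in Eq. S221 can be checked numerically against the differential equation S219 it is supposed to solve. A minimal sketch, using an arbitrary smooth stand-in for F(a) (chosen only so that F(0) = 1; it carries no biological meaning) and central finite differences:

```python
import math

# arbitrary smooth survival function with F(0) = 1, used only to test the ODE;
# it is NOT the biological division-time distribution
F = lambda a: math.exp(-a * a)
mu = math.log(2)
P = lambda a: F(a) * math.exp(-mu * a)  # candidate solution, Eq. S221 with C' = 1

h = 1e-6  # step for central finite differences
residuals = []
for a in (0.3, 0.7, 1.2):
    lhs = (P(a + h) - P(a - h)) / (2 * h * P(a))        # (1/P) dP/da
    rhs = (F(a + h) - F(a - h)) / (2 * h * F(a)) - mu    # (1/F) dF/da - mu, Eq. S219
    residuals.append(abs(lhs - rhs))
print(max(residuals))  # residual at floating-point noise level at every test point
```

The residual is at the level of finite-difference noise because, analytically, (1/P) dP/da = (1/F) dF/da - \mu holds exactly for any differentiable F once P = C' F e^{-\mu a}.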
So out of the N P(a) cells that reached exactly age a, the number of doubling events in a time interval da is given by

\text{doublings on interval } da = \underbrace{N P(a)}_{\text{cells of age } a} \times \underbrace{\frac{f(a)\, da}{F(a)}}_{\text{fraction dividing}}.    (S230)

The growth rate is then just the sum (integral) of the contribution of each age to the total number of doublings, i.e.

\mu = \frac{1}{N} \int_0^\infty N P(a) \frac{f(a)}{F(a)}\, da.    (S231)

Substituting Eq. S222 gives

\mu = \int_0^\infty \left[ P(0) e^{-\mu a} F(a) \right] \frac{f(a)}{F(a)}\, da = \int_0^\infty P(0) e^{-\mu a} f(a)\, da.    (S232)

We now have the growth rate \mu written in terms of the cell-cycle length probability distribution f(a) and the probability P(0). Since P(a) is a probability distribution it must be normalized, i.e.

\int_0^\infty P(a)\, da = 1.    (S233)

Substituting Eq. S222 into this normalization constraint gives

\int_0^\infty P(0) e^{-\mu a} F(a)\, da = 1.    (S234)

From here we can integrate the left-hand side by parts. We note that, given the definition of F(a), its derivative with respect to a is -f(a) rather than f(a). This is because if we write the derivative of F(a) we have

\frac{dF(a)}{da} \equiv \lim_{da \to 0} \frac{F(a + da) - F(a)}{da}.    (S235)

Substituting the definition of F(a) gives

\frac{dF(a)}{da} = \lim_{da \to 0} \frac{1}{da} \left[ \int_{a + da}^\infty f(\xi)\, d\xi - \int_a^\infty f(\xi)\, d\xi \right].    (S236)

This difference of integrals can be simplified to

\lim_{da \to 0} \frac{1}{da} \left[ \int_{a + da}^\infty f(\xi)\, d\xi - \int_a^\infty f(\xi)\, d\xi \right] \approx \frac{-f(a)\, da}{da} = -f(a).    (S237)

Taking this into account, we now perform the integration by parts, obtaining

P(0) \left[ \frac{e^{-\mu a}}{-\mu} F(a) \right]_0^\infty - P(0) \int_0^\infty \frac{e^{-\mu a}}{-\mu} (-f(a))\, da = 1.    (S238)

For the first term on the left-hand side, as a \to \infty both e^{-\mu a} and F(a) go to zero. We also have that e^{-\mu \cdot 0} = 1 and F(0) = 1. This results in

\frac{P(0)}{\mu} - P(0) \int_0^\infty \frac{e^{-\mu a}}{\mu} f(a)\, da = 1.    (S239)

The second term on the left-hand side follows from Eq. S232, since

\mu = \int_0^\infty P(0) e^{-\mu a} f(a)\, da \Rightarrow \int_0^\infty P(0) \frac{e^{-\mu a}}{\mu} f(a)\, da = 1.    (S240)

This implies that in Eq. S239 we have

\frac{P(0)}{\mu} - 1 = 1 \Rightarrow P(0) = 2\mu.    (S241)

With this result in hand we can rewrite Eq. S223 as

P(a) = 2 \mu e^{-\mu a} \int_a^\infty f(\xi)\, d\xi.    (S242)

We can also rewrite the result for the growth rate \mu in Eq. S232 as

\mu = 2 \mu \int_0^\infty e^{-\mu a} f(a)\, da \Rightarrow \int_0^\infty e^{-\mu a} f(a)\, da = \frac{1}{2}.    (S243)

As mentioned before, the distribution f(a) has been empirically fit with a generalized Gamma distribution. But if we assume that the distribution has negligible dispersion around the mean doubling time a = \tau_d, we can approximate f(a) as

f(a) = \delta(a - \tau_d),    (S244)

a Dirac delta function. Applying this to Eq. S243 results in

2 \int_0^\infty e^{-\mu a} \delta(a - \tau_d)\, da = 1 \Rightarrow 2 e^{-\mu \tau_d} = 1.    (S245)

Solving for \mu gives

\mu = \frac{\ln 2}{\tau_d}.    (S246)

This delta-function approximation for f(a) has as a consequence that

F(a) = \begin{cases} 1 & a \in [0, \tau_d], \\ 0 & a > \tau_d. \end{cases}    (S247)

Finally, we can rewrite Eq. S242 as

P(a) = 2 \left( \frac{\ln 2}{\tau_d} \right) e^{-\frac{\ln 2}{\tau_d} a} \int_a^\infty \delta(\xi - \tau_d)\, d\xi = \frac{2 \ln 2}{\tau_d} \cdot 2^{-a / \tau_d}.    (S248)

Simplifying this, we obtain

P(a) = \begin{cases} \frac{\ln 2}{\tau_d} \cdot 2^{1 - a / \tau_d} & \text{for } a \in [0, \tau_d], \\ 0 & \text{otherwise}. \end{cases}    (S249)

This is the equation we aimed to derive: the distribution of cell ages over the cell cycle.

Supplemental References

V. Shahrezaei and P. S. Swain, "Analytical distributions for stochastic gene expression", PNAS, 17256-17261 (2008).
L. Bintu, N. E. Buchler, H. G. Garcia, U. Gerland, T. Hwa, J. Kondev, and R. Phillips, "Transcriptional regulation by the numbers: models", Current Opinion in Genetics & Development, 116-124 (2005).
S. Klumpp and T. Hwa, "Growth-rate-dependent partitioning of RNA polymerases in bacteria", PNAS, 20245-20250 (2008).
M. K. Transtrum, B. B. Machta, K. S. Brown, B. C. Daniels, C. R. Myers, and J. P. Sethna, "Perspective: Sloppiness and emergent theories in physics, biology, and beyond", The Journal of Chemical Physics, 010901 (2015).
J. Yu, "Probing Gene Expression in Live Cells, One Protein Molecule at a Time", Science, 1600-1603 (2006).
J. M. G. Vilar and L. Saiz, "Control of gene expression by modulated self-assembly", Nucleic Acids Research, 6854-6863 (2011).
J. L. Radzikowski, S. Vedelaar, D. Siegel, Á. D. Ortega, A. Schmidt, and M. Heinemann, "Bacterial persistence is an active σS stress response to metabolic flux limitation", Molecular Systems Biology, 882 (2016).
P. Hammar, M. Walldén, D. Fange, F. Persson, Ö. Baltekin, G. Ullman, P. Leroy, and J. Elf, "Direct measurement of transcription factor dissociation excludes a simple operator occupancy model for gene regulation", Nature Genetics, 405-408 (2014).
U. Moran, R. Phillips, and R. Milo, "SnapShot: Key Numbers in Biology", Cell, 1262-1262.e1 (2010).
A. Meurer et al., "SymPy: symbolic computing in Python", PeerJ Computer Science, e103 (2017).
A. Ale, P. Kirk, and M. P. H. Stumpf, "A general moment expansion method for stochastic kinetic models", The Journal of Chemical Physics, 174101 (2013).
A. Andreychenko, L. Bortolussi, R. Grima, P. Thomas, and V. Wolf, in Modeling Cellular Systems, edited by F. Graw, F. Matthäus, and J. Pahle, Vol. 11, Contributions in Mathematical and Computational Sciences (Springer International Publishing, Cham, 2017), pp. 39-67.
F. Fröhlich, P. Thomas, A. Kazeroonian, F. J. Theis, R. Grima, and J. Hasenauer, "Inference for Stochastic Chemical Kinetics Using Moment Equations and System Size Expansion", PLoS Computational Biology, edited by D. A. Beard, e1005030 (2016).
D. Schnoerr, G. Sanguinetti, and R. Grima, "Approximation and inference methods for stochastic biochemical kinetics—a tutorial review", Journal of Physics A: Mathematical and Theoretical, 093001 (2017).
P. Smadbeck and Y. N. Kaznessis, "A closure scheme for chemical master equations", PNAS, 14261-14265 (2013).
R. Cheong, A. Rhee, C. J. Wang, I. Nemenman, and A. Levchenko, "Information Transduction Capacity of Noisy Biochemical Signaling Networks", Science, 354-358 (2011).

Figure S26. Experiment vs. theory comparison for regulated promoters.
Example fold-change empirical cumulative distribution functions (ECDF) for regulated strains with the three operators (different colors) as a function of repressor copy number (rows) and inducer concentration (columns). The colored curves represent single-cell microscopy measurements, while the dashed black lines represent the theoretical distributions as reconstructed by the maximum entropy principle. The theoretical distributions were fit using the first six moments of the protein distribution.

Figure S27. Stochastic trajectories of mRNA counts.
100 stochastic trajectories generated with the Gillespie algorithm for mRNA counts over time for a two-state unregulated promoter. Cells spend a fraction of the cell cycle with a single copy of the promoter (light brown) and the rest of the cell cycle with two copies (light yellow). When trajectories reach a new cell cycle, the mRNA counts undergo binomial partitioning to simulate cell division.

Figure S28. Comparison of analytical and simulated mRNA distribution.
Solid lines show the steady-state mRNA distributions for one copy (light blue) and two copies (dark blue) of the promoter, as defined by Eq. S10. Shaded regions represent the corresponding distributions obtained from 2500 stochastic mRNA trajectories, taking the last cell cycle to approximate the distribution.

Figure S29. Stochastic trajectories of mRNA and protein counts.

Figure S30. Comparison of protein distributions.
Comparison of the protein distribution generated with Gillespie stochastic simulations (blue curve) and the maximum entropy approach presented in Appendix S6 (orange curve). The upper panel shows the probability mass function. The lower panel compares the cumulative distribution functions.

Figure S31. Single cell fluorescence distributions for different inducer concentrations.
Fluorescence distribution histogram (A) and cumulative distribution function (B) for a strain with 260 repressors per cell and a binding site with binding energy ∆ε_r = -13.9 k_BT. The different curves show the single-cell fluorescence distributions under the 12 different IPTG concentrations used throughout this work. The triangles in (A) show the mean of each distribution.

Figure S32. Channel capacity bootstrap for experimental data.
Cumulative distribution function of the channel capacity estimates obtained by subsampling 50% of each distribution shown in Fig. S31 200 times, binning it into 100 bins, and feeding the resulting Q matrix to the Blahut-Arimoto algorithm.

Figure S33. Inverse sample size vs. channel capacity.
As indicated in Eq. S193, if the channel capacity obtained for different subsample sizes of the data is plotted against the inverse sample size, there must exist a linear relationship between these variables. Here we perform 15 bootstrap samples of the data from Fig. S31, bin these samples using different numbers of bins, and perform a linear regression (solid lines) between the bootstrap channel capacity estimates and the inverse sample size.

Figure S34. Channel capacity as a function of the number of bins.
Unbiased channel capacity estimates obtained from linear regressions as in Fig. S33. The blue curve shows the estimates obtained from the data in Fig. S31. The orange curve is generated from estimates where the same data are shuffled, losing the relationship between fluorescence distributions and inducer concentration.

Figure S35. Comparison of channel capacity predictions for single- and multi-promoter models.
Channel capacity for the multi-promoter model (solid lines) vs. the single-promoter steady-state model (dot-dashed lines) as a function of repressor copy number for different repressor-DNA binding energies (∆ε_r = -15.3, -13.9, and -9.7 k_BT). The single-promoter model assumes Poissonian protein degradation (γ_p > 0) and steady state, while the multi-promoter model accounts for gene copy number variability during the cell cycle and treats protein degradation as an effect of dilution as cells grow and divide.

Figure S36. Measuring the loss of information by using different numbers of constraints.
The Kullback-Leibler divergence was computed between the maximum entropy distribution constructed using the first six moments of the protein distribution and maximum entropy distributions constructed with a variable number of moments.

Figure S37. Multiplicative factor to improve theoretical vs. experimental comparison of noise in gene expression.
Theoretical vs. experimental noise, both in linear (left) and log (right) scale. The dashed line shows the identity line of slope 1 and intercept zero. All data are colored by the corresponding value of the experimental fold-change in gene expression as indicated by the color bar. The x-axis was multiplied by a constant factor, determined by a linear regression, to account for the systematic deviation discussed in Appendix S5. Each datum represents a single-date measurement of the corresponding strain and IPTG concentration with ≥ 300 cells. The points correspond to the median, and the error bars correspond to the 95% confidence interval as determined by 10,000 bootstrap samples.

Figure S38. Protein noise of the regulated promoter with multiplicative factor.
Comparison of the experimental noise for different operators ((A) O1, ∆ε_r = -15.3 k_BT; (B) O2, ∆ε_r = -13.9 k_BT; (C) O3, ∆ε_r = -9.7 k_BT) with the theoretical predictions for the multi-promoter model. A linear regression revealed that multiplying the theoretical noise prediction by a constant factor improves the agreement with the data from E. coli strains under 12 different inducer concentrations. Dotted line indicates plot in linear rather than logarithmic scale. Each datum represents a single-date measurement of the corresponding strain and IPTG concentration with ≥ 300 cells. The points correspond to the median, and the error bars correspond to the 95% confidence interval as determined by 10,000 bootstrap samples.

Figure S39. Additive factor to improve theoretical vs. experimental comparison of noise in gene expression.

Theoretical vs. experimental noise, both in linear (left) and log (right) scale. The dashed line shows the identity line of slope 1 and intercept zero. All data are colored by the corresponding value of the experimental fold-change in gene expression as indicated by the color bar. A constant value, determined by a linear regression from the data in Fig. S18, was added to the x-axis. Each datum represents a single-date measurement of the corresponding strain and IPTG concentration with ≥ 300 cells. The points correspond to the median, and the error bars correspond to the 95% confidence interval as determined by 10,000 bootstrap samples.

Figure S40. Protein noise of the regulated promoter with additive factor.
Comparison of the experimental noise for different operators ((A) O1, ∆ε_r = -15.3 k_BT; (B) O2, ∆ε_r = -13.9 k_BT; (C) O3, ∆ε_r = -9.7 k_BT) with the theoretical predictions for the multi-promoter model. A linear regression revealed that an additive factor improves the agreement with the data from E. coli strains under 12 different inducer concentrations. Dotted line indicates plot in linear rather than logarithmic scale. Each datum represents a single-date measurement of the corresponding strain and IPTG concentration with ≥ 300 cells. The points correspond to the median, and the error bars correspond to the 95% confidence interval as determined by 10,000 bootstrap samples. White-filled dots are plotted at a different scale for better visualization.

Figure S41. Additive correction factor for channel capacity.