Influence of Multiplicative Stochastic Variation on Translational Elongation Rates
IInfluence of Multiplicative Stochastic Variation on Translational Elongation Rates
Sandip Datta and Brian Seed ∗ Center for Computational and Integrative Biology, Massachusetts General Hospital, Boston, USA (Dated: October 12, 2018)Recent experiments have shown that stochastic e ff ects exerted at the level of translation contribute a substan-tial portion of the variation in abundance of proteins expressed at moderate to high levels. This study analyzestranslational noise arising from fluctuations in residue-specific elongation rates. The resulting variation hasmultiplicative components that lead individual protein abundances in a population to exhibit approximately log-normal behavior. The high variability inherent in the process leads to parameter variation that has the features ofa type of noise in biological systems that has been characterized as “extrinsic.” Elongation rate variation o ff ersan accounting for a major component of extrinsic noise, and the analysis provided here highlights a probabilitydistribution that is a natural extension of the Poisson and has broad applicability to many types of multiplicativenoise processes. I. INTRODUCTION
Regulation of the abundance of proteins expressed in livingcells is mediated by multiple types of control, exerted over therates of transcription, post-transcriptional mRNA processing,mRNA decay, translation, and protein degradation. The reg-ulatory processes can be construed as a sequence of chemicalreactions in a domain in which the number of participatingmolecules is small and hence stochastic influences are sig-nificant. Those influences give rise to fluctuations in proteinconcentration in an otherwise homogeneous cell population atsteady state .Stochastic fluctuations in protein distribution can result inheterogeneous phenotypes in clonal populations that can bebeneficial for the survival of a population of organisms in achanging environment . For example, under conditions of ni-trogen limitation, cyanobacteria dedicate a subpopulation ofcells to nitrogen fixation while the rest of the population re-mains phototrophic . Similarly, following exhaustion of nu-trient resources, the undi ff erentiated free-living amoebae ofcellular slime molds aggregate and undergo spontaneous dif-ferentiation into spore- and stalk-forming cells. Such task-sharing decisions assisted by stochastic di ff erentiation in aclonal population may have formed the basis for multi-cellulardevelopment . In mammalian cells, the manifestation ofstochastic gene expression resulting in phenotypic diversityhas been observed in the processes of cellular di ff erentiation and apoptosis . Genes belonging to the same functional grouphave been found to possess similar noise characteristics ,and stress response genes that mitigate the e ff ects of environ-mental fluctuations have been found to exhibit noisier expres-sion than genes thought to require invariant expression . II. INTRINSIC AND EXTRINSIC NOISE ANDSTOCHASTIC VARIATION IN PROTEIN ABUNDANCE
Elowitz et. al. have di ff erentiated between intrinsic and ex-trinsic noise . Sources of intrinsic noise include the randombirth and death of molecules, or stochastic gene activation .Extrinsic sources of noise include factors contributing to fluc-tuations in reaction rates , e.g. the number of RNA poly-merase molecules and ribosomes , the variation in kinetic parameters such as rates of transcription and translation , thevariations in individual cell shapes and volume , and varia-tion in common elements upstream of a transcription factor .For intrinsic sources the resulting noise (normalized bythe squared mean abundance) is inversely proportional to themean protein abundance , and deviations from this relationhelp to identify the relative proportion of extrinsic noise. De-tailed experimental measurements carried out in S. cerevisiae have shown that the contribution of extrinsic noise to proteinabundance increases with level of expression . These find-ings are consistent with conclusions from earlier studies that the source of noise for moderately to highly abundant pro-teins in both
E. coli and
S. cerevisiae is primarily extrinsic .Xie and coworkers have carried out single molecule mea-surements in E. coli cells under conditions in which ex-pression is highly repressed, so that the random formationand degradation of RNA molecules is the dominant noisesource . They found that, below ten proteins per cell thenoise is inversely proportional to protein abundance . Theyalso showed that the distribution of low copy number pro-teins can be fit to a gamma distribution, the two parametersof which have direct physical interpretation as the proteinburst rate and burst size . Above ten proteins per cell thenoise reaches a plateau indicating the dominance of extrinsicnoise . Reported numbers of proteins per bacterial cell canrange from approximately 50,000 to nearly zero .Two general models for the influence of transcriptionalnoise have been proposed, the Poisson and the telegraph pro-cesses. Under a Poisson process transcription occurs withconstant probability in time resulting in single mRNAs be-ing produced and destroyed . In a telegraph process the genesswitch between transcriptionally active or inactive states andthe active state results in a burst of mRNA production . Inboth processes the mRNA noise variance is inversely propor-tional to the mean mRNA abundance, but the proportional-ity constant for the Poisson process is one, and greater thanone for the telegraph process . Higher eukaryotes exhibit amuch broader mRNA distribution than prokaryotes and showtranscriptional bursts . A growing body of work suggests thattranscription may occur in any single organism within a rangeof kinetic modes, a subset of genes being transcribed by Pois-son processes, and a subset being transcribed with di ff eringbursting dynamics . a r X i v : . [ q - b i o . S C ] S e p The correlation between mRNA and protein abundances(or lack thereof) can be used as a guide to the extent bywhich fluctuations in mRNA copy number produce fluctua-tions in protein concentration . Previous studies have foundthat the correlation between mRNA and protein levels is pooracross all organisms . These studies have broadly indi-cated that post-transcriptional e ff ects determine steady stateprotein abundance . Recently, Schwanaeusser et al. mea-sured mRNA and protein simultaneously for 5000 genes inmouse fibroblasts and found that about 55% of the correlationbetween mRNA and protein level can be explained by con-sidering translation rate constants alone . Therefore as an-ticipated earlier , among post-transcriptional steps, trans-lation represents the most consequential stochastic factor fordetermining protein abundance for the cell as a whole. Cellsexpend more energy in translation compared to transcription(in an approximately 9:1 ratio), which may explain the domi-nance of translational control. III. TRANSLATION PROCEEDS AT A VARIABLE RATE
Translation can be divided into four stages: initiation,elongation, termination and recycling. The mechanisms forinitiation and termination di ff er between prokaryotes andeukaryotes, but the elongation mechanism is conserved .Elongation is frequently the rate-limiting process for pro-tein synthesis . In the elongation phase, a series of reac-tion steps leads to the accommodation and addition of aminoacid residues to the polypeptide chain (or rejection of the aa-tRNA), and each of these reactions can be characterized interms of kinetic rate constants . As a simplification, a nete ff ective rate constant for a single residue addition in elonga-tion can be composed from the rate constants for individualsteps that result in chain extension. The e ff ective kinetic rateconstant is expected to fluctuate from cell to cell across a cellpopulation, depending on a variety of noise sources, the vastmajority of which will arise from proteins that compose thetranslational machinery and are expressed at a high level andhence are expected to contribute extrinsic noise.In bacteria the elongation rate varies between 4 and 22amino acids per second . The protein sythesis rate can bea ff ected by many factors, of which the most significant is con-sidered to be the relative concentration of various tRNAs .At each elongation step the ribosome must intercept theaminoacyl-tRNA (aa-tRNA) complimentary to the codon atthe ribosome A site. The relative local concentration ofvarious aa-tRNAs near the site determines the waiting time.Codons corresponding to under-represented tRNAs reduce theelongation rate and are themselves under-represented ,which is thought to provide a mechanism allowing organ-isms to manipulate the expression level of proteins . Elon-gation is also slowed by mRNA secondary structures calledpseudoknots or by the interaction of nascent peptide se-quences with the ribosome exit channel . Recently, Igno-lia et al. have extensively sequenced ribosome protectedmRNA fragments thereby obtaining a more detailed pictureof the ribosome distribution on mRNA . They observed є є є є є є n-1;1 є є n-1;1 mRNA 1 1 2 3 4 n-1 n є є є є є є n-1;N є є n-1;N mRNA N 1 2 3 4 n-1 n є є є є є є n-1;2 є є n-1;2 mRNA 2 1 2 3 4 n-1 nP( є ) P( є ) P( є ) P( є ) P( є n-1 ) P( є n ) FIG. 1. Schematic diagram of the stochastic random walk modelfor the translation of mRNAs. Following initiation the translationproceeds via elongation at di ff erent local rates on di ff erent mRNAs.The figure shows N copies of mRNA in a population of N c cells ( N > N c ) undergoing elongation. Each mRNA is represented by a lineardiscrete lattice chain with individual nodes representing codons atwhich ribosomes add residues to polypeptides. At any given nodei in a microscopic interval of time, the probability of a codon beingtranslated or not is given by (cid:15) i and 1 − (cid:15) i respectively. The distributionof (cid:15) i at the i th node is given by P ( (cid:15) i ). (cid:15) i ; j represents the scaled rateconstant (drawn from probability distribution P ( (cid:15) i )) for jth mRNA atthe i th site. Once the ribosome reaches site n , the termination site,a completed protein is released and the ribosome moves back to theinitiation site (the recycling step). substantial variation in the density of ribosome footprintsalong mRNAs in both yeast and E. coli . In mammaliancells, some locations on mRNA were found to have 25-foldgreater density than the median density across the gene .Thousands of such sites were observed in mouse embryonicstem cell transcripts . Similar translational pauses have beenreported . Current experimental procedures, including ri-bosome profiling, cannot provide the duration of such stochas-tic translational pauses , which are typically transient . Ad-ditional factors a ff ecting elongation rate are collisions be-tween individual ribosomes in polysomes , controlled ribo-some stalling, and interactions between the translating ribo-some and RNA polymerase in prokaryotes .Protein synthesis kinetics also depend on macroscopic fac-tors, such as the overall metabolic status of the cell, composi-tion of the template pool, the fraction of synthesis devoted tosecreted versus non-secreted proteins, and the density of ribo-somes on the template. The quantities of EF-G and elF5A significantly influence the rate at which elongation proceeds.Recent modeling of the dynamics of protein synthesis has fo-cused on analysis of conditions for which inter-ribosome in-teraction is significant and has emphasized situations in whichthe movement of the ribosome is controlled by the availabilityof adjacent free mRNA exposed by the departure of the pre-ceding ribosome . These studies emphasize the importanceof the rate of motion of the leading ribosome in a transcript indetermining the number of copies per transcript. The forma-tion of the mature protein by cotranslational protein folding isregulated by the modulation in the rate at which amino acidsare added to the chains, which results from the stochastic vari-ations in elongation rates .Cultures pulse-labeled with radioactive amino acids havebeen reported to exhibit a pattern of discrete intermediatescorresponding to incomplete polypeptide chains, in which theintermediates can be visualized as bands on a polyacrylamidegel . In the case of the abundantly expressed protein colic-inA, a reasonably good fit could be made to the predicted rateof elongation based on the abundance of charged amino-acyltRNAs for the known codons . IV. A STOCHASTIC MODEL FOR THE TRANSLATIONPROCESS
A fraction of the elongation rate noise is codon-specific,resulting in di ff erent codons being translated at di ff erentrates . Gromadski and Rodnina measured the rate con-stants for di ff erent kinetic substeps for the CUC codon .Based on their data, Fluitt et al. made an estimate of thepossible translation rates for all codons . For any particu-lar translating mRNA in a collection of cells, the rate con-stant at any arbitrary location along the mRNA is a stochasticvariable and therefore can be described most appropriately bya distribution. A general analysis of elongation rates shouldtake into account the propensity for eukaryotic ribosomesto undergo reinitiation following completion of translation,a phenomenon that is physically visualized under conditionsof high protein synthesis as a circular template structure .With this consideration, the stochastic movement of a ribo-some along the mRNA during elongation under the collectiveinfluence of various noise sources can be modeled as a unidi-rectional random walk on a chain with circular boundary con-ditions (see Figure 1), with each incorporation of a residuetaken to follow first order kinetics, with rate constant (cid:15) i forthe i -th residue. A reinitiation probability, λ , conveys the like-lihood of reinitiation once a ribosome has reached the end ofthe open reading frame of d codons.The evolution of the ribosome motion for given initialconditions is determined in the usual manner by the expo-nentiation of a transition matrix. The lapse of an intervalof time, t , results in a change in the probability state vector V i ( t ) = Q i j ( t ) V j (0) for the leading ribosome position deter-mined by the initial conditions and Q ( t ) = e − α t ∞ (cid:88) k = ( α t ) k k ! U k = e α t ( U − (1) where U is a stationary transition matrix of the length of thepolypeptide, d , having generator T = U − d where d is theunit matrix of length d . For the circular boundary conditionscharacteristic of eukaryotic elongation, the i , j entry of the ex-ponentiation of this matrix for i < j yields Q ( t ) i , j = π i (cid:73) e ts λ (cid:81) j − k = i + ( s + (cid:15) k ) (cid:81) dm = j (cid:15) m (cid:81) i − m = (cid:15) m (cid:81) dk = ( s + (cid:15) k ) − λ (cid:81) dm + (cid:15) m ds (2)and for i ≥ j yields Q ( t ) i , j = π i (cid:73) e ts (cid:81) j − k = ( s + (cid:15) k ) (cid:81) dk = i + ( s + (cid:15) k ) (cid:81) i − m = j (cid:15) m (cid:81) dk = ( s + (cid:15) k ) − λ (cid:81) dm + (cid:15) m ds (3)evaluated so that the contour encircles all the poles of the in-tegrand, or, equivalently, encircles the pole at infinity in theopposite sense (see Appendix A). For large d the product (cid:81) dm = (cid:15) m is close to zero, and hence the roots of the denomi-nator polynomial are expected to lie in the vicinity of s = − (cid:15) k .This picture is simplified in the case of short lived mRNAsor in the prokaryotic context, in which the contribution of ri-bosome recycling can be ignored. Setting λ = Q ( t ) i , j = i < j ( − i + j (cid:81) i − k = j (cid:15) k i (cid:80) m = j e − t (cid:15) m ( (cid:81) i − mp = (cid:15) m − (cid:15) m + p )( (cid:81) m − q = j (cid:15) m − (cid:15) q ) i ≥ j (4)The essential element of formulas Eq. (2) to Eq. (4) fromthe standpoint of stochastic structure is the presence of highorder products of the random variables (cid:15) m . When the (cid:15) m areequal Eq. (4) simplifies to the familiar Poisson: Q ( t ) i , j = i < je − t (cid:15) ( t (cid:15) ) i − j ( i − j )! i ≥ j (5)Thus the discrete distribution Q ( t ) i , j represents a general-ization of the Poisson that incorporates multiplicative stochas-tic variation and is appropriate for the characterization of pro-cesses that involve discrete steps that are subject to inter-stepvariability. Both transcription and translation are such pro-cesses, although the focus of this work is translation. Transla-tional variation is likely to be greater than transcriptional vari-ation because of the greater variety of participating substratesand the larger number of discrete steps that must occur to ef-fect the addition of a single residue to the elongating chain. V. STEADY STATE DISTRIBUTION OF PROTEINS
To explore the behavior of the distribution above numeri-cal simulations were performed for the translation of mRNAs - 25 - 20 - 15 - 10 - 5 00.000.020.040.060.080.10 - 15 - 10 - 5 00.000.050.100.15- 12 - 10 - 8 - 6 - 4 - 2 00.000.050.100.150.200.25 - 12 - 10 - 8 - 6 - 4 - 2 00.000.050.100.150.200.25
P(x)P(x) Log(x) Log(x) (a) (d)(c) (b)
FIG. 2. Simulation results for the distribution of proteins across cellpopulation as the system evolves from an initial transient state andfinally reaches the steady state. The plots give the distributions attimes (a) t = . T , (b) t = T , (c) t = T and (d) t = T , where T is an arbitrary unit of time. The distribution narrows between time0 . T and time T while the dynamics are in the transient phase. Thenarrowing continues as the dynamics near steady state at time 2 T . Attime 3 T a steady state distribution given by a log-normal is reached,confirmed by the finding that for any further increase in time the dis-tribution remains invariant and log-normal. In the simulation ribo-some recycling is incorporated via circular boundary condition. Thedistribution of completed proteins is calculated by taking the di ff er-ence of fluxes between the termination and initiation sites. incorporating ribosome recycling step using circular bound-ary condition on a pure initiation state, V [0] = { , , . . . , } at t =
0. The distribution of proteins was obtained as thedi ff erence in fluxes between the termination and re-initiationsites for successively increasing time (Figure 2 (a)-(c)), untilsuch time as the steady state distribution of proteins is reached(Figure 2 (d)). The form of protein distribution remains un-changed with any further increase in time, which confirms theasymptotic nature of this distribution.The e ff ect of the presence of rare codons in the mRNAwas also explored using this model. This circumstance shouldhave an e ff ect that is equivalent to a rate limiting phase in theelongation cycle. A linear lattice of size 30 was chosen. Atsite 20 the scaled rate constant was set close to zero to repre-sent the presence of a rare codon at that site. The calculationof the protein probability density in this condition confirms alocal maximum around site 20, consistent with the expectationthat the presence of a rare codon on mRNA pauses the trans-lation leading to accumulation near the site (Figure 3). Theresults are similar when the sites with rare codons are cho-sen near any arbitrary set of consecutive sites on the lattice. s ca l e d r a t e c on s t a n t P r ob a b ilit y Site location (a)(b)
FIG. 3. The e ff ect of a low rate constant for elongation at a givenresidue. In this case it is expected that the elongation will stall at thesite with low rate constant. (a) In order to simulate this condition themean value for the rate constant distributions at various sites is al-lowed to vary between 50 and 150 except at site 20, where the meanvalue is set near zero. (b) The resulting distribution of the polypep-tides across various sites clearly shows the stalling e ff ect with result-ing accumulation around the residue site 20. Although this exampleis artificially constructed to verify the validity of the model describedin the present work, the occurrence of pause sites has been reportedin the literature. Such pausing and stacking e ff ect has been reported in manydi ff erent experiments, e.g. by Wolin and Walters .The numerically obtained shape of the steady state distribu-tions in Figure 2 resembles a log-normal distribution, whichcan be explained from the stochastic model outlined in theprevious section and described in more detail in the AppendixA. The matrix action V [ t ] i = Q [ t ] i , j V [0] j on a pure initiationstate at t =
0, (i.e. V [0] = { , , . . . , } ), results in ele-ments of V [ t ] i determined entirely by Q [ t ] i , , which is givenby Eq. (3) for the circular mRNAs and Eq. (4) for the shortlived mRNAs. In both these cases Q [ t ] i , is a sum of productsof random variables. If the second moment of the logarithmof such variables is finite, the product of variables approachesa log-normal distribution as the number of variables growslarge, and in such limit Q [ t ] i , j represents a sum of log-normaldistributions. An analytical form for the sum of log-normaldistributions cannot be determined as the characteristic func-tion does not have a closed form. But numerical and analyticalstudies, particularly in the context of wireless communicationand related fields, where log-normal sums appear frequently,have shown that the sum of log-normal distributions has sim-ilar character to a log-normal distribution (see AppendixC for detailed discussion). Consistent with this observation - 10 - 8 - 6 - 4 - 2 00.050.100.150.200.250.300.35 - 12 - 10 - 8 - 6 - 4 - 2 00.050.100.150.200.250.30- 8 - 6 - 4 - 2 00.10.20.30.4 - 8 - 6 - 4 - 2 00.050.100.150.200.250.300.35 P(x)P(x) Log(x) Log(x) (a)(c) (b)(d)
FIG. 4. The steady state distribution of proteins does not depend onthe precise form of the underlying rate constant distributions for elon-gation. The plots show the simulated results for the steady state pro-tein distributions in cases in which the underlying rate constant distri-butions along the entire chain follow (a) normal distribution (mean 50and standard deviation 15), (b) exponential distribution (mean 100),(c) gamma distribution (shape parameter 10, scale parameter 5), and(d) log-normal distribution (derived from normal distribution withmean 3.5 and standard deviation 1). The steady state distribution ofproteins in all cases remains well described by the log-normal dis-tribution. The parameters of the scaled rate constant distribution arechosen so that the system is in the steady state phase in all plots. we find that the steady state distribution of proteins followsa log-normal, as shown in Figure 2. The invariance of thelog-normal distribution under sum implies that the stochas-tically produced log-normally distributed proteins in singlecells when summed over a cell population should also giverise to approximately log-normal distributions in the steadystate.Broadly, our simulation results are consistent with theemergence of log-normality whenever the range of rate dis-tributions for individual elongation steps remains as large be-tween cells as between individual transcripts within the samecell. It is di ffi cult to plausibly formulate circumstances underwhich this would not be true.An essential tenet of the law of large numbers is the in-dependence of the limiting distribution of sums of variablesupon the distributions of the individual variables. Currently,accurate experimental data are not available for rate constantdistributions in vivo. Based on chemical reaction kinetics acase can be made that the rate constant distribution may of-ten have an exponential form (See Appendix B). However,as living cells exist in conditions that are far from equilib-rium and subject to regulatory influences the possibility thatrate constant distributions assume some other form cannot be -2-4-6-8-10 -2-2 -2-4-4-4-6 -6 -6-8-8-8 -10-10-10 00 (a) (b) -3-5-7 -1-1 -3-3 -3-5 -5-5-7 -7-7-9-9 (d)(c) Normal theoretical quantiles Normal theoretical quantiles L og - t r a n s f o r m e d d a t a qu a n til e s L og - t r a n s f o r m e d d a t a qu a n til e s FIG. 5. A quantile-quantile (Q-Q) plot shows that the steady stateprotein distribution is well described by a log-normal. The log-transformed steady state distributions are plotted against a normaldistribution. The underlying rate constant distributions for the steadystate distributions in the Q-Q plots possess the same parameter val-ues as in Figure 4, given by (a) normal distribution, (b) exponentialdistribution, (c) gamma distribution, and (d) log-normal distribution. ruled out. In numerical simulations we have found that thesteady state protein distribution form is largely una ff ected bythe changes in distribution of rate constants, as shown in Fig-ure 4.In order to confirm that the asymptotic form of steady statedistribution follows a log-normal, we calculated the quantile-quantile plot (Q-Q plot) of the log-transformed distributionagainst a normal distribution. Figure 5 shows that the log-transformed distributions exhibit normality over a wide range,with small deviations seen near the tails.Elongation rates may fluctuate over time, but are likely tobe slowly varying compared to the time for completion of apolypeptide chain except in unusual circumstances. In Ap-pendix D, we describe two alternative frameworks for numer-ical calculation for this case and present evidence that the re-sulting protein distributions are well described by log-normaldistribution (Figure 9 and Figure 10). We also show that forlow copy number mRNA templates the corresponding proteindistribution becomes a Gamma distribution when appropriatelimits are applied to the model (Appendix A). VI. DISTRIBUTION OF THE NUMBER OFPOLYPEPTIDES PER TRANSCRIPT
The number of polypeptides per transcript can be calculatedwith some assumptions about the decay kinetics of the tran- - 5.5 - 5.0 - 4.5 - 4.0 - 3.5 - 3.0 F r e qu e n c y FIG. 6. Number of proteins per mRNA follows a log-normal dis-tribution. The plot shows the histogram of the protein abundance(in fractional units) per template. For the simulation of this plot, theunderlying rate constant distribution is taken to follow a gamma dis-tribution with shape parameter 10 and scale parameter 5. Similarresults are obtained in cases in which the underlying rate constantdistributions are di ff erent. scripts. The rate of production of a completed polypeptideof length d is given by (cid:15) d V d ( t ) and the integral with respectto time over the lifetime of the transcript gives the number ofpolypeptides. For first order decay of transcripts with rate con-stant c , the distribution representing the location of the leadingribosome will be the Laplace transform in time of the transi-tion operators with respect to conjugate variable c . Taking theexample of Eq. (3), the i , j entry of the matrix for i ≥ j gives (cid:81) j − k = ( c + (cid:15) k ) (cid:81) dk = i + ( c + (cid:15) k ) (cid:81) i − m = j (cid:15) m (cid:81) dk = ( c + (cid:15) k ) − λ (cid:81) dm = (cid:15) m (6)Where n is the number of codons as in Eq. (3). The structureof (6) shows the characteristic multiplicative interactions thatcontribute in an important way to the overall stochastic vari-ation. The consequences of this multiplicative e ff ect can beseen in Figure 6 which shows that the number of polypeptidesper transcript calculated via simulation of Eq. (6) produce adistribution with approximate log-normality. VII. THE TRANSIENT PHASE OF TRANSLATIONDYNAMICS
In this section we closely examine the e ff ect of stochas-tic dynamics on a collection of mRNAs arising out of a tran-scriptional burst in a cell population. One useful quantity thatquantifies the transient phase of the dynamics is the average P r ob a b ilit y T=1.2T=1.4T=1.6T=1.8T=2
Time L = Σ i i V ( t ) i Site location (a)(b)
FIG. 7. (a) The average length of polypeptide chain d increaseslinearly with time. (b) With increasing time the overall occupationprobability of polypeptides decreases near the initiation site and in-creases near the termination site. L denotes length of the lattice chainand m is the mean value of the underlying rate constant distribution,which is given by an exponential distribution for this figure. extent of polypeptide chain formation in a cell population. Welooked at how this quantity varies in time starting from thebeginning of the translation process. Let L be such a quantitydescribed by the expectation value L = d (cid:88) i = iV ( t ) i (7)Where i denotes the discrete site location and V ( t ) i gives theprobability that elongation has proceeded up to site i at time tassuming the polypeptide has been initiated at time 0. In Fig-ure 7 (a), we plot L for a lattice chain of length d =
30 with ex-ponentially distributed rate constants of scaled mean 10. Theexpected length of the polypeptide chain formed in the cellpopulation, L , varies linearly with time, except for a slight de-viation from linearity at longer times. To mimic the actualprocess, in the simulation we allowed the fully formed pro-teins to decay with some probability. The mass of protein thatdecays after termination is proportional to the total amount of Time Location
Site location V a r i a n ce Time V a r i a n ce (b)(a) FIG. 8. The change in estimated variance of the occupation prob-ability at di ff erent sites along the mRNA. (a) The variation of theestimated variance at di ff erent residues along the mRNA chain withtime. (b) The changes in estimated variance at di ff erent times alongthe mRNA chain at di ff erent residue locations. The estimated vari-ance is obtained by taking the log-transform of a log-normal distribu-tion and calculating the variance of the resulting normal distribution. fully formed protein. The protein decay feature causes a devi-ation from linearity for L at longer times. A look at the sum ofsite-wise probabilities across cell population with increasingtimes (Figure 7 (b)) reveals the e ff ect of elongation proceed-ing towards the termination site. As expected, with time theoccupation probability decreases near the initiation site andincreases near the termination site as more and more peptidesare released as fully formed proteins.The variance of the occupation probabilities at di ff erentsites across a cell population along the mRNA chain is ex-pected to change with time, as should the occupation prob-ability variance at a particular site. In order to capture thenature of this variation we simulated the process with expo-nentially distributed spatially varying rate constants over a lin-ear chain of length 30 (Figure 8 ). With increasing time thenumerically estimated variance at di ff erent locations tends toconverge (Figure 8 (a)) and decrease (Figure 8 (b)). VIII. PROTEIN DISTRIBUTION TENDS TOLOG-NORMALITY EXCEPT FOR VERY LOW COPYNUMBER PROTEINS
Many physical and chemical laws are multiplicative ratherthan additive and are expected to lead to log normality in nat-ural systems . The log-normal character of protein abun-dance is often demonstrated by quantitative flow cytometry,in which fluorophore labeled antibodies are reacted with pro-tein targets and the intensity of fluorescence determined on acell-by-cell basis. The results from flow cytometry, which typ-ically measures cell surface proteins, have been corroboratedby results from quantitative mass spectrometry of protein frag-ments, which also show a log-normal probability density .Various other methods for protein quantitation have led tosimilar conclusions . Nearly all microarray analyses usestatistical tests based on the logarithm of raw transcript abun-dance data , consistent with single cell gene expression mea-surements indicating that mRNA abundance distributions arelog-normally distributed in cell populations . In this studywe have provided a simplified dynamical framework that pro-vides a direct physical explanation of generically observedlog-normal distributions. In the limit in which the variationin elongation rates becomes small, the Poisson / Gamma distri-bution is recovered.The deviations in the tails of the distribution in our modelindicate that protein abundance distributions in vivo may pos-sess a larger dynamic range than log-normal distributions.These deviations may pose special challenges for cellular reg-ulation if not accompanied by rapid regressions to the mean.At the same time, in a multicellular organism, cells that areoutliers with respect to expression may serve a protective orsentinel function, providing a population responses that mayhave a greater dynamic range attributable to the response char-acteristics of outliers.Considerable work has been done on various aspectsof the origin of stochastic gene expression in cellularprocesses . This work draws attention to the signifi-cant and likely dominant role played by translational noise instochastic gene expression in living cells. Progress in singlecell measurements of translational parameters will undoubt-edly enhance our understanding of stochastic gene expression.
Appendix A: Contour integral form for matrix power series
A consistent and general representation of the terms of thepower series expansion of T k = ( U − k can be obtained in theform of a contour integral. In general the matrix T ( d ) k withelements i , j can be represented by an integral T ( d ) ki , j = π i (cid:73) g i , j ( s , d ) s k ds (A1)where g i , j ( s , d ) is a rational function (i.e. a function f ( s ) = p ( s ) q ( s ) , p , q both polynomials of finite order). For integrals ofsuch functions, the sum of the residue at infinity plus the sumof the residues at the zeroes of q equals zero. It is convenientfor the purpose of proof by induction to transform the inte-grand, recalling that the residue at infinity of f ( s ) is definedas the negative of the residue at 0 of z − f (1 / z ), where z = s − .In eukaryotic cells evidence of mRNA template circular-ization has been observed, both biochemically in the form ofprotein complexes that bind to both poly(A) and the mRNAcap structure , and ultrastructurally, in the form of polysomeslinked in circular configuration. To model the motion of ribo-somes on such a template, we identify the generator for thetranslation operator with the structure T = − (cid:15) · · · λ(cid:15) d (cid:15) − (cid:15) · · · ... ... . . . ... ... · · · − (cid:15) d −
00 0 · · · (cid:15) d − − (cid:15) d (A2)from which we construct the desired solution as the sum (cid:80) ∞ n = ( tT ) n / n !. The i , j entry of the n th power of this matrixtakes the form, for i < j ,12 π i (cid:73) λ (cid:81) j − k = i + (1 + z (cid:15) k ) (cid:81) dm = j (cid:15) m (cid:81) i − m = (cid:15) m z j − i + n + − d ( (cid:81) dk = (1 + z (cid:15) k ) − λ z d (cid:81) dm = (cid:15) m ) dz (A3)and 12 π i (cid:73) (cid:81) j − k = (1 + z (cid:15) k ) (cid:81) dk = i + (1 + z (cid:15) k ) (cid:81) i − m = j (cid:15) m z j − i + n + ( (cid:81) dk = (1 + z (cid:15) k ) − λ z d (cid:81) dm = (cid:15) m ) dz (A4)for i ≥ j , where the contour of integration encloses the originbut none of the roots of the denominator polynomial (cid:81) dk = (1 + z (cid:15) k ) − λ z d (cid:81) dm = (cid:15) m . To establish this by induction we firstobserve that the variable n appears only in the denominator inthe exponent of z . To calculate the residues at zero we needconsider five cases for n =
1: (i) i = , j = d ; (ii) i < j (excepting case (i)); (iii) i = j ; (iv) i = j +
1; and (v) i > j + z j − i + − d = z and the limit of Eq. (A3) as z → π i (cid:73) λ(cid:15) d z (1 + z (cid:15) )(1 + z (cid:15) d ) dz = λ(cid:15) d (A5)For case (ii), the limit of Eq. (A3) as z → j − i + − d ≤ π i (cid:73) λ (cid:81) dm = j (cid:15) m (cid:81) i − m = (cid:15) m z j − i + − d dz = z → π i (cid:73) z (1 + z (cid:15) i ) dz → π i (cid:73) (cid:18) z − (cid:15) i z (cid:19) dz = − (cid:15) i (A7)For case (iv) the limit of Eq. (A4) as z → π i (cid:73) (cid:15) j z (1 + z (cid:15) j )(1 + z (cid:15) j + ) dz = (cid:15) j (A8)And for case (v) the limit of Eq. (A4) as z → j − i + ≤ π i (cid:73) (cid:81) i − k = j (cid:15) k z j − i + dz = T T n = T n + . The actualprocess is slightly di ff erent; we establish that T T n − T n + = − π i (cid:73) d z − n − dz = d (A10)Where d and d are the d dimensional unit and null matrices,respectively. There are 5 cases to be calculated: (i) i < j ( i = i < j ( i > i = j ( i = i = j ( i > i > j . For case (i) we establish λ(cid:15) d (cid:81) j − k = (1 + z (cid:15) k ) (cid:81) d − m = j (cid:15) m z j − d + n + (cid:0) (cid:81) dk = (1 + z (cid:15) k ) − λ z d (cid:81) dm = (cid:15) m (cid:1) − ( z − + (cid:15) ) (cid:81) j − k = (1 + z (cid:15) k ) λ (cid:81) dm = j (cid:15) m z j − d + n (cid:0) (cid:81) dk = (1 + z (cid:15) k ) − λ z d (cid:81) dm = (cid:15) m (cid:1) = (cid:15) i − λ (cid:81) j − k = i (1 + z (cid:15) k ) (cid:81) dm = j (cid:15) m (cid:81) i − m = (cid:15) m z j − i + n + − d (cid:0) (cid:81) dk = (1 + z (cid:15) k ) − λ z d (cid:81) dm = (cid:15) m (cid:1) − ( z − + (cid:15) i ) (cid:81) j − k = i + (1 + z (cid:15) k ) λ (cid:81) dm = j (cid:15) m (cid:81) i − m = (cid:15) m z j − i + n + − d (cid:0) (cid:81) dk = (1 + z (cid:15) k ) − λ z d (cid:81) dm = (cid:15) m (cid:1) = λ(cid:15) d (cid:81) d − m = (cid:15) m z n − d + (cid:0) (cid:81) dk = (1 + z (cid:15) k ) − λ z d (cid:81) dm = (cid:15) m (cid:1) − ( z − + (cid:15) i ) (cid:81) dk = (1 + z (cid:15) k ) z n + (cid:0) (cid:81) dk = (1 + z (cid:15) k ) − λ z d (cid:81) dm = (cid:15) m (cid:1) = − z − n − (A13)For case (iv) (cid:15) i − λ (cid:81) j − k = i (1 + z (cid:15) k ) (cid:81) dm = j (cid:15) m (cid:81) i − m = (cid:15) m z j − i + n + − d (cid:0) (cid:81) dk = (1 + z (cid:15) k ) − λ z d (cid:81) dm = (cid:15) m (cid:1) − ( z − + (cid:15) i ) (cid:81) j − k = (1 + z (cid:15) k ) (cid:81) dk = i + (1 + z (cid:15) k ) (cid:81) i − k = j (cid:15) m z j − i + n + (cid:0) (cid:81) dk = (1 + z (cid:15) k ) − λ z d (cid:81) dm = (cid:15) m (cid:1) = − z − n − (A14)And for case (v) (cid:15) i − (cid:81) j − k = (1 + z (cid:15) k ) (cid:81) dk = i (1 + z (cid:15) k ) (cid:81) i − m = j (cid:15) m z j − i + n + (cid:0) (cid:81) dk = (1 + z (cid:15) k ) − λ z d (cid:81) dm = (cid:15) m (cid:1) − ( z − + (cid:15) i ) (cid:81) j − k = (1 + z (cid:15) k ) (cid:81) dk = i + (1 + z (cid:15) k ) (cid:81) i − m = j (cid:15) m z j − i + n + (cid:0) (cid:81) dk = (1 + z (cid:15) k ) − λ z d (cid:81) dm = (cid:15) m (cid:1) = s = z − and gives, for the i , j entry of the n th powerof the matrix Eq. (A1), for i < j π i (cid:73) s n λ (cid:81) j − k = i + ( s + (cid:15) k ) (cid:81) dm = j (cid:15) m (cid:81) i − m = (cid:15) m (cid:81) dk = ( s + (cid:15) k ) − λ (cid:81) dm = (cid:15) m ds (A16)and 12 π i (cid:73) s n (cid:81) j − k = ( s + (cid:15) k ) (cid:81) dk = i + ( s + (cid:15) k ) (cid:81) i − m = j (cid:15) m (cid:81) dk = ( s + (cid:15) k ) − λ (cid:81) dm = (cid:15) m ds (A17)otherwise, where the contour encloses all of the roots of thedenominator polynomial (cid:81) dk = ( s + (cid:15) k ) − λ (cid:81) dm = (cid:15) m . For λ = T is a stochastic (conservative) matrix, and therefore shouldhave a steady state given by the residues of Eq. (A16) andEq. (A17) at s =
0. Inspection of Eq. (A16) and Eq. (A17)shows that for λ = , s = n with s n replaced by ( ts ) n / n ! , which is (cid:81) dm = (cid:15) m (cid:15) i d (cid:88) k = (cid:81) dm = (cid:15) m (cid:15) k = (cid:15) i d (cid:88) k = (cid:15) k (A18)in both cases. The i , j entry of the matrix representing thesteady state distribution has no dependence on j , consistentwith intuition.In the case λ =
0, the i , j entry of the n th power of matrix(A2) is zero for i < j , and12 π i (cid:73) s n (cid:81) i − m = j (cid:15) m (cid:81) ik = j ( s + (cid:15) k ) (A19)otherwise, where the contour taken in the conventional (posi-tive) sense encloses all of the roots of the denominator polyno-mial (i.e. encircles all of the real axis values of − (cid:15) k , j ≤ k ≤ i .The matrix e tT in this case is given by Eq. (A19) with s n re-placed by the sum (cid:80) ∞ n = ( ts ) n / n !, the integral of which con-verges, despite its resemblance to a function with an essentialsingularity at infinity. We note also for completeness that theevaluation of the contour integral for n = e tT ) i , j = i < j , and( e tT ) i , j = ( − i + j i (cid:88) k = j e − t (cid:15) k (cid:81) i − m = j (cid:15) m (cid:81) i − kp = ( (cid:15) k − (cid:15) k + p ) (cid:81) k − q = j ( (cid:15) k − (cid:15) q ) (A20)otherwise.In the event that all of the (cid:15) k are equal, the evolution op-erator represented by Eq. (A20) takes the particularly simpleform ( t (cid:15) ) i − j ( i − j )! e − t (cid:15) (A21) for i ≥ j (and 0 otherwise) and hence Eq. (A20) can be con-sidered a natural generalization of a Poisson process to a do-main in which the underlying stochastic process is not homo-geneous. The gamma distribution, an extension of the Poissonto nonintegral event frequencies, takes the related form( t (cid:15) ) k Γ ( k + e − t (cid:15) (A22)which bears comparison to Eq. (A20) because Eq. (A22) hasbeen proposed to appropriately capture the statistics of lowmultiplicity translations emitted by a single mRNA template.When the operator T acts on an initial state vector v ( t ) = (1 , , . . . ,
0) of length d at time t = e tT v (0) gives the prob-ability density of the location of a ribosome on the mRNA attime t . If the source of the translation is an mRNA with aprobability of existence at time t of ce − ct , the relative e ff ect ofadditional initiations will be given by (cid:82) ∞ ce − ct e tT v (0) dt , andthe i , j entry of (cid:82) ∞ ce − ct e tT dt for i < j is c λ (cid:81) j − k = i + ( c + (cid:15) k ) (cid:81) dm = j (cid:15) m (cid:81) i − m = (cid:15) m (cid:81) dk = ( c + (cid:15) k ) − λ (cid:81) dm = (cid:15) m (A23)and c (cid:81) j − k = ( c + (cid:15) k ) (cid:81) dk = i + ( c + (cid:15) k ) (cid:81) i − m = j (cid:15) m (cid:81) dk = ( c + (cid:15) k ) − λ (cid:81) dm = (cid:15) m (A24)otherwise.The rate of production of full length protein is given by v ( t ) d (cid:15) d = e tT v (0) d (cid:15) d = ( e tT ) d , (cid:15) d and the integral with respectto time weighted by the lifetime of the encoding RNA gives ce d (cid:90) ∞ e − ct ( e tT ) d , dt = c (cid:81) dm = (cid:15) m (cid:81) dk = ( c + (cid:15) k ) − λ (cid:81) dm = (cid:15) m (A25)for the average number of polypeptides produced per mRNAtemplate, assuming that the characteristic lifetime of themRNA, c − , is long compared to the translation time. In theevent this is not true, we can estimate the number of polypep-tides per template in the elongation-limited domain by divid-ing the mean length that the lead ribosome has translated downthe mRNA, divided by the average number of residues be-tween successive ribosomes, R . This has the form d (cid:88) i = iR (cid:90) ∞ ce − ct v ( t ) i dt = d (cid:88) i = iR (cid:90) ∞ ce − ct ( e tT ) i , dt (A26)which, using Eq. (A23) and setting λ = d (cid:88) i = iR c (cid:81) i − m = (cid:15) m (cid:81) ik = ( c + (cid:15) k ) (A27)for the mean number of ribosomes per template over the lifeof the template. In the limit that all (cid:15) k are equal, Eq. (A25)0gives d (cid:88) i = iR c (cid:15) (cid:18) + c (cid:15) (cid:19) i = ( c + (cid:15) ) (cid:18) − (cid:18) c + (cid:15)(cid:15) (cid:19) − d (cid:19) − cd (cid:18) c + (cid:15)(cid:15) (cid:19) − d Rc ≈ ( c + (cid:15) ) (cid:18) − e − cd (cid:15) (cid:19) − cd e − cd (cid:15) Rc (A28)the latter approximation holding for c (cid:28) (cid:15) . Appendix B: Rate constant distributions
The individual (cid:15) i as modeled here are lumped rate constantsfor chemical reactions of considerable complexity. Howevereven complex trajectories in reaction coordinates can often bemodeled by taking the reaction to proceed through a limitingintermediate corresponding to the lowest energy barrier of thetransition state, which by convention ascribes to the (cid:15) i a struc-ture (cid:15) i = A i e − ∆ GiRT (B1)where ∆ G i is the Gibbs free energy for the i th transition state, R is the universal gas constant, T is the temperature in abso-lute scale, and A i is a constant. The transition state free energyis defined in terms of the corresponding enthalpy ( H ) and en-tropy ( S ) in the usual way as ∆ G i = ∆ H i − T ∆ S i (B2)We assume that the correctly charged tRNA reaches the ribo-some A site through a di ff usion process which in turn deter-mines the rate at which the elongation phase proceeds. Sinceordinary di ff usion is dominated by entropic contributions, weassume the enthalpic contribution can be neglected, and therate constant is e ff ectively given by (cid:15) i = A i e − ∆ SiR (B3)If we let ω i represents the number of microstates correspond-ing to the macrostate of the system, Eq. (B3) can be furthermodified as follows (cid:15) i = A i exp (cid:18) k B ∆ ln ω i Nk B (cid:19) = A i exp (cid:18) ∆ ln ω i N (cid:19) (B4)where k B is Boltzmann’s constant and N Avogadro’s number.The change in entropy depends on the di ff erence in configura-tions between the microstates of transitioning states and is dif-ficult to ascertain precisely when the states between which thetransition occurs are both far from equilibrium. However ournumerical calculation suggests that universality in the steadystate distribution of protein probability density, the counter-part of the law of large numbers in the setting of multiplicativevariables, has a rapid onset, such that even very short polypep-tides (less than 30 residues) show universal behavior. In nu-merical simulations, in addition to the exponential distributionwe have used the Gamma, normal, log-normal and uniformdistribution as possible forms of forms for the rate constantdistribution (Figure 4). Appendix C: Sums of log-normally distributed variablesgenerate distributions that behave similarly to log-normalvariables
The steady state distribution of proteins is given by a sumof the products of random variables. The sum of independentand identically distributed random variables with finite vari-ance tends to a normal distribution as the number of variablesgrows large. Similarly, the distribution of a product of ran-dom variables converges to a log-normal distribution as thenumber of terms of the product increases. But the calcula-tion of the sum of log-normal distributions themselves facesa theoretical roadblock. The distribution of a sum of inde-pendent random variables is obtained from the product of therespective characteristic functions, but for the log-normal dis-tribution a closed form for the characteristic function does notexist and the sum of log-normally distributed variates has notbeen obtained in a closed form .The behavior of the sums of log-normally distributed vari-ables have been a topic of interest in field of communica-tions for over half a century; specific examples where sums oflog-normals appear include co-channel interference in mobile(wireless) communications, in frequency hopped spread spec-trum signals and in the general context of propagation throughturbulent medium . Extensive numerical evidence fromthese studies have confirmed that the sum of independent log-normally distributed random variables is well approximatedby a log-normal distribution . We observe similar be-havior for the numerically calculated distributions for the ste-day state distribution of proteins as shown in Figure 2 andFigure 4 of the main text. Appendix D: Rate constants fluctuating in time
So far we have considered the case in which the rate con-stants are location dependent but constant in time. Here weconsider the most general case, in which the transition prob-ability for the state vector passing from state i to state i + (cid:15) (cid:48) i , also depends on the time of the transition. In this case, wemay define a time dependent transition operator U ( t i ) as U ( t i ) = − (cid:15) (cid:48) ( t i ) 0 · · · λ(cid:15) d (cid:15) (cid:48) ( t i ) 1 − (cid:15) (cid:48) ( t i ) · · · ... ... . . . ... ... · · · − (cid:15) (cid:48) d − ( t i ) 00 0 · · · (cid:15) (cid:48) d − ( t i ) 1 − (cid:15) (cid:48) d ( t i ) (D1)The most general case represented by the above transition ma-trix where (cid:15) (cid:48) i s vary both in space and time, can be obtainedby direct numerical simulation. In Figure 9 we have shownthe simulation for the case of exponentially distributed (cid:15) (cid:48) i s.The resulting distribution follows a log-normal form consis-tent with general expectations regarding the universality ofthis distribution form.To extract approximate behavior analytically, we considerthat the most appropriate approach will depend on the na-ture of the time dependence of (cid:15) (cid:48) i . The simplest case oc-1curs when the time dependence of (cid:15) (cid:48) i can be represented asa smooth function which is independent of the location i . Thebi-directional version of this case with a fixed coe ffi cient at alllocations has a solution in terms of infinite sums of the mod-ified Bessel function . The more general version involvinglocation-dependent but smooth functional time dependence(with location dependent coe ffi cients) of (cid:15) (cid:48) i becomes analyt-ically intractable when formal methods such as a power se-ries representation for solving linear coupled ODE system areused.Due to the random nature of environmental influences onthe translation process, the appropriate form of time depen-dence of (cid:15) (cid:48) i is stochastic. Typical stochastic functions are notintegrable, which makes them weak candidates for calculat-ing time dependence. However, Ito processes, a general classof stochastic functions for which well-developed proceduresfor stochastic integration exist, can be applied. An Ito processis a di ff usion process for which both drift and di ff usion ratesare functions of time. Therefore in a general mathematicallywell-founded approach, we can consider the (cid:15) (cid:48) i to be drawnfrom an Ito process in order to model the influence of randomenvironment on the translation.From biological considerations the time dependence of rateconstants is most likely to be slowly varying with a narrowdistribution range. Under such circumstances, there are sim-pler and physically more illuminating alternative proceduresto Ito process formalism, for obtaining the solution for thetime-dependent case. We discuss two such formulations bothof which capture the dynamics in the case of slowly varyingand narrowly distributed time dependent (cid:15) (cid:48) i .The first procedure involves the power series expansion ofthe Poisson stochastic operator. Let the times be ordered sothat t < t < t < . . . < t n . Then the overall transitionoperator at the time t n , Q ( t n ) is given by Q ( t n ) = e − α t n [1 + ( α t n ) U ( t ) + ( α t n ) U ( t ) . U ( t ) + . . . + ( α t n ) n n ! U ( t n ) . . . U ( t ) . U ( t )] (D2)The individual terms in the power series expansion of Pois-son operator are represented as time-ordered products of U ( t i ).The operators at di ff erent time points U ( t i ) can be constructedby drawing (cid:15) (cid:48) i ( t i )’s from a specific distribution, followingwhich the transition operator can be computed directly fromthe above equation. In the case of the translation process thetime dependence of (cid:15) (cid:48) i ( t i )’s are likely to be slowly varying witha narrow distribution, and in such cases the Q ( t n ) is guaranteedto converge for some suitably high value of t n , which we haveobserved numerically.When the rate constants are time dependent, formulaEq. (D2) for the evolution does not in general hold because theexponentiated matrix operator do not commute. We considerthe case in which the constants (cid:15) i of Eq. (D2) are replaced byfunctions (cid:15) i ( t ) piecewise constant over sequential epochs ∆ t k of constant duration ∆ t . Then if all (cid:15) i ( ∆ t k ) = (cid:15) i ( ∆ t k + ) we have Q ( ∆ t k ) Q ( ∆ t k + ) = e α ∆ tG ( ∆ t k ) e α ∆ tG ( ∆ t k + ) = e α ∆ tG ( ∆ t k ) (D3) − 7 − 6 − 5 − 4 − 3 − 2 − 10.10.20.30.60.50.40.7 Log (x)P (x)
FIG. 9. A simulation result for the steady state distribution of pro-teins across a cell population when rate constants depend on bothlocation and time (Appendix D). For the simulation, the length of thelattice chain was taken as 30 and the rate constants were drawn froman exponential distribution.
However, if the rate constants vary in time, i.e (cid:15) i ( ∆ t k ) (cid:44) (cid:15) i ( ∆ t k + ), Q ( ∆ t k ) Q ( ∆ t k + ) (cid:44) e α ∆ tG ( ∆ t k ) + α ∆ tG ( ∆ t k + ) (D4)unless the commutator [ G ( ∆ t k ) , G ( ∆ t k + )] vanishes.[ G ( ∆ t k ) , G ( ∆ t k + )] i , j = (cid:15) j ( ∆ t k ) (cid:15) i ( ∆ t k + ) − (cid:15) i ( ∆ t k ) (cid:15) j ( ∆ t k + ) i = j + = (cid:15) i − ( ∆ t k ) (cid:15) j ( ∆ t k + ) − (cid:15) j ( ∆ t k ) (cid:15) i − ( ∆ t k + ) i = j + = n − U ( t i ) with a series of stationarytransition operators, simulating the situation in which the rateconstants (cid:15) (cid:48) i are slowly varying in time. Let the total time inter-val over which the dynamical evolution is observed be givenby t . Suppose that the total time interval t can be divided into n time sub-intervals of arbitrary lengths, such that over eachsuch time sub-intervals the transition operator U ( t i ) is givenby a stationary transition operator, i.e. t = t + t + . . . + t n .The transition matrix U ( t i ), is expressed in terms of the prob-ability vectors (cid:15) (cid:48) i , t i , where (cid:15) (cid:48) i , t i is the probability that the statevector moves from state i to state i + t i . The transition operator can be2 − 15 − 10 − 5 00.050. 10.150. 20.250. 3 Log (x)P (x)
FIG. 10. The simulated steady state distribution of proteins acrossa cell population. For the simulation, rate constants were allowed tochange not only with location but were also slowly varying in time(Appendix D). The length of lattice chain for the simulation was 30and the rate constants were drawn from an exponential distribution. written as U ( t i ) = − (cid:15) (cid:48) , t i · · · (cid:15) (cid:48) , t i − (cid:15) (cid:48) , t i · · · ... ... . . . ... ... · · · − (cid:15) (cid:48) d − , t i
00 0 · · · (cid:15) (cid:48) d − , t i − (cid:15) (cid:48) d , t i (D6)For the Poisson stochastic process, the transition operator overthe entire interval t can now be written as a product of a seriesof transition operators over each sub-interval t i . Therefore wehave Q ( t ) = e α t ( U ( t ) − . e α t ( U ( t ) − . . . e α t n ( U ( t n ) − (D7)We can replace (cid:15) (cid:48) i , t i = (cid:15) , t i α and write α ( U ( t i ) − ≡ α G ( t i ) = − (cid:15) , t i · · · (cid:15) , t i − (cid:15) , t i · · · ... ... . . . ... ... · · · − (cid:15) d − , t i
00 0 · · · (cid:15) d − , t i − (cid:15) d , t i (D8)So we have Q ( t ) = e α t G ( t ) . e α t G ( t ) . . . e α t n G ( t n ) (D9)We have carried out numerical simulation designed to mea-sure the consequences of this form of temporal variation. The resulting distribution has a log-normal shape as shown in Fig-ure 10.If the matrices e α t i G ( t i ) for di ff erent time intervals commutewith each other then the final form of Q ( t ) simplifies. To sim-plify the discussion, let us assume that the time subintervalsare all of equal duration given by τ and (cid:15) i , t = ω i and (cid:15) i , t = δ i .The combined time evolution operator for two successive in-tervals will be given by Q ( t ) = e ατ G ( t ) . e ατ G ( t ) (D10)This can be rewritten as Q ( t ) = e ατ G ( t ) + ατ G ( t ) + ατ [ G ( t ) , G ( t )] (D11)Note that if the transition operators G ( t ) and G ( t ) commutethen the formulae for Q ( t ) involve direct addition in the ex-ponential. Consequently, when this commutation conditionholds for all successive time intervals then the e ff ective timeevolution formulae becomes considerably simpler. The com-mutator of α G ( t ) and α G ( t ) is given by[ G ( t ) , G ( t )] = · · · − ω δ + ω δ · · · ω δ − ω δ − ω δ + ω δ . . . ... ω δ − ω δ · · · ...... ... · · · ... (D12)There are two separate conditions on the hopping probabilitiesunder which α G ( t ) and α G ( t ) will commute1) Since the entries in the commutator matrix are of the or-der of the square of rate constants, for very small valuesof rate constant, α G ( t ) and α G ( t ) commute.2) The entries in the commutator have the general form − ω i δ i + + ω i + δ i . If the rate constants change in time ina correlated manner such that δ becomes a function of ω , then − ω i δ i + + ω i + δ i = α G ( t ) and α G ( t ) will com-mute.Under either of these conditions, the general formula forthe time evolution in the case of space-time dependent rateconstants will be given by Q ( t ) = e ατ [ G ( t ) + G ( t ) + ... + G ( t n )] (D14)The above formula can be easily generalized for the case whenthe subinterval time durations are not equal to each other, inwhich case the formula is given by Q ( t ) = e α [ t G ( t ) + t G ( t ) + ... + t n G ( t n )] (D15)In contrast to the more general formula given by Eq. (D9), inEq. (D15) the specific time ordering of di ff erent sub-intervalsbecomes unimportant in the overall form of Poisson semi-group transition operator.3 Appendix E: Note on numerical simulation
For the distribution of elongation rates, the rate constants atvarious sites are scaled by both α and time, where α has thedimension of inverse time. Once α is fixed at a specific value, longer time evolution is given by scalar multiplication of therate constant distribution values by a higher factor. We referto these resulting rate constant values as scaled rate constants.The numerical simulations presented in the paper were carriedout in 32 bit Matlab R2009b and Mathematica 9. ∗ [email protected] A. Bar-Even, J. Paulsson, N. Maheshri, M. Carmi, E. O’Shea,Y. Pilpel, and N. Barkai, Nat. Genet. , 636 (2006). G. Bal´azsi, A. van Oudenaarden, and J. J. Collins, Cell , 910(2011). C. P. Wolk, Annu. Rev. Genet. , 59 (1996). H. J. E. Beaumont, J. Gallie, C. Kost, G. C. Ferguson, and P. B.Rainey, Nature , 90 (2009). S. L. Spencer, S. Gaudet, J. G. Albeck, J. M. Burke, and P. K.Sorger, Nature , 428 (2009). J. R. S. Newman, S. Ghaemmaghami, J. Ihmels, D. K. Breslow,M. Noble, J. L. DeRisi, and J. S. Weissman, Nature , 840(2006). M. B. Elowitz, A. J. Levine, E. D. Siggia, and P. S. Swain, Science , 1183 (2002). J. Paulsson, Nature , 415 (2004). N. Maheshri and E. K. O’Shea, Annu. Rev. Biophys. Biomol.Struct. , 413 (2007). A. Raj and A. van Oudenaarden, Cell , 216 (2008). D. Volfson, J. Marciniak, W. J. Blake, N. Ostro ff , L. S. Tsimring,and J. Hasty, Nature , 861 (2006). W. J. Blake, M. Kaern, C. R. Cantor, and J. J. Collins, Nature , 633 (2003). J. M. Raser and E. K. O’Shea, Science , 2010 (2005). L. Cai, N. Friedman, and X. S. Xie, Nature , 358 (2006). J. Yu, J. Xiao, X. Ren, K. Lao, and X. S. Xie, Science , 1600(2006). Y. Taniguchi, P. J. Choi, G.-W. Li, H. Chen, M. Babu, J. Hearn,A. Emili, and X. S. Xie, Science , 533 (2010). G.-W. Li and X. S. Xie, Nature , 308 (2011). B. Huang, H. Wu, D. Bhaya, A. Grossman, S. Granier, B. K. Ko-bilka, and R. N. Zare, Science , 81 (2007). J. Malmstr¨om, M. Beck, A. Schmidt, V. Lange, E. W. Deutsch,and R. Aebersold, Nature , 762 (2009). J. R. Chubb, T. Trcek, S. M. Shenoy, and R. H. Singer, Curr. Biol. , 1018 (2006). B. B. Kaufmann and A. van Oudenaarden, Curr. Opin. Genet. Dev. , 107 (2007). I. Golding, J. Paulsson, S. M. Zawilski, and E. C. Cox, Cell ,1025 (2005). A. Raj and A. van Oudenaarden, Annu. Rev. Biophys. , 255(2009). A. Raj, C. S. Peskin, D. Tranchina, D. Y. Vargas, and S. Tyagi,PLoS Biol. , 1707 (2006). D. R. Larson, R. H. Singer, and D. Zenklusen, Trends Cell Biol. , 630 (2009). D. M. Suter, N. Molina, D. Gatfield, K. Schneider, U. Schibler,and F. Naef, Science , 472 (2011). R. de Sousa Abreu, L. O. Penalva, E. M. Marcotte, and C. Vogel,Mol. Biosyst. , 1512 (2009). T. Maier, M. G¨uell, and L. Serrano, FEBS Lett. , 3966 (2009). B. Schwanh¨ausser, D. Busse, N. Li, G. Dittmar, J. Schuchhardt,J. Wolf, W. Chen, and M. Selbach, Nature , 337 (2011). C. Vogel and E. M. Marcotte, Nat. Rev. Genet. , 227 (2012). L. D. Kapp and J. R. Lorsch, Annu. Rev. Biochem. , 657 (2004). I. Wohlgemuth, C. Pohl, J. Mittelstaet, A. L. Konevega, and M. V.Rodnina, Philos. Trans. R. Soc. B , 2979 (2011). M. V. Rodnina and W. Wintermeyer, Annu. Rev. Biochem. , 415(2001). T. Ikemura, J. Mol. Biol. , 389 (1981). S. Varenne, J. Buc, R. Lloubes, and C. Lazdunski, J. Mol. Biol. , 549 (1984). M. A. Sørensen, C. G. Kurland, and S. Pedersen, J. Mol. Biol. , 365 (1989). P. M. Sharp and W.-H. Li, Nucleic Acids Res. , 1281 (1987). S. C. Makrides, Microbiol. Rev. , 512 (1996). O. Namy, S. J. Moran, D. I. Stuart, R. J. C. Gilbert, and I. Brierley,Nature , 244 (2006). H. Nakatogawa and K. Ito, Cell , 629 (2002). N. T. Ingolia, L. F. Lareau, and J. S. Weissman, Cell , 789(2011). N. T. Ingolia, S. Ghaemmaghami, J. R. S. Newman, and J. S.Weissman, Science , 218 (2009). S. L. Wolin and P. Walter, EMBO J. , 3559 (1988). J. Darnell, S. J. Van Driesche, C. Zhang, K. Y. S. Hung, A. Mele,C. E. Fraser, E. F. Stone, C. Chen, J. J. Fak, S. W. Chi, D. D.Licatalosi, J. D. Richter, and R. B. Darnell, Cell , 247 (2011). C. J. Shoemaker and R. Green, Nat. Struct. Mol. Biol. , 594(2012). N. Mitarai, K. Sneppen, and S. Pedersen, J. Mol. Biol. , 236(2008). S. Proshkin, A. R. Rahmouni, A. Mironov, and E. Nudler, Science , 504 (2010). S. Dorner, J. L. Brunelle, D. Sharma, and R. Green, Nat. Struct.Mol. Biol. , 234 (2006). P. Saini, D. E. Eyler, R. Green, and T. E. Dever, Nature , 118(2009). H. Zouridis and V. Hatzimanikatis, Biophys. J. , 717 (2007). E. P. O’Brien, M. Vendruscolo, and C. M. Dobson, Nat. Commun. (2012). K. B. Gromadski and M. V. Rodnina, Mol. Cell , 191 (2004). A. Fluitt, E. Pienaar, and H. Vijoen, Comput. Biol. Chem. , 335(2007). G. R. Philipps, Nature , 567 (1965). A. K. Christensen, L. E. Kahn, and C. M. Bourne, Am. J. Anat. , 1 (1987). N. C. Beaulieu and Q. Xie, IEEE Trans. Veh. Technol. , 479(2004). E. Limpert, W. A. Stahel, and M. Abbt, Bioscience , 341(2001). A. Boehm, S. Putz, D. Altenhofer, A. Sickmann, and M. Falk,BMC Bioinformatics , 214 (2007). K. Jung, A. Gannoun, and W. Urfer, Revstat-Stat. J. , 99 (2005). K. Jung, A. Gannoun, B. Sitek, O. Apostolov, A. Schramm, H. E.Meyer, K. St¨uhler, and W. Urfer, Revstat-Stat. J. , 67 (2006). K. Kaneko and C. Furusawa, Theory Biosci , 195 (2008). P. Baldi and H. G. Wesley,
DNA microarrays and gene expres-sion from experiments to data analysis and modeling (Cambridge University Press, Cambridge, 2002). M. Bengtsson, A. Stahlberg, P. Rorsman, and M. Kubista,Genome Res. , 1388 (2005). H. Shiku, D. Okazaki, J. Suzuki, Y. Takahashi, T. Murata,H. Akita, H. Harashima, K. Ino, and T. Matsue, FEBS Lett. ,4000 (2010). I. Lestas, G. Vinnicombe, and J. Paulsson, Nature , 174(2010). A. Hilfinger and J. Paulsson, Proc. Natl. Acad. Sci. U. S. A. ,12167 (2011). S. Wells, P. Hillner, R. Vale, and A. Sachs, Mol. Cell , 135(1998). L. Fenton, IRE Trans. Commun. Syst. , 57 (1960). N. Beaulieu, A. AbuDayya, and P. McLane, IEEE Trans. Com-mun. , 2869 (1995). S. C. Schwartz and Y. S. Yeh, Bell Syst. Tech. J. , 1441 (1982). L. Kleinrock,