Information costs in the control of protein synthesis
Rebecca J. Rousseau and William Bialek

Joseph Henry Laboratories of Physics, and Lewis–Sigler Institute for Integrative Genomics, Princeton University, Princeton NJ 08544
Department of Physics, California Institute of Technology, Pasadena CA 91125
Initiative for the Theoretical Sciences, The Graduate Center, City University of New York, 365 Fifth Ave, New York, NY 10016

(Dated: January 1, 2020)

Efficient protein synthesis depends on the availability of charged tRNA molecules. With 61 different codons, shifting the balance among the tRNA abundances can lead to large changes in the protein synthesis rate. Previous theoretical work has asked about the optimization of these abundances, and there is some evidence that regulatory mechanisms bring cells close to this optimum, on average. We formulate the tradeoff between the precision of control and the efficiency of synthesis, asking for the maximum entropy distribution of tRNA abundances consistent with a desired mean rate of protein synthesis. Our analysis, using data from E coli, indicates that reasonable synthesis rates are consistent only with rather low entropies, so that the cell's regulatory mechanisms must encode a large amount of information about the "correct" tRNA abundances.
In order to function efficiently, living cells need to take control over their internal chemistry. This involves adjusting the concentration of relevant molecules in relation to some goal, and requires transmitting information about the goal through the regulatory elements that control these concentrations. But in many biological regulatory mechanisms, information in turn is represented by the concentrations of other molecules, and these concentrations often are quite low, leading to physical limits on information transmission [1]. Here we explore the tradeoff between information and efficiency in the context of protein synthesis.

Protein synthesis requires transfer RNA (tRNA) molecules to arrive at the ribosome and dock with their complementary codons along the messenger RNA (mRNA). It was realized some time ago that maximizing the rate of protein synthesis requires matching tRNA abundances to codon usage, and there is some evidence that this happens, at least on average [2, 3]. A similar idea has been applied to bacterial metabolism as a whole, where the fluxes of individual biochemical steps can be tuned to maximize the conversion of nutrients into biomass [4]. But these discussions of optimization assume that the cell can fix molecular abundances with infinite precision. In a series of papers, De Martino and colleagues have constructed maximum entropy models for the distribution of metabolic fluxes that are consistent with a given mean rate of conversion into biomass [5]. As with networks of neurons [6], flocks of birds [7], families of protein sequences [8], and more, these maximum entropy models constrained by low order moments provide surprisingly accurate, quantitative descriptions of emergent behavior in the metabolic network [9].

Here we use the maximum entropy construction to analyze the information required for efficient protein synthesis.
Concretely, we want to find the maximum entropy distribution of tRNA abundances that is consistent with a given mean rate of protein synthesis. Our interest is not (immediately) in the distribution itself, but rather in the entropy. We recall that reductions in entropy correspond to a gain in information, and by definition the maximum entropy model gives the smallest entropy reduction needed to satisfy the constraints [10]. Thus our goal is to find the minimum amount of information required to specify the range of tRNA abundances that allow for protein synthesis at a given average rate. This discussion is in the same spirit as classical analyses of the tradeoffs among growth rate, accuracy of information transmission, and metabolic costs [11].

The time required for protein synthesis must be longer than the time for charged tRNA molecules to arrive at the ribosome, and for each codon this time is inversely proportional to the tRNA concentration; it seems likely that tRNA availability in fact is the rate-limiting factor in translation elongation [12]. We can choose units where the average of this time, normalized per codon, is

$$ T = \sum_i \frac{f_i}{t_i}, \tag{1} $$

where $i$ indexes the $K$ different codons, $f_i$ is the fractional abundance of codon $i$ in the synthesized proteins, and $t_i$ is the abundance of the corresponding tRNA. Here we imagine that each codon has its own dedicated tRNA molecule, and return to the more realistic case below. Again we can choose units so that the mean total abundance of tRNA is normalized,

$$ \sum_i \langle t_i \rangle = 1. \tag{2} $$

In general, if we fix the mean of several functions $f_\mu(\{t_i\})$ then the maximum entropy distribution is

$$ P(\{t_i\}) = \frac{1}{Z(\{g_\mu\})} \exp\left[ -\sum_\mu g_\mu f_\mu(\{t_i\}) \right], \tag{3} $$

where the Lagrange multipliers $\{g_\mu\}$ must be set so that the expectation values $\langle f_\mu(\{t_i\}) \rangle$ satisfy the constraints we have set [10].
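As a rough illustration of Eq (1), the sketch below evaluates the per-codon synthesis time for a few ways of dividing a fixed tRNA budget. The codon frequencies are made-up numbers for K = 4 codons, not data, and the function names are hypothetical. For simplicity the budget is imposed on each allocation exactly, whereas Eq (2) constrains only the mean.

```python
import math


def mean_time_per_codon(f, t):
    """Eq (1): T = sum_i f_i / t_i, in the units chosen in the text."""
    if len(f) != len(t):
        raise ValueError("f and t must have the same length")
    return sum(fi / ti for fi, ti in zip(f, t))


def normalize(weights):
    """Rescale positive weights so that they sum to 1 (a fixed total budget)."""
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical codon usage for K = 4 codons (illustrative numbers only).
f = [0.7, 0.1, 0.1, 0.1]

# Three candidate ways of splitting the same total tRNA abundance.
allocations = {
    "uniform": normalize([1.0] * len(f)),
    "proportional to f": normalize(f),
    "proportional to sqrt(f)": normalize([math.sqrt(fi) for fi in f]),
}

for name, t in allocations.items():
    print(f"{name:>24s}: T = {mean_time_per_codon(f, t):.4f}")
```

For this toy example the uniform and proportional allocations both give T = K = 4, while the square-root allocation gives roughly 3.19, in line with the square-root scaling of the optimum in Eq (15).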
In our case, then,

$$ P(\{t_i\}) = \frac{1}{Z(\lambda,\mu)} \exp\left[ -\lambda \sum_i \frac{f_i}{t_i} - \mu \sum_i t_i \right], \tag{4} $$

where $\lambda$ fixes the mean synthesis time $\langle T \rangle$ and $\mu$ fixes the mean total tRNA abundance.

We have the usual "thermodynamic" identities,

$$ \langle T \rangle = -\frac{\partial \ln Z(\lambda,\mu)}{\partial \lambda}, \tag{5} $$

$$ \sum_i \langle t_i \rangle = -\frac{\partial \ln Z(\lambda,\mu)}{\partial \mu} = 1, \tag{6} $$

and the entropy of the distribution is

$$ S = \ln Z(\lambda,\mu) + \lambda \langle T \rangle + \mu. \tag{7} $$

Because the constraints we have imposed do not require correlations among the different tRNA abundances, we can write the partition function exactly as a product,

$$ Z(\lambda,\mu) = \prod_i \int_0^\infty dt\, e^{-\phi_i(t)}, \tag{8} $$

$$ \phi_i(t) = \frac{\lambda f_i}{t} + \mu t. \tag{9} $$

We notice that

$$ \int_0^\infty dt\, e^{-\phi_i(t)} = 2\sqrt{\frac{\lambda f_i}{\mu}}\, K_1\!\left( 2\sqrt{\lambda \mu f_i} \right), \tag{10} $$

where $K_1(z)$ is the modified Bessel function of the second kind [13].

We are especially interested in constraints that are strong enough to drive $T$ close to its minimum value. In this limit, which is found at large $\lambda$, the distribution will be well approximated as a Gaussian around the minimum of each $\phi_i$,

$$ P(\{t_i\}) = \prod_i \frac{1}{\sqrt{2\pi\sigma_i^2}} \exp\left[ -\frac{(t_i - t_i^*)^2}{2\sigma_i^2} \right], \tag{11} $$

where $t_i^*$ is the value of $t_i$ that minimizes $\phi_i(t)$, and

$$ \frac{1}{\sigma_i^2} = \left. \frac{\partial^2 \phi_i(t)}{\partial t^2} \right|_{t = t_i^*}. \tag{12} $$

The entropy of this multidimensional Gaussian is then

$$ S = \frac{1}{2} \sum_i \log_2\left( 2\pi e \sigma_i^2 \right)\ \text{bits}. \tag{13} $$

From Eq (9) we find that $t_i^* = \sqrt{\lambda f_i/\mu}$, and to obey Eq (2) we then must have

$$ \mu = \lambda \left( \sum_i f_i^{1/2} \right)^2, \tag{14} $$

$$ t_i^* = \frac{f_i^{1/2}}{\sum_j f_j^{1/2}}. \tag{15} $$

This scaling of the optimal tRNA abundances with the square root of codon usage is familiar from previous work [2]. At these optimal abundances we find the minimum synthesis time,

$$ T_{\min} = \sum_i \frac{f_i}{t_i^*} = \left( \sum_j f_j^{1/2} \right)^2. \tag{16} $$

Similarly we have

$$ \sigma_i^2 = \left. \frac{t^3}{2\lambda f_i} \right|_{t = t_i^*} = \frac{1}{2\lambda T_{\min}}\, \frac{f_i^{1/2}}{\sum_j f_j^{1/2}}. \tag{17} $$

If we compute the average synthesis time in this Gaussian distribution we find, to leading order in the variances $\sigma_i^2$,

$$ \langle T \rangle = T_{\min} + \sum_i \frac{f_i}{(t_i^*)^3}\, \sigma_i^2 + \cdots \tag{18} $$

$$ \phantom{\langle T \rangle} = T_{\min} + \frac{K}{2\lambda}. \tag{19} $$

Thus we have

$$ \frac{1}{2\lambda T_{\min}} = \frac{1}{K}\, \frac{\langle T \rangle - T_{\min}}{T_{\min}}. \tag{20} $$

Substituting into the entropy from Eq (13), we obtain

$$ S = \frac{1}{2} \sum_i \log_2\left[ \frac{2\pi e}{K}\, \frac{f_i^{1/2}}{\sum_j f_j^{1/2}}\, \frac{\langle T \rangle - T_{\min}}{T_{\min}} \right]. \tag{21} $$

This illustrates the basic tradeoff between synthesis time and entropy: if the cell wants to drive $\langle T \rangle \rightarrow T_{\min}$, then the entropy of the distribution of tRNA abundances must become smaller and smaller, corresponding to tighter control. To set the scale of this effect we compare with the entropy $S_0$ that is possible when we constrain the mean tRNA abundances but place no constraint on the synthesis times. This corresponds to the distribution in Eq (4) with $\lambda = 0$ and $\mu = K$; for this exponential distribution we can evaluate the entropy exactly,

$$ S_0 = K \log_2(e/K). \tag{22} $$

Finally, the difference between $S_0$ and $S$ is the information required to specify the tRNA concentrations,

$$ I = \frac{1}{2} \sum_i \log_2\left[ \frac{e}{2\pi K f_i^{1/2}} \left( \sum_j f_j^{1/2} \right) \frac{T_{\min}}{\langle T \rangle - T_{\min}} \right]\ \text{bits}. \tag{23} $$

Roughly speaking, holding the system within some desired distance of optimal synthesis rates requires keeping the variance of tRNA abundances small, proportional to the distance from the optimum. But entropies are related to (half) the log of the variance, and this is true for each of the codons, giving us the form of Eq (23).

FIG. 1 (axes: fractional abundance vs codon rank): Normalized codon frequencies, $\{f_i\}$, inferred from measurements of protein abundances in E coli under standard glucose conditions [14]. We show the mean (solid) ± one standard deviation (dashed) across the three separate experiments at nine timepoints during growth.
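The Gaussian-limit results in Eqs (15), (16), and (21)-(23) are simple enough to evaluate directly. The sketch below is one possible transcription into Python, using only the standard library. The function names and the parameter `excess`, standing for (⟨T⟩ − T_min)/T_min, are assumptions made for illustration, and the uniform codon frequencies in the usage lines are a toy input rather than the measured E coli values.

```python
import math


def optimal_abundances(f):
    """Eq (15): t*_i = sqrt(f_i) / sum_j sqrt(f_j)."""
    root_sum = sum(math.sqrt(fi) for fi in f)
    return [math.sqrt(fi) / root_sum for fi in f]


def minimum_time(f):
    """Eq (16): T_min = (sum_j sqrt(f_j))**2."""
    return sum(math.sqrt(fi) for fi in f) ** 2


def gaussian_entropy_bits(f, excess):
    """Eq (21), with excess = (<T> - T_min) / T_min."""
    if excess <= 0:
        raise ValueError("excess must be positive")
    K = len(f)
    root_sum = sum(math.sqrt(fi) for fi in f)
    return 0.5 * sum(
        math.log2(2 * math.pi * math.e / K * math.sqrt(fi) / root_sum * excess)
        for fi in f
    )


def unconstrained_entropy_bits(K):
    """Eq (22): S_0 = K log2(e / K)."""
    return K * math.log2(math.e / K)


def gaussian_information_bits(f, excess):
    """Eq (23): I = S_0 - S in the Gaussian (large lambda) limit."""
    if excess <= 0:
        raise ValueError("excess must be positive")
    K = len(f)
    root_sum = sum(math.sqrt(fi) for fi in f)
    return 0.5 * sum(
        math.log2(math.e * root_sum / (2 * math.pi * K * math.sqrt(fi)) / excess)
        for fi in f
    )


# Toy usage: uniform codon frequencies over K = 61 codons, 5% above T_min.
K = 61
f_uniform = [1.0 / K] * K
info = gaussian_information_bits(f_uniform, excess=0.05)
print(f"T_min = {minimum_time(f_uniform):.2f}, I = {info:.1f} bits")
```

These expressions inherit the limits of the Gaussian approximation, so the numbers are meaningful only when `excess` is small enough that each variance is much smaller than the corresponding squared optimal abundance.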
In more detail, we see that if the $f_i$ are uniform, then $\sum_j f_j^{1/2}$ cancels $K f_i^{1/2}$, and the log depends only on the distance from the optimum; the number of codons then simply scales the overall information.

In order to apply these ideas to real cells, we need to know the codon abundances $\{f_i\}$. It is easy to read the codons as they occur in the genome, but what matters here is the frequency with which they are used in making proteins. Recent measurements on the bacterium E coli survey the relative concentrations of all the expressed proteins under a variety of growth conditions [14], and we can use these results to estimate the codon abundances, with results shown in Fig 1. We see that the $f_i$ are far from uniform, varying over nearly two orders of magnitude, as known from earlier work [15, 16].

If we just substitute the observed $\{f_i\}$ into Eq (23), we find that getting within ∼5% of the optimum would require ∼100 bits of information. But Eq (23) is an approximate result, only as good as our Gaussian approximation. The condition for validity of this approximation is $\sigma_i \ll t_i^*$, or

$$ \frac{T_{\min}}{\langle T \rangle - T_{\min}} \gg \frac{1}{K f_i^{1/2}} \sum_j f_j^{1/2} \tag{24} $$

for all $i$. This condition is violated at $\langle T \rangle / T_{\min} \sim$ […] $Z(\lambda,\mu)$ in the $(\lambda,\mu)$ plane, evaluate derivatives by finite differences, impose the constraint in Eq (6), and then plot the information vs $\langle T \rangle$, parametrically in $\lambda$. The results are shown in Fig 2, including the Gaussian approximation for comparison.

We see that bringing the system within ∼10% of the optimal synthesis rate requires ∼50 bits of information, or roughly one bit for each codon. Even getting within a factor of two of the optimum requires ∼10 bits. One might worry that this is an overestimate, since we have assumed that each codon has its own complementary tRNA. If we collapse down to 38 tRNA species [17], however, for $\langle T \rangle \rightarrow T_{\min}$ we find only a 10−15% reduction in the required information.

It has been known for some time that the synthesis of fully functional tRNAs, charged with amino acids, involves many steps, all of which are subject to regulation [18], and hence there are many paths along which the required information could be transmitted. Each of these pathways will have a limited information capacity, and in the case of transcriptional regulation we know that this capacity is ∼ […]−[…] ∼ […]−50 bits of information along several pathways, but if real bacteria come close to their maximum translation rates then this may be possible only because they also come close to the information capacity of the relevant regulatory pathways.

We thank CG Callan, D DeMartino, and A Mayer for helpful discussions. This work was supported in part by the US National Science Foundation, through the
Biophysics: Searching for Principles (Princeton University Press, Princeton NJ, 2012).
[2] X Xia, How optimized is the translational machinery in E coli, S typhimurium, and S cerevisiae? Genetics
Escherichia coli optimize the economics of the translation process? J Theor Biol
Escherichia coli K-12 undergoes adaptive evolution to achieve in silico predicted optimal growth.
Nature
Nat Biotechnol
E coli . Phys Biol
Phys Rev E
Phys Rev E
Nature
PLoS Comput Biol e1003408 (2014). L Meshulam, JL Gauthier, CD Brody, DW Tank, and W Bialek, Collective behavior of place and non-place neurons in the hippocampal network.
Neuron
Proc Natl Acad Sci (USA)
Proc Natl Acad Sci (USA)
Proc Natl Acad Sci (USA)
PLoS One e28766 (2011).
[9] D De Martino, AMC Andersson, T Bergmiller, CC Guet, and G Tkačik, Statistical mechanics for metabolic networks during steady state growth. Nature Commun Phys Rev
Q Rev Biophys
Table of Integrals, Series, and Products. Seventh edition, edited by A Jeffrey and D Zwillinger (Academic Press, New York, 2007). See Eq (3.324.1).
[14] MU Caglar, JR Houser, CS Barnhart, DR Boutz, SM Carroll, A Dasgupta, WF Lenoir, BL Smith, V Sridhara, DK Sydykova, DV Wood, CJ Marx, EM Marcotte, JE Barrick, and CO Wilke, The E coli molecular phenotype under different growth conditions.
Sci Rep Biochem Soc Trans
Genetics $f_i$ that is the sum of the two original values. We carry out this projection from 61 codons to 38 tRNA species following Table 2 in H Dong, L Nilsson, and CG Kurland, J Mol Biol
Escherichia coli. Cell Proc Natl Acad Sci (USA)
Phys Rev E78,