Information costs in the control of protein synthesis

Abstract

Efficient protein synthesis depends on the availability of charged tRNA molecules. With 61 different codons, shifting the balance among the tRNA abundances can lead to large changes in the protein synthesis rate. Previous theoretical work has asked about the optimization of these abundances, and there is some evidence that regulatory mechanisms bring cells close to this optimum, on average. We formulate the tradeoff between the precision of control and the efficiency of synthesis, asking for the maximum entropy distribution of tRNA abundances consistent with a desired mean rate of protein synthesis. Our analysis, using data from E. coli, indicates that reasonable synthesis rates are consistent only with rather low entropies, so that the cell's regulatory mechanisms must encode a large amount of information about the "correct" tRNA abundances.

Rebecca J. Rousseau and William Bialek

Joseph Henry Laboratories of Physics, and Lewis–Sigler Institute for Integrative Genomics, Princeton University, Princeton NJ 08544
Department of Physics, California Institute of Technology, Pasadena CA 91125
Initiative for the Theoretical Sciences, The Graduate Center, City University of New York, 365 Fifth Ave, New York, NY 10016

(Dated: January 1, 2020)

In order to function efficiently, living cells need to take control over their internal chemistry. This involves adjusting the concentration of relevant molecules in relation to some goal, and requires transmitting information about the goal through the regulatory elements that control these concentrations. But in many biological regulatory mechanisms, information in turn is represented by the concentrations of other molecules, and these concentrations often are quite low, leading to physical limits on information transmission [1]. Here we explore the tradeoff between information and efficiency in the context of protein synthesis.

Protein synthesis requires transfer RNA (tRNA) molecules to arrive at the ribosome and dock with their complementary codons along the messenger RNA (mRNA). It was realized some time ago that maximizing the rate of protein synthesis requires matching tRNA abundances to codon usage, and there is some evidence that this happens, at least on average [2, 3]. A similar idea has been applied to bacterial metabolism as a whole, where the fluxes of individual biochemical steps can be tuned to maximize the conversion of nutrients into biomass [4]. But these discussions of optimization assume that the cell can fix molecular abundances with infinite precision. In a series of papers, De Martino and colleagues have constructed maximum entropy models for the distribution of metabolic fluxes that are consistent with a given mean rate of conversion into biomass [5]. As with networks of neurons [6], flocks of birds [7], families of protein sequences [8], and more, these maximum entropy models constrained by low order moments provide surprisingly accurate, quantitative descriptions of emergent behavior in the metabolic network [9].

Here we use the maximum entropy construction to analyze the information required for efficient protein synthesis. Concretely, we want to find the maximum entropy distribution of tRNA abundances that is consistent with a given mean rate of protein synthesis.
Our interest is not (immediately) in the distribution itself, but rather in the entropy. We recall that reductions in entropy correspond to a gain in information, and by definition the maximum entropy model gives the smallest entropy reduction needed to satisfy the constraints [10]. Thus our goal is to find the minimum amount of information required to specify the range of tRNA abundances that allow for protein synthesis at a given average rate. This discussion is in the same spirit as classical analyses of the tradeoffs among growth rate, accuracy of information transmission, and metabolic costs [11].

The time required for protein synthesis must be longer than the time for charged tRNA molecules to arrive at the ribosome, and for each codon this time is inversely proportional to the tRNA concentration; it seems likely that tRNA availability in fact is the rate–limiting factor in translation elongation [12]. We can choose units where the average of this time, normalized per codon, is

$$T = \sum_i \frac{f_i}{t_i}, \tag{1}$$

where $i$ indexes the $K$ different codons, $f_i$ is the fractional abundance of codon $i$ in the synthesized proteins, and $t_i$ is the abundance of the corresponding tRNA. Here we imagine that each codon has its own dedicated tRNA molecule, and return to the more realistic case below. Again we can choose units so that the mean total abundance of tRNA is normalized,

$$\sum_i \langle t_i \rangle = 1. \tag{2}$$

In general, if we fix the mean of several functions $f_\mu(\{t_i\})$ then the maximum entropy distribution is

$$P(\{t_i\}) = \frac{1}{Z(\{g_\mu\})} \exp\left[ -\sum_\mu g_\mu f_\mu(\{t_i\}) \right], \tag{3}$$

where the Lagrange multipliers $\{g_\mu\}$ must be set so that the expectation values $\langle f_\mu(\{t_i\}) \rangle$ satisfy the constraints we have set [10].
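As a small numerical illustration of Eq (1), the sketch below evaluates $T$ for a few ways of dividing a fixed total tRNA abundance among codons. It assumes fixed (non-fluctuating) abundances that sum to one, and the three-codon usage vector is a made-up example, not data from the paper; the function names `synthesis_time` and `normalize` are hypothetical helpers introduced only for this sketch.

```python
import math


def synthesis_time(f, t):
    """Mean time per codon, T = sum_i f_i / t_i (Eq 1), in the paper's units."""
    if len(f) != len(t):
        raise ValueError("f and t must have the same length")
    return sum(fi / ti for fi, ti in zip(f, t))


def normalize(x):
    """Rescale a list of positive numbers so that it sums to one."""
    total = sum(x)
    return [xi / total for xi in x]


# Hypothetical codon usage for K = 3 codons (illustrative numbers only).
f = normalize([0.5, 0.3, 0.2])

# Three candidate tRNA allocations, each normalized to total abundance 1.
allocations = {
    "uniform": normalize([1.0] * len(f)),
    "proportional to f": normalize(f),
    "proportional to sqrt(f)": normalize([math.sqrt(fi) for fi in f]),
}

for name, t in allocations.items():
    print(f"{name:>24s}: T = {synthesis_time(f, t):.4f}")
```

With these inputs the uniform allocation and the allocation proportional to $f_i$ both give $T = K = 3$, while the square-root allocation gives a smaller value, which shows how strongly $T$ depends on the balance among abundances even at fixed total.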
In our case, then,

$$P(\{t_i\}) = \frac{1}{Z(\lambda, \mu)} \exp\left[ -\lambda \sum_i \frac{f_i}{t_i} - \mu \sum_i t_i \right], \tag{4}$$

where $\lambda$ fixes the mean synthesis time $\langle T \rangle$ and $\mu$ fixes the mean total tRNA abundance.

We have the usual "thermodynamic" identities,

$$\langle T \rangle = -\frac{\partial \ln Z(\lambda, \mu)}{\partial \lambda}, \tag{5}$$

$$\sum_i \langle t_i \rangle = -\frac{\partial \ln Z(\lambda, \mu)}{\partial \mu} = 1, \tag{6}$$

and the entropy of the distribution is

$$S = \ln Z(\lambda, \mu) + \lambda \langle T \rangle + \mu. \tag{7}$$

Because the constraints we have imposed do not require correlations among the different tRNA abundances, we can write the partition function exactly as a product,

$$Z(\lambda, \mu) = \prod_i \int_0^\infty dt \, e^{-\phi_i(t)}, \tag{8}$$

$$\phi_i(t) = \lambda f_i / t + \mu t. \tag{9}$$

We notice that

$$\int_0^\infty dt \, e^{-\phi_i(t)} = 2 \sqrt{\lambda f_i / \mu} \, K_1\!\left( 2 \sqrt{\lambda \mu f_i} \right), \tag{10}$$

where $K_1(z)$ is the modified Bessel function of the second kind [13].

We are especially interested in constraints that are strong enough to drive $T$ close to its minimum value. In this limit, which is found at large $\lambda$, the distribution will be well approximated as a Gaussian around the minimum of each $\phi_i$,

$$P(\{t_i\}) = \prod_i \frac{1}{\sqrt{2 \pi \sigma_i^2}} \exp\left[ -\frac{(t_i - t_i^*)^2}{2 \sigma_i^2} \right], \tag{11}$$

where $t_i^*$ is the value of $t_i$ that minimizes $\phi_i(t)$, and

$$\frac{1}{\sigma_i^2} = \left. \frac{\partial^2 \phi_i(t)}{\partial t^2} \right|_{t = t_i^*}. \tag{12}$$

The entropy of this multidimensional Gaussian is then

$$S = \frac{1}{2} \sum_i \log_2 \left( 2 \pi e \sigma_i^2 \right) \ \text{bits}. \tag{13}$$

From Eq (9) we find that $t_i^* = \sqrt{\lambda f_i / \mu}$, and to obey Eq (2) we then must have

$$\mu = \lambda \left( \sum_i f_i^{1/2} \right)^2, \tag{14}$$

$$t_i^* = \frac{f_i^{1/2}}{\sum_j f_j^{1/2}}. \tag{15}$$

This scaling of the optimal tRNA abundances with the square–root of codon usage is familiar from previous work [2]. At these optimal abundances we find the minimum synthesis time,

$$T_{\min} = \sum_i \frac{f_i}{t_i^*} = \left( \sum_j f_j^{1/2} \right)^2. \tag{16}$$

Similarly we have

$$\sigma_i^2 = \left. \frac{t^3}{2 \lambda f_i} \right|_{t = t_i^*} = \frac{1}{2 \lambda T_{\min}} \, \frac{f_i^{1/2}}{\sum_j f_j^{1/2}}. \tag{17}$$

If we compute the average synthesis time in this Gaussian distribution we find, to leading order in the variances $\sigma_i^2$,

$$\langle T \rangle = T_{\min} + \sum_i \frac{f_i}{(t_i^*)^3} \sigma_i^2 + \cdots \tag{18}$$

$$= T_{\min} + \frac{K}{2 \lambda}. \tag{19}$$

Thus we have

$$\frac{1}{2 \lambda T_{\min}} = \frac{1}{K} \, \frac{\langle T \rangle - T_{\min}}{T_{\min}}. \tag{20}$$

Substituting into the entropy from Eq (13), we obtain

$$S = \frac{1}{2} \sum_i \log_2 \left[ \frac{2 \pi e}{K} \, \frac{f_i^{1/2}}{\sum_j f_j^{1/2}} \, \frac{\langle T \rangle - T_{\min}}{T_{\min}} \right]. \tag{21}$$

This illustrates the basic tradeoff between synthesis time and entropy: if the cell wants to drive $\langle T \rangle \to T_{\min}$, then the entropy of the distribution of tRNA abundances must become smaller and smaller, corresponding to tighter control. To set the scale of this effect we compare with the entropy that is possible when we constrain the mean tRNA abundances but place no constraint on the synthesis times. This corresponds to the distribution in Eq (4) with $\lambda = 0$ and $\mu = K$; for this exponential distribution we can evaluate the entropy exactly,

$$S_0 = K \log_2 (e / K). \tag{22}$$

Finally, the difference between $S_0$ and $S$ is the information required to specify the tRNA concentrations,

$$I = \frac{1}{2} \sum_i \log_2 \left[ \frac{e}{2 \pi K f_i^{1/2}} \left( \sum_j f_j^{1/2} \right) \frac{T_{\min}}{\langle T \rangle - T_{\min}} \right] \ \text{bits}. \tag{23}$$

Roughly speaking, holding the system within some desired distance of optimal synthesis rates requires keeping the variance of tRNA abundances small, proportional to the distance from the optimum. But entropies are related to (half) the log of the variance, and this is true for each of the codons, giving us the form of Eq (23).

[Figure 1. Axes: codon rank, fractional abundance.]
FIG. 1: Normalized codon frequencies, $\{f_i\}$, inferred from measurements of protein abundances in E coli under standard glucose conditions [14]. We show the mean (solid) ± one standard deviation (dashed) across the three separate experiments at nine timepoints during growth.
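The Gaussian-limit results in Eqs (15), (16), and (23) are simple enough to evaluate directly. The following is a minimal sketch, not the authors' code: the function names are hypothetical, the argument `excess` stands for $(\langle T \rangle - T_{\min}) / T_{\min}$, and the uniform codon usage over $K = 61$ codons is an assumption chosen only because Eq (23) then has a simple closed form to check against.

```python
import math


def optimal_abundances(f):
    """Optimal tRNA abundances t*_i = sqrt(f_i) / sum_j sqrt(f_j), Eq (15)."""
    roots = [math.sqrt(fi) for fi in f]
    total = sum(roots)
    return [r / total for r in roots]


def minimum_time(f):
    """Minimum synthesis time T_min = (sum_j sqrt(f_j))**2, Eq (16)."""
    return sum(math.sqrt(fi) for fi in f) ** 2


def gaussian_information_bits(f, excess):
    """Information in bits from Eq (23), valid only in the Gaussian limit.

    excess = (<T> - T_min) / T_min, the fractional distance from the optimum.
    """
    if excess <= 0:
        raise ValueError("excess must be positive")
    K = len(f)
    root_sum = sum(math.sqrt(fi) for fi in f)
    total = 0.0
    for fi in f:
        arg = math.e * root_sum / (2.0 * math.pi * K * math.sqrt(fi) * excess)
        total += 0.5 * math.log2(arg)
    return total


# Assumed uniform usage over K = 61 codons (illustration only, not measured data).
K = 61
f_uniform = [1.0 / K] * K
excess = 0.05  # within 5% of the minimum synthesis time

info = gaussian_information_bits(f_uniform, excess)
# For uniform f_i, Eq (23) reduces to (K/2) * log2(e / (2 * pi * excess)).
closed_form = 0.5 * K * math.log2(math.e / (2.0 * math.pi * excess))

print(f"T_min = {minimum_time(f_uniform):.3f}")
print(f"I (Eq 23) = {info:.2f} bits, closed form = {closed_form:.2f} bits")
```

For this uniform example $T_{\min} = K$ and the information at 5% excess comes out at about 95 bits; non-uniform $f_i$ change the sum codon by codon, and the result should only be trusted where the Gaussian approximation itself holds.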
In more detail, we see that if the $f_i$ are uniform, then $\sum_j f_j^{1/2}$ cancels $K f_i^{1/2}$, and the log depends only on the distance from the optimum; the number of codons then simply scales the overall information.

In order to apply these ideas to real cells, we need to know the codon abundances $\{f_i\}$. It is easy to read the codons as they occur in the genome, but what matters here is the frequency with which they are used in making proteins. Recent measurements on the bacterium E coli survey the relative concentrations of all the expressed proteins under a variety of growth conditions [14], and we can use these results to estimate the codon abundances, with results shown in Fig 1. We see that the $f_i$ are far from uniform, varying over nearly two orders of magnitude, as known from earlier work [15, 16].

If we just substitute the observed $\{f_i\}$ into Eq (23), we find that getting within ∼5% of the optimum would require ∼100 bits of information. But Eq (23) is an approximate result, only as good as our Gaussian approximation. The condition for validity of this approximation is $\sigma_i \ll t_i^*$, or

$$\frac{T_{\min}}{\langle T \rangle - T_{\min}} \gg \frac{1}{K f_i^{1/2}} \sum_j f_j^{1/2} \tag{24}$$

for all $i$. This condition is violated at $\langle T \rangle / T_{\min} \sim$ [...] $Z(\lambda, \mu)$ in the $(\lambda, \mu)$ plane, evaluate derivatives by finite differences, impose the constraint in Eq (6), and then plot the information vs $\langle T \rangle$, parametrically in $\lambda$. The results are shown in Fig 2, including the Gaussian approximation for comparison.

We see that bringing the system within ∼10% of the optimal synthesis rate requires ∼50 bits of information, or roughly one bit for each codon. Even getting within a factor of two of the optimum requires ∼10 bits. One might worry that this is an overestimate, since we have assumed that each codon has its own complementary tRNA. If we collapse down to 38 tRNA species [17], however, for $\langle T \rangle \to T_{\min}$ we find only a 10–15% reduction in the required information.

It has been known for some time that the synthesis of fully functional tRNAs, charged with amino acids, involves many steps, all of which are subject to regulation [18], and hence there are many paths along which the required information could be transmitted. Each of these pathways will have a limited information capacity, and in the case of transcriptional regulation we know that this capacity is ∼ [...] − [...] ∼ [...] − 50 bits of information along several pathways, but if real bacteria come close to their maximum translation rates then this may be possible only because they also come close to the information capacity of the relevant regulatory pathways.

[Figure 2. Axes: $\langle T \rangle / T_{\min}$, information (bits).]
FIG. 2: Information to specify tRNA abundances as a function of mean protein synthesis time per codon, with $\{f_i\}$ from Fig 1. Blue line is the Gaussian approximation, from Eq (23), and red line is the numerical result described in the text.

We thank CG Callan, D DeMartino, and A Mayer for helpful discussions. This work was supported in part by the US National Science Foundation, through the Center for the Physics of Biological Function (PHY–1734030), the Center for the Science of Information (CCF–0939370), and Grant PHY–1607612.

[1] W Bialek,
Biophysics: Searching for Principles (Princeton University Press, Princeton NJ, 2012).
[2] X Xia, How optimized is the translational machinery in E coli, S typhimurium, and S cerevisiae? Genetics [...]
[...] Escherichia coli optimize the economics of the translation process? J Theor Biol [...]
[...] Escherichia coli K-12 undergoes adaptive evolution to achieve in silico predicted optimal growth. Nature [...]
[...] Nat Biotechnol [...]
[...] E coli. Phys Biol [...]
[...] Phys Rev E [...]
[...] Nature [...]
[...] PLoS Comput Biol e1003408 (2014). L Meshulam, JL Gauthier, CD Brody, DW Tank, and W Bialek, Collective behavior of place and non-place neurons in the hippocampal network. Neuron [...]
[...] Proc Natl Acad Sci (USA) [...]
[...] Proc Natl Acad Sci (USA) [...]
[...] PLoS One e28766 (2011).
[9] D De Martino, AMC Andersson, T Bergmiller, CC Guet, and G Tkačik, Statistical mechanics for metabolic networks during steady state growth. Nature Commun [...]
[...] Phys Rev [...]
[...] Q Rev Biophys [...]
[...] Table of Integrals, Series, and Products. Seventh edition, edited by A Jeffrey and D Zwillinger (Academic Press, New York, 2007). See Eq (3.324.1).
[14] MU Caglar, JR Houser, CS Barnhart, DR Boutz, SM Carroll, A Dasgupta, WF Lenoir, BL Smith, V Sridhara, DK Sydykova, DV Wood, CJ Marx, EM Marcotte, JE Barrick, and CO Wilke, The E coli molecular phenotype under different growth conditions. Sci Rep [...]
[...] Biochem Soc Trans [...]
[...] Genetics [...]
[...] $f_i$ that is the sum of the two original values. We carry out this projection from 61 codons to 38 tRNA species following Table 2 in H Dong, L Nilsson, and CG Kurland, J Mol Biol [...]
[...] Escherichia coli. Cell [...]
[...] Proc Natl Acad Sci (USA) [...]
[...] Phys Rev E 78,