The free energy requirements of biological organisms; implications for evolution
David H. Wolpert∗
Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA
http://davidwolpert.weebly.com
(Dated: October 4, 2018)

Recent advances in nonequilibrium statistical physics have provided unprecedented insight into the thermodynamics of dynamic processes. The author recently used these advances to extend Landauer's semi-formal reasoning concerning the thermodynamics of bit erasure, to derive the minimal free energy required to implement an arbitrary computation. Here, I extend this analysis, deriving the minimal free energy required by an organism to run a given (stochastic) map π from its sensor inputs to its actuator outputs. I use this result to calculate the input-output map π of an organism that optimally trades off the free energy needed to run π with the phenotypic fitness that results from implementing π. I end with a general discussion of the limits imposed on the rate of the terrestrial biosphere's information processing by the flux of sunlight on the Earth.

I. INTRODUCTION
It is a truism that biological systems acquire and store information about their environments [1–5]. However, they do not just store information; they also process that information. In other words, they perform computation. The energetic consequences for biological systems of these three processes (acquiring, storing, and processing information) are becoming the focus of an increasing body of research [6–15]. In this paper, I further this research by analyzing the energetic resources that an organism needs in order to compute in a fitness-maximizing way.

Ever since Landauer's seminal work [16–26], it has been appreciated that the laws of statistical physics impose lower bounds on how much thermodynamic work must be done on a system in order for that system to undergo a two-to-one map, e.g., to undergo bit erasure. By conservation of energy, that work must ultimately be acquired from some external source (e.g., sunlight, carbohydrates, etc.). If that work on the system is eventually converted into heat that is dumped into an external heat bath, then the system acts as a heater. In the context of biology, this means that whenever a biological system (deterministically) undergoes a two-to-one map, it must use free energy from an outside source to do so and produces heat as a result.

These early analyses led to a widespread belief that there must be strictly positive lower bounds on how much free energy is required to implement any deterministic, logically-irreversible computation. Indeed, Landauer wrote "...logical irreversibility is associated with physical irreversibility and requires a minimal heat generation" [16].
In the context of biology, such bounds would translate to a lower limit on how much free energy a biological system must "harvest" from its environment in order to implement any particular (deterministic) computation, not just bit erasure.

∗ Massachusetts Institute of Technology; Arizona State University

A related conclusion of these early analyses was that a one-to-two map, in which noise is added to a system that is initially in one particular state with probability one, can act as a refrigerator, rather than a heater, removing heat from the environment [16, 20–22]. Formally, the minimal work that needs to be done on a system in order to make it undergo a one-to-two map is negative. So for example, if the system is coupled to a battery that stores free energy, a one-to-two map can "power the battery", by gaining free energy from a heat bath rather than dumping it there. To understand this intuitively, suppose we have a two-state system that is initially in one particular state with probability one. Therefore, the system initially has low entropy. That means we can connect it to a heat bath and then have it do work on a battery (assuming the battery was initially at less than maximum storage), thereby transferring energy from the heat bath into that battery. As it does this, though, the system gets thermalized, i.e.
, undergoes a one-to-two map (as a concrete example, this is what happens in adiabatic demagnetization of an Ising spin system [16]).

This possibility of gaining free energy by adding noise to a computation, or at least reducing the amount of free energy the computation needs, means that there is a trade-off in biology: on the one hand, there is a benefit to having biological computation that is as precise as possible, in order to maximize the behavioral fitness that results from that computation; on the other hand, there is a benefit to having the computation be as imprecise as possible, in order to minimize the amount of free energy needed to implement that computation. This tradeoff raises the intriguing possibility that some biological systems have noisy dynamics "on purpose", as a way to maintain high stores of free energy. For such a system, the noise would not be an unavoidable difficulty to be overcome, but rather a resource to be exploited.

More recently, there has been dramatic progress in our understanding of non-equilibrium statistical physics and its relation to information-processing [27–43]. Much of this recent literature has analyzed the minimal work required to drive a physical system's (fine-grained) microstate dynamics during the interval from t = 0 to t = 1 in such a way that the associated dynamics over some space of (coarse-grained) macrostates V is given by a specified Markov kernel π. In particular, there has been detailed analysis of the minimal work needed when there are only two macrostates, v = 0 and v =
1, and we require that both get mapped by π to the macrostate v = 0. If we identify the macrostates v ∈ V as Information Bearing Degrees of Freedom (IBDF) [22] of an information-processing device like a digital computer, these analyses can be seen as elaborations of the analyses of Landauer et al. on the thermodynamics of bit erasure. Recently, these analyses of maps over binary spaces V have been applied to explicitly biological systems, at least for the special case of a periodic forcing function [14].

These analyses have resulted in substantial clarifications of Landauer's semiformal reasoning, arguably overturning it in some regards. For example, this analysis has shown that the logical (ir)reversibility of π has nothing to do with the thermodynamic (ir)reversibility of a system that implements π. In particular, it is possible to implement bit erasure (which is logically irreversible) in a thermodynamically-reversible manner. In the modern understanding, there is no irreversible increase of entropy in bit erasure. Instead, there is a minimal amount of thermodynamic work that needs to be expended in a (thermodynamically reversible) implementation of bit erasure (see Example 3 below).

Many of these previous analyses consider processes for implementing π that are tailored for some specific input distribution over the macrostates, P(v_t). Such processes are designed to be thermodynamically reversible when run on P(v_t). However, when run on a distribution other than P(v_t), they are thermodynamically irreversible, resulting in wasted (dissipated) work.
For example, in [45], the amount of work required to implement π depends on an assumption for ε, the probability of a one in a randomly-chosen position on the bit string.

In addition, important as they are, these recent analyses are not applicable to arbitrary maps π over a system's macrostates. For example, as discussed in [46], the "quench-based" devices analyzed in [36, 38, 44] can only implement maps whose output is independent of its input (as an example, the output of bit erasure, an erased bit, is independent of the original state of the bit).

Similarly, the devices considered in [45, 47] combine a "tape" containing a string of bits with a "tape head" that is positioned above one of the bits on the tape. In each iteration of the system, the bit currently under the tape head undergoes an arbitrary map to produce a new bit value, and then, the tape is advanced so that the system is above the next bit. Suppose that, inspired by [48], we identify the state of the IBDF of the overall tape-based system as the entire bit string, aligned so that the current tape position of the read/write subsystem is above Bit zero. In other words, we would identify each state of the IBDF as an aligned bit string {v_i : i = ..., −1, 0, ..., N}, where N is the number of bits that have already been processed, and the (negative) minimal index could either be finite or infinite (note that unless we specify which bit of the string is the current one, i.e., which has index zero, the update map over the string is not defined).

This tape-based system is severely restricted in the set of computations it can implement on its IBDF. For example, because the tape can only move forward, the system cannot deterministically map an IBDF state v = {..., v_{−1}, v_0, v_1, ..., v_N} to an IBDF state v′ = {..., v′_{−1}, v′_0, v′_1, ..., v′_{N−1}}. (In [49], the tape can rewind. However, such rewinding only arises due to thermal fluctuations and therefore does not overcome the problem.)
It should be possible to extend either the quench-based devices reviewed in [38] or the tape-based device introduced in [45] into a system that could perform arbitrary computation. In fact, in [46], I showed how to extend quench-based devices into systems that could perform arbitrary computation in a purely thermodynamically-reversible manner. This allowed me to calculate the minimal work that any system needs to implement any given conditional distribution π. To be precise, I showed how for any π and initial distribution P(v_t), one could construct:

• a physical system S;

• a process Λ running over S;

• an associated coarse-grained set V giving the macrostates of S;

such that:

• running Λ on S ensures that the distribution across V changes according to π, even if the initial distribution differs from P(v_t);

• Λ is thermodynamically reversible if applied to P(v_t).

By the second law, no process can implement π on P(v_t) with less work than Λ requires. Therefore, by calculating the amount of work required by Λ, we calculate a lower bound on how much work is required to run π on P(v_t). In the context of biological systems, that bound is the minimal amount of free energy that any organism must extract from its external environment in order to run π.

However, just like in the systems considered previously in the literature, this Λ is thermodynamically optimized for that initial distribution P(v_t). It would be thermodynamically irreversible (and therefore dissipate work) if used for any other initial distribution. In the context of biological systems, this means that while natural selection may produce an information-processing organism that is thermodynamically optimal in one environment, it cannot produce one that is thermodynamically optimal in all environments.

Biological systems are not only information-processing systems, however. As mentioned above, they also acquire information from their environment and store it.
Many of these processes have nonzero minimal thermodynamic costs, i.e., the system must acquire some minimal free energy to implement them. In addition, biological systems often rearrange matter, thereby changing its entropy. Sometimes, these systems benefit by decreasing entropy, but sometimes, they benefit by increasing entropy, e.g., as when cells use depletion forces, when they exploit osmotic pressures, etc. This is another contribution to their free energy requirements. Of course, biological systems also typically perform physical "labor", i.e., change the expected energy of various systems, by breaking/making chemical bonds, and on a larger scale, moving objects (including themselves), developing, growing, etc. They must harvest free energy from their environment to power this labor, as well. Some biological processes even involve several of these phenomena simultaneously, e.g., a biochemical pathway that processes information from the environment, making and breaking chemical bonds as it does so and also changing its overall entropy.

In this paper, I analyze some of these contributions to the free energy requirements of biological systems and the implications of those costs for natural selection. The precise contributions of this paper are:

1. Motivated by the example of a digital computer, the analysis in [46] was formulated for systems that change the value v of a single set of physical variables, V. Therefore, for example, as formulated there, bit erasure means a map that sends both v_t = 0 and v_t = 1 to v_{t+1} = 0. In this paper, by contrast, I consider systems with an "input" set of physical variables, X, representing the state of a sensor, and a separate set of "output" physical variables, Y, representing the action taken by the organism in response to its sensor reading. Therefore, as formulated in this paper, "bit erasure" means a map π that sends both x_t = 0 and x_t = 1 to y_{t+1} =
0. My first contribution is to show how to implement any given stochastic map X → Y with a process that requires minimal work if it is applied to some specified distribution over X and to calculate that minimal work.

2. In light of the free energy costs associated with implementing a map π, what π would we expect to be favored by natural selection? In particular, recall that adding noise to a computation can result in a reduction in how much work is needed to implement it. Indeed, by using a sufficiently noisy π, an organism can increase its stored free energy (if it started in a state with less than maximal entropy). Therefore, noise might not just be a hindrance that an organism needs to circumvent; an organism may actually exploit noise, to "recharge its battery". This implies that an organism will want to implement a "behavior" π that is as noisy as possible.

In addition, not all terms in a map x_t → y_{t+1} are equally important to an organism's reproductive fitness. It will be important to be very precise in what output is produced for some inputs x_t, but for other inputs, precision is not so important. Indeed, for some inputs, it may not matter at all what output the organism produces in response. In light of this, natural selection would be expected to favor organisms that implement behaviors π that are as noisy as possible (thereby saving on the amount of free energy the organism needs to acquire from its environment to implement that behavior), while still being precise for those inputs where behavioral fitness requires it. I write down the equations for what π optimizes this tradeoff and show that it is approximated by a Boltzmann distribution over a sum of behavioral fitness and energy. I then use that Boltzmann distribution to calculate a lower bound on the maximal reproductive fitness over all possible behaviors π.

3.
My last contribution is to use the preceding results to relate the free energy flux incident on the entire biosphere to the maximal "rate of computation" implemented by the biosphere. This relation gives an upper bound on the rate of computation that humanity as a whole can ever achieve, if it restricts itself to the surface of Earth.

In Section II, I first review some of the basic quantities considered in nonequilibrium statistical physics and then review some of the relevant recent work in nonequilibrium statistical physics (involving "quenching processes") related to the free energy cost of computation. I then discuss the limitations in what kind of computations that recent work can be used to analyze. I end by presenting an extension to that recent work that does not have these limitations (involving "guided quenching processes"). In Section III, I use this extension to calculate the minimal free energy cost of any given input-output "organism". I end this section by analyzing a toy model of the role that this free energy cost would play in natural selection. Those interested mainly in these biological implications can skip Section II and should still be able to follow the thrust of the analysis.

In this paper, I extend the construction reviewed in [38] to show how to construct a system to perform any given computation in a thermodynamically reversible manner. (It seems likely that the tape-based system introduced in [45] could also be extended to do this.)

II. FORMAL PRELIMINARIES

A. General Notation
I write |X| for the cardinality of any countable space X. I will write the Kronecker delta between any two elements x, x′ ∈ X as δ(x, x′). For any logical condition ζ, I(ζ) = 1 if ζ is true and I(ζ) = 0 if it is false. When referring generically to any probability distribution, I will write "Pr". Given any distribution p defined over some space X, I write the Shannon entropy for countable X, measured in nats, as:

S_p(X) = −Σ_{x∈X} p(x) ln[p(x)]     (1)

As shorthand, I sometimes write S_p(X) as S(p) or even just S(X) when p is implicit. I use similar notation for conditional entropy, joint entropy of more than one random variable, etc. I also write the mutual information between two random variables X and Y in the usual way, as I(X; Y) [50–52].

Given a distribution q(x) and a conditional distribution π(x′|x), I will use matrix notation to define the distribution πq:

[πq](x′) = Σ_x π(x′|x) q(x)     (2)

For any function F(x) and distribution P(x), I write:

E_P(F) = Σ_x F(x) P(x)     (3)

I will also sometimes use capital letters to indicate variables that are marginalized over, e.g., writing:

E_P(F(X, y)) = Σ_x P(x) F(x, y)     (4)

Below, I often refer to a process as "semi-static". This means that these processes transform one Hamiltonian into another one so slowly that the associated distribution is always close to equilibrium, and as a result, only infinitesimal amounts of dissipation occur during the entire process.
For this assumption to be valid, the implicit units of time in the analysis below must be sufficiently long on the timescale of the relaxation processes of the physical systems involved (or equivalently, those relaxation processes must be sufficiently quick when measured in those time units).

If a system with states x is subject to a Hamiltonian H(x), then the associated equilibrium free energy is:

F_eq(H) ≡ −β^{−1} ln[Z_H(β)]     (5)

where as usual β ≡ 1/(kT), and the partition function is:

Z_H(β) = Σ_x exp[−βH(x)]     (6)

However, the analysis below focuses on nonequilibrium distributions p(x), for which the more directly relevant quantity is the nonequilibrium free energy, in which the distribution need not be a Boltzmann distribution for the current Hamiltonian:

F_neq(H, p) ≡ E_p(H) − kT S(p)
            = Σ_x p(x) H(x) + kT Σ_x p(x) ln[p(x)]     (7)

where k is Boltzmann's constant. For fixed H and T, F_neq(H, p) is minimized by the associated Boltzmann distribution p, for which it has the value F_eq(H). It will be useful below to consider the changes in nonequilibrium free energy that accompany a change from a distribution P to a distribution M accompanied by a change from a Hamiltonian H to a Hamiltonian H′:

∆F^{H,H′}_neq(P, M) ≡ F_neq(H′, M) − F_neq(H, P)     (8)

B. Thermodynamically-Optimal Processes
If a process Λ maps a distribution P to a distribution M thermodynamically reversibly, then the amount of work it uses when applied to P is ∆F^{H,H′}_neq(P, M) [38, 48, 53, 54]. In particular, ∆F^{H,H′}_neq(P, πP) is the amount of work used by a thermodynamically-reversible process Λ that maps a distribution P to πP. Equivalently, the negative of that quantity is the amount of work that is extracted by Λ when transforming P to πP.

In addition, by the second law, there is no process that maps P to M while requiring less work than a thermodynamically-reversible process that maps P to M. This motivates the following definition.

Definition 1.
Suppose a system undergoes a process Λ that starts with Hamiltonian H and ends with Hamiltonian H′. Suppose as well that:

1. at both the start and finish of Λ, the system is in contact with a (single) heat bath at temperature T;

2. Λ transforms any starting distribution P to an ending distribution πP, where neither of those two distributions need be at equilibrium for their respective Hamiltonians;

3. Λ is thermodynamically reversible when run on some particular starting distribution P.

Then, Λ is thermodynamically optimal for the tuple (P, π, H, H′).

Example 1.
Suppose we run a process over a space X × Y, transforming the t = 0 distribution q(x)M(y) to a t = 1 distribution p(x)M(y). Therefore, x and y are statistically independent at both the beginning and the end of the process, and while the distribution over x undergoes a transition from q → p, the distribution over y undergoes a cyclic process, taking M → M (note that it is not assumed that the ending and starting y's are the same or that x and y are independent at times between t = 0 and t = 1).

Suppose further that at both the beginning and end of the process, there is no interaction Hamiltonian, i.e., at those two times:

H(x, y) = H_X(x) + H_Y(y)     (9)

Then, no matter how x and y are coupled during the process, no matter how smart the designer of the process, the process will require work of at least:

∆F^{H,H}_neq(q, p) = (E_p(H_X) − E_q(H_X)) − kT (S(p) − S(q))     (10)

Note that this amount of work is independent of M.
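To make Equation (10) concrete, the bound can be evaluated numerically. The following is a minimal sketch (the two-state energy function, the distributions, and the choice kT = 1 are illustrative assumptions, not taken from the text):

```python
import math

def entropy_nats(p):
    """Shannon entropy in nats, as in Equation (1)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def expectation(H, p):
    """Expectation value E_p(H), as in Equation (3)."""
    return sum(Hi * pi for Hi, pi in zip(H, p))

def min_work(H_X, q, p, kT=1.0):
    """Right-hand side of Equation (10):
    (E_p(H_X) - E_q(H_X)) - kT (S(p) - S(q))."""
    return (expectation(H_X, p) - expectation(H_X, q)) \
        - kT * (entropy_nats(p) - entropy_nats(q))

# Toy two-state X with uniform energies, so only the entropy change matters.
H_X = [0.0, 0.0]
q = [0.5, 0.5]   # initial distribution over x: maximal entropy
p = [1.0, 0.0]   # final distribution over x: a delta function (bit erasure)

print(min_work(H_X, q, p))   # → 0.6931471805599453, i.e., kT ln 2
```

With a nonuniform H_X the energy-difference term in Equation (10) also contributes, and for a one-to-two map starting from a low-entropy state the minimal work is negative, in line with the "refrigerator" discussion in the Introduction.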
As a cautionary note, the work expended by any process operating on any initial distribution p(x) is the average of the work expended on each x. However, the associated change in nonequilibrium free energy is not the average of the change in nonequilibrium free energy for each x. This is illustrated in the following example.

Example 2.
Suppose we have a process Λ that sends each initial x to an associated final distribution π(x′|x), while transforming the initial Hamiltonian H into the final Hamiltonian H′. Write W^Λ_{H,H′,π}(x) for the work expended by Λ when it operates on the initial state x. Then, the work expended by Λ operating on an initial distribution p(x) is Σ_x p(x) W^Λ_{H,H′,π}(x). In particular, choose the process Λ so that it sends p → πp with minimal work. Then:

Σ_x p(x) W^Λ_{H,H′,π}(x) = ∆F^{H,H′}_neq(p, πp)     (11)

However, this does not equal the average over x of the associated changes to nonequilibrium free energy, i.e.,

∆F^{H,H′}_neq(p, πp) = F_neq(H′, πp) − F_neq(H, p)
                      ≠ Σ_x p(x) [F_neq(H′, π(Y|x)) − F_neq(H, δ(X, x))]     (12)

(where δ(X, x) is the distribution over X that is a delta function at x). The reason is that the entropy terms in those two nonequilibrium free energies are not linear; in general, for any probability distribution Pr(x),

Σ_x Pr(x) ln[Pr(x)] ≠ Σ_x Pr(x) Σ_{x′} δ(x′, x) ln[δ(x′, x)]     (13)

I now summarize what will be presented in the rest of this section.

Previous work showed how to construct a thermodynamically-optimal process for many tuples (p, π, H, H′). In particular, as discussed in the Introduction, it is known how to construct a thermodynamically-optimal process for any tuple (p, π, H, H′) where π(x′|x) is independent of x, like bit erasure. Accordingly, we know the minimal work necessary to run any such tuple. In Section II C, I review this previous analysis and show how to apply it to the kinds of input-output systems considered in this paper.

However, as discussed in the Introduction, until recently, it was not known whether one could construct a thermodynamically-optimal process for any tuple (p, π, H, H′).
In particular, given an arbitrary pair of an initial distribution p and conditional distribution π, it was not known whether there is a process Λ that is thermodynamically optimal for (p, π, H, H′) for some H and H′. This means that it was not known what the minimal needed work is to apply an arbitrary stochastic map π to an arbitrary initial distribution p. In particular, it was not known if we could use the difference in nonequilibrium free energy between p and πp to calculate the minimal work needed to apply a computation π to an initial distribution p.

This shortcoming was overcome in [46], where it was explicitly shown how to construct a thermodynamically-optimal process for any tuple (p, π, H, H′). In Section II D, I show in detail how to construct such processes for any input-output system.

Section II D also discusses the fact that a process that is thermodynamically optimal for (p, π, H, H′) need not be thermodynamically optimal for (p′, π, H, H′) if p′ ≠ p. Intuitively, if we construct a process Λ that results in minimal required work for initial distribution p and conditional distribution π, but then apply that machine to a different distribution p′ ≠ p, then in general, work is dissipated. While that Λ is thermodynamically reversible when applied to p, in general, it is not thermodynamically reversible when applied to p′ ≠ p. As an example, if we design a computer to be thermodynamically reversible for input distribution p, but then use it with a different distribution of inputs, then work is dissipated.

In a biological context, this means that if an organism is "designed" not to dissipate any work when it operates in an environment that produces inputs according to some p, but instead finds itself operating in an environment that produces inputs according to some p′ ≠ p, then it will dissipate extra work. That dissipated work is wasted since it does not change π, i.e.
, has no consequences for the input-output map that the organism implements. However, by the conservation of energy, that dissipated work must still be acquired from some external source. This means that the organism will need to harvest free energy from its environment at a higher rate (to supply that dissipated work) than would an organism that were "designed" for p′.

C. Quenching Processes
A special kind of process, often used in the literature, can be used to transform any given initial nonequilibrium distribution into another given nonequilibrium distribution in a thermodynamically-reversible manner. These processes begin by quenching the Hamiltonian of a system. After that, the Hamiltonian is isothermally and quasi-statically changed, with the system in continual contact with a heat bath at a fixed temperature T. The process ends by applying a reverse quench to return to the original Hamiltonian (see [36, 38, 44] for discussion of these kinds of processes).

More precisely, such a Quenching (Q) process applied to a system with microstates r ∈ R is defined by:

1. an initial/final Hamiltonian H^t_sys(r);

2. an initial distribution ρ_t(r);

3. a final distribution ρ_{t+1}(r);

and involves the following three steps:

(i) To begin, the system has Hamiltonian H^t_sys(r), which is quenched into a first quenching Hamiltonian:

H^t_quench(r) ≡ −kT ln[ρ_t(r)]     (14)

In other words, the Hamiltonian is changed from H^t_sys to H^t_quench too quickly for the distribution over r to change from ρ_t(r).

Because the quench is effectively instantaneous, it is thermodynamically reversible and is adiabatic, involving no heat transfer between the system and the heat bath. On the other hand, while r is unchanged in a quench and, therefore, so is the distribution over R, in general, work is required if H^t_quench ≠ H^t_sys (see [32, 33, 53, 54]).

Note that if the Q process is applied to the distribution ρ_t, then at the end of this first step, the distribution is at thermodynamic equilibrium. However, if the process is applied to any other distribution, this will not be the case. In this situation, work is unavoidably dissipated in the next step.

(ii) Next, we isothermally and quasi-statically transform H^t_quench to a second quenching Hamiltonian,

H^{t+1}_quench(r) ≡ −kT ln[ρ_{t+1}(r)]     (15)

Physically, this means two things.
First, that a smooth sequence of Hamiltonians, starting with H^t_quench and ending with H^{t+1}_quench, is applied to the system. Second, that while that sequence is being applied, the system is coupled with an external heat bath at temperature T, where the relaxation timescales of that coupling are arbitrarily small on the time scale of the dynamics of the Hamiltonian. This second requirement ensures that to first order, the system is always in thermal equilibrium for the current Hamiltonian, assuming it started in equilibrium at the beginning of the step (recall from Section II A that I assume that quasi-static transformations occur in an arbitrarily small amount of time, since the relaxation timescales are arbitrarily short).

(iii) Next, we run a quench over R "in reverse", instantaneously replacing the Hamiltonian H^{t+1}_quench(r) with the initial Hamiltonian H^t_sys, with no change to r. As in step (i), while work may be done (or extracted) in step (iii), no heat is transferred.

Note that we can specify any Q process in terms of its first and second quenching Hamiltonians rather than in terms of the initial and final distributions, since there is a bijection between those two pairs. This central role of the quenching Hamiltonians is the basis of the name "Q" process (I distinguish the distribution ρ that defines a Q process, which is instantiated in the physical structure of a real system, from the actual distribution P on which that physical system is run).

Both the first and third steps of any Q process are thermodynamically reversible, no matter what distribution that process is applied to. In addition, if the Q process is applied to ρ_t, the second step will be thermodynamically reversible.
Therefore, as discussed in [36, 38, 48, 54], if the Q process is applied to ρ_t, then the expected work expended by the process is given by the change in nonequilibrium free energy in going from ρ_t(r) to ρ_{t+1}(r),

∆F^{H^t_sys, H^t_sys}_neq(ρ_t, ρ_{t+1}) = E_{ρ_{t+1}}(H^t_sys) − E_{ρ_t}(H^t_sys) + kT [S(ρ_t) − S(ρ_{t+1})]     (16)

Note that because of how H^t_quench and H^{t+1}_quench are defined, there is no change in the nonequilibrium free energy during the second step of the Q process if it is applied to ρ_t:

E_{ρ_{t+1}}(H^{t+1}_quench) − E_{ρ_t}(H^t_quench) + kT [S(ρ_t) − S(ρ_{t+1})] = 0     (17)

I end this subsection with the following example of a Q process:

Example 3.
Suppose that R is partitioned into two bins, i.e., there are two macrostates. For both t = 0 and t = 1, and for both partition elements v, with abuse of notation, define:

P_t(v) ≡ Σ_{r∈v} ρ_t(r)     (18)

so that:

ρ_t(r) = Σ_v P_t(v) ρ_t(r|v)     (19)

Consider the case where P_0(v) has full support, but P_1(v) = δ(v, 0). Therefore, the dynamics over the macrostates (bins) from t = 0 to t = 1 sends both v's to zero. In other words, it erases a bit.

For pedagogical simplicity, take H^0_sys = H^1_sys to be uniform. Then, plugging in to Equation (16), we see that the minimal work is:

kT [S(ρ_0) − S(ρ_1)]
= kT [ S(P_0) + Σ_v P_0(v) ( −Σ_r ρ_0(r|v) ln[ρ_0(r|v)] ) ] − kT [ S(P_1) + Σ_v P_1(v) ( −Σ_r ρ_1(r|v) ln[ρ_1(r|v)] ) ]
= kT [ S(P_0) + S_0(R|V) ] − kT [ S(P_1) + S_1(R|V) ]
= kT [ S(P_0) + S_0(R|V) − S_1(R|V) ]     (20)

where the last step uses S(P_1) = 0, since P_1 is a delta function (the two terms S_t(R|V) are sometimes called "internal entropies" in the literature [38]).

In the special case that P_0(v) is uniform and that S_t(R|v) is the same for both t and both v, we recover Landauer's bound, kT ln(2), as the minimal amount of work needed to erase the bit. Note though that outside of that special case, Landauer's bound does not give the minimal amount of work needed to erase a bit. Moreover, in all cases, the limit in Equation (20) is on the amount of work needed to erase the bit; a bit can be erased with zero dissipated work, pace Landauer. For this reason, the bound in Equation (20) is sometimes called the "generalized Landauer cost" in the literature [38].

On the other hand, suppose that we build a device to implement a Q process that achieves the bound in Equation (20) for one particular initial distribution over the value of the bit, G(v).
Therefore, in particular, that device has "built into it" first and second quenching Hamiltonians given by:

H_quench^0(r) = −kT ln[G_0(r)]   (21)

H_quench^1(r) = −kT ln[G_1(r)]   (22)

respectively, where:

G_0(r) ≡ Σ_v G_0(v) ρ_0(r | v)   (23)

G_1(r) ≡ ρ_1(r | v = 0)   (24)
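Because the quenching Hamiltonians (21)–(24) are built from a particular prior G_0, a device constructed this way is tuned to that prior; when the actual macrostate distribution P differs from G, the minimal dissipated work is the drop in the Kullback–Leibler divergence between G_t and P_t from t = 0 to t = 1 [46]. A minimal numeric sketch, with hypothetical distributions:

```python
import numpy as np

kT = 1.0

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in nats."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / q[m])))

# Hypothetical microstate structure for a one-bit system.
rho0_r_given_v = np.array([[0.6, 0.4, 0.0, 0.0],   # rho_0(r | v = 0)
                           [0.0, 0.0, 0.5, 0.5]])  # rho_0(r | v = 1)
G0_v = np.array([0.5, 0.5])   # prior the device was designed for
P0_v = np.array([0.9, 0.1])   # macrostate distribution actually encountered

G0_r = G0_v @ rho0_r_given_v  # Equation (23)
P0_r = P0_v @ rho0_r_given_v
# After erasure both distributions end in the same rho_1(r | v = 0), so
# D(P_1 || G_1) = 0 and the dissipated work is the drop in KL divergence:
W_diss = kT * (kl(P0_r, G0_r) - 0.0)
print(W_diss)   # positive whenever P_0 differs from G_0
```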
If we then apply that device with a different initial macrostate distribution, P_0(v) ≠ G_0(v), in general, work will be dissipated in step (ii) of the Q process, because P_0(r) = Σ_v P_0(v) ρ_0(r | v) will not be an equilibrium distribution for H_quench^0. In the context of biology, if a bit-erasing organism is optimized for one environment, but is then used in a different one, it will necessarily be inefficient, dissipating work (the minimal amount of work dissipated is given by the drop in the value of the Kullback–Leibler divergence between G_t and P_t as the system develops from t = 0 to t = 1; see [46]).

D. Guided Q Processes
Soon after the quasi-static transformation step of any Q process begins, the system is thermally relaxed. Therefore, all information about r_t, the initial value of the system's microstate, is quickly removed from the distribution over r (phrased differently, that information has been transferred into inaccessible degrees of freedom in the external heat bath). This means that the second quenching Hamiltonian cannot depend on the initial value of the system's microstate; after that thermal relaxation of the system's microstate, there is no degree of freedom in the microstate that has any information concerning the initial microstate. This means that after the relaxation, there is no degree of freedom within the system undergoing the Q process that can modify the second quenching Hamiltonian based on the value of the initial microstate.

As a result, by itself, a Q process cannot change an initial distribution in a way that depends on that initial distribution. In particular, it cannot map different initial macrostates to different final macrostates (formally, a Q process cannot map a distribution with support restricted to the microstates in the macrostate v_t to one final distribution and map a distribution with support restricted to the macrostate v′_t ≠ v_t to a different final distribution).

On the other hand, both quenching Hamiltonians of a Q process running on a system R with microstates r ∈ R can depend on s_t ∈ S, the initial microstate of a different system, S. Loosely speaking, we can run a process over the joint system R × S that is thermodynamically reversible and whose effect is to implement a different Q process over R, depending on the value s_t.
In particular, we can "coarse-grain" such dependence on s_t: given any partition over S whose elements are labeled by v ∈ V, it is possible that both quenching Hamiltonians of a Q process running on R are determined by the macrostate v_t.

More precisely, a Guided Quenching (GQ) process over R guided by V (for conditional distribution π and initial distribution ρ_t(r, s)) is defined by a quadruple:

1. an initial/final Hamiltonian H_sys^t(r, s);
2. an initial joint distribution ρ_t(r, s);
3. a time-independent partition of S specifying an associated set of macrostates, v ∈ V;
4. a conditional distribution π(r | v).

It is assumed that for any s, s′ where s ∈ V(s′),

ρ_t(r | s) = ρ_t(r | s′)   (25)

i.e., that the distribution over r at the initial time t can depend on the macrostate v, but not on the specific microstate s within the macrostate v. It is also assumed that there are boundary points in S ("potential barriers") separating the members of V, in that the system cannot physically move from v to v′ ≠ v without going through such a boundary point.

The associated GQ process involves the following steps:

(i) To begin, the system has Hamiltonian H_sys^t(r, s), which is quenched into a first quenching Hamiltonian written as:

H_quench^t(r, s) ≡ H_quench;S^t(s) + H_quench;int^t(r, s)   (26)

We take:

H_quench;int^t(r, s) ≡ −kT ln[ρ_t(r | s)]   (27)

and, for all s except those at the boundaries of the partition elements defining the macrostates V,

H_quench;S^t(s) ≡ −kT ln[ρ_t(s)]   (28)

However, at the s lying on the boundaries of the partition elements defining V, H_quench;S^t(s) is arbitrarily large.
Therefore, there are infinite potential barriers separating the macrostates of S. Note that away from those boundaries of the partition elements defining V, ρ_t(r, s) is the equilibrium distribution for H_quench^t.

(ii) Next, we isothermally and quasi-statically transform H_quench^t to a second quenching Hamiltonian,

H_quench^{t+1}(r, s) ≡ H_quench;S^t(s) + H_quench;int^{t+1}(r, s)   (29)

where:

H_quench;int^{t+1}(r, s) ≡ −kT ln[π(r | V(s))]   (30)

(V(s) being the partition element that contains s). Note that the term in the Hamiltonian that only concerns S does not change in this step. Therefore, the infinite potential barriers delineating partition boundaries in S remain for the entire step. I assume that as a result of those barriers, the coupling of S with the heat bath during this step cannot change the value of v. As a result, even though the distribution over r changes in this step, there is no change to the value of v. To describe this, I say that v is "semi-stable" during this step. (To state this assumption more formally, let A(s′, s′′) be the (matrix) kernel that specifies the rate at which s′ → s′′ due to heat transfer between S and the heat bath during this step (ii) [32, 33]. Then, I assume that A(s′, s′′) is arbitrarily small if V(s′′) ≠ V(s′).)

As an example, the different bit strings that can be stored in a flash drive all have the same expected energy, but the energy barriers separating them ensure that the distribution over bit strings relaxes to the uniform distribution infinitesimally slowly.
Therefore, the value of the bit string is semi-stable. Note that even though a semi-stable system is not at thermodynamic equilibrium during its "dynamics" (in which its macrostate does not change), that dynamics is thermodynamically reversible, in that we can run it backwards in time without requiring any work or resulting in heat dissipation.

(iii) Next, we run a quench over R × S "in reverse", instantaneously replacing the Hamiltonian H_quench^{t+1}(r, s) with the initial Hamiltonian H_sys^t(r, s), with no change to r or s. As in step (i), while work may be done (or extracted) in step (iii), no heat is transferred.

There are two crucial features of GQ processes. The first is that a GQ process faithfully implements π even if its output varies with its input, and does so no matter what the initial distribution over R × S is. The second is that for a particular initial distribution over R × S, implicitly specified by H_quench^t(r, s), the GQ process is thermodynamically reversible.

The first of these features is formalized with the following result, proven in Appendix A:

Proposition 1.
A GQ process over R guided by V (for conditional distribution π and initial distribution ρ_t(r, s)) will transform any initial distribution p_t(v) p_t(r | v) into a distribution p_t(v) π(r | v), without changing the distribution over s conditioned on v.

Consider the special case where the GQ process is in fact applied to the initial distribution that defines it,

ρ_t(r, s) = Σ_v ρ_t(v) ρ_t(s | v) ρ_t(r | v)   (31)

(recall Equation (25)). In this case, the initial distribution is a Boltzmann distribution for the first quenching Hamiltonian; the final distribution is:

ρ_{t+1}(r, s) = Σ_v ρ_t(v) ρ_t(s | v) π(r | v)   (32)

and the entire GQ process is thermodynamically reversible. This establishes the second crucial feature of GQ processes.

Plugging in, in this special case, the change in nonequilibrium free energy is:

ΔF_neq^{H_sys^t, H_sys^t}(ρ_t, ρ_{t+1}) = [ Σ_{r,s,v} ρ_t(v) ρ_t(s | v) ( π(r | v) − ρ_t(r | v) ) H_sys^t(r, s) ] − kT [ S(ρ_{t+1}) − S(ρ_t) ]   (33)

This is the minimal amount of free energy needed to implement the GQ process. An important example of such a thermodynamically-optimal GQ process is the work-free copy process discussed in [38] and the references therein.

Suppose that we build a device to implement a GQ process over R guided by V for conditional distribution π and initial distribution:

ρ_t(r, s) = Σ_v ρ_t(r | v) ρ_t(s | v) G_t(v)   (34)

Therefore, that device has "built into it" first and second quenching Hamiltonians that depend on ρ_t(r | v), ρ_t(s | v) and G_t. Suppose we apply that device in a situation where the initial distribution over r conditioned on v is in fact ρ_t(r | v) and the initial distribution over s conditioned on v is in fact ρ_t(s | v), but the initial macrostate distribution, P_t(v), does not equal G_t(v).
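Proposition 1 and the free energy change in Equation (33) can be checked numerically in a small discrete model. In the sketch below, the macrostates, conditional distributions, and Hamiltonian values are all hypothetical, and for brevity the Hamiltonian is taken to depend only on r (the s dependence is suppressed):

```python
import numpy as np

kT = 1.0

def entropy(p):
    """Shannon entropy in nats."""
    p = np.asarray(p, dtype=float).ravel()
    return -np.sum(p[p > 0] * np.log(p[p > 0]))

# Hypothetical setup: two macrostates v, three microstates r.
p_v = np.array([0.3, 0.7])                    # p_t(v)
p_r_given_v = np.array([[0.5, 0.3, 0.2],
                        [0.1, 0.1, 0.8]])     # p_t(r | v)
pi_r_given_v = np.array([[0.9, 0.05, 0.05],
                         [0.2, 0.4, 0.4]])    # pi(r | v), the target map
H_r = np.array([0.0, 1.0, 2.0])               # hypothetical energies

# Proposition 1: the GQ process replaces p_t(r | v) with pi(r | v) while
# leaving the macrostate distribution p_t(v) untouched.
joint_before = p_v[:, None] * p_r_given_v     # p_t(v) p_t(r | v)
joint_after  = p_v[:, None] * pi_r_given_v    # p_t(v) pi(r | v)

# Equation (33): minimal free energy to run the GQ process.
dE = np.sum((joint_after - joint_before) @ H_r)
dF = dE - kT * (entropy(joint_after) - entropy(joint_before))
print(dF)
```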
In this situation, the actual initial distribution at the start of step (ii) of the GQ process will not be an equilibrium for the initial quenching Hamiltonian. However, this will not result in there being any work dissipated during the thermal relaxation of that step. That is because the distribution over v in that step does not relax, no matter what it is initially (due to the infinite potential barriers in S), while the initial distribution over (r, s) conditioned on v is in thermal equilibrium for the initial quenching Hamiltonian.

However, now suppose that we apply the device in a situation where the initial distribution over r conditioned on v does not equal ρ_t(r | v). In this situation, work will be dissipated in step (ii) of the GQ process. That is because the initial distribution over r when the relaxation starts is not in thermal equilibrium for the initial quenching Hamiltonian, and this distribution does relax in step (ii). Therefore, if the device was not "designed" for the actual initial distribution over r conditioned on v (i.e., does not use a ρ_t(r | v) that equals that actual distribution), it will necessarily dissipate work.

As elaborated below, this means that if a biological organism that implements any map π is optimized for one environment, i.e., one distribution over its inputs, but is then used in an environment with a different distribution over its inputs, it will necessarily be inefficient, dissipating work (recall that above, we established a similar result for the specific type of Q process that can be used to erase a bit).

III. ORGANISMS
In this section, I consider biological systems that process an input into an output, an output that specifies some action that is then taken back to the environment. As shorthand, I will refer to any biological system that does this as an "organism". A cell exhibiting chemotaxis is an example of an organism, with its input being (sensor readings of) chemical concentrations and its output being chemical signals that in turn specify some directed motion it will follow. Another example is a eusocial insect colony, with its inputs being the many different materials that are brought into the nest (including atmospheric gases) and its output being material waste products (including heat) that in turn get transported out of the colony.

Physically, each organism contains an "input subsystem", a "processor subsystem" and an "output subsystem" (among others). The initial macrostate of the input subsystem is formed by sampling some distribution specified by the environment and is then copied to the macrostate of the processor subsystem. Next, the processor iterates some specified first-order time-homogeneous Markov chain (for example, if the organism is a cell, this Markov chain models the iterative biochemical processing of the input that takes place within the organism). The ending value of the chain is the organism's output, which specifies the action that the organism then takes back to its environment. In general, it could be that for certain inputs, an organism never takes any action back to its environment, but instead keeps processing the input indefinitely.
Here, that is captured by having the Markov chain keep iterating (e.g., the biochemical processing keeps going) until it produces a value that falls within a certain predefined halting (sub)set, which is then copied to the organism's output (the possibility that the processing never halts also ensures that the organism is Turing complete [55–57]).

There are many features of information processing in real biological systems that are distorted in this model; it is just a starting point. Indeed, some features are absent entirely. In particular, since the processing is modeled as a first-order Markov chain, there is no way for an organism described by this model to "remember" a previous input it received when determining what action to take in response to a current input. Such features could be incorporated into the model in a straightforward way and are the subject of future work.

In the next subsection, I formalize this model of a biological input-output system in terms of an input distribution, a Markov transition matrix and a halting set. I then analyze the minimal amount of work needed by any physical system that implements a given transition matrix when receiving inputs from a given distribution, i.e., the minimal amount of work a real organism would need to implement the input-output behavior that it exhibits in its environment, if it were free to use any physical process that obeys the laws of physics. To perform this analysis, I will construct a specific physical process that implements an iteration of the Markov transition matrix of a given organism with minimal work, when inputs are generated according to the associated input distribution. This process involves a sequence of multiple GQ processes. It cannot be emphasized enough that the processes I construct are not intended to describe what happens in real biological input-output systems, even as a cartoon.
These processes are used only as a calculational tool, for finding a lower bound on the amount of work needed by a real biological organism to implement a given input-output transition matrix.

Indeed, because real biological systems are often quite inefficient, in practice, they will often use far more work than is given by the bound I calculate. However, we might expect that in many situations, the work expended by a real biological system that behaves according to some transition matrix is approximately proportional to the work that would be expended by a perfectly efficient system obeying the same transition matrix. Under that approximation, the relative sizes of the bounds given below should reflect the relative sizes of the amounts of work expended by real biological systems.

A. The Input and Output Spaces of an Organism
Recall from Section II D that a subsystem S cannot use a thermodynamically-reversible Q process to update its own macrostate in an arbitrary way. However, a different subsystem S′ can guide an arbitrary updating of the macrostate of S, with a GQ process. In addition, the work required by a thermodynamically-reversible process that implements a given conditional distribution from inputs to outputs is the same as the work required by any other thermodynamically-reversible process that implements that same distribution.

In light of these two facts, for simplicity, I will not try to construct a thermodynamically-reversible process that implements any given organism's input-output distribution directly, by iteratively updating the processor until its state lies in the halting subset and then copying that state to the output. Instead, I will construct a thermodynamically-reversible process that implements that same input-output distribution, but by "ping-ponging" GQ processes back and forth between the state of the processor and the state of the output system, until the output's state lies in the halting set.

Let W be the space of all possible microstates of a processor subsystem, and U the (disjoint) space of all possible microstates of an output subsystem. Let X be a partition of W, i.e., a coarse-graining of it into a countable set of macrostates. Let X be the set of labels of those partition elements, i.e., the range of the map X (for example, in a digital computer, X could be a map taking each microstate of the computer's main RAM, w ∈ W, into the associated bit string, X(w) ∈ X). Similarly, let Y be a partition of U, the microstates of the output subsystem. Let Y be the set of labels of those partition elements, i.e., the range of the map Y, with Y_halt ⊆ Y the halting subset of Y. I generically write an element of X as x and an element of Y as y.
I assume that X and Y, the spaces of labels of the processor and output partition elements, respectively, have the same cardinality and, so, indicate their elements with the same labels. In particular, if we are concerned with Turing-complete organisms, X and Y would both be {0, 1}*, the set of all finite bit strings (a set that is bijective with ℕ). For notational convenience, I arbitrarily choose one element of X and one element of Y and assign the additional label 0 to both of them (for example, in a Turing machine, we could assign the label 0 to the partition element that also has the label of the empty string). Intuitively, these elements represent the "initialized" state of the processor and output subsystems, respectively.

The biological system also contains an input subsystem, with microstates f ∈ F and coarse-graining partition F that produces macrostates b ∈ B. The space B is the same as the space X (and therefore is the same as Y). The state of the input at time t = 0, b_0, is formed by sampling an environment distribution P_0. As an example, b_0 could be determined by a (possibly noisy) sensor reading of the external environment. As another example, the environment of an organism could directly perturb the organism's input macrostate at t = 0.

For simplicity, I assume that both the processor subsystem and the output subsystem are initialized before b_0 is generated, i.e., that x_0 = y_0 = 0. After b_0 is set this way, it is copied to the processor subsystem, setting x_0. At this point, we iterate a sequence of GQ processes in which x is mapped to y, then y is mapped to x, then that new x is mapped to a new y, etc., until (and if) y ∈ Y_halt. To make this precise, adopt the notation that [α, α′] refers to the joint state (x = α, y = α′). Then, after x_0 is set, we iterate the following multi-stage ping-pong sequence:

1. [x_t, 0] → [x_t, y_t], where y_t is formed by sampling π(y_t | x_t);
2. [x_t, y_t] → [0, y_t];
3. If y_t ∈ Y_halt, the process ends;
4. [0, y_t] → [y_t, y_t];
5. [y_t, y_t] → [y_t, 0], with t replaced by t + 1.

If the process ends at some iteration t = τ, then the associated value y_τ is used to specify an action by the organism back on its environment. At this point, to complete a thermodynamic cycle, both x and y are reinitialized to zero, in preparation for a new input.

Here, for simplicity, I do not consider the thermodynamics of the physical system that sets the initial value of b_0 by "sensing the environment"; nor do I consider the thermodynamics of the physical system that copies that value to x_0 (see [38] and the references therein for some discussion of the thermodynamics of copying). In addition, I do not analyze the thermodynamics of the process in which the organism uses y_τ to "take an action back to its environment" and thereby reinitializes y. I only calculate the minimal work required to implement the phenotype of the organism, which here is taken to mean the iterated ping-pong sequence between X and Y.

Moreover, I do not make any assumption for what happens to b_0 after it is used to set x_0; it may stay the same, may slowly decay in some way, etc.
Accordingly, none of the thermodynamic processes considered below are allowed to exploit (some assumption for) the value of b_0 when they take place to reduce the amount of work they require. As a result, from now on, I ignore the input space and its partition.

Physically, a ping-pong sequence is implemented by some continuous-time stochastic process over W × U. Any such process induces an associated discrete-time stochastic process over W × U. That discrete-time process comprises a joint distribution Pr defined over a (possibly infinite) sequence of values (w_0, u_0), …, (w_t, u_t), (w_{t+1}, u_{t+1}), …. That distribution in turn induces a joint distribution over the associated pairs of partition element labels, (x_0, y_0), …, (x_t, y_t), (x_{t+1}, y_{t+1}), ….

For calculational simplicity, I assume that for all y ∈ Y, at the end of each stage in a ping-pong sequence that starts at any time t ∈ ℕ, Pr(u | y) is the same distribution, which I write as q_out^y(u). I make the analogous assumption for Pr(w | x), to define q_proc^x(w) (in addition to simplifying the analysis, this helps ensure that we are considering cyclic processes, a crucial issue whenever analyzing issues like the minimal amount of work needed to implement a desired map). Note that q_out^y(u) = 0 for any u such that Y(u) ≠ y. To simplify the analysis further, I also assume that all "internal entropies" of the processor macrostates are the same, i.e., S(q_proc^x(W)) is independent of x, and similarly for the internal entropies of the output macrostates.

Also for calculational simplicity, I assume that at the end of each stage in a ping-pong sequence that starts at any time t ∈ ℕ, there is no interaction Hamiltonian coupling any of the three subsystems (though obviously, there must be such coupling at non-integer times). I also assume that at all such moments, the Hamiltonian over U is the same function, which I write as H_out.
Therefore, for all such moments, the expected value of the Hamiltonian over U if the system is in state y_t at that time is:

E(H_out | y) = Σ_u q_out^y(u) H_out(u)   (35)

Similarly, H_in and H_proc define the Hamiltonians at all such moments over the input and processor subsystems, respectively.

I will refer to any quadruple (W, X, U, Y) and three associated Hamiltonians as an organism.

For future use, note that for any iteration t ∈ ℕ, initial distribution P′(x), conditional distribution π(y | x) and halting subset Y_halt ⊆ Y,

P′(y_t ∈ Y_halt) = Σ_{y_t} P′(y_t) I(y_t ∈ Y_halt) = Σ_{x_t, y_t} P′(x_t) π(y | x)|_{x = x_t, y = y_t} I(y_t ∈ Y_halt)   (36)

P′(y_t | y_t ∈ Y_halt) = [ Σ_{x_t} P′(x_t) π(y | x)|_{x = x_t, y = y_t} I(y_t ∈ Y_halt) ] / [ Σ_{x_t, y_t} P′(x_t) π(y | x)|_{x = x_t, y = y_t} I(y_t ∈ Y_halt) ]   (37)

and similarly:

P′(x_{t+1} | y_t ∉ Y_halt) = [ Σ_{x_t} P′(x_t) π(y | x)|_{x = x_t, y = x_{t+1}} I(x_{t+1} ∉ Y_halt) ] / [ Σ_{x_t, x_{t+1}} P′(x_t) π(y | x)|_{x = x_t, y = x_{t+1}} I(x_{t+1} ∉ Y_halt) ]   (38)

Furthermore,

S(P_t(X)) = −Σ_x P_t(x) ln[P_t(x)]   (39)

S(P_{t+1}(X)) = −Σ_{x,y} P_t(x) π(y | x) ln[ Σ_{x′} P_t(x′) π(y | x′) ]   (40)

I end this subsection with some notational comments. I will sometimes abuse notation and put time indices on distributions rather than variables, e.g., writing Pr_t(y) rather than Pr(y_t = y). In addition, I sometimes abuse notation with temporal subscripts.
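Equations (36)–(38) are straightforward to evaluate once π is written as a transition matrix. A sketch with a hypothetical three-state organism and halting set (the matrix, input distribution, and halting set are all illustrative):

```python
import numpy as np

# Hypothetical organism: X = Y = {0, 1, 2}, halting set Y_halt = {0}.
pi = np.array([[1.0, 0.0, 0.0],        # pi(y | x): row x, column y
               [0.3, 0.2, 0.5],
               [0.1, 0.6, 0.3]])
P = np.array([0.2, 0.5, 0.3])          # P'(x_t)
halt = np.array([True, False, False])  # indicator I(y in Y_halt)

Py = P @ pi                            # distribution over y_t
p_halt = Py[halt].sum()                # Equation (36): P'(y_t in Y_halt)
Py_given_halt = np.where(halt, Py, 0.0) / p_halt       # Equation (37)
Px_next = np.where(~halt, Py, 0.0) / (1.0 - p_halt)    # Equation (38)
print(p_halt, Py_given_halt, Px_next)
```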
In particular, when the initial distribution over X is P_0(x), I sometimes use expressions like:

P_t(w) ≡ Σ_x P_t(x) q_proc^x(w)   (41)

P_t(u) ≡ Σ_y P_t(y) q_out^y(u)   (42)

P_t(y) ≡ Σ_{x_t} P_t(x_t) π(y_t | x_t)   (43)

P_{t+1}(x | y_t) ≡ δ(x, y_t)   (44)

However, I will always be careful when writing joint distributions over variables from different moments of time, e.g., writing:

P(y_{t+1}, x_t) ≡ P(y_{t+1} | x_t) P(x_t) = π(y_{t+1} | x_t) P_t(x_t)   (45)

B. The Thermodynamics of Mapping an Input Space to an Output Space
Our goal is to construct a physical process Λ over an organism's quadruple (W, X, U, Y) that implements an iteration of the ping-pong sequence above for any particular t. In addition, we want Λ to be thermodynamically optimal with the stipulated starting and ending joint Hamiltonians for all iterations of the ping-pong sequence when it is run on an initial joint distribution:

P_0(x, y) = P_0(x) δ(y, 0)   (46)

In Appendix B, I present four separate GQ processes that implement stages (1), (2), (4) and (5) in a ping-pong sequence (and so implement the entire sequence). The GQ processes for stages (1), (4) and (5) are guaranteed to be thermodynamically reversible, for all t. However, each time-t GQ process for stage (2) is parameterized by a distribution G_t(x_t). Intuitively, that distribution is a guess, made by the "designer" of the (time-t) stage (2) GQ process, for the marginal distribution over the values x_t at the beginning of the associated stage (1) GQ process. That stage (2) GQ process will also be thermodynamically reversible if the distribution over x_t at the beginning of the stage (1) GQ process is in fact G_t(x_t). Therefore, for that input distribution, the sequence of GQ processes is thermodynamically optimal, as desired. However, as discussed below, in general, work will be dissipated if the stage (2) GQ process is applied when the distribution over x_t at the beginning of stage (1) differs from G_t(x_t).

I call such a sequence of five processes implementing an iteration of a ping-pong sequence an organism process. It is important to emphasize that I do not assume that any particular real biological system runs an organism process. An organism process provides a counterfactual model of how to implement a particular dynamics over X × Y, a model that allows us to calculate the minimal work used by any actual biological system that implements that dynamics.

Suppose that an organism process always halts for any x_0 such that P_0(x_0) ≠ 0. Let τ* be the last iteration at which such an organism process may halt, for any of the inputs x_0 such that P_0(x_0) ≠ 0 (if X is countably infinite, τ* might be countably infinite). Suppose further that no new input is received before τ* if the process halts at some τ < τ*, and that all microstates are constant from such a τ up to τ* (so, no new work is done during such an interval). In light of the iterative nature of organism processes, this last assumption is equivalent to assuming that π(y_t | x_t) = δ_{y_t, x_t} if x_t ∈ Y_halt. I say that the organism process is recursive when all of these conditions are met, since that is the adjective used in the theory of Turing machines.

For a recursive organism process, the ending distribution over y is:

P(y_{τ*}) = Σ_{x_0, …, x_{τ*}} π(y_{τ*} | x_{τ*}) P_0(x_0) Π_{t=1}^{τ*} π(x_t | x_{t−1})   (47)

and:

P(y_{τ*} | x_0) = Σ_{x_1, …, x_{τ*}} π(y_{τ*} | x_{τ*}) Π_{t=1}^{τ*} π(x_t | x_{t−1})   (48)

Proposition 2.
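The ending distribution in Equation (47) can be computed by simply iterating the Markov chain π, since a recursive organism process makes the halted states fixed points of π. A sketch with a hypothetical three-state chain (the matrix and input distribution are illustrative):

```python
import numpy as np

# Hypothetical recursive organism process: X = Y = {0, 1, 2}, Y_halt = {0}.
# Halted states are fixed points: pi(y | 0) = delta(y, 0).
pi = np.array([[1.0, 0.0, 0.0],
               [0.4, 0.1, 0.5],
               [0.3, 0.5, 0.2]])
P0 = np.array([0.0, 0.6, 0.4])       # initial input distribution P_0(x_0)
halt = np.array([True, False, False])

# Equation (47): iterate the chain; because halted states are fixed
# points, probability accumulates in Y_halt.
P = P0.copy()
for _ in range(200):                 # stand-in for iterating up to tau*
    P = P @ pi
print(P)                             # ending distribution P(y_{tau*})
```

After enough iterations, essentially all of the probability mass sits in the halting set.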
Fix any recursive organism process, iteration t ∈ ℕ, initial distributions P_0(x), P′(x), conditional distribution π(y | x) and halting subset Y_halt ⊆ Y.

1. With probability P′(y_t ∈ Y_halt), the ping-pong sequence at iteration t of the associated organism process maps the distribution:

P′(x_t) δ(y_{t−1}, 0) → δ(x_t, 0) P′(y_t | y_t ∈ Y_halt)

and then halts, and with probability 1 − P′(y_t ∈ Y_halt), it instead maps:

P′(x_t) δ(y_{t−1}, 0) → P′(x_{t+1} | y_t ∉ Y_halt) δ(y_t, 0)

and continues.

2. If G_t = P_t for all t ≤ τ*, the total work the organism expends to map the initial distribution P_0(x) to the ending distribution P_{τ*}(y) is:

Ω_P^π ≡ Σ_y P_{τ*}(y) E(H_out | y) − E(H_out | y′)|_{y′ = 0} − Σ_x P_0(x) E(H_in | x) + E(H_in | x′)|_{x′ = 0} + kT ( S(P_0(X)) − S(P_{τ*}(Y)) )
3. There is no physical process that both performs the same map as the organism process and that requires less work than the organism process does when applied to P(x_t) δ(y_t, 0).

Proof. Repeated application of Proposition 1 gives the first result.

Next, combine Equation (70) in Appendix B, Equation (33) and our assumptions made just before Equation (35) to calculate the work needed to implement the GQ process of the first stage of an organism process at iteration t:

[ Σ_{x,y,u} ( P_t(x) π(y | x) q_out^y(u) − q_out^0(u) ) H_out(u) ] − kT [ S(P_t(Y)) − S(P_{t−1}(Y)) ]
  = Σ_y P_t(y) E(H_out | y) − E(H_out | y′)|_{y′ = 0} − kT S(P_t(Y))

Analogous equations give the work for the remaining three GQ processes. Then, apply these equations repeatedly, starting with the distribution given in Equation (46) (note that all terms for iterations of the ping-pong sequence with t ∈ {1, 2, …, τ* − 1} cancel out). This gives the second result.

Finally, the third result is immediate from the assumption that G_t = P_t for all t, which guarantees that each iteration of the organism process is thermodynamically reversible. □

Note that no matter what the initial distribution over X is, the organism process updates that distribution according to π, halting whenever it produces a value in Y_halt. This is true even if the output of π depends on its input (as discussed in the Introduction, this property is violated for many of the physical processes considered in the literature).

The first terms in the definition of Ω_P^π, given by a sum of expected values of the Hamiltonian, can be interpreted as the "labor" done by the organism when processing x_0 into y_{τ*}, e.g., by making and breaking chemical bonds. It quantifies the minimal amount of external free energy that must be used to implement the amount of labor that is (implicitly) specified by π.
The remaining terms, a difference of entropies, represent the free energy required by the "computation" done by the organism when it undergoes π, independent of the labor done by the organism.

C. Input Distributions and Dissipated Work
Suppose that at the beginning of some iteration t of an organism process, the distribution over x_t is some P(x_t) that differs from G_t(x_t), the prior distribution "built into" the (quenching Hamiltonians defining the) organism process. Then, as elaborated at the beginning of Section III B, in general, this iteration of the organism process will result in dissipated work.

As an example, such dissipation will occur if the organism process is used in an environment that generates inputs according to a distribution P_0 that differs from G_0, the distribution "built into" the organism process. In the context of biology, if a biological system gets optimized by natural selection for one environment, but is then used in another one, it will necessarily operate (thermodynamically sub-optimally) in that second environment.

Note though that one could imagine designing an organism to operate optimally for a distribution over environments, since that is equivalent to a single average distribution over inputs. More precisely, a distribution Pr(P_0) over environments is equivalent to a single environment generating inputs according to:

Pr(x_0) = Σ_{P_0} Pr(P_0) P_0(x_0)   (49)

We can evaluate the thermodynamic cost Ω_Pr^π for this organism that behaves optimally for an uncertain environment.

As a comparison point, we can also evaluate the work used in an impossible scenario where P_0 varies stochastically, but the organism magically "knows" what each P_0 is before it receives an input sampled from that P_0, and then changes its distributions G_t accordingly. The average thermodynamic cost in this impossible scenario would be:

Σ_{P_0} Pr(P_0) Ω_{P_0}^π   (50)

In general,

Ω_Pr^π ≥ Σ_{P_0} Pr(P_0) Ω_{P_0}^π   (51)

with equality only if Pr(.) is a delta function about one particular P_0.
So in general, even if an organism chooses its (fixed) G_0 to be optimal for an uncertain environment, it cannot do as well as it would if it could magically change G_0 appropriately before each new environment it encounters.

As a second example, in general, as one iterates an organism process, the initial distribution P_0(x) is changed into a sequence of new distributions {P_1(x), P_2(x), …}. In general, many of these distributions will differ, i.e., for many t′, P_{t′+1} ≠ P_{t′}. Accordingly, if one is using some particular physical device to implement the organism process, unless that device has a clock that it can use to update G_t from one iteration to the next (to match the changes in P_t), the distribution G_t built into the device will differ from P_t at some times t. Therefore, without such a clock, work will be dissipated.

Bearing these caveats in mind, unless explicitly stated otherwise, in the sequel, I assume that the time-t stage (2) GQ process of an organism makes the correct guess for the input distribution at the start of the time-t ping-pong sequence, i.e., that its parameter G_t is always the same as the distribution over x at the beginning of the time-t stage (1) process. In this case, the minimal free energy required by the organism is Ω_P^π, and no work is dissipated.

It is important to realize that in general, if one were to run a Q process over X in the second stage of an organism process, rather than a GQ process over X guided by Y, there would be nonzero dissipated work. The reason is that if we ran such a Q process, we would ignore the information in y_{t+1} concerning the variable we want to send to zero, x_t. In contrast, when we use a GQ process over X guided by Y, no information is ignored, and we maintain thermodynamic reversibility.
The extra work of the Q process beyond that of the GQ process is:

    kT S(X_t) − kT S(X_t | Y_{t+1}) = kT I(X_t ; Y_{t+1})    (52)

In other words, using the Q process would cause us to dissipate work kT I(X_t ; Y_{t+1}). This amount of dissipated work equals zero if the output of π is independent of its input, as in bit erasure. It also equals zero if P(x_t) is a delta function. However, for other π and P(x_t), that dissipated work will be nonzero. In such situations, stage 2 would be thermodynamically irreversible if we used a Q process over X_t to set x_t to zero.

As a final comment, it is important to emphasize that no claim is being made that the only way to implement an organism process is with Q processes and/or GQ processes. However, the need to use the organism process in an appropriate environment, and for it to have a clock, should be generic, if we wish to avoid dissipated work.

D. Optimal Organisms
From now on, for simplicity, I restrict attention to recursive organism processes.

Recall that adding noise to π may reduce the amount of work required to implement it. Formally, Proposition 2 tells us that everything else being equal, the larger S(P_{τ*}(Y)) is, the less work is required to implement the associated π (indeed, the thermodynamically-optimal implementation of a one-to-many map π actually draws in free energy from the heat bath, rather than requiring free energy that ends up being dumped into that heat bath). This implies that an organism will want to implement a π that is as noisy as possible.

In addition, not all maps x → y_{τ*} are equally important to an organism's reproductive fitness. It will be important to be very precise in what output is produced for some inputs x, but for other inputs, precision is not so important. Indeed, for some inputs, it may not matter at all what output the organism produces in response.

In light of this, natural selection would be expected to favor π's that are as noisy as possible, while still being precise for those inputs where reproductive fitness requires it. To simplify the analysis, I distinguish two contributions to the reproductive fitness of an organism that implements some particular π: the free energy (and other resources) required by that implementation, and the "phenotypic fitness" that would arise by implementing π even if there were no resources required to implement it.

Therefore, there will be a tradeoff between the resource cost of being precise in π and the phenotypic fitness benefit of being precise. In particular, there will be a tradeoff between the thermodynamic cost of being precise in π (given by the minimal free energy that needs to be used to implement π) and the phenotypic fitness of that π.
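A toy numerical illustration of why noise helps, under the simplifying assumption (made here only for illustration) that the π-dependent part of the minimal work is the entropy difference kT[S(X) − S(Y)], as in the aside on rate-distortion theory later in this section:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

kT = 1.0                      # measure work in units of kT
p_x = np.array([0.8, 0.2])    # hypothetical input distribution

det   = np.array([[1.0, 0.0],   # deterministic (identity) map pi(y|x)
                  [0.0, 1.0]])
noisy = np.array([[0.7, 0.3],   # a noisier, one-to-many map
                  [0.3, 0.7]])

# Simplified pi-dependent work term: kT * [S(X) - S(Y)]
w_det   = kT * (entropy(p_x) - entropy(p_x @ det))
w_noisy = kT * (entropy(p_x) - entropy(p_x @ noisy))

print(w_det)    # 0: the identity map leaves the entropy unchanged
print(w_noisy)  # negative: the noisy map increases S(Y), drawing in free energy
```

The negative value for the noisy map is the numerical counterpart of the claim above that a one-to-many map can draw in free energy from the heat bath.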
In this subsection, I use an extremely simplified and abstracted model of the reproductive fitness of an organism to determine what π optimizes this tradeoff.

To start, suppose we are given a real-valued phenotypic fitness function f(x, y_{τ*}). This quantifies the benefit to the organism of being precise in what output it produces in response to its inputs. More precisely, f(x, y_{τ*}) quantifies the impact on the reproductive fitness of the organism that arises if it outputs y_{τ*} in response to an input x it received, minus the effect on reproductive fitness of how the organism generated that response. That second part of the definition means that phenotypic fitness does not include energetic costs associated with mapping x → y_{τ*}. Therefore, it includes neither the work required to compute a map taking x → y_{τ*} nor the labor involved in carrying out that map (note that in some toy models, f(x, y_{τ*}) would be an expectation value of an appropriate quantity, taken over states of the environment, and conditioned on x and y_{τ*}). For an input distribution P(x) and conditional distribution π, expected phenotypic fitness is:

    E_{P,π}(f) = Σ_{x, y_{τ*}} P(x) P(y_{τ*} | x) f(x, y_{τ*})    (53)

where P(y_{τ*} | x) is given by Equation (48).

The expected phenotypic fitness of an organism if it implements π on the initial distribution P is only one contribution to the overall reproductive fitness of the organism. In addition, there is a reproductive fitness cost to the organism that depends on the specific physical process it uses to implement π on P. In particular, there is such a cost arising from the physical resources that the process requires.

There are several contributions to this cost. In particular, different physical processes for implementing π will require different sets of chemicals from the environment, will result in different chemical waste products, etc.
Here, I ignore such "material" costs of the particular physical process the organism uses to implement π on P.

However, in addition to these material costs of the process, there is also a cost arising from the thermodynamic work required to run that process. If we can use a thermodynamically-reversible process, then by Equation (49), for fixed P and π, the minimal possible such required work is Ω^π_P. Of course, in many biological scenarios, it is not possible to use a thermodynamically-reversible organism process to implement π. As discussed in Section III C, this is the case if the organism process is "designed" for an environment that generates inputs x according to G(x), while the actual environment in which the process is used generates inputs according to some P ≠ G. However, there are other reasons why there might have to be non-zero dissipated work. In particular, there is non-zero dissipated work if π must be completed quickly, and so it cannot be implemented using a quasi-static process (it does not do an impala any good to be able to compute the optimal direction in which to flee a tiger chasing it, if it takes the impala an infinite amount of time to complete that computation). Additionally, of course, it may be that a minimal amount of work must be dissipated simply because of the limited kinds of biochemical systems available to a real organism.

I make several different simplifying assumptions:

1. In some biological scenarios, the amount of such dissipated work that cannot be avoided in implementing π, Ŵ^π_P, will be comparable to (or even dominate) the minimal amount of reversible work needed to implement π, Ω^π_P. However, for simplicity, in the sequel, I concentrate solely on the dependence on π of the reproductive fitness of a process that implements π that arises due to its effect on Ω^π_P.
Equivalently, I assume that I can approximate differences Ŵ^π_P − Ŵ^{π′}_P as equal to Ω^π_P − Ω^{π′}_P, up to an overall proportionality constant.

2. Real organisms have internal energy stores that allow them to use free energy extracted from the environment at a time t′ < t = 1, thereby "smoothing out" their free energy needs. For simplicity, I ignore such energy stores. Under this simplification, the organism needs to extract at least Ω^π_P of free energy from its environment to implement a single iteration of π on P. That minimal amount of needed free energy is another contribution to the "reproductive fitness cost to the organism of physically implementing π starting from the input distribution P".

3. As another simplifying assumption, I suppose that the (expected) reproductive fitness of an organism that implements the map π starting from P is just:

    F(P, π, f) ≡ α E_{P,π}(f) − Ω^π_P    (54)

Therefore, α is the benefit to the organism's reproductive fitness of increasing f by one, measured in units of energy. This ignores all effects on the distribution P that would arise by having different π implemented at times earlier than t = 1. It also ignores the possible impact on reproductive fitness of the organism's implementing particular sequences of multiple y's (future work involves weakening all of these assumptions, with particular attention to this last one). Under this assumption, varying π has no effect on S(X), the initial entropy over processor states. Similarly, it has no effect on the expected value of the Hamiltonian then.

Combining these assumptions with Proposition 2, we see that after removing all terms in Ω^π_P that do not depend on π, we are left with Σ_y P_{τ*}(y) E(H_out | y) − kT S(P_{τ*}(Y)). This gives the following result:

Corollary 3.
Given the assumptions discussed above, up to an additive constant that does not depend on π:

    F(P, π, f) = Σ_{x, y_{τ*}} P(x) P(y_{τ*} | x) { α f(x, y_{τ*}) − H_out(y_{τ*}) − kT ln[ Σ_{x′} P(x′) P(y_{τ*} | x′) ] }

The first term in Corollary 3 reflects the impact of π on the phenotypic fitness of the organism. The second term reflects the impact of π on the amount of labor the organism does. Finally, the last term reflects the impact of π on the amount of computation the organism does; the greater the entropy of y_{τ*}, the less total computation is done. In different biological scenarios, the relative sizes of these three terms may change radically. In some senses, Corollary 3 can be viewed as an elaboration of [58], where the "cost of sensing" constant in that paper is decomposed into labor and computation costs.

From now on, for simplicity, I assume that Y_halt = Y. So no matter what the input is, the organism process runs π exactly once to produce the output. Returning to our actual optimization problem, by Lagrange multipliers, if the π that maximizes the expression in Corollary 3 lies in the interior of the feasible set, then it is the solution to a set of coupled nonlinear equations, one equation for each pair (x, y):

    P(x) { H_out(y) − α f(x, y) + kT ( ln[ Σ_{x′} P(x′) π(y | x′) ] + 1 ) } = λ_x    (55)

where the λ_x are the Lagrange multipliers ensuring that Σ_y π(y | x) = 1 for all x ∈ X. Unfortunately, in general, the solution may not lie in the interior, so we have a non-trivial optimization problem.

However, suppose we replace the quantity:

    − Σ_{x, y} P(x) π(y | x) ln[ Σ_{x′} P(x′) π(y | x′) ] = S(Y)    (56)

in Corollary 3 with S(Y | X).
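This replacement is justified because conditioning can only reduce entropy. A quick numerical check, with a hypothetical input distribution and channel:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

p_x = np.array([0.3, 0.7])        # hypothetical input distribution P(x)
pi  = np.array([[0.8, 0.2],       # hypothetical channel pi(y|x), rows indexed by x
                [0.1, 0.9]])

p_y = p_x @ pi                                  # marginal P(y)
s_y = entropy(p_y)                              # S(Y)
s_y_given_x = np.sum(p_x * np.array([entropy(row) for row in pi]))  # S(Y|X)

print(s_y_given_x, s_y)  # S(Y|X) is never larger than S(Y)
```

The same inequality holds for any P(x) and π, which is what licenses the lower bound derived next.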
Since S(Y | X) ≤ S(Y) [50,51], this modification gives us a lower bound on expected reproductive fitness:

    F̂(P, π, f) ≡ Σ_{x, y} P(x) π(y | x) { α f(x, y) − H_out(y) − kT ln[π(y | x)] } ≤ F(P, π, f)    (57)

The π that maximizes F̂(P, π, f) is just a set of Boltzmann distributions:

    π(y | x) ∝ exp( [α f(x, y) − H_out(y)] / kT )    (58)

For each x, this approximately optimal conditional distribution puts more weight on y if the associated phenotypic fitness is high, while putting less weight on y if the associated energy is large. In addition, we can use this distribution to construct a lower bound on the maximal value of the expected reproductive fitness:

Corollary 4.
Given the assumptions discussed above,

    max_π F(P, π, f) ≥ kT Σ_x P(x) ln[ Σ_y exp( [α f(x, y) − H_out(y)] / kT ) ]

Proof.
Write:

    F̂(P, π, f) = Σ_{x, y} P(x) ( π(y | x) { α f(x, y) − H_out(y) − kT ln[π(y | x)] } ) ≡ Σ_x P(x) F̂(x, π, f)    (59)

Each term F̂(x, π, f) in the summand depends on the Y-space distribution π(. | x), but on no other components of π. Therefore, we can evaluate each such term F̂(x, π, f) separately for its maximizing (Boltzmann) distribution π(. | x). In the usual way, the maximum is given by the log of the associated partition function (normalization constant) z(x), since for any x and associated Boltzmann π(. | x),

    S(Y | x) = − Σ_y π(y | x) ln[π(y | x)]
             = − Σ_y [exp(β[α f(x, y) − H_out(y)]) / z(x)] ln[ exp(β[α f(x, y) − H_out(y)]) / z(x) ]
             = − Σ_y π(y | x) β[α f(x, y) − H_out(y)] + ln[z(x)]    (60)

where β ≡ 1/kT, as usual. Comparing to Equation (59) establishes that:

    F̂(x, π, f) = kT ln[z(x)]    (61)

which then gives the claimed result. □

As an aside, suppose it were the case that X = Y, that f(x, x) = 0 for all x, and that f were non-negative. Then if, in addition, the amount of expected work were given by the mutual information between X and Y rather than the difference in their entropies, our optimization problem would reduce to finding a point on the rate-distortion curve of conventional information theory, with f being the distortion function [51]. (See also [5] for a slight variant of rate-distortion theory, appropriate when Y differs from X, and so the requirement that f(x, x) = 0 cannot be imposed.) However, the difference in the entropies of X and Y does not depend on the precise coupling between x and y under π, but only on the associated marginal distributions. So rate-distortion theory does not directly apply.

On the other hand, some of the same kinds of analysis used in rate-distortion theory can also be applied here.
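The structure of this proof can be checked numerically: build the Boltzmann-optimal π of Equation (58) for a hypothetical f, H_out, α and kT, and confirm that the maximized F̂ is exactly the log-partition-function expression of Equation (61), averaged under P(x):

```python
import numpy as np

kT, alpha = 1.0, 1.0                 # hypothetical units and fitness weighting
f = np.array([[2.0, 0.0],            # hypothetical phenotypic fitness f(x, y)
              [0.5, 1.5]])
h_out = np.array([0.3, 0.1])         # hypothetical output energies H_out(y)
p_x = np.array([0.4, 0.6])           # hypothetical input distribution P(x)

g = alpha * f - h_out[None, :]       # "energy" alpha*f(x,y) - H_out(y)
z = np.exp(g / kT).sum(axis=1)       # partition function z(x)
pi = np.exp(g / kT) / z[:, None]     # Equation (58): Boltzmann-optimal pi(y|x)

# hat-F evaluated at this pi, term by term (the summand of Equation (59))
hat_f = np.sum(p_x[:, None] * pi * (g - kT * np.log(pi)))

# kT * sum_x P(x) ln z(x), the value asserted by the proof
bound = kT * np.sum(p_x * np.log(z))

print(hat_f, bound)                  # equal at the optimizing pi
```

For any other (suboptimal) π, hat_f computed the same way falls below this value, which is the content of the lower bound.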
In particular, for any particular component π(y | x) where P(x) ≠ 0,

    ∂²F(P, π, f) / ∂π(y | x)² = − kT P(x)² / P(y) < 0

(where P(y) = Σ_{x′} P(x′) π(y | x′), as usual). So F(P, π, f) is concave in every component of π. This means that the optimizing channel π may lie on the edge of the feasible region of conditional distributions. Note though that even if the solution is on the edge of the feasible region, in general, for different x, the optimal π(. | x) will put all its probability mass on different edges of the unit simplex over Y. So when those edges are averaged under P(x), the result is a marginal distribution P(y) that lies in the interior of the unit simplex over Y.

As a cautionary note, often in the real world, there is an inviolable upper bound on the rate at which a system can "harvest" free energy from its environment, i.e., on how much free energy it can harvest per iteration of π (for example, a plant with a given surface area cannot harvest free energy at a faster rate than sunlight falls upon its surface). In that case, we are not interested in optimizing a quantity like F(P, π, f), which is a weighted average of minimal free energy and expected phenotypic fitness per iteration of π. Instead, we have a constrained optimization problem with an inequality constraint: find the π that maximizes some quantity (e.g., expected phenotypic fitness), subject to an inequality constraint on the free energy required to implement that π. Calculating solutions to these kinds of constrained optimization problems is the subject of future work.

IV. GENERAL IMPLICATIONS FOR BIOLOGY
Any work expended on an organism must first be acquired as free energy from the organism's environment. However, in many situations, there is a limit on the flux of free energy through an organism's immediate environment. Combined with the analysis above, such limits provide upper bounds on the "rate of (potentially noisy) computation" that can be achieved by a biological organism in that environment, once all energetic costs for the organism's labor (i.e., its moving, making/breaking chemical bonds, etc.) are accounted for.

As an example, human brains do little labor. Therefore, these results bound the rate of computation of a human brain. Given the fitness cost of such computation (the brain uses ∼20% of the calories used by the human body), this bound contributes to the natural selective pressures on humans (in the limit that operational inefficiencies of the brain have already been minimized). In other words, these bounds suggest that natural selection imposes a tradeoff between the fitness quality of a brain's decisions and how much computation is required to make those decisions. In this regard, it is interesting to note that the brain is famously noisy, and as discussed above, noise in computation may reduce the total thermodynamic work required (see [6,10,59] for more about the energetic costs of the human brain and their relation to Landauer's bound).

As a second example, the rate of solar free energy incident upon the Earth provides an upper bound on the rate of computation that can be achieved by the biosphere (this bound holds for any choice for the partition of the biosphere's fine-grained space into macrostates, such that the dynamics over those macrostates executes π). In particular, it provides an upper bound on the rate of computation that can be achieved by human civilization, if we remain on the surface of the Earth and only use sunlight to power our computation.

Despite the use of the term "organism", the analysis above is not limited to biological individuals. For example, one could take the input to be a current generation population of individuals, together with attributes of the environment shared by those individuals. We could also take the output to be the next generation of that population, after selective winnowing based on the attributes of the environment (e.g., via replicator dynamics). In this example, the bounds above do not refer to the "computation" performed by an individual, but rather by an entire population subject to natural selection.
Therefore, those bounds give the minimal free energy required to run natural selection.

As a final example, one can use these results to analyze how the thermodynamic behavior of the biosphere changes with time. In particular, if one iterates π from one t to the next, then the associated initial distributions P_t change. Accordingly, the minimal amount of free energy required to implement π changes. In theory, this allows us to calculate whether the rate of free energy required by the information processing of the terrestrial biosphere increases with time. Prosaically, has the rate of computation of the biosphere increased over evolutionary timescales? If it has done so for most of the time that the biosphere has existed, then one could plausibly view the fraction of free energy flux from the Sun that the biosphere uses as a measure of the "complexity" of the biosphere, a measure that has been increasing throughout the lifetime of the biosphere.

Note as well that there is a fixed current value of the total free energy flux incident on the biosphere (from both sunlight and, to a much smaller degree, geologic processes). By the results presented above, this rate of free energy flux gives an upper bound on the rate of computation that humanity as a whole can ever achieve, if it monopolizes all the resources of Earth but restricts itself to the surface of Earth.

V. DISCUSSION
The noisier the input-output map π of a biological organism, the less free energy the organism needs to acquire from its environment to implement that map. Indeed, by using a sufficiently noisy π, an organism can increase its stored free energy. Therefore, noise might not just be a hindrance that an organism needs to circumvent; an organism may actually exploit noise, to "recharge its battery".

In addition, not all maps x_t → y_{t+1} are equally important to an organism's reproductive fitness. In light of this, natural selection would be expected to favor π's that are as noisy as possible, while still being precise for those inputs where reproductive fitness requires it.

In this paper, I calculated what π optimizes this tradeoff. This calculation provides insight into what phenotypes natural selection might be expected to favor. Note though that in the real world, there are many other thermodynamic factors that are important in addition to the cost of processing sensor readings (inputs) into outputs (actions). For example, there are the costs of acquiring the sensor information in the first place and of internal storage of such information for future use. Moreover, in the real world, sensor readings do not arrive in an i.i.d. fashion, as assumed in this paper. Indeed, in real biological systems, often the current sensor reading, reflecting the recent state of the environment, reflects previous actions by the organism that affected that same environment (in other words, real biological organisms often behave like feedback controllers). All of these effects would modify the calculations done in this paper.

In addition, in the real world, there are strong limits on how much time a biological system can take to perform its computations, physical labor and rearranging of matter, due to environmental exigencies (simply put, if the biological system is not fast enough, it may be killed).
These temporal constraints mean that biological systems cannot use fully reversible thermodynamics. Therefore, these temporal constraints increase the free energy required for the biological system to perform computation, labor and/or rearrangement of matter.

Future work involves extending the analysis of this paper to account for such thermodynamic effects. Combined with other non-thermodynamic resource restrictions that real biological organisms face, such future analysis should help us understand how closely the organisms that natural selection has produced match the best ones possible.

ACKNOWLEDGMENTS
I would like to thank Daniel Polani, Sankaran Ramakrishnan and especially Artemy Kolchinsky for many helpful discussions, and the Santa Fe Institute for helping to support this research. This paper was made possible through the support of Grant No. TWCF0079/AB47 from the Templeton World Charity Foundation and Grant No. FQXi-RHl3-1349 from the FQXi foundation. The opinions expressed in this paper are those of the author and do not necessarily reflect the view of Templeton World Charity Foundation.
APPENDIX A: PROOF OF PROPOSITION 1
We begin with the following lemma:
Lemma 5.
A GQ process over R guided by V (for conditional distribution π and initial distribution ρ_t(r, s)) will transform any initial distribution

    p_t(r, s) = Σ_v p_t(v) ρ_t(s | v) p_t(r | v)    (63)

into a distribution

    p_{t+1}(r, s) = Σ_v p_t(v) ρ_t(s | v) π(r | v)    (64)

Proof.
Fix some v∗ by sampling p_t(v). Since in a GQ process, microstates only change during the quasi-static relaxation, after the first quench, s, and therefore v, remain unchanged, so v still equals v∗. Due to the infinite potential barriers in S, while s may change during that relaxation, v will not, and so v_{t+1} = v∗ = v_t. Therefore:

    H^t_{quench;int}(r, s) ≡ − kT ln[π(r | v_t)]    (65)

Now, at the end of the relaxation step, ρ(r, s) has settled to thermal equilibrium within the region R × V^{-1}(v_t) ⊂ R × S. Therefore, combining Equation (65) with Equations (29) and (28), we see that the distribution at the end of the relaxation is:

    ρ_{t+1}(r, s) ∝ exp( − H^{t+1}_{quench}(r, s) / kT ) δ(V(s), v_t)
                 = exp( ln[π(r | v_t)] + ln[ρ_t(s)] ) δ(V(s), v_t)
                 = π(r | v_t) ρ_t(s) δ(V(s), v_t)
                 ∝ π(r | v_t) ρ_t(s | v_t)    (66)

Normalizing,

    ρ_{t+1}(r, s) = π(r | v_t) ρ_t(s | v_t)    (67)

Averaging over v_t then gives p_{t+1}(r, s):

    p_{t+1}(r, s) = Σ_v p_t(v) ρ_t(s | v) π(r | v)    (68)

□

By construction, ρ_t(s | v) = 0 for all s ∉ V^{-1}(v). Therefore, if Equation (64) holds and we sum p_{t+1}(r, s) over all s ∈ V^{-1}(v) for an arbitrary v, we get:

    p_{t+1}(r, v) = p_t(v) π(r | v)    (69)

Furthermore, no matter what ρ_t(s | v) is, p_t(r, v) = p_t(v) p_t(r | v). As a result, Lemma 5 implies that a GQ process over R guided by V (for conditional distribution π and initial distribution ρ_t(r, s)) will transform any initial distribution p_t(v) p_t(r | v) into a distribution p_t(v) π(r | v). This is true whether or not p_t(v) = ρ_t(v) or p_t(r | v) = ρ_t(r | v). This establishes the claim of Proposition 1 that the first "crucial feature" of GQ processes holds.

APPENDIX B: THE GQ PROCESSES ITERATING A PING-PONG SEQUENCE
In this section, I present the separate GQ processes for implementing the stages of a ping-pong sequence.

First, recall our assumption from just below the definition of a ping-pong sequence that at the end of any of its stages, Pr(u | y) is always the same distribution q^y_out(u) (and similarly for distributions like Pr(w | x)). Accordingly, at the end of any stage of a ping-pong sequence that implements a GQ process over U guided by X, we can uniquely recover the conditional distribution Pr(u | x) from Pr(y | x):

    π(u | x) ≡ Σ_y π(y | x) q^y_out(u)    (70)

(and similarly for a GQ process over W guided by Y). Conversely, we can always recover Pr(y | x) from Pr(u | x), simply by marginalizing. Therefore, we can treat any distribution π(u | x) defining such a GQ process interchangeably with a distribution π(y | x) (and similarly for distributions π(w | y) and π(x | y) occurring in GQ processes over W guided by Y).

1. To construct the GQ process for the first stage, begin by writing:

    ρ_t(w, u) = Σ_{x,y} G_t(x) δ(x, X(w)) δ(y, 0) q^x_proc(w) q^y_out(u) = q^0_out(u) G_t(X(w)) q^{X(w)}_proc(w)    (71)

where G_t(x) is an assumption for the initial distribution over x, one that in general may be wrong. Furthermore, define the associated distribution:

    ρ_t(u | x) = [ Σ_{w ∈ X^{-1}(x)} ρ_t(w, u) ] / [ Σ_{u′, w ∈ X^{-1}(x)} ρ_t(w, u′) ] = q^0_out(u)    (72)

By Corollary 1, running a GQ process over Y guided by X for conditional distribution π(u | x_t) and initial distribution ρ_t(w, u) will send any initial distribution P_t(x) ρ_t(u | x) = P_t(x) q^0_out(u) to a distribution P_t(x) π(u | x). Therefore, in particular, it will send any initial distribution over x to one coupled to the outputs according to π(u | x). Due to the definition of q^y_out and Equation (70), the associated conditional distribution over y given x, Σ_{u ∈ Y^{-1}(y)} π(u | x), is equal to π(y | x). Accordingly, this GQ process implements the first stage of the organism process, as desired.
In addition, it preserves the validity of our assumptions that Pr(u | y) = q^y_out(u) and similarly for Pr(w | x).

Next, by the discussion at the end of Section II D, this GQ process will be thermodynamically reversible since, by assumption, ρ_t(u | x) is the actual initial distribution over u conditioned on x.

2. To construct the GQ process for the second stage, start by defining an initial distribution based on a (possibly counterfactual) prior G_t(x):

    ρ̂(w_t, u_t) ≡ Σ_{x,y} G_t(x) q^x_proc(w_t) π(y | x) q^y_out(u_t)    (73)

and the associated conditional distribution:

    ρ̂(w_t | y_t) = [ Σ_{u_t ∈ Y^{-1}(y_t)} ρ̂(w_t, u_t) ] / [ Σ_{w′, u′ ∈ Y^{-1}(y_t)} ρ̂(w′, u′) ]    (74)

Note that:

    ρ̂(w_t | y_t) = G_t(x_t | y_t) q^{x_t}_proc(w_t)    (75)

where:

    G_t(x_t | y_t) ≡ π(y_t | x_t) G_t(x_t) / Σ_{x′} π(y_t | x′) G_t(x′)    (76)

Furthermore, define a conditional distribution:

    π(w_t | y_t) ≡ I(w_t ∈ X^{-1}(0)) q^0_proc(w_t)    (77)

Consider a GQ process over W guided by Y for conditional distribution π(w_t | y_t) and initial distribution ρ̂(w_t, u_t). By Corollary 1, this GQ process implements the second stage, as desired. In addition, it preserves the validity of our assumptions that Pr(u | y) = q^y_out(u) and similarly for Pr(w | x).

Next, by the discussion at the end of Section II D, this GQ process will be thermodynamically reversible if ρ̂(w_t | y_{t+1}) is the actual distribution over w_t conditioned on y_{t+1}. By Equation (76), this in general requires that G_t(x_t), the assumption for the initial distribution over x_t that is built into the step (ii) GQ process, is the actual initial distribution over x_t. As discussed at the end of Section II C, work will be dissipated if this is not the case.
Physically, this means that if the device implementing this GQ process is thermodynamically optimal for one input distribution, but used with another, then work will be dissipated (the amount of work dissipated is given by the change in the Kullback–Leibler divergence between G and P in that stage (4) GQ process; see [46]).

3. We can also implement the fourth stage by running a (different) GQ process over X guided by Y. This GQ process is a simple copy operation, i.e., it implements a single-valued, invertible function from y_{t+1} to the initialized state x. Therefore, it is thermodynamically reversible. Finally, we can implement the fifth stage by running an appropriate GQ process over Y guided by X. This process will also be thermodynamically reversible.

[1] Frank, S.A. Natural selection maximizes Fisher information. J. Evolut. Biol., 231–244.
[2] Frank, S.A. Natural selection. V. How to read the fundamental equations of evolutionary change in terms of information theory. J. Evolut. Biol., 2377–2396.
[3] Donaldson-Matasci, M.C.; Bergstrom, C.T.; Lachmann, M. The fitness value of information. Oikos, 219–230.
[4] Krakauer, D.C. Darwinian demons, evolutionary complexity, and information maximization. Chaos Interdiscip. J. Nonlinear Sci., 037110.
[5] Taylor, S.F.; Tishby, N.; Bialek, W. Information and fitness. arXiv:0712.4382.
[6] Bullmore, E.; Sporns, O. The economy of brain network organization. Nat. Rev. Neurosci., 336–349.
[7] Sartori, P.; Granger, L.; Lee, C.F.; Horowitz, J.M. Thermodynamic costs of information processing in sensory adaptation. PLoS Comput. Biol., e1003974.
[8] Mehta, P.; Schwab, D.J. Energetic costs of cellular computation. Proc. Natl. Acad. Sci. USA, 17978–17982.
[9] Mehta, P.; Lang, A.H.; Schwab, D.J. Landauer in the age of synthetic biology: Energy consumption and information processing in biochemical networks. J. Stat. Phys., 1153–1166.
[10] Laughlin, S.B.
Energy as a constraint on the coding and processing of sensory information. Curr. Opin. Neurobiol., 475–480.
[11] Govern, C.C.; ten Wolde, P.R. Energy dissipation and noise correlations in biochemical sensing. Phys. Rev. Lett., 258102.
[12] Govern, C.C.; ten Wolde, P.R. Optimal resource allocation in cellular sensing systems. Proc. Natl. Acad. Sci. USA, 17486–17491.
[13] Lestas, I.; Vinnicombe, G.; Paulsson, J. Fundamental limits on the suppression of molecular fluctuations. Nature, 174–178.
[14] England, J.L. Statistical physics of self-replication. J. Chem. Phys., 121923.
[15] Landenmark, H.K.; Forgan, D.H.; Cockell, C.S. An estimate of the total DNA in the biosphere. PLoS Biol., e1002168.
[16] Landauer, R. Irreversibility and heat generation in the computing process. IBM J. Res. Dev., 183–191.
[17] Landauer, R. Minimal energy requirements in communication. Science, 1914–1918.
[18] Landauer, R. The physical nature of information. Phys. Lett. A, 188–193.
[19] Bennett, C.H. Logical reversibility of computation. IBM J. Res. Dev., 525–532.
[20] Bennett, C.H. The thermodynamics of computation—A review. Int. J. Theor. Phys., 905–940.
[21] Bennett, C.H. Time/space trade-offs for reversible computation. SIAM J. Comput., 766–776.
[22] Bennett, C.H. Notes on Landauer's principle, reversible computation, and Maxwell's Demon. Stud. Hist. Philos. Sci. B, 501–510.
[23] Maroney, O. Generalizing Landauer's principle. Phys. Rev. E, 031105.
[24] Plenio, M.B.; Vitelli, V. The physics of forgetting: Landauer's erasure principle and information theory. Contemp. Phys., 25–60.
[25] Shizume, K. Heat generation required by information erasure. Phys. Rev. E, 3495–3499.
[26] Fredkin, E.; Toffoli, T. Conservative Logic; Springer: Berlin/Heidelberg, Germany, 2002.
[27] Faist, P.; Dupuis, F.; Oppenheim, J.; Renner, R. A quantitative Landauer's principle. arXiv:1211.1037.
[28] Touchette, H.; Lloyd, S.
Information-theoretic approach to the study of control systems.
Physica A, 140–172.
[29] Sagawa, T.; Ueda, M. Minimal energy cost for thermodynamic information processing: Measurement and information erasure. Phys. Rev. Lett., 250602.
[30] Dillenschneider, R.; Lutz, E. Comment on "Minimal Energy Cost for Thermodynamic Information Processing: Measurement and Information Erasure". Phys. Rev. Lett., 198903.
[31] Sagawa, T.; Ueda, M. Fluctuation theorem with information exchange: Role of correlations in stochastic thermodynamics. Phys. Rev. Lett., 180602.
[32] Crooks, G.E. Entropy production fluctuation theorem and the nonequilibrium work relation for free energy differences. Phys. Rev. E, 2721.
[33] Crooks, G.E. Nonequilibrium measurements of free energy differences for microscopically reversible Markovian systems. J. Stat. Phys., 1481–1487.
[34] Janna, F.C.; Moukalled, F.; Gómez, C.A. A Simple Derivation of Crooks Relation. Int. J. Thermodyn., 97–101.
[35] Jarzynski, C. Nonequilibrium equality for free energy differences. Phys. Rev. Lett., doi:10.1103/PhysRevLett.78.2690.
[36] Esposito, M.; van den Broeck, C. Second law and Landauer principle far from equilibrium.
Europhys. Lett. , , 40004.[37] Esposito, M.; van den Broeck, C. Three faces of the second law.I. Master equation formulation. Phys. Rev. E , , 011143.[38] Parrondo, J.M.; Horowitz, J.M.; Sagawa, T. Thermodynamicsof information. Nat. Phys. , , 131–139.[39] Pollard, B.S. A Second Law for Open Markov Processes. ,arXiv:1410.6531.[40] Seifert, U. Stochastic thermodynamics, fluctuation theoremsand molecular machines. Rep. Prog. Phys. , , 126001.[41] Takara, K.; Hasegawa, H.H.; Driebe, D. Generalization ofthe second law for a transition between nonequilibrium states. Phys. Lett. A , , 88–92.[42] Hasegawa, H.H.; Ishikawa, J.; Takara, K.; Driebe, D. Gener-alization of the second law for a nonequilibrium initial state. Phys. Lett. A , , 1001–1004.[43] Prokopenko, M.; Einav, I. Information thermodynamics ofnear-equilibrium computation. Phys. Rev. E , , 062143.[44] Sagawa, T. Thermodynamic and logical reversibilities revisited. J. Stat. Mech. , , P03025. [45] Mandal, D.; Jarzynski, C. Work and information processing ina solvable model of Maxwell’s demon. Proc. Natl. Acad. Sci.USA , , 11641–11645.[46] Wolpert, D.H. Extending Landauer’s bound from bit erasure to arbitrary computation. , arXiv:1508.05319.[47] Barato, A.C.; Seifert, U. Stochastic thermodynamics with in-formation reservoirs. Phys. Rev. E , , 042150.[48] De ff ner, S.; Jarzynski, C. Information processing and the sec-ond law of thermodynamics: An inclusive, Hamiltonian ap-proach. Phys. Rev. X , , 041003.[49] Barato, A.C.; Seifert, U. An autonomous and reversible Maxwell’s demon. Europhys. Lett. , , 60001.[50] Mackay, D. Information Theory, Inference, and Learning Algo-rithms ; Cambridge University Press: Cambridge, UK, 2003.[51] Cover, T.; Thomas, J.
Elements of Information Theory ; Wiley:New York, NY, USA, 1991.[52] Yeung, R.W.
A First Course in Information Theory ; Springer:Berlin / Heidelberg, Germany, 2012. [53] Reif, F.
Fundamentals of Statistical and Thermal Physics ;McGraw-Hill: New York, NY, USA, 1965.[54] Still, S.; Sivak, D.A.; Bell, A.J.; Crooks, G.E. Thermodynamicsof prediction.
Phys. Rev. Lett. , , 120604.[55] Hopcroft, J.E.; Motwani, R.; Ullman J.D. Introduction toAutomata Theory, Languages and Computability ; Addison-Wesley: Boston, MA, USA, 2000.[56] Li, M.; Vit´anyi, P.
An Introduction to Kolmogorov Complex-ity and Its Applications ; Springer: Berlin / Heidelberg, Germany,2008.[57] Grunwald, P.; Vit´anyi, P. Shannon information and Kol-mogorov complexity. , arXiv:cs / Science , , 2075–2078.[59] Sandberg, A. Energetics of the brain and AI.2016