Sensory capacity: an information theoretical measure of the performance of a sensor
David Hartich,¹ Andre C. Barato,² and Udo Seifert¹
¹II. Institut für Theoretische Physik, Universität Stuttgart, 70550 Stuttgart, Germany
²Max Planck Institute for the Physics of Complex Systems, Nöthnitzer Straße 38, 01187 Dresden, Germany
For a general sensory system following an external stochastic signal, we introduce the sensory capacity. This quantity characterizes the performance of a sensor: the sensory capacity is maximal if the instantaneous state of the sensor has as much information about a signal as the whole time series of the sensor. We show that adding a memory to the sensor increases the sensory capacity. This increase quantifies the improvement of the sensor with the addition of the memory. Our results are obtained within the framework of stochastic thermodynamics of bipartite systems, which allows for the definition of an efficiency that relates the rate with which the sensor learns about the signal to the energy dissipated by the sensor, as given by the thermodynamic entropy production. We demonstrate a general tradeoff between sensory capacity and efficiency: if the sensory capacity is equal to its maximum 1, then the efficiency must be less than 1/2. As a physical realization of a sensor we consider a two-component cellular network estimating a fluctuating external ligand concentration as signal. This model leads to coupled linear Langevin equations that allow us to obtain explicit analytical results.
I. INTRODUCTION
The relation between information and thermodynamics is a very active topic, as reviewed in [1]. Prominently, developments in this field have led to a better understanding of fundamental limits related to dissipation in a computer and of cellular information processing. Much of the renewed interest in this relation between information and thermodynamics is associated with the fact that recent experiments with small systems verify fundamental relations like the Landauer limit for the erasure of a bit [2, 3] and the conversion of information into work [4–6]. Theoretical advances in the field include second law inequalities and fluctuation relations containing an informational term [7–29], generalizations of thermodynamics to include information reservoirs [30–39], stochastic thermodynamics of bipartite systems [40–46], and the relation between dissipation and information in biological systems [47–59].

A sensor that learns about (or "measures") an external stochastic signal constitutes a fundamental setup within the thermodynamics of information processing. In this case energy is dissipated and the sensor obtains information about the external signal, in contrast to a Maxwell's demon, which is another fundamental setup, where information is used to extract work.

General results for the thermodynamics of a sensor have been obtained by Still et al. [60]. They have shown that an entropy characterizing how much information the sensor obtains about the external signal is bounded by the dissipated heat. Similarly, we have shown that an entropic rate, dubbed learning rate, is bounded by the thermodynamic entropy production in bipartite systems [55], which allowed for the definition of a thermodynamic efficiency for models related to cellular information processing.

In this paper, using bipartite Markov processes we introduce the sensory capacity, an informational efficacy parameter characterizing the performance of a sensor. This quantity is defined as the learning rate divided by the transfer entropy rate, where the latter quantifies how much information the full time series of the sensor has about the signal. The sensory capacity is positive and bounded by 1. The limit 1 is reached if the information contained in the instantaneous state of the sensor equals the information contained in the whole time series of the sensor, which is the maximum information the sensor can have about the signal.

A bare sensor, i.e., a sensor with only one degree of freedom, is compared to a sensor that contains a memory, which is a second degree of freedom. We show that the addition of a memory to a bare sensor can increase the sensory capacity. This increase in sensory capacity quantifies how much of the information contained in the time series of the bare sensor is stored in the instantaneous state of the memory.

Our results are obtained with coupled linear Langevin equations that constitute a simple example of a bipartite system. These linear Langevin equations are derived from a discrete model for a two-component cellular network estimating an external ligand concentration, which is the signal. The two components of the network are receptors that can bind external ligands and internal proteins that play the role of memory [48, 53, 54, 58, 61]. This derivation starting from a physical model for a sensor allows us to provide a clear physical interpretation of the parameters appearing in the Langevin equations and of the thermodynamic entropy production.

The relation between sensory capacity and energy dissipation is also discussed.
In particular, as a main result we show that if the sensory capacity is 1, the efficiency relating the learning rate to the rate of dissipation must be smaller than 1/2. This result is valid for any bipartite process. The specific tradeoff between sensory capacity and efficiency for the coupled linear Langevin equations is analyzed in detail.

The paper is organized as follows. In Sec. II we define discrete bipartite processes and the quantities calculated in the paper. Sec. III contains the derivation of the coupled linear Langevin equations from the microscopic model for a two-component network. The analysis of the Langevin equations is performed in Sec. IV. The general tradeoff between sensory capacity and efficiency is derived in Sec. V. We conclude in Sec. VI. The continuum limit from a master equation to a Langevin equation in bipartite systems is presented in Appendix A. The uncertainty about the signal given the sensor state and the uncertainty given the sensor trajectory are calculated in Appendix B.

II. BIPARTITE MARKOV PROCESSES AND SENSORY CAPACITY

A. Definition of bipartite systems
A state of the signal is denoted by $x$ and a state of the sensor by $y$. We consider a quite general framework, where the basic assumptions are that the dynamics of the full system composed of the signal and the sensor is Markovian, the dynamics of the signal is not affected by the sensor whereas the dynamics of the sensor is affected by the signal, and the signal alone is also Markovian. For a Markov jump process these assumptions imply the following transition rates from a state $(x,y)$ to a state $(x',y')$:

$$ w^{xx'}_{yy'} \equiv \begin{cases} w^{xx'} & \text{if } x \neq x' \text{ and } y = y', \\ w^{x}_{yy'} & \text{if } x = x' \text{ and } y \neq y', \\ 0 & \text{if } x \neq x' \text{ and } y \neq y'. \end{cases} \qquad (1) $$

Such a Markov process, for which the two variables labeling a state cannot both change in a jump, is called bipartite [41]. The rates (1) correspond to a particular case of a bipartite process since $w^{xx'}$ is independent of $y$. For bipartite systems in a steady state, which is the regime we consider in this paper, the stationary probability of state $(x,y)$ is written as $P(x,y)$. The marginals of this joint probability are defined as $P(x) \equiv \sum_y P(x,y)$ and $P(y) \equiv \sum_x P(x,y)$. The stationary conditional probabilities read $P(x|y) \equiv P(x,y)/P(y)$ and $P(y|x) \equiv P(x,y)/P(x)$.

Key quantities in this paper are the Shannon entropy and the mutual information. The Shannon entropy associated with a random variable $A$ is

$$ H[A] \equiv -\sum_a P(A=a) \ln P(A=a), \qquad (2) $$

where $a$ is a specific realization of $A$ and $P$ denotes a generic probability. The random variable $A$ can be the instantaneous state of the signal $x_t$ or of the sensor $y_t$. Furthermore, $A$ can be a full time series of the signal $\{x_{t'}\}_{t' \le t}$ or of the sensor $\{y_{t'}\}_{t' \le t}$. In the first case, the sum over $a$ in Eq. (2) is a sum over all possible states. In the second case, this sum corresponds to a functional integration over all possible trajectories. The conditional Shannon entropy of $A$ given another random variable $B$ is

$$ H[A|B] \equiv -\sum_{a,b} P(A=a, B=b) \ln P(A=a|B=b). \qquad (3) $$

The mutual information between $A$ and $B$ reads

$$ I[A:B] \equiv H[A] - H[A|B] = H[B] - H[B|A], \qquad (4) $$

where the second equality indicates that the mutual information is symmetric in the variables $A$ and $B$.
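To make these definitions concrete, the following minimal sketch (Python, with hypothetical rate values that are not taken from this paper) builds a bipartite Markov jump process with a two-state signal $x$ and a two-state sensor $y$, obtains the stationary distribution $P(x,y)$ from the generator, and evaluates the mutual information of Eq. (4).

```python
import numpy as np

# Hypothetical toy rates: the signal x flips symmetrically (an equilibrium process);
# the sensor y jumps towards the current signal value faster than away from it.
gamma, k_to, k_away = 1.0, 5.0, 1.0

states = [(x, y) for x in (0, 1) for y in (0, 1)]
idx = {s: i for i, s in enumerate(states)}

# Generator G[i, j] = rate from state j to state i; bipartite structure of Eq. (1):
# x and y never jump simultaneously, and the x-rate does not depend on y.
G = np.zeros((4, 4))
for (x, y) in states:
    j = idx[(x, y)]
    G[idx[(1 - x, y)], j] = gamma                        # w^{xx'}, independent of y
    G[idx[(x, 1 - y)], j] = k_to if y != x else k_away   # w^{x}_{yy'}, depends on x
for j in range(4):
    G[j, j] = -G[:, j].sum()

# Stationary distribution: eigenvector of G with eigenvalue 0
evals, evecs = np.linalg.eig(G)
p = np.real(evecs[:, np.argmin(np.abs(evals))])
p /= p.sum()

P = {s: p[idx[s]] for s in states}
Px = {x: sum(P[(x, y)] for y in (0, 1)) for x in (0, 1)}
Py = {y: sum(P[(x, y)] for x in (0, 1)) for y in (0, 1)}

# Mutual information I[x_t : y_t] = H[x] - H[x|y], Eq. (4), in nats
I_xy = sum(P[s] * np.log(P[s] / (Px[s[0]] * Py[s[1]])) for s in states)
print(f"I[x:y] = {I_xy:.4f} nats")
```

This toy model is only an illustration of the bipartite rate structure; none of its parameters correspond to the cellular network studied later in the paper.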
B. Learning rate

The learning rate is defined as [55]

$$ l_y \equiv \frac{H[x_t|y_t] - H[x_t|y_{t+\mathrm{d}t}]}{\mathrm{d}t}, \qquad (5) $$

where here and in the following, in all expressions that involve a $\mathrm{d}t$ in the denominator, the limit $\mathrm{d}t \to 0$ is implied. The learning rate quantifies the rate at which the sensor reduces the uncertainty (as characterized by the conditional Shannon entropy) about the signal $x_t$ due to its own dynamics [55]. The learning rate can also be written in terms of mutual information,

$$ l_y = \frac{I[x_t : y_{t+\mathrm{d}t}] - I[x_t : y_t]}{\mathrm{d}t}, \qquad (6) $$

which is the rate at which the $y$ jumps increase the mutual information between the sensor $y$ and the signal $x$. This form of the learning rate is also known as "information flow" [40, 44, 45]. Using the relations

$$ P(x_{t+\mathrm{d}t} = x' | x_t = x) = w^{xx'}\mathrm{d}t \ \ \text{for } x \neq x', \qquad P(y_{t+\mathrm{d}t} = y' | x_t = x, y_t = y) = w^x_{yy'}\mathrm{d}t \ \ \text{for } y \neq y', \qquad (7) $$

the learning rate (5) becomes

$$ l_y = \sum_{x,y,y'} P(x,y)\, w^x_{yy'} \ln\frac{P(x|y')}{P(x|y)}. \qquad (8) $$

In the steady state the learning rate is equal to the rate of Shannon entropy reduction of $x$ due to its coupling with $y$, which is defined as [42]

$$ h_x \equiv \frac{H[x_{t+\mathrm{d}t}|y_t] - H[x_t|y_t]}{\mathrm{d}t}. \qquad (9) $$

This conservation law comes from the relation $\frac{\mathrm{d}}{\mathrm{d}t} H[x|y] \equiv h_x - l_y = 0$ [55], where $h_x$ is the contribution due to the $x$ jumps, i.e.,

$$ h_x = \sum_{x,x',y} P(x,y)\, w^{xx'} \ln\frac{P(x|y)}{P(x'|y)}. \qquad (10) $$

FIG. 1. (Color online) Learning rate versus transfer entropy rate. The learning rate takes into account only the instantaneous sensor state $y_t$ (dashed green box) to infer the signal $x_t$, whereas the transfer entropy rate $T_{x\to y}$ takes into account the trajectory highlighted by the blue shaded region.

Since in the stationary state $H[x_{t+\mathrm{d}t}] = H[x_t]$, the learning rate can also be written in the form

$$ l_y = h_x = \frac{I[x_t : y_t] - I[x_{t+\mathrm{d}t} : y_t]}{\mathrm{d}t}. \qquad (11) $$

This expression is similar to the one used in [60], where within a discrete time formalism the term $I[x_{t+\mathrm{d}t}:y_t]$ is identified as "predictive power".
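Continuing with the same kind of two-state toy model (hypothetical rates again), the sketch below evaluates the learning rate from the stationary distribution via Eq. (8) and checks the steady-state identity $l_y = h_x$ of Eqs. (10) and (11).

```python
import numpy as np

gamma, k_to, k_away = 1.0, 5.0, 1.0        # hypothetical rates, as in the previous sketch
w_x = lambda x, xp: gamma if xp != x else 0.0                             # signal rates w^{xx'}
w_y = lambda x, y, yp: (k_to if y != x else k_away) if yp != y else 0.0   # sensor rates w^x_{yy'}

# Stationary distribution of the 4-state bipartite chain
states = [(x, y) for x in (0, 1) for y in (0, 1)]
G = np.array([[w_x(xj, xi) * (yj == yi) + w_y(xj, yj, yi) * (xj == xi)
               for (xj, yj) in states] for (xi, yi) in states], float)
G -= np.diag(G.sum(axis=0))
evals, evecs = np.linalg.eig(G)
p = np.real(evecs[:, np.argmin(np.abs(evals))]); p /= p.sum()
P = dict(zip(states, p))
P_x_given_y = {(x, y): P[(x, y)] / (P[(0, y)] + P[(1, y)]) for (x, y) in states}

# Learning rate, Eq. (8): contribution of sensor jumps y -> 1-y at fixed x
l_y = sum(P[(x, y)] * w_y(x, y, 1 - y) * np.log(P_x_given_y[(x, 1 - y)] / P_x_given_y[(x, y)])
          for (x, y) in states)
# h_x, Eq. (10): contribution of signal jumps x -> 1-x at fixed y
h_x = sum(P[(x, y)] * w_x(x, 1 - x) * np.log(P_x_given_y[(x, y)] / P_x_given_y[(1 - x, y)])
          for (x, y) in states)
print(f"l_y = {l_y:.4f},  h_x = {h_x:.4f}")   # equal in the steady state, Eq. (11)
```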
C. Sensory capacity and transfer entropy rate

Transfer entropy is an informational quantity that detects causal influence between two random variables [62]. It plays an important role in the information thermodynamics of causal networks [19], bipartite systems [40, 42, 45], and feedback-driven systems [18]. The transfer entropy rate from the signal to the sensor, $T_{x\to y}$, is defined as [42]

$$ T_{x\to y} \equiv \frac{H[y_{t+\mathrm{d}t}|\{y_{t'}\}_{t'\le t}] - H[y_{t+\mathrm{d}t}|\{y_{t'}\}_{t'\le t}, x_t]}{\mathrm{d}t} = \frac{I[y_{t+\mathrm{d}t} : x_t \,|\, \{y_{t'}\}_{t'\le t}]}{\mathrm{d}t} = \frac{H[x_t|\{y_{t'}\}_{t'\le t}] - H[x_t | y_{t+\mathrm{d}t}, \{y_{t'}\}_{t'\le t}]}{\mathrm{d}t}. \qquad (12) $$

In the third expression the similarity with the learning rate (5) is explicit: the transfer entropy rate $T_{x\to y}$ quantifies how much information the whole sensor trajectory $\{y_{t'}\}_{t'\le t}$ contains about the instantaneous signal $x_t$, in contrast to the learning rate, which considers only the instantaneous state $y_t$. This difference between the learning rate $l_y$ and the transfer entropy rate $T_{x\to y}$ is illustrated in Fig. 1. The first expression of Eq. (12) contains the standard definition of transfer entropy from the signal to the sensor [62], which can be described as the reduction of the conditional Shannon entropy of $y_{t+\mathrm{d}t}$ given $\{y_{t'}\}_{t'\le t}$ by the further knowledge of the signal state $x_t$.

As shown in [42], $l_y \le T_{x\to y}$, which simply means that the whole trajectory of the sensor $\{y_{t'}\}_{t'\le t}$ contains more information about the instantaneous signal $x_t$ than the instantaneous state of the sensor $y_t$. Based on this inequality we propose the definition of the sensory capacity,

$$ C \equiv \frac{l_y}{T_{x\to y}} \le 1. \qquad (13) $$

For $C = 1$ the sensor has reached an information theoretical limit and its instantaneous state has the maximum possible information, which is the information contained in the whole time series of the sensor. On a side note, as a result related to the fact that the full time series of a sensor contains more information about the signal than its instantaneous state, it has been shown that an information-driven machine using the whole history of measurements can extract more work than a machine that only takes the last measurement into account [22, 27]. This increase in work extraction is characterized by a gain parameter that, like the sensory capacity, is positive and bounded by one.

D. Thermodynamic entropy production and efficiency
The thermodynamic entropy production [63] for bipartite processes has two contributions. One is due to jumps that change the state of the signal,

$$ \sigma_x \equiv \sum_{x,x'} P(x)\, w^{xx'} \ln\frac{w^{xx'}}{w^{x'x}}. \qquad (14) $$

If the bare signal is an equilibrium process, which is the case for the examples considered in this paper, $\sigma_x = 0$. The second contribution arises from jumps that change the state of the sensor, which reads

$$ \sigma_y \equiv \sum_{x,y,y'} P(x,y)\, w^x_{yy'} \ln\frac{w^x_{yy'}}{w^x_{y'y}}. \qquad (15) $$

The inequality $l_y \le \sigma_y$ leads to the efficiency [55]

$$ \eta \equiv \frac{l_y}{\sigma_y} \le 1. \qquad (16) $$

This efficiency relates the rate at which the sensor learns about the signal with the rate of free energy dissipation, which is quantified by the thermodynamic entropy production. For the model system in Sec. III, the entropy production has two terms. One is related to work done by the external signal and another to free energy dissipation inside the cell.
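As an illustration, on the same hypothetical two-state toy model the sensor contribution (15) to the entropy production and the efficiency (16) can be evaluated directly from the stationary distribution; the signal part (14) vanishes here because the bare signal dynamics is an equilibrium flip process.

```python
import numpy as np

gamma, k_to, k_away = 1.0, 5.0, 1.0            # hypothetical rates of the toy model
states = [(x, y) for x in (0, 1) for y in (0, 1)]

def w_sensor(x, y, yp):                        # w^x_{yy'}: jump towards x with k_to, away with k_away
    return (k_to if yp == x else k_away) if yp != y else 0.0

G = np.zeros((4, 4))
for j, (x, y) in enumerate(states):
    G[states.index((1 - x, y)), j] = gamma
    G[states.index((x, 1 - y)), j] = w_sensor(x, y, 1 - y)
G -= np.diag(G.sum(axis=0))
evals, evecs = np.linalg.eig(G)
p = np.real(evecs[:, np.argmin(np.abs(evals))]); p /= p.sum()
P = dict(zip(states, p))

# sigma_y, Eq. (15): sum over sensor jumps of P(x,y) w^x_{yy'} ln(w^x_{yy'} / w^x_{y'y})
sigma_y = sum(P[(x, y)] * w_sensor(x, y, 1 - y)
              * np.log(w_sensor(x, y, 1 - y) / w_sensor(x, 1 - y, y))
              for (x, y) in states)

# learning rate, Eq. (8), needed for the efficiency eta = l_y / sigma_y, Eq. (16)
Pxy = {(x, y): P[(x, y)] / (P[(0, y)] + P[(1, y)]) for (x, y) in states}
l_y = sum(P[(x, y)] * w_sensor(x, y, 1 - y) * np.log(Pxy[(x, 1 - y)] / Pxy[(x, y)])
          for (x, y) in states)
print(f"sigma_y = {sigma_y:.4f}, l_y = {l_y:.4f}, eta = {l_y / sigma_y:.4f}")
```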
E. Upper bound on the transfer entropy, coarse-grained entropy production and coarse-grained learning rate

We now recall the definition of further quantities that will be calculated in this paper. The first quantity is an upper bound on the transfer entropy rate,

$$ \overline{T}_{x\to y} \equiv \frac{H[y_{t+\mathrm{d}t}|y_t] - H[y_{t+\mathrm{d}t}|y_t, x_t]}{\mathrm{d}t} \ge T_{x\to y}. \qquad (17) $$

An important property of this upper bound is that, unlike the transfer entropy rate, it can be written in terms of the stationary distribution as [42]

$$ \overline{T}_{x\to y} = \sum_{x,y,y'} P(x,y)\, w^x_{yy'} \ln\frac{w^x_{yy'}}{w_{yy'}}, \qquad (18) $$

where

$$ w_{yy'} \equiv \sum_x P(x|y)\, w^x_{yy'}. \qquad (19) $$

The inequality $\overline{T}_{x\to y} \ge T_{x\to y}$ is obtained by comparing Eq. (12) with Eq. (17), and using the relations $H[y_{t+\mathrm{d}t}|y_t] \ge H[y_{t+\mathrm{d}t}|\{y_{t'}\}_{t'\le t}]$ and $H[y_{t+\mathrm{d}t}|\{y_{t'}\}_{t'\le t}, x_t] = H[y_{t+\mathrm{d}t}|y_t, x_t]$.

The coarse-grained entropy production is obtained by integrating the variable $x$ out, leading to the expression [64]

$$ \tilde{\sigma}_y \equiv \sum_{y,y'} P(y)\, w_{yy'} \ln\frac{w_{yy'}}{w_{y'y}} \ge 0. \qquad (20) $$

This $\tilde{\sigma}_y$ is a lower bound on the real entropy production, i.e., $\sigma_y \ge \tilde{\sigma}_y$ [64].
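The upper bound (18) and the coarse-grained entropy production (20) only require the stationary distribution and the rates, so they can be checked on the same hypothetical toy model; the sketch below verifies the inequalities $l_y \le \overline{T}_{x\to y}$ and $\tilde{\sigma}_y \le \sigma_y$.

```python
import numpy as np

gamma, k_to, k_away = 1.0, 5.0, 1.0                    # hypothetical rates of the toy model
states = [(x, y) for x in (0, 1) for y in (0, 1)]
w_s = lambda x, y, yp: 0.0 if yp == y else (k_to if yp == x else k_away)   # w^x_{yy'}

G = np.zeros((4, 4))
for j, (x, y) in enumerate(states):
    G[states.index((1 - x, y)), j] = gamma
    G[states.index((x, 1 - y)), j] = w_s(x, y, 1 - y)
G -= np.diag(G.sum(axis=0))
evals, evecs = np.linalg.eig(G)
p = np.real(evecs[:, np.argmin(np.abs(evals))]); p /= p.sum()
P = dict(zip(states, p))
P_y = {y: P[(0, y)] + P[(1, y)] for y in (0, 1)}
P_x_y = {(x, y): P[(x, y)] / P_y[y] for (x, y) in states}

# effective sensor rates, Eq. (19): w_{yy'} = sum_x P(x|y) w^x_{yy'}
w_eff = {(y, 1 - y): sum(P_x_y[(x, y)] * w_s(x, y, 1 - y) for x in (0, 1)) for y in (0, 1)}

# upper bound on the transfer entropy rate, Eq. (18)
T_bar = sum(P[(x, y)] * w_s(x, y, 1 - y) * np.log(w_s(x, y, 1 - y) / w_eff[(y, 1 - y)])
            for (x, y) in states)
# coarse-grained entropy production, Eq. (20)
sigma_cg = sum(P_y[y] * w_eff[(y, 1 - y)] * np.log(w_eff[(y, 1 - y)] / w_eff[(1 - y, y)])
               for y in (0, 1))
# learning rate (8) and sensor entropy production (15)
l_y = sum(P[(x, y)] * w_s(x, y, 1 - y) * np.log(P_x_y[(x, 1 - y)] / P_x_y[(x, y)]) for (x, y) in states)
sigma_y = sum(P[(x, y)] * w_s(x, y, 1 - y) * np.log(w_s(x, y, 1 - y) / w_s(x, 1 - y, y)) for (x, y) in states)
print(f"l_y = {l_y:.3f} <= T_bar = {T_bar:.3f}")
print(f"sigma_cg = {sigma_cg:.3f} <= sigma_y = {sigma_y:.3f}")
```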
F. Sensor with a memory

We now consider a sensor with two degrees of freedom, $y \equiv (r, m)$. We assume that $r$ is the first degree of freedom, directly sensing the signal $x$, and $m$ is a memory storing the information collected by $r$ (see [58] for a similar setup). The coarse-grained learning rate is defined as [55]

$$ l_r \equiv \frac{H[x_t|r_t] - H[x_t|r_{t+\mathrm{d}t}]}{\mathrm{d}t} = \sum_{x,r,r',m} P(x,r,m)\, w^x_{(r,m)(r',m)} \ln\frac{P(x|r')}{P(x|r)}, \qquad (21) $$

where $w^x_{(r,m)(r',m)}$ denotes the transition rate from $(x,r,m)$ to $(x,r',m)$. The rate at which $r$ alone learns about the signal $x$ is quantified by $l_r$, which satisfies $l_r \le l_y$ [55]. The transition rates then have the form

$$ w^{xx'}_{yy'} \equiv \begin{cases} w^{xx'} & \text{if } x \neq x' \text{ and } y = y', \\ w^x_{rr'} & \text{if } x = x',\ r \neq r' \text{ and } m = m', \\ w_{(r,m)(r,m')} & \text{if } x = x',\ r = r' \text{ and } m \neq m', \\ 0 & \text{otherwise}, \end{cases} \qquad (22) $$

where $y' = (r', m')$. The transition rates (22) imply the causal relation $x \to r \to m$, which is illustrated in Fig. 2. Therefore, the coarse-grained learning rate in Eq. (21) becomes

$$ l_r = \sum_{x,r,r'} P(x,r)\, w^x_{rr'} \ln\frac{P(x|r')}{P(x|r)}. \qquad (23) $$

FIG. 2. (Color online) Illustration of the causal relation $x \to r \to m$ for a sensor $y = (r, m)$ composed of the first layer $r$ and the memory $m$.

Transition rates with three variables that do not change simultaneously in a jump, as in Eq. (22), form a tripartite system, which is a particular case of a multipartite Markov process [46]. The transfer entropy in this case fulfills the relation

$$ T_{x\to y} = T_{x\to r}, \qquad (24) $$

where

$$ T_{x\to r} \equiv \frac{H[r_{t+\mathrm{d}t}|\{r_{t'}\}_{t'\le t}] - H[r_{t+\mathrm{d}t}|\{r_{t'}\}_{t'\le t}, x_t]}{\mathrm{d}t}. \qquad (25) $$

Relation (24) means that the transfer entropy from the signal $x$ to the sensor $y = (r,m)$ is equal to the transfer entropy from $x$ to the first layer of the sensor $r$. This relation is a consequence of the causal relation $x \to r \to m$ and can be demonstrated as follows.

By defining $z_t \equiv (x_t, r_t, m_t)$, the conditional probability $P(z_{t+\mathrm{d}t}|z_t)$ can be written as

$$ P(z_{t+\mathrm{d}t}|z_t) = P(x_{t+\mathrm{d}t}|x_t)\, P(r_{t+\mathrm{d}t}|x_t, r_t)\, P(m_{t+\mathrm{d}t}|r_t, m_t), \qquad (26) $$

which follows from the structure of the rates in Eq. (22). From the definition of the conditional Shannon entropy (3), Eq. (26) implies the relations

$$ H[z_{t+\mathrm{d}t}|z_t] = H[x_{t+\mathrm{d}t}|x_t] + H[r_{t+\mathrm{d}t}|x_t, r_t] + H[m_{t+\mathrm{d}t}|r_t, m_t] \qquad (27) $$

and

$$ H[y_{t+\mathrm{d}t}|y_t, x_t] \equiv H[r_{t+\mathrm{d}t}, m_{t+\mathrm{d}t}|r_t, m_t, x_t] = H[r_{t+\mathrm{d}t}|r_t, x_t] + H[m_{t+\mathrm{d}t}|r_t, m_t]. \qquad (28) $$

For large time $t$, the Markov property $P(z_{t+\mathrm{d}t}|z_t) = P(z_{t+\mathrm{d}t}|\{z_{t'}\}_{t'\le t})$ and (26) lead to

$$ H[y_{t+\mathrm{d}t}|\{y_{t'}\}_{t'\le t}] = H[r_{t+\mathrm{d}t}, m_{t+\mathrm{d}t}|\{r_{t'}\}_{t'\le t}, \{m_{t'}\}_{t'\le t}] = H[r_{t+\mathrm{d}t}|\{r_{t'}\}_{t'\le t}] + H[m_{t+\mathrm{d}t}|m_t, r_t]. \qquad (29) $$

Finally, from Eqs. (28) and (29) we obtain the transfer entropy rate (12) in the form

$$ T_{x\to y} = \frac{H[y_{t+\mathrm{d}t}|\{y_{t'}\}_{t'\le t}] - H[y_{t+\mathrm{d}t}|y_t, x_t]}{\mathrm{d}t} = \frac{H[r_{t+\mathrm{d}t}|\{r_{t'}\}_{t'\le t}] - H[r_{t+\mathrm{d}t}|r_t, x_t]}{\mathrm{d}t}, \qquad (30) $$
which, after a comparison with (25), yields the desired equality (24).

FIG. 3. (Color online) Illustration of the cellular two-component network. The total number of receptors is $N_b = 7$ and the number of occupied receptors is $n_b = 3$. The number of internal proteins, which constitute the memory, is $N_y = 10$, with $n_y = 4$ of them phosphorylated. The number of occupied receptors affects the transition rates related to the phosphorylation of internal proteins.

From the definition of the upper bound on the transfer entropy rate (17) and Eq. (26) we obtain

$$ \overline{T}_{x\to y} = \frac{H[r_{t+\mathrm{d}t}|r_t, m_t] - H[r_{t+\mathrm{d}t}|r_t, x_t]}{\mathrm{d}t}. \qquad (31) $$

Hence, the inequality $H[r_{t+\mathrm{d}t}|r_t, m_t] \le H[r_{t+\mathrm{d}t}|r_t]$ leads to

$$ \overline{T}_{x\to y} \le \overline{T}_{x\to r}, \qquad (32) $$

where

$$ \overline{T}_{x\to r} \equiv \frac{H[r_{t+\mathrm{d}t}|r_t] - H[r_{t+\mathrm{d}t}|r_t, x_t]}{\mathrm{d}t}. \qquad (33) $$

Note that inequality (32) is the opposite of what happens for the learning rate, i.e., $l_r \le l_y$. The chain of inequalities that summarizes the relations discussed in this section, involving the learning rate, the coarse-grained learning rate, the transfer entropy rates and the upper bounds on the transfer entropy rates, is given by

$$ l_r \le l_y \le T_{x\to r} = T_{x\to y} \le \overline{T}_{x\to y} \le \overline{T}_{x\to r}. \qquad (34) $$

The adaptation of the expressions from this section to the continuum limit, where the master equation becomes a Fokker-Planck equation, is presented in Appendix A.

III. CELLULAR TWO-COMPONENT NETWORK SENSING AN EXTERNAL LIGAND CONCENTRATION
As a physical realization of a sensor we consider the cellular two-component network sensing a fluctuating ligand concentration shown in Fig. 3 (see [58] for a similar setup). The signal $x$ is related to the external ligand concentration $s$ through the expression $x = \ln(s/s_0)$, where $s_0$ is some base concentration value. The first layer of the two-component network, which is the degree of freedom directly sensing the external concentration, is composed of the receptors. Each receptor can be either bound by a ligand or empty, with the possible values of the number of bound receptors given by $n_b = 0, 1, \ldots, N_b$, where $N_b$ is the total number of receptors. The second layer of the two-component network is composed of internal proteins Y that can be phosphorylated to the state Y*. The number of proteins in this phosphorylated form takes the values $n_y = 0, 1, \ldots, N_y$, where $N_y$ is the total number of proteins. This second degree of freedom is the memory of the sensor: the phosphorylation/dephosphorylation reaction rates depend on $n_b$, whereas $n_y$ has no influence on the transition rates changing the number of occupied receptors. A state of the sensor is fully characterized by $y = (n_b, n_y)$.

The rates with which the concentration changes are written as

$$ w^{(1)}_{\pm}(x) = \frac{D_x}{\mathrm{d}x^2} \exp\!\left(\mp \frac{\omega_x x\, \mathrm{d}x}{2 D_x}\right), \qquad (35) $$

where $x$ is a multiple of $\mathrm{d}x$ and the "+" sign indicates a jump from $x$ to $x + \mathrm{d}x$ while the "$-$" sign indicates a jump from $x$ to $x - \mathrm{d}x$. As shown in Appendix A, the limit $\mathrm{d}x \to 0$ leads to the Langevin equation

$$ \dot{x}_t = -\omega_x x_t + \xi^x_t \qquad (36) $$

for the dynamics of the signal. The white noise $\xi^x_t$ fulfills the relation

$$ \langle \xi^x_t \xi^x_{t'} \rangle = 2 D_x\, \delta(t - t'), \qquad (37) $$

where the brackets denote an average over stochastic trajectories.

The number of occupied receptors changes with rates

$$ w^{(2)}_{+}(x, n_b) = \omega^+_r(x)\, [N_b - n_b], \qquad w^{(2)}_{-}(x, n_b) = \omega^-_r(x)\, n_b, \qquad (38) $$

where $\omega^+_r(x)$ is the rate for the binding of a ligand to any free receptor and $\omega^-_r(x)$ is the rate for the unbinding of a ligand from any occupied receptor. These rates fulfill the generalized detailed balance relation $\omega^+_r(x)/\omega^-_r(x) = \exp[\Delta F(x)]$, where $\Delta F(x)$ is the free energy difference between empty and occupied receptor and we set $k_B T \equiv 1$. The internal proteins are phosphorylated through the chemical reaction

$$ \mathrm{Y} + \mathrm{ATP} \;\underset{n_b \kappa_-}{\overset{n_b \kappa_+}{\rightleftharpoons}}\; \mathrm{Y}^* + \mathrm{ADP}, \qquad (39) $$

with rates that are proportional to the number of bound receptors $n_b$. Besides this chemical reaction the internal proteins can also be dephosphorylated through the reaction

$$ \mathrm{Y}^* \;\underset{\nu_-}{\overset{\nu_+}{\rightleftharpoons}}\; \mathrm{Y} + \mathrm{P_i}, \qquad (40) $$

where the rates are independent of $n_b$. The rates in (39) and (40) fulfill the relation $\ln[\kappa_+\nu_+/(\kappa_-\nu_-)] \equiv \Delta\mu$, where $\Delta\mu \equiv \mu_\mathrm{ATP} - \mu_\mathrm{ADP} - \mu_{\mathrm{P_i}}$ is the free energy liberated in one ATP hydrolysis. We define the total transition rates for individual proteins as

$$ \omega^+_m(n_b) \equiv n_b \kappa_+ + \nu_-, \qquad \omega^-_m(n_b) \equiv n_b \kappa_- + \nu_+. \qquad (41) $$

With these rates for the change of an individual protein we obtain the transition rates for a change in the variable $n_y$,

$$ w^{(3)}_{+}(n_b, n_y) = \omega^+_m(n_b)\, [N_y - n_y], \qquad w^{(3)}_{-}(n_b, n_y) = \omega^-_m(n_b)\, n_y. \qquad (42) $$

The entropy production due to the sensor jumps, $\sigma_y$, has two contributions. The first is due to jumps that change the receptor occupancy,

$$ \sigma_r = \sum_{x, n_b} J_r(x, n_b) \ln\frac{w^{(2)}_{+}(x, n_b)}{w^{(2)}_{-}(x, n_b + 1)}, \qquad (43) $$

where

$$ J_r(x, n_b) \equiv P(x, n_b)\, w^{(2)}_{+}(x, n_b) - P(x, n_b + 1)\, w^{(2)}_{-}(x, n_b + 1) \qquad (44) $$

is the probability current.
The second is due to jumps that change the number of phosphorylated internal proteins,

$$ \sigma_m = \sum_{n_b, n_y} J_m(n_b, n_y) \ln\frac{w^{(3)}_{+}(n_b, n_y)}{w^{(3)}_{-}(n_b, n_y + 1)}, \qquad (45) $$

where

$$ J_m(n_b, n_y) \equiv P(n_b, n_y)\, w^{(3)}_{+}(n_b, n_y) - P(n_b, n_y + 1)\, w^{(3)}_{-}(n_b, n_y + 1) \qquad (46) $$

is the probability current. The quantity $\sigma_r$ corresponds to the rate of dissipated heat due to binding and unbinding of ligands at different concentration values. This dissipated heat is compensated by work that is done by the external signal. The quantity $\sigma_m$ is the rate of dissipated free energy related to the consumption of ATP inside the cell. Actually, since we are not considering each individual link associated with the phosphorylation and dephosphorylation chemical reactions, but rather the total transition rates in Eq. (41), $\sigma_m$ is a lower bound on the rate of heat dissipated due to ATP consumption. A thorough discussion of the physical origin of the different terms in the entropy production for related models can be found in [55].

As shown in Appendix A, taking the linear noise approximation and assuming a signal with small fluctuations, the transition rates in Eqs. (35), (38), and (42) lead to the Langevin equations

$$ \dot{x}_t = -\omega_x x_t + \xi^x_t \quad \text{(signal)}, \qquad \dot{r}_t = -\omega_r (r_t - x_t) + \xi^r_t \quad \text{(sensor)}, \qquad \dot{m}_t = -\omega_m (m_t - r_t) + \xi^m_t \quad \text{(memory)}, \qquad (47) $$

where $\langle \xi^i_t \xi^j_{t'} \rangle = 2 D_i\, \delta_{ij}\, \delta(t - t')$ for $i, j = x, r, m$. The variable $r$ is related to the number of bound receptors, as shown in Eq. (A17), and the memory $m$ to the number of phosphorylated internal proteins, as shown in Eq. (A18). The precise relations between the parameters in these equations and the transition rates can be found in Appendix A. There are three key points about these relations. First, for $\Delta\mu = 0$, i.e., without free energy dissipation due to ATP hydrolysis inside the cell, the memory becomes decoupled from the receptor and has no information about the signal, which in Eq. (47) implies $D_m \to \infty$. Second, the noise amplitude $D_r$ is inversely proportional to the total number of receptors $N_b$. Third, the noise amplitude $D_m$ is inversely proportional to the total number of internal proteins $N_y$.
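The coupled Langevin equations (47) are straightforward to integrate numerically. The following sketch uses a simple Euler-Maruyama scheme with hypothetical parameter values (not those used for the figures of this paper) to generate stationary trajectories of signal, receptor variable, and memory; such trajectories can be used to estimate the covariances entering the quantities of Sec. IV.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters (in units where omega_x = D_x = 1)
omega_x, omega_r, omega_m = 1.0, 2.0, 4.0
D_x, D_r, D_m = 1.0, 0.2, 0.01
dt, n_steps = 1e-3, 500_000

x = r = m = 0.0
samples = np.empty((n_steps, 3))
for i in range(n_steps):
    # Euler-Maruyama update of Eq. (47); noise variances follow <xi_i xi_j> = 2 D_i delta_ij delta(t-t')
    dx = -omega_x * x * dt + np.sqrt(2 * D_x * dt) * rng.standard_normal()
    dr = -omega_r * (r - x) * dt + np.sqrt(2 * D_r * dt) * rng.standard_normal()
    dm = -omega_m * (m - r) * dt + np.sqrt(2 * D_m * dt) * rng.standard_normal()
    x, r, m = x + dx, r + dr, m + dm
    samples[i] = (x, r, m)

Sigma = np.cov(samples[n_steps // 10:].T)        # discard the transient, estimate the covariance matrix
print("sampled Sigma_xx =", Sigma[0, 0], " (exact: D_x/omega_x =", D_x / omega_x, ")")
```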
IV. SENSORY CAPACITY AND EFFICIENCY FOR MODEL SYSTEM

A. Bare sensor

First we consider a bare sensor without memory, i.e., the Langevin equations (47) without the variable $m$. We use the subscript $r$ for the sensory capacity $C_r$ and the efficiency $\eta_r$ of the bare sensor in this subsection in order to distinguish it from the sensor with a memory analyzed in the next subsection. The corresponding Lyapunov equation for the covariance matrix

$$ \Sigma = \begin{pmatrix} \Sigma_{xx} & \Sigma_{xr} \\ \Sigma_{rx} & \Sigma_{rr} \end{pmatrix} \equiv \begin{pmatrix} \langle x_t x_t\rangle & \langle x_t r_t\rangle \\ \langle r_t x_t\rangle & \langle r_t r_t\rangle \end{pmatrix} \qquad (48) $$

reads [65, 66]

$$ \dot{\Sigma} = -\mathbf{A}\Sigma - \Sigma\mathbf{A}^\top + 2\mathbf{D}, \qquad (49) $$

where

$$ \mathbf{A} \equiv \begin{pmatrix} \omega_x & 0 \\ -\omega_r & \omega_r \end{pmatrix} \quad \text{and} \quad \mathbf{D} \equiv \begin{pmatrix} D_x & 0 \\ 0 & D_r \end{pmatrix}. \qquad (50) $$

The steady state solution of (49) is

$$ \Sigma = E \begin{pmatrix} 1 & \frac{\nu_r}{\nu_r+1} \\[2pt] \frac{\nu_r}{\nu_r+1} & \frac{\nu_r}{\nu_r+1} + \frac{B_r}{\nu_r} \end{pmatrix}, \qquad (51) $$

where $E \equiv D_x/\omega_x$ is the signal variance, $\nu_r \equiv \omega_r/\omega_x$, and $B_r \equiv D_r/D_x$. As shown in Appendix A, the learning rate is

$$ l_r = \omega_x \frac{\nu_r^3}{\nu_r^2 + B_r(1+\nu_r)^2}. \qquad (52) $$

FIG. 4. (Color online) Sensor performance as a function of the sensor noise $B_r = D_r/D_x$. (a) Transfer entropy rate $T_{x\to r}$ and learning rate $l_r$. The vertical dotted line at $B_r = \nu_r^2/(\nu_r^2-1)$ indicates the value for which $C_r = 1$, i.e., $l_r = T_{x\to r}$. (b) Efficiency ($\eta_r = l_r/\sigma_r$) and capacity ($C_r = l_r/T_{x\to r}$) of the bare sensor, together with the ratio $T_{x\to r}/\overline{T}_{x\to r}$. At maximal capacity $C_r = 1$ the efficiency is $\eta_r = 1/2$ and $\overline{T}_{x\to r} = T_{x\to r}$. (c) Comparison of errors. For $C_r = 1$ the inequality $\mathcal{E}_{x|r_\mathrm{traj}} \le \mathcal{E}_{x|r}$ saturates.

The transfer entropy rate for the linear Langevin equations (47) is given by [45]

$$ T_{x\to r} = \frac{\omega_x}{2}\left(\sqrt{1 + \frac{\nu_r^2}{B_r}} - 1\right). \qquad (53) $$

The learning rate and the transfer entropy rate as functions of $B_r$ are plotted in Fig. 4(a). Both quantities get smaller as the noise amplitude of the sensor gets larger. At the intermediate value $B_r = \nu_r^2/(\nu_r^2-1)$ the learning rate and the transfer entropy rate become the same, leading to a sensory capacity $C_r = 1$, as shown in Fig. 4(b).

Since the bare sensor does not have a memory, there is no ATP consumption inside the cell and the entropy production is equal to the rate of work done by the external signal, which, as calculated in Appendix A in Eq. (A38), is

$$ \sigma_r = \omega_x \frac{\nu_r^2}{B_r(1+\nu_r)}. \qquad (54) $$

This entropy production decreases with $B_r$, i.e., a sensor with smaller noise amplitude, which can be obtained by increasing the number of receptors [see Eq. (A14)], implies more energy dissipation. In Fig. 4(b) the thermodynamic efficiency is compared with the sensory capacity. The efficiency increases with $B_r$. For $B_r = \nu_r^2/(\nu_r^2-1)$, where $C_r = 1$, the efficiency is $\eta_r = 1/2$. As we show in Sec. V, there is a general tradeoff between efficiency and sensory capacity, with $C = 1$ implying $\eta \le 1/2$. The upper bound (33) on the transfer entropy rate for the bare sensor reads

$$ \overline{T}_{x\to r} = \frac{\omega_x \nu_r^2}{4 B_r}\left[1 - \frac{\nu_r^3}{\nu_r^3 + \nu_r^2 + B_r(1+\nu_r)^2}\right]. \qquad (55) $$

This quantity has also been calculated in [59]. Comparing the upper bound with the transfer entropy rate in Fig. 4(b) we observe that for this model, when the sensory capacity is one, we have $l_r = T_{x\to r} = \overline{T}_{x\to r}$. This fact plays an important role in the general tradeoff between sensory capacity and efficiency proved in Sec. V.

In Appendix B we define the uncertainties $\mathcal{E}_{x|r}$ and $\mathcal{E}_{x|r_\mathrm{traj}}$ about the signal given the sensor state and the sensor trajectory, respectively. As shown in Appendix B, $\mathcal{E}_{x|r_\mathrm{traj}}$ is proportional to the transfer entropy rate $T_{x\to r}$ and $\mathcal{E}_{x|r}$ is proportional to the upper bound $\overline{T}_{x\to r}$ for the present model. Hence, the equality between the transfer entropy rate and its upper bound for $C_r = 1$ implies that both uncertainties are also the same, as shown in Fig. 4(c).
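The closed-form expressions above can be evaluated directly. The sketch below (Python, with an illustrative choice $\nu_r = 2$ that is not taken from the figures of this paper) implements Eqs. (52)-(54) as reconstructed in this section and reproduces the stated behavior: at $B_r = \nu_r^2/(\nu_r^2-1)$ the capacity $C_r$ reaches 1 and the efficiency equals 1/2.

```python
import numpy as np

def bare_sensor(nu_r, B_r, omega_x=1.0):
    """Bare-sensor quantities: learning rate (52), transfer entropy rate (53),
    entropy production (54), and the derived capacity C_r and efficiency eta_r."""
    l_r = omega_x * nu_r**3 / (nu_r**2 + B_r * (1 + nu_r)**2)        # Eq. (52)
    T_r = 0.5 * omega_x * (np.sqrt(1 + nu_r**2 / B_r) - 1)           # Eq. (53)
    sigma_r = omega_x * nu_r**2 / (B_r * (1 + nu_r))                 # Eq. (54)
    return l_r, T_r, sigma_r, l_r / T_r, l_r / sigma_r

nu_r = 2.0                                  # illustrative value
B_star = nu_r**2 / (nu_r**2 - 1)            # sensor noise at which C_r = 1
for B_r in (0.2, B_star, 5.0):
    l_r, T_r, sigma_r, C_r, eta_r = bare_sensor(nu_r, B_r)
    print(f"B_r = {B_r:5.3f}   C_r = {C_r:.3f}   eta_r = {eta_r:.3f}")
# At B_r = nu_r^2/(nu_r^2 - 1) the output shows C_r = 1 and eta_r = 1/2.
```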
B. Memory increases sensory capacity

For the regimes where the bare sensor does not reach a sensory capacity close to 1, it is possible to increase the sensory capacity by adding a memory to the bare sensor, which leads to the third equation in (47). The Lyapunov equation (49) for this case has the $3\times 3$ matrices

$$ \mathbf{A} = \begin{pmatrix} \omega_x & 0 & 0 \\ -\omega_r & \omega_r & 0 \\ 0 & -\omega_m & \omega_m \end{pmatrix} \quad \text{and} \quad \mathbf{D} \equiv \begin{pmatrix} D_x & 0 & 0 \\ 0 & D_r & 0 \\ 0 & 0 & D_m \end{pmatrix}. \qquad (56) $$

The stationary solution of (49) is too long to be displayed here.

The expression for the learning rate $l_y$ is given in Appendix A in Eq. (A44). As shown in Eq. (24), the addition of the memory does not change the transfer entropy rate, $T_{x\to y} = T_{x\to r}$, which remains as given by (53). The coarse-grained learning rate $l_r$ is the learning rate of the bare sensor calculated in Eq. (52). The quantities $l_y$, $l_r$ and $T_{x\to r}$ are plotted in Fig. 5(a) as a function of the noise amplitude $B_m \equiv D_m/D_x$. For large values of $B_m$ the learning rate $l_y$ becomes equal to $l_r$; the learning rate does not increase substantially with the addition of a memory with large noise amplitude. By decreasing the noise amplitude, $l_y$ increases until it reaches the transfer entropy rate $T_{x\to r}$ for small $B_m$. Hence, the sensory capacity $C$ increases with decreasing $B_m$, as shown in Fig. 5(b).

FIG. 5. (Color online) Effect of a memory. (a) Transfer entropy rate $T_{x\to r}$, learning rate of the bare sensor $l_r$, and learning rate of the full sensor $l_y$ (including the memory) as a function of the memory noise $B_m = D_m/D_x$. The transfer entropy estimate $\overline{T}_{x\to y}$ and the learning rate $l_y$ approach $T_{x\to r}$ for $B_m \to 0$. (b) Sensory capacities $C = l_y/T_{x\to r}$ and $C_r = l_r/T_{x\to r}$ in comparison with the thermodynamic efficiency $\eta = l_y/\sigma_y$. (c) Effect of the memory on the error. The error $\mathcal{E}_{x|y}$ corresponding to the full sensor state approaches the minimal error $\mathcal{E}_{x|r_\mathrm{traj}}$ for $B_m \to 0$.

The rate of free energy dissipation now has two contributions, i.e., $\sigma_y = \sigma_r + \sigma_m$. The $\sigma_r$ given by (54) corresponds to the work done by the external signal. The additional term, which is derived in Appendix A in Eq. (A39), is given by

$$ \sigma_m = \omega_x \frac{\nu_m^2\left[\nu_r^2 + B_r(1+\nu_m)(1+\nu_r)\right]}{B_m(1+\nu_m)(1+\nu_r)(\nu_m+\nu_r)}, \qquad (57) $$

where $\nu_m \equiv \omega_m/\omega_x$. This $\sigma_m$ is a lower bound on the rate of dissipated free energy due to ATP consumption. From expression (57), a decrease in the noise amplitude $D_m$, which leads to an increase in sensory capacity, implies an increase in the rate of ATP consumption inside the cell. Adding a dissipative memory to a bare sensor can thus lead to an increase in sensory capacity. This increase corresponds to how much of the information about the trajectory $\{r_{t'}\}_{t'\le t}$ is contained in the instantaneous state of the memory $m_t$.

For fixed $B_m$, the sensory capacity $C$ as a function of $\nu_m \equiv \omega_m/\omega_x$ has a maximum, as shown in the contour plot in Fig. 6. Therefore, for a given $\omega_x$, which characterizes the time scale of changes in the external signal, the memory has an optimal $\omega_m$, which characterizes the time scale of changes in the memory. A sensory capacity close to 1 is reached for small $B_m$ and $\nu_m \approx \sqrt{\nu_r^2/B_r}$, as indicated by the red region in Fig. 6.

FIG. 6. (Color online) Effect of the memory parameters $\nu_m$ and $B_m$ on the sensory capacity. For $\nu_m = \sqrt{\nu_r^2/B_r}$ and $B_m \to 0$ the capacity approaches $C \to 1$. The star marks the parameters $(\nu_m^\star, B_m^\star)$ for which the efficiency $\eta$ is maximal.

A larger $\sigma_m$ leads to a lower efficiency, as shown in Fig. 5(b). Adding a memory with a high rate of dissipation due to ATP consumption can increase a low sensory capacity to the limit $C = 1$. In this case, when $C = 1$, the efficiency is small due to the high dissipation of the memory; the maximal efficiency achieved in the region plotted in Fig. 6 is much smaller than 1/2. Moreover, $C = 1$ implies that both uncertainties are equal, as shown in Fig. 5(c).
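A minimal numerical sketch of this behavior is given below: it solves the stationary Lyapunov equation with the matrices (56), obtains the learning rate of the full sensor from $l_y = \omega_x[E(\Sigma^{-1})_{xx} - 1]$ [cf. Eq. (A42)], and combines it with Eqs. (53), (54) and (57). The parameter values are illustrative assumptions, not the ones used in Figs. 5 and 6.

```python
import numpy as np

def capacity_with_memory(nu_r, B_r, nu_m, B_m, omega_x=1.0, D_x=1.0):
    """Sensory capacity C = l_y / T_{x->r} and efficiency eta = l_y / (sigma_r + sigma_m)
    for the linear model (47), using the Lyapunov equation (49) with the matrices (56)."""
    A = omega_x * np.array([[1.0, 0.0, 0.0], [-nu_r, nu_r, 0.0], [0.0, -nu_m, nu_m]])
    D = D_x * np.diag([1.0, B_r, B_m])
    # solve A Sigma + Sigma A^T = 2 D by vectorizing the stationary Lyapunov equation
    I3 = np.eye(3)
    Sigma = np.linalg.solve(np.kron(A, I3) + np.kron(I3, A), (2 * D).flatten()).reshape(3, 3)
    E = D_x / omega_x
    l_y = omega_x * (E * np.linalg.inv(Sigma)[0, 0] - 1)                     # learning rate, Eq. (A42)
    T = 0.5 * omega_x * (np.sqrt(1 + nu_r**2 / B_r) - 1)                     # Eq. (53)
    sigma_r = omega_x * nu_r**2 / (B_r * (1 + nu_r))                         # Eq. (54)
    sigma_m = (omega_x * nu_m**2 * (nu_r**2 + B_r * (1 + nu_m) * (1 + nu_r))
               / (B_m * (1 + nu_m) * (1 + nu_r) * (nu_m + nu_r)))            # Eq. (57)
    return l_y / T, l_y / (sigma_r + sigma_m)

nu_r, B_r = 2.0, 0.1                       # a bare sensor with C_r well below 1
nu_m = np.sqrt(nu_r**2 / B_r)              # memory time scale near its optimum
for B_m in (1.0, 0.1, 0.01, 0.001):
    C, eta = capacity_with_memory(nu_r, B_r, nu_m, B_m)
    print(f"B_m = {B_m:7.3f}   C = {C:.3f}   eta = {eta:.4f}")
# As B_m decreases, the capacity C approaches 1 while the efficiency eta drops.
```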
V. TRADEOFF BETWEEN SENSORY CAPACITY AND EFFICIENCY

A. Tradeoff for model system

There are two situations in which the maximal sensory capacity $C = 1$ can be reached. Either the parameters related to the signal and the first layer of the sensor are chosen in such a way that there is no further information in the trajectory $\{r_{t'}\}_{t'\le t}$ as compared to the instantaneous state $r_t$, or a dissipative memory is added to the sensor. In the first case the efficiency is $\eta = 1/2$ at $C = 1$, and in the other case $\eta < 1/2$.

FIG. 7. Trade-off between capacity $C$ and efficiency $\eta$. The parameters of the bare sensor, $\nu_r$ and $B_r$, are chosen at random over a wide range. For the sensor with memory, the parameters $\nu_m$ and $B_m$ are, in addition, chosen in the same way. The solid lines indicate the bounds $4\eta(1-\eta) \le C \le 2\sqrt{\eta(1-\eta)}$ for the bare sensor. Our numerics indicates that the upper bound $C \le 2\sqrt{\eta(1-\eta)}$ is also valid for the sensor with memory at sufficiently large efficiency.

For the bare sensor, capacity and efficiency are related by the bounds

$$ 4\eta_r(1-\eta_r) \le C_r \le 2\sqrt{\eta_r(1-\eta_r)}, \qquad (58) $$

which are derived in the following way. From (52) and (54) the efficiency reads

$$ \eta_r = \frac{l_r}{\sigma_r} = \frac{B_r \nu_r(1+\nu_r)}{\nu_r^2 + B_r(1+\nu_r)^2}, \qquad (59) $$

and from (52) and (53) the sensory capacity reads

$$ C_r = \frac{l_r}{T_{x\to r}} = \frac{2\nu_r^3}{\left[\nu_r^2 + B_r(1+\nu_r)^2\right]\left[\sqrt{1 + \nu_r^2/B_r} - 1\right]}. \qquad (60) $$

The upper (lower) bound in Eq. (58) is obtained by maximizing (minimizing) the capacity (60) with respect to the variables $\nu_r, B_r \ge 0$ at fixed efficiency (59).
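A quick numerical check of the bounds (58) can be obtained by sampling the bare-sensor parameters at random; the ranges below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
nu_r, B_r = 10.0 ** rng.uniform(-2, 2, size=(2, 100_000))    # random parameters over four decades

eta = B_r * nu_r * (1 + nu_r) / (nu_r**2 + B_r * (1 + nu_r)**2)                          # Eq. (59)
C = 2 * nu_r**3 / ((nu_r**2 + B_r * (1 + nu_r)**2) * (np.sqrt(1 + nu_r**2 / B_r) - 1))   # Eq. (60)

lower, upper = 4 * eta * (1 - eta), 2 * np.sqrt(eta * (1 - eta))
print("largest violation of the lower bound:", np.max(lower - C))   # non-positive up to round-off
print("largest violation of the upper bound:", np.max(C - upper))   # non-positive up to round-off
```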
B. General proof

We now prove a general tradeoff between sensory capacity and efficiency: a sensory capacity $C = 1$ implies $\eta \le 1/2$. Our proof depends on the reasonable assumption that for any sensor it is possible to create a fictitious memory such that the instantaneous state of the fictitious sensor, composed of the sensor and the fictitious memory, contains the whole history of the sensor. From the calculations for the model system in Sec. IV, we expect this fictitious memory to have two general characteristics. First, it must be precise. For the model system this precision is characterized by a small $D_m$ in Eq. (47), which can be achieved if the total number of proteins inside the cell is very large, i.e., the memory has a large number of possible states. Second, the time scale for changes in the states of the fictitious memory must be tuned to some optimal value. For the model system this time scale is characterized by $\omega_m$ in Eq. (47). For a system that is more elaborate than our model system one can think of a multicomponent memory with the time scale of each component optimally tuned to store information about a certain part of the sensor.

As summarized in the chain of inequalities (34), adding the memory raises the learning rate and lowers the upper bound on the transfer entropy rate. In a first step, we impose that (i) $C = 1$ and (ii) the transfer entropy rate is equal to its upper bound, i.e., $l_y = T_{x\to y} = \overline{T}_{x\to y}$. From relations (8) and (18) we obtain

$$ \overline{T}_{x\to y} - l_y = \sum_{y,y'} P(y) \sum_x P(x|y)\, w^x_{yy'} \ln\frac{P(x|y)\, w^x_{yy'}}{P(x|y')\, w_{yy'}} \ge 0, \qquad (61) $$

where the log sum inequality above is saturated if and only if the term inside the logarithm is independent of $x$ [67]. Hence, if $\overline{T}_{x\to y} = l_y$ then the rates obey

$$ w^x_{yy'} = \frac{P(x|y')}{P(x|y)}\, w_{yy'}. \qquad (62) $$

With this restriction, Eq. (8) and Eq. (20), the entropy production (15) becomes

$$ \sigma_y = 2 l_y + \tilde{\sigma}_y. \qquad (63) $$

The efficiency (16) then reads

$$ \eta = \frac{l_y}{\sigma_y} = \frac{1}{2}\,\frac{\sigma_y - \tilde{\sigma}_y}{\sigma_y} \le \frac{1}{2}, \qquad (64) $$

where we used $\sigma_y \ge \tilde{\sigma}_y$. Hence, if $C = 1$ and $T_{x\to y} = \overline{T}_{x\to y}$, the efficiency fulfills $\eta \le 1/2$. In a second step, we show that $C = 1$ indeed implies $T_{x\to y} = \overline{T}_{x\to y}$, which completes the proof of the tradeoff.

A fictitious memory $\alpha$ is added to the sensor $y$. The transition rates are now of the form of Eq. (22) with $y$ replacing $r$ and $\alpha$ replacing $m$. The learning rate of this fictitious sensor composed of $z = (y, \alpha)$ reads

$$ l_z = \sum_{x,x',y,\alpha} P(x,y,\alpha)\, w^{xx'} \ln\frac{P(x,y,\alpha)}{P(x',y,\alpha)}, \qquad (65) $$

where we used Eqs. (10) and (11). Within this fictitious sensor $l_y$ is a coarse-grained learning rate and the difference between $l_z$ and $l_y$ reads

$$ l_z - l_y = \sum_{x,x',y} P(x,y)\, w^{xx'} \sum_\alpha P(\alpha|x,y) \ln\frac{P(\alpha|x,y)}{P(\alpha|x',y)} \ge 0. \qquad (66) $$

The assumption $C = 1$ implies $l_y = l_z$. The above inequality is saturated if and only if $P(\alpha|x,y) = P(\alpha|x',y) = P(\alpha|y)$, yielding $P(x|y,\alpha) = P(y)P(x|y)P(\alpha|x,y)/[P(y)P(\alpha|y)] = P(x|y)$. This relation leads to

$$ H[x_t|y_t, \alpha_t] = H[x_t|y_t]. \qquad (67) $$

The fictitious memory $\alpha$ is unspecified and the key assumption for our demonstration is that it is always possible, for any sensor $y$, to find a fictitious memory $\alpha$ that fulfills the relation

$$ H[x_t|y_t, \alpha_t] = H[x_t|\{y_{t'}\}_{t'\le t}]. \qquad (68) $$

If we choose such a fictitious memory, then equality (67) leads to

$$ H[x_t|y_t] = H[x_t|\{y_{t'}\}_{t'\le t}]. \qquad (69) $$

Hence, if it is possible to find a fictitious memory that fulfills (68), then $C = 1$ implies (69).
From (69) we obtain $I[x_t : \{y_{t'}\}_{t'\le t}] = I[x_t : y_t, y_{t-\mathrm{d}t}] = I[x_t : y_t]$. The learning rate in the form (11) can be rewritten as

$$ l_y = \frac{I[x_t : y_t] - I[x_{t+\mathrm{d}t} : y_t]}{\mathrm{d}t} = \frac{I[x_{t+\mathrm{d}t} : y_{t+\mathrm{d}t}] - I[x_{t+\mathrm{d}t} : y_t]}{\mathrm{d}t} = \frac{I[x_{t+\mathrm{d}t} : y_{t+\mathrm{d}t}, y_t] - I[x_{t+\mathrm{d}t} : y_t]}{\mathrm{d}t}, \qquad (70) $$

where we used the steady state property $I[x_{t+\mathrm{d}t} : y_{t+\mathrm{d}t}] = I[x_t : y_t]$ in the first equality. Inserting the conditional probabilities in terms of rates from Eq. (7) into Eq. (70) leads to the completion of the proof, i.e.,

$$ l_y = \sum_{x,y,y'} P(x,y)\, w^x_{yy'} \ln\frac{w^x_{yy'}}{w_{yy'}} = \overline{T}_{x\to y}. \qquad (71) $$

Summarizing, we have demonstrated that $C = 1 \Rightarrow H[x_t|y_t] = H[x_t|\{y_{t'}\}_{t'\le t}] \Rightarrow l_y = \overline{T}_{x\to y} \Rightarrow C = 1$. This proof also implies that whenever $C = 1$ the upper bound is also equal to the transfer entropy rate, i.e., $l_y = T_{x\to y} = \overline{T}_{x\to y}$. For the coupled linear Langevin equations analyzed in Sec. IV this equality between the transfer entropy rate and its upper bound implies the equality between the uncertainty about the external signal that is estimated from the instantaneous state of the sensor and the uncertainty that is estimated from the full time series of the sensor, as shown in Appendix B. For general systems, it remains to be seen whether $C = 1$ implies that both uncertainties are the same.

VI. CONCLUSION
We have introduced the sensory capacity, which provides a measure for the performance of a sensor that follows an external signal. Specifically, the maximal sensory capacity $C = 1$ means that the instantaneous state of the sensor contains the same amount of information about the signal as the full time series of the sensor. As we have shown with the coupled linear Langevin equations in Sec. IV, a high sensory capacity can be achieved in two ways. First, for a bare sensor without a memory layer the parameters related to the sensor can be tuned in such a way that $C = 1$. In this case there is no further information available in the full time series of the degree of freedom directly sensing the signal. Second, the more interesting case is when the full time series of this first degree of freedom has more information than its instantaneous state. By adding a memory, which is a second degree of freedom that is influenced by the first degree of freedom but does not react back on it, the sensory capacity can be raised to $C = 1$. This increase in sensory capacity quantifies how much information about the time series of the sensor is stored in the instantaneous state of the memory.

The coupled linear Langevin equations have been derived from a cellular two-component network sensing an external ligand concentration, which is the signal. Within this physical realization of a sensor, the first layer of the sensor consists of the receptors that bind external ligand and the memory is composed of internal proteins that can be phosphorylated. We have shown that the thermodynamic entropy production quantifying dissipation has two terms: work done by the external process due to binding and unbinding at different concentrations, and dissipation inside the cell due to ATP hydrolysis. Adding a memory that increases the sensory capacity of a sensor from a low value to a value close to one requires a high rate of dissipation inside the cell. The sensory capacity is particularly interesting in this regime of high dissipation, where the efficiency is very low and, therefore, does not characterize the performance of the sensor well.

Finally, we have demonstrated a general tradeoff between sensory capacity and efficiency. A sensory capacity $C = 1$ implies an efficiency $\eta \le 1/2$. The limit $\eta = 1/2$ is reached if the parameters of the bare sensor are tuned such that $C = 1$. If these parameters are not optimally tuned, $C = 1$ is possible only with an additional memory that leads to extra dissipation in relation to the bare sensor, which implies $\eta < 1/2$. The pair $C$ and $\eta$ provides a further link between information theory and thermodynamics. The sensory capacity $C$, as a ratio between learning rate and transfer entropy rate, is of purely information theoretic origin, whereas the efficiency $\eta$, as a ratio between learning rate and entropy production, contains input from both fields. As a perspective for future work, the role of nonlinearities in these figures of merit could be explored in more complex models.

An experimental realization verifying the second law for a sensor, which involves the rate of dissipated heat and the learning rate, is still lacking. A good candidate for such an experiment is a colloidal particle, which plays the role of the sensor, subjected to an external potential that is varied stochastically. An experiment with a sensor that has an internal memory seems to be even more challenging.

Appendix A: From Master Equation to Langevin Equation in bipartite processes

1. Linear noise approximation
We consider a vector z = ( z , . . . , z d ) determining thestate of the system. Comparing with Sec. II, the firstcomponent is related to the signal, i.e., z = x . Theother components are related to the sensor. If the sensorhas only one component r then z = r . A sensor with amemory also has a second component y = ( r, m ), lead-ing to z = m . For the variable z = x we denote thetransition rate w xx (cid:48) = ω (1) ± ( z ) for x (cid:48) = x ± d x , where d x corresponds to an infinitesimal change in the variable x .The master equation is written as˙ P ( z ) = d (cid:88) i =1 (cid:104) w ( i )+ ( z − d z i ) P ( z − d z i ) − w ( i )+ ( z ) P ( z ) (cid:105) + d (cid:88) i =1 (cid:104) w ( i ) − ( z + d z i ) P ( z + d z i ) − w ( i ) − ( z ) P ( z ) (cid:105) . (A1)With the approximation w ( i ) ± ( z ∓ d z i ) P ( z ∓ d z i ) (cid:39) w ( i ) ± ( z ) P ( z ) ∓ d z i ∂∂z i w ( i ) ± ( z ) P ( z ) + 12 d z i ∂ ∂z i w ( i ) ± ( z ) P ( z ) , (A2)the master equation (A1) turns into the Fokker Planckequation ˙ ρ ( z ) = − (cid:88) i ∂∂z i J i ( z ) , (A3)where in the continuous limit P ( z ) → ρ ( z ) (cid:81) i d z i . Theprobability current reads J i ( z ) ≡ D i ( z ) F i ( z ) ρ ( z ) − ∂∂z i D i ( z ) ρ ( z ) , (A4)where D i ( z ) F i ( z ) ≡ d z i (cid:104) w ( i )+ ( z ) − w ( i ) − ( z ) (cid:105) , (A5)and D i ( z ) ≡ d z i (cid:104) w ( i )+ ( z ) + w ( i ) − ( z ) (cid:105) (A6) Within the Ito interpretation [65, 66], the Fokker-Planckequation (A3) corresponds to the Langevin equation˙ z i,t = D i ( z t ) F i ( z t ) + ξ it , (A7)where (cid:104) ξ it ξ jt (cid:48) (cid:105) = 2 D i ( z ) δ ij δ ( t − t (cid:48) ). The δ ij term in thislast equation is a direct consequence of the bipartite (ormultipartite) structure of the transition rates.
2. Two component network with a weakly fluctuating signal
The linear noise approximation for the specific modelof Sec. III is valid in the limit N y , N b (cid:29) x → x t = − ω x x t + ξ x t , ˙ n b ( t ) = ω +r ( x t ) N b − (cid:2) ω +r ( x t ) + ω − r ( x t ) (cid:3) n b ( t ) + ξ b t , ˙ n y ( t ) = ω +y ( n b ( t )) N y − (cid:2) ω +m ( n b ( t )) + ω − m ( n b ( t )) (cid:3) n y ( t )+ ξ y t . (A8)From Eq. (A6), the noise terms ξ b t and ξ y t fulfill a relationsimilar to (37), with amplitudes D b ( x, n b ) = 12 (cid:104) ω +r ( x )( N b − n b ) + ω − r ( x ) n b (cid:105) ,D y ( n b , n y ) = 12 (cid:104) ω +m ( n b )( N y − n y ) + ω − m ( n b ) n y (cid:105) , (A9)respectively.If the fluctuations of the signal are small such that x stays close to the value x = 0 we can apply the followingexpansion N b ω +r ( x ) / [ ω +r ( x ) + ω − r ( x )] ≡ n ∗ b + α x + O( x ) (A10)where n ∗ b ≡ N b ω +r (0) / [ ω +r (0) + ω − r (0)] and α is the firstderivative evaluated at x = 0. For n b − n ∗ b small, N y ω +m ( n b ) / [ ω +m ( n b ) + ω − m ( n b )] ≡ n ∗ y + α ( n b − n ∗ b )+ O( n b − n ∗ b ) (A11)where n ∗ y ≡ N y ω +m ( n ∗ b ) / [ ω +m ( n ∗ b ) + ω − m ( n ∗ b )] and α is thefirst derivative evaluated at n b = n ∗ b . In the limit whereEqs. (A10) and (A11) are valid, the Langevin equations(A8) become˙ x t = − ω x x t + ξ x t ˙ n b ( t ) = ω r (cid:104) n ∗ b + α x t − n b ( t ) (cid:105) + ξ b t (A12)˙ n y ( t ) = ω m (cid:104) n ∗ y + α ( n b ( t ) − n ∗ b ) − n y ( t ) (cid:105) + ξ y t , where ω r ≡ ω +r (0) + ω − r (0) ω m ≡ ω +m ( n ∗ b ) + ω − m ( n ∗ b ) . (A13)2Furthermore, the noise amplitudes in Eq. (A9) become D ∗ b ≡ D b (0 , n ∗ b ) = ω r N b n ∗ b ( N b − n ∗ b ) ,D ∗ y ≡ D y ( n ∗ b , n ∗ y ) = ω m N y n ∗ y ( N y − n ∗ y ) . (A14)The explicit form of the parameter α in (A10) is α = n ∗ b ( N b − n ∗ b ) N b ∂∆F ( x ) ∂x (A15)and α in Eq. (A11) is α = n ∗ y ( N y − n ∗ y ) N y (cid:20) κ + ν + − κ − ν − ( n ∗ b κ + + ν − )( n ∗ b κ − + ν + ) (cid:21) , (A16)as obtained from (41). Hence, for ∆µ =ln[ κ + ν + / ( κ − ν − )] = 0 this last parameter is α = 0, i.e.,the memory level in Eq. (A12) is not affected by the num-ber of occupied receptors. Therefore, ATP consumpationis necessary in order for the memory to be able to storeinformation about the signal.The linear Langevin equations can be further simplifiedwith the transformations r t ≡ n b ( t ) − n ∗ b α (A17)and m t ≡ n y ( t ) − n ∗ y α α . (A18)With these variables the Langevin equations (A12) be-come Eq. (47), with the noise amplitudes (A14) trans-formed to D r = D ∗ y /α ,D m = D ∗ y / ( α α ) . (A19)
3. Quantities in the continuum limit
We consider a vector ( z , z , z ) = ( x, r, m ) with tran-sition rates ω (1) ± ( z ) ≡ D x d x exp (cid:20) ± F x ( x )d x (cid:21) , (A20) ω (2) ± ( z ) ≡ D r d r exp (cid:20) ± F r ( x, r )d r (cid:21) , (A21) ω (3) ± ( z ) ≡ D m d m exp (cid:20) ± F m ( r, m )d m (cid:21) , (A22)where the the diffusion constants D i are assumed tobe independent of ( x, r, m ). The following relations areobtained by taking their expressions for the discretecase in Sec. II and then taking the continuous limit(d x, d r, d m ) →
0, where the probability is replaced bya density, i.e., P ( x, r, m ) → ρ ( x, r, m )d x d r d m . Learning rate – From Eqs. (A2) and (A4) the learningrate (8) becomes l y = (cid:90) d x (cid:90) d r (cid:90) d mJ r ( x, r, m ) ∂∂r ln ρ ( x | r, m )+ (cid:90) d x (cid:90) d r (cid:90) d mJ m ( x, r, m ) ∂∂m ln ρ ( x | r, m ) , (A23)where ρ ( x | r, m ) ≡ ρ ( x, r, m ) / [ (cid:82) ρ (˜ x, r, m )d˜ x ]. This ex-pression can also be found in [46], where the learningrate is called information flow. Integration by parts andthe steady state property ∂ x J x + ∂ r J r + ∂ m J m = 0 leadsto the alternative expression l y = − (cid:90) d x (cid:90) d r (cid:90) d mJ x ( s, r, m ) ∂∂x ln ρ ( x | r, m ) . (A24) Coarse grained learning rate – The coarse grainedlearning rate in Eq. (23) becomes l r = − (cid:90) d x (cid:90) d rJ r ( x, r ) ∂∂r ln ρ ( x | r ) , (A25)where J r ( x, r ) ≡ (cid:82) d mJ r ( x, r, m ), ρ ( x, r ) ≡ (cid:82) d mρ ( x, r, m ) and ρ ( x | r ) ≡ ρ ( x, r ) / [ (cid:82) ρ (˜ x, r )d˜ x ]. Entropy production – The entropy production in (15)is separated into two contributions σ y ≡ σ r + σ m , (A26)as shown in Eqs. (43) and (45). In the continuous limit,using Eqs (A2) and (A4), these contributions become σ r = (cid:90) d x (cid:90) d r (cid:90) d mJ r ( x, r ) F r ( x, r ) , (A27)and σ m = (cid:90) d x (cid:90) d r (cid:90) d mJ m ( x, r, m ) F m ( r, m ) . (A28) Coarse grained entropy production – From Eqs. (A2),(A4) and (A28), the coarse grained entropy production(20) becomes˜ σ y = (cid:90) d r (cid:90) d m (cid:20)(cid:90) J r ( x, r, m )d x (cid:21) (cid:20)(cid:90) F r (˜ x, r ) ρ (˜ x | r, m )d˜ x (cid:21) + σ m (A29)The last term σ m remains the same because m is notdirectly influenced by the signal x . Upper bound on transfer entropy rate – The upperbound of the transfer entropy rate (18) becomes T x → y = D r (cid:90) d x (cid:90) d r (cid:90) d mρ ( x, r, m ) (cid:104) F r ( x, r ) − ˜ F r ( r, m ) (cid:105) , (A30)3where we used the averaged force˜ F r ( r, m ) ≡ (cid:90) d xρ ( x | r, m ) F r ( x, r ) . (A31)Since, ˜ F m ( r, m ) = F m ( r, m ) the contribution due to m iszero. For T x → r defined in Eq. (33) we replace ρ ( x | r, m )by ρ ( x | r ) in Eqs. (A31) and (A30), which leads to theexpression T x → r = D r (cid:90) d x (cid:90) d rρ ( x, r ) (cid:104) F r ( x, r ) − ˜ F r ( r ) (cid:105) , (A32)where ˜ F r ( r ) ≡ (cid:82) d xρ ( x | r ) F r ( x, r ).
4. Gaussian linear processes
We now consider a linear Langevin equation of theform (cid:18) ˙ x t ˙ y t (cid:19) = − A (cid:18) x t y t (cid:19) + ξ t , (A33)where (cid:104) ξ t ξ (cid:62) t (cid:105) = 2 D δ ( t − t (cid:48) ). The matrices A and D forthe bare sensor y = r are given by (50) and for the sensorwith a memory y = ( r, m ) they are given by (56). Thesteady state solution of this Langevin equation is a mul-tivariate normal distribution ρ ( x, y ) with zero mean andcovariance Σ , which is the stationary solution of (49).Comparing Eqs. (A7) and (A33) the drift term is F ( x, y ) ≡ − D − A (cid:18) x y (cid:19) . (A34)The probability current defined in Eq. (A4) is then givenby J ( x, y ) = − (cid:2) A − DΣ − (cid:3) (cid:18) x y (cid:19) ρ ( x, y ) , (A35)where Σ − is the inverse of Σ .We define the matrix Φ ≡ (cid:90) d x (cid:90) d yJ ( x, y ) F ( x, y ) (cid:62) . (A36) Eqs. (A34) and (A35) yield Φ = (cid:2) A − DΣ − (cid:3) ΣA (cid:62) D − = AΣA (cid:62) D − − DA (cid:62) D − , (A37)where we used the fact that ρ ( x, y ) is a multivariateGaussian density. With this expression, from Eq. (A27)we obtain σ r = Φ rr = ω x ν B r (1 + ν r ) , (A38)and from Eq. (A28) we obtain σ m = Φ mm = ω x ν [ ν + B r (1 + ν m )(1 + ν r )] B m (1 + ν m )(1 + ν r )( ν m + ν r ) , (A39)where E ≡ D x /ω x , ν r ≡ ω r /ω x , B r ≡ D r /D x , ν m ≡ ω m /ω x , B m ≡ D m /D x (as defined in Sec. IV).The gradient of the log of the density reads a ( x, y ) ≡ − (cid:18) ∂ x ∂ y (cid:19) ln ρ ( x, y ) = Σ − (cid:18) x y (cid:19) . (A40)With the matrix L ≡ (cid:90) d x (cid:90) d yJ ( x, y ) a ( x, y ) (cid:62) = − ( A − DΣ − ) ΣΣ − = − A + DΣ − , (A41)where we used Eqs. (A35) and (A40), the learning rate l y = L xx (A24) reads l y = L xx = ω x (cid:104) − E ( Σ − ) xx (cid:105) . (A42)The 2 × x, r ) given by (51) yields l r = L xx = ω x ν ν + B r (1 + ν r ) . (A43)For a the case with memory, where ( x, y ) = ( x, r, m ), theexplicit form of the learning rate (A42) is given by l y = L xx = ω x ν ( ν m + ν r ) (cid:8) B r ν ( ν m ν r + 1) + ν r (cid:2) B m ( ν m + 1) ( ν m + ν r ) + ν ν r (cid:3)(cid:9) ν { B r ν [ ν + ν m (4 ν r + 2) + ν + 2 ν r + 2] + B ( ν m + 1) ( ν r + 1) + ν } + B m ( ν m + 1) [ B r ( ν r + 1) + ν ] ( ν m + ν r ) . (A44)The upper bound on the transfer entropy rate (A30) reads T x → y = ω D r (cid:90) d x (cid:90) d r (cid:90) d m ρ ( x, r, m ) (cid:2) x − (cid:104) x | r, m (cid:105) (cid:3) , (A45)4where (cid:104) x | r, m (cid:105) ≡ (cid:82) ρ (˜ x | r, m )˜ x d˜ x and we used F r ( x, r ) = ω r ( x − r ) /D r . Appendix B: Uncertainty from instantaneous stateand from time-series
We first consider a sensor with memory y = ( r, m ).The covariance matrix, which is the stationary solutionof (49) with matrices given by (56), is written as Σ = Σ xx Σ xr Σ xm Σ xr Σ rr Σ rm Σ xm Σ rm Σ mm ≡ (cid:18) E b (cid:62) b ˜ Σ (cid:19) . (B1)The linear estimate of x from y is ˆ x ( y ) ≡ c (cid:62) y , where c is a vector. Minimizing the variance (cid:104) [ x − ˆ x ( y )] (cid:105) = E − c (cid:62) b + c (cid:62) ˜ Σ c , (B2)which is minimal for c = ˜ Σ − b , leads to the uncertainty E | y = E − b (cid:62) ˜ Σ − b = E (cid:32) − b (cid:62) ˜ Σ − b E (cid:33) . (B3)Following the same procedure for a bare sensor with y = r , ˜ Σ = Σ rr , and b = Σ xr = b (cid:62) the covariance matrix(51) leads to an uncertainty E | r = E (cid:20) − ν ν + ν + B r (1 + ν r ) (cid:21) . (B4)Comparing Eq. (55) with Eq. (B4) we obtain T x → r = ω x ν B r E | r E . (B5)Likewise, from Eq. (A45), with ρ ( x, r, m ) a multi-variative Gaussian with zero mean and covariance matrix (B1), and Eq. (B3) we obtain T x → y = ω x ν B r E | y E . (B6)The best estimate ˆ x t that uses the time-series of thesensor { r t (cid:48) } t (cid:48) ≤ t to minimize the uncertainty ˆ E t ≡ (cid:104) ( x t − ˆ x t ) (cid:105) is known as the Kalman-Bucy filter [45, 68]. Forthe linear Gaussian process from (47) the best estimate ˆ x t satisfies (cid:104) r t (cid:48) ˆ x t (cid:105) = (cid:104) r t (cid:48) x t (cid:105) for all t (cid:48) ≤ t and (cid:104) ˆ x t ( x t − ˆ x t ) (cid:105) =0 (see [68]). It can be shown that the minimal errorsatisfies the Riccati equation, which reads [45, 68]dd t ˆ E t = − ω D r ˆ E t − ω x ˆ E t + 2 D x . (B7)The stationary solution of this equation gives the uncer-tainty about the signal given the sensor trajectory E | r traj = E
21 + (cid:113) ν B r . (B8)Comparing with Eq. (53) we obtain T x → r = ω x ν B r E | r traj E . (B9)The simple relations (B5), (B6), and (B9) are valid forour model system that corresponds to a linear Gaussianprocess. Since for C = 1 the transfer entropy rate equalsits upper bound, for our model system a maximal sensorycapacity C = 1 implies E x | r traj = E x | y . In this case thelinear estimate ˆ x ( y ) = c (cid:62) y = b (cid:62) ˜ Σ − y from Eq. (B2)coincides with the estimate from the Kalman-Bucy fil-ter ˆ x t , which is similar to the finding in [45] for optimalfeedback cooling. [1] J. M. R. Parrondo, J. M. Horowitz, and T. Sagawa,Nature Phys. , 131 (2015).[2] A. Berut, A. Arakelyan, A. Petrosyan, S. Ciliberto,R. Dillenschneider, and E. Lutz, Nature , 187 (2012).[3] Y. Jun, M. Gavrilov, and J. Bechhoefer, Phys. Rev. Lett. , 190601 (2014).[4] S. Toyabe, T. Sagawa, M. Ueda, E. Muneyuki, andM. Sano, Nature Phys. , 988–992 (2010).[5] J. V. Koski, V. F. Maisi, T. Sagawa, and J. P. Pekola,Phys. Rev. Lett. , 030601 (2014).[6] E. Rold´an, I. A. Mart´ınez, J. M. R. Parrondo, andD. Petrov, Nature Phys. , 457 (2014).[7] T. Sagawa and M. Ueda, Phys. Rev. Lett. , 080403(2008).[8] F. J. Cao and M. Feito, Phys. Rev. E , 041118 (2009).[9] T. Sagawa and M. Ueda, Phys. Rev. Lett. , 090602 (2010).[10] J. M. Horowitz and S. Vaikuntanathan, Phys. Rev. E ,061120 (2010).[11] J. M. Horowitz and J. M. R. Parrondo, EPL , 10005(2011).[12] L. Granger and H. Kantz, Phys. Rev. E , 061110(2011).[13] D. Abreu and U. Seifert, EPL , 10001 (2011).[14] D. Abreu and U. Seifert, Phys. Rev. Lett. , 030601(2012).[15] M. Bauer, D. Abreu, and U. Seifert, J. Phys. A: Math.Theor. , 162001 (2012).[16] T. Munakata and M. L. Rosinberg, J. Stat. Mech. ,P05010 (2012).[17] T. Munakata and M. L. Rosinberg, J. Stat. Mech. ,P06014 (2013). [18] T. Sagawa and M. Ueda, Phys. Rev. E , 021104 (2012).[19] S. Ito and T. Sagawa, Phys. Rev. Lett. , 180603(2013).[20] J. M. Horowitz, T. Sagawa, and J. M. R. Parrondo,Phys. Rev. Lett. , 010602 (2013).[21] P. Strasberg, G. Schaller, T. Brandes, and M. Esposito,Phys. Rev. Lett. , 040601 (2013).[22] M. Bauer, A. C. Barato, and U. Seifert, J. Stat. Mech. ,P09010 (2014).[23] H. Sandberg, J.-C. Delvenne, N. J. Newton, and S. K.Mitter, Phys. Rev. E , 042119 (2014).[24] M. Prokopenko and J. T. Lizier, Sci. Rep. , 5394 (2014).[25] M. Prokopenko and I. Einav, Phys. Rev. E , 062143(2015).[26] M. L. Rosinberg, T. Munakata, and G. Tarjus, Phys.Rev. E , 042114 (2015).[27] J. Bechhoefer, New J. Phys. , 075003 (2015).[28] N. Shiraishi and T. Sagawa, Phys. Rev. E , 012130(2015).[29] N. Shiraishi, S. Ito, K. Kawaguchi, and T. Sagawa, NewJ. Phys. , 045012 (2015).[30] D. Mandal and C. Jarzynski, Proc. Natl. Acad. Sci. USA , 11641 (2012).[31] D. Mandal, H. T. Quan, and C. Jarzynski, Phys. Rev.Lett. , 030602 (2013).[32] S. Deffner and C. Jarzynski, Phys. Rev. X , 041003(2013).[33] S. Deffner, Phys. Rev. E , 062128 (2013).[34] A. C. Barato and U. Seifert, EPL , 60001 (2013).[35] A. C. Barato and U. Seifert, Phys. Rev. Lett. , 090601(2014).[36] A. C. Barato and U. Seifert, Phys. Rev. E , 042150(2014).[37] J. Hoppenau and A. Engel, EPL , 50002 (2014).[38] N. Merhav, J. Stat. Mech. , P06037 (2015).[39] A. B. Boyd, D. Mandal, and J. P. Crutchfield, ArXive-prints (2015), arXiv:1507.01537 [cond-mat.stat-mech].[40] A. E. Allahverdyan, D. Janzing, and G. Mahler, J. Stat.Mech. , P09011 (2009).[41] A. C. Barato, D. Hartich, and U. Seifert, J. Stat. Phys. , 460 (2013).[42] D. Hartich, A. C. Barato, and U. Seifert, J. Stat. 
Mech., P02016 (2014).[43] G. Diana and M. Esposito, J. Stat. Mech. , P04010(2014). [44] J. M. Horowitz and M. Esposito, Phys. Rev. X , 031015(2014).[45] J. M. Horowitz and H. Sandberg, New J. Phys. ,125007 (2014).[46] J. M. Horowitz, J. Stat. Mech. , P03006 (2015).[47] G. Lan, P. Sartori, S. Neumann, V. Sourjik, and Y. Tu,Nature Phys. , 422–428 (2012).[48] P. Mehta and D. J. Schwab, Proc. Natl. Acad. Sci. USA , 17978 (2012).[49] A. C. Barato, D. Hartich, and U. Seifert, Phys. Rev. E , 042104 (2013).[50] G. De Palo and R. G. Endres, PLoS Comput. Biol. ,e1003300 (2013).[51] M. Skoge, S. Naqvi, Y. Meir, and N. S. Wingreen, Phys.Rev. Lett. , 248102 (2013).[52] A. H. Lang, C. K. Fisher, T. Mora, and P. Mehta, Phys.Rev. Lett. , 148103 (2014).[53] C. C. Govern and P. R. ten Wolde, Phys. Rev. Lett. ,258102 (2014).[54] C. C. Govern and P. R. ten Wolde, Proc. Natl. Acad. Sci.USA , 17486 (2014).[55] A. C. Barato, D. Hartich, and U. Seifert, New J. Phys. , 103024 (2014).[56] P. Sartori, L. Granger, C. F. Lee, and J. M. Horowitz,PLoS Comput. Biol. , e1003974 (2014).[57] D. Hartich, A. C. Barato, and U. Seifert, New J. Phys. , 055026 (2015).[58] S. Bo, M. Del Giudice, and A. Celani, J. Stat. Mech. ,P01014 (2015).[59] S. Ito and T. Sagawa, Nat. Commun. , 7498 (2015).[60] S. Still, D. A. Sivak, A. J. Bell, and G. E. Crooks, Phys.Rev. Lett. , 120604 (2012).[61] G. Aquino, L. Tweedy, D. Heinrich, and R. G. Endres,Sci. Rep. , 5688 (2014).[62] T. Schreiber, Phys. Rev. Lett. , 461 (2000).[63] U. Seifert, Rep. Prog. Phys. , 126001 (2012).[64] M. Esposito, Phys. Rev. E , 041125 (2012).[65] C. W. Gardiner, Handbook of Stochastic Methods , 3rd ed.(Springer, Berlin, 2004).[66] P. C. Bressloff,
Stochastic Processes in Cell Biology (Springer International Publishing, 2014).[67] T. M. Cover and J. A. Thomas,
Elements of informationtheory , 2nd ed. (Wiley-Interscience, Hoboken, NJ, 2006).[68] B. Øksendal,